Need


When deploying a Helm chart, you might encounter an error message such as:


Error: failed post-install: 1 error occurred: 
* job <job_name> failed: BackoffLimitExceeded 

or

Error: failed post-install: 1 error occurred: 
* timed out waiting for the condition

 

Unfortunately, this message alone does not provide enough detail to diagnose the root cause. To troubleshoot effectively, we need to collect more detailed information from the Kubernetes cluster — specifically from the pods involved in the failed job.
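
Since the error names a post-install Job, a quick first step is to inspect that Job directly. A minimal sketch, assuming <job_name> is the name printed in the Helm error:

# Check the job's completion status
kubectl get jobs -n <namespace>

# Fetch logs from the pods created by the job
kubectl logs job/<job_name> -n <namespace> --all-containers=true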



Summarized Solution


To investigate the issue, you can:


List all pods in the target namespace to identify which ones might be failing.

kubectl get pods -n <namespace>


Describe and get logs from the problematic pods to see what went wrong.

kubectl describe pod <pod> -n <namespace>
kubectl logs <pod> -n <namespace>


Detailed Solution


1- Identify the namespace and list pods

Run the following command to list all pods in the namespace where your Helm release was deployed:

kubectl get pods -n <namespace>

This displays the status of each pod.

Look for pods whose STATUS is Error, CrashLoopBackOff, or ImagePullBackOff.


Example output:

NAME              READY         STATUS         RESTARTS       AGE
<pod1>            0/1           Error          1              2m
<pod2>            1/1           Running        0              5m
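
If the namespace contains many pods, you can narrow the list to the unhealthy ones. A minimal sketch; note that --field-selector matches the pod phase, so a pod stuck in CrashLoopBackOff may still report a Running phase and is easier to catch by filtering the STATUS column:

# Show only pods whose phase is not Running (e.g. Failed or Pending)
kubectl get pods -n <namespace> --field-selector=status.phase!=Running

# Alternatively, filter on the human-readable STATUS column
kubectl get pods -n <namespace> | grep -vE 'Running|Completed'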


2- Describe the failing pods

To understand what caused the failure, describe the problematic pod:


kubectl describe pod <pod> -n <namespace>

This command shows the pod's configuration and recent events, which usually reveal why it is failing (for example, probe failures, image pull errors, or OOM kills).


Example 'Events' section:

Events:
  Type     Reason     Age                From               Message
  Warning  Unhealthy  39m (x9 over 42m)  kubelet            Readiness probe failed: Get "http://10.244.0.17:5000/auth/realms/master": dial tcp 10.244.0.17:5000: connect: connection refused
  Warning  Unhealthy  39m (x6 over 41m)  kubelet            Liveness probe failed: dial tcp 10.244.0.17:5000: connect: connection refused
  Normal   Killing    39m                kubelet            Container keycloak failed liveness probe, will be restarted
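
The same events can also be listed for the whole namespace, which helps when you are not yet sure which pod is at fault:

kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp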


3- Retrieve the logs

Next, get the container logs to see what happened right before the failure:

kubectl logs <pod> -n <namespace> --all-containers=true
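
If a container has already been restarted (as in the liveness-probe example above), the current logs may not show the original failure. The --previous flag returns the logs of the last terminated container instance instead:

kubectl logs <pod> -n <namespace> --previous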


Optional: Use the automation script

To speed up the process, you can use the following shell script, which automates the collection of pod descriptions and logs:


#!/bin/sh

# Script to collect pod descriptions and logs for one or all namespaces
# Organized into a timestamped main folder with per-namespace subfolders


# Check for help options first
if [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    echo "Usage: $0 [OPTIONS] [NAMESPACE...]"
    echo
    echo "Collects pod descriptions and logs for some or all namespaces."
    echo "Data is saved in a timestamped main folder with per-namespace subfolders."
    echo
    echo "Options:"
    echo "  -h, --help          Show this help message and exit."
    echo "  [NAMESPACE]         Specify one or more namespaces, separated by space, to collect data from."
    echo "                      If no namespaces are specified, data for all namespaces is gathered."
    echo
    exit 0
fi

# Enable strict mode only after the -h/--help check, since set -u would abort when $1 is unset
set -eu

# Generate main timestamped folder
main_folder="pod_details_$(date +"%Y%m%d_%H%M%S")"
mkdir -p "$main_folder"

# Determine target namespaces
# If no argument provided: loop on all namespaces
if [ "$#" -eq 0 ]; then
    echo "No namespace provided. Gathering data for all namespaces..."
    namespaces=$(kubectl get ns --no-headers | awk '{print $1}')
else
    # If one or more namespaces are provided, use them ("$*" joins them with spaces)
    namespaces="$*"
    echo "Namespaces provided. Gathering data only for namespaces: $namespaces"
fi

# Loop over each namespace
for ns in $namespaces; do
    echo "Checking namespace: $ns"

    # Get pods in the namespace
    pod_list=$(kubectl get pod -n "$ns" --no-headers 2>/dev/null | awk '{print $1}')

    # Skip if no pods found
    if [ -z "$pod_list" ]; then
        echo "  No pods found in namespace: $ns — skipping."
        continue
    fi

    echo "  Pods found in namespace: $ns — collecting logs."

    # Create subfolder for the namespace
    ns_folder="${main_folder}/${ns}"
    mkdir -p "$ns_folder"

    # Save the pod list (wide format) for reference
    kubectl get pod -o wide -n "$ns" > "$ns_folder/pods.txt"

    # Loop through pods
    for pod in $pod_list; do
        (
            echo "    Processing pod: $pod"
            kubectl describe pod "$pod" -n "$ns" > "$ns_folder/${pod}_describe.txt"

            log_file="$ns_folder/${pod}_logs.log"
            # Remove ANSI color codes, replace literal \n and \t escapes with real newlines and tabs, and unescape slashes
            kubectl logs "$pod" -n "$ns" --all-containers=true 2>/dev/null \
                | sed 's/\x1b\[[0-9;]*m//g' | sed 's/\\n/\n/g' | sed 's/\\t/\t/g' | sed 's/\\\//\//g' > "$log_file"

            # Remove the file if it's empty
            if [ ! -s "$log_file" ]; then
                echo "    No logs for pod: $pod"
                rm -f "$log_file"
            fi
        ) &
    done
    wait

    echo "  Finished namespace: $ns"
done

echo "All available pod details and logs saved in: $main_folder"


Copy the script into a file named poddetails.sh, then make it executable and run it against the target namespace:


chmod +x poddetails.sh
./poddetails.sh <namespace>
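
The script also accepts several namespaces at once, or no argument at all to collect data from every namespace in the cluster:

./poddetails.sh <namespace1> <namespace2>
./poddetails.sh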