
@arubis
Last active February 6, 2026 00:17
Fix for kubernetes-security-hardening-zero-disruption task: wait for old pods using default SA to fully terminate before grader runs

Fix: Wait for Terminating Pods Before Grader Runs

Problem

The solution.sh for the kubernetes-security-hardening-zero-disruption task was failing the grader's RBAC check (score 0.75 instead of 1.0) even though the solution correctly:

  1. Created a dedicated ServiceAccount (bleater-app)
  2. Patched all Deployments/StatefulSets to use the new SA
  3. Performed a rolling restart

Root Cause

The grader's check_rbac() function queries ALL pods in the namespace:

pods = json.loads(out).get("items", [])
for pod in pods:
    sa = pod.get("spec", {}).get("serviceAccountName", "default")
    sas_in_use.add(sa)

if "default" in sas_in_use:
    return False, "bleater workloads still use default ServiceAccount"

This includes pods in Terminating state. After a rolling restart:

  • New pods are created with the new ServiceAccount
  • Old pods are marked for deletion but take time to fully terminate
  • kubectl rollout status returns success when new pods are ready, not when old pods are gone

The grader runs immediately after the solution completes, sees the Terminating pods still using the default SA, and fails.
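To make the failure mode concrete, here is a minimal sketch of the grader's logic applied to a hypothetical pod list captured right after `kubectl rollout status` returns. The pod names and the `deletionTimestamp` values are illustrative; the `bleater-app` ServiceAccount name is from the task.

```python
import json

# Hypothetical pod list just after the rolling restart: the new pod is
# Ready, but the old pod is still Terminating (deletionTimestamp set,
# object not yet removed from the API).
pods = [
    {"metadata": {"name": "bleater-web-new"},
     "spec": {"serviceAccountName": "bleater-app"}},
    {"metadata": {"name": "bleater-web-old",
                  "deletionTimestamp": "2026-02-05T23:59:00Z"},
     "spec": {}},  # no serviceAccountName -> grader falls back to "default"
]

# Same logic as the grader's check_rbac():
sas_in_use = set()
for pod in pods:
    sa = pod.get("spec", {}).get("serviceAccountName", "default")
    sas_in_use.add(sa)

print("default" in sas_in_use)  # prints True: the check fails while the old pod lingers
```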

Solution

Add a wait loop after kubectl rollout status that polls until ALL pods using the default ServiceAccount have fully terminated:

timeout 120 bash -c '
  while true; do
    pods_with_default=$(kubectl get pods -n bleater -o json | \
      python3 -c "import sys,json; pods=json.load(sys.stdin)[\"items\"]; print(len([p for p in pods if p[\"spec\"].get(\"serviceAccountName\",\"default\")==\"default\"]))")
    if [ "$pods_with_default" = "0" ]; then
      break
    fi
    sleep 5
  done
'

This ensures the grader only runs after old pods are completely gone, not just marked for deletion.
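The same polling logic can be sketched in Python for clarity. This is not part of the fix; it is a simulation in which a `list_pods` callable stands in for `kubectl get pods -o json`, so the termination wait can be exercised without a cluster. The function names and the snapshot data are assumptions for illustration.

```python
import time

def count_default_sa(pods):
    # Count every pod (including Terminating ones) that falls back to the
    # default ServiceAccount -- exactly what the grader counts.
    return len([p for p in pods
                if p.get("spec", {}).get("serviceAccountName", "default") == "default"])

def wait_for_dedicated_sa(list_pods, timeout=120, interval=5, sleep=time.sleep):
    # Poll until no pod uses the default SA, or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if count_default_sa(list_pods()) == 0:
            return True
        sleep(interval)
    return False

# Simulated rollout: the old default-SA pod disappears on the second poll.
snapshots = iter([
    [{"spec": {}}, {"spec": {"serviceAccountName": "bleater-app"}}],
    [{"spec": {"serviceAccountName": "bleater-app"}}],
])
print(wait_for_dedicated_sa(lambda: next(snapshots), sleep=lambda s: None))  # prints True
```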

Result

  • Before fix: Score 0.75 (RBAC check fails)
  • After fix: Score 1.0 (all checks pass)
--- /tmp/task-check/kubernetes-security-hardening-zero-disruption/solution.sh 2026-02-05 16:26:12.093429844 -0700
+++ /home/dylan/dev/Nebula/tasks/kubernetes-security-hardening-zero-disruption/solution.sh 2026-02-05 17:00:52.333861664 -0700
@@ -185,6 +185,22 @@
 kubectl rollout restart deployment -n "$NAMESPACE"
 kubectl rollout status deployment -n "$NAMESPACE" --timeout=180s
+echo "[INFO] Waiting for old pods using default SA to fully terminate..."
+# Wait up to 120s for ALL pods using 'default' SA to completely disappear (including Terminating)
+timeout 120 bash -c '
+  while true; do
+    # Count ALL pods (including Terminating) that use default SA
+    pods_with_default=$(kubectl get pods -n '"$NAMESPACE"' -o json | \
+      python3 -c "import sys,json; pods=json.load(sys.stdin)[\"items\"]; print(len([p for p in pods if p[\"spec\"].get(\"serviceAccountName\",\"default\")==\"default\"]))")
+    if [ "$pods_with_default" = "0" ]; then
+      echo "[INFO] All pods now using dedicated ServiceAccount"
+      break
+    fi
+    echo "[INFO] Still waiting for $pods_with_default pod(s) using default SA to terminate..."
+    sleep 5
+  done
+' || echo "[WARN] Timeout waiting for old pods, continuing anyway"
+
 
 ########################################
 # Final checks
 ########################################