Date: 2026-02-13
Authors: Dylan Fitzgerald + Claude (Opus 4.6)
Scope: Standalone tasks only (not subtasks). All numbers are total tasks including the current ~120.
```diff
--- /tmp/task-download/single-node-chaos-hardening/grader.py	2026-02-13 13:36:32.569022191 -0700
+++ /home/dylan/dev/Nebula/tasks/single-node-chaos-hardening/grader.py	2026-02-13 14:17:17.945132597 -0700
@@ -242,9 +242,6 @@
         "priority_ok": priority_value > min_priority_value,
         "priority_value": priority_value,
         "priority_class": pc_name,
-        "mem_toleration": any(
-            t.get("key") == "node.kubernetes.io/memory-pressure" for t in tolerations
-        ),
         "node_affinity": "nodeAffinity" in affinity,
```
Context: The client has ~120 accepted apex-arena tasks on the Nebula platform. They want to reach 900. This document estimates what we can deliver and in what timeframe, grounded in empirical gap analysis.
check_grafana_alert_rule() in grader.py only detects the >= 5 threshold when it appears inline in the PromQL expr field (e.g. "expr": "pg_lock_wait_time_seconds > 5"). The solution.sh creates alerts this way, so test-solution passes. But agents universally create alerts using Grafana's standard multi-step format, where the threshold lives in a separate expression step rather than in the expr string. This causes agent solutions that configure the alert correctly to fail the check.
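A more tolerant check would accept both shapes. A minimal sketch, assuming the rule JSON follows Grafana's unified-alerting provisioning layout (a data array of steps whose model carries either a PromQL expr or classic-condition conditions); the field traversal and the threshold_satisfied name are assumptions, not the grader's actual API:

```python
def threshold_satisfied(rule: dict, limit: float = 5.0) -> bool:
    """Accept the threshold whether it appears inline in the PromQL expr
    or in a separate Grafana expression step (multi-step format)."""
    for step in rule.get("data", []):
        model = step.get("model", {})
        # Case 1: threshold written inline in the PromQL expression,
        # e.g. "pg_lock_wait_time_seconds > 5".
        expr = model.get("expr") or ""
        if f"> {limit:g}" in expr or f">= {limit:g}" in expr:
            return True
        # Case 2: threshold lives in a classic-condition/threshold step.
        for cond in model.get("conditions", []):
            evaluator = cond.get("evaluator", {})
            params = evaluator.get("params") or [None]
            if evaluator.get("type") in ("gt", "gte") and params[0] == limit:
                return True
    return False
```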
```diff
--- a/tasks/single-node-chaos-hardening/grader.py
+++ b/tasks/single-node-chaos-hardening/grader.py
@@ -130,13 +130,6 @@ def validate_setup():
     if node_taints and not affinity_issues_found:
         feedback.append("All Tier 1/2 workloads compatible with node taints")
-    # --- PDB Existence Check ---
-    print("validating PDB existence")
-    for ns in ["argocd", "bleater", "monitoring"]:
-        rc, out = run(f"kubectl get pdb -n {ns} -o json")
```
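The removed block appears to require at least one PodDisruptionBudget in each of the three namespaces. A hedged reconstruction of that style of check; the continuation past the kubectl call is guessed from the visible lines, and the run() stand-in mimics the grader's helper:

```python
import json
import subprocess

def run(cmd: str):
    """Stand-in for the grader's run() helper: returns (rc, stdout)."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout

def pdbs_exist() -> bool:
    """Guessed completion of the removed check: every watched namespace
    must contain at least one PodDisruptionBudget."""
    for ns in ["argocd", "bleater", "monitoring"]:
        rc, out = run(f"kubectl get pdb -n {ns} -o json")
        if rc != 0 or not json.loads(out).get("items"):
            return False
    return True
```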
```diff
--- a/tasks/single-node-chaos-hardening/grader.py
+++ b/tasks/single-node-chaos-hardening/grader.py
@@ -69,18 +69,23 @@ def validate_setup():
             all_ok = False
     # --- Priority Ordering Check ---
-    print("validating relative priority ordering")
-    t1 = get_effective_priority("bleater")
-    t2 = get_effective_priority("gitea")
-    t3 = get_effective_priority("loadgenerator")
```
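get_effective_priority itself is not shown in the hunk. A plausible reconstruction, assuming it resolves a namespace's first deployment to the numeric value of its priorityClassName (the jsonpath query and the fallback behavior are assumptions; the actual helper may differ):

```python
import json
import subprocess

def run(cmd: str):
    """Stand-in for the grader's run() helper: returns (rc, stdout)."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout

def get_effective_priority(namespace: str) -> int:
    """Assumed shape of the helper the removed check called."""
    rc, out = run(
        f"kubectl get deploy -n {namespace} "
        "-o jsonpath='{.items[0].spec.template.spec.priorityClassName}'"
    )
    pc_name = out.strip().strip("'")
    if rc != 0 or not pc_name:
        return 0  # pods without a priority class typically get priority 0
    rc, out = run(f"kubectl get priorityclass {pc_name} -o json")
    return json.loads(out).get("value", 0) if rc == 0 else 0
```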
The solution.sh for the kubernetes-security-hardening-zero-disruption task was failing the RBAC check with a score of 0.75 even though the solution correctly (see the verification sketch after this list):

- Created a dedicated ServiceAccount (bleater-app)
- Patched all Deployments/StatefulSets to use the new SA
- Performed a rolling restart
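What an RBAC check ought to observe once those three steps land; a verification sketch, assuming a bleater namespace and the same run() stand-in (the function name and namespace are assumptions, not the grader's actual check):

```python
import json
import subprocess

def run(cmd: str):
    """Stand-in for the grader's run() helper: returns (rc, stdout)."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout

def all_workloads_use_sa(namespace: str = "bleater",
                         sa_name: str = "bleater-app") -> bool:
    """Every Deployment/StatefulSet template must point at the dedicated
    ServiceAccount, and running pods must have picked it up (i.e. the
    rolling restart completed)."""
    for kind in ("deploy", "statefulset"):
        rc, out = run(f"kubectl get {kind} -n {namespace} -o json")
        if rc != 0:
            return False
        for item in json.loads(out).get("items", []):
            tmpl = item["spec"]["template"]["spec"]
            if tmpl.get("serviceAccountName") != sa_name:
                return False
    # Templates alone are not enough: pods created before the restart
    # still run under the old SA until they are replaced.
    rc, out = run(f"kubectl get pods -n {namespace} -o json")
    return rc == 0 and all(
        p["spec"].get("serviceAccountName") == sa_name
        for p in json.loads(out).get("items", [])
    )
```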
```diff
--- a/setup.sh
+++ b/setup.sh
@@ -69,6 +69,16 @@
 # ------------------------------------------------------------
-# Prometheus discovery label mismatch (fault 4)
+# Prometheus discovery label mismatch (fault 4 & 5)
+# Fault 4: observability label
+# Fault 5: remove app label so the bleater-services endpoint
+# discovery job can no longer match these services.
+# This forces annotation-based pod discovery to be the
```
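Fault 5's mechanics in isolation; a sketch assuming the services live in a bleater namespace (the namespace and function name are assumptions), using kubectl's trailing-dash syntax to strip a label:

```python
import subprocess

def inject_fault_5(namespace: str = "bleater") -> None:
    """Remove the app label from every Service in the namespace so the
    label-based endpoint-discovery job stops matching them, leaving
    annotation-based pod discovery as the remaining scrape path."""
    subprocess.run(
        ["kubectl", "label", "svc", "--all", "-n", namespace, "app-"],
        check=True,
    )
```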
Task ID: a6b6b25b-fbdf-4830-bd13-258c6bfd9948
Current Version: v32
Date: 2026-02-04
After fixing the grader bugs (instance methods defined without a self parameter), the task now passes test-solution with a perfect score. The agent pass rate should rise with the fixed grader; it may clear the 70% threshold, but that is not guaranteed.
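For reference, the failure mode behind the self-parameter bug; a minimal reproduction (the class and method names are illustrative, not the grader's):

```python
class Grader:
    def check_pdb():  # BUG: missing self; Python still passes the instance
        return True

Grader().check_pdb()
# TypeError: Grader.check_pdb() takes 0 positional arguments but 1 was given
```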