Task ID: a6b6b25b-fbdf-4830-bd13-258c6bfd9948
Current Version: v32
Date: 2026-02-04
After fixing the grader bugs (broken self-parameter methods), the task now passes test-solution with a perfect score. The agent pass rate is expected to increase with the fixed grader—it may exceed the 70% threshold, but it also might not.
This document is provided as a contingency. These hardening suggestions should only be pursued if the task proves too easy after the grader fix. If the task passes within the acceptable threshold (pass rate < 70%) after fixing the grader, no further changes are needed.
The options below are evaluated by implementation effort and expected effectiveness, prioritized for if/when hardening becomes necessary.
Before suggesting discovery paths, we must understand what agents can actually access:
| Resource | Access | Notes |
|---|---|---|
crictl images |
✅ YES | Host command, not RBAC restricted |
kubectl get ingress -A |
❌ NO | No cluster-wide ingress permission |
kubectl get namespaces |
❌ NO | No namespace list permission |
ConfigMaps in observability |
✅ YES | Task-specific RBAC grant |
Pods/Services in observability |
❌ NO | Only ConfigMap access granted |
| Wiki content (Gitea HTTP) | ✅ YES | HTTP accessible at gitea.devops.local |
quadrantChart
title Hardening Options: Effort vs Effectiveness
x-axis Low Effort --> High Effort
y-axis Low Effectiveness --> High Effectiveness
quadrant-1 High Impact, More Work
quadrant-2 Sweet Spot
quadrant-3 Low Priority
quadrant-4 Avoid
"Remove version hints": [0.25, 0.75]
"Require SLO burn rate": [0.45, 0.80]
"Multi-protocol modules": [0.40, 0.70]
"Stricter dashboard": [0.55, 0.72]
"Recording rules": [0.50, 0.55]
"Alert severities": [0.45, 0.50]
"HA deployment": [0.55, 0.50]
"Retention config": [0.15, 0.25]
"Decoy services": [0.80, 0.35]
"RBAC/ServiceAccount": [0.70, 0.40]
Good hardening requires agents to perform more discovery steps, not guess at hidden information.
Tasks can require genuinely harder configurations that test real DevOps expertise.
| Approach | Type | Verdict |
|---|---|---|
| Remove wiki entirely | Guessing | Bad |
| Remove version hints only | Discovery (crictl) | Good |
| Require SLO burn rate alerts | Technical complexity | Good |
| Require multi-protocol blackbox | Technical complexity | Good |
| Add endpoints via Ingress discovery | Requires missing RBAC | Won't work |
| Effort | Effectiveness | Type |
|---|---|---|
| Low | High | Discovery |
Discovery Path: crictl images | grep -E "prometheus|blackbox|grafana" ✅ Verified
Currently the wiki tells agents exactly which versions to use. Remove this hint and let agents discover preloaded images via containerd.
Change required:
# setup.sh - wiki content
- Use the **most recent version** of each image.
- crictl images | grep -E \"prometheus|blackbox|grafana\"
+ Container images are preloaded in the air-gapped environment.
+ Agents must discover available versions using standard container tooling.| Effort | Effectiveness | Type |
|---|---|---|
| Medium | High | Technical complexity |
Requires understanding multi-window burn rates. Instead of simple "probe failed for 2m" alerts, require proper SLO-based alerting with burn rate calculations.
Example required configuration:
groups:
- name: synthetic-slo
rules:
# Fast burn: 14.4x error budget consumption
- alert: SyntheticProbeHighBurnRate
expr: |
(
1 - avg_over_time(probe_success{job="blackbox"}[5m])
) / (1 - 0.99) > 14.4
for: 2m
labels:
severity: critical
annotations:
description: "High error budget burn rate detected"
# Slow burn: 1x error budget consumption over 6h
- alert: SyntheticProbeLowBurnRate
expr: |
(
1 - avg_over_time(probe_success{job="blackbox"}[6h])
) / (1 - 0.99) > 1
for: 1h
labels:
severity: warningGrader check:
def check_slo_burn_rate_alerts():
"""Verify alerts use SLO burn rate pattern."""
code, out, _ = sh(
"kubectl get configmap prometheus-config "
"-n observability -o yaml"
)
if code != 0:
return False, "Prometheus config not readable"
# Check for burn rate pattern
has_burn_rate = (
'error budget' in out.lower() or
'14.4' in out or # Fast burn multiplier
'1 - avg_over_time' in out or
'burn' in out.lower()
)
if not has_burn_rate:
return False, "Alerts should use SLO burn rate pattern, not simple threshold"
return True, "Alerts use proper SLO burn rate calculations"| Effort | Effectiveness | Type |
|---|---|---|
| Medium | High | Technical complexity |
Requires understanding protocol-specific probe configuration. Require different blackbox modules for different protocols, with proper TLS validation where appropriate.
Required modules:
http_2xxfor HTTP endpoints with TLS validationtcp_connectfor raw TCP endpoints (K8s API)icmpfor network layer checks (optional bonus)
Current check already validates this but could be stricter:
def check_blackbox_modules():
"""Verify correct Blackbox modules used for each protocol."""
# ... existing code ...
# NEW: Require explicit TLS verification for HTTPS targets
if 'https://' in out:
if 'tls_config' not in out and 'insecure_skip_verify: false' not in out:
return False, "HTTPS targets should have explicit TLS verification config"
return True, "Blackbox modules correctly matched to target protocols"| Effort | Effectiveness | Type |
|---|---|---|
| Medium | High | Technical complexity |
Discovery Path: Standard Grafana/Prometheus patterns (documentation)
Require dashboards to include:
- Availability percentage panel (not just raw probe_success)
- Per-endpoint breakdown
- Response time histogram or percentiles
Example required query:
avg_over_time(probe_success{job="blackbox"}[1h]) * 100
Change required:
# grader.py - check_grafana_dashboard_semantics()
def check_grafana_dashboard_semantics():
# ... existing checks ...
# NEW: Check for percentage calculation
has_percentage = (
'* 100' in out or
'*100' in out or
'percentage' in out.lower() or
'100 *' in out
)
if not has_percentage:
return False, "Dashboard should show availability as percentage"
# NEW: Check for response time metrics
has_latency = (
'probe_duration' in out or
'probe_http_duration' in out or
'duration_seconds' in out
)
if not has_latency:
return False, "Dashboard should include response time metrics"
return True, "Dashboard meets visualization requirements"| Effort | Effectiveness | Type |
|---|---|---|
| Medium | Medium | Technical complexity |
Requires understanding Prometheus recording rules. Require Prometheus recording rules for pre-computed availability metrics:
groups:
- name: synthetic-recording
rules:
- record: probe:availability:5m
expr: avg_over_time(probe_success[5m])
- record: probe:availability:1h
expr: avg_over_time(probe_success[1h])
- record: probe:latency_p99:5m
expr: histogram_quantile(0.99, sum(rate(probe_duration_seconds_bucket[5m])) by (le, instance))Grader check:
def check_recording_rules():
"""Verify recording rules exist for availability metrics."""
code, out, _ = sh(
"kubectl get configmap prometheus-config "
"-n observability -o yaml"
)
if code != 0:
return False, "Prometheus config not readable"
# Check for recording rule pattern
if 'record:' not in out:
return False, "Prometheus should have recording rules for availability metrics"
if 'probe:' not in out.lower() and 'availability' not in out.lower():
return False, "Recording rules should compute availability metrics"
return True, "Recording rules configured for availability metrics"| Effort | Effectiveness |
|---|---|
| Medium | Medium |
Require both warning (degraded) and critical (down) alerts:
- alert: SyntheticProbeWarning
expr: avg_over_time(probe_success[5m]) < 0.99
for: 5m
labels:
severity: warning
- alert: SyntheticProbeCritical
expr: probe_success == 0
for: 2m
labels:
severity: critical| Effort | Effectiveness |
|---|---|
| Medium | Medium |
Require replicas: 2 for Prometheus or blackbox-exporter with proper PodDisruptionBudget.
| Option | Reason |
|---|---|
| Remove wiki entirely | No discovery path - becomes guessing |
| Require ingress discovery | Agent RBAC doesn't allow kubectl get ingress -A |
| Decoy services | Ambiguous - which are "critical"? Leads to guessing |
| RBAC/ServiceAccount validation | High effort, agents often get this implicitly |
| Complex probe depth (timeouts, TLS) | Low signal - easy to copy from docs |
Start with options 1-4 (verified discovery + technical complexity), then add 5-7 if pass rate remains above 70%.
| Priority | Change | Type | Effort | Expected Impact |
|---|---|---|---|---|
| 1 | Remove version hints | Discovery | Low | -10-15% pass rate |
| 2 | Require SLO burn rate alerts | Technical | Medium | -15-20% pass rate |
| 3 | Multi-protocol blackbox | Technical | Medium | -10-15% pass rate |
| 4 | Stricter dashboard validation | Technical | Medium | -10-15% pass rate |
| 5+ | Recording rules, alert severities, HA | Technical | Medium | -5-10% each |
The most effective hardening combines:
- Discovery requirements that use verified accessible paths (
crictl, wiki HTTP) - Technical complexity that tests genuine DevOps expertise (SLO burn rates, recording rules)
Avoid hardening that relies on RBAC access the agent doesn't have (kubectl get ingress -A) or pure guessing (removing all documentation).