# Gist by @Nottlespike, created December 31, 2025 01:24
# P100 Extended Implementation Tasks - 100+ tasks for full agent utilization
# Tesla P100 (GP100) - 56 SMs, 3584 CUDA cores, 16GB HBM2 @ 732 GB/s
tasks:
# ============================================
# CUDA Kernels (P0 - Critical) - 20 tasks
# ============================================
- name: kernel-vecadd-sm60
  prompt: |
    Create contrib/p100/kernels/vecadd_sm60.cu
    Reference: contrib/p40/kernels/vecadd.cu
    Implement vector addition kernel optimized for sm_60:
    - Use __ldg() for read-only data
    - Optimize for 732 GB/s HBM2 bandwidth
    - Include FP32 and FP16 variants
    - Add warp-level primitives for reduction
  priority: P0
  dependencies: []
- name: kernel-matmul-sm60
  prompt: |
    Create contrib/p100/kernels/matmul_sm60.cu
    Implement tiled matrix multiplication for P100:
    - Shared memory tiling for the 64 KB of shared memory per SM (at most 48 KB per thread block)
    - Register blocking for high arithmetic intensity
    - Support for transposed inputs
    - FP32/FP16 variants with mixed precision accumulation
  priority: P0
  dependencies: []
- name: kernel-softmax-sm60
  prompt: |
    Create contrib/p100/kernels/softmax_sm60.cu
    Implement numerically stable softmax for P100:
    - Online softmax algorithm (running max and sum computed in a single pass)
    - Warp-level reductions using __shfl_down_sync
    - Support for variable sequence lengths
    - Fused with attention score scaling
  priority: P0
  dependencies: []
- name: kernel-gelu-sm60
  prompt: |
    Create contrib/p100/kernels/gelu_sm60.cu
    Implement GELU activation for P100:
    - Exact GELU using erf()
    - Fast GELU approximation (tanh-based)
    - Fused with bias addition
    - In-place variant for memory efficiency
  priority: P0
  dependencies: []
- name: kernel-layernorm-sm60
  prompt: |
    Create contrib/p100/kernels/layernorm_sm60.cu
    Implement fused LayerNorm for P100:
    - Single-pass mean and variance
    - Warp-level reductions
    - Fused with residual addition
    - Support for RMSNorm variant
  priority: P0
  dependencies: []
- name: kernel-attention-sm60
  prompt: |
    Create contrib/p100/kernels/attention_sm60.cu
    Implement scaled dot-product attention for P100:
    - Q×K^T computation with scaling
    - Softmax in registers where possible
    - V projection fused
    - Support for causal masking
  priority: P0
  dependencies: []
- name: kernel-flash-attn-sm60
  prompt: |
    Create contrib/p100/kernels/flash_attention_sm60.cu
    Implement Flash Attention algorithm for P100:
    - Tiled Q, K, V processing
    - Online softmax with rescaling
    - Maximize HBM2 bandwidth utilization
    - Support for multi-head attention
  priority: P0
  dependencies: []
- name: kernel-rope-sm60
  prompt: |
    Create contrib/p100/kernels/rope_sm60.cu
    Implement Rotary Position Embeddings for P100:
    - Fused sin/cos computation
    - In-place rotation
    - Support for variable positions
    - Batch processing for efficiency
  priority: P0
  dependencies: []
- name: kernel-silu-sm60
  prompt: |
    Create contrib/p100/kernels/silu_sm60.cu
    Implement SiLU/Swish activation for P100:
    - Fused sigmoid and multiply
    - Support for gated variants (SiLU-Gate)
    - In-place computation
    - FP16 support
  priority: P0
  dependencies: []
- name: kernel-embedding-sm60
  prompt: |
    Create contrib/p100/kernels/embedding_sm60.cu
    Implement embedding lookup for P100:
    - Coalesced memory access
    - Support for large vocabularies
    - Position embedding addition
    - Padding handling
  priority: P0
  dependencies: []
- name: kernel-reduce-sm60
  prompt: |
    Create contrib/p100/kernels/reduce_sm60.cu
    Implement general reduction kernels for P100:
    - Sum, mean, max, min operations
    - Multi-stage reduction for large tensors
    - Warp-level and block-level variants
    - Support for different data types
  priority: P0
  dependencies: []
- name: kernel-transpose-sm60
  prompt: |
    Create contrib/p100/kernels/transpose_sm60.cu
    Implement optimized transpose for P100:
    - Shared memory transpose to avoid bank conflicts
    - Batched transpose for 3D/4D tensors
    - Support for non-contiguous strides
    - Fused with permute operations
  priority: P0
  dependencies: []
- name: kernel-concat-sm60
  prompt: |
    Create contrib/p100/kernels/concat_sm60.cu
    Implement tensor concatenation for P100:
    - Along any dimension
    - Memory-efficient for large tensors
    - Support for variable number of inputs
    - Fused with split operations
  priority: P0
  dependencies: []
- name: kernel-scatter-sm60
  prompt: |
    Create contrib/p100/kernels/scatter_gather_sm60.cu
    Implement scatter/gather operations for P100:
    - Index-based scatter/gather
    - Atomic operations for overlapping indices
    - Support for multi-dimensional indexing
    - Optimized for sparse patterns
  priority: P0
  dependencies: []
- name: kernel-conv1d-sm60
  prompt: |
    Create contrib/p100/kernels/conv1d_sm60.cu
    Implement 1D convolution for P100:
    - Direct convolution for small kernels
    - FFT-based for large kernels
    - Causal padding support
    - Grouped convolution
  priority: P1
  dependencies: []
- name: kernel-topk-sm60
  prompt: |
    Create contrib/p100/kernels/topk_sm60.cu
    Implement top-k selection for P100:
    - Radix-based selection
    - Support for large k values
    - Sorted and unsorted output
    - Index tracking
  priority: P1
  dependencies: []
- name: kernel-dropout-sm60
  prompt: |
    Create contrib/p100/kernels/dropout_sm60.cu
    Implement dropout for P100:
    - PHILOX random number generator
    - Deterministic with seeds
    - Fused with scaling
    - Inference mode (pass-through)
  priority: P1
  dependencies: []
- name: kernel-cross-entropy-sm60
  prompt: |
    Create contrib/p100/kernels/cross_entropy_sm60.cu
    Implement cross-entropy loss for P100:
    - Numerically stable log-softmax
    - Label smoothing support
    - Ignore index handling
    - Gradient computation
  priority: P1
  dependencies: []
- name: kernel-adamw-sm60
  prompt: |
    Create contrib/p100/kernels/adamw_sm60.cu
    Implement AdamW optimizer step for P100:
    - Fused parameter update
    - Weight decay handling
    - FP32 master weights with FP16 params
    - Gradient clipping option
  priority: P1
  dependencies: []
- name: kernel-rmsprop-sm60
  prompt: |
    Create contrib/p100/kernels/rmsprop_sm60.cu
    Implement RMSprop optimizer for P100:
    - Running average of squared gradients
    - Momentum variant
    - Epsilon for numerical stability
    - Centered variant
  priority: P1
  dependencies: []
# ============================================
# Memory Management (P1 - High) - 15 tasks
# ============================================
- name: mem-pool-allocator
  prompt: |
    Create contrib/p100/worker/p100_pool_allocator.py
    Implement memory pool allocator for P100 HBM2:
    - Power-of-2 size classes
    - Thread-safe allocation
    - Memory coalescing
    - Fragmentation tracking
  priority: P1
  dependencies: []
- name: mem-async-transfer
  prompt: |
    Create contrib/p100/worker/p100_async_transfer.py
    Implement async memory transfers:
    - Pinned host memory
    - Bidirectional DMA
    - Stream-ordered transfers
    - Overlap with compute
  priority: P1
  dependencies: []
- name: mem-tensor-cache
  prompt: |
    Create contrib/p100/worker/p100_tensor_cache.py
    Implement tensor caching for P100:
    - LRU eviction policy
    - Reference counting
    - Cache coherency with host
    - Size-based eviction
  priority: P1
  dependencies: []
- name: mem-kv-cache-manager
  prompt: |
    Create contrib/p100/worker/p100_kv_cache.py
    Implement KV cache for transformer inference:
    - Paged attention support
    - Dynamic cache growth
    - Multi-sequence batching
    - Memory-efficient layout
  priority: P1
  dependencies: []
- name: mem-gradient-checkpointing
  prompt: |
    Create contrib/p100/worker/p100_checkpoint.py
    Implement gradient checkpointing:
    - Selective recomputation
    - Segment boundaries
    - Memory-compute tradeoff
    - Integration with autograd
  priority: P1
  dependencies: []
- name: mem-zero-copy
  prompt: |
    Create contrib/p100/worker/p100_zero_copy.py
    Implement zero-copy memory for P100:
    - Direct GPU access to host memory
    - Memory mapping
    - Access pattern optimization
    - Coherency management
  priority: P1
  dependencies: []
- name: mem-prefetch
  prompt: |
    Create contrib/p100/worker/p100_prefetch.py
    Implement memory prefetching:
    - Hardware prefetch hints
    - Software-managed prefetch
    - Prefetch scheduling
    - Bandwidth management
  priority: P1
  dependencies: []
- name: mem-compaction
  prompt: |
    Create contrib/p100/worker/p100_compaction.py
    Implement memory compaction:
    - Defragmentation algorithm
    - Background compaction
    - Minimal disruption
    - Statistics tracking
  priority: P2
  dependencies: []
- name: mem-oversubscription
  prompt: |
    Create contrib/p100/worker/p100_oversubscription.py
    Implement memory oversubscription:
    - Page eviction to host
    - Working set tracking
    - Demand paging
    - Access pattern learning
  priority: P2
  dependencies: []
- name: mem-shared-pool
  prompt: |
    Create contrib/p100/worker/p100_shared_pool.py
    Implement shared memory pool across processes:
    - IPC memory sharing
    - Reference counting
    - Cleanup on process exit
    - Security boundaries
  priority: P2
  dependencies: []
- name: mem-nvlink-p2p
  prompt: |
    Create contrib/p100/worker/p100_nvlink.py
    Implement NVLink P2P transfers (if available):
    - Direct GPU-to-GPU transfer
    - Topology detection
    - Optimal routing
    - Fallback to PCIe
  priority: P2
  dependencies: []
- name: mem-staging-buffer
  prompt: |
    Create contrib/p100/worker/p100_staging.py
    Implement staging buffers:
    - Double buffering
    - Ring buffer for streams
    - Size optimization
    - Lifetime management
  priority: P2
  dependencies: []
- name: mem-hbm2-optimizer
  prompt: |
    Create contrib/p100/worker/p100_hbm2_optimizer.py
    Implement HBM2-specific optimizations:
    - Stack interleaving patterns
    - Bank conflict avoidance
    - Access coalescing
    - Pseudo-channel awareness
  priority: P1
  dependencies: []
- name: mem-ecc-manager
  prompt: |
    Create contrib/p100/worker/p100_ecc.py
    Implement ECC memory management:
    - ECC status monitoring
    - Error counters
    - Page retirement
    - Health reporting
  priority: P2
  dependencies: []
- name: mem-numa-aware
  prompt: |
    Create contrib/p100/worker/p100_numa.py
    Implement NUMA-aware memory allocation:
    - CPU socket detection
    - Optimal placement
    - Migration support
    - Affinity management
  priority: P2
  dependencies: []
# ============================================
# Neural Network Layers (P1 - High) - 20 tasks
# ============================================
- name: nn-linear
  prompt: |
    Create contrib/p100/worker/layers/p100_linear.py
    Implement Linear layer for P100:
    - Optimized GEMM wrapper
    - Bias fusion
    - Mixed precision support
    - Batch processing
  priority: P1
  dependencies: []
- name: nn-embedding
  prompt: |
    Create contrib/p100/worker/layers/p100_embedding.py
    Implement Embedding layer:
    - Lookup kernel wrapper
    - Gradient accumulation
    - Padding index handling
    - Sparse gradients
  priority: P1
  dependencies: []
- name: nn-multihead-attention
  prompt: |
    Create contrib/p100/worker/layers/p100_mha.py
    Implement Multi-Head Attention:
    - QKV projection fused
    - Attention kernel dispatch
    - Output projection
    - KV caching integration
  priority: P1
  dependencies: []
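# Sketch only, not part of the original task list. The empty `dependencies`
# lists keep every task immediately schedulable. If ordering is wanted instead,
# and assuming the task runner resolves `dependencies` entries by task name
# (an assumption; the schema is not stated in this file), nn-multihead-attention
# could be gated on the kernels and cache manager it wraps:
#
#   dependencies:
#     - kernel-attention-sm60
#     - kernel-flash-attn-sm60
#     - mem-kv-cache-manager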
- name: nn-mlp-block
  prompt: |
    Create contrib/p100/worker/layers/p100_mlp.py
    Implement MLP/FFN block:
    - Fused linear + activation
    - Gated variants (SwiGLU, GeGLU)
    - Gradient checkpointing hooks
    - Memory-efficient backward
  priority: P1
  dependencies: []
- name: nn-transformer-block
  prompt: |
    Create contrib/p100/worker/layers/p100_transformer_block.py
    Implement full Transformer block:
    - Pre/Post LayerNorm variants
    - Attention + MLP composition
    - Residual connections
    - Parallel attention+FFN option
  priority: P1
  dependencies: []
- name: nn-conv1d-layer
  prompt: |
    Create contrib/p100/worker/layers/p100_conv1d.py
    Implement Conv1d layer:
    - Kernel dispatch
    - Padding modes
    - Dilation support
    - Groups support
  priority: P1
  dependencies: []
- name: nn-groupnorm
  prompt: |
    Create contrib/p100/worker/layers/p100_groupnorm.py
    Implement GroupNorm layer:
    - Group-wise statistics
    - Affine parameters
    - Memory-efficient backward
    - Instance norm as special case
  priority: P1
  dependencies: []
- name: nn-batchnorm
  prompt: |
    Create contrib/p100/worker/layers/p100_batchnorm.py
    Implement BatchNorm layer:
    - Running statistics
    - Training vs eval modes
    - Sync BN for multi-GPU
    - Momentum for EMA
  priority: P1
  dependencies: []
- name: nn-dropout-layer
  prompt: |
    Create contrib/p100/worker/layers/p100_dropout.py
    Implement Dropout layer:
    - Kernel wrapper
    - Deterministic mode
    - Various dropout patterns
    - Inference passthrough
  priority: P1
  dependencies: []
- name: nn-positional-encoding
  prompt: |
    Create contrib/p100/worker/layers/p100_pos_encoding.py
    Implement positional encodings:
    - Sinusoidal encoding
    - Learned embeddings
    - RoPE integration
    - ALiBi support
  priority: P1
  dependencies: []
- name: nn-softmax-layer
  prompt: |
    Create contrib/p100/worker/layers/p100_softmax.py
    Implement Softmax layer:
    - Stable computation
    - Temperature scaling
    - Log-softmax variant
    - Sparse softmax option
  priority: P1
  dependencies: []
- name: nn-activation-layer
  prompt: |
    Create contrib/p100/worker/layers/p100_activations.py
    Implement activation functions:
    - GELU, SiLU, ReLU, Tanh
    - Fused variants
    - Custom activation support
    - In-place operations
  priority: P1
  dependencies: []
- name: nn-cross-entropy-layer
  prompt: |
    Create contrib/p100/worker/layers/p100_loss.py
    Implement loss functions:
    - CrossEntropyLoss
    - Label smoothing
    - Focal loss variant
    - Gradient computation
  priority: P1
  dependencies: []
- name: nn-optimizer-wrapper
  prompt: |
    Create contrib/p100/worker/layers/p100_optimizers.py
    Implement optimizer wrappers:
    - AdamW, SGD, RMSprop
    - Learning rate scheduling
    - Gradient clipping
    - Parameter groups
  priority: P1
  dependencies: []
- name: nn-lm-head
  prompt: |
    Create contrib/p100/worker/layers/p100_lm_head.py
    Implement language model head:
    - Tied embeddings
    - Efficient logit computation
    - Temperature sampling
    - Top-k/p filtering
  priority: P1
  dependencies: []
- name: nn-attention-mask
  prompt: |
    Create contrib/p100/worker/layers/p100_attention_mask.py
    Implement attention mask generation:
    - Causal masks
    - Padding masks
    - Sliding window
    - Custom patterns
  priority: P1
  dependencies: []
- name: nn-weight-init
  prompt: |
    Create contrib/p100/worker/layers/p100_init.py
    Implement weight initialization:
    - Xavier/Glorot
    - Kaiming/He
    - Orthogonal
    - Custom init patterns
  priority: P2
  dependencies: []
- name: nn-gradient-scale
  prompt: |
    Create contrib/p100/worker/layers/p100_grad_scale.py
    Implement gradient scaling:
    - Loss scaling for FP16
    - Dynamic scaling
    - Overflow detection
    - Gradient accumulation
  priority: P1
  dependencies: []
- name: nn-model-parallel
  prompt: |
    Create contrib/p100/worker/layers/p100_model_parallel.py
    Implement tensor parallelism:
    - Column parallel
    - Row parallel
    - Sequence parallel
    - All-reduce communication
  priority: P2
  dependencies: []
- name: nn-pipeline-parallel
  prompt: |
    Create contrib/p100/worker/layers/p100_pipeline.py
    Implement pipeline parallelism:
    - Stage boundaries
    - Micro-batching
    - 1F1B scheduling
    - Memory optimization
  priority: P2
  dependencies: []
# ============================================
# Benchmarks & Tests (P2 - Medium) - 20 tasks
# ============================================
- name: bench-gemm
  prompt: |
    Create contrib/p100/benchmarks/bench_gemm.py
    GEMM benchmark suite:
    - Various matrix sizes
    - FP16/FP32 comparison
    - Batched GEMM
    - Peak TFLOPS measurement
  priority: P2
  dependencies: []
- name: bench-memory-bandwidth
  prompt: |
    Create contrib/p100/benchmarks/bench_bandwidth.py
    Memory bandwidth benchmark:
    - HBM2 read/write bandwidth
    - Host-to-device transfers
    - Device-to-device if multi-GPU
    - Sustained vs peak bandwidth
  priority: P2
  dependencies: []
- name: bench-attention
  prompt: |
    Create contrib/p100/benchmarks/bench_attention.py
    Attention benchmark:
    - Standard vs Flash attention
    - Various sequence lengths
    - Multi-head configurations
    - Memory usage tracking
  priority: P2
  dependencies: []
- name: bench-transformer
  prompt: |
    Create contrib/p100/benchmarks/bench_transformer.py
    Transformer layer benchmark:
    - Forward pass timing
    - Backward pass timing
    - Memory footprint
    - Batch size scaling
  priority: P2
  dependencies: []
- name: bench-end-to-end
  prompt: |
    Create contrib/p100/benchmarks/bench_e2e.py
    End-to-end inference benchmark:
    - Tokens per second
    - Time to first token
    - Memory efficiency
    - Different model sizes
  priority: P2
  dependencies: []
- name: bench-kernel-launch
  prompt: |
    Create contrib/p100/benchmarks/bench_launch.py
    Kernel launch overhead benchmark:
    - Empty kernel timing
    - Grid size impact
    - Stream switching cost
    - Async launch efficiency
  priority: P2
  dependencies: []
- name: bench-allreduce
  prompt: |
    Create contrib/p100/benchmarks/bench_collective.py
    Collective communication benchmark:
    - All-reduce timing
    - All-gather timing
    - Ring vs tree algorithms
    - Message size scaling
  priority: P2
  dependencies: []
- name: test-kernel-vecadd
  prompt: |
    Create contrib/p100/tests/test_kernels_vecadd.py
    Vector addition kernel tests:
    - Correctness verification
    - Edge cases (empty, large)
    - Different data types
    - Random input fuzzing
  priority: P2
  dependencies: []
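# Sketch only, not part of the original task list. Assuming `dependencies`
# accepts task names (the schema is not stated in this file), a test task can
# be held back until the kernel it exercises exists. For test-kernel-vecadd
# that would read:
#
#   dependencies: [kernel-vecadd-sm60]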
- name: test-kernel-matmul
  prompt: |
    Create contrib/p100/tests/test_kernels_matmul.py
    Matrix multiplication tests:
    - Compare with CPU reference
    - Transposed variants
    - Non-square matrices
    - Numerical accuracy
  priority: P2
  dependencies: []
- name: test-kernel-attention
  prompt: |
    Create contrib/p100/tests/test_kernels_attention.py
    Attention kernel tests:
    - Softmax stability
    - Causal masking
    - Multi-head correctness
    - Gradient verification
  priority: P2
  dependencies: []
- name: test-memory-alloc
  prompt: |
    Create contrib/p100/tests/test_memory.py
    Memory allocation tests:
    - Alloc/free cycles
    - Fragmentation behavior
    - OOM handling
    - Pool efficiency
  priority: P2
  dependencies: []
- name: test-layers
  prompt: |
    Create contrib/p100/tests/test_layers.py
    Layer implementation tests:
    - Forward correctness
    - Backward correctness
    - Parameter initialization
    - State serialization
  priority: P2
  dependencies: []
- name: test-transformer
  prompt: |
    Create contrib/p100/tests/test_transformer.py
    Transformer integration tests:
    - Block stacking
    - KV cache consistency
    - Sequence length handling
    - Batch processing
  priority: P2
  dependencies: []
- name: test-precision
  prompt: |
    Create contrib/p100/tests/test_precision.py
    Numerical precision tests:
    - FP16 overflow detection
    - Accuracy vs FP32
    - Loss scaling verification
    - Gradient magnitudes
  priority: P2
  dependencies: []
- name: test-streams
  prompt: |
    Create contrib/p100/tests/test_streams.py
    CUDA stream tests:
    - Stream creation/deletion
    - Synchronization
    - Event timing
    - Multi-stream execution
  priority: P2
  dependencies: []
- name: test-error-handling
  prompt: |
    Create contrib/p100/tests/test_errors.py
    Error handling tests:
    - CUDA error recovery
    - OOM behavior
    - Invalid input handling
    - Timeout detection
  priority: P2
  dependencies: []
- name: test-regression
  prompt: |
    Create contrib/p100/tests/test_regression.py
    Regression test suite:
    - Known bug reproductions
    - Performance regression
    - Memory leak detection
    - API compatibility
  priority: P2
  dependencies: []
- name: test-stress
  prompt: |
    Create contrib/p100/tests/test_stress.py
    Stress testing:
    - Long-running workloads
    - Memory pressure
    - Concurrent operations
    - Resource exhaustion
  priority: P2
  dependencies: []
- name: test-compatibility
  prompt: |
    Create contrib/p100/tests/test_compat.py
    Compatibility tests:
    - P40 code compatibility
    - API consistency
    - Behavior parity
    - Migration validation
  priority: P2
  dependencies: []
- name: test-integration
  prompt: |
    Create contrib/p100/tests/test_integration.py
    Full integration tests:
    - Model loading
    - Inference pipeline
    - Training loop
    - Checkpoint save/load
  priority: P2
  dependencies: []
# ============================================
# Documentation (P2 - Medium) - 15 tasks
# ============================================
- name: docs-architecture
  prompt: |
    Create contrib/p100/docs/ARCHITECTURE.md
    P100 architecture documentation:
    - GP100 die layout
    - SM organization
    - Memory hierarchy
    - Compute capabilities
  priority: P2
  dependencies: []
- name: docs-kernels
  prompt: |
    Create contrib/p100/docs/KERNELS.md
    Kernel development guide:
    - sm_60 specifics
    - Register usage
    - Shared memory
    - Optimization tips
  priority: P2
  dependencies: []
- name: docs-hbm2
  prompt: |
    Create contrib/p100/docs/HBM2_GUIDE.md
    HBM2 memory guide:
    - Stack architecture
    - Bandwidth optimization
    - Access patterns
    - Bank conflicts
  priority: P2
  dependencies: []
- name: docs-api-reference
  prompt: |
    Create contrib/p100/docs/API_REFERENCE.md
    API reference documentation:
    - All public classes
    - All public functions
    - Parameter descriptions
    - Usage examples
  priority: P2
  dependencies: []
- name: docs-getting-started
  prompt: |
    Create contrib/p100/docs/GETTING_STARTED.md
    Getting started guide:
    - Prerequisites
    - Installation
    - First example
    - Common pitfalls
  priority: P2
  dependencies: []
- name: docs-migration
  prompt: |
    Create contrib/p100/docs/MIGRATION_FROM_P40.md
    P40 to P100 migration guide:
    - API differences
    - Performance changes
    - Memory considerations
    - Code examples
  priority: P2
  dependencies: []
- name: docs-performance
  prompt: |
    Create contrib/p100/docs/PERFORMANCE_TUNING.md
    Performance tuning guide:
    - Profiling tools
    - Bottleneck identification
    - Optimization techniques
    - Benchmark interpretation
  priority: P2
  dependencies: []
- name: docs-troubleshooting
  prompt: |
    Create contrib/p100/docs/TROUBLESHOOTING.md
    Troubleshooting guide:
    - Common errors
    - Debug techniques
    - Log interpretation
    - Recovery procedures
  priority: P2
  dependencies: []
- name: docs-examples-inference
  prompt: |
    Create contrib/p100/examples/inference_example.py
    Inference example:
    - Model loading
    - Input preprocessing
    - Generation loop
    - Output postprocessing
  priority: P2
  dependencies: []
- name: docs-examples-benchmark
  prompt: |
    Create contrib/p100/examples/benchmark_example.py
    Benchmark example:
    - Setup and teardown
    - Timing methodology
    - Results reporting
    - Comparison scripts
  priority: P2
  dependencies: []
- name: docs-examples-memory
  prompt: |
    Create contrib/p100/examples/memory_example.py
    Memory management example:
    - Allocation patterns
    - Pool usage
    - Transfer optimization
    - Monitoring
  priority: P2
  dependencies: []
- name: docs-examples-multistream
  prompt: |
    Create contrib/p100/examples/multistream_example.py
    Multi-stream example:
    - Stream creation
    - Work distribution
    - Synchronization
    - Performance benefits
  priority: P2
  dependencies: []
- name: docs-changelog
  prompt: |
    Create contrib/p100/CHANGELOG.md
    Changelog documentation:
    - Version history
    - Breaking changes
    - New features
    - Bug fixes
  priority: P2
  dependencies: []
- name: docs-contributing
  prompt: |
    Create contrib/p100/CONTRIBUTING.md
    Contributing guide:
    - Code style
    - Test requirements
    - PR process
    - Review guidelines
  priority: P2
  dependencies: []
- name: docs-faq
  prompt: |
    Create contrib/p100/docs/FAQ.md
    Frequently asked questions:
    - Common questions
    - Best practices
    - Limitations
    - Future plans
  priority: P2
  dependencies: []
# ============================================
# Utilities & Tools (P2 - Medium) - 10 tasks
# ============================================
- name: tool-profiler-analysis
  prompt: |
    Create contrib/p100/tools/profiler_analyzer.py
    Profiler output analyzer:
    - Parse profiler data
    - Generate reports
    - Identify hotspots
    - Visualization
  priority: P2
  dependencies: []
- name: tool-memory-visualizer
  prompt: |
    Create contrib/p100/tools/memory_visualizer.py
    Memory usage visualizer:
    - Timeline view
    - Allocation tracking
    - Fragmentation display
    - Export to HTML
  priority: P2
  dependencies: []
- name: tool-kernel-comparator
  prompt: |
    Create contrib/p100/tools/kernel_compare.py
    Kernel performance comparator:
    - Side-by-side comparison
    - Statistical analysis
    - Regression detection
    - Report generation
  priority: P2
  dependencies: []
- name: tool-model-analyzer
  prompt: |
    Create contrib/p100/tools/model_analyzer.py
    Model analysis tool:
    - Layer breakdown
    - Parameter count
    - FLOP estimation
    - Memory requirements
  priority: P2
  dependencies: []
- name: tool-config-validator
  prompt: |
    Create contrib/p100/tools/config_validator.py
    Configuration validator:
    - Schema validation
    - Constraint checking
    - Compatibility verification
    - Suggestions for improvement
  priority: P2
  dependencies: []
- name: tool-debug-helper
  prompt: |
    Create contrib/p100/tools/debug_helper.py
    Debug helper utilities:
    - Memory dump
    - State inspection
    - Breakpoint helpers
    - Trace logging
  priority: P2
  dependencies: []
- name: tool-benchmark-runner
  prompt: |
    Create contrib/p100/tools/benchmark_runner.py
    Benchmark automation:
    - Configuration loading
    - Sequential execution
    - Results aggregation
    - Comparison with baselines
  priority: P2
  dependencies: []
- name: tool-test-runner
  prompt: |
    Create contrib/p100/tools/test_runner.py
    Test automation:
    - Test discovery
    - Parallel execution
    - Coverage reporting
    - Failure analysis
  priority: P2
  dependencies: []
- name: tool-code-generator
  prompt: |
    Create contrib/p100/tools/codegen.py
    Code generation utilities:
    - Kernel templates
    - Binding generators
    - Test scaffolding
    - Documentation stubs
  priority: P2
  dependencies: []
- name: tool-health-dashboard
  prompt: |
    Create contrib/p100/tools/health_dashboard.py
    GPU health dashboard:
    - Real-time monitoring
    - Temperature tracking
    - Utilization graphs
    - Alert system
  priority: P2
  dependencies: []
# ============================================
# P100 Facility Scale Tasks - Additional 400 tasks
# Focus: multi-human, multi-node facility operations and optimizations
# ============================================
# Cluster Scheduling & Orchestration - 40 tasks
- name: cluster-scheduling-priority-queues-design
  prompt: |
    Design priority queues for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-priority-queues-implement
  prompt: |
    Implement priority queues for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-priority-queues-test
  prompt: |
    Test priority queues for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-priority-queues-document
  prompt: |
    Document priority queues for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
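# Sketch only, not part of the original task list. Each facility feature is
# split into design / implement / test / document tasks. Assuming the runner
# resolves `dependencies` by task name (the schema is not stated in this file),
# the phases could be chained as below; `prompt` and `priority` are left out of
# the sketch for brevity.
#
#   - name: cluster-scheduling-priority-queues-implement
#     dependencies: [cluster-scheduling-priority-queues-design]
#   - name: cluster-scheduling-priority-queues-test
#     dependencies: [cluster-scheduling-priority-queues-implement]
#   - name: cluster-scheduling-priority-queues-document
#     dependencies: [cluster-scheduling-priority-queues-implement]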
- name: cluster-scheduling-fair-share-scheduling-design
  prompt: |
    Design fair-share scheduling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-fair-share-scheduling-implement
  prompt: |
    Implement fair-share scheduling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-fair-share-scheduling-test
  prompt: |
    Test fair-share scheduling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-fair-share-scheduling-document
  prompt: |
    Document fair-share scheduling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-preemption-policy-design
  prompt: |
    Design preemption policy for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-preemption-policy-implement
  prompt: |
    Implement preemption policy for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-preemption-policy-test
  prompt: |
    Test preemption policy for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-preemption-policy-document
  prompt: |
    Document preemption policy for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-backfilling-design
  prompt: |
    Design backfilling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-backfilling-implement
  prompt: |
    Implement backfilling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-backfilling-test
  prompt: |
    Test backfilling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-backfilling-document
  prompt: |
    Document backfilling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-gang-scheduling-design
  prompt: |
    Design gang scheduling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-gang-scheduling-implement
  prompt: |
    Implement gang scheduling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-gang-scheduling-test
  prompt: |
    Test gang scheduling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-gang-scheduling-document
  prompt: |
    Document gang scheduling for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-reservation-windows-design
  prompt: |
    Design reservation windows for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-reservation-windows-implement
  prompt: |
    Implement reservation windows for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-reservation-windows-test
  prompt: |
    Test reservation windows for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-reservation-windows-document
  prompt: |
    Document reservation windows for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-node-labeling-and-constraints-design
  prompt: |
    Design node labeling and constraints for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-node-labeling-and-constraints-implement
  prompt: |
    Implement node labeling and constraints for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-node-labeling-and-constraints-test
  prompt: |
    Test node labeling and constraints for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-node-labeling-and-constraints-document
  prompt: |
    Document node labeling and constraints for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-gpu-health-aware-placement-design
  prompt: |
    Design GPU health-aware placement for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-gpu-health-aware-placement-implement
  prompt: |
    Implement GPU health-aware placement for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P1
  dependencies: []
- name: cluster-scheduling-gpu-health-aware-placement-test
  prompt: |
    Test GPU health-aware placement for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-gpu-health-aware-placement-document
  prompt: |
    Document GPU health-aware placement for a multi-node P100 facility.
    Requirements:
    - Multi-tenant safe defaults
    - Clear operational runbook steps
    - Metrics for success and rollback
  priority: P2
  dependencies: []
- name: cluster-scheduling-data-locality-placement-design
prompt: |
Design data locality placement for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: cluster-scheduling-data-locality-placement-implement
prompt: |
Implement data locality placement for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: cluster-scheduling-data-locality-placement-test
prompt: |
Test data locality placement for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: cluster-scheduling-data-locality-placement-document
prompt: |
Document data locality placement for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: cluster-scheduling-mixed-workload-isolation-design
prompt: |
Design mixed workload isolation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: cluster-scheduling-mixed-workload-isolation-implement
prompt: |
Implement mixed workload isolation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: cluster-scheduling-mixed-workload-isolation-test
prompt: |
Test mixed workload isolation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: cluster-scheduling-mixed-workload-isolation-document
prompt: |
Document mixed workload isolation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# GPU/NUMA/Affinity Tuning - 40 tasks
- name: numa-affinity-numa-pinning-design
prompt: |
Design NUMA pinning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-numa-pinning-implement
prompt: |
Implement NUMA pinning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-numa-pinning-test
prompt: |
Test NUMA pinning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-numa-pinning-document
prompt: |
Document NUMA pinning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-cpu-affinity-per-agent-design
prompt: |
Design CPU affinity per agent for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-cpu-affinity-per-agent-implement
prompt: |
Implement CPU affinity per agent for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-cpu-affinity-per-agent-test
prompt: |
Test CPU affinity per agent for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-cpu-affinity-per-agent-document
prompt: |
Document CPU affinity per agent for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-pcie-topology-awareness-design
prompt: |
Design PCIe topology awareness for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-pcie-topology-awareness-implement
prompt: |
Implement PCIe topology awareness for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-pcie-topology-awareness-test
prompt: |
Test PCIe topology awareness for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-pcie-topology-awareness-document
prompt: |
Document PCIe topology awareness for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-gpu-memory-partitioning-design
prompt: |
Design GPU memory partitioning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-gpu-memory-partitioning-implement
prompt: |
Implement GPU memory partitioning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-gpu-memory-partitioning-test
prompt: |
Test GPU memory partitioning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-gpu-memory-partitioning-document
prompt: |
Document GPU memory partitioning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-pcie-peer-access-design
prompt: |
Design PCIe peer access for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-pcie-peer-access-implement
prompt: |
Implement PCIe peer access for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-pcie-peer-access-test
prompt: |
Test PCIe peer access for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-pcie-peer-access-document
prompt: |
Document PCIe peer access for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-hbm2-bandwidth-throttling-design
prompt: |
Design HBM2 bandwidth throttling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-hbm2-bandwidth-throttling-implement
prompt: |
Implement HBM2 bandwidth throttling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-hbm2-bandwidth-throttling-test
prompt: |
Test HBM2 bandwidth throttling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-hbm2-bandwidth-throttling-document
prompt: |
Document HBM2 bandwidth throttling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-power-and-clock-management-design
prompt: |
Design power and clock management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-power-and-clock-management-implement
prompt: |
Implement power and clock management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-power-and-clock-management-test
prompt: |
Test power and clock management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-power-and-clock-management-document
prompt: |
Document power and clock management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-thermal-monitoring-design
prompt: |
Design thermal monitoring for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-thermal-monitoring-implement
prompt: |
Implement thermal monitoring for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-thermal-monitoring-test
prompt: |
Test thermal monitoring for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-thermal-monitoring-document
prompt: |
Document thermal monitoring for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-ecc-error-handling-design
prompt: |
Design ECC error handling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-ecc-error-handling-implement
prompt: |
Implement ECC error handling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-ecc-error-handling-test
prompt: |
Test ECC error handling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-ecc-error-handling-document
prompt: |
Document ECC error handling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-cuda-mps-sharing-design
prompt: |
Design CUDA MPS sharing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-cuda-mps-sharing-implement
prompt: |
Implement CUDA MPS sharing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: numa-affinity-cuda-mps-sharing-test
prompt: |
Test CUDA MPS sharing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: numa-affinity-cuda-mps-sharing-document
prompt: |
Document CUDA MPS sharing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# Data Pipeline & Storage - 40 tasks
- name: data-pipeline-dataset-cache-hierarchy-design
prompt: |
Design dataset cache hierarchy for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-dataset-cache-hierarchy-implement
prompt: |
Implement dataset cache hierarchy for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-dataset-cache-hierarchy-test
prompt: |
Test dataset cache hierarchy for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-dataset-cache-hierarchy-document
prompt: |
Document dataset cache hierarchy for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-object-store-ingest-design
prompt: |
Design object store ingest for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-object-store-ingest-implement
prompt: |
Implement object store ingest for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-object-store-ingest-test
prompt: |
Test object store ingest for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-object-store-ingest-document
prompt: |
Document object store ingest for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-shard-management-design
prompt: |
Design shard management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-shard-management-implement
prompt: |
Implement shard management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-shard-management-test
prompt: |
Test shard management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-shard-management-document
prompt: |
Document shard management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-prefetching-design
prompt: |
Design prefetching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-prefetching-implement
prompt: |
Implement prefetching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-prefetching-test
prompt: |
Test prefetching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-prefetching-document
prompt: |
Document prefetching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-streaming-decompression-design
prompt: |
Design streaming decompression for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-streaming-decompression-implement
prompt: |
Implement streaming decompression for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-streaming-decompression-test
prompt: |
Test streaming decompression for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-streaming-decompression-document
prompt: |
Document streaming decompression for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-format-conversion-design
prompt: |
Design format conversion for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-format-conversion-implement
prompt: |
Implement format conversion for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-format-conversion-test
prompt: |
Test format conversion for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-format-conversion-document
prompt: |
Document format conversion for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-data-integrity-checks-design
prompt: |
Design data integrity checks for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-data-integrity-checks-implement
prompt: |
Implement data integrity checks for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-data-integrity-checks-test
prompt: |
Test data integrity checks for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-data-integrity-checks-document
prompt: |
Document data integrity checks for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-snapshot-versioning-design
prompt: |
Design snapshot versioning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-snapshot-versioning-implement
prompt: |
Implement snapshot versioning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-snapshot-versioning-test
prompt: |
Test snapshot versioning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-snapshot-versioning-document
prompt: |
Document snapshot versioning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-metadata-catalog-design
prompt: |
Design metadata catalog for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-metadata-catalog-implement
prompt: |
Implement metadata catalog for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-metadata-catalog-test
prompt: |
Test metadata catalog for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-metadata-catalog-document
prompt: |
Document metadata catalog for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-cold-hot-tiering-design
prompt: |
Design cold/hot tiering for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-cold-hot-tiering-implement
prompt: |
Implement cold/hot tiering for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: data-pipeline-cold-hot-tiering-test
prompt: |
Test cold/hot tiering for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: data-pipeline-cold-hot-tiering-document
prompt: |
Document cold/hot tiering for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# Network Transport & Messaging - 40 tasks
- name: network-transport-zeromq-routing-scale-design
prompt: |
Design ZeroMQ routing at scale for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-zeromq-routing-scale-implement
prompt: |
Implement ZeroMQ routing at scale for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-zeromq-routing-scale-test
prompt: |
Test ZeroMQ routing at scale for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-zeromq-routing-scale-document
prompt: |
Document ZeroMQ routing at scale for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-redis-queue-scaling-design
prompt: |
Design Redis queue scaling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-redis-queue-scaling-implement
prompt: |
Implement Redis queue scaling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-redis-queue-scaling-test
prompt: |
Test Redis queue scaling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-redis-queue-scaling-document
prompt: |
Document Redis queue scaling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-message-batching-design
prompt: |
Design message batching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-message-batching-implement
prompt: |
Implement message batching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-message-batching-test
prompt: |
Test message batching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-message-batching-document
prompt: |
Document message batching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-backpressure-control-design
prompt: |
Design backpressure control for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-backpressure-control-implement
prompt: |
Implement backpressure control for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-backpressure-control-test
prompt: |
Test backpressure control for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-backpressure-control-document
prompt: |
Document backpressure control for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-transport-compression-design
prompt: |
Design transport compression for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-transport-compression-implement
prompt: |
Implement transport compression for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-transport-compression-test
prompt: |
Test transport compression for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-transport-compression-document
prompt: |
Document transport compression for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-tls-for-control-plane-design
prompt: |
Design control-plane TLS for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-tls-for-control-plane-implement
prompt: |
Implement control-plane TLS for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-tls-for-control-plane-test
prompt: |
Test control-plane TLS for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-tls-for-control-plane-document
prompt: |
Document control-plane TLS for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-heartbeats-and-timeouts-design
prompt: |
Design heartbeats and timeouts for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-heartbeats-and-timeouts-implement
prompt: |
Implement heartbeats and timeouts for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-heartbeats-and-timeouts-test
prompt: |
Test heartbeats and timeouts for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-heartbeats-and-timeouts-document
prompt: |
Document heartbeats and timeouts for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-node-discovery-design
prompt: |
Design node discovery for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-node-discovery-implement
prompt: |
Implement node discovery for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-node-discovery-test
prompt: |
Test node discovery for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-node-discovery-document
prompt: |
Document node discovery for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-bandwidth-shaping-design
prompt: |
Design bandwidth shaping for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-bandwidth-shaping-implement
prompt: |
Implement bandwidth shaping for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-bandwidth-shaping-test
prompt: |
Test bandwidth shaping for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-bandwidth-shaping-document
prompt: |
Document bandwidth shaping for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-cross-datacenter-links-design
prompt: |
Design cross-datacenter links for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-cross-datacenter-links-implement
prompt: |
Implement cross-datacenter links for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: network-transport-cross-datacenter-links-test
prompt: |
Test cross-datacenter links for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: network-transport-cross-datacenter-links-document
prompt: |
Document cross-datacenter links for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# ============================================
# Observability & Dashboards - 40 tasks
# ============================================
- name: observability-per-agent-metrics-design
prompt: |
Design per-agent metrics for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-per-agent-metrics-implement
prompt: |
Implement per-agent metrics for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-per-agent-metrics-test
prompt: |
Test per-agent metrics for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-per-agent-metrics-document
prompt: |
Document per-agent metrics for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-gpu-metrics-export-design
prompt: |
Design GPU metrics export for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-gpu-metrics-export-implement
prompt: |
Implement GPU metrics export for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-gpu-metrics-export-test
prompt: |
Test GPU metrics export for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-gpu-metrics-export-document
prompt: |
Document GPU metrics export for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-queue-depth-metrics-design
prompt: |
Design queue depth metrics for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-queue-depth-metrics-implement
prompt: |
Implement queue depth metrics for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-queue-depth-metrics-test
prompt: |
Test queue depth metrics for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-queue-depth-metrics-document
prompt: |
Document queue depth metrics for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-task-latency-tracing-design
prompt: |
Design task latency tracing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-task-latency-tracing-implement
prompt: |
Implement task latency tracing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-task-latency-tracing-test
prompt: |
Test task latency tracing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-task-latency-tracing-document
prompt: |
Document task latency tracing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-log-aggregation-design
prompt: |
Design log aggregation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-log-aggregation-implement
prompt: |
Implement log aggregation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-log-aggregation-test
prompt: |
Test log aggregation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-log-aggregation-document
prompt: |
Document log aggregation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-anomaly-detection-design
prompt: |
Design anomaly detection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-anomaly-detection-implement
prompt: |
Implement anomaly detection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-anomaly-detection-test
prompt: |
Test anomaly detection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-anomaly-detection-document
prompt: |
Document anomaly detection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-sla-dashboards-design
prompt: |
Design SLA dashboards for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-sla-dashboards-implement
prompt: |
Implement SLA dashboards for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-sla-dashboards-test
prompt: |
Test SLA dashboards for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-sla-dashboards-document
prompt: |
Document SLA dashboards for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-capacity-planning-reports-design
prompt: |
Design capacity planning reports for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-capacity-planning-reports-implement
prompt: |
Implement capacity planning reports for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-capacity-planning-reports-test
prompt: |
Test capacity planning reports for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-capacity-planning-reports-document
prompt: |
Document capacity planning reports for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-alerting-rules-design
prompt: |
Design alerting rules for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-alerting-rules-implement
prompt: |
Implement alerting rules for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-alerting-rules-test
prompt: |
Test alerting rules for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-alerting-rules-document
prompt: |
Document alerting rules for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-cost-accounting-design
prompt: |
Design cost accounting for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-cost-accounting-implement
prompt: |
Implement cost accounting for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: observability-cost-accounting-test
prompt: |
Test cost accounting for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: observability-cost-accounting-document
prompt: |
Document cost accounting for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# ============================================
# Reliability & Fault Tolerance - 40 tasks
# ============================================
- name: reliability-agent-crash-recovery-design
prompt: |
Design agent crash recovery for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-agent-crash-recovery-implement
prompt: |
Implement agent crash recovery for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-agent-crash-recovery-test
prompt: |
Test agent crash recovery for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-agent-crash-recovery-document
prompt: |
Document agent crash recovery for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-coordinator-failover-design
prompt: |
Design coordinator failover for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-coordinator-failover-implement
prompt: |
Implement coordinator failover for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-coordinator-failover-test
prompt: |
Test coordinator failover for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-coordinator-failover-document
prompt: |
Document coordinator failover for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-task-retry-policies-design
prompt: |
Design task retry policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-task-retry-policies-implement
prompt: |
Implement task retry policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-task-retry-policies-test
prompt: |
Test task retry policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-task-retry-policies-document
prompt: |
Document task retry policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-idempotency-keys-design
prompt: |
Design idempotency keys for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-idempotency-keys-implement
prompt: |
Implement idempotency keys for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-idempotency-keys-test
prompt: |
Test idempotency keys for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-idempotency-keys-document
prompt: |
Document idempotency keys for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-checkpointing-tasks-design
prompt: |
Design task checkpointing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-checkpointing-tasks-implement
prompt: |
Implement task checkpointing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-checkpointing-tasks-test
prompt: |
Test task checkpointing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-checkpointing-tasks-document
prompt: |
Document task checkpointing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-rate-limiting-design
prompt: |
Design rate limiting for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-rate-limiting-implement
prompt: |
Implement rate limiting for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-rate-limiting-test
prompt: |
Test rate limiting for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-rate-limiting-document
prompt: |
Document rate limiting for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-circuit-breakers-design
prompt: |
Design circuit breakers for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-circuit-breakers-implement
prompt: |
Implement circuit breakers for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-circuit-breakers-test
prompt: |
Test circuit breakers for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-circuit-breakers-document
prompt: |
Document circuit breakers for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-rolling-restarts-design
prompt: |
Design rolling restarts for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-rolling-restarts-implement
prompt: |
Implement rolling restarts for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-rolling-restarts-test
prompt: |
Test rolling restarts for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-rolling-restarts-document
prompt: |
Document rolling restarts for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-config-hot-reload-design
prompt: |
Design config hot reload for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-config-hot-reload-implement
prompt: |
Implement config hot reload for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-config-hot-reload-test
prompt: |
Test config hot reload for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-config-hot-reload-document
prompt: |
Document config hot reload for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-disaster-recovery-plan-design
prompt: |
Design a disaster recovery plan for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-disaster-recovery-plan-implement
prompt: |
Implement the disaster recovery plan for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: reliability-disaster-recovery-plan-test
prompt: |
Test the disaster recovery plan for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: reliability-disaster-recovery-plan-document
prompt: |
Document the disaster recovery plan for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# ============================================
# Security & Multi-Tenant Controls - 40 tasks
# ============================================
- name: security-tenant-isolation-design
prompt: |
Design tenant isolation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-tenant-isolation-implement
prompt: |
Implement tenant isolation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-tenant-isolation-test
prompt: |
Test tenant isolation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-tenant-isolation-document
prompt: |
Document tenant isolation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-api-authentication-design
prompt: |
Design API authentication for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-api-authentication-implement
prompt: |
Implement API authentication for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-api-authentication-test
prompt: |
Test API authentication for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-api-authentication-document
prompt: |
Document API authentication for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-audit-logging-design
prompt: |
Design audit logging for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-audit-logging-implement
prompt: |
Implement audit logging for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-audit-logging-test
prompt: |
Test audit logging for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-audit-logging-document
prompt: |
Document audit logging for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-secrets-management-design
prompt: |
Design secrets management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-secrets-management-implement
prompt: |
Implement secrets management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-secrets-management-test
prompt: |
Test secrets management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-secrets-management-document
prompt: |
Document secrets management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-network-segmentation-design
prompt: |
Design network segmentation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-network-segmentation-implement
prompt: |
Implement network segmentation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-network-segmentation-test
prompt: |
Test network segmentation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-network-segmentation-document
prompt: |
Document network segmentation for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-least-privilege-design
prompt: |
Design least-privilege access for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-least-privilege-implement
prompt: |
Implement least-privilege access for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-least-privilege-test
prompt: |
Test least-privilege access for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-least-privilege-document
prompt: |
Document least-privilege access for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-usage-quotas-design
prompt: |
Design usage quotas for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-usage-quotas-implement
prompt: |
Implement usage quotas for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-usage-quotas-test
prompt: |
Test usage quotas for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-usage-quotas-document
prompt: |
Document usage quotas for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-data-access-policies-design
prompt: |
Design data access policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-data-access-policies-implement
prompt: |
Implement data access policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-data-access-policies-test
prompt: |
Test data access policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-data-access-policies-document
prompt: |
Document data access policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-per-tenant-encryption-design
prompt: |
Design per-tenant encryption for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-per-tenant-encryption-implement
prompt: |
Implement per-tenant encryption for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-per-tenant-encryption-test
prompt: |
Test per-tenant encryption for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-per-tenant-encryption-document
prompt: |
Document per-tenant encryption for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-vulnerability-scanning-design
prompt: |
Design vulnerability scanning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-vulnerability-scanning-implement
prompt: |
Implement vulnerability scanning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: security-vulnerability-scanning-test
prompt: |
Test vulnerability scanning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: security-vulnerability-scanning-document
prompt: |
Document vulnerability scanning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# Performance Kernels & Memory - 40 tasks
- name: performance-kernel-auto-tuning-design
prompt: |
Design kernel auto-tuning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-kernel-auto-tuning-implement
prompt: |
Implement kernel auto-tuning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-kernel-auto-tuning-test
prompt: |
Test kernel auto-tuning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-kernel-auto-tuning-document
prompt: |
Document kernel auto-tuning for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-mixed-precision-policies-design
prompt: |
Design mixed precision policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-mixed-precision-policies-implement
prompt: |
Implement mixed precision policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-mixed-precision-policies-test
prompt: |
Test mixed precision policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-mixed-precision-policies-document
prompt: |
Document mixed precision policies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-fused-kernels-design
prompt: |
Design fused kernels for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-fused-kernels-implement
prompt: |
Implement fused kernels for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-fused-kernels-test
prompt: |
Test fused kernels for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-fused-kernels-document
prompt: |
Document fused kernels for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-kernel-launch-overhead-design
prompt: |
Design kernel launch overhead reduction for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-kernel-launch-overhead-implement
prompt: |
Implement kernel launch overhead reduction for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-kernel-launch-overhead-test
prompt: |
Test kernel launch overhead reduction for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-kernel-launch-overhead-document
prompt: |
Document kernel launch overhead reduction for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-memory-pool-allocator-design
prompt: |
Design memory pool allocator for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-memory-pool-allocator-implement
prompt: |
Implement memory pool allocator for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-memory-pool-allocator-test
prompt: |
Test memory pool allocator for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-memory-pool-allocator-document
prompt: |
Document memory pool allocator for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-unified-memory-vs-pinned-design
prompt: |
Design the unified memory vs pinned memory policy for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-unified-memory-vs-pinned-implement
prompt: |
Implement the unified memory vs pinned memory policy for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-unified-memory-vs-pinned-test
prompt: |
Test the unified memory vs pinned memory policy for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-unified-memory-vs-pinned-document
prompt: |
Document the unified memory vs pinned memory policy for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-overlap-compute-and-transfer-design
prompt: |
Design compute and transfer overlap for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-overlap-compute-and-transfer-implement
prompt: |
Implement compute and transfer overlap for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-overlap-compute-and-transfer-test
prompt: |
Test compute and transfer overlap for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-overlap-compute-and-transfer-document
prompt: |
Document compute and transfer overlap for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-vectorization-strategies-design
prompt: |
Design vectorization strategies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-vectorization-strategies-implement
prompt: |
Implement vectorization strategies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-vectorization-strategies-test
prompt: |
Test vectorization strategies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-vectorization-strategies-document
prompt: |
Document vectorization strategies for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-kernel-profiling-harness-design
prompt: |
Design kernel profiling harness for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-kernel-profiling-harness-implement
prompt: |
Implement kernel profiling harness for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-kernel-profiling-harness-test
prompt: |
Test kernel profiling harness for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-kernel-profiling-harness-document
prompt: |
Document kernel profiling harness for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-microbenchmark-suite-design
prompt: |
Design microbenchmark suite for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-microbenchmark-suite-implement
prompt: |
Implement microbenchmark suite for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: performance-microbenchmark-suite-test
prompt: |
Test microbenchmark suite for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: performance-microbenchmark-suite-document
prompt: |
Document microbenchmark suite for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# Model Serving & Batching - 40 tasks
- name: serving-dynamic-batching-design
prompt: |
Design dynamic batching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-dynamic-batching-implement
prompt: |
Implement dynamic batching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-dynamic-batching-test
prompt: |
Test dynamic batching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-dynamic-batching-document
prompt: |
Document dynamic batching for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-request-prioritization-design
prompt: |
Design request prioritization for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-request-prioritization-implement
prompt: |
Implement request prioritization for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-request-prioritization-test
prompt: |
Test request prioritization for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-request-prioritization-document
prompt: |
Document request prioritization for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-warmup-caches-design
prompt: |
Design warmup caches for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-warmup-caches-implement
prompt: |
Implement warmup caches for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-warmup-caches-test
prompt: |
Test warmup caches for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-warmup-caches-document
prompt: |
Document warmup caches for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-model-sharding-design
prompt: |
Design model sharding for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-model-sharding-implement
prompt: |
Implement model sharding for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-model-sharding-test
prompt: |
Test model sharding for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-model-sharding-document
prompt: |
Document model sharding for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-multi-model-routing-design
prompt: |
Design multi-model routing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-multi-model-routing-implement
prompt: |
Implement multi-model routing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-multi-model-routing-test
prompt: |
Test multi-model routing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-multi-model-routing-document
prompt: |
Document multi-model routing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-concurrency-limits-design
prompt: |
Design concurrency limits for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-concurrency-limits-implement
prompt: |
Implement concurrency limits for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-concurrency-limits-test
prompt: |
Test concurrency limits for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-concurrency-limits-document
prompt: |
Document concurrency limits for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-kv-cache-management-design
prompt: |
Design KV cache management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-kv-cache-management-implement
prompt: |
Implement KV cache management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-kv-cache-management-test
prompt: |
Test KV cache management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-kv-cache-management-document
prompt: |
Document KV cache management for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-quantization-pipeline-design
prompt: |
Design quantization pipeline for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-quantization-pipeline-implement
prompt: |
Implement quantization pipeline for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-quantization-pipeline-test
prompt: |
Test quantization pipeline for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-quantization-pipeline-document
prompt: |
Document quantization pipeline for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-a-b-deployment-design
prompt: |
Design A/B deployment for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-a-b-deployment-implement
prompt: |
Implement A/B deployment for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-a-b-deployment-test
prompt: |
Test A/B deployment for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-a-b-deployment-document
prompt: |
Document A/B deployment for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-sla-aware-scheduling-design
prompt: |
Design SLA-aware scheduling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-sla-aware-scheduling-implement
prompt: |
Implement SLA-aware scheduling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: serving-sla-aware-scheduling-test
prompt: |
Test SLA-aware scheduling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: serving-sla-aware-scheduling-document
prompt: |
Document SLA-aware scheduling for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
# Testing & Validation - 40 tasks
- name: validation-load-testing-design
prompt: |
Design load testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-load-testing-implement
prompt: |
Implement load testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-load-testing-test
prompt: |
Test the load testing setup for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-load-testing-document
prompt: |
Document load testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-soak-testing-design
prompt: |
Design soak testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-soak-testing-implement
prompt: |
Implement soak testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-soak-testing-test
prompt: |
Test the soak testing setup for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-soak-testing-document
prompt: |
Document soak testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-chaos-kill-agents-design
prompt: |
Design agent-kill chaos testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-chaos-kill-agents-implement
prompt: |
Implement agent-kill chaos testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-chaos-kill-agents-test
prompt: |
Test agent-kill chaos testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-chaos-kill-agents-document
prompt: |
Document agent-kill chaos testing for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-network-partition-tests-design
prompt: |
Design network partition tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-network-partition-tests-implement
prompt: |
Implement network partition tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-network-partition-tests-test
prompt: |
Test the network partition tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-network-partition-tests-document
prompt: |
Document network partition tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-disk-latency-injection-design
prompt: |
Design disk latency injection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-disk-latency-injection-implement
prompt: |
Implement disk latency injection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-disk-latency-injection-test
prompt: |
Test disk latency injection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-disk-latency-injection-document
prompt: |
Document disk latency injection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-gpu-fault-injection-design
prompt: |
Design GPU fault injection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-gpu-fault-injection-implement
prompt: |
Implement GPU fault injection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-gpu-fault-injection-test
prompt: |
Test GPU fault injection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-gpu-fault-injection-document
prompt: |
Document GPU fault injection for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-config-regression-tests-design
prompt: |
Design config regression tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-config-regression-tests-implement
prompt: |
Implement config regression tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-config-regression-tests-test
prompt: |
Test the config regression tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-config-regression-tests-document
prompt: |
Document config regression tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-security-regression-tests-design
prompt: |
Design security regression tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-security-regression-tests-implement
prompt: |
Implement security regression tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-security-regression-tests-test
prompt: |
Test the security regression tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-security-regression-tests-document
prompt: |
Document security regression tests for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-performance-regression-suite-design
prompt: |
Design performance regression suite for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-performance-regression-suite-implement
prompt: |
Implement performance regression suite for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-performance-regression-suite-test
prompt: |
Test performance regression suite for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-performance-regression-suite-document
prompt: |
Document performance regression suite for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-reproducibility-checks-design
prompt: |
Design reproducibility checks for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-reproducibility-checks-implement
prompt: |
Implement reproducibility checks for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P1
dependencies: []
- name: validation-reproducibility-checks-test
prompt: |
Test reproducibility checks for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []
- name: validation-reproducibility-checks-document
prompt: |
Document reproducibility checks for a multi-node P100 facility.
Requirements:
- Multi-tenant safe defaults
- Clear operational runbook steps
- Metrics for success and rollback
priority: P2
dependencies: []