@sethwebster
Created January 30, 2026 21:06
AGENTS.md
# Agent Development Guide
Enterprise-grade guidelines for building production systems with AI agents.
## Core Principles
### 1. Explicit Over Implicit
- Every decision must have a clear rationale
- No magic values or hidden assumptions
- State changes must be traceable
- Dependencies must be declared upfront
### 2. Fail Fast, Fail Loud
- Validate inputs at boundaries
- No silent failures or degraded modes
- Throw errors immediately when invariants break
- Never catch exceptions just to log them
### 3. Optimize for Deletion
- Code that doesn't exist can't break
- Delete > Comment out > Keep
- Prefer inline over abstraction until third use
- Remove dead code immediately
### 4. Trust Nothing, Verify Everything
- User input is hostile until proven otherwise
- External APIs will fail in unexpected ways
- Database constraints are your last line of defense
- Type systems prevent bugs, runtime checks prevent disasters
## Code Quality Standards
### Complexity Budget
- Functions: ≤50 lines (hard limit: 100)
- Files: ≤500 lines (hard limit: 1000)
- Cyclomatic complexity: ≤10 per function
- Nesting depth: ≤3 levels
- Function parameters: ≤4 (use objects for more)
### Zero Tolerance
- ❌ `any` types (use `unknown` + type guards)
- ❌ Non-null assertions (`!`) without comments
- ❌ Empty catch blocks
- ❌ Disabled linter rules without issue links
- ❌ TODO comments without owner + date
- ❌ `console.log` in production code
- ❌ Commented-out code
- ❌ Magic numbers (use named constants)
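The `unknown` + type guard rule can be sketched as follows. The `WebhookPayload` shape and the function names are hypothetical, chosen only for illustration.

```typescript
// Hypothetical payload shape used only for this sketch.
interface WebhookPayload {
  event: string
  attempt: number
}

// Type guard: narrows `unknown` to WebhookPayload after runtime checks.
function isWebhookPayload(value: unknown): value is WebhookPayload {
  if (typeof value !== 'object' || value === null) return false
  const record = value as Record<string, unknown>
  return typeof record['event'] === 'string' && typeof record['attempt'] === 'number'
}

function parseWebhook(raw: string): WebhookPayload {
  // JSON.parse returns `any`; pinning it to `unknown` forces the check below.
  const parsed: unknown = JSON.parse(raw)
  if (!isWebhookPayload(parsed)) {
    throw new TypeError('Malformed webhook payload')
  }
  return parsed // narrowed by the guard, no cast needed
}
```

The single cast lives inside the guard, where the runtime checks immediately follow it; callers never see `any`.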
### Required Patterns
- ✅ Discriminated unions for state machines
- ✅ Exhaustive switch statements (never default case for enums)
- ✅ Early returns for guard clauses
- ✅ Immutable data structures (no mutations)
- ✅ Pure functions wherever possible
- ✅ Dependency injection over singletons
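A minimal sketch of the first two patterns together: a discriminated union modelling a state machine, consumed by a switch with no `default` branch. The `RequestState` variants are assumptions made up for this example.

```typescript
// Hypothetical request state machine; the variant names are illustrative.
type RequestState =
  | { status: 'idle' }
  | { status: 'loading'; startedAt: number }
  | { status: 'success'; body: string }
  | { status: 'failure'; reason: string }

// No `default` branch: because the return type is declared, adding a new
// variant without handling it makes the compiler report a missing return.
function describeState(state: RequestState): string {
  switch (state.status) {
    case 'idle':
      return 'Waiting to start'
    case 'loading':
      return `Loading since ${state.startedAt}`
    case 'success':
      return `Received ${state.body.length} characters`
    case 'failure':
      return `Failed: ${state.reason}`
  }
}
```

Each branch only sees the fields of its own variant, so `state.body` is not accessible while `status` is `'loading'`.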
## Architecture
### Layered Architecture
```
┌─────────────────────────────────────┐
│ Presentation (UI Components)        │
│ - No business logic                 │
│ - Props in, events out              │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Application (Hooks/Controllers)     │
│ - Orchestration only                │
│ - State management                  │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Domain (Business Logic)             │
│ - Framework-agnostic                │
│ - Pure functions                    │
│ - Core algorithms                   │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Infrastructure (DB/API/Cache)       │
│ - External dependencies             │
│ - I/O operations                    │
└─────────────────────────────────────┘
```
### Module Boundaries
Each module must:
- Have a single public entry point (`index.ts`)
- Export types explicitly
- Hide implementation details
- Never import from sibling modules' internals
- Document public API with TSDoc
### Dependency Rules
1. Dependencies point toward the domain: presentation → application → domain, never the reverse
2. Domain layer has zero external dependencies
3. Infrastructure implements domain interfaces
4. Circular dependencies = architectural failure
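One way to read rules 2 and 3 is dependency inversion: the domain declares the interface it needs, and infrastructure supplies the implementation. The sketch below assumes a hypothetical `Account` model and an in-memory adapter standing in for a real database.

```typescript
// --- Domain layer: no imports from frameworks or drivers ---
interface Account {
  id: string
  balanceCents: number
}

// The domain owns this interface; infrastructure must conform to it.
interface AccountRepository {
  findById(id: string): Promise<Account | undefined>
  save(account: Account): Promise<void>
}

async function withdraw(
  repo: AccountRepository,
  accountId: string,
  amountCents: number
): Promise<Account> {
  const account = await repo.findById(accountId)
  if (!account) throw new Error(`Account ${accountId} not found`)
  if (amountCents > account.balanceCents) throw new Error('Insufficient funds')
  // Build a new object instead of mutating the loaded one.
  const updated: Account = { ...account, balanceCents: account.balanceCents - amountCents }
  await repo.save(updated)
  return updated
}

// --- Infrastructure layer: implements the domain interface ---
// In-memory stand-in; a real adapter would wrap a database client.
class InMemoryAccountRepository implements AccountRepository {
  private readonly accounts = new Map<string, Account>()

  async findById(id: string): Promise<Account | undefined> {
    return this.accounts.get(id)
  }

  async save(account: Account): Promise<void> {
    this.accounts.set(account.id, account)
  }
}
```

Because `withdraw` only knows the interface, tests can pass the in-memory adapter and production can pass a database-backed one without touching domain code.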
### Architecture Decision Records (ADRs)
**REQUIRED**: All significant architectural decisions MUST be documented in the `adr/` folder.
#### What Qualifies as an ADR
Document when:
- Choosing between architectural patterns
- Selecting frameworks, libraries, or tools
- Defining API contracts or data models
- Establishing security policies
- Making performance trade-offs
- Changing core infrastructure
- Introducing new dependencies
- Adopting coding standards
Don't document:
- Routine bug fixes
- Obvious choices with no alternatives
- Temporary workarounds (use code comments instead)
#### ADR Format
```markdown
# ADR-NNN: Title
**Status**: Proposed | Accepted | Deprecated | Superseded by ADR-XXX
**Date**: YYYY-MM-DD
**Deciders**: @username, @agent-name
**Consulted**: @username, @agent-name
## Context
What is the issue/problem we're facing? What constraints exist?
## Decision Drivers
- Performance requirements
- Security concerns
- Developer experience
- Operational complexity
- Cost implications
## Considered Options
1. **Option A**: Description
   - Pros: ...
   - Cons: ...
2. **Option B**: Description
   - Pros: ...
   - Cons: ...
## Decision
We chose Option A because [rationale].
## Consequences
### Positive
- Benefit 1
- Benefit 2
### Negative
- Trade-off 1
- Trade-off 2
### Neutral
- Change 1
## Implementation
- File/module changes required
- Migration steps if needed
- Rollback procedure
## Validation
How we'll verify this decision was correct:
- Metrics to track
- Success criteria
## References
- [Related ADR-XXX](./adr-xxx-title.md)
- [External documentation](https://...)
- [GitHub issue #123](https://...)
```
#### ADR Workflow for Agents
**CRITICAL REQUIREMENT**: AI agents making architectural decisions MUST:
1. **Document Decision Rationale**
   - Create ADR in `adr/` folder before implementation
   - Number sequentially: `adr-001-title.md`, `adr-002-title.md`, etc.
   - Use kebab-case for titles
   - Include detailed comparison of alternatives
   - Explain why rejected options weren't chosen
2. **Get Code Review Sign-off**
   - After creating ADR, use Task tool with `subagent_type='neckbeard-code-reviewer'`
   - Provide detailed description: "Review ADR-XXX for [architectural decision]. Focus on: [specific concerns like security, performance, maintainability]."
   - Address all feedback before proceeding
   - Update ADR based on review comments
   - Mark ADR as "Accepted" only after reviewer approval
3. **Link to Implementation**
   - Reference ADR in commit messages: "Implements ADR-042: GraphQL API"
   - Link ADR in PR description
   - Update ADR if implementation reveals new information
#### Example Agent ADR Workflow
```bash
# 1. Agent creates ADR
echo "# ADR-015: Switch to Drizzle ORM..." > adr/adr-015-drizzle-orm.md
# 2. Agent invokes code reviewer
# Uses Task tool: subagent_type='neckbeard-code-reviewer'
# Prompt: "Review ADR-015 for ORM migration decision. Focus on: migration
# safety, performance implications, type safety, and developer experience."
# 3. Agent addresses feedback and updates ADR
# 4. Reviewer approves
# 5. ADR status → "Accepted"
# 6. Implementation begins
```
#### File Organization
```
adr/
├── README.md # Index of all ADRs with status
├── template.md # Copy this for new ADRs
├── adr-001-monorepo.md
├── adr-002-auth-strategy.md
├── adr-003-caching-layer.md
└── ...
```
#### README.md Format
```markdown
# Architecture Decision Records
| ADR | Title | Status | Date |
|-----|-------|--------|------|
| [001](./adr-001-monorepo.md) | Monorepo Structure | Accepted | 2024-01-15 |
| [002](./adr-002-auth-strategy.md) | Auth Strategy | Accepted | 2024-01-20 |
| [003](./adr-003-caching-layer.md) | Redis Caching | Deprecated | 2024-02-10 |
```
#### Updating ADRs
- Never delete ADRs (historical record)
- To supersede: Change status, link to replacement
- To deprecate: Change status, explain why
- Keep original decision visible (strikethrough if needed)
#### Review Checklist
Before accepting ADR:
- [ ] Clear problem statement
- [ ] ≥2 alternatives considered
- [ ] Explicit trade-offs documented
- [ ] Implementation steps defined
- [ ] Success metrics identified
- [ ] Code reviewer approved
- [ ] Links to related ADRs/issues
## React Best Practices
### Component Hierarchy
```typescript
import { useEffect, useState } from 'react'

// ❌ WRONG - Business logic in component
function UserProfile() {
  const [user, setUser] = useState(null)
  useEffect(() => {
    fetch('/api/user')
      .then(r => r.json())
      .then(setUser)
  }, [])
  return <div>{user?.name}</div>
}

// ✅ CORRECT - Logic in custom hook
function useUser() {
  const [user, setUser] = useState(null)
  useEffect(() => {
    fetch('/api/user')
      .then(r => r.json())
      .then(setUser)
  }, [])
  return user
}

function UserProfile() {
  const user = useUser()
  return <div>{user?.name}</div>
}
```
### Hook Guidelines
- Never call `useEffect` directly in components
- One hook per concern (don't combine unrelated logic)
- Hooks must be pure (no side effects except in useEffect)
- Always specify exhaustive dependencies
- Extract complex effects to custom hooks
### State Management
```typescript
// ❌ WRONG - Prop drilling
<Parent>
  <Child1 onUpdate={handleUpdate} />
  <Child2 onUpdate={handleUpdate} />
  <Child3 onUpdate={handleUpdate} />
</Parent>

// ✅ CORRECT - Context for shared state
// createContext requires a default value; null means "no provider above"
const UpdateContext = createContext<((val: T) => void) | null>(null)

function Parent() {
  const handleUpdate = useCallback((val: T) => { /* ... */ }, [])
  return (
    <UpdateContext.Provider value={handleUpdate}>
      <Child1 />
      <Child2 />
      <Child3 />
    </UpdateContext.Provider>
  )
}
```
### Performance Rules
- Memo only after profiling shows need
- Don't optimize prematurely
- `useCallback` for props passed to memoized components
- `useMemo` for expensive computations only
- Virtual scrolling for lists >100 items
## TypeScript Standards
### Type Safety
```typescript
// ❌ WRONG - Weak types
interface User {
  id: string
  role: string
  status: string
}

// ✅ CORRECT - Strong types
interface User {
  id: UserId // Branded type
  role: 'admin' | 'user' | 'guest'
  status: UserStatus // Enum or union
}

type UserId = string & { readonly __brand: 'UserId' }
```
### Branded Types
Use for:
- IDs (UserId, PostId, etc.)
- Validated strings (Email, URL)
- Units (Milliseconds, Pixels)
- Sanitized input (SafeHTML)
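Branded values are usually created through a validating constructor so that an unchecked string can never pass as the brand. The sketch below is one possible shape; `toEmail`, `seconds`, and `scheduleReminder` are hypothetical names, and the email check is deliberately simplistic.

```typescript
// Branded types: structurally a string or number, but not interchangeable.
type Email = string & { readonly __brand: 'Email' }
type Milliseconds = number & { readonly __brand: 'Milliseconds' }

const MS_PER_SECOND = 1000

// The only way to obtain an Email is through this validating constructor.
// The pattern is a placeholder; real address validation is left open.
function toEmail(raw: string): Email {
  const normalized = raw.trim().toLowerCase()
  if (!/^[^@\s]+@[^@\s]+$/.test(normalized)) {
    throw new TypeError(`Not an email address: ${raw}`)
  }
  return normalized as Email
}

// Unit constructor: callers state the unit they mean at the call site.
function seconds(n: number): Milliseconds {
  return (n * MS_PER_SECOND) as Milliseconds
}

// Hypothetical consumer: accepts only validated, unit-tagged values.
function scheduleReminder(to: Email, delay: Milliseconds): string {
  return `Reminder for ${to} in ${delay}ms`
}

const reminder = scheduleReminder(toEmail(' Ada@Example.com '), seconds(5))
// scheduleReminder('ada@example.com', 5000) would not compile:
// plain string and number are not assignable to Email and Milliseconds.
```

The casts are confined to the two constructors, so every `Email` in the program has passed the check at least once.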
### Error Handling
```typescript
// ❌ WRONG - Throwing strings
throw 'Something went wrong'

// ❌ WRONG - Generic errors
throw new Error('Failed')

// ✅ CORRECT - Typed errors
class ValidationError extends Error {
  constructor(
    public field: string,
    public constraint: string
  ) {
    super(`${field} failed ${constraint}`)
    this.name = 'ValidationError'
  }
}

// ✅ BEST - Result type
type Result<T, E = Error> =
  | { ok: true; value: T }
  | { ok: false; error: E }
```
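To show how a `Result` is produced and consumed, here is a small sketch built around a hypothetical `parsePort` function. The port bounds and the `ParseError` class are assumptions for the example.

```typescript
type Result<T, E = Error> =
  | { ok: true; value: T }
  | { ok: false; error: E }

const MIN_PORT = 1
const MAX_PORT = 65535

class ParseError extends Error {
  constructor(public readonly input: string) {
    super(`Cannot parse "${input}" as a port number`)
    this.name = 'ParseError'
  }
}

// Hypothetical parser: failure is part of the return type, not a thrown surprise.
function parsePort(input: string): Result<number, ParseError> {
  const port = Number(input)
  if (!Number.isInteger(port) || port < MIN_PORT || port > MAX_PORT) {
    return { ok: false, error: new ParseError(input) }
  }
  return { ok: true, value: port }
}

// Callers must check `ok` before the compiler lets them read `value`.
function portOrDefault(input: string, fallback: number): number {
  const result = parsePort(input)
  return result.ok ? result.value : fallback
}
```

Reading `result.value` without first checking `result.ok` is a compile error, which is the main advantage over thrown exceptions.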
## Testing Requirements
### Test-First Development (Non-Negotiable)
**CRITICAL**: All fixes and features REQUIRE a failing test first, then code.
**Workflow**:
1. Write failing test that demonstrates the bug or specifies the feature
2. Verify test fails for the right reason
3. Implement minimum code to make test pass
4. Refactor if needed (test still passing)
5. No code without failing test first
**Rationale**:
- Tests are the specification
- Proves test actually catches the bug
- Prevents "test passes because it doesn't test anything"
- Forces clarity on requirements before implementation
- Prevents scope creep
```typescript
// ✅ CORRECT workflow
// 1. Write test (fails)
it('should reject invalid email', () => {
  expect(() => validateEmail('not-an-email')).toThrow()
})
// 2. Run test → RED
// 3. Implement
function validateEmail(email: string) {
  if (!email.includes('@')) throw new Error('Invalid')
}
// 4. Run test → GREEN
```
### Coverage Targets
- Unit tests: ≥80% line coverage
- Integration tests: All critical paths
- E2E tests: Primary user flows
- No mocking in E2E tests
### Test Structure
```typescript
// ✅ CORRECT - AAA pattern
describe('UserService', () => {
  describe('createUser', () => {
    it('should create user with valid data', async () => {
      // Arrange
      const input = { email: 'test@example.com' }
      const mockDb = createMockDb()
      const service = new UserService(mockDb)
      // Act
      const result = await service.createUser(input)
      // Assert
      expect(result.ok).toBe(true)
      expect(mockDb.insert).toHaveBeenCalledWith(
        expect.objectContaining({ email: input.email })
      )
    })

    it('should reject invalid email', async () => {
      // Arrange
      const input = { email: 'invalid' }
      const service = new UserService(createMockDb())
      // Act
      const result = await service.createUser(input)
      // Assert
      expect(result.ok).toBe(false)
      expect(result.error).toBeInstanceOf(ValidationError)
    })
  })
})
```
### Test Naming
- Use `should` statements
- Be specific about conditions
- One assertion per test (prefer multiple tests)
- Tests are documentation (name explains behavior)
### What to Test
✅ Test:
- Business logic (pure functions)
- Integration points
- Error conditions
- Edge cases (null, empty, boundary values)
- State transitions
❌ Don't test:
- Framework internals
- Third-party libraries
- Getters/setters
- Private methods directly
## Database Best Practices
### Migration Strategy
```typescript
// ❌ WRONG - Destructive migration
await db.schema.dropTable('users')
await db.schema.createTable('users', ...)
// ✅ CORRECT - Additive migration
await db.schema.createTable('users_v2', ...)
// Deploy code that writes to both users and users_v2
// Backfill existing rows into users_v2
// Switch reads to users_v2
// Drop users (separate migration)
```
### Query Patterns
```typescript
// ❌ WRONG - N+1 queries
const allUsers = await db.select().from(users)
for (const user of allUsers) {
  user.posts = await db.select().from(posts).where(eq(posts.userId, user.id))
}

// ✅ CORRECT - Eager loading (one query, one row per user/post pair)
const usersWithPosts = await db
  .select()
  .from(users)
  .leftJoin(posts, eq(posts.userId, users.id))
```
### Constraints
Every table must have:
- Primary key
- Created/updated timestamps
- NOT NULL on required fields
- Foreign keys with explicit ON DELETE behavior
- Unique constraints for natural keys
- Check constraints for invariants
## API Design
### RESTful Endpoints
```
POST /users - Create
GET /users/:id - Read one
GET /users - Read many
PATCH /users/:id - Partial update
PUT /users/:id - Full replacement
DELETE /users/:id - Delete
```
### Response Format
```typescript
// ✅ Success
{
  "data": { ... },
  "meta": {
    "requestId": "uuid",
    "timestamp": "ISO8601"
  }
}

// ✅ Error
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Human readable message",
    "details": [
      { "field": "email", "issue": "invalid_format" }
    ]
  },
  "meta": {
    "requestId": "uuid",
    "timestamp": "ISO8601"
  }
}
```
### Status Codes
- 200: Success with body
- 201: Created (return created resource)
- 204: Success without body
- 400: Client error (validation)
- 401: Unauthenticated
- 403: Forbidden (authenticated but not permitted)
- 404: Not found
- 409: Conflict (unique constraint, optimistic lock)
- 422: Unprocessable (semantic error)
- 429: Rate limited
- 500: Server error
- 503: Service unavailable (maintenance, overload)
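One way to keep status codes and the error envelope consistent is a single lookup table keyed by error code. In the sketch below, `VALIDATION_ERROR` comes from the response example; every other code name, and the `toErrorResponse` helper itself, is an assumption.

```typescript
// 'VALIDATION_ERROR' appears in the response format; the other codes are assumed.
type ErrorCode =
  | 'VALIDATION_ERROR'
  | 'UNAUTHENTICATED'
  | 'FORBIDDEN'
  | 'NOT_FOUND'
  | 'CONFLICT'
  | 'RATE_LIMITED'
  | 'INTERNAL'

// Record<ErrorCode, number> forces an entry for every code at compile time.
const STATUS_BY_CODE: Record<ErrorCode, number> = {
  VALIDATION_ERROR: 400,
  UNAUTHENTICATED: 401,
  FORBIDDEN: 403,
  NOT_FOUND: 404,
  CONFLICT: 409,
  RATE_LIMITED: 429,
  INTERNAL: 500,
}

interface ErrorInput {
  code: ErrorCode
  message: string
  requestId: string
  now: Date
}

interface ErrorResponse {
  status: number
  body: {
    error: { code: ErrorCode; message: string }
    meta: { requestId: string; timestamp: string }
  }
}

// Builds the error envelope and its HTTP status from one input object.
function toErrorResponse(input: ErrorInput): ErrorResponse {
  return {
    status: STATUS_BY_CODE[input.code],
    body: {
      error: { code: input.code, message: input.message },
      meta: { requestId: input.requestId, timestamp: input.now.toISOString() },
    },
  }
}
```

Passing `now` in, rather than calling `new Date()` inside, keeps the function pure and easy to test.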
## Security Requirements
### Input Validation
```typescript
// ✅ CORRECT - Validate at boundary
export async function createUser(req: Request) {
  const input = validateCreateUserInput(await req.json())
  // input is now trusted
  const user = await userService.create(input)
  return Response.json(user)
}

// Domain layer assumes valid input
class UserService {
  create(input: ValidatedCreateUserInput) {
    // No validation needed here
  }
}
```
### Authentication
- Never roll your own crypto
- Use established libraries (Auth.js, Lucia, etc.)
- Store only hashed passwords (bcrypt, Argon2)
- Session tokens: cryptographically random, ≥128 bits
- Expire sessions (30d max, 24h for sensitive)
### Authorization
```typescript
// ✅ CORRECT - Explicit permissions
function canDeletePost(user: User, post: Post): boolean {
  return user.id === post.authorId || user.role === 'admin'
}

// Check before action
if (!canDeletePost(user, post)) {
  throw new ForbiddenError('Cannot delete post')
}
await deletePost(post.id)
```
### Common Vulnerabilities
❌ Prevent:
- SQL injection (use parameterized queries)
- XSS (escape output, CSP headers)
- CSRF (SameSite cookies, CSRF tokens)
- Mass assignment (explicit allowlists)
- Timing attacks (constant-time comparison)
- Open redirects (validate redirect URLs)
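As an illustration of the mass-assignment item, the sketch below copies only allowlisted fields from an untrusted request body. The `UserRecord` shape and the choice of updatable fields are hypothetical.

```typescript
// Hypothetical stored record; `role` must never be settable by the client.
interface UserRecord {
  name: string
  bio: string
  role: 'admin' | 'user'
}

// Explicit allowlist: anything not named here is ignored.
const UPDATABLE_FIELDS = ['name', 'bio'] as const
type UpdatableField = (typeof UPDATABLE_FIELDS)[number]
type UserPatch = Partial<Pick<UserRecord, UpdatableField>>

function pickUpdatableFields(body: Record<string, unknown>): UserPatch {
  const patch: UserPatch = {}
  for (const field of UPDATABLE_FIELDS) {
    const value = body[field]
    // Copy the field only when it is present and has the expected type.
    if (typeof value === 'string') patch[field] = value
  }
  return patch
}

// A request that tries to escalate privileges loses the `role` key.
const patch = pickUpdatableFields({ name: 'Ada', role: 'admin' })
```

Spreading the raw body into an update statement would have written `role: 'admin'`; iterating over the allowlist makes that impossible by construction.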
## Performance
### Caching Strategy
```typescript
// ✅ CORRECT - Layered caching
// Returns undefined when the user does not exist in any layer
async function getUser(id: UserId): Promise<User | undefined> {
  // L1: In-memory (fastest)
  const cached = memoryCache.get(id)
  if (cached) return cached

  // L2: Redis (fast)
  const redisData = await redis.get(`user:${id}`)
  if (redisData) {
    const user = JSON.parse(redisData)
    memoryCache.set(id, user)
    return user
  }

  // L3: Database (slow)
  const user = await db.query.users.findFirst({
    where: eq(users.id, id)
  })
  if (user) {
    await redis.setex(`user:${id}`, 300, JSON.stringify(user))
    memoryCache.set(id, user)
  }
  return user
}
```
### Cache Invalidation
```typescript
// ✅ CORRECT - Explicit invalidation
async function updateUser(id: UserId, data: UserUpdate) {
  // returning() yields an array of updated rows; take the first
  const [user] = await db.update(users)
    .set(data)
    .where(eq(users.id, id))
    .returning()

  // Invalidate all cache layers
  memoryCache.delete(id)
  await redis.del(`user:${id}`)
  return user
}
```
### Database Indexes
Create indexes for:
- Foreign keys (always)
- WHERE clause columns (frequently queried)
- ORDER BY columns
- Covering indexes for hot queries
Avoid:
- Indexes on low-cardinality columns (low selectivity, e.g. booleans)
- Too many indexes (slows writes)
- Redundant indexes (covered by composite)
## Monitoring & Observability
### Logging Levels
- ERROR: Requires immediate action
- WARN: Degraded state, still functional
- INFO: Significant events (user actions, state changes)
- DEBUG: Detailed diagnostic info (disabled in production)
### Structured Logging
```typescript
// ✅ CORRECT
logger.info('User created', {
  userId: user.id,
  email: user.email,
  source: 'registration_flow',
  duration_ms: performance.now() - start
})

// ❌ WRONG
console.log(`User ${user.id} created`)
```
### Metrics
Track:
- Request latency (p50, p95, p99)
- Error rate (by endpoint, by error type)
- Database query time
- Cache hit rate
- Queue depth
- Active connections
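For reference, latency percentiles over a batch of samples can be computed with the nearest-rank method, sketched below. This is an illustration only: production metrics systems typically use histograms or streaming estimators, and the function names here are made up.

```typescript
const PERCENT = 100

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError('No samples')
  if (p <= 0 || p > PERCENT) throw new RangeError('p must be in (0, 100]')
  const sorted = [...samples].sort((a, b) => a - b) // copy, then numeric sort
  const rank = Math.ceil((p / PERCENT) * sorted.length) // 1-based rank
  const value = sorted[rank - 1]
  if (value === undefined) throw new RangeError('Rank out of bounds')
  return value
}

interface LatencySummary {
  p50: number
  p95: number
  p99: number
}

function summarizeLatency(samplesMs: readonly number[]): LatencySummary {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  }
}
```

With only a handful of samples, p95 and p99 both collapse to the maximum, which is one reason tail percentiles need a large sample window to be meaningful.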
### Alerting
Alert on:
- Error rate >1% sustained for 5m
- p99 latency >2s sustained for 5m
- Database connections >80% pool size
- Disk usage >85%
- Memory usage >90%
## Deployment
### Environment Parity
- Dev, staging, production must match
- Same OS, runtime versions, dependencies
- Same environment variables (different values)
- Same infrastructure (scaled down for staging)
### Configuration
```typescript
// ✅ CORRECT - Type-safe config
const config = {
  database: {
    url: env.DATABASE_URL, // Required
    poolSize: env.DB_POOL_SIZE ?? 10, // Optional with default
  },
  redis: {
    url: env.REDIS_URL,
  },
} as const

// Validate at startup
function validateConfig(config: unknown): Config {
  // Throw if invalid (fail fast)
  return configSchema.parse(config)
}

const validatedConfig = validateConfig(config)
```
### Zero-Downtime Deploys
1. Deploy new version alongside old
2. Health check new version
3. Gradually shift traffic (10%, 50%, 100%)
4. Monitor error rates
5. Rollback on degradation
6. Terminate old version after success
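The steps above can be sketched as a small orchestration function with its dependencies injected. The `RolloutDeps` interface is hypothetical, the 1% threshold is borrowed from the alerting section as an assumption, and a real rollout would also wait for an observation window at each step.

```typescript
// All side effects are injected so the rollout logic stays testable.
interface RolloutDeps {
  healthCheck(): Promise<boolean>
  setTrafficPercent(percent: number): Promise<void>
  errorRate(): Promise<number> // fraction of failed requests, 0..1
}

const TRAFFIC_STEPS = [10, 50, 100] as const
const MAX_ERROR_RATE = 0.01 // assumed threshold: 1%

type RolloutOutcome =
  | { status: 'completed' }
  | { status: 'unhealthy' }
  | { status: 'rolled_back'; atPercent: number; errorRate: number }

async function rollOut(deps: RolloutDeps): Promise<RolloutOutcome> {
  // Step 2: never shift traffic to a version that fails its health check.
  if (!(await deps.healthCheck())) return { status: 'unhealthy' }

  // Steps 3-5: shift gradually, checking the error rate after each step.
  for (const percent of TRAFFIC_STEPS) {
    await deps.setTrafficPercent(percent)
    const rate = await deps.errorRate()
    if (rate > MAX_ERROR_RATE) {
      await deps.setTrafficPercent(0) // send all traffic back to the old version
      return { status: 'rolled_back', atPercent: percent, errorRate: rate }
    }
  }
  return { status: 'completed' }
}
```

Returning a discriminated union instead of throwing lets the caller decide whether a rollback should page someone or simply be logged.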
### Rollback Strategy
- Keep last 3 versions deployed
- One-command rollback
- Database migrations must be backward-compatible
- Feature flags for risky changes
## Documentation
### Code Comments
Only comment:
- Why, not what (code shows what)
- Non-obvious optimizations
- Workarounds for external bugs
- Complex algorithms (link to paper/article)
- Security-sensitive code
```typescript
// ❌ WRONG - Obvious comment
// Increment counter by 1
counter += 1
// ✅ CORRECT - Explains rationale
// Count the attempt before sending so a crash mid-request still consumes retry budget
counter += 1
```
### README Requirements
Every repo must have:
- One-line description
- Prerequisites
- Setup instructions
- How to run tests
- How to deploy
- Architecture diagram
- API documentation link
### API Documentation
- Generate from code (OpenAPI, GraphQL schema)
- Include request/response examples
- Document error codes
- Link to runnable examples
### Architecture Decision Records
- **REQUIRED** for all significant decisions
- See [Architecture → ADRs](#architecture-decision-records-adrs) for full guidelines
- All ADRs in `adr/` folder with sequential numbering
- Agents MUST get code review approval before proceeding
- Keep `adr/README.md` index up to date
## Git Workflow
### Commit Messages
```
<type>(<scope>): <subject>
<body>
<footer>
```
Types:
- `feat`: New feature
- `fix`: Bug fix
- `perf`: Performance improvement
- `refactor`: Code restructuring
- `test`: Test additions/changes
- `docs`: Documentation only
- `chore`: Build, CI, dependencies
Rules:
- Subject: ≤50 chars, imperative mood, no period
- Body: Wrap at 72 chars, explain why not what
- Reference issues and ADRs in footer (e.g., "Implements ADR-042", "Refs #123")
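Most of these rules can be checked mechanically, for example in a commit hook. The validator below is a sketch under a few assumptions: the scope is treated as optional and limited to lowercase letters, digits, and hyphens, and the 50-character limit is applied to the subject part only. Imperative mood is not checked.

```typescript
const COMMIT_TYPES = ['feat', 'fix', 'perf', 'refactor', 'test', 'docs', 'chore'] as const
const MAX_SUBJECT_LENGTH = 50

// Matches "<type>(<scope>): <subject>" with an optional scope (assumption).
const HEADER_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join('|')})(\\([a-z0-9-]+\\))?: (.+)$`
)

// Returns a list of problems; an empty list means the header is acceptable.
function validateCommitHeader(header: string): string[] {
  const match = HEADER_PATTERN.exec(header)
  if (!match) {
    return ['Header must look like "<type>(<scope>): <subject>"']
  }
  const subject = match[3] ?? ''
  const problems: string[] = []
  if (subject.length > MAX_SUBJECT_LENGTH) {
    problems.push(`Subject exceeds ${MAX_SUBJECT_LENGTH} characters`)
  }
  if (subject.endsWith('.')) {
    problems.push('Subject must not end with a period')
  }
  return problems
}
```

Returning every problem at once, rather than failing on the first, saves the author a second rejected commit.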
### Branch Strategy
```
main - Production (protected)
├─ staging - Pre-production (protected)
└─ feat/* - Feature branches (ephemeral)
```
- Merge to staging first
- Staging → main after QA
- Delete branches after merge
- Never commit directly to main/staging
### Pull Requests
Required:
- ≥1 approval
- CI passing
- No merge conflicts
- Branch up to date with target
- Description explains changes
- Links to issue/ticket
- Links to ADR if architectural change
- ADR approved before PR if new architectural decision
## Common Pitfalls
### Race Conditions
```typescript
// ❌ WRONG - Race condition
const count = await getCount()
await setCount(count + 1)

// ✅ CORRECT - Atomic operation
await db.update(counter).set({
  value: sql`${counter.value} + 1`
})
```
### Memory Leaks
Watch for:
- Event listeners not cleaned up
- Intervals/timeouts not cleared
- Growing caches without eviction
- Circular references in closures
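For the "growing caches without eviction" case, a size-bounded cache is often enough. The sketch below is a minimal least-recently-used cache that relies on `Map` preserving insertion order; the class name and API are assumptions, not an existing library.

```typescript
// Minimal LRU cache: the Map's insertion order doubles as the recency order.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>()

  constructor(private readonly maxEntries: number) {
    if (!Number.isInteger(maxEntries) || maxEntries < 1) {
      throw new RangeError('maxEntries must be a positive integer')
    }
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key) as V // safe: presence checked above
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: K, value: V): void {
    this.entries.delete(key) // ensures an overwrite also refreshes recency
    this.entries.set(key, value)
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next()
      if (!oldest.done) this.entries.delete(oldest.value)
    }
  }

  get size(): number {
    return this.entries.size
  }
}
```

The cache can never hold more than `maxEntries` items, so memory use stays bounded regardless of how many distinct keys are seen.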
### N+1 Queries
```typescript
// ❌ WRONG
const allPosts = await db.select().from(posts)
for (const post of allPosts) {
  post.author = await db.query.users.findFirst({
    where: eq(users.id, post.authorId)
  })
}

// ✅ CORRECT
const postsWithAuthors = await db
  .select()
  .from(posts)
  .leftJoin(users, eq(users.id, posts.authorId))
```
### Unbounded Operations
```typescript
// ❌ WRONG - No limit
const allUsers = await db.select().from(users)

// ✅ CORRECT - Pagination
const pageOfUsers = await db
  .select()
  .from(users)
  .orderBy(users.id) // stable ordering so pages do not overlap
  .limit(pageSize)
  .offset(page * pageSize)
```
## Refactoring Checklist
Before refactoring:
- [ ] Tests exist and pass
- [ ] Understand current behavior completely
- [ ] Have clear improvement goal
- [ ] Know stopping condition
During refactoring:
- [ ] Keep tests passing at each step
- [ ] Commit frequently (atomic changes)
- [ ] No feature additions (refactor OR new feature, never both)
- [ ] Verify performance doesn't degrade
After refactoring:
- [ ] All tests still pass
- [ ] Code coverage maintained or improved
- [ ] Documentation updated
- [ ] No observable behavior change
## Review Guidelines
### What Reviewers Check
1. Correctness: Does it solve the problem?
2. Security: Any vulnerabilities?
3. Performance: Any red flags?
4. Maintainability: Will we understand this in 6 months?
5. Tests: Are critical paths covered?
### Review Etiquette
- Suggest, don't demand
- Explain why, not just what
- Approve if minor nits only
- Block for security, correctness, data loss
- Respond within 24h
### Self-Review Checklist
Before requesting review:
- [ ] Ran tests locally
- [ ] Manually tested feature
- [ ] Checked for console errors
- [ ] Reviewed own diff
- [ ] Removed debug code
- [ ] Updated documentation
## Emergency Response
### Production Incidents
1. **Acknowledge** (2m): Page on-call
2. **Mitigate** (15m): Stop the bleeding (rollback, kill feature flag)
3. **Investigate** (1h): Root cause analysis
4. **Fix** (4h): Permanent solution
5. **Review** (24h): Postmortem
### Postmortem Template
```markdown
# Incident: [Title]
**Date**: YYYY-MM-DD
**Duration**: Xh Ym
**Impact**: X users affected, Y requests failed
**Severity**: Critical/Major/Minor
## Timeline
- HH:MM - Incident began
- HH:MM - Detected
- HH:MM - Mitigated
- HH:MM - Resolved
## Root Cause
[What went wrong and why]
## Resolution
[How it was fixed]
## Action Items
- [ ] Prevent recurrence
- [ ] Improve detection
- [ ] Update runbooks
```
## Conclusion
These guidelines ensure:
- **Reliability**: Systems stay up
- **Velocity**: Teams move fast
- **Quality**: Code remains maintainable
- **Security**: Users stay protected
When in doubt:
1. Make it work
2. Make it right
3. Make it fast
In that order. Always.