I'll structure this as a learning path with concrete milestones. Each level builds on the previous, and I'll give you specific exercises to develop intuition—not just theory.
Before any techniques, you need the right mental model for what you're actually doing.
When you prompt an LLM, you're not "programming" in the traditional sense. You're providing context that influences probability distributions over token sequences. The model asks: "Given this context, what text is most likely to come next?"
Your job is to construct context that makes the response you want the statistically likely continuation.
Key insight: The model has no goals, no memory between sessions, no understanding of "you" as a persistent entity. Each prompt is evaluated fresh. Everything the model "knows" about the task must be in the prompt itself.
Everything happens within the context window—typically 100K-200K tokens for frontier models. This includes:
- System prompts (instructions from the application)
- Your prompt
- Any documents or code you provide
- The model's response
- Follow-up conversation
Practical implication: If information isn't in the context, the model doesn't have access to it. If you want the model to follow a coding convention, you must either describe it or show examples of it.
Models process tokens, not characters or words. A token is roughly 3-4 characters in English, but varies by language and content. Code tokenizes differently than prose. This matters because:
- Context limits are in tokens
- Models can struggle with character-level tasks (counting letters, anagrams)
- Some strings tokenize unexpectedly, affecting model behavior
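For a rough sense of how text tokenizes, you can count tokens locally. The sketch below uses OpenAI's `tiktoken` library as an approximation; Claude and other models use different tokenizers, so treat the counts as estimates.

```python
# Rough token counting with tiktoken (an approximation: the exact
# tokenizer varies by model, so treat these counts as estimates).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello, world!", "def validate_user_input(data):", "internationalization"]:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(tokens)} tokens -> {tokens}")
```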
These are the fundamentals that underpin everything else.
The most common prompting failure is ambiguity. Models will fill gaps with assumptions—often wrong ones.
Weak prompt:
Write a function to process user data
Strong prompt:
Write a Python function called `validate_user_input` that:
- Takes a dictionary with keys: 'email', 'age', 'username'
- Validates email format using regex
- Ensures age is an integer between 13 and 120
- Ensures username is 3-20 alphanumeric characters
- Returns a tuple: (bool, list of error messages)
- Raises no exceptions; all errors go in the error list
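As a sanity check on how precise that specification is, here is a minimal sketch of the function it describes (the regex and type checks are illustrative choices, not part of the original spec):

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple
USERNAME_RE = re.compile(r"^[A-Za-z0-9]{3,20}$")

def validate_user_input(data):
    """Validate a user dict; returns (is_valid, error_messages) and never raises."""
    errors = []
    email = data.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append("email: invalid format")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool) or not (13 <= age <= 120):
        errors.append("age: must be an integer between 13 and 120")
    username = data.get("username")
    if not isinstance(username, str) or not USERNAME_RE.match(username):
        errors.append("username: must be 3-20 alphanumeric characters")
    return (len(errors) == 0, errors)
```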
Exercise: Take a task you'd normally give to a junior developer verbally. Write it as a prompt. Now read it as if you knew nothing about your codebase or conventions. What's ambiguous? Rewrite until a stranger could implement it correctly.
Models adjust their "voice" based on how you frame the interaction.
You are a senior security engineer reviewing code for vulnerabilities.
Be thorough and pessimistic—assume attackers are creative.
This isn't roleplay; it's context that shifts which patterns the model draws on. A "security engineer" persona surfaces security-relevant knowledge more readily than a generic assistant framing.
When to use: When you want specialized knowledge or a particular analytical lens applied consistently.
Never leave output format to chance if it matters.
Respond with a JSON object containing:
{
  "analysis": "your analysis here",
  "risk_level": "low" | "medium" | "high",
  "recommendations": ["array", "of", "strings"]
}
Output only valid JSON, no markdown code blocks, no additional text.
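Even with explicit format instructions, responses occasionally drift, so it pays to parse defensively on your side. A minimal sketch, assuming the three-key schema above (the fence-stripping logic is a pragmatic guess at a common failure mode):

```python
import json

REQUIRED_KEYS = {"analysis", "risk_level", "recommendations"}
ALLOWED_RISK = {"low", "medium", "high"}

def parse_review(raw: str) -> dict:
    """Parse the model's JSON reply, tolerating a stray markdown code fence."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`").strip()
        if cleaned.startswith("json"):       # drop an optional language tag
            cleaned = cleaned[len("json"):]
    data = json.loads(cleaned)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if data["risk_level"] not in ALLOWED_RISK:
        raise ValueError(f"unexpected risk_level: {data['risk_level']}")
    return data
```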
Key formats to know:
- JSON (structured data, API responses)
- Markdown (documentation, reports)
- XML (when you need nested structure with attributes)
- Plain text with delimiters (simpler parsing)
Examples are often more powerful than instructions. The model pattern-matches.
Convert informal requirements to user stories.
Example 1:
Input: "Users should be able to reset their password"
Output: "As a registered user, I want to reset my password via email so that I can regain access to my account if I forget my credentials."
Example 2:
Input: "Need admin dashboard"
Output: "As an administrator, I want a dashboard showing key metrics (active users, error rates, system health) so that I can monitor platform status at a glance."
Now convert:
Input: "Add search functionality"
Output:
Rule of thumb: 2-3 examples usually suffice. More can help with complex patterns but consumes context.
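If you build few-shot prompts programmatically, a small helper keeps the example formatting consistent. A sketch, assuming simple input/output string pairs:

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          new_input: str) -> str:
    """Assemble a few-shot prompt from (input, output) example pairs."""
    parts = [instruction, ""]
    for i, (example_in, example_out) in enumerate(examples, start=1):
        parts += [f"Example {i}:", f"Input: {example_in}", f"Output: {example_out}", ""]
    parts += ["Now convert:", f"Input: {new_input}", "Output:"]
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Convert informal requirements to user stories.",
    [("Users should be able to reset their password",
      "As a registered user, I want to reset my password via email so that I can regain access to my account."),
     ("Need admin dashboard",
      "As an administrator, I want a dashboard showing key metrics so that I can monitor platform status.")],
    "Add search functionality",
)
print(prompt)
```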
Exercise: Create a few-shot prompt that converts plain English error descriptions into structured bug reports with severity, reproduction steps, and expected vs. actual behavior.
For reasoning tasks, asking the model to show its work dramatically improves accuracy.
Without CoT:
Is this code thread-safe? [code]
Answer: Yes or No
With CoT:
Analyze whether this code is thread-safe.
Think through:
1. What shared state exists?
2. What operations modify shared state?
3. Are those operations atomic?
4. What race conditions could occur?
Then conclude with your assessment.
Why it works: The model's reasoning in early tokens influences later tokens. By generating intermediate reasoning, it "primes" itself with relevant analysis before committing to an answer.
Complex tasks should be broken into explicit steps.
Task: Review this pull request for a payment processing module.
Execute these steps in order:
## Step 1: Security Analysis
Identify any security vulnerabilities, especially:
- Input validation gaps
- Authentication/authorization issues
- Data exposure risks
## Step 2: Logic Correctness
Trace the payment flow and identify any logical errors or edge cases.
## Step 3: Error Handling
Evaluate whether errors are handled gracefully and appropriately logged.
## Step 4: Summary
Provide an overall assessment and list action items by priority.
Be explicit about what you don't want.
Refactor this function to improve readability.
Constraints:
- Do NOT change the function signature
- Do NOT change the observable behavior
- Do NOT add new dependencies
- Keep the function under 30 lines
Why it matters: Models optimize for plausible responses. Without constraints, they may "improve" things in ways that break your requirements.
Ask the model to check its own work.
Generate SQL for this query, then:
1. Explain what the query does in plain English
2. Identify any edge cases where it might return unexpected results
3. If you find issues, provide a corrected version
This catches errors the model might make on first pass.
Tell the model to surface its assumptions instead of silently filling gaps.
If any requirements are ambiguous, list your assumptions before proceeding.
If critical information is missing, ask clarifying questions instead of guessing.
This prevents the model from confidently proceeding with wrong assumptions.
For complex tasks, break the interaction into stages rather than one massive prompt.
Stage 1: Provide context, ask for analysis
Stage 2: Based on analysis, ask for specific recommendations
Stage 3: Based on recommendations, ask for implementation
This lets you course-correct between stages and keeps each prompt focused.
Anti-pattern: Trying to get everything in one prompt. This often produces lower quality than iterative refinement.
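A minimal sketch of a staged workflow using the Anthropic Python SDK; the model name and prompts are placeholders, and any chat-completion client works the same way:

```python
# Staged (multi-turn) workflow: each stage sees the full prior conversation.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment
MODEL = "claude-sonnet-4-20250514"  # substitute the model you actually use
history = []

def turn(user_text: str) -> str:
    """Send one stage, keeping earlier turns in the conversation history."""
    history.append({"role": "user", "content": user_text})
    reply = client.messages.create(model=MODEL, max_tokens=2048, messages=history)
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    return text

analysis = turn("Here is the module...\n\nStage 1: Analyze its error handling.")
recommendations = turn("Stage 2: Based on your analysis, recommend specific changes.")
implementation = turn("Stage 3: Implement the top recommendation as a diff.")
```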
Use the output of one prompt as input to another.
Prompt 1: "List all the API endpoints in this codebase with their HTTP methods"
↓
Prompt 2: "For each endpoint, identify what authentication is required"
↓
Prompt 3: "Generate OpenAPI documentation for these endpoints"
This is the manual version of what agentic tools do automatically.
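In code, chaining is just feeding one response into the next prompt. A sketch, assuming an `ask` callable that sends a prompt to your LLM client and returns its text (for example, the `turn` helper sketched earlier):

```python
from typing import Callable

def document_api(ask: Callable[[str], str], codebase_summary: str) -> str:
    """Chain three prompts: each step's output becomes part of the next input."""
    endpoints = ask(
        "List all the API endpoints in this codebase with their HTTP methods.\n\n"
        + codebase_summary
    )
    auth = ask(
        "For each endpoint below, identify what authentication is required.\n\n"
        + endpoints
    )
    return ask(
        "Generate OpenAPI documentation for these endpoints.\n\n"
        "Endpoints:\n" + endpoints + "\n\nAuthentication notes:\n" + auth
    )
```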
Show what you don't want.
Good comment:
// Retry with exponential backoff to handle transient network failures
Bad comment (don't do this):
// increment i
i++
Negative examples are especially useful when models tend toward a problematic pattern.
Ask the model to help construct the prompt.
I need to write a prompt that will help me review database schemas for
performance issues. What information should I include in the prompt, and
how should I structure the request to get thorough analysis?
Then use the model's suggestions to build your actual prompt.
These reflect the current frontier of what's possible.
For production applications, system prompts follow a structured architecture:
<role>
You are a code review assistant for a fintech company.
</role>
<context>
Our stack: Python 3.11, FastAPI, PostgreSQL, Redis
Coding standards: PEP 8, type hints required, 80% test coverage minimum
Security requirements: PCI-DSS compliance required
</context>
<task>
Review pull requests for code quality, security, and adherence to standards.
</task>
<output_format>
Structure your review as:
1. Summary (2-3 sentences)
2. Critical Issues (blocking)
3. Suggestions (non-blocking)
4. Positive Notes (what's done well)
</output_format>
<guidelines>
- Be constructive, not condescending
- Cite specific line numbers
- Explain *why* something is an issue, not just *that* it is
- If uncertain, say so rather than guessing
</guidelines>
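If you assemble system prompts in code, a small builder keeps the sections consistent across applications. A sketch; the tag names and contents are illustrative, not a required schema:

```python
def build_system_prompt(sections: dict[str, str]) -> str:
    """Wrap each named section in XML-style tags and join them."""
    return "\n\n".join(
        f"<{tag}>\n{body.strip()}\n</{tag}>" for tag, body in sections.items()
    )

system_prompt = build_system_prompt({
    "role": "You are a code review assistant for a fintech company.",
    "context": "Stack: Python 3.11, FastAPI, PostgreSQL, Redis.\n"
               "Standards: PEP 8, type hints required, 80% test coverage minimum.",
    "task": "Review pull requests for code quality, security, and adherence to standards.",
    "guidelines": "- Be constructive\n- Cite specific line numbers\n"
                  "- Explain why something is an issue, not just that it is",
})
```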
Modern models can call external tools. You define available tools, and the model decides when to use them.
{
  "tools": [
    {
      "name": "run_tests",
      "description": "Execute the test suite for a given module",
      "parameters": {
        "module_path": "string: path to the module to test"
      }
    },
    {
      "name": "search_codebase",
      "description": "Search for code patterns or definitions",
      "parameters": {
        "query": "string: search query"
      }
    }
  ]
}

Prompt engineering for tool use:
- Write clear, unambiguous tool descriptions
- Specify when each tool should (and shouldn't) be used
- Handle tool errors gracefully in your system prompt
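On the application side, you map the model's tool calls onto real functions and return their output (including errors) as text. A sketch with hypothetical `run_tests` and `search_codebase` implementations; real SDKs deliver the tool name and arguments in their own response format:

```python
import subprocess

def run_tests(module_path: str) -> str:
    """Run pytest on one module and return combined output."""
    result = subprocess.run(["pytest", module_path], capture_output=True, text=True)
    return result.stdout + result.stderr

def search_codebase(query: str) -> str:
    """Grep the project for a pattern and return matching lines."""
    result = subprocess.run(["grep", "-rn", query, "."], capture_output=True, text=True)
    return result.stdout

TOOLS = {"run_tests": run_tests, "search_codebase": search_codebase}

def dispatch(tool_name: str, arguments: dict) -> str:
    """Execute the tool the model asked for; surface errors instead of crashing."""
    if tool_name not in TOOLS:
        return f"Error: unknown tool {tool_name!r}"
    try:
        return TOOLS[tool_name](**arguments)
    except Exception as exc:
        return f"Error while running {tool_name}: {exc}"
```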
For autonomous agents, prompts define behavior loops:
You are an autonomous coding agent. You can:
- Read files
- Write/modify files
- Run shell commands
- Search the web for documentation
## Execution Loop
1. Analyze the task and break it into steps
2. For each step:
   a. Decide what action to take
   b. Execute the action
   c. Observe the result
   d. Decide if the step is complete or needs iteration
3. Verify the complete solution works
4. Report results
## Guidelines
- Run tests after every code change
- If stuck for more than 3 attempts, ask for clarification
- Never modify files outside the project directory
- Commit logical units of work with descriptive messages
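The loop itself is straightforward to skeleton in code. A sketch where `decide_action`, `execute`, and `is_done` stand in for model calls and tool execution; the attempt cap mirrors the "ask for clarification after 3 attempts" rule:

```python
from typing import Any, Callable

def run_step(step: str,
             decide_action: Callable[[str, list], Any],
             execute: Callable[[Any], str],
             is_done: Callable[[str, list], bool],
             max_attempts: int = 3) -> list:
    """Iterate decide -> execute -> observe until the step is done or we give up."""
    observations = []
    for _ in range(max_attempts):
        action = decide_action(step, observations)   # model picks the next action
        result = execute(action)                     # run it (edit a file, shell command, ...)
        observations.append((action, result))        # feed the result back as context
        if is_done(step, observations):
            return observations
    raise RuntimeError(f"Stuck on step {step!r}; asking the user for clarification")
```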
When working with large codebases or documentation, RAG retrieves relevant chunks and injects them into the prompt.
Prompting for RAG:
I've retrieved the following relevant code sections based on your query:
<retrieved_context>
[File: auth/middleware.py, Lines 45-80]
[code here]
[File: models/user.py, Lines 1-50]
[code here]
</retrieved_context>
Using the above context, answer: How does the authentication flow work?
If the provided context is insufficient, say what additional information you'd need.
Key techniques:
- Clearly delineate retrieved content from the prompt
- Tell the model to acknowledge when context is insufficient
- Provide metadata (filenames, line numbers) so the model can reference sources
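A sketch of assembling retrieved chunks into that shape; the chunk dictionary layout (`path`, `start_line`, `end_line`, `text`) is an assumption about what your retrieval step returns:

```python
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Delimit retrieved code sections, with metadata, and append the question."""
    sections = [
        f"[File: {c['path']}, Lines {c['start_line']}-{c['end_line']}]\n{c['text']}"
        for c in chunks
    ]
    return (
        "I've retrieved the following relevant code sections based on your query:\n\n"
        "<retrieved_context>\n" + "\n\n".join(sections) + "\n</retrieved_context>\n\n"
        f"Using the above context, answer: {question}\n"
        "If the provided context is insufficient, say what additional information you'd need."
    )
```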
Inspired by Anthropic's research—build self-checking into the prompt:
Before providing your final answer, evaluate it against these criteria:
1. Does it actually answer the question asked?
2. Is it technically accurate?
3. Does it follow the specified constraints?
4. Could it cause harm if followed blindly?
If any check fails, revise before responding.
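If you apply this pattern often, it is easy to append as a reusable suffix. A trivial sketch:

```python
SELF_CHECK = """
Before providing your final answer, evaluate it against these criteria:
1. Does it actually answer the question asked?
2. Is it technically accurate?
3. Does it follow the specified constraints?
If any check fails, revise before responding.
""".strip()

def with_self_check(prompt: str) -> str:
    """Append the self-check block to any task prompt."""
    return f"{prompt}\n\n{SELF_CHECK}"
```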
Goal: Develop intuition for how prompts affect outputs
Exercises:
- Take 5 tasks from your real work. Write prompts for each. Run them. Note where outputs diverge from expectations.
- For each failure, identify: Was it ambiguity? Missing context? Wrong format? Rewrite and test.
- Create a "prompt template" for code review that you can reuse.
Milestone: You can write prompts that work on first try 70% of the time for well-defined tasks.
Goal: Master few-shot, chain-of-thought, and output formatting
Exercises:
- Create a few-shot prompt for converting requirements to test cases. Refine until consistent.
- Compare CoT vs. direct answers for a debugging task. Measure accuracy difference.
- Build a prompt that outputs valid JSON for a schema you define. Handle edge cases.
Milestone: You can reliably get structured, reasoned outputs for complex tasks.
Goal: Build reusable prompt systems
Exercises:
- Design a system prompt for an application (code assistant, documentation helper, etc.)
- Implement a multi-turn workflow for a complex task (e.g., design doc → implementation plan → code)
- Create prompts that work with tool calling
Milestone: You can architect prompt-based systems, not just individual prompts.
Goal: Understand and apply state-of-the-art patterns
Exercises:
- Use Claude Code or similar tool. Observe its prompting patterns. What makes them effective?
- Design an agentic workflow prompt for a task in your domain
- Implement a RAG-style prompt that handles retrieved context gracefully
Milestone: You understand how production AI coding tools work and can customize their behavior.
| Failure Mode | Symptom | Fix |
|---|---|---|
| Ambiguity | Output is plausible but wrong | Add constraints, examples, explicit requirements |
| Context overflow | Model "forgets" early instructions | Move critical instructions to end, summarize context |
| Format drift | Output structure changes mid-response | Use XML tags to structure, add format reminders |
| Hallucination | Confident but incorrect facts | Ask for sources, add "if unsure, say so" |
| Verbosity | Excessive preamble or explanation | "Be concise. Skip preamble. Start directly with [X]" |
| Refusal | Model declines valid task | Reframe task, add context about legitimate purpose |
- Anthropic's Prompting Documentation: https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/overview. Comprehensive and authoritative.
- OpenAI Cookbook: practical examples across use cases.
- LangChain / LlamaIndex documentation: if you want to build applications, these frameworks encode best practices.
- Read production system prompts: when you use AI tools, observe what works. Many system prompts are discoverable or documented.
Want me to dive deeper into any specific level, create more exercises for a particular technique, or work through a concrete example from your domain?