Prompt Engineering: From Fundamentals to State of the Art

I'll structure this as a learning path with concrete milestones. Each level builds on the previous, and I'll give you specific exercises to develop intuition—not just theory.


Level 0: Mental Model Foundation

Before any techniques, you need the right mental model for what you're actually doing.

What Prompting Actually Is

When you prompt an LLM, you're not "programming" in the traditional sense. You're providing context that influences probability distributions over token sequences. The model asks: "Given this context, what text is most likely to come next?"

Your job is to construct context that makes the response you want the statistically likely continuation.

Key insight: The model has no goals, no memory between sessions, no understanding of "you" as a persistent entity. Each prompt is evaluated fresh. Everything the model "knows" about the task must be in the prompt itself.

The Context Window as Your Entire Working Memory

Everything happens within the context window—typically 100K-200K tokens for frontier models. This includes:

  • System prompts (instructions from the application)
  • Your prompt
  • Any documents or code you provide
  • The model's response
  • Follow-up conversation

Practical implication: If information isn't in the context, the model doesn't have access to it. If you want the model to follow a coding convention, you must either describe it or show examples of it.

Tokens, Not Characters

Models process tokens, not characters or words. A token is roughly 3-4 characters of English text, though this varies by language and content. Code tokenizes differently than prose. This matters because:

  • Context limits are in tokens
  • Models can struggle with character-level tasks (counting letters, anagrams)
  • Some strings tokenize unexpectedly, affecting model behavior
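
A quick way to build intuition here is to run a tokenizer yourself. A minimal sketch, assuming the tiktoken package (OpenAI's open-source tokenizer); other model families use different tokenizers, so treat the counts as approximate:

# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello, world!", "internationalization", "def foo(x): return x * 2"]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens: {tokens}")

Running short strings through a tokenizer like this also makes it obvious why character-level tasks are awkward: the model never sees individual letters, only token IDs.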

Level 1: Core Techniques

These are the fundamentals that underpin everything else.

1.1 Clear Task Specification

The most common prompting failure is ambiguity. Models will fill gaps with assumptions—often wrong ones.

Weak prompt:

Write a function to process user data

Strong prompt:

Write a Python function called `validate_user_input` that:
- Takes a dictionary with keys: 'email', 'age', 'username'
- Validates email format using regex
- Ensures age is an integer between 13 and 120
- Ensures username is 3-20 alphanumeric characters
- Returns a tuple: (bool, list of error messages)
- Raises no exceptions; all errors go in the error list
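
For comparison, a minimal sketch of what a response satisfying that specification might look like (one reasonable implementation among many, not the canonical answer):

import re

def validate_user_input(data):
    """Validate a dict with 'email', 'age', and 'username' keys; never raises."""
    errors = []

    email = data.get("email")
    if not isinstance(email, str) or not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        errors.append("email is missing or not a valid address")

    age = data.get("age")
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool) or not (13 <= age <= 120):
        errors.append("age must be an integer between 13 and 120")

    username = data.get("username")
    if not isinstance(username, str) or not re.fullmatch(r"[A-Za-z0-9]{3,20}", username):
        errors.append("username must be 3-20 alphanumeric characters")

    return (len(errors) == 0, errors)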

Exercise: Take a task you'd normally give to a junior developer verbally. Write it as a prompt. Now read it as if you knew nothing about your codebase or conventions. What's ambiguous? Rewrite until a stranger could implement it correctly.

1.2 Persona and Context Setting

Models adjust their "voice" based on how you frame the interaction.

You are a senior security engineer reviewing code for vulnerabilities. 
Be thorough and pessimistic—assume attackers are creative.

This isn't roleplay; it's context that shifts which patterns the model draws on. A "security engineer" persona surfaces security-relevant knowledge more readily than a generic assistant framing.

When to use: When you want specialized knowledge or a particular analytical lens applied consistently.

1.3 Output Format Specification

Never leave output format to chance if it matters.

Respond with a JSON object containing:
{
  "analysis": "your analysis here",
  "risk_level": "low" | "medium" | "high",
  "recommendations": ["array", "of", "strings"]
}

Output only valid JSON, no markdown code blocks, no additional text.

Key formats to know:

  • JSON (structured data, API responses)
  • Markdown (documentation, reports)
  • XML (when you need nested structure with attributes)
  • Plain text with delimiters (simpler parsing)
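
Even with a strict format instruction, validate the output in code rather than trusting it. A minimal sketch (the key names match the JSON example above; the fence-stripping guard is a pragmatic assumption, since models occasionally wrap JSON in markdown despite instructions):

import json

REQUIRED_KEYS = {"analysis", "risk_level", "recommendations"}
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}

def parse_review(raw: str) -> dict:
    """Parse and validate the model's JSON response; raise on any violation."""
    text = raw.strip()
    if text.startswith("```"):                # strip markdown fences if present
        text = text.strip("`")
        text = text.removeprefix("json").strip()
    data = json.loads(text)                   # raises json.JSONDecodeError if not valid JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["risk_level"] not in ALLOWED_RISK_LEVELS:
        raise ValueError(f"unexpected risk_level: {data['risk_level']!r}")
    return data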

1.4 Few-Shot Examples

Examples are often more powerful than instructions. The model pattern-matches.

Convert informal requirements to user stories.

Example 1:
Input: "Users should be able to reset their password"
Output: "As a registered user, I want to reset my password via email so that I can regain access to my account if I forget my credentials."

Example 2:
Input: "Need admin dashboard"
Output: "As an administrator, I want a dashboard showing key metrics (active users, error rates, system health) so that I can monitor platform status at a glance."

Now convert:
Input: "Add search functionality"
Output:

Rule of thumb: 2-3 examples usually suffice. More can help with complex patterns but consumes context.
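
If you reuse the same examples across many inputs, it can help to assemble the few-shot prompt programmatically. A minimal sketch using the user-story examples above (the helper name is illustrative):

EXAMPLES = [
    ("Users should be able to reset their password",
     "As a registered user, I want to reset my password via email so that "
     "I can regain access to my account if I forget my credentials."),
    ("Need admin dashboard",
     "As an administrator, I want a dashboard showing key metrics (active users, "
     "error rates, system health) so that I can monitor platform status at a glance."),
]

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], new_input: str) -> str:
    """Assemble a few-shot prompt: task description, worked examples, then the new input."""
    parts = [task, ""]
    for i, (inp, out) in enumerate(examples, start=1):
        parts += [f"Example {i}:", f'Input: "{inp}"', f'Output: "{out}"', ""]
    parts += ["Now convert:", f'Input: "{new_input}"', "Output:"]
    return "\n".join(parts)

print(build_few_shot_prompt("Convert informal requirements to user stories.",
                            EXAMPLES, "Add search functionality"))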

Exercise: Create a few-shot prompt that converts plain English error descriptions into structured bug reports with severity, reproduction steps, and expected vs. actual behavior.

1.5 Chain of Thought (CoT)

For reasoning tasks, asking the model to show its work dramatically improves accuracy.

Without CoT:

Is this code thread-safe? [code]
Answer: Yes or No

With CoT:

Analyze whether this code is thread-safe.

Think through:
1. What shared state exists?
2. What operations modify shared state?
3. Are those operations atomic?
4. What race conditions could occur?

Then conclude with your assessment.

Why it works: The model's reasoning in early tokens influences later tokens. By generating intermediate reasoning, it "primes" itself with relevant analysis before committing to an answer.


Level 2: Intermediate Techniques

2.1 Structured Decomposition

Complex tasks should be broken into explicit steps.

Task: Review this pull request for a payment processing module.

Execute these steps in order:

## Step 1: Security Analysis
Identify any security vulnerabilities, especially:
- Input validation gaps
- Authentication/authorization issues
- Data exposure risks

## Step 2: Logic Correctness
Trace the payment flow and identify any logical errors or edge cases.

## Step 3: Error Handling
Evaluate whether errors are handled gracefully and appropriately logged.

## Step 4: Summary
Provide an overall assessment and list action items by priority.

2.2 Constraint Specification

Be explicit about what you don't want.

Refactor this function to improve readability.

Constraints:
- Do NOT change the function signature
- Do NOT change the observable behavior
- Do NOT add new dependencies
- Keep the function under 30 lines

Why it matters: Models optimize for plausible responses. Without constraints, they may "improve" things in ways that break your requirements.

2.3 Self-Consistency and Verification

Ask the model to check its own work.

Generate SQL for this query, then:
1. Explain what the query does in plain English
2. Identify any edge cases where it might return unexpected results
3. If you find issues, provide a corrected version

This catches errors the model might make on first pass.

2.4 Handling Ambiguity Explicitly

If any requirements are ambiguous, list your assumptions before proceeding.
If critical information is missing, ask clarifying questions instead of guessing.

This prevents the model from confidently proceeding with wrong assumptions.


Level 3: Advanced Techniques

3.1 Multi-Turn Strategy

For complex tasks, break the interaction into stages rather than one massive prompt.

Stage 1: Provide context, ask for analysis
Stage 2: Based on analysis, ask for specific recommendations
Stage 3: Based on recommendations, ask for implementation

This lets you course-correct between stages and keeps each prompt focused.

Anti-pattern: Trying to get everything in one prompt. This often produces lower quality than iterative refinement.

3.2 Prompt Chaining

Use the output of one prompt as input to another.

Prompt 1: "List all the API endpoints in this codebase with their HTTP methods"
↓
Prompt 2: "For each endpoint, identify what authentication is required"
↓  
Prompt 3: "Generate OpenAPI documentation for these endpoints"

This is the manual version of what agentic tools do automatically.
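
A minimal sketch of the same chain in code, assuming the anthropic Python SDK (any model API works the same way; the model name and the codebase placeholder are illustrative):

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def complete(prompt: str) -> str:
    """Send a single-turn prompt and return the reply text."""
    msg = client.messages.create(
        model="claude-sonnet-4-5",   # use whichever model you have access to
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

codebase_summary = "..."  # placeholder: the code or file listing you would provide

# Each prompt consumes the previous prompt's output.
endpoints = complete("List all the API endpoints in this codebase with their HTTP methods:\n\n" + codebase_summary)
auth_map = complete("For each endpoint below, identify what authentication is required:\n\n" + endpoints)
openapi_doc = complete("Generate OpenAPI documentation for these endpoints:\n\n" + endpoints + "\n\n" + auth_map)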

3.3 Negative Examples

Show what you don't want.

Good comment:
// Retry with exponential backoff to handle transient network failures

Bad comment (don't do this):
// increment i
i++

Negative examples are especially useful when models tend toward a problematic pattern.

3.4 Meta-Prompting

Ask the model to help construct the prompt.

I need to write a prompt that will help me review database schemas for 
performance issues. What information should I include in the prompt, and
how should I structure the request to get thorough analysis?

Then use the model's suggestions to build your actual prompt.


Level 4: State-of-the-Art Techniques

These reflect the current frontier of what's possible.

4.1 System Prompt Architecture

For production applications, system prompts follow a structured architecture:

<role>
You are a code review assistant for a fintech company.
</role>

<context>
Our stack: Python 3.11, FastAPI, PostgreSQL, Redis
Coding standards: PEP 8, type hints required, 80% test coverage minimum
Security requirements: PCI-DSS compliance required
</context>

<task>
Review pull requests for code quality, security, and adherence to standards.
</task>

<output_format>
Structure your review as:
1. Summary (2-3 sentences)
2. Critical Issues (blocking)
3. Suggestions (non-blocking)
4. Positive Notes (what's done well)
</output_format>

<guidelines>
- Be constructive, not condescending
- Cite specific line numbers
- Explain *why* something is an issue, not just *that* it is
- If uncertain, say so rather than guessing
</guidelines>
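
In application code, the system prompt is passed separately from the user messages. A brief sketch, again assuming the anthropic Python SDK (SYSTEM_PROMPT stands for the full tagged block above; the model name and diff placeholder are illustrative):

from anthropic import Anthropic

SYSTEM_PROMPT = "<role>\nYou are a code review assistant for a fintech company.\n</role>\n..."  # full tagged block from above

diff_text = "..."  # placeholder: the pull request diff to review

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",      # use whichever model you have access to
    max_tokens=2000,
    system=SYSTEM_PROMPT,           # system prompt goes here, not in the messages list
    messages=[{"role": "user", "content": "Please review this pull request:\n\n" + diff_text}],
)
print(response.content[0].text)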

4.2 Tool Use and Function Calling

Modern models can call external tools. You define available tools, and the model decides when to use them.

{
  "tools": [
    {
      "name": "run_tests",
      "description": "Execute the test suite for a given module",
      "parameters": {
        "module_path": "string: path to the module to test"
      }
    },
    {
      "name": "search_codebase", 
      "description": "Search for code patterns or definitions",
      "parameters": {
        "query": "string: search query"
      }
    }
  ]
}

Prompt engineering for tool use:

  • Write clear, unambiguous tool descriptions
  • Specify when each tool should (and shouldn't) be used
  • Handle tool errors gracefully in your system prompt
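
The application is responsible for actually executing a tool when the model requests it. A minimal dispatch sketch (the tool names match the definitions above; run_tests and search_codebase are stand-in implementations, and the exact shape of tool_call depends on your model API):

import subprocess

def run_tests(module_path: str) -> str:
    """Stand-in: run pytest for one module and return its combined output."""
    result = subprocess.run(["pytest", module_path], capture_output=True, text=True)
    return result.stdout + result.stderr

def search_codebase(query: str) -> str:
    """Stand-in: grep the repository for a pattern."""
    result = subprocess.run(["grep", "-rn", query, "."], capture_output=True, text=True)
    return result.stdout

TOOLS = {"run_tests": run_tests, "search_codebase": search_codebase}

def dispatch(tool_call: dict) -> str:
    """Execute the requested tool; return an error string instead of raising."""
    func = TOOLS.get(tool_call["name"])
    if func is None:
        return f"Error: unknown tool {tool_call['name']!r}"
    try:
        return func(**tool_call["parameters"])
    except Exception as exc:    # surface failures back to the model as text
        return f"Error while running {tool_call['name']}: {exc}"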

4.3 Agentic Prompting

For autonomous agents, prompts define behavior loops:

You are an autonomous coding agent. You can:
- Read files
- Write/modify files
- Run shell commands
- Search the web for documentation

## Execution Loop
1. Analyze the task and break it into steps
2. For each step:
   a. Decide what action to take
   b. Execute the action
   c. Observe the result
   d. Decide if the step is complete or needs iteration
3. Verify the complete solution works
4. Report results

## Guidelines
- Run tests after every code change
- If stuck for more than 3 attempts, ask for clarification
- Never modify files outside the project directory
- Commit logical units of work with descriptive messages
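
Behind a prompt like this sits a simple outer loop in the harness: send the conversation, execute whatever action the model requests, append the result, repeat. A rough sketch (call_model and dispatch are stand-ins for your model client and tool executor; real harnesses add limits on cost, permissions, and file access):

def run_agent(task: str, call_model, dispatch, max_steps: int = 20) -> str:
    """Minimal agent loop: alternate model turns and tool executions until the model stops acting."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)              # assumed to return {"text": str, "tool_call": dict or None}
        messages.append({"role": "assistant", "content": reply["text"]})
        if reply["tool_call"] is None:            # no action requested: the agent considers itself done
            return reply["text"]
        result = dispatch(reply["tool_call"])     # execute the requested action (see 4.2)
        messages.append({"role": "user", "content": f"Tool result:\n{result}"})
    return "Stopped: exceeded max_steps without completing the task."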

4.4 Retrieval-Augmented Generation (RAG) Integration

When working with large codebases or documentation, RAG retrieves relevant chunks and injects them into the prompt.

Prompting for RAG:

I've retrieved the following relevant code sections based on your query:

<retrieved_context>
[File: auth/middleware.py, Lines 45-80]
[code here]

[File: models/user.py, Lines 1-50]  
[code here]
</retrieved_context>

Using the above context, answer: How does the authentication flow work?

If the provided context is insufficient, say what additional information you'd need.

Key techniques:

  • Clearly delineate retrieved content from the prompt
  • Tell the model to acknowledge when context is insufficient
  • Provide metadata (filenames, line numbers) so the model can reference sources
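
A minimal sketch of how the retrieval side can assemble such a prompt (the chunk fields are assumptions about your retrieval layer, not a fixed schema):

def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Wrap retrieved chunks in a delimited block with source metadata, then ask the question."""
    sections = []
    for chunk in chunks:    # each chunk is assumed to carry 'file', 'start', 'end', and 'text'
        sections.append(f"[File: {chunk['file']}, Lines {chunk['start']}-{chunk['end']}]\n{chunk['text']}")
    context = "\n\n".join(sections)
    return (
        "I've retrieved the following relevant code sections based on your query:\n\n"
        f"<retrieved_context>\n{context}\n</retrieved_context>\n\n"
        f"Using the above context, answer: {question}\n\n"
        "If the provided context is insufficient, say what additional information you'd need."
    )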

4.5 Constitutional AI Patterns

Inspired by Anthropic's research—build self-checking into the prompt:

Before providing your final answer, evaluate it against these criteria:
1. Does it actually answer the question asked?
2. Is it technically accurate?
3. Does it follow the specified constraints?
4. Could it cause harm if followed blindly?

If any check fails, revise before responding.
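
The same checks can also run as a separate critique pass rather than inside one prompt. A rough sketch (complete is the single-turn helper from the chaining example in 3.2):

CHECKLIST = """Evaluate the draft answer against these criteria:
1. Does it actually answer the question asked?
2. Is it technically accurate?
3. Does it follow the specified constraints?
4. Could it cause harm if followed blindly?
If any check fails, output a revised answer; otherwise output the draft unchanged."""

def answer_with_self_check(question: str, complete) -> str:
    """Two-pass variant: draft an answer, then ask the model to critique and revise it."""
    draft = complete(question)
    return complete(f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n{CHECKLIST}")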

Learning Path with Milestones

Week 1-2: Foundation

Goal: Develop intuition for how prompts affect outputs

Exercises:

  1. Take 5 tasks from your real work. Write prompts for each. Run them. Note where outputs diverge from expectations.
  2. For each failure, identify: Was it ambiguity? Missing context? Wrong format? Rewrite and test.
  3. Create a "prompt template" for code review that you can reuse.

Milestone: You can write prompts that work on the first try 70% of the time for well-defined tasks.

Week 3-4: Structured Techniques

Goal: Master few-shot, chain-of-thought, and output formatting

Exercises:

  1. Create a few-shot prompt for converting requirements to test cases. Refine until consistent.
  2. Compare CoT vs. direct answers for a debugging task. Measure accuracy difference.
  3. Build a prompt that outputs valid JSON for a schema you define. Handle edge cases.

Milestone: You can reliably get structured, reasoned outputs for complex tasks.

Week 5-6: Production Patterns

Goal: Build reusable prompt systems

Exercises:

  1. Design a system prompt for an application (code assistant, documentation helper, etc.)
  2. Implement a multi-turn workflow for a complex task (e.g., design doc → implementation plan → code)
  3. Create prompts that work with tool calling

Milestone: You can architect prompt-based systems, not just individual prompts.

Week 7-8: Agentic and Advanced

Goal: Understand and apply state-of-the-art patterns

Exercises:

  1. Use Claude Code or similar tool. Observe its prompting patterns. What makes them effective?
  2. Design an agentic workflow prompt for a task in your domain
  3. Implement a RAG-style prompt that handles retrieved context gracefully

Milestone: You understand how production AI coding tools work and can customize their behavior.


Common Failure Modes and Fixes

| Failure Mode | Symptom | Fix |
| --- | --- | --- |
| Ambiguity | Output is plausible but wrong | Add constraints, examples, explicit requirements |
| Context overflow | Model "forgets" early instructions | Move critical instructions to the end, summarize context |
| Format drift | Output structure changes mid-response | Use XML tags to structure, add format reminders |
| Hallucination | Confident but incorrect facts | Ask for sources, add "if unsure, say so" |
| Verbosity | Excessive preamble or explanation | "Be concise. Skip preamble. Start directly with [X]" |
| Refusal | Model declines a valid task | Reframe the task, add context about its legitimate purpose |

Resources for Continued Learning

  1. Anthropic's Prompting Documentation (comprehensive and authoritative): https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/overview

  2. OpenAI Cookbook: Practical examples across use cases

  3. LangChain / LlamaIndex documentation: If you want to build applications, these frameworks encode best practices

  4. Read production system prompts: When you use AI tools, observe what works. Many system prompts are discoverable or documented.


Want me to dive deeper into any specific level, create more exercises for a particular technique, or work through a concrete example from your domain?
