
Hill Climbing Context

The control surface for agents

Most people talk about “getting better at prompting” as if the goal were to discover a perfect incantation that unlocks the model’s intelligence. That framing misses the actual control surface we have.

A modern LLM call is stateless. It does not remember your repo, your last run, or what you meant five minutes ago. It only sees what you send this time. The “context window” is not a mystical entity; it’s the tokens in the request: instructions, the plan, tool outputs, snippets of files, diffs, logs—everything the model can condition on.

Once that clicks, a blunt conclusion follows:

For a fixed model, performance is a function of context quality—because context is the only thing the agent can actually control at inference time.

Everything else—tools, planning mode, subagents, “agentic coding”—is downstream of one thing: repeatedly producing higher-quality context windows.

This is what I call hill climbing context.


1) Two hills

Hill climbing is just iterating uphill until progress stops.

There are two different “hills” worth separating:

Hill climbing the output

This is the common loop: generate something, critique it, regenerate, repeat. UI iteration is the cleanest mental model: “polish… do better… do even better” until improvements plateau or regress.

Hill climbing the context

This is the higher-order move. Instead of pushing the output uphill directly, push uphill on the substrate the model reasons over.

Not: “make it better.” But: “what information would make the next attempt obviously better?”

That shift sounds philosophical. It’s not. It’s mechanical. Context hill climbing is just doing context transformations in a feedback loop.


2) What “context” actually is

A context window is the full input to a single model call. That’s it.

  • The model is stateless: input tokens → output tokens.
  • A chat UI simulates continuity by resending prior tokens plus the new message.
  • An agent expands this by adding tool outputs, summaries, handoffs, and sometimes rewriting or pruning the context.
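
To make the statelessness concrete, here's a minimal sketch in Python. `call_model` is a hypothetical stand-in for whatever completion API you actually use; the shape of a "chat" is the point:

```python
def call_model(messages: list[dict]) -> str:
    """Hypothetical completion call. Stateless: it sees ONLY `messages`."""
    ...  # e.g. an HTTP request to whatever provider you use

history: list[dict] = [{"role": "system", "content": "You are a coding agent."}]

def send(user_text: str) -> str:
    # Continuity is simulated by resending the entire history on every call.
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)   # the full context window, rebuilt each time
    history.append({"role": "assistant", "content": reply})
    return reply

# If a fact never makes it into `history` (or a tool output appended to it),
# it does not exist for the model. There is no other channel.
```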

This has an uncomfortable implication: every call is onboarding a new intelligence “from scratch.” If a fact is only in a human head, it does not exist for the model.

So the job becomes: manufacture the best possible context windows, repeatedly.


3) The three context moves

For a fixed model, there are only three ways to systematically improve performance (plus a technique that makes them scale):

3.1 Delete incorrect context

Incorrect context is the worst kind.

Not because it fails to help, but because it actively pulls the model off-course: it creates tunnel vision. The model may visibly try to move toward the correct answer, but it keeps snapping back toward the false anchor, because that anchor is now inside its world-model.

There’s one nuance: it’s fine to include wrong paths if they’re explicitly labeled as wrong and distilled down to negative evidence:

  • “We tried X; it failed because Y; don’t do X again.”

That is correct context (it describes reality). What’s toxic is carrying a whole transcript of dead-end reasoning forward as if it were useful memory.
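
Mechanically, that distillation is trivial. A sketch (the helper and the example strings are illustrative, not a real API):

```python
def distill_failure(approach: str, reason: str) -> dict:
    """Replace a dead-end transcript with one line that describes reality."""
    return {
        "role": "user",
        "content": (
            f"Constraint: we tried {approach}; it failed because {reason}. "
            f"Do not try {approach} again."
        ),
    }

# Before: 40 messages of the model talking itself into a wall.
# After: one line of correct, negative evidence.
note = distill_failure(
    approach="hand-editing the lockfile",
    reason="CI regenerates it and the diff churns",
)
```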

3.2 Add missing context

Missing context is what tool use is fundamentally for.

A tool call is not “extra intelligence.” It’s a way to turn unknowns into tokens:

  • run tests and capture the failure,
  • inspect a file,
  • search the repo,
  • produce a diff,
  • gather trace output,
  • validate a claim.

The key: tool outputs are only valuable if they are real. A broken feedback channel doesn’t just fail to help—it becomes incorrect context, because the loop starts optimizing around a phantom signal.
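
A minimal sketch of that move, assuming a pytest-based repo (the command and flags are stand-ins for whatever your real feedback channel is):

```python
import subprocess

def run_tests() -> str:
    """Capture real output from the real suite; never paraphrase from memory."""
    result = subprocess.run(
        ["pytest", "-x", "--tb=short"],   # stand-in for your actual test command
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr

# Append the output to the next context window verbatim (or compressed; see 3.4).
# If this channel lies (stale build, wrong env), it becomes incorrect context.
failure = run_tests()
```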

3.3 Remove useless context

Useless context is a tax. It burns tokens and can degrade performance—sometimes mildly, sometimes catastrophically.

There’s a specific failure mode that is consistently brutal: reusing one long context window across unrelated tasks. Cross-task residue behaves like adversarial noise. The model starts pattern-matching against the wrong story.

The highest-leverage operational rule is boring:

If you can /clear, you should /clear.

Most people don’t. Forever-context is convenient, and it’s also a quiet performance killer.

3.4 Compress (the scaling technique)

Compression is how the first three moves scale.

  • Distill failures into a few lines.
  • Convert bloated artifacts into compact representations.
  • Keep the smallest set of constraints that preserves correctness.

Compression can go too far (you can drop the one detail that mattered). That’s not a philosophical objection; it’s a stop condition. Hill climbing can go downhill.
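
A sketch of what a compression pass can look like. The keyword filter here is a deliberately naive assumption, not a recommendation:

```python
# Keep only high-signal lines from a huge log before it enters the context.
SIGNAL = ("FAILED", "Error", "assert", "Traceback")

def compress_log(log: str, budget: int = 40) -> str:
    """Filter to lines likely to matter; cap the total to a line budget."""
    keep = [line for line in log.splitlines() if any(s in line for s in SIGNAL)]
    dropped = len(keep) - budget
    if dropped > 0:
        keep = keep[:budget] + [f"... ({dropped} more matching lines elided)"]
    return "\n".join(keep)

# The stop condition in code form: if the next run regresses after compressing,
# you dropped the one detail that mattered. Loosen the filter.
```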


4) Agents are feedback loops over context

A lot of people treat agents like black boxes. The cleaner framing is:

  • The model is the model.
  • The agent is the loop.

An agent is a system that repeatedly:

  1. sends a context window,
  2. gets output,
  3. uses feedback (often via tools),
  4. builds a better context window for the next step.

This is why the “model without a feedback loop is almost nothing” framing is directionally true. A raw model can generate plausible text. But convergence—especially in engineering tasks—comes from structured feedback that updates the next context window.
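
Spelled out as code, the loop looks something like this. Every helper is a hypothetical stand-in; the shape is the point, not the API:

```python
def call_model(messages): ...               # hypothetical stateless completion call
def is_done(output): ...                    # hypothetical success check (tests green?)
def run_tools(output): ...                  # hypothetical tool runner (tests, grep, diff)
def build_next_context(ctx, out, fb): ...   # the three moves, plus compression

def agent(task: str, max_steps: int = 20) -> str:
    context = [{"role": "user", "content": task}]
    output = ""
    for _ in range(max_steps):
        output = call_model(context)        # 1. send a context window
        if is_done(output):                 # 2. get output, check it
            break
        feedback = run_tools(output)        # 3. gather real feedback
        context = build_next_context(       # 4. build a better window:
            context, output, feedback       #    add missing, delete incorrect,
        )                                   #    remove useless, compress
    return output

# The model never got smarter. The loop improved what the model gets to see.
```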


5) Prompt Rebasing: the pattern that changes everything

This is the move that feels “crazy” in human workflows but becomes rational with agents:

Most people treat generated code as the asset and the prompt as the disposable wrapper. For agentic work, that’s often backwards.

Pattern: Prompt Rebasing

Problem: A run “works,” but it was messy: the model got stuck, used way more context than expected, debugged itself out of a bind, or produced something that feels fragile. Reviewing and patching the output is expensive because the human is the bottleneck.

Move: Throw the code away. Update the original plan/prompt based on hindsight. Rerun from scratch.

The point isn’t drama. The point is that the durable artifact is the prompt-plan-spec that reliably produces good work. Code becomes rebuildable output.

The .bak incident (literal example)

A feature required copying code from a different project into the current one. That other project's folder name ended in .bak. The missing detail: despite the similar name, the .bak folder was not a previous state of this repo. The model interpreted it as repo history, dug into git history, and restored the wrong thing.

That’s missing context turning into incorrect context.

Prompt rebasing is what fixes it at the right layer: update the prompt with a single disambiguating constraint (“this .bak folder is a different project; do not treat it as repo history”), clear the messy context, rerun clean.
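
As a sketch, the whole move is a few lines. The spec path and both helpers are hypothetical:

```python
from pathlib import Path

def discard_generated_code(): ...    # hypothetical: git reset, delete branch, etc.
def run_agent_fresh(spec: str): ...  # hypothetical: new agent run, clean context

def rebase_and_rerun(spec_path: str, hindsight: str) -> None:
    """Fold the lesson into the spec, throw the output away, rerun clean."""
    spec = Path(spec_path)
    spec.write_text(spec.read_text() + f"\nConstraint (from hindsight): {hindsight}\n")
    discard_generated_code()
    run_agent_fresh(spec_path)

rebase_and_rerun(
    "specs/copy-feature.md",   # illustrative path
    "the .bak folder is a separate project; never treat it as repo history",
)
```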


6) Planning mode, used correctly: “plan the plan”

Planning is valuable. Planning is also messy.

The common failure mode is letting planning debris pollute implementation:

  • grep loops,
  • “oh that’s not what I wanted,”
  • reading irrelevant files to find one tiny fact,
  • web searching,
  • backtracking.

By the time the plan exists, the context window contains a graveyard of partial hypotheses and irrelevant residue.

The disciplined workflow is:

  1. Plan.
  2. Clear.
  3. Implement in a fresh context window that contains the finalized plan and only the minimal constraints.

This is why “planning mode” is useful in practice: it encourages planning-first behavior and tends to reduce certain kinds of clutter. But the deeper principle is independent of UI: implementation quality correlates strongly with starting from a clean, high-signal context window.
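
A sketch of the boundary, reusing the hypothetical `call_model` from earlier:

```python
def call_model(messages): ...   # hypothetical stateless completion call

def plan_then_implement(task: str) -> str:
    # Window 1: exploration is allowed to be messy here.
    plan = call_model([{"role": "user", "content": f"Plan only, no code: {task}"}])

    # The /clear happens implicitly: the planning window is simply never reused.

    # Window 2: fresh context with only the finalized plan and minimal constraints.
    return call_model([
        {"role": "system", "content": "Implement exactly this plan."},
        {"role": "user", "content": plan},
    ])
```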


7) Subagents: where the mess belongs

Subagents are a structural solution to the same problem: keep exploration messy, keep implementation clean.

The most important subagent pattern is simple:

  • Let a subagent do the searching, grepping, and circling.
  • Keep only the distilled answer and the final plan in the master context.
  • Throw away everything the subagent did to get there.

Planning is allowed to be chaotic as long as the implementation context is not.

This is why subagents are such a big deal: they act like context garbage collectors. They let the system spend tokens exploring without forcing the implementation run to carry those tokens forward.
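
The garbage collection is visible in code. A sketch, with every helper hypothetical:

```python
def call_model(messages): ...               # hypothetical stateless completion call
def run_tools(output): ...                  # hypothetical tool runner
def build_next_context(ctx, out, fb): ...   # the three moves, plus compression
def distill(ctx) -> str: ...                # hypothetical: answer + plan, nothing else

def research_subagent(question: str) -> str:
    """Explore in an isolated window; return only the distilled answer."""
    ctx = [{"role": "user", "content": question}]
    for _ in range(10):                     # grep, read files, backtrack, circle...
        out = call_model(ctx)
        ctx = build_next_context(ctx, out, run_tools(out))
    return distill(ctx)                     # everything else is garbage-collected

# The master context receives one dense paragraph, not 10,000 tokens of search.
answer = research_subagent("Where is retry logic configured, and why?")
```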


8) Wrong instrumentation: the EVM tracer story

A single incorrect assumption can poison an entire loop.

A concrete example: debugging an Ethereum Virtual Machine implementation against the official EVM test suite. The workflow assumed the existence of a tracer that would diff the local EVM's trace against the reference EVM's trace and show the mismatch. The whole debugging strategy was "debug the diff."

But the tracer/diff tool wasn’t actually working.

That’s not merely missing context; it’s incorrect context about the feedback channel itself. The model spends cycles chasing a signal that isn’t real—trying to reason around it, or trying to repair it, or optimizing a plan that depends on it. A 24/7 loop can burn a day and a pile of credits this way.

Fixing the diff tool flips the regime: once the feedback is real, the loop has ground truth, and the model starts clearing failing tests quickly.
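
The cheap insurance is a smoke test that proves the instrument can detect a planted failure before any loop trusts it. A sketch (the trace shape is invented for illustration):

```python
def validate_tracer(diff_traces) -> None:
    """`diff_traces` is the (hypothetical) local-vs-reference trace differ."""
    local = [{"pc": 0, "op": "PUSH1"}, {"pc": 2, "op": "ADD"}]
    planted = [{"pc": 0, "op": "PUSH1"}, {"pc": 2, "op": "MUL"}]  # known mismatch

    assert diff_traces(local, local) == [], "differ reports phantom mismatches"
    assert diff_traces(local, planted) != [], "differ is blind to a real mismatch"

# Run this once BEFORE a 24/7 loop is allowed to trust the signal.
```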

The general lesson:

A broken feedback channel doesn’t just fail to help—it becomes a false anchor.


9) Vibe coding vs context engineering

There are two legitimate modes of working, and they shouldn’t be confused.

Vibe coding is being part of the loop in a low-friction way—copy-pasting errors, running commands, nudging the model. It’s useful for building intuition (“model empathy”) and for low-stakes momentum.

Context engineering is intentional: aggressively keeping context clean, planning before implementing, using subagents to isolate mess, rebasing prompts instead of patching output, and treating verification as the definition of “done.”

Both have a place. But high-quality, repeatable output at scale comes from context engineering, not from staying in a forever chat and hoping.


10) The endgame: factory making

Prompt rebasing is the local optimization: fix the prompt, rerun the feature.

There’s a “next level up” that emerges naturally: instead of building only a codebase, build a factory of prompts/specs that produces the codebase.

  • Multiple specs exist at once.
  • The specs are refined and reordered (e.g., start with the most impactful for a PoC, or start with the cleanest foundations so later iterations inherit good context).
  • If a step is not worth the time right now, revert the attempt and backlog it without contaminating the rest of the factory.

In that framing, the prompt/spec repository becomes the durable system. Code is the compiled artifact.

This sounds extreme until the economics are acknowledged: for an agent, rerunning from scratch is often cheaper than incremental patching—because the human bottleneck dominates.


11) What this reduces to (the non-overengineered version)

Most workflows here are overkill for most people. The highest-return behavior change is still the simplest:

  1. Clear context when the task changes.
  2. Treat wrong context as a critical bug.
  3. Use real feedback (tests, diffs, trace output) as first-class context.
  4. When a run is messy, rebase the prompt and rerun instead of patching the mess.

Everything else—planning mode, subagents, factories—is an implementation detail around these mechanics.


Closing

A model call is stateless. The only way to make the system “smarter” without changing the model is to make the context better.

So the core skill is not writing perfect prompts. It’s the willingness to run the loop: delete incorrect context, add missing context, remove useless context, compress what remains—and repeat until the run starts clean and executes clean.

In practice, the best practitioners are not prompt writers; they are feedback loop engineers.
