AI Prompt Caching #ai #prompt #cache #caching

Input Caching

Not the same as regular output caching (think caching a database query result); what gets reused is the model's work on the input, not the response.

Prompt Caching

- Cache only the input prompt, not the output.
    - While reading the prompt, the model computes key-value (KV) pairs at every transformer layer.
    - The cache stores those computed KV pairs, so a repeated prefix does not have to be re-encoded (see the sketch below).
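
As a rough illustration of what the provider's cache holds, here is a minimal sketch using Hugging Face `transformers` (an assumption; the notes don't name a library, and hosted APIs do this server-side). The KV pairs for a shared prefix come back as `past_key_values` and can be fed into the next call so only the new tokens are processed.

```python
# Minimal sketch of KV-cache reuse with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stable prefix (e.g. a long document): encoded once.
prefix_ids = tokenizer("Long shared document text...", return_tensors="pt").input_ids
prefix_out = model(prefix_ids, use_cache=True)
kv_cache = prefix_out.past_key_values  # cached key/value pairs, one set per layer

# A new question appended after the prefix: only these tokens are encoded,
# because the prefix's KV pairs are taken from the cache.
question_ids = tokenizer(" What are the warranty terms?", return_tensors="pt").input_ids
out = model(question_ids, past_key_values=kv_cache, use_cache=True)
```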

Context Window Contents

- Document
- Summary request
- Other follow-up questions
- Big savings, because the document prefix is only paid for in full once (rough arithmetic below)
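
A back-of-the-envelope sketch of the savings; the token counts and the cached-token discount are hypothetical, not from the notes, and the actual discount varies by provider.

```python
# Hypothetical numbers to show the shape of the savings.
doc_tokens = 50_000       # the cached document prefix
question_tokens = 50      # each new question
num_questions = 10
cached_discount = 0.1     # assume cached input tokens bill at 10% of the normal rate

without_cache = num_questions * (doc_tokens + question_tokens)
with_cache = (doc_tokens + question_tokens) \
    + (num_questions - 1) * (doc_tokens * cached_discount + question_tokens)

print(f"billed token-equivalents without caching: {without_cache:,}")   # 500,500
print(f"billed token-equivalents with caching:    {with_cache:,.0f}")   # 95,500
```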

What Can Be Cached?

- Documents
- System prompts
- Few-shot examples
- Tool / function definitions

How Does the LLM Know What Gets Cached?

- Prefix matching: the cache is reused when a new request starts with an identical prefix.
- Prompt structure is key: put the stable content first and the variable part last (see the sketch after this list).
    - System instructions
    - Manual (or other reference document)
    - Few-shot examples
    - Question, the only part that changes, e.g. "What are the warranty terms?"
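
A sketch of the ordering this implies for a chat-style request. The field names follow the common OpenAI-style messages format, and the instructions, manual text, and few-shot turns are placeholders, not content from the notes.

```python
# Hypothetical placeholders for the stable parts of the prompt.
SYSTEM_INSTRUCTIONS = "You are a support assistant for the ACME router."
PRODUCT_MANUAL = "<full product manual text goes here>"

# Stable, cacheable prefix first; the variable question last,
# so prefix matching can reuse everything above it.
messages = [
    {"role": "system", "content": SYSTEM_INSTRUCTIONS},             # 1. system instructions
    {"role": "user", "content": PRODUCT_MANUAL},                     # 2. manual / document
    {"role": "user", "content": "Example: How do I reset it?"},      # 3. few-shot example (fixed)
    {"role": "assistant", "content": "Hold the reset button 10s."},  #    few-shot answer (fixed)
    {"role": "user", "content": "What are the warranty terms?"},     # 4. the only part that varies
]
```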

How Much To Cache?

- Most providers need a minimum of roughly 1024 tokens in the prefix for caching to kick in.
- Caches are typically cleared after about 5-10 minutes to keep data fresh.
- Some providers cache automatically based on prefix matching.
- Others use explicit caching:
    - They ask you to mark which parts of the prompt should be cached (see the sketch below).
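
For explicit caching, the marking looks roughly like the `cache_control` breakpoints in Anthropic's Messages API; this is a sketch based on that API, so verify the exact fields, model name, and the 1024-token minimum against the provider's docs.

```python
# Sketch of explicit prompt caching in the style of Anthropic's Messages API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system=[
        {"type": "text", "text": "You are a support assistant."},
        {
            "type": "text",
            "text": "<full product manual text, at least ~1024 tokens>",
            # Marks the end of the prefix to cache; reused for a few minutes.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "What are the warranty terms?"}],
)
print(response.content[0].text)
```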