AI Prompt Caching #ai #prompt #cache #caching

Input Caching

Not like regular output caching (think database).

Prompt Caching

- Cache only the input prompt not the output.
	- Model computes key value pairs at every transformer layer
	- Cache stores the computed kv pairs

Context Window Contents - Doc - Summary - Some other type of questions - Big savings

What Can Be Cached?

- Documents
- System Prompts
- Few Shot Examples
- Tool Function definitions

How Does LLM Know What Gets Cached?

- Prefix matching
- Prompt Structure is key
	- System Instructions
    - Manual
    - Few Shot Examples
  Question
  	- What is the warranty terms?

How Much To Cache?

- Need minimum 1024 tokens to make it worth it
- Cleared in 5-10 minutes to keep data fresh
- Some providers provide automatic caching
- Some do explicit caching
	- Asks you to mark what should be cached

jcohen66/ai_prompt_caching.md

Select an option

No results found

Select an option

No results found

Input Caching

Prompt Caching

What Can Be Cached?

How Does LLM Know What Gets Cached?

How Much To Cache?