Prompt caching is not like regular output caching (think of a database caching query results).
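- Only the input prompt is cached, not the output; the response is still generated on every request
- The model computes key-value (KV) pairs for each prompt token at every transformer layer
- The cache stores these computed KV pairs, so a repeated prompt prefix doesn't have to be recomputed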
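A toy sketch of the idea. It assumes a single attention layer with scalar stand-ins for the learned weights. `W_K`, `W_V`, `W_Q`, `embed` and `PrefixKVCache` are hypothetical names, not any library's API. The prefix's KV pairs are computed once and reused. Only the new tokens get fresh KV pairs, and the attention output matches a full recompute. In a real model this also holds at deeper layers because attention is causal: a token's KV pairs depend only on the tokens before it.

```python
import math

# Hypothetical scalar "weights"; real models use learned matrices per layer.
W_K, W_V, W_Q = 0.5, 2.0, 1.5


def embed(token: int, position: int) -> float:
    # Hypothetical embedding: token id plus a small positional term.
    return float(token) + 0.01 * position


def compute_kv(tokens, start=0):
    """Compute (key, value) pairs for tokens, numbering positions from `start`."""
    kv = []
    for i, tok in enumerate(tokens, start=start):
        x = embed(tok, i)
        kv.append((W_K * x, W_V * x))
    return kv


def attend(query_token, position, kv):
    """Softmax attention of one query over a list of (key, value) pairs."""
    q = W_Q * embed(query_token, position)
    scores = [q * k for k, _ in kv]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, (_, v) in zip(weights, kv)) / sum(weights)


class PrefixKVCache:
    """Maps a token prefix (as a tuple) to its already-computed KV pairs."""

    def __init__(self):
        self.store = {}
        self.kv_computed = 0  # tokens whose KV pairs were actually computed

    def run(self, prefix, suffix):
        key = tuple(prefix)
        if key not in self.store:
            self.store[key] = compute_kv(prefix)
            self.kv_computed += len(prefix)
        # Only the new (suffix) tokens need fresh KV pairs.
        kv = self.store[key] + compute_kv(suffix, start=len(prefix))
        self.kv_computed += len(suffix)
        tokens = list(prefix) + list(suffix)
        return attend(tokens[-1], len(tokens) - 1, kv)
```

With a 5-token "document" and three 2-token questions, the cache computes KV pairs for 11 tokens instead of 21.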
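Context window contents, e.g. a document followed by a summary request, then other types of questions about the same document: big savings, since the document is processed once and reused for every follow-up.
Good candidates for caching: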
- Documents
- System prompts
- Few-shot examples
- Tool (function) definitions
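Rough arithmetic for the "big savings" case. It assumes the cache stays warm between requests and counts only prompt (prefill) tokens. The token counts and the `prefill_tokens` helper are made up for illustration.

```python
from __future__ import annotations


def prefill_tokens(doc_tokens: int, request_tokens: list[int], cached: bool) -> int:
    """Prompt tokens the model has to process across a series of requests.

    Each request is the same document followed by a short, different request
    (summary, question, ...) whose length is given in request_tokens.
    """
    if not cached:
        # Every request re-processes the full document plus its own text.
        return sum(doc_tokens + n for n in request_tokens)
    # Warm cache: the document's KV pairs are computed once and reused,
    # so later requests only process their own new tokens.
    return doc_tokens + sum(request_tokens)


# Hypothetical workload: a 10,000-token document, one summary request
# (20 tokens) and three follow-up questions (30 tokens each).
requests = [20, 30, 30, 30]
without_cache = prefill_tokens(10_000, requests, cached=False)
with_cache = prefill_tokens(10_000, requests, cached=True)
print(without_cache, with_cache)  # 40110 10110
```

That is about 75% fewer prompt tokens processed. Some providers also bill cache hits at a discount, which varies by provider.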
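- Prefix matching: a cache hit requires the new prompt to start with exactly the same tokens as a cached one; reuse stops at the first token that differs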
- Prompt structure is key: put the stable content first and the part that changes last, e.g.
  - System instructions
  - Manual
  - Few-shot examples
  - Question, e.g. "What are the warranty terms?"
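A sketch of why ordering matters. It treats whitespace-separated words as tokens and uses made-up section sizes. `SYSTEM`, `MANUAL`, `EXAMPLES`, `build_prompt` and `shared_prefix_len` are hypothetical. The reusable part is the length of the shared prefix, so putting the question first breaks the match almost immediately.

```python
from __future__ import annotations

# Hypothetical static sections, as lists of "tokens" (words here).
SYSTEM = ["<system>"] + ["rule"] * 50
MANUAL = ["manual"] * 2000
EXAMPLES = ["example"] * 300


def build_prompt(question: str, question_first: bool = False) -> list[str]:
    """Assemble a prompt; the static sections are identical across requests."""
    static = SYSTEM + MANUAL + EXAMPLES
    q = question.split()
    return q + static if question_first else static + q


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens two prompts have in common (the reusable part)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


first, second = "What are the warranty terms?", "How long does shipping take?"
good = shared_prefix_len(build_prompt(first), build_prompt(second))
bad = shared_prefix_len(build_prompt(first, True), build_prompt(second, True))
print(good, bad)  # 2351 0
```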
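- Needs a prompt of at least ~1024 tokens to be worth it; some providers won't cache shorter prompts at all
- Cache entries are cleared after roughly 5 to 10 minutes of inactivity (provider-dependent)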
- Some providers cache automatically
- Others use explicit caching, where you mark which parts of the prompt should be cached
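A toy model of these rules, not any provider's actual implementation. `ToyPromptCache` and its `cache_until` parameter are hypothetical, and token ids are plain integers. The 1024-token minimum and 300-second TTL are example values; real thresholds and lifetimes vary by provider and model. "Automatic" mode here simply caches the whole prompt. "Explicit" mode caches only up to a caller-marked position.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyPromptCache:
    """Hypothetical provider-side prompt cache (illustration only)."""

    min_tokens: int = 1024      # prefixes shorter than this are never cached
    ttl_seconds: float = 300.0  # an entry expires after this long without use
    # Cached prefix (tuple of token ids) -> time it was last used.
    entries: Dict[Tuple[int, ...], float] = field(default_factory=dict)

    def request(self, tokens: List[int], now: float,
                cache_until: Optional[int] = None) -> int:
        """Process one prompt and return how many leading tokens hit the cache.

        cache_until=None models automatic caching (this toy caches the whole
        prompt); an int models explicit caching, where the caller marks that
        tokens[:cache_until] should be cached.
        """
        # Drop entries that have been idle longer than the TTL.
        for prefix, last_used in list(self.entries.items()):
            if now - last_used > self.ttl_seconds:
                del self.entries[prefix]

        # Prefix matching: an entry is usable only if the prompt starts with it.
        matches = [p for p in self.entries if tuple(tokens[:len(p)]) == p]
        for p in matches:
            self.entries[p] = now  # each hit refreshes the entry's lifetime
        hit = max((len(p) for p in matches), default=0)

        # Store the marked (or whole) prefix if it is long enough to be worth it.
        end = len(tokens) if cache_until is None else min(cache_until, len(tokens))
        if end >= self.min_tokens:
            self.entries[tuple(tokens[:end])] = now
        return hit
```

For example, with `cache_until=2000` on a 2,000-token document, a follow-up 60 seconds later reports 2000 cached tokens. The same request made after more than 300 idle seconds reports 0.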