Created December 18, 2025 07:11
Anthropic api limits
The logic is actually quite simple: when Claude Code calls the /v1/messages API, the backend server returns response headers like these:
```
anthropic-ratelimit-unified-5h-reset: 1765944000
anthropic-ratelimit-unified-5h-status: allowed
anthropic-ratelimit-unified-5h-utilization: 0.042598363636363636
anthropic-ratelimit-unified-7d-reset: 1766030400
anthropic-ratelimit-unified-7d-status: allowed
anthropic-ratelimit-unified-7d-utilization: 0.3068459187383675
anthropic-ratelimit-unified-fallback-percentage: 0.5
```
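A minimal sketch of reading these headers, assuming they have already been captured as a plain name-to-value mapping. The function name `parse_unified_limits` and the shape of the returned dict are made up for illustration; only the header names come from the sample above. Treating `reset` as a Unix timestamp in seconds is an assumption based on its magnitude.

```python
from datetime import datetime, timezone

PREFIX = "anthropic-ratelimit-unified-"


def parse_unified_limits(headers):
    """Group the unified rate-limit headers by window name ("5h", "7d")."""
    windows = {}
    for name, value in headers.items():
        name = name.lower()
        if not name.startswith(PREFIX):
            continue
        # "5h-utilization" -> window "5h", field "utilization"
        window, _, field = name[len(PREFIX):].partition("-")
        if field == "reset":
            # Assumption: the value is a Unix timestamp in seconds.
            parsed = datetime.fromtimestamp(int(value), tz=timezone.utc)
        elif field == "utilization":
            parsed = float(value)
        elif field == "status":
            parsed = value
        else:
            continue  # e.g. fallback-percentage, which is not per-window
        windows.setdefault(window, {})[field] = parsed
    return windows


# Header values copied from the sample response above.
sample_headers = {
    "anthropic-ratelimit-unified-5h-reset": "1765944000",
    "anthropic-ratelimit-unified-5h-status": "allowed",
    "anthropic-ratelimit-unified-5h-utilization": "0.042598363636363636",
    "anthropic-ratelimit-unified-7d-reset": "1766030400",
    "anthropic-ratelimit-unified-7d-status": "allowed",
    "anthropic-ratelimit-unified-7d-utilization": "0.3068459187383675",
    "anthropic-ratelimit-unified-fallback-percentage": "0.5",
}

windows = parse_unified_limits(sample_headers)
for window, fields in windows.items():
    print(window, fields["status"], fields["utilization"], fields["reset"].isoformat())
```

Logging one such record per request, next to that response's `usage` object, gives the raw data for the delta comparison.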
So by comparing the deltas in 5h/7d utilization against the input/output/cache token counts of each request, you can derive how each token type is weighted, provided you sample enough data points. You can also try this yourself with a MITM proxy, but be aware that it is potentially a ToS violation, because you have to get past certificate pinning.
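One way to turn sampled deltas into per-token weights is an ordinary least-squares fit of `delta_util ≈ Σ w[f] * tokens[f]`. The sketch below is an illustration, not the script that produced the logs in this gist: `solve_linear` and `fit_weights` are hypothetical helpers, and the samples are synthetic, generated from round-number weights only to show that the fit recovers them. Real samples would be noisier, for example if other requests hit the same account between two measurements.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_weights(samples, fields):
    """Least-squares fit via the normal equations (A^T A) w = A^T b.

    Fails with ZeroDivisionError if the samples do not vary enough
    to separate the fields from each other.
    """
    ata = [[sum(s["usage"][fi] * s["usage"][fj] for s in samples)
            for fj in fields] for fi in fields]
    atb = [sum(s["usage"][fi] * s["delta_util"] for s in samples)
           for fi in fields]
    return dict(zip(fields, solve_linear(ata, atb)))


# Synthetic data: NOT measurements. Generated from made-up weights.
TRUE_W_IN, TRUE_W_OUT = 6e-08, 3e-07
token_counts = [(1000, 10), (200, 500), (5000, 100), (50, 2000)]
samples = [
    {
        "usage": {"input_tokens": i, "output_tokens": o},
        "delta_util": TRUE_W_IN * i + TRUE_W_OUT * o,
    }
    for i, o in token_counts
]

weights = fit_weights(samples, ["input_tokens", "output_tokens"])
print("W_in  =", weights["input_tokens"])
print("W_out =", weights["output_tokens"])
print("W_out / W_in =", weights["output_tokens"] / weights["input_tokens"])
```

The same `fit_weights` call accepts more fields (for example `cache_creation_input_tokens`), as long as the samples vary those counts independently.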
Additional findings:
Unlike standard API rates, 5m cache writes weigh the same as standard input (1x instead of 1.25x), and cache reads are completely free instead of costing 0.1x standard input. So the actual value of the equivalent credit is even higher.
The weighting between standard input and standard output tokens matches standard API rates: output is weighted 5x input (standard rates are $5/$25 per MTok).
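To make the difference concrete, here is a small sketch that scores the same request under both weightings, expressed in "standard input token" units. The two weight tables encode only the multipliers stated above (1h cache writes were not measured, so they are left out). The function name `weighted_tokens` and the example request are hypothetical.

```python
# Multipliers relative to one standard input token, as described above.
SUBSCRIPTION_WEIGHTS = {
    "input_tokens": 1.0,
    "cache_creation_input_tokens": 1.0,   # 5m cache write, observed
    "cache_read_input_tokens": 0.0,       # observed as free
    "output_tokens": 5.0,
}
STANDARD_API_WEIGHTS = {
    "input_tokens": 1.0,
    "cache_creation_input_tokens": 1.25,  # 5m cache write
    "cache_read_input_tokens": 0.1,
    "output_tokens": 5.0,
}


def weighted_tokens(usage, weights):
    """Weighted token total in standard-input-token units."""
    return sum(w * usage.get(field, 0) for field, w in weights.items())


# Hypothetical cache-heavy request, not taken from the experiment logs.
example_usage = {
    "input_tokens": 100,
    "cache_creation_input_tokens": 2000,
    "cache_read_input_tokens": 50000,
    "output_tokens": 500,
}

subscription_units = weighted_tokens(example_usage, SUBSCRIPTION_WEIGHTS)
api_units = weighted_tokens(example_usage, STANDARD_API_WEIGHTS)
print("subscription-weighted units:", subscription_units)
print("API-weighted units:         ", api_units)
print("API / subscription ratio:   ", api_units / subscription_units)
```

The more a workload relies on cache reads, the larger this ratio gets, which is the sense in which the equivalent credit is worth more than a naive API-price conversion suggests.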
Some of my experiment logs, to prove that I didn't make up the numbers /s
```
Δutil_5h (probe before -> cache creation) = 0.00026263636363636246
Δutil_5h (cache creation -> cache read) = 5.454545454544601e-07
cache_creation_input_tokens=4325
cache_read_input_tokens=4325
Cost_cache_create_5h = 6.072517078297398e-08
Cost_cache_read_5h = 1.261166579085457e-10
Cost_cache_create_5h / Standard_input_cost_in_5h = 1.0006936416184973
================================================================================
[compare] derived weights (5h)
W_in (Cost_in_5h) = 6.068307847420574e-08
W_out (Cost_out_5h) = 3.030372626449337e-07
Ratio W_out/W_in = 4.993768778123942
```
Note that Δutil_5h (cache creation -> cache read) is actually caused by the dummy input_tokens and the single output token, not by the cache read itself. The full usage object for that response is:

    "usage": {
      "input_tokens": 3,
      "cache_creation_input_tokens": 4325,
      "cache_read_input_tokens": 0,
      "cache_creation": {
        "ephemeral_5m_input_tokens": 4325,
        "ephemeral_1h_input_tokens": 0
      },
      "output_tokens": 1,
      "service_tier": "standard"
    }
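The arithmetic in the logs can be rechecked in a few lines. This sketch uses only numbers copied from the logs and the usage object above; the variable names are made up. The last step assumes the cache-read request carried the same 3 input tokens and 1 output token, which the gist does not state explicitly.

```python
# Values copied from the experiment log above.
delta_create = 0.00026263636363636246   # probe before -> cache creation
delta_read = 5.454545454544601e-07      # cache creation -> cache read
cached_tokens = 4325
w_in = 6.068307847420574e-08            # derived per-input-token cost (5h)
w_out = 3.030372626449337e-07           # derived per-output-token cost (5h)

# Per-token utilization cost if the whole delta is attributed to the cache.
cost_cache_create = delta_create / cached_tokens
cost_cache_read = delta_read / cached_tokens

print("Cost_cache_create_5h        =", cost_cache_create)
print("Cost_cache_read_5h          =", cost_cache_read)
print("cache create / input weight =", cost_cache_create / w_in)
print("W_out / W_in                =", w_out / w_in)

# Assumption: the cache-read request also had 3 input and 1 output tokens.
# Their expected cost is the same order of magnitude as the whole observed
# delta, consistent with the cache read itself contributing close to nothing.
non_cache_overhead = 3 * w_in + 1 * w_out
print("3*W_in + 1*W_out            =", non_cache_overhead)
print("observed delta on read      =", delta_read)
```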
I love it.