andrew-kramer-inno · December 18, 2025 07:11 · LawtonRescue · Dec 20, 2025 · RahulSDeshpande · Dec 22, 2025
The logic is actually quite simple: when the /v1/messages API is called through Claude Code, the backend server returns something like
 ```
 anthropic-ratelimit-unified-5h-reset: 1765944000
 anthropic-ratelimit-unified-5h-status: allowed
 anthropic-ratelimit-unified-5h-utilization: 0.042598363636363636
 anthropic-ratelimit-unified-7d-reset: 1766030400
 anthropic-ratelimit-unified-7d-status: allowed
 anthropic-ratelimit-unified-7d-utilization: 0.3068459187383675
 anthropic-ratelimit-unified-fallback-percentage: 0.5
 ```
 in response headers.
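A minimal Python sketch for reading these headers, assuming you already have the response headers as a plain dict (how you capture them is out of scope here) and that the `-reset` value is a Unix timestamp in seconds, which fits the sample values. `parse_unified_ratelimit` and `WindowStatus` are hypothetical names; only the header names come from the sample above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

PREFIX = "anthropic-ratelimit-unified-"
FIELDS = ("status", "utilization", "reset")


@dataclass
class WindowStatus:
    status: str          # e.g. "allowed"
    utilization: float   # fraction of the window's budget used (0.0 to 1.0)
    reset: datetime      # window reset time (assumed Unix seconds, UTC)


def parse_unified_ratelimit(headers: dict) -> dict:
    """Group anthropic-ratelimit-unified-<window>-<field> headers by window ("5h", "7d")."""
    raw: dict = {}
    for name, value in headers.items():
        name = name.lower()  # HTTP header names are case-insensitive
        if not name.startswith(PREFIX):
            continue
        window, _, field = name[len(PREFIX):].partition("-")
        if field in FIELDS:  # skips e.g. "fallback-percentage"
            raw.setdefault(window, {})[field] = value
    return {
        window: WindowStatus(
            status=f["status"],
            utilization=float(f["utilization"]),
            reset=datetime.fromtimestamp(int(f["reset"]), tz=timezone.utc),
        )
        for window, f in raw.items()
        if all(k in f for k in FIELDS)  # keep only complete windows
    }


sample = {
    "anthropic-ratelimit-unified-5h-reset": "1765944000",
    "anthropic-ratelimit-unified-5h-status": "allowed",
    "anthropic-ratelimit-unified-5h-utilization": "0.042598363636363636",
    "anthropic-ratelimit-unified-7d-reset": "1766030400",
    "anthropic-ratelimit-unified-7d-status": "allowed",
    "anthropic-ratelimit-unified-7d-utilization": "0.3068459187383675",
    "anthropic-ratelimit-unified-fallback-percentage": "0.5",
}
for window, ws in parse_unified_ratelimit(sample).items():
    print(window, ws.status, ws.utilization, ws.reset.isoformat())
```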
So by comparing the changes (deltas) in 5h/7d utilization against the input/output/cache token counts of each response, you can work out how each token type is weighted, provided you sample enough data points. You can also try this yourself with a MITM proxy, but be aware that it is a potential violation of the ToS, because you have to get past certificate pinning.
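The "sample enough data points" step can be sketched as an ordinary least-squares fit with no intercept. This assumes each sample pairs one response's token counts (from its usage object) with the change in 5h utilization since the previous response, and that utilization grows linearly in weighted tokens. The post does not say which fitting method was used; OLS is a generic stand-in, and `fit_weights` plus the feature order are hypothetical.

```python
def fit_weights(samples: list) -> list:
    """Least-squares fit of delta_util ~= sum(w_j * x_j), no intercept.

    samples: list of (features, delta_util) pairs; features are token counts
    in a fixed order. Solves the normal equations (X^T X) w = X^T y using
    Gauss-Jordan elimination with partial pivoting.
    """
    k = len(samples[0][0])
    ata = [[0.0] * k for _ in range(k)]
    aty = [0.0] * k
    for x, y in samples:
        for i in range(k):
            aty[i] += x[i] * y
            for j in range(k):
                ata[i][j] += x[i] * x[j]
    # Augmented matrix [X^T X | X^T y].
    m = [ata[i] + [aty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular system: samples do not vary enough per token type")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, k + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][k] / m[i][i] for i in range(k)]


# Synthetic check (not real measurements). Feature order (hypothetical):
# [input_tokens, cache_creation_input_tokens, cache_read_input_tokens, output_tokens]
true_w = [6e-8, 6e-8, 0.0, 3e-7]
xs = [[3, 4325, 0, 1], [3, 0, 4325, 1], [100, 0, 0, 50], [2000, 0, 0, 10], [50, 1000, 2000, 300]]
samples = [(x, sum(w * v for w, v in zip(true_w, x))) for x in xs]
print(fit_weights(samples))  # should recover true_w up to floating-point error
```

Real utilization values are rounded and other traffic may land in the same window, so in practice you would want many more samples than features.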

 Additional findings:

Unlike standard API pricing, 5-minute cache writes are weighted the same as standard input (1x instead of 1.25x), and cache reads are completely free instead of costing 0.1x standard input. So the actual value of the equivalent credit is even higher.
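To make the multiplier difference concrete, a small sketch that converts a request's token mix into "standard input token equivalents" under both schemes. The multipliers are the ones stated above; the dict keys and the example turn are hypothetical.

```python
# Multipliers relative to one standard (uncached) input token.
API_MULTIPLIERS = {"input": 1.0, "cache_write_5m": 1.25, "cache_read": 0.1}
# Weights observed for the subscription's 5h window (per the findings above).
OBSERVED_MULTIPLIERS = {"input": 1.0, "cache_write_5m": 1.0, "cache_read": 0.0}


def input_equivalents(tokens: dict, multipliers: dict) -> float:
    """Weighted sum of token counts, in units of standard input tokens."""
    return sum(multipliers[kind] * count for kind, count in tokens.items())


# Hypothetical turn: re-reads a large cached prefix, writes a little new cache.
turn = {"input": 500, "cache_write_5m": 2_000, "cache_read": 100_000}
api = input_equivalents(turn, API_MULTIPLIERS)            # 500 + 2500 + 10000
observed = input_equivalents(turn, OBSERVED_MULTIPLIERS)  # 500 + 2000 + 0
print(api, observed, api / observed)
```

The more a workload leans on cache reads, the wider the gap between the two numbers gets.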

The weighting between standard input and standard output tokens is the same as on the API: output counts 5x input (standard rates are $5/$25 per MTok).
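At the quoted rates the 5x ratio falls straight out of the prices. A minimal sketch of API-equivalent pricing for uncached input and output only (cache multipliers are left out; the constant and function names are illustrative):

```python
INPUT_USD_PER_MTOK = 5.0    # standard input rate quoted above
OUTPUT_USD_PER_MTOK = 25.0  # standard output rate quoted above


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """API-equivalent cost in USD for uncached input and output tokens."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# One output token costs as much as five input tokens.
print(OUTPUT_USD_PER_MTOK / INPUT_USD_PER_MTOK)
print(api_cost_usd(5, 0) == api_cost_usd(0, 1))
```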

 Some of my experiment logs, to prove that I didn't make up the numbers /s
 ```
 Δutil_5h (probe before -> cache creation) = 0.00026263636363636246

 Δutil_5h (cache creation -> cache read) = 5.454545454544601e-07

 cache_creation_input_tokens=4325

 cache_read_input_tokens=4325

 Cost_cache_create_5h = 6.072517078297398e-08

 Cost_cache_read_5h = 1.261166579085457e-10

 Cost_cache_create_5h / Standard_input_cost_in_5h = 1.0006936416184973

 ================================================================================

 [compare] derived weights (5h)

 W_in (Cost_in_5h) = 6.068307847420574e-08

 W_out (Cost_out_5h) = 3.030372626449337e-07

 Ratio W_out/W_in = 4.993768778123942
 ```
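The per-token figures in the log appear to be the Δutilization divided by the token count. A quick Python check that recomputes them from the logged values (variable names mirror the log; no new data here):

```python
# Values copied from the log above.
delta_util_create = 0.00026263636363636246  # probe before -> cache creation
delta_util_read = 5.454545454544601e-07     # cache creation -> cache read
cache_tokens = 4325                         # cache_creation_input_tokens == cache_read_input_tokens
w_in = 6.068307847420574e-08                # W_in (Cost_in_5h)
w_out = 3.030372626449337e-07               # W_out (Cost_out_5h)

cost_cache_create = delta_util_create / cache_tokens  # ~6.07e-08 per token
cost_cache_read = delta_util_read / cache_tokens      # ~1.26e-10 per token

print(cost_cache_create, cost_cache_read)
print(cost_cache_create / w_in)  # ~1.0007: cache writes weigh about the same as input
print(w_out / w_in)              # ~4.99: output weighs about 5x input
```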
Note that Δutil_5h (cache creation -> cache read) is actually caused by the dummy input_tokens and the single output token; the full usage object for that response is:

 "usage": {
    "input_tokens": 3,
    "cache_creation_input_tokens": 4325,
    "cache_read_input_tokens": 0,
    "cache_creation": {
        "ephemeral_5m_input_tokens": 4325,
        "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 1,
    "service_tier": "standard"
 }
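Putting the findings together, a usage object like this can be turned into a rough prediction of Δutil_5h. A minimal sketch, assuming the weights above (5-minute cache writes at 1x input, cache reads free, output about 5x input) hold in general. `estimate_delta_util_5h` is a hypothetical name, and since the post says nothing about 1-hour cache writes, the sketch refuses to price them rather than guess.

```python
W_IN = 6.068307847420574e-08   # derived 5h weight per input token (from the log)
W_OUT = 3.030372626449337e-07  # derived 5h weight per output token (from the log)


def estimate_delta_util_5h(usage: dict) -> float:
    """Predict the change in 5h utilization caused by one response's usage."""
    cache = usage.get("cache_creation", {})
    if cache.get("ephemeral_1h_input_tokens", 0):
        raise ValueError("1h cache writes were not measured; no weight available")
    # 5m cache writes count like standard input (1x).
    weighted_input = usage.get("input_tokens", 0) + usage.get("cache_creation_input_tokens", 0)
    # cache_read_input_tokens is deliberately ignored: cache reads were observed to be free.
    return weighted_input * W_IN + usage.get("output_tokens", 0) * W_OUT


usage = {
    "input_tokens": 3,
    "cache_creation_input_tokens": 4325,
    "cache_read_input_tokens": 0,
    "cache_creation": {"ephemeral_5m_input_tokens": 4325, "ephemeral_1h_input_tokens": 0},
    "output_tokens": 1,
    "service_tier": "standard",
}
print(estimate_delta_util_5h(usage))
```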