Back to Home
5 Anthropic Prompt Caching Patterns That Cut My API Bill 70%

5 Anthropic Prompt Caching Patterns That Cut My API Bill 70%

B
Blizine Admin
·2 min read·0 views

RAXXO Studios Posted on Jun 1 • Originally published at raxxo.shop 5 Anthropic Prompt Caching Patterns That Cut My API Bill 70% # ai # productivity # claudecode # automation System-prompt caching alone cut repeat-call costs by half Tool definitions cache separately, perfect for agent loops Conversation history caching pays off after turn three 1-hour TTL beats the default 5 minutes for batch jobs My Anthropic API bill dropped 70 percent last month and I did not change a single model. I changed where the cache breakpoints went. Here are the five patterns I now use on every Claude integration I ship. Pattern 1: Cache The System Prompt First The system prompt is the cheapest win and most people skip it. My agents run with a 4,000 token system prompt that explains the role, the output format, the safety rules, and a few examples. That prompt never changes inside a session. Before caching, I paid full input price for those 4,000 tokens on every single call. With an agent that loops 30 times to finish a task, that is 120,000 tokens of pure repetition. The fix is one parameter. I add a cache_control block with type: "ephemeral" to the last content item in the system prompt array. The first call writes the cache and costs slightly more (cache writes carry a small premium). Every call after that reads the cache at roughly one tenth the input price. Here is the rule I follow: the cached block has to be at least 1,024 tokens for Claude Sonnet, or it gets ignored silently. My 4,000 token prompt clears that easily. If your system prompt is short, this pattern does nothing, so do not bother adding the breakpoint to a 200 token instruction. The order matters more than people expect. The cache works as a prefix. Everything before the breakpoint gets stored. Everything after it is read fresh. So I put the stable stuff (role, rules, examples) up top and the volatile stuff (user query, current date) down below the breakpoint. Reorder this wrong and your cache hit rate collapses b

📰Dev.to — dev.to

Comments