brozi.codes

Why Claude Code Burns So Many Tokens (And How to Cut It in Half)

You ran a two-hour Claude Code session, typed /cost, and felt that specific kind of sting — the one where the number is higher than you expected but not high enough to justify stopping.

You're not imagining it. Heavy users report API-equivalent usage of $5,000 or more per month. Even on the Max plan at $200/month, the usage ceiling arrives faster than most developers expect. The instinct is to assume the cost comes from the work Claude is doing: the code it writes, the bugs it finds, the refactors it plans.

That instinct is wrong.

Most of Claude Code's token spend isn't on the code it writes — it's on the context it carries.

This article covers the three root causes behind that, then the concrete fixes ordered by impact. By the end you'll have a clear picture of where the waste lives and how to eliminate most of it.


Section 1: How Claude Code Actually Counts Tokens

This is the section most optimization articles skip. It's also the one that makes everything else make sense.

1.1 It's not per message — it's cumulative

Claude Code doesn't bill you only for the tokens in your message. Each turn is billed for your message plus everything already in the session: every previous message, every file read, every tool result, and every response Claude has generated since you started.

That compounds fast:

  • Message 1: ~20K tokens
  • Message 10: ~60K tokens
  • Message 50: ~200K tokens — not because your question got harder, but because Claude re-reads all 49 previous exchanges to answer it

Long sessions aren't just expensive; their cost grows faster than their length. Each new turn re-sends a larger context than the last, so the running total grows roughly with the square of the turn count, even if you're asking simpler questions.
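To make the compounding concrete, here is a minimal sketch. It assumes the context starts at 20K tokens and grows by a flat 4K per turn, which roughly matches the figures above; the function name and the growth rate are illustrative, and the model ignores any prompt-caching discounts.

```python
def cumulative_input_tokens(turns: int, base: int = 20_000, growth: int = 4_000) -> int:
    """Total input tokens sent across a session whose context grows linearly.

    base and growth are assumed values chosen to approximate the
    20K / 60K / 200K progression described in the text.
    """
    total = 0
    for turn in range(turns):
        context = base + growth * turn  # everything re-sent on this turn
        total += context
    return total


one_long_session = cumulative_input_tokens(50)
five_short_sessions = 5 * cumulative_input_tokens(10)
print(one_long_session, five_short_sessions)
```

Under these assumptions, one 50-turn session sends 5,900,000 input tokens, while the same 50 turns split into five fresh 10-turn sessions send 1,900,000.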

1.2 The hidden overhead most developers never see

Three sources of silent token spend that don't show up in your messages:

MCP servers. Each connected server loads its full tool schema into every single message. One server adds roughly 18K tokens of overhead per turn, before you've typed a word. If you have five servers connected, you're spending 90K tokens per turn just on tool definitions. In one documented workflow, switching to deferred tool loading cut a pipeline from 51K to 8.5K tokens. That is a reduction of roughly 83%, from MCP overhead alone, not from any change in the actual task.
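The per-turn overhead multiplies across a session. A rough sketch, taking the 18K-per-server figure above at face value (real schemas vary by server, and the function name is illustrative):

```python
TOKENS_PER_SERVER = 18_000  # rough per-server estimate from the text, not a measured constant


def mcp_overhead_tokens(servers: int, turns: int) -> int:
    """Tokens spent on tool definitions alone over a whole session."""
    return servers * TOKENS_PER_SERVER * turns


print(mcp_overhead_tokens(servers=5, turns=1))   # overhead on a single turn
print(mcp_overhead_tokens(servers=5, turns=50))  # overhead across a 50-turn session
```

With five servers, a 50-turn session would carry 4.5M tokens of tool definitions under this estimate.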

File reads. Claude Code can read your entire working directory to build context. On a typical project this includes node_modules/, lock files, build artifacts, generated code, and binary assets — none of which help Claude write a function, all of which cost tokens to load.

Output verbosity. Claude's responses are output tokens, and on Sonnet output tokens cost five times as much as input tokens. Long explanations, summaries of what Claude is about to do, and restatements of the problem before answering it all cost money. Most developers have never asked Claude to be concise, and it won't be by default.

1.3 The 200K threshold nobody talks about

Once the input to a single request exceeds 200K tokens, Anthropic charges a higher long-context rate. The input price doubles and the output price rises by 50%:

  • Sonnet: $3/M → $6/M on input, $15/M → $22.50/M on output

This threshold can be crossed mid-session without any warning. A long context window with several file reads and verbose responses is enough to push a routine turn past 200K. Most developers don't know this happens, which means they're paying premium rates for the later part of their sessions without realizing it.
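A small sketch of what crossing the threshold does to a single request, using the Sonnet rates listed above. It assumes the higher rate applies to the whole request once input exceeds 200K tokens; the function name is illustrative.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens per request


def sonnet_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the Sonnet rates quoted in the text."""
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    input_rate = 6.00 if long_context else 3.00     # USD per million input tokens
    output_rate = 22.50 if long_context else 15.00  # USD per million output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


print(sonnet_request_cost(190_000, 2_000))
print(sonnet_request_cost(210_000, 2_000))
```

At 190K input and 2K output the request costs $0.60. At 210K input with the same output it costs about $1.31, more than double for roughly 10% more context.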


Section 2: The Three Root Causes

Root Cause 1: Sessions run too long

The developer instinct is to keep a session going. Switching costs feel real — re-explaining context, re-loading files, re-orienting Claude. So sessions stretch to two, three, four hours.

The counterintuitive truth: a 20-minute session with /clear between tasks costs less than a two-hour marathon and produces equally good output. The reason isn't that Claude forgets things. It's that a fresh session starts with a clean context window, so every turn costs a fraction of what the same turn would cost at hour three.

Long sessions make individual tasks cheaper to describe but drastically more expensive to execute.

Root Cause 2: Claude reads everything, not just what it needs

Without constraints, Claude explores your entire project directory to build context. On a typical Node.js project this can include:

  • node_modules/ — thousands of files, none of which help Claude write your code
  • package-lock.json, yarn.lock — enormous, semantically useless for most tasks
  • Build artifacts and generated code — things Claude produced in a previous session
  • Binary assets and images — things Claude cannot meaningfully process
  • Runtime logs and data files — things that change every run

The tragic part: none of this helps. A function that validates a form input does not benefit from Claude having read your lockfile. The tokens spent reading it are pure overhead.

Root Cause 3: Tools and hooks don't compress their output

Every tool call's result enters the context window at full size. Run git log --oneline in a repo with 500 commits: 500 lines of output, all of it in context forever. Run a verbose test suite: every passing test, every timing line, every stack frame — all there, taking up space that future turns have to re-read.

This is structural. It's not that Claude is doing anything wrong. The default behavior is to capture and preserve tool output faithfully. The problem is that faithful preservation of verbose output is expensive, and most of that output is irrelevant to the actual task.


Section 3: The Fixes, Ordered by Impact

Fix 1: Add a .claudeignore file (5 minutes, immediate impact)

This is the highest-leverage 5-minute change available. .claudeignore works exactly like .gitignore — it prevents Claude from indexing files it has no reason to read.

For a standard Node.js project:

node_modules/
dist/
.next/
build/
coverage/
*.lock
package-lock.json
*.min.js
*.map
*.log
.env*

For Python:

__pycache__/
.venv/
venv/
*.pyc
.pytest_cache/
dist/
build/
*.egg-info/

On projects with a large node_modules/ or significant generated output, this single change is often the biggest token reduction available — potentially hundreds of thousands of tokens saved per session. Do this before anything else.
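To get a feel for what a set of ignore patterns would exclude, here is a rough estimator. It is a sketch only: it handles just the two pattern shapes used in the lists above (directory names ending in "/" and filename globs), not full .gitignore semantics, and it assumes about 4 characters per token, which is a common rule of thumb rather than an exact figure. The function names and example paths are hypothetical.

```python
from fnmatch import fnmatch


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if a forward-slash path matches any simplified ignore pattern."""
    parts = path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match any directory component of the path.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            # Filename glob: match against the final path component only.
            return True
    return False


def estimated_tokens_saved(file_sizes: dict[str, int], patterns: list[str]) -> int:
    """Sum the approximate token count of ignored files (assumes ~4 chars per token)."""
    return sum(size // 4 for path, size in file_sizes.items() if is_ignored(path, patterns))


# Hypothetical file sizes in characters.
files = {
    "node_modules/react/index.js": 4_000,
    "src/app.ts": 8_000,
    "yarn.lock": 400_000,
    ".env.local": 40,
}
print(estimated_tokens_saved(files, ["node_modules/", "*.lock", ".env*"]))
```

Running the same estimate over a real directory listing shows quickly which patterns account for most of the excluded volume.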

Fix 2: Use /clear between tasks, not just at the end

Treat each task as its own session. The context you built debugging a race condition is dead weight when you switch to building a new feature. Carrying it costs tokens on every subsequent turn, and it doesn't help.

The rule of thumb: if the new task doesn't require understanding the old task's context, use /clear. The few seconds of re-orientation are worth it.

When you want to preserve a summary of prior work without paying for the full transcript, use /compact instead. It distills the session into a condensed context and continues from there.

Fix 3: Write a tight CLAUDE.md

CLAUDE.md is injected into every session. A 300-line CLAUDE.md means 300 lines' worth of tokens in every single turn, including the turns where you're tweaking CSS and the deployment architecture section is completely irrelevant.

Rules for a lean CLAUDE.md:

  • Stay under 100 lines. If it's longer, it's probably documenting things Claude already knows.
  • Architecture decisions only. Not style preferences, not lists of best practices, not things that are true of every project.
  • No redundant context. Don't document your tech stack if it's obvious from your package.json.
  • Split by domain on large projects. Claude loads CLAUDE.md files selectively — a src/payments/CLAUDE.md is only loaded when you're working in that directory.

A well-scoped CLAUDE.md pays dividends on every single session for the lifetime of the project.

Fix 4: Use hooks to compress bash output

This is where structural fixes become programmatic — and where discipline becomes automatic.

Hooks fire deterministically, unlike instructions in CLAUDE.md. A PostToolUse hook that strips and truncates bash output runs on every command, every time, without exception. It can't be forgotten, misinterpreted, or skipped.

BroziCode's brozi_run tool implements this at the tool level: it strips ANSI escape codes and caps output before it enters the context window. A command that would have added 15K tokens adds 2K instead. The errors and warnings are preserved — the noise is eliminated.

For teams, this is particularly valuable: you configure it once, and every developer on every machine benefits from the compression automatically.
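The compression step itself is simple to sketch. The function below is a hypothetical illustration of the idea (strip ANSI codes, keep the head and tail, preserve error and warning lines from the middle); it is not brozi_run's actual implementation, and it does not show how a hook would be wired into Claude Code.

```python
import re

# Matches ANSI CSI escape sequences such as colour codes.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# Lines worth keeping even when they fall in the truncated middle (assumed keywords).
IMPORTANT = re.compile(r"error|warning|fail", re.IGNORECASE)


def compress_output(text: str, head: int = 10, tail: int = 10) -> str:
    """Strip ANSI codes and cap output, keeping error and warning lines."""
    lines = ANSI_ESCAPE.sub("", text).splitlines()
    if len(lines) <= head + tail:
        return "\n".join(lines)
    middle = lines[head:len(lines) - tail]
    kept = [line for line in middle if IMPORTANT.search(line)]
    marker = f"... [{len(middle) - len(kept)} lines omitted] ..."
    return "\n".join(lines[:head] + kept + [marker] + lines[len(lines) - tail:])


# Simulated verbose test run: 100 coloured lines with one failure in the middle.
raw = ["\x1b[32mPASS test_%d\x1b[0m" % i for i in range(100)]
raw[50] = "ERROR: boom"
print(compress_output("\n".join(raw), head=2, tail=2))
```

Keeping both the head and the tail matters because many tools print the command banner first and the summary last; the keyword filter is a crude stand-in for whatever a real compressor treats as signal.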

→ Install BroziCode

Fix 5: Stop using Opus for everything

Opus costs $5/$25 per million tokens (input/output). Sonnet costs $3/$15. For routine coding tasks — fixing a bug, refactoring a function, updating tests — Sonnet produces equivalent results at roughly 40% lower cost.
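The 40% figure falls straight out of the price ratio. A quick sketch using the rates quoted above, with a hypothetical session of 2M input tokens and 100K output tokens:

```python
# USD per million tokens as (input, output), taken from the rates quoted in the text.
PRICES = {
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a session on the given model."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


opus = session_cost("opus", 2_000_000, 100_000)
sonnet = session_cost("sonnet", 2_000_000, 100_000)
print(opus, sonnet, 1 - sonnet / opus)
```

For that session the estimate is $12.50 on Opus against $7.50 on Sonnet. Because both the input and output rates differ by the same ratio, the saving is 40% regardless of the input/output mix.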

The practical strategy:

  • Haiku for fast exploration, file discovery, and classification
  • Sonnet as the default for coding tasks
  • Opus only for complex architectural decisions or multi-step planning problems where reasoning quality materially affects the outcome

The /model command lets you switch mid-session. Most developers never switch, which means they're paying Opus rates for tasks that don't need it.

Fix 6: Ask targeted questions

"How does the authentication system work?" makes Claude read eight files and produce a 600-token explanation that covers things you don't need to know.

"In src/auth/session.ts, how does the token refresh logic decide when to re-issue?" costs a fraction of that and gets you a direct answer.

The more specific your question, the less context Claude pulls to answer it. This sounds obvious. It isn't practiced consistently — most developers communicate with Claude Code the way they'd write a Google query, and Claude treats the ambiguity as an invitation to be thorough.


Section 4: What BroziCode Automates for You

The fixes above work. The problem is they require discipline: discipline to add .claudeignore to every new project, discipline to use /clear at the right moment, discipline to keep CLAUDE.md trim as the project evolves.

BroziCode bakes the structural fixes into hooks that run automatically — no discipline required after installation:

SessionStart hook. Initializes savings tracking and builds a PageRank-weighted repo map from your project's import graph. Claude gets structural understanding of the codebase without reading every file. Typically reduces initial context loading by 60–80% compared to unguided exploration.
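For readers unfamiliar with the idea, here is a minimal sketch of PageRank over an import graph. It illustrates the general technique only, not BroziCode's implementation: the file names are hypothetical, edges point from the importing file to the imported one, and the damping factor and iteration count are conventional defaults.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank. graph maps each file to the files it imports."""
    nodes = set(graph) | {target for targets in graph.values() for target in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A file that imports nothing spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


imports = {
    "app.ts": ["auth.ts", "db.ts"],
    "auth.ts": ["db.ts", "utils.ts"],
    "db.ts": ["utils.ts"],
    "routes.ts": ["auth.ts", "utils.ts"],
}
ranks = pagerank(imports)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

Files that many other files depend on rise to the top, which is why a ranked map can point an agent at the structurally important modules first.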

PreToolUse hook. Hard-blocks native Read, Grep, and Glob calls before they fire. Instead of reading a 500-line file into context, Claude gets redirected to brozi_smart_search, which returns an AST skeleton — signatures, exports, and imports — instead of the full source. A 2,000-line file becomes 150 lines.
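An AST skeleton is easy to demonstrate with Python's standard ast module. The sketch below is an illustration on Python source, not how brozi_smart_search works internally: it keeps top-level imports, function signatures, and class method signatures, and drops every function body. The helper names and the sample source are hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast


def _signature(node) -> str:
    """Render a function definition as a one-line signature with an elided body."""
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"{prefix} {node.name}({ast.unparse(node.args)}){returns}: ..."


def skeleton(source: str) -> str:
    """Return imports and signatures from Python source, without function bodies."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            lines.append(ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append("    " + _signature(item))
    return "\n".join(lines)


example = '''
import os

def refresh(token: str) -> bool:
    x = 1
    return True

class Session:
    def expire(self):
        pass
'''
print(skeleton(example))
```

The skeleton tells the model what exists and how to call it; the full body only needs to be read for the one function actually being changed.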

PostToolUse hook. Strips ANSI escape codes and truncates output on every bash command and file read, before the result enters the context window. Runs every time, without exception.

PreCompact/PostCompact hooks. Before compaction, snapshots the recently accessed files and current git diff to a recovery file. After compaction, re-anchors the agent to its tool constraints and points it to the snapshot. Continuity is preserved without re-reading everything from scratch.

The status line shows live session savings — roundtrips saved, estimated tokens, dollar estimate — so you can see the impact as it accumulates.

→ Install BroziCode in two commands


Conclusion

The core insight: Claude Code isn't expensive because of the work it does. It's expensive because of the context it carries while doing it.

A .claudeignore file, consistent /clear usage between tasks, and a lean CLAUDE.md eliminate most of the structural waste manually. Hooks and a plugin like BroziCode make those gains automatic — every session, every project, every developer on the team.

Install BroziCode, run your next session, and check the status line. The savings show up immediately.


Want to go deeper? The next article covers Claude Code hooks end-to-end — including the exact PostToolUse configuration that powers brozi_run and how to write your own output compressors for any tool.

Ready to cut your Claude bill?

Install BroziCode in 2 minutes — free, open source, MIT licensed.