There is something subtle but deeply frustrating happening in AI coding tools that most people don’t talk about. It’s not the models or the code quality.
It’s compaction.
The standard workflow looks fine on the surface: you start a thread, build context, iterate, refine. The model gets “smarter” as the conversation grows. Until it doesn’t. Until you hit the limit. And everything starts collapsing.
What compaction actually does
When a thread gets too long (100k–200k tokens), coding agents compress the conversation to keep going.
Sounds reasonable.
But here’s the problem:
Compaction is lossy.
It throws away information, and often the critical signals the LLM needs go with it.
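To make the lossiness concrete, here is a minimal sketch of a naive compactor that simply keeps the most recent messages. The message strings and the `compact()` heuristic are illustrative placeholders, not any real agent's implementation:

```python
# Hypothetical sketch: naive compaction that keeps only the tail of a
# thread. Everything else -- including early, critical signals -- is lost.

def compact(messages, keep_last=3):
    """Naive lossy compaction: drop everything but the last N messages."""
    return messages[-keep_last:]

thread = [
    "constraint: never use global state",   # critical early signal
    "wrote helper A",
    "debugged flaky test",
    "refactored module B",
    "added logging",
]

compacted = compact(thread)
# The early constraint is gone -- the model can no longer see it.
print("constraint: never use global state" in compacted)  # False
```

Real compactors summarize rather than truncate, but the failure mode is the same: the compressor has no notion of which lines are load-bearing.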
Why it fails when it matters most
Compaction works fine when the task is small, like:
- Fix a typo
- Write a helper
- Refactor a function
But that’s not where it matters.
It fails when:
- You’re deep in debugging
- You’ve made layered decisions over time
- The system spans multiple files and flows
- The problem is messy and non-linear
This is exactly when context matters most. And this is exactly when compaction gets aggressive.
The real issue
Most systems treat compaction like summarizing a document. But your workflow is not a document. It’s a state machine.
It includes:
- Decisions
- Constraints
- Rejections
- Tradeoffs
- Failed attempts
Traditional compaction doesn't preserve state. It preserves sentences, and that's why things break.
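One way to picture the difference: if the workflow were tracked as structured state rather than prose, compaction could carry the state forward verbatim. This is a hypothetical sketch, and the field names are illustrative, not any tool's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a workflow tracked as explicit state instead of
# a transcript. Structured facts survive compaction; summarized
# sentences about them often do not.

@dataclass
class WorkflowState:
    decisions: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    rejections: list = field(default_factory=list)

    def snapshot(self):
        # A compactor could keep this dict at full fidelity,
        # no matter how long the surrounding conversation gets.
        return {
            "decisions": self.decisions,
            "constraints": self.constraints,
            "rejections": self.rejections,
        }

state = WorkflowState()
state.decisions.append("use SQLite for the local cache")
state.rejections.append("rejected: ORM layer (too heavy)")
```

A summary of the transcript might mention the cache; it rarely preserves that the ORM was considered and rejected, which is exactly the signal that prevents the agent from suggesting it again.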
What this looks like in practice
You’ve seen this:
- The model forgets why a decision was made
- It reintroduces bugs you already fixed
- It suggests patterns you rejected
- It loses architectural alignment
This isn’t randomness. This is context loss. And it compounds.
The cost nobody talks about
Compaction looks like an optimization. Fewer tokens. Lower cost. But the hidden cost is massive:
- Re-explaining context
- Repeating decisions
- Fixing regressions
- Splitting work into multiple threads
What you save in tokens, you lose in time. And worse, you lose your coding flow: reprompting and realigning the agent to match your expectations for the third time in a heavily compacted conversation.
Why Claude Code, Codex, Cursor, Opencode all hit this wall
This isn’t a single coding agent problem. It shows up across almost all coding agents:
- Claude Code → aggressive auto-compaction
- Codex-style agents → lose mid-context signal
- Cursor / Opencode → rely on similar summarization heuristics
They all share the same assumption:
Context can be compressed without losing meaning.
That assumption is wrong because not all tokens are equal.
A better way to think about compaction
The problem isn’t compaction. It’s how compaction is done.
Most systems do:
“Shrink everything.”
What you actually want is:
“Keep what matters. Drop what doesn’t.”
That requires understanding relevance, not just summarization.
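A toy sketch of what "keep what matters" could mean in code. A real system would score relevance with embeddings or a model; plain keyword overlap stands in here purely for illustration, and all names are hypothetical:

```python
# Hypothetical sketch: relevance-based filtering instead of blanket
# summarization. Keep history entries that relate to the current task,
# regardless of how old they are.

def relevance(message: str, task: str) -> float:
    """Crude relevance score: fraction of task words present in the message."""
    task_words = set(task.lower().split())
    msg_words = set(message.lower().split())
    return len(task_words & msg_words) / max(len(task_words), 1)

def keep_relevant(messages, task, threshold=0.3):
    return [m for m in messages if relevance(m, task) >= threshold]

history = [
    "decided auth token expiry is 15 minutes",
    "fixed typo in README",
    "auth middleware rejects expired token",
]
kept = keep_relevant(history, "debug auth token expiry")
```

Note what this buys you: the README typo fix is dropped no matter how recent it is, while an old but task-relevant decision survives. Recency-based compaction gets this exactly backwards.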
Command Code’s Fast Compact Mode
This is where things get interesting.
Instead of a single-pass summary, Command Code introduces Fast Compact Mode, a tiered compaction system.
Here’s what’s different:
- It runs multiple parallel agents
- Evaluates relevance based on your current task
- Separates signal from noise
- Preserves high-value context at higher fidelity
Think of it like:
┌────────────────────────────┐
│        Your Context        │
└────────────────────────────┘
              |
              v
┌────────────────────────────┐
│          Critical          │
│     (Preserved Fully)      │
└────────────────────────────┘
              |
              v
┌────────────────────────────┐
│           Useful           │
│    (Lightly Compressed)    │
└────────────────────────────┘
              |
              v
┌────────────────────────────┐
│           Noise            │
│         (Removed)          │
└────────────────────────────┘

Not all history is treated equally, and that's the key.
To try Fast Compact Mode in Command Code, open a cmd session and type /compact-mode.
Then choose either Default or Fast.
- Default behaves like the other coding agents you know: the usual auto-compaction.
- Fast Compact is the real differentiator: it tiers and compresses your thread intelligently, keeps what actually matters, and lets you continue deep work without losing context.
Why this actually works
Fast Compact is more expensive per step. More compute. More tokens. But it wins overall.
Because:
- You don’t lose context mid-flow
- You don’t restart threads
- You don’t repeat work
- You don’t fight the model
Compaction is an alignment problem
This is the deeper takeaway:
AI coding tools don’t fail at generation. They fail at retention.
Every step you take adds signal:
- Preferences
- Constraints
- Decisions
Bad compaction deletes that signal. Good compaction preserves it.
The difference is alignment.
What to do about it
Compaction is where most coding agents quietly break. Not in demos. Not in toy problems. But in real work.
Claude Code, Codex, Cursor, Opencode—they all generate good code. But they don’t reliably hold context when it matters most.
Command Code’s Fast Compact Mode points to a different direction:
- Treat context as state
- Preserve relevance, not just text
- Spend compute where it actually matters
Because in the end, the bottleneck isn’t writing code. It’s the failure to recall why you wrote it that way.

