ENGINEERING

Why compaction fails in coding agents when it matters most

When long coding sessions hit their limits, most agents break by losing context. Here’s why compaction fails and how a better approach fixes it.

Maham Batool
5 min read
Mar 18, 2026

There is something subtle but deeply frustrating happening in AI coding tools that most people don’t talk about. It’s not the models or the code quality.

It’s compaction.

The standard workflow looks fine on the surface: you start a thread, build context, iterate, refine. The model gets “smarter” as the conversation grows. Until it doesn’t. Until you hit the limit. And everything starts collapsing.

What compaction actually does

When a thread gets too long (100k–200k tokens), coding agents compress the conversation to keep going.

Sounds reasonable.

But here’s the problem:

Compaction is lossy.

It throws away information, and often the critical signals the LLM needs go with it.

Why it fails when it matters most

Compaction works fine when the task is small, like:

  • Fix a typo
  • Write a helper
  • Refactor a function

But that’s not where it matters.

It fails when:

  • You’re deep in debugging
  • You’ve made layered decisions over time
  • The system spans multiple files and flows
  • The problem is messy and non-linear

This is exactly when context matters most. And this is exactly when compaction gets most aggressive.

The real issue

Most systems treat compaction like summarizing a document. But your workflow is not a document. It’s a state machine.

It includes:

  • Decisions
  • Constraints
  • Rejections
  • Tradeoffs
  • Failed attempts

Traditional compaction doesn’t preserve state. It preserves sentences, and that’s why things break.
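To make the distinction concrete, here is a minimal sketch of what “preserving state” could mean. Every name in it (`SessionState`, `compact`) is hypothetical, invented for illustration; the point is that structured state is re-injected verbatim while the prose transcript can be summarized freely.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: context as structured state rather than prose.
# These names are not from any real agent; they illustrate the idea.

@dataclass
class SessionState:
    decisions: list[str] = field(default_factory=list)    # e.g. "use Postgres"
    constraints: list[str] = field(default_factory=list)  # e.g. "stay backwards compatible"
    rejections: list[str] = field(default_factory=list)   # e.g. "no ORM"

def compact(transcript: list[str], state: SessionState) -> list[str]:
    """Summarize prose freely, but always re-inject structured state verbatim."""
    summary = transcript[-3:]  # stand-in for a real summarizer
    preserved = (
        [f"DECISION: {d}" for d in state.decisions]
        + [f"CONSTRAINT: {c}" for c in state.constraints]
        + [f"REJECTED: {r}" for r in state.rejections]
    )
    return preserved + summary

state = SessionState(decisions=["use Postgres"], rejections=["no ORM"])
compacted = compact([f"msg {i}" for i in range(100)], state)
```

A summarizer can mangle the last three messages all it wants; the decisions and rejections survive compaction untouched.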

What this looks like in practice

You’ve seen this:

  • The model forgets why a decision was made
  • It reintroduces bugs you already fixed
  • It suggests patterns you rejected
  • It loses architectural alignment

This isn’t randomness. This is context loss. And it compounds.

The cost nobody talks about

Compaction looks like an optimization. Fewer tokens. Lower cost. But the hidden cost is massive:

  • Re-explaining context
  • Repeating decisions
  • Fixing regressions
  • Splitting work into multiple threads

What you save in tokens, you lose in time. Worse, you lose your coding flow: reprompting and realigning the agent to match your expectations for the third time in a heavily compacted conversation.

Why Claude Code, Codex, Cursor, and Opencode all hit this wall

This isn’t a single coding agent problem. It shows up across almost all coding agents:

  • Claude Code → aggressive auto-compaction
  • Codex-style agents → lose mid-context signal
  • Cursor / Opencode → rely on similar summarization heuristics

They all share the same assumption:

Context can be compressed without losing meaning.

That assumption is wrong because not all tokens are equal.

A better way to think about compaction

The problem isn’t compaction. It’s how compaction is done.

Most systems do:

“Shrink everything.”

What you actually want is:

“Keep what matters. Drop what doesn’t.”

That requires understanding relevance, not just summarization.
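As a toy illustration of “keep what matters” rather than “shrink everything,” here is a relevance-weighted compaction sketch. The keyword-overlap scorer is deliberately naive (a real system would use embeddings or a model call), and every function name here is an assumption, not any agent’s actual API.

```python
# Illustrative only: relevance-weighted compaction instead of uniform shrinking.

def relevance(message: str, task: str) -> float:
    # Toy scorer: fraction of task words that appear in the message.
    task_words = set(task.lower().split())
    msg_words = set(message.lower().split())
    return len(task_words & msg_words) / max(len(task_words), 1)

def compact_by_relevance(history: list[str], task: str, budget: int) -> list[str]:
    # Keep the most task-relevant messages within a fixed budget,
    # preserving their original order.
    ranked = sorted(range(len(history)),
                    key=lambda i: relevance(history[i], task),
                    reverse=True)
    keep = sorted(ranked[:budget])
    return [history[i] for i in keep]

history = [
    "fixed the auth token refresh bug",
    "discussed lunch options",
    "decided to cache the auth token in redis",
]
result = compact_by_relevance(history, "auth token caching", budget=2)
# The lunch message scores zero and is dropped; both auth messages survive.
```

The key property is that compression pressure lands on the low-relevance messages first, instead of uniformly across the whole thread.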

Command Code’s Fast Compact Mode

This is where things get interesting.

Instead of a single-pass summary, Command Code introduces Fast Compact Mode, a tiered compaction system.

Here’s what’s different:

  • Runs multiple parallel agents
  • Evaluates relevance based on your current task
  • Separates signal from noise
  • Preserves high-value context at higher fidelity

Think of it like:

┌────────────────────────────┐
│        Your Context        │
└────────────────────────────┘
              |
              v
┌────────────────────────────┐
│          Critical          │
│     (Preserved Fully)      │
└────────────────────────────┘
              |
              v
┌────────────────────────────┐
│           Useful           │
│    (Lightly Compressed)    │
└────────────────────────────┘
              |
              v
┌────────────────────────────┐
│           Noise            │
│         (Removed)          │
└────────────────────────────┘

Not all history is treated equally and that’s the key.
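A tiering pass in this spirit can be sketched in a few lines. This is purely illustrative, not Command Code’s actual implementation; the markers and the word-count heuristic are assumptions made up for the example.

```python
# Toy tiered compaction: critical kept verbatim, useful truncated, noise dropped.

CRITICAL_MARKERS = ("decision:", "constraint:", "error:", "rejected:")

def tier(message: str) -> str:
    if message.lower().startswith(CRITICAL_MARKERS):
        return "critical"   # preserved verbatim
    if len(message.split()) > 5:
        return "useful"     # lightly compressed
    return "noise"          # dropped

def tiered_compact(history: list[str]) -> list[str]:
    out = []
    for msg in history:
        t = tier(msg)
        if t == "critical":
            out.append(msg)
        elif t == "useful":
            out.append(" ".join(msg.split()[:5]) + " ...")  # crude truncation
    return out

result = tiered_compact([
    "decision: keep the retry queue in memory",
    "here is a long explanation of why the retry queue exists at all",
    "ok",
])
```

The decision line survives word-for-word, the explanation is truncated, and the filler disappears; that asymmetry is the whole point.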

To try Fast Compact Mode in Command Code, open a cmd session and type /compact-mode.
Then choose either Default or Fast.

  • Default behaves like the coding agents you know: the usual auto-compaction.
  • Fast Compact is the real differentiator: it tiers and compresses your thread intelligently, keeps what actually matters, and lets you continue deep work without losing context.

Why this actually works

Fast Compact is more expensive per step. More compute. More tokens. But it wins overall.

Because:

  • You don’t lose context mid-flow
  • You don’t restart threads
  • You don’t repeat work
  • You don’t fight the model

Compaction is an alignment problem

This is the deeper takeaway:

AI coding tools don’t fail at generation. They fail at retention.

Every step you take adds signal:

  • Preferences
  • Constraints
  • Decisions

Bad compaction deletes that signal. Good compaction preserves it.

The difference is alignment.

What to do about it

Compaction is where most coding agents quietly break. Not in demos. Not in toy problems. But in real work.

Claude Code, Codex, Cursor, Opencode—they all generate good code. But they don’t reliably hold context when it matters most.

Command Code’s Fast Compact Mode points to a different direction:

  • Treat context as state
  • Preserve relevance, not just text
  • Spend compute where it actually matters

Because in the end, the bottleneck isn’t writing code. It’s the failure to recall why you wrote it that way.

Maham Batool (@MahamDev)