
Claude Compacting Explained: Why Your AI Forgets Mid-Conversation

Updated Feb 24, 2026 · 10 min read

You're deep into a conversation with Claude. You've explained your project architecture, discussed three possible approaches, made a decision, and started implementing. Then, around message 40, Claude starts contradicting decisions you made earlier. It suggests an approach you already rejected. It asks you to clarify something you explained in detail an hour ago.

Claude isn't being careless. It literally no longer has access to what you said. The reason is a mechanism called compacting, and understanding how it works is the key to working around it.

What compacting is

When your conversation with Claude reaches a certain length, the system automatically summarizes the earlier parts to free up space in the context window. The original messages are replaced with a compressed summary. From that point forward, Claude works from the summary, not from what you actually said.

Think of it like someone taking notes during a long meeting. Instead of recording every word, they write a one-page summary and shred the original transcript. If you later ask "What exactly did Sarah say about the timeline?" the notes might say "Timeline discussed" but the specific details are gone.

That's compacting. Claude generates a condensed version of the conversation so far, discards the original messages, and continues with more room in the context window. The conversation keeps going, but the foundation it's building on has been simplified.

When compacting triggers

In Claude Code, compacting triggers at approximately 83.5% of the context window. On a standard 200K token context window, that means compacting kicks in around 167K tokens, leaving a buffer of roughly 33K tokens for the summary and continued conversation.

On claude.ai (the web and desktop apps), the exact threshold varies, but the behavior is similar. Once the conversation approaches the limit of what Claude can hold in active context, the system compacts earlier messages to make room.

The 200K token window is Claude's default. Organizations with tier 4 or higher access can opt into the 1M token beta, which pushes the compacting threshold much further out but doesn't eliminate it. Eventually, even a million-token conversation gets long enough to trigger compression.

The buffer after compacting, roughly 33K tokens in Claude Code, is the space available for the summary plus your continued conversation. This means the summary itself needs to be relatively concise, which is exactly the source of the problem.
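The arithmetic behind those figures is straightforward. A quick sketch, using the 83.5% threshold and 200K window cited above:

```python
# Back-of-the-envelope math for Claude Code's default compaction point.
CONTEXT_WINDOW = 200_000      # tokens in the standard window
COMPACT_THRESHOLD = 0.835     # approximate trigger fraction

trigger_point = int(CONTEXT_WINDOW * COMPACT_THRESHOLD)  # 167,000 tokens
buffer = CONTEXT_WINDOW - trigger_point                  # 33,000 tokens

print(f"Compacting triggers near {trigger_point:,} tokens")
print(f"Buffer left for summary + continued conversation: {buffer:,} tokens")
```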

What happens technically

When compacting triggers, Claude generates a summary wrapped in a <summary> block. All messages before that point are dropped from the active context. The conversation continues with the summary as the only record of everything that came before.

The result is dramatic. The API payload shrinks by approximately 85%. A conversation that was using 167K tokens of context gets compressed to roughly 25K tokens. That's an enormous amount of information being reduced to a fraction of its original size.
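Those two figures imply the reduction rate directly; a one-line check:

```python
# Compression implied by the figures above: ~167K tokens down to ~25K.
before, after = 167_000, 25_000
reduction = 1 - after / before
print(f"Payload shrinks by roughly {reduction:.0%}")
```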

The summary preserves the general shape of the conversation: the topics discussed, the broad conclusions reached, the overall direction of the work. What it cannot preserve is the precision. Exact numbers, specific code snippets, precise variable names, nuanced chains of reasoning, carefully worded constraints. These details are the first casualties of compression.

What gets lost

The summary captures the "gist" but not the substance. Here's what typically degrades:

Precise technical details. If you told Claude that your API endpoint accepts a maximum payload of 512KB with a 30-second timeout and requires a specific header format, the summary might retain "discussed API constraints" without the specific numbers.

Full code snippets. Code you shared or that Claude generated earlier in the conversation gets reduced to descriptions. "Implemented a retry mechanism with exponential backoff" replaces the actual 40-line implementation you spent 20 minutes refining.

Exact variable names and file paths. The summary might say "modified the authentication module" without retaining that the file was src/lib/auth/session-manager.ts and the function was refreshTokenWithRetry.

Nuanced reasoning chains. If you spent several messages working through why Option A was better than Option B, considering three tradeoffs and arriving at a qualified conclusion, the summary might compress this to "chose Option A."

Specific instructions from earlier. "Always use TypeScript strict mode, never use any types, and follow the existing naming convention of camelCase for functions and PascalCase for components" becomes something like "coding preferences discussed."

The cascading compression problem

This is where compacting gets genuinely problematic. If the conversation keeps going after the first compaction, the context window fills up again. When it hits the threshold a second time, compacting triggers again. But now it's summarizing an already-summarized conversation.

Each compression cycle operates on content that's already been compressed. The first compaction loses details. The second compaction loses the context around those details. By the third compaction, you're working with a summary of a summary of a summary.
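A toy model makes the compounding visible. Assume each compaction keeps only ~15% of the detail it is given, matching the ~85% reduction cited earlier; the retention rate here is an illustrative assumption, not a measured property of Claude's summarizer:

```python
# Toy model: each compaction cycle retains ~15% of the detail it receives.
# The rate is assumed for illustration, not measured.
RETENTION_PER_CYCLE = 0.15

detail = 1.0  # fraction of original detail still available
for cycle in range(1, 4):
    detail *= RETENTION_PER_CYCLE
    print(f"After compaction {cycle}: {detail:.2%} of original detail remains")
```

Under this assumption, well under 1% of the original detail survives the third cycle, which is the "summary of a summary of a summary" effect in numbers.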

There's a known bug that makes this worse: compaction summaries don't preserve project-context.md content in Claude Code. If you have a CLAUDE.md or project configuration file that provides critical instructions, those instructions may be lost or degraded after compaction, even though they were loaded at the start of the conversation.

The practical result: after two or three compactions, Claude's understanding of your project has degraded significantly. It still has a vague sense of what you're working on, but the specific constraints, decisions, and technical details that make AI assistance actually useful are gone.

Real-world symptoms

You'll recognize compacting when you see these patterns:

Claude "forgets" instructions you gave earlier. You told it to use a specific library, and it starts suggesting an alternative. You established a naming convention, and it starts using a different one. The instructions were in the compacted-away portion of the conversation.

Claude produces work that contradicts earlier decisions. You decided on a database schema in message 10. By message 50, Claude suggests a different schema without acknowledging the previous decision. It's not disagreeing. It simply doesn't have access to message 10 anymore.

Claude loses track of which files it already modified. In a coding session, Claude might re-do work it already completed, or modify a file in a way that conflicts with changes it made earlier. The earlier work was compacted into a summary too vague to prevent conflicts.

Claude asks you to repeat information you already provided. This is the most obvious symptom. You explained your project structure in detail, and now Claude is asking basic questions about it. The details were lost in compaction.

Claude's memory feature

Anthropic launched a memory feature for Claude in September 2025 for Team and Enterprise plans, extending it to Pro and Max in October 2025. Memory auto-synthesizes a "Memory summary" that updates approximately every 24 hours. It's project-scoped and user-editable.

Memory helps with basic preferences and recurring context. If you consistently tell Claude about your role, your tech stack, or your formatting preferences, memory will retain that across conversations.

But memory is a summary layer too. It synthesizes information into a condensed form, which means it has the same fundamental limitation as compacting: the compression is lossy. Memory retains the broad strokes but not the technical details. It knows you work with TypeScript and React but might not retain that you're on React 18 with a specific Vite configuration and a custom webpack alias for path resolution.

Memory also updates slowly. The ~24-hour synthesis cycle means information from today's conversation won't appear in memory until tomorrow. It's useful for slowly-evolving context but not for fast-moving project work.

How to work around compacting

Compacting is a system-level behavior that you cannot disable. But you can structure your work to minimize its impact.

Keep conversations focused

The most effective strategy is also the simplest: don't let conversations get long enough for compacting to trigger. Break large tasks into smaller, focused conversations. Instead of a single 3-hour coding session in one conversation, do three 1-hour sessions with clear handoff notes.

At the end of each conversation, write a brief summary of what was accomplished and what should happen next. Use that summary as the opening context for the next conversation.
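A handoff note doesn't need to be elaborate. A sketch in markdown, with illustrative project details:

```markdown
## Handoff — session 2

**Done:** PostgreSQL schema finalized; JWT auth middleware implemented.
**Decisions:** REST over GraphQL; exponential-backoff retries on API calls.
**Next:** rate limiting middleware in src/middleware/rate-limit.ts.
**Constraints:** TypeScript strict mode, no `any`; 512KB payload limit.
```

Paste a note like this as the first message of the next conversation and Claude starts with precise, uncompressed context.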

Use CLAUDE.md files in Claude Code

Claude Code reads CLAUDE.md files (and .claude/ directory configurations) at the start of every conversation. These files persist on disk, outside of any conversation context. Whatever you put in CLAUDE.md survives compaction because it's re-read from disk, not from the conversation history.

Use CLAUDE.md for: project architecture decisions, coding standards, file naming conventions, known constraints, key file paths, and any instructions that should apply to every conversation. This is the closest thing to "permanent context" that Claude Code supports.

The limitation: CLAUDE.md content can still be degraded if it's included in a compaction summary rather than re-read from disk. There's a known bug where compaction summaries don't fully preserve project-context.md content. Keep your CLAUDE.md concise and focused on the most critical information.
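What a concise CLAUDE.md might look like, reusing the hypothetical details from earlier in this article (names, paths, and constraints are illustrative):

```markdown
# CLAUDE.md

## Stack
- TypeScript (strict mode; never use `any`)
- React 18 + Vite

## Conventions
- camelCase for functions, PascalCase for components
- Auth logic lives in src/lib/auth/session-manager.ts

## Constraints
- API: max payload 512KB, 30-second timeout
```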

Summarize key decisions explicitly

When you suspect compaction might be approaching (typically in long conversations), write an explicit summary of your key decisions and current state. Something like: "Before we continue, let me confirm the current state: we're using PostgreSQL with the schema from message 12, the API follows REST conventions with JWT auth, and the remaining task is implementing the rate limiting middleware in src/middleware/rate-limit.ts."

This pushes the critical information toward the end of the context, where it's most likely to survive compaction. It also gives Claude a self-contained reference point that doesn't depend on details from earlier messages.

Use an external knowledge base via MCP

This is the structural solution. Instead of keeping critical project context inside the conversation (where it's subject to compaction), store it in documents that live outside the conversation entirely.

With MCP (Model Context Protocol), Claude can read documents on demand. Your project architecture, decisions log, coding standards, and current task status live in documents that Claude accesses by reading them, not by remembering them from earlier in the conversation. After compaction clears the conversation history, Claude can simply re-read the document and have the full context again.

Unmarkdown™ provides a built-in MCP server with seven tools for document management. Claude can list your documents, read them in full, create new ones, and update existing ones. The documents are written in markdown, the same format Claude natively understands, so there's no translation or information loss.

The setup for Claude is straightforward. On claude.ai, add Unmarkdown™ as an integration via Settings. On Claude Desktop, add a configuration entry to your config file. In Claude Code, run a single claude mcp add command. The Claude integration guide has the exact steps for each client.
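For Claude Desktop, MCP servers are declared under an `mcpServers` key in the config file. The entry below is a sketch with placeholder values; the actual server name, command, and arguments for Unmarkdown™ come from the integration guide:

```json
{
  "mcpServers": {
    "unmarkdown": {
      "command": "npx",
      "args": ["-y", "unmarkdown-mcp-server"]
    }
  }
}
```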

Once connected, you can tell Claude: "Read my project architecture document before we continue." Claude reads the full document, gets the exact context it needs, and produces responses grounded in your actual project details rather than a compressed summary.

The fundamental issue

Context windows are temporary working memory. No matter how large they get, they're session-bound. A 200K window and a 1M window both reset to zero when the conversation ends. Within a session, compacting progressively erodes detail as the conversation grows.

Chroma's "context rot" research from July 2025 quantified this: LLMs don't process context uniformly. Performance follows a U-shaped curve, with the best recall at the beginning and end of the context window and the worst in the middle. GPT-4.1's accuracy, by OpenAI's own MRCR test, drops from 84% at 8K tokens to 50% at 1M tokens. Claude models decay the slowest among the major providers' models, but they still decay.

Persistent knowledge requires a different approach. Compacting is not a bug to be fixed. It's an inherent consequence of finite context windows. The workaround isn't a larger window or better summarization. It's moving the knowledge you care about to a persistent location that any conversation can access, regardless of length, session boundaries, or compression cycles.

Your documents are the memory. The conversation is the workspace. Keeping them separate is what makes both reliable.
