I spent three months building a SaaS product almost entirely with Claude Code. Over 200 sessions, 875 tests, 60,000+ lines of code across a Next.js app, Supabase backend, Stripe billing, MCP server, Chrome extension, and Obsidian plugin. Somewhere around session 50, I realized the quality of Claude's output had very little to do with how I phrased my prompts. It had everything to do with the context surrounding the conversation before I typed a single word. That is when I discovered context engineering.
That realization changed how I work. I stopped spending time crafting the perfect prompt and started engineering the information environment around Claude. The results were dramatic: fewer clarifying questions, correct architectural decisions on the first try, and sessions that started productively in under two minutes instead of fifteen.
This is the complete guide to what I've learned.
Why prompt engineering isn't enough
Prompt engineering works when your task is self-contained. "Write a function that sorts an array by date" is a great prompt. It has everything Claude needs. No additional context required.
But real software development is never self-contained. When I ask Claude to "add a download button to the share modal," the quality of the output depends on dozens of things that aren't in the prompt:
- What component library are we using? What's the button standard?
- Is the share modal a React component? Where does it live in the codebase?
- Are downloads gated behind a paid plan? What's the billing logic?
- What test framework do we use? Should this have test coverage?
- What's the existing code style? Tabs or spaces? Named exports or default?
I could pack all of that into every prompt. Some people do. They write 500-word prompts with headers and bullet points and inline code examples. It works, but it doesn't scale. By the third task of the session, you're copy-pasting context blocks and maintaining a personal prompt library that's just a worse version of what the tooling already supports.
The better approach: make the context available to Claude structurally, so that any prompt you write gets answered in the context of your entire project's conventions, architecture, and history.
That's context engineering. Prompt engineering is about crafting the perfect question. Context engineering is about building the information ecosystem around the AI so it can answer any question well.
The five pillars of context engineering
After 200+ sessions of iteration, I've landed on five layers of context that compose together. Each layer serves a different purpose. None of them alone is sufficient.
1. CLAUDE.md architecture
The CLAUDE.md file is the foundation. It loads automatically at the start of every conversation. Think of it as the constitution of your project: the rules and structure that apply to every task, regardless of what you're working on.
Most people treat CLAUDE.md like a README. They write paragraphs describing their project, explain their motivation, list technologies they considered. Claude reads all of it and retains almost none of it, because descriptions don't tell Claude what to do.
The fix is simple: write imperatives, not descriptions.
# Bad: descriptive
We use Tailwind CSS v4 for styling. The project was set up with
the @theme inline configuration. We prefer utility classes over
custom CSS when possible.
# Good: imperative
## Styling
- Tailwind CSS v4 with @theme inline
- Use utility classes. No custom CSS files.
- Button standard (in-app): px-4 py-2 text-sm font-medium rounded-lg
In my testing across 50+ sessions, imperative instructions had a 94% application rate. Descriptive paragraphs covering the same content had a 73% application rate. Claude follows commands better than it absorbs prose.
Keep CLAUDE.md under 200 lines. If it's longer, you're putting too much in it. The file loads into every conversation, which means every token in CLAUDE.md competes with the tokens you need for actual work. A 500-line CLAUDE.md burns thousands of tokens before you've said a word.
What belongs in CLAUDE.md:
- Project setup (framework, package manager, build commands)
- Style rules (naming conventions, coding standards, formatting)
- Directory structure (where things live, key file paths)
- Testing requirements (framework, run command, coverage expectations)
- Deployment process (branch strategy, CI/CD, environment notes)
What doesn't belong: historical decisions, sprint status, debugging notes, feature specs. Those belong in other layers.
You can also use nested CLAUDE.md files. A CLAUDE.md in a subdirectory applies only when Claude is working in that directory. I use this for the docs site, which has different conventions than the main app:
project/
├── CLAUDE.md # Global rules (200 lines max)
├── app/
│ └── CLAUDE.md # App-specific setup and conventions
├── docs/
│ └── CLAUDE.md # Docs-specific style and structure
└── mcp/
└── CLAUDE.md # MCP server conventions
Each nested file stays focused on its scope. The global CLAUDE.md handles cross-cutting concerns. The nested files handle local conventions. Claude reads both when working in a subdirectory.
2. The .claude/rules/ directory
CLAUDE.md is always-on context. Rules are conditional context. They load based on file patterns, giving Claude the right information at the right time without bloating the baseline.
Create a .claude/rules/ directory in your project root. Each file is a rule that applies contextually:
# .claude/rules/testing.md
---
globs: ["tests/**", "*.test.ts", "*.test.tsx"]
---
## Testing Standards
- Framework: Vitest 4.x
- Run: npm test (single run) or npm run test:watch
- Every function/API/component change needs test coverage
- Bug fixes start with a failing test
- Use describe/it blocks, not test()
- Mock Supabase client with vi.mock, never call real DB
- Assert behavior, not implementation details
When Claude edits a test file, this rule loads automatically. When Claude edits a component, it doesn't. The context is there when it's needed and absent when it's not.
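To make the conditional-loading idea concrete, here is a small sketch of how glob-scoped rules could be matched against a file path. This is a hypothetical illustration, not Claude Code's actual matcher; the helper names are mine, and I assume gitignore-style semantics where a pattern without a slash matches only the file's basename.

```typescript
// Hypothetical sketch of glob-scoped rule matching; Claude Code's real
// implementation may differ.
function globToRegExp(glob: string): RegExp {
  const source = glob
    .split("**")
    .map((chunk) =>
      chunk
        .split("*")
        // Escape regex metacharacters in the literal pieces.
        .map((piece) => piece.replace(/[.+^${}()|[\]\\?]/g, "\\$&"))
        .join("[^/]*") // "*" matches within one path segment
    )
    .join(".*"); // "**" matches across path segments
  return new RegExp(`^${source}$`);
}

function ruleApplies(globs: string[], filePath: string): boolean {
  return globs.some((glob) => {
    // Patterns without "/" are matched against the basename only.
    const target = glob.includes("/") ? filePath : filePath.split("/").pop()!;
    return globToRegExp(glob).test(target);
  });
}

const testingRuleGlobs = ["tests/**", "*.test.ts", "*.test.tsx"];
ruleApplies(testingRuleGlobs, "tests/auth/login.ts");       // true: rule loads
ruleApplies(testingRuleGlobs, "src/app/page.test.ts");      // true: basename match
ruleApplies(testingRuleGlobs, "src/components/Button.tsx"); // false: rule stays out
```

The point of the sketch is the scoping behavior: the testing rule loads for anything under tests/ or any *.test.ts(x) file, and stays out of context everywhere else.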
I've found that modular rules outperform monolithic CLAUDE.md for specialized instructions. My testing across 30 sessions showed a 96% application rate for contextual rules versus 92% for the same instructions placed in CLAUDE.md. The difference is small in percentage terms but significant in practice: the 4% gap was concentrated in exactly the cases where it mattered most: long sessions where Claude had already compacted the conversation and the CLAUDE.md instructions were in the summarized portion rather than the live context.
Some rules I use daily:
# .claude/rules/api-routes.md
---
globs: ["src/app/api/**"]
---
## API Route Conventions
- All routes use Next.js App Router route handlers
- Auth check: getServerSession() at top of every handler
- Return NextResponse.json() with appropriate status codes
- Log errors to Sentry via captureException()
- Rate limiting handled by middleware, not per-route
# .claude/rules/components.md
---
globs: ["src/components/**"]
---
## Component Standards
- Functional components with TypeScript
- Props interface defined above component, exported
- Use 'use client' directive only when needed
- No default exports. Use named exports.
- Tailwind utilities only. No CSS modules.
The key insight is granularity. Each rule file covers one domain. When Claude works on an API route, it gets API conventions. When it works on a component, it gets component standards. No rule file needs to be longer than 30-40 lines because each one is narrowly scoped.
3. Memory systems
CLAUDE.md tells Claude how to work. Rules tell Claude what standards to follow. Memory tells Claude what it needs to know about the project's state, decisions, and history.
I wrote extensively about this in Hub-and-Spoke Memory: How I Gave Claude Code Persistent Context Across 200+ Sessions, so I'll summarize the key points here.
The pattern: a single hub file (MEMORY.md) that auto-loads into every conversation. It stays under 200 lines. It contains standing instructions, current sprint status, and an index of spoke files. Each spoke file covers one domain: architecture, pricing, editor internals, publishing, analytics, distribution.
## Spoke File Index
| File | When to read |
|------|-------------|
| architecture.md | Every session |
| pricing-and-billing.md | Pricing, Stripe, auth, checkout |
| publishing.md | Published pages, sharing, downloads |
| editor-technical.md | Editor bugs, toolbar, sync |
| analytics.md | PostHog, Sentry, SEO |
Claude reads the hub automatically, then reads only the relevant spoke files for the current task. This is token-efficient (the hub is ~60 lines, each spoke is 50-100 lines) and keeps Claude's context focused on what matters right now.
The memory system is what prevents the costly re-explanation loop. Without it, I was spending 10-15 minutes per session explaining context. With it, Claude reads the hub, reads one or two spoke files, and starts working productively in under two minutes. Over 200 sessions, that's roughly 40 hours saved.
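Pulling the pieces above together, a minimal hub might look something like this. The exact file names, wording, and sprint lines are illustrative, not my actual file:

```markdown
# MEMORY.md — hub (keep under 200 lines)

## Standing Instructions
- Verify via git log or file reads before writing to memory files.

## Recovery Protocol (run at session start)
1. Read this hub.
2. Run git log --oneline -20 to verify latest commits.
3. Read architecture.md, plus any spoke relevant to the task.
4. Ask for direction before making changes.

## Current Sprint
- (status lines, updated as each task completes)

## Spoke File Index
| File | When to read |
|------|-------------|
| architecture.md | Every session |
| pricing-and-billing.md | Pricing, Stripe, auth, checkout |
```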
4. Custom skills and commands
Skills are reusable prompt templates stored in .claude/commands/. They're the context engineering equivalent of shell aliases: common multi-step workflows encoded once and invoked by name.
# .claude/commands/review-pr.md
Review the current branch's changes:
1. Run git diff main...HEAD to see all changes
2. Check for missing test coverage
3. Verify no console.log or debugging artifacts
4. Confirm all new functions have TypeScript types
5. Check for hardcoded values that should be env vars
6. Verify error handling in all new API routes
7. Summarize findings as a bulleted list
Instead of typing that workflow every time, I invoke /review-pr and Claude runs through the checklist. The value isn't just convenience. It's consistency. The review catches the same issues every time because the prompt is the same every time.
I use commands for:
- PR review: The checklist above, standardized across all PRs
- New feature scaffolding: Create component + test file + update the relevant spoke file
- Deploy prep: Run tests, check for uncommitted changes, verify environment variables
- Bug investigation: Read error, check Sentry, find related code, run relevant tests
Commands compose with the other context layers. When I invoke /review-pr, Claude already has CLAUDE.md (project conventions), any relevant rules (if the PR touches tests or API routes), and the memory system (what's currently in flight). The command template adds the specific workflow on top of that existing context.
5. Dynamic context through tool use
The first four pillars are static: files that Claude reads at the start of a session. Dynamic context is what Claude gathers during the session by using tools.
This is the most underrated pillar. Many people configure CLAUDE.md and rules but never think about giving Claude permission and encouragement to gather its own context.
The key tools:
- git log: Claude checks what actually shipped, not what the memory files say shipped. I include this in my recovery protocol: "Run git log --oneline -20 to verify latest commits." This prevents Claude from acting on stale memory.
- Test runner: Claude runs npm test before and after changes. This isn't just verification. It's context. When a test fails, the error message tells Claude what went wrong. That error message is often more useful than any description I could write.
- File reading: Claude reads source files to understand existing patterns before writing new code. This is obvious but worth emphasizing: Claude should always read the file it's about to edit, the test file that covers it, and at least one similar file for pattern matching.
- Build output: Claude runs the build to catch type errors, missing imports, and configuration issues that unit tests might miss.
The standing instruction that enables this: "Verify via git log or file reads before writing to memory files." This creates a verification loop. Claude doesn't just trust its memory. It checks the ground truth.
How the layers compose
The real power is in composition, not in any individual layer.
When I start a session and say "add rate limiting to the AI editing endpoint," here's what Claude has access to before writing a single line of code:
- CLAUDE.md: Project setup, directory structure, coding standards, button conventions
- Rules (auto-loaded because I'm working in src/app/api/): API route conventions, auth patterns, error handling standards
- Memory hub: Current sprint status, standing instructions, recovery protocol
- Memory spoke (architecture.md, loaded per recovery protocol): Key file paths, tech stack, architectural decisions
- Dynamic context (gathered by Claude): git log of recent changes, existing rate limiting patterns in the codebase, current test coverage
That's five layers of context, loaded efficiently, with no manual prompt engineering on my part. My actual prompt was eight words. The context engineering made those eight words sufficient.
Common mistakes
I've made all of these. Here's what to avoid.
Putting everything in CLAUDE.md. When your CLAUDE.md exceeds 200 lines, Claude starts treating it like background noise. I've watched Claude ignore instructions that were clearly written in a 400-line CLAUDE.md. The file was too long for everything to feel important. Move specialized instructions to rules files, historical context to memory spokes, and keep CLAUDE.md focused on universal project conventions.
Writing descriptions instead of instructions. "We use Vitest for testing" tells Claude a fact. "Run npm test before committing. Bug fixes must start with a failing test." tells Claude what to do. In my experience, the application rate difference is stark: 73% for descriptions versus 94% for instructions. Write every line of context as if you're training a new team member on their first day. Don't explain the philosophy. Tell them what to do.
Not giving Claude verification tools. If Claude can't check its own work, it can't course-correct. The instruction "run tests before committing" is worth more than a page of architectural documentation, because the test output gives Claude real-time feedback on whether its changes are correct. Always configure your CLAUDE.md with the commands Claude needs to verify its work: test runner, linter, type checker, build command.
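As a sketch, the verification section of a CLAUDE.md might look like this. The command names assume a typical Next.js/npm setup, not my exact scripts:

```markdown
## Verification
- Tests: npm test
- Types: npx tsc --noEmit
- Lint: npm run lint
- Build: npm run build
- Run tests and the type check before declaring a task complete.
```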
Over-engineering context. I've seen projects with 20+ spoke files, dozens of rules, and a CLAUDE.md that reads like a legal contract. There are diminishing returns. In my experience, the sweet spot is 6-10 spoke files, 5-8 rules, and a CLAUDE.md under 150 lines. Beyond that, you're spending more time maintaining the context system than it's saving you. If you haven't referenced a spoke file in a month, archive it.
Forgetting to maintain accuracy. Stale context is worse than no context. When Claude reads a memory file that says "the checkout flow uses Stripe Checkout Sessions" but you switched to Stripe Elements two weeks ago, Claude will confidently write code against the wrong API. I update memory files as part of completing each task. It adds two minutes to the workflow and prevents hours of confusion.
Measuring context quality
How do you know if your context engineering is working? Three signals:
Claude asks fewer clarifying questions. Before I set up context engineering, Claude would ask 3-5 clarifying questions at the start of most tasks. "What test framework do you use?" "Should this be behind authentication?" "What's the naming convention for API routes?" After setting up the five pillars, Claude asks zero clarifying questions on routine tasks. It already knows.
Claude makes correct architectural decisions without being told. When Claude creates a new API route that automatically includes auth checking, error handling, Sentry logging, and a corresponding test file, that's context engineering working. Claude isn't guessing. It's following conventions it read from rules and patterns it observed from dynamic context.
Sessions start fast. If you're spending more than two minutes on context recovery at the start of a session, something is missing. The recovery protocol (read hub, check git log, read relevant spoke, ask for direction) should take under two minutes. If it takes longer, your hub file is too long or your spoke index isn't clear about when to read what.
Context engineering is a markdown-native discipline
Here's something I didn't appreciate until I was deep into this system: every piece of context you give Claude is a markdown file. CLAUDE.md is markdown. Rules are markdown. Memory files are markdown. Commands are markdown. The format isn't incidental. Markdown is the native language of LLMs, which is exactly why tools like Unmarkdown™ exist to bridge the gap between markdown-native AI and the rest of the world.
This has practical implications. If you're good at writing structured markdown, you're good at context engineering. The skills transfer directly: clear headers, concise bullet points, imperative language, logical grouping. Every structural choice in your markdown affects how well Claude parses and applies it.
Getting started
If you're starting from zero, here's the order I'd recommend:
Week 1: CLAUDE.md. Write your project setup, directory structure, coding standards, and test/build commands. Keep it under 150 lines. Run five sessions and note every time Claude asks a question that CLAUDE.md should have answered. Add those answers.
Week 2: Rules. Create .claude/rules/ with 2-3 rules for your most common file types (tests, components, API routes). Make them imperative. Keep each under 40 lines.
Week 3: Memory. If your project has enough history to justify it, create a MEMORY.md hub with a recovery protocol and 2-3 spoke files (architecture, current status, key decisions). See the hub-and-spoke guide for the full pattern.
Week 4: Commands and iteration. Create 2-3 commands for your most common workflows. Then spend the week refining everything based on what's working and what isn't. Archive context that Claude never uses. Strengthen context where Claude still makes mistakes.
After a month, you'll have a context system that makes every session more productive than the last. The compound effect is real: better context means better output, which means fewer corrections, which means more time building features instead of re-explaining your project.
For more on preventing context loss during long sessions, see Why Does Claude Keep Compacting? and How to Prevent Claude Compacting. For the deep dive on CLAUDE.md specifically, see How to Use CLAUDE.md Files.
The era of spending thirty minutes crafting the perfect prompt is over. Spend that time engineering your context instead. The prompt can be eight words. The context is what makes it work.
