Everyone says you need a vector database for AI agent memory. I tried three of them. Then I switched to plain markdown files and everything got better.
I've been building Unmarkdown™ with Claude Code for over three months, across 200+ sessions, and AI agent memory is central to that workflow. The product has 875 tests, a dozen integrations, and enough architectural complexity that context persistence isn't optional. It's the difference between productive sessions and wasted ones.
I detailed my file-based approach in Hub-and-Spoke Memory. That post covered the how. This one covers the why: why files beat the alternatives for most developer workflows, where the alternatives actually win, and how to decide which approach fits your project.
The AI agent memory landscape in 2026
The AI agent memory space has exploded in the past year. What used to be "just put it in the system prompt" has branched into four distinct approaches:
File-based memory. CLAUDE.md files, hub-and-spoke patterns, .claude/rules/ directories. The context lives in structured text files that the agent reads directly. No embedding, no retrieval pipeline, no infrastructure. This is what I use.
Vector databases. Mem0, Pinecone, Chroma, Weaviate. You embed your knowledge into vectors, store them in a database, and retrieve relevant chunks via semantic search. This is what most of the AI memory ecosystem is building toward.
Graph databases. Letta (formerly MemGPT), Neo4j-backed knowledge graphs. Instead of flat vector similarity, these model relationships between concepts. "The billing module depends on Stripe webhooks" becomes a traversable edge in a graph.
Hybrid approaches. Combinations of the above, often with a vector store for retrieval and a graph layer for relationships, plus file-based rules for deterministic instructions. The most sophisticated setups run all three simultaneously.
Each approach has real strengths. But the discourse has a bias problem: complexity is more interesting to write about than simplicity. So the conversation gravitates toward vector databases and graph stores, even when they're overkill for the task.
What I tried first
Before settling on files, I experimented with vector-based memory for project context. The appeal was obvious: embed all my project knowledge, and let semantic search surface the right context automatically. No manual organization. No deciding which spoke file to read. The system would just know.
Here's what actually happened.
I asked Claude about the pricing logic for annual subscriptions. The vector retrieval returned five chunks: a section about Stripe webhook handlers, a paragraph about the authentication flow for checkout, test descriptions from the billing test suite, a note about the free tier limitations, and a snippet about the subscription sync helper. All semantically related to "pricing." None of them answered my actual question about how annual billing periods work in the state machine.
This is the fundamental problem with semantic search for developer context. Semantic similarity and task relevance are different things. When you search for "pricing logic," you want the specific state machine that handles plan transitions, not everything in the codebase that touches money. But to a vector database, all of those concepts are neighbors in embedding space.
The file-based approach solved this instantly. I have a pricing-and-billing.md spoke file. It contains the billing state machine, the Stripe product IDs, the plan transition logic, and the known gotchas. When Claude reads that file, it gets exactly the context it needs. Nothing more, nothing less.
I ran into a second problem with vector retrieval: inconsistency. The same question would surface different chunks depending on slight phrasing differences. "How does pricing work" and "explain the billing tiers" returned different result sets, even though they're asking the same thing. With files, the retrieval is deterministic. The spoke file is the spoke file. It doesn't change based on how you phrase the question.
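The determinism point is easy to see in code. Here's a minimal sketch of explicit routing; the topic names, keywords, and file paths are illustrative, not a prescribed layout:

```python
# Illustrative hub-and-spoke layout -- adapt the map to your own project.
SPOKES = {
    "billing": "memory/pricing-and-billing.md",
    "architecture": "memory/architecture.md",
    "testing": "memory/testing-conventions.md",
}

def route(question: str) -> str:
    """Route a question to a spoke file by explicit keyword, not similarity.

    Different phrasings of the same question hit the same file, every
    time. There is no embedding step to introduce variance.
    """
    q = question.lower()
    if "pricing" in q or "billing" in q:
        return SPOKES["billing"]
    if "architecture" in q or "design" in q:
        return SPOKES["architecture"]
    if "test" in q:
        return SPOKES["testing"]
    raise KeyError(f"no spoke covers: {question!r}")

# Both phrasings from the experiment above resolve identically.
assert route("How does pricing work") == route("Explain the billing tiers")
```

The routing here is trivial on purpose: the point is that it's inspectable. When the answer is wrong, you read three lines of mapping instead of a similarity-score dump.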
Why files beat databases for AI agent memory
After three months of daily use across hundreds of sessions, I can articulate five specific advantages of file-based AI agent memory over vector or graph alternatives.
Deterministic retrieval
When Claude reads architecture.md, I know exactly what context it received. I can read the same file and see precisely what Claude sees. There is no embedding uncertainty, no retrieval scoring, no "top-k results that may or may not include the critical piece."
This matters enormously for debugging. When Claude makes a mistake, I can check: did it have the right context? With files, this is a five-second check. With vector retrieval, I'd need to inspect the query embedding, review the similarity scores, check which chunks were returned, and determine whether the relevant information was in the retrieved set. That's a debugging session in itself.
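If you want that five-second check to survive past the current session, a tiny audit helper is enough. A sketch, assuming you control the code that hands files to the agent; the helper and its name are hypothetical:

```python
import hashlib

def audit_record(loaded: dict[str, str]) -> dict[str, str]:
    """Pin down exactly what context a session received.

    `loaded` maps each memory file's path to the text that was handed to
    the agent. The short content hash lets you later diff "what Claude
    saw" against the file as it exists now -- the staleness check.
    """
    return {
        path: hashlib.sha256(text.encode()).hexdigest()[:12]
        for path, text in loaded.items()
    }

# Same contents, same record -- the audit itself is deterministic too.
assert audit_record({"a.md": "x"}) == audit_record({"a.md": "x"})
```

Try building the equivalent for a vector store: you'd be persisting query embeddings, top-k lists, and similarity scores just to answer "what did the agent see?"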
Human-readable and editable
Every piece of memory in my system is a markdown file I can open in any editor. I review what Claude writes to memory. I edit entries that are wrong. I restructure spoke files when the organization stops making sense.
With a vector database, the knowledge lives as embeddings. You can't read an embedding. You can't edit a single fact without re-embedding the entire chunk. You can't glance at the database and assess whether the knowledge is accurate and current.
This isn't a minor convenience. It's a fundamental property of the system. Memory that humans can't audit will drift from reality, and you won't notice until the AI acts on stale information in a way that costs you hours.
Zero infrastructure
My memory system requires no server, no embedding model, no vector database, no retrieval pipeline. It's files on disk. It works offline, has effectively zero latency, and never goes down. There are no API rate limits, no cold starts, no costs that scale with usage.
Setting up a vector-based memory system means choosing an embedding model, provisioning a vector store, building an ingestion pipeline, tuning retrieval parameters, and maintaining all of it. For a solo developer or a small team, that's infrastructure overhead that directly competes with time spent building the actual product.
Token-efficient
A typical spoke file in my system is 50 to 100 lines, roughly 500 to 1,000 tokens. Claude loads only the spokes relevant to the current task. Total memory overhead per session: around 1,500 tokens.
Vector retrieval systems typically return 5 to 10 chunks per query, each 200 to 500 tokens. That's 1,000 to 5,000 tokens of retrieved context, and much of it will be tangentially related rather than directly useful. Worse, if you run multiple retrievals during a session (which you often need to), the accumulated context grows quickly.
In a world where context windows are precious and every token of irrelevant context dilutes the signal, loading exactly the right 100 lines beats retrieving 500 lines of "similar" content.
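The arithmetic is worth sanity-checking. A rough sketch using the common ~4-characters-per-token heuristic; every specific number here is an estimate, not a measurement:

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

# File-based: two relevant spoke files, each ~50 lines of ~60 characters.
spoke = "x" * (50 * 60)                    # stand-in for one spoke file
file_overhead = 2 * estimate_tokens(spoke)  # ~1,500 tokens per session

# Vector-based: two retrievals per session, 8 chunks each, ~350 tokens/chunk.
vector_overhead = 2 * 8 * 350               # 5,600 tokens per session

assert file_overhead < vector_overhead
```

And the gap understates the real difference, because the file-based 1,500 tokens are all on-topic while a chunk of the retrieved 5,600 is embedding-space noise.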
Git-native
Every memory update in my system is a git commit. I can see when a spoke file was last updated, who changed it, and exactly what changed. If a bad update goes in, I can revert it. If I want to understand how the project's architecture evolved, I read the git log for architecture.md.
This is version control for AI memory, and it comes free. No separate versioning system, no snapshot management, no backup strategy. Just git, the tool I'm already using for everything else.
When you DO need something more
I want to be fair about this. File-based memory has real limitations, and there are scenarios where vector databases, graph stores, or hybrid approaches genuinely outperform.
Massive, unfamiliar codebases
If you're working with a codebase of 1,000+ files that you didn't write, manual organization of spoke files isn't practical. You don't know the codebase well enough to decide what goes in each file. In this scenario, embedding the entire codebase and using semantic search to surface relevant code is a legitimate strategy. The retrieval won't be perfect, but it's better than no context at all.
My system works because I know my codebase intimately. I built it. I know which details matter and which don't. That knowledge is what makes the spoke files useful. If you lack that knowledge, you can't write good spoke files.
Multi-user, multi-agent systems
File-based memory assumes a single developer (or a small team) working with a single agent. When multiple agents need different views of the same knowledge base, or when dozens of users are contributing knowledge simultaneously, files become a coordination bottleneck.
Vector databases handle this naturally. Each agent queries the shared knowledge base and gets results tailored to its query. No one needs to decide which file each agent should read. The retrieval system handles routing automatically.
Real-time, rapidly changing data
My spoke files are stable knowledge: architecture decisions, technical constraints, known gotchas. They change slowly, maybe once or twice a week. If your memory needs to reflect data that changes every hour, like production metrics, live system status, or streaming event data, files can't keep up. You need a system that ingests and indexes new information continuously.
Episodic memory at scale
Files handle procedural memory ("how to deploy") and semantic memory ("the billing state machine has four states") well. They're weak on episodic memory: recalling specific past interactions, decisions made three months ago in a particular conversation, or patterns across hundreds of sessions.
If you need an agent to say, "The last time we tried this approach, it failed because of X," you need something that can search across session histories. Vector databases with conversation embeddings can do this. Files can't, at least not without growing to an unmanageable size.
The four types of memory
Understanding why files work for some memory types and not others requires a quick framework. AI agents deal with four kinds of memory:
Working memory. The current conversation context. What you've discussed in this session, the files you've read, the code you've written. Every AI system handles this natively through the context window. No external memory system needed.
Procedural memory. How to do things. Your build commands, testing conventions, deployment steps, coding standards. "Always run tests before committing." "Use these specific Tailwind classes for in-app buttons." Files excel at this. A CLAUDE.md or rules file with procedural instructions is deterministic and reliable.
Semantic memory. Facts and knowledge about your project. Architecture decisions, file paths, integration details, known constraints. "The template engine uses scoped CSS in a .template-preview container." Spoke files handle this perfectly. The knowledge is stable, domain-organized, and needs to be precise rather than fuzzy.
Episodic memory. Records of past experiences. "In session 47, we tried approach X and it failed because of Y." "The user prefers shorter variable names." "Last time we refactored this module, we missed a test." This is where files break down. You can't practically store hundreds of session summaries in spoke files and expect useful retrieval.
The insight is that most developer workflows lean heavily on procedural and semantic memory. You need Claude to know how your project works and how to work within its constraints. Episodic recall across hundreds of sessions is a nice-to-have, not a daily need.
Files cover the 80% case for AI agent memory. If you need the other 20%, add a vector layer specifically for episodic memory. Don't rebuild your entire memory system around it.
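The whole framework reduces to a routing table: send each memory type to the simplest store that handles it well. A sketch, with placeholder store names standing in for whatever you actually run:

```python
# Route each memory type to the simplest store that handles it well.
# "files" means CLAUDE.md plus spoke files; "vector" is the optional
# episodic layer you bolt on only once cross-session recall is a real need.
MEMORY_ROUTES = {
    "working":    "context-window",  # handled natively by the model
    "procedural": "files",           # build commands, conventions, rules
    "semantic":   "files",           # architecture, constraints, gotchas
    "episodic":   "vector",          # past sessions, the 20% case
}

def store_for(memory_type: str) -> str:
    return MEMORY_ROUTES[memory_type]

assert store_for("semantic") == "files"
```

Three of the four rows need no infrastructure at all, which is the 80% case in table form.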
The real cost of complexity
There's a factor that rarely gets discussed in the "which memory system" conversation: maintenance burden.
A vector database needs to be kept in sync with your evolving codebase. When you refactor a module, the old embeddings become stale. When you change an architectural decision, the embedded chunks still reflect the old approach. Re-embedding is not free: it takes time, costs money, and requires you to notice that the knowledge has drifted.
A graph database is even more demanding. Relationships between concepts change as your project evolves. Edges that were true last month may be wrong today. Maintaining a knowledge graph requires ongoing curation that is, ironically, the same kind of manual work that vector databases promise to eliminate.
Files have this maintenance cost too. But the cost is visible and manageable. I can read a spoke file in 30 seconds and tell whether it's current. I can't do the same with a vector index containing 10,000 embedded chunks.
The projects I've seen succeed with AI agent memory are the ones that match their memory system to their actual maintenance capacity. A sophisticated system that drifts from reality is worse than a simple system that stays accurate.
My recommendation
Start with files. I mean this seriously, not as a hedged "it depends" answer.
Create a CLAUDE.md with your project setup, coding conventions, and current status. If you outgrow that, add a hub-and-spoke pattern with topic-specific spoke files. If you want more detail on how to set that up, I wrote a complete guide in Hub-and-Spoke Memory.
Add complexity only when you hit a specific limitation that files can't solve. Not when you think you might need it someday. Not because a blog post made vector databases sound impressive. When you have a concrete problem, like "I need my agent to recall which approach we tried for this bug three months ago," and files can't solve it.
Most projects never reach that point. Most developer workflows are procedural and semantic: "know my project, follow my conventions, remember my architecture." Files handle all of that with zero infrastructure, perfect determinism, and full human auditability.
For more on building persistent AI knowledge systems, see my guides on CLAUDE.md files, context engineering, and persistent AI knowledge bases.
Everything is markdown
Here's something I keep coming back to. Every piece of AI memory I've discussed is markdown. CLAUDE.md, spoke files, rules files. Even the structured outputs from vector retrievals get rendered as text. Markdown is the native format of AI communication, which is exactly why I built Unmarkdown™ to bridge the gap between markdown-native AI and the formatted documents humans actually read.
The AI ecosystem runs on markdown. Your agent's memory is markdown. Your prompts are markdown. The code your agent writes gets discussed in markdown. And when you need to share that work with humans who live in Google Docs, Word, Slack, or email, you have to convert it into something they can read.
That translation layer, from AI-native markdown to human-readable formats, is the problem I think about every day. The memory question and the formatting question are two sides of the same coin: how do you make AI output useful in the real world?
File-based memory solves the persistence problem with elegant simplicity. For the formatting problem, that's what Unmarkdown™ is for.
