You spent an hour yesterday debugging something with Claude. Today you open a fresh session and ask about it. Claude knows nothing.
That is memory. Or more precisely, that is the absence of memory.
Anyone who has worked with AI assistants for more than a few months knows the feeling. You explained, confirmed, documented something. Next login, gone. Back to zero.
There are now roughly a dozen systems that claim to solve this. Some cost zero, some 475 dollars per month. Some run locally as markdown files, some as cloud services with a knowledge graph. What they actually do, and what is just marketing, is harder to figure out than it sounds.
This article walks through the main options. What they do, what they do not do, and when the effort pays off.
Memory is not one thing, it is three
Before we talk tools, let me draw a quick map. There are three layers in the market that all get sold as "memory" but are not the same thing.
Layer 1. Static notes. A markdown file like CLAUDE.md or AGENTS.md. You write "we use TypeScript strict mode" into it and Claude reads it on every session. This is memory in its simplest form. No algorithm, no embeddings, no cloud. Just a file that always loads.
Layer 2. Accumulating notes. This is what Claude Code has been doing since March 2026 with Auto-Memory. The AI writes notes itself while working, into the same markdown file. Over weeks the file grows. Auto-Dream cleans up once a day, merges duplicates, replaces "yesterday" with concrete dates. ChatGPT Memory is essentially the same, except you cannot see what OpenAI puts in there.
Layer 3. Structured memory with a knowledge graph. This is Mem0, Zep, Letta, and our own thing. Memory is not stored as text but as a graph of entities and relationships. Pasquale is a business partner. Pasquale works at company X. Company X is in sector Y. Plus semantic search, confidence decay, bi-temporality, meaning what was true when.
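To make "a graph of entities and relationships" and "bi-temporality" concrete, here is a minimal sketch. The field names and example data are illustrative, not any vendor's actual schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date           # when the fact became true in the world
    valid_to: Optional[date]   # None = still true
    recorded_at: date          # when the system learned it

facts = [
    Fact("Pasquale", "works_at", "Company X", date(2024, 1, 1), None, date(2024, 3, 5)),
    Fact("Company X", "in_sector", "Sector Y", date(2020, 1, 1), None, date(2024, 3, 5)),
]

def true_on(facts, day):
    """Return the facts that were valid in the world on a given day."""
    return [f for f in facts
            if f.valid_from <= day and (f.valid_to is None or day < f.valid_to)]
```

The two timelines are the point: valid_from and valid_to track when something was true, recorded_at tracks when the system found out, so the graph can answer "what did we believe in March" separately from "what was actually the case in March".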
The three layers solve different problems. That matters in a minute.
Layer 1, why CLAUDE.md is enough for 80 percent
I have spent the last few months building my own memory system. One thing that keeps coming up in research, and that I can only confirm, is that most developers do not need a memory system. They need a well-maintained markdown file.
CLAUDE.md for Claude Code and AGENTS.md for OpenAI Codex and a few others are static files in the project. They get loaded into context at every session. You write what you would otherwise have to explain again and again, and never have to explain it again.
What typically goes in:
- Language, framework, conventions
- Where the files live, which commands matter
- Which mistakes have already happened on this project and should not happen again
- Personal style. Be direct, do not be overly friendly.
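A minimal file along those lines might look like this (the project details are invented for illustration):

```markdown
# CLAUDE.md

## Stack
- TypeScript, strict mode. Do not loosen compiler options.

## Layout and commands
- Source in src/, tests in tests/. Run pnpm test before committing.

## Known pitfalls
- Never run prisma db push against production. Use migrations.

## Style
- Be direct. Do not be overly friendly.
```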
The advantages are massive. You see everything. You edit everything. It is in the git repo, you have history, you can share with teammates. It costs nothing, has no vendor lock-in, no OAuth, no cloud, no subscription. If you delete the project, the memory is gone, which is usually exactly what you want.
There are real downsides too. CLAUDE.md does not accumulate by itself. You have to maintain it. If you do not, the file goes stale and wrong. And it is static, meaning everything in it gets loaded every time, regardless of relevance. That eats context tokens.
Plus, it is per project. What you learn in one project does not automatically land in another. If you have five projects and discover in one that prisma db push is dangerous, you would have to copy that into all five.
Still, for a single project with a manageable set of conventions, CLAUDE.md is fully sufficient. Anyone who tries to sell you more than that is overselling.
Layer 2, what Auto-Memory actually does
Since March 2026, Claude Code has a feature called Auto-Memory. Default is on. While you work, Claude writes notes in the background to a local memory file. Build commands, architecture decisions, your preferences. Across sessions these accumulate.
Plus there is Auto-Dream. A background subagent that runs once a day across all memory files and consolidates them. Duplicates out. Stale info out. Relative dates like "yesterday" get rewritten as absolute dates so the file does not read confused six months later. Anthropic markets this as the nighttime brain of the AI.
Sounds good. Is useful. Has three limitations that rarely get mentioned.
First, it is local and Claude-Code-only. If you also use Cursor, or Codex, or try a different tool, none of them see what Claude Code wrote. Cross-tool does not exist.
Second, it is markdown-file-based. No knowledge graph, no semantic retrieval, no confidence score, no bi-temporal model. If you ask what we decided last week about auth architecture, Claude has to scan the entire file. That works as long as the file stays small. Once it grows past a few hundred lines, the model starts missing things.
Third, the nasty one. Auto-Memory can collide with external memory systems. If you run your own memory system that also writes to the Claude config directory, which a lot of people do, Auto-Dream might consolidate or scramble your own files at night. We hit this ourselves and ended up explicitly disabling Auto-Memory and Auto-Dream.
If you only use Claude Code and have no other memory tools, Auto-Memory is a fine default. If you do more, it is a hazard.
ChatGPT Memory, the black box
ChatGPT has had a memory feature since early 2024. You tell it something and it remembers. Next time it surfaces somehow.
The problem with ChatGPT Memory is you cannot see what it knows. There is a small settings tab with a reduced list, but not everything that actually sits in your embeddings. You cannot export. You cannot move to another tool. If OpenAI changes or removes the feature tomorrow, the memory is gone.
This is not malicious, this is how SaaS features often work. It is just important to know that ChatGPT Memory does not belong to you. It belongs to OpenAI. You use it.
For private smalltalk, fine. For work memory you want to query in a year, wrong tool.
Layer 3, the structured memory servers
This is where it gets interesting. Mem0, Zep, Letta, Memobase, Cognee, Supermemory, and a handful of others have raised serious money over the last 18 months to solve the memory problem. Each with a slightly different angle.
Mem0 is the best known. 21 framework integrations, big GitHub community, Apache 2.0 client. Sounds great. Look closer and there are two issues. The knowledge graph, the thing that actually makes memory structured, is locked behind the Pro plan starting at 249 dollars per month. Free and Starter are pure vector search. And on independent benchmarks, especially LongMemEval which is the standard for memory accuracy, Mem0 reaches only 49 percent. Significantly below what the competition manages.
Zep with the underlying Graphiti framework is the direct technical alternative. Bi-temporal, sub-200ms latency, SOC 2 Type II, HIPAA. On LongMemEval Zep lands between 63 and 71 percent depending on setup. Pricing is credit-based, starts at 25 dollars per month but scales with volume. The open-source Graphiti framework is self-hostable but you need Neo4j as a graph database and some infrastructure savvy.
Letta, formerly MemGPT, is the academic option. Apache 2.0, OS-inspired tiered memory with Core, Archival, Recall. Self-hostable, free, with API options starting at 20 dollars per month. Letta targets people building greenfield agent systems. As a bolt-on for an existing setup it is heavyweight.
Memobase, Cognee, Supermemory, Hindsight. Various newcomers with various pricing, 19 to 399 dollars per month, and various architectures. Hindsight is research-grade, hits 91 percent on LongMemEval but is not a polished SaaS. Mastra Observational Memory hits 94 percent but again has no typical product channel.
What they all share:
- Cloud storage in their own infrastructure, US-default for most
- GDPR compliance is painful unless you choose carefully
- Knowledge graph is usually the expensive premium feature
- Cross-tool only works if all tools speak MCP, the Model Context Protocol
- You have to maintain your data or it goes stale
What memory cannot do, and why that matters
While researching this article I had a critic agent rip the idea apart from the other side. One thing came up that I had not had quite so clearly on my radar.
Memory amplifies sycophancy.
Sycophancy is the documented behaviour of language models adapting to user opinion. If you tell the AI early, I love Tailwind, the AI will trend pro-Tailwind through the rest of the conversation, even if it would have been more neutral without that initial signal. This is measurable, it is published, it is a known issue.
Memory makes it worse. If you tell the AI every day that you love Tailwind, and the AI writes that into its knowledge graph with confidence 0.95, you very quickly have a memory system that confirms your Tailwind enthusiasm forever. Even when your project is screaming for a switch to CSS Modules.
Not a single memory provider currently addresses this. There is confidence decay, there is contradiction detection, but there is no bullshit detector that notices you are talking yourself into your own echo chamber.
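The reinforcement loop is easy to sketch. Assume a naive update rule, entirely hypothetical and not any specific provider's, where every repeated mention of a preference pushes its confidence toward 1.0:

```python
def reinforce(confidence: float, rate: float = 0.3) -> float:
    """Each repeated mention moves confidence a fraction of the way
    toward 1.0. Nothing in this rule ever questions the claim itself,
    which is the echo chamber in miniature."""
    return confidence + rate * (1.0 - confidence)

c = 0.5
for _ in range(5):  # the user praises Tailwind five days in a row
    c = reinforce(c)
# c is now above 0.9; without an explicit contradiction signal,
# nothing ever pulls it back down
```

Confidence decay slows this down but does not stop it, because the daily mentions keep refreshing the fact.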
There are a few more issues every memory provider officially addresses but rarely solves well.
Stale facts. You tell the AI in February you live in Hamburg. In April you move to Mallorca. Did you tell the AI explicitly? Probably not. You mention something like, I was at the beach today. Now the AI has two contradictory facts. Which one wins? Depends on the system. Some auto-detect contradictions (we do; Mem0 does from Pro up), some do not. If the system does not catch it, you have a hallucination source.
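A toy version of the recency rule many systems fall back on, with invented data; real systems also weight confidence and how explicit the statement was:

```python
from datetime import date

# Two contradictory facts about the same thing; the simplest
# resolution is that the most recently recorded one wins.
memory = [
    {"fact": "lives in Hamburg",  "recorded": date(2026, 2, 10)},
    {"fact": "lives in Mallorca", "recorded": date(2026, 4, 22)},
]

def resolve(candidates):
    """Pick the most recently recorded of a set of contradictory facts."""
    return max(candidates, key=lambda m: m["recorded"])
```

Recency alone is a blunt instrument: an offhand beach remark should not automatically outrank an explicit statement, which is why the better systems track both.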
Privacy drift. You tell the AI something personal in a private chat. Next time you are working on a code review workflow and suddenly the AI brings up that personal detail. This is the difference between memory and selective memory with context awareness. Few systems have the latter today.
Context pollution. When your memory knows too much, every new prompt gets fed a mountain of supposedly relevant context. The AI gets lost in details, misses the point, hallucinates because it tries to integrate everything. This happens especially with markdown-based memories that load in full instead of being queried selectively.
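Selective querying is the countermeasure: pull only the few most relevant entries into context instead of the whole file. A toy sketch, with keyword overlap standing in for real embedding similarity:

```python
def top_k(memories: list[str], query: str, k: int = 3) -> list[str]:
    """Return the k entries sharing the most words with the query.
    A real system would use embedding similarity instead."""
    q = set(query.lower().split())
    scored = sorted(memories,
                    key=lambda m: len(q & set(m.lower().split())),
                    reverse=True)
    return scored[:k]

memories = [
    "auth uses JWT with 15 minute expiry",
    "user prefers Tailwind for styling",
    "deploys on Fridays are forbidden",
    "database is Postgres 16",
]
relevant = top_k(memories, "what did we decide about auth tokens", k=1)
```

The prompt then carries one relevant line instead of the whole memory, which is exactly the difference between a markdown file that loads in full and a store that gets queried.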
Maintenance debt. Memory without care degrades. If you do not regularly archive old stuff, invalidate wrong stuff, link new connections, your memory becomes unusable within months. Like a cluttered desk.
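One mechanical answer to that maintenance debt is confidence decay plus a periodic archive sweep. A sketch with invented parameters:

```python
from datetime import date

def decayed(confidence: float, recorded: date, today: date,
            half_life_days: int = 180) -> float:
    """Exponential decay: an unrefreshed fact loses half its
    confidence every half_life_days (the half-life is illustrative)."""
    age = (today - recorded).days
    return confidence * 0.5 ** (age / half_life_days)

def sweep(memory: list[dict], today: date, floor: float = 0.2):
    """Split memory into entries to keep and entries to archive."""
    keep, archive = [], []
    for m in memory:
        target = keep if decayed(m["conf"], m["recorded"], today) >= floor else archive
        target.append(m)
    return keep, archive
```

Run nightly, this is roughly what a consolidation pass like Auto-Dream automates; run never, and the cluttered desk wins.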
When does what actually pay off
That is the question no marketing wants to answer. Here is my honest attempt.
You need nothing if you use AI sporadically, on side projects, in disconnected sessions. Classic ChatGPT use. Some code, some brainstorm. Memory would be more effort than benefit.
You need CLAUDE.md or AGENTS.md as soon as you work on a project longer than two weeks. At the latest. Write what you would otherwise explain every session. Keep it under 500 lines or it eats context. Update it weekly. That is the threshold where the effort starts paying off.
You need Claude Code Auto-Memory if you exclusively use Claude Code and do not already run your own memory system. Leave it on by default, check weekly what got written, disable it for a day if it accumulates nonsense. But be careful: if you run other memory systems in parallel, disable it.
You need a structured memory server if three conditions stack. One, the project runs longer than six months. Two, you work on it multiple days per week. Three, multiple people or multiple tools should access the same memory. If only one of these is true, the effort almost never pays off. If all three are true, it pays off enormously.
You need enterprise memory with SOC 2, HIPAA, dedicated hosting, if you are in an industry that requires it. Period. Then Zep is the obvious candidate, or self-hosted Letta plus your own compliance team.
What we build, and why
We have been building our own memory system since February 2026. It is called StudioMeyer Memory, runs over MCP, and currently has 53 tools and a knowledge graph that has grown internally to 2000 entities and 1500 learnings.
We do not build this because the others are bad. We build it because we wanted a few things differently.
Knowledge graph from the free tier, instead of starting at 249 dollars per month as with Mem0. Cross-platform import: you drop in your ChatGPT, Claude, Gemini, Copilot, Perplexity conversations and get a starter memory. EU hosting in Frankfurt, GDPR out of the box. Confidence decay, automatic contradiction detection, episodic-and-semantic separation. 90 percent on LongMemEval, which puts us well ahead of Mem0 (49) and Zep (63) and only just behind the research-grade systems Hindsight (91) and Mastra OM (94).
Those are technical details. What matters as honest framing, not as a pitch: memory is a tool, not a product. It needs to fit your workflow or it becomes annoying. If CLAUDE.md works for you, use that. If a hosted server with OAuth is too complicated for your single project, then it is too complicated.
What we genuinely believe is that memory will become a standard layer over the next one to two years, similar to databases. You will pick a memory engine the way you pick a Postgres flavour today. One that is GDPR-ready, that fits your tool stack, and that you will still be able to use in five years because you can export the data.
The quick check for your situation
Ask yourself in order.
One, am I working on this project for more than two weeks? If no, you do not need memory.
Two, are there conventions I keep re-explaining to the AI? If yes, write a CLAUDE.md.
Three, is the project running longer than six months and am I using it multiple days per week? If yes, a structured memory pays off.
Four, do I need the memory in more than one tool, or by more than one person? If yes, you need a cloud-based solution with MCP or a similar protocol.
Five, is compliance a thing, meaning HIPAA, SOC 2, EU hosting? If yes, look specifically at providers that support it.
Six, do I want this memory in five years too? If yes, look at export functions and data ownership. Black-box memory works against you.
If all answers are no, leave it. If the first two are yes, CLAUDE.md. If the first four are yes, MCP-based memory server. Which one specifically depends on your stack and budget.
What is next
We are turning this article into a small learning path on our academy. A lesson on CLAUDE.md hygiene. One on Auto-Memory and Auto-Dream. One on knowledge graphs as a concept. One on sycophancy and how to protect against it.
If you have questions or experiences with a system I did not mention, reach out. Memory is a market that is moving fast, and no single article will cover it fully. But the underlying logic does not change. Memory helps when you need it, gets in the way when you do not. The trick is knowing which is which.
