AI assistants forget everything after the conversation. Every new session starts from zero -- no context, no experience, no learning curve. For a one-time chat, that's acceptable. For a system that co-runs an agency, it's untenable.
That's why we built our own memory system. After over 550 sessions, 1,100 stored learnings, and 180 documented decisions, we're sharing what worked -- and what didn't.
The Problem: Forgetful AI
Imagine your most important employee forgets everything every morning. Every meeting, every decision, every experience -- gone. You'd have to start from scratch every day.
That's exactly how most AI systems work. However brilliant an answer was, the insight has vanished by the next conversation -- which makes deploying them in real business processes problematic.
Our Approach: Two Types of Memory
The human brain distinguishes between episodic memory (experiences, mistakes, specific situations) and semantic memory (facts, concepts, general knowledge). We've applied the same principle to our AI system.
Episodic Memory
- Mistakes and their causes
- Specific incidents and how they were resolved
- Decisions and their context
- Patterns that appeared in certain situations
Semantic Memory
- Architecture knowledge (how the system is built)
- Infrastructure facts (which server does what)
- Technology assessments (which tool suits what purpose)
- Business rules and processes
Auto-classification happens at storage time: a reported bug is automatically classified as episodic, an architecture insight as semantic. At retrieval time, results are filtered -- a Research Agent gets semantic facts, a Critic Agent gets episodic errors.
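The storage-time routing and retrieval-time filtering can be sketched roughly like this. The cue-word heuristic is an illustrative stand-in -- the actual classifier isn't disclosed in this article -- and the agent names mirror the examples above:

```python
# Hypothetical sketch: a simple cue-word heuristic stands in for the
# real (undisclosed) classifier that routes new learnings at storage time.
EPISODIC_CUES = {"bug", "error", "incident", "failed", "fixed", "decided"}

def classify_memory(text: str) -> str:
    """Route a new learning to episodic or semantic storage."""
    words = set(text.lower().split())
    return "episodic" if words & EPISODIC_CUES else "semantic"

# Retrieval-time filter: each agent type sees only one memory type.
AGENT_VIEW = {"research": "semantic", "critic": "episodic"}
```

A reported bug ("Deploy failed because the cert expired") lands in episodic memory; an architecture fact ("PostgreSQL uses MVCC for concurrency control") lands in semantic memory.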
The Architecture
The system is built on PostgreSQL with three search layers:
- Vector Search -- 512-dimensional embeddings for semantic similarity. Finds related concepts even with different phrasing.
- Trigram Search -- Fuzzy matching for imprecise queries. Finds "the SSL thing" even when stored as "Certbot renewal."
- Full-Text Search -- Classic keyword search for German and English content.
All three layers are combined using Reciprocal Rank Fusion. The result: search that handles both precise queries and vague recollections reliably.
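Reciprocal Rank Fusion itself is small enough to show in full. This is the standard formulation with the conventional smoothing constant k=60 (an assumption -- the article doesn't state which constant is used); each search layer contributes a ranked list of IDs:

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists via Reciprocal Rank Fusion.

    Each ranking is a list of document IDs, best first. A document's
    fused score is the sum of 1 / (k + rank) over every list it appears
    in, so items ranked well by several layers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector = ["a", "b", "c"]      # vector search results
trigram = ["b", "a", "d"]     # trigram search results
fulltext = ["b", "c", "a"]    # full-text search results
fused = rrf_merge([vector, trigram, fulltext])  # -> ["b", "a", "c", "d"]
```

Because the score only depends on rank positions, RRF needs no tuning to combine layers whose raw scores live on completely different scales.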
22 Tables for Structured Knowledge
The memory isn't one large text collection but a structured system:
- Sessions -- When work happened, on which project, what was the result
- Decisions -- What decisions were made, with what reasoning, what alternatives
- Learnings -- What was learned, in which category, how often it was retrieved
- Knowledge Graph -- Entities (projects, servers, people, tools) with observations and relationships
- Skills -- Which capabilities were developed, how often successfully applied
- Syntheses -- AI-generated summaries from learning clusters
The Knowledge Graph
Beyond linear memory, we maintain a Knowledge Graph with over 150 entities, 1,300 observations, and 180 relationships. Each entity has a type (project, server, person, tool) and any number of observations with timestamps and confidence scores.
This enables questions like: "Which servers does Project X use?" or "When was Tool Y last updated?" -- without that information existing explicitly in any document.
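A query like "Which servers does Project X use?" reduces to filtering relationship triples. This in-memory sketch is illustrative (the real graph lives in PostgreSQL tables, and the entity names are made up):

```python
# Hypothetical triples; in production these are rows in the graph tables.
relations = [
    ("project-x", "runs_on", "server-1"),
    ("project-x", "runs_on", "server-2"),
    ("tool-y", "installed_on", "server-1"),
]

def related(subject: str, predicate: str) -> list[str]:
    """Answer questions like 'which servers does Project X use?'"""
    return [obj for subj, pred, obj in relations
            if subj == subject and pred == predicate]

related("project-x", "runs_on")  # -> ["server-1", "server-2"]
```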
Five Features That Make the Difference
1. Admission Control
Not every piece of information deserves a place in memory. Our admission control system evaluates every new learning with five factors:
- Novelty -- Does this insight already exist in similar form?
- Specificity -- Is the information concrete enough to be useful?
- Source Reliability -- Does it come from a trustworthy source?
- Consistency -- Does it contradict existing knowledge?
- Relevance -- Does it fit the current project context?
Information scoring below 0.3 gets rejected. That sounds strict, but it prevents the gradual quality degradation that uncontrolled storage inevitably brings.
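The gate itself is a small scoring function. The five factor names come from the list above; the equal weighting is an assumption, since the real weights aren't published:

```python
# Factor names from the admission-control list; equal weights are assumed.
FACTORS = ("novelty", "specificity", "source_reliability",
           "consistency", "relevance")

def admit(scores: dict[str, float], threshold: float = 0.3) -> bool:
    """Reject a candidate learning whose mean factor score is too low."""
    mean = sum(scores[f] for f in FACTORS) / len(FACTORS)
    return mean >= threshold
```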
2. Importance-Adaptive Decay
Not all memories are equally important. Our system calculates an Importance Score from five factors: retrieval frequency, recency, links to other learnings, user feedback, and propagated importance (similar to PageRank).
The key point: important memories decay up to six times slower than unimportant ones. A fundamental architectural decision stays relevant for months. A debugging workaround loses significance after weeks.
3. Lifecycle States
Every learning passes through three states:
- Active -- Retrieved and ranked normally
- Ephemeral -- Low importance, demoted in search results
- Archived -- Removed from standard searches but still findable when needed
Transitions happen automatically based on the Importance Score. A learning can also be reactivated when it's retrieved again.
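The transition logic reduces to a small state function. The thresholds here are illustrative -- the article doesn't publish them -- but the reactivation-on-retrieval rule is as described:

```python
def lifecycle_state(importance: float, retrieved_recently: bool) -> str:
    """Map an Importance Score to a lifecycle state.

    Thresholds (0.5, 0.2) are assumed for illustration. A recent
    retrieval reactivates a learning regardless of its score.
    """
    if retrieved_recently or importance >= 0.5:
        return "active"
    if importance >= 0.2:
        return "ephemeral"
    return "archived"
```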
4. Bi-temporal Relationships
The Knowledge Graph stores not just current facts but past ones too. Every relationship has four timestamps:
- When the relationship became valid in reality
- When it became invalid
- When it was recorded in the system
- When it was marked as outdated in the system
This enables questions like: "What did we know about Server X on March 15th?" -- not just the current state, but the knowledge state at any point in time.
5. Causal Relationships
Beyond simple relations (A uses B, A belongs to B), the graph supports eight causal relationship types: caused, prevented, triggered, blocked, enabled, and more. Each causal relationship has an evidence field.
This enables chains like: "Decision A led to Problem B, which was prevented by Measure C." These causal chains are traversed automatically.
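Traversing such a chain is a breadth-first walk over typed edges. The edge types are from the list above; the graph contents are made up to mirror the A/B/C example:

```python
from collections import deque

# Hypothetical causal edges mirroring the example in the text.
causal_edges = {
    "decision-a": [("caused", "problem-b")],
    "measure-c": [("prevented", "problem-b")],
}

def causal_chain(start: str) -> list[tuple[str, str, str]]:
    """Walk outgoing causal edges breadth-first from a start node."""
    seen, chain, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for rel, target in causal_edges.get(node, []):
            chain.append((node, rel, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return chain
```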
What We've Learned
Store Less, Retrieve Better
Our system was initially write-heavy: lots was stored, little was retrieved. The most important insight was that retrieval quality matters more than storage volume. Admission control and intelligent ranking delivered more than any new feature.
Contradiction Detection Is Essential
Over months, contradictory information accumulates. "Server X uses PostgreSQL 14" and three months later "Server X was migrated to PostgreSQL 16" -- both statements are correct, but only the second is current. Automatic contradiction detection and bi-temporal data management solve this problem.
Memory Limits Prevent Drift
Unlimited memory access sounds optimal but causes the agent to lose focus. Fixed limits (maximum three results per query) force the system to return only the most relevant information.
The Numbers
After three months of operation:
- Over 1,100 stored learnings (episodic and semantic)
- 181 documented decisions
- 156 entities in the Knowledge Graph with 1,300 observations
- 180 relationships between entities
- 555 tracked sessions
- 393 automated tests
Conclusion
Building an AI memory system is easier than maintaining one. The real challenge isn't storage but quality control: what gets stored, how long it stays relevant, how quickly it's found.
The combination of episodic and semantic memory, strict admission control, and adaptive decay has transformed our system from a simple knowledge base into a learning memory that becomes more useful every day.
