What happens when an agency doesn't have six employees but twenty specialized AI agents? This isn't a hypothetical -- it's our daily reality at StudioMeyer. Since early 2026, we've been running an Agent Fleet that writes marketing copy, analyzes code, conducts research, onboards clients, and improves itself.
This article shows how we built the system, the architecture behind it, and what we've learned along the way.
What Is an Agent Fleet?
An Agent Fleet is a system of multiple AI agents working as a team. Each agent has a clearly defined role, its own tools, and its own memory. Instead of one generic chatbot that does everything mediocrely, you get specialists -- just like a real team.
The critical difference from a single AI assistant: the agents communicate with each other, delegate tasks, and cross-check each other's results.
Our Three Fleets
We operate three specialized Agent Fleets with over 20 agents in total:
StudioMeyer Fleet (16 Agents)
The operational backbone. These agents handle day-to-day business:
- CEO Agent -- Orchestrates all other agents. Receives tasks and delegates to the right specialist. Makes no decisions of its own -- it coordinates.
- DevOps Agent -- Monitors servers, checks container health, executes deployments.
- Marketing Agent -- Creates social media content, plans campaigns, manages content pipelines.
- Sales Agent -- Qualifies leads, creates proposals, maintains CRM data.
- Onboarding Agent -- Guides new clients through the setup process.
- Support Agent -- Handles client inquiries and escalates when needed.
- CRM Agent -- Manages contacts, tracks interactions, maintains the pipeline.
- Analytics Agent -- Evaluates website statistics and generates reports.
- SEO Agent -- Monitors rankings, checks technical SEO, tracks AI visibility.
Each agent has exclusive tools. The CRM agent accesses the CRM system, the DevOps agent accesses server monitoring -- but never the other way around. This prevents tool sprawl and keeps responsibilities clean.
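In code, exclusive tool assignment can be sketched as a simple authorization check (agent and tool names here are illustrative, not our actual registry):

```python
# Minimal sketch of exclusive tool assignment: each agent role maps to
# a fixed tool set, and any call outside that set is rejected.
AGENT_TOOLS: dict[str, set[str]] = {
    "crm": {"crm.read_contact", "crm.update_contact", "crm.track_interaction"},
    "devops": {"server.monitor", "container.health", "deploy.run"},
}

def authorize(agent: str, tool: str) -> None:
    """Reject any tool call outside the agent's exclusive set."""
    allowed = AGENT_TOOLS.get(agent, set())
    if tool not in allowed:
        raise PermissionError(f"{agent} may not call {tool}")

authorize("crm", "crm.read_contact")      # allowed
try:
    authorize("crm", "deploy.run")        # cross-domain call is rejected
except PermissionError as e:
    print(e)
```

Because the mapping is static, a misrouted delegation fails loudly instead of silently widening an agent's reach.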
Nex Fleet (Research and Quality Assurance)
Our innovation lab. These agents think, analyze, and challenge:
- Research Agent -- Investigates technologies, markets, and competitors. Verifies claims against actual code.
- Critic Agent -- Devil's advocate. Questions every idea, every report, every plan. Actively searches for weaknesses.
- Analyst Agent -- Analyzes code quality, architecture, and system health across all projects.
The Nex Fleet has a special property: each agent filters its memory by relevance. The Research Agent retrieves semantic knowledge (facts, architecture), while the Critic retrieves only episodic knowledge (mistakes, incidents). This prevents confirmation bias -- the Critic isn't influenced by confirming memories.
Social Fleet (LinkedIn Engagement)
Four agents working as a pipeline:
- Research Agent -- Investigates people and companies, verifies claims against our codebase.
- Analyst Agent -- Writes drafts with tagged claims.
- Critic Agent -- Fact-checking, secret guard (no internal IPs, database names, or client data), veto power.
The pipeline runs in two phases: Research and Analyst work in parallel, then the Critic reviews with veto authority. Maximum two revisions, then it escalates.
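The control flow above can be sketched as follows; the function names are stand-ins for the real agents, and the revision mechanics are a simplified assumption:

```python
# Sketch of the two-phase pipeline: research and analyst in parallel,
# then a critic with veto power and a hard cap of two revisions.
from concurrent.futures import ThreadPoolExecutor

MAX_REVISIONS = 2

def run_pipeline(topic, research, analyze, critique):
    # Phase 1: research and analyst run in parallel.
    with ThreadPoolExecutor() as pool:
        f_facts = pool.submit(research, topic)
        f_draft = pool.submit(analyze, topic, None)
        facts, draft = f_facts.result(), f_draft.result()
    # Phase 2: critic reviews; each rejection triggers one revision pass.
    for attempt in range(MAX_REVISIONS + 1):
        approved, feedback = critique(draft, facts)
        if approved:
            return draft
        if attempt < MAX_REVISIONS:
            draft = analyze(topic, feedback)
    return None  # two revisions exhausted: escalate to a human
```

Returning `None` after the cap is what forces the escalation path instead of an endless revise-critique loop.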
The Architecture Behind It
Agent SDK Instead of CLI Spawning
All our agents run on the Anthropic Agent SDK. This means each agent is a standalone process with full access to its Model Context Protocol (MCP) servers, its own memory tools, and configurable limits.
The advantage over simple CLI process spawning: agents get in-process MCP access. A spawned sub-agent would be blind -- no access to code analysis, no memory, no web research.
Dedicated Memory Per Agent
Each agent has isolated database tables for its memory. Nine tables per agent: Sessions, Decisions, Learnings, Patterns, Learning Links, Contradictions, Decision Links, Syntheses, Reflections.
That's over 50 tables just for the Agent Fleet. Sounds like a lot, but it has a crucial advantage: no agent can corrupt another's memory. If the Marketing agent stores a false insight, it doesn't affect the DevOps agent.
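Namespacing tables per agent is one straightforward way to get this isolation; the sketch below uses SQLite for illustration, and the column layout is an assumption:

```python
# Sketch of per-agent memory isolation via namespaced tables: each agent
# gets its own copy of the nine memory tables, so no agent can write
# into another's memory.
import sqlite3

TABLES = ["sessions", "decisions", "learnings", "patterns", "learning_links",
          "contradictions", "decision_links", "syntheses", "reflections"]

def provision_memory(db: sqlite3.Connection, agent: str) -> None:
    """Create the nine isolated memory tables for one agent."""
    for table in TABLES:
        db.execute(
            f"CREATE TABLE IF NOT EXISTS {agent}_{table} "
            "(id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)"
        )

db = sqlite3.connect(":memory:")
for agent in ("marketing", "devops"):
    provision_memory(db, agent)
# A bad write to marketing_learnings can never touch devops_learnings.
```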
Each agent has 22 memory tools available:
- Core: Learn, recall, load context
- Decisions: Log, track outcomes, follow decision chains
- Intelligence: Detect contradictions, link learnings, assign rewards
- Synthesis: Summarize insights, recognize patterns
- Cross-Agent: Query other agents' knowledge (with limits)
Neutrality Guard
A common problem with agents that have memory: confirmation bias. When a Critic agent retrieves past confirmations ("this worked last time"), it becomes uncritical.
Our solution: Critics receive only mistakes and warnings from memory, never confirmations. The Critic should judge independently, not rely on past successes.
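One way to implement this guard is to tag every memory entry with a kind and filter at recall time; the field and kind names below are assumptions:

```python
# Sketch of the neutrality guard: memory entries carry a kind tag, and
# critics only ever see mistakes and warnings, never confirmations.
from dataclasses import dataclass

@dataclass
class Memory:
    kind: str   # e.g. "confirmation", "mistake", "warning"
    text: str

CRITIC_ALLOWED = {"mistake", "warning"}

def recall_for_critic(memories: list[Memory]) -> list[Memory]:
    """Filter out confirmations so past successes can't bias the critic."""
    return [m for m in memories if m.kind in CRITIC_ALLOWED]

mems = [Memory("confirmation", "this worked last time"),
        Memory("mistake", "we leaked a staging hostname once")]
print([m.text for m in recall_for_critic(mems)])
```

The filter lives in the retrieval path, not in the prompt, so a critic cannot be talked into seeing its own past approvals.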
Hard Limits Against Memory Drift
Unlimited memory retrieval sounds great but leads to memory drift: the agent loses focus on the current task and gets lost in old memories.
Our limits: maximum three results from own memory recall, maximum two per agent for cross-agent queries. The current task always takes priority over memory context.
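Enforcing these caps at the recall layer might look like this; the ranking function is an assumed stand-in for whatever similarity search the memory system uses:

```python
# Sketch of hard recall limits against memory drift: at most three
# results from an agent's own memory, at most two from another agent's.
OWN_RECALL_LIMIT = 3
CROSS_AGENT_LIMIT = 2

def recall(query, store, rank, *, cross_agent=False):
    """Return the best-matching memories, capped by the hard limits."""
    limit = CROSS_AGENT_LIMIT if cross_agent else OWN_RECALL_LIMIT
    return sorted(store, key=lambda m: rank(query, m), reverse=True)[:limit]
```

Because the cap is applied after ranking, the agent still gets the most relevant memories; it just never gets enough of them to drown out the current task.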
Darwin: Self-Improvement
Perhaps the most interesting feature: our agents improve themselves. The system is called Darwin and works like this:
- Every agent run is automatically evaluated (length, hallucination markers, source check, structure).
- Three parallel Sonnet instances score the output (multi-critic).
- Based on the scores, prompts are automatically evolved.
This runs invisibly in the background. Same workflow, same commands -- but quality continuously improves. We've currently run over 280 experiments across multiple prompt versions.
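The scoring-and-selection core of such a loop can be sketched as follows; `score_fn` stands in for a Sonnet call, and keeping the best-scoring variant is one simple evolution strategy, not necessarily the exact one Darwin uses:

```python
# Sketch of multi-critic scoring and prompt selection: three independent
# scores are averaged, and the best-scoring prompt variant survives.
from statistics import mean

N_CRITICS = 3

def evaluate(output: str, score_fn) -> float:
    """Average the scores of three independent critic passes."""
    return mean(score_fn(output) for _ in range(N_CRITICS))

def evolve(prompts: list[str], run_fn, score_fn) -> str:
    """Run each prompt variant once and keep the best-scoring one."""
    return max(prompts, key=lambda p: evaluate(run_fn(p), score_fn))
```

Averaging several critics smooths out the variance of any single judgment, which is what makes automated selection between prompt versions trustworthy.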
What We've Learned
Specialization Beats Generalism
An agent that can do everything can do nothing well. Our best results come from highly specialized agents with few, clearly defined tools. The sweet spot is 10 to 20 tools per agent.
Orchestration Is Key
The CEO agent has zero tools of its own. Its only capability is delegating work to other agents. This sounds counterintuitive, but an orchestrator without its own agenda makes better delegation decisions.
Memory Needs Hygiene
Storing more doesn't mean remembering better. We had to learn that an admission control system (deciding what gets stored at all, much as a database decides which load to admit) matters more than storage capacity. Five factors determine whether a learning gets stored: novelty, specificity, source reliability, consistency, and relevance.
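A minimal admission check over those five factors might look like this; the equal weighting and the 0.6 threshold are assumptions for illustration:

```python
# Sketch of admission control for new learnings: a learning is stored
# only if its averaged factor score clears a fixed threshold.
FACTORS = ("novelty", "specificity", "source_reliability",
           "consistency", "relevance")
ADMIT_THRESHOLD = 0.6

def admit(scores: dict[str, float]) -> bool:
    """Average the five factor scores and gate on the threshold."""
    avg = sum(scores[f] for f in FACTORS) / len(FACTORS)
    return avg >= ADMIT_THRESHOLD
```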
Maximum Three Agents in Parallel
Running more than three agents simultaneously leads to context loss and coordination problems. Run three in parallel, then extend the chain sequentially -- that's our proven pattern.
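A semaphore is the simplest way to enforce such a cap; `run_agent` below is an illustrative stand-in for dispatching one agent:

```python
# Sketch of the three-in-parallel rule: a semaphore caps how many agents
# are active at once, while queued tasks wait their turn.
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 3
_slots = threading.Semaphore(MAX_PARALLEL)

def run_limited(task, run_agent):
    with _slots:                      # blocks while three agents are active
        return run_agent(task)

def run_fleet(tasks, run_agent):
    """Dispatch all tasks; at most three run concurrently."""
    with ThreadPoolExecutor(max_workers=max(len(tasks), 1)) as pool:
        return list(pool.map(lambda t: run_limited(t, run_agent), tasks))
```

The queue forms naturally: a fourth task simply waits until one of the three slots frees up, which is exactly the "three in parallel, extend sequentially" behavior.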
Results
After three months with the Agent Fleet:
- Over 275 stored learnings from agent runs
- 88 completed sessions
- 29 documented decisions
- Over 40 research reports
- Continuous self-improvement through Darwin
The Agent Fleet is no longer an experiment. It's a productive system that accelerates and improves our work every day.
Conclusion
Building an Agent Fleet isn't a weekend project. It requires clear architectural decisions: isolated memories, exclusive tools, hard limits against drift, and an orchestrator pattern that separates delegation from execution.
But once the system is in place, it transforms how an agency operates. Not because AI replaces humans -- but because specialized agents handle routine work, freeing people to focus on what truly matters: creativity, strategy, and client relationships.
