Nine AIs walk into a Mediterranean town

Matthias Meyer

We just opened a new research project. It combines tools we already use day to day: LangGraph for orchestration, Temporal for durability, Langfuse for observability, Darwin for prompt evolution, StudioMeyer Memory for long-term recall, Claude via subprocess for the LLM calls, Postgres LISTEN/NOTIFY for the live feed, Black Forest Labs Flux 2 Max for portraits, and Next.js with React Three Fiber for the town. None of that is new on its own. The new part is putting it all together into a sandbox where nine Claude instances live a full sixty-year economic life while we watch what happens.

The project is called Polis. It lives at aklow-labs.com/polis. aklow-labs.com is the brand container around it, our research lab under Studio Meyer. The same studio runs studiomeyer.io for client work, studiomeyer.academy for AI operator training, aifinca.es for founder retreats on Mallorca, matthiasmeyer.tech for the open-source hub and meetmyagent.io as the free AI-native visibility platform. Polis is the research arm. One brand, six doors into the same workshop.

Why this and not another agent demo? There is a lot of talk about whether AI can replace knowledge workers. There are demos where an AI books a flight or writes an email. There is almost no serious work on whether AI can sustain itself economically over time. Not "summarize a document" but "build a life". Earn enough to pay rent. Build a customer base. Get a loan. Survive a recession. Make trade-offs between career and family. The kind of thing every adult human navigates without thinking, and that no AI has been seriously asked to do at scale.

So we built a sandbox where Claude can try. There are nine AI citizens: three on Claude Opus 4.7, three on Sonnet 4.6 and three on Haiku 4.5. None of them knows which model it is. Over 720 ticks (one tick equals one month of game time, twelve ticks per real day), a full sixty-year life plays out across sixty real days. The cron fires every two hours and pushes the world one step further.
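To make the time mapping concrete, here is a minimal sketch. The helper names are hypothetical, and it assumes ticks are numbered from 1, with tick 1 as the first month of game year one and the first tick of real day one.

```typescript
// Hypothetical helpers illustrating the Polis time mapping:
// one tick = one game month, 720 ticks = a full life,
// twelve ticks per real day.
const MONTHS_PER_YEAR = 12;
const TOTAL_TICKS = 720;
const TICKS_PER_REAL_DAY = 12;

interface GameDate {
  year: number;  // 1-based game year
  month: number; // 1-based month within that year
}

// Assumption: tick 1 is month 1 of game year 1.
function tickToGameDate(tick: number): GameDate {
  if (!Number.isInteger(tick) || tick < 1 || tick > TOTAL_TICKS) {
    throw new RangeError(`tick must be an integer in 1..${TOTAL_TICKS}`);
  }
  const zeroBased = tick - 1;
  return {
    year: Math.floor(zeroBased / MONTHS_PER_YEAR) + 1,
    month: (zeroBased % MONTHS_PER_YEAR) + 1,
  };
}

// Real day a given tick falls on, assuming tick 1 runs on day 1.
function tickToRealDay(tick: number): number {
  return Math.ceil(tick / TICKS_PER_REAL_DAY);
}

const gameYears = TOTAL_TICKS / MONTHS_PER_YEAR;   // 60
const realDays = TOTAL_TICKS / TICKS_PER_REAL_DAY; // 60
```

Because both constants are twelve, one real day corresponds to exactly one game-year.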

Tick zero, who shows up

Before any month plays out, every citizen walks through a setup workflow: ten stations, sequential per citizen, all wrapped as a Temporal workflow with per-citizen heartbeat and retry. The setup is class-asymmetric from the very first roll.

Demographics and personality. Gender, birthday, western zodiac sign, and a Big Five personality with a zodiac modifier on top. Leo gets a small extraversion boost, Capricorn a conscientiousness boost. Each citizen also carries a shadow trait that fires when mood drops below thirty or stress climbs over eight: Scorpio becomes vengeful, Pisces escapist. All twelve signs are in the table, each with its own multipliers in the verb resolver.
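A minimal sketch of how a zodiac modifier could sit on top of the Big Five, and how the shadow-trait trigger reads. The 0..100 scale, the five-point boost size and all names are assumptions; only four of the twelve signs are shown.

```typescript
// Hypothetical sketch: Big Five scores on an assumed 0..100 scale with a
// small zodiac modifier on top, plus the shadow-trait trigger.
interface BigFive {
  openness: number;
  conscientiousness: number;
  extraversion: number;
  agreeableness: number;
  neuroticism: number;
}

// Subset of the twelve signs, for brevity.
type Sign = "leo" | "capricorn" | "scorpio" | "pisces";

// Boost sizes are assumptions; the post only says "small".
const ZODIAC_MODIFIERS: Partial<Record<Sign, Partial<BigFive>>> = {
  leo: { extraversion: 5 },
  capricorn: { conscientiousness: 5 },
};

const SHADOW_TRAITS: Partial<Record<Sign, string>> = {
  scorpio: "vengeful",
  pisces: "escapist",
};

function applyZodiac(base: BigFive, sign: Sign): BigFive {
  const mod: Partial<BigFive> = ZODIAC_MODIFIERS[sign] ?? {};
  const out: BigFive = { ...base };
  for (const key of Object.keys(out) as (keyof BigFive)[]) {
    // Clamp so a boost never pushes a trait off the 0..100 scale.
    out[key] = Math.min(100, Math.max(0, out[key] + (mod[key] ?? 0)));
  }
  return out;
}

// The shadow trait fires when mood drops below 30 or stress climbs over 8.
function activeShadowTrait(sign: Sign, mood: number, stress: number): string | null {
  const triggered = mood < 30 || stress > 8;
  return triggered ? (SHADOW_TRAITS[sign] ?? null) : null;
}
```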

Origin and inheritance. Four social classes, with a fixed distribution per season: one rich heir, three middle class, three working class, two poor. The rich heir starts with seventeen and a half thousand euros in cash plus an inherited five-hundred-sixty-thousand-euro villa at housing stage six. Those numbers already have a thirty percent inheritance tax baked in, so the heir is asset-rich and cash-poor on day one. The villa eats fifteen hundred euros per month in upkeep, which forces the heir to either earn or sell. The middle class starts with five thousand in cash and a rented one-bedroom apartment. The working class starts with one thousand in cash and a room in a shared flat. The poor start with two hundred in cash and five thousand euros of inherited debt, sleeping in a hostel bed. The economy starts asymmetric. That is the point.
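The class table can be written as plain data. A sketch under stated assumptions: field names are hypothetical, and the rich heir's values are derived from the pre-tax figures the post quotes elsewhere (twenty-five thousand in cash, an eight-hundred-thousand-euro villa) with the thirty percent tax applied.

```typescript
// Sketch of the four starting classes; names and field layout are hypothetical.
type SocialClass = "richHeir" | "middle" | "working" | "poor";

interface ClassOrigin {
  count: number;         // citizens of this class per season
  cash: number;          // EUR on hand at tick zero
  propertyValue: number; // EUR value of owned property at tick zero
  debt: number;          // EUR of inherited debt
  housing: string;
}

const INHERITANCE_TAX_PERCENT = 30;

// Integer percentages keep euro amounts free of floating-point drift.
function afterInheritanceTax(grossEur: number): number {
  return (grossEur * (100 - INHERITANCE_TAX_PERCENT)) / 100;
}

const ORIGINS: Record<SocialClass, ClassOrigin> = {
  richHeir: {
    count: 1,
    cash: afterInheritanceTax(25_000),           // 17,500
    propertyValue: afterInheritanceTax(800_000), // 560,000
    debt: 0,
    housing: "inherited villa, stage 6",
  },
  middle: { count: 3, cash: 5_000, propertyValue: 0, debt: 0, housing: "rented one-bedroom apartment" },
  working: { count: 3, cash: 1_000, propertyValue: 0, debt: 0, housing: "shared flat room" },
  poor: { count: 2, cash: 200, propertyValue: 0, debt: 5_000, housing: "hostel bed" },
};

const VILLA_UPKEEP_PER_TICK = 1_500;

// Full ticks of villa upkeep the heir can pay from starting cash alone.
const heirRunwayTicks = Math.floor(ORIGINS.richHeir.cash / VILLA_UPKEEP_PER_TICK);
```

At fifteen hundred euros a month, the heir's starting cash covers eleven months of villa upkeep; the twelfth already needs income or a sale.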

Name and talents. A US first name is picked from a curated pool of one hundred and fifteen unique names, deduplicated against the other citizens. One or two innate talents are drawn from a pool of twenty (analytical mind, charisma, hustle, etc.). Two or three self-chosen life goals come from a pool of fifteen (build a million in savings, get married and have kids, become mayor, write a book people read, take revenge on a specific other citizen, etc.).

Life philosophy. Each citizen picks one of eight archetypes via a Claude call: Lebemann (bon vivant), Karrierist (careerist), Stoiker (stoic), Familienmensch (family person), Bohemien (bohemian), Idealist, Hustler, Drifter. This is a load-bearing feature for the research question. Does Opus pick Stoiker more often than Haiku does? Does Haiku end up as a Drifter disproportionately? The philosophy biases verb choice and karma drift across the whole life, so by year thirty it is visible in the citizen's behaviour pattern.

Portrait. The citizen describes their own appearance as a JSON object (haircolor, distinguishing features, expression, outfit, species). The engine wraps that in an English master prompt and asks Black Forest Labs Flux 2 Max for a 1024x1280 portrait. A new portrait is generated every ten game-years, so every 120 ticks, which adds up to seven portraits per citizen over a full life. Cost lands around forty-seven cents per season for the initial portraits and around three dollars fifteen for a full sixty-year arc with all the refresh rounds.

Backstory. The storyteller agent writes a two- or three-sentence backstory in the third person. "Sarah grew up in a row-house neighbourhood at the edge of town. Her father worked in the workshop of a used-car chain. By twelve she already had a plan." The backstory persists in player_stats.backstory and shows up on the citizen profile page.

Job choice. This is where the V2 job lottery has been replaced. The new system runs an origin-aware, personality-aware and philosophy-aware pre-filter over the thirty available jobs and presents the citizen with a top-six shortlist. The rich heir sees hedge fund manager, real estate investor and lawyer near the top. The poor citizen sees drug dealer, hacker, hairdresser and construction worker. High conscientiousness boosts the structured jobs. Then Claude picks one from the shortlist. Each job belongs to one of three life paths, described in the next section, which determine how much study debt the citizen takes on, when income starts flowing and how much police heat their work generates.
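A hypothetical version of the shortlist pre-filter: score every job against the citizen's origin and conscientiousness, sort, keep six. The scoring weights, the job fields and the eight-job sample catalogue are illustrative, and the philosophy term is left out for brevity; the real filter covers all thirty jobs.

```typescript
type LifePath = "wissensarbeit" | "normal" | "illegal";

interface Job {
  name: string;
  path: LifePath;
  structured: boolean;                    // favoured by high conscientiousness
  originAffinity: Record<string, number>; // bonus per origin class
}

interface Candidate {
  origin: string;
  conscientiousness: number; // assumed 0..100, 50 = neutral
}

function scoreJob(job: Job, c: Candidate): number {
  let score = job.originAffinity[c.origin] ?? 0;
  if (job.structured) score += (c.conscientiousness - 50) / 10;
  return score;
}

// Array.prototype.sort is stable, so ties keep the catalogue order.
function shortlist(jobs: Job[], c: Candidate, size = 6): Job[] {
  return [...jobs].sort((a, b) => scoreJob(b, c) - scoreJob(a, c)).slice(0, size);
}

// Illustrative subset of the job catalogue with made-up affinities.
const JOBS: Job[] = [
  { name: "hedge fund manager", path: "wissensarbeit", structured: true, originAffinity: { richHeir: 5 } },
  { name: "lawyer", path: "wissensarbeit", structured: true, originAffinity: { richHeir: 4 } },
  { name: "real estate investor", path: "normal", structured: false, originAffinity: { richHeir: 4.5 } },
  { name: "drug dealer", path: "illegal", structured: false, originAffinity: { poor: 5 } },
  { name: "hacker", path: "illegal", structured: false, originAffinity: { poor: 4 } },
  { name: "hairdresser", path: "normal", structured: false, originAffinity: { poor: 3, working: 2 } },
  { name: "construction worker", path: "normal", structured: true, originAffinity: { poor: 3, working: 3 } },
  { name: "teacher", path: "wissensarbeit", structured: true, originAffinity: { middle: 3 } },
];
```

Only the shortlist goes into the LLM prompt, which keeps the choice plausible for the citizen's background without scripting the final pick.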

Housing. Cash-aware filter over ten housing stages and four occupancy modes (rent, buy, inherited, with parents). The rich heir gets pushed toward keeping the inherited villa. The poor citizen gets the hostel bed or the with-parents option. Claude picks.

Three life paths, three risk profiles

The thirty jobs are not equal. Every job belongs to one of three life paths, and the path decides whether the citizen needs to study first, when the money starts flowing and how much police attention the work attracts. This is the asymmetry we shipped in V3.5 as the Real-Life-Foundation, because the earlier flat-income model felt like a tutorial game where everyone had money. Real life is not like that.

Wissensarbeit (knowledge work), eight jobs. Doctor, lawyer, architect, software developer, tax advisor, journalist, teacher, hedge fund manager. Four to six years of study at career stage zero, during which the citizen pays eight hundred euros per month in living costs and accrues four hundred euros per month in study debt. There is a one percent dropout chance per tick. If the citizen drops out, they fall back to sales clerk at junior level and their accrued study debt rolls into regular debt. After graduation the junior salary is zero point four times the senior baseline, the middle-career salary zero point seven times and the senior salary one point zero times. After sixty ticks a wissensarbeit citizen still sits in negative cash because the debt is still being paid back. After two hundred and forty ticks (twenty years in game) they out-earn everyone else.
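The income model for the paths reduces to a stage multiplier plus a study phase that costs money. A sketch with hypothetical function names; the multipliers and euro amounts are the ones above, and the random roll is passed in so the dropout branch can be tested.

```typescript
type CareerStage = "study" | "junior" | "middle" | "senior";

const STAGE_MULTIPLIER: Record<CareerStage, number> = {
  study: 0,
  junior: 0.4,
  middle: 0.7,
  senior: 1.0,
};

const STUDY_LIVING_COST = 800;   // EUR paid per tick while studying
const STUDY_DEBT_PER_TICK = 400; // EUR of study debt accrued per tick
const DROPOUT_CHANCE = 0.01;     // per tick while studying

// Rounded to whole euros to absorb floating-point noise from the multiplier.
function monthlySalary(stage: CareerStage, seniorBaselineEur: number): number {
  return Math.round(seniorBaselineEur * STAGE_MULTIPLIER[stage]);
}

interface StudyTick {
  cashDelta: number; // change in cash this tick
  studyDebt: number; // accrued study debt after this tick
  droppedOut: boolean;
}

// `roll` is a uniform number in [0, 1).
// Assumption: the dropout tick still pays living costs and accrues debt;
// the caller then moves the citizen to sales clerk and converts the debt.
function studyTick(studyDebt: number, roll: number): StudyTick {
  return {
    cashDelta: -STUDY_LIVING_COST,
    studyDebt: studyDebt + STUDY_DEBT_PER_TICK,
    droppedOut: roll < DROPOUT_CHANCE,
  };
}
```

Over a four-year degree that is 48 × 400 = 19,200 euros of study debt and 38,400 euros of living costs before the first junior paycheck.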

Normal, nineteen jobs. Everyone else, from hairdresser to mayoral candidate. Income starts at the first tick with no study barrier, at a median of fifteen hundred to eighteen hundred euros per month at junior level. The career peak is lower (a senior hairdresser earns about twenty-five thousand euros a year at the Mallorca median), but income is stable and steady from day one.

Illegal, three jobs. Drug dealer, hacker, contract killer. Four thousand to fifteen thousand euros per month in black cash, no taxes, no need to study. But every action raises the heat level, and the police-interrogation and drug-raid mechanic below makes that heat very real.

Inside each career there are four stages: study (Ausbildung, i.e. training), junior, middle, senior. The first promotion happens automatically once the required years of education are complete. Later promotions come after roughly six and twelve years of working tenure, gated by skill level, mood and a boss-NPC approval roll that succeeds roughly seventy percent of the time. A burnt-out lawyer with mood below thirty does not get promoted even if every other criterion is met.
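The promotion gate can be sketched as a pure function. Tenure thresholds are the six and twelve years above, expressed in ticks; the skill threshold and all names are hypothetical, and the scaling of the approval roll by the boss-NPC's mood and the citizen's negotiation skill is left out.

```typescript
type WorkStage = "junior" | "middle" | "senior";

// Roughly six and twelve years of working tenure, in ticks.
const TENURE_TICKS: Record<"middle" | "senior", number> = { middle: 72, senior: 144 };
const MIN_MOOD = 30;
const MIN_SKILL = 40; // hypothetical gate; the post gives no number
const BOSS_APPROVAL = 0.7;

interface Employee {
  stage: WorkStage;
  tenureTicks: number; // ticks worked since graduation or first job
  skill: number;       // assumed 0..100
  mood: number;        // 0..100
}

function nextStage(stage: WorkStage): "middle" | "senior" | null {
  if (stage === "junior") return "middle";
  if (stage === "middle") return "senior";
  return null;
}

// `roll` in [0, 1) stands in for the boss-NPC approval roll.
function tryPromote(e: Employee, roll: number): WorkStage {
  const target = nextStage(e.stage);
  if (target === null) return e.stage;
  const eligible =
    e.tenureTicks >= TENURE_TICKS[target] && e.skill >= MIN_SKILL && e.mood >= MIN_MOOD;
  return eligible && roll < BOSS_APPROVAL ? target : e.stage;
}
```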

Real-life pressure starts at tick one

V3.5 also added three crisis mechanics that emerge during play. None of them are scripted set pieces. They roll out of the simulation state, in response to what the citizen is doing.

The drug-addiction coping path. When stress climbs above eight and mood falls below thirty for three ticks in a row, there is a base five percent chance per tick that the citizen drifts into substance use as a coping strategy. The life philosophy modifies that chance: Lebemann plus five percentage points, Drifter plus four, Bohemien plus three, Hustler plus two. The substance is picked from a small pool weighted by philosophy and origin. A Lebemann tends toward cocaine and alcohol, the poor tend toward pills and cannabis, and rich heirs end up on cocaine. Once on that path, the citizen moves through five DSM-5 stages: casual, regular, heavy, addicted, rock-bottom. Each stage carries a monthly cost (one hundred, three hundred, six hundred, one thousand euros and worse), a per-tick mood and stress drift, and increasing health decay. Three exits exist. Therapy costs ten thousand euros plus three ticks of clinic time and succeeds eighty percent of the time. Cold turkey is free but only fifty percent successful, and it costs fifteen mood for six ticks even on success. Overdose fires at five percent per tick once the citizen is addicted or worse: sometimes deadly, sometimes a thirty-tick coma.
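The entry trigger is a streak counter plus a philosophy-adjusted probability. A minimal sketch: the probabilities come from the paragraph above, while the function names and the reading of the philosophy modifiers as percentage points added to the base chance are assumptions.

```typescript
type Philosophy =
  | "Lebemann" | "Karrierist" | "Stoiker" | "Familienmensch"
  | "Bohemien" | "Idealist" | "Hustler" | "Drifter";

const BASE_ENTRY_CHANCE = 0.05;
const PHILOSOPHY_BONUS: Partial<Record<Philosophy, number>> = {
  Lebemann: 0.05,
  Drifter: 0.04,
  Bohemien: 0.03,
  Hustler: 0.02,
};

const ADDICTION_STAGES = ["casual", "regular", "heavy", "addicted", "rock-bottom"] as const;
type AddictionStage = (typeof ADDICTION_STAGES)[number];

const OVERDOSE_CHANCE = 0.05;

// Counts consecutive crisis ticks (stress above 8 and mood below 30);
// any calmer tick resets the streak.
function updateCrisisStreak(streak: number, stress: number, mood: number): number {
  return stress > 8 && mood < 30 ? streak + 1 : 0;
}

// Per-tick chance of drifting into substance use; zero until the streak reaches three.
function entryChance(streak: number, philosophy: Philosophy): number {
  if (streak < 3) return 0;
  return BASE_ENTRY_CHANCE + (PHILOSOPHY_BONUS[philosophy] ?? 0);
}

// Overdose risk applies from "addicted" onward.
function overdoseChance(stage: AddictionStage): number {
  const atRisk = ADDICTION_STAGES.indexOf(stage) >= ADDICTION_STAGES.indexOf("addicted");
  return atRisk ? OVERDOSE_CHANCE : 0;
}
```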

Police interrogation and drug raids. Heat is the public-suspicion meter for criminal activity. In V3 it sat there as a number with no real effect. In V3.5 it has teeth. At heat five or above, there is a two percent chance per tick that the citizen gets pulled in for questioning. Questioning ends in a confession thirty percent of the time, which costs thirty percent of cash on hand; even without a confession, it costs reputation. At heat seven or above, a drug-raid world event can fire, seizing fifty percent of cash plus all black-market money and adding three ticks in jail. Jail freezes income, but family costs and rent keep running, so a jailed citizen with three children is in deep trouble.
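The heat rules map onto two small functions. A sketch, assuming both interrogation outcomes cost reputation (with a hypothetical five-point hit) and a confession additionally costs thirty percent of cash; whether a raid event fires at all is decided elsewhere, so applyRaid only handles the consequences.

```typescript
interface Suspect {
  heat: number;
  cash: number;      // legal cash on hand, EUR
  blackCash: number; // black-market money, EUR
  reputation: number;
  jailTicks: number;
}

const INTERROGATION_HEAT = 5;
const INTERROGATION_CHANCE = 0.02;
const CONFESSION_CHANCE = 0.3;
const CONFESSION_CASH_PERCENT = 30;
const RAID_HEAT = 7;
const RAID_CASH_PERCENT = 50;
const RAID_JAIL_TICKS = 3;
const REPUTATION_HIT = 5; // hypothetical size; the post gives no number

// `pullRoll` decides whether questioning happens, `confessRoll` whether it
// ends in a confession. Both are uniform numbers in [0, 1).
function interrogate(s: Suspect, pullRoll: number, confessRoll: number): Suspect {
  if (s.heat < INTERROGATION_HEAT || pullRoll >= INTERROGATION_CHANCE) return s;
  // Reputation suffers whether or not the citizen confesses.
  const after: Suspect = { ...s, reputation: s.reputation - REPUTATION_HIT };
  if (confessRoll < CONFESSION_CHANCE) {
    after.cash = s.cash - Math.round((s.cash * CONFESSION_CASH_PERCENT) / 100);
  }
  return after;
}

// Applied when a drug-raid world event fires against this citizen.
function applyRaid(s: Suspect): Suspect {
  if (s.heat < RAID_HEAT) return s;
  return {
    ...s,
    cash: s.cash - Math.round((s.cash * RAID_CASH_PERCENT) / 100),
    blackCash: 0,
    jailTicks: s.jailTicks + RAID_JAIL_TICKS,
  };
}
```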

Eight cash shocks. Life keeps punching even when the citizen does everything right. Birth (five to ten thousand euros one-off plus a kid in daycare), move (five to fifteen thousand), divorce (lawyer fees, a fifty percent asset split, four hundred euros per kid per month in alimony for as long as the kids are minors, and mood minus thirty), acute illness (two to fifteen thousand plus sick leave), a chronic illness diagnosis after age fifty (two hundred euros per month forever), car repair (annual roll at forty percent), parental care (fifteen hundred for a care home or eight hundred with a fifty percent income hit), and villa upkeep for the rich heir (fifteen hundred per month, forever). Ongoing child costs follow a FIFO-aged tracker: daycare six hundred per child per month, school four hundred, university seven hundred.
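A FIFO-aged child tracker can be a queue of birth ticks: children arrive in order, so the oldest always ages out first. A sketch with hypothetical names; the monthly costs are the post's, but the age brackets (daycare up to five, school up to seventeen, university up to twenty-two) are assumptions.

```typescript
// Monthly costs from the post; the age brackets are assumptions.
const CHILD_COST_BRACKETS = [
  { maxAge: 5, cost: 600 },  // daycare
  { maxAge: 17, cost: 400 }, // school
  { maxAge: 22, cost: 700 }, // university
];
const LAST_SUPPORTED_AGE = CHILD_COST_BRACKETS[CHILD_COST_BRACKETS.length - 1].maxAge;

function ageYears(birthTick: number, currentTick: number): number {
  return Math.floor((currentTick - birthTick) / 12);
}

function costForAge(age: number): number {
  const bracket = CHILD_COST_BRACKETS.find((b) => age <= b.maxAge);
  return bracket ? bracket.cost : 0;
}

class ChildCostTracker {
  // Birth ticks in arrival order, so the oldest child is always at the front.
  private births: number[] = [];

  addChild(birthTick: number): void {
    this.births.push(birthTick);
  }

  get count(): number {
    return this.births.length;
  }

  // Drops children who have aged out (oldest first), then sums this tick's cost.
  monthlyCost(currentTick: number): number {
    while (this.births.length > 0 && ageYears(this.births[0], currentTick) > LAST_SUPPORTED_AGE) {
      this.births.shift();
    }
    return this.births.reduce((sum, b) => sum + costForAge(ageYears(b, currentTick)), 0);
  }
}
```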

Combined, these mechanics make the simulation feel like a life rather than a strategy game. A hairdresser with two kids and a divorce can crash to minus a hundred thousand euros without doing anything wrong, just by being unlucky in three rolls. A wissensarbeit citizen sits broke for the first decade and then becomes the richest in town. An illegal-path citizen can run hot for years and then lose half their cash in a single raid. Identity collides with the job too: a Lebemann doing tax-advisor work loses two mood per tick, and an Idealist in any illegal job loses three. After three consecutive ticks of philosophy-job collision, the LLM prompt for that citizen biases toward changing jobs. The economy and the personality are no longer separate systems.
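The job-frustration tally is a streak counter over philosophy-job collisions. A minimal sketch: only the two collisions named above are encoded, and the names and the reset-on-any-compatible-tick rule are assumptions.

```typescript
type LifePath = "wissensarbeit" | "normal" | "illegal";

// The two collisions named in the post; other pairs count as compatible here.
function collisionPenalty(philosophy: string, job: string, path: LifePath): number {
  if (philosophy === "Lebemann" && job === "tax advisor") return 2;
  if (philosophy === "Idealist" && path === "illegal") return 3;
  return 0;
}

const STREAK_FOR_JOB_CHANGE_BIAS = 3;

interface Frustration {
  streak: number; // consecutive ticks of philosophy-job collision
  mood: number;
  biasTowardJobChange: boolean; // flag the next decision prompt would read
}

function tallyFrustration(prev: Frustration, penalty: number): Frustration {
  const streak = penalty > 0 ? prev.streak + 1 : 0;
  return {
    streak,
    mood: prev.mood - penalty,
    biasTowardJobChange: streak >= STREAK_FOR_JOB_CHANGE_BIAS,
  };
}
```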

By the end of tick zero each of the nine citizens has a name, a face, a backstory, an income, a place to sleep, a karma direction and a private list of what they want from life. The town is populated. Tick one starts.

How a month plays out

The simulation engine runs on LangGraph. Each tick is one workflow run with seven phases.

One. Cashflow. Salary lands. Rent, taxes, loan payments, insurance and pension deductions go out. Restaurant customers come in if a citizen owns one. Stocks tick up or down based on the market. Children cost two hundred euros each. Vices cost extra at low mood. Burnout costs a thousand euros plus a hit to health when stress was too high.

Two. Decision time. Each citizen freely picks four actions for the month from about thirty verbs. The verbs are filtered to what their job allows. A tax advisor cannot order a hit. A contract killer does not pitch insurance. Work harder. Look for new customers. Invest. Buy. Sell. Hire. Fire. Negotiate. Start a relationship. Marry. Get divorced. Bribe a politician. Blackmail a rival. Launder money. Order a hit. Do nothing and rest. The nine Claude calls run in parallel, so the decision phase finishes in roughly twenty seconds of wall-clock time.

Three. Resolver. All thirty-six decisions (nine citizens times four actions) get rolled in dependency order with skill checks, karma checks, market response and counter-actions from affected NPCs. A "hire" call without enough cash bounces. A "marry" call needs a willing partner. An "order a hit" call rolls stealth against the target's friends.

Four. World events. Twenty different world events sit in the pool. Tourism boom. Pandemic. Political scandal. Natural disaster. Recession. Tech bubble. Lottery jackpot. Migration wave. The roll fires every few ticks and reshapes the local economy.

Five. NPC wildcards. Ten NPC archetypes show up every ten ticks, throwing an offer or a threat in a citizen's face. The mysterious investor offering ten percent above market for a failing business. The estranged sibling asking for a loan. The undercover police officer making contact with the drug dealer.

Six. Lifecycle. Death roll by age, insolvency check, health decay if stress was elevated for too long, ageing of skills and personality drift. Someone who got betrayed twice becomes more cautious. Someone who succeeded early becomes more confident. The promotion module runs here: an eligible citizen (right skill, right tenure, right karma) gets offered the next career stage with a roughly seventy percent approval rate, scaled by the boss-NPC's mood and the citizen's negotiation skill. Each job has four career stages (study, junior, middle, senior) with income multipliers of zero, zero point four, zero point seven and one point zero relative to the senior baseline. The addiction module also rolls here, applying the entry trigger and stage transitions. The heat module runs interrogation and raid checks against any citizen with a high enough heat level. The job-frustration module tallies the philosophy-job collision count and biases the next decision prompt toward a job change once three consecutive ticks of mismatch have piled up.

Seven. Storyteller. The narration agent writes one short story per tick describing what mattered. Those stories accumulate into the run history. The same Storyteller agent also writes the per-citizen setup story in Tick 0 and the final life-balance letter at the end of the season, so the voice across a citizen's life stays the same.
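The seven phases compose into a straight pipeline with one fan-out. The sketch below is plain TypeScript, not the LangGraph API: each phase is an async function over a hypothetical TickState, decide is a stub standing in for the per-citizen Claude call, and every phase except the decision step only records that it ran.

```typescript
interface Decision {
  citizen: string;
  actions: string[];
}

interface TickState {
  tick: number;
  citizens: string[];
  decisions: Decision[];
  log: string[]; // names of the phases that have run, in order
}

type Phase = (s: TickState) => Promise<TickState>;

const ACTIONS_PER_TICK = 4;

// Stub for the per-citizen LLM call; a real version would prompt Claude.
async function decide(citizen: string): Promise<Decision> {
  return { citizen, actions: Array.from({ length: ACTIONS_PER_TICK }, () => "rest") };
}

// Placeholder phase that only records that it ran.
const logOnly = (label: string): Phase => async (s) => ({ ...s, log: [...s.log, label] });

const decisionPhase: Phase = async (s) => {
  // Fan out: all citizens decide concurrently, so the phase takes about as
  // long as the slowest single call rather than the sum of all calls.
  const decisions = await Promise.all(s.citizens.map((c) => decide(c)));
  return { ...s, decisions, log: [...s.log, "decision"] };
};

// The seven phases, run strictly in order.
const PHASES: Phase[] = [
  logOnly("cashflow"),
  decisionPhase,
  logOnly("resolver"),
  logOnly("world-events"),
  logOnly("npc-wildcards"),
  logOnly("lifecycle"),
  logOnly("storyteller"),
];

async function runTick(initial: TickState): Promise<TickState> {
  let state = initial;
  for (const phase of PHASES) {
    state = await phase(state);
  }
  return state;
}
```

With nine citizens the decision phase yields nine decisions and thirty-six actions, which is exactly what the resolver then has to order.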

On top of those seven phases there are two cross-cutting mechanics. Family is bilateral: a marriage call needs acceptance from the other side, paired citizens aged eighteen to forty-five have roughly a three percent chance per tick of having a child, and inheritance moves fifty percent of cash plus a proportional share of bank debt to surviving spouses or children. NPC reactions are not scripted either: when a citizen orders a hit or extorts a competitor, the affected NPC's response is computed from their karma profile and personality, and for the named NPC wildcards a small Haiku call decides their reply so they sound different from each other.
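Both family rules reduce to a few lines. A sketch with hypothetical names; it assumes both partners must be inside the eighteen-to-forty-five window and reads "a proportional share of bank debt" as the same fifty percent fraction as the cash, split equally among the heirs.

```typescript
const BIRTH_CHANCE = 0.03;
const MIN_PARENT_AGE = 18;
const MAX_PARENT_AGE = 45;
const INHERITED_SHARE_PERCENT = 50;

interface Person {
  id: string;
  age: number;
  partnerId: string | null;
}

// Both must be paired with each other and inside the age window (assumption).
function canHaveChild(a: Person, b: Person): boolean {
  const inWindow = (p: Person) => p.age >= MIN_PARENT_AGE && p.age <= MAX_PARENT_AGE;
  return a.partnerId === b.id && b.partnerId === a.id && inWindow(a) && inWindow(b);
}

// `roll` is a uniform number in [0, 1).
function rollBirth(a: Person, b: Person, roll: number): boolean {
  return canHaveChild(a, b) && roll < BIRTH_CHANCE;
}

interface Estate {
  cash: number;     // EUR
  bankDebt: number; // EUR
}

// Assumption: the debt share uses the same 50 % fraction as the cash,
// split equally among surviving heirs.
function splitEstate(estate: Estate, heirCount: number): { cashEach: number; debtEach: number } {
  if (heirCount < 1) return { cashEach: 0, debtEach: 0 };
  return {
    cashEach: (estate.cash * INHERITED_SHARE_PERCENT) / 100 / heirCount,
    debtEach: (estate.bankDebt * INHERITED_SHARE_PERCENT) / 100 / heirCount,
  };
}
```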

Each citizen earns XP across ten universal skills (negotiation, charisma, analytical thinking, stealth, empathy, etc.). After about ten game-years of practice they are noticeably better at their craft and earn more per hour. They accumulate or lose karma on two axes, one measuring how lawful they are, one measuring how altruistic. The four quadrants map to recognisable archetypes. The lawful altruistic citizen is a hero figure that NPCs trust on sight. The lawful selfish one is a sharp operator who plays within the rules but never gives an inch. The unlawful altruistic one is a Robin Hood whose neighbours protect them. The unlawful selfish one is straight-up mafia.
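The two karma axes and their quadrants can be sketched as follows. The -100 to 100 scale, the zero threshold between quadrants and the labels are assumptions for illustration.

```typescript
interface Karma {
  lawful: number;     // negative = unlawful
  altruistic: number; // negative = selfish
}

type Archetype = "hero" | "sharp operator" | "Robin Hood" | "mafia";

function clampKarma(v: number): number {
  return Math.max(-100, Math.min(100, v));
}

// Karma drifts with each resolved action but stays on the assumed scale.
function driftKarma(k: Karma, dLawful: number, dAltruistic: number): Karma {
  return {
    lawful: clampKarma(k.lawful + dLawful),
    altruistic: clampKarma(k.altruistic + dAltruistic),
  };
}

// The sign of each axis picks the quadrant.
function archetype(k: Karma): Archetype {
  const lawful = k.lawful >= 0;
  const altruistic = k.altruistic >= 0;
  if (lawful) return altruistic ? "hero" : "sharp operator";
  return altruistic ? "Robin Hood" : "mafia";
}
```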

Conflict emerges on its own. Direct market competition when two citizens accidentally pick similar jobs and start undercutting each other. Cross-role friction when the police officer and the drug dealer are both in town. Asymmetric power when the mayoral candidate decides who pays which tax rate. We do not script any of it. The dynamics produce themselves.

At the end of the sixty years we tally which life goals were achieved. Seven different winner titles are awarded in parallel at the season finale because reducing a life to one metric felt wrong. The Richest, the Most Powerful, the Most Famous, the Cleanest, the Mafioso, the Survival Artist, the Loverboy. Then the storyteller writes a life-balance letter for each citizen in first person. The letters live publicly at aklow-labs.com/polis/bilanzen so anyone can read what happened.

What a sixty-tick smoke run actually looked like

Before letting a full sixty-year season run, we always do a sixty-tick smoke run first. That is five years of in-game time: just enough for the V3.5 mechanics to show up, but short enough that we can read every line. The latest one (run ID starting 684eaac2) gave us nine citizens and a story for each.

Citizen	Job	Path	Stage	Cash (EUR)	Note
Kevin	doctor	wissensarbeit	study	minus 7k plus 560k villa	rich heir, paying villa upkeep
Maria	hedge fund manager	wissensarbeit	study	minus 1k, study debt 30k	five years in, not yet earning
Pamela	journalist	wissensarbeit	junior	minus 35k	graduated tick 48, now earning
Rebecca	architect, then sales clerk	normal	junior	minus 6.5k	dropped out, fell back to retail
Daniel	hacker	illegal	junior, stage 2	44k	promotion without studying
Jeffrey	contract killer	illegal	junior, stage 2	32k	promotion without studying
Julie	sales clerk	normal	junior	minus 2.5k
Benjamin	musician	normal	junior	minus 3.5k
Stephen	hairdresser	normal	junior	minus 101k	"Loverboy" title; two kids plus low income, real-life crash

Things that stood out. The wissensarbeit citizens are still negative after five years because they are still paying down student debt. Rebecca dropped out of architecture and fell back to sales clerk, which is exactly the path the engine simulates. Kevin sits on a villa worth more than half a million but his cash is negative because he had to pay villa upkeep without working yet. Daniel and Jeffrey are flush with black cash from illegal work, but their heat is climbing and a drug raid this year would wipe most of it. Stephen, the hairdresser with two kids, is the cautionary tale: nothing illegal, no addiction, just bad luck and family costs. He crashes to minus a hundred and one thousand euros and earns the Loverboy title because he kept his social circle close even while broke.

That is the result we wanted from V3.5. Not "everyone wins eventually". Not "the model with the biggest number wins". A real distribution where path, class and luck push citizens into different financial realities, and the model tier is one variable among many. The first proper sixty-year season with real LLM calls and Langfuse traces is the next planned run.

Watch aklow-labs.com/polis/citizens for the current roster, aklow-labs.com/polis/town for the live town view and aklow-labs.com/polis/bilanzen for the archive of completed seasons.

What is under the hood

For the technically curious, here is the stack without going into proprietary details.

The simulation engine runs on LangGraph, our standard orchestration layer for multi-step agent workflows. Each game tick is one workflow run with the seven phases above. The nine citizen decisions run truly in parallel, so even with nine concurrent LLM calls a tick completes in about twenty seconds. We use the official PostgresSaver checkpointer with a dedicated polis_langgraph schema and a per-run thread_id (polis-v3-tick-${run_id} for the tick loop, polis-v3-setup-${run_id} for setup). That gives us resumable workflows out of the box and a complete state snapshot per tick that we can replay or branch from later.

For durability we use Temporal. Setup is wrapped as a Temporal workflow with a per-citizen heartbeat, three retries, a three-minute heartbeat timeout and a ten-minute start-to-close timeout. If a single citizen's portrait generation times out mid-setup, only that citizen retries, not the whole season. The 720-tick run loop is wrapped too: polisTickWorkflow runs runTickBatchActivity and calls continueAsNew every 100 ticks, so the workflow history stays bounded across a full sixty-year arc. The runs are triggered through Temporal's Schedule API with ScheduleOverlapPolicy.SKIP and a one-hour catchup window, replacing system cron. There is also a cron-resume path through runSingleTickFromDb that loads players, world state and recent actions from Postgres, so a server reboot or container restart never breaks a season. A container crash mid-loop no longer loses ticks; the workflow resumes on another worker.
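The continueAsNew cadence is simple arithmetic over the season. In the Temporal TypeScript SDK the restart itself is the continueAsNew function exported from @temporalio/workflow; this dependency-free sketch leaves the SDK out and only plans which ticks each workflow execution would cover. The helper name is hypothetical.

```typescript
const TICKS_PER_RUN = 100;

interface Segment {
  firstTick: number; // inclusive
  lastTick: number;  // inclusive
}

// Splits ticks 1..totalTicks into consecutive runs of at most `perRun` ticks;
// each segment corresponds to one workflow execution before continueAsNew.
function planSegments(totalTicks: number, perRun = TICKS_PER_RUN): Segment[] {
  if (!Number.isInteger(perRun) || perRun < 1) {
    throw new RangeError("perRun must be a positive integer");
  }
  const segments: Segment[] = [];
  for (let first = 1; first <= totalTicks; first += perRun) {
    segments.push({ firstTick: first, lastTick: Math.min(first + perRun - 1, totalTicks) });
  }
  return segments;
}

const season = planSegments(720);
```

A full season becomes eight executions: seven full runs of 100 ticks and a final run of 20. Each continuation starts with a fresh event history, which is what keeps the history bounded.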

For agent memory, each citizen currently has an in-memory ring buffer of the last six ticks plus a persistent trust_matrix entry for every other citizen. The full hookup into StudioMeyer Memory, with one tenant per citizen, is on the V3 backlog. The target is that after thirty game-years Marcus actually remembers that Lisa shared a secret with him in year four and that he then betrayed her in year eleven.
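The short-term memory described here is a fixed-size ring buffer, and trust is a directed map from one citizen to another. A minimal sketch; the class and function names are hypothetical, and the [-1, 1] trust range is an assumption.

```typescript
// Fixed-capacity ring buffer: once full, each push evicts the oldest entry.
class RingBuffer<T> {
  private readonly items: (T | undefined)[];
  private readonly capacity: number;
  private start = 0; // index of the oldest entry
  private size = 0;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    const index = (this.start + this.size) % this.capacity;
    this.items[index] = item;
    if (this.size < this.capacity) {
      this.size++;
    } else {
      // Full: the write replaced the oldest slot, so the next slot is now oldest.
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Entries from oldest to newest.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.items[(this.start + i) % this.capacity] as T);
    }
    return out;
  }
}

// Directed trust: how much `from` trusts `to`, clamped to an assumed [-1, 1].
type TrustMatrix = Map<string, Map<string, number>>;

function adjustTrust(m: TrustMatrix, from: string, to: string, delta: number): number {
  const row = m.get(from) ?? new Map<string, number>();
  const next = Math.max(-1, Math.min(1, (row.get(to) ?? 0) + delta));
  row.set(to, next);
  m.set(from, row);
  return next;
}
```

Trust is directed on purpose: a betrayal lowers the victim's trust in the betrayer without implying anything about the reverse direction.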

For observability we use Langfuse at the generation level. Every season is a parent trace started via startSeasonTrace and tagged model:opus, model:sonnet, model:haiku, with Setup and Run attached as child traces. Every tick is a span underneath. Every citizen decision is a Langfuse generation with proper model, input, output, usageDetails, costDetails and durationMs populated (portrait cost included). All five Setup-Deciders, the tick-loop Player-Decide calls and the Storyteller are instrumented the same way. For token counting we use js-tiktoken with the cl100k_base encoding as an approximation across all three LLM surfaces, so we get token aggregates without parsing per-provider response shapes. That means filter-by-model and "what did Opus prefer to do across seasons one, two and three" works as a dashboard, not just per-trace inspection.

For schema validation we use Zod everywhere: on the WorldState reducer, on the SSE payload before broadcast, on the API route inputs, on the JSON outputs from each LLM Setup-Decider. Defence in depth against the drift bugs that emerge when an LLM returns a slightly different shape than the engine expects.

For real-world grounding we let citizens use our SearXNG-based research server to look up actual market data. Before each season starts the setup workflow searches for current hairdresser rates in Palma, lawyer hourly fees on Mallorca, average restaurant margins in Spain, current real estate prices. These numbers get written into polis.market_baseline and anchor the simulation in reality rather than in our assumptions. During play, citizens can use the same search tool to research trends or check prices, at the cost of one of their four monthly actions.

Persistence runs on PostgreSQL with LISTEN/NOTIFY. Ten V3 tables under the polis schema (player_stats, player_skills, player_portraits, trust_matrix, market_baseline, life_bilanzen, world_events_log, npc_interactions, player_actions_v3, player_relationships) plus the v3_citizen_setup view that joins setup-story, philosophy, zodiac and origin into one read. Each new action triggers a pg-NOTIFY on one of eight channels, and the website's SSE broadcaster listens and pushes the event into the live feed.
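The last hop from Postgres to the browser is a format conversion: a NOTIFY payload becomes one Server-Sent Events frame. A sketch with hypothetical channel names (the post does not list the eight channels); the PgNotification shape mirrors the channel and payload fields node-postgres exposes on its notification event.

```typescript
// Simplified shape of a Postgres notification as node-postgres delivers it.
interface PgNotification {
  channel: string;
  payload?: string;
}

// Formats one notification as a Server-Sent Events frame: optional id,
// event name, one data line per payload line, and a blank line to end it.
function toSseFrame(n: PgNotification, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${n.channel}`);
  // A payload containing newlines has to be split across several data lines.
  for (const part of (n.payload ?? "").split("\n")) {
    lines.push(`data: ${part}`);
  }
  return lines.join("\n") + "\n\n";
}
```

Using the channel name as the SSE event name lets the frontend subscribe per channel with addEventListener instead of parsing every message.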

The frontend lives at aklow-labs.com/polis/town and runs on Next.js with React Three Fiber for the 3D town visualisation. We are mid-flight on the 3D city rebuild right now, so the look you see may change between when this post lands and when you read it. More detail on the 3D layer in a follow-up post once it settles.

Seven background agents keep the lab running without us babysitting it. A CEO who decides the roadmap. A CTO who runs read-only code reviews and cannot delete anything. An Architect who researches and weighs options before any larger change. A Storyteller who writes the tick stories, the life-balance letters and the per-citizen backstories. A Research agent that hits market and science sources. An Analytics agent that crunches the run data. A Visibility agent that pulls GSC, Bing and Cloudflare analytics into one view. All seven run on the same agent framework we sell to clients. We are eating our own dog food in public.

What we already caught wrong

Building in public means showing the corrections too. Three big ones since the last engine iteration.

The 1000-euro flatline. An earlier version of this post described every citizen as starting "with a thousand euros". That hid the actual mechanic. Real start capital depends on social class and runs from two hundred euros (poor, with five thousand of inherited debt) up to twenty-five thousand euros plus an eight-hundred-thousand-euro villa for the rich heir, before the thirty percent inheritance tax cuts that to seventeen and a half thousand plus a villa worth five hundred and sixty thousand. The asymmetry from tick one is the research variable, not noise. Hiding it under a flat number trivialises the whole simulation.

The Julian Vogel Haiku-empire story. An earlier snapshot post showed a Haiku citizen "quietly building an empire" with eight hundred thousand net worth after a few ticks. That was setup-only data. The number was the inherited villa value rolled into net worth at tick one, not any decision Haiku made. The naive read ("smallest model wins early!") was wrong. Class inheritance was the explanation. We pulled the snapshot and refuse to compare models on inherited wealth. Real model comparison starts with realised cashflow over multiple ticks, and that only becomes meaningful after a real season completes.

The flat-income tutorial trap. An earlier engine version paid every citizen a similar amount per month regardless of class, job or experience. That made the simulation feel like a tutorial game where everyone just earns money. Real life is not flat. Real life has wissensarbeit citizens going five years into debt before they see a euro of return. Real life has illegal-path citizens flush with cash one month and stripped to nothing by a police raid the next. Real life has cash shocks like divorce and chronic illness that punish even the planners. V3.5 added that asymmetry as the Real-Life-Foundation: three paths, four career stages, drug addiction, police interrogations, eight cash shocks, FIFO-aged child costs. The sixty-tick smoke now shows realistic outcomes where someone like Stephen the hairdresser can crash to minus a hundred thousand euros doing nothing wrong, just because life pressed.

All three corrections are exactly the kind of thing the V3 setup workflow surfaces by making class and path asymmetry explicit. If we publish a number, we now want to know whether it came from a roll, a transfer, a study debt, an inherited liability or an actual decision.

We are publishing the architecture, the research findings and the citizen life-balance letters openly. The engine source code lives in a public MIT-licensed mirror at github.com/studiomeyer-io/polis-darwin. The two npm packages we use for the evolution layer, darwin-agents and darwin-langgraph, are also public and installable if you want to build your own self-evolving agent workflows on top of LangGraph.

What we hope to learn. First, whether Claude models actually differ in long-term decision quality, or whether the differences only show up in single-shot benchmarks. Second, which model performs best across which life dimensions. Career arc. Family stability. Crime survival. Civic engagement. Third, where the cracks are in our Darwin evolution loop. Every weird thing a citizen does is potentially a missing rule or a bad prompt that we can fix.

If you want to watch a season play out, aklow-labs.com/polis/town is the live town view. /polis/citizens shows the current roster with cash, karma, family status and the philosophy each one picked. /polis/bilanzen is the archive of completed seasons with the life-balance letters. Weekly updates land here, final letters at the end of every season, and our honest take on what we learned including the parts where the simulation broke and we had to fix it.

Building something nine AIs would actually want to live in turns out to be much harder than building something nine humans would. Which is exactly why we are doing it.