How AI Citations Actually Work, From Query to Footnote

Matthias Meyer

Ask ChatGPT, Perplexity and Gemini the same question and on more than a third of queries you get three different lists of sources. Same web, same question, three separate verdicts on who is worth citing. That divergence is the clearest window we have into how AI citations actually work, and almost everything written about the topic skips the mechanism and jumps straight to optimization tips. This is the mechanism.

An AI citation looks like a small thing. A number in superscript, a little source card under a paragraph, a link with utm_source=chatgpt.com stapled to the end. Behind that small thing sits a pipeline that runs in the second between your question and the answer, and it has almost nothing in common with how Google ranked pages for the last twenty years. Once you can see the pipeline, the topic stops being mystical. You can predict, fairly well, why a model cited one page and ignored another that was objectively better written.

A Citation Is Not the Model Knowing You

There are two completely different ways a language model can produce a sentence about your business. The first is parametric memory, the knowledge baked into its weights during training. If your company was in the training data, the model might "know" you, but it cannot point at a source, because it never stored a URL, only a statistical blur of text it absorbed months earlier. The second way is real-time retrieval. The system goes out, fetches live documents, and writes the answer using those documents as evidence. A citation is only ever the second thing.

This is Retrieval-Augmented Generation, usually shortened to RAG, and it is the architecture under almost every AI answer that shows sources. The model retrieves relevant documents first, then generates the answer grounded in them. The difference between the two paths is the difference between "the model has a vague impression of you" and "the model can quote you and link to you." One analysis put more than 680 million AI citations under the microscope in 2026, and the pattern holds everywhere: the cited answers are the grounded ones, not the remembered ones.

That distinction reframes the whole problem. Getting into the training data is slow, fuzzy and mostly out of your hands. Getting retrieved is a live, mechanical event that happens every time someone asks a relevant question, and it follows rules you can actually reason about.

The Pipeline Behind a Single Answer

When you ask an AI search tool a question, five things happen in quick succession. They are worth walking through, because each stage filters out most of the web before the next one even starts.

First, query interpretation. Your messy human question gets rewritten into one or several short retrieval queries. These are called grounding queries, and they are the literal terms the system will actually search for. "Who is the best estate agent for rural fincas near Campos" might become three clean queries about regions, property types and agencies.

Second, retrieval. The system runs those queries against an index using hybrid search, combining old-school keyword matching (BM25) with dense vector embeddings that capture meaning rather than exact words. Where the index comes from differs by engine. Perplexity crawls the open web continuously. ChatGPT leans heavily on Bing's index. Each one is searching a different map of the internet.
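To make "hybrid search" concrete, here is a minimal sketch of one common way to merge a keyword ranking with a vector ranking: reciprocal rank fusion. This is an assumption for illustration only. Nothing above says which fusion method any engine actually uses, and the page ids, the two input lists and the constant k=60 are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so pages that rank well in both lists float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the result is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one grounding query.
keyword_hits = ["page_a", "page_b", "page_c"]  # stand-in for a BM25 ranking
vector_hits = ["page_c", "page_a", "page_d"]   # stand-in for an embedding ranking

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

The point of the sketch is the shape of the outcome: a page that appears in both lists beats a page that tops only one of them.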

Third, re-ranking. Retrieval returns far too many candidates, so a re-ranker scores them and keeps a handful. Perplexity is documented as running a three-tier reranker for this. Hundreds of pages collapse to maybe five or eight.

Fourth, extraction. This is the stage most people miss. The system does not read your whole page. It pulls the specific passages that answer the sub-question, the chunks, and discards the rest.

Fifth, synthesis and the citation decision. The model writes the answer constrained by those passages, then attaches each source to the spans its passage supported.

The consequence of stage four is the single most important fact about AI citations. Engines cite passages, not pages. They do not rank your site or judge its overall quality the way Google does. They lift the paragraph that cleanly answered one narrow question. A page can sit at position one on Google and never get cited, because the answer was smeared across five paragraphs and no single chunk stood on its own. The numbers bear this out: only about 44 percent of pages ranking in Google's top ten show up in AI citations at all. It is a different game with a different scoreboard.
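A toy sketch of passage-level extraction shows why a self-contained paragraph wins. It assumes the simplest possible scoring, counting the words a paragraph shares with the sub-question, as a stand-in for whatever relevance model a real engine runs. The function names and the sample page are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_passage(page_text, sub_question):
    """Split a page into paragraphs and return the paragraph sharing the
    most words with the sub-question, along with that overlap count."""
    paragraphs = [p.strip() for p in page_text.split("\n\n") if p.strip()]
    query_terms = tokenize(sub_question)
    scored = [(len(query_terms & tokenize(p)), p) for p in paragraphs]
    score, passage = max(scored, key=lambda pair: pair[0])
    return passage, score


# Hypothetical page: only one paragraph answers the question on its own.
page = (
    "Our agency was founded in 1998 and has grown every year.\n\n"
    "Rural fincas near Campos typically sell for less than coastal villas.\n\n"
    "Contact us for a viewing."
)
passage, score = best_passage(page, "how much do rural fincas near Campos cost")
print(score, passage)
```

Everything outside the winning paragraph is discarded, which is the mechanical meaning of "passages, not pages."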

Grounding Is the Part That Makes Citations Trustworthy

Grounding is the mechanism that separates a model's opinion from a model's evidence. After the draft answer is generated, good systems run span-level verification. Each assertion in the answer gets matched back against the retrieved passages. The system either confirms the passage supports the claim, flags the claim as unverified, or catches a contradiction between the claim and the source. It is a fact-checking layer sitting between the model's generation and your screen.
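Span-level verification can be pictured with a deliberately crude sketch. It assumes word coverage as the support test and an arbitrary 0.6 threshold, both chosen for illustration, and it covers only the "supported" and "unverified" outcomes. Catching contradictions needs a real inference model and is left out. The function names are hypothetical.

```python
import re


def words(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def verify_claims(claims, passages, threshold=0.6):
    """Label each claim 'supported' if at least `threshold` of its words
    appear in a single retrieved passage, otherwise 'unverified'."""
    results = {}
    for claim in claims:
        claim_words = words(claim)
        if not claim_words:
            results[claim] = "unverified"
            continue
        # Best coverage of this claim by any one passage.
        coverage = max(
            (len(claim_words & words(p)) / len(claim_words) for p in passages),
            default=0.0,
        )
        results[claim] = "supported" if coverage >= threshold else "unverified"
    return results


# Retrieved passages and draft claims, all illustrative.
passages = [
    "Perplexity crawls the open web continuously.",
    "ChatGPT leans heavily on Bing's index.",
]
claims = [
    "Perplexity crawls the web continuously",
    "ChatGPT uses its own crawler only",
]
labels = verify_claims(claims, passages)
print(labels)
```

Only the first claim would earn a citation here. The second has no passage behind it, so a careful system would flag or drop it.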

On top of that runs corroboration. The engine cross-references a claim against other authoritative sources across the web. If several trusted sources state the same fact in similar language, the engine treats it as verified and is comfortable citing it. A lone page making an unusual claim that nothing else echoes is a weak and risky citation candidate, even if the claim happens to be true.

This is why citations reduce hallucination instead of causing it. The answer is tethered to text the system genuinely pulled and checked. It is also why the structure of your content matters more than its prose polish. The pipeline is not admiring your writing. It is checking whether a passage supports a specific claim, and whether the rest of the web backs it up.

Four Engines, Four Different Minds

The reason three assistants give three different source lists is that they are running different retrieval strategies on different indexes with different biases. The broad shapes in 2026 look like this.

Perplexity is retrieval-first. It searches almost every query, crawls the web continuously, and cites by default with numbered inline sources. It pulls nearly three times more sources per answer than ChatGPT, leans unusually hard on Reddit (close to 47 percent of its top citations), and reacts to structural changes on a page within two to seven days, the fastest of the bunch. Schema markup barely moves it.

ChatGPT is parametric by default. It answers from training unless a query trips its search behavior, at which point it retrieves through Bing's index. Its training left it biased toward consensus and encyclopedic sources, which is why Wikipedia looms so large in its citations. It cites a smaller share of what it finds, and with 800 million weekly users, being invisible there is the most expensive kind of invisible. Since June 2025 it tags citation links with utm_source=chatgpt.com, which at least makes the traffic measurable.
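Because the tag is an ordinary query parameter, spotting this traffic in your own logs takes a few lines. A minimal sketch using only the Python standard library; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def is_chatgpt_referral(url):
    """Return True if the landing URL carries utm_source=chatgpt.com."""
    params = parse_qs(urlparse(url).query)
    return "chatgpt.com" in params.get("utm_source", [])


print(is_chatgpt_referral("https://example.com/guide?utm_source=chatgpt.com"))
print(is_chatgpt_referral("https://example.com/guide?utm_source=newsletter"))
```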

Claude is the conservative one. It leans on its training and a supplied corpus, and only browses when given tools. When it does cite, it rewards depth and clear structure: it is roughly 30 percent more likely to cite a well-organized, bullet-pointed page. It is also the strictest engine on freshness. On time-sensitive topics it discounts content whose last-modified date is more than a year old.

Gemini and Google AI Overviews sit on Google's own Search index, skew toward brand and entity signals, and show their sources beneath the summary rather than inline.

The practical upshot is divergence. Across ChatGPT, Perplexity and Gemini, somewhere between 35 and 40 percent of queries return source sets that barely overlap. ChatGPT and Perplexity have been measured sharing only around 11 percent of their cited domains. Treat AI visibility as one thing and you are optimizing for one engine while three others quietly ignore you.
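You can measure this divergence on your own queries. The sketch below assumes one reasonable definition of "shared cited domains", the Jaccard overlap of the two sets of hostnames. The studies behind the figures above may define it differently, and the URLs here are made up.

```python
from urllib.parse import urlparse


def cited_domains(urls):
    """Reduce a list of cited URLs to a set of lowercase hostnames,
    dropping a leading 'www.' so variants of one site count once."""
    domains = set()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        domains.add(host)
    return domains


def domain_overlap(urls_a, urls_b):
    """Jaccard overlap: shared domains divided by all domains cited."""
    a, b = cited_domains(urls_a), cited_domains(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical citation lists from two engines for the same question.
engine_one = [
    "https://www.reddit.com/r/x",
    "https://en.wikipedia.org/wiki/Y",
    "https://example.com/a",
]
engine_two = [
    "https://reddit.com/r/z",
    "https://example.org/b",
    "https://news.example.net/c",
]
overlap = domain_overlap(engine_one, engine_two)
print(overlap)
```

Run it over a few dozen real questions per engine pair and you get your own version of the overlap figure, for your own niche.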

Why Some Pages Get Pulled In and Most Do Not

Once the pipeline is clear, the reasons certain pages keep getting cited stop looking like SEO folklore and start looking like plumbing.

Retrievability comes first, and it is the most common silent failure. An engine cannot cite a page its crawler cannot reach. Each one answers to its own name in robots.txt: OAI-SearchBot for ChatGPT search, ClaudeBot and Claude-User for Anthropic, PerplexityBot, and Google-Extended, which is a robots.txt control token for Gemini rather than a separate crawler. Block one in your robots.txt and that engine is simply blind to you, no matter how strong the content is. Plenty of excellent pages are uncitable for this one boring reason.
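This failure is easy to test for. The sketch below feeds a robots.txt body to Python's standard-library parser and reports which of the agent names above would be refused. The sample robots.txt and the URL are hypothetical, and the stdlib parser is only an approximation: each vendor's crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Agent names as listed in the text above.
AI_AGENTS = [
    "OAI-SearchBot",
    "ClaudeBot",
    "Claude-User",
    "PerplexityBot",
    "Google-Extended",
]


def blocked_agents(robots_txt, url, agents=AI_AGENTS):
    """Return the agents that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that shuts out one engine and allows the rest.
sample_robots = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

blocked = blocked_agents(sample_robots, "https://example.com/guide")
print(blocked)
```

To check a live site, fetch its robots.txt first and pass the text in. An empty list is the result you want.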

Extractability comes next. Because the pipeline lifts passages, content that answers a sub-question cleanly in a single place gets extracted, and content where the answer is diffused across half a page does not. That is the real reason answer-first writing, clear headings, tables and direct definitions correlate with citations. They are not magic ranking signals. They are mechanically easier to chunk and lift.

Then corroboration and original data. A claim echoed across the web in consistent language is safe to cite, which is why brand presence on Reddit, Wikipedia, news and review sites moves citations more than any on-page tweak, especially on ChatGPT. The flip side is just as useful: publish a number nobody else has and you become the only possible source for it. Original research has been measured at roughly 3.7 times more likely to be cited, and structured data markup at about 2.1 times. Freshness closes the loop, since some engines, Claude most of all, quietly discount stale timestamps.

None of this is a trick. It is the shape of the pipeline showing through. The machine rewards content that is reachable, liftable, corroborated and current, because those are the four things the pipeline literally checks.

The Next Layer: From Reading You to Acting on You

Citations are about whether a model can read and reference you. The frontier moving through 2026 is whether an agent can do something with you, and a few standards are quietly building that bridge.

The lightest is llms.txt, a markdown file at the root of your site that lists your important pages with short descriptions, a kind of sitemap written for models instead of crawlers. It reduces the work an engine has to do to figure out what matters, and it is already in use by Cloudflare, Stripe and hundreds of thousands of other sites. You can read the llms.txt spec in a couple of minutes. Schema.org markup does a related job at the data level, handing the parser structured facts instead of prose it has to interpret.
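For a sense of how little is involved, here is a small sketch that renders an llms.txt body. It assumes the layout described in the published proposal as I understand it (a title heading, a one-line summary in a blockquote, then sections of described links), so check the spec before shipping anything. The site name, URL and descriptions are placeholders.

```python
def render_llms_txt(site_name, summary, sections):
    """Build llms.txt content from a site name, a one-line summary, and a
    mapping of section heading -> list of (title, url, description)."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in pages:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for illustration only.
llms_txt = render_llms_txt(
    "Example Agency",
    "Estate agency focused on rural property.",
    {
        "Guides": [
            (
                "Buying a finca",
                "https://example.com/guides/finca",
                "Step-by-step overview of the purchase process.",
            ),
        ],
    },
)
print(llms_txt)
```

The output is plain markdown, saved as /llms.txt at the site root.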

The bigger shift is the Model Context Protocol, an open standard from Anthropic that lets an AI app connect straight to a live data source or tool instead of scraping text off a page. Instead of guessing your prices from a cached paragraph, a model can query them directly. The common shorthand is "USB-C for AI." One step further sits the idea behind WebMCP and agents.json, where a site publishes callable tools (book a slot, check availability, request a quote) that an agent can invoke directly. The page stops being something to read and becomes something to operate.

The trajectory is straightforward. It runs from "is my content in the retrieval index" to "can an agent transact with my business without a human ever opening the site." Citations are the first rung on that ladder, which is exactly why they are worth understanding properly rather than chasing with checklists.

What This Actually Means

Strip it all back and an AI citation is the visible end of a retrieval, grounding and verification pipeline. It is not proof the model knows you, and it has surprisingly little to do with how you rank on Google. The pages that get cited are the ones the pipeline can reach, lift cleanly, corroborate against the rest of the web, and trust as current.

My prediction is that the gap between "ranks well on Google" and "gets cited by AI" keeps widening, because the two measure genuinely different things, and a lot of businesses are about to discover their hard-won SEO does not carry over the way they assumed. The ones who treat AI visibility as its own discipline, with its own mechanics and its own measurement, will pull ahead while everyone else waits for the citations to show up on their own. That discipline has a name, Generative Engine Optimization, and the first half of doing it well is simply understanding the pipeline you are optimizing for.