Claude Fable 5 Is Two Models Wearing One Name

Matthias Meyer

On June 9, 2026, Anthropic shipped the most capable model it has ever released to the public. The most interesting thing about it is the part that sometimes refuses to talk to you.

Claude Fable 5 is the first model from what Anthropic calls its Mythos class, a tier that now sits above Opus. It launched as a pair. Fable 5 is the public version. Claude Mythos 5 is the same underlying model with its guardrails loosened, and it is not for sale to most of us. It goes only to vetted cyberdefenders and infrastructure providers through a program called Project Glasswing, in collaboration with the US government. Two names, one brain. The thing that separates them is a set of classifiers.

That detail is the whole story, and almost every launch-day write-up buried it under the benchmark chart. So let me start there instead.

One Model, Two Names, One Classifier in Between

Fable 5 ships with three classifiers running alongside it. They watch for requests about offensive cybersecurity, about biology and chemistry that edge toward weapons, and about distillation, which is using the model to train a competitor. When a classifier fires, Fable 5 does not answer. The request gets handed to Claude Opus 4.8, the model that was the top of the public stack until that morning, and Opus answers in Fable's place.

For anyone building on the API, this is not an abstract safety story. It is a response shape you have to handle. A refused request comes back as stop_reason: "refusal" with a normal HTTP 200, not an error, and it tells you which classifier tripped. You can have the API retry on another model with a fallbacks parameter, or do it client-side with the SDK middleware. You are not billed for a request that is refused before it generates output.

{
  "stop_reason": "refusal",
  "stop_sequence": null,
  "content": []
}
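If you handle this client-side, the logic is small. Here is a minimal sketch, assuming you already have some function that sends a prompt to a named model and returns the parsed JSON body as a dict. The `send` callable, the helper names, and the model identifier strings are all placeholders of mine, not the SDK's API, and the sketch does not use the `fallbacks` parameter or the SDK middleware at all. It only checks the `stop_reason` field shown above.

```python
from typing import Callable, Optional, Tuple

# Placeholder model identifiers (assumption); use the real IDs from the docs.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


def is_refusal(response: dict) -> bool:
    """True when the response body carries stop_reason == "refusal"."""
    return response.get("stop_reason") == "refusal"


def complete_with_fallback(
    send: Callable[[str, str], dict],
    prompt: str,
    primary: str = PRIMARY_MODEL,
    fallback: Optional[str] = FALLBACK_MODEL,
) -> Tuple[str, dict]:
    """Send to the primary model; on a refusal, retry once on the fallback.

    `send(model, prompt)` stands in for your own API call and must return
    the parsed JSON body. Returns (model_used, response).
    """
    response = send(primary, prompt)
    if is_refusal(response) and fallback is not None:
        return fallback, send(fallback, prompt)
    return primary, response


# A fake transport so the sketch runs without network access.
def fake_send(model: str, prompt: str) -> dict:
    if model == PRIMARY_MODEL and "exploit" in prompt:
        return {"stop_reason": "refusal", "stop_sequence": None, "content": []}
    return {
        "stop_reason": "end_turn",
        "stop_sequence": None,
        "content": [{"type": "text", "text": f"answer from {model}"}],
    }


model_used, reply = complete_with_fallback(fake_send, "summarize this exploit write-up")
print(model_used, reply["stop_reason"])
```

The important design choice is that a refusal is treated as a normal return value rather than an exception, because that is how it arrives: with a 200. Code that only checks the HTTP status will pass an empty content list downstream without noticing.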

Anthropic says this is rare. Its early numbers put at least 95 percent of Fable sessions running entirely on Fable's own answers. I believe that for general work. But "rare on average" and "rare for your workload" are different claims. If you build security tooling, parse exploit write-ups, or do biochemistry, you live closer to the classifier's tripwire than the average user, and your effective experience is a quieter, cheaper model with a more expensive bill. Worth knowing before you point a production pipeline at it.
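The way to find out which side of that average you sit on is to measure it. A rough sketch, assuming you log the `stop_reason` of every response and can group those logs by session. The function name and the sample data are mine and purely illustrative. The 5 percent threshold is just the complement of the "at least 95 percent" figure above.

```python
from typing import Iterable, List

# Complement of the "at least 95 percent" figure quoted in this article.
REPORTED_MAX_RATE = 0.05


def fallback_session_rate(sessions: Iterable[List[str]]) -> float:
    """Fraction of sessions in which at least one response was a refusal.

    Each session is a list of stop_reason strings, one per response.
    """
    sessions = list(sessions)
    if not sessions:
        return 0.0
    affected = sum(1 for reasons in sessions if "refusal" in reasons)
    return affected / len(sessions)


# Made-up logs for illustration only: four sessions, one with a refusal.
sample = [
    ["end_turn", "end_turn"],
    ["end_turn", "refusal", "end_turn"],
    ["end_turn"],
    ["tool_use", "end_turn"],
]

rate = fallback_session_rate(sample)
above_reported = rate > REPORTED_MAX_RATE
print(f"{rate:.0%} of sessions hit a refusal; above reported average: {above_reported}")
```

Run it over a week of real traffic before committing a pipeline. If your rate sits well above 5 percent, the average does not describe you.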

The Benchmark Lead Is Real and Narrower Than It Looks

The headline number is genuine. On SWE-bench Pro, the hard agentic coding benchmark, Fable 5 scores 80.3 percent. Opus 4.8 sits at 69.2, GPT-5.5 at 58.6, and Gemini 3.1 Pro at 54.2. That is an eleven-point lead over Anthropic's own previous best and more than twenty points over the strongest general model from OpenAI. On Cognition's FrontierCode Diamond it roughly doubles Opus's score. These are not rounding errors. For long, multi-step coding work, this is the widest gap between frontier models I have seen in a single generation.

Then look at the second number Anthropic published and almost nobody quoted. On SWE-bench Verified, Fable 5 scores 95.0 and Mythos 5 scores 95.5. Same model, half a point apart. The gap is not capability. It is Fable's safety fallback occasionally kicking a coding task over to Opus. That half point is the price of the guardrails, measured.

So the lead is real, but it is concentrated. Agentic coding, tool use, long-context reasoning, finance, vision. Anthropic reports the first score above 90 percent on Hex's analytics suite and the top mark on Hebbia's finance benchmark. As a vendor proof point it cites Stripe running Fable 5 across a 50-million-line Ruby codebase and finishing in a day a migration that Stripe estimated would have taken a team more than two months by hand. Impressive, and also exactly the kind of single-customer number that should make you want to run your own test before you believe it about your codebase.

What It Costs, and the June 22 Catch

Fable 5 costs 10 dollars per million input tokens and 50 per million output. That is exactly double Opus 4.8, which is 5 and 25. It is also less than half what the restricted Mythos Preview cost earlier in the year, so on its own terms the price came down. It carries a 1M token context window and up to 128k output tokens, and it is a Covered Model, which means a 30-day data retention requirement and no zero-retention option. If your contract assumes zero retention, this model does not fit it.
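The doubling is easy to turn into a per-request number. A minimal sketch using only the prices quoted above. The `Price` type, the function name, and the example token counts are mine, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # dollars per million input tokens
    output_per_mtok: float  # dollars per million output tokens


# Prices as quoted in this article.
FABLE_5 = Price(input_per_mtok=10.0, output_per_mtok=50.0)
OPUS_4_8 = Price(input_per_mtok=5.0, output_per_mtok=25.0)


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given per-million-token prices."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


# Hypothetical request: 200k tokens of context in, 8k tokens out.
fable_cost = request_cost(FABLE_5, 200_000, 8_000)
opus_cost = request_cost(OPUS_4_8, 200_000, 8_000)
print(f"Fable 5: ${fable_cost:.2f}  Opus 4.8: ${opus_cost:.2f}")
```

For that example request the arithmetic works out to 2.40 dollars on Fable 5 against 1.20 on Opus 4.8. Because both the input and output prices double, the ratio stays at two whatever your token mix is. The only question is how many requests you multiply it by.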

There is a calendar catch that matters more than the sticker price. From launch through June 22, Fable 5 is included at no extra cost on the Pro, Max, Team, and Enterprise plans. From June 23, using it on those plans draws from usage credits. Anthropic frames this as a capacity measure and says it intends to fold Fable back into the flat subscription later, with no date attached. So the free fortnight is a real window to test, and the steady-state cost is a credit meter. Plan accordingly rather than wiring your daily driver to it and getting surprised in two weeks.

The Safeguard Is the Product Decision

Here is the part I keep coming back to. The classifier is not a footnote on a powerful model. It is the product. Anthropic built one model and shipped two postures of it, and the entire public release exists because the safeguards let them feel comfortable handing this much capability to everyone. The benchmark chart is the marketing. The refusal-and-fallback machinery is the actual launch.

That framing also explains the timing that several outlets pointed at. Five days before this release, on June 4, Anthropic published a piece called "When AI Builds Itself," warning that models may be approaching recursive self-improvement and floating a coordinated mechanism for the industry to slow or pause frontier development. Reuters, Scientific American, and others covered it. Then on June 9 the same company shipped the most powerful model the public has ever been able to touch. Critics read that as strategy, a way to invite regulation onto a track Anthropic is winning. Maybe. The more grounded reading is that the two events are the same statement. The slowdown essay and the classifier-gated release are both Anthropic saying the capability is now past the point where you ship it raw. You can find that convincing or self-serving. Either way, the safeguard is no longer a wrapper on the product. It is the shape of the product.

Why the Model Was Rarely Your Bottleneck

Now the unpopular part. For most of the systems people actually run, swapping in Fable 5 will change less than the benchmark gap suggests.

A single-blind study that made the rounds earlier this year swapped the model behind an assistant without users noticing, and the measured difference in outcomes was not statistically significant. That matches what we see building real systems. Once you are past a capable baseline, and Opus 4.8 and Sonnet 4.6 are well past it, the thing that decides whether your assistant is good is rarely the model tier. It is whether it has the right context in front of it. What it remembers across sessions. How well it retrieves the right document. Whether the tools it calls return clean data. The AI memory systems we build move the needle on those factors far more than a model upgrade does, because the model was answering the wrong question well, not the right question badly.

This is not an argument against Fable 5. It is an argument about where to spend. If your agent forgets the customer between turns, a model that is eleven points better on SWE-bench Pro will forget them eleven points more eloquently. Fix the context first. Then, on the genuinely hard reasoning tasks where you have already done that work, reach for the stronger model and feel the difference. I wrote a longer field guide to the whole Claude lineup if you want the map of which model fits which job.

When to Reach for Fable 5, Opus 4.8, or Sonnet

The honest decision tree is short.

Reach for Fable 5 on the hard agentic work where its lead is real and the task is worth double the token bill. Large refactors across a big codebase, long autonomous tool chains, dense document and financial reasoning, anything where a marginally better answer compounds over many steps. Test it free before June 23, then treat it as the tool you pull out for the hard cases, not the one that runs every request.

Stay on Opus 4.8 as the everyday workhorse for agentic and coding work. It is half the price, it is what Fable falls back to anyway, and on most tasks the difference is small. If your work is security-flavored, Opus is also the more predictable choice, because Fable will route you there mid-task regardless and charge you for the detour.

Stay on Sonnet 4.6 for the high-volume, latency-sensitive, or classification-shaped work where frontier reasoning is wasted. Most of the calls inside a well-built system are this kind. Routing, summarizing, extracting, ranking. Paying frontier prices for them is a common and expensive habit.

Mythos 5, for almost everyone reading this, is not a choice. It is gated to Glasswing partners. The realistic move there is to watch the trusted-access program rather than wait for it.
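The tree above is short enough to write down as a router. This is a sketch of my own reading of it, not anything Anthropic ships. The `Task` fields, the order of the checks, and the model identifier strings are all assumptions you should adapt. Mythos 5 is left out on purpose, since it is not a choice for most readers.

```python
from dataclasses import dataclass

# Placeholder model identifiers (assumption); use the real IDs from the docs.
FABLE = "fable-5"
OPUS = "opus-4.8"
SONNET = "sonnet-4.6"


@dataclass
class Task:
    hard_agentic: bool = False       # large refactor, long tool chain, dense reasoning
    security_flavored: bool = False  # likely to sit near the classifiers
    high_volume: bool = False        # routing, summarizing, extracting, ranking
    latency_sensitive: bool = False


def choose_model(task: Task) -> str:
    """Pick a model following the decision tree in this section."""
    # Cheap, fast work first: frontier reasoning is wasted on it.
    if task.high_volume or task.latency_sensitive:
        return SONNET
    # Security-flavored work: Opus is the more predictable choice.
    if task.security_flavored:
        return OPUS
    # Hard agentic work where the lead is worth double the token bill.
    if task.hard_agentic:
        return FABLE
    # Everything else stays on the everyday workhorse.
    return OPUS


print(choose_model(Task(hard_agentic=True)))
print(choose_model(Task(hard_agentic=True, security_flavored=True)))
print(choose_model(Task(high_volume=True)))
```

The ordering is the part worth arguing about. I put the security check ahead of the hard-agentic check because a task that is both will likely end up on Opus anyway.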

What matters about this launch is not that Anthropic crossed another benchmark. It is that the frontier now ships with a referee standing between you and the model, deciding in real time which Claude you are allowed to talk to. That is a new default, and it will be the normal shape of every powerful model from here. The teams that win the next year will not be the ones who switched to the highest number on the chart. They will be the ones who already fixed everything the model was never going to fix for them.