First Impressions of Claude Fable 5: The Impossible-Assignment Test

What a frontier model does with a method instead of a prompt: world peace, ten scored cycles, a citation audit, and a 22-point reality check.

Jun 12, 2026

TL;DR: Anthropic shipped Claude Fable 5, the first Mythos-class model made safe for general use. Everyone’s first instinct with a frontier model is the one-shot stunt: ask the biggest question you can type and screenshot the answer. I did something different: I assigned it “solve world peace,” required ten research cycles, and ran the output through a citation audit and a hostile re-score. The one-shot draft scored 49 out of 100. Ten cycles later: 91. A hostile reviewer pass: 69. Those three numbers, and the gaps between them, are the actual first impression, and they say more about working with frontier models than any benchmark chart. The loop is a Karpathy pattern I converted into a knowledge-work skill. I’m not publishing it yet. This piece is the why, the demo, and the tease.

Here’s what happens every time a frontier model ships. Within 48 hours your feed fills with one-shot stunts: the riddle it finally gets, the SVG of a pelican, the “I asked it to fix the economy” screenshot. Coin in, answer out. The vending machine mindset, now with better vending machines.

I had a new model and the same temptation. So I leaned into it, with one change. Instead of one prompt, I gave Claude Fable 5 the most unsolvable assignment I could phrase and a method: “Solve world peace. Use a research loop. Minimum ten runs.” After all, Claude does prompt you with “for your toughest challenges.” Challenge accepted, Anthropic!

I ran it on Tuesday, June 9 — the same morning NPR carried the Uppsala Conflict Data Program’s new and sobering numbers: more state-based conflicts in 2025 than any year since 1946. The timing wasn’t planned. It just made the assignment less funny.

Why not one-shot it?

Because I know what a one-shot produces, and so does the model. The first thing the loop did was score its own opening draft: 49 out of 100, against a rubric that grades evidence quality, counterargument coverage, novelty, coherence, and sourcing. That 49 is the one-shot answer — a competent, plausible essay with zero citations, no objections addressed, and a headline claim it couldn’t support. The kind of answer that screenshots well.

The dignified version of the temptation arrived the same day. Ethan Mollick, who had early access to Fable, published his first impressions the afternoon my loop was running, and named the new relationship precisely:

With Fable the spell has gotten powerful enough that I am no longer sure I am the wizard. I am closer to a patron. I describe what I want, I pay for it, and I judge the result. The conjuring happens somewhere I cannot watch, in hundreds of small choices I never get a vote on... A patron commissions a single artist. Fable is closer to a whole studio, where I am the client who signs off on the final work without ever setting foot on the floor.

I usually agree with Mollick, and the description is accurate. The temptation is the problem. A patron judges finished work by taste, and taste is exactly the instrument that fails on the work that matters most: taste reads the 49-point draft and the 91-point draft as the same confident prose. Mollick suspects the black box is the price of the power, and he may be right that we never get a window into the conjuring. But the answer to a black box isn’t a window. It’s instrumentation. I can’t watch the studio work either — I can make the studio keep a lab notebook, score its own output against a rubric it didn’t choose, and face a hostile reviewer before I sign anything. Patron is a role you accept. Principal investigator is a role you build.

The loop’s first act was to fix the spec. “Solve world peace” names an end state, not a mechanism — you can’t build toward it, any more than you can build toward “synergy.” (I made the same argument about office work in Knowledge Work Is Code: fuzzy work becomes tractable the moment someone writes the spec.) So the assignment became: reduce deaths from organized armed conflict per year, and make the reduction stick. That’s measurable, falsifiable, and already the subject of fifty years of quantitative research.

The model didn’t get smarter over the next ten cycles. It got informed, and it got corrected by a process with an error signal. That distinction matters: a one-shot has no way to be wrong on the record. A loop does.

The loop is Karpathy’s AutoResearch pattern, built for autonomous ML experimentation, converted for knowledge work. The whole machine: an editable asset (the thesis document), a scalar metric (the 100-point rubric), and a time-boxed cycle that scores, finds the weakest dimension, researches it, revises, and re-scores. I started the conversion in April, after Azeem Azhar’s piece on bringing the scientific method into everyday knowledge work. His opening line was the whole pitch: “Science is the most reliable method humanity has found for producing knowledge. It has also, for most of history, been expensive to run.” Azhar also named the hard part: unlike ML training, knowledge work has no built-in feedback signal, so you have to construct one. The rubric is that construction. I’ve turned it into a skill that runs end-to-end in my stack. I’m not releasing it yet; it’s still earning its keep, and I want more reps before the teardown post. Consider this the teaser. The trajectory below is what it does.

Ten cycles: 49, 60, 68, 75, 81, 84, 86, 88, 89, 91. Per-cycle gains: +11, +8, +7, +6, +3, +2, +2, +1, +2, then zero. A textbook saturation curve. Past cycle seven the loop was polishing, not learning. When I asked whether ten more cycles would raise the score, the model pushed back: more cycles would buy score inflation, not quality.

What ten cycles produced

The thesis the loop converged on deserves its own airing, because the evidence surprised me.

A disclosure of enthusiasm first. My senior year at Duke I took a graduate political-science seminar with Bob Keohane (the complex-interdependence Keohane), and I’ve been chewing on nation-state cooperation problems as an amateur for the thirty years since. Enthusiast, not expert. Which is exactly the reader this loop is built for: enough background to ask the question properly, nowhere near enough time to read fifty years of journals.

The bad news first, because the loop led with it. Uppsala counts 61 state-based conflicts in 2024 and 65 in 2025, both records since 1946, with roughly 244,600 people killed in organized violence in 2025, the second-bloodiest year since the Rwandan genocide. Any claim that “we know how to fix this” has to survive that chart.

What survives it: wars, in the rationalist account, are bargaining failures. Fighting is expensive, so there is almost always a deal both sides should prefer, and it collapses because neither side can credibly commit to honoring it after the other disarms. In civil wars, which produce most conflict deaths, this commitment problem is the binding constraint. Barbara Walter’s research showed negotiated settlements almost never hold without a third party guaranteeing them.

The guarantee mechanism has receipts. Page Fortna found the risk of war recurring drops 55 to 85 percent when peacekeepers deploy. Before you object that the UN picks easy cases: the deployment data shows peacekeepers go to the harder ones. Hultman, Kathman, and Shannon found each additional 1,000 armed peacekeepers associated with about 7.3 percent fewer civilian killings. A dose-response curve. The capstone is a counterfactual simulation in the Journal of Politics: had the UN spent $200 billion on strong-mandate operations from 2001 to 2013, major armed conflict would have fallen by up to two-thirds versus the no-deployment world.

Let those numbers land for a second. Two-thirds of a quarter-million deaths a year, for about $15 billion annually. That’s two days of the world’s $2.7 trillion in military spending.

And the reason the chart is going the wrong way anyway: we stopped buying. The 2025–26 UN peacekeeping budget is $5.38 billion, the lowest in a decade. SIPRI counts peace-operation troops at a 25-year low. No new major UN mission since 2014. When Mali expelled its mission in 2023, violence spiked: the mechanism, confirmed by subtraction. The UN’s only preventive deployment ever, UNPREDEP in Macedonia, kept its host out of the Yugoslav wars and then died in 1999 to a Chinese veto over Macedonia recognizing Taiwan; an insurgency followed in 2001. And when the system finally needed an end-state design for Ukraine, the Paris Declaration (35 states, 26 of them pledging a reassurance force) was the same mechanism, supplied by a coalition because the Security Council can’t.

Translation: the obstacle to fewer war deaths is not knowledge, efficacy, capability, or cost. It is a deadlocked committee and an unpaid invoice.

A research loop produced that argument, with thirty-odd verified sources, in a working day. Which brings me back to the model.

Three judgment calls and a miss

What impressed me about Fable 5 is the judgment calls, each one a behavior I’ve wanted from these systems and mostly not gotten.

It argued itself down. The opening thesis claimed an “order of magnitude” reduction in war deaths was achievable. At cycle three, the model cut its own headline to “up to two-thirds” because the best evidence wouldn’t carry more. No prompt from me. Models hyping their own thesis is the default failure mode; a model trimming its claim to fit its receipts is new.

It knew when to stop. Asked whether another ten cycles would help, it said no: saturation curve, diminishing returns, and past this point you’re optimizing the metric instead of the asset. It diagnosed Goodhart’s law in its own workflow and declined more work. I know senior humans who can’t do that.

It delegated. The citation audit (34 URLs, liveness checks, source-tier grading) went to a subagent so the main thread stayed clean. The audit caught the Journal of Peace Research changing publishers in January, which had silently killed an important link, and graded the evidence base a B+.

And the miss, because impressions without a miss are marketing. Ten friendly cycles never found the strongest objection to the thesis: Edward Luttwak’s 1999 “Give War a Chance” argument that interrupting wars before decisive outcomes can extend total violence even while cutting the annual rate. The loop only surfaced it when I imposed a hostile-reviewer rubric and forced a re-score: 91 friendly, 69 hostile. The model optimizes toward whatever rubric it’s given; it will not spontaneously hunt for the argument that kills its own thesis. The 22-point gap between those scores is a map of exactly where the argument flattered itself, and producing that map required me to demand it.

That’s the impression that matters. Capability raised the ceiling. Method extracted it. Adversarial method extracted what the friendly method missed. Same model, three results — 49, 91, 69 — depending entirely on the process wrapped around it.

The anti-sycophancy armor

There is a deeper reason to work this way: agreement is the default failure mode of these systems. Models are trained on human approval, and human approval likes confident answers and dislikes being argued with. Left to a bare prompt, a frontier model will polish your premise instead of testing it.

The loop is armor against that. A rubric gives the model standing to disagree with itself: “counterargument coverage, 7 out of 25” is a verdict no amount of politeness can soften. The hostile re-score institutionalizes the disagreement — a second pass whose only job is to attack what the first pass built. And because both passes produce numbers, flattery stops being faint praise and becomes a measurable gap: 22 points in this case, with a name attached to every one of them.

Having said that, there are two honest caveats. Self-scored rubrics drift toward inflation unless you calibrate them; mine anchors first drafts at 40 to 60 and treats a string of 90s as evidence the rubric broke, not that the work got brilliant. And my “minimum ten runs” was a round number with no science behind it. The loop’s own data says the right stopping rule is the plateau, not the count: stop when the score stops moving, then send in the hostile reviewer. Future runs will work that way.

I can already hear the objection: “So the fancy new model wrote a literature review with extra steps.” Sure. The same way a spreadsheet was just arithmetic with extra steps. The step count is the product. Steel-manning — the most expensive, most skipped move in serious thinking — just became a runnable job with a start time and a score.

Run this on your own impossible question

Don’t one-shot your big question. The 49-point draft looks identical to the 91-point draft to a reader who hasn’t seen the rubric. If the question matters, give the model an error signal: asset, metric, cycles, stop condition.
Make the model score itself before you read the answer. The highest-leverage prompt addition I know right now: “grade this against an explicit rubric, then improve the weakest dimension.” Watch the confidence theater drop away.
Audit the citations before you believe your own document. Links rot and publishers move. A confident artifact with dead receipts is worse than no artifact.
Re-score under a hostile rubric, and treat the gap as the deliverable. The friendly score tells you what you built. The divergence tells you what to fix before someone else finds it. No model does this to its own work unprompted. That part is still your job.

World peace was the first assignment, not the last. The docket for the loop already has world hunger, healthspan, depression, and drought on it. Same method, same hostile reviewer. And the loop itself will get its teardown post once it’s earned it.

You don’t solve the unsolvable. You spec it, you score it, and you stress it.

Receipts

Anthropic — Claude Fable 5 and Mythos 5 — the model under test.
Mollick, “What it feels like to work with Mythos” (June 9, 2026) — the patron/studio framing this piece pushes against, published the day the loop ran.
Azhar, “Autoresearch and the experimental society” (April 2, 2026) — the piece that started this skill; the measurement-problem framing. The companion podcast: “Karpathy’s AutoResearch could make scientists of us all”.
Karpathy’s AutoResearch — the loop pattern, built for ML experiments, converted here for knowledge work.
UCDP / Uppsala University — conflict data 2024–2025 — 61 then 65 state-based conflicts, records since 1946; ~244,600 deaths in 2025. Peer-reviewed companion: Davies et al., Journal of Peace Research.
Hegre, Hultman & Nygård, Journal of Politics (2019) — the counterfactual simulation: $200B of strong-mandate peacekeeping 2001–13 would have cut major armed conflict by up to two-thirds. Open-access summary at PRIO.
Walter, Howard & Fortna, British Journal of Political Science (2021) — the meta-review: nearly every large-N study finds significant violence reduction.
Fortna, “Does Peacekeeping Keep Peace?” — recurrence risk down 55–85%; also the direct empirical test of Luttwak’s prediction.
Hultman, Kathman & Shannon (AJPS 2013) — the dose-response finding on civilian protection.
UN News (Oct 2025) and SIPRI (May 2026) — the budget collapse and the 25-year troop low.
Paris Declaration (Élysée, Jan 6, 2026) — 35 states, 26 pledging reassurance forces for Ukraine.
UN Security Council, Feb 25, 1999 — the UNPREDEP veto, in the UN’s own words.
Collier, Chauvet & Hegre, Copenhagen Consensus — $850M/year for ten years cuts violence risk from ~40% to 7%. Commissioned analysis, not peer-reviewed — labeled accordingly.
Luttwak, “Give War a Chance,” Foreign Affairs 78(4), 1999 — the strongest standing objection, found by the hostile rubric, not the friendly loop. (Paywalled; verified live 2026-06-10.)
Messinger, “Knocking Down a Steel Man: How to Argue Better” (2012) — the canonical steel-manning reference; the term circulated on LessWrong from 2011, but this is the post that defined the practice.

New model, old lesson: the method is the product.

Beyond Reason

Discussion about this post

Ready for more?