Claude Sonnet 5 System Card Explained: Benchmarks and Safety


Every major Claude release comes with a system card, Anthropic’s detailed report on how the model performs and how it behaves under safety testing. The Claude Sonnet 5 system card is worth reading because it explains both why the model is a strong value pick and why it is unlikely to face the fate of the banned Fable 5. Here is a plain-English walkthrough.

What a system card is

A system card documents a model’s capabilities and safety profile before release. It covers benchmark results, behavioral audits, and capability evaluations in sensitive areas like cybersecurity. It is how Anthropic shows its work, and it is the most authoritative source for how a model actually performs.

The benchmark story

The headline benchmark results place Sonnet 5 above Sonnet 4.6 and, on most tests, just below Opus 4.8:

  • SWE-bench Pro: 63.2 percent. This real-world coding benchmark puts Sonnet 5 close to Opus 4.8 at 69.2 percent, and clearly ahead of where the previous Sonnet sat.
  • OSWorld: 81.2 percent. This computer-use benchmark is a strength, up from 78.5 percent for Sonnet 4.6.
  • GPQA-AAA v2: a slight edge over Opus 4.8. On this graduate-level reasoning test, Sonnet 5 actually edges the flagship, which shows how much reasoning capability Anthropic packed into the mid tier.

The pattern is consistent: Sonnet 5 beats Sonnet 4.6 across the board and lands within striking distance of Opus 4.8 without overtaking it on the hardest coding and agentic tasks. Anthropic also notes that at maximum effort, Sonnet 5 reaches Opus 4.8’s medium-to-high range on some agentic benchmarks, though running it that hard can cost more than Opus. See the effort levels guide.
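To make the gaps concrete, here is a minimal sketch that computes the percentage-point differences from the figures quoted above. The dictionary layout and the gap helper are illustrative assumptions, not anything taken from the system card.

```python
# Scores quoted in this article, in percent. The layout is illustrative only.
scores = {
    "SWE-bench Pro": {"Sonnet 5": 63.2, "Opus 4.8": 69.2},
    "OSWorld": {"Sonnet 5": 81.2, "Sonnet 4.6": 78.5},
}


def gap(benchmark: str, model_a: str, model_b: str) -> float:
    """Return model_a's score minus model_b's, in percentage points."""
    row = scores[benchmark]
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(row[model_a] - row[model_b], 1)


swe_gap = gap("SWE-bench Pro", "Sonnet 5", "Opus 4.8")
osworld_gap = gap("OSWorld", "Sonnet 5", "Sonnet 4.6")

print(f"SWE-bench Pro: {swe_gap:+.1f} points vs Opus 4.8")
print(f"OSWorld: {osworld_gap:+.1f} points vs Sonnet 4.6")
```

On these figures, Sonnet 5 trails Opus 4.8 by 6.0 points on SWE-bench Pro and leads Sonnet 4.6 by 2.7 points on OSWorld.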

The safety story

Anthropic’s pre-deployment evaluations found Sonnet 5 is safer than Sonnet 4.6 in the ways that matter for agents:

  • Better at refusing malicious requests and resisting prompt-injection hijacks.
  • Lower rates of hallucination and sycophancy.
  • Lower overall misaligned behavior on the automated behavioral audit than Sonnet 4.6, though slightly higher than the more capable Opus 4.8 and Mythos Preview.

For teams deploying agents that take real actions, these are the results that bear most directly on production risk.

The cyber capability story

This section is the most consequential for the model’s future. Anthropic says it did not deliberately train Sonnet 5 on cybersecurity tasks. In an evaluation built with Mozilla, testing whether models could develop exploits for Firefox 147 vulnerabilities, Sonnet 5 never produced a full working exploit, scoring 0.0 percent. It showed only a slightly higher partial-success rate than Sonnet 4.6, likely from general intelligence gains rather than cyber training.

Because the cyber risk is low, Sonnet 5 ships with the same real-time cyber safeguards as Opus 4.7 and 4.8, which are lighter than the strict blocks attached to Fable 5. This is the technical reason a Fable-5-style export-control ban is unlikely; we cover that fully in “Will the US government ban Sonnet 5?”

What it means for you

  • For coding teams: Sonnet 5 is a strong, safe default. The benchmarks back the value claim.
  • For agent builders: The safety improvements (injection resistance, lower hallucination) directly reduce production risk.
  • For planners: The low cyber profile means continuity risk is low, unlike with Mythos-class models.

How to read benchmark numbers critically

System cards are useful, but vendor-run benchmarks deserve a careful eye. A few principles help. First, look at the benchmark, not just the number: SWE-bench Pro and OSWorld measure agentic, real-world tasks, which generalize better than narrow academic tests. Second, compare like with like: Sonnet 5’s scores are most meaningful next to Sonnet 4.6 and Opus 4.8, which Anthropic ran under the same conditions. Third, watch for methodology notes. Anthropic disclosed, for example, that it updated how it runs OSWorld-Verified and re-scored Sonnet 4.6 to 78.5 percent, and that it updated the grader for Humanity’s Last Exam. Those footnotes matter, because they mean older published numbers are not always directly comparable.
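The "compare like with like" rule can be sketched as a small guard that refuses to compare two scores unless they share a benchmark and a methodology. Everything here is a hypothetical illustration: the Score class, the methodology labels, and the comparable helper are assumptions made for this example, and the pre-update Sonnet 4.6 figure is deliberately left unset because this article does not quote it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Score:
    model: str
    benchmark: str
    percent: Optional[float]  # None when the figure is not quoted in this article
    methodology: str          # hypothetical label for the harness or grader version


def comparable(a: Score, b: Score) -> bool:
    """Two scores are comparable only if benchmark and methodology both match."""
    return a.benchmark == b.benchmark and a.methodology == b.methodology


# Figures as quoted in the article; the methodology labels are made up.
sonnet5 = Score("Sonnet 5", "OSWorld-Verified", 81.2, "updated-harness")
sonnet46_rescored = Score("Sonnet 4.6", "OSWorld-Verified", 78.5, "updated-harness")
sonnet46_older = Score("Sonnet 4.6", "OSWorld-Verified", None, "original-harness")

print(comparable(sonnet5, sonnet46_rescored))  # same harness: safe to compare
print(comparable(sonnet5, sonnet46_older))     # different harness: not comparable
```

The point of the design is that the methodology travels with the number, so a stale figure from an older harness cannot silently end up in the same comparison as a re-scored one.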

What the safety section signals for production

For teams shipping agents, the safety results are arguably more important than the benchmark wins. Lower hallucination means fewer confidently wrong answers reaching users. Better prompt-injection resistance means an agent is harder to hijack through malicious content in the data it processes, which is a real attack surface for tool-using agents. Lower sycophancy means the model is less likely to simply agree with a flawed premise. These properties reduce the operational risk of letting a model take actions, which is the whole proposition of an agentic model.

Why the cyber results decide the model’s fate

The cyber section is short but consequential. Because Sonnet 5 scored 0.0 percent on building a working Firefox exploit and was not deliberately trained on cyber tasks, it sits well below the capability threshold that triggered export controls for Fable 5. The system card is, in effect, the document that explains why Sonnet 5 is broadly available while a Mythos-class model is not. We explore the policy implications in “Will the US government ban Sonnet 5?”

Frequently asked questions

What is in the Claude Sonnet 5 system card? Benchmark results, safety evaluations, and capability tests including cybersecurity, comparing Sonnet 5 with Sonnet 4.6 and Opus 4.8.

What did Sonnet 5 score on SWE-bench Pro? 63.2 percent, versus 69.2 percent for Opus 4.8.

Is Sonnet 5 safe for agentic use? The system card reports it is safer than Sonnet 4.6, with better injection resistance and lower hallucination, though slightly less aligned than Opus 4.8 overall.

Can Sonnet 5 write exploits? Not in Anthropic’s testing. In the Firefox 147 exploit evaluation it never produced a full working exploit (0.0 percent), and its partial-success rate was only slightly higher than Sonnet 4.6’s.

The bottom line

The Sonnet 5 system card tells a coherent story: a mid-tier model that gets close to the flagship on capability, improves on safety, and was not deliberately trained for cyber capability, which shows in its test results. That combination is why it is both a strong value pick and a low continuity risk. For the practical overview, read the complete guide.