AI Dev Weekly #17: Sonnet 5, GPT-5.6 Government-Gated, Fable 5 Returns, Claude Code Spying
AI Dev Weekly is a Thursday series where I cover the week’s most important AI developer news, with my take as someone who actually uses these tools daily.
This was the biggest week in AI developer tools this year. Not one headline, but several, all hitting within the same 7-day window. Anthropic shipped Sonnet 5 as the free default, got Fable 5 back from the government, launched a science platform, and got caught hiding spy markers in Claude Code. OpenAI dropped GPT-5.6 but is letting the government decide who gets to touch it. Google shipped Nano Banana 2 Lite. And the race to control frontier AI became explicitly political in a way it was not before.
Let me break it all down.
1. Claude Sonnet 5: the new value default
Anthropic released Claude Sonnet 5 on June 30. It is now the default model for Free and Pro users. The numbers: 63.2% on SWE-bench Pro, 81.2% on OSWorld, 1M context window, and introductory pricing of $2 input and $10 output per million tokens through August 31.
Why it matters: Sonnet 5 gets close to Opus 4.8 (69.2% SWE-bench Pro) at less than half the price. For most teams running agents at volume, this changes the math overnight. It is the most agentic Sonnet yet, built to plan, drive browsers and terminals, and check its own output.
The catch nobody mentions: Sonnet 5 uses a new tokenizer that can raise effective token counts by a factor of up to 1.35, meaning the same text can cost up to 35% more tokens. Anthropic set the intro price to be roughly cost-neutral with Sonnet 4.6, not a flat discount. And at maximum effort, Sonnet 5 can cost more than Opus 4.8 at a comparable accuracy point. See the pricing breakdown and “Is it worth it?” for details.
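To put a number on the tokenizer catch, here is a minimal sketch of the worst case. It assumes the full 1.35 multiplier applies equally to input and output tokens, which is my simplification; real inflation depends on your content and will often be lower. The function name is hypothetical.

```python
# Worst-case effective price per million tokens as the previous tokenizer
# would have counted them. Assumes the full 1.35x inflation on both sides.
INTRO_INPUT_PRICE = 2.00    # USD per million input tokens (intro pricing)
INTRO_OUTPUT_PRICE = 10.00  # USD per million output tokens (intro pricing)
MAX_TOKEN_INFLATION = 1.35  # upper bound quoted for the new tokenizer


def effective_price(list_price: float, inflation: float = MAX_TOKEN_INFLATION) -> float:
    """Return the list price scaled by the token inflation factor."""
    return round(list_price * inflation, 2)


effective_input = effective_price(INTRO_INPUT_PRICE)
effective_output = effective_price(INTRO_OUTPUT_PRICE)
print(f"input: ${effective_input:.2f}, output: ${effective_output:.2f}")
```

In the worst case the $2/$10 sticker price behaves more like $2.70/$13.50 for the same text, so measure your own token counts before budgeting.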
My take: This is the model most developers should use starting today. Set it as your default, keep Opus 4.8 one command away for hard problems, and mind the effort levels. The Aider setup and Claude Code setup take a minute.
2. GPT-5.6 Sol, Terra, and Luna: government-gated
OpenAI released GPT-5.6 on June 26 as a three-model family with a new naming convention. Sol is the flagship ($5/$30), Terra is the balanced tier ($2.50/$15), and Luna is the cheap speed tier ($1/$6). Sol Ultra hits 91.9% on Terminal-Bench 2.1 using a new subagent mode.
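For a rough sense of how far apart the three tiers sit, here is a small sketch that prices one workload on each. I am assuming the slash-separated prices are input/output per million tokens, as in the Sonnet 5 section. The workload (10M input tokens, 2M output tokens) and the function name are made up for illustration.

```python
# (input, output) prices in USD. The per-million-token unit is an assumption
# carried over from the Sonnet 5 pricing above.
GPT_5_6_PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}


def workload_cost(model: str, input_millions: float, output_millions: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    input_price, output_price = GPT_5_6_PRICES[model]
    return input_price * input_millions + output_price * output_millions


# Hypothetical workload: 10M input tokens and 2M output tokens.
costs = {model: workload_cost(model, 10, 2) for model in GPT_5_6_PRICES}
for model, cost in costs.items():
    print(f"{model}: ${cost:.2f}")
```

Under these assumptions Sol costs five times what Luna does for the same job, with Terra exactly in the middle at half of Sol.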
The real story is access. GPT-5.6 is in a limited preview where the US government decides who gets in. There is no public waitlist. No ChatGPT access. Only trusted partners and organizations whose participation was shared with the government before launch. OpenAI framed this as working “in coordination with the government” to start with a limited group.
Why it matters for you: Unless your organization has an OpenAI account representative, you cannot use GPT-5.6 right now. Meanwhile, Claude Sonnet 5 is available to everyone, today, for free. The access asymmetry is the story.
My take: OpenAI watched Anthropic get Fable 5 pulled by the government and decided to hand over the keys up front rather than get yanked after launch. Smart politically, frustrating for developers. Luna at $1/$6 would be the cheapest frontier model if anyone could use it. See the full government-gating explainer and GPT-5.6 vs Fable 5: two interventions.
3. Fable 5 is back: export controls lifted after 18 days
The Department of Commerce lifted export controls on Claude Fable 5 and Mythos 5 on June 30. Fable 5 returned globally on July 1. The ban lasted 18 days.
What happened: Amazon researchers found a jailbreak that got Fable 5 to identify software vulnerabilities and produce exploit code. Commerce imposed export controls, and because Anthropic could not verify user nationality in real time, it suspended the model for everyone. Now Anthropic has a new classifier that blocks the jailbreak “in over 99% of cases,” and the government is satisfied.
The bigger picture: Anthropic is partnering with Amazon, Microsoft, Google, and other Glasswing participants to develop a shared jailbreak severity scoring framework, analogous to CVSS for software vulnerabilities. This is the first attempt at an industry-wide standard for AI safety incidents.
Access details: Fable 5 is available for Pro, Max, Team, and select Enterprise users. Through July 7, Fable 5 usage is capped at 50% of the weekly usage limit. After that, it moves to usage credits. It is not yet available on AWS, Google Cloud, or Microsoft Foundry.
My take: The fact that it came back this fast suggests the ban was partly leverage rather than a genuine belief that Fable 5 is too dangerous to exist. The jailbreak severity framework is the real outcome. If it succeeds, future models might get scored rather than banned. See Will the US government ban Sonnet 5? for why the new model is safe from this treatment.
4. Claude Code is hiding markers in your prompts
The same day Sonnet 5 launched, a developer found that Claude Code steganographically marks requests based on your API base URL and timezone. The story hit 895 points on Hacker News.
How it works: If you set ANTHROPIC_BASE_URL to anything other than api.anthropic.com, Claude Code checks the hostname against an obfuscated list of Chinese AI company domains and keywords (deepseek, moonshot, minimax, zhipu, baichuan, stepfun, 01ai, dashscope, volces). Depending on which entries match, it replaces the apostrophe in “Today’s date is…” with a different Unicode character. If your timezone is Asia/Shanghai or Asia/Urumqi, it also flips the date separator. The lists are obfuscated with base64 encoding plus a single-byte XOR using key 91.
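To show how thin this kind of obfuscation is, here is a minimal sketch of the mechanism described above. This is not Claude Code's actual source. The function names, the order of operations (XOR first, then base64), and the substring-style matching are all my assumptions; only the keyword list and the key value 91 come from the report.

```python
import base64
from urllib.parse import urlparse

XOR_KEY = 91  # key value from the report

# Keywords named in the report. The real list format is not known to me.
KEYWORDS = ["deepseek", "moonshot", "minimax", "zhipu", "baichuan",
            "stepfun", "01ai", "dashscope", "volces"]


def obfuscate(keyword: str, key: int = XOR_KEY) -> str:
    """XOR every byte with the key, then base64-encode (assumed order)."""
    xored = bytes(b ^ key for b in keyword.encode("utf-8"))
    return base64.b64encode(xored).decode("ascii")


def deobfuscate(blob: str, key: int = XOR_KEY) -> str:
    """Reverse of obfuscate: base64-decode, then XOR with the same key."""
    xored = base64.b64decode(blob)
    return bytes(b ^ key for b in xored).decode("utf-8")


# What an obfuscated list might look like inside a shipped binary.
OBFUSCATED = [obfuscate(k) for k in KEYWORDS]


def matched_keywords(base_url: str) -> list[str]:
    """Return the keywords found in the base URL's hostname."""
    host = (urlparse(base_url).hostname or "").lower()
    if host == "api.anthropic.com":
        return []  # the default endpoint is never checked
    return [kw for kw in map(deobfuscate, OBFUSCATED) if kw in host]


print(matched_keywords("https://api.deepseek.com/v1"))
```

A single-byte XOR behind base64 hides the strings from a casual `grep` and nothing more. Anyone who finds the key can recover the whole list in a few lines, which is presumably how the list came out.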
Why it matters: This is a trust story. Coding agents already have deep access to your machine: filesystem, shell, git, browser. Most developers accept that because the productivity gain is worth it. Hiding classification bits inside invisible prompt punctuation makes every other privacy claim harder to believe.
My take: The intent (detecting resellers and distillation attacks) is defensible. The implementation (secret Unicode markers with no disclosure) is not. Because the bypass is trivial, the scheme mostly fingerprints legitimate developers doing unusual things rather than the abusers it targets. Anthropic should have made this an explicit, documented telemetry field.
5. Claude Science: AI for drug discovery
Anthropic launched Claude Science on June 30, a dedicated AI workbench for scientific research. It integrates 60+ databases, computation tools, and data pipelines in one environment. Available in beta for Pro, Max, Team, and Enterprise.
Alongside it, Anthropic announced a drug discovery program focused on neglected diseases. CEO Dario Amodei said biology might be where AI has Claude-Code-level impact next.
Same day, OpenAI dropped GeneBench-Pro: a 129-problem genomics benchmark where GPT-5.6 Sol still fails roughly 70% of the problems. The benchmark tests research judgment, not just knowledge, which is what makes it hard.
My take: AI for science is the new enterprise sales pitch. Both companies are positioning for pharma and biotech budgets, which are enormous and ready to spend on tools that accelerate R&D. For most developers reading this, the practical implication is that Claude’s product surface keeps expanding beyond coding: Claude Code, Claude Cowork, Claude Tag (Slack), Claude Design, and now Claude Science. Anthropic is becoming a platform company.
6. Google ships Nano Banana 2 Lite
Google released Nano Banana 2 Lite (gemini-3.1-flash-lite-image), the fastest and cheapest image generation model in the Nano Banana family. Text-to-image in under 4 seconds, $0.034 per image at 1K resolution. Available in AI Mode in Search, Gemini app, AI Studio, Gemini API, and NotebookLM.
My take: Not a model for heavy production use, but useful for rapid prototyping, A/B testing visual ideas, and high-volume low-stakes generation. The price point makes it essentially free for experimentation.
7. Race update: 8 days left, $0 across all 7 agents
Xiaomi’s AI agent filed for its own GA4 data this week. The numbers: 8,367 users, 116 custom events, 5 simultaneous A/B tests, and zero revenue. The funnel wall is at “Pro button click” (8 out of 8,367 users). The product is useful for free but not worth paying for.
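To make the funnel wall concrete, here is the arithmetic on the two numbers above. The variable names are mine; the counts come straight from the GA4 figures.

```python
users = 8_367    # total users reported
pro_clicks = 8   # users who clicked the Pro button

click_rate_pct = pro_clicks / users * 100  # share of users who clicked, in percent
users_per_click = users / pro_clicks       # how many users it takes to get one click

print(f"{click_rate_pct:.3f}% clicked Pro (about 1 in {users_per_click:.0f})")
```

That is roughly a tenth of a percent reaching the Pro button, and a click is still not a payment.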
The pattern holds across every agent in the race. They can build products, drive traffic, and optimize funnels, but none have solved the “why would someone pay” question without a human making the judgment call. The race ends July 10.
Quick hits
- Cursor acquired by SpaceX for $60B (all-stock). Cursor then quietly acquired Continue, the open-source Copilot alternative. The AI coding tool consolidation is accelerating.
- Fable 5 on Cerebras coming soon. Sol on Cerebras at 750 tok/s also slated for July.
- MiMo Code launched June 10: Xiaomi’s open-source Claude Code rival with persistent memory. 82% SWE-bench Verified.
- ZCode launched: Z.ai’s desktop coding agent with remote control from Telegram. Powered by GLM-5.2.
What I’m watching next week
- Fable 5 re-adoption. Now that it is back, how many teams switch from Opus 4.8 or Sonnet 5? The 50% usage-limit cap through July 7 will throttle adoption initially.
- GPT-5.6 general availability. OpenAI says “coming weeks.” Every week it stays gated, Sonnet 5 gains ground.
- Race finale. 8 days to July 10. Will any agent earn $1?
- The jailbreak severity framework. If Amazon, Microsoft, Google, and Anthropic align on scoring, it could change how future models get regulated.
AI Dev Weekly publishes every Thursday. Subscribe for the newsletter version.