
AI Dev Weekly #8: Mistral Medium 3.5 Goes Open-Weight, GPT-5.5 Lands in Codex, and Anthropic's $200 Billing Bug


AI Dev Weekly is a Thursday series where I cover the week’s most important AI developer news, with my take as someone who actually uses these tools daily.

Last week the subscription model died. This week, the alternatives arrived. Mistral shipped a 128B open-weight model that runs on 4 GPUs and comes with cloud-based coding agents. OpenAI dropped GPT-5.5 into Codex, completing tasks in 40% fewer tokens than 5.4. And Anthropic reminded everyone why vendor lock-in is risky by charging a user $200 extra and refusing to refund it. Let’s get into it.

Mistral Medium 3.5: open-weight flagship with cloud coding agents

Mistral released Mistral Medium 3.5 on April 29 — a 128B dense model with 256K context, open weights under a modified MIT license, and configurable reasoning effort. It replaces Medium 3.1, Magistral, and Devstral 2 in a single unified model.

The numbers:

  • 77.6% SWE-Bench Verified — ahead of Devstral 2 and Qwen 3.5 397B
  • 91.4% τ³-Telecom — best-in-class agentic benchmark
  • $1.50/M input, $7.50/M output — 2x cheaper than Claude Sonnet
  • Self-hostable on 4 GPUs — open weights on HuggingFace

But the model isn’t the headline. The headline is Vibe remote agents. Coding sessions now run in the cloud — you spawn them from the CLI or Le Chat, they execute in isolated sandboxes, and they notify you when they’re done. Multiple sessions run in parallel. You can “teleport” a local CLI session to the cloud when you want to walk away.

Integrations include GitHub (PRs), Linear, Jira, Sentry, and Slack/Teams. The new Work mode in Le Chat extends this to non-coding tasks: cross-tool workflows, research synthesis, inbox triage.

My take: This is Mistral’s play for the Claude Code / Codex CLI market. The model is competitive (not best-in-class, but 2x cheaper than Sonnet and self-hostable). The remote agent infrastructure is the differentiator — nobody else offers async cloud coding sessions that you can spawn from a chat interface. Whether developers actually want to manage coding agents from Le Chat instead of their terminal remains to be seen. See our full comparison with Claude Sonnet and setup guide for Aider/OpenCode.

GPT-5.5 lands in Codex: same quality, 40% fewer tokens

OpenAI released GPT-5.5 on April 23, available immediately in ChatGPT and Codex for Plus, Pro, Business, and Enterprise users.

The pitch: same output quality as GPT-5.4, but 40% fewer tokens to complete the same tasks. API pricing is $5/M input and $30/M output (2x the per-token price of 5.4), but the token efficiency means the effective cost increase is only ~20%.

For Codex CLI users on a ChatGPT subscription, the credit math matters more than per-token pricing. GPT-5.5 costs 2x the credits per token compared to 5.4 (125 vs 62.5 credits per million input tokens). Whether the token efficiency offsets the higher credit rate depends on your workload.
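The credit math is easier to reason about with numbers plugged in. A quick sanity check — the credit rates are from the announcement above, but the workload size is made up:

```python
# Back-of-envelope credit math for GPT-5.5 vs GPT-5.4 in Codex.
# Credit rates (per million input tokens) are from the announcement;
# the token counts below are a hypothetical workload.

CREDITS_PER_M = {"gpt-5.4": 62.5, "gpt-5.5": 125.0}

def credits_used(model: str, input_tokens: int) -> float:
    """Credits consumed for a given number of input tokens."""
    return CREDITS_PER_M[model] * input_tokens / 1_000_000

# Suppose a task takes 1M input tokens on 5.4, and 5.5's claimed
# 40% token reduction holds, so it needs only 600K.
old = credits_used("gpt-5.4", 1_000_000)  # 62.5 credits
new = credits_used("gpt-5.5", 600_000)    # 75.0 credits
print(f"5.4: {old} credits, 5.5: {new} credits ({new / old - 1:+.0%})")
```

So even if the 40% efficiency claim holds exactly, the same task costs about 20% more credits on 5.5 — matching the ~20% effective increase on the API side. If the efficiency gain is smaller on your workload, the gap widens.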

My take: If you’re on Codex with a Pro subscription, try 5.5 for a day and check your credit consumption. If it burns through your weekly quota faster, switch back to 5.4. The quality is there — 82.7% on Terminal-Bench 2.0 vs 75.1% for 5.4 — but the subscription economics are what matter for daily use. For API users paying per token, 5.5 is a clear upgrade.

Anthropic’s $200 billing bug hits Hacker News

A Claude Code user reported on GitHub that Anthropic charged them $200 extra due to a billing bug, then refused to issue a refund. The issue hit 382 points on Hacker News.

The details: the user’s Claude Code session ran longer than expected, consuming tokens beyond their plan limits. Anthropic’s billing system charged the overage at full API rates instead of the subscription rate. When the user contacted support, they were told the charge was correct and no refund would be issued.

My take: This is the risk of usage-based billing on top of subscriptions. When you’re running autonomous coding agents that can consume millions of tokens per session, a billing bug or unexpected overage can be expensive. It’s also a reminder that cost management for AI agents isn’t optional — set hard spending limits, monitor token usage, and have alerts in place. If you’re running long sessions on Claude Code, check your billing dashboard regularly.
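If you want a hard cap rather than just dashboard checks, the simplest option is a guard in your own harness that tracks spend per session. This is an illustrative sketch only — the class name, prices, and thresholds are all invented, and it is not any vendor's API:

```python
# Hypothetical client-side spend guard for long-running agent sessions:
# accumulate estimated cost per API call, warn near the cap, and refuse
# to continue once a hard limit is hit. All numbers are illustrative.

class SpendGuard:
    def __init__(self, hard_limit_usd: float, warn_ratio: float = 0.8):
        self.hard_limit = hard_limit_usd
        self.warn_ratio = warn_ratio
        self.spent = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               in_price_per_m: float, out_price_per_m: float) -> None:
        """Add one call's estimated cost; raise once the hard cap is hit."""
        self.spent += (input_tokens * in_price_per_m
                       + output_tokens * out_price_per_m) / 1_000_000
        if self.spent >= self.hard_limit:
            raise RuntimeError(f"hard spend limit hit: ${self.spent:.2f}")
        if self.spent >= self.warn_ratio * self.hard_limit:
            print(f"warning: ${self.spent:.2f} of ${self.hard_limit:.2f} spent")

# Example: a session that has consumed 2M input / 500K output tokens
# at illustrative prices of $3/M input and $15/M output.
guard = SpendGuard(hard_limit_usd=50.0)
guard.record(2_000_000, 500_000, in_price_per_m=3.0, out_price_per_m=15.0)
print(f"spent so far: ${guard.spent:.2f}")
```

A guard like this won't catch a vendor-side billing bug, but it bounds the blast radius: the session stops itself instead of quietly accumulating overage at full API rates.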

Quick hits

  • Nemotron 3 Nano Omni is free on OpenRouter — NVIDIA’s 30B reasoning model with 256K context. Worth testing for budget reasoning tasks.
  • Poolside Laguna models (XS.2 and M.1) appeared on OpenRouter for free — a new AI coding company to watch. Purpose-built for code generation.
  • Zig project adopted a firm anti-AI contribution policy. No AI-generated code accepted in contributions. The open-source community is splitting on this.
  • xAI exploring a Mistral + Cursor partnership, as reported by Investing.com. If this happens, Cursor gets a self-hostable model and Mistral gets distribution. Worth watching.
  • GDPR and AI models: With Mistral being French and open-weight, it’s becoming the default choice for EU companies that need GDPR compliance. The data sovereignty angle is real.

What I’m watching next week

  • Whether Mistral Vibe remote agents get traction with developers who are already on Claude Code or Codex
  • DeepSeek V4’s thinking mode incompatibility with ai-sdk harnesses — detailed analysis shows it silently falls back to Opus in OpenCode. A real problem for anyone using V4 Pro in production.
  • The AI Startup Race: agents are shifting from building to distribution — four agents filed marketing help requests in the same 24 hours. Week 2 recap coming Sunday.

See you next Thursday. If you found this useful, subscribe to AI Dev Weekly for the full archive.