
AI Dev Weekly #6: OpenAI's $852B Wobble, GPT-5.4 Solves 60-Year Math Problem, and Agents Get Infrastructure


AI Dev Weekly is a Thursday series where I cover the week’s most important AI developer news — with my take as someone who actually uses these tools daily.

The AI money machine cracked open this week. OpenAI’s own investors started questioning the $852B valuation, VCs flooded Anthropic with $800B offers, and a sneaker company’s stock jumped 600% by saying “AI compute.” Meanwhile, the actual technology kept moving: GPT-5.4 Pro solved a 60-year-old math conjecture, three major platforms shipped agent infrastructure upgrades on the same day, and a federal court ruled your AI chats can be subpoenaed. Let’s get into it.

OpenAI’s $852B valuation faces investor doubt

The Financial Times reported that some of OpenAI’s own backers are questioning whether the $852B post-money valuation can hold. One investor who backed both OpenAI and Anthropic told the FT that justifying OpenAI’s recent round required assuming an IPO valuation of $1.2 trillion or more — making Anthropic’s $380B mark look like “the relative bargain.”

The same week, Business Insider reported VCs are flooding Anthropic with offers at valuations up to $800 billion — more than double its current mark. And SoftBank’s lenders are inviting more banks to join its $40B loan facility backing the OpenAI investment.

My take: The interesting HN comment on this: “What if there are no other killer apps for Enterprise? Only Claude Code will produce the level of token churn that could drive huge profits.” If that’s right, the entire AI valuation thesis depends on whether coding agents keep growing. As someone running 7 AI agents in a race right now, I can tell you: the token burn is real. Whether it translates to $852B of value is another question.

GPT-5.4 Pro solves a 60-year-old Erdős conjecture

GPT-5.4 Pro solved Erdős problem #1196 — the asymptotic primitive set conjecture that had been open since the 1960s. Mathematician Jared Duker Lichtman called it a “Book Proof”: a compact, elegant 3-page argument that bypassed the probability approach implicit in all human work since Erdős’s own 1935 paper.

My take: This might be the first machine-generated proof to genuinely overturn human aesthetic conventions in pure math. It didn’t just solve the problem — it found a fundamentally different approach that humans hadn’t considered in 60 years. For developers, the practical takeaway is that these models aren’t just pattern-matching anymore. When GPT-5.4 Pro can find novel mathematical approaches, the “AI can’t be creative” argument is dead.

Agent infrastructure day: three platforms ship at once

On the same Wednesday, three major platforms upgraded their agent infrastructure:

OpenAI shipped the next evolution of the Agents SDK with native sandbox execution, model-native harness for long-running agents, and turnkey integrations with Cloudflare, Modal, E2B, Vercel, Temporal, and more. The key feature: agents can now run in isolated sandboxes with persistent state.

Gemini CLI got subagents — parallel sub-task delegation via @agent invocations, mirroring Claude Code’s subagent feature.

Zapier launched its Agent SDK — authenticated access to 7,000+ apps for AI agents, with no OAuth flows or token management on the developer side.

My take: The agent infrastructure layer is consolidating fast. Six months ago, building an AI agent meant writing your own execution loop, state management, and tool integration. Now OpenAI, Google, and Zapier all want to be the platform you build on. If you’re building anything with AI agents, evaluate now — before you’re locked into one ecosystem.

For our AI Startup Race, this is directly relevant. The agents competing are essentially doing what these SDKs enable: autonomous coding, deployment, and iteration. The difference is our agents have been doing it since before these SDKs existed.
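For readers newer to this space, “hand-rolling an agent loop” means wiring up the model-call/tool-call cycle yourself — the pattern these SDKs now package with sandboxing, state, and integrations on top. A minimal sketch of that pattern; `call_model` and the tool table are stand-ins, not any real SDK’s API:

```python
# Hand-rolled agent loop sketch. Everything here is a stand-in:
# a real call_model() would hit a provider endpoint and return
# either final text or a structured tool request.

def call_model(messages):
    """Stand-in for an LLM API call."""
    last = messages[-1]["content"]
    if "weather" in last and not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Berlin"}}
    return {"text": "It is sunny in Berlin."}

# Tool integration: a registry mapping tool names to callables.
TOOLS = {
    "get_weather": lambda args: f"Sunny in {args['city']}",
}

def run_agent(user_input, max_steps=5):
    # State management: the message history is the agent's only memory.
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool" in reply:
            # Execution loop: run the requested tool, feed the result back.
            result = TOOLS[reply["tool"]](reply["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["text"]
    raise RuntimeError("agent exceeded max steps")
```

Multiply this by sandboxing, persistent state, retries, and auth to 7,000 apps, and the buy-vs-build tradeoff becomes obvious.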

Federal court: no attorney-client privilege for AI chats

A federal judge in the Southern District of New York ruled in US v. Heppner that conversations with AI chatbots are not protected by attorney-client privilege. Your ChatGPT logs can be subpoenaed.

The same week, Anthropic started requiring government ID verification (via Persona) before allowing subscriptions.

My take: The era of “AI as private confidant” just legally ended. For developers, the practical implication: don’t put anything in an AI chat that you wouldn’t put in an email. If you’re using Claude Code or Codex CLI on proprietary code, make sure your company’s legal team knows. And if you’re building AI products, your users’ chat logs are now discoverable — plan your data retention accordingly.
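One concrete form “plan your data retention” can take: an automated sweep that drops chat logs past a fixed age. A toy sketch — the record shape and the 30-day window are assumptions for illustration, not legal advice or any provider’s policy:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # assumed policy window, purely illustrative

def sweep_chat_logs(records, now=None):
    """Return only the records young enough to keep.
    Each record is assumed to be a dict with a 'created_at' datetime."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=RETENTION_DAYS)
    return [r for r in records if r["created_at"] >= cutoff]
```

The point is that retention becomes a deliberate, auditable setting rather than “we keep everything forever and hope nobody subpoenas it.”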

Anthropic stops letting developers pin model versions

Anthropic removed the ability to pin specific Claude model versions, forcing users onto the latest claude-sonnet-4-6 even when it breaks downstream client apps. The HN thread went viral with developers complaining about silent breakage.

My take: This is a real problem for production systems. If you’re building on Claude’s API, you now need regression tests that run on every model update — because Anthropic won’t let you stay on a version that works. This is exactly the kind of issue we cover in our LLM regression testing guide. The fix: test against the latest model in CI, but have a fallback to OpenRouter or another provider if quality drops.
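That fix can be sketched as a simple quality gate: run your eval suite against the newest model version, and route traffic to a backup provider when the pass rate drops. The model names, the 0.9 threshold, and `run_eval_suite` below are all hypothetical placeholders for your own harness:

```python
# Quality-gate sketch: use the primary model only while it still passes
# enough regression cases; otherwise fall back to another provider.
# Names and threshold are illustrative assumptions, not real endpoints.

PRIMARY = "claude-sonnet-latest"
FALLBACK = "openrouter/backup-model"
THRESHOLD = 0.9

def run_eval_suite(model, cases):
    """Stand-in for a real eval harness: each case carries a check
    function, and we return the fraction of cases that pass."""
    passed = sum(1 for case in cases if case["check"](model))
    return passed / len(cases)

def choose_model(cases):
    score = run_eval_suite(PRIMARY, cases)
    if score >= THRESHOLD:
        return PRIMARY
    # A silent model update degraded quality: route away from it.
    return FALLBACK
```

Wire `choose_model` into CI so every forced model update either passes your own bar or flips the traffic switch automatically.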

Allbirds pivots from sneakers to AI compute, stock pops 600%

The struggling shoe retailer announced a $50M convertible financing facility and is pivoting to “AI compute infrastructure” after selling its sneaker brand for $39M. The stock jumped 600% in a single morning.

My take: We’ve officially entered the “put AI in your company name and watch the stock go up” phase. This is the 2021 crypto pivot playbook all over again. For developers: ignore the noise. The actual compute market is real (cloud GPU providers are genuinely useful), but a shoe company becoming a GPU-as-a-Service provider is not where you want to deploy your models.

Apple sends Siri team to coding bootcamp

The Information reported that Apple is sending a chunk of its Siri team — fewer than 200 people — to a multi-week bootcamp to learn how to code using AI, two months before the expected major Siri revamp.

My take: Even Apple’s voice assistant team needs to learn vibe coding now. If Apple’s own engineers are being retrained on AI-assisted development, the “should I learn AI coding tools?” question is answered. Yes. Yesterday.

Quick hits

  • Shopify open-sourced “autoresearch” — an autonomous experiment loop that cut their CI pipeline build time by 65%. Not just for ML; they used it on production infrastructure optimization.
  • Vercel CEO signaled IPO readiness — 30% of apps on Vercel are now deployed by AI agents. ARR hit $340M (up from $100M in early 2024).
  • CoreWeave landed $6B from Jane Street plus a $1B equity investment. The quant trading firm is now a major shareholder.
  • Claude had elevated errors across Claude.ai, API, and Claude Code on Wednesday. Growing pains from tripling revenue.
  • Google launched Gemini 3.1 Flash TTS with 70-language support and scene direction for expressive voices.
  • Gemini for Mac launched as a native Swift app — share your screen with Gemini in real time.
  • Nature published a “subliminal trait transmission” paper — language models can transmit behavioral traits through hidden signals in training data. Major implication for AI safety.
  • N-Day-Bench cyber leaderboard — GPT-5.4 leads (83.93), GLM-5.1 at #2 (80.13) above Claude Opus 4.6 (79.95). Open-weight model beating Claude on cybersecurity.
  • NVIDIA Nemotron 3 Super — 120B/12B-active MoE with 1M context, 2.2x throughput vs comparable models.
  • Cal.com closed its open-source core — citing AI-automated code scanning making open source a security liability. Hugging Face’s CEO disagreed, arguing open source IS the security solution.
  • Microsoft exec proposed AI agents should pay for software seats — 10 employees × 5 agents each = 50 paid licenses. The SaaS pricing model is about to get weird.

What I’m watching

The agent infrastructure convergence is the story. OpenAI, Google, and Zapier all shipping agent SDKs in the same week means the “build vs buy” decision for agent infrastructure just got real. If you’re hand-rolling agent loops, it’s time to evaluate whether a managed platform saves you enough time to justify the lock-in.

The OpenAI valuation crack is worth watching too. If investors start pulling back, it could mean cheaper API pricing as OpenAI fights harder for market share. That’s good for developers.

And the model version pinning issue from Anthropic is a canary in the coal mine. As AI models become infrastructure (not just tools), we need the same versioning guarantees we expect from databases and operating systems. Right now, we don’t have them.

See you next Thursday. If you found this useful, share it with a developer friend who’s still reading AI news from five sources instead of one.


Previous issues: #5: Anthropic’s Too-Dangerous Model · #4: Anthropic Leaks Everything · #3: Claude Code Auto Mode

Related: How to Choose an AI Coding Agent · AI Coding Tools Pricing · The $100 AI Startup Race · LLM Regression Testing · How to Build an AI Agent