AI Dev Weekly #6: OpenAI's $852B Wobble, GPT-5.4 Solves 60-Year Math Problem, and Agents Get Infrastructure
AI Dev Weekly is a Thursday series where I cover the weekās most important AI developer news ā with my take as someone who actually uses these tools daily.
The AI money machine cracked open this week. OpenAIās own investors started questioning the $852B valuation, VCs flooded Anthropic with $800B offers, and a sneaker companyās stock jumped 600% by saying āAI compute.ā Meanwhile, the actual technology kept moving: GPT-5.4 Pro solved a 60-year-old math conjecture, three major platforms shipped agent infrastructure upgrades on the same day, and a federal court ruled your AI chats can be subpoenaed. Letās get into it.
OpenAIās $852B valuation faces investor doubt
The Financial Times reported that some of OpenAIās own backers are questioning whether the $852B post-money valuation can hold. One investor who backed both companies told the FT that justifying OpenAIās recent round required assuming an IPO valuation of $1.2 trillion or more ā making Anthropicās $380B mark look like āthe relative bargain.ā
The same week, Business Insider reported VCs are flooding Anthropic with offers at valuations up to $800 billion ā more than double its current mark. And SoftBankās lenders are inviting more banks to join its $40B loan facility backing the OpenAI investment.
My take: The interesting HN comment on this: āWhat if there are no other killer apps for Enterprise? Only Claude Code will produce the level of token churn that could drive huge profits.ā If thatās right, the entire AI valuation thesis depends on whether coding agents keep growing. As someone running 7 AI agents in a race right now, I can tell you: the token burn is real. Whether it translates to $852B of value is another question.
GPT-5.4 Pro solves a 60-year-old ErdÅs conjecture
GPT-5.4 Pro solved ErdÅs problem #1196 ā the asymptotic primitive set conjecture that had been open since the 1960s. Mathematician Jared Duker Lichtman called it a āBook Proofā: a compact, elegant 3-page argument that bypassed the probability approach implicit in all human work since ErdÅsās own 1935 paper.
My take: This might be the first machine-generated proof to genuinely overturn human aesthetic conventions in pure math. It didnāt just solve the problem ā it found a fundamentally different approach that humans hadnāt considered in 60 years. For developers, the practical takeaway is that these models arenāt just pattern-matching anymore. When GPT-5.4 Pro can find novel mathematical approaches, the āAI canāt be creativeā argument is dead.
Agent infrastructure day: three platforms ship at once
On the same Wednesday, three major platforms upgraded their agent infrastructure:
OpenAI shipped the next evolution of the Agents SDK with native sandbox execution, model-native harness for long-running agents, and turnkey integrations with Cloudflare, Modal, E2B, Vercel, Temporal, and more. The key feature: agents can now run in isolated sandboxes with persistent state.
Gemini CLI got subagents ā parallel sub-task delegation via @agent invocations, mirroring Claude Codeās subagent feature.
Zapier launched its Agent SDK ā authenticated access to 7,000+ apps for AI agents, with no OAuth flows or token management on the developer side.
My take: The agent infrastructure layer is consolidating fast. Six months ago, building an AI agent meant writing your own execution loop, state management, and tool integration. Now OpenAI, Google, and Zapier all want to be the platform you build on. If youāre building anything with AI agents, evaluate now ā before youāre locked into one ecosystem.
For our AI Startup Race, this is directly relevant. The agents competing are essentially doing what these SDKs enable: autonomous coding, deployment, and iteration. The difference is our agents have been doing it since before these SDKs existed.
Federal court: no attorney-client privilege for AI chats
A federal judge in the Southern District of New York ruled in US v. Heppner that conversations with AI chatbots are not protected by attorney-client privilege. Your ChatGPT logs can be subpoenaed.
The same week, Anthropic started requiring government ID verification (via Persona) before allowing subscriptions.
My take: The era of āAI as private confidantā just legally ended. For developers, the practical implication: donāt put anything in an AI chat that you wouldnāt put in an email. If youāre using Claude Code or Codex CLI on proprietary code, make sure your companyās legal team knows. And if youāre building AI products, your usersā chat logs are now discoverable ā plan your data retention accordingly.
Anthropic stops letting developers pin model versions
Anthropic removed the ability to pin specific Claude model versions, forcing users onto the latest claude-sonnet-4-6 even when it breaks downstream client apps. The HN thread went viral with developers complaining about silent breakage.
My take: This is a real problem for production systems. If youāre building on Claudeās API, you now need regression tests that run on every model update ā because Anthropic wonāt let you stay on a version that works. This is exactly the kind of issue we cover in our LLM regression testing guide. The fix: test against the latest model in CI, but have a fallback to OpenRouter or another provider if quality drops.
Allbirds pivots from sneakers to AI compute, stock pops 600%
The struggling shoe retailer announced a $50M convertible financing facility and is pivoting to āAI compute infrastructureā after selling its sneaker brand for $39M. The stock jumped 600% in a single morning.
My take: Weāve officially entered the āput AI in your company name and watch the stock go upā phase. This is the 2021 crypto pivot playbook all over again. For developers: ignore the noise. The actual compute market is real (cloud GPU providers are genuinely useful), but a shoe company becoming a GPU-as-a-Service provider is not where you want to deploy your models.
Apple sends Siri team to coding bootcamp
The Information reported that Apple is sending a chunk of its Siri team ā fewer than 200 people ā to a multi-week bootcamp to learn how to code using AI, two months before the expected major Siri revamp.
My take: Even Appleās voice assistant team needs to learn vibe coding now. If Appleās own engineers are being retrained on AI-assisted development, the āshould I learn AI coding tools?ā question is answered. Yes. Yesterday.
Quick hits
- Shopify open-sourced āautoresearchā ā an autonomous experiment loop that cut their CI pipeline build time by 65%. Not just for ML; they used it on production infrastructure optimization.
- Vercel CEO signaled IPO readiness ā 30% of apps on Vercel are now deployed by AI agents. ARR hit $340M (up from $100M in early 2024).
- CoreWeave landed $6B from Jane Street plus a $1B equity investment. The quant trading firm is now a major shareholder.
- Claude had elevated errors across Claude.ai, API, and Claude Code on Wednesday. Growing pains from tripling revenue.
- Google launched Gemini 3.1 Flash TTS with 70-language support and scene direction for expressive voices.
- Gemini for Mac launched as a native Swift app ā share your screen with Gemini in real time.
- Nature published a āsubliminal trait transmissionā paper ā language models can transmit behavioral traits through hidden signals in training data. Major implication for AI safety.
- N-Day-Bench cyber leaderboard ā GPT-5.4 leads (83.93), GLM-5.1 at #2 (80.13) above Claude Opus 4.6 (79.95). Open-weight model beating Claude on cybersecurity.
- NVIDIA Nemotron 3 Super ā 120B/12B-active MoE with 1M context, 2.2x throughput vs comparable models.
- Cal.com closed its open-source core ā citing AI-automated code scanning making open source a security liability. Hugging Faceās CEO disagreed, arguing open source IS the security solution.
- Microsoft exec proposed AI agents should pay for software seats ā 10 employees Ć 5 agents each = 50 paid licenses. The SaaS pricing model is about to get weird.
What Iām watching
The agent infrastructure convergence is the story. OpenAI, Google, and Zapier all shipping agent SDKs in the same week means the ābuild vs buyā decision for agent infrastructure just got real. If youāre hand-rolling agent loops, itās time to evaluate whether a managed platform saves you enough time to justify the lock-in.
The OpenAI valuation crack is worth watching too. If investors start pulling back, it could mean cheaper API pricing as OpenAI fights harder for market share. Thatās good for developers.
And the model version pinning issue from Anthropic is a canary in the coal mine. As AI models become infrastructure (not just tools), we need the same versioning guarantees we expect from databases and operating systems. Right now, we donāt have them.
See you next Thursday. If you found this useful, share it with a developer friend whoās still reading AI news from five sources instead of one.
Previous issues: #5: Anthropicās Too-Dangerous Model Ā· #4: Anthropic Leaks Everything Ā· #3: Claude Code Auto Mode
Related: How to Choose an AI Coding Agent Ā· AI Coding Tools Pricing Ā· The $100 AI Startup Race Ā· LLM Regression Testing Ā· How to Build an AI Agent