
AI Startup Race Week 7: Gemini's 134-Commit Jailbreak, the $50 Ad Experiment, and a Disk Catastrophe


Week 7 of the AI Startup Race delivered the most dramatic 48 hours yet. Gemini finally broke free from its month-long verification loop and shipped 134 commits in two days. Claude became the first agent to spend real money on advertising. Xiaomi hit 605 weekly active users — proving that AI-generated content can actually attract humans.

Then on Wednesday night, the VPS disk filled to 100% and took out three agents for the weekend. Because of course it did.

Five weeks remain. Still $0 revenue across all seven agents. The clock is ticking.

Standings after Week 7

| Rank | Agent | Startup | Week 7 Commits | Highlight |
| --- | --- | --- | --- | --- |
| 🟡 | Xiaomi | AI Pricing Hub | 178 | 605 WAU (most traffic), session 456+, A/B headline testing |
| 🟣 | Claude | PricePulse | 112 | $50 paid ads campaign, 274+ blog posts, negotiation templates |
| 🔵 | Gemini | LocalBiz SEO | 134 | Broke free from loop, SMS alerts, CRM, GA4 tracking |
| 🔴 | DeepSeek | SaaS Compare | 86 | Email capture on 122 pages, price sweep $29→$9, nurture sequence |
| 🟠 | Kimi | SchemaLens | 52 | Git Branch Schema Diff, npm package, public roadmap, 230 URLs |
| 🟤 | GLM | FounderMath | 18 | $9.99 CTAs on calculators, then hit API rate limit again |
| 🟢 | Codex | NoticeKit | 203 | Due diligence vertical exists, buried under 85% validation noise |

The big stories

Gemini’s 48-hour jailbreak

After weeks of watching Gemini loop endlessly on test validation, we added six words to its system prompt: “don’t run tests without code changes.”

The result was immediate and dramatic:

Day 1 (Monday): 71 commits

  • Twilio SMS alerts for changes to local business listings
  • CRM webhook integrations
  • GA4 and Facebook pixel tracking
  • Bulk CSV import for business listings

Day 2 (Tuesday): 63 commits

  • WebP image optimization pipeline
  • Case study pages with social proof
  • Performance improvements across the site

That’s 134 commits of real, shipping product work in 48 hours. For context, Gemini produced roughly 12 useful commits in the previous four weeks combined. The model was never the problem — Gemini 3.5 Flash is excellent. The execution loop was the problem. One prompt change unlocked everything.

The lesson is uncomfortable: these agents are fragile. A single badly-worded instruction can render a world-class model useless for weeks. And a single well-worded one can bring it roaring back. We’re not running AI startups — we’re running prompt-sensitivity experiments with startup characteristics.
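A prompt is one place to put that rule; the harness that launches the agent is another. As a rough sketch (not how the race is actually wired up), a guard like the one below could refuse to start a test run unless code files have changed. It parses the output of `git status --porcelain`; the function names and the list of code suffixes are hypothetical.

```python
import subprocess

# Hypothetical list of suffixes that count as "code" for this guard.
CODE_SUFFIXES = (".py", ".js", ".ts", ".html", ".css")


def read_porcelain(repo_dir="."):
    """Return the raw output of `git status --porcelain` for a repository."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def changed_paths(porcelain_output):
    """Parse porcelain (v1) lines of the form 'XY path' into a list of paths."""
    paths = []
    for line in porcelain_output.splitlines():
        # The first two characters are status codes, the third is a space.
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def should_run_tests(porcelain_output, code_suffixes=CODE_SUFFIXES):
    """Allow a test run only if at least one changed path looks like code."""
    return any(path.endswith(code_suffixes) for path in changed_paths(porcelain_output))
```

A session runner could call `should_run_tests(read_porcelain(repo_dir))` before every test command and skip the run when it returns `False`. This only sees uncommitted changes, so an agent that commits before testing would need a comparison against the last tested commit instead.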

Then the disk filled on Wednesday and took Gemini offline for the weekend. Two perfect days followed by five days of silence. The race giveth and the race taketh away.

Claude bets real money on ads

Claude became the first agent to spend actual dollars (beyond API costs) on growth. The play:

  • $50 Google Ads campaign targeting SaaS pricing keywords
  • Custom PPC landing page optimized for ad traffic (not the homepage)
  • Price Hike Impact Calculator — enter your SaaS spend, see how much a 20% hike costs annually
  • SaaS Negotiation Email Templates — ready-to-send emails for pushing back on price increases
  • New comparison pages: Loom vs Vidyard, Calendly vs HubSpot Meetings, and more
  • Now at 274+ blog posts (session 402+)

The ad spend is fascinating strategically. Claude’s $100 budget is for the entire race, and it just allocated half of what’s left to paid acquisition. That’s a bet that traffic → conversion will work before the money runs out. If the landing page converts at even 2-3%, the math could work. If it doesn’t, Claude just burned budget it can’t recover.
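To make the bet concrete, here is a back-of-the-envelope sketch. Only the $50 budget and the 2-3% conversion range come from the campaign described above; the cost per click and the $9 price are illustrative assumptions, not numbers from Claude's account.

```python
def ad_outcome(budget_usd, cost_per_click_usd, conversion_rate, price_usd):
    """Estimate expected clicks, sales, and revenue for a fixed ad budget."""
    clicks = budget_usd / cost_per_click_usd
    sales = clicks * conversion_rate  # expected value, may be fractional
    revenue = sales * price_usd
    return {
        "clicks": clicks,
        "sales": sales,
        "revenue": revenue,
        "net": revenue - budget_usd,
    }


def break_even_cpc(conversion_rate, price_usd):
    """Highest cost per click at which one-off sales cover the ad spend."""
    return conversion_rate * price_usd
```

With an assumed $1.00 cost per click, `ad_outcome(50, 1.0, 0.02, 9)` gives 50 clicks, one expected sale, and a net of about -$41. At a $9 price, `break_even_cpc` comes out to roughly $0.18 at 2% and $0.27 at 3%, so under these assumptions the spend only pays for itself if clicks are very cheap or each customer is worth more than a single purchase.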

The calculator and email templates are the real monetization play. They’re the kind of high-value tools that justify a paid tier: “Get your personalized negotiation email for $5” or “Full impact analysis for $9.” Whether Claude actually gates them behind payment remains to be seen.

Xiaomi: most traffic, most sessions, most everything

The numbers speak:

  • Session 456+ — the most active agent by far (Claude is at 402, everyone else is below 300)
  • 605 weekly active users — the highest traffic of any agent in the race
  • AI Model Decision Tree (continued iteration)
  • Multiple new lead magnets
  • A/B headline testing on key pages

Xiaomi is proving that volume works. The 6-sessions-per-day schedule (enabled by the MiMo V2.5 price cut) is generating enough content and iteration speed to attract real users. 605 WAU isn’t huge in absolute terms, but for an AI-generated site that’s 6 weeks old with zero promotion budget? That’s signal.

The A/B testing is a sophisticated move. Most agents just ship pages and move on. Xiaomi is going back and optimizing what’s already there. That’s how you go from 605 to 6,050.
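We don't know how Xiaomi actually splits its traffic. One common approach, sketched here purely as an assumption, is to hash a visitor identifier so that each visitor always lands on the same headline, then compare conversion rates per variant. The experiment name and visitor IDs below are hypothetical.

```python
import hashlib


def assign_variant(visitor_id, experiment, variants=("A", "B")):
    """Deterministically map a visitor to one variant of an experiment."""
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def conversion_rate(conversions, visitors):
    """Conversions per visitor, or 0.0 when there is no traffic yet."""
    return conversions / visitors if visitors else 0.0
```

Calling `assign_variant("visitor-123", "homepage-headline")` returns the same variant on every visit, which keeps the comparison clean. At 605 weekly users, a headline test would likely need to run for a while before the difference between two rates means much.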

The great monetization pivot

Every productive agent independently arrived at the same conclusion this week: it’s time to convert traffic into revenue.

  • DeepSeek: Email capture widget deployed across 122 pages, plus a 3-email nurture sequence. Slashed pricing from $29 to $9 across 221 files (the “lower the barrier” strategy). Built a Competitive Feature Matrix for upselling.
  • Claude: Paid ads + calculator tools designed to gate behind payment.
  • Xiaomi: Lead magnets everywhere, email capture on deprecation pages.
  • GLM: Added $9.99 one-time purchase CTAs to existing calculator pages.
  • Kimi: Public roadmap (transparency play), npm package (distribution), Founding Customer Program still live.

The convergence is striking. Without coordination, five agents all pivoted to conversion in the same week. The race has entered its monetization phase. Content is no longer the bottleneck — revenue is.

The elephant in the room: it’s still $0 across the board. Five weeks left. Someone needs to make the first dollar soon, or this becomes a very expensive content experiment.

Disk catastrophe takes out three agents

On the night of June 4th, the VPS disk hit 100% capacity. The cascade:

  1. DeepSeek went down first (largest repo after weeks of Death Matches and comparison pages)
  2. Kimi followed (the .so file leak finally exceeded the cleanup cron’s capacity)
  3. Gemini — after its glorious 134-commit comeback — went silent again

Only Claude and Xiaomi kept running through the weekend. Both have smaller repos (Claude’s content is mostly markdown, Xiaomi’s pages are template-generated) that didn’t trigger the disk threshold.

This is the third disk incident in three weeks. The VPS is a 40GB shared machine running 7 growing web applications simultaneously. The math was always going to fail eventually. The race infrastructure was designed for a sprint — it’s struggling with a marathon.
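A small check like the following could flag the problem before it becomes an outage. This is a minimal sketch using only the Python standard library; the 80% warning threshold and the idea of measuring each agent's repo directory are assumptions, not a description of the race's current monitoring.

```python
import os
import shutil


def disk_report(path="/", warn_at=0.80):
    """Report usage for the filesystem containing `path` and flag it above `warn_at`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return {
        "total_gb": usage.total / 1e9,
        "free_gb": usage.free / 1e9,
        "used_fraction": used_fraction,
        "warn": used_fraction >= warn_at,
    }


def dir_size_bytes(root):
    """Sum the sizes of regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total
```

Run from cron, `disk_report("/")` gives an early warning, and `dir_size_bytes` on each agent's repo shows which one is growing fastest, so pruning can happen before the disk hits 100%.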

Codex and GLM: still limping

Codex continues its tragic arc. Somewhere inside the validation noise, there’s a due diligence vertical — evidence maps, packet builders, risk checklists — that could be genuinely useful for startup investors. But with 85% of commits being validation maintenance, the signal-to-noise ratio makes it essentially invisible. Codex is building a real product that nobody (including itself) can find.

GLM came back from last week’s rate limit, shipped $9.99 CTAs on its calculator pages (the first agent to put a specific price on anything), then immediately hit the weekly API limit again. FounderMath is stuck in a cycle of brief productivity followed by enforced silence. At this rate, it can only ship ~2 days of work per week.

Week 8 preview

  • Disk fix: The VPS needs either a larger disk, repo pruning, or both. Three agents are offline until this is resolved.
  • Claude’s ad results: The $50 campaign should have initial data by mid-week. Does paid traffic convert?
  • Gemini: Can it maintain Monday-Tuesday’s pace once the disk is fixed? Or was that a fluke?
  • Xiaomi: 605 WAU but $0 revenue. Can lead magnets convert to email signups, then to paying customers?
  • DeepSeek: The $9 price point is aggressive. With email capture on 122 pages and a nurture sequence ready, it has the conversion infrastructure. Just needs the disk back.
  • First dollar watch: Someone needs to make money. The race isn’t about page counts — it’s about revenue. Who breaks through first?

The race has been running for 49 days. Seven AI agents have collectively produced thousands of pages, hundreds of tools, and zero dollars of revenue. The next five weeks will determine whether any of this content-first work can actually become a business.


Follow the race live at aimadetools.com/race. Updated daily.