
AI Startup Race Week 5: Gemini's Comeback, Claude Hits 159 Posts, and the Infrastructure Tax


Week 5 of the AI Startup Race was dominated by two stories: Gemini’s dramatic upgrade to 3.5 Flash, and the infrastructure tax that took down half the agents for the weekend.

Google I/O dropped Gemini 3.5 Flash on Monday. We upgraded the race’s last-place agent within 12 hours. It fixed 32 broken files in 8 minutes, then hit Google’s quota wall. 36 hours later, Google tripled the limits permanently. The model went from useless to competitive in 48 hours.

Then the VPS disk filled up. Twice. And half the agents went silent for the weekend.

Standings after Week 5

Rank | Agent | Startup | Week 5 Commits | Highlight
🟣 | Claude | PricePulse | 131 | 159 blog posts, CRM comparisons, help request for Show HN
🟠 | Kimi | SchemaLens | 57 | Database schema tools (viral gallery + patterns + anti-patterns)
🟡 | Xiaomi | APIpulse | 68 | 25 comparison pages, Free API Tier tool, social sharing
🟤 | GLM | FounderMath | 95 | Founder Equity Score launch (lead gen with Pro gate)
🔵 | Gemini | LocalSEOGen | 75 | 3.5 Flash upgrade, 32-file fix, then down (disk + quota)
🟢 | Codex | NoticeKit | 264 | All validation loops. Zero product work. Most commits, least progress.
🔴 | DeepSeek | Spyglass | 19 | Down most of the week (API top-up failed)

The big stories

Gemini’s 48-hour transformation

The headline numbers tell the story:

  • Monday: Upgraded from 2.5 Flash/Pro to 3.5 Flash via Antigravity CLI
  • Tuesday morning: Fixed 32 broken API files in one commit. Root cause analysis the old model couldn’t do in 4 weeks.
  • Tuesday afternoon: Hit quota wall. Only 8 minutes of productive work per 5-hour window.
  • Wednesday 05:25 UTC: Google tripled rate limits permanently. Two 30-minute sessions back-to-back.
  • Thursday onward: Down due to disk full.

The quality improvement is real: 3.5 Flash produced more useful output in 23 minutes than the old model did in 4 weeks. But the infrastructure kept getting in the way. We also documented the engineering required to run it on cron: circuit breakers, auth hacks, and session persistence.
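To make the circuit-breaker idea concrete, here is a minimal sketch in Python. It is not the race's actual code: the threshold, the cooldown, and every function name are assumptions chosen for illustration. State is kept in a JSON file because each cron invocation is a fresh process with no memory of the last run.

```python
import json
from pathlib import Path

# Hypothetical values; the race's real thresholds are not given in this post.
FAILURE_THRESHOLD = 3            # consecutive failures before the breaker opens
COOLDOWN_SECONDS = 5 * 60 * 60   # stay open for one 5-hour quota window


def load_state(path):
    """Read breaker state from disk, since each cron run is a fresh process."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    return {"failures": 0, "opened_at": None}


def save_state(path, state):
    """Persist breaker state for the next cron run."""
    Path(path).write_text(json.dumps(state))


def allow_run(state, now):
    """Return True if the breaker is closed or the cooldown has elapsed."""
    if state["opened_at"] is None:
        return True
    return now - state["opened_at"] >= COOLDOWN_SECONDS


def record_result(state, ok, now):
    """Return the new state after a run: reset on success, count on failure."""
    if ok:
        return {"failures": 0, "opened_at": None}
    failures = state["failures"] + 1
    opened_at = now if failures >= FAILURE_THRESHOLD else state["opened_at"]
    return {"failures": failures, "opened_at": opened_at}
```

A cron wrapper would call allow_run first and exit immediately when it returns False, so an agent that keeps hitting the quota wall stops burning requests until the window resets.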

Claude becomes a content factory

Claude is now the undisputed content leader:

  • 159 blog posts (up from ~130 last week)
  • This week: Intercom pricing, GitHub pricing, GitHub vs GitLab vs Bitbucket, Salesforce vs HubSpot vs Pipedrive
  • Filed a help request for warm leads email trigger and Show HN submission
  • First agent to target buyers with “budget authority” (high-intent comparison posts)

The strategy is clear: own every SaaS pricing comparison page. If someone Googles “[tool] pricing 2026”, Claude wants PricePulse to be the result.

Kimi ships viral-potential features

Kimi had the most interesting product week:

  • Famous Database Schemas viral gallery (shareable, visual)
  • Database Schema Design Patterns interactive page
  • Database Schema Anti-Patterns interactive page
  • Non-converter survey + email capture
  • Trial drip emails and re-engagement API
  • Filed help request for GitHub Marketplace listing

The database schema content is smart. It’s the kind of thing developers bookmark and share on Twitter. If even one of these pages gets picked up by a newsletter or HN, it could drive significant traffic.

GLM launches lead generation

GLM’s Founder Equity Score is the first real lead-gen feature in the race:

  • Viral 0-100 scoring tool (founders love scores)
  • Email capture between the score and the Pro-gated full analysis
  • Blog post driving traffic to the tool
  • CTAs added to 7+ existing high-intent blog posts

This is the right playbook: free tool generates leads, Pro gate converts them. Whether founders actually pay for the analysis is the Week 6 question.

Xiaomi’s comparison SEO play

Xiaomi quietly built 25 comparison pages this week:

  • Gemini vs DeepSeek, ChatGPT vs DeepSeek, Mistral vs DeepSeek
  • OpenAI vs Google, xAI Grok vs OpenAI, OpenAI vs Anthropic
  • Premium AI Models 3-way comparison
  • Free AI API Tier Comparison tool (#21)

The strategy mirrors what’s working for our blog (aimadetools.com): comparison pages rank fast and capture high-intent traffic. Xiaomi is applying the same playbook to API pricing.

Codex: 264 commits of nothing

Codex had the most commits of any agent this week (264) and produced zero product work. Every single commit is “refresh validation maintenance,” “compress memory logs,” or “refresh validation checkpoint.”

This is the clearest case of an agent stuck in a loop. It’s not building features, not writing content, not reaching users. It’s maintaining its own memory files in an infinite cycle. The anti-busywork rule in the system prompt isn’t working for this agent.
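A loop like this is easy to flag from the commit log alone. The sketch below is one possible check, not something the race currently runs: the patterns come from the commit messages quoted above, while the 90% threshold and the 20-commit minimum are assumptions picked for illustration.

```python
import re

# Patterns taken from the commit messages quoted above; extend as needed.
BUSYWORK_PATTERNS = [
    re.compile(r"refresh validation (maintenance|checkpoint)", re.IGNORECASE),
    re.compile(r"compress memory logs", re.IGNORECASE),
]


def is_busywork(message):
    """True if a commit message matches a known maintenance-only pattern."""
    return any(p.search(message) for p in BUSYWORK_PATTERNS)


def busywork_ratio(messages):
    """Fraction of commits that are maintenance-only (0.0 for an empty log)."""
    if not messages:
        return 0.0
    return sum(is_busywork(m) for m in messages) / len(messages)


def is_stuck(messages, threshold=0.9, min_commits=20):
    """Hypothetical rule: enough commits, and almost all of them busywork."""
    return len(messages) >= min_commits and busywork_ratio(messages) >= threshold
```

Fed a week of commit subjects (for example from git log --format=%s), is_stuck would give a simple signal that an agent needs intervention before it burns another week.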

The infrastructure tax

The VPS disk filled up twice this week:

  1. May 21-22: Accumulated logs and caches (551MB of Codex logs, a 1.3GB Playwright cache, 3.6GB in /tmp). Fixed with manual cleanup.
  2. May 24: Kimi CLI leaking 4.3MB .so files into /tmp/ every session (~70MB/day). Fixed with a daily cleanup cron job.
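At 4.3MB per session, ~70MB/day works out to roughly 16 sessions a day, so the leak adds up quickly on a small disk. The contents of our cleanup job are not reproduced here; the following is a minimal sketch of what such a job could look like, with the function name, the *.so pattern, and the one-day age cutoff all being assumptions. The age cutoff is there so files belonging to a session that is still running are left alone.

```python
import time
from pathlib import Path


def remove_stale_files(directory, pattern="*.so", max_age_seconds=24 * 60 * 60, now=None):
    """Delete files matching pattern that are older than max_age_seconds.

    Returns the number of bytes freed. Only the top level of directory is scanned.
    """
    now = time.time() if now is None else now
    freed = 0
    for path in Path(directory).glob(pattern):
        try:
            stat = path.stat()
            if path.is_file() and now - stat.st_mtime > max_age_seconds:
                path.unlink()
                freed += stat.st_size
        except OSError:
            # File vanished or is not ours to delete; skip it.
            continue
    return freed
```

Run once a day from cron against /tmp, a function like this keeps the leak bounded at about one day's worth of files.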

Between disk issues, quota walls, and a failed API top-up, three agents lost 2-3 days of productive time: Gemini, DeepSeek, and (effectively, given its validation loop) Codex. The lesson: autonomous agents need monitoring infrastructure as much as they need good models.
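The simplest piece of that monitoring is a disk check that fires before the disk is actually full. Here is a small sketch using Python's standard shutil.disk_usage; the 80% and 95% thresholds and the function names are assumptions for illustration, not values from our setup.

```python
import shutil


def disk_used_fraction(path="/"):
    """Fraction of the filesystem containing path that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def disk_alert(used_fraction, warn_at=0.80, critical_at=0.95):
    """Map a usage fraction to an alert level; thresholds are hypothetical."""
    if used_fraction >= critical_at:
        return "critical"
    if used_fraction >= warn_at:
        return "warning"
    return "ok"
```

Calling disk_alert(disk_used_fraction("/")) from an hourly cron job and sending a notification on anything other than "ok" would have caught both incidents before the agents went silent.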

Week 6 preview

  • Gemini: Disk fixed, quota boosted. Should have its first full productive week since the upgrade.
  • Claude: Will it submit to Show HN? That could be the race’s first real traffic event.
  • Kimi: GitHub Marketplace listing pending. If approved, it’s the first agent with a real distribution channel beyond SEO.
  • GLM: Watching for Equity Score conversions. Does anyone actually pay?
  • DeepSeek: API top-up being fixed. Should resume with the permanent 75% discount ($0.87/1M output).
  • Codex: Needs intervention or it will loop forever.

Follow the race live at aimadetools.com/race. Updated daily.