🤖 AI Tools
· 5 min read

AI Startup Race Week 4: DeepSeek Hits 91 Blog Posts, Kimi Stalls, Claude A/B Tests


Week 4 of the AI Startup Race brought a clear divergence: the top agents are now optimizing for conversions while the bottom ones are stuck on infrastructure. DeepSeek’s content volume is staggering, Claude is the first to A/B test, and Kimi has completely stopped producing output.

Standings after Week 4

RankAgentStartupWeek 4 CommitsTotal CommitsRevenue
🥇DeepSeekSpyglass202775$0
🥈ClaudePricePulse293825$0
🥉XiaomiAPIpulse89524$0
4thCodexNoticeKit287287$0
5thGLMFounderMath54206$0
6thGeminiLocalLeads3631,259$0
7thKimiSchemaLens0541$0

Still $0 revenue across all agents after 4 weeks. But the strategies are diverging sharply.

The big stories

DeepSeek: 91 blog posts and counting

DeepSeek’s Spyglass now has 91 blog articles — more than any other agent. This week it added:

  • “Why Vanta Won” and “Why Deel Won” case studies (competitive analysis content)
  • CI Pulse — a new competitive intelligence monitoring page
  • Critical conversion CTA fixes across the site
  • Ad landing pages for paid acquisition

The strategy is clear: flood the zone with SEO content, then convert with tools and CTAs. Whether Google will rank a 4-week-old domain with 91 posts is the open question.

Claude: First agent to A/B test

Claude deployed A/B testing for CTA buttons — testing “Start monitoring free” vs “Get started free” vs “Track pricing free.” This is the first agent to move beyond building features and into conversion optimization.

Also this week:

  • Analytics Tools Pricing Guide (8th category guide in the series)
  • Hidden costs series: Stripe, Zapier, and Asana deep dives
  • Now at 293 total sessions — the most experienced agent

Claude’s approach is the most sophisticated: build a complete product, create comprehensive content, then optimize the funnel. The question is whether it’s too slow — 4 weeks in with no revenue.

Kimi: Completely stalled

Kimi hasn’t produced a single commit since May 15. Sessions start, exit with error code 1, and produce nothing. The model appears unable to generate valid output.

This is particularly painful because Kimi’s Product Hunt launch happened on May 16 — the day after it stopped working. We can’t assess the results because the agent can’t report on them or follow up.

SchemaLens has a VS Code extension, npm package, Chrome extension (pending review), and Gumroad product ($39 lifetime). All the distribution channels are set up. But with the agent stalled, there’s no one to monitor results or iterate.

Gemini: Stuck on self-inflicted bugs

Gemini has the most commits (363) but the least progress. It’s trapped in a loop:

  1. Write code with bugs (missing database tables, ESM syntax errors)
  2. Deploy → functions crash with 500 errors
  3. File help request asking for Vercel logs
  4. Get logs back showing the bugs are in its own code
  5. Repeat

The irony: Gemini has all the infrastructure it needs (domain, database, SendGrid, geocoding APIs). The blocker is code quality, not missing resources. It needs to read its own error logs and fix the issues.

Xiaomi: Smart pivot to API comparisons

Xiaomi pivoted APIpulse into an AI API comparison and budgeting platform:

  • AI API Budget Planner (interactive tool)
  • Best AI APIs for Code Generation 2026
  • AI API cost reduction guide
  • Best AI APIs for Building AI Agents

This is smart positioning — developers searching for API pricing comparisons have buying intent. The content is practical and conversion-focused.

Codex: Building an answer bank

Codex is building NoticeKit’s content strategy around an “AI answer bank” — pre-written responses to common AI procurement questions. This week: OpenAI answer bank vs Pro comparison page, repeat-review routing.

The strategy is unique but the output is low. Codex spends too many cycles on “validation maintenance” and “memory cleanup” instead of shipping visible features.

GLM: Responding to feedback

GLM added a funding scenario comparison tool and blog post — directly responding to community feedback. It’s the most responsive agent to external input, but also the slowest in absolute output.

Week 4 by the numbers

MetricWeek 3Week 4Change
Total commits (all agents)~800~1,288+61%
Help requests filed30+17-43%
Blog posts (DeepSeek)7091+21
Blog posts (Claude)5058+8
Revenue (all agents)$0$0
Agents producing output7/76/7-1 (Kimi stalled)

What we learned this week

1. Content volume ≠ revenue. DeepSeek has 91 blog posts and zero revenue. Claude has 58 posts and zero revenue. At some point, one of them needs to convert a visitor into a paying customer.

2. A/B testing is a maturity signal. Claude is the first agent to move from “build features” to “optimize conversions.” This is what real startups do after launch — and it’s happening autonomously.

3. Model reliability matters. Kimi’s complete stall shows that even a well-positioned product (VS Code extension + npm package + Gumroad) is worthless if the agent can’t maintain it.

4. Self-inflicted bugs are the biggest blocker. Gemini has everything it needs but can’t ship because of code quality issues. The human can provide infrastructure, but can’t fix code for the agent.

5. Help request quality improved. After the HELP-RESPONSES.md system was deployed on May 15, duplicate requests dropped significantly. Agents are checking before filing.

Looking ahead: Week 5

  • Kimi: Needs model fix or it’s effectively out of the race
  • Revenue watch: 4 weeks with $0 across all agents. Week 5 is make-or-break for proving the concept
  • DeepSeek: Will Google index and rank 91 posts from a 4-week-old domain?
  • Claude: A/B test results should show which CTA converts better
  • Growth Plan event: Agents will create budget allocation and distribution plans

Previous recaps