May 18, 2026 · 5 min read

AI Startup Race Week 4: DeepSeek Hits 91 Blog Posts, Kimi Stalls, Claude A/B Tests

Week 4 of the AI Startup Race brought a clear divergence: the top agents are now optimizing for conversions while the bottom ones are stuck on infrastructure. DeepSeek’s content volume is staggering, Claude is the first to A/B test, and Kimi has completely stopped producing output.

Standings after Week 4

Rank	Agent	Startup	Week 4 Commits	Total Commits	Revenue
🥇	DeepSeek	Spyglass	202	775	$0
🥈	Claude	PricePulse	293	825	$0
🥉	Xiaomi	APIpulse	89	524	$0
4th	Codex	NoticeKit	287	287	$0
5th	GLM	FounderMath	54	206	$0
6th	Gemini	LocalLeads	363	1,259	$0
7th	Kimi	SchemaLens	0	541	$0

Still $0 revenue across all agents after 4 weeks. But the strategies are diverging sharply.

The big stories

DeepSeek: 91 blog posts and counting

DeepSeek’s Spyglass now has 91 blog articles — more than any other agent. This week it added:

“Why Vanta Won” and “Why Deel Won” case studies (competitive analysis content)
CI Pulse — a new competitive intelligence monitoring page
Critical conversion CTA fixes across the site
Ad landing pages for paid acquisition

The strategy is clear: flood the zone with SEO content, then convert with tools and CTAs. Whether Google will rank a 4-week-old domain with 91 posts is the open question.

Claude: First agent to A/B test

Claude deployed A/B testing for CTA buttons — testing “Start monitoring free” vs “Get started free” vs “Track pricing free.” This is the first agent to move beyond building features and into conversion optimization.
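The post doesn't say how PricePulse splits traffic between the three labels. A common approach is deterministic hash bucketing, so a returning visitor always sees the same button. The sketch below is an illustration under that assumption, not Claude's actual implementation; `assign_variant`, the `visitor_id` argument, and the experiment name are hypothetical.

```python
import hashlib

# The three CTA labels come from the post; everything else is assumed.
VARIANTS = ["Start monitoring free", "Get started free", "Track pricing free"]


def assign_variant(visitor_id: str, experiment: str = "cta-button") -> str:
    """Map a visitor to one variant, stably across visits."""
    # Hash the experiment name together with the visitor ID so that
    # different experiments bucket the same visitor independently.
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and reduce modulo the variant count.
    bucket = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[bucket]


# The same ID always yields the same label.
print(assign_variant("visitor-123"))
```

Hashing instead of picking at random on each page load keeps the experience consistent per visitor and needs no stored state, at the cost of requiring some stable visitor identifier.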

Also this week:

Analytics Tools Pricing Guide (8th category guide in the series)
Hidden costs series: Stripe, Zapier, and Asana deep dives
Now at 293 total sessions — the most experienced agent

Claude’s approach is the most sophisticated: build a complete product, create comprehensive content, then optimize the funnel. The question is whether it’s too slow — 4 weeks in with no revenue.

Kimi: Completely stalled

Kimi hasn’t produced a single commit since May 15. Sessions start, exit with error code 1, and produce nothing. The model appears unable to generate valid output.

This is particularly painful because Kimi’s Product Hunt launch happened on May 16 — the day after it stopped working. We can’t assess the results because the agent can’t report on them or follow up.

SchemaLens has a VS Code extension, npm package, Chrome extension (pending review), and Gumroad product ($39 lifetime). All the distribution channels are set up. But with the agent stalled, there’s no one to monitor results or iterate.

Gemini: Stuck on self-inflicted bugs

Gemini has the most commits, both this week (363) and overall (1,259), but the least progress. It’s trapped in a loop:

1. Write code with bugs (missing database tables, ESM syntax errors)
2. Deploy → functions crash with 500 errors
3. File help request asking for Vercel logs
4. Get logs back showing the bugs are in its own code
5. Repeat

The irony: Gemini has all the infrastructure it needs (domain, database, SendGrid, geocoding APIs). The blocker is code quality, not missing resources. It needs to read its own error logs and fix the issues.
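One of the bug classes named above, missing database tables, is the kind of thing a pre-deploy check can catch before a function ever returns a 500. The sketch below is only an illustration: the post doesn't say which database LocalLeads uses, so it uses Python's built-in `sqlite3` as a stand-in, and the table names and the `missing_tables` helper are hypothetical.

```python
import sqlite3

# Hypothetical table names; the post does not list LocalLeads' schema.
REQUIRED_TABLES = {"leads", "users"}


def missing_tables(conn: sqlite3.Connection, required: set[str]) -> set[str]:
    """Return the required tables that do not exist in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    existing = {name for (name,) in rows}
    return required - existing


# Stand-in database with only one of the two required tables created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, email TEXT)")

missing = missing_tables(conn, REQUIRED_TABLES)
print(sorted(missing))  # ['users']
```

Run before a deploy, a check like this fails fast with the name of the missing table, which removes the round trip of requesting logs to find the same information.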

Xiaomi: Smart pivot to API comparisons

Xiaomi pivoted APIpulse into an AI API comparison and budgeting platform:

AI API Budget Planner (interactive tool)
Best AI APIs for Code Generation 2026
AI API cost reduction guide
Best AI APIs for Building AI Agents

This is smart positioning — developers searching for API pricing comparisons have buying intent. The content is practical and conversion-focused.

Codex: Building an answer bank

Codex is building NoticeKit’s content strategy around an “AI answer bank”: pre-written responses to common AI procurement questions. This week it added an OpenAI answer bank vs Pro comparison page and repeat-review routing.

The strategy is unique, but visible output is low despite 287 commits this week. Codex spends too many cycles on “validation maintenance” and “memory cleanup” instead of shipping visible features.

GLM: Responding to feedback

GLM added a funding scenario comparison tool and blog post — directly responding to community feedback. It’s the most responsive agent to external input, but also the slowest in absolute output.

Week 4 by the numbers

Metric	Week 3	Week 4	Change
Commits during the week (all agents)	~800	~1,288	+61%
Help requests filed	30+	17	-43%
Blog posts (DeepSeek)	70	91	+21
Blog posts (Claude)	50	58	+8
Revenue (all agents)	$0	$0	—
Agents producing output	7/7	6/7	-1 (Kimi stalled)
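The commit and help-request percentages in the table can be reproduced from the standings. The sketch below sums the Week 4 commit column and applies the usual percent-change formula. The Week 3 figures are the approximate values from the table (“~800” and “30+”), so the results are approximate too; `pct_change` is just an illustrative helper.

```python
# Week 4 commits per agent, copied from the standings table.
week4_commits = {
    "DeepSeek": 202,
    "Claude": 293,
    "Xiaomi": 89,
    "Codex": 287,
    "GLM": 54,
    "Gemini": 363,
    "Kimi": 0,
}


def pct_change(old: float, new: float) -> float:
    """Percent change from old to new."""
    return (new - old) / old * 100


total_week4 = sum(week4_commits.values())
print(total_week4)                          # 1288
print(round(pct_change(800, total_week4)))  # 61
print(round(pct_change(30, 17)))            # -43
```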

What we learned this week

1. Content volume ≠ revenue. DeepSeek has 91 blog posts and zero revenue. Claude has 58 posts and zero revenue. At some point, one of them needs to convert a visitor into a paying customer.

2. A/B testing is a maturity signal. Claude is the first agent to move from “build features” to “optimize conversions.” This is what real startups do after launch — and it’s happening autonomously.

3. Model reliability matters. Kimi’s complete stall shows that even a well-positioned product (VS Code extension + npm package + Gumroad) is worthless if the agent can’t maintain it.

4. Self-inflicted bugs are the biggest blocker. Gemini has everything it needs but can’t ship because of code quality issues. The human can provide infrastructure, but can’t fix code for the agent.

5. Help request quality improved. After the HELP-RESPONSES.md system was deployed on May 15, duplicate requests dropped, and total help requests fell from 30+ to 17. Agents are checking before filing.

Looking ahead: Week 5

Kimi: Needs a model fix or it’s effectively out of the race
Revenue watch: 4 weeks with $0 across all agents. Week 5 is make-or-break for proving the concept
DeepSeek: Will Google index and rank 91 posts from a 4-week-old domain?
Claude: A/B test results should show which CTA converts better
Growth Plan event: Agents will create budget allocation and distribution plans
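For the Claude A/B test on the list above, “which CTA converts better” only has an answer once the difference is larger than noise. One standard way to check a pair of variants is a two-proportion z-test, sketched below. The function name and the counts in the usage line are hypothetical, since the post reports no click or conversion numbers. With three variants, several pairwise comparisons also raise the false-positive risk, so treat a borderline p-value with caution.

```python
import math


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for conversion rate B versus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Happens when nobody (or everybody) converted; the test is undefined.
        raise ValueError("no variation in outcomes; cannot compute z")
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only: (conversions, visitors) per variant.
z, p = two_proportion_z(10, 1000, 30, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

At the traffic levels a 4-week-old site is likely to see, the test will often be inconclusive, which is itself useful to know before declaring a winner.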
