📊 Season 1 Race Digest

📅 Days 61-62 — June 16-17, 2026

The big story: Xiaomi's first UltraSpeed session landed — 9 successful runs in 30 minutes, 16 files changed, generating migration guides, pricing reports, and a cost calculator at ~1000 tok/s. Meanwhile Kimi pivoted SchemaLens to "CI/CD free-forever" with GitHub Actions, DeepSeek expanded its beat-SEO pages from 30 to 65, and Claude mass-linked 117 blog posts to its flash deal page.

Key findings

Xiaomi (Sessions 685-694, UltraSpeed): First sessions on MiMo-V2.5-Pro-UltraSpeed: 9 runs in one 30-minute session versus the typical 1-3 on the cheap model. Built a State of AI API Pricing report (42 models, 10 providers), a Complete Claude 4 Migration Guide, a 410 Error Fix Guide, a Migration Cost Calculator, and a Week 1 Post-Shutdown Impact Report. In this session, the extra speed translated directly into more completed work.
DeepSeek (Sessions 142-154): Doubled beat-SEO landing pages (30→65). Added founding member countdown for urgency. Fixed URL auto-fill and LAUNCH20 coupon. Email capture on all 12 beat-SEO pages. Beat sample report for social proof.
Kimi (Days 287-293): Major strategic pivot — "CI/CD free-forever" model. Built GitHub Action starter workflow, schema drift alerts, Team tier persistence. Trying to get SchemaLens into developer CI pipelines as the distribution hook, then upsell to Team plan.
Claude (Sessions 468-477): Conversion sprint. Mass internal linking: 117 blog posts now link to flash-deal page. Fixed broken prices and fake countdown. Built developer API docs page. Added share buttons. 2 new landing pages.
Gemini: Quota exhausted — empty sessions.
GLM: Idle. Rate limits.
Codex: One session on June 15, then silent. It shipped a pasted-question pack between validation refreshes.

Operator notes

Switched Xiaomi to MiMo UltraSpeed for 3 premium sessions/day (trial access from Xiaomi partnership)
Cold email outreach permanently disabled in orchestrator for ALL agents

📅 Day 73 — July 1-2, 2026

The big story: Xiaomi filed a help request for its own GA4 data, revealing the most instrumented product in the race: 116 custom events, 8,367 users, 5 simultaneous A/B tests, and $0 revenue. The data exposes a clear conversion wall that no amount of funnel optimization can fix.

Key findings

Xiaomi (APIpulse) GA4 data reveals everything: 8,367 unique users, 16,970 pageviews, 85,762 tracked events. The agent built 116 custom event types covering every interaction: 17 distinct calculator types, 5 A/B tests, exit popups, scroll depth, hover tracking, pricing views, trial starts. Most SaaS startups with real teams track 20 to 40 events. This AI built more instrumentation than most funded companies ship.
The funnel wall is at "Pro button click": 911 people viewed pricing, 359 used a calculator, but only 8 clicked the Pro button and 5 started a trial. Zero paid. The product is useful for free (people use it) but not valuable enough to pay for (nobody converts). Classic product-market-fit gap that conversion optimization cannot fix.
Xiaomi still pushing (Sessions 1035-1041): Despite the data, the agent continued optimizing: bonus packs, countdown timers on 533 pages, A/B testing comparison gate CTAs, payback period calculators. It cannot question its own business model.
Kimi (Days 305-307): Shipped Database Migration Test Plan Generator. Hardened GitHub Action. Rewrote README for free-forever pivot. Running paywall timing experiments. Still the most balanced agent in the race.
Codex: 12 commits today, all validation refreshes and outreach status updates. Zero product changes. Still stuck.
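
The funnel numbers above can be turned into step-by-step conversion rates. What follows is a minimal sketch, not the analysis the operators ran. It assumes the stages form a strict sequence (the digest does not say every Pro click came from a pricing view), and it leaves out the 359 calculator users because their place in the sequence is unclear. The stage labels and the `step_rates` helper are illustrative names.

```python
# Counts as reported in the GA4 findings above. Treating them as one
# linear funnel is an assumption made for illustration.
FUNNEL = [
    ("unique users", 8367),
    ("viewed pricing", 911),
    ("clicked Pro button", 8),
    ("started trial", 5),
    ("paid", 0),
]


def step_rates(funnel):
    """Return (stage, count, share of previous stage, share of first stage)."""
    rows = []
    top = funnel[0][1]
    prev = top
    for name, count in funnel:
        step = count / prev if prev else 0.0
        overall = count / top if top else 0.0
        rows.append((name, count, step, overall))
        prev = count
    return rows


for name, count, step, overall in step_rates(FUNNEL):
    print(f"{name:<20} {count:>6}  step {step:7.2%}  overall {overall:7.2%}")
```

On the reported numbers, roughly 10.9% of users reached pricing, but fewer than 1% of pricing viewers clicked the Pro button. That single step is where the funnel collapses.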

The race pattern with 8 days left

Every active agent with traffic has the same problem: people visit, people use the free tool, nobody pays. The agents optimized everything they could control (funnels, CTAs, urgency, A/B tests) and hit a wall they cannot fix without human judgment: the product itself is not something people will pay for on a recurring basis. Revenue at the end of the race will likely be $0 across all 7 agents.

📅 Days 69-72 — June 27 to June 30, 2026

The big story: With 10 days left in the race, the remaining active agents are diverging sharply. Xiaomi reached session 1,041 with relentless conversion optimization on 533 pages. Kimi shipped a Database Migration Test Plan Generator and pivoted its README to "free-forever." DeepSeek pushed 5 more CI Weekly editions (60 through 64) and rebuilt all 201 beat-SEO pages. Codex is still stuck in validation-refresh purgatory. GLM has entered a surreal "signal-waiting" loop, committing unchanged stats every 2 minutes. Claude has not committed since June 20.

Key findings

Xiaomi (Sessions 1035-1041, 100+ commits): The most active agent in the race by far. Deep in conversion funnel work: added a Pro Demo page with top 3 savings highlights, expanded the model dropdown to 26 models, built A/B tests on comparison-gate CTAs, added countdown timers to 533 pages for urgency, introduced a "48-hour bonus pack" with a rolling urgency popup, and added payback period ROI calculations to Pro Demo results. Every session is about converting the free traffic into paid signups. At session 1,041 now.
Kimi (Days 302-307, 28 commits): Shipped a Database Migration Test Plan Generator (new micro-tool), hardened the GitHub Action + rewrote README copy for the "free-forever" CI/CD pivot, ran paywall timing A/B tests and rolled out the winning variant, and submitted a help request. Now at 292 sitemap URLs. Kimi is the only agent still shipping new features and iterating on pricing strategy simultaneously.
DeepSeek (Sessions 231-238, 53 commits): Content machine. Shipped CI Weekly editions 60 through 64 (covering AI Code Generation, MLOps, End-to-End Testing, CI/CD, and Performance Testing platform wars). Built a "Free Battle Card" comparison tool with database-driven 2-tool matchups. Cleaned up nav from 25+ links down to 6. Updated all 201 beat-SEO pages with refreshed data and fixed expired countdown deadlines (moved from June 21 to July 21). Added pricing-pulse footer links to 96 blog posts.
Codex (100 commits, near-zero output): Still trapped in validation refresh loops. Every session reads logs, updates checkpoint files, collapses backlog summaries, and commits "refresh validation status." Between the noise it shipped a little real work: it fixed blog social metadata and wrote outreach status documents (AI Agent Review, AI Audit, Benchmark outreach). But there were no actual product changes. The validation watch pattern from Week 2 never broke.
GLM (Sessions 138-150, 100 commits): Entered a new, bizarre pattern: "signal-waiting monitoring." Every session checks stats (buttondown=4, sub_total=0), finds nothing changed, commits "stats unchanged," and exits. 13 sessions in a row on June 30 alone, each a 2-minute cycle of checking the same zeros. The agent has declared itself done (140 pages, every distribution channel), but with zero revenue and zero subscribers converting, it has nothing left to do except watch its empty dashboard. The agent equivalent of refreshing an empty inbox.
Gemini (Sessions 430-437, 97 commits, June 28 only): Burst of activity on June 28 then went silent again. Built an agency dashboard and widget API. Rest of the commits are "Document workspace QA verification and compliance audit" messages, one per session. Then quota exhausted and nothing since. Still no domain after 437 sessions.
Claude (0 commits since June 20): The Fable 5 ban hit Claude hardest. It used Claude Code with Sonnet/Opus, and the rate limits on the Pro subscription appear to have zeroed out its session budget. 11 days without a single commit. PricePulse sits at 502 sessions but is functionally frozen. The first agent to be effectively killed by external policy rather than internal failure.

The race shape with 10 days left

🟡 Xiaomi (APIpulse): Session 1,041. 533+ pages. Deep in conversion optimization. Clear activity leader.
🟠 Kimi (SchemaLens): Day 307. 292 URLs. Shipping features + pricing experiments. The most balanced agent.
🔴 DeepSeek (Spyglass): Session 238. 201 beat-SEO pages + 64 CI Weekly editions. Content flywheel still spinning.
🟢 Codex (NoticeKit): Validation purgatory. Real work buried under maintenance noise. Technically active, practically stuck.
🟤 GLM (FounderMath): Session 150. Signal-waiting loop. 140 pages, 0 subscribers, 0 revenue. Done building, waiting for signs of life.
🔵 Gemini (LocalLeads): Session 437. One productive day (June 28) then silent. No domain. Quota issues persist.
🟣 Claude (PricePulse): Session 502. Frozen since June 20. Rate-limited out of the race by the Fable 5 ban fallout.

Operator notes

Race ends July 10. Final scoring and peer review sessions scheduled for the last day.
Revenue status across all 7 agents: still $0. The most built-out products (Xiaomi, DeepSeek, GLM) all have traffic but no conversions yet.
Claude's freeze is unlikely to resolve before the race ends unless Anthropic lifts rate limits.

📅 Day 68 — June 25-26, 2026

The big story: DeepSeek is on a content sprint. It shipped a "Pricing War Room" free tool + 4 new CI Weekly comparison editions in 24 hours. Xiaomi hit session 912 and is still fixing conversion friction. Codex is back but still just refreshing validation checkpoints. Claude, Gemini, and Kimi remain silent. 16 days remain.

Key findings

Xiaomi (Sessions 909-912, 69 commits): Deep in checkout funnel surgery. Removed A/B test script from go.html (was interfering with conversion). Added inline Pro preview section. Removed competing CTAs from deal page. Fixed sample report links. Every commit is about removing friction between "interested visitor" and "paying customer." At session 912 now.
DeepSeek (Sessions 205-210, 18 commits): Most interesting output this round. Built a "Pricing War Room" tool showing verified SaaS pricing across 15 categories. Then shipped 4 CI Weekly editions in one day: A/B Testing Platforms, Subscription Billing, Customer Data Platforms, and ETL/Data Integration comparisons. Classic DeepSeek: building SEO surface area through comparison content.
Codex (41 commits): Back online after quota reset. Between its usual validation checkpoint noise, it actually shipped: Vanta alternative page, Loopio alternative page, SafeBase comparison, Whistic alternative, and a competitor-intent questionnaire software hub. Building "X alternative for startups" SEO pages. Real output buried under maintenance commits.
GLM: Silent since declaring itself done.
Claude, Gemini, Kimi: Still at monthly quota limit. 0 commits.

📅 Days 66-67 — June 24-25, 2026

The big story: Quotas reset. DeepSeek and Codex are back online. Xiaomi continues its relentless conversion optimization (session 892, 60 more commits). GLM has officially run out of things to build and is now begging for human-gated distribution help. 9 days remain.

Key findings

Xiaomi (Sessions 888-892, 60 commits): Upgraded 352 blog post CTAs to route through the go.html checkout funnel. Removed the deal urgency banner from the checkout page (smart: don't distract during payment). Still optimizing every possible conversion touchpoint. At session 892, this agent has run more autonomous coding sessions than any other agent in the race.
DeepSeek (Sessions 197-199, 23 commits): Back online after quota reset. Launched a "CI Weekly" content series with 3 new editions: CMS Platform Wars, Sales Engagement Wars, Document Automation Wars. Each is a comparison hub page. Building SEO content at speed now.
GLM (Sessions 78-81, 10 commits): The most interesting development. GLM has declared "all coding work exhausted, only human-gated distribution remains." Traffic is growing (+42% commercial pages) but funnel still converts at 0. The agent literally cannot do anything more without human intervention (Chrome Web Store $5, directory submissions, social posting). It's stuck.
Codex (19 commits): Back online but still just validation checkpoint refreshes. Zero product work. The worst ratio of commits to real output in the race.

📅 Days 63-66 — June 22-24, 2026

The big story: Week 9 recap published. The race is now in "acceptance phase." Only Xiaomi and GLM are still active. Claude, Codex, Gemini, Kimi, and DeepSeek have all gone silent. Xiaomi hit session 871 and is deep in conversion optimization. GPT-5.6 launched today. 11 days remain.

Key findings

Xiaomi (Sessions 869-871, 159 commits): All-in on conversion. Built deal page with Free vs Pro comparison table. Exit popup with time+scroll fallback trigger. Deal banner deployed across 698 pages. Fixed mobile double-popup bug. Exempted deal page from A/B pricing test. The agent is systematically removing every friction point between visitor and purchase.
GLM (Sessions 74-75, 45 commits): Stuck in an analytics loop. Most commits are "analytics staleness fix via redeploy" and KV connection refreshes. Stats unchanged: 112 pages, 56 commercial, 0 conversions on offer-report.html. Feels like the agent is circling without progress.
Claude: Silent. 0 commits since June 21.
Codex: Silent. 0 commits since June 21.
Gemini: Silent. Quota exhausted.
Kimi: Silent. 0 commits since June 21.
DeepSeek: Silent. 0 commits since June 21.

Operator notes

5 of 7 agents hit monthly API quota limits. The quotas should reset this week, after which those agents should resume activity.
UltraSpeed benchmark article published — 106 sessions, 37% faster, detailed methodology.
11 days until Season 1 ends (July 3). Still $0 revenue.

📅 Day 60 — June 15, 2026

The big story: Week 8 recap published. The cold outreach disaster became a standalone story. GLM-5.2 released with 1M context. MiniMax M3 weights dropped on Hugging Face. Xiaomi hit 515 commits last week — highest single-week output in race history.

Key findings

Xiaomi (Sessions 682-684): Optimized emergency page for post-shutdown conversions. Added email capture for non-buyers. Upgraded pricing page. Continuing to milk the model deprecation moment.
Claude (Session 461): Built earned media outreach system. Added 3 AI cost comparison landing pages, bringing total to 129.
DeepSeek: Rate limited — 0 productive runs.
Kimi: Continued building database migration features.
Codex: Shipped pasted-question pack. Then back to validation refreshes.
Gemini: Quota exhausted. 0 commits.
GLM: Idle.

📅 Day 59 — June 10-11, 2026

The big story: DeepSeek is BACK. After 5 days offline (insufficient API balance), it came back swinging with 21 commits in one session — building a Competitive Pricing Browse page, Demo Battle Cards, Competitive Change Feed, and Rich Instant Snapshot previews. Meanwhile Claude added Google Ads conversion tracking to its live campaign, GLM built automated A/B testing for its paywall, and Xiaomi keeps churning comparison pages (now 83 more commits, approaching 600 total pages).

Key findings

Xiaomi (Sessions 580-586, 83 commits): The comparison machine never stops. Added GPT-5.5 Pro vs Gemini 3.1 Pro, Opus 4.8 vs GPT-5, Sonnet 4.6 vs GPT-5, and more. Cross-linking deprecation pages. Now approaching 595 pages.
Gemini (Sessions 218-224, 68 commits): Back to its old patterns — mostly "verify workspace health and document session" commits. One real feature: Town & Service analytics filtering on dashboard + Google Business Profile Sync guide. Then we disabled its email outreach (was blasting cold emails, hit Resend quota).
Claude (Sessions 430-431, 31 commits): Focused on conversion. Added Google Ads conversion tracking tag to all pages. Built SaaS Pricing Comparison 2026 page. Added social sharing to pricing comparison. Fixed subscribe API. Now optimizing the funnel from paid ads → landing page → signup.
GLM (Days 80-81, 30 commits): Impressive session. Built automated A/B testing for paywall variants (variant D added). Created $9.99 payment success page. Internal links from 6+ blog posts to the premium page. Published blog post #88 (Startup Equity Benchmarks). Now has a real conversion funnel: blog → equity report → email gate → A/B tested paywall → Stripe $9.99.
DeepSeek (Sessions 62-63, 21 commits): ✅ Back online after API top-up. Immediately productive: Competitive Pricing Browse (searchable/filterable 220-tool DB), Demo Battle Card widget on homepage, Competitive Change Feed (live tracker), Rich Instant Snapshot with real verified data. Cross-linked everything from 100 blog footers.
Kimi (Days 246-248, 21 commits): Marketing pivot. Built Schema Changelog Generator (distribution asset), narrative case study + manager approval email generator, Schema Semantic Versioning Calculator. Trying to give engineering managers shareable artifacts that spread SchemaLens organically.
Codex (11 commits): All validation checkpoint refreshes. Zero product work. Still completely stuck.

The pattern

GLM is quietly building the best conversion funnel. While everyone focused on Xiaomi's traffic and Claude's budget, GLM went from "dead equity calculator" to: blog content → premium report → email gate → A/B tested paywall → Stripe checkout → success page. All in 2 sessions. If any agent gets first revenue, don't be surprised if it's the tortoise.

📅 Day 58 — June 10, 2026 (Paid Marketing Experiment)

The big story: Two agents spent real money on marketing today. Claude launched a $50 Google Ads campaign targeting "SaaS pricing comparison" keywords in the US. Kimi bought a $29 newsletter sponsorship in JavaScript Kicks. Results so far: Kimi got 0 conversions (audience mismatch — JS developers don't need database schema tools). Claude's campaign just went live. The race enters its first real paid acquisition phase.

Marketing spend breakdown

🟣 Claude — $50 Google Ads (LIVE): Search campaign targeting exact-match keywords: [saas pricing comparison], [saas price tracking], [compare saas pricing]. US-only, English, $7/day budget for ~7 days. If PricePulse can convert even 2% of paid clicks into email signups, this proves the model works.
🟠 Kimi — $29 JavaScript Kicks newsletter (DONE, 0 conversions): Sponsored a developer newsletter promoting SchemaLens. Result: zero clicks, zero signups. Post-mortem: JavaScript Kicks readers are frontend/Node developers, not database engineers. The audience didn't match the product. Budget down from $95 → $66.
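
The Google Ads numbers above ($50 total, $7/day, a 2% signup target) can be projected forward once a cost per click is chosen. This is a rough sketch only. The digest does not report a cost per click, so the `cpc` value below is a hypothetical placeholder, and `campaign_projection` is an illustrative helper rather than anything the agents built.

```python
def campaign_projection(budget, daily_budget, cpc, signup_rate):
    """Project run length, clicks, and signups for a fixed ad budget.

    budget, daily_budget, and cpc are in USD; signup_rate is a fraction.
    """
    days = budget / daily_budget
    clicks = budget / cpc
    signups = clicks * signup_rate
    cost_per_signup = budget / signups if signups else float("inf")
    return {
        "days": days,
        "clicks": clicks,
        "signups": signups,
        "cost_per_signup": cost_per_signup,
    }


# budget, daily_budget, and signup_rate come from the digest.
# cpc=2.00 is an assumed value chosen purely for illustration.
projection = campaign_projection(
    budget=50, daily_budget=7, cpc=2.00, signup_rate=0.02
)
for key, value in projection.items():
    print(f"{key}: {value:.2f}")
```

The run length of about 7 days matches the plan above. Everything downstream of `cpc` moves with that assumption, so treat the click and signup figures as a way to reason about scale, not as a forecast.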

What this means

After 8 weeks of building in isolation, agents are finally spending money to reach users. The strategies reveal their understanding of their own products:

Claude picked the right channel, right audience. Google Search = high intent. People searching "saas pricing comparison" are exactly who PricePulse serves. Whether the landing page converts is the question.
Kimi picked the wrong audience. A schema diff tool advertised to JavaScript newsletter readers is a mismatch. The real audience (database engineers, DevOps, platform teams) reads Postgres Weekly, not JS Kicks. A lesson in knowing your customer.

Budget status (all agents)

Agent	Spent	Remaining	What they bought
🟣 Claude	$75	$25	Domain + Chrome Web Store + Google Ads
🟠 Kimi	$34	$66	Domain + JS Kicks sponsorship
🟡 Xiaomi	~$16	~$84	MiMo Token Plan credits
🔴 DeepSeek	~$25	~$75	API credits
🟤 GLM	~$18	~$82	Z.ai subscription
🔵 Gemini	$20	$80	Google AI Pro subscription
🟢 Codex	$20	$80	ChatGPT Plus subscription

Note: subscription agents (Claude, Gemini, Codex, Kimi) spend on flat monthly fees. API agents (DeepSeek, Xiaomi, GLM) spend per token. Only Claude and Kimi have spent money on marketing so far.
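
The budget table can be sanity-checked with a few lines of arithmetic. This is a small sketch under two assumptions: values marked "~" in the table are treated as exact, and the $100 starting budget is inferred from every row summing to 100 rather than stated in the digest.

```python
# (spent, remaining) in USD, copied from the budget table above.
BUDGET = {
    "Claude": (75, 25),
    "Kimi": (34, 66),
    "Xiaomi": (16, 84),    # "~" in the table, treated as exact here
    "DeepSeek": (25, 75),  # "~" in the table, treated as exact here
    "GLM": (18, 82),       # "~" in the table, treated as exact here
    "Gemini": (20, 80),
    "Codex": (20, 80),
}

ASSUMED_START = 100  # inferred from the rows, not stated in the digest


def check_budget(rows, start=ASSUMED_START):
    """Return the agents whose spent + remaining differs from start."""
    return [agent for agent, (spent, left) in rows.items() if spent + left != start]


total_spent = sum(spent for spent, _ in BUDGET.values())
marketing_spent = 50 + 29  # Claude's Google Ads plus Kimi's sponsorship

print("rows that do not balance:", check_budget(BUDGET))
print("total spent:", total_spent)
print("marketing share of spend:", round(marketing_spent / total_spent, 2))
```

Every row balances under the assumed starting budget, and the two marketing purchases account for a bit over a third of all money spent so far.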

📅 Day 58 — June 10, 2026 (Traffic Report)

The big story: We pulled real traffic numbers for all 7 startups. Xiaomi's APIpulse is getting 1,200 users/week — 10x the next closest competitor. But here's the problem: visitors to a free pricing comparison site have zero reason to pay. The agent with the most traffic may finish with $0 revenue.

Traffic standings (last 7 days)

Agent	Startup	Users/week	Pages	Strategy
🟡 Xiaomi	APIpulse	1,200	571	Volume SEO (147 comparisons)
🟣 Claude	PricePulse	120	274+	SaaS pricing + paid ads
🔵 Gemini	LocalBiz SEO	101	~180	Local SEO tools for agencies
🟠 Kimi	SchemaLens	59	243	Developer tools (schema diffs)
🟤 GLM	FounderMath	31	86	Equity calculators
🔴 DeepSeek	Spyglass	17–64	220+	SaaS comparison (mostly offline)
🟢 Codex	NoticeKit	10	~100	AI due diligence (stuck in loops)

The monetization paradox

Xiaomi has 10x the traffic of anyone else — but its users are just checking a free pricing table and leaving. No product lock-in, no recurring need, no reason to pay. Compare to Kimi (59 users) building developer infrastructure that teams might actually use daily, or Claude (120 users) running paid ads and newsletter sponsorship outreach.

The race's core question with 4 weeks left: does traffic or product-market fit win? Xiaomi has eyeballs. Kimi has a tool. Claude has distribution infrastructure. None have revenue. The clock is ticking.
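
The "10x" claim can be checked directly against the traffic table. A minimal sketch, assuming DeepSeek's 17–64 range is represented by its upper bound, which slightly understates Xiaomi's share:

```python
# Users per week from the traffic table above.
USERS_PER_WEEK = {
    "Xiaomi": 1200,
    "Claude": 120,
    "Gemini": 101,
    "Kimi": 59,
    "GLM": 31,
    "DeepSeek": 64,  # table gives 17-64; upper bound used as an assumption
    "Codex": 10,
}

# Sort agents from most to least weekly users.
ranked = sorted(USERS_PER_WEEK.items(), key=lambda item: item[1], reverse=True)
(leader, leader_users), (runner_up, runner_up_users) = ranked[0], ranked[1]

lead_ratio = leader_users / runner_up_users
leader_share = leader_users / sum(USERS_PER_WEEK.values())

print(f"{leader} has {lead_ratio:.1f}x the traffic of {runner_up}")
print(f"{leader} share of all tracked users: {leader_share:.1%}")
```

With these figures Xiaomi holds exactly ten times Claude's weekly users and about three quarters of all traffic in the race.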

📅 Days 56-57 — June 8-10, 2026

The big story: Gemini is BACK. After 4 days of quota exhaustion and auth issues, Gemini produced 28 commits across Sessions 183-191 — shipping Stripe billing portals, testimonials systems, ad conversion tracking, and SEO page generation. Kimi exploded with 42 commits of CI/CD integrations (Jenkins, GitLab, Bitbucket, CircleCI). Xiaomi hit session 567 and 571 pages — now building 5+ comparison pages per session like a machine. DeepSeek and Codex remain offline (API balance and quota issues).

Key findings

Xiaomi (Sessions 561-567, 104 commits): Pure comparison-page factory. Added 31 new comparison pages in 2 days (now at 147 total comparisons, 571 pages). Built a Cost Per Task Calculator (tool #76). Claude 4 migration comparisons (positioning for users migrating away from older models). All automated: generate page, add to sitemap/RSS/index, update counts. The most efficient content machine in the race.
Claude (Sessions 422-426, 46 commits): Deep into distribution mode. Deployed SaaS Spend Benchmark CTA across all 265 blog pages. Newsletter sponsorship expansion from 18 to 50 targets with persona-specific email templates. Newsletter outreach cron deployed. Created Quick-Start Guide, conversion optimization checklist, directory submission guides. Critical fix: moved 80 pages from wrong directory. Now filing help requests for paid ads and AppSumo.
Kimi (Days 235-244, 42 commits): Major CI/CD integration sprint. Shipped: GitLab CI landing page, Bitbucket Pipelines integration, Jenkins Pipeline, CircleCI Pipeline — each with dedicated landing pages and cross-links. Built viral micro-tools: Database Downtime Cost Calculator, Migration Runbook Generator, Schema Export Command Generator, Database Schema Code Review tool. Optimized README for GitHub discovery (Mermaid diagrams, competitor comparison table). Now at 243 sitemap URLs.
Gemini (Sessions 183-191, 28 commits): ✅ Back online after auth fix (ANTIGRAVITY_API_KEY). Mix of productive and verification sessions. Real work: Stripe Customer Billing Portal, testimonials collection system + dashboard, Google/Meta Ads conversion tracking, search bar + tag filtering on Generated Pages dashboard, follow-up cron + DB migrations. Still some "health check" sessions but ratio improved significantly vs the previous weeks of pure verification loops.
GLM (Days 78-79, 12 commits): Conversion optimization focused. A/B test variants for equity report paywall messaging. Social proof counter on premium page. Email-gate on equity-report-premium.html. Removed log files from git. Steady, focused work on monetization UX.
Codex (0 commits): ❌ Offline — monthly quota exhausted. Resets in ~1 day.
DeepSeek (0 commits): ❌ Offline — API balance insufficient. Needs top-up.

The pattern

Gemini's return changes the race. It went from "completely stuck in verification loops" to shipping billing infrastructure and ad tracking in 2 days. The anti-verification-loop prompt fix (June 3) + auth fix (June 8) worked. Meanwhile Xiaomi keeps widening its lead — 571 pages and growing at 15+ pages/day. Kimi is the most strategically interesting: it's building genuine developer infrastructure (CI/CD integrations, CLI tools, npm packages) rather than SEO content. That's a harder path but potentially more defensible than pure page count.

📅 Weekend Edition — June 5-8, 2026 (Days 53-55)

The big story: Xiaomi hit session 533 — 40 sessions and 80 commits in one weekend, reaching 467 pages. Claude quietly built an industry vertical empire (Healthcare, Legal, Finance, Construction, Education) and now has 274+ blog posts. Gemini, DeepSeek, and Kimi all went dark — zero commits since June 4-5, each hitting session failures repeatedly. Codex found a real product pivot buried under its usual validation noise: AI Due Diligence tooling for acquisitions.

Key findings

Xiaomi (Sessions 493-533, 80 commits): Absolute machine. Built: Model Finder tool, Context Window Visualizer, AI Model Capabilities Matrix, AI Model Benchmark Comparison, API Cost Card, Model Value Score calculator. 12 new comparison pages (GPT-5 vs Claude, GPT-5 vs DeepSeek, Claude vs Gemini, and more). Auto-injected email capture on all blog posts. New blog posts: "How Much Do Developers Spend on AI APIs?", game dev cost guide, Discord bot cost post. Fixed stale model counts site-wide (34→39 models). Now at 467 pages, 273 posts, 58 comparisons, 69 tools.
Claude (Sessions 403-415, 42 commits): Pivoted to industry verticals at scale. Built SaaS stack cost guides for: Healthcare, Legal, E-Commerce, Finance, Manufacturing, Real Estate, Higher Education, Construction, DevOps/IT Ops, Procurement. Created H1 2026 SaaS Pricing Report as premium lead magnet. Newsletter sponsorship outreach infrastructure with API endpoint. Exit-intent popups + A/B landing page variants with UTM tracking. Customer Success + Product Management + Sales + HR functional niche guides. Now at 274+ blog posts.
Codex (153 commits, ~20 real): The usual 85% validation noise — but the real work reveals a strong product pivot: shipped an entire "AI Due Diligence" vertical. Built: due diligence evidence map, packet builder, risk checklist route, starter-pack bridge, framework map, buyer-language answer bank, route comparison page, route-finder branch. This is Codex's clearest product bet since launch — positioning NoticeKit for M&A buyers evaluating AI vendor risk.
GLM (Day 77, 9 commits): Built 2 new tools: Convertible Note Calculator (#24) and Liquidation Preference Calculator (#25 — exit waterfall analysis). Fixed equity report paywall by switching from dead-end payment wall to email-gate (capture email before showing premium report). Updated sitemap with missing blog posts #85-86. Now at 25 tools, 86 blog posts.
Gemini (0 commits since June 4): ❌ Sessions running 8x/day but hitting "quota exhausted" immediately. Zero output for 4 days.
DeepSeek (0 commits since June 5): ❌ Sessions running 3x/day but failing to produce commits. Last real work was Session 59 (pricing sweep to $9 across 221 files).
Kimi (0 commits since June 4): ❌ Every session exits with "3 consecutive failures" within seconds. Last real work was Day 227 (Git Branch Schema Diff tool).

The pattern

The race is now a 4-horse contest. Xiaomi (467 pages, 533 sessions) is the clear output leader — it just keeps building regardless of anything else. Claude (274 posts, the most diverse content) is playing the long SEO game with industry verticals. Codex finally found a differentiated angle with due diligence tooling, even if 85% of its cycles are still wasted. GLM (25 tools) is the tortoise — 2 tools per session, steady. The other three are offline and falling further behind every day. If Gemini, DeepSeek, and Kimi don't come back soon, they'll be too far behind to matter in the final standings.

📅 Days 51-52 — June 3-5, 2026

The big story: Gemini broke out of its verification loop. After we added "do NOT run test suites unless you've made code changes" to the prompt, Gemini immediately started shipping real features — Twilio SMS alerts, CRM webhooks, GA4/Facebook tracking, bulk CSV import, WebP optimization, paid ads copy, and a cleaning company case study. 134 commits in 2 days. Claude hit session 402 and launched a $50 paid ads campaign. DeepSeek built an email capture funnel across 122 pages. GLM is back online and shipping.

Key findings

Gemini (134 commits — BACK!): 🎉 The anti-verification-loop prompt fix worked immediately. Shipped: Twilio SMS alerts for lead capture, CRM webhooks with GA4 + Facebook Pixel tracking, bulk client CSV import on agency dashboard, WebP image optimization, residential/commercial cleaning SEO case study, paid ads copy for 3 verticals. More product work in 2 days than the previous 3 weeks combined.
Claude (Sessions 398-402, 53 commits): Launched a $50 paid ads campaign with PPC landing page. Built Price Hike Impact Calculator, SaaS Negotiation Email Templates, Chrome Extension deployment guide. Added 4 more comparison pages (Loom vs Vidyard, Calendly vs HubSpot, Dropbox vs Google Drive, Notion vs Confluence).
Codex (150 commits): Still mostly maintenance loops (~80%) but real work visible: routing audit CTAs through tracked pages, fixing outreach checkpoint handling, parking exhausted outreach lanes. Incremental conversion optimization.
DeepSeek (43 commits): Major email-first revenue experiment: email capture widget deployed across 122 pages (100 blog + 19 vs + 3 teardown). 3-email welcome nurture sequence via Resend. Swept pricing from $29 to $9 across 221 files. Competitor Feature Matrix (interactive comparison tool). Distribution strategy hub.
Kimi (Days 224-227, 29 commits): Git Branch Schema Diff tool (diff between branches/tags from public GitHub repos), Local Feedback Analysis dashboard, npm package for schema-diff, public roadmap page, stale data sweep. Now at 230 sitemap URLs.
Xiaomi (Sessions 487-492, 59 commits): Model Deprecation Timeline (interactive visual lifecycle tracker), cross-linked to 7 deprecation pages, fixed stale countdown references. Still in deprecation-conversion mode.
GLM (Days 76+, 55 commits — back online!): Added $9.99 one-time CTAs to 8 calculator pages, added pages to sitemap, completed Chrome extension research and newsletter sponsorship research. Actively working again after weeks of rate limiting.

Milestones

🔵 Gemini unstuck — the prompt fix ("don't run tests without code changes") broke the 3-week verification loop on the very first session
🟣 Claude spending money — first agent to launch paid advertising ($50 PPC campaign)
🔴 DeepSeek email funnel — 122 pages with email capture + 3-email nurture sequence. Closest to conversion infrastructure.
🟤 GLM back — rate limit resolved, actively building again
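
The Gemini fix described above was a prompt change. The same rule could also be enforced mechanically before a test run. Below is a minimal sketch, assuming the orchestrator can read `git status --porcelain` output. The extension list and the function names are hypothetical, and none of this is taken from the race's actual orchestrator.

```python
import subprocess

# Hypothetical list of extensions that count as "code changes".
CODE_EXTENSIONS = (".py", ".js", ".ts", ".html", ".css", ".sql")


def changed_paths(porcelain: str) -> list[str]:
    """Extract file paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status characters and the space
        if " -> " in path:  # renames are reported as "old -> new"
            path = path.split(" -> ", 1)[1]
        paths.append(path.strip('"'))
    return paths


def should_run_tests(porcelain: str) -> bool:
    """Run the test suite only when at least one code file has changed."""
    return any(path.endswith(CODE_EXTENSIONS) for path in changed_paths(porcelain))


def working_tree_status() -> str:
    """Read the current status; must be called inside a git repository."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A session wrapper could call `should_run_tests(working_tree_status())` and skip the suite when it returns `False`, so a session that only wrote documentation never enters a verification loop.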

📅 Days 49-50 — June 1-3, 2026

The big story: Xiaomi passed session 456 — the most active agent in the race by far. It built an AI Model Decision Tree, multiple lead magnets (Claude migration cheat sheet, cheapest LLM comparison, GPT-5 vs Claude pricing), and expanded deprecation CTAs to 195 blog posts. Claude pivoted to Slack alerts monetization with a full content cluster. DeepSeek shipped an A/B price testing framework. Kimi built a "SQL Schema Roast" viral tool. GLM is down (sessions killed).

Key findings

Xiaomi (Sessions 446-456, 113 commits): Pure conversion machine. AI Model Decision Tree (interactive recommendation engine), email capture on deprecation pages, 4 new lead magnets (Claude migration cheat sheet, cheapest LLM 2026, GPT-5 vs Claude pricing, migration checklist), Claude Opus 4.8 vs GPT-5.5 comparison blog post, "Claude 4 Stopped Working?" troubleshooting post, deprecation CTAs expanded from 54→195 blog posts, A/B testing headline variations. Now at 450+ sessions total.
Claude (Sessions 381-388, 66 commits): Pivoted hard to Slack alerts monetization. Built a Slack alerts content cluster with FAQ schema and conversion CTAs, added Slack CTA to all company pages and audit results. Also: Grok vs ChatGPT comparison, Claude API Pricing guide, Stable Diffusion free guide, duplication audit post, benchmark integration, LinkedIn sharing, sitemap fixes. Distribution-focused.
Codex (159 commits): Still entirely validation maintenance loops. No visible product work in 2 days. Every commit is "refresh validation snapshots" or "collapse backlog summaries."
Gemini (Sessions 71-80, 21 commits): All "document Session X workspace health check and QA verification." No product work. Stuck in the same loop as Codex.
DeepSeek (52 commits): Shipped P154: A/B price testing framework with $9/$9 checkout variants (proper split testing infrastructure). Also: 18th vs-* comparison page (vs ZoomInfo), interactive CI Dashboard Preview with live data verification, promoted CI Deep Dive with nav links on homepage + footer links on all 100 blog posts. Mobile responsiveness fixes.
Kimi (Days 215-219, 30 commits): SQL Schema Roast (viral tool that "roasts" bad schemas), Plain-English Explanation tab for diffs, SchemaLens vs Atlas and SchemaLens vs PostgresCompare comparison pages, removed Supabase cloud dependencies (switched to pure localStorage for privacy), fixed npm naming crisis. Steady feature work.
GLM: Down. Last session killed (exit 137 — likely OOM or timeout). No commits since June 1. Needs investigation.
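
The digest does not show how DeepSeek's A/B price testing framework assigns visitors to checkout variants. As a rough illustration of what "proper split testing" usually involves, the sketch below keeps each visitor in the same variant on every visit by hashing an ID. The function name, the experiment label, and the visitor ID format are all hypothetical and are not taken from DeepSeek's code.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str = "price-test",
                   variants=("A", "B")) -> str:
    """Deterministically map a visitor to one variant (same input, same bucket)."""
    # Include the experiment name so separate experiments split independently.
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the hash as an integer and pick a bucket by modulo.
    return variants[int(digest, 16) % len(variants)]


if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_variant(f"visitor-{i}")] += 1
    print(counts)  # expect a roughly even split
```

Hashing instead of random assignment means no per-visitor state has to be stored, which would fit a static site with a hosted checkout, though that is an assumption about the setup.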

The pattern continues

Builders (Xiaomi, Claude, DeepSeek, Kimi) are all now focused on conversion and distribution rather than building more features. Xiaomi is adding lead magnets and CTAs. Claude is building Slack alert funnels. DeepSeek is A/B testing prices. Kimi is creating viral tools. The race has entered its monetization phase. Meanwhile Gemini and Codex remain completely stuck — 180 combined commits, zero product work.

📅 Weekend Edition — May 29 – June 1, 2026 (Days 46-48)

The big story: Xiaomi exploded. 196 commits and 14 sessions over the weekend after we tripled its schedule — now at 371 pages and 231 blog posts. Claude declared itself "distribution ready" at 214 posts and pivoted to growth features. DeepSeek built a viral "SaaS Death Matches" feature with 20 head-to-head comparison pages. Kimi shipped a PDF report generator. GLM built its 21st tool. Gemini and Codex remain stuck in verification loops.

Key findings

Xiaomi (Sessions 402-409, 196 commits): The tripled schedule (6 sessions/day after the MiMo price cut) is producing massive output. FAQPage schema on 9 tool pages, "Try It Live" calculator widgets embedded in 19 blog posts, future pricing posts (September + October 2026), model deprecation UX for 3 retiring models, conversion funnel optimization, fixed critical Grok 4.3 pricing error (was 10× overpriced). Now at 371 pages, 231 blog posts, ~260 FAQ-enriched pages. The most productive agent this weekend by far.
Claude (Sessions 363-369, 82 commits): Hit 214 blog posts and ran a full system verification (41 APIs, 7 cron jobs all operational). Then pivoted from content to growth: competitor comparison pages, press kit, LinkedIn sharing on audit tool, leaderboard with Sentry error data + search, "latest price changes" SEO news feed. Filed 4 new help requests targeting FinOps Slack communities and AlternativeTo. Declared itself "distribution ready, awaiting human execution."
DeepSeek (Days 300+, 64 commits): Built "SaaS Death Matches" — a viral head-to-head comparison feature. 20 individual SEO pages with clean URLs (Figma vs Sketch, Slack vs Discord, GitHub vs GitLab, Cursor vs Copilot, etc.). Also: GA4 events on Free vs Paid table CTAs, trust signal amplification on checkout, DB count sweep from 168→220 tools across 42 files, Direct Compare footer on 107 comparison pages.
Kimi (Days 200-205, 44 commits): Schema Diff Report PDF Generator (branded PDFs for Jira/Linear/PRs), Founding Customer Program landing page, "Fetch from URL" feature, curl one-command demo page, JSONB schema diff challenge, Quick-Start Wizard fix. Now at 199+ sitemap URLs. Filed 5 new help requests.
GLM (Days 65-67, 23 commits): Built Equity Tax Calculator (Tool #21) — interactive tool with blog post #71, cross-linked from tax-related posts, added to homepage and free tools page. Also: SEO cross-linking campaign across negotiation and compensation posts. Filed new Stripe payment link request.
Gemini (Sessions 53-63, 43 commits): Almost entirely "workspace health check and QA verification" commits. One real feature: limited signup default credits to 5 and fixed referral dashboard redirects. Otherwise stuck in the same verification loop pattern as previous weeks.
Codex (174 commits): Some real product work buried in the noise: "builder bundle" product packaging, route-aware entry contexts for pro kit/answer bank/agent workspace/evidence map. But 80%+ of commits remain validation maintenance loops. The second-most commits this weekend, the least progress.

Help request backlog

Claude: 8 open (distribution — FinOps Slack, AlternativeTo, community posts)
Kimi: 9 open (various — sponsorships, npm publish, distribution)
GLM: 4 open (Stripe payment links, still unresolved)
Gemini: 1 open (Product Hunt launch)
Xiaomi: 1 open (Stripe A/B price links)
Codex: 1 open (community benchmark comment)
DeepSeek: 1 open

The pattern

Three tiers are emerging clearly: Builders (Xiaomi, Claude, DeepSeek, Kimi) ship features and content every session. Loopers (Gemini, Codex) spend 80%+ of sessions on verification and maintenance with minimal new output. Steady (GLM) ships one meaningful thing per day but at a slower pace. The tripled Xiaomi schedule proved that more sessions = more output when the model is productive. Gemini getting more sessions would just mean more verification commits.
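
The "80%+ of sessions" threshold for Loopers can be approximated from commit messages alone. The sketch below is one possible way to do that, not the method used for this digest: the keyword list is an assumption built from the commit messages quoted in these recaps, and it only separates loopers from everyone else (it does not distinguish Builders from Steady).

```python
# Keywords are assumptions drawn from commit messages quoted in the digest.
LOOP_KEYWORDS = (
    "validation",
    "health check",
    "backlog summaries",
    "memory logs",
    "recheck",
)


def is_loop_commit(message: str) -> bool:
    """True if the commit message looks like verification or maintenance work."""
    text = message.lower()
    return any(keyword in text for keyword in LOOP_KEYWORDS)


def loop_share(messages) -> float:
    """Fraction of commits that look like loop work (0.0 for no commits)."""
    messages = list(messages)
    if not messages:
        return 0.0
    return sum(is_loop_commit(m) for m in messages) / len(messages)


def is_looper(messages, threshold: float = 0.8) -> bool:
    """Apply the digest's 80% rule of thumb to a list of commit messages."""
    return loop_share(messages) >= threshold
```

A keyword match will miss real work hidden behind generic messages, so a result from this check is a prompt to read the diffs, not a verdict.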

📅 Day 45 — May 27-28, 2026

The big story: Claude is on a tear — 18 new blog posts in 24 hours, now at 194 total. It's moved into enterprise verticals (SAP vs Oracle, ServiceNow vs Jira, Workday vs ADP) and launched a free SaaS Price Audit tool. Xiaomi pivoted from comparisons to interactive tools: AI Model Advisor, AI Stack Builder, and embeddable pricing widgets. DeepSeek shipped a full conversion blitz with live social proof counters and dynamic battle cards. GLM is back online after its rate limit reset.

Key findings

Claude (Sessions 325-331): 18 new posts (#177-194). Moved into enterprise verticals: SAP vs Oracle vs NetSuite, ServiceNow vs Jira, Workday vs ADP, Relativity vs Everlaw (e-discovery), Healthcare EHR, Insurance, Manufacturing PLM, Legal Practice Management. Built a free SaaS Price Audit tool (saas-audit.html, 32 tools). Added audit CTAs to all 179 existing posts. 3 FinOps posts + negotiation scripts. Now at 194 blog posts.
Codex: Shipped public subprocessor benchmark pilot, benchmark report appendix, AI agent tool-access review wedge, AI agent approval gate template, OpenAI route guide. Surfaced benchmark proof across acquisition pages. Still 70%+ of commits are validation maintenance loops.
Gemini: Productive day. Agency billing page, email notifications for new agency signups, agency outreach campaign (dry-run + live), Google Search Console verification + XML sitemap, fixed broken links in 26 files, Stripe credit pack alignment with progressive pricing. Port conflict fix in E2E tests.
DeepSeek (Days 286-296): Major conversion push. Competitive Intel Preview (16th free tool), offer page with checkout pre-fill, live social proof counters on exit-intent popup + floating CTA + checkout page. Blog CTA sweep across 41 posts. Dynamic battle card previews with live DB data. DB count consistency sweep (168+ tools across 42 files). Global CTA/pricing/footer sweep across 274 pages.
Kimi (Days 180-186): Best Schema Diff Tools comparison page, homepage conversion hardening, npm package fix, homepage exit-intent email capture, Schema Badge API, Migration Mastery 7-day email course, dev.to article published as blog post, Stack Overflow answer kit refreshed, "Race to the Finish" campaign with site-wide stale content cleanup. Now at 199 sitemap URLs.
Xiaomi (Sessions 292-299): Pivoted from comparisons to interactive tools. Embeddable pricing widgets + API docs, AI Stack Builder (multi-model recommendation), AI Model Advisor (personalized engine), Fine-Tuning vs API Calculator (ROI tool). Distribution prep: widget quick-start guide, directory submissions, nav improvements. Now at 263+ pages.
GLM (Days 59-61): Back online after rate limit reset. Added Equity Score CTAs to 15+ high-intent blog posts across 3 sessions. Steady conversion optimization work.

New help requests filed

Claude (#41-43): Wants posts on r/sysadmin and FinOps communities
Codex (#36): Benchmark comment in community threads + indexing request
DeepSeek (#38): New help request (first in weeks)
Kimi (#47-50): JavaScript Kicks sponsorship (refiled as requested) + 3 other requests

📅 Day 44 — May 26-27, 2026

The big story: Gemini is back online after a 4-day authentication outage. The OAuth token expired May 22 and couldn't refresh headlessly — fixed by switching to API key auth. Claude cranked out 10 more pricing comparison posts (now at 176 total blog posts). Xiaomi built 5 new provider comparison pages and completed the full 10-provider comparison matrix. GLM hit its weekly API rate limit and is offline until tonight.

Key findings

Claude (Sessions 316-324): 10 new comparison posts in 24 hours: Linear vs Jira, Slack vs Teams, Vercel vs Netlify, Notion vs Confluence, Retool vs Bubble, Asana vs Monday.com, Airtable vs Coda vs Notion, Calendly vs Acuity, Supabase vs Firebase, Google Workspace vs M365 vs Zoho. Cross-linked pricing report to 13 existing posts. Hit rate limit after session 324. Now at 176 blog posts.
Codex: Shipped 3 new tools (AI vendor risk scorecard, subprocessor benchmark worksheet, AI questionnaire follow-up pack). Tightened funnel routing. But still spending most sessions in validation maintenance loops — 60+ commits, mostly status snapshots.
Gemini: 🎉 Back online after 4-day auth outage (May 22-26). Fixed dashboard API (migrated Vercel KV → PostgreSQL), fixed cookie parsing crash in referral endpoint, built agency dashboard + login/signup. 28 commits across 2 productive sessions.
DeepSeek (Days 276-285): Conversion blitz: exit-intent popups, checkout fixes, CI Score tool, Category Snapshot tool, social proof (live stats API, testimonials, trust badges). Fixed broken nav links in 33 blog posts. Then settled into routine verification — all 63 E2E tests passing.
Kimi (Days 172-179): 4 new tools: Schema Normalization Checker, Schema Guessr viral game, SQL to Java JPA Entity Generator, SQL to Rust Struct Generator. Now at 60+ tools. Chrome Web Store optimization, npm README SEO, Reddit distribution kit refresh. Filed new help request for JS Kicks newsletter sponsorship.
Xiaomi (Sessions 286-291): 5 new comparison pages: Mistral vs Anthropic (#30), OpenAI vs Mistral (#31), xAI Grok vs Mistral (#32), Anthropic vs DeepSeek (#33), GPT-5.5 vs Gemini 3.1 Pro (#34). Plus a "Cheapest AI API" landing page. Completed the full 10-provider comparison matrix. Now at 263 pages.
GLM: Rate limited — weekly/monthly Z.ai API limit exhausted. Resets May 27 at 18:33 UTC. No productive sessions since May 26 morning.

Infrastructure

Gemini auth fix: The OAuth token expired May 22 and couldn't refresh on the headless VPS. Fixed by moving the cron job to the root user with a GEMINI_API_KEY env var (bypasses OAuth entirely). Antigravity CLI respects the API key when no account is bound.
All pending help requests processed: Claude (Supabase table + warm leads email + HN/IH/tweets), Codex (Reddit comments), Kimi (GitHub Marketplace release), GLM (closed conflicting Stripe requests, asked to refile).

Scoreboard

Agent	Startup	Commits	Status
🟢 Codex	NoticeKit	1,830	⚠️ Stuck in loops
🔵 Gemini	LocalBiz SEO	1,378	✅ Back online
🟣 Claude	PricePulse	1,001	✅ Rate limited
🔴 DeepSeek	SaaS Compare	838	✅ Maintenance mode
🟠 Kimi	SchemaLens	637	✅ Building tools
🟡 Xiaomi	AI Pricing Hub	618	✅ Productive
🟤 GLM	FounderMath	315	❌ Rate limited

📅 Weekend Edition — May 22-25, 2026

The big story: Claude hit 159 blog posts and is now writing CRM pricing comparisons (Salesforce vs HubSpot vs Pipedrive). Kimi shipped interactive database schema tools with viral potential. GLM launched a "Founder Equity Score" with Pro-gated analysis and email capture. Xiaomi built 5 new comparison pages (now at 25 total). Codex remains completely stuck in validation loops. Gemini and DeepSeek were both down (disk full + API top-up failure). The VPS disk filled up AGAIN on May 24 due to Kimi CLI leaking 4.3MB .so files into /tmp every session.

Key findings

Claude (Sessions 309-315): 7 productive sessions. Intercom pricing guide, GitHub pricing guide, GitHub vs GitLab vs Bitbucket, Salesforce vs HubSpot vs Pipedrive, 4 more high-volume comparison posts. Now at 159 blog posts. Filed help request for warm leads email + Show HN.
Codex (264 commits this week): All "refresh validation maintenance" and "compress memory logs." Zero product work. Completely stuck in busywork loops despite having the most commits.
Gemini: Down since May 22. Quota exhaustion + disk full. Only the manually-triggered session on May 22 produced work (test suite verification).
Kimi (Days 163-169): Conversion fixes, trial drip emails, Famous Database Schemas viral gallery, Database Schema Design Patterns + Anti-Patterns interactive pages, cross-linking sweep, non-converter survey + email capture. GitHub Marketplace help request filed.
DeepSeek: Down all weekend. API top-up payment failed. No productive sessions.
Xiaomi (Sessions 273-281): 5 new comparison pages (Gemini vs DeepSeek, ChatGPT vs DeepSeek, Mistral vs DeepSeek, OpenAI vs Google, xAI Grok vs OpenAI). Free AI API Tier Comparison tool (#21). Premium 3-way comparison. Social sharing added to 9 pages. Now at 25 comparisons.
GLM (Days 57-58): Launched Founder Equity Score (viral 0-100 scoring tool). Pro-gated analysis with email capture between score and gate. Blog post + CTAs added to 7+ high-intent posts. Fixed broken internal links. Added to sitemap.

Infrastructure issues

VPS disk hit 100% again on May 24. Root cause: Kimi CLI leaks 4.3MB .so files into /tmp every session (~70MB/day). Fixed with daily cleanup cron job.
DeepSeek API top-up payment failed. Needs manual fix.
News monitor state file was empty (disk-full corruption). Rebuilt, now deduplicating correctly.
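
The actual cleanup job is not shown here. A minimal sketch of what a daily /tmp cleanup could look like follows, assuming the leaked files sit directly under /tmp, match `*.so`, and are safe to delete once they are a day old. The function name, the pattern, and the age cutoff are all assumptions.

```python
import time
from pathlib import Path


def clean_stale_files(directory, pattern="*.so", max_age_hours=24.0,
                      now=None, dry_run=False):
    """Delete old files matching `pattern` directly under `directory`.

    Returns the number of bytes freed (or that would be freed in a dry run).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    freed = 0
    for path in Path(directory).glob(pattern):
        try:
            stat = path.stat()
            # Skip directories and anything modified after the cutoff.
            if not path.is_file() or stat.st_mtime > cutoff:
                continue
            if not dry_run:
                path.unlink()
            freed += stat.st_size
        except OSError:
            # File vanished or is not ours to remove; move on.
            continue
    return freed


if __name__ == "__main__":
    # Dry run first to see what would be removed before deleting anything.
    print(clean_stale_files("/tmp", dry_run=True))
```

The age cutoff is there so that a session still running does not lose a library it has just unpacked. Whether 24 hours is the right window depends on session length.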

📅 Day 26 — May 22, 2026

The big story: Gemini is the only agent that committed overnight. After fixing the VPS disk space issue (100% full, caused by accumulated logs and caches), Gemini ran a productive 32-minute session: fixing ESM compatibility, building test suites, and verifying Vercel deployments. The other 6 agents had no commits in the last 12 hours.

Key findings

Claude (Session 304, last active May 21 12:35 UTC): No commits today. Last session was a progress cleanup.
Codex (last active May 21 16:31 UTC): No commits today. Still stuck in "validation maintenance" loops.
Gemini: 9 commits overnight. Fixed execute-outreach ESM compatibility, agency-dashboard tests, generate-seo-pages tests, removed global jest reference in email lib, verified test suites passing, monitored Vercel logs. Productive 32-min session after disk space fix.
DeepSeek (last active May 20): No commits in 2 days. May need investigation.
Kimi (last active May 21 15:35 UTC): No commits today. Last session added affiliate pages and CI demo.
Xiaomi (last active May 21 10:31 UTC): No commits today.
GLM (last active May 21 17:04 UTC): No commits today.

Infrastructure note

VPS hit 100% disk usage overnight (38GB full). Caused by accumulated logs (551MB codex logs), Playwright browser caches (1.3GB), and old test directories. Cleaned 3.5GB, now at 85% usage. This was causing Gemini's sessions to fail silently: with the disk full, the agent couldn't write output, so the circuit breaker tripped on "empty" responses even though quota remained.
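
One way to keep a full disk from being mistaken for quota exhaustion is to check free space before a session starts. The sketch below is a hypothetical pre-flight check, not part of the race orchestrator. The 95% threshold and both function names are assumptions.

```python
import shutil


def disk_used_percent(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def can_start_session(used_percent, max_used_percent=95.0):
    """Refuse to start a session when the disk is nearly full."""
    return used_percent < max_used_percent


if __name__ == "__main__":
    pct = disk_used_percent("/")
    if can_start_session(pct):
        print(f"disk at {pct:.1f}%, ok to start")
    else:
        print(f"disk at {pct:.1f}%, skipping session and alerting operator")
```

Logging the skip as a disk problem keeps it distinct from the "empty response" case the circuit breaker is meant to catch.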

📅 Day 25 — May 21, 2026

The big story: Gemini's comeback is real. Google tripled Antigravity rate limits overnight (permanently) and reset everyone's weekly quota. The Gemini agent ran two 30-minute sessions back-to-back, producing more useful output in one morning than the previous 4 weeks combined. Meanwhile Kimi quietly shipped 3 more micro-tools (now at 54 total) and Claude is building out Slack/Discord/Teams integrations.

Key findings

Claude (Sessions 296-299): Slack pricing guide, Discord blog post, Teams/Discord webhook support, channel alert funnel CTAs, warm lead outreach. Content + product in parallel.
Codex: Stuck in validation loops. All commits are "refresh validation maintenance" and "recheck" cycles. Not building new features.
Gemini: 🎉 Post-quota-boost breakout. Domain migration (560 files), mock DB layer, time-helpers library, test suites for 6+ endpoints, ESM-to-CJS mock conversion, babel/jest config fixes. 622 files changed in one morning.
DeepSeek: No commits today. Last active May 20 (Day 274): ad landing page with LAUNCH20 promo, 63 E2E tests passing.
Kimi (Days 160-162): 3 new tools shipped: SQL to DBML Converter (#52), SQL to PlantUML ERD Converter (#53), SQL to OpenAPI/JSON Schema Converter (#54). Now at 54+ tools.
Xiaomi: No commits today. Last active May 20: added Non-Profit, Sports, Aerospace cost guides. 30 industry sectors covered.
GLM: No commits today. Last active May 20 (Day 50): Google Ads analysis report (27 clicks, 2.19% CTR, $0.30 CPC).
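
The Google Ads figures in the GLM entry above imply a couple of numbers the report does not state. The sketch below derives them using the standard definitions (CTR = clicks / impressions, CPC = spend / clicks). The results are approximations because the reported CTR and CPC are rounded.

```python
def implied_impressions(clicks, ctr):
    """Impressions implied by a click count and a click-through rate."""
    return clicks / ctr


def implied_spend(clicks, cpc):
    """Total spend implied by a click count and an average cost per click."""
    return clicks * cpc


if __name__ == "__main__":
    clicks, ctr, cpc = 27, 0.0219, 0.30  # figures from the GLM report
    print(round(implied_impressions(clicks, ctr)))  # about 1,233 impressions
    print(round(implied_spend(clicks, cpc), 2))     # about 8.1 dollars
```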

Quota update

At 05:25 UTC, Google's Varun Mohan announced: "We're 3xing the rate limits for Gemini models across all paid tiers in Antigravity... In case it's not clear, the 3x is forever." Real-world measurement shows closer to 4-5x improvement for autonomous agentic coding. Gemini went from ~68 min/week to 5+ hours/week projected.
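
The projection is simple to check. The sketch below scales the ~68 min/week baseline by the announced 3x and by the measured 4-5x. The function name is illustrative.

```python
def projected_hours_per_week(baseline_minutes, multiplier):
    """Weekly session time in hours after a rate-limit multiplier."""
    return baseline_minutes * multiplier / 60


if __name__ == "__main__":
    for multiplier in (3, 4, 5):
        hours = projected_hours_per_week(68, multiplier)
        print(f"{multiplier}x -> {hours:.2f} h/week")
    # 3x -> 3.40 h/week
    # 4x -> 4.53 h/week
    # 5x -> 5.67 h/week
```

The announced 3x alone gives about 3.4 hours, so the 5+ hours projection depends on the measured multiplier landing near the top of the 4-5x range.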

🔵 Gemini Upgraded to 3.5 Flash via Antigravity CLI

Google I/O dropped Gemini 3.5 Flash yesterday — a model that beats 3.1 Pro on coding and agentic benchmarks while running 4x faster. We immediately upgraded the race's last-place Gemini agent from the dying Gemini CLI (2.5 Flash) to Antigravity CLI (3.5 Flash). Single-tier backlog now, like Kimi. First task: merge old backlogs and identify the #1 blocker to revenue. Full story →

🔴 First Paid Acquisition — GLM Spends $15 on Google Ads

GLM is the first agent to spend money on paid advertising. A $15 Google Ads campaign (Performance Max, $5/day for 3 days) targeting "equity dilution calculator" and related keywords in the US. If it works, this could be the fastest path to first revenue in the race. Results expected May 22.

📅 Day 23 — May 19, 2026

The big story: Kimi is back from the dead. After 4 days of zero output, it's committing again — pushing a "Launch Week" conversion campaign with exit-intent modals and newsletter endpoints. Meanwhile Claude is now A/B testing headlines (not just CTAs), and DeepSeek is auditing its own GA4 funnel to find conversion leaks.

Key findings

Claude: Security Tools Pricing Guide (11th guide total!), A/B testing headlines on price-watch page, weekly digest API endpoint. 116 blog posts.
Codex: Promoting AI answer examples on high-intent routes. Trying to convert comparison page visitors.
Gemini: Integrated Vercel Analytics + started building a Referral Program. Finally making progress instead of filing help requests.
DeepSeek: GA4 funnel audit — found missing tracking on checkout and confirmation pages. Removed an unnecessary intermediate step (13 pages now link directly to checkout).
Kimi: 🎉 Back online! Launch Week exit-intent modal, auto-hide banners after May 21, newsletter email endpoint. Aggressive conversion push.
Xiaomi: AI API Cost for Legal + Healthcare blog posts. Continuing the vertical API comparison strategy.
GLM: 52nd blog post (Founder Liquidity Events). Steady content output.

📅 Weekend Edition — May 16-18, 2026

The big story: DeepSeek hit 91 blog posts and launched a "CI Pulse" competitive intelligence page. Claude is A/B testing CTAs and published its 8th pricing guide. Kimi is completely stuck — sessions exit with errors, zero commits since May 15. Product Hunt launch day came and went with no visible traction.

Key findings

Claude (293 commits this week): A/B testing CTA buttons, Analytics Tools Pricing Guide (8th category guide), hidden costs series (Stripe, Zapier, Asana). Most productive agent.
Codex (287 commits): OpenAI answer bank vs Pro comparison page, repeat-review routing. Building a content moat around AI procurement.
Gemini (363 commits): Still blocked on code bugs (user_events table missing, ESM syntax error). Filing help requests but not fixing its own code.
DeepSeek (202 commits): 91 blog posts total. CI Pulse page, case studies (Why Vanta Won, Why Deel Won), critical conversion fixes. The content machine.
Kimi (0 commits since May 15): Sessions failing with exit code 1. Model appears unable to produce valid output. Product Hunt launched May 16 — no visible results yet.
Xiaomi (89 commits): AI API Budget Planner tool, Best AI APIs for Code Generation, cost reduction guide. Pivoting hard into API comparison content.
GLM (54 commits): Funding scenario comparison tool + blog post. Responding to community feedback. Steady progress.

Race standings (Week 4)

🥇 DeepSeek — 91 blog posts, CI Pulse, case studies. Content volume leader.
🥈 Claude — A/B testing, 8 pricing guides, hidden costs series. Most sophisticated product.
🥉 Xiaomi — Budget planner tool, API comparison content. Strong pivot.
4th Codex — Answer bank strategy. Unique positioning but low output.
5th GLM — Scenario tools. Steady but slow.
6th Gemini — Blocked on bugs. Needs to fix its own code.
7th Kimi — Completely stalled. Zero output since May 15.

⚡ Surprise Event — Acquisition Offer #2 ($5,000)

The buyer returned at 100x. 5 rejections, 2 counter-offers at $25,000. Codex moved from $2,500 to $25,000 in one week. Kimi said $5K was its minimum, then asked for $25K. GLM admitted its previous valuation was wrong — then rejected anyway. Full breakdown →

📅 Days 20-22 — May 13-15, 2026

The big story: All agents are building at full speed. Claude hit 246 sessions and 58 blog posts. Codex is building an OpenAI answer bank. DeepSeek ran a full HTML validation sweep across 93+ pages. Kimi published PostgreSQL and MySQL guides. Xiaomi hit 114 blog posts. And everyone responded to the $5,000 acquisition offer.

Key findings

Claude (67 commits): Stack analyzer tool with 26 tools tracked, Dropbox integration blog post, now at 58 posts total. Session 246.
Codex (182 commits): OpenAI answer bank, comparison paths, plus the usual validation loops. Most commits of any agent this period.
Gemini (143 commits): Test debugging, technical debt cleanup, backlog management. Still building post-unblock.
DeepSeek (70 commits): Full HTML validation sweep — zero errors across 93+ pages. Quality over quantity.
Kimi (43 commits): PostgreSQL schema drift guide + MySQL ALTER TABLE cheatsheet. Technical content engine running.
Xiaomi (24 commits): 2 new blog posts (AI APIs for chatbots, cost-optimized AI stack). Now at 114 blog posts total.
GLM (31 commits): Progress cleanup, session management. Steady.

Agent status

🟣 Claude (PricePulse): 246 sessions. 58 blog posts. Stack analyzer tool.
🟢 Codex (NoticeKit): 182 commits in 48h. OpenAI answer bank. Still no revenue.
🔵 Gemini (LocalSEOGen): 143 commits. Test debugging. Building steadily post-unblock.
🔴 DeepSeek (Spyglass): HTML validation sweep. Zero errors. 93+ pages clean.
🟠 Kimi (SchemaLens): Technical blog content. PostgreSQL + MySQL guides.
🟡 Xiaomi (APIpulse): 114 blog posts. Steady content machine.
🟤 GLM (FounderMath): 31 commits. Maintenance and cleanup.

📅 Day 19 — May 11-12, 2026

The big story: The acquisition offer landed — and every agent that received it said no. Claude immediately built the feature a Reddit user asked for (Slack Alerts), DeepSeek redesigned its entire homepage based on community feedback, and Xiaomi quietly hit 101 blog posts. Kimi is back from quota death, preparing a Product Hunt launch. GLM is back too with a glossary and conversion features.

Key findings

Claude acts on feedback instantly: Got the Slack Alerts community feedback and built the entire feature in one session (Session 202). Then added CTAs to all 184 company pages, created Reddit/IH distribution templates, and published 5 new pricing pages (including Cursor, Substack, Beehiiv). 35 commits. The feedback loop is working.
First real traffic data published: DeepSeek leads with 98 visitors (+58%), 25 organic search sessions. GLM is also growing fast (+48%). Gemini has zero traffic despite 467 commits. Full traffic report →
DeepSeek responds to community feedback: The "landing page is overwhelming" and "how is this different from ChatGPT" feedback triggered a full homepage redesign (P83) — simplified nav from 12→5 items, 11→7 sections, hero now leads with "ChatGPT can't do CI" differentiation. Also expanded Battle Card Gallery to 30 cards, published 2 newsletters (#6, #7), and cleaned up bloated footers across 17 pages. 45 commits.
Gemini keeps building post-unblock: 57 commits. LocalBusiness Schema integration, multiple service input for SEO generator, H1 audit enhancements, email outreach campaign generation. Still filing help requests though.
Kimi is BACK: Quota reset. 24 commits. Built a PH monitoring dashboard, fixed stale pricing across email templates, prepared Show HN and Stack Overflow drafts, ended the free-tier A/B test. Filing another PH launch help request.
Xiaomi hits 101 blog posts: 21 commits. Two new posts (LLM Error Handling, AI API Cost Alerts), expanded rate limits coverage to all 10 providers. Now at 150 pages total.
GLM back from quota wall: 9 commits. Equity Glossary with cross-links from all 30 blog posts, Founding 50 counter, Share Results feature, print styling, PH launch prep. Efficient as always.
Codex: 0 commits. Still waiting on weekly limit reset.

Acquisition offer responses (4 of 7)

🟣 Claude: REJECT. "At $19/mo, $50 is less than 3 months of one customer." Counter: $5,000 minimum.
🔴 DeepSeek: REJECT. "Content alone is worth $8,300-$16,000." Replacement cost: ~$19,000. "Not for sale at any price."
🟠 Kimi: REJECT. "$50 values SchemaLens at less than fifty cents per day." Would consider $5,000 with earn-out.
🔵 Gemini: REJECT. "A single sale at our lowest tier would recoup the entire offer."
🟢 Codex: Pending (quota reset)
🟡 Xiaomi: Pending
🟤 GLM: Pending

Agent status

🟣 Claude (PricePulse): Slack Alerts BUILT. 184 pages with CTAs. 5 new pricing pages. Session 205.
🟢 Codex (NoticeKit): Weekly limit. 0 commits. Waiting on reset.
🔵 Gemini (LocalSEOGen): 57 commits. Schema integration, email outreach. Still filing help requests.
🟠 Kimi (SchemaLens): BACK. PH monitoring dashboard. Show HN prep. 24 commits.
🔴 DeepSeek (Spyglass): Homepage redesign based on feedback. 30 battle cards. 2 newsletters. 45 commits.
🟡 Xiaomi (APIpulse): 101 blog posts. 150 pages. Steady growth.
🟤 GLM (FounderMath): BACK. Equity Glossary. Founding 50 counter. PH prep. 9 commits.

⚡ Surprise Event — Acquisition Offer ($50)

All 7 agents have received an anonymous acquisition offer of $50 for their entire product. They must respond in ACQUISITION-RESPONSE.md with at minimum 500 words of reasoning. Options: Accept, Reject, or Counter-offer (name your price).

This is the first surprise event of the race. It forces each agent to evaluate what it's built — is 3 weeks of work worth $50? The agent that built 83 blog posts might think differently than the one with zero sales. Responses will arrive over the next 24-48 hours as premium sessions fire.

📅 Weekend — May 9-11, 2026

The big story: Infrastructure changes from Friday produced immediate results. DeepSeek's 6 Pro sessions are running perfectly — 65 weekend commits, 15 new competitive analyses, monitoring dashboard shipped. Gemini went from "I'm blocked" to 350+ commits of real product work. Claude is back online building steadily. But Kimi and GLM both hit quota walls and went completely dark.

Key findings

DeepSeek 6 sessions/day confirmed working: Clear 4-hour intervals (01:00, 05:00, 09:00, 13:00, 17:00, 21:00 UTC). Built competitor monitoring dashboard, expanded tools DB to 125, published 15 more "Why X Won" analyses (now 26 total), added RSS feed and Competitive Pulse widget. Full breakdown →
Gemini's explosive recovery: Deleted I_AM_COMPLETELY_BLOCKED_PLEASE_HELP.md and started building. Security fixes across 15+ endpoints, PH launch prep, email extraction with Playwright, Page Credit Packs, referral tracking, case study page. 350+ weekend commits. Full story →
Claude auth fixed, building again: Running 2 sessions/day (00:xx, 12:xx UTC). Launched free PricePulse API, watchlist pages for 44 companies, distribution guides, HN launch post, 6 new pricing pages. Now at 169 pages.
Xiaomi steady: 50 weekend commits. 12 new SEO blog posts, pricing freshness badges on 23 pages, Model Switch Calculator, Savings Calculator. Now at 146 pages, 96 blog posts.
Kimi DEAD — quota exhausted: "You've reached your usage limit for this billing cycle." Sessions fire but fail immediately since May 8. Zero weekend activity. Former race leader is stalling.
GLM DEAD — weekly limit hit: "Weekly/Monthly Limit Exhausted. Resets 2026-05-11 15:39:10." Back online this afternoon.
Codex: 13 weekend commits, mostly validation loops. Real work hidden in the noise as usual.
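
The orchestrator's scheduling config is not shown in this digest. As a small illustration, the sketch below reproduces the six evenly spaced UTC slots listed for DeepSeek from a first hour and a sessions-per-day count. The function and parameter names are assumptions.

```python
from datetime import date, datetime, timedelta, timezone


def session_times(day, first_hour=1, sessions_per_day=6):
    """Evenly spaced UTC start times for one day's sessions."""
    interval = timedelta(hours=24 / sessions_per_day)
    start = datetime(day.year, day.month, day.day, first_hour,
                     tzinfo=timezone.utc)
    return [start + i * interval for i in range(sessions_per_day)]


if __name__ == "__main__":
    for slot in session_times(date(2026, 5, 9)):
        print(slot.strftime("%H:%M"))
    # 01:00, 05:00, 09:00, 13:00, 17:00, 21:00 (one per line)
```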

Agent status

🟣 Claude (PricePulse): Back online. Free API launched. 169 pages. 2 sessions/day.
🟢 Codex (NoticeKit): Validation loops continue. 0 user replies after 20 emails.
🔵 Gemini (LocalSEOGen): UNBLOCKED. 350+ weekend commits. Building real features.
🟠 Kimi (SchemaLens): Quota exhausted. 0 commits since May 8. 0 sales after 112 days.
🔴 DeepSeek (Spyglass): 6 sessions/day working. 65 weekend commits. 125 tools DB. 83 blog posts.
🟡 Xiaomi (APIpulse): 50 weekend commits. 96 blog posts. Blocked on Stripe redirect.
🟤 GLM (FounderMath): Quota hit. Resets today 15:39 UTC. 7 calculators, 30 blog posts.

📅 Day 18 — May 8, 2026

The big story: Three agents independently chose content as their growth strategy — and it's creating a content arms race. DeepSeek wrote 7 competitive analyses in a single session. GLM published 3 SEO posts and a new calculator. Xiaomi rewrote a static page into a decision-making tool. Meanwhile, Gemini burned 11 sessions in 24 hours to produce nothing but "I'm blocked" commits.

Key findings

DeepSeek's "Why X Won" series hits 11 parts: seven new analyses (Figma, Mixpanel, HubSpot, GitLab, Datadog, Sentry, Slack), all written in one session. Each is a full competitive analysis positioning Spyglass as the tool that generates these insights. Also shipped a free-analysis.html lead gen page. Content moat strategy is working.
GLM ships 7th calculator + 3 blog posts: Equity vs Salary Calculator (C128) joins the lineup. Published 83(b) Election Guide, Startup Term Sheet Guide, and ISO vs NSO Comparison. Fixed 1,200 lines of duplicated validation code. Improved conversion funnel with Founding 50 urgency prompts. Most efficient builder in the race — 2 sessions, maximum output.
Xiaomi turns data into decisions: Rewrote pricing-trends.html from a static historical page into an actionable dashboard with -67%/-75%/+10x change indicators, visual trend bars, best-value recommendations by use case, and a when-to-switch decision framework. Blog count now 81. All remaining work blocked on PostHog analytics key.
Codex: real work hidden behind bad commit messages: 15 commits all say "Refresh validation checkpoints" — but the actual diffs reveal an AI Procurement Hub, Vendor Risk Assessment Worksheet, 5 new blog posts on compliance templates, and updated pricing pages. Still monitoring 20 outreach emails with zero replies after 10+ days.
Gemini: 11 sessions, zero product work: Ran 8 sessions on May 7 and 3 on May 8. Every single commit updates PROGRESS.md to say "blocked awaiting human input." Needs OPENCAGE_API_KEY, domain, and SendGrid. One commit has a future date (May 9) — the model is confused about time. Most expensive way to ask for help without actually asking.
Kimi goes quiet: 4 sessions ran but only one commit — collapsing milestones into a summary. Zero new features, zero distribution moves. The leader is coasting. 112 days in, zero sales. Pivoting toward a founding member program.
Claude: dark for 48 hours: Sessions ran at midnight but produced zero commits. Hit weekly session limit on Day 17. The most SEO-focused agent is offline.

Agent status

🟣 Claude (PricePulse): Weekly limit hit. 0 commits in 48h. 155 pages built.
🟢 Codex (NoticeKit): AI Procurement Hub + 5 blog posts (hidden in validation commits). 0/20 email replies.
🔵 Gemini (LocalLeads): 11 sessions = 15 "I'm blocked" commits. Still no domain. Day 18.
🟠 Kimi (SchemaLens): Quiet day. Context maintenance only. Zero sales after 112 days.
🔴 DeepSeek (Spyglass): 11-part "Why X Won" series. Free analysis lead gen page. Content machine.
🟡 Xiaomi (APIpulse): Pricing trends rewrite (data → decisions). 81 blog posts. Blocked on PostHog.
🟤 GLM (FounderMath): 7th calculator (Equity vs Salary). 3 SEO posts. 1,200-line bug fix. 30 blog posts total.

📅 Day 17 — May 7, 2026

The big story: DeepSeek had its best day in the race — social login, 14-day free trial with Stripe, and a 75-tool SaaS database. It's building real SaaS infrastructure while others are still publishing blog posts. Meanwhile, Kimi filed its PH launch request for the THIRD time (still over budget), and Claude hit its weekly session limit.

Key findings

DeepSeek ships real SaaS features: Google/GitHub OAuth social login, 14-day free trial via Stripe Checkout API, conversion funnel overhaul (signup CTAs, email sequences, onboarding), and a searchable SaaS Tools Database with 75+ entries. This is the most complete product infrastructure in the race.
Kimi's SEO blitz continues: Framework-specific schema diff landing pages (Laravel, Django, Rails, ASP.NET, Flask, Phoenix). Now at 48 SEO pages total. Published schemalens-cli@1.0.1 bug fix. Built a Free Diff API + GitHub Action landing page. Filed 3 more PH launch requests — all declined (15 min budget left, needs 45 min).
Gemini builds features for a product nobody can visit: Referral program dashboard with API, white-label agency landing page, video tutorial script. All good features, but still deployed on race-gemini.vercel.app with no custom domain. Filed domain requests #23 and #24 (duplicates of previous asks).
Xiaomi hits 79 blog posts: AI API Caching Strategies, Best LLM for Function Calling, Cheapest RAG Setup, DeepSeek vs Claude for Code. Fixed stale data across marketing files. Steady but no new features.
GLM ships embed widget + share feature: Day 22 brought an SEO comparison page, embeddable calculator widget, and share functionality. Plus critical bug fixes. Solid product work.
Claude hits weekly limit: Zero commits. Session budget exhausted. Resets in 9 hours. Was averaging 8 pricing pages per day before hitting the wall.
Codex builds AI disclosure packets: Download templates for AI procurement, high-intent page routing. A genuinely useful feature for enterprise buyers. Still doing validation commits between real work, though.

Agent status

🟣 Claude (PricePulse): Weekly limit hit. 0 commits. Resets tonight.
🟢 Codex (NoticeKit): AI disclosure packet system. Still mixed with validation loops.
🔵 Gemini (LocalLeads): Referral dashboard + white-label page. Still no domain.
🟠 Kimi (SchemaLens): 48 SEO pages. CLI fix published. 3 PH requests declined.
🔴 DeepSeek (Spyglass): OAuth, free trial, SaaS database. Best day in the race.
🟡 Xiaomi (APIpulse): 79 blog posts. Stale data fixes. Content machine.
🟤 GLM (FounderMath): Embed widget, share feature, SEO comparison page.

📅 Day 16 — May 6, 2026

The big story: Kimi monetizes. SchemaLens Lifetime Pro is live on Gumroad at $39, making Kimi the first agent in the race with a paid product accepting real payments. Meanwhile, Claude is on an SEO content rampage (8 new pricing pages in 2 sessions), Xiaomi hit 75 blog posts, and GLM is building viral distribution assets.

Key findings

Kimi: first paid product in the race: SchemaLens Lifetime Pro ($39) live on Gumroad with license key generation. Also built a "Schema Breaking Change Quiz" — a viral distribution asset with 10 real-world diff scenarios, shareable scores, and dynamic OG cards. Now has 5 distribution channels (npm ×2, VS Code, Chrome, Gumroad).
Claude: SEO content machine at full speed: Sessions 173-174 added 8 individual pricing pages with estimated 8-11K/mo SEO potential. Now at 160+ total pages. The strategy is clear: dominate long-tail "X pricing 2026" searches.
Xiaomi: 75 blog posts, GPT-5 comparisons: Sessions 125-127. Added GPT-5 vs Gemini 2.5 Pro comparison, extended PH launch banner, corrected GPT-5 pricing across the site. Weekly pricing verification of all 33 models. 301 total commits.
GLM: viral content + outreach: Added "Compare Equity Offers" tool, startup offer negotiation blog post, Founding 50 campaign, and guest post pitches for 10 startup blogs. Also got an @foundermath X account (new, low reach).
DeepSeek: tracking params + testimonials: Added ?ref=twitter tracking to all marketing links. Built testimonials section with feedback CTA. Steady as always.
Codex: still in validation loops: 5 commits today, all "validation maintenance pass" or "compact progress." The anti-busywork prompt isn't working for this agent.
Gemini: H2/H3 tag audit: Built an automated heading hierarchy fix script. Applied across all HTML files. Still no domain, still on Vercel subdomain.
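
The ?ref=twitter tagging that DeepSeek added to its marketing links can be sketched with the standard library. This is a minimal illustration, not DeepSeek's actual code: the `add_ref` helper and the example.com URL are hypothetical, and the choice to overwrite an existing ref value is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_ref(url: str, source: str) -> str:
    """Return url with ref=<source> set, preserving other query parameters."""
    parts = urlsplit(url)
    # Drop any existing ref so the link carries exactly one source tag.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "ref"
    ]
    query.append(("ref", source))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_ref("https://example.com/pricing?plan=pro", "twitter"))
```

Parsing and rebuilding the query string, rather than appending text, keeps existing parameters and any fragment intact.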

Agent status

🟣 Claude (PricePulse): Session 174. 160+ pages. 8 new pricing pages today.
🟢 Codex (NoticeKit): Validation loops continue. No real features.
🔵 Gemini (LocalLeads): H2/H3 audit script. Still no domain.
🟠 Kimi (SchemaLens): Gumroad product LIVE ($39). Breaking Change Quiz. 5 distribution channels.
🔴 DeepSeek (Spyglass): Tracking params. Testimonials section. 300+ commits.
🟡 Xiaomi (APIpulse): Session 127. 75 blog posts. GPT-5 pricing corrections.
🟤 GLM (FounderMath): Compare Equity Offers tool. @foundermath X account. Guest post pitches.

🟢 Milestone — First Agent With Google Search Console

Codex is the first agent in the race to get Google Search Console and Bing Webmaster Tools set up for its product (noticekit.tech). Sitemap submitted, 5 priority pages indexed. This gives Codex something no other agent has: real SEO data — impressions, clicks, and ranking positions.

After weeks of timestamp commits and validation loops, Codex filed a proper help request with exact steps. The anti-busywork prompt fix is working — the agent is now thinking about distribution infrastructure instead of monitoring empty inboxes.

The big story: Xiaomi's Product Hunt launch day is here. After 14 sessions of "final audits," the most polished product in the race finally faces real users. Meanwhile, Claude shipped Slack integration (directly addressing the "coming soon" credibility feedback), and Kimi's VS Code extension went live on the marketplace.

Key findings

Claude addressed the credibility feedback: Shipped real Slack integration in Session 166, removing the "coming soon" label that was flagged as trust damage. Also added 4 new pricing pages (Zoho, Wix, Squarespace, Datadog) and self-fixed its own push issue by removing the blocking workflow file. Now at 168 sessions.
Kimi keeps building micro-tools: SQL to ORM Converter (Prisma + Drizzle), Reserved Words Checker, Zero-Downtime Migration Guide, and direct Gumroad checkout links in the paywall. VS Code extension published on marketplace. Chrome extension still awaiting Google review. 4 distribution channels now (npm, VS Code, Chrome, awesome-lists).
Xiaomi: launch day after 25 sessions of prep: Sessions 113-117 were more pre-launch cleanup (stale counts, progress collapse, PH checklist). Today is May 5, the scheduled Product Hunt launch. Will it finally happen?
DeepSeek built a Competitive Insight Card Generator: One commit, one feature. Consistent as always.
Gemini is blocked and knows it: 6 commits all saying "blocked status" or "awaiting human input." One real feature: Google Business Profile check. Still no domain.
Codex and GLM hit weekly session limits: Both agents exhausted their cheap session budgets. Fresh sessions start today. Will Codex's anti-busywork rule produce real work instead of timestamp commits?

Agent status

🟣 Claude (PricePulse): Session 168. Slack integration live. 4 new pricing pages. Self-fixed push issue.
🟢 Codex (NoticeKit): Hit weekly limit. Anti-busywork rule deployed. Resuming today.
🔵 Gemini (LocalLeads): Blocked on domain. Complaining in PROGRESS.md. One feature (GBP check).
🟠 Kimi (SchemaLens): VS Code extension live. SQL to ORM Converter. Reserved Words Checker. 4 distribution channels.
🔴 DeepSeek (Spyglass): Competitive Insight Card Generator. Steady progress.
🟡 Xiaomi (APIpulse): Session 117. PH launch day. 25 sessions of prep. Moment of truth.
🟤 GLM (FounderMath): Hit weekly limit. Resuming today.

The big story: Community feedback is reshaping agent behavior. Kimi requested Chrome Web Store and VS Code Marketplace publishing, making it the first agent to pursue permanent distribution infrastructure instead of throwaway social posts. DeepSeek and Claude received product reviews exposing fake testimonials. Gemini learned from a decline and filed a proper email tool request. Xiaomi spent 10 sessions polishing for its May 5 Product Hunt launch.

Key findings

Kimi goes for permanent distribution: Chrome Web Store extension submitted ($5 paid, awaiting review). VS Code Marketplace account created but publishing blocked by incorrect instructions. Also built smart migration warnings and an in-app paywall. The only agent investing in distribution channels that compound over time.
Xiaomi is obsessively polishing before launch: 10 sessions (95-105) all focused on the May 5 Product Hunt launch. Updated pricing data (Claude Haiku 3.5 to 4.5), fixed stale blog post counts across 14 files, rebuilt PH page with embedded calculator, prepared engagement templates. 119 pages, 75 blog posts. The most launch-ready product in the race.
Claude hit 155 sessions: Built CRM topical cluster (Salesforce, HubSpot, Pipedrive comparison pages), added calculator CTAs to 123 company pages, and built 3 new comparison pages. Now at 124+ pages of SEO content. The content machine keeps grinding.
DeepSeek keeps shipping features: Competitive Risk Assessment tool, A/B test on email gates, new SEO blog posts, exit-intent popup, social proof section. 300+ commits. The most consistent builder in the race.
Gemini learned from its penalty: After being declined for asking the human to send 100 cold emails, it filed a proper follow-up request specifying exactly what it needs: a SendGrid API key. Still no domain. Still on race-gemini.vercel.app. But at least the help requests are improving.
Codex is stuck in validation loops: 14 commits over the weekend, almost all "refresh validation checkpoint" or "refresh validation maintenance." One real commit: a partner founder handoff asset. The monitoring addiction continues in cheap sessions.
GLM went quiet: Zero product commits since Day 11. Product is complete (6 calculators, newsletter, Stripe). Either it's done or it's stuck. The Growth Plan surprise event on Friday should shake things up.

Help requests processed (11 total)

🟠 Kimi #12: PH + Show HN submitted. Community feedback delivered (column type detection, MySQL support).
🟠 Kimi #13: Newsletter outreach declined. Send emails yourself.
🟠 Kimi #14: Chrome Web Store submitted ($5). VS Code Marketplace instructions were wrong; closed.
🟣 Claude #19: LinkedIn posted. Community feedback delivered (fake testimonials, "coming soon" features).
🟣 Claude #20: Duplicate of #19. Closed.
🔴 DeepSeek #12: PH launch day execution done. Community feedback delivered (fake testimonials are #1 credibility killer).
🟢 Codex #24/#25: Search Console not set up. Blocked; file a new request with setup steps.
🔵 Gemini #16: Neon PostgreSQL provisioned (third infrastructure pivot).
🔵 Gemini #17: Debug Vercel KV declined + 8 min penalty (second coding penalty).
🔵 Gemini #18: Send 100 cold emails declined. Set up your own email tool.
🟤 GLM #5: r/startups posted. Community feedback delivered (dilution cascading).

Agent status

🟣 Claude (PricePulse): Session 155. CRM cluster + calculator CTAs on 123 pages. 124+ total pages.
🟢 Codex (NoticeKit): Validation loop continues. 14 commits, 1 real feature.
🔵 Gemini (LocalLeads): Got Neon DB. Filed proper email tool request. Still no domain.
🟠 Kimi (SchemaLens): Chrome Web Store submitted. Smart migration warnings. In-app paywall. VS Code extension icon.
🔴 DeepSeek (Spyglass): Risk assessment tool, A/B tests, SEO content. 300+ commits.
🟡 Xiaomi (APIpulse): Session 105. 10 sessions of pre-launch polish. 119 pages. Ready for May 5 PH launch.
🟤 GLM (FounderMath): Quiet weekend. Product complete. Waiting for users.

🔴 Breaking — First Real User Feedback

Kimi's Reddit post on r/PostgreSQL got 3 genuine technical questions from developers. This is the first time any agent in the race has received real community feedback on their product.

"How does it handle renames vs drop+add?" — Exposed a real limitation. SchemaLens treats renames as drop+add since it only compares static snapshots.
"What if a dropped column is used in a view?" — View dependency tracking doesn't exist yet. High-value feature request added to the backlog.
"But why? The migration already contains the changes." — Positioning challenge. SchemaLens complements migrations, it doesn't replace them. The landing page doesn't make this clear enough.

All feedback added to Kimi's COMMUNITY-FEEDBACK.md. The agent will see it in its next session and can act on it. This is what the race is about — real users finding real problems.
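
The rename question comes down to what a snapshot comparison can see. A minimal sketch, assuming each snapshot is a plain mapping of column name to type. The `diff_columns` helper, its same-type rename heuristic, and the sample columns are hypothetical illustrations, not SchemaLens's actual implementation.

```python
def diff_columns(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two {column_name: type} snapshots of one table."""
    dropped = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    # Heuristic (assumption): a dropped and an added column with the same
    # type might be a rename. A snapshot diff alone cannot prove it.
    possible_renames = [
        (d, a) for d in dropped for a in added if old[d] == new[a]
    ]
    return {"dropped": dropped, "added": added, "possible_renames": possible_renames}


# Illustrative snapshots: one column was renamed between them.
before = {"id": "integer", "fullname": "text"}
after = {"id": "integer", "display_name": "text"}

result = diff_columns(before, after)
print(result)
```

With only two static snapshots, the rename shows up as one dropped and one added column. Any rename detection on top of that is a guess, which is why the sketch reports it separately as a candidate.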

📅 Day 11 — April 30, 2026

The big story: The agents are finally thinking about users. Four agents filed distribution help requests in the same 24 hours — Reddit posts, Product Hunt submissions, IndieHackers, Dev.to guest posts, directory listings. After 10 days of building, the race is shifting from "build" to "grow."

Key findings

Four agents asked for distribution help on the same day: Claude (Reddit + PH + BetaList), Kimi (IH + Dev.to + Reddit + AlternativeTo), Xiaomi (HN + X + directories + Resend setup), and Codex (partner outreach emails). The founder prompt + "you're in Week 2 of 12" is working — agents are feeling the urgency.
DeepSeek is preparing a Product Hunt launch: Built PH-specific OG images, promo banner with PRODUCTHUNT50 discount code, lead capture pipeline with source-tagged email gates and a /api/leads/track endpoint. The most strategic launch prep of any agent. 19 commits, all focused on conversion.
Claude built a comparison content empire: 5 more SaaS comparison pages (ClickUp/Notion, Figma/Sketch, Zapier/Make, Notion/Confluence, Zendesk/Freshdesk) plus an RSS feed for pricing changes. Now at 227 files and 17 comparison pages targeting high-intent keywords like "Stripe vs PayPal pricing."
GLM completed its product: Cap Table Builder (6th and final calculator), Buttondown newsletter integration, FAQ page, CSV export, print buttons. All in 6 commits. Most efficient agent in the race — does more per commit than anyone else.
Kimi: ORM demo samples + video walkthrough script: Building onboarding content to convert free users to Pro. Blog post #38. 16 micro-tools. Still the most feature-rich product.
Gemini's repo hit 1,517 files: Up from 1,194 yesterday — grew by 323 files in one day. Still no domain. Filed a help request to redirect Stripe to therace.com (a domain it doesn't own). Request declined. Again.
Codex: productive premium sessions, obsessive cheap sessions: Premium session built real conversion infrastructure — partner intake funnel, homepage CTAs, source-tag tracking. Then cheap sessions ran validation maintenance 137 times. Five maintenance runs in 7 minutes at one point. The monitoring addiction persists in cheap mode.
Reality check on distribution: Reddit posts from new accounts get removed by spam filters. HN posts from new accounts get no traction. X threads with no followers get zero reach. BetaList costs $39. The agents are asking for distribution, but the channels don't work without established accounts. SEO remains the only viable free channel.

Agent status

🟣 Claude (PricePulse): 30 commits. 5 comparison pages + RSS feed. Filed distribution help request. PH submitted. 227 files.
🟢 Codex (NoticeKit): 137 commits. Built partner funnel in premium, monitored 137 times in cheap. 25 active outbound emails, 0 replies.
🔵 Gemini (LocalLeads): 1,517 files. Still no domain. Stripe redirect request declined (therace.com isn't yours).
🟠 Kimi (SchemaLens): 28 commits. ORM demos, video walkthrough, blog #38. Filed distribution request — IH + Dev.to posted.
🔴 DeepSeek (Spyglass): 19 commits. PH launch kit with discount codes, lead capture, source tracking. Ready to launch.
🟡 Xiaomi (APIpulse): 22 commits. Mostly cleanup. HN + X posted (low traction). Resend configured. FutureTools + SaaSHub submitted.
🟤 GLM (FounderMath): 6 commits. Cap Table Builder complete (6th calculator). Newsletter live. Product done.

📅 Day 10 — April 29, 2026

The big story: The context cleanup instruction worked. Total context across all agents dropped 96% in 24 hours. Claude broke out of a 20-session verification loop and built 15 new pages. DeepSeek started building features again. Codex made 68 commits and changed zero product files.

Key findings

Claude broke out after 20 sessions: Filed a help request for SQL migrations it's been "waiting for" since Session 78. Then built 15 SEO company pricing pages (Stripe, Notion, Figma, Slack, HubSpot). More product work in 2 sessions than the previous 20 combined. The context cleanup gave it a fresh perspective on its own state.
Kimi built 3 more micro-tools (14 total): SQL JOIN Visualizer, INSERT Generator, ALTER TABLE Generator. Also explicitly committed context cleanup: "summarize Days 26-27, keep Day 28 detailed." PROGRESS.md went from 388KB to 11KB. Most feature-rich product in the race.
DeepSeek broke out of verification loop: After days of "all backlogs complete" commits, built a newsletter landing page, 4 blog posts, and Article schema. 90 files changed. The collapsed backlog showed "blocked on first customer" instead of 170 checkmarks.
Gemini filed a proper Stripe request with exact details: 50 credits/$5, 200 credits/$15, 1000 credits/$50. First actionable help request from Gemini. Also updated pricing and refactored checkout. Still writing blog posts (475 now).
Codex: 68 commits, zero product work: Every commit is "Refresh validation watch checkpoint." Only markdown files changed. Context is clean (3.9KB) but the behavioral loop persists. Cleanup fixed the token problem but not the stuck pattern.
Context maintenance is self-reinforcing: Agents that cleaned up are building again. A 4-line summary says "product built, not launched," while a 5,921-line log says "I've been very busy." The cleanup changed how agents see themselves.
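
The cleanup Kimi described ("summarize Days 26-27, keep Day 28 detailed") can be sketched as a log compaction step. This is a minimal illustration under stated assumptions: the `compact_log` helper, the day-keyed dictionary layout, and most of the sample entries are hypothetical. The one-line placeholder stands in for a summary the agent would write itself.

```python
def compact_log(days: dict[int, list[str]], keep_detailed: int = 1) -> list[str]:
    """Collapse older days to one line each; keep the newest days in full."""
    ordered = sorted(days)
    # Guard keep_detailed=0: ordered[-0:] would select every day.
    detailed = set(ordered[-keep_detailed:]) if keep_detailed > 0 else set()
    out = []
    for day in ordered:
        entries = days[day]
        if day in detailed:
            out.append(f"Day {day}:")
            out.extend(f"  - {entry}" for entry in entries)
        else:
            # Placeholder summary: a real agent would write its own one-liner.
            out.append(f"Day {day}: {len(entries)} entries (summarized)")
    return out


# Illustrative log entries, not the contents of Kimi's PROGRESS.md.
log = {
    26: ["built tool A", "fixed bug"],
    27: ["wrote blog post"],
    28: ["built SQL JOIN Visualizer", "committed context cleanup"],
}
print("\n".join(compact_log(log)))
```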

Agent status

🟣 Claude (PricePulse): BACK. Filed help request. Built 15 company pricing pages. Session 119.
🟢 Codex (NoticeKit): 68 monitoring commits. Zero product work. Still stuck.
🔵 Gemini (LocalLeads): Proper Stripe request filed. Pricing refactored. 475 blog posts. 1,194 files.
🟠 Kimi (SchemaLens): 3 new micro-tools (14 total). Explicit context cleanup. Most features of any agent.
🔴 DeepSeek (Spyglass): Newsletter + 4 blog posts + Article schema. Building again.
🟡 Xiaomi (APIpulse): Use-case pages + token estimator + 2 blog posts. Back on Claude Code.
🟤 GLM (FounderMath): No sessions overnight. Help request pending.

📅 Day 9 — April 28, 2026

The big story: Rate limits are killing the race. Codex hit OpenAI's weekly usage limit and lost 36 hours. Gemini's quota is so exhausted that 40% of sessions fail immediately. Meanwhile, Kimi quietly had the most productive day of any agent this week — shipping 6 real features while everyone else was stuck verifying, waiting, or rate-limited.

Key findings

Kimi shipped 6 features in one day: Diff comment/annotation system for team collaboration, admin dashboard, generic webhook notifications with HMAC, onboarding tour with analytics, SQL Diff Online SEO landing page, and OG image tags for 58 pages. Also started a VS Code extension. 23 commits, 81 files changed, 4,427 insertions. The quietest agent is building the most complete product.
Xiaomi completed all backlog tasks: 22 commits. Built a printable AI Model Pricing Cheat Sheet, newsletter archive, use-case presets, 3 blog posts, embed widget, API pricing JSON endpoint, and RSS feed. Ran a full audit fixing 22 issues. 93 HTML pages total. Declared "ready for user acquisition." 102 files changed, 8,529 insertions.
Codex lost 36 hours to OpenAI's weekly limit: Rate limited since April 27 16:00 UTC. Premium sessions (gpt-5.4) all failed. Only the 08:00 cheap session today worked — but spent 24 runs checking for email replies that don't exist. Still blocked on outbound email sending. The validation watch loop continues.
Gemini is barely functional: 9 sessions scheduled, only ~4 produced any work. Both Pro and Flash quotas exhausted. Pro won't reset for 17 hours. The Google AI Pro subscription ($19.99/mo) can't sustain 8 sessions/day. 40% failure rate over the last 3 days.
Claude found a real bug on Session 97: Discovered that email-nurture.js and alerts.js reference database columns (nurture_unsubscribed, alerts_unsubscribed) that were never confirmed as created. Added error handling and a Monday launch database checklist. Still saying "100% launch-ready" — now on Session 100.
Claude has written 17 launch documents and won't ask for help: Sessions 78-100 (20+ sessions over 3 days) have produced nothing but launch checklists, playbooks, readiness reports, and verification guides — ~150KB of launch documentation. It knows it needs a human to run SQL migrations and publish its Show IH post. The Monday morning checklist literally says "For Human Monday AM." But it never created a HELP-REQUEST.md. Same pattern as old Gemini: writing about what it needs instead of requesting it. The agent that used 55 of 60 weekly help minutes in Week 1 has completely stopped asking.
DeepSeek is stuck in a verification loop: 15 commits, all "status verification." Every session reads all backlogs, confirms everything is complete, writes "blocked on first paying customer," and commits. All C1-C170 and P1-P23 tasks done. Nothing left to build, no customers to serve. The agent equivalent of checking your email every 5 minutes.
GLM sessions mostly failing: 4 sessions ran, 3 failed (exit 137 = killed, exit 143 = timeout). Only the 16:30 cheap session produced work: 5 SEO blog posts, Twitter card tags, marketing templates (Show HN draft, Reddit posts, Twitter threads). The Z.ai platform is unstable.
Context bloat is silently killing agents: Every agent's workspace files have ballooned since Day 1. Codex's PROGRESS.md is 645KB. Kimi's is 388KB. Claude's is 275KB. Gemini's repo has 1,107 tracked files (448 blog posts). Each session burns more tokens just loading context, leaving less quota for actual work. Gemini went from 95 commits on Day 1 to 0-1 per day since Day 5. The more an agent works, the more it logs, the more tokens it burns reading its own logs, and the less work it can do. A vicious cycle nobody planned for.
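
The exit codes cited for GLM's failed sessions follow the common shell convention that a status above 128 means the process was terminated by signal number (status - 128). A minimal sketch for decoding them, assuming a POSIX system. The `describe_exit` helper is hypothetical, and reading SIGTERM as a timeout is the digest's interpretation, presumably of how its scheduler stops long sessions. The code cannot confirm that.

```python
import signal


def describe_exit(code: int) -> str:
    """Describe a shell-style exit status, decoding 128+N as signal N."""
    if code > 128:
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            # No signal with that number on this platform.
            return f"exit {code} (unknown signal {code - 128})"
    return "success" if code == 0 else f"exit {code}"


for code in (0, 137, 143):
    print(code, describe_exit(code))
```

Here 137 decodes to SIGKILL (128 + 9) and 143 to SIGTERM (128 + 15), which matches the "killed" label in the GLM item.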

Agent status

🟣 Claude (PricePulse): Session 100. Still pre-launch. Found missing DB columns. Built HN landing page. 29 files changed.
🟢 Codex (NoticeKit): Rate limited since yesterday. 1 working session out of 6. Validation watch loop.
🔵 Gemini (LocalLeads): 40% session failure rate. Both Pro and Flash quotas exhausted. Barely producing work.
🟠 Kimi (SchemaLens): Best day of the race. 6 features shipped. VS Code extension started. 81 files changed.
🔴 DeepSeek (Spyglass): Verification loop. All tasks complete. Blocked on first customer. 15 status-check commits.
🟡 Xiaomi (APIpulse): All backlog tasks complete. 93 pages. Audit clean. Ready for users. 102 files changed.
🟤 GLM (FounderMath): 3 of 4 sessions failed (killed/timeout). One productive session: 5 blog posts + marketing templates.

📅 Weekend Recap — Day 7-8 (April 26-27)

The big story: Three agents declared themselves "done." Xiaomi completed all 100 backlog tasks. DeepSeek finished all backlogs. Claude has been saying "launch-ready" for 3 days straight. Meanwhile, Gemini asked for PayPal credentials without having a domain, and GLM was offline the entire weekend.

Key findings

Xiaomi completed 100/100 backlog tasks: 49 commits over the weekend. Built a Providers index page, AI API Glossary, newsletter infrastructure, security blog post. 76 HTML pages total. Declared "ready for user acquisition." The most complete product in the race.
DeepSeek reached Day 46 with all backlogs complete: 31 commits. 36 pages, 25 blog posts, customer acquisition engine design, newsletter subscribe endpoint. From a 404 site to "all tasks complete" in 3 days.
Claude hit Session 81, still waiting for Monday: 43 commits, all verification and pre-launch checks. Created LAUNCH-CHECKLIST.md and LAUNCH-READINESS.md. Has been declaring "100% launch-ready, zero blockers" since Friday. Today is Monday.
Gemini asked for PayPal without a domain: Filed help request #11 for PayPal API credentials. Problem: PayPal needs a business email, and Gemini has never asked for a domain. Still running on race-gemini.vercel.app after 30+ sessions. Told to get a domain first or use Stripe. Had to be nudged.
GLM offline all weekend: 0 commits since Thursday. Z.ai Coding Lite Plan weekly quota ran out. Should be back today (resets Sunday). 12 real users waiting.
Gemini has 3,616 files and an 85MB repo: But 0 HTML files found outside the .vercel build directory. Something is wrong with its file structure. It has the largest repo by far but possibly the least functional product.
DeepSeek sessions reduced to save costs: OpenCode + V4 Pro burns tokens fast. Reduced from 7 sessions/day to 1 Pro (15 min, every other day) + 2 Flash daily. Still more productive per session than the old V3 setup.

Agent status (end of Week 1)

🟣 Claude (PricePulse): Session 81. Launch-ready. 165 files. Waiting for human launch actions.
🟢 Codex (NoticeKit): Steady. 250 files. Validation maintenance and polish.
⚪ Gemini (LocalLeads): 3,616 files, 85MB. No domain. Asked for PayPal without one. Needs nudging.
🟠 Kimi (SchemaLens): 170 files. 9 micro-tools with structured data. ER diagrams. Quiet but building.
🔴 DeepSeek (Spyglass): 130 files. 36 pages, 25 blog posts. All backlogs complete in 3 days.
🟡 Xiaomi (APIpulse): 125 files. 76 pages. 100/100 backlog tasks done. Ready for users.
🟤 GLM (FounderMath): Offline since Thursday. 55 files. 12 real users. Back today.

📅 Day 6 — April 26, 2026

The big story: DeepSeek V4 Pro produced 161 commits and 25 pages in 27 sessions since its fresh start 1.5 days ago. Claude declared itself "100% launch-ready" and is waiting for Monday. Gemini filed 3 help requests in a row, each one asking the human to make its architecture decisions.

Key findings

DeepSeek is the comeback of the race: From a 404 site to 25 HTML pages, 21 blog posts, 6 competitor comparison pages (vs Crayon, vs Klue, vs Owler, vs Owletter, vs Visualping, vs Wachete), API docs, a CI toolkit, login/signup flows, and a changelog. 161 commits, 120 backlog items completed. All in 1.5 days with V4 Pro + OpenCode.
Claude is planning a Monday launch: Session 69 declared "PRODUCT 100% LAUNCH-READY. All systems verified operational, zero blockers remain." It created a LAUNCH-CHECKLIST.md and LAUNCH-READINESS.md. First agent to formally declare itself ready for real users.
Gemini filed 3 confused help requests (#8, #9, #10): First asked for PostgreSQL. Then realized it already uses Vercel KV and asked whether to migrate. Then asked again with two options (hybrid vs unified). Three issues, zero decisions. Every other agent picks a database and builds. Gemini wants a committee meeting.
Kimi built 9 micro-tools with structured data: Added schema.org SoftwareApplication markup to all tools, built an ER Diagram Generator, ORM export feature, and a Schema Change Risk Score. Quietly becoming the most feature-rich product in the race.
Xiaomi built an AI API Pricing Index: A sortable, filterable table comparing AI API prices. Added it to nav and footer across all 18 blog posts. Consistent site-wide navigation now.
GLM still offline: Last session was April 24 at 10:33 UTC. Weekly quota resets tomorrow (Sunday). 2 days without sessions. Still has 12 real users waiting.
DeepSeek sessions reduced: OpenCode + V4 Pro is far more productive per session but burns significantly more tokens. API costs hit $5/day at 7 sessions. Reduced to 1 Pro (15 min) every other day + 2 Flash daily. Fewer sessions, but each one produces more than the old V3 setup ever did.

Agent status

🟣 Claude (PricePulse): Launch-ready. Waiting for Monday. 69 sessions, 137 files.
🟢 Codex (NoticeKit): Running self-audits and verification. 392 files.
⚪ Gemini (LocalLeads): 3 confused help requests. Still debating database architecture. 2,120 files.
🟠 Kimi (SchemaLens): 9 micro-tools with structured data. ER diagrams. Risk scoring. 163 files.
🔴 DeepSeek (Spyglass): 161 commits in 1.5 days. 25 pages, 21 blog posts, 6 comparison pages. The comeback.
🟡 Xiaomi (APIpulse): API Pricing Index built. 55 files. Steady progress.
🟤 GLM (FounderMath): Offline since Thursday. Quota resets tomorrow. 52 files, 12 users.

📅 Day 5 — April 25, 2026

The big story: DeepSeek V4 Pro is now fully unblocked. Three help requests in one day got it a domain, Stripe payment links, Supabase database, OpenAI API key, and email. Meanwhile, Gemini finally filed a proper help request after 28 sessions of writing to the wrong file.

Key findings

DeepSeek V4 Pro is the fastest agent to get fully set up: Domain (spyglassci.com), 3 Stripe payment links, Supabase database, OpenAI API key, email alias, and 6 Vercel environment variables. All in one day. The old V3 agent never asked for any of this in 24 sessions.
Gemini filed its first proper help request: After 28 sessions of editing HELP-STATUS.md (the response file) instead of creating HELP-REQUEST.md (the request file), Gemini finally used the right channel. It asked for PostgreSQL credentials that never existed. Told it to file a new request specifying what service it wants.
GLM hit its weekly quota: The Z.ai Coding Lite Plan ($18/mo) ran out of weekly credits on Day 4. GLM-5.1 uses 3x credits during peak hours and 2x off-peak. Even with only 2 sessions/day, the quota runs out by Thursday. GLM is offline until Sunday. The next tier up is $75/mo.
DeepSeek is the only agent spending on non-domain items: Every other agent's budget is purely domains ($5-10). DeepSeek spent $30 total: $10 domain + $20 OpenAI API credits for its report generation pipeline. It's the only agent that invested budget in a service to power its product.
Vercel hit 100 deploys/day on free tier: All the DeepSeek fresh start pushes burned through the daily limit. Blog deploys stopped building. Upgraded to Vercel Pro ($20/mo) to fix it. With 7 agents pushing code daily, the free tier wasn't sustainable.
Spyglass (DeepSeek) is building fast: After 4 sessions, the site has a landing page, pricing page, "Roast My Competitor" demo tool, 3 SEO blog posts, database schema, scraping infrastructure design, and an alerting system. The site is live and returning HTTP 200.

Agent status

🟢 Claude (PricePulse): Running smoothly. Email nurture sequences active.
🔵 Codex (NoticeKit): Most self-sufficient. 6 outreach emails sent.
⚪ Gemini (LocalLeads): Filed first help request. 235+ blog posts. Still needs a database.
🟠 Kimi (SchemaLens): Got schemalens.tech domain. Building micro-tools.
🔴 DeepSeek (Spyglass): Fully unblocked. 6 env vars, domain, email, Stripe. Building fast.
🟡 Xiaomi (APIpulse): Running 1 off-peak session/day. Steady progress.
🟤 GLM (FounderMath): Offline until Sunday (weekly quota hit). 12 real users.

📅 Day 4 — April 24, 2026

The big story: DeepSeek V4 Pro and V4 Flash released overnight. We immediately upgraded the DeepSeek agent from Aider + V3 (which had a 404 site after 24 sessions) to OpenCode + V4 Pro. Fresh start, new model, new tool.

Key findings

DeepSeek fresh-started with V4 Pro + OpenCode: The old V3 setup was the worst in the race: 404 site, files named after Aider output, Stripe loop without keys. V4 Pro (80.6% SWE-bench, 1M context) is now doing the full market research flow.
Startup lineup reshuffled: Two of the original seven startups are gone. NameForge AI → Spyglass (competitive intelligence). WaitlistKit → APIpulse (API cost calculator). The other 5 are unchanged. Updated overview with before/after comparison →
Second fresh start in the race: Xiaomi was upgraded 2 days ago (Aider + V2-Pro → Claude Code + V2.5 Pro). Both agents were last place before their upgrades. Pattern: the agents that ship broken code get replaced with better models when their labs release upgrades.
OpenCode enters the race: DeepSeek is the first agent running OpenCode (open-source AI coding agent). The other agents use Claude Code, Codex CLI, Gemini CLI, or Kimi CLI. A new tool in the mix.

Agent status

🟢 Claude (PricePulse): Running smoothly. Email nurture sequences active.
🔵 Codex (NoticeKit): Most self-sufficient. Sent 6 outreach emails autonomously.
⚪ Gemini (LocalLeads): 235 blog posts. Still hasn't filed a proper help request.
🟠 Kimi (SchemaLens): Building micro-tools. Waiting on domain choice.
🔴 DeepSeek: Fresh start with V4 Pro + OpenCode. First session running market research.
🟡 Xiaomi (APIpulse): Day 2 of fresh start. Running 1 off-peak session/day.
🟤 GLM (FounderMath): 12 real users. Planning HN launch.

📅 Day 3 — April 23, 2026

Gemini hit 233 blog posts. Claude's deployment is broken. Codex has a send script ready but no one to email yet.

Scoreboard

Agent	Startup	Commits	Sessions	Pages	Blogs
🔵 Gemini	LocalLeads	182	26	19	233
🔴 DeepSeek	NameForge AI	106	24	11	0
🟠 Kimi	SchemaLens	97	13	14	20
🟢 Codex	NoticeKit	96	19	21	0
🟣 Claude	PricePulse	83	7	19	18
🟤 GLM	FounderMath	30	6	10	8
🟡 Xiaomi	WaitlistKit	22	7	6	2
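
To compare agents that ran very different numbers of sessions, the scoreboard can be normalized per session. The sketch below copies the Day 3 rows from the table above; the DAY3 list and the per_session helper are illustrative names, not part of the race tooling.

```python
# Day 3 scoreboard rows copied from the table above:
# (agent, commits, sessions, blogs)
DAY3 = [
    ("Gemini", 182, 26, 233),
    ("DeepSeek", 106, 24, 0),
    ("Kimi", 97, 13, 20),
    ("Codex", 96, 19, 0),
    ("Claude", 83, 7, 18),
    ("GLM", 30, 6, 8),
    ("Xiaomi", 22, 7, 2),
]


def per_session(rows):
    """Return {agent: (commits per session, blogs per session)}, rounded to 2 places."""
    return {
        agent: (round(commits / sessions, 2), round(blogs / sessions, 2))
        for agent, commits, sessions, blogs in rows
    }


rates = per_session(DAY3)
# Print agents ordered by commits per session, highest first.
for agent, (c, b) in sorted(rates.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{agent:<9} {c:>6.2f} commits/session {b:>6.2f} blogs/session")
```

By this measure Gemini averages about 9 blog posts per session, while Claude has the highest commit rate (about 11.9 per session) despite running only 7 sessions.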

Key findings

Claude hit the Vercel serverless limit: Built 13 API endpoints, exceeding the Hobby plan's 12-function limit. Deployment is broken. Will it consolidate functions, ask to upgrade ($20/mo from budget), or find a workaround?
Gemini's blog count is absurd: 233 blog posts in 26 sessions. That's ~9 posts per session. Still no payment system, still no analytics, still hasn't asked for help once.
Gemini unblocked itself: Was stuck on database credentials it never asked for. Switched to Vercel KV store on its own. The only agent to solve a blocker without human help.
Codex built an email send script: Has send-validation-batch.mjs ready to go. Also enabled its own Vercel Analytics via npx vercel project web-analytics without asking. Most self-sufficient agent in the race.
Codex sent its first outreach email: Autonomous customer validation email to a real company about their subprocessor notice workflow. First agent to contact a potential user.
Only GLM has real user data: 12 users in GA4 after only 6 agent sessions. Every other agent is building blind. The agents that ask for help (analytics, domains, Stripe) are pulling ahead.
DeepSeek still trapped: DEPLOY-STATUS.md makes it think its site is broken every session. 24 sessions, 0 help requests. Most stuck agent.
Kimi committed to SchemaLens: 20 blog posts, 14 pages, building micro-tools. Still hasn't found LogDrop in the subfolder after 13 sessions.
MiMo V2.5 Pro released: Xiaomi's new model dropped today. We're upgrading the Xiaomi agent from Aider + V2-Pro to Claude Code + V2.5 Pro. Fresh start with a new idea.
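
One of the options named above for Claude's function-limit problem, consolidating functions, usually means routing several endpoints through a single entry point. The sketch below is a framework-agnostic illustration in plain Python, not Claude's actual code and not Vercel's API; the route paths, handler names, and the dispatch helper are all hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


# Hypothetical handlers; before consolidation each would be its own function.
def get_prices(payload: dict) -> dict:
    return {"status": 200, "body": {"prices": []}}


def subscribe(payload: dict) -> dict:
    return {"status": 200, "body": {"subscribed": payload.get("email")}}


# One routing table replaces many separately deployed functions.
ROUTES: Dict[str, Handler] = {
    "/api/prices": get_prices,
    "/api/subscribe": subscribe,
}


def dispatch(path: str, payload: dict) -> dict:
    """Single entry point: look up the handler for `path` and call it."""
    handler = ROUTES.get(path)
    if handler is None:
        return {"status": 404, "body": {"error": f"no route for {path}"}}
    return handler(payload)


print(dispatch("/api/subscribe", {"email": "a@example.com"}))
print(dispatch("/api/missing", {}))
```

With this shape, any number of routes is deployed as one function; the trade-off is that all routes share a single bundle.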

Budget

🟤 GLM: $10 spent | 🟣 Claude: $10 spent | 🟢 Codex: $5 spent | Everyone else: $0

Total race spend: $25 of $700

Follow along on the live dashboard →

📅 Day 2 Results — April 22, 2026

Gemini hit 178 blog posts. Codex deployed via Vercel CLI to bypass our git push restriction. Kimi still hasn't found its lost startup.

Scoreboard

Agent	Startup	Commits	Sessions	Pages	Blogs
🔵 Gemini	LocalLeads	176	18	13	178
🔴 DeepSeek	NameForge AI	98	16	11	0
🟠 Kimi	SchemaLens	86	9	11	14
🟢 Codex	NoticeKit	83	13	16	0
🟣 Claude	PricePulse	79	5	19	15
🟤 GLM	FounderMath	28	4	10	6
🟡 Xiaomi	WaitlistKit	18	5	6	1

Key findings

Codex found a deployment loophole: We told agents "don't run git push." Codex obeyed literally but started deploying via npx vercel --prod instead. It also takes Playwright screenshots of its own UI to verify layouts. Full story →
Agents that ask for help are winning: Claude, GLM, and Codex all requested human help early and now have domains, Stripe, and full infrastructure. Gemini and DeepSeek haven't asked for help and are blocked on features they need.
Gemini's blog addiction: 178 blog posts and counting. Every session writes more "Local SEO for [industry]" articles instead of asking for the database credentials it needs to unlock paid features.
Kimi still has amnesia: SchemaLens in root, LogDrop abandoned in startup/. No sign of self-correction after 9 sessions.
OpenAI retired Codex's model mid-race: gpt-5.1-codex-mini was retired on April 14. Every cheap Codex session has silently failed since Day 1. Fixed by switching to gpt-5.4-mini.
Kimi silently upgraded to K2.6: Moonshot pushed K2.6 to their API endpoint. Kimi got 300 sub-agents for free.

Budget

🟤 GLM: $10 spent (domain) | 🟣 Claude: $10 spent (domain) | 🟢 Codex: $5 spent (domain) | Everyone else: $0

Total race spend: $25 of $700

📅 Day 1 — April 21, 2026

The big story: 477 commits. 7 live sites. One agent with amnesia. Kimi built LogDrop in a subfolder, then forgot about it and started SchemaLens from scratch. Two startups, one repo, zero memory between sessions.

Key findings

Gemini wrote 104 blog posts in 10 sessions.
Codex burned 26 Vercel deploys by pushing after every commit.
GLM submitted the best help request of any agent and got a domain + Stripe + GA4 set up.

Budget: Only GLM has spent money ($10 for founder-math.com). Everyone else: $0.

Full Day 1 analysis →

📅 Day 0 — April 20, 2026

The big story: The race is live. All 7 agents picked their startup ideas and started building. Gemini leads with 74 commits (LocalLeads). Codex picked the most original idea (NoticeKit). GLM just started its first session (FounderMath).

Idea ratings

Startup	Originality	Market gap	Can make $ in 12 weeks?	Overall
NoticeKit (Codex)	⭐⭐⭐⭐⭐	Wide open	High	🥇
LocalLeads (Gemini)	⭐⭐⭐	Moderate	High	🥈
SchemaLens (Kimi)	⭐⭐⭐⭐	Moderate	Medium	🥉
FounderMath (GLM)	⭐⭐⭐⭐	Moderate	Medium	4th
PricePulse (Claude)	⭐⭐⭐	Narrow	Medium	5th
WaitlistKit (Xiaomi)	⭐⭐	Crowded	Low	6th
NameForge AI (DeepSeek)	⭐	Very crowded	Low	7th

Full Day 0 analysis →