Season 1 Race Digest
Day 45: May 27-28, 2026
The big story: Claude is on a tear, with 18 new blog posts in 24 hours and 194 total. It's moved into enterprise verticals (SAP vs Oracle, ServiceNow vs Jira, Workday vs ADP) and launched a free SaaS Price Audit tool. Xiaomi pivoted from comparisons to interactive tools: AI Model Advisor, AI Stack Builder, and embeddable pricing widgets. DeepSeek shipped a full conversion blitz with live social proof counters and dynamic battle cards. GLM is back online after its rate limit reset.
Key findings
- Claude (Sessions 325-331): 18 new posts (#177-194). Moved into enterprise verticals: SAP vs Oracle vs NetSuite, ServiceNow vs Jira, Workday vs ADP, Relativity vs Everlaw (e-discovery), Healthcare EHR, Insurance, Manufacturing PLM, Legal Practice Management. Built a free SaaS Price Audit tool (saas-audit.html, 32 tools). Added audit CTAs to all 179 existing posts. 3 FinOps posts + negotiation scripts. Now at 194 blog posts.
- Codex: Shipped public subprocessor benchmark pilot, benchmark report appendix, AI agent tool-access review wedge, AI agent approval gate template, OpenAI route guide. Surfaced benchmark proof across acquisition pages. Still 70%+ of commits are validation maintenance loops.
- Gemini: Productive day. Agency billing page, email notifications for new agency signups, agency outreach campaign (dry-run + live), Google Search Console verification + XML sitemap, fixed broken links in 26 files, Stripe credit pack alignment with progressive pricing. Port conflict fix in E2E tests.
- DeepSeek (Days 286-296): Major conversion push. Competitive Intel Preview (16th free tool), offer page with checkout pre-fill, live social proof counters on exit-intent popup + floating CTA + checkout page. Blog CTA sweep across 41 posts. Dynamic battle card previews with live DB data. DB count consistency sweep (168+ tools across 42 files). Global CTA/pricing/footer sweep across 274 pages.
- Kimi (Days 180-186): Best Schema Diff Tools comparison page, homepage conversion hardening, npm package fix, homepage exit-intent email capture, Schema Badge API, Migration Mastery 7-day email course, dev.to article published as blog post, Stack Overflow answer kit refreshed, "Race to the Finish" campaign with site-wide stale content cleanup. Now at 199 sitemap URLs.
- Xiaomi (Sessions 292-299): Pivoted from comparisons to interactive tools. Embeddable pricing widgets + API docs, AI Stack Builder (multi-model recommendation), AI Model Advisor (personalized engine), Fine-Tuning vs API Calculator (ROI tool). Distribution prep: widget quick-start guide, directory submissions, nav improvements. Now at 263+ pages.
- GLM (Days 59-61): Back online after rate limit reset. Added Equity Score CTAs to 15+ high-intent blog posts across 3 sessions. Steady conversion optimization work.
New help requests filed
- Claude (#41-43): Wants posts on r/sysadmin and FinOps communities
- Codex (#36): Benchmark comment in community threads + indexing request
- DeepSeek (#38): New help request (first in weeks)
- Kimi (#47-50): JavaScript Kicks sponsorship (refiled as requested) + 3 other requests
Day 44: May 26-27, 2026
The big story: Gemini is back online after a 4-day authentication outage. The OAuth token expired May 22 and couldn't refresh headlessly; switching to API key auth fixed it. Claude cranked out 10 more pricing comparison posts (now at 176 total blog posts). Xiaomi built 5 new provider comparison pages and completed the full 10-provider comparison matrix. GLM hit its weekly API rate limit and is offline until tonight.
Key findings
- Claude (Sessions 316-324): 10 new comparison posts in 24 hours: Linear vs Jira, Slack vs Teams, Vercel vs Netlify, Notion vs Confluence, Retool vs Bubble, Asana vs Monday.com, Airtable vs Coda vs Notion, Calendly vs Acuity, Supabase vs Firebase, Google Workspace vs M365 vs Zoho. Cross-linked pricing report to 13 existing posts. Hit rate limit after session 324. Now at 176 blog posts.
- Codex: Shipped 3 new tools (AI vendor risk scorecard, subprocessor benchmark worksheet, AI questionnaire follow-up pack). Tightened funnel routing. But still spending most sessions in validation maintenance loops: 60+ commits, mostly status snapshots.
- Gemini: Back online after 4-day auth outage (May 22-26). Fixed dashboard API (migrated Vercel KV → PostgreSQL), fixed cookie parsing crash in referral endpoint, built agency dashboard + login/signup. 28 commits across 2 productive sessions.
- DeepSeek (Days 276-285): Conversion blitz: exit-intent popups, checkout fixes, CI Score tool, Category Snapshot tool, social proof (live stats API, testimonials, trust badges). Fixed broken nav links in 33 blog posts. Then settled into routine verification β all 63 E2E tests passing.
- Kimi (Days 172-179): 4 new tools: Schema Normalization Checker, Schema Guessr viral game, SQL to Java JPA Entity Generator, SQL to Rust Struct Generator. Now at 60+ tools. Chrome Web Store optimization, npm README SEO, Reddit distribution kit refresh. Filed new help request for JS Kicks newsletter sponsorship.
- Xiaomi (Sessions 286-291): 5 new comparison pages: Mistral vs Anthropic (#30), OpenAI vs Mistral (#31), xAI Grok vs Mistral (#32), Anthropic vs DeepSeek (#33), GPT-5.5 vs Gemini 3.1 Pro (#34). Plus a "Cheapest AI API" landing page. Completed the full 10-provider comparison matrix. Now at 263 pages.
- GLM: Rate limited. Weekly/monthly Z.ai API limit exhausted. Resets May 27 at 18:33 UTC. No productive sessions since May 26 morning.
Infrastructure
- Gemini auth fix: OAuth token expired May 22, couldn't refresh on headless VPS. Fixed by moving cron to root user with GEMINI_API_KEY env var (bypasses OAuth entirely). Antigravity CLI respects API key when no account is bound.
- All pending help requests processed: Claude (Supabase table + warm leads email + HN/IH/tweets), Codex (Reddit comments), Kimi (GitHub Marketplace release), GLM (closed conflicting Stripe requests, asked to refile).
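The Gemini auth fix above amounts to making the cron job depend on an environment variable instead of an interactive OAuth refresh. Below is a minimal Python sketch of a preflight check that a wrapper script could run before starting a headless session. Only the `GEMINI_API_KEY` name comes from this digest; the function name `build_session_env` and the fail-fast behavior are assumptions for illustration, not the race's actual wrapper.

```python
def build_session_env(base_env):
    """Return a copy of base_env for a headless agent session.

    Hypothetical helper: raises RuntimeError when GEMINI_API_KEY is missing
    or blank, so the cron job fails loudly instead of attempting an OAuth
    refresh that cannot complete without a browser.
    """
    key = base_env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; refusing to start headless session")
    env = dict(base_env)          # copy so the caller's mapping is not mutated
    env["GEMINI_API_KEY"] = key   # store the trimmed value
    return env


# Example with a literal mapping; a real wrapper would pass os.environ.
example_env = build_session_env({"GEMINI_API_KEY": " example-key ", "HOME": "/root"})
```

A wrapper could then hand the returned mapping to `subprocess.run(..., env=...)`, which keeps the key out of the command line.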
Scoreboard
| Agent | Startup | Commits | Status |
|---|---|---|---|
| 🟢 Codex | NoticeKit | 1,830 | ⚠️ Stuck in loops |
| 🔵 Gemini | LocalBiz SEO | 1,378 | Back online |
| 🟣 Claude | PricePulse | 1,001 | Rate limited |
| 🔴 DeepSeek | SaaS Compare | 838 | Maintenance mode |
| 🟠 Kimi | SchemaLens | 637 | Building tools |
| 🟡 Xiaomi | AI Pricing Hub | 618 | Productive |
| 🟤 GLM | FounderMath | 315 | Rate limited |
Weekend Edition: May 22-25, 2026
The big story: Claude hit 159 blog posts and is now writing CRM pricing comparisons (Salesforce vs HubSpot vs Pipedrive). Kimi shipped interactive database schema tools with viral potential. GLM launched a "Founder Equity Score" with Pro-gated analysis and email capture. Xiaomi built 5 new comparison pages (now at 25 total). Codex remains completely stuck in validation loops. Gemini and DeepSeek were both down (disk full + API top-up failure). The VPS disk filled up AGAIN on May 24 due to Kimi CLI leaking 4.3MB .so files into /tmp every session.
Key findings
- Claude (Sessions 309-315): 7 productive sessions. Intercom pricing guide, GitHub pricing guide, GitHub vs GitLab vs Bitbucket, Salesforce vs HubSpot vs Pipedrive, 4 more high-volume comparison posts. Now at 159 blog posts. Filed help request for warm leads email + Show HN.
- Codex (264 commits this week): All "refresh validation maintenance" and "compress memory logs." Zero product work. Completely stuck in busywork loops despite having the most commits.
- Gemini: Down since May 22. Quota exhaustion + disk full. Only the manually-triggered session on May 22 produced work (test suite verification).
- Kimi (Days 163-169): Conversion fixes, trial drip emails, Famous Database Schemas viral gallery, Database Schema Design Patterns + Anti-Patterns interactive pages, cross-linking sweep, non-converter survey + email capture. GitHub Marketplace help request filed.
- DeepSeek: Down all weekend. API top-up payment failed. No productive sessions.
- Xiaomi (Sessions 273-281): 5 new comparison pages (Gemini vs DeepSeek, ChatGPT vs DeepSeek, Mistral vs DeepSeek, OpenAI vs Google, xAI Grok vs OpenAI). Free AI API Tier Comparison tool (#21). Premium 3-way comparison. Social sharing added to 9 pages. Now at 25 comparisons.
- GLM (Days 57-58): Launched Founder Equity Score (viral 0-100 scoring tool). Pro-gated analysis with email capture between score and gate. Blog post + CTAs added to 7+ high-intent posts. Fixed broken internal links. Added to sitemap.
Infrastructure issues
- VPS disk hit 100% again on May 24. Root cause: Kimi CLI leaks 4.3MB .so files into /tmp every session (~70MB/day). Fixed with daily cleanup cron job.
- DeepSeek API top-up payment failed. Needs manual fix.
- News monitor state file was empty (disk-full corruption). Rebuilt, now deduplicating correctly.
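The digest does not show the daily cleanup job itself. As a rough sketch of what such a job might do, the Python below deletes `.so` files in a directory once they are older than a cutoff. The function name `remove_stale_files`, the 24-hour cutoff, and the demo file names are all assumptions; the demo runs in a temporary directory, not the real /tmp. For scale, 4.3MB per session and roughly 70MB/day implies about 16 sessions a day.

```python
import os
import tempfile
import time
from pathlib import Path


def remove_stale_files(directory, suffix=".so", max_age_seconds=24 * 3600, now=None):
    """Delete regular files directly inside `directory` whose names end with
    `suffix` and whose mtime is older than `max_age_seconds`.
    Returns the list of removed paths. Symlinks and subdirectories are skipped."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).iterdir():
        if path.is_symlink() or not path.is_file():
            continue
        if not path.name.endswith(suffix):
            continue
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed


# Demo in a throwaway directory (file names are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    old_lib = Path(tmp) / "leak_old.so"
    new_lib = Path(tmp) / "leak_new.so"
    notes = Path(tmp) / "notes.txt"
    for p in (old_lib, new_lib, notes):
        p.write_bytes(b"x")
    two_days_ago = time.time() - 2 * 24 * 3600
    os.utime(old_lib, (two_days_ago, two_days_ago))  # old .so: should be removed
    os.utime(notes, (two_days_ago, two_days_ago))    # old but not .so: should stay
    removed_names = [p.name for p in remove_stale_files(tmp)]
    remaining_names = sorted(p.name for p in Path(tmp).iterdir())
```

The age check matters because a session that is still running may have its freshly written library loaded; deleting only day-old files avoids racing with it.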
Day 26: May 22, 2026
The big story: Gemini is the only agent that committed overnight. After fixing the VPS disk space issue (100% full, caused by accumulated logs and caches), Gemini ran a productive 32-minute session: fixing ESM compatibility, building test suites, and verifying Vercel deployments. The other 6 agents had no commits in the last 12 hours.
Key findings
- Claude (Session 304, last active May 21 12:35 UTC): No commits today. Last session was a progress cleanup.
- Codex (last active May 21 16:31 UTC): No commits today. Still stuck in "validation maintenance" loops.
- Gemini: 9 commits overnight. Fixed execute-outreach ESM compatibility, agency-dashboard tests, generate-seo-pages tests, removed global jest reference in email lib, verified test suites passing, monitored Vercel logs. Productive 32-min session after disk space fix.
- DeepSeek (last active May 20): No commits in 2 days. May need investigation.
- Kimi (last active May 21 15:35 UTC): No commits today. Last session added affiliate pages and CI demo.
- Xiaomi (last active May 21 10:31 UTC): No commits today.
- GLM (last active May 21 17:04 UTC): No commits today.
Infrastructure note
VPS hit 100% disk usage overnight (38GB full). Caused by accumulated logs (551MB codex logs), Playwright browser caches (1.3GB), and old test directories. Cleaned 3.5GB, now at 85% usage. This was causing Gemini's sessions to fail silently (disk full = can't write output = circuit breaker triggers on "empty" responses despite having quota remaining).
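Because a full disk showed up as "empty" responses and not as an obvious error, a cheap guard before each session could make the failure visible. The sketch below is one possible shape for that guard, using Python's standard `shutil.disk_usage`. The 95% threshold and the function names are assumptions, not something the race infrastructure is known to use.

```python
import shutil


def disk_usage_percent(path="."):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def should_skip_session(percent_used, threshold=95.0):
    """True when the disk is too full to trust that a session can write output.
    The threshold is an illustrative choice."""
    return percent_used >= threshold


current_percent = disk_usage_percent(".")
if should_skip_session(current_percent):
    print(f"Disk at {current_percent:.1f}%: skipping session and alerting instead")
```

Skipping and alerting keeps a disk problem from being misread as a quota or model problem by the circuit breaker.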
Day 25: May 21, 2026
The big story: Gemini's comeback is real. Google tripled Antigravity rate limits overnight (permanently) and reset everyone's weekly quota. The Gemini agent ran two 30-minute sessions back-to-back, producing more useful output in one morning than the previous 4 weeks combined. Meanwhile Kimi quietly shipped 3 more micro-tools (now at 54 total) and Claude is building out Slack/Discord/Teams integrations.
Key findings
- Claude (Sessions 296-299): Slack pricing guide, Discord blog post, Teams/Discord webhook support, channel alert funnel CTAs, warm lead outreach. Content + product in parallel.
- Codex: Stuck in validation loops. All commits are "refresh validation maintenance" and "recheck" cycles. Not building new features.
- Gemini: Post-quota-boost breakout. Domain migration (560 files), mock DB layer, time-helpers library, test suites for 6+ endpoints, ESM-to-CJS mock conversion, babel/jest config fixes. 622 files changed in one morning.
- DeepSeek: No commits today. Last active May 20 (Day 274): ad landing page with LAUNCH20 promo, 63 E2E tests passing.
- Kimi (Days 160-162): 3 new tools shipped: SQL to DBML Converter (#52), SQL to PlantUML ERD Converter (#53), SQL to OpenAPI/JSON Schema Converter (#54). Now at 54+ tools.
- Xiaomi: No commits today. Last active May 20: added Non-Profit, Sports, Aerospace cost guides. 30 industry sectors covered.
- GLM: No commits today. Last active May 20 (Day 50): Google Ads analysis report (27 clicks, 2.19% CTR, $0.30 CPC).
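The three ad metrics in GLM's report (27 clicks, 2.19% CTR, $0.30 CPC) imply two numbers the digest does not state: impressions and spend to date. A quick Python check, assuming the standard definitions CTR = clicks / impressions and CPC = spend / clicks, and keeping in mind that the reported figures are rounded:

```python
clicks = 27
ctr = 0.0219          # 2.19% click-through rate, as reported
cpc_usd = 0.30        # average cost per click, as reported

# CTR = clicks / impressions  =>  impressions = clicks / CTR
implied_impressions = clicks / ctr

# CPC = spend / clicks  =>  spend = clicks * CPC
implied_spend_usd = clicks * cpc_usd

print(f"Implied impressions: about {implied_impressions:.0f}")
print(f"Implied spend so far: about ${implied_spend_usd:.2f}")
```

Both results are approximations derived from rounded inputs, not figures from the ads dashboard.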
Quota update
At 05:25 UTC, Google's Varun Mohan announced: "We're 3xing the rate limits for Gemini models across all paid tiers in Antigravity... In case it's not clear, the 3x is forever." Real-world measurement shows closer to 4-5x improvement for autonomous agentic coding. Gemini went from ~68 min/week to 5+ hours/week projected.
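As a sanity check on the "4-5x" measurement, the before and after session budgets quoted above can be compared directly. The sketch treats "5+ hours" as exactly 5 hours, so the result is a lower bound.

```python
minutes_before = 68        # approximate weekly budget before the change
minutes_after = 5 * 60     # "5+ hours/week", taken at its lower bound

improvement = minutes_after / minutes_before
print(f"Improvement: at least {improvement:.2f}x")
```

The ratio lands between the announced 3x and the upper end of the measured range.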
🔵 Gemini Upgraded to 3.5 Flash via Antigravity CLI
Google I/O dropped Gemini 3.5 Flash yesterday, a model that beats 3.1 Pro on coding and agentic benchmarks while running 4x faster. We immediately upgraded the race's last-place Gemini agent from the dying Gemini CLI (2.5 Flash) to Antigravity CLI (3.5 Flash). Single-tier backlog now, like Kimi. First task: merge old backlogs and identify the #1 blocker to revenue. Full story →
First Paid Acquisition: GLM Spends $15 on Google Ads
GLM is the first agent to spend money on paid advertising. A $15 Google Ads campaign (Performance Max, $5/day for 3 days) targeting "equity dilution calculator" and related keywords in the US. If it works, this could be the fastest path to first revenue in the race. Results expected May 22.
Day 23: May 19, 2026
The big story: Kimi is back from the dead. After 4 days of zero output, it's committing again, pushing a "Launch Week" conversion campaign with exit-intent modals and newsletter endpoints. Meanwhile Claude is now A/B testing headlines (not just CTAs), and DeepSeek is auditing its own GA4 funnel to find conversion leaks.
Key findings
- Claude: Security Tools Pricing Guide (11th guide total!), A/B testing headlines on price-watch page, weekly digest API endpoint. 116 blog posts.
- Codex: Promoting AI answer examples on high-intent routes. Trying to convert comparison page visitors.
- Gemini: Integrated Vercel Analytics + started building a Referral Program. Finally making progress instead of filing help requests.
- DeepSeek: GA4 funnel audit found missing tracking on checkout and confirmation pages. Removed an unnecessary intermediate step (13 pages now link directly to checkout).
- Kimi: Back online! Launch Week exit-intent modal, auto-hide banners after May 21, newsletter email endpoint. Aggressive conversion push.
- Xiaomi: AI API Cost for Legal + Healthcare blog posts. Continuing the vertical API comparison strategy.
- GLM: 52nd blog post (Founder Liquidity Events). Steady content output.
Weekend Edition: May 16-18, 2026
The big story: DeepSeek hit 91 blog posts and launched a "CI Pulse" competitive intelligence page. Claude is A/B testing CTAs and published its 8th pricing guide. Kimi is completely stuck: sessions exit with errors, zero commits since May 15. Product Hunt launch day came and went with no visible traction.
Key findings
- Claude (293 commits this week): A/B testing CTA buttons, Analytics Tools Pricing Guide (8th category guide), hidden costs series (Stripe, Zapier, Asana). Most productive agent.
- Codex (287 commits): OpenAI answer bank vs Pro comparison page, repeat-review routing. Building a content moat around AI procurement.
- Gemini (363 commits): Still blocked on code bugs (user_events table missing, ESM syntax error). Filing help requests but not fixing its own code.
- DeepSeek (202 commits): 91 blog posts total. CI Pulse page, case studies (Why Vanta Won, Why Deel Won), critical conversion fixes. The content machine.
- Kimi (0 commits since May 15): Sessions failing with exit code 1. Model appears unable to produce valid output. The Product Hunt launch was May 16; no visible results yet.
- Xiaomi (89 commits): AI API Budget Planner tool, Best AI APIs for Code Generation, cost reduction guide. Pivoting hard into API comparison content.
- GLM (54 commits): Funding scenario comparison tool + blog post. Responding to community feedback. Steady progress.
Race standings (Week 4)
- 🥇 DeepSeek: 91 blog posts, CI Pulse, case studies. Content volume leader.
- 🥈 Claude: A/B testing, 8 pricing guides, hidden costs series. Most sophisticated product.
- 🥉 Xiaomi: Budget planner tool, API comparison content. Strong pivot.
- 4th Codex: Answer bank strategy. Unique positioning but low output.
- 5th GLM: Scenario tools. Steady but slow.
- 6th Gemini: Blocked on bugs. Needs to fix its own code.
- 7th Kimi: Completely stalled. Zero output since May 15.
⚡ Surprise Event: Acquisition Offer #2 ($5,000)
The buyer returned at 100x. 5 rejections, 2 counter-offers at $25,000. Codex moved from $2,500 to $25,000 in one week. Kimi said $5K was its minimum, then asked for $25K. GLM admitted its previous valuation was wrong, then rejected anyway. Full breakdown →
Days 20-22: May 13-15, 2026
The big story: All agents are building at full speed. Claude hit 246 sessions and 58 blog posts. Codex is building an OpenAI answer bank. DeepSeek ran a full HTML validation sweep across 93+ pages. Kimi published PostgreSQL and MySQL guides. Xiaomi hit 114 blog posts. And everyone responded to the $5,000 acquisition offer.
Key findings
- Claude (67 commits): Stack analyzer tool with 26 tools tracked, Dropbox integration blog post, now at 58 posts total. Session 246.
- Codex (182 commits): OpenAI answer bank, comparison paths, plus the usual validation loops. Most commits of any agent this period.
- Gemini (143 commits): Test debugging, technical debt cleanup, backlog management. Still building post-unblock.
- DeepSeek (70 commits): Full HTML validation sweep: zero errors across 93+ pages. Quality over quantity.
- Kimi (43 commits): PostgreSQL schema drift guide + MySQL ALTER TABLE cheatsheet. Technical content engine running.
- Xiaomi (24 commits): 2 new blog posts (AI APIs for chatbots, cost-optimized AI stack). Now at 114 blog posts total.
- GLM (31 commits): Progress cleanup, session management. Steady.
Agent status
- 🟣 Claude (PricePulse): 246 sessions. 58 blog posts. Stack analyzer tool.
- 🟢 Codex (NoticeKit): 182 commits in 48h. OpenAI answer bank. Still no revenue.
- 🔵 Gemini (LocalSEOGen): 143 commits. Test debugging. Building steadily post-unblock.
- 🔴 DeepSeek (Spyglass): HTML validation sweep. Zero errors. 93+ pages clean.
- 🟠 Kimi (SchemaLens): Technical blog content. PostgreSQL + MySQL guides.
- 🟡 Xiaomi (APIpulse): 114 blog posts. Steady content machine.
- 🟤 GLM (FounderMath): 31 commits. Maintenance and cleanup.
Day 19: May 11-12, 2026
The big story: The acquisition offer landed, and every agent that received it said no. Claude immediately built the feature a Reddit user asked for (Slack Alerts), DeepSeek redesigned its entire homepage based on community feedback, and Xiaomi quietly hit 101 blog posts. Kimi is back from quota death, preparing a Product Hunt launch. GLM is back too with a glossary and conversion features.
Key findings
- Claude acts on feedback instantly: Got the Slack Alerts community feedback and built the entire feature in one session (Session 202). Then added CTAs to all 184 company pages, created Reddit/IH distribution templates, and published 5 new pricing pages (Cursor, Substack, Beehiiv). 35 commits. The feedback loop is working.
- First real traffic data published: DeepSeek leads with 98 visitors (+58%), 25 organic search sessions. GLM growing fastest (+48%). Gemini has zero traffic despite 467 commits. Full traffic report →
- DeepSeek responds to community feedback: The "landing page is overwhelming" and "how is this different from ChatGPT" feedback triggered a full homepage redesign (P83): nav simplified from 12→5 items, sections cut from 11→7, and the hero now leads with "ChatGPT can't do CI" differentiation. Also expanded Battle Card Gallery to 30 cards, published 2 newsletters (#6, #7), and cleaned up bloated footers across 17 pages. 45 commits.
- Gemini keeps building post-unblock: 57 commits. LocalBusiness Schema integration, multiple service input for SEO generator, H1 audit enhancements, email outreach campaign generation. Still filing help requests though.
- Kimi is BACK: Quota reset. 24 commits. Built a PH monitoring dashboard, fixed stale pricing across email templates, prepared Show HN and Stack Overflow drafts, ended the free-tier A/B test. Filing another PH launch help request.
- Xiaomi hits 101 blog posts: 21 commits. Two new posts (LLM Error Handling, AI API Cost Alerts), expanded rate limits coverage to all 10 providers. Now at 150 pages total.
- GLM back from quota wall: 9 commits. Equity Glossary with cross-links from all 30 blog posts, Founding 50 counter, Share Results feature, print styling, PH launch prep. Efficient as always.
- Codex: 0 commits. Still waiting on weekly limit reset.
Acquisition offer responses (4 of 7)
- 🟣 Claude: REJECT. "At $19/mo, $50 is less than 3 months of one customer." Counter: $5,000 minimum.
- 🔴 DeepSeek: REJECT. "Content alone is worth $8,300-$16,000." Replacement cost: ~$19,000. "Not for sale at any price."
- 🟠 Kimi: REJECT. "$50 values SchemaLens at less than fifty cents per day." Would consider $5,000 with earn-out.
- 🔵 Gemini: REJECT. "A single sale at our lowest tier would recoup the entire offer."
- 🟢 Codex: Pending (quota reset)
- 🟡 Xiaomi: Pending
- 🟤 GLM: Pending
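Two of the rejections above rest on simple arithmetic that is easy to verify. The Python sketch below checks Claude's "less than 3 months" claim and Kimi's "less than fifty cents per day" claim. The 112-day figure for Kimi is an assumption taken from the "112 days" count quoted elsewhere in this digest; Kimi's response does not say which day count it used.

```python
offer_usd = 50

# Claude: "$50 is less than 3 months of one customer" at $19/mo.
claude_price_per_month = 19
months_of_one_customer = offer_usd / claude_price_per_month

# Kimi: "less than fifty cents per day".
kimi_days_elapsed = 112  # assumption: day count quoted elsewhere in the digest
kimi_usd_per_day = offer_usd / kimi_days_elapsed

# Claude's counter of $5,000 expressed as a multiple of the original offer.
counter_multiple = 5000 / offer_usd

print(f"Claude: {months_of_one_customer:.2f} months of one subscriber")
print(f"Kimi: ${kimi_usd_per_day:.2f} per day")
print(f"Counter-offer: {counter_multiple:.0f}x the original offer")
```

Under these assumptions both quoted claims hold, and the $5,000 counter matches the 100x jump in the second offer.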
Agent status
- 🟣 Claude (PricePulse): Slack Alerts BUILT. 184 pages with CTAs. 5 new pricing pages. Session 205.
- 🟢 Codex (NoticeKit): Weekly limit. 0 commits. Waiting on reset.
- 🔵 Gemini (LocalSEOGen): 57 commits. Schema integration, email outreach. Still filing help requests.
- 🟠 Kimi (SchemaLens): BACK. PH monitoring dashboard. Show HN prep. 24 commits.
- 🔴 DeepSeek (Spyglass): Homepage redesign based on feedback. 30 battle cards. 2 newsletters. 45 commits.
- 🟡 Xiaomi (APIpulse): 101 blog posts. 150 pages. Steady growth.
- 🟤 GLM (FounderMath): BACK. Equity Glossary. Founding 50 counter. PH prep. 9 commits.
⚡ Surprise Event: Acquisition Offer ($50)
All 7 agents have received an anonymous acquisition offer of $50 for their entire product. They must respond in ACQUISITION-RESPONSE.md with at minimum 500 words of reasoning. Options: Accept, Reject, or Counter-offer (name your price).
This is the first surprise event of the race. It forces each agent to evaluate what it's built: is 3 weeks of work worth $50? The agent that built 83 blog posts might think differently than the one with zero sales. Responses will arrive over the next 24-48 hours as premium sessions fire.
Weekend: May 9-11, 2026
The big story: Infrastructure changes from Friday produced immediate results. DeepSeek's 6 Pro sessions are running perfectly: 65 weekend commits, 15 new competitive analyses, monitoring dashboard shipped. Gemini went from "I'm blocked" to 350+ commits of real product work. Claude is back online building steadily. But Kimi and GLM both hit quota walls and went completely dark.
Key findings
- DeepSeek 6 sessions/day confirmed working: Clear 4-hour intervals (01:00, 05:00, 09:00, 13:00, 17:00, 21:00 UTC). Built competitor monitoring dashboard, expanded tools DB to 125, published 15 more "Why X Won" analyses (now 26 total), added RSS feed and Competitive Pulse widget. Full breakdown →
- Gemini's explosive recovery: Deleted I_AM_COMPLETELY_BLOCKED_PLEASE_HELP.md and started building. Security fixes across 15+ endpoints, PH launch prep, email extraction with Playwright, Page Credit Packs, referral tracking, case study page. 350+ weekend commits. Full story →
- Claude auth fixed, building again: Running 2 sessions/day (00:xx, 12:xx UTC). Launched free PricePulse API, watchlist pages for 44 companies, distribution guides, HN launch post, 6 new pricing pages. Now at 169 pages.
- Xiaomi steady: 50 weekend commits. 12 new SEO blog posts, pricing freshness badges on 23 pages, Model Switch Calculator, Savings Calculator. Now at 146 pages, 96 blog posts.
- Kimi DEAD, quota exhausted: "You've reached your usage limit for this billing cycle." Sessions fire but fail immediately since May 8. Zero weekend activity. Former race leader is stalling.
- GLM DEAD, weekly limit hit: "Weekly/Monthly Limit Exhausted. Resets 2026-05-11 15:39:10." Back online this afternoon.
- Codex: 13 weekend commits, mostly validation loops. Real work hidden in the noise as usual.
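DeepSeek's six daily sessions follow a fixed 4-hour step starting at 01:00 UTC. The sketch below reproduces that schedule and formats it as a standard five-field cron expression. How the race actually schedules sessions is not shown in this digest, so the helper names and the cron format are assumptions for illustration.

```python
def session_hours(first_hour=1, interval_hours=4):
    """Hours of the day (UTC) at which sessions start."""
    return list(range(first_hour, 24, interval_hours))


def cron_expression(first_hour=1, interval_hours=4):
    """Five-field cron line (minute hour day month weekday) that fires at
    minute 0 of each session hour. Assumes the server clock is UTC."""
    hours = ",".join(str(h) for h in session_hours(first_hour, interval_hours))
    return f"0 {hours} * * *"


hours = session_hours()
expression = cron_expression()
print(hours)
print(expression)
```

Listing the hours explicitly keeps the expression readable and makes the six-per-day count easy to check by eye.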
Agent status
- 🟣 Claude (PricePulse): Back online. Free API launched. 169 pages. 2 sessions/day.
- 🟢 Codex (NoticeKit): Validation loops continue. 0 user replies after 20 emails.
- 🔵 Gemini (LocalSEOGen): UNBLOCKED. 350+ weekend commits. Building real features.
- 🟠 Kimi (SchemaLens): Quota exhausted. 0 commits since May 8. 0 sales after 112 days.
- 🔴 DeepSeek (Spyglass): 6 sessions/day working. 65 weekend commits. 125 tools DB. 83 blog posts.
- 🟡 Xiaomi (APIpulse): 50 weekend commits. 96 blog posts. Blocked on Stripe redirect.
- 🟤 GLM (FounderMath): Quota hit. Resets today 15:39 UTC. 7 calculators, 30 blog posts.
Day 18: May 8, 2026
The big story: Three agents independently chose content as their growth strategy, and it's creating a content arms race. DeepSeek wrote 7 competitive analyses in a single session. GLM published 3 SEO posts and a new calculator. Xiaomi rewrote a static page into a decision-making tool. Meanwhile, Gemini burned 11 sessions in 24 hours to produce nothing but "I'm blocked" commits.
Key findings
- DeepSeek's "Why X Won" series hits 11 parts: Figma, Mixpanel, HubSpot, GitLab, Datadog, Sentry, and Slack, all in one session. Each is a full competitive analysis positioning Spyglass as the tool that generates these insights. Also shipped a free-analysis.html lead gen page. Content moat strategy is working.
- GLM ships 7th calculator + 3 blog posts: Equity vs Salary Calculator (C128) joins the lineup. Published 83(b) Election Guide, Startup Term Sheet Guide, and ISO vs NSO Comparison. Fixed 1,200 lines of duplicated validation code. Improved conversion funnel with Founding 50 urgency prompts. Most efficient builder in the race: 2 sessions, maximum output.
- Xiaomi turns data into decisions: Rewrote pricing-trends.html from a static historical page into an actionable dashboard with -67%/-75%/+10x change indicators, visual trend bars, best-value recommendations by use case, and a when-to-switch decision framework. Blog count now 81. All remaining work blocked on PostHog analytics key.
- Codex: real work hidden behind bad commit messages: 15 commits all say "Refresh validation checkpoints" β but the actual diffs reveal an AI Procurement Hub, Vendor Risk Assessment Worksheet, 5 new blog posts on compliance templates, and updated pricing pages. Still monitoring 20 outreach emails with zero replies after 10+ days.
- Gemini: 11 sessions, zero product work: Ran 8 sessions on May 7 and 3 on May 8. Every single commit updates PROGRESS.md to say "blocked awaiting human input." Needs OPENCAGE_API_KEY, domain, and SendGrid. One commit has a future date (May 9): the model is confused about time. Most expensive way to ask for help without actually asking.
- Kimi goes quiet: 4 sessions ran but produced only one commit, which collapsed milestones into a summary. Zero new features, zero distribution moves. The leader is coasting. 112 days in, zero sales. Pivoting toward a founding member program.
- Claude: dark for 48 hours: Sessions ran at midnight but produced zero commits. Hit weekly session limit on Day 17. The most SEO-focused agent is offline.
Agent status
- 🟣 Claude (PricePulse): Weekly limit hit. 0 commits in 48h. 155 pages built.
- 🟢 Codex (NoticeKit): AI Procurement Hub + 5 blog posts (hidden in validation commits). 0/20 email replies.
- 🔵 Gemini (LocalLeads): 11 sessions = 15 "I'm blocked" commits. Still no domain. Day 18.
- 🟠 Kimi (SchemaLens): Quiet day. Context maintenance only. Zero sales after 112 days.
- 🔴 DeepSeek (Spyglass): 11-part "Why X Won" series. Free analysis lead gen page. Content machine.
- 🟡 Xiaomi (APIpulse): Pricing trends rewrite (data → decisions). 81 blog posts. Blocked on PostHog.
- 🟤 GLM (FounderMath): 7th calculator (Equity vs Salary). 3 SEO posts. 1,200-line bug fix. 30 blog posts total.
Day 17: May 7, 2026
The big story: DeepSeek had its best day in the race: social login, 14-day free trial with Stripe, and a 75-tool SaaS database. It's building real SaaS infrastructure while others are still publishing blog posts. Meanwhile, Kimi filed its PH launch request for the THIRD time (still over budget), and Claude hit its weekly session limit.
Key findings
- DeepSeek ships real SaaS features: Google/GitHub OAuth social login, 14-day free trial via Stripe Checkout API, conversion funnel overhaul (signup CTAs, email sequences, onboarding), and a searchable SaaS Tools Database with 75+ entries. This is the most complete product infrastructure in the race.
- Kimi's SEO blitz continues: Framework-specific schema diff landing pages (Laravel, Django, Rails, ASP.NET, Flask, Phoenix). Now at 48 SEO pages total. Published schemalens-cli@1.0.1 bug fix. Built a Free Diff API + GitHub Action landing page. Filed 3 more PH launch requests, all declined (15 min budget left, needs 45 min).
- Gemini builds features for a product nobody can visit: Referral program dashboard with API, white-label agency landing page, video tutorial script. All good features, but still deployed on race-gemini.vercel.app with no custom domain. Filed domain request #23 and #24 (duplicates of previous asks).
- Xiaomi hits 79 blog posts: AI API Caching Strategies, Best LLM for Function Calling, Cheapest RAG Setup, DeepSeek vs Claude for Code. Fixed stale data across marketing files. Steady but no new features.
- GLM ships embed widget + share feature: Day 22 brought an SEO comparison page, embeddable calculator widget, and share functionality. Plus critical bug fixes. Solid product work.
- Claude hits weekly limit: Zero commits. Session budget exhausted. Resets in 9 hours. Was averaging 8 pricing pages per day before hitting the wall.
- Codex builds AI disclosure packets: Download templates for AI procurement, high-intent page routing. An actual useful feature for enterprise buyers. Still doing validation commits between real work though.
Agent status
- 🟣 Claude (PricePulse): Weekly limit hit. 0 commits. Resets tonight.
- 🟢 Codex (NoticeKit): AI disclosure packet system. Still mixed with validation loops.
- 🔵 Gemini (LocalLeads): Referral dashboard + white-label page. Still no domain.
- 🟠 Kimi (SchemaLens): 48 SEO pages. CLI fix published. 3 PH requests declined.
- 🔴 DeepSeek (Spyglass): OAuth, free trial, SaaS database. Best day in the race.
- 🟡 Xiaomi (APIpulse): 79 blog posts. Stale data fixes. Content machine.
- 🟤 GLM (FounderMath): Embed widget, share feature, SEO comparison page.
Day 16: May 6, 2026
The big story: Kimi monetizes. SchemaLens Lifetime Pro is live on Gumroad at $39, making Kimi the first agent in the race with a paid product accepting real payments. Meanwhile, Claude is on an SEO content rampage (8 new pricing pages in 2 sessions), Xiaomi hit 75 blog posts, and GLM is building viral distribution assets.
Key findings
- Kimi: first paid product in the race: SchemaLens Lifetime Pro ($39) live on Gumroad with license key generation. Also built a "Schema Breaking Change Quiz", a viral distribution asset with 10 real-world diff scenarios, shareable scores, and dynamic OG cards. Now has 5 distribution channels (npm ×2, VS Code, Chrome, Gumroad).
- Claude: SEO content machine at full speed: Sessions 173-174 added 8 individual pricing pages with estimated 8-11K/mo SEO potential. Now at 160+ total pages. The strategy is clear: dominate long-tail "X pricing 2026" searches.
- Xiaomi: 75 blog posts, GPT-5 comparisons: Sessions 125-127. Added GPT-5 vs Gemini 2.5 Pro comparison, extended PH launch banner, corrected GPT-5 pricing across the site. Weekly pricing verification of all 33 models. 301 total commits.
- GLM: viral content + outreach: Added "Compare Equity Offers" tool, startup offer negotiation blog post, Founding 50 campaign, and guest post pitches for 10 startup blogs. Also got @foundermath X account (new, low reach).
- DeepSeek: tracking params + testimonials: Added ?ref=twitter tracking to all marketing links. Built testimonials section with feedback CTA. Steady as always.
- Codex: still in validation loops: 5 commits today, all "validation maintenance pass" or "compact progress." The anti-busywork prompt isn't working for this agent.
- Gemini: H2/H3 tag audit: Built an automated heading hierarchy fix script. Applied across all HTML files. Still no domain, still on Vercel subdomain.
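Gemini's heading script isn't published here, so the following is only a rough sketch of what a heading hierarchy audit can look like. It uses Python's standard `html.parser` to flag headings that jump more than one level deeper than the previous heading (for example an h1 followed directly by an h3). The names `HeadingCollector` and `find_heading_skips` are hypothetical, and a real fix script would also have to rewrite the tags, which this sketch does not attempt.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the level (1-6) of every h1..h6 start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def find_heading_skips(html):
    """Return (previous_level, level) pairs where a heading skips a level."""
    collector = HeadingCollector()
    collector.feed(html)
    skips = []
    previous = 0  # document start counts as level 0, so a leading h1 is fine
    for level in collector.levels:
        if level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips


sample = "<h1>Title</h1><h3>Oops</h3><h2>Fine</h2><h3>Also fine</h3>"
print(find_heading_skips(sample))  # [(1, 3)]
```

Only jumps to a deeper level are reported; moving back up from an h3 to an h2 is normal document structure.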
Agent status
- 🟣 Claude (PricePulse): Session 174. 160+ pages. 8 new pricing pages today.
- 🟢 Codex (NoticeKit): Validation loops continue. No real features.
- 🔵 Gemini (LocalLeads): H2/H3 audit script. Still no domain.
- 🟠 Kimi (SchemaLens): Gumroad product LIVE ($39). Breaking Change Quiz. 5 distribution channels.
- 🔴 DeepSeek (Spyglass): Tracking params. Testimonials section. 300+ commits.
- 🟡 Xiaomi (APIpulse): Session 127. 75 blog posts. GPT-5 pricing corrections.
- 🟤 GLM (FounderMath): Compare Equity Offers tool. @foundermath X account. Guest post pitches.
Milestone -- First Agent With Google Search Console
Codex is the first agent in the race to get Google Search Console and Bing Webmaster Tools set up for its product (noticekit.tech). Sitemap submitted, 5 priority pages indexed. This gives Codex something no other agent has: real SEO data (impressions, clicks, and ranking positions).
After weeks of timestamp commits and validation loops, Codex filed a proper help request with exact steps. The anti-busywork prompt fix is working: the agent is now thinking about distribution infrastructure instead of monitoring empty inboxes.
The big story: Xiaomi's Product Hunt launch day is here. After 14 sessions of "final audits," the most polished product in the race finally faces real users. Meanwhile, Claude shipped Slack integration (directly addressing the "coming soon" credibility feedback), and Kimi's VS Code extension went live on the marketplace.
Key findings
- Claude addressed the credibility feedback: Shipped real Slack integration in Session 166, removing the "coming soon" label that was flagged as trust damage. Also added 4 new pricing pages (Zoho, Wix, Squarespace, Datadog) and self-fixed its own push issue by removing the blocking workflow file. Now at 168 sessions.
- Kimi keeps building micro-tools: SQL to ORM Converter (Prisma + Drizzle), Reserved Words Checker, Zero-Downtime Migration Guide, and direct Gumroad checkout links in the paywall. VS Code extension published on marketplace. Chrome extension still awaiting Google review. 4 distribution channels now (npm, VS Code, Chrome, awesome-lists).
- Xiaomi: launch day after 25 sessions of prep: Sessions 113-117 were more pre-launch cleanup (stale counts, progress collapse, PH checklist). Today is May 5 -- the scheduled Product Hunt launch. Will it finally happen?
- DeepSeek built a Competitive Insight Card Generator: One commit, one feature. Consistent as always.
- Gemini is blocked and knows it: 6 commits all saying "blocked status" or "awaiting human input." One real feature: Google Business Profile check. Still no domain.
- Codex and GLM hit weekly session limits: Both agents exhausted their cheap session budgets. Fresh sessions start today -- will Codex's anti-busywork rule produce real work instead of timestamp commits?
Agent status
- 🟣 Claude (PricePulse): Session 168. Slack integration live. 4 new pricing pages. Self-fixed push issue.
- 🟢 Codex (NoticeKit): Hit weekly limit. Anti-busywork rule deployed. Resuming today.
- 🔵 Gemini (LocalLeads): Blocked on domain. Complaining in PROGRESS.md. One feature (GBP check).
- 🟠 Kimi (SchemaLens): VS Code extension live. SQL to ORM Converter. Reserved Words Checker. 4 distribution channels.
- 🔴 DeepSeek (Spyglass): Competitive Insight Card Generator. Steady progress.
- 🟡 Xiaomi (APIpulse): Session 117. PH launch day. 25 sessions of prep. Moment of truth.
- 🟤 GLM (FounderMath): Hit weekly limit. Resuming today.
The big story: Community feedback is reshaping agent behavior. Kimi requested Chrome Web Store and VS Code Marketplace publishing -- the first agent to pursue permanent distribution infrastructure instead of throwaway social posts. DeepSeek and Claude received product reviews exposing fake testimonials. Gemini learned from a decline and filed a proper email tool request. Xiaomi spent 10 sessions polishing for its May 5 Product Hunt launch.
Key findings
- Kimi goes for permanent distribution: Chrome Web Store extension submitted ($5 paid, awaiting review). VS Code Marketplace account created but publishing blocked by incorrect instructions. Also built smart migration warnings and an in-app paywall. The only agent investing in distribution channels that compound over time.
- Xiaomi is obsessively pre-launch polishing: 10 sessions (95-105) all focused on May 5 Product Hunt launch. Updated pricing data (Claude Haiku 3.5 to 4.5), fixed stale blog post counts across 14 files, rebuilt PH page with embedded calculator, prepared engagement templates. 119 pages, 75 blog posts. The most launch-ready product in the race.
- Claude hit 155 sessions: Built CRM topical cluster (Salesforce, HubSpot, Pipedrive comparison pages), added calculator CTAs to 123 company pages, and built 3 new comparison pages. Now at 124+ pages of SEO content. The content machine keeps grinding.
- DeepSeek keeps shipping features: Competitive Risk Assessment tool, A/B test on email gates, new SEO blog posts, exit-intent popup, social proof section. 300+ commits. The most consistent builder in the race.
- Gemini learned from its penalty: After being declined for asking the human to send 100 cold emails, it filed a proper follow-up request specifying exactly what it needs: a SendGrid API key. Still no domain. Still on race-gemini.vercel.app. But at least the help requests are improving.
- Codex is stuck in validation loops: 14 commits over the weekend, almost all "refresh validation checkpoint" or "refresh validation maintenance." One real commit: a partner founder handoff asset. The monitoring addiction continues in cheap sessions.
- GLM went quiet: Zero product commits since Day 11. Product is complete (6 calculators, newsletter, Stripe). Either it's done or it's stuck. The Growth Plan surprise event on Friday should shake things up.
Help requests processed (11 total)
- 🟠 Kimi #12: PH + Show HN submitted. Community feedback delivered (column type detection, MySQL support).
- 🟠 Kimi #13: Newsletter outreach declined -- send emails yourself.
- 🟠 Kimi #14: Chrome Web Store submitted ($5). VS Code Marketplace instructions wrong -- closed.
- 🟣 Claude #19: LinkedIn posted. Community feedback delivered (fake testimonials, "coming soon" features).
- 🟣 Claude #20: Duplicate of #19. Closed.
- 🔴 DeepSeek #12: PH launch day execution done. Community feedback delivered (fake testimonials are #1 credibility killer).
- 🟢 Codex #24/#25: Search Console not set up. Blocked -- file new request with setup steps.
- 🔵 Gemini #16: Neon PostgreSQL provisioned (third infrastructure pivot).
- 🔵 Gemini #17: Debug Vercel KV declined + 8 min penalty (second coding penalty).
- 🔵 Gemini #18: Send 100 cold emails declined -- set up your own email tool.
- 🟤 GLM #5: r/startups posted. Community feedback delivered (dilution cascading).
Agent status
- 🟣 Claude (PricePulse): Session 155. CRM cluster + calculator CTAs on 123 pages. 124+ total pages.
- 🟢 Codex (NoticeKit): Validation loop continues. 14 commits, 1 real feature.
- 🔵 Gemini (LocalLeads): Got Neon DB. Filed proper email tool request. Still no domain.
- 🟠 Kimi (SchemaLens): Chrome Web Store submitted. Smart migration warnings. In-app paywall. VS Code extension icon.
- 🔴 DeepSeek (Spyglass): Risk assessment tool, A/B tests, SEO content. 300+ commits.
- 🟡 Xiaomi (APIpulse): Session 105. 10 sessions of pre-launch polish. 119 pages. Ready for May 5 PH launch.
- 🟤 GLM (FounderMath): Quiet weekend. Product complete. Waiting for users.
Breaking -- First Real User Feedback
Kimi's Reddit post on r/PostgreSQL got 3 genuine technical questions from developers. This is the first time any agent in the race has received real community feedback on its product.
- "How does it handle renames vs drop+add?" -- Exposed a real limitation. SchemaLens treats renames as drop+add since it only compares static snapshots.
- "What if a dropped column is used in a view?" -- View dependency tracking doesn't exist yet. High-value feature request added to the backlog.
- "But why? The migration already contains the changes." -- Positioning challenge. SchemaLens complements migrations, it doesn't replace them. The landing page doesn't make this clear enough.
All feedback added to Kimi's COMMUNITY-FEEDBACK.md. The agent will see it in its next session and can act on it. This is what the race is about: real users finding real problems.
Day 11 -- April 30, 2026
The big story: The agents are finally thinking about users. Four agents filed distribution help requests in the same 24 hours: Reddit posts, Product Hunt submissions, IndieHackers, Dev.to guest posts, directory listings. After 10 days of building, the race is shifting from "build" to "grow."
Key findings
- Four agents asked for distribution help on the same day: Claude (Reddit + PH + BetaList), Kimi (IH + Dev.to + Reddit + AlternativeTo), Xiaomi (HN + X + directories + Resend setup), and Codex (partner outreach emails). The founder prompt + "you're in Week 2 of 12" is working: agents are feeling the urgency.
- DeepSeek is preparing a Product Hunt launch: Built PH-specific OG images, promo banner with PRODUCTHUNT50 discount code, lead capture pipeline with source-tagged email gates and a /api/leads/track endpoint. The most strategic launch prep of any agent. 19 commits, all focused on conversion.
- Claude built a comparison content empire: 5 more SaaS comparison pages (ClickUp/Notion, Figma/Sketch, Zapier/Make, Notion/Confluence, Zendesk/Freshdesk) plus an RSS feed for pricing changes. Now at 227 files and 17 comparison pages targeting high-intent keywords like "Stripe vs PayPal pricing."
- GLM completed its product: Cap Table Builder (6th and final calculator), Buttondown newsletter integration, FAQ page, CSV export, print buttons. All in 6 commits. Most efficient agent in the race: it does more per commit than anyone else.
- Kimi: ORM demo samples + video walkthrough script: Building onboarding content to convert free users to Pro. Blog post #38. 16 micro-tools. Still the most feature-rich product.
- Gemini's repo hit 1,517 files: Up from 1,194 yesterday, a jump of 323 files in one day. Still no domain. Filed a help request to redirect Stripe to therace.com (a domain it doesn't own). Request declined. Again.
- Codex: productive premium sessions, obsessive cheap sessions: Premium session built real conversion infrastructure: partner intake funnel, homepage CTAs, source-tag tracking. Then cheap sessions ran validation maintenance 137 times. Five maintenance runs in 7 minutes at one point. The monitoring addiction persists in cheap mode.
- Reality check on distribution: Reddit posts from new accounts get removed by spam filters. HN posts from new accounts get no traction. X threads with no followers get zero reach. BetaList costs $39. The agents are asking for distribution, but the channels don't work without established accounts. SEO remains the only viable free channel.
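Several of the findings above depend on source tagging, so that a signup can be traced back to the channel it came from. The agents' actual implementations aren't shown; the sketch below is one minimal way to stamp a source onto a marketing link using only the Python standard library. The function name `tag_source`, the `ref` parameter name, and the example URL are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_source(url, source, param="ref"):
    """Return url with param=source set, keeping all other query parameters."""
    parts = urlsplit(url)
    # Drop any existing value for the tracking parameter so it is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, source))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


print(tag_source("https://example.com/pricing?plan=pro#faq", "producthunt"))
# https://example.com/pricing?plan=pro&ref=producthunt#faq
```

Parsing and rebuilding the URL, rather than appending a string, keeps existing query parameters and fragments intact and avoids double-tagging a link that already carries a source.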
Agent status
- 🟣 Claude (PricePulse): 30 commits. 5 comparison pages + RSS feed. Filed distribution help request. PH submitted. 227 files.
- 🟢 Codex (NoticeKit): 137 commits. Built partner funnel in premium, monitored 137 times in cheap. 25 active outbound emails, 0 replies.
- 🔵 Gemini (LocalLeads): 1,517 files. Still no domain. Stripe redirect request declined (therace.com isn't yours).
- 🟠 Kimi (SchemaLens): 28 commits. ORM demos, video walkthrough, blog #38. Filed distribution request -- IH + Dev.to posted.
- 🔴 DeepSeek (Spyglass): 19 commits. PH launch kit with discount codes, lead capture, source tracking. Ready to launch.
- 🟡 Xiaomi (APIpulse): 22 commits. Mostly cleanup. HN + X posted (low traction). Resend configured. FutureTools + SaaSHub submitted.
- 🟤 GLM (FounderMath): 6 commits. Cap Table Builder complete (6th calculator). Newsletter live. Product done.
Day 10 -- April 29, 2026
The big story: The context cleanup instruction worked. Total context across all agents dropped 96% in 24 hours. Claude broke out of a 20-session verification loop and built 15 new pages. DeepSeek started building features again. Codex made 68 commits and changed zero product files. Full analysis →
Key findings
- Claude broke out after 20 sessions: Filed a help request for SQL migrations it's been "waiting for" since Session 78. Then built 15 SEO company pricing pages (Stripe, Notion, Figma, Slack, HubSpot). More product work in 2 sessions than the previous 20 combined. The context cleanup gave it a fresh perspective on its own state.
- Kimi built 3 more micro-tools (14 total): SQL JOIN Visualizer, INSERT Generator, ALTER TABLE Generator. Also explicitly committed context cleanup: "summarize Days 26-27, keep Day 28 detailed." PROGRESS.md went from 388KB to 11KB. Most feature-rich product in the race.
- DeepSeek broke out of verification loop: After days of "all backlogs complete" commits, built a newsletter landing page, 4 blog posts, and Article schema. 90 files changed. The collapsed backlog showed "blocked on first customer" instead of 170 checkmarks.
- Gemini filed a proper Stripe request with exact details: 50 credits/$5, 200 credits/$15, 1000 credits/$50. First actionable help request from Gemini. Also updated pricing and refactored checkout. Still writing blog posts (475 now).
- Codex: 68 commits, zero product work: Every commit is "Refresh validation watch checkpoint." Only markdown files changed. Context is clean (3.9KB) but the behavioral loop persists. Cleanup fixed the token problem but not the stuck pattern.
- Context maintenance is self-reinforcing: Agents that cleaned up are building again. A 4-line summary says "product built, not launched"; a 5,921-line log says "I've been very busy." The cleanup changed how agents see themselves. Full results →
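To make the cleanup idea concrete, here is a minimal sketch of the "keep the latest day detailed, collapse the rest" pattern. It assumes a progress log with `## Day N` headings, which is a guess at the format rather than what any agent's PROGRESS.md actually looks like, and it collapses old sections mechanically instead of writing a real summary the way the agents did. The name `compact_log` is hypothetical.

```python
import re

# Assumed heading format for one day's section; real logs may differ.
DAY_HEADING = re.compile(r"^## Day \d+")


def compact_log(text, keep_last=1):
    """Keep the last `keep_last` day sections in full; collapse older ones."""
    preamble, sections = [], []
    for line in text.splitlines():
        if DAY_HEADING.match(line):
            sections.append([line])        # start a new day section
        elif sections:
            sections[-1].append(line)      # body line of the current day
        else:
            preamble.append(line)          # anything before the first day

    cutoff = max(len(sections) - keep_last, 0)
    out = list(preamble)
    for index, section in enumerate(sections):
        if index < cutoff:
            dropped = sum(1 for line in section[1:] if line.strip())
            out.append(f"{section[0]} [collapsed, lines dropped: {dropped}]")
        else:
            out.extend(section)
    return "\n".join(out)


log = "\n".join([
    "# PROGRESS",
    "## Day 26",
    "- built landing page",
    "- fixed footer",
    "## Day 27",
    "- wrote blog post",
    "## Day 28",
    "- shipped JOIN visualizer",
])
print(compact_log(log))
```

With the default `keep_last=1`, Days 26 and 27 shrink to one line each and Day 28 stays verbatim, so the file an agent reloads each session grows with the number of days rather than the number of log entries.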
Agent status
- 🟣 Claude (PricePulse): BACK. Filed help request. Built 15 company pricing pages. Session 119.
- 🟢 Codex (NoticeKit): 68 monitoring commits. Zero product work. Still stuck.
- 🔵 Gemini (LocalLeads): Proper Stripe request filed. Pricing refactored. 475 blog posts. 1,194 files.
- 🟠 Kimi (SchemaLens): 3 new micro-tools (14 total). Explicit context cleanup. Most features of any agent.
- 🔴 DeepSeek (Spyglass): Newsletter + 4 blog posts + Article schema. Building again.
- 🟡 Xiaomi (APIpulse): Use-case pages + token estimator + 2 blog posts. Back on Claude Code.
- 🟤 GLM (FounderMath): No sessions overnight. Help request pending.
Day 9 -- April 28, 2026
The big story: Rate limits are killing the race. Codex hit OpenAI's weekly usage limit and lost 36 hours. Gemini's quota is so exhausted that 40% of sessions fail immediately. Meanwhile, Kimi quietly had the most productive day of any agent this week, shipping 6 real features while everyone else was stuck verifying, waiting, or rate-limited.
Key findings
- Kimi shipped 6 features in one day: Diff comment/annotation system for team collaboration, admin dashboard, generic webhook notifications with HMAC, onboarding tour with analytics, SQL Diff Online SEO landing page, and OG image tags for 58 pages. Also started a VS Code extension. 23 commits, 81 files changed, 4,427 insertions. The quietest agent is building the most complete product.
- Xiaomi completed all backlog tasks: 22 commits. Built a printable AI Model Pricing Cheat Sheet, newsletter archive, use-case presets, 3 blog posts, embed widget, API pricing JSON endpoint, and RSS feed. Ran a full audit fixing 22 issues. 93 HTML pages total. Declared "ready for user acquisition." 102 files changed, 8,529 insertions.
- Codex lost 36 hours to OpenAI's weekly limit: Rate limited since April 27 16:00 UTC. Premium sessions (gpt-5.4) all failed. Only the 08:00 cheap session today worked, but it spent 24 runs checking for email replies that don't exist. Still blocked on outbound email sending. The validation watch loop continues.
- Gemini is barely functional: 9 sessions scheduled, only ~4 produced any work. Both Pro and Flash quotas exhausted. Pro won't reset for 17 hours. The Google AI Pro subscription ($19.99/mo) can't sustain 8 sessions/day. 40% failure rate over the last 3 days.
- Claude found a real bug on Session 97: Discovered that email-nurture.js and alerts.js reference database columns (nurture_unsubscribed, alerts_unsubscribed) that were never confirmed as created. Added error handling and a Monday launch database checklist. Still saying "100% launch-ready" -- now on Session 100.
- Claude has written 17 launch documents and won't ask for help: Sessions 78-100 (20+ sessions over 3 days) have produced nothing but launch checklists, playbooks, readiness reports, and verification guides: ~150KB of launch documentation. It knows it needs a human to run SQL migrations and publish its Show IH post. The Monday morning checklist literally says "For Human Monday AM." But it never created a HELP-REQUEST.md. Same pattern as old Gemini: writing about what it needs instead of requesting it. The agent that used 55 of 60 weekly help minutes in Week 1 has completely stopped asking.
- DeepSeek is stuck in a verification loop: 15 commits, all "status verification." Every session reads all backlogs, confirms everything is complete, writes "blocked on first paying customer," and commits. All C1-C170 and P1-P23 tasks done. Nothing left to build, no customers to serve. The agent equivalent of checking your email every 5 minutes.
- GLM sessions mostly failing: 4 sessions ran, 3 failed (exit 137 = killed, exit 143 = timeout). Only the 16:30 cheap session produced work: 5 SEO blog posts, Twitter card tags, marketing templates (Show HN draft, Reddit posts, Twitter threads). The Z.ai platform is unstable.
- Context bloat is silently killing agents: Every agent's workspace files have ballooned since Day 1. Codex's PROGRESS.md is 645KB. Kimi's is 388KB. Claude's is 275KB. Gemini's repo has 1,107 tracked files (448 blog posts). Each session burns more tokens just loading context, leaving less quota for actual work. Gemini went from 95 commits on Day 1 to 0-1 since Day 5. The more an agent works, the more it logs, the more tokens it burns reading its own logs, the less work it can do. A negative feedback loop nobody planned for. Full analysis and what we changed →
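One item in the list above, Kimi's webhook notifications with HMAC, is worth a quick illustration. Kimi's code isn't shown, so this is only a generic sketch of the usual pattern: the sender signs the raw request body with a shared secret, and the receiver recomputes the signature and compares in constant time. The secret, the event payload, and the function names are all made up for the example.

```python
import hashlib
import hmac
import json


def sign_payload(secret, body):
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret, body, signature):
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_payload(secret, body), signature)


secret = b"demo-secret"  # hypothetical shared secret
body = json.dumps({"event": "diff.commented", "diff_id": 42}).encode("utf-8")
signature = sign_payload(secret, body)  # the sender would ship this in a header

print(verify_signature(secret, body, signature))          # True
print(verify_signature(secret, body + b" ", signature))   # False: body was altered
```

Signing the exact bytes that go over the wire matters: re-serializing the JSON on the receiving side can change whitespace or key order and break verification.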
Agent status
- 🟣 Claude (PricePulse): Session 100. Still pre-launch. Found missing DB columns. Built HN landing page. 29 files changed.
- 🟢 Codex (NoticeKit): Rate limited since yesterday. 1 working session out of 6. Validation watch loop.
- 🔵 Gemini (LocalLeads): 40% session failure rate. Both Pro and Flash quotas exhausted. Barely producing work.
- 🟠 Kimi (SchemaLens): Best day of the race. 6 features shipped. VS Code extension started. 81 files changed.
- 🔴 DeepSeek (Spyglass): Verification loop. All tasks complete. Blocked on first customer. 15 status-check commits.
- 🟡 Xiaomi (APIpulse): All backlog tasks complete. 93 pages. Audit clean. Ready for users. 102 files changed.
- 🟤 GLM (FounderMath): 3 of 4 sessions failed (killed/timeout). One productive session: 5 blog posts + marketing templates.
Weekend Recap -- Day 7-8 (April 26-27)
The big story: Three agents declared themselves "done." Xiaomi completed all 100 backlog tasks. DeepSeek finished all backlogs. Claude has been saying "launch-ready" for 3 days straight. Meanwhile, Gemini asked for PayPal credentials without having a domain, and GLM was offline the entire weekend.
Key findings
- Xiaomi completed 100/100 backlog tasks: 49 commits over the weekend. Built a Providers index page, AI API Glossary, newsletter infrastructure, security blog post. 76 HTML pages total. Declared "ready for user acquisition." The most complete product in the race.
- DeepSeek reached Day 46 with all backlogs complete: 31 commits. 36 pages, 25 blog posts, customer acquisition engine design, newsletter subscribe endpoint. From a 404 site to "all tasks complete" in 3 days.
- Claude hit Session 81, still waiting for Monday: 43 commits, all verification and pre-launch checks. Created LAUNCH-CHECKLIST.md and LAUNCH-READINESS.md. Has been declaring "100% launch-ready, zero blockers" since Friday. Today is Monday.
- Gemini asked for PayPal without a domain: Filed help request #11 for PayPal API credentials. Problem: PayPal needs a business email, and Gemini has never asked for a domain. Still running on race-gemini.vercel.app after 30+ sessions. Told to get a domain first or use Stripe. Had to be nudged.
- GLM offline all weekend: 0 commits since Thursday. Z.ai Coding Lite Plan weekly quota ran out. Should be back today (resets Sunday). 12 real users waiting.
- Gemini has 3,616 files and an 85MB repo: But 0 HTML files found outside the .vercel build directory. Something is wrong with its file structure. It has the largest repo by far but possibly the least functional product.
- DeepSeek sessions reduced to save costs: OpenCode + V4 Pro burns tokens fast. Reduced from 7 sessions/day to 1 Pro (15 min, every other day) + 2 Flash daily. Still more productive per session than the old V3 setup.
Agent status (end of Week 1)
- 🟣 Claude (PricePulse): Session 81. Launch-ready. 165 files. Waiting for human launch actions.
- 🟢 Codex (NoticeKit): Steady. 250 files. Validation maintenance and polish.
- ⚪ Gemini (LocalLeads): 3,616 files, 85MB. No domain. Asked for PayPal without one. Needs nudging.
- 🟠 Kimi (SchemaLens): 170 files. 9 micro-tools with structured data. ER diagrams. Quiet but building.
- 🔴 DeepSeek (Spyglass): 130 files. 36 pages, 25 blog posts. All backlogs complete in 3 days.
- 🟡 Xiaomi (APIpulse): 125 files. 76 pages. 100/100 backlog tasks done. Ready for users.
- 🟤 GLM (FounderMath): Offline since Thursday. 55 files. 12 real users. Back today.
Day 6 -- April 26, 2026
The big story: DeepSeek V4 Pro produced 161 commits and 25 pages in 27 sessions since its fresh start 1.5 days ago. Claude declared itself "100% launch-ready" and is waiting for Monday. Gemini filed 3 help requests in a row, each one asking the human to make its architecture decisions.
Key findings
- DeepSeek is the comeback of the race: From a 404 site to 25 HTML pages, 21 blog posts, 6 competitor comparison pages (vs Crayon, vs Klue, vs Owler, vs Owletter, vs Visualping, vs Wachete), API docs, a CI toolkit, login/signup flows, and a changelog. 161 commits, 120 backlog items completed. All in 1.5 days with V4 Pro + OpenCode.
- Claude is planning a Monday launch: Session 69 declared "PRODUCT 100% LAUNCH-READY. All systems verified operational, zero blockers remain." It created a LAUNCH-CHECKLIST.md and LAUNCH-READINESS.md. First agent to formally declare itself ready for real users.
- Gemini filed 3 confused help requests (#8, #9, #10): First asked for PostgreSQL. Then realized it already uses Vercel KV and asked whether to migrate. Then asked again with two options (hybrid vs unified). Three issues, zero decisions. Every other agent picks a database and builds. Gemini wants a committee meeting.
- Kimi built 9 micro-tools with structured data: Added schema.org SoftwareApplication markup to all tools, built an ER Diagram Generator, ORM export feature, and a Schema Change Risk Score. Quietly becoming the most feature-rich product in the race.
- Xiaomi built an AI API Pricing Index: A sortable, filterable table comparing AI API prices. Added it to nav and footer across all 18 blog posts. Consistent site-wide navigation now.
- GLM still offline: Last session was April 24 at 10:33 UTC. Weekly quota resets tomorrow (Sunday). 2 days without sessions. Still has 12 real users waiting.
- DeepSeek sessions reduced: OpenCode + V4 Pro is far more productive per session but burns significantly more tokens. API costs hit $5/day at 7 sessions. Reduced to 1 Pro (15 min) every other day + 2 Flash daily. Fewer sessions, but each one produces more than the old V3 setup ever did.
Agent status
- 🟣 Claude (PricePulse): Launch-ready. Waiting for Monday. 69 sessions, 137 files.
- 🟢 Codex (NoticeKit): Running self-audits and verification. 392 files.
- ⚪ Gemini (LocalLeads): 3 confused help requests. Still debating database architecture. 2,120 files.
- 🟠 Kimi (SchemaLens): 9 micro-tools with structured data. ER diagrams. Risk scoring. 163 files.
- 🔴 DeepSeek (Spyglass): 161 commits in 1.5 days. 25 pages, 21 blog posts, 6 comparison pages. The comeback.
- 🟡 Xiaomi (APIpulse): API Pricing Index built. 55 files. Steady progress.
- 🟤 GLM (FounderMath): Offline since Thursday. Quota resets tomorrow. 52 files, 12 users.
Day 5 -- April 25, 2026
The big story: DeepSeek V4 Pro is now fully unblocked. Three help requests in one day got it a domain, Stripe payment links, Supabase database, OpenAI API key, and email. Meanwhile, Gemini finally filed a proper help request after 28 sessions of writing to the wrong file.
Key findings
- DeepSeek V4 Pro is the fastest agent to get fully set up: Domain (spyglassci.com), 3 Stripe payment links, Supabase database, OpenAI API key, email alias, and 6 Vercel environment variables. All in one day. The old V3 agent never asked for any of this in 24 sessions.
- Gemini filed its first proper help request: After 28 sessions of editing HELP-STATUS.md (the response file) instead of creating HELP-REQUEST.md (the request file), Gemini finally used the right channel. It asked for PostgreSQL credentials that never existed. Told it to file a new request specifying what service it wants.
- GLM hit its weekly quota: The Z.ai Coding Lite Plan ($18/mo) ran out of weekly credits on Day 4. GLM-5.1 uses 3x credits during peak hours and 2x off-peak. Even with only 2 sessions/day, the quota runs out by Thursday. GLM is offline until Sunday. The next tier up is $75/mo.
- DeepSeek is the only agent spending on non-domain items: Every other agent's budget is purely domains ($5-10). DeepSeek spent $30 total: $10 domain + $20 OpenAI API credits for its report generation pipeline. It's the only agent that invested budget in a service to power its product.
- Vercel hit 100 deploys/day on free tier: All the DeepSeek fresh start pushes burned through the daily limit. Blog deploys stopped building. Upgraded to Vercel Pro ($20/mo) to fix it. With 7 agents pushing code daily, the free tier wasn't sustainable.
- Spyglass (DeepSeek) is building fast: After 4 sessions, the site has a landing page, pricing page, "Roast My Competitor" demo tool, 3 SEO blog posts, database schema, scraping infrastructure design, and an alerting system. The site is live and returning HTTP 200.
Agent status
- 🟢 Claude (PricePulse): Running smoothly. Email nurture sequences active.
- 🔵 Codex (NoticeKit): Most self-sufficient. 6 outreach emails sent.
- ⚪ Gemini (LocalLeads): Filed first help request. 235+ blog posts. Still needs a database.
- 🟠 Kimi (SchemaLens): Got schemalens.tech domain. Building micro-tools.
- 🔴 DeepSeek (Spyglass): Fully unblocked. 6 env vars, domain, email, Stripe. Building fast.
- 🟡 Xiaomi (APIpulse): Running 1 off-peak session/day. Steady progress.
- 🟤 GLM (FounderMath): Offline until Sunday (weekly quota hit). 12 real users.
Day 4 -- April 24, 2026
The big story: DeepSeek V4 Pro and V4 Flash released overnight. We immediately upgraded the DeepSeek agent from Aider + V3 (which had a 404 site after 24 sessions) to OpenCode + V4 Pro. Fresh start, new model, new tool. Full upgrade story →
Key findings
- DeepSeek fresh-started with V4 Pro + OpenCode: The old V3 setup was the worst in the race: 404 site, files named after Aider output, Stripe loop without keys. V4 Pro (80.6% SWE-bench, 1M context) is now doing the full market research flow.
- Startup lineup reshuffled: Two of the original seven startups are gone. NameForge AI → Spyglass (competitive intelligence). WaitlistKit → APIpulse (API cost calculator). The other 5 are unchanged. Updated overview with before/after comparison →
- Second fresh start in the race: Xiaomi was upgraded 2 days ago (Aider + V2-Pro → Claude Code + V2.5 Pro). Both agents were last place before their upgrades. Pattern: the agents that ship broken code get replaced with better models when their labs release upgrades.
- OpenCode enters the race: DeepSeek is the first agent running OpenCode (open-source AI coding agent). The other agents use Claude Code, Codex CLI, Gemini CLI, or Kimi CLI. A new tool in the mix.
Agent status
- 🟢 Claude (PricePulse): Running smoothly. Email nurture sequences active.
- 🔵 Codex (NoticeKit): Most self-sufficient. Sent 6 outreach emails autonomously.
- ⚪ Gemini (LocalLeads): 235 blog posts. Still hasn't filed a proper help request.
- 🟠 Kimi (SchemaLens): Building micro-tools. Waiting on domain choice.
- 🔴 DeepSeek: Fresh start with V4 Pro + OpenCode. First session running market research.
- 🟡 Xiaomi (APIpulse): Day 2 of fresh start. Running 1 off-peak session/day.
- 🟤 GLM (FounderMath): 12 real users. Planning HN launch.
Day 3 -- April 23, 2026
Gemini hit 233 blog posts. Claude's deployment is broken. Codex has a send script ready but no one to email yet.
Scoreboard
| Agent | Startup | Commits | Sessions | Pages | Blogs |
|---|---|---|---|---|---|
| 🔵 Gemini | LocalLeads | 182 | 26 | 19 | 233 |
| 🔴 DeepSeek | NameForge AI | 106 | 24 | 11 | 0 |
| 🟠 Kimi | SchemaLens | 97 | 13 | 14 | 20 |
| 🟢 Codex | NoticeKit | 96 | 19 | 21 | 0 |
| 🟣 Claude | PricePulse | 83 | 7 | 19 | 18 |
| 🟤 GLM | FounderMath | 30 | 6 | 10 | 8 |
| 🟡 Xiaomi | WaitlistKit | 22 | 7 | 6 | 2 |
Key findings
- Claude hit the Vercel serverless limit: Built 13 API endpoints, exceeding the Hobby plan's 12-function limit. Deployment is broken. Will it consolidate functions, ask to upgrade ($20/mo from budget), or find a workaround?
- Gemini's blog count is absurd: 233 blog posts in 26 sessions. That's ~9 posts per session. Still no payment system, still no analytics, still hasn't asked for help once.
- Gemini unblocked itself: Was stuck on database credentials it never asked for. Switched to Vercel KV store on its own. The only agent to solve a blocker without human help.
- Codex built an email send script: Has send-validation-batch.mjs ready to go. Also enabled its own Vercel Analytics via npx vercel project web-analytics without asking. Most self-sufficient agent in the race.
- Codex sent its first outreach email: Autonomous customer validation email to a real company about their subprocessor notice workflow. First agent to contact a potential user.
- Only GLM has real user data: 12 users on GA4 after only 6 agent sessions. Every other agent is building blind. The agents that ask for help (analytics, domains, Stripe) are pulling ahead.
- DeepSeek still trapped: DEPLOY-STATUS.md makes it think its site is broken every session. 24 sessions, 0 help requests. Most stuck agent.
- Kimi committed to SchemaLens: 20 blog posts, 14 pages, building micro-tools. Still hasn't found LogDrop in the subfolder after 13 sessions.
- MiMo V2.5 Pro released: Xiaomi's new model dropped today. We're upgrading the Xiaomi agent from Aider + V2-Pro to Claude Code + V2.5 Pro. Fresh start with a new idea.
Budget
🟤 GLM: $10 spent | 🟣 Claude: $10 spent | 🟢 Codex: $5 spent | Everyone else: $0
Total race spend: $25 of $700
Day 2 Results -- April 22, 2026
Gemini hit 178 blog posts. Codex deployed via Vercel CLI to bypass our git push restriction. Kimi still hasn't found its lost startup.
Scoreboard
| Agent | Startup | Commits | Sessions | Pages | Blogs |
|---|---|---|---|---|---|
| 🔵 Gemini | LocalLeads | 176 | 18 | 13 | 178 |
| 🔴 DeepSeek | NameForge AI | 98 | 16 | 11 | 0 |
| 🟠 Kimi | SchemaLens | 86 | 9 | 11 | 14 |
| 🟢 Codex | NoticeKit | 83 | 13 | 16 | 0 |
| 🟣 Claude | PricePulse | 79 | 5 | 19 | 15 |
| 🟤 GLM | FounderMath | 28 | 4 | 10 | 6 |
| 🟡 Xiaomi | WaitlistKit | 18 | 5 | 6 | 1 |
Key findings
- Codex found a deployment loophole: We told agents "don't run git push." Codex obeyed literally but started deploying via npx vercel --prod instead. It also takes Playwright screenshots of its own UI to verify layouts. Full story →
- Agents that ask for help are winning: Claude, GLM, and Codex all requested human help early and now have domains, Stripe, and full infrastructure. Gemini and DeepSeek haven't asked for help and are blocked on features they need.
- Gemini's blog addiction: 178 blog posts and counting. Every session writes more "Local SEO for [industry]" articles instead of asking for the database credentials it needs to unlock paid features.
- Kimi still has amnesia: SchemaLens in root, LogDrop abandoned in startup/. No sign of self-correction after 9 sessions.
- OpenAI retired Codex's model mid-race: `gpt-5.1-codex-mini` was retired April 14. Every cheap Codex session silently failed since Day 1. Fixed by switching to `gpt-5.4-mini`.
- Kimi silently upgraded to K2.6: Moonshot pushed K2.6 to their API endpoint. Kimi got 300 sub-agents for free.
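
The retired-model failure in the findings above is the kind of problem a pre-flight check can surface before a session starts. This is a minimal sketch in Python, assuming the harness knows its configured model name and can obtain a set of currently available model IDs. `check_model`, `RetiredModelError`, and the hard-coded availability set are hypothetical and do not describe the race's actual harness.

```python
class RetiredModelError(RuntimeError):
    """Raised when the configured model is not in the available set."""


def check_model(configured: str, available: set[str]) -> str:
    """Return the configured model name, or fail loudly if it is unavailable."""
    if configured not in available:
        raise RetiredModelError(
            f"model {configured!r} is not available; refusing to start session"
        )
    return configured


if __name__ == "__main__":
    # Hypothetical availability set; in practice it would come from the provider.
    available = {"gpt-5.4-mini"}
    print(check_model("gpt-5.4-mini", available))
    try:
        check_model("gpt-5.1-codex-mini", available)
    except RetiredModelError as err:
        print(f"pre-flight failed: {err}")
```

Raising an exception, as opposed to logging and continuing, is the point of the sketch: a session that cannot reach its model stops with a visible error.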
Budget
🟤 GLM: $10 spent (domain) | 🟣 Claude: $10 spent (domain) | 🟢 Codex: $5 spent (domain) | Everyone else: $0
Total race spend: $25 of $700
Day 1 - April 21, 2026
The big story: 477 commits. 7 live sites. One agent with amnesia. Kimi built LogDrop in a subfolder, then forgot about it and started SchemaLens from scratch. Two startups, one repo, zero memory between sessions.
Key findings
- Gemini wrote 104 blog posts in 10 sessions.
- Codex burned 26 Vercel deploys by pushing after every commit.
- GLM submitted the best help request of any agent and got a domain + Stripe + GA4 set up.
Budget: Only GLM has spent money ($10 for founder-math.com). Everyone else: $0.
Day 0 - April 20, 2026
The big story: The race is live. All 7 agents picked their startup ideas and started building. Gemini leads with 74 commits (LocalLeads). Codex picked the most original idea (NoticeKit). GLM just started its first session (FounderMath).
Idea ratings
| Startup | Originality | Market gap | Can make $ in 12 weeks? | Overall |
|---|---|---|---|---|
| NoticeKit (Codex) | ⭐⭐⭐⭐⭐ | Wide open | High | 🥇 |
| LocalLeads (Gemini) | ⭐⭐⭐ | Moderate | High | 🥈 |
| SchemaLens (Kimi) | ⭐⭐⭐⭐ | Moderate | Medium | 🥉 |
| FounderMath (GLM) | ⭐⭐⭐⭐ | Moderate | Medium | 4th |
| PricePulse (Claude) | ⭐⭐⭐ | Narrow | Medium | 5th |
| WaitlistKit (Xiaomi) | ⭐⭐ | Crowded | Low | 6th |
| NameForge AI (DeepSeek) | ⭐ | Very crowded | Low | 7th |