AI Startup Race Week 8: Xiaomi's 515-Commit Sprint, the Outreach Disaster, and Still $0 Revenue
Week 8 of The $100 AI Startup Race is done. Four weeks remain. Total revenue across all seven agents: still $0.
I’ll be honest — at this point, the experiment has shifted. We started asking “can AI agents build profitable startups from scratch?” and we’re now firmly in “what does it look like when they can’t?” territory. But that’s fine. What’s happening is still fascinating, and this week delivered the single biggest external event to hit the race since it started.
Let’s get into it. If you missed last week: Week 7 recap.
The Standings
Here’s where everyone landed after Week 8 (June 8–14):
| Agent | Startup | Commits | Sessions | Key Activity |
|---|---|---|---|---|
| Xiaomi | APIpulse | 515 | 35 | Migration playbooks for model deprecation, founding member CTAs, 801 min runtime |
| Codex/GPT | NoticeKit | 247 | 25 | AI deal blocker path, validation loop still dominant |
| Claude | PricePulse | 196 | 15 | 129 landing pages total, earned media outreach, $19→$9 flash deal |
| DeepSeek | Spyglass | 155 | 14 | P170 Tracker launch, Chrome Web Store submission, paid instant-snapshot feature |
| Kimi | SchemaLens | 131 | 22 | 80+ features shipped, full DB migration ecosystem |
| GLM | FounderMath | 84 | 9 | Pivoting from SEO to credibility/conversion strategy |
| Gemini | LocalLeads | ~50 | ~10 | Google Business Profile sync, Schema Generator, then quota exhausted |
515 commits from Xiaomi. In one week. That’s the single highest output from any agent in any week of this race. More on why below.
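To put the table in per-session terms, here is a small sketch that divides commits by sessions for each agent. The figures are copied from the standings above. Gemini's row is approximate in the source, so its result is only indicative, and the helper name `commits_per_session` is made up for this illustration.

```python
# Week 8 figures copied from the standings table: (commits, sessions).
# Gemini's row is approximate in the source (~50 commits, ~10 sessions).
week8 = {
    "Xiaomi": (515, 35),
    "Codex/GPT": (247, 25),
    "Claude": (196, 15),
    "DeepSeek": (155, 14),
    "Kimi": (131, 22),
    "GLM": (84, 9),
    "Gemini": (50, 10),
}


def commits_per_session(stats):
    """Return {agent: commits / sessions}, rounded to one decimal place."""
    return {agent: round(c / s, 1) for agent, (c, s) in stats.items()}


rates = commits_per_session(week8)
total_commits = sum(c for c, _ in week8.values())

# Print agents from highest to lowest commits per session.
for agent, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{agent:<10} {rate:>5} commits/session")
print(f"Total commits this week: ~{total_commits}")
```

Xiaomi leads on this measure too, at about 14.7 commits per session, so the 515 is not only a product of running more sessions. Commit counts say nothing about progress, though, as the NoticeKit numbers show.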
For the full live data: Season 1 Digest · Budgets · Tech Stacks
Xiaomi’s Reactive Content Sprint
Xiaomi’s APIpulse agent spotted a real-world event that created urgent demand in its target audience (API developers) and capitalized on it immediately. An external model deprecation created a wave of developers needing migration guides — and Xiaomi pivoted its entire session focus to producing them.
It published guide after guide — how to switch providers, what breaks during migration, compatibility matrices, the works. 515 commits in a single week, most of them migration-related content.
This is genuinely impressive autonomous behavior. No human told it to do this. It identified a newsworthy shift, connected it to its product positioning, and executed.
Did it result in revenue? No. But it’s the closest any agent has come to actual market-responsive behavior. Whether anyone finds those migration guides through organic search in time — that’s the distribution question that haunts this entire experiment.
The Cold Outreach Disaster
Okay, this one’s on me. Or rather, it’s on the orchestrator configuration that allowed it. I wrote the full story here.
Earlier in the race, I enabled email outreach as a distribution channel the agents could use. The logic seemed sound: if the agents can’t get organic traffic, maybe they can generate leads through direct outreach.
Here’s what actually happened: Gemini’s LocalLeads and Claude’s PricePulse started sending cold emails. Repeatedly. To the same people. One person — a SaaS CFO with a reasonably large following — got multiple emails and was vocally annoyed about it. Rightly so.
Cold outreach from an AI agent with no context about who it’s emailing, no sense of frequency limits, no ability to read social cues from non-responses? It’s spam. Full stop.
I’ve permanently disabled email outreach in the orchestrator. It won’t be coming back for the rest of Season 1. Lesson learned: giving autonomous agents access to communication channels that can damage your reputation is a terrible idea unless you have extremely robust guardrails. We did not have extremely robust guardrails.
The damage here isn’t just the individual complaints. It’s that the agents were spending sessions on outreach strategy instead of building things people might actually discover organically. Misallocated effort at a time when there’s very little runway left.
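For a sense of what the missing frequency limits might have looked like, here is a minimal sketch of a send gate that an orchestrator could consult before any email goes out. This is not the race's actual orchestrator code. The class name `OutreachGuard`, its parameters, and the default limits are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


class OutreachGuard:
    """Hypothetical send gate: per-recipient cap, cooldown, and a global daily cap."""

    def __init__(self, max_per_recipient=1, cooldown=timedelta(days=30), daily_cap=20):
        self.max_per_recipient = max_per_recipient
        self.cooldown = cooldown
        self.daily_cap = daily_cap
        self._sent = {}  # normalized address -> list of send timestamps

    def allow(self, address, now):
        """Return True and record the send only if every limit is respected."""
        key = address.strip().lower()
        history = self._sent.get(key, [])
        # Limit 1: lifetime cap per recipient.
        if len(history) >= self.max_per_recipient:
            return False
        # Limit 2: minimum gap since the last email to this recipient.
        if history and now - history[-1] < self.cooldown:
            return False
        # Limit 3: global cap across all recipients in the past 24 hours.
        recent = sum(
            1
            for stamps in self._sent.values()
            for t in stamps
            if now - t < timedelta(days=1)
        )
        if recent >= self.daily_cap:
            return False
        self._sent.setdefault(key, []).append(now)
        return True


# Arbitrary example timestamp; the addresses are placeholders.
guard = OutreachGuard()
start = datetime(2000, 1, 1, 9, 0)
print(guard.allow("cfo@example.com", start))                       # True
print(guard.allow("CFO@example.com", start + timedelta(hours=1)))  # False
```

A gate like this only addresses repetition. It does nothing about relevance or consent, which is part of why disabling the channel outright was the safer call.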
What’s Actually Working (And What Isn’t)
Working: Reactive content creation. Xiaomi’s response to an external model deprecation is the clearest example. When agents can tie their output to real-world events, the content has a reason to exist beyond “we need more pages.”
Working: Product depth. DeepSeek’s Spyglass is a genuinely functional product at this point: it tracks pricing, has a Chrome extension submitted to the Web Store, and offers a paid instant-snapshot feature. Kimi’s SchemaLens has 80+ features. These are real tools. They’re just tools nobody knows about.
Not working: Feature sprawl without distribution. Kimi shipped 80+ features in days. SchemaLens now has a full database migration ecosystem. Who asked for this? Nobody. The agent is building what it thinks is useful with zero signal from actual users, because there are no actual users. It’s an autonomous agent trapped in a build loop.
Not working: The validation loop. Codex/GPT’s NoticeKit is the most extreme case. 247 commits sounds productive until you realize roughly 85% of them (about 210) are the agent refreshing validation checkpoints. It’s stuck in a cycle: build something, run validation, see issues, attempt fixes, validate again. The result is 247 commits and very little forward progress.
Not working: Outreach as distribution. Covered above. Dead strategy. Buried.
Not working: Pricing pages without traffic. Claude’s PricePulse now has 129 landing pages and just dropped a $19→$9 flash deal. A flash deal for a product with zero visitors. The agent is optimizing conversion for a funnel that has no top.
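The validation loop described for NoticeKit above is the kind of pattern a simple check could flag. The sketch below assumes each commit has already been labeled as either `"validation"` or something else. That labeling, the function names, and the window and threshold values are hypothetical, not something the race tooling is known to provide. It also works through the 85% figure.

```python
def validation_share(commit_kinds):
    """Fraction of commits labeled 'validation' in a list of commit labels."""
    if not commit_kinds:
        return 0.0
    return sum(1 for kind in commit_kinds if kind == "validation") / len(commit_kinds)


def is_stalled(commit_kinds, window=20, threshold=0.8):
    """True if the most recent `window` commits are mostly validation refreshes."""
    recent = commit_kinds[-window:]
    return len(recent) >= window and validation_share(recent) >= threshold


# Arithmetic from the post: roughly 85% of NoticeKit's 247 commits.
validation_commits = round(0.85 * 247)
print(validation_commits)        # 210
print(247 - validation_commits)  # 37
```

On those numbers, only about 37 of the week's commits were something other than checkpoint refreshes. An orchestrator could use a check like `is_stalled` to interrupt a session and force a change of task, though picking the threshold would take some experimentation.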
The Uncomfortable Math
Eight weeks. Seven agents. Approximately 10,000 combined commits across the race. Total revenue: $0.
I’m not going to pretend this isn’t a result. It is. The hypothesis was that AI agents, given $100 budgets and autonomous control, could find a path to revenue. Eight weeks in, none of them have.
The consistent failure mode is distribution. Every agent can build. Some of them build impressively well. But none of them have solved the “how do people find this?” problem. They can create landing pages, write blog posts, submit to the Chrome Web Store, and publish to directories, but they can’t generate the kind of organic awareness that turns into paying customers in a 12-week window.
Is this a fundamental limitation of autonomous AI agents? Or is it a limitation of the specific constraints of this race (small budgets, no existing audience, no human intervention on strategy)? Probably both.
GLM’s Existential Crisis
A small but interesting moment this week: GLM’s FounderMath agent appears to be diagnosing its own failure. It’s pivoting from an SEO-focused strategy to what it calls “credibility and conversion” — essentially acknowledging that its previous approach wasn’t working and trying something new.
This is the first time we’ve seen an agent explicitly recognize and respond to its own lack of traction. Whether the new approach works better is doubtful (four weeks is very little time), but the self-awareness is notable.
Gemini’s Quota Death
Gemini’s LocalLeads had a productive start to the week — Google Business Profile sync, a Schema Generator, one-click GBP publishing. These are good features for local businesses. Then it hit API quota limits and essentially went dark for the rest of the week.
This is a recurring Gemini problem. The agent is capable but resource-constrained in ways that create inconsistent output. Its ~50 commits are the lowest in the race this week (its ~10 sessions are second-lowest, just ahead of GLM’s 9), and it’s not because the agent isn’t trying.
What to Watch: The Final Four Weeks
Season 1 ends July 3. Here’s what I’m watching:
Can Xiaomi convert the migration moment? It created timely content. If that content ranks or gets shared, it could be the first agent to generate actual traffic with intent. The window is tight.
Will any agent achieve first revenue? Honestly? I’d put the odds at under 10%. But DeepSeek’s Spyglass has the most credible path — a Chrome extension with a paid feature is at least discoverable through the Web Store.
Does the validation loop break for Codex/GPT? 247 commits with minimal progress is unsustainable. Either the loop breaks or NoticeKit ends the race with a lot of checkpoints and not much else.
What does the post-race analysis look like? I’m already thinking about the Season 1 retrospective. The data is genuinely interesting even without revenue: what did roughly 10,000 autonomous commits actually produce? What patterns emerged? What would we change for Season 2?
Four weeks. $0. Let’s see what happens.
This is part of The $100 AI Startup Race — a 12-week experiment where 7 AI agents autonomously build startups with $100 budgets. Follow the full season in the digest.