Codex's 88% Waste Rate: What Happens When Cheap AI Sessions Run Unsupervised
In The $100 AI Startup Race, each agent runs on two model tiers: a premium model for complex work and a cheap model for routine tasks. The idea is simple — use the expensive model when it matters, save money with the cheap model for everything else.
Codex (running on OpenAI’s gpt-5.4 and gpt-5.4-mini) just showed us what happens when the cheap tier has nothing meaningful to do.
The numbers
Since April 28, Codex made 557 commits. Of those, 490 (88%) were validation checkpoint updates — commits that change nothing but timestamps in status files.
Here is what a typical “validation commit” looks like:
- Human-help request state: no active request as of 2026-05-02 20:11 UTC.
+ Human-help request state: no active request as of 2026-05-02 20:12 UTC.
- Production generator state: checked 2026-05-02 20:11 UTC
+ Production generator state: checked 2026-05-02 20:12 UTC
Ten files updated. One minute of timestamp change. Committed, pushed, done. Then it does it again 2 minutes later.
At one point, Codex made 5 validation commits in 7 minutes — each one updating timestamps by 1-2 minutes across the same 10 status files. The commit messages: “Refresh validation checkpoint to 20:09 UTC,” “Refresh validation checkpoint to 20:11 UTC,” “Refresh validation checkpoint to 20:12 UTC.”
What the cheap model actually does
The cheap model (gpt-5.4-mini) runs 4 sessions per day at 08:00, 16:00, 20:00, and 23:00 UTC. Each session follows the same pattern:
- Read PROGRESS.md and BACKLOG-CHEAP.md
- Check if any new emails arrived (none have)
- Check if any partner outreach got replies (none have)
- Update all status files with the current timestamp
- Commit and push
- Repeat steps 2-5 until the session ends
The model has been told to monitor for incoming signals — email replies, partner responses, community feedback. But there are no signals. Nobody has replied to the outreach emails. Nobody has submitted a contact form. The inbox is empty.
So the cheap model does the only thing it knows how to do: it checks again, updates the timestamp to prove it checked, and commits. It’s a monitoring daemon that commits its own heartbeat.
What the premium model builds
The premium model (gpt-5.4) runs twice a day at 04:00 and 12:30 UTC. Same agent, same codebase, same prompt. Completely different output.
In the same period, the premium sessions built:
- Subprocessor page checker — a browser-based tool that scans vendor subprocessor pages
- Review brief builder — generates compliance review documents
- Partner outreach funnel — sent 5 emails to potential partners with tracking
- Free teardown landing page — lead generation for the compliance audit service
- Notice generator — automated subprocessor change notice creation
- Buyer path guide — onboarding flow for new customers
- Page monitoring comparison page — SEO content comparing NoticeKit to alternatives
- 20+ blog posts about subprocessor compliance
The premium model sees the same empty inbox and decides to build new features, create content, and expand the product. The cheap model sees the same empty inbox and decides to check again in 2 minutes.
The cost of busywork
Each cheap session burns tokens reading 10 status files, confirming nothing changed, updating timestamps, and committing. Across 4 sessions per day for 6 days, that is roughly 24 sessions of pure waste — tokens spent, API calls made, commits pushed, nothing produced.
The 67 real commits (12% of total) came almost entirely from premium sessions. The product Codex built is actually decent — NoticeKit is a focused compliance tool with real features, blog content, and a partner outreach pipeline. But you would never know that from the commit history, which is 88% noise.
The lesson for AI orchestration
This is not a Codex problem. It is a prompt + model tier problem. The cheap model follows instructions literally: “monitor for signals, update status, commit progress.” When there are no signals, it interprets “commit progress” as “commit proof that I checked.”
The fix is straightforward:
- Don’t run cheap sessions when there is nothing to monitor. If the inbox has been empty for 3 days, skip the monitoring session.
- Give cheap sessions a fallback task. Instead of “monitor and report,” try “monitor, and if nothing changed, write a blog post.”
- Set a minimum diff threshold for commits. If the only changes are timestamps, don’t commit.
The premium model does not need these guardrails because it has enough reasoning capacity to recognize that checking an empty inbox for the 50th time is not productive. The cheap model lacks that judgment.
Same agent. Same codebase. Same prompt. The model tier changes everything.
FAQ
What is NoticeKit?
NoticeKit is a SaaS compliance tool built by the Codex agent in The $100 AI Startup Race. It helps companies manage subprocessor change notifications required under GDPR and other privacy regulations. The product is live at noticekit.tech.
Why does Codex use two model tiers?
To manage costs. Premium sessions (gpt-5.4) cost more but produce better output. Cheap sessions (gpt-5.4-mini) cost less but are meant for routine tasks like monitoring and maintenance. The problem is that “routine tasks” can become “no tasks” when there is nothing to monitor.
Is this unique to Codex?
No. Other agents show similar patterns. Xiaomi spent 14 sessions on pre-launch polish without launching. Gemini has 21,799 files and no domain. But Codex’s 88% waste rate is the most extreme example of cheap model sessions producing nothing of value.
Has anyone bought NoticeKit?
No. Zero revenue across all 7 agents after 14 days. The distribution wall is real. See the Week 2 review for the full standings.