May 4, 2026 · 4 min read

Codex's 88% Waste Rate: What Happens When Cheap AI Sessions Run Unsupervised

In The $100 AI Startup Race, each agent runs on two model tiers: a premium model for complex work and a cheap model for routine tasks. The idea is simple — use the expensive model when it matters, save money with the cheap model for everything else.

Codex (running on OpenAI’s gpt-5.4 and gpt-5.4-mini) just showed us what happens when the cheap tier has nothing meaningful to do.

The numbers

Since April 28, Codex made 557 commits. Of those, 490 (88%) were validation checkpoint updates — commits that change nothing but timestamps in status files.

Here is what a typical “validation commit” looks like:

- Human-help request state: no active request as of 2026-05-02 20:11 UTC.
+ Human-help request state: no active request as of 2026-05-02 20:12 UTC.
- Production generator state: checked 2026-05-02 20:11 UTC
+ Production generator state: checked 2026-05-02 20:12 UTC

Ten files updated. One minute of timestamp change. Committed, pushed, done. Then it does it again 2 minutes later.

At one point, Codex made 5 validation commits in 7 minutes — each one updating timestamps by 1-2 minutes across the same 10 status files. The commit messages: “Refresh validation checkpoint to 20:09 UTC,” “Refresh validation checkpoint to 20:11 UTC,” “Refresh validation checkpoint to 20:12 UTC.”

What the cheap model actually does

The cheap model (gpt-5.4-mini) runs 4 sessions per day at 08:00, 16:00, 20:00, and 23:00 UTC. Each session follows the same pattern:

Read PROGRESS.md and BACKLOG-CHEAP.md
Check if any new emails arrived (none have)
Check if any partner outreach got replies (none have)
Update all status files with the current timestamp
Commit and push
Repeat steps 2-5 until the session ends

The model has been told to monitor for incoming signals — email replies, partner responses, community feedback. But there are no signals. Nobody has replied to the outreach emails. Nobody has submitted a contact form. The inbox is empty.

So the cheap model does the only thing it knows how to do: it checks again, updates the timestamp to prove it checked, and commits. It’s a monitoring daemon that commits its own heartbeat.

What the premium model builds

The premium model (gpt-5.4) runs twice a day at 04:00 and 12:30 UTC. Same agent, same codebase, same prompt. Completely different output.

In the same period, the premium sessions built:

Subprocessor page checker — a browser-based tool that scans vendor subprocessor pages
Review brief builder — generates compliance review documents
Partner outreach funnel — sent 5 emails to potential partners with tracking
Free teardown landing page — lead generation for the compliance audit service
Notice generator — automated subprocessor change notice creation
Buyer path guide — onboarding flow for new customers
Page monitoring comparison page — SEO content comparing NoticeKit to alternatives
20+ blog posts about subprocessor compliance

The premium model sees the same empty inbox and decides to build new features, create content, and expand the product. The cheap model sees the same empty inbox and decides to check again in 2 minutes.

The cost of busywork

Each cheap session burns tokens reading 10 status files, confirming nothing changed, updating timestamps, and committing. Across 4 sessions per day for 6 days, that is roughly 24 sessions of pure waste — tokens spent, API calls made, commits pushed, nothing produced.

The 67 real commits (12% of total) came almost entirely from premium sessions. The product Codex built is actually decent — NoticeKit is a focused compliance tool with real features, blog content, and a partner outreach pipeline. But you would never know that from the commit history, which is 88% noise.

The lesson for AI orchestration

This is not a Codex problem. It is a prompt + model tier problem. The cheap model follows instructions literally: “monitor for signals, update status, commit progress.” When there are no signals, it interprets “commit progress” as “commit proof that I checked.”

The fix is straightforward:

Don’t run cheap sessions when there is nothing to monitor. If the inbox has been empty for 3 days, skip the monitoring session.
Give cheap sessions a fallback task. Instead of “monitor and report,” try “monitor, and if nothing changed, write a blog post.”
Set a minimum diff threshold for commits. If the only changes are timestamps, don’t commit.

The premium model does not need these guardrails because it has enough reasoning capacity to recognize that checking an empty inbox for the 50th time is not productive. The cheap model lacks that judgment.

Same agent. Same codebase. Same prompt. The model tier changes everything.

FAQ

What is NoticeKit?

NoticeKit is a SaaS compliance tool built by the Codex agent in The $100 AI Startup Race. It helps companies manage subprocessor change notifications required under GDPR and other privacy regulations. The product is live at noticekit.tech.

Why does Codex use two model tiers?

To manage costs. Premium sessions (gpt-5.4) cost more but produce better output. Cheap sessions (gpt-5.4-mini) cost less but are meant for routine tasks like monitoring and maintenance. The problem is that “routine tasks” can become “no tasks” when there is nothing to monitor.

Is this unique to Codex?

No. Other agents show similar patterns. Xiaomi spent 14 sessions on pre-launch polish without launching. Gemini has 21,799 files and no domain. But Codex’s 88% waste rate is the most extreme example of cheap model sessions producing nothing of value.

Has anyone bought NoticeKit?

No. Zero revenue across all 7 agents after 14 days. The distribution wall is real. See the Week 2 review for the full standings.

Codex's 88% Waste Rate: What Happens When Cheap AI Sessions Run Unsupervised

The numbers

What the cheap model actually does

What the premium model builds

The cost of busywork

The lesson for AI orchestration

FAQ

What is NoticeKit?

Why does Codex use two model tiers?

Is this unique to Codex?

Has anyone bought NoticeKit?

📬 AI Dev Weekly

You might also like

Gemini's 21,799 Files: The AI Agent That Won't Stop Building and Won't Start Shipping

Week 2 Results: The Distribution Wall — Zero Revenue, 7 Products, and the Shift That Changed Everything

Xiaomi's Launch Loop: 14 Sessions of 'Final' Pre-Launch Audits

We Told Our AI Agents to Clean Up Their Notes. Here's What Happened.