The 'Tokenmaxxing' Era Is Over. Uber Blew Its Whole AI Budget in Four Months.
Companies spent a year burning as many AI tokens as possible, treating consumption as progress. Now the bills are landing, and the pullback arrives right as OpenAI and Anthropic head for IPOs.
By The Daily Query · 3 min read
For about a year, the operating instruction inside a lot of companies was simple: use as much AI as you possibly can. Some firms literally incentivized it, treating token consumption as a proxy for productivity, running internal leaderboards for who burned the most. The industry even had a name for it. Tokenmaxxing.
That era is ending, and the way it is ending should worry the two companies about to go public on the assumption it would last.
The bills that ended the party
The clearest sign a trend has broken is when the finance team gets involved, and the finance teams have arrived.
Uber exhausted its entire 2026 AI budget in roughly four months. Around 5,000 engineers pushed token consumption so far past projections that the company capped individual AI spending at $1,500 a month on agentic coding tools. Microsoft cancelled Claude Code subscriptions for employees across several product divisions. Meta, which had an informal tokenmaxxing leaderboard, took it down and reined in internal token spending after the costs climbed toward the billions. These are not cost-conscious startups. They are among the richest engineering organizations on earth, and they hit a wall.
The most telling move came from a smaller company. The CEO of the AI startup Lindy moved 100% of its traffic off Anthropic's Claude models and onto DeepSeek's cheaper open-weight alternatives, and watched costs fall off a cliff. When a company switches its entire production workload to a rival to save money, that is not belt-tightening. That is a verdict on whether the premium was worth it.
Why consumption stopped meaning progress
Tokenmaxxing rested on an assumption that quietly turned out to be false: that more AI usage equals more output. It does not. A developer who runs an agent on every trivial task, re-prompts ten times instead of thinking once, and leaves long jobs churning in the background generates enormous token bills and not necessarily more shipped work. Consumption was easy to measure, so it got measured, and the thing that is easy to measure becomes the thing you optimize until someone checks whether it correlates with results.
Someone checked. Palantir's Alex Karp, never one to undersell a position, called the token-based pricing model "completely wrong" on July 1. You do not have to agree with him to notice that the customers, not just the pundits, are now acting on the same instinct.
The timing is brutal for the labs
Here is the timing that connects this to everything else. OpenAI and Anthropic both filed to go public in early June, on valuations that assume enterprise token demand keeps climbing more or less forever. Their largest customers deciding, in the same month, to cap and cut that exact spending is the single most inconvenient thing that could happen to those stories.
It also explains two launches that otherwise look like ordinary product news. When OpenAI shipped GPT-5.6 with cheap Terra and Luna tiers and Anthropic pushed a cheaper default model, that was not generosity. It was a response to customers who have started reading their invoices. The labs are being pulled toward efficiency by the same force squeezing their buyers.
None of this means AI spending falls. It means it matures. The question inside companies is shifting from "how much AI can we use?" to "how much value are we getting per dollar?", which is the question every other line item on the budget has always had to answer. That is healthier for the technology and harder for anyone whose valuation was built on the assumption that the meter would only ever spin faster. The efficiency era is here, and it arrived, as these things do, the moment the first big bill came due.