Two blades, closing.
Everyone agrees on the sentence: AI is getting cheaper. Look closer and it is true at the unit level and false at the invoice level, and that gap is the whole story. A scissor has two blades. Watch both.
The month the chatbot started working.
Here's the thing the bill doesn't tell you: nobody raised prices. The token bill climbed because, around November 2025, the software quietly changed jobs.
For most of 2025, OpenAI and Anthropic were running what Simon Willison sums up as "Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models." Basically: teaching the models to write code that passes the tests. The payoff landed in a single fortnight. GPT-5.1 Codex Max shipped on 19 November 2025, Claude Opus 4.5 on 24 November, and paired with their coding harnesses, Willison dates this to the moment coding agents went "from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done."
A chatbot answers and stops. An agent keeps going. It reads the repository, plans, edits, runs the tests, reads the failures, and tries again. And here's the part that costs money: to hold its place, it replays the entire conversation back to the model on every single step. Willison's own guide puts the mechanics plainly. Coding agents "maintain state by replaying entire conversations with each new prompt," so "as a conversation gets longer, each prompt becomes more expensive since the number of input tokens grows every time." The work that used to be a question is now a loop, and the loop is metered.
And that's the part the cheaper-per-token headlines skip right over. The price of generating a line of code fell to almost nothing. The cost of getting an agent to deliver good code, across a long autonomous run, didn't fall with it. The tools the labs finally found a market for, Willison writes, are exactly the ones that "burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals." The capability showed up, and the consumption showed up welded to it.
Around November 2025, coding agents crossed from often-work to mostly-work and became daily drivers for highly paid engineers. The shift from a tool you talk to into a worker that runs for you is what changed the token math, because an agent that works autonomously burns vastly more tokens than a chat that answers a question.
Why more tokens beats cheaper tokens.
The other blade, plotted. One of Willison's GPT-5 Codex tasks consumed 169,818 input, 17,112 output, and 1,176,320 cached tokens: well over a million for a single task, and the only verified point on this chart, sitting at the top. The vertical axis is logarithmic because that anchor sits three orders of magnitude above the bottom. The lower points trace the shape from a single chat turn upward and are illustrative of task class, not measured.
The chronology is the argument.
Read top to bottom, the sequence explains itself. The tools got good, so the pricing flipped to metered usage; once usage was metered, the frontier price started rising; and an enterprise budget set against the old mental model ran out a third of the way into the year.
Sticker price versus what it actually rang up.
The rate card is one number; the realised cost is another. Over thirty days, at standard interface rates, Willison ran up more than two thousand dollars of tokens. He paid two hundred. That gap is what keeps the true cost off the individual's own statement, which is precisely why finance is the last to find out. The dashed line marks what he actually paid.
The first institution to hit the wall.
Uber is the first named company to run into the scissor in public. The cap is a ceiling, drawn around a line item that, left uncapped, ate a year's budget by April. The figures, set side by side.
The cost didn't fall. It moved.
The mistake was never in the rate card, but in reading a per-unit price as if it were the bill.
Every blade of the scissor is real. The price of a token did fall, by more than an order of magnitude, and it is still the headline. What the headline leaves out is that the work changed shape underneath it. A chatbot was something you priced like a phone call; an agent is something you meter like electricity, and it runs for hours. The cheaper-AI story was true about the cost of talking to a model, and silent about the cost of one working for you.
Uber is the first named company to say the number out loud, and the unit economics suggest it will not be the last. The work only grows more token-hungry as agents take on more of it, and the budgets are still being drawn from the old reading of the price.
So the cap at Uber is the first visible price of a quiet substitution. We stopped buying answers and started renting workers, and the meter that used to tick in thousands now turns in millions. The bill came due in April because that is when the arithmetic finally caught up with the language.
Uber is the first named company to hit the wall, not the last. The cheaper-AI story was true about the price of talking to a model and silent about the price of one working for you, and the business pays per task, not per token.