100 Trillion Tokens
“We processed over 100t tokens this quarter, up 5x year over year, including a record 50t tokens last month alone.”
If the market harbored any doubt about the insatiable demand for AI, this statement during Microsoft’s quarterly earnings call yesterday quashed it.
What could this mean for run rate? Using some basic assumptions1, this implies:
Scenario | Model mix (% of total tokens) | Monthly run rate after 20% discount ($M) | Annual run rate ($M) | % of Azure revenue (assuming $21B annual)
---|---|---|---|---
High | OpenAI 70% • Claude 20% • Other 10% | 382.9 | 4,594.8 | 21.88%
Medium | OpenAI 65% • Claude 20% • Other 15% | 110.5 | 1,326.0 | 6.31%
Low | OpenAI 60% • Claude 20% • Other 20% | 27.3 | 327.6 | 1.56%
So AI accounts for roughly 2% to 22% of Azure revenue. The error bars here are quite large, though.
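The run-rate math above can be sketched in a few lines. This is a back-of-envelope model under the footnote's assumptions (50t tokens in the last month, a 20:1 input-to-output ratio, a 20% discount to public prices); the per-million-token prices below are illustrative placeholders, not actual list prices, so the outputs will not exactly match the table.

```python
MONTHLY_TOKENS = 50e12   # "a record 50t tokens last month alone"
INPUT_SHARE = 20 / 21    # 20:1 input-to-output token ratio
OUTPUT_SHARE = 1 / 21
DISCOUNT = 0.20          # assumed discount to public list prices

# Illustrative $/1M-token prices (placeholders, not real price sheets)
PRICES = {
    "openai": {"in": 2.50, "out": 10.00},
    "claude": {"in": 3.00, "out": 15.00},
    "other":  {"in": 0.50, "out": 1.50},
}

def monthly_run_rate(mix: dict[str, float]) -> float:
    """Blended monthly revenue in $M for a given model mix."""
    total = 0.0
    for model, share in mix.items():
        tokens = MONTHLY_TOKENS * share
        total += (tokens * INPUT_SHARE / 1e6 * PRICES[model]["in"]
                  + tokens * OUTPUT_SHARE / 1e6 * PRICES[model]["out"])
    return total * (1 - DISCOUNT) / 1e6  # dollars -> $M

# The "Medium" scenario mix from the table
monthly = monthly_run_rate({"openai": 0.65, "claude": 0.20, "other": 0.15})
annual = monthly * 12
azure_share = annual / 21_000  # vs. $21B annual Azure revenue, in $M
```

Swapping in different price and mix assumptions is what moves the estimate across the wide High/Medium/Low range in the table.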
A major contributor to this increased demand is performance, especially with reasoning models.
Combine that with massive reductions in inference costs, especially from small, open-source models like the Phi-4 family Microsoft released yesterday, and the margins on AI inference should continue to surge.
“…our cost per token, which has more than halved.”
“You see this in our supply chain where we have reduced dock to lead times for new GPUs by nearly 20% across our blended fleet where we have increased AI performance by nearly 30% ISO power…”
Jevons Paradox in full force.
“The real outperformance in Azure this quarter was in our non AI business.”
This was a surprise, but it likely is the result of additional demands placed on adjacent systems. AI doesn’t exist in a vacuum. It needs databases, storage, orchestration, and observability to succeed.
“PostgreSQL usage accelerated for the third consecutive quarter… Cosmos DB revenue growth also accelerated again this quarter…”
A later quote from the analyst call reinforces this point: both database systems cited, Cosmos DB (a MongoDB-like document store) and PostgreSQL, are transactional databases.
100 trillion tokens, up 5x y/y. Next year, could we see a quadrillion?
1 A 20:1 input-to-output token ratio; a model usage mix of 60-70% OpenAI, 20% Anthropic, and the remainder other models; and a 20% discount to public prices. See the work here