One trillion tokens per day. Is that a lot?

“And when we look narrowly at just the number of tokens served by Foundry APIs, we processed over 100t tokens this quarter, up 5x year over year, including a record 50t tokens last month alone.”

In April, Microsoft shared this statistic, revealing their Foundry product is processing roughly 50t tokens per month, or about 1.7t tokens per day.


Yesterday, Together’s CEO Vipul shared that Together.ai is processing 2t tokens of open-source inference daily.

In July, Google announced a staggering number:

“At I/O in May, we announced that we processed 480 trillion monthly tokens across our surfaces. Since then we have doubled that number, now processing over 980 trillion monthly tokens, a remarkable increase.”

| Company | Daily tokens (trillions) | vs. Microsoft Foundry | Date |
|---|---|---|---|
| Google | 32.7 | ~20x | July 2025 |
| Together | 2.0 | ~1.2x | September 2025 |
| Microsoft Foundry | 1.7 | 1x | April 2025 |

Google processes 32.7t tokens daily, 16x more than Together & roughly 20x more than Microsoft Foundry’s April volume.
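
Here’s the quick conversion behind that table, assuming a 30-day month for the figures disclosed monthly (the Together number was already a daily figure):

```python
# Convert each provider's disclosure to daily tokens (in trillions) & express it
# as a multiple of Microsoft Foundry's April volume. A 30-day month is assumed.
monthly_disclosures = {
    "Google (July 2025)": 980,             # 980t monthly tokens
    "Microsoft Foundry (April 2025)": 50,  # "a record 50t tokens last month"
}
daily = {name: t / 30 for name, t in monthly_disclosures.items()}
daily["Together (September 2025)"] = 2.0   # already disclosed as a daily figure

baseline = daily["Microsoft Foundry (April 2025)"]
for name, tokens in sorted(daily.items(), key=lambda kv: -kv[1]):
    print(f"{name:32} {tokens:5.1f}t/day  {tokens / baseline:4.1f}x Foundry")
```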

From these figures, we can draw a few hypotheses:

  1. Open-source inference is a single-digit share of all inference. It’s unclear what fraction of Google’s inference tokens come from their open-source models, like Gemma. But if we assume Anthropic & OpenAI are at 5t-10t tokens per day¹ & entirely closed-source, plus Azure is roughly similar in size to Google, then open-source inference is likely around 1-3% of total inference² (see the sketch after this list).
  2. Agents are early. Microsoft’s data point suggests the agents within GitHub, Visual Studio, Copilot Studio, & Microsoft Fabric still contribute only a small fraction of overall AI inference on Azure.
  3. With Microsoft expected to invest $80 billion & Google $85 billion in AI data center infrastructure this year, each company’s AI inference workloads should increase significantly, both through new hardware coming online & through algorithmic improvements.
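
The 1-3% estimate in the first hypothesis is just footnote math. Here’s a sketch of that back-of-envelope calculation; every figure in it is an assumption, not a disclosure:

```python
# Back-of-envelope estimate of the open-source share of inference (see footnote 2).
# All figures are assumptions in trillions of tokens per day, not disclosures.
assumed_daily = {
    "Google": 33,
    "Azure": 33,
    "Together + 5 other neoclouds": 6 * 2,  # ~2t/day each
    "Anthropic": 5,
    "OpenAI": 5,
}
total = sum(assumed_daily.values())          # 88t tokens per day

gemma_share_of_google = 0.05                 # assume 5% of Google's tokens are open-source
open_source_tokens = assumed_daily["Google"] * gemma_share_of_google

print(f"Assumed total inference: ~{total}t tokens/day")
print(f"Open-source (Google only): {open_source_tokens:.2f}t/day, "
      f"{open_source_tokens / total:.1%} of total")
# -> 1.65t/day, roughly 1.9% of total inference
```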

“Through software optimization alone, we are delivering 90% more tokens for the same GPU compared to a year ago.”

Microsoft is squeezing more digital lemonade from their GPUs, & Google must be doing the same.
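
Reading that 90% figure against the 5x year-over-year token growth Microsoft cited above, & assuming the software & hardware effects compose multiplicatively (a simplification), added GPUs account for roughly the remaining 2.6x:

```python
# Rough decomposition of Foundry's 5x year-over-year token growth, assuming the
# software & hardware contributions compose multiplicatively (a simplification).
token_growth_yoy = 5.0   # "up 5x year over year"
software_gain = 1.9      # "90% more tokens for the same GPU"

hardware_gain = token_growth_yoy / software_gain
print(f"Implied growth from added GPUs & fleet expansion: ~{hardware_gain:.1f}x")
# -> ~2.6x
```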

When will we see the first 10t or even 50t token days from the rest of the field? It can’t be far off now.
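
For what it’s worth, a rough extrapolation, assuming Foundry keeps compounding at its reported 5x annual rate from roughly 1.7t tokens per day (a big assumption), puts those milestones only a year or two out:

```python
# Purely illustrative: when would Foundry cross 10t or 50t tokens per day if it
# kept growing 5x year over year from ~1.7t/day (a big assumption)?
import math

current_daily = 1.7      # t tokens per day, April 2025
growth_per_year = 5.0    # "up 5x year over year"

for target in (10, 50):
    years = math.log(target / current_daily) / math.log(growth_per_year)
    print(f"{target}t tokens/day in ~{years:.1f} years at 5x/yr")
# -> ~1.1 years to 10t/day, ~2.1 years to 50t/day
```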


  1. Estimates from thin air! ↩︎

  2. Google & Azure at 33t tokens per day each, Together & 5 other neoclouds at roughly 2t tokens per day each, & Anthropic & OpenAI at 5t tokens per day each gives us 88t tokens per day. If we assume 5% of Google’s tokens come from open-source models, that’s 1.65t tokens per day, or roughly 1.9% of total inference. Again, very rough math. ↩︎