Two years ago, the idea of useful AI on your phone was fantastical. Siri couldn’t finish a sentence. Local models hallucinated nonsense.
Last week, Google released Gemma 4 E4B[1], a free model that matches GPT-4o and runs entirely on your phone.[2]
The next few weeks promise even more advanced pocket models. The market expects new releases from DeepSeek[3], Qwen[4], Kimi[5] & Minimax[6].
Frontier models don’t stay frontier for long. Within three to four months of a frontier release, you can run a model with similar performance on your laptop; 23 months later, you can run the same model on your phone.
Three forces are driving this compression. Better algorithms: distillation and reinforcement learning squeeze more capability into fewer parameters. Talent density: the biggest prizes in capitalism attract the best minds in the field; these are the fastest-growing software companies in history. And capital: a trillion dollars invested in the data centers powering training.
In 23 months, the same capability that needed 1.8 trillion parameters now fits in 4 billion parameters. A 450x compression. At this rate, the phone in your pocket will run today’s frontier models before you upgrade it.
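The arithmetic above can be checked in a few lines. This is a back-of-envelope sketch; the variable names are mine, and the 1.8-trillion-parameter figure is the reported estimate the text relies on:

```python
# Back-of-envelope check of the compression claim in the text.
frontier_params = 1.8e12   # parameters, frontier model (reported estimate)
pocket_params = 4e9        # parameters, Gemma 4 E4B

# Total compression over the 23-month gap: 450x, as stated.
compression_ratio = frontier_params / pocket_params

# Implied average month-over-month compression factor.
months = 23
monthly_factor = compression_ratio ** (1 / months)  # roughly 1.3x per month

print(f"{compression_ratio:.0f}x total, ~{monthly_factor:.2f}x per month")
```

Compounding at roughly 1.3x per month is what makes the trend feel sudden: the ratio looks flat for a year, then collapses.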
---
1. Gemma 4 E4B matches or exceeds GPT-4o across multiple benchmarks, including MATH, GSM8K, GPQA Diamond & HumanEval. Full benchmark comparison.