The Robotic Tortoise & the Robotic Hare
I set up a race today between two robots.
My Mac on the left vs Claude Code on the right. Both tasked with building a payment app on Stripe’s new Tempo blockchain. Same prompts, same task, side by side.
Opus 4.5 scores about 20% higher than Qwen 35B on benchmarks. And it’s likely 50x larger. The hare should have won. It didn’t.
The local model finished in 2 minutes. Claude took over 6. I asked Claude to score both outputs: local model 6.5, Claude 4.5.¹