What 375 AI Builders Actually Ship

70% of production AI teams use open source models. 72.5% connect agents to databases, not chat interfaces. This is what 375 technical builders actually ship, and it looks nothing like AI Twitter.

350 out of 413 teams use open source models

70% of teams use open source models in some capacity. 48% describe their strategy as mostly open. 22% commit to only open. Just 11% stay purely proprietary.

Read more

Teaching Local Models to Call Tools Like Claude

Ten months ago, DeepSeek collapsed AI training costs by 90% using distillation: transferring knowledge from larger models to smaller ones at a fraction of the cost.

Distillation works like a tutor training a student: a large model teaches a smaller one. As we've shifted from knowledge retrieval to agentic systems, we wondered whether there was a parallel technique for tool calling.

Could a large model teach a smaller one to call the right tools?
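One way to picture this is to have the large model's tool-call decisions become training data for the small one. The sketch below is a minimal, hypothetical illustration of that idea: the teacher is stubbed out as a plain function, and all tool names and the record format are assumptions, not a specific API.

```python
# Sketch of tool-call distillation: a "teacher" model's tool-call traces
# become supervised fine-tuning records for a smaller "student" model.
# The teacher stub, tool names, and record format are all illustrative.
import json

def teacher_tool_call(prompt: str) -> dict:
    """Stand-in for a large model that reliably picks the right tool.
    In practice this would be an API call to the teacher model."""
    if "weather" in prompt:
        return {"tool": "get_weather", "arguments": {"city": prompt.split()[-1]}}
    return {"tool": "web_search", "arguments": {"query": prompt}}

def build_distillation_set(prompts: list[str]) -> list[dict]:
    """Pair each prompt with the teacher's tool call, producing
    chat-style examples the student can be fine-tuned on."""
    records = []
    for p in prompts:
        call = teacher_tool_call(p)
        records.append({
            "messages": [
                {"role": "user", "content": p},
                {"role": "assistant", "content": json.dumps(call)},
            ]
        })
    return records

dataset = build_distillation_set([
    "What is the weather in Paris",
    "Latest results for SWE-bench",
])
print(len(dataset))  # 2 training records for student fine-tuning
```

The point of the sketch is the data flow, not the model: once the teacher's choices are captured as supervised examples, standard fine-tuning teaches the student to imitate them.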

Read more

From Knowledge to Action

GPT-5 launched yesterday. 94.6% on AIME 2025. 74.9% on SWE-bench.

As we approach the upper bounds of these benchmarks, they die.

What makes GPT-5 and the next generation of models revolutionary isn't their knowledge. It's knowing how to act. For GPT-5 this happens at two levels: first, deciding which model to use; second, and more importantly, tool calling.
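"Knowing how to act" via tool calling reduces to a simple loop: the model emits a structured call naming a tool and its arguments, and a harness executes it. The sketch below stubs the model with a plain function; the tool names and JSON shape are illustrative assumptions, not GPT-5's actual interface.

```python
# Minimal sketch of a tool-calling loop: a (stubbed) model emits a
# structured call; the harness dispatches it and returns the result.
# The fake model, tool names, and JSON shape are illustrative only.
import json

# Registry mapping tool names to callables the harness can execute.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}

def fake_model(prompt: str) -> str:
    """Stand-in for an LLM that answers by choosing a tool."""
    if "sum" in prompt:
        return json.dumps({"tool": "add", "arguments": {"a": 2, "b": 3}})
    return json.dumps({"tool": "upper", "arguments": {"text": prompt}})

def run_turn(prompt: str):
    """One turn: parse the model's call, look up the tool, execute it."""
    call = json.loads(fake_model(prompt))
    result = TOOLS[call["tool"]](call["arguments"])
    return call["tool"], result

print(run_turn("sum of 2 and 3"))  # ('add', 5)
```

In a real agent the result would be fed back to the model for another turn; the benchmark-relevant skill is picking the right tool with the right arguments, which the knowledge-retrieval era never tested.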

We’ve been living in an era where LLMs mastered knowledge retrieval and reassembly. Consumer search and coding, the initial killer applications, are fundamentally knowledge-retrieval challenges: both organize existing information in new ways.

Read more

Why Synthetic Data Is the Secret Weapon for AI Startups in 2025

The most successful AI startups of 2024 shared an unlikely secret: they didn't rely on proprietary datasets. Instead, they leveraged synthetic data to outmaneuver competitors who were still chasing exclusive data partnerships and expensive labeling operations.

The numbers tell a compelling story. Synthesis AI grew 410.6% last year, while Datagen raised $72M, the largest funding round in the synthetic data space. Meanwhile, companies burning millions on human data labeling watched their unit economics deteriorate as synthetic alternatives delivered 500-1000x cost reductions.

Read more