Why AI Can't Crack Your Database
GPT-5 achieves 94.6% accuracy on AIME 2025, suggesting near-human mathematical reasoning.
Yet ask it to query your database, and its success rate plummets into the teens.
The Spider 2.0 benchmarks reveal a yawning gap in AI capabilities. Spider 2.0 is a comprehensive text-to-SQL benchmark: it tests whether AI models can turn natural-language questions into accurate SQL queries against real-world databases.
While large language models have conquered knowledge work in mathematics, coding, and reasoning, text-to-SQL remains stubbornly difficult.
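
To make "accurate SQL" concrete, here is a minimal sketch of one common way to score a text-to-SQL answer: run the model's query and a reference query against the same database and compare the rows that come back. This is an illustration only, not Spider 2.0's actual evaluation harness. The `orders` table, the sample queries, and the function names `execution_match` and `demo` are all hypothetical.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same rows, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to run counts as a miss.
        return False
    gold = conn.execute(gold_sql).fetchall()
    return sorted(predicted, key=repr) == sorted(gold, key=repr)


def demo():
    # Hypothetical toy schema; real-world databases are far larger and messier.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
        INSERT INTO orders VALUES
            (1, 'acme', 120.0), (2, 'acme', 80.0), (3, 'globex', 50.0);
        """
    )
    # Question: "What is the total revenue per customer?"
    gold = "SELECT customer, SUM(total) FROM orders GROUP BY customer"
    good = (
        "SELECT customer, SUM(total) AS revenue FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    bad = "SELECT customer, total FROM orders"  # forgets to aggregate
    return execution_match(conn, good, gold), execution_match(conn, bad, gold)


if __name__ == "__main__":
    print(demo())
```

Scoring this way leaves no partial credit: a query that reads plausibly but returns the wrong rows counts as a failure, which is part of why text-to-SQL is harder to get right than it looks.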

