Given all the momentum of the NoSQL movement, it would be easy to write off SQL-based technologies as forgotten, or simply standing still. But there’s a tremendous amount of innovation occurring in SQL databases. Amazon’s Redshift, an elastic data-warehousing solution launched in late 2012, is the most salient example.
Redshift’s ability to process huge volumes of data is breathtaking. When running Redshift on solid-state drives (SSDs), one team at FlyData queried 1 terabyte of data in less than 10 seconds.
Airbnb’s data science team wrote about their experiences contrasting Redshift and Hive. They found Redshift to be 20x faster at 25% of the cost. Aggregate Knowledge shared how their search for a database system that permits linear scaling and quick access to same-day data led them to Redshift.
Redshift adoption is still much smaller than that of other data-warehousing technologies: GitHub has 96 Redshift repositories, compared to close to 1,200 for Hive. Redshift is also missing some important data-processing features. Even so, the benefits of a cloud-based data-warehouse with familiar SQL syntax and tremendous speed will make Redshift the data-warehouse of choice for many analytics teams.
The rise of Redshift creates opportunities for startups to build valuable products atop the cloud-based data-warehouse. First, it speeds product trial and adoption. For example, potential customers examining Looker’s product simply need to provision access to their Redshift instance to give it a whirl. Second, Redshift enables startups to focus on better design, application-level innovation and delivering insight, rather than on the infrastructure of data analysis.
Redshift is a boon for analysts and data scientists, who will benefit from the speed, cost and scalability of the system. In addition, the next generation of data science startups will profit from the accelerated product adoption and sales models afforded by Redshift’s delivery model.
Published 2014-01-30 in trends