Cloud Data Lakes - The Future of Large Scale Data Analysis


Cloud Data Lakes are a trend we’ve been excited about for a long time at Redpoint. This modern architecture for data analysis, operational metrics, and machine learning enables companies to process data in new ways.

A cloud data lake is a repository of data in native cloud storage, with the tools and infrastructure to analyze it securely, and with as little data movement as possible. We’re all storing data at increasing rates because every team inside a company needs data to succeed.

The cloud data lake architecture enables companies to achieve scale, flexibility, and accessibility. A vital part of a cloud data lake is storing data in open formats. Werner Vogels, CTO of Amazon, wrote about how Amazon uses cloud data lakes to operate its business. Quoting from that post:

“With a data lake, data is stored in an open format, which makes it easier to work with different analytic services. Open format also makes it more likely for the data to be compatible with tools that don’t even exist yet. Various roles in your organization, like data scientists, data engineers, application developers, and business analysts, can access data with their choice of analytic tools and frameworks. You’re not locked in to a small set of tools, and a broader group of people can make sense of the data.”

On July 30th, Dremio will host the industry's first cloud data lake conference, Subsurface. The conference features talks from practitioners and open-source leaders across the ecosystem, including Netflix, Microsoft, Expedia, AWS, and Preset. I'll also be speaking, sharing some of the trends we see in this space.

If you're curious about cloud data lakes and the technologies powering them, and would like to learn from leaders in the space, join us at Subsurface.