Democratizing Your Data With a Modern Cloud Data Lake

Roy Telles
6 min read · Feb 10, 2021

Data lakes have arisen to solve a growing problem: the need for a scalable, low-cost data repository that allows organizations to easily store all data types from a diverse set of sources, and then analyze that data to make evidence-based decisions.

Data lakes are an ideal way to gather, store and analyze enormous amounts of data in one location. The modern cloud data lake leverages the power, flexibility, and near-infinite scalability of the cloud.

A Deep Dive Into Cloud Data Lakes

The term data lake was coined by James Dixon in 2010 to describe a new type of data repository: a single location for storing massive amounts of raw data in its native form.

Getting the Data Flowing

Prior to the data lake, there was the data warehouse. Data warehouses were built primarily for analytics. They used relational databases and schemas to define tables of structured data in orderly columns and rows.
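
A warehouse's structure is fixed before any data arrives. As a rough illustration (a minimal sketch using Python's built-in sqlite3 and a made-up sales table, not any specific warehouse product), every row has to fit a schema that was declared up front:

```python
import sqlite3

# Schema-on-write: columns and types are declared before any data is loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        amount_usd REAL NOT NULL,
        order_date TEXT NOT NULL
    )
    """
)

# Each incoming row must match the predefined columns; anything else is rejected.
conn.execute(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    (1, "Acme Corp", 199.99, "2021-02-10"),
)

for row in conn.execute("SELECT customer, amount_usd FROM sales"):
    print(row)
```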

In contrast, the data lake’s goal was to let organizations explore, refine, and analyze huge amounts of information without a predetermined notion of structure. Data lakes offer a comprehensive way to work with petabytes of data constantly arriving from multiple sources.
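
To make that contrast concrete, here is a minimal Python sketch (the file paths and field names are illustrative, not from the article): raw, differently shaped JSON events from two sources land in a lake-style folder as-is, and a tabular structure is imposed only when the data is read for analysis:

```python
import json
from pathlib import Path

import pandas as pd

# Raw events from two hypothetical sources; note they do not share one schema.
events = [
    {"source": "web", "user": "alice", "clicked": "pricing", "ts": "2021-02-10T09:00:00"},
    {"source": "iot", "device": "sensor-7", "temp_c": 21.4, "ts": "2021-02-10T09:00:05"},
]

# Ingest step: store everything in its native form, with no upfront schema.
lake = Path("lake/raw/events")
lake.mkdir(parents=True, exist_ok=True)
for i, event in enumerate(events):
    (lake / f"event_{i}.json").write_text(json.dumps(event))

# Analysis step: structure is applied only at read time (schema-on-read).
records = [json.loads(p.read_text()) for p in sorted(lake.glob("*.json"))]
df = pd.json_normalize(records)          # columns are unioned across sources
web_clicks = df[df["source"] == "web"]   # refine just the slice you need
print(web_clicks[["user", "clicked", "ts"]])
```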

The original data lake:
