
Data Lake Beyond The Data Warehouse

This document discusses the emergence of data lakes as a new paradigm for managing large volumes of raw data. It describes how data lakes differ from traditional data warehouses by allowing organizations to store all their data in its native format without structure imposed upon it. The document then outlines four stages of maturity for data lake implementation, from initial pilot projects to fully integrating the data lake platform as a core business competency. It argues that data lakes combined with tools for data discovery, analytics and business intelligence can provide a unified platform for deriving insights from big data.

Uploaded by

williawo

Data Lake, beyond the Warehouse

Data Science Thailand Meetup #4

Shifting to the 3rd gen platform with Data Lake

February 3, 2016
โกเมษ​​จันทวิมล
Komes Chandavimol

Cheow Lan Lake, Thailand


The Growth of Data

https://www.domo.com/learn/data-never-sleeps-3-0
http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427
Can these tools support Big Data?

• Spreadsheet?
• Database?
• Data Mart?
• Data Warehouse?

Source: Forrester Research’s James Kobielus


The Emergence of Big Data Tools

http://blogs.forrester.com/category/hadoop
http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf
HADOOP

http://opensource.com/life/14/8/intro-apache-hadoop-big-data
Analytics 3.0
Data Mining Tools

Data Discovery and Visualization Tools

Tableau.com, RapidMiner.com
How do we apply this to the current environment?

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
Traditional Data Warehouse

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
New Data Management Architecture

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
Data Lake

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
Data Lake

A single place to store every type of data in its native format, with no fixed limits on account size or file size, high throughput to increase analytic performance, and native integration with the Hadoop ecosystem.

Reference: James Serra's Blog
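
The definition above can be illustrated with a minimal Python sketch of "schema on read", assuming a local temporary directory stands in for the lake. The `ingest` and `read_with_schema` helpers, the `raw/` folder, and the sample files are hypothetical names chosen for illustration; they are not taken from the talk or from any Hadoop API.

```python
from __future__ import annotations

import csv
import io
import json
import tempfile
from pathlib import Path


def ingest(lake: Path, name: str, payload: bytes) -> Path:
    """Store the payload byte-for-byte; no schema is imposed on write."""
    target = lake / "raw" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_with_schema(path: Path) -> list[dict]:
    """Apply structure only at read time, based on the file's native format."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        # Assumed layout: one JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    # Anything else is treated as unstructured text, one record per line.
    return [{"line": line} for line in text.splitlines()]


# A temporary directory plays the role of the lake in this sketch.
lake = Path(tempfile.mkdtemp(prefix="lake_"))
clicks = ingest(lake, "clicks.json",
                b'{"user": "a", "page": "/home"}\n{"user": "b", "page": "/buy"}\n')
orders = ingest(lake, "orders.csv", b"order_id,amount\n1,250\n2,90\n")
notes = ingest(lake, "support.log", b"ticket opened\nticket closed\n")

click_rows = read_with_schema(clicks)
order_rows = read_with_schema(orders)
note_rows = read_with_schema(notes)
```

The design point is that `ingest` never inspects the payload, so any file type can land in the lake, while each consumer decides how to interpret it when reading.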


https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
Data Lake Development with Big Data, Pradeep Pasupuleti (2015)
Data Lake Processes

www.emc.com
Data Lake and Data Warehouse

Hadoop Distributed Compared, BlazeClan Technology, 2015


Data Lakes

http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html
Data Lake

• Type of Data
  • Raw Data
  • Derived Data
  • Aggregated Data

• Type of Environment
  • Discovery Environment
  • Production Environment

The Definition of Data Lake, John O’Brien (2015)
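
As a rough illustration of the three data types listed above, the following Python sketch moves a handful of records from raw to derived to aggregated form. The sample records, field names, and the `derive` and `aggregate` functions are assumptions made for this example and do not come from the cited source.

```python
from collections import defaultdict

# Raw data: kept exactly as it arrived, including a malformed record.
raw_events = [
    "2016-02-01,store-1,120.50",
    "2016-02-01,store-2,80.00",
    "2016-02-02,store-1,not-a-number",
    "2016-02-02,store-2,45.25",
]


def derive(lines):
    """Derived data: parsed and typed records built from the raw lines."""
    derived = []
    for line in lines:
        day, store, amount = line.split(",")
        try:
            derived.append({"day": day, "store": store, "amount": float(amount)})
        except ValueError:
            # The malformed record is skipped here but remains in raw_events.
            continue
    return derived


def aggregate(rows):
    """Aggregated data: totals per store, computed from the derived records."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


derived_rows = derive(raw_events)
store_totals = aggregate(derived_rows)
```

Because the raw lines are never modified, the derived and aggregated layers can be rebuilt at any time, for example after a parsing rule changes in the discovery environment.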


How does the Data Lake work?

Traditional Enterprise Data Warehouse

http://www.clearpeaks.com/blog/category/tableau
New Data Management Architecture

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
http://www.kdnuggets.com/2014/05/big-data-landscape-v30-analyzed.html
Data Lake Maturity

The Definition of Data Lake, John O’Brien (2015)
4 Maturity Stages of Data Lake

• Stage 1 – Pilot Project (Understand the Technology)
• Stage 2 – Productionize Hadoop and its capabilities
• Stage 3 – Proactively consolidate data for (Big) Data Analytics
• Stage 4 – Make the Data Lake platform a core competency

The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Stage 1 – Pilot Project

• Handling data at scale
• Involves getting the plumbing in place and learning to acquire and transform data at scale.
• The analytics may be quite simple, but much is learned about making Hadoop work the way you desire.

The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Stage 2 – Productionize Hadoop and its capabilities

• Involves improving the ability to transform and analyze data.
• Find the tools that are most appropriate to the team's skill set.
• Acquire more data and build applications.

The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Stage 3 – Proactively consolidate data for (Big) Data Analytics

• Involves getting data and analytics into the hands of as many people as possible.
• It is in this stage that the data lake and the enterprise data warehouse start to work in unison, each playing its role.
• Organizations that started with a data lake eventually add an enterprise data warehouse to operationalize their data.

The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Big Data Analytics

http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html
Data Lake and Big Data Analytics

http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/
Stage 4 – Make the Data Lake platform a core competency

• Enterprise capabilities are added to the data lake.
• Few companies have reached this level of maturity, but many will as the use of big data grows.
• Requires data governance, compliance, security, and auditing, incorporated into the company's data strategy.

The Technology of the Business Data Lake, Capgemini (2013)
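
The governance, security, and auditing requirement at this stage can be sketched in a few lines of Python. This is only an illustrative toy, assuming a role-based rule and an in-memory audit log; the `Dataset` and `Catalog` classes and every name below are hypothetical and do not represent any particular governance product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Dataset:
    """A catalog entry: who owns the data and which roles may read it."""
    name: str
    owner: str
    allowed_roles: frozenset


@dataclass
class Catalog:
    datasets: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, dataset: Dataset) -> None:
        self.datasets[dataset.name] = dataset

    def request_access(self, user: str, role: str, dataset_name: str) -> bool:
        """Check the role against the dataset policy and record the decision."""
        dataset = self.datasets[dataset_name]
        granted = role in dataset.allowed_roles
        # Every request is logged, whether it is granted or denied.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "dataset": dataset_name,
            "granted": granted,
        })
        return granted


catalog = Catalog()
catalog.register(Dataset("sales_raw", owner="finance",
                         allowed_roles=frozenset({"analyst", "engineer"})))
analyst_ok = catalog.request_access("ann", "analyst", "sales_raw")
guest_ok = catalog.request_access("bob", "guest", "sales_raw")
```

Logging denied requests alongside granted ones is the choice that makes the log usable for compliance review rather than only for usage statistics.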


Business Data Lake

The Technology of the Business Data Lake, Capgemini (2014)

https://shefsite.files.wordpress.com/2014/04/where.jpg
http://image.slidesharecdn.com/mapr-db-in-hadoop-nosql-overview-150929062856-lva1-app6892/95/maprdb-the-first-inhadoop-document-database-12-638.jpg?cb=1443536326
http://www.predictiveanalyticstoday.com/waterline-data-self-service-for-the-hadoop-data-lake/
The Data Lake Unifies Data Discovery, Data Science, and BI 3.0

Diagram labels: Data Lake, Big Data, Hadoop, YARN, Spark, Hive, Big Data Graph Analytics, Predictive Analytics, Data Visualization, Business Discovery, Visual Analytics, Deep Learning, Data Science, Machine Learning, Feature Engineering, Self Serve Business, Business Intelligence 3.0
• 20+ posts relate to “Data Lake”
• To find them, search for “Data Science Thailand” “Data Lake”

Questions?
