0 Principles of Big Data
Today’s Agenda
Part I
Big Data
Data is a key resource in the modern world.
According to IBM, we are currently creating 2.5 quintillion bytes of data every day.
IDC predicts that the worldwide volume of data will reach 40 zettabytes by 2020.
The radical expansion and integration of computation, networking, digital devices, and data storage have provided a robust platform for the explosion in big data.
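To put the two volume figures on the same scale, the short calculation below converts them to common units. It is only a back-of-the-envelope sketch: it assumes decimal (SI) units, a 365-day year, and a hypothetical constant daily rate, none of which are stated on the slide.

```python
# Decimal (SI) byte units; an assumption, the slide does not say which units are meant.
EXABYTE = 10**18     # one quintillion bytes
ZETTABYTE = 10**21

daily_bytes = 25 * 10**17          # 2.5 quintillion bytes per day (IBM figure)
target_bytes = 40 * ZETTABYTE      # 40 zettabytes (IDC prediction)

# Hypothetical baseline: how long would 40 ZB take at a constant daily rate?
days_at_constant_rate = target_bytes / daily_bytes
years_at_constant_rate = days_at_constant_rate / 365

print(f"{daily_bytes / EXABYTE:.1f} EB per day")
print(f"{days_at_constant_rate:,.0f} days, about {years_at_constant_rate:.1f} years")
```

At a constant 2.5 EB per day, accumulating 40 ZB would take roughly 16,000 days, which suggests that the forecast relies on the daily rate continuing to rise.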
Big Data
On the Verge of a Disruptive Century: Breakthroughs
Gene Sequencing and Biotechnology
Ubiquitous Computing
Smaller, Faster, Cheaper Sensors
Faster Communication
Big Data Applications are Everywhere
Big Data: What Happens in the Internet in a Minute?
Data Generation and Consumption Model is Changing
Data Generation and Consumption Model is Changing
Big Data
Big Data
Your Smart Phone Is Now Very Smart
Internet of Things (IoT)
A network of devices that connect directly with each other to capture, share, and monitor vital data automatically over an SSL connection to a central command-and-control server in the cloud
Enabling communication between devices, people & processes to exchange useful information & knowledge that creates value for humans
A global Network Infrastructure linking Physical & Virtual Objects
Infrastructure: Internet and Network developments
Specific object identification, sensor, and connection capability
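The device-to-cloud path described above can be sketched with the Python standard library. This is a minimal illustration, not a reference IoT protocol: the message fields, the device name, and the endpoint `iot.example.com:8883` are all hypothetical, and the `ssl` module negotiates TLS, the successor to SSL.

```python
import json
import socket
import ssl
import time


def encode_reading(device_id: str, sensor: str, value: float, timestamp: float) -> bytes:
    """Serialize one sensor reading as a newline-terminated JSON message.

    The field names are hypothetical; real deployments define their own schema.
    """
    message = {
        "device_id": device_id,
        "sensor": sensor,
        "value": value,
        "timestamp": timestamp,
    }
    return (json.dumps(message, sort_keys=True) + "\n").encode("utf-8")


def send_reading(host: str, port: int, payload: bytes, timeout: float = 5.0) -> None:
    """Open an encrypted connection to the central server and send one payload."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(payload)


if __name__ == "__main__":
    payload = encode_reading("thermostat-42", "temperature_c", 21.5, time.time())
    print(payload)
    # Hypothetical endpoint; uncomment only if such a server actually exists.
    # send_reading("iot.example.com", 8883, payload)
```

Separating encoding from transport keeps the device logic testable without a live server.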
Big Data: Internet of Things
Prediction of IoT Usage¹
¹ https://www.ericsson.com/
Why Is the IoT Opportunity Growing Now?
Smart X Phenomena
What Does It All Produce?
Big Data: Activity Data
Simple activities like listening to music or reading a book are now
generating data.
Digital music players and eBooks collect data on our activities.
Your smart phone collects data on how you use it and your web
browser collects information on what you are searching for.
Your credit card company collects data on where you shop and your
shop collects data on what you buy.
It is hard to imagine any activity that does not generate data.
Big Data
The cost of sequencing one human genome fell from $100 million in 2001 to $1,000 in 2015.
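The size of that drop is easier to appreciate as a halving time. The calculation below is a rough sketch that assumes a smooth exponential decline between the two quoted data points, which the slide itself does not claim.

```python
import math

cost_2001 = 100_000_000   # USD, figure quoted on the slide
cost_2015 = 1_000         # USD, figure quoted on the slide
years = 2015 - 2001

fold_reduction = cost_2001 / cost_2015          # how many times cheaper
halvings = math.log2(fold_reduction)            # number of times the cost halved
halving_time_months = 12 * years / halvings     # assumes a smooth exponential decline

print(f"{fold_reduction:,.0f}x cheaper over {years} years")
print(f"cost halved roughly every {halving_time_months:.1f} months")
```

Under that assumption the cost halved about every ten months.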
New Types of Data
New Types of Data
The Data Structure Evolution Over the Years
What Does Big Data Mean?
Big Data (3V)
Big Data (5V)
Big Data
Big Data Definition
A McKinsey global report described big data as the next frontier for innovation and competition.
The report defined big data as “Data whose scale, distribution, diversity, and/or timeliness require the use of new technical architectures and analytics to enable insights that unlock the new sources of business value”.
Big Data Revolution
IBM 5MB Hard Disk ;-)
Recent Advances in Computational Power
Big Data
Your Pocket-Size Terabyte Hard Disk
Hardware Advancements Enable Big Data Processing
Scale Up vs Scale Out
The Data Overload Problem
The Data Overload Problem
The Data Overload Problem
Fourth Paradigm
Fourth Paradigm
Fourth Paradigm
Computing Clusters
Data Centers
Big Data is a Competitive Advantage
Big Data is a Competitive Advantage
Data is the new Oil/Gold
Big Data Processing Systems
Big Data is the New Oil and Big Data Processing Systems are the Machinery
Part II
A Little History: Two Seminal Contributions
“The Google File System”²
Describes a scalable, distributed, fault-tolerant file system tailored for data-intensive applications that runs on inexpensive commodity hardware and delivers high aggregate performance.
“MapReduce: Simplified Data Processing on Large Clusters”³
Describes a simple programming model and an implementation for processing large data sets on computing clusters.
² S. Ghemawat, H. Gobioff, S. Leung. The Google File System. SOSP 2003.
³ J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
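The programming model in the MapReduce paper is usually introduced with word counting. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process; the function names are illustrative assumptions and this is neither Google's implementation nor the Hadoop API, which distribute the same phases across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values for one key into a single result."""
    return word, sum(counts)


def run_word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially on in-memory documents."""
    mapped = (pair for doc in documents for pair in map_words(doc))
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is the new oil"]
    print(run_word_count(docs))
```

Because each `map_words` call sees only one document and each `reduce_counts` call sees only one key, both phases can in principle be run in parallel on different machines, which is the property the model is built around.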
Hadoop⁴: A Star is Born
Hadoop’s Success
Hadoop’s Success⁵
⁵ https://www.google.com/trends/
The Perennial Dilemma: Does One Size Fit All?!
Big Data 2.0 Processing Systems
Figure: timeline of big data processing systems, 2004 to 2015, showing Cloudera Impala, Apache Samza, Apache S4, Trinity, Google MapReduce, Apache Spark, Apache Storm, GraphLab, Facebook Presto, Apache Phoenix, Hadoop, Apache Hive, Google Pregel, Apache Giraph, IBM Big SQL, GraphX, and Apache Tez
Big Graphs
Hadoop for Big Graphs?!
Static Data Computation vs Streaming Data Computation
Hadoop for Big Streams?!
⁸ http://nosql-database.org/
Data Storage Options
Big Data Landscape
Big Data Landscape
Big Data Market Size⁹
⁹ https://www.statista.com/
The End