
CIT650

Introduction to Big Data

Principles of Big Data

1
Today’s Agenda

Big Data Phenomena

Big Data 1.0 Systems

Big Data 2.0 Systems

2
Part I

Big Data Phenomena

3
Big Data
Data is a key resource in the modern world.
According to IBM, we are currently creating 2.5 quintillion bytes of data every day.
IDC predicts that the worldwide volume of data will reach 40 zettabytes by 2020.
The radical expansion and integration of computation, networking, digital devices, and data storage have provided a robust platform for the explosion in big data.

4
Big Data

5
On the Verge of a Disruptive Century: Breakthroughs

Gene Sequencing and Biotechnology
Ubiquitous Computing
Smaller, Faster, Cheaper Sensors
Faster Communication
6
Big Data Applications are Everywhere

7
Big Data: What Happens on the Internet in a Minute?

8
Data Generation and Consumption Model is Changing

9
Data Generation and Consumption Model is Changing

Old Model: Few companies (producers) are generating data; all others are consuming data.

New Model: All of us are generating data, and all of us are consuming data.

10
Big Data

11
Big Data

Data generation and consumption are becoming a main part of people's daily lives, especially with the pervasive availability and usage of Internet technology and applications.

12
Your Smartphone is Now Very Smart

13
Internet of Things (IoT)
A network of devices that connect directly with each other to capture, share, and monitor vital data automatically through a secure (SSL) connection to a central command-and-control server in the cloud.
Enabling communication between devices, people, and processes to exchange useful information and knowledge that creates value for humans.
A global network infrastructure linking physical and virtual objects.
Infrastructure: Internet and network developments.
Specific object identification, sensing, and connection capability.

14
Big Data: Internet of Things

15
Prediction of IoT Usage1

1 https://www.ericsson.com/
16
Why is the IoT Opportunity Growing Now?

Affordable hardware: Costs of actuators and sensors have been cut in half over the last 10 years.
Smaller, more powerful hardware: Form factors of hardware have shrunk to millimeter or even nanometer levels.
Ubiquitous and cheap mobility: Costs for mobile devices, bandwidth, and data processing have declined over the last 10 years.
Availability of supporting tools: Big data tools and cloud-based infrastructure have become widely available.

17
Smart X Phenomena

18
What Does It All Produce?

Data ... Data ... Data

19
Big Data: Activity Data
Simple activities like listening to music or reading a book now generate data.
Digital music players and eBooks collect data on our activities.
Your smartphone collects data on how you use it, and your web browser collects information on what you are searching for.
Your credit card company collects data on where you shop, and your shop collects data on what you buy.
It is hard to imagine any activity that does not generate data.

20
Big Data

The cost of sequencing one human genome has fallen from $100 million in 2001 to $1,000 in 2015.

21
New Types of Data

22
New Types of Data

23
The Data Structure Evolution Over the Years

24
What Does Big Data Mean?

25
Big Data (3V)

The three Vs: Volume, Velocity, and Variety.
26
Big Data (5V)

The five Vs add Veracity and Value to Volume, Velocity, and Variety.
27
Big Data

28
Big Data Definition

The McKinsey Global Institute report described big data as the next frontier for innovation and competition.
The report defined big data as "data whose scale, distribution, diversity, and/or timeliness require the use of new technical architectures and analytics to enable insights that unlock new sources of business value."

29
Big Data Revolution

30
IBM 5MB Hard Disk ;-)

31
Recent Advances in Computational Power

Cheaper, larger, and faster disk storage
You can now put your entire large database on disk
Cheaper, larger, and faster memory
You may even be able to accommodate it all in memory
Cheaper, more capable, and faster processors
Parallel computing architectures:
Operate on large datasets in reasonable time
Try exhaustive searches and brute-force solutions

32
Big Data

Moore's Law: The information density on silicon integrated circuits doubles every 18 to 24 months.
Users expect more sophisticated information.

33
Your Pocket-Size Terabyte Hard Disk

34
Hardware Advancements Enable Big Data Processing

35
Scale Up vs. Scale Out

Scale up: add power (CPU, memory, disk) to a single machine. Scale out: add more commodity machines that work in parallel.

36
The Data Overload Problem

37
The Data Overload Problem

Data is growing at a phenomenal rate. It has become massive, operational, and opportunistic.
We are drowning in data but starving for knowledge.
The hidden information and knowledge in these mountains of data are what is really most useful.

38
The Data Overload Problem

39
Fourth Paradigm

Jim Gray, a database pioneer, described the big data phenomenon as the Fourth Paradigm and called for a paradigm shift in computing architecture and large-scale data processing mechanisms.
The first three paradigms were experimental, theoretical, and, more recently, computational science.

40
Fourth Paradigm

41
Fourth Paradigm

Thousands of years ago - Experimental Science
Description of natural phenomena

Last few hundred years - Theoretical Science
Newton's laws, Maxwell's equations, ...

Last few decades - Computational Science
Simulations of complex phenomena

Today - Data-Intensive Science
Scientists overwhelmed with datasets from many different sources:
Data captured by instruments
Data generated by simulations
Data generated by sensor networks

42
Computing Clusters

Many racks of computers, thousands of machines per cluster.
Limited bisection bandwidth between racks.

43
Data Centers

44
Big Data is a Competitive Advantage

45
Big Data is a Competitive Advantage

"It's not who has the best algorithm that wins. It's who has the most data."

Andrew Ng

46
Data is the new Oil/Gold

47
Big Data Processing Systems
Big Data is the New Oil
and
Big Data Processing Systems are the Machinery

48
Part II

Big Data 1.0 Systems: The Hadoop Decade

49
A Little History: Two Seminal Contributions

"The Google File System"2
Describes a scalable, distributed, fault-tolerant file system tailored for data-intensive applications that runs on inexpensive commodity hardware and delivers high aggregate performance.

"MapReduce: Simplified Data Processing on Large Clusters"3
Describes a simple programming model and an implementation for processing large data sets on computing clusters.
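To make the programming model concrete, below is a minimal word-count sketch in plain Python that mimics the map, shuffle, and reduce phases. It is an illustration only, not the Hadoop API: the function names are invented, and the in-memory dictionary stands in for the distributed shuffle a real cluster performs.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in document.split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Reduce: sum all the counts grouped under one key.
    return word, sum(counts)

def mapreduce_wordcount(documents):
    # Shuffle: group intermediate pairs by key. In a real cluster the
    # framework does this across machines; here it is a local dict.
    groups = defaultdict(list)
    for doc in documents:
        for word, count in map_phase(doc):
            groups[word].append(count)
    return dict(reduce_phase(w, c) for w, c in groups.items())

print(mapreduce_wordcount(["big data big systems", "data everywhere"]))
# -> {'big': 2, 'data': 2, 'systems': 1, 'everywhere': 1}
```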

2 S. Ghemawat, H. Gobioff, S. Leung. The Google File System. SOSP 2003.
3 J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
50
Hadoop4: A Star is Born

Hadoop is an open-source software framework that supports data-intensive distributed applications and clones Google's MapReduce framework.
It is designed to process very large amounts of unstructured and complex data.
It is designed to run on a large number of machines that don't share any memory or disks.
It is designed to run on a cluster of machines that can be put together at relatively low cost and with easy maintenance.

4 http://hadoop.apache.org/
51
Key Aspects of Hadoop

52
Hadoop’s Success

Big Data 1.0 = Hadoop

53
Hadoop’s Success5

Big Data 1.0 = Hadoop

5 https://www.google.com/trends/
54
The Eternal Dilemma: Does One Size Fit All?!

55
Big Data 2.0 Processing Systems

Big Data 2.0 != Hadoop

Domain-specific, optimized, and vertically focused systems:

[Timeline of systems, 2004-2015: Google MapReduce, Hadoop, Apache Hive, Google Pregel, Apache Giraph, Apache Spark, Apache Storm, GraphLab, PowerGraph, GraphX, Apache Flink, Trinity, Apache S4, Apache Samza, Cloudera Impala, Facebook Presto, Apache Phoenix, Apache Tajo, IBM Big SQL, Apache Tez]

56
Big Graphs

Google estimates that the total number of web pages exceeds 1 trillion; experimental graphs of the World Wide Web contain more than 20 billion nodes and 160 billion edges.

Facebook reportedly consisted of more than a billion users (nodes) and more than 140 billion friendship relationships (edges) in 2012.

The LinkedIn network contains almost 260 million nodes and billions of edges.

Linked Data contains about 31 billion triples.

57
Hadoop for Big Graphs?!

Popular graph query/analysis operations such as PageRank, pattern matching, shortest path, clustering (e.g., max clique, triangle closure), community detection, etc., are iterative in nature.
The MapReduce programming model does not directly support iterative data analysis. Programmers may implement iterative programs by manually issuing multiple MapReduce jobs and orchestrating their execution using a driver program, which wastes I/O, network bandwidth, and CPU resources, as in the sketch below.
It is not intuitive to think of graphs as key/value pairs or matrices.
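A hedged sketch of that driver-program pattern in Python. The submit_mapreduce_job and max_rank_change helpers are hypothetical placeholders, not a real Hadoop API; the point is that every iteration is a full job whose output is materialized before the next one starts.

```python
def submit_mapreduce_job(mapper, reducer, input_path, output_path):
    # Placeholder: in real life this would launch a full Hadoop job and
    # block until its entire output is materialized on disk.
    print(f"running {mapper}/{reducer}: {input_path} -> {output_path}")

def max_rank_change(previous_path, current_path):
    # Placeholder: the driver must read data back from storage just to
    # decide whether another iteration is needed.
    return 1.0  # pretend we never converge, so every iteration runs

def run_iterative_graph_job(input_path, max_iterations=30, epsilon=1e-4):
    # Orchestrate an iterative analysis (e.g., PageRank) as a chain of
    # MapReduce jobs. Every iteration re-reads and re-writes the whole
    # graph, which is where the wasted I/O, bandwidth, and CPU come from.
    current = input_path
    for i in range(max_iterations):
        output = f"{input_path}/iter_{i}"
        submit_mapreduce_job("rank_mapper", "rank_reducer", current, output)
        if max_rank_change(current, output) < epsilon:
            return output
        current = output
    return current

run_iterative_graph_job("hdfs://graph/input", max_iterations=3)
```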
58
Pregel/Giraph6

In 2010, Google introduced the Pregel system as a scalable platform for implementing graph algorithms.
Pregel relies on a vertex-centric approach and is inspired by the Bulk Synchronous Parallel (BSP) model.
In 2012, Apache Giraph was launched as an open-source project that clones the concepts of Pregel and leverages the Hadoop infrastructure.
Other projects: Spark GraphX (Apache), GoldenOrb (Apache), GraphLab (CMU), and Signal/Collect (UZH).
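A minimal single-machine sketch of the vertex-centric, BSP-style model (for illustration only; this is not the actual Pregel or Giraph API). In each superstep, every vertex combines the messages it received, updates its own value, and sends new messages along its out-edges, which are delivered at the next superstep.

```python
def pagerank_superstep(ranks, edges, incoming, damping=0.85):
    # One BSP superstep: compute a new value from received messages,
    # then send the new rank, split evenly, along the out-edges.
    outgoing = {v: [] for v in ranks}
    for v in ranks:
        ranks[v] = (1 - damping) / len(ranks) + damping * sum(incoming[v])
        for neighbor in edges[v]:
            outgoing[neighbor].append(ranks[v] / len(edges[v]))
    return outgoing  # becomes `incoming` at the next superstep

# Tiny 3-vertex cycle; every vertex starts with rank 1/3.
edges = {"a": ["b"], "b": ["c"], "c": ["a"]}
ranks = {v: 1 / 3 for v in edges}
messages = {v: [ranks[u] / len(edges[u]) for u in edges if v in edges[u]]
            for v in edges}
for _ in range(30):          # one synchronization barrier per superstep
    messages = pagerank_superstep(ranks, edges, messages)
print(ranks)                 # stays 1/3 each on this symmetric cycle
```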
6 https://giraph.apache.org/
59
Big Streaming Data

Every day, Twitter generates more than 12 TB of tweets.

The New York Stock Exchange captures 1 TB of trade information.

About 30 billion radio-frequency identification (RFID) tags are created every day.

Hundreds of millions of GPS devices are sold every year.

60
Static Data Computation vs Streaming Data Computation

61
Hadoop for Big Streams?!

From the stream-processing point of view, the main limitation of the original implementation of the MapReduce framework is that it was designed so that the entire output of each map and reduce task is materialized into a local file before it can be consumed by the next stage.
This materialization step enables the implementation of a simple and elegant checkpoint/restart fault-tolerance mechanism, but it causes significant delay for jobs with real-time processing requirements.
Some Hadoop-based attempts include MapReduce Online and Incoop.
62
Twitter Storm7

An open-source project developed by Nathan Marz and acquired by Twitter in 2012.
Storm is a distributed stream-processing system with the following key design features: horizontal scalability, guaranteed reliable communication between the processing nodes, fault tolerance, and programming-language agnosticism.
A Storm cluster is superficially similar to a Hadoop cluster. One key difference is that a MapReduce job eventually finishes, whereas a Storm job processes messages forever (or until the user kills it).
Other projects: Flink, Apex, Spark Streaming, and Kafka Streams.
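A hedged sketch of Storm's spout/bolt idea using plain Python generators (Storm's real API is different; these names are illustrative). The contrast with MapReduce is visible in the code: tuples flow through the stages one at a time, and nothing waits for the input to end, because it never does.

```python
import itertools
import random

def tweet_spout():
    # Spout: an unbounded source that emits one tuple at a time, forever.
    words = ["big", "data", "stream", "storm"]
    while True:
        yield random.choice(words)

def counting_bolt(stream):
    # Bolt: consumes tuples as they arrive and emits running counts,
    # with no materialization barrier between stages.
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]

# The "topology": spout -> bolt, processing messages until killed
# (here we just take the first 10 results for demonstration).
for word, count in itertools.islice(counting_bolt(tweet_spout()), 10):
    print(f"{word}: {count}")
```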
7 http://storm.apache.org/
63
Massively Parallel Processing (MPP) Optimized SQL Query Engines

64


NoSQL Databases8

NoSQL database systems represent a new generation of low-cost, high-performance database software that is increasingly gaining popularity.
These systems promise to simplify administration, be fault-tolerant, and scale out on commodity hardware, as the sketch below illustrates.
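One common mechanism behind the scale-out claim is hash partitioning of keys across commodity nodes. Below is a minimal sketch under stated assumptions: the node names, the modulo placement, and the replication factor are invented for illustration, and real systems typically use consistent hashing so that adding a node moves only a small fraction of the keys.

```python
import hashlib

NODES = ["node-1", "node-2", "node-3", "node-4"]  # commodity servers

def owners(key, nodes=NODES, replicas=2):
    # Hash the key to pick a home node, then place extra copies on the
    # following nodes for fault tolerance. Capacity grows by adding
    # nodes (scale out) rather than buying bigger hardware (scale up).
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    start = h % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

print(owners("user:42"))  # e.g., ['node-3', 'node-4']
```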

8 http://nosql-database.org/
65
Data Storage Options

66
Big Data Landscape

67
Big Data Landscape

68
Big Data Market Size9

9 https://www.statista.com/
69
The End

70
