
Cascade

This document provides an overview of big data concepts, including what constitutes big data, how data is collected and stored at large scale, and some of the challenges involved. It defines big data as the massive amounts of both structured and unstructured data that are now being created and must be stored and analyzed. Examples given include over 2.5 quintillion bytes of data created daily through social media, documents, and other online sources. The four Vs of big data (volume, velocity, variety, and veracity) are explained. Methods of collecting and storing large datasets across distributed systems like Hadoop are also summarized.

Uploaded by

api-279151942

Presented by K. Janaki Ram

Contents:
Classification of Data
What Happens on the Internet in a Minute?
What is BIG DATA?
The 4 Vs of BIG DATA
Volume
Velocity
Variety
Veracity
Collecting Data
Storing Data
Challenges
Hadoop

Classification of Data

Types of Data

What Happens on the Internet in a Minute?

204 million e-mails are sent.
300K logins to Facebook.
1.3 million views on YouTube.
2 million Google searches.
100,000 tweets on Twitter.
20 hours of video.
62,000 hours of music downloads.
And there are many more sources of unstructured data.

WHAT IS BIG DATA?


We constantly produce a lot of data, for example through social media, public transport, and GPS.
Every day we upload 55 million pictures, 340 million tweets, and 1 billion documents.
Together these amount to 2.5 quintillion bytes a day, and this mass of data is called BIG DATA.

4 Vs OF BIG DATA

DATA AVALANCHE / MOORE'S LAW OF DATA

We are now collecting large amounts of data and converting it to digital form.
90% of the data in the world today was created within the past two years.
The amount of data we have doubles very quickly.
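
The "90% in two years" figure can be turned into a rough doubling time. The sketch below assumes steady exponential growth, which the slide does not state; the function name is illustrative only.

```python
import math


def implied_doubling_time(recent_fraction: float, window_years: float) -> float:
    """Doubling time in years implied by `recent_fraction` of all data being
    created within the last `window_years`, assuming exponential growth."""
    # If a fraction f of today's data is new, the total grew by 1 / (1 - f)
    # over the window (f = 0.9 means a tenfold increase).
    growth_factor = 1.0 / (1.0 - recent_fraction)
    return window_years * math.log(2) / math.log(growth_factor)


doubling_years = implied_doubling_time(0.90, 2.0)
print(f"Implied doubling time: {doubling_years:.2f} years "
      f"(about {doubling_years * 12:.1f} months)")
```

Under that assumption, a tenfold increase in two years works out to a doubling roughly every seven months.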

BIG DATA ARCHITECTURE

DRIVERS OF BIG DATA

COLLECTING DATA
Data is collected at sensors and sent to the big data system via events or flat files.
Event streams: we name each event stream by its content or originator.

Get data through
Point to point
Event bus

E.g. Data Bridge: a Thrift-based transport we built that handles about 400k events/sec
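
To make the event bus idea concrete, here is a minimal in-process sketch. It is not the Data Bridge transport mentioned above; the `EventBus` class and the stream names are hypothetical, and a real system would send events over the network rather than call handlers directly.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventBus:
    """Toy event bus: publishers and subscribers only share a stream name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, stream: str, handler: Handler) -> None:
        # Streams are named by content/originator, e.g. "sensor.temperature".
        self._subscribers[stream].append(handler)

    def publish(self, stream: str, event: Any) -> int:
        # Deliver the event to every subscriber of this stream and
        # return how many handlers received it.
        handlers = self._subscribers.get(stream, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
received: List[Any] = []
bus.subscribe("sensor.temperature", received.append)
bus.subscribe("sensor.temperature", lambda event: None)  # a second consumer

delivered = bus.publish("sensor.temperature", {"sensor_id": "s1", "celsius": 21.5})
ignored = bus.publish("sensor.humidity", {"sensor_id": "s1", "percent": 40})
print(delivered, ignored, received)
```

With point-to-point delivery the sender must know each receiver; with a bus, new consumers can subscribe to a stream without the sensors changing.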

STORING DATA
Historically we used databases.
Scale is a challenge: it requires replication and sharding.
Scalable options:
NoSQL (Cassandra, HBase) [if data is structured]
Distributed file systems (e.g. HDFS) [if data is unstructured]
NewSQL: in-memory computing, VoltDB
Specialized data structures: graph databases, data structure servers
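
Sharding and replication can be sketched in a few lines. The example below uses simple hash-modulo placement with replicas on the following shards; this is an illustrative assumption, not how Cassandra, HBase, or HDFS actually place data, and the function names are hypothetical.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key: str, num_shards: int, replication: int) -> List[int]:
    """Primary shard plus the next (replication - 1) shards, wrapping around."""
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication)]


for key in ("user:1001", "user:1002", "user:1003"):
    print(key, "->", replicas_for(key, num_shards=8, replication=3))
```

Plain modulo placement moves most keys when the number of shards changes, which is one reason production systems tend to use schemes such as consistent hashing instead.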

WHY IS BIG DATA HARD?

How to store?
Assuming 1 TB of disk per computer, it takes 1,000 computers to store 1 PB.

How to move?
Assuming a 10 Gb network, it takes 2 hours to copy 1 TB, or 83 days to copy 1 PB.

How to search?
Assuming each record is 1 KB and one machine can process 1,000 records per second, it needs about 278 CPU hours (roughly 11.6 CPU days) to process 1 TB and about 32 CPU years to process 1 PB.

How to process?
How do we convert algorithms to work at large scale?
How do we create new algorithms?
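
The back-of-envelope numbers above can be recomputed from the stated assumptions. This is a rough sketch using decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and ideal, overhead-free throughput; both are assumptions, and the function names are illustrative.

```python
TB = 10**12  # bytes, decimal units assumed
PB = 10**15


def machines_needed(total_bytes: int, disk_bytes_per_machine: int) -> int:
    """Number of machines needed to hold the data (rounded up)."""
    return -(-total_bytes // disk_bytes_per_machine)


def copy_seconds(total_bytes: int, link_bits_per_sec: float) -> float:
    """Ideal transfer time at the given link rate, ignoring all overhead."""
    return total_bytes * 8 / link_bits_per_sec


def scan_seconds(total_bytes: int, record_bytes: int, records_per_sec: float) -> float:
    """CPU time for one machine to process every record once."""
    return total_bytes / record_bytes / records_per_sec


print("Machines for 1 PB at 1 TB each:", machines_needed(PB, TB))
print("Copy 1 TB at 10 Gb/s: %.1f minutes" % (copy_seconds(TB, 10e9) / 60))
print("Copy 1 PB at 10 Gb/s: %.1f days" % (copy_seconds(PB, 10e9) / 86400))
print("Scan 1 TB: %.1f CPU hours" % (scan_seconds(TB, 1000, 1000) / 3600))
print("Scan 1 PB: %.1f CPU years" % (scan_seconds(PB, 1000, 1000) / (86400 * 365)))
```

At the full 10 Gb/s line rate the ideal copy time is about 13 minutes per TB. The 2-hour figure quoted above matches an effective throughput closer to 1 Gb/s, which is plausible once disk speed and protocol overhead are included.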

CHALLENGES
Systems built from many computers
that handle lots of data
and run complex logic.
This pushes us to the frontier of distributed systems and databases.
More data does not mean there is a simple model.
Some models can be as complex as the system itself.

WHAT IS HADOOP?

FUTURE SCOPE
ROBOTICS

BIG DATA

THANK YOU

Mail me at: [email protected]
Mobile No: 9247661152

K. JANAKI RAM
12481A0456
