Big Data

Uploaded by

VIJAYA PRABA P

BIG DATA

Prepared by: EduTechLearners

How much time did it take?
• Excel: Have you ever tried a pivot table on a 500 MB file?
• SAS/R: Have you ever tried a frequency table on a 2 GB file?
• Access: Have you ever tried running a query on a 10 GB file?
• SQL: Have you ever tried running a query on a 50 GB file?

Can you imagine?
• Can you think of running a query on a 20,980,000 GB file?
• What if we get a new data set like this every day?
• What if we need to execute complex queries on this data set
every day?
• Does anybody really deal with this type of data set?
• Is it possible to store and analyze this data?
• Yes: Google deals with more than 20 PB of data every day.
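
As a quick check on how the 20,980,000 GB figure relates to the 20 PB figure, the sketch below converts gigabytes to petabytes. The function name gb_to_pb is hypothetical, and because the slide does not say which unit convention it uses, both the binary (1 PB = 1024 × 1024 GB) and decimal (1 PB = 1,000,000 GB) conventions are shown as assumptions.

```python
def gb_to_pb(gigabytes: float, binary: bool = True) -> float:
    """Convert gigabytes to petabytes.

    binary=True  assumes 1 PB = 1024 * 1024 GB
    binary=False assumes 1 PB = 1000 * 1000 GB
    """
    gb_per_pb = 1024 ** 2 if binary else 1000 ** 2
    return gigabytes / gb_per_pb


if __name__ == "__main__":
    size_gb = 20_980_000  # the file size quoted on the slide
    print(f"binary : {gb_to_pb(size_gb):.2f} PB")
    print(f"decimal: {gb_to_pb(size_gb, binary=False):.2f} PB")
```

Under either convention the file comes out at roughly 20 to 21 PB, which matches the daily volume quoted for Google.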

In fact, in a minute
• Email users send more than 204 million messages;
• Mobile Web receives 217 new users;
• Google receives over 2 million search queries;
• YouTube users upload 48 hours of new video;
• Facebook users share 684,000 pieces of content;
• Twitter users send more than 100,000 tweets;
• Consumers spend $272,000 on Web shopping;
• Apple receives around 47,000 application downloads;
• Brands receive more than 34,000 Facebook 'likes';
• Tumblr blog owners publish 27,000 new posts;
• Instagram users share 3,600 new photos;
• Flickr users, on the other hand, add 3,125 new photos;
• Foursquare users perform 2,000 check-ins;
• WordPress users publish close to 350 new blog posts.
And these figures are already a year old!

What is BIG DATA?
• A collection of data sets so large and complex that it becomes
difficult to process using on-hand database management tools
or traditional data processing applications.
• “Big Data” is data whose scale, diversity, and complexity
require new architectures, techniques, algorithms, and analytics
to manage it and to extract value and hidden knowledge from it.
• “Big Data” is similar to “small data”, but bigger in size.
• It aims to solve new problems, or old problems in a better way.
• Big Data generates value from the storage and processing of
very large quantities of digital information that cannot be
analyzed with traditional computing techniques.

Three Characteristics of Big Data (the 3 Vs)
• Volume: data quantity
• Velocity: data speed
• Variety: data types

1st Characteristic of Big Data
Volume
• A typical PC might have had 10 gigabytes of storage in 2000.
• Today, Facebook ingests 500 terabytes of new data every day.
• A Boeing 737 will generate 240 terabytes of flight data during a
single flight across the US.
• Smartphones and the data they create and consume, together with
sensors embedded into everyday objects, will soon result in billions
of new, constantly updated data feeds containing environmental,
location, and other information, including video.

2nd Characteristic of Big Data
Velocity
• Clickstreams and ad impressions capture user behavior at
millions of events per second.
• High-frequency stock trading algorithms reflect market
changes within microseconds.
• Machine-to-machine processes exchange data between billions
of devices.
• Infrastructure and sensors generate massive log data in real time.
• Online gaming systems support millions of concurrent users,
each producing multiple inputs per second.

3rd Characteristic of Big Data
Variety
• Big Data isn't just numbers, dates, and strings. Big
Data is also geospatial data, 3D data, audio and video,
and unstructured text, including log files and social
media.
• Traditional database systems were designed to address
smaller volumes of structured data, fewer updates, or a
predictable, consistent data structure.
• Big Data analysis includes different types of data.

Handling big data - Parallel computing
• Imagine a 1 GB text file containing all the status updates on
Facebook in a day.
• Now suppose that simply counting the number of rows takes
10 minutes.
• SELECT COUNT(*) FROM fb_status;
• What do you do if you have 6 months of data, a file of size 200 GB,
and you still want the result in 10 minutes?
• Parallel computing?
• Put multiple CPUs in a machine (100?)
• Write code that calculates 200 parallel counts and finally
sums them up
• But you need a supercomputer
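
The idea of splitting one count into partial counts and summing them can be sketched in a few lines of Python. If counting scaled linearly (an assumption), 200 GB at 10 minutes per GB would take about 2,000 minutes on one CPU, which is why the slide asks for 200 parallel counts. This is only an illustration: the names split_rows, count_rows, and parallel_count are hypothetical, the rows are an in-memory list rather than a 200 GB file, and a thread pool stands in for the separate CPUs (a real speed-up in CPython would typically need separate processes or machines).

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_parts):
    """Split rows into at most n_parts contiguous chunks of similar size."""
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def count_rows(chunk):
    """The per-worker job: count the rows in one chunk."""
    return len(chunk)


def parallel_count(rows, n_workers=4):
    """Count rows by summing partial counts computed by a pool of workers."""
    chunks = split_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_rows, chunks))
    return sum(partial_counts)  # the final "sum up" step


if __name__ == "__main__":
    # Stand-in for the fb_status table: one string per status update.
    status_updates = [f"status {i}" for i in range(1_000)]
    print(parallel_count(status_updates, n_workers=8))
```

The final sum is cheap compared with the partial counts, so the total time is dominated by the slowest worker; that is the same load-balancing concern that appears later in the deck.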

Handling big data - Is there a better way?
• Until about 1985, there was no practical way to connect multiple
computers. All systems were centralized systems.
• So multi-core systems or supercomputers were the only options
for big data problems.
• After 1985, powerful microprocessors and high-speed
computer networks (LANs, WANs) became available, which led to
distributed systems.
• Now that we have distributed systems, in which a collection of
independent computers appears to its users as a single coherent
system, can we use some cheap computers and process our big data
quickly?

MapReduce Programming Model
• Data is processed using special map() and reduce() functions.
• The map() function is called on every item in the input and
emits a series of intermediate key/value pairs (local
calculation).
• All values associated with a given key are grouped together.
• The reduce() function is called on every unique key and its
value list, and emits a value that is added to the output (final
organization).
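
The three stages above (map, group by key, reduce) can be sketched with a word count, the usual introductory example. This is a single-process illustration under stated assumptions, not how any particular framework implements it: the names map_fn, group_by_key, reduce_fn, and map_reduce are hypothetical.

```python
from collections import defaultdict


def map_fn(line):
    """map(): called on every input item, emits (key, value) pairs."""
    for word in line.split():
        yield word.lower(), 1


def group_by_key(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """reduce(): called once per unique key with its list of values."""
    return key, sum(values)


def map_reduce(lines):
    """Run the map, group, and reduce stages in sequence."""
    intermediate = [pair for line in lines for pair in map_fn(line)]
    grouped = group_by_key(intermediate)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data", "Big deal"]))
```

Because map_fn looks at one line at a time and reduce_fn looks at one key at a time, both stages could be run on different machines without changing the functions themselves.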

Not just MapReduce
• Earlier, count = count + 1 was sufficient, but now we need to:
1. Set up a cluster of machines, then divide the whole data set into
blocks and store them on local machines
2. Assign a master node that takes charge of all metadata, work
scheduling and distribution, and job orchestration
3. Assign worker slots to execute map or reduce functions
4. Balance the load (what if one machine in the cluster is very slow?)
5. Tolerate faults (what if the intermediate data is partially read,
but the machine fails before all reduce (collation) operations
can complete?)
6. Finally, write the MapReduce code that solves our problem
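
To make the bookkeeping in steps 1 to 5 concrete, here is a toy sketch of a master loop that splits data into blocks, hands each block to a worker, and reassigns the block to another worker if the first one fails. Everything here is hypothetical and greatly simplified (Worker, WorkerFailure, split_into_blocks, and run_job are illustrative names, failures are simulated, and nothing runs in parallel); it only shows why a plain count needs extra machinery once it is distributed.

```python
class WorkerFailure(Exception):
    """Raised when a (simulated) worker machine fails mid-task."""


class Worker:
    def __init__(self, name, fail_first_task=False):
        self.name = name
        self._fail_next = fail_first_task  # simulate one crash

    def run(self, task, block):
        if self._fail_next:
            self._fail_next = False
            raise WorkerFailure(f"{self.name} crashed")
        return task(block)


def split_into_blocks(records, block_size):
    """Step 1: divide the data set into fixed-size blocks."""
    return [records[i:i + block_size]
            for i in range(0, len(records), block_size)]


def run_job(records, task, workers, block_size=3, max_attempts=3):
    """Steps 2-5: schedule each block and retry elsewhere on failure."""
    results = []
    for index, block in enumerate(split_into_blocks(records, block_size)):
        for attempt in range(max_attempts):
            # Round-robin assignment; a retry moves on to the next worker.
            worker = workers[(index + attempt) % len(workers)]
            try:
                results.append(worker.run(task, block))
                break
            except WorkerFailure:
                continue
        else:
            raise RuntimeError(f"block {index} failed on every attempt")
    return results


if __name__ == "__main__":
    cluster = [Worker("w0", fail_first_task=True), Worker("w1")]
    partial_counts = run_job(list(range(10)), len, cluster)
    print(partial_counts, sum(partial_counts))
```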

OK, analysis of big data can give us awesome insights.

But datasets are huge, complex, and difficult to process.

I found a solution: distributed computing, or MapReduce.

But it looks like this data storage and parallel processing
is complicated.

What is the solution?

Hadoop
• Hadoop is a collection of tools with many components. HDFS
and MapReduce are its two core components.
• HDFS: Hadoop Distributed File System
  • Makes it easy to store data on commodity hardware
  • Built to expect hardware failures
  • Intended for large files and batch inserts
• MapReduce
  • For parallel processing
• So Hadoop is a software platform that lets one easily write
and run applications that process big data.
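
One way to see what "easily write and run applications" means in practice is the Hadoop Streaming convention, in which a mapper and a reducer exchange plain text lines of the form key, tab, value, and the framework sorts the mapper output by key before the reducer sees it. The sketch below imitates that contract locally in Python. The function names mapper, reducer, and run_locally are hypothetical, and sorted() stands in for the framework's shuffle-and-sort step; on a real cluster the two functions would read from standard input and write to standard output instead.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same key.

    Assumes the input is already sorted by key, which is what the
    framework is expected to guarantee between the two stages.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Stand-in for the cluster: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for output_line in run_locally(["Hadoop stores data",
                                    "hadoop processes data"]):
        print(output_line)
```

The application author writes only mapper and reducer; block storage, scheduling, sorting, and retries are left to the platform.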

Why Hadoop is useful
• Scalable: It can reliably store and process petabytes.
• Economical: It distributes the data and processing across
clusters of commonly available computers (numbering in the thousands).
• Efficient: By distributing the data, it can process it in parallel
on the nodes where the data is located.
• Reliable: It automatically maintains multiple copies of data
and automatically redeploys computing tasks after
failures.
• And Hadoop is free and open source.
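
The "multiple copies of data" point can be illustrated with a toy block-placement sketch. It is not HDFS's actual placement policy (which also takes racks into account); the names place_blocks and readable_blocks are hypothetical, and the 128 MB block size and replication factor of 3 are used here as assumed values because they are commonly cited HDFS defaults.

```python
def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Assign each block of a file to `replication` distinct nodes.

    Uses simple round-robin placement, purely for illustration.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    return [
        block for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


if __name__ == "__main__":
    cluster = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(500, cluster)
    for block, replicas in placement.items():
        print(block, replicas)
    # Even with two of the four nodes down, check what is still readable.
    print(readable_blocks(placement, failed_nodes={"n1", "n2"}))
```

With three replicas per block, any two simultaneous node failures in this toy cluster still leave every block with a surviving copy, which is the property the "Reliable" bullet describes.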

So what is Hadoop?
• Hadoop is not big data
• Hadoop is not a database
• Hadoop is a platform/framework
  • which allows the user to quickly write and test distributed
  systems
  • which is efficient in automatically distributing the data
  and work across machines

Hadoop ecosystem

Big Data ecosystem

Big Data Analytics
• Examining large amounts of data
• Appropriate information
• Identification of hidden patterns and unknown correlations
• Competitive advantage
• Better business decisions: strategic and operational
• Effective marketing, customer satisfaction, increased revenue

Types of tools used in Big Data
• Where is processing hosted?
  • Distributed servers / cloud (e.g. Amazon EC2)
• Where is data stored?
  • Distributed storage (e.g. Amazon S3)
• What is the programming model?
  • Distributed processing (e.g. MapReduce)
• How is data stored and indexed?
  • High-performance schema-free databases (e.g. MongoDB)
• What operations are performed on data?
  • Analytic / semantic processing

Applications of Big Data analytics
• Smarter healthcare
• Multi-channel sales
• Homeland security
• Telecom
• Trading analytics
• Traffic control
• Search quality
• Manufacturing

Risks of Big Data
• Organizations can be overwhelmed by the data
  • Need the right people solving the right problems
• Costs can escalate too fast
  • It isn't necessary to capture 100% of the data
• Privacy is a concern for many sources of big data
  • Self-regulation
  • Legal regulation

Benefits of Big Data
• Our newest research finds that organizations are using big data to
target customer-centric outcomes, tap into internal data, and build a
better information ecosystem.
• Big Data is already an important part of the $64 billion database and
data analytics market.
• It offers commercial opportunities of a scale comparable to
enterprise software in the late 1980s, the Internet boom of the
1990s, and the social media explosion of today.

www.edutechlearners.com
