Session 8 Big Data

This document covers big data analytics and distributed and parallel computing for big data. Big data is characterized by the 3 Vs: volume, velocity, and variety. The document then covers distributed computing, parallel computing, and the MapReduce paradigm for large-scale data processing across distributed systems.

Uploaded by

pranjal rohilla


Big Data Analytics

Analyses which can handle the 3 Vs and do it with quality (veracity):

(Laney, 2001: META Group)

1. Volume: large quantity
2. Velocity: arriving quickly
3. Variety: [un]structured, multimodal

Distributed and Parallel Computing for Big Data
• Distributed computing
  • Multiple computing resources are connected in a network
  • Computing tasks are distributed across the resources
  • Can be faster and more efficient than processing on a single machine

• Parallel computing
  • Processing power of a standalone computer is enhanced by adding multiple processing units
  • Computing tasks are distributed across the processing units
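
As a rough illustration of distributing one computing task across several workers, the sketch below splits a list into chunks, hands each chunk to a worker, and combines the partial results. The function names `chunk` and `parallel_sum_of_squares` are hypothetical, and a thread pool stands in for the processing units to keep the example self-contained; for CPU-bound work in CPython, `ProcessPoolExecutor` would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, n_chunks):
    """Split items into n_chunks nearly equal contiguous slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(numbers, n_workers=4):
    """Distribute the task across workers, then combine the partial results."""
    parts = chunk(numbers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker computes the sum of squares of its own chunk.
        partials = list(pool.map(lambda part: sum(x * x for x in part), parts))
    return sum(partials)


total = parallel_sum_of_squares(list(range(1, 101)))
```

The same split, compute, and combine shape applies whether the workers are cores in one machine (parallel) or machines in a network (distributed); what changes is how the chunks and partial results are moved around.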

Distributed and Parallel Computing for Big Data

| Distributed Computing System | Parallel Computing System |
|---|---|
| An independent, autonomous system connected in a network for accomplishing specific tasks | A computer system with several processing units attached to it |
| Coordination is possible between connected computers that have their own memory and CPU | A common shared memory can be directly accessed by every processing unit in a network |
| Loose coupling of computers connected in a network that provides access to data and remotely located resources | Tight coupling of processing resources that are used for solving a single, complex problem |

The MapReduce Paradigm
• Platform for reliable, scalable parallel computing
• Abstracts issues of distributed and parallel environment from programmer.
• Runs over distributed file systems
  • Google File System
  • Hadoop Distributed File System (HDFS)
MapReduce: Insight
• Consider the problem of counting the number of occurrences of each word in a large collection of documents
• How would you do it in parallel?
• Solution:
  • Divide the documents among workers
  • Each worker parses its documents to find all words and outputs (word, count) pairs
  • Partition the (word, count) pairs across workers based on the word
  • For each word at a worker, locally add up the counts
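
The four steps above can be sketched in a single process as follows. This is a minimal illustration, not how a real MapReduce runtime is implemented: the names `map_worker`, `partition`, `reduce_worker`, and `word_count` are hypothetical, each mapper emits a count of 1 per occurrence, and a CRC32 hash is assumed as the rule for assigning words to workers.

```python
from collections import defaultdict
import re
import zlib


def map_worker(document):
    """Parse one document and emit a (word, 1) pair per occurrence."""
    return [(word, 1) for word in re.findall(r"[a-z']+", document.lower())]


def partition(pairs, n_reducers):
    """Route each pair to a reducer chosen by a stable hash of the word."""
    buckets = [[] for _ in range(n_reducers)]
    for word, count in pairs:
        index = zlib.crc32(word.encode("utf-8")) % n_reducers
        buckets[index].append((word, count))
    return buckets


def reduce_worker(pairs):
    """Locally add up the counts for every word routed to this worker."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(documents, n_reducers=2):
    all_pairs = [pair for doc in documents for pair in map_worker(doc)]
    merged = {}
    for bucket in partition(all_pairs, n_reducers):
        # A word lands in exactly one bucket, so the results never collide.
        merged.update(reduce_worker(bucket))
    return merged


docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(docs)
```

Partitioning by word is what makes the final step local: every pair for a given word ends up at the same worker, so no worker needs to consult another to produce a final count.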
MapReduce Workflow

[Figure: MapReduce workflow. Input data is divided into splits (Split 0, Split 1, Split 2). Map workers read the splits and write intermediate results locally; reduce workers read those results remotely, sort them, and write the output data (Output File 0, Output File 1).]

• Map: extract something you care about from each record
• Reduce: aggregate, summarize, filter, or transform
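
To make the data flow concrete, here is a small single-process sketch of the workflow: map workers write partitioned intermediate pairs, and each reduce worker gathers its partition from every map worker, sorts by key, and produces one output. Everything here is an assumption for illustration: the driver `run_mapreduce`, the job functions `extract_reading` and `hottest`, the "city,temperature" record format, and the sample readings are all hypothetical, and in-memory lists stand in for local and output files.

```python
from itertools import groupby
from operator import itemgetter
import zlib


def run_mapreduce(splits, map_fn, reduce_fn, n_reducers=2):
    # Map phase: one worker per split; each writes its pairs to "local"
    # partitions, one partition per reduce worker.
    local_files = []
    for split in splits:
        partitions = [[] for _ in range(n_reducers)]
        for record in split:
            for key, value in map_fn(record):
                index = zlib.crc32(str(key).encode("utf-8")) % n_reducers
                partitions[index].append((key, value))
        local_files.append(partitions)

    # Reduce phase: each reducer reads its partition from every map worker
    # (the "remote read"), sorts by key, and reduces each group of values.
    outputs = []
    for r in range(n_reducers):
        pairs = [pair for partitions in local_files for pair in partitions[r]]
        pairs.sort(key=itemgetter(0))
        outputs.append([
            (key, reduce_fn(key, [value for _, value in group]))
            for key, group in groupby(pairs, key=itemgetter(0))
        ])
    return outputs


def extract_reading(record):
    """Map: pull the (city, temperature) pair out of one record."""
    city, temp = record.split(",")
    return [(city, float(temp))]


def hottest(city, temps):
    """Reduce: summarize all readings for a city as their maximum."""
    return max(temps)


# Hypothetical input, already divided into three splits.
splits = [["oslo,3.5", "cairo,31.0"], ["oslo,7.0"], ["cairo,28.5", "lima,19.0"]]
output_files = run_mapreduce(splits, extract_reading, hottest)
result = dict(pair for output in output_files for pair in output)
```

Only `extract_reading` and `hottest` are specific to the job; the driver is the part a MapReduce platform would supply, which is the sense in which the paradigm hides the distributed environment from the programmer.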
Big Data Trends
