Big Data Processing, MapReduce

The document discusses Big Data processing, focusing on batch and transactional processing, and introduces Hadoop as a framework for distributed data management. It explains the MapReduce programming model, which enables parallel processing of data across a Hadoop cluster, highlighting its scalability and fault tolerance. Examples illustrate the Map and Reduce tasks in action, such as counting words in documents and cataloging coins.

Uploaded by

azamsyed811

Big Data Processing and MapReduce

Outline
• Batch and Transactional Processing
• Hadoop
• MapReduce

Reference:
• Chapter 6, “Big Data Fundamentals: Concepts, Drivers & Techniques”, by Thomas Erl, Wajid Khattak, Paul Buhler. 1st Ed. ISBN-10: 0134291077.

Big Data Management Software Stack

Distributed Data Processing

• Achieved through physically separate machines that are networked together as a cluster

Processing Workloads
• Batch: data is processed in batches, which usually imposes delays and in turn results in high-latency responses
  o Also known as offline processing
  o Queries can be complex and involve multiple joins
• Transactional: data is processed interactively without delay, resulting in low-latency responses
  o Also known as online processing
  o Involves small amounts of data with random reads and writes

Batch Processing
● A batch workload can include grouped reads/writes to INSERT, SELECT, UPDATE and DELETE data
● Response time can vary from minutes to hours
● Generally involves processing a range of large datasets
● The majority of Big Data processing occurs in batch mode

Transactional Processing
● Transactional workloads have few joins and lower-latency responses than batch workloads
● Generally more write-intensive than read-intensive
● Smaller data footprint
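
The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration on a single in-memory database: the `customers` and `orders` tables, their contents, and the function names are hypothetical and do not come from the slides. The batch-style query joins and aggregates whole tables, while the transactional-style write touches one row and commits immediately.

```python
import sqlite3

def make_db():
    """Create an in-memory database with hypothetical customers/orders tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "north"), (2, "south"), (3, "north")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (2, 2, 5.0), (3, 3, 7.5), (4, 1, 2.5)])
    conn.commit()
    return conn

def batch_report(conn):
    """Batch-style query: joins and scans whole tables, then aggregates."""
    rows = conn.execute("""
        SELECT c.region, SUM(o.amount)
        FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region ORDER BY c.region
    """).fetchall()
    return dict(rows)

def record_order(conn, order_id, customer_id, amount):
    """Transactional-style write: inserts a single row and commits at once."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (order_id, customer_id, amount))

conn = make_db()
print(batch_report(conn))      # totals per region over all orders
record_order(conn, 5, 2, 4.0)  # one small, low-latency write
print(batch_report(conn))
```

On real Big Data volumes the batch query is the part that would take minutes to hours; the single-row write stays fast because its data footprint is small.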

Hadoop
● Hadoop is a versatile framework that provides both processing and storage capabilities
● Two main components:
  1. Hadoop Distributed File System (HDFS) for distributed storage
  2. MapReduce for distributed processing

Batch Processing with MapReduce
● MapReduce is a programming model that allows parallel and distributed processing of data across a Hadoop cluster
● Highly scalable, reliable, and based on the principle of divide-and-conquer
● Built-in fault tolerance and redundancy
● Does not require that the input data conform to any particular data model
● High coordination overhead
● The data processing algorithm is moved to the nodes that store the data
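
The last point, moving the algorithm to the data, can be illustrated with a toy simulation. This is a sketch under loud assumptions: the `NODES` dictionary stands in for cluster machines and their locally stored blocks, threads stand in for the machines themselves, and none of the names are Hadoop APIs. Each "node" runs the shipped function on its own block, and only the small partial results travel back.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each node name maps to the data block it stores locally.
NODES = {
    "node-1": [3, 8, 1],
    "node-2": [7, 2],
    "node-3": [9, 4, 6, 5],
}

def local_sum(block):
    """The 'algorithm' that is shipped to a node and run on its local block."""
    return sum(block)

def run_on_cluster(task, nodes):
    """Run task on every node's block in parallel; collect the small results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = {name: pool.submit(task, block) for name, block in nodes.items()}
        return {name: future.result() for name, future in futures.items()}

partials = run_on_cluster(local_sum, NODES)  # divide: one partial result per node
total = sum(partials.values())               # conquer: merge the partial results
print(partials, total)
```

Only three integers cross the simulated network here, regardless of how large each block is, which is the motivation for shipping code rather than data.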

Map and Reduce Tasks

An illustration of a MapReduce job with the map stage highlighted

Example #1 of MapReduce
Goal: Count the number of times a word appeared in a document

Assume we have 10 servers and 200 documents

1. Map: divide the documents and assign them to the servers (e.g., 20 each)
   • (Key, Value) pair → (Word, Count) → (“Taco”, 7)
2. Combine and Partition if necessary
3. Shuffle and Sort → Take the output from the previous stage and combine it into a sorted list
4. Reduce → Sum or merge to arrive at the final result
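
The four steps above can be sketched in plain Python. This is a single-process illustration, not Hadoop code: the function names, the three sample documents, and the choice of partitioning by the sum of character codes are all assumptions made for the example. Each document stands in for the share of documents one server would receive.

```python
from collections import Counter, defaultdict

def map_task(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def combine(pairs):
    """Combine: pre-sum counts on the mapper's side to shrink its output."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def partition(word, num_reducers):
    """Partition: pick a reducer for a key (character-code sum keeps it deterministic)."""
    return sum(ord(ch) for ch in word) % num_reducers

def shuffle_and_sort(mapper_outputs, num_reducers):
    """Shuffle and sort: group values by key per reducer, keys in sorted order."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for word, count in pairs:
            buckets[partition(word, num_reducers)][word].append(count)
    return [sorted(bucket.items()) for bucket in buckets]

def reduce_task(word, counts):
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)

def word_count(documents, num_reducers=2):
    """Run map, combine, shuffle/sort and reduce over all documents."""
    mapper_outputs = [combine(map_task(doc)) for doc in documents]
    result = {}
    for bucket in shuffle_and_sort(mapper_outputs, num_reducers):
        for word, counts in bucket:
            key, total = reduce_task(word, counts)
            result[key] = total
    return result

docs = ["taco burrito taco", "burrito taco salsa", "salsa taco"]
counts = word_count(docs)
print(counts["taco"])
```

After the combine step, the first mapper emits a pair such as ("taco", 2) rather than two separate ("taco", 1) pairs, which mirrors the (Word, Count) pair shown in step 1.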

Example #2 of MapReduce
Goal: Count and catalog all the coins in a pile (different currency types and denominations)

The “classical” approach to parallel computing

Ref: https://freecontent.manning.com/explaining-mapreduce-with-ducks/

The MapReduce approach

Ref: https://freecontent.manning.com/explaining-mapreduce-with-ducks/
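
One way to read the coin example in MapReduce terms is sketched below. The slides' figures are not reproduced here, so the details are assumptions: the pile is split into smaller piles (one per mapper), each coin is keyed by a hypothetical (currency, denomination) pair with denominations given in minor units such as cents, and the reduce step sums the counts per key.

```python
from collections import defaultdict

def map_coin(coin):
    """Map: key each coin by (currency, denomination) with a count of 1."""
    currency, denomination = coin
    return (currency, denomination), 1

def catalog_coins(piles):
    """Map every coin in every pile, group by key, then reduce by summing."""
    grouped = defaultdict(list)
    for pile in piles:                  # each pile would go to one mapper
        for coin in pile:
            key, value = map_coin(coin)
            grouped[key].append(value)  # shuffle: equal keys end up together
    # Reduce: one total per (currency, denomination), in sorted key order.
    return {key: sum(values) for key, values in sorted(grouped.items())}

# Hypothetical input: denominations are in minor units (25 = 0.25 USD).
piles = [
    [("USD", 25), ("EUR", 100), ("USD", 25)],
    [("EUR", 100), ("USD", 10), ("EUR", 50)],
]
catalog = catalog_coins(piles)
for (currency, denomination), count in catalog.items():
    print(currency, denomination, count)
```

The structure is the same as the word count: only the key changes, from a word to a (currency, denomination) pair.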