Intro To Mapreduce

The document discusses MapReduce and Hadoop. It introduces MapReduce as a framework for processing large datasets in parallel across clusters of computers. It describes how Hadoop uses MapReduce to automatically parallelize jobs and handle failures. The key aspects of a MapReduce job in Hadoop are explained such as the map and reduce functions, partitioning, sorting, and output.

Uploaded by

Aruna Samanthapudi


CPS216: Advanced Database Systems (Data-intensive Computing Systems)

Introduction to MapReduce and Hadoop
Shivnath Babu
Word Count over a Given Set of Web Pages

Input pages:
see bob throw
see spot run

Per-page word counts:
see 1
bob 1
throw 1
see 1
spot 1
run 1

Final word counts:
bob 1
run 1
see 2
spot 1
throw 1

Can we do word count in parallel?
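
The word count above can be sketched as a map step and a reduce step. This is a minimal single-machine illustration in Python, not Hadoop code: the function names (map_words, reduce_counts, word_count) are made up for this sketch, and a thread pool stands in for the cluster's parallel map tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(page):
    """Map: emit a (word, 1) pair for every word on one page."""
    return [(word, 1) for word in page.split()]


def reduce_counts(word, counts):
    """Reduce: sum all counts emitted for one word."""
    return word, sum(counts)


def word_count(pages):
    # Map phase: pages are independent, so they can be processed in parallel.
    # (A thread pool is only a stand-in for map tasks on separate machines.)
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_words, pages))

    # Group phase: collect every value emitted for the same word.
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            grouped[word].append(one)

    # Reduce phase: one call per distinct word, in sorted key order.
    return [reduce_counts(word, grouped[word]) for word in sorted(grouped)]


if __name__ == "__main__":
    for word, count in word_count(["see bob throw", "see spot run"]):
        print(word, count)
```

Because each page is mapped independently and each word is reduced independently, both phases can be spread across machines; only the grouping step needs data to move between them.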
The MapReduce Framework (pioneered by Google)
Automatic Parallel Execution in MapReduce (Google)
Handles failures automatically, e.g., restarts tasks if a node fails; runs multiple copies of the same task to avoid a slow task slowing down the whole job
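
The two behaviours named above (restarting a failed task, and running duplicate copies so a slow one does not hold up the job) can be illustrated with a toy sketch. It assumes tasks are plain Python callables and uses threads in place of cluster nodes; run_with_retries, run_speculatively, and the demo tasks are hypothetical names, and real MapReduce schedulers are considerably more involved.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_with_retries(task, max_attempts=3):
    """Re-run a task that fails, as a scheduler might after a node failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt


def run_speculatively(copies):
    """Run several copies of the same task and use whichever finishes first."""
    pool = ThreadPoolExecutor(max_workers=len(copies))
    futures = [pool.submit(copy) for copy in copies]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower copies
    return next(iter(done)).result()


def make_flaky_task(failures):
    """Build a demo task that fails a fixed number of times, then succeeds."""
    state = {"left": failures}

    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated node failure")
        return "done"

    return task


def slow_copy():
    time.sleep(0.3)  # simulated straggler
    return 42


def fast_copy():
    return 42


if __name__ == "__main__":
    print(run_with_retries(make_flaky_task(2)))
    print(run_speculatively([slow_copy, fast_copy]))
```

Both techniques are safe only because a task's output depends solely on its input, so re-running or duplicating it yields the same result.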
MapReduce in Hadoop (1)
MapReduce in Hadoop (2)
MapReduce in Hadoop (3)
Data Flow in a MapReduce Program in Hadoop
InputFormat
Map function
Partitioner
Sorting & Merging
Combiner
Shuffling
Merging
Reduce function
OutputFormat

1:many
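
The stages listed above can be traced in a small in-memory simulation. This is a sketch of the data flow only, not Hadoop's implementation: all function names are invented for the example, input splits are plain lists of lines, and zlib.crc32 is used as a stand-in for a stable key hash.

```python
import heapq
import zlib
from collections import defaultdict
from itertools import groupby


def first(pair):
    return pair[0]


def map_fn(line):
    """Map function: one input record can yield many (key, value) pairs."""
    return [(word, 1) for word in line.split()]


def partition(key, num_reducers):
    """Partitioner: pick the reduce task for a key (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def combine(sorted_pairs):
    """Combiner: pre-aggregate one map task's sorted output per key."""
    return [(key, sum(v for _, v in group))
            for key, group in groupby(sorted_pairs, key=first)]


def reduce_fn(key, values):
    """Reduce function: final aggregation for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    shuffle = defaultdict(list)  # reducer index -> sorted runs from map tasks
    for split in splits:  # input side: one map task per split
        buckets = defaultdict(list)
        for line in split:
            for key, value in map_fn(line):
                buckets[partition(key, num_reducers)].append((key, value))
        for reducer, pairs in buckets.items():
            # Map-side sort, then combiner; the run is then shuffled out.
            shuffle[reducer].append(combine(sorted(pairs)))

    output = {}  # output side: one result list per reduce task
    for reducer in range(num_reducers):
        merged = heapq.merge(*shuffle[reducer])  # reduce-side merge of runs
        output[reducer] = [reduce_fn(key, [v for _, v in group])
                           for key, group in groupby(merged, key=first)]
    return output


if __name__ == "__main__":
    print(run_job([["see bob throw"], ["see spot run"]]))
```

Each key lands in exactly one reducer's output, so concatenating the per-reducer results gives the complete answer.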
Lifecycle of a MapReduce Job
Map function
Reduce function
Run this program as a MapReduce job

Lifecycle of a MapReduce Job
[Figure: job timeline. Labels: Input Splits; Map Wave 1; Map Wave 2; Reduce Wave 1; Reduce Wave 2. Axis: Time]
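
The "waves" in the timeline follow from simple arithmetic: when there are more tasks than places to run them concurrently, the tasks execute in successive rounds. A minimal sketch, assuming one map task per input split and a fixed number of concurrent task slots (the function name and the numbers are illustrative only):

```python
import math


def num_waves(num_tasks, num_slots):
    """Number of rounds needed to run num_tasks with num_slots running at once."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return math.ceil(num_tasks / num_slots)


if __name__ == "__main__":
    # Illustrative numbers: 8 input splits give 8 map tasks; with 4 slots
    # available cluster-wide, the map phase needs 2 waves.
    print(num_waves(8, 4))
    print(num_waves(9, 4))  # one extra task forces a third wave
```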
How are the number of splits, number of map and reduce tasks, memory allocation to tasks, etc., determined?
Job Configuration Parameters
190+ parameters in Hadoop
Set manually or defaults are used
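
The "set manually or defaults are used" rule amounts to layering job-specific overrides on top of a table of defaults. The sketch below shows that lookup in Python. The property names follow Hadoop 2.x naming (older releases used different names), and the values are placeholders chosen for illustration, not a statement of any release's actual defaults.

```python
# Placeholder defaults for illustration only; consult the release's own
# default configuration for real values.
DEFAULTS = {
    "mapreduce.job.reduces": 1,
    "mapreduce.map.memory.mb": 1024,
    "mapreduce.reduce.memory.mb": 1024,
    "mapreduce.task.io.sort.mb": 100,
}


def effective_config(overrides):
    """Return the settings a job would see: defaults, replaced where set manually."""
    return {**DEFAULTS, **overrides}


if __name__ == "__main__":
    job = effective_config({"mapreduce.job.reduces": 8})
    print(job["mapreduce.job.reduces"])      # manually set value
    print(job["mapreduce.map.memory.mb"])    # falls back to the default
```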
How to sort data using Hadoop?
Let us look at a complete example MapReduce program in Hadoop
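
The slides' own example program is not part of this text. As a rough illustration of one common approach to sorting with MapReduce, the Python sketch below uses an identity map, a range partitioner, and the per-reducer key sort: if every key in partition i is smaller than every key in partition i+1, concatenating the reducers' sorted outputs gives a total order. The function names and the boundary values are assumptions made for this sketch.

```python
import bisect


def map_fn(record):
    """Identity map: the record itself is the sort key; no value is needed."""
    return record, None


def range_partition(key, boundaries):
    """Send a key to the reducer whose key range contains it."""
    return bisect.bisect_right(boundaries, key)


def sort_job(records, boundaries):
    # One partition per reducer: len(boundaries) + 1 key ranges.
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        key, _ = map_fn(record)
        partitions[range_partition(key, boundaries)].append(key)

    # Keys reach each reducer in sorted order; the reduce step is the
    # identity, so reading the outputs in partition order is a full sort.
    return [key for part in partitions for key in sorted(part)]


if __name__ == "__main__":
    # Hypothetical boundaries 4 and 8 split the keys into three ranges.
    print(sort_job([5, 3, 9, 1, 7, 3], boundaries=[4, 8]))
```

With a hash partitioner instead, each reducer's output would still be sorted, but the outputs could not simply be concatenated into one ordered result.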
