HDFS Unit 4


MapReduce

Unit 4
Introduction
MapReduce is a processing technique and a programming model for distributed computing, based on Java.

The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of
data and converts it into another set of data, where individual elements are broken down into tuples
(key/value pairs).

Secondly, the Reduce task takes the output of a Map as its input and combines those data tuples
into a smaller set of tuples. As the ordering of the name MapReduce implies, the Reduce task is always
performed after the Map job.
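
For a concrete illustration (the classic word-count example, not taken from these slides), the Map step emits one (word, 1) pair per word, and the Reduce step then sums the counts for each word:

map("to be or not to be")  ->  (to,1) (be,1) (or,1) (not,1) (to,1) (be,1)
reduce("be", [1, 1])       ->  (be, 2)
reduce("to", [1, 1])       ->  (to, 2)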
Introduction
The major advantage of MapReduce is that it is easy to scale data processing over multiple computing
nodes.

Under the MapReduce model, the data processing primitives are called mappers and reducers.

Decomposing a data processing application into mappers and reducers is sometimes nontrivial.

But, once we write an application in the MapReduce form, scaling the application to run over hundreds,
thousands, or even tens of thousands of machines in a cluster is merely a configuration change. This
simple scalability is what has attracted many programmers to use the MapReduce model.
Data Flow in MapReduce
MapReduce is used to process huge amounts of data. To handle this data in a parallel and
distributed manner, the data flows through several phases.
Phases of MapReduce Dataflow
Input reader
The input reader reads the incoming data and splits it into data blocks of an appropriate size (64 MB to 128 MB). Each
data block is associated with a Map function. Once the input reader has read the data, it generates the corresponding
key-value pairs. The input files reside in HDFS.

Map function
The Map function processes the incoming key-value pairs and generates the corresponding output key-value pairs. The Map
input and output types may differ from each other.
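
As a minimal sketch (a standard word-count style mapper using the org.apache.hadoop.mapreduce API; the class name TokenizerMapper is chosen here for illustration), a Map function could look like this:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key/value: (byte offset, line of text); output key/value: (word, 1).
// Note that the input types (LongWritable, Text) differ from the output types (Text, IntWritable).
public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit (word, 1) for every word in the line
        }
    }
}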

Partition function
The partition function assigns the output of each Map function to the appropriate reducer. It is given the key (and value)
and returns the index of the reducer.
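
As a rough sketch of a custom partition function (the class name AlphabetPartitioner and the a-m split are invented for illustration, and the job is assumed to run with two reducers):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes keys starting with a-m to reducer 0 and all other keys to reducer 1.
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String s = key.toString();
        if (s.isEmpty()) {
            return 0;                       // defensive default for empty keys
        }
        char first = Character.toLowerCase(s.charAt(0));
        int bucket = (first >= 'a' && first <= 'm') ? 0 : 1;
        return bucket % numPartitions;      // stay within the configured number of reducers
    }
}

A job would register it with job.setPartitionerClass(AlphabetPartitioner.class) and job.setNumReduceTasks(2).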
Phases of MapReduce Dataflow
Shuffling and Sorting
The data is shuffled between and within nodes so that it moves out of the Map phase and becomes ready for the Reduce
function. Shuffling the data can sometimes take considerable time. A sorting operation is then performed on the input to the
Reduce function: the keys are compared using a comparison function and arranged in sorted order.

Reduce function
The Reduce function is invoked once for each unique key. The keys arrive in sorted order, and the Reduce function iterates
over the values associated with each key and generates the corresponding output.
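
A minimal Reduce function matching the mapper sketched earlier (again the standard word-count example; the class name IntSumReducer is an illustrative choice) could be:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Called once per unique key with an Iterable over all values grouped under that key.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();                 // iterate over the values for this key
        }
        result.set(sum);
        context.write(key, result);         // emit (word, total count)
    }
}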

Output writer
Once the data has flowed through all the above phases, the output writer executes. Its role is to write the Reduce output
to stable storage.
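
Tying the phases together, a driver class might wire up the input reader, mapper, combiner, reducer, and output writer roughly as follows (this assumes the TokenizerMapper and IntSumReducer sketched above; paths and class names are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input reader: the default TextInputFormat splits the HDFS files under this path
        // and feeds (offset, line) records to the mappers.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Output writer: each reducer writes a part-r-NNNNN file under this HDFS path.
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}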
MapRed vs MapReduce
The two packages provide the input/output formats and the mapper and reducer base classes for the
corresponding Hadoop mapred and mapreduce APIs.
MapRed is the old API, used in Hadoop version 1: org.apache.hadoop.mapred.
The second is used in Hadoop version 2, where YARN was introduced; that version of the API is called
MapReduce: org.apache.hadoop.mapreduce.
MapRed and MapReduce are different APIs, but their functionality is almost the same.
The one major difference is that the old API was capable of pushing records to the mapper/reducer.
The new API, though, includes a few advancements that are lacking in MapRed and that accompanied
the upgrade to Hadoop version 2.
The new API is cleaner and faster. The old API was deprecated, but the deprecation was later reverted.
Which API is better to use depends on your tasks.
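
The difference is easiest to see in code. Below is a sketch of the same trivial mapper in both APIs (shown in one listing for comparison; in practice the two classes would live in separate files, and the class names are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Old API (org.apache.hadoop.mapred): Mapper is an interface; the framework pushes each
// record into map() together with an OutputCollector and a Reporter.
class OldApiMapper extends MapReduceBase
        implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> out, Reporter reporter) throws IOException {
        out.collect(value, new IntWritable(1));
    }
}

// New API (org.apache.hadoop.mapreduce): Mapper is a class; all interaction with the
// framework goes through a single Context object.
class NewApiMapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, new IntWritable(1));
    }
}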
Mapper→Combiner → Partitioner
The sequence of execution of the mentioned components happens in the below order:

Mapper -> Combiner -> Partitioner

Mapper: The input data is initially processed by all the Mappers/Map tasks, and the intermediate output is created.

Combiner: The Combiner optimizes the intermediate outputs by local aggregation before the shuffle/sort phase. The
primary goal of Combiners is to save as much bandwidth as possible by minimizing the number of key/value pairs that will be
shuffled across the network and provided as input to the Reducer.

Partitioner: In Hadoop, partitioning of the keys of the intermediate map output is controlled by the Partitioner. A hash
function is used to derive the partition. Each map output is partitioned on the basis of its key: records with the same key go
into the same partition (within each mapper), and each partition is then sent to a Reducer. The partition phase takes place
between the mapper and the reducer.

The default Partitioner (HashPartitioner) computes a hash value for the key and assigns the partition based on this result.
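
For reference, the default HashPartitioner behaves roughly like the sketch below (the real class is org.apache.hadoop.mapreduce.lib.partition.HashPartitioner; this re-implementation is for illustration only):

import org.apache.hadoop.mapreduce.Partitioner;

public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the hash is non-negative, then take the modulus,
        // so that every record with the same key lands in the same reduce partition.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}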
Apache Ambari
It provides a highly interactive dashboard that allows administrators to visualize the progress and status of every application
running over the Hadoop cluster.

Its flexible and scalable user interface allows a range of tools such as Pig, MapReduce, Hive, etc. to be installed on the cluster
and administers their performance in a user-friendly fashion. Some of the key features of this technology are:

● Instantaneous insight into the health of the Hadoop cluster using preconfigured operational metrics

● User-friendly configuration providing an easy step-by-step guide for installation

● Installation of Apache Ambari is possible through Hortonworks Data Platform (HDP)

● Monitoring dependencies and performances by visualizing and analyzing jobs and tasks

● Authentication, authorization, and auditing by installing Kerberos-based Hadoop clusters

● Flexible and adaptive technology fitting perfectly in the enterprise environment


Important Links
https://home.cs.colorado.edu/~kena/classes/5448/s11/presentations/hadoop.pdf

Working of Ambari UI

https://docs.cloudera.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-operations/bk_ambari-operations.pdf
