MapReduce is a programming model that processes large data sets by dividing tasks into smaller chunks executed in parallel through map and reduce functions. It offers fault tolerance and efficiency by co-locating code and data, and is widely used in data analysis scenarios. The model is integral to Hadoop, allowing for scalable data processing across numerous servers.
BDA Unit 3
Exploring the Features of MapReduce
MapReduce involves two operations, map and reduce, which are executed by dividing large problems into smaller chunks. These chunks are run in parallel by different machines. The main features of MapReduce are as follows:

Scheduling—MapReduce breaks a job into smaller subtasks and schedules the execution of these subtasks on the available nodes of a cluster.

Synchronization—Execution of several concurrent processes requires synchronization. The MapReduce program execution framework is aware of the mapping operations that are taking place in the program and starts the reduction process only after the map operations are complete.

Co-location of Code and Data—The effectiveness of a data processing mechanism depends largely on the location of the code relative to the data it must execute on. The best result is obtained when both code and data reside on the same machine; this co-location of the code and data produces the most effective processing outcome.

Handling of Errors/Faults—MapReduce engines usually provide a high level of fault tolerance and robustness in handling errors. Such robustness is necessary because there are high chances of failure in the clustered nodes on which the various parts of a program are running. Therefore, the MapReduce engine must be able to recognize a fault and recover from it. Moreover, the MapReduce engine design detects tasks that are incomplete and eventually assigns them to different nodes.
The MapReduce programming model works in the following manner:
1. Take a large dataset or set of records.
2. Perform iteration over the data.
3. Extract some interesting patterns to prepare an output list by using the map function.
4. Arrange the output list properly to enable optimization for further processing.
5. Compute a set of results by using the reduce function.
6. Provide the final output.
The MapReduce approach of analyzing data can be used by programmers for implementing various kinds of applications. This algorithm can also work with extremely large datasets, ranging from gigabytes to terabytes, or even beyond.
In short, the MapReduce model executes two functions, map and reduce. The map function is executed in parallel on different machines. The reduce function takes the output of the map function and combines it into an aggregate form. The working of the MapReduce approach is shown in Figure 5.1.
[Figure 5.1: Working of the MapReduce Approach]
A Hadoop cluster uses commodity servers as its nodes. Processing in the cluster is accomplished through MapReduce and the Hadoop Distributed File System (HDFS), which are based on these nodes; the corresponding data flow is shown in Figure 5.2.

Operations performed in the MapReduce model, according to the data flow (as shown in Figure 5.2), are as follows:
1. Input is provided from large data files in the form of key-value pairs (KVPs), which is the standard input format in a Hadoop MapReduce programming model. The input data is divided into small pieces, and master and slave nodes are created. The master node usually executes on the machine where the data is present, and the slaves are made to work remotely on the data.
2. The map operation is performed simultaneously on all the data pieces, which are read by the map function. The map function extracts the relevant data and generates the KVP for it. The input/output operations of the map function are shown in Figure 5.3:
[Figure 5.3: Working of the Map Function in the MapReduce Model — an input list passes through the mapping function to produce an output list]
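The two dataflow steps above can be sketched in plain Python. The records, the comma-separated city/temperature format, and the in-process loop are illustrative assumptions; in a real Hadoop job the pieces would be distributed across slave nodes and mapped in parallel.

```python
# A minimal sketch of the map step: input is divided into small pieces,
# and the map function extracts the relevant data as (key, value) pairs.

def map_function(record):
    """Extract the relevant data from one record and emit a KVP."""
    city, temperature = record.split(",")
    return (city.strip(), int(temperature))

# Input divided into small pieces (hypothetical data, one list per piece).
pieces = [
    ["Delhi, 31", "Mumbai, 29"],
    ["Delhi, 33", "Chennai, 35"],
]

# The map operation runs on all pieces; Hadoop would do this in parallel,
# but we process them sequentially here for illustration.
kvp_output = [map_function(record) for piece in pieces for record in piece]
print(kvp_output)  # [('Delhi', 31), ('Mumbai', 29), ('Delhi', 33), ('Chennai', 35)]
```

Each output KVP corresponds to exactly one input record, mirroring the input-list/output-list picture in Figure 5.3.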
Let us understand the MapReduce programming model with the help of examples.

Example 1: Consider a census in which volunteers are sent to different areas to count the population of their respective areas and send the report of the population to the organization conducting the census. The organization then adds the number of people recorded from all the areas to reach an aggregate count.
Example 2: A data analytics professional parses out every term available in a chat text. He creates a map function to find out every word of the chat. After completing the map operation, the reduce function starts performing the reduce operation. Keeping the current target of finding the count of the number of times a word appears in the text, shuffling is performed next. This process involves distribution of the map output through hashing, in order to route the same keywords to the respective node of the reduce function. Assuming the words begin with any of the 26 letters of the alphabet, we require 26 nodes that can handle the words by their first letter. In this case, words starting with A will be handled by one node, words starting with B by another node, and so on. Thus, the number of occurrences of each word is counted by the reduce step. The following figure shows the detailed MapReduce process used in this example:
[Figure: Detailed MapReduce process for the word count example]
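The map, shuffle, and reduce steps of the word count example can be simulated in a few lines of Python. The chat text is made up, and the 26 "nodes" are ordinary dictionary buckets standing in for reduce machines.

```python
from collections import defaultdict

# Sketch of the shuffle described above: map output is distributed by a hash
# (here, the first letter of the word) so that the same word always reaches
# the same reduce node.

chat = "apple ball apple cat ball apple"

# Map: emit (word, 1) for every word in the text.
mapped = [(word, 1) for word in chat.split()]

# Shuffle: route each pair to one of 26 "nodes" keyed by first letter.
nodes = defaultdict(list)
for word, one in mapped:
    nodes[word[0].upper()].append((word, one))

# Reduce: each node counts the occurrences of the words it received.
counts = {}
for letter, pairs in nodes.items():
    for word, one in pairs:
        counts[word] = counts.get(word, 0) + one

print(counts)  # {'apple': 3, 'ball': 2, 'cat': 1}
```

Because the partitioning is by key, every occurrence of a given word lands on the same node, so each node can compute its counts without consulting the others.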
The MapReduce model speeds up analysis due to the fact that it involves several slaves working simultaneously on the map functions to find out the desired result; the analysis finishes much sooner than a single machine would take. However, the different slaves have no knowledge of each other's tasks and cannot share the contents of their tasks with one another. This is a limitation of MapReduce. Recall the census example, in which every volunteer counted the people of his/her own area and provided the record directly to the organization, in such a manner that no volunteer was aware of the data collected by any other volunteer. In the same way, the MapReduce procedure cannot be used in a scenario where data sharing between tasks is required. The MapReduce programming model can, however, be used for data analysis over independent records. For example, we can parse a weblog, because all the records of a weblog are independent of each other.
The map function has been a part of many functional programming languages for years. It gained popularity with an early artificial intelligence language called Lisp (LISt Processing).
Exploring Map and Reduce Functions
The MapReduce model processes data in the form of KVPs. Irrespective of the type of data analysis for which the data is taken, the KVP format of the input and output remains the same. The map function retrieves the relevant data from each input pair and generates a new list of KVPs, which is provided as an output. The keys are later used to bring the related values together.
Hadoop streaming allows you to create and execute map and reduce functions in Hadoop using any program that reads standard input and writes standard output. Streaming is used for expressing the input and output of these functions as lines of text in which the key and value are separated by a tab character. This text format is used for reading and writing data in streaming mode, and in the reduce step the keys are used to bring the related values together.
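The streaming contract described above can be imitated in ordinary Python. The word-count mapper and reducer below are illustrative; in a real streaming job each function would be a separate executable reading from stdin and printing to stdout, with Hadoop performing the sort between them.

```python
# Hadoop streaming exchanges tab-separated "key\tvalue" lines of text between
# the mapper and the reducer. This sketch keeps everything in-process.

def mapper(lines):
    """Emit 'word\t1' for every word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the values per key; Hadoop delivers the lines sorted by key."""
    totals = {}
    for line in sorted_lines:
        key, value = line.split("\t")
        totals[key] = totals.get(key, 0) + int(value)
    return totals

# The sort stands in for the shuffle phase between map and reduce.
intermediate = sorted(mapper(["big data", "big analytics"]))
result = reducer(intermediate)
print(result)  # {'analytics': 1, 'big': 2, 'data': 1}
```

Because the lines arrive sorted, all values for one key are adjacent, which is what lets a real streaming reducer process its input in a single pass.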
Consider an example of a program that counts the number of Indian cities having a population of over one lakh. You must note that the following is not actual programming code; instead, it is plain English (pseudocode) used to explain the working of the map function. The example works in the following manner:

First, create a list of the cities:

mylist = [all cities in India that participated in the most recent general census]

Use the map function to create a function, howManyPeople, which selects the cities having a population of more than one lakh:

map howManyPeople(mylist) = [howManyPeople "city 1"; howManyPeople "city 2"; howManyPeople "city 3"; ...; howManyPeople "city n"]

Now, generate a new output list of all the cities having a population of more than one lakh:

(no, "city 1") (yes, "city 2") (no, "city 3") (yes, "city 4") ... (?, "city n")

The map function works without making any modifications to the original list; each element of the output list gets mapped to a corresponding element of the input list. Moreover, you can see that every answer is either a yes or a no. If a city meets the requirement of having more than one lakh people, the map function marks it with a yes; otherwise, a no is specified.
The new list provided by the howManyPeople map function is then provided as an input for the reduce function. This function processes each element of the list and produces the final output: a list of all the cities with a population of more than one lakh.
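The plain-English example above can be made runnable. The city names and populations below are made up for illustration; only the yes/no map step and the filtering reduce step come from the text.

```python
# A runnable version of the howManyPeople pseudocode (hypothetical data).
ONE_LAKH = 100_000

mylist = [("city 1", 80_000), ("city 2", 250_000),
          ("city 3", 40_000), ("city 4", 900_000)]

def how_many_people(city):
    """Map step: answer yes/no for one city without modifying the input."""
    name, population = city
    return ("yes" if population > ONE_LAKH else "no", name)

mapped = [how_many_people(city) for city in mylist]
print(mapped)   # [('no', 'city 1'), ('yes', 'city 2'), ('no', 'city 3'), ('yes', 'city 4')]

# Reduce step: keep only the cities that answered yes.
big_cities = [name for answer, name in mapped if answer == "yes"]
print(big_cities)  # ['city 2', 'city 4']
```

Note that `mylist` is never modified: the map step produces a new output list whose elements correspond one-to-one to the input, exactly as the pseudocode describes.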
MapReduce is the heart of Hadoop. It is this programming paradigm that allows massive scalability across hundreds or thousands of servers in a Hadoop cluster. The MapReduce concept is fairly simple to understand for those who are familiar with clustered scale-out data processing solutions.

The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key-value pairs).
The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.

For example, assume you have five files, and each file contains two columns (a key and a value in Hadoop terms) that represent a city and the corresponding temperature recorded in that city on the various measurement days. This example is deliberately kept easy to follow; a real file is likely to contain lots of such records. Out of all the data we have collected, we want to find the maximum temperature for each city across all the data files (note that each file might have the same city represented multiple times). Using the MapReduce framework, we can break this down into five map tasks, where each mapper works on one of the five files, and the mapper task goes through the data and returns the maximum temperature for each city. For example, the results produced by one mapper task for its file would be:

(New York, 22) (Rome, 33)

Assume that the other four mapper tasks (working on the other four files not shown here) produce intermediate results of the same form.
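The combine step that follows the mappers can be sketched directly. Only the first mapper's output, (New York, 22) (Rome, 33), comes from the text; the second mapper output below is a made-up stand-in for the other four files.

```python
# Each mapper returns the maximum temperature per city for its own file;
# the reduce step then takes the maximum across all mapper outputs.

mapper_outputs = [
    [("New York", 22), ("Rome", 33)],                    # mapper task 1 (from the text)
    [("New York", 18), ("Rome", 38), ("Toronto", 20)],   # hypothetical mapper task 2
]

max_temps = {}
for output in mapper_outputs:
    for city, temp in output:
        # Keep the larger of the stored value and the new reading.
        max_temps[city] = max(temp, max_temps.get(city, temp))

print(max_temps)  # {'New York': 22, 'Rome': 38, 'Toronto': 20}
```

Because each file may repeat a city, the per-file maximum computed by the mappers shrinks the data before the reducer ever sees it, which is the point of doing a partial aggregation in the map phase.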
As an analogy, you can think of map and reduce tasks as the way a census was conducted in Roman times: the census bureau would dispatch its people to each city in the empire, each census taker would count the people of his own city, and the results from all the cities would then be reduced to a single count to determine the overall population of the empire.
After learning about the working of the MapReduce programming model, some of the team members working with Mr. Richard Paul asked about the different ways in which MapReduce jobs can be optimized. He answered by describing the different approaches that can be used for MapReduce job optimization.
Techniques to Optimize MapReduce Jobs
MapReduce jobs, especially the ones in which the user requires a quick response to his/her query, need to complete quickly; hence, we need the MapReduce job to be optimized. Encountering a deadlock for even a single resource during the execution of the program slows down the entire program. The performance of MapReduce jobs and their reliability, in addition to the code written for the main application, can be optimized by using some techniques. We can organize the MapReduce optimization techniques in the following categories:

Hardware or network topology
Synchronization
File system
Counting Web Page Visits—Consider a situation in which we must count the number of times the website of a newspaper is visited. The map function emits a KVP with the value 1 every time a URL is visited:

<newspaperURL, 1>
<usedCarsURL, 1>
<sportsNewsURL, 1>
<newspaperURL, 1>
<newspaperURL, 1>

The reduce function sums the values emitted for each key and produces the aggregate result, for example:

<newspaperURL, 3>

This output indicates the number of times the website of the newspaper was visited.
Web Page Visitor Paths—Consider a situation in which an advocacy group wishes to know how visitors reach its Web page. In this case, the map function scans the Web links and returns KVPs in which the key is the link that acts as the "source," and the value is the Web page to which the link transfers the visitor. The reduce function output, which is the final output, is a list of pages along with their list(source).
Filtering Articles—A researcher wishes to read articles about floods, but he does not want those articles in which flood is discussed as a minor topic. Therefore, he decided that a relevant article should have the specified word in it more than a certain number of times. The map function counts the number of times the specified word occurs in each article, and the reduce function takes this count and selects only the relevant articles.
Word Count—Suppose a researcher wishes to determine the number of times specific words appear in the analyzed text. The map function emits each word together with a count, and the reduce function aggregates the counts per word.
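The counting examples above (page visits, word counts) share one shape: the map step emits (key, 1) per observation, and the reduce step sums the values per key. A sketch with made-up visit data:

```python
# Map/reduce counting pattern; the visit log below is illustrative.
visits = ["newspaperURL", "usedCarsURL", "sportsNewsURL",
          "newspaperURL", "newspaperURL"]

# Map: emit one (URL, 1) pair per visit.
mapped = [(url, 1) for url in visits]

# Reduce: sum the values for each key.
totals = {}
for url, one in mapped:
    totals[url] = totals.get(url, 0) + one

print(totals["newspaperURL"])  # 3
```

The same skeleton covers the article-filtering example as well: replace the final sum with a threshold test on the per-key count.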
Richard Paul now wanted to know whether MapReduce could work together with a database. The team members working with Mr. Paul wanted to access the data by keys, and some of them were also interested in a database used mainly for data storage. The team therefore decided to learn about HBase and examine the ways in which HBase can help perform MapReduce operations better. Let us see the role HBase plays in Big Data processing.
Role of HBase in Big Data Processing
HBase is a distributed, column-oriented database. It is helpful in working with Big Data because you do not always know in advance every column your data will need. Regular databases are row-oriented: the data is added row-by-row, and the set of columns is fixed. In a column-oriented database such as HBase, new columns can be added very easily, so applications can be developed rapidly and can quickly update the data they store.
The HBase environment file (hbase-env.sh) can be customized as per the user's needs, for example by exporting JAVA_HOME in it. To customize an HBase environment variable, open hbase-env.sh and set the required value.
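As a minimal sketch, a customized hbase-env.sh might look like the fragment below. The JAVA_HOME path is an example and must point at the JDK actually installed on the node; the other two variables are standard hbase-env.sh settings shown with illustrative values.

```sh
# Illustrative hbase-env.sh settings (paths and sizes are examples).
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # JDK used by the HBase daemons
export HBASE_HEAPSIZE=4G                             # heap size for the HBase daemons
export HBASE_MANAGES_ZK=true                         # let HBase manage its own ZooKeeper
```

After editing the file, the HBase daemons must be restarted for the new environment to take effect.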