BDA Unit 3

MapReduce is a programming model that processes large data sets by dividing tasks into smaller chunks executed in parallel through map and reduce functions. It offers fault tolerance and efficiency by co-locating code and data, and is widely used in data analysis scenarios. The model is integral to Hadoop, allowing for scalable data processing across numerous servers.

Uploaded by

sanjithcb0311
Exploring the Features of MapReduce

MapReduce involves two operations, map and reduce, which are executed by dividing large problems into smaller chunks. These chunks are run in parallel by different [...]. The features that can be recovered from this page are:

Synchronization: Execution of [...]. The MapReduce program execution framework is aware of the operations that are taking place in the program and starts the reduction once the mapping is done.

Co-location of code and data: The effectiveness of a data processing mechanism depends largely on the location of the code and of the data required for the code to execute. Co-locating the code and data produces the most effective processing outcome; the best result is obtained when both code and data reside on the same machine.

Handling of errors/faults: MapReduce engines usually provide a high level of fault tolerance and robustness in handling errors. This robustness is needed because faults are likely on the clustered nodes on which the engines run. The MapReduce engine must therefore be able to recover from them: its design lets it identify tasks that are incomplete and eventually reassign them.

[...] This means that the [...] rights to access the services that other developers have created [...].

The MapReduce programming model works through the following steps, as far as they can be read here:

[...] by using the map function [...] for further processing.
Arrange the output list properly to enable optimization.
Compute a set of results by using the reduce function.
Provide the final output.

The MapReduce approach of analyzing data can be used by programmers for implementing many kinds of applications. This algorithm can also work with extremely large datasets [...] or even beyond.

The MapReduce model executes two functions, map and reduce. The map function is executed [...]. The reduce function takes the output of the map function and presents it in an aggregate form.

[Figure: working of the MapReduce approach]

A Hadoop cluster uses commodity servers to store [...] nodes. MapReduce and the Hadoop Distributed File System (HDFS) [...], which are based on these [...], are shown in Figure 5.2.
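The map, arrange, and reduce steps described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not Hadoop's API: the names `run_mapreduce`, `sale_to_pair`, and `total`, and the sales data, are all assumptions made up for the example.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, arrange (group by key), and reduce in one process."""
    # Map: each input record yields zero or more (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))

    # Arrange: collect all values that share a key.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # Reduce: collapse each key's values into one aggregate result.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def sale_to_pair(record):
    """Hypothetical map function: emit (region, amount) for one sale."""
    region, amount = record
    return [(region, amount)]


def total(key, values):
    """Hypothetical reduce function: add up all amounts for a region."""
    return sum(values)


# Invented input data, used only to exercise the sketch.
sales = [("north", 10), ("south", 5), ("north", 7)]
totals = run_mapreduce(sales, sale_to_pair, total)
print(totals)
```

In a real cluster the map calls run in parallel on different nodes and the grouping step moves data across the network; the sketch keeps only the dataflow.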
[...] accomplished through [...] operations performed in [...]. The operations performed in the MapReduce model, according to the dataflow shown in Figure 5.2, are as follows:

The input is provided from large data files in the form of key-value pairs (KVP), which is the standard input format in a Hadoop MapReduce programming model.
The input data is divided into small pieces, and master and slave nodes are created. The master node usually executes on the machine where the data is present, and slaves are made to work remotely on the data.
The map operation is performed simultaneously on all the data pieces, which are read by the map function. The map function extracts the relevant data and generates the KVP for it.

The input/output operations of the map function are shown in Figure 5.3.

[Figure 5.3: Working of the Map Function in the MapReduce Model]

[...] the MapReduce programming model with the help of [...] examples.

[Example 1:] [...] model and send the report of the population to the organization. [...] people recorded from all the areas [...] to reach an aggregate whole.

Example 2: A data analytics professional parses out every term available in the chat text by creating a map function that finds every word of the chat. [...]

After the map operation is complete, the reduce function starts performing the reduce operation. Since the current target is to count the number of times a word appears in the text, shuffling is performed next. This process distributes the map output through hashing so that the same keywords are mapped to the same node of the reduce function. Assuming [...] English text, we require 26 nodes that can handle [...]. In this case, words starting with A will be handled by one node, words starting with B by another node, and so on. Thus, the number of [...] is counted by the reduce step.
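The word-count example above, with its shuffle onto 26 letter-based nodes, might look roughly like the following. This is a sketch under stated assumptions: the function names (`map_words`, `partition`, `shuffle`, `reduce_node`) and the chat lines are invented for illustration, and the "hash" is simply the first letter of the word, as in the text's simplified scenario.

```python
import string
from collections import Counter, defaultdict


def map_words(line):
    """Map step: emit (word, 1) for every alphabetic word in a line."""
    pairs = []
    for raw in line.lower().split():
        word = raw.strip(string.punctuation)
        if word and word[0] in string.ascii_lowercase:
            pairs.append((word, 1))
    return pairs


def partition(word):
    """Pick one of 26 reduce nodes from the word's first letter (a=0 ... z=25)."""
    return ord(word[0]) - ord("a")


def shuffle(pairs):
    """Shuffle step: route each pair to the node that owns its first letter."""
    nodes = defaultdict(list)
    for word, count in pairs:
        nodes[partition(word)].append((word, count))
    return nodes


def reduce_node(pairs):
    """Reduce step on one node: add up the counts for each word."""
    counts = Counter()
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


# Invented chat text, used only to exercise the sketch.
chat = ["Are you there", "Yes, you are early"]
mapped = [pair for line in chat for pair in map_words(line)]
nodes = shuffle(mapped)

word_counts = {}
for node_id in sorted(nodes):
    word_counts.update(reduce_node(nodes[node_id]))
print(word_counts)
```

Because every occurrence of a given word lands on the same node, each node can produce final counts for its words without consulting any other node.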
The detailed MapReduce process used in this example [...].

[...] the fact that it involves several [...] functions to find out the desired result [...].

The different slaves have no knowledge of each other's tasks. This [...] MapReduce. In the Roman census discussed in this unit, each volunteer [...] own area and provided the record directly to the [...]; no volunteer was aware of the data collected by any other volunteer. In the same way, MapReduce cannot be used in a scenario where data sharing is required.

The MapReduce programming model can also be used for data analysis. For example, we can parse a weblog [...].

The map function has been a part of many functional programming languages for [...]. It gained popularity with an artificial intelligence language called LISP [...].

Exploring Map and Reduce Functions

[...] in the form of KVP [...] data analysis for which the data is taken [...] remains the same. The map function retrieves [...] of the data. The KVP list [...] is provided as an output. [...] streaming allows [...] reduce functions in Hadoop. [...] is used for expressing input and output [...], in which the key and value are [...] the text format. [...]

Consider an example of a program that counts the number of Indian cities having a population of above one lakh. Note that the following is not programming code; it is a plain-English representation of the solution.

Determine the input data and generate a list in the following manner:

    mylist = ("all counties in India that participated in the most recent general election")

Use the map function to create a function, howManyPeople, which selects the cities having a population of more than one lakh:

    map howManyPeople (mylist) = [ howManyPeople "city 1"; howManyPeople "city 2"; howManyPeople "city 3"; howManyPeople "city 4"; ... ]

Now, generate a new output list of all the cities having a population of more than one lakh:

    (no, city 1; yes, city 2; no, city 3; yes, city 4; ?, city nnn)

The function executes without making any modifications to the original list. Moreover, each element of the output list is mapped to a corresponding element of the input list. If a city meets the requirement of having more than one lakh people, the map function marks it yes; otherwise, a no is specified.

The list provided by the howManyPeople map function is used as an input for the reduce function. This function processes each element of the list [...] greater than one lakh [...] list of all the cities with a [...].

MapReduce is the heart of Hadoop. It is this programming paradigm that allows massive scalability across hundreds or thousands of servers in a Hadoop cluster. The MapReduce concept is fairly simple to understand for those who are familiar with clustered scale-out data processing solutions.

The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data in which individual elements are broken down into tuples. The reduce job takes the output from a map as input and combines those tuples [...]. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.

As an example, assume five files, each with two columns (a key and a value in Hadoop terms) that represent a city and [...] various measurement [...]. The example is kept simple so that it is easy to follow; a real application is likely to contain [...].

We want to find the maximum temperature for each city [...] (note that each file might have the same city represented multiple times). Using the MapReduce framework, we can break this down into five map tasks, where each mapper works on one of the five files and returns the maximum temperature for each city. For example, the results produced from one mapper task for the data above would [...]:

    [...] (New York, 22) (Rome, 33)

[...] tasks (working on the other four files, not shown) [...].

As an analogy, you can think of map and [...] Roman times, where the [...] conducted [...].

After learning about the working of the MapReduce programming model, some of the team members working with Mr. Richard Paul asked about the different ways in which MapReduce jobs can be optimized. Mr. Paul answered by describing the different approaches that can be used for MapReduce job optimization.
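The two examples above, the howManyPeople map over a list of cities and the per-city maximum temperature, can be sketched as follows. All populations, temperatures, and function names here are invented for illustration; only the one-lakh threshold and the yes/no output shape come from the text.

```python
ONE_LAKH = 100_000


def how_many_people(city, population):
    """Map step: tag a city 'yes' if it has more than one lakh people."""
    return ("yes" if population > ONE_LAKH else "no", city)


# Invented populations; the input list itself is never modified.
cities = [
    ("city 1", 80_000),
    ("city 2", 250_000),
    ("city 3", 40_000),
    ("city 4", 1_200_000),
]
flags = [how_many_people(city, population) for city, population in cities]

# Reduce step: count the cities that were tagged 'yes'.
large_city_count = sum(1 for flag, _ in flags if flag == "yes")


def map_max_temperature(rows):
    """Map task for one file: keep the highest reading seen per city."""
    best = {}
    for city, temperature in rows:
        if city not in best or temperature > best[city]:
            best[city] = temperature
    return list(best.items())


def reduce_max_temperature(mapper_outputs):
    """Reduce step: combine per-file maxima into one maximum per city."""
    overall = {}
    for output in mapper_outputs:
        for city, temperature in output:
            overall[city] = max(temperature, overall.get(city, temperature))
    return overall


# Two invented files; a city may appear several times in one file.
file_a = [("New York", 22), ("Rome", 32), ("Rome", 33), ("New York", 18)]
file_b = [("Rome", 30), ("New York", 25)]
mapper_outputs = [map_max_temperature(f) for f in (file_a, file_b)]
max_by_city = reduce_max_temperature(mapper_outputs)
print(flags, large_city_count, max_by_city)
```

Taking a maximum is safe to split this way because the maximum of per-file maxima equals the maximum over all rows, which is why each mapper can work on its own file without seeing the others.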
Techniques to Optimize MapReduce Jobs

[...] jobs, especially the ones where the user requires a quick response to his/her query, [...] important. Hence, we need the MapReduce job to be optimized. Incurring a deadlock for even a single resource during the execution of the program slows down [...].

The performance of MapReduce jobs and their reliability, in addition to the code written for the main application, can be optimized by using some techniques. We can organize the MapReduce optimization techniques in the following categories:

Hardware or network topology
[...]

[...] The map function emits pairs of the form <newspaperURL, 1> and <sportsnewsURL, 1> [...]. [...] times the website of a [...] of the Web [...], for example <newspaperURL, 3>.

Web Page Visitor Paths: Consider a situation in which an advocacy group wishes to know how [...] "source," and the Web page to which the link transfers [...]. The map function scans the Web links for returning the result [...]. The reduce function [...] output, which is the final output, [...] list(source).

[...] A researcher wishes to read articles about flood, but he does not want those in which flood is discussed as a minor topic. Therefore, he decided that an article should have the word "tectonic plate" in it more [...]. [...] number of times the specified word occurred is [...].
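The URL-count pairs and the source/target link example above can be sketched in Python as follows. This is an illustration under assumptions, not the book's code: the access log, the page names under `example`, and every function name are invented; only the <URL, 1> pairs, the aggregate count, and the list(source) output shape come from the text.

```python
from collections import defaultdict


def map_url(url):
    """Map step for access counting: emit (url, 1) for one log entry."""
    return (url, 1)


def reduce_url_counts(pairs):
    """Reduce step: add up the 1s emitted for each URL."""
    counts = defaultdict(int)
    for url, one in pairs:
        counts[url] += one
    return dict(counts)


def map_links(source, targets):
    """Map step for visitor paths: emit (target, source) for every link."""
    return [(target, source) for target in targets]


def reduce_links(pairs):
    """Reduce step: collect, for each target page, the list of its sources."""
    grouped = defaultdict(list)
    for target, source in pairs:
        grouped[target].append(source)
    return {target: sorted(sources) for target, sources in grouped.items()}


# Invented access log using the placeholder names from the text.
log = ["newspaperURL", "sportsnewsURL", "newspaperURL", "newspaperURL"]
url_counts = reduce_url_counts(map_url(url) for url in log)

# Invented link structure: each source page lists the pages it links to.
pages = {
    "blog.example/post": ["advocacy.example/home"],
    "news.example/story": ["advocacy.example/home", "blog.example/post"],
}
link_pairs = [pair for src, tgts in pages.items() for pair in map_links(src, tgts)]
sources_by_target = reduce_links(link_pairs)
print(url_counts, sources_by_target)
```

Swapping the pair to (target, source) in the map step is what lets the reduce step answer "which pages lead here?" with a simple group-by.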
The reduce function counts [...] and selects only the [...].

Word Count: Suppose a researcher wishes to determine [...] analyzed [...].

[...] now whether MapReduce [...]. The team members working with Mr. [...] are interested [...] mainly for data [...] about HBase [...] examine the ways in which HBase can help perform MapReduce operations better. Let us see [...].

Role of HBase in Big Data Processing

[...] distributed, column-oriented [...] dealing with Big Data, because every time you do not [...]. Regular [...] tables are row-oriented [...], because columns can be added [...]. In a column-oriented database, the data [...] very easily and are added row-by-row, providing [...]. [...] developed rapidly and are used in [...] data and quickly update [...] all the dependency [...].

The HBase file can be customized as per the user needs by exporting [...] (hbase-env.sh). To customize an HBase [...]
