Map Reduce
1.1 Load and examine the sample data

__1. After you log in, you should start with a PuTTY terminal window.

__2. In the PuTTY window, examine the directory structure of your home directory in the Hadoop file system. Type:

    hdfs dfs -ls

__3. You want to make a new directory to store the sample data files that you will use for this exercise. Make a new directory called sampledata, and verify that it was created. Type:

    hdfs dfs -mkdir sampledata
    hdfs dfs -ls

__4. Now move the BDU_MapReduce_and_YARN.tar file into the cloud. Open WinSCP and transfer the file to your home directory: just drag the file from the left (local) side to the home directory on the right side.

__5. Then extract the file (for example, with tar -xf BDU_MapReduce_and_YARN.tar). Your window should show the extracted files when it is done.

__6. Now that the files are extracted, upload the temperature data from the local file system to HDFS using the following command:

    hdfs dfs -put ~/labfiles/SumnerCountyTemp.dat sampledata

__7. Test that the file was uploaded correctly by typing the following command:

    hdfs dfs -ls sampledata

Notice that your SumnerCountyTemp.dat file was uploaded correctly. You can view this data by executing the following command:

    hdfs dfs -cat sampledata/SumnerCountyTemp.dat | more

The values in the 95th column (354, 353, 353, 353, 352, ...) are the average daily temperatures. They are the actual average temperature values multiplied by 10. (Incidentally, that way you don't have to worry about working with decimal points.) Press the spacebar a few times to scroll through the data and observe the temperature patterns by date. When you are satisfied, press Ctrl+C to break out of the piped output.
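The fixed-width layout just described (month at zero-based offsets 22-23, scaled temperature at offsets 95-97) can be sanity-checked locally before any Hadoop code is written. The record below is hypothetical, built only to match those offsets; it is not a real line from SumnerCountyTemp.dat:

```java
public class RecordParseDemo {
    public static void main(String[] args) {
        // Build a hypothetical 100-character fixed-width record:
        // month "01" at offsets 22-23, average temperature "354" (35.4 F x 10) at 95-97
        StringBuilder sb = new StringBuilder(" ".repeat(100));
        sb.replace(22, 24, "01");
        sb.replace(95, 98, "354");
        String line = sb.toString();

        // Same extraction the mapper will perform
        String month = line.substring(22, 24);
        int avgTemp = Integer.parseInt(line.substring(95, 98));

        // Divide by 10 only for display; the job itself keeps the scaled int
        System.out.println(month + " -> " + (avgTemp / 10.0) + " F");  // prints: 01 -> 35.4 F
    }
}
```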
1.2 Start your Java project

Create a directory to hold the three Java files that you will be making, and change into it. The directory will hold your program artifacts and keep them separate from everything else in the file system:

    mkdir com.company.name
    cd com.company.name

1.3 Create the Java file for the mapper class

__1. Create a new Java file, MaxTempMapper.java:

    vi MaxTempMapper.java

There is a standard set of imports that will also be used for the other two Java files that you create. The data type for the input key to the mapper will be LongWritable, and the input value will be of type Text. The output key from the mapper will be of type Text, and the output value (the temperature) will be of type IntWritable. You need a public class with the name MaxTempMapper. For this class:
__a. Import java.io.IOException.
__b. Extend Mapper<LongWritable, Text, Text, IntWritable>.
__c. Define a public method called map.
__d. Your code should look like the following:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
        }
    }

In the next section you will define the body of the map method.

Note: You can also create the Java file in Notepad and transfer it via WinSCP.

1.4 Complete the mapper

Your program will read in each line of data as a string so that you can do string manipulation. You want to extract the month and the average temperature from each record. The month begins at character position 22 of the record (zero offset) and the average temperature begins at position 95. (Remember that the average temperature value is three digits, with one implied decimal place.)

__1. In the map method, add the following code (or whatever code you think is required):

    String line = value.toString();
    String month = line.substring(22, 24);
    int avgTemp = Integer.parseInt(line.substring(95, 98));
    context.write(new Text(month), new IntWritable(avgTemp));

__2. From this document, you may wish to copy the entire content of the file into Windows Notepad, where you can insert the code for the map method. Then transfer the Java file (for example, MaxTempMapper.java) to your com.company.name folder. If you are using vi, press Esc and then type :wq to write your file and exit the editor, or press Esc and then hit Shift+Z twice.

1.5 Create the reducer class

__1. Create a new Java file, MaxTempReducer.java:

    vi MaxTempReducer.java

You need a public class with the name MaxTempReducer. The data type for the input key to the reducer will be Text, and the input values will be of type IntWritable. The output key from the reducer will be of type Text, and the output value will be of type IntWritable. For your class:
__a. Import java.io.IOException.
__b. Extend Reducer<Text, IntWritable, Text, IntWritable>.
__c. Define a public method called reduce.
__d. Your code should look like the following:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
        }
    }

1.6 Complete the reducer

For the reducer, you want to iterate through all values for a given key. For each value found, check to see if it is higher than any of the values seen so far.

__1. Add the following code (or your variation) to the reduce method:

    int maxTemp = Integer.MIN_VALUE;
    for (IntWritable value : values) {
        maxTemp = Math.max(maxTemp, value.get());
    }
    context.write(key, new IntWritable(maxTemp));

__2. Assemble your file in the vi editor, Notepad, or any way you choose, and remember to save your work.
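The heart of the reducer above is a simple running-maximum fold: start at Integer.MIN_VALUE and keep the largest value seen. As a sketch, the same logic can be exercised outside Hadoop on a plain list of scaled temperatures (the values below are hypothetical, chosen to match the data's x10 format):

```java
import java.util.Arrays;
import java.util.List;

public class MaxReduceDemo {
    public static void main(String[] args) {
        // Hypothetical scaled temperatures (x10) for one key, e.g. one month
        List<Integer> values = Arrays.asList(354, 353, 352, 367, 360);

        // Same fold the reducer performs
        int maxTemp = Integer.MIN_VALUE;
        for (int value : values) {
            maxTemp = Math.max(maxTemp, value);
        }
        System.out.println(maxTemp);  // prints: 367
    }
}
```

Starting at Integer.MIN_VALUE rather than 0 matters: it keeps the fold correct even if every temperature in a group is negative.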
1.7 Create the driver

__1. Create a new Java file, MaxMonthTemp.java:

    vi MaxMonthTemp.java

You need a public class with the name MaxMonthTemp and the standard set of imports. GenericOptionsParser() extracts any input parameters that are not system parameters and places them in an array. In your case, two parameters will be passed to your application: the first parameter is the input file, and the second parameter is the output directory. (This directory must not already exist, or your MapReduce application will fail.) Your code should look like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class MaxMonthTemp {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] programArgs =
                new GenericOptionsParser(conf, args).getRemainingArgs();
            if (programArgs.length != 2) {
                System.err.println("Usage: MaxTemp <in> <out>");
                System.exit(2);
            }

            Job job = Job.getInstance(conf, "Monthly Max Temp");
            job.setJarByClass(MaxMonthTemp.class);
            job.setMapperClass(MaxTempMapper.class);
            job.setReducerClass(MaxTempReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(programArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(programArgs[1]));

            // Submit the job and wait until it finishes
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

__2. Assemble your file in the vi editor any way you choose, and remember to save your work.
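To see what the three classes compute together, here is a hedged, Hadoop-free sketch of the same map/shuffle/reduce flow using an in-memory sorted map. The (month, scaled temperature) records are hypothetical stand-ins for what the mapper would emit; a TreeMap is used because MapReduce also delivers keys to the reducer in sorted order:

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalMaxMonthTemp {
    public static void main(String[] args) {
        // Hypothetical (month, temperature x 10) pairs, as emitted by the mapper
        String[][] records = {
            {"01", "354"}, {"01", "367"}, {"07", "785"}, {"07", "771"}
        };

        // Shuffle + reduce collapsed into one step: keep the max value per key
        Map<String, Integer> maxByMonth = new TreeMap<>();
        for (String[] rec : records) {
            int temp = Integer.parseInt(rec[1]);
            maxByMonth.merge(rec[0], temp, Math::max);
        }
        System.out.println(maxByMonth);  // prints: {01=367, 07=785}
    }
}
```

This is only a mental model: the real job distributes the map and reduce steps across the cluster, but the per-key maximum it produces is the same.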
1.8 Compile your Java files and create the JAR file

__1. Compile all three Java files with one statement, and then list your directory to see that you now have three Java source files and three Java class files. Type:

    javac -classpath `hadoop classpath` *.java
    ls

Note that the quotes here are back-quotes (the key at the top left corner of your keyboard, to the left of the 1 key). The command hadoop classpath is executed to list the classpath information needed by the compiler; the result is passed to javac with the -classpath option.

If the compiler reports errors such as "error: unmappable character for encoding UTF8", it means that you copied and pasted directly from this document while using a program that converted some of the whitespace into a different format. To correct this, open the file in Notepad and/or vi, delete the characters that give you an error, and retype them from your keyboard (for example, replace the offending characters, which show up as spaces in your editor, with spaces typed from your keyboard).

__2. Create a Java archive file (jar cf, where c = create, f = file) from the three class files. Then list the contents (jar tf) of the archive file:

    jar cf MaxMT.jar *.class
    jar tf MaxMT.jar

The Java archive file was created in the directory where the .java and .class files reside. But when you use Hadoop MapReduce to run the JAR, Hadoop does not like to have the .class files in the same directory. Therefore, move the file to the parent directory, where you will run it in the next step:

    mv MaxMT.jar ..
    cd ..

1.9 Run the JAR file

__1. Run the application. Type:

    hadoop jar ./MaxMT.jar MaxMonthTemp sampledata/SumnerCountyTemp.dat sampledata/TempOut

You will see the MapReduce job progress and counter output in that terminal window. Your results are certainly different, but the final lines of output will probably be similar.

__2. Examine the output that was produced. The result of this run should list each month with its maximum scaled temperature (for example, 01 367).

Be aware that if you cut and paste from this file, sometimes Microsoft and Adobe software change a single dash ("-") to an en dash or an em dash. Linux is not kind to these characters.

You can see that the maximum temperature for January (01) was 36.7 F, the coldest month of winter in Sumner County as evidenced by the data, and the maximum temperature for July (07) was 78.5 F for the county (a rather cool summer, it appears).
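Both paste-related pitfalls noted above, non-standard whitespace that breaks javac and en/em dashes that break shell commands, come down to non-ASCII characters hiding in what should be plain ASCII text. A short sketch of how you might detect them before compiling; the sample string is hypothetical, containing a non-breaking space (U+00A0) where a normal space should be:

```java
public class NonAsciiCheck {
    public static void main(String[] args) {
        // Hypothetical line pasted from a PDF: looks like "int x = 1;"
        // but holds a non-breaking space (U+00A0) instead of a normal space
        String pasted = "int x\u00A0= 1;";

        // Flag every character outside the 7-bit ASCII range
        for (int i = 0; i < pasted.length(); i++) {
            char c = pasted.charAt(i);
            if (c > 127) {
                System.out.println("Non-ASCII char U+"
                    + String.format("%04X", (int) c) + " at index " + i);
            }
        }
        // prints: Non-ASCII char U+00A0 at index 5
    }
}
```

Running a check like this (or `grep -P '[^\x00-\x7F]' MaxTempMapper.java` on the command line) pinpoints exactly which characters to retype.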
