Map Reduce
1.1 Load and examine the sample data

__1. After you log in, you should start with a PuTTY terminal window.

__2. In the PuTTY window, examine the directory structure of your home directory in the Hadoop file system. Type:

    hdfs dfs -ls

__3. You want to make a new directory to store the sample data files that you will use for this exercise. Make a new directory called sampledata, and verify that it was created. Type:

    hdfs dfs -mkdir sampledata
    hdfs dfs -ls

__4. Now move the BDU_MapReduce_and_YARN.tar file into the cloud. Open WinSCP and transfer the file to your home directory: just drag the file from the left (local) side to the home directory on the right side.

__5. Then extract the file (for example, with tar -xf BDU_MapReduce_and_YARN.tar). Your window should show the extracted files when it is done.

__6. Now that the files are extracted, upload the temperature data from the local file system to HDFS using the following command:

    hdfs dfs -put ~/labfiles/SumnerCountyTemp.dat sampledata

__7. Test that the file was uploaded correctly by typing the following command:

    hdfs dfs -ls sampledata

Notice that your SumnerCountyTemp.dat file was uploaded correctly. You can view this data by executing the following command:

    hdfs dfs -cat sampledata/SumnerCountyTemp.dat | more

The values in the 95th column (354, 353, 353, 353, 352, ...) are the average daily temperatures. They are the actual average temperature values multiplied by 10. (Incidentally, that way you don't have to worry about working with decimal points.) Press the spacebar a few times to scroll through the data and observe the temperature patterns by date. When you are satisfied, press Ctrl+C to break out of the piped output.
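The fixed-width layout just described (month at zero-based offsets 22-23, scaled temperature at offsets 95-97) can be sanity-checked locally before any Hadoop code is written. The record below is hypothetical, built only to match those offsets; it is not a real line from SumnerCountyTemp.dat:

```java
public class RecordParseDemo {
    public static void main(String[] args) {
        // Build a hypothetical 100-character fixed-width record:
        // month "01" at offsets 22-23, average temperature "354" (35.4 F x 10) at 95-97
        StringBuilder sb = new StringBuilder(" ".repeat(100));
        sb.replace(22, 24, "01");
        sb.replace(95, 98, "354");
        String line = sb.toString();

        // Same extraction the mapper will perform
        String month = line.substring(22, 24);
        int avgTemp = Integer.parseInt(line.substring(95, 98));

        // Divide by 10 only for display; the job itself keeps the scaled int
        System.out.println(month + " -> " + (avgTemp / 10.0) + " F");  // prints: 01 -> 35.4 F
    }
}
```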
1.2 Start your Java project

Create a directory to hold the three Java files that you will be making, and change into it. The directory will hold your program artifacts and keep them separate from everything else in the file system:

    mkdir com.company.name
    cd com.company.name

1.3 Create the Java file for the mapper class

__1. Create a new Java file, MaxTempMapper.java:

    vi MaxTempMapper.java

There is a standard set of imports that will also be used for the other two Java files that you create. The data type for the input key to the mapper will be LongWritable, and the input value will be of type Text. The output key from the mapper will be of type Text, and the output value (the temperature) will be of type IntWritable. You need a public class with the name MaxTempMapper. For this class:
__a. Import java.io.IOException.
__b. Extend Mapper<LongWritable, Text, Text, IntWritable>.
__c. Define a public method called map.
__d. Your code should look like the following:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
        }
    }

In the next section you will define the body of the map method.

Note: You can also create the Java file in Notepad and transfer it via WinSCP.

1.4 Complete the mapper

Your program will read in each line of data as a string so that you can do string manipulation. You want to extract the month and the average temperature from each record. The month begins at character position 22 of the record (zero offset) and the average temperature begins at position 95. (Remember that the average temperature value is three digits, with one implied decimal place.)

__1. In the map method, add the following code (or whatever code you think is required):

    String line = value.toString();
    String month = line.substring(22, 24);
    int avgTemp = Integer.parseInt(line.substring(95, 98));
    context.write(new Text(month), new IntWritable(avgTemp));

__2. From this document, you may wish to copy the entire content of the file into Windows Notepad, where you can insert the code for the map method. Then transfer the Java file (for example, MaxTempMapper.java) to your com.company.name folder. If you are using vi, press Esc and then type :wq to write your file and exit the editor, or press Esc and then hit Shift+Z twice.

1.5 Create the reducer class

__1. Create a new Java file, MaxTempReducer.java:

    vi MaxTempReducer.java

You need a public class with the name MaxTempReducer. The data type for the input key to the reducer will be Text, and the input values will be of type IntWritable. The output key from the reducer will be of type Text, and the output value will be of type IntWritable. For your class:
__a. Import java.io.IOException.
__b. Extend Reducer<Text, IntWritable, Text, IntWritable>.
__c. Define a public method called reduce.
__d. Your code should look like the following:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
        }
    }

1.6 Complete the reducer

For the reducer, you want to iterate through all values for a given key. For each value found, check to see if it is higher than any of the values seen so far.

__1. Add the following code (or your variation) to the reduce method:

    int maxTemp = Integer.MIN_VALUE;
    for (IntWritable value : values) {
        maxTemp = Math.max(maxTemp, value.get());
    }
    context.write(key, new IntWritable(maxTemp));

__2. Assemble your file in the vi editor, Notepad, or any way you choose, and remember to save your work.
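The heart of the reducer above is a simple running-maximum fold: start at Integer.MIN_VALUE and keep the largest value seen. As a sketch, the same logic can be exercised outside Hadoop on a plain list of scaled temperatures (the values below are hypothetical, chosen to match the data's x10 format):

```java
import java.util.Arrays;
import java.util.List;

public class MaxReduceDemo {
    public static void main(String[] args) {
        // Hypothetical scaled temperatures (x10) for one key, e.g. one month
        List<Integer> values = Arrays.asList(354, 353, 352, 367, 360);

        // Same fold the reducer performs
        int maxTemp = Integer.MIN_VALUE;
        for (int value : values) {
            maxTemp = Math.max(maxTemp, value);
        }
        System.out.println(maxTemp);  // prints: 367
    }
}
```

Starting at Integer.MIN_VALUE rather than 0 matters: it keeps the fold correct even if every temperature in a group is negative.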
1.7 Create the driver

__1. Create a new Java file, MaxMonthTemp.java:

    vi MaxMonthTemp.java

You need a public class with the name MaxMonthTemp and the standard set of imports. GenericOptionsParser() extracts any input parameters that are not system parameters and places them in an array. In your case, two parameters will be passed to your application: the first parameter is the input file, and the second parameter is the output directory. (This directory must not already exist, or your MapReduce application will fail.) Your code should look like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class MaxMonthTemp {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] programArgs =
                new GenericOptionsParser(conf, args).getRemainingArgs();
            if (programArgs.length != 2) {
                System.err.println("Usage: MaxTemp <in> <out>");
                System.exit(2);
            }

            Job job = Job.getInstance(conf, "Monthly Max Temp");
            job.setJarByClass(MaxMonthTemp.class);
            job.setMapperClass(MaxTempMapper.class);
            job.setReducerClass(MaxTempReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(programArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(programArgs[1]));

            // Submit the job and wait until it finishes
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

__2. Assemble your file in the vi editor any way you choose, and remember to save your work.
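To see what the three classes compute together, here is a hedged, Hadoop-free sketch of the same map/shuffle/reduce flow using an in-memory sorted map. The (month, scaled temperature) records are hypothetical stand-ins for what the mapper would emit; a TreeMap is used because MapReduce also delivers keys to the reducer in sorted order:

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalMaxMonthTemp {
    public static void main(String[] args) {
        // Hypothetical (month, temperature x 10) pairs, as emitted by the mapper
        String[][] records = {
            {"01", "354"}, {"01", "367"}, {"07", "785"}, {"07", "771"}
        };

        // Shuffle + reduce collapsed into one step: keep the max value per key
        Map<String, Integer> maxByMonth = new TreeMap<>();
        for (String[] rec : records) {
            int temp = Integer.parseInt(rec[1]);
            maxByMonth.merge(rec[0], temp, Math::max);
        }
        System.out.println(maxByMonth);  // prints: {01=367, 07=785}
    }
}
```

This is only a mental model: the real job distributes the map and reduce steps across the cluster, but the per-key maximum it produces is the same.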
1.8 Compile your Java files and create the JAR file

__1. Compile all three Java files with one statement, and then list your directory to see that you now have three Java source files and three Java class files. Type:

    javac -classpath `hadoop classpath` *.java
    ls

Note that the quotes here are back-quotes (the key at the top left corner of your keyboard, to the left of the 1 key). The command hadoop classpath is executed to list the classpath information needed by the compiler; the result is passed to javac with the -classpath option.

If the compiler reports errors such as "error: unmappable character for encoding UTF8", it means that you copied and pasted directly from this document while using a program that converted some of the whitespace into a different format. To correct this, open the file in Notepad and/or vi, delete the characters that give you an error, and retype them from your keyboard (for example, replace the offending characters, which show up as spaces in your editor, with spaces typed from your keyboard).

__2. Create a Java archive file (jar cf, where c = create, f = file) from the three class files. Then list the contents (jar tf) of the archive file:

    jar cf MaxMT.jar *.class
    jar tf MaxMT.jar

The Java archive file was created in the directory where the .java and .class files reside. But when you use Hadoop MapReduce to run the JAR, Hadoop does not like to have the .class files in the same directory. Therefore, move the file to the parent directory, where you will run it in the next step:

    mv MaxMT.jar ..
    cd ..

1.9 Run the JAR file

__1. Run the application. Type:

    hadoop jar ./MaxMT.jar MaxMonthTemp sampledata/SumnerCountyTemp.dat sampledata/TempOut

You will see the MapReduce job progress and counter output in that terminal window. Your results are certainly different, but the final lines of output will probably be similar.

__2. Examine the output that was produced. The result of this run should list each month with its maximum scaled temperature (for example, 01 367).

Be aware that if you cut and paste from this file, sometimes Microsoft and Adobe software change a single dash ("-") to an en dash or an em dash. Linux is not kind to these characters.

You can see that the maximum temperature for January (01) was 36.7 F, the coldest month of winter in Sumner County as evidenced by the data, and the maximum temperature for July (07) was 78.5 F for the county (a rather cool summer, it appears).
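Both paste-related pitfalls noted above, non-standard whitespace that breaks javac and en/em dashes that break shell commands, come down to non-ASCII characters hiding in what should be plain ASCII text. A short sketch of how you might detect them before compiling; the sample string is hypothetical, containing a non-breaking space (U+00A0) where a normal space should be:

```java
public class NonAsciiCheck {
    public static void main(String[] args) {
        // Hypothetical line pasted from a PDF: looks like "int x = 1;"
        // but holds a non-breaking space (U+00A0) instead of a normal space
        String pasted = "int x\u00A0= 1;";

        // Flag every character outside the 7-bit ASCII range
        for (int i = 0; i < pasted.length(); i++) {
            char c = pasted.charAt(i);
            if (c > 127) {
                System.out.println("Non-ASCII char U+"
                    + String.format("%04X", (int) c) + " at index " + i);
            }
        }
        // prints: Non-ASCII char U+00A0 at index 5
    }
}
```

Running a check like this (or `grep -P '[^\x00-\x7F]' MaxTempMapper.java` on the command line) pinpoints exactly which characters to retype.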
