TP 2
Overview
This exercise introduces you to a simple MapReduce program that uses Hadoop v2 and related
technologies. You compile and run the program by using Hadoop and YARN commands. You also
explore the MapReduce job’s history with the Ambari Web UI.
Objectives
After completing this exercise, you will be able to:
• List the sample MapReduce programs provided by the Hadoop community.
• Compile MapReduce programs and run them by using Hadoop and YARN commands.
• Explore the MapReduce job’s history by using the Ambari Web UI.
Introduction
In the MapReduce programming model, the program is written in a special way so that it can be
brought to the data. To accomplish this goal, the program is broken down into two discrete parts:
Map and Reduce.
• A mapper is typically a relatively small program with a relatively simple task. A mapper is
responsible for reading a portion of the input data, interpreting, filtering, or transforming the data
as necessary, and then producing a stream of <key, value> pairs.
• Reducers are the last part of the picture. They are also typically small programs that are
responsible for sorting and aggregating all the values that are associated with the keys they are
assigned to work on. As with mappers, parallelism applies: the more unique keys there are, the
more reducers can run in parallel. After each reducer completes its assigned work, for example,
adding up the total sales for a state, it emits key/value pairs that are written to storage. These
key/value pairs can be used as the input to the next MapReduce job.
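To make this division of work concrete, the following is a minimal sketch of a word-count mapper
and reducer that is written against the org.apache.hadoop.mapreduce API. The class and field names
are illustrative; the sample program that you run later in this exercise is already provided,
compiled, in hadoop-mapreduce-examples.jar.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountSketch {

  // Mapper: reads one line of input at a time and emits a <word, 1> pair per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);            // emit the <key, value> pair
      }
    }
  }

  // Reducer: receives all the values for one key and emits <word, total count>.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);            // written to the job's output directory
    }
  }
}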
The fundamental idea of YARN/MRv2 is to split the two main functions of the JobTracker, which are
resource management and job scheduling/monitoring, into separate daemons. The idea is to have
a global ResourceManager (RM) and a per-application ApplicationMaster (AM). For more
information about MapReduce 2.0 (MRv2) or YARN, see Apache Hadoop YARN.
The Hadoop community provides several standard example programs. You can use the example
programs to learn the relevant technology, and on a newly installed Hadoop cluster, to verify the
system’s operational environment.
In this exercise, you list the example programs that are provided by the Hadoop community. Then,
you compile the sample Java program wordcount, which is a MapReduce program that counts the
words in the input files. You run wordcount by using Hadoop and YARN commands. You also learn
to explore the MapReduce job’s history with the Ambari Web UI from both the MapReduce2 and
YARN services.
Requirements
• Complete "Exercise 1. Exploring the lab environment".
• Complete "Exercise 3. File access and basic commands with HDFS".
• PuTTY SSH client installed on your workstation.
Exercise instructions
In this exercise, you complete the following tasks:
1. Run a simple MapReduce job from a Hadoop sample program.
2. Explore the MapReduce job’s history with the Ambari Web UI.
3. Run a simple MapReduce job by using YARN.
3. Run the sample program wordcount, which is a MapReduce program that counts the words
in the input files. Enter the following command in one line:
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
wordcount Gutenberg/Frankenstein.txt wcount
Where wcount is the directory where the program writes the final output.
Important
LONG RUNNING command. If a single user submits this command, it usually completes in
approximately 25 seconds. In the current class environment, when tens of students submit jobs at
the same time, expect longer wait times. In our tests, the maximum wait time that was experienced
was 3 minutes. Be patient: do not close or break your session.
Note
If the Gutenberg folder does not exist in your HDFS directory, run the steps in "Exercise 3. File
access and basic commands with HDFS", "Part 2. Exploring basic HDFS commands".
4. Notice that there is only one Reduce task, which is highlighted in bold under the Job
Counters section in the command results. Therefore, the result is produced in one file.
5. List the generated files by running the following command:
hdfs dfs -ls wcount
The result is shown in the following output.
[student0000@dataengineer ~]$ hdfs dfs -ls wcount
Found 2 items
-rw-r--r-- 3 student0000 hdfs 0 2020-09-21 15:39 wcount/_SUCCESS
-rw-r--r-- 3 student0000 hdfs 122090 2020-09-21 15:39 wcount/part-r-00000
As expected, the result is produced in only one file (part-r-00000).
6. To review part-r-00000, run the following command:
hadoop fs -cat wcount/part-r-00000 | more
The word counts are displayed one page at a time.
Note
To run the command again unchanged, you must first remove the wcount directory and all the files
in it. Alternatively, you can run the command again with a different output directory, such as
wcount2.
Part 2: Exploring the MapReduce job’s history with the Ambari Web UI
In this part, you explore the MapReduce job that you submitted in the previous part by using the
Ambari Web UI.
Complete the following steps:
1. Start Ambari Web UI by opening the Ambari URL <hostname:8080> in your browser. Refer
to "Exercise 1. Exploring the lab environment".
2. Log in using the Ambari Username <ambari username> and Ambari Password <ambari
password> from "Exercise 1. Exploring the lab environment".
3. Click MapReduce2 under Services on the left pane.
Hint
Press Ctrl + F and search for your username to find the jobs that you ran. You can also search by
using the Job ID from the results of running wordcount.
5. Click the job ID to open its history to see the status and logs of the job.
Part 3: Running a simple MapReduce job by using YARN
4. Re-run the job with all four files in the Gutenberg directory as input, by using the command:
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
wordcount Gutenberg/* wcount2
Important
LONG RUNNING command. If a single user submits this command, it usually completes in
approximately 25 seconds. In the current class environment, when tens of students submit jobs at
the same time, expect longer wait times. In our tests, the maximum wait time that was experienced
was 3 minutes. Be patient: do not close or break your session.
Note
You can get the application ID from the result of running the job. Find the value of Submitted
application (for example, application_1599487765089_0037, which is highlighted in bold in the
command results).
11. Return to PuTTY to clean up the output directories by running the following command:
hdfs dfs -rm -R wcount*
End of exercise
Overview
In this exercise, you compile and run a new and more complex version of the WordCount program
that was introduced in “Exercise 4. Running MapReduce and YARN jobs”. This new version uses
many of the features that are provided by the MapReduce framework.
Objectives
After completing this exercise, you will be able to:
• Compile and run more complex MapReduce programs.
Introduction
In this exercise, you use a more complex MapReduce program, WordCount2.java, which is
provided as part of the Apache Hadoop MapReduce tutorials at
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/M
apReduceTutorial.html#Example:_WordCount_v2.0.
WordCount v2.0 is more sophisticated than the version that you used in Exercise 4. In this version,
you can specify patterns that you might want to skip when the program counts words, such as "to",
"the", "/", and others.
There are some limitations; if you are an experienced Java programmer, you might want to
experiment later with other features. For instance, are all words lowercased when they are
tokenized?
Because you are now more familiar with the process of running MapReduce and YARN jobs, the
directions that are provided here concentrate only on compiling and running the program.
Requirements
• Complete "Exercise 1. Exploring the lab environment".
• Complete "Exercise 3. File access and basic commands with HDFS".
• PuTTY SSH client installed on your workstation.
Exercise instructions
In this exercise, you complete the following tasks:
1. Compile and run a more complex version of the WordCount program.
Note
At the end of this command, there is a period. It indicates the current directory, which is your home
directory in this case.
4. Display the Hadoop classpath, which you need for the compilation:
hadoop classpath
The result is similar to the following output.
[student0000@dataengineer WordCount2]$ hadoop classpath
/usr/hdp/3.1.4.0-315/hadoop/conf:/usr/hdp/3.1.4.0-315/hadoop/lib/*:/usr/hdp/3.1.4.
0-315/hadoop/.//*:/usr/hdp/3.1.4.0-315/hadoop-hdfs/./:/usr/hdp/3.1.4.0-315/hadoop-
hdfs/lib/*:/usr/hdp/3.1.4.0-315/hadoop-hdfs/.//*:/usr/hdp/3.1.4.0-315/hadoop-mapre
duce/lib/*:/usr/hdp/3.1.4.0-315/hadoop-mapreduce/.//*:/usr/hdp/3.1.4.0-315/hadoop-
yarn/./:/usr/hdp/3.1.4.0-315/hadoop-yarn/lib/*:/usr/hdp/3.1.4.0-315/hadoop-yarn/./
/*:/usr/hdp/3.1.4.0-315/tez/*:/usr/hdp/3.1.4.0-315/tez/lib/*:/usr/hdp/3.1.4.0-315/
tez/conf:/usr/hdp/3.1.4.0-315/tez/conf_llap:/usr/hdp/3.1.4.0-315/tez/doc:/usr/hdp/
3.1.4.0-315/tez/hadoop-shim-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/hadoop-
shim-2.8-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/lib:/usr/hdp/3.1.4.0-315/t
ez/man:/usr/hdp/3.1.4.0-315/tez/tez-api-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315
/tez/tez-common-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/tez-dag-0.9.1.3.1.4
.0-315.jar:/usr/hdp/3.1.4.0-315/tez/tez-examples-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.
1.4.0-315/tez/tez-history-parser-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/te
z-javadoc-tools-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/tez-job-analyzer-0.
9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/tez-mapreduce-0.9.1.3.1.4.0-315.jar:/
usr/hdp/3.1.4.0-315/tez/tez-protobuf-history-plugin-0.9.1.3.1.4.0-315.jar:/usr/hdp
/3.1.4.0-315/tez/tez-runtime-internals-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/
tez/tez-runtime-library-0.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/tez-tests-0
.9.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/tez-yarn-timeline-cache-plugin-0.9.1
.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/tez-yarn-timeline-history-0.9.1.3.1.4.0-
315.jar:/usr/hdp/3.1.4.0-315/tez/tez-yarn-timeline-history-with-acls-0.9.1.3.1.4.0
-315.jar:/usr/hdp/3.1.4.0-315/tez/tez-yarn-timeline-history-with-fs-0.9.1.3.1.4.0-
315.jar:/usr/hdp/3.1.4.0-315/tez/ui:/usr/hdp/3.1.4.0-315/tez/lib/async-http-client
-1.9.40.jar:/usr/hdp/3.1.4.0-315/tez/lib/commons-cli-1.2.jar:/usr/hdp/3.1.4.0-315/
tez/lib/commons-codec-1.4.jar:/usr/hdp/3.1.4.0-315/tez/lib/commons-collections-3.2
.2.jar:/usr/hdp/3.1.4.0-315/tez/lib/commons-collections4-4.1.jar:/usr/hdp/3.1.4.0-
315/tez/lib/commons-io-2.4.jar:/usr/hdp/3.1.4.0-315/tez/lib/commons-lang-2.6.jar:/
usr/hdp/3.1.4.0-315/tez/lib/commons-math3-3.1.1.jar:/usr/hdp/3.1.4.0-315/tez/lib/g
cs-connector-1.9.10.3.1.4.0-315-shaded.jar:/usr/hdp/3.1.4.0-315/tez/lib/guava-28.0
-jre.jar:/usr/hdp/3.1.4.0-315/tez/lib/hadoop-aws-3.1.1.3.1.4.0-315.jar:/usr/hdp/3.
1.4.0-315/tez/lib/hadoop-azure-3.1.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/lib/
hadoop-azure-datalake-3.1.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/lib/hadoop-hd
fs-client-3.1.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/lib/hadoop-mapreduce-clie
nt-common-3.1.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/lib/hadoop-mapreduce-clie
nt-core-3.1.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/lib/hadoop-yarn-server-time
line-pluginstorage-3.1.1.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/tez/lib/jersey-clien
t-1.19.jar:/usr/hdp/3.1.4.0-315/tez/lib/jersey-json-1.19.jar:/usr/hdp/3.1.4.0-315/
tez/lib/jettison-1.3.4.jar:/usr/hdp/3.1.4.0-315/tez/lib/jetty-server-9.3.24.v20180
605.jar:/usr/hdp/3.1.4.0-315/tez/lib/jetty-util-9.3.24.v20180605.jar:/usr/hdp/3.1.
4.0-315/tez/lib/jsr305-3.0.0.jar:/usr/hdp/3.1.4.0-315/tez/lib/metrics-core-3.1.0.j
ar:/usr/hdp/3.1.4.0-315/tez/lib/protobuf-java-2.5.0.jar:/usr/hdp/3.1.4.0-315/tez/l
ib/RoaringBitmap-0.4.9.jar:/usr/hdp/3.1.4.0-315/tez/lib/servlet-api-2.5.jar:/usr/h
dp/3.1.4.0-315/tez/lib/slf4j-api-1.7.10.jar:/usr/hdp/3.1.4.0-315/tez/lib/tez.tar.g
z
5. Compile WordCount2.java with the Hadoop v2 API and this classpath:
javac -cp `hadoop classpath` WordCount2.java
Note
Notice the back quotation marks, also known as backticks, around `hadoop classpath`. A backtick is
not a quotation mark. It has a special meaning: command substitution. With command substitution,
the shell evaluates (runs) the command that is placed inside the backticks before it runs the main
command, and it supplies the output as an argument to that main command. In this example, the
shell runs hadoop classpath first and passes its output to javac as if you had typed that output at
that place in the command line.
Note
By default, the output of this command goes to your HDFS home directory, which is
/user/<username> in the environment for this course.
9. You are now ready to run the compiled program with the appropriate parameters. Review
the program logic and the use of these additional parameters in the WordCount V2.0 tutorial
at
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client
-core/MapReduceTutorial.html#Example:_WordCount_v2.0.
Run the following command in one line:
hadoop jar WC2.jar WordCount2 -D wordcount.case.sensitive=false Gutenberg/*.txt
wc2out -skip patternsToSkip
Important
LONG RUNNING command. If a single user submits this command, it usually completes in
approximately 25 seconds. In the current class environment, when tens of students submit jobs at
the same time, expect longer wait times. In our tests, the maximum wait time that was experienced
was 3 minutes. Be patient: do not close or break your session.
Note
• If the Gutenberg folder does not exist in your HDFS directory, run the steps in Exercise 3.
• You can run the same program with the following yarn command. But before you run it, clean
up the output directory wc2out by running hdfs dfs -rm -R wc2out.
yarn jar WC2.jar WordCount2 -D wordcount.case.sensitive=false Gutenberg/*.txt
wc2out -skip patternsToSkip
10. Notice how many mappers and reducers run for this job. They are highlighted in bold under
the Job Counters section of the command results.
11. List the generated files by running the command:
hdfs dfs -ls wc2out
The result is similar to the following output.
[student0000@dataengineer ~]$ hdfs dfs -ls wc2out
Found 2 items
-rw-r--r-- 3 student0000 hdfs 0 2020-09-22 01:46 wc2out/_SUCCESS
-rw-r--r-- 3 student0000 hdfs 161660 2020-09-22 01:46 wc2out/part-r-00000
12. Explore the results of the program, which are generated in the file part-r-00000, by running
the following command:
hdfs dfs -cat wc2out/part-r-00000 | more
Scroll through the file by pressing Enter and look at the output pages.
13. Enter q to quit.
14. Clean up the output directory wc2out by running the following command:
hdfs dfs -rm -R wc2out
The result is similar to the following output.
[student0000@dataengineer ~]$ hdfs dfs -rm -R wc2out
Deleted wc2out
End of exercise