Tutorial MapReduce

The document describes running a MapReduce word count job on Hadoop. It involves copying text files from the local file system to HDFS, running the MapReduce job which counts word occurrences, and checking the output stored in HDFS. It also describes the web interfaces for viewing job tracking, task tracking and HDFS information.



Running a MapReduce job


We will now run our first Hadoop MapReduce job. We will use the WordCount example job, which reads text files and counts how often each word occurs.
The input is a set of text files and the output is again a set of text files; each output line contains a word and the number of times it occurred, separated by a tab.
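For intuition, the same counting idea can be sketched on a single machine with standard Unix tools (this is only a rough local approximation, not the Hadoop job itself, and uniq -c prints the count before the word rather than word<TAB>count):

$ tr -s '[:space:]' '\n' < /mnt/hgfs/Hadoopsw/pg20417.txt | sort | uniq -c | sort -nr | head

The Hadoop job computes the same kind of result, but distributes the work across map and reduce tasks and writes its output to HDFS.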

Copy input data


$ ls -l /mnt/hgfs/Hadoopsw
total 3604
-rw-r--r-- 1 hduser hadoop  674566 Feb  3 10:17 pg20417.txt
-rw-r--r-- 1 hduser hadoop 1573112 Feb  3 10:18 pg4300.txt
-rw-r--r-- 1 hduser hadoop 1423801 Feb  3 10:18 pg5000.txt

Restart the Hadoop cluster


Restart your Hadoop cluster if it is not already running.
# bin/start-all.sh
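To confirm that the daemons came up, you can list the running Java processes with jps. On a single-node Hadoop 1.x setup you would typically expect to see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:

# jps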



Copy local example data to HDFS
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS.
# bin/hadoop fs -mkdir /user/root
# bin/hadoop fs -mkdir /user/root/in
# bin/hadoop dfs -copyFromLocal /mnt/hgfs/Hadoopsw/*.txt /user/root/in
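Before starting the job you can verify that the files actually arrived in HDFS, for example:

# bin/hadoop dfs -ls /user/root/in

The listing should show the three pg*.txt files with the same sizes as in the local directory.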

Run the MapReduce job


Now, we actually run the WordCount example job.
# cd $HADOOP_HOME
# bin/hadoop jar hadoop-examples-1.0.0.jar wordcount /user/root/in /user/root/out

This command will read all the files in the HDFS directory /user/root/in, process them, and store the result in the HDFS directory /user/root/out.
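Note that the job expects the output directory not to exist yet. If /user/root/out is left over from a previous run, the job will fail at startup; in that case remove it first, e.g.:

# bin/hadoop dfs -rmr /user/root/out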


Check whether the result was successfully stored in HDFS. First list /user/root, then the output directory /user/root/out:

# bin/hadoop dfs -ls /user/root



$ bin/hadoop dfs -ls /user/root/out



Retrieve the job result from HDFS
To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the command
# bin/hadoop dfs -cat /user/root/out/part-r-00000
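Because the word list is long, it can be convenient to pipe the output through standard Unix tools, for example to look only at the first lines or at the most frequent words (the second variant assumes the usual word<TAB>count format of the reducer output):

# bin/hadoop dfs -cat /user/root/out/part-r-00000 | head -20
# bin/hadoop dfs -cat /user/root/out/part-r-00000 | sort -k2 -nr | head -10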



Copy the output to a local file.
$ mkdir /tmp/hadoop-output
# bin/hadoop dfs -getmerge /user/root/out/ /tmp/hadoop-output
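You can then inspect the merged result with ordinary local tools. The exact file name produced by getmerge depends on the Hadoop version, so list the target directory first (the paths below simply reuse the ones from this example):

$ ls -l /tmp/hadoop-output
$ head /tmp/hadoop-output/*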



Hadoop Web Interfaces
Hadoop comes with several web interfaces which are available by default (see conf/hadoop-default.xml) at these locations:


http://localhost:50030/ - web UI for the MapReduce job tracker(s)
http://localhost:50060/ - web UI for the task tracker(s)
http://localhost:50070/ - web UI for the HDFS name node(s)

These web interfaces provide concise information about what is happening in your Hadoop cluster. You might want to give them a try.
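If you prefer the shell, a quick way to check that the interfaces are reachable is to request each page with curl and print only the HTTP status code (this checks nothing beyond the endpoints answering):

$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50060/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/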

MapReduce Job Tracker Web Interface


The job tracker web UI provides information about general job statistics of the Hadoop cluster, running/completed/failed jobs, and a job history log file. It also gives access to the local machine's Hadoop log files (the machine on which the web UI is running).
By default, it is available at http://localhost:50030/.


A screenshot of Hadoop's Job Tracker web interface.



Task Tracker Web Interface
The task tracker web UI shows you running and non-running tasks. It also gives access to the local machine's Hadoop log files.
By default, it is available at http://localhost:50060/.

A screenshot of Hadoop's Task Tracker web interface.



HDFS Name Node Web Interface
The name node web UI shows you a cluster summary including information about total/remaining capacity, and live and dead nodes. Additionally, it allows you to browse the HDFS namespace and view the contents of its files in the web browser. It also gives access to the local machine's Hadoop log files.
By default, it is available at http://localhost:50070/.


A screenshot of Hadoop's Name Node web interface.
