Tutorial MapReduce

The document describes running a MapReduce word count job on Hadoop. It involves copying text files from the local file system to HDFS, running the MapReduce job which counts word occurrences, and checking the output stored in HDFS. It also describes the web interfaces for viewing job tracking, task tracking and HDFS information.



Running a MapReduce job


We will now run our first Hadoop MapReduce job. We will use the WordCount example job, which reads text files and counts how often each word occurs.
The input is a set of text files and the output is again a set of text files; each output line contains a word and the number of times it occurred, separated by a tab.
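For intuition, the same counting idea can be sketched on a single machine with standard Unix tools (this is only a rough local approximation, not the Hadoop job itself, and uniq -c prints the count before the word rather than word<TAB>count):

$ tr -s '[:space:]' '\n' < /mnt/hgfs/Hadoopsw/pg20417.txt | sort | uniq -c | sort -nr | head

The Hadoop job computes the same kind of result, but distributes the work across map and reduce tasks and writes its output to HDFS.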

Copy input data


$ ls -l /mnt/hgfs/Hadoopsw
total 3604
-rw-r--r-- 1 hduser hadoop  674566 Feb  3 10:17 pg20417.txt
-rw-r--r-- 1 hduser hadoop 1573112 Feb  3 10:18 pg4300.txt
-rw-r--r-- 1 hduser hadoop 1423801 Feb  3 10:18 pg5000.txt

Restart the Hadoop cluster


Restart your Hadoop cluster if it is not already running.
# bin/start-all.sh
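To confirm that the daemons came up, you can list the running Java processes with jps. On a single-node Hadoop 1.x setup you would typically expect to see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:

# jps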



Copy local example data to HDFS
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS.
# bin/hadoop fs -mkdir /user/root
# bin/hadoop fs -mkdir /user/root/in
# bin/hadoop dfs -copyFromLocal /mnt/hgfs/Hadoopsw/*.txt /user/root/in
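Before starting the job you can verify that the files actually arrived in HDFS, for example:

# bin/hadoop dfs -ls /user/root/in

The listing should show the three pg*.txt files with the same sizes as in the local directory.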

Run the MapReduce job


Now, we actually run the WordCount example job.
# cd $HADOOP_HOME
# bin/hadoop jar hadoop-examples-1.0.0.jar wordcount /user/root/in /user/root/out

This command will read all the files in the HDFS directory /user/root/in, process them, and store the result in the HDFS directory /user/root/out.
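Note that the job expects the output directory not to exist yet. If /user/root/out is left over from a previous run, the job will fail at startup; in that case remove it first, e.g.:

# bin/hadoop dfs -rmr /user/root/out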


Check whether the result was successfully stored in HDFS. First list /user/root, then the output directory /user/root/out:

# bin/hadoop dfs -ls /user/root



$ bin/hadoop dfs -ls /user/root/out



Retrieve the job result from HDFS
To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the command
# bin/hadoop dfs -cat /user/root/out/part-r-00000
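Because the word list is long, it can be convenient to pipe the output through standard Unix tools, for example to look only at the first lines or at the most frequent words (the second variant assumes the usual word<TAB>count format of the reducer output):

# bin/hadoop dfs -cat /user/root/out/part-r-00000 | head -20
# bin/hadoop dfs -cat /user/root/out/part-r-00000 | sort -k2 -nr | head -10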



Copy the output to a local file.
$ mkdir /tmp/hadoop-output
# bin/hadoop dfs -getmerge /user/root/out/ /tmp/hadoop-output
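You can then inspect the merged result with ordinary local tools. The exact file name produced by getmerge depends on the Hadoop version, so list the target directory first (the paths below simply reuse the ones from this example):

$ ls -l /tmp/hadoop-output
$ head /tmp/hadoop-output/*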



Hadoop Web Interfaces
Hadoop comes with several web interfaces which are available by default (see conf/hadoop-default.xml) at these locations:


http://localhost:50030/ - web UI for the MapReduce job tracker(s)
http://localhost:50060/ - web UI for the task tracker(s)
http://localhost:50070/ - web UI for the HDFS name node(s)

These web interfaces provide concise information about what is happening in your Hadoop cluster. You might want to give them a try.
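If you prefer the shell, a quick way to check that the interfaces are reachable is to request each page with curl and print only the HTTP status code (this checks nothing beyond the endpoints answering):

$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50060/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/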

MapReduce Job Tracker Web Interface


The job tracker web UI provides information about general job statistics of the Hadoop cluster, running/completed/failed jobs, and a job history log file. It also gives access to the local machine's Hadoop log files (the machine on which the web UI is running).
By default, it is available at http://localhost:50030/.


A screenshot of Hadoop's Job Tracker web interface.



Task Tracker Web Interface
The task tracker web UI shows you running and non-running tasks. It also gives access to the local machine's Hadoop log files.
By default, it is available at http://localhost:50060/.

A screenshot of Hadoop's Task Tracker web interface.



HDFS Name Node Web Interface
The name node web UI shows you a cluster summary including information about total/remaining capacity, and live and dead nodes. Additionally, it allows you to browse the HDFS namespace and view the contents of its files in the web browser. It also gives access to the local machine's Hadoop log files.
By default, it is available at http://localhost:50070/.


A screenshot of Hadoop's Name Node web interface.
