
Hadoop Lab Notes

Nicola Tonellotto
November 15, 2010
Contents

1 Hadoop Setup
  1.1 Prerequisites
  1.2 Installation
  1.3 Verification

2 Word Count Exercise
1 Hadoop Setup
1.1 Prerequisites
1. GNU/Linux computer
2. Java 1.6 SDK installed
3. SSH must be installed and sshd must be running (see the quick check below)
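
As a quick sanity check for items 2 and 3 (a sketch; the exact output depends on your distribution):

↑Code
hadoop@localhost$ java -version    # should report a 1.6.x JVM
hadoop@localhost$ pgrep sshd       # prints at least one PID when sshd is running
↓Code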

1.2 Installation
1. Create the hadoop user account and log in as the hadoop user.
2. Download hadoop-0.20.2.tar.gz in your home dir.
3. Unpack the downloaded Hadoop distribution in your home dir.
4. Check that you can ssh to localhost without a passphrase:

↑Code
hadoop@localhost$ ssh localhost
↓Code

If you cannot ssh to localhost without a passphrase, execute the following commands:

↑Code
hadoop@localhost$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
hadoop@localhost$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
↓Code

5. Move to the Hadoop distribution dir:

↑Code
hadoop@localhost$ cd $HOME/hadoop-0.20.2
↓Code

6. Create the HADOOP_HOME environment variable:

↑Code
hadoop@localhost$ export HADOOP_HOME=`pwd`
↓Code

7. Edit the file $HADOOP_HOME/conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
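
For example (a sketch; the JAVA_HOME path below is an assumption and depends on where your JDK is installed):

↑Code
# in $HADOOP_HOME/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun
↓Code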
8. Try the following command:

↑Code
hadoop@localhost$ bin/hadoop
↓Code

This will display the usage documentation for the hadoop script.

1.3 Verification
1. By default, Hadoop is configured to run in a non-distributed mode (standalone mode), as a single Java process. This is useful for debugging.
2. The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

↑Code
hadoop@localhost$ mkdir input
hadoop@localhost$ cp conf/*.xml input
hadoop@localhost$ bin/hadoop jar hadoop-0.20.2-examples.jar \
    grep input output 'dfs[a-z.]+'
hadoop@localhost$ cat output/*
↓Code

3. Clean up:

↑Code
hadoop@localhost$ rm -rf input output
↓Code

4. Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
5. Edit the conf/core-site.xml file:

↑Code
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
↓Code

6. Edit the conf/hdfs-site.xml file:

↑Code
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
↓Code

7. Edit the conf/mapred-site.xml file:

↑Code
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
↓Code

8. Format a new distributed filesystem:

↑Code
hadoop@localhost$ bin/hadoop namenode -format
↓Code

9. Start the Hadoop daemons:

↑Code
hadoop@localhost$ bin/start-all.sh
↓Code
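
A quick way to check that the daemons came up is the JDK's jps tool; in pseudo-distributed mode you should see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker listed:

↑Code
hadoop@localhost$ jps
↓Code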

10. Browse the web interface for the NameNode and the JobTracker; by default they are available at:

• NameNode - http://localhost:50070
• JobTracker - http://localhost:50030
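
From the command line, you can also poke the two interfaces with curl (assuming curl is installed; any HTML response means the daemon is answering):

↑Code
hadoop@localhost$ curl -s http://localhost:50070/ | head
hadoop@localhost$ curl -s http://localhost:50030/ | head
↓Code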
11. Copy the input files into the distributed filesystem:

↑Code
hadoop@localhost$ bin/hadoop fs -put conf input
↓Code

12. Run some of the examples provided:

↑Code
hadoop@localhost$ bin/hadoop jar hadoop-*-examples.jar \
    grep input output 'dfs[a-z.]+'
↓Code

13. Copy the output files from the distributed filesystem to the local filesystem and examine them:

↑Code
hadoop@localhost$ bin/hadoop fs -get output output
hadoop@localhost$ cat output/*
↓Code

14. Clean up:

↑Code
hadoop@localhost$ rm -r output
hadoop@localhost$ bin/hadoop fs -rmr input output
↓Code

15. When you're done, stop the daemons with:

↑Code
hadoop@localhost$ bin/stop-all.sh
↓Code

2 Word Count Exercise


We will see a basic Hadoop implementation of the word count application. Create the following WordCount.java source file in the HADOOP_HOME dir.

↑Code
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: for every token in the input line, emit the pair (word, 1).
    public static class NewMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sum all the counts emitted for each word.
    public static class NewReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values)
                sum += val.get();
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(NewMapper.class);
        // The reducer doubles as a combiner: summing counts is associative and
        // commutative, so computing partial sums on the map side is safe.
        job.setCombinerClass(NewReducer.class);
        job.setReducerClass(NewReducer.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
↓Code

• Compile WordCount.java and create a jar file:

↑Code
hadoop@localhost$ cd ${HADOOP_HOME}
hadoop@localhost$ mkdir classes
hadoop@localhost$ javac -cp hadoop-0.20.2-core.jar -d classes WordCount.java
hadoop@localhost$ jar -cvf wordcount.jar -C classes/ .
↓Code

• Create the following sample files in your HOME dir:

↑Code
hadoop@localhost$ echo Hello World > file01
hadoop@localhost$ echo Hello Java > file02
hadoop@localhost$ echo Java and MapReduce > file03
↓Code

• Create the HDFS input dir:

↑Code
hadoop@localhost$ bin/hadoop fs -mkdir /user/hadoop/wordcount/input
↓Code

• Copy the sample files into HDFS:

↑Code
hadoop@localhost$ bin/hadoop fs -put file0? /user/hadoop/wordcount/input/
↓Code

• Check that the sample files have been copied:

↑Code
hadoop@localhost$ bin/hadoop fs -ls /user/hadoop/wordcount/input/
hadoop@localhost$ bin/hadoop fs -cat /user/hadoop/wordcount/input/file01
hadoop@localhost$ bin/hadoop fs -cat /user/hadoop/wordcount/input/file02
hadoop@localhost$ bin/hadoop fs -cat /user/hadoop/wordcount/input/file03
↓Code

• Run the application:

↑Code
hadoop@localhost$ bin/hadoop jar wordcount.jar WordCount \
    /user/hadoop/wordcount/input /user/hadoop/wordcount/output
↓Code

• Check the output:

↑Code
hadoop@localhost$ bin/hadoop fs -cat /user/hadoop/wordcount/output/part-r-00000
↓Code
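
With the three sample files above, the output should look like the following (keys are sorted in byte order, and each line holds a word and its count; note that the new MapReduce API names its reduce output files part-r-*):

↑Code
Hello      2
Java       2
MapReduce  1
World      1
and        1
↓Code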

• Clean up:

↑Code
hadoop@localhost$ rm -r WordCount.java wordcount.jar classes/ file0?
hadoop@localhost$ bin/hadoop fs -rmr /user/hadoop/wordcount
↓Code
