
L4A - Running Hadoop Application with MapReduce Program

Objective:
To explore how to execute a Hadoop application based on a MapReduce program
To explore how to change the number of reducers when running a Hadoop application

Exercise 1

Application objective:
To count the frequency of each word in a file where the size of the file is less than 128 MB

Sample dataset: Download samplefile.txt (size: 28.6 KB)

The steps:

1) transfer or put the input file into HDFS
2) execute the command
3) check the results

Execute the command:

1) Execute the following command via SSH (assuming the txt file is not in a directory):

$hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount samplefile.txt countfromfile

Note: if the file is in a directory, such as "input", then you need to specify "input/samplefile.txt" as the input path.

2) The MapReduce program is executed and you should see output as follows:

Note: observe the number of mappers and reducers executed by default:

mapper = 1
reducer = 2
Check the output (via HUE):

1) You should see an output folder created named countfromfile.

2) Click on the folder to view its contents.

3) Click on one of the files.

Question: What is the function of the wordcount application?
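As a hint for the question above, the wordcount job's map, shuffle, and reduce phases can be roughly imitated locally with standard Unix tools (a sketch only — the real job runs JVM mappers and reducers over HDFS; the demo file and its contents are made up for illustration):

```shell
# Rough local imitation of wordcount:
# tr acts like the mapper (split each line into words),
# sort like the shuffle phase (group identical keys together),
# uniq -c like the reducer (sum the occurrences of each word).
printf 'big data\nbig hadoop\n' > demo.txt
tr -s ' \t' '\n' < demo.txt | sort | uniq -c
# prints a count next to each distinct word, e.g. "2 big"
```

The analogy also shows why the output is sorted by word: the shuffle phase sorts keys before they reach the reducers.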

Exercise 2

Application objective:
To count the frequency of each word in a file (.csv) where the size of the file is greater than 128 MB

Sample dataset: Download Selected_Task_sample.csv (size: 188.7 MB)

The steps:

1) transfer or put the input file into HDFS
2) execute the command
3) check the results

Execute the command:

1) Execute the following command via SSH:

$hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount Selected_Task_sample.csv countfromSample

2) The MapReduce program is executed and you should see output as follows:

Note: observe the number of mappers and reducers executed by default:

mapper = 2
reducer = 2

(The input file is 188.7 MB, larger than the default 128 MB block size, so it is divided into two input splits and therefore two mappers.)
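The jump from one mapper to two follows from the input size: Hadoop creates roughly one map task per input split, and the default split size equals the 128 MB HDFS block size. A quick sanity check of that arithmetic:

```shell
# Number of input splits = ceil(file size / block size);
# wordcount runs one mapper per split.
FILE_MB=189    # 188.7 MB rounded up to whole megabytes
BLOCK_MB=128   # default HDFS block size
SPLITS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
echo "splits (mappers): $SPLITS"
# prints: splits (mappers): 2
```

By the same arithmetic, the 28.6 KB file in Exercise 1 fits in a single block, hence the single mapper observed there.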

Check the output (via HUE):

1) You should see an output folder created named countfromSample.

2) Click on the folder to view its contents.

3) Click on one of the files to check the output.

Exercise 3

Application objective:
To count the frequency of each word in a file (.csv) where the size of the file is greater than 128 MB

Sample dataset: Download Selected_Task_sample.csv (size: 188.7 MB)

The steps:

1) transfer or put the input file into HDFS
2) execute the command with an additional setting to change the default number of reducers
3) check the results

Execute the command:

1) Execute the following command via SSH (assuming the csv file is in a directory named input2):

$hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount -D mapred.reduce.tasks=1 input2/Selected_Task_sample.csv countfromSample2

Note: mapred.reduce.tasks is the older property name; on newer Hadoop versions the equivalent property is mapreduce.job.reduces.
2) The MapReduce program is executed and you should see output as follows:

Note: observe the number of mappers and reducers executed (the -D setting overrides the default reducer count):

mapper = 2
reducer = 1
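With a single reducer there is only one output partition, so all word counts land in a single part-r-00000 file. Conceptually, Hadoop's default HashPartitioner sends each key to partition hash(key) mod numReduceTasks. A shell sketch of that idea, using cksum as a stand-in for the real hashCode() and made-up sample words:

```shell
# Every key maps to partition 0 when there is one reducer,
# because any value mod 1 is 0.
NUM_REDUCERS=1
for word in big data hadoop; do
  h=$(printf '%s' "$word" | cksum | cut -d' ' -f1)   # stand-in hash
  echo "$word -> partition $(( h % NUM_REDUCERS ))"  # always partition 0
done
```

With reducer = 2 (Exercises 1 and 2), keys are spread across two partitions, which is why those jobs produce two part-r-* output files.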

Check the output (via HUE):

1) You should see an output folder created named countfromSample2.

2) Click on the folder to view its contents.

3) Click on the file.
