Run The WordCount Program Instructions

This document provides instructions for executing the WordCount application in Hadoop to count the frequency of words in a file. It describes running WordCount on the complete works of Shakespeare stored in HDFS, copying the results out of HDFS to the local file system, and viewing the results, which list each word and its count. The steps are: start the VM, see example MapReduce programs, verify the input file exists in HDFS, examine the WordCount arguments, run WordCount on the input file, view the output directory in HDFS, look inside the output directory, copy the results to the local file system, and view the WordCount results.

Uploaded by

Varsha Chotalia

Module: Big Data Technology

NMIMS University
Prof. Sarada Samantaray

Learning Goals
By the end of this activity, you will be able to:

• Execute the WordCount application.

• Copy the results from WordCount out of HDFS.

1. Open a terminal shell. Start the Cloudera VM in VirtualBox, if not already running, and open a terminal
shell. Detailed instructions for these steps can be found in the previous Readings.
2. See example MapReduce programs. Hadoop comes with several example MapReduce applications.
You can see a list of them by running hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar.
We are interested in running WordCount.

The output says that WordCount takes the name of one or more input files and the name of the output
directory. Note that these files are in HDFS, not the local file system.
3. Verify input file exists. In the previous Reading, we downloaded the complete works of Shakespeare and
copied them into HDFS as words.txt. Let's make sure this file is still in HDFS so we can run WordCount on it.
Run hadoop fs -ls

4. See WordCount command line arguments. We can learn how to run WordCount by examining its
command-line arguments. Run hadoop jar /usr/jars/hadoop-examples.jar wordcount with no further
arguments to print its usage message.

5. Run WordCount. Run WordCount for words.txt: hadoop jar /usr/jars/hadoop-examples.jar wordcount
words.txt out

As WordCount executes, Hadoop prints the progress of the Map and Reduce phases. When WordCount is
complete, both will show 100%.
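
To make the Map and Reduce phases concrete, here is a minimal Python sketch of what a word count job computes. It is not Hadoop's implementation (the bundled example is a Java program that runs across the cluster); the function names map_phase, shuffle, reduce_phase and word_count are made up for illustration, and the sample sentences are invented.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs):
    """Group the emitted values by word, as happens between Map and Reduce."""
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_phase(groups):
    """Sum the values for each word, visiting the words in sorted order."""
    return {word: sum(values) for word, values in sorted(groups.items())}


def word_count(lines):
    """Chain the three phases in a single process (illustration only)."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Invented sample input; in the activity the input is words.txt in HDFS.
    sample = ["to be or not to be", "To be"]
    for word, count in word_count(sample).items():
        print(f"{word}\t{count}")
```

Because tokens are compared exactly, "To" and "to" are counted as different words in this sketch.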

6. See WordCount output directory. Once WordCount is finished, let's verify the output was created. First, let's
see that the output directory, out, was created in HDFS by running hadoop fs -ls

We can see there are now two items in HDFS: words.txt is the text file that we previously created, and out is
the directory created by WordCount.

7. Look inside output directory. The directory created by WordCount contains several files. Look inside the
directory by running hadoop fs -ls out
The file part-r-00000 contains the results from WordCount. The file _SUCCESS means WordCount executed
successfully.

8. Copy WordCount results to local file system. Copy part-r-00000 to the local file system by running
hadoop fs -copyToLocal out/part-r-00000 local.txt
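
If you repeat this activity often, steps 5 to 8 can be scripted. The following is a rough sketch, assuming Python is available inside the VM; build_commands and run_all are hypothetical helper names, and the jar path and file names are simply the ones used in this activity. When hadoop is not on the PATH, the script only prints the commands instead of running them.

```python
import shutil
import subprocess

# Jar path used in steps 4 and 5 of this activity; adjust for your setup.
EXAMPLES_JAR = "/usr/jars/hadoop-examples.jar"


def build_commands(input_file="words.txt", output_dir="out", local_file="local.txt"):
    """Return the command lines for steps 5 to 8 as argument lists."""
    return [
        ["hadoop", "jar", EXAMPLES_JAR, "wordcount", input_file, output_dir],
        ["hadoop", "fs", "-ls"],
        ["hadoop", "fs", "-ls", output_dir],
        ["hadoop", "fs", "-copyToLocal", f"{output_dir}/part-r-00000", local_file],
    ]


def run_all(commands):
    """Print each command; run it only if the hadoop executable is found.

    Returns True if the commands were executed, False for a dry run.
    """
    have_hadoop = shutil.which("hadoop") is not None
    for argv in commands:
        print("$", " ".join(argv))
        if have_hadoop:
            # check=True stops the script at the first failing command.
            subprocess.run(argv, check=True)
    return have_hadoop


if __name__ == "__main__":
    run_all(build_commands())
```

Note that the job refuses to write into an output directory that already exists, so a second run needs a different output name or the old out directory removed first.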

9. View the WordCount results. View the contents of the results: more local.txt

Each line of the results file shows the number of occurrences for a word in the input file. For example, Accuse
appears four times in the input, but Accusing appears only once.
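
Paging through the results with more shows the words in sorted order, not by frequency. As a small sketch, the results can be loaded into Python and ranked by count; this assumes each line holds a word and its count separated by a tab, and parse_counts and top_words are hypothetical helper names. The sample lines reuse the two counts quoted above.

```python
def parse_counts(lines):
    """Parse 'word<TAB>count' lines into a dict of word -> count."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so the count is always the final field.
        word, _, count = line.rpartition("\t")
        counts[word] = int(count)
    return counts


def top_words(counts, n=10):
    """Return the n most frequent words; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Stand-in for local.txt. To read the real file instead:
    #     with open("local.txt") as f:
    #         counts = parse_counts(f)
    sample = ["Accuse\t4\n", "Accusing\t1\n"]
    for word, count in top_words(parse_counts(sample)):
        print(f"{count:>6}  {word}")
```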
