Palak

This document contains a summary of key concepts in Hadoop MapReduce and examples of code to implement common tasks. [1] It explains the mapper, reducer, and driver code components of a MapReduce program using a word count example. The mapper parses input and emits counts, the reducer sums counts by word, and the driver runs the job. [2] File management tasks like creating directories, uploading/downloading files, copying/moving files, and removing files or directories are demonstrated using Hadoop fs commands. [3] Modes for running Hadoop programs are described, from standalone on one machine to pseudo-distributed and fully-distributed across clusters.

Uploaded by

Dolly Mehra

GURU TEGH BAHADUR INSTITUTE OF TECHNOLOGY

BIG DATA ANALYTICS PRACTICAL FILE

Submitted By- Palak

Class - IT-2 (8th Semester)

Enrollment No. – 20813203119
Program 1: How to Use Hadoop Cluster
Solution

There are three modes in which you can get hands-on experience with Hadoop.

Standalone mode:

In this mode you need an IDE such as Eclipse and the Hadoop library files (which you can
download from the Apache website). You can write your MapReduce program and run it
on your local machine. This lets you check the logic of the code and catch syntax errors
using some sample data, but you will not get the full Hadoop experience.

Pseudo-distributed mode:

In this mode all the Hadoop daemons run on a single machine. You can get a VM from
Cloudera or Hortonworks that is plug and play, with all the necessary tools installed and
configured. In this mode you can scale up your data to check how your code performs
and optimize it to get the job done in the required time.

Fully-distributed mode:

In this mode the daemons run on different machines. This is mostly used in the
production stage of a project: once you have verified your code, you can deploy it in
this mode.

To start practicing Hadoop code, install Eclipse on your PC, download the libraries, and
start coding.

File Management tasks in Hadoop

Program 2. Create a directory in HDFS at given path(s).

Usage:

hadoop fs -mkdir <paths>

Example:

hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2

Program 3. Upload and download a file in HDFS.

Upload: hadoop fs -put

Copies a single src file, or multiple src files, from the local file system to the Hadoop
Distributed File System (HDFS).
Usage:
hadoop fs -put <localsrc> ... <HDFS_dest_Path>
Example:

hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/dir3/

Download: hadoop fs -get

Copies/downloads files from HDFS to the local file system.
Usage:
hadoop fs -get <hdfs_src> <localdst>
Example:
hadoop fs -get /user/saurzcode/dir3/Samplefile.txt /home/

Program 4. See contents of a file (same as the Unix cat command).

Usage: hadoop fs -cat <path>
Example: hadoop fs -cat /user/saurzcode/dir1/abc.txt

Program 5. Copy a file from source to destination

This command also allows multiple sources, in which case the destination must be a
directory.
Usage:
hadoop fs -cp <source> <dest>
Example:

hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

Program 6. Remove a file or directory in HDFS.
Removes the files specified as arguments. Deletes a directory only when it is empty.
Usage: hadoop fs -rm <arg>
Example: hadoop fs -rm /user/saurzcode/dir1/abc.txt

Recursive version of delete (deprecated in newer Hadoop releases in favor of hadoop fs -rm -r).

Usage: hadoop fs -rmr <arg>
Example: hadoop fs -rmr /user/saurzcode/

Program 7. Copy a file from source to destination

This command also allows multiple sources, in which case the destination must be a
directory.
Usage:
hadoop fs -cp <source> <dest>
Example:
hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

Program 8. Move a file from source to destination.

Note: Moving files across filesystems is not permitted.

Usage:
hadoop fs -mv <src> <dest>
Example:
hadoop fs -mv /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

Program 9. Display the aggregate length of a file.

Usage:
hadoop fs -du <path>
Example:
hadoop fs -du /user/saurzcode/dir1/abc.txt

Program 10. Implement a Word Count MapReduce program to understand the MapReduce paradigm.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.fs.Path;

public class WordCount
{
    // Mapper: emits (word, 1) for every token in each input line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException
        {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens())
            {
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }
    // Reducer: sums the counts emitted for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable x : values)
            {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "My Word Count Program");

        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);

        // Configuring the input/output path from the filesystem into the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);

        // Deleting the output path from HDFS up front so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath, true);

        // Exit with status 0 if the job completes successfully, 1 otherwise
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The entire MapReduce program can be fundamentally divided into three parts:
• Mapper Phase Code

• Reducer Phase Code

• Driver Code
We will understand the code for each of these three parts sequentially.
Mapper code:
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>
{
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens())
        {
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
}
• We have created a class Map that extends the class Mapper, which is already defined in the
MapReduce framework.
• We define the data types of the input and output key/value pairs after the class declaration
using angle brackets.
• Both the input and the output of the Mapper are key/value pairs.
• Input:
◦ The key is the byte offset of each line in the text file: LongWritable
◦ The value is each individual line (as shown in the figure at the right): Text
• Output:
◦ The key is the tokenized word: Text
◦ The value is hardcoded to 1 in our case: IntWritable
◦ Example – Dear 1, Bear 1, etc.
• We have written Java code that tokenizes each line into words and assigns each word a
hardcoded value of 1.
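The map step described above can be tried without a cluster. The following is a minimal plain-Java sketch, not the Hadoop API: the class MapStepSketch, the helper mapLine, and the sample line "Dear Bear River" are assumptions made for illustration. It returns the (word, 1) pairs instead of writing them to a Hadoop Context.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java sketch of the map step (no Hadoop classes involved).
public class MapStepSketch {

    // Hypothetical helper: mirrors map(), but returns the pairs
    // instead of writing them to a Hadoop Context.
    public static List<Map.Entry<String, Integer>> mapLine(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Every token is emitted with the hardcoded count 1.
            out.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The offset key is not used by the word count logic, as in the real mapper.
        for (Map.Entry<String, Integer> pair : mapLine(0L, "Dear Bear River")) {
            System.out.println(pair.getKey() + " " + pair.getValue());
        }
    }
}
```

Running main prints one pair per line: Dear 1, Bear 1, River 1.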

Reducer Code:
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable x : values)
        {
            sum += x.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
• We have created a class Reduce that extends the class Reducer, just as Map extends Mapper.
• We define the data types of the input and output key/value pairs after the class declaration
using angle brackets, as done for the Mapper.
• Both the input and the output of the Reducer are key/value pairs.
• Input:
◦ The key is one of the unique words generated after the sorting and shuffling phase: Text
◦ The value is a list of integers corresponding to each key: IntWritable
◦ Example – Bear, [1, 1], etc.
• Output:
◦ The key is each unique word present in the input text file: Text
◦ The value is the number of occurrences of each unique word: IntWritable
◦ Example – Bear, 2; Car, 3, etc.
• We have aggregated the values present in the list corresponding to each key and
produced the final answer.
• In general, the reduce() method is called once for each unique word; the number of
reducer tasks can be specified in mapred-site.xml.
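The grouping and summing described above can be sketched in plain Java. This is an illustration under stated assumptions, not Hadoop's implementation: the class ReduceStepSketch and its methods are hypothetical names, a TreeMap stands in for the sort and shuffle phase, and the sample words are made up to match the Bear, 2; Car, 3 example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of shuffle/sort followed by reduce (no Hadoop classes).
public class ReduceStepSketch {

    // Stand-in for sort and shuffle: groups the mapper's 1s by word, sorted by key.
    public static Map<String, List<Integer>> shuffle(List<String> mappedWords) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : mappedWords) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Mirrors reduce(): called once per key, sums that key's list of counts.
    public static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int x : values) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed mapper output keys; each carries the value 1.
        List<String> mapped = Arrays.asList(
                "Dear", "Bear", "River", "Car", "Car", "River", "Dear", "Car", "Bear");
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            System.out.println(e.getKey() + ", " + reduce(e.getKey(), e.getValue()));
        }
    }
}
```

With this sample input, shuffle yields Bear -> [1, 1] and Car -> [1, 1, 1], and reduce turns them into Bear, 2 and Car, 3.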
Driver Code:
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "My Word Count Program");

job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path outputPath = new Path(args[1]);

// Configuring the input/output path from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, outputPath);

• In the driver class, we set the configuration of our MapReduce job to run in Hadoop.
• We specify the name of the job and the data types of the input/output of the mapper and reducer.
• We also specify the names of the mapper and reducer classes.
• The paths of the input and output folders are also specified.
• The method setInputFormatClass() specifies how a Mapper reads the input data, i.e. what
the unit of work is. Here, we have chosen TextInputFormat so that the mapper reads a
single line at a time from the input text file.
• The main() method is the entry point for the driver. In this method, we instantiate a new
Configuration object for the job.
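To make the "one line at a time" unit of work concrete, here is a plain-Java sketch of how a text can be split into (byte offset, line) records like the ones the mapper receives. It is only an approximation of what TextInputFormat does: the class LineRecordSketch and the method toRecords are hypothetical, and the sketch assumes UTF-8 text with '\n' line endings held in a single string.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of line records keyed by byte offset (no Hadoop classes).
public class LineRecordSketch {

    // Splits text into (byte offset, line) records.
    // Assumes '\n' line endings and UTF-8 encoding.
    public static Map<Long, String> toRecords(String text) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            records.put(offset, line);
            // Advance by the line's byte length plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Assumed sample input; each record is what one map() call would see.
        String text = "Dear Bear River\nCar Car River\nDear Car Bear\n";
        toRecords(text).forEach((offset, line) -> System.out.println(offset + "\t" + line));
    }
}
```

For the sample text, the three records start at byte offsets 0, 16, and 30, which correspond to the LongWritable keys passed to map().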
Run the MapReduce code:
The command for running the MapReduce code is:
hadoop jar hadoop-mapreduce-example.jar WordCount /sample/input /sample/output
