
Department of Computer Engineering Subject : DSBDAL

----------------------------------------------------------------------------------------------------------------

Group B
Assignment No: 2
----------------------------------------------------------------------------------------------------------------
Theory:
● Steps to install Hadoop for a distributed environment
● Java code to process a log file of a system

Steps to Install Hadoop for a Distributed Environment:


Initially, create a folder named logfiles1 on the desktop. In that folder, store the input file
(access_log_short.csv) along with the SalesMapper.java, SalesCountryReducer.java, and
SalesCountryDriver.java files.

Step 1) Go to the Hadoop home directory and format the NameNode.

cd hadoop-2.7.3

bin/hadoop namenode -format

Step 2) Once the NameNode is formatted, go to the hadoop-2.7.3/sbin directory and start all the
daemons/nodes.

cd hadoop-2.7.3/sbin

1) Start NameNode:

The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files
stored in HDFS and tracks all the files stored across the cluster.

./hadoop-daemon.sh start namenode

2) Start DataNode:

On startup, a DataNode connects to the NameNode and responds to requests from the
NameNode for different operations.


./hadoop-daemon.sh start datanode

3) Start ResourceManager:

ResourceManager is the master that arbitrates all the available cluster resources and thus helps
manage the distributed applications running on the YARN system. It manages each
NodeManager and each application's ApplicationMaster.

./yarn-daemon.sh start resourcemanager

4) Start NodeManager:

The NodeManager is the per-machine agent responsible for managing containers, monitoring
their resource usage, and reporting the same to the ResourceManager.

./yarn-daemon.sh start nodemanager

5) Start JobHistoryServer:

JobHistoryServer is responsible for servicing all job-history-related requests from clients.

./mr-jobhistory-daemon.sh start historyserver

Step 3) To check that all the Hadoop services are up and running, run the following command.

jps
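
If every daemon started correctly, jps should list all five Hadoop processes plus jps itself. The
process IDs will differ on every machine, so the listing below is only an illustration, not actual
output:

2865 NameNode
3012 DataNode
3350 ResourceManager
3601 NodeManager
3842 JobHistoryServer
3901 Jps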

Step 4) cd

Step 5) sudo mkdir mapreduce_vijay

Step 6) sudo chmod 777 -R mapreduce_vijay/

Step 7) sudo chown -R vijay mapreduce_vijay/

Step 8) sudo cp /home/vijay/Desktop/logfiles1/* ~/mapreduce_vijay/

Step 9) cd mapreduce_vijay/

Step 10) ls


Step 11) sudo chmod +r *.*

Step 12) export CLASSPATH="/home/vijay/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar:/home/vijay/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.3.jar:/home/vijay/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar:~/mapreduce_vijay/SalesCountry/*:$HADOOP_HOME/lib/*"

Step 13) javac -d . SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java

Step 14) ls

Step 15) cd SalesCountry/

Step 16) ls (check whether the .class files were created)

Step 17) cd ..

Step 18) gedit Manifest.txt

(add the following line to it:
Main-Class: SalesCountry.SalesCountryDriver)

Step 19) jar -cfm mapreduce_vijay.jar Manifest.txt SalesCountry/*.class

Step 20) ls

Step 21) cd

Step 22) cd mapreduce_vijay/

Step 23) sudo mkdir /input200

Step 24) sudo cp access_log_short.csv /input200

Step 25) $HADOOP_HOME/bin/hdfs dfs -put /input200 /

Step 26) $HADOOP_HOME/bin/hadoop jar mapreduce_vijay.jar /input200 /output200

Step 27) hadoop fs -ls /output200


Step 28) hadoop fs -cat /output200/part-00000

Step 29) Now open a web browser (e.g., Mozilla Firefox) and go to localhost:50070/dfshealth.html
to check the NameNode web interface.

Java Code to Process a Log File


Mapper Class:

package SalesCountry;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String valueString = value.toString();
        // Split the record on '-' and emit the first field with a count of 1
        String[] SingleCountryData = valueString.split("-");
        output.collect(new Text(SingleCountryData[0]), one);
    }
}
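
To see what the mapper emits for a log record, here is a minimal standalone sketch of the same
split logic. The sample line is a hypothetical record in the common Apache access-log format, not
a line taken from access_log_short.csv itself; everything before the first '-' (the client IP plus a
trailing space) becomes the map output key:

public class SplitDemo {
    public static void main(String[] args) {
        // Hypothetical Apache access-log line (an assumption for illustration)
        String line = "10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] \"GET / HTTP/1.1\" 403 202";
        String[] fields = line.split("-");
        // Prints "10.223.157.186 " -- the text before the first '-'
        System.out.println(fields[0]);
    }
}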
Reducer Class:

package SalesCountry;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesCountryReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text t_key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        Text key = t_key;
        int frequencyForCountry = 0;
        while (values.hasNext()) {
            // Sum the counts emitted by the mapper for this key
            IntWritable value = (IntWritable) values.next();
            frequencyForCountry += value.get();
        }
        output.collect(key, new IntWritable(frequencyForCountry));
    }
}
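
For intuition, the reduce step behaves like the plain-Java sketch below (a simplified stand-in
using ordinary collections, not the Hadoop API): given the key "Pune" with the values [1, 1]
emitted by the mapper, it sums them and writes ("Pune", 2).

public class ReduceDemo {
    public static void main(String[] args) {
        // Values grouped under one key, as the framework would deliver them
        java.util.Iterator<Integer> values = java.util.Arrays.asList(1, 1).iterator();
        int frequency = 0;
        while (values.hasNext()) {
            frequency += values.next();
        }
        // Prints "Pune" and its total count, mirroring the reducer's output
        System.out.println("Pune\t" + frequency);
    }
}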
Driver Class:

package SalesCountry;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class SalesCountryDriver {

    public static void main(String[] args) {
        JobClient my_client = new JobClient();
        // Create a configuration object for the job
        JobConf job_conf = new JobConf(SalesCountryDriver.class);

        // Set a name for the job
        job_conf.setJobName("SalePerCountry");

        // Specify the data types of the output key and value
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);

        // Specify the Mapper and Reducer classes
        job_conf.setMapperClass(SalesCountry.SalesMapper.class);
        job_conf.setReducerClass(SalesCountry.SalesCountryReducer.class);

        // Specify the input and output formats
        job_conf.setInputFormat(TextInputFormat.class);
        job_conf.setOutputFormat(TextOutputFormat.class);

        // Set input and output directories from the command-line arguments:
        // args[0] = input directory on HDFS, args[1] = output directory to be
        // created on HDFS to store the output files.
        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));

        my_client.setConf(job_conf);
        try {
            // Run the job
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Input File:
Pune
Mumbai
Nashik
Pune
Nashik
Kolapur
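
With this sample input, no line contains a '-', so the mapper's key is the whole line and the job
simply counts duplicate lines. The part-00000 file should therefore look roughly like the listing
below (keys sorted, with TextOutputFormat separating key and value by a tab):

Kolapur	1
Mumbai	1
Nashik	2
Pune	2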

Assignment Questions
1. Write down the steps to design a distributed application using MapReduce that
processes a log file of a system.

