Exp 12
----------------------------------------------------------------------------------------------------------------
Group B
Assignment No: 2
----------------------------------------------------------------------------------------------------------------
Theory:
● Steps to install Hadoop for a distributed environment
● Java code for processing a system log file
cd hadoop-2.7.3
Step 2) Once the NameNode is formatted, go to the hadoop-2.7.3/sbin directory and start all the
daemons.
cd hadoop-2.7.3/sbin
1) Start NameNode:
The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files
stored in HDFS and tracks where the blocks of each file are kept across the cluster.
2) Start DataNode:
On startup, a DataNode connects to the NameNode and then responds to requests from the
NameNode for different operations.
GCOERC,NASHIK
Department of Computer Engineering Subject : DSBDAL
3) Start ResourceManager:
ResourceManager is the master that arbitrates all the available cluster resources and thus helps in
managing the distributed applications running on the YARN system. It manages every
NodeManager and each application's ApplicationMaster.
4) Start NodeManager:
The NodeManager is the per-machine agent responsible for managing containers, monitoring
their resource usage and reporting it to the ResourceManager.
5) Start JobHistoryServer:
JobHistoryServer is responsible for servicing all job-history-related requests from clients.
Step 3) To check that all the Hadoop services are up and running, run the following command.
jps
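The check in Step 3 can also be scripted. The following is a minimal sketch, not part of the original assignment: it assumes jps prints one "<pid> <name>" pair per line and that the five daemons from Step 2 appear under the names used above. The class name JpsCheck, the method missingDaemons and the sample PIDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class JpsCheck {
    // Daemons started in Step 2, under the names jps is assumed to report.
    static final List<String> EXPECTED = Arrays.asList(
        "NameNode", "DataNode", "ResourceManager", "NodeManager", "JobHistoryServer");

    // Returns the expected daemons that do not appear in the given jps output.
    static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]); // second column is the process name
            }
        }
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!running.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical jps output; the PIDs are made up for illustration.
        String sample = "4001 NameNode\n4102 DataNode\n4230 ResourceManager\n4388 Jps\n";
        System.out.println("Missing daemons: " + missingDaemons(sample));
    }
}
```

An empty list means every daemon from Step 2 is running; any name in the list should be started again from the sbin directory.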
Step 4) cd
Step 9) cd mapreduce_vijay/
Step 10) ls
Step 14) ls
Step 17) cd ..
Step 20) ls
Step 21) cd
Step 29) Now open the Mozilla browser and go to localhost:50070/dfshealth.html to check the
NameNode interface.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
}
output.collect(key, new IntWritable(frequencyForCountry));
}
}
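The body of the reduce method is not shown in the listing above. As a rough illustration of what a counting reducer usually does before it calls output.collect, the sketch below sums the values grouped under one key. It is an assumption, not the original code: it uses plain Integer values instead of Hadoop's IntWritable so that it runs without the Hadoop libraries, and the names ReduceSketch and sumCounts are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

class ReduceSketch {
    // Adds up all counts that the framework grouped under a single key.
    static int sumCounts(Iterator<Integer> values) {
        int frequency = 0;
        while (values.hasNext()) {
            frequency += values.next();
        }
        return frequency;
    }

    public static void main(String[] args) {
        // Assumes the mapper emitted a count of 1 for each occurrence of the key.
        Iterator<Integer> valuesForPune = Arrays.asList(1, 1).iterator();
        System.out.println("Pune\t" + sumCounts(valuesForPune));
    }
}
```

In the Hadoop version, the same loop would read each value with IntWritable.get() and pass the total to output.collect.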
Driver Class:
package SalesCountry;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
my_client.setConf(job_conf);
try {
// Run the job
JobClient.runJob(job_conf);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Input File
Pune
Mumbai
Nashik
Pune
Nashik
Kolapur
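To check the job's result by hand, the per-key counts for the input lines listed above can be computed locally. This is a minimal sketch in plain Java, assuming the job counts how many times each line value occurs and that the input holds exactly the six lines shown; the names ExpectedOutput and countLines are hypothetical and no Hadoop classes are used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class ExpectedOutput {
    // Counts how often each non-empty line occurs, with keys in sorted order.
    static Map<String, Long> countLines(List<String> lines) {
        return lines.stream()
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // The six input lines listed in the Input File section.
        List<String> input = Arrays.asList(
            "Pune", "Mumbai", "Nashik", "Pune", "Nashik", "Kolapur");
        countLines(input).forEach((city, n) -> System.out.println(city + "\t" + n));
    }
}
```

Comparing these counts with the job's output file is a quick way to confirm that the mapper and reducer are wired together correctly.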
Assignment Questions
1. Write down the steps to design a distributed application using MapReduce that
processes a system log file.