Big Data Lab

The document provides detailed instructions for setting up and installing Hadoop in three modes: Standalone, Pseudo Distributed, and Fully Distributed, along with the necessary configurations and commands. It also covers file management tasks in Hadoop, including adding, retrieving, and deleting files in HDFS, as well as running a basic Word Count MapReduce program to demonstrate the MapReduce paradigm. The document outlines the algorithm steps for each task and includes example commands for effective execution.


i) Perform setting up and installing Hadoop in its three operating modes:

* Standalone
* Pseudo-Distributed
* Fully Distributed

DESCRIPTION: Hadoop is written in Java, so you will need to have Java installed on
your machine, version 6 or later. Sun's JDK is the one most widely used with Hadoop,
although others have been reported to work.

Hadoop runs on Unix and on Windows. Linux is the only supported production
platform, but other flavors of Unix (including Mac OS X) can be used to run Hadoop
for development. Windows is only supported as a development platform, and
additionally requires Cygwin to run. During the Cygwin installation process, you
should include the openssh package if you plan to run Hadoop in pseudo-distributed
mode.
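Before installing Hadoop it is worth confirming that a suitable JDK is present and that JAVA_HOME points to it. A minimal check might look like the following; the JDK path shown is only a placeholder and will differ between machines:

$ java -version
$ # Placeholder path; substitute the actual JDK location on your machine.
$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
$ export PATH=$PATH:$JAVA_HOME/bin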

ALGORITHM STEPS INVOLVED IN INSTALLING HADOOP IN PSEUDO DISTRIBUTED MODE:-


1. To install Hadoop in pseudo-distributed mode, we need to configure the Hadoop
configuration files, which reside in the directory /home/lendi/hadoop-2.7.1/etc/hadoop.
2. First configure the hadoop-env.sh file by setting the Java path.
3. Configure core-site.xml, which contains a property tag with a name and a value.
Set the name to fs.defaultFS and the value to hdfs://localhost:9000.
4. Configure hdfs-site.xml.
5. Configure yarn-site.xml.
6. Configure mapred-site.xml; before configuring it, copy mapred-site.xml.template to
mapred-site.xml. (Sample contents for these files are sketched after this list.)
7. Now format the NameNode by using the command hdfs namenode -format.
8. Type the commands start-dfs.sh and start-yarn.sh, which start the daemons
NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.
9. Run jps, which lists all running daemons. Create a directory in HDFS by using the
command hdfs dfs -mkdir /csedir, enter some data into lendi.txt using the command
nano lendi.txt, copy it from the local directory to HDFS using the command hdfs dfs
-copyFromLocal lendi.txt /csedir/, and run the sample wordcount jar file to check
whether pseudo-distributed mode is working or not.
10. Display the contents of the output file by using the command hdfs dfs -cat
/newdir/part-r-00000.
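For reference, a minimal sketch of the configuration files from steps 3-6 is given below. The fs.defaultFS value comes from step 3; the other property values (a replication factor of 1 and the usual YARN/MapReduce settings) are typical choices for a single-machine pseudo-distributed setup rather than values taken from this handout, so adjust them to your installation.

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>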

FULLY DISTRIBUTED MODE INSTALLATION:


ALGORITHM
1. Stop all single-node clusters:
$ stop-all.sh

2. Designate one node as the NameNode (Master) and the remaining nodes as DataNodes (Slaves).

3. Copy the public key to all three hosts to get passwordless SSH access:
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub lendi@l5sys24

4. Configure all configuration files to name the Master and Slave nodes.

$ cd $HADOOP_HOME/etc/hadoop
$ nano core-site.xml
$ nano hdfs-site.xml

5. Add the slave hostnames to the file slaves and save it (a sample is sketched after this list).

$ nano slaves

6. Configure yarn-site.xml:
$ nano yarn-site.xml
7. On the Master node, run:
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
8. The hdfs namenode -format command formats the NameNode.
9. start-dfs.sh and start-yarn.sh start the daemons on the Master and Slave nodes.

10. END
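As an illustration of step 5, the slaves file on the Master simply lists the DataNode hostnames, one per line, and core-site.xml on every node points fs.defaultFS at the Master. The hostnames below are hypothetical placeholders; substitute the actual host names used in your lab (l5sys24 stands in for the Master from the ssh-copy-id example above).

slaves:
l5sys25
l5sys26

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://l5sys24:9000</value>
  </property>
</configuration>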

INPUT

ubuntu@localhost> jps

OUTPUT:
DataNode, NameNode, SecondaryNameNode, NodeManager, ResourceManager

________________________________________________________________________

II Implement the following file management tasks in Hadoop:

* Adding files and directories
* Retrieving files
* Deleting files

HDFS is a scalable distributed filesystem designed to scale to petabytes of data


while running on top of the underlying filesystem of the operating system.
HDFS keeps track of where the data resides in a network by associating the name
of its rack (or network switch) with the dataset. This allows Hadoop to efficiently
schedule tasks to those nodes that contain data, or which are nearest to it,
optimizing bandwidth utilization. Hadoop provides a set of command line utilities
that work similarly to the Linux file commands, and serve as your primary interface
with HDFS. We're going to have a look into HDFS by interacting with it from the
command line. We will take a look at the following operations:

* Adding files and directories to HDFS
* Retrieving files from HDFS to the local filesystem
* Deleting files from HDFS

ALGORITHM: -
SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS
Step-1
Adding Files and Directories to HDFS
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the
data into HDFS first. Let's create a directory and put a file in it. HDFS has a
default working directory of /user/$USER, where $USER is your login user name. This
directory isn't automatically created for you, though, so let's create it with the
mkdir command. For the purpose of illustration, we use chuck. You should substitute
your user name in the example commands.

hadoop fs -mkdir /user/chuck


hadoop fs -put example.txt
hadoop fs -put example.txt /user/chuck
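The first form relies on the default working directory /user/$USER as the destination, while the second names the target directory explicitly. As a quick sanity check (a sketch, using the chuck user from the example above), you can list the directory afterwards:

hadoop fs -ls /user/chuck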

Step-2
Retrieving Files from HDFS
The Hadoop command get copies files from HDFS back to the local filesystem. To
retrieve example.txt into the current local directory, we can run the following command:
hadoop fs -get example.txt .
To simply display the contents of the file without copying it, use:
hadoop fs -cat example.txt

Step-3
Deleting Files from HDFS
hadoop fs -rm example.txt
The command for creating a directory in HDFS is hdfs dfs -mkdir /lendicse.
Adding a directory is done through the command hdfs dfs -put lendi_english /.

Step-4
Copying Data from NFS to HDFS
The command for copying from a local directory is
hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/

* View the file by using the command hdfs dfs -cat /lendi_english/glossary

* The command for listing items in Hadoop is hdfs dfs -ls hdfs://localhost:9000/.

* The command for deleting files is hdfs dfs -rm -r /kartheek.
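Putting these commands together, a short end-to-end session (reusing the paths from the examples above) might look like this:

hdfs dfs -mkdir /lendicse
hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/
hdfs dfs -ls /lendicse
hdfs dfs -cat /lendicse/glossary
hdfs dfs -rm -r /lendicse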

___________________________________________________________________

III Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
a) Find the number of occurrences of each word appearing in the input file(s)
b) Performing a MapReduce Job for word search count (look for specific keywords in
a file)

MAPREDUCE PROGRAM
WordCount is a simple program which counts the number of occurrences of each word
in a given text input data set. WordCount fits very well with the MapReduce
programming model making it a great example to understand the Hadoop Map/Reduce
programming style. Our implementation consists of three main parts:
1. Mapper
2. Reducer
3. Driver
Step-1. Write a Mapper
A Mapper overrides the "map" function from the class
"org.apache.hadoop.mapreduce.Mapper", which provides <key, value> pairs as the
input. A Mapper implementation may output <key, value> pairs using the provided
Context. The input value of the WordCount Map task will be a line of text from the
input data file, and the key will be the line number: <line_number, line_of_text>.
The Map task outputs <word, one> for each word in the line of text.

Pseudo-code
void Map (key, value)
{
for each word x in value:
output.collect(x, 1);
}
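As a concrete Java sketch of this Mapper (a minimal version of the classic Hadoop WordCount mapper; the class and variable names are illustrative rather than taken from the lab's own source files):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input: <line offset, line of text>; output: <word, 1> for every word in the line.
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words and emit <word, 1> for each one.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}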

Step-2. Write a Reducer


A Reducer collects the intermediate <key, value> output from multiple map
tasks and assembles a single result. Here, the WordCount program will sum up the
occurrences of each word and emit pairs of the form <word, occurrence>.

Pseudo-code
void Reduce (keyword, <list of value>)
{
sum = 0;
for each x in <list of value>:
sum += x;
final_output.collect(keyword, sum);
}
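A corresponding Java sketch of the Reducer (again a minimal version of the standard WordCount reducer; names are illustrative):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input: <word, list of counts>; output: <word, total count>.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Add up all the counts emitted by the mappers for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}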
Step-3. Write Driver
The Driver program configures and runs the MapReduce job. We use the main program
to perform basic configurations such as:
* Job Name: name of this Job
* Executable (Jar) Class: the main executable class. Here, WordCount.
* Mapper Class: class which overrides the "map" function. Here, Map.
* Reducer Class: class which overrides the "reduce" function. Here, Reduce.
* Output Key: type of output key. Here, Text.
* Output Value: type of output value. Here, IntWritable.
* File Input Path
* File Output Path
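A minimal Driver sketch covering these configuration points (class names match the two sketches above and are illustrative; the input and output paths are taken from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");            // Job Name
        job.setJarByClass(WordCount.class);                       // Executable (Jar) Class
        job.setMapperClass(WordCountMapper.class);                // Mapper Class
        job.setReducerClass(WordCountReducer.class);              // Reducer Class
        job.setOutputKeyClass(Text.class);                        // Output Key type
        job.setOutputValueClass(IntWritable.class);               // Output Value type
        FileInputFormat.addInputPath(job, new Path(args[0]));     // File Input Path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // File Output Path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, the job could then be run against the lendi.txt file loaded earlier with a command along the lines of hadoop jar wordcount.jar WordCount /csedir /newdir (the jar name here is an assumption), after which hdfs dfs -cat /newdir/part-r-00000 displays the word counts.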
