EAST CAMPUS
Geeta Colony, New Delhi-110031
PRACTICAL FILE
Experiment - 2
AIM: To set up Hadoop in three operating modes: (a) standalone, (b) pseudo-distributed, (c) fully distributed.
DESCRIPTION:
Hadoop is written in Java, so you will need Java (v6 or later) installed on your machine. Sun's Java Development Kit is the one most widely used with Hadoop, although others have been reported to work.
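To confirm the Java installation before going further, it can be checked from a shell (a minimal sketch; the readlink line assumes a Linux system where the java binary on the PATH is a symlink into the JDK):
$ java -version                  # prints the installed Java version; Hadoop needs v6 or later
$ readlink -f $(which java)      # resolves the real JDK path, useful later for JAVA_HOME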
Hadoop runs on Unix and on Windows. Linux is the only supported production platform, but other flavours of Unix (including Mac OS X) can be used to run Hadoop for development. Windows is supported only as a development platform, and additionally requires Cygwin. During the Cygwin installation you should include the OpenSSH package if you plan to run Hadoop in pseudo-distributed mode.
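Pseudo-distributed and fully distributed modes also need password-less SSH, so that the start-up scripts can log in to each node (including localhost) without prompting. A minimal sketch, assuming an Ubuntu machine with the OpenSSH tools:
$ sudo apt-get install ssh                          # install the SSH client and server
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa          # generate a key pair with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the key for logins to localhost
$ ssh localhost                                     # should now connect without a password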
ALGORITHM:
a) STANDALONE MODE: Out of the box, Hadoop is configured to run in standalone (local) mode, in which it runs as a single Java process against the local file system, so nothing beyond unpacking Hadoop and setting the Java path is required.
b) STEPS INVOLVED IN INSTALLING HADOOP IN PSEUDO-DISTRIBUTED MODE
1. First configure the hadoop-env.sh file by changing the Java path (set JAVA_HOME to the JDK directory).
2. Configure core-site.xml, which contains a property tag with a name and a value: set the name to fs.defaultFS and the value to hdfs://localhost:9000 (see the sketch after this list).
3. Similarly configure yarn-site.xml.
4. Format the NameNode, then run start-dfs.sh and start-yarn.sh so that the daemons such as the NameNode and DataNode start.
5. Create a directory with the command hdfs dfs -mkdir /csedir, enter some data into a file name.txt, copy it from the local file system into Hadoop with hdfs dfs -copyFromLocal name.txt /csedir/, and run the sample wordcount jar to check whether pseudo-distributed mode is working or not.
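The configuration edits and the smoke test from the steps above can be carried out as follows. This is a minimal sketch: it assumes HADOOP_HOME points at the Hadoop installation, that $HADOOP_HOME/bin and $HADOOP_HOME/sbin are on the PATH, that the JDK lives at /usr/lib/jvm/default-java (adjust to your system), and that the examples jar name matches your Hadoop version.
$ cd $HADOOP_HOME/etc/hadoop
$ cat > core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
$ echo "export JAVA_HOME=/usr/lib/jvm/default-java" >> hadoop-env.sh   # assumed JDK path
$ hdfs namenode -format              # one-time format of the NameNode
$ start-dfs.sh                       # starts NameNode, DataNode, SecondaryNameNode
$ start-yarn.sh                      # starts ResourceManager, NodeManager
$ hdfs dfs -mkdir /csedir
$ hdfs dfs -copyFromLocal name.txt /csedir/
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      wordcount /csedir /wcout       # run the sample wordcount job
$ hdfs dfs -cat /wcout/part-r-00000  # display the word counts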
c) STEPS INVOLVED IN INSTALLING HADOOP IN FULLY DISTRIBUTED MODE
1. Configure the configuration files on every node to name the master and slave nodes:
$ cd $HADOOP_HOME/etc/hadoop
$ nano core-site.xml
$ nano hdfs-site.xml
2. Add the host names of the slave nodes to the file slaves and save it (see the sketch after this list):
$ nano slaves
3. On the master node, format the NameNode and start the daemons:
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
4. End
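The slaves file simply lists one worker host name per line, and the master must have password-less SSH access to each of them. A minimal sketch with hypothetical host names (slave1, slave2 and user are placeholders; substitute your machines and user name):
$ cat slaves                 # in $HADOOP_HOME/etc/hadoop on the master
slave1
slave2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub user@slave1   # repeat for each slave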
INPUT FORMAT:
ubuntu@localhost> jps
OUTPUT FORMAT:
If the daemons have started correctly, the jps listing should show entries such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.
DESCRIPTION:
HDFS is a scalable distributed file system designed to scale to petabytes of data while running on top of the underlying file system of the operating system. HDFS keeps track of where data resides in the network by associating the name of a rack or network switch with each data set. This allows Hadoop to efficiently schedule tasks on the nodes that contain the data, or on those nearest to it, optimizing bandwidth utilisation. Hadoop provides a set of command-line utilities that work similarly to the Linux file commands and serve as the primary interface to HDFS. We are going to have a look at HDFS by interacting with it from the command line. We will look at the most common file management tasks in Hadoop, which include:
a) adding files and directories to HDFS
b) retrieving files from HDFS to the local file system
c) deleting files from HDFS
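Each of these tasks maps onto a subcommand of the hadoop fs utility; as a quick orientation, here is a minimal sketch (the /user/demo path and local.txt are placeholder names):
hadoop fs -ls /user                     # list a directory, analogous to Linux ls
hadoop fs -mkdir /user/demo             # (a) create a directory
hadoop fs -put local.txt /user/demo     # (a) add a local file to HDFS
hadoop fs -get /user/demo/local.txt .   # (b) retrieve a file to the local file system
hadoop fs -rm /user/demo/local.txt      # (c) delete a file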
ALGORITHM:
1) Adding Files and Directories to HDFS:
Before you can run a Hadoop program on data stored in HDFS, you need to put the data into HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of /user/$USER, where $USER is your login user name. This directory isn't automatically created for you, though, so let's create it with the mkdir command. For the purpose of illustration we use chuck; you should substitute your own user name in the example commands.
hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt
hadoop fs -put example.txt /user/chuck
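To confirm the upload, the directory can be listed with the standard ls subcommand:
hadoop fs -ls /user/chuck    # example.txt should appear in the listing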
2) Retrieving Files from HDFS:
The Hadoop get command copies files from HDFS back to the local file system. To retrieve example.txt, we can run the following commands (cat is also shown, which prints the file's contents instead of copying it):
hadoop fs -get example.txt .
hadoop fs -cat example.txt
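3) Deleting Files from HDFS:
Task (c) from the description above is deletion, which uses the standard rm subcommand (the recursive form is shown as a usage example; use it with care):
hadoop fs -rm example.txt
hadoop fs -rm -r /user/chuck/csedir     # hypothetical directory; -r deletes recursively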
EXPECTED OUTPUT: