
NETAJI SUBHAS UNIVERSITY OF TECHNOLOGY

EAST CAMPUS
Geeta Colony, New Delhi- 110031

BIG DATA ANALYTICS


Course Code – CBCPC11

PRACTICAL FILE

Submitted By: Aarav Jain Submitted To: Shajal Afaq

Roll No: 2022UCB6063


INDEX
SNO EXPERIMENT DATE SUBMISSION SIGN
Experiment – 1
AIM: Installation of VMware to set up the Hadoop environment and its ecosystem.

OUTPUT:
Experiment -2
AIM: To set up Hadoop in its three operating modes: (a) standalone, (b) pseudo-distributed, (c) fully distributed.

DESCRIPTION:
Hadoop is written in Java, so you will need Java installed on your machine, version 6 or later. Sun's Java Development Kit is the one most widely used with Hadoop, although others have been reported to work.

Hadoop runs on Unix and Windows. Linux is the only supported production platform, but other flavours of Unix (including Mac OS X) can be used to run Hadoop for development. Windows is supported only as a development platform and additionally requires Cygwin. During the installation you should include the OpenSSH package if you plan to run Hadoop in pseudo-distributed mode.
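A quick way to verify these prerequisites before installing Hadoop is to check the Java and SSH versions from a terminal (a minimal sketch; the exact version strings will vary from machine to machine):

$ java -version
$ ssh -V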

ALGORITHM:
a) STEPS INVOLVED IN INSTALLING HADOOP IN STANDALONE MODE

1. Command for installing SSH is: sudo apt-get install ssh


2. Command for key generation is: ssh-keygen -t rsa -P ""
3. Store the public key into authorized_keys by using the command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

4. Extract Java using the command: tar -xvzf jdk-8u60-linux-i586.tar.gz


5. Extract Eclipse using the command: tar -xvzf eclipse-jee-mars-linux-gtk.tar.gz
6. Extract Hadoop using the command: tar -xvzf hadoop-2.7.1.tar.gz
7. Move Java to /usr/lib/jvm and Eclipse to /opt/, then configure the Java path.
8. Export the Java path and Hadoop path in ~/.bashrc.
9. Check whether the installation is successful by checking the Java version and the Hadoop version.
10. Check whether Hadoop in standalone mode is working correctly by running the built-in Hadoop example JAR (WordCount).
11. If the word count is displayed correctly in the part-r-00000 output file, standalone mode has been installed successfully (a sample run is sketched below).
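A minimal sketch of steps 10-11, assuming Hadoop 2.7.1 is extracted as above; the input file name, output directory and examples JAR path are assumptions chosen for illustration and may differ on your system:

$ echo "hello hadoop hello world" > input.txt
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount input.txt output
$ cat output/part-r-00000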
b) STEPS INVOLVED IN INSTALLING HADOOP IN PSEUDO-DISTRIBUTED MODE:

1. In order to install Hadoop in pseudo-distributed mode, we need to edit the Hadoop configuration files, which reside in the directory /home/systemname/hadoop-2.7.1/etc/hadoop.

2. First configure the hadoop-env.sh file by changing the Java path.
3. Configure core-site.xml, which contains a property tag with a name and a value: the name is fs.defaultFS and the value is hdfs://localhost:9000. Also configure yarn-site.xml (a sample property block is sketched after these steps).

4. Configure mapred-site.xml; before configuring it, copy mapred-site.xml.template to mapred-site.xml.
5. Now format the namenode by using the command: hdfs namenode -format

Start the daemons (namenode, datanode, etc.) with:
start-dfs.sh
start-yarn.sh
Then run jps, which lists all running daemons.

6. Create a directory using the command hdfs dfs -mkdir /csedir, enter some data into a file name.txt, copy it from the local directory to Hadoop using the command hdfs dfs -copyFromLocal name.txt /csedir/, and run the sample WordCount JAR to check whether pseudo-distributed mode is working or not.

7. Display the content of the output file using the command hdfs dfs -cat /newdirectory/part-r-00000.
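A minimal sketch of the core-site.xml property block from step 3 and the run-through of steps 5-7. The heredoc overwrite is only one way of editing the file, the directory names follow the steps above, and the examples JAR path is an assumption about the Hadoop 2.7.1 layout:

$ cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
$ jps
$ hdfs dfs -mkdir /csedir
$ hdfs dfs -copyFromLocal name.txt /csedir/
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /csedir /newdirectory
$ hdfs dfs -cat /newdirectory/part-r-00000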

FULLY DISTRIBUTED MODE INSTALLATION:


ALGORITHM:

1. Stop all single-node cluster daemons:
$ stop-all.sh

2. Designate one node as the namenode [master] and the remaining nodes as datanodes [slaves]. Copy the public key to all three hosts to get password-less SSH access:
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub systemname@systemno


3. Configure all the configuration files to name the master and slave nodes:
$ cd $HADOOP_HOME/etc/hadoop
$ nano core-site.xml
$ nano hdfs-site.xml
4. Add the host names to the slaves file and save it (a sample is sketched after this list):
$ nano slaves

5. Configure yarn-site.xml:
$ nano yarn-site.xml

6. On the master node, run:
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh

7. Format the namenode.

8. Start the daemons on the master and slave nodes.

9. End
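A minimal sketch of steps 2-6, assuming three machines whose host names are master, slave1 and slave2 and a user named hadoop; these names are placeholders, so substitute the actual system names and user:

$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave1
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave2
$ cd $HADOOP_HOME/etc/hadoop
$ cat > slaves <<'EOF'
slave1
slave2
EOF
# on the master node only:
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh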

INPUT FORMAT:
ubuntu@localhost> jps

OUTPUT FORMAT:

DataNode, NameNode, SecondaryNameNode, NodeManager, ResourceManager.
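A sketch of what the jps output might look like when all daemons are running (the process IDs are illustrative only):

2481 NameNode
2675 DataNode
2890 SecondaryNameNode
3021 ResourceManager
3190 NodeManager
3305 Jps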


Experiment – 3
AIM: Implement the following file management tasks in Hadoop: (a) adding files to a directory, (b) retrieving files, (c) deleting files.

DESCRIPTION:
HDFS is a scalable distributed file system designed to scale to petabytes of data while running on top of the underlying file system of the OS. HDFS keeps track of where data resides in the network by associating the name of its rack or network switch with the dataset. This allows Hadoop to efficiently schedule tasks on the nodes that contain the data, or which are nearest to it, optimising bandwidth utilisation. Hadoop provides a set of command-line utilities that work similarly to the Linux file commands and serve as the primary interface to HDFS. We are going to have a look at HDFS by interacting with it from the command line, covering the most common file management tasks in Hadoop, which include:
a) adding files and directories to HDFS
b) retrieving files from HDFS to the local file system
c) deleting files from HDFS

ALGORITHM:
1)Adding Files and Directories to HDFS:

Before you can run Hadoop programs on data stored in HDFS, you will need to put the data into HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of /user/$USER, where $USER is your username. This directory isn't created automatically, though, so let's create it with the mkdir command. For the purpose of illustration we use chuck; you should substitute your username in the example commands.
hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt
hadoop fs -put example.txt /user/chuck
2)Retrieve file from HDFS :

The Hadoop command get copies files from HDFS back to the local file system, while cat displays a file's contents directly. To view example.txt, we can run the following command:
hadoop fs -cat example.txt

3)Delete file from HDFS :

hadoop fs -rm example.txt


Command for creating a directory in HDFS: hdfs dfs -mkdir /lendicse
Command for adding a directory to HDFS: hdfs dfs -put lendi_english

4) Command for copying a directory from the local file system (NFS) to HDFS:
hdfs dfs -copyFromLocal /home/lendi/desktop/shakes/glossary /lendicse/
View the file with the command:
hdfs dfs -cat /lendi_english/glossary


Command for listing items in Hadoop:
hdfs dfs -ls hdfs://localhost:9000/


Command for deleting a file or directory recursively: hdfs dfs -rm -r /
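Putting the commands above together, a sketch of a complete add/retrieve/delete session; the paths and file names are the illustrative ones used above, and -get is the reverse of -put, copying a file back to the local file system:

$ hdfs dfs -mkdir /lendicse
$ hdfs dfs -copyFromLocal /home/lendi/desktop/shakes/glossary /lendicse/
$ hdfs dfs -ls hdfs://localhost:9000/lendicse
$ hdfs dfs -cat /lendicse/glossary
$ hdfs dfs -get /lendicse/glossary .
$ hdfs dfs -rm /lendicse/glossary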

INPUT: Any data in structured, semi-structured or unstructured format.

EXPECTED OUTPUT:
