Big Data Analytics Lab Experiments

BIG DATA ANALYTICS

Course Code : 21CA32D4 L-T-P-S : 2-0-2-0

EXPERIMENT-1:

(i) Perform setting up and installing Hadoop in its three operating modes:

Standalone, Pseudo-distributed, Fully distributed.

(ii) Use web-based tools to monitor your Hadoop setup.

Hadoop architecture

HDFS (Hadoop Distributed File System) is the storage layer of Hadoop. It is designed to run on
commodity hardware and follows a distributed file system design. HDFS stores data in large
blocks rather than in many small blocks, which suits large sequential reads and writes.

HDFS provides fault tolerance and high availability to the storage layer of a Hadoop cluster.
The data storage nodes in HDFS are:

• NameNode (Master)

• DataNode (Slave)

NameNode: The NameNode works as the master in a Hadoop cluster and guides the DataNodes
(slaves). It is mainly used for storing the metadata, i.e. the data about the data. Metadata
includes the transaction logs that keep track of user activity in the cluster, as well as file
names, sizes, and location information (block numbers, block IDs) of the DataNodes, which the
NameNode uses to find the closest DataNode for faster communication. The NameNode instructs the
DataNodes to carry out operations such as delete, create, and replicate.
DataNode: DataNodes work as slaves. They are mainly used for storing the data in a Hadoop
cluster; the number of DataNodes can range from one to 500 or more. The more DataNodes there
are, the more data the cluster can store, so DataNodes should have a high storage capacity to
hold a large number of file blocks.
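On a running installation, the roles described above can be inspected directly; for example, the following command asks the NameNode to report the DataNodes it knows about and their storage usage:

$ hdfs dfsadmin -report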

(i) Installing Hadoop in its three operating modes: Standalone, Pseudo-distributed, Fully distributed.
Hadoop is a framework written in Java for running applications on large clusters of
commodity hardware and incorporates features similar to those of the Google File System (GFS)
and of the MapReduce computing paradigm. Hadoop’s HDFS is a highly fault-tolerant distributed
file system and, like Hadoop in general, designed to be deployed on low-cost hardware. It provides
high throughput access to application data and is suitable for applications that have large data sets.

Hadoop can be installed in 3 Modes


➢ Local (Standalone)
➢ Pseudo-distributed
➢ Fully distributed

INSTALLING Hadoop 3.3.6 on Ubuntu 20.04 (Pseudo-distributed Mode)


➢ Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop. While that is not required, it is
recommended because it helps to separate the Hadoop installation from other software applications
and user accounts running on the same machine.

Settings → System Settings → User Accounts → Unlock → click the + button

Choose the account type: Administrator or Standard

You can set the password here: click on Account Disabled → enter the password → confirm the password
→ click Change

From the console, open a terminal

# To create a group and user

$ sudo addgroup hadoop

$ sudo adduser --ingroup hadoop hduser

# To change the password of the new user

$ sudo su

Enter your login password

# passwd hduser

Enter the new password and confirm it

# exit   # to come out of the root shell

The prerequisites needed to install Hadoop on Ubuntu are pretty simple


Java 11 and SSH

STEP 1 : Install JAVA

# Update the source list

$ sudo apt-get update

# Check whether Java is already installed

$ java -version

# Install the OpenJDK 11 JDK

$ sudo apt-get install openjdk-11-jdk

The full JDK will be placed under /usr/lib/jvm/. The directory name carries an architecture
suffix, for example java-1.11.0-openjdk-i386 (i386 means a 32-bit OS). You can copy it to a name
without the suffix, e.g. java-1.11.0-openjdk:

ivcse@hadoop:~$ cd /usr/lib/jvm/

ivcse@hadoop:/usr/lib/jvm$ cp -r java-1.11.0-openjdk-i386 java-1.11.0-openjdk
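The exact directory name varies with the machine architecture (for example, it may end in -amd64 on a 64-bit system), so check it before copying:

$ ls /usr/lib/jvm/

$ java -version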

STEP 2: Install SSH

$sudo apt-get install ssh

$ssh localhost

# To set up passwordless SSH login

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

$cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys


Now use this command to make sure that your SSH key file has the required permissions:

$ chmod 600 ~/.ssh/authorized_keys

Switch to the new user and verify the passwordless login (see the commands below).
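For example, assuming hduser is the account created earlier (adjust the name if you used a different one):

$ su - hduser

$ ssh localhost   # should log in without asking for a password

$ exit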

STEP 3 : Download the Hadoop tarball from Apache

You can visit the Apache Hadoop website to see a list of versions with their recent change logs
and select the version of your choice.

Google → type Apache Mirrors → select the first site (apache.org) → all Apache software is
displayed → select the hadoop folder → the folders chukwa\, common\ and core\ are displayed →
select the stable release for hadoop-3.3.6 → select hadoop-3.3.6.tar.gz

(or) use the command below

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

Create a new folder (say ABC) in the home directory.

Copy the tar.gz into ABC from the Downloads folder and extract it,

(or) use the tar command to extract the files, as shown below.
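For example, assuming the archive was downloaded to ~/Downloads and the target folder is ~/ABC (names used here only for illustration):

$ mkdir -p ~/ABC

$ tar -xzf ~/Downloads/hadoop-3.3.6.tar.gz -C ~/ABC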

STEP 4 : CONFIGURATION (5 files)

Enter the hadoop-3.3.6 folder; in Hadoop 3.x the configuration files live under etc/hadoop.

Important files: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml


Configuring the ~/.bashrc file

$ gedit ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk

export HADOOP_HOME=/home/ivcse/ABC/hadoop-3.3.6

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH

(in general, HADOOP_HOME should point to the folder where Hadoop was extracted, e.g. /home/<username>/<folder>/hadoop-3.3.6)
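Reload the environment and check that the hadoop command is now on the PATH before editing the XML files:

$ source ~/.bashrc

$ hadoop version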

etc/hadoop/core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://localhost:8020</value>

</property>

</configuration>

etc/hadoop/mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

etc/hadoop/yarn-site.xml
<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>

etc/hadoop/hdfs-site.xml

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk

STEP 6 : Format the NameNode

ivcse@hadoop:~$ hdfs namenode -format

STEP 7 : Start the Processes

Launch the HDFS and YARN services.

hduser@ubuntu:~$ start-dfs.sh

hduser@ubuntu:~$ start-yarn.sh

(or start both at once)

hduser@ubuntu:~$ start-all.sh

# list the running Hadoop daemons

hduser@ubuntu:~$ jps

# stop all daemons when finished

hduser@ubuntu:~$ stop-all.sh
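If the daemons started correctly, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager in addition to Jps. For part (ii) of the experiment, Hadoop's built-in web UIs can be used to monitor the setup; with the default Hadoop 3.x ports these are:

NameNode web UI : http://localhost:9870

ResourceManager web UI : http://localhost:8088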

EXPERIMENT-2:

Implement the following file management tasks in Hadoop:

• Adding files and directories

• Retrieving files

• Deleting files
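A minimal worked session covering these three tasks (sample.txt, /user/hduser and localcopy.txt below are placeholder names used only for illustration):

$ hadoop fs -mkdir -p /user/hduser/input                      # adding a directory

$ hadoop fs -put sample.txt /user/hduser/input                # adding a file

$ hadoop fs -ls /user/hduser/input                            # listing the directory

$ hadoop fs -get /user/hduser/input/sample.txt localcopy.txt  # retrieving a file

$ hadoop fs -rm /user/hduser/input/sample.txt                 # deleting a file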

File management tasks & basic HDFS shell commands


➢ Print hadoop version
Usage : hadoop version

➢ fs : The File System (FS) shell includes various shell-like commands that
directly interact with the Hadoop Distributed File System (HDFS) as well as
other file systems that Hadoop supports.

Usage: hadoop fs (or) hdfs dfs


➢ mkdir : Create a directory in HDFS at given path(s).
Usage: hadoop fs -mkdir [-p] <paths>
Takes path uri’s as argument and creates directories.
Options:

The -p option behavior is much like Unix mkdir -p, creating parent directories
along the path.

Example:

hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2

hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir

➢ ls : List the contents of a directory


Usage: hadoop fs -ls [-d] [-h] [-R] <args>
Options:

-d: Directories are listed as plain files.

-h: Format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).

-R: Recursively list subdirectories encountered.

Example:

hadoop fs -ls /user/hadoop/file1

➢ lsr
Usage: hadoop fs -lsr <args>
Recursive version of ls.

Note: This command is deprecated. Instead use hadoop fs -ls -R

➢ cat
Usage: hadoop fs -cat URI [URI ...]

Copies source paths to stdout.

Example:

hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2

hadoop fs -cat file:///file3 /user/hadoop/file4

➢ get
Usage: hadoop fs -get <src> <localdst>
Copy files to the local file system. Example:

hadoop fs -get /user/hadoop/file localfile

hadoop fs -get hdfs://nn.example.com/user/hadoop/file localfile

➢ put
Usage: hadoop fs -put <localsrc> ... <dst>
Copy single src, or multiple srcs, from the local file system to the destination file system.
Also reads input from stdin and writes to the destination file system.

hadoop fs -put localfile /user/hadoop/hadoopfile

hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir

hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile

hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)

➢ copyFromLocal
Usage: hadoop fs -copyFromLocal <localsrc> URI
Similar to put command, except that the source is restricted to a local file
reference.

Options:

The -f option will overwrite the destination if it already exists.

➢ copyToLocal
Usage: hadoop fs -copyToLocal URI <localdst>
Similar to get command, except that the destination is restricted to a local file
reference.

➢ count
Usage: hadoop fs -count <paths>
Count the number of directories, files and bytes under the paths that match the specified file
pattern. The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
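For example, with an illustrative path:

hadoop fs -count /user/hadoop/dir1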

➢ cp
Usage: hadoop fs -cp [-f] [-p ] [URI ...] <dest>
Copy files from source to destination (both should be in HDFS). This command allows multiple
sources as well, in which case the destination must be a directory.

Options:

The -f option will overwrite the destination if it already exists.

The -p option will preserve file attributes.

Example:
hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2

hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir

➢ mv
Usage: hadoop fs -mv URI [URI ...] <dest>
Moves files from source to destination (both should be in HDFS). This command allows multiple
sources as well, in which case the destination needs to be a directory. Moving files across file
systems is not permitted.

Example:

hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2

hadoop fs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2

hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1

➢ rm
Usage: hadoop fs -rm [-f] [-r |-R] URI [URI ...]

Delete files specified as args.

Options:

The -f option will not display a diagnostic message or modify the exit status to reflect an error
if the file does not exist.

The -R option deletes the directory and any content under it recursively.

The -r option is equivalent to -R.


Example:

hadoop fs -rm hdfs://nn.example.com/file /user/hadoop/emptydir

➢ rmdir
Usage: hadoop fs -rmdir [--ignore-fail-on-non-empty] URI [URI ...]

Delete a directory.

Options:

--ignore-fail-on-non-empty: When using wildcards, do not fail if a directory still contains files.

Example:

hadoop fs -rmdir /user/hadoop/emptydir

➢ rmr

Usage: hadoop fs -rmr URI [URI ...]

Recursive version of delete.

Note: This command is deprecated. Instead use hadoop fs -rm -r

➢ df
Usage: hadoop fs -df [-h] URI [URI ...]
Displays free space.

Options:

The -h option will format file sizes in a “human-readable” fashion (e.g 64.0m
instead of 67108864)

Example:

hadoop fs -df /user/hadoop/dir1


➢ du
Usage: hadoop fs -du URI [URI ...]
Displays sizes of files and directories contained in the given directory, or the length of a file
in case it is just a file.

Example:

hadoop fs -du /user/hadoop/dir1 /user/hadoop/file1

hdfs://nn.example.com/user/hadoop/dir1

➢ help

Usage: hadoop fs -help


➢ setrep
Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the replication factor of a file. If path is a directory, then the command recursively
changes the replication factor of all files under the directory tree rooted at path.

Options:

The -w flag requests that the command wait for the replication to complete. This can potentially
take a very long time.

The -R flag is accepted for backwards compatibility. It has no effect.

Example:

hadoop fs -setrep -w 3 /user/hadoop/dir1

➢ tail
Usage: hadoop fs -tail URI
Displays last kilobyte of the file to stdout.

Example:

hadoop fs -tail pathname

➢ checksum
Usage: hadoop fs -checksum URI
Returns the checksum information of a file.

Example:

hadoop fs -checksum hdfs://nn1.example.com/file1

hadoop fs -checksum file:///etc/hosts

➢ chgrp
Usage: hadoop fs -chgrp [-R] GROUP URI [URI ...]

Change group association of files. The user must be the owner of files, or else a super-user.
Additional information is in the Permissions Guide.

Options

The -R option will make the change recursively through the directory structure.

➢ chmod
Usage: hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE>
URI [URI ...]
Change the permissions of files. With -R, make the change recursively through the directory
structure. The user must be the owner of the file, or else a super-user. Additional information
is in the Permissions Guide.

Options

The -R option will make the change recursively through the directory structure.

➢ chown
Usage: hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
Change the owner of files. The user must be a super-user. Additional information
is in the Permissions Guide.

Options

The -R option will make the change recursively through the directory structure.
