Big Data Analytics Lab Experiments
EXPERIMENT-1:
Hadoop architecture
HDFS (Hadoop Distributed File System) is used for storage. It is designed to run on commodity hardware devices using a distributed file system design, and it favours storing data in a few large blocks rather than in many small blocks.
HDFS provides fault tolerance and high availability to the storage layer and to the other devices in the Hadoop cluster. The data storage nodes in HDFS are:
• NameNode (Master)
• DataNode (Slave)
NameNode: The NameNode works as the master. It stores metadata such as the file name, file size, and the location (block numbers, block IDs) of data on the DataNodes, which it uses to find the closest DataNode for faster communication. The NameNode instructs the DataNodes to perform operations such as delete, create, and replicate.
DataNode: DataNodes work as slaves. They are mainly used to store the data in a Hadoop cluster; the number of DataNodes can range from 1 to 500 or even more. The more DataNodes the cluster has, the more data it can store, so it is advised that DataNodes have a high storage capacity to hold a large number of file blocks.
We will use a dedicated Hadoop user account for running Hadoop. While that is not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine.
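On Ubuntu, a typical way to create such an account is shown below (the names hduser and hadoop match the prompts used later in this manual, but any names will do):
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser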
You can set a password here: click on Account Disabled → click Password → confirm the password → click Change.
# To change the password, switch to the root user
$ sudo su
$ java -version
The output shows a version such as java-1.11.0-openjdk i386 (i386 means a 32-bit OS). The JDK directory under /usr/lib/jvm carries the same name; drop the trailing architecture characters and use java-1.11.0-openjdk when setting JAVA_HOME.
ivcse@hadoop:~$ cd /usr/lib/jvm/
$ ssh localhost
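If ssh localhost prompts for a password, the Hadoop start scripts will not be able to log in to the node unattended. A common way to set up passwordless SSH (assuming OpenSSH is installed) is:
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys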
You can visit the Apache Hadoop website to see a list of versions with their recent change logs, and select the version of your choice.
Google → type Apache Mirrors → select the first site (apache.org) → all Apache software is displayed → select the hadoop folder → it displays folders such as chukwa, common, and core → select common → open the folder for the stable release, hadoop-3.3.6 → select hadoop-3.3.6.tar.gz
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
Copy the tar.gz file from the Downloads folder into ABC and extract it.
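Assuming the archive now sits in ~/ABC, it can be extracted with:
$ cd ~/ABC
$ tar -xzf hadoop-3.3.6.tar.gz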
$ gedit ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk
export HADOOP_HOME=/home/ivcse/ABC/hadoop-3.3.6
(in general: export HADOOP_HOME=/home/<username>/<folder>/hadoop-<version>)
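So that the Hadoop commands and start/stop scripts used below can be run without typing full paths, it is common to also append the bin and sbin directories to PATH and reload the shell configuration:
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
$ source ~/.bashrc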
etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk
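Before starting the daemons for the first time, format the NameNode once (this initializes the HDFS metadata; re-running it later would wipe the filesystem metadata):
hduser@ubuntu:~$ hdfs namenode -format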
hduser@ubuntu:~$ start-dfs.sh
hduser@ubuntu:~$ start-yarn.sh
hduser@ubuntu:~$ start-all.sh (alternative to the two commands above; deprecated in recent Hadoop versions)
hduser@ubuntu:~$ jps
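If the cluster has started correctly, jps should list one process per Hadoop daemon; the output looks roughly like the following (process IDs are illustrative and will differ):
12001 NameNode
12102 DataNode
12203 SecondaryNameNode
12304 ResourceManager
12405 NodeManager
12506 Jps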
hduser@ubuntu:~$ stop-all.sh
EXPERIMENT-2:
File management tasks in Hadoop:
• Adding files and directories
• Retrieving files
• Deleting files
➢ fs : The File System (FS) shell includes various shell-like commands that
directly interact with the Hadoop Distributed File System (HDFS) as well as
other file systems that Hadoop supports.
➢ mkdir
Usage: hadoop fs -mkdir [-p] <paths>
Takes path URIs as arguments and creates directories.
Options:
The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.
Example:
hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir
➢ lsr
Usage: hadoop fs -lsr <args>
Recursive version of ls (deprecated; use hadoop fs -ls -R instead).
➢ cat
Usage: hadoop fs -cat URI [URI ...]
Copies source paths to stdout.
Example:
hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn1.example.com/file2
➢ get
Usage: hadoop fs -get <src> <localdst>
Copy files to the local file system.
Example:
hadoop fs -get /user/hadoop/file localfile
➢ put
Usage: hadoop fs -put <localsrc> ... <dst>
Copy single src, or multiple srcs, from the local file system to the destination file system. Also reads input from stdin and writes to the destination file system.
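Example (localfile is a file in the current local directory):
hadoop fs -put localfile /user/hadoop/hadoopfile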
➢ copyFromLocal
Usage: hadoop fs -copyFromLocal <localsrc> URI
Similar to put command, except that the source is restricted to a local file
reference.
Options:
The -f option will overwrite the destination if it already exists.
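Example (paths are illustrative):
hadoop fs -copyFromLocal localfile /user/hadoop/hadoopfile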
➢ copyToLocal
Usage: hadoop fs -copyToLocal URI <localdst>
Similar to get command, except that the destination is restricted to a local file
reference.
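Example (paths are illustrative):
hadoop fs -copyToLocal /user/hadoop/hadoopfile localfile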
➢ count
Usage: hadoop fs -count <paths>
Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME.
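Example:
hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2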
➢ cp
Usage: hadoop fs -cp [-f] [-p] URI [URI ...] <dest>
Copy files from source to destination (both paths must be in HDFS). This command allows multiple sources as well, in which case the destination must be a directory.
Options:
The -f option will overwrite the destination if it already exists.
Example:
hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2
➢ mv
Usage: hadoop fs -mv URI [URI ...] <dest>
Moves files from source to destination (both should be in HDFS). This command allows multiple sources as well, in which case the destination needs to be a directory. Moving files across file systems is not permitted.
Example:
hadoop fs -mv hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1
➢ rm
Usage: hadoop fs -rm [-f] [-r |-R] URI [URI ...]
Delete files specified as args.
Options:
The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist.
The -R option deletes the directory and any content under it recursively. The -r option is equivalent to -R.
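Example:
hadoop fs -rm hdfs://nn.example.com/file /user/hadoop/emptydir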
➢ rmdir
Usage: hadoop fs -rmdir [--ignore-fail-on-non-empty] URI [URI ...]
Delete a directory.
Options:
--ignore-fail-on-non-empty: when using wildcards, do not fail if a directory still contains files.
Example:
hadoop fs -rmdir /user/hadoop/emptydir
➢ rmr
Usage: hadoop fs -rmr URI [URI ...]
Recursive version of delete (deprecated; use hadoop fs -rm -r instead).
➢ df
Usage: hadoop fs -df [-h] URI [URI ...]
Displays free space.
Options:
The -h option will format file sizes in a "human-readable" fashion (e.g. 64.0m instead of 67108864).
Example:
hadoop fs -df hdfs://nn.example.com/user/hadoop/dir1
➢ help
Usage: hadoop fs -help
Return usage output.
➢ setrep
Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the replication factor of a file. If path is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at path.
Options:
The -w flag requests that the command wait for the replication to complete. This can potentially take a very long time.
Example:
hadoop fs -setrep -w 3 /user/hadoop/dir1
➢ tail
Usage: hadoop fs -tail URI
Displays last kilobyte of the file to stdout.
Example:
hadoop fs -tail pathname
➢ checksum
Usage: hadoop fs -checksum URI
Returns the checksum information of a file.
Example:
hadoop fs -checksum hdfs://nn1.example.com/file1
➢ chgrp
Usage: hadoop fs -chgrp [-R] GROUP URI [URI ...]
Change group association of files. The user must be the owner of files, or else a super-user. Additional information is in the Permissions Guide.
Options:
The -R option will make the change recursively through the directory structure.
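Example (the group name hadoop is illustrative):
hadoop fs -chgrp -R hadoop /user/hadoop/dir1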
➢ chmod
Usage: hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]
Change the permissions of files. With -R, make the change recursively through the directory structure. The user must be the owner of the file, or else a super-user. Additional information is in the Permissions Guide.
Options:
The -R option will make the change recursively through the directory structure.
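Example (mode and path are illustrative):
hadoop fs -chmod -R 755 /user/hadoop/dir1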
➢ chown
Usage: hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
Change the owner of files. The user must be a super-user. Additional information is in the Permissions Guide.
Options:
The -R option will make the change recursively through the directory structure.
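Example (owner and group are illustrative):
hadoop fs -chown -R hduser:hadoop /user/hadoop/dir1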