Program formation / Execution / Ethical practices (07) | Documentation (02) | Timely Submission (03) | Viva Answer (03) | Experiment Marks (15) | Teacher Signature with date
EXPERIMENT NO : 1
Lab Outcome : Demonstrate capability to use Big Data Frameworks like Hadoop.
THEORY :
Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming
models. The Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to
scale up from a single server to thousands of machines, each offering local computation and storage.
HADOOP ECOSYSTEM
The Hadoop Ecosystem is a platform or suite which provides various services to solve big data problems. It includes Apache projects as well as various commercial tools and
solutions. There are four major elements of Hadoop: HDFS, MapReduce, YARN, and Hadoop Common. Most of the other tools and solutions are used to supplement or
support these major elements. All these tools work collectively to provide services such as ingestion, analysis, storage and maintenance of data.
1. Pig
● A data flow language and execution environment for exploring very large datasets.
o The language used to express data flows, called Pig Latin.
o The execution environment to run Pig Latin programs.
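For example, a Pig Latin script (here a hypothetical file named wordcount.pig) could be executed from the command line either on the cluster or locally:
$ pig -x mapreduce wordcount.pig
$ pig -x local wordcount.pig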
2. Hive
● A distributed data warehouse.
● Hive manages data stored in HDFS and provides a query language based on SQL (which is translated by the runtime engine to MapReduce jobs) for querying the data.
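As an illustrative sketch (assuming the Hive CLI is available on the node), a HiveQL statement can be run directly from the shell and is translated by Hive's runtime engine into MapReduce jobs:
$ hive -e 'SHOW TABLES;'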
3. Sqoop
● Sqoop is an open-source tool that allows users to extract data from a relational database into Hadoop for further processing.
● A tool for efficient bulk transfer of data between structured data stores such as RDBMS and HDFS.
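A typical illustrative Sqoop import (the database salesdb and table customers are hypothetical names; -P prompts for the database password) might look like:
$ sqoop import --connect jdbc:mysql://localhost/salesdb --username cloudera -P --table customers --target-dir /user/cloudera/customers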
4. HBase
● A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads).
● HBase is not relational and does not support SQL, but given the proper problem space, it is able to do what an RDBMS cannot, such as host very large, sparsely populated tables on clusters of commodity hardware.
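A minimal sketch of interacting with HBase from its shell (the table name bdl_table and column family cf are hypothetical examples):
$ hbase shell
create 'bdl_table', 'cf'
put 'bdl_table', 'row1', 'cf:name', 'value1'
get 'bdl_table', 'row1'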
5. Zookeeper
● A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed
applications.
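A small sketch using the ZooKeeper command line client (the znode /bdl_demo is a hypothetical example):
$ zkCli.sh -server localhost:2181
create /bdl_demo "hello"
get /bdl_demo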
There are many more commands in Hadoop than are demonstrated here, although these basic operations will get you started. Running ./bin/hadoop dfs with no
additional arguments will list all the commands that can be run with the FsShell system. Furthermore, ./bin/hadoop fs -help commandName will display a short usage summary for the operation in question.
1) -ls <path>
Lists the contents of the directory specified by path, showing the names, permissions, owner, size and modification date for each entry.
2) -du <path>
Shows disk usage, in bytes, for all the files which match path; filenames are reported with the full HDFS protocol prefix.
3) -rm <path>
Removes the file or directory identified by path.
4) -mkdir <path>
Creates a directory named path in HDFS. Creates any parent directories in the path that are missing (e.g., mkdir -p in Linux).
5) -touchz <path>
Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.
6) -stat [format]<path>
Prints information about the path. Format is a string which accepts file size in blocks (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).
7) -mv <src> <dest>
Moves the file or directory indicated by src to dest, within HDFS.
8) -help <cmd-name>
Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.
9) -get <src> <localDest>
Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.
10) -put <localSrc> <dest>
Copies the file or directory from the local file system identified by localSrc to dest within the DFS.
OUTPUT:
1) help HDFS Shell Command
$ hadoop fs -help
The help HDFS shell command helps Hadoop developers figure out all the available Hadoop commands and how to use them.
2) usage
$ hadoop fs -usage ls
The usage command gives all the options that can be used with a particular HDFS command.
3) ls
$ hadoop fs -ls
This command will list all the available files and subdirectories under the default directory. For instance, in our example the default directory for the Cloudera VM is /user/cloudera.
4) ls <path>
Run against the root directory, returns all the available files and subdirectories present under the root.
With the -R option, recursively lists all the files and subdirectories under /user/cloudera.
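Possible invocations for the two cases above (paths as in the Cloudera VM example):
$ hadoop fs -ls /
$ hadoop fs -ls -R /user/cloudera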
5) copyFromLocal
Copy a file from the local file system to the HDFS location.
For the following examples, we will use the BDL1.txt file available in the /home/BDL location.
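For example, to copy BDL1.txt from /home/BDL into the default HDFS directory /user/cloudera:
$ hadoop fs -copyFromLocal /home/BDL/BDL1.txt /user/cloudera/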
6) put
This hadoop command uploads a single file or multiple source files from local file system to hadoop distributed file system (HDFS).
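An illustrative invocation uploading multiple files in one command (BDL2.txt is a hypothetical second local file):
$ hadoop fs -put /home/BDL/BDL1.txt /home/BDL/BDL2.txt /user/cloudera/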
7) moveFromLocal
This hadoop command functions similarly to the put command, but the source file will be deleted after copying.
Move BDL3.txt available in /home/BDL to /user/cloudera/dezyre1 (hdfs path). Source file will be deleted after moving.
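The move described above can be performed as:
$ hadoop fs -moveFromLocal /home/BDL/BDL3.txt /user/cloudera/dezyre1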
8) du
Displays the disk usage for all the files available under a given directory.
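For example, to see the disk usage of everything under /user/cloudera:
$ hadoop fs -du /user/cloudera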
9) df
Displays the free space available in the HDFS file system.
10) expunge
This HDFS command empties the trash by deleting all the files and directories in it.
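Sample invocations (the -h option prints sizes in a human-readable format):
$ hadoop fs -df -h /
$ hadoop fs -expunge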
11) cat
This is similar to the cat command in Unix and displays the contents of a file.
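For example, to display the contents of the file copied earlier:
$ hadoop fs -cat /user/cloudera/BDL1.txt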
12) cp
Copies a file or directory from one HDFS location to another HDFS location.
13) mv
Moves a file or directory from one HDFS location to another HDFS location; the source is removed after the move.
14) rm
Removes a file (or, with the -r option, a directory) from HDFS.
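Illustrative invocations for these three commands, reusing the example file and the dezyre1 directory from above (the renamed file BDL1_renamed.txt is a hypothetical name):
$ hadoop fs -cp /user/cloudera/BDL1.txt /user/cloudera/dezyre1/
$ hadoop fs -mv /user/cloudera/dezyre1/BDL1.txt /user/cloudera/dezyre1/BDL1_renamed.txt
$ hadoop fs -rm /user/cloudera/dezyre1/BDL1_renamed.txt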
15) tail
This hadoop command will show the last kilobyte of the file to stdout.
Using the tail command with the -f option shows data appended to the file as the file grows.
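For example:
$ hadoop fs -tail /user/cloudera/BDL1.txt
$ hadoop fs -tail -f /user/cloudera/BDL1.txt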
16) copyToLocal
Copies the files to the local filesystem. This is similar to the hadoop fs -get command, but in this case the destination location must be a local file reference.
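For example, to copy a file from HDFS back to the local /home/BDL directory:
$ hadoop fs -copyToLocal /user/cloudera/BDL1.txt /home/BDL/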
17) get
Copies a file or directory from HDFS to the local file system; it works like copyToLocal.
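An equivalent download using get:
$ hadoop fs -get /user/cloudera/BDL1.txt /home/BDL/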
18) touchz
Creates an empty file of size zero at the given HDFS path.
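For example (empty.txt is a hypothetical file name):
$ hadoop fs -touchz /user/cloudera/empty.txt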
19) setrep
This hadoop fs command is used to set the replication for a specific file.
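For example, to set the replication factor of BDL1.txt to 3 and wait (-w) until replication completes:
$ hadoop fs -setrep -w 3 /user/cloudera/BDL1.txt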
20) chgrp
It will change the /dezyre directory group membership from supergroup to cloudera (To perform this operation superuser permission is required)
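The group change described above could be performed as follows (run with superuser permission):
$ hadoop fs -chgrp cloudera /dezyre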
21) chown
This command lets you change both the owner and group name simultaneously.
It will change the /dezyre directory ownership from the hdfs user to the cloudera user (to perform this operation superuser permission is required).
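The ownership change described above, setting both owner and group in one command, could be performed as:
$ hadoop fs -chown cloudera:cloudera /dezyre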
CONCLUSION :
All major universities and companies have started investing in building tools that help them understand and create useful insights from the data they
have access to. One such tool that helps in analyzing and processing Big Data is Hadoop, and in this experiment we have performed some of the basic HDFS shell commands used to store and manage files in the Hadoop Distributed File System.