Experiment No. 1



Aim : To study basic commands of Hadoop Ecosystem.

Lab Outcome No. : 8.ITL801.1

Lab Outcome : Demonstrate capability to use Big Data Frameworks like Hadoop.

Date of Performance: 14/2/22

Date of Submission: 21/2/22

Program formation/Execution/Ethical practices (07) | Documentation (02) | Timely Submission (03) | Viva Answer (03) | Experiment Marks (15) | Teacher Signature with date
EXPERIMENT NO : 1

Aim : To study basic commands of Hadoop Ecosystem.

Lab Outcome No. : 8.ITL801.1

Lab Outcome : Demonstrate capability to use Big Data Frameworks like Hadoop.

THEORY :

Hadoop is an Apache open-source framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

HADOOP ECOSYSTEM
The Hadoop Ecosystem is a platform, or suite, that provides various services to solve big data problems. It includes Apache projects as well as various commercial tools and solutions. There are four major elements of Hadoop: HDFS, MapReduce, YARN, and Hadoop Common. Most of the other tools and solutions are used to supplement or support these major elements, and all of them work collectively to provide services such as ingestion, analysis, storage, and maintenance of data.
1. Pig
● A data flow language and execution environment for exploring very large datasets.

● Pig is made up of two pieces:

o The language used to express data flows, called Pig Latin.

o The execution environment to run Pig Latin programs.
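As a hedged sketch (the script name wordcount.pig and its existence are assumptions, not part of this lab), a Pig Latin script can be executed in local mode from the shell:

$ pig -x local wordcount.pig

Running pig with no script instead opens the interactive Grunt shell, where Pig Latin statements such as LOAD, FILTER, and DUMP can be entered one at a time.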

2. Hive
● A distributed data warehouse.

● Hive manages data stored in HDFS and provides a query language based on SQL (which the runtime engine translates into MapReduce jobs) for querying the data.
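As an illustrative sketch (the table name mytable is an assumption, not a table from this lab), a HiveQL query can be run non-interactively from the command line:

$ hive -e "SELECT COUNT(*) FROM mytable;"

Hive compiles the query into one or more MapReduce jobs and prints the result to stdout.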

3. Sqoop
● Sqoop is an open-source tool that allows users to extract data from a relational database into Hadoop for further processing.

● A tool for efficient bulk transfer of data between structured data stores such as RDBMSs and HDFS.
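A hedged example (the JDBC URL, credentials, and table name are placeholders, not values from this lab):

$ sqoop import --connect jdbc:mysql://localhost/mydb --username dbuser -P --table employees --target-dir /user/cloudera/employees

This imports the employees table from MySQL into HDFS; the companion sqoop export command transfers data in the opposite direction.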

4. HBase
● A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads).

● HBase is not relational and does not support SQL, but given the proper problem space it is able to do what an RDBMS cannot, such as host very large, sparsely populated tables on clusters made from commodity hardware.
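A minimal sketch using the interactive HBase shell (the table name users and column family info are assumptions):

$ hbase shell
hbase> create 'users', 'info'
hbase> put 'users', 'row1', 'info:name', 'alice'
hbase> get 'users', 'row1'

Note that create, put, and get here are HBase shell commands, not SQL; each cell is addressed by row key, column family, and column qualifier.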

5. Zookeeper
● A distributed, highly available coordination service. ZooKeeper provides primitives, such as distributed locks, that can be used for building distributed applications.

● Writing distributed applications is hard; ZooKeeper supplies well-tested coordination primitives so that each application does not have to implement them from scratch.
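As a hedged illustration (the znode path /myapp/lock is a placeholder), the bundled zkCli.sh client can create and inspect znodes, the building blocks for primitives like locks:

$ zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] create /myapp ""
[zk: localhost:2181(CONNECTED) 1] create /myapp/lock ""
[zk: localhost:2181(CONNECTED) 2] ls /myapp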


6. Avro
● A serialization system for efficient, cross-language RPC and persistent data storage.
● Apache Avro is a language-neutral data serialization system.
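As a hedged sketch (the jar version and the file name data.avro are assumptions), the avro-tools utility can inspect an Avro file from the command line:

$ java -jar avro-tools-1.11.1.jar tojson data.avro

This prints each record as a line of JSON, regardless of the language that produced the file.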

HADOOP BASIC COMMANDS

There are many more commands in Hadoop than are demonstrated here, although these basic operations will get you started. Running hadoop fs with no additional arguments will list all the commands that can be run with the FsShell system. Furthermore, hadoop fs -help <command> will display a short usage summary for the operation in question, if you are stuck.

1) -ls <path>

Lists the contents of the directory specified by path, showing the names, permissions, owner, size and modification date for each entry.

[cloudera@quickstart ~]$ ls
cloudera-manager  cm_api.py  Desktop  Documents  Downloads  eclipse  kerberos  lib  Music  parcels  Pictures  Public  Templates  Videos  workspace

(Note that this listing was produced with the local Linux ls command; the HDFS equivalent, hadoop fs -ls, is demonstrated in the OUTPUT section.)

2) -du <path>

Shows disk usage, in bytes, for all the files which match path; filenames are reported with the full HDFS protocol prefix.

3) -rm <path>

Removes file or empty directory identified by path.


4) -mkdir <path>

Creates a directory named path in HDFS. Creates any parent directories in the path that are missing (e.g., mkdir -p in Linux).

[cloudera@quickstart Downloads]$ mkdir abc
[cloudera@quickstart Downloads]$ ls
abc  Desktop  Documents  enterprise-deployment.json  HADOOPL  K-Means-master  K-Means-master.zip
5) -touchz <path>

Creates a file path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.

6) -stat [format] <path>

Prints information about path. Format is a string which accepts file size in bytes (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).

7) -mv <src> <dest>

Moves file or directory indicated by src to dest, within HDFS.


8) -cp <src> <dest>

Copies file or directory identified by src to dest, within HDFS.


9) -help <cmd-name>

Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.

10) -test -[ezd] <path>

Tests whether path exists (-e), has zero length (-z), or is a directory (-d); the exit status is 0 if the test succeeds and 1 otherwise.

11) -get [-crc] <src> <localDest>

Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.

12) -put <localSrc> <dest>

Copies the file or directory from the local file system identified by localSrc to dest within the DFS.
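As a hedged illustration of the -touchz, -stat, and -test commands above (the paths reuse the Sample1.txt and dezyre1 names that appear later in this write-up):

$ hadoop fs -touchz /user/cloudera/dezyre1/empty.txt
$ hadoop fs -stat "%n %b %r %y" /user/cloudera/dezyre1/Sample1.txt
$ hadoop fs -test -e /user/cloudera/dezyre1/Sample1.txt && echo "file exists"

The -stat format string prints the name, size in bytes, replication factor, and modification date; -test sets only the shell exit status, which the && makes visible.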

OUTPUT:
1) help HDFS Shell Command

Syntax of help hdfs Command

$ hadoop fs -help
The help shell command helps Hadoop developers figure out all the available hadoop commands and how to use them.

2) Usage HDFS Shell Command

$ hadoop fs -usage ls
Usage command gives all the options that can be used with a particular hdfs command.

3) ls HDFS Shell Command

Syntax for ls Hadoop Command -

$ hadoop fs -ls

This command will list all the available files and subdirectories under the default directory. For instance, in our example the default directory for the Cloudera VM is /user/cloudera.

Variations of Hadoop ls Shell Command


$ hadoop fs -ls /

Returns all the available files and subdirectories present under the root directory.

$ hadoop fs -ls -R /user/cloudera

Returns all the available files and recursively lists all the subdirectories under /user/cloudera.

4) mkdir- Used to create a new directory in HDFS at a given location.

Example of HDFS mkdir Command -

$ hadoop fs -mkdir /user/cloudera/dezyre1


The above command will create a new directory named dezyre1 under the location /user/cloudera.

5) copyFromLocal

Copy a file from the local file system to the HDFS location.

For the following examples, we will use the file BDL1.txt available in the /home/BDL location.

Example - $ hadoop fs -copyFromLocal BDL/BDL1.txt /user/cloudera/dezyre1

Copies/uploads BDL1.txt, available in /home/BDL, to /user/cloudera/dezyre1 (HDFS path).

6) put –

This hadoop command uploads a single file or multiple source files from the local file system to the Hadoop distributed file system (HDFS).

Example - $ hadoop fs -put BDL/BDL2.txt /user/cloudera/dezyre1

Copies/uploads BDL2.txt, available in /home/BDL, to /user/cloudera/dezyre1 (HDFS path).

7) moveFromLocal
This hadoop command works like the put command, except that the source file is deleted after copying.

Example - $ hadoop fs -moveFromLocal BDL/BDL3.txt /user/cloudera/dezyre1

Moves BDL3.txt, available in /home/BDL, to /user/cloudera/dezyre1 (HDFS path); the source file is deleted after the move.

(Screenshots: directory listings before and after moving.)

8) du

Displays the disk usage for all the files available under a given directory.

Example - $ hadoop fs -du /user/cloudera/dezyre1

9) df

Displays the capacity, used space, and available space of the Hadoop distributed file system.

Example - $ hadoop fs -df


10) expunge

This HDFS command empties the trash, permanently deleting the files and directories it contains.

Example - $ hadoop fs -expunge

11) cat

This is similar to the cat command in Unix and displays the contents of a file.

Example - $ hadoop fs -cat /user/cloudera/dezyre1/Sample1.txt

12) cp

Copy files from one HDFS location to another HDFS location.

Example - $ hadoop fs -cp /user/cloudera/dezyre/war_and_peace /user/cloudera/dezyre1/

13) mv

Move files from one HDFS location to another HDFS location.

Example - $ hadoop fs -mv /user/cloudera/dezyre1/Sample1.txt /user/cloudera/dezyre/

14) rm

Removes the file or directory from the given HDFS location; the -r flag deletes a directory and its contents recursively.

Example - $ hadoop fs -rm -r /user/cloudera/dezyre3


15) tail

This hadoop command shows the last kilobyte of the file on stdout.

Example - $ hadoop fs -tail /user/cloudera/dezyre/war_and_peace

Example - $ hadoop fs -tail -f /user/cloudera/dezyre/war_and_peace

With the -f option, tail follows the file and keeps printing new data as it is appended, just like the Unix tail -f.

16) copyToLocal

Copies files to the local filesystem. This is similar to the hadoop fs -get command, but in this case the destination location must be a local file reference.

Example - $ hadoop fs -copyToLocal /user/cloudera/dezyre1/Sample1.txt /home/cloudera/hdfs_bkp/

Copies/downloads Sample1.txt, available in /user/cloudera/dezyre1 (HDFS path), to /home/cloudera/hdfs_bkp/ (local path).

17) get

Downloads or Copies the files to the local filesystem.

Example - $ hadoop fs -get /user/cloudera/dezyre1/Sample2.txt /home/cloudera/hdfs_bkp/

Copies/downloads Sample2.txt, available in /user/cloudera/dezyre1 (HDFS path), to /home/cloudera/hdfs_bkp/ (local path).

18) touchz

Used to create an empty file at the specified location.

Example - $ hadoop fs -touchz /user/cloudera/dezyre1/Sample4.txt

It will create a new empty file Sample4.txt in /user/cloudera/dezyre1/ (HDFS path).

19) setrep

This hadoop fs command is used to set the replication factor for a specific file.

Example - $ hadoop fs -setrep -w 1 /user/cloudera/dezyre1/Sample1.txt

It will set the replication factor of Sample1.txt to 1; the -w flag makes the command wait until the new replication is achieved.

20) chgrp

This hadoop command is used to change the group of a file or directory.

Example - $ sudo -u hdfs hadoop fs -chgrp -R cloudera /dezyre

It will change the /dezyre directory's group membership from supergroup to cloudera, recursively because of -R. (Superuser permission is required to perform this operation.)
21) chown

This command changes the owner of a file or directory; using the owner:group syntax, it can change both the owner and the group simultaneously.

Example - $ sudo -u hdfs hadoop fs -chown -R cloudera /dezyre

It will change the ownership of the /dezyre directory from the hdfs user to the cloudera user. (Superuser permission is required to perform this operation.)

22) chmod

Used to change the permissions of a given file or directory.

Example - $ hadoop fs -chmod 700 /dezyre

It will change the /dezyre directory permissions to 700 (drwx------).

CONCLUSION :

Many universities and companies have started investing in tools that help them understand, and create useful insights from, the data they have access to. Hadoop is one such tool for analyzing and processing Big Data, and in this experiment we performed some of the basic operations used in Hadoop via Cloudera.
