Program formation / Execution / Ethical practices (07) | Documentation (02) | Timely Submission (03) | Viva Answer (03) | Experiment Marks (15) | Teacher Signature with date
EXPERIMENT NO : 1
Lab Outcome : Demonstrate capability to use Big Data Frameworks like Hadoop.
THEORY :
Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming
models. The Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to
scale up from a single server to thousands of machines, each offering local computation and storage.
HADOOP ECOSYSTEM
The Hadoop Ecosystem is a platform or suite which provides various services to solve big data problems. It includes Apache projects as well as various commercial tools and
solutions. There are four major elements of Hadoop: HDFS, MapReduce, YARN, and Hadoop Common. Most of the other tools and solutions are used to supplement or
support these major elements. All these tools work collectively to provide services such as ingestion, analysis, storage and maintenance of data.
1. Pig
● A data flow language and execution environment for exploring very large datasets.
o The language used to express data flows, called Pig Latin.
o The execution environment to run Pig Latin programs.
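For example, a Pig Latin script (here a hypothetical file named wordcount.pig) could be executed from the command line either on the cluster or locally:
$ pig -x mapreduce wordcount.pig
$ pig -x local wordcount.pig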
2. Hive
● A distributed data warehouse.
● Hive manages data stored in HDFS and provides a query language based on SQL (which is translated by the runtime engine to MapReduce jobs) for querying the data.
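As an illustrative sketch (assuming the Hive CLI is available on the node), a HiveQL statement can be run directly from the shell and is translated by Hive's runtime engine into MapReduce jobs:
$ hive -e 'SHOW TABLES;'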
3. Sqoop
● Sqoop is an open-source tool that allows users to extract data from a relational database into Hadoop for further processing.
● A tool for efficient bulk transfer of data between structured data stores such as RDBMS and HDFS.
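A typical illustrative Sqoop import (the database salesdb and table customers are hypothetical names; -P prompts for the database password) might look like:
$ sqoop import --connect jdbc:mysql://localhost/salesdb --username cloudera -P --table customers --target-dir /user/cloudera/customers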
4. HBase
● A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads).
● HBase is not relational and does not support SQL, but given the proper problem space, it is able to do what an RDBMS cannot, such as host very large, sparsely populated tables on clusters of commodity hardware.
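A minimal sketch of interacting with HBase from its shell (the table name bdl_table and column family cf are hypothetical examples):
$ hbase shell
create 'bdl_table', 'cf'
put 'bdl_table', 'row1', 'cf:name', 'value1'
get 'bdl_table', 'row1'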
5. Zookeeper
● A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed
applications.
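A small sketch using the ZooKeeper command line client (the znode /bdl_demo is a hypothetical example):
$ zkCli.sh -server localhost:2181
create /bdl_demo "hello"
get /bdl_demo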
There are many more commands in Hadoop than are demonstrated here, although these basic operations will get you started. Running ./bin/hadoop dfs with no
additional arguments will list all the commands that can be run with the FsShell system. Furthermore, ./bin/hadoop fs -help commandName will display a short usage summary for the operation in question.
1) -ls <path>
Lists the contents of the directory specified by path, showing the names, permissions, owner, size and modification date for each entry.
2) -du <path>
Shows disk usage, in bytes, for all the files which match path; filenames are reported with the full HDFS protocol prefix.
3) -rm <path>
Removes the file or directory identified by path.
4) -mkdir <path>
Creates a directory named path in HDFS. Creates any parent directories in the path that are missing (e.g., mkdir -p in Linux).
5) -touchz <path>
Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.
6) -stat [format]<path>
Prints information about the path. Format is a string which accepts file size in blocks (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).
7) -mv <src> <dest>
Moves the file or directory indicated by src to dest, within HDFS.
8) -help <cmd-name>
Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.
9) -get <src> <localDest>
Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.
10) -put <localSrc> <dest>
Copies the file or directory from the local file system identified by localSrc to dest within the DFS.
OUTPUT:
1) help HDFS Shell Command
$ hadoop fs -help
The help HDFS shell command helps Hadoop developers figure out all the available Hadoop commands and how to use them.
2) usage
$ hadoop fs -usage ls
The usage command gives all the options that can be used with a particular HDFS command.
3) ls
$ hadoop fs -ls
This command will list all the available files and subdirectories under the default directory. For instance, in our example the default directory for the Cloudera VM is /user/cloudera.
4) ls <path>
Run against the root directory, returns all the available files and subdirectories present under the root.
With the -R option, recursively lists all the files and subdirectories under /user/cloudera.
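Possible invocations for the two cases above (paths as in the Cloudera VM example):
$ hadoop fs -ls /
$ hadoop fs -ls -R /user/cloudera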
5) copyFromLocal
Copy a file from the local file system to the HDFS location.
For the following examples, we will use the BDL1.txt file available in the /home/BDL location.
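For example, to copy BDL1.txt from /home/BDL into the default HDFS directory /user/cloudera:
$ hadoop fs -copyFromLocal /home/BDL/BDL1.txt /user/cloudera/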
6) put
This hadoop command uploads a single file or multiple source files from local file system to hadoop distributed file system (HDFS).
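An illustrative invocation uploading multiple files in one command (BDL2.txt is a hypothetical second local file):
$ hadoop fs -put /home/BDL/BDL1.txt /home/BDL/BDL2.txt /user/cloudera/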
7) moveFromLocal
This hadoop command functions similarly to the put command, but the source file will be deleted after copying.
Move BDL3.txt available in /home/BDL to /user/cloudera/dezyre1 (hdfs path). Source file will be deleted after moving.
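The move described above can be performed as:
$ hadoop fs -moveFromLocal /home/BDL/BDL3.txt /user/cloudera/dezyre1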
8) du
Displays the disk usage for all the files available under a given directory.
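For example, to see the disk usage of everything under /user/cloudera:
$ hadoop fs -du /user/cloudera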
9) df
Displays the free space available in the HDFS file system.
10) expunge
This HDFS command empties the trash by deleting all the files and directories in it.
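Sample invocations (the -h option prints sizes in a human-readable format):
$ hadoop fs -df -h /
$ hadoop fs -expunge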
11) cat
This is similar to the cat command in Unix and displays the contents of a file.
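For example, to display the contents of the file copied earlier:
$ hadoop fs -cat /user/cloudera/BDL1.txt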
12) cp
Copies a file or directory from one HDFS location to another HDFS location.
13) mv
Moves a file or directory from one HDFS location to another HDFS location; the source is removed after the move.
14) rm
Removes a file (or, with the -r option, a directory) from HDFS.
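Illustrative invocations for these three commands, reusing the example file and the dezyre1 directory from above (the renamed file BDL1_renamed.txt is a hypothetical name):
$ hadoop fs -cp /user/cloudera/BDL1.txt /user/cloudera/dezyre1/
$ hadoop fs -mv /user/cloudera/dezyre1/BDL1.txt /user/cloudera/dezyre1/BDL1_renamed.txt
$ hadoop fs -rm /user/cloudera/dezyre1/BDL1_renamed.txt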
15) tail
This hadoop command will show the last kilobyte of the file to stdout.
Using the tail command with the -f option shows data appended to the file as the file grows.
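For example:
$ hadoop fs -tail /user/cloudera/BDL1.txt
$ hadoop fs -tail -f /user/cloudera/BDL1.txt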
16) copyToLocal
Copies the files to the local filesystem. This is similar to the hadoop fs -get command, but in this case the destination location must be a local file reference.
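For example, to copy a file from HDFS back to the local /home/BDL directory:
$ hadoop fs -copyToLocal /user/cloudera/BDL1.txt /home/BDL/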
17) get
Copies a file or directory from HDFS to the local file system; it works like copyToLocal.
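An equivalent download using get:
$ hadoop fs -get /user/cloudera/BDL1.txt /home/BDL/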
18) touchz
Creates an empty file of size zero at the given HDFS path.
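For example (empty.txt is a hypothetical file name):
$ hadoop fs -touchz /user/cloudera/empty.txt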
19) setrep
This hadoop fs command is used to set the replication for a specific file.
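For example, to set the replication factor of BDL1.txt to 3 and wait (-w) until replication completes:
$ hadoop fs -setrep -w 3 /user/cloudera/BDL1.txt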
20) chgrp
It will change the /dezyre directory group membership from supergroup to cloudera (To perform this operation superuser permission is required)
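The group change described above could be performed as follows (run with superuser permission):
$ hadoop fs -chgrp cloudera /dezyre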
21) chown
This command lets you change both the owner and group name simultaneously.
It will change the /dezyre directory ownership from the hdfs user to the cloudera user (to perform this operation superuser permission is required).
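The ownership change described above, setting both owner and group in one command, could be performed as:
$ hadoop fs -chown cloudera:cloudera /dezyre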
CONCLUSION :
All major universities and companies have started investing in building tools that help them understand and create useful insights from the data they
have access to. One such tool that helps in analyzing and processing Big Data is Hadoop, and in this experiment we have performed some of the basic HDFS shell commands used to store and manage files in the Hadoop Distributed File System.