Labs Hadoop1
Labs, Lecture 1
General Notes on Homework Labs
Homework for this course is completed using the course Virtual Machine (VM),
which runs the CentOS 6.3 Linux distribution. This VM has CDH (Cloudera's
Distribution, including Apache Hadoop) installed in pseudo-distributed mode.
Pseudo-distributed mode is a method of running Hadoop whereby all Hadoop
daemons run on the same machine. It is, essentially, a cluster consisting of a single
machine. It works just like a larger Hadoop cluster; the key difference (apart from
speed, of course!) is that the block replication factor is set to one, since there is only
a single DataNode available.
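If you are curious, you can see this setting on the VM itself. Assuming the standard
CDH configuration directory (the exact path can differ between CDH versions), the
replication factor appears in hdfs-site.xml:
$ grep -A 1 dfs.replication /etc/hadoop/conf/hdfs-site.xml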
Getting Started
1. The VM is set to automatically log in as the user training. Should you log out
at any time, you can log back in as the user training with the password
training.
2. In some command-line steps in the labs, you will see lines like this:
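$ hadoop fs -put shakespeare \
/user/training/shakespeare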
The dollar sign ($) at the beginning of each line indicates the Linux shell prompt.
The actual prompt will include additional information (e.g.,
[training@localhost workspace]$), but this is omitted from these
instructions for brevity.
The backslash (\) at the end of the first line signifies that the command is not
complete and continues on the next line. You can enter the code exactly as
shown (on two lines), or you can enter it on a single line. If you do the latter, you
should not type in the backslash.
3. Although many students are comfortable using UNIX text editors like vi or
emacs, some might prefer a graphical text editor. To invoke the graphical editor
from the command line, type gedit followed by the path of the file you wish to
edit. Appending & to the command allows you to type additional commands
while the editor is still open. Here is an example of how to edit a file named
myfile.txt:
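$ gedit myfile.txt &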
Lab: Using HDFS
Files Used in This Exercise:
Data files (local):
~/training_materials/developer/data/shakespeare.tar.gz
~/training_materials/developer/data/access_log.gz
In this lab you will begin to get acquainted with the Hadoop tools. You will
manipulate files in HDFS, the Hadoop Distributed File System.
If you have not already done so, run the course setup script:
$ ~/scripts/developer/training_setup_dev.sh
Hadoop
Hadoop is already installed, configured, and running on your virtual machine.
Most of your interaction with the system will be through a command-line wrapper
called hadoop. If you run this program with no arguments, it prints a help message.
To try this, run the following command in a terminal window:
$ hadoop
The hadoop command is subdivided into several subsystems. For example, there is
a subsystem for working with files in HDFS and another for launching and managing
MapReduce processing jobs.
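For example, hadoop fs invokes the filesystem subsystem (used throughout this
lab), while hadoop jar submits a MapReduce job packaged as a JAR file. The JAR
and class names below are placeholders:
$ hadoop fs -ls /
$ hadoop jar myjob.jar MyDriver input output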
Step 1: Exploring HDFS
The subsystem associated with HDFS in the Hadoop wrapper program is called
FsShell. This subsystem can be invoked with the command hadoop fs.
1. Open a terminal window (if one is not already open) by double-clicking the
Terminal icon on the desktop.
2. Enter:
$ hadoop fs
You see a help message describing all the commands associated with the
FsShell subsystem.
3. Enter:
$ hadoop fs -ls /
This shows you the contents of the root directory in HDFS. There will be
multiple entries, one of which is /user. Individual users have a “home”
directory under this directory, named after their username; your username in
this course is training, therefore your home directory is /user/training.
4. List the contents of your home directory:
$ hadoop fs -ls /user/training
There are no files yet, so the command silently exits. This is different from
running hadoop fs -ls /foo, which refers to a directory that doesn't exist;
in that case, an error message would be displayed.
Note that the directory structure in HDFS has nothing to do with the directory
structure of the local filesystem; they are completely separate namespaces.
Step 2: Uploading Files
1. Change directories to the local filesystem directory containing the sample data
we will be using in the homework labs.
$ cd ~/training_materials/developer/data
If you perform a regular Linux ls command in this directory, you will see a few
files, including two named shakespeare.tar.gz and
shakespeare-stream.tar.gz. Both contain the complete works of Shakespeare
in text format, but they are packaged and organized differently. For now we will
work with shakespeare.tar.gz.
2. Unzip shakespeare.tar.gz:
$ tar zxvf shakespeare.tar.gz
This creates a directory named shakespeare/ containing several files on your
local filesystem.
3. Insert this directory into HDFS:
$ hadoop fs -put shakespeare /user/training/shakespeare
This copies the local shakespeare directory and its contents into a remote
HDFS directory named /user/training/shakespeare.
4. List the contents of your HDFS home directory now:
$ hadoop fs -ls /user/training
You should see an entry for the shakespeare directory.
5. Now try the same fs -ls command but without a path argument:
$ hadoop fs -ls
You should see the same results. If you don’t pass a directory name to the -ls
command, it assumes you mean your home directory, i.e. /user/training.
Relative paths
If you pass any relative (non-absolute) paths to FsShell commands (or use
relative paths in MapReduce programs), they are considered relative to your
home directory.
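For example, because your home directory is /user/training, the following
two commands are equivalent:
$ hadoop fs -ls shakespeare
$ hadoop fs -ls /user/training/shakespeare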
6. We will also need a sample web server log file, which we will put into HDFS for
use in future labs. This file is currently compressed using GZip. Rather than
extract the file to the local disk and then upload it, we will extract and upload in
one step. First, create a directory in HDFS in which to store it:
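$ hadoop fs -mkdir weblog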
7. Now, extract and upload the file in one step. The -c option to gunzip
uncompresses to standard output, and the dash (-) in the hadoop fs -put
command takes whatever is being sent to its standard input and places that data
in HDFS.
$ gunzip -c access_log.gz \
| hadoop fs -put - weblog/access_log
8. Run the hadoop fs -ls command to verify that the log file is in your HDFS
home directory.
9. The access log file is quite large – around 500 MB. Create a smaller version of
this file, consisting only of its first 5000 lines, and store the smaller version in
HDFS. You can use the smaller version for testing in subsequent labs.
$ hadoop fs -mkdir testlog
$ gunzip -c access_log.gz | head -n 5000 \
| hadoop fs -put - testlog/test_access_log
Step 3: Viewing and Manipulating Files
1. Enter:
$ hadoop fs -ls shakespeare
2. The glossary file included in the compressed file you began with is not
strictly a work of Shakespeare, so let’s remove it:
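$ hadoop fs -rm shakespeare/glossary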
Note that you could leave this file in place if you so wished. If you did, then it
would be included in subsequent computations across the works of
Shakespeare, and would skew your results slightly. As with many real-world big
data problems, you make trade-offs between the labor to purify your input data
and the precision of your results.
3. Enter:
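$ hadoop fs -cat shakespeare/histories | tail -n 50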
This prints the last 50 lines of Henry IV, Part 1 to your terminal. This command
is handy for viewing the output of MapReduce programs. Very often, an
individual output file of a MapReduce program is very large, making it
inconvenient to view the entire file in the terminal. For this reason, it’s often a
good idea to pipe the output of the fs -cat command into head, tail, more,
or less.
4. To download a file to work with on the local filesystem use the fs -get
command. This command takes two arguments: an HDFS path and a local path.
It copies the HDFS contents into the local filesystem:
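$ hadoop fs -get shakespeare/poems ~/shakepoems.txt
$ tail ~/shakepoems.txt
(The file and destination names here are just examples; any HDFS path and local
path will do.)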
Other Commands
There are several other operations available with the hadoop fs command to
perform the most common filesystem manipulations: mv, cp, mkdir, etc.
1. Enter:
$ hadoop fs
This displays a brief usage report of the commands available within FsShell.
Try playing around with a few of these commands if you like.
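For example (using illustrative names):
$ hadoop fs -cp shakespeare/poems poems-copy
$ hadoop fs -mv poems-copy poems-renamed
$ hadoop fs -rm poems-renamed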