Hands-On Hadoop Tutorial

This document provides an overview of the Hands-On Hadoop tutorial. It explains that Hadoop uses HDFS, a distributed file system based on GFS, for shared storage. HDFS divides files into large 64 MB chunks distributed across data servers. It also describes the master and slave node architecture and how to start, stop, and use HDFS to manage files. Configuration and adding new slave nodes are also summarized.

Copyright
© Attribution Non-Commercial (BY-NC)


Hands-On Hadoop Tutorial
Chris Sosa
Wolfgang Richter
May 23, 2008
General Information
 Hadoop uses HDFS, a distributed file system based on GFS, as its shared filesystem
 HDFS architecture divides files into large chunks (~64 MB) distributed across data servers
 HDFS has a global namespace

General Information (cont’d)
 A script is provided for your convenience
– Run source /localtmp/hadoop/setupVars from centurion064
– Changes all uses of {somePath}/command to just command
 Go to http://www.cs.virginia.edu/~cbs6n/hadoop for web access. These slides and more information are also available there.
 Once you use the DFS (put something in it), relative paths are from /usr/{your usr id}, e.g. if your id is tb28, your “home dir” is /usr/tb28
Master Node
 Hadoop is currently configured with centurion064 as the master node
 The master node
– Keeps track of the namespace and metadata about items
– Keeps track of MapReduce jobs in the system
Slave Nodes
 Centurion064 also acts as a slave node
 Slave nodes
– Manage blocks of data sent from the master node
– In GFS terms, these are the chunkservers
 Currently centurion060 is also a slave node
Hadoop Paths
 Hadoop is locally “installed” on each machine
– Installed location is /localtmp/hadoop/hadoop-0.15.3
– Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is created automatically by the DFS)
– /localtmp/hadoop is owned by group gbg (someone in this group, or a CS admin, must administer it)
 Files are divided into 64 MB chunks (this is configurable)
Starting / Stopping Hadoop
 For the purposes of this tutorial, we assume you have run setupVars as described earlier
 start-all.sh – starts the master node and all slave nodes
 stop-all.sh – stops the master node and all slave nodes
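A typical session using these scripts might look like the following sketch, assuming setupVars has put the Hadoop scripts on your PATH (the jps check is our addition, not part of the slides; jps ships with the JDK and lists running Java daemons such as the NameNode and DataNodes):

```shell
# Start the HDFS and MapReduce daemons on the master and all slaves
start-all.sh

# Optional sanity check: list the running Java daemons
jps

# ... run your jobs ...

# Shut the whole cluster down again
stop-all.sh
```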
Using HDFS (1/2)
 hadoop dfs
– [-ls <path>]
– [-du <path>]
– [-cp <src> <dst>]
– [-rm <path>]
– [-put <localsrc> <dst>]
– [-copyFromLocal <localsrc> <dst>]
– [-moveFromLocal <localsrc> <dst>]
– [-get [-crc] <src> <localdst>]
– [-cat <src>]
– [-copyToLocal [-crc] <src> <localdst>]
– [-moveToLocal [-crc] <src> <localdst>]
– [-mkdir <path>]
– [-touchz <path>]
– [-test -[ezd] <path>]
– [-stat [format] <path>]
– [-help [cmd]]
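As a concrete illustration of the commands above, a minimal round trip through the DFS might look like this (the file names are hypothetical; per these slides, relative DFS paths resolve under /usr/{your usr id}):

```shell
# Copy a local file into the DFS
hadoop dfs -put results.txt results.txt

# List your DFS home directory and check the file's size
hadoop dfs -ls
hadoop dfs -du results.txt

# Print the file, then copy it back out to the local filesystem
hadoop dfs -cat results.txt
hadoop dfs -get results.txt /tmp/results-copy.txt
```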
Using HDFS (2/2)
 Want to reformat?
 Easy
– hadoop namenode -format
 Basically, most commands look similar
– hadoop “some command” options
– If you just type hadoop, you get all possible commands (including undocumented ones – hooray)
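Reformatting erases the DFS namespace, so a cautious sequence (an assumption on our part; the slides do not spell this out) is to stop the daemons first:

```shell
# Stop all daemons before touching the namenode's on-disk state
stop-all.sh

# Re-initialize the namenode (this wipes the DFS namespace!)
hadoop namenode -format

# Bring the cluster back up with a fresh, empty filesystem
start-all.sh
```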
To Add Another Slave
 This adds another data node / job execution site to the pool
– Hadoop dynamically uses the filesystem underneath it
– If more space is available on the HDD, HDFS will try to use it when it needs to
 Modify the slaves file
– In centurion064:/localtmp/hadoop/hadoop-0.15.3/conf
– Copy the code installation dir to newMachine:/localtmp/hadoop/hadoop-0.15.3 (very small)
– Restart Hadoop
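The three steps above might be scripted roughly as follows, run from centurion064 (newMachine stands in for the new slave's hostname, as in the slides):

```shell
# 1. Register the new slave in the slaves file
echo "newMachine" >> /localtmp/hadoop/hadoop-0.15.3/conf/slaves

# 2. Copy the (small) Hadoop installation to the new machine
scp -r /localtmp/hadoop/hadoop-0.15.3 newMachine:/localtmp/hadoop/

# 3. Restart Hadoop so the master picks up the new node
stop-all.sh
start-all.sh
```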
Configure Hadoop
 Configuration lives in {installation dir}/conf
– hadoop-default.xml for global defaults
– hadoop-site.xml for site-specific settings (overrides the global defaults)
That’s it for Configuration!
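As a sketch of a site-specific override, hadoop-site.xml entries use Hadoop's name/value property format; the 64 MB chunk size mentioned earlier corresponds to the dfs.block.size property in this Hadoop version (the value shown is illustrative, not a recommendation):

```shell
# Write a minimal site-specific override (values here are examples only)
cat > /localtmp/hadoop/hadoop-0.15.3/conf/hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>  <!-- 64 MB, matching the default chunk size -->
  </property>
</configuration>
EOF
```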
Real-time Access
