Hands-On Hadoop Tutorial

This document provides an overview of the Hands-On Hadoop tutorial. It explains that Hadoop uses HDFS, a distributed file system based on GFS, for shared storage. HDFS divides files into large 64 MB chunks distributed across data servers. It also describes the master and slave node architecture and how to start, stop, and use HDFS to manage files. Configuration and adding new slave nodes are also summarized.

Copyright
© Attribution Non-Commercial (BY-NC)


Hands-On Hadoop Tutorial
Chris Sosa
Wolfgang Richter
May 23, 2008
General Information
 Hadoop uses HDFS, a distributed file system based on GFS, as its shared filesystem
 HDFS architecture divides files into large chunks (~64 MB) distributed across data servers
 HDFS has a global namespace

General Information (cont’d)
 A script is provided for your convenience
– Run source /localtmp/hadoop/setupVars from centurion064
– Changes all uses of {somePath}/command to just command
 Go to http://www.cs.virginia.edu/~cbs6n/hadoop for web access. These slides and more information are also available there.
 Once you use the DFS (put something in it), relative paths are from /usr/{your usr id}, e.g. if your id is tb28, your “home dir” is /usr/tb28
Master Node
 Hadoop is currently configured with centurion064 as the master node
 The master node
– Keeps track of the namespace and metadata about items
– Keeps track of MapReduce jobs in the system
Slave Nodes
 Centurion064 also acts as a slave node
 Slave nodes
– Manage blocks of data sent from the master node
– In GFS terms, these are the chunkservers
 Currently centurion060 is also a slave node
Hadoop Paths
 Hadoop is locally “installed” on each machine
– Installed location is /localtmp/hadoop/hadoop-0.15.3
– Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is created automatically by the DFS)
– /localtmp/hadoop is owned by group gbg (someone in this group, or a CS admin, must administer it)
 Files are divided into 64 MB chunks (this is configurable)
Starting / Stopping Hadoop
 For the purposes of this tutorial, we assume you have run setupVars as described earlier
 start-all.sh – starts the master node and all slave nodes
 stop-all.sh – stops the master node and all slave nodes
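A typical session using these scripts might look like the following sketch, assuming setupVars has put the Hadoop scripts on your PATH (the jps check is our addition, not part of the slides; jps ships with the JDK and lists running Java daemons such as the NameNode and DataNodes):

```shell
# Start the HDFS and MapReduce daemons on the master and all slaves
start-all.sh

# Optional sanity check: list the running Java daemons
jps

# ... run your jobs ...

# Shut the whole cluster down again
stop-all.sh
```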
Using HDFS (1/2)
 hadoop dfs
– [-ls <path>]
– [-du <path>]
– [-cp <src> <dst>]
– [-rm <path>]
– [-put <localsrc> <dst>]
– [-copyFromLocal <localsrc> <dst>]
– [-moveFromLocal <localsrc> <dst>]
– [-get [-crc] <src> <localdst>]
– [-cat <src>]
– [-copyToLocal [-crc] <src> <localdst>]
– [-moveToLocal [-crc] <src> <localdst>]
– [-mkdir <path>]
– [-touchz <path>]
– [-test -[ezd] <path>]
– [-stat [format] <path>]
– [-help [cmd]]
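As a concrete illustration of the commands above, a minimal round trip through the DFS might look like this (the file names are hypothetical; per these slides, relative DFS paths resolve under /usr/{your usr id}):

```shell
# Copy a local file into the DFS
hadoop dfs -put results.txt results.txt

# List your DFS home directory and check the file's size
hadoop dfs -ls
hadoop dfs -du results.txt

# Print the file, then copy it back out to the local filesystem
hadoop dfs -cat results.txt
hadoop dfs -get results.txt /tmp/results-copy.txt
```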
Using HDFS (2/2)
 Want to reformat?
 Easy
– hadoop namenode -format
 Basically, most commands look similar
– hadoop “some command” options
– If you just type hadoop, you get all possible commands (including undocumented ones – hooray)
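Reformatting erases the DFS namespace, so a cautious sequence (an assumption on our part; the slides do not spell this out) is to stop the daemons first:

```shell
# Stop all daemons before touching the namenode's on-disk state
stop-all.sh

# Re-initialize the namenode (this wipes the DFS namespace!)
hadoop namenode -format

# Bring the cluster back up with a fresh, empty filesystem
start-all.sh
```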
To Add Another Slave
 This adds another data node / job execution site to the pool
– Hadoop dynamically uses the filesystem underneath it
– If more space is available on the HDD, HDFS will try to use it when it needs to
 Modify the slaves file
– In centurion064:/localtmp/hadoop/hadoop-0.15.3/conf
– Copy the code installation dir to newMachine:/localtmp/hadoop/hadoop-0.15.3 (very small)
– Restart Hadoop
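The three steps above might be scripted roughly as follows, run from centurion064 (newMachine stands in for the new slave's hostname, as in the slides):

```shell
# 1. Register the new slave in the slaves file
echo "newMachine" >> /localtmp/hadoop/hadoop-0.15.3/conf/slaves

# 2. Copy the (small) Hadoop installation to the new machine
scp -r /localtmp/hadoop/hadoop-0.15.3 newMachine:/localtmp/hadoop/

# 3. Restart Hadoop so the master picks up the new node
stop-all.sh
start-all.sh
```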
Configure Hadoop
 Configuration lives in {installation dir}/conf
– hadoop-default.xml for global defaults
– hadoop-site.xml for site-specific settings (overrides the global defaults)
That’s it for Configuration!
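As a sketch of a site-specific override, hadoop-site.xml entries use Hadoop's name/value property format; the 64 MB chunk size mentioned earlier corresponds to the dfs.block.size property in this Hadoop version (the value shown is illustrative, not a recommendation):

```shell
# Write a minimal site-specific override (values here are examples only)
cat > /localtmp/hadoop/hadoop-0.15.3/conf/hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>  <!-- 64 MB, matching the default chunk size -->
  </property>
</configuration>
EOF
```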
Real-time Access
