
Big Data Analysis Workshop:
Accessing Hadoop Cluster on Wrangler

Drs. Weijia Xu, Ruizhu Huang and Amit Gupta


Data Mining & Statistics Group
Texas Advanced Computing Center
University of Texas at Austin

Sept. 28~29, 2017


Atlanta, GA
Hadoop
•  Hadoop is an open-source implementation of the
MapReduce programming model in Java, with
interfaces to other programming languages such as
C/C++ and Python.
•  The top 6 vendors offering Big Data Hadoop solutions
are:
•  Cloudera
•  Hortonworks
•  Amazon Web Services Elastic MapReduce Hadoop
Distribution
•  Microsoft
•  MapR
•  IBM InfoSphere Insights
Hadoop includes
– HDFS, a distributed file system based on the Google File System
(GFS), as its shared file system.
– YARN, a resource manager that assigns resources to
computational tasks.
– MapReduce, a library enabling efficient distributed data
processing.
– Mahout, a scalable machine learning and data mining library.
– Hadoop Streaming, which enables processing with other
languages (see the sketch below).
–…
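
As a quick, minimal sketch of Hadoop Streaming, the job below uses ordinary
Unix commands as mapper and reducer to count input lines. The jar path and
HDFS directories are assumptions that vary by installation:

# Hadoop Streaming sketch: count input lines with cat/wc.
# Assumption: streaming jar location and HDFS paths; adjust for your cluster.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /user/$USER/input \
  -output /user/$USER/output \
  -mapper /bin/cat \
  -reducer /usr/bin/wc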
Wrangler
[Architecture diagram: replicated 10 PB mass storage subsystems at
TACC and Indiana; a 500+ TB high-speed NAND flash storage system with
1 TB/s throughput and 250M+ IOPS; analysis systems of 96 nodes (TACC)
and 24 nodes (Indiana), each with 128 GB+ memory and Haswell CPUs; a
120-lane (56 Gb/s) non-blocking IB interconnect; 40 Gb/s Ethernet and
100 Gbps public network access, including Globus.]

•  A direct attached PCI interface allows access to the NAND flash.
•  Not limited by a networking connection.
•  Flash storage is not tied to individual nodes.
•  The Hadoop cluster can be dynamically created over 2 to 48 nodes
for each project to use in its allocated time.
•  Each node has access to 4 TB of flash storage across four channels.
•  The Hadoop cluster is accessible via idev, batch job submission,
and VNC sessions.

Hadoop Cluster on Wrangler
•  Started dynamically upon a Hadoop reservation request.
Usually, you need two steps:

•  Step 1: Create a Hadoop reservation through the Wrangler
data portal.
What do you need?
Any web browser

•  Step 2: Access your Hadoop cluster and submit jobs.
What do you need?
Secure Shell client
Any VNC client
However, for this course,

Step 1: Create a Hadoop reservation through the Wrangler data
portal. Done!
Reservation name: hadoop+TRAINING-OPEN+2375

Step 2: Access your Hadoop cluster and submit jobs.
What do you need?
Secure Shell client
Any VNC client
Multi-Factor Authentication
•  Multi-Factor Authentication with Duo
https://portal.xsede.org/mfa
Check Hadoop Reservation
•  Log on to a Wrangler login node from your SSH client:

>ssh [email protected]

•  Users can check the reservation status with the `scontrol` command:



>scontrol show reservation
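
The command prints one block of Field=Value pairs per reservation; the
values below are purely illustrative (names, times, and node lists will
differ on your cluster):

ReservationName=hadoop+TRAINING-OPEN+2375 StartTime=2017-09-28T09:00:00
   EndTime=2017-09-29T18:00:00 Duration=1-09:00:00 Nodes=c252-[101-112]
   NodeCnt=12 Accounts=TRAINING-OPEN State=ACTIVE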

•  The reservation will include all users from the project.

•  The first node in the reservation will be used as the namenode.

Access Hadoop Reservation
Once the reservation status is “active”, a user can access the
cluster through a Slurm job:

•  VNC job: starts a VNC server session on one of the nodes in the
Hadoop cluster.
Check cluster information and Hadoop job status
Applications with a graphical/web user interface

•  idev job: assigns one node in the Hadoop cluster to the user.
Manage data into and out of the Hadoop cluster
Submit Hadoop jobs via the command line
Code testing

•  Batch job: submits jobs to the YARN resource manager in the
Hadoop cluster.
Submit large analysis jobs
Submit batches of processing jobs to run sequentially
Start other applications, e.g. Zeppelin

Access Hadoop Cluster with VNC
Please visit: vis.tacc.utexas.edu

Choose “TACC User Portal User”

Enter credentials

0. Set VNC password (only needed once)

1. Choose the Wrangler tab

2. Select project TRAINING-OPEN

3. Fill in the reservation name:
hadoop+TRAINING-OPEN+2375
and choose the “hadoop” queue

A VNC Session Enables
Access to Web UIs
•  Several web UIs run on different ports of the namenode:
•  Cluster information: port 50070
•  E.g. c252-101:50070
•  Job information: port 8088
•  E.g. c252-101:8088

•  Other applications may have their own UIs running:
•  Spark job UI
•  Hive UI

•  The web UIs may not be required, as all information can be
accessed through the command line as well (see the sketch below).
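
A minimal sketch of those command-line equivalents, using the standard
hdfs and yarn clients from a node inside the Hadoop cluster:

# Cluster information (roughly what the namenode UI on port 50070 shows):
hdfs dfsadmin -report

# Application status (roughly what the YARN UI on port 8088 shows):
yarn application -list -appStates ALL

# Per-node status in the YARN cluster:
yarn node -list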

Access Hadoop Reservation via idev
Session
Users can submit an idev session to the Hadoop cluster
reservation:
>idev -r hadoop+TRAINING-OPEN+2375

It defaults to using your default project;
the -A allocation_name option specifies the allocation to use.

The default duration for idev is 30 minutes;
the -m minutes option specifies the duration of the idev session.

Please limit your usage to Hadoop-related tasks; you can also
submit idev without the reservation for non-Hadoop tasks.
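
Inside an idev session, data can be moved in and out of the cluster with
the standard hdfs dfs client; a minimal sketch (the HDFS paths and file
names are illustrative):

>hdfs dfs -mkdir -p /user/$USER/input        # create a directory in HDFS
>hdfs dfs -put mydata.csv /user/$USER/input  # copy a local file into HDFS
>hdfs dfs -ls /user/$USER/input              # list HDFS contents
>hdfs dfs -get /user/$USER/output results/   # copy results back to local disk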

Slurm
Slurm is an open source, fault-tolerant, and highly scalable cluster
management and job scheduling system for large and small Linux
clusters.
•  sbatch is used to submit a job script for later execution.
•  sbatch myHadoopJob.slurm
•  scancel is used to cancel a pending or running job or job step.
•  scancel 1234
•  scontrol is the administrative tool used to view and/or modify
Slurm state.
•  scontrol show reservation
•  sinfo reports the state of partitions and nodes managed by Slurm.
•  squeue reports the state of jobs or job steps.
•  squeue -u $USER
Batch Job Script
https://portal.tacc.utexas.edu/user-guides/wrangler#hadoop-hdfs-jobs-on-wrangler
myHadoopJob.slurm
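
The script itself appears in the user guide linked above; the sketch below
is a minimal, illustrative version rather than the guide's exact script.
The queue, time limit, reservation, and payload are assumptions to adapt:

#!/bin/bash
#SBATCH -J myHadoopJob            # job name
#SBATCH -o myHadoopJob.%j.out     # stdout file (%j expands to the job ID)
#SBATCH -p hadoop                 # Hadoop queue
#SBATCH -N 1                      # number of nodes
#SBATCH -n 1                      # number of tasks
#SBATCH -t 04:00:00               # maximum run time
#SBATCH --reservation=hadoop+TRAINING-OPEN+2375   # Hadoop reservation

# Illustrative payload: the pi example from the stock MapReduce examples
# jar; adjust the jar path for your installation.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 100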

login1$ sbatch myHadoopJob.slurm


Recap
•  Access by secure shell client
ssh [email protected]
idev -r hadoop+TRAINING-OPEN+2375 -m 240 -p hadoop

•  Access by vis portal


–  Go to vis.tacc.utexas.edu using a web browser
–  Log in with your credentials
–  Go to the Wrangler tab to start VNC sessions using
reservation hadoop+TRAINING-OPEN+2375 and
the “hadoop” queue

FYI: How to Create Hadoop
Reservation
Wrangler data portal: portal.wrangler.tacc.utexas.edu

On the project page choose: Manage -> Create Hadoop Reservation

Form fields (callouts from the portal screenshot):
•  Number of nodes (1~10) to be used for the Hadoop cluster
•  Scheduled start time
•  Duration (1-30 days)
