Cluster Stack Basics

This document provides an overview of cluster stack basics, including:
- A cluster approach uses shared filesystems, job management, dedicated compute nodes, and a consistent environment across nodes interconnected with a low-latency network.
- Key components of a cluster include basic network services like NTP and DNS, shared storage like NFS, logging, licensing, databases, and specialized components like a job manager and parallel storage.
- The document discusses various aspects of cluster networking, interconnect technologies, parallel filesystems, and cluster management software.

Uploaded by

DanielRomero

Linux Clusters Institute:
Cluster Stack Basics
Brett Zimmerman, University of Oklahoma
Senior Systems Analyst, OU Supercomputing Center for Education and Research (OSCER)

A Bunch of Computers
Users can login to any node
Filesystems aren't shared between nodes
Work is run wherever you can find space
Nodes maintained individually

4-8 August 2014

What's wrong with a bunch of nodes?

Competition for resources
Size and type of problem is limited
Nodes get out of sync
Problems for users
Difficulty in management

Cluster Approach
Shared filesystems
Job management
Nodes dedicated to compute
Consistent environment
Interconnect

What's right about the cluster approach?

Easier to use
Maximize efficiency
Can do bigger and better problems
Nodes can be used cooperatively

The Types of Nodes

Login
Users login here
Compiling
Editing
Submitting and monitoring jobs

Compute
Users might login here
Run jobs as directed by the scheduler

Support
Users don't login here
Do all the other stuff

What a cluster needs -- the mundane

Network services -- NTP, DNS, DHCP
Shared Storage -- NFS
Logging -- Consolidated Syslog as a starting point
Licensing -- FlexLM and the like
Database -- User and Administrative Data
Boot/Provisioning -- PXE, build system
Authentication -- LDAP

What a cluster needs -- Specialized

Interconnect -- Ideally a low-latency network
Job manager -- Resource manager/scheduler
Parallel Storage -- Get around the limitations of NFS

Network Services

NTP -- Network Time Protocol, provides clock synchronization across all nodes in the cluster
DHCP -- Dynamic Host Configuration Protocol, allows central configuration of host networking
DNS -- Provides name to address translation for the cluster
NFS -- Basic UNIX network filesystem
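
To make the clock synchronization idea concrete, here is a minimal sketch of the standard NTP offset and round-trip delay calculation from the four timestamps of one request/reply exchange. The function name and the sample timestamps are illustrative assumptions and do not come from the slides.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Return (offset, delay) in seconds for one NTP-style exchange.

    t0: request sent      (client clock)
    t1: request received  (server clock)
    t2: reply sent        (server clock)
    t3: reply received    (client clock)
    """
    # How far the server clock is ahead of the client clock.
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    # Time spent on the network, excluding server processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: the server clock is 5 s ahead of the client,
# each network leg takes 0.5 s, and the server takes 1 s to reply.
offset, delay = ntp_offset_and_delay(t0=100.0, t1=105.5, t2=106.5, t3=102.0)
print(offset, delay)
```

The offset estimate assumes the two network legs take roughly the same time, which is one reason a low-latency, lightly loaded path to the time server helps.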

Logging
Syslog
The classic system for UNIX logging
Application has to opt to emit messages

Monitoring
Active monitoring to catch conditions elective monitoring doesn't catch
Resource manager
Nagios/Cacti/Zabbix/Ganglia

IDS
Intrusion detection
Monitoring targeting misuse/attacks on the cluster
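
As an illustration of what a consolidated syslog collector has to work with, the sketch below splits the leading `<PRI>` value of a classic syslog line into facility and severity (PRI = facility * 8 + severity). The function name, host name, and sample message are assumptions made for this example.

```python
import re

# Severity names in numeric order, 0 (most severe) through 7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def split_priority(line):
    """Split a raw syslog line into (facility number, severity name, rest)."""
    match = re.match(r"<(\d{1,3})>(.*)", line, re.DOTALL)
    if match is None:
        raise ValueError("no <PRI> prefix found")
    # PRI encodes both values: facility * 8 + severity.
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, SEVERITIES[severity], match.group(2)


# Hypothetical message: facility 4 (auth), severity 2 (crit) gives PRI 34.
print(split_priority("<34>Oct 11 22:14:15 node01 su: 'su root' failed"))
```

A central log host could use the severity to decide which messages are worth forwarding to the active monitoring tools listed above.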

Basic services, continued

Licensing -- FlexNet/FlexLM or equivalent, mediates access to a pool of shared licenses.
Database -- Administrative use for logging/monitoring, dynamic configuration. Requirements of user software.
Boot/Provisioning -- For example PXE/Cobbler, PXE/Image or part of a cluster management suite
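
The licensing item above describes mediating access to a pool of shared licenses. The toy class below sketches only that counting idea; it is not how FlexNet/FlexLM is implemented, and the class and method names are hypothetical.

```python
class LicensePool:
    """Toy counter for a fixed number of shared licenses."""

    def __init__(self, total):
        self.total = total
        self.holders = {}  # user -> number of licenses currently held

    def in_use(self):
        return sum(self.holders.values())

    def checkout(self, user, count=1):
        """Grant the request only if enough licenses are free."""
        if self.in_use() + count > self.total:
            return False
        self.holders[user] = self.holders.get(user, 0) + count
        return True

    def checkin(self, user, count=1):
        """Return up to `count` licenses held by `user`; report how many."""
        held = self.holders.get(user, 0)
        released = min(held, count)
        if held - released:
            self.holders[user] = held - released
        else:
            self.holders.pop(user, None)
        return released


pool = LicensePool(total=2)
print(pool.checkout("alice"), pool.checkout("bob"), pool.checkout("carol"))
```

On a cluster, a denied checkout typically means a job either waits or fails, which is why some sites make the scheduler aware of license counts.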

Authentication
Flat files -- passwd, group, shadow entries
NIS -- network access to central flat files
LDAP -- Read/Write access to a dynamic tree structure of account and other information
Host equivalency
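
For the flat-file case, a minimal sketch of reading passwd-style entries is shown below. It relies on the standard seven colon-separated fields of a passwd line; the sample accounts are made up for illustration.

```python
from collections import namedtuple

PasswdEntry = namedtuple("PasswdEntry", "name password uid gid gecos home shell")


def parse_passwd(text):
    """Parse passwd-format text into a list of PasswdEntry records."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        name, password, uid, gid, gecos, home, shell = fields
        entries.append(PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell))
    return entries


# Hypothetical accounts, not taken from any real system.
sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1001:1001:Alice Example:/home/alice:/bin/bash\n"
)
for entry in parse_passwd(sample):
    print(entry.name, entry.uid, entry.home)
```

Keeping UIDs and GIDs identical on every node matters because shared filesystems such as NFS identify file owners by number, not by name.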

Cluster Networking
Hardware Management -- Lights out management
External -- Public interfaces to the cluster
Internal -- General node to node communication
Storage -- Access to network filesystems
Interconnect -- high-speed, low-latency for multi-node jobs
Some of these can share a medium
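
One way to keep the non-public networks apart is to give each role its own subnet. The sketch below carves a private address block into per-role subnets with Python's `ipaddress` module; the 10.10.0.0/16 block, the /24 size, and the role names are assumptions chosen for illustration.

```python
import ipaddress

# Roles taken from the list above; the external network is left out
# because its addresses are normally assigned by the site.
ROLES = ["management", "internal", "storage", "interconnect"]


def plan_subnets(supernet, roles, new_prefix=24):
    """Assign consecutive subnets of `supernet` to the given roles."""
    network = ipaddress.ip_network(supernet)
    # zip stops at the shorter input, so extra roles are silently dropped.
    return dict(zip(roles, network.subnets(new_prefix=new_prefix)))


for role, subnet in plan_subnets("10.10.0.0/16", ROLES).items():
    print(f"{role:13s} {subnet}")
```

Networks that share a physical medium can still use separate subnets (for example on separate VLANs), so a plan like this does not force one cable per role.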

Interconnect
In the most recent Top 500 list (http://top500.org) there were 224 installations relying on InfiniBand, 100 using Gigabit Ethernet, and 88 using 10 Gigabit Ethernet

Ethernet -- Latency of 50-125 µs (GbE), 5-50 µs (10GbE), ~5 µs RoCEE
InfiniBand -- Latency of 1.3 µs (QDR), 0.7 µs (FDR-10/FDR), 0.5 µs (EDR)
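
To see why latency dominates for small messages, the sketch below uses a simple model: transfer time = latency + message size / bandwidth. The latency values are the ones quoted above; the bandwidth figures (1 Gbit/s for GbE, 32 Gbit/s of data rate for QDR) and the function name are assumptions for this example, and the model ignores protocol overhead and contention.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbit_s):
    """Estimate one-way transfer time in microseconds."""
    bits = message_bytes * 8
    # 1 Gbit/s moves 1000 bits per microsecond.
    serialization_us = bits / (bandwidth_gbit_s * 1000.0)
    return latency_us + serialization_us


# A 1000-byte message over GbE (best-case 50 µs) and QDR InfiniBand (1.3 µs).
gbe = transfer_time_us(1000, latency_us=50.0, bandwidth_gbit_s=1.0)
qdr = transfer_time_us(1000, latency_us=1.3, bandwidth_gbit_s=32.0)
print(f"GbE: {gbe:.2f} us, QDR: {qdr:.2f} us")
```

Under these assumptions most of the GbE time is latency rather than data transfer, which is what hurts multi-node jobs that exchange many small messages.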

Parallel Filesystem
Lustre - http://lustre.org/
PanFS - http://www.panasas.com/
GPFS - http://www-03.ibm.com/software/products/en/software

Parallel filesystems take the general approach of separating filesystem metadata from the storage. Lustre and PanFS have dedicated nodes for metadata (MDS or director blades). GPFS distributes metadata throughout the cluster
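
The separation of metadata from storage can be pictured with a toy round-robin striping layout: the metadata holds only the stripe size and the list of storage targets, and a client computes which target holds a given byte. This is a simplified illustration, not the actual on-disk layout of Lustre, PanFS, or GPFS; the target names and stripe size are hypothetical.

```python
def locate(offset, stripe_size, targets):
    """Map a file byte offset to (target, offset inside that target's object)."""
    stripe_index = offset // stripe_size
    target = targets[stripe_index % len(targets)]
    # Each full pass over all targets adds one stripe to every object.
    round_number = stripe_index // len(targets)
    object_offset = round_number * stripe_size + offset % stripe_size
    return target, object_offset


# Hypothetical layout record, as a metadata service might hand to a client.
layout = {"stripe_size": 1024 * 1024, "targets": ["ost0", "ost1", "ost2"]}

for offset in (0, 1024 * 1024 + 10, 3 * 1024 * 1024 + 5):
    print(offset, locate(offset, layout["stripe_size"], layout["targets"]))
```

Once a client has the layout it can talk to the storage targets directly and in parallel, so the metadata service stays out of the data path.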

Cluster Management
Automates the building of a cluster
Some way to easily maintain cluster system consistency
The ability to automate cluster maintenance tasks
Offer some way to monitor cluster health and performance

Cluster Management Software

The resource manager knows the state of the various resources on the cluster and maintains a list of the jobs that are requesting resources

The scheduler, using the information from the resource manager, selects jobs from the queue for execution

Rocks (http://www.rocksclusters.org/wordpress/)
Bright Cluster Manager (http://www.brightcomputing.com/Bright-Cluster-Manager)
xCAT (Extreme Cluster/Cloud Administration Toolkit) (http://sourceforge.net/p/xcat/wiki/Main_Page/)

Configuration Management
While it is true that booting with a central boot server can make it easier to make sure the OS on each compute node (or, at least, each type of compute node) has an identical setup/install, there are still files which wind up being more dynamic. Some such files are passwd/group/shadow and hosts files.

Rsync
Cfengine
Chef
Puppet
Salt
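
The underlying problem these tools address is drift: a dynamic file that no longer matches across nodes. The sketch below shows one simple way to detect it by hashing each node's copy and reporting the nodes that differ from the majority. It is an illustration only and not how any of the listed tools work internally; the node names and file contents are made up.

```python
import hashlib
from collections import defaultdict


def find_drift(contents_by_node):
    """Return the sorted nodes whose file content differs from the majority."""
    groups = defaultdict(list)
    for node, content in contents_by_node.items():
        groups[hashlib.sha256(content).hexdigest()].append(node)
    if not groups:
        return []
    # On a tie, the first content seen is treated as the majority.
    majority = max(groups.values(), key=len)
    return sorted(
        node for nodes in groups.values() if nodes is not majority for node in nodes
    )


# Hypothetical copies of a hosts file gathered from three nodes.
hosts_files = {
    "node01": b"10.10.1.1 head\n",
    "node02": b"10.10.1.1 head\n",
    "node03": b"10.10.1.9 head\n",
}
print(find_drift(hosts_files))
```

A configuration management tool goes one step further and rewrites the outlier from a central definition rather than just reporting it.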

Software Installation and Management

All Linux distros have some sort of package management tool. For Red Hat/CentOS/Scientific based clusters, this is rpm and yum. Debian has dpkg and apt.

In any case, pre-packaged software tends to assume that it is going to be installed in a specific place on the machine and that it will be the only version of that software on the machine.

On a cluster, it may be necessary to look at software installation differently from a standard Linux machine:

Install to global filesystem
Keep boot image as small as possible
Maintain multiple versions

Software installation and management

There are a couple of tools useful for navigating the difficulties of maintaining user environments when dealing with multiple versions of software or software in non-standard locations.
SoftEnv (http://www.lcrc.anl.gov/info/Software/Softenv)
Useful for packaging static user environment required by packages
Modules (http://modules.sourceforge.net/)
Can be used to make dynamic changes to a user's environment.
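
The kind of dynamic change meant here is mostly prepending a package's directories to search-path variables. The sketch below imitates that effect on a copy of an environment dictionary; it is not the Modules implementation, and the function name and the /opt/apps/gcc/4.9.0 prefix are hypothetical.

```python
def load_module(env, prefix):
    """Return a copy of `env` with prefix/bin and prefix/lib prepended."""
    new_env = dict(env)
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        entry = f"{prefix}/{subdir}"
        current = new_env.get(var, "")
        # Prepend so this version wins over anything already on the path.
        new_env[var] = f"{entry}:{current}" if current else entry
    return new_env


# Hypothetical install of one compiler version on the global filesystem.
env = {"PATH": "/usr/bin:/bin"}
loaded = load_module(env, "/opt/apps/gcc/4.9.0")
print(loaded["PATH"])
print(loaded["LD_LIBRARY_PATH"])
```

Because each version lives under its own prefix, switching versions is only a matter of which prefix gets prepended, which fits the "maintain multiple versions" goal.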

Resource Manager/Scheduler
Accepts job submissions, maintains a queue of jobs
Allocates nodes/resources and starts jobs on
compute nodes
Schedules waiting jobs
Available options
SGE (Sun Grid Engine)
LSF / Openlava (Load Sharing Facility)
PBS (Portable Batch System)
OpenPBS
Torque
SLURM
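
The three duties listed at the top of this slide (accept submissions, allocate nodes, schedule waiting jobs) can be sketched with a toy first-in, first-out scheduler. Real resource managers such as those listed use far richer policies (priorities, backfill, fair share); the class and method names here are hypothetical.

```python
from collections import deque


class ToyScheduler:
    """Strict FIFO scheduler over a pool of identical nodes."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.queue = deque()  # waiting jobs as (job_id, nodes_needed)
        self.running = {}     # job_id -> nodes allocated

    def submit(self, job_id, nodes):
        self.queue.append((job_id, nodes))

    def schedule(self):
        """Start queued jobs in order until the head of the queue does not fit."""
        started = []
        while self.queue and self.queue[0][1] <= self.free_nodes:
            job_id, nodes = self.queue.popleft()
            self.free_nodes -= nodes
            self.running[job_id] = nodes
            started.append(job_id)
        return started

    def finish(self, job_id):
        """Release the nodes held by a completed job."""
        self.free_nodes += self.running.pop(job_id)


sched = ToyScheduler(total_nodes=4)
sched.submit("a", 2)
sched.submit("b", 3)
sched.submit("c", 1)
print(sched.schedule())   # job "b" blocks the queue until "a" finishes
sched.finish("a")
print(sched.schedule())
```

In the first pass job "c" would fit on the idle nodes but waits behind "b"; letting small jobs run in such gaps is the idea behind backfill scheduling.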

Best Practices
Here is a quick overview of the general functions to secure a cluster
Risk Avoidance
Deterrence
Prevention
Detection
Recovery

The priority of these will depend on your security approach

Risk Avoidance
Provide the minimum of services necessary
Grant the least privileges necessary
Install the minimum software necessary
The simpler the environment, the fewer the vectors available for attack.

Deterrence
Limit the discoverability of the cluster
Publish acceptable use policies

Prevention
Fix known issues (patching)
Configure services for minimal functionality
Restrict user access and authority
Document actions and changes

Detection
Monitor the cluster
Integrate feedback from the users
Set alerts and automated response

Recovery
Backups
Documentation
Define acceptable loss
