Cluster and Grid Computing
Cheng-Zhong Xu
What is a Cluster?
A collection of independent computer systems working together as if they were a single system, coupled through a scalable, high-bandwidth, low-latency interconnect.
It is cluster computing: the commodity supercomputing!
Clusters of SMPs
SMPs are the fastest commodity machines, so use them as building blocks for a larger machine connected by a network.
Common names: CLUMP = Cluster of SMPs; hierarchical machines; constellations.
Most modern machines look like this: Millennium, IBM SPs (not the T3E)...
What is the right programming model?
Treat the machine as flat and always use message passing, even within an SMP (simple, but ignores an important part of the memory hierarchy).
Or use shared memory within one SMP, but message passing outside of an SMP.
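A minimal sketch of the second model, assuming an MPI library and an OpenMP-capable compiler are available (the array size, rank count, and build commands are illustrative, not prescribed by the slides): OpenMP threads share memory inside each SMP node, while MPI carries messages between nodes.

/* Hybrid model sketch: MPI between SMP nodes, OpenMP threads inside each node.
 * Build (illustrative): mpicc -fopenmp hybrid.c -o hybrid
 * Run   (illustrative): mpirun -np 4 ./hybrid   -- one MPI rank per SMP box
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    int rank, size;
    static double a[N];
    double local = 0.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Shared memory inside the SMP: OpenMP threads sum this rank's slice. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) {
        a[i] = rank + i * 1e-6;
        local += a[i];
    }

    /* Message passing across SMP nodes: combine the per-node partial sums. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (ranks = %d, threads/rank = %d)\n",
               global, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}

The design point is simply that the memory hierarchy is exposed: intra-node data stays in shared memory, and only one partial result per SMP crosses the interconnect.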
[Figure: an SMP node of 30 UltraSPARCs with shared memory and I/O, connected to the network]
Cycle Stealing
Usually a workstation is owned by an individual, group, department, or organisation and is dedicated to the exclusive use of its owners. This creates problems when attempting to form a cluster of workstations for running distributed applications.
Cycle Stealing
Typically, there are three types of owners, who use their workstations mostly for:
1. Sending and receiving email and preparing documents.
2. Software development: the edit, compile, debug, and test cycle.
3. Running compute-intensive applications.
Cycle Stealing
Cluster computing aims to steal spare cycles from (1) and (2) to provide resources for (3). However, this requires overcoming the ownership hurdle: people are very protective of their workstations, and it usually requires an organizational mandate that computers be used in this way. Stealing cycles outside standard work hours (e.g. overnight) is easy; stealing idle cycles during work hours without impacting interactive use (of both CPU and memory) is much harder.
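As a rough illustration of how a cycle-harvesting daemon might decide that a workstation is idle, here is a sketch that compares the Linux /proc/loadavg reading against an arbitrary threshold; the threshold and file path are assumptions, and real harvesters such as Condor also watch keyboard/mouse activity and memory pressure before scheduling guest jobs.

/* Illustrative idle test for cycle stealing: a node is considered "idle"
 * when its 5-minute load average falls below a (made-up) threshold.
 */
#include <stdio.h>

#define IDLE_LOAD_THRESHOLD 0.25   /* illustrative value */

int workstation_is_idle(void)
{
    double load1, load5, load15;
    FILE *f = fopen("/proc/loadavg", "r");   /* Linux-specific */
    if (!f) return 0;
    if (fscanf(f, "%lf %lf %lf", &load1, &load5, &load15) != 3) {
        fclose(f);
        return 0;
    }
    fclose(f);
    return load5 < IDLE_LOAD_THRESHOLD;
}

int main(void)
{
    if (workstation_is_idle())
        printf("idle: safe to start a guest job\n");
    else
        printf("owner active: back off\n");
    return 0;
}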
Workstation performance doubles every 18 months.
Higher link bandwidth (vs. 100 Mbit Ethernet)
Gigabit and 10 Gigabit switches
...Architectural Drivers
Clusters can be grown: incremental scalability (up, down, and across)
An individual node's performance can be improved by adding resources (new memory blocks/disks)
New nodes can be added, or nodes can be removed
Clusters of clusters and metacomputing
Windows of Opportunities
MPP/DSM: compute across multiple systems in parallel.
Network RAM: use idle memory in other nodes; page across other nodes' idle memory.
Software RAID: a file system supporting parallel I/O, reliability, and mass storage.
Multi-path communication: communicate across multiple networks (Ethernet, ATM, Myrinet).
Parallel Processing
Scalable parallel applications require:
good floating-point performance
low-overhead communication
scalable network bandwidth
a parallel file system
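To make "low-overhead communication" concrete, here is a classic ping-pong microbenchmark sketch in MPI that estimates one-way point-to-point latency between two ranks; the message size and repetition count are arbitrary choices, not values from the slides.

/* Ping-pong sketch: estimates point-to-point latency between ranks 0 and 1.
 * Run with two ranks, e.g. (illustrative): mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    enum { REPS = 1000, MSG_BYTES = 8 };   /* illustrative sizes */
    char buf[MSG_BYTES] = {0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency ~ %.2f microseconds\n",
               (t1 - t0) / (2.0 * REPS) * 1e6);

    MPI_Finalize();
    return 0;
}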
Network RAM
The performance gap between processor and disk has widened; thrashing to disk significantly degrades performance.
Paging across the network can be effective with high-performance networks and an OS that recognizes idle machines. Typically, thrashing to network RAM is 5 to 10 times faster than thrashing to disk.
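One way to picture remote paging is MPI one-sided communication, used here purely as an analogy (the slides do not prescribe an implementation): a "page" is fetched from memory exposed by another node instead of being read back from local disk. The page size and data layout are invented for illustration; run with at least two ranks.

/* Network-RAM sketch: rank 0 "pages in" a block from memory exposed by rank 1,
 * rather than reading it back from local disk.
 */
#include <mpi.h>
#include <stdio.h>

#define PAGE_INTS 1024   /* illustrative "page" of 4 KB */

int main(int argc, char **argv)
{
    int rank, *window_mem = NULL;
    int page[PAGE_INTS];
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank exposes one page of idle memory and fills it with its own data. */
    MPI_Alloc_mem(PAGE_INTS * sizeof(int), MPI_INFO_NULL, &window_mem);
    for (int i = 0; i < PAGE_INTS; i++)
        window_mem[i] = rank * 1000 + i;
    MPI_Win_create(window_mem, PAGE_INTS * sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Rank 0 pulls the remote page over the interconnect (no disk involved). */
    MPI_Win_fence(0, win);
    if (rank == 0)
        MPI_Get(page, PAGE_INTS, MPI_INT, 1, 0, PAGE_INTS, MPI_INT, win);
    MPI_Win_fence(0, win);

    if (rank == 0)
        printf("fetched remote page, first word = %d\n", page[0]);

    MPI_Win_free(&win);
    MPI_Free_mem(window_mem);
    MPI_Finalize();
    return 0;
}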
Hardware RAID cost per byte is high compared to single disks, and RAID arrays are connected to host computers, which are often a performance and availability bottleneck.
RAID in software, writing data across an array of workstation disks, provides performance, and some degree of redundancy provides availability.
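A toy sketch of the striping idea: a buffer is written round-robin in fixed-size stripes across several files standing in for disks on different workstations. The file names, stripe size, and disk count are invented, and a real software RAID layer would add parity or mirroring for the redundancy mentioned above.

/* Software striping sketch (RAID-0 flavour). */
#include <stdio.h>
#include <string.h>

#define NDISKS 4
#define STRIPE 4096   /* illustrative stripe size in bytes */

int stripe_write(const char *data, size_t len)
{
    FILE *disk[NDISKS];
    char name[32];

    for (int d = 0; d < NDISKS; d++) {
        snprintf(name, sizeof(name), "disk%d.img", d);  /* invented names */
        if (!(disk[d] = fopen(name, "wb"))) return -1;
    }
    for (size_t off = 0; off < len; off += STRIPE) {
        size_t chunk = (len - off < STRIPE) ? len - off : STRIPE;
        int d = (int)((off / STRIPE) % NDISKS);          /* round-robin disk */
        fwrite(data + off, 1, chunk, disk[d]);
    }
    for (int d = 0; d < NDISKS; d++) fclose(disk[d]);
    return 0;
}

int main(void)
{
    static char buf[64 * 1024];
    memset(buf, 'x', sizeof(buf));
    return stripe_write(buf, sizeof(buf));
}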
Clustering Today
Cluster Components (2): OS
Cluster Components (3): High-Performance Networks
Ethernet (10 Mbps), Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps)
SCI (Dolphin; ~12 microsecond MPI latency)
ATM
Myrinet (1.2 Gbps)
Digital Memory Channel
FDDI
Cluster Components (5): Communication Software
Traditional OS-supported facilities (heavyweight due to protocol processing):
Sockets (TCP/IP), pipes, etc.
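A minimal TCP client over traditional BSD sockets, to make the "heavyweight" point concrete: every send and receive passes through the kernel's full TCP/IP protocol stack. The host, the port, and the assumption that an echo server is listening there are purely illustrative.

/* TCP client sketch over traditional sockets: each send()/recv() goes through
 * the kernel's protocol stack, the overhead that lightweight cluster
 * messaging layers try to avoid.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in srv = {0};
    char reply[64] = {0};
    const char *msg = "ping";

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    srv.sin_family = AF_INET;
    srv.sin_port = htons(9000);                      /* illustrative port */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);  /* illustrative host */

    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        perror("connect");
        return 1;
    }
    send(fd, msg, strlen(msg), 0);           /* kernel TCP/IP processing */
    recv(fd, reply, sizeof(reply) - 1, 0);   /* more protocol processing */
    printf("reply: %s\n", reply);
    close(fd);
    return 0;
}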
SSI makes the collection appear as a single machine (a globalised view of system resources), e.g. telnet cluster.myinstitute.edu
SA (system availability): checkpointing and process migration.
Hardware: DEC Memory Channel, DSM (Alewife, DASH), SMP techniques
OS / gluing layers: Solaris MC, UnixWare, GLUnix
Applications: MPI on Linux, NT, and many supercomputers
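A minimal MPI program of the kind that runs unchanged on a Linux or NT cluster and on many supercomputers; it simply reports each process's rank and host name (the launch command in the comment is illustrative).

/* Minimal MPI program: the same source runs on a cluster or an MPP.
 * Launch (illustrative): mpirun -np 4 ./hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("process %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}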
Compilers: C/C++/Java; parallel programming with C++ (MIT Press book)
RAD (rapid application development) tools: GUI-based tools for parallel-programming modeling
Debuggers
Performance analysis tools
Visualization tools
System availability (HA): clusters offer inherent high system availability due to the redundancy of hardware, operating systems, and applications.
Hardware fault tolerance: redundancy for most system components (e.g. disk RAID), in both hardware and software.
OS and application reliability: run multiple copies of the OS and applications and tolerate failures through this redundancy.
Scalability: add servers to the cluster, add more clusters to the network, or add CPUs to an SMP as the need arises.
High performance: running cluster-enabled programs.
Clusters Classification..1
Clusters Classification..2
Clusters Classification..3
Clusters Classification..4
Clusters Classification..5
Based on node component architecture and configuration (processor architecture, node type: PC/workstation, and OS: Linux/NT):
Homogeneous clusters: all nodes have a similar configuration.
Heterogeneous clusters: nodes are based on different processors and run different OSes.
Clusters Classification..6a
[Figure: cluster classification dimensions — (1) platform technology (CPU, memory, I/O, OS), (2) platform, (3) network scope: workgroup, department, campus, enterprise, public, scaling from uniprocessor up to metacomputing (Grid)]
Departmental clusters (#nodes: 99-999)
Organizational clusters (#nodes: many 100s, using ATM networks)
Internet-wide clusters = global clusters (#nodes: 1000s to many millions)
Metacomputing, Web-based computing, agent-based computing
[Figure: software stack — Application | PVM / MPI / RSH | ??? | Hardware/OS]
Cluster computing should support:
multi-user, time-sharing environments
nodes with different CPU speeds and memory sizes (a heterogeneous configuration)
many processes with unpredictable resource requirements
Unlike an SMP, there are insufficient bonds between nodes:
each computer operates independently
resources are utilized inefficiently
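One common remedy for heterogeneous node speeds and unpredictable workloads is dynamic master/worker scheduling, where faster or less-loaded nodes simply come back for more tasks. A bare-bones MPI sketch follows; the task count, message tags, and placeholder "work" are all invented for illustration.

/* Master/worker sketch: the master hands out tasks on demand, so faster or
 * less-loaded workstations automatically receive more work.
 */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

#define NTASKS   20   /* illustrative */
#define TAG_REQ   1
#define TAG_WORK  2
#define TAG_STOP  3

int main(int argc, char **argv)
{
    int rank, size, dummy = 0;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                                /* master */
        int next_task = 0, stopped = 0;
        while (stopped < size - 1) {
            /* Wait for any worker to ask for work. */
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQ,
                     MPI_COMM_WORLD, &st);
            if (next_task < NTASKS) {
                MPI_Send(&next_task, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                next_task++;
            } else {
                MPI_Send(&dummy, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                stopped++;
            }
        }
    } else {                                        /* workers */
        for (;;) {
            int task;
            MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQ, MPI_COMM_WORLD);
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP)
                break;
            /* Placeholder "work": a fast node finishes sooner and asks again. */
            usleep(10000 * (task % 3));
            printf("worker %d finished task %d\n", rank, task);
        }
    }
    MPI_Finalize();
    return 0;
}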
[Figure: software stack — Application | PVM / MPI / RSH | Middleware or Underware | Hardware/OS]
Checkpointing, automatic failover, recovery from failure, and fault-tolerant operation among all cluster nodes.
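Application-level checkpointing in its simplest form: the program periodically writes its state to a file and, after a failure, restarts from the last saved state instead of from the beginning. A sketch follows; the file name, checkpoint interval, and the shape of the state are invented for illustration.

/* Application-level checkpoint/restart sketch. */
#include <stdio.h>

#define CKPT_FILE   "state.ckpt"   /* invented name */
#define CKPT_EVERY  100
#define TOTAL_STEPS 1000

struct state { long step; double accum; };

static int load_checkpoint(struct state *s)
{
    FILE *f = fopen(CKPT_FILE, "rb");
    if (!f) return 0;                                /* no checkpoint yet */
    int ok = fread(s, sizeof(*s), 1, f) == 1;
    fclose(f);
    return ok;
}

static void save_checkpoint(const struct state *s)
{
    FILE *f = fopen(CKPT_FILE, "wb");
    if (!f) return;
    fwrite(s, sizeof(*s), 1, f);
    fclose(f);                                       /* real code would fsync */
}

int main(void)
{
    struct state s = {0, 0.0};
    if (load_checkpoint(&s))
        printf("restarting from step %ld\n", s.step);

    for (; s.step < TOTAL_STEPS; s.step++) {
        if (s.step % CKPT_EVERY == 0)
            save_checkpoint(&s);                     /* state before this step */
        s.accum += s.step * 0.5;                     /* placeholder work */
    }
    printf("done: accum = %f\n", s.accum);
    return 0;
}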
Scalable performance
Easy growth of the cluster, with no change of API and automatic load distribution.
Enhanced availability
Automatic recovery from failures: employ checkpointing and fault-tolerance technologies, and handle consistency of data when it is replicated.
What is a Single System Image (SSI)? A single system image is the illusion, created by software or hardware, that presents a collection of resources as a single, more powerful resource. SSI makes the cluster appear like one machine to the user, to applications, and to the network. A cluster without an SSI is not a cluster.
Single file hierarchy: xFS, AFS, Solaris MC Proxy
Single control point: management from a single GUI
Single virtual networking
Single memory space: Network RAM / DSM
Single job management: GLUnix, Codine, LSF
Single user interface: like a workstation/PC windowing environment (CDE in Solaris/NT); it may use Web technology
Reduction in the risk of operator errors
Users need not be aware of the underlying system architecture to use these machines effectively
Hardware Level
Parametric Computations
Nimrod/Clustor
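The idea behind parametric-computation tools such as Nimrod/Clustor is to run the same program many times with different parameter values. A local fork/exec sketch follows; the executable name "./simulate" and the parameter range are invented, and a real tool would farm each run out to a different cluster node rather than the local machine.

/* Parametric-computation sketch: launch the same simulation once per
 * parameter value and wait for all runs to finish.
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    for (int p = 1; p <= 8; p++) {                   /* illustrative sweep */
        pid_t pid = fork();
        if (pid == 0) {                              /* child: one run */
            char arg[16];
            snprintf(arg, sizeof(arg), "%d", p);
            execl("./simulate", "simulate", arg, (char *)NULL);
            perror("execl");                         /* only reached on error */
            _exit(1);
        }
    }
    while (wait(NULL) > 0)                           /* collect all runs */
        ;
    return 0;
}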
[Figure: levels of parallelism — threads, compiler-scheduled instructions (add, multiply, load), CPU]
[Figure: computing platforms — PC, workstation, minicomputer, vector supercomputer, MPP]
What Next ??
Computational/Data Grid
Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations
Direct access to computers, software, data, and other resources, rather than file exchange.
Such sharing rules define a set of individuals and/or institutions, which form a virtual organization (VO).
Examples of VOs: application service providers, storage service providers, cycle providers, etc.
Grid computing develops protocols, services, and tools for coordinated resource sharing and problem solving in VOs:
security solutions for management of credentials and policies
resource management (RM) protocols and services for secure remote access
information query protocols and services for configuration
data management, etc.
Scalable Computing
[Figure: scalable computing — performance + QoS grows as systems scale across administrative barriers (individual, group, department, campus, state, national, globe, inter-planet, universe), from personal devices and SMPs or supercomputers, to local clusters, enterprise clusters/grids, and the global grid]