22 Clusters Slides
Clusters
Paul Krzyzanowski
[email protected]
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons
Attribution 2.5 License.
Designing highly available systems
Incorporate elements of fault-tolerant design
– Replication, TMR (triple modular redundancy)
Problem: expensive!
Designing highly scalable systems
SMP architecture
Problem:
performance gain as f(# processors) is sublinear
– Contention for resources (bus, memory, devices)
– Also … the solution is expensive!
Clustering
Achieve reliability and scalability by
interconnecting multiple independent systems
• Other features:
– Scalable file I/O
– Dynamic process management
– Synchronization (barriers)
– Combining results
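A minimal sketch of barrier synchronization followed by combining results. It assumes threads in a single process as a stand-in for cluster nodes; `parallel_sum` and its parameters are illustrative names, not part of any cluster toolkit.

```python
import threading

def parallel_sum(data, num_workers=4):
    """Each worker sums one slice, waits at a barrier, then worker 0 combines."""
    partial = [0] * num_workers
    total = []
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        # Each worker handles a strided slice of the input.
        partial[rank] = sum(data[rank::num_workers])
        # Barrier: nobody proceeds until every partial sum has been written.
        barrier.wait()
        if rank == 0:
            # Combine step: one worker reduces the partial results.
            total.append(sum(partial))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

On a real cluster the same pattern would typically be expressed with message-passing calls rather than shared memory.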
Programming tools: PVM
• Software that emulates a general-purpose
heterogeneous computing framework on
interconnected computers
• Process management
– Batch system
– Write software to control synchronization and load balancing
with MPI and/or PVM
– Preemptive distributed scheduling: not part of Beowulf (two
packages: Condor and Mosix)
Another example
• Rocks Cluster Distribution
– Based on CentOS Linux
DreamWorks:
• >3,000 servers and >1,000 Linux desktops
• HP xw9300 workstations and HP DL145 G2 servers with 8 GB/server
• Shrek 3: 20 million CPU render hours. Platform LSF used for scheduling,
Maya for modeling, Avid for editing, and Python for pipelining. The movie
uses 24 TB of storage.
Single-queue work distribution
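Single-queue work distribution can be sketched as one shared queue that idle workers pull from. This is an illustration only: `render_all` is a hypothetical name, the "rendering" is a placeholder string, and threads stand in for render nodes.

```python
import queue
import threading

def render_all(frames, num_workers=3):
    """Distribute frames to workers through one shared queue."""
    jobs = queue.Queue()
    for frame in frames:
        jobs.put(frame)

    results = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                # Each idle worker takes the next job from the single queue.
                frame = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            output = f"frame-{frame}-rendered"  # placeholder for real work
            with lock:
                results[frame] = (name, output)

    threads = [threading.Thread(target=worker, args=(f"node{i}",))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because workers pull jobs when they become free, faster nodes naturally process more frames without any explicit balancing logic.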
Render Farms:
– ILM:
• 3,000 processor (AMD) renderfarm; expands to 5,000 by harnessing
desktop machines
• 20 Linux-based SpinServer NAS storage systems and 3,000 disks from
Network Appliance
• 10 Gbps ethernet
• Commands
– Submit job scripts
• Submit interactive jobs
• Force a job to run
– List jobs
– Delete jobs
– Hold jobs
Load Balancing
for the web
Functions of a load balancer
Load balancing
Failover
www.mysite.com
Redirection
Simplest technique
HTTP redirect status code (3xx)
www.mysite.com
REDIRECT
www03.mysite.com
Redirection
• Trivial to implement
• Visible to customer
– Some don’t like it
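A sketch of redirection-based balancing: the front-end host answers every request with a redirect to one of the real servers. The server list (other than www03.mysite.com), the round-robin choice, and the use of status 302 are assumptions made for illustration.

```python
import itertools

# Hypothetical back-end hosts; only www03.mysite.com appears on the slides.
SERVERS = ["www01.mysite.com", "www02.mysite.com", "www03.mysite.com"]
_next_server = itertools.cycle(SERVERS)

def redirect_response(path):
    """Build a raw HTTP response that redirects the client to a back-end."""
    host = next(_next_server)
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{host}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )
```

The browser follows the `Location` header and shows the new host name in its address bar, which is why this technique is visible to the customer.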
www.mysite.com
Software load balancer
src=bobby, dest=www03
www.mysite.com
response
Load balancing router
Routers have been getting smarter
– Most support packet filtering
– Add load balancing
Balancing decisions:
– Pick machine with least # TCP connections
– Factor in weights when selecting machines
– Pick machines round-robin
– Pick fastest connecting machine (SYN/ACK time)
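The first three balancing decisions can be sketched as follows. The `Backend` class and function names are hypothetical, and dividing connections by weight is one plausible way to factor in weights, not necessarily what any particular router does. SYN/ACK timing is omitted because it needs live network measurements.

```python
import itertools

class Backend:
    """Hypothetical record of one server behind the load balancer."""
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight      # higher weight = more capable machine
        self.connections = 0      # currently open TCP connections

def least_connections(backends):
    # Pick the machine with the fewest open TCP connections.
    return min(backends, key=lambda b: b.connections)

def weighted_least_connections(backends):
    # Scale the connection count by the weight so that bigger
    # machines are allowed to carry proportionally more load.
    return min(backends, key=lambda b: b.connections / b.weight)

def round_robin(backends):
    # Endless iterator that hands out machines in turn.
    return itertools.cycle(backends)
```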
High Availability
(HA)
High availability (HA)
Class        Level   Annual downtime
Continuous   100%    0
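The relation between an availability level and annual downtime is simple arithmetic. A small sketch, assuming a 365-day year; the 99.9% input in the usage note is an illustrative value, not a row taken from the table.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def annual_downtime_minutes(availability_percent):
    """Expected downtime per year for a given availability level."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR
```

For example, `annual_downtime_minutes(100)` is 0, matching the Continuous class, while `annual_downtime_minutes(99.9)` is about 525.6 minutes (roughly 8.8 hours).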
Architectural models
HA issues
How do you detect failure?
How long does it take to detect?
How does a dead application move/restart?
Where does it move to?
Heartbeat network
• Machines need to detect faulty systems
– “ping” mechanism
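A sketch of timeout-based failure detection over a heartbeat network. `HeartbeatMonitor`, the 3-second timeout, and the injectable clock are assumptions for illustration; a real cluster would receive heartbeats over one or more dedicated links.

```python
import time

class HeartbeatMonitor:
    """Declare a node failed if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock        # injectable so the logic can be tested
        self.last_seen = {}       # node name -> time of last heartbeat

    def record_heartbeat(self, node):
        # Called whenever a "ping" from `node` is received.
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        # A node is suspected dead once it has been silent for too long.
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)
```

The timeout sets the trade-off between detection time and false alarms: a short timeout detects failures quickly but may flag a node that is merely slow.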
Active/Active
– Failed workload goes to remaining nodes
Design options for failover
Cold failover
– Application restart
Warm failover
– Application checkpoints itself periodically
– Restart last checkpointed image
– May use write-ahead log (tricky)
Hot failover
– Application state is lockstep synchronized
– Very difficult, expensive (resources), prone
to software faults
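Warm failover can be sketched as periodic checkpointing plus restart from the last checkpoint. The file format (JSON), function names, and the `crash_at` parameter used to simulate a failure are all assumptions made for this illustration.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    # Write to a temp file, then rename atomically, so a crash in the
    # middle of saving never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

def load_checkpoint(path, default):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default

def run(path, total_steps, checkpoint_every=10, crash_at=None):
    """Sum 0..total_steps-1, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

Work done since the last checkpoint is repeated after a restart, which is the cost of warm failover compared with hot failover.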
Design options for failover
With either type of failover …
Multi-directional failover
– Failed applications migrate to / restart on
available systems
Cascading failover
– If the backup system fails, application can
be restarted on another surviving system
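Cascading failover amounts to walking an ordered list of candidate systems until a surviving one is found. A minimal sketch, with hypothetical node names and a hypothetical `place` function:

```python
def place(preference_order, alive):
    """Return the first surviving node in the application's preference list."""
    for node in preference_order:
        if node in alive:
            return node
    return None  # no surviving system can host the application
```

With the order `["primary", "backup", "spare"]`, the application runs on `primary`; if both `primary` and `backup` have failed, it is restarted on `spare`.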
System support for HA
• Hot-pluggable devices
– Minimize downtime for component swapping
• Redundant devices
– Redundant power supplies
– Parity on memory
– Mirroring on disks (or RAID for HA)
– Switchover of failed components
• Diagnostics
– On-line serviceability
Shared resources (disk)
Shared disk
– Allows multiple systems to share access to
disk drives
Achieving High Availability
[Diagram: redundant Ethernet paths (ethernet A/B, switches A'/B'), heartbeat links, and a storage area network (iSCSI) reached over Fabric A and Fabric B]
HA Storage: RAID
Redundant Array of Independent (Inexpensive)
Disks
RAID 0: Performance
Striping
• Advantages:
– Performance
– All storage capacity can be used
• Disadvantage:
– Not fault tolerant
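Striping can be illustrated by mapping a logical block number to a disk and an offset on that disk. This sketch assumes a stripe unit of one block; `stripe_location` is an illustrative name.

```python
def stripe_location(logical_block, num_disks):
    """Return (disk index, block offset on that disk) for a RAID 0 layout."""
    return logical_block % num_disks, logical_block // num_disks
```

Consecutive blocks land on different disks, so a large sequential transfer keeps all disks busy at once. Losing any one disk loses part of every large file, which is why striping alone is not fault tolerant.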
RAID 1: HA
Mirroring
• Advantages:
– Double read speed
– No rebuild necessary if a disk fails: just copy
• Disadvantage:
– Only half the space
RAID 3: HA
Separate parity disk
• Advantages:
– Very fast reads
– High efficiency: low ratio of parity/data
• Disadvantages:
– Slow random I/O performance
– Only one I/O at a time
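Parity-based RAID levels rely on XOR: the parity block is the XOR of the data blocks, so any single missing block can be recomputed from the others. A small sketch with made-up 4-byte blocks:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks (illustrative contents) and one parity disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose disk 1 fails: rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Every write must also update the parity block, which is why a single dedicated parity disk limits the array to one write at a time.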
RAID 5
Interleaved parity
• Advantages:
– Very fast reads
– High efficiency: low ratio of parity/data
• Disadvantage:
– Slower writes
– Complex controller
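Interleaved parity means the parity block moves to a different disk on each stripe. The rotation below is one possible layout, chosen for illustration; actual controllers use several different rotation schemes.

```python
def parity_disk(stripe, num_disks):
    """Disk holding the parity block for a given stripe (one possible rotation)."""
    return (num_disks - 1 - stripe) % num_disks
```

With four disks, parity for stripes 0, 1, 2, 3 lands on disks 3, 2, 1, 0 and then wraps around, so parity updates are spread over all disks instead of queuing on a single one.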
RAID 1+0
Combine mirroring and striping
– Striping across a set of disks
– Mirroring of the entire set onto another set
The end