Lec20 Distributed
• File Caching
• Durability
• Authorization
• Distributed Systems
RAID 1: Disk Mirroring/Shadowing
[Figure: each disk duplicated onto its shadow within a recovery group]
• Each disk is fully duplicated onto its "shadow"
– For high I/O rate, high availability environments
– Most expensive solution: 100% capacity overhead
• Bandwidth sacrificed on write:
– Logical write = two physical writes
– Highest bandwidth when disk heads and rotation fully
synchronized (hard to do exactly)
• Reads may be optimized
– Can have two independent reads to same data
• Recovery:
– Disk failure: replace disk and copy data to new disk
– Hot Spare: idle disk already attached to system to be
used for immediate replacement
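The mirroring behavior above can be sketched in a few lines of Python. This is a minimal in-memory model, not a driver: the `MirroredDisk` class and its block-list "disks" are illustrative assumptions, used only to show that one logical write becomes two physical writes, that reads can be balanced across the two copies, and that recovery is a copy from the surviving disk.

```python
class MirroredDisk:
    """Toy model of RAID 1: two identical copies of every block."""

    def __init__(self, num_blocks):
        # Two "disks", each an array of blocks (None = unwritten)
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self.next_read = 0  # round-robin between the two copies

    def write(self, block_no, data):
        # One logical write = two physical writes (the bandwidth cost)
        for disk in self.disks:
            disk[block_no] = data

    def read(self, block_no):
        # Either copy can serve a read; alternating balances load
        data = self.disks[self.next_read][block_no]
        self.next_read ^= 1
        return data

    def recover(self, failed):
        # Replace failed disk, then copy data from the survivor
        survivor = self.disks[1 - failed]
        self.disks[failed] = list(survivor)
```

A hot spare simply means the replacement disk in `recover` is already attached, so the copy can begin immediately after the failure is detected.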
11/10/10 Kubiatowicz CS162 ©UCB Fall 2010 Lec 20.16
RAID 5+: High I/O Rate Parity
• Data striped across multiple disks
– Successive blocks stored on successive (non-parity) disks
– Increased bandwidth over single disk
• Parity block constructed by XORing data blocks in stripe
– P0 = D0⊕D1⊕D2⊕D3
– Can destroy any one disk and still reconstruct data
– Suppose D3 fails, then can reconstruct: D3 = D0⊕D1⊕D2⊕P0
• Stripe layout (one stripe unit per disk; logical disk addresses increase downward):

  Disk 1  Disk 2  Disk 3  Disk 4  Disk 5
  D0      D1      D2      D3      P0
  D4      D5      D6      P1      D7
  D8      D9      P2      D10     D11
  D12     P3      D13     D14     D15
  P4      D16     D17     D18     D19
  D20     D21     D22     D23     P5

• Later in term: talk about spreading information widely
across internet for durability.
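The parity and reconstruction equations above can be checked directly in code, since XOR is its own inverse. This is a sketch with illustrative helper names (`parity`, `reconstruct` are not from the slides): parity is the byte-wise XOR of the data blocks in a stripe, and a missing block is rebuilt by XORing the surviving blocks with the parity block.

```python
from functools import reduce

def parity(blocks):
    """Byte-wise XOR of equal-length blocks: P = D0 ^ D1 ^ ... ^ Dn."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(survivors):
    """Rebuild the one missing block of a stripe from the surviving
    data blocks plus parity. Because XOR is self-inverse, this is
    the same computation as parity itself."""
    return parity(survivors)

# Example: a 4-data-disk stripe with 4-byte blocks.
# P0 = D0 ^ D1 ^ D2 ^ D3; if D3 fails, D3 = D0 ^ D1 ^ D2 ^ P0.
D0, D1, D2, D3 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", \
                 b"\xaa\xbb\xcc\xdd", b"\x00\xff\x00\xff"
P0 = parity([D0, D1, D2, D3])
assert reconstruct([D0, D1, D2, P0]) == D3
```

The same property explains why only one failed disk per stripe is tolerable: with two blocks missing, one XOR equation cannot recover two unknowns.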
Hardware RAID: Subsystem Organization
[Figure: host CPU and adapter connected to an array controller, which drives single-board disk controllers, often piggy-backed in small format devices]
• Some systems duplicate all hardware, namely
controllers, busses, etc.
Centralized vs. Distributed Systems
[Figure: Client/Server Model (clients connected to a central server) vs. Peer-to-Peer Model]
• Centralized System: System in which major functions
are performed by a single physical computer
– Originally, everything on single computer
– Later: client/server model
• Distributed System: physically separate computers
working together on some task
– Early model: multiple servers working together
» Probably in the same room or building
» Often called a “cluster”
– Later models: peer-to-peer/wide-spread collaboration
Distributed Systems: Motivation/Issues
• Why do we want distributed systems?
– Cheaper and easier to build lots of simple computers
– Easier to add power incrementally
– Users can have complete control over some components
– Collaboration: Much easier for users to collaborate through
network resources (such as network file systems)
• The promise of distributed systems:
– Higher availability: one machine goes down, use another
– Better durability: store data in multiple locations
– More security: each piece easier to make secure
• Reality has been disappointing
– Worse availability: depend on every machine being up
» Lamport: “a distributed system is one where I can’t do work
because some machine I’ve never heard of isn’t working!”
– Worse reliability: can lose data if any machine crashes
– Worse security: anyone in world can break into system
• Coordination is more difficult
– Must coordinate multiple copies of shared state information
(using only a network)
– What would be easy in a centralized system becomes a lot
more difficult
Distributed Systems: Goals/Requirements
• Transparency: the ability of the system to mask its
complexity behind a simple interface
• Possible transparencies:
– Location: Can’t tell where resources are located
– Migration: Resources may move without the user knowing
– Replication: Can’t tell how many copies of resource exist
– Concurrency: Can’t tell how many users there are
– Parallelism: System may speed up large jobs by splitting
them into smaller pieces
– Fault Tolerance: System may hide various things that go
wrong in the system
• Transparency and collaboration require some way for
different processors to communicate with one another