
ZOOKEEPER
Ken Birman, Spring 2018
CS5412: http://www.cs.cornell.edu/courses/cs5412/2018sp
CLOUD SYSTEMS HAVE MANY "FILE SYSTEMS"
Before we discuss Zookeeper, let's think about file systems. Clouds have many!
One is for bulk storage: some form of "global file system" or GFS.
 At Google, it is actually called GFS.
 At Amazon, S3 plays this role.
 Azure uses the "Azure storage fabric".
These often offer built-in block replication through a Linux feature, but the guarantees are somewhat weak.
HOW DO THEY WORK?
 A "Name Node" service runs, fault-tolerantly, and tracks file meta-data (like a Linux inode): name, create/update time, size, seek pointer, etc.
 The name node tells you which data nodes hold your file.
 It is very common to use a simple DHT scheme to fragment the NameNode into subsets, hopefully spreading the work around. DataNodes are hashed at the block level (large blocks).
 Some form of primary/backup scheme is used for fault-tolerance. Writes are automatically forwarded from the primary to the backup.
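As a rough illustration of the hashing idea (not the actual placement scheme of GFS, HDFS, or any particular system), the sketch below hashes a file path to a NameNode shard and each block of the file to a DataNode. The class name, shard count, and node names are hypothetical; real systems add replication and rebalancing on top.

```java
// Hypothetical sketch of DHT-style placement: one NameNode shard owns the
// metadata for a path, and each large block of the file hashes to a DataNode.
import java.util.List;

public class BlockPlacementSketch {
    private final int nameNodeShards;
    private final List<String> dataNodes;

    public BlockPlacementSketch(int nameNodeShards, List<String> dataNodes) {
        this.nameNodeShards = nameNodeShards;
        this.dataNodes = dataNodes;
    }

    // Which NameNode shard holds the metadata for this path?
    public int nameNodeShardFor(String path) {
        return Math.floorMod(path.hashCode(), nameNodeShards);
    }

    // Which DataNode holds block number 'blockIndex' of this file?
    public String dataNodeFor(String path, long blockIndex) {
        int h = Math.floorMod((path + "#" + blockIndex).hashCode(), dataNodes.size());
        return dataNodes.get(h);
    }

    public static void main(String[] args) {
        BlockPlacementSketch p = new BlockPlacementSketch(4, List.of("dn1", "dn2", "dn3"));
        System.out.println(p.nameNodeShardFor("/videos/cam7/clip.mp4"));
        System.out.println(p.dataNodeFor("/videos/cam7/clip.mp4", 12));
    }
}
```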
HOW DO THEY WORK? (continued)
[Figure: a client's "open" request goes to the NameNode, which holds the file metadata and returns a copy of it; the client then reads the file data directly from the DataNodes that store its blocks.]
GLOBAL FILE SYSTEMS: SUMMARY
Pros:
1. Scales well even for massive objects.
2. Works well for large sequential reads/writes, etc.
3. Provides high performance (massive throughput).
4. Simple but robust reliability model.
Cons:
1. NameNode (master) can become overloaded, especially if individual files become extremely popular.
2. NameNode is a single point of failure.
3. A slow NameNode can impact the whole data center.
4. Concurrent writes to the same file can interfere with one another.
EDGE/FOG USE CASE
Building a smart highway: like an air traffic control system, but for cars.
 We want to make sure there is always exactly one controller for each runway at the airport.
 We need to be reliable: if the primary controller crashes, the backup takes over within a few seconds.
COULD WE USE A GLOBAL FILE SYSTEM?
Yes, for images and videos of the cars.
 We could capture them on a cloud of tier-one data collectors.
 Store the data into the global file system, then run image analysis on the files.
WHAT ABOUT THE CONSISTENCY ASPECT?
Consider the "configuration" of our application, which implements this control behavior.
 We need to track which machines are operational and what roles they have.
 We want a file that will hold a table of resources and roles.
 Every application in the system would track the contents of this file.
 So… this file is in the cloud file system! But which file system?
CLOUD FILE SYSTEM LIMITATIONS
[Figure: a timeline of status-log events: "A, B and C are running", "A is primary, C is backup", "D crashed", "A has restarted".]
Consider this simple approach:
 We maintain a log of computer status events: "crashed", "recovered", …
 The log is append-only. When you sense something, do a write to the end of the log.
 Issue: if two events occur more or less at the same time, one can overwrite the other, hence one might be lost.
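To make the failure mode concrete, here is a minimal sketch of the naive "append by read-modify-write" pattern that a weakly consistent object store forces on clients. The in-memory map stands in for an S3-like blob store and the key and event strings are made up; the point is only that two clients running this concurrently can silently lose one of the appends.

```java
// Hypothetical sketch: appending to a log stored as a single object.
// Two clients doing this concurrently can overwrite each other's append.
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

public class NaiveLogAppend {
    // Stand-in for an eventually consistent object store (e.g., an S3-like API).
    static final ConcurrentHashMap<String, byte[]> store = new ConcurrentHashMap<>();

    static void appendStatusEvent(String logKey, String event) {
        byte[] old = store.getOrDefault(logKey, new byte[0]);            // 1. read the current log
        String merged = new String(old, StandardCharsets.UTF_8) + event + "\n"; // 2. append locally
        store.put(logKey, merged.getBytes(StandardCharsets.UTF_8));      // 3. write the whole object back
        // If another client ran steps 1-3 between our step 1 and step 3,
        // its event is overwritten: the log "forgets" it ever happened.
    }

    public static void main(String[] args) {
        appendStatusEvent("status-log", "B reports: A crashed");
        appendStatusEvent("status-log", "D launched");
        System.out.println(new String(store.get("status-log"), StandardCharsets.UTF_8));
    }
}
```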
OVERWRITES CAUSE INCONSISTENCY!
We have discussed the concept of consistency many times, but haven't really spent much time on its evil nemesis, inconsistency.
 If we are logging versions of flight plans, when one update overwrites a second one, some machines might have "seen" the lost one and be in a state different from others. This is an example of a split-brain problem.
 If we are logging the status of machines, some machines may think that C crashed, but others never saw this message. (Worst case: maybe C really is up, and the original log report was due to a transient timeout… but now half our system thinks C is up, and half thinks C is down: another split-brain scenario.)
AVOIDING "SPLIT BRAIN"
The name comes from the title of an old science-fiction movie.
 We must never have two active controllers simultaneously, or two different versions of the same flight plan record that use the same version id.
 This one kind of mistake can turn into many kinds of risky problems that we would never tolerate in an ATC system. So we must avoid such problems entirely!
ROOT ISSUE (1)
The quality of failure sensing is limited.
 If we sense faults by noticing timeouts, we might make mistakes. Then if we reassign the role but the "faulty" machine is really still running and was just transiently inaccessible, we have two controllers!
 This problem is unavoidable in a distributed system, so we have to use agreement on membership, not "apparent" failures sensed by individual machines.
ROOT ISSUE (2)
In many systems two or more programs can try to write to the same file at the same time, or to create the same file.
 In such situations the normal Linux file system will work correctly if the programs and the files are all on one machine. Writes to the local file system won't interfere.
 But in distributed systems using global file systems, we lack this property!
HOW DOES A CONSISTENT LOG SOLVE THIS?
If you trust the log, just read log records from top to bottom: you get an unambiguous way to track membership. In effect, membership is managed in a consistent way.
 Even if a logged record is "wrong", e.g. "node 6 has crashed" but it hasn't, we are forced to agree to use that record.
 But this works only if we can trust the log. And that, in turn, depends on the file system!
 Equivalent mechanisms exist in systems like Derecho (self-managed Paxos).
WITH S3 OR GFS WE CANNOT TRUST A LOG!
These file systems don't guarantee consistency! They are unsafe with concurrent writes. Concurrent log appends could:
 Overwrite one another, so one is lost.
 Be briefly perceived out of order, or some machine might glimpse a written record that will then be erased a moment later and overwritten.
 Sometimes we can even have two conflicting versions that linger for extended periods.
EXACTLY WHAT GOES WRONG?
"Append-only log" behavior → in our application using it:
1. Machines A, B and C are running. → A is selected to be the primary controller for runway 7; C is assigned as backup.
2-a. Machine D is launched.
2-b. Concurrently, B thinks A crashed. → C notices 2-b and takes over. But A wasn't really crashed; B was wrong!
3. 2-b is overwritten by 2-a. → Log entry 2-b is gone. A keeps running.
4. A turns out to be fine, after all. → Now we have A and C both in the "control runway 7" role: a split brain!
WHY CAN'T WE JUST FIX THE BUG?
First, be aware that S3, GFS and similar systems are perfectly fine for object storage by a single, non-concurrent writer.
 If nervous, take one more step and add a "self-certifying signature".
 SHA3 hash codes are common for this: very secure and robust. But there are many options; even simple parity codes can help.
The reason they don't handle concurrent writes well is that the required protocol is slower than the current weak consistency model.
ZOOKEEPER: A SOLUTION FOR THIS ISSUE
The need in many systems is for a place to store configuration, parameters, lists of which machines are running, which nodes are "primary" or "backup", etc.
 We desire a file system interface, but with "strong, fault-tolerant semantics".
 Zookeeper is widely used in this role. It offers stronger guarantees than GFS.
 Data lives in (small) files. Zookeeper is quite slow and not very scalable.
 But even so, it is useful for many purposes, even locking and synchronization.
SHOULD I USE ZOOKEEPER FOR EVERYTHING?
Zookeeper isn't for long-term storage, or for large objects. Put those in the GFS, then share the URL, which is small.
 Use Zookeeper for the small files used for distributed coordination, synchronization, or configuration data (small objects).
 Mostly, try to have Zookeeper handle "active" configuration, so it won't need to persist data to a disk at all.
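As a hedged illustration of this "big data in the GFS, small pointer in Zookeeper" split: the sketch below stores a large object in a hypothetical blob store and keeps only its URL in a znode. The BlobStore interface, the znode path, and the helper name are made up; the ZooKeeper calls are the standard Java client API.

```java
// Sketch: the large object goes to bulk storage, and only its small locator
// (a URL string) is kept in a znode that other components can read or watch.
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class PointerInZk {
    interface BlobStore {            // hypothetical stand-in for S3/GFS-style storage
        String put(byte[] data);     // returns a URL for the stored object
    }

    static void publish(ZooKeeper zk, BlobStore store, byte[] bigObject) throws Exception {
        String url = store.put(bigObject);                      // heavy data goes to the object store
        byte[] pointer = url.getBytes(StandardCharsets.UTF_8);  // tiny pointer goes to Zookeeper
        // Created once here; later updates would use setData on the same path.
        zk.create("/config/latest-model", pointer,
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
}
```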
ZOOKEEPER DURABILITY LIMITATION
Zookeeper is mostly used with no real persistence guarantee, meaning that if we shut it down completely, it normally loses any "state".
 There is a checkpointing mechanism, but it is not fully synchronized with file updates. Recent updates might not yet have been checkpointed.
 The developers view this as a tradeoff for high performance.
 Normally, it is configured to run once every 5s.
 Many applications simply leave Zookeeper running; if it shuts down, the whole application must shut down, then restart.
HOW DOES ZOOKEEPER WORK?
Zookeeper has a layered implementation.
 The health of all components is tracked, so that we know who is up and who has crashed. In Derecho, this is called membership status.
 The Zookeeper meta-data layer is a single program: consistent by design.
 The Zookeeper data replication layer uses an atomic multicast to ensure that all replicas are in the same state.
 For long-term robustness, they checkpoint periodically (every 5s) and restart from the checkpoint.
DETAILS ON ZOOKEEPER (THIS IS THEIR SLIDE SET AND REPEATS SOME PARTS OF MINE)
Hunt, P., Konar, M., Junqueira, F.P. and Reed, B., 2010. ZooKeeper: Wait-free Coordination for Internet-scale Systems. In USENIX Annual Technical Conference (Vol. 8, p. 9).
 Several other papers can be found on https://scholar.google.com.
ZOOKEEPER GOALS
 An open-source platform created to behave like a simple file system.
 Easy to use and fault-tolerant. A bit slow, but not to a point of being an issue.
 Unlike standard cloud computing file systems, it provides strong guarantees.
 Employed in many cloud computing platforms as a quick way to solve coordination and configuration problems.
HOW IS ZOOKEEPER USED TODAY?
Naming service: identifying the nodes in a cluster by name. Similar to DNS, but for nodes.
Configuration management: latest and up-to-date configuration information of the system for a joining node.
Cluster management: joining/leaving of a node in a cluster and node status in real time.
Leader election: electing a node as leader for coordination purposes.
Locking and synchronization service: locking the data while modifying it. This mechanism helps with automatic failure recovery when connecting other distributed applications, like Apache HBase.
Highly reliable data registry: availability of data even when one or a few nodes are down.
Recall the Amazon µ-Services example.
[Figure: a browser connects over HTTPS to a Web Interface Server (Client SDK), which talks over HTTP or TCP/IP to an Application Server (Server SDK, Kafka or SQS, resource plugins), which fans out to µ-services such as Shipment Planner, Billing, Packing, and Mailing Labels. In a µ-services system, these components need resource management and scheduling.]
A SIMPLE µ-SERVICE SYSTEM
[Figure: N jobs arrive at an API Gateway / UI Server & Router, which routes them to replicated Microservices 1, 2 and 3.]
Some questions that might arise:
• Is Replica 2 of Microservice 3 up and running?
• Do I have at least one service running?
• Microservice 3 uses Master-Worker, and the Master just failed. What do I do?
• Replica 2 needs to find its configuration information. Where?
APACHE ZOOKEEPER AND µ-SERVICES
Zookeeper can manage information in your system:
 IP addresses, version numbers, and other configuration information of your microservices.
 The health of the microservices.
 The state of a particular calculation.
 Group membership.
APACHE ZOOKEEPER IS…
A system for solving distributed coordination problems for multiple cooperating clients.
A lot like a distributed file system...
 As long as the files are tiny.
 You can get notified when a file changes.
 The full file pathname is meaningful to applications.
A way to solve µ-service management problems.
THE ZOOKEEPER SERVICE
Zookeeper is itself an interesting distributed system (your microservices are its clients).
 The ZooKeeper service is replicated over a set of machines.
 All machines store a copy of the data in memory (!). It is checkpointed to disk if you wish.
 A leader is elected on service startup.
 Clients connect to a single ZooKeeper server and maintain a TCP connection.
 A client can read from any ZooKeeper server.
 Writes go through the leader and need majority consensus.
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription
ZOOKEEPER SUMMARY
ZooKeeper provides a simple and high-performance kernel for building more complex coordination primitives.
 It helps distributed components share information.
 Clients (your applications) contact Zookeeper services to read and write metadata.
 Reads come from a cache, but writes are more complicated.
 Sometimes, just the existence and name of a node are enough.
ZOOKEEPER SUMMARY (CONT.)
Tree model for organizing information into nodes.
 Node names may be all you need.
 Lightly structured metadata is stored in the nodes.
Wait-free aspects of shared registers, with an event-driven mechanism similar to the cache invalidations of distributed file systems.
Targets simple metadata systems that read more than they write.
 Small total storage.
ZOOKEEPER, MORE BRIEFLY
Zookeeper clients (that is, your applications) can create and discover nodes on Zookeeper trees.
Clients can put small pieces of data into the nodes and get small pieces out.
 1 MB max data per znode by default.
 Each node also has built-in metadata, like its version number.
You could build a small DNS with Zookeeper.
Some simple analogies: lock files and .pid files on Linux systems.
ZNODES
Znodes maintain file meta-data, with version numbers for data changes and ACL changes, and timestamps.
 Version numbers increase with changes.
 Data is read and written in its entirety.
ZNODE TYPES
Regular: clients create and delete them explicitly.
Ephemeral: like regular znodes, but associated with sessions; deleted when the session expires.
Sequential: a property of regular and ephemeral znodes; a universal, monotonically increasing counter is appended to the name.
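For reference, here is a hedged mapping of these types onto the Java client's CreateMode constants; the /demo paths and empty data are placeholders, and the sketch assumes the /demo parent znode already exists.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeTypes {
    static void examples(ZooKeeper zk) throws Exception {
        byte[] none = new byte[0];
        // Regular: lives until a client deletes it explicitly.
        zk.create("/demo/regular", none, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Ephemeral: removed automatically when this client's session expires.
        zk.create("/demo/ephemeral", none, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        // Sequential: a monotonically increasing counter is appended to the name,
        // e.g. /demo/seq-0000000007 (available for both persistent and ephemeral nodes).
        String actual = zk.create("/demo/seq-", none,
                                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println(actual);
    }
}
```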
ZOOKEEPER API (1/2)
create(path, data, flags): creates a znode with path name path, stores data[] in it, and returns the name of the new znode.
 flags enables a client to select the type of znode (regular or ephemeral) and to set the sequential flag.
delete(path, version): deletes the znode path if that znode is at the expected version.
exists(path, watch): returns true if the znode with path name path exists, and returns false otherwise.
 Note the watch flag.
ZOOKEEPER API (2/2)
getData(path, watch): returns the data and meta-data, such as version information, associated with the znode.
setData(path, data, version): writes data[] to znode path if the version number is the current version of the znode.
getChildren(path, watch): returns the set of names of the children of a znode.
sync(path): waits for all updates pending at the start of the operation to propagate to the server that the client is connected to.
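A small, hedged tour of the corresponding Java client calls (org.apache.zookeeper.ZooKeeper). The connect string, paths, and data below are made up; error handling is minimal. Note how setData's version argument gives optimistic concurrency: the write succeeds only if nobody changed the znode since we read it.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkApiTour {
    public static void main(String[] args) throws Exception {
        // Hypothetical ensemble address; 5000 ms session timeout; no-op default watcher.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, event -> {});

        // create: a regular (persistent) znode holding a little data.
        zk.create("/demo", "hello".getBytes(StandardCharsets.UTF_8),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // getData: read the data and its metadata (version, timestamps).
        Stat stat = new Stat();
        byte[] data = zk.getData("/demo", false, stat);
        System.out.println(new String(data, StandardCharsets.UTF_8) + " v" + stat.getVersion());

        // setData: conditional write, succeeds only if the znode is still at the
        // version we read (optimistic concurrency).
        zk.setData("/demo", "world".getBytes(StandardCharsets.UTF_8), stat.getVersion());

        // getChildren and delete round out the tour.
        System.out.println(zk.getChildren("/", false));
        zk.delete("/demo", -1);   // -1 means "any version"
        zk.close();
    }
}
```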
CONFIGURATION MANAGEMENT
All clients get their configuration information from a named znode, e.g. /root/config-me.
 Example: you can build a public key store with Zookeeper.
 Clients set watches to see if the configuration changes (a client-side sketch follows).
Zookeeper doesn't explicitly decide which clients are allowed to update the configuration.
 That would be an implementation choice.
 Zookeeper uses a leader-follower model internally, so you could model your own implementation after this.
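Here is a minimal sketch of the client side, assuming the /root/config-me znode from the slide already exists. ZooKeeper watches fire only once, so the client re-reads the znode (and thereby re-arms the watch) on every change.

```java
// Sketch of a client that tracks /root/config-me and re-arms its watch on each change.
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConfigWatcher {
    private final ZooKeeper zk;
    private volatile String currentConfig;

    public ConfigWatcher(ZooKeeper zk) { this.zk = zk; }

    public void start() throws Exception {
        readAndWatch();
    }

    private void readAndWatch() throws Exception {
        Stat stat = new Stat();
        byte[] data = zk.getData("/root/config-me", event -> {
            // When the znode's data changes, re-read it and re-register the watch.
            if (event.getType() == EventType.NodeDataChanged) {
                try { readAndWatch(); } catch (Exception e) { e.printStackTrace(); }
            }
        }, stat);
        currentConfig = new String(data, StandardCharsets.UTF_8);
        System.out.println("config v" + stat.getVersion() + ": " + currentConfig);
    }
}
```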
THE RENDEZVOUS PROBLEM
A classic distributed computing problem.
Consider master-worker:
 Specific configurations may not be known until runtime, e.g. IP addresses and port numbers.
 Workers and the master may start in any order.
Zookeeper implementation (sketched below):
 Create a rendezvous node: /root/rendezvous.
 Workers read /root/rendezvous and set a watch.
 If it is empty, they use the watch to detect when the master posts its configuration information.
 The master fills in its configuration information (host, port).
 Workers are notified of the content change and get the configuration information.
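A hedged sketch of that recipe using the standard Java client: /root/rendezvous comes from the slide, the host:port string format is an assumption, and a production version would also handle session loss and retries.

```java
// Sketch of the rendezvous recipe: the master posts its host:port under
// /root/rendezvous; workers read it, or watch until it appears.
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class Rendezvous {
    // Master side: publish its address (assumed "host:port" string) once it is up.
    static void masterPublish(ZooKeeper zk, String hostPort) throws Exception {
        zk.create("/root/rendezvous", hostPort.getBytes(StandardCharsets.UTF_8),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }

    // Worker side: block until the master's configuration shows up, then read it.
    static String workerWait(ZooKeeper zk) throws Exception {
        CountDownLatch created = new CountDownLatch(1);
        // exists() with a watcher fires when the node is created.
        if (zk.exists("/root/rendezvous", event -> {
                if (event.getType() == EventType.NodeCreated) created.countDown();
            }) == null) {
            created.await();   // wait for the NodeCreated event
        }
        return new String(zk.getData("/root/rendezvous", false, null), StandardCharsets.UTF_8);
    }
}
```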
LOCKS
Familiar analogy: lock files used by Apache HTTPD and MySQL processes.
Zookeeper example: who is the leader with the primary copy of the data?
Implementation:
 The leader creates an ephemeral file: /root/leader/lockfile.
 Other would-be leaders place watches on the lock file.
 If the leader client dies or doesn't renew the lease, clients can attempt to create a replacement lock file.
Use SEQUENTIAL to solve the herd-effect problem (a sketch follows this list):
 Create a sequence of ephemeral child nodes.
 Clients only watch the node immediately ahead of them in the sequence.
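A minimal sketch of that SEQUENTIAL recipe: each contender creates an ephemeral+sequential child under a lock directory, the lowest sequence number holds the lock, and everyone else watches only its immediate predecessor. The lockDir path is hypothetical, and in practice most deployments use a packaged recipe (e.g. Apache Curator) rather than hand-rolling this.

```java
// Herd-effect-free lock sketch: watch only the node immediately ahead of you.
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SeqLockSketch {
    static void acquire(ZooKeeper zk, String lockDir) throws Exception {
        // Ephemeral: the node vanishes if our session dies, releasing the lock.
        String me = zk.create(lockDir + "/lock-", new byte[0],
                              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = me.substring(me.lastIndexOf('/') + 1);

        while (true) {
            List<String> children = zk.getChildren(lockDir, false);
            Collections.sort(children);
            int myIndex = children.indexOf(myName);
            if (myIndex == 0) return;   // lowest sequence number: we hold the lock

            // Watch only the node just ahead of us, avoiding the herd effect.
            String predecessor = lockDir + "/" + children.get(myIndex - 1);
            CountDownLatch gone = new CountDownLatch(1);
            if (zk.exists(predecessor, event -> gone.countDown()) != null) {
                gone.await();           // wait for the predecessor to disappear, then re-check
            }
        }
    }
}
```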
Zookeeper is popular in science-computing gateways.
[Figure: a browser connects over HTTPS to a Web Interface Server (Client SDK), which talks over HTTP or TCP/IP to a "super" scheduling and resource-management Application Server (Server SDK, resource plugins). Behind it sit clusters with different architectures, schedulers, and administrative domains: Karst (MOAB/Torque), Stampede (SLURM), Comet (SLURM), Jureca (SLURM). In a micro-service architecture, these also need scheduling.]
With -
Services we API Server

often
replicate Application Metadata
components. Manager Server

API Server
API Server
API Server
API Server

Application Metadata
Application Metadata
Manager
Application Server
Metadata
Manager
Application Server
Metadata
Manager
Application Server
Metadata
Manager
Application Server
Metadata
Manager Server
Manager Server
WHY IS THIS FORM OF REPLICATION NEEDED?
Fault tolerance.
Increased throughput, load balancing.
Component versions:
 Not all components of the same type need to be on the same version.
 Backward-compatibility checking.
Component flavors:
 Application managers can serve different types of resources.
 It is useful to separate them into separate processes if their libraries conflict.
CONFIGURATION MANAGEMENT
Problem: gateway components in a distributed system need to get the correct configuration file.
Solution: components contact Zookeeper to get configuration metadata.
Comments: this includes both the component's own configuration file and the configurations of other components.
 This is the rendezvous problem again.
SERVICE DISCOVERY
Problem: Component A needs to find instances of Component B.
Solution: use Zookeeper to find the available group of member instances of Component B (see the sketch below).
 More: get useful metadata about Component B instances, like version, domain name, port #, and flavor.
Comments:
 Useful for components that need to communicate directly, but not for asynchronous communication (message queues).
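A hedged discovery sketch: list the children of a group znode and read each child's data, which the instances use to advertise their endpoint and metadata. The /services/componentB path and the "host:port;version" data format are assumptions, not anything this deck prescribes.

```java
// Sketch of discovery: enumerate instances of Component B from a group znode.
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class Discovery {
    static List<String> instancesOfB(ZooKeeper zk) throws Exception {
        List<String> endpoints = new ArrayList<>();
        for (String child : zk.getChildren("/services/componentB", false)) {
            byte[] meta = zk.getData("/services/componentB/" + child, false, null);
            endpoints.add(new String(meta, StandardCharsets.UTF_8));  // e.g. "host:port;version"
        }
        return endpoints;
    }
}
```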
GROUP MEMBERSHIP
Problem: a job needs to go to a specific flavor of application manager. How can this be located?
Solution: have application managers join the appropriate Zookeeper-managed group when they come up (a sketch follows).
Comments: this is useful to support scheduling.
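The join side pairs naturally with the discovery sketch above: a manager of a given flavor registers an ephemeral child under its group, so if it dies its entry disappears automatically. The /appmanagers/<flavor> path is hypothetical.

```java
// Sketch of joining a Zookeeper-managed group with an ephemeral member node.
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class GroupJoin {
    static String join(ZooKeeper zk, String flavor, String myHostPort) throws Exception {
        String group = "/appmanagers/" + flavor;   // hypothetical group path; parent must exist
        // Ephemeral + sequential: the entry vanishes with the session, and the
        // sequence suffix gives each member a unique name.
        return zk.create(group + "/member-", myHostPort.getBytes(StandardCharsets.UTF_8),
                         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    }
}
```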
SYSTEM STATE FOR DISTRIBUTED SYSTEMS
Which servers are up and running? What versions?
Services that run for long periods could use ZK to indicate whether they are busy (or under heavy load) or not.
Note the overlap with our Registry:
 What state does the Registry manage? What state would be more appropriate for ZK?
LEADER ELECTION
Problem: metadata servers are replicated for read access, but only the master has write privileges. The master crashes.
Solution: use Zookeeper to elect a new metadata server leader.
Comment: this is not necessarily the best way to do this.
UNDER THE ZOOKEEPER HOOD: THE ZAB PROTOCOL
ZOOKEEPER HANDLING OF WRITES
READ requests are served by any Zookeeper server.
 This scales linearly, although the information can be stale.
WRITE requests change state, so they are handled differently:
 One Zookeeper server acts as the leader.
 The leader executes all write requests forwarded by followers.
 The leader then broadcasts the changes.
 The update is successful if a majority of Zookeeper servers have the correct state at the end of the protocol.
SOME ZOOKEEPER IMPLEMENTATION SIMPLIFICATIONS
Uses TCP for its transport layer.
 Message order is maintained by the network.
 The network is reliable?
Assumes a reliable file system.
 Logging and DB checkpointing.
SOME ZOOKEEPER IMPLEMENTATION SIMPLIFICATIONS (CONT.)
Does write-ahead logging.
 Requests are first written to the log.
 The Zookeeper DB is updated from the log.
Zookeeper servers can acquire the correct state by reading the logs from the file system.
 Reading from a checkpoint means you don't have to reread the entire history.
Assumes a single administrator, so no deep security.
Speed isn't everything: having many servers increases reliability but decreases performance.
[Figure: a cluster of 5 ZooKeeper instances responds to manually injected failures.
1. Failure and recovery of a follower.
2. Failure and recovery of another follower.
3. Failure of the leader (about 200 ms to recover).
4. Failure of two followers (4a and 4b), recovery at 4c.
5. Failure of the leader.
6. Recovery of the leader.]
STATE OF THE ART TODAY?
Zookeeper is solving a problem that Leslie Lamport formalized as the state machine replication problem. Much work has been done on this.
 The "Paxos" protocols solve this problem. Zookeeper's ZAB is similar to the Paxos concept of an "Atomic Multicast" (sometimes called "Vertical Paxos").
 But checkpointing is not the same as the true durable Paxos. Durable Paxos is like checkpointing on every operation; Zookeeper does so every 5s.
SUMMARY AND CAUTIONS
Zookeeper is powerful for system management/configuration data.
Derecho is great for ultra-high-speed atomic multicast and replication.
A message queue (Corfu) is also a powerful distributed computing concept.
 You could build a queuing system with Zookeeper, but you shouldn't.
 See https://cwiki.apache.org/confluence/display/CURATOR/TN
 There are high-quality queuing systems already.
Where is the state of your system? Make one choice.