Apache ZooKeeper
Workflow:
Once a ZooKeeper ensemble starts, it waits for clients to connect.
Clients connect to one of the nodes in the ZooKeeper ensemble, which may be
either the leader or a follower node. Once a client is connected, the node
assigns a session ID to that client and sends an acknowledgement to the client.
If the client does not receive an acknowledgement, it simply tries to connect to
another node in the ZooKeeper ensemble.
Once connected to a node, the client sends heartbeats to the node at regular
intervals to make sure that the connection is not lost.
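The connection workflow can be sketched as a small simulation. `Server`, `Client`, the method names, and the 6-second session timeout are hypothetical choices for illustration, not the ZooKeeper client API; the real client library performs these steps internally when it is given a list of servers to connect to.

```python
from __future__ import annotations

import itertools
import random
from typing import Optional


class Server:
    """Hypothetical in-memory stand-in for one ZooKeeper node (not the real server)."""

    _session_ids = itertools.count(1)  # shared counter, so IDs are unique ensemble-wide

    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up
        self.last_heartbeat: dict[int, float] = {}

    def connect(self, now: float) -> Optional[int]:
        """Assign a session ID and acknowledge it, or return None if this node is down."""
        if not self.up:
            return None
        session_id = next(Server._session_ids)
        self.last_heartbeat[session_id] = now
        return session_id

    def heartbeat(self, session_id: int, now: float) -> None:
        self.last_heartbeat[session_id] = now

    def is_alive(self, session_id: int, now: float, timeout: float) -> bool:
        # A session counts as lost once no heartbeat has arrived within the timeout.
        last = self.last_heartbeat.get(session_id)
        return last is not None and now - last <= timeout


class Client:
    def __init__(self, ensemble: list[Server], session_timeout: float = 6.0) -> None:
        self.ensemble = ensemble
        self.session_timeout = session_timeout  # hypothetical value, in seconds
        self.server: Optional[Server] = None
        self.session_id: Optional[int] = None

    def connect(self, now: float, rng: random.Random) -> None:
        # Try the nodes in random order, moving on whenever no acknowledgement comes back.
        candidates = list(self.ensemble)
        rng.shuffle(candidates)
        for server in candidates:
            session_id = server.connect(now)
            if session_id is not None:
                self.server, self.session_id = server, session_id
                return
        raise ConnectionError("no node in the ensemble acknowledged the connection")

    def send_heartbeat(self, now: float) -> None:
        # Called at regular intervals so the node keeps the session alive.
        if self.server is None or self.session_id is None:
            raise RuntimeError("not connected")
        self.server.heartbeat(self.session_id, now)
```

Time is passed in explicitly (`now`) so the heartbeat and timeout logic can be exercised without real sleeping. For example, with `zk1` down, a client built over `[Server("zk1", up=False), Server("zk2"), Server("zk3")]` ends up attached to `zk2` or `zk3`.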
Locking and synchronization service − Locks data while it is being modified. This
mechanism helps with automatic failure recovery when connecting other
distributed applications such as Apache HBase.
Highly reliable data registry − Keeps data available even when one or a few
nodes are down.
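The locking service can be illustrated with the lock recipe from the ZooKeeper documentation: each client creates an ephemeral, sequential znode under a shared lock znode, and the client whose znode has the lowest sequence number holds the lock. The sketch below simulates those znodes in memory rather than talking to a real ensemble; `LockZnodes`, `try_acquire`, and the session names are hypothetical. It also shows where the automatic failure recovery comes from: when a lock holder's session expires, its ephemeral znode disappears and the next waiter takes over.

```python
from __future__ import annotations

import itertools
from typing import Optional


class LockZnodes:
    """Hypothetical in-memory stand-in for the children of a lock znode such as /lock."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        self.children: dict[str, str] = {}  # znode name -> owning session

    def create_ephemeral_sequential(self, session: str) -> str:
        # Sequential znodes get a monotonically increasing, zero-padded 10-digit suffix.
        name = f"lock-{next(self._seq):010d}"
        self.children[name] = session
        return name

    def delete(self, name: str) -> None:
        # Releasing the lock means deleting your own znode.
        self.children.pop(name, None)

    def expire_session(self, session: str) -> None:
        # Ephemeral znodes vanish when their owner's session ends (e.g. the client crashed).
        for name in [n for n, s in self.children.items() if s == session]:
            del self.children[name]


def try_acquire(znodes: LockZnodes, my_node: str) -> tuple[bool, Optional[str]]:
    """Return (holds_lock, node_to_watch) using the lowest-sequence-number rule."""
    ordered = sorted(znodes.children)  # zero padding makes name order equal sequence order
    index = ordered.index(my_node)
    if index == 0:
        return True, None
    # Wait on the immediate predecessor only, so one release wakes exactly one waiter.
    return False, ordered[index - 1]
```

Watching only the immediate predecessor, rather than the whole lock znode, avoids waking every waiter each time the lock is released (the "herd effect" mentioned in the recipe documentation).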
Distributed applications offer many benefits, but they also pose some complex,
hard-to-crack challenges. The ZooKeeper framework provides mechanisms to overcome
these challenges. Race conditions and deadlocks are handled using a fail-safe
synchronization approach. Another major challenge is data inconsistency, which
ZooKeeper resolves through atomicity: an update either succeeds completely or
fails, with no partial results.
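A common way to put this atomicity to work is a version-checked update. Every znode carries a data version that increases on each write, and ZooKeeper's `setData` accepts an expected version, applying the write only if it still matches (with `-1` meaning any version). The in-memory sketch below imitates that behavior; `Store`, `BadVersionError`, and `increment` are hypothetical names, not the client library's API.

```python
from __future__ import annotations

from dataclasses import dataclass


class BadVersionError(Exception):
    """Raised when the expected version no longer matches (hypothetical name)."""


@dataclass
class Znode:
    data: bytes
    version: int = 0


class Store:
    """Hypothetical in-memory znode store illustrating version-checked writes."""

    def __init__(self) -> None:
        self.nodes: dict[str, Znode] = {}

    def create(self, path: str, data: bytes) -> None:
        self.nodes[path] = Znode(data)

    def get(self, path: str) -> tuple[bytes, int]:
        node = self.nodes[path]
        return node.data, node.version

    def set(self, path: str, data: bytes, expected_version: int) -> int:
        node = self.nodes[path]
        # -1 means "any version", mirroring ZooKeeper's setData convention.
        if expected_version != -1 and expected_version != node.version:
            raise BadVersionError(path)
        # Check and write happen together: either both take effect or nothing changes.
        node.data = data
        node.version += 1
        return node.version


def increment(store: Store, path: str) -> int:
    """Read-modify-write that retries instead of overwriting a concurrent update."""
    while True:
        data, version = store.get(path)
        new_value = int(data.decode()) + 1
        try:
            store.set(path, str(new_value).encode(), version)
            return new_value
        except BadVersionError:
            continue  # another writer got there first; re-read and try again
```

In a single-process simulation the retry loop in `increment` never fires, but against a real ensemble two clients racing on the same counter would see one write succeed and the other fail with a version mismatch, after which it re-reads and retries.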
Benefits of ZooKeeper
Ordered Messages
Reliability
Follower Nodes − All nodes other than the leader are called follower nodes. A
follower node can service read requests on its own. Write requests it receives
are forwarded to the leader node. Followers also play an important role in
electing a new leader if the existing leader node goes down.
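The read/write split and leader re-election described above can be sketched as a toy in-memory ensemble. `Node`, `Ensemble`, and the election rule are simplifying assumptions loosely modeled on ZooKeeper's behavior (a majority must be up; prefer the most up-to-date server, then the higher server id); the real Zab protocol and its leader election carry more state, such as epochs and quorum acknowledgements.

```python
from __future__ import annotations

from typing import Optional


class Node:
    """Hypothetical ensemble member; not the real ZooKeeper server code."""

    def __init__(self, server_id: int) -> None:
        self.server_id = server_id
        self.alive = True
        self.data: dict[str, str] = {}
        self.last_zxid = 0  # id of the last write this node has applied


class Ensemble:
    def __init__(self, size: int) -> None:
        self.nodes = [Node(i) for i in range(1, size + 1)]
        self.leader: Optional[Node] = None
        self.elect_leader()

    def elect_leader(self) -> Node:
        # Simplified election: a majority must be up; among live nodes, prefer the one
        # with the most recent write, breaking ties by the higher server id.
        live = [n for n in self.nodes if n.alive]
        if len(live) <= len(self.nodes) // 2:
            raise RuntimeError("no quorum: a majority of nodes must be up")
        self.leader = max(live, key=lambda n: (n.last_zxid, n.server_id))
        return self.leader

    def fail(self, node: Node) -> None:
        node.alive = False
        if node is self.leader:
            self.elect_leader()  # the surviving followers choose a new leader

    def read(self, via: Node, path: str) -> Optional[str]:
        # A follower answers reads from its own local copy, without asking the leader.
        return via.data.get(path)

    def write(self, path: str, value: str) -> int:
        # Writes sent to any node are forwarded to the leader, which assigns the next
        # zxid and pushes the change to every live node (the real protocol waits for
        # a quorum of acknowledgements before committing; that step is omitted here).
        if self.leader is None or not self.leader.alive:
            raise RuntimeError("no leader available")
        zxid = self.leader.last_zxid + 1
        for node in self.nodes:
            if node.alive:
                node.data[path] = value
                node.last_zxid = zxid
        return zxid
```

Because reads are served locally, a follower that lags behind the leader can return slightly stale data in a real ensemble; ZooKeeper offers a `sync` operation for clients that need the latest value before reading.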
Apache Kafka uses it to choose the leader node for topic partitions.
Yahoo! utilizes it as the coordination and failure recovery service for Yahoo!
Message Broker, which is a highly scalable publish-subscribe system
managing thousands of topics for replication and data delivery. It is also used by
the Fetching Service for the Yahoo! crawler, where it manages failure
recovery.