Unit5 BDA

Uploaded by

Uday Kiran

1. ZOOKEEPER – OVERVIEW

ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and
managing a service in a distributed environment is a complicated process. ZooKeeper solves this
issue with its simple architecture and API. ZooKeeper allows developers to focus on core
application logic without worrying about the distributed nature of the application.

The ZooKeeper framework was originally built at “Yahoo!” for accessing their applications in an
easy and robust manner. Later, Apache ZooKeeper became a standard coordination service
used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses
ZooKeeper to track the status of distributed data.

Before moving further, it is important that we know a thing or two about distributed applications.
So, let us start the discussion with a quick overview of distributed applications.

Distributed Application
A distributed application can run on multiple systems in a network at a given time
(simultaneously), with the systems coordinating among themselves to complete a particular task
in a fast and efficient manner. Normally, complex and time-consuming tasks that would take hours
to complete in a non-distributed application (running on a single system) can be done in minutes
by a distributed application using the computing capabilities of all the systems involved.

The time to complete the task can be further reduced by configuring the distributed application
to run on more systems. A group of systems in which a distributed application is running is called
a Cluster and each machine running in a cluster is called a Node.

A distributed application has two parts, the Server and the Client application. Server applications
are actually distributed and have a common interface so that clients can connect to any server in
the cluster and get the same result. Client applications are the tools to interact with a distributed
application.

Benefits of Distributed Applications

• Reliability – Failure of a single system or a few systems does not cause the whole system
to fail.

• Scalability – Performance can be increased as and when needed by adding more
machines with a minor change in the configuration of the application and with no downtime.

• Transparency – Hides the complexity of the system and presents it as a single entity
or application.

Challenges of Distributed Applications

• Race condition – Two or more machines try to perform a particular task that
actually needs to be done by only a single machine at any given time. For example,
shared resources should only be modified by a single machine at any given time.

• Deadlock – Two or more operations wait indefinitely for each other to complete.

• Inconsistency – Partial failure of data, where an update reaches only part of the system.
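
The race condition described above can be illustrated on a single machine, with threads standing in for machines. The following is a minimal sketch and not ZooKeeper code: the names `SharedCounter` and `run_workers` are hypothetical, and a local `threading.Lock` plays the role that a distributed lock would play across a cluster.

```python
import threading


class SharedCounter:
    """A shared resource that only one worker may modify at a time."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is the critical section; holding the
        # lock ensures that only one thread runs it at any given time.
        with self._lock:
            current = self.value
            self.value = current + 1


def run_workers(worker_count, increments_per_worker):
    """Start several threads (stand-ins for machines) that update one counter."""
    counter = SharedCounter()

    def work():
        for _ in range(increments_per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because every update happens while the lock is held, no increment is lost, whatever order the workers run in.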

What is Apache ZooKeeper Meant For?

Apache ZooKeeper is a service used by a cluster (group of nodes) to coordinate between
themselves and maintain shared data with robust synchronization techniques. ZooKeeper is itself
a distributed application providing services for writing a distributed application.

The common services provided by ZooKeeper are as follows:

• Naming service – Identifying the nodes in a cluster by name. It is similar to DNS, but
for nodes.

• Configuration management – Latest and up-to-date configuration information of the
system for a joining node.

• Cluster management – Joining / leaving of a node in a cluster and node status in real
time.

• Leader election – Electing a node as leader for coordination purposes.

• Locking and synchronization service – Locking the data while modifying it. This
mechanism helps with automatic failure recovery when connecting to other distributed
applications like Apache HBase.

• Highly reliable data registry – Availability of data even when one or a few nodes are
down.

Distributed applications offer a lot of benefits, but they also pose a few complex and hard-to-crack
challenges. The ZooKeeper framework provides mechanisms to overcome these
challenges. Race conditions and deadlocks are handled using a fail-safe synchronization
approach. Another main drawback is inconsistency of data, which ZooKeeper resolves with
atomicity.

Benefits of ZooKeeper
Here are the benefits of using ZooKeeper:

• Simple distributed coordination process

• Synchronization – Mutual exclusion and co-operation between server processes. This
process helps Apache HBase with configuration management.

• Ordered messages

• Serialization – Encodes the data according to specific rules, which ensures that your
application runs consistently. This approach can be used in MapReduce to coordinate a queue
for executing running threads.

• Reliability

• Atomicity – A data transfer either succeeds or fails completely; no transaction is partial.

2. ZOOKEEPER – FUNDAMENTALS

Before going deep into the working of ZooKeeper, let us take a look at the fundamental concepts
of ZooKeeper. We will discuss the following topics in this chapter:

• Architecture
• Hierarchical namespace
• Session
• Watches

Architecture of ZooKeeper
Take a look at the following diagram. It depicts the “Client-Server Architecture” of ZooKeeper.

Each one of the components that is a part of the ZooKeeper architecture has been explained in
the following table.

Part       Description

Client     Clients, the nodes in our distributed application cluster, access
           information from the server. At a regular time interval, every client
           sends a message to the server to let the server know that the client
           is alive. Similarly, the server sends an acknowledgement when a client
           connects. If there is no response from the connected server, the
           client automatically redirects the message to another server.

Server     Server, one of the nodes in our ZooKeeper ensemble, provides all the
           services to clients. It sends an acknowledgement to the client to
           inform it that the server is alive.

Ensemble   Group of ZooKeeper servers. The minimum number of nodes that is
           required to form an ensemble is 3.

Leader     Server node which performs automatic recovery if any of the connected
           nodes fails. Leaders are elected on service startup.

Follower   Server node which follows the leader's instructions.
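
The "client is alive" messages described for the Client part can be sketched as a small bookkeeping structure on the server side. This is an illustrative sketch only: `HeartbeatTracker`, its method names, and the timeout value are hypothetical and are not part of the ZooKeeper API. Time is passed in explicitly so that the behaviour is easy to follow.

```python
class HeartbeatTracker:
    """Tracks the last heartbeat time of each client (hypothetical helper)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}  # client id -> time of the most recent heartbeat

    def heartbeat(self, client_id, now):
        # Record when the client last reported in.
        self._last_seen[client_id] = now

    def alive_clients(self, now):
        # A client counts as alive if its last heartbeat is within the timeout.
        return sorted(
            client_id
            for client_id, seen in self._last_seen.items()
            if now - seen <= self.timeout_seconds
        )
```

For example, with a timeout of 10 seconds, a client whose last heartbeat arrived 12 seconds ago is no longer reported by `alive_clients`, while one that reported 4 seconds ago still is.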

Hierarchical Namespace
The following diagram depicts the tree structure of the ZooKeeper file system used for in-memory
representation. A ZooKeeper node is referred to as a znode. Every znode is identified by a name,
and the names along its path are separated by a slash (/).

• In the diagram, first you have a root znode denoted by “/”. Under the root, you have two
logical namespaces, config and workers.

• The config namespace is used for centralized configuration management and the
workers namespace is used for naming.

• Under the config namespace, each znode can store up to 1 MB of data. This is similar to a UNIX
file system except that a parent znode can store data as well. The main purpose of this
structure is to store synchronized data and describe the metadata of the znode. This
structure is called the ZooKeeper Data Model.
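
The data model described above can be sketched as a small in-memory tree. This is a toy stand-in, not a ZooKeeper client: the class `ZnodeTree` and its methods are hypothetical names chosen for illustration, and the sample data values are made up. It shows the two points made in the list: paths are slash-separated, and a parent znode can hold data as well as children.

```python
MAX_DATA_BYTES = 1024 * 1024  # the 1 MB per-znode limit mentioned in the text


class ZnodeTree:
    """In-memory stand-in for the hierarchical namespace (not a real client)."""

    def __init__(self):
        # Map each absolute path to its data; the root always exists.
        self._data = {"/": b""}

    def create(self, path, data=b""):
        if not path.startswith("/") or path == "/" or path.endswith("/"):
            raise ValueError("path must be absolute and name a non-root znode")
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("znode data exceeds 1 MB")
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data:
            raise KeyError("parent znode does not exist: " + parent)
        if path in self._data:
            raise KeyError("znode already exists: " + path)
        self._data[path] = data

    def get(self, path):
        return self._data[path]

    def children(self, path):
        # Direct children only: names under the prefix with no further slash.
        prefix = "/" if path == "/" else path + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZnodeTree()
tree.create("/config", b"cluster-wide settings")  # a parent that stores data
tree.create("/config/db", b"sample value")
tree.create("/workers")
```

After these calls, `tree.children("/")` lists `config` and `workers`, and `tree.get("/config")` still returns the data stored on the parent itself.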

Every znode in the ZooKeeper data model maintains a stat structure. A stat simply provides
the metadata of a znode. It consists of Version number, Access Control List (ACL), Timestamp,
and Data length.

• Version number: Every znode has a version number, which means that every time the data
associated with the znode changes, its corresponding version number also
increases. The version number is important when multiple ZooKeeper clients are
trying to perform operations on the same znode.

• Access Control List (ACL): The ACL is basically an access control mechanism for
the znode. It governs all the znode read and write operations.

• Timestamp: The timestamp represents the time elapsed from znode creation and modification.
It is usually represented in milliseconds. ZooKeeper identifies every change to the znodes
by a “Transaction ID” (zxid). Each zxid is unique, and zxids are ordered, so you can tell
which of two changes happened first; the timestamps let you identify the time elapsed
from one request to another.

• Data length: The total amount of data stored in a znode is the data length. You can
store a maximum of 1 MB of data.
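
The role of the version number when several clients update the same znode can be sketched as a conditional write. This is a minimal sketch under stated assumptions: `VersionedZnode` and `BadVersionError` are hypothetical names for illustration, not the ZooKeeper client API, and starting the version at 0 is an assumption of the sketch.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version is stale (hypothetical name)."""


class VersionedZnode:
    """A single znode whose data can only be changed by a conditional write."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0  # assumption: starts at 0, incremented on every change

    def set_data(self, data, expected_version):
        # Reject the write if another client changed the znode since the
        # caller last read it.
        if expected_version != self.version:
            raise BadVersionError(
                "expected %d, found %d" % (expected_version, self.version)
            )
        self.data = data
        self.version += 1
        return self.version
```

If two clients both read version 0 and then both try to write, the first write succeeds and moves the version to 1, while the second fails with `BadVersionError` and must re-read before retrying.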

Types of Znodes
Znodes are categorized as persistent, sequential, and ephemeral.

• Persistent znode: A persistent znode stays alive even after the client that created that
particular znode is disconnected. By default, all znodes are persistent unless otherwise
specified.

• Ephemeral znode: Ephemeral znodes are active as long as the client is alive. When a client
gets disconnected from the ZooKeeper ensemble, its ephemeral znodes get deleted
automatically. For this reason, ephemeral znodes are not allowed to have children.
If an ephemeral znode is deleted, then the next suitable node will fill its position.
Ephemeral znodes play an important role in Leader election.

• Sequential znode: Sequential znodes can be either persistent or ephemeral. When a
new znode is created as a sequential znode, ZooKeeper sets the path of the znode
by attaching a 10-digit sequence number to the original name. For example, if a znode
with path /myapp is created as a sequential znode, ZooKeeper will change the path to
/myapp0000000001 and set the next sequence number to 0000000002. If two
sequential znodes are created concurrently, ZooKeeper never assigns the same number
to both. Sequential znodes play an important role in Locking and Synchronization.
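
The naming of sequential znodes and the automatic removal of ephemeral znodes can be sketched with a toy registry. This is not a ZooKeeper client: `ZnodeRegistry`, its methods, and the session ids are hypothetical. Two simplifications are assumed: a single counter is shared by all paths, and the counter starts at 1 so that the output matches the /myapp example in the text.

```python
class ZnodeRegistry:
    """Toy registry of znodes keyed by path (not a real ZooKeeper client)."""

    def __init__(self):
        self._owners = {}        # path -> owning session id, or None if persistent
        self._next_sequence = 1  # assumption: start at 1 to match the text's example

    def create(self, path, session_id=None, ephemeral=False, sequential=False):
        if ephemeral and session_id is None:
            raise ValueError("an ephemeral znode needs an owning session")
        if sequential:
            # Append a zero-padded 10-digit counter to the requested name.
            path = "%s%010d" % (path, self._next_sequence)
            self._next_sequence += 1
        if path in self._owners:
            raise KeyError("znode already exists: " + path)
        self._owners[path] = session_id if ephemeral else None
        return path

    def close_session(self, session_id):
        # Ephemeral znodes disappear when their creating session ends;
        # persistent znodes (owner None) are never removed here.
        if session_id is None:
            return
        stale = [p for p, owner in self._owners.items() if owner == session_id]
        for path in stale:
            del self._owners[path]

    def paths(self):
        return sorted(self._owners)


registry = ZnodeRegistry()
first = registry.create("/myapp", session_id="s1", ephemeral=True, sequential=True)
second = registry.create("/myapp", session_id="s2", ephemeral=True, sequential=True)
registry.create("/config")    # persistent by default
registry.close_session("s1")  # removes only the znode owned by session s1
```

Here `first` is `/myapp0000000001` and `second` is `/myapp0000000002`. After session `s1` closes, only `/config` and `/myapp0000000002` remain, which is the kind of hand-over that leader election and locking recipes build on.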
