CSC 610
DISTRIBUTED SYSTEMS
Dr. S. Pitchumani Angayarkanni
COURSE CONTENT
MODULE 1
• Introduction to Distributed systems
• Cluster coordination service and Distributed algorithms
• Cluster Management, Registration and Discovery
Introduction to Distributed Systems
• Overview:
• Where we can find Distributed systems
• The problems with centralized systems
• Distributed systems definitions and challenges
Where we can find Distributed systems
• Distributed systems are everywhere
• Watch a movie on demand
• Shop online
• Order a ride share service through our mobile phone
• Search for something online
Why Distributed Systems?
• These companies run highly scalable distributed systems in order to
• Handle millions of users
• Store petabytes of data
• Provide a consistent user experience
The cloud and distributed systems
• Even the simplest website hosted on the cloud is running on a distributed system.
• The cloud (AWS, Azure, GCP, …) is itself a complex distributed system designed
for companies and software developers, so we can focus on the product while the
cloud vendors handle the infrastructure.
The virtue of a Distributed system
• The beauty of a well-designed and well-implemented distributed system:
• Users are not aware of the complexities of the system
• It feels like a single machine on the other side of the internet connection, dedicated
specifically to you
A distributed system is a collection of independent computers that
appears to its users as a single coherent system
The Problems with Centralized Systems
• It is the opposite of a distributed system
• Startup company wanted to reach users through
website or App
• Create an online shopping experience for people to
buy video games / computers and share their views
on purchase with friends
• They decide to host their web site on the spare
computer
• The mobile app and website hosted in a
centralized server
• As the user base grows our system
cannot keep up with number of requests
1. Performance and Storage – a single computer limits both the requests it
can serve and the data it can store.
Vertical scalability
• We decide to upgrade our computer to
the latest and greatest most powerful
computer.
• This type of upgrade for a system is also
referred to as vertical scaling.
• As traffic to our system keeps increasing,
performance and memory become a
bottleneck, and eventually there is no
more powerful computer to upgrade to.
• Though we got more memory and
compute power from one single machine,
further vertical scaling is no longer an
option for us.
2. Single point of failure
• If there is a power or network
outage in our area, or we simply
need to restart the computer for
maintenance, our service would
go down, leading to a big loss in
revenue and a loss of our users'
trust.
3. High Latency
• Users from other continents
who want to check out our
website face a bad experience
in the form of slow page loads,
as the latency to our computer
grows with distance.
• There is no way for us to
improve the latency in this
configuration.
4. Security and Privacy
• Our computer is open to the
internet, which makes it
vulnerable to hackers.
• DDoS attacks and many other
threats cannot be handled by a
centralized system, and it offers
weak guarantees for security
and privacy.
Solution – Distributed System
• Horizontal scalability allows the system to grow and shrink on
demand
Distributed Systems Definition and challenges
• A distributed system is a system of several processes, running on different
computers, communicating with each other through the network, that are
sharing a state or working together to achieve a common goal.
Discussion on the terminologies
• What is a process?
• After we compile our application into an executable class or a jar file, it is stored on the file system just like any
other text, music, or image file
• When we launch the application, the operating system creates an instance of that application in memory
• That instance is called a process
• This process is entirely isolated from any other running process on the same computer, whether those
other processes are instances of the same application or of different applications
• Processes running on the same machine can communicate with each other
through the network, the file system, and shared memory, using techniques
that the operating system provides.
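As a small illustration of OS-provided inter-process communication, here is a minimal sketch (not from the slides; the function names are illustrative) of two isolated processes exchanging a message through a pipe supplied by the operating system:

```python
# Two separate processes exchanging a message through an OS pipe.
# Minimal illustrative sketch; not part of any framework.
from multiprocessing import Process, Pipe

def worker(conn):
    # Runs in a second, fully isolated process
    message = conn.recv()            # blocks until the parent sends
    conn.send(message.upper())       # reply through the same pipe
    conn.close()

def roundtrip(message):
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(message)         # cross-process send
    reply = parent_end.recv()        # cross-process receive
    p.join()
    return reply

if __name__ == "__main__":
    print(roundtrip("hello"))        # -> HELLO
```

The same idea extends to sockets and shared files; the key point is that the operating system, not the processes themselves, provides the channel.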
Still not Distributed
• All processes are running on the same
machine, still sharing all its resources,
and cannot scale beyond the
capacity of that particular computer.
Solution
• We put each process on a separate machine
and those processes are completely
decoupled from each other.
• They can scale horizontally as much as
needed meaning we can keep adding more
and more machines as we need to extend our
memory or processing power.
• If some machines become unavailable or
break down, the remaining processes keep
functioning, so the overall system can stay
available and continue performing its
tasks – something that is not trivial to
achieve.
• Decoupling the processes and placing
them on different machines leaves us with
the network as the only option for
communication between the processes.
• Once we establish the communication,
our job is to build those processes in such
a way that they maintain a shared view of
the world in the form of a state, or work
together to achieve a common goal.
1.2 Cluster Coordination
Terminology - Node
• A process running on a dedicated machine as a part of the distributed system
• This term originally comes from graph theory
• When two nodes have an edge between them that means the two processes
can communicate with each other through the network
Terminology - Cluster
• Collection of computers/nodes connected to each other.
• The nodes in a cluster are working on the same task, and typically are running
the same code.
• When we have a large amount of data to
analyze, or a complex computation to
solve, we want to hand the task over
to a cluster of nodes.
• Question: which part of the task is going
to be performed by which node? After all,
the biggest benefit of a distributed
system is that we can parallelize the
work and let each node work
independently toward the common goal.
Attempt 1
• We could manually distribute the work and assign each node a separate task,
but that would not be scalable
• We could receive 1000 tasks per second and so we need a programmatic way
to do that distribution.
Attempt 2
• We could manually designate one special
node as the leader, or master, node, in
charge of distributing the work and
collecting the results.
• This is better than our first approach.
• The problem is that any node can fail at
any time, including the master node.
• In a distributed system, failure is a question
of when, not if. If our leader is not there to
distribute the work or collect the results,
the entire cluster is effectively
decommissioned.
Attempt 3
• The solution is to build an algorithm that allows
the nodes to elect their own leader on demand
and makes them all watch the leader's health
closely.
• If the master node becomes unavailable, the
remaining nodes re-elect a new leader.
• Later, if the old leader recovers from its failure,
once it rejoins the cluster it should realize that
it is not the leader anymore and rejoin as a
regular node to help with the work. That is what
we want to achieve.
• But if we think about this architecture for a
moment: picking a leader in a large group, each
member with their own ego, is not a trivial
problem.
The way we are going to use ZooKeeper: instead of having our nodes communicate directly with
each other to coordinate the work, they are going to communicate with the ZooKeeper servers
directly.
• On the other side of the equation, ZooKeeper provides us with a very
familiar and easy-to-use software abstraction and data model that looks a lot
like a tree and is very similar to a file system. Each element in that tree, or
virtual file system, is called a znode.
Persistent znode:
• A persistent znode created by our application stays intact, with all its
children and data, even if our application disconnects from ZooKeeper
and then reconnects again.
Ephemeral znode:
• The exact opposite: it gets deleted as soon as the application that
created it disconnects from ZooKeeper. We can already guess that
ephemeral znodes would be a great tool for identifying when the node
that created them goes down.
Designing our first distributed algorithm – leader
election
• In step one every node that connects to zookeeper volunteers to become a
leader.
• Each node submits its candidacy by adding a znode that represents itself
under the election parent. Since ZooKeeper maintains a global order, it can
name each znode according to the order of its addition.
• In step two, after each node finishes creating its znode, it queries the
current children of the election parent. Notice that, because of the order
ZooKeeper provides, each node querying the children of the election
parent is guaranteed to see all the znodes created prior to its own znode's
creation.
• So in step three, if the znode that the current node created has the smallest
sequence number, the node knows that it is now the leader. On the other
hand, if its znode is not the smallest, then the node knows it is not the leader
and waits for instructions from the elected leader.
• This is how we break the symmetry and arrive at a global agreement on the
leader node.
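The three steps can be sketched with an in-memory stand-in for the election parent. This simulates only the sequence-number logic described above; it is not a real ZooKeeper client, and all names are illustrative:

```python
import itertools

class ElectionParent:
    """In-memory stand-in for the /election parent znode."""
    def __init__(self):
        self._seq = itertools.count(1)
        self.children = {}                    # znode name -> node id

    def volunteer(self, node_id):
        # Step 1: add a sequential znode representing this node;
        # the global order names znodes in order of addition.
        name = "c_%010d" % next(self._seq)    # ZooKeeper-style 10-digit suffix
        self.children[name] = node_id
        return name

    def is_leader(self, my_znode):
        # Steps 2-3: query the children; the smallest sequence
        # number wins, breaking the symmetry.
        return my_znode == min(self.children)

parent = ElectionParent()
znodes = {node: parent.volunteer(node) for node in ["A", "B", "C"]}
leader = [n for n, z in znodes.items() if parent.is_leader(z)]  # -> ['A']
```

Node A volunteered first, so its znode carries the smallest sequence number and every node independently computes the same winner.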
Zookeeper - Leader Election
• Let us analyze how a leader node can be elected in a ZooKeeper ensemble. Consider there are N number of nodes
in a cluster. The process of leader election is as follows −
• All the nodes create a sequential, ephemeral znode with the same path, /app/leader_election/guid_.
• ZooKeeper ensemble will append the 10-digit sequence number to the path and the znode created will
be /app/leader_election/guid_0000000001, /app/leader_election/guid_0000000002, etc.
• For a given instance, the node which created the znode with the smallest sequence number becomes the leader, and all the
other nodes are followers.
• Each follower node watches the znode having the next smallest number. For example, the node which creates
znode /app/leader_election/guid_0000000008 will watch the znode /app/leader_election/guid_0000000007 and
the node which creates the znode /app/leader_election/guid_0000000007 will watch the
znode /app/leader_election/guid_0000000006.
• If the leader goes down, then its corresponding znode /app/leader_election/guid_N gets deleted.
• The next in line follower node will get the notification through watcher about the leader removal.
• The next in line follower node will check if there are other znodes with a smaller number. If none, then it will
assume the role of the leader. Otherwise, it treats the node which created the znode with the smallest number as
the leader.
• Similarly, all other follower nodes elect the node which created the znode with the smallest number as leader.
• Leader election is a complex process when it is done from scratch. But ZooKeeper service makes it very simple.
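The watch chain and re-election above can be sketched the same way (again an in-memory simulation, not a ZooKeeper client; the znode names follow the example paths in the text):

```python
def elect(znodes):
    """Smallest sequence number leads; every other node
    watches the znode with the next-smallest number."""
    ordered = sorted(znodes)
    leader = ordered[0]
    watches = {ordered[i]: ordered[i - 1] for i in range(1, len(ordered))}
    return leader, watches

znodes = {"guid_0000000006", "guid_0000000007", "guid_0000000008"}
leader, watches = elect(znodes)
# guid_0000000008 watches guid_0000000007, which watches guid_0000000006

# When the leader's ephemeral znode is deleted, the node watching it
# is notified, re-runs the check over the survivors, and takes over:
leader2, _ = elect(znodes - {leader})
```

Because each follower watches only its immediate predecessor, a leader failure notifies exactly one node instead of stampeding the whole cluster.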
Zookeeper Configuration and Startup
• https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz
• Extract the files to C:\
• Rename the configuration file conf\zoo_sample.cfg to conf\zoo.cfg
• Create a folder called logs for ZooKeeper's data
• On a fresh installation, /zookeeper is the only znode present
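A minimal standalone zoo.cfg consistent with these steps might look like this (the values and the Windows path are illustrative; dataDir should point at the logs folder just created):

```properties
# Minimal standalone ZooKeeper configuration (illustrative values)
tickTime=2000
dataDir=C:/apache-zookeeper-3.8.0-bin/logs
clientPort=2181
```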
Create an hierarchy of parent and child Z node
• The create command takes the path we want to create and the data
our znode will store. The ACL parameter stands for access control list,
which we can ignore at the moment. Let's create a znode called
/parent under the root znode and put some data in it.
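In the ZooKeeper CLI the session might look like this (the data strings are illustrative):

```shell
create /parent "some parent data"
create /parent/child "some child data"
ls /parent
get /parent/child
```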
Zookeeper
• Create znodes
• Get data
• Watch znode for changes
• Set data
• Create children of a znode
• List children of a znode
• Check Status
• Remove / Delete a znode
• Create Znodes
• Create a znode with the given path. The flag argument specifies whether the
created znode will be ephemeral, persistent, or sequential. By default, all znodes
are persistent.
• Ephemeral znodes (flag: e) will be automatically deleted when a session expires or
when the client disconnects.
• Sequential znodes guarantee that the znode path will be unique.
• ZooKeeper ensemble will add sequence number along with 10 digit padding to the
znode path. For example, the znode path /myapp will be converted to
/myapp0000000001 and the next sequence number will be /myapp0000000002. If
no flags are specified, then the znode is considered as persistent.
To create a Sequential znode, add -s flag as
shown below.
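For example, using the /myapp path from above (the data strings are illustrative):

```shell
create -s /myapp "first"
# Created /myapp0000000001
create -s /myapp "second"
# Created /myapp0000000002
```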
To create an Ephemeral Znode, add -e flag as
shown below.
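For example (the path and data here are illustrative):

```shell
create -e /myephemeral "session-bound data"
```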
Remember when a client connection is lost, the ephemeral
znode will be deleted. You can try it by quitting the ZooKeeper
CLI and then re-opening the CLI.
• Get Data
• It returns the data and metadata of the specified znode. You will get
information such as when the data was last modified, the transaction in
which it was modified, and the length of the data. This command is also
used to assign watches that show notifications about data changes.
To access a sequential znode, you must enter the full path of the znode.
Sample
get /FirstZnode0000000023
• Watch
• Watches show a notification when the specified znode or the znode's
children data changes. You can set a watch only with the get command.
• Set Data
• Set the data of the specified znode. Once you finish this set operation, you
can check the data using the get CLI command.
• Create Children / Sub-znode
• Creating children is similar to creating new znodes. The only difference is that
the path of the child znode will have the parent path as well.
• List Children
• This command is used to list and display the children of a znode.
• Check Status
• Status describes the metadata of a specified znode. It contains details such
as Timestamp, Version number, ACL, Data length, and Children znode.
• Remove a Znode
• Removes the specified znode and, recursively, all of its children. This
succeeds only if such a znode exists.
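Taken together, the operations above map to CLI commands roughly as follows (paths and data are illustrative; on ZooKeeper 3.8 the recursive delete command is deleteall, and -w is the newer watch syntax):

```shell
get /FirstZnode                  # read data and metadata
get -w /FirstZnode               # read and set a watch
set /SecondZnode "Data updated"  # update data
create /parent/child "data"      # create a sub-znode
ls /parent                       # list children
stat /FirstZnode                 # metadata (status) only
delete /parent/child             # remove a childless znode
deleteall /parent                # remove a znode and all its children
```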
• https://dzone.com/articles/running-apache-kafka-on-windows-os