Introduction to Distributed Systems
Unit Contents
– Introduction to Distributed Systems
– Definition of Distributed System
– Goals of a Distributed System
– Challenges of Distributed Systems
– Types of DSs
1.1. Introduction to Distributed System
Transparency    Description
Access          Hide differences in data representation and how a resource is accessed
The data level: Server. The data level manages the actual data
that is being acted on. This level is persistent, which implies that data
is stored outside of an application. Typically this is a file system or a
database, but it could also be a NoSQL store (for example, HBase on Hadoop)
or an object store.
The simplified organization of an Internet search engine into three
different layers (3-tier architecture).
• The simplest organization is to have only two types of machines:
  1. A client machine containing only the programs implementing
  (part of) the user-interface level.
  2. A server machine containing the rest, that is, the programs
  implementing the processing and data levels.
• In this organization everything is handled by the server, while the
client is essentially no more than a dumb terminal, possibly with a
pretty graphical interface.
•There are many other possibilities, of which we explore some of
the more common ones in this section.
•One approach for organizing the clients and servers is to
distribute the programs in the application layers of the previous
section across different machines.
• As a first step, we make a distinction between only two kinds of
machines: client machines and server machines, leading to what is
also referred to as a (physically) two-tier architecture.
• E.g.
• A
• B
• C: The front end can check the correctness and consistency of
the form and, where necessary, interact with the user. A word
processor is another example of this organization.
• D: For example, many banking applications run on an end user's
machine, where the user prepares transactions and such. Once
finished, the application contacts the database on the bank's
server and uploads the transactions for further processing.
• E: Represents the situation where the client's local disk contains
part of the data. For example, when browsing the Web, a client
can gradually build a huge cache on local disk of the most recently
inspected Web pages.
• D and E are the more popular and preferable organizations.
• A to C: thin clients
• D and E: fat clients
2. Multitier Architectures
The data section stores global and static variables, allocated and
initialized prior to executing main.
The heap is used for dynamic memory allocation, and is managed via
calls to new, delete, malloc, free, etc.
The stack is used for local variables. Space on the stack is reserved for
local variables when they are declared.
Process in Memory
Process State
• As a process executes, it changes state
– new:
• The process is being created
– running:
• Instructions are being executed
– waiting:
• The process is waiting for some event to occur
– ready:
• The process is waiting to be assigned to a processor
– terminated:
• The process has finished execution
Diagram of Process State
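The five states above can be sketched as a small state machine. This is only an illustration: the enum name `ProcState` and the set of allowed transitions (the usual textbook ones, such as running to waiting on an I/O request) are assumptions, not something the slides specify.

```java
import java.util.EnumSet;
import java.util.Set;

// Process states from the list above. The allowed transitions are the
// usual textbook ones and are an assumption, not taken from the slides.
enum ProcState {
    NEW, READY, RUNNING, WAITING, TERMINATED;

    // States that can directly follow this one.
    Set<ProcState> next() {
        switch (this) {
            case NEW:     return EnumSet.of(READY);                      // admitted
            case READY:   return EnumSet.of(RUNNING);                    // dispatched
            case RUNNING: return EnumSet.of(READY, WAITING, TERMINATED); // interrupt, wait, exit
            case WAITING: return EnumSet.of(READY);                      // event occurred
            default:      return EnumSet.noneOf(ProcState.class);        // TERMINATED
        }
    }

    boolean canMoveTo(ProcState target) {
        return next().contains(target);
    }
}
```

Note that in this sketch a waiting process cannot go straight back to running; it must first become ready and be scheduled again.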
Process Control Block (PCB)
Information associated with each process
– Process state
– Program counter
– CPU registers
– CPU scheduling information
– Memory-management information
– Accounting information
– I/O status information
CPU Switch From Process to Process
Process Scheduling
Addition of Medium-Term Scheduling
Operation on Process
• Process Creation
– A parent process creates child processes, which, in turn, create
other processes, forming a tree of processes
– Generally, a process is identified and managed via a process
identifier (pid)
– Resource sharing (one of the following options)
• Parent and children share all resources
• Children share a subset of the parent’s resources
• Parent and child share no resources
– Execution (one of the following options)
• Parent and children execute concurrently
• Parent waits until children terminate
Process Creation
Process Termination
• A process executes its last statement and asks the operating system to
delete it (exit)
– Output data is passed from child to parent (via wait)
– The process’s resources are deallocated by the operating system
• A parent may terminate the execution of its child processes (abort) when:
– The child has exceeded its allocated resources
– The task assigned to the child is no longer required
– The parent is exiting
• Some operating systems do not allow a child to continue if its
parent terminates
– All children are terminated (cascading termination)
2.4 Interprocess Communication
• Processes within a system may be independent or cooperating
• A cooperating process, unlike an independent process, can affect or be
affected by the execution of other processes, including by sharing data
• Reasons for cooperating processes:
– Information sharing
– Computation speedup
– Modularity and Convenience
• Cooperating processes need interprocess communication (IPC)
• Two models of IPC
– Shared memory
– Message passing
3.1. Introduction to Threads
Threads Vs Processes
Both processes and threads are independent
sequences of execution. The typical difference is
that threads run in a shared memory space,
while processes run in separate memory spaces.
Threads exist within a process, and
every process has at least one thread.
Each process provides the resources needed to
execute a program.
Benefits of Threads
• Responsiveness:
– One thread may provide rapid response while other threads are
blocked or slowed down doing intensive calculations
• Resource Sharing:
– Allows multiple tasks to be performed simultaneously in a single
address space
• Economy:
– Creating and managing threads (and context switches between them)
is much faster than performing the same tasks for processes.
• Scalability (i.e., utilization of multiprocessor architectures):
– A single-threaded process can only run on one CPU, no matter how
many may be available, whereas the execution of a multi-threaded
application may be split amongst available processors
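The scalability point can be sketched in Java with a thread pool. This is a minimal illustration, not a benchmark: the class name `ParallelSum` and the way the range is split are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSum {
    // Sums 1..n by splitting the range across `workers` threads.
    static long sum(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + workers - 1) / workers;       // ceiling division
            for (int w = 0; w < workers; w++) {
                final int lo = w * chunk + 1;
                final int hi = Math.min(n, (w + 1) * chunk);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = lo; i <= hi; i++) s += i; // each thread sums its own slice
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // wait for every slice
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `ParallelSum.sum(100, 4)` gives the same result as a sequential loop; on a multiprocessor the four slices may run on different CPUs, which a single-threaded process cannot do.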
Concurrent Execution on a Single-core System
3.3. Clients vs. Servers
3.4. Code Migration
•So far, we have been mainly concerned with distributed
systems in which communication is limited to passing
data.
•However, there are situations in which passing
programs, sometimes even while they are being
executed, simplifies the design of a distributed system.
•In this section, we take a detailed look at what code
migration actually is.
• We start by considering different approaches to code
migration, followed by a discussion on how to deal with
the local resources that a migrating program uses.
•A particularly hard problem is migrating code in
heterogeneous systems, which is also discussed.
In distributed computing, code mobility is the
ability for running programs, code or objects to be
migrated (or moved) from one machine or
application to another.
It is common practice in distributed systems to
require the movement of code or processes
between parts of the system, instead of data.
Reasons for Migrating Code
•Traditionally, code migration in distributed systems
took place in the form of process migration in which
an entire process was moved from one machine to
another.
•Moving a running process to a different machine is a
costly and intricate task, and there had better be a
good reason for doing so.
•That reason has always been performance. The basic
idea is that overall system performance can be
improved if processes are moved from heavily-loaded
to lightly-loaded machines. Load is often expressed in
terms of the CPU queue length or CPU utilization, but
other performance indicators are used as well.
•Load distribution algorithms by which decisions
are made concerning the allocation and
redistribution of tasks with respect to a set of
processors, play an important role in compute-
intensive systems.
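A load distribution decision of this kind can be sketched as follows. The sketch assumes load is measured as CPU queue length, as mentioned above; the class name, the machine names, and the `threshold` parameter are hypothetical and only serve the illustration.

```java
import java.util.Map;

class LoadBalancerSketch {
    // Returns the machine with the shortest CPU queue, or null if the map is empty.
    static String leastLoaded(Map<String, Integer> queueLength) {
        String best = null;
        for (Map.Entry<String, Integer> e : queueLength.entrySet()) {
            if (best == null || e.getValue() < queueLength.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Migrate only if the target queue is shorter by more than `threshold`,
    // because moving a running process is itself costly.
    static boolean shouldMigrate(int sourceQueue, int targetQueue, int threshold) {
        return sourceQueue - targetQueue > threshold;
    }
}
```

The threshold reflects the point made earlier that migration needs a good reason: a small difference in load does not justify the cost of the move.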
•However, in many modern distributed systems,
optimizing computing capacity is less an issue than,
for example, trying to minimize communication.
•Moreover, due to the heterogeneity of the
underlying platforms and computer networks,
performance improvement through code migration
is often based on qualitative reasoning instead of
mathematical models.
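•Consider, as an example, a client-server system
in which the server manages a huge database. If
a client application needs to perform many
database operations involving large quantities of data,
it may be better to ship part of the client application
to the server and send only the results across the network.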
Synchronization
• Multithreading introduces asynchronous behavior into
programs. If one thread is writing some data, another thread
may be reading the same data at the same time. This may lead to
inconsistency.
• When two or more threads need access to a shared
resource, there should be some way to ensure that the resource
is used by only one thread at a time. The process to
achieve this is called synchronization.
• To implement synchronized behavior, Java has
synchronized methods. Once a thread is inside a
synchronized method, no other thread can enter any
synchronized method on the same object. All the other
threads wait until the first thread leaves the
synchronized method.
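A synchronized method can be sketched with a shared counter. The class name `SyncCounter` and the helper `run` are made up for this example; without the `synchronized` keyword, concurrent `value++` operations could interleave and lose updates.

```java
class SyncCounter {
    private int value = 0;

    // Only one thread at a time may run a synchronized method of this object.
    synchronized void increment() { value++; }

    synchronized int get() { return value; }

    // Starts `threads` threads that each increment one shared counter
    // `perThread` times, then returns the final count.
    static int run(int threads, int perThread) {
        SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();                         // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Because every increment holds the object's lock, `SyncCounter.run(4, 10000)` always ends with all 40000 increments counted.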
•Sometimes we want to synchronize access to objects of a
class that was not designed for multithreaded
access, and the source code of the methods that need
synchronized access is not available, so we cannot add
the synchronized keyword to the appropriate methods.
Java's solution is to put the calls to those methods
inside a synchronized block, in the following manner:
synchronized (object) {
    // statements to be synchronized
}
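As a concrete sketch of this pattern, the example below guards calls to `java.util.ArrayList`, a class that is not designed for concurrent writers. The class name `SyncBlockDemo` and the choice to lock on the list object itself are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SyncBlockDemo {
    // Every call on the shared list is wrapped in a synchronized block
    // that locks on the list, so only one thread touches it at a time.
    static int fill(int threads, int perThread) {
        List<Integer> shared = new ArrayList<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    synchronized (shared) {
                        shared.add(k);
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (shared) {
            return shared.size();
        }
    }
}
```

This only works if every thread that uses the object goes through a block locking on the same object; a single unsynchronized caller would break the guarantee.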
Chapter 4: Communication
Unit Contents:
1. Layer Protocols
2. Types of Communication
3. RMI, RPC & CORBA
1. Layer Protocols
• Low-level layers
• Transport layer
• Application layer
• Middleware layer
• Basic networking Model:
1. The ISO OSI model &
2. The TCP/IP Model
Protocols vs. Layers
• Layer 1: Physical Layer: DSL, ISDN, USB, Bluetooth, Ethernet (physical signaling)
• Layer 2: Data Link Layer: PPP, SBTV, SLIP
• Layer 3: Network Layer: IP, IPsec, ICMP, IGMP, OSPF, RIP
• Layer 4: Transport Layer: UDP, TCP
• Layer 5: Session Layer:
• Layer 6: Presentation Layer:
• Layer 7: Application Layer: Telnet, FTP, TFTP, …
Physical Layer
• The physical layer sometimes plays an important role in the effective
sharing of available communication resources, and helps avoid
contention among multiple users. It also handles the transmission rate
to improve the flow of data between a sender and receiver.
The physical layer provides the following services:
• Modulation: the process of converting a signal from one form to
another so that it can be physically transmitted over a communication
channel
• Bit-by-bit delivery
• Line coding, which allows data to be sent by hardware devices that are
optimized for digital communications that may have discrete timing on
the transmission link
• Bit synchronization for synchronous serial communications
• Start-stop signaling and flow control in asynchronous serial
communication
• Circuit switching and multiplexing hardware control of multiplexed
digital signals
• Carrier sensing and collision detection, whereby the physical layer detects
carrier availability and avoids the congestion problems caused by
undeliverable packets
• Signal equalization to ensure reliable connections and facilitate multiplexing
• Forward error correction/channel coding such as error correction code
• Bit interleaving to improve error correction
• Auto-negotiation
• Transmission mode control
• Examples of protocols that use physical layers include:
• Digital Subscriber Line
• Integrated Services Digital Network
• Infrared Data Association
• Universal Serial Bus
• Bluetooth
• Controller Area Network
• Ethernet
Data link layer
• The data link layer is used for the encoding,
decoding and logical organization of data bits. Data
packets are framed and addressed by this layer,
which has two sublayers.
• The data link layer's first sublayer is the media
access control (MAC) layer. It is used for source and
destination addresses. The MAC layer allows the
data link layer to provide the best data transmission
vehicle and manage data flow control.
• The data link layer's second sublayer is the logical
link control. It manages error checking and data
flow over a network.
Network Layer
•Logical connection setup, data forwarding, routing
and delivery error reporting are the network layer’s
primary responsibilities.
•Network layer protocols exist in every host or
router.
The session layer
• The session layer manages a session by initiating
the opening and closing of sessions between end-
user application processes.
• This layer also controls single or multiple
connections for each end-user application, and
directly communicates with both the presentation
and the transport layers.
• The services offered by the session layer are
generally implemented in application
environments using remote procedure calls
(RPCs).
• Low-level layers:
• Physical layer: contains the specification and
implementation of bits, and their
transmission between sender and receiver.
• Data link layer: prescribes the transmission
of a series of bits into a frame to allow for
error and flow control
• Network layer: describes how packets in a
network of computers are to be routed.
Transport Layer
Important
The transport layer provides the actual
communication facilities for most distributed
systems.
Standard Internet Protocols:
• TCP: connection-oriented, reliable, stream-oriented communication
• UDP: unreliable (best-effort) datagram communication
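To make the TCP side concrete, here is a minimal sketch of a one-shot echo exchange over the loopback interface using `java.net.ServerSocket` and `java.net.Socket`. The class name `TcpEchoSketch` and the one-line protocol are assumptions for the example; a UDP version would use `java.net.DatagramSocket` instead and would get no delivery guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class TcpEchoSketch {
    // Starts a one-shot echo server on the loopback interface, sends `line`
    // to it over a TCP connection, and returns the server's reply.
    static String echoOnce(String line) {
        InetAddress local = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, local)) {
            Thread serverThread = new Thread(() -> serveOne(server));
            serverThread.start();
            try (Socket client = new Socket(local, server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(line);               // connection is set up before any data flows
                String reply = in.readLine();    // bytes arrive reliably and in order
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Accepts a single connection and echoes one line back to the client.
    private static void serveOne(ServerSocket server) {
        try (Socket s = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            out.println(in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```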
Application layer
• Remote login to hosts: Telnet.
• File transfer: File Transfer Protocol (FTP), Trivial File
Transfer Protocol (TFTP)
• Electronic mail transport: Simple Mail Transfer
Protocol (SMTP)
• Networking support: Domain Name System (DNS)
• Host initialization: BOOTP.
Middleware layer
Middleware
The software layer that lies between the operating
system and the applications on each side of a
distributed computing system in a network.
• Observation
Middleware was invented to provide common services
and protocols that can be used by many different applications:
• A rich set of communication protocols
• (Un)marshaling of data, necessary for integrated
systems
• Naming protocols, to allow easy sharing of resources
• Security protocols for secure communication
• Scaling mechanisms, such as for replication and
caching
• Note: What remains are truly application-specific
protocols... such as?
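The (un)marshaling item above can be illustrated with Java's `DataOutputStream` and `DataInputStream`, which write values in a fixed, machine-independent byte layout. The request format used here (an operation name followed by two integers) and the class name `MarshalSketch` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class MarshalSketch {
    // Marshal a (hypothetical) request: operation name plus two int arguments.
    static byte[] marshal(String op, int a, int b) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(op);   // 2-byte length prefix, then the encoded characters
            out.writeInt(a);    // 4 bytes, big-endian on every platform
            out.writeInt(b);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unmarshal the bytes back into a readable form of the request.
    static String unmarshal(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            String op = in.readUTF();
            int a = in.readInt();
            int b = in.readInt();
            return op + "(" + a + ", " + b + ")";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because both sides agree on the byte layout, the receiver can rebuild the request even if its hardware represents integers differently from the sender's.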
The TCP/IP Reference Model
•TCP/IP stands for Transmission Control Protocol/Internet
Protocol. It is the network model used in
the current Internet architecture.
•Protocols are sets of rules that govern
communication over a network. These
protocols describe the movement of data between
the source and the destination across the Internet. They
also offer simple naming and addressing schemes.
Protocols and networks in the TCP/IP model:
Internet layer:
• Delivering IP packets
• Performing routing
• Avoiding congestion
2. Types of Communication
• We can view the middleware as an additional
service in client server computing:
(Consider, for example an email system.)
Distinguish:
• Transient versus persistent communication
• Asynchronous versus synchronous communication
Transient versus persistent:
• Transient communication: A communication server discards a message when
it cannot be delivered to the next server or to the receiver.
• Persistent communication: A message is stored at a communication
server for as long as it takes to deliver it.
Asynchronous versus synchronous:
• Asynchronous communication: A sender continues immediately after
it has submitted the message for transmission.
• Synchronous communication: The sender is blocked until its request
is known to be accepted. There are three places where synchronization
can take place (see the figure above):
– At request submission
– At request delivery
– After request processing
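The difference between the two call styles can be sketched in Java. The method `remoteSquare` is a hypothetical stand-in for a remote operation and runs locally here; `CompletableFuture.supplyAsync` is used only to show a caller that continues immediately.

```java
import java.util.concurrent.CompletableFuture;

class CallStyles {
    // Stand-in for a remote operation (hypothetical; it runs locally here).
    static int remoteSquare(int x) { return x * x; }

    // Synchronous: the caller is blocked until the result is available.
    static int callSync(int x) {
        return remoteSquare(x);
    }

    // Asynchronous: the caller gets a future at once and may continue working.
    static CompletableFuture<Integer> callAsync(int x) {
        return CompletableFuture.supplyAsync(() -> remoteSquare(x));
    }
}
```

A caller could write `CompletableFuture<Integer> reply = CallStyles.callAsync(7);`, do other work, and only call `reply.join()` when the value is actually needed.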
Client/Server
Some observations
Client/Server computing is generally based on a model of transient
synchronous communication:
• Client and server have to be active at time of communication.
• Client issues request and blocks until it receives reply
• Server essentially waits only for incoming requests, and subsequently
processes them
Drawbacks of synchronous communication
• Client cannot do any other work while waiting for reply
• Failures have to be handled immediately: the client is waiting
• The model may simply not be appropriate (mail, news)
Messaging
Message-oriented middleware ( MOM )
Aims at high-level persistent asynchronous communication:
• Processes send each other messages, which are queued
• Sender need not wait for immediate reply, but can do other things
• Middleware often ensures fault tolerance
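The queueing idea can be sketched with an in-memory queue. This is only a model: the class name `MessageQueueSketch` is made up, and a real message-oriented middleware would keep the queue on a communication server (typically on stable storage) so that messages survive while sender or receiver is inactive.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class MessageQueueSketch {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Sender: enqueue the message and return at once, without waiting
    // for any receiver to be active.
    void send(String message) {
        queue.add(message);
    }

    // Receiver: may run much later; returns null if no message is queued.
    String receive() {
        return queue.poll();
    }
}
```

The sender and receiver never interact directly, which is what lets the sender do other things instead of waiting for a reply.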
3. RMI, RPC & CORBA
•RPC (Remote Procedure Call) and RMI (Remote
Method Invocation) are two mechanisms that allow
the user to invoke procedures or methods that run on
a different computer from the one the user is using.
• The main difference between the two is the
approach or paradigm they use.
•RMI uses an object oriented paradigm where the
user needs to know the object and the method of
the object he needs to invoke.
•In comparison, RPC isn’t object oriented and
doesn’t deal with objects. Rather, it calls specific
subroutines that are already established.
• RPC is a relatively old mechanism whose classic implementations
(such as Sun RPC) target the C language, and it inherits C's
procedural paradigm. With RPC, you get a procedure call that looks
pretty much like a local call.
• RPC handles the complexities involved with passing the
call from the local to the remote computer.
• RMI does the very same thing; handling the complexities
of passing along the invocation from the local to the
remote computer. But instead of passing a procedural
call, RMI passes a reference to the object and the
method that is being called.
• RMI was developed for Java and relies on the Java virtual machine.
Its use is therefore limited to Java applications
calling methods on remote computers.
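What an RMI service looks like to the programmer can be sketched with a remote interface and its implementation. The names `Calculator` and `CalculatorImpl` are hypothetical, and the sketch deliberately stops before the network part: in a real deployment the object would be exported (for example with `UnicastRemoteObject.exportObject`) and bound in an RMI registry.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Remote interface: every remotely callable method declares RemoteException,
// because a call across the network can fail in ways a local call cannot.
interface Calculator extends Remote {
    int add(int a, int b) throws RemoteException;
}

// Server-side implementation of the remote interface. Exporting the object
// and registering it under a name are omitted in this sketch.
class CalculatorImpl implements Calculator {
    @Override
    public int add(int a, int b) {
        return a + b;
    }
}
```

A client that has looked up a `Calculator` reference simply calls `add` on it; the object and the method are named explicitly, which is the object-oriented style described above.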
• In the end, RPC and RMI are two means of achieving
the same thing. The choice comes down to the language
you are using and the paradigm you are used to. The
object-oriented RMI is often the better fit for
larger Java programs, as it tends to give
cleaner code that is easier to debug when something
goes wrong. RPC is still widely used, especially
when the alternative remote invocation mechanisms
are not an option.
• Summary:
  1. RMI is object oriented while RPC is not.
  2. Classic RPC is C based while RMI is Java only.
  3. RMI invokes methods while RPC invokes functions.
  4. RPC is the older mechanism while RMI is the newer one.
CORBA vs. RMI
• There is no doubt about the popularity of Java among
developers. With Java, possibilities have expanded even
further. Java’s extremely portable nature is of great
advantage. It integrates well with web browsers, making it
ideal for Web development ventures. As far as developers
are concerned, it is easy to use and implement. This is the
main reason many developers embrace the technology.
• RMI and CORBA are two of the most significant and
commonly used distributed object systems available to Java
developers. Both are effective, each with its own pros and cons,
and both are used in a very wide range of applications.
As a developer on a particular project, choosing between
the two can be a difficult decision to make.
• Common Object Request Broker Architecture, or simply
CORBA, has many adapters.
• It can call code in any language that has a CORBA interface, because it
was designed to be independent of the language a
program is written in.
• It is in direct competition to RMI but CORBA offers better
portability.
• CORBA can easily integrate with older systems and newer
ones that support CORBA.
• However, for Java developers, the technology provides less
flexibility, as it does not allow executables to be forwarded to
remote systems.
• CORBA is an extensive family of standards and interfaces.
Exploring the details of these interfaces is quite a daunting
task.
• RMI is an abbreviation of Remote Method Invocation.
This technology was released with Java 1.1, although it was
already available for JDK 1.0.2. It lets Java developers
invoke methods on objects that execute
on remote Java Virtual Machines (JVMs). Its
implementation is rather easy, particularly if you know
Java well. It is just like calling a method locally;
however, its calls are limited to Java only.
• Given RMI's Java-centric
character, the only way to integrate code in other
languages into an RMI-based system is to use an
interface, namely the Java Native Interface (JNI).
However, it can be extremely complex and,
more often than not, results in fragile code.
• RMI has major features that CORBA does not have, most notably the
ability to send new objects (code and data) across a network, and
for remote virtual machines to handle those new objects seamlessly.
• When comparing RMI and CORBA, it is like making a comparison
between an apple and an orange. Principally, one is not better than
the other. It entirely depends on the application or project involved
and the preference of the developer.
• Summary:
• 1. RMI is Java-centric while CORBA is not tied to a single language.
• 2. RMI is easier to master particularly for Java programmers and
developers.
• 3. CORBA offers greater portability due to its high adaptability to
various programming languages.
• 4. CORBA can’t send new objects across networks.
Why RPC?
• Sockets are considered low-level
• RPCs offer a higher-level form of communication
• The client makes a procedure call to a “remote” server using ordinary
procedure call mechanisms
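The idea that the client uses an ordinary call can be sketched with a client stub built from `java.lang.reflect.Proxy`. This is an in-process illustration only: the names `Adder`, `AdderServer`, and `RpcStubSketch` are hypothetical, and where a real stub would marshal the call and send it over the network, this one forwards it directly to the server object.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// The procedure the client wants to call (hypothetical example interface).
interface Adder {
    int add(int a, int b);
}

// The "remote" implementation; in this sketch it lives in the same JVM.
class AdderServer implements Adder {
    @Override
    public int add(int a, int b) {
        return a + b;
    }
}

class RpcStubSketch {
    // Builds a client stub. A real stub would marshal the method name and
    // arguments into a message and wait for the reply; here the call is
    // dispatched to the server object directly.
    static Adder stubFor(Adder server) {
        InvocationHandler handler =
                (proxy, method, args) -> method.invoke(server, args);
        return (Adder) Proxy.newProxyInstance(
                Adder.class.getClassLoader(),
                new Class<?>[] { Adder.class },
                handler);
    }
}
```

From the client's point of view, `RpcStubSketch.stubFor(server).add(2, 3)` looks exactly like a local call, which is the property that makes RPC higher-level than raw sockets.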
Client and Server
Issues in RPC