Introduction To Distributed Systems
A distributed system is a collection of independent computers that appears to its users as
a single system.
Deals with hardware: The machines linked in a distributed system are autonomous.
Deals with software: A distributed system gives an impression to the users that they are dealing with a
single system.
Concurrency
Distributed systems function in a heterogeneous environment, so adaptability is a major issue.
Latency
Memory considerations: The distributed systems work on both local and shared memory.
Synchronization issues
Applications must adapt gracefully in case of failures, without affecting other parts of the
system.
Since they are widespread, security is a major issue.
Limits imposed on scalability
They are less transparent.
Knowledge about the dynamic network topology is a must.
QoS parameters
Distributed systems must offer the following quality of service (QoS):
Performance
Reliability
Availability
Security
This definition leads to the following characteristics of distributed systems:
o Concurrency of components
o Lack of a global ‘clock’
o Independent failures of components
Examples:
• Network of workstations
• Distributed manufacturing system (e.g., automated assembly line)
• Network of branch office computers
Centralized System Characteristics:
One component with non-autonomous parts
Component shared by users all the time
All resources accessible
Software runs in a single process
Single point of control
Single point of failure
Distributed System Characteristics:
Multiple autonomous components
Components are not shared by all users
Resources may not be accessible
Software runs in concurrent processes on different processors
Multiple points of control
Multiple points of failure
Advantages of Distributed Systems over Centralized Systems:
[Figure: a local area network with a web server, an email server, print and file servers and other servers, connected through a router/firewall to the rest of the Internet.]
• Different client applications want to access and update shared data in a database.
• Client applications might be banking systems, real-estate agencies, airline-ticket reservation
systems accessing data like balances of bank accounts, details of property that are for sale or to
let, or airfares and aircraft reservation data.
• The database is physically distributed over several processors to take advantage of local data
accesses for increased performance of client applications.
• Data may be replicated to reduce the impact of failures of a processor and/or the network.
• Each processor runs a database monitor that implements the mapping between the database
seen by clients and the physical database stored on the different processors.
• Database monitors have to cooperate with each other to implement client accesses to remote
data, updates of replicated data and concurrency control.
• The physical distribution of data is therefore transparent to clients.
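The mapping performed by a database monitor can be sketched in a few lines of Python. This is only an illustration, not a real database: the class name `DatabaseMonitor`, the hash-based placement rule and the sample account keys are all assumptions made for the example. Clients call `read` and `write` with a logical key and never learn which partition holds the data.

```python
import zlib


class DatabaseMonitor:
    """Hypothetical monitor: maps a logical key to the partition that stores it."""

    def __init__(self, partitions):
        # Each dictionary stands in for the physical database on one processor.
        self.partitions = partitions

    def _locate(self, key):
        # Assumed placement rule: a stable hash of the key selects the partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def write(self, key, value):
        self.partitions[self._locate(key)][key] = value

    def read(self, key):
        return self.partitions[self._locate(key)].get(key)


partitions = [{}, {}, {}]
monitor = DatabaseMonitor(partitions)
monitor.write("account:1001", 250)
monitor.write("account:1002", 75)
balance = monitor.read("account:1001")
```

A real monitor would also have to coordinate replicated copies and concurrent access, which this sketch leaves out.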
• An automatic teller machine network enables bank customers to withdraw cash from their bank
account.
• Banks and building societies maintain large networks of teller machines.
• Customers have high security, privacy and reliability requirements.
• Customers may want to withdraw cash from their account through a 'foreign' teller machine.
• A front-end computer controls one or several tellers. It
– transfers withdrawal requests to the computer of the account holder's bank,
– awaits the bank granting the request, and
– therefore has to be interoperable with heterogeneous computer systems (Hang Seng
Bank may have different account management systems than HongKong Bank and Bank
of China).
• Each bank has fault-tolerant systems to quickly recover from failures of their account holding
computers. An example is the 'hot standby' computer, which maintains a copy of the account
database and can replace the main computer within seconds.
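The hot standby idea can be illustrated with a small Python sketch. It assumes a simplified model in which every update is applied to both copies before it is acknowledged; the class names `AccountStore` and `ReplicatedBank` and the `alive` flag are hypothetical and stand in for real failure detection.

```python
class AccountStore:
    """One copy of the account database (main computer or standby)."""

    def __init__(self):
        self.balances = {}
        self.alive = True


class ReplicatedBank:
    """Hypothetical main computer plus a hot standby that mirrors every update."""

    def __init__(self):
        self.primary = AccountStore()
        self.standby = AccountStore()

    def set_balance(self, account, amount):
        # Every update is applied to both copies, so the standby stays current.
        self.primary.balances[account] = amount
        self.standby.balances[account] = amount

    def get_balance(self, account):
        # Requests go to the main computer unless it has failed.
        store = self.primary if self.primary.alive else self.standby
        return store.balances[account]


bank = ReplicatedBank()
bank.set_balance("acct-1", 500)
bank.primary.alive = False  # simulate a crash of the main computer
balance_after_failure = bank.get_balance("acct-1")
```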
Internet:
The Internet is the largest distributed system in the world.
It is a vast interconnected collection of computer networks of many different types.
Programs on these computers interact by passing messages, employing a common means of communication (the Internet Protocol).
The Web is not the same as the Internet.
The World Wide Web is the largest software application running on the Internet.
It has become the most popular distributed software application ever created.
[Figure: a portion of the Internet showing intranets, ISPs, a backbone and a satellite link. Legend: desktop computer, server, network link.]
• A Web browser is a user interface to the world's biggest distributed system, the Internet.
• A Web page includes links to other Web pages. These links are specified as URLs.
• A URL consists of the name of a protocol (ftp, http, etc.), the name of a site (gateway1.cse.cuhk.edu.hk)
and the name of a file.
• To follow a link to a remote Web page, your Web browser
– talks to the local name server to resolve the symbolic site name into an IP address
(137.189.89.153).
– talks to the http daemon running on that web site and requests the delivery of the Web
page addressed by the URL.
• To obtain a file from a remote ftp site, your Web browser
– resolves the site name with the local name server,
– talks to the ftp daemon running on that site and performs an anonymous login,
– switches the daemon into an appropriate transfer mode and
– obtains the file addressed in the URL.
• To send an e-mail, your Web browser
– opens a new dialog window where you can enter the addressee(s) and the e-mail text and
– talks to the local sendmail daemon to have it deliver the e-mail to the sendmail
daemons on the sites of your addressees.
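The three parts of a URL described above can be separated with Python's standard `urllib.parse` module. The helper name `split_url` and the file name `/index.html` are assumptions made for the example; the site name is the one used in the text.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the protocol, site name and file path named by a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


# The file path is a made-up example; only the site name comes from the text.
protocol, site, path = split_url("http://gateway1.cse.cuhk.edu.hk/index.html")
```

The next step, resolving the site name into an IP address, is what `socket.gethostbyname(site)` does by asking the name server. It is not called here because it needs network access.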
[Figure: web browsers sending requests such as http://www.google.com/search?q=lyu and http://www.uu.se/ across the Internet to the web servers www.google.com, www.uu.se and www.w3c.org.]
• Mobile and ubiquitous computing extends access to the Internet and to distributed system
architectures from wired connections to wireless connections.
• Distributed systems techniques are equally applicable to mobile computing involving laptops,
PDAs and wearable computing devices.
Mobile computing (nomadic computing): performing computing tasks while on the move.
Ubiquitous computing: small computers embedded in appliances.
i. It harnesses many small, cheap computational devices.
ii. It benefits users while they remain in a single environment such as the home.
[Figure: portable and handheld devices (mobile phone, laptop, printer, camera) at a host site connected to the Internet.]
Today's Internet is built on distributed systems, and distributed systems have numerous
applications. Some of them are discussed below:
Web Search:
The task of a web search engine is to index the entire contents of the World Wide Web.
Distributed search is a search engine model in which the tasks of Web crawling, indexing and
query processing are distributed among multiple computers and networks.
Search engines were once supported by a single supercomputer, but in recent years they have
moved to a distributed model.
Google search relies upon thousands of computers crawling the Web from multiple locations all
over the world.
In Google's distributed search system, each computer involved in indexing crawls and reviews a
portion of the Web, taking a URL and following every link available from it.
The computer gathers the crawled results from the URLs and sends that information back to a
centralized server in compressed format.
The centralized server then coordinates that information in a database, along with information
from other computers involved in indexing.
When a user types a query into the search field, Google's domain name server (DNS) software
relays the query to the most logical cluster of computers, based on factors such as its proximity
to the user or how busy it is.
At the recipient cluster, the Web server software distributes the query to hundreds or
thousands of computers to search simultaneously.
Hundreds of computers scan the database index to find all relevant records.
The index server compiles the results, the document server pulls together the titles and
summaries and the page builder creates the search result pages.
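The step in which a query is sent to many computers that search simultaneously is often called scatter-gather. The sketch below imitates it on a single machine with threads. It is not Google's implementation: the shard contents, document names and function names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index shards: each maps a word to the documents containing it.
SHARDS = [
    {"distributed": ["doc1", "doc4"], "systems": ["doc1"]},
    {"distributed": ["doc7"], "clock": ["doc8"]},
    {"systems": ["doc5"], "distributed": ["doc9"]},
]


def search_shard(shard, term):
    # Each worker scans only its own portion of the index.
    return shard.get(term, [])


def search(term):
    # Scatter the query to every shard in parallel, then gather the results.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partial = pool.map(lambda shard: search_shard(shard, term), SHARDS)
        return sorted(doc for hits in partial for doc in hits)


results = search("distributed")
```

In a real deployment each shard would live on a different machine and the gather step would also rank the results.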
The following features of Google search make it act as a distributed system:
o The physical infrastructure with very large numbers of networked computers.
o Highly distributed file system that supports very large files.
o Availability of structured distributed storage system for fast access to data.
o Distributed locking and agreement.
o Works on a programming model that supports the management of large parallel and
distributed computations.
Massively multiplayer online games (MMOGs):
These games simulate real life as much as possible.
As such, it is necessary to constantly evolve the game world using a set of laws.
These laws are a complex set of rules that the game engine applies with every clock tick.
The virtual world consists not only of human players but also of all game elements that are not
living objects.
These elements are immutable and include the area terrain, trees, mountains, rivers, etc.
MMOGs must be able to handle a very large number of simultaneous users.
As the amount of information transferred between the players and the game server is large, the
bandwidth required to support a huge number of players is enormous.
Very large virtual worlds require huge computational power to simulate the existence of life (AI
Algorithms).
No single processor machine can handle the computational load required.
These gaming applications work on both client server and distributed architectures.
In distributed architectures, the large virtual world of the game is split into different smaller
areas and each area is to be handled by a separate physical machine (server).
Therefore both the bandwidth and the computational load are spread out over many machines, thus
making the application distributed.
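Splitting the virtual world into areas, each handled by its own server, can be sketched as a lookup from a player's position to a zone and from a zone to a machine. The zone size, the server names and the modulo assignment rule are assumptions chosen to keep the example short.

```python
ZONE_SIZE = 100  # assumed width and height of one area, in world units


def zone_of(x, y):
    """Return the grid zone (column, row) containing world position (x, y)."""
    return (int(x // ZONE_SIZE), int(y // ZONE_SIZE))


def server_for(zone, servers):
    """Assign a zone to a server; the modulo rule is only an illustration."""
    column, row = zone
    return servers[(column + row) % len(servers)]


servers = ["server-a", "server-b", "server-c"]  # hypothetical machine names
zone = zone_of(250.0, 120.0)
owner = server_for(zone, servers)
```

A player crossing a zone boundary would be handed over from one server to another, which is one of the harder parts of such a design and is not shown here.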
Financial Trading:
Financial trading is now moving to distributed systems.
These systems require frequent modifications in response to the communication.
Processing events in distributed systems demands reliability, so distributed event-based
systems are used.
A series of event feeds is supplied by the financial institutions.
The events may be in different formats and may employ different technologies, so proper
adaptors are a must.
Pattern detection is also a very important aspect of financial trading.
Complex Event Processing (CEP) is an automated way of composing events together into
logical, temporal or spatial patterns.
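The roles of adaptors and of pattern detection can be sketched as follows. Two hypothetical feeds deliver the same kind of event in different formats, an adaptor per feed converts them to one common shape, and a simple temporal pattern (several consecutive price rises) is then detected. The feed formats, field names and the pattern itself are assumptions made for the example and do not describe any real CEP product.

```python
def from_csv(line):
    # Adaptor for a hypothetical "symbol,price" text feed.
    symbol, price = line.split(",")
    return {"symbol": symbol, "price": float(price)}


def from_dict(record):
    # Adaptor for a hypothetical feed that uses different field names.
    return {"symbol": record["ticker"], "price": float(record["last"])}


def rising_streak(events, symbol, length=3):
    """Detect a temporal pattern: `length` consecutive price rises for a symbol."""
    prices = [e["price"] for e in events if e["symbol"] == symbol]
    streak = 0
    for previous, current in zip(prices, prices[1:]):
        streak = streak + 1 if current > previous else 0
        if streak >= length:
            return True
    return False


events = [
    from_csv("ABC,10.0"),
    from_dict({"ticker": "ABC", "last": "10.5"}),
    from_csv("XYZ,99.0"),
    from_csv("ABC,11.0"),
    from_dict({"ticker": "ABC", "last": "11.2"}),
]
detected = rising_streak(events, "ABC")
```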
Features (Characteristics) of Distributed Systems
Certain common characteristics can be used to assess distributed systems:
Heterogeneity
Resource Sharing
Openness
Concurrency
Scalability
Fault Tolerance
Transparency
Security
Heterogeneity:
Heterogeneity means the diversity of distributed systems in terms of hardware, software, platform,
etc. Modern distributed systems will likely operate with different:
Computer hardware: computers, tablets, mobile phones, embedded devices, etc.
Operating systems: MS Windows, Linux, Mac, Unix, etc.
Networks: local networks, the Internet, wireless networks, satellite links, etc.
Programming languages: Java, C/C++, Python, PHP, etc.
Roles of software developers, designers and system managers
Middleware: Middleware is a software layer that provides a programming abstraction
while masking the heterogeneity of the underlying networks, hardware, operating systems
and programming languages. E.g. CORBA, RMI.
Mobile code: Mobile code refers to code that can be sent from one computer to another and run at the
destination (e.g., Java applets and the Java virtual machine).
Resource Sharing :
The ability to use any hardware, software or data anywhere in the system.
A resource manager controls access, provides a naming scheme and controls concurrency.
A resource sharing model (e.g. client/server or object-based) describes how
o resources are provided,
o they are used and
o provider and user interact with each other.
Openness:
Openness is concerned with extensions and improvements of distributed systems.
Detailed interfaces of components need to be published.
New components have to be integrated with existing components.
Differences in data representation of interface types on different processors (of different
vendors) have to be resolved.
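One common way to resolve differences in data representation is to agree on a single wire format that every processor converts to and from. The sketch below uses Python's `struct` module with network (big-endian) byte order; the message layout (a 4-byte integer identifier followed by an 8-byte float) and the function names are assumptions made for illustration.

```python
import struct


def encode_reading(sensor_id, value):
    # "!" selects network (big-endian) byte order with standard sizes:
    # "I" is a 4-byte unsigned integer, "d" is an 8-byte float.
    return struct.pack("!Id", sensor_id, value)


def decode_reading(payload):
    # Any receiver, whatever its native byte order, decodes the same values.
    return struct.unpack("!Id", payload)


payload = encode_reading(7, 21.5)
sensor_id, value = decode_reading(payload)
```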
Concurrency:
A distributed system is usually a multi-user environment. To maximize concurrency, resource-handling
components should anticipate that they will be accessed by competing users. Concurrency
control prevents the system from becoming unstable when users compete to view or update data.
Components in distributed systems are executed in concurrent processes.
Components access and update shared resources (e.g. variables, databases, device
drivers).
Integrity of the system may be violated if concurrent updates are not coordinated.
Lost updates
Inconsistent analysis
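A lost update happens when two processes read the same value, each modifies it, and one write overwrites the other. The sketch below shows one way to coordinate the updates, using a lock around the read-modify-write step. The `Account` class and the thread counts are invented for the example, and threads on one machine stand in for concurrent processes on different processors.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # The lock makes the read-modify-write step atomic, so no update is lost.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def worker(account, times):
    for _ in range(times):
        account.deposit(1)


account = Account(0)
threads = [threading.Thread(target=worker, args=(account, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_balance = account.balance
```

Without the lock, two threads could both read the same `current` value and one of the deposits would disappear.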
Scalability:
Distributed systems must be scalable as the number of users increases. A system is said to be scalable if it
can handle the addition of users and resources without suffering a noticeable loss of performance or
increase in administrative complexity.
Adaptation of distributed systems to
o accommodate more users
o respond faster (this is the hard one)
Usually done by adding more and/or faster processors.
Components should not need to be changed when the scale of a system increases.
Design components to be scalable!
Fault Tolerance:
Computer systems sometimes fail. When faults occur in hardware or software, programs may
produce incorrect results or may stop before they have completed the intended computation. The
handling of failures is particularly difficult.
Distributed systems involve many collaborating components (hardware, software, communication),
so there is a significant possibility of partial or total failure. Failures are handled in a series of steps:
Detecting failures: Some failures can be detected. For example, a checksum can be used to detect corrupted data.
Masking failures: Some failures that have been detected can be hidden or made less severe.
Examples of hiding failures include retransmission of messages and maintaining a
redundant copy of the same data.
Tolerating failures: Not all failures can be handled. Some failures must be accepted by the
user.
An example of this is waiting for a video file to be streamed in.
Recovery from failures: recover and roll back data after a server has crashed.
Redundancy: the way to tolerate failures is replication of services and data on multiple servers.
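Detection and masking can be combined in a small sketch: the sender attaches a checksum, the receiver detects corruption by recomputing it, and a corrupted delivery is masked by asking for the message again. The function names, the retry limit and the simulated channel are assumptions made for the example. CRC-32 from the standard `zlib` module is used as the checksum.

```python
import zlib


def make_packet(data):
    # The sender attaches a CRC-32 checksum to the payload.
    return data, zlib.crc32(data)


def is_intact(packet):
    data, checksum = packet
    return zlib.crc32(data) == checksum


def receive(channel, max_attempts=3):
    """Mask corrupted deliveries by requesting the packet again."""
    for attempt in range(1, max_attempts + 1):
        packet = channel()
        if is_intact(packet):          # detecting the failure
            return packet[0], attempt  # masking succeeded
    # Not every failure can be masked; the caller has to tolerate this one.
    raise IOError("packet still corrupted after retransmission")


# Hypothetical channel that corrupts the first delivery only.
good = make_packet(b"balance=250")
deliveries = iter([(b"balance=950", good[1]), good])
data, attempts = receive(lambda: next(deliveries))
```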
Security:
Many of the information resources that are made available and maintained in distributed
systems have a high intrinsic value to their users. Their security is therefore of considerable importance.
Security for information resources has three components:
Confidentiality (protection against disclosure to unauthorized individuals)
Integrity (protection against alteration or corruption)
Availability for the authorized (protection against interference with the means to access the
resources)
Transparency:
Distributed systems should be perceived by users and application programmers as a whole rather than
as a collection of cooperating components.
• Transparency has different dimensions that were identified by ANSA.
• These represent various properties that distributed systems should have:
Access Transparency - Hide differences in data representation and how a resource is accessed
Location Transparency - Hide where a resource is located
Migration Transparency - Hide that a resource may move to another location
Relocation Transparency - Hide that a resource may be moved to another location while in use
Replication Transparency - Hide that a resource may be copied in several places
Concurrency Transparency - Hide that a resource may be shared by several competing users
Failure Transparency - Hide the failure and recovery of a resource
Persistence Transparency - Hide whether a (software) resource is in memory or on disk