0% found this document useful (0 votes)
25 views54 pages

Part 1

Uploaded by

Sai Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views54 pages

Part 1

Uploaded by

Sai Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 54

Distributed Computing

CS 434

Introduction to Distributed Systems

Distributed Computing Introduction 1


Overview
 Introduction
 Definitions
 Advantages of distributed systems
 Disadvantages of distributed systems
 Characteristics of Distributed Systems
 Examples of Distributed Systems
 The Internet; Intranet; Mobile and ubiquitous computing;
 Resource Sharing and the Web
 Challenges arising from the construction of Distributed Systems

Distributed Computing Introduction 2


Distributed computing and Distributed
system
 Early computing was performed on a single processor. Uni-processor
computing can be called centralized computing.
 Two advances in technology began to change the situation
 The development of powerful microprocessors
 The invention of high-speed computer networks
 Distributed computing is computing performed in a distributed system.
 A distributed system is
 a collection of independent computers, interconnected via a network,
capable of collaborating on a task, or
 One in which components located at networked computers communicate
and coordinate their actions only by message passing
 a collection of independent computers, that appears to the users of the
system as a single computer.

Distributed Computing Introduction 3


Advantages of Distributed Systems

 The motivation for constructing and using distributed systems stems from a
desire to share resources.
 Economics: distributed systems allow the pooling of resources, including
CPU cycles, data storage, input/output devices, and services.
– Distributed systems have a better price/performance ratio

 Concurrency: we can solve the problem more quickly using several processors
concurrently.

 Some applications are inherently distributed : In some problems the most natural
solution is to use separate parallel processes to perform the subtasks of the
given problem.
 Computer Supported Cooperative Work

Distributed Computing Introduction 4


Advantages of Distributed Systems—Cont.

 Communication: make human-to-human communication easier,


for example by electronic mail.

 Reliability: a distributed system allow replication of resources


and/or services, thus reducing service outage due to failures.

 Incremental growth: computing power can be added in small


increments.

Distributed Computing Introduction 5


Disadvantages of Distributed Computing

 Multiple Points of Failures: the failure of one or more participating


computers, or one or more network links, can spell trouble.
 Security Concerns: In a distributed system, there are more
opportunities for unauthorized attack.
 Difficulties of developing distributed software : how should
operating systems, programming languages and applications look like
 Networking problems: several problems are created by the network
infrastructure, which have to be dealt with : loss of messages,
overloading,..

Distributed Computing Introduction 6


Characteristics of Distributed Systems
 Concurrency of components (or Parallel activities)
 Autonomous components executing concurrent tasks
 Lack of a global clock (only limited precision for processes to
synchronize)
 No global state: No single process can have knowledge of the
current global state of the system
 Independent failures of components

Distributed Computing Introduction 7


Examples of distributed systems

 The following examples are based on familiar and widely used computer
networks: the Internet, intranets and the emerging technology of networks based
on mobile devices.

 Internet: A huge interconnected collection of computer networks of many


different types.
 The Internet is a vast interconnected collection of computer networks.
Programs running on computers connected to it interact by passing
messages. It is a "network of networks" that consists of millions of private
and public, academic, business, and government networks of local to global
scope that are linked by copper wires, fiber-optic cables, wireless connections,
and other technologies.
 It enables users to make use of services such as the World Wide Web, e-mail
and file transfer.

Distributed Computing Introduction 8


Examples of distributed systems—Cont.

 Intranet: A portion of the Internet (a network of computers and workstations


within an organization) managed by an organization that can be configured to
enforce local security policies.
 Composed of several LANs linked by backbone connections.
 Isolated from the Internet via a protective device (a firewall).

 Mobile and ubiquitous computing: the integration of small and portable


computing devices into distributed systems. These devices include: Laptop
computers, Handheld devices (PDAs, mobile phones, pagers, video cameras and
digital cameras), smart watches, devices embedded in appliances such as cars,
washing machines.
 Ubiquitous (‫)التواجد في كل مكان‬is intended to mean that small computing
devices will eventually become so pervasive ‫ انتشارا‬in everyday objects that
they are scarcely noticed.

Distributed Computing Introduction 9


The Internet

 It enables users to use services such as WWW, email, file transfer,


Multimedia services, etc.
 Programs running on the computers connected to the Internet interact by
passing messages.
 The design and construction of the Internet protocols enables a program
running anywhere to address messages to programs anywhere else.
 Note: Some times Web is incorrectly used to mean the Internet.
 Internet Service Providers (ISPs) are companies that provide modem links
and other types of connection to users and small organizations, enabling
them to
 Access services in the internet as well as
 Providing local services such as email and web hosting

Distributed Computing Introduction 10


A Typical Portion of the Internet

intranet 

 ISP

backbone

satellite link

desktop computer:
server:
network link:

Distributed Computing Introduction 11


Intranets
 A portion of the Internet that is separately administrated and has a boundary
that can be configured to enforce local security policies.

 Intranets are linked together by backbones.

 A backbone is a network link with a high transmission rate, employing satellite


connection, fiber optic cables.

 An intranet is connected to the Internet via a router that


 Allows users inside the intranet to make use of services such as the Web or
email

 Firewalls are used to protect an intranet by preventing unauthorized outgoing


and incoming messages.

 A firewall may allow only messages related to email and web access to pass into
or out of the intranet that it protects.

Distributed Computing Introduction 12


A typical intranet

email server Desktop


computers
print and other servers

Local area
Web server network

email server
print
File server
other servers
the rest of
the Internet
router/firewall

Distributed Computing Introduction 13


Mobile and ubiquitous computing (Ch16)

 Mobile computing becomes possible because of


 The portability of many of these devices Slide 9 and
 The ability of these devices to connect conveniently to networks in different
places.
 Ubiquitous computing could benefit users while they remain in a single
environment such as home or hospital
 In mobile computing, users who are away from their home intranet are still
provided with access to resources via devices they carry with them.
 Next Fig. shows a user who is visiting a host organization (or host site)
 Figure shows the user’s home intranet and the host intranet at the site that
the user is visiting
 Both intranets are connected to the rest of the Internet.
 Laptop has means of connecting to the host’s wireless LAN

Distributed Computing Introduction 14


Mobile and ubiquitous computing—Cont.

 This network covers a few hundreds of meters.


 It connects to the rest of the host intranet via a gateway.
 The user also has a mobile, which is connected to the Internet using the
Wireless Application Protocol (WAP) via a gateway
 The phone gives access to textual information on the display
 The user carries a digital camera, which can communicate over an infra-red
link when pointed at a corresponding device like printer

Distributed Computing Introduction 15


Mobile and Ubiquitous Computing:
Portable and handheld devices in a distributed
system

Internet

Host intranet WAP


Wireless LAN gateway Home intranet

Mobile
phone
Printer Laptop
Camera Host site

Users who are away from their home intranet are still provided with
access to resources via the devices they carry with them.

Distributed Computing Introduction 16


1.3 Resource Sharing and the Web
 We generally share HW resources such as printers, data resources such as files,
and resources with more specific functionality such as search engines.

 We share equipments such as printers and disks to reduce costs.


 Patterns of resource sharing vary widely in their scope and in how closely users
work together.
 Search engine provides services for users who need never come into contact
with each other.
 In computer-supported cooperative working (CSCW), a group of users who
cooperate directly share resources such as documents in a small, closed
group.

 The pattern of sharing and the geographic distribution of particular users


determines what mechanisms the system must supply to coordinate users’
actions.
Distributed Computing Introduction 17
Resource Sharing and the Web—Cont.
 We use the term service for a distinct part of a computer system
that manages a collection of related resources and presents their
functionality to users and applications.
 For example, we access shared files through a file service.
 The only access we have to the service is via the set of
operations that it exports. e.g. A file service provides read, write
and delete operations on files.

 The term server refers to a running program (a process) on a


network computer that accepts requests from programs running
on other computers to perform a service and responds
appropriately.

 The requesting processes are referred to as a clients.


Distributed Computing Introduction 18
Resource Sharing and the Web—Cont.
 Requests are sent in massages from clients to a server and replies
are sent in massages from the server to the clients.
 When the client sends a request for an operation to be carried out,
we say that the client invokes an operation upon the server.
 A complete interaction between a client and server, from the point
when the client sends its request to when it receives the server’s
response, is called a remote invocation.
 By default the terms ‘client’ and ‘server’ refer to processes rather
than the computers that they execute upon, although in everyday
parlance those terms also refer to the computers themselves.

Distributed Computing Introduction 19


Resource Sharing and the Web—Cont.
 In a distributed system written in an object-oriented language,
 resources may be encapsulated as objects and accessed by client
objects, in which case we speak of a client object invoking a
method upon a server object.
 An executing web browser is an example of a client.
 The web browser communicates with a web server, to request
web pages from it.

Distributed Computing Introduction 20


1.3.1 The World Wide Web (WWW)
 The World Wide Web [www.w3.org]
 is an evolving system for publishing and accessing resources and services
across the Internet.
 The World Wide Web (commonly shortened to the Web) is a system of
interlinked hypertext documents accessed via the Internet.
 A key feature of the Web is that
 it provides a hypertext structure among the documents that it stores,
reflecting the users’ requirement to organize their knowledge.
– This means that documents contain links- references to other
documents and resources that are also stored in the web.
 The Web is an open system: it can be extended and implemented in new ways
without disturbing its existing functionally as follows:

Distributed Computing Introduction 21


The World Wide Web (WWW)—Cont.
 First, Its operation is based on communication standards and
document standards that are freely published and widely
implemented,
 e.g. there are many types of browsers, each in many cases
implemented on several platforms;
 and there are many implementations of web servers.
 Any conformant browser can retrieve resources from any
conformant server.
 So users have access to browsers on the majority of the devices
that they use, from PDAs to desktop computers.

 Second, the web is open with respect to the types of ‘resources’


that can be published and shared on it.

Distributed Computing Introduction 22


The World Wide Web (WWW)—Cont.
 At its simplest, a resource on the Web is a web page or some
other type of content that can be stored in a file and presented
to the user, such as program files, media files, and documents in
PostScript or Portable Document Format.
 If somebody invent new image format, then images in this
format can be immediately be published on the Web.
– browsers are designed to accommodate new content-
presentation functionality in the form of ‘helper’ applications and
‘plug-ins’.
 The Web has moved beyond these simple data resources to
encompass services, such as electronic purchasing of goods. It has
evolved without changing its basic architecture.

Distributed Computing Introduction 23


The World Wide Web (WWW)—Cont.
 The web is based on three main standard technological components:
1. The HyperText Markup Language (HTML) is a language for
specifying the contents and layout of pages as they are displayed by
web browsers.
2. Uniform Resources Locators (URLs), which identify documents and
other resources stored as part of the Web.
3. A client-server system architecture, with standard rules for
interaction (the HyperText Transfer Protocol-HTTP) by which
browsers and other clients fetch documents and other resources
from web servers.

 Figure Next slide shows some web servers, and browsers making
requests of them.
 It is an important feature that users may locate and manage their
own web servers anywhere on the Internet

Distributed Computing Introduction 24


Figure 1.4 Page 10
Web servers and web browsers

http://www.google.comlsearch?q=kindberg
www.google.com

Browsers
Web servers

www.cdk3.net Internet
http://www.cdk3.net/

www.w3c.org

File system of http://www.w3c.org/Protocols/Activity.html


www.w3c.org Protocols

Activity.html

Resource sharing and the Web: open protocols, scalable servers,


and pluggable browsers
Distributed Computing Introduction 25
The World Wide Web (WWW)—Cont.
 HTML: The HyperText Markup Language is used to:
1. Specify the text and images that make up the contents of a
web page.
2. Specify how they are laid out and formatted for presentation
to the user.
3. Specify links and which resources are associated with them.
 A web page contains such structured items as headings,
paragraphs, tables and images.

Distributed Computing Introduction 26


The World Wide Web (WWW)—Cont.
 URL: the purpose of a uniform resource locator is to identify a
resource. Browsers examine URLs in order to access the
corresponding resources.
 Every URL has two top-level components:
 scheme : scheme-specific-identifier
 The scheme declare which type of URL this is.
– mailto:[email protected] identifies a user’s e-mail address.
– ftp://ftp.download.com/software/prog.exe identifies a file to be
retrieved using the File Transfer Protocol (FTP) rather than the
more commonly used protocol HTTP.
 HTTP URLs are the most widely used for accessing resources
using the standard HTTP protocol.

Distributed Computing Introduction 27


The World Wide Web (WWW)—Cont.
 An HTTP URL has two main jobs
 To identify which web server maintains the resource.
 To identify which of the resources at that server is required.

 HTTP URLs are of the following form:


 http://servername[:port][/pathname][?query][#fragment]
 Items in square brackets are optional.
 The server’s DNS name (Ch9) is optionally followed by the number of
the port on which the server listen for requests (Ch4) which is 80 by
default.
 In figure Figure 1.4 page 10, the third URL specifies a query to a
search engine. The path identifies a program called “search”, and the
string after the ‘?’ encodes a query string supplied as an argument to
the program.

Distributed Computing Introduction 28


The World Wide Web (WWW)—Cont.
 A program that web servers run to generate content for their
clients is often referred to as Common Gateway Interface (CGI)
program.

 A CGI program may have any application specific functionality, as


long as it can parse the arguments that the client provides to it and
produce contents of the required type (usually HTML text).

Distributed Computing Introduction 29


Examples of Distributed Systems
Embedded Systems
1-Avionics ( airplanes engineering ) control system
• Flight management systems in aircraft
2-Automotive control system
• Mercedes S-Class automobiles these days are equipped with 50+
autonomous embedded processors
• Connected through proprietary bus-like LANs
3-Consumer Electronics
• Audio HiFi equipment

Distributed Computing Introduction 30


1.4 Challenges with Business
Example
 Online bookstore (e.g. in the World Wide Web)
 Customers can connect their computer to your computer (web
server):
– Browse your inventory
– Place orders
–…

Distributed Computing Introduction 31


Challenges: Business Example
(Cont.)
 What if
 Your customer uses a completely different hardware? (PC,
MAC,…)
 … a different operating system? (Windows, Unix,…)
 … a different way of representing data? (ASCII, EBCDIC,…)

 Heterogeneity
 Or
 You want to move your business and computers to the
Caribbean (because of the weather)?
 Your client moves to the Caribbean (more likely)?

 Distribution transparency
Distributed Computing Introduction 32
Challenges: Business Example
(Cont.)
 What if
 Two customers want to order the same item at the same time?

 Concurrency
 Or
 The database with your inventory information crashes?
 Your customer’s computer crashes in the middle of an order?

 Failure handling

Distributed Computing Introduction 33


Challenges: Business Example
(Cont.)
 What if
 Someone tries to break into your system to steal data?
 … sniffs for information?

 Security
 Or
 You are so successful that millions of people are visiting your
online store at the same time?

 Scalability

Distributed Computing Introduction 34


Challenges: Business Example
(Cont.)
 When building the system…
 Do you want to write the whole software on your own (network,
database,…)?
 What about updates, new technologies?

 Reuse and Openness (Standards)

Distributed Computing Introduction 35


Challenges (Cont.)
1. Heterogeneity
 Heterogeneous components must be able to interoperate
2. Openness
 Interfaces should be publicly available to ease adding new
components
3. Security
 The system should only be used in the way intended
4. Scalability
 System should work efficiently with an increasing number of
users
 System performance should increase with inclusion of
additional resources.
Distributed Computing Introduction 36
Challenges (Cont.)
5. Fault tolerance and handling
 Failure of a component (partial failure) should not result in
failure of the whole system
6. Concurrency
 Shared access to resources must be possible
7. Distribution transparency
 Distribution should be hidden from the user as much as
possible

Distributed Computing Introduction 37


Heterogeneity
 Heterogeneity means variety and difference applies to all of the following:
 Networks
– Differences between different types of networks forming the Internet are
not noticed as all of the computers attached to them use the Internet
protocols (IPs) to communicate with one another.
 Computer Hardware
– Data types such as integers may be represented in different ways on
different sorts of hardware. These differences must be dealt with if
messages are to be exchanged between programs running on different
hardware.
 Operating systems
– Although all computers in the Internet must include an implementation
of the Internet protocols, they do not necessarily provide the same
application programming interface to these protocols.

Distributed Computing Introduction 38


Heterogeneity—Cont.
 Programming languages
– Different programming languages use different representations
for characters and data structures such as arrays and records.
These differences must be addressed if programs written in
different programming languages are to be able to communicate
with one another.
 Implementation by different developers
– Programs written by different developers can not communicate
unless they use common standards, for example, for
representation of primitive data items and data structures in
messages. Consequently, standards need to be agreed and
adopted.

Distributed Computing Introduction 39


Heterogeneity—Cont.
 Middleware: a software layer that provides a programming
abstraction as well as masking the heterogeneity of the underlying
networks, hardware, operating systems and programming
languages.

 The Common Object Request Broker (CORBA, Ch4, 5 and 20)


is an example.

 Java Remote Method Invocation (RMI, Ch5) is an example of a


middleware that support a single programming language.

Distributed Computing Introduction 40


Heterogeneity and Mobile code
 Mobile code: code that can be sent from one computer to another
and run in the destination.
 Example: Java applet.

 Code suitable for running on one computer may not be suitable for
running on another. Executable programs are normally specific
both to the instruction set and to the host operating system.

 The virtual machine approach provides a way of making code


executable on any hardware.

Distributed Computing Introduction 41


Openness
 The characteristic that determines whether the system can be
extended and re-implemented in various ways.

 The openness of a distributed system is determined by the degree


to which new resource-sharing services can be added and be made
available for use by client programs.

 Can not be achieved unless the specification and documentation of


the key software interfaces are published (available to software
developers)

Distributed Computing Introduction 42


Security
 The resources are accessible to authorized users and used in the
way they are intended.

 Security for information resources has three components:


 Confidentiality
– Protection against disclosure to unauthorized individual.
 Integrity
– Protection against alternation or corruption.
– e.g. changing the account number or amount value in a money
order
 Availability
– Protection against interference with the means to access the
resources.

Distributed Computing Introduction 43


Security—Cont.
 Security mechanisms
 Encryption and Authentication

 The following two security challenges have not yet been fully met
 Denial of service attack.
– Achieved by bombarding the service with such a large number of
pointless requests that the serious users are not able to use it.
 Security of mobile code
– Receiving an executable program as an electronic mail
attachment.

Distributed Computing Introduction 44


Scalability
 Distributed systems operate effectively and efficiently at many
different scales, ranging from a small intranet to the Internet.

 A system is described as scalable if will remain effective when


there is a significant increase in the number of resources and the
number of users.

 The Internet provides an illustration of a distributed system in


which the number of computers and services has increased
dramatically.

Distributed Computing Introduction 45


Figure 1.6
Computers vs. Web servers in the Internet

Date Computers Web servers Percentage

1993, July 1,776,000 130 0.008


1995, July 6,642,000 23,500 0.4
1997, July 19,540,000 1,203,096 6
1999, July 56,218,000 6,598,697 12
2001, July 125,888,197 31,299,592 25
2003, July 42,298,371

Distributed Computing Introduction 46


Scalability—Cont.
 The design of scalable distributed systems presents the following challenges:
 Cost of physical resources
– Cost should linearly increase with system size
– For a system with n users to be scalable, the quantity of physical
resources required to support them should be proportional to n.
 Performance Loss
– An increase in the size of resources will result in some loss in
performance.
– The time to access hierarchically structure data is O(logn), for a system
to be scalable, the max performance loss should not be worse than this.
 Preventing software resources running out:
– Numbers used to represent Internet address.
 Avoiding performance bottlenecks:
– Use decentralized algorithms (centralized DNS to decentralized).

Distributed Computing Introduction 47


Failure Handling
 Failure handling techniques include
 Masking Failures
– Messages can be retransmitted when they fail to arrive.
– Data can be written to a pair of disks.
 Tolerating Failures
– When a web browser can not contact a web server, it informs the
user about the problem.
 Recovery from failure
– Rollback mechanisms
 Redundancy
– A database may be replicated in several servers to ensure that
the data remains accessible after the failure of any single server.

Distributed Computing Introduction 48


Concurrency
 The process that managed a shared resource could take one client request at a
time. This approach limits throughput. Therefore services generally allow
multiple client requests to be processed concurrently (Multithreading).

 Suppose that each resource is encapsulated as an object and that invocations


are executes in concurrent threads. In this case several threads may be
executing concurrently within an object, in which case their operation in the
object may conflict with one another and produce inconsistent results.

 For an object to be safe in a concurrent environment, it is operations must be


synchronized in such a way that it is data remains consistent.
 This can be achieved by standard techniques such as semaphores, which
are used in most operating systems.

Distributed Computing Introduction 49


Transparency
 Transparency – the distributed system should appear as a conventional,
centralized system to the user.
 To hide from the user and the application programmer of the
separation/distribution of components, so that the system is perceived as a
whole rather than a collection of independent components.
 ISO Reference Model for Open Distributed Processing (RM-ODP) identifies
the following forms of transparencies:

 Access transparency
 Access to local or remote resources is identical

 Location transparency
 Resources can be accessed without knowledge of their physical or
network location (e.g. which building or IP address).
Distributed Computing Introduction 50
Transparency—Cont.
 Concurrency transparency
 Enables several processes to operate concurrently using shared resources
without interference between them.

 Replication transparency
 Enables multiples instances of resources to be used to enhanced reliability
and performance without knowledge of the replicas by users or application
programmers.

 Failure transparency
 Tasks can be completed despite failures
 e.g. message retransmission

 Migration (mobility/relocation) transparency


 Allow the movement of resources and clients within a system without
affection the operation of users or applications.

Distributed Computing Introduction 51


Transparency—Cont.
 Performance transparency:
 Allows the system to be reconfigured to improve performance as loads
vary.
 e.g., switching from linear structures to hierarchical structures when the
number of users increase.

 Scaling transparency:
 Allows the system and applications to expand in scale without change to
the system structure or the application algorithms.

 The two most important transparencies are access and location


transparency.

Distributed Computing Introduction 52


Forms of Transparency in a Distributed
System

Distributed Computing Introduction 53


Communication
•Components of a distributed system have to communicate in order to
interact. This implies support at two levels :
1.Networking infrastructure (interconnections & network software).
2.Appropriate communication primitives and models and their
implementation:
Communication primitives:
Send
Receive message passing
Remote procedure call (RPC)
Communication models
-Client-server communication: implies a message exchange
between two processes : the process which requests a service
and the one which provides it;
54
-Group multicast : the target of a message is a set of processes,
which are members of a given group .

You might also like