0% found this document useful (0 votes)
3 views

12 Distributed Web-Based Systems

Chapter 12 of 'Distributed Systems: Principles and Paradigms' discusses Distributed Web-Based Systems, highlighting the architecture, processes, and communication protocols involved in web services. It covers topics such as multi-tiered architectures, server clustering, and the use of protocols like HTTP and SOAP for communication. Additionally, it addresses issues related to synchronization, replication, and security within web hosting systems.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

12 Distributed Web-Based Systems

Chapter 12 of 'Distributed Systems: Principles and Paradigms' discusses Distributed Web-Based Systems, highlighting the architecture, processes, and communication protocols involved in web services. It covers topics such as multi-tiered architectures, server clustering, and the use of protocols like HTTP and SOAP for communication. Additionally, it addresses issues related to synchronization, replication, and security within web hosting systems.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

Distributed Systems

Principles and Paradigms

Chapter 12
(version October 15, 2007)

Maarten van Steen


Vrije Universiteit Amsterdam, Faculty of Science
Dept. Mathematics and Computer Science
Room R4.20. Tel: (020) 598 7784
E-mail:[email protected], URL: www.cs.vu.nl/∼steen/
01 Introduction
02 Architectures
03 Processes
04 Communication
05 Naming
06 Synchronization
07 Consistency and Replication
08 Fault Tolerance
09 Security
10 Distributed Object-Based Systems
11 Distributed File Systems
12 Distributed Web-Based Systems
13 Distributed Coordination-Based Systems
00 – 1 /
Distributed Web-Based Systems
Essence: The WWW is a huge client-server system
with millions of servers; each server hosting thousands
of hyperlinked documents:

2. Server fetches
Client machine Server machine document from
local file

Browser Web server

OS
3. Response

1. Get document request (HTTP)

• Documents are generally represented in text (plain


text, HTML, XML)

• Alternative types: images, audio, video, but also


applications (PDF, PS)

• Documents may contain scripts that are executed


by the client-side software

12 – 1 Distributed Web-Based Systems/12.1 Architecture


Multi-tiered Architectures

Observation: Already very soon, Web sites were or-


ganized into three tiers:

3. Start process to fetch document

1. Get request 4. Database interaction


HTTP CGI
request program
6. Return result handler

5. HTML document
created
Web server CGI process Database server

12 – 2 Distributed Web-Based Systems/12.1 Architecture


Web Services

Observation: At a certain point, people started rec-


ognizing that it is was more than just user ↔ site in-
teraction: sites could offer services to other sites ⇒
standardization is then badly needed.

Client machine Server machine


Look up
a service Client Server Publish service
application application

Stub Stub

Communication SOAP Communication


subsystem subsystem

Generate stub Generate stub


from WSDL from WSDL
description description

Servicedescription
Service description(WSDL)
(WSDL)
Service description (WSDL)

Directory service (UDDI)

12 – 3 Distributed Web-Based Systems/12.1 Architecture


Clients: Web browsers

Observation: browsers form the Web’s most impor-


tant client-side sofware. They used to be simple, but
that is long ago.

Display back end


User interface

Browser engine

Rendering engine

Client-side
Network script HTML/XML
comm. interpreter parser

12 – 4 Distributed Web-Based Systems/12.2 Processes


Apache Web Server

Observation: More than 70% of all Web sites are


based on Apache. The server is internally organized
more or less according to the steps needed to process
an HTTP request:

Module Module Function Module

... ... ...

Link between
function and hook

Hook Hook Hook Hook

Apache core
Functions called per hook

Request Response

12 – 5 Distributed Web-Based Systems/12.2 Processes


Server Clusters (1/2)
Essence: To improve performance and availability,
WWW servers are often clustered in a way that is
transparent to clients:

Web Web Web Web


server server server server

LAN

Front end handles


Front all incoming requests
end and outgoing responses

Request Response

Problem: The front end may easily get overloaded,


so that special measures need to be taken.

Transport-layer switching: Front end simply passes


the TCP request to one of the servers, taking some
performance metric into account.
Content-aware distribution: Front end reads the con-
tent of the HTTP request and then selects the
best server.

12 – 6 Distributed Web-Based Systems/12.2 Processes


Server Clusters (2/2)

Question: Why can content-aware distribution be so


much better?

6. Server responses
Web
5. Forward server 3. Hand of
other f
TCP connection
messages Distributor

Other messages
Dis-
Client Switch 4. Inform patcher
Setup request switch
1. Pass setup request Distributor 2. Dispatcher selects
to a distributor server
Web
server

12 – 7 Distributed Web-Based Systems/12.2 Processes


Communication (1/2)

Essence: Communication in the Web is generally based


on HTTP; a relatively simple client-server transfer pro-
tocol having the following request messages:
Operation
Description
Head Request to return the header of a document
Get Request to return a document to the client
Put Request to store a document
Post Provide data that are to be added to a docu-
ment (collection)
Delete Request to delete a document

12 – 8 Distributed Web-Based Systems/12.3 Communication


Communication (2/2)
C/S
Header Contents
Accept C The type of documents the client can handle
Accept-Charset C The character sets are acceptable for the client
Accept- C The document encodings the client can handle
Encoding
Accept- C The natural language the client can handle
Language
Authorization C A list of the client’s credentials
WWW- S Security challenge the client should respond to
Authenticate
Date C+S Date and time the message was sent
ETag S The tags associated with the returned document
Expires S The time for how long the response remains valid
From C The client’s e-mail address
Host C The TCP address of the document’s server
If-Match C The tags the document should have
If-None-Match C The tags the document should not have
If-Modified- C Tells the server to return a document only if it has
Since been modified since the specified time
If-Unmodified- C Tells the server to return a document only if it has
Since not been modified since the specified time
Last-Modified S The time the returned document was last modified
Location S A document reference to which the client should
redirect its request
Referer C Refers to client’s most recently requested document
Upgrade C+S The application protocol sender wants to switch to
Warning C+S Information about status of the data in the message

12 – 9 Distributed Web-Based Systems/12.3 Communication


SOAP

Simple Object Access Protocol: Based on XML,


this is the standard protocol for communication be-
tween Web services.

• SOAP is bound to an underlying protocol (i.e., it


is not independent from its carrier)

• Conversational exchange style: Send a docu-


ment one way, get a filled-in response back.

• RPC-style exchange: Used to invoke a Web ser-


vice.

12 – 10 Distributed Web-Based Systems/12.3 Communication


A Note on XML

Observation: XML has the advantage of allowing self-


describing documents. Full stop (i.e., it introduces
performance problems and is not meant to be read
by human beings)

env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header>
<n:alertcontrol xmlns:n="http://example.org/alertcontrol">
<n:priority>1</n:priority>
<n:expires>2001-06-22T14:00:00-05:00</n:expires>
</n:alertcontrol>
</env:Header>
<env:Body>
<m:alert xmlns:m="http://example.org/alert">
<m:msg>Pick up Mary at school at 2pm</m:msg>
</m:alert>
</env:Body>
</env:Envelope>

12 – 11 Distributed Web-Based Systems/12.3 Communication


Naming: URL
URL: Uniform Resource Locator tells how and where
to access a resource.

Scheme Host name Pathname


http :// www.cs.vu.nl /home/steen/mbox
(a)

Scheme Host name Port Pathname


http :// www.cs.vu.nl : 80 /home/steen/mbox
(b)

Scheme Host name Port Pathname


http :// 130.37.24.11 : 80 /home/steen/mbox
(c)

Examples:
http HTTP http://www.cs.vu.nl:80/globe
mailto Mail mailto:[email protected]
ftp FTP ftp://ftp.cs.vu.nl/pub/minix/README
file Local file file:/edu/book/work/chp/11/11
data Inline data data:text/plain;charset=iso-8859-7,
%e1%e2%e3
telnet Remote login telnet://flits.cs.vu.nl
tel Telephone tel:+31201234567
modem Modem modem:+31201234567;type=v32

12 – 12 Distributed Web-Based Systems/12.4


Synchronization: WebDAV

Problem: There is a growing need for collaborative


auditing of Web documents, but bare-bones HTTP can’t
help here. Solution: Web Distributed Authoring and
Versioning.

• Supports exclusive and shared write locks, which


operate on entire documents

• A lock is passed by means of a lock token; the


server registers the client(s) holding the lock

• Clients modify the document locally and post it


back to the server along with the lock token

Note: There is no specific support for crashed clients


holding a lock.

12 – 13 Distributed Web-Based Systems/12.5 Synchronization


Web Proxy Caching
Basic idea: Sites install a separate proxy server that
handles all outgoing requests. Proxies subsequently
cache incoming documents. Cache-consistency pro-
tocols:

• Always verify validity by contacting server


• Age-based consistency:
Texpire = α · ( Tcached − Tlast modi f ied ) + Tcached
• Cooperative caching, by which you first check your
neighbors on a cache miss:
Web
server
3. Forward request
to Web server

1. Look in
local cache

Web 2. Ask neighboring proxy caches Web


Cache proxy proxy Cache

Client Client Client Client Client Client


Web
HTTP Get request proxy Cache

Client Client Client

12 – 14 Distributed Web-Based Systems/12.6 Consistency and Replication


Replication in Web Hosting
Systems

Observation: By-and-large, Web hosting systems are


adopting replication to increase performance. Much
research is done to improve their organization. Fol-
lows the lines of self-managing systems:

Uncontrollable parameters (disturbance / noise)

Initial configuration Corrections Observed output


Web hosting system

+/- +/- +/-

Reference input
Replica Consistency Request Metric
placement enforcement routing estimation

Analysis
Adjustment triggers Measured output

12 – 15 Distributed Web-Based Systems/12.6 Consistency and Replication


Handling Flash Crowds

Observation: We need dynamic adjustment to bal-


ance resource usage. Flash crowds introduce a se-
rious problem:

2 days 2 days
(a) (b)

6 days 2.5 days


(c) (d)

12 – 16 Distributed Web-Based Systems/12.6 Consistency and Replication


Server Replication

Content Delivery Network: CDNs act as Web host-


ing services to replicate documents across the Inter-
net providing their customers guarantees on high avail-
ability and performance (example: Akamai).

6. Get embedded documents


CDN (if not already cached)
Cache server

5. Get embedded
documents
Return IP address 7. Embedded documents
client-best server
1. Get base document
CDN DNS 4 Origin
Client
server server
2. Document with refs
DNS lookups 3 to embedded documents

Regular
DNS system

Question: How would consistency be maintained in


this system?

12 – 17 Distributed Web-Based Systems/12.6 Consistency and Replication


Replication of Web Apps. (1/3)

Observation: Replication becomes more difficult when


dealing with databses and such. No single best solu-
tion.

Edge-server side Origin-server side

Client query
Server Server

response

Content-blind Database
cache copy
full/partial data replication

Content-aware Authoritative
full schema replication/
cache database
Schema query templates Schema

Assumption: Updates are carried out at origin server,


and propagated to edge servers.

12 – 18 Distributed Web-Based Systems/12.6 Consistency and Replication


Replication of Web Apps. (2/3)

Edge-server side Origin-server side

Client query
Server Server

response

Content-blind Database
cache copy
full/partial data replication

Content-aware full schema replication/ Authoritative


cache database
Schema query templates Schema

• Full replication: high read/write ratio, often in


combination with complex queries. Note: replica-
tion may possibly speed-down performance when
R/W ratio goes down.

• Partial replication: high read/write ratio, but in


combination with simple queries

12 – 19 Distributed Web-Based Systems/12.6 Consistency and Replication


Replication of Web Apps. (3/3)

Edge-server side Origin-server side

Client query
Server Server

response

Content-blind Database
cache copy
full/partial data replication

Content-aware full schema replication/ Authoritative


cache query templates database
Schema Schema

• Content-aware caching: Check for queries at lo-


cal database, and subscribe for invalidations at
the server. Works good with range queries and
complex queries.

• Content-blind caching: Simply cache the result


of previous queries. Works great with simple queries
that address unique results (e.g., no range queries).

12 – 20 Distributed Web-Based Systems/12.6 Consistency and Replication


Security: TLS (SSL)

Transport Layer Security: Modern version of the


the Secure Socket Layer (SSL), which “sits” between
transport layer and application protocols. Relatively
simple protocol that can support mutual authentica-
tion using certificates:

1
Possibilities

2
Choices

Server
Client

3
[ K+S ] CA
4
[ K+C ] CA
5
K +S ([ R ] C)

12 – 21 Distributed Web-Based Systems/12.6 Consistency and Replication

You might also like