34 Applications
Internet Technology
William Stallings
Network Applications
Terminal Access Telnet
History
Oldest Internet application
Demonstrated on four-node ARPANET deployed in 1969
Two years to expand protocol sufficiently to make it
useful and to work out bugs
First published version RFC 97
"First Cut at a Proposed Telnet Protocol," February 1971
1983 final form issued as RFC 854 and RFC 855
(Get and study these RFCs; see last slide)
Still useful Internet application
Also a pioneering effort in application-level protocol design
Basis of many newer protocols
HTTP
Remote Terminal Access
Early motivation for networks was remote access to
interactive systems
Dumb terminals
Keyboard and screen with primitive comms hardware
Stream of character data transmitted in each direction
Local host computer or terminal controller establishes
connection to remote host
Local user can use remote host
Hosts handle particular set of characteristics
Figure 3.1 (a) Telnet Operational
Environment on Arpanet
Figure 3.2 Network Virtual
Terminal Concept
Phases of operation
Connection management
Connection request and termination
Telnet uses TCP
Negotiation
To determine mutually agreeable set of characteristics
NVT has range of capabilities and features
Real terminal more limited
NVT has options, such as line length
Control
Exchange of control information and commands
e.g., end of line, interrupt process
Data
Transfer of data between two correspondents
For Telnet, control and data conveyed in single stream
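Because control and data share one stream, a receiver has to pick the commands out of the bytes as they arrive. A minimal sketch, assuming only the three-byte "IAC verb option" negotiation form and the escaped IAC IAC data byte from RFC 854; subnegotiation is left out, and the function name split_stream is hypothetical:

```python
# Telnet command bytes defined in RFC 854.
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251

NAMES = {DO: "DO", DONT: "DONT", WILL: "WILL", WONT: "WONT"}


def split_stream(raw: bytes):
    """Separate user data from option-negotiation commands.

    Simplified: handles the three-byte IAC <verb> <option> form and the
    escaped IAC IAC data byte; subnegotiation (SB ... SE) is not covered.
    """
    data = bytearray()
    commands = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte != IAC:
            data.append(byte)          # ordinary data byte
            i += 1
        elif i + 1 < len(raw) and raw[i + 1] == IAC:
            data.append(IAC)           # IAC IAC is a literal 255 data byte
            i += 2
        elif i + 2 < len(raw) and raw[i + 1] in NAMES:
            commands.append((NAMES[raw[i + 1]], raw[i + 2]))
            i += 3
        else:
            i += 2                     # other two-byte commands are skipped here
    return bytes(data), commands
```

Feeding it b"hi" followed by the bytes 255, 253, 1 and then b"!" yields the data b"hi!" and one command, ("DO", 1).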
Current Use of Telnet
Original environment for Telnet has little relevance
today
Still used and included in the TCP/IP suite
Available on PCs for use over the Internet
PC includes Telnet software
Telnet protocol and translation between PC
keyboard/display and NVT
Not GUI
Services available include United States Library
of Congress
locis.loc.gov
Figure 3.1(b) Telnet Operational
Environment on Internet
The Longevity of Telnet
Telnet is older than most of its users
(But not most lecturers!)
Telnet is simple
RFC 854 is 15 pages
HTTP (see later lecture) is 176 pages
Simple job done by simple protocol
Telnet can evolve
Option negotiation was brilliant
Not common in IETF protocol designs until late 1980s
Enables Telnet to evolve to meet new demands without endless
new versions of basic protocol
Currently over 100 RFCs on Telnet and its options
3% of the entire body of RFCs
Most recent RFC 2953, Telnet Encryption, September 2000
File Transfer FTP
FTP evolved from an era of radically diverse systems
Has obsolete commands, transfer modes, and data representations
Objectives:
Promote sharing of files (computer programs and/or data)
Encourage indirect or implicit (via programs) use of remote computers
Shield user from variations in file storage systems among hosts
Transfer data reliably and efficiently
File systems, rather than just files
Single file viewed as set of bits with name
Trivial File Transfer Protocol (TFTP) does this
Send request header to read or write file with some name
Stream bits across
11 pages to define
FTP deals with metadata such as file pathnames, file organization,
access control, and data representation
Accordingly, RFC 959 is 69 pages long
FTP Model
User FTP entity and Server FTP entity
Initiating host is user
Chooses file name and options
Server accepts or rejects request
Based on its file system protection and options requested
If accepted, server responsible for establishing and managing
transfer
Operates on two levels (Figure 3.5)
Establish TCP connection
Exchange control information (commands and replies)
Second TCP connection established for data transfer
FTP user interface enables human user or program to
access User FTP
FTP Commands
Specify parameters for data connection
Data port, transfer mode, representation type, and structure
Nature of file system operation
Store, retrieve, append, delete
User data transfer protocol should "listen" on specified
data port
Server initiates data connection and data transfer
FTP uses Telnet protocol on control connection
FTP user protocol or FTP server protocol may implement Telnet
rules directly
FTP user protocol or FTP server protocol may use existing
Telnet module
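The data port parameter can be illustrated with the PORT command of RFC 959, which carries the address and port as six comma-separated decimal bytes. A small sketch; the helper names and the example address are hypothetical:

```python
def port_command(ip: str, port: int) -> str:
    """Build an FTP PORT command: four address bytes, then port high/low bytes."""
    octets = ip.split(".")
    if len(octets) != 4 or not 0 < port < 65536:
        raise ValueError("need a dotted IPv4 address and a 16-bit port")
    high, low = divmod(port, 256)
    return "PORT " + ",".join(octets + [str(high), str(low)]) + "\r\n"


def parse_port_argument(argument: str):
    """Recover (ip, port) from the argument of a PORT command."""
    fields = [int(field) for field in argument.split(",")]
    if len(fields) != 6:
        raise ValueError("PORT takes six comma-separated numbers")
    ip = ".".join(str(field) for field in fields[:4])
    return ip, fields[4] * 256 + fields[5]
```

The user side sends this on the control connection and listens on the named port; the server then opens the data connection to it.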
Figure 3.5 FTP Model
Figure 3.6 Overview of an FTP Transfer
Options
FTP assumes files are objects in mass storage
Share some properties regardless of machine
Files uniquely identified by symbolic names
Files have owners and protection against unauthorized access
Files may be created, read from (copied from), written into, or
deleted (within protection rules)
To support specific computers and operating systems,
FTP can negotiate options in three dimensions
Datatype, file type, and transfer mode
Systems programmer on each system determines
How particular file can be mapped to standard file type
Using one of standard data types
Transferred using standard mode
Such that it is useful at destination
Figure 3.7 FTP File Types
Transmission Modes
Stream Mode
Optimise use of network
Stream mode (default)
Raw data sent
Least computational burden on user and server
systems
No restriction on file type
Record-structure files, 2-byte control code for EOR and EOF
Electronic Mail
Most heavily used application on any network
Simple Mail Transfer Protocol (SMTP)
TCP/IP
Delivery of simple text messages
Multi-purpose Internet Mail Extension (MIME)
Delivery of other types of data
Voice, images, video clips
SMTP
RFC 821
Not concerned with format of messages or data
Covered in RFC 822 (see later)
SMTP uses info written on envelope of mail
Message header
Does not look at contents
Message body
Except:
Standardize message character set to 7 bit ASCII
Add log info to start of message
Shows path taken
Basic Operation
Mail created by user agent program (mail client)
Message consists of:
Header containing recipients address and other info
Body containing user data
Messages queued and sent as input to SMTP
sender program
Typically a server process (daemon on UNIX)
Mail Message Contents
Each queued message has:
Message text
RFC 822 header with message envelope and list of recipients
Message body, composed by user
A list of mail destinations
Derived by user agent from header
May be listed in header
May require expansion of mailing lists
May need replacement of mnemonic names with mailbox
names
If BCCs indicated, user agent needs to prepare
correct message format
SMTP Sender
Takes message from queue
Transmits to proper destination host
Via SMTP transaction
Over one or more TCP connections to port 25
Host may have multiple senders active
Host should be able to create receivers on demand
When delivery complete, sender deletes destination
from list for that message
When all destinations processed, message is deleted
Optimization
If message destined for multiple users on a
given host, it is sent only once
Delivery to users handled at destination host
If multiple messages ready for given host, a
single TCP connection can be used
Saves overhead of setting up and dropping
connection
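The first optimization amounts to grouping a message's destinations by host before sending. A minimal sketch, assuming mailbox addresses of the form user@host; the function name group_by_host is hypothetical:

```python
from collections import defaultdict


def group_by_host(recipients):
    """Group mailbox addresses by destination host (case-insensitive host)."""
    groups = defaultdict(list)
    for address in recipients:
        # The host is everything after the last "@".
        _, _, host = address.rpartition("@")
        groups[host.lower()].append(address)
    return dict(groups)
```

Each key then needs one transfer of the message text, with one recipient entry per address in its list.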
Possible Errors
Host unreachable
Host out of operation
TCP connection fails during transfer
Sender can re-queue mail
Give up after a period
Faulty destination address
User error
Target user changed address
Redirect if possible
Inform user if not
SMTP Protocol - Reliability
Used to transfer messages from sender to
receiver over TCP connection
Attempts to provide reliable service
No guarantee to recover lost messages
No end to end acknowledgement to originator
Error indication delivery not guaranteed
Generally considered reliable
SMTP Receiver
Accepts arriving message
Places in user mailbox or copies to outgoing
queue for forwarding
Receiver must:
Verify local mail destinations
Deal with errors
Transmission
Lack of disk space
Sender responsible for message until receiver
confirms complete transfer
Indicates mail has arrived at host, not user
SMTP Forwarding
Mostly direct transfer from sender host to
receiver host
May go through intermediate machine via
forwarding capability
Sender can specify route
Target user may have moved
Figure 3.9 SMTP Mail Flow
SMTP System Overview
Commands and responses between sender and
receiver
Initiative with sender
Establishes TCP connection
Sender sends commands to receiver
e.g. HELO<SP><domain><CRLF>
Each command generates exactly one reply
e.g. 250 requested mail action ok; completed
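The command/reply exchange can be sketched without a network: a command is a line ending in CRLF, and a reply starts with a three-digit code. The helper names below are hypothetical, and multi-line replies are not handled:

```python
def helo(domain: str) -> bytes:
    """Format the HELO command: HELO<SP><domain><CRLF>."""
    return f"HELO {domain}\r\n".encode("ascii")


def parse_reply(line: str):
    """Split a single-line reply into (numeric code, text)."""
    code = int(line[:3])
    text = line[4:].rstrip("\r\n")
    return code, text


def is_success(code: int) -> bool:
    """2xx replies report that the requested action completed."""
    return 200 <= code < 300
```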
Format for Text Messages
RFC 822
Message viewed as having envelope and
contents
Envelope contains information required to
transmit and deliver message
Message is sequence of lines of text
Uses general memo framework
Header usually keyword followed by colon followed
by arguments
Example Message
Date: Tue, 16 Jan 1996 10:37:17 (EST)
From: William Stallings <[email protected]>
Subject: The syntax of RFC 822
To: [email protected]
Cc: Jones@Yet-another_host.com
This is the main text, delimited from the header by
a blank line.
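The "keyword, colon, arguments" layout and the blank-line delimiter are simple enough to parse by hand. A minimal sketch; the function name and the addresses in the usage note are placeholders, and a real program would more likely use Python's email package:

```python
def parse_message(raw: str):
    """Split an RFC 822 style message into a header list and a body.

    The header ends at the first blank line. A line that starts with a
    space or tab continues the previous header field.
    """
    head, _, body = raw.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and headers:
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers, body
```

Given "Subject: The syntax of RFC 822\nTo: smith@other-host.example\n\nThis is the main text.", it returns the two header pairs and the body text.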
Multipurpose Internet Mail
Extension (MIME)
Extension to RFC822
SMTP can not transmit executables
Uuencode and other schemes are available
Not standardized
Cannot transmit text including international characters
(e.g., accented letters)
Need 8 bit ASCII
Servers may reject mail over certain size
Translation between ASCII and EBCDIC not standard
SMTP gateways to X.400 cannot handle non-text data
in X.400 messages
Some SMTP implementations do not adhere to standard
CRLF, truncate or wrap long lines, removal of white space, etc.
MIME Transfer Encodings
Reliable delivery across the widest range of
environments
Content transfer encoding field
Six values
Three (7bit, 8bit, binary) no encoding done
Provide info about nature of data
Quoted-printable
Data largely printable ASCII characters
Non-printing characters represented by hex code
Base64
Maps arbitrary binary input onto printable output
X-token
Named nonstandard encoding
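The two encodings that actually transform data can be tried with Python's standard base64 and quopri modules. A short sketch; the wrapper names are hypothetical:

```python
import base64
import quopri


def encode_base64(data: bytes) -> bytes:
    """Map arbitrary bytes onto printable ASCII (4 output chars per 3 input bytes)."""
    return base64.b64encode(data)


def encode_quoted_printable(data: bytes) -> bytes:
    """Leave printable ASCII alone; write other bytes as '=' plus two hex digits."""
    return quopri.encodestring(data)
```

Quoted-printable keeps mostly-ASCII text readable, while Base64 suits data that is binary throughout.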
Internet Directory Services DNS
Directory lookup service
Provides mapping between host name and numerical
address
Essential to functioning of Internet
RFCs 1034 and 1035.
Four elements
Domain name space
Tree-structured
DNS database
Each node and leaf in name space tree structure names set of
information (e.g., IP address, type of resource) in resource record
Name servers
Servers that hold information about portion of tree
Resolvers
Programs that extract information from name servers
Domain Names
32-bit IP address uniquely identifies devices
Two components
Network number
Host address
Problems
Routers devise routes based on network number
Can't hold table of every network and path
Networks group to simplify routing
32-bit address usually written as four decimal numbers
Effective for computer processing
Not convenient for users
Problems are addressed by concept of domain
Group of networks are under control of single entity
Organized hierarchically
Names assigned reflect organization
Figure 4.4 Portion of Internet Domain Tree
Figure 4.5 DNS Resource Record Format
DNS Operation
User program requests IP address for domain name
Resolver module in local host or local ISP formulates
query for local name server
In same domain as resolver
Local name server checks for name in local database or
cache
If so, returns IP address to requestor
Otherwise, query other available name servers
Starting down from root of DNS tree or as high up as possible
Local name server caches reply
Depending on Time to live field
User program given IP address or error message
DNS name servers automatically send out updates to
other relevant name servers as conditions warrant
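The caching step can be sketched as a table whose entries expire when their time to live runs out. This is an illustration only; the class name is hypothetical and the clock is passed in explicitly so the behavior is easy to follow:

```python
class DnsCache:
    """Toy cache: each entry is kept until its time to live has elapsed."""

    def __init__(self):
        self._entries = {}

    def put(self, name: str, address: str, ttl: float, now: float) -> None:
        """Store an answer together with the time at which it expires."""
        self._entries[name.lower()] = (address, now + ttl)

    def get(self, name: str, now: float):
        """Return the cached address, or None if absent or expired."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        address, expires = entry
        if now >= expires:
            del self._entries[key]   # stale: force a fresh query
            return None
        return address
```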
Figure 4.6 DNS Name Resolution
Server Hierarchy
Name servers operated by any organization that has
domain
Each name server holds subset of name space (a zone)
One or more (or all) subdomains within domain
Authoritative
This name server maintains accurate data for this portion of the hierarchy
Can extend to any depth
13 root name servers share responsibility for top level
zones
Replication prevents root server bottleneck
Individual root servers are busy
Internet Software Consortium server (F) answers almost 300
million DNS requests daily
(www.isc.org/services/public/F-root-server.html)
Typically, single queries carried over UDP
Queries for group of names carried over TCP
Name Resolution
Resolver knows name and address of local DNS server
If resolver does not have name in cache, it sends DNS
query to local server
Either returns address or after querying one or more
other servers
Server (A) forwards request to server (B)
If B has name in cache or database, it can return result
If not, B can
Query another name server and send result back to A
Recursive
Tell A address of next server (C) to ask
A then asks C
Iterative
Exchanges between servers can use either technique
Name resolvers use recursive
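The difference between the two techniques is who follows the referral. A toy model, with invented server names, records, and addresses, and no real DNS traffic:

```python
# Hypothetical name servers: B knows nothing but can refer to C; C has the record.
SERVERS = {
    "B": {"records": {}, "next": "C"},
    "C": {"records": {"www.example.com": "192.0.2.80"}, "next": None},
}


def query(server: str, name: str):
    """One server's own knowledge: ('answer', address) or ('referral', next server)."""
    info = SERVERS[server]
    if name in info["records"]:
        return "answer", info["records"][name]
    return "referral", info["next"]


def resolve_recursive(server: str, name: str):
    """Recursive: the queried server chases referrals and returns the final result."""
    kind, value = query(server, name)
    if kind == "answer" or value is None:
        return value if kind == "answer" else None
    return resolve_recursive(value, name)


def resolve_iterative(first_server: str, name: str):
    """Iterative: the asker follows each referral itself; also lists servers asked."""
    asked, server = [], first_server
    while server is not None:
        asked.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, asked
        server = value
    return None, asked
```

In the iterative case the asker (server A in the slide) contacts both B and C; in the recursive case it contacts only B.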
The Web
The Web History (I)
1945: Vannevar Bush,
Memex:
"a device in which an
individual stores all his
books, records, and
communications, and which
is mechanized so that it may
be consulted with exceeding
speed and flexibility"
Vannevar Bush (1890-1974)
Memex
(See http://www.iath.virginia.edu/elab/hfl0051.html)
The Web History (II)
1967, Ted Nelson, Xanadu:
A world-wide publishing network
that would allow information to be
stored not as separate files but as
connected literature
Owners of documents would be
automatically paid via electronic
means for the virtual copying of
their documents
Coined the term Hypertext
Influenced research community
Who then missed the Web...
The Web History (III)
Tim Berners-Lee
Physicist trying to solve real problem
Distributed access to data
World Wide Web (WWW): a
distributed database of pages
linked through HyperText Transfer
Protocol (HTTP)
First HTTP implementation - 1990
Tim Berners-Lee at CERN
HTTP/0.9 - 1991
Simple GET command for the Web
HTTP/1.0 - 1992
Client/Server information, simple caching
HTTP/1.1 - 1996
Why So Successful?
What do the Web and YouTube have in common?
The ability to self-publish
But the self-publishing mechanisms must be:
Technically easy
Independent (not requiring intricate coordination)
Free
Being part of a grand idealistic and collaborative
endeavor isn't what people want
People aren't looking for Nirvana (or even Xanadu)
They want to make their mark, and find something neat
Moral of the Story
Timing is everything
Internet made vision not only possible, but necessary
Can you imagine the Web without the Internet?
Visions are great, but problem solving is better
The best is the enemy of the good
Particularly when it blocks deployment
Nelson on HTML:
"HTML is precisely what we were trying to PREVENT: ever-breaking
links, links going outward only, quotes you can't
follow to their origins, no version management, no rights
management."
Components of Web
Infrastructure
Content
Objects
Clients
Send requests / Receive responses
Servers
Receive requests / Send responses
Store or generate the responses
Proxies
Placed between clients and servers
Act as a server for the client, and a client to the server
Provide extra functions
Caching, anonymization, logging, transcoding, filtering access
Explicit or transparent (interception)
Ingredients of Web
Implementation
HTML
URIs, URLs, URNs
HTTP
HTML
A Web page has several components
Base HTML file
Referenced objects (e.g., images)
HyperText Markup Language (HTML)
Representation of hypertext documents in ASCII format
Web browsers interpret HTML when rendering a page
Several functions:
Format text, reference images, embed hyperlinks (HREF)
Straight-forward to learn
Syntax easy to understand
Authoring programs can auto-generate HTML
Source almost always available
Uniform Resource Locator (URL)
Provides a means to get the resource
http://www.ietf.org/rfc/rfc3986.txt
Uniform Resource Name (URN)
Names a resource independent of how to get it
urn:ietf:rfc:3986 is a standard URN for RFC 3986
URI: Uniform Resource Identifier
Names, locations, etc.
Who has used a URN? A URL?
Which one solves a real problem?
Which one represents an idealistic vision?
The dominance of URLs over URNs reflects the lack of a
proper naming structure for objects
Naming was central component of our clean slate design
What properties should a URN have?
Not specify anything about object that can change
Cryptographic information about the associated key
Solves real problems: security and mobility of content
URL Syntax
protocol://hostname[:port]/directorypath/resource
protocol: http, ftp, https, smtp, rtsp, etc.
hostname: FQDN, IP address
port: Defaults to protocol's standard port
e.g. http: 80/tcp, https: 443/tcp
directory path: Hierarchical, often reflecting file system
resource: Identifies the desired resource
Can also extend to program executions:
http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b
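The syntax above maps directly onto urlsplit from Python's standard urllib.parse module. A short sketch; the describe helper and its default-port table are assumptions for illustration:

```python
from urllib.parse import urlsplit

# Standard ports for a few protocols, used when the URL names none.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def describe(url: str) -> dict:
    """Break a URL into protocol, hostname, port, path, and query string."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parts.query,
    }
```

For the program-execution example above, "path" would be /ym/ShowLetter and "query" would hold the box=...&head=b parameters.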
HTTP
HyperText Transfer Protocol (HTTP)
Client-server protocol for transferring
resources
Important properties:
Request-response protocol
Reliance on a global URI namespace
Resource metadata
Stateless
ASCII format
% telnet www.icir.org 80
GET /jdoe/ HTTP/1.0
<blank line, i.e., CRLF>
HTTP and TCP
What functions does HTTP leave to TCP?
Would HTTP be harder without layering?
Steps in HTTP Request
HTTP Client initiates TCP connection to server
HTTP Client sends HTTP request to server
HTTP Server responds to request
HTTP Client receives the response
TCP connection terminates
How many RTTs for a single request?
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
(blank line)
Client-to-Server Communication
HTTP Request Message
Request line: method, resource, and protocol version
Request headers: provide information or modify
request
Body: optional data (e.g., to POST data to the
server)
(The blank line that ends the headers is not optional)
Request methods include:
GET: Return current value of resource, run program, ...
HEAD: Return the meta-data associated with a resource
POST: Update resource, provide input to a program, ...
Headers include:
Useful info for the server
e.g. desired language
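The request layout (request line, headers, blank line, optional body) can be assembled by hand. A minimal sketch; build_request is a hypothetical helper, and the Content-Length header is added only when a body is present:

```python
def build_request(method: str, path: str, host: str, headers=None, body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request as bytes, with CRLF line endings."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # The empty line after the headers marks the end of the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```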
Server-to-Client Communication
HTTP Response Message
Status line: protocol version, status code, status phrase
Response headers: provide information
Body: optional data
HTTP/1.1 200 OK
Connection: close
Date: Thu, 06 Aug 2006 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 2006 ...
Content-Length: 6821
Content-Type: text/html
(blank line)
data data data data data ...
Response code classes
Similar to other ASCII app. protocols like SMTP
Code  Class          Example
1xx   Informational  100 Continue
2xx   Success        200 OK
3xx   Redirection    304 Not Modified
4xx   Client error   404 Not Found
5xx   Server error   503 Service Unavailable
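The status line and its code class are easy to pick apart, since the class is just the first digit of the code. A minimal sketch with hypothetical helper names:

```python
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def parse_status_line(line: str):
    """Split 'HTTP/1.1 200 OK' into (version, numeric code, phrase)."""
    version, code, phrase = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), phrase


def status_class(code: int) -> str:
    """The hundreds digit of the code selects the response class."""
    return CLASSES.get(code // 100, "Unknown")
```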
Web Server: Generating a Response
Return a file
URL matches a file (e.g., /www/index.html)
Server returns file as the response
Server generates appropriate response header
Generate response dynamically
URL triggers a program on the server
Server runs program and sends output to client
Return meta-data with no body
HTTP Resource Meta-Data
Meta-data
Info about a resource, stored as a separate entity
Examples:
Size of a resource, last modification time, etc.
Example: Type of the content
Data format classification (e.g., Content-Type: text/html)
Enables browser to automatically launch an appropriate
viewer
Borrowed from e-mail's Multipurpose Internet Mail Extensions (MIME)
Usage example: Conditional GET Request
Client requests object with an If-Modified-Since header
If unchanged, HTTP/1.1 304 Not Modified
No body in the server's response, only a header
HTTP is Stateless
Stateless protocol
Each request-response exchange treated independently
Servers not required to retain state
This is good
Improves scalability on the server-side
Don't have to retain info across requests
Can handle higher rate of requests
Order of requests doesn't matter
This is also bad
Some applications need persistent state
Need to uniquely identify user or store temporary info
e.g., Shopping cart, user preferences/profiles, usage tracking, ...
State in a Stateless Protocol:
Cookies
Client-side state maintenance
Client stores small (?) state on behalf of server
Client sends state in future requests to the server
Can provide authentication
Client -> Server: Request
Server -> Client: Response, Set-Cookie: XYZ
Client -> Server: Request, Cookie: XYZ
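The exchange can be sketched with SimpleCookie from Python's standard http.cookies module. The helper names and the cookie name "session" are assumptions for illustration:

```python
from http.cookies import SimpleCookie


def set_cookie_header(name: str, value: str) -> str:
    """Server side: format the Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output(header="Set-Cookie:")


def read_cookie_header(header_value: str) -> dict:
    """Server side, later request: recover the state the client sent back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}
```

The server stays stateless between requests; whatever it needs to remember travels in the client's Cookie header.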
State in a Stateless Protocol:
HTTP Authentication
Tool to limit access to server documents
Basic HTTP Authentication
Client can add an Authorization header to GET request
Base64-encoded concatenation of username, a colon, & password
If client doesn't provide header, server responds with
a 401 Unauthorized and a WWW-Authenticate header
Server does not honor request until valid authorization
received
Stateless: Must happen on each request
Is this secure? Is this security?
No. Authentication is not security, but provides a
piece
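A short sketch of the Authorization header shows why Basic authentication alone is not secure: Base64 is an encoding, not encryption, so anyone who sees the header can reverse it. Helper names are hypothetical:

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the header: Base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")


def decode_basic(header: str):
    """Reverse the encoding; no key or secret is needed."""
    token = header.split(" ")[-1]
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```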
Security Sneak-Peek: HTTPS
Transport Layer Security (TLS)
Came after Secure Sockets Layer (SSL)
Shim between App layer (e.g. HTTP) and Transport
layer (e.g. TCP)
Provides authentication and communication privacy
Putting It All Together
Client-Server
Request-Response
HTTP
Stateless
Get state with cookies
Content
URI/URL
HTML
Meta-data
Web Browser
Is the client
Generates HTTP requests
User types URL, clicks a hyperlink or bookmark, clicks reload or
submit
Automatically downloads embedded images
Submits the requests (fetches content)
Via one or more HTTP connections
Presents the response
Parses HTML and renders the Web page
Invokes helper applications (e.g., Acrobat, RealPlayer)
Maintains cache
Stores recently-viewed objects and ensures freshness
Web Browser History
1990, WorldWideWeb, Tim
Berners-Lee, NeXT computer
1993, Mosaic, Marc Andreessen
and Eric Bina
1994, Netscape
1995, Internet Explorer
Web Server
Handle client request:
1. Accept a TCP connection
2. Read and parse the HTTP request message
3. Translate the URI to a resource
4. Determine whether the request is authorized
5. Generate and transmit the response
Web site vs. Web server
Web site: one or more Web pages and objects
united to provide the user an experience of a
coherent collection
Web server: program that satisfies client requests
for Web resources
Some Important Milestones
1989-1990  Tim Berners-Lee & Robert Cailliau propose and coin WorldWideWeb - HTML, HTTP, URI
1991       Tim Berners-Lee writes first web browser
1993       Mosaic 1 - Supports images
1994       Netscape 1 - Multiple connections, cookies, <CENTER>
1996       CSS introduced - Separates content from structure
           Netscape 2 & 3, Internet Explorer - Frames, JavaScript, mouseover
1997       HTML 4.0 - Tables, scripting, style sheets, ...
1997-1999  Boom! XML; Dynamic content: DHTML/W3C DOM; Commerce
2000-now   Web 2.0 (coined 2004) - The network is the platform
           - Open data, user participation, rich user experience
           Accessibility, mobility, internationalization, voice, media, ...
HTTP Performance
Most Web pages have multiple objects (items)
e.g., HTML file and a bunch of embedded images
How do you retrieve those objects?
One item at a time
What transport behavior
does this remind you of?
Fetch HTTP Items: Stop & Wait
[Figure: client/server timeline from "Start fetching page" to "Finish; display page"; 2 RTTs per object]
Improving HTTP Performance:
Concurrent Requests & Responses
Use multiple connections in
parallel
Does not necessarily maintain
order of responses
Is this fair?
N parallel connections use bandwidth N times
more aggressively than just one
What's a reasonable/fair limit as traffic
competes with that of other users?
Client = :-) Why?
Server = :-) Why?
Network = :-( Why?
[Figure labels: R1, R2, R3; T1, T2, T3]
Improving HTTP Performance:
Pipelined Requests & Responses
Batch requests and responses
Reduce connection overhead
Multiple requests sent in a single
batch
Small items (common) can also
share segments
Maintains order of responses
Item 1 always arrives before item
2
How is this different from
concurrent requests/responses?
What else could we do to speed
things up?
[Figure: client and server exchanging pipelined requests and responses]
Improving HTTP Performance:
Persistent Connections
Enables multiple transfers per connection
Maintain TCP connection across multiple requests
Including transfers subsequent to current page
Client or server can tear down connection
Performance advantages:
Avoid overhead of connection set-up and tear-down
Allow TCP to learn more accurate RTT estimate
Allow TCP congestion window to increase
i.e., leverage previously discovered bandwidth
Default in HTTP/1.1
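The fetch strategies can be compared with a rough round-trip count. This model is a sketch under loud assumptions: transmission time is ignored, TCP set-up costs 1 RTT, each request/response costs 1 RTT, and a pipelining client sends all embedded-object requests in one batch after it has the base HTML file. Function names are hypothetical:

```python
def stop_and_wait_rtts(objects: int) -> int:
    """New connection per object: 1 RTT set-up + 1 RTT request/response each."""
    return 2 * objects


def persistent_rtts(objects: int) -> int:
    """One connection for everything, requests still sent one at a time."""
    return 1 + objects


def pipelined_rtts(objects: int) -> int:
    """One connection; base page first, then all remaining requests as a batch."""
    return 1 + 1 + (1 if objects > 1 else 0)
```

For a page made of 10 objects this gives 20, 11, and 3 RTTs respectively, which is why persistent connections became the HTTP/1.1 default.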
Improving HTTP Performance:
Caching
Many clients transfer same information
Generates redundant server and network load
Clients experience unnecessary latency
[Figure: clients in ISP-1 and ISP-2 reach the server across a backbone ISP]
Improving HTTP Performance:
Caching: How
Modifier to GET requests:
If-Modified-Since: server returns "not modified" if resource
not modified since specified time
Response header:
Expires: how long it's safe to cache the resource
No-cache: ignore all caches; always get resource
directly from server
Improving HTTP Performance:
Caching: Why
Motive for placing content closer to client:
User gets better response time
Content providers get happier users
Time is money, really!
Network gets reduced load
Why does caching work?
Exploits locality of reference
How well does caching work?
Very well, up to a limit
Large overlap in content
But many unique requests
Improving HTTP Performance:
Caching on the Client
Example: Conditional GET Request
Return resource only if it has changed at the server
Save server resources!
How?
Client specifies if-modified-since time in request
Server compares this against last modified time of desired
resource
Server returns 304 Not Modified if resource has not changed
... or a 200 OK with the latest version otherwise
Request from client to server:
GET /~ee122/fa07/ HTTP/1.1
Host: inst.eecs.berkeley.edu
User-Agent: Mozilla/4.03
If-Modified-Since: Sun, 27 Aug 2006 22:25:50 GMT
<CRLF>
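The server's side of a conditional GET reduces to one date comparison. A minimal sketch using parsedate_to_datetime from Python's standard email.utils module; the function name and the example dates in the usage note are assumptions:

```python
from email.utils import parsedate_to_datetime


def conditional_get(if_modified_since, last_modified: str):
    """Return (status code, phrase) for a GET that may carry If-Modified-Since.

    Both arguments are HTTP date strings such as
    'Sun, 27 Aug 2006 22:25:50 GMT'; if_modified_since may be None.
    """
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        changed = parsedate_to_datetime(last_modified)
        if changed <= since:
            return 304, "Not Modified"   # header only, no body
    return 200, "OK"                     # send the current version
```

A resource last modified on "Sat, 26 Aug 2006 10:00:00 GMT" and requested with the If-Modified-Since value above yields 304; one modified a day after that value yields 200.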
Improving HTTP Performance:
Caching with Reverse Proxies
Cache documents close to server
decrease server load
Typically done by content providers
Only works for static content
[Figure: reverse proxies in front of the server; clients in ISP-1 and ISP-2 connect across a backbone ISP]
Improving HTTP Performance:
Caching with Forward Proxies
Cache documents close to clients
reduce network traffic and decrease latency
Typically done by ISPs or corporate LANs
[Figure: forward proxies near the clients in ISP-1 and ISP-2; reverse proxies near the server across the backbone ISP]
Improving HTTP Performance:
Caching w/ Content Distribution Networks
Integrate forward and reverse caching
functionality
One overlay network (usually) administered by one
entity
e.g., Akamai
Provide document caching
Pull: Direct result of clients' requests
Push: Expectation of high access rate
Also do some processing
Handle dynamic web pages
Transcoding
Improving HTTP Performance:
Caching with CDNs (cont.)
[Figure: clients, forward proxies, ISP-1, ISP-2, backbone ISP, CDN, and server]
Improving HTTP Performance:
CDN Example Akamai
Akamai creates new domain names for each client
content provider.
e.g., a128.g.akamai.net
The CDN's DNS servers are authoritative for the
new domains
The client content provider modifies its content so
that embedded URLs reference the new domains.
"Akamaize" content
e.g.: http://www.cnn.com/image-of-the-day.gif becomes
http://a128.g.akamai.net/image-of-the-day.gif
Improving HTTP Performance:
CDN Example Akamai
Request: GET http://cnn.com
1 - DNS Lookup
2 - Fetch page w/ "Akamaized" content
3 - DNS Lookup for Akamai URLs (lookup a1921.g.akamai.net)
4 - Fetch content
cnn.com "Akamaizes" its content.
"Akamaized" response object has inline URLs for secondary content at
(after resolving CNAMEs) a1921.g.akamai.net and other Akamai-managed DNS names.
Akamai servers store/cache secondary content for "Akamaized" services.
[Figure labels: hosts a, b, c; local DNS server; DNS server for cnn.com; akamai.net DNS servers]
Improving HTTP Performance:
Caching and Replication
Caching (pull)
Replicate content on demand after a request
Store the response message locally for future use
Challenges:
May need to verify if the response has changed
and some responses are not cacheable
Replication (push)
Planned replication of content in multiple locations
Update of resources handled outside of HTTP
Can replicate scripts that create dynamic responses
Hosting: Multiple Sites Per
Machine
Multiple Web sites on a single machine
Hosting company runs the Web server on behalf of
multiple sites (e.g., www.foo.com and www.bar.com)
Problem: GET /index.html
www.foo.com/index.html or www.bar.com/index.html?
Solutions:
Multiple server processes on the same machine
Have a separate IP address (or port) for each server
Include site name in HTTP request
Single Web server process with a single IP address
Client includes Host header (e.g., Host: www.foo.com)
Required header with HTTP/1.1
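With a single server process and one IP address, the Host header is what tells the sites apart. A minimal sketch of the lookup; select_site and the document-root paths in the usage note are hypothetical:

```python
def select_site(request_text: str, sites: dict):
    """Pick a site's document root from the Host header of a raw request.

    Returns None when the header is missing or names an unknown site.
    """
    lines = request_text.split("\r\n")
    for line in lines[1:]:            # skip the request line
        if line == "":
            break                     # blank line: end of headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]   # drop any :port
            return sites.get(host)
    return None
```

With sites = {"www.foo.com": "/srv/foo", "www.bar.com": "/srv/bar"}, the same GET /index.html resolves to a different directory depending on the Host header.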
Hosting: Multiple Machines Per
Site
Replicate a popular Web site across multiple machines
Helps to handle the load
Places content closer to clients
Helps when content isn't cacheable by proxies/CDNs
Problem: Want to direct client to a particular replica
Why?
Balance load across server replicas
Pair clients with nearby servers
Solution #1: Manual selection by clients
Each replica has its own site name
A Web page lists the replicas (e.g., by name, location)
and asks clients to click on a hyperlink to pick
Hosting: Multiple Machines Per
Site
Solution #2: single IP address, multiple machines
Run multiple machines behind a single IP address
Ensure all packets from a single
TCP connection go to the same replica
[Figure: load balancer at 64.236.16.20 in front of the replicas]
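One way to keep every packet of a TCP connection on the same replica is to hash the connection's address/port 4-tuple. The slides do not say which mechanism is used, so this is only an assumed illustration; pick_replica and the replica names are hypothetical:

```python
import hashlib


def pick_replica(src_ip: str, src_port: int, dst_ip: str, dst_port: int, replicas: list):
    """Map a TCP connection's 4-tuple to one replica, deterministically."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode("ascii")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(replicas)
    return replicas[index]
```

Every packet of a connection carries the same 4-tuple, so it always hashes to the same replica, and the balancer needs no per-connection table. The trade-off is that adding or removing a replica remaps existing connections.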
Hosting: Multiple Machines Per
Site
Solution #3: multiple addresses, multiple machines
Same name but different addresses for all of the replicas
Configure DNS server to return different addresses
[Figure: clients on the Internet; replicas at 64.236.16.20, 173.72.54.131, and 12.1.1.1]
Conclusions
Key ideas underlying the Web
Uniform Resource Locator (URL)
HyperText Markup Language (HTML)
HyperText Transfer Protocol (HTTP)
Browser helper applications based on content type
Performance implications
Concurrent connections, pipelining, persistent conns.
Main Web infrastructure components
Clients, servers, proxies, CDNs