Web Engineering
HTTP Protocol
Anup Majumder
Lecturer, CSE, DIU
Internet and Web
HTML tells the browser how to present the
content to the user.
Web and HyperText Transfer Protocol (HTTP)
First some jargon
Web page consists of objects
Object can be HTML file, JPEG image, Java applet, audio
file,…
A Web page consists of a base HTML file that includes several
referenced objects
Each object is addressable by a URL
Example URL:
www.someschool.edu/someDept/pic.gif
host name: www.someschool.edu
path name: /someDept/pic.gif
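The host name / path name split above can be sketched in Python. This is a minimal illustration, not part of the original slides: the helper name split_url is hypothetical, and it assumes an http scheme when the URL has none, because urlsplit only recognises the host after a scheme.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (host name, path name); a missing scheme is assumed to be http."""
    if "://" not in url:
        url = "http://" + url  # assumption: scheme-less URLs are treated as http
    parts = urlsplit(url)
    return parts.hostname, parts.path


host, path = split_url("www.someschool.edu/someDept/pic.gif")
print("host name:", host)
print("path name:", path)
```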
HTTP overview
HTTP: hypertext transfer protocol
Web’s application layer protocol
client/server model
  client: browser that requests, receives, “displays” Web objects
  server: Web server sends objects in response to requests
HTTP 1.0: RFC 1945
HTTP 1.1: RFC 2068
(Figure: a PC running Explorer and a Mac running Navigator each exchange HTTP requests and HTTP responses with a server running Apache Web server.)
Ports
The TCP port numbers from 0
to 1023 are reserved for well-
known services.
Don’t use these ports for your
own custom server programs!
HTTP overview (continued)
Uses TCP:
  client initiates TCP connection (creates socket) to server, port 80
  server accepts TCP connection from client
  HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
  TCP connection closed
HTTP is “stateless”
  server maintains no information about past client requests
aside
Protocols that maintain “state” are complex!
  past history (state) must be maintained
  if server/client crashes, their views of “state” may be inconsistent, must be reconciled
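The connect, accept, exchange, close sequence can be tried on one machine with plain sockets. This is a rough sketch under stated assumptions: the function names (serve_once, fetch, demo) are made up for illustration, the server runs on the loopback address, and it binds to an OS-chosen unprivileged port rather than port 80, since ports below 1024 are reserved.

```python
import socket
import threading


def serve_once(listener):
    """Accept one TCP connection, answer one request, then close. No state is kept."""
    conn, _addr = listener.accept()      # server accepts TCP connection from client
    with conn:
        conn.recv(4096)                  # read the (small) request message
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
    # leaving the with-block closes the TCP connection


def fetch(host, port, path):
    """Open a TCP connection, send one GET request, read until the server closes."""
    with socket.create_connection((host, port)) as sock:  # client initiates TCP connection
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                 # empty read: server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    """Run the toy server and client against each other on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()
    reply = fetch("127.0.0.1", port, "/index.html")
    worker.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

The server function holds no variables between calls, which is the sense in which this toy exchange is stateless.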
HTTP connections
Nonpersistent HTTP
  At most one object is sent over a TCP connection.
  HTTP/1.0 uses nonpersistent HTTP
Persistent HTTP
  Multiple objects can be sent over a single TCP connection between client and server.
  HTTP/1.1 uses persistent connections in default mode
Nonpersistent HTTP
Suppose user enters URL
www.someSchool.edu/someDepartment/home.index (contains text, references to 10 jpeg images)
1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80
1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80. “accepts” connection, notifying client
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket
Nonpersistent HTTP (cont.)
4. HTTP server closes TCP connection.
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects
Response time modeling
Definition of RTT: time for a small packet to travel from client to server and back.
Response time:
  one RTT to initiate TCP connection
  one RTT for HTTP request and first few bytes of HTTP response to return
  file transmission time
(Figure: client/server timeline showing initiate TCP connection, RTT, request file, RTT, time to transmit file, file received.)
total = 2RTT+transmit time
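As a worked instance of the total above, the sketch below adds the two round trips and the transmission time. The function name and the sample figures (100 ms RTT, 50 ms transmit time) are assumptions chosen only for illustration.

```python
def response_time_ms(rtt_ms, transmit_ms):
    """Response time for one object over a fresh TCP connection: 2 RTT + transmit time."""
    tcp_setup = rtt_ms            # one RTT to initiate the TCP connection
    request_and_first_bytes = rtt_ms  # one RTT for the request and start of the response
    return tcp_setup + request_and_first_bytes + transmit_ms


# Hypothetical numbers: RTT = 100 ms, file transmission time = 50 ms
print(response_time_ms(100, 50), "ms")
```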
Persistent HTTP
Nonpersistent HTTP issues:
  requires 2 RTTs per object
  OS must work and allocate host resources for each TCP connection
  but browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP
  server leaves connection open after sending response
  subsequent HTTP messages between same client/server are sent over connection
Persistent without pipelining:
  client issues new request only when previous response has been received
  one RTT for each referenced object
Persistent with pipelining:
  default in HTTP/1.1
  client sends requests as soon as it encounters a referenced object
  as little as one RTT for all the referenced objects
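The RTT counts for the three modes can be compared with a small calculation. This is a simplified sketch, not a measurement: it ignores transmission time and parallel connections, assumes objects are fetched one after another in the nonpersistent case, and assumes the base HTML file always costs 2 RTTs (TCP setup plus request). The function names are hypothetical.

```python
def rtts_nonpersistent(referenced):
    """2 RTTs per object, for the base file plus every referenced object."""
    return 2 * (1 + referenced)


def rtts_persistent_no_pipelining(referenced):
    """2 RTTs for the base file, then one RTT for each referenced object."""
    return 2 + referenced


def rtts_persistent_pipelining(referenced):
    """2 RTTs for the base file, then as little as one RTT for all referenced objects."""
    return 2 + (1 if referenced > 0 else 0)


# The running example: a base HTML file that references 10 jpeg images
for count in (rtts_nonpersistent, rtts_persistent_no_pipelining, rtts_persistent_pipelining):
    print(count.__name__, count(10))
```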
HTTP request message
two types of HTTP messages: request, response
HTTP request message:
ASCII (human-readable format)
request line (GET, POST, HEAD commands):
  GET /somedir/page.html HTTP/1.1
header lines:
  Host: www.someschool.edu
  User-agent: Mozilla/4.0
  Connection: close
  Accept-language:fr
  (extra carriage return, line feed)
Carriage return, line feed indicates end of message
Anatomy of an HTTP GET request
Anatomy of an HTTP POST request
HTTP request message: general format
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
User-agent: Mozilla/4.0
Connection: close
Accept-language:fr
(extra carriage return, line feed)
Now let's look at the header lines in the example. The header line Host: www.someschool.edu specifies the host on which the
object resides. You might think that this header line is unnecessary, as there is already a TCP connection in place to the host. But,
as we'll see in Section 2.2.6, the information provided by the host header line is required by Web proxy caches. By including
the Connection: close header line, the browser is telling the server that it doesn't want to use persistent connections; it wants the
server to close the connection after sending the requested object. Thus the browser that generated this request message
implements HTTP/1.1 but it doesn't want to bother with persistent connections. The User-agent: header line specifies the user
agent, that is, the browser type that is making the request to the server. Here the user agent is Mozilla/4.0, a Netscape browser.
This header line is useful because the server can send different versions of the same object to different types of user
agents. (Each of the versions is addressed by the same URL.) Finally, the Accept-language: header indicates that the user prefers
to receive a French version of the object, if such an object exists on the server; otherwise, the server should send its default
version.
The Entity Body is not used with the GET method, but is used with the POST method. The HTTP client uses the POST method
when the user fills out a form.
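To make the layout concrete, the example request can be assembled byte for byte. This is a minimal sketch: build_request is a hypothetical helper, not a library function, and it handles only a request line and header lines with no entity body.

```python
def build_request(method, path, headers, version="HTTP/1.1"):
    """Join the request line and header lines with CRLF and end with a blank line."""
    lines = [f"{method} {path} {version}"]                      # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header lines
    # The extra carriage return, line feed marks the end of the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


message = build_request(
    "GET",
    "/somedir/page.html",
    {
        "Host": "www.someschool.edu",
        "User-agent": "Mozilla/4.0",
        "Connection": "close",
        "Accept-language": "fr",
    },
)
print(message.decode("ascii"))
```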
Method types
HTTP/1.0
  GET
  POST
  HEAD
    asks server to leave requested object out of response
HTTP/1.1
  GET, POST, HEAD
  PUT
    uploads file in entity body to path specified in URL field
  DELETE
    deletes file specified in the URL field
HTTP response message
status line (protocol, status code, status phrase):
  HTTP/1.1 200 OK
header lines:
  Connection: close
  Date: Thu, 06 Aug 1998 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 1998 …...
  Content-Length: 6821
  Content-Type: text/html
data, e.g., requested HTML file:
  data data data data data ...
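A client can split such a response into its status line, header lines, and data. The sketch below is illustrative only: parse_response_head is a hypothetical helper, the sample bytes reuse the slide's headers except Last-Modified (its value is elided on the slide), and the body is a placeholder that does not match the stated Content-Length.

```python
def parse_response_head(raw):
    """Split a response into (version, status code, status phrase, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")       # blank line ends the header section
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = lines[0].split(" ", 2)   # status line has three fields
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")         # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Date: Thu, 06 Aug 1998 12:00:15 GMT\r\n"
    b"Server: Apache/1.3.0 (Unix)\r\n"
    b"Content-Length: 6821\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"data data data data data ..."   # placeholder body, as on the slide
)
version, code, phrase, headers, body = parse_response_head(raw)
print(version, code, phrase)
print(headers["content-type"])
```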
HTTP response status codes
Status code appears in the first line of the server->client response message.
A few sample codes:
200 OK
request succeeded, requested object later in this message
301 Moved Permanently
requested object moved, new location specified later in this message
(Location:)
400 Bad Request
request message not understood by server
404 Not Found
requested document not found on this server
505 HTTP Version Not Supported
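Python's standard library carries the same code-to-phrase table in http.HTTPStatus, which gives a quick way to look up the sample codes. The helper name status_phrase is an assumption made for this sketch.

```python
from http import HTTPStatus


def status_phrase(code):
    """Return the standard status phrase for a numeric HTTP status code."""
    return HTTPStatus(code).phrase


for code in (200, 301, 400, 404, 505):
    print(code, status_phrase(code))
```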
User-Server Interaction: Authorization and Cookies
HTTP server is stateless – simplifies server design
Sometimes the server needs to identify the user
Two mechanisms for identification:
1. Authorization & 2. Cookies
Authorization :
1) Provide username and password to access documents on server
2) Status code 401: Authorization Required
User-server state: cookies
Many major Web sites use cookies
Four components:
1) cookie header line in the HTTP response message
2) cookie header line in HTTP request message
3) cookie file kept on user’s host and managed by user’s browser
4) back-end database at Web site
Example:
Susan always accesses the Internet from the same PC
She visits a specific e-commerce site for the first time
When the initial HTTP request arrives at the site, the site creates a unique ID and
creates an entry in the backend database for the ID
Cookies: keeping “state” (cont.)
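(Figure: message exchange between client and server)
client cookie file at start: ebay: 8734
client -> server: usual http request msg
server creates ID 1678 for user (entry in backend database)
server -> client: usual http response + Set-cookie: 1678
client cookie file now: amazon: 1678, ebay: 8734
client -> server: usual http request msg, cookie: 1678
server takes cookie-specific action (accesses backend database)
server -> client: usual http response msg
one week later:
client cookie file: amazon: 1678, ebay: 8734
client -> server: usual http request msg, cookie: 1678
server takes cookie-specific action (accesses backend database)
server -> client: usual http response msg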
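The exchange can be mimicked with a toy model of the four components. Everything here is a hypothetical simplification: the Site and Browser classes are invented for illustration, no real HTTP is sent, and the cookie is a bare ID as on the slide, whereas real Set-Cookie headers carry name=value pairs.

```python
import itertools


class Site:
    """Toy Web site: assigns an ID on first visit and logs requests per ID."""

    def __init__(self, first_id):
        self._ids = itertools.count(first_id)
        self.database = {}                    # back-end database: ID -> requested paths

    def handle(self, path, cookie=None):
        """Return the response headers for one request."""
        headers = {}
        if cookie is None or cookie not in self.database:
            cookie = str(next(self._ids))     # site creates a unique ID
            self.database[cookie] = []        # and an entry in the backend database
            headers["Set-cookie"] = cookie    # cookie header line in the response
        self.database[cookie].append(path)    # cookie-specific action
        return headers


class Browser:
    """Toy browser that manages the cookie file on the user's host."""

    def __init__(self):
        self.cookie_file = {}                 # site name -> ID

    def visit(self, name, site, path):
        # The stored ID, if any, plays the role of the cookie header in the request.
        headers = site.handle(path, self.cookie_file.get(name))
        if "Set-cookie" in headers:
            self.cookie_file[name] = headers["Set-cookie"]
        return headers


amazon = Site(first_id=1678)
browser = Browser()
browser.cookie_file["ebay"] = "8734"              # cookie from an earlier visit elsewhere
first = browser.visit("amazon", amazon, "/")      # first visit: site sets the cookie
later = browser.visit("amazon", amazon, "/cart")  # later visit: cookie sent back
print(first, later, browser.cookie_file)
```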
Cookies (continued)
What cookies can bring:
  authorization
  shopping carts
  recommendations
  user session state (Web e-mail)
aside
Cookies and privacy:
  cookies permit sites to learn a lot about you
  you may supply name and e-mail to sites
  search engines use redirection & cookies to learn yet more
  advertising companies obtain info across sites
Thank you