CSC3530: Software Technology
HTTP Protocol
Dickson K.W. Chiu
Dept. of Computer Science & Engineering
Chinese University of Hong Kong, Shatin, HK
Dickson Chiu CSC3530 02d-2
What is HTTP?
◼ HTTP stands for the HyperText Transfer Protocol
◼ The HTTP protocol supports a short conversation
between browser and server
◼ The request and response headers are plain ASCII
text; the message body can carry arbitrary 8-bit data
◼ The standard (and default) port for HTTP servers to
listen on is 80, though they can use any port
◼ HTTP is a stateless protocol, which means that
once a server has delivered the requested data to a
client, the connection is broken, and the server
retains no memory of what has just taken place
Why do we need to know?
◼ Understand the mechanisms of
browsers and web servers
◼ Write your own web client
◼ Download managers
◼ Spiders for collecting web-based data
◼ Automate web interactions
HTTP Session
◼ HTTP is used to transmit resources, not just
files
◼ A resource is some chunk of information that can
be identified by a URL
◼ The most common kind of resource is a file, but a
resource may also be a dynamically generated
query result or the output of a CGI script
◼ A basic HTTP session has four phases:
1. Client opens the connection (a TCP connection)
2. Client makes the request
3. Server sends a response
4. Server closes the connection
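The four phases above can be sketched with a raw TCP socket. This is a minimal illustration rather than a complete client: the names `build_get_request` and `http_get` are invented for this example, and HTTP/1.0 is assumed so that the server closes the connection once the response has been sent.

```python
import socket


def build_get_request(path: str, host: str) -> bytes:
    """Return the bytes of a minimal HTTP/1.0 GET request (illustrative helper)."""
    # Request line, one header, then an empty line that ends the header block.
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def http_get(host: str, path: str = "/", port: int = 80) -> bytes:
    """Run one HTTP/1.0 session and return the raw response bytes."""
    # Phase 1: the client opens a TCP connection.
    with socket.create_connection((host, port), timeout=10) as sock:
        # Phase 2: the client makes the request.
        sock.sendall(build_get_request(path, host))
        # Phase 3: the server sends a response; keep reading until ...
        chunks = []
        while True:
            data = sock.recv(4096)
            # Phase 4: ... the server closes the connection (recv returns b"").
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `http_get` needs network access to a reachable web server; `build_get_request` can be inspected offline to see exactly what goes on the wire.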
Keep-Alive
◼ A “compound page” (e.g., one that contains
embedded images) requires multiple passes to
retrieve, i.e., the browser must make a separate
access for each image
◼ Suppose a client accesses a page containing 10 inline
images; to display the page completely would require
11 HTTP sessions
◼ HTTP 1.1 browsers/servers support a feature called
keep-alive, which keeps the connection open
until it is explicitly torn down
Open persistent connection:
GET /index.html
GET /file1.gif
GET /file2.gif
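As a rough sketch of the idea, Python's standard `http.client` module speaks HTTP/1.1 and keeps the connection open between requests unless the server asks to close it. The function names `fetch_all` and `fetch_page_and_images` are hypothetical; the point is only that every request goes through the same connection object.

```python
import http.client


def fetch_all(conn, paths):
    """Send one GET per path over the single connection `conn`."""
    results = []
    for path in paths:
        conn.request("GET", path)
        response = conn.getresponse()
        # The body must be read completely before the connection can be reused.
        body = response.read()
        results.append((path, response.status, len(body)))
    return results


def fetch_page_and_images(host, paths, port=80):
    """Open one persistent connection, fetch every path, then tear it down."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        return fetch_all(conn, paths)
    finally:
        conn.close()  # explicit teardown of the persistent connection
```

With this shape, a page plus 10 inline images would need 11 requests but only one TCP connection, provided the server honours keep-alive.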
Request and Response
◼ A client's request consists of a request header, which
specifies the HTTP method (command) to use along
with other information, and data (if any)
◼ GET /path/to/file/index.html HTTP/1.0
◼ A server's response consists of a response header,
which includes a status code indicating whether the
transaction was successful, and data (if any)
◼ “HTTP/1.0 404 Not Found” is probably the most famous
response line; 404 is the status code
◼ Refer to the HTTP/1.0 spec or the HTTP/1.1 spec for
details
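To make the shape of the response line concrete, here is a small sketch that splits it into its three parts. The helper name `parse_status_line` is an assumption made for this example, and it handles only well-formed lines.

```python
def parse_status_line(line: str):
    """Split a response line into (version, status code, reason phrase)."""
    # Split on the first two spaces only: the reason phrase may contain spaces.
    parts = line.strip().split(" ", 2)
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason
```

For the example on this slide, `parse_status_line("HTTP/1.0 404 Not Found")` yields the version `HTTP/1.0`, the integer `404`, and the phrase `Not Found`.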
Request Commands
◼ GET returns the contents of the indicated document
◼ The most frequently used command
◼ HEAD returns the header information for the
indicated document
◼ Useful for finding out info about a resource but not
retrieving it
◼ POST treats the document as a script and sends some
data to it
◼ PUT replaces the contents of the document with
some data (uploads file)
◼ DELETE deletes the indicated document
Status Code
◼ The status code is a three-digit integer, and
the first digit identifies the general category
of response:
◼ 1xx indicates an informational message only
◼ 2xx indicates success of some kind (200 OK)
◼ 3xx redirects the client to another URL
(301 Moved Permanently)
◼ 4xx indicates an error on the client's part
(404 Not Found)
◼ 5xx indicates an error on the server's part
(500 Internal Server Error)
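The first-digit rule above maps directly onto integer division. The sketch below assumes a hypothetical helper `status_category` and uses short category labels chosen for this example.

```python
# Labels keyed by the first digit of the status code.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(code: int) -> str:
    """Return the general category of a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CATEGORIES[code // 100]
```

A client can use this to decide what to do with codes it does not recognise individually, e.g. treating any unknown 4xx code as a client error.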
Sample Client Browser Request
GET /index.html HTTP/1.1                    ← Version
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us                      ← Preferred language
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: hypothetical.ora.com                  ← Host with multiple names
Connection: Keep-Alive                      ← Persistent connections
Sample Web Server Response
HTTP/1.1 200 OK                             ← Status code
Date: Mon, 06 Dec 1999 20:54:26 GMT         ← Greenwich Mean Time
Server: Apache/1.3.6 (Unix)
Last-Modified: Fri, 04 Oct 1996 14:06:11 GMT
Content-length: 327
Connection: close                           ← Client needs to connect again for further requests
MIME-Version: 1.0
Content-type: text/html                     ← MIME content type
(blank line here)
<html><head><title>Sample Homepage</title></head>     ← Web page content
<body><img src="/images/oreilly_mast.gif">
<h1>Welcome</h1>Hi there, this is a simple web page. Granted, it
may not be as elegant as some other web pages ...
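The blank line is what separates the header block from the page content. A minimal sketch of how a client might split a raw response at that point follows; `split_response` is a hypothetical helper, and it ignores details such as repeated or folded headers.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status line, headers dict, body bytes)."""
    # The first empty line (CRLF CRLF) ends the header block.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body
```

Applied to the sample above, the `content-type` entry would be `text/html` and the body would start with `<html>`.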
MIME types
◼ Multipurpose Internet Mail Extensions
◼ Originated from mechanisms for
transmitting email
◼ Format: Content-Type: type/subtype
◼ Types include text, audio, video, still images,
compressed files (zipped, gzipped) …
◼ HTTP itself is not a fully MIME-compliant
application
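The type/subtype format is easy to take apart in code. The following sketch assumes a hypothetical helper `parse_content_type`; it also accepts optional `; name=value` parameters such as a character set, which the slide does not discuss, and it does not handle quoted parameter values.

```python
def parse_content_type(value: str):
    """Split a Content-Type value into (type, subtype, parameters)."""
    # Anything after a semicolon is an optional parameter, e.g. charset.
    media_type, *params = [part.strip() for part in value.split(";")]
    main_type, _, subtype = media_type.partition("/")
    options = dict(p.split("=", 1) for p in params if "=" in p)
    return main_type.lower(), subtype.lower(), options
```

A browser-like client could switch on the first element (`text`, `image`, `audio`, ...) to decide how to display the data.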
POST
◼ POST is mostly used in form-filling: the
contents of a form are translated by the
browser into some special format and sent to
a script on the server using the POST
command
◼ Here's a typical form submission, using POST:
POST /path/script.cgi HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Cosby&favorite+flavor=flies
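The "special format" is URL encoding: fields are joined with `&`, names and values with `=`, and spaces become `+`. The sketch below uses `urllib.parse.urlencode` from the Python standard library to build such a request; `build_form_post` is a name made up for this example, and only the headers needed to describe the body are included.

```python
from urllib.parse import urlencode


def build_form_post(path: str, fields: dict) -> bytes:
    """Build a minimal HTTP/1.0 POST request carrying URL-encoded form data."""
    # urlencode turns {"favorite flavor": "flies"} into "favorite+flavor=flies".
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separating headers from the form data
    ).encode("ascii")
    return head + body
```

Note that Content-Length is computed from the encoded body, which is why the sample submission above declares 32 bytes.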
HTTP Proxy
◼ An HTTP proxy is a program that acts as an
intermediary between a client and a server
◼ It receives requests from clients, and forwards those
requests to the intended servers
◼ Sometimes it serves the requests itself, e.g., when it has a
copy of the requested resource
◼ The responses pass back through it in the same way
◼ Thus, a proxy has functions of both a client and a server
◼ Requests sent to a proxy use the complete URL of the
resource being requested, instead of just the path:
◼ GET http://www.somehost.com/path/file.html HTTP/1.0
◼ That way, the proxy knows which server to forward
the request to
client ↔ proxy ↔ server
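The difference between a direct request and a proxied one is only the target in the request line. The sketch below shows both sides under simple assumptions: `request_line` (client side) and `proxy_destination` (proxy side) are hypothetical helpers, and port 80 is assumed when the URL names none.

```python
from urllib.parse import urlsplit


def request_line(url: str, via_proxy: bool, version: str = "HTTP/1.0") -> str:
    """Build the GET request line for a direct or a proxied request."""
    parts = urlsplit(url)
    if via_proxy:
        # A proxy needs the complete URL to know where to forward the request.
        target = url
    else:
        # An origin server only needs the path (plus any query string).
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"GET {target} {version}"


def proxy_destination(line: str):
    """Return the (host, port) a proxy should forward this request line to."""
    _method, url, _version = line.split(" ")
    parts = urlsplit(url)
    return parts.hostname, parts.port or 80
```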
Virtual Hosts
◼ With HTTP 1.1, one server at one IP address can be
multi-homed, i.e., home to several Web domains:
◼ “www.speech.cse.cuhk.hk” and “www.graphics.cse.cuhk.hk”
can live on the same server, i.e., as virtual hosts
◼ Without the mechanism, we have to use
“www.cse.cuhk.hk/speech/” & “www.cse.cuhk.hk/graphics/”
◼ Every HTTP/1.1 request must specify which host name
(and possibly port) the request is intended for:
GET /path/file.html HTTP/1.1
Host: www.host1.com:80
◼ Advantages
◼ Reduces hardware expenditures
◼ Extends the ability to support additional servers
◼ Makes load balancing and capacity planning much easier
◼ Saves IP addresses
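On the server side, virtual hosting boils down to looking at the Host header before deciding which site to serve. A minimal sketch, assuming a hypothetical `document_root` helper and made-up directory names:

```python
def document_root(request_text: str, sites: dict):
    """Pick the site directory for a request based on its Host header."""
    # Skip the request line, then scan headers until the blank line.
    for line in request_text.split("\r\n")[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port" suffix and ignore case.
            host = value.strip().lower().split(":")[0]
            return sites.get(host)
    return None  # no Host header: the server cannot tell the sites apart


# Hypothetical mapping from host name to the directory holding that site.
SITES = {
    "www.host1.com": "/var/www/host1",
    "www.host2.com": "/var/www/host2",
}
```

Both host names resolve to the same IP address; only the Host header tells the server which of the two sites the client wants.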
HTTP/1.1
◼ HTTP/1.1 is replacing/has replaced HTTP/1.0 as the
new Web protocol
◼ HTTP/1.1 has a number of features/improvements
over HTTP/1.0, including
◼ Persistent TCP connections
◼ Partial document transfers (resume large file transfer)
◼ Conditional fetch
◼ Support for nonstandard HTTP/1.0 extensions
◼ Better support for alternative character sets
◼ More flexible authentication
◼ Faster response and great bandwidth savings
◼ Efficient use of IP addresses (virtual hosting)
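Two of the features above, partial transfers and conditional fetch, are driven by request headers. The sketch below builds them; the helper names are invented for this example. A server that supports these features would normally answer a Range request with 206 Partial Content and an unchanged conditional fetch with 304 Not Modified.

```python
from email.utils import formatdate


def range_header(bytes_already_saved: int) -> dict:
    """Header asking the server to send the file from a given offset onward."""
    return {"Range": f"bytes={bytes_already_saved}-"}


def if_modified_since_header(cached_at: float) -> dict:
    """Header asking for the document only if it changed after `cached_at`.

    `cached_at` is a Unix timestamp; HTTP dates are always expressed in GMT.
    """
    return {"If-Modified-Since": formatdate(cached_at, usegmt=True)}
```

A download manager resuming a transfer after 1000 bytes would send `Range: bytes=1000-`; a cache revalidating its copy would send the date it stored the document.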
HTTP Experiment via Telnet
◼ Test HTTP via telnet command
(preferably on UNIX)
◼ Steps
◼ Log on to your UNIX account
◼ telnet www.yahoo.com 80
◼ GET /index.html
◼ The method can be used to test other
web protocols …
Sample HTTP Session via Telnet
$ telnet www.yahoo.com 80
Trying 216.32.74.50...
Connected to www.yahoo.akadns.net.
Escape character is '^]'.
GET /
HTTP/1.0 200 OK
Content-Length: 15582
Content-Type: text/html
<html><head><title>Yahoo!</title>
<base href=http://www.yahoo.com/>
...
Using Download Managers
◼ Try ReGet (http://www.reget.com/)
INF Log started (ReGet 1.9 Pro (build 519) )
STA Going to: [Waiting in queue]
INF Starting download [http://www.cse.cuhk.edu.hk/index.html]. Attempt N 1
INF Connecting to Internet
STA Going to: [Request]
INF Connecting to www.cse.cuhk.edu.hk (137.189.91.192:80)
OUT GET /index.html HTTP/1.0
OUT User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)
OUT Accept: */*
OUT Range: bytes=0-
OUT Referer: http://www.cse.cuhk.edu.hk/
OUT Host: www.cse.cuhk.edu.hk
IN HTTP/1.0 200 OK
IN Date: Sun, 08 Sep 2002 18:08:39 GMT
IN Content-Type: text/html
IN Server: Apache/1.3.26 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.6d
IN Via: 1.1 imsbbcache08 (NetCache NetApp/5.2.1R1D4)     ← Hidden proxy of IMS…
STA Going to: [Download]
INF By the 00:00:00 downloaded 2689 bytes at speed of 2689 b/s
INF Download succeeded
STA Going to: [Ok]
Using Download Managers (FTP)
INF Log started (ReGet 1.9 Pro (build 519) )
STA Going to: [Waiting in queue]
INF Starting download [ftp://ftp.us2.nero.com/Nero5599.exe]. Attempt N 1
STA Going to: [Request]
INF Connecting to ftp.us2.nero.com (64.124.173.25:21)
IN 220 Serv-U FTP Server v4.0 for WinSock ready...
OUT USER ******
IN 331 User name okay, please send complete E-mail address as password.
OUT PASS ******
IN 230 User logged in, proceed.
OUT SYST
IN 215 UNIX Type: L8
OUT PWD
IN 257 "/" is current directory.
OUT TYPE I
IN 200 Type set to I.
OUT REST 100
IN 350 Restarting at 100. Send STORE or RETRIEVE.
OUT REST 0
IN 350 Restarting at 0. Send STORE or RETRIEVE.
OUT CWD /Nero5599.exe
IN 550 /Nero5599.exe: No such file or directory.
OUT PASV
IN 227 Entering Passive Mode (64,124,173,25,16,24)
OUT LIST -la Nero5599.exe
IN 150 Opening ASCII mode data connection for /bin/ls.
IN -rw-rw-rw- 1 user group 12785670 Aug 30 15:47 nero5599.exe
IN 226 Transfer complete.
OUT SIZE Nero5599.exe
IN 213 12785670
OUT PASV
IN 227 Entering Passive Mode (64,124,173,25,16,27)
OUT RETR Nero5599.exe
IN 150 Opening BINARY mode data connection for Nero5599.exe (12785670 bytes).
STA Going to: [Download]
INF By the 00:07:18 downloaded 12785670 bytes at speed of 29124 b/s
IN 226 Transfer complete.
INF Download succeeded
STA Going to: [Ok]
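The `227 Entering Passive Mode` replies in the log tell the client where to open the data connection: the first four numbers are the IP address and the last two encode the port as high byte and low byte. A small sketch of that decoding, with `parse_pasv` as a hypothetical helper name:

```python
import re


def parse_pasv(reply: str):
    """Extract (ip address, port) from an FTP 227 passive-mode reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a passive-mode reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two bytes: port = high * 256 + low.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2
```

For the first reply in the log, `(64,124,173,25,16,24)`, this gives address 64.124.173.25 and port 16 * 256 + 24 = 4120, which is where the directory listing was transferred.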