Lecture No. 2 (HTTP)
Example
• Now assume we need to read a scientific document that
contains one reference to another text file and one reference
to a large image. Figure below shows the situation.
• The main document and the image are stored in two
separate files in the same site (file A and file B); the
referenced text file is stored in another site (file C).
• Since we are dealing with three different files, we need
three transactions if we want to see the whole document.
• The first transaction (request/response) retrieves a copy of
the main document (file A), which has a reference (link) to
the second and the third files.
Note that although files A and B are both stored in site 1, they
are independent files with different names and addresses. Two
transactions are needed to read them.
Hypertext and Hypermedia
The controller receives input from the keyboard
or the mouse and uses the client programs to
access the document.
After the document has been accessed, the
controller uses one of the interpreters to display
the document on the screen.
The interpreter can be HTML, Java, or
JavaScript, depending on the type of document.
The client protocol can be one of the protocols
such as FTP, or TELNET, or HTTP.
Some commercial browsers include
Internet Explorer, Netscape Navigator, and Firefox.
Web Server
The Web page is stored at the server.
Each time a client request arrives, the corresponding
document is sent to the client.
To improve efficiency, servers normally store requested
files in a cache in memory; memory is faster to access than
disk.
If the server uses multithreading or multiprocessing,
it can answer more than one request at a time, which
makes it more efficient.
Some popular Web servers include Apache, Microsoft
Internet Information Server, and Sun Java System Web
Server.
Uniform Resource Locator (URL)
A client that wants to access a Web page needs the file
name and the address.
To facilitate the access of documents distributed throughout
the world, HTTP uses locators.
The uniform resource locator (URL) is a standard locator
for specifying any kind of information on the Internet.
The URL defines four things: protocol, host computer,
port, and path (see Figure below).
1- The protocol is the client-server application program used to
retrieve the document. Many different protocols can retrieve a
document; among them are Gopher, FTP, HTTP, News, and
TELNET. The most common today is HTTP.
2- The host is the domain name of the computer on which the
information is stored. Host names of Web servers often begin with
the characters “www”.
This is not mandatory, however, as the host can have any domain
name.
3- The URL can optionally contain the port number of the server.
If the port is included, it is inserted between the host and the path,
and it is separated from the host by a colon.
4- Path is the pathname of the file where the information is located.
In other words, the path defines the complete file name where the
document is stored in the directory system.
Example:
If you write the following sentence in the Google search
field:
What is URL?
The following URL will be generated:
https://www.google.com/search?q=what+is+URL?
https: This is the protocol used to transfer the data, which ensures a
secure connection.
www.google.com: This is the domain name, which identifies the
specific server hosting the resource.
/search: This is the path, specifying the location of the specific
resource within the server (in this case, the Google search function).
q=what+is+URL?: This is the query string, which follows the first “?”
and carries additional information about the request (here, the
search terms).
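The breakdown above can be reproduced with a short sketch that uses Python's standard `urllib.parse` module. This is only an illustration of the URL components described in this lecture, not of how a browser processes URLs internally; the variable names are chosen for the example.

```python
from urllib.parse import urlsplit, parse_qs

# The URL from the example above.
url = "https://www.google.com/search?q=what+is+URL?"

parts = urlsplit(url)

print(parts.scheme)    # the protocol
print(parts.hostname)  # the host (domain name)
print(parts.port)      # the port; None when the URL does not state one
print(parts.path)      # the path
print(parts.query)     # the query string (everything after the first "?")

# parse_qs decodes the query string into a dictionary of lists;
# "+" is decoded as a space.
params = parse_qs(parts.query)
print(params["q"])
```

Because this URL contains no explicit port, `parts.port` is `None`; a client would then fall back to the default port for the protocol.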
NOTES
• HTML stands for HyperText Markup Language.
• Hypertext is text displayed on a computer screen or other
electronic device that contains links to other information that the
user can easily access. Hypertext is:
• Non-linear: It is organized in a way that connects related
information to each other.
• Interactive: It allows the user to control how they explore the
information by clicking on links.
• Multimedia: It can include text, images, sound, and video.
• Markup means annotating the text with tags that define its
structure and format.
• HTML is the standard markup language for Web pages.
HTML elements are the building blocks of HTML pages.
• Hypertext and hypermedia documents are linked to one another
through pointers.
WEB Documents:
The documents in the WWW can be grouped into three
broad categories:
static,
dynamic, and
active.
The category is based on the time at which the contents of the
document are determined.
1- Static documents
Static documents are fixed-content documents that are
created and stored in a server.
The client can get a copy of the document only.
• In other words, the contents of the file are determined
when the file is created, not when it is used.
• Of course, the contents in the server can be changed, but
the user cannot change them.
• When a client accesses the document, a copy of the
document is sent.
• The user can then use a browsing program to display the
document.
• Static documents are prepared using one of several
languages:
• Hypertext Markup Language (HTML),
• Extensible Markup Language (XML),
• Extensible Stylesheet Language (XSL), and
• Extensible Hypertext Markup Language (XHTML).
2- Dynamic documents
A dynamic document is created by a Web server whenever
a browser requests the document.
When a request arrives, the Web server runs an
application program that creates the dynamic document.
The server returns the output of the program as a response
to the browser that requested the document.
Because a fresh document is created for each request, the
contents of a dynamic document may vary from one
request to another.
A very simple example of a dynamic document is the
retrieval of the time and date from a server.
Time and date are kinds of information that are dynamic in
that they change from moment to moment.
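The time-and-date example can be sketched as a small Python function that a server-side program might call for each request. The function name `build_time_page` and the HTML layout are assumptions made for this illustration; a real server would run such a program through its own application interface.

```python
from datetime import datetime

def build_time_page(now=None):
    """Return an HTML page whose content is generated at request time."""
    # 'now' can be passed in for testing; otherwise the current time is used.
    if now is None:
        now = datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return f"<html><body><p>Server time: {stamp}</p></body></html>"

# Each call produces a fresh document, so two requests made at
# different moments receive different contents.
page = build_time_page()
print(page)
```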
3- Active documents
For many applications, we need a program or a
script to be run at the client site. These are called
active documents.
For example, suppose we want to run a program
that creates animated graphics on the screen or a
program that interacts with the user.
The program needs to be run at the client site
where the animation or interaction takes place.
When a browser requests an active document, the
server sends a copy of the document or a script. The
document is then run at the client (browser) site.
Hypertext Transfer Protocol (HTTP)
The Hypertext Transfer Protocol (HTTP) is the main
protocol used to access data on the World Wide Web
(WWW).
HTTP (up to HTTP/2) uses a TCP connection to transfer files.
HTTP transactions are made of request and response
messages.
HTTP resembles FTP in that it transfers files over TCP, but
it is much simpler than FTP because it uses only one TCP
connection.
There is no separate control connection; only data are
transferred between the client and the server.
HTTP/3, standardized in 2022 as RFC 9114, runs over QUIC
(on top of UDP) instead of TCP, which reduces connection
setup time and head-of-line blocking compared with
previous versions.
HTTPS usage is growing for secure data transmission.
• HTTP is an asymmetric request-response client-server
protocol, as illustrated in Figure below.
• An HTTP client sends a request message to an HTTP
server. The server, in turn, returns a response message.
• In other words, HTTP is a pull protocol: the
client pulls information from the server (instead of the
server pushing information down to the client).
• HTTP is implemented in two programs, a client
program and a server program, which talk to each
other by exchanging HTTP messages (request and
response).
• HTTP defines the structure of these
messages and how the client and server exchange
them.
• The general idea of HTTP is:
1. The user requests a Web page (for example, by clicking on a hyperlink).
2. The browser sends HTTP request messages for the
objects in the page to the server.
3. The server receives the requests and responds with
HTTP response messages that contain the objects.
• HTTP uses TCP as its underlying transport protocol.
• HTTP uses the services of TCP on well-known port 80.
• The HTTP client first initiates a TCP connection with
the server.
• Once the connection is established, the client and the
server processes access TCP through their socket
interfaces.
• It is important to note that the server sends requested
files to clients without storing any state information
about the client.
• If a particular client asks for the same object twice in a
period of a few seconds, the server resends the object, as
it has completely forgotten what it did earlier.
• Because an HTTP server maintains no information
about the clients, HTTP is said to be a stateless protocol.
The commands from the client to the server are embedded
in a request message.
The contents of the requested file or other information are
embedded in a response message.
Notes
• A Web page (also called a document) consists of
objects.
• An object is simply a file such as an HTML file, a
JPEG image, a Java applet, or a video clip that is
addressable by a single URL.
• Web browsers (such as Internet Explorer and Firefox)
implement the client side of HTTP.
• Web servers implement the server side of HTTP and
house Web objects, each addressable by a URL.
Figure below illustrates the HTTP transaction between
the client and server.
The client initiates the transaction by sending a
request. The server replies by sending a response.
1- Request Message
The format of the request is shown in Figure below. A
request message consists of a request line, header lines,
a blank line, and sometimes a body.
Request Line
The first line in a request message is called
the request line.
There are three fields in this line, separated by a space
character, as shown in Figure above.
The fields are called method, URL, and version.
At the end, two characters, a carriage return followed by a
line feed, terminate the line.
The method field defines the request type. In version 1.1
of HTTP, several methods are defined, as shown in Table
below.
PATCH Applies partial modifications to an existing resource.
The second field, URL, defines the address and name
of the corresponding Web page.
The third field, version, gives the version of the
protocol; in the text format described here it is typically
HTTP/1.1, although newer versions (HTTP/2 and HTTP/3) exist.
Header Lines in Request Message
After the request line, we can have zero or more request
header lines.
Each header line sends additional information from the
client to the server.
For example, the client can request that the document be
sent in a special format.
Each header line has a header name, a colon, a space,
and a header value (see Figure above).
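The request format described above can be sketched in a few lines of Python. The function `build_request`, the host `www.example.com`, and the path `/index.html` are hypothetical names chosen for this illustration. The blank line that separates the header lines from the body is part of the HTTP/1.1 message format, even though it is not mentioned in the request-line description itself.

```python
CRLF = "\r\n"  # carriage return followed by line feed

def build_request(method, url, version="HTTP/1.1", headers=None, body=""):
    """Assemble a text HTTP request: request line, header lines, blank line, body."""
    # The three request-line fields are separated by single spaces.
    request_line = f"{method} {url} {version}"
    # Each header line is: header name, colon, space, header value.
    header_lines = [f"{name}: {value}" for name, value in (headers or [])]
    # The empty string produces the blank line that ends the header section.
    return CRLF.join([request_line, *header_lines, "", body])

message = build_request(
    "GET",
    "/index.html",
    headers=[("Host", "www.example.com"), ("Accept", "text/html")],
)
print(message)
```

Headers are passed as a list of pairs rather than a dictionary so that the same header name can appear on more than one line.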
Table below shows some header names commonly used in
a request.
The value field defines the values associated with each
header name. The list of values can be found in the
corresponding RFCs.
Body In Request Message
The body can be present in a request message.
Usually, it contains data to be sent to the server, such as a comment
submitted by the user.
2- Response Message
The format of the response message is shown in
Figure below. A response message consists of a
status line,
header lines,
a blank line and
sometimes a body.
Status Line
• The first line in a response message is called the status line.
• There are three fields in this line separated by spaces and
terminated by a carriage return and line feed.
• The first field defines the version of the HTTP protocol; in the
text format described here it is typically HTTP/1.1.
• The status code field defines the status of the request. It
consists of three digits:
The codes in the 100 range are only informational.
The codes in the 200 range indicate a successful request.
The codes in the 300 range redirect the client to another URL.
The codes in the 400 range indicate an error at the client
site.
Finally, the codes in the 500 range indicate an error at the
server site.
• The status phrase explains the status code in text form.
The possible values for the status code and status phrase
are shown in Table below.
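The five status-code ranges listed above can be expressed as a small lookup. This is a minimal sketch; the function name `status_class` and the short labels are chosen for the illustration and are not taken from the table.

```python
def status_class(code):
    """Map a three-digit HTTP status code to its general class."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    classes = {
        1: "informational",  # 100 range
        2: "success",        # 200 range
        3: "redirection",    # 300 range
        4: "client error",   # 400 range
        5: "server error",   # 500 range
    }
    # The first digit of the code selects the class.
    return classes[code // 100]

for code in (100, 200, 301, 404, 500):
    print(code, status_class(code))
```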
Header Lines In Response Message
After the status line, we can have zero or more header
lines. Each header line sends additional information from
the server to the client.
Each header line has a header name, a colon, a space, and
a header value.
Table below shows some header names commonly used in
a response message.
The value field defines the values associated with each
header name. The list of values can be found in the
corresponding RFCs.
Body
The body contains the document to be sent from the server
to the client.
The body is present unless the response is an error message.
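To make the response format concrete, the following sketch splits a text response into its status line, header lines, and body. The sample response and the function name `parse_response` are assumptions for illustration; a real client would use a library that also handles encodings, repeated headers, and malformed input.

```python
def parse_response(raw):
    """Split a text HTTP response into version, code, phrase, headers, and body."""
    # The blank line (two consecutive CRLFs) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line: version, status code, status phrase (the phrase may contain spaces).
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), phrase, headers, body

# A hypothetical response used only for this example.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "<p>Hello</p>"
)

version, code, phrase, headers, body = parse_response(raw)
print(version, code, phrase)
print(headers)
print(body)
```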
Example
This example retrieves a document:
We use the GET method to retrieve an image with
the path /usr/bin/image1.
• The request message contains:
1. The request line shows the method (GET), the URL, and the
HTTP version (1.1).
2. The header has two lines that show that the client can accept
images in the GIF or JPEG format.
3. The request does not have a body.
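The three observations above can be checked with a short parsing sketch. The figure is not reproduced here, so the exact header lines (`Accept: image/gif` and `Accept: image/jpeg`) are an assumption based on the description; the function name `parse_request` is likewise chosen for the illustration.

```python
def parse_request(raw):
    """Split a text HTTP request into method, URL, version, header pairs, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    # The three request-line fields are separated by single spaces.
    method, url, version = request_line.split(" ")
    # Keep headers as (name, value) pairs because a name may be repeated.
    headers = [
        tuple(part.strip() for part in line.split(":", 1))
        for line in header_lines
    ]
    return method, url, version, headers, body

# Assumed reconstruction of the example request message.
raw = (
    "GET /usr/bin/image1 HTTP/1.1\r\n"
    "Accept: image/gif\r\n"
    "Accept: image/jpeg\r\n"
    "\r\n"
)

method, url, version, headers, body = parse_request(raw)
print(method, url, version)
print(headers)
print(repr(body))  # an empty string: the request has no body
```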
Example
Figure below shows an example of a nonpersistent
connection.
Example
Figure below shows the same scenario as Example above,
but using a persistent connection.
HTTP cookie
• The World Wide Web was originally designed as a
stateless entity.
• HTTP can use cookies to keep the state of the transactions.
• The server sends a cookie that can be stored in the client
and be retrieved later by the server.
• An HTTP cookie (web cookie, browser cookie) is a small
piece of data that a server sends to the user's web browser.
• The browser may store it and send it back with later
requests to the same server.
• Typically, it is used to tell whether two requests came from the
same browser, for example to keep a user logged in.
• In this way, cookies carry state information over the
stateless HTTP protocol.
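The cookie exchange can be sketched with Python's standard `http.cookies` module. The cookie name `session_id` and the value `abc123` are hypothetical, and the sketch only builds and parses the header text; no network traffic is involved.

```python
from http.cookies import SimpleCookie

# Server side: create a cookie and render it as a response header line.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Client side: the browser stores the name/value pair and returns it
# in a Cookie request header on a later request to the same server.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in server_cookie.items()
)
print(cookie_header)

# Server side again: parse the returned header value to recognize the client.
received = SimpleCookie()
received.load(cookie_header.split(": ", 1)[1])
print(received["session_id"].value)
```

The server itself still keeps no connection state between requests; the state travels with each request in the `Cookie` header.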
Web Caching: Proxy Server
HTTP supports proxy servers.
A proxy server is a computer that keeps copies of
responses to recent requests.
The HTTP client sends a request to the proxy
server. The proxy server checks its cache. If the
response is not stored in the cache, the proxy server
sends the request to the corresponding server.
Incoming responses are sent to the proxy server and
stored for future requests from other clients.
The proxy server reduces the load on the original
server, decreases traffic, and reduces latency.
However, to use the proxy server, the client must
be configured to access the proxy instead of the
target server.
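The cache check performed by a proxy server can be sketched as follows. The class `CachingProxy`, the function `fake_origin`, and the URL are hypothetical stand-ins for this illustration; a real proxy would also take into account how long a stored response remains valid.

```python
class CachingProxy:
    """Keep copies of responses and contact the origin server only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # function that contacts the origin server
        self._cache = {}                 # URL -> stored response

    def get(self, url):
        # Cache hit: answer from the stored copy.
        if url in self._cache:
            return self._cache[url]
        # Cache miss: forward the request, then store the response
        # for future requests from other clients.
        response = self._fetch(url)
        self._cache[url] = response
        return response

origin_hits = []

def fake_origin(url):
    """Stand-in for the origin server; records every request it receives."""
    origin_hits.append(url)
    return f"contents of {url}"

proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.example.com/a.html")
second = proxy.get("http://www.example.com/a.html")
print(len(origin_hits))  # the origin server was contacted only for the first request
```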