Lecture No. 2 (HTTP)

The document provides an overview of the World Wide Web (WWW) and the Hypertext Transfer Protocol (HTTP), detailing the architecture of the web, types of web documents, and the request-response model of HTTP. It explains the roles of web clients (browsers) and servers, the structure of URLs, and the differences between static, dynamic, and active documents. Additionally, it covers the format of HTTP request and response messages, including methods, status codes, and headers.

World Wide Web and HTTP

• The World Wide Web (WWW) is a storehouse of
information linked together from points all over the world.
• WWW is characterized by flexibility, portability, and
ease of use that distinguishes it from other services
provided by the Internet.
• The WWW project was initiated by CERN (European
Council for Nuclear Research, currently known as
European Laboratory for Particle Physics) to create a
system to handle distributed resources necessary for
scientific research.
• In this lecture we first discuss issues related to the Web. We then
discuss a protocol, HTTP, that is used to retrieve information
from the Web.
ARCHITECTURE
• The WWW today is a distributed client-server service, in
which a client using a browser can access a service using
a server.
• However, the service provided is distributed over many
locations called sites.
• Each site holds one or more documents, referred to as
Web pages.
• Each Web page, however, can contain some links to other
Web pages in the same or other sites.
• In other words, a Web page can be simple or composite.
• A simple Web page has no link to other Web pages;
a composite Web page has one or more links to other
Web pages.
• Each Web page is a file with a name and address.
Front-end focuses on design and user experience, while back-end
focuses on data and functionality.
Example
Assume we need to read a Web page that contains the
biography of a famous character with some pictures, which
are embedded in the page itself. Since the pictures are not
stored as separate files, the whole document is a simple
Web page. It can be retrieved using a single request/response
transaction, as shown in the figure below.
Example
• Now assume we need to read a scientific document that
contains one reference to another text file and one reference
to a large image. Figure below shows the situation.
• The main document and the image are stored in two
separate files in the same site (file A and file B); the
referenced text file is stored in another site (file C).
• Since we are dealing with three different files, we need
three transactions if we want to see the whole document.
• The first transaction (request/response) retrieves a copy of
the main document (file A), which has a reference (link) to
the second and the third files.
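• The second and third transactions retrieve copies of the image
(file B) and the referenced text file (file C).
Note that although files A and B are both stored in site 1, they
are independent files with different names and addresses. Two
transactions are needed to read them.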
Hypertext and Hypermedia

• Hypertext means creating documents that refer to
other documents.
• In a hypertext document, a part of the text can be
defined as a link to another document.
• When a hypertext document is viewed with a browser, the
link can be clicked to retrieve the other
document.
• Hypermedia is a term applied to documents that
contain links to other textual documents or to
documents containing graphics, video, or audio.
Web Client (Browser)
• A variety of vendors offer commercial browsers that
interpret and display a Web document, and all of
them use nearly the same architecture.
• Each browser usually consists of three parts: a
controller, client protocol, and interpreters. (see
Figure below)

• The controller receives input from the keyboard
or the mouse and uses the client programs to
access the document.
• After the document has been accessed, the
controller uses one of the interpreters to display
the document on the screen.
• The interpreter can be HTML, Java, or
JavaScript, depending on the type of document.
• The client protocol can be one of the protocols
such as FTP, TELNET, or HTTP.
• Some commercial browsers include:
Internet Explorer, Netscape Navigator, and Firefox.
Web Server
The Web page is stored at the server.
Each time a client request arrives, the corresponding
document is sent to the client.
To improve efficiency, servers normally store requested
files in a cache in memory; memory is faster to access than
disk.
If the server uses multithreading or multiprocessing,
it can answer more than one request at a time, which is
more efficient.
Some popular Web servers include Apache, Microsoft
Internet Information Server, and Sun Java System Web
Server.
Uniform Resource Locator (URL)
A client that wants to access a Web page needs the file
name and the address.
To facilitate the access of documents distributed throughout
the world, HTTP uses locators.
The uniform resource locator (URL) is a standard locator
for specifying any kind of information on the Internet.
The URL defines four things: protocol, host computer,
port, and path (see Figure below).

1- The protocol is the client-server application program used to
retrieve the document. Many different protocols can retrieve a
document; among them are Gopher, FTP, HTTP, News, and
TELNET. The most common today is HTTP.
2- The host is the domain name of the computer on which the
information is stored. It often begins with the characters
“www”.
This is not mandatory, however, as the host can have any domain
name.
3- The URL can optionally contain the port number of the server.
If the port is included, it is inserted between the host and the path,
and it is separated from the host by a colon.
4- The path is the pathname of the file where the information is located.
In other words, the path defines the complete file name where the
document is stored in the directory system.
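As a rough illustration of the four parts, the sketch below splits a URL with Python's standard urllib.parse module. The host www.example.com, the port 8080, and the path are made-up values chosen for illustration; they do not come from the lecture.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to show the four parts.
url = "http://www.example.com:8080/docs/lecture2.html"

parts = urlsplit(url)
protocol = parts.scheme    # 1- protocol
host = parts.hostname      # 2- host (domain name)
port = parts.port          # 3- optional port; None when the URL omits it
path = parts.path          # 4- path of the file on the server

print(protocol, host, port, path)
```

When the port is left out, the client falls back to the well-known port of the protocol.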
Example:
If you write the following sentence in the Google search
field:
What is URL?
The following URL will be generated:
https://www.google.com/search?q=what+is+URL?

https: This is the protocol used to transfer the data, which ensures a
secure connection.
www.google.com: This is the domain name, which identifies the
specific server hosting the resource.
/search: This is the path, specifying the location of the specific
resource within the server (in this case, the Google search function).
q=what+is+URL?: This is the query string, containing additional
information about the request.
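The query string can be decoded and rebuilt programmatically. The sketch below is only an illustration using Python's urllib.parse on the Google URL from the example; note that a real browser would normally percent-encode the trailing question mark, which is what urlencode does here.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "https://www.google.com/search?q=what+is+URL?"
parts = urlsplit(url)

# Decode the query string: "+" stands for a space.
query = parse_qs(parts.query)

# Encode user input into a query string: spaces become "+",
# and reserved characters such as "?" are percent-encoded.
encoded = urlencode({"q": "what is URL?"})

print(parts.scheme, parts.netloc, parts.path)
print(query)
print(encoded)
```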
NOTES
• HTML stands for HyperText Markup Language.
• Hypertext is text displayed on a computer screen or other
electronic device that contains links to other information that the
user can easily access. Hypertext is:
• Non-linear: It is organized in a way that connects related
information to each other.
• Interactive: It allows the user to control how they explore the
information by clicking on links.
• Multimedia: It can include text, images, sound, and video.
• Markup means annotating text to give it a specific structure and format.
• HTML is the standard markup language for Web pages.
HTML elements are the building blocks of HTML pages.
• Hypertext and hypermedia documents are linked to one another
through pointers.
WEB Documents:
The documents in the WWW can be grouped into three
broad categories:
• static,
• dynamic, and
• active.
The category is based on the time at which the contents of the
document are determined.
1- Static documents
Static documents are fixed-content documents that are
created and stored in a server.
The client can get a copy of the document only.
• In other words, the contents of the file are determined
when the file is created, not when it is used.
• Of course, the contents in the server can be changed, but
the user cannot change them.
• When a client accesses the document, a copy of the
document is sent.
• The user can then use a browsing program to display the
document.
• Static documents are prepared using one of several
languages:
• Hypertext Markup Language (HTML),
• Extensible Markup Language (XML),
• Extensible Stylesheet Language (XSL), and
• Extensible Hypertext Markup Language (XHTML).
2- Dynamic document
A dynamic document is created by a Web server whenever
a browser requests the document.
When a request arrives, the Web server runs an
application program that creates the dynamic document.
The server returns the output of the program as a response
to the browser that requested the document.
Because a fresh document is created for each request, the
contents of a dynamic document may vary from one
request to another.
A very simple example of a dynamic document is the
retrieval of the time and date from a server.
Time and date are kinds of information that are dynamic in
that they change from moment to moment.
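The time-and-date case can be sketched in a few lines. The function below stands in for the application program that the server runs on each request; the name render_time_page and the HTML layout are assumptions made for illustration.

```python
from datetime import datetime, timezone

def render_time_page(now=None):
    """Build a fresh HTML document containing the server's date and time."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"<html><body><p>Server time: {stamp}</p></body></html>"

# Each call plays the role of one request: the content is produced on demand,
# so two requests made at different moments receive different documents.
print(render_time_page())
```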
3- Active Documents
For many applications, we need a program or a
script to be run at the client site. These are called
active documents.
For example, suppose we want to run a program
that creates animated graphics on the screen or a
program that interacts with the user.
The program needs to be run at the client site
where the animation or interaction takes place.
When a browser requests an active document, the
server sends a copy of the document or a script. The
document is then run at the client (browser) site.
Hypertext Transfer Protocol (HTTP)
• The Hypertext Transfer Protocol (HTTP) is the main
protocol used to access data on the World Wide Web
(WWW).
• HTTP uses a TCP connection to transfer files. HTTP
transactions are made of request and response
messages.
• HTTP resembles FTP in that it transfers files over TCP, but
it is much simpler than FTP because it uses
only one TCP connection.
• There is no separate control connection; only data are
transferred between the client and the server.
• HTTP/3 (RFC 9114), published in 2022, runs over QUIC rather
than TCP and reduces connection setup delay compared with
previous versions.
• HTTPS usage is growing for secure data transmission.
• HTTP is an asymmetric request-response client-
server protocol as illustrated in Figure below.
• An HTTP client sends a request message to an HTTP
server. The server, in turn, returns a response message.
• In other words, HTTP is a pull protocol: the
client pulls information from the server (instead of
the server pushing information down to the client).
• HTTP is implemented in two programs, a
client program and a server program, which talk to each
other by exchanging HTTP messages (request and
response).
• HTTP defines the structure of these
messages and how the client and server exchange the
messages.
• The general idea of HTTP is as follows:
1. The user requests a Web page (clicks on a hyperlink).
2. The browser sends HTTP request messages for the
objects in the page to the server.
3. The server receives the requests and responds with
HTTP response messages that contain the objects.
• HTTP uses TCP as its underlying transport protocol.
• HTTP uses the services of TCP on well-known port 80.
• The HTTP client first initiates a TCP connection with
the server.
• Once the connection is established, the client and the
server processes access TCP through their socket
interfaces.
• It is important to note that the server sends requested
files to clients without storing any state information
about the client.
• If a particular client asks for the same object twice in a
period of a few seconds, the server resends the object, as
it has completely forgotten what it did earlier.
• Because an HTTP server maintains no information
about the clients, HTTP is said to be a stateless protocol.
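The points above can be tried out on a single machine. The following is a minimal sketch, not a production setup: it starts Python's built-in HTTP server on the loopback address, then uses a plain TCP socket as the client. The handler name HelloHandler and the helper fetch are made up for this example, and the server listens on a port picked by the operating system rather than port 80, which usually requires administrator rights.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

def fetch(port, path="/"):
    """Open a TCP connection, send one request, and read the whole response."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while chunk := sock.recv(4096):   # empty bytes means the server closed
            chunks.append(chunk)
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

first = fetch(port)
second = fetch(port)   # same request again: the server simply resends the object
server.shutdown()
server.server_close()

print(first.split(b"\r\n")[0].decode("ascii"))
```

The server answers the second request exactly as it answered the first, because it kept no record of the client between the two connections.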
The commands from the client to the server are embedded
in a request message.
The contents of the requested file or other information are
embedded in a response message.

Notes
• A Web page (also called a document) consists of
objects.
• An object is simply a file such as an HTML file, a
JPEG image, a Java applet, or a video clip that is
addressable by a single URL.
• Web browsers (such as Internet Explorer and Firefox)
implement the client side of HTTP.
• Web servers implement the server side of HTTP and
house Web objects, each addressable by a URL.
• The figure below illustrates the HTTP transaction between
the client and server.
• The client initiates the transaction by sending a
request. The server replies by sending a response.
1-Request Message
The format of the request is shown in Figure below. A
request message consists of a request line, a header, and
sometimes a body.

Request Line
The first line in a request message is called
a request line.
There are three fields in this line, called method, URL, and version,
as shown in the figure above.
The three fields are separated by a space
character.
At the end of the line, two characters, a carriage return followed by a
line feed, terminate the line.
• The method field defines the request type. In version 1.1
of HTTP, several methods are defined, as shown in Table
below.
Method Types in a Request Message
PATCH: Applies partial modifications to an existing resource.

• The second field, URL, defines the address and name
of the corresponding Web page.
• The third field, Version, gives the version of the
protocol, such as HTTP/1.1; the newer versions are HTTP/2 and HTTP/3.
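To make the layout of the request line concrete, here is a small sketch that builds and parses one. The helper names build_request_line and parse_request_line are assumptions for illustration; the path /usr/bin/image1 is borrowed from the GET example later in the lecture.

```python
CRLF = "\r\n"   # carriage return followed by line feed

def build_request_line(method, url, version="HTTP/1.1"):
    """Join the three fields with single spaces and terminate with CRLF."""
    return f"{method} {url} {version}{CRLF}"

def parse_request_line(line):
    """Split a request line back into (method, url, version)."""
    method, url, version = line.rstrip(CRLF).split(" ")
    return method, url, version

line = build_request_line("GET", "/usr/bin/image1")
print(repr(line))
print(parse_request_line(line))
```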
Header Lines in Request Message
After the request line, we can have zero or more request
header lines.
Each header line sends additional information from the
client to the server.
For example, the client can request that the document be
sent in a special format.
Each header line has a header name, a colon, a space,
and a header value (see Figure above).
Table below shows some header names commonly used in
a request.
The value field defines the values associated with each
header name. The list of values can be found in the
corresponding RFCs.

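The name, colon, space, value layout of a header line is easy to parse. The sketch below is illustrative only: the function name parse_header_lines and the sample header values are assumptions, and header names are lower-cased because HTTP treats them as case-insensitive.

```python
def parse_header_lines(lines):
    """Turn 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in lines:
        # Split at the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

# Hypothetical header lines for illustration.
sample = ["Accept: image/gif", "Host: www.example.com:8080", "User-Agent: demo-client/0.1"]
headers = parse_header_lines(sample)
print(headers)
```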
Body in Request Message
The body can be present in a request message.
Usually, it contains the data to be sent to the server, such as form
input or a file to be stored.

2- Response Message
The format of the response message is shown in
Figure below. A response message consists of a
status line, header lines, a blank line, and
sometimes a body.
Status Line
• The first line in a response message is called the status line.
• There are three fields in this line separated by spaces and
terminated by a carriage return and line feed.
• The first field defines the version of the HTTP protocol, for example HTTP/1.1.
• The status code field defines the status of the request. It
consists of three digits:
  • The codes in the 100 range are only informational.
  • The codes in the 200 range indicate a successful request.
  • The codes in the 300 range redirect the client to another URL.
  • The codes in the 400 range indicate an error at the client
site.
  • Finally, the codes in the 500 range indicate an error at the
server site.
• The status phrase explains the status code in text form.
The possible values for the status code and status phrase
are shown in Table below.

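Since the class of a status code is given by its first digit, a client can classify a response without knowing every individual code. A minimal sketch follows; the function name status_class and the label strings are chosen for this example.

```python
def status_class(code):
    """Map a three-digit status code to its range, using the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]

for code in (100, 200, 304, 404, 500):
    print(code, status_class(code))
```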
Header Lines In Response Message
After the status line, we can have zero or more header
lines. Each header line sends additional information from
the server to the client.
Each header line has a header name, a colon, a space, and
a header value.
Table below shows some header names commonly used in
a response message.

The value field defines the values associated with each
header name. The list of values can be found in the
corresponding RFCs.

Body
The body contains the document to be sent from the server
to the client.
The body is usually present; it is absent in responses that carry no
content, such as 304 Not Modified.

Example
This example retrieves a document:
We use the GET method to retrieve an image with
the path /usr/bin/image1.
• The request message contains:
1. The request line, which shows the method (GET), the URL, and the
HTTP version (1.1).
2. A header with two lines that show that the client can accept
images in the GIF or JPEG format.
3. No body.

• The response message contains:
1. The status line and four lines of header.
2. The header lines define the date, server, MIME
version, and length of the document.
3. The body of the document follows the header.
Note: Multipurpose Internet Mail Extensions (MIME) is a standard
that allows non-ASCII data to be sent through e-mail.
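The two messages described in this example can be assembled as text to see how the pieces fit together. This is a sketch under stated assumptions: the helper build_message is made up, and the date, server name, and image bytes in the response are placeholder values, not the ones in the lecture's figure.

```python
CRLF = "\r\n"

def build_message(start_line, headers, body=b""):
    """Assemble start line, header lines, a blank line, and an optional body."""
    lines = [start_line] + [f"{name}: {value}" for name, value in headers]
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body

# Request: request line, two Accept header lines, no body.
request = build_message(
    "GET /usr/bin/image1 HTTP/1.1",
    [("Accept", "image/gif"), ("Accept", "image/jpeg")],
)

# Response: status line, four header lines, then the document as the body.
image_bytes = b"\x00" * 16   # placeholder standing in for the image data
response = build_message(
    "HTTP/1.1 200 OK",
    [
        ("Date", "Mon, 01 Jan 2024 00:00:00 GMT"),   # hypothetical date
        ("Server", "ExampleServer"),                  # hypothetical server name
        ("MIME-version", "1.0"),
        ("Content-length", str(len(image_bytes))),
    ],
    image_bytes,
)

print(request.decode("ascii"))
```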
Example
• In this example, the client wants to send data to
the server.
• In the request message we use the POST method.
• The request line shows the method (POST), URL,
and HTTP version (1.1).
• There are four lines of headers.
• The request body contains the input information.
• The response message contains the status line and
four lines of headers.
• The created document is included as the body (see
Figure below).
• Note: "*/*" indicates that all media types are accepted.
Conditional Request
• A client can add a condition in its request.
• In this case, the server will send the requested
Web page if the condition is met or inform the
client otherwise.
• One of the most common conditions imposed
by the client is the time and date the Web page was
modified.
• The client can send the header line If-Modified-Since
in the request to tell the server that it needs
the page only if it was modified after a certain point in
time.
Example
The following shows how a client imposes the modification
date and time condition on a request.
Request line and header line in the request message:

GET http://www.commonServer.com/information/file1 HTTP/1.1
If-Modified-Since: Thu, 04 Sep 2008 00:00:00 GMT

The status line in the response shows that the file was not
modified after the defined point in time. The body of the
response message is also empty.
Status line and header lines in the response message:

HTTP/1.1 304 Not Modified
Date: Sat, 06 Sep 2008 16:22:46 GMT
Server: commonServer.com
(Empty Body)
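The decision the server makes for a conditional request can be sketched as a single comparison. The function name conditional_status and the two modification times are assumptions for illustration, and the year 2008 in the header value is assumed so that the date parses as a standard HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_status(last_modified, if_modified_since):
    """Return 200 if the page changed after the client's date, else 304."""
    threshold = parsedate_to_datetime(if_modified_since)
    return 200 if last_modified > threshold else 304

condition = "Thu, 04 Sep 2008 00:00:00 GMT"   # If-Modified-Since value (year assumed)

# Hypothetical modification times of the file on the server.
unchanged = datetime(2008, 9, 1, 12, 0, tzinfo=timezone.utc)
changed = datetime(2008, 9, 5, 12, 0, tzinfo=timezone.utc)

print(conditional_status(unchanged, condition))
print(conditional_status(changed, condition))
```

With 304 the server sends only the status line and headers, so the client reuses its stored copy.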
HTTP can be used in two modes:
persistent and nonpersistent.
• The nonpersistent mode uses a new TCP connection
for each transaction (request/response).
• The persistent mode uses one TCP connection for several
transactions.

HTTP, prior to version 1.1, specified a nonpersistent
connection, while a persistent connection is the default in
version 1.1.
1-Nonpersistent Connection
In a nonpersistent connection, a new TCP
connection is made for each request/response
(transaction).

The following lists the steps in this strategy:
1. The client opens a TCP connection and sends a
request.
2. The server sends the response and closes the
connection.
3. The client reads the data until it encounters an
end-of-file marker; it then closes the
connection.
In this strategy, if a file contains links to N different
pictures in different files (all located on the same server),
the connection must be opened and closed N + 1 times.

The nonpersistent strategy imposes high overhead on the
server because the server needs N + 1 different buffers.

Example
Figure below shows an example of a nonpersistent
connection.

The client needs to access a file that contains two links to
images. The text file and images are located on the same
server. We need 3 HTTP transactions, and therefore 3 TCP connections.
2-Persistent Connection
• HTTP version 1.1 specifies a persistent connection by
default.
• In a persistent connection, the server leaves the
connection open for more requests after sending a
response.
• The server can close the connection at the request of a
client or if a time-out has been reached.
• The sender usually sends the length of the data with
each response.

Example
The figure below shows the same scenario as the nonpersistent example above,
but using a persistent connection.
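The difference between the two modes comes down to simple counting. The sketch below assumes, as in the examples, that all objects are on the same server and that the persistent connection is not closed by a time-out; the function name tcp_connections_needed is made up.

```python
def tcp_connections_needed(linked_objects, persistent):
    """TCP connections used to fetch one page plus N linked objects."""
    transactions = linked_objects + 1     # the main file plus N objects
    return 1 if persistent else transactions

# The scenario from the examples: one text file with two linked images.
print(tcp_connections_needed(2, persistent=False))
print(tcp_connections_needed(2, persistent=True))
```

The number of request/response transactions is the same in both modes; only the number of connection setups and teardowns changes.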
HTTP cookie
• The World Wide Web was originally designed as a
stateless entity.
• HTTP can use cookies to keep the state of the transactions.
• The server sends a cookie that can be stored in the client
and be retrieved later by the server.
• An HTTP cookie (web cookie, browser cookie) is a small
piece of data that a server sends to the user's web browser.
• The browser may store it and send it back with the next
request to the same server.
• Typically, it is used to tell if two requests came from the
same browser, for example to keep a user logged in.
• It remembers stateful information for the stateless HTTP
protocol.
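The round trip of a cookie can be sketched with Python's standard http.cookies module. The cookie name session_id and the value abc123 are hypothetical, and the browser's storage step is reduced to copying the name and value into a Cookie header.

```python
from http.cookies import SimpleCookie

# Server side: attach a cookie to the response.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"            # hypothetical name and value
set_cookie_header = outgoing.output()         # a "Set-Cookie: ..." header line

# Client side: the browser stores the pair and returns it with the next request.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in outgoing.items()
)

# Server side, next request: parse the Cookie header to recognise the client.
incoming = SimpleCookie()
incoming.load(cookie_header.split(": ", 1)[1])

print(set_cookie_header)
print(cookie_header)
print(incoming["session_id"].value)
```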
Web Caching: Proxy Server
• HTTP supports proxy servers.
• A proxy server is a computer that keeps copies of
responses to recent requests.
• The HTTP client sends a request to the proxy
server. The proxy server checks its cache. If the
response is not stored in the cache, the proxy server
sends the request to the corresponding server.
• Incoming responses are sent to the proxy server and
stored for future requests from other clients.
• The proxy server reduces the load on the original
server, decreases traffic, and improves latency.
• However, to use the proxy server, the client must
be configured to access the proxy instead of the
target server.
• Note that the proxy server acts both as a server
and client.
• When it receives a request from a client for which
it has a response, it acts as a server and sends
the response to the client.
• When it receives a request from a client for which
it does not have a response, it first acts as a
client and sends a request to the target server.
• When the response has been received, it acts again
as a server and sends the response to the client.
• A very important question is how long a response
should remain in the proxy server before being
deleted and replaced.
• Several different strategies are used for this
purpose:
• Delete the less frequently used web pages.
• Specify a time for the page to remain in the proxy.
• Add headers that show
the last modification time of the information.
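The proxy behaviour described above, combined with the "specify a time" strategy, can be sketched as a small in-memory cache. Everything here is illustrative: the class name ProxyCache, the ten-second lifetime, and the fake origin server are assumptions, and a real proxy would also honour the caching headers sent by the server.

```python
import time

class ProxyCache:
    """Tiny in-memory response cache with a fixed time-to-live per entry."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}   # url -> (expiry_time, response)

    def get(self, url):
        entry = self.entries.get(url)
        if entry and entry[0] > self.clock():
            return entry[1]                       # cache hit: act as a server
        response = self.fetch_from_origin(url)    # cache miss: act as a client
        self.entries[url] = (self.clock() + self.ttl, response)
        return response

origin_calls = []

def fake_origin(url):
    """Stand-in for the target server; records every request it receives."""
    origin_calls.append(url)
    return f"response for {url}"

now = [0.0]   # controllable clock so the demo does not need to sleep
cache = ProxyCache(fake_origin, ttl_seconds=10, clock=lambda: now[0])
cache.get("/index.html")   # miss: forwarded to the origin
cache.get("/index.html")   # hit: answered from the cache
now[0] = 11.0
cache.get("/index.html")   # entry expired: forwarded again
print(len(origin_calls))   # prints 2
```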
HTTP Security

HTTP by itself does not provide security.

However, HTTP can run over an encrypted
link between the browser (client) and the web
server, established with the Secure Sockets Layer (SSL)
or Transport Layer Security (TLS) protocols.
TLS is the successor to SSL.

In this case, HTTP is referred to as HTTPS.

HTTPS provides confidentiality, server authentication
(and optionally client authentication), and data integrity.
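On the client side, the TLS settings are usually prepared before any HTTP message is sent. The sketch below uses Python's standard ssl and http.client modules; the helper name open_https_connection is made up, and no network traffic happens until a request is actually issued on the returned object.

```python
import http.client
import ssl

# Default client-side TLS settings: the server certificate is verified
# and must match the host name.
context = ssl.create_default_context()

def open_https_connection(host, port=443):
    """Prepare an HTTPS connection object; nothing is sent until a request is made."""
    return http.client.HTTPSConnection(host, port, context=context)

print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Port 443 is the well-known port for HTTPS, in the same way that port 80 is for HTTP.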