
Module-5 Application Layer

Module-5 provides an introduction to the application layer of the TCP/IP protocol suite, focusing on client-server programming and standard protocols such as HTTP, FTP, and DNS. It discusses the differences between standard and nonstandard application-layer protocols, as well as the client-server and peer-to-peer paradigms. The module also covers the use of sockets for communication, the application programming interface (API), and the functionalities of UDP, TCP, and SCTP protocols.

Uploaded by

Ambika Venkatesh

Module-5

Introduction to Application Layer

Department of CSE- Data Science


Contents

 Introduction

 Client-Server Programming

 Standard Client-Server Protocols:

‣ World Wide Web and HTTP

‣ FTP
‣ Electronic Mail
‣ Domain Name System (DNS)
‣ TELNET
‣ Secure Shell (SSH)



INTRODUCTION
 The application layer provides services to the user.
 Communication is provided using a logical connection, which means that the
two application layers assume that there is an imaginary direct connection
through which they can send and receive messages.

 Logical connection takes place between the application layer of a computer at
Sky Research and the application layer of a server at Scientific Books.
 The communication at the application layer
is logical, not physical.
 Alice and Bob assume that there is a two-
way logical channel between them through
which they can send and receive messages.



Providing Services
 The application layer is different from the other layers in that it is the
highest layer in the suite.
 The protocols in this layer do not provide services to any other protocol in the
suite; they only receive services from the protocols in the transport layer.
 This means that protocols can be removed from this layer easily. New protocols
can also be added to this layer as long as the new protocols can use the services
provided by one of the transport-layer protocols.
Standard and Nonstandard Protocols
 To provide smooth operation of the Internet, the protocols used in the first four
layers of the TCP/IP suite need to be standardized and documented. To be flexible,
the application-layer protocols can be both standard and nonstandard.



Standard Application-Layer Protocols
 Each standard protocol is a pair of computer programs that interact with the user
and the transport layer to provide a specific service to the user.
 The study of these protocols enables a network manager to easily solve the
problems that may occur when using these protocols.
 The deep understanding of how these protocols work will also give us some
ideas about how to create new nonstandard protocols.
Nonstandard Application-Layer Protocols
 A programmer can create a nonstandard application-layer program if she can
write two programs that provide service to the user by interacting with the
transport layer.
 A private company can create a new customized application protocol to
communicate with all of its offices around the world using the services
provided by the first four layers of the TCP/IP protocol suite without using
any of the standard application programs.
Application-Layer Paradigms
1. client-server paradigm
2. peer-to-peer paradigm
Client-Server Paradigm
 This is the most popular paradigm.
 The service provider is an application program, called the server process; it runs
continuously, waiting for another application program, called the client process, to
make a connection through the Internet and ask for service.
 There are normally only a few server processes that can provide a specific type of
service, but there are many clients that request service from any of these server processes.
 The server process must be running all the time; the client process is started when the
client needs to receive service.
 For example, a telephone directory center in any area can be thought of as a server; a
subscriber that calls and asks for a specific telephone number can be thought of as a
client. The directory center must be ready and available all the time; the subscriber
can call the center for a short period when the service is needed.



 The figure shows an example of client-server communication in which three clients
communicate with one server to receive the services provided by this server.
 One problem with this paradigm is that the communication load is concentrated on
the server, which means the server should be a powerful computer.
Figure : Example of a client-server paradigm

 Another problem is that there should be a service provider willing to accept the cost
and create a powerful server for a specific service.
 Several traditional services are still using this paradigm, including the World Wide Web
(WWW) and its vehicle HyperText Transfer Protocol (HTTP), file transfer protocol
(FTP), secure shell (SSH), e-mail, and so on.



Peer-to-Peer
 This paradigm is often abbreviated as P2P.
 In this paradigm, there is no need for a server process to be running all the time
and waiting for the client processes to connect. The responsibility is shared
between peers.
 A computer connected to the Internet can provide service at one time and receive
service at another time. A computer can even provide and receive services at the
same time

Figure :Example of a peer-to-peer paradigm



 One of the areas that really fits in this paradigm is Internet telephony.
Communication by phone is indeed a peer-to-peer activity; no party needs to be running
forever waiting for the other party to call.
 Another area in which the peer-to-peer paradigm can be used is when some computers
connected to the Internet have something to share with each other.
 For example, if an Internet user has a file available to share with other Internet users,
there is no need for the file holder to become a server and run a server process all the
time waiting for other users to connect and retrieve the file.
 The peer-to-peer paradigm has proved to be easily scalable and cost-effective in
eliminating the need for expensive servers to be running and maintained all the time.
 The main challenge has been security; it is more difficult to create secure
communication between distributed services than between those controlled by some
dedicated servers.
 There are some new applications, such as BitTorrent, Skype, IPTV, and Internet
telephony, that use this paradigm
Mixed Paradigm
 An application may choose to use a mixture of the two paradigms by combining
the advantages of both.
 For example, a light-load client-server communication can be used to find the
address of the peer that can offer a service.
 When the address of the peer is found, the actual service can be received from the
peer by using the peer-to-peer paradigm.



Client-server Programming
Application Programming Interface
 If we need a process to be able to communicate with another process, we need a
new set of instructions to tell the lowest four layers of the TCP/IP suite to open
the connection, send and receive data from the other end, and close the
connection.
 A set of instructions of this kind is normally referred to as an application
programming interface (API).
• A computer manufacturer needs to build the first four layers of the suite in the
operating system and include an API.
• In this way, the processes running at the application layer are able to
communicate with the operating system when sending and receiving messages
through the Internet.
• Several APIs have been designed for communication. Three among them are
common: socket interface, Transport Layer Interface (TLI), and STREAM.



Sockets
 A socket is an object that is created and used by the application program.
 Communication between a client process and a server process is communication between
two sockets, created at two ends

Figure : Use of sockets in process-to-process communication

 The client thinks that the socket is the entity that receives the request and gives the
response; the server thinks that the socket is the one that has a request and needs the
response.
 If we create two sockets, one at each end, and define the source and destination addresses
correctly, we can use the available instructions to send and receive data.
 The rest is the responsibility of the operating system and the embedded TCP/IP protocol.
Socket Addresses
 We need a pair of socket addresses for communication: a local socket address
and a remote socket address
 A socket address should first define the computer on which a client or a server is
running (its IP address) and then the application process on that computer (its port
number).

Finding Socket Addresses


Server Site
 The server needs a local (server) and a remote (client) socket address for
communication.
 The local (server) socket address is provided by the operating system. The
operating system knows the IP address of the computer on which the server
process is running.



 The port number of a server process needs to be assigned.
‣ If the server process is a standard one defined by the Internet authority, a port
number is already assigned to it.
‣ For example, the assigned port number for the Hypertext Transfer Protocol
(HTTP) is the integer 80, which cannot be used by any other process.
‣ If the server process is not standard, the designer of the server process can
choose a port number, in the range defined by the Internet authority, and assign
it to the process.
 The remote socket address for a server is the socket address of the client that
makes the connection.
 Since the server can serve many clients, it does not know beforehand the remote
socket address for communication. The server can find this socket address when a
client tries to connect to the server.
 The client socket address, which is contained in the request packet sent to the
server, becomes the remote socket address that is used for responding to the
client.
 The remote socket address is changed in each interaction with a different client.
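The two pairs of socket addresses can be inspected directly. The following is a minimal Python sketch, assuming a TCP connection over the loopback address 127.0.0.1 and an OS-chosen port in place of a well-known one; the function name `socket_address_pairs` is made up for illustration.

```python
import socket

def socket_address_pairs():
    """Connect a client to a server on loopback and return all four addresses."""
    # Port 0 asks the OS for any free port (an assumption made so the sketch
    # never collides with a real service on a well-known port).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    conn, _ = listener.accept()   # the server learns the client address here

    pairs = {
        "client_local": client.getsockname(),
        "client_remote": client.getpeername(),
        "server_local": conn.getsockname(),
        "server_remote": conn.getpeername(),
    }
    conn.close()
    client.close()
    listener.close()
    return pairs

if __name__ == "__main__":
    for name, address in socket_address_pairs().items():
        print(name, address)
```

The client's local socket address is the server's remote socket address, and the client's remote socket address is the server's local one.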
Client Site
 The client also needs a local (client) and a remote (server) socket address for
communication.
 The local (client) socket address is also provided by the operating system. The
operating system knows the IP address of the computer on which the client is
running.
 The port number is a 16-bit temporary integer that is assigned to a client process
each time the process needs to start the communication.
 The port number needs to be assigned from a set of integers defined by the
Internet authority and called the ephemeral (temporary) port numbers
 The operating system needs to guarantee that the new port number is not used by
any other running client process.
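As a rough illustration of ephemeral port assignment, the Python sketch below binds two client sockets to port 0, which asks the operating system to pick a temporary port. The loopback address and the function name `two_ephemeral_ports` are assumptions for the example.

```python
import socket

def two_ephemeral_ports():
    """Return the temporary ports the OS gives to two open client sockets."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as first, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as second:
        # Binding to port 0 means "let the operating system choose".
        first.bind(("127.0.0.1", 0))
        second.bind(("127.0.0.1", 0))
        return first.getsockname()[1], second.getsockname()[1]

if __name__ == "__main__":
    print(two_ephemeral_ports())
```

Because both sockets are open at the same time, the operating system has to hand out two different port numbers.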



 Finding the remote (server) socket address for a client needs more work. When a
client process starts, it should know the socket address of the server it wants to
connect to. We will have two situations in this case.
1. Sometimes, the user who starts the client process knows both the server port
number and IP address of the computer on which the server is running. This
usually occurs in situations when we have written client and server applications
and we want to test them. The programmer can provide these two pieces of
information when he runs the client program.
2. Although each standard application has a well-known port number, most of the
time, we do not know the IP address. This happens in situations such as when we
need to contact a web page, send an e-mail to a friend, copy a file from a remote
site, and so on. In these situations, the server has a name, an identifier that
uniquely defines the server process. The client process normally knows the port
number because it should be a well-known port number, but the IP address can
be obtained using another client-server application called the Domain Name
System (DNS).
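The second situation can be sketched in Python with the standard `socket.getaddrinfo` call, which asks the system resolver (and through it DNS) for the IP addresses behind a name. The helper name `server_socket_addresses` is hypothetical, and the host passed in is whatever server name the user supplies.

```python
import socket

def server_socket_addresses(host, port):
    """Resolve a server name to the (IP address, port) pairs a TCP client could use."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for both IPv4
    # and IPv6 the sockaddr tuple starts with the IP address and the port.
    return sorted({(info[4][0], info[4][1]) for info in infos})
```

For a web server, a client would call something like `server_socket_addresses(name, 80)`: the port comes from the well-known number, while the IP address comes from the lookup.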



Using Services of the Transport Layer
UDP Protocol
 UDP provides connectionless, unreliable, datagram service.
 UDP is not a reliable protocol. Although it may check that the data is not
corrupted during the transmission, it does not ask the sender to resend the
corrupted or lost datagram.
 For some applications, UDP has an advantage: it is message-oriented. It gives
boundaries to the messages exchanged.
 An application program may be designed to use UDP if it is sending small
messages and simplicity and speed are more important for the application
than reliability.
 For example, some management and multimedia applications fit in this
category.
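The message-oriented behaviour of UDP can be seen in a small Python sketch. It assumes both sockets live on the loopback interface, where loss and reordering are unlikely in practice but still not guaranteed by UDP itself; the function name `udp_boundaries` is made up for the example.

```python
import socket

def udp_boundaries(messages):
    """Send each message as one datagram over loopback and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)   # do not block forever if a datagram is lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    received = []
    try:
        for message in messages:
            sender.sendto(message, receiver.getsockname())
        for _ in messages:
            # One recvfrom call returns exactly one datagram, never a merge of two.
            data, _addr = receiver.recvfrom(2048)
            received.append(data)
    finally:
        sender.close()
        receiver.close()
    return received

if __name__ == "__main__":
    print(udp_boundaries([b"hello", b"world"]))
```

Each receive call hands back one whole message, so the application does not need to mark where a message ends.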



TCP Protocol
 TCP provides connection-oriented, reliable, byte-stream service. TCP requires
that two ends first create a logical connection between themselves by exchanging
some connection-establishment packets.
 By numbering the bytes exchanged, the continuity of the bytes can be checked.
For example, if some bytes are lost or corrupted, the receiver can request the
resending of those bytes, which makes TCP a reliable protocol.
 TCP also can provide flow control and congestion control.
 One problem with the TCP protocol is that it is not message-oriented; it does not
put boundaries on the messages exchanged.
 Most of the standard applications that need to send long messages and require
reliability may benefit from the service of the TCP.
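Because TCP delivers a byte stream without message boundaries, applications that need messages usually add their own framing. The sketch below shows one common approach, a 4-byte length prefix; this framing scheme is an assumption chosen for illustration, not something the module prescribes.

```python
import struct

def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message

def unframe(stream: bytes):
    """Split received bytes into complete messages plus any leftover bytes."""
    messages, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        if offset + 4 + length > len(stream):
            break   # the last message is incomplete; wait for more bytes
        messages.append(stream[offset + 4:offset + 4 + length])
        offset += 4 + length
    return messages, stream[offset:]

if __name__ == "__main__":
    received = frame(b"hello") + frame(b"world!")   # arrives as one run of bytes
    print(unframe(received))
```

The receiver can call `unframe` on whatever bytes have arrived so far and keep the leftover part until the rest of the stream shows up.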
SCTP Protocol
 SCTP provides a service which is a combination of the two other protocols.
 Like TCP, SCTP provides a connection-oriented, reliable service, but it is not
byte-stream oriented.
 It is a message-oriented protocol like UDP.
 In addition, SCTP can provide multi-stream service by providing multiple
network-layer connections.
 SCTP is normally suitable for any application that needs reliability and at the
same time needs to remain connected, even if a failure occurs in one
network-layer connection.



Iterative Communication Using UDP
 Communication between a client program and a server program can occur
iteratively or concurrently.
 Although several client programs can access the same server program at the same
time, the server program can be designed to respond iteratively or concurrently.
 An iterative server can process one client request at a time; it receives a request,
processes it, and sends the response to the requestor before handling another
request.
Sockets Used for UDP
 In UDP communication, the client and server use only one socket each.
 The socket created at the server site lasts forever; the socket created at the client
site is closed (destroyed) when the client process terminates
Figure : Sockets for UDP communication

 The figure shows the lifetime of the sockets in the server and client processes.
 Different clients use different sockets, but the server creates only one socket and
changes only the remote socket address each time a new client sends a request.



Flow Diagram

 There are multiple clients, but only one server.
 Each client is served in each iteration of the loop in the server.
 If a client wants to send two datagrams, it is considered as two clients for the server.
 The second datagram needs to wait for its turn.
 The diagram also shows the status of the socket after each action.
Figure : Flow diagram for iterative UDP communication



Server Process
 The server makes a passive open, in which it becomes ready for the communication,
but it waits until a client process makes the connection.
 It creates an empty socket. It then binds the socket to the server and the
well-known port.
 The server then issues a receive request command, which blocks until it receives a
request from a client.
 The request is then processed and the response is sent back to the client.
 The server now starts another iteration, waiting for another request to arrive.



Client Process
 The client process makes an active open.
 It creates an empty socket and then issues the send command, which fully fills the
socket (both the local and the remote socket address) and sends the request.
 The client then issues a receive command, which is blocked until a response arrives
from the server.
 The response is then handled and the socket is destroyed.
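The server and client steps for iterative UDP communication can be sketched in Python as follows. This is a minimal illustration, assuming a loopback server on an OS-chosen port, a server that stops after a fixed number of requests so the demo terminates, and "processing" that simply upper-cases the request; all function names are made up.

```python
import socket
import threading

def udp_server(sock, requests_to_serve):
    """Iterative server: one socket, one request handled per loop iteration."""
    for _ in range(requests_to_serve):
        request, client_address = sock.recvfrom(2048)   # blocks until a request arrives
        response = request.upper()                      # stand-in for real processing
        sock.sendto(response, client_address)           # reply to that client's address

def udp_client(server_address, request):
    """Create a socket, send one request, wait for the response, destroy the socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(2.0)
        sock.sendto(request, server_address)
        response, _ = sock.recvfrom(2048)
        return response

def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server_sock.bind(("127.0.0.1", 0))   # a real server would bind its well-known port
    worker = threading.Thread(target=udp_server, args=(server_sock, 2))
    worker.start()
    try:
        address = server_sock.getsockname()
        return [udp_client(address, message) for message in (b"first", b"second")]
    finally:
        worker.join(timeout=2.0)
        server_sock.close()

if __name__ == "__main__":
    print(demo())
```

The server keeps a single socket for its whole lifetime, while each call to `udp_client` creates and destroys its own socket.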



Iterative Communication Using TCP

Sockets Used in TCP


 The TCP server uses two different sockets, one for connection
establishment (listen socket) and the other for data transfer (socket).
 The reason for having two types of sockets is to separate the connection phase
from the data exchange phase.
 A server uses a listen socket to listen for a new client trying to establish
connection. After the connection is established, the server creates a socket to
exchange data with the client and finally to terminate the connection.
 The client uses only one socket for both connection establishment and data
exchange



Figure : Sockets used in TCP communication



Flow Diagram

 The figure shows a simplified flow diagram for iterative communication using TCP.
 There are multiple clients, but only one server.
 Each client is served in each iteration of the loop.
Figure : Flow diagram for iterative TCP communication
 The TCP server process creates a socket and binds it, but these two commands create
the listen socket to be used only for the connection establishment phase.
 The server process then calls the listen procedure, to allow the operating system to
start accepting the clients, completing the connection phase, and putting them in the
waiting list to be served.
 The server process now starts a loop and serves the clients one by one.
 In each iteration, the server process issues the accept procedure that removes one
client from the waiting list of the connected clients for serving.
 If the list is empty, the accept procedure blocks until there is a client to be served.
 When the accept procedure returns, it creates a new socket for data transfer. The
server process now uses the client socket address obtained during the connection
establishment to fill the remote socket address field in the newly created socket. At
this time the client and server can exchange data.
Client Process
 The client flow diagram is almost the same as the UDP version, except that the
client data-transfer box needs to be defined for each specific case.
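The listen socket and the per-client data socket can be seen in a short Python sketch of an iterative TCP server. It assumes a loopback server on an OS-chosen port, a fixed number of clients so the demo ends, and a data-transfer phase that just reverses the request bytes; the function names are hypothetical.

```python
import socket
import threading

def tcp_iterative_server(listen_sock, clients_to_serve):
    """Serve clients one by one; accept() returns a new socket for data transfer."""
    for _ in range(clients_to_serve):
        conn, _client_address = listen_sock.accept()   # blocks if no client is waiting
        with conn:
            # A single recv is enough only because the demo messages are tiny.
            request = conn.recv(1024)
            conn.sendall(request[::-1])                 # stand-in for real processing

def tcp_client(server_address, request):
    """One socket is used for both connection establishment and data exchange."""
    with socket.create_connection(server_address, timeout=2.0) as sock:
        sock.sendall(request)
        return sock.recv(1024)

def demo():
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_sock.bind(("127.0.0.1", 0))   # a real server would bind its well-known port
    listen_sock.listen(5)                # the OS now queues completed connections
    worker = threading.Thread(target=tcp_iterative_server, args=(listen_sock, 2))
    worker.start()
    try:
        address = listen_sock.getsockname()
        return [tcp_client(address, message) for message in (b"abc", b"xyz")]
    finally:
        worker.join(timeout=2.0)
        listen_sock.close()

if __name__ == "__main__":
    print(demo())
```

The listen socket is never used to exchange data; every client gets its own socket from `accept`, which is closed when that client has been served.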

Concurrent Communication
 A concurrent server can process several client requests at the same time. This can
be done using the available provisions in the underlying programming language.
 In C, a server can create several child processes, in which a child can handle a
client.
 In Java, threading allows several clients to be handled concurrently, each by its own thread.
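The same idea can be sketched in Python, using one thread per client in place of child processes or Java threads. The example assumes a loopback server on an OS-chosen port and a trivial echo service; names such as `handle_client` are made up for illustration.

```python
import socket
import threading

def handle_client(conn):
    """Serve one client on its own thread."""
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"echo:" + request)

def concurrent_server(listen_sock, clients_to_serve):
    """Accept clients and hand each one to a new thread without waiting for it."""
    workers = []
    for _ in range(clients_to_serve):
        conn, _client_address = listen_sock.accept()
        worker = threading.Thread(target=handle_client, args=(conn,))
        worker.start()          # go straight back to accept()
        workers.append(worker)
    for worker in workers:
        worker.join()

def demo():
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_sock.bind(("127.0.0.1", 0))
    listen_sock.listen(5)
    server = threading.Thread(target=concurrent_server, args=(listen_sock, 2))
    server.start()
    address = listen_sock.getsockname()
    # Connect both clients before either sends, so they are active together.
    first = socket.create_connection(address, timeout=2.0)
    second = socket.create_connection(address, timeout=2.0)
    try:
        second.sendall(b"two")             # the later client sends first
        reply_second = second.recv(1024)
        first.sendall(b"one")
        reply_first = first.recv(1024)
    finally:
        first.close()
        second.close()
        server.join(timeout=2.0)
        listen_sock.close()
    return reply_first, reply_second

if __name__ == "__main__":
    print(demo())
```

With an iterative server, the second client would have to wait until the first one finished; here it is answered while the first connection is still open and idle.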



Standard Client-Server Protocols
World Wide Web And HTTP
World Wide Web
 The Web today is a repository of information in which the documents, called
web pages, are distributed all over the world and related documents are linked
together.
 The linking of web pages was achieved using a concept called hypertext
Architecture
 The WWW today is a distributed client-server service, in which a client using a
browser can access a service using a server.
 The service provided is distributed over many locations called sites. Each site
holds one or more web pages.
 A web page can be simple or composite.
 A simple web page has no links to other web pages; a composite web page has
one or more links to other web pages.
 Each web page is a file with a name and address.
Example : Assume we need to retrieve a scientific document that contains one reference to another text file
and one reference to a large image.

 The main document and the image are stored in two separate files (file A and file B) in the same site; the
referenced text file (file C) is stored in another site. Since we are dealing with three different files, we
need three transactions if we want to see the whole document.
 The first transaction (request/response) retrieves a copy of the main document (file A), which has
references (pointers) to the second and third files.
 When a copy of the main document is retrieved and browsed, the user can click on the reference to the
image to invoke the second transaction and retrieve a copy of the image (file B).
 If the user needs to see the contents of the referenced text file, she can click on its reference (pointer)
invoking the third transaction and retrieving a copy of file C.
 Files A, B, and C are independent web pages, each with independent names and addresses.
Web Client (Browser)
 A variety of vendors offer commercial browsers that interpret and display a web page,
and all of them use nearly the same architecture.
 Each browser usually consists of three parts: a controller, client protocols, and
interpreters.

 The controller receives input from the keyboard or the mouse and uses the client
programs to access the document.
 After the document has been accessed, the controller uses one of the interpreters to
display the document on the screen.
 The client protocol can be one of the protocols such as HTTP or FTP.
 The interpreter can be HTML, Java, or JavaScript, depending on the type of document.
 Some commercial browsers include Internet Explorer, Netscape Navigator, and Firefox.
Web Server
 The web page is stored at the server. Each time a request arrives, the
corresponding document is sent to the client.
 To improve efficiency, servers normally store requested files in a cache in
memory; memory is faster to access than a disk.
 A server can also become more efficient through multithreading or
multiprocessing. In this case, a server can answer more than one request at a
time.
 Some popular web servers include Apache and Microsoft Internet Information
Server.
Uniform Resource Locator (URL)
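 A web page, as a file, needs to have a unique identifier to distinguish it from
other web pages.
 To define a web page, we need three identifiers: host, port, and path. Before that,
however, we need to tell the browser which client-server application to use, which is
called the protocol, so four identifiers are needed in all.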



 Protocol. The first identifier is the abbreviation for the client-server program that we
need in order to access the web page. Although most of the time the protocol is HTTP
(HyperText Transfer Protocol), we can also use other protocols such as FTP (File
Transfer Protocol).
 Host. The host identifier can be the IP address of the server or the unique name given
to the server. IP addresses can be defined in dotted decimal notation; the name is
normally the domain name that uniquely defines the host, such as forouzan.com
 Port. The port, a 16-bit integer, is normally predefined for the client-server
application. For example, if the HTTP protocol is used for accessing the web page, the
well-known port number is 80.
 Path. The path identifies the location and the name of the file in the underlying
operating system. The format of this identifier normally depends on the operating
system.



 To combine these four pieces together, the uniform resource locator (URL) has been
designed; it uses three different separators between the four pieces:
protocol://host/path (used most of the time)
protocol://host:port/path (used when the port number is needed)
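The four identifiers can be pulled out of a URL with Python's standard `urllib.parse` module. The sketch below is only illustrative: the helper `url_parts`, the small table of well-known ports, and the sample paths are assumptions, while the host name forouzan.com is the one used in the text.

```python
from urllib.parse import urlsplit

# Well-known ports used when the URL does not name one (illustrative subset).
DEFAULT_PORTS = {"http": 80, "ftp": 21}

def url_parts(url):
    """Return (protocol, host, port, path) for a URL."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

if __name__ == "__main__":
    print(url_parts("http://forouzan.com/path/page.html"))
    print(url_parts("http://forouzan.com:8080/path/page.html"))
```

When the port is left out of the URL, the sketch falls back to the well-known port of the protocol, which mirrors how a browser chooses port 80 for HTTP.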

Web Documents
• The documents in the WWW can be grouped into three broad categories: static, dynamic,
and active.
Static Documents
 Static documents are fixed-content documents that are created and stored in a server.
 The contents of the file are determined when the file is created, not when it is used.
 The contents in the server can be changed, but the user cannot change them.
 When a client accesses the document, a copy of the document is sent.
 The user can then use a browser to see the document.
 Static documents are prepared using one of several languages: HyperText Markup
Language (HTML), Extensible Markup Language (XML), Extensible Style Language
(XSL), and Extensible Hypertext Markup Language (XHTML).
Dynamic Documents
 A dynamic document is created by a web server whenever a browser requests the
document.
 When a request arrives, the web server runs an application program or a script that
creates the dynamic document. The server returns the result of the program or script
as a response to the browser that requested the document.
 Because a fresh document is created for each request, the contents of a dynamic
document may vary from one request to another.
 A very simple example of a dynamic document is the retrieval of the time and date
from a server.
 Common Gateway Interface (CGI) was used to retrieve a dynamic document in the
past; today's options include scripting technologies such as Java Server Pages
(JSP), which uses the Java language for scripting, Active Server Pages (ASP), a
Microsoft product that uses the Visual Basic language for scripting, and ColdFusion,
which embeds queries in a Structured Query Language (SQL) database in the HTML
document.
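The time-and-date example can be sketched as a small Python function that a server-side script might run for each request. The function name `time_document` and the HTML layout are made up for illustration; no particular server framework is implied.

```python
from datetime import datetime

def time_document(now=None):
    """Build a fresh HTML page reporting the server's date and time."""
    if now is None:
        now = datetime.now()   # evaluated per request, so each page differs
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return "<html><body><p>Server time: " + stamp + "</p></body></html>"

if __name__ == "__main__":
    print(time_document())
```

Because the page is generated at request time rather than stored as a file, two requests a second apart receive different documents.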



Active Documents
 For many applications, we need a program or a script to be run at the client site.
These are called active documents.
 For example, suppose we want to run a program that creates animated graphics on
the screen or a program that interacts with the user.
 The program definitely needs to be run at the client site where the animation or
interaction takes place.
 When a browser requests an active document, the server sends a copy of the
document or a script.
 The document is then run at the client (browser) site.
 One way to create an active document is to use Java applets. An applet is a program
written in Java on the server; it is compiled and ready to be run, and the document is
in bytecode (binary) format.
 Another way is to use JavaScript, in which case the script is downloaded and run at
the client site.



HyperText Transfer Protocol (HTTP)
 The HyperText Transfer Protocol (HTTP) is used to define how the client-server
programs can be written to retrieve web pages from the Web. An HTTP client sends a
request; an HTTP server returns a response.
 The server uses the port number 80; the client uses a temporary port number.
 HTTP uses the services of TCP.
Nonpersistent versus Persistent Connections
 In a nonpersistent connection, one TCP connection is made for each request/response.
 The following lists the steps in this strategy:
1. The client opens a TCP connection and sends a request.
2. The server sends the response and closes the connection.
3. The client reads the data until it encounters an end-of-file marker; it then closes the
connection.
 In this strategy, if a file contains links to N different pictures in different files (all
located on the same server), the connection must be opened and closed N + 1 times.
 The nonpersistent strategy imposes high overhead on the server because the server
needs N + 1 different sets of buffers, one for each connection that is opened.



Example: The figure shows an example of a nonpersistent connection. The client needs to
access a file that contains one link to an image. The text file and image are located on the
same server. Here we need two connections. For each connection, TCP requires at least
three handshake messages to establish the connection, but the request can be sent with the
third one. After the connection is established, the object can be transferred. After
receiving an object, another three handshake messages are needed to terminate the
connection.



 HTTP version 1.1 specifies a persistent connection by default. In a persistent connection,
the server leaves the connection open for more requests after sending a response.
 The server can close the connection at the request of a client or if a time-out has been
reached. The sender usually sends the length of the data with each response.
 Time and resources are saved using persistent connections.
 Only one set of buffers and variables needs to be set for the connection at each site.
 The round trip time for connection establishment and connection termination is saved.
 Example: Only one connection establishment and connection termination is used, but the
request for the image is sent separately.
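The difference between the two strategies can be observed by counting TCP connections at a small local server. The Python sketch below uses the standard `http.server` and `http.client` modules; the handler class, the function `count_connections`, and the request paths are assumptions made for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # lets the server keep a connection open
    connections = 0                 # TCP connections accepted so far
    lock = threading.Lock()

    def setup(self):
        # One handler object is created per accepted TCP connection.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"contents of " + self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))   # length sent with each response
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                        # keep the demo quiet

def count_connections(paths, persistent):
    """Fetch every path and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port, timeout=2.0)
        for index, path in enumerate(paths):
            if index > 0 and not persistent:
                conn.close()        # nonpersistent: a new connection per request
                conn = http.client.HTTPConnection(host, port, timeout=2.0)
            conn.request("GET", path)
            conn.getresponse().read()   # read fully before the next request
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections

if __name__ == "__main__":
    print(count_connections(["/document", "/image"], persistent=False))
    print(count_connections(["/document", "/image"], persistent=True))
```

Fetching a document and its one image costs two connections in the nonpersistent case and a single connection in the persistent case.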



Message Formats
 The HTTP protocol defines the format of the request and response messages

Figure : Formats of the request and response messages

 The first section in the request message is called the request line; the first section
in the response message is called the status line.



Request Message
 The first line in a request message is called a request line.
 There are three fields in this line separated by one space and terminated by two
characters (carriage return and line feed). The fields are called method, URL, and version.
 The method field defines the request types. In version 1.1 of HTTP, several methods
are defined, including GET, HEAD, PUT, POST, TRACE, DELETE, CONNECT, and OPTIONS.

 The second field, URL, was discussed earlier in the chapter. It defines the address
and name of the corresponding web page.
 The third field, version, gives the version of the protocol; the version discussed
here is 1.1 (HTTP/2 and HTTP/3 have since been standardized).
 After the request line, we can have zero or more request header lines.
 Each header line sends additional information from the client to the server.
 For example, the client can request that the document be sent in a special format.
 Each header line has a header name, a colon, a space, and a header value

 The value field defines the values associated with each header name.
 The body can be present in a request message. Usually, it contains the comment to be sent or
the file to be published on the website when the method is PUT or POST
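The layout of a request message can be sketched as a small Python function that assembles the request line, the header lines, the blank line, and the optional body. The function `build_request` is hypothetical, and the sketch leaves out details a real HTTP/1.1 client needs, such as the mandatory Host header.

```python
def build_request(method, url, headers, body=b""):
    """Assemble an HTTP/1.1 request message as bytes."""
    lines = [f"{method} {url} HTTP/1.1"]                      # request line
    lines += [f"{name}: {value}" for name, value in headers]  # header lines
    # Every line ends with carriage return + line feed; an empty line
    # separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

if __name__ == "__main__":
    message = build_request(
        "GET",
        "/usr/bin/image1",
        [("Accept", "image/gif"), ("Accept", "image/jpeg")],
    )
    print(message.decode("ascii"))
```

The usage shown builds a GET request for the path /usr/bin/image1 with two Accept headers and no body.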



Response Message
 A response message consists of a status line, header lines, a blank line, and sometimes
a body.
 The first line in a response message is called the status line. There are three fields in
this line separated by spaces and terminated by a carriage return and line feed.
 The first field defines the version of the HTTP protocol, here 1.1.
 The status code field defines the status of the request. It consists of three digits.
Whereas the codes in the 100 range are only informational, the codes in the 200 range
indicate a successful request.
 The codes in the 300 range redirect the client to another URL, and the codes in the 400
range indicate an error at the client site.
 Finally, the codes in the 500 range indicate an error at the server site.
 The status phrase explains the status code in text form.

 After the status line, we can have zero or more response header lines.
 Each header line sends additional information from the server to the client.
 For example, the sender can send extra information about the document.
 Each header line has a header name, a colon, a space, and a header value.

 The body contains the document to be sent from the server to the client. The body is
present unless the response is an error message.

Example 1
 This example retrieves a document. We
use the GET method to retrieve an image
with the path /usr/bin/image1.
 The request line shows the method (GET),
the URL, and the HTTP version (1.1).
 The header has two lines that show that
the client can accept images in the GIF or
JPEG format. The request does not have a
body.
 The response message contains the status
line and four lines of header.
 The header lines define the date, server,
content encoding, and length of the
document. The body of the document
follows the header.
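
The request and status line formats described above can be sketched in a few lines of Python. The helper names (build_request, parse_status_line, status_class) are hypothetical, and the header values are chosen to mirror Example 1; this is an illustration of the message layout under those assumptions, not a complete HTTP implementation.

```python
CRLF = "\r\n"  # each line ends with carriage return + line feed


def build_request(method, url, headers, version="HTTP/1.1", body=""):
    """Assemble request line, header lines, a blank line, and an optional body."""
    request_line = f"{method} {url} {version}"
    header_lines = [f"{name}: {value}" for name, value in headers]
    # The empty string before the body produces the blank line.
    return CRLF.join([request_line, *header_lines, "", body])


def parse_status_line(line):
    """Split a status line into version, status code, and status phrase."""
    version, code, phrase = line.rstrip(CRLF).split(" ", 2)
    return version, int(code), phrase


def status_class(code):
    """Map a three-digit status code to the range described in the text."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes[code // 100]


request = build_request(
    "GET", "/usr/bin/image1",
    [("Accept", "image/gif"), ("Accept", "image/jpeg")],
)
print(request)
print(parse_status_line("HTTP/1.1 200 OK\r\n"))
```

The request has no body, so the message ends right after the blank line that follows the two Accept headers.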

Example 2
 In this example, the client wants to send a
web page to be posted on the server.
 We use the PUT method. The request line
shows the method (PUT), URL, and
HTTP version (1.1).
 There are four lines of headers.
 The request body contains the web page
to be posted.
 The response message contains the status
line and four lines of headers.
 The created document, which is a CGI
document, is included as the body.
Conditional Request
 A client can add a condition in its request.
 In this case, the server will send the requested web page if the condition is met or
inform the client otherwise.
 One of the most common conditions imposed by the client is the time and date the web
page is modified.
 The client can send the header line If-Modified-Since with the request to tell the server
that it needs the page only if it is modified after a certain point in time.
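
A conditional request can be sketched as follows. The function names are hypothetical, and the use of status code 304 (Not Modified) for the "condition not met" case is taken from the HTTP standard rather than from the text above; the sketch only shows the header and the server's decision.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(url, since):
    """Build a GET request that asks for the page only if it changed after `since`."""
    date = format_datetime(since, usegmt=True)  # `since` must be an aware UTC datetime
    return f"GET {url} HTTP/1.1\r\nIf-Modified-Since: {date}\r\n\r\n"


def server_decision(last_modified, if_modified_since):
    """Return 200 (send the page) or 304 (page not modified since the given date)."""
    since = parsedate_to_datetime(if_modified_since)
    return 200 if last_modified > since else 304


cutoff = datetime(2015, 1, 1, tzinfo=timezone.utc)
print(conditional_get("/index.html", cutoff))
```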

Cookies
Creating and Storing Cookies
 The creation and storing of cookies depend on the implementation; however, the
principle is the same.
1. When a server receives a request from a client, it stores information about the client in
a file or a string. The information may include the domain name of the client, the
contents of the cookie (information the server has gathered about the client such as
name, registration number, and so on), a timestamp, and other information depending
on the implementation.
2. The server includes the cookie in the response that it sends to the client.
3. When the client receives the response, the browser stores the cookie in the cookie
directory, which is sorted by the server domain name.
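
The three steps above can be sketched with Python's standard http.cookies module. The cookie name, value, and domain (cart, 12345, shop.example) are made up for illustration, and the in-memory dictionary only stands in for a browser's cookie directory.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, domain):
    """Server side: build the header line that carries the cookie in the response."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    return cookie.output(header="Set-Cookie:")


cookie_directory = {}  # browser side: stored cookies, keyed by server domain


def store_cookie(server_domain, set_cookie_line):
    """Browser side: parse the header line and file the cookie under the server's domain."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_line.split(":", 1)[1].strip())
    cookie_directory[server_domain] = cookie


def cookie_header_for(server_domain):
    """Browser side: return the Cookie header for a later request, or None if no cookie is stored."""
    cookie = cookie_directory.get(server_domain)
    if cookie is None:
        return None
    return "Cookie: " + "; ".join(f"{k}={m.value}" for k, m in cookie.items())


line = make_set_cookie("cart", "12345", "shop.example")
store_cookie("shop.example", line)
print(line)
print(cookie_header_for("shop.example"))
```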

Using Cookies
 When a client sends a request to a server, the browser looks in the cookie directory to see if it
can find a cookie sent by that server.
 If found, the cookie is included in the request. When the server receives the request, it knows
that this is an old client, not a new one.
 An electronic store (e-commerce) can use a cookie for its client shoppers. When a client
selects an item and inserts it in a cart, a cookie that contains information about the item, such as
its number and unit price, is sent to the browser. If the client selects a second item, the cookie is
updated with the new selection information, and so on. When the client finishes shopping and
wants to check out, the last cookie is retrieved and the total charge is calculated.
 A site that restricts access to registered clients sends a cookie to the client when the
client registers for the first time. For any repeated access, only those clients that send the
appropriate cookie are allowed.

 A web portal uses the cookie in a similar way. When a user selects her favorite pages, a
cookie is made and sent. If the site is accessed again, the cookie is sent to the server to
show what the client is looking for.
 A cookie is also used by advertising agencies. An advertising agency can place banner
ads on some main website that is often visited by users. When a user visits the main
website and clicks the icon of a corporation, a request is sent to the advertising agency.
The advertising agency sends the requested banner, but it also includes a cookie with
the ID of the user. In this way the advertising agency compiles a profile of the user's interests and can
sell this information to other parties.

Web Caching: Proxy Servers
 HTTP supports proxy servers. A proxy server is a computer that keeps copies of
responses to recent requests.
 The HTTP client sends a request to the proxy server. The proxy server checks its
cache.
 If the response is not stored in the cache, the proxy server sends the request to the
corresponding server.
 Incoming responses are sent to the proxy server and stored for future requests from
other clients.
 The proxy server reduces the load on the original server, decreases traffic, and
improves latency. To use the proxy server, the client must be configured to access the
proxy instead of the target server.
 The proxy server acts as both server and client.
 When it receives a request from a client for which it has a response, it acts as a server
and sends the response to the client.
 When it receives a request from a client for which it does not have a response, it first
acts as a client and sends a request to the target server; when the response arrives, it
acts as a server again and sends the response to the client.
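
The dual role of the proxy can be sketched as a small cache in front of an origin server. The class name ProxyServer and the fake_origin function are hypothetical; a real proxy would also deal with expiry and validation of cached responses, which this sketch ignores.

```python
class ProxyServer:
    """Toy proxy: answers from its cache when it can, otherwise asks the origin."""

    def __init__(self, fetch_from_origin):
        self.cache = {}                      # URL -> stored response
        self.fetch_from_origin = fetch_from_origin
        self.origin_requests = 0             # how often the origin was contacted

    def handle(self, url):
        if url in self.cache:
            return self.cache[url]           # acts as a server only
        self.origin_requests += 1
        response = self.fetch_from_origin(url)  # acts as a client first
        self.cache[url] = response           # keep a copy for future requests
        return response                      # then acts as a server


def fake_origin(url):
    """Stand-in for the target server."""
    return f"contents of {url}"


proxy = ProxyServer(fake_origin)
proxy.handle("/index.html")   # not cached: forwarded to the origin
proxy.handle("/index.html")   # cached: answered by the proxy itself
print(proxy.origin_requests)
```

Two client requests reach the proxy, but only the first one is forwarded, which is how the proxy reduces load on the original server.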

Proxy Server Location
 The proxy servers are normally located at the client site. This means that we can have
a hierarchy of proxy servers, as shown below:
1. A client computer can also be used as a proxy server, in a small capacity, that stores
responses to requests often invoked by the client.
2. In a company, a proxy server may be installed on the computer LAN to reduce the
load going out of and coming into the LAN.
3. An ISP with many customers can install a proxy server to reduce the load going out
of and coming into the ISP network.

HTTP Security
 HTTP does not provide security.
 HTTP can be run over the Secure Socket Layer (SSL). In this case, HTTP is referred to
as HTTPS.
 HTTPS provides confidentiality, client and server authentication, and data integrity.

FTP
 File Transfer Protocol (FTP) is the standard protocol provided by TCP/IP for copying
a file from one host to another.
 Although transferring files from one system to another seems simple and
straightforward, some problems must be dealt with first.
 For example, two systems may use different file name conventions.
 Two systems may have different ways to represent data.
 Two systems may have different directory structures.
 FTP solves all of these problems with a very simple and elegant approach.

Figure : Basic model of FTP
 The client has three components: the user interface, the client control process, and the
client data transfer process.
 The server has two components: the server control process and the server data transfer
process.
 The control connection is made between the control processes.
 The data connection is made between the data transfer processes.
 Separation of commands and data transfer makes FTP more efficient.
 The control connection uses very simple rules of communication.

Two Connections
 The two connections in FTP have different lifetimes.
 When a user starts an FTP session, the control connection opens.
 While the control connection is open, the data connection can be opened and closed
multiple times if several files are transferred.
 FTP uses two well-known TCP ports: port 21 is used for the control connection, and
port 20 is used for the data connection.
Control Connection
 For control communication, FTP uses the NVT ASCII character set.
 Communication is achieved through commands and responses. This simple method is
adequate for the control connection because we send one command (or response) at a
time.
 Each line is terminated with a two-character (carriage return and line feed) end-of-line
token.
 During this control connection, commands are sent from the client to the server and
responses are sent from the server to the client.

 Commands, which are sent from the FTP client control process, are in the form of
ASCII uppercase, which may or may not be followed by an argument.
 Some of the most common commands are shown in Table below

 Every FTP command generates at least one response.
 A response has two parts: a three-digit number followed by text.
 The numeric part defines the code; the text part defines needed parameters or further
explanations.
 The first digit defines the status of the command. The second digit defines the area in
which the status applies. The third digit provides additional information.
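
The structure of an FTP response can be sketched as a small parser. The function name and the sample response text are illustrative, and the sketch handles only single-line responses.

```python
def parse_ftp_response(line):
    """Split a single-line FTP response into its three-digit code and its text."""
    code, text = line[:3], line[4:].rstrip("\r\n")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an FTP response: {line!r}")
    return {
        "code": int(code),
        "status": int(code[0]),   # first digit: status of the command
        "area": int(code[1]),     # second digit: area in which the status applies
        "detail": int(code[2]),   # third digit: additional information
        "text": text,
    }


print(parse_ftp_response("226 Closing data connection\r\n"))
```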

Data Connection
 The data connection uses the well-known port 20 at the server site. The following
shows the steps:
1. The client, not the server, issues a passive open using an ephemeral port. This must
be done by the client because it is the client that issues the commands for
transferring files.
2. Using the PORT command the client sends this port number to the server.
3. The server receives the port number and issues an active open using the well-known
port 20 and the received ephemeral port number.
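
Step 2 can be sketched as follows. The argument layout (four address bytes followed by the high and low bytes of the port, all comma-separated) comes from the FTP specification, not from the text above; the address 192.0.2.5 and port 50000 are made-up values.

```python
def port_command(ip, port):
    """Encode the client's address and ephemeral port as a PORT command."""
    p1, p2 = divmod(port, 256)            # high byte, low byte of the port
    fields = ip.split(".") + [str(p1), str(p2)]
    return "PORT " + ",".join(fields) + "\r\n"


def parse_port_command(command):
    """Server side: recover the address and port the client is listening on."""
    fields = command.strip().split(" ", 1)[1].split(",")
    ip = ".".join(fields[:4])
    port = int(fields[4]) * 256 + int(fields[5])
    return ip, port


cmd = port_command("192.0.2.5", 50000)
print(cmd.strip())
print(parse_port_command(cmd))
```

After decoding the command, the server would open the data connection from its port 20 to the address and port it recovered.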
Communication over Data Connection
 To transfer files through the data connection, the client must define the type of file to
be transferred, the structure of the data, and the transmission mode.
File Type
 FTP can transfer one of the following file types across the data connection: ASCII
file, EBCDIC file, or image file.

Data Structure
 FTP can transfer a file across the data connection using one of the following
interpretations of the structure of the data: file structure, record structure, or page
structure.
 The file structure format (used by default) has no structure. It is a continuous stream of
bytes.
 In the record structure, the file is divided into records. This can be used only with text
files.
 In the page structure, the file is divided into pages, with each page having a page
number and a page header. The pages can be stored and accessed randomly or
sequentially.

Transmission Mode
 FTP can transfer a file across the data connection using one of the following three
transmission modes: stream mode, block mode, or compressed mode.
 The stream mode is the default mode; data are delivered from FTP to TCP as a
continuous stream of bytes.
 In the block mode, data can be delivered from FTP to TCP in blocks. In this case,
each block is preceded by a 3-byte header. The first byte is called the block descriptor;
the next two bytes define the size of the block in bytes.
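
The 3-byte block header can be sketched with the struct module. The function names are hypothetical, and the descriptor value 64 used to mark the last block follows the FTP specification rather than the text above.

```python
import struct

HEADER = "!BH"  # 1-byte block descriptor + 2-byte block size, network byte order


def make_block(data, descriptor=0):
    """Prefix a block of data with its 3-byte block-mode header."""
    return struct.pack(HEADER, descriptor, len(data)) + data


def read_block(block):
    """Split a block into its descriptor and its data, checking the size field."""
    descriptor, size = struct.unpack(HEADER, block[:3])
    data = block[3:3 + size]
    if len(data) != size:
        raise ValueError("block is shorter than its header claims")
    return descriptor, data


block = make_block(b"last part of the file", descriptor=64)
print(len(block) - len(b"last part of the file"))  # size of the header in bytes
print(read_block(block))
```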
File Transfer
 File transfer occurs over the data connection under the control of the commands sent
over the control connection.
 File transfer in FTP means one of three things: retrieving a file (server to client),
storing a file (client to server), and directory listing (server to client).

 Figure shows an example of using FTP for retrieving a file.

 The figure shows only one file to be
transferred.
 The control connection remains open
all the time, but the data connection is
opened and closed repeatedly.
 We assume the file is transferred in six
sections.
 After all records have been transferred,
the server control process announces
that the file transfer is done.
 Since the client control process has no
more files to retrieve, it issues the QUIT
command, which causes the service
connection to be closed.

Security for FTP
 The FTP protocol was designed when security was not a big issue.
 Although FTP requires a password, the password is sent in plaintext (unencrypted),
which means it can be intercepted and used by an attacker.
 The data transfer connection also transfers data in plaintext, which is insecure. To be
secure, one can add a Secure Socket Layer between the FTP application layer and the
TCP layer. In this case FTP is called SSL-FTP.

Electronic Mail
 Electronic mail (or e-mail) allows users to exchange messages.
 E-mail is considered a one-way transaction. When Alice sends an e-mail to Bob, she
may expect a response, but this is not a mandate.
 Bob may or may not respond. If he does respond, it is another one-way transaction.
 It is neither feasible nor logical for Bob to run a server program and wait until
someone sends an e-mail to him.
 Bob may turn off his computer when he is not using it. This means that the idea of
client/server programming should be implemented in another way: using some
intermediate computers (servers).
 The users run only client programs when they want and the intermediate servers
apply the client/server paradigm.

Architecture
 In the common scenario, the sender and
the receiver of the e-mail, Alice and
Bob respectively, are connected via a
LAN or a WAN to two mail servers.
 The administrator has created one
mailbox for each user where the
received messages are stored.
 A mailbox is part of a server hard drive,
a special file with permission
restrictions.
 Only the owner of the mailbox has
access to it.
 The administrator has also created a
queue (spool) to store messages waiting
to be sent.
Figure : Common scenario

 A simple e-mail from Alice to Bob takes nine different steps, as shown in the figure.
 Alice and Bob use three different agents: a user agent (UA), a message transfer
agent (MTA), and a message access agent (MAA).
 The electronic mail system needs two UAs, two pairs of MTAs (client and server),
and a pair of MAAs (client and server).
 There are two important points we need to emphasize here.
‣ First, Bob cannot bypass the mail server and use the MTA server directly. To use
the MTA server directly, Bob would need to run the MTA server all the time
because he does not know when a message will arrive.
‣ Second, note that Bob needs another pair of client-server programs: message
access programs. This is because an MTA client-server program is a push program:
the client pushes the message to the server. Bob needs a pull program. The client
needs to pull the message from the server

E-mail address

Message Transfer Agent: SMTP

Figure : Protocols used in electronic mail

 The formal protocol that defines the MTA client and server in the Internet is
called Simple Mail Transfer Protocol (SMTP).
 SMTP is used two times, between the sender and the sender’s mail server and
between the two mail servers.

 SMTP uses commands and responses to transfer messages between an MTA client and
an MTA server.
 The command is from an MTA client to an MTA server; the response is from an MTA
server to the MTA client. The format of a command is shown below:
Keyword: argument(s)
 Responses are sent from the server to the client. A response is a three-digit
code that may be followed by additional textual information.
Mail Transfer Phases
1. Connection Establishment After a client has made a TCP connection to the
well-known port 25, the SMTP server starts the connection phase. This phase involves
the following three steps:
1. The server sends code 220 (service ready) to tell the client that it is ready to
receive mail. If the server is not ready, it sends code 421 (service not available).
2. The client sends the HELO message to identify itself, using its domain name
address. This step is necessary to inform the server of the domain name of the
client.
3. The server responds with code 250 (request command completed) or some other
code depending on the situation.

Message Transfer

Connection Termination
After the message is transferred successfully, the client terminates the connection. This
phase involves two steps.
1. The client sends the QUIT command.
2. The server responds with code 221 or some other appropriate code.
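
The connection establishment and termination phases can be sketched as a toy server that maps each client event to a reply code. The function name, the "<connect>" marker, the client domain, and the reply texts are assumptions made for illustration; code 500 for an unknown command comes from the SMTP standard rather than from the text above, and the message transfer phase is left out.

```python
def smtp_server_reply(event, server_ready=True):
    """Return the reply a toy SMTP server gives to one client event."""
    if event == "<connect>":  # TCP connection to port 25 has just been made
        return "220 service ready" if server_ready else "421 service not available"
    keyword = event.split(" ", 1)[0].upper()
    if keyword == "HELO":     # client identifies itself by domain name
        return "250 request command completed"
    if keyword == "QUIT":     # client ends the session
        return "221 service closing"
    return "500 command not recognized"


for event in ["<connect>", "HELO client.example", "QUIT"]:
    print(f"C: {event}")
    print(f"S: {smtp_server_reply(event)}")
```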

Message Access Agent: POP and IMAP
POP3
 Post Office Protocol, version 3 (POP3) is simple but limited in functionality.
 The client POP3 software is installed on the recipient computer; the server POP3
software is installed on the mail server.
 Mail access starts with the client when the user needs to download its e-mail from the
mailbox on the mail server.

Figure: POP3

 The client opens a connection to the server on TCP port 110. It then sends its user
name and password to access the mailbox.
 The user can then list and retrieve the mail messages, one by one.
 POP3 has two modes: the delete mode and the keep mode.
 In the delete mode, the mail is deleted from the mailbox after each retrieval. In
the keep mode, the mail remains in the mailbox after retrieval.
 The delete mode is normally used when the user is working at her permanent
computer and can save and organize the received mail after reading or replying.
 The keep mode is normally used when the user accesses her mail away from her
primary computer (for example, from a laptop). The mail is read but kept in the
system for later retrieval and organizing.
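
The difference between the two modes can be sketched with a mailbox modeled as a plain Python list. This only models what happens to the mailbox on the server; the function name and the sample messages are made up, and no real POP3 commands are issued.

```python
def retrieve_all(mailbox, mode):
    """Download every message; in delete mode the server mailbox is emptied afterwards."""
    if mode not in ("delete", "keep"):
        raise ValueError("mode must be 'delete' or 'keep'")
    downloaded = list(mailbox)   # messages are retrieved one by one
    if mode == "delete":
        mailbox.clear()          # mail is removed from the server after retrieval
    return downloaded


server_mailbox = ["message 1", "message 2"]
retrieve_all(server_mailbox, "keep")
print(len(server_mailbox))   # keep mode: mail is still on the server
retrieve_all(server_mailbox, "delete")
print(len(server_mailbox))   # delete mode: server mailbox is now empty
```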

IMAP4
 Another mail access protocol is Internet Message Access Protocol, version 4 (IMAP4).
 IMAP4 is more powerful and more complex.
 POP3 is deficient in several ways. It does not allow the user to organize her mail on
the server; the user cannot have different folders on the server.
 In addition, POP3 does not allow the user to partially check the contents of the mail
before downloading.
 IMAP4 provides the following extra functions:
‣ A user can check the e-mail header prior to downloading.
‣ A user can search the contents of the e-mail for a specific string of characters prior
to downloading.
‣ A user can partially download e-mail. This is especially useful if bandwidth is
limited and the e-mail contains multimedia with high bandwidth requirements.
‣ A user can create, delete, or rename mailboxes on the mail server.
‣ A user can create a hierarchy of mailboxes in a folder for e-mail storage.

MIME
 Multipurpose Internet Mail Extensions (MIME) is a supplementary protocol that
allows non-ASCII data to be sent through e-mail.
 We can think of MIME as a set of software functions that transforms non-ASCII data to
ASCII data and vice versa

MIME Headers
 MIME defines five headers which can be added to the original e-mail header section to
define the transformation parameters:

 MIME-Version This header defines the version of MIME used. The current version is 1.0.
 Content-Type This header defines the type of data used in the body of the message. The
content type and the content subtype are separated by a slash. Depending on the subtype,
the header may contain other parameters. MIME allows seven different types of data.

 Content-Transfer-Encoding This header defines the method used to encode the
messages into 0s and 1s for transport. The five types of encoding methods are listed in
Table

 Content-ID This header uniquely identifies the whole message in a multiple message
environment.
 Content-Description This header defines whether the body is image, audio, or video.
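
The MIME headers can be observed with Python's standard email package, which adds them automatically when a message is built. The addresses, subject, file name, and the few sample bytes standing in for an image are all made up; the point of the sketch is that binary data ends up encoded as ASCII text.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Photo"
msg.set_content("See the attached image.")

# A few non-ASCII bytes standing in for real image data.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00, 0x80])
msg.add_attachment(image_bytes, maintype="image", subtype="png",
                   filename="photo.png")

attachment = next(msg.iter_attachments())
print("MIME-Version:", msg["MIME-Version"])
print("Content-Type (message):", msg.get_content_type())
print("Content-Type (attachment):", attachment.get_content_type())
print("Content-Transfer-Encoding:", attachment["Content-Transfer-Encoding"])

# The whole message is plain ASCII on the wire, yet the bytes survive intact.
print(msg.as_string().isascii())
print(attachment.get_content() == image_bytes)
```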
E-Mail Security
 E-mail exchanges can be secured using application-layer security protocols designed in
particular for e-mail systems.
 Two of these protocols are Pretty Good Privacy (PGP) and Secure/Multipurpose
Internet Mail Extensions (S/MIME).

TELNET
 Telnet, an acronym for “TErminaL NETwork”, is a network protocol used on the
Internet or local area networks.
Local versus Remote Logging
 When a user logs into a local system, it is called local logging.
 As a user types at a terminal or at a workstation running a terminal emulator, the
keystrokes are accepted by the terminal driver.
 The terminal driver passes the characters to the operating system.
 The operating system, in turn, interprets the combination of characters and invokes the
desired application program or utility.

 When a user wants to access an application program or utility located on a remote
machine, she performs remote logging.
 The user sends the keystrokes to the terminal driver where the local operating system
accepts the characters but does not interpret them.
 The characters are sent to the TELNET client, which transforms the characters into a
universal character set called Network Virtual Terminal (NVT) characters and delivers
them to the local TCP/IP stack.

Network Virtual Terminal (NVT)
 The network virtual terminal is an interface that defines how data and commands are
sent across the network.

Figure : Concept of NVT

 NVT uses two sets of characters, one for data and one for control. Both are 8-bit bytes
as shown in Figure.
 For data, NVT normally uses what is called NVT ASCII. This is an 8-bit character set
in which the seven lowest order bits are the same as US ASCII and the highest order
bit is 0.
 To send control characters between computers (from client to server or vice versa),
NVT uses an 8-bit character set in which the highest order bit is set to 1.
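
The split between data and control characters comes down to testing the highest-order bit of each byte. In this sketch the function name is hypothetical, and the value 255 is used as a sample control byte (it is the TELNET "interpret as command" byte in the standard, which the text above does not mention).

```python
def classify_nvt_byte(byte):
    """Classify an 8-bit NVT character by its highest-order bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("an NVT character is a single 8-bit byte")
    return "control" if byte & 0x80 else "data"


print(classify_nvt_byte(ord("A")))  # highest-order bit is 0
print(classify_nvt_byte(255))       # highest-order bit is 1
```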

Options
 TELNET lets the client and server negotiate options before or during the use of the
service.
 Options are extra features available to a user with a more sophisticated terminal.
 Users with simpler terminals can use default features.
User Interface
 The operating system (UNIX, for example) defines an interface with user-friendly
commands.
 An example of such a set of commands can be found in Table

Secure Shell (SSH)
 SSH is an application-layer protocol with three components.

Figure : Components of SSH

SSH Transport-Layer Protocol (SSH-TRANS)


 Since TCP is not a secured transport-layer protocol, SSH first uses a protocol that
creates a secured channel on top of the TCP. This new layer is an independent
protocol referred to as SSH-TRANS.
 When the procedure implementing this protocol is called, the client and server
first use the TCP protocol to establish an insecure connection. Then they exchange
several security parameters to establish a secure channel on top of the TCP.

 The services provided by this protocol:
1. Privacy or confidentiality of the message exchanged
2. Data integrity, which means that it is guaranteed that the messages exchanged between
the client and server are not changed by an intruder
3. Server authentication, which means that the client is now sure that the server is the one
that it claims to be
4. Compression of the messages, which improves the efficiency of the system and makes
attack more difficult
SSH Authentication Protocol (SSH-AUTH)
 After a secure channel is established between the client and the server and the server is
authenticated for the client, SSH can call another procedure that can authenticate the
client for the server.
 Authentication starts with the client, which sends a request message to the server.
 The request includes the user name, server name, the method of authentication, and the
required data.
 The server responds with either a success message, which confirms that the client is
authenticated, or a failed message, which means that the process needs to be repeated
with a new request message.

SSH Connection Protocol (SSH-CONN)
 After the secured channel is established and both server and client are authenticated
for each other, SSH can call a piece of software that implements the third protocol,
SSH-CONN.
 One of the services provided by the SSH-CONN protocol is multiplexing.
 SSH-CONN takes the secure channel established by the two previous protocols and
lets the client create multiple logical channels over it.
 Each channel can be used for a different purpose, such as remote logging, file transfer,
and so on.
Applications
1. SSH for Remote Logging: Several free and commercial applications use SSH for
remote logging. Among them, we can mention PuTTY, by Simon Tatham, which is a
client SSH program that can be used for remote logging. Another application
program is Tectia, which can be used on several platforms.

2. SSH for File Transfer : One of the application programs that is built on top of SSH for
file transfer is the Secure File Transfer Program (sftp). The sftp application program
uses one of the channels provided by the SSH to transfer files. Another common
application is called Secure Copy (scp). This application uses the same format as the
UNIX copy command, cp, to copy files.
3. Port Forwarding : We can use the secured channels available in SSH to access an
application program that does not provide security services. Applications such as
TELNET and Simple Mail Transfer Protocol (SMTP) can use the services of the SSH
port forwarding mechanism. The SSH port forwarding mechanism creates a tunnel
through which the messages belonging to other protocols can travel. For this
reason, this mechanism is sometimes referred to as SSH tunneling.

Figure: Port forwarding

Format of the SSH Packets

 The length field defines the length of the packet but does not include the
padding.
 One to eight bytes of padding is added to the packet to make the attack on the
security provision more difficult.
 The cyclic redundancy check (CRC) field is used for error detection.
 The type field designates the type of the packet used in different SSH protocols.
 The data field is the data transferred by the packet in different protocols.
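
The packet layout can be sketched as follows. The field widths (4-byte length, 1-byte type, 4-byte CRC), the rule that the length covers type, data, and CRC, the choice of padding so that everything after the length field is a multiple of 8 bytes, and the use of zlib's CRC-32 are all assumptions made for this illustration; the text above does not specify them.

```python
import os
import struct
import zlib


def build_packet(packet_type, data):
    """Assemble length, padding, type, data, and CRC into one packet."""
    length = 1 + len(data) + 4              # type + data + CRC, padding excluded
    pad_len = 8 - (length % 8)              # always between 1 and 8 bytes
    body = os.urandom(pad_len) + bytes([packet_type]) + data
    crc = zlib.crc32(body)                  # error-detection value over the body
    return struct.pack("!I", length) + body + struct.pack("!I", crc)


def parse_packet(packet):
    """Check the CRC, skip the padding, and return (type, data)."""
    (length,) = struct.unpack("!I", packet[:4])
    pad_len = 8 - (length % 8)
    body = packet[4:-4]
    (crc,) = struct.unpack("!I", packet[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch: packet was corrupted")
    return body[pad_len], body[pad_len + 1:]


packet = build_packet(5, b"hello")
print(parse_packet(packet))
```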

DOMAIN NAME SYSTEM (DNS)
 To identify an entity, TCP/IP protocols use the IP address, which uniquely identifies
the connection of a host to the Internet. However, people prefer to use names
instead of numeric addresses.
 Therefore, the Internet needs to have a directory system that can map a name to an
address.
 Since the Internet is so huge today, a central directory system cannot hold all the
mapping. In addition, if the central computer fails, the whole communication
network will collapse.
 A better solution is to distribute the information among many computers in the
world. In this method, the host that needs mapping can contact the closest
computer holding the needed information. This method is used by the Domain
Name System (DNS).

Figure : Purpose of DNS

 The figure shows how TCP/IP uses a DNS client and a DNS server to map a name to
an address.
 A user wants to use a file transfer client to access the corresponding file transfer
server running on a remote host.
 The user knows only the file transfer server name, such as afilesource.com. However,
the TCP/IP suite needs the IP address of the file transfer server to make the
connection.

 The following six steps map the host name to an IP address:

1. The user passes the host name to the file transfer client.
2. The file transfer client passes the host name to the DNS client.
3. Each computer, after being booted, knows the address of one DNS server. The DNS
client sends a message to a DNS server with a query that gives the file transfer
server name using the known IP address of the DNS server.
4. The DNS server responds with the IP address of the desired file transfer server.
5. The DNS client passes the IP address to the file transfer client.
6. The file transfer client now uses the received IP address to access the file transfer
server.
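
The six steps can be sketched with three small functions standing in for the file transfer client, the DNS client, and the DNS server. The address 192.0.2.10 is a made-up documentation address, the lookup table replaces a real DNS server, and all function names are hypothetical.

```python
# Stand-in for the DNS server's data; the address is hypothetical.
DNS_SERVER_TABLE = {"afilesource.com": "192.0.2.10"}


def dns_server(query_name):
    """Step 4: answer a query with the IP address, or None if the name is unknown."""
    return DNS_SERVER_TABLE.get(query_name)


def dns_client(host_name, server=dns_server):
    """Steps 3 and 5: send the query to the known server and hand back the address."""
    address = server(host_name)
    if address is None:
        raise LookupError(f"no address found for {host_name}")
    return address


def file_transfer_client(host_name):
    """Steps 1, 2, and 6: take the name from the user, resolve it, then connect."""
    address = dns_client(host_name)
    return f"connecting to file transfer server at {address}"


print(file_transfer_client("afilesource.com"))
```

In a real program the role of the DNS client is usually played by the operating system's resolver, reached through a library call, rather than by application code.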

Name Space
 The names must be unique because the addresses are unique.
 A name space that maps each address to a unique name can be organized in two
ways: flat or hierarchical.
 In a flat name space, a name is assigned to an address.
 A name in this space is a sequence of characters without structure. The names may or
may not have a common section; if they do, it has no meaning.
 The main disadvantage of a flat name space is that it cannot be used in a large system
such as the Internet because it must be centrally controlled to avoid ambiguity and
duplication.
 In a hierarchical name space, each name is made of several parts. The first part can
define the nature of the organization, the second part can define the name of an
organization, the third part can define departments in the organization, and so on.
 In this case, the authority to assign and control the name spaces can be decentralized.
A central authority can assign the part of the name that defines the nature of the
organization and the name of the organization.

 The responsibility for the rest of the name can be given to the organization itself. The
organization can add suffixes (or prefixes) to the name to define its host or resources.
 The management of the organization need not worry that the prefix chosen for a
host is taken by another organization because, even if part of an address is the same,
the whole address is different.
 For example, assume two organizations call one of their computers caesar. The first
organization is given a name by the central authority, such as first.com, the second
organization is given the name second.com.
 When each of these organizations adds the name caesar to the name they have
already been given, the end result is two distinguishable names: caesar.first.com and
caesar.second.com. The names are unique.

Domain Name Space
 To have a hierarchical name space, a domain name space was designed. In this design the
names are defined in an inverted-tree structure with the root at the top.
 The tree can have only 128 levels: level 0 (root) to level 127

Figure : Domain name space


Label
 Each node in the tree has a label, which is a string with a maximum of 63 characters.
 The root label is a null string (empty string).
 DNS requires that children of a node (nodes that branch from the same node) have
different labels, which guarantees the uniqueness of the domain names.

Domain Name
 Each node in the tree has a domain name.
 A full domain name is a sequence of labels separated by dots (.). The domain names
are always read from the node up to the root.
 The last label is the label of the root (null). This means that a full domain name always
ends in a null label, which means the last character is a dot because the null string is
nothing.
 If a domain name is terminated by a null label, it is called
a fully qualified domain name (FQDN).
 The name must end with a null label, but because
null means nothing, the name ends with a dot.
 If a domain name is not terminated by a null label, it is
called a partially qualified domain name (PQDN).
 A PQDN starts from a node, but it does not reach
the root. It is used when the name to be resolved
belongs to the same site as the client.
 Here the resolver can supply the missing part,
called the suffix, to create an FQDN.
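 How a resolver completes a PQDN can be sketched as follows. The helper names is_fqdn and qualify, and the site suffix first.com. used in the usage note, are assumptions made for illustration.

```python
def is_fqdn(name: str) -> bool:
    """An FQDN ends with the null root label, written as a trailing dot."""
    return name.endswith(".")


def qualify(name: str, suffix: str) -> str:
    """Turn a PQDN into an FQDN by appending the site's suffix."""
    if is_fqdn(name):
        return name                      # already complete, nothing to add
    return f"{name}.{suffix.strip('.')}."
```

 For a client at the site first.com., qualify("caesar", "first.com.") yields caesar.first.com., while a name that already ends in a dot is returned unchanged.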
Figure : Domain names and labels



Domain
 A domain is a subtree of the domain name space. The name of the domain is the name of the
node at the top of the subtree.
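 Because a domain is a subtree, testing whether a name belongs to a domain amounts to comparing labels from the root downward. A minimal sketch, with the hypothetical helpers labels_of and in_domain (names are lowercased because DNS compares names case-insensitively):

```python
def labels_of(name: str) -> list:
    """Split a name into lowercase labels, ignoring the trailing root dot."""
    return [label for label in name.lower().split(".") if label]


def in_domain(name: str, domain: str) -> bool:
    """True if `name` lies in the subtree whose top node is `domain`."""
    n, d = labels_of(name), labels_of(domain)
    # Compare whole labels, not raw strings, so notfirst.com is not mistaken
    # for a member of first.com.
    return len(n) >= len(d) and n[len(n) - len(d):] == d
```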

Figure : Domains

Distribution of Name Space


 The information contained in the domain name space must be stored. However, it is very
inefficient and also not reliable to have just one computer store such a huge amount of
information.
 It is inefficient because responding to requests from all over the world places a heavy
load on the system. It is not reliable because any failure makes the data inaccessible.
Hierarchy of Name Servers
 The solution to these problems is to distribute the information among many
computers called DNS servers.
 One way to do this is to divide the whole space into many domains based on the first
level.
 We let the root stand alone and create as many domains (subtrees) as there are first-
level nodes. Because a domain created this way could be very large, DNS allows
domains to be divided further into smaller domains (subdomains).
 Each server can be responsible (authoritative) for either a large or small domain.

Figure : Hierarchy of name servers



Zone
 Since the complete domain name hierarchy cannot be
stored on a single server, it is divided among many
servers. What a server is responsible for or has
authority over is called a zone.
 We can define a zone as a contiguous part of the entire
tree. If a server accepts responsibility for a domain and
does not divide the domain into smaller domains, the
“domain” and the “zone” refer to the same thing.
 The server makes a database called a zone file and
keeps all the information for every node under that
domain.
 The information about the nodes in the subdomains is
stored in the servers at the lower levels, with the
original server keeping some sort of reference to these
lower-level servers.
 Of course, the original server does not free itself from
responsibility totally. It still has a zone, but the detailed
information is kept by the lower-level servers.
Figure : Zone
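 The relationship between a zone and its delegated subdomains can be modelled as a small data structure. This is only a sketch: the class Zone, its fields, and the names and addresses in the usage note are hypothetical, and a real zone file holds far more than one address per name.

```python
class Zone:
    """What one server is responsible for: its own records plus references
    to the lower-level servers that hold delegated subdomains."""

    def __init__(self, apex, records, delegations):
        self.apex = apex                # name of the node at the top of the zone
        self.records = records          # name -> address, kept in this zone file
        self.delegations = delegations  # subdomain name -> lower-level server

    def lookup(self, name):
        """Answer from the zone file, or refer to a lower-level server."""
        if not (name == self.apex or name.endswith("." + self.apex)):
            return ("outside", None)            # name is not under this zone at all
        for sub, server in self.delegations.items():
            if name == sub or name.endswith("." + sub):
                return ("referral", server)     # details live at the lower level
        if name in self.records:
            return ("answer", self.records[name])
        return ("unknown", None)
```

 With com = Zone("com.", {"shop.com.": "192.0.2.10"}, {"first.com.": "ns.first.com."}), the call com.lookup("caesar.first.com.") returns a referral to ns.first.com., whereas com.lookup("shop.com.") is answered directly from the zone file.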
Root Server
 A root server is a server whose zone consists of the whole tree.
 A root server usually does not store any information about domains but delegates its
authority to other servers, keeping references to those servers.
 There are several root servers, each covering the whole domain name space. The
root servers are distributed all around the world.
Primary and Secondary Servers
 DNS defines two types of servers: primary and secondary.
 A primary server is a server that stores a file about the zone for which it is an
authority. It is responsible for creating, maintaining, and updating the zone file. It
stores the zone file on a local disk.
 A secondary server is a server that transfers the complete information about a zone
from another server (primary or secondary) and stores the file on its local disk. The
secondary server neither creates nor updates the zone files. If updating is required, it
must be done by the primary server, which sends the updated version to the
secondary.



 The primary and secondary servers are both authoritative for the zones they serve.
 The idea is not to put the secondary server at a lower level of authority but to create
redundancy for the data so that if one server fails, the other can continue serving
clients.
DNS in the Internet
 DNS is a protocol that can be used on different platforms.
 In the Internet, the domain name space (tree) was originally divided into three
different sections: generic domains, country domains, and the inverse domain.

Generic Domains
- The generic domains define registered hosts according to their generic behavior.
- Each node in the tree defines a domain, which is an index to the domain name
space database.



Figure : Generic domains
 Looking at the tree, we see that the first level in the generic domains section allows
14 possible labels.



Country Domains
 The country domains section uses two-character country abbreviations (e.g., us for
United States).
 Second labels can be organizational, or they can be more specific national
designations.
 The United States, for example, uses state abbreviations as a subdivision of us (e.g.,
ca.us.).

Figure : Country domains

 The address uci.ca.us. can be translated to University of California, Irvine, in the state
of California in the United States.



Resolution
 Mapping a name to an address is called name-address resolution.
 DNS is designed as a client-server application.
 A host that needs to map an address to a name or a name to an address invokes a DNS
client called a resolver.
 The resolver accesses the closest DNS server with a mapping request. If the server has
the information, it satisfies the resolver; otherwise, it either refers the resolver to
other servers or asks other servers to provide the information.
 After the resolver receives the mapping, it interprets the response to see if it is a real
resolution or an error, and finally delivers the result to the process that requested it.
 A resolution can be either recursive or iterative.
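 The difference between the two modes can be sketched with a toy hierarchy of three servers. Everything here is an assumption made for illustration: the SERVERS table, the server names, and the address 192.0.2.7. A server either answers, or points to the next server (a referral).

```python
# Hypothetical hierarchy: root -> com server -> first.com server.
SERVERS = {
    "root":         {"answers": {}, "refer": {"com.": "com-server"}},
    "com-server":   {"answers": {}, "refer": {"first.com.": "first-server"}},
    "first-server": {"answers": {"caesar.first.com.": "192.0.2.7"}, "refer": {}},
}


def ask(server, name):
    """One query to one server: ('answer', ip), ('referral', next) or ('error', None)."""
    info = SERVERS[server]
    if name in info["answers"]:
        return ("answer", info["answers"][name])
    for suffix, next_server in info["refer"].items():
        if name == suffix or name.endswith("." + suffix):
            return ("referral", next_server)
    return ("error", None)


def resolve_iterative(name, server="root"):
    """The requester follows every referral itself, one server after another."""
    contacted = []
    while True:
        contacted.append(server)
        kind, value = ask(server, name)
        if kind != "referral":
            return value, contacted
        server = value


def resolve_recursive(name, server="root"):
    """Each server asks the next one on the requester's behalf and relays the result."""
    kind, value = ask(server, name)
    if kind == "referral":
        return resolve_recursive(name, value)
    return value
```

 Both functions return the same address for caesar.first.com.; what differs is who does the work. In the iterative version the requester contacts all three servers, while in the recursive version it sends a single query and the chain of servers resolves the rest.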



Recursive Resolution

Iterative Resolution



Caching
 Each time a server receives a query for a name that is not in its domain, it needs to
search its database for a server IP address.
 Reduction of this search time would increase efficiency. DNS handles this with a
mechanism called caching.
 When a server asks for a mapping from another server and receives the response, it
stores this information in its cache memory before sending it to the client.
 If the same or another client asks for the same mapping, the server can check its cache
memory and answer without querying other servers.
 Caching speeds up resolution, but it can also be problematic. If a server caches a
mapping for a long time, it may send an outdated mapping to the client.
 To counter this, two techniques are used.
 First, the authoritative server always adds a piece of information called time to live
(TTL) to the mapping. It defines the time in seconds that the receiving server can cache
the information. After that time, the mapping is invalid and any query must be sent again
to the authoritative server.
 Second, DNS requires that each server keep a TTL counter for each mapping it caches.
The cache memory must be searched periodically and those mappings with an expired
TTL must be purged.
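 The two techniques (a TTL attached to each mapping, and periodic purging of expired entries) can be sketched as a small cache. The class DnsCache and its method names are assumptions; the clock is passed in so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Cache of name -> address mappings, each valid for its own TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}      # name -> (address, expiry time in seconds)

    def put(self, name, address, ttl):
        """Store a mapping together with the moment its TTL runs out."""
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        """Return the cached address, or None if it is absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[name]     # outdated mapping must not be served
            return None
        return address

    def purge(self):
        """Periodic sweep: drop every mapping whose TTL has expired."""
        now = self._clock()
        expired = [n for n, (_, exp) in self._entries.items() if now >= exp]
        for name in expired:
            del self._entries[name]
        return len(expired)
```

 A get that returns None corresponds to the case in which the query must be sent again to the authoritative server.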
Resource Records
 The zone information associated with a server is implemented as a set of resource
records.
 A resource record is a 5-tuple structure, as shown below:
(Domain Name, Type, Class, TTL, Value)
 The domain name field is what identifies the resource record.


 The value defines the information kept about the domain name.
 The TTL defines the number of seconds for which the information is valid.
 The class defines the type of network; we are only interested in the class IN
(Internet). The type defines how the value should be interpreted.
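 A resource record maps naturally onto a five-field tuple. In the sketch below the field order, the record type A (a name-to-IPv4-address mapping), and the sample names and addresses are assumptions chosen for illustration.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    domain_name: str   # identifies the record
    rr_type: str       # how the value should be interpreted, e.g. "A"
    rr_class: str      # type of network; "IN" for the Internet
    ttl: int           # seconds for which the information is valid
    value: str         # the information kept about the domain name


def find(records, name, rr_type):
    """Return the values of all records matching a name and a type."""
    return [r.value for r in records
            if r.domain_name == name and r.rr_type == rr_type]


# A hypothetical zone with two records.
ZONE = [
    ResourceRecord("caesar.first.com.", "A", "IN", 3600, "192.0.2.7"),
    ResourceRecord("www.first.com.", "A", "IN", 300, "192.0.2.80"),
]
```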



DNS Messages
 To retrieve information about hosts, DNS uses two types of messages: query and
response.
 Both types have the same format, as shown in the figure.



 The identification field is used by the client to match the response with the query.
 The flag field defines whether the message is a query or response. It also includes
the error status.
 The next four fields in the header define the number of each record type in the
message.
 The question section consists of one or more question records. It is present in both
query and response messages.
 The answer section consists of one or more resource records. It is present only in
response messages.
 The authoritative section gives information (domain name) about one or more
authoritative servers for the query.
 The additional information section provides additional information that may help the
resolver.
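 The header fields described above can be made concrete by packing a query by hand. The byte layout used here (a 12-byte header of six 16-bit fields, names encoded as length-prefixed labels ending in a zero byte) follows the DNS wire format of RFC 1035 rather than anything stated on these slides; the function names are illustrative assumptions.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a name as length-prefixed labels, closed by the null root label."""
    out = b""
    for label in (l for l in name.split(".") if l):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(ident: int, name: str) -> bytes:
    """Build a query with one question record asking for an address (type A, class IN)."""
    flags = 0x0100      # QR bit 0 = query; RD bit set = recursion desired
    # Header: identification, flags, then the four record counts.
    header = struct.pack("!HHHHHH", ident, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN
    return header + question


def parse_header(message: bytes) -> dict:
    """Decode the 12-byte header of a query or response."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),   # top flag bit: query or response
        "rcode": flags & 0x000F,               # error status, 0 = no error
        "questions": qd, "answers": an, "authority": ns, "additional": ar,
    }
```

 The client keeps the identification value it placed in the query and compares it with the id of each incoming response to pair the two.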



Registrars
 How are new domains added to DNS? This is done through a registrar, a commercial
entity accredited by ICANN.
 A registrar first verifies that the requested domain name is unique and then enters it
into the DNS database. A fee is charged.
 Today, there are many registrars; their names and addresses can be found at

 To register, the organization needs to give the name of its server and the IP address of
the server.
 For example, a new commercial organization named wonderful with a server named
ws and IP address 200.200.200.5 needs to give the following information to one of the
registrars:
Domain name: ws.wonderful.com
IP address: 200.200.200.5



DDNS
 The DNS master file must be updated dynamically.
 The Dynamic Domain Name System (DDNS) was therefore devised to
respond to this need.
 In DDNS, when a binding between a name and an address is determined,
the information is sent, usually by DHCP, to a primary DNS server.
 The primary server updates the zone.
 The secondary servers are notified either actively or passively.
 In active notification, the primary server sends a message to the
secondary servers about the change in the zone, whereas in passive
notification, the secondary servers periodically check for any changes.
 In either case, after being notified about the change, the secondary
server requests information about the entire zone (called the zone
transfer).
 To provide security and prevent unauthorized changes in the DNS
records, DDNS can use an authentication mechanism.
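 Active and passive notification can be contrasted in a short simulation. The classes Primary and Secondary, the version counter used to detect changes, and the sample name and address are assumptions made for this sketch; authentication is omitted.

```python
class Primary:
    """Primary server: owns the zone and applies dynamic updates."""

    def __init__(self):
        self.zone = {}
        self.version = 0          # hypothetical counter, bumped on every change
        self.subscribers = []     # secondaries that are notified actively

    def update(self, name, address):
        """Record a new binding (e.g. sent by DHCP) and notify actively."""
        self.zone[name] = address
        self.version += 1
        for secondary in self.subscribers:
            secondary.on_notify(self)


class Secondary:
    """Secondary server: never edits the zone, only copies it."""

    def __init__(self):
        self.zone = {}
        self.version = 0

    def zone_transfer(self, primary):
        """Request information about the entire zone."""
        self.zone = dict(primary.zone)
        self.version = primary.version

    def on_notify(self, primary):
        """Active notification: the primary announced a change."""
        self.zone_transfer(primary)

    def poll(self, primary):
        """Passive notification: check periodically, transfer only if changed."""
        if primary.version != self.version:
            self.zone_transfer(primary)
            return True
        return False
```

 After the primary applies an update, a subscribed secondary is already in sync, whereas a polling secondary catches up only at its next check.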
Security of DNS
 DNS is one of the most important systems in the Internet infrastructure; it provides
crucial services to Internet users.
 Applications such as Web access or e-mail are heavily dependent on the proper
operation of DNS. DNS can be attacked in several ways including:
1. The attacker may read the response of a DNS server to find the nature or names of
the sites the user accesses most. This type of information can be used to build the
user’s profile. To prevent this attack, DNS messages need to be confidential.
2. The attacker may intercept the response of a DNS server and change it or create a
totally new bogus response to direct the user to the site or domain the attacker
wishes the user to access. This type of attack can be prevented using message
origin authentication and message integrity.
3. The attacker may flood the DNS server to overwhelm it or eventually crash it. This
type of attack can be prevented using provisions against denial-of-service attacks.
 To protect DNS, the IETF has devised a technology named DNS Security (DNSSEC) that
provides message origin authentication and message integrity using a security service
called a digital signature.