
Dynamic Web Site Development - Introduction To Web Technology

The document provides an introduction to basic web technology concepts. It discusses how the Internet began as a US military network and has since grown globally. It describes the TCP/IP protocol that allows heterogeneous computers and networks to communicate by establishing connections and ensuring data delivery. It also discusses IP addresses that uniquely identify devices, and domain names that provide a human-friendly naming system via the Domain Name System. Port numbers further allow multiple applications to use a single IP address for communication.

Uploaded by Anna Ho
Copyright © Attribution Non-Commercial (BY-NC)


This module introduces the basic communication paradigm of the
Internet. The basic concepts of protocols, port numbers, IP addressing
etc. are required to understand the communication between a
web client and a web server.

The Internet
The Internet grew out of research projects run by the United States
Department of Defense (DoD) in the late 1960s and 1970s. The US
military was experimenting with wide-area networks to see if it was
possible to link computers across the US so that they could continue
communicating with each other in the event of a nuclear war. This
was desirable, as most of the early warning and defense systems
were becoming computerized in the 1970s.

In the mid-1980s the Internet began a period of explosive growth as
government agencies, academic institutions, private research
laboratories and corporations began to interconnect their
computers in a network that has come to span the globe.
Although in the minds of many people the World Wide Web has
become synonymous with the Internet, the two are quite distinct.
The Internet is a hierarchical conglomeration of connected networks.
A network can be as small as two machines connected together or
as large as is convenient for client communication. At the junctions
between networks there are dedicated computers called routers. It is
a router's job (among other things) to decide how to get a piece of
data from its source to its destination.

The novelty of the Internet is in its heterogeneity. The machines
connected to it range from personal computers to high-speed
supercomputers to devices that are not normally viewed as
computers at all, for example printers. The physical links that
interconnect these machines are also heterogeneous; they include
optical fibers, microwave links, communication satellites, coaxial
cables, copper wires, telephone lines etc.


Protocols
There is no point in connecting heterogeneous computers if they
cannot exchange information. The difficulty of interconnecting
computers is that they do not all speak the same language.
The one thing that all parts of the Internet have in common is the
protocol they use to send information from one machine to another.
The protocol used is TCP/IP (Transmission Control Protocol/Internet
Protocol). This language (and grammar) specifies how two computers
can send each other data, how they introduce themselves, and how
they conduct a conversation.

Using TCP/IP any computer can contact any other computer on the
Internet and exchange data with it provided that
1. it knows the remote computer's address, and
2. the remote computer is willing to talk.

TCP/IP is a low-level protocol: it is used to establish the link, set
up the line of communication and ensure the data arrives at the
destination. TCP/IP has no knowledge of the contents of the data or
of high-level structures; to TCP/IP all data is a linear stream of 8-bit
numbers (bytes). To use an analogy, the telephone company establishes
the line of communication when you dial a telephone number, keeps it
open, and ensures all the data (your voice) arrives at the
destination. It is up to you, not the telephone company, to ensure
that the data you send is understandable at the other end, that is,
that you both speak the same language! TCP/IP will maintain the link
and ensure the data arrives intact; something else must ensure that
the data has meaning.
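The telephone analogy can be made concrete with a small Python sketch (our own illustration, not part of the original course material; the helper names `to_wire` and `from_wire` are hypothetical). It shows text being turned into the 8-bit numbers TCP/IP actually carries, and what happens when the two ends disagree about what those numbers mean. UTF-8 and Latin-1 are simply example encodings.

```python
# Illustration only: TCP/IP transports bytes (8-bit numbers).
# Agreeing on what those bytes *mean* is up to the programs at each end.
# to_wire/from_wire are hypothetical helper names.

def to_wire(text: str, encoding: str) -> bytes:
    """Turn text into the raw bytes that would be handed to TCP/IP."""
    return text.encode(encoding)


def from_wire(data: bytes, encoding: str) -> str:
    """Interpret received bytes as text, using the receiver's chosen encoding."""
    return data.decode(encoding)


if __name__ == "__main__":
    print(list(to_wire("Hi", "ascii")))  # [72, 105]: just two 8-bit numbers

    sent = to_wire("café", "utf-8")
    # Both ends agree on UTF-8: the meaning survives.
    print(from_wire(sent, "utf-8"))      # café
    # The receiver wrongly assumes Latin-1: the bytes arrived intact,
    # but the meaning did not.
    print(from_wire(sent, "latin-1"))    # cafÃ©
```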

Obviously it is useless to develop the infrastructure (the Internet,
TCP/IP) to link disparate computers if the data they send each other
is incomprehensible. Alongside the physical infrastructure for the
exchange of data there was a corresponding development of high-level
data exchange protocols. One of the oldest, and still one of the most
important, is the Simple Mail Transfer Protocol (SMTP). This is the
language used by different computers to transfer electronic mail
around the world. The protocol allows computers to recognise mail
messages and pass each message on to its recipient.

The program Telnet (and the protocol it uses) was developed to
allow users to connect to remote computers and log onto them (if
they have a valid account). From the remote machine they can
interact with the computer as if they were directly connected to it.
The File Transfer Protocol (FTP) was developed to streamline the
retrieval of large files from file archives. If you know the name of the
archive machine, FTP can be used to browse the archive and retrieve
the required files. Irrespective of the computer you use, the
commands used in the FTP protocol are the same for getting files
from and putting files onto a remote computer's file system.

The Network News Transfer Protocol (NNTP) is used for Usenet, the
Internet news system. This was designed as a communal bulletin
board, where anyone could post information, state their opinions on
any topic, and generally have their say.

There are many more protocols designed for a variety of useful
tasks (e.g. Archie, Gopher, WAIS); some have been successful, some
have not. One major difficulty with these protocols is that each
required the user to master a different piece of software, no two
with the same interface.


IP Addresses
The telephone company assigns a unique telephone number to
every telephone; no two telephone numbers in the world are the
same (remember that a full telephone number incorporates the
country code and area code). TCP/IP works the same way: every
machine on the Internet is assigned a unique number, its IP
address. IP addresses are 32-bit numbers that are usually written
out as four 8-bit numbers separated by dots. There are
approximately 4 billion (2^32) addresses available, which may seem
sufficient for some time into the future; unfortunately this is not
the case, for a number of reasons:
- some addresses are reserved for special purposes, such as
  multicasting;
- more importantly, addresses are issued in contiguous blocks, not
  individually.
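As a sketch of the "32-bit number written as four 8-bit numbers" idea, the hypothetical helpers below convert between dotted notation and the underlying integer. Python's standard `ipaddress` module does the same job (for example `int(ipaddress.IPv4Address("139.86.1.1"))`); this hand-written version just makes the bit shifting visible.

```python
# Hypothetical helpers showing that an IPv4 address is one 32-bit number.

def dotted_to_int(address: str) -> int:
    """Convert dotted notation such as '139.86.1.1' into one 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or not all(0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part  # make room for the next 8 bits, then add them
    return value


def int_to_dotted(value: int) -> str:
    """Convert a 32-bit integer back into four dot-separated 8-bit numbers."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    n = dotted_to_int("139.86.1.1")
    print(n)                 # 2337669377
    print(int_to_dotted(n))  # 139.86.1.1
    print(2**32)             # 4294967296, i.e. about 4 billion addresses
```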

IP addressing is organized hierarchically in a series of networks
and subnetworks. Blocks of contiguous addresses are issued to
organizations and regional networks, which in turn issue sub-blocks
of addresses. For example, the University of Southern Queensland has
been issued the block of addresses 139.86.1.1 to 139.86.255.255.
Within this range the university allocates blocks to different
departments.
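The delegation of sub-blocks can be sketched with Python's standard `ipaddress` module. This sketch assumes the university's range corresponds to the network 139.86.0.0/16 (which spans 139.86.0.0 to 139.86.255.255); the helper `allocate`, the department names and the /24 sub-block size are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import ipaddress


def allocate(parent: str, departments: list[str], new_prefix: int) -> dict[str, ipaddress.IPv4Network]:
    """Hypothetical helper: give each department the next free sub-block of `parent`."""
    blocks = ipaddress.ip_network(parent).subnets(new_prefix=new_prefix)  # lazy iterator
    allocation = {}
    for dept in departments:
        try:
            allocation[dept] = next(blocks)
        except StopIteration:
            raise ValueError(f"{parent} has no sub-blocks left for {dept!r}") from None
    return allocation


if __name__ == "__main__":
    # Assumed /16 block; hypothetical department names; a /24 holds 256 addresses.
    for dept, block in allocate("139.86.0.0/16", ["sci", "eng", "library"], 24).items():
        print(f"{dept:8} {block} ({block.num_addresses} addresses)")
```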

Organizationally, it's simpler to give blocks of addresses to
organizations and allow those organizations to divide them up as
they see fit. Technically, it's much easier for network routers to
determine how to get data from one address to another when the
Internet is organized into a hierarchy of networks and subnetworks.
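To see why the hierarchy helps routers, here is a toy routing table in Python. Real IP routers pick the most specific matching entry (longest-prefix match); the networks, next-hop names and the helper `next_hop` below are invented for illustration. Note that a single entry covers the whole university block, so a router does not need one entry per machine.

```python
from __future__ import annotations

import ipaddress

# Toy routing table (all entries invented): (destination network, next hop).
ROUTES = [
    ("0.0.0.0/0", "isp-uplink"),          # default route: anything not listed below
    ("139.86.0.0/16", "campus-gateway"),  # the whole university block
    ("139.86.5.0/24", "dept-router"),     # one department's sub-block
]


def next_hop(destination: str, routes: list[tuple[str, str]] = ROUTES) -> str:
    """Longest-prefix match: the most specific network containing the address wins."""
    addr = ipaddress.ip_address(destination)
    best_len, best_hop = -1, None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    if best_hop is None:
        raise LookupError(f"no route to {destination}")
    return best_hop


if __name__ == "__main__":
    for dest in ("139.86.5.20", "139.86.200.1", "8.8.8.8"):
        print(dest, "->", next_hop(dest))
```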


Domain Names
The IP address is computer friendly but not people friendly: it is
difficult to remember and hard to type. For this reason, in addition
to an IP address, computers are also given a people-friendly name.
The names are assigned using the distributed, hierarchical lookup
system known as the Domain Name System (DNS). In the DNS each
machine has a unique name consisting of multiple parts separated by
dots (not unlike IP addresses, except that there is no fixed number
of parts).

The first part of a DNS name is the machine's name, followed by
a hierarchical list of domain names. The first domain is usually an
identifier for the department to which the machine belongs. The
next is usually an identifier for the organization as a whole. The next
identifies the type of organization and the last identifies the country.

The Mathematics and Computing web server is found on
machine www.sci.usq.edu.au. The machine name is www, the
department/faculty name is sci, the organization name is usq, the
organization type is edu and the country is au.
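The naming convention just described can be checked mechanically. The hypothetical helper below labels each part of a five-part name; it only applies to names that follow the convention, which many real names do not.

```python
from __future__ import annotations

# Roles of the five parts, in the order described in the text.
ROLES = ["machine", "department", "organization", "type", "country"]


def describe_hostname(name: str) -> dict[str, str]:
    """Hypothetical helper: label each part of a five-part DNS name.

    Raises ValueError for names that do not have exactly five parts.
    """
    # A trailing dot marks the DNS root; DNS names are case-insensitive.
    labels = name.rstrip(".").lower().split(".")
    if len(labels) != len(ROLES):
        raise ValueError(f"{name!r} does not follow the five-part convention")
    return dict(zip(ROLES, labels))


if __name__ == "__main__":
    for role, label in describe_hostname("www.sci.usq.edu.au").items():
        print(f"{role:12} {label}")
```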

For a complete list of country codes refer to the ISO 3166 country
code list in the course resources directory.

Code Country
au Australia
uk United Kingdom
ch Switzerland
jp Japan
de Germany

This hierarchical convention is gradually breaking down
as commercial interests have come to dominate the Internet.
Companies prefer short, easily remembered domain names and
are not willing to accept the existing convention.

Therefore you will find domain names that do not follow the
convention. An important feature of the DNS is that a single
machine can have one or more aliases assigned to it in addition to
its true name. This feature is widely used to give descriptive names
to server machines.

For example, the Department of Mathematics and Computing at
USQ maintains both a web server and an FTP server. The address for
the web server is www.sci.usq.edu.au and the address for the FTP
server is ftp.sci.usq.edu.au. Both names resolve to the same IP
address, and neither is the machine's true name.
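Python's `socket.gethostbyname_ex` asks the system resolver about a name and returns a triple: the primary (true) host name, a possibly empty list of aliases, and the IPv4 addresses. A minimal sketch follows; the wrapper name `lookup` is hypothetical, the USQ names above may no longer resolve today (so the demo uses `localhost`), and the system resolver may consult local files such as /etc/hosts as well as DNS.

```python
from __future__ import annotations

import socket


def lookup(name: str) -> tuple[str, list[str], list[str]]:
    """Hypothetical wrapper: (true name, aliases, IPv4 addresses) for `name`."""
    canonical, aliases, addresses = socket.gethostbyname_ex(name)
    return canonical, aliases, addresses


if __name__ == "__main__":
    # Substitute any name you like, e.g. one you suspect is an alias.
    canonical, aliases, addresses = lookup("localhost")
    print("true name:", canonical)
    print("aliases:  ", aliases)
    print("addresses:", addresses)
```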


Port Numbers
When two programs on different computers wish to
communicate with each other, it isn't enough that they know each
other's IP addresses; they also need a mechanism to rendezvous.
Because a single machine runs multiple programs and supplies
multiple services, an external program needs a mechanism to specify
which program on the remote machine it wishes to communicate with.

The mechanism used is port numbers. The IP address identifies
the machine; the port number identifies the particular program on
that machine. Ports are identified by a number from 0 to
65,535. Any program that wishes to use a port tells the operating
system of the machine it is running on to reserve a particular port
for its exclusive use. Any external program requesting communication
with that port can then only talk to the program that has reserved
it. The external program does not need to know the name of the
program it is contacting, only the port number it expects that
program to be listening on.
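The exclusive reservation of a port can be observed directly. In this Python sketch (the helper name `reserve_port` is hypothetical) one socket binds and listens on a loopback port, and a second attempt to bind the same port fails with an `OSError`. This is the typical default behavior; socket options such as SO_REUSEPORT can change it on some systems, and the sketch sets none.

```python
import socket


def reserve_port(port: int = 0) -> socket.socket:
    """Hypothetical helper: bind and listen on `port` on the loopback interface.

    Port 0 asks the operating system to pick any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        sock.close()  # don't leak the socket if the port is already taken
        raise
    return sock


if __name__ == "__main__":
    first = reserve_port()
    port = first.getsockname()[1]
    print("first program reserved port", port)
    try:
        reserve_port(port).close()  # a second program asks for the same port
        print("unexpected: second bind succeeded")
    except OSError as exc:
        print("second program refused:", exc)
    finally:
        first.close()
```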
Well-known ports are those that, by convention, have been reserved
for use by particular services. Ports 0 to 1023 have been reserved for
well-known Internet services; all other ports are freely available.

Port Service
21 FTP
23 Telnet
25 SMTP
80 HTTP
119 NNTP

Clients and Servers


To establish a communications link between two programs (either on
the same machine or on different machines) one program must
initiate a connection and the other must accept it. This is
accomplished using a server/client scheme.

Server
When the server starts up it signals the operating system that it
is willing to accept connections on a given port. It then waits for
connections. The server starts running first.

Client
When a client needs to send information to the server or
retrieve information from the server it opens a connection to the
known port, and passes information back and forth. When finished
the client closes the connection.
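The server and client steps above map directly onto socket calls. A minimal Python sketch that runs entirely on the local machine: the server binds and listens first, then waits in `accept`; the client connects to the known port, exchanges data and closes. The helper names and the tiny greeting protocol (the server replies "Hello, " plus whatever it received) are invented for the example.

```python
from __future__ import annotations

import socket
import threading


def start_server() -> tuple[socket.socket, threading.Thread]:
    """Hypothetical server: reserve a port, then wait (in a thread) for one client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()                # tell the OS we accept connections

    def serve_one() -> None:
        conn, _addr = listener.accept()      # blocks until a client connects
        with conn:                           # leaving the block closes the connection
            request = b""
            while chunk := conn.recv(1024):  # read until the client stops sending
                request += chunk
            conn.sendall(b"Hello, " + request)

    thread = threading.Thread(target=serve_one)
    thread.start()
    return listener, thread


def run_client(port: int, message: bytes) -> bytes:
    """Hypothetical client: connect to the known port, send, receive, close."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(message)
        conn.shutdown(socket.SHUT_WR)        # signal that we have finished sending
        reply = b""
        while chunk := conn.recv(1024):      # read until the server closes
            reply += chunk
    return reply


if __name__ == "__main__":
    listener, thread = start_server()        # the server starts running first
    port = listener.getsockname()[1]
    print(run_client(port, b"world"))        # b'Hello, world'
    thread.join()
    listener.close()
```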

Most servers can handle multiple simultaneous incoming
connections. They do this either by replicating themselves in
memory when a new connection is requested or by cleverly
interleaving their communication activity amongst the clients.
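Python's standard `socketserver` module packages the replication strategy: `ThreadingTCPServer` handles each connection in a new thread, and on POSIX systems `ForkingTCPServer` handles each in a new process (closer to "replicating themselves in memory"). The sketch below uses threads; the upper-casing handler and the helper names `talk` and `run_demo` are hypothetical.

```python
from __future__ import annotations

import socket
import socketserver
import threading
from concurrent.futures import ThreadPoolExecutor


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Handles one connection: echo each received line back in upper case."""

    def handle(self) -> None:
        for line in self.rfile:              # runs until the client closes
            self.wfile.write(line.upper())


def talk(port: int, text: str) -> str:
    """One client: send a line, read the reply line, close."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(text.encode("ascii") + b"\n")
        with conn.makefile("rb") as replies:
            return replies.readline().decode("ascii").rstrip("\n")


def run_demo(words: list[str]) -> list[str]:
    """Start a threaded server, then let several clients talk to it at once."""
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperCaseHandler) as server:
        port = server.server_address[1]      # the port the OS actually assigned
        threading.Thread(target=server.serve_forever, daemon=True).start()
        with ThreadPoolExecutor(max_workers=max(1, len(words))) as pool:
            replies = list(pool.map(lambda w: talk(port, w), words))
        server.shutdown()                    # stop the serve_forever loop
    return replies


if __name__ == "__main__":
    print(run_demo(["alpha", "beta", "gamma"]))  # ['ALPHA', 'BETA', 'GAMMA']
```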

The distinction between client and server rests on who initiates the
connection and who accepts it. Although the server is normally the
information provider, this is not always the case. However, it is
generally true that the client interacts with the user, processing
keystrokes and displaying results.

The user interacts with the server through the client, never directly.

Exercise: A simple and informative way to learn about any protocol
is to connect to a server using the telnet program and talk to the
server directly.
Experiment with connecting to mail servers on port 25.
For example, try the following command:
telnet www.sci.usq.edu.au 25
This command directs telnet to connect to www.sci.usq.edu.au using
the SMTP port, 25.
This command assumes you are using a Unix system, but any
system running telnet that allows you to specify the port should
work. After you have connected, try the command

HELP
What happens?
To exit, type QUIT
An alternative address to try if the one above fails is
romulus.sci.usq.edu.au or your ISP's mail server.
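The same exercise can be scripted. The Python sketch below (helper names `read_reply` and `smtp_help` are hypothetical) connects to port 25, reads the greeting, sends `HELP`, then `QUIT`. SMTP replies start with a three-digit code, and a hyphen straight after the code means more lines follow. Expect some servers to refuse HELP (for example with a 502 reply), and many networks block outgoing connections to port 25.

```python
from __future__ import annotations

import socket
from typing import BinaryIO


def read_reply(stream: BinaryIO) -> tuple[int, list[str]]:
    """Read one (possibly multi-line) SMTP reply; return (code, text lines).

    Every line starts with a three-digit code. A hyphen straight after the
    code ('214-...') means more lines follow; a space ('214 ...') ends the reply.
    """
    lines = []
    while True:
        raw = stream.readline()
        if not raw:
            raise ConnectionError("server closed the connection")
        line = raw.decode("ascii", errors="replace").rstrip("\r\n")
        lines.append(line[4:])
        if line[3:4] != "-":  # no hyphen (or a bare code): this was the last line
            return int(line[:3]), lines


def smtp_help(host: str, port: int = 25, timeout: float = 10.0) -> tuple[int, list[str]]:
    """Script the telnet exercise: read the greeting, send HELP, then QUIT."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as replies:
            read_reply(replies)              # greeting, normally a 220 reply
            conn.sendall(b"HELP\r\n")        # SMTP command lines end with CRLF
            result = read_reply(replies)     # often 214, or 502 if unsupported
            conn.sendall(b"QUIT\r\n")
            read_reply(replies)              # normally 221 before the server closes
    return result
```

For example, `code, text = smtp_help("romulus.sci.usq.edu.au")` would run the exercise against the alternative host named above, if it still accepts connections.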


The World Wide Web


In 1989, Tim Berners-Lee and his associates at CERN, the European
particle physics laboratory, proposed the creation of a new information
system called “WorldWideWeb”. The system was designed to help
CERN scientists disseminate and locate information on the
Internet. Particle physics projects are such huge collaborative efforts
that a system was needed to unify all the fragmented information
services and file protocols into a single point of access.

Instead of having to invoke different programs to retrieve
information via different protocols, users would be able to use a
single program, called a browser, with a single user interface, that
would understand the various protocols. The browser had the task of
figuring out how to fetch the information and display it.

A central part of the proposal was to use a hypertext metaphor:
information would be displayed as a series of resources. Related
resources would be linked together by specially tagged words,
phrases and images. By selecting one of these hypertext links, the
browser would download the linked resource, even if it is on a
different machine and accessed through a different protocol.


The turning point for the Web occurred in 1993, when the U.S.
National Center for Supercomputing Applications (NCSA) released its
web browser Mosaic. This browser used icons and pop-up menus,
rendered bit-mapped text, displayed images, used colored text to
display hypertext links, and provided support for sounds,
animations, and other types of multimedia.

Uniform Resource Identifier


The Uniform Resource Identifier or URI is an abstract standard
system for identifying resources on the Internet. There are
two types of URIs: the Uniform Resource Locator (URL) and the
Uniform Resource Name (URN). At the time of writing, only the URL
was widely implemented.

Uniform Resource Locator


A Uniform Resource Locator is a way to tell a browser how and
where to find an item of interest on the Internet. The URL is a
straightforward way to indicate the retrieval protocol, host, and
location of an Internet resource.

http://www.sci.usq.edu.au:80/courses/CSC2406/index.html
protocol: http | host: www.sci.usq.edu.au | port: 80 | path: /courses/CSC2406/ | resource: index.html

The first part of the URL (delimited by the colon) specifies the
communication protocol, e.g. http, ftp, news, mailto, gopher, telnet.
If the protocol is omitted, a web browser assumes http. The
second part, beginning with the double slash and ending with the
next single slash, is the name of the host machine on which the
resource resides. An optional port number can be specified, but it is
only required if the remote server has been configured to use a
non-standard port. The rest of the URL is the path to the resource.

The path format is different depending on the protocol used.
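Python's `urllib.parse.urlsplit` performs this decomposition. The sketch below wraps it in a hypothetical helper `url_parts` and assumes a small, hypothetical table of default ports for URLs that omit one. Unlike a browser, `urlsplit` does not assume http when the protocol is missing: `www.usq.edu.au/library/` is treated as a bare path with no host.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical fallback table: the well-known port for a few protocols.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23}


def url_parts(url: str) -> dict[str, object]:
    """Hypothetical helper: split a URL into protocol, host, port and path."""
    parts = urlsplit(url)
    # Use the explicit port if given, otherwise the protocol's well-known port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
    }


if __name__ == "__main__":
    print(url_parts("http://www.sci.usq.edu.au:80/courses/CSC2406/index.html"))
    print(url_parts("www.usq.edu.au/library/"))  # no protocol: host comes back as None
```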

Legal Characters in URLs


Only some characters are permitted within URLs. Alphanumeric
characters (i.e. upper- and lowercase letters and numerals) and the
characters $ @ , - are all legal.

The characters = ; / # ? : % & + are also legal but have special
meanings (e.g. the : is used to delimit the port number). All other
characters and symbols, including the space character, are illegal.

To include special characters without their special meaning, or to
include illegal characters in a URL, they must be escaped using an
escape code. The escape code consists of the % character followed
by the two-digit hexadecimal code of the character.

For example, a carriage return can be placed into a URL using the
escape sequence %0D, a space is escaped to %20, and the percent
sign by %25. The character codes used in a URL are the ASCII
character codes (see the ASCII character codes in the course
resources directory) and the 8-bit superset, ISO Latin-1 (see the
Latin-1 page in the course resources directory).

An example of using escape codes in a URL:


/courses/CSC2406/Welcome%20Page
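The escaping rule is easy to express in code. In the Python sketch below, `escape_char` is a hypothetical hand-rolled helper that follows the rule literally (% plus two hex digits per byte, using Latin-1 as the text describes); `urllib.parse.quote` and `unquote` are the standard-library equivalents for whole strings. Note that `quote` encodes non-ASCII characters as UTF-8 by default, so pass `encoding="latin-1"` to follow the Latin-1 convention.

```python
from urllib.parse import quote, unquote


def escape_char(ch: str, encoding: str = "latin-1") -> str:
    """Hypothetical helper: escape one character as % plus two hex digits per byte."""
    return "".join(f"%{byte:02X}" for byte in ch.encode(encoding))


if __name__ == "__main__":
    print(escape_char(" "), escape_char("\r"), escape_char("%"))  # %20 %0D %25

    # The standard library does the same for whole strings; '/' is kept by default.
    print(quote("/courses/CSC2406/Welcome Page"))     # /courses/CSC2406/Welcome%20Page
    print(unquote("/courses/CSC2406/Welcome%20Page")) # /courses/CSC2406/Welcome Page

    # quote() encodes non-ASCII text as UTF-8 unless told otherwise.
    print(quote("é"), quote("é", encoding="latin-1"))  # %C3%A9 %E9
```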

URL Addressing
There are two types of URLs, absolute and relative. An absolute URL
contains all the information necessary to locate the resource. For
instance, the following are absolute URLs:
http://www.sci.usq.edu.au/courses/CSC2406/index.html
www.sci.usq.edu.au/courses/CSC2406/closed/Changes.html
www.usq.edu.au/library/

Though the last two do not specify the protocol, the web client will
assume http.
In the above examples the machine address, the path and the
resource were all specified. Given this information you know exactly
where the resource is located, or more importantly, the web browser
does. What about the following valid URLs?

/courses/index.html
../closed/Changes.html
appendix/ascii.html

How does the web browser know where to look for the resources
since the addresses are incomplete? The browser must assume the
URL is relative to the current resource to fill in the blanks. The
current resource is the resource that contains the relative URLs. For
example, if you download the HTML page
http://www.sci.usq.edu.au/courses/CSC2406/sb/index.html and it
contains the relative URLs above, they are interpreted as having the
following absolute URLs:
http://www.sci.usq.edu.au/courses/index.html
http://www.sci.usq.edu.au/courses/CSC2406/closed/Changes.html
http://www.sci.usq.edu.au/courses/CSC2406/sb/appendix/ascii.html

The leading slash of the relative URL /courses/index.html implies
that this URL is missing only the protocol and machine name. The
leading double dots of the relative URL ../closed/Changes.html have
the same meaning as in Unix: go up to the parent directory first,
then down into the directory closed.
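Python's `urllib.parse.urljoin` implements the standard relative-resolution rules, so it can check the worked example directly. A minimal sketch using the base page and the three relative URLs from the text:

```python
from urllib.parse import urljoin

# The page that contains the relative links (from the example in the text).
BASE = "http://www.sci.usq.edu.au/courses/CSC2406/sb/index.html"

RELATIVE = ["/courses/index.html", "../closed/Changes.html", "appendix/ascii.html"]

if __name__ == "__main__":
    for link in RELATIVE:
        # urljoin fills in the missing pieces from BASE, as a browser would.
        print(f"{link:25} -> {urljoin(BASE, link)}")
```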

HTTP


Seeing the whole conversation

Before moving ahead, let's get a better idea of how HTTP headers
work by viewing a webpage without a browser, so we can see the
conversation in its entirety.
Start by opening a command prompt (in Windows, go to Start->Run,
type cmd, and click "OK"; if you're using Linux you probably already
know how). At the prompt, type:

telnet expertsrt.com 80

and press Enter. This will connect you to expertsrt.com on port 80.
Next, copy and paste just the text below:

GET / HTTP/1.1
Host: expertsrt.com

Don't worry if the text you type or paste does not show up in your
command window and all you see is the cursor; it is indeed being
sent to the server. The first line says you are using the GET
request method to get the resource / (i.e. the file in the base
directory of the host), and that you are using HTTP version 1.1. The
second line tells the server which host you want to connect to. When
you finish typing 'expertsrt.com', hit Enter twice (and twice only).
You should almost immediately get a response that looks like:

HTTP/1.1 301 Moved Permanently
Date: Wed, 08 Feb 2006 07:44:07 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) mod_auth_pgsql/2.0.2b1 mod_ssl/2.0.54 OpenSSL/0.9.7e
Location: http://www.expertsrt.com/
Content-Length: 233
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.expertsrt.com/">here</a>.</p>
</body></html>
Whoops! Looks like we requested a resource that wasn't there; it's
been permanently moved to the new Location
http://www.expertsrt.com/. If you were using a browser, you'd only
see the HTML; everything before the first blank line is the headers.
In fact, modern browsers are even smarter than that: when they
see the Location header, they automatically go there so you don't
have to type in a new URL. Let's go to the new URL. By this point,
you have probably been disconnected while you were reading this. If
so, just press the up arrow on your keyboard to get your telnet
command back, and press Enter to reconnect. If you're still
connected, you can just go ahead and type the following:

GET / HTTP/1.1
Host: www.expertsrt.com

and press Enter twice after the second line. You'll get another similar
response telling you that the page is actually at
http://www.expertsrt.com/index.php. The server is particular, isn't it?
;-) Repeat the above, but this time type

GET /index.php HTTP/1.1
Host: www.expertsrt.com

Notice that the name of the file we want is in the first line. This time
we get flooded with text: the HTML from ERT's homepage. The
headers look like this:

HTTP/1.1 200 OK
Date: Wed, 08 Feb 2006 08:20:07 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) mod_auth_pgsql/2.0.2b1 mod_ssl/2.0.54 OpenSSL/0.9.7e
X-Powered-By: PHP/4.4.0
Transfer-Encoding: chunked
Content-Type: text/html
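The telnet conversation can also be reproduced from a program. The Python sketch below (helper names `build_get`, `parse_response`, `redirect_target` and `fetch` are hypothetical) sends the same kind of request over a plain TCP socket, splits the reply at the first blank line into status, headers and body, and works out where a browser would be redirected. It adds a `Connection: close` header, which the telnet session did not send, so the response simply ends when the server closes the connection; it does not decode chunked bodies such as the one announced by `Transfer-Encoding: chunked`.

```python
from __future__ import annotations

import socket
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def build_get(path: str, host: str) -> bytes:
    """The request typed into telnet, with the CRLF line endings HTTP expects."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a response into status code, headers and (undecoded) body.

    Everything before the first blank line is the status line plus headers.
    """
    head, _blank, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split()[1])  # e.g. 'HTTP/1.1 301 Moved Permanently'
    headers = {}
    for line in header_lines:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status, headers, body


def redirect_target(url: str, status: int, headers: dict[str, str]) -> str | None:
    """Where a browser would go next: the Location header, resolved against `url`."""
    if status in REDIRECT_CODES and "location" in headers:
        return urljoin(url, headers["location"])
    return None


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the GET request over a plain TCP connection and read until EOF."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_get(path, host))
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

For instance, `status, headers, _ = parse_response(fetch("expertsrt.com"))` followed by `redirect_target("http://expertsrt.com/", status, headers)` would repeat the first step of the session above, assuming the site still answers as it did in 2006.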