Web Programming: Origins of Internet and Web
The Internet started as a research project to experiment with connecting computers through packet-switched
networks. It was developed with funding and leadership from the US Department of Defense (DoD), through
its Advanced Research Projects Agency (ARPA). The first node was established in 1969. This network,
ARPAnet, was used for small text-based email transfers, but it was available only to labs and universities.
BITnet and CSnet followed in the late 1970s and early 1980s for file and email transfer, but they still
did not serve as national networks.
A new national network, NSFnet, was built by the National Science Foundation. It soon became available to
many institutions and research labs; by 1990 NSFnet had replaced ARPAnet, and by 1992 it connected more
than one million computers around the world. In 1995, a small part of NSFnet returned to being a research
network and the rest developed into the Internet.
The Internet is a huge collection of computers and other devices connected in a communication network;
it is a network of networks. These computers communicate using the low-level protocol TCP/IP
(Transmission Control Protocol/Internet Protocol).
IP Address:
A protocol is a set of rules that governs the communication of data between computers in a network.
Every computer attached to a network must be identified, and this is done with a number called the
Internet Protocol address (IP address). An IPv4 address is a unique 32-bit number, divided into 4 parts
of 8 bits each and written as four decimal numbers separated by dots. There are various classes of
networks depending on the IP addresses. Eg: 192.168.2.56
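As a small sketch of the dotted notation, the Python functions below (ip_to_int and int_to_ip are our own, hypothetical names) pack the four 8-bit parts into one 32-bit number and unpack it again. Python's standard ipaddress module already offers the same conversion, e.g. int(ipaddress.IPv4Address("192.168.2.56")).

```python
def ip_to_int(address: str) -> int:
    """Convert dotted-decimal IPv4 text such as '192.168.2.56' to a 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address has exactly 4 parts")
    value = 0
    for part in parts:
        if not part.isdigit():
            raise ValueError(f"not a decimal number: {part!r}")
        octet = int(part)
        if octet > 255:
            raise ValueError(f"part does not fit in 8 bits: {octet}")
        value = (value << 8) | octet  # shift earlier parts left by 8 bits, append this one
    return value


def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal notation."""
    if not 0 <= value < 2**32:
        raise ValueError("not a 32-bit value")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


n = ip_to_int("192.168.2.56")
print(n)             # 3232236088
print(int_to_ip(n))  # 192.168.2.56
```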
A host is a computer in the network whose function is to provide services to the network users.
A domain is a part of a computer network whose data-processing resources (processors, storage, I/O
devices, files, data, programs, etc.) are under common control. Machines are identified to users by
textual domain names, and each domain name corresponds to an IP address. Users type domain names into
their browsers without needing to know the corresponding IP address; the conversion from domain name
to IP address is done by a Domain Name System (DNS) server.
Domain names begin with the name of the host machine, followed by the names of the progressively larger
enclosing collections of machines:
Hostname.firstdomain.seconddomain…..last domain
Eg: vtu.sit.mca.org, where vtu is the host, sit is the first domain, mca is the second domain, and org is
the last domain, which identifies the type of organization in which the host resides.
Some common top-level (last) domain names are com, edu, org, mil, in, and au.
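The naming rule above can be sketched in Python: splitting a fully qualified name at the dots gives the host first and the enclosing domains after it, with the last label naming the organization type. The function name split_domain_name is hypothetical, used only for illustration.

```python
from __future__ import annotations


def split_domain_name(name: str) -> tuple[str, list[str]]:
    """Split 'host.first.second....last' into the host and its enclosing domains."""
    labels = name.strip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a host and one domain")
    return labels[0], labels[1:]


host, domains = split_domain_name("vtu.sit.mca.org")
print("host:", host)                # host: vtu
print("first domain:", domains[0])  # first domain: sit
print("last domain:", domains[-1])  # last domain: org
```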
The fully qualified domain name that the user gives the browser must be converted into an IP address
before the message can be transmitted to its destination on the Internet. These conversions (mappings)
are done by software systems called Domain Name Servers. Every document request from a browser goes
first to the nearest DNS server; if that server cannot resolve the name, the request is passed on to
another DNS server.
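A toy model of this "ask the nearest server, then the next one" behaviour is sketched below. It assumes each server is just a lookup table with a pointer to a fallback server; the NameServer class, the server labels, and the addresses are all hypothetical. Real DNS is hierarchical and uses caching; for an actual lookup, Python's socket.gethostbyname asks the system's resolver.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class NameServer:
    """Toy DNS server (hypothetical): a name-to-address table plus a fallback server."""
    label: str
    table: dict
    next_server: Optional[NameServer] = None


def resolve(domain: str, server: Optional[NameServer]) -> tuple[Optional[str], list[str]]:
    """Ask each server in turn until one knows the domain; return (IP or None, servers asked)."""
    asked = []
    while server is not None:
        asked.append(server.label)
        ip = server.table.get(domain.lower())  # domain names are case-insensitive
        if ip is not None:
            return ip, asked
        server = server.next_server  # this server cannot answer: pass the query on
    return None, asked


# Hypothetical names and addresses, for illustration only.
upstream = NameServer("upstream", {"vtu.sit.mca.org": "192.168.2.56"})
nearest = NameServer("nearest", {"example.edu": "10.0.0.7"}, next_server=upstream)

print(resolve("vtu.sit.mca.org", nearest))  # ('192.168.2.56', ['nearest', 'upstream'])
print(resolve("unknown.org", nearest))      # (None, ['nearest', 'upstream'])
```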
The World Wide Web was originally built by a small group of people led by Tim Berners-Lee at CERN in
1989. He proposed a new protocol for the Internet, and a system of document access to use it, and named
it the World Wide Web (WWW). This system allowed documents to be searched for and retrieved from any
part of the Internet. In 1991 it was released to the world.
The documents contained hypertext: text with links to other documents, allowing non-sequential access to
the content. The units of information on the Web can be referred to as pages, documents, or resources. A
document can contain text, images, sound, and video, together called hypermedia. The Web and the
Internet are not the same thing.
So, the Web is a vast collection of data, information, software, and protocols, spread across the world
on web servers, which client machines access with browsers through the Internet. The components of the
Web are as follows:
• Clients use a browser application to send URIs via HTTP to servers, requesting a Web page
• Web pages are constructed using HTML (or another markup language) and consist of text, graphics, and
sounds, plus embedded files
• The entire system runs over standard networking protocols (TCP/IP, DNS,…)
Web Browsers:
A browser is a software client on the Web that initiates communication with a server; the server then
serves the browser's (client's) request. Examples are Internet Explorer, Mozilla Firefox, Netscape
Navigator, Safari, etc.
Static content: content stored in files and retrieved in response to an HTTP request
All Web content is associated with a file that is managed by the server.
All communication between a web client and a web server uses the standard protocol HTTP.
A web server tells its operating system to accept incoming network connections on a specific port of
the machine.
The server runs as a background process.
A client (browser) opens a connection to the server, sends a request, receives information from the
server, and closes the connection.
The web server mainly monitors a communications port on its host machine, accepts HTTP commands through
it, and performs the specified operations.
HTTP commands include a URL specifying the host machine.
The received URL is translated into either a file name or a program name; accordingly, either the
requested file or the output of running the program is sent back to the browser.
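The open / request / receive / close cycle above can be sketched with Python's standard library alone. This is a minimal sketch, assuming a throwaway server on the local machine (127.0.0.1) that listens on a port chosen by the operating system and answers one request with a fixed, hypothetical HTML page; it is not how a production server such as Apache is built.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small, fixed HTML page (hypothetical content)."""

    def do_GET(self):
        body = b"<html><body><h1>Hello</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_once(path="/"):
    # Port 0 asks the operating system for any free port on this machine.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    server.timeout = 5  # stop waiting if no client ever connects
    port = server.server_address[1]
    thread = threading.Thread(target=server.handle_request)  # serve exactly one request
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port)  # 1. open a connection
        conn.request("GET", path)                              # 2. send a request
        response = conn.getresponse()                          # 3. receive the response
        result = (response.status, response.getheader("Content-Type"), response.read())
        conn.close()                                           # 4. close the connection
        return result
    finally:
        thread.join()
        server.server_close()


print(fetch_once())
```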
Server characteristics:
1) Document root: the root of the file hierarchy that holds the web documents served to clients.
2) Server root: stores the server and its support software.
Clients have NO DIRECT ACCESS to the document root through their URLs; the server maps each requested
URL to a file under the document root.
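A minimal sketch of this URL-to-file mapping follows. The document root /var/www/html and the default file name index.html are assumptions chosen for illustration, and map_url_to_file is a hypothetical helper; real servers read these settings from their configuration files. Rejecting ".." segments is what keeps a client from reaching files outside the document root.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

DOCUMENT_ROOT = PurePosixPath("/var/www/html")  # hypothetical document root


def map_url_to_file(url: str, root: PurePosixPath = DOCUMENT_ROOT) -> PurePosixPath:
    """Translate the path part of a URL into a file path under the document root."""
    path = unquote(urlsplit(url).path)  # decode %xx escapes before checking segments
    parts = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            raise PermissionError("URL tries to leave the document root")
        parts.append(segment)
    if not parts or path.endswith("/"):
        parts.append("index.html")  # hypothetical default file for a directory request
    return root.joinpath(*parts)


print(map_url_to_file("http://www.example.com/docs/page1.html"))
print(map_url_to_file("http://www.example.com/"))
```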
Modern servers can host more than one site on a single machine, reducing maintenance cost. These
secondary hosts are called virtual hosts. Servers can also run external programs, for example to access
databases, through the Common Gateway Interface (CGI).
Apache: derived from a patched version of the earlier NCSA httpd server, it has long been the most widely
used web server. It is open-source software, fast and reliable. It was developed for UNIX but is suitable
for other platforms too.
The path of a document in an HTTP URL is similar to the path of a file or directory in an operating
system's file system. On UNIX, path components are separated by forward slashes (/); on Windows, by
backslashes (\).
However, a URL need not include all the directories in the path. A path that includes all the
directories is a complete path; otherwise it is a partial path.
The Web uses many protocols, of which HTTP (HyperText Transfer Protocol) is the most important. It is a
very simple request/response protocol with two phases, the request phase and the response phase. Each
consists of two parts: a header, which contains information about the communication, and a body, which
contains the actual data.
• 1xx – Informational: request received, continuing process
• 4xx – Client error: the request contains bad syntax or cannot be fulfilled
Header Fields:
Accept is the most common request header field; it specifies the browser's preferred MIME types for the
requested file. Host and User-Agent are other common request fields.
Accept: image/gif
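To make the request phase concrete, the sketch below assembles the text a browser might send: a request line (method, path, HTTP version, per the standard HTTP/1.1 format), header fields such as Host, User-Agent, and Accept, and a blank line that ends the header. The function build_request, the path, and the header values are hypothetical.

```python
def build_request(method: str, path: str, headers: dict, body: str = "") -> str:
    """Assemble an HTTP/1.1 request: request line, header lines, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # CRLF ends every line; an empty line separates the header from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "GET",
    "/images/logo.gif",  # hypothetical resource
    {"Host": "www.example.com", "User-Agent": "demo-browser/0.1", "Accept": "image/gif"},
)
print(request)
```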
An HTTP response has four parts:
1. Status line: the HTTP version, a 3-digit status code, and a short explanation of the code.
Eg: HTTP/1.1 200 OK. The first digit of the status code gives its category:
First digit   Category
1             Informational
2             Success
3             Redirection
4             Client error
5             Server error
2. Response header fields: many lines of information about the response, e.g.:
• Date: Mon, 31 Dec 2007 03:29:50 GMT
• Server: Apache/2.0.46 (Red Hat)
• Last-Modified: Sun, 09 Jan 2005 03:00:18 GMT
3. Blank line
4. Response body <HTML><Head>……etc
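The four response parts can be pulled apart with a short parser, sketched below under the assumption that the raw response is already available as text with CRLF line endings. It reuses the header fields from the notes; the HTML body is a hypothetical placeholder, and parse_response is our own helper name. The first digit of the status code is looked up in the category table above.

```python
CATEGORIES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client error",
    "5": "Server error",
}


def parse_response(raw: str) -> dict:
    """Split a raw HTTP response into status line parts, header fields, and body."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the header
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return {
        "version": version,
        "code": int(code),
        "reason": reason,
        "category": CATEGORIES.get(code[0], "Unknown"),
        "headers": headers,
        "body": body,
    }


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 31 Dec 2007 03:29:50 GMT\r\n"
    "Server: Apache/2.0.46 (Red Hat)\r\n"
    "Last-Modified: Sun, 09 Jan 2005 03:00:18 GMT\r\n"
    "\r\n"
    "<HTML><HEAD><TITLE>Example</TITLE></HEAD></HTML>"
)
resp = parse_response(raw)
print(resp["code"], resp["category"])  # 200 Success
print(resp["headers"]["server"])       # Apache/2.0.46 (Red Hat)
```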
Security:
Encryption: the process of converting data into an unreadable form to prevent its unauthorized use.
Before 1976, an encryption key had to be communicated secretly between the parties. Diffie and Hellman
introduced public-key encryption, which uses two-part keys. The public key is freely shared with the
world; each person keeps their own private part of the key secret.
Public Key Encryption
A wants to send a message to B:
• A uses B's public key to encrypt the message.
• The message is sent over public channels.
• Only B can decrypt it, with B's private key.
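The A-to-B exchange can be sketched with textbook RSA, a public-key scheme published by Rivest, Shamir, and Adleman in 1977; it is swapped in here for illustration and is not Diffie and Hellman's own method. The primes 61 and 53 and the exponent 17 are toy values, far too small to be secure, and real systems also add padding. pow(e, -1, phi) needs Python 3.8 or later.

```python
def make_keys(p: int, q: int, e: int):
    """Build a toy RSA key pair from two primes (far too small to be secure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e; requires gcd(e, phi) == 1
    return (e, n), (d, n)  # (public key, private key)


def encrypt(m: int, public_key) -> int:
    e, n = public_key
    return pow(m, e, n)


def decrypt(c: int, private_key) -> int:
    d, n = private_key
    return pow(c, d, n)


# B generates a key pair and publishes only the public part.
b_public, b_private = make_keys(61, 53, 17)

# A encrypts a message (a number smaller than n) with B's public key.
ciphertext = encrypt(65, b_public)
print(ciphertext)                      # 2790
# Only B's private key recovers the original message.
print(decrypt(ciphertext, b_private))  # 65
```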
Perl, CGI, PHP, Ruby, Rails, and AJAX. (Collect information about these topics.)
--------------------------------------------------------------------------------------------
HTML
HTML is a language for describing web pages.
HTML stands for Hyper Text Markup Language
HTML is not a programming language, it is a markup language
A markup language is a set of markup tags
HTML uses markup tags to describe web pages
HTML is the authoring language used to create documents on the World Wide Web.
HTML is used to define the structure and layout of a Web page, how the page looks, and any special
functions.
HTML does this using tags, which can have attributes.
For example, <p> marks a paragraph.
Tim Berners-Lee & HTML
Tim Berners-Lee was the primary author of HTML, assisted by his colleagues at CERN, an international
scientific organization based in Geneva, Switzerland. Tim Berners-Lee later became Director of the World
Wide Web Consortium (W3C), the group that sets technical standards for the Web.
HTML Tags
<html>
<body>
This is the visible page content
<h1>My First Heading</h1>
</body>
</html>
Any plain text editor (such as Notepad) can be used to create and edit HTML. Save the page with .html
as the extension. However, professional web developers often prefer HTML editors such as FrontPage,
PageMill, or Dreamweaver to writing plain text. The file is then opened in any web browser, such as
Microsoft Internet Explorer, Netscape Navigator, or Mozilla Firefox.
HTML elements without content are called empty elements. Empty elements can be
closed in the start tag.
<br> is an empty element without a closing tag (it defines a line break).
In XHTML and XML, all elements must be closed.
Adding a slash to the start tag, like <br />, is the proper way of closing empty elements,
accepted by HTML, XHTML and XML.
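Python's standard html.parser module can show the difference between the two ways of writing an empty element: a plain <br> arrives as an ordinary start tag, while <br /> triggers the parser's handle_startendtag callback. The EmptyElementReporter class is a hypothetical helper written for this sketch.

```python
from html.parser import HTMLParser


class EmptyElementReporter(HTMLParser):
    """Record how each tag is written: start tag, end tag, or self-closing."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty elements such as <br />.
        self.events.append(("self-closing", tag))


parser = EmptyElementReporter()
parser.feed("<p>line one<br>line two<br />line three</p>")
parser.close()
print(parser.events)
# [('start', 'p'), ('start', 'br'), ('self-closing', 'br'), ('end', 'p')]
```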
HTML tags are not case-sensitive: <P> means the same as <p>.
HTML Attributes
Attributes provide additional information about HTML elements.
HTML elements can have attributes
Attributes are always specified in the start tag
Attributes come in name/value pairs like: name="value"
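The sketch below uses Python's html.parser to list each start tag with its name/value attribute pairs. The parser reports tag and attribute names in lowercase, which mirrors HTML's case-insensitivity, while attribute values are kept as written. AttributeLister and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Collect (tag, attributes) for every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        self.tags.append((tag, dict(attrs)))


lister = AttributeLister()
lister.feed('<A HREF="Page2.html" Title="Next">next</A> <img src="logo.gif" alt="Logo">')
lister.close()
print(lister.tags)
# [('a', {'href': 'Page2.html', 'title': 'Next'}), ('img', {'src': 'logo.gif', 'alt': 'Logo'})]
```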
(description of various tags and their attributes with examples is to be noted)
XHTML
XHTML is a stricter and cleaner version of HTML.
XML is a markup language where everything must be marked up correctly, which results
in "well-formed" documents.
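Whether markup is well-formed can be checked with any XML parser. This sketch uses Python's standard xml.etree.ElementTree: a fragment parses only if every element is closed and properly nested. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>line one<br />line two</p>"))  # True
print(is_well_formed("<p>line one<br>line two</p>"))    # False: <br> is never closed
print(is_well_formed("<p><b>bold</p></b>"))             # False: tags overlap
```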
Important differences:
All XHTML documents must have a DOCTYPE declaration. The html, head, title, and
body elements must be present.
The xmlns attribute in <html> specifies the XML namespace for a document and is required in XHTML
documents. However, the HTML validator at w3.org does not complain when the xmlns attribute is
missing.
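A rough check of these requirements is sketched below with xml.etree.ElementTree: it confirms that the root html element carries the XHTML namespace (http://www.w3.org/1999/xhtml) and that head, title, and body are present. It does not validate against the DTD or inspect the DOCTYPE line; missing_parts is a hypothetical helper, and the sample page is illustrative.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

DOCUMENT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>A minimal page</title></head>
  <body><p>Hello</p></body>
</html>"""


def missing_parts(markup: str) -> list:
    """List required XHTML pieces that are absent (namespace, head, title, body)."""
    root = ET.fromstring(markup)
    problems = []
    if root.tag == f"{{{XHTML_NS}}}html":
        prefix = f"{{{XHTML_NS}}}"  # ElementTree writes namespaced tags as {uri}name
    else:
        problems.append("xmlns attribute on <html>")
        prefix = ""
    checks = (
        (f"{prefix}head", "head"),
        (f"{prefix}head/{prefix}title", "title"),
        (f"{prefix}body", "body"),
    )
    for path, label in checks:
        if root.find(path) is None:
            problems.append(label)
    return problems


print(missing_parts(DOCUMENT))  # []
```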
A DTD, or “Document Type Definition”, describes the syntax to use for the current document.
There are three different DTDs for XHTML; you can pick the one you want.
Every XHTML document must begin with a DOCTYPE declaration referring to one of these DTDs:
Strict
Use for really clean markup, with no display information (no font, color, or
size information)
Use with CSS (Cascading Style Sheets) if you want to define how the
document should look
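Transitional
Allows presentational markup (such as font and color information) for compatibility with older
browsers
Frameset
Use for documents that divide the browser window into frames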
An XHTML Example:
------------------------------------------------------------------------------------------------------