WWW Protocols and Standards
Introduction to HTTP
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed,
collaborative, hypermedia information systems.
It has been the foundation of data communication for the World Wide Web since
1990.
HTTP is a generic and stateless protocol which can be used for other purposes as well using
extensions of its request methods, error codes, and headers.
HTTP is a TCP/IP-based communication protocol used to deliver data
(HTML files, image files, query results, etc.) on the World Wide Web.
Basic Features
There are three basic features that make HTTP a simple but powerful protocol:
HTTP is connectionless
HTTP is media independent
HTTP is stateless
HTTP is connectionless: The HTTP client, i.e., a browser, initiates an HTTP request and then
waits for the response. The server processes the request and sends a response back, after which
the client closes the connection. So the client and server know about each other only during the
current request and response. Further requests are made on a new connection, as if the client and
server were new to each other.
HTTP is media independent: It means, any type of data can be sent by HTTP as long as both the
client and the server know how to handle the data content. It is required for the client as well as
the server to specify the content type using appropriate MIME-type.
HTTP is stateless: As mentioned above, HTTP is connectionless, and this is a direct result of HTTP
being a stateless protocol. The server and client are aware of each other only during the current
request. Afterwards, both of them forget about each other. Due to this nature of the protocol,
neither the client nor the server can retain information between different requests across web
pages.
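The media-independence point above can be illustrated with a small sketch. It uses Python's standard mimetypes module to choose a MIME type from a file extension; the helper name content_type_for and the file names are made up for illustration, and a real server may choose the type in other ways.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Return a Content-Type value guessed from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return mime_type or "application/octet-stream"


# Hypothetical file names, used only to show the mapping.
for name in ("index.html", "logo.png", "data.unknownext"):
    print(f"{name}: Content-Type: {content_type_for(name)}")
```

The sender labels the data with this value, and the receiver uses it to decide how to handle the content.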
Basic Architecture
The following diagram shows a very basic architecture of a web application and depicts where
HTTP sits:
HTTP is a request/response protocol based on a client/server architecture, where web browsers,
robots, search engines, etc. act as HTTP clients, and the web server
acts as the server.
Client:
The HTTP client sends a request to the server in the form of a request method, URI, and protocol
version, followed by a MIME-like message containing request modifiers, client information, and
possible body content over a TCP/IP connection.
Server:
The HTTP server responds with a status line, including the message's protocol version and a
success or error code, followed by a MIME-like message containing server information, entity
meta information, and possible entity-body content.
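The message shapes described above can be sketched without any network access. The functions build_request and parse_response, the host example.com, and the sample response text are all assumptions made for illustration; this is a simplified view of the format, not a complete HTTP implementation.

```python
def build_request(method: str, path: str, host: str) -> str:
    """Build a minimal HTTP/1.1 request as text."""
    lines = [
        f"{method} {path} HTTP/1.1",  # request method, URI, protocol version
        f"Host: {host}",              # header naming the target host
        "Connection: close",          # ask the server to close after replying
        "",                           # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw: str) -> dict:
    """Split a raw response into status line parts, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"version": version, "status": int(code), "reason": reason,
            "headers": headers, "body": body}


# A made-up response, as a server might send it.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<p>Hello</p>"
)

print(build_request("GET", "/index.html", "example.com"))
print(parse_response(SAMPLE_RESPONSE))
```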
INTRODUCTION TO HTML
HTML is not a programming language; it is a markup language that defines the structure
of your content.
HTML consists of a series of elements, which you use to enclose, or wrap, different parts
of the content to make it appear a certain way, or act a certain way.
The enclosing tags can make a word or image hyperlink to somewhere else, can italicize
words, can make the font bigger or smaller, and so on.
For example, take the following line of content:
My cat is very grumpy
If we wanted the line to stand by itself, we could specify that it is a paragraph by enclosing
it in paragraph tags:
<p> My cat is very grumpy </p>
What is HTML?
HTML is the main markup language of the web. It runs natively in every browser and is
maintained by the World Wide Web Consortium.
You can use it to create the content structure of websites and web applications. It's the
lowest level of frontend technologies that serves as the basis for styling you can add with
CSS and functionality you can implement using JavaScript.
Anatomy of HTML element
Let’s explore a paragraph element further
Attributes contain extra information about the element that you don't want to appear in the
actual content.
For example, in <p class="editor-note">My cat is very grumpy</p>, class is the attribute name, and editor-note is the attribute value.
The class attribute allows you to give the element a non-unique identifier that can be used
to target it (and any other element with the same class value) with style information and
other things.
An attribute should always have the following:
1. A space between it and the element name (or the previous attribute, if the element already
has one or more attributes).
2. The attribute name followed by an equal sign.
3. The attribute value wrapped by opening and closing quotation marks.
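To see the name/value structure of attributes in practice, the following sketch reads them with Python's standard html.parser module. The class AttributeLister, the function list_attributes, and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Collect each start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.found.append((tag, dict(attrs)))


def list_attributes(markup: str):
    parser = AttributeLister()
    parser.feed(markup)
    parser.close()
    return parser.found


# Sample markup assumed for illustration.
print(list_attributes('<p class="editor-note">My cat is very grumpy</p>'))
```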
Elements
Nesting Elements:
You can put elements inside other elements too; this is called nesting. If we wanted to state that
our cat is very grumpy, we could wrap the word "very" in a <strong> element, which means that
the word is to be strongly emphasized:
<p>My cat is <strong>very</strong> grumpy.</p>
In the example above, we opened the <p> element first, then the <strong> element; therefore, we
have to close the <strong> element first, then the <p> element. The following is incorrect:
<p>My cat is <strong>very grumpy.</p></strong>
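The nesting rule can be checked mechanically with a stack: every closing tag must match the most recently opened element. The sketch below uses Python's standard html.parser module; NestingChecker, nesting_errors, and the short VOID_ELEMENTS set are assumptions for illustration, and the set is deliberately incomplete.

```python
from html.parser import HTMLParser

# Elements with no closing tag (an incomplete, illustrative set).
VOID_ELEMENTS = {"img", "br", "hr", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Track open elements on a stack and record mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        elif self.open_tags:
            self.errors.append(
                f"found </{tag}> but expected </{self.open_tags[-1]}>")
        else:
            self.errors.append(f"found </{tag}> with no open element")


def nesting_errors(markup: str):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [f"<{tag}> was never closed" for tag in checker.open_tags]
    return checker.errors + unclosed


good = "<p>My cat is <strong>very</strong> grumpy.</p>"
bad = "<p>My cat is <strong>very grumpy.</p></strong>"
print(nesting_errors(good))
print(nesting_errors(bad))
```

Browsers usually recover from such mistakes silently, so a check like this is mainly useful while authoring.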
Empty Element:
Some elements have no content and are called empty elements. Take the <img> element that we
already have in our HTML page:
<img src="images/firefox_icon.png" alt="my test image">
This contains two attributes, but there is no closing </img> tag and no inner content. This is
because an image element doesn't wrap content to affect it. Its purpose is to embed an image in the
HTML page in the place it appears.
Marking up Text
This section will cover some of the essential HTML elements you'll use for marking up the text:
Heading: Heading elements allow you to specify that certain parts of your content are headings
— or subheadings. In the same way that a book has the main title, chapter titles and subtitles, an
HTML document can too. HTML contains 6 heading levels, <h1> to <h6>, although you'll commonly
use only 3 or 4 at most.
Paragraphs:
As explained above, <p> elements are for containing paragraphs of text; you'll use these frequently
when marking up regular text content:
<p>This is a single paragraph</p>
Lists:
A lot of the web's content is lists, and HTML has special elements for these. Marking up a list always
involves at least 2 elements. The most common list types are ordered and unordered lists:
Unordered lists are for lists where the order of the items doesn't matter, such as a shopping
list. These are wrapped in a <ul> element.
Ordered lists are for lists where the order of the items does matter, such as a recipe. These
are wrapped in an <ol> element.
Each item inside the lists is put inside an <li> (list item) element:
For example, if we wanted to turn the part of the following paragraph fragment into a list:
<p>At Mozilla, we're a global community of technologists, thinkers, and builders working
together...</p>
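We could modify the markup to this:
<p>At Mozilla, we're a global community of</p>
<ul>
<li>technologists</li>
<li>thinkers</li>
<li>builders</li>
</ul>
<p>working together...</p>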
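When list items come from data rather than hand-written markup, the same structure can be generated. This is a minimal sketch; the function name to_html_list and the sample items are illustrative assumptions.

```python
from html import escape


def to_html_list(items, ordered=False):
    """Wrap each item in <li> and the whole list in <ul> or <ol>."""
    tag = "ol" if ordered else "ul"
    lines = [f"<{tag}>"]
    # escape() keeps characters such as < and & from being read as markup.
    lines += [f"  <li>{escape(item)}</li>" for item in items]
    lines.append(f"</{tag}>")
    return "\n".join(lines)


print(to_html_list(["technologists", "thinkers", "builders"]))
print(to_html_list(["Mix the batter", "Bake for 20 minutes"], ordered=True))
```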
Links:
Links are very important — they are what makes the web a web! To add a link, we need to use a
simple element <a>, "a" being the short form for "anchor". To make text within your paragraph
into a link, follow these steps:
1. Choose some text. We chose the text "Mozilla manifesto".
2. Wrap the text in an <a> element as shown below:
<a>Mozilla manifesto</a>
3. Give the <a> element an href attribute, as shown below:
<a href="">Mozilla manifesto</a>
4. Fill in the value of this attribute with the web address that you want the link to link to:
<a href="http://www.mozilla.org/en-US/about/manifesto/">Mozilla manifesto</a>
You might get unexpected results if you omit the https:// or http:// part, called the protocol, at the
beginning of the web address. After making a link, click it to make sure it is sending you where
you wanted it to.
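The "unexpected results" come from how relative addresses are resolved: without a protocol, the address is treated as a path relative to the current page. The sketch below shows this with Python's standard urllib.parse module; the page address on example.com is a hypothetical stand-in for wherever the link appears.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical address of the page that contains the link.
PAGE = "http://example.com/articles/index.html"

WITH_PROTOCOL = "https://www.mozilla.org/en-US/about/manifesto/"
WITHOUT_PROTOCOL = "www.mozilla.org/en-US/about/manifesto/"

for href in (WITH_PROTOCOL, WITHOUT_PROTOCOL):
    scheme = urlparse(href).scheme   # empty string when no protocol is given
    target = urljoin(PAGE, href)     # where a link on PAGE would lead
    print(f"scheme={scheme!r} -> {target}")
```

In the second case the link resolves to a path under example.com instead of leading to mozilla.org.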
Extensible HyperText Markup Language (XHTML)
XHTML stands for EXtensible HyperText Markup Language
XHTML is a stricter, more XML-based version of HTML
XHTML is HTML defined as an XML application
XHTML is supported by all major browsers
Why XHTML?
XML is a markup language where all documents must be marked up correctly (be "well-formed").
XHTML was developed to make HTML more extensible and flexible to work with other data
formats (such as XML). In addition, browsers ignore errors in HTML pages, and try to display the
website even if it has some errors in the markup. XHTML therefore comes with much stricter error
handling.
The Most Important Differences from HTML
<!DOCTYPE> is mandatory
The xmlns attribute in <html> is mandatory
<html>, <head>, <title>, and <body> are mandatory
Elements must always be properly nested
Elements must always be closed
Element names must always be in lowercase
Attribute names must always be in lowercase
Attribute values must always be quoted
Attribute minimization is forbidden
XHTML - <!DOCTYPE ....> Is Mandatory
An XHTML document must have an XHTML <!DOCTYPE> declaration.
The <html>, <head>, <title>, and <body> elements must also be present, and the xmlns attribute
in <html> must specify the xml namespace for the document.
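As a rough illustration of these requirements, the sketch below defines a minimal XHTML 1.0 Strict document and checks it for the mandatory parts using Python's standard xml.etree.ElementTree module. The function missing_required_parts and the sample document are assumptions for illustration; this is not a validator against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A minimal sample document, written for this sketch.
DOCUMENT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Title of document</title>
  </head>
  <body>
    <p>Some content</p>
  </body>
</html>"""


def local_name(tag: str) -> str:
    """Strip the {namespace} prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def missing_required_parts(source: str):
    """List which mandatory parts are absent from the document."""
    problems = []
    if not source.lstrip().startswith("<!DOCTYPE"):
        problems.append("<!DOCTYPE>")
    root = ET.fromstring(source)
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("<html> with the XHTML xmlns")
    children = {local_name(child.tag): child for child in root}
    head = children.get("head")
    if head is None:
        problems.append("<head>")
    if head is None or all(local_name(c.tag) != "title" for c in head):
        problems.append("<title>")
    if "body" not in children:
        problems.append("<body>")
    return problems


print(missing_required_parts(DOCUMENT))
print(missing_required_parts("<html><head></head><body></body></html>"))
```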
XHTML Elements Must be Properly Nested
In XHTML, elements must always be properly nested within each other, like this:
Correct:
<b><i>Some text</i></b>
Wrong:
<b><i>Some text</b></i>
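Because XHTML is XML, an XML parser rejects improperly nested markup outright. A minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<b><i>Some text</i></b>"))  # properly nested
print(is_well_formed("<b><i>Some text</b></i>"))  # closing tags crossed
```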
Common Gateway Interface (CGI)
The Common Gateway Interface (CGI) provides the middleware between WWW servers and
external databases and information sources.
The Common Gateway Interface (CGI) originated at the National Center for Supercomputing
Applications (NCSA), and version 1.1 is documented in RFC 3875. It defines how a program
interacts with a Hyper Text Transfer Protocol (HTTP) server.
The Web server typically passes the form information to a small application program that
processes the data and may send back a confirmation message. This process or convention for
passing data back and forth between the server and the application is called the common gateway
interface (CGI).
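The convention can be sketched in a few lines. Under CGI, the server passes request details to the program through environment variables (for a GET form, the submitted fields arrive in QUERY_STRING), and the program writes header lines, a blank line, and a body to standard output. The function handle_request, the form field name, and the confirmation text are assumptions for illustration.

```python
import os
from html import escape
from urllib.parse import parse_qs


def handle_request(environ) -> str:
    """Build a CGI response from the variables supplied by the web server."""
    form = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical form field; default to a generic visitor.
    name = form.get("name", ["visitor"])[0]
    body = f"<p>Thank you, {escape(name)}. Your form was received.</p>"
    # Header line, blank line, then the body.
    return f"Content-Type: text/html\n\n{body}"


if __name__ == "__main__":
    # Run by a web server as a CGI program, os.environ holds the request
    # variables and standard output is sent back to the client.
    print(handle_request(os.environ))
```

Calling handle_request({"QUERY_STRING": "name=Ada"}) directly shows the response without a web server.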
Features of CGI
It is a very well defined and supported standard.
CGI scripts are generally written in Perl or C, or simply as a shell script.
CGI is a technology that interfaces with HTML.
CGI is the best method to create a counter because it is currently the quickest.
The CGI standard is generally the most compatible with today's browsers.
Advantages of CGI
The advanced tasks are currently a lot easier to perform in CGI than in Java.
It is always easier to use the code already written than to write your own. CGI specifies
that the programs can be written in any language, and on any platform, as long as they
conform to the specification.
CGI-based counters and CGI code to perform simple tasks are available in plenty.
Disadvantages of CGI
There are some disadvantages of CGI which are given below:
In Common Gateway Interface each page load incurs overhead by having to load the
programs into memory.
Generally, data cannot be easily cached in memory between page loads.
There is a huge existing code base, much of it in Perl.
CGI uses up a lot of processing time.
Common uses of CGI include
Guestbooks
Email Forms
Mailing List Maintenance
Blogs
EXTENSIBLE MARKUP LANGUAGE (XML)
XML stands for Extensible Markup Language. It is a text-based markup language derived from
Standard Generalized Markup Language (SGML).
XML tags identify the data and are used to store and organize it (for storage, transmission, and
processing), rather than to specify how it is displayed, which is what HTML tags are
used for.
XML is not going to replace HTML in the near future, but it introduces new possibilities by
adopting many successful features of HTML.
Characteristics
There are three important characteristics of XML that make it useful in a variety of systems and
solutions:
• XML is extensible - XML allows you to create your own self-descriptive tags, or language,
that suits your application.
• XML carries the data, does not present it - XML allows you to store the data irrespective
of how it will be presented.
• XML is a public standard - XML was developed by an organization called the World
Wide Web Consortium (W3C) and is available as an open standard.
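The first two characteristics can be seen in a short sketch that invents its own tags, serializes the data, and reads it back with Python's standard xml.etree.ElementTree module. The order, customer, and item tag names and their values are a hypothetical vocabulary made up for this example.

```python
import xml.etree.ElementTree as ET

# Build a document using self-descriptive tags of our own choosing.
order = ET.Element("order", id="1001")
ET.SubElement(order, "customer").text = "Ada Lovelace"
item = ET.SubElement(order, "item", quantity="2")
item.text = "Notebook"

# Serialize to text: the XML carries the data but says nothing about display.
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)

# Any XML-aware program can read the data back by tag name.
parsed = ET.fromstring(xml_text)
print(parsed.find("customer").text, parsed.find("item").get("quantity"))
```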
XML Usage
A short list of XML usage says it all:
XML can work behind the scenes to simplify the creation of HTML documents for large
web sites.
XML can be used to exchange information between organizations and systems.
XML can be used for offloading and reloading of databases.
XML can be used to store and arrange the data, which can customize your data handling
needs.
XML can easily be merged with style sheets to create almost any desired output.
Virtually any type of data can be expressed as an XML document.
Features of XML
1) Text based
2) Does not DO anything
3) Free and Extensible
4) W3C Recommended
5) Designed to describe/carry data, not to display data
6) Designed to be self-descriptive
7) Platform independent
WIRELESS MARKUP LANGUAGE (WML)
WML stands for Wireless Markup Language. It is based on HTML and HDML and is
specified as an XML document type. It is a markup language used to develop websites for mobile
phones.
While designing with WML, constraints of wireless devices such as small display screens, limited
memory, low transmission bandwidth, and limited resources have to be considered. WAP
(Wireless Application Protocol) sites differ from normal HTML sites in that they
are monochromatic (only black and white), concise, and have very little screen space, so
the content of a WAP site is limited to the significant matter, much like how the telegraph used to work
in the olden days.
The concept WML follows is that of a deck and card metaphor. A WML document is thought of
as made up of many cards.
Many cards can be inserted into a WML document, and the WML deck is identified by a URL. To
access the deck, the user can navigate using the WML browser, which fetches the deck as required.
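The deck-and-card structure can be sketched as follows. The deck below is a made-up example with two cards, with the DOCTYPE omitted for brevity, and the function list_cards is an assumption for illustration; because WML is an XML document type, Python's standard xml.etree.ElementTree module can read it.

```python
import xml.etree.ElementTree as ET

# A hypothetical deck containing two cards.
DECK = """<wml>
  <card id="home" title="Home">
    <p>Welcome. <a href="#news">News</a></p>
  </card>
  <card id="news" title="News">
    <p>No news today.</p>
  </card>
</wml>"""


def list_cards(deck_source: str):
    """Return the (id, title) pair of every card in the deck."""
    deck = ET.fromstring(deck_source)
    return [(card.get("id"), card.get("title"))
            for card in deck.findall("card")]


print(list_cards(DECK))
```

The whole deck is fetched in one request, and links such as #news then move between cards that are already on the device.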
Features of WML
Text and Images: WML gives a clue about how the text and images can be presented to
the user. The final presentation depends upon the user. Pictures need to be in WBMP
(Wireless Bitmap, with the file extension .wbmp) format and will be monochrome.
User Interaction: WML supports different elements for input like password entry, option
selector and text entry control. The user is free to choose inputs such as keys or voice.
Navigation: WML offers hyperlink navigation and browsing history.
Context Management: The state can be shared across different decks and can also be
saved between different decks.
COMPACT HYPERTEXT MARKUP LANGUAGE (CHTML)
Short for compact HTML, cHTML is a subset of HTML used for small devices such as
smartphones and PDAs.
Some HTML features, such as tables, image maps, font styles/variations, background
colors, background images, frames, and style sheets are not supported in cHTML.
On cHTML devices, basic operations are performed by a combination of four buttons
rather than cursor movement, which is one reason why some features (like image maps)
are not supported.
cHTML was developed for i-mode devices by Access Company, Ltd. Today, cHTML
enables Internet access on limited-functionality mobile devices that are increasingly
popular in global markets.