Unit 2 - HTML,CSS & XML
HTML (HyperText Markup Language) is the most basic building block of the Web. It defines the
meaning and structure of web content. Other technologies besides HTML are generally used to describe
a web page's appearance/presentation (CSS) or functionality/behavior (JavaScript).
"Hypertext" refers to links that connect web pages to one another, either within a single website or
between websites. Links are a fundamental aspect of the Web. By uploading content to the Internet and
linking it to pages created by other people, you become an active participant in the World Wide Web.
HTML uses "markup" to annotate text, images, and other content for display in a Web browser. HTML
markup includes special "elements" such as
<head>, <title>, <body>, <header>, <footer>, <article>, <section>, <p>, <div>, <span>, <img>,
<aside>, <audio>, <canvas>, <datalist>, <details>, <embed>, <nav>, <output>, <progress>,
<video>, <ul>, <ol>, <li> and many others.
HTML was invented by Tim Berners-Lee, a physicist at the CERN research institute in
Switzerland. He came up with the idea of an Internet-based hypertext system. Hypertext means a
text that contains references (links) to other texts that viewers can access immediately. He
published the first version of HTML in 1991, consisting of 18 HTML tags. Since then, each new
version of the HTML language came with new tags and attributes (tag modifiers) to the markup.
According to Mozilla Developer Network’s HTML Element Reference, currently, there are 140
HTML tags, although some of them are already obsolete (not supported by modern browsers). Due
to a quick rise in popularity, HTML is now considered an official web standard. The HTML
specifications are maintained and developed by the World Wide Web Consortium (W3C). You
can check out the latest state of the language anytime on W3C’s website. The biggest upgrade of
the language was the introduction of HTML5 in 2014. It added several new semantic tags to the
markup that reveal the meaning of their own content, such as <article>, <header>, and <footer>.
Since its first days, HTML has gone through an incredible evolution. The W3C regularly publishes
new versions and updates, while historical milestones get dedicated names as well. HTML4 (these
days commonly referred to as “HTML”) was published in 1999, while the latest major version
came out in 2014. Named HTML5, the update introduced many new features to the language.
One of the most anticipated features of HTML5 was native support for audio and video embedding.
Instead of using Flash player, we can simply embed videos and audio files in our web pages using
the new <audio></audio> and <video></video> tags. It also includes built-in support for scalable
vector graphics (SVG) and MathML for mathematical and scientific formulas. HTML5 introduced
a few semantic improvements as well. The new semantic tags inform browsers about the meaning
of content, which benefits both readers and search engines.
Commonly used semantic tags include <header>, <footer>, <article>, <section>, <nav>, and <aside>.
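As a rough sketch of how these HTML5 features fit together, the page below combines several semantic tags with native audio and video embedding. The file names (intro.mp4, theme.mp3), the e-mail address, and all visible text are placeholders assumed for illustration; they are not part of the original notes.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Semantic tags sketch</title>
</head>
<body>
  <!-- header, nav, article, and footer describe the role of their content -->
  <header>
    <h1>My site</h1>
    <nav><a href="#news">News</a></nav>
  </header>
  <article id="news">
    <h2>Latest clip</h2>
    <!-- Native playback; the inner text is shown only if the tag is unsupported -->
    <video src="intro.mp4" controls width="320">Your browser does not support video.</video>
    <audio src="theme.mp3" controls>Your browser does not support audio.</audio>
  </article>
  <footer>
    <p>Contact: someone@example.com</p>
  </footer>
</body>
</html>
```

The controls attribute asks the browser to show its own play and pause buttons, so no plug-in or script is needed for basic playback.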
Like most things, HTML comes with a handful of strengths and limitations.
Pros:
• A widely used language with a lot of resources and a huge community behind it.
• Runs natively in every web browser.
• Comes with a gentle learning curve.
• An open standard that is completely free to use.
• Clean and consistent markup.
• The official web standards are maintained by the World Wide Web Consortium (W3C).
• Integrates easily with backend technologies such as PHP and Node.js.
Cons:
• Mostly used for static web pages. For dynamic functionality, you may need to use
JavaScript or a backend language such as PHP.
• It does not allow the user to implement logic. As a result, all web pages need to be created
separately, even if they use the same elements, e.g. headers and footers.
• Some browsers adopt new features slowly.
• Browser behavior is sometimes hard to predict (e.g. older browsers don’t always render
newer tags).
While HTML is a powerful language, it isn’t enough to build a professional and fully responsive
website. We can only use it to add text elements and create the structure of the content. However,
HTML works extremely well with two other frontend languages: CSS (Cascading Style Sheets),
and JavaScript. Together, they can achieve rich user experience and implement advanced
functions.
• CSS is responsible for stylings such as background, colors, layouts, spacing, and
animations.
• JavaScript lets you add dynamic functionality such as sliders, pop-ups, and photo galleries.
6. SO…WHAT IS HTML?
HTML is the main markup language of the web. It runs natively in every browser and is maintained
by the World Wide Web Consortium. You can use it to create the content structure of websites
and web applications. It is the lowest layer of the frontend stack and serves as the basis for the
styling you can add with CSS and the functionality you can implement using JavaScript.
HTML documents are files that end with a .html or .htm extension. You can view them using any
web browser (such as Google Chrome, Safari, or Mozilla Firefox). The browser reads the HTML
file and renders its content so that internet users can view it. A typical website includes
several different HTML pages. For instance, the home page, about page, and contact page would each
have a separate HTML document. Each HTML page consists of a set of tags (also
called elements), which you can refer to as the building blocks of web pages. They create a
hierarchy that structures the content into sections, paragraphs, headings, and other content blocks.
Most HTML elements have an opening and a closing tag that use the <tag></tag> syntax.
Below, you can see a code example of how HTML elements can be structured:
<div>
  <h1>The Main Heading</h1>
  <h2>A catchy subheading</h2>
  <p>Paragraph one</p>
  <img src="/" alt="Image">
  <p>Paragraph two with a <a href="https://example.com">hyperlink</a></p>
</div>
• The outermost element is a simple division (<div></div>) you can use to mark up bigger
content sections.
• It contains a heading (<h1></h1>), a subheading (<h2></h2>), two paragraphs
(<p></p>), and an image (<img>).
• The second paragraph includes a link (<a></a>) with a href attribute that contains the
destination URL.
• The image tag also has two attributes: src for the image path and alt for the image
description.
8. HTML BASICS
HTML (Hypertext Markup Language) is the code that is used to structure a web page and its content.
For example, content could be structured within a set of paragraphs, a list of bulleted points, or using
images and data tables.
HTML is a markup language that defines the structure of your content. HTML consists of a series
of elements, which you use to enclose, or wrap, different parts of the content to make it appear a certain
way, or act a certain way. The enclosing tags can make a word or image hyperlink to somewhere else,
can italicize words, can make the font bigger or smaller, and so on. For example, take the following line
of content:
My cat is very grumpy
If we wanted the line to stand by itself, we could specify that it is a paragraph by enclosing it in paragraph
tags:
<p>My cat is very grumpy</p>
HTML tags have two main types: block-level and inline tags.
1. Block-level elements take up the full available space and always start a new line in the
document. Headings and paragraphs are a great example of block tags.
2. Inline elements only take up as much space as they need and don’t start a new line on the
page. They usually serve to format the inner contents of block-level elements. Links and
emphasized strings are good examples of inline tags.
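The difference between the two types can be sketched with a short fragment. The text and the example.com address are placeholders assumed for illustration.

```html
<!-- Block-level: each heading and paragraph starts on a new line -->
<h2>Block-level heading</h2>
<p>This paragraph is a block. It contains <em>emphasized text</em> and
   <a href="https://example.com">a link</a>, which are inline and stay within the line.</p>
<p>This second paragraph starts on a new line of its own.</p>
```

When rendered, the heading and the two paragraphs stack vertically, while the emphasized words and the link flow inside the first paragraph without breaking the line.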
The three structural tags every HTML document should contain are <html>, <head>,
and <body>.
1. The <html></html> tag is the highest level element that encloses every HTML page.
2. The <head></head> tag holds meta information such as the page’s title and charset.
3. Finally, the <body></body> tag encloses all the content that appears on the page.
<html>
<head>
<!-- META INFORMATION -->
</head>
<body>
<!-- PAGE CONTENT -->
</body>
</html>
• Headings have 6 levels in HTML. They range from <h1></h1> to <h6></h6>, where h1
is the highest-level heading and h6 is the lowest one. Paragraphs are enclosed by <p></p>,
while blockquotes use the <blockquote> </blockquote> tag.
• Divisions are bigger content sections that typically contain several paragraphs, images,
sometimes blockquotes, and other smaller elements. We can mark them up using
the <div></div> tag. A div element can contain another div tag inside it as well.
• You may also use <ol></ol> tags for ordered lists and <ul></ul> for unordered ones.
Individual list items must be enclosed by the <li></li> tag. For example, this is what a
basic unordered list looks like in HTML:
<ul>
<li>List item 1</li>
<li>List item 2</li>
<li>List item 3</li>
</ul>
Many inline tags are used to format text. For example, a <strong></strong> tag would render an
element in bold, whereas <em></em> tags would show it in italics.
Hyperlinks are also inline elements that require <a></a> tags and href attributes to indicate the
link’s destination:
<a href="https://example.com/">Click me!</a>
Images are inline elements too. You can add one using <img> without any closing tag. But you
will also need to use the src attribute to specify the image path, for example:
<img src="/images/example.jpg" alt="Example image">
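Some elements have no content and are called empty elements. Take the <img> element that we already
have in our HTML page:
<img src="images/firefox-icon.png" alt="My test image">
Returning to the paragraph element, its main parts are as follows:
1. The opening tag: This consists of the name of the element (in this case, p), wrapped in opening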
and closing angle brackets. This states where the element begins or starts to take effect — in
this case where the paragraph begins.
2. The closing tag: This is the same as the opening tag, except that it includes a forward
slash before the element name. This states where the element ends — in this case where the
paragraph ends. Failing to add a closing tag is one of the standard beginner errors and can lead
to strange results.
3. The content: This is the content of the element, which in this case, is just text.
4. The element: The opening tag, the closing tag, and the content together comprise the element.
Elements can also have attributes that look like the following:
<p class="editor-note">My cat is very grumpy</p>
Attributes contain extra information about the element that you don't want to appear in the actual content.
Here, class is the attribute name and editor-note is the attribute value. The class attribute allows you to
give the element a non-unique identifier that can be used to target it (and any other elements with the
same class value) with style information and other things.
Note: Simple attribute values that don't contain ASCII whitespace (or any of the characters " ' ` = < > )
can remain unquoted, but it is recommended that you quote all attribute values, as it makes the code more
consistent and understandable.
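The quoting rule in the note can be illustrated with a small sketch. The second class name, urgent, is a hypothetical value added only to show a case where quotes are required.

```html
<!-- Works unquoted: the value has no whitespace or special characters -->
<p class=editor-note>My cat is very grumpy</p>

<!-- Must be quoted: the value contains a space (two class names) -->
<p class="editor-note urgent">My cat is very grumpy</p>
```

Without the quotes in the second line, the browser would read urgent as a separate attribute rather than as part of the class value, which is one reason quoting every value is the safer habit.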
You can put elements inside other elements too; this is called nesting. If we wanted to state that our
cat is very grumpy, we could wrap the word "very" in a <strong> element, which means that the word is
to be strongly emphasized:
<p>My cat is <strong>very</strong> grumpy.</p>
You do however need to make sure that your elements are properly nested. In the example above, we
opened the <p> element first, then the <strong> element; therefore, we have to close
the <strong> element first, then the <p> element. The following is incorrect:
<p>My cat is <strong>very grumpy.</p></strong>
The elements have to open and close correctly so that they are clearly inside or outside one another. If
they overlap as shown above, then your web browser will try to make the best guess at what you were
trying to say, which can lead to unexpected results. So don't do it!
An empty element such as <img> can carry attributes (here src and alt), but there is no closing </img>
tag and no inner content. This is because an image element doesn't wrap content to affect it. Its purpose
is to embed an image in the HTML page in the place it appears.
That wraps up the basics of individual HTML elements, but they aren't handy on their own. Now we'll
look at how individual elements are combined to form an entire HTML page. Let's revisit the code we
put into our index.html example:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>My test page</title>
</head>
<body>
<img src="images/firefox-icon.png" alt="My test image">
</body>
</html>
Here, we have the following:
a) <!DOCTYPE html> — doctype. It is a required preamble. In the mists of time, when HTML
was young (around 1991/92), doctypes were meant to act as links to a set of rules that the
HTML page had to follow to be considered good HTML, which could mean automatic error
checking and other useful things. However these days, they don't do much and are basically
just needed to make sure your document behaves correctly. That's all you need to know for
now.
b) <html></html> — the <html> element. This element wraps all the content on the entire
page and is sometimes known as the root element.
c) <head></head> — the <head> element. This element acts as a container for all the stuff
you want to include on the HTML page that isn't the content you are showing to your page's
viewers. This includes things like keywords and a page description that you want to appear
in search results, CSS to style our content, character set declarations, and more.
d) <meta charset="utf-8"> — This element sets the character set your document should use to
UTF-8 which includes most characters from the vast majority of written languages.
Essentially, it can now handle any textual content you might put on it. There is no reason
not to set this and it can help avoid some problems later on.
e) <title></title> — the <title> element. This sets the title of your page, which is the title that
appears in the browser tab the page is loaded in. It is also used to describe the page when
you bookmark/favorite it.
f) <body></body> — the <body> element. This contains all the content that you want to show
to web users when they visit your page, whether that's text, images, videos, games, playable
audio tracks, or whatever else.
12. IMAGES
As we said before, the <img> element embeds an image into our page in the position it appears. It does
this via the src (source) attribute, which contains the path to our image file.
We have also included an alt (alternative) attribute. In this attribute, you specify descriptive text for users
who cannot see the image, possibly because of the following reasons:
1. They are visually impaired. Users with significant visual impairments often use tools called
screen readers to read out the alt text to them.
2. Something has gone wrong causing the image not to display. For example, try deliberately
changing the path inside your src attribute to make it incorrect. If you save and reload the page,
you should see the alt text displayed in place of the image.
The keywords for alt text are "descriptive text". The alt text you write should provide the reader with
enough information to have a good idea of what the image conveys. In this example, our current text of
"My test image" is no good at all. A much better alternative for our Firefox logo would be "The Firefox
logo: a flaming fox surrounding the Earth."
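The two alt texts discussed above can be compared side by side. The image path is the one used in the index.html example; the fragment is only a sketch.

```html
<!-- Vague alt text: tells the reader nothing about the image -->
<img src="images/firefox-icon.png" alt="My test image">

<!-- Descriptive alt text: conveys what the image shows -->
<img src="images/firefox-icon.png" alt="The Firefox logo: a flaming fox surrounding the Earth.">
```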
12. HEADINGS
Heading elements allow you to specify that certain parts of your content are headings — or subheadings.
In the same way that a book has the main title, chapter titles, and subtitles, an HTML document can too.
HTML contains 6 heading levels, <h1>–<h6>, although you'll commonly only use 3 to 4 at most:
<h1>My main title</h1>
<h2>My top level heading</h2>
<h3>My subheading</h3>
<h4>My sub-subheading</h4>
13. PARAGRAPHS
The HTML <p> tag is used to define a paragraph in a webpage. It is a paired tag, i.e., it
comes with an opening <p> and a closing </p> tag.
The <p> tag is one of the most frequently used tags, as most running text on a website is formatted in the
form of paragraphs. Browsers automatically add blank space above and below a paragraph to separate it
from other content or other paragraphs on the page.
HTML paragraphs are block-level elements, i.e., a new paragraph always starts on a new line.
Also, a paragraph is closed automatically if another block-level element is parsed before the </p> tag.
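The automatic closing behavior can be sketched as follows. The text content is a placeholder assumed for illustration.

```html
<!-- The first <p> has no closing tag; the parser closes it when the <div> begins -->
<p>First paragraph
<div>A block-level element</div>

<!-- Written out explicitly, which is the recommended form -->
<p>First paragraph</p>
<div>A block-level element</div>
```

Both versions produce the same structure in the browser, but the explicit form is easier to read and avoids surprises.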
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Paragraph Tag </title>
</head>
<body>
<p> This is First Paragraph </p>
<p> This is Second Paragraph </p>
<p> This is Third Paragraph </p>
</body>
</html>
Output
This is First Paragraph
This is Second Paragraph
This is Third Paragraph
The <pre> tag is also a paired tag. It can be used when you want to display a certain amount of text with
preformatted spaces and line breaks. For example, to display a block of code of a programming language
or to display a poem with proper line breaks.
In the example below, you can see that the text is displayed as it is, in the browser, as it was written
inside the <pre> tag.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Pre Tag </title>
</head>
<body>
<p>The pre tag preserves both spaces and line breaks:</p>
<pre>
This is a Paragraph Tag.
This is a Paragraph Tag.
This is a Paragraph Tag.
This is a Paragraph Tag.
</pre>
</body>
</html>
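Output
The pre tag preserves both spaces and line breaks:
This is a Paragraph Tag.
This is a Paragraph Tag.
This is a Paragraph Tag.
This is a Paragraph Tag.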
The <a> tag is a paired tag with </a> tag as a closing tag. Whatever is written between these two tags
will feature as a hyperlink on the webpage.
Syntax
<a href="url">link text</a>
In the example below, the text "Welcome to Google" works as a hyperlink and takes the user to the
Google home page. The address (path) of that page is given in the href attribute.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Anchor Tag </title>
</head>
<body>
<a href="https://www.google.com"> Welcome to Google </a>
</body>
</html>
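The target attribute specifies where the linked document opens:
Value     Description
_blank    Opens the linked document in a new window or tab.
_self     Opens the linked document in the same window/tab. This is the default value.
_parent   Opens the linked document in the parent frame.
_top      Opens the linked document in the full body of the window.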
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Anchor Tag Example </title>
</head>
<body>
<p><a href="https://www.google.com" target="_blank">Welcome to Google</a></p>
<p><a href="https://www.google.com" target="_self">Welcome to Google</a></p>
<p><a href="https://www.google.com" target="_parent">Welcome to Google</a></p>
<p><a href="https://www.google.com" target="_top">Welcome to Google</a></p>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Image Link </title>
</head>
<body>
<p>The image is a link. You can click on it.</p>
<a href="https://www.google.com">
<img src="PUBG.png" alt="HTML Image" style="width:300px;height:200px;">
</a>
</body>
</html>
You can define a base URL for a page with the <base> tag. Whenever you reference a link, you can then
skip the base domain and write only the latter part as a relative URL. The browser automatically resolves
the link against the base URL you have given and forms a complete URL.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Base Path Link Example</title>
<base href="https://www.Coderepublics.com" target="_blank">
</head>
<body>
<p> Click following link </p>
<a href="/HTML/html-tutorial.php"> Learn HTML </a>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Change Link Color </title>
</head>
<body alink="green" vlink="red">
<p> Click following link </p>
<a href="https://google.com/"> Welcome to Google </a>
</body>
</html>
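The alink and vlink attributes used above are obsolete in HTML5. A possible modern equivalent, sketched here with CSS pseudo-classes, is shown below. The red and green values mirror the example above, while blue and orange are assumed values added for completeness.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Link colors with CSS</title>
  <style>
    /* Order matters: link, visited, hover, active */
    a:link    { color: blue; }
    a:visited { color: red; }    /* replaces vlink="red" */
    a:hover   { color: orange; }
    a:active  { color: green; }  /* replaces alink="green" */
  </style>
</head>
<body>
  <p>Click following link</p>
  <a href="https://google.com/">Welcome to Google</a>
</body>
</html>
```

Keeping the rules in this order prevents a later, more general state from overriding the hover and active colors.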
Syntax
<img src="url" alt="some_text" height="pixels" width="pixels">
Example:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Image Tag </title>
</head>
<body>
<img src="HTML-Image.png" alt="HTML image" width="400" height="200">
</body>
</html>
If a browser cannot find an image, it will display the value of the alt attribute:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Image Alt Attribute </title>
</head>
<body>
<img src="HTML-Image.png" alt="HTML5 Image" style="width:400px; height:250px;">
</body>
</html>
The "alt" attribute provides an alternate text for an image, if the user for some reason cannot view it
(because of slow connection, an error in the src attribute, or if the user uses a screen reader).
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Image Size Attribute </title>
</head>
<body>
<img src="HTML5-Image.png" alt="HTML5 Image" width="600" height="250">
</body>
</html>
Tables are used on websites to present data to the user in a neat, easily scanned form. HTML tables
allow you to arrange data such as text and images into rows and columns.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Table </title>
</head>
<body>
<table>
<tr>
<th> Name </th>
<th> Salary </th>
<th> Age </th>
</tr>
<tr>
<td> Anshuman </td>
<td> Rs. 2,00,000 </td>
<td> 25 </td>
</tr>
<tr>
<td> Kuldeep </td>
<td> Rs. 5,00,000 </td>
<td> 22 </td>
</tr>
</table>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Table Border Attribute </title>
</head>
<body>
<table border="1" width="100%">
<tr>
<th> Name </th>
<th> Salary </th>
<th> Age </th>
</tr>
<tr>
<td> Kuldeep </td>
<td> Rs. 5,00,000 </td>
<td> 22 </td>
</tr>
</table>
</body>
</html>
CELLPADDING ATTRIBUTE
The cellpadding attribute specifies the space between the content of a cell and its borders. It
provides padding around the content of the cell. As its value increases, the space between the cell’s
content and its border also increases. The browser interprets the value in pixels. The
cellpadding is applied to all four sides of the content. The value can also be defined in percentages.
CELLSPACING ATTRIBUTE
The cellspacing attribute specifies the space between the cells of the table. Its value can be in
pixels or in percentages. It works similarly to the cellpadding attribute, but between cells. It is applied
to all sides of the cells.
Note: The two attributes defined above are no longer part of HTML5, so it is better to use CSS
(the padding and border-spacing properties) to space out tables.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Table Cellpadding Attribute </title>
</head>
<body>
<table border="1" cellpadding="5" cellspacing="5" style="width:100%">
<tr>
<th>Name</th>
<th>Salary</th>
</tr>
<tr>
<td>Peter</td>
<td>5000</td>
</tr>
<tr>
<td>John</td>
<td>7000</td>
</tr>
</table>
</body>
</html>
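Since cellpadding and cellspacing are no longer part of HTML5, the same table can be sketched with CSS instead. The 5px values mirror the attribute values in the example above; treat the stylesheet as one possible approach rather than the only one.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Table spacing with CSS</title>
  <style>
    table {
      width: 100%;
      border: 1px solid black;
      border-spacing: 5px;   /* space between cells, like cellspacing="5" */
    }
    th, td {
      border: 1px solid black;
      padding: 5px;          /* space inside each cell, like cellpadding="5" */
    }
  </style>
</head>
<body>
  <table>
    <tr><th>Name</th><th>Salary</th></tr>
    <tr><td>Peter</td><td>5000</td></tr>
    <tr><td>John</td><td>7000</td></tr>
  </table>
</body>
</html>
```

Note that border-spacing only has an effect while the table keeps separate cell borders; with border-collapse: collapse the cells share borders and the spacing disappears.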
The 'Rowspan'
The rowspan attribute makes a single cell span two or more rows. The cell
occupies the space of the number of rows it spans.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
table, th, td {
border: 1px solid black;
border-collapse: collapse;
}
th, td {
padding: 5px;
text-align: left;
}
</style>
</head>
<body>
<h2>Cell that spans two rows:</h2>
<table style="width:100%">
<tr>
<th>Name:</th>
<td>Bill Gates</td>
</tr>
<tr>
<th rowspan="2">Telephone:</th>
<td>9998887776</td>
</tr>
<tr>
<td>9998887776</td>
</tr>
</table>
</body>
</html>
The 'Colspan'
The colspan attribute makes a single cell span two or more columns. The cell
occupies the space of the number of columns it spans.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Table Colspan Attribute </title>
</head>
<body>
<table border="1" width="80%">
<tr>
<th> Person_Name </th>
<th colspan="2"> Mobile </th>
</tr>
<tr>
<td> Bill Gates </td>
<td> 9998887776 </td>
<td> 9998887775 </td>
</tr>
</table>
</body>
</html>
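• The <caption> element, placed immediately after the opening <table> tag, gives a table a title.
• A caption can be positioned around the table with the CSS caption-side property (top or bottom); the
older align attribute with the values left/right/top/bottom is obsolete in HTML5.
• The default position is top.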
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Table Caption Attribute </title>
<style>
table, th, td {
border: 1px solid black;
}
</style>
</head>
<body>
<table>
<caption>Monthly savings</caption>
<tr>
<th>Month</th>
<th>Savings</th>
</tr>
<tr>
<td>January</td>
<td>$100</td>
</tr>
<tr>
<td>February</td>
<td>$50</td>
</tr>
</table>
</body>
</html>
An ordered list is created with the <ol> tag. For a numbered ordered list, the numbering starts at one
and is incremented by one for each successive list item tagged with <li>.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Ordered List </title>
</head>
<body>
<h2>HTML Ordered list</h2>
<ol>
<li>Audi</li>
<li>Mercedes</li>
<li>Lamborghini</li>
</ol>
</body>
</html>
type ATTRIBUTE
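The type attribute is used to change the numbering style.
Value       Description
type="1"    The list items will be numbered with numbers (default).
type="A"    The list items will be numbered with uppercase letters.
type="a"    The list items will be numbered with lowercase letters.
type="I"    The list items will be numbered with uppercase roman numbers.
type="i"    The list items will be numbered with lowercase roman numbers.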
LIST OF NUMBERS
Numbers as type - <ol type="1">. Here the numbers will be used to order the elements. Each new
element will get incremented value from the previous one in the list.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Ordered List </title>
</head>
<body>
<h2>HTML Ordered list</h2>
<ol type="1">
<li>Audi</li>
<li>Mercedes</li>
<li>Lamborghini</li>
</ol>
</body>
</html>
Uppercase
Uppercase letters as type - <ol type="A">. Here, uppercase letters will be used to order the
elements.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Ordered List Uppercase </title>
</head>
<body>
<ol type="A">
<li>Audi</li>
<li>Mercedes</li>
<li>Lamborghini</li>
</ol>
</body>
</html>
Lowercase
Lowercase letters as type - <ol type="a">. Same as above, but the letters will be lowercase.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Ordered List Lowercase </title>
</head>
<body>
<ol type="a">
<li>Audi</li>
<li>Mercedes</li>
<li>Lamborghini</li>
</ol>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Ordered List Uppercase Roman </title>
</head>
<body>
<ol type="I">
<li>Audi</li>
<li>Mercedes</li>
<li>Lamborghini</li>
</ol>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> Ordered List Start Attribute </title>
</head>
<body>
<ol start="50">
<li>Samsung</li>
<li>OnePlus</li>
<li>Nokia</li>
</ol>
<ol type="I" start="50">
<li>Oppo</li>
<li>Vivo</li>
<li>Xiaomi</li>
</ol>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> Unordered List </title>
</head>
<body>
<h2> Unordered List </h2>
<ul>
<li> Harley-Davidson </li>
<li> Ducati </li>
<li> BMW </li>
</ul>
</body>
</html>
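The type attribute of the <ul> tag sets the bullet style:
Value     Description
disc      A filled circle (default).
circle    A hollow circle.
square    A filled square.
none      No marker is shown.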
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> Unordered List Disc Attribute </title>
</head>
<body>
<h2> Unordered List </h2>
<ul type="disc">
<li> Harley-Davidson </li>
<li> Ducati </li>
<li> BMW </li>
</ul>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> Unordered List Circle Attribute </title>
</head>
<body>
<h2> Unordered List </h2>
<ul type="circle">
<li> Harley-Davidson </li>
<li> Ducati </li>
<li> BMW </li>
</ul>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> Unordered List Square Attribute </title>
</head>
<body>
<h2> Unordered List </h2>
<ul type="square">
<li> Harley-Davidson </li>
<li> Ducati </li>
<li> BMW </li>
</ul>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> Unordered List None Attribute </title>
</head>
<body>
<h2> Unordered List </h2>
<ul type="none">
<li> Harley-Davidson </li>
<li> Ducati </li>
<li> BMW </li>
</ul>
</body>
</html>
HTML supports another list style called a definition list, where entries are listed as in a
dictionary. The definition list is the ideal way to present a list of terms or another name/value list.
A definition list is created using the <dl> tag. The <dt> tag defines a term in the list,
and the <dd> tag describes that term.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Definition List</title>
</head>
<body>
<h1>HTML Definition List</h1>
<dl>
<dt>PUBG</dt>
<dd>PlayerUnknown's Battlegrounds (PUBG) developed by PUBG Corporation.</dd>
<dt>God Of War</dt>
<dd>God of War developed by Santa Monica Studio.</dd>
</dl>
</body>
</html>
Syntax:
<form>
....
Form Elements..
....
</form>
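Attribute   Description
action      The destination (URL) where the form data is submitted.
method      The HTTP method used to send the form data: get or post.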
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Select Attribute </title>
</head>
<body>
<form action="action-page.php">
<select name="Cars">
<option value="Audi"> Audi </option>
<option value="Mercedes"> Mercedes </option>
<option value="Lamborghini"> Lamborghini </option>
</select>
<input type="submit">
</form>
</body>
</html>
Note: The action attribute defines the action to be performed when the form is submitted. You should
add the destination where the form is submitted.
Basis for comparison          GET                                     POST
Form data type constraints    Only ASCII characters are permitted.    No constraints, even binary data is permitted.
Form data length              Should be kept as short as possible.    Could lie in any range.
Caching                       Data can be cached.                     Data is not cached.
Syntax
<form action="action-page.jsp" method="get">
<form action="action-page.jsp" method="post">
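The difference between the two methods can be sketched with two small forms. The action page is the one from the syntax lines above; the field name q and the button labels are hypothetical, while userid and psw follow the password example later in these notes.

```html
<!-- GET: the field values are appended to the URL as a query string -->
<form action="action-page.jsp" method="get">
  Search term:
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>

<!-- POST: the field values travel in the request body, not in the URL -->
<form action="action-page.jsp" method="post">
  User name:
  <input type="text" name="userid">
  User password:
  <input type="password" name="psw">
  <input type="submit" value="Log in">
</form>
```

Because GET data is visible in the address bar and may be cached or bookmarked, POST is the usual choice for passwords and other sensitive or large data.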
Grouping Form Data with <fieldset>
The <fieldset> element is used to group related data in a form and the <legend> element defines a
caption for the <fieldset> element.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Fieldset and Legend Attributes </title>
</head>
<body>
<form action="action-page.php">
<fieldset>
<legend>Personal information:</legend>
First name:
<input type="text" name="firstname" value="John">
Last name:
<input type="text" name="lastname" value="Snow">
<input type="submit" value="Submit">
</fieldset>
</form>
</body>
</html>
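Type        Description
text        A single-line text input field.
password    A text field whose characters are masked.
submit      A button that submits the form.
reset       A button that resets all form fields to their initial values.
radio       A radio button; only one option in a group can be selected.
checkbox    A checkbox; zero or more options can be selected.
button      A clickable button with no default behavior.
number      A field for entering a number.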
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Input Type Text </title>
</head>
<body>
<form action="action-page.php">
First name:
<input type="text" name="firstname">
Last name:
<input type="text" name="lastname">
<input type="submit">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Input Type Password </title>
</head>
<body>
<form action="#">
User name:
<input type="text" name="userid">
User password:
<input type="password" name="psw">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Input Type Submit </title>
</head>
<body>
<form action="action-page.php">
First name:
<input type="text" name="firstname" value="John">
Last name:
<input type="text" name="lastname" value="Snow">
<input type="submit" value="Submit">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Input Type Reset </title>
</head>
<body>
<form action="action-page.php">
First name:<br>
<input type="text" name="firstname" value="John">
Last name:<br>
<input type="text" name="lastname" value="Snow">
<input type="submit" value="Submit">
<input type="reset">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Input Type Radio Button </title>
</head>
<body>
<form action="action-page.php">
<input type="radio" name="gender" value="male" checked> Male<br>
<input type="radio" name="gender" value="female"> Female<br>
<input type="radio" name="gender" value="other"> Other<br><br>
<input type="submit">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Input Type Checkbox </title>
</head>
<body>
<form action="action-page.php">
<input type="checkbox" name="vehicle1" value="Bike"> Bike
<input type="checkbox" name="vehicle2" value="Car"> Car
<input type="submit">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Input Type Button </title>
</head>
<body>
<input type="button" onclick="alert('Hello World!')" value="Click Me!">
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Input Type Number </title>
</head>
<body>
<form action="action-page.php">
Quantity (between 1 and 10):
<input type="number" name="quantity" min="1" max="10">
<input type="submit">
</form>
</body>
</html>
Attribute Description
readonly Specifies that an input field is read-only, so its value cannot be changed.
Email Fields
The value "email" creates an input field for an email address. The browser validates the entered value
against the standard email address format and shows an error if the user violates it.
Syntax: <input type="email">
Number Fields
The value "number" creates an input field that accepts only numbers; entering letters, symbols, or
anything other than a number produces an error. Decimal values are accepted when the step attribute
permits them (for example, step="any"). The field usually has small buttons on the right side to
increase or decrease the value, and on smartphones it typically opens the numeric keyboard.
Syntax: <input type="number">
Search Fields
It is used to create a search box. You can add placeholder text to the search box by using the
placeholder attribute.
Syntax: <input type="search">
URL Fields
Specifically used to enter a URL.
Syntax: <input type="url">
Range Fields
It creates a slider for selecting a value within a range bounded by two values.
Syntax: <input type="range" min="0" max="10">
Date Fields
This type creates an input area for entering a date. You can type the date manually or select
a value from a graphical calendar.
Syntax: <input type="date">
Month Fields
It provides only month and year options.
Syntax: <input type="month">
Week Fields
Allows you to pick a week and a year.
Syntax: <input type="week">
Time Fields
Allows you to enter a time of day. It can be typed manually or chosen from a digital clock style
control.
Syntax: <input type="time">
Datetime-local Fields
Allows you to enter a date and a time together in a single input field.
Syntax: <input type="datetime-local">
Color Fields
Provides a color picker for choosing an RGB color. The chosen value is submitted as a hexadecimal
string such as #ff0000.
Syntax: <input type="color">
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Value Attribute </title>
</head>
<body>
<form action="#">
First name:
<input type="text" name="firstname" value="John">
Last name:
<input type="text" name="lastname">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Readonly Attribute </title>
</head>
<body>
<form action="#">
First name:
<input type="text" name="firstname" value="John" readonly>
Last name:
<input type="text" name="lastname">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Disabled Attribute </title>
</head>
<body>
<form action="#">
First name:
<input type="text" name="firstname" value="John" disabled>
Last name:
<input type="text" name="lastname">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form Size Attribute </title>
</head>
<body>
<form action="#">
First name:
<input type="text" name="firstname" value="John" size="30">
Last name:
<input type="text" name="lastname">
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> HTML Form maxlength Attribute </title>
</head>
<body>
<form action="#">
First name:
<input type="text" name="firstname" maxlength="10">
Last name:
<input type="text" name="lastname">
</form>
</body>
</html>
Note: With the maxlength attribute set, the input field will not accept more than the specified number of
characters.
The HTML <frameset> tag contains a group of frames that can be controlled and styled as a unit.
The <frameset> element also specifies the number of rows and columns in the frameset and how much
space each of them occupies.
Note: Do not use the HTML <frameset> element, as it is deprecated and not supported in HTML5;
use the <iframe> tag instead.
Syntax
<frameset cols=" ">............</frameset>
Display: Block
Usage: Frames
Example 1
<!DOCTYPE html>
<html>
<head>
<title>Frame tag</title>
</head>
<frameset cols="50%,50%">
<frame src="xyz/html-table">
<frame src="xyz/css-table">
</frameset>
</html>
Tag-specific attributes
Attribute   Value          Description
cols        pixels, %, *   Specifies the number and size of the columns in the frameset. (Not supported in HTML5)
rows        pixels, %, *   Specifies the number and size of the rows in the frameset. (Not supported in HTML5)
1. REQUIREMENT OF CSS
HTML itself is not able to give style to the website elements. CSS extends HTML capabilities and provides many
properties to give the required style to the website. Here are some of the key advantages of learning CSS:
a) Create Beautiful Websites: Behind every stunning website, there is a CSS Style Sheet. It handles the
look and feel of a website. The beautiful designs and the attractive features of CSS make it worthy of
your time. It is a must-have skill of a web designer.
b) Solve Big Problems: Before CSS, presentation attributes such as color, alignment, border, and size had to
be repeated on every web page to style any element, which was a very long process. For example, if you are
developing a large website where font and color information has to be added to every single page, it
becomes a long and expensive process. CSS was created to solve this problem, and it is a W3C
recommendation.
c) Presentation capabilities: CSS provides powerful control over the presentation of an HTML document.
It is related to the previous point as CSS gets combined with markup languages like HTML to enhance
their capabilities and to provide better control to the user of the website.
d) Increase your skills: HTML and CSS are the most basic languages which a web designer should learn
as it opens the door to various other technologies like JavaScript, Angular, PHP, etc. These technologies
become easier to understand, once a person gets exposed to CSS and HTML.
e) Better styling capabilities: CSS offers a much wider variety of styling properties than HTML, which has only a
limited set of presentational attributes with limited capabilities.
2. APPLICATIONS OF CSS
a) CSS saves time: A single style sheet can be used on multiple pages, so there is no
need to write the same code in every page. One file can be shared by all the pages that require the same
styling. This also decreases page loading time, because reusing the code reduces the amount of data
transferred.
b) Multiple device compatibility: CSS can adapt the same webpage to different
viewports (screens). It makes a webpage compatible and readable on devices such as laptops, tablets,
and mobiles, which have different screen sizes.
c) Improves HTML functionality: CSS has a large number of styling properties, whereas HTML
provides only a limited number of styling attributes.
d) Pages load faster: When the same style sheet is used in multiple webpages, the pages
load faster because the browser can reuse the already downloaded file.
e) Responsive layout: CSS can adapt a webpage to fit various screen sizes. Different devices
have different screen sizes, and CSS features allow the programmer to make a website responsive
to these devices. The responsive nature of CSS makes a webpage compatible and readable at different
screen sizes with ease.
f) Global web standards: Web standards are now deprecating the use of presentational HTML attributes and
recommend using CSS instead.
Explanation-
In the example, the <style> tag is used to add CSS code within the webpage. Selectors for particular
tags are used to style them; for example, the h1 { } and p { } selectors are used here. The properties
used inside them are:
• color: Specifies the text color of the heading and the paragraph.
• background-color: Defines the background color of the heading.
• padding: Sets the space between the content of the heading and its border.
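The example that this explanation refers to does not appear in these notes. A minimal sketch of what such a page might look like is given here; the specific colors, the padding value, and the sample text are assumptions chosen only for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> CSS Inside a Style Tag </title>
<style>
/* Illustrative values only; the original example is not shown in the notes */
h1 {
  color: white;            /* text color of the heading */
  background-color: navy;  /* background color of the heading */
  padding: 10px;           /* space between the heading text and its edge */
}
p {
  color: green;            /* text color of the paragraph */
}
</style>
</head>
<body>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
</body>
</html>
```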
3. CSS SYNTAX
A CSS rule-set consists of a selector and a declaration block:
Selector: A selector identifies the HTML element we want to style, such as p or div.
Declaration Block: This block contains the properties, and their values, that we want to change for the
selected element. Example: -
a) color:yellow;
b) font-family:arial;
Property: A property is a characteristic of an HTML element, such as color, border, or font-size, that
determines its appearance and style.
Value: Every CSS property is assigned a value, and different values give different results for that property. In the above
example, the value yellow is assigned to the color property.
4. INCORPORATING CSS
An important feature of CSS is that it can be added to a webpage in several ways.
Use            Description
Inline CSS     Using the style attribute in the HTML start tag.
Embedded CSS   Using the <style> tag within the code of the webpage.
External CSS   Using the <link> tag to link an external CSS file to the webpage.
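The three methods in the table can be sketched together in one page, as follows. The file name styles.css is a hypothetical external style sheet assumed only for this illustration, and the colors are arbitrary.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> Three Ways to Add CSS </title>
<!-- External CSS: styles.css is a hypothetical file name -->
<link rel="stylesheet" href="styles.css">
<!-- Embedded CSS: rules placed inside a style tag -->
<style>
h1 {
  color: blue;
}
</style>
</head>
<body>
<h1>Styled by the embedded style sheet</h1>
<!-- Inline CSS: the style attribute on the start tag -->
<p style="color:red;">Styled by an inline style attribute</p>
</body>
</html>
```

An external file is usually the better choice when several pages share the same styling, since the rules are then written only once.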
5. CSS SELECTORS
CSS selectors are used to select elements and then apply CSS to them. These selectors are very useful for writing
clean and effective code.
CSS selectors are used in embedded and external CSS. Inline CSS is also an option, but it is generally not
recommended. There are various types of selectors through which you can select an element, and they
can be combined to select a more specific element.
A CSS selector can select an element by its id, class, tag name (type), attribute, etc.
Syntax
tag-selector{
CSS Styling
}
Example-
p{
color:red;
}
Explanation: In the syntax example, we have used the <p> tag as a tag selector. Now all the paragraphs in the
webpage will have red text color. Remember that tags < > will be omitted when using tag selectors. Only tag
name will be used.
Syntax
Note: When using ID Selectors, the hash(#) symbol is used in front of ID name.
#ID_Name{
CSS Styling
}
Example-
#demo-id{
color:red;
}
Explanation: In the example above, the ID #demo-id is used as a selector. Now, wherever this ID is applied,
whether in a paragraph, heading, div, or anything, the text color will be red inside that element.
To define a class for an element, use the class attribute. You can choose almost any class name, but it should
start with a letter or an underscore (_).
Syntax
Note: When using Class Selector, the dot(.) symbol is used in front of class name.
.Class_Name{
Css Styling
}
Example-
.demo-class{
color:red;
}
Explanation: Any element that uses demo-class as its class name will display red text. The class can be
applied to more than one element.
Syntax
tag.class-name{
Css Styling
}
Example-
p.demo-class{
color:red;
}
Suppose there is a div, a heading, and a paragraph, and we want the font color of all of them to be red. Instead of
declaring the same color property three times, once for each element, we can declare it once using a comma (,).
Grouping is not limited to tags: multiple tags, IDs, or classes can be given the same CSS by separating them with commas.
Syntax
p, h2, div{
color:red;
}
p, .demo-class, h2{
color:red;
}
Explanation: Notice in the above example how the comma is used to create a grouping selector. We have also
combined tags with a class. Now all paragraphs, h2 headings, div elements, and elements with the class demo-class will display
red text.
Syntax
*{
Css Styling
}
Example-
*{
margin:0;
padding:0;
}
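The selector types described in this section can be tried together in a single page. The sketch below reuses the demo-id and demo-class names from the examples; the colors other than red and the sample text are assumptions for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> CSS Selectors </title>
<style>
/* Universal selector: applies to every element */
* { margin: 0; padding: 0; }
/* Tag selector: every paragraph */
p { color: red; }
/* ID selector: the element whose id is demo-id */
#demo-id { color: blue; }
/* Class selector: every element with class demo-class */
.demo-class { color: green; }
/* Tag combined with class: only paragraphs with class demo-class */
p.demo-class { font-weight: bold; }
/* Grouping selector: h2 headings and div elements */
h2, div { color: purple; }
</style>
</head>
<body>
<h2>Purple heading (grouping selector)</h2>
<div>Purple div (grouping selector)</div>
<p>Red paragraph (tag selector)</p>
<p id="demo-id">Blue paragraph (ID selector)</p>
<p class="demo-class">Green and bold paragraph (class and tag.class selectors)</p>
<span class="demo-class">Green span, not bold (class selector only)</span>
</body>
</html>
```

Where two rules set the same property on one element, the more specific selector wins: an ID selector overrides a class selector, and a class selector overrides a tag selector.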
a. background-color
b. background-image
c. background-repeat
d. background-attachment
e. background-position
Background color
The background-color property specifies the background color of an element.
Background Image
The background-image property sets an image in the background of an element.
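A short sketch using all five background properties listed above follows. The image name pattern.png is a hypothetical file assumed for this example, and the chosen values are only one possible combination.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> CSS Background Properties </title>
<style>
body {
  background-color: lightblue;           /* shown wherever the image does not cover */
  background-image: url("pattern.png");  /* hypothetical image file */
  background-repeat: no-repeat;          /* draw the image once instead of tiling it */
  background-attachment: fixed;          /* image stays in place while the page scrolls */
  background-position: right top;        /* place the image in the top-right corner */
}
</style>
</head>
<body>
<h1>Background demo</h1>
<p>The page background combines a color with a single, fixed image.</p>
</body>
</html>
```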
The CSS font properties are far more capable than the HTML <font> tag. The <font> tag was used up to HTML 4, but it is
obsolete in HTML5 and should no longer be used; it was in any case an inferior option compared to the CSS font properties.
CSS offers multiple options for styling fonts. You can alter the color, size, weight, and family of the
font. You can also list several font families at once as fallbacks in case one is not available. Let's look at the
CSS font properties and their uses:
The CSS font-family property lets us assign multiple font families at once. This does not mean that all the families will be
loaded and used together. Instead, the second one is used if the first font is not available, and
the third one is used if the second is not available either. The list works in order of priority.
To change the color of a particular section of the website like paragraph and headings, Class, ID, and tag selectors
will always help.
The font-style property has three values. One is normal, and the other two are slightly different slanted versions:
• font-style: normal;
• font-style: italic;
• font-style: oblique;
The font-variant property switches between normal text and small capitals:
• font-variant: normal;
• font-variant: small-caps;
The font-weight property sets how thick or thin the characters appear:
• font-weight: normal;
• font-weight: lighter;
• font-weight: bold;
• font-weight: 500;
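The font properties discussed above can be combined in one rule. The following is a minimal sketch; the class name intro, the font list, and the size are assumptions chosen for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> CSS Font Properties </title>
<style>
/* intro is a hypothetical class name used only in this example */
p.intro {
  font-family: Arial, Helvetica, sans-serif;  /* fallbacks are tried in order */
  font-style: italic;
  font-variant: small-caps;
  font-weight: bold;
  font-size: 18px;
  color: darkslategray;
}
</style>
</head>
<body>
<p class="intro">This paragraph uses several font properties at once.</p>
<p>This paragraph keeps the browser's default font settings.</p>
</body>
</html>
```

Ending the font-family list with a generic family such as sans-serif gives the browser a last fallback when none of the named fonts is installed.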
Links can be styled with any CSS property (e.g. color, font-family, background, etc.). Links can be styled
differently depending on what state they are in.
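The commonly used link states are a:link (unvisited), a:visited, a:hover (pointer over the link), and a:active (the moment the link is clicked). A brief sketch follows; the colors are arbitrary and the address is a placeholder.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> CSS Link States </title>
<style>
/* Order matters: hover must come after link and visited, active after hover */
a:link    { color: blue; }                            /* unvisited link */
a:visited { color: purple; }                          /* visited link */
a:hover   { color: red; background-color: yellow; }   /* pointer over the link */
a:active  { color: orange; }                          /* link being clicked */
</style>
</head>
<body>
<!-- The address below is a placeholder -->
<a href="https://www.example.com">Visit the example site</a>
</body>
</html>
```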
9. CSS TABLES
The CSS can also be applied to HTML tables. Below are some properties which can make an ordinary looking
table into an attractive one:
• border
• border-collapse
• padding
• width
• text-align
• color
• background-color
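The properties in this list might be applied to a small table as sketched below. The table contents and the specific values are assumptions for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> CSS Tables </title>
<style>
table {
  border-collapse: collapse;  /* merge adjacent cell borders into one line */
  width: 100%;
}
th, td {
  border: 1px solid black;
  padding: 8px;               /* space between cell content and cell border */
  text-align: left;
}
th {
  color: white;
  background-color: gray;
}
</style>
</head>
<body>
<table>
<tr><th>Roll No</th><th>Name</th><th>Branch</th></tr>
<tr><td>1</td><td>Akarsh</td><td>CSE</td></tr>
<tr><td>2</td><td>Prateek</td><td>CSE</td></tr>
</table>
</body>
</html>
```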
According to the CSS box model, a browser engine represents every HTML element as a box. This box is a
combination of the element's content, padding, borders, and margins.
In the box model, the content is wrapped by padding. The padding area is in turn wrapped by the border, and
outside the border lies the margin. All of these take up space around the content and affect the
elements outside the box.
Changes in content, padding, and borders affect the size of the box, even when a fixed width or height is set,
because by default that size applies only to the content area. The margin, on the other hand, clears space
outside the box. Together these properties make up the CSS box.
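The effect of each layer on the final size can be worked out with a small example. The sketch below assumes the default box-sizing (content-box); the class name box and the pixel values are chosen only for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title> CSS Box Model </title>
<style>
/* box is a hypothetical class name used only in this example */
.box {
  width: 200px;             /* width of the content area */
  padding: 20px;            /* adds 20px on each side of the content */
  border: 5px solid black;  /* adds 5px on each side of the padding */
  margin: 10px;             /* clears 10px outside the border on each side */
}
/*
  Rendered width of the box  = 200 + (2 x 20) + (2 x 5) = 250px
  Horizontal space occupied  = 250 + (2 x 10)           = 270px
*/
</style>
</head>
<body>
<div class="box">Content area</div>
</body>
</html>
```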
A markup language is a computer language that uses tags to define elements within a document. It is human-
readable, meaning markup files contain standard words, rather than typical programming syntax. While several
markup languages exist, the two most popular are HTML and XML.
<students>
<student>
<rollno> 1 </rollno>
<first_name> Akarsh</first_name>
<last_name> Malhotra</last_name>
<branch> CSE </branch>
<section> A </section>
</student>
<student>
<rollno> 2 </rollno>
<first_name> Prateek </first_name>
<last_name> Bajpai </last_name>
<branch> CSE </branch>
<section> A </section>
</student>
</students>
(i) XML focuses on data rather than how it looks: One of the reasons XML is popular is that it focuses
on data rather than on data presentation. Other markup languages such as HTML are used for data
presentation. This separates the data from its presentation and gives us the freedom to present the
data the way we want once we receive it as XML. Two or more systems can receive the same data
from the same XML document and present it in different ways using another markup language such as HTML.
(ii) Easy and efficient data sharing: Since XML is software and hardware independent, it is easier to share
data between systems with different hardware and software configurations. A system written in almost any
programming language can read and process an XML document.
(iii) Compatibility with other markup languages such as HTML: It is easy to read data from XML
and display it in a GUI (graphical user interface) using HTML. When the data changes
over time, we need not make any changes to the HTML.
(iv) Supports platform transition: Changing to new systems and platforms is
challenging mainly because it involves converting data between incompatible formats, which
often results in data loss. XML simplifies this process, since the data can be transported to the new systems
without data loss.
(v) Allows XML validation: An XML document can be validated using a DTD or an XML schema. This ensures
that the XML document conforms to the declared structure and avoids issues that may arise from incorrect
XML.
(vi) Adapts to technology advancements: XML is popular and has been in use for a very long
time because its platform-independent nature lets it adapt to new technologies.
(vii) XML supports Unicode: XML supports Unicode, which allows it to communicate almost any information
in any written human language.
1.3.2. DISADVANTAGES
(i) XML syntax is verbose and redundant compared to other text-based data transmission formats such as
JSON.
(ii) The redundancy in the syntax of XML causes higher storage and transportation costs when the volume of
data is large.
(iii) An XML document is less readable than other text-based data transmission formats such as JSON.
(iv) XML does not support arrays directly.
(v) XML file sizes are usually large because of the verbose syntax, although the exact size depends on how
the document is written.
2. NEED OF XML
Different systems run different operating systems and hold data in different formats. Transferring
data between such systems is difficult, because the data needs to be converted into a compatible format
before it can be used on another system. XML makes it easy to transfer data between such systems, because it
does not depend on the platform or the programming language.
Now, the question is, why not use the existing Database Management System (DBMS) products such as Oracle,
SQL Server, IMS, IDMS, and Informix, etc., for exchanging data over the Internet (and also outside of the
Internet)? The reason is incompatibility of various kinds. These DBMS products are extremely popular and
provide great data storage and access mechanisms. However, they are not always compatible with each other in
terms of sharing or transferring data. Their formats, internal representations, data types, encoding, etc., are
different. This creates problems in data exchange.
This is similar to a situation when one person understands only English and the other understands only Hindi.
English and Hindi by themselves are great languages. However, they are not compatible with each other.
Similarly, for instance, suppose organization X uses Oracle as its DBMS (relational) and organization Y uses IMS
as its DBMS (Hierarchical). Each of these DBMS systems internally represents the data in their own formats as
well as by using data structures such as chains, indexes, lists, etc. Now, whenever X and Y want to exchange any
kind of data (say list of products available, last month’s sales report, etc.), they would not be able to do this
directly. Consider the following figure.
Database Management Systems (DBMS) are incompatible with each other, when it comes to data exchange.
If X and Y want to exchange data, the simple solution would be that they agree on a common data format, and use
that format for data exchange. For example, when X wants to send an inventory status to Y, it would first convert
that data from Oracle format into this common format and then send it to Y. When Y receives this data, it would
convert the data from this common format into IMS format, and then its applications can use it. In the simplest
case, this common format can be a text file.
This approach of exchanging data in the text format seems to be fine. After all, all that is needed is some data
transformation programs at both ends, which either read from or write to text format from the native (Oracle/IMS)
format. This approach would be very similar to the one used in our translator approach for human conversations.
But there are some issues with this approach as well, in addition to what we had discussed earlier in the context
of human conversations.
• For instance, suppose another organization Z now wants to do business with X and Y. Therefore, X and
Y now need to exchange data with Z also. Suppose that Z is already interacting with other business
partners such as A and B. Now, if Z is using a different text format for data exchange with A and B, its
data exchange text formats with X/Y and A/B would be different! That is, for exchanging the same data
with different business partners, different application programs might be required.
• Also, suppose that these business partners specify some business rules. For instance, Z mandates that a
sales order arriving from any of its business partners (i.e., A, B, X or Y) must carry at least three items.
For this, appropriate logic can be incorporated in the application program at its end to validate this rule,
whenever it receives any sales order from one of its business partners. However, can we not apply this
business rule before the data is sent by any of the business partners, rather than first accepting the data
and then validating it? If different data exchanges among different business partners demand different
business rules like this, it might be difficult to apply them in the text format.
HTML is the de facto language of the Internet. HTML defines a set of tags describing how the Web browser
should display the contents of a particular document to the end user. For example, it uses tags that indicate that a
particular portion of the text is to be made boldface, underlined, small, big, and so on. In addition, we can display
lists of values using bullets, or create tables on the screen by using HTML.
The similarity between XML and HTML is that both languages use tags to structure documents. This,
incidentally, is perhaps the only real similarity between the two!
Although XML also uses tags to organize documents and the contents therein, just as HTML does, it is not concerned with
the presentation features of a document. XML is more concerned with the meaning and rules of the data
contained in a document. XML describes what the various data items in a document mean, rather than describing
how to display them. Therefore, whereas HTML is an information presentation language, XML is an information
description language. Thus, conceptually, XML is pretty similar to a data definition language. HTML concentrates
on the display/presentation of data to the end user, whereas XML deals with the representation of data in
documents.
3. XML TERMINOLOGY
An XML file conventionally has the extension .xml. Let us call the above file books.xml. As we can see, the file seems
to contain information organized in a hierarchical manner, with some unfamiliar symbols. Let us understand this
example step by step. In the process, we will start getting familiar with the XML syntax and terminology.
4. INTRODUCTION TO DTD
Consider an XML document that we intend to write for capturing bank account information. We would like to
see data such as the account number, account holder’s name, opening balance, type of account, etc., as the
fields for which we want to capture information. However, at the same time, we also wish to ensure that this
XML document does not contain any other irrelevant information. For instance, we would like to make sure
that our XML document does not contain information about students, books, projects, or data not needed.
In short, we need easy mechanisms for validating an XML document. For example, we should be able to
specify and validate, which elements, attributes, etc., are allowed in an XML document.
For example, a DTD will allow us to specify that a book XML document can contain exactly one book
name and at the most two author names. A DTD is usually a file with an extension of DTD, although this extension
is optional. Technically, a DTD file need not have any extension. We can specify the relationship between an
XML document and a DTD. That is, we can mention that for a given XML file, we want to use a given DTD file.
Also, we specify the rules that we want to apply in that DTD file. Once this linkage is established, the DTD file
checks the contents of the XML document with reference to these rules automatically whenever we attempt to
make use of the XML document.
Imagine a situation where we do not have anything such as a DTD. Yet, let us imagine that we want to
apply certain rules. How can we accomplish this? Well, there is no simple solution here. The programs that use
the XML document will need to perform all these validations before they can make use of the contents of the
XML document. Of course, it is not impossible. However, it would need to be performed by every program, which
wants to use this XML document for any purpose. Otherwise, there is no guarantee that the XML
document is free of bad data.
A DTD frees application programs from the worry of validating the contents of an XML
document by taking this responsibility on itself. The validation is therefore concentrated in just
one place: inside the DTD. All other parties interested in the contents of an XML document are free to
concentrate on what they want to do, i.e., to make use of the XML document the way they want and process it as
appropriate, while the DTD takes care of validating the contents of the XML document on
behalf of any program or application.
• A DTD helps us specify the rules for validating the contents of an XML document in one place,
thereby allowing the application programs to concentrate on processing the XML document.
• A DTD is a file with a DTD extension.
• The contents of this file are purely textual in nature.
An XML document contains a reference to a DTD file. This is similar to, for example, how a C program would
include references to various header files, or a Java program would include packages.
A DOCTYPE declaration in an XML document specifies that we want to include a reference to a DTD file.
Whenever any program (usually called as an XML parser) reads our XML document containing a DOCTYPE
tag, it understands that we have defined a DTD for our XML document. Therefore, it attempts to also load and
interpret the contents of the DTD file. In other words, it applies the rules specified in the DTD to the contents
of our XML document for verifying them.
There are two types of DTDs, internal DTD and external DTD, also respectively called as internal
subset and external subset.
An internal subset means that the contents of the DTD are inside an XML document itself. On the other hand,
an external subset means that an XML document has a reference to another file, which we call as external
subset.
Let us take a simple example. Suppose we want to define an XML document containing a book name as
the only element. We also wish to write a corresponding DTD, which will define the template or rule book for
our XML document. Then we have two situations: the DTD can be internal or external. Let us call our XML
document as book.xml, and our external DTD as book.dtd. Note that when the DTD is internal, there is no need
to provide a separate name for the DTD (since the contents of the DTD are inside the contents of the XML
document anyway). But when the DTD is external, we must provide a name to this DTD file.
As we can see, when a DTD is internal, we embed the contents of the DTD inside the XML document, as
shown in case (a). However, when a DTD is external, we simply provide a reference to the DTD inside our
XML document, as shown in case (b). The actual DTD file has a separate existence of its own.
When should we use an internal DTD, and when should we use an external DTD? For simple situations,
internal DTDs work well. However, external DTDs help us in two ways:
(i) External DTDs allow us to define a DTD once, and then refer to it from any number of XML documents.
Thus, they are reusable. Also, if we need to make any changes to the contents of the DTD, the change needs
to be made just once (to the DTD file).
(ii) External DTDs reduce the size of the XML documents, since the XML documents now contain just a
reference to the DTD, rather than the actual contents of the DTD.
Another keyword we need to remember in the context of internal DTDs is standalone.
An XML document can be declared as standalone if it does not depend on an external DTD.
The keyword standalone is used in the XML declaration at the top of the document, for example standalone="yes".
Let us now understand the syntax of the DTD declaration or reference, i.e., regardless of whether the DTD
is internal or external. We know that the internal DTD declaration looks like this in our example:
This DTD declaration indicates that our XML document will contain a root element called as myBook,
which, in turn, contains an element called as book_name. Also, the contents of the DTD need to be wrapped inside
square brackets. This informs the XML parser to know the start and the end of the DTD syntax, and also to help
it differentiate between the DTD contents and the XML contents. On the other hand, the external DTD reference
looks like this:
<!DOCTYPE myBook SYSTEM "myBook.dtd">
This does not give us any idea about the actual contents of the DTD file, since the DTD is external.
Let us now worry about the DOCTYPE syntax. In general, the basic syntax for the DOCTYPE line is as shown
below:
1. The DOCTYPE keyword indicates that this is either an internal declaration of a DTD, or a reference to
an external DTD.
2. Regardless of whether it is internal or external, this is followed by the name of the root element in the
XML document.
3. This is followed by the actual contents of the DTD (if the DTD is internal), or by the name of the DTD
file (if it is an external DTD). This is currently shown with dots (…).
Elements are the backbone of any XML document. If we want to associate a DTD with an XML document, we
need to declare all the elements that we would like to see in the XML document, also in the DTD. This should be
quite obvious to understand. After all, a DTD is a template or rule book for an XML
document. An element is declared in a DTD by using the element type declarations (ELEMENT tag).
For example, to declare an element called book_name, we can use the following declaration:
<!ELEMENT book_name (#PCDATA)>
As we can see, book_name is the name of the element, and its data type is PCDATA. The XML jargon calls an
element name as generic identifier. The data type is called as content specification.
Let us consider an example. Suppose that we want to store just the name of a book in our XML document.
Example below shows a sample XML document and the corresponding DTD that specifies the rules for this XML
document. Note that we are using an external DTD. We have added line numbers simply for the sake of
understanding the example easily by providing references during our discussion. The actual XML document
and DTD will never have line numbers.
Sequences: The first question is how we add more element type declarations to a DTD. For example, suppose
that our book DTD needs to contain the book name and author name. For this, we simply need to add a comma
between these two element type declarations. For example:
This declaration specifies that our XML document should contain exactly one book name, followed by exactly
one author name. Any number of book name-author pairs can exist. The following figure shows an example of
specifying an address book.
As we can see, our address book contains sub-elements, such as street, region, postal code, locality, and country.
Each of these sub-elements is defined as a parsed character data field. Of course, we can extend the concept of
sub-elements further. For example, we can break down the street sub-element into street number and
street name. This is shown in the figure below.
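The address book described above can be sketched as follows. This is a minimal illustration, not the original figure: the element names (address, street_number, postal_code, and so on) are assumptions, the DTD is placed inline so that the example is self-contained, and the JDK's validating parser is used to check two sample documents.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdSequenceDemo {
    // Inline DTD (element names are illustrative assumptions).
    // The commas fix the order of the children; street is itself a sequence.
    static final String DTD =
          "<!DOCTYPE address [\n"
        + "  <!ELEMENT address (street, region, postal_code, locality, country)>\n"
        + "  <!ELEMENT street (street_number, street_name)>\n"
        + "  <!ELEMENT street_number (#PCDATA)>\n"
        + "  <!ELEMENT street_name (#PCDATA)>\n"
        + "  <!ELEMENT region (#PCDATA)>\n"
        + "  <!ELEMENT postal_code (#PCDATA)>\n"
        + "  <!ELEMENT locality (#PCDATA)>\n"
        + "  <!ELEMENT country (#PCDATA)>\n"
        + "]>\n";

    // Returns true when the document body is valid against the inline DTD.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // validity or well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String street = "<street><street_number>12</street_number>"
                      + "<street_name>MG Road</street_name></street>";
        String rest = "<region>West</region><postal_code>411001</postal_code>"
                    + "<locality>Camp</locality><country>India</country>";
        System.out.println(isValid("<address>" + street + rest + "</address>")); // true
        System.out.println(isValid("<address>" + rest + street + "</address>")); // false: street must come first
    }
}
```

The second document contains the same elements but in a different order, so the validating parser rejects it.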
Choices: Choices can be specified by using the pipe (|) character. This allows us to specify options of the type A
or B. For example, we can specify that the result of an examination can be that the student has passed or failed
(but not both), as follows.
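A minimal sketch of such a choice is given here, assuming hypothetical element names result, pass, and fail and an inline DTD. The JDK's validating parser accepts a result with exactly one of the two options and rejects a result with both or with neither.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdChoiceDemo {
    // The pipe character means: exactly one of pass or fail (names are assumptions).
    static final String DTD =
          "<!DOCTYPE result [\n"
        + "  <!ELEMENT result (pass | fail)>\n"
        + "  <!ELEMENT pass EMPTY>\n"
        + "  <!ELEMENT fail EMPTY>\n"
        + "]>\n";

    // Returns true when the document body is valid against the inline DTD.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // validity or well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<result><pass/></result>"));        // true
        System.out.println(isValid("<result><fail/></result>"));        // true
        System.out.println(isValid("<result><pass/><fail/></result>")); // false: both options
        System.out.println(isValid("<result/>"));                       // false: no option chosen
    }
}
```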
Occurrence: The number of occurrences, or the frequency, of an element can be specified by using the plus
(+), asterisk (*), or question mark (?) characters. If we do not use any of the occurrence symbols (i.e., +, *, or ?),
then the element must occur exactly once. That is, the default frequency of an element is 1.
For example, we can specify that a book must contain one or more chapters as follows.
We can apply the same concept to a group of sub-elements. For example, to specify that a book must contain
a title, followed by at least one chapter and at least one author, we can use this declaration.
A sample XML document conforming to this DTD declaration is shown in the figure below:
Of course, the grouping of sub-elements for the purpose of specifying frequency is not restricted to the plus sign
(+). It can be done equally well for the asterisk (*) or question mark (?) symbols. The asterisk symbol (*) specifies
that the element may or may not occur. If it is used, it can repeat any number of times.
The DTD specifies that the XML document can depict zero or more employees in an organization. One sample
XML document has three employees, the other has none. Both are allowed. On the other hand, if we replace the
asterisk (*) with a plus sign (+), the situation changes. We must now have at least one employee. Therefore, the
empty organization case (i.e., an organization containing no employees) is now ruled out.
Finally, a question mark (?) indicates that the element either does not occur at all or occurs exactly once.
A nation can have only one president. This is indicated by the following declaration.
At times, of course, the nation may be without a president temporarily. However, at no point can a nation
have more than one president.
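The three occurrence symbols can be tried out together in a small sketch. The declarations below are assumptions reconstructed from the descriptions above (book with chapter+ and author+, organization with employee*, nation with president?), and the helper isValid(root, body) is a hypothetical convenience that selects which element acts as the document root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdOccurrenceDemo {
    // + : one or more, * : zero or more, ? : zero or one (names are assumptions).
    static final String DECLARATIONS =
          "<!ELEMENT book (title, chapter+, author+)>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "<!ELEMENT chapter (#PCDATA)>\n"
        + "<!ELEMENT author (#PCDATA)>\n"
        + "<!ELEMENT organization (employee*)>\n"
        + "<!ELEMENT employee (#PCDATA)>\n"
        + "<!ELEMENT nation (president?)>\n"
        + "<!ELEMENT president (#PCDATA)>\n";

    // Validates body against the declarations, with root as the DOCTYPE name.
    static boolean isValid(String root, String body) {
        String doc = "<!DOCTYPE " + root + " [\n" + DECLARATIONS + "]>\n" + body;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // validity or well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // chapter+ : two chapters are fine, zero chapters are not.
        System.out.println(isValid("book", "<book><title>T</title><chapter>1</chapter>"
                + "<chapter>2</chapter><author>A</author></book>"));               // true
        System.out.println(isValid("book", "<book><title>T</title><author>A</author></book>")); // false
        // employee* : an empty organization is allowed.
        System.out.println(isValid("organization", "<organization/>"));            // true
        // president? : none or one, never two.
        System.out.println(isValid("nation", "<nation/>"));                        // true
        System.out.println(isValid("nation", "<nation><president>A</president>"
                + "<president>B</president></nation>"));                           // false
    }
}
```

Replacing employee* with employee+ in the declarations would make the empty organization invalid, which matches the discussion above.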
Elements describe the markup of an XML document. Attributes provide more details about the elements. An
element can have zero or more attributes. For example, an employee XML document can contain elements to
depict the employee number, name, designation, and salary. The designation element, in turn, can have a
manager attribute that indicates the manager for that employee.
The figure shows an XML document containing an inline DTD. We can see that the element contains an
attribute.
We can see that the message element has three attributes: from, to, and subject. All the three attributes have a data
type of CDATA (which stands for character data), and a #REQUIRED keyword. The #REQUIRED keyword
indicates that this attribute must be a part of the element.
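A small sketch of the message element with its three required attributes is shown here. The ATTLIST declaration is reconstructed from the description above, the content model of message (#PCDATA) is an assumption, and the JDK's validating parser is used to show that leaving out a #REQUIRED attribute makes the document invalid.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdAttributeDemo {
    // Inline DTD: message carries three CDATA attributes, all #REQUIRED.
    static final String DTD =
          "<!DOCTYPE message [\n"
        + "  <!ELEMENT message (#PCDATA)>\n"
        + "  <!ATTLIST message\n"
        + "      from    CDATA #REQUIRED\n"
        + "      to      CDATA #REQUIRED\n"
        + "      subject CDATA #REQUIRED>\n"
        + "]>\n";

    // Returns true when the document body is valid against the inline DTD.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // validity or well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // All three required attributes are present.
        System.out.println(isValid(
            "<message from='Ram' to='Sita' subject='Hello'>See you soon</message>")); // true
        // The subject attribute is missing, so validation fails.
        System.out.println(isValid(
            "<message from='Ram' to='Sita'>See you soon</message>"));                 // false
    }
}
```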
5. INTRODUCTION TO SCHEMA
We know that a DTD is used for validating the contents of an XML document. DTD is undoubtedly a very
important feature of the XML technology. However, there are a number of areas in which DTDs are weak. The
main argument against DTDs is that their syntax is not like that of XML documents. Therefore, people working
with DTDs have to learn a new syntax. Furthermore, this leads to practical problems: for example, we cannot
search for information inside DTDs, nor can we display their contents in the form of HTML.
It is expected that schemas would eventually completely replace most (but not all) features of DTDs. DTDs are
easier to write and provide support for some features (e.g., entities) better. However, schemas are much richer in
terms of their capabilities and extensibility. A schema document is a separate document, just like a DTD. However,
the syntax of a schema is like the syntax of an XML document. Therefore, we can state:
The main difference between a DTD and a schema is that the syntax of a DTD is different from that of XML.
However, the syntax of a schema is the same as that of XML.
We declare an element in a DTD by using the syntax <!ELEMENT>. This is clearly not legal in XML. We cannot
begin an element declaration with an exclamation mark, as happens in the case of a DTD.
We can use a very simple, yet powerful example to illustrate the difference between using a DTD and using a
schema. Suppose that we want to represent the marks of a student in an XML document. For this purpose, we
want to add an element called Marks to our root element Student. We will declare this element to be of type
PCDATA in our DTD file. This will ensure that the parser checks for the existence of the Marks element in the
XML document. However, can it ensure that the marks are numeric? Clearly, no! We cannot control what contents
the element Marks can have. These contents can very well be alphabetic or alphanumeric.
As we can see, the usage of PCDATA in the declaration of an element does not stop us from entering alphabetic
data in a Marks element. In other words, we cannot specify exactly what our elements should contain. This is
clearly not desirable. In the case of a schema, we can very well specify that our element should only
contain numeric data. Moreover, we can control many other aspects of the contents of elements, which is not
possible in the case of DTDs. We use the same terminology for checking the correctness of XML documents in the
case of a schema as in the case of DTDs. An XML document that conforms to the rules of a schema is called
a valid XML document. Otherwise, it is called invalid.
First and foremost, an XML schema is defined in a separate file. This file has the extension xsd.
In our example, the schema file is named message.xsd. The following declaration in our XML document indicates
that we want to associate this schema with our XML document:
<MESSAGE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="message.xsd">
1. The word MESSAGE indicates the root element of our XML document. There is nothing unusual
about it.
This is followed by the actual contents of our XML document. In this case, the contents are nothing but the
contents of our root element.
Note that the schema file is an XML file with an extension of xsd. That is, like any XML document, it begins with
an <?xml …?> declaration. The following lines specify that this is a schema file, and not an ordinary XML
document. They also contain the actual contents of the schema. Let us first reproduce them:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="MESSAGE" type="xsd:string"/>
</xsd:schema>
2. The declaration <xsd:element name="MESSAGE" type="xsd:string"/> specifies that we want to use an element
called MESSAGE in our XML document. The type of this element is string. Also, we are using the namespace
prefix xsd. Recall that this namespace prefix was associated with the namespace URI
http://www.w3.org/2001/XMLSchema in our earlier statement.
Elements in schema can be divided into two categories: simple and complex.
Simple Elements
Simple elements can contain only text. They cannot have sub-elements or attributes. The text that they can contain,
however, can be of various data types such as strings, numbers, dates, etc.
Complex Elements
Complex elements, on the other hand, can contain sub-elements, attributes, etc. Many times, they are made up of
one or more simple elements.
Suppose we want to capture student information in the form of the student’s roll number, name, marks, and result.
Then we can have all these individual blocks of information as simple elements. Then we will have a complex
element in the form of the root element. This complex element will encapsulate these individual simple elements.
We know that the root element of a schema is a reserved keyword called schema. The same is the case here.
The namespace prefix xsd maps to the namespace URI http://www.w3.org/2001/XMLSchema, as before. In
general, this will be true for any schema that we write.
This declares STUDENT as the root element of our XML document. In the schema, it is called the top-level
element. Remember that in the case of a schema, the root element is always the keyword schema. Therefore, the
root element of an XML document is not the root of the corresponding schema. Instead, it appears in the schema
after the root element schema.
Conceptually, a user-defined type is similar to a structure in C/C++ or a class in Java (without the
methods). It allows us to create our own custom type.
In other words, the schema specification allows us to create our own custom data types. For example,
we can create our own types for storing information about employees, departments, songs, friends, sports games,
and so on. We recognize this as a user-defined type because it does not have our namespace prefix xsd. Remember
that all the standard data types provided by the XML schema specifications reside at the namespace
http://www.w3.org/2001/XMLSchema, which we have prefixed as xsd in the earlier statement.
Now that we have declared our own type, we must explain what it represents and contains. That is exactly what
we are doing here. This statement indicates that we have used StudentType as a type earlier, and now we want to
explain what it means. Also, note that we use a keyword complexType to designate that StudentType is a complex
element. This is similar to stating struct StudentType or class StudentType in C++/Java.
4. <xsd:sequence>:
Schemas allow us to force a sequence of simple elements within a complex element. We can specify that a
particular complex element must contain one or more simple elements in a strict sequence. Thus, if the complex
element is A, containing two simple elements B and C, we can mandate that C must follow B inside A. In other
words, the XML document must have:
<A>
<B> … </B>
<C>… </C>
</A>
This declaration specifies that the first simple element inside our complex element is ROLL_NUMBER, of type
string. After this, we have NAME, MARKS, and RESULT as three more simple elements following
ROLL_NUMBER.
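The student schema discussed above can be put together and tried out as a sketch. The schema text below is reconstructed from the description and is not the original listing: ROLL_NUMBER is a string as stated, while the types of NAME and RESULT (string) and of MARKS (xsd:integer) are assumptions. The check uses the standard javax.xml.validation API of the JDK and shows that non-numeric marks, which a DTD could not catch, are rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaStudentDemo {
    // Schema reconstructed from the text; MARKS as xsd:integer is an assumption.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>\n"
        + "  <xsd:element name='STUDENT' type='StudentType'/>\n"
        + "  <xsd:complexType name='StudentType'>\n"
        + "    <xsd:sequence>\n"
        + "      <xsd:element name='ROLL_NUMBER' type='xsd:string'/>\n"
        + "      <xsd:element name='NAME' type='xsd:string'/>\n"
        + "      <xsd:element name='MARKS' type='xsd:integer'/>\n"
        + "      <xsd:element name='RESULT' type='xsd:string'/>\n"
        + "    </xsd:sequence>\n"
        + "  </xsd:complexType>\n"
        + "</xsd:schema>\n";

    // Returns true when the XML text is valid against the schema above.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a STUDENT document with the given marks text.
    static String student(String marks) {
        return "<STUDENT><ROLL_NUMBER>17</ROLL_NUMBER><NAME>Asha</NAME>"
             + "<MARKS>" + marks + "</MARKS><RESULT>Passed</RESULT></STUDENT>";
    }

    public static void main(String[] args) {
        System.out.println(isValid(student("82")));     // true
        System.out.println(isValid(student("eighty"))); // false: MARKS is not numeric
    }
}
```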
6. BASICS OF PARSING
Parsing of XML is the process of reading and validating an XML document and converting it into the desired
format. The program that does this job is called a parser.
An XML file is something that exists on the disk. So, the parser first has to bring it from the disk into
main memory. More importantly, the parser has to make this in-memory representation of the XML file available
to the programmer in a form that the programmer is comfortable with. A parser reads a file from the disk, converts
it into an in-memory object and hands it over to the programmer. The programmer’s responsibility is then to take
this object and manipulate it as required. For example, the programmer may want to display the values of
certain elements, add some attributes, count the total number of elements, and so on.
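Suppose that someone younger in your family has returned from playing a cricket match. He is very excited about
it, and wants to describe what happened in the match. He can describe it in two ways: either narrate the match as
it happened, event by event, or first give the overall highlights and then get into specific details.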
When an XML document is to be presented to a Java program as an object, there are two main possibilities.
1. Present the document in bits and pieces, as and when we encounter certain sections or portions of the
document.
2. Present the entire document tree at one go. This means that the Java program has to then think of this
document tree as one object, and manipulate it the way it wants.
We have discussed this concept in the context of the description of a cricket match earlier. We can either
describe the match as it happened, event by event; or first describe the overall highlights and then get into
specific details. For example, given an XML document, we can process it in one of two ways:
1. Go through the XML structure item by item (e.g., to start with, the line <?xml version="1.0"?>,
followed by the element <employees>, and so on).
2. Read the entire XML document into memory as an object, and parse its contents as per the needs.
Technically, the first approach is called the Simple API for XML (SAX), whereas the latter is known as the
Document Object Model (DOM).
In general, we can equate the SAX approach to our example of the step-by-step description of a cricket match.
The SAX approach works on an event model. This works as follows.
(i) The SAX parser keeps track of various events, and whenever an event is detected, it informs our Java
program.
(ii) Our Java program needs to then take an appropriate action, based on the requirements of handling
that event. For example, there could be an event Start element as shown in the diagram.
(iii) Our Java program needs to constantly monitor such events, and take an appropriate action.
(iv) Control comes back to the SAX parser, and steps (i) and (ii) repeat.
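The event model described in the steps above can be sketched with the SAX classes that ship with the JDK. The sample employees document and the class name SaxEventsDemo are assumptions made for illustration. The handler records each Start element, Characters, and End element event in the order in which the parser reports it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsDemo {
    // Parses the XML text and returns the events in the order they were fired.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                text.setLength(0);
                log.add("start:" + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks, so buffer it.
                text.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    log.add("text:" + value);
                }
                text.setLength(0);
                log.add("end:" + qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The parser calls back into the handler as it reads the document.
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<employees><employee>Ram</employee>"
                   + "<employee>Sita</employee></employees>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

Note that the program never sees the document as a whole: it only reacts to one event at a time, which is why SAX needs little memory even for large documents.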
In general, we can equate the DOM approach to our example of the overall description of a cricket match. This
works as follows.
(i) The DOM approach parses through the whole XML document at one go. It creates an in-memory
tree-like structure of our XML document.
(ii) This tree-like structure is handed over to our Java program at one go, once it is ready. No events get
fired unlike what happens in SAX.
(iii) The Java program then takes over the control and deals with the tree the way it wants, without
actively interfacing with the parser on an event-by-event basis. Thus, there is no concept of
something such as Start element, Characters, End element, etc.
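For comparison, the DOM approach can be sketched with the DOM classes that ship with the JDK. The sample employees document, the class name DomTreeDemo, and the helper methods are assumptions made for illustration. The parser builds the whole tree first, and the program then reads and modifies it freely.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreeDemo {
    // Parses the XML text and returns the complete in-memory tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the elements with the given tag name anywhere in the document.
    static int countElements(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).getLength();
    }

    // Returns the text of the index-th element with the given tag name.
    static String textOf(String xml, String tag, int index) {
        return parse(xml).getElementsByTagName(tag).item(index).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<employees><employee>Ram</employee>"
                   + "<employee>Sita</employee></employees>";
        Document doc = parse(xml);            // the whole tree is ready at this point
        Element root = doc.getDocumentElement();
        NodeList employees = root.getElementsByTagName("employee");

        // Navigate the tree in any order; no events are involved.
        System.out.println(root.getTagName());                     // employees
        System.out.println(employees.getLength());                 // 2
        System.out.println(employees.item(1).getTextContent());    // Sita

        // The tree can also be modified, for example by adding an attribute.
        root.setAttribute("count", String.valueOf(employees.getLength()));
        System.out.println(root.getAttribute("count"));            // 2
    }
}
```

Because the entire document is held in memory, DOM is convenient for random access and modification, at the cost of more memory than SAX for large documents.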