From Wikipedia, the free encyclopedia
HTML
Web browsers receive HTML documents from a web server or from local storage and
render the documents into multimedia web pages. HTML describes the structure of a
web page semantically and originally included cues for its appearance.
HTML elements are the building blocks of HTML pages. With HTML constructs, images
and other objects such as interactive forms may be embedded into the rendered page.
HTML provides a means to create structured documents by denoting structural
semantics for text such as headings, paragraphs, lists, links, quotes, and other
items. HTML elements are delineated by tags, written using angle brackets. Tags
such as <img> and <input> directly introduce content into the page. Other tags such
as <p> and </p> surround and provide information about document text and may
include sub-element tags. Browsers do not display the HTML tags but use them to
interpret the content of the page.
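The fragment below is an illustrative sketch, not taken from any particular site, of the structural markup just described. A heading, a paragraph containing a link, a list and a quotation are each marked up with paired tags, while <img> and <input> stand alone and introduce content directly. The file name photo.jpg and the example.com URL are hypothetical placeholders.

```html
<h2>Getting started</h2>
<!-- <p>...</p> surrounds ordinary text; <a> marks a hyperlink inside it -->
<!-- (the URL is a hypothetical placeholder) -->
<p>Read the <a href="https://example.com/guide">guide</a> first.</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<blockquote>
  <p>A quoted passage.</p>
</blockquote>
<!-- <img> and <input> have no end tag; they introduce content themselves -->
<!-- (photo.jpg is a hypothetical file name) -->
<img src="photo.jpg" alt="A placeholder photo">
<input type="text" name="query">
```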
HTML can embed programs written in a scripting language such as JavaScript, which
affects the behavior and content of web pages. The inclusion of CSS defines the
look and layout of content. The World Wide Web Consortium (W3C), former maintainer
of the HTML standard and current maintainer of the CSS standards, has encouraged the use of
CSS over explicit presentational HTML since 1997.[3] A form of HTML, known as
HTML5, can display video and audio natively through the <video> and <audio>
elements, and provides the <canvas> element for graphics drawn with JavaScript.
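A minimal sketch of a page that combines the technologies mentioned above: a <style> element supplies CSS, a <script> element supplies JavaScript, and a <video> element embeds media. The file name clip.mp4 and the id value intro are hypothetical placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Media example</title>
  <style>
    /* CSS controls presentation instead of HTML attributes */
    video { max-width: 100%; }
  </style>
</head>
<body>
  <!-- clip.mp4 is a hypothetical file; "controls" asks for the browser's playback UI -->
  <video id="intro" src="clip.mp4" controls></video>
  <script>
    // JavaScript adds behavior: log a message when playback finishes
    document.getElementById("intro").addEventListener("ended", () => {
      console.log("Playback finished");
    });
  </script>
</body>
</html>
```

The script is placed at the end of the body so that the <video> element already exists when getElementById runs.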
History
Development
Tim Berners-Lee in April 2009
In 1980, physicist Tim Berners-Lee, a contractor at CERN, proposed and prototyped
ENQUIRE, a system for CERN researchers to use and share documents. In 1989,
Berners-Lee wrote a memo proposing an Internet-based hypertext system.[4]
Berners-Lee specified HTML and wrote the browser and server software in late 1990. That
year, Berners-Lee and CERN data systems engineer Robert Cailliau collaborated on a
joint request for funding, but the project was not formally adopted by CERN. In his
personal notes of 1990, Berners-Lee listed "some of the many areas in which
hypertext is used"; an encyclopedia is the first entry.[5]
The first publicly available description of HTML was a document called "HTML Tags",[6]
first mentioned on the Internet by Tim Berners-Lee in late 1991.[7][8] It
describes 18 elements comprising the initial, relatively simple design of HTML.
Except for the hyperlink tag, these were strongly influenced by SGMLguid, an
in-house Standard Generalized Markup Language (SGML)-based documentation format at
CERN. Eleven of these elements still exist in HTML 4.[9]
HTML is a markup language that web browsers use to interpret and compose text,
images, and other material into visible or audible web pages. Default
characteristics for every item of HTML markup are defined in the browser, and these
characteristics can be altered or enhanced by the web page designer's additional
use of CSS. Many of the text elements are mentioned in the 1988 ISO technical
report TR 9537 Techniques for using SGML, which describes the features of early
text formatting languages such as that used by the RUNOFF command developed in the
early 1960s for the CTSS (Compatible Time-Sharing System) operating system. These
formatting commands were derived from the commands used by typesetters to manually
format documents. However, the SGML concept of generalized markup is based on
elements (nested annotated ranges with attributes) rather than merely print
effects, with separate structure and markup. HTML has been progressively moved in
this direction with CSS.
After the HTML and HTML+ drafts expired in early 1994, the IETF created an HTML
Working Group. In 1995, this working group completed "HTML 2.0", the first HTML
specification intended to be treated as a standard on which future
implementations should be based.[13]
Further development under the auspices of the IETF was stalled by competing
interests. Since 1996, the HTML specifications have been maintained, with input
from commercial software vendors, by the World Wide Web Consortium (W3C).[14] In
2000, HTML became an international standard (ISO/IEC 15445:2000). HTML 4.01 was
published in late 1999, with further errata published through 2001. In 2004,
development began on HTML5 in the Web Hypertext Application Technology Working
Group (WHATWG), which became a joint deliverable with the W3C in 2008, and was
completed and standardized on 28 October 2014.[15]
XHTML 1.0 was published as a W3C Recommendation on January 26, 2000,[60] and was
later revised and republished on August 1, 2002. It offers the same three
variations as HTML 4.0 and 4.01, reformulated in XML, with minor restrictions.
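For comparison with the HTML examples elsewhere in this article, a minimal XHTML 1.0 Strict document might look like the sketch below. Because it is XML, it declares the XHTML namespace, closes every element (including empty ones such as <br />), and uses lowercase element names. The title and paragraph text are placeholders.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- The root element must declare the XHTML namespace -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>XHTML example</title>
</head>
<body>
  <!-- Every element must be closed, including empty ones like br -->
  <p>First line<br />Second line</p>
</body>
</html>
```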
XHTML 1.1[61] was published as a W3C Recommendation on May 31, 2001. It is based on
XHTML 1.0 Strict, but includes minor changes, can be customized, and is
reformulated using modules in the W3C recommendation "Modularization of XHTML",
which was published on April 10, 2001.[62]
XHTML 2.0 was a working draft. Work on it was abandoned in 2009 in favor of work on
HTML5 and XHTML5.[63][64][65] XHTML 2.0 was incompatible with XHTML 1.x and,
therefore, would be more accurately characterized as an XHTML-inspired new language
than an update to XHTML 1.x.
Transition of HTML publication to WHATWG
See also: HTML5 § W3C and WHATWG conflict
On 28 May 2019, the W3C announced that WHATWG would be the sole publisher of the
HTML and DOM standards.[66][67][68][69] The W3C and WHATWG had been publishing
competing standards since 2012. While the W3C standard was identical to the WHATWG standard
in 2007, the standards have since progressively diverged due to different design
decisions.[70] The WHATWG "Living Standard" had been the de facto web standard for
some time.[71]
Markup
HTML markup consists of several key components, including those called tags (and
their attributes), character-based data types, character references and entity
references. HTML tags most commonly come in pairs like <h1> and </h1>, although
some represent empty elements and so are unpaired, for example <img>. The first tag
in such a pair is the start tag, and the second is the end tag (they are also
called opening tags and closing tags).
Another important component is the HTML document type declaration, which triggers
standards mode rendering.
<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <div>
      <p>Hello world!</p>
    </div>
  </body>
</html>
The text between <html> and </html> describes the web page, and the text between
<body> and </body> is the visible page content. The markup text <title>This is a
title</title> defines the browser page title shown on browser tabs and window
titles, and the <div> tag defines a division of the page used for easy styling.
Between <head> and </head>, a <meta> element can be used to define webpage
metadata.
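As an illustration of the <meta> element, a head section might declare the character encoding along with the standard author and description metadata names. The author name and description text are hypothetical placeholders.

```html
<head>
  <!-- Declares the document's character encoding -->
  <meta charset="utf-8">
  <title>This is a title</title>
  <!-- Standard metadata names; the values are hypothetical placeholders -->
  <meta name="author" content="Jane Example">
  <meta name="description" content="A short summary of the page.">
</head>
```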
The Document Type Declaration <!DOCTYPE html> is for HTML5. If a declaration is not
included, various browsers will revert to "quirks mode" for rendering.[72]
Elements
Main article: HTML element
Tags may also enclose further tag markup between the start and end, including a
mixture of tags and text. This indicates further (nested) elements, as children of
the parent element.
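A small illustrative fragment of nesting: the <ul> element is a child of the <div>, each <li> is a child of the <ul>, and the <em> element is nested inside the text of the second list item.

```html
<div>
  <!-- ul is a child of div -->
  <ul>
    <!-- each li is a child of ul -->
    <li>Plain item</li>
    <!-- text and a nested em element mixed inside one li -->
    <li>Item with <em>emphasized</em> text</li>
  </ul>
</div>
```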
The start tag may also include the element's attributes within the tag. These
indicate other information, such as identifiers for sections within the document,
identifiers used to bind style information to the presentation of the document, and,
for some tags such as the <img> tag used to embed images, a reference to the image
resource, in a form such as <img src="https://example.com/example.jpg">.
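The kinds of attributes described above can be sketched as follows. The id value gallery and the class name highlighted are hypothetical: id identifies a section of the document, class is a hook that a stylesheet can bind to, and src points to the image resource.

```html
<!-- id identifies this section; class groups it for styling (both names hypothetical) -->
<section id="gallery" class="highlighted">
  <!-- src names the image resource; alt gives a text alternative -->
  <img src="https://example.com/example.jpg" alt="Example image">
</section>
<!-- A link can target the section through its id -->
<a href="#gallery">Jump to the gallery</a>
```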
Some elements, such as the line break <br>, do not permit any embedded content,
either text or further tags. These require only a single empty tag (akin to a start
tag) and do not use an end tag.
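For example, <br> takes no content and no end tag. The second paragraph below uses the XML-style form <br />, which the HTML syntax also accepts on such empty (void) elements; in HTML the trailing slash has no effect.

```html
<!-- br has no content and no end tag -->
<p>First line<br>Second line</p>
<!-- XML-style form; the slash is ignored in HTML -->
<p>First line<br />Second line</p>
```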
Many tags, particularly the end tag for the very commonly used paragraph
element <p>, are optional. An HTML browser or other agent can infer where an
element ends from the context and the structural rules defined by the HTML
standard. These rules are complex and not widely understood by most HTML authors.
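As a sketch of these rules, the two fragments below produce the same element structure in an HTML5 parser, apart from where whitespace ends up. The omitted </p> and </li> end tags are inferred because a following <p>, <ul> or <li> start tag, or the end of the parent element, implicitly closes the open element.

```html
<!-- End tags omitted: each open p or li is closed when the next element starts -->
<p>First paragraph
<p>Second paragraph
<ul>
  <li>One
  <li>Two
</ul>

<!-- The same structure with every end tag written out -->
<p>First paragraph</p>
<p>Second paragraph</p>
<ul>
  <li>One</li>
  <li>Two</li>
</ul>
```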
Element examples
See also: HTML element
Header of the HTML document: <head>...</head>. The title is included in the head,
for example:
<head>
  <title>The Title</title>
  <link rel="stylesheet" href="stylebyjimbowales.css"> <!-- Imports Stylesheets -->
</head>
Headings
HTML headings are defined with the <h1> to <h6> tags with H1 being the highest (or
most important) level and H6 the least:
<h1>Heading Level 1</h1>
<h2>Heading Level 2</h2>
<h3>Heading Level 3</h3>
<h4>Heading Level 4</h4>
<h5>Heading Level 5</h5>
<h6>Heading Level 6</h6>
CSS can substantially change the rendering.
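As a hedged illustration, stylesheet rules such as the ones below (the chosen values are arbitrary) override the browser's default heading appearance without changing the markup itself.

```html
<head>
  <style>
    /* Arbitrary example values: smaller, non-bold level-1 headings */
    h1 { font-size: 1.25em; font-weight: normal; }
    /* Render level-6 headings in small capitals */
    h6 { font-variant: small-caps; }
  </style>
</head>
<body>
  <h1>Heading Level 1</h1>
  <h6>Heading Level 6</h6>
</body>
```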
Paragraphs are defined with the <p> tag.
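A minimal example of two paragraphs, each delimited by its own <p> element. Under default browser styling, the line break inside the second paragraph's source is rendered as an ordinary space.

```html
<p>Paragraph 1</p>
<!-- The source line break below is collapsed into a space when rendered -->
<p>Paragraph 2, whose line break in the source
   is rendered as an ordinary space.</p>
```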
There are several common attributes that may appear in many elements:
Most elements take the language-related attribute dir to specify text direction,
such as with "rtl" for right-to-left text in, for example, Arabic, Persian or
Hebrew.[79]
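A short sketch of the dir attribute, shown together with the related lang attribute. The sample content is the Hebrew word "shalom"; dir="auto" asks the browser to determine the direction from the text itself.

```html
<!-- Explicit right-to-left direction for Hebrew text -->
<p dir="rtl" lang="he">שלום</p>
<!-- dir="auto": the browser infers direction from the content -->
<p dir="auto">שלום</p>
```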
The ability to "escape" characters in this way allows for the characters < and &
(when written as &lt; and &amp;, respectively) to be interpreted as character data,
rather than markup. For example, a literal < normally indicates the start of a tag,
and & normally indicates the start of a character entity reference or numeric
character reference; writing it as &amp; or &#x26; or &#38; allows & to be included
in the content of an element or in the value of an attribute. A literal
double-quote character (") inside an attribute value that is itself delimited by
double quotes must also be escaped, as &quot; or &#x22; or &#34;.
Similarly, a literal single-quote character (') inside an attribute value
delimited by single quotes must be escaped as &#x27; or &#39; (or as &apos; in HTML5
or XHTML documents[80][81]). If document authors overlook the need to escape such
characters, some browsers can be very forgiving and try to use context to guess
their intent. The result is still invalid markup, which makes the document less
accessible to other browsers and to other user agents, such as those that parse
documents for search indexing.
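The fragment below is an illustrative sketch of escaping in practice. The paragraph renders as "Use <p> for paragraphs & <br> for line breaks.", and the two title attributes each contain the quote character that delimits them. The text and attribute values are placeholders.

```html
<!-- Renders as: Use <p> for paragraphs & <br> for line breaks. -->
<p>Use &lt;p&gt; for paragraphs &amp; &lt;br&gt; for line breaks.</p>
<!-- A double quote inside a double-quoted attribute value is escaped -->
<abbr title="the &quot;HyperText&quot; in HTML">HTML</abbr>
<!-- A single quote inside a single-quoted value, via a numeric reference -->
<abbr title='Berners-Lee&#39;s markup language'>HTML</abbr>
```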
Escaping also allows for characters that are not easily typed, or that are not
available in the document's character encoding, to be represented within the
element and attribute content. For example, the acute-accented e (é), a character
typically found only on Western European and South American keyboards, can be
written in any HTML document as the entity reference &eacute; or as the numeric
references &#233; or &#xE9;, using characters that are available on all keyboards
and are supported in all character encodings. Unicode character encodings such as
UTF-8 are compatible with all modern browsers and allow direct access to almost all
the characters of the world's writing systems.[82]
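As a sketch, the named, decimal and hexadecimal references below all render as é. Because the document declares UTF-8, it can also contain the character directly; placing the <meta charset> declaration near the top of <head> is the usual convention. The page title is a placeholder.

```html
<head>
  <!-- Declaring UTF-8 allows é to be written directly as well -->
  <meta charset="utf-8">
  <title>Caf&eacute;</title>
</head>
<body>
  <!-- Named, decimal and hexadecimal references, then the literal character -->
  <p>&eacute; &#233; &#xE9; é</p>
</body>
```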
The original purpose of the doctype was to enable the parsing and validation of
HTML documents by SGML tools based on the document type definition (DTD). The DTD
to which the DOCTYPE refers contains a machine-readable grammar specifying the
permitted and prohibited content for a document conforming to such a DTD. Browsers,
on the other hand, do not implement HTML as an application of SGML and as a
consequence do not read the DTD.
HTML5 does not define a DTD; therefore, in HTML5 the doctype declaration is simpler
and shorter:[83]
<!DOCTYPE html>
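An example of an HTML 4 doctype is the declaration for the HTML 4.01 Strict DTD:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">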
In addition, HTML 4.01 provides Transitional and Frameset DTDs, as explained below.
The transitional type is the most inclusive, incorporating current tags as well as
older or "deprecated" tags, with the Strict DTD excluding deprecated tags. The
frameset has all tags necessary to make frames on a page, along with the tags
included in the transitional type.[84]
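For reference, the doctype declarations for the HTML 4.01 Transitional and Frameset DTDs are shown below. Each names its DTD by a public identifier and by a URL; a real document would use only one of them, as its first line.

```html
<!-- HTML 4.01 Transitional: also allows deprecated presentational elements -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Frameset: the Transitional content plus frame elements -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
```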
Semantic HTML
Main article: Semantic HTML
Semantic HTML is a way of writing HTML that emphasizes the meaning of the encoded
information over its presentation (look). HTML has included semantic markup from
its inception,[85] but has also included presentational markup, such as <font>, <i>
and <center> tags. There are also the semantically neutral div and span tags. Since
the late 1990s, when Cascading Style Sheets were beginning to work in most
browsers, web authors have been encouraged to avoid the use of presentational HTML
markup with a view to the separation of content and presentation.[86]
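A hedged before-and-after sketch: the first fragment uses presentational tags (<center>, <font> and <b>, the first two of which are obsolete in HTML5) to make text look like a heading, while the second marks the text up as a heading and leaves its appearance to CSS. The heading text and the class name note are hypothetical examples.

```html
<!-- Presentational markup: describes how the text looks, not what it is -->
<center><font size="6"><b>Opening hours</b></font></center>

<!-- Semantic markup: states that the text is a section heading -->
<h2>Opening hours</h2>
<!-- The look then comes from a stylesheet, for example:
     h2 { text-align: center; font-size: 2em; } -->

<!-- span is semantically neutral; "note" is a hypothetical class for styling -->
<p>Open <span class="note">daily</span> from 9 to 5.</p>
```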
In a 2001 discussion of the Semantic Web, Tim Berners-Lee and others gave examples
of ways in which intelligent software "agents" may one day automatically crawl the
web and find, filter, and correlate previously unrelated, published facts for the
benefit of human users.[87] Such agents are not commonplace even now, but some of
the ideas of Web 2.0, mashups and price comparison websites may be coming close.
The main difference between these web application hybrids and Berners-Lee's
semantic agents lies in the fact that the current aggregation and hybridization of
information is usually designed by web developers, who already know the web
locations and the API semantics of the specific data they wish to mash, compare and
combine.
An important type of web agent that does crawl and read web pages automatically,
without prior knowledge of what it might find, is the web crawler or search-engine
spider. These software agents are dependent on the semantic clarity of web pages
they find as they use various techniques and algorithms to read and index millions
of web pages a day and provide web users with search facilities withou