02_html_syntax_2024

The document provides an overview of HTML as a markup language for web browsers, explaining its structure, syntax, and various elements such as tags and attributes. It discusses the importance of standards in HTML development, including the role of organizations like W3C and WHATWG in shaping HTML specifications. Additionally, it covers best practices for writing HTML, including the use of whitespace, encoding, and the inclusion of elements.

Web Technologies I

HTML Language: the Basics

Darja Solodovņikova

adapted from Krišs Rauhvargers


HTML: a Language for Web Browsers

▪ HTML is a language that web browsers understand
▪ … not intended for reading in a book
▪ How to see it:
▪ A way of adding "thingies" to a text
document to make it look better
▪ A way to formalize the structure of text
to make it look better

HTML: What is It?
▪ HTML is a text markup language
▪ HTML document =
text document + markup *
Plain text document:
A demonstration
I am a simple document without any formatting.
I consist of several paragraphs. The first paragraph ought to be a title.

Document with markup:
<h1>A demonstration</h1>
<p>I am a simple document without any formatting.</p>
<p>I consist of several paragraphs. The first paragraph ought to be a title.</p>

Types of Markup Languages
▪ "Macro" style languages
▪ Can be compiled down to "draw this letter
here" commands
▪ Example: LaTeX, PostScript
▪ WYSIWYG languages
▪ Markup exists, but it is hidden from the user
▪ Example: Microsoft RTF
▪ Structural markup languages
▪ Markup defines only structure
▪ Examples: HTML, XML
Tags in HTML
▪ Each "tag" has its meaning and place
▪ If you read between the "tags", you get
readable text
▪ The computer uses "tags" to alter the
display of text
▪ "Tags" mark the beginning and end of
HTML elements and may contain
attributes
HTML CONCEPTS

Some Ground Rules
▪ Browser should try its best to display a
document (even with syntax errors)
▪ If the document contains unknown
elements, treat them as simple text
– Compatibility feature so older browsers can
display newer markup (and vice versa)
– Also users make lots of mistakes

Whitespaces and Line Breaks
▪ HTML was designed to be pretty-formatted
▪ Line breaks are allowed to wrap long lines
▪ Additional white spaces can be added
▪ Rules are simple:
▪ Multiple whitespaces are treated as one
▪ Linebreak is treated as a whitespace

In code:
<p>This is the most
     dull
HTML sample.
</p>
In the browser:
This is the most dull HTML sample.
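The two whitespace rules can be approximated in a few lines of Python. This is only a sketch: `collapse_whitespace` is a hypothetical helper name, and real browsers apply more detailed rules (for example inside `<pre>` elements or when CSS changes whitespace handling).

```python
import re


def collapse_whitespace(text: str) -> str:
    """Approximate browser rendering of ordinary text content.

    Every run of spaces, tabs and line breaks becomes a single space,
    and whitespace at the very start and end is dropped.
    """
    return re.sub(r"[ \t\n\r\f]+", " ", text).strip()


# The text content of the sample paragraph, as written in the source file.
source = "This is the most\n     dull\nHTML sample.\n"
print(collapse_whitespace(source))
```

However the source lines are wrapped or indented, the collapsed text is the same.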
Syntax of HTML Elements
▪ Most elements have an opening tag and a closing tag
▪ An opening tag is written like this:
▪ "<" + element name + ">"
▪ for instance, <p>
▪ A closing tag is written like this:
▪ "</" + element name + ">"
▪ for instance, </p>
▪ Content of the element (text, other elements) is
written between the opening and the closing tags
▪ For instance,
▪ <p>An example sentence.</p>
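To see the opening tag, the content and the closing tag as separate pieces, the example can be fed to the tokenizer in Python's standard `html.parser` module. This is a minimal sketch: `TagLogger` and `tokenize` are illustrative names, and this tokenizer is far simpler than a browser's parser.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records every opening tag, closing tag and piece of text it sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def tokenize(markup: str) -> list:
    """Return the list of (kind, value) events found in the markup."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


print(tokenize("<p>An example sentence.</p>"))
```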
Auto-Closing of Elements
▪ Browsers forgive missing closing tags in
many cases
▪ <p>This is the first paragraph.
<p> This is the second one.
▪ Obviously, a paragraph cannot contain a
paragraph, hence the first paragraph is
auto-closed by the browser
▪ But div can contain div
▪ Good practice – always close your tags
Elements without Textual Content
▪ Some elements (ones that do not have
textual content) do not need a closing tag
▪ Solutions for this case depend on which
version of HTML is used:
▪ HTML4, HTML5.x: just use the opening tag:
▪ <img src="picture.jpg" alt="Sample">
▪ XHTML: explicitly mark the element as self-closing:
▪ <img src="picture.jpg" alt="Sample" />
▪ <img src="picture.jpg" alt="Sample"></img>
Attributes
▪ An element may need additional information for
completeness
▪ Hyperlink: where does it link to
▪ Image: where to take the picture from
▪ Rules
▪ Attributes are written in the opening tag of an element
▪ Attribute values should be enclosed in quotes
▪ Single or double quotes both work, but do not mix the two
around one value
▪ Multiple attributes must be separated by a space
▪ <p id="someid" class="someclass">
▪ Examples
▪ <a href="https://www.lu.lv">
▪ <img src='nomnom.gif'>
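How attributes are read out of an opening tag can be sketched with Python's standard `html.parser`. `AttributeCollector` and `attributes_of` are hypothetical names used only for this illustration; the sketch keeps just the last opening tag seen for each element name.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Stores the attributes of each opening tag, keyed by element name."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, quotes removed.
        self.found[tag] = dict(attrs)


def attributes_of(markup: str) -> dict:
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


print(attributes_of('<p id="someid" class="someclass">'))
print(attributes_of("<img src='nomnom.gif'>"))
```

Single and double quotes give the same result: the quotes delimit the value and are not part of it.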
Inclusion of Elements
▪ An element may contain other elements (as per its
specification/DTD)
▪ For instance, a paragraph (p) can contain an image (img).
<p>
Text of the paragraph. And an image here:
<img src="pic.jpg" alt="picture">
</p>
▪ Inclusion must be complete, start/end tags may not overlap:
▪ Wrong:
<strong>This is bold,
<em>this is italics and bold</strong>,
the author hoped this would be just italics</em>
▪ Right:
<strong>This is bold,
<em>this is italics and bold</em></strong>,
<em>the author hoped this would be just italics</em>
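The "no overlapping tags" rule amounts to a stack discipline: a closing tag must always match the most recently opened element. The sketch below checks that with Python's standard `html.parser`. `NestingChecker` and `is_properly_nested` are illustrative names, and the check is deliberately simplified: it does not know about elements without closing tags (such as img) or about auto-closing.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Tracks open elements on a stack and notes closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_properly_nested(markup: str) -> bool:
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Properly nested: no mismatched closing tag and nothing left open.
    return not checker.errors and not checker.open_elements


wrong = ("<strong>This is bold, <em>this is italics and bold</strong>, "
         "the author hoped this would be just italics</em>")
right = ("<strong>This is bold, <em>this is italics and bold</em></strong>, "
         "<em>the author hoped this would be just italics</em>")
print(is_properly_nested(wrong), is_properly_nested(right))
```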
WEB AUTHORING

Two Approaches

WYSIWYG

Code editing
Two Approaches of Web Authoring
▪ The two approaches still remain:
▪ WYSIWYG
▪ use visual formatting tools, let the computer take care of
markup
▪ no (not enough) control over what is generated
▪ Source code editing
▪ full control over what is generated
▪ author has to know syntax and semantics of HTML
elements
▪ slower on large amounts of data
▪ In this course we use the latter approach
▪ We are the IT guys, we have to know how it works
to fix it when it breaks
HTML STANDARDIZATION

The Browser Wars
▪ Facts
▪ In 1994, Mosaic Communications was founded and
started creating Netscape Navigator
▪ In 1995, Microsoft began developing Internet Explorer
▪ The first proposal for an HTML specification was published
in mid-1993
▪ HTML 2.0 standard was published in November, 1995
▪ Way too late!
▪ To attract developers and users, both vendors tried to
implement as many features as possible
▪ Features that were different from other browsers
▪ Features that were not documented elsewhere
▪ and were "a bit different" than the competitor's way.
▪ Result: parts of the web only worked partway!
Standards Organizations: W3C

▪ A resolution in 1994 to start the World Wide Web
Consortium, W3C (http://w3.org)
▪ Main aim: creation of generally
available web development standards
▪ Web standards should be followed by
browser developers as well as
document authors
Clarifications Made in HTML 2.0
▪ Tags vs Elements
▪ An HTML document consists of separate elements
▪ Element is an "atomic particle" of a document
▪ To serialize an element, tags are used
▪ DOCTYPE
▪ An instruction to the web browser about what version of
HTML the page is written in
▪ Separation of a document and its metadata
▪ HEAD and BODY elements
▪ Backward compatibility
▪ "Bad" elements from previous standards are deprecated, but
should be rendered by the browser
▪ "Do not break the web"
W3C Standards/Recommendations
▪ HTML 2.0 (1995)
▪ Fixes SGML compatibility (DOCTYPE)
▪ Standardizes stuff which is already implemented
▪ HTML 3.2 (beginning of 1997)
▪ Adds presentational markup (center, font, small)
▪ Adds CSS support
▪ HTML 4.01 (1999)
▪ Deprecates presentational markup
▪ Adds frames, iframes, new form features
▪ XHTML 1.0 (2000)
▪ Same as HTML4, rewritten to XML
[Diagram of the standards family: SGML, HTML Tags, HTML 2.0, HTML 3.0, HTML 3.2, HTML 4.01, XML 1.0, XHTML 1.0, XHTML 2, HTML 5 and XHTML 5 (new elements, different serialization)]
WHATWG – Another Standards Organization?
▪ WHATWG (Web Hypertext Application
Technology Working Group)
▪ a group of vendors and implementers
interested in improving the current web
▪ one of the reasons it was formed: W3C's decision
to abandon HTML in favor of XML-based
technologies
▪ WHATWG started work on HTML5, which
was later continued together with W3C
Standards Support
▪ Browsers support standards to a certain
extent (different in each browser)
▪ Standards are ambiguous
▪ Browser developers sometimes still:
▪ "forget" something relevant
▪ partially implement something that will be included
in the next versions of standards (HTML5 Canvas)
▪ do something on their own (-moz-border-radius)
▪ do something that "everybody does"
▪ http://caniuse.com
OTHER FEATURES OF THE HTML LANGUAGE
Global Attributes – Used on Any Element

▪ ID – shows that the element differs from
anything else, assigns an identifier
▪ CLASS – shows that the element is similar to
others having the same class
▪ TITLE – describes the contents of an
element.
▪ In practice, the title text is shown when you hover
the mouse cursor over the element
▪ Other global attributes:
https://developer.mozilla.org/en/HTML/Global_attributes
File Encoding
▪ Like any text file, an HTML file is written to disk
in a specific encoding
▪ It is a good idea to define the encoding in the
document HEAD (must match the actual
encoding!)
▪ HTML5.x
▪ <meta charset="UTF-8">
▪ HTML allows other encodings, but…
▪ It's the 21st century, use UTF-8 everywhere
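Why the declared encoding must match the actual one can be shown with a short Python sketch. `save_and_misread` is a hypothetical helper: it stores text as UTF-8 bytes and then decodes those bytes the way a browser would if it trusted the declared encoding. The sample word and the choice of latin-1 as the mismatched declaration are assumptions made for illustration.

```python
def save_and_misread(text: str, declared_encoding: str) -> str:
    """Encode text as UTF-8 (the actual encoding on disk), then decode the
    bytes using whatever encoding the document claims to be written in."""
    raw_bytes = text.encode("utf-8")
    return raw_bytes.decode(declared_encoding, errors="replace")


original = "Rīga"
# Declaration matches the file: the text survives unchanged.
print(save_and_misread(original, "utf-8"))
# Declaration does not match: the non-ASCII letter turns into garbage.
print(save_and_misread(original, "latin-1"))
```

Plain ASCII letters look the same in both cases, which is why such mistakes often go unnoticed until a page contains national characters.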

Why UTF-8?
▪ When did you last use View→Character
encoding while browsing the Web?
▪ How often did you do that in 2005?
▪ Historical issues in the 90s/00s
▪ Wrong encoding
▪ Undefined encoding using OS defaults
▪ Especially in the then bilingual Latvian web
▪ Are there any (at least theoretical) drawbacks
to using UTF-8?
Substitute Symbols
▪ If page text contains angle brackets, they have to be
substituted in the source:
▪ <p><Look left | Look right></p>
▪ does not work, no such element "Look"
▪ <p>&lt;Look left | Look right&gt;</p>
▪ lt = less than
▪ gt = greater than
▪ Substitute symbols are in the form
▪ "&" + entity name + ";"
▪ Another case: if an attribute value contains a quote
symbol:
▪ <img alt=""Like" and "Share" this"> - does not work
▪ <img alt="&quot;Like&quot; and &quot;Share&quot; this"> - OK
▪ quot = quotation mark
Substitute Symbols (2)
▪ Since the & symbol is reserved for
substitutions, it has to be "escaped" as well
▪ The & symbol itself has to be substituted:
&amp;
▪ <img title="Wiley & sons"> - sometimes
not ok
▪ <img title="Wiley &amp; sons"> - OK
▪ amp = ampersand
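When HTML is generated by a program, these substitutions are usually applied by a library function rather than by hand. A minimal sketch using Python's standard `html.escape`; the wrapper names `escape_text` and `escape_attribute` are hypothetical and exist only to separate the two cases discussed above.

```python
import html


def escape_text(text: str) -> str:
    """Substitute &, < and > so the text is safe between tags."""
    return html.escape(text, quote=False)


def escape_attribute(value: str) -> str:
    """Substitute &, <, > and quote characters so the value is safe
    inside a quoted attribute."""
    return html.escape(value, quote=True)


print("<p>" + escape_text("<Look left | Look right>") + "</p>")
print('<img alt="' + escape_attribute('"Like" and "Share" this') + '">')
print('<img title="' + escape_attribute("Wiley & sons") + '">')
```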

Other Substitutes
Symbol   Substitute   Meaning
<        &lt;         less than
>        &gt;         greater than
"        &quot;       quote
&        &amp;        ampersand
…        &hellip;     ellipsis
" "      &nbsp;       non-breaking space
€        &euro;       euro sign
ü        &#252;       any Unicode symbol, by its number
—        &mdash;      em dash, the real dash ("domuzīme" in Latvian)
▪ Other:
▪ Raquo Laquo? http://www.raquo.net
▪ http://htmlhelp.com/reference/html40/entities/
▪ Note: if you're using UTF-8 everywhere, you don't need the
Unicode substitutes (except for < > " &)
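The substitutes in the table can be turned back into characters with Python's standard `html.unescape`, which is a quick way to check what a given entity stands for. `decode_entities` is a hypothetical wrapper name used only in this sketch.

```python
import html


def decode_entities(text: str) -> str:
    """Replace named and numeric substitutes with the characters they stand for."""
    return html.unescape(text)


for entity in ["&lt;", "&gt;", "&quot;", "&amp;", "&hellip;", "&euro;", "&#252;", "&mdash;"]:
    print(entity, "->", decode_entities(entity))

# &nbsp; becomes the non-breaking space character U+00A0, not an ordinary space.
print(decode_entities("&nbsp;") == "\u00a0")
```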
When are Whitespaces not Welcome?
▪ Whitespaces can be multiplied/added nearly
anywhere in textual content … with some
exceptions
▪ Do not put a space
▪ between the opening angle bracket and the element name
▪ wrong: < table
▪ Do put a space
▪ between the closing quote of an attribute value and the next attribute name
▪ wrong: <table border="1"id="sampletable">
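The effect of a space after the opening angle bracket can be observed with Python's standard `html.parser`. This is a sketch under the assumption that this tokenizer behaves like a browser here; `Recorder` and `parse` are illustrative names.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Collects the names of opening tags and all plain text it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(markup: str):
    """Return (list of opening tag names, concatenated text)."""
    recorder = Recorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.tags, "".join(recorder.text)


print(parse("<table>"))    # recognized as a tag
print(parse("< table>"))   # not a tag: passed through as ordinary text
```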