HTML Introduction Part 2
The character-set for the early World Wide Web was ASCII. ASCII supports the digits 0-9, the uppercase
and lowercase English alphabet, and some special characters.
Since many countries use characters that are not part of ASCII, the default character-set for modern browsers
is ISO-8859-1.
If a web page uses a character-set other than ISO-8859-1, it should be specified in the <meta> tag.
The different character-sets being used around the world are listed below:
Unicode enables processing, storage and interchange of text data no matter what the platform, no matter what
the program, no matter what the language.
The Unicode Standard has become a success and is implemented in XML, Java, ECMAScript (JavaScript), LDAP,
CORBA 3.0, WML, etc. The Unicode standard is also supported in many operating systems and all modern
browsers.
The Unicode Consortium cooperates with the leading standards development organizations, like ISO, W3C, and
ECMA.
Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 and
UTF-16:
Character-set   Description
UTF-8           A character in UTF-8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the
                Unicode standard. UTF-8 is backwards compatible with ASCII. UTF-8 is the preferred
                encoding for e-mail and web pages.
UTF-16          16-bit Unicode Transformation Format is a variable-length character encoding for Unicode,
                capable of encoding the entire Unicode repertoire. UTF-16 is used in major operating
                systems and environments, like Microsoft Windows 2000/XP/2003/Vista/CE and the Java
                and .NET byte code environments.
Tip: The first 256 characters of Unicode character-sets correspond to the 256 characters of ISO-8859-1.
Tip: All HTML 4 processors already support UTF-8, and all XHTML and XML processors support UTF-8 and UTF-16!
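The variable lengths described for UTF-8 and UTF-16 can be checked with a short script. The sketch below is illustrative only: it assumes a JavaScript runtime that provides the standard TextEncoder API (current browsers and Node.js), and the helper names utf8ByteLength and utf16UnitCount are made up for this example.

```javascript
// Hypothetical helpers for illustration; they are not part of any library.

// Number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Number of 16-bit code units a string occupies in UTF-16.
// JavaScript strings are sequences of UTF-16 code units, so this is .length.
function utf16UnitCount(text) {
  return text.length;
}

// "A", e with acute accent, the euro sign, and an emoji outside the 16-bit range.
const samples = ["A", "\u00E9", "\u20AC", "\u{1F600}"];
for (const ch of samples) {
  console.log(ch, "UTF-8 bytes:", utf8ByteLength(ch), "UTF-16 units:", utf16UnitCount(ch));
}
```

The ASCII letter takes a single byte in UTF-8, which is what makes UTF-8 backwards compatible with ASCII, while the emoji needs four bytes in UTF-8 and two 16-bit units in UTF-16.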
ASCII contains the digits 0-9, the uppercase and lowercase English letters from A to Z, and some special
characters.
The character-sets used in modern computers, in HTML, and on the Internet are all based on ASCII.
The following table lists the 128 ASCII characters and their equivalent HTML entity codes.
ISO-8859-1
ISO-8859-1 is the default character set in most browsers.
The first 128 characters of ISO-8859-1 are the original ASCII character-set (the digits 0-9, the uppercase
and lowercase English alphabet, and some special characters).
The higher part of ISO-8859-1 (codes from 160-255) contains the characters used in Western European countries
and some commonly used special characters.
Entities are used to display reserved characters or to express characters that cannot easily be entered with the
keyboard.
HTML and XHTML processors must support the five special characters listed in the table below:
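A common use of entities is escaping text before it is placed into a page. The sketch below assumes the five special characters are the ampersand, less-than sign, greater-than sign, quotation mark, and apostrophe (the table itself is not reproduced here, so this list is an assumption). The function name escapeHtml is hypothetical, and the apostrophe is written as the numeric entity &#39; because the named form is not defined in HTML 4.

```javascript
// Assumed mapping of the five special characters to entities (illustrative only).
const ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;", // numeric entity for the apostrophe
};

// Hypothetical helper: replace each special character with its entity.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => ENTITIES[ch]);
}

console.log(escapeHtml('<p class="note">Tom & Jerry</p>'));
```

A single pass over the string matters here: the ampersand is replaced at the same time as the other characters, so the "&" inside an entity that was just produced is never escaped a second time.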
URL Encoding
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by the two hexadecimal digits of the character's code.
URLs cannot contain spaces. URL encoding normally replaces a space with a plus sign (+) in form data, or with %20.
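The replacement rule can be sketched in a few lines. This is a simplified illustration rather than a complete implementation of any standard: it assumes the text is first converted to UTF-8 bytes, treats only letters, digits, and the characters "-", ".", "_", and "~" as safe, and uses the hypothetical function names percentEncode and formEncode.

```javascript
// Hypothetical helper: percent-encode every byte that is not a letter, digit,
// "-", ".", "_" or "~". Assumes UTF-8 for characters outside ASCII.
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the input
  let result = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 128 && /[A-Za-z0-9._~-]/.test(ch)) {
      result += ch; // safe character, copied unchanged
    } else {
      // "%" followed by two uppercase hexadecimal digits
      result += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return result;
}

// Form-style variant: spaces become "+" instead of "%20".
// A literal "+" in the input is already encoded as "%2B", so there is no ambiguity.
function formEncode(text) {
  return percentEncode(text).replace(/%20/g, "+");
}

console.log(percentEncode("Hello World!"));
console.log(formEncode("Hello World!"));
```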
In JavaScript you can use the encodeURI() function. PHP has the rawurlencode() function and ASP has the
Server.URLEncode() function.
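As a brief illustration of the JavaScript side, the sketch below compares encodeURI() with the related built-in encodeURIComponent(). The example URL and its parameter value are made up. encodeURI() is meant for a complete URL and leaves characters such as ":", "/", "?", "&" and "=" untouched, while encodeURIComponent() is meant for a single value and encodes those characters as well.

```javascript
// Example URL and value are invented for illustration.
const url = "http://example.com/my page.html?name=Tom & Jerry";

// Encodes the space but keeps the URL's structural characters.
const encodedUrl = encodeURI(url);

// Encodes a single parameter value, including the "&".
const encodedValue = encodeURIComponent("Tom & Jerry");

console.log(encodedUrl);
console.log(encodedValue);
```

When a value may itself contain "&" or "=", encoding it with encodeURIComponent() before adding it to the query string keeps it from being read as a separator.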
According to the W3C recommendation you should declare the primary language for each Web page with the lang
attribute inside the <html> tag, like this:
<html lang="en">
...
</html>
Maltese mt
Maori mi
Marathi mr
Moldavian mo
Mongolian mn
Nauru na
Nepali ne
Norwegian no
Occitan oc
Oriya or
Oromo (Afan, Galla) om
Papiamentu
Pashto (Pushto) ps
Polish pl
Portuguese pt
Punjabi pa
Quechua qu
Rhaeto-Romance rm
Romanian ro
Russian ru
Sami (Lappish)
Samoan sm
Sango sg
Sanskrit sa
Serbian sr
Serbo-Croatian sh
Sesotho st
Setswana tn
Shona sn
Sindhi sd
Sinhalese si
Siswati ss
Slovak sk
Slovenian sl
Somali so
Spanish es
Sundanese su
Swahili (Kiswahili) sw
Swedish sv
Syriac
Tagalog tl
Tajik tg
Tamazight
Tamil ta
Tatar tt
Telugu te
Thai th
Tibetan bo
Tigrinya ti
Tonga to
Tsonga ts
Turkish tr
Turkmen tk
Twi tw
Uighur ug
Ukrainian uk
Urdu ur
Uzbek uz
Venda
Vietnamese vi
Volapük vo
Welsh cy
Wolof wo
Xhosa xh
Yi
Yiddish yi, ji
Yoruba yo
Zulu zu
1xx: Information
Message:                             Description:
100 Continue                         Only a part of the request has been received by the server, but as
                                     long as it has not been rejected, the client should continue with the
                                     request
101 Switching Protocols              The server switches protocol
2xx: Successful
Message:                             Description:
200 OK                               The request is OK
201 Created                          The request is complete, and a new resource is created
202 Accepted                         The request is accepted for processing, but the processing is not
                                     complete
203 Non-authoritative Information
204 No Content
205 Reset Content
206 Partial Content
3xx: Redirection
Message:                             Description:
300 Multiple Choices                 A link list. The user can select a link and go to that location.
                                     Maximum five addresses
301 Moved Permanently                The requested page has moved to a new URL
302 Found                            The requested page has moved temporarily to a new URL
303 See Other                        The requested page can be found under a different URL
304 Not Modified
305 Use Proxy
306 Unused                           This code was used in a previous version. It is no longer used, but
                                     the code is reserved
307 Temporary Redirect               The requested page has moved temporarily to a new URL
4xx: Client Error
Message:                             Description:
400 Bad Request                      The server did not understand the request
401 Unauthorized                     The requested page needs a username and a password
402 Payment Required                 Reserved for future use
403 Forbidden                        Access is forbidden to the requested page
404 Not Found                        The server cannot find the requested page
405 Method Not Allowed               The method specified in the request is not allowed
406 Not Acceptable                   The server can only generate a response that is not accepted by the
                                     client
407 Proxy Authentication Required    You must authenticate with a proxy server before this request can be
                                     served
408 Request Timeout                  The request took longer than the server was prepared to wait
409 Conflict                         The request could not be completed because of a conflict
410 Gone                             The requested page is no longer available
411 Length Required                  The "Content-Length" is not defined. The server will not accept the
                                     request without it
412 Precondition Failed              The precondition given in the request was evaluated to false by the
                                     server
413 Request Entity Too Large         The server will not accept the request, because the request entity is
                                     too large
414 Request-URI Too Long             The server will not accept the request, because the URL is too long.
                                     This can occur when a "post" request is converted to a "get" request
                                     with long query information
415 Unsupported Media Type           The server will not accept the request, because the media type is not
                                     supported
416 Requested Range Not Satisfiable
417 Expectation Failed
5xx: Server Error
Message:                             Description:
500 Internal Server Error            The request was not completed. The server met an unexpected
                                     condition
501 Not Implemented                  The request was not completed. The server does not support the
                                     functionality required
502 Bad Gateway                      The request was not completed. The server received an invalid
                                     response from the upstream server
503 Service Unavailable              The request was not completed. The server is temporarily
                                     overloaded or down
504 Gateway Timeout                  The gateway has timed out
505 HTTP Version Not Supported       The server does not support the HTTP protocol version used in the
                                     request
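The first digit of a status code identifies its class, which is how the messages above are grouped. A minimal sketch of that lookup follows. The function name statusClass is hypothetical, and the labels "Client Error" and "Server Error" for the 4xx and 5xx groups are the conventional names, assumed here.

```javascript
// Hypothetical helper: map an HTTP status code to the name of its class.
function statusClass(code) {
  // Only three-digit codes in the range 100-599 belong to a defined class.
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    return "Unknown";
  }
  const classes = {
    1: "Information",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
  };
  return classes[Math.floor(code / 100)]; // first digit selects the class
}

for (const code of [100, 200, 301, 404, 503]) {
  console.log(code, statusClass(code));
}
```

Checking the class rather than individual codes lets a client handle a code it does not recognise in a sensible way, for example by treating any unfamiliar 4xx code as a client error.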
HTML 4.01 / XHTML 1.0 Reference
Ordered Alphabetically
DTD: indicates in which HTML 4.01 / XHTML 1.0 DTD the tag is allowed. S=Strict, T=Transitional, and F=Frameset
Ordered by Function
DTD: indicates in which HTML 4.01 / XHTML 1.0 DTD the tag is allowed. S=Strict, T=Transitional, and F=Frameset