HTML5 Character Encodings

A character encoding is a method of converting bytes into characters. To validate or display an HTML document, a program must choose a character encoding. HTML5 authors have three ways to declare the character encoding:
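A small illustration of why the choice matters, written in Python purely for demonstration: the same five bytes produce different text depending on which encoding the program picks. The variable names are illustrative only.

```python
# "café" encoded as UTF-8: the "é" becomes the two bytes C3 A9.
data = b"caf\xc3\xa9"

# Decoding with the encoding that produced the bytes gives the intended text.
as_utf8 = data.decode("utf-8")      # 'café'

# Decoding with the wrong encoding does not fail here; it silently
# maps each byte to one Latin-1 character, producing "mojibake".
as_latin1 = data.decode("latin-1")  # 'cafÃ©'
```

Because the wrong decoding raises no error, a browser or validator that guesses the encoding can display garbled text without any warning, which is why the document should declare its encoding explicitly.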

HTTP Content-Type Header

If you are writing a CGI script or a similar server-side program, you can use the HTTP Content-Type header to declare the character encoding of the response.

print "Content-Type: text/html; charset=utf-8\r\n";
print "\r\n";  # an empty line ends the CGI header block; the HTML body follows
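The same idea as a minimal Python sketch, assuming a CGI-style setup where the program emits the header block followed by the body. The function name `cgi_response` is hypothetical; the key points are the empty line that ends the headers and encoding the body with the same charset the header declares.

```python
def cgi_response(body: str) -> bytes:
    """Build a minimal CGI-style response: header, empty line, UTF-8 body.

    Hypothetical helper for illustration; a real script would write the
    result to standard output (e.g. via sys.stdout.buffer).
    """
    # The second \r\n produces the empty line that terminates the headers.
    header = "Content-Type: text/html; charset=utf-8\r\n\r\n"
    # The body must really be encoded as UTF-8, matching the declared charset.
    return header.encode("ascii") + body.encode("utf-8")


response = cgi_response("<!DOCTYPE html><p>caf\u00e9</p>")
```

Declaring `charset=utf-8` while encoding the body differently would reintroduce the garbling the header is meant to prevent.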

The <meta> element

You can also declare the encoding with a <meta> element whose charset attribute names the encoding. The element must appear in full within the first 1024 bytes of the HTML5 document, so place it near the top of the <head>.

<meta charset="UTF-8">
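A rough Python sketch of checking that requirement. It is deliberately simplified and is not the HTML standard's "prescan a byte stream" algorithm: it only looks for a `<meta charset=...>` tag inside the first 1024 bytes, which is the amount browsers are encouraged to examine. The names `find_meta_charset` and `_META_CHARSET` are hypothetical.

```python
import re
from typing import Optional

# Simplified pattern: matches <meta charset=...> with optional quotes.
# It ignores other attribute orders and the http-equiv form.
_META_CHARSET = re.compile(
    rb"""<meta\s+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def find_meta_charset(document: bytes, limit: int = 1024) -> Optional[str]:
    """Return the lowercased charset label if found within `limit` bytes."""
    match = _META_CHARSET.search(document[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

For example, `find_meta_charset(b'<!DOCTYPE html><head><meta charset="UTF-8">')` returns `"utf-8"`, while the same tag placed after 2000 bytes of padding is not found.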

Unicode Byte Order Mark (BOM)

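A byte order mark (BOM) is the character U+FEFF placed at the very beginning of a data stream, where it acts as a signature identifying the encoding form and, for UTF-16 and UTF-32, the byte order. It is used primarily in plain-text files that carry no other encoding label. Encoded in UTF-8, the BOM is the bytes EF BB BF; in UTF-16 it is FE FF (big-endian) or FF FE (little-endian). When an HTML document begins with a BOM, browsers use the encoding it indicates in preference to a Content-Type header or a <meta> element.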
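A minimal Python sketch of BOM detection, checking the three BOMs that the WHATWG Encoding Standard's BOM sniff recognizes (UTF-8, UTF-16BE, UTF-16LE). The function name `sniff_bom` is hypothetical; it returns the detected encoding and the bytes after the BOM, which is what a decoder would then process.

```python
import codecs
from typing import Optional, Tuple

# BOM byte sequences from the standard codecs module, with the
# encoding each one indicates.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
)


def sniff_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding, rest) if data starts with a BOM, else (None, data)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            # Strip the BOM so it is not decoded as a U+FEFF character.
            return name, data[len(bom):]
    return None, data
```

For instance, `sniff_bom(b"\xef\xbb\xbf<!DOCTYPE html>")` returns `("utf-8", b"<!DOCTYPE html>")`; a document without a BOM comes back unchanged with `None`, leaving the HTTP header or `<meta>` element to decide.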