Appendix E: Character Encodings
Appendix D, Color Names and Values, discusses how computers store information. It also explains how a character-encoding scheme works as a table that translates between characters and the way they are stored in the computer.
One of the most common character sets (or character encodings) in use on computers is the American Standard Code for Information Interchange (ASCII), which is probably the most widely supported character set for encoding text electronically. You can expect all computers browsing the web to understand ASCII.
The problem with ASCII is that it supports only the uppercase and lowercase Latin alphabet, the digits 0 to 9, and some extra characters: a total of 128 characters. Table E-1 lists the printable characters of ASCII. (The remaining characters are control characters, such as line feed and carriage return.)
Table E-1: Printable Characters of ASCII
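The printable ASCII characters occupy codes 32 (the space) through 126 (the tilde). The other 33 codes, 0 to 31 and 127, are control characters. The following is a minimal Python sketch that lists the printable range and counts the control codes:

```python
# Printable ASCII runs from code 32 (space) through 126 (tilde).
printable = [chr(code) for code in range(32, 127)]

# The other 33 ASCII codes (0-31 and 127) are control characters.
control = [code for code in range(128) if code < 32 or code == 127]

print(len(printable))         # 95
print("".join(printable))     # space, punctuation, digits, A-Z, a-z, ...
print(len(control))           # 33
print(ord("\n"), ord("\r"))   # 10 13  (line feed, carriage return)
```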

However, many languages use either accented Latin characters or completely different alphabets. ASCII does not address these characters, so you need to learn about character encodings
if you want to use any non-ASCII characters.
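You can check whether a piece of text stays within ASCII by trying to encode it. This minimal Python sketch uses a hypothetical helper name, `fits_in_ascii`. Python 3.7 and later also provide the built-in `str.isascii()` for the same check.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text is one of the 128 ASCII characters."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(fits_in_ascii("Hello, world"))  # True
print(fits_in_ascii("café"))          # False: é is an accented Latin letter
print(fits_in_ascii("Привет"))        # False: Cyrillic alphabet
```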


Character encodings are also important if you want to use symbols, such as certain dashes and quotation mark characters, because these are not guaranteed to transfer properly between different encodings. If you do not indicate the character encoding the document is written in, some of the special characters might not display correctly.
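The following Python sketch shows what goes wrong when a document's encoding is not declared. It stores an en dash as UTF-8, then reads the same bytes back twice: once correctly as UTF-8, and once as Windows-1252 (codec name `"cp1252"`). Windows-1252 is a Windows code page closely related to ISO-8859-1. Using it as the reader's wrong guess is an assumption made for illustration.

```python
text = "Pages 10\u201320"          # contains an en dash, U+2013
stored = text.encode("utf-8")       # the dash becomes 3 bytes: e2 80 93

read_correctly = stored.decode("utf-8")
read_wrongly = stored.decode("cp1252")  # reader guessed the wrong encoding

print(stored)          # b'Pages 10\xe2\x80\x9320'
print(read_correctly)  # Pages 10–20
print(read_wrongly)    # Pages 10â€“20
```

The bytes never changed. Only the table used to interpret them did, which is why declaring the encoding matters.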
The International Organization for Standardization (ISO) created a range of character sets to deal with different national characters, as shown in Table E-2. ISO-8859-1 is commonly used in Western versions of authoring tools such as Adobe Dreamweaver, as well as in applications such as Windows Notepad.
Table E-2: ISO Character Sets

Character Set   Description
ISO-8859-1      Latin alphabet part 1, covering North America, Western Europe,
                Latin America, the Caribbean, Canada, and Africa
ISO-8859-2      Latin alphabet part 2, covering Eastern Europe, including Bosnian,
                Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin
                transcription), Serbo-Croatian, Slovak, Slovenian, Upper Sorbian,
                and Lower Sorbian
ISO-8859-3      Latin alphabet part 3, covering Southern Europe, Esperanto,
                Maltese, Turkish, and miscellaneous others
ISO-8859-4      Latin alphabet part 4, covering Scandinavia and the Baltic states
                (and other languages not in ISO-8859-1)
ISO-8859-5      Latin/Cyrillic alphabet part 5
ISO-8859-6      Latin/Arabic alphabet part 6
ISO-8859-7      Latin/Greek alphabet part 7
ISO-8859-8      Latin/Hebrew alphabet part 8
ISO-8859-9      Latin 5 alphabet part 9 (the same as ISO-8859-1, except that
                Turkish characters replace Icelandic ones)
ISO-8859-10     Latin 6: Lappish, Nordic, and Eskimo
ISO-8859-15     Latin 9: largely the same as ISO-8859-1, but eight rarely used
                characters are replaced, adding the euro sign (€) and some
                French and Finnish letters
ISO-8859-16     Latin 10, covering Southeastern Europe: Albanian, Croatian,
                Hungarian, Polish, Romanian, and Slovenian; it can also be used
                for French, German, Italian, and Irish Gaelic
ISO-2022-JP     Latin/Japanese alphabet part 1
ISO-2022-JP-2   Latin/Japanese alphabet part 2
ISO-2022-KR     Latin/Korean alphabet part 1
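Every ISO-8859 set in Table E-2 reuses the same 256 byte values. As a result, a byte has no fixed meaning until you know which set it belongs to. The following minimal Python sketch decodes the single byte 0xE9 under four of these sets, using the codec names Python's standard library gives them:

```python
# One byte value, read through four different ISO-8859 tables.
raw = bytes([0xE9])
decoded = {
    charset: raw.decode(charset)
    for charset in ("iso-8859-1", "iso-8859-5", "iso-8859-7", "iso-8859-8")
}

for charset, char in decoded.items():
    print(f"{charset}: {char} (U+{ord(char):04X})")
# iso-8859-1: é (U+00E9)   Latin small e with acute
# iso-8859-5: щ (U+0449)   Cyrillic small shcha
# iso-8859-7: ι (U+03B9)   Greek small iota
# iso-8859-8: י (U+05D9)   Hebrew letter yod
```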


It is helpful to note that the first 128 characters of ISO-8859-1 match those of ASCII, so you can
safely use those characters as you would in ASCII.
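This overlap is easy to confirm with a minimal Python check. Codes 0 to 127 decode identically under both sets. Codes 128 to 255 exist only in ISO-8859-1.

```python
every_byte = bytes(range(256))
low, high = every_byte[:128], every_byte[128:]

# Codes 0-127: ASCII and ISO-8859-1 decode to exactly the same characters.
same_low_half = low.decode("ascii") == low.decode("iso-8859-1")
print(same_low_half)  # True

# Codes 128-255: ISO-8859-1 assigns characters (such as é at 0xE9); ASCII does not.
print(high.decode("iso-8859-1")[0xE9 - 128])  # é
try:
    high.decode("ascii")
    ascii_covers_high = True
except UnicodeDecodeError:
    ascii_covers_high = False
print(ascii_covers_high)  # False
```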
The Unicode Consortium was then set up to devise a single way to represent the characters of all languages, rather than having different, incompatible character codes for different languages. Therefore, if you want to create documents that use characters from multiple character sets, you can do so using a single Unicode character encoding. Furthermore, users can view documents written in different character sets, provided their processor (and fonts) support the Unicode standard, no matter what platform they are on or which country they are in. Having a single character encoding also reduces software development costs, because programs do not need to be designed to support multiple character encodings.
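The following minimal Python sketch mixes several scripts in one string. Every character has its own code point in the single Unicode character set. A regional set such as ISO-8859-5 (Latin/Cyrillic) cannot hold the Greek, Hebrew, and Japanese parts, while a Unicode encoding such as UTF-8 can.

```python
greeting = "Hello Привет Γειά שלום こんにちは"

# Latin, Cyrillic, Greek, Hebrew, and Japanese letters, all in one character set.
codes = [f"U+{ord(ch):04X}" for ch in "AПΓשこ"]
print(codes)  # ['U+0041', 'U+041F', 'U+0393', 'U+05E9', 'U+3053']

# ISO-8859-5 covers Latin and Cyrillic only, so this string does not fit.
try:
    greeting.encode("iso-8859-5")
    fits_iso_8859_5 = True
except UnicodeEncodeError:
    fits_iso_8859_5 = False
print(fits_iso_8859_5)  # False

# UTF-8 encodes the whole string, and decoding gives back the same text.
round_trip_ok = greeting.encode("utf-8").decode("utf-8") == greeting
print(round_trip_ok)  # True
```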
One challenge for Unicode is that many older programs were written to support only 8-bit character sets (limiting them to 256 characters), which is nowhere near the number required for all languages. Unicode therefore specifies encodings that store each character as one or more fixed-size units, making enough space for the huge character set it encompasses. These are known as UTF-8, UTF-16, and UTF-32, as shown in Table E-3.
Table E-3: Unicode Character Sets

Character Set   Description
UTF-8           A Unicode Transformation Format that comes in 8-bit units, that
                is, in bytes. A character in UTF-8 takes from 1 to 4 bytes,
                making UTF-8 a variable-width encoding.
UTF-16          A Unicode Transformation Format that comes in 16-bit units
                (sometimes called shorts). A character takes 1 or 2 units,
                making UTF-16 a variable-width encoding.
UTF-32          A Unicode Transformation Format that comes in 32-bit units
                (sometimes called longs). It is a fixed-width encoding: every
                character is exactly 1 unit long.
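The following minimal Python sketch shows where UTF-8's variable width comes from. It packs a code point into 1 to 4 bytes. The helper name `utf8_encode` is hypothetical; in real code you would simply call `str.encode("utf-8")`. The sketch checks itself against that built-in codec, then counts the bytes the same characters need in all three encodings.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8, using 1 to 4 bytes.

    This sketch does not reject surrogate code points (U+D800 to U+DFFF),
    which strict UTF-8 forbids.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:        # up to 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Compare the sketch with Python's codec and count bytes per encoding.
rows = []
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:   # A, é, €, and an emoji
    mine = utf8_encode(ord(ch))
    assert mine == ch.encode("utf-8")               # agrees with the built-in codec
    rows.append((f"U+{ord(ch):04X}", len(mine),
                 len(ch.encode("utf-16-le")),       # "-le" variants omit the BOM
                 len(ch.encode("utf-32-le"))))

for row in rows:
    print(*row)   # code point, UTF-8 bytes, UTF-16 bytes, UTF-32 bytes
# U+0041 1 2 4
# U+00E9 2 2 4
# U+20AC 3 2 4
# U+1F600 4 4 4
```

The emoji needs two 16-bit units in UTF-16, which is the "1 or 2 shorts" case. Every character in UTF-32 takes exactly four bytes.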

The first 256 code points of Unicode correspond to the 256 characters of ISO-8859-1.
By default, HTML 4 processors should support UTF-8, and XML processors are supposed to support
UTF-8 and UTF-16; therefore, all XHTML-compliant processors should also support UTF-16 (because
XHTML is an application of XML). The HTML5 specification is strongly biased toward UTF-8.
In practice you almost always want to use UTF-8.
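This minimal Python check covers two related points. First, decoding any byte as ISO-8859-1 gives the Unicode code point with the same number. Second, UTF-8 stores the 128 ASCII characters as the same single bytes ASCII uses, so a plain-ASCII file is already valid UTF-8. That ASCII compatibility is one practical reason UTF-8 is the usual choice.

```python
# Byte b decoded as ISO-8859-1 is the Unicode code point b, for all 256 values.
latin1_matches_unicode = all(
    ord(bytes([b]).decode("iso-8859-1")) == b for b in range(256)
)
print(latin1_matches_unicode)  # True

# UTF-8 and ASCII produce identical bytes for the 128 ASCII characters.
ascii_text = "".join(chr(code) for code in range(128))
ascii_is_utf8 = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(ascii_is_utf8)  # True
```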
For more information on internationalization and different character sets and encodings, see www.i18nguy.com and the article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know about Unicode and Character Sets (No Excuses!)" at www.joelonsoftware.com/articles/Unicode.html.