ISO/IEC 10646 & The Unicode Standard: Mike Ksar

ISO / IEC 10646 - The Unicode Standard Achievements and Directions. ISO working group is already working on a future amendment of the latest edition. Backward compatibility and future enhanced interoperability are driving forces.

ISO/IEC 10646 The Unicode Standard Achievements and Directions

ISO/IEC 10646 & The Unicode Standard

Mike Ksar
Senior Program Manager, International Standards Strategy, Microsoft Corporation
JTC1/SC2/WG2 Convener
Screenplay by Asmus Freytag

21st International Unicode Conference

Dublin, Ireland, May 2002

With the publication of the latest edition of ISO/IEC 10646-1:2000, of ISO/IEC 10646-2:2001 and of Unicode 3.2, it is worth noting that our combined efforts to preserve one fully synchronized International Standard are a success. However, we all need to continue to work towards that objective and thwart any efforts to split the effort into two competing standards.

The technical working group within ISO (SC2/WG2) is already working on a future amendment of the latest edition to ensure that parts 1 and 2 are tied together and reflect future Unicode publications. Backward compatibility and enhanced future interoperability are two strong driving forces to ensure future synchronization and a flexible architecture. Several officers from the ISO technical working group participated in the review of Unicode 3.2 (UTR #28) to ensure the continued convergence of both standards.

Unicode 3.2 is equivalent to the cumulative parts 1 and 2 of ISO/IEC 10646, including the first amendment to part 1. Unicode 3.2 contains all additional characters in ISO/IEC 10646-2 and in the first amendment, which increases the total repertoire to 95156 encoded characters. This International Standard is one of the building blocks in creating global and interoperable solutions across multiple platforms that meet and reflect global market requirements.

The author wishes to thank Dr. Asmus Freytag for his contributions in organizing, editing and reviewing this presentation.

Unicode and the Unicode logo are trademarks of Unicode, Inc. and are registered in some jurisdictions.

Outline

- Background
- Relation between Unicode and ISO/IEC 10646
    - What is the same
    - What is different
    - What is being merged
- Synchronization
    - Shared Process and Policies
    - Aligned Program of Work
    - Common publication resources
- Beyond character coding
    - Character properties & Collation
    - Internationalization
    - Products and Standards
- Summary

20th International Unicode Conference
Washington, DC, January 2002

This presentation focuses on the current state and future directions of ISO/IEC 10646, including the latest edition (ISO/IEC 10646-1:2000) and the International Standard (IS) for ISO/IEC 10646 part 2, and on Unicode 3.2 and the future updates that will continue the synchronization efforts. A quick overview of the architecture is also presented. The goal is to reduce the confusion in the user community regarding the contents of part 2 and to increase the confidence of the audience in the strong relationship between the ISO technical working group and the Unicode Consortium.

A review of the charter, history, organizational structure and relationship of the organizations involved in developing 10646 is followed by a discussion of the core features of the International Standard and of how these are compatibly extended by the additional specifications of the Unicode Standard.

Any real-world solution must look beyond character coding and touch on many areas considered the domain of internationalization. Standards in these areas build on and refer to the Unicode Standard. As the Standard has become more mature, products supporting it have become more numerous and more diverse in scope.


The Internet
The internet pushes the envelope on internationalization

- Users have easy access to documents worldwide, in any character set
- Servers can be accessed by users from anywhere, speaking any language
- Software can no longer be targeted to a single national market

The need for a single character set standard was never greater. Why do we have two?

In recent years, the Internet has become one of the primary users and supporters of the standard. The Internet provides users with easy access to documents worldwide in any character set, including ISO/IEC 10646/Unicode.


Common Charter
Develop a standard of graphic character repertoire and coding for an international graphic character set ... of the written form of the languages of the world.


The charter of the ISO technical working group was developed by JTC1/SC2 in 1984 based on input from a variety of sources, particularly industry, to ensure that there is one international character set standard rather than a set of national or regional standards. Having only one international character set standard is one of the major cornerstones in the development of localizable and/or localized products. The charter of the Unicode Consortium is nearly identical.

Users wanted one International Standard that is flexible and expandable to meet the various market needs. An example of quickly meeting an unforeseen need is the addition of the Euro symbol.

The standard must also provide a smooth interoperable environment across major platform suppliers. UTF-8, defined in an official amendment to ISO/IEC 10646, is a major vehicle for this in IETF standards and W3C Recommendations.

The standard must be practical and implementable. It has already been implemented on several platforms by major computer vendors such as Compaq, Microsoft, IBM, HP, Sun and Apple. Other vendors such as NCR, Sybase, Oracle and Progress Software are actively pursuing its implementation.

The standard must provide backward compatibility with existing standards, with a well-defined migration path, and must be able to coexist with past standards until the migration is complete. An example of backward compatibility is the inclusion of compatibility characters.

Organizations
ISO/IEC
    JTC 1: Information Technology
        SC 2: Codes and Character Sets
            WG 2: ISO/IEC 10646
                IRG: Ideographic Rapporteur Group
        SC 22: Programming Languages..
            WG 20: Internationalization

ANSI (US)
    INCITS: Information Technology NB
        L2: Codes, Character Sets, and Internationalization
Member and other National Bodies

Liaison

The Unicode Consortium
    UTC: Unicode Technical Committee
        Bidi and other subcommittees

A large number of organizations collaborate on maintaining Unicode and 10646, not counting liaisons from external groups such as W3C and IETF.

ISO/IEC is the international organization in which JTC 1 is active in standards for Information Technology. JTC 1 has a few dozen subcommittees, for example SC 2: Codes and Character Sets, with its working group 2: ISO/IEC 10646 UCS. WG2 has a separate Ideographic Rapporteur Group (IRG) focusing on defining the repertoire of Han ideographs. Another subcommittee is SC 22: Programming Languages, their Environments, and System Software Interfaces. WG 20 is one of the technical working groups in SC22. Its focus is internationalization, work that is tightly related to the character encoding efforts of SC2.

The members of ISO, its TCs and SCs are National Body organizations (NB), such as ANSI for the US. Under the auspices of ANSI, INCITS is the committee for the standardization of Information Technology. Within it, the committee L2: Codes, Character Sets, and Internationalization reviews the technical work of WG2, WG3 and WG20.

Industry and user groups are represented directly as members of their relevant national bodies (for the US, INCITS or L2), or indirectly via membership in The Unicode Consortium, in which the Unicode Technical Committee (UTC) and its subcommittees handle all technical work. Meetings of the UTC are open and are conducted in conjunction with L2 meetings.

While SC2 has the final authority over 10646, UTC has the final authority over the Unicode Standard. Both committees collaborate actively, via direct liaison relationships as well as indirectly through the formal membership of Unicode in L2, to ensure that the two standards remain synchronized. Both committees have a group of editors. Many individuals serve in both groups, which strengthens the collaboration and the sharing of effort.

ISO Framework
- Basis for other standards: ISO, JTC1, ECMA, IETF, CEN/TC304 & W3C
- Well established and recognized ISO development process of standardization
- Worldwide expertise through national standards bodies, industry and liaison organizations
- Identified as one of the standards for procurement requirements by major organizations and agencies

The ISO framework and process for standardization were used in developing ISO/IEC 10646. ISO/IEC 10646 is the basis for other standards in the ISO, JTC1, ECMA, IETF, CEN/TC304 and W3C standards development organizations.

Members of the technical working group include national bodies, experts from liaison organizations such as the Unicode Consortium, and invited linguists and technical experts who are not members of official organizations.

Initially (Fall 1984) only seven nations participated in the development work: the United Kingdom, France, Germany, Japan, Sweden, the United States, and the former Soviet Union. In the latter part of the development cycle there were at least 15 countries participating, together with several liaison organizations such as ECMA, The Unicode Consortium, CEN TC304, W3C-I18N, IETF, TC46, ITU-TS, JTC1/SC35, and others. After the standard was published, the attendance at some of our meetings reached 19 national bodies and several additional liaison organizations such as SC22/WG20, W3C-I18N and CEN TC304.

Attendance at many of our meetings is heavy and in some cases exceeds 40 participants. The WG2 experts distribution list has about 100 experts and organizations.

ISO/IEC 10646/Unicode is becoming one of the procurement standards for the European Union.

Unicode Framework
- Consortium with open membership
- Industry backing
- Direct support from key implementers
- Open to academic and user input
- Cooperation with ISO, JTC1, ECMA, IETF, CEN/TC304 & W3C
- Unicode Technical Committee (UTC)

Membership in the Unicode Consortium is open to developers and implementers. The Consortium has solid industry backing, which has been growing recently with direct support from key implementers. The academic and user communities provide input to the Unicode Consortium on a regular basis, much of it from linguistic and computer experts.

The Unicode Consortium has been working and cooperating with the ISO technical working group that developed ISO/IEC 10646 for over 10 years. The Unicode Consortium has strong and active working relationships with IETF, CEN/TC304 and W3C. IETF is the Internet Engineering Task Force, which develops standards for the Internet. CEN/TC304 is the part of the European standards organization (CEN) that develops European Localization Requirements standards and guidelines.

The focus of the technical deliberation within the Consortium is the Unicode Technical Committee.

For a complete membership list, please visit the Unicode web site at <http://www.unicode.org/unicode/consortium/memblogo.html>.


Unicode Members

- Apple
- Basis Technology
- Compaq
- Govt. of India
- Govt. of Pakistan
- Hewlett-Packard
- Hyperion
- IBM
- Justsystem
- Microsoft
- NCR
- Oracle
- Peoplesoft
- Progress Software
- RLG
- Reuters
- SAP
- Sun Microsystems
- Sybase
- Unisys

... plus Associate Member companies, Specialists and individual members

Development Path
[Diagram: earlier coded character sets converging on one universal code]

Inputs:
- ASCII, ISO 646 ... ISO 8859-x
- Industry 8-bit codes: IBM, Windows, ..., other
- National/industry multibyte codes

Result: 10646 (Part-1, Part-2, ...) & Unicode: 1 Universal Code

Instead of a mix of 7-bit, 8-bit and multibyte industry and national standards, ISO/IEC 10646/Unicode offers a flat structure that incorporates the repertoire of many of these standards, reducing the time it takes to develop localizable and/or localized products. In the past, several versions of a localized product were developed, one for each character set. This made synchronization and maintenance a major headache and very costly. With ISO/IEC 10646/Unicode that is no longer the case.
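To make the maintenance problem concrete, here is a minimal Python sketch. The choice of byte value and of the three legacy encodings is an assumption made for illustration; it is not taken from the presentation. It shows one byte decoding to different characters depending on the 8-bit code in use, whereas each character has a single code point in ISO/IEC 10646/Unicode.

```python
# Illustrative only: the encodings below are examples picked for this sketch.
LEGACY_ENCODINGS = ["latin-1", "iso8859_15", "cp1252"]


def decode_byte(byte_value):
    """Decode one byte under each legacy encoding and return {encoding: character}."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in LEGACY_ENCODINGS}


def code_point_label(ch):
    """Format a character's code point in the usual U+XXXX notation."""
    return "U+%04X" % ord(ch)


if __name__ == "__main__":
    for name, ch in decode_byte(0xA4).items():
        print(name, code_point_label(ch))
    # The Euro sign has one code point, but different byte values per code page.
    print("EURO SIGN in cp1252:", "\u20ac".encode("cp1252").hex())
```

With a single universal code, the product stores U+20AC once and converts at the edges only when it has to exchange data with a legacy system.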


Sources of Characters
- International standards
    - JTC1/SC2 coded character sets
    - JTC1/SC18 text formatting and presentation
    - ISO TC46 bibliographic community
- National standards and committees
    - China (GB2312), Japan (JIS 208), Korea (KSC 5601) and many others
- Widely supported vendor character sets
- Regional standards committees
    - ASMO, ECMA ATG & Bidi & SC2/WG2/IRG
- Liaison organizations
    - Unicode, Inc., ECMA, ITU-TS, AFII, TCA, W3C, CEN/TC304 and others
- User communities
    - STIX

A variety of sources were used to form the basis of the repertoire and content of ISO/IEC 10646 and Unicode. These include:

1. JTC1/SC2, which in the past developed only 7-bit and 8-bit standards such as ISO 646, ISO/IEC 8859-x, ISO/IEC 6937, etc.
2. JTC1/SC18, which focused on text formatting and presentation of character codes, such as ISO 9541.
3. National standards organizations and committees of many countries around the world.
4. Regional standards development organizations such as ASMO (Arab Standards and Metrology Organization), ECMA (a Europe-based international organization that developed and fast-tracked the ISO/IEC 8859 8-bit single-octet character set standards), and WG2/IRG (the Ideographic Rapporteur Group that developed the repertoire of the Unified Han ideographs of ISO/IEC 10646 and Unicode).
5. A variety of liaison organizations and consortia worldwide, such as ECMA, the International Telecommunication Union (ITU-TS), AFII (Association for Font Information Interchange), TCA (Taipei Computer Association), and of course Unicode, Inc., which were sources of information in the development of ISO/IEC 10646. Liaison also exists with the CEN TC304 and W3C-I18N technical working groups.
6. IT industry experts who contributed widely supported vendor character sets.
7. User communities such as STIX, which helped define the repertoire of mathematical symbols.


ISO/IEC 10646 Milestones

- 1984: ISO starts developing
- 1991: Convergence with Unicode
- 1993: ISO/IEC 10646-part 1, First edition
    - Architecture & Basic Multilingual Plane
    - Equivalent to Unicode 1.1
- 1998: ISO/IEC TR 15285
    - An operational model for characters and glyphs
- 1995-1999: Technical amendments
    - UTF-8, UTF-16, Korean, Tibetan, Braille, etc.
    - Unicode 2.0 is equivalent through amendment 7
- 2000: ISO/IEC 10646-1, Second edition
    - 3 technical corrigenda
    - 31 amendments since 10646-1:1993 first edition
    - Equivalent to Unicode 3.0
- 2001: ISO/IEC 10646-2 for Planes 1, 2 & 14
    - Unicode 3.1 includes repertoires of both 10646-1 and 10646-2 plus two additional characters
- 2002: Amd-1 to part 1
    - Equivalent to Unicode 3.2

- June 1992: DIS (Draft International Standard) 10646-1 approved.
- May 1993: Publication of the 10646-1:1993 standard, 1st edition.
- September 2000: Publication of 10646-1:2000, 2nd edition. The second edition incorporates all technical amendments (1-31), 3 technical corrigenda and several editorial corrigenda. Repertoire incremented by about 10K characters.
- November 2001: Publication of 10646-2:2001. Part 2 of 10646 covers Planes 1, 2 (additional Han), and 14 (language tags).

Corresponding Unicode versions:

- Unicode 2.0 matches the repertoire of 10646-1:1993 through amendment #7
- Unicode 3.0 matches the repertoire of 10646-1:2000 (2nd edition)
- Unicode 3.1 matches the repertoire of both parts: 10646-1:2000 plus 10646-2:2001, plus two additional characters
- Unicode 3.2 matches the repertoire of both parts: 10646-1:2000 plus 10646-2:2001, plus AMD1 to 10646-1:2000

Unicode
14 Years
(1988-2002)

- 1988: First use of name Unicode
- 1991: Unicode Consortium founded
- 1991: Unicode, Version 1
- 1991: First Implementers' Workshop
- 1991: Convergence with ISO/IEC 10646
- Liaison to ISO/IEC 10646 Working Group
- 1992: First Unicode Technical Reports
- 1993: Unicode, Version 1.1
- 1996: Version 2.0 published
- 2000: Version 3.0 published
- Dramatic increase in number and scope of Unicode-based implementations
- 2001: Version 3.1 published
- 2002: Version 3.2
- 2002: 20th International Unicode Conference

Informal meetings between industry experts on character set encoding started in 1987. The term Unicode was coined in late December 1987, so it can fairly be stated that the Unicode effort started in 1988. Some of these industry experts were also members of the ISO technical working group, which had been working on the development of ISO/IEC 10646 since 1984.

The Consortium was founded in 1991, just before Version 1 was published. The Consortium and the ISO technical working group merged their efforts in the spring of 1991 in response to user input and to requirements for a single concentrated effort to publish one international standard.

The Consortium worked in unison with the ISO technical working group to ensure that Version 2, published in 1996, incorporated all approved amendments (1-7) of ISO/IEC 10646-1. All additional amendments to ISO/IEC 10646-1 that are under ballot have been approved by the Unicode Consortium as well. Unicode 3.0 is in sync with the latest edition, ISO/IEC 10646-1:2000. Unicode 3.1 is also in sync with both parts of ISO/IEC 10646.

Since 1991, 19 international Unicode conferences have been held. Attendance reached over 400 at some of the recent ones. The next international conference, #21, will be held in Dublin, Ireland in May 2002.

Outline

- Background
- Relation between Unicode and ISO/IEC 10646
    - What is the same
    - What is different
    - What is being merged

Code Space & Structure


ISO/IEC 10646 Parts 1 and 2

- Only use code space in planes 0 to 16
- Define characters only in planes 0 (BMP), 1, 2 & 14 so far
- Reserve planes 15, 16 for private use

[Figure: stack of planes, from Plane 00 (BMP), Plane 01 and Plane 02 up to Plane 14, Plane 15 (Private Use) and Plane 16 (Private Use)]

The plane is the basic grouping of characters in ISO/IEC 10646; each plane can encode up to 64K characters. The first plane (Plane 0) is called the Basic Multilingual Plane or BMP. In the latest edition of ISO/IEC 10646, the BMP (Plane 0) and three supplementary planes (Planes 1, 2 and 14) contain characters. Planes 15 and 16 are reserved for private use. All other planes are empty. Unicode consists of the BMP of ISO/IEC 10646 plus the additional 16 planes of ISO/IEC 10646.

While in its canonical form ISO/IEC 10646 is a 31-bit or 4-byte code, no characters will be assigned outside Planes 0-16. Practically speaking, do we really need to allow space for some 2 billion characters in ISO/IEC 10646? The best estimate is that a total of 250,000 characters will be required to encode the world's scripts. However, the exact count of characters to be added will not be known until the encoding of all the scripts has been completed.

UTF-16 defines the practical limit of code space in ISO/IEC 10646 and Unicode. Even though 3 additional planes might have been sufficient to meet the 250,000 character estimate, it was felt that allowing 16 additional planes (~1,000,000 characters) would be much safer. UTF-16 addresses these additional planes by using pairs of 16-bit codes in the BMP. They can also be directly accessed using UTF-32, which is equivalent to UCS-4 for these additional planes.

Proposals for repertoires of additional scripts and characters have been received. Since space in the BMP is extremely limited, most new scripts and characters will be coded in planes outside the BMP.
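The code space arithmetic in these notes can be checked with a short Python sketch. The constant and function names are hypothetical and chosen only for this illustration.

```python
PLANE_SIZE = 0x10000                 # 65,536 code points per plane ("64K")
LAST_PLANE = 16                      # planes 0..16 are the only ones used
MAX_CODE_POINT = (LAST_PLANE + 1) * PLANE_SIZE - 1

# Code points in the 16 planes beyond the BMP (the "~1,000,000" in the notes).
SUPPLEMENTARY_CODE_POINTS = LAST_PLANE * PLANE_SIZE

# Size of the canonical 31-bit code space (the "some 2 billion" in the notes).
CANONICAL_CODE_SPACE = 2 ** 31


def plane_of(code_point):
    """Return the plane number (0..16) that contains the given code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("code point outside planes 0-16")
    return code_point >> 16


if __name__ == "__main__":
    print(hex(MAX_CODE_POINT), SUPPLEMENTARY_CODE_POINTS, CANONICAL_CODE_SPACE)
    for cp in (0x0041, 0x20000, 0xE0001, 0x10FFFF):
        print("U+%04X is in plane %d" % (cp, plane_of(cp)))
```

The upper limit works out to 10FFFF hexadecimal, and three extra planes (196,608 code points) would indeed fall short of the 250,000 character estimate once the BMP is nearly full.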


A Plane in 10646
[Figure: a plane (16 bits) drawn as a grid of rows and cells, 65,536 characters]

- A plane is the basic division of code-space in ISO/IEC 10646
- The first plane (Plane 0) is the Basic Multilingual Plane (BMP)
- Unicode 3.1 matches planes 0-16

Definition of a plane: A plane is the basic code space of ISO/IEC 10646. A plane is made up of 256 rows containing 256 cells. Code points can be accessed either as a 2-octet value (16-bit value) or a 32-bit value.

Definition of the first plane, the BMP (Basic Multilingual Plane): This is where 49194 characters are coded with the publication of ISO/IEC 10646-1:2000. AMD1 adds 1016 new characters to the BMP for a combined total of 50210.

Unicode 3.0 matches the repertoire of the latest edition, ISO/IEC 10646-1:2000. ISO/IEC 10646-2 is now published and is available from the ISO web site, www.iso.ch, in Geneva, Switzerland. It contains additional characters in planes 1, 2 and 14. The Unicode 3.1 repertoire encompasses the repertoires of both parts of 10646. Unicode 3.2 also contains the complete repertoire of AMD1.
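As a sketch of the plane, row and cell structure described here, the following Python fragment splits a code point into those three parts and joins them back. The function names are hypothetical; only the 256 by 256 layout comes from the notes.

```python
def split_code_point(code_point):
    """Split a code point into (plane, row, cell): 256 rows of 256 cells per plane."""
    plane = code_point >> 16
    row = (code_point >> 8) & 0xFF
    cell = code_point & 0xFF
    return plane, row, cell


def join_code_point(plane, row, cell):
    """Inverse of split_code_point."""
    return (plane << 16) | (row << 8) | cell


if __name__ == "__main__":
    for cp in (0x0041, 0x4E00, 0x2A6D6):
        plane, row, cell = split_code_point(cp)
        print("U+%04X -> plane %02X, row %02X, cell %02X" % (cp, plane, row, cell))
    # The BMP totals quoted in the notes: 49194 + 1016 = 50210.
    print(49194 + 1016)
```

For a BMP character the plane is 0, so the row and cell octets alone form the 2-octet value mentioned in the notes.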


Basic Multilingual Plane


[Figure: layout of the BMP]

- C0 Controls, C1 Controls
- Alphabets, Symbols, CJK Auxiliary, Hangul, ...
- Unified Chinese, Japanese, Korean Ideographs
- Reserved for accessing code points outside BMP (2048)
- Private Use (6K), Compatibility Area, Arabic Presentation Forms, ... (8190)

The Basic Multilingual Plane consists of the following areas:

- Alphabets, symbols, CJK Auxiliary, Hangul
- Ideographic characters such as Unified Chinese, Japanese and Korean
- Reserved for future allocation
- Special area which points to an additional 16 planes of coding space beyond the BMP that can be accessed via 2 sets of 16-bit values.
- Other use area which includes 6K of private use characters, a compatibility area and a set of Arabic Presentation Forms.
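The special area of 2048 code points works as two sets of 1024 values each, and 1024 × 1024 pairs cover exactly 16 planes. A minimal Python sketch of that mapping follows; the function names are hypothetical, and the D800 to DFFF positions are the standard UTF-16 surrogate ranges rather than something stated on the slide.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # first set: 1024 high-half values
LOW_START, LOW_END = 0xDC00, 0xDFFF     # second set: 1024 low-half values


def to_surrogates(code_point):
    """Map a code point in planes 1-16 to its pair of 16-bit values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP use pairs")
    offset = code_point - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogates(high, low):
    """Map a pair of 16-bit values back to the code point it addresses."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid high/low pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


if __name__ == "__main__":
    high, low = to_surrogates(0x1D11E)
    print("%04X %04X" % (high, low))
    print("%X" % from_surrogates(high, low))
```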


Character Count Statistics


10646:                                    1st ed  to AMD7            2nd ed   +Part2
Unicode:                          U 1.0    U 1.1    U 2.0    U 2.1    U 3.0    U 3.1
BMP Alphas/Symbols                 4748     6309     6509     6511    10236    10238
Supplemental Alphas/Symbols           -        -        -        -        -     1691
Han (URO)                         20902    20902    20902    20902    20902    20902
Han (Extension A)                     -        -        -        -     6582     6582
Han (Extension B)                     -        -        -        -        -    42711
Han Compatibility                   302      302      302      302      302      302
Supplemental Han Compatibility        -        -        -        -        -      542
Hangul/Korean Syllables            2350     6656    11172    11172    11172    11172
Subtotal                          28302    34169    38885    38887    49194    94140
BMP Private Use                    5632     6400     6400     6400     6400     6400
Supplemental Private Use              -        -   131068   131068   131068   131068
Surrogate Code Points              2048     2048     2048     2048     2048     2048
Controls                             65       65       65       65       65       65
BMP Non-characters                    2        2        2        2        2       34
BMP Reserved                      31535    24900    18136    18134     7827     7793
Supplemental Reserved                 -        -   917476   917476   917476   872532

Note that Unicode 3.1 has two more characters than ISO/IEC 10646. These two characters have been approved as part of the first amendment to ISO/IEC 10646-1.

The statistics were kindly compiled by Ken Whistler of Sybase. Unicode 3.2 adds 1016 new characters, in the BMP Alpha/Symbols and Han Compatibility areas. This brings the new subtotal to 95156 characters. Unlike 3.1, Unicode 3.2 matches the combined character repertoire of the two parts of ISO/IEC 10646, as well as AMD1 to ISO/IEC 10646-1:2000.
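The subtotals quoted here can be cross-checked from the per-area counts on the slide. This is a small Python sketch; the dictionary names are hypothetical and the figures are copied from the Unicode 3.0 and 3.1 columns.

```python
# Graphic character counts per area, as given on the slide.
U30_AREAS = {
    "BMP Alphas/Symbols": 10236,
    "Han (URO)": 20902,
    "Han (Extension A)": 6582,
    "Han Compatibility": 302,
    "Hangul/Korean Syllables": 11172,
}

U31_AREAS = {
    "BMP Alphas/Symbols": 10238,
    "Supplemental Alphas/Symbols": 1691,
    "Han (URO)": 20902,
    "Han (Extension A)": 6582,
    "Han (Extension B)": 42711,
    "Han Compatibility": 302,
    "Supplemental Han Compatibility": 542,
    "Hangul/Korean Syllables": 11172,
}

ADDED_IN_U32 = 1016  # characters added by Unicode 3.2, per the notes


def subtotal(areas):
    """Sum the per-area character counts."""
    return sum(areas.values())


if __name__ == "__main__":
    print(subtotal(U30_AREAS))                  # Unicode 3.0 subtotal
    print(subtotal(U31_AREAS))                  # Unicode 3.1 subtotal
    print(subtotal(U31_AREAS) + ADDED_IN_U32)   # Unicode 3.2 subtotal
```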


Adopted Form
ISO/IEC 10646 is a 16-bit or 32-bit code

- UCS-2: for accessing code points in BMP, 2-bytes (16-bits)
- UCS-4: canonical form for accessing any code point using 4-bytes (32-bits)

Transformation formats

- UTF-8: for use in 8-bit environments (e.g. HTML, XML) (variable length code, 1 to 6 bytes/character)
- UTF-16: for use with UCS-2 to access sixteen additional planes beyond the BMP

Note: Unicode 3.2 supports UTF-8, UTF-16 and UTF-32. UTF-32 is equivalent to UCS-4, with an upper limit of 10FFFFx.

Conformance of information interchange in ISO/IEC 10646 shall specify:

1. The Adopted Form
2. The Implementation Level
3. The subset

The Adopted Form can be:

1. UCS-4 (32-bit) canonical form or
2. UCS-2 (16-bit) form

Note that there are also two Transformation Formats:

1. UTF-8 for use with 8-bit encoding environments (e.g. Unix, HTML, XML)
2. UTF-16 for use with UCS-2 to access sixteen additional planes beyond the BMP using 16-bit encoding.

Note 1: UTF-16 can be considered another adopted form of coded representation but it is actually compatible with the two-octet (16-bit) form of UCS-2.

Note 2: Unicode 3.1 supports UTF-8, UTF-16 and UTF-32 which is equivalent to UCS-4, with an upper limit of 10FFFFx.
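The difference between these forms is easiest to see on concrete characters. The Python sketch below uses the standard codecs and a hypothetical helper name; Python has no separate UCS-2 or UCS-4 codec, so it assumes big-endian UTF-16 and UTF-32 as stand-ins, which give the same octets for the code points shown.

```python
def encoded_forms(ch):
    """Return the hex octets of one character in each form, big-endian."""
    forms = {
        "UCS-4 / UTF-32": ch.encode("utf-32-be").hex(),
        "UTF-16": ch.encode("utf-16-be").hex(),
        "UTF-8": ch.encode("utf-8").hex(),
    }
    # UCS-2 can only represent code points in the BMP.
    forms["UCS-2"] = forms["UTF-16"] if ord(ch) <= 0xFFFF else None
    return forms


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001d11e"):
        print("U+%04X" % ord(ch), encoded_forms(ch))
```

For a BMP character the UCS-2 and UTF-16 octets are identical, which is the compatibility that Note 1 refers to; beyond the BMP, UTF-16 needs a pair of 16-bit values and UCS-2 has no representation at all.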


Implementation Levels
Implementation level for combining sequences

- Level 1: only precomposed characters
- Level 2: restricted combining sequences
- Level 3: unrestricted combining sequences

Unicode has no formal restrictions on combining sequences

An implementation may choose to support a subset of characters which does not contain any or all combining characters

The second conformance attribute in ISO/IEC 10646 is the declaration of the Implementation Level. There are three Implementation Levels:

1. Level 1: Only precomposed characters with assigned code locations.
2. Level 2: Restricted usage of combining sequences for certain scripts such as Indic.
3. Level 3: Unrestricted usage of combining sequences.

Note that Unicode has no formal restrictions on combining sequences. Any implementation may choose to support a subset of characters which does not contain any or all combining sequences.
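The distinction between a precomposed character and a combining sequence can be sketched in Python with the standard `unicodedata` module. This is an approximation under a stated assumption: the helper names are hypothetical, and treating every character whose general category starts with "M" as a combining character stands in for the exact list that ISO/IEC 10646 defines. The Level 2 restrictions are not modelled.

```python
import unicodedata


def uses_combining_marks(text):
    """True if the text contains any mark character (approximation, see lead-in)."""
    return any(unicodedata.category(ch).startswith("M") for ch in text)


def fits_level_1(text):
    """Rough check: Level 1 data contains no combining characters."""
    return not uses_combining_marks(text)


PRECOMPOSED = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
COMBINING_SEQ = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

if __name__ == "__main__":
    print(fits_level_1(PRECOMPOSED), fits_level_1(COMBINING_SEQ))
    # Normalization converts between the two spellings of the same text.
    print(unicodedata.normalize("NFC", COMBINING_SEQ) == PRECOMPOSED)
    print(unicodedata.normalize("NFD", PRECOMPOSED) == COMBINING_SEQ)
```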


Collections for Subsets

Collections of coded graphic characters

The collections listed below are ordered by collection number. An * in the positions column indicates that the collection is a fixed collection.

Collection number and name      Positions
1  BASIC LATIN                  0020 - 007E *
2  LATIN-1 SUPPLEMENT           00A0 - 00FF *
3  LATIN EXTENDED-A             0100 - 017F *
4  LATIN EXTENDED-B             0180 - 024F
5  IPA EXTENSIONS               0250 - 02AF
6  SPACING MODIFIER LETTERS     02B0 - 02FF
Etc.

The Unicode declared subset is the whole of the BMP plus planes 1-16 accessible through UTF-16

The third element of conformance to ISO/IEC 10646 is the declaration of a subset. Annex A of ISO/IEC 10646 includes a predefined list of subsets and collections including the whole of the BMP. Here is a partial list:
BMP, BMP first edition, BMP-AMD, BMP second edition Basic Latin, Latin-1 Supplement, Extended A & B, Latin Extended Additional IPA Extensions, Spacing Modifier Letters Combining Diacritical Marks Basic Greek, Extended, Greek Symbols & Coptic Extended Cyrillic Armenian Basic Hebrew, Extended Basic Arabic, Extended, Presentation Forms A & B Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam Thai, Lao Basic Georgian, Extended Hangul Jamo, Compatibility Jamo, Hangul Syllables General Punctuation, Superscripts and Subscripts Currency Symbols, Combining Diacritical Marks for Symbols, Letterlike Symbols, Number Forms Arrows, Mathematical Operators, Misc. Technical Control Pictures, Optical Character Recognition, Enclosed Alphanumerics Box Drawing, Block Elements, Geometric Shapes, Misc. Symbols, Dingbats CJK Symbols and Punctuation Hiragana, Katakana, Bopomofo CJK Unified Ideographs, CJK Extension A CJK Compatibility Ideographs Combining Half Marks CJK Compatibility Forms, Small Form Variants Halfwidth and Fullwidth Forms Basic Tibetan, Tibetan Extended Unified Canadian Aboriginal Syllabics Ethiopic, Cherokee, Braille Patterns Yi Syllables, Yi Radicals, KangXi Radicals CJK Radicals Supplement Ogham, Runic, Sinhala, Syriac, Thaana Basic Myanmar, Extended Myanmar Khmer, Mongolian Private Use Area, Specials
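A declared subset can be modelled as a set of collection numbers, each covering a range of positions. The Python sketch below uses only the six collections shown on the slide; the function names are hypothetical and the table is deliberately incomplete.

```python
# (collection number, name, first position, last position), from the slide.
COLLECTIONS = [
    (1, "BASIC LATIN", 0x0020, 0x007E),
    (2, "LATIN-1 SUPPLEMENT", 0x00A0, 0x00FF),
    (3, "LATIN EXTENDED-A", 0x0100, 0x017F),
    (4, "LATIN EXTENDED-B", 0x0180, 0x024F),
    (5, "IPA EXTENSIONS", 0x0250, 0x02AF),
    (6, "SPACING MODIFIER LETTERS", 0x02B0, 0x02FF),
]


def collection_of(code_point):
    """Return (number, name) of the listed collection containing the code point, or None."""
    for number, name, first, last in COLLECTIONS:
        if first <= code_point <= last:
            return number, name
    return None


def in_subset(text, declared_numbers):
    """True if every character falls in one of the declared collections."""
    for ch in text:
        found = collection_of(ord(ch))
        if found is None or found[0] not in declared_numbers:
            return False
    return True


if __name__ == "__main__":
    print(collection_of(ord("A")))
    print(in_subset("caf\u00e9", {1, 2}), in_subset("caf\u00e9", {1}))
```

An implementation that declares collections 1 and 2 accepts the word with the accented letter; one that declares only collection 1 does not.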


Unicode Implements
- BMP plus next 16 planes
- Three encoding forms
    - UTF-8
    - UTF-16
    - UTF-32 (0 to 10FFFF)
- Implementation level 3
- No subsets

Unicode encourages transparency so that implementations can at least retransmit every character undamaged, but the level of support is otherwise explicitly left to the implementation

Unicode supports the Basic Multilingual Plane plus the next 16 planes. In the fall of 1993 the ISO technical working group and the Unicode Consortium estimated that 16 additional planes would be enough to incorporate the repertoire of characters for the long term. This repertoire of ~1,000,000 characters fits within the limits of UTF-16.

Unicode recently made the three encoding forms UTF-8, UTF-16 and UTF-32 fully equivalent representations of Unicode. Implementers can use additional planes, such as Planes 15 and 16, which are specifically allocated for private use. 10646 recently removed additional private use space that was not accessible by UTF-16, which brings the two standards in line in terms of the range of characters accessible.

A formal restatement of UTF-8 is planned. In 10646, UTF-8 technically still allows 5- and 6-byte sequences, but these do not access any assigned characters. The restatement would make not just the practical use but also the formal definition of UTF-8 equivalent between both standards.

Unicode supports implementation level 3, which is a superset of levels 1 and 2. There is no formal way in the Unicode Standard of identifying subsets. Unicode encourages transparency so that implementations can at least retransmit every character undamaged, but the level of support is otherwise explicitly left to the implementation.
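To illustrate why the 5- and 6-byte UTF-8 sequences never reach an assigned character, here is a Python sketch of the sequence length as a function of the code value. The function names are hypothetical, the thresholds are those of the original 31-bit UTF-8 layout, and surrogate values are ignored for simplicity.

```python
# Upper bounds (exclusive) for 1- to 6-byte sequences in the 31-bit layout.
_LIMITS = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)


def utf8_length_10646(value):
    """Sequence length under the original 10646 definition (1 to 6 bytes)."""
    if value < 0:
        raise ValueError("negative code value")
    for length, limit in enumerate(_LIMITS, start=1):
        if value < limit:
            return length
    raise ValueError("value does not fit in 31 bits")


def utf8_length_unicode(code_point):
    """Sequence length when the code space stops at 10FFFF (1 to 4 bytes)."""
    if code_point > 0x10FFFF:
        raise ValueError("beyond plane 16")
    return utf8_length_10646(code_point)


if __name__ == "__main__":
    for value in (0x41, 0x20AC, 0x10FFFF, 0x200000, 0x7FFFFFFF):
        print("%X -> %d bytes" % (value, utf8_length_10646(value)))
```

Every code point up to 10FFFF needs at most four bytes, so restricting UTF-8 to four bytes loses nothing that planes 0-16 can hold.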

Unicode - 10646 Relationship


• ISO/IEC 10646 is a character encoding standard
• Unicode is code for code compatible with ISO/IEC 10646
• Unicode defines additional specifications about behavior and use of characters such as bidi algorithm, ordering, mappings, equivalence algorithm and other semantics
• Conformant implementations of Unicode are conformant implementations of ISO/IEC 10646

In summary, the relationship of Unicode and ISO/IEC 10646: ISO/IEC 10646 is a character coding standard. The charter of the technical working group that developed ISO/IEC 10646 does not allow it to define or specify how the standard should be implemented. ISO/IEC JTC1 subcommittees and their respective working groups develop IT standards whose basic ingredient is the specification itself, not how a standard is implemented. Thus ISO/IEC 10646 is not an implementation standard for a Universal Character Set; it is purely a technical specification of a character encoding. Unicode, on the other hand, defines a set of implementation guidelines and specifies precise behavior of coded characters based on their properties and semantics.


Unicode: Beyond 10646


In addition to character codes Unicode specifies:
• Behavior and use of characters
• A complete bidi algorithm
• An equivalence algorithm
• Normalization
• Additional character properties and semantics for spacing, zero-width space, combining characters, numeric, case and casing, directionality, letters, math operators etc.

ISO/IEC 10646 is a character set standard, but Unicode is more than that. Unicode specifies semantics and attributes for the character codes defined in ISO/IEC 10646. Here are some of the items that Unicode specifies and that are outside the scope of the ISO technical working group:
- Identity, behavior and use of characters, whereas ISO/IEC 10646 only identifies them
- A complete bidi algorithm
- A canonical ordering algorithm for determining character equivalence
- Additional character properties and semantics for directionality, case, symmetric swapping, letters, mathematical operators, zero-width space, combining characters, and others
- Order and use of double-diacritic non-spacing marks
- A mapping for compatibility characters
- Default shaping behavior of cursive scripts
- Which combining marks are non-spacing marks
- Default mapping tables for conversion to and from other character set standards
- Rendering of Indic characters
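To make the equivalence and canonical ordering items above a little more concrete, here is a minimal Python sketch built on the standard `unicodedata` module. The helper `canonically_equivalent` and the sample strings are assumptions made for this illustration, and the results depend on the Unicode Character Database version bundled with the Python interpreter rather than on Unicode 3.2 specifically.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Treat two strings as canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00E9"           # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"            # e + COMBINING ACUTE ACCENT
acute_then_dot = "a\u0301\u0323" # marks typed in one order
dot_then_acute = "a\u0323\u0301" # same marks, other order

# Different code point sequences, same abstract text.
print(precomposed == combining)                                # False
print(canonically_equivalent(precomposed, combining))          # True

# Canonical ordering sorts marks by combining class, so mark order
# does not matter when the classes differ (230 versus 220 here).
print(unicodedata.combining("\u0301"), unicodedata.combining("\u0323"))
print(canonically_equivalent(acute_then_dot, dot_then_acute))  # True

# A compatibility mapping: the "fi" ligature maps to the two letters f and i.
print(unicodedata.normalize("NFKC", "\uFB01"))

# A directionality property consulted by the bidi algorithm.
print(unicodedata.bidirectional("\u05D0"))                     # Hebrew alef
```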


Unicode: Beyond 10646 (Cont.)


• Which combining marks are non-spacing marks
• Order and use of double-diacritic non-spacing marks
• A mapping for compatibility characters
• Default shaping behavior of cursive scripts
• Default mapping tables for conversion to and from other character set standards
• Rendering for Indic characters
• Line breaking

See the notes on the previous slide for what Unicode specifies beyond what ISO/IEC 10646 defines.


Outline

• Background
• Relation between Unicode and ISO/IEC 10646
  - What is the same
  - What is different
  - What is being merged
• Synchronization
  - Shared Process and Policies
  - Aligned Program of Work
  - Common publication resources


Continued Cooperation
• Architecture changes:
  - UTF-32 (Proposed Amendment): restricts UCS-4 to planes 0 to 16
  - Future editorial and technical corrigenda to second edition of ISO/IEC 10646-1: 2000 (will be part of Unicode 3.2)
  - ISO/IEC 10646-2 (planes 1, 2 & 14)
      Plane 1, mathematics, hieroglyphs, music symbols, etc.
      Plane 2, CJKV ideographic extensions
      Plane 14, language tags
• Repertoire extensions (included in Unicode 3.2)
• Support current and future implementers
• Increase awareness and provide technical help
• Continued synchronization of future editions of ISO/IEC 10646 and the Unicode Standard



And the convergence continues. Here are our successes and ongoing work:
1. Architecture extensions:
   - Development of 10646-2 with planes 1, 2 and 14
   - Future editorial and technical corrigenda to the second edition of ISO/IEC 10646-1:2000
   - Unicode 3.1 includes the repertoires of ISO/IEC 10646-1 and 10646-2
   - Unicode 3.2 will also include Amendment 1 of ISO/IEC 10646-1:2000, which is under development
2. Repertoire extensions to add additional scripts or additional characters of existing scripts
3. Support for current implementers in their development efforts
4. Increased awareness and technical help for new developers
5. Continued synchronization efforts to ensure future editions of ISO/IEC 10646 and the Unicode Standard are on a solid footing


Going in the Same Direction

• One standard
  - No dialects
  - Common usage
• Common Encoding Forms
  - UTF-8
  - UTF-16
  - UTF-32/UCS-4
• Cooperation with ISO
  - Examples: UTF-8, UTF-16, UTF-32, EURO, collation, tags
• Incorporation into other standards
  - IETF, WWW Consortium (W3C)
• Shared expertise for lesser-used and obscure scripts

Synchronization between Unicode and ISO/IEC 10646 requires:
1. Ensuring that there is only one International Standard for a Universal Character Set, with no dialects and one common usage.
2. Ensuring that there is only one Unicode Standard, which has the same architecture and repertoire as ISO/IEC 10646.
3. Keeping a good working relationship and continued cooperation between the ISO technical working group (SC2/WG2), the Unicode Consortium and other entities that use the standard. Real-world examples of this cooperation include the speed with which technical amendments and additional repertoires were completed. UTF-32 enables implementers to work with 32-bit values and is equivalent to UCS-4. UTF-16 enables implementers to access the encoding space outside the BMP using 16-bit values. UTF-8 enables Unix-based systems and web applications to use the encoding of 10646 as sequences of 8-bit octets instead of 16-bit units. The repertoire additions for a few critical characters such as the EURO made it possible to meet the requirements of the European Union sooner than expected.
4. Ensuring that the standard is used as a normative and/or informative reference for other standards.

WG2 Program of Work


• 1st Amendment 10646-1:2000      March 2002
• 2nd Amendment 10646-1:2000      December 2002
• 1st Amendment 10646-2:2001      2003

WG2 future meetings:
  Meeting 42  Dublin, Ireland     May 2002
  Meeting 43  Tokyo, Japan        December 2002


Outline

• Background
• Relation between Unicode and ISO/IEC 10646
  - What is the same
  - What is different
  - What is being merged
• Synchronization
  - Shared Process and Policies
  - Aligned Program of Work
  - Common publication resources
• Beyond character coding
  - Character properties & Collation
  - Internationalization
  - Products and Standards



Collation & Character Properties


• ISO/IEC 14651 Collation Standard
  - Produced by SC22/WG20 Internationalization
  - Matches Unicode Collation Algorithm, Unicode Technical Standard (UTS) #10
• Unicode Character Database
  - Collection of character classification and properties
  - Geared towards the needs of implementers
  - Supports Internationalization
  - http://www.unicode.org/Public/UNIDATA


Real-life solutions must go beyond character encoding. The Unicode Consortium is very active in standards that are closely related to Unicode or that use Unicode. For example, it cooperated with ISO/IEC JTC1/SC22/WG20 on creating ISO/IEC 14651, a standard on collation that matches Unicode Technical Standard (UTS) #10, the Unicode Collation Algorithm (UCA). The maintenance of these standards is expected to proceed in synchrony, but will always lag the development of 10646, as meaningful default collation information for new characters and scripts may not be available at the time they are first encoded. The Unicode Character Database is an ever-growing collection of useful and indispensable character classifications and properties, covering the needs of actual implementers of the standard. The UCD, as it is affectionately known, is referenced by such important algorithms as the Bidi Algorithm and Normalization. Other areas of text handling and internationalization, such as line breaking, casing and East Asian Width, are also supported in the UCD.
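As a small illustration of how an implementer might consult the kind of data the UCD provides, the sketch below uses Python's standard `unicodedata` module, which bundles one version of the database. The `describe` helper and the sample characters are assumptions made for this example; collation is not shown, since the Python standard library does not ship an implementation of the UCA.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few UCD-derived properties for a single character."""
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),          # general category
        "bidi": unicodedata.bidirectional(ch),         # used by the Bidi Algorithm
        "east_asian_width": unicodedata.east_asian_width(ch),
        "upper": ch.upper(),                           # casing
    }


# The UCD version bundled with this interpreter (newer than Unicode 3.2).
print("UCD version:", unicodedata.unidata_version)

samples = ["a", "\u20AC", "\u05D0", "\uFF21", "\u00DF"]
for ch in samples:
    print(describe(ch))
```

The last sample, U+00DF LATIN SMALL LETTER SHARP S, shows that a case mapping can change the length of a string, which is one reason casing is data-driven.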


Language Innovation
SOURCE:          C / C++                                    JAVA / C#
Identifiers      ASCII                                      Unicode
Comments         Local charset                              Unicode
Literals         Byte oriented; L converts local charset    Unicode
Data Types:
  char           Byte oriented                              Unicode
  wchar_t        Unicode on some implementations            N/A



More innovative programming languages, such as Java and C#, are based on ISO/IEC 10646/Unicode from the outset. The full repertoire of Unicode is supported in identifiers, comments, literals and the char data type, within the typical lexical restrictions on identifiers, etc. Access to legacy data is supported by a rich set of code converters which can be invoked (sometimes transparently) to read and write external data. More traditional programming languages, such as C and C++, follow a character set independent philosophy, but tend to support both byte-oriented characters (completely) and "wide" characters (with some restrictions). However, there is no guarantee in the language that wide characters use Unicode code assignments. Nor is UTF-16 supported, as it is neither one code unit per character nor one byte per code unit, which are the explicit or implicit assumptions behind the wide character and the standard 'char' data types respectively. While these languages are sufficiently flexible to allow support of UTF-16 data using user-defined data types, converting legacy source code, which may rely on large amounts of literal character data, has proven challenging. Unicode is supporting an effort in the relevant ISO standards committees to address this issue. Older languages, such as APL, have seen various efforts at retrofitting Unicode support as well.
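The point that UTF-16 is not one code unit per character can be sketched with the surrogate arithmetic below. The function names are hypothetical and chosen for this example; the sample character U+20000 is the first CJK ideograph in Plane 2.

```python
def to_surrogates(code_point: int) -> tuple:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP need surrogates")
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_code_units(text: str) -> int:
    """Count 16-bit code units, which is what a UTF-16 based 'char' counts."""
    return len(text.encode("utf-16-le")) // 2


text = "A\U00020000"                       # one BMP letter, one Plane 2 ideograph
high, low = to_surrogates(0x20000)

print(f"{high:04X} {low:04X}")             # the surrogate pair for U+20000
print(len(text), utf16_code_units(text))   # code points versus code units
```

Code written on the assumption of one 16-bit unit per character would count three "characters" in the two-character sample string.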


Products Are Here!

[Chart: types of products (up to a full set) plotted against increased function of products, along a timeline from 1993 to 2000 and beyond]

Phase 1: Deliver a full set of products
  Browsers, Development Tools, Fonts, Word Processors, etc.
Phase 2: Increased Functionality
  More Scripts, Combining Characters, etc.



As the standard matures, the breadth and scope of products implementing it increase simultaneously. The primary focus of Phase 1 was on platform support, development tools, browsers, fonts, printer support and a few end-user applications. The primary focus of Phase 2 is on additional platform support, more development tools, increased functionality to support additional scripts, usage of combining characters and a full range of application suites.


Products Are HERE!


• Microsoft: Windows XP, Office XP, Internet Explorer 6.0, ECMAScript, C#/CLI
• Compaq: Tru64 Unix
• HP: HP-UX & Printers
• Netscape: Communicator 6.0, JavaScript, ECMAScript
• Sun: Solaris & Java
• Apple: Cyberdog, Mac OS X
• Lotus: Lotus Suite
• Asian solutions: JustSystems (Ichitaro) and Star+Globe (MASS)
• Databases: Software AG, Sybase, Oracle, DB2, NCR Teradata, Progress Software
• SAP platform
• Fonts: Adobe, Agfa/Monotype, Apple Advanced Typography, Bitstream, OpenType
• Tools and libraries: several vendors

This is a partial list of additional products not listed above. Tools and libraries from several vendors include:
Alis Technologies: Unity for Windows
Sybase: UDK (TM) - Developers Kit for Unicode and URK (TM) Runtime Kit for Unicode
ECO Kommunication: Multilingual passport for Windows
Gamma Productions: Gamma Unitype, Universe
MRJ: Symbolic OCR for Japanese
Novell: NetWare 4.01 Directory Services
PENKEY: SAVANT 2.0 Universal Handwriting Recognition
Production First Software: Typographic International Series PostScript Fonts
Stonehand: Stonehand Composition Toolbox
URW Software & Type: URW EuroWorks, V 1.5
Visix Software: Galaxy Application Environment
Y&Y: Lucida Sans Font
Zinc Software: Zinc Application Framework 3.6


Version 3.2 Is Here!

• Version 3.2 is in sync with both parts of ISO/IEC 10646 and the 1st amendment to 10646-1
  - total repertoire of 95156 characters
  - completed math repertoire for MathML and other uses
• Further restriction on ill-formed UTF-8
• http://www.unicode.org/unicode/reports/tr28

Unicode 3.2 is here. It is in sync with the latest edition of ISO/IEC 10646-1:2000, including its first amendment, as well as ISO/IEC 10646-2:2001. The total repertoire consists of 95156 characters. The stability of the Unicode Standard is now stated very clearly and unambiguously: each version of the Unicode Standard, once published, is absolutely stable and will never change. Implementations or specifications that refer to a specific version of the Unicode Standard can rely upon this stability. If future versions of these implementations or specifications upgrade to a future version of the Unicode Standard, then some changes may be necessary. Unicode 3.2 further restricts UTF-8 to eliminate irregular sequences that pose potential difficulties in interchange. The former UTR #21, Case Mappings, has been upgraded in status to a Unicode Standard Annex in Unicode 3.2. This means that UAX #21, Case Mappings, is now formally a part of the Unicode Standard. Unicode 3.2 also adds new property names and property name aliases.
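The effect of the tightened UTF-8 definition can be sketched with a strict decoder. The example below relies on the built-in UTF-8 codec of present-day Python, which follows the later, fully restricted definition of UTF-8; it is not a reproduction of the Unicode 3.2 conformance text. The helper name and the sample byte strings are assumptions made for this illustration.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "shortest form of '/'": b"\x2f",
    "four-byte form of U+1D11E": b"\xf0\x9d\x84\x9e",
    # Non-shortest ("overlong") two-byte form of '/'.
    "overlong '/'": b"\xc0\xaf",
    # Irregular sequence: each surrogate of U+1D11E encoded as its own 3-byte sequence.
    "surrogate pair as two 3-byte sequences": b"\xed\xa0\xb4\xed\xb4\x9e",
    # Five-byte sequence, beyond the range needed for 0..10FFFF.
    "five-byte sequence": b"\xf8\x88\x80\x80\x80",
}

for label, data in samples.items():
    print(f"{label}: {is_well_formed_utf8(data)}")
```

Rejecting the last three forms means every character has exactly one UTF-8 representation, which removes a class of interchange and security problems.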


Outline

• Background
• Relation between Unicode and ISO/IEC 10646
  - What is the same
  - What is different
  - What is being merged
• Synchronization
  - Shared Process and Policies
  - Aligned Program of Work
  - Common publication resources
• Beyond character coding
  - Character properties & Collation
  - Internationalization
  - Products and Standards
• Summary


Common Repertoire
• The character repertoires of Unicode and ISO/IEC 10646 are exactly identical
• Three matching encoding forms
• There are minor differences in
  - Terminology
  - Publication format
• Any conformant Unicode implementation conforms to ISO/IEC 10646


The core feature of ISO/IEC 10646, its repertoire, is exactly identical to that of the Unicode Standard. In addition, both standards support encoding forms that are equivalent and directly interchangeable. Minor differences do exist, though, mostly in areas of terminology. These are well documented. Another difference involves the publication formats. ISO/IEC 10646 consists of two parts, whereas Unicode 3.1 covers both parts of 10646. Implementations that conform to the Unicode Standard also conform to ISO/IEC 10646, allowing one implementation to conform to both standards. This synchronization between the standards will continue into the future.


Unicode Extends...
• Character semantics
  - Discover and catalogue
  - Relate characters to their established use
• Canonical and compatibility equivalence
• Technical reports with implementation guidelines
  - Normalization
  - Script behavior such as bi-directional algorithm
• Active promotion of the standard



The scope of Unicode extends that of ISO/IEC 10646 in the following areas:
1. Character properties and semantics
2. Support of an algorithm that supports bi-directional scripts
3. Canonical and compatibility equivalence algorithm
4. Publication of technical reports and implementation guidelines
5. Active promotion of the standard in the development community including the Internet.


What Do 10646 and Unicode Do for You?


• Global interoperability - write once run everywhere; one source code, one binary with user installable/callable locales
• Simplified software - one application with one code set versus multiple applications and managing different code sets
• Data stability - a single common and widely adopted format
• Reduced costs - development, maintenance, training

The benefits of the standard are highly important, yet they are seldom pointed out. Key benefits include:
Global Interoperability: A platform implementation can be written to allow user installable locales by script or country/region of the world. A piece of software can be written to allow a user, for example in Japan, to create a bilingual presentation in Japanese and English, Japanese only, or English only. The user does not have to transform the presentation from a Japanese system to an English system. The system would understand what the user locale is and would work using a user installable/callable locale, including the appropriate fonts and formats. Such a platform or application can be written using a single source and a single binary. One needs to remember that different vendors do not necessarily use exactly the same locale names.
Simplified Software: One application with one code set versus multiple applications and managing different code pages and conversions.
Data Stability: A single common and widely adopted format.
Reduced Cost: Development, maintenance, training.

Great Expectations

• Enhance global interoperability
• Enhance data interchange
• Permit easier development of localizable products
• Reduce development cost of localized application software
• Replace retrofitting with concurrent development


Great expectations of ISO/IEC 10646 usage and Unicode implementations:
Enhance interoperability between various system platforms and applications.
Enhance data interchange using physical devices or through networks.
Reduce development cost of localized application software. If platforms have built-in support for one universal character set and applications are designed in a localizable fashion, then the cost of retrofitting platforms and applications is almost eliminated. Entering new growth markets or jumping on an opportunity in an already defined market becomes easier since the engineering time for retrofitting is eliminated.
No more code page overlays. Gone are the days of having a specific version of application software for each country/region or script. Now there is only one flat structure where each character has a unique code. What one needs are the localizability and localization development tools as well as the fonts, but not the various character code pages of the past. This permits easier development of localized applications.


Recommendations
• Buy the international standard (including all published amendments) as well as the Unicode standard
• Watch for updates on the web including Unicode technical reports and ISO amendments
• Join the Unicode consortium, W3C, your national body standards committee or other organization to influence standards development processes
• Define your needs and communicate them to your vendors
• Build products that support ISO/IEC 10646 and The Unicode Standard

How to proceed:

Get a copy of the International Standard 10646 and the Unicode Standard.

Evaluate the value of ISO/IEC 10646/Unicode to your organization based on market need. Build your products in a localizable way. When a new market emerges or an opportunity presents itself to enter an existing market no retrofitting of your platform or applications would be necessary.

Define any new needs that you may discover and communicate them to your vendors and suppliers.

Monitor announcements of products that support Unicode and ISO/IEC 10646. Ensure that the options supported satisfy your own needs. Join the Unicode Consortium, W3C, your national standards body and other standards development organizations that use ISO/IEC 10646/Unicode as a basis for their standardization efforts, to ensure your views are heard and ultimately reflected in future guidelines and standards.


References
The Unicode Standard, Version 3.0
  The Unicode Consortium
  http://www.unicode.org
  Addison-Wesley - ISBN 0-201-61633-5
  Updated by UAX#27 (3.1) and UAX#28 (3.2)
  http://www.unicode.org/unicode/reports/

ISO/IEC 10646
  Part 1: Architecture and Basic Multilingual Plane
  Part 2: Supplementary Planes
  ISO CENTRAL SECRETARIAT
  1, rue de Varembé, CH-1211 Genève 20, Switzerland
  Phone: + 41 22 749 01 11
  Fax: + 41 22 733 34 30
  email: [email protected]

Here are the most important references one needs: The Unicode Standard, Version 3.0. It can be ordered from Addison-Wesley - ISBN 0-201-61633-5. The Unicode Standard, Version 3.1, is available on the Unicode web site at <http://www.unicode.org/unicode/reports/tr27>. The official ISO/IEC 10646-1:2000 standard (2nd edition) can be ordered from your national body or directly from ISO in Geneva, Switzerland. Its cost has been reduced to about $50.00 compared to the previous price of about $400.00. Monitor the SC2/WG2 home page and the Unicode home page to ensure you also get a copy of the latest amendments.


Organizations
ISO/IEC JTC1/SC2 Secretariat
  JISC - ITSCJ
  Kikai Shinko Building
  3-5-8 Shibakoen, Minato-Ku
  Tokyo 105, Japan
  Phone: + 81 3 34 31 28 08
  Fax: + 81 3 34 31 64 93
  Internet: [email protected]

The Unicode Consortium
  P. O. Box 391476
  Mountain View, CA 94039
  Phone: +1 (650) 693-3921
  Fax: +1 (650) 390-3010
  email: [email protected]

Addresses and contact information for some of the organizations involved in ISO/IEC 10646 and The Unicode Standard. These addresses include the Secretariat of SC2 (hosted by Japan) and the head office of the Unicode Consortium.


Electronic Distribution Lists


[email protected]
  to subscribe, send a mail message
      subscribe [email protected] unicode
  to: [email protected]

[email protected]
  to subscribe, send a mail message
      sub iso10646 first_name last_name
  to: [email protected]

Here are a couple of electronic distribution lists that you can subscribe to:
1. Unicode Consortium: send a mail message to: [email protected] The mailing list is designated for general discussions and information on Unicode and related issues. While some technical discussions among interested experts take place on this list, there is a separate list for technical discussions intended to lead to changes in the Unicode Standard. That list is accessible to members and member organizations of the Unicode Consortium and their employees.
2. ISO10646 Reflector, hosted by Johns Hopkins University. To subscribe, send the following one-line mail message to [email protected]: sub iso10646 first_name last_name This is a discussion forum for general audiences.


WWW Home Pages


JTC1/SC2: www.dkuug.dk/JTC1/SC2
SC2/WG2: www.dkuug.dk/JTC1/SC2/WG2
Unicode: www.unicode.org
W3C: www.w3.org/International/
ISO: www.iso.ch

Here are several URL pointers to web home pages that carry additional information on character set standards:
1. JTC1/SC2: www.dkuug.dk/JTC1/SC2 This is the official ISO/IEC subcommittee responsible for character encoding. It contains two technical working groups, WG2 and WG3. WG2 is responsible for the development, amendment and maintenance of the multi-octet encoding ISO/IEC 10646, whereas WG3 is responsible for 8-bit character coding standards such as ISO/IEC 8859 and all its parts.
2. JTC1/SC2/WG2: www.dkuug.dk/JTC1/SC2/WG2 This is the official ISO working group home page, which is on a web site in Denmark.
3. Unicode Consortium: www.unicode.org This is the official Unicode home page.
4. W3C: www.w3.org/International/ This is the official site of the World Wide Web Consortium Internationalization working group. Access to documents is limited to members of W3C.
5. ISO: www.iso.ch This is the official ISO home page with information on all ISO activities including JTC1, the parent organization of SC2/WG2.


Thank You!
