|
| 1 | +# BIG5.TXT |
| 2 | +# Date: 2015-12-02 23:52:00 GMT [KW] |
| 3 | +# © 2015 Unicode®, Inc. |
| 4 | +# For terms of use, see http://www.unicode.org/terms_of_use.html |
1 | 5 | #
|
2 | 6 | # Name: BIG5 to Unicode table (complete)
|
3 | 7 | # Unicode version: 1.1
|
4 |
| -# Table version: 0.0d3 |
| 8 | +# Table version: 2.0 |
5 | 9 | # Table format: Format A
|
6 |
| -# Date: 11 February 1994 |
7 |
| -# Authors: Glenn Adams < [email protected]> |
8 |
| -# John H. Jenkins < [email protected]> |
9 |
| -# |
10 |
| -# Copyright (c) 1991-1994 Unicode, Inc. All Rights reserved. |
11 |
| -# |
12 |
| -# This file is provided as-is by Unicode, Inc. (The Unicode Consortium). |
13 |
| -# No claims are made as to fitness for any particular purpose. No |
14 |
| -# warranties of any kind are expressed or implied. The recipient |
15 |
| -# agrees to determine applicability of information provided. If this |
16 |
| -# file has been provided on magnetic media by Unicode, Inc., the sole |
17 |
| -# remedy for any claim will be exchange of defective media within 90 |
18 |
| -# days of receipt. |
19 |
| -# |
20 |
| -# Recipient is granted the right to make copies in any form for |
21 |
| -# internal distribution and to freely use the information supplied |
22 |
| -# in the creation of products supporting Unicode. Unicode, Inc. |
23 |
| -# specifically excludes the right to re-distribute this file directly |
24 |
| -# to third parties or other organizations whether for profit or not. |
| 10 | +# Date: 2011 October 14 (header updated: 2015 December 02) |
25 | 11 | #
|
26 | 12 | # General notes:
|
27 | 13 | #
|
28 |
| -# This table contains the data Metis and Taligent currently have on how |
29 |
| -# BIG5 characters map into Unicode. |
| 14 | +# |
| 15 | +# This table contains one set of mappings from BIG5 into Unicode. |
| 16 | +# Note that these data are *possible* mappings only and may not be the |
| 17 | +# same as those used by actual products, nor may they be the best suited |
| 18 | +# for all uses. For more information on the mappings between various code |
| 19 | +# pages incorporating the repertoire of BIG5 and Unicode, consult the |
| 20 | +# VENDORS mapping data. |
30 | 21 | #
|
31 | 22 | # WARNING! It is currently impossible to provide round-trip compatibility
|
32 |
| -# between BIG5 and Unicode. |
| 23 | +# between BIG5 and Unicode. |
33 | 24 | #
|
34 | 25 | # A number of characters are not currently mapped because
|
35 | 26 | # of conflicts with other mappings. They are as follows:
|
|
46 | 37 | #
|
47 | 38 | # We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
|
48 | 39 | # It is also possible to map these characters to their duplicates, or to
|
49 |
| -# the user zone. |
50 |
| -# |
| 40 | +# the user zone. |
| 41 | +# |
51 | 42 | # Notes:
|
52 | 43 | #
|
53 | 44 | # 1. In addition to the above, there is some uncertainty about the
|
54 | 45 | # mappings in the range C6A1 - C8FE, and F9DD - F9FE. The ETEN
|
55 |
| -# version of BIG5 organizes the former range differently, and adds |
56 |
| -# additional characters in the latter range. The correct mappings |
57 |
| -# these ranges need to be determined. |
| 46 | +# version of BIG5 organizes the former range differently, and adds |
| 47 | +# additional characters in the latter range. The correct mappings |
| 48 | +# for these ranges need to be determined. |
58 | 49 | #
|
59 | 50 | # 2. There is an uncertainty in the mapping of the Big Five character
|
60 |
| -# 0xA3BC. This character occurs within the Big Five block of tone marks |
61 |
| -# for bopomofo and is intended to be the tone mark for the first tone in |
62 |
| -# Mandarin Chinese. We have selected the mapping U+02C9 MODIFIER LETTER |
63 |
| -# MACRON (Mandarin Chinese first tone) to reflect this semantic. |
64 |
| -# However, because bopomofo uses the absense of a tone mark to indicate |
65 |
| -# the first Mandarin tone, most implementations of Big Five represent |
66 |
| -# this character with a blank space, and so a mapping such as U+2003 EM SPACE |
67 |
| -# might be preferred. |
68 |
| -# |
69 |
| -# |
| 51 | +# 0xA3BC. This character occurs within the Big Five block of tone marks |
| 52 | +# for bopomofo and is intended to be the tone mark for the first tone in |
| 53 | +# Mandarin Chinese. We have selected the mapping U+02C9 MODIFIER LETTER |
| 54 | +# MACRON (Mandarin Chinese first tone) to reflect this semantic. |
| 55 | +# However, because bopomofo uses the absence of a tone mark to indicate |
| 56 | +# the first Mandarin tone, most implementations of Big Five represent |
| 57 | +# this character with a blank space, and so a mapping such as U+2003 EM |
| 58 | +# SPACE might be preferred. |
70 | 59 | #
|
71 | 60 | # Format: Three tab-separated columns
|
72 | 61 | # Column #1 is the BIG5 code (in hex as 0xXXXX)
|
73 | 62 | # Column #2 is the Unicode (in hex as 0xXXXX)
|
74 | 63 | # Column #3 is the Unicode name (follows a comment sign, '#')
|
75 |
| -# The official names for Unicode characters U+4E00 |
76 |
| -# to U+9FA5, inclusive, is "CJK UNIFIED IDEOGRAPH-XXXX", |
77 |
| -# where XXXX is the code point. Including all these |
78 |
| -# names in this file increases its size substantially |
79 |
| -# and needlessly. The token "<CJK>" is used for the |
80 |
| -# name of these characters. If necessary, it can be |
81 |
| -# expanded algorithmically by a parser or editor. |
| 64 | +# The official names for Unicode characters U+4E00 |
| 65 | +# to U+9FA5, inclusive, are "CJK UNIFIED IDEOGRAPH-XXXX", |
| 66 | +# where XXXX is the code point. Including all these |
| 67 | +# names in this file increases its size substantially |
| 68 | +# and needlessly. The token "<CJK>" is used for the |
| 69 | +# name of these characters. If necessary, it can be |
| 70 | +# expanded algorithmically by a parser or editor. |
82 | 71 | #
|
83 | 72 | # The entries are in BIG5 order
|
84 | 73 | #
|
85 |
| -# Any comments or problems, contact < [email protected]> |
| 74 | +# Revision History: |
| 75 | +# |
| 76 | +# [v2.0, 2015 December 02] |
| 77 | +# updates to copyright notice and terms of use |
| 78 | +# no changes to character mappings |
| 79 | +# |
| 80 | +# [v1.0, 2011 October 14] |
| 81 | +# Updated terms of use to current wording. |
| 82 | +# Updated contact information. |
| 83 | +# No changes to the mapping data. |
| 84 | +# |
| 85 | +# [v0.0d3, 11 February 1994] |
| 86 | +# First release. |
86 | 87 | #
|
| 88 | +# Use the Unicode reporting form <http://www.unicode.org/reporting.html> |
| 89 | +# for any questions or comments or to report errors in the data. |
87 | 90 | #
|
| 91 | +# Manually added mapping of lower ASCII characters |
88 | 92 | 0x0 0x0
|
89 | 93 | 0x1 0x1
|
90 | 94 | 0x2 0x2
|
|
239 | 243 | 0xA157 0xFE31 # PRESENTATION FORM FOR VERTICAL EM DASH
|
240 | 244 | 0xA158 0x2014 # EM DASH
|
241 | 245 | 0xA159 0xFE33 # PRESENTATION FORM FOR VERTICAL LOW LINE
|
| 246 | +0xA15A 0xFFFD # *** NO MAPPING *** |
242 | 247 | 0xA15B 0xFE34 # PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
|
243 | 248 | 0xA15C 0xFE4F # WAVY LOW LINE
|
244 | 249 | 0xA15D 0xFF08 # FULLWIDTH LEFT PARENTHESIS
|
|
309 | 314 | 0xA1C0 0x32A3 # CIRCLED IDEOGRAPH CORRECT
|
310 | 315 | 0xA1C1 0x2105 # CARE OF
|
311 | 316 | 0xA1C2 0x203E # OVERLINE
|
| 317 | +0xA1C3 0xFFFD # *** NO MAPPING *** |
312 | 318 | 0xA1C4 0xFF3F # FULLWIDTH LOW LINE
|
| 319 | +0xA1C5 0xFFFD # *** NO MAPPING *** |
313 | 320 | 0xA1C6 0xFE49 # DASHED OVERLINE
|
314 | 321 | 0xA1C7 0xFE4A # CENTRELINE OVERLINE
|
315 | 322 | 0xA1C8 0xFE4D # DASHED LOW LINE
|
|
366 | 373 | 0xA1FB 0x2198 # SOUTH EAST ARROW
|
367 | 374 | 0xA1FC 0x2225 # PARALLEL TO
|
368 | 375 | 0xA1FD 0x2223 # DIVIDES
|
| 376 | +0xA1FE 0xFFFD # *** NO MAPPING *** |
| 377 | +0xA240 0xFFFD # *** NO MAPPING *** |
369 | 378 | 0xA241 0xFF0F # FULLWIDTH SOLIDUS
|
370 | 379 | 0xA242 0xFF3C # FULLWIDTH REVERSE SOLIDUS
|
371 | 380 | 0xA243 0xFF04 # FULLWIDTH DOLLAR SIGN
|
|
471 | 480 | 0xA2C9 0x3027 # HANGZHOU NUMERAL SEVEN
|
472 | 481 | 0xA2CA 0x3028 # HANGZHOU NUMERAL EIGHT
|
473 | 482 | 0xA2CB 0x3029 # HANGZHOU NUMERAL NINE
|
| 483 | +0xA2CC 0xFFFD # *** NO MAPPING *** |
474 | 484 | 0xA2CD 0x5344 # <CJK>
|
| 485 | +0xA2CE 0xFFFD # *** NO MAPPING *** |
475 | 486 | 0xA2CF 0xFF21 # FULLWIDTH LATIN CAPITAL LETTER A
|
476 | 487 | 0xA2D0 0xFF22 # FULLWIDTH LATIN CAPITAL LETTER B
|
477 | 488 | 0xA2D1 0xFF23 # FULLWIDTH LATIN CAPITAL LETTER C
|
|
13916 | 13927 | 0xF9D3 0x9F7E # <CJK>
|
13917 | 13928 | 0xF9D4 0x9F49 # <CJK>
|
13918 | 13929 | 0xF9D5 0x9F98 # <CJK>
|
13919 |
| -# The following ETEN extensions are copied from CP950.txt: |
| 13930 | +# The following ETEN extensions are copied from CP950.txt (https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT):
13920 | 13931 | 0xF9D6 0x7881 #CJK UNIFIED IDEOGRAPH
|
13921 | 13932 | 0xF9D7 0x92B9 #CJK UNIFIED IDEOGRAPH
|
13922 | 13933 | 0xF9D8 0x88CF #CJK UNIFIED IDEOGRAPH
|
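The header describes a three-column format (BIG5 code, Unicode code point, name after `#`), says the `<CJK>` token can be expanded algorithmically, and notes that unmapped codes go to U+FFFD. The following is a minimal sketch of a reader for that format, not a reference implementation. It assumes it is fed raw lines of the table file rather than this rendered diff, and it splits on any whitespace rather than strictly on tabs. The names `Mapping`, `parse_line`, `parse_table`, and `colliding_targets` are hypothetical helpers introduced here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional

REPLACEMENT = 0xFFFD  # the table sends unmapped BIG5 codes to U+FFFD


class Mapping(NamedTuple):
    big5: int
    code_point: int
    name: str


def parse_line(line: str) -> Optional[Mapping]:
    """Parse one raw table line; return None for comment or blank lines."""
    # Everything after the first '#' is the name column (or a pure comment).
    body, _, comment = line.partition("#")
    fields = body.split()
    if len(fields) < 2:
        return None
    big5 = int(fields[0], 16)        # int() accepts the "0x" prefix in base 16
    code_point = int(fields[1], 16)
    name = comment.strip()
    if name == "<CJK>":
        # Expand the placeholder token as the header describes.
        name = f"CJK UNIFIED IDEOGRAPH-{code_point:04X}"
    return Mapping(big5, code_point, name)


def parse_table(lines: Iterable[str]) -> Dict[int, Mapping]:
    """Build a BIG5 -> Mapping dictionary from raw table lines."""
    table: Dict[int, Mapping] = {}
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            table[entry.big5] = entry
    return table


def colliding_targets(table: Dict[int, Mapping]) -> Dict[int, List[int]]:
    """Return code points that more than one BIG5 code maps to."""
    by_target: Dict[int, List[int]] = defaultdict(list)
    for entry in table.values():
        by_target[entry.code_point].append(entry.big5)
    return {cp: sorted(codes) for cp, codes in by_target.items() if len(codes) > 1}


if __name__ == "__main__":
    sample = [
        "# Name: BIG5 to Unicode table (complete)",
        "0xA158\t0x2014\t# EM DASH",
        "0xA15A\t0xFFFD\t# *** NO MAPPING ***",
        "0xA1C3\t0xFFFD\t# *** NO MAPPING ***",
        "0xA2CD\t0x5344\t# <CJK>",
    ]
    table = parse_table(sample)
    for entry in table.values():
        print(f"0x{entry.big5:04X} -> U+{entry.code_point:04X} {entry.name}")
    print({hex(cp): [hex(c) for c in codes]
           for cp, codes in colliding_targets(table).items()})
```

Any code point reported by `colliding_targets` cannot be converted back to a unique BIG5 code. That is one concrete way to see the round-trip warning in the header, at least for the entries mapped to U+FFFD.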
|