diff options
author | Tatsuo Ishii | 2001-01-09 09:54:11 +0000 |
---|---|---|
committer | Tatsuo Ishii | 2001-01-09 09:54:11 +0000 |
commit | 9df71d4c6a729494932319933e5a461fbbdb9e71 (patch) | |
tree | bb2041cf1c26996e178716c2790b0773d12c63a4 | |
parent | 7edb4442aae0261aa5d17943a96f3d09345209ed (diff) |
Add a README file for multi-byte. This file is contributed by
Chih-Chang Hsieh <[email protected]>, written in traditional Chinese
(Big5).
-rw-r--r-- | doc/README.mb.big5 | 326 |
1 files changed, 326 insertions, 0 deletions
diff --git a/doc/README.mb.big5 b/doc/README.mb.big5 new file mode 100644 index 0000000000..eabbcc42e4 --- /dev/null +++ b/doc/README.mb.big5 @@ -0,0 +1,326 @@ +PostgreSQL 7.0.1 multi-byte (MB) support README May 20 2000 + + Tatsuo Ishii + http://www.sra.co.jp/people/t-ishii/PostgreSQL/ + +[��] 1. �P�¥ۤ��F�� (Tatsuo Ishii) ����! + 2. �����������ҵL, ��Ķ�Y�����~, ���p�� [email protected] + + +0. ²�� + +MB �䴩�O���F�� PostgreSQL ��B�z�h�줸�զr�� (multi-byte character), +�Ҧp: EUC (Extended Unix Code), Unicode (�Τ@�X) �M Mule internal code +(�h��y�����X). �b MB ���䴩�U, �A�i�H�b���W���ܦ� (regexp), LIKE �� +��L�@�Ǩ禡���ϥΦh�줸�զr��. �w�]���s�X�t�Υi���M��A�w�� PostgreSQL +�ɪ� initdb(1) �R�O, ��i�� createdb(1) �R�O�Ϋإ߸�Ʈw�� SQL �R�O�M�w. +�ҥH�A�i�H���h�Ӥ��P�s�X�t�Ϊ���Ʈw. + +MB �䴩�]�ѨM�F�@�� 8 �줸��줸�զr���� (�]�t ISO-8859-1) ���������D, +(�ڨèS�����Ҧ����������D���ѨM�F, �ڥu�O�T�{�F�j�k���հ��榨�\, +�Ӥ@�Ǫk�y�r���b MB �ɤU�i�H�ϥ�. �p�G�A�b�ϥ� 8 �줸�r���ɵo�{�F +������D, �гq����) + +1. �p��ϥ� + +�sĶ PostgreSQL �e, ���� configure �ɨϥ� multibyte ���ﶵ + + % ./configure --enable-multibyte[=encoding_system] + % ./configure --enable-multibyte[=�s�X�t��] + +�䤤���s�X�t�Υi�H���w���U���䤤���@: + + SQL_ASCII ASCII + EUC_JP Japanese EUC + EUC_CN Chinese EUC + EUC_KR Korean EUC + EUC_TW Taiwan EUC + UNICODE Unicode(UTF-8) + MULE_INTERNAL Mule internal + LATIN1 ISO 8859-1 English and some European languages + LATIN2 ISO 8859-2 English and some European languages + LATIN3 ISO 8859-3 English and some European languages + LATIN4 ISO 8859-4 English and some European languages + LATIN5 ISO 8859-5 English and some European languages + KOI8 KOI8-R + WIN Windows CP1251 + ALT Windows CP866 + +�Ҧp: + + % ./configure --enable-multibyte=EUC_JP + +�p�G�ٲ����w�s�X�t��, ����w�]�ȴN�O SQL_ASCII. + +2. �p��]�w�s�X + +initdb �R�O�w�q PostgresSQL �w�˫᪺�w�]�s�X, �Ҧp: + + % initdb -E EUC_JP + +�N�w�]���s�X�]�w�� EUC_JP (Extended Unix Code for Japanese), �p�G�A���w +�������r��, �A�]�i�H�� "--encoding" �Ӥ��� "-E". �p�G�S���ϥ� -E �� +--encoding ���ﶵ, ����sö�ɪ��]�w�|�����w�]��. + +�A�i�H�إߨϥΤ��P�s�X����Ʈw: + + % createdb -E EUC_KR korean + +�o�өR�O�|�إߤ@�ӥs�� "korean" ����Ʈw, �Ө�ĥ� EUC_KR �s�X. +�t�~���@�Ӥ�k, �O�ϥ� SQL �R�O, �]�i�H�F��P�˪��ت�: + + CREATE DATABASE korean WITH ENCODING = 'EUC_KR'; + +�b pg_database �t�γW��� (system catalog) �����@�� "encoding" �����, +�N�O�ΨӬ����@�Ӹ�Ʈw���s�X. �A�i�H�� psql -l �ζi�J psql ��� \l �� +�R�O�Ӭd�ݸ�Ʈw�ĥΦ�ؽs�X: + +$ psql -l + List of databases + Database | Owner | Encoding +---------------+---------+--------------- + euc_cn | t-ishii | EUC_CN + euc_jp | t-ishii | EUC_JP + euc_kr | t-ishii | EUC_KR + euc_tw | t-ishii | EUC_TW + mule_internal | t-ishii | MULE_INTERNAL + regression | t-ishii | SQL_ASCII + template1 | t-ishii | EUC_JP + test | t-ishii | EUC_JP + unicode | t-ishii | UNICODE +(9 rows) + +3. �e�ݻP��ݽs�X���۰��ഫ + +[��: �e�ݪx���Ȥ�ݪ��{��, �i��O psql �R�O��Ķ��, �αĥ� libpq �� C +�{��, Perl �{��, �Ϊ̬O�z�L ODBC ���������ε{��. �ӫ�ݴN�O�� PostgreSQL +��Ʈw�����A�{��] + +PostgreSQL �䴩�Y�ǽs�X�b�e�ݻP��ݶ����۰��ഫ: [��: �o�̩ҿת��۰� +�ഫ�O���A�b�e�ݤΫ�ݩҫŧi�ĥΪ��s�X���P, ���u�n PostgreSQL �䴩�o +��ؽs�X�����ഫ, ���|���A�b�s���e���ഫ] + + encoding of backend available encoding of frontend + -------------------------------------------------------------------- + EUC_JP EUC_JP, SJIS + + EUC_TW EUC_TW, BIG5 + + LATIN2 LATIN2, WIN1250 + + LATIN5 LATIN5, WIN, ALT + + MULE_INTERNAL EUC_JP, SJIS, EUC_KR, EUC_CN, + EUC_TW, BIG5, LATIN1 to LATIN5, + WIN, ALT, WIN1250 + +�b�Ұʦ۰ʽs�X�ഫ���e, �A�����i�D PostgreSQL �A�n�b�e�ݱĥΦ�ؽs�X. +���n�X�Ӥ�k�i�H�F��o�ӥت�: + +o �b psql �R�O��Ķ�����ϥ� \encoding �o�өR�O + +\encoding �o�өR�O�i�H���A���W�����e�ݽs�X, �Ҧp, �A�n�N�e�ݽs�X������ SJIS, +����Х�: + + \encoding SJIS + +o �ϥ� libpq [��: PostgreSQL ��Ʈw�� C API �{���w] ���禡 + +psql �� \encoding �R�O���u�O�h�I�s PQsetClientEncoding() �o�Ө禡�ӹF��ت�. + + int PQsetClientEncoding(PGconn *conn, const char *encoding) + +�W���� conn �o�ӰѼƥN���@�ӹ��ݪ��s�u, encoding �o�ӰѼƭn��A�Q�Ϊ��s�X, +���p�����\�a�]�w�F�s�X, �K�|�Ǧ^ 0 ��, ���Ѫ��ܶǦ^ -1. �ܩ�ثe�s�u���s�X�i +�Q�ΥH�U�禡�d��: + + int PQclientEncoding(const PGconn *conn) + +�o�̭n�`�N���O: �o�Ө禡�Ǧ^���O�s�X���N�� (encoding id, �O�Ӿ�ƭ�), +�Ӥ��O�s�X���W�٦r�� (�p "EUC_JP"), �p�G�A�n�ѽs�X�N���o���s�X�W��, +�����I�s: + +char *pg_encoding_to_char(int encoding_id) + +o �ϥ� PGCLIENTENCODING �o�������ܼ� + +�p�G�e�ݩ��]�w�F PGCLIENTENCODING �o�@�������ܼ�, �����ݷ|���s�X�۰��ഫ. + +[��] PostgreSQL 7.0.0 ~ 7.0.3 ���� bug -- ���{�o�������ܼ� + +o �ϥ� SET CLIENT_ENCODING TO �o�� SQL ���R�O + +�n�]�w�e�ݪ��s�X�i�H�ΥH�U�o�� SQL �R�O: + + SET CLIENT_ENCODING TO 'encoding'; + +�A�]�i�H�ϥ� SQL92 ���y�k "SET NAMES" �F��P�˪��ت�: + + SET NAMES 'encoding'; + +�d�ߥثe���e�ݽs�X�i�H�ΥH�U�o�� SQL �R�O: + + SHOW CLIENT_ENCODING; + +��������ӹw�]���s�X, �ΥH�U�o�� SQL �R�O: + + RESET CLIENT_ENCODING; + +[��] �ϥ� psql �R�O��Ķ����, ��ij���n�γo�Ӥ�k, �Х� \encoding + +4. ���� Unicode (�Τ@�X) + +�Τ@�X�M��L�s�X�����ഫ�i��n�b 7.1 ����~�|��{. + +5. �p�G�L�k�ഫ�|�o�ͤ����? + +���]�A�b��ݿ�ܤF EUC_JP �o�ӽs�X, �e�ݨϥ� LATIN1, (�Y�Ǥ��r���L�k�ഫ�� +LATIN1) �b�o�Ӫ��p�U, �Y�Ӧr���Y�����ন LATIN1 �r����, �N�|�Q�ন�H�U������: + + (�Q���i���) + +6. �ѦҸ�� + +These are good sources to start learning various kind of encoding +systems. + +ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf + Detailed explanations of EUC_JP, EUC_CN, EUC_KR, EUC_TW + appear in section 3.2. + +Unicode: http://www.unicode.org/ + The homepage of UNICODE. + + RFC 2044 + UTF-8 is defined here. + +5. History + +May 20, 2000 + * SJIS UDC (NEC selection IBM kanji) support contributed + by Eiji Tokuya + * Changes above will appear in 7.0.1 + +Mar 22, 2000 + * Add new libpq functions PQsetClientEncoding, PQclientEncoding + * ./configure --with-mb=EUC_JP + now deprecated. use + ./configure --enable-multibyte=EUC_JP + instead + * Add SQL_ASCII regression test case + * Add SJIS User Defined Character (UDC) support + * All of above will appear in 7.0 + +July 11, 1999 + * Add support for WIN1250 (Windows Czech) as a client encoding + (contributed by Pavel Behal) + * fix some compiler warnings (contributed by Tomoaki Nishiyama) + +Mar 23, 1999 + * Add support for KOI8(KOI8-R), WIN(CP1251), ALT(CP866) + (thanks Oleg Broytmann for testing) + * Fix problem with MB and locale + +Jan 26, 1999 + * Add support for Big5 for fronend encoding + (you need to create a database with EUC_TW to use Big5) + * Add regression test case for EUC_TW + (contributed by Jonah Kuo <[email protected]>) + +Dec 15, 1998 + * Bugs related to SQL_ASCII support fixed + +Nov 5, 1998 + * 6.4 release. In this version, pg_database has "encoding" + column that represents the database encoding + +Jul 22, 1998 + * determine encoding at initdb/createdb rather than compile time + * support for PGCLIENTENCODING when issuing COPY command + * support for SQL92 syntax "SET NAMES" + * support for LATIN2-5 + * add UNICODE regression test case + * new test suite for MB + * clean up source files + +Jun 5, 1998 + * add support for the encoding translation between the backend + and the frontend + * new command SET CLIENT_ENCODING etc. added + * add support for LATIN1 character set + * enhance 8 bit cleaness + +April 21, 1998 some enhancements/fixes + * character_length(), position(), substring() are now aware of + multi-byte characters + * add octet_length() + * add --with-mb option to configure + * new regression tests for EUC_KR + (contributed by "Soonmyung. Hong" <[email protected]>) + * add some test cases to the EUC_JP regression test + * fix problem in regress/regress.sh in case of System V + * fix toupper(), tolower() to handle 8bit chars + +Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1 + +Mar 10, 1998 PL2 released + * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL + * add an English document (this file) + * fix problems concerning 8-bit single byte characters + +Mar 1, 1998 PL1 released + +Appendix: + +[Here is a good documentation explaining how to use WIN1250 on +Windows/ODBC from Pavel Behal. Please note that Installation step 1) +is not necceary in 6.5.1 -- Tatsuo] + +Version: 0.91 for PgSQL 6.5 +Author: Pavel Behal +Revised by: Tatsuo Ishii +Email: [email protected] +Licence: The Same as PostgreSQL + +Sorry for my Eglish and C code, I'm not native :-) + +!!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + +Instalation: +------------ +1) Change three affected files in source directories + (I don't have time to create proper patch diffs, I don't know how) +2) Compile with enabled locale and multibyte set to LATIN2 +3) Setup properly your instalation, do not forget to create locale + variables in your profile (environment). Ex. (may not be exactly true): + LC_ALL=cs_CZ.ISO8859-2 + LC_COLLATE=cs_CZ.ISO8859-2 + LC_CTYPE=cs_CZ.ISO8859-2 + LC_MONETARY=cs_CZ.ISO8859-2 + LC_NUMERIC=cs_CZ.ISO8859-2 + LC_TIME=cs_CZ.ISO8859-2 +4) You have to start the postmaster with locales set! +5) Try it with Czech language, it have to sort +5) Install ODBC driver for PgSQL into your M$ Windows +6) Setup properly your data source. Include this line in your ODBC + configuration dialog in field "Connect Settings:" : + SET CLIENT_ENCODING = 'WIN1250'; +7) Now try it again, but in Windows with ODBC. + +Description: +------------ +- Depends on proper system locales, tested with RH6.0 and Slackware 3.6, + with cs_CZ.iso8859-2 loacle +- Never try to set-up server multibyte database encoding to WIN1250, + always use LATIN2 instead. There is not WIN1250 locale in Unix +- WIN1250 encoding is useable only for M$W ODBC clients. The characters are + on thy fly re-coded, to be displayed and stored back properly + +Important: +---------- +- it reorders your sort order depending on your LC_... setting, so don't be + confused with regression tests, they don't use locale +- "ch" is corectly sorted only in some newer locales (Ex. RH6.0) +- you have to insert money as '162,50' (with comma in aphostrophes!) +- not tested properly |