Add a README file for multi-byte. This file is contributed by

Chih-Chang Hsieh <[email protected]>, written in traditional Chinese (Big5).
author: Tatsuo Ishii 2001-01-09 09:54:11 +0000
committer: Tatsuo Ishii 2001-01-09 09:54:11 +0000
commit: 9df71d4c6a729494932319933e5a461fbbdb9e71 (patch)
tree: bb2041cf1c26996e178716c2790b0773d12c63a4
parent: 7edb4442aae0261aa5d17943a96f3d09345209ed (diff)
1 files changed, 326 insertions, 0 deletions
diff --git a/doc/README.mb.big5 b/doc/README.mb.big5
new file mode 100644
index 0000000000..eabbcc42e4
--- /dev/null
+++ b/doc/README.mb.big5
@@ -0,0 +1,326 @@
+PostgreSQL 7.0.1 multi-byte (MB) support README	  May 20 2000
+
+						Tatsuo Ishii
+						[email protected]
+		  http://www.sra.co.jp/people/t-ishii/PostgreSQL/
+
+[��] 1. �P�¥ۤ��F�� (Tatsuo Ishii) ����!
+     2. �����������ҵL, ��Ķ�Y�����~, ���p�� [email protected]
+
+
+0. ²��
+
+MB �䴩�O���F�� PostgreSQL ��B�z�h�줸�զr�� (multi-byte character),
+�Ҧp: EUC (Extended Unix Code), Unicode (�Τ@�X) �M Mule internal code
+(�h��y�����X). �b MB ���䴩�U, �A�i�H�b���W���ܦ� (regexp), LIKE ��
+��L�@�Ǩ禡���ϥΦh�줸�զr��. �w�]���s�X�t�Υi���M��A�w�� PostgreSQL
+�ɪ� initdb(1) �R�O, ��i�� createdb(1) �R�O�Ϋإ߸�Ʈw�� SQL �R�O�M�w.
+�ҥH�A�i�H���h�Ӥ��P�s�X�t�Ϊ���Ʈw.
+
+MB �䴩�]�ѨM�F�@�� 8 �줸��줸�զr���� (�]�t ISO-8859-1) ���������D,
+(�ڨèS�����Ҧ����������D���ѨM�F, �ڥu�O�T�{�F�j�k���հ��榨�\,
+�Ӥ@�Ǫk�y�r���b MB �׸ɤU�i�H�ϥ�. �p�G�A�b�ϥ� 8 �줸�r���ɵo�{�F
+������D, �гq����)
+
+1. �p��ϥ�
+
+�sĶ PostgreSQL �e, ���� configure �ɨϥ� multibyte ���ﶵ
+
+	% ./configure --enable-multibyte[=encoding_system]
+	% ./configure --enable-multibyte[=�s�X�t��]
+
+�䤤���s�X�t�Υi�H���w���U���䤤���@:
+
+	SQL_ASCII		ASCII
+	EUC_JP			Japanese EUC
+	EUC_CN			Chinese EUC
+	EUC_KR			Korean EUC
+	EUC_TW			Taiwan EUC
+	UNICODE			Unicode(UTF-8)
+	MULE_INTERNAL		Mule internal
+	LATIN1			ISO 8859-1 English and some European languages
+	LATIN2			ISO 8859-2 English and some European languages
+	LATIN3			ISO 8859-3 English and some European languages
+	LATIN4			ISO 8859-4 English and some European languages
+	LATIN5			ISO 8859-5 English and some European languages
+	KOI8			KOI8-R
+	WIN			Windows CP1251
+	ALT			Windows CP866
+
+�Ҧp:
+
+	% ./configure --enable-multibyte=EUC_JP
+
+�p�G�ٲ����w�s�X�t��, ����w�]�ȴN�O SQL_ASCII.
+
+2. �p��]�w�s�X
+
+initdb �R�O�w�q PostgresSQL �w�˫᪺�w�]�s�X, �Ҧp:
+
+	% initdb -E EUC_JP
+
+�N�w�]���s�X�]�w�� EUC_JP (Extended Unix Code for Japanese), �p�G�A���w
+�������r��, �A�]�i�H�� "--encoding" �Ӥ��� "-E". �p�G�S���ϥ� -E ��
+--encoding ���ﶵ, ����sö�ɪ��]�w�|�����w�]��.
+
+�A�i�H�إߨϥΤ��P�s�X����Ʈw:
+
+	% createdb -E EUC_KR korean
+
+�o�өR�O�|�إߤ@�ӥs�� "korean" ����Ʈw, �Ө�ĥ� EUC_KR �s�X.
+�t�~���@�Ӥ�k, �O�ϥ� SQL �R�O, �]�i�H�F��P�˪��ت�:
+
+	CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
+
+�b pg_database �t�γW��� (system catalog) �����@�� "encoding" �����,
+�N�O�ΨӬ����@�Ӹ�Ʈw���s�X. �A�i�H�� psql -l �ζi�J psql ��� \l ��
+�R�O�Ӭd�ݸ�Ʈw�ĥΦ�ؽs�X:
+
+$ psql -l
+            List of databases
+   Database    |  Owner  |   Encoding    
+---------------+---------+---------------
+ euc_cn        | t-ishii | EUC_CN
+ euc_jp        | t-ishii | EUC_JP
+ euc_kr        | t-ishii | EUC_KR
+ euc_tw        | t-ishii | EUC_TW
+ mule_internal | t-ishii | MULE_INTERNAL
+ regression    | t-ishii | SQL_ASCII
+ template1     | t-ishii | EUC_JP
+ test          | t-ishii | EUC_JP
+ unicode       | t-ishii | UNICODE
+(9 rows)
+
+3. �e�ݻP��ݽs�X���۰��ഫ
+
+[��: �e�ݪx���Ȥ�ݪ��{��, �i��O psql �R�O��Ķ��, �αĥ� libpq �� C 
+�{��, Perl �{��, �Ϊ̬O�z�L ODBC ���������ε{��. �ӫ�ݴN�O�� PostgreSQL
+��Ʈw�����A�{��]
+
+PostgreSQL �䴩�Y�ǽs�X�b�e�ݻP��ݶ����۰��ഫ: [��: �o�̩ҿת��۰�
+�ഫ�O���A�b�e�ݤΫ�ݩҫŧi�ĥΪ��s�X���P, ���u�n PostgreSQL �䴩�o
+��ؽs�X�����ഫ, ���򥦷|���A�b�s���e���ഫ]
+
+  encoding of backend			available encoding of frontend
+  --------------------------------------------------------------------
+	EUC_JP				EUC_JP, SJIS
+  
+	EUC_TW				EUC_TW, BIG5
+  
+  	LATIN2				LATIN2, WIN1250
+  
+	LATIN5				LATIN5, WIN, ALT
+  
+	MULE_INTERNAL			EUC_JP, SJIS, EUC_KR, EUC_CN, 
+					EUC_TW, BIG5, LATIN1 to LATIN5, 
+					WIN, ALT, WIN1250
+
+�b�Ұʦ۰ʽs�X�ഫ���e, �A�����i�D PostgreSQL �A�n�b�e�ݱĥΦ�ؽs�X.
+���n�X�Ӥ�k�i�H�F��o�ӥت�:
+
+o �b psql �R�O��Ķ�����ϥ� \encoding �o�өR�O
+
+\encoding �o�өR�O�i�H���A���W�����e�ݽs�X, �Ҧp, �A�n�N�e�ݽs�X������ SJIS,
+����Х�:
+
+	\encoding SJIS
+
+o �ϥ� libpq [��: PostgreSQL ��Ʈw�� C API �{���w] ���禡
+
+psql �� \encoding �R�O���u�O�h�I�s PQsetClientEncoding() �o�Ө禡�ӹF��ت�.
+
+  int PQsetClientEncoding(PGconn *conn, const char *encoding)
+
+�W���� conn �o�ӰѼƥN���@�ӹ��ݪ��s�u, encoding �o�ӰѼƭn��A�Q�Ϊ��s�X,
+���p�����\�a�]�w�F�s�X, �K�|�Ǧ^ 0 ��, ���Ѫ��ܶǦ^ -1. �ܩ�ثe�s�u���s�X�i
+�Q�ΥH�U�禡�d��:
+
+  int PQclientEncoding(const PGconn *conn)
+
+�o�̭n�`�N���O: �o�Ө禡�Ǧ^���O�s�X���N�� (encoding id, �O�Ӿ�ƭ�),
+�Ӥ��O�s�X���W�٦r�� (�p "EUC_JP"), �p�G�A�n�ѽs�X�N���o���s�X�W��,
+�����I�s:
+
+char *pg_encoding_to_char(int encoding_id)
+
+o �ϥ� PGCLIENTENCODING �o�������ܼ�
+
+�p�G�e�ݩ��]�w�F PGCLIENTENCODING �o�@�������ܼ�, �����ݷ|���s�X�۰��ഫ.
+
+[��] PostgreSQL 7.0.0 ~ 7.0.3 ���� bug -- ���{�o�������ܼ�
+
+o �ϥ� SET CLIENT_ENCODING TO �o�� SQL ���R�O
+
+�n�]�w�e�ݪ��s�X�i�H�ΥH�U�o�� SQL �R�O:
+
+	SET CLIENT_ENCODING TO 'encoding';
+
+�A�]�i�H�ϥ� SQL92 ���y�k "SET NAMES" �F��P�˪��ت�:
+
+	SET NAMES 'encoding';
+
+�d�ߥثe���e�ݽs�X�i�H�ΥH�U�o�� SQL �R�O:
+
+	SHOW CLIENT_ENCODING;
+
+��������ӹw�]���s�X, �ΥH�U�o�� SQL �R�O:
+
+	RESET CLIENT_ENCODING;
+
+[��] �ϥ� psql �R�O��Ķ����, ��ĳ���n�γo�Ӥ�k, �Х� \encoding
+
+4. ���� Unicode (�Τ@�X)
+
+�Τ@�X�M��L�s�X�����ഫ�i��n�b 7.1 ����~�|��{.
+
+5. �p�G�L�k�ഫ�|�o�ͤ����?
+
+���]�A�b��ݿ�ܤF EUC_JP �o�ӽs�X, �e�ݨϥ� LATIN1, (�Y�Ǥ��r���L�k�ഫ�� 
+LATIN1) �b�o�Ӫ��p�U, �Y�Ӧr���Y�����ন LATIN1 �r����, �N�|�Q�ন�H�U������:
+
+	(�Q���i���)
+
+6. �ѦҸ��
+
+These are good sources to start learning various kind of encoding
+systems.
+
+ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
+	Detailed explanations of EUC_JP, EUC_CN, EUC_KR, EUC_TW
+	appear in section 3.2.
+
+Unicode: http://www.unicode.org/
+	The homepage of UNICODE.
+
+	RFC 2044
+	UTF-8 is defined here.
+
+5. History
+
+May 20, 2000
+	* SJIS UDC (NEC selection IBM kanji) support contributed
+	  by Eiji Tokuya
+	* Changes above will appear in 7.0.1
+
+Mar 22, 2000
+	* Add new libpq functions PQsetClientEncoding, PQclientEncoding
+	* ./configure --with-mb=EUC_JP
+	  now deprecated. use 
+	  ./configure --enable-multibyte=EUC_JP
+	  instead
+  	* Add SQL_ASCII regression test case
+	* Add SJIS User Defined Character (UDC) support
+	* All of above will appear in 7.0
+
+July 11, 1999
+	* Add support for WIN1250 (Windows Czech) as a client encoding
+	  (contributed by Pavel Behal)
+	* fix some compiler warnings (contributed by Tomoaki Nishiyama)
+
+Mar 23, 1999
+	* Add support for KOI8(KOI8-R), WIN(CP1251), ALT(CP866)
+	  (thanks Oleg Broytmann for testing)
+	* Fix problem with MB and locale
+
+Jan 26, 1999
+	* Add support for Big5 for fronend encoding
+	  (you need to create a database with EUC_TW to use Big5)
+	* Add regression test case for EUC_TW
+	  (contributed by Jonah Kuo <[email protected]>)
+
+Dec 15, 1998
+	* Bugs related to SQL_ASCII support fixed
+
+Nov 5, 1998
+	* 6.4 release. In this version, pg_database has "encoding"
+	  column that represents the database encoding
+
+Jul 22, 1998
+	* determine encoding at initdb/createdb rather than compile time
+	* support for PGCLIENTENCODING when issuing COPY command
+	* support for SQL92 syntax "SET NAMES"
+	* support for LATIN2-5
+	* add UNICODE regression test case
+	* new test suite for MB
+	* clean up source files
+
+Jun 5, 1998
+	* add support for the encoding translation between the backend
+	  and the frontend
+	* new command SET CLIENT_ENCODING etc. added
+	* add support for LATIN1 character set
+	* enhance 8 bit cleaness
+
+April 21, 1998 some enhancements/fixes
+	* character_length(), position(), substring() are now aware of 
+	  multi-byte characters
+	* add octet_length()
+	* add --with-mb option to configure
+	* new regression tests for EUC_KR
+  	  (contributed by "Soonmyung. Hong" <[email protected]>)
+	* add some test cases to the EUC_JP regression test
+	* fix problem in regress/regress.sh in case of System V
+	* fix toupper(), tolower() to handle 8bit chars
+
+Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
+
+Mar 10, 1998 PL2 released
+	* add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
+	* add an English document (this file)
+	* fix problems concerning 8-bit single byte characters
+
+Mar 1, 1998 PL1 released
+
+Appendix:
+
+[Here is a good documentation explaining how to use WIN1250 on
+Windows/ODBC from Pavel Behal. Please note that Installation step 1)
+is not necceary in 6.5.1 -- Tatsuo]
+
+Version: 0.91 for PgSQL 6.5
+Author: Pavel Behal
+Revised by: Tatsuo Ishii
+Email: [email protected]
+Licence: The Same as PostgreSQL
+
+Sorry for my Eglish and C code, I'm not native :-)
+
+!!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+Instalation:
+------------
+1) Change three affected files in source directories 
+    (I don't have time to create proper patch diffs, I don't know how)
+2) Compile with enabled locale and multibyte set to LATIN2
+3) Setup properly your instalation, do not forget to create locale
+   variables in your profile (environment). Ex. (may not be exactly true):
+	LC_ALL=cs_CZ.ISO8859-2
+	LC_COLLATE=cs_CZ.ISO8859-2
+	LC_CTYPE=cs_CZ.ISO8859-2
+	LC_MONETARY=cs_CZ.ISO8859-2
+	LC_NUMERIC=cs_CZ.ISO8859-2
+	LC_TIME=cs_CZ.ISO8859-2
+4) You have to start the postmaster with locales set!
+5) Try it with Czech language, it have to sort
+5) Install ODBC driver for PgSQL into your M$ Windows
+6) Setup properly your data source. Include this line in your ODBC
+   configuration dialog in field "Connect Settings:" :
+	SET CLIENT_ENCODING = 'WIN1250';
+7) Now try it again, but in Windows with ODBC.
+
+Description:
+------------
+- Depends on proper system locales, tested with RH6.0 and Slackware 3.6,
+  with cs_CZ.iso8859-2 loacle
+- Never try to set-up server multibyte database encoding to WIN1250,
+  always use LATIN2 instead. There is not WIN1250 locale in Unix
+- WIN1250 encoding is useable only for M$W ODBC clients. The characters are
+  on thy fly re-coded, to be displayed and stored back properly
+ 
+Important:
+----------
+- it reorders your sort order depending on your LC_... setting, so don't be
+  confused with regression tests, they don't use locale
+- "ch" is corectly sorted only in some newer locales (Ex. RH6.0)
+- you have to insert money as '162,50' (with comma in aphostrophes!)
+- not tested properly
author	Tatsuo Ishii	2001-01-09 09:54:11 +0000
committer	Tatsuo Ishii	2001-01-09 09:54:11 +0000
commit	9df71d4c6a729494932319933e5a461fbbdb9e71 (patch)
tree	bb2041cf1c26996e178716c2790b0773d12c63a4
parent	7edb4442aae0261aa5d17943a96f3d09345209ed (diff)