Commit 421c66b

Modify CREATE DATABASE to enforce that the source database's encoding setting must be used for the new database, except when copying from template0. This is the same rule that we now enforce for locale settings, and it has the same motivation: databases other than template0 might contain data that would be invalid according to a different setting. This represents another step in a continuing process of locking down ways in which encoding violations could occur inside the backend. Per discussion of a few days ago.

In passing, fix pre-existing breakage of mbregress.sh, and fix up a couple of ereport() calls in dbcommands.c that failed to specify sqlstate codes.
1 parent ab4e386 commit 421c66b

6 files changed, +97 -56 lines changed

doc/src/sgml/charset.sgml

+22-14
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.93 2009/04/06 08:42:52 heikki Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.94 2009/05/06 16:15:20 tgl Exp $ -->
 
 <chapter id="charset">
 <title>Localization</>
@@ -20,11 +20,9 @@
 
 <listitem>
 <para>
-Providing a number of different character sets defined in the
-<productname>PostgreSQL</productname> server, including
-multiple-byte character sets, to support storing text in all
-kinds of languages, and providing character set translation between
-client and server.
+Providing a number of different character sets to support storing text
+in all kinds of languages, and providing character set translation
+between client and server.
 </para>
 </listitem>
 </itemizedlist>
@@ -75,8 +73,8 @@ initdb --locale=sv_SE
 names on your system depends on what was provided by the operating
 system vendor and what was installed. On most Unix systems, the command
 <literal>locale -a</> will provide a list of available locales.
-Windows uses more verbose names, such as <literal>German_Germany</>
-or <literal>Swedish_Sweden.1252</>.
+Windows uses more verbose locale names, such as <literal>German_Germany</>
+or <literal>Swedish_Sweden.1252</>, but the principles are the same.
 </para>
 
 <para>
@@ -133,7 +131,7 @@ initdb --locale=sv_SE
 fixed when the database is created. You can use different settings
 for different databases, but once a database is created, you cannot
 change them for that database anymore. <literal>LC_COLLATE</literal>
-and <literal>LC_CTYPE</literal> are those categories. They affect
+and <literal>LC_CTYPE</literal> are these categories. They affect
 the sort order of indexes, so they must be kept fixed, or indexes on
 text columns will become corrupt. The default values for these
 categories are determined when <command>initdb</command> is run, and
@@ -169,7 +167,7 @@ initdb --locale=sv_SE
 For a given locale category, say the collation, the following
 environment variables are consulted in this order until one is
 found to be set: <envar>LC_ALL</envar>, <envar>LC_COLLATE</envar>
-(the variable corresponding to the respective category),
+(or the variable corresponding to the respective category),
 <envar>LANG</envar>. If none of these environment variables are
 set then the locale defaults to <literal>C</literal>.
 </para>
@@ -186,8 +184,9 @@ initdb --locale=sv_SE
 
 <para>
 To enable messages to be translated to the user's preferred language,
-<acronym>NLS</acronym> must have been enabled at build time. This
-choice is independent of the other locale support.
+<acronym>NLS</acronym> must have been selected at build time
+(<literal>configure --enable-nls</>). All other locale support is
+built in automatically.
 </para>
 </sect2>
 
@@ -325,6 +324,7 @@ initdb --locale=sv_SE
 <envar>LC_COLLATE</> locale settings. For <literal>C</> or
 <literal>POSIX</> locale, any character set is allowed, but for other
 locales there is only one character set that will work correctly.
+(On Windows, however, UTF-8 encoding can be used with any locale.)
 </para>
 
 <sect2 id="multibyte-charset-supported">
@@ -752,6 +752,14 @@ createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr
 CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
 </programlisting>
 
+Notice that the above commands specify copying the <literal>template0</>
+database. When copying any other database, the encoding and locale
+settings cannot be changed from those of the source database, because
+that might result in corrupt data. For more information see
+<xref linkend="manage-ag-templatedbs">.
+</para>
+
+<para>
 The encoding for a database is stored in the system catalog
 <literal>pg_database</literal>. You can see it by using the
 <option>-l</option> option or the <command>\l</command> command
@@ -777,7 +785,7 @@ $ <userinput>psql -l</userinput>
 <para>
 On most modern operating systems, <productname>PostgreSQL</productname>
 can determine which character set is implied by an <envar>LC_CTYPE</>
-setting, and it will enforce that only the correct database encoding is
+setting, and it will enforce that only the matching database encoding is
 used. On older systems it is your responsibility to ensure that you use
 the encoding expected by the locale you have selected. A mistake in
 this area is likely to lead to strange misbehavior of locale-dependent
@@ -1225,7 +1233,7 @@ RESET client_encoding;
 
 <listitem>
 <para>
-The web site of the Unicode Consortium
+The web site of the Unicode Consortium.
 </para>
 </listitem>
 </varlistentry>
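
The charset.sgml text in this diff describes the lookup order for a locale category: LC_ALL, then the category's own variable (LC_COLLATE for collation), then LANG, falling back to C. A minimal shell sketch of that order follows. The function name resolve_locale is hypothetical, and treating an empty value the same as an unset one is an assumption, not something the documentation text states.

```shell
#!/bin/sh
# resolve_locale LC_ALL_VALUE CATEGORY_VALUE LANG_VALUE
# Hypothetical helper: prints the first non-empty argument, or "C" when
# all three are empty. Empty is treated as "not set" (an assumption).
resolve_locale() {
    for candidate in "$1" "$2" "$3"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf 'C\n'
}

# Example: the collation locale implied by the current environment.
resolve_locale "${LC_ALL:-}" "${LC_COLLATE:-}" "${LANG:-}"
```

Passing the three values as arguments, rather than reading the environment inside the function, keeps the helper easy to try with made-up values.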

doc/src/sgml/manage-ag.sgml

+19-11
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/manage-ag.sgml,v 2.57 2007/11/08 15:21:03 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/manage-ag.sgml,v 2.58 2009/05/06 16:15:20 tgl Exp $ -->
 
 <chapter id="managing-databases">
 <title>Managing Databases</title>
@@ -203,8 +203,17 @@ createdb -O <replaceable>rolename</> <replaceable>dbname</>
 <literal>template1</>. This is particularly handy when restoring a
 <literal>pg_dump</> dump: the dump script should be restored in a
 virgin database to ensure that one recreates the correct contents
-of the dumped database, without any conflicts with additions that
-can now be present in <literal>template1</>.
+of the dumped database, without any conflicts with objects that
+might have been added to <literal>template1</> later on.
+</para>
+
+<para>
+Another common reason for copying <literal>template0</> instead
+of <literal>template1</> is that new encoding and locale settings
+can be specified when copying <literal>template0</>, whereas a copy
+of <literal>template1</> must use the same settings it does.
+This is because <literal>template1</> might contain encoding-specific
+or locale-specific data, while <literal>template0</> is known not to.
 </para>
 
 <para>
@@ -238,9 +247,8 @@ createdb -T template0 <replaceable>dbname</>
 <literal>datallowconn</literal>. <literal>datistemplate</literal>
 can be set to indicate that a database is intended as a template for
 <command>CREATE DATABASE</>. If this flag is set, the database can be
-cloned by
-any user with <literal>CREATEDB</> privileges; if it is not set, only superusers
-and the owner of the database can clone it.
+cloned by any user with <literal>CREATEDB</> privileges; if it is not set,
+only superusers and the owner of the database can clone it.
 If <literal>datallowconn</literal> is false, then no new connections
 to that database will be allowed (but existing sessions are not killed
 simply by setting the flag false). The <literal>template0</literal>
@@ -305,14 +313,14 @@ ALTER DATABASE mydb SET geqo TO off;
 <title>Destroying a Database</title>
 
 <para>
-Databases are destroyed with the command
+Databases are destroyed with the command
 <xref linkend="sql-dropdatabase" endterm="sql-dropdatabase-title">:<indexterm><primary>DROP DATABASE</></>
 <synopsis>
 DROP DATABASE <replaceable>name</>;
 </synopsis>
 Only the owner of the database, or
 a superuser, can drop a database. Dropping a database removes all objects
-that were
+that were
 contained within the database. The destruction of a database cannot
 be undone.
 </para>
@@ -403,8 +411,8 @@ CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';
 <para>
 Tables, indexes, and entire databases can be assigned to
 particular tablespaces. To do so, a user with the <literal>CREATE</>
-privilege on a given tablespace must pass the tablespace name as a
-parameter to the relevant command. For example, the following creates
+privilege on a given tablespace must pass the tablespace name as a
+parameter to the relevant command. For example, the following creates
 a table in the tablespace <literal>space1</>:
 <programlisting>
 CREATE TABLE foo(i int) TABLESPACE space1;
@@ -493,7 +501,7 @@ SELECT spcname FROM pg_tablespace;
 update the <structname>pg_tablespace</> catalog to show the new
 locations. (If you do not, <literal>pg_dump</> will continue to show
 the old tablespace locations.)
-</para>
+</para>
 
 </sect1>
 </chapter>

doc/src/sgml/ref/create_database.sgml

+23-16
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/create_database.sgml,v 1.51 2009/04/06 08:42:52 heikki Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/create_database.sgml,v 1.52 2009/05/06 16:15:21 tgl Exp $
 PostgreSQL documentation
 -->
 
@@ -116,19 +116,19 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
 </listitem>
 </varlistentry>
 <varlistentry>
-<term><replaceable class="parameter">collate</replaceable></term>
+<term><replaceable class="parameter">lc_collate</replaceable></term>
 <listitem>
 <para>
 Collation order (<literal>LC_COLLATE</>) to use in the new database.
-This affects the sort order applied to strings, e.g in queries with
+This affects the sort order applied to strings, e.g. in queries with
 ORDER BY, as well as the order used in indexes on text columns.
 The default is to use the collation order of the template database.
 See below for additional restrictions.
 </para>
 </listitem>
 </varlistentry>
 <varlistentry>
-<term><replaceable class="parameter">ctype</replaceable></term>
+<term><replaceable class="parameter">lc_ctype</replaceable></term>
 <listitem>
 <para>
 Character classification (<literal>LC_CTYPE</>) to use in the new
@@ -207,25 +207,27 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
 
 <para>
 The character set encoding specified for the new database must be
-compatible with the chosen LC_COLLATE and LC_CTYPE settings.
-If <envar>LC_CTYPE</> is <literal>C</> (or equivalently
+compatible with the chosen locale settings (<literal>LC_COLLATE</> and
+<literal>LC_CTYPE</>). If the locale is <literal>C</> (or equivalently
 <literal>POSIX</>), then all encodings are allowed, but for other
 locale settings there is only one encoding that will work properly.
+(On Windows, however, UTF-8 encoding can be used with any locale.)
 <command>CREATE DATABASE</> will allow superusers to specify
-<literal>SQL_ASCII</> encoding regardless of the locale setting,
+<literal>SQL_ASCII</> encoding regardless of the locale settings,
 but this choice is deprecated and may result in misbehavior of
 character-string functions if data that is not encoding-compatible
 with the locale is stored in the database.
 </para>
 
 <para>
-The <literal>LC_COLLATE</> and <literal>LC_CTYPE</> settings must match
-those of the template database, except when template0 is used as
-template. This is because <literal>LC_COLLATE</> and <literal>LC_CTYPE</>
-affects the ordering in indexes, so that any indexes copied from the
-template database would be invalid in the new database with different
-settings. <literal>template0</literal>, however, is known to not
-contain any indexes that would be affected.
+The encoding and locale settings must match those of the template database,
+except when <literal>template0</> is used as template. This is because
+other databases might contain data that does not match the specified
+encoding, or might contain indexes whose sort ordering is affected by
+<literal>LC_COLLATE</> and <literal>LC_CTYPE</>. Copying such data would
+result in a database that is corrupt according to the new settings.
+<literal>template0</literal>, however, is known to not contain any data or
+indexes that would be affected.
 </para>
 
 <para>
@@ -257,12 +259,17 @@ CREATE DATABASE sales OWNER salesapp TABLESPACE salesspace;
 </para>
 
 <para>
-To create a database <literal>music</> which supports the ISO-8859-1
+To create a database <literal>music</> which supports the ISO-8859-1
 character set:
 
 <programlisting>
-CREATE DATABASE music ENCODING 'LATIN1';
+CREATE DATABASE music ENCODING 'LATIN1' TEMPLATE template0;
 </programlisting>
+
+In this example, the <literal>TEMPLATE template0</> clause would only
+be required if <literal>template1</>'s encoding is not ISO-8859-1.
+Note that changing encoding might require selecting new
+<literal>LC_COLLATE</> and <literal>LC_CTYPE</> settings as well.
 </para>
 </refsect1>
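
The note on the music example says that the template0 clause is needed only when the default template's encoding differs from the requested one. A hedged sketch of that decision for a script that calls createdb follows. The function name createdb_args is hypothetical, and it only prints the arguments it would pass rather than contacting a server. The -T and -E options are the ones used elsewhere in this commit.

```shell
#!/bin/sh
# createdb_args DBNAME WANTED_ENCODING TEMPLATE_ENCODING
# Hypothetical helper: prints a createdb argument list. It adds
# "-T template0" only when the wanted encoding differs from the
# encoding of the default template, following the documented rule.
# Simplification: names are compared as plain strings, so aliases of
# the same encoding would be treated as different.
createdb_args() {
    dbname=$1
    wanted=$2
    template_enc=$3
    if [ "$wanted" = "$template_enc" ]; then
        printf '%s\n' "-E $wanted $dbname"
    else
        printf '%s\n' "-T template0 -E $wanted $dbname"
    fi
}

# Example: the default template is assumed to be UTF8 here.
createdb_args music LATIN1 UTF8
```

As the documentation text notes, a change of encoding might also require new LC_COLLATE and LC_CTYPE settings, which this sketch does not handle.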

src/backend/commands/dbcommands.c

+28-10
@@ -13,7 +13,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/commands/dbcommands.c,v 1.223 2009/05/05 23:39:55 tgl Exp $
+* $PostgreSQL: pgsql/src/backend/commands/dbcommands.c,v 1.224 2009/05/06 16:15:21 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -361,7 +361,8 @@ createdb(const CreatedbStmt *stmt)
 #endif
 (encoding == PG_SQL_ASCII && superuser())))
 ereport(ERROR,
-(errmsg("encoding %s does not match locale %s",
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("encoding %s does not match locale %s",
 pg_encoding_to_char(encoding),
 dbctype),
 errdetail("The chosen LC_CTYPE setting requires encoding %s.",
@@ -374,29 +375,45 @@ createdb(const CreatedbStmt *stmt)
 #endif
 (encoding == PG_SQL_ASCII && superuser())))
 ereport(ERROR,
-(errmsg("encoding %s does not match locale %s",
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("encoding %s does not match locale %s",
 pg_encoding_to_char(encoding),
 dbcollate),
 errdetail("The chosen LC_COLLATE setting requires encoding %s.",
 pg_encoding_to_char(collate_encoding))));
 
 /*
-* Check that the new locale is compatible with the source database.
+* Check that the new encoding and locale settings match the source
+* database. We insist on this because we simply copy the source data ---
+* any non-ASCII data would be wrongly encoded, and any indexes sorted
+* according to the source locale would be wrong.
 *
-* We know that template0 doesn't contain any indexes that depend on
-* collation or ctype, so template0 can be used as template for
-* any locale.
+* However, we assume that template0 doesn't contain any non-ASCII data
+* nor any indexes that depend on collation or ctype, so template0 can be
+* used as template for creating a database with any encoding or locale.
 */
 if (strcmp(dbtemplate, "template0") != 0)
 {
+if (encoding != src_encoding)
+ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("new encoding (%s) is incompatible with the encoding of the template database (%s)",
+pg_encoding_to_char(encoding),
+pg_encoding_to_char(src_encoding)),
+errhint("Use the same encoding as in the template database, or use template0 as template.")));
+
 if (strcmp(dbcollate, src_collate) != 0)
 ereport(ERROR,
-(errmsg("new collation is incompatible with the collation of the template database (%s)", src_collate),
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("new collation (%s) is incompatible with the collation of the template database (%s)",
+dbcollate, src_collate),
 errhint("Use the same collation as in the template database, or use template0 as template.")));
 
 if (strcmp(dbctype, src_ctype) != 0)
 ereport(ERROR,
-(errmsg("new LC_CTYPE is incompatible with LC_CTYPE of the template database (%s)", src_ctype),
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("new LC_CTYPE (%s) is incompatible with the LC_CTYPE of the template database (%s)",
+dbctype, src_ctype),
 errhint("Use the same LC_CTYPE as in the template database, or use template0 as template.")));
 }
 
@@ -1099,7 +1116,8 @@ movedb(const char *dbname, const char *tblspcname)
 continue;
 
 ereport(ERROR,
-(errmsg("some relations of database \"%s\" are already in tablespace \"%s\"",
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("some relations of database \"%s\" are already in tablespace \"%s\"",
 dbname, tblspcname),
 errhint("You must move them back to the database's default tablespace before using this command.")));
 }
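
The template check added to createdb() above can be modeled outside the backend to make its order of tests easier to follow. The shell function below is an illustrative model only, not PostgreSQL code. The name check_template_match and its argument layout are hypothetical, and it compares encodings as strings where the C code compares integer encoding IDs.

```shell
#!/bin/sh
# check_template_match TEMPLATE ENC SRC_ENC COLLATE SRC_COLLATE CTYPE SRC_CTYPE
# Illustrative model (hypothetical helper): returns 0 when the requested
# settings are acceptable; otherwise prints a message to stderr and
# returns 1. Checks run in the same order as in the diff: encoding,
# then collation, then LC_CTYPE.
check_template_match() {
    # template0 is exempt from all three comparisons.
    if [ "$1" = "template0" ]; then
        return 0
    fi
    if [ "$2" != "$3" ]; then
        echo "new encoding ($2) is incompatible with the encoding of the template database ($3)" >&2
        return 1
    fi
    if [ "$4" != "$5" ]; then
        echo "new collation ($4) is incompatible with the collation of the template database ($5)" >&2
        return 1
    fi
    if [ "$6" != "$7" ]; then
        echo "new LC_CTYPE ($6) is incompatible with the LC_CTYPE of the template database ($7)" >&2
        return 1
    fi
    return 0
}

# Example: LATIN1 requested while copying a UTF8 template1 is rejected.
check_template_match template1 LATIN1 UTF8 C C C C || echo "rejected"
```

The first mismatch ends the check, so the model reports only one problem per call. The real ereport(ERROR) likewise aborts the command at the first failing test.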

src/test/mb/README

+2-2
@@ -1,4 +1,4 @@
-$PostgreSQL: pgsql/src/test/mb/README,v 1.3 2008/03/21 13:23:29 momjian Exp $
+$PostgreSQL: pgsql/src/test/mb/README,v 1.4 2009/05/06 16:15:21 tgl Exp $
 
 README for multibyte regression test
 1998/7/22
@@ -7,4 +7,4 @@ README for multibyte regression test
 This directory contains a set of tests for multibyte supporting
 extentions for PostgreSQL. To run the test, simply type:
 
-% mbregress.sh
+% sh mbregress.sh

src/test/mb/mbregress.sh

+3-3
@@ -1,5 +1,5 @@
 #! /bin/sh
-# $PostgreSQL: pgsql/src/test/mb/mbregress.sh,v 1.9 2005/06/24 15:11:59 ishii Exp $
+# $PostgreSQL: pgsql/src/test/mb/mbregress.sh,v 1.10 2009/05/06 16:15:21 tgl Exp $
 
 if echo '\c' | grep -s c >/dev/null 2>&1
 then
@@ -15,7 +15,7 @@ if [ ! -d results ];then
 fi
 
 dropdb utf8
-createdb -E UTF8 utf8
+createdb -T template0 -l C -E UTF8 utf8
 
 PSQL="psql -n -e -q"
 tests="euc_jp sjis euc_kr euc_cn euc_tw big5 utf8 mule_internal"
@@ -36,7 +36,7 @@ do
 unset PGCLIENTENCODING
 else
 dropdb $i >/dev/null 2>&1
-createdb -E `echo $i | tr 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'` $i >/dev/null
+createdb -T template0 -l C -E `echo $i | tr 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'` $i >/dev/null
 $PSQL $i < sql/${i}.sql > results/${i}.out 2>&1
 fi
4242
