summaryrefslogtreecommitdiff
path: root/doc/src/sgml/syntax.sgml
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/syntax.sgml')
-rw-r--r--doc/src/sgml/syntax.sgml142
1 files changed, 134 insertions, 8 deletions
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 0efba278c5..6c988011b7 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.123 2008/06/26 22:24:42 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.124 2008/10/29 08:04:52 petere Exp $ -->
<chapter id="sql-syntax">
<title>SQL Syntax</title>
@@ -190,6 +190,57 @@ UPDATE "my_table" SET "a" = 5;
</para>
<para>
+ <indexterm><primary>Unicode escape</primary><secondary>in
+ identifiers</secondary></indexterm> A variant of quoted
+ identifiers allows including escaped Unicode characters identified
+ by their code points. This variant starts
+ with <literal>U&</literal> (upper or lower case U followed by
+ ampersand) immediately before the opening double quote, without
+ any spaces in between, for example <literal>U&"foo"</literal>.
+ (Note that this creates an ambiguity with the
+ operator <literal>&</literal>. Use spaces around the operator to
+ avoid this problem.) Inside the quotes, Unicode characters can be
+ specified in escaped form by writing a backslash followed by the
+ four-digit hexadecimal code point number or alternatively a
+ backslash followed by a plus sign followed by a six-digit
+ hexadecimal code point number. For example, the
+ identifier <literal>"data"</literal> could be written as
+<programlisting>
+U&"d\0061t\+000061"
+</programlisting>
+ The following less trivial example writes the Russian
+ word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&"\0441\043B\043E\043D"
+</programlisting>
+ </para>
+
+ <para>
+ If a different escape character than backslash is desired, it can
+ be specified using
+ the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+ clause after the string, for example:
+<programlisting>
+U&"d!0061t!+000061" UESCAPE '!'
+</programlisting>
+ The escape character can be any single character other than a
+ hexadecimal digit, the plus sign, a single quote, a double quote,
+ or a whitespace character. Note that the escape character is
+ written in single quotes, not double quotes.
+ </para>
+
+ <para>
+ To include the escape character in the identifier literally, write
+ it twice.
+ </para>
+
+ <para>
+ The Unicode escape syntax works only when the server encoding is
+ UTF8. When other server encodings are used, only code points in
+ the ASCII range (up to <literal>\007F</literal>) can be specified.
+ </para>
+
+ <para>
Quoting an identifier also makes it case-sensitive, whereas
unquoted names are always folded to lower case. For example, the
identifiers <literal>FOO</literal>, <literal>foo</literal>, and
@@ -245,7 +296,7 @@ UPDATE "my_table" SET "a" = 5;
write two adjacent single quotes, e.g.
<literal>'Dianne''s horse'</literal>.
Note that this is <emphasis>not</> the same as a double-quote
- character (<literal>"</>).
+ character (<literal>"</>). <!-- font-lock sanity: " -->
</para>
<para>
@@ -269,14 +320,19 @@ SELECT 'foo' 'bar';
by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
following the standard.)
</para>
+ </sect3>
- <para>
- <indexterm>
+ <sect3 id="sql-syntax-strings-escape">
+ <title>String Constants with C-Style Escapes</title>
+
+ <indexterm zone="sql-syntax-strings-escape">
<primary>escape string syntax</primary>
</indexterm>
- <indexterm>
+ <indexterm zone="sql-syntax-strings-escape">
<primary>backslash escapes</primary>
</indexterm>
+
+ <para>
<productname>PostgreSQL</productname> also accepts <quote>escape</>
string constants, which are an extension to the SQL standard.
An escape string constant is specified by writing the letter
@@ -287,7 +343,8 @@ SELECT 'foo' 'bar';
Within an escape string, a backslash character (<literal>\</>) begins a
C-like <firstterm>backslash escape</> sequence, in which the combination
of backslash and following character(s) represent a special byte
- value:
+ value, as shown in <xref linkend="sql-backslash-table">.
+ </para>
<table id="sql-backslash-table">
<title>Backslash Escape Sequences</title>
@@ -341,14 +398,24 @@ SELECT 'foo' 'bar';
</tgroup>
</table>
- It is your responsibility that the byte sequences you create are
- valid characters in the server character set encoding. Any other
+ <para>
+ Any other
character following a backslash is taken literally. Thus, to
include a backslash character, write two backslashes (<literal>\\</>).
Also, a single quote can be included in an escape string by writing
<literal>\'</literal>, in addition to the normal way of <literal>''</>.
</para>
+ <para>
+ It is your responsibility that the byte sequences you create are
+ valid characters in the server character set encoding. When the
+ server encoding is UTF-8, then the alternative Unicode escape
+ syntax, explained in <xref linkend="sql-syntax-strings-uescape">,
+ should be used instead. (The alternative would be doing the
+ UTF-8 encoding by hand and writing out the bytes, which would be
+ very cumbersome.)
+ </para>
+
<caution>
<para>
If the configuration parameter
@@ -379,6 +446,65 @@ SELECT 'foo' 'bar';
</para>
</sect3>
+ <sect3 id="sql-syntax-strings-uescape">
+ <title>String Constants with Unicode Escapes</title>
+
+ <indexterm zone="sql-syntax-strings-uescape">
+ <primary>Unicode escape</primary>
+ <secondary>in string constants</secondary>
+ </indexterm>
+
+ <para>
+ <productname>PostgreSQL</productname> also supports another type
+ of escape syntax for strings that allows specifying arbitrary
+ Unicode characters by code point. A Unicode escape string
+ constant starts with <literal>U&</literal> (upper or lower case
+ letter U followed by ampersand) immediately before the opening
+ quote, without any spaces in between, for
+ example <literal>U&'foo'</literal>. (Note that this creates an
+ ambiguity with the operator <literal>&</literal>. Use spaces
+ around the operator to avoid this problem.) Inside the quotes,
+ Unicode characters can be specified in escaped form by writing a
+ backslash followed by the four-digit hexadecimal code point
+ number or alternatively a backslash followed by a plus sign
+ followed by a six-digit hexadecimal code point number. For
+ example, the string <literal>'data'</literal> could be written as
+<programlisting>
+U&'d\0061t\+000061'
+</programlisting>
+ The following less trivial example writes the Russian
+ word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&'\0441\043B\043E\043D'
+</programlisting>
+ </para>
+
+ <para>
+ If a different escape character than backslash is desired, it can
+ be specified using
+ the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+ clause after the string, for example:
+<programlisting>
+ U&'d!0061t!+000061' UESCAPE '!'
+</programlisting>
+ The escape character can be any single character other than a
+ hexadecimal digit, the plus sign, a single quote, a double quote,
+ or a whitespace character.
+ </para>
+
+ <para>
+ The Unicode escape syntax works only when the server encoding is
+ UTF8. When other server encodings are used, only code points in
+ the ASCII range (up to <literal>\007F</literal>) can be
+ specified.
+ </para>
+
+ <para>
+ To include the escape character in the string literally, write it
+ twice.
+ </para>
+ </sect3>
+
<sect3 id="sql-syntax-dollar-quoting">
<title>Dollar-Quoted String Constants</title>