Barry Stinson - PostgreSQL Essential Reference-Sams (2001)
Copyright
About the Author
About the Technical Reviewers
Acknowledgments
Tell Us What You Think
Introduction
What's Inside?
Who Is This Book For?
Who Is This Book Not For?
Conventions
ecpg
JDBC
libpq
libpq++
libpgeasy
ODBC
Perl
Python (PyGreSQL)
PHP
All rights reserved. No part of this book may be reproduced or transmitted in any
form or by any means, electronic or mechanical, including photocopying, recording,
or by any information storage and retrieval system, without written permission from
the publisher, except for the inclusion of brief quotations in a review.
06 05 04 03 02 7 6 5 4 3 2 1
Interpretation of the printing code: The rightmost double-digit number is the year of
the book's printing; the rightmost single-digit number is the number of the book's
printing. For example, the printing code 02-1 shows that the first printing of the
book occurred in 2002.
Trademarks
All terms mentioned in this book that are known to be trademarks or service marks
have been appropriately capitalized. New Riders Publishing cannot attest to the
accuracy of this information. Use of a term in this book should not be regarded as
affecting the validity of any trademark or service mark.
This book is designed to provide information about PostgreSQL. Every effort has
been made to make this book as complete and as accurate as possible, but no
warranty of fitness is implied.
The information is provided on an as-is basis. The authors and New Riders
Publishing shall have neither liability nor responsibility to any person or entity with
respect to any loss or damages arising from the information contained in this book
or from the use of the discs or programs that may accompany it.
Publisher
David Dwyer
Associate Publisher
Stephanie Wall
Managing Editor
Kristy Knoop
Acquisitions Editor
Deborah Hittel-Shoaf
Development Editor
Chris Zahn
Stephanie Layton
Publicity Manager
Susan Nixon
Project Editor
Sarah Kearns
Copy Editor
Amy Lepore
Indexer
Chris Morris
Manufacturing Coordinator
Jim Conway
Book Designer
Louisa Klucznik
Cover Designer
Cover Production
Aren Howell
Proofreader
Jeannie Smith
Composition
Ron Wise
About the Author
Barry Stinson graduated from Louisiana State University in 1995 with a master's
degree in music composition. During his tenure there, he was fortunate enough to
help design the Digital Arts studio with Dr. Stephen David Beck. Designing a full-fledged
music and graphic-arts digital studio afforded him exposure to a diverse set
of unique computing systems—particularly those from NeXT, SGI, and Apple. It was
during this time that he discovered Linux and subsequently PostgreSQL, both of
which were still in an early stage of development.
After graduation, Barry set up his own consulting company, Silicon Consulting, which
is based in Lafayette, Louisiana. Over the years, he has worked as a consultant for
many companies throughout southern Louisiana.
Increasingly, much of the work Barry has done over the years has centered on
databases. In the time from his original exposure to Postgres95 to its present form
as PostgreSQL, an amazing amount of development has taken place on open-source
database systems.
The rise of high-quality and open-sourced computing systems that has taken place
recently has produced a renaissance in the high-tech industry. However, according
to his girlfriend, Pamela, his continued insistence on relying on renegade operating
systems, such as Linux, has only served to strengthen the unruly aspects already
present in his personality.
About the Technical Reviewers
These reviewers contributed their considerable hands-on expertise to the entire
development process for PostgreSQL Essential Reference. As the book was being
written, these dedicated professionals reviewed all the material for technical
content, organization, and flow. Their feedback was critical to ensuring that
PostgreSQL Essential Reference fits our readers' need for the highest-quality
technical information.
Jeremy Murrish is a Software Engineer and Project Manager at Direct Data, Inc., in
St. Louis, Missouri. Jeremy has spent his five years at Direct Data developing web-
based digital asset management and workflow solutions on a UNIX platform.
Jeremy's experience lies mainly in the publishing, pre-press, and printing industries.
He has a B.S. in Computer Science from the University of Missouri-Rolla. Jeremy
lives in St. Louis with his wife, Sherri, and their two dogs, Sloan and Thorn.
Lamar Owen basically grew up breathing computer programming. His first real
experiences with computers involved the old 8-bit TRS-80 Models I and III and a
hexadecimal debugger, and he programmed in hand-assembled machine language.
Lamar wrote a Z80 disassembler, and patched/rewrote portions of the TRSDOS
operating system for his personal use. After graduating from Rosman High School in
1986, he earned his Bachelor's degree in Electronics Engineering Technology from
DeVRY Institute of Technology in Decatur, Georgia, where he graduated Summa
Cum Laude in 1989. Lamar has owned, administered, and programmed various
UNIX systems for nearly 15 years. He has been employed by Anchor Baptist
Broadcasting for 11 years, for which he is currently Technical Director. In addition,
Lamar has maintained the PostgreSQL RPM set for two years.
Acknowledgments
This book would not have been possible if it weren't for the tireless efforts of the
New Riders staff, specifically Stephanie Wall, Deborah Hittel-Shoaf, and Chris Zahn.
Let me also acknowledge the efforts of the rest of the New Riders team—Sarah
Kearns, Amy Lepore, Chris Morris, Jeannie Smith, and Ron Wise. Moreover, the
technical reviewers, Jeremy Murrish and Lamar Owen, deserve a great deal of
thanks for providing invaluable insights, corrections, and a great deal of expertise in
making this book a success.
I would like to also thank The Logan Law Firm in Lafayette, Louisiana, for providing
me with all the coffee I could drink while writing this book and for providing an
environment "free from distraction."
Thanks to all the people who have worked so hard on the PostgreSQL web site,
mailing lists, and documentation project and who answered so many of my questions.
Mom and Dad, thanks for always believing in me and for buying that first computer.
I told you it would pay off one day!
Last, but certainly not least, I would like to thank Pamela Beadle, my beautiful
girlfriend, whose intelligence, patience, and beauty are enough to inspire any man.
Tell Us What You Think
As the reader of this book, you are the most important critic and commentator. We
value your opinion and want to know what we're doing right, what we could do
better, what areas you'd like to see us publish in, and any other words of wisdom
you're willing to pass our way.
As the Associate Publisher for New Riders Publishing, I welcome your comments.
You can fax, email, or write me directly to let me know what you did or didn't like
about this book, as well as what we can do to make our books stronger.
Please note that I cannot help you with technical problems related to the topic of
this book, and that due to the high volume of mail I receive, I might not be able to
reply to every message.
When you write, please be sure to include this book's title and author as well as
your name and phone or fax number. I will carefully review your comments and
share them with the author and editors who worked on the book.
Fax: 317-581-4663
Email: [email protected]
Introduction
In the early days of the modern computer, each program was expected to be able to
handle its own data storage and retrieval functions. This, of course, placed a
significant burden on early programmers, who had to write extra code that had
nothing to do with the true function of their application. Moreover, it turns out that it
is difficult to store data efficiently and reliably; therefore, it was only natural that
the idea of a database was born.
There were two problems with these early systems: They caused problems with
portability, and they tended to work better with static data structures. Because each
vendor was selling a unique and proprietary database management system,
applications had to be specially written to interface with each. If your application
was written to interface with the IBM database, it could not easily be configured to
work with a competitor's and vice versa. Moreover, the early databases worked on
datasets that were implemented as "flat files." This meant that if you wanted to
capture a different set of data from your application, modifications to the base data
structure were difficult and time consuming. In many cases, significant sections of
your application's source code would need to be rewritten to enable such
modifications.
In the early 1970s, a paper written by E.F. Codd, an IBM researcher, fundamentally
changed the history of how database systems would be implemented. Codd
suggested that datasets be represented relationally by the database system. This
meant that tables within a database could be linked together with various indexes to
produce an underlying data structure that was much more dynamic and extensible
than previously possible with the early flat-file systems.
IBM set out to design a system that incorporated many of Codd's visions, and this
system came to be known as System-R. Completed in the mid-1970s, System-R
also contained a new feature known as the Structured Query Language (SQL). This new
language brought two radical concepts to the database world: (1) It was
declarative, and (2) It was later adopted as a standard by the American National
Standards Institute (ANSI).
Before the advent of the SQL language, programmers needed to procedurally define
how the data stored by the database would be accessed. With SQL, however,
programmers could simply request what criteria needed to be present in the
returned dataset, and the database engine would perform the work of actually
translating that request into returned data. This removed an additional burden from
programmers because it meant that the actual mechanics of data input and retrieval
were abstracted from their control. As a result, a tremendous amount of work went
into designing query planners that could perform these requests in the most
efficient manner possible. Through this concentration of effort, databases achieved
levels of efficiency and reliability that were outside the grasp of what any individual
developer could have achieved independently.
One of the early RDBMSs was called Ingres, and it included many of the features
available in database systems at that time. A project was started in the mid-1980s
at the University of California at Berkeley to further these concepts; this project was
dubbed Postgres as a play on words implying after Ingres.
During the middle of that decade, the project, then known as Postgres95, was
renamed to PostgreSQL and was released to the world at large as an open-sourced
project. In the ensuing years, a tremendous amount of development was put into
this project, which has resulted in an open-source and freely available feature-rich
database system. The result is an RDBMS that rivals the features and performance
typically only found in high-dollar commercial systems. This is a monumental
achievement, and a great deal of admiration and respect should go to the countless
developers who have contributed their time and efforts to this project.
In fact, PostgreSQL has advanced to such a degree that there is now much
commercial interest in further supporting and developing PostgreSQL. Among the
companies interested are Great Bridge and Red Hat, both of which have a deep
commitment to open source and to PostgreSQL. With an active development
community backed by serious commercial support, PostgreSQL is destined to be one
of the shining stars of the open-source arena. PostgreSQL will undoubtedly be
considered a success in the same manner as the Apache and Linux projects.
What's Inside?
A lot of effort has been put into making this book a truly great "reference" manual.
There is a certain art to making a reference manual that differs from writing a
traditional book. Namely, the author of a reference manual needs to keep in mind,
even more than usual, how the book will actually be used.
Reference manuals aren't read sequentially; rather, the reader usually jumps from
topic to topic as his or her needs arise. Consequently, a great deal of care needs to
be taken in how the book is laid out, in how it is organized, and in giving the reader
the "right amount" of information on each topic.
I've worked hard at following this approach to the best of my ability. The
information collected has been organized into the following structure.
SQL Reference
This section outlines all the SQL commands supported by PostgreSQL Version 7.1 in
a single chapter, Chapter 1, "PostgreSQL SQL Reference." Each command is listed
in alphabetical order, along with usage notes and an example.
PostgreSQL Specifics
PostgreSQL Administration
This section outlines the options available to programmers who need to develop
custom applications with PostgreSQL. Covered topics include the following:
Chapter 12, "Creating Custom Functions," outlines the use of custom written
functions, triggers, and rules.
Appendices
Who Is This Book For?
Ideally, the reader should have some familiarity with a UNIX-style operating system.
This is not a strict requirement, but it will make certain tasks like installation and
administration much easier.
Who Is This Book Not For?
If you are new to database systems in general, this book will probably not be of
immediate benefit to you. Additionally, because PostgreSQL is both a relational
database and SQL based, if these concepts are not familiar to you, then this
might not be the book for you, at least not yet.
Conventions
Monospaced font indicates web sites, keywords, commands, file paths, and
options. Italicized font indicates where you should substitute a value of your own
choosing.
In SQL statements, SQL keywords and function names are uppercase. Database,
table, and column names are lowercase.
Part I: SQL Reference
The solution was to create a standard method of accessing database functions that
each database vendor would support. The result was originally dubbed SQL-86,
named after the year of its inception. Later, the standard was amended with
additional features and renamed to SQL-89.
In 1992, the SQL specification was expanded significantly to handle extra data
types, outer joins, catalog specification, and other enhancements. This version of
SQL, called SQL-92 (a.k.a. SQL-2), is the foundation of many modern relational
database management systems (RDBMSs).
PostgreSQL supports the majority of the functions outlined in the SQL-92 standard.
The following pages list the SQL commands, their syntax, their options, and
examples of how SQL is used in PostgreSQL. Although all the major functional
specifications of SQL-92 are supported in PostgreSQL, there are occasions when
PostgreSQL has SQL commands that have no counterpart in the formal SQL-92
specification. The following alphabetical listing notes these areas and points the user
to synonymous commands.
Table of Commands
LOAD
VACUUM
ALTER INDEX
COPY
UPDATE
INSERT
DELETE
SELECT INTO
VACUUM
CLUSTER
User
CREATE USER
DROP USER
ALTER USER
GRANT
REVOKE
ROLLBACK
MOVE
END
LISTEN
UNLISTEN
NOTIFY
LOCK
UNLOCK
The following pages comprise an alphabetical listing of the SQL commands supported in PostgreSQL (Version
7.1).
ABORT
Syntax
Description
ABORT is used to halt a transaction in process and roll back the table(s) to their original state.
Input(s)
Output(s)
Notes
SQL-92 Compatibility
Example
The following code shows how the ABORT command could be used to halt a transaction in progress and return
the table back to its original state. First you see the values in the table mytable in their original state. Next,
those values are modified with the UPDATE command. However, the UPDATE command is issued from within a
BEGIN…COMMIT transaction; therefore, it is possible for us to ABORT the current transaction and return the table
to its original state.
name | age
----------------------
Barry | 29
BEGIN TRANSACTION;
UPDATE mytable SET age=30 WHERE name='Barry';
SELECT * FROM mytable;
name | age
----------------------
Barry | 30
ABORT TRANSACTION;
SELECT * FROM mytable;
name | age
----------------------
Barry | 29
ALTER GROUP
Usage
Description
Input(s)
Output(s)
Notes
Only the superuser can issue this command—all other attempts will fail. The user and the group must exist
before this command can be issued. Dropping a user will only remove the user from the group, not drop him or
her from the database.
SQL-92 Compatibility
Example
The following code shows how multiple users can be added or dropped from the group admins.
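For example, statements along these lines would add and drop group members (the user names here are illustrative):

```sql
-- Add users charles and pamela to the group admins
ALTER GROUP admins ADD USER charles, pamela;

-- Remove charles from the group (he remains a database user)
ALTER GROUP admins DROP USER charles;
```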
ALTER TABLE
Usage
Starting with PostgreSQL 7.1, several new options were added to the ALTER TABLE
command.
Description
ALTER TABLE modifies a table or column. It enables columns to be modified, renamed, or added to an existing
table. Additionally, the table itself can be renamed by using the ALTER TABLE…RENAME syntax. If a table or
column is renamed, none of the underlying data will be affected.
By using the SET DEFAULT or DROP DEFAULT options, the default value for that column can be set, modified,
or removed. (See the "Notes" section.)
If an asterisk (*) is included after the table name, then all tables that inherit their column properties from the
current table will be modified as well. (See the "Notes" section.) This changes with Version 7.1 of PostgreSQL,
which cascades all changes to inherited tables by default. To limit changes to a specific table in PostgreSQL 7.1
and later, use the ONLY keyword.
Input(s)
Output(s)
ERROR (Message returned if the column, table, or column type does not exist.)
Notes
The ALTER TABLE command can only be issued by users who own the table or class of tables being modified.
Changing the default value for a column will not retroactively affect existing data in that column. The new
default will only affect newly inserted rows. To apply the new default to all existing rows, the SET DEFAULT clause
should be followed with an UPDATE command that resets the existing rows to the desired value.
The asterisk (*) should always be included if the table is a superclass; otherwise, queries will fail if performed on
subtables that depend on the newly modified column.
Only FOREIGN KEY constraints can be added to a table; to add or remove a unique constraint, a unique index
must be created. When adding a FOREIGN KEY constraint, the column name must exist in the foreign table. To
add check constraints to a table, you must re-create and reload the table using the CREATE TABLE command.
SQL-92 Compatibility
The ADD COLUMN form is compliant, except that it does not support defaults or constraints. A subsequent ALTER
COLUMN command must be issued to achieve the desired results.
ALTER TABLE does not support some of the functionality as specified in SQL-92. Specifically, SQL-92 allows
constraints to be dropped from a table. To achieve this result in PostgreSQL, indexes must be dropped, or the
table must be re-created and reloaded.
Examples
To add a column statecode of type VARCHAR[2] to the table authors, you would issue the following
command:
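A statement of this form would do it (a sketch based on the description above; the column type is written VARCHAR(2) in SQL):

```sql
ALTER TABLE authors ADD COLUMN statecode VARCHAR(2);
```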
To rename the column statecode to state, you would use the following command:
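A sketch of such a statement:

```sql
ALTER TABLE authors RENAME COLUMN statecode TO state;
```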
To add a FOREIGN KEY constraint to the table authors, which ensures that the field state is a valid entry (as
defined by the foreign table us_states), issue this command:
ALTER TABLE authors ADD CONSTRAINT statechk FOREIGN KEY (state) REFERENCES
us_states (state) MATCH FULL;
ALTER USER
Usage
Description
The optional clauses CREATEDB or NOCREATEDB determine whether the user is allowed to create databases.
The optional clauses CREATEUSER or NOCREATEUSER determine whether the user will be allowed to create users
of his or her own.
The optional clause VALID UNTIL supplies the date and/or time when the password will expire.
Input(s)
Output(s)
ERROR: ALTER USER: user 'username' does not exist (Message returned if the username does not
exist in current database.)
Notes
Only a database administrator or superuser can modify privileges and account expiration.
To create or drop a user from the database, use CREATE USER or DROP USER, respectively.
SQL-92 Compatibility
SQL-92 does not define the concept of USERS; it is left for each implementation to decide.
Examples
To change the user Charles's password to qwerty, issue the following command:
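A sketch of such a statement (the user name is illustrative):

```sql
ALTER USER charles WITH PASSWORD 'qwerty';
```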
To set the user Charles's password to expire on January 1, 2005, issue the following command:
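For example (a sketch; PostgreSQL accepts several date input formats):

```sql
ALTER USER charles VALID UNTIL 'Jan 1 2005';
```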
To cause the user Charles's password to expire at 12:35 on January 1, 2005, in a time zone that is six hours
ahead of UTC, issue the following command:
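A sketch of such a statement, using an ISO-style timestamp with a numeric zone offset:

```sql
-- '+06' denotes a zone six hours ahead of UTC
ALTER USER charles VALID UNTIL '2005-01-01 12:35:00+06';
```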
To give the user Charles the capability to create his own users but not his own databases, issue the following
command:
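A sketch of such a statement:

```sql
ALTER USER charles CREATEUSER NOCREATEDB;
```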
BEGIN
Usage
Description
By default, all commands issued in PostgreSQL are performed in an implicit transaction. The explicit use of the
BEGIN…COMMIT clauses encapsulates a series of SQL commands to ensure proper execution. If any command in
the series fails, the entire transaction can be rolled back, bringing the database back to its
original state.
PostgreSQL transactions are normally set to be READ COMMITTED, which means that in-process transactions can
see the effect of other committed transactions. The behavior can be changed by issuing a SET TRANSACTION
ISOLATION LEVEL SERIALIZABLE command after a transaction has started. This would have the effect of
preventing the current transaction from seeing any changes to the database while it is in process. (See the
"Examples" section.)
Input(s)
Output(s)
NOTICE: BEGIN: already a transaction in progress (Message indicates that a current transaction is
already in progress and that the transaction just begun has no effect on existing transaction.)
Notes
See ABORT, COMMIT, and ROLLBACK for more information regarding transactions.
SQL-92 Compatibility
In SQL-92, BEGIN is implicit; the explicit BEGIN keyword is a PostgreSQL extension. Normally, in SQL-92, every
transaction begins with an implicit BEGIN command but requires a COMMIT or ROLLBACK command to actually
commit the transaction to the database.
Examples
In these examples, we focus on two users who are performing operations on the database concurrently. These
examples highlight how transactions affect the data that other users see, particularly with respect to the
READ COMMITTED and SET TRANSACTION ISOLATION LEVEL SERIALIZABLE commands. (Note: In the following
examples, the SELECT commands would also display the column names and return the actual data. This output
has been abbreviated to make these listings more readable.)
The following example shows User 1 as he or she is engaged in an explicit transaction series and User 2, who
is using only implicit transactions. The example shows what data each user can see.
User 1: BEGIN TRANSACTION;
User 1: INSERT INTO mytable VALUES ('Pam');
        (1) row inserted
User 2: SELECT * FROM mytable;
        (0) row found
User 1: SELECT * FROM mytable;
        (1) row found
User 1: COMMIT TRANSACTION;
The following example shows how two explicit transactions, both using READ COMMITTED (which is the default),
affect each other. In particular, note how User 2 can view the effects of User 1 after User 1 has issued a
COMMIT command. Compare this example with the next one, which uses the SERIALIZABLE isolation level.
User 1: BEGIN TRANSACTION;
User 2: BEGIN TRANSACTION;
User 1: SELECT * FROM mytable;
        (0) results found
User 2: SELECT * FROM mytable;
        (0) results found
In this example, the SERIALIZABLE command is used to show how it prevents changes from being seen while
the transaction is in process. In effect, the SERIALIZABLE command takes a snapshot of the database as it
existed before the transaction was started and isolates it from the effects of other COMMIT commands.
User 1: BEGIN TRANSACTION;
User 2: BEGIN TRANSACTION;
User 2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
User 1: SELECT * FROM mytable;
        (0) results found
User 2: SELECT * FROM mytable;
        (0) results found
CLOSE
Usage
CLOSE cursor;
Description
This closes cursors that were opened by using the DECLARE command. Closing a cursor frees resources within
PostgreSQL and should be performed when the current cursor is no longer needed.
Input(s)
cursor—The name of an open cursor to close.
Output(s)
NOTICE PerformPortalClose: portal 'cursor' not found (Message returned if no cursor by that name is found.)
Notes
By default, a cursor is closed if a COMMIT or ROLLBACK command is issued. See DECLARE for more discussion on
cursors.
SQL-92 Compatibility
Example
CLOSE newchecks;
CLUSTER
Usage
Description
Normally, PostgreSQL physically stores data in tables in an unordered manner. CLUSTER forces PostgreSQL to
physically reorder a table so that the data is grouped according to the index specified. Generally speaking,
database performance will improve after a CLUSTER command is issued. However, any subsequent inserts are
not physically grouped in the same manner. In effect, the CLUSTER command creates a static index based on the
criteria specified. If subsequent data is inserted or updated, the CLUSTER command must be reissued to
physically reorder the table.
Input(s)
Output(s)
Notes
To perform the reordering of data, PostgreSQL copies the table in index order to a temporary table and then re-
creates and reloads the table in the new order. This causes any grant permissions and other indexes to be lost in
the transfer.
Because the CLUSTER command produces a static ordering, most users would only benefit from this command
for specific cases. Dynamic clusters can be created by using the ORDER BY clause within a SELECT command.
(See the section on SELECT later in this chapter.)
The CLUSTER command can take several minutes to complete. This depends on the size of the table and/or the
hardware speed of the system.
SQL-92 Compatibility
Example
The following example shows a table named authors that has an index called name. The same effect could be
achieved by using a SELECT…ORDER BY name command.
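A sketch of the command; in PostgreSQL 7.1 syntax, the index name precedes the table name:

```sql
-- Physically reorder authors according to the index name
CLUSTER name ON authors;
```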
COMMENT
Usage
COMMENT ON DATABASE | INDEX | RULE | SEQUENCE | TABLE | TYPE | VIEW obj_name IS text
Or
Description
Input(s)
Output(s)
Notes
Comments on an object can be retrieved from within psql by using the \dd command. (See "psql" in Chapter
6, "User Executable Files," for more.)
SQL-92 Compatibility
Examples
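For example, statements along these lines attach and then remove a comment (the table name is illustrative):

```sql
-- Attach a descriptive comment to the table authors
COMMENT ON TABLE authors IS 'Authors and their ages';

-- Setting the comment to NULL removes it
COMMENT ON TABLE authors IS NULL;
```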
COMMIT
Usage
Description
By default, all commands issued in PostgreSQL are performed in an implicit transaction. The explicit use of the
BEGIN…COMMIT clauses encapsulates a series of SQL commands to ensure proper execution. If any command in
the series fails, the entire transaction can be rolled back, bringing the database back to its
original state.
Input(s)
Output(s)
Notes
See ABORT, BEGIN, and ROLLBACK for more information regarding transactions.
SQL-92 Compatibility
SQL-92 only specifies the forms COMMIT and COMMIT WORK. Otherwise, this command is fully compliant.
Example
This example shows two users who are concurrently using the table mytable. The INSERT command from User
1 is not seen by User 2 until a COMMIT command is issued. (This assumes the READ COMMITTED clause is set;
see the section on BEGIN for more information.)
User 1: BEGIN TRANSACTION;
User 1: INSERT INTO mytable VALUES ('Pam');
        (1) row inserted
User 2: SELECT * FROM mytable;
        (0) row found
User 1: SELECT * FROM mytable;
        (1) row found
User 1: COMMIT TRANSACTION;
COPY
Usage
Or
Description
The COPY command enables users to import or export tables from PostgreSQL. With the BINARY keyword,
data is read and written in a binary format that is not human readable. For ASCII formats, the delimiter can be
specified by including the USING DELIMITERS keyword. Additionally, null strings can be specified by using the
WITH NULL clause. The inclusion of the WITH OIDS clause will cause PostgreSQL to export or expect the Object
IDs to be present.
When COPY…TO is used without the BINARY keyword, PostgreSQL will generate a text file in which each row
(instance) is contained on a separate line of the text file. If a character embedded in a field also matches the
specified delimiter, the embedded character will be preceded with a backslash (\). OIDs, if included, will be the
first item on the line. The format of a generated text file will look like this:
<OID.Row1><delimiter><Field1.Row1><delimiter>…<Field N.Row1><newline>
<OID.Row2><delimiter><Field1.Row2><delimiter>…<Field N.Row2><newline>
…
<OID.RowN><delimiter><Field1.RowN><delimiter>…<Field N.RowN><newline>
(EOF)
If COPY…TO is sending to standard output (stdout) instead of a text file, the End-Of-File (EOF) will be
designated by \.<newline> (backslash followed by a period followed by a new line).
If COPY…FROM is being used, it will expect the text file to have this same format. Similarly, if being copied from
standard-input (stdin), COPY…FROM will expect the last row to be \.<newline> (backslash followed by a
period followed by a new line). However, in the case of COPY…FROM using a file, the process will terminate if a \.
<newline> is received or when an <EOF> occurs.
If COPY…TO is used with the BINARY clause, PostgreSQL will generate the resulting file as a binary file type. The
format for a binary file will be as follows:
Input(s)
stdin—Specifies that the file should come from the standard input or pipe.
stdout—Specifies that the file should be copied to the standard output or pipe.
Output(s)
ERROR: reason (Message returned if the copy failed with reason for failure.)
Notes
The user must have either SELECT or SELECT INTO permissions to execute a COPY…TO or COPY…FROM
command.
By default, the delimiter is the tab (\t). If the delimiter specified with the USING DELIMITER option is more
than one character long, only the first character will be used.
When a filename is given, PostgreSQL assumes the current directory (such as $PGDATA). In general, it is best to
use the full pathname of the file so that confusion does not occur. Accordingly, the user executing the COPY
command must have sufficient permissions to create, modify, or delete a file in the specified directory. This is, of
course, more related to the permissions granted to the user by the underlying OS than to a specific issue related
to PostgreSQL.
Using a COPY command will not invoke table rules or defaults. However, triggers will still continue to function.
Therefore, additional operations might need to take place after a COPY command is issued (to replace defaults,
for instance).
Generally, using the BINARY keyword will result in a faster execution time. However, this depends on the data
stored in the table.
SQL-92 Compatibility
There is no specification for the COPY command in SQL-92. It is left for each implementation to decide how to
import and export data.
Examples
To copy the data from the table authors to the file /home/sqldata.txt in a comma-delimited format:
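A sketch of such a statement, using the USING DELIMITERS clause to select the comma:

```sql
COPY authors TO '/home/sqldata.txt' USING DELIMITERS ',';
```

The resulting text file would contain one line per row: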
Amy,43
Barry,29
Pam,25
Tom,32
Alternatively, if the BINARY keyword is added, then the statement becomes the following:
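A sketch of the binary form (in PostgreSQL 7.1, BINARY precedes the table name):

```sql
COPY BINARY authors TO '/home/sqldata.txt';
```

The resulting file would be a binary dump along these lines: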
004 \0 \0 \0 \f \0 \0 \0 \0 \0 \0 \0 \a \0 \0 \0
A m y \0 + \0 \0 \0 020 \0 \0 \0 \0 \0 \0 \0
\t \0 \0 \0 B a r r y \0 \0 \0 035 \0 \0 \0
\f \0 \0 \0 \0 \0 \0 \0 \a \0 \0 \0 P a m \0
031 \0 \0 \0 \f \0 \0 \0 \0 \0 \0 \0 \a \0 \0 \0
120 T o m \0 \0 \0 \0
CREATE AGGREGATE
Usage
CREATE AGGREGATE name (BASETYPE = input_data_type
[ , SFUNC1= sfunc1, STYPE1=state1_type]
[ , SFUNC2= sfunc2, STYPE2=state2_type]
[, FINALFUNC= ffunc]
[, INITCOND1= initial_condition1]
[, INITCOND2= initial_condition2] )
Description
PostgreSQL includes a number of built-in aggregates, such as sum(), avg(), min(), and max(). By using the
CREATE AGGREGATE command, users can extend PostgreSQL to include user-defined aggregate functions.
An aggregate is composed of at least one function but can include up to three. There are two state-transition
functions, sfunc1 and sfunc2, and a final calculation function, ffunc. They are used as follows:
Additionally, an aggregate function can provide one or two initial values for the related functions. If only one
sfunc is used, this initial value is optional. However, if sfunc2 is specified, then initial_condition2 is a
mandatory inclusion.
Input(s)
input_data_type—The data type that this aggregate operates on (that is, INT, VARCHAR, and so on).
sfunc1—The first state-transition function to operate on all non-NULL values (see the next "Notes" section).
state1_type—The data type for the first state-transition function (that is, INT, VARCHAR, and so on).
sfunc2—The second state-transition function to operate on all non-NULL values (see the next "Notes" section).
state2_type—The data type for the second state-transition function (that is, INT, VARCHAR, and so on).
ffunc—The final function to compute the aggregate after all input is completed (see "Notes" below).
Output(s)
Notes
ffunc must be included if both sfunc functions are included. If only one transition function is used, then it is
optional. When ffunc is not included, the aggregate's output value is derived from the last value as computed
by sfunc1.
Two aggregates can have the same name if they each operate on different data types. In this way, PostgreSQL
allows for an aggregate name to be used, but will choose the correct version depending on the data type it is
given. In other words, if you have two functions— both named the same, but each accepting a different data type
such as foo([varchar]) and foo([int])—then you only need to call the aggregate foo([our-data-
type]), and PostgreSQL will choose the appropriate version to compute the output aggregate value.
SQL-92 Compatibility
Examples
The following code creates an aggregate called complex_sum, which extends the standard sum() function by
adding complex number support.
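A sketch of such a definition, assuming a complex data type and a complex_add(complex, complex) function have already been created (the test table is hypothetical):

```sql
CREATE AGGREGATE complex_sum (
    BASETYPE  = complex,
    SFUNC1    = complex_add,
    STYPE1    = complex,
    INITCOND1 = '(0,0)'
);

SELECT complex_sum(a) FROM test_complex;
```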
Then, when the code is run, you get the following output:
complex_sum
-----------
(34,53.9)
CREATE DATABASE
Usage
Description
CREATE DATABASE is used to create new PostgreSQL databases. The creator must have sufficient permissions
to perform such an action. Once the database is created, the creator becomes its owner.
By default, PostgreSQL will create the database in the standard data directory (that is, $PGDATA). However,
alternate paths can be identified by including the WITH LOCATION keywords.
Input(s)
Output(s)
ERROR: user 'username' not allowed to create/drop databases (Message returned if the user
doesn't have permission to create or drop databases.)
ERROR: createdb: database 'name' already exist (Message returned if the database already exists.)
ERROR: Single quotes are not allowed in database name (Message returned if the database name
contains single quotes.)
ERROR: Single quotes are not allowed in database path (Message returned if the pathname
contains single quotes.)
ERROR: The path 'pathname' is invalid (Message returned if the path doesn't exist.)
ERROR: createdb: May not be called in transaction block (Message returned if trying to create a
database while in an explicit transaction.)
Or
ERROR: Could not initialize database directory (Message returned usually because user doesn't
have sufficient permissions in the specified directory.)
Notes
If the location definition contains a slash (/), then the leading part is assumed to be an environmental variable,
which must be known to the server process. However, if PostgreSQL is compiled with the option
ALLOW_ABSOLUTE_PATHS set to true, then absolute pathnames are also allowed (for example,
/home/barry/pgsql). By default, this option is set to false.
Before an alternate location can be used, it must be prepared with the initlocation command. For more
information, see Chapter 7, "System Executable Files," in the section "initlocation."
SQL-92 Compatibility
Databases are equivalent to the SQL-92 concept of catalogs, which are left for the specific implementation to
define.
Examples
The following is a simple example that creates a new database named sales:
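Such a statement is simply:

```sql
CREATE DATABASE sales;
```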
The following example creates a database in an alternate location, based on an environmental variable that is
known to the server.
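A sketch, assuming PGDATA2 is an environmental variable known to the server process and that the location has been prepared with initlocation:

```sql
CREATE DATABASE sales WITH LOCATION = 'PGDATA2';
```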
CREATE FUNCTION
Usage
Or
Description
CREATE FUNCTION enables users to create functions in PostgreSQL. PostgreSQL allows for the concept of
function overloading, which is to say, the same name can be used for several different functions as long as they
each operate on different data types. However, this must be used with caution with respect to C namespaces.
See Chapter 12, "Creating Custom Functions," for more information.
Input(s)
definition—Either the actual code, a function name, or the path to the object file that defines the function.
obj—When used with C code, the actual object file that defines the function.
attrib—Optional information used for optimization purposes (see the "Notes" section for more information).
Output(s)
Notes
The user who creates a function will become the subsequent owner of the function.
SQL-92 Compatibility
Examples
The following code creates a simple SQL function that returns the date of the last check for a given employee.
First the function needs to be defined:
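A sketch of such a function, assuming a payroll table with emp_name and check_date columns:

```sql
CREATE FUNCTION last_check(varchar) RETURNS date AS '
    SELECT MAX(check_date) AS check_date
    FROM payroll
    WHERE emp_name = $1
' LANGUAGE 'sql';

SELECT last_check('Barry') AS check_date;
```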
CHECK_DATE
----------
11/14/2001
CREATE GROUP
Usage
Description
CREATE GROUP is used to create a new group in the current database. Additionally, users can be added to the
newly created group by specifying the USER keyword. By default, the group will be given the next available group
ID (gid); however, if the WITH SYSID clause is specified, the user can declare the gid to use (if available).
Input(s)
Output(s)
Notes
The user of this command must have superuser access to the database.
SQL-92 Compatibility
Examples
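A minimal example, using hypothetical user names:

```sql
CREATE GROUP managers WITH USER tom, pam;
```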
CREATE INDEX
Usage
Or
Description
This command creates an index on the particular column and table specified. Generally, this will improve
database performance if the affected columns are frequently used in query operations.
In addition to creating indexes on specific columns, PostgreSQL also allows for the creation of indexes based on
the results generated by a function. This allows dynamic indexes to be created for data that would normally
require significant transformation to generate via standard operations.
By default, PostgreSQL creates indexes using the BTREE method. However, with the inclusion of the USING
idx_method clause, it is possible to specify other methods. The following index methods are possible (see the
"Notes" section for more information):
In addition to being able to specify index methods, PostgreSQL also allows for the specification of which operator
classes to use. Normally, it is sufficient to accept the base operator classes for the field's data type; however,
there are cases in which such a specification would be useful. For instance, in the case of a complex number that
needs to be indexed based on the absolute and the real value, it would be beneficial to specify the particular
operator class at index creation time to achieve the most efficient indexing method.
Input(s)
UNIQUE—The addition of this keyword mandates that all data contained in the specified column will always hold a
unique value. If subsequent data is inserted that is not unique, an error message will be generated.
Output(s)
ERROR: Cannot create index: 'index_name' already exists. (Message returned if index is already
in existence.)
Notes
The BTREE method is the most common (and default) type of index used. Additionally, the BTREE method is the
only one that supports multicolumn indexes (up to 16 by default). When data is searched with one of the
following operators, BTREE index use is preferred:
< Less than
<= Less than or equal to
= Equal to
>= Greater than or equal to
> Greater than
The RTREE method is most useful for determining geometric relations. In particular, if the following operators are
used, the RTREE index method is preferred:
@ Object contains or is on
~= Same as
The HASH method provides for a very quick comparison but is only useful when the following operator is invoked:
= Equal to
SQL-92 Compatibility
Examples
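A few sketches, using a hypothetical authors table:

```sql
-- A plain BTREE index on a single column:
CREATE INDEX name_idx ON authors (name);

-- A unique index:
CREATE UNIQUE INDEX id_idx ON authors (id);

-- A functional index, built on the results of lower():
CREATE INDEX lower_name_idx ON authors (lower(name));
```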
CREATE LANGUAGE
Usage
Description
The capability to add a new language to PostgreSQL is one of its more advanced features. The CREATE
LANGUAGE command enables the administrator to catalog a new language in PostgreSQL, which can then be
used to create functions.
Care must be taken when using the TRUSTED keyword. Its inclusion indicates that the particular language offers
unprivileged users no functionality to bypass access restrictions. When the TRUSTED keyword is not used, it
indicates that only superusers can use this language to create new functions.
See Part IV, "Programming with PostgreSQL," for more information on registering new languages in PostgreSQL.
Input(s)
TRUSTED—Keyword that indicates whether the language can be trusted with unprivileged users.
lang-name—The new language name to add to the system. A new language name cannot override a built-in
PostgreSQL language.
HANDLER handler-name—The name of an existing function that is called to execute the newly registered
language.
Output(s)
ERROR PL handler function func() doesn't exist (Message returned if the handler function is not
registered.)
Notes
Handler functions must take no arguments and must return the opaque type (a placeholder for an otherwise
unspecified data type). This eliminates the possibility of calling a handler function as a standard function within a query.
However, arguments must be specified on the actual call from the PL function in the desired language.
Specifically, the following arguments must be included:
Triggers. When called from the trigger manager, the only argument required is the object ID from that
procedure's pg_proc entry.
Functions. When called from the function manager, the arguments needed are as follows:
A pointer to a Boolean value that indicates to the caller whether the return value is SQL NULL.
SQL-92 Compatibility
Example
This example implies that the handler function pl_call_hand already exists. First, you need to register the
pl_call_hand as a function, and then it can be used to define a new language:
CREATE FUNCTION pl_call_hand () RETURNS opaque
AS '/usr/local/pgsql/lib/my_pl_handler.so'
LANGUAGE 'C';
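With the handler registered, the language itself can be cataloged; the language name and comment here are hypothetical:

```sql
CREATE TRUSTED PROCEDURAL LANGUAGE 'my_pl'
    HANDLER pl_call_hand
    LANCOMPILER 'My PL handler';
```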
CREATE OPERATOR
Usage
Description
This command names a new operator, composed of up to 32 characters drawn from the following possible candidates:
+ - * / < > = ~ ! @ # % ^ & | ` ? $ :
There are some exceptions concerning how the operator can be named:
A double minus sign (--) or a /* cannot appear anywhere in an operator name. (These character sequences
signify the start of a comment and are therefore ignored.)
A dollar sign ($) or a colon (:) cannot be defined as a single-character name. However, it is permissible to
use them as part of a multicharacter name (such as $%).
A multicharacter name cannot end with a plus (+) or minus (-) sign unless certain conditions are met. This
is due to how PostgreSQL parses operators for queries. The characters that must be present for an operator
to end with a plus or minus sign are as follows:
~ ! @ # % ^ & | ` ? $ :
In addition to the restrictions on naming conventions, the right-hand and/or left-hand data types must be
defined. For unary operators, either the LEFTARG or the RIGHTARG data type must be defined, whereas both
must be defined for binary operators. Binary operators have a data type on each side of the operator (that is, x
OPERATOR y), whereas unary operators only contain data on one side.
Other than the preceding items, the only other required member of a CREATE OPERATOR command is the
PROCEDURE. The function_name specifies a previously created function that handles the underlying work
necessary to deliver the correct answer.
The remaining options (COMMUTATOR, NEGATOR, RESTRICT, JOIN, HASHES, SORT1, and SORT2) are used to
help the query optimization process. Generally, it is not necessary to define these optimization helpers; the
downside of omitting them is that queries will take longer than needed to complete. Care should be taken when
defining these options. Incorrect use of these optimization parameters can result in core dumps and/or other
server mishaps.
Input(s)
name—The name of the operator to create (see the preceding naming conventions).
comut_op—The equivalent operator for switched left-hand and right-hand data placement.
negat_op—The operator that negates the current operator (for example, != negates =).
rest_func—The function used to estimate the selectivity restriction in determining how many rows will pass
when the operator is part of a WHERE clause (see Part IV for more information).
join_func—The function used to estimate the selectivity of joins that would result if the operator were used to
compare fields between a pair of tables.
HASHES—Indicates to PostgreSQL that it is permissible to use hash-level equality matching for a join based on
this operator.
l_sort_op—Defines the left-hand sort operator that is needed to optimize merge joins.
r_sort_op—Defines the right-hand sort operator that is needed to optimize merge joins.
Output(s)
Notes
The function function_name must already exist before an operator can be defined. Likewise, rest_func
and join_func must already exist if their associated options are to be set.
The RESTRICT, JOIN, and HASHES clauses must only be used on operators that are binary and that return
Boolean values.
SQL-92 Compatibility
Example
This example shows the creation of a binary operator = that is used for comparing int4 data types. (Note: This
operator is already defined as part of the base PostgreSQL operator set; this example is for demonstration
purposes only.)
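A sketch of such a definition, built on the existing int4eq function and the standard eqsel and eqjoinsel estimator functions:

```sql
CREATE OPERATOR = (
    LEFTARG    = int4,
    RIGHTARG   = int4,
    PROCEDURE  = int4eq,
    COMMUTATOR = =,
    NEGATOR    = <>,
    RESTRICT   = eqsel,
    JOIN       = eqjoinsel,
    HASHES,
    SORT1      = <,
    SORT2      = <
);
```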
CREATE RULE
Usage
Description
PostgreSQL enables users to define action rules that are executed automatically once fired by a specific event.
Although the concept of RULES is close to TRIGGERS, there are some important differences that make each
suitable for different tasks.
RULES are primarily useful for performing cascading chains of events to ensure that certain SQL actions are
always carried out. TRIGGERS are more useful for performing data validation before or after an action is
committed. However, there is sufficient overlap between the two that each can perform the other's
functionality in certain cases.
The events that can be used to trigger rules are SELECT, UPDATE, INSERT, and DELETE. These events can be
bound either to a specific column or to an entire table.
One curious aspect of rule creation is the DO INSTEAD keywords. Normally, the action specified in the rule
definition is carried out in addition to the event that originally fired the trigger. However, with the inclusion of the
DO INSTEAD keywords, PostgreSQL can be directed to perform an alternate action that will supplant the action
that originally fired the event. Additionally, if the NOTHING keyword is included, no action at all will be
performed.
Input(s)
event—The specific event(s) that causes the action to initiate. Must be SELECT, UPDATE, INSERT, or DELETE.
Output(s)
Notes
When specifying the condition for the rule, it is permissible to use the new or old temporary variable for
performing dynamic queries (see the "Examples" section for more).
Care needs to be taken when designing cascading rules. It is possible to create infinite loops by defining multiple
rule actions that operate on circular definitions. In such cases, PostgreSQL will simply refuse to execute the rule
if it determines that it would result in an infinite loop.
You must have rule definition permissions for a table or column to define rules on it.
System attributes generally cannot be referenced in a rule definition (for example, func(cls) where cls is a
class). However, OIDs can be accessed from a rule.
SQL-92 Compatibility
CREATE RULE is a PostgreSQL extension; there is no SQL-92 command.
Examples
The following example shows how rules can be used to enforce referential integrity. In this case, if an author is
deleted from the authors table, it also marks that author's status as 'inactive' in the payroll table:
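A sketch of such a rule, assuming a name column in authors and emp_name and status columns in payroll:

```sql
CREATE RULE author_del AS
    ON DELETE TO authors
    DO UPDATE payroll SET status = 'inactive'
        WHERE emp_name = old.name;
```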
The next example shows how to redirect a user's action by using the DO INSTEAD clause. In this case, if the
user is not a manager, then no action is performed (notice the use of current_user, which is a built-in
environmental variable that contains the current user):
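A sketch, assuming deletions from a hypothetical payroll table are being guarded:

```sql
CREATE RULE payroll_guard AS
    ON DELETE TO payroll
    WHERE current_user <> 'manager'
    DO INSTEAD NOTHING;
```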
In this example, you see how rules can be used to help managers keep track of important information as it changes
throughout the database. This rule definition will log all high-dollar orders to a separate table, which can then be
printed and purged daily for a manager's review:
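A sketch, using hypothetical orders and order_log tables:

```sql
CREATE RULE log_orders AS
    ON INSERT TO orders
    WHERE new.amount > 10000
    DO INSERT INTO order_log
        VALUES (new.order_id, new.amount, now());
```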
CREATE SEQUENCE
Usage
Description
Sequences are number generators that PostgreSQL can use to produce series of sequential numbers for use
throughout the database. Most often, the CREATE SEQUENCE command is used to generate unique number
series for use in table inserts. However, sequences can be used for many different reasons and are independent
of any table-related functions.
After a sequence has been created, it will respond to the following function calls:
nextval(sequence)—Advances the sequence and returns the new value.
currval(sequence)—Returns the current value of the sequence (no modification done to the existing
sequence).
setval(sequence, newvalue)—Sets the current sequence value to newvalue.
Input(s)
invalue—The value used to determine the direction of the sequence. A positive value (default = 1) will result in
an ascending sequence. A negative value results in a descending sequence.
mnvalue—The minimal value that the sequence will reach. The value -2147483647 is the default for descending
sequences, and 1 is the default for ascending sequences.
mxvalue—The maximum value that the sequence will reach. The value 2147483647 is the default for ascending
sequences, and -1 is the default for descending sequences.
cavalue—Indicates whether PostgreSQL should preallocate sequence numbers and store them in memory for
faster access. The minimum and default value is 1.
CYCLE—Indicates whether the sequence should continue past the max or min values. If the outer bound (min or
max) is reached, the sequence will begin again at the opposite area (minvalue or maxvalue).
Output(s)
ERROR: Relation 'sequence' already exists (Message returned if the sequence already exists.)
ERROR: DefineSequence: MINVALUE (start) can't be >= MAXVALUE (max) (Message returned if the
starting value is out of range.)
ERROR: DefineSequence: START value (start) can't be < MINVALUE (min) (Message returned if
the starting value is out of range.)
ERROR: DefineSequence: MINVALUE (min) can't be >= MAXVALUE (max) (Message returned if the
minimum and maximum values conflict with each other.)
Notes
Sequences actually exist in a database as a one-row table. An alternative method for determining the current
value would be to issue the following:
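Because a sequence exists as a one-row table, its state can be queried directly (sequence_name stands for the sequence in question):

```sql
SELECT last_value FROM sequence_name;
```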
SQL-92 Compatibility
Example
The following example shows how a sequence can be created and then bound to the default value of a table:
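A sketch, using hypothetical names:

```sql
CREATE SEQUENCE author_ids START 1;

CREATE TABLE authors (
    id   INT4 DEFAULT nextval('author_ids'),
    name VARCHAR(40)
);
```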
CREATE TABLE
Usage
Description
CREATE TABLE is a comprehensive command that is used to enter a new table class into the current database.
In its most basic form, CREATE TABLE can simply be the listing of column names and data types. However,
specifying PRIMARY KEYS, DEFAULTS, and CONSTRAINTS can become increasingly more complex and requires
more explanation.
Using the TEMP or TEMPORARY keyword signifies to PostgreSQL that the table being created should exist only
for the length of the current session. Once the session is completed, the table will automatically be dropped
from the database.
The syntax of the CREATE TABLE command can be broken up according to column-level or table-wide directives.
Column-Level Commands
At the column level, you can specify many clauses that act to constrain the acceptable data that might be
inserted in that field. Use NULL or NOT NULL clauses to specify whether or not null values are permitted in a
column.
Also at the column level, the UNIQUE keyword can be used to mandate that all values in that column be unique.
In actuality, this is performed by PostgreSQL creating a unique index on the desired column. In addition to the
UNIQUE keyword, you can also specify that the current column is intended to be a primary key. A primary key
implies that values will be unique and non-null, but it also indicates that other tables might rely on this column
for referential integrity reasons.
By using the DEFAULT keyword, default values can be specified for a particular column. These include either
hard-coded defaults or the results of functions.
The CONSTRAINT clause can be used to define more advanced constraints than are possible through the NULL,
DEFAULT, and UNIQUE keywords. However, note that explicitly named constraints can have significant overlap
with the existing keywords present in the CREATE TABLE command. For instance, it is possible to designate a
column as non-null by using either of the two methods:
Or
Essentially, both methods are valid ways to ensure that non-null values are rejected from the column. However,
the CONSTRAINT clause offers many features that are more advanced. The full syntax of the columnar
CONSTRAINT command is as follows:
CONSTRAINT name
{
[ NULL | NOT NULL ] | UNIQUE | PRIMARY KEY | CHECK constraint |
REFERENCES reftable(refcolumn)[ MATCH mtype ][ ON DELETE delaction ] [ ON
UPDATE upaction ][ [NOT] DEFERRABLE ] [ INITIALLY chktime ]
}
[, …]
By using the CHECK constraint clause, it is possible to include a conditional expression that resolves to a
Boolean result. If the result returned is TRUE, then the CHECK constraint will pass.
The following is a more detailed list containing examples of valid column-level constraint clauses:
NOT NULL constraint
The NOT NULL constraint at the column level takes the following syntax:
UNIQUE constraint
The UNIQUE constraint at the column level takes the following syntax:
PRIMARY KEY constraint
The PRIMARY KEY constraint at the column level takes the following syntax:
CHECK constraint
The CHECK constraint evaluates a conditional expression, which returns a Boolean value. At the column level,
it takes the following syntax:
REFERENCES constraint
The REFERENCES keyword allows external columns to be bound to the current column for referential integrity
purposes. The general syntax of REFERENCES at the columnar level is as follows:
Table 1.1 shows the valid options that the REFERENCES command can take.
Option Explanation
MATCH FULL All columns in a multikey foreign reference must match (that is, all
columns must be non-null and so on).
ON DELETE SET DEFAULT Sets the column values to the default if referenced columns are deleted.
ON DELETE SET NULL Sets the column values to null if referenced columns are deleted.
ON UPDATE SET DEFAULT Sets the column value to the default if referenced columns are updated.
ON UPDATE SET NULL Sets the column value to null if referenced columns are updated.
ON UPDATE CASCADE Updates the current row if the referenced column is updated. If the
referenced row is updated but no changes are made to the referenced column, no changes are made.
Table-Level Commands
Many of the commands at the column level directly overlap commands issued at the table level. In most cases,
the syntax is the same, with the exception that table-level commands must also specify the column they are
acting upon. Commands issued at the columnar level will be implicitly bound to the current column.
Issuing a PRIMARY KEY is essentially the same as it is in the columnar specification. However, in this case, the
syntax is slightly different. Use the format …PRIMARY KEY(columnname)… instead of …columnname coltype
PRIMARY KEY… to specify a primary key at the table-level.
Additionally, the CONSTRAINT clause differs slightly at the table level. The following listing shows the table-level
CONSTRAINT clause:
CONSTRAINT name { PRIMARY KEY | UNIQUE } (columnname [, …])
[ CONSTRAINT name ] CHECK (constraint_clause)
[ CONSTRAINT name ] FOREIGN KEY (column [, …])
    REFERENCES reftable(refcolumn [, …])
    [ MATCH matchtype ]
    [ ON DELETE delaction ]
    [ ON UPDATE upaction ]
    [ [NOT] DEFERRABLE ] [ INITIALLY chktime ]
UNIQUE constraint
The UNIQUE constraint at the table level takes the following syntax:
PRIMARY KEY constraint
The PRIMARY KEY constraint at the table level takes the following syntax:
FOREIGN KEY constraint
The FOREIGN KEY constraint at the table level takes the following syntax:
REFERENCES constraint
The REFERENCES keyword enables external columns to be bound to the current column for referential integrity
purposes. The general syntax of REFERENCES at the table level is as follows:
For a listing of the valid options that the REFERENCES command can take for the table-level version, please refer
to Table 1.1.
Input(s)
columntype—What data type the column will hold. (For more information on data types, see Chapter 2,
"PostgreSQL Data Types.")
NOT NULL—Indicates that the column should not allow null values.
PRIMARY KEY—Indicates that all values for a column will be unique and non-null.
CHECK (conditional)—Used to signify that a conditional expression will be evaluated for the column or table
to determine whether an INSERT or UPDATE is permitted.
INHERITS (inheritable)—Specifies a table(s) from which the current table will inherit all of its fields.
Output(s)
ERROR: DEFAULT: type mismatch (Message returned if the default value type doesn't match the column data
type.)
Notes
Arrays can be specified as a valid columnar data type; however, consistent array dimensions are not enforced.
Up to the 7.0.X version of PostgreSQL, there was a compile-time limit of 8KB of data per row. By changing this
option and recompiling the source, a 32KB limit per row was possible. The newest release of PostgreSQL—Version
7.1—has introduced a new functionality dubbed TOAST (The Oversized-Attribute Storage Technique), which
promises virtually unlimited row-size limits.
Although it is possible to overlap columns with both the UNIQUE and PRIMARY KEY clauses, it is best not to
directly overlap indexes in such a way. Generally, there is a performance hit associated with overlapped indexes.
Ideally, the columns referenced by a REFERENCES or MATCH clause should carry UNIQUE or PRIMARY KEY
bindings. However, this is not enforced in PostgreSQL.
SQL-92 Compatibility
There are so many attributes that deal with the CREATE TABLE command and SQL-92 compatibility that it would
be more beneficial to talk about specific cases, as outlined in the following sections.
In PostgreSQL, temporary tables are only locally visible. SQL-92, however, also defines the idea of globally visible
temporary tables. Additionally, SQL-92 further defines global temporary tables with the ON COMMIT clause,
which can be used to delete table rows after a transaction is completed.
In the SQL-92 specification, the UNIQUE clause at both the table and column level can also have the following
options: INITIALLY DEFERRED, INITIALLY IMMEDIATE, DEFERRABLE, and NOT DEFERRABLE.
In the SQL-92 specification, the NOT NULL clause can also have the following options: INITIALLY DEFERRED,
INITIALLY IMMEDIATE, DEFERRABLE, and NOT DEFERRABLE.
In the SQL-92 specification, the CHECK clause can also have the following options: INITIALLY DEFERRED,
INITIALLY IMMEDIATE, DEFERRABLE, and NOT DEFERRABLE.
In the SQL-92 specification, the PRIMARY KEY clause can also have the following options: INITIALLY
DEFERRED, INITIALLY IMMEDIATE, DEFERRABLE, and NOT DEFERRABLE.
Examples
The following is a basic example of how the command is used to create the table authors. It creates four fields:
one primary key (bound to a sequence default value), two mandatory non-null fields, and one date field.
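A statement along these lines fits that description (the column names are hypothetical, and the sequence author_ids is assumed to exist):

```sql
CREATE TABLE authors (
    id    INT4 DEFAULT nextval('author_ids') PRIMARY KEY,
    name  VARCHAR(40) NOT NULL,
    title VARCHAR(60) NOT NULL,
    hired DATE
);
```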
The following code creates a temporary table that has a field that can hold a two-dimensional array:
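For instance, using hypothetical names:

```sql
CREATE TEMPORARY TABLE temp_grid (
    name VARCHAR(40),
    grid INT4[][]
);
```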
The following shows a basic example of how CREATE TABLE can be used to enforce data integrity by including
the use of the CHECK constraint. This example shows how a column constraint is used to mandate that an author
be older than 18:
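A sketch of such a column constraint:

```sql
CREATE TABLE authors (
    name VARCHAR(40),
    age  INT4 CHECK (age > 18)
);
```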
This example is similar to the preceding, except it shows how a table constraint can be used. Note how the table
constraint is based on two field conditions returning a Boolean true value:
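A sketch, with hypothetical age and payrate fields supplying the two conditions:

```sql
CREATE TABLE authors (
    name    VARCHAR(40),
    age     INT4,
    payrate INT4,
    CONSTRAINT author_con CHECK (age > 18 AND payrate > 0)
);
```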
This last example shows how tables can inherit fields from other tables. Additionally, it demonstrates how the
table-level PRIMARY KEY clause can be used to create multicolumn primary keys:
CREATE TABLE new_author
(
new_id INT,
new_name VARCHAR(40) NOT NULL,
CONSTRAINT multikey PRIMARY KEY(new_id, id)
)
INHERITS (author);
CREATE TABLE AS
Usage
Description
The CREATE TABLE AS command is functionally very similar to SELECT INTO; it enables the results of a query
to be used to populate a new table.
If the columnname clause is left out, the columns of the new table will take their names from the output columns of the query.
Input(s)
select_criteria—The SELECT statement that will be used to generate the table data.
Output(s)
Notes
The user who executes this command will own the resulting table. Likewise, users must have permissions to
create tables and be able to select data from the tables.
SQL-92 Compatibility
This command is a PostgreSQL extension; there is no CREATE TABLE AS specified in the SQL-92 specification.
Example
The following example shows how to create a table called tmp_authors from the existing table authors only
where the author is older than 40:
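Assuming authors has an age column, the statement would be:

```sql
CREATE TABLE tmp_authors AS
    SELECT * FROM authors WHERE age > 40;
```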
CREATE TRIGGER
Usage
CREATE TRIGGER trigname { BEFORE | AFTER } {event [OR …] }
ON table
FOR EACH { ROW | STATEMENT }
EXECUTE PROCEDURE func(args)
Description
CREATE TRIGGER specifies that an action is to be bound to a particular table-related event. This concept is close
to the idea of RULES in many ways, but each is better suited for different uses. TRIGGERS are most commonly
used for maintaining referential integrity, either before or after a table event has occurred. RULES are more often
used to perform cascading SQL commands while an event is in progress.
The CREATE TRIGGER command specifies when to fire a trigger (that is, BEFORE or AFTER) and what event will
trigger it (that is, INSERT, UPDATE, or DELETE). Finally, the user-specified function fires when these conditions
are met.
If a trigger is set to fire before an event, it is possible for the trigger to change (or ignore) the data before it is
inserted. Likewise, if the trigger is set to fire after an event, all of the changes made—including deletions, inserts,
and updates—are visible to the trigger.
Input(s)
func(args)—The function and arguments to fire when event conditions are met.
Output(s)
Notes
The creator of the trigger must also have sufficient rights to the relations in question. Currently, STATEMENT
triggers are not implemented.
SQL-92 Compatibility
SQL-92 does not contain a CREATE TRIGGER statement. This is an extension by PostgreSQL.
Examples
This example uses a function named state_check() to verify that newly inserted state names are greater than
three characters in length.
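A sketch of the trigger definition, assuming state_check() is already registered (for example, as a C or PL/pgSQL function returning opaque) and that a states table exists:

```sql
CREATE TRIGGER state_trig
    BEFORE INSERT OR UPDATE ON states
    FOR EACH ROW
    EXECUTE PROCEDURE state_check();
```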
Now you can test the trigger by inserting some test data:
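For instance, assuming a single name column in states:

```sql
INSERT INTO states VALUES ('Texas');  -- passes the length check
INSERT INTO states VALUES ('TX');     -- should be rejected by state_check()
```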
CREATE TYPE
Usage
Description
PostgreSQL includes a number of built-in data types; however, users can register their own by using the CREATE
TYPE command.
CREATE TYPE requires that two functions (in_function and out_function) exist before a new type can be
defined. The in_function is used to convert the data to an internal data type so that it can be used by the
operators and functions defined for that type. Likewise, out_function converts the data back to its external
representation.
Newly created data types can be either fixed or variable in length. Fixed-length types must have their size
explicitly specified during the definition of the new data type. If the VARIABLE keyword is used instead,
PostgreSQL will assume that the data type is stored like a TEXT type and is therefore variable in length.
The ELEMENT and DELIMITER keywords are used when specifying a new data type that is an array. The
ELEMENT keyword specifies the data type of the elements in an array, and the DELIMITER keyword is used to
denote what delimiter is used to separate array elements.
When an external computer will be making use of the newly created data type, it is then necessary to specify
send_function and rec_function. These functions are used to convert the data to and from a format that is
conducive for the external system. If these functions are not specified, it is assumed that the internal data-type
format is acceptable on all machine architectures.
Use the PASSEDBYVALUE keyword to specify to PostgreSQL that operators and functions that make use of the
new data type should be explicitly passed the value—instead of the reference.
Input(s)
typename—The name of the newly created data type.
in_function—The function used to convert from the external representation of a data type to the internal data
type.
out_function—The function used to convert from an internal data type to the external representation.
in_length—Either a literal value or the keyword VARIABLE used to specify the internal length of the data type.
ext_length—Either a literal value or the keyword VARIABLE used to specify the external length of the data
type.
element—If the newly created type is an array, this specifies the type of elements in that array.
delimiter—If the newly created type is an array, this indicates what delimiter appears between the array
elements. The default is a comma (,).
send_function—Specifies the function to convert data to a form for use by an external machine.
rec_function—Specifies the function to convert data from a form for use by an external machine to the format
needed by the local machine.
PASSEDBYVALUE—This variable, if present, indicates that functions or operators using the new data type should
be passed arguments by value instead of by reference.
Output(s)
Notes
The name specified for the data type must be unique, must be fewer than 31 characters in length, and cannot
begin with an underscore (_).
The in_function and out_function must both be defined to accept either one or two arguments of type
opaque.
You cannot use PASSEDBYVALUE to pass values whose internal representation is greater than 4 bytes.
SQL-92 Compatibility
SQL-92 does not specify CREATE TYPE; however, it is defined in the SQL3 proposal.
Example
The following example creates a data type called deweydec, which will be used to hold Dewey decimal numbers.
This example assumes that the functions dewey_in and dewey_out have previously been defined.
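A sketch of the statement, using the PostgreSQL 7.x CREATE TYPE syntax; the internal length of 16 bytes is an assumption:

```sql
-- dewey_in and dewey_out must already exist as functions taking
-- and returning type opaque; the internal length of 16 is assumed.
CREATE TYPE deweydec (
    internallength = 16,
    input          = dewey_in,
    output         = dewey_out
);
```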
CREATE USER
Usage
CREATE USER username
[ WITH [ SYSID uid] [ PASSWORD password]]
[ CREATEDB | NOCREATEDB ]
[ CREATEUSER | NOCREATEUSER ]
[ IN GROUP groupname [, …] ]
[ VALID UNTIL abstime]
Description
The CREATE USER command adds a new user to the current PostgreSQL database. The only required variable is
the name of the new user, which must be unique. By default, PostgreSQL will assign the user the next user
identification number (UID); however, it can be specified by including the WITH SYSID clause.
Additionally, a new user can be placed in an existing group by specifying the IN GROUP clause. Certain user
rights can also be assigned at creation time: the CREATEUSER clause gives the new user permission to create
users of their own, and the CREATEDB option gives permission to create databases.
PostgreSQL enables usernames to be set to automatically expire at a given time. By using the VALID UNTIL
clause, an absolute time can be specified that sets the expiration time.
Input(s)
abstime—If present, specifies the absolute time at which the new username is set to expire. Otherwise, the
username is valid forever.
Output(s)
Notes
The creator must have sufficient permissions to execute the CREATE USER command. Additionally, creators
become owners of the objects created with the CREATE USER command.
SQL-92 Compatibility
Examples
Create a new user with a specified password in the current database. Assign the user to the group managers:
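A sketch of such a statement; the username and password are illustrative:

```sql
-- The username and password are illustrative.
CREATE USER bill WITH PASSWORD 'secret' IN GROUP managers;
```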
CREATE VIEW
Usage
Description
Views are useful as a method to implement commonly used queries. Instead of using the full query each time it
is needed, you can define it as a view and reuse it with a much simpler syntax each time it's needed.
Input(s)
selectquery—The SQL query that provides the columns and row specifications for the newly created view.
Output(s)
ERROR: Relation 'viewname' already exists (Message returned if the specified view name is already in
use.)
NOTICE create: attribute name 'column' has an unknown type (Message returned if an explicit
query does not define the data type of the static variable; see the "Examples" section.)
Notes
Views are sometimes referred to as virtual tables; however, thinking of them as macro substitutions is probably
more conceptually correct.
SQL-92 Compatibility
SQL-92 specifies that VIEWS are to be updateable. Currently, PostgreSQL views are read-only.
Examples
Create a view of the books table, where only those books that are fiction are returned:
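A sketch, assuming the books table has a category column that marks fiction titles:

```sql
-- Assumed: a books table with a category column.
CREATE VIEW fictionbooks AS
    SELECT * FROM books WHERE category = 'fiction';
```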
DECLARE
Usage
Description
The DECLARE statement enables a user to create a cursor to store and navigate a query result. By default,
PostgreSQL returns data in a text format; however, data can also be returned in a binary format by including the
BINARY keyword.
Returning the data as binary information requires that the calling application be able to convert and manipulate it
(the standard psql front end cannot handle binary data). However, returning data in binary format has specific
advantages: it usually requires less work from the server and results in a smaller data transfer.
Input(s)
selectquery—A SQL query that defines what row and column selections to use for the cursor to be created.
READ ONLY—A keyword that denotes that the cursor is read-only. PostgreSQL, at this time, only generates read-
only cursors. This word is ignored by PostgreSQL.
UPDATE—A keyword that denotes that the cursor should be updateable. PostgreSQL produces only read-only
cursors; this keyword is ignored.
column—For use with the UPDATE keyword. PostgreSQL ignores this word at this time.
Output(s)
NOTICE: Named portals may only be used in begin / end transaction blocks (Message
returned if the cursor is not declared in the transaction block.)
Notes
PostgreSQL does return architecture-specific binary data. Therefore, there can be issues related to big-endian or
little-endian byte ordering. However, all text returns are architecture neutral.
SQL-92 Compatibility
The INSENSITIVE, SCROLL, READ ONLY, UPDATE, and column keywords are reserved for future SQL-92
compatibility. At this time, PostgreSQL creates only read-only cursors.
SQL-92 only allows cursors to be in embedded SQL commands or in modules. PostgreSQL, however, also allows
cursors to exist in interactive methods.
SQL-92 specifies that cursors are to be opened with the OPEN command. PostgreSQL assumes that cursors are
considered open upon declaration. However, ecpg (embedded SQL preprocessor for Postgres) supports the OPEN
command to be in compliance with the SQL-92 specification.
The BINARY keyword is a PostgreSQL extension; no such keyword exists in the SQL-92 specification.
Example
This example creates a cursor for use with the authors table:
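A sketch of such a declaration, using the cursor and table names that appear elsewhere in this chapter; note that the cursor must be declared inside a transaction block:

```sql
BEGIN;
DECLARE mycursor CURSOR FOR SELECT * FROM authors;
-- FETCH and MOVE commands operate on mycursor here.
CLOSE mycursor;
COMMIT;
```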
DELETE
Usage
Description
DELETE is used to remove all or certain rows from a table. Use the WHERE condition to specify which rows are to
be deleted.
Input(s)
Output(s)
DELETE count (Message returned if successful with the number of rows deleted.)
Notes
The user deleting the rows must have permissions to the table in question as well as to any tables present in the
WHERE condition.
Using DELETE without a WHERE condition results in all rows being deleted. Although not part of the SQL-92
specification, TRUNCATE performs this same function much more efficiently.
SQL-92 Compatibility
DELETE is SQL-92 compatible. However, SQL-92 also allows deleting through a cursor; cursors in PostgreSQL
are read-only.
Examples
Delete all the rows from the table where the author's salary is less than $10,000:
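A sketch; the salary column on the authors table is an assumption:

```sql
-- Assumed: a salary column on the authors table.
DELETE FROM authors WHERE salary < 10000;
```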
DROP AGGREGATE
Usage
Description
The DROP AGGREGATE command deletes all references to the aggregate named from the current database.
Input(s)
Output(s)
NOTICE: RemoveAggregate: aggregate 'agg' for 'type' does not exist (Message returned if the
aggregate does not exist in the current database.)
Notes
SQL-92 Compatibility
There is no CREATE or DROP AGGREGATE in the SQL-92 specification. This is a PostgreSQL extension.
Example
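No listing survives here. The command takes the aggregate name followed by its input type, so a hypothetical aggregate my_avg over int4 would be removed with:

```sql
-- The aggregate name is hypothetical.
DROP AGGREGATE my_avg int4;
```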
DROP DATABASE
Usage
Description
The DROP DATABASE command deletes the database and all related data named.
Input(s)
Output(s)
ERROR: user 'username' is not allowed to create/drop databases (Message returned if the user
does not have sufficient rights to drop a database.)
ERROR: dropdb: cannot be executed on the template database (Message returned if the user
attempts to drop the template database.)
ERROR: dropdb: cannot be executed on an open database (Message returned if the command is
attempted on a database that is currently open.)
ERROR: dropdb: database 'name' does not exist (Message returned if the specified database name
cannot be found.)
Notes
You cannot issue a DROP DATABASE command on the current database. Usually, the command is performed
while connected to another database or from the command line with the dropdb command.
Due to the need to physically delete files, the DROP DATABASE command cannot take place inside of a
transaction. Usually a DROP command only modifies the system catalogs; therefore, they can be rolled back.
Because a ROLLBACK cannot recover deleted file system objects, this command must be issued as an atomic
entity and not be embedded in an explicit BEGIN…COMMIT clause.
The user must own the database or have superuser permissions to execute the DROP DATABASE command.
SQL-92 Compatibility
The SQL-92 specification does not define a method for the DROP DATABASE command.
Example
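A sketch; the database name is illustrative:

```sql
-- The database name is illustrative; run while connected to a
-- different database.
DROP DATABASE payroll;
```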
DROP FUNCTION
Usage
Description
Removes the function specified from the current database. PostgreSQL allows functions to be overloaded;
therefore, the optional type keyword allows for PostgreSQL to discriminate between similar function names.
Input(s)
funcname—The name of the function to delete.
Output(s)
NOTICE: RemoveFunction: Function "name" ("types") does not exist: (Message returned if the
function name or data type is not valid.)
Notes
The user must own the function to be dropped or have superuser rights to the database.
SQL-92 Compatibility
DROP FUNCTION is a PostgreSQL language extension; the SQL-92 specification does not define it.
Example
This example drops the function called last_check from the current database:
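A sketch, assuming last_check takes no arguments:

```sql
-- Assuming last_check takes no arguments.
DROP FUNCTION last_check ();
```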
DROP GROUP
Usage
Description
Input(s)
Output(s)
Notes
The DROP GROUP command does not remove the users that make up the group from the database.
SQL-92 Compatibility
Example
The following example deletes the group managers from the current database:
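The statement described is:

```sql
DROP GROUP managers;
```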
DROP INDEX
Usage
Description
The DROP INDEX command removes an index from the current database.
Input(s)
Output(s)
ERROR: index 'index_name' nonexistent (Message returned if the index name does not exist.)
Notes
To execute this command, the user must own or have superuser rights to the index.
SQL-92 Compatibility
SQL-92 leaves the concept of indexes up to the specific implementation. Therefore, DROP INDEX is a PostgreSQL
implementation.
Example
This example removes the index named checknumber from the current database:
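The statement described is:

```sql
DROP INDEX checknumber;
```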
DROP LANGUAGE
Usage
Description
The DROP LANGUAGE command is used to remove a user-defined language from the current database.
Input(s)
Output(s)
ERROR: Language 'name' doesn't exist (Message returned if the language name specified cannot be
found.)
Notes
Warning: PostgreSQL does not do any checks to see if functions depend on the language to be dropped.
Consequently, it is possible to remove a language that is still needed by the system.
To execute the DROP LANGUAGE command, the user needs to own the object or have superuser access to the
database.
SQL-92 Compatibility
Example
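No listing survives here. For a hypothetical procedural language named plsample, the command would be:

```sql
-- The language name is hypothetical.
DROP PROCEDURAL LANGUAGE 'plsample';
```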
DROP OPERATOR
Usage
Description
This command is used to remove an existing operator from the current database. Because operators can be unary
as well as binary, the type keywords identify the left and right operand types; for a unary operator, the NONE
keyword takes the place of the missing operand.
Input(s)
Output(s)
ERROR: RemoveOperator: binary operator 'oper' taking type 'type' and 'type2' does not
exist (Message returned if the specified operator does not exist in the current database.)
ERROR: RemoveOperator: left unary operator 'oper' taking type 'type' does not exist
(Message returned if the left unary operator specified does not exist.)
ERROR: RemoveOperator: right unary operator 'oper' taking type 'type' does not exist
(Message returned if the right unary operator specified does not exist.)
Notes
The DROP OPERATOR command does not check for dependencies that rely on the operator to be dropped.
Therefore, it is the user's responsibility to ensure that all dependencies will continue to be satisfied after the
operation is completed.
SQL-92 Compatibility
Examples
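No listings survive here. Sketches for a hypothetical binary operator and a hypothetical unary operator:

```sql
-- A hypothetical binary operator ^ defined for (int4, int4):
DROP OPERATOR ^ (int4, int4);

-- For a unary operator, NONE marks the missing operand:
DROP OPERATOR ~ (none, int4);
```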
DROP RULE
Usage
Description
The DROP RULE command removes a specific rule designation from the current database. Once removed,
PostgreSQL will immediately cease applying the rule actions to all event triggers.
Input(s)
Output(s)
ERROR: RewriteGetRuleEventRel: rule 'name' not found (Message returned if PostgreSQL cannot
find the rule name specified.)
Notes
The user of this command must either own the rule or have superuser access to the current database in order to
execute the DROP RULE command.
SQL-92 Compatibility
The DROP RULE command is a PostgreSQL extension; there is no specification for this command in SQL-92.
Example
This example drops the rule called del_author from the database.
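The statement described is:

```sql
DROP RULE del_author;
```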
DROP SEQUENCE
Usage
Description
The DROP SEQUENCE command removes the named sequence from the current database. PostgreSQL actually
uses a table to hold the current value of the sequence, so in effect, DROP SEQUENCE works like a specific DROP
TABLE command.
Input(s)
Output(s)
NOTICE: Relation 'name' does not exist. (Message returned if PostgreSQL cannot find the sequence
name specified.)
Notes
PostgreSQL does not do any dependency checking on dropped sequences. Therefore, it is the user's responsibility
to ensure that nothing depends on the sequence before issuing a DROP SEQUENCE command.
The user of this command must either own the sequence named or have superuser rights to the database.
SQL-92 Compatibility
DROP SEQUENCE is a PostgreSQL extension; there is no equivalent command in the SQL-92 specification.
Example
This example removes the sequence named check_numb_seq from the database:
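The statement described is:

```sql
DROP SEQUENCE check_numb_seq;
```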
DROP TABLE
Usage
Description
The DROP TABLE command removes the table named, related indexes, and any associated views from the
current database.
Input(s)
name—The name of the table to remove.
Output(s)
ERROR: Relation 'name' Does Not Exist! (Message returned if the table name cannot be located in the
current database.)
Notes
PostgreSQL does not check or warn for FOREIGN KEY relationships that could be affected by executing the DROP
TABLE command; therefore, it is the user's responsibility to ensure that other relations will not be affected by
the command.
Due to the need to physically delete files, the DROP TABLE command cannot take place inside of a transaction.
Usually a DROP command modifies only the system catalogs; therefore, they can be rolled back. Because a
ROLLBACK cannot recover deleted file system objects, this command must be issued as an atomic entity and not
be embedded in an explicit BEGIN…COMMIT clause.
The user of this command must own the table and associated objects or have superuser rights to the current
database.
SQL-92 Compatibility
DROP TABLE is mostly SQL-92 compliant. However, the SQL-92 specification also includes the keywords
RESTRICT and CASCADE, which limit or cascade the removal of a table to other referenced objects. At this
time, PostgreSQL does not support these keywords.
Examples
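No listing survives here. A sketch; the table name is illustrative:

```sql
-- The table name is illustrative; related indexes and views
-- are removed as well.
DROP TABLE old_authors;
```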
DROP TRIGGER
Usage
Description
The DROP TRIGGER command will remove the trigger specified from the current database.
Input(s)
Output(s)
DROP (Message returned if the command was executed successfully.)
ERROR: Drop Trigger: there is no trigger 'name' on relation 'table': (Message returned if
PostgreSQL cannot locate the trigger name specified.)
Notes
The user of this command must either own the object or have superuser access to the current database.
SQL-92 Compatibility
There is no DROP TRIGGER definition in the SQL-92 specification. This is a PostgreSQL language extension.
Example
This example removes the trigger state_checktrigger from the payroll table:
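The statement described is:

```sql
DROP TRIGGER state_checktrigger ON payroll;
```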
DROP TYPE
Usage
Description
The DROP TYPE command is used to remove the type specified from the current database.
Input(s)
Output(s)
ERROR: RemoveType: type 'name' does not exist (Message returned if PostgreSQL cannot locate the
name specified.)
Notes
The user must own the objects or have superuser access to the type of object that is to be removed.
PostgreSQL does not do any dependency checking on the removal of TYPE objects. Therefore, it is the user's
responsibility to ensure that any operators, functions, aggregates, or other objects that depend on the data type
will not be left in an inconsistent state as a result of the removal of that data type.
SQL-92 Compatibility
SQL-92 does not specify a DROP TYPE command; however, it is part of the SQL3 specification.
Example
To remove the data type int4 from the database:
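The statement in question is simply:

```sql
-- Shown for illustration only; DO NOT execute this command.
DROP TYPE int4;
```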
Warning
This action could be very dangerous; the int4 object is an important part of the PostgreSQL system.
This example is provided for sample purposes only—DO NOT EXECUTE IT! Removing the int4 object
will result in serious corruption of your database.
DROP USER
Usage
Description
The DROP USER command is used to remove a user from the current database.
Input(s)
Output(s)
ERROR: DROP USER: user 'name' does not exist (Message returned if the username specified cannot
be found.)
DROP USER: user 'name' owns database 'name' (Message returned if the user who attempted to drop a
database owns any database.)
Notes
PostgreSQL will not allow a user who owns a database to be dropped. However, PostgreSQL does not do a
dependency check for objects owned by the user. Therefore, it is the user's responsibility to ensure that other
database objects will not be left in an inconsistent state after the DROP USER command is completed.
SQL-92 Compatibility
The DROP USER command is a PostgreSQL language extension. There is no SQL-92 command for DROP USER.
Example
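No listing survives here. A sketch; the username is illustrative:

```sql
-- The username is illustrative.
DROP USER bill;
```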
DROP VIEW
Usage
Description
The DROP VIEW command removes the view specified from the current database.
Input(s)
Output(s)
ERROR: RewriteGetRuleEventRel: rule '_RETname' not found (Message returned if the view named
does not exist in the current database.)
Notes
SQL-92 Compatibility
The SQL-92 specification defines some additional features for the DROP VIEW command: RESTRICT and
CASCADE. These keywords determine whether items that reference the view in question are also dropped. By
default, PostgreSQL only deletes the view explicitly named.
It is the user's responsibility to ensure that other database objects will not be left in an inconsistent state after
the DROP VIEW command is completed.
Example
The following command removes the view fictionbooks from the current database:
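The statement described is:

```sql
DROP VIEW fictionbooks;
```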
END
Usage
Description
The explicit BEGIN…END clauses are used to encapsulate a series of SQL commands to ensure proper execution.
If any of the commands in the series fail, it can cause the entire transaction to roll back, bringing the database
back to its original state.
By default, all commands issued in PostgreSQL are performed in an implicit transaction. The END keyword is
equivalent to COMMIT.
Input(s)
Output(s)
Notes
Generally, it is best to use the COMMIT keyword instead, thereby maintaining SQL-92 compatibility.
See ABORT, BEGIN, and ROLLBACK for more information regarding transactions.
SQL-92 Compatibility
The END keyword is a PostgreSQL extension. It is equivalent to the SQL-92 keyword COMMIT.
Example
This example shows how END can be used to terminate a PostgreSQL transaction:
BEGIN;
END;
EXPLAIN
Usage
Description
The EXPLAIN command is used to profile and trace how queries are being executed. It gives insight into how the
PostgreSQL planner generates an execution plan for the supplied query. It also displays what indexes will be used
and what join algorithms it will employ.
The output of the EXPLAIN command generates the startup time before the first tuple can be returned, the total
time for all tuples, and what type of scan is being used (that is, sequential, index, and so on).
The VERBOSE argument will cause EXPLAIN to dump the full internal representation of the plan tree instead of
just the summary. This option is typically used for performance tuning and advanced debugging scenarios.
Input(s)
VERBOSE—Optional keyword that will produce the full execution plan and all internal states. Useful mostly for
debugging.
Output(s)
NOTICE: QUERY PLAN: plan (Message returned along with the execution plan.)
Notes
See Chapter 10, "Common Administrative Tasks," and its section titled "Performance Tuning" for more
information on query optimization.
SQL-92 Compatibility
Examples
This example supposes that the authors table has a single field of an int4 data type and 1,000 rows of data.
Additionally, this example assumes that the authors table has no index set:
The following example includes the addition of a WHERE constraint and an index on the single field in the
authors table. Notice the improvement in total time and the fact that only one row is returned:
This final example includes the use of a sum() aggregate added to the preceding example. Notice how the start
time for the aggregate is .38, which is also the total time for the index scan. This, of course, is because an
aggregate cannot function until data is provided to it.
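The plan listings themselves are not reproduced here, and the timing figures PostgreSQL prints vary by system. Sketches of the statements behind the three scenarios, with the table's single column assumed to be named id:

```sql
-- Scenario 1: sequential scan, no index.
EXPLAIN SELECT * FROM authors;

-- Scenario 2: add an index and a WHERE constraint.
CREATE INDEX authors_id_idx ON authors (id);
EXPLAIN SELECT * FROM authors WHERE id = 500;

-- Scenario 3: a sum() aggregate on top of the index scan.
EXPLAIN SELECT sum(id) FROM authors WHERE id = 500;
```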
FETCH
Usage
Description
The FETCH command retrieves rows from a defined cursor. The cursor should have previously been defined in a
DECLARE statement.
The number of rows to retrieve can either be specified by a signed integer or be one of the following: ALL, NEXT,
or PRIOR.
In addition to the number of rows to retrieve, the direction of the next retrieval can also be specified. By default,
PostgreSQL searches in a FORWARD direction. However, by using a signed integer, the resulting direction of the
search can be changed from what is specified by the keywords alone. For instance, FORWARD -1 is functionally
the same as BACKWARD 1.
Input(s)
number—A signed integer to indicate the number of rows to retrieve in the specified direction.
NEXT—Retrieve the next single row in the specified direction (for example, equivalent to using 1 count).
PRIOR—Retrieve the previous single row from the specified direction (for example, equivalent to using the -1
count).
Output(s)
NOTICE: PerformPortalFetch: portal 'cursor' not found (Message returned if the cursor specified
has not been declared.)
NOTICE: FETCH/ABSOLUTE not supported, using RELATIVE (Message returned because PostgreSQL
does not support absolute positioning in cursors.)
ERROR: FETCH/RELATIVE at current position is not supported (Message returned if the user tried
to execute a FETCH RELATIVE 0 command. This command, although valid in SQL-92, is not supported in
PostgreSQL.)
Notes
By using a signed integer with a directional statement, search directions can be reversed. For instance, the
following commands are all functionally identical:
FETCH FORWARD 1 IN mycursor
FETCH FORWARD NEXT IN mycursor
FETCH BACKWARD PRIOR IN mycursor
FETCH BACKWARD -1 IN mycursor
PostgreSQL currently supports read-only cursors but not updateable cursors. Therefore, updates must be entered
explicitly and cannot take place in a cursor.
Use the MOVE command to navigate through a cursor without retrieving rows of data.
SQL-92 Compatibility
PostgreSQL allows cursors to be used outside of embedded SQL, which is an extension to the SQL-92
specification.
Additionally, SQL-92 declared some additional features for the FETCH command. Absolute cursor positioning
through the ABSOLUTE command and storing results in variables through the INTO command were both defined
in SQL-92 but do not exist in PostgreSQL.
Example
This example shows a cursor created from the authors table and then FETCH being used to retrieve specific
rows:
BEGIN;
DECLARE mycursor CURSOR FOR SELECT * FROM authors;
FETCH FORWARD 3 IN mycursor;
CLOSE mycursor;
COMMIT;
GRANT
Usage
Description
The GRANT command is used to assign specific privileges to groups, users, or the public at large. By default,
creators of an object get all privileges assigned to them for that object. Users other than the creator need to be
given explicit rights, or belong to a group that inherits such rights, to access an object.
Tables
Views
Sequences
Input(s)
Output(s)
ERROR: ChangeAcl: class 'object' not found: (Message returned if the object specified cannot be
located to assign permission to.)
Notes
To grant access to only a specific column, the following procedure must be carried out:
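The procedure itself is not reproduced here. In PostgreSQL of this vintage, which has no column-level GRANT, the usual technique (sketched below with assumed view, column, and user names) is to wrap the column in a view and grant privileges on the view rather than on the table:

```sql
-- The view, column, and user names are assumptions.
CREATE VIEW authors_names AS SELECT name FROM authors;
GRANT SELECT ON authors_names TO bill;
```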
See the REVOKE command for information on how to remove permissions assigned with GRANT.
SQL-92 Compatibility
SQL-92 defines some additional settings for the GRANT command. Specifically, it allows privileges to be set down
to the column level. Additionally, the SQL-92 specification includes the following:
Privileges
References
Usage
Objects
Character Set
Collation
Translation
Domain
Examples
The following example gives the user bill certain rights to the authors table:
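A sketch; the exact privileges granted are assumptions:

```sql
-- The particular privileges shown are assumptions.
GRANT SELECT, INSERT, UPDATE ON authors TO bill;
```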
INSERT
Usage
Description
The INSERT command is used to append new rows to a table. Additionally, by using a SELECT query, numerous
rows can be appended simultaneously.
The particular columns can be specified during an insert, or if not included, PostgreSQL will attempt to insert the
default value for that column.
If an attempt is made to insert the wrong data type into a column, PostgreSQL will automatically try to convert
the data to the correct data type.
Input(s)
Output(s)
INSERT oid 1 (Message returned if one row was inserted along with the OID of that object.)
INSERT 0 number (Message returned if multiple rows were inserted; includes the number of rows inserted.)
Notes
The user executing this command must have insert privileges to the table specified.
SQL-92 Compatibility
Examples
This example shows a basic use for the INSERT command. Data is inserted into the three-column table
authors:
This example shows how the SELECT command is used in conjunction with the INSERT command. Notice how
the columns returned from the SELECT statement should match the columns specified in the INSERT command.
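Sketches of both statements; the column names and the new_authors source table are assumptions:

```sql
-- Basic insert into the three-column authors table
-- (column names assumed):
INSERT INTO authors (name, state, salary)
    VALUES ('Mark Twain', 'MO', 15000);

-- Appending multiple rows with a SELECT; the columns returned
-- must match the columns specified:
INSERT INTO authors (name, state, salary)
    SELECT name, state, salary FROM new_authors;
```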
LISTEN
Usage
LISTEN name
Description
The LISTEN command is used in conjunction with NOTIFY. LISTEN registers a name to the PostgreSQL back
end and listens for a notification from a NOTIFY command.
Multiple clients can all listen on the same LISTEN name. When a notification comes for that name, all clients will
be notified.
Input(s)
Output(s)
NOTICE: Async_Listen: We are already listening on 'name' (Message returned if the back end
already has that LISTEN name registered.)
Notes
SQL-92 Compatibility
Example
This example registers a name with the LISTEN command and then sends a notification:
LISTEN IAmWaiting;
NOTIFY IAmWaiting;
LOAD
Usage
LOAD filename
Description
The LOAD command is used to load an object file (a .o from a C-compiled file) for use by PostgreSQL. After the
file has been loaded, all functions contained therein will be available for use.
Alternatively, if no LOAD command is explicitly given, PostgreSQL will automatically load the necessary object file
once the function is called.
If the code in an object file has changed, the LOAD command can be issued to refresh PostgreSQL and make
those changes visible.
Input(s)
Output(s)
ERROR: LOAD: could not open file 'name' (Message returned if the filename specified could not be
found.)
Notes
The object file must be reachable from the PostgreSQL back end; therefore, the user needs to take into account
pathnames and permissions before specifying the file.
Care should be taken in designing object files to prevent errors. Functions in a user-defined object file should not
call other user-defined object files. Ideally, all function calls should exist in the same object file or be linked to
one of the standard C, math, or PostgreSQL library files.
SQL-92 Compatibility
The SQL-92 specification does not define a LOAD command; this is a PostgreSQL extension.
Example
LOAD '/home/bill/myfile.o'
LOCK
Usage
LOCK [ TABLE ] name
Or
LOCK [ TABLE ] name IN [ ROW | ACCESS ] { SHARE | EXCLUSIVE } MODE
Or
LOCK [ TABLE ] name IN SHARE ROW EXCLUSIVE MODE
Description
The LOCK TABLE command is used to control simultaneous access to the specified table. PostgreSQL, by default,
automatically handles many table-locking scenarios. However, there are cases when the capability to specify is
helpful.
EXCLUSIVE—Prevents any other lock type from being granted on the table for the duration of the
transaction.
SHARE—Allows others to share the lock as well, but prevents exclusive locks for the duration of the
transaction.
The following table lists the common lock modes, their typical uses, and what conflicts they produce with other
lock modes:

Lock Mode            Typical Use                  Conflicts With
ACCESS SHARE         SELECT (any table query)     ACCESS EXCLUSIVE
                     (This is the least
                     restrictive lock.)
ROW SHARE            SELECT…FOR UPDATE            EXCLUSIVE, ACCESS EXCLUSIVE
ROW EXCLUSIVE        INSERT, UPDATE, DELETE       SHARE, SHARE ROW EXCLUSIVE,
                                                  EXCLUSIVE, ACCESS EXCLUSIVE
SHARE                CREATE INDEX                 ROW EXCLUSIVE, SHARE ROW EXCLUSIVE,
                                                  EXCLUSIVE, ACCESS EXCLUSIVE
SHARE ROW EXCLUSIVE  explicit LOCK TABLE          ROW EXCLUSIVE, SHARE, SHARE ROW
                                                  EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
EXCLUSIVE            explicit LOCK TABLE          ROW SHARE, ROW EXCLUSIVE, SHARE,
                                                  SHARE ROW EXCLUSIVE, EXCLUSIVE,
                                                  ACCESS EXCLUSIVE
ACCESS EXCLUSIVE     LOCK TABLE, ALTER TABLE,     all other lock modes (This is the
                     DROP TABLE, VACUUM           most restrictive lock.)
Input(s)
SHARE ROW EXCLUSIVE MODE—Like an EXCLUSIVE lock, but it allows ROW SHARE locks to be taken by others.
Output(s)
ERROR 'tablename': Table does not exist (Message returned if the LOCK command could not locate
the table specified.)
Notes
To prevent deadlocks (pauses that occur when two transactions each wait for the other to complete), it is
important for transactions to acquire locks on objects in the same order. For instance, if a transaction updates
Row 1 and then Row 2, then a separate transaction should also update Row 1 and then Row 2 in that order and
not vice versa.
Additionally, if multiple locks are involved from a single transaction, the most restrictive lock should be used.
PostgreSQL will detect deadlocks and roll back at least one of the waiting transactions to resolve it.
Most LOCK modes (except ACCESS SHARE/EXCLUSIVE) are compatible with Oracle's LOCK modes.
SQL-92 Compatibility
The SQL-92 specification uses the SET TRANSACTION clause to specify concurrent table access, which is
supported by PostgreSQL. (See the SET command.)
Example
This example locks the entire table authors to prevent any other access while the updates occur:
BEGIN;
LOCK TABLE authors;
UPDATE authors SET status='active';
COMMIT;
MOVE
Usage
Description
The MOVE command enables a user to navigate through a cursor without actually retrieving any of the data. It
works somewhat like the FETCH command, except it only positions the cursor.
Input(s)
count—Either a signed integer or the keyword NEXT or PRIOR; these specify how many rows to move from the
current position.
cursorname—The name of the cursor to move through; it should already have been defined with a DECLARE
statement.
Output(s)
Notes
By using a signed integer with a directional statement, movement directions can be reversed. For instance, the
following commands are all functionally identical:
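The list itself is missing here; by analogy with the FETCH entry, it would presumably read:

```sql
MOVE FORWARD 1 IN mycursor;
MOVE FORWARD NEXT IN mycursor;
MOVE BACKWARD PRIOR IN mycursor;
MOVE BACKWARD -1 IN mycursor;
```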
MOVE works in a very similar fashion to FETCH. Refer to FETCH for more information.
SQL-92 Compatibility
SQL-92 does not specify a MOVE command. However, it is possible to FETCH rows starting from a defined
position. The effect is an implied MOVE to that defined position.
Example
The following example defines a cursor mycursor and then navigates through it, using the MOVE command to
retrieve specific rows:
BEGIN;
DECLARE mycursor CURSOR FOR SELECT * FROM authors;
MOVE FORWARD 3 IN mycursor;
FETCH NEXT IN mycursor;
COMMIT;
NOTIFY
Usage
NOTIFY name
Description
The NOTIFY command is used in conjunction with LISTEN to send a notification message to clients who have
registered a name to listen on. It provides a way to implement a basic messaging system between client and
server processes. A typical use might be to inform client applications that a specific table has changed, prompting
the client applications to redisplay their data.
The information passed to the client application includes the notification name and the PID of the back-end
process.
Input(s)
Output(s)
Notes
NOTIFY events are actually executed inside a PostgreSQL transaction; therefore, this has some important
implications.
First, notifications will not be sent until the entire transaction is committed. Particularly, this is relevant if the
NOTIFY is part of a RULE or TRIGGER associated with a table. The notification will not be sent until the
entire transaction involving that table has completed.
Second, if a listening front-end receives a notification while it is in a transaction, the NOTIFY event will be
delayed until its transaction is completed.
It is not a good practice to have a front-end application depend on the number of notifications it receives. It is
possible, if many notifications are sent in quick succession, that the client would only receive one notification.
SQL-92 Compatibility
The NOTIFY command is a PostgreSQL extension; there is no such command in the SQL-92 specification.
Example
This example registers a name with the LISTEN command and then sends a notification:
LISTEN IAmWaiting;
NOTIFY IAmWaiting;
REINDEX
Usage
Description
The REINDEX command is used to recover from corruptions of system indexes. To run this command, the
postmaster process must be shut down, and PostgreSQL should be launched with the -O and -P options. (This
is to prevent PostgreSQL from reading system indexes upon startup.)
Input(s)
FORCE—Forces PostgreSQL to overwrite the current index, even if PostgreSQL determines that it is still valid.
Output(s)
SQL-92 Compatibility
This is a PostgreSQL language extension. No such command is defined in the SQL-92 specification.
Example
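The example is not shown above; a minimal sketch of rebuilding the indexes of a system table (run from a standalone backend, as described in the preceding notes) might look like this:

```sql
REINDEX TABLE pg_class FORCE;
```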
RESET
Usage
RESET variable
Description
RESET changes a run-time variable back to its default setting. It is functionally equivalent to a SET variable
TO DEFAULT command.
Input(s)
Output(s)
Notes
See the SET command for further discussion and for a list of run-time variables.
SQL-92 Compatibility
RESET is a PostgreSQL language extension. There is no RESET command in the SQL-92 specification.
Example
This example restores the variable DateStyle back to its default setting:
RESET DateStyle;
REVOKE
Usage
Description
The REVOKE command enables the owner of an object (or a superuser) to remove permissions granted to a user,
a group, or the public on a specific object.
Tables
Views
Sequences
Input(s)
Output(s)
ERROR (Message returned if an object was not found or if the permissions specified could not be revoked.)
Notes
Refer to the GRANT command for more information on assigning privileges to a user or group.
SQL-92 Compatibility
The SQL-92 specification for the REVOKE command has some additional functionality. It allows privileges to be
removed at the column level, as well as removing additional privileges not mentioned here. Specifically, these are
the following:
Usage
Grant Option
References
Examples
This example shows how to remove the user bill's privileges for changing data in the table authors:
To remove all users from being able to see or modify the table payroll:
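The statements themselves are not shown above; hedged sketches of the two cases described (user and table names are taken from the descriptions):

```sql
REVOKE UPDATE ON authors FROM bill;
REVOKE ALL ON payroll FROM PUBLIC;
```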
ROLLBACK
Usage
Description
The ROLLBACK command is used to stop and reverse a PostgreSQL transaction that is currently in progress.
When PostgreSQL receives a ROLLBACK command, any changes made to tables are automatically reverted to
their original state.
By default, all commands issued in PostgreSQL are performed in an implicit transaction. The explicit use of the
BEGIN…COMMIT clauses encapsulates a series of SQL commands to ensure proper execution. If any of the
commands in the series fail, a ROLLBACK command can be issued, thereby bringing the database back to its
original state.
Input(s)
None. WORK and TRANSACTION are optional keywords that have no functional effect.
Output(s)
Notes
The COMMIT command is used to ensure that transactional actions are completed successfully.
See ABORT, BEGIN, and COMMIT for more information regarding transactions.
SQL-92 Compatibility
The ROLLBACK command is fully SQL-92 compliant. SQL-92 also specifies ROLLBACK WORK as a valid statement,
which is also supported by PostgreSQL.
Example
This example shows a transaction in progress that is terminated by using a ROLLBACK command:
BEGIN;
SELECT * FROM authors;
ROLLBACK;
SELECT * FROM authors;
SELECT
Usage
Description
The SELECT command is used to retrieve rows from a single or multiple tables. If the WHERE condition is not
given, all rows are returned.
FROM Clause
The FROM clause identifies what tables are to be included in the query. If the FROM clause is simply a table name,
by default, this includes rows from inherited relations. The ONLY option will limit results to be only from the
specified table.
The FROM clause can also refer to SUB-SELECT, which is useful for performing advanced grouping, aggregation,
and ordering functions.
The FROM clause can also refer to a JOIN statement, which is the combination of two distinct FROM locations.
The following JOIN types are supported:
INNER JOIN | CROSS JOIN. A straight combination of the included row sources with no qualification made
for row removal.
The OUTER JOINs presented here in the next three bullets are a feature of Version 7.1 and above.
Previous versions of PostgreSQL do not support OUTER JOINs.
LEFT OUTER JOIN. The left-hand row source is returned in full, but the right-hand rows are returned only
where they passed the ON qualification. The left-hand rows are fully extended across the width of the
result, using NULLs to pad the areas where right-hand rows are missing.
RIGHT OUTER JOIN. The converse of a LEFT OUTER JOIN. All right-hand rows are returned, but left-
hand rows are only returned where they passed the ON qualification. The right-hand rows are fully extended
across the width of the result, using NULLs to pad the areas where the left-hand rows are missing.
FULL OUTER JOIN. A FULL OUTER JOIN returns all left-hand rows (NULL extended to right) and all
right-hand rows (NULL extended to left).
DISTINCT Clause
The DISTINCT clause allows the user to specify whether duplicate rows are returned or not. The default is to
return ALL, including duplicate rows.
By specifying DISTINCT ON in conjunction with ORDER BY, it is possible to limit duplicate returns based on
specific columns.
WHERE Clause
The WHERE clause is used to limit what rows are returned. An expression that constitutes a valid WHERE clause
evaluates to a Boolean value. For example:
WHERE Name='Barry'
The condition can be one of =, <, <=, >, >=, <>, ALL, ANY, IN, and LIKE.
GROUP BY Clause
The GROUP BY clause is used to consolidate duplicate rows into single entries. All fields selected must contain
identical rows for the rows to be consolidated.
In the case in which an aggregate is on a field, the aggregate function will be computed for all members in each
group.
By default, GROUP BY attempts to function on input columns. However, if used with a SELECT AS clause, GROUP
BY can function on output columns. Additionally, GROUP BY can be used with the ordinal column number.
HAVING Clause
The HAVING clause filters out groups of rows generated by a GROUP BY command. Essentially, the HAVING
clause is like a WHERE filter for GROUP BY conditionals. However, the WHERE clause goes into effect before a
GROUP BY is run, whereas HAVING is executed after the GROUP BY has finished.
ORDER BY Clause
The ORDER BY clause instructs PostgreSQL to order the output of a SELECT command by specific columns. If
multiple columns are specified, the output order will match the left-to-right order of the columns specified.
The direction of ordering can be specified by using either the ASC (ascending) or DESC (descending) option. By
default, ASC is assumed.
In addition to specifying column names, the ordinal numbers of the respective columns can also be used. If the
ORDER BY declaration name is ambiguous, an output column name will be assumed. This is the opposite of
the GROUP BY clause, which assumes an input column name in the ambiguous case.
UNION Clause
The UNION clause allows the output result to be a collection of rows from two or more queries. To function, each
query must have the same number of columns and the same respective data types.
By default, UNION composites do not contain duplicate rows, but they can if the ALL option is specified.
INTERSECT Clause
The INTERSECT clause gathers a composite output result from a collection of like queries. To function, each
query must have the same number of columns and the same respective data types.
INTERSECT differs from UNION because only the rows that are in common to both queries are returned.
FOR UPDATE Clause
The FOR UPDATE clause performs an exclusive lock on the selected rows to facilitate data modifications.
EXCEPT Clause
The EXCEPT clause returns composite output resulting from a collection of like queries. To function, each query
must have the same number of columns and the same respective data types.
The EXCEPT clause differs from UNION in that it returns only those rows from the first query that do not
appear in the results of the second query.
LIMIT Clause
The LIMIT clause is used to specify the maximum number of rows returned. If the OFFSET option is included,
that many rows will be skipped before the LIMIT command starts to take effect.
LIMIT usually returns meaningful results only when used in conjunction with an ORDER BY command;
otherwise, it is difficult to know what significance the rows being returned have.
Input(s)
name—Specifies an alternate name for a column or expression. Often used to rename the result of an aggregate
(that is, SELECT sum(check) AS TotalPayroll).
TEMPORARY | TEMP—The results of SELECT are sent to a unique temporary table, which is deleted once this
session is complete.
new_table—The results of SELECT are sent to a new query with this specified name. (See the SELECT INTO
command for more information.)
fromitem—The name of a table, sub-select, or JOIN clause to select rows from (see preceding for JOIN).
alias—Defines an optional name for the preceding table. Used to prevent confusion when dealing with same-
table joins.
wherecondition—An SQL expression that evaluates to a Boolean value; rows are returned only where
this expression evaluates to true.
secselect—The secondary SELECT statement for use in a UNION. It is a standard SELECT statement, except
that it cannot include ORDER BY or LIMIT clauses.
start—Used with the OFFSET keyword to skip the specified number of initial rows before beginning to
return data.
Output(s)
XX ROWS (Message returned after the data set, indicating the number of rows returned.)
Notes
The user executing the SELECT command must have SELECT permission on the tables involved.
SQL-92 Compatibility
The major components of the PostgreSQL SELECT command are SQL-92 compliant, except for the following
areas:
GROUP BY—In SQL-92, this command can only refer to input column names, whereas PostgreSQL can use
both.
ORDER BY—In SQL-92, this command can only refer to output (result) column names, whereas PostgreSQL
can use both.
UNION clause—In SQL-92, this command allows an additional option, CORRESPONDING BY, to be
included. This option is not available in PostgreSQL.
Examples
This example shows a simple SELECT statement from the table authors, selecting rows where they match a
specific name:
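A minimal sketch of such a query (the column and value are assumptions):

```sql
SELECT * FROM authors WHERE name='Sam';
```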
Here's the example again, this time with a LIMIT command to return only the two newest members:
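One plausible form, assuming a SERIAL id column that reflects the order in which members were added:

```sql
SELECT * FROM authors ORDER BY id DESC LIMIT 2;
```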
To join the current authors table with the payroll table to get the last check amount:
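A sketch of the join (column names such as id, author_id, and lastcheck are assumptions):

```sql
SELECT authors.name, payroll.lastcheck
FROM authors, payroll
WHERE authors.id = payroll.author_id;
```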
Use of aggregate functions (like count(), sum(), and so on) provides easy methods of summarizing data that
would be tedious to compute otherwise. In this example, you use the count() function to tell us how many
authors are named Sam:
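The query itself is not shown above; a sketch that would produce the output below (the column name is an assumption):

```sql
SELECT count(*) FROM authors WHERE name='Sam';
```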
count
-----
4
Use the SUM function, GROUP BY, and a JOIN to tell us how much all the Sams have been paid:
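A hedged sketch, reusing the assumed join columns and the check column mentioned earlier in this chapter:

```sql
SELECT authors.name, sum(payroll.check) AS TotalPaid
FROM authors, payroll
WHERE authors.id = payroll.author_id
  AND authors.name='Sam'
GROUP BY authors.name;
```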
This example shows how a sub-select works. All the people from payroll who made more than $900 on their
last check are chosen, and then their names are displayed from a join to authors:
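A sketch of the sub-select (all column names are assumptions):

```sql
SELECT authors.name
FROM authors
WHERE authors.id IN
  (SELECT author_id FROM payroll WHERE check > 900);
```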
SELECT INTO
Usage
Description
The syntax of the SELECT INTO command is essentially the same as for a regular SELECT command; the only
difference is that the output of the query is directed to a new table.
Input(s)
Output(s)
Notes
The user that executes this command will become the owner of the newly created table.
SQL-92 Compatibility
Example
Create a new table from only the people named Sam in your authors table:
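A sketch of the statement (the new table name is an assumption):

```sql
SELECT * INTO TABLE sams FROM authors WHERE name='Sam';
```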
SET
Usage
Or
SET CONSTRAINTS { ALL | list} mode
Or
Or
Description
Essentially, the SET command is used to set a run-time variable in PostgreSQL. However, the specific usage
varies greatly depending on what run-time variable is being set.
After a variable has been SET, the SHOW command can be used to display its current setting, and the RESET
command can be used to set it to its default value.
Input(s)
The basic list of valid variables and value combinations follow in the next section.
CLIENT_ENCODING | NAMES
Parameter(s): value
DATESTYLE
Parameter(s):
SEED
Parameter(s): value
Sets the random number generator with a specific seed (floating point between 0 and 1).
SERVER_ENCODING
Parameter(s): value
This option is only available if MULTIBYTE support has also been enabled.
CONSTRAINTS
Parameter(s): value
TIME ZONE
Parameter(s): value
Sets the time zone depending on your operating system (that is, /usr/lib/zoneinfo or
/usr/share/zoneinfo has valid time-zone values for a Linux-based OS).
PG_OPTIONS
PG_OPTIONS can take several internal optimization parameters. They are as follows:
all
deadlock_timeout
executorstats
hostlookup
lock_debug_oidmin
lock_debug_relid
lock_read_priority
locks
malloc
nofsync
notify
palloc
parserstats
parse
plan
plannerstats
pretty_parse
pretty_plan
pretty_rewritten
query
rewritten
shortlocks
showportnumber
spinlocks
syslog
userlocks
verbose
RANDOM_PAGE_COST
Parameter(s): float-value
Sets the optimizer's estimate of the cost of nonsequentially fetched disk pages.
CPU_TUPLE_COST
Parameter(s): float-value
Sets the optimizer's estimate of the cost of processing each tuple during a query.
CPU_INDEX_TUPLE_COST
Parameter(s): float-value
Sets the optimizer's estimate of the cost of processing each indexed tuple during a query.
CPU_OPERATOR_COST
Parameter(s): float-value
Sets the optimizer's estimate of the cost of processing each operator in a WHERE clause during a query.
EFFECTIVE_CACHE_SIZE
Parameter(s): float-value
Sets the optimizer's assumptions about the effective size of the disk cache.
ENABLE_SEQSCAN
Enables/disables the planner's use of sequential scan types. (Note: This capability is actually impossible to turn
off completely, but setting it as disabled discourages its use.)
ENABLE_INDEXSCAN
ENABLE_TIDSCAN
ENABLE_SORT
Enables/disables the planner's use of explicit sort types. (Note: This capability is actually impossible to turn off
completely, but setting it as disabled discourages its use.)
ENABLE_NESTLOOP
Enables/disables the planner's use of nested loops in join plans. (Note: This capability is actually impossible to
turn off completely, but setting it as disabled discourages its use.)
ENABLE_MERGEJOIN
ENABLE_HASHJOIN
GEQO
KSQO
MAX_EXPR_DEPTH
Parameter(s): integer
Sets the maximum nesting depth that the parser will accept.
Caution
Output(s)
NOTICE: Bad value for variable (value) (Message returned if the value specified cannot be used with
the declared variable.)
Notes
Use the SHOW command to display the value at which a variable is currently set.
SQL-92 Compatibility
The only use of the SET command defined in the SQL-92 specification is for SET TRANSACTION ISOLATION
LEVEL and SET TIME ZONE. Outside of these specific areas, this command is a PostgreSQL language extension.
Example
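The statements that produced the output below are not shown; one plausible sketch (the alias RightNow is taken from the output heading):

```sql
SET DATESTYLE TO 'ISO';
SELECT CURRENT_TIMESTAMP AS "RightNow";
```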
RightNow
----------------------
2001-08-15 09:50:23-06
SHOW
Usage
SHOW variable
Description
The SHOW command is used to display the current value of a run-time variable. It can be used in conjunction with
the SET and RESET commands to change variable settings.
Input(s)
Output(s)
NOTICE: Unrecognized variable value (Message returned if the variable name specified cannot be
found.)
NOTICE: Time zone is unknown (Message returned if the TZ or PGTZ variables are not set correctly.)
Notes
For a list of valid variables that can be displayed, refer to the SET command.
SQL-92 Compatibility
SHOW is a PostgreSQL extension. There is no SHOW command defined in the SQL-92 specification.
Example
This example shows what the current date style is set to:
SHOW datestyle;
TRUNCATE
Usage
Description
The TRUNCATE command quickly deletes all rows from the specified table. Functionally, it is the same as a
DELETE command, but it is much faster.
Input(s)
Output(s)
Notes
The user of this command must own the table specified or have DELETE privileges to execute this command.
SQL-92 Compatibility
This is a PostgreSQL extension; the SQL-92 method for achieving this same effect would be to elicit an
unqualified DELETE command.
Example
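A minimal sketch (the table name is an assumption); note that this removes every row from the table:

```sql
TRUNCATE TABLE authors;
```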
UNLISTEN
Usage
UNLISTEN { notifyname | * }
Description
The UNLISTEN command is used to stop a front-end from waiting on a LISTEN command. The specific name to
stop listening on can be specified, or a wildcard (*) can be specified, which will stop listening on all previously
registered names.
Input(s)
Output(s)
Notes
Once unregistered, further NOTIFY commands sent by the server will be ignored.
SQL-92 Compatibility
Example
This example shows the name mynotify being registered, sending notification, and then being unregistered:
LISTEN mynotify;
NOTIFY mynotify;
Asynchronous NOTIFY 'mynotify' from backend with pid '7277' received
UNLISTEN mynotify;
NOTIFY mynotify;
UPDATE
Usage
UPDATE table SET column=expression [,…]
FROM fromlist
WHERE condition
Description
The UPDATE command is used to change the data in specific rows in a table. If no WHERE condition is specified,
all rows are assumed; otherwise, only those rows matching the WHERE criteria are updated.
By using the FROM keyword, multiple tables can be used to satisfy the WHERE condition.
Input(s)
condition—A standard SQL WHERE condition to constrain the updates. (See SELECT for more information on
WHERE conditions.)
Output(s)
UPDATE # (Message returned if successful. Output includes the number of rows where data was changed.)
Notes
The user of the UPDATE command must have write permissions to the table specified, as well as SELECT
permissions on any tables needed in the WHERE clause.
SQL-92 Compatibility
The UPDATE command is mostly compliant with the SQL-92 specification, except the following:
FROM fromlist—PostgreSQL allows multiple tables to satisfy the WHERE condition. This is not supported
in SQL-92.
WHERE CURRENT OF cursor—SQL-92 allows updates to be positioned based on an open cursor. This is
not supported in PostgreSQL.
Example
The following example updates the column status to active for all people named Bill in the authors table:
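A sketch of the statement just described:

```sql
UPDATE authors SET status='active' WHERE name='Bill';
```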
VACUUM
Usage
Description
The VACUUM command serves two purposes: to reclaim wasted disk space and to profile PostgreSQL optimization
performance.
When the VACUUM command is run, all classes in the current database are opened, and old records from rolled-
back transactions are cleared out. Additionally, the system catalog tables are then updated with information
concerning the optimization statistics for each class. Furthermore, if run with the ANALYZE command,
information related to the dispersion of column data will be updated to improve query execution paths.
Input(s)
ANALYZE—Updates the column statistics for each table. This information is used by the query optimization
routine to plan the most efficient searches.
Output(s)
NOTICE: - Relation 'table' (The report header for the specified table.)
NOTICE: Pages XX, Changed XX, Reaped XX, Empty XX, New XX; Tup XXXX: Vac XXXX, Crash
XX, Unused XX, MinLen XXX, MaxLen XXX; Re-using: Free/Avail. Space XXXXXXX/XXXXXXX;
EndEmpty/Avail. Pages X/XX. Elapsed X/X sec (Message returned that is the analysis table.)
NOTICE: Index 'name': Pages XX; Tuples XXXX: Deleted XXXX. Elapsed X/X sec (The analysis
report for an index.)
Notes
VACUUM is a good candidate for running as a nightly cron job. For running this command outside of a psql or
other front-end application, see the vacuumdb command in Chapter 6, "User Executable Files," and the
section,"vacuumdb."
VACUUM ANALYZE should be run after significant deletions or modifications have been made to a database.
SQL-92 Compatibility
Example
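A minimal sketch; the VERBOSE and ANALYZE options and the table name are all optional:

```sql
VACUUM VERBOSE ANALYZE authors;
```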
A further benefit of data types is that they provide a significant boost to the overall
performance and efficiency of a database system. Because the database knows
what type of data is stored in a given column, assumptions can be made about how
to most efficiently store and retrieve that data.
Table of Data Types
The following is a table with the PostgreSQL built-in data types sorted according to
what type of data they hold. This chart might be useful if you know the type of data
you want to store but are unsure of the PostgreSQL designation for that data type.
TIME WITH TIME ZONE Time of day and time zone, from 00:00:00+12 to 23:59:59-12.
[*]Some of the data-type names for the Numeric category are only supported
in PostgreSQL 7.1. For instance, in Version 6.5, BIGINT is not supported, but
INT8 is supported.
The following pages comprise a listing of the built-in data types. They are broken up
according to the types of data they define. Each element is described with relevant
information such as storage size, range of values, compatibility, and descriptive
notes.
Geometric Data Types
BOX
Description
Inputs
((x1,y1),(x2,y2))
Storage Size
32 bytes
Example Data
((1,1),(50,50))
Notes
On input, the data is reordered to store the upper-right corner first and the lower-
left corner second. Therefore, the preceding example, when stored, would be
represented as ((50,50),(1,1)).
CIRCLE
Description
Holds the coordinates that represent a center point and radius of a circle.
Inputs
<(x,y),r>
r—Radius
Storage Size
24 bytes
Example Data
<(10,10),5>
LINE
Description
Inputs
((x1,y1),(x2,y2))
Storage Size
32 bytes
Example Data
((1,1),(100,100))
Notes
LSEG
Description
Inputs
((x1,y1),(x2,y2))
Storage Size
32 bytes
Example Data
((1,1),(100,100))
Notes
The LSEG data type is similar to LINE, except that the latter represents an infinite
line as opposed to a specified segment. Essentially, once a line is defined with the
LINE data type, it is assumed that it will continue along the same plane in
perpetuity.
PATH
Description
Paths represent variable line segments, in which there can be numerous points that
create either an open or closed path.
Inputs
((x1,y1),…,(xn,yn))
[(x1,y1),…,(xn,yn)]
Storage Size
4 + 32n bytes
Example Data
Notes
Closed paths begin with an open parenthesis and open paths begin with an open
bracket.
The functions isopen, isclosed, popen, and pclose can be used to test and
manipulate paths.
POINT
Description
Inputs
(x,y)
x—X-axis of point.
y—Y-axis of point.
Storage Size
16 bytes
Example Data
(1,5)
POLYGON
Description
Inputs
((x1,y1),…,(xn,yn))
xn, yn—The ending point of the polygon (will usually be the same as the starting
point).
Storage Size
4+32n bytes
Example Data
Notes
A polygon is very similar to a closed path. However, there are some additional
functions that only act on polygons (that is, poly_center, poly_contain,
poly_left, poly_right, and so on).
Logical Data Types
Logical data types are used to represent the concepts of true, false, or NULL.
Typically, this data type is useful as a flag that indicates the current state of a
record. The true and false values are self-explanatory, while the value NULL
usually indicates the equivalent of "unknown."
BOOLEAN
Description
Inputs
Storage Size
1 byte
Notes
Generally, it is best to use the TRUE and FALSE input forms for Boolean data. These
formats are SQL compatible and generally are more accepted, although some
RDBMSs use 1 and 0 for TRUE and FALSE representations. Some of the input values
need to be escaped by enclosing them in single quotes (i.e., 't'). However, the
SQL-compliant TRUE and FALSE forms do not require quotations.
Network Data Types
PostgreSQL is unique among many SQL systems in that it includes built-in data
types for network addresses. CIDR, INET, and MACADDR all represent specific
aspects of network addresses. These data types can be particularly useful when
using PostgreSQL as a back-end database to a web application.
Storing network values in these data types is preferable because of the built-in
functions in PostgreSQL that act on network-specific data types.
CIDR
Description
Holds dotted-quad data for an IP address and the number of bits in the netmask.
This data type is named for the Classless Internet Domain Routing (CIDR)
convention.
Inputs
x.x.x.x/y
Storage Size
12 bytes
Example Data
192.168.0.1/24
10 (10.0.0.0/8 assumed)
Notes
If the bits from the netmask are omitted, the netmask bits are assumed by using
the class of the dotted-quad (for example, 255.0.0.0 assumes 8, 255.255.0.0
assumes 16, 255.255.255.0 assumes 24, and so on). However, the assumption will
be large enough to handle all the entries in the expressed octets.
IPv6 is not yet supported.
INET
Description
Inputs
x.x.x.x/y
Storage Size
12 bytes
Example Data
Notes
The difference between this and CIDR is that an INET can refer to a single host,
whereas CIDR refers to an IP network.
MACADDR
Description
Inputs
xxxxxx:xxxxxx
xxxxxx-xxxxxx
xxxx.xxxx.xxxx
xx-xx-xx-xx-xx-xx
Storage Size
6 bytes
Example Data
08-00-2d-01-32-22
08002d:013222
Notes
Numeric data types store a variety of number-related data. The 7.X series of
releases has brought some changes to this area. Namely, PostgreSQL now uses a
more descriptive naming convention for number-related data types (for example,
BIGINT versus INT8).
Some data types have become deprecated over the last few releases. For
instance, use of the MONEY data type is no longer encouraged; instead, it is
preferable to use the DECIMAL data type.
BIGINT
Description
Inputs
Storage Size
8 bytes
Notes
Versions of PostgreSQL before 7.1 might refer to this data type as INT8.
DECIMAL
Description
Inputs
(x,y)
x—Total length.
y—Decimal width.
Storage Size
8 bytes
Notes
Versions of PostgreSQL before 7.1 might refer to this data type as NUMERIC.
DOUBLE PRECISION
Description
Inputs
Storage Size
8 bytes
Notes
Versions of PostgreSQL before 7.1 might refer to this data type as FLOAT8.
INTEGER
Description
Holds an integer.
Inputs
Storage Size
4 bytes
Notes
Versions of PostgreSQL before 7.1 might refer to this data type as INT4.
REAL
Description
Inputs
Storage Size
4 bytes
Notes
Versions of PostgreSQL before 7.1 might refer to this data type as FLOAT4.
SERIAL
Description
Inputs
Storage Size
4 bytes
Notes
The SERIAL data type is actually just a standard INTEGER type with an automatically
created SEQUENCE and INDEX on the specified column. When a table containing a
SERIAL type is dropped, the associated SEQUENCE must also be explicitly dropped;
this does not occur automatically.
SMALLINT
Description
Inputs
Storage Size
2 bytes
Notes
Versions of PostgreSQL before 7.1 might refer to this data type as INT2.
String Data Types
PostgreSQL includes three basic data types for storing string-related data. In
compliance with the SQL standard, there are types for fixed-length and variable-
length character strings. Additionally, PostgreSQL defines a more generic data type
named TEXT that requires no specified upper limit regarding maximum size.
However, this data type is PostgreSQL-specific and is not compliant with the SQL-92
standards.
CHAR
Description
Inputs
CHAR(n)
Storage Size
(4+n) bytes
Notes
CHAR is a SQL-92-compatible data type. Data that does not fill to the limit specified
is blank padded.
TEXT
Description
Storage Size
VARCHAR
Inputs
VARCHAR(n)
Storage Size
Notes
A number of built-in constants are useful to know for simplifying date-time entry.
The following is a list of them:
PostgreSQL evaluates constants at the start of a transaction, and this might result in
undesired behavior. For instance, using the now constant in a series inserted inside
a transaction will result in all rows having the same timestamp. A way around this is
to use the now() function, which is evaluated upon each call, not during transaction
creation.
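The difference can be sketched as follows (the table and column names are assumptions):

```sql
-- 'now' is evaluated once, at the start of the transaction:
INSERT INTO log (stamp) VALUES ('now');
-- now() is evaluated on each call:
INSERT INTO log (stamp) VALUES (now());
```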
DATE
Description
Holds a value that describes a particular day. Many different input formats are
supported (see the following section).
Inputs
Storage Size
4 bytes
Notes
January Jan
February Feb
March Mar
April Apr
May May
June Jun
July Jul
August Aug
September Sep
October Oct
November Nov
December Dec
Monday Mon
Tuesday Tue
Wednesday Wed
Thursday Thu
Friday Fri
Saturday Sat
Sunday Sun
The preceding describes the input formats; the output formats are specified by the
DATESTYLE variable (see the SET SQL command).
INTERVAL
Description
Inputs
-2147483648 to +2147483648
Valid values for Unit are as follows (plurals are also valid):
Second
Hour
Minute
Day
Week
Month
Year
Decade
Century
Millennium
Storage Size
12 bytes
Example Data
1 Week Ago
30 Days
Notes
TIME
Description
The valid range for TIME is from 00:00:00.00 to 23:59:59.99.
The valid input formats that TIME can take are as follows:
082450—ISO format
08:24 PM—Standard
z—Same as 00:00:00
zulu—Same as 00:00:00
Storage Size
4 bytes
Notes
The TIME data type is a SQL-compatible format. The TIME data type is accurate to
a resolution of .000001 (1 microsecond).
TIME WITH TIME ZONE
Description
Inputs
The valid range for TIME WITH TIME ZONE is from 00:00:00.00+12 to
23:59:59.99-12.
The valid input formats that TIME WITH TIME ZONE can take are as follows:
08:24-6—ISO format
08:24:50-6—ISO format
08:24:50.15-6—ISO format
082450-6—ISO format
Storage Size
4 bytes
Notes
TIME WITH TIME ZONE will accept any time-based input format that is also legal
for the TIME data type, except time zone information is appended to the end.
The TIME data type is a SQL-compatible format. The TIME WITH TIME ZONE data
type is accurate to a resolution of .000001 (1 microsecond).
TIMESTAMP
Description
Inputs
The valid input formats that TIMESTAMP can take are as follows:
For instance:
Storage Size
8 bytes
Notes
Because of the inclusion of time, date, era, and time-zone information, the
TIMESTAMP is a popular data type for storage of temporal elements.
Other Data Types
The following are various data types that are used less frequently. Often, these are
used for internal system purposes only, but you might run across them, so they are
listed here for completeness.
BIT | BIT VARYING
The BIT type stores a series of binary-1 and binary-0 values. The BIT data type has
a specified width and pads empty entries with zeros, whereas the BIT VARYING
data type allows flexible-width entries to be made.
MONEY
The MONEY data type is still supported but is deprecated. Consider
using a NUMERIC or DECIMAL data type with an appropriately set decimal width.
NAME
The NAME data type stores a 31-character string, but it is only intended to be used
internally. PostgreSQL makes use of the NAME type to store information in the
internal system catalogs.
OID
The OID data type is an integer that ranges in value from zero to 4 billion. Every
object created in PostgreSQL has an OID assigned to it implicitly. OIDs are useful
for maintaining data integrity because that number will be unique in the database.
By default, OIDs are hidden from view, but they can be selected and displayed by
explicitly specifying them in a query. For instance:
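A sketch of such a query (the table name is an assumption):

```sql
SELECT oid, * FROM authors;
```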
Moreover, once an OID sequence has reached its upper limit, it starts again at zero
(or another prescribed minimum). Although sequences can do the same thing, the
odds of OIDs wrapping around are much greater because they are distributed
throughout the entire database.
The following is a listing of some of the more obscure PostgreSQL data types (some
of these types are only aliases to previously documented types):
Most operators simply return an implicit Boolean true or false given the comparison
criteria. However, some operators, such as the math- and string-related ones,
return new results from the supplied elements.
The following is a map of the default PostgreSQL operators grouped by data type.
After that is a more detailed listing of all the supported PostgreSQL operators,
including information on specific usage, syntax, and notes.
Geometric: + ## && &< &> <-> << <^ >> >^ ?# ?- ?-| @-@ ?| ?|| @ @@ ~=
Logical: AND OR NOT
Network: < <= >= > <> << <<= >> >>=
Numerical: ! !! % * |/ ||/
String: < <= <> > >= || !!= ~~ !~~ ~* !~ !~*
Time: #< #<= #<> #= #> #>= <#> << ~= <?>
Geometric Operators
Some of these operators are implicit Boolean returns (for example, << and >>), and
others provide new results from the input elements like math operators (for
example, + and -).
Listing
* Scaling/rotation
/ Scaling/rotation
# Intersection
&& Overlaps
<< Is left of
<^ Is below
>> Is right of
>^ Is above
?# Intersects or overlaps
?- Is horizontal
?-| Is perpendicular
?| Is vertical
?|| Is parallel
@ Contained or on
@@ Center of
~= Same as
Notes/Examples
Select all boxes that lie to the left of the given box:
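A minimal sketch of such a query, assuming a hypothetical table boxes with a box column b:

```sql
-- << is true when the left operand lies entirely left of the right operand.
SELECT * FROM boxes WHERE b << box '((3,3),(4,4))';
```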
The logical operators usually are used to combine expressions to get an aggregate
Boolean value from the list.
Listing
Notes/Examples
The network operators that are built into PostgreSQL are useful for making
comparisons between IP addresses. These operators function on INET and CIDR
data types equally.
Listing
= Equals
>> Contains
Notes/Examples
Use the following to find all the IP addresses less than the one specified:
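A sketch of the missing listing, using the computers table and ipaddr column that appear in the subnet example that follows:

```sql
SELECT * FROM computers WHERE ipaddr < inet '192.168.0.50';
```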
Similarly, use the following to find all the IP addresses on the given subnet:
SELECT * FROM computers WHERE ipaddr<<'192.168.0.1/24'
Numerical Operators
Listing
! Factorial
!! Factorial (left operator)
% Mod or truncate
* Multiplication
+ Addition
- Subtraction
/ Division
: Exponentiation
@ Absolute value
^ Exponentiation
|/ Square root
||/ Cube root
& Binary AND
| Binary OR
# Binary XOR
~ Binary NOT
<< Binary shift left
>> Binary shift right
Notes/Examples
Addition 5+5 10
Left factorial !!3 6
The binary operators also function on BIT and BIT VARYING data types. For
instance:
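A small sketch of bitwise operators applied to bit-string literals:

```sql
-- Bitwise AND of two bit strings of equal length.
SELECT B'10011' & B'01011';   -- yields 00011
```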
Essentially, all string operators return implicit Boolean true or false values given the
supplied comparison. (The exception is the concatenation operators shown in the
next "Listing" section.)
When making comparisons, the characters' location in the ASCII chart is taken into
account. Therefore, an uppercase "A" is seen as less than (<) a lowercase "a."
PostgreSQL can make use of two distinct types of pattern matching: an ANSI-SQL
method and a POSIX regex style. The internal ANSI-SQL style makes use of the
LIKE and NOT LIKE keywords. This ANSI-SQL method can use the following
wildcards for pattern matching:
Wildcard Meaning
% Matches any sequence of zero or more characters.
_ Matches any single character.
The POSIX regex style recognizes symbols such as the following:
POSIX Regex Symbol Meaning
+ Repetition of a sequence.
…etc…
The regex engine included with most versions of PostgreSQL is the POSIX 1003.2
"egrep" style. This regex library, by Henry Spencer, is included in many other
popular applications. More information on the regex engine included in a specific
version of PostgreSQL can usually be found in the source directory
$SOURCE/backend/regex.
Listing
= Equal to
|| Concatenate strings
~~ Like
LIKE Like
Notes/Examples
Select all records from a table where the first name is Bob:
Select all records where the first name begins with Bo:
Select all records where the first name begins with b, regardless of case:
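Sketches of the three missing listings, assuming a hypothetical table friends with a firstname column:

```sql
-- Exact match:
SELECT * FROM friends WHERE firstname = 'Bob';

-- ANSI-SQL pattern match (~~ is equivalent to LIKE):
SELECT * FROM friends WHERE firstname LIKE 'Bo%';

-- Case-insensitive POSIX regex match:
SELECT * FROM friends WHERE firstname ~* '^bo';
```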
The time operators are used to compare temporal values and usually return a
Boolean true or false.
Listing
= Interval equal
| Start of interval
Chapter 4. PostgreSQL Functions
PostgreSQL includes a number of built-in functions that manipulate specific data
types and return values.
Function Category
COUNT
MAX
MIN
STDDEV
SUM
VARIANCE
TO_CHAR
TO_DATE
TO_NUMBER
TO_TIMESTAMP
BOX
CENTER
CIRCLE
DIAMETER
HEIGHT
ISCLOSED
ISOPEN
LENGTH
LSEG
NPOINTS
PATH
PCLOSE
POINT
POLYGON
POPEN
RADIUS
WIDTH
BROADCAST
HOST
MASKLEN
NETMASK
NETWORK
TEXT
TRUNC
ACOS
ASIN
ATAN
ATAN2
CBRT
CEIL
COS
COT
DEGREES
EXP
FLOOR
LN
LOG
PI
POW or POWER
RADIANS
RANDOM
ROUND
SIN
SQRT
TAN
TRUNC
COALESCE
NULLIF
CHR
INITCAP
LOWER
LPAD
LTRIM
OCTET_LENGTH
POSITION
STRPOS
RPAD
RTRIM
SUBSTRING
SUBSTR
TRANSLATE
TRIM
UPPER
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
DATE_PART
DATE_TRUNC
EXTRACT
ISFINITE
NOW
TIMEOFDAY
TIMESTAMP
SESSION_USER
USER
Other functions ARRAY_DIMS
Aggregate Functions
AVG
Description
The AVG function returns the average value of the supplied column or expression.
Input
AVG(col | expression)
Example
This example shows the use of expressions contained in the AVG function. Specifically,
it returns the average amount over $18,000 that employees earn (notice that the
criteria provided restricts calculations being performed on anyone earning less than
$18,000).
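A sketch of what the listing likely contained, assuming a hypothetical employees table with a salary column:

```sql
-- Average amount earned over $18,000, excluding anyone below that figure.
SELECT AVG(salary - 18000) FROM employees WHERE salary > 18000;
```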
Notes
The AVG function will work on the following data types: smallint, integer, bigint,
real, double precision, numeric, and interval.
Any integer value (that is, bigint, integer, and so on) returns an integer data
type.
COUNT
Description
The COUNT function counts the rows or expressions where a non-NULL value is returned.
Inputs
COUNT(col | expression)
COUNT(*)
Example
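A minimal sketch, assuming the same hypothetical employees table:

```sql
-- Count every row in the table.
SELECT COUNT(*) FROM employees;
```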
MAX
Description
The MAX function returns the greatest value from a column or expression list that was
passed to it.
Input
MAX(col | expression)
Example
MIN
Description
The MIN function returns the smallest value from a column or expression list that was
passed to it.
Input
MIN(col | expression)
Example
STDDEV
Description
The STDDEV function returns the standard deviation of the supplied columns or
expression list.
Input
STDDEV(col | expression)
Example
Notes
The STDDEV function will work on the following data types: smallint, integer,
bigint, real, double precision, and numeric.
SUM
Description
The SUM function returns the aggregate sum of all the column or expression values
passed to it.
Input
SUM(col | expression)
Example
Notes
The SUM function will work on the following data types: smallint, integer, bigint,
real, double precision, numeric, and interval.
VARIANCE
Description
The VARIANCE function will return the squared value of the standard deviation from the
supplied column or expression list.
Input
VARIANCE(col | expression)
Example
CAST
Description
The CAST function can be used to convert from one data type to another. Generally
speaking, CAST is a fairly generic and easy-to-use function that makes most data-
type conversions easy.
Inputs
CAST(value AS newtype)
Examples
CAST('57' AS INT) 57
CAST(57 AS CHAR) '57'
CAST(57 AS NUMERIC(4,2)) 57.00
CAST('05-23-87' AS DATE) 1987-05-23
Notes
An additional way to perform type conversion is to separate the value and the
desired data type with double colons (::).
'57'::INT 57
57::CHAR '57'
57::NUMERIC(4,2) 57.00
'05-23-87'::DATE 1987-05-23
TO_CHAR
Description
The TO_CHAR function takes various input data types and converts them to a string
data type. In addition to performing a data conversion, the TO_CHAR function also
has extensive formatting capabilities to output the string in the exact format
desired.
Inputs
The TO_CHAR function shares a common usage pattern regardless of the data type
it is handling. All TO_CHAR functions accept two arguments; the first is the data to
be converted, and the second is a formatting template for PostgreSQL to use when
constructing the output. The following table illustrates this usage pattern.
Usage Description
Converting to a character string from a numerical data type uses the following
template mask for formatting output.
(In addition to the following specific formatting commands, the TO_CHAR function
will also blindly accept and display any text enclosed in double quotes. This can be
very helpful when trying to perform specific labeling of output data.)
Item Description
0 Leading zero
9 Digit placeholder
. Decimal point
, Thousands separator
G Group separator*
D Decimal point*
L Currency symbol*
* These items use the locale setting for your particular machine, so your results
might vary.
TO_CHAR with Date/Time Data Types
The TO_CHAR (and TO_DATE, TO_TIMESTAMP) function uses the following date-
time–related template mask for formatting output:
Item Description
SS Second (00–59)
MI Minute (00–59)
IW ISO week of year (1–53; first week starts on first Thursday of Jan)
MM Month (01–12)
Q Quarter
Examples
Input Output
TO_CHAR(123,'999') 123
TO_CHAR(123,'99 9') 12 3
TO_CHAR(123,'0999') 0123
TO_CHAR(123,'999.9') 123.0
TO_CHAR(1234,'9,999') 1,234
TO_CHAR(1234,'9G999') 1,234
TO_CHAR(1234.5,'9999D99') 1234.50
TO_CHAR(123,'999PL') 123+
TO_CHAR(123,'PL123') +123
TO_CHAR(-123,'999MI') 123-
TO_CHAR(-123,'MI123') -123
TO_CHAR(123,'SG123') +123
TO_CHAR(-123,'SG123') -123
TO_CHAR(-123,'999PR') <123>
TO_CHAR(123,'RN') CXXIII
Input Output
Notes
Any items in double quotes are not interpreted as template codes. Therefore, to
output reserved template words literally, simply enclose them in double quotes
(that is, "YYYY" outputs as the text YYYY).
Special characters like backslashes (\) can be achieved by enclosing them in quotes
and doubling them (that is, "\\" becomes "\" on output).
The preceding templates are used in many other TO -style functions (that is,
TO_DATE, TO_NUMBER, and so on).
TO_DATE
Description
The TO_DATE function converts a text string to a date format. The TO_DATE
function takes two arguments; the first is the string to be converted, and the second
is a text template that specifies how the output is to appear.
Input
TO_DATE(text, texttemplate)
Example
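A sketch of a typical call (the literal date string is illustrative):

```sql
-- Convert a text string to a date using a matching template.
SELECT TO_DATE('05 12 1999', 'DD MM YYYY');
```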
Notes
There are a number of options that the text template string can take. Refer to
TO_CHAR for a full listing of the options that the date-time template can take.
TO_NUMBER
Description
The TO_NUMBER function converts a text string to a numeric data type. The second
argument is a template that describes the layout of the input string.
Input
TO_NUMBER(text, texttemplate)
Examples
TO_NUMBER('1,234,567','9G999G999') 1234567
TO_NUMBER('1234.50','9999D99') 1234.5
Notes
The text template of the TO_NUMBER function accepts a number of options. For a
full listing of supported layout options, refer to the TO_CHAR function.
TO_TIMESTAMP
Description
The TO_TIMESTAMP function converts a text string to a timestamp data type.
Input
TO_TIMESTAMP(text, texttemplate)
Example
Notes
The date-time template accepts many options for formatting output. Refer to the
TO_CHAR function for a full list of valid date-time formatting options.
Geometric Functions
AREA
Description
The AREA function computes the area of the supplied geometric object.
Input
AREA(obj)
Example
AREA(box '((1,1),(3,3))') 4
BOX
Description
There are several versions of the BOX function. Most perform conversions from other
geometric types to the box data type. However, if the BOX function is passed two
overlapping boxes, the result will be a box that represents where the intersection
occurs.
Inputs
BOX(box,box)—Perform an intersection.
BOX(circle)—Convert a circle to a box.
BOX(point,point)—Convert two corner points to a box.
BOX(polygon)—Convert a polygon to a box.
Examples
BOX(box'((1,1),(3,3))', box'((2,2),(4,4))') BOX'(3,3),(2,2)'
BOX(circle'(0,0),2') BOX'(1.41, 1.41), (-1.41, -1.41)'
BOX(point'(0,0)', point'(1,1)') BOX'(1,1),(0,0)'
BOX(polygon'(0,0),(1,1),(1,0)') BOX'(1,1),(0,0)'
CENTER
Description
The CENTER function returns the center point of the object passed to it.
Input
CENTER(obj)
Example
CENTER(box'(0,0),(1,1)') point'(.5,.5)'
CIRCLE
Description
The CIRCLE function converts the supplied object to a circle data type.
Input
CIRCLE(box)
Example
CIRCLE(box'(0,0),(1,1)') CIRCLE'(.5,.5),.707016…'
DIAMETER
Description
The DIAMETER function computes the diameter of the supplied circle.
Input
DIAMETER(circle)
Example
DIAMETER(circle'((0,0),2)') 4
HEIGHT
Description
The HEIGHT function is used to compute the vertical height of a supplied box.
Input
HEIGHT(box)
Example
HEIGHT(box'(0,0),(3,3)') 3
ISCLOSED
Description
The ISCLOSED function returns a Boolean value that represents whether the supplied
path is open or closed.
Input
ISCLOSED(path)
Example
ISCLOSED(path'(0,0),(1,1),(1,0),(0,0)') t
ISOPEN
Description
The ISOPEN function returns a Boolean value that represents whether the supplied
path is open or closed.
Input
ISOPEN(path)
Example
ISOPEN(path'(0,0),(1,1),(1,0),(0,0)') f
LENGTH
Description
The LENGTH function computes the length of the supplied line segment.
Input
LENGTH(lseg)
Example
LENGTH(lseg'(0,0),(1,1)') 1.41421356237310
Notes
If the LENGTH function is passed a BOX data type, it will interpret the opposite
corners of the box as the lseg to compute.
LSEG
Description
The LSEG function converts from either a box or a pair of points to an lseg data
type.
Inputs
LSEG(box)
LSEG(point,point)
Example
LSEG(box'(0,0),(1,1)') LSEG'(1,1),(0,0)'
NPOINTS
Description
The NPOINTS function returns the number of points that compose the supplied path.
Inputs
NPOINTS(path)
NPOINTS(polygon)
Example
NPOINTS(path'(0,0),(1,1)') 2
PATH
Description
The PATH function converts a polygon to a path data type.
Input
PATH(polygon)
Example
PATH(polygon'(0,0),(1,1),(1,0)') PATH'((0,0),(1,1),(1,0))'
Notes
Notice the closed representation "(" in the example provided. For more information
on open or closed path representation, refer to the PATH data type.
PCLOSE
Description
The PCLOSE function converts an open path to a closed path.
Input
PCLOSE(path)
Example
PCLOSE(path'(0,0),(1,1),(1,0)') PATH'((0,0),(1,1),(1,0))'
Notes
See the PATH data type for more information on how paths are represented as being
open or closed.
POINT
Description
The POINT function provides various geometric services, depending on the supplied
object type.
Inputs
Examples
POINT(circle'((0,0),2)') POINT'(0,0)'
POINT(polygon'(0,0),(1,1),(1,0)') POINT'(.66…,.33…)'
POLYGON
Description
The POLYGON function converts the supplied object to a polygon data type. When
given a number of points and a circle, it returns a polygon with that many points
placed on the circle.
Inputs
POLYGON(npts, circle)
POLYGON(box)
POLYGON(path)
Examples
POLYGON(4, circle'((0,0),4)')
POLYGON'(-4,0),(2.041,4),(4,-4.0827),(-6.12,-4)'
POPEN
Description
The POPEN function converts a closed path to an open path.
Input
POPEN(path)
Example
POPEN(path'(0,0),(1,1),(1,0)') PATH'[(0,0),(1,1),(1,0)]'
Notes
Notice the open representation of the returned path. For more information on open or
closed path representations, refer to the PATH data type.
RADIUS
Description
The RADIUS function computes the radius of the supplied circle.
Input
RADIUS(circle)
Example
RADIUS(circle'((0,0),2)') 2
WIDTH
Description
The WIDTH function computes the horizontal width of the supplied box.
Input
WIDTH(box)
Example
WIDTH(box'(0,0),(2,2)') 2
Network Functions
PostgreSQL includes many functions that are network oriented. Primarily, these are
useful for performing calculations and transformations of IP-related data. The
following sections discuss the included network functions in PostgreSQL.
ABBREV
Description
The ABBREV function returns an abbreviated text format for a supplied inet or
cidr value.
Input
ABBREV(inet | cidr)
Example
ABBREV('192.168.0.0/24') "192.168/24"
BROADCAST
Description
The BROADCAST function returns the broadcast address of the supplied inet or
cidr value.
Input
BROADCAST(inet | cidr)
Example
BROADCAST('192.168.0.1/24') '192.168.0.255/24'
HOST
Description
The HOST function extracts the host address for the supplied inet or cidr value.
Input
HOST(inet | cidr)
Example
HOST('192.168.0.101/24') '192.168.0.101'
MASKLEN
Description
The MASKLEN function extracts the netmask length for the supplied inet or cidr
value.
Input
MASKLEN(inet | cidr)
Example
MASKLEN('192.168.0.1/24') 24
NETMASK
Description
The NETMASK function calculates the netmask for the supplied inet or cidr value.
Input
NETMASK(inet | cidr)
Example
NETMASK('192.168.0.1/24') '255.255.255.0'
NETWORK
Description
The NETWORK function extracts the network from a supplied inet or cidr value.
Input
NETWORK(inet | cidr)
Example
NETWORK('192.168.0.155/24') '192.168.0.0/24'
TEXT
Description
The TEXT function returns the IP and netmask length as a text value.
Input
TEXT(inet | cidr)
Example
TEXT(inet '192.168.0.1') '192.168.0.1/32'
TRUNC
Description
The TRUNC function sets the last 3 bytes to zero for the supplied macaddr value.
Input
TRUNC(macaddr)
Example
TRUNC(macaddr '12:34:56:78:90:ab') '12:34:56:00:00:00'
Notes
This function is useful for associating a supplied MAC address with a manufacturer.
See the directory $SOURCE/contrib/mac (SOURCE is the location of the
PostgreSQL source code) for more information.
Numerical Functions
ABS
Description
The ABS function returns the absolute value of the supplied number.
Input
ABS(num)
Examples
ABS(-7) 7
ABS(-7.234) 7.234
Notes
The ABS function's return value is the same data type that it is passed.
ACOS
Description
The ACOS function returns an inverse cosine.
Input
ACOS(num)
ASIN
Description
The ASIN function returns an inverse sine.
Input
ASIN(num)
ATAN
Description
The ATAN function returns an inverse tangent.
Input
ATAN(num)
ATAN2
Description
The ATAN2 function returns the inverse tangent of the quotient of its two
arguments.
Input
ATAN2(x,y)
CBRT
Description
The CBRT function returns the cube root of the supplied number.
Input
CBRT(num)
Example
CBRT(27) 3
CEIL
Description
The CEIL function returns the smallest integer not less than the supplied value.
Input
CEIL(num)
Example
CEIL(-22.2) -22
COS
Description
The COS function returns the cosine of the supplied value.
Input
COS(num)
COT
Description
The COT function returns the cotangent of the supplied value.
Input
COT(num)
DEGREES
Description
The DEGREES function converts from radians to degrees.
Input
DEGREES(num)
Example
DEGREES(1) 57.2957795130823
EXP
Description
The EXP function raises e to the power of the supplied value.
Input
EXP(num)
Example
EXP(0) 1.0
FLOOR
Description
The FLOOR function returns the largest integer not greater than the supplied value.
Input
FLOOR(num)
Example
FLOOR(-22.2) -23
LN
Description
The LN function performs a natural logarithm on the supplied value.
Input
LN(num)
Example
LN(100) 4.6051701860
LOG
Description
The LOG function performs a standard base-10 logarithm on the supplied value.
Input
LOG(num)
Example
LOG(100) 2.0
PI
Description
The PI function returns the value of pi.
Inputs
None.
Example
PI() 3.14159265358979
POW or POWER
Description
The POW function raises a number to the supplied exponent.
Inputs
POW(num, exp)
Examples
POW(2,2) 4.0
POW(2,3) 8.0
RADIANS
Description
The RADIANS function converts from degrees to radians.
Input
RADIANS(num)
Example
RADIANS(90) 1.5707963267949
RANDOM
Description
The RANDOM function returns a pseudorandom number between 0.0 and 1.0.
Inputs
None.
Example
RANDOM() .654387
ROUND
Description
The ROUND function rounds the supplied value to the specified number of decimal
places.
Inputs
ROUND(num, dec)
Examples
ROUND(1.589, 1) 1.6
ROUND(1.589, 2) 1.59
SIN
Description
The SIN function returns the sine of the supplied value.
Input
SIN(num)
SQRT
Description
The SQRT function returns the square root of the supplied value.
Input
SQRT(num)
Example
SQRT(9) 3
TAN
Description
The TAN function returns the tangent of the supplied value.
Input
TAN(num)
TRUNC
Description
The TRUNC function truncates to the specified number of decimal places without
rounding them out.
Inputs
TRUNC(num [, dec])
Examples
TRUNC(1.589999, 2) 1.58
TRUNC(1.589999) 1
SQL Functions
CASE WHEN
Description
The CASE WHEN function is a simple conditional evaluation tool. Most programming
languages contain similar constructs. It can be thought of as analogous to the
ubiquitous IF…THEN…ELSE statement.
Inputs
Example
This example shows a classic IF…THEN…ELSE paradigm in which the CASE WHEN
function can be used. The age of an employee is compared against certain constants,
and the possible outputs of minor, adult, or unknown are returned depending on
their age.
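A sketch of the described listing, assuming a hypothetical employees table with name and age columns:

```sql
SELECT name,
       CASE WHEN age < 18 THEN 'minor'
            WHEN age >= 18 THEN 'adult'
            ELSE 'unknown'
       END AS category
FROM employees;
```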
COALESCE
Description
The COALESCE function accepts an arbitrary number of input arguments and returns
the first one that is evaluated as NOT NULL. The COALESCE function is very useful
for providing display defaults for arbitrary data sources.
Input
COALESCE(arg1, …, argN)
Example
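A minimal sketch, assuming hypothetical nickname and firstname columns:

```sql
-- Returns the nickname when present, otherwise the first name,
-- otherwise the literal placeholder.
SELECT COALESCE(nickname, firstname, '(none)') FROM employees;
```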
NULLIF
Description
The NULLIF function accepts two arguments. It returns a NULL value only if the
value of both arguments is equal. Otherwise, it returns the value of the first
argument.
Input
NULLIF(arg1, arg2)
Example
In this case, the first value will be returned because the values are not equal:
----------------
'hello'
If both arguments are equal, a NULL is returned instead:
----------------
NULL
Notes
ASCII
Description
The ASCII function returns the ASCII value for the supplied character.
Inputs
ASCII(chr)
Examples
ASCII('A') 65
ASCII('Apple') 65
Notes
In the case of multiple characters being supplied to the ASCII function, only the first
is evaluated.
CHR
Description
The CHR function returns the character that corresponds to the ASCII value
provided.
Inputs
CHR(val)
Example
CHR(65) 'A'
INITCAP
Description
The INITCAP function returns a string or column with the first character of each
word uppercased and the remaining characters lowercased.
Inputs
INITCAP(col)
Or
INITCAP(string)
Example
Proper_Name
--------
Bill
Bob
Sam
LENGTH
Description
The LENGTH function returns the number of characters in the supplied string or
column.
Inputs
LENGTH(col)
Name
------
Pam
Sam
Sue
Bob
LOWER
Description
The LOWER function converts a string or column to all lowercase.
Inputs
LOWER(col)
Or
LOWER(string)
Example
Low_Name
--------
bill
bob
sam
LPAD
Description
The LPAD function left-fills the specified string with spaces or characters until it
reaches the specified length.
Inputs
LPAD(str, len, fill)
Examples
LTRIM
Description
The LTRIM function removes the specified characters from the left side of a
character string.
Inputs
LTRIM(str [,trim])
Examples
OCTET_LENGTH
Description
The OCTET_LENGTH function returns the length of a column or string, including any
multibyte data present.
Inputs
OCTET_LENGTH(col)
Or
OCTET_LENGTH(string)
Example
Octet_Length
11
Notes
OCTET_LENGTH and LENGTH will often return the same value. However, a crucial
difference is that OCTET_LENGTH is actually returning the number of bytes in a
string. This can be an important difference if multibyte information is being stored.
POSITION
Description
The POSITION function returns an integer that represents the position of the
supplied character string in the given column (or supplied string).
Inputs
POSITION(str IN col)
Example
Return the names from the table authors where the second letter is an 'a':
Name
------
Pam
Sam
Tammy
Barry
STRPOS
Description
The STRPOS function returns an integer that represents the position of a specific
character string in a given column (or supplied string).
Inputs
STRPOS(col, str)
Example
Notes
RPAD
Description
The RPAD function right-fills the specified string with spaces or characters.
Inputs
RPAD(str, len, fill)
Examples
RTRIM
Description
The RTRIM function removes the specified characters from the right side of a
character string.
Inputs
RTRIM(str [,trim])
Examples
SUBSTRING
Description
The SUBSTRING function extracts a specified portion from an existing character
string.
Inputs
SUBSTRING(str FROM start [FOR len])
Examples
Notes
SUBSTR
Description
The SUBSTR function extracts a specified portion from an existing character string.
Inputs
SUBSTR(str, start [, len])
len—By default, the rest of the string is assumed; however, a specific length can
be specified.
Examples
SUBSTR('Hello', 2) 'ello'
SUBSTR('Hello', 2, 2) 'el'
Notes
TRANSLATE
Description
The TRANSLATE function performs a search and replace on a specified string. Each
character in the string that appears in the search set is replaced by the character at
the corresponding position in the replacement set.
Inputs
TRANSLATE(str, from, to)
Examples
TRIM
Description
The TRIM function removes the specified character or whitespace from the left or
right (or both) of a given string.
Inputs
TRIM([leading | trailing | both] [trim] FROM str)
leading | trailing | both—The side from which to remove the specified
characters.
Examples
UPPER
Description
The UPPER function converts a string or column to all uppercase.
Inputs
UPPER(col)
Or
UPPER(string)
Example
Upper_Name
--------
BILL
BOB
SAM
Time Functions
The following functions assist in performing calculations based on time- and date-related
material. These are useful in performing calculations and transformations of temporal-related
data sets.
AGE
Description
The AGE function returns an interval that represents the difference between the current time and
the time argument supplied.
Inputs
AGE(timestamp)
AGE(timestamp, timestamp)
Example
CURRENT_DATE
Description
The CURRENT_DATE function returns the current system date.
Inputs
None.
Example
Notes
Notice that there are no trailing parentheses "()" with this function. This is to maintain SQL
compatibility.
CURRENT_TIME
Description
The CURRENT_TIME function returns the current system time.
Inputs
None.
Example
Notes
Notice that there are no trailing parentheses "()" with this function. This is to maintain SQL
compatibility.
CURRENT_TIMESTAMP
Description
The CURRENT_TIMESTAMP function returns the current system date and time.
Inputs
None.
Example
Notes
Notice that there are no trailing parentheses "()" with this function. This is to maintain SQL
compatibility. This function is analogous to the NOW function.
DATE_PART
Description
The DATE_PART function extracts a specified section from the supplied date/time argument.
Inputs
DATE_PART(formattext, timestamp)
DATE_PART(formattext, interval)
formattext—One of the valid DATE_PART formatting options; see the following section.
The following keywords are recognized as valid date-time elements available for extraction:
Item Description
month The month of the year (1–12) (timestamp only).The number of remaining
months (interval only).
epoch The number of seconds since 01-01-1970 00:00 (timestamp). The total
number of seconds (interval).
Examples
Notes
When using DATE_PART with interval data types, it is important to recognize that
DATE_PART will not do implicit calculations. DATE_PART only functions as an
extraction tool. For instance, if your interval is 1 month ago, DATE_PART will
return 0 (zero) if you try to extract days.
DATE_TRUNC
Description
The DATE_TRUNC function truncates the supplied timestamp to the specified precision.
Inputs
DATE_TRUNC(formattext, timestamp)
formattext—The precision value to which to truncate the timestamp; see the following valid
options for formatting.
The following is a listing of the various levels of precision that DATE_TRUNC can operate on:
Item Description
Examples
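A sketch of a typical call (the literal timestamp is illustrative):

```sql
-- Truncate to hour precision; minutes and seconds become zero.
SELECT DATE_TRUNC('hour', TIMESTAMP '2001-02-16 20:38:40');
-- 2001-02-16 20:00:00
```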
EXTRACT
Description
The EXTRACT function extracts the specified value from the supplied timestamp or interval.
Inputs
EXTRACT(formattext FROM timestamp)
EXTRACT(formattext FROM interval)
formattext—A valid date field. Refer to DATE_PART for a listing of valid format codes.
Example
The EXTRACT function performs like the DATE_PART function. Either syntax can be used
interchangeably.
ISFINITE
Description
The ISFINITE function returns a Boolean value that indicates whether the supplied timestamp
or interval represents a finite amount of time.
Inputs
ISFINITE(timestamp)
ISFINITE(interval)
Example
NOW
Description
The NOW function returns a timestamp that represents the current system time.
Inputs
None.
Example
SELECT now();
'2001-11-1 15:23:54-06'
Notes
TIMEOFDAY
Description
The TIMEOFDAY function returns a high-precision date and time value.
Inputs
None.
Example
TIMESTAMP
Description
The TIMESTAMP function works as a conversion routine to convert either date or date and
time data types into a timestamp.
Inputs
TIMESTAMP(date)
TIMESTAMP(date, time)
Example
Several of the included functions in PostgreSQL deal with user and session issues.
The following sections discuss user-related functions.
CURRENT_USER
Description
The CURRENT_USER function returns the user ID being used for permission
checking.
Inputs
None.
Example
SELECT CURRENT_USER;
--------------
webuser
Notes
Currently, CURRENT_USER and SESSION_USER are the same, but in the future,
there might be a distinction as needed for programs running in a setuid mode.
Notice that the preceding function is not called with the trailing parentheses "()".
This is to maintain SQL compatibility.
SESSION_USER
Description
The SESSION_USER function returns the user ID that is currently logged into
PostgreSQL.
Inputs
None.
Example
SELECT SESSION_USER;
--------------
webuser
Notes
Currently, CURRENT_USER and SESSION_USER are the same, but in the future,
there might be a distinction as needed for programs running in a setuid mode.
Notice that the preceding function is not called with the trailing parentheses "()".
This is to maintain SQL compatibility.
USER
Notes
See CURRENT_USER.
Other Functions
Some of the included PostgreSQL functions do not fall neatly into a specific category.
This section outlines one example of this type of function.
ARRAY_DIMS
Description
The ARRAY_DIMS function returns the dimensions of an array field as a text string
(for example, [1:4] for a four-element array).
Input
ARRAY_DIMS (col)
Example
array_dims
----------
[1:4]
Chapter 5. Other PostgreSQL Topics
PostgreSQL, like all RDBMSs, has specific ways in which common concepts such as
indexes and transaction control are implemented. Additionally, there are other
concepts that are unique to PostgreSQL.
This chapter contains information related to how PostgreSQL handles the following:
Arrays in fields
Inheritance
Indexes
One of the nice features that PostgreSQL supports is the concept of fields that can
hold arrays. This enables multiple values of the same data type to be stored in a
single field.
To insert arrays into a field, the field should be marked as holding arrays during
table creation. After a field has been designated to hold arrays, data can be
inserted, selected, or updated into the array by referring to the specific array
element in the designated table.
Creating an Array
For example, let's create a table named students that has a four-element array field
to hold test scores:
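A sketch of what the missing listing likely contained (column names are assumptions):

```sql
CREATE TABLE students (
    name       TEXT,
    testscore  INTEGER[4]   -- brackets mark the field as an array
);
```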
Notice that, at table creation time, a field is designated as being able to support
arrays by including brackets [] next to the data-type definition.
When inserting, updating, or selecting data, the specific array element can be
chosen by referencing it explicitly. PostgreSQL begins numbering array elements at
1. Therefore, the first element in the array testscore would be referenced as
testscore[1].
Now let's insert some sample data into your table students. Notice how you refer
to the specific element you want to address by using braces {}:
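A sketch of the insert, assuming the students table above; the array value is written inside braces:

```sql
INSERT INTO students (name, testscore)
    VALUES ('Bill', '{96,84,98,100}');
```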
Notice that the array fields are referenced by using braces {}; specifically, the array
elements are referenced where you wanted the data to be inserted. The same
strategy is used when updating a row.
In fact, either the entire array can be replaced or just the specific element:
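A sketch of both update styles, assuming the students table used above:

```sql
-- Replace the entire array.
UPDATE students SET testscore = '{90,91,92,93}' WHERE name = 'Bill';

-- Replace only the first element.
UPDATE students SET testscore[1] = 95 WHERE name = 'Bill';
```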
Selecting specific elements can be done in the same way. For instance, to select all
students who scored higher than an 85 on the first three tests, you would use the
following:
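A sketch of the selection, assuming the students table used above:

```sql
SELECT name FROM students
WHERE testscore[1] > 85
  AND testscore[2] > 85
  AND testscore[3] > 85;
```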
Multidimensional Arrays
You could then insert and access that information as before (notice the use of the
double braces):
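A sketch of a two-dimensional variant (the layout, two halves of four scores each, is an assumption):

```sql
CREATE TABLE students (
    name       TEXT,
    testscore  INTEGER[2][4]
);

-- Nested braces supply each dimension.
INSERT INTO students (name, testscore)
    VALUES ('Bill', '{{90,91,92,93},{85,87,88,89}}');
```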
Selection can then be made for the specific element in the specific multidimensional
array. For instance, to see who scored greater than a 90 on the third exam in the
second half of school:
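A sketch of that selection, assuming the two-dimensional layout just described (second half, third exam):

```sql
SELECT name FROM students WHERE testscore[2][3] > 90;
```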
Extending Arrays
One caveat (or benefit) of the PostgreSQL array structure is that element sizes
within an array can be expanded dynamically. Although you might explicitly specify
a maximum array size during the table creation, this size can be altered by using
the UPDATE command.
For instance, here is an example of a table created with an array. The table then
utilizes the UPDATE command to extend the size of the array:
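A sketch of the sequence, consistent with the before-and-after output shown next (a three-element array extended to four):

```sql
CREATE TABLE students (
    name       TEXT,
    testscore  INTEGER[3]
);
INSERT INTO students VALUES ('Bill', '{96,84,98}');

-- Assigning past the declared bound extends the array.
UPDATE students SET testscore[4] = 100 WHERE name = 'Bill';
```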
Name testscore
-------------------
Bill {96,84,98}
Name testscore
-------------------
Bill {96,84,98,100}
Although this can be a useful feature, it can also be problematic unless used
carefully. It would be possible to end up with different rows that each have a
different number of array elements.
One useful function for dealing with arrays is the ARRAY_DIMS function. This
function returns the current number of elements in an array. Refer to the
ARRAY_DIMS function in Chapter 4, "PostgreSQL Functions."
Inheritance
PostgreSQL allows tables to inherit properties and attributes from other tables. This
is useful in cases in which many tables are needed to hold very similar information.
In these cases, it is often possible to create a parent table that holds the common
data structures, allowing the children to inherit these structures from the parent.
Now you can create a specific table just for cooks, who happen to need all the
information that other employees need:
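A sketch of the parent and child tables (the column names are assumptions):

```sql
CREATE TABLE employees (
    name    TEXT,
    age     INTEGER,
    salary  NUMERIC(9,2)
);

-- cooks inherits every column of employees and adds its own.
CREATE TABLE cooks (
    specialty  TEXT
) INHERITS (employees);
```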
The real power of inheritance lies in the capability to search parent tables for
information stored in child tables, without having to explicitly name the child table in
the query.
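A sketch of such a query over an employees parent table; the trailing asterisk extends the search to child tables:

```sql
SELECT name, age FROM employees*;
```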
Notice that the preceding query includes an asterisk (*) after the table employees.
This is to tell PostgreSQL to extend its search to child tables as well.
To limit a query search to a particular table in Version 7.1, there are two options.
One is to set the environmental variable SQL_Inheritance to OFF. The second is
to use the keyword ONLY during a SELECT query, for instance:
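Sketches of both options, assuming the employees table used in the earlier inheritance examples:

```sql
-- Option 1: turn off inherited searches for the session.
SET SQL_Inheritance TO OFF;

-- Option 2: restrict a single query with the ONLY keyword.
SELECT name FROM ONLY employees;
```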
Although not a limitation of PostgreSQL per se, unless table inheritance is carefully
planned, problems will arise. For instance, in the preceding examples, you are
assuming that every cook will also be an employee.
Certainly, it's possible that a new relationship could be formed that would not fall
under the category of employee. Perhaps volunteer or consultant would be
more appropriate for a given relationship. At this point, your previous database
schema is problematic and will need to be redone to fit more accurately. As
mentioned earlier, this is not an inherent problem with PostgreSQL; it just
underlines the need for careful planning when using inheritance.
PostgreSQL Indexes
Essentially, indexes help database systems search tables more efficiently. The
concept of indexes is widely supported among all the popular RDBMSs, and
PostgreSQL is no exception.
By default, PostgreSQL supports three types of indexes: B-Tree, R-Tree,
and hash. During index creation, the specific type of index required can be specified.
Each index type is best suited for a specific type of indexing.
A general rule of thumb when using indexes is to determine what queries your
database is making consistent use of. Essentially, indexes should exist for every
WHERE criteria in frequently used queries.
B-Tree Indexes
The B-Tree index is the default index type in PostgreSQL. In fact, if the
CREATE INDEX command is called with no specification of index type, a B-Tree
index will be generated.
B-Tree indexes are used whenever a comparison employs one of the following
operators: <, <=, =, >=, >.
Currently, B-Tree indexes are the only provided indexes that support multicolumn
indexing. Up to 16 columns can be aggregated into a B-Tree multicolumn index
(although this limit can be altered at compile time).
R-Tree Indexes
R-Tree indexes are especially suited for fast optimization of geometric and/or spatial
comparisons. R-Tree indexes are implementations of Antonin Guttman's quadratic
splits. The R-Tree index is a fully dynamic index that does not need periodic
optimization.
Hash Indexes
The hash index is a standard hash index that is an implementation of Litwin's linear
hashing algorithms. This hash index is a fully dynamic index that does not need
periodic optimization.
Hash indexes can be used whenever the = operator is employed in a comparison.
However, there is no substantial evidence that a hash index is any faster than a B-
Tree index, as implemented in PostgreSQL. Therefore, in most cases, it is preferable
to use B-Tree for = comparisons.
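Should a hash index nonetheless be desired, it is requested with a USING clause (table and column names here are hypothetical):

```sql
-- Hash indexes support only = comparisons.
CREATE INDEX authors_ssn_hash_idx ON authors USING hash (ssn);
```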
There are also two specific uses of indexes worth discussing: indexes can be built
on the output of functions, and they can span multiple columns.
Functional Indexes
A functional index stores the result of a function applied to one or more columns
of a row. Queries whose WHERE criteria invoke the same function—for example,
WHERE lower(name)='smith'—can then be optimized to use the index.
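As a sketch, assuming a hypothetical authors table, a functional index and a query able to use it might look like this:

```sql
-- Index the lowercased form of the column so that
-- case-insensitive lookups can use the index.
CREATE INDEX authors_lower_name_idx ON authors (lower(name));

SELECT * FROM authors WHERE lower(name) = 'smith';
```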
Multicolumn Indexes
For instance:
This would make use of the multicolumn name_ssn_idx. However, the following
would not:
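Those two cases can be sketched as follows, assuming the index name_ssn_idx exists on the name and ssn columns of a hypothetical authors table:

```sql
CREATE INDEX name_ssn_idx ON authors (name, ssn);

-- Can use name_ssn_idx: the leading indexed column (name)
-- appears in the WHERE clause.
SELECT * FROM authors WHERE name = 'Bill Smith' AND ssn = '123-45-6789';

-- Cannot use name_ssn_idx: the leading column is absent.
SELECT * FROM authors WHERE ssn = '123-45-6789';
```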
However, multicolumn indexes can be used with great effect to aggregate unique
row keys for tables that contain a lot of similar information. This is generally used to
enforce data integrity. For instance, suppose a table made use of the following
fields:
It would be difficult to enforce a unique constraint on any of these fields
individually. After all, there will be many people who are 5'10" or who are 25, and
so on. However, there will be decidedly fewer people who are 5'10", 25 years old,
and named Bill Smith. Such a combination is a good candidate for a multicolumn
index with a unique constraint to enforce data integrity.
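A sketch of this approach, using hypothetical table and column names:

```sql
-- No single column is unique on its own, but the combination
-- of all three can reasonably be required to be.
CREATE UNIQUE INDEX people_name_age_height_idx
    ON people (name, age, height);
```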
A common cause of confusion is the fact that there are two key types that, on the
surface, seem to perform the same function.
Both primary and unique keys make use of indexes that enforce rules requiring field
values to be unique to the table. However, there are some important distinctions
and finer differences between them:
Primary keys are mainly used to relate a field value to a specific row (OID) in a
table. This is why primary keys can be used as relational keys when used in
conjunction with foreign tables. Additionally, primary keys will not allow NULL
values to be entered.
Unique keys do not relate a field value to a specific row; they just enforce a
uniqueness clause on the specified column. Although this is useful for
maintaining data integrity, it is not necessarily as useful for foreign table
relations as primary keys are. Moreover, unique keys will generally allow NULL
values to be inserted.
Here's a basic example of the differences: Suppose you have two fields that are
important in an employee table. One field is for the employee_id, which is
assigned by the system, and the other is an SSN used by humans for data input and
so on.
In this scenario, the employee_id should be designated as a primary key, and the
SSN should be designated as a unique key.
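That design might be sketched as follows (the table and column definitions are illustrative):

```sql
CREATE TABLE employees (
    employee_id  serial PRIMARY KEY, -- system-assigned; NULLs rejected
    ssn          text UNIQUE,        -- human-entered; must be unique
    name         text
);
```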
OIDs
PostgreSQL makes use of object identifiers (OIDs) and temporary identifiers (TIDs)
to correlate table rows with system tables and temporary index entries.
Every row inserted into a table in PostgreSQL has a unique OID associated with it.
In fact, every table in PostgreSQL contains a hidden column named oid. For
instance, given an authors table with the following rows:
Name Title
---------------------------------------------------
Bill Smith Cooking for 6 Billion
Sam Jones Chicken Soup for the Publishers Soul
the hidden column can be displayed by naming it explicitly:
SELECT oid, * FROM authors;
The key concept to understand with OIDs is that they are not sequential within a
table. OIDs are issued for every row item in the entire database; they are not
specifically constrained to one table. Therefore, any one table will never contain a
sequential ordering of OIDs. The SERIAL data type or an autonumbering SEQUENCE
is best suited for that type of application.
By default, PostgreSQL reserves the OIDs from 0 to 16384 for system-only use.
Therefore, user table-rows will always be assigned an OID greater than this.
PostgreSQL also uses TIDs to make dynamic relations between rows of data and
index entries. This value fluctuates and is used only for internal system purposes.
A common question is how to create an exact copy of a table, including the original
OIDs. This is made possible by utilizing the OID data type provided by PostgreSQL.
For instance:
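One way to sketch the technique (table and column names hypothetical) is to give the copy an explicit column of type oid and populate it from the hidden column:

```sql
-- The copy preserves each row's original OID in a visible column.
CREATE TABLE authors_copy (old_oid oid, name text, title text);

INSERT INTO authors_copy
    SELECT oid, name, title FROM authors;
```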
Multiversion Concurrency Control (MVCC)
Most popular RDBMSs make use of table or row locks to maintain database
consistency. Typically, these locks occur at the physical level of the file. These locks
are used to prohibit two or more instances from writing to the same row (or table)
concurrently.
PostgreSQL uses a more advanced method, multiversion concurrency control
(MVCC), for ensuring database integrity. In MVCC, each transaction sees a version
of the database as it existed at some near point in the past. It is important to
understand that the transaction is not seeing the actual data, just a previous
version of that data.
This prevents the current transaction from having to deal with the database arriving
at an inconsistent state due to other concurrent database transactions. In essence,
once a transaction is started, that transaction is an island unto itself. The underlying
data structures are isolated from other transactions' manipulations. Once that
transaction has ended, the changes it made to its specific version of the database
are merged back into the actual data structures.
There are three types of concurrency problems that any RDBMS has to deal with:
Dirty reads. A transaction reads data written by another transaction that has
not yet committed.
Nonrepeatable reads. A transaction rereads data and finds data that has
changed due to another transaction having been committed since the first read
occurred.
Phantom reads. A transaction re-executes a query and finds that the set of
rows satisfying its search criteria has changed due to another recently
committed transaction.
4. If the row still matches the criterion, then the update will continue. (Note: See
the paragraph following this list.)
5. The row is then doubly updated, and other waiting statements in Transaction A
will continue to execute.
The important point to notice is what happens in step 4. At this point, Transaction A
has a new version of the database it is using. This occurred when Transaction A re-
executed its query. At that point, it was using a new version of the database as its
baseline. Therefore, subsequent statements in Transaction A will operate on the
changes made by Transaction B. So, in this way, the transaction isolation is only
partial. It is possible for transactions to "seep" into each other in specific cases like
these.
SERIALIZABLE Level
This isolation level differs from READ COMMITTED in that each transaction must
occur in a scheduled serial manner. No transactions can occur that would result in
one transaction acting on another transaction's modifications.
The practical effect of this is that applications must be constructed in such a way
as to expect transaction failures and retry them afterward. On a heavily used
system, this could mean that a significant percentage of transactions fail due to
this strict scheduling. Such a burden could make the system much slower than it
would be under a straight READ COMMITTED isolation level.
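The isolation level is chosen per transaction; a sketch, using a hypothetical accounts table:

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- If a concurrent transaction invalidates this transaction's view,
-- the application should expect an error and retry the transaction.
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```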
Chapter 6. User Executable Files
Most of these utilities perform operations that could also be performed by executing
a series of SQL commands. They have been included as standalone executables to
aid the DBA in performing routine system tasks.
In the following documentation, the location of the file is noted. The two most
common forms of PostgreSQL installation are installation either from source code or
as part of an RPM package. The appropriate installation type is included in the
"Notes" section of each of the following commands.
Alphabetical Listing of Files
createdb
Description
Usage/Options
Option Description
-p, --port port The port or socket file of the listening server.
-q, --quiet Do not return any responses from the back end.
Notes/Location
createdb relies on the psql command to actually perform the database creation.
Therefore, psql must be present and able to be executed in order for createdb to
function correctly.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
createlang
Description
Usage/Options
Option Description
-p, --port port The port or socket file of the listening server.
-l, --list Lists the languages currently registered for the specified
database.
Examples
Notes/Location
This command is a wrapper for the CREATE LANGUAGE SQL command; however,
this is the preferred method for adding languages because of certain system checks
it automatically performs.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
createuser
Description
Usage/Options
-p, --port port The port or socket file of the listening server.
-q, --quiet Do not return any responses from the back end.
Examples
$ createuser joe
$ createuser -h db.someserver.com -p 9999 joe
$ createuser -a -d joe
Notes/Location
The createuser utility is a wrapper for the psql command. Therefore, the psql
file must be present and able to be executed by the user.
To create users, the executing user must have the appropriate flag (usesuper) set
in the pg_shadow system table.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
dropdb
Usage/Options
Option Description
-p, --port port The port or socket file of the listening server.
-q, --quiet Do not return any responses from the back end.
Examples
$ dropdb mydatabase
$ dropdb -h db.somewebsite.com -p 9333 mydatabase
Notes/Location
dropdb relies on the psql command to actually perform the database deletion.
Therefore, psql must be present and able to be executed for dropdb to function
correctly.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
droplang
Description
The droplang utility removes a language from the specified PostgreSQL database.
Usage/Options
Option Description
-p, --port port The port or socket file of the listening server.
Examples
Notes/Location
This command is a wrapper for the DROP LANGUAGE SQL command; however, this
is the preferred method for removing languages because of the system checks it
automatically performs.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
dropuser
Description
Usage/Options
Option Description
-p, --port port The port or socket file of the listening server.
-q, --quiet Do not return any responses from the back end.
Examples
$ dropuser joe
$ dropuser -h db.someserver.com -p 9999 joe
Notes/Location
The dropuser utility is a wrapper for the psql command. Therefore, the psql file
must be present and able to be executed by the user.
To remove users, the executing user must have the appropriate flag (usesuper) set
in the pg_shadow system table.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
ecpg
Description
The ecpg command is a SQL preprocessor that is used to embed SQL commands
within C programs. Using SQL commands within a C program is essentially a two-
step process. First, the file of SQL commands is passed through the ecpg utility;
then it can be linked and compiled with a standard C compiler.
Usage/Options
Examples
$ ecpg myfile.pgc
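For illustration only, a minimal myfile.pgc might embed SQL as follows (database and table names are hypothetical):

```c
/* Sketch of an embedded-SQL source file to be run through ecpg. */
EXEC SQL INCLUDE sqlca;

int main(void)
{
    EXEC SQL BEGIN DECLARE SECTION;
    int author_count;
    EXEC SQL END DECLARE SECTION;

    EXEC SQL CONNECT TO newriders;
    EXEC SQL SELECT count(*) INTO :author_count FROM authors;
    EXEC SQL DISCONNECT;
    return 0;
}
```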
Notes/Location
A discussion concerning the actual syntax of the ecpg command is outside the
scope of this entry. For a more complete discussion of using embedded SQL in C
programs, see Chapter 13, "Client-Side Programming," and the section "ecpg."
RPM— /usr/bin
Source— /usr/local/pgsql/bin
pgaccess
Description
The pgaccess utility is a GUI front end that makes many common administration
tasks easier.
Usage/Options
pgaccess [dbname]
Notes/Location
The pgaccess tool's features include the following:
Renames tables.
Drops tables.
Prompts the user for supplied parameters for dynamic queries (such as
SELECT * FROM authors WHERE name=[parameter 'Authors Name']).
RPM— /usr/bin
pgadmin
Description
The pgadmin tool is a Windows 95/98/NT tool for performing basic PostgreSQL
administration. (This tool is not included in the base PostgreSQL package; it is a
third-party tool specifically for Windows users.)
Notes/Location
The tool's features include the following:
Revision tracking.
The pgadmin tool is not distributed as part of the standard PostgreSQL system.
Please visit http://www.pgadmin.freeserve.co.uk for more information on obtaining,
installing, and using pgadmin.
pg_dump
Description
Combined with psql or pg_restore, this is the preferred method for performing
database backups and restores (see the next section, "pg_dumpall").
Usage/Options
Option Description
-h host Specifies the host where the server is running.
-v Verbose mode.
-Fp Uses a plain SQL text format. This is the default (v7.1 feature).
-Fc Outputs the archive in the new custom format. This is the most flexible
option (v7.1 feature).
-O, --no-owner Does not set ownership of objects to match the original database
(v7.1 feature).
-R, --no-reconnect Does not output commands to reconnect to the database
(v7.1 feature).
-S, --superuser name Specifies the superuser name (DBA) to use when disabling
triggers and setting ownership information (v7.1 feature).
-Z, --compress [0..9] Specifies the compression level (0–9); currently, only the
custom format supports this feature (v7.1 feature).
Examples
$ pg_dump authors
$ pg_dump -a authors
$ pg_dump -t payroll authors
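A custom-format dump pairs naturally with pg_restore; as a sketch (file and database names are hypothetical):

```shell
$ pg_dump -Fc authors > authors.dump
$ pg_restore -d authors_copy authors.dump
```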
Notes/Location
pg_dump cannot handle large objects (LOs).
pg_dump cannot correctly handle extracting all system catalog metadata. For
instance, partial indexes are not supported.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
pg_dumpall
Description
Usage/Options
pg_dumpall [options]
Option Description
-v Verbose mode.
-Fp Uses a plain SQL text format. This is the default (v7.1 feature).
-Fc Outputs the archive in the new custom format. This is the most flexible
option (v7.1 feature).
-i Ignores version mismatch with the server back end (pg_dump is only designed
to work with the correct version; this is for experimental use only).
-O, --no-owner Does not set ownership of objects to match the original database
(v7.1 feature).
-R, --no-reconnect Does not output commands to reconnect to the database
(v7.1 feature).
-S, --superuser name Specifies the superuser name (DBA) to use when disabling
triggers and setting ownership information (v7.1 feature).
-Z, --compress [0..9] Specifies the compression level (0–9); currently, only the
custom format supports this feature (v7.1 feature).
Examples
$ pg_dumpall
$ pg_dumpall -a
$ pg_dumpall -o
Notes/Location
pg_dumpall has many of the same limitations that pg_dump has with regard to
system metadata. See the section "pg_dump" for more information.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
pg_restore
Description
This is a new tool included with the PostgreSQL 7.1 release. It is designed to restore
data dumped by the pg_dump or pg_dumpall database utilities.
The new version of pg_dump includes the capability to dump data in a nontext
format that has many advantages over traditional data dumps:
Selective restores are possible using the new pg_dump format and
pg_restore.
The new pg_dump format will produce queries to enable the regeneration of
all user-defined types, functions, tables, indexes, aggregates, and operators.
Usage/Options
Option Description
-Fc, --format=c Specifies that the archive file is in the custom format of
pg_dump. This is the most flexible format to restore from.
-L, --use-list file Restores only the elements listed in the specified file, in
the order in which they appear. Comment lines begin with a semicolon (;) at the
start of the line.
-O, --no-owner Does not restore ownership information; objects will be owned
by the current user.
-r, --rearrange Restores items in modified OID order. (This is the default.)
-x, --no-acl Prevents restoration of the Access Control List (that is,
Grant/Revoke information).
-h, --host name Specifies the hostname where the server process is running.
Examples
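For instance, restoring a custom-format archive produced by pg_dump might look like this (file and database names are hypothetical):

```shell
$ pg_restore -Fc -d newriders authors.dump
```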
Notes/Location
RPM— /usr/bin
Source— /usr/local/pgsql/bin
pg_upgrade
Description
The pg_upgrade utility can be used to upgrade to a new version of the database
system without having to reload all the data in the current database.
This command currently will not function on PostgreSQL Version 7.1 and above; see
the section "pg_restore" if you are using one of the newer versions.
Usage/Options
-f file—This specifies a file containing the schema for the old database.
Examples
The usual method of upgrading a database while using the pg_upgrade utility is as
follows:
10. Copy old pg_hba.conf files and pg_options to their new location (that is,
/usr/local/pgsql/data).
13. Connect to the restored database and examine its contents carefully.
14. If the database is not valid, restore from your full dump file created in step 1.
15. If the database is valid, issue a vacuum command to update query-planning
statistics (that is, vacuumdb -z mydb).
Notes/Location
Not all upgrades can be accomplished with this tool. Check the release notes of the
new database version to see if pg_upgrade is supported.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
pgtclsh
Description
Usage/Options
Examples
$ pgtclsh myfile.tcl
Notes/Location
If pgtclsh is launched with no specified script file, it automatically enters into the
interactive Tcl interface.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
pgtksh
Description
The pgtksh command is essentially a Tk (wish) shell with the libpgtcl libraries
loaded. This is what the pgaccess program is based on.
Usage/Options
Examples
$ pgtksh myfile.tcl
Notes/Location
If pgtksh is launched with no specified script file, it automatically enters into the
interactive Tcl interface.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
psql
Description
Once psql is started and connected to the specified database, the user is entered
into an interactive shell. In this mode, commands can be issued to the PostgreSQL
back end, and the responses can be seen in real time.
Usage/Options
psql [options] [database [user]]
The psql options fall into two categories: command-line options, which are issued
while starting psql, and shell options, which can be issued inside the psql shell.
Command-Line Option Description
-a, --echo-all Echoes all processed lines to the screen. This is useful when
running a script to monitor progress or for debugging purposes.
-f, --file file Reads the specified file and executes the contained SQL
queries, then terminates.
-P, --pset val Allows setting the default print (output) style (for example,
aligned, unaligned, html, or latex).
-q Quiet mode.
-s, --single-step Prompts the user before each query is executed. Useful for
debugging or controlling execution of SQL scripts.
-t, --tuples-only Turns off the printing of column names and result totals.
Only prints data (tuples) returned.
Inside the psql shell, most options are prefaced with a backslash (\).
Shell Options Description
\C title Sets the specified title to print atop each query result set.
\c, \connect db Closes the current connection and connects to the specified
[user] database. Optionally, will connect as the specified user.
\copy table from | to filename | stdin | stdout [using delimiters 'char']
[with null as 'nullstr'] Performs a frontend (client-side) copy of data between
the specified table and a file or the client's standard input/output. Optionally
specifies the field delimiter and the string that represents NULL values.
\dd [obj] Shows the comments associated with all objects in the
current database. Optionally, only shows comments attached
to specified object.
\dT [pattern] Displays information on the data types included in the current
database. Optionally, only displays information on objects that
match the specified pattern.
\e, \edit file Launches an external editor (vi is default) to edit the file
specified.
\encoding type Sets encoding to the specified type or, with no parameter, lists
the current encoding type.
\f str Sets the field separator to the specified string. Default is the
piping symbol (|) (see also \pset).
\g [file | Sends output from the entered query to the specified file or
command] pipes it through the specified command (similar to \o).
\h, \help Displays a list of all the valid SQL commands. Optionally,
[command] displays more detailed help on the specified command.
\i file Reads input from the specified file and executes it.
\l, \list Lists all known databases and their owners (using \l+ also
displays comments).
\lo_export oid Exports the large object with the specified OID to the
file filename specified.
\lo_import file Imports the large object from the filename specified.
[comment] Optionally, provides a descriptive comment to be associated
with the LO.
\lo_unlink oid Deletes the large object with the specified OID from the
current database.
\pset parameter Allows the user to manually set one of several options that
affect how query results are displayed.
\set var value Sets the psql environmental variable to the specified value.
(Note: This is not the same as the SQL SET command.) Here
is a list of valid environmental variables:
Prompt types:
%M—Full hostname.
%m—Truncated hostname.
%>—Port number.
%/—Current database.
\w file command Outputs the current query buffer to the specified filename or
pipes it through the provided command.
Examples
Alternatively, to run an entire script called mydb.sql from the command line
(executing into the newriders database):
To perform the same example but this time display it in HTML mode (useful for CGI
programming):
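Those two invocations might look like the following, assuming the -H flag is used to select HTML output:

```shell
$ psql -f mydb.sql newriders
$ psql -f mydb.sql -H newriders
```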
To redirect queries to an output file from inside a psql shell and to include
descriptive titles to each data dump:
psql=>\o mycapture.txt
psql=>\qecho Listing of all authors
psql=>\qecho **********************
psql=>SELECT * FROM authors;
psql=>\qecho And their payroll info
psql=>\qecho **********************
psql=>SELECT * FROM payroll;
To list all files that end in .sql in the current directory from a psql shell interface
(notice the use of backticks, not single quotes):
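One way to do this, as a sketch, is to pass a backtick-quoted command to \echo:

```shell
psql=>\echo `ls *.sql`
```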
Notes/Location
The psql shell environment also supports variable substitution. The most basic
form associates a variable name with a value. For instance:
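A sketch of such a session (the variable name is hypothetical):

```shell
psql=>\set mytable authors
psql=>SELECT * FROM :mytable;
```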
As you can see, variable names are referenced by prefixing the name with a colon
(:).
To change the default editor when using the \e command, specify the correct value
to the PSQL_EDITOR variable.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
vacuumdb
Description
The vacuumdb command is a wrapper program for the VACUUM SQL statement.
Although there is no real difference in how the two operate, the vacuumdb command
is more convenient for scheduled execution, such as via a cron job.
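Because it is a wrapper, the underlying SQL can be issued directly instead; for example:

```sql
-- Reclaim storage and, with ANALYZE, update planner statistics
-- (roughly the SQL equivalent of vacuumdb -z).
VACUUM ANALYZE authors;
```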
Usage/Options
-p, --port port The port or socket file of the listening server.
-q, --quiet Do not return any responses from the back end.
Analyze options:
-t, --table table(col) Analyzes column only (must be used with -z).
Examples
$ vacuumdb -d newriders
$
$ vacuumdb newriders
$ vacuumdb -a
$
$ vacuumdb -z -d newriders -t authors
Notes/Location
RPM— /usr/bin
Source— /usr/local/pgsql/bin
Chapter 7. System Executable Files
Although most of these files are executable by user accounts, they have been
collected into a separate listing. They are usually relegated to specific database
system events, as opposed to the everyday use of the commands discussed in
Chapter 6, "User Executable Files." Generally, these files are to be used for server
control instead of as client utilities.
The following sections include the typical file locations of the discussed commands.
These locations usually differ depending on whether the database system was
installed from source code or from an RPM package.
Alphabetical Listing of Files
initdb
Description
The initdb command is used to prepare a directory location for a new PostgreSQL system.
The initdb command is usually performed with several other steps, as briefly outlined here:
6. initdb will generate the template1 database. (Each time you create a new database, it
is generated from template1.)
Usage/Options
Option Description
-E, -- Specifies the encoding type to use (the system must have been built with
encoding=type the multibyte encoding flag set to true).
Examples
$ initdb -D /usr/local/pgsql/data
Notes/Location
RPM— /usr/bin
Source— /usr/local/pgsql/bin
initlocation
Description
The initlocation utility is used to initialize a secondary data storage area. In some ways,
this command is similar to initdb, except that many internal catalog operations do not occur
with initlocation. Additionally, this command can be run as often as necessary, as opposed
to initdb, which is generally run only once per installation.
Usage/Options
initlocation path
Examples
$ initlocation /usr/local/pgsql/data2
Notes/Location
RPM— /usr/bin
Source— /usr/local/pgsql/bin
ipcclean
Description
The ipcclean command is a shell script designed to clean up orphaned semaphores and
shared memory after a back-end server crash.
Usage/Options
ipcclean
Notes/Location
This command makes certain assumptions about the naming convention used with output from
the ipcs utility. Therefore, this shell script might not be portable across all operating systems.
Warning!
pg_ctl
Description
Usage/Options
Option Description
-m fast Sends SIGTERM to the back end. Active transactions are issued an
immediate ROLLBACK.
-o Sends specified options to the postmaster. Options are usually quoted in order
"options" to ensure proper execution.
Examples
$ pg_ctl start
$ pg_ctl -m smart stop
Notes/Location
RPM— /usr/bin
Source— /usr/local/pgsql/bin
pg_passwd
Description
The pg_passwd utility is used to create and manipulate the password file needed if
authentication is enabled in PostgreSQL.
Usage/Options
pg_passwd filename
Examples
$ pg_passwd /usr/local/pgsql/data/pg_pword
File "/usr/local/pgsql/data/pg_pword" does not exist. Create? (y/n): Y
Username: barry
Password:
Re-enter password:
Notes/Location
The file must be in the path of the PostgreSQL database to be used for client authentication.
Additionally, the authentication method might need to be entered in the pg_hba.conf
configuration file.
RPM— /usr/bin
Source— /usr/local/pgsql/bin
postgres
Description
The postgres file is the actual server process for processing queries in PostgreSQL. It is
usually called by the multiprocess postmaster wrapper. (Both are actually the same file;
postmaster is a symlink to the postgres process.)
The postgres server is usually not invoked directly; rather, it is started by the
postmaster wrapper, which passes many of these options to the postgres process
upon execution.
Although generally not invoked directly, the postgres process can be executed in an
interactive mode that will allow queries to be entered and executed. However, such execution
should not be attempted if the postmaster process is running; data corruption could result.
Usage/Options
Option Description
-A 0|1 Specifies whether assert checking should be enabled. (This debugging tool is
available only if enabled at compile time. If it was enabled, the default is on.)
-B val The number of 8KB shared buffers to use. The default is 64.
-c Sets various runtime options. See the Advanced Option list later in this chapter for
var=val these options.
-d Sets the debug level. The higher the value, the more entries are output to the log.
level The default is 0; the range is usually valid up to 4.
-F Disables fsync system calls. Can result in improved performance, but there is a
risk of data corruption. Generally, use this option only if there is a specific reason to
do so; it is not intended for standard operation.
-s Sends timing statistics to stdout for each processed query. Useful for performance
tuning.
-S val Specifies the amount of kilobytes to be used for internal sorts and hashes before the
system calls on temporary files. This memory amount indicates that every system
sort and/or hash is capable of using up to this much memory. When the system is
processing complex sorts, multiple sort/hash instances will be used, and each one
will use up to this much memory. The default value is 512 kilobytes.
Advanced Option Description
-p Indicates that the specified database has been started by postmaster. Impacts
database buffer sizes, file descriptors, and so on.
-tpa Prints timing information for the system parser. (Cannot be used with the -s
option.)
-tpl Prints timing information for the system planner. (Cannot be used with the -s
option.)
-te Prints timing information for the system executor. (Cannot be used with the -s
option.)
-W sec Sleeps for the specified number of seconds before starting. Useful for developers
who need to start debugging programs in the interim.
Notes/Location
When starting the postgres process, the current OS username is selected as the PostgreSQL
username. If the current username is not a valid PostgreSQL user, the process will not continue.
postgres and postmaster are the same file (actually postmaster is a symbolic link to the
postgres executable). However, you cannot substitute one command for the other and expect
the same results. The postgres executable registers what name it was invoked by, and if it is
called as postmaster, certain options and assumptions are enabled.
Many of these options can be and are passed to the postgres process by using a configuration
file. (See the "pg_options/postgresql.conf" section in Chapter 8, "System Configuration Files
and Libraries.")
RPM— /usr/bin
Source— /usr/local/pgsql/bin
postmaster
Description
The postmaster is the multiuser implementation of the postgres application. In most cases,
this process is started at boot time, and log files are redirected to an appropriate file.
One postmaster instance is required to manage each database cluster. Starting multiple
instances can be achieved by specifying separate data locations and connection ports.
Usage/Options
Option Description
-A 0|1 Specifies whether assert checking should be enabled. (This debugging tool is only
available if enabled at compile time. If it was enabled, the default is on.)
-B val The number of 8KB shared buffers to use. The default is 64.
-c var=val Sets various runtime options. Valid settings include the following:
debug_level = integer
fsync = BOOLEAN
virtual_host = string
tcpip_socket = BOOLEAN
unix_socket_directory = string
ssl = BOOLEAN
max_connections = integer
port = integer
enable_indexscan = BOOLEAN
enable_hashjoin = BOOLEAN
enable_mergejoin = BOOLEAN
enable_nestloop = BOOLEAN
enable_seqscan = BOOLEAN
enable_tidscan = BOOLEAN
sort_mem = integer
show_query_stats = BOOLEAN
show_parser_stats = BOOLEAN
show_planner_stats = BOOLEAN
show_executor_stats = BOOLEAN
(Note: See the configuration options in Chapter 8 for more information about these
settings.)
-d Sets the debug level. The higher the value, the more entries are output to the log.
level The default is 0; the range is usually valid up to 4.
-F Disables fsync system calls. Can result in improved performance, but there is a
risk of data corruption. Generally, only use this option if there is a specific reason to
do so; it is not intended for standard operation.
-h host Specifies the host for which the server is to respond to queries. This defaults to
listening on all configured interfaces.
-i Enables clients to connect via TCP/IP. By default, only connections via UNIX
domain sockets are accepted.
-k path Specifies the directory the postmaster is to use for listening to UNIX domain
sockets. (The default is /tmp.)
-l Enables use of SSL connections. (Note: This option requires that SSL was enabled at
compile time and that the -i option has also been used.)
-N val Specifies the maximum connections permitted to this database back end. The
default value is 32, but this can be set as high as 1,024 if your system will support
that many processes. (Note: The -B option must be set with at least twice the
number of -N to operate correctly.)
-o Command-line options to pass to the postgres back end. If the option string
options contains any whitespace, quotes must be used. (Note: See postgres for valid
command-line switches.)
-p port The TCP/IP port on which to start listening for connections. The default port is either
5432 or the port set during compile time. (Note: If set to a nondefault port, all client
applications will need to specify this port number to connect successfully.)
Examples
$ postmaster -D /usr/local/pgsql/data
Start postmaster as a background process with a specified data directory and direct all error
messages to a specified log file:
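A common form of that command, as a sketch (the log-file path is hypothetical):

```shell
$ postmaster -D /usr/local/pgsql/data > /var/log/pgsql.log 2>&1 &
```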
Notes/Location
When starting the postmaster process, the current OS username is selected as the
PostgreSQL username. If the current username is not a valid PostgreSQL user, the process will
not continue.
postgres and postmaster are the same file (actually postmaster is a symbolic link to the
postgres executable). However, you cannot substitute one command for the other and expect
the same results. The postgres executable registers the name it was invoked by, and if it is
called as postmaster, certain options and assumptions are enabled.
Location of the file:
RPM— /usr/bin
Source— /usr/local/pgsql/bin
Chapter 8. System Configuration Files and Libraries
In addition to the executable files mentioned in the previous chapters, PostgreSQL
also includes files that deal with configuration settings. Additionally, depending on
what features are desired, a number of libraries are included. The following is a
listing of the configuration files used and the libraries included.
System Configuration Files
pg_options/postgresql.conf
Description
This is the configuration file that specifies what options are to be used when the server is started as
postmaster. The name of this file has changed between versions and will either be called pg_options,
postmaster.opts, or postgresql.conf, depending on your current version. The exact syntax and
options available within this configuration file also vary depending on the version.
Notes/Location
Essentially, this file is a text file that contains various command-line switches. A standard configuration
file might appear as follows:
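In the 7.1 postgresql.conf style, for example, such a file might contain lines like the following (a sketch; the values shown are illustrative defaults drawn from the options documented below):

```
shared_buffers = 64
max_connections = 32
sort_mem = 512
fsync = on
silent_mode = off
syslog = 0
```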
The exact syntax of the configuration file will depend on what version of PostgreSQL is running. The
postgresql.conf file, which is the method used in 7.1, accepts the following options:
CHECKPOINT_SEGMENTS (integer)
The maximum distance between automatic WAL checkpoints, in log file segments. The default is 3.
CHECKPOINT_TIMEOUT (integer)
The maximum time between automatic WAL checkpoints, in seconds. The default is 300.
CPU_INDEX_TUPLE_COST (float)
Sets the estimated cost of processing each tuple when used in an index scan.
CPU_OPERATOR_COST (float)
Sets the estimated cost for processing each operator in a WHERE clause.
DEADLOCK_TIMEOUT (integer)
Specifies the amount of time, in milliseconds, to wait on a lock before checking to see if there is a
deadlock condition or not.
DEBUG_ASSERTIONS (boolean)
Turns on various assertion checks. This is a debugging aid and is available only if assertions were enabled when PostgreSQL was compiled.
DEBUG_LEVEL (integer)
The value that determines how verbose the debugging output is. This option is 0 by default, which
means no debugging output. Values up to 4 are valid.
DEBUG_PRINT_QUERY, DEBUG_PRINT_PARSE, DEBUG_PRINT_PLAN, DEBUG_PRINT_REWRITTEN (boolean)
Specify what to print in the debug information. These print the query, the parse tree, the execution plan, or the query rewriter output, respectively, to the server log.
EFFECTIVE_CACHE_SIZE (float)
Sets the assumed size of the disk cache. This is measured in disk pages, which are normally 8KB apiece.
ENABLE_HASHJOIN (boolean)
The Boolean value to enable or disable hash joins. The default is on.
ENABLE_INDEXSCAN (boolean)
The Boolean value to enable or disable the use of index scan plan types. The default is on.
ENABLE_MERGEJOIN (boolean)
The Boolean value to enable or disable the use of merge-join plan types. The default is on.
ENABLE_NESTLOOP (boolean)
The Boolean value to enable or disable the use of nested-loop join plans. It's not possible to suppress
nested-loop joins entirely, but turning this variable off discourages the planner from using it.
ENABLE_SEQSCAN (boolean)
The Boolean value to enable or disable the use of sequential scan plan types. It's not possible to
suppress sequential scans entirely, but turning this variable off discourages the planner from using it.
ENABLE_SORT (boolean)
The Boolean value to enable or disable the use of sort steps. It's not possible to suppress sorts entirely,
but turning this variable off discourages the planner from using it.
ENABLE_TIDSCAN (boolean)
The Boolean value to enable or disable the use of TID scan plan types. The default is on.
FSYNC (boolean)
The Boolean value that enables or disables PostgreSQL use of the fsync() system call in several places
to make sure that updates are physically written to disk and do not hang around in the kernel buffer
cache. This increases the chance that a database installation will still be usable after an operating system
or hardware crash by a large amount. However, use of this option will degrade system performance. The
default is off.
GEQO (boolean)
The Boolean value to enable or disable genetic query optimization. This is on by default.
GEQO_THRESHOLD (integer)
Specifies how many FROM items a query must contain before GEQO optimization is used. The default is 11.
HOSTNAME_LOOKUP (boolean)
The Boolean value to specify whether to resolve IP addresses to hostnames. By default, connection logs
show only the IP address.
KRB_SERVER_KEYFILE (string)
Sets the location of the Kerberos server key file.
KSQO (boolean)
The Key Set Query Optimizer (KSQO) causes the query planner to convert queries whose WHERE
clauses contain many OR'ed AND clauses into a UNION-based form. KSQO is commonly used when
working with products like Microsoft Access that tend to generate queries of this form. The
default is off.
LOG_CONNECTIONS (boolean)
The Boolean value to enable or disable logging of each successful connection. This is off by default.
LOG_PID (boolean)
The Boolean value that enables or disables the log entry to prefix each message with the process ID of
the back-end process. The default is off.
LOG_TIMESTAMP (boolean)
The Boolean value to enable or disable each log message to include a timestamp. The default is off.
MAX_CONNECTIONS (integer)
Determines how many concurrent connections the database server will allow. The default is 32.
MAX_EXPR_DEPTH (integer)
Sets the maximum expression nesting depth that the parser will accept. The default value is high enough
for any normal query, but you can raise it if you need to. (If you raise it too high, however, you run the
risk of back-end crashes due to stack overflow.)
PORT (integer)
The TCP/IP port on which the server listens. The default is 5432.
SHARED_BUFFERS (integer)
Sets the number of 8KB shared memory buffers that the database server will use. The default is 64.
SHOW_QUERY_STATS, SHOW_PARSER_STATS, SHOW_PLANNER_STATS, SHOW_EXECUTOR_STATS (boolean)
Various Boolean values to set options that write performance statistics of the respective module to the
server log.
SHOW_SOURCE_PORT (boolean)
The Boolean value to enable or disable the showing of the outgoing port of the connected user. The
default is off.
SILENT_MODE (boolean)
The Boolean value that determines whether postmaster runs silently. If this option is set, postmaster
will automatically run in the background, and any controlling ttys are disassociated; thus, no messages
are written to stdout or stderr (the same effect as the postmaster's -S option). Unless some logging
system such as syslog is enabled, using this option is discouraged because it makes it impossible to see
error messages.
SORT_MEM (integer)
Specifies the amount of memory to be used by internal sorts and hashes before resorting to temporary
disk files. The value is specified in kilobytes and defaults to 512 kilobytes.
SQL_INHERITANCE (boolean)
The Boolean value to determine whether subtables are included in queries by default. By default, 7.1 and
above include this capability; however, this was not the case in prior versions. If you need the old
behavior, you can set this variable to off.
SSL (boolean)
The Boolean value to enable or disable SSL connections. The default is off.
SYSLOG (integer)
The value that determines whether postgres uses syslog for logging. If this option is set to 1,
messages go both to syslog and the standard output. A setting of 2 sends output only to syslog.
The default is 0, which means syslog is off. To use syslog, the build of postgres must be
configured with the --enable-syslog option.
SYSLOG_FACILITY (string)
This option determines the syslog "facility" to be used when syslog is enabled. You can choose from
LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7; the default is LOCAL0.
SYSLOG_IDENT (string)
If logging to syslog is enabled, this option determines the program name used to identify PostgreSQL
messages in syslog log messages. The default is postgres.
TCPIP_SOCKET (boolean)
If this is true, the server will accept TCP/IP connections. Otherwise, only local UNIX domain socket connections are accepted.
TRACE_NOTIFY (boolean)
The Boolean value to enable or disable debugging output for the LISTEN and NOTIFY commands. The
default is off.
UNIX_SOCKET_DIRECTORY (string)
Specifies the directory of the UNIX domain socket on which the postmaster is to listen for connections
from client applications. The default is normally /tmp.
UNIX_SOCKET_GROUP (string)
Sets the group owner of the UNIX domain socket. By default, this is unset, and the socket belongs to the default group of the server user.
UNIX_SOCKET_PERMISSIONS (integer)
Sets the access permissions of the UNIX domain socket. The default permissions are 0777, meaning
anyone can connect.
VIRTUAL_HOST (string)
Specifies the TCP/IP hostname or address on which the postmaster is to listen for connections from
client applications. Defaults to listening on all configured addresses (including localhost).
WAL_BUFFERS (integer)
The number of disk-page buffers in shared memory for the WAL log.
WAL_DEBUG (integer)
If nonzero, turns on WAL-related debugging output to the server log.
WAL_FILES (integer)
The number of log files created in advance at checkpoint time. The default is 0.
WAL_SYNC_METHOD (string)
The method used for forcing WAL updates out to disk. Possible values are FSYNC, FDATASYNC,
OPEN_SYNC, and OPEN_DATASYNC. Not all of these choices are available on all platforms.
All the preceding Boolean values will accept any of the following equivalent pairs:
ON/OFF, TRUE/FALSE, YES/NO, 1/0
RPM— /var/lib/pgsql/data/
Source— /usr/local/pgsql/data
/etc/logrotate.d/postgres
Description
Notes/Location
Although not an official part of the PostgreSQL distribution, many systems include a file to manage and
rotate the log files produced by PostgreSQL. These are usually cron jobs scheduled to run daily or
weekly. The RPM additions of this kind are usually configured for PostgreSQL installations that log
through syslog.
It is generally not advisable to attempt to rotate log files on an installation that is not configured
to use syslog. PostgreSQL keeps its connections to log files open at all times; therefore, rotating a
log file while postmaster is still active could result in unpredictable behavior.
If configuring PostgreSQL to run with syslog is not an option, the next best solution is to briefly stop
the postmaster service, rotate your log files, and then restart the database system.
For more information on PostgreSQL log files and syslog, see Chapter 9, "Databases and Log Files."
RPM— /etc/logrotate.d/postgres
Note:
This file was subsequently dropped from the most recent RPM package due to the confusion
resulting from the syslog versus PostgreSQL log issues identified here. However, older RPM
packages that are specifically designed to work with syslog will include the preceding file in
the specified location.
pg_hba.conf
Description
The pg_hba.conf file is a configuration file that is responsible for host-based access control. Essentially,
this is a text file that details how users are permitted to connect to the PostgreSQL back end.
This file has separate areas that deal with local or remote (TCP/IP) users, databases allowed to connect,
and authentication methods.
The format of a PostgreSQL access control file differs depending on whether a TCP/IP or a local UNIX
connection is being specified. The basic formats are as follows:
TCP/IP:
Local:
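As a hedged sketch, the two record layouts follow the 7.x pg_hba.conf conventions (field names here are placeholders, not literal keywords):

```
host   DATABASE  IP-ADDRESS  IP-MASK  AUTH-METHOD  [AUTH-ARGUMENT]
local  DATABASE  AUTH-METHOD  [AUTH-ARGUMENT]
```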
Option Description
Notes/Location
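Entries matching the description that follows might look like this (the databases, address block, and crypt method come from the text; the exact layout is a hedged sketch):

```
local  all                                       trust
host   web      192.168.0.0   255.255.255.0     trust
host   payroll  192.168.0.0   255.255.255.0     crypt
```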
In this case, all local connections are permitted. Similarly, any connection from the IP range 192.168.0.0
to 192.168.0.254 is permitted to the web database. However, a user from that address block trying to
connect to the payroll database will need to provide authentication provided by the crypt method.
RPM— /var/lib/pgsql/data/
Source— /usr/local/pgsql/data/
Library Files
Source: /usr/local/pgsql/lib
RedHat: /usr/lib
Entries shown as A -> B are symbolic links pointing to the actual library file.
libecpg.a
libecpg.so -> libecpg.so.X.Y.Z
libecpg.so.X -> libecpg.so.X.Y.Z
libecpg.so.X.Y.Z
libpgeasy.a
libpgeasy.so -> libpgeasy.so.X.Y
libpgeasy.so.X -> libpgeasy.so.X.Y
libpgeasy.so.X.Y
libpq.a
libpq.so -> libpq.so.X.Y
libpq.so.X -> libpq.so.X.Y
libpq.so.X.Y
libpq++.a
libpq++.so -> libpq++.so.X.Y
libpq++.so.X -> libpq++.so.X.Y
libpq++.so.X.Y
libpsqlodbc.a
libpsqlodbc.so -> libpsqlodbc.so.X.Y
libpsqlodbc.so.X -> libpsqlodbc.so.X.Y
libpsqlodbc.so.X.Y
Library files for the Tcl interface:
libpgtcl.a
libpgtcl.so -> libpgtcl.so.X.Y
libpgtcl.so.X -> libpgtcl.so.X.Y
libpgtcl.so.X.Y
plpgsql.so
/usr/lib/perl5/site_perl/5.005/<arch>/Pg.so
/usr/lib/python1.5/site-packages/_pgmodule.so
/usr/lib/php3/pgsql.so
/usr/lib/php4/pgsql.so
Chapter 9. Databases and Log Files
Depending on your specific installation, the location of your log and database files
will vary. Normally, they are located in the same directory as your base PostgreSQL
data files.
File Description
You will notice that all user-defined databases are stored neatly in the
$PGBASE/base directory. Every database created in PostgreSQL is stored in its own
directory under $PGBASE/base. Within each directory are two main classes of files:
system catalogs and user-created.
Note:
Some changes were made to these file locations starting in Version 7.1. In
particular, there now exists a template0 database that is a read-only copy of
the template1 database. Additionally, many of the preceding files are now
named according to their OID number; this change was made to facilitate
the new Write-Ahead Logging (WAL) implementation. Refer to the latest
documentation included with your system for more information.
System Catalogs
Every time a new database is created, PostgreSQL extracts a base set of system
catalogs from the template1 database. These files are used to track tables,
indexes, aggregates, operators, data types, and so on.
A basic set of system catalogs should look something like the following. (Different
versions contain different catalog files, but this is a representational sample.)
File Description
It is important to realize that these objects are accessible (from a DBA account) via
a standard SQL interface. From psql, for instance, these system catalogs can be
queried like normal SQL tables.
Warning!
User-Defined Catalogs
This directory also contains the names of any user-defined tables, indexes,
sequences, and so on. For instance, looking in the directory for your newriders
database, you see the following:
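The startup command being explained in the next paragraph is presumably of this form (a sketch; pglog is the log file named in the text):

```shell
$ postmaster -D /usr/local/pgsql/data > pglog 2>&1 &
```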
This will redirect the stdout of the postmaster process to the file named pglog
and will also redirect the stderr facility of postmaster to stdout (which is then
itself redirected to the specified log file).
The system log files and the database log files would reside in separate areas.
This can make debugging system failures problematic.
Depending on the size of your database, the frequency of its use, and your
networking architecture, this might be a fine solution. However, there are two
methods for dealing with the problems previously presented:
This type of solution usually mandates the use of cron and shell scripting. The
process usually occurs like this:
Typically, this can be completed in one or two minutes, depending of course on the
system hardware.
Configuring cron to perform these tasks is outside the scope of this book, but
generally it is a straightforward process.
The script to handle the actual log rotation can be done either as a simple shell
script or in a language like Perl or Python. A typical rotation scheme usually
renames (using mv) files, keeping only a specific amount of history. For instance:
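A minimal sketch of such a rotation, assuming the log file is named pglog and three generations of history are kept (both the file name and the depth are assumptions):

```shell
# rotate_logs: rotate the postmaster log in the given directory,
# keeping three generations of history (pglog.1 is the newest).
# The file name "pglog" and the history depth are assumptions.
rotate_logs() {
    cd "$1" || return 1
    if [ -f pglog.2 ]; then mv pglog.2 pglog.3; fi  # the oldest copy is overwritten
    if [ -f pglog.1 ]; then mv pglog.1 pglog.2; fi
    if [ -f pglog ];   then mv pglog pglog.1;   fi
    : > pglog                                       # begin a fresh, empty log
}

# Typical use from cron, after stopping (or pausing) postmaster:
# rotate_logs /usr/local/pgsql/data
```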
Typically, these events are run daily or weekly; however, for heavily used systems, a
more frequent schedule could be advisable.
The preceding approach would be fine for most PostgreSQL server installations;
however, a number of issues are still not addressed by this example, namely:
Log files are not integrated with other system log files (making debugging
more difficult).
The solution is to use the syslog facility present on most UNIX (Linux) systems.
1. Compile the source using the appropriate options to enable logging (that is,
--enable-syslog), or download the appropriate RPMs that have that
functionality enabled.
2. Enable the syslog option in the pg_options (or equivalent) file (that is,
syslog=1).
3. Add an entry such as the following to the /etc/syslog.conf file:
local0.* /var/log/postgresql
Using syslog is suggested for larger installations or when remote monitoring of the
database system is a priority.
Chapter 10. Common Administrative Tasks
The administration duties for dealing with a PostgreSQL system can be broken down
as follows:
Creating users
Performance tuning
Each of these issues requires specific knowledge of key areas of the system. Please
refer to the following sections that focus on these areas.
Compiling and Installation
Source-Based Installation
The source files can be retrieved from the PostgreSQL FTP site
(ftp.postgresql.org) or from numerous mirror sites around the world.
After the code is unpacked, you can delete the original tar.gz file if disk space is
an issue; otherwise, move it to a safe location.
Next, review the INSTALL text file included in the created directory for installation
notes. Briefly, the rest of the procedure is as follows:
2. Review the installation options for your system. Here is a partial list of options
supported (type ./configure --help for a full list):
--enable-locale
3. Configure the source code with those options selected (for example,
./configure --with-odbc).
4. Type make to compile the source.
5. If the make fails, examine the log files generated (usually in ./config.log)
for any reasons why the compile didn't work.
6. Type make install to install the binaries to the location specified (default is
/usr/local/pgsql).
7. Tell your machine where the libraries are located, either by setting the
LD_LIBRARY_PATH environment variable to the <BASEDIR>/lib path or
by editing the /etc/ld.so.conf file to include it.
8. Include the <BASEDIR>/bin path in the user's or system's search path (that
is, /etc/profile).
9. Create the directory to hold the databases, change the ownership to the DBA,
and initialize the location (assumes a user named postgres exists):
# mkdir /usr/local/pgsql/data
# chown postgres /usr/local/pgsql/data
# su - postgres
> /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
10. Start the postmaster server (as the DBA account) in the background.
Specify the data directory previously created, such as:
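One plausible invocation (the log file name is illustrative):

```shell
postmaster -D /usr/local/pgsql/data > server.log 2>&1 &
```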
11. As DBA, create the users you need using the createuser command.
12. Switch to the user created and create the database(s) needed (that is,
createdb).
Package-Based Installation
(examples)
postgresql-server-7.0.3-2.i386.rpm Server programs (req)
postgresql-7.0.3-2.i386.rpm Clients & Utilities (req)
postgresql-devel-7.0.3-2.i386.rpm Development Libraries
postgresql-odbc-7.0.3-2.i386.rpm ODBC Libraries
postgresql-perl-7.0.3-2.i386.rpm Perl interface
postgresql-python-7.0.3-2.i386.rpm Python interface
postgresql-tcl-7.0.3-2.i386.rpm TCL Interface
postgresql-tk-7.0.3-2.i386.rpm Tk Interface
postgresql-test-7.0.3-2.i386.rpm Regression Test Routines
3. Verify that a user for PostgreSQL was created by examining /etc/passwd (or
the equivalent).
4. Switch to the DBA user account (typically postgres) and create the users you
need (for example, createuser web).
Switch to that user and create the working database (for example, createdb web).
Creating Users
Database users are separate entities from regular operating system users.
Depending on the particular application, it might be possible to have only one or two
total database users. However, if multiple users need to connect to the database—
each with his or her own set of access rights—it is desirable to create individual user
accounts.
The easiest way to create users is to utilize the command-line utility createuser.
There are three main attributes to consider when creating new users:
Should they be able to create their own users? (Are they superusers?)
The actual act of creating users can take place either at the command line or in an
interactive SQL session.
>createuser web
>Shall the new user be allowed to create databases (y/n)? N
>Shall the new user be allowed to create users (y/n)? N
Creating a user from a SQL session enables some additional options that are not
available from the command-line utility. For instance, passwords, group
membership, and account expiration can all be set with this method.
Additionally, PostgreSQL enables users to be collected into logical groups for easier
permission management. To create a group, the following command should be
entered in a SQL session:
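For example (the group and user names are hypothetical; the WITH USER clause is optional):

```sql
CREATE GROUP webusers WITH USER web, kathy;
```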
Select (read)
Insert (write)
Update/Delete (write)
Rule (write/execute)
By default, the creator of a database is implicitly given all rights to all objects in the
database. These privileges are considered immutable for the DBA superuser
account.
To assign other users rights on database options, use the GRANT and REVOKE SQL
commands, such as:
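For instance, granting a hypothetical webusers group read access to an authors table while revoking general access (table and group names are illustrative):

```sql
GRANT SELECT ON authors TO GROUP webusers;
REVOKE ALL ON authors FROM PUBLIC;
```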
The PostgreSQL system also has a reserved keyword called PUBLIC that applies to
every user in the system (except the DBA). This can make setting blanket rules and
permissions much easier.
Typically, the users should be collected into logical groups that seek to match like
users together with respect to their permissions. Rights then can be assigned or
revoked for the entire group without having to specify every individual user.
Proper database maintenance ensures that the system will always function optimally
and that problems can be handled effectively. There are three main areas of
database maintenance:
Regular monitoring of log files can tip off administrators to potential issues that can
be corrected long before they become major problems. Some administrators even
write small custom scripts to parse log files and automatically mail any suspicious
entries to an email address so that the administrator can take further action.
cron is also a useful database maintenance tool, especially for performing routine
tasks like vacuumdb and log rotation. One of the primary reasons the vacuumdb
utility was created as a separate command-line utility was to facilitate its use as an
automated cron job.
Database Backup/Restore
The most critical component of any database maintenance plan is the database
backup and restore procedures. Once again, PostgreSQL makes the administrator's
job easier by providing command-line tools such as pg_dump, pg_dumpall, and
pg_restore. Like commands such as vacuumdb, these are especially suited to be
run as cron jobs.
After the command has redirected its output to a standard OS file, standard backup
tools can be used to securely archive it.
Here are some factors to consider when trying to evaluate an optimal backup plan.
Will you need to selectively restore database files (that is, specific tables and
so on)?
This new format provides a great deal of flexibility when it comes time to
restore. The database schema, data, functions, or specific tables can be
selectively restored. Additionally, this new format stores data in a compressed
format that causes fewer problems when dealing with very large databases.
For instance, to dump out the newriders database in the special format and
then selectively restore only the payroll table:
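A hedged sketch of those two commands, using pg_dump's custom archive format (-Fc) and pg_restore's table-selection switch (-t); the dump file name is illustrative:

```shell
pg_dump -Fc newriders > newriders.dump
pg_restore -d newriders -t payroll newriders.dump
```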
There are a number of answers to this problem, ranging from upgrading your
PostgreSQL database system to piping output through special tools.
Alternatively, another method for achieving the same effect is to pipe the
output of pg_dump through the gzip command. This can be done using any
version of PostgreSQL and standard UNIX system commands:
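For example (file names illustrative):

```shell
pg_dump newriders | gzip > newriders.sql.gz
```

Restoring is the reverse pipeline: gunzip -c newriders.sql.gz | psql newriders.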
If the resulting zipped files are still too large, the other option is to use the
split command. This example will split the output file into numerous 1GB
files.
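A sketch using the standard split utility; with -b 1000m each output piece is capped at 1,000MB (the output prefix is illustrative):

```shell
pg_dump newriders | gzip | split -b 1000m - newriders.sql.gz.
# Produces newriders.sql.gz.aa, newriders.sql.gz.ab, and so on.
# Reassemble later with: cat newriders.sql.gz.* | gunzip | psql newriders
```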
For complex installations, lost configuration files can be very time consuming
to try to re-create by hand. Make sure you have secure, offline copies of all
your PostgreSQL configuration files.
Performance Tuning
Generally speaking, there are no surefire methods for obtaining optimal performance from a
database system. However, there are guidelines that can assist administrators in implementing
successful performance-tuning strategies.
Hardware Considerations
If you notice that your database system is consistently running at high CPU loads or that an
excessive amount of hard-disk paging is occurring, it might be necessary to upgrade your
hardware.
These are the four biggest hardware issues related to database performance:
RAM. Not enough RAM will result in the database constantly having to swap memory to
hard disk. This expensive and time-consuming operation always incurs a performance hit.
Hard disk. Slow hard drives and controllers can result in a severe lack of performance.
Upgrading to newer controllers and/or drives can result in a significant boost in system
speed. Particularly, the use of striped RAID arrays can benefit system performance.
CPU. Insufficient CPU resources can slow down system responsiveness, particularly if many
large queries are being processed simultaneously. Because PostgreSQL is not
multithreaded, there is no direct benefit to be gained by running it on a multi-CPU system.
However, each connection does receive its own process, which could be benefited by being
spread across multiple CPUs.
Network. No matter how robust the system's hardware, performance will suffer if there are
networking problems. Upgrading networking cards, adding switches to the LAN, and
increasing bandwidth capacity can all positively impact system performance.
Although it is common to blame hardware for database sluggishness, most often there are
tunings to the underlying database code that could improve performance.
Some general rules can be followed to help tune SQL database code:
Any fields in which joins are being done or that are the focus of numerous SELECT…WHERE
clauses should be indexed. However, there is a balance to strike between the number of
indexes on a field and performance. Indexes help with selection but penalize insertion or
updates. So having every field indexed is not a good idea.
If numerous tables are being updated or inserted, encapsulating the statements inside of
one BEGIN…COMMIT clause can significantly improve performance.
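As a sketch (the table and values are hypothetical), the statements below incur a single transaction commit instead of four separate ones:

```sql
BEGIN;
INSERT INTO authors VALUES (1, 'Stinson');
INSERT INTO authors VALUES (2, 'Jones');
INSERT INTO authors VALUES (3, 'Smith');
INSERT INTO authors VALUES (4, 'Brown');
COMMIT;
```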
Use cursors.
Using cursors can dramatically improve system performance. In particular, using cursors to
generate lists for user selection can be much more efficient than running numerous isolated
queries.
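A sketch of the list-generation case (names hypothetical); only the rows actually fetched are transferred to the client, and in 7.x a cursor must live inside a transaction block:

```sql
BEGIN;
DECLARE author_list CURSOR FOR SELECT name FROM authors ORDER BY name;
FETCH 20 FROM author_list;   -- first screenful for the user
CLOSE author_list;
COMMIT;
```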
Limit use of triggers and rules.
Although triggers and rules are an important part of data integrity, overuse will severely
impact system performance.
Starting with PostgreSQL Version 7.1, it is possible to control how the query planner will
operate by using an explicit JOIN syntax. For instance, both of the following queries
produce the same results, but the second unambiguously gives the query planner the order
to proceed:
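As a sketch of the two forms (the tables and join keys are hypothetical):

```sql
-- Implicit form: the planner chooses the join order itself.
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.ref;

-- Explicit form: the joins are performed in the order written (7.1 and later).
SELECT * FROM a JOIN b ON (a.id = b.id) JOIN c ON (b.ref = c.ref);
```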
Enforcing some minimal standards on the front end of a database application can improve
overall system performance. Checking input fields for valid and/or minimal information
requirements can obviate the need to do tremendously expensive queries. For instance,
enforcing that a front end requires more than three letters on a last name will prevent the
back end from having to process a query to return all records in which the last name begins
with an "S," which could be a very expensive query and not provide any real value to the
user.
PostgreSQL comes with certain default or preset settings with regard to buffer size, simultaneous
connections, and sort memory. Usually these settings are fine for a standalone database.
However, they usually are set cautiously low to make as little impact on the system as possible
while idle.
For larger, dedicated servers with several hundred or thousand megabytes of data, these settings
will need to be adjusted.
It is often assumed that setting the options to higher values will automatically improve
performance, but this is not necessarily so. Generally, you should not exceed more than 20% of
your system limits with any of
these settings. It is important to leave sufficient RAM for kernel needs; a sufficient amount of
memory particularly needs to be available to handle network connections, manage virtual
memory, and control scheduling and process management. Without such tolerances, performance
and responsiveness of the system will be negatively impacted.
There are three crucial run-time settings that impact database performance: shared buffers, sort
memory, and simultaneous connections.
Shared Buffers
The shared buffer option (-B) determines how much RAM is made available to all of the server
processes. Minimally, it should be set to at least twice the number of simultaneous connections
allowed.
Shared buffers can be set either in the postgresql.conf file or by issuing a direct command-
line option to the postmaster back end. By default, many PostgreSQL installations come with a
preset value of 64 for this setting. Each buffer consumes 8KB of system RAM. Therefore, in a
default setting, 512KB of RAM is dedicated for shared buffers.
If you are setting up a dedicated database that is expected to handle very large datasets or
numerous simultaneous connections, it might need to be set as high as 15% of system RAM. For
instance, on a machine with 512MB of RAM, that means a shared buffer setting of 9,000.
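The arithmetic behind that figure can be checked directly (numbers from the text: 512MB of RAM, 15% reserved, 8KB per buffer):

```shell
ram_kb=$((512 * 1024))               # 512MB expressed in KB
buffers=$((ram_kb * 15 / 100 / 8))   # 15% of RAM, divided into 8KB buffers
echo "$buffers"                      # prints 9830 -- roughly the 9,000 the text cites
```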
Ideally, buffer space should be large enough to hold the most commonly accessed table
completely in memory. Yet it should be small enough to avoid swap (page-in) activity from the
kernel.
Sort Memory
The postgres back end (which typically is only called by the postmaster process) has a
setting (-S) that determines how much memory is made available to query sorts. This value
determines how much physical RAM is exhausted before resorting to disk space, while trying to
process sorts or hash-related functions.
This setting is declared in KB, with the standard default being 512.
For complex queries, many sorts and hashes might be running in parallel, and each one will be
allowed to use this much memory before swapping to hard disk begins. This is an important point
to stress: if you blindly set this option to 4,096, every complex query and sort would be
allowed to take as much as 4MB of RAM. Depending on your machine's available resources, this
might cause the virtual memory subsystem of your kernel to swap this memory out.
Unfortunately, this is usually a much slower process than just allowing PostgreSQL to create
temporary files in the first place.
Simultaneous Connections
There is a postmaster option (-N) that will set the number of concurrent connections that
PostgreSQL will accept. By default, this setting is set to 32. However, it can be set as high as
1,024 connections. (Remember that shared buffers need to be set to at least twice this number.)
Also remember that PostgreSQL does not run multithreaded (yet); therefore, every connection
will spawn a new process. On some systems, like UNIX, this poses no significant problems.
However, on NT, this can often become an issue.
The EXPLAIN command describes the query plan being evaluated for the supplied query. It
returns the following information:
Starting cost. This is an estimate of how much time passes before the output scan can
begin. Typically, this number is nonzero when the plan must wait on another step to complete
before it can begin, as is the case with subselects and joins.
Total cost. This is an estimation of how much time would be spent if all rows were
returned. This occurs regardless of whether any other factors, like a LIMIT statement,
would've prevented all rows from being returned.
Output rows. This is the estimated number of rows returned. As in the preceding, this
happens even if factors like LIMIT statements would prevent it.
Estimated average width. This is the width, in bytes, of the average row.
The time units previously mentioned are not related to an objective amount of time; they are an
indication of how many disk page fetches would be needed to complete the request.
For example:
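The query under discussion was presumably similar to the following; the plan line shown is a hypothetical illustration of the 7.x EXPLAIN output format, not actual measured costs:

```sql
EXPLAIN SELECT * FROM authors WHERE age < 10000;
-- NOTICE:  QUERY PLAN:
-- Seq Scan on authors  (cost=0.00..22.50 rows=1000 width=40)
-- (the cost figures above are invented for illustration)
```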
In this example, you can see that the total cost increased slightly. It is interesting to note that
although there is an index on age on this table, the query planner is still using a sequential scan.
This is due to the fact that the search criterion is so broad; an index scan would not be of any
benefit. (Obviously, all values in the age column are less than 10,000.)
If you constrain the search criterion slightly more, you can see some changes:
You are still using a sequential scan, although the number of rows returned is now lower.
Further constraints can produce results that are more dramatic:
A number of things are interesting about this result. First, you have finally constrained the
criterion enough to force the query planner to make use of the age_idx index. Second, both the
total cost and the number of returned rows are dramatically reduced.
You can see the tremendous speed gain you were able to achieve by using such a limited
criterion.
Using EXPLAIN on more complex queries can sometimes illuminate potential problems with the
underlying database structure.
This output produces some interesting facts about the underlying database structure. Obviously,
the authors table has an index on name, but the payroll table appears to be resorting to
using sequential scans and sorts to match fields.
After investigating, it is determined that, in fact, the payroll table does not have an
appropriate index for this join. So, after an index is created, you get the following results:
By including an index on the payroll table, you have now achieved a 25% improvement in query-
execution speed.
Running EXPLAIN on your queries is a good way to uncover hidden bottlenecks that are
impacting system performance.
In fact, for non-hardware–related problems, the EXPLAIN command is probably the single best
tool that a DBA can use to solve performance problems. EXPLAIN provides the information
necessary to intelligently allocate system resources, such as shared buffers, and optimize your
queries and indexes for greater performance.
One of the best ways to use the EXPLAIN command is as a benchmark generation tool. This way,
when changes are made to table schema, indexes, hardware, or the operating system, a valid
comparison can be made to determine how much these changes affected system performance.
Part IV: Programming with PostgreSQL
11 Server-Side Programming
12 Creating Custom Functions
13 Client-Side Programming
The choice of language and approach depends heavily on several factors. Procedural
language programming is less complex and therefore enables a faster development
cycle. Procedural language programming is the preferred method for performing
common extensions to the base PostgreSQL system, and as such, it assists in
maximizing code reuse.
Utilizing the external APIs is appropriate when fine-grained control and/or speed of
execution is required. Moreover, utilizing the externally available APIs might be the
only way to interface custom applications with the back-end system in specific
cases.
Benefits of Procedural Languages
This chapter specifically focuses on using the procedural languages (PL) available in
PostgreSQL. Regardless of the specific PL chosen, they all share a common set of
advantages to the developer. Those advantages include the following:
Control structures. By default, the SQL language does not allow the
programmer to use the rich set of control structures and conditional
evaluations included in other common programming languages. For this
reason, the included PLs allow a developer to marry such traditional control
structures with the SQL language. This is particularly useful when creating
complex computations and triggers.
Security. The included PostgreSQL PLs are trusted by the back-end system
and have access to only a limited set of system-wide functions. In particular,
the included PLs operate, at the system level, with the permissions granted
to the base postgres user, but because they cannot reach the file system
directly, extraneous file system objects are safe from any errant code.
Installing Procedural Languages
In a default installation, PostgreSQL will automatically include the capability for the
system to access code written in the PL/pgSQL language. Both PL/Tcl and PL/Perl
can also be included by setting their respective compile-time variables (that is,
--with-tcl or --with-perl).
SQL Declaration
The location of the shared library, which acts as a handler, might vary from
installation to installation. RPM-based installations usually place it in the
/usr/lib/pgsql directory. Source-based installations will depend on what path
was supplied to the install script. The UNIX find command can always be used with
great effect in these situations.
For instance:
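A sketch of the two-step SQL declaration for PL/pgSQL, assuming the handler library turned up in /usr/lib/pgsql; the path must match your own installation:

```sql
CREATE FUNCTION plpgsql_call_handler() RETURNS OPAQUE
    AS '/usr/lib/pgsql/plpgsql.so' LANGUAGE 'C';

CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql'
    HANDLER plpgsql_call_handler
    LANCOMPILER 'PL/pgSQL';
```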
The createlang utility can be used to register either the PL/Tcl or PL/Perl
language with the back-end server. Moreover, the createlang utility also accepts
an option to declare what database the language is registered in. If a language is
registered in the template1 database, then that language will be available in all
future databases subsequently created.
For instance:
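A sketch of the corresponding createlang invocation (run as a user with the appropriate database privileges):

```shell
createlang pltcl template1
```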
This command will automatically register the PL/Tcl language in the template1
database and in all subsequent databases.
PL/pgSQL
The PL/pgSQL language is the default language typically used to perform server-side programming. It combines
the ease of SQL with the power of a scripting language.
With PL/pgSQL, it is possible to build custom functions, operators, and triggers. A standard use might be to
incorporate commonly called queries inside the database. Many RDBMSs refer to this as stored-procedures, and
it offers a way for client applications to quickly request specific database services without the need for a lengthy
communication transaction to occur. The overhead involved in establishing a conversation between a client and
server machine can often significantly slow down the apparent speed of the system.
When a PL/pgSQL-based function is created, it is compiled internally as byte code. The resultant near-binary
code is then executed each time the function is called. PostgreSQL will execute the PL/pgSQL compiled code
rather than having to reinterpret individual SQL commands. Therefore, this can result in a significant
performance increase compared to reissuing the same SQL commands time after time.
Another benefit of using PL/pgSQL is portability. Because PL/pgSQL is executed entirely within
the PostgreSQL system, PL/pgSQL code can be run on any platform that runs PostgreSQL.
<label declaration>
[DECLARE
…Statements… ]
BEGIN
…Statements…
END;
Any number of these blocks can be encapsulated inside each other, for instance:
<label declaration>
[DECLARE
…Statements… ]
BEGIN
[DECLARE
…Statements… ]
BEGIN
…Statements…
END;
…Statements…
END;
When PostgreSQL encounters multiple groups of DECLARE…BEGIN…END statements, it interprets all variables as
local to their respective group. Variables declared in an inner block are not visible to neighboring or parent
blocks, although an inner block can reference variables declared in the blocks that enclose it, unless it
redeclares the same name. For instance, in this example, all the myvar variables are local to their respective
subgroups:
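A sketch of such shadowing, using a hypothetical function; the inner myvar hides the outer one only between its own BEGIN and END:

```sql
CREATE FUNCTION scope_demo() RETURNS INTEGER AS '
DECLARE
    myvar INTEGER := 1;          -- outer myvar
BEGIN
    DECLARE
        myvar INTEGER := 2;      -- inner myvar; shadows the outer one
    BEGIN
        myvar := myvar * 10;     -- changes only the inner myvar
    END;
    RETURN myvar;                -- the outer myvar, still 1
END;
' LANGUAGE 'plpgsql';
```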
Comments
PL/pgSQL has two different comment styles: one for inline comments (such as --) and another for comment
blocks (such as /* … */). For instance:
BEGIN
Some-code --this is a comment
<…>
<…>
<…>
Some-more-code
<…>
/* And this
is a comment
block */
END;
Variable declarations are made in the DECLARE block of the PL/pgSQL statement. Any valid SQL data type can
be assigned to a PL/pgSQL variable. Declaration statements follow this syntax:
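A sketch of the general declaration form, followed by some hypothetical declarations:

```sql
name [ CONSTANT ] type [ NOT NULL ] [ { DEFAULT | := } expression ];

-- For example:
DECLARE
    counter  INTEGER := 0;
    empname  VARCHAR(30);
    taxrate  CONSTANT NUMERIC := 0.28;
```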
If a variable type is not known in advance, the programmer can make use of the %TYPE and %ROWTYPE
attributes, which automatically adopt the type of a specific table column or of an entire table row.
For instance, if you wanted to automatically type the variable myvar as the same type as the table/field
payroll.salary, you could use the following:
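A minimal sketch of that declaration:

```sql
DECLARE
    myvar payroll.salary%TYPE;
```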
Alternatively, an entire database row can be typed by using the %ROWTYPE syntax. For instance:
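A sketch of a row-typed declaration; the row's fields are then reachable with dot notation:

```sql
DECLARE
    myrow payroll%ROWTYPE;
-- myrow.salary, myrow.name, and so on are then available
```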
PL/pgSQL can accommodate up to 16 passed variables, which it refers to by their ordinal number. The
numbering sequence starts at 1; therefore, $1 represents the first variable passed, $2 the second, and so on.
There is no need to declare the data type of the passed variable; PL/pgSQL will automatically cast the
appropriate variable number as the proper data type.
Using the ALIAS keyword, however, enables the programmer to alias a more descriptive variable name to the
ordinal number. For instance:
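A sketch of ALIAS in a hypothetical function:

```sql
CREATE FUNCTION raise_salary(NUMERIC) RETURNS NUMERIC AS '
DECLARE
    amount ALIAS FOR $1;         -- descriptive name for $1
BEGIN
    RETURN amount * 1.1;
END;
' LANGUAGE 'plpgsql';
```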
Additionally, the RENAME command can be used to rename current variables to alternate names. For instance:
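A minimal sketch of RENAME, which appears in the declarations section:

```sql
DECLARE
    RENAME id TO user_id;
```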
Control Statements
PL/pgSQL supports most of the common control structures such as IF...THEN and WHILE loops, and FOR
statements. Most of the syntax of these statements works as it does in other languages. The following sections
outline the basic format expected by these control statements.
IF…THEN…ELSE…ELSE IF
In addition to just the basic IF…THEN statement, PL/pgSQL also provides the capability to perform ELSE and
ELSE IF exception testing. A string of ELSE IF conditional tests is analogous to using a CASE or SWITCH
statement, which is often found in other programming languages.
IF conditional-expression THEN
execute-statement;
END IF;
IF conditional-expression THEN
execute-statement;
ELSE
execute-statement;
END IF;
IF conditional-expression THEN
execute-statement;
ELSE IF conditional-expression2 THEN
execute-statement;
END IF;
END IF;
(Because ELSE IF begins a nested IF statement, it requires a matching END IF of its own.)
LOOPS
Like all programming languages, PL/pgSQL includes the capability to create code loops that repeat while
certain conditions are met. Loops can be particularly useful when traversing a series of rows in a table and
performing some manipulation on each.
LOOP
Statements;
END LOOP;
Such as the following, which runs forever because nothing ever causes it to exit:
LOOP
x:=x+1;
END LOOP;
Or, alternatively, the EXIT directive can be used with an IF…THEN statement to create an exit point.
LOOP
x:=x+1;
IF x>10 THEN
EXIT;
END IF;
END LOOP;
Another way of performing the preceding task is to use the EXIT WHEN statement, such as:
LOOP
x:=x+1;
EXIT WHEN x>10;
END LOOP;
The WHILE clause can be included to offer a cleaner implementation of the preceding, such as:
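A minimal sketch of the WHILE form of the same loop:

```sql
WHILE x <= 10 LOOP
x:=x+1;
END LOOP;
```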
In contrast to a WHILE -type loop, a FOR loop is expected to perform a fixed number of iterations. The FOR
statement expects the following syntax:
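A sketch of the general form:

```sql
FOR name IN [ REVERSE ] expression .. expression LOOP
    statements
END LOOP;
```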
For instance, the following two examples count from 1 to 100 and from 100 to 1, respectively:
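Sketches of both directions, accumulating into a hypothetical total variable:

```sql
FOR i IN 1..100 LOOP
    total := total + i;
END LOOP;

FOR i IN REVERSE 100..1 LOOP
    total := total + i;
END LOOP;
```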
Although these examples are similar in functionality to the WHILE loops, the real power of using FOR loops is
for traversing record sets. For instance, this example traverses through the payroll table and summarizes the
total amount paid out for a given payroll period:
retval:=retval+payroll.salary;
END LOOP;
RETURN retval;
END;
' LANGUAGE 'plpgsql';
Using SELECT
PL/pgSQL has some slight differences from standard SQL in how the SELECT statement operates inside of a
code block. The SELECT…INTO command normally creates a new table; inside of a PL/pgSQL code block,
however, this declarative assigns the selected row to a variable placeholder. For instance, this example declares
a variable myrecs as a RECORD and fills it with the output of a SELECT query.
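A sketch of such a function, assuming a hypothetical employees table with an email column:

```sql
CREATE FUNCTION has_email(INTEGER) RETURNS INTEGER AS '
DECLARE
    myrecs RECORD;
BEGIN
    SELECT INTO myrecs * FROM employees WHERE emp_id = $1;
    IF myrecs.email IS NULL THEN
        RETURN 0;    -- no email on file
    END IF;
    RETURN 1;
END;
' LANGUAGE 'plpgsql';
```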
In the preceding example, the existence of an email is determined by comparing it against a SQL NULL value.
Alternatively, the NOT FOUND clause can be used following a SELECT INTO query. For example:
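A minimal sketch of the NOT FOUND form:

```sql
SELECT INTO myrecs * FROM employees WHERE emp_id = $1;
IF NOT FOUND THEN
    RETURN 0;        -- no matching row
END IF;
```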
There are two basic methods for executing code within a current code block. If a return value is not required,
the developer can call the code with the PERFORM command.
The following example gives an indication of how the PERFORM command would be used. First, a custom
function is defined, addemp, which accepts the parameters needed to create an employee. If the employee
already exists, the function exits with a 0 exit code; if the employee was created, the exit
code is a 1. The following is an example of your first function:
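A sketch of such a function, assuming a hypothetical employees table:

```sql
CREATE FUNCTION addemp(VARCHAR, INTEGER, INTEGER) RETURNS INTEGER AS '
DECLARE
    empname ALIAS FOR $1;
    empid   ALIAS FOR $2;
    empage  ALIAS FOR $3;
    tmp     RECORD;
BEGIN
    SELECT INTO tmp * FROM employees WHERE emp_id = empid;
    IF FOUND THEN
        RETURN 0;    -- employee already exists
    END IF;
    INSERT INTO employees (name, emp_id, age) VALUES (empname, empid, empage);
    RETURN 1;        -- employee created
END;
' LANGUAGE 'plpgsql';
```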
After the preceding is created, you can now call this function from another by using the PERFORM statement. As
mentioned earlier, the PERFORM statement ignores any return values from the called function. So, in this case,
the returned 0 or 1 exit code will be ignored. However, due to the nature of how the addemp function is being
used, that is not a concern.
<function is created>
<…Some Code…>
…
/*Traverse List and run against addemp function */
FOR emp IN SELECT * FROM TempEmps LOOP
PERFORM addemp(emp.name, emp.emp_id, emp.age);
END LOOP;
…
<…Some Code…>
<End Function>
In the preceding case, no return values are processed from the PERFORM addemp clause. In this instance, this
is a desired behavior because the addemp function will only add employees when it is appropriate to do so.
The EXECUTE statement contrasts with the PERFORM command in that, instead of executing predefined
functions, the EXECUTE statement is designed to handle dynamic queries.
For instance, the following code snippet gives a brief example of how this could be used:
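A sketch of a dynamic query built with EXECUTE, using the quote_ident helper to keep the interpolated table name safe; the function name and purpose are hypothetical:

```sql
CREATE FUNCTION purge_table(TEXT) RETURNS INTEGER AS '
BEGIN
    EXECUTE ''DELETE FROM '' || quote_ident($1);
    RETURN 1;
END;
' LANGUAGE 'plpgsql';
```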
The preceding example shows how a basic dynamic query can be created using the EXECUTE statement.
However, much more complex uses are possible. In fact, it is possible to actually use the EXECUTE statement to
create custom functions within other functions.
PL/pgSQL uses the RAISE statement to insert messages into the PostgreSQL log system. The basic format for
the RAISE command is as follows:
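A sketch of the general form and a hypothetical call; the doubled quotes are required because the statement sits inside a single-quoted function body:

```sql
RAISE level ''format'' [, identifier [, ...] ];

-- For example:
RAISE NOTICE ''Employee % has salary %'', empid, salary;
```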
format—Uses the % character to denote the placeholder for the comma-separated list in identifier.
DEBUG will be silently ignored if debugging is turned off (compile-time option). NOTICE will write the message
to the client application and enter it in the PostgreSQL system log file. EXCEPTION will perform all the actions
of NOTICE and additionally force a ROLLBACK from the parent transaction.
Unfortunately, PL/pgSQL does not have built-in mechanisms for detecting or recovering from an error based on
RAISE events. This can be done either by setting specific return variables or through explicit trapping done in
the client application. However, in most cases—particularly if the transaction is aborted—not much can be done
with regard to automatic recovery; usually human intervention will be required at some level.
PL/pgSQL also includes the capability for a function to retrieve certain diagnostic settings from the PostgreSQL
back end while in process. GET DIAGNOSTICS can be used to retrieve the ROW_COUNT and the RESULT_OID.
The syntax would be as follows:
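A minimal sketch, assuming previously declared rowcount and newoid variables:

```sql
GET DIAGNOSTICS rowcount = ROW_COUNT;
GET DIAGNOSTICS newoid   = RESULT_OID;
```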
The RESULT_OID is only meaningful when retrieved immediately after an INSERT has been performed in the code.
Notes
The BEGIN and END statements that define a PL/pgSQL code block are not analogous to the BEGIN…END SQL
transaction clause. The SQL BEGIN…END statements define the start and commit of a transactional statement. A
PL/pgSQL function is automatically part of either an explicit or implicit transaction in the SQL query that called
it. Because PostgreSQL does not support nested transactions, it is not possible to have a transaction be part of
a called function.
Just like with standard SQL declarations, in PL/pgSQL, arrays can be used by utilizing the standard notation (for
example, myints INTEGER[5];).
The main differences between PL/pgSQL and Oracle's procedural languages are that PostgreSQL can overload
functions, CURSORS are not needed in PostgreSQL, default parameters are allowed in function calls in
PostgreSQL, and PostgreSQL must escape single quotes. (Because the function itself is already in quotes,
queries inside a function must use a series of quotes to remain at the proper level.) There are other
differences, but most of these deal with specific syntax issues; consult an Oracle PL/SQL book for more
information.
PL/Tcl
The PL/Tcl language allows a trusted version of the popular Tool Command Language
(Tcl) to be used when creating custom functions or triggers in PostgreSQL. Although a
full explanation of the Tcl language is outside the scope of this book, we will highlight
some of the major features and provide some examples.
The major difference between the regular Tcl language and PL/Tcl is that the latter is
running in a trusted mode. This means that no OS-level activity can be performed.
Moreover, only a limited set of Tcl commands is enabled; for example, PL/Tcl functions
cannot be used to create new data types in PostgreSQL.
Much of the syntax in PL/Tcl is the same as Tcl in general. The following is a brief
synopsis of how to use Tcl.
Comments
Like many scripting languages, the default comment indicator is the pound sign (#).
Any line that begins with this symbol is ignored entirely. For instance:
Variable Assignment
Tcl accepts variable assignments. For instance, to assign a variable, you could do the
following:
set myval 10
set mystr "This is my string"
set myval_2 myval+100
The first two examples are obvious: The variable myval is set to a numerical value of
10, and the variable mystr is set to a string. However, the last example is deceptive.
On first look, it would appear that the variable myval_2 should be equal to 110, but
actually it is equal to the string myval+100. To perform variable substitution, use the
following syntax:
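A minimal sketch; the expr command inside brackets evaluates the arithmetic before the assignment happens:

```tcl
set myval 10
set myval_2 [expr $myval + 100]
# myval_2 is now 110
```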
PL/Tcl uses the $ symbol to indicate that a variable is being referenced. Additionally,
anything enclosed in brackets ([]) is evaluated as Tcl code.
Control Structures
Like all modern scripting languages, Tcl has the standard flow-control mechanisms for
determining code-path execution. For instance, the standard IF block looks like this:
if {conditional-expression} {
#code block
}
if {conditional-expression} {
#code block
} else {
#something else
}
(In Tcl, the else keyword must appear on the same line as the preceding closing brace.)
Tcl also supports the standard WHILE and FOR loops. For instance:
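A minimal while sketch:

```tcl
set x 0
while {$x < 10} {
    incr x
}
```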
Or, alternatively, a FOR loop could be used. The FOR loop takes the following syntax:
For instance:
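A sketch of the for loop, whose four arguments are the start script, the test expression, the next script, and the body:

```tcl
for {set x 0} {$x < 10} {incr x} {
    # loop body
}
```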
The Tcl language also supports a more powerful FOR loop called FOREACH. The basic
syntax is as follows:
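A sketch of the general form:

```tcl
foreach varname list {
    # body, run once per list element
}
```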
For example:
Or alternatively:
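Two sketches: a simple traversal, and the multi-variable form that consumes the list in pairs:

```tcl
set total 0
foreach num {1 2 3 4} {
    set total [expr $total + $num]
}
# total is now 10

foreach {key val} {name Fred age 29} {
    # first pass: key=name, val=Fred; second pass: key=age, val=29
}
```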
Tcl also supports the SWITCH control structure. The basic syntax is as follows:
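A sketch of the general form:

```tcl
switch ?option? string {
    pattern1 { body1 }
    pattern2 { body2 }
    default  { default-body }
}
```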
OPTION usually refers to -exact, -glob, or -regexp, which does exact matching,
pattern matching, or regular expression matching on the supplied test cases.
The DEFAULT keyword can be used to match a case that fails all other comparison
tests. Additionally, a "-" sign as a code statement will indicate that the first following
full code statement is to be run as the appropriate execution initiative. For instance:
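A sketch of such a chain inside a PL/Tcl function body; with $x set to "two", the match falls through the "-" entries to the four branch:

```tcl
set x "two"
switch -exact $x {
    two -
    three -
    four {
        return 4
    }
    default {
        return 0
    }
}
```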
The preceding example will return a 4. Notice how the "-" continuation symbols can be
linked together to form a chain of correct matches.
Tcl has many included list- and string-related commands. A brief listing is included
here:
Command Description
Up to this point, we have been discussing general features of the Tcl language; PL/Tcl
adds some specific functionality to the base language.
Basic Structure
This is similar to how all PLs are used within PostgreSQL, and as with PL/pgSQL, care
must be taken to escape quoted character strings properly.
Due to the nature of performing queries with PL/Tcl, it is important to be able to store
globally accessible data between various operations inside a PL/Tcl code block.
To accomplish this, PL/Tcl uses an internally available array named "GD." This variable
is the recommended method for distributing shared information throughout a
procedure. (See the example in the next section for a procedure that uses the GD
variable.)
Unlike PL/pgSQL, you cannot simply embed standard SQL statements inside of PL/Tcl.
There are special built-in commands that allow access to the database back end.
The spi_exec command can be used to submit a query directly to the database
query engine. The syntax for the spi_exec command is as follows:
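A sketch of the general form; the optional loop body runs once per returned row:

```tcl
spi_exec ?-count n? ?-array name? query ?loop-body?
```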
The following are some examples of how the spi_exec command works:
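Two sketches against a hypothetical payroll table; selected columns become Tcl variables (or array elements when -array is used):

```tcl
spi_exec "SELECT count(*) AS cnt FROM payroll"
# the column cnt is now available as the Tcl variable $cnt

spi_exec -array row "SELECT name, salary FROM payroll" {
    elog NOTICE "$row(name) earns $row(salary)"
}
```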
The preceding use of spi_exec executed the queries by submitting them directly to
the query engine. In many cases, this approach will work fine. However, if you plan to
execute the same basic query multiple times—with perhaps just a change in criteria—
it is more efficient to prepare the query and then execute it.
When a query is prepared, it is submitted to the query planner, which then prepares
and saves a query plan for the submitted entry. It is then possible to use that query
plan to execute the actual query, which can result in performance increases if used
correctly.
A query is prepared by using the spi_prepare command, which takes the following
syntax:
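A sketch of the general form; typelist is a Tcl list of the parameter data types, and the placeholders are written $1, $2, and so on in the query string:

```tcl
spi_prepare query typelist
```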
After a query has been prepared, it can be executed with the spi_execp command.
This command is similar to the spi_exec command, with the exception that it is geared
toward executing already-prepared queries. The following is the syntax that the
spi_execp command uses:
The following is an example of using the spi_execp command; notice the use of the
GD global system variable. In particular, the following example will only create the
query plan when first called; on all subsequent calls, the previously saved plan is
simply executed:
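A sketch of the pattern, assuming a hypothetical payroll table; the doubled backslashes are needed so that $1 survives both the SQL string literal and Tcl substitution and reaches the query planner as a placeholder:

```sql
CREATE FUNCTION num_overpaid(int4) RETURNS int4 AS '
    if {![ info exists GD(plan) ]} {
        # first call only: prepare the plan and save it in the global GD array
        set GD(plan) [ spi_prepare \\
            "SELECT count(*) AS cnt FROM payroll WHERE salary > \\$1" \\
            int4 ]
    }
    spi_execp -count 1 $GD(plan) [ list $1 ]
    return $cnt
' LANGUAGE 'pltcl';
```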
Constructing Queries
A related command that is useful when accessing the PostgreSQL back end is the
quote statement. This command is useful in constructing query strings that make use
of variable substitution. An example of the quote command is as follows:
The preceding would result in the following text if sent to the query parser:
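A sketch of quote in use, assuming a variable holding an ordinary value; the command doubles any embedded single quotes:

```tcl
set val "O'Brien"
spi_exec "SELECT * FROM employees WHERE name = '[ quote $val ]'"
# the query parser receives:
#   SELECT * FROM employees WHERE name = 'O''Brien'
```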
One subtle point to watch out for is when the value of a variable already contains a
single or double quote. A naive substitution would reproduce this verbatim, which could
result in an error being generated from the PostgreSQL query parser; the quote command
avoids this by doubling such embedded quotes. Consider the following:
The preceding would result in the following text if sent to the query parser:
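A sketch of how an embedded single quote can break a query built by plain substitution:

```tcl
set val "O'Brien"
spi_exec "SELECT * FROM employees WHERE name = '$val'"
# the query parser receives:
#   SELECT * FROM employees WHERE name = 'O'Brien'
# the embedded quote closes the string literal early, producing a parse error
```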
Like PL/pgSQL, there are commands present in PL/Tcl that provide access to the
PostgreSQL log system. The elog command uses the following syntax:
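A minimal sketch; the levels mirror those of RAISE, with ERROR aborting the transaction:

```tcl
# general form: elog level msg
elog NOTICE "salary update complete"
```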
Notes
When installing PL/Tcl—whether at compile time or after—it is required that the Tcl
language and associated libraries exist on the target system for installation to be
successful.
PL/Perl
Perl is one of the most common scripting languages in use. It runs on almost all
platforms and has wide support in the development community. For these reasons,
PL/Perl can be an effective choice when choosing a PostgreSQL PL language.
Like PL/Tcl, the PostgreSQL implementation of PL/Perl only enables specific commands,
which are deemed trusted. Essentially, any Perl commands that explicitly deal with the
file system, environmental settings, or external modules have been disabled.
It is still possible, however, for errant code created in PL/Perl to negatively impact
the base system, chiefly because PL/Perl still allows memory to be exhausted and endless
loops to be created. Therefore, code created in PL/Perl should be closely inspected to
ensure that runaway code cannot create a problem for the parent system.
Much of the syntax in PL/Perl is the same as Perl in general. The following is a brief
synopsis of how to use Perl. Obviously, if you are new to Perl, consult one of the many
books or web sites available for the new Perl user.
Comments
Like many scripting languages, the default comment indicator is the pound sign (#).
Any line that begins with the pound sign is ignored entirely.
Control Structures
Perl contains most of the common control structures that are present in other
languages. The standard IF structure is as follows:
if (expression) {
code-statement
}
Perl also supports more complex IF statements built with elsif and else clauses.
For instance:
if (expression) {
code-statement
} elsif (another-expression) {
other-code-statement;
} elsif (another-expression) {
other-code-statement;
} else {
final-code;
}
Notice in the preceding code how elsif and else statements can be combined to
create a chain of test cases, with a final default statement to execute if none of the cases
tests true.
Perl also supports WHILE, UNTIL, DO, FOR, and FOREACH loops; examples are notated
in the following:
while ($a<10) {
print $a;
$a++;
}
until ($a>10) {
print $a;
$a++;
}
do {
print $a;
$a++;
} while ($a<10)
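Sketches of the remaining two loop forms:

```perl
for ($a = 0; $a < 10; $a++) {
    print $a;
}

foreach $item (10, 20, 30) {
    print $item;
}
```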
Perl also contains ways to break out of control structures, such as LAST, NEXT, and REDO,
and by using labeled blocks. The following is a list of examples:
while ($a<10) {
print $a;
if ($a==5){
#a is 5, so exit loop
last;
}
$a++;
}
print "exited loop";
The preceding code will continue looping until one of two conditions is met: Either $a
is greater than or equal to 10, or $a is equal to 5. (Actually, in this example, the code will
never reach 10 because it will always exit at 5.)
The other statements work similarly. The NEXT statement will reiterate the loop and
skip any remaining items; the REDO statement will run the loop again from the
beginning without reevaluating the test condition. For example:
while ($a<10) {
$a++;
print $a;
if ($a==5){
#a is 5, so loop again
next;
}
}
print "exited loop";
In addition to just reiterating the loop, label declaratives can be specified in conjunction
with the NEXT, LAST, and REDO statements to control program flow:
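A sketch of a labeled loop; the label lets next jump straight to the next iteration of the outer loop:

```perl
OUTER: while ($a < 10) {
    $a++;
    foreach $b (1, 2, 3) {
        # skip the rest of both loops' bodies for this pass
        next OUTER if ($a == $b);
    }
}
```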
Associative Arrays
One of the more powerful features of the Perl language is how associative arrays can be
created and manipulated. The next example creates a two-element array and assigns
values to it:
$employee{"name"}="Fred";
$employee{"age"}=29;
To get a listing of the keys contained in an array, use the keys function. For instance:
@lst = keys(%employee);
#lst now equals ("name", "age")
Alternatively, if you wanted to list the values stored in array, you could use the values
function. For instance:
@lst = values(%employee);
#lst now equals ("Fred", 29)
If you want to return both the key and value pairs together, the each function can be
used. This function is meant to be used inside of a loop, and on each successive call, it
returns the next key/value pair. For instance:
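A minimal sketch of each in a loop:

```perl
while (($key, $value) = each(%employee)) {
    print "$key is $value\n";
}
```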
To remove an element from an associative array, use the delete function, as in the
following:
$employee{"name"}="Fred";
$employee{"age"}=29;
$employee{"shoesize"}=10;
#The employee array is 3 elements wide
delete $employee{"shoesize"};
#Now just 2
In Perl, array numbering begins at 0 and proceeds sequentially for every element
contained. Lists of elements can be specified by using a comma-separated list. Negative
numbers refer to array elements beginning at the end of the element list. The following
is a brief listing of examples:
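Sketches of the indexing forms just described:

```perl
@arr = (10, 20, 30, 40);
$first = $arr[0];       # first element: 10
@slice = @arr[1,3];     # comma-separated slice: (20, 40)
$last  = $arr[-1];      # negative index counts from the end: 40
```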
For example:
@queue=(54,123,65643);
#Return and Remove 65643
$myval=pop(@queue);
#Add 111 to queue
push(@queue, 111);
#
#Return and Remove 54
$myval=shift(@queue);
#Add 222 to left side
unshift(@queue, 222);
To reverse or reorder the list of elements, use the reverse or sort function, as in the
following:
@lst=(10,1,5);
@lst=reverse(@lst); #Now lst = (5,1,10)
@lst=sort(@lst); #Now lst = (1,5,10)
One of the things that has made Perl so widely used is its use of regular expressions
(regex). Essentially, regular expressions are a method to match patterns between a
supplied template and the source text. A full explanation of regex is beyond the scope
of this book; however, Table 11.1 and Table 11.2 provide some examples.
Escaping Characters
As with PL/pgSQL and PL/Tcl, it is important to remember that quoted strings inside of
a PL/Perl function need to be properly escaped.
Use of the Perl functions q[], qq[], and qw[] can assist in creating properly escaped
variable-substitution sequences.
Variable Substitution
By default, variables are passed to the underlying Perl function in the standard argument
array @_, just as with any ordinary Perl subroutine; $_[0] holds the first passed value,
$_[1] the second, and so on. For instance:
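A sketch of a simple PL/Perl function reading its arguments from the Perl argument array; the function name is hypothetical:

```sql
CREATE FUNCTION concat_text(TEXT, TEXT) RETURNS TEXT AS '
    return $_[0] . $_[1];
' LANGUAGE 'plperl';
```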
Additionally, entire tuples can be passed to a PL/Perl function. Within the PL/Perl code,
the keys of the associative array are the field names from the passed tuple. Obviously,
the values of the associative array hold the field data. For instance:
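A sketch, assuming a hypothetical employee table with basesalary and bonus columns; the tuple arrives as a hash reference keyed by field name:

```sql
CREATE FUNCTION empcomp(employee) RETURNS INTEGER AS '
    my $emp = shift;
    return $emp->{''basesalary''} + $emp->{''bonus''};
' LANGUAGE 'plperl';
```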
Notes
When installing PL/Perl—whether at compile time or after—it is required that the Perl
language and associated libraries exist on the target system for installation to be
successful. Moreover, the shared library version of libperl (that is, libperl.so)
should be present so that PostgreSQL can have access to it.
Chapter 12. Creating Custom Functions
By itself, a database is nothing more than a container that holds data. The functions
and tools are what make a database truly useful. Much of the work of designing an
effective database is being able to model the business rules needed. Developing the
proper table schema is one of the ways to model the business rules within the
database; the other ways are through the creation of custom functions, triggers,
and rules.
There are many cases in which the system would benefit from the existence of user-
defined functions. Functions are particularly useful when the same information
needs to be accessed repeatedly. In these cases, it is possible to create a user-
defined function that is stored within the server.
The benefits of using custom-created functions are not only limited to speed
considerations. In many instances, the standard SQL language does not provide
sufficient control to perform the desired action. For instance, if conditional
branching, loop iteration, or complex variable substitution is needed, creating
custom functions might be the only way to accomplish the task at hand.
Example Uses
In this section, you will examine instances of when creating custom functions would
be useful.
Code Reuse
Speed and efficiency. The preformed query plan already exists in the
database engine and awaits execution.
Combining Functions
In this example, there is a specific user interface (UI) feature that you are trying to
create. It comes to the developer's attention that when users are entering
information into the system, they want to be able to enter either the employee
name or the employee ID into the dialog box.
This task is made easier by the fact that you can assume that all employee
IDs will be strictly composed of numbers, whereas employee names will consist
entirely of letters.
Rather than having to re-create this feature for each instance of its use, it is decided
to create a general case function that simply accepts either input (ID or name) and
returns the employee ID.
Therefore, the following function can be developed that will accept either format and
return the employee ID. (The full potential of this function will not be seen until
later.)
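A sketch of such a function, assuming a hypothetical employees table; the regular expression test decides whether the input is all digits:

```sql
CREATE FUNCTION getempid(VARCHAR) RETURNS INTEGER AS '
DECLARE
    intext ALIAS FOR $1;
    retval INTEGER;
BEGIN
    IF intext ~ ''^[0-9]+$'' THEN
        -- all digits: the value is already an employee ID
        RETURN intext;
    END IF;
    -- otherwise, treat it as an employee name and look the ID up
    SELECT INTO retval emp_id FROM employees WHERE name = intext;
    RETURN retval;
END;
' LANGUAGE 'plpgsql';
```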
At first glance, this function doesn't appear to be that useful. It simply determines
whether the variable passed is a digit or alphabetical, and it returns the employee
ID for that person. Moreover, it seems that if this function is already passed the
employee ID, it simply returns that value directly back. On the surface, this might
seem like a waste. However, when combined with other functions, the true potential
for such a function can be seen.
For instance, by combining the first function, homestate, with this latest function,
you can enable it to accept either the last name or the employee ID. In this case,
you use your latest function as a wrapper to ensure a flexible range of input values.
The client-side code would appear as follows:
SELECT homestate(getempid('Stinson'));
Or
SELECT homestate(getempid(592915));
Or, finally
SELECT homestate(getempid(strInputValue));
Combining the two functions allows a more flexible range of accepted input
data, while still storing data in a consistent format on the back end. Moreover, if the
developers one day realize that they want to allow users to input the Social Security
number as well, it will only require a modification of the getempid function.
Stored Procedures
In reality, stored procedures and functions are exactly the same thing. Namely, they
are a set of code statements that are created with a CREATE FUNCTION command.
The difference is more conceptual than concrete.
Stored procedures, however, do more than just accept a value and provide return
data. Generally, they perform some basic procedure or alteration to database tables.
For instance, consider the following example.
If an employee doesn't exist, add the person and assign him or her to the
specified job.
If an employee already exists, change his or her job description to the new
one.
Given these specs, a sample stored procedure that accomplishes this might appear
as the following:
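A sketch meeting both specs, assuming a hypothetical employees table with emp_id and job columns:

```sql
CREATE FUNCTION setjob(INTEGER, VARCHAR) RETURNS INTEGER AS '
DECLARE
    empid  ALIAS FOR $1;
    newjob ALIAS FOR $2;
    tmp    RECORD;
BEGIN
    SELECT INTO tmp * FROM employees WHERE emp_id = empid;
    IF NOT FOUND THEN
        -- employee does not exist yet: add and assign the job
        INSERT INTO employees (emp_id, job) VALUES (empid, newjob);
    ELSE
        -- employee exists: change the job description
        UPDATE employees SET job = newjob WHERE emp_id = empid;
    END IF;
    RETURN 1;
END;
' LANGUAGE 'plpgsql';
```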
Stored procedures are very useful in automating table manipulations that must
occur regularly. For instance, a good use might be to perform a task such as voiding
a payroll check. Typically, such an operation requires modifying many tables in a
standard account system setup. Although it could be coded directly into the client
application, that might make for a more rigid application in the end.
For instance, with the current system, modifications might need to be made to the
payroll, employee, AP, and GL tables to fully void an incorrectly printed check.
There would be no problem, per se, with coding this procedure directly from the
client machine. If in the future, however, there is a new table— JobCost—that
needs to be updated, this could be a needlessly complex change to make. It could
require changing the code in dozens or hundreds of client applications.
A better approach would have been to create the task of voiding a check as a stored
procedure (that is, function) within the database back end. The benefit of this setup
is that the clients simply call the voidcheck function and are oblivious to the actual
steps the server is taking to complete their request. On the server side, it is
relatively minor to update the function to affect another table; therefore, the entire
system becomes much more flexible.
Creating Custom Triggers
There is a certain degree of overlap between stored procedures (that is, functions) and
triggers. Both operate as predefined code created with the CREATE FUNCTION statement.
However, triggers are most often used as an automated response to some table-related
event, not as an action directly called by a client application.
Triggers bind these functions to DELETE, UPDATE, or INSERT table events using the
CREATE TRIGGER command. The client application has no direct knowledge of a trigger's
existence; it simply performs the requested action, which results in the server firing the
appropriate trigger event.
Triggers are used for performing actions that pertain to the same table that is being
accessed. Often they are used as a mechanism to ensure data or business-rule integrity. For
instance, consider the following function and trigger pair:
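The pair of listings is missing from this text; a sketch follows, using the trig_insert_update_check_emp name bound by the CREATE TRIGGER command below. The checked columns (salary, active) are assumptions for illustration:

```sql
-- Reject inserts/updates that would store a negative salary.
CREATE FUNCTION trig_insert_update_check_emp() RETURNS opaque AS '
BEGIN
    IF new.salary IS NULL OR new.salary < 0 THEN
        RAISE EXCEPTION ''salary may not be negative'';
    END IF;
    RETURN new;
END;
' LANGUAGE 'plpgsql';

-- Reject deletes of rows still flagged as active.
CREATE FUNCTION trig_delete_check_emp() RETURNS opaque AS '
BEGIN
    IF old.active THEN
        RAISE EXCEPTION ''cannot delete an active employee'';
    END IF;
    RETURN old;
END;
' LANGUAGE 'plpgsql';
```

In the 7.x series, trigger functions are declared with the opaque return type; note the doubled single quotes inside the quoted function body.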
The preceding two functions make use of the new and old keywords. When called as part of a trigger event, these keywords refer to the row being inserted (or its updated version) and the existing row being deleted (or its pre-update version), respectively. Next, a trigger event is created and bound to each function.
CREATE TRIGGER employee_insert_update
BEFORE INSERT OR UPDATE
ON employee
FOR EACH ROW EXECUTE PROCEDURE trig_insert_update_check_emp();
Now that the triggers have been created, they can be tested as follows:
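The test statements do not survive in this text; a sketch of the idea, assuming the hypothetical salary column from the trigger-function sketch above:

```sql
-- Accepted: satisfies the trigger's check
INSERT INTO employee (name, salary) VALUES ('Barry', 1000);

-- Rejected: the BEFORE trigger raises an exception
INSERT INTO employee (name, salary) VALUES ('Sam', -100);
```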
In the preceding examples, notice the similarity between how these triggers behave and
how column constraints typically behave. Column constraints generally check a specific
field's validity before an INSERT or UPDATE is allowed.
However, triggers and column constraints are not mutually exclusive in their behavior. If the
BEFORE keyword is used when creating a trigger, it will fire before the field (or table)
constraints are checked. Moreover, the BEFORE keyword means that the trigger will be fired
before the actual insert is completed. Therefore, if a trigger depends on an OID or relies on
a unique index, it will not function correctly.
Likewise, when the AFTER keyword is specified, the trigger event will be activated after the
specified table action (INSERT, UPDATE, or DELETE) has already completed. Moreover, the
AFTER keyword will cause the trigger not to fire until all the table or field constraints have
already been evaluated.
Creating Custom Rules
Rules are very similar to triggers in concept, with some crucial differences. Triggers usually
refer exclusively to the table being acted on, whereas rules act on external tables.
Additionally, triggers are fired in addition to the action being carried out. For instance, an
INSERT trigger will fire the event either before or after the insert is performed.
Rules, on the other hand, can also be created with the optional keyword INSTEAD. In this
case, the rule action is carried out in lieu of the specified action.
A typical use of rules is to perform actions on external tables when a table-related event
occurs on the specified table. A simple use of a rule set would be to implement the
capability to log an audit trail of changes made to important tables. For instance, suppose
the management wants to see a weekly report of every expenditure over $1000. You could
implement this as follows:
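The listing is absent here; a sketch of such an audit rule follows. The accounts_payable table appears later in this chapter, but the log table and its columns (expense_log, payee, amount) are assumptions:

```sql
-- A side table to collect the audit trail.
CREATE TABLE expense_log (
    payee   text,
    amount  numeric(10,2),
    logged  timestamp
);

-- Log every expenditure over $1000 as it is inserted.
CREATE RULE log_expenditure AS
    ON INSERT TO accounts_payable
    WHERE new.amount > 1000
    DO INSERT INTO expense_log
       VALUES (new.payee, new.amount, current_timestamp);
```

A weekly report is then a simple SELECT against expense_log.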
Rules can also be used in conjunction with functions to create actions that are more
complicated. For instance, suppose there are two tables in the database, payroll and
paytotals. The payroll table holds an individual record for every payroll check issued.
The paytotals table, however, has the latest year-to-date payroll totals for each
employee.
In this case, it is assumed that it is important for the system to automatically keep the
paytotals table up-to-date. A rule/function combination could be created to accomplish
this, as follows:
Now that the function is created, it can be incorporated into a rule set:
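Neither listing survives in this text; a sketch of such a function and rule pair follows (column names empl_id, amount, and ytd_total are assumptions):

```sql
-- Add a payroll amount to an employee''s year-to-date total.
CREATE FUNCTION update_paytotals(integer, numeric) RETURNS integer AS '
DECLARE
    emp ALIAS FOR $1;
    amt ALIAS FOR $2;
BEGIN
    UPDATE paytotals
       SET ytd_total = ytd_total + amt
     WHERE empl_id = emp;
    RETURN 1;
END;
' LANGUAGE 'plpgsql';

-- Fire the function whenever a check is written to payroll.
CREATE RULE payroll_insert AS
    ON INSERT TO payroll
    DO SELECT update_paytotals(new.empl_id, new.amount);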
As previously mentioned, the CREATE RULE command also allows for the inclusion of the
INSTEAD keyword. When this is specified, an alternative action will be performed.
Following from the previous example, let's assume management decided that any entry into
the accounts_payable table that was over $1000 should be deferred into an alternate
table until it was approved. For instance:
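The listing is missing; a sketch of the INSTEAD rule, using the accounts_payable and ap_hold tables named in the text (the payee and amount columns are assumptions):

```sql
-- Divert large payables into a holding table pending approval.
CREATE RULE ap_hold_big AS
    ON INSERT TO accounts_payable
    WHERE new.amount > 1000
    DO INSTEAD
       INSERT INTO ap_hold VALUES (new.payee, new.amount);
```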
This would result in any insert actions made into the accounts_payable table being
redirected into the ap_hold table, pending management approval.
Unlike triggers, rules can also be defined to occur on SELECT statements. This can have
some interesting implications. For instance, consider the following example:
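The example itself is absent from this text; a sketch under the constraint that, in the 7.x series, an ON SELECT rule must be an unconditional INSTEAD rule named "_RETURN" (the ap_big table name and amount column are assumptions):

```sql
-- A child table inheriting all attributes of accounts_payable.
CREATE TABLE ap_big () INHERITS (accounts_payable);

-- Rewrite any SELECT on ap_big to enforce a criterion match.
CREATE RULE "_RETURN" AS
    ON SELECT TO ap_big
    DO INSTEAD
       SELECT * FROM accounts_payable WHERE amount > 1000;
```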
In the preceding example, we've created a table that INHERITS all the attributes from the
base table. Then a rule is defined on the new table that rewrites any SELECT statements
and enforces a criterion match. The resultant action behaves suspiciously like a standard
VIEW. This is not by accident because PostgreSQL actually uses rule definitions as the way
that the CREATE VIEW command is implemented.
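For contrast, consider a pair of rules that point at each other; this is the kind of setup to avoid (table and column names are hypothetical):

```sql
-- Each rule inserts into the other table, so a single INSERT
-- into either one sets off an endless cascade of inserts.
CREATE RULE loop_a AS
    ON INSERT TO table_a
    DO INSERT INTO table_b VALUES (new.val);

CREATE RULE loop_b AS
    ON INSERT TO table_b
    DO INSERT INTO table_a VALUES (new.val);
```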
This example shows a dangerous potential of rule use. In this case, if an INSERT is made to
either table, an infinite loop of cascading insert actions will begin. In actuality, PostgreSQL
is too intelligent to allow this to happen, and the action would automatically fail once too
many recursive queries are executed.
In general, however, rules should only point to tables that do not have any associated rules
already set. That is, rule sets should point away from other rule sets. In large databases,
which might have hundreds of tables, it can be extremely complicated to manage and
predict results if numerous rules are actively engaged.
Additionally, rules only have access to specific system classes, namely to the OID attribute.
This means that rule definitions cannot act directly on any system attributes. Therefore,
functions such as func(table) will fail because table is considered a system class.
The code body for a particular rule can be accessed by viewing the pg_rules catalog.
Chapter 13. Client-Side Programming
PostgreSQL provides a number of interfaces that enable client applications to access
the database back end. In addition to the APIs provided by PostgreSQL, a number of
other languages have provided their own interfaces to PostgreSQL.
The choice of client language depends on many factors. C and C++ excel at fine-
grained control and raw speed, Python and Perl are ideal for rapid prototyping and
flexibility, PHP is great as a web-based solution, and ODBC and JDBC provide access
from Windows or Java clients. Interfaces for each of these languages are addressed
in the following sections.
ecpg
ecpg is a set of applications and libraries designed to make it easy to embed SQL
commands within C source code. Embedded SQL in C, or ecpg, is a multiplatform
tool that many RDBMSs support.
The concept behind Embedded SQL is that a developer can type SQL queries
directly into his or her C source code, and the ecpg preprocessor translates those
statements into the corresponding library calls, saving the developer from writing
that code by hand.
The output of the ecpg program is standard C code; this can then be linked against
the libpq and ecpg libraries and compiled directly to binary code.
The general flow of creating a program with ecpg is illustrated in Figure 13.1.
Embedded SQL makes use of the syntax in the following section to perform standard
database operations.
The following code is used to define the variables needed by the underlying C
program when data is passed to or from the PostgreSQL back end.
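The declaration listing is missing here; a sketch of an ecpg declare section using the empl_id and empl_name variables referenced below (the VARCHAR length is an assumption):

```c
EXEC SQL BEGIN DECLARE SECTION;
    int     empl_id;
    VARCHAR empl_name[50];
EXEC SQL END DECLARE SECTION;
```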
Obviously, this section must occur before any use can be made of the empl_id and
empl_name variables, and the types must match their corresponding PostgreSQL
data type. Here is a brief table that matches PostgreSQL to standard C data types:
PostgreSQL C
SMALLINT short
INTEGER int
INT2 short
INT4 int
FLOAT float
FLOAT4 float
FLOAT8 double
DOUBLE double
DECIMAL(p,s) double
VARCHAR(n) struct
DATE char[12]
TIME char[9]
TIMESTAMP char[28]
Connecting to a Database
Embedded SQL in C uses the following syntax for connecting to a back-end server:
dbname[@server][:port]
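For instance, a connection to the newriders database used elsewhere in this book might be written as follows (host and port are illustrative):

```c
EXEC SQL CONNECT TO newriders@localhost:5432;
```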
Executing Queries
Once connected, queries can be sent to the back end for processing by using the
following syntax:
In general, almost all query actions require that an explicit COMMIT command be
issued. The exception is SELECT commands; they can be issued on a single line.
The following are some examples of typical usage:
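The examples themselves are not reproduced here; a sketch, reusing the host variables from the declare section (table and column names are assumptions):

```c
/* A modifying statement, followed by the explicit COMMIT it requires */
EXEC SQL INSERT INTO employee (empl_id, empl_name)
         VALUES (:empl_id, :empl_name);
EXEC SQL COMMIT;

/* A SELECT fetching a single value into a host variable */
EXEC SQL SELECT empl_name INTO :empl_name
         FROM employee WHERE empl_id = :empl_id;
```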
Error Handling
The ecpg communications area (sqlca) must be defined with the following command:
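```c
exec sql include sqlca;
```

After each embedded statement, the sqlca structure can be inspected (for example, sqlca.sqlcode) to detect errors.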
JDBC
PostgreSQL provides a type 4 JDBC driver. This means that the driver is written in
pure Java and is platform independent; once compiled, the driver can be used on
any system.
To build the driver when PostgreSQL itself is compiled, include the --with-java
option of the configure command. Otherwise, if the system is already installed,
the driver can still be compiled by entering the src/interfaces/jdbc directory
of the source tree and issuing the make install command.
Upon completion, the JDBC driver will be in the current directory, named
postgresql.jar.
To use the driver, the jar archive postgresql.jar needs to be included in the
environment variable CLASSPATH. For example, to load the driver with the fictional
Java application foo.jar, you would issue (this assumes using the Bash shell) the
following:
$ CLASSPATH=/usr/local/pgsql/lib/postgresql.jar
$ export CLASSPATH
$ java ./foo.jar
Configuring Clients
Any Java source that uses JDBC needs to import the java.sql package using the
following command:
import java.sql.*;
Do not import the postgresql package. If you do, your source will not
compile.
Connecting
To connect, you need to get a Connection instance from JDBC. To do this, you
would use the DriverManager.getConnection() method:
Connection db = DriverManager.getConnection(url,user,pwd);
The url parameter takes one of the following forms:
jdbc:postgresql:database
jdbc:postgresql://host/database
jdbc:postgresql://host:port/database
Executing Queries
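The listing for this section is absent; a sketch using the Connection obtained above (the employee table and its columns are assumptions, and a running server is required):

```java
// Execute a SELECT and walk the ResultSet.
Statement st = db.createStatement();
ResultSet rs = st.executeQuery("SELECT empl_id, empl_name FROM employee");
while (rs.next()) {
    System.out.println(rs.getInt(1) + "  " + rs.getString(2));
}
rs.close();
st.close();
```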
Updating Records
To update a specific element or to execute any statement that does not result in a
ResultSet, use the executeUpdate() method. For instance:
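A sketch of such an update, again assuming the hypothetical employee table and an open Connection db:

```java
// executeUpdate returns the number of rows affected.
Statement st = db.createStatement();
int rows = st.executeUpdate(
    "UPDATE employee SET empl_name = 'Barry' WHERE empl_id = 1");
System.out.println(rows + " row(s) updated");
st.close();
```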
The libpq library is a C language API that provides access to the PostgreSQL back
end. In fact, most of the provided client tools (like psql) use this library as their
connection route to the back end.
The libpq provides many functions that can control nearly every aspect of the
client/server. Although an in-depth discussion of every function is outside the scope
of this chapter, the two most popular functions it provides are as follows:
PQconnectdb and PQexec are both discussed in more detail in the following sections.
PQconnectdb
The PQconnectdb function accepts several options as shown here. In this example,
your user-defined object name is PGconnectID, but it could be anything.
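A minimal sketch of the call, keeping the PGconnectID name from the text (the conninfo contents shown are illustrative):

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *PGconnectID;

    /* conninfo is a space-separated list of option=value pairs */
    PGconnectID = PQconnectdb("host=localhost dbname=newriders");
    if (PQstatus(PGconnectID) == CONNECTION_BAD)
        fprintf(stderr, "Connection failed: %s",
                PQerrorMessage(PGconnectID));
    PQfinish(PGconnectID);
    return 0;
}
```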
conninfo would contain one of the following (in the form option=value):
Option      Description
host        Name of the host to connect to
port        Port number at the server host
dbname      The database name
user        User name for authentication
password    Password, if the back end demands one
options     Trace/debug options to send to the back end
tty         File or tty for optional debug output from the back end
If the conninfo string is not specified, environment variables such as PGHOST,
PGPORT, PGDATABASE, PGUSER, and PGPASSWORD can be set to specify the
connection options. The connection status, as returned by the PQstatus
function, will be one of the following values:
CONNECTION_MADE
CONNECTION_AWAITING_RESPONSE
CONNECTION_AUTH_OK
CONNECTION_SETENV
CONNECTION_OK
CONNECTION_BAD
void PQfinish(PGconn *conn)
Closes the connection to the back end and frees memory used by the PGconn
object.
PQexec
ExecStatusType PQresultStatus(const PGresult *res)
Provides information regarding the last executed query. This will return one of
the following values:
PGRES_EMPTY_QUERY
PGRES_COMMAND_OK
PGRES_TUPLES_OK
PGRES_COPY_OUT
PGRES_COPY_IN
PGRES_BAD_RESPONSE
PGRES_NONFATAL_ERROR
PGRES_FATAL_ERROR
char* PQresultErrorMessage(const PGresult *res)
This function returns the last error message specifically associated with a
particular PGresult. It differs from PQerrorMessage, which returns the last
error associated with a particular connection but not a specific result.
char* PQfname(const PGresult *res, int field_num)
Returns the field name associated with the given field index. Field indices start
at 0.
int PQfnumber(const PGresult *res, const char *field_name)
Returns the field index associated with the specified field name. A value of –1
is returned if the given name does not match any field.
Oid PQftype(const PGresult *res, int field_num)
Returns an integer that represents the field type associated with the field
index. The system table pg_type contains the names and properties of the
various data types. The OIDs of the built-in data types are defined in
src/include/catalog/pg_type.h in the source tree.
int PQgetlength(const PGresult *res, int tup_num, int field_num)
Returns the number of bytes of field data in the specified field index.
char* PQgetvalue(const PGresult *res, int tup_num, int field_num)
Returns a single field value of one row of a PGresult. In most instances, the
value returned by PQgetvalue is a null-terminated ASCII representation of
the value.
char* PQcmdTuples(PGresult *res)
Returns the number of rows affected by the SQL command. This function only
measures the effects of INSERT, DELETE, or UPDATE commands.
Oid PQoidValue(const PGresult *res)
If the SQL command was an INSERT, returns the OID of the tuple inserted;
otherwise, returns InvalidOid.
void PQclear(PGresult *res)
Frees the storage associated with the PGresult. Every query result should be
freed via PQclear when it is no longer needed. A PGresult does not go away
on its own, even after the connection is closed; failure to free it will result in
memory leaks in the front-end application.
libpq++
The libpq++ library enables C++ applications to interface with the PostgreSQL
back end. Fundamentally, it operates in the same way as the libpq library except
that much of it is implemented as classes.
PgConnection
The connection info can be specified either by the connect-string argument (as in
the preceding) or by expressly setting the following environmental variables:
The connect-string argument, if the environmental variables are not used, can be
specified with the following options (in the form option=value):
Option Description
int PgConnection::ConnectionBad()
Sends a query to the back-end server for execution. Returns the results of the
query. The status should report one of the following:
PGRES_EMPTY_QUERY
PGRES_COMMAND_OK
PGRES_TUPLES_OK
PGRES_COPY_OUT
PGRES_COPY_IN
PGRES_BAD_RESPONSE
PGRES_NONFATAL_ERROR
PGRES_FATAL_ERROR
PgDatabase
The PgDatabase class provides access to the elements residing in a return set of
data. Specifically, this class is useful for returning information pertaining to how
many rows or fields were affected by a given query. The following are the class
functions:
int PgDatabase::Tuples()
int PgDatabase::CmdTuples()
int PgDatabase::Fields()
Returns the field name associated with the given index. The field indices start
at 0.
Returns the field index associated with the field name specified.
Returns the field type associated with the given index. The integer returned is
an internal coding of the type.
Returns the number of bytes occupied by the given field. Field indices start at
0.
Returns a single field value from a row of a PGresult. Row and field indices
start at 0. For most queries, the value returned by GetValue is a null-
terminated ASCII string.
int PgDatabase::EndCopy()
Ensures that client and server will be synchronized, in case direct access
methods caused communications to get out of sync.
libpgeasy
void reset_fetch();
void disconnectdb();
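The surrounding discussion of libpgeasy did not survive extraction; the following sketch shows the typical connect/query/fetch cycle these prototypes imply (database, table, and column names are assumptions, and a running server is required):

```c
#include <stdio.h>
#include <libpgeasy.h>

int main(void)
{
    char empl_name[50];

    connectdb("dbname=newriders");
    doquery("SELECT empl_name FROM employee");
    /* fetch() fills the supplied variables one row at a time */
    while (fetch(empl_name) != END_OF_TUPLES)
        printf("%s\n", empl_name);
    disconnectdb();
    return 0;
}
```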
ODBC
Installation
PostgreSQL can be compiled (or installed from packages) with the necessary drivers
for ODBC access. Although PostgreSQL includes some built-in ODBC drivers, other
projects are better supported. One of the more popular ODBC access methods is
currently the unixODBC project (see www.unixodbc.org).
Before the actual installation of the chosen ODBC driver can begin, an ODBC
manager must already exist on the system. All versions of Windows from
Windows 95 on already include an ODBC manager. For UNIX/Linux clients, there are
several choices. There is the unixODBC manager applet, and there is a free ODBC
client called iODBC. (More information can be obtained from www.unixodbc.org or
www.iodbc.org.)
If your system was installed from source code, the option --enable-odbc could
have been supplied at compile time. (See Chapter 10, "Common Administrative Tasks,"
for more compile-time options.) Likewise, most of the package-based installs also
provide an optional package that includes the required ODBC functionality (for
example, postgresql-odbc-7.1.2-4PGDG.i386.rpm).
Alternatively, if the system has previously been compiled without the ODBC option,
it can still be compiled by running the make install command in the appropriate
directory (for example, src/interfaces/odbc).
As for installing the client machines, the easiest method is to download the Windows
executable that automatically installs and configures Windows machines. This
installer can be obtained from the following (check mirrors also):
ftp://ftp.postgresql.org/pub/odbc/versions/full/
Additionally, the MS Installer (MSI) or plain DLL versions of the driver can be
obtained from the following:
ftp://ftp.postgresql.org/pub/odbc/versions/msi/
ftp://ftp.postgresql.org/pub/odbc/versions/dll/
The next step is to configure the odbc.ini file (or preferably, use the provided GUI
management dialog).
[ODBC]
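The rest of the sample file is missing here; a minimal data-source entry might look like the following (the DSN name, driver path, and connection values are all illustrative):

```ini
[PostgreSQL-newriders]
Driver      = /usr/local/lib/libpsqlodbc.so
Servername  = localhost
Port        = 5432
Database    = newriders
Username    = postgres
Password    =
ReadOnly    = 0
```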
An alternative to specifying all of these options within an .ini file is to utilize the
GUI configuration tool provided with the Windows driver.
ReadOnly (default). New data sources will inherit the state of this box for
the data source read-only attribute.
Parse Statements. If enabled, the driver will parse a SQL query statement to
identify the columns and tables and to gather statistics about them such as
precision, nullability, aliases, and so on.
Dont Know. Returns a "Dont Know" value and lets the application decide.
Longest. Returns the longest string length of the column of any row.
Data Type Options. Affects how some data types are mapped. Options are as
follows:
Cache Size. When using cursors, this is the row size of the tuple cache. If not
using cursors, this is how many tuples to allocate memory for at any given
time.
Max VarChar. The maximum precision of the VarChar and BPChar (char[x])
types.
SysTable Prefixes. By default, names that begin with pg_ are treated as
system tables. This allows defining additional ones. Separate each prefix with a
semicolon (;).
Connect Settings. These commands will be sent to the back end upon a
successful connection. Use a semicolon (;) to separate commands.
Show System Tables. The driver will treat system tables as regular tables.
OID Options:
Fake Index. Fakes a unique index on OID. This is mainly useful for older MS
Access–style applications.
Protocol:
6.2. Forces the driver to use Postgres 6.2 protocol, which had different byte
ordering, protocol, and other semantics.
6.3. Use the 6.3 protocol. This is compatible with both 6.3 and 6.4 back ends.
6.4. Use the 6.4 protocol. This is only compatible with 6.4.
Perl
PostgreSQL already includes the procedural language PL/Perl that can run Perl
scripts. Accessing PostgreSQL from an external Perl script, however, requires the
use of the Perl database-independent (DBI) module. The Perl DBI defines a set of
functions, variables, and conventions to Perl scripts, regardless of what back-end
database is actually used.
Perl DBI provides a consistent interface to scripts, resulting in much more portable
and flexible code. The DBI is just a general-purpose interface, however; a database
driver is still needed to connect to a specific database.
The Perl system does include an older, non-DBI PostgreSQL access module named
Pg. However, this is an older module, and most development work recently has
gone into the newer DBI-compliant modules.
The overall architecture of the Perl DBI system is illustrated in Figure 13.2.
The PostgreSQL driver is named DBD::Pg, and it must be present and installed for
execution to be successful. This class and driver set is modeled closely after the
libpq library functions. Therefore, the functional interfaces are analogous to how
things would be done in C.
The DBI class is the base class provided by the interface system. The following are
the methods it provides:
Options:
available_drivers
data_sources($driver)
trace($level[, $file])
Once the DBI class returns a valid handle object, it will provide these methods:
prepare($statement[, \%attr])
commit
rollback
disconnect
ping
After prepare returns a statement handle, that object provides the following methods:
execute([@bind_values])
fetchrow_arrayref
Description: Fetches the next row of data holding values; returns a reference
to the array.
fetchrow_array
Description: Fetches the next row of data holding values; returns an array.
fetchrow_hashref
Description: Fetches the next row of data holding values; returns a reference
to a hash keyed by field name.
fetchall_arrayref
Description: Fetches all the rows of data holding values; returns a reference to
an array.
finish
Description: Indicates to the back end that no more rows will be fetched;
allows the server to reclaim resources.
rows
NUM_OF_FIELDS
NUM_OF_PARAMS
NAME
Description: Returns a reference to an array that contains the field's names for
each column.
pg_size
pg_type
pg_oid_status
pg_cmd_status
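Putting the pieces together, a short script using the methods above might look like the following sketch (database, table, user, and column names are illustrative, and a running server with DBD::Pg installed is assumed):

```perl
#!/usr/bin/perl
use strict;
use DBI;

# Connect through the DBD::Pg driver.
my $dbh = DBI->connect("dbi:Pg:dbname=newriders", "postgres", "",
                       { RaiseError => 1 });

my $sth = $dbh->prepare("SELECT empl_name FROM employee");
$sth->execute;
while (my @row = $sth->fetchrow_array) {
    print "$row[0]\n";
}
$sth->finish;
$dbh->disconnect;
```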
PyGreSQL is a Python interface to the PostgreSQL database. It was written by D'Arcy J.M. Cain and
was based heavily on code written by Pascal Andre.
Compiling PyGreSQL
Python Configuration
Locate the directory where the dynamic loading packages are located for Python (for example,
/usr/lib/python/libdynload). Copy the resulting _pg.so file to this location. Copy the pg.py
and pgdb.py files to Python's standard library directory (for example, /usr/local/lib/Python).
The pg.py file uses the traditional interface, whereas the pgdb.py file is compliant with the DB-API
2.0 specification developed by the Python DB-SIG.
The remainder of this section describes only the older pg API. You can read about the new DB-SIG
API at the following:
www.python.org/topics/database/DatabaseAPI-2.0.html
www2.linuxjournal.com/lj-issues/issue49/2605.html
PyGreSQL Interfaces
The PyGreSQL module provides two separate interfaces to a PostgreSQL database server. Access is
provided via one of the two included wrapper modules:
Although most of the new development effort is going into the DB-API-compliant interface, the
standard PyGreSQL interface is currently more established. This section will focus on the standard
interface, although information on the DB-API 2.0 interface can be found at the following:
www.python.org/topics/database/DatabaseAPI-2.0.html
Parameters:
For instance:
>>>import pg
>>>database=pg.connect(dbname="newriders", host="127.0.0.1")
get_defhost()
get_defport()
get_defopt()
get_deftty()
set_deftty(tty)
Parameters:
Description: Sets the debug terminal value for new connections. If None is supplied as a
parameter, environment variables will be used in future connections.
get_defbase()
Once connected to a database, a pgobject is returned. This object embeds specific parameters that
define this connection. The following parameters are available through function calls:
query(command)
Parameters:
Description: Sends the specified SQL query (command) to the database. If the query is an
insert statement, the return value is the OID of the new row. If it is a query that does not
return a result, None is returned. For SELECT statements, a pgqueryobject object is
returned that can be accessed via the getresult or dictresult method.
For instance:
>>>import pg
>>>database=pg.connect("newriders")
>>>result=database.query("SELECT * FROM authors")
close
Description: Closes the database connection. The connection is automatically closed when the
connection is deleted, but this method enables an explicit close to be issued.
fileno
getnotify
Description: Receives NOTIFY messages from the server. If the server returns no notify, the
method returns None. Otherwise, it returns a tuple (relname, pid), where relname is the
name of the notify and pid is the process ID of the connection that triggered the notify.
Remember to do a listen query first; otherwise, getnotify will always return None.
inserttable
Description: Allows quick insertion of large blocks of data in a table. The list is a list of
tuples/lists that define the values for each inserted row.
putline
getline
endcopy
Description: Ensures that the client and server will be synchronized, in case direct access
methods cause communications to get out of sync.
To access large objects via a pg connection to a database, the following functions are used:
getlo
locreate
loimport
Description: This method enables you to create large objects in a very simple way. You just give
the name of a file containing the data to be used.
open
Description: This method opens a large object for reading/writing, in the same way as the UNIX
open() function.
close
Description: This method closes a previously opened large object, in the same way as the UNIX
close() function.
read
Description: This function enables you to read a large object, starting at the current position.
write
Description: This function enables writing a large object, starting at the current position.
tell
Description: This method gets the current position of the large object.
seek
unlink
size
Description: Returns the size of a large object. Currently, the large object needs to be opened.
export
Description: Dumps the content of a large object to a file on the host of the running Python
program, not the server host.
Once a query has been issued to the database, if results are returned, they can be accessed in the
following ways:
getresult
Description: Returns the list of the values contained in pgqueryobject. More information
about this result can be accessed using listfields, fieldname, or fieldnum methods.
dictresult
listfields
Description: Lists the field names of the previous query result. The fields are in the same order
as the result values.
fieldname(int)
Description: Finds a field name from its ordinal sequence number (integer). The fields are in
the same order as the result values.
fieldnum(str)
ntuples
reset
For example:
>>>import pg
>>>database=pg.connect("newriders")
>>>results=database.query("SELECT * FROM payroll")
>>>results.ntuples()
2340
>>>mydict=results.dictresult()
The DB Wrapper
The preceding functions are wrapped within the pg module. This module also provides a special
wrapper named DB. This wrapper streamlines much of the connection and access mechanics needed
to interact with the database. The preceding functions are also included in the name space, so it isn't
necessary to import both modules. The preferred way to use this module is as follows:
>>>import pg
>>>db=pg.DB('payroll','localhost')
>>>db.query("INSERT INTO checks VALUES ('Erica',200)")
>>>db.query("SELECT * FROM checks")
Name Amount
-------------
Erica 200
The following list describes the methods and variables of this class (these are very similar to the base
pg method, with some slight exceptions):
pkey(table)
Description: This method returns the primary key of a table. Note that this raises an exception
if the table doesn't have a primary key.
get_databases
Description: Although you can do this with a simple select, it is added here for convenience.
get_tables
Description: Returns a list of tables available in the current database.
get_attnames(table)
Description: Returns the attribute names of the given table.
get(table, arr, [keyname])
Parameters:
Description: Gets a single row. It assumes that the key specifies a unique row; if keyname is not
specified, the primary key for the table is used.
insert(table, a)
Parameters:
a A dictionary of values.
Description: Inserts values into the specified table, using values from the dictionary. Then the
dictionary is updated with values modified by rules, triggers, and so on.
update(table, a)
Parameters:
a A dictionary of values.
Description: Updates an existing row. The update is based on the OID value from get. An array is
returned that reflects any changes caused by the update due to triggers, rules, defaults, and so on.
clear(table, [a])
Parameters:
a A dictionary of values.
Description: Clears a row's fields to empty values, as determined by each field's data type.
Numeric types are set to 0, dates are set to TODAY, and everything else is set to NULL. If the
argument a is present, it is used as the array, and any entries matching attribute names are
cleared with everything else left unchanged.
delete(table, a)
Parameters:
a A dictionary of values.
Description: Deletes the row from a table based on the OID from get.
PHP
PHP is a scripting language used for building dynamic web pages. It contains a number of advanced
features that rival commercial options such as ASP and ColdFusion.
It contains several built-in database interfaces, including functions specific for communicating with
both MySQL and PostgreSQL. The following is a list of the functions specific to PostgreSQL:
pg_close(connection_id)
pg_cmdtuples(result_id)
Example:
<?php
$dbconn = pg_Connect ("dbname=newriders");
$dbconn2 = pg_Connect ("host=localhost port=5432 dbname=newriders");
?>
pg_dbname(connection_id)
Description: Returns the name of the database connected to the specified connection index.
Otherwise, it returns false if the connection is not a valid connection index.
pg_end_copy([resource connection])
Description: Synchronizes a front-end application with the back end after doing a copy
operation. It must be issued; otherwise, the back end might get out of sync with the front end.
pg_errormessage(connection_id)
Description: Returns a string containing any error messages from previous database operations;
otherwise, returns false.
pg_exec(connection_id, query)
Description: Returns a result index following the execution of the SQL commands contained in
the query. Otherwise, it returns a false value. From a successful execution, the return value
of this function is an index to be used to access the results from other PostgreSQL functions.
For instance:
<?php
$conn = pg_pconnect ("dbname=newriders");
$result = pg_exec ($conn, "SELECT * FROM authors");
?>
pg_fetch_object(result_id, row)
Description: Returns an object whose properties correspond to the fetched row; otherwise,
returns false if there are no more rows.
pg_fetch_row(result_id, row)
Description: Returns the specified row as an array. Each result column is stored in an array
offset, starting at offset 0.
pg_fieldisnull(result_id, row, field)
Description: Returns 0 if the field in the given row is not NULL. Returns 1 if the field in the
given row is NULL. Field can be specified as number or fieldname.
pg_fieldname(result_id, field_number)
Description: Returns the field name of the corresponding field-index number specified. Field
numbering starts from 0.
pg_fieldnum(result_id, field_name)
Description: Returns the field number for the column name specified.
pg_fieldprtlen(result_id, row, field)
Description: Returns the number of characters of a specific field in the given row.
pg_fieldsize(result_id, field_number)
Description: Returns the number of bytes that the internal storage size of the given field
number occupies. A field size of –1 indicates a variable-length field.
pg_fieldtype(result_id, field_number)
Description: Returns a string containing the data type of the field represented by the field
number supplied.
pg_freeresult(result_id)
Description: When called, all result memory will automatically be freed. Generally, this is only
needed when you are certain you are running low on memory because PHP will automatically
free memory once a connection is closed.
pg_getlastoid(result_id)
Description: Returns the last OID assigned to an inserted tuple. The result identifier is used
from the last command sent via pg_exec().
pg_host(connection_id)
pg_loclose(file_id)
Description: Closes a large object. file_id is a file descriptor for the large object from
pg_loopen().
pg_locreate(connection_id)
Description: Creates a large object and returns its OID.
pg_loexport(large_obj_id, file_path, [connection_id])
Description: Specifies the object ID of the large object to export, and the filename argument
specifies the pathname of the file.
pg_loimport(file_path, [connection_id])
Description: Specifies the pathname of the file to be imported as a large object. All handling of
large objects in PostgreSQL must happen inside a transaction.
Description: Opens a large object and returns file descriptor. The file descriptor encapsulates
information about the connection. Do not close the connection before closing the large object
file descriptor. obj_oid specifies a valid large object OID. The mode can be "r","w", or "rw".
pg_loread(file_id, length)
Description: Reads the specified length of bytes from a large object and returns it as a string.
The file_id specifies a valid large object file descriptor.
pg_loreadall(file_id)
pg_lounlink(connection_id, large_obj_id)
pg_lowrite(file_id, buffer)
Description: Writes to a large object from the specified buffer. Returns the number of bytes
actually written or false in the case of an error. file_id refers to the file descriptor for the
large object from pg_loopen().
pg_numfields(result_id)
Description: Returns the number of fields in a result. The result_id is a valid result identifier
returned by pg_exec().
pg_numrows(result_id)
Description: Returns the number of rows in a result. The result_id is a valid result identifier
returned by pg_exec().
pg_options(connection_id)
Description: Returns a string containing the options in effect on the provided connection identifier.
pg_port(connection_id)
Description: Returns the port number that the given connection identifier is connected to.
pg_put_line(connection_id, data)
Description: Sends a NULL-terminated string to the PostgreSQL server. This is useful, for
example, for very high-speed insertion of data into a table, initiated by starting a PostgreSQL
copy operation.
For instance:
<?php
$conn = pg_pconnect ("dbname=foo");
pg_exec($conn, "create table bar (a int4, b char(16), d float8)");
pg_exec($conn, "copy bar from stdin");
pg_put_line($conn, "3\thello world\t4.5\n");
pg_put_line($conn, "4\tgoodbye world\t7.11\n");
pg_put_line($conn, "\\.\n");
pg_end_copy($conn);
?>
pg_result(result_id, row_number, fieldname)
Description: Returns values from a result identifier produced by pg_exec(). The row_number
and fieldname specify what elements of the array are returned. Instead of naming the field,
you can use the field index as an unquoted number.
pg_set_client_encoding(connection_id, encoding)
Description: Sets the client encoding type. The encoding can be SQL_ASCII, EUC_JP, EUC_CN,
EUC_KR, EUC_TW, UNICODE, MULE_INTERNAL, LATIN1 … LATIN9, KOI8, WIN, ALT, SJIS,
BIG5, or WIN1250. Returns 0 if success or –1 if error.
pg_client_encoding(connection_id)
Description: Returns the client encoding as a string. Will be one of the values that can be set
with the pg_set_client_encoding function.
pg_tty(connection_id)
Description: Returns the tty name to which server-side debugging output is being sent.
pg_untrace(connection_id)
Description: Stops the tracing of front-end/back-end communication started by pg_trace().
Plugging new features and objects into the database is known as "extending" it.
Fundamentally, PostgreSQL enables this by allowing users to write new C-based
objects and to use the resultant functions as handlers for specific data type,
operator, or aggregate needs.
1. Writing the new functionality as a function (typically in C) and compiling it into a
shared object.
2. Registering that function with the PostgreSQL back end through the use of the
CREATE FUNCTION command.
3. Linking the proper SQL command (for example, CREATE TYPE, CREATE
OPERATOR, and so on) with that registered object.
One type of information that these tables store is pointers to compiled shared
objects that handle specific database functions. In essence, the CREATE
FUNCTION, CREATE OPERATOR, CREATE TYPE, and CREATE AGGREGATE
commands modify these system catalogs to include definitions for this extra
functionality.
The basic breakdown of system catalogs can be defined as shown in Table 14.1.
Table Description
pg_class Tables
pg_database Databases
pg_operator Operators
Most acts of extension require the defining of special functions. For instance, to define a new
data type, a C shared-object function describing the new data type must first be created.
There are three fundamental types of custom function (also refer to Chapter 12, "Creating
Custom Functions," for a more relevant discussion of created SQL or PL functions):
SQL functions. These functions consist purely of standard SQL code. No external
database objects must exist in order for these to be executed. They can be defined on-
the-fly regardless of the configuration of the base system.
PL functions. These functions are written in a non-native code (for example, PL/Tcl).
For these functions to execute, an external shared-object handler must exist. The
handler functions must first be registered with the database back end before execution
can proceed.
Compiled functions. These functions are written in C, compiled into shared objects,
and registered with the back end via the CREATE FUNCTION command before they can
be executed.
SQL Functions
SQL language functions are simply predefined queries that are assigned a name. However,
they do support input parameters and can provide return values. Writing SQL functions
requires no modification to the base system or special features. For instance:
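A minimal sketch of such a function (the function name and the argument value are assumptions, not the book's original example):

```sql
-- A named query: adds a fixed bonus to a supplied salary value
CREATE FUNCTION add_bonus(int4) RETURNS int4
    AS 'SELECT $1 + 1000;'
    LANGUAGE 'sql';

SELECT add_bonus(5000);
```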
Standard SQL functions can also accept classes (whole rows) as arguments or return them. For instance:
last_name
-------------------
Parody
last_name
-------------------
Paro
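The queries producing the preceding output were likely similar to the following sketch, in which a whole row is passed to a function (the employees table, its columns, and the function name are assumptions):

```sql
-- Accepts an entire employees row and returns one of its attributes
CREATE FUNCTION last_name(employees) RETURNS text
    AS 'SELECT $1.last_name;'
    LANGUAGE 'sql';

SELECT last_name(employees) FROM employees;
```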
Procedural Language Functions
Procedural language functions are offered via loadable modules. For instance, the PL/pgSQL
language depends on the plpgsql.so loadable module. After these shared objects have been
created, they are defined as handlers with the CREATE LANGUAGE command.
The specific steps to create a valid handler object are beyond the scope of this book, but the
basic or general steps would be as follows:
1. Compile the handler code into a shared object (for example, plpgsql.so).
2. Create a function that defines this object. The return type must be set as OPAQUE for
this function. For instance:
3. Define a handler that routes a language request for this object to the previously created
function. For instance:
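Under PostgreSQL 7.1, steps 2 and 3 for the PL/pgSQL handler might look like the following sketch (the shared-object path is an assumption):

```sql
-- Step 2: register the compiled call handler with the back end;
-- the return type must be OPAQUE
CREATE FUNCTION plpgsql_call_handler() RETURNS OPAQUE
    AS '/usr/local/pgsql/lib/plpgsql.so'
    LANGUAGE 'C';

-- Step 3: route requests for the language to that handler
CREATE LANGUAGE 'plpgsql'
    HANDLER plpgsql_call_handler
    LANCOMPILER 'PL/pgSQL';
```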
After a language has been defined, functions and stored procedures can be created with it.
Currently, PostgreSQL supports PL/pgSQL, PL/Tcl, and PL/Perl. For more information on
creating procedural language functions, refer to Chapter 11, "Server-Side Programming."
Compiled Functions
Compiled functions are shared objects that have been registered with the database through
the use of the CREATE FUNCTION command. Creating custom compiled functions is more
complex than creating scripted functions, but they do offer a tremendous benefit in execution
speed.
Creating successful C functions requires that the PostgreSQL and C data types can be
exchanged correctly. Table 14.2 lists the PostgreSQL data type, the corresponding C data type,
and the C header file where it is defined.
Table 14.2. Corresponding PostgreSQL Data Types, C Data Types, and C Header Files
PostgreSQL Data Type    C Data Type    C Header File
Generally, data that is passed by value must be 1, 2, or 4 bytes in length (although
some architectures can support 8 bytes as well). Data of any size can be passed by
reference, as either fixed-length or variable-length types.
Two separate conventions exist regarding how C-based functions are to be interfaced:
Version-0. This method is the original, but it has now been deprecated. Although this
method was fairly simple to use, functions using this method encountered portability
problems when trying to port functions across architectures.
Version-1. This is the newest interface convention. It overcomes many of the shortfalls
of Version-0 calling. It achieves this by relying on macros to encapsulate the passing of
arguments, thereby making the resultant code much more portable.
Because Version-0 calling is now deprecated, the following examples will demonstrate some
simple Version-1 functions. (For more information on Version-0 calling, refer to the PostgreSQL
Programmer's Guide at www.postgresql.org.)
#include "postgres.h"
#include "fmgr.h"
PG_FUNCTION_INFO_V1(add_it);
Datum add_it(PG_FUNCTION_ARGS)
{
    /* Fetch the two int4 arguments */
    int32 arg1 = PG_GETARG_INT32(0);
    int32 arg2 = PG_GETARG_INT32(1);
    PG_RETURN_INT32(arg1 + arg2);
}
After this has been compiled into a shared object, it can be defined and utilized as follows:
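A sketch of the registration and a sample call (the shared-object path and the argument values are assumptions; 5 + 7 yields the 12 shown in the output):

```sql
-- Register the compiled shared object as an SQL-callable function
CREATE FUNCTION add_it(int4, int4) RETURNS int4
    AS '/usr/local/pgsql/lib/add_it.so'
    LANGUAGE 'C';

SELECT add_it(5, 7) AS "Answer";
```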
Answer
------
12
In addition to handling simple pass-by-value transfers, composite objects, like row objects,
can be passed and manipulated by C functions. For instance, this example defines the function
named islegal(), which returns TRUE or FALSE depending on whether the employee is 21
or over:
#include "postgres.h"
#include "executor/executor.h"
#include "fmgr.h"
PG_FUNCTION_INFO_V1(islegal);
Datum
islegal(PG_FUNCTION_ARGS)
{
    /* Get the current table row, assign to pointer t */
    TupleTableSlot *t = (TupleTableSlot *) PG_GETARG_POINTER(0);
    bool isnull;

    /* Get the 'age' attribute from the row; GetAttributeByName is
       declared in executor.h */
    int32 age = DatumGetInt32(GetAttributeByName(t, "age", &isnull));

    PG_RETURN_BOOL(!isnull && age >= 21);
}
After this function is compiled to a shared object, it can be defined and used within
PostgreSQL. For instance:
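A sketch of the registration and a sample query (the shared-object path and the employees table are assumptions):

```sql
-- Register islegal() as taking a whole employees row and returning bool
CREATE FUNCTION islegal(employees) RETURNS bool
    AS '/usr/local/pgsql/lib/islegal.so'
    LANGUAGE 'C';

SELECT islegal(employees) FROM employees;
```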
islegal
------
t
The following is a list of tips and pointers garnered from the PostgreSQL Programmer's Guide
(for more information on this guide, visit www.postgresql.org):
Use the Postgres routines palloc and pfree instead of the standard C functions
malloc and free when allocating memory. Memory reserved with palloc will
automatically be freed for each transaction, thus preventing memory leaks.
Always zero the bytes of your structures using memset or bzero. Even if you initialize
all fields of your structure, there might be several bytes of alignment padding (holes in
the structure) that contain garbage values.
Programs will usually require at least postgres.h and fmgr.h to be included.
The internal Postgres types are declared in postgres.h, and the function manager
interfaces (PG_FUNCTION_ARGS and so on) are in fmgr.h. For portability reasons, it's
best to include postgres.h before any other system or user header files.
Symbol names defined within object files must not conflict with each other or with
symbols defined in the PostgreSQL server executable. You will have to rename your
functions or variables if you get error messages to this effect.
Extending Types
PostgreSQL has a plethora of built-in data types (see Chapter 2, "PostgreSQL Data
Types"). However, in specific cases, it might be advantageous to create custom-
defined data types.
All the data types in PostgreSQL can be defined as belonging to one of the following
cases: base types or composites.
Base types, like int4, are written in C and are compiled into the system. However,
custom data types can be compiled as shared objects and linked to the back end by
using the CREATE TYPE command.
Composite types are created whenever a new table is created. At first it might seem
counterintuitive to think of a table as a type. However, tables are merely collections
of single data types grouped in a specific order. In that way, a table can be seen as
just a "composite," or complex collection, of simpler single-element data types.
To create a custom base type, two functions must be defined: an input function
and an output function.
The input function accepts a NULL-terminated character string and returns the
type's internal representation. The output function accepts the internal
representation of the data element and returns it as a NULL-terminated
character string.
The PostgreSQL 7.1 Programmer's Guide contains a good example of how a custom
data type could be created.
First you must define the structure of your complex data type:
typedef struct Complex {
    double x;
    double y;
} Complex;
Complex *
complex_in(char *str)
{
double x, y;
Complex *result;
if (sscanf(str, " ( %lf, %lf )", &x, &y) != 2) {
elog(ERROR, "complex_in: error in parsing %s", str);
return NULL;
}
result = (Complex *)palloc(sizeof(Complex));
result->x = x;
result->y = y;
return (result);
}
char *
complex_out(Complex *complex)
{
char *result;
if (complex == NULL)
return(NULL);
result = (char *) palloc(60);
sprintf(result, "(%g,%g)", complex->x, complex->y);
return(result);
}
Care should be taken to ensure that the input and output functions are the
reciprocal of each other. If not, data that is dumped out (that is, copied to a file) will
not be able to be read back in.
After the preceding code has been compiled to a shared object, the corresponding SQL
functions must be created to register them with the database:
Lastly, the CREATE TYPE command is used to define the characteristics of the
newly created custom base type:
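Following the example in the 7.1 Programmer's Guide, the registration and type definition look roughly like this (the shared-object path is an assumption):

```sql
-- Register the input and output functions; opaque is used because
-- the complex type does not exist yet
CREATE FUNCTION complex_in(opaque) RETURNS complex
    AS '/usr/local/pgsql/lib/complex.so'
    LANGUAGE 'C';

CREATE FUNCTION complex_out(opaque) RETURNS opaque
    AS '/usr/local/pgsql/lib/complex.so'
    LANGUAGE 'C';

-- Define the type itself: two doubles occupy 16 bytes internally
CREATE TYPE complex (
    internallength = 16,
    input = complex_in,
    output = complex_out
);
```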
Binary operators are perhaps the most common. In essence, an operator is binary
when it sits between two separate operands (for example, 21 > 20). A classic
example of a binary operator is the greater-than symbol (>); it sits between two
data elements and returns a Boolean value from the evaluation of each element.
Even more basic is the addition operator (+), which sums the values on each side
and returns a result (for example, 2 + 3 returns 5).
Unary operators accept data on only one side, hence the names left unary and right
unary. An example of a right unary operator is the factorial operator (!); it sits to
the right of an integer and provides the factorial result (for example, 4!).
Operators must be defined for the specific data types they are required to act on.
For instance, the > operator performs different actions depending on whether
integers or geometric elements are being evaluated. Because of that, it is necessary
to explicitly type the specific data types that custom operators are designated to
operate on.
Before an operator can be defined, the underlying function must first be created.
These functions either can be defined as procedural functions (for example, SQL,
PL/pgSQL, and so on) or can link to a compiled C object file.
In this example, a function is created that accepts two integers. It adds these
integers. If the result is greater than 100, a TRUE value is returned; otherwise, it
returns FALSE. A simple SQL function is created to perform this action, as follows:
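A sketch of such a function and a sample call (the argument values are assumptions; 60 + 50 exceeds 100, so the result is t):

```sql
-- Returns TRUE when the two arguments sum to more than 100
CREATE FUNCTION addhund(int4, int4) RETURNS bool
    AS 'SELECT ($1 + $2) > 100;'
    LANGUAGE 'sql';

SELECT addhund(60, 50) AS answer;
```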
answer
------
t
SELECT addhund(9,9) AS answer;
answer
------
f
Next, this function is bound to a specific operator character through the use of the
CREATE OPERATOR command:
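The book's original listing is elided here; a sketch matching the surrounding description might be (the operator name ># is an assumption):

```sql
-- Bind addhund() to a binary operator with int4 on both sides;
-- the commutator is the operator itself
CREATE OPERATOR ># (
    leftarg = int4,
    rightarg = int4,
    procedure = addhund,
    commutator = >#
);
```

Once created, the operator is used infix, as in SELECT 60 ># 50;.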
The preceding command specifies that it is a binary operator that expects int4
data types on both the left and right sides. Additionally, it specifies that the
COMMUTATOR optimization for this operator is itself.
answer
------
t
answer
------
f
Optimization Notes
Operator optimization pertains to giving the database clues as to how the various
operators relate to each other. There are several optimization settings that can be
specified upon operator creation.
COMMUTATOR
3 + 8 = 11
8 + 3 = 11
You can see that the addition operator is commutative with itself. This means that it
doesn't matter what side each individual data element is on; the results will be the
same. In contrast, this differs with regard to how the subtraction operator works:
3 – 8 = –5
8 – 3 = 5
In this case, the position of the data elements does make a difference. Therefore,
subtraction is not commutative with itself.
NEGATOR
Another clause that can be specified during operator creation is what, if anything,
negates the current definition. For instance, the equal operator is negated by the
not-equal operator (for example, a = b is negated by a <> b).
RESTRICT
The RESTRICT optimization clause is only valid for binary operators that return a
Boolean result (for example, a > b). Restriction provides hints to the query
optimizer related to the particular selectivity that would satisfy a general WHERE
clause. The standard estimators are as follows:
eqsel for =
neqsel for <>
scalarltsel for < or <=
scalargtsel for > or >=
JOIN
JOIN optimization is generally only valid for binary operators that return Boolean
results (for example, a = b). The JOIN optimizer provides insight as to how many
rows would match between a pair of tables selected with a general WHERE clause
(for example, payroll.empid=employee.empid).
The possible values that can be specified for an estimation clause are shown in Table
14.3.
HASHES
In general, this only makes sense when the operator represents absolute equality
between the data types (for example, a = b). If the operator does not provide an
equality comparison between the operators, hash joins would be of little use.
SORT1 and SORT2
Use of these optimization options is very limited. In practice, it is usually only valid
for the equal (=) operator. Moreover, the two referenced operators should always be
named <.
The CREATE OPERATOR command does not perform any sanity checks to determine
the validity of optimization options. Therefore, the command might successfully
create the specified operator, but it might still fail on use. In fact, using the
SORT1/SORT2 optimization options will cause failure if either of the following
conditions is not met:
The merge join equality operator must have a commutator (should be itself if
the two data types are the same).
There must be < and > operators that have the same data types as the
specified sort operator.
Part V: Appendices
Part V Appendices
A Additional Resources
One of the first questions asked by new users to RDBMSs is, "Which one is best?"
That question is nearly impossible to answer without a full understanding of the
database's required functionality.
Comparing databases is like comparing vehicles. Each type of vehicle is suited for a
particular task; motorcycles would be preferable over pickup trucks in some
situations and would be disastrous in others. Likewise, the required functionality of
a RDBMS must be understood before the right fit can be established.
Rather than trying to compare apples to oranges, the following sections give a brief
listing of the popular RDBMSs currently available and list their strong and weak
points as well as their typical uses.
PostgreSQL
Pros:
Many commercial support options (Great Bridge, Red Hat, and others).
Has a wide array of API access solutions, including ODBC, JDBC, C, Perl, PHP,
and Python.
Fully transactional.
Cons:
Typical uses:
The only possible concern that exists when evaluating PostgreSQL is the
specific environment where it will operate. Although an NT version of the
database is available, it tends to run better in a UNIX-style environment.
MySQL
Pros:
Cons:
Doesn't perform complex joins or subselects.
No triggers.
No foreign keys.
Typical uses:
MySQL makes an excellent choice for serving dynamic web pages, particularly
if there is no need for transactions or complex queries. Additionally, MySQL is
very straightforward to install, configure, and administrate.
Pros:
Cons:
Typical uses:
Interbase
Pros:
Cons:
Lacks some of the more advanced SQL statements (such as CASE, NULLIF,
and COALESCE).
DB2
Pros:
Cons:
Proprietary software.
Typical uses:
Used by enterprises for large, complex, involved projects that need a full-
featured RDBMS.
Proper installation and configuration can be complex. As a result, DB2 may not
make much sense for small to mid-range database solutions. However, as a
back end to a massive database system, DB2 cannot be beat.
Oracle
Pros:
Cons:
Proprietary software.
Typical uses:
Web Sites
Mirror Sites
Australia
postgresql.planetmirror.com
Canada
www.ca.postgresql.org/index.html
Germany
postgresql.bnv-bamberg.de
Italy
www.postgresql.uli.it
Russia
postgresql.rinet.ru
United States
postgresql.readysetnet.com
Mailing Lists
The PostgreSQL user and development community has a very active set of mailing
lists. The procedure for subscribing to any of the following lists is as follows:
4. Optionally, set the phrase "set nomail" in the message body. This will stop the
flow of email but still keep you subscribed. (This is useful for the following
newsgroup option.)
List to discuss external APIs to the PostgreSQL back end. (Note: There are
separate lists for the ODBC and JDBC interfaces.)
Newsgroups
You can subscribe to many newsgroups from the PostgreSQL news server
(news://news.postgresql.org). Although anyone can read these groups, you must be
subscribed to one of the preceding mailing lists to post.
comp.databases.postgresql.admin
comp.databases.postgresql.announce
comp.databases.postgresql.bugs
comp.databases.postgresql.committers
comp.databases.postgresql.docs
comp.databases.postgresql.general
comp.databases.postgresql.hackers
comp.databases.postgresql.hackers.fmgr
comp.databases.postgresql.hackers.oo
comp.databases.postgresql.hackers.smgr
comp.databases.postgresql.hackers.wal
comp.databases.postgresql.interfaces
comp.databases.postgresql.interfaces.jdbc
comp.databases.postgresql.interfaces.odbc
comp.databases.postgresql.interfaces.php
comp.databases.postgresql.mirrors
comp.databases.postgresql.novice
comp.databases.postgresql.patches
comp.databases.postgresql.ports
comp.databases.postgresql.ports.cygwin
comp.databases.postgresql.questions
comp.databases.postgresql.sql
FTP Sites
The collection of mirrored FTP sites is the primary way to get source and binary
packages that relate to PostgreSQL. The main web site, www.postgresql.org, lists a
collection of addresses. Here are the more popular sites:
Australia
ftp.planetmirror.com/pub/postgresql
Canada
ftp.jack-of-all-trades.net/www.postgresql.org
looking-glass.usask.ca/pub/postgresql
postgresql.wavefire.com
Germany
ftp.leo.org/pub/comp/os/unix/database/postgresql
ftp-stud.fht-esslingen.de/pub/Mirrors/ftp.postgresql.org
Italy
ftp.postgresql.uli.it
postgresql.theomnistore.com/mirror/postgresql
bo.mirror.garr.it/mirrors/postgres
Japan
ring.asahi-net.or.jp/pub/misc/db/postgresql
Russia
ftp.chg.ru/pub/databases/postgresql
postgresql.rinet.ru
United Kingdom
postgresql.rmplc.co.uk/pub/postgresql
United States
postgresql.readysetnet.com/pub/postgresql
download.sourceforge.net/pub/mirrors/postgresql
ftp.digex.net/pub/packages/database/postgresql
ftp.crimelabs.net/pub/postgresql
Books
Matthew, Neil, et al. Professional Linux Programming. Chicago: Wrox Press, Inc.,
2000.
Appendix B. PostgreSQL Version Information
PostgreSQL is under constant development, and in the last few years, a slew of new
features have been added. The following is a brief listing of the major changes/bug
fixes that each new version has implemented. For a more comprehensive listing,
look in the ChangeLog file (usually located in /usr/local/pgsql/ChangeLogs).
Version 7.1.2 (Released May 2001)
JOIN fixes.
ODBC fixes.
Python fixes.
TOAST (The Oversized-Attribute Storage Technique) implemented. Enables rows of any
size to be stored in tables. Removed previous fixed row-length limits.
Fix implemented for inserting long multibyte strings into type CHAR.
Version 7.0.2 (Released June 2000)
Repaired the check for redundant UNIQUE and PRIMARY KEY indices.
Fix implemented for removal of temp tables if last transaction was aborted.
Fix implemented to prevent too-large tuples from being created.
PL/pgSQL bug fixes.
Added ^ precedence.
Added pg_dump -N flag to force double quotes around identifiers. This is the
default.
Fixed test for table existence to allow mixed-case and whitespace in the table name.
Views and rules are now functional thanks to extensive new code in the rewrite rules
system from Jan Wieck. He also wrote a chapter on it for the Programmer's Guide.
The parser will now perform automatic type coercion to match arguments to
available operators and functions and to match columns and expressions with target
columns. This uses a generic mechanism that supports the type extensibility
features of Postgres. There is a new chapter in the User's Guide that covers this
topic.
Three new data types have been added. Two types, inet and cidr, support
various forms of IP network, subnet, and machine addressing. There is now an 8-
byte integer type available on some platforms. A fourth type, serial, is now
supported by the parser as an amalgam of the int4 type, a sequence, and a unique
index.
EXPLAIN invokes rule system and shows plan(s) for rewritten queries.
NOTIFY now sends sender's PID so you can tell whether it was your own.
Added routines to allow sizing of varchar and bpchar into target columns.
Added HAVING clause with full support for subselects and unions.
EXPLAIN VERBOSE can pretty-print the plan to the postmaster log file.
New rewrite system fixes many problems with rules and views.
Subselects with EXISTS, IN, ALL, and ANY keywords (Vadim, Bruce, and Thomas).
Allowed NOT NULL UNIQUE constraint clause (each allowed separately before).
Support SQL-92 syntax for IS TRUE/IS FALSE/IS NOT TRUE/IS NOT FALSE.
Allowed shorter strings for Boolean literals (for example, t, tr, tru).
Implemented SQL-92 binary and hexadecimal string decoding (b'10' and x'1F').
Supported SQL-92 syntax for type coercion of literal strings (for example,
"DATETIME 'now'").
Added conversions for int2, int4, and OID types to and from text.
Allowed for a pg_password authentication database that was separate from the
system password file.
Added new psql \da, \dd, \df, \do, \dS, and \dT commands.
New front-end/back-end protocol has a version number and network byte order.
Added syntax and warnings for UNION, HAVING, INNER, and OUTER JOIN (SQL-92).
Replaced above operator !^ with >^ and below operator !| with <^.
Added routines for text trimming on both ends, substring, and string position.
Added conversion routines circle(box) and poly(circle).
MOVE implementation.
Version 6.2 (Released June 1997)
Added hostname/user level access control rather than just hostname and user.
Implemented IN qualifier.
New VACUUM option for attribute statistics and for certain columns.
Initial release.