PostgreSQL Cookbook Sample Chapter
PostgreSQL Cookbook
With the goal of teaching you the skills to master PostgreSQL, the book begins by giving you a
glimpse of the unique features of PostgreSQL and how to utilize them to solve real-world problems.
With the aid of practical examples, the book will then show you how to create and manage databases.
You will learn how to secure PostgreSQL, perform administration and maintenance tasks, implement
high availability features, and provide replication. The book will conclude by teaching you how to migrate
information from other databases to PostgreSQL.
PostgreSQL is an open source database management system. It is used for a wide variety of development
practices such as software and web design, as well as for handling large datasets (big data).
Chitij Chauhan
PostgreSQL Cookbook
Over 90 hands-on recipes to effectively manage, administer, and
design solutions using PostgreSQL
PostgreSQL is a database server that is available on a wide range of platforms and is
one of the most popular open source databases deployed in production environments
worldwide.
It is also one of the most advanced databases, with a wide range of features that challenge
even many proprietary databases. This book offers insight into these features and how
they are implemented in PostgreSQL. It is intended to be a
practical guide for database administrators and developers alike, with solutions related to
data migration, table partitioning, high availability and replication, database performance,
and using Perl and Python languages for integration with PostgreSQL.
Chapter 7, High Availability and Replication, demonstrates the high availability and
replication concepts in PostgreSQL. After reading this chapter, you will be able to
implement high availability and replication options using different techniques including
streaming replication, Slony replication, replication using Bucardo, and replication using
Londiste. Eventually, you will be able to implement a full-fledged, active/passive,
highly available PostgreSQL cluster using open source tools such as DRBD, Pacemaker,
and Corosync.
Chapter 8, Connection Pooling, covers connection pooling tools such as pgpool and
pgbouncer, which help reduce database overhead when there are a large number of
concurrent connections. After reading this chapter, you should be able to configure
pgpool and pgbouncer.
Chapter 9, Table Partitioning, explains the different partitioning methods and how to
implement logical segregation of table data into partitions. You will also become familiar
with implementing horizontal partitioning using PL/Proxy.
Chapter 10, Accessing PostgreSQL from Perl, makes you familiar with creating database
connections, accessing data, and performing DML operations on the PostgreSQL
database using Perl programming.
Chapter 11, Accessing PostgreSQL from Python, shows you how to create database
connections, access data, and carry out DML operations on the PostgreSQL database
using Python programming.
Chapter 12, Data Migration from Other Databases and Upgrading the PostgreSQL
Cluster, covers the different mechanisms available to initiate minor and major version
upgrades of PostgreSQL. You will also become familiar with the Oracle GoldenGate tool
used to replicate data from other databases to PostgreSQL.
Chapter 1
Managing Databases and the PostgreSQL Server
In this chapter, we will cover the following recipes:
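Creating databases
Creating schemas
Creating users
Creating groups
Destroying databases
Creating tablespaces
Moving objects between tablespaces
Initializing a database cluster
Starting the server
Stopping the server
Displaying the server status
Reloading the server configuration files
Terminating connections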
Introduction
PostgreSQL is an open source, object-relational database management system
that was originally developed at the Computer Science Department of the University
of California, Berkeley.
PostgreSQL is an advanced database server available on a wide range of platforms, including
Unix-based operating systems such as Oracle Solaris, IBM AIX, and HP-UX; Windows;
Mac OS X; and Red Hat Linux and other Linux-based platforms.
We start by showing how to create databases in PostgreSQL. During the course of this
chapter, we will cover schemas, users, groups, and tablespaces, and show how to create
these entities. We will also show how to start and stop the PostgreSQL server services.
Creating databases
A database is a systematic and organized collection of data which can be easily accessed,
managed, and updated. It provides an efficient way of retrieving stored information.
PostgreSQL is a powerful open source database. It is portable because it is written in ANSI
C, and as a result it is available for many different platforms. It is reliable and ACID
(short for Atomicity, Consistency, Isolation, Durability) compliant, and it supports
transactions. It is scalable, as it supports multiversion concurrency control (MVCC) and
table partitioning; it is secure, as it employs host-based access control and supports SSL;
and it provides high availability and replication through features such as streaming
replication and point-in-time recovery.
Getting ready
Before you start creating databases, you need to install PostgreSQL on your computer.
For Red Hat or CentOS Linux environments, you can download the correct RPM for
PostgreSQL 9.3 from yum.postgresql.org.
Here is a link you can use to install PostgreSQL on CentOS:
http://www.postgresonline.com/journal/archives/329-An-almost-idiots-guide-to-install-PostgreSQL-9.3,-PostGIS-2.1-and-pgRouting-with-Yum.html
The following are links you can use to install PostgreSQL on an Ubuntu platform:
http://technobytz.com/install-postgresql-9-3-ubuntu.html
http://www.cloudservers.com/installing-and-configuring-postgresql-9-3-on-hosted-linux-cloud-vps-server/
Alternatively, you may download the graphical PostgreSQL installer available from the
EnterpriseDB website, at http://www.enterprisedb.com/products-services-training/pgdownload.
For details on how to install PostgreSQL using the graphical PostgreSQL installer from the
EnterpriseDB website, you can refer to the following link for further instructions:
http://www.enterprisedb.com/docs/en/9.3/pginstguide/Table%20of%20Contents.htm
Once you have downloaded and installed PostgreSQL, you will need to define the data
directory, which is the storage location for all of the data files for the database. You will then
need to initialize the data directory. Initialization of the data directory is covered under the
recipe titled Initializing a database cluster. After this, you are ready to create the database.
To connect to a database using the psql utility, you can use the following command:
psql -h localhost -d postgres -p 5432
Here, we are connecting to the postgres database on localhost (that is, the same server on
which PostgreSQL was installed) over port 5432.
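The same connection parameters can also be written as a single libpq key/value connection string. psql accepts such a string in place of a database name, for example psql "host=localhost port=5432 dbname=postgres".

The following is a minimal Python sketch that builds such a string. The function names are hypothetical. The quoting follows libpq's documented rule: empty values and values containing spaces are wrapped in single quotes, and single quotes and backslashes inside a value are escaped with a backslash.

```python
def quote_conninfo_value(value: str) -> str:
    """Hypothetical helper: quote one value for a libpq key/value string."""
    needs_quoting = value == "" or any(ch.isspace() or ch in "'\\" for ch in value)
    if not needs_quoting:
        return value
    # Escape backslashes first, then single quotes, then wrap in single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def conninfo(**params: object) -> str:
    """Hypothetical helper: join keyword=value pairs in the order given."""
    return " ".join(f"{key}={quote_conninfo_value(str(value))}"
                    for key, value in params.items())


if __name__ == "__main__":
    # Same target as: psql -h localhost -d postgres -p 5432
    print(conninfo(host="localhost", port=5432, dbname="postgres"))
```

Passing the printed string to psql as its database argument should reach the same database as the command-line switches shown above.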
The following statement creates a user named hr, which the next section uses as the owner
of the hrdb database:
CREATE USER hr WITH PASSWORD 'hr';
More details regarding creating users will be covered in the Creating users recipe.
How to do it...
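PostgreSQL provides two methods to create a new database:
The first method relies on using the CREATE DATABASE SQL statement:
CREATE DATABASE hrdb WITH ENCODING='UTF8' OWNER=hr
CONNECTION LIMIT=25;
The second method relies on the createdb command-line utility, which is a wrapper around the
CREATE DATABASE statement:
$ createdb -h localhost -p 5432 -U postgres testdb1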
How it works...
A database is a named collection of objects such as tables, functions, and so on.
In order to create a database, the user must be either a superuser or must have
the special CREATEDB privilege.
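Sometimes database names and owners come from a script or configuration file rather than being typed by hand. In that case it can help to assemble the statement programmatically.

The following Python sketch rebuilds the CREATE DATABASE statement from this recipe. The function names are hypothetical. It assumes standard_conforming_strings is on (the default since PostgreSQL 9.1), and it always double-quotes identifiers, which PostgreSQL accepts for any name. Note that quoting preserves case, so "HRDB" and hrdb would name different databases.

```python
def quote_ident(name: str) -> str:
    """Hypothetical helper: double-quote an identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Hypothetical helper: single-quote a literal (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


def create_database_sql(name, owner=None, encoding=None, connection_limit=None):
    """Build CREATE DATABASE with only the options that were supplied."""
    options = []
    if encoding is not None:
        options.append(f"ENCODING={quote_literal(encoding)}")
    if owner is not None:
        options.append(f"OWNER={quote_ident(owner)}")
    if connection_limit is not None:
        options.append(f"CONNECTION LIMIT={int(connection_limit)}")
    sql = f"CREATE DATABASE {quote_ident(name)}"
    if options:
        sql += " WITH " + " ".join(options)
    return sql + ";"


if __name__ == "__main__":
    print(create_database_sql("hrdb", owner="hr", encoding="UTF8", connection_limit=25))
```

The generated statement still has to be executed by a superuser or by a role with the CREATEDB privilege.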
You can use the \l meta-command of psql to view the list of existing databases.
Creating schemas
Schemas are among the most important objects within a database. A schema is a named
collection of tables. A schema may also contain views, indexes, sequences, data types,
operators, and functions. Schemas help organize database objects into logical groups,
which helps make these objects more manageable.
How to do it...
You can use the CREATE SCHEMA statement to create a new schema in PostgreSQL:
CREATE SCHEMA employee;
How it works...
A schema is a logical entity that helps organize objects and data in the database.
By default, if you don't create any schemas, any new objects will be created in the public schema.
In order to create a schema, the user must either be a superuser or must have the CREATE
privilege for the current database.
Once a schema is created, it can be used to create new objects such as tables and views
within that schema.
There's more...
You may use the \dn meta-command of psql to list all of the schemas in a database, as shown in the
following screenshot:
To identify the schema in which you are currently working, you can use the following command:
SELECT current_schema();
When an object name is not schema-qualified, PostgreSQL searches the schemas listed in the
search_path parameter, in order, and uses the first match. You can set a database-level
default for this parameter, which applies to sessions started afterward, as follows:
ALTER DATABASE hrdb SET search_path TO hr, hrms, public, pg_catalog;
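To make the lookup order concrete, here is a small Python model of how an unqualified name is resolved against the search_path above. It is a simplified sketch, not PostgreSQL's implementation, and the function names and catalog contents are hypothetical.

- It assumes that pg_catalog is searched first when it is not listed explicitly, as PostgreSQL does.
- It skips schemas that do not exist.
- It ignores $user and temporary schemas.

```python
from __future__ import annotations


def effective_path(search_path: list[str]) -> list[str]:
    """pg_catalog is searched first unless it is listed explicitly."""
    if "pg_catalog" in search_path:
        return list(search_path)
    return ["pg_catalog", *search_path]


def current_schema(search_path: list[str], existing: set[str]) -> str | None:
    """Like current_schema(): the first schema on the path that exists."""
    for schema in search_path:
        if schema in existing:
            return schema
    return None


def resolve(name: str, search_path: list[str],
            catalog: dict[str, set[str]]) -> str | None:
    """Return schema.name for the first schema on the path containing `name`."""
    for schema in effective_path(search_path):
        if name in catalog.get(schema, set()):
            return f"{schema}.{name}"
    return None


if __name__ == "__main__":
    path = ["hr", "hrms", "public", "pg_catalog"]  # as set by ALTER DATABASE above
    catalog = {  # hypothetical contents; the hrms schema does not exist here
        "hr": {"employee"},
        "public": {"employee", "dept"},
        "pg_catalog": {"pg_class"},
    }
    print(resolve("employee", path, catalog))  # hr.employee (hr comes before public)
    print(resolve("dept", path, catalog))      # public.dept
    print(current_schema(path, set(catalog)))  # hr
```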
Creating users
A user is a login role, that is, a role that is allowed to log in to the PostgreSQL server.
Login roles are where you define accounts for individual users of the PostgreSQL system, and
each database user should have an individual account. Each user has an internal system
identifier in PostgreSQL, known as a sysid, which is used to associate objects in a database
with their owner. Users may also have global rights assigned to them when they are created.
These rights determine whether a user is allowed to create or drop databases and whether
the user is a superuser.
How to do it...
PostgreSQL provides two methods by which database users are created:
The first method uses the CREATE USER SQL statement to create a new user in the
database, like this:
CREATE USER agovil WITH PASSWORD 'Kh@rt0um';
The second method requires executing the createuser script from the
command line.
For example, we can use the createuser script to create a user called nchabbra on localhost
(port 5432); the -S option specifies that the created user will not have
superuser privileges:
$ createuser -h localhost -p 5432 -S nchabbra
How it works...
The CREATE USER SQL statement requires one mandatory parameter: the name of the new user.
All other parameters are optional. They include a password, the system ID (accepted only for
backward compatibility and ignored by current releases), and a set of explicitly allocated
privileges.
The createuser script can be invoked with options and the name of the user to be created on
the command line. From PostgreSQL 9.2 onward, it prompts for the username and the set of rights
only when the --interactive option is given; up to version 9.1, it prompted by default. Unless
a host is specified, it makes a local connection to PostgreSQL. You will need to grant the new
user access to a database explicitly if the user is not the owner of that database.
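The mandatory name and the optional clauses described above map naturally onto a function signature with one required argument and several keyword defaults.

The Python sketch below assembles a CREATE USER statement from these parts:

- the role attributes SUPERUSER/NOSUPERUSER, CREATEDB/NOCREATEDB, and CREATEROLE/NOCREATEROLE;
- an optional PASSWORD;
- an optional CONNECTION LIMIT.

The function and its defaults are hypothetical. A clear-text password sent this way may be recorded in the server log if statement logging is enabled, so psql's \password meta-command is often preferable when working interactively.

```python
def quote_ident(name: str) -> str:
    """Hypothetical helper: double-quote an identifier."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Hypothetical helper; assumes standard_conforming_strings = on."""
    return "'" + value.replace("'", "''") + "'"


def create_user_sql(name, password=None, superuser=False, createdb=False,
                    createrole=False, connection_limit=None):
    """Hypothetical builder: only the name is mandatory; everything else is optional."""
    options = [
        "SUPERUSER" if superuser else "NOSUPERUSER",
        "CREATEDB" if createdb else "NOCREATEDB",
        "CREATEROLE" if createrole else "NOCREATEROLE",
    ]
    if password is not None:
        options.append("PASSWORD " + quote_literal(password))
    if connection_limit is not None:
        options.append(f"CONNECTION LIMIT {int(connection_limit)}")
    return f"CREATE USER {quote_ident(name)} WITH {' '.join(options)};"


if __name__ == "__main__":
    print(create_user_sql("agovil", password="Kh@rt0um"))
```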
There's more...
We can use the \du meta-command of psql to display the list of existing users, including other
roles, in the PostgreSQL server, as shown in this screenshot:
Alternatively, you may obtain the list of users by querying the pg_user system view with a
SQL statement, as shown in the following screenshot:
Creating groups
A group in the PostgreSQL server is similar to the groups that exist in Unix and Linux. A group
in PostgreSQL serves to simplify the assignment of rights. It simply requires a name and may
be created empty. Once it is created, users who are intended to share common access rights
are added into the group together, and are thus associated by their membership within that
group. Grants on the database objects are then given to the group instead of each individual
group member.
How to do it...
Groups in the PostgreSQL server can be created by using the CREATE GROUP SQL statement.
The following command will create a group. However, no users are currently a part of this group:
hrdb=# CREATE GROUP dept;
In order to assign members/users to the group, we can use the ALTER GROUP statement
as follows:
hrdb=# ALTER GROUP dept ADD USER agovil,nchabbra;
It is also possible to create a group and assign users upon its creation, as shown in the
following CREATE GROUP statement:
hrdb=# CREATE GROUP admins WITH user agovil,nchabbra;
How it works...
A group is a system-wide database object that can be assigned privileges and have users
added to it as members. A group is a role that cannot be used to log in to any database.
It is also possible to grant membership in a group to another group, thereby allowing the
member role to use the privileges assigned to the group of which it is a member.
Database groups are global across a database cluster installation.
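Because groups can be members of other groups, a user can exercise the privileges granted to every role reachable through its memberships.

The following Python sketch models that lookup as a breadth-first search over a membership graph. It assumes every role has the INHERIT attribute (the default); a role created with NOINHERIT would not automatically use its groups' privileges. The grants are hypothetical, and so is the dept-inside-admins membership, which in SQL would be GRANT admins TO dept.

```python
from collections import deque


def effective_roles(role, member_of):
    """Return `role` plus every role it belongs to, directly or through other groups."""
    seen = {role}
    queue = deque([role])
    while queue:
        current = queue.popleft()
        for group in member_of.get(current, ()):
            if group not in seen:  # do not revisit a group reached by another path
                seen.add(group)
                queue.append(group)
    return seen


def effective_privileges(role, member_of, grants):
    """Union of the privileges granted to every role reachable from `role`."""
    privileges = set()
    for r in effective_roles(role, member_of):
        privileges |= grants.get(r, set())
    return privileges


if __name__ == "__main__":
    member_of = {
        "agovil": ["dept"],    # ALTER GROUP dept ADD USER agovil
        "nchabbra": ["dept"],
        "dept": ["admins"],    # hypothetical: GRANT admins TO dept
    }
    grants = {"dept": {"SELECT ON employee"}, "admins": {"UPDATE ON employee"}}
    print(sorted(effective_privileges("agovil", member_of, grants)))
```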
There's more...
To list all of the available groups in the PostgreSQL server instance, you can query the
pg_group system view, as shown in the following screenshot:
Destroying databases
Every major RDBMS vendor offers the ability to drop databases just as it allows you to create
databases. However, one should exercise caution when dealing with situations like dropping
databases. Once a database is dropped, all of the information residing in it is lost forever. It
is only for a valid business purpose that we should drop databases. In normal circumstances,
a database is only dropped when it gets decommissioned and is no longer required for
business operations.
How to do it...
There are two methods to drop a database in the PostgreSQL server instance:
You can use the DROP DATABASE statement to drop a database from PostgreSQL while
connected to a different database, as follows:
postgres=# DROP DATABASE hrdb;
You can use the dropdb command-line utility, which is a wrapper around the DROP
DATABASE command:
$ dropdb hrdb
How it works...
The DROP DATABASE statement permanently deletes the catalog entries for the database and the
directory containing its data. Only the owner of the database (or a superuser) can issue the
DROP DATABASE statement.
Also, it is not possible to drop a database to which you are connected. In order to delete the
database, first connect to another database, such as postgres, and issue the statement from there.
There's more...
One situation that demands attention is when a user tries to drop a database that has active
connections. The user will get an error when trying to drop such a database.
In order to drop a database that has active connections to it, you will have to follow these steps:
1. Identify all of the active sessions on the database. To do this, query the
pg_stat_activity system view as follows:
SELECT * FROM pg_stat_activity WHERE datname = 'testdb1';
2. Terminate all of the active connections to the database. To terminate all of the active
connections, you will need to use the pg_terminate_backend function as follows:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
datname = 'testdb1';
3. Once all of the connections are terminated, you may proceed with dropping the
database using the DROP DATABASE statement.
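The three steps can be collected into an ordered list of statements. The Python sketch below generates them for a given database name; the helper names are hypothetical.

- It adds pid <> pg_backend_pid() so that the session running the script is never terminated.
- It assumes PostgreSQL 9.2 or later, where pg_stat_activity exposes the pid column.
- The statements must be run from a connection to a different database, such as postgres.
- DROP DATABASE cannot run inside a transaction block, so a driver such as psycopg2 would need autocommit enabled.
- New connections can still arrive between steps 2 and 3, so the drop may need to be retried.

```python
def quote_ident(name: str) -> str:
    """Hypothetical helper: double-quote an identifier."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Hypothetical helper; assumes standard_conforming_strings = on."""
    return "'" + value.replace("'", "''") + "'"


def drop_database_steps(dbname):
    """Hypothetical helper: return the SQL for steps 1 to 3, in order."""
    target = quote_literal(dbname)
    return [
        # Step 1: see who is connected.
        f"SELECT pid, usename, client_addr FROM pg_stat_activity WHERE datname = {target};",
        # Step 2: terminate every other session on that database.
        "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
        f"WHERE datname = {target} AND pid <> pg_backend_pid();",
        # Step 3: drop the database once nobody is connected.
        f"DROP DATABASE {quote_ident(dbname)};",
    ]


if __name__ == "__main__":
    for statement in drop_database_steps("testdb1"):
        print(statement)
```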
Creating tablespaces
Getting ready
A tablespace is a location on the disk where PostgreSQL stores data files containing database
objects, for example indexes, tables, and so on.
How to do it...
To create a tablespace in PostgreSQL, you need to use the CREATE TABLESPACE statement.
The following command creates a data_tbs tablespace, which is owned by the agovil user:
CREATE TABLESPACE data_tbs OWNER agovil LOCATION
'/var/lib/pgsql/data/dbs';
How it works...
A tablespace allows you to control the disk layout of PostgreSQL. By default, the owner of the
tablespace is the user who executed the CREATE TABLESPACE statement. The OWNER clause of
the CREATE TABLESPACE statement lets you assign ownership of the tablespace to a different
user instead.
The name of the tablespace should not begin with a pg_ prefix because this is reserved for
the system tablespaces.
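The naming rule above can be checked before the statement is ever sent. So can the documented requirement that LOCATION be an absolute path.

Below is a minimal Python sketch with hypothetical function names. It assumes Unix-style paths, as in this recipe. It does not check that the directory exists, is owned by the operating system user running PostgreSQL, and is empty, all of which the server expects.

```python
import posixpath


def quote_ident(name: str) -> str:
    """Hypothetical helper: double-quote an identifier (the name is kept exactly as given)."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Hypothetical helper; assumes standard_conforming_strings = on."""
    return "'" + value.replace("'", "''") + "'"


def create_tablespace_sql(name, location, owner=None):
    """Hypothetical helper: validate the inputs, then build CREATE TABLESPACE."""
    if name.startswith("pg_"):
        raise ValueError("names beginning with pg_ are reserved for system tablespaces")
    if not posixpath.isabs(location):
        raise ValueError("LOCATION must be an absolute path")
    sql = f"CREATE TABLESPACE {quote_ident(name)}"
    if owner is not None:
        sql += f" OWNER {quote_ident(owner)}"  # otherwise the executing user owns it
    return sql + f" LOCATION {quote_literal(location)};"


if __name__ == "__main__":
    print(create_tablespace_sql("data_tbs", "/var/lib/pgsql/data/dbs", owner="agovil"))
```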
Before deleting a tablespace, ensure that it is empty, which means there should be no
database objects inside it. If the user tries to delete the tablespace when it is not empty,
the command will fail.
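There are two options for emptying a tablespace so that it can be deleted:
Drop all of the database objects that reside in the tablespace.
Move all of the database objects that reside in the tablespace to a different tablespace.
After either of these actions has been completed, the tablespace can be dropped with the
DROP TABLESPACE statement.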
There's more...
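By default, two tablespaces exist in PostgreSQL:
pg_default: the default tablespace of the template1 and template0 databases, and therefore of
new databases unless a different tablespace is specified
pg_global: the tablespace used for the shared system catalogs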
You may query the pg_tablespace catalog table to get the list of existing tablespaces in
PostgreSQL, as shown in the following screenshot:
Moving objects between tablespaces
Getting ready
Here, we will first create a directory for the new tablespace, hrms, using the following command:
mkdir -p /var/lib/pgsql/data/hrms
Then we create the hrms tablespace in that directory using the following
statement:
CREATE TABLESPACE HRMS OWNER agovil LOCATION
'/var/lib/pgsql/data/hrms';
We will also create a table, insert some records into it, and create a corresponding index for it.
This is being done because the table and its index will be used in the How to do it section of
this recipe:
CREATE
INSERT
INSERT
CREATE
How to do it...
Moving a complete database to a different tablespace involves three steps:
1. You will change the tablespace for the given database so that new objects for the
associated database are created in the new tablespace:
ALTER DATABASE testdb1 SET default_tablespace='hrms';
2. You will have to then move all of the existing tables in the corresponding database to
the new tablespace:
ALTER TABLE employee SET TABLESPACE hrms;
3. You will also have to move any existing indexes to the new tablespace:
ALTER INDEX emp_idx SET TABLESPACE hrms;
How it works...
You can query the pg_tables system view to find out which tables in the current
database need to be moved to a different tablespace.
Similarly, for indexes, you can query the pg_indexes system view to find out
which indexes need to be moved. Indexes have to be moved separately because
ALTER TABLE ... SET TABLESPACE does not move a table's indexes.
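The catalog lookups and the ALTER statements can be combined. The Python sketch below holds example queries against the pg_tables and pg_indexes views that exclude the system schemas. It turns their (schemaname, name) rows into ALTER ... SET TABLESPACE statements.

The function names are hypothetical. The rows in the usage example stand in for real query results. Each ALTER takes an exclusive lock on the table and copies its files, so this is best done during a maintenance window.

```python
# Example lookup queries (not executed here); run them in the target database.
FIND_TABLES = (
    "SELECT schemaname, tablename FROM pg_tables "
    "WHERE schemaname NOT IN ('pg_catalog', 'information_schema');"
)
FIND_INDEXES = (
    "SELECT schemaname, indexname FROM pg_indexes "
    "WHERE schemaname NOT IN ('pg_catalog', 'information_schema');"
)


def quote_ident(name: str) -> str:
    """Hypothetical helper: double-quote an identifier."""
    return '"' + name.replace('"', '""') + '"'


def move_statements(tables, indexes, tablespace):
    """Hypothetical helper: ALTER ... SET TABLESPACE for each (schema, name) row."""
    target = quote_ident(tablespace)
    statements = [
        f"ALTER TABLE {quote_ident(s)}.{quote_ident(t)} SET TABLESPACE {target};"
        for s, t in tables
    ]
    statements += [
        f"ALTER INDEX {quote_ident(s)}.{quote_ident(i)} SET TABLESPACE {target};"
        for s, i in indexes
    ]
    return statements


if __name__ == "__main__":
    table_rows = [("public", "employee")]   # stand-in for FIND_TABLES results
    index_rows = [("public", "emp_idx")]    # stand-in for FIND_INDEXES results
    for statement in move_statements(table_rows, index_rows, "hrms"):
        print(statement)
```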
Initializing a database cluster
How to do it...
The initdb command is used to initialize, or create, the database cluster. The -D switch of
the initdb command is used to specify the filesystem location for the database cluster.
To create the database cluster, use the initdb command:
$ initdb -D /var/lib/pgsql/data
Another way of initializing the database cluster is by calling the initdb command via the
pg_ctl utility:
$ pg_ctl -D /var/lib/pgsql/data initdb
How it works...
A database cluster is a collection of databases that are managed by a single server instance.
When the initdb command is run, the directories in which the database data will
reside are created, the shared catalog tables are generated, and the template0, template1,
and postgres databases are created. The postgres database is the default database intended
for users and applications to connect to. The initdb command also sets the default locale
and character set encoding of the database cluster.
You can refer to http://www.postgresql.org/docs/9.3/static/creating-cluster.html for more information on initializing a database cluster.
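Because the default locale and encoding are fixed when the cluster is initialized, it can be worth making them explicit.

The Python sketch below builds the argument lists for both invocation styles. It uses initdb's -E (encoding) and --locale options, and pg_ctl's -o option, which passes a string of options through to initdb. The helper names are hypothetical, and the en_US.UTF-8 locale is an assumption that must exist on the host. The commands are only built and printed, not run.

```python
import shlex


def initdb_argv(datadir, encoding="UTF8", locale=None):
    """Hypothetical helper: arguments for running initdb directly."""
    argv = ["initdb", "-D", datadir, "-E", encoding]
    if locale is not None:
        argv += ["--locale", locale]
    return argv


def pg_ctl_initdb_argv(datadir, encoding="UTF8", locale=None):
    """Hypothetical helper: the same options passed through pg_ctl -o as one string."""
    extra = ["-E", encoding]
    if locale is not None:
        extra += ["--locale", locale]
    return ["pg_ctl", "-D", datadir, "initdb", "-o", shlex.join(extra)]


if __name__ == "__main__":
    print(shlex.join(initdb_argv("/var/lib/pgsql/data", locale="en_US.UTF-8")))
    print(shlex.join(pg_ctl_initdb_argv("/var/lib/pgsql/data", locale="en_US.UTF-8")))
```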
Starting the server
Getting ready
The term "server" refers to the database and the associated backend processes. The term
"service" refers to the operating system wrapper through which the server gets invoked. In
normal circumstances, the PostgreSQL server will usually start automatically when the system
boots up. However, there will be situations where you may have to start the server manually
for different reasons.
How to do it...
There are several methods through which the PostgreSQL server can be started on Unix or
Linux platforms:
The first method relies on passing the start argument to the pg_ctl utility to get
the postmaster backend process started, which effectively means starting the
PostgreSQL server.
The next method relies on using the service command, which, if supported by the
operating system, can be used as a wrapper to the installed PostgreSQL script.
The last method involves invoking the installed PostgreSQL script directly using its
complete path.
On most Unix distributions and Red Hat-based Linux distributions, the pg_ctl utility can be
used as follows:
pg_ctl -D /var/lib/pgsql/data start
For PostgreSQL version 9.3, the service command to start the PostgreSQL server is as follows:
service postgresql-9.3 start
You may also start the server by manually invoking the installed PostgreSQL script using its
complete path:
/etc/rc.d/init.d/postgresql-9.3 start
How it works...
The start argument of the pg_ctl utility will first start PostgreSQL's postmaster backend
process using the path of the data directory.
The database system then starts up, reports the last time the database system
was shut down, and writes various log messages before returning the postgres
user to the shell prompt.
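When the server is started from a script, it is usually useful to wait until it is actually accepting connections.

The Python sketch below assumes pg_ctl is on the PATH and that the script runs as the postgres operating system user. It uses two pg_ctl options: -w, which waits for startup to complete, and -l, which appends server output to a log file. pg_ctl's exit status is used to decide whether the start succeeded. The helper names and the log file path are hypothetical.

```python
import subprocess


def start_argv(datadir, logfile=None, wait=True):
    """Hypothetical helper: build a pg_ctl start command line."""
    argv = ["pg_ctl", "-D", datadir]
    if logfile is not None:
        argv += ["-l", logfile]   # server output goes to this file
    if wait:
        argv.append("-w")         # return only once startup has completed
    argv.append("start")
    return argv


def start_server(datadir, logfile=None):
    """Hypothetical helper: run pg_ctl and report whether it exited successfully."""
    result = subprocess.run(start_argv(datadir, logfile), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Only prints the command; call start_server() to actually run it.
    print(start_argv("/var/lib/pgsql/data", "/var/lib/pgsql/pgstartup.log"))
```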
There's more...
In Ubuntu and Debian Linux distributions, the pg_ctlcluster wrapper can be used with the
start argument to start the postmaster server for a particular cluster, for example,
pg_ctlcluster 9.3 main start. On these distributions, several clusters, each identified by
its PostgreSQL version and cluster name, may coexist on a single host.
Stopping the server
How to do it...
There are several ways by which the PostgreSQL server can be stopped.
On Unix distributions and Red Hat-based Linux distributions, we can use the stop argument
of the pg_ctl utility to stop the postmaster:
pg_ctl -D /var/lib/pgsql/data stop -m fast
Using the service command, the PostgreSQL server can be stopped like this:
service postgresql stop
You may also stop the server by manually invoking the installed PostgreSQL script using its
complete path:
/etc/rc.d/init.d/postgresql stop
On Windows-based systems, you may stop the postmaster service in this manner:
NET STOP postgresql-9.3
How it works...
The pg_ctl utility checks for the running postmaster process, and if the stop argument of
the pg_ctl utility is invoked, then the server is shut down.
In smart mode, which is the default for pg_ctl in PostgreSQL 9.3, the server waits for all
clients to disconnect before shutting down. (From PostgreSQL 9.5 onward, the default is fast.)
With a fast shutdown, there is no waiting: all active transactions are rolled back and all
clients are forcibly disconnected.
There's more...
There may be situations where you need to stop the PostgreSQL server in an emergency, and
for this, PostgreSQL provides the immediate shutdown mode.
In an immediate shutdown, the server processes are sent a harsher signal and exit at once,
without completing a normal shutdown.
The consequence of this type of shutdown is that PostgreSQL does not finish its disk I/O or
write a shutdown checkpoint, and therefore has to perform crash recovery the next time it is started.
The immediate shutdown mode can be invoked like this:
pg_ctl -D /var/lib/pgsql/data stop -m immediate
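The three shutdown modes differ in how long they wait for clients. A script can name the mode explicitly rather than relying on the default, which changed from smart to fast in PostgreSQL 9.5.

Below is a minimal Python sketch with hypothetical names. The mode descriptions summarize pg_ctl's documented behavior. -w makes pg_ctl wait for the shutdown, and -t caps that wait in seconds.

```python
SHUTDOWN_MODES = {
    "smart": "wait for all clients to disconnect",
    "fast": "roll back active transactions and disconnect clients",
    "immediate": "abort at once without a clean shutdown; crash recovery on next start",
}


def stop_argv(datadir, mode="fast", timeout=60):
    """Hypothetical helper: build pg_ctl stop with an explicit mode and wait limit."""
    if mode not in SHUTDOWN_MODES:
        raise ValueError(f"unknown shutdown mode: {mode!r}")
    # -w waits for the shutdown to finish; -t limits the wait to `timeout` seconds.
    return ["pg_ctl", "-D", datadir, "-m", mode, "-w", "-t", str(timeout), "stop"]


if __name__ == "__main__":
    for mode, effect in SHUTDOWN_MODES.items():
        print(" ".join(stop_argv("/var/lib/pgsql/data", mode)), "#", effect)
```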
Displaying the server status
How to do it...
There are several ways by which the status of the PostgreSQL server can be checked.
On Unix and on Red Hat-based Linux distributions, the status argument of the pg_ctl utility
can be used to check the status of a running postmaster backend:
pg_ctl -D /var/lib/pgsql/data status
On Unix-based and Linux-based platforms supporting the service command, the status of a
postgresql process can be checked as follows:
service postgresql status
You may also check the server status by manually invoking the installed PostgreSQL script
using its complete path:
/etc/rc.d/init.d/postgresql status
How it works...
The status mode of the pg_ctl utility checks whether the postmaster process is running in
the specified data directory.
If the server is running, then the process ID and the command-line options that were used to
invoke it are displayed.
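In a monitoring or deployment script, the exit status of pg_ctl status is easier to rely on than its text output. According to the pg_ctl documentation, status exits with 0 when the server is running and 3 when it is not.

The Python sketch below maps those codes and treats anything else as unknown. Newer releases also return 4 when the data directory is not accessible. The helper names are hypothetical, and pg_ctl is assumed to be on the PATH.

```python
import subprocess

STATUS_BY_EXIT_CODE = {0: "running", 3: "not running"}


def interpret_status(returncode):
    """Hypothetical helper: map a pg_ctl status exit code to a readable state."""
    return STATUS_BY_EXIT_CODE.get(returncode, "unknown")


def server_status(datadir):
    """Hypothetical helper: run pg_ctl status; stdout shows the PID and command line if running."""
    result = subprocess.run(["pg_ctl", "-D", datadir, "status"],
                            capture_output=True, text=True)
    return interpret_status(result.returncode), result.stdout
```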
Reloading the server configuration files
How to do it...
Some configuration parameters in PostgreSQL can be changed on the fly within a session. Changes
to other parameters take effect only once the server configuration files are reloaded, and
some parameters require a full server restart.
On most Unix-based and Linux-based platforms, the command to reload the server
configuration file is as follows:
pg_ctl -D /var/lib/pgsql/data reload
It is also possible to reload the configuration file while connected to a PostgreSQL
session, although only a superuser can do this:
postgres=# select pg_reload_conf();
On Red Hat and other Linux-based systems that support the service command, the
postgresql command to reload the configuration file is as follows:
service postgresql reload
How it works...
To ensure that changes made to the parameters in the configuration file take effect, a reload
of the configuration file is needed. Reloading the configuration files means sending the
SIGHUP signal to the postmaster process, which in turn forwards it to the other connected
backend sessions.
Some configuration parameters can be changed only by editing the configuration file and
reloading the server. These parameters have the value sighup in the context column of the
pg_settings view. For example, the following query lists those whose names end in delay or
timeout and that are not set to zero:
SELECT name, setting, unit, (source = 'default') AS is_default
FROM pg_settings
WHERE context = 'sighup'
AND (name LIKE '%delay' OR name LIKE '%timeout')
AND setting != '0';
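The context column is what tells you how a changed value takes effect. The Python sketch below summarizes, as a lookup table, the context values found in PostgreSQL 9.3; later releases add superuser-backend. The helper name is hypothetical. The rows in the usage example are illustrative stand-ins for the output of SELECT name, context FROM pg_settings.

```python
HOW_TO_APPLY = {
    "internal": "cannot be changed; fixed at build or initdb time",
    "postmaster": "edit postgresql.conf and restart the server",
    "sighup": "edit postgresql.conf and reload (pg_ctl reload or pg_reload_conf())",
    "backend": "edit postgresql.conf and reload; applies to sessions started afterwards",
    "superuser": "SET by a superuser in a session, or edit postgresql.conf and reload",
    "user": "SET by any user in a session, or edit postgresql.conf and reload",
}


def how_to_apply(context):
    """Hypothetical helper: describe how a change to a parameter takes effect."""
    return HOW_TO_APPLY.get(context, "unknown context")


if __name__ == "__main__":
    sample_rows = [  # illustrative (name, context) pairs from pg_settings
        ("shared_buffers", "postmaster"),
        ("checkpoint_timeout", "sighup"),
        ("work_mem", "user"),
    ]
    for name, context in sample_rows:
        print(f"{name}: {how_to_apply(context)}")
```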
Terminating connections
Every major RDBMS, including PostgreSQL, allows simultaneous database connections so that
many users can run transactions concurrently. As a result, during peak transaction hours,
database performance may slow down or some sessions may block others. In such situations, we
might have to terminate specific sessions, or all sessions coming from a particular user, to
get database performance back to normal.
How to do it...
PostgreSQL provides the pg_terminate_backend function to kill a specific session. Although
pg_terminate_backend acts on a single connection at a time, we can call it from a SELECT
query on pg_stat_activity to kill multiple connections, based on the filter criteria specified
in the WHERE clause.
To terminate all of the connections from a particular database, we can use the
pg_terminate_backend function as follows:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE datname = 'testdb1';
To terminate all of the connections for a particular user, we can use pg_terminate_backend
like this:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE usename = 'agovil';
How it works...
The pg_terminate_backend function takes the process ID of a backend (the pid column) as
input. The value of pid can be obtained from the pg_stat_activity system view. Once pid is
passed to pg_terminate_backend, any query running in that backend is aborted and the
connection corresponding to that process ID is terminated.
Terminating backends is also useful for freeing memory held by idle postgres processes
that, for whatever reason, have not released it and are hogging system resources.
There's more...
If the requirement is to cancel running queries without terminating the sessions, then we
can use the pg_cancel_backend function, which cancels the query currently running on a
connection. Like pg_terminate_backend, it can be applied to all of the sessions on a database
or of a specific user, but it does not terminate the connections themselves.
To cancel all of the running queries issued against a database, we can use the
pg_cancel_backend function as follows:
SELECT pg_cancel_backend(pid) FROM pg_stat_activity
WHERE datname = 'testdb1';
To cancel all of the running queries issued by a specific user, we can use the
pg_cancel_backend function like this:
SELECT pg_cancel_backend(pid) FROM pg_stat_activity
WHERE usename = 'agovil';
In versions before PostgreSQL 9.2, the pg_stat_activity view has a procpid column instead
of pid, so procpid has to be passed as input to the pg_terminate_backend and
pg_cancel_backend functions to terminate sessions and cancel queries. The pid column
replaced the procpid column from PostgreSQL version 9.2 onwards.
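Because the column name changed in 9.2, a script that must work against both older and newer servers can pick the column from the server_version_num setting (for example, 90300 for 9.3.0).

The following Python sketch has hypothetical names and does three things:

- builds either the terminate or the cancel query, filtered by database or by user;
- leaves the filter value as a %s placeholder, in the style used by psycopg2, so the value is passed separately rather than pasted into the SQL;
- excludes the session running the query.

```python
def pid_column(server_version_num):
    """pg_stat_activity has pid from 9.2 (90200) onward and procpid before that."""
    return "pid" if server_version_num >= 90200 else "procpid"


def backend_kill_sql(server_version_num, action="terminate", by="datname"):
    """Hypothetical helper: SQL that terminates sessions or cancels their queries.

    The filter value is left as a %s placeholder for the database driver to fill in.
    """
    functions = {"terminate": "pg_terminate_backend", "cancel": "pg_cancel_backend"}
    if action not in functions:
        raise ValueError("action must be 'terminate' or 'cancel'")
    if by not in ("datname", "usename"):
        raise ValueError("by must be 'datname' or 'usename'")
    col = pid_column(server_version_num)
    # Skip the current session so the script does not disconnect itself.
    return (f"SELECT {functions[action]}({col}) FROM pg_stat_activity "
            f"WHERE {by} = %s AND {col} <> pg_backend_pid();")


if __name__ == "__main__":
    print(backend_kill_sql(90300, "terminate", "datname"))  # value to bind: 'testdb1'
    print(backend_kill_sql(90100, "cancel", "usename"))     # value to bind: 'agovil'
```

With psycopg2, for instance, the value would be supplied separately, as in cur.execute(sql, ("testdb1",)). The version number itself can be read with SHOW server_version_num.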
You may refer to https://blog.sleeplessbeastie.eu/2014/07/23/how-to-terminate-postgresql-sessions/
and http://www.devopsderek.com/blog/2012/11/13/list-and-disconnect-postgresql-db-sessions/ for
more information regarding terminating backend connections.
www.PacktPub.com