Getting Started

Installation
Before you can use PostgreSQL you need
to install it, of course. It is possible that
PostgreSQL is already installed at your
site, either because it was included in your operating system
distribution or because the system administrator already installed
it. If that is the case, you should obtain information from the
operating system documentation or your system administrator about
how to access PostgreSQL.
If you are not sure whether PostgreSQL
is already available or whether you can use it for your
experimentation then you can install it yourself. Doing so is not
hard and it can be a good exercise.
PostgreSQL can be installed by any
unprivileged user; no superuser (root)
access is required.
If you are installing PostgreSQL
yourself, then refer to the installation instructions,
and return to
this guide when the installation is complete. Be sure to follow
closely the section about setting up the appropriate environment
variables.
If your site administrator has not set things up in the default
way, you might have some more work to do. For example, if the
database server machine is a remote machine, you will need to set
the PGHOST environment variable to the name of the
database server machine. The environment variable
PGPORT might also have to be set. The bottom line is
this: if you try to start an application program and it complains
that it cannot connect to the database, you should consult your
site administrator or, if that is you, the documentation to make
sure that your environment is properly set up. If you did not
understand the preceding paragraph then read the next section.
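For example, if the database server runs on a hypothetical host named db.example.com and listens on port 5433, a Bourne-shell user could set up the environment like this (substitute your own host and port):
$ export PGHOST=db.example.com
$ export PGPORT=5433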
Architectural Fundamentals
Before we proceed, you should understand the basic
PostgreSQL system architecture.
Understanding how the parts of
PostgreSQL interact will make this
chapter somewhat clearer.
Postgres-XL, in short, is a collection
of PostgreSQL database clusters that act as if the whole
collection were a single database cluster. Based on your database design,
each table is either replicated or distributed among the member databases.
To provide this capability, Postgres-XL is
composed of three major components: the GTM, the Coordinator and
the Datanode. The GTM is responsible for providing the ACID properties of
transactions. Datanodes store table data and handle SQL statements
locally. Coordinators accept SQL statements from
applications, determine which Datanodes are involved, and send plans on
to the appropriate Datanodes.
You should usually run the GTM on a separate server, because the GTM has
to handle the transaction requirements of all the Coordinators and
Datanodes. To group multiple requests and responses from
Coordinator and Datanode processes running on the same server, you can
configure a GTM-Proxy. The GTM-Proxy reduces the number of interactions
with the GTM and the amount of data sent to it. The GTM-Proxy also helps
handle GTM failures.
It is often good practice to run a Coordinator and a Datanode on the
same server, because you then don't have to worry about workload balance
between the two, and you can often read replicated tables locally
without sending an additional request out on the network.
You can have any number of servers where these
two components are running. Because both the Coordinator and the Datanode
are essentially PostgreSQL instances, you should configure them to
avoid resource conflicts. In particular, it is very important to assign them
different working directories and port numbers, as illustrated below.
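For example, in the pgxc_ctl-based setup used later in this chapter, a Coordinator and a Datanode sharing one host are kept apart by giving them distinct ports (30001/30011 versus 40001/40011) and distinct data directories:
PGXC$ add coordinator master coord1 localhost 30001 30011 $dataDirRoot/coord_master.1 none none
PGXC$ add datanode master dn1 localhost 40001 40011 $dataDirRoot/dn_master.1 none none none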
Postgres-XL allows multiple Coordinators to accept statements
from applications independently but in an integrated way. Any
write made through one Coordinator is visible from any other
Coordinator. The Coordinators act together as if they were a single database.
The Coordinator's role is to accept statements, determine which Datanodes are
involved, send query plans on to the appropriate Datanodes if
needed, collect the results,
and return them to the applications.
The Coordinator does not store any user data. It stores only catalog
data, which it uses to determine how to process statements, where the target
Datanodes are, and so on. Therefore, you don't have to worry
much about Coordinator failure: when a Coordinator fails, you
can simply switch to another one.
The GTM could be a single point of failure (SPOF). To prevent this, you can
run another GTM as a GTM-Standby to back up the GTM's status. When the GTM fails,
the GTM-Proxy can switch to the standby on the fly. This will be described in
detail in the high-availability sections.
As described above, the Coordinators and Datanodes
of Postgres-XL are
essentially PostgreSQL database servers. In database
jargon, PostgreSQL uses a client/server
model. A PostgreSQL session consists
of the following cooperating processes (programs):
A server process, which manages the database files, accepts
connections to the database from client applications, and
performs database actions on behalf of the clients. The
database server program is called
postgres.
The user's client (frontend) application that wants to perform
database operations. Client applications can be very diverse
in nature: a client could be a text-oriented tool, a graphical
application, a web server that accesses the database to
display web pages, or a specialized database maintenance tool.
Some client applications are supplied with the
PostgreSQL distribution; most are
developed by users.
As is typical of client/server applications, the client and the
server can be on different hosts. In that case they communicate
over a TCP/IP network connection. You should keep this in mind,
because the files that can be accessed on a client machine might
not be accessible (or might only be accessible using a different
file name) on the database server machine.
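For example, to reach a server on another host you can name it explicitly when connecting (db.example.com is a placeholder here):
$ psql -h db.example.com -p 5432 mydb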
The PostgreSQL server can handle
multiple concurrent connections from clients. To achieve this it
starts (forks) a new process for each connection.
From that point on, the client and the new server process
communicate without intervention by the original
postgres process. Thus, the
master server process is always running, waiting for
client connections, whereas client and associated server processes
come and go. (All of this is of course invisible to the user. We
only mention it here for completeness.)
Creating a Postgres-XL cluster
As mentioned in the architectural fundamentals, Postgres-XL
is a collection of multiple components. It can be a bit of work to come up with your
initial working setup. In this tutorial, we will show how to start with
an empty configuration file and use the pgxc_ctl
utility to create your Postgres-XL cluster from scratch.
A few prerequisites must be satisfied on each node that is going to be part of the
Postgres-XL setup.
Password-less ssh access is required from the node that is going to run the
pgxc_ctl utility.
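For example, with OpenSSH you might set this up as follows (node2 is a hypothetical member node; repeat for each node in the setup):
$ ssh-keygen -t rsa
$ ssh-copy-id postgres@node2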
The PATH environment variable must include the correct Postgres-XL
binaries on all nodes, especially when running a command via ssh.
The pg_hba.conf entries must be updated to allow remote access. Variables
like coordPgHbaEntries and datanodePgHbaEntries
in the pgxc_ctl.conf configuration file may need appropriate changes, as sketched below.
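For illustration, a pg_hba.conf entry allowing other cluster nodes on a hypothetical 10.0.0.0/24 network to connect might look like this (choose an authentication method appropriate for your site):
host    all    all    10.0.0.0/24    trust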
Firewalls and iptables may need to be updated to allow access to ports.
The pgxc_ctl utility should be present in your PATH. If it is
not there, it can be compiled from source.
$ cd $XLSRC/contrib/pgxc_ctl
$ make install
We are now ready to prepare our template configuration file. The pgxc_ctl
utility allows you to create three types of configuration. We will choose the empty
configuration which will allow us to create our Postgres-XL setup from
scratch. Note that we also need to set up the dataDirRoot environment
variable properly for all future invocations of pgxc_ctl.
$ export dataDirRoot=$HOME/DATA/pgxl/nodes
$ mkdir $HOME/pgxc_ctl
$ pgxc_ctl
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /Users/postgres/pgxc_ctl/pgxc_ctl_bash --home
/Users/postgres/pgxc_ctl --configuration
/Users/postgres/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
******** PGXC_CTL START ***************
Current directory: /Users/postgres/pgxc_ctl
PGXC$ prepare config empty
PGXC$ exit
The empty configuration file is now ready. You should now make changes
to pgxc_ctl.conf. At a minimum, pgxcOwner
should be set correctly. The configuration file does use the USER and HOME
environment variables to provide easy defaults for the current user.
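As a sketch, the relevant lines might look like the following; the installation path is an assumption, so point pgxcInstallDir at wherever your Postgres-XL binaries actually live:
pgxcOwner=$USER
pgxcInstallDir=$HOME/pgxl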
The next step is to add the GTM master to the setup.
$ pgxc_ctl
PGXC$ add gtm master gtm localhost 20001 $dataDirRoot/gtm
Use the "monitor" command to check the status of the cluster.
$ pgxc_ctl
PGXC$ monitor all
Running: gtm master
Let us now add a couple of coordinators. When the first coordinator is added, it just
starts up. When another coordinator is added, it connects to any existing coordinator node
to fetch the metadata about objects.
PGXC$ add coordinator master coord1 localhost 30001 30011 $dataDirRoot/coord_master.1 none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
PGXC$ add coordinator master coord2 localhost 30002 30012 $dataDirRoot/coord_master.2 none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Let us now move on to adding a couple of datanodes. When the first datanode is added,
it connects to any existing coordinator node to fetch global metadata. When a subsequent
datanode is added, it connects to any existing datanode for the metadata.
PGXC$ add datanode master dn1 localhost 40001 40011 $dataDirRoot/dn_master.1 none none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
PGXC$ add datanode master dn2 localhost 40002 40012 $dataDirRoot/dn_master.2 none none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Your Postgres-XL setup is ready now and you can move on to the next
"Getting Started" topic.
Read on further only if you want a quick crash course on the various commands you can
try out with Postgres-XL. It is strongly recommended that you go through
the entire documentation for more details on each and every command that we will touch upon
below.
Connect to one of the coordinators and create a test database.
$ psql -p 30001 postgres
postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# \q
Look at the pgxc_node catalog. It should show all the configured nodes. It is normal to have
negative node id values. This will be fixed soon.
$ psql -p 30001 testdb
testdb=# SELECT * FROM pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+-----------+----------------+------------------+-------------
coord1 | C | 30001 | localhost | f | f | 1885696643
coord2 | C | 30002 | localhost | f | f | -1197102633
dn1 | D | 40001 | localhost | t | t | -560021589
dn2 | D | 40002 | localhost | f | t | 352366662
(4 rows)
Let us now create a distributed table, distributed on its first column by HASH.
testdb=# CREATE TABLE disttab(col1 int, col2 int, col3 text) DISTRIBUTE BY HASH(col1);
CREATE TABLE
testdb=# \d+ disttab
Table "public.disttab"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
col1 | integer | | plain | |
col2 | integer | | plain | |
col3 | text | | extended | |
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES
Also create a replicated table.
testdb=# CREATE TABLE repltab (col1 int, col2 int) DISTRIBUTE BY
REPLICATION;
CREATE TABLE
testdb=# \d+ repltab
Table "public.repltab"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
col1 | integer | | plain | |
col2 | integer | | plain | |
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: ALL DATANODES
Now insert some sample data into these tables.
testdb=# INSERT INTO disttab SELECT generate_series(1,100), generate_series(101, 200), 'foo';
INSERT 0 100
testdb=# INSERT INTO repltab SELECT generate_series(1,100), generate_series(101, 200);
INSERT 0 100
OK, so the distributed table should have 100 rows:
testdb=# SELECT count(*) FROM disttab;
count
-------
100
(1 row)
And they should not all be on the same node. xc_node_id is a system
column that shows the originating datanode for each row.
Note that the distribution can be slightly uneven because of the HASH
function:
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-560021589 | 42
352366662 | 58
(2 rows)
For replicated tables, we expect all rows to come from a single
datanode (even though the other node has a copy too).
testdb=# SELECT count(*) FROM repltab;
count
-------
100
(1 row)
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-560021589 | 100
(1 row)
Now add a new datanode to the cluster.
PGXC$ add datanode master dn3 localhost 40003 40013 $dataDirRoot/dn_master.3 none none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3
Note that during cluster reconfiguration, all outstanding transactions
are aborted and sessions are reset. So you would typically see errors
like these on open sessions:
testdb=# SELECT * FROM pgxc_node;
ERROR: canceling statement due to user request <==== pgxc_pool_reload() resets all sessions and aborts all open transactions
testdb=# SELECT * FROM pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+-----------+----------------+------------------+-------------
coord1 | C | 30001 | localhost | f | f | 1885696643
coord2 | C | 30002 | localhost | f | f | -1197102633
dn1 | D | 40001 | localhost | t | t | -560021589
dn2 | D | 40002 | localhost | f | t | 352366662
dn3 | D | 40003 | localhost | f | f | -700122826
(5 rows)
Note that existing tables are not affected by the addition of a new datanode. The distribution information now
explicitly shows only the older datanodes:
testdb=# \d+ disttab
Table "public.disttab"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
col1 | integer | | plain | |
col2 | integer | | plain | |
col3 | text | | extended | |
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: dn1, dn2
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-560021589 | 42
352366662 | 58
(2 rows)
testdb=# \d+ repltab
Table "public.repltab"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
col1 | integer | | plain | |
col2 | integer | | plain | |
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: dn1, dn2
Let us now redistribute the tables so that they can take advantage
of the new datanode:
testdb=# ALTER TABLE disttab ADD NODE (dn3);
ALTER TABLE
testdb=# \d+ disttab
Table "public.disttab"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
col1 | integer | | plain | |
col2 | integer | | plain | |
col3 | text | | extended | |
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-700122826 | 32
352366662 | 32
-560021589 | 36
(3 rows)
Let us now add a third coordinator.
PGXC$ add coordinator master coord3 localhost 30003 30013 $dataDirRoot/coord_master.3 none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: coordinator master coord3
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3
testdb=# SELECT * FROM pgxc_node;
ERROR: canceling statement due to user request
testdb=# SELECT * FROM pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+-----------+----------------+------------------+-------------
coord1 | C | 30001 | localhost | f | f | 1885696643
coord2 | C | 30002 | localhost | f | f | -1197102633
dn1 | D | 40001 | localhost | t | t | -560021589
dn2 | D | 40002 | localhost | f | t | 352366662
dn3 | D | 40003 | localhost | f | f | -700122826
coord3 | C | 30003 | localhost | f | f | 1638403545
(6 rows)
We can try some more ALTER TABLE commands to delete a node from a table's
distribution and then add it back:
testdb=# ALTER TABLE disttab DELETE NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
352366662 | 42
-700122826 | 58
(2 rows)
testdb=# ALTER TABLE disttab ADD NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-700122826 | 32
352366662 | 32
-560021589 | 36
(3 rows)
You could also alter a replicated table to make it a distributed table.
Note that even though the cluster now has 3 datanodes, the table will continue
to use only the 2 datanodes on which it was originally replicated.
testdb=# ALTER TABLE repltab DISTRIBUTE BY HASH(col1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-560021589 | 42
352366662 | 58
(2 rows)
testdb=# ALTER TABLE repltab DISTRIBUTE BY REPLICATION;
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-560021589 | 100
(1 row)
Now remove the coordinator added previously. You can use the "clean" option
to remove the corresponding data directory as well.
PGXC$ remove coordinator master coord3 clean
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3
testdb=# SELECT oid, * FROM pgxc_node;
ERROR: canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
oid | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
11197 | coord1 | C | 30001 | localhost | f | f | 1885696643
16384 | coord2 | C | 30002 | localhost | f | f | -1197102633
16385 | dn1 | D | 40001 | localhost | t | t | -560021589
16386 | dn2 | D | 40002 | localhost | f | t | 352366662
16397 | dn3 | D | 40003 | localhost | f | f | -700122826
(5 rows)
Let us now try to remove a datanode. NOTE: Postgres-XL does not
employ any additional checks to ascertain whether the datanode being dropped has data from tables
that are replicated/distributed. It is the responsibility of the user to ensure that it is
safe to remove a datanode.
You can use the query below to find out whether the datanode being removed has any data on it.
Do note that this only shows tables from the current database. You might want to ensure
the same for all databases before going ahead with the datanode removal. Use the OID of the
datanode that is to be removed in the query:
testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets | nodeoids
---------+---------------+----------+-----------------+---------------+-------------------
16388 | H | 1 | 1 | 4096 | 16385 16386 16397
(1 row)
testdb=# ALTER TABLE disttab DELETE NODE (dn3);
ALTER TABLE
testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets | nodeoids
---------+---------------+----------+-----------------+---------------+----------
(0 rows)
OK, it is safe to remove datanode "dn3" now.
PGXC$ remove datanode master dn3 clean
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
testdb=# SELECT oid, * FROM pgxc_node;
ERROR: canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
oid | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
11197 | coord1 | C | 30001 | localhost | f | f | 1885696643
16384 | coord2 | C | 30002 | localhost | f | f | -1197102633
16385 | dn1 | D | 40001 | localhost | t | t | -560021589
16386 | dn2 | D | 40002 | localhost | f | t | 352366662
(4 rows)
The pgxc_ctl utility can also help in setting up slaves for
datanodes and coordinators. Let us set up a slave for a datanode and see how failover can
be performed in case the master datanode goes down.
PGXC$ add datanode slave dn1 localhost 40101 40111 $dataDirRoot/dn_slave.1 none $dataDirRoot/datanode_archlog.1
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode slave dn1
Running: datanode master dn2
testdb=# EXECUTE DIRECT ON(dn1) 'SELECT client_hostname, state, sync_state FROM pg_stat_replication';
client_hostname | state | sync_state
-----------------+-----------+------------
| streaming | async
(1 row)
Add some more rows to test failover now.
testdb=# INSERT INTO disttab SELECT generate_series(1001,1100), generate_series(1101, 1200), 'foo';
INSERT 0 100
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-560021589 | 94
352366662 | 106
(2 rows)
Let us simulate datanode failover now. We will first stop the datanode master "dn1" for
which we configured a slave above. Note that since the slave is connected to the master
we will use "immediate" mode for stopping it.
PGXC$ stop -m immediate datanode master dn1
Since a datanode is down, queries will fail, though a few queries may still work if
the failed node is not required to run the query; that is determined by the
distribution of the data and the WHERE clause being used.
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR: Failed to get pooled connections
testdb=# SELECT xc_node_id, * FROM disttab WHERE col1 = 3;
xc_node_id | col1 | col2 | col3
------------+------+------+------
352366662 | 3 | 103 | foo
(1 row)
We will now perform the failover and check that everything is working fine afterwards.
PGXC$ failover datanode dn1
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR: canceling statement due to user request
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-560021589 | 94
352366662 | 106
(2 rows)
The pgxc_node catalog should now have updated entries. In particular, the
failed-over datanode's node_host and node_port should have been replaced
with the slave's host and port values.
testdb=# SELECT oid, * FROM pgxc_node;
oid | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
11197 | coord1 | C | 30001 | localhost | f | f | 1885696643
16384 | coord2 | C | 30002 | localhost | f | f | -1197102633
16386 | dn2 | D | 40002 | localhost | f | t | 352366662
16385 | dn1 | D | 40101 | localhost | t | t | -560021589
(4 rows)
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Creating a Database
The first test to see whether you can access the database server
is to try to create a database. A running
PostgreSQL server can manage many
databases. Typically, a separate database is used for each
project or for each user.
Possibly, your site administrator has already created a database
for your use. In that case you can omit this step and skip ahead
to the next section.
To create a new database, in this example named
mydb, you use the following command:
$ createdb mydb
If this produces no response then this step was successful and you can skip over the
remainder of this section.
If you see a message similar to:
createdb: command not found
then PostgreSQL was not installed properly. Either it was not
installed at all or your shell's search path was not set to include it.
Try calling the command with an absolute path instead:
$ /usr/local/pgsql/bin/createdb mydb
The path at your site might be different. Contact your site
administrator or check the installation instructions to
correct the situation.
Another response could be this:
createdb: could not connect to database postgres: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
This means that the server was not started, or it was not started
where createdb expected it. Again, check the
installation instructions or consult the administrator.
Another response could be this:
createdb: could not connect to database postgres: FATAL: role "joe" does not exist
where your own login name is mentioned. This will happen if the
administrator has not created a PostgreSQL user account
for you. (PostgreSQL user accounts are distinct from
operating system user accounts.) If you are the administrator, see
the documentation on database roles for help creating accounts. You will need to
become the operating system user under which PostgreSQL
was installed (usually postgres) to create the first user
account. It could also be that you were assigned a
PostgreSQL user name that is different from your
operating system user name; in that case you need to use the
-U switch or set the PGUSER environment variable to specify your
PostgreSQL user name.
If you have a user account but it does not have the privileges required to
create a database, you will see the following:
createdb: database creation failed: ERROR: permission denied to create database
Not every user has authorization to create new databases. If
PostgreSQL refuses to create databases
for you then the site administrator needs to grant you permission
to create databases. Consult your site administrator if this
occurs. If you installed PostgreSQL
yourself then you should log in for the purposes of this tutorial
under the user account that you started the server as.
As an explanation for why this works:
PostgreSQL user names are separate
from operating system user accounts. When you connect to a
database, you can choose what
PostgreSQL user name to connect as;
if you don't, it will default to the same name as your current
operating system account. As it happens, there will always be a
PostgreSQL user account that has the
same name as the operating system user that started the server,
and it also happens that that user always has permission to
create databases. Instead of logging in as that user you can
also specify the -U option everywhere to select
a PostgreSQL user name to connect as.
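For example, if the server was started by the operating system user postgres, you could create the database under that role instead (shown here with createdb; psql accepts the same option):
$ createdb -U postgres mydb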
You can also create databases with other names.
PostgreSQL allows you to create any
number of databases at a given site. Database names must have an
alphabetic first character and are limited to 63 bytes in
length. A convenient choice is to create a database with the same
name as your current user name. Many tools assume that database
name as the default, so it can save you some typing. To create
that database, simply type:
$ createdb
If you do not want to use your database anymore you can remove it.
For example, if you are the owner (creator) of the database
mydb, you can destroy it using the following
command:
$ dropdb mydb
(For this command, the database name does not default to the user
account name. You always need to specify it.) This action
physically removes all files associated with the database and
cannot be undone, so this should only be done with a great deal of
forethought.
More about createdb and dropdb can
be found in their respective reference pages.
Accessing a Database
Once you have created a database, you can access it by:
Running the PostgreSQL interactive
terminal program, called psql>, which allows you
to interactively enter, edit, and execute
SQL commands.
Using an existing graphical frontend tool like
pgAdmin or an office suite with
ODBC or JDBC support to create and manipulate a
database. These possibilities are not covered in this
tutorial.
Writing a custom application, using one of the several
available language bindings. These possibilities are discussed
further in the client interfaces documentation.
You probably want to start up psql to try
the examples in this tutorial. It can be activated for the
mydb database by typing the command:
$ psql mydb
If you do not supply the database name then it will default to your
user account name. You already discovered this scheme in the
previous section using createdb.
In psql, you will be greeted with the following
message:
psql (&version;)
Type "help" for help.
mydb=>
The last line could also be:
mydb=#
That would mean you are a database superuser, which is most likely
the case if you installed the PostgreSQL instance
yourself. Being a superuser means that you are not subject to
access controls. For the purposes of this tutorial that is not
important.
If you encounter problems starting psql
then go back to the previous section. The diagnostics of
createdb and psql are
similar, and if the former worked the latter should work as well.
The last line printed out by psql is the
prompt, and it indicates that psql is listening
to you and that you can type SQL queries into a
work space maintained by psql. Try out these
commands:
mydb=> SELECT version();
version
------------------------------------------------------------------------------------------
PostgreSQL &version; on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
(1 row)
mydb=> SELECT current_date;
date
------------
2016-01-07
(1 row)
mydb=> SELECT 2 + 2;
?column?
----------
4
(1 row)
The psql program has a number of internal
commands that are not SQL commands. They begin with the backslash
character, \.
For example,
you can get help on the syntax of various
PostgreSQL SQL
commands by typing:
mydb=> \h
To get out of psql, type:
mydb=> \q
and psql will quit and return you to your
command shell. (For more internal commands, type
\? at the psql prompt.) The
full capabilities of psql are documented in
. In this tutorial we will not use these
features explicitly, but you can use them yourself when it is helpful.