Inspired by Actual Events: PostgreSQL

Tuesday, February 19, 2019

Stashing Previously Set psql Variables

The command-line based "PostgreSQL interactive terminal" known as psql is handy for manipulating and accessing data in a PostgreSQL database. Because of its command-line nature, psql is particularly well suited for use in scripts. One of the psql features that makes it even more useful in scripting contexts is its support for "meta-commands". As the psql documentation states, "Anything you enter in psql that begins with an unquoted backslash is a psql meta-command that is processed by psql itself" and "these commands make psql more useful for administration or scripting."

When writing psql scripts, it is often preferable to set some variables only for the duration of the script. It can also be desirable to leave those variables unchanged for the rest of the psql session if other scripts or other work are likely to be run from the same session after the script finishes. In this post, I demonstrate use of psql's \set meta-command to temporarily stash the previous settings of variables so that those settings can be restored at the script's conclusion.

The psql documentation describes "a number of ... variables [that] are treated specially by psql." These "specially treated variables" are the ones that we most likely want to set for our script's duration only and then restore to their previously set values upon script exit. The documentation describes these "specially treated variables": "They represent certain option settings that can be changed at run time by altering the value of the variable, or in some cases represent changeable state of psql. By convention, all specially treated variables' names consist of all upper-case ASCII letters (and possibly digits and underscores). To ensure maximum compatibility in the future, avoid using such variable names for your own purposes." Examples of these "specially treated variables" include AUTOCOMMIT, ECHO, ECHO_HIDDEN, PROMPT1, PROMPT2, PROMPT3, and VERBOSE, but there are many more.

For demonstration purposes, let's suppose you want to set the ECHO variable to something other than its default (none). For our purposes, we'll set ECHO to queries. We want to make sure, however, that we set it back to whatever it was when our script was called before leaving the script. The following simple psql logic accomplishes this.

\set PRIOR_ECHO :ECHO
\set ECHO queries

-- Run various queries for which you want to see the query itself output before the query results ...

\set ECHO :PRIOR_ECHO

It's as simple as that to temporarily set "specially treated variables" for your script's convenience without permanently changing the settings for the caller who might be running your script in the same psql session. The key things to remember are that the \set meta-command is always all lowercase, the specially treated variables have names that are always all uppercase but the specially treated variable values do not need to be uppercase (and typically are not), and the values in a variable can be accessed by prefixing the variable name with a colon (:).

Friday, February 15, 2019

PostgreSQL's psql \set versus SET

It is easy for someone who is new to PostgreSQL and who uses PostgreSQL's interactive terminal psql to confuse the commands \set and SET. This post contrasts these commands and provides a brief overview of other commands that include the word "set".

The easiest way to remember how to differentiate \set from SET is to keep in mind that the "backslash commands" such as \set are "meta commands" for the command-line psql tool and do not mean anything to the PostgreSQL database itself. The SET command, which lacks a backslash, is a PostgreSQL database command that happens to be executed against the database from the psql command-line client.

Contrasting `\set` and `SET`
Command	`\set`	`SET` or `set` (or other case-insensitive variation¹)
Context	psql configuration meta command (interactive client terminal configuration)	PostgreSQL database configuration command (server configuration)
Ends with Semicolon?	No	Yes
Command Case Sensitive?	Yes (must be exactly `\set`)	No
Parameter/Variable Case Sensitive?	Yes²	No
How are All Settings Displayed?	`\set`³	`SHOW ALL;` or `show all;`⁴
How is a Single Setting Displayed?	`\echo :variable_name`	`SHOW variable_name;`⁵
Examples	`\set AUTOCOMMIT on`	`SET search_path TO myschema, public;`

Footnotes
¹ I prefer to use all uppercase letters for `SET` to more clearly differentiate it from `\set`.
² Two variables with the same letters but different cases are two distinct variables.
³ `\set` displays all variables when no arguments are provided to it.
⁴ Any case variation (even `sHoW aLl;`) works.
⁵ Any case variation of command and/or variable name also works.

There are two more psql meta commands that "set" things and include "set" in their names. The \pset meta command configures how psql presents "query result tables." Like \set, \pset can be specified without arguments to see all of the current presentation settings.

Unlike the psql meta commands \set and \pset, the \gset psql metacommand does affect the PostgreSQL server because \gset submits the query buffer to the server and then stores the output returned from the server into specified psql variables. I discussed \gset with a few additional details in the blog post "Setting PostgreSQL psql Variable Based Upon Query Result."

Although both \set and SET can be used to set variables, the easiest way to distinguish between them is to remember that backslash commands such as \set are psql commands (and so \set sets variables in the psql client tool), while commands without the backslash such as SET are PostgreSQL commands sent to the server from psql or from any other client (and so ultimately set variables on the server).

Saturday, February 2, 2019

Revealing the Queries Behind psql's Backslash Commands

PostgreSQL's psql interactive terminal tool provides several useful "backslash list commands" such as \d (lists "relations" such as tables, views, indexes, and sequences), \dt (lists tables), \di (lists indexes), \ds (lists sequences), \dv (lists views), \df (lists functions), \du (lists roles), and \? (displays help/usage details on backslash commands). These commands are concise and much simpler to use than writing the queries against PostgreSQL system catalogs (pg_class, pg_roles, pg_namespace, pg_trigger, pg_index, etc.) and information_schema that would provide the same types of details.

Although the psql backslash commands are easier to use than their associated queries, there are situations when it is important to know the full query behind a particular command. These situations include needing to perform a slightly different/adapted query from that associated with the pre-built command and needing to perform similar queries in scripts or code that are being used as PostgreSQL clients instead of psql. These situations make it important to be able to determine what queries psql is performing, and the psql option -E (or --echo-hidden) allows that.

The PostgreSQL psql documentation states that the psql options -E and --echo-hidden "echo the actual queries generated by \d and other backslash commands." The documentation adds commentary on why this is useful, "You can use this to study psql's internal operations." When psql is started with the -E or --echo-hidden options, it will display the query associated with a backslash command before executing that command. The next screen snapshot illustrates this for the \du command used to show roles.

From use of psql -E and execution of the command \du, we're able to see that the query underlying \du is this:

Although the query is not nearly as nice to use as \du, we are now able to adapt this query for a related but different use case and are able to run this query from a PostgreSQL client other than psql.

Monday, November 7, 2016

Fixed-Point and Floating-Point: Two Things That Don't Go Well Together

One of the more challenging aspects of software development can be dealing with floating-point numbers. David Goldberg's 1991 Computing Surveys paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic" is a recognized classic treatise on this subject. This paper not only provides an in-depth look at how floating-point arithmetic is implemented in most programming languages and computer systems, but also, through its length and detail, provides evidence of the nuances and difficulties of this subject. The nuances of dealing with floating-point numbers in Java and tactics to overcome these challenges are well documented in sources such as JavaWorld's "Floating-point Arithmetic", IBM DeveloperWorks's "Java's new math, Part 2: Floating-point numbers" and "Java theory and practice: Where's your point?", Dr. Dobb's "Java's Floating-Point (Im)Precision" and "Fixed, Floating, and Exact Computation with Java's Bigdecimal", Java Glossary's "Floating Point", Java Tutorial's "Primitive Data Types", and the CERT secure coding rule "NUM04-J. Do not use floating-point numbers if precise computation is required".

Most of the issues encountered and discussed in Java related to floating-point representation and arithmetic are caused by the inability to precisely represent (usually) decimal (base ten) floating-point numbers with an underlying binary (base two) representation. In this post, I focus on similar consequences that can result from mixing fixed-point numbers (as stored in a database) with floating-point numbers (as represented in Java).
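The representation issue described above can be seen without any database at all. The following is a minimal sketch (the class name BinaryFractionDemo and its method names are hypothetical, chosen only for this illustration) that adds the decimal values 0.1 and 0.2 first as Java doubles and then as BigDecimal instances constructed from Strings.

```java
import java.math.BigDecimal;

class BinaryFractionDemo
{
   /** Adds 0.1 and 0.2 using binary floating-point doubles. */
   static double sumAsDoubles()
   {
      return 0.1 + 0.2;
   }

   /** Adds 0.1 and 0.2 using exact decimal representations. */
   static BigDecimal sumAsBigDecimals()
   {
      // The String constructor avoids passing an already-inexact double.
      return new BigDecimal("0.1").add(new BigDecimal("0.2"));
   }

   public static void main(final String[] arguments)
   {
      System.out.println("double sum:        " + sumAsDoubles());
      System.out.println("double sum == 0.3? " + (sumAsDoubles() == 0.3));
      System.out.println("BigDecimal sum:    " + sumAsBigDecimals());
   }
}
```

Neither 0.1 nor 0.2 has an exact binary representation, so the double sum is not equal to 0.3, while the BigDecimal sum is exactly 0.3.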

The Oracle database allows numeric columns of the NUMBER data type to be expressed with two integers that represent "precision" and "scale". The PostgreSQL implementation of the numeric data type is very similar. Both Oracle's NUMBER(p,s) and PostgreSQL's numeric(p,s) allow the same datatype to represent essentially an integral value (precision specified but scale not specified), a fixed-point number (precision and scale specified), or a floating-point number (neither precision nor scale specified). Simple Java/JDBC-based examples in this post will demonstrate this.
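Java's BigDecimal exposes notions of precision and scale that are close to the ones used by NUMBER(p,s) and numeric(p,s). The following minimal sketch (the class name PrecisionAndScale and the describe method are hypothetical, used only for illustration) reports both for a few decimal values. It does not talk to either database; it only shows how the two integers relate to the digits of a value.

```java
import java.math.BigDecimal;

class PrecisionAndScale
{
   /** Describes a decimal value in terms of its precision and scale. */
   static String describe(final String decimalText)
   {
      final BigDecimal value = new BigDecimal(decimalText);
      // precision: count of digits in the unscaled value
      // scale: count of digits to the right of the decimal point
      return value + " -> precision=" + value.precision() + ", scale=" + value.scale();
   }

   public static void main(final String[] arguments)
   {
      System.out.println(describe("12345"));             // five digits, no fractional part
      System.out.println(describe("3.14"));              // three digits, two of them fractional
      System.out.println(describe("3.141592653589793")); // the digits of Math.PI as a double prints
   }
}
```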

For the examples in this post, a simple table named DOUBLES in Oracle and doubles in PostgreSQL will be created. The DDL statements for defining these simple tables in the two databases are shown next.

createOracleTable.sql

CREATE TABLE doubles
(
   int NUMBER(5),
   fixed NUMBER(3,2),
   floating NUMBER
);

createPgTable.sql

CREATE TABLE doubles
(
   int numeric(5),
   fixed numeric(3,2),
   floating numeric
);

With the DOUBLES table created in Oracle database and PostgreSQL database, I'll next use a simple JDBC PreparedStatement to insert the value of java.lang.Math.PI into each table for all three columns. The following Java code snippet demonstrates this insertion.

Inserting Math.PI into DOUBLES Columns

/** SQL syntax for insertion statement with placeholders. */
private static final String INSERT_STRING =
   "INSERT INTO doubles (int, floating, fixed) VALUES (?, ?, ?)";


final Connection connection = getDatabaseConnection(databaseVendor);
try (final PreparedStatement insert = connection.prepareStatement(INSERT_STRING))
{
   insert.setDouble(1, Math.PI);
   insert.setDouble(2, Math.PI);
   insert.setDouble(3, Math.PI);
   insert.execute();
}
catch (SQLException sqlEx)
{
   err.println("Unable to insert data - " + sqlEx);
}

Querying DOUBLES Columns

/** SQL syntax for querying statement. */
private static final String QUERY_STRING =
   "SELECT int, fixed, floating FROM doubles";

final Connection connection = getDatabaseConnection(databaseVendor);
try (final Statement query = connection.createStatement();
     final ResultSet rs = query.executeQuery(QUERY_STRING))
{
   out.println("\n\nResults for Database " + databaseVendor + ":\n");
   out.println("Math.PI :        " + Math.PI);
   while (rs.next())
   {
      final double integer = rs.getDouble(1);
      final double fixed = rs.getDouble(2);
      final double floating = rs.getDouble(3);
      out.println("Integer NUMBER:  " + integer);
      out.println("Fixed NUMBER:    " + fixed);
      out.println("Floating NUMBER: " + floating);
   }
   out.println("\n");
}
catch (SQLException sqlEx)
{
   err.println("Unable to query data - " + sqlEx);
}

The output of running the above Java insertion and querying code against the Oracle and PostgreSQL databases respectively is shown in the next two screen snapshots.

Comparing Math.PI to Oracle's NUMBER Columns

Comparing Math.PI to PostgreSQL's numeric Columns

These simple examples using Java with Oracle and PostgreSQL demonstrate issues that might arise when specifying precision and scale on the Oracle NUMBER and PostgreSQL numeric column types. Although there are situations when fixed-point numbers are desirable, it is important to recognize that Java does not have a fixed-point primitive data type and to use BigDecimal or a fixed-point Java library (such as decimal4j or Java Math Fixed Point Library) to appropriately deal with fixed-point numbers retrieved from database columns defined as fixed point. In the examples demonstrated in this post, nothing is really "wrong", but it is important to recognize the distinction between fixed-point numbers in the database and floating-point numbers in Java because arithmetic that brings the two together may not produce the results one would expect.
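As a rough, database-free illustration of the BigDecimal suggestion, the following sketch reduces Math.PI to a scale of zero and a scale of two, similar in spirit to what columns declared with precision only or with a scale of two can hold. The class name FixedPointSketch, the toScale method, and the choice of RoundingMode.HALF_UP are assumptions made for this example; it is not the output of either database.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class FixedPointSketch
{
   /**
    * Rounds the provided double to the given number of fractional digits.
    * RoundingMode.HALF_UP is an assumption chosen for this sketch.
    */
   static BigDecimal toScale(final double value, final int scale)
   {
      // BigDecimal.valueOf uses the double's canonical String form,
      // avoiding the long binary expansion from new BigDecimal(double).
      return BigDecimal.valueOf(value).setScale(scale, RoundingMode.HALF_UP);
   }

   public static void main(final String[] arguments)
   {
      System.out.println("Math.PI:       " + Math.PI);
      System.out.println("Scale of zero: " + toScale(Math.PI, 0));
      System.out.println("Scale of two:  " + toScale(Math.PI, 2));
   }
}
```

Keeping such values as BigDecimal, rather than converting them back to double, preserves the fixed scale through subsequent arithmetic.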

In Java and other programming languages, the developer needs to be concerned not only about the effect of arithmetic operations and available precision on the "correctness" of floating-point numbers, but also about how these numbers are stored in relational database columns. In the Oracle and PostgreSQL databases, precision and scale designations on those columns can affect the representation of the stored floating-point number. This is especially applicable if the representations queried from the database are to be used in floating-point calculations. This is another of many examples where it is important for the Java developer to understand the database schema being used.

Monday, September 12, 2016

More on Spooling Queries and Results in psql

In the recent blog post SPOOLing Queries with Results in psql, I looked briefly at some PostgreSQL database psql meta-commands and options that can be used to emulate Oracle database's SQL*Plus spooling behavior. In that post, I wrote, "I have not been able to figure out a way to ... have both the query and its results written to the file without needing to use \qecho." Fortunately, since that writing, a colleague pointed me to the psql option --log-file (or -L).

The PostgreSQL psql documentation states that the --log-file / -L option "write[s] all query output into file filename, in addition to the normal output destination." This handy single option prints both the query and its non-error results to the indicated file. For example, if I start psql with the command "psql -U postgres -L C:\output\albums.txt" and then run the query select * from albums;, the generated file C:\output\albums.txt appears like this:

********* QUERY **********
select * from albums;
**************************

           title           |     artist      | year 
---------------------------+-----------------+------
 Back in Black             | AC/DC           | 1980
 Slippery When Wet         | Bon Jovi        | 1986
 Third Stage               | Boston          | 1986
 Hysteria                  | Def Leppard     | 1987
 Some Great Reward         | Depeche Mode    | 1984
 Violator                  | Depeche Mode    | 1990
 Brothers in Arms          | Dire Straits    | 1985
 Rio                       | Duran Duran     | 1982
 Hotel California          | Eagles          | 1976
 Rumours                   | Fleetwood Mac   | 1977
 Kick                      | INXS            | 1987
 Appetite for Destruction  | Guns N' Roses   | 1987
 Thriller                  | Michael Jackson | 1982
 Welcome to the Real World | Mr. Mister      | 1985
 Never Mind                | Nirvana         | 1991
 Please                    | Pet Shop Boys   | 1986
 The Dark Side of the Moon | Pink Floyd      | 1973
 Look Sharp!               | Roxette         | 1988
 Songs from the Big Chair  | Tears for Fears | 1985
 Synchronicity             | The Police      | 1983
 Into the Gap              | Thompson Twins  | 1984
 The Joshua Tree           | U2              | 1987
 1984                      | Van Halen       | 1984
(23 rows)

One drawback of using -L is that error messages are not written to the file to which the queries and successful results are written. The next screen snapshot demonstrates an error caused by querying from the column name rather than from the table name, and the listing after the screen snapshot shows what appears in the output file.

********* QUERY **********
select * from artist;
**************************

The output file generated with psql's -L option shows the incorrect query, but the generated file does not include the error message that was shown in the psql terminal application ('ERROR: relation "artist" does not exist'). I don't know of any way to easily ensure that this error message is written to the same file that the query is written to. Redirection of standard output and standard error is a possibility, but then I'd need to redirect the error messages to a different file than the file to which the query and output are being written based on the filename provided with the -L option.

Saturday, September 10, 2016

AutoCommit in PostgreSQL's psql

One potential surprise for someone familiar with Oracle database's SQL*Plus when being introduced to PostgreSQL database's psql may be psql's default enabling of autocommit. This post provides an overview of psql's handling of autocommit and some related nuances.

By default, Oracle's SQL*Plus command-line tool does not automatically commit DML statements and the operator must explicitly commit these statements as part of a transaction (or exit from the session without rolling back). Because of this, developers and administrators familiar with using SQL*Plus to work with the Oracle database might be a bit surprised when the opposite is true for PostgreSQL and its psql command-line tool. Auto-commit is turned on by default in psql, meaning that every statement (including DML statements such as INSERT, UPDATE, and DELETE statements) is automatically committed once submitted.

One consequence of PostgreSQL's psql enabling autocommit by default is that COMMIT statements are unnecessary. When one tries to submit a commit; in psql with autocommit enabled, the WARNING-level message "there is no transaction in progress" is shown. This is demonstrated in the next screen snapshot.

The remainder of this post looks at how to turn off this automatic committing of all manipulation statements in psql.

One often cited approach to overriding psql's autocommit default is to explicitly begin a transaction with the BEGIN keyword and then psql won't commit until an explicit commit is provided. However, this can become a bit tedious over time and fortunately PostgreSQL's psql provides a convenient way of configuring psql to have autocommit disabled.

Before getting into the easy approach used to disable autocommit in psql, I'll point out here that one should not confuse the advice for ECPG (Embedded SQL in C) with the advice for psql. When using ECPG, the "SET AUTOCOMMIT" section of the PostgreSQL documentation on ECPG applies. Although this only applies to ECPG and does NOT apply to psql, it is easy to miss that because one of the first responses to a Google search for "psql autocommit" is this ECPG-specific manual page. That ECPG-specific manual page states that the command looks like "SET AUTOCOMMIT { = | TO } { ON | OFF }" and adds, "By default, embedded SQL programs are not in autocommit mode, so COMMIT needs to be issued explicitly when desired." This is like Oracle's SQL*Plus and is not how psql behaves by default.

Fortunately, it's very easy to disable autocommit in psql. One merely needs to enter the following at the psql command prompt (AUTOCOMMIT is case sensitive and should be all uppercase):

\set AUTOCOMMIT off

This simple command disables autocommit for the session. One can determine whether autocommit is enabled with a simple \echo meta-command like this (AUTOCOMMIT is case sensitive and all uppercase and prefixed with colon indicating it's a variable):

\echo :AUTOCOMMIT

The next screen snapshot demonstrates the discussion so far. It uses an \echo to indicate the default nature of autocommit (on) and how use of \set AUTOCOMMIT allows it to be disabled (off).

If it's desired to "always" have autocommit disabled, the \set AUTOCOMMIT off meta-command can be added to one's local ~/.psqlrc file. For an even more global setting, this meta-command can be placed in a psqlrc file in the database's system config directory (which can be located using the PostgreSQL operating system-level command pg_config --sysconfdir as shown in the next screen snapshot).

One last nuance to be wary of when using psql and dealing with autocommit is that show AUTOCOMMIT; is generally not useful. In PostgreSQL 9.5, as the next screen snapshot demonstrates, an error message makes it clear that it's not even available anymore.

Conclusion

Although autocommit is enabled by default in PostgreSQL database's psql command-line tool, it can be easily disabled using \set AUTOCOMMIT off explicitly in a session or via configuration in the personal ~/.psqlrc file or in the global system configuration psqlrc file.

Thursday, August 11, 2016

SPOOLing Queries with Results in psql

SQL*Plus, the Oracle database's command-line tool, provides the SPOOL command to "store query results in a file." The next screen snapshot shows SPOOL used in SQL*Plus to spool the listing of user tables to a file called C:\pdf\output.txt.

Both the executed query and the results of the query have been spooled to the file output.txt as shown in the next listing of that file.

Oracle's SQL*Plus's SPOOL-ed output.txt

SQL> select table_name from user_tables;

TABLE_NAME                                                                      
------------------------------                                                  
REGIONS                                                                         
LOCATIONS                                                                       
DEPARTMENTS                                                                     
JOBS                                                                            
EMPLOYEES                                                                       
JOB_HISTORY                                                                     
PEOPLE                                                                          
NUMERAL                                                                         
NUMBER_EXAMPLE                                                                  
COUNTRIES                                                                       

10 rows selected.

SQL> spool off

PostgreSQL's command-line tool, psql, provides functionality similar to SQL*Plus's SPOOL with the \o (\out) meta-command. The following screen snapshot shows this in action in psql.

The file output.txt written via psql's \o meta-command is shown in the next listing.

         List of relations
 Schema |  Name  | Type  |  Owner   
--------+--------+-------+----------
 public | albums | table | postgres
(1 row)

Only the results of the query run in psql are contained in the generated output.txt file. The query itself, even the longer query produced by using \set ECHO_HIDDEN on, is not contained in the output.

One approach to ensuring that the query itself is output with the query's results written to the file is to use the \qecho meta-command to explicitly write the query to the spooled file before running the query. This is demonstrated in the next screen snapshot.

Using \qecho in conjunction with \o does place the query itself in the written file with the query's results as shown in the next listed output.

select * from albums;
           title           |     artist      | year 
---------------------------+-----------------+------
 Back in Black             | AC/DC           | 1980
 Slippery When Wet         | Bon Jovi        | 1986
 Third Stage               | Boston          | 1986
 Hysteria                  | Def Leppard     | 1987
 Some Great Reward         | Depeche Mode    | 1984
 Violator                  | Depeche Mode    | 1990
 Brothers in Arms          | Dire Straits    | 1985
 Rio                       | Duran Duran     | 1982
 Hotel California          | Eagles          | 1976
 Rumours                   | Fleetwood Mac   | 1977
 Kick                      | INXS            | 1987
 Appetite for Destruction  | Guns N' Roses   | 1987
 Thriller                  | Michael Jackson | 1982
 Welcome to the Real World | Mr. Mister      | 1985
 Never Mind                | Nirvana         | 1991
 Please                    | Pet Shop Boys   | 1986
 The Dark Side of the Moon | Pink Floyd      | 1973
 Look Sharp!               | Roxette         | 1988
 Songs from the Big Chair  | Tears for Fears | 1985
 Synchronicity             | The Police      | 1983
 Into the Gap              | Thompson Twins  | 1984
 The Joshua Tree           | U2              | 1987
 1984                      | Van Halen       | 1984
(23 rows)

The main downside to use of \qecho is that it must be used before every statement to be written to the output file.

The psql variable ECHO can be set to queries to have "all SQL commands sent to the server [sent] to standard output as well." This is demonstrated in the next screen snapshot.

Unfortunately, although setting the psql variable ECHO to queries leads to the query being output along with the results in the psql window, the query is not written to the file by the \o meta-command. Instead, when \o is used with ECHO set to queries, the query itself is printed out again to the window and the results only are written to the specified file. This is because, as the documentation states (I added the emphasis), the \o meta-command writes "the query output ... to the standard output." This is demonstrated in the next screen snapshot.

I have not been able to figure out a way to easily use the \o meta-command and have both the query and its results written to the file without needing to use \qecho. However, another approach that doesn't require \qecho is to not try to spool the output to a file from within psql interactively, but to instead execute a SQL script input file externally.

For example, if I make an input file called input.txt that consists only of a single line with the query

select * from albums;

I could run psql with the command

psql -U postgres --echo-queries < input.txt > outputWithQuery.txt

to read that single-line file with the query and write output to the outputWithQuery.txt file. The --echo-queries option works like the \set ECHO queries from within psql and running this command successfully generates the prescribed output file with query and results. The following screen snapshot and the code listing following that demonstrate this.

outputWithQuery.txt

select * from albums;
           title           |     artist      | year 
---------------------------+-----------------+------
 Back in Black             | AC/DC           | 1980
 Slippery When Wet         | Bon Jovi        | 1986
 Third Stage               | Boston          | 1986
 Hysteria                  | Def Leppard     | 1987
 Some Great Reward         | Depeche Mode    | 1984
 Violator                  | Depeche Mode    | 1990
 Brothers in Arms          | Dire Straits    | 1985
 Rio                       | Duran Duran     | 1982
 Hotel California          | Eagles          | 1976
 Rumours                   | Fleetwood Mac   | 1977
 Kick                      | INXS            | 1987
 Appetite for Destruction  | Guns N' Roses   | 1987
 Thriller                  | Michael Jackson | 1982
 Welcome to the Real World | Mr. Mister      | 1985
 Never Mind                | Nirvana         | 1991
 Please                    | Pet Shop Boys   | 1986
 The Dark Side of the Moon | Pink Floyd      | 1973
 Look Sharp!               | Roxette         | 1988
 Songs from the Big Chair  | Tears for Fears | 1985
 Synchronicity             | The Police      | 1983
 Into the Gap              | Thompson Twins  | 1984
 The Joshua Tree           | U2              | 1987
 1984                      | Van Halen       | 1984
(23 rows)

I don't know how to exactly imitate, from within psql, SQL*Plus's writing of the query with its results without needing to add \qecho meta-commands, but passing the input script to psql with the --echo-queries option works very similarly to invoking and spooling the script from within SQL*Plus.

Friday, March 4, 2016

SQL: Counting Groups of Rows Sharing Common Column Values

In this post, I focus on using simple SQL SELECT statements to count the number of rows in a table meeting a particular condition with the results grouped by a certain column of the table. These are all basic SQL concepts, but mixing them allows for different and useful representations of data stored in a relational database. The specific aspects of a SQL query covered in this post and illustrated with simple examples are the aggregate function count(), WHERE, GROUP BY, and HAVING. These will be combined to build a simple single SQL query that indicates the number of rows in a table that match different values for a given column in that table.

I'll need some simple SQL data to demonstrate. The following SQL code demonstrates creation of a table called ALBUMS in a PostgreSQL database followed by use of INSERT statements to populate that table.

createAndPopulateAlbums.sql

CREATE TABLE albums
(
   title text,
   artist text,
   year integer
);

INSERT INTO albums (title, artist, year)
   VALUES ('Back in Black', 'AC/DC', 1980);
INSERT INTO albums (title, artist, year)
   VALUES ('Slippery When Wet', 'Bon Jovi', 1986);
INSERT INTO albums (title, artist, year)
   VALUES ('Third Stage', 'Boston', 1986);
INSERT INTO albums (title, artist, year)
   VALUES ('Hysteria', 'Def Leppard', 1987);
INSERT INTO albums (title, artist, year)
   VALUES ('Some Great Reward', 'Depeche Mode', 1984);
INSERT INTO albums (title, artist, year)
   VALUES ('Violator', 'Depeche Mode', 1990);
INSERT INTO albums (title, artist, year)
   VALUES ('Brothers in Arms', 'Dire Straits', 1985);
INSERT INTO albums (title, artist, year)
   VALUES ('Rio', 'Duran Duran', 1982);
INSERT INTO albums (title, artist, year)
   VALUES ('Hotel California', 'Eagles', 1976);
INSERT INTO albums (title, artist, year)
   VALUES ('Rumours', 'Fleetwood Mac', 1977);
INSERT INTO albums (title, artist, year)
   VALUES ('Kick', 'INXS', 1987);
INSERT INTO albums (title, artist, year)
   VALUES ('Appetite for Destruction', 'Guns N'' Roses', 1987);
INSERT INTO albums (title, artist, year)
   VALUES ('Thriller', 'Michael Jackson', 1982);
INSERT INTO albums (title, artist, year)
   VALUES ('Welcome to the Real World', 'Mr. Mister', 1985);
INSERT INTO albums (title, artist, year)
   VALUES ('Never Mind', 'Nirvana', 1991);
INSERT INTO albums (title, artist, year)
   VALUES ('Please', 'Pet Shop Boys', 1986);
INSERT INTO albums (title, artist, year)
   VALUES ('The Dark Side of the Moon', 'Pink Floyd', 1973);
INSERT INTO albums (title, artist, year)
   VALUES ('Look Sharp!', 'Roxette', 1988);
INSERT INTO albums (title, artist, year)
   VALUES ('Songs from the Big Chair', 'Tears for Fears', 1985);
INSERT INTO albums (title, artist, year)
   VALUES ('Synchronicity', 'The Police', 1983);
INSERT INTO albums (title, artist, year)
   VALUES ('Into the Gap', 'Thompson Twins', 1984);
INSERT INTO albums (title, artist, year)
   VALUES ('The Joshua Tree', 'U2', 1987);
INSERT INTO albums (title, artist, year)
   VALUES ('1984', 'Van Halen', 1984);

The next two screen snapshots show the results of running this script in psql:

At this point, if I want to see how many albums were released in each year, I could use several individual SQL query statements like these:

SELECT count(1) FROM albums WHERE year = 1985;
SELECT count(1) FROM albums WHERE year = 1987;

It might be desirable to see how many albums were released in each year without needing an individual query for each year. This is where using an aggregate function like count() with a GROUP BY clause comes in handy. The next query is simple, but takes advantage of GROUP BY to display the count of each "group" of rows grouped by the albums' release years.

SELECT year, count(1)
  FROM albums
 GROUP BY year;

The WHERE clause can be used as normal to narrow the rows that are grouped by specifying a narrowing condition. For example, the following query returns the per-year counts only for albums released after 1988.

SELECT year, count(1)
  FROM albums
 WHERE year > 1988
 GROUP BY year;

We might want to return only the years for which multiple albums (more than one) are in our table. A first naive approach might look like the next listing, which does not work (as the screen snapshot that follows it shows):

-- Bad Code!: Don't do this.
SELECT year, count(1)
  FROM albums
 WHERE count(1) > 1
 GROUP BY year;

The last screen snapshot demonstrates that "aggregate functions are not allowed in WHERE." In other words, we cannot use the count() in the WHERE clause. This is where the HAVING clause is useful because HAVING narrows results in a similar manner as WHERE does, but is used with aggregate functions and GROUP BY.

The next SQL listing demonstrates using the HAVING clause to accomplish the earlier attempted task (listing years for which multiple album rows exist in the table):

SELECT year, count(1)
  FROM albums
 GROUP BY year
HAVING count(1) > 1;

Finally, I may want to order the results so that the years are listed in increasing order (earliest first). Two of the SQL queries demonstrated earlier are shown here with ORDER BY added.

SELECT year, count(1)
  FROM albums
 GROUP BY year
 ORDER BY year;

SELECT year, count(1)
  FROM albums
 GROUP BY year
HAVING count(1) > 1
 ORDER BY year;
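For readers who think more readily in application code, the order in which these clauses act can be sketched in Java. This is only an analogy, not how any database executes the query: the class and method names are hypothetical, and the list of years in main is a made-up sample rather than the full albums table. Grouping and counting play the role of GROUP BY with count(1), the removal of small groups plays the role of HAVING, and the sorted map stands in for ORDER BY year.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical illustration of GROUP BY / HAVING / ORDER BY semantics. */
final class AlbumCounts {
   private AlbumCounts() {}

   /** Returns year -> count, sorted by year, for years occurring more than threshold times. */
   static Map<Integer, Long> yearsWithMoreThan(List<Integer> years, long threshold) {
      // GROUP BY year with count(1); the TreeMap keeps keys sorted like ORDER BY year.
      // (A WHERE condition would be a filter() on the stream before collect().)
      final Map<Integer, Long> counts = years.stream()
         .collect(Collectors.groupingBy(year -> year, TreeMap::new, Collectors.counting()));
      // HAVING count(1) > threshold: groups are filtered after aggregation.
      counts.values().removeIf(count -> count <= threshold);
      return counts;
   }

   public static void main(String[] args) {
      // Made-up sample of release years, not the complete table contents.
      final List<Integer> sampleYears = Arrays.asList(1987, 1985, 1987, 1990, 1985, 1987);
      System.out.println(yearsWithMoreThan(sampleYears, 1));
   }
}
```

The point of the sketch is that the count only exists once the groups have been formed, which is why a condition on it cannot be placed in WHERE.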

SQL has become a much richer language than when I first began working with it, but the basic SQL that has been available for numerous years remains effective and useful. Although the examples in this post have been demonstrated using PostgreSQL, these examples should work on most relational databases that implement ANSI SQL.

Tuesday, November 10, 2015

Does PostgreSQL Have an ORA-01795-like Limit?

The Oracle database requires that no more than 1000 entries be used in a SQL IN portion of a WHERE clause and will throw an ORA-01795 error if that number is exceeded. If a value needs to be compared to more than 1000 values, approaches other than use of IN must be applied. I wondered if this limitation applies to PostgreSQL and decided to write a simple application to find out.

For my simple test application, I wanted a very simple table to use with both an Oracle database and a PostgreSQL database.

Oracle: Creating Single Column Table And Inserting Single Row

CREATE TABLE numeral(numeral1 number);
INSERT INTO numeral (numeral1) VALUES (15);

PostgreSQL: Creating Single Column Table and Inserting Single Row

CREATE TABLE numeral(numeral1 numeric);
INSERT INTO numeral (numeral1) VALUES (15);

Building the SQL Query

Java 8 makes it easy to build up a query to test the condition of more than 1000 values in an IN clause. The next code snippet focuses on how this can be accomplished.

Java 8 Construction of SQL Query

final String queryPrefix = "SELECT numeral1 FROM numeral WHERE numeral1 IN ";
// Build "(1,2,...,N)" where N is numberOfInValues.
final String inClauseTarget =
   IntStream.rangeClosed(1, numberOfInValues)
      .mapToObj(String::valueOf)
      .collect(Collectors.joining(",", "(", ")"));
final String select = queryPrefix + inClauseTarget;

The string constructed by the Java 8 code shown in the last code listing looks like this:

SELECT numeral1 FROM numeral WHERE numeral1 IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,
516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,619,620,621,622,623,624,625,626,627,628,629,630,631,632,633,634,635,636,637,638,639,640,641,642,643,644,645,646,647,648,649,650,651,652,653,654,655,656,657,658,659,660,661,662,663,664,665,666,667,668,669,670,671,672,673,674,675,676,677,678,679,680,681,682,683,684,685,686,687,688,689,690,691,692,693,694,695,696,697,698,699,700,701,702,703,704,705,706,707,708,709,710,711,712,713,714,715,716,717,718,719,720,721,722,723,724,725,726,727,728,729,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,746,747,748,749,750,751,752,753,754,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,771,772,773,774,775,776,777,778,779,780,781,782,783,784,785,786,787,788,789,790,791,792,793,794,795,796,797,798,799,800,801,802,803,804,805,806,807,808,809,810,811,812,813,814,815,816,817,818,819,820,821,822,823,824,825,826,827,828,829,830,831,832,833,834,835,836,837,838,839,840,841,842,843,844,845,846,847,848,849,850,851,852,853,854,855,856,857,858,859,860,861,862,863,864,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890,891,892,893,894,895,896,897,898,899,900,901,902,903,904,905,906,907,908,909,910,911,912,913,914,915,916,917,918,919,920,921,922,923,924,925,926,927,928,929,930,931,932,933,934,935,936,937,938,939,940,941,942,943,944,945,946,947,948,949,950,951,952,953,954,955,956,957,958,959,960,961,962,963,964,965,966,967,968,969,970,971,972,973,974,975,976,977,978,979,980,981,982,983,984,985,986,987,988,989,990,991,992,993,994,995,996,997,998,999,1000,1001)

Running the Query

When the above SQL query statement is executed against an Oracle database, the ORA-01795 error is manifest:

java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in a list is 1000

The PostgreSQL database does not have this same limitation as shown by its output below:

The full Java class I used to demonstrate the above findings is available at https://github.com/dustinmarx/databasedemos/blob/master/dustin/examples/inparameters/Main.java.

Conclusion

There are numerous ways to avoid the ORA-01795 error when using an Oracle database. However, I was curious if the same limitation existed for PostgreSQL and apparently it doesn't (I'm using PostgreSQL 9.4.4 in these examples). In fact, when I tried as many as one million IN values, PostgreSQL was still able to process the query, albeit noticeably slower than with a smaller number of IN values.
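One commonly cited way to stay under the Oracle limit is to split the values into several IN lists of at most 1000 entries each and combine those lists with OR. The following is a minimal sketch of that idea, not code from the demonstration class referenced above; the class name InClauseChunker and its method are hypothetical, and it assumes trusted integer values (string values would call for bind parameters rather than concatenation).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper (not part of the original demo) that splits a long IN list. */
final class InClauseChunker {
   private InClauseChunker() {}

   /** Builds "(col IN (...) OR col IN (...))" with at most maxPerList values per list. */
   static String buildPredicate(String column, List<Integer> values, int maxPerList) {
      if (values.isEmpty() || maxPerList < 1) {
         throw new IllegalArgumentException("need at least one value and a positive chunk size");
      }
      final List<String> inLists = new ArrayList<>();
      for (int start = 0; start < values.size(); start += maxPerList) {
         final int end = Math.min(start + maxPerList, values.size());
         inLists.add(values.subList(start, end).stream()
            .map(String::valueOf)
            .collect(Collectors.joining(",", column + " IN (", ")")));
      }
      // Parenthesize the whole predicate so it can be combined safely with other conditions.
      return inLists.stream().collect(Collectors.joining(" OR ", "(", ")"));
   }

   public static void main(String[] args) {
      final List<Integer> values =
         IntStream.rangeClosed(1, 1001).boxed().collect(Collectors.toList());
      final String select =
         "SELECT numeral1 FROM numeral WHERE " + buildPredicate("numeral1", values, 1000);
      // Print only the tail of the statement to keep the output short.
      System.out.println(select.substring(select.length() - 40));
   }
}
```

With 1001 values and a chunk size of 1000, the predicate contains one list of 1000 entries and a second list holding the single remaining value.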

Monday, October 5, 2015

Downsides of Mixed Identifiers When Porting Between Oracle and PostgreSQL Databases

Both the Oracle database and the PostgreSQL database use the presence or absence of double quotes to indicate case sensitive or case insensitive identifiers. Each of these databases allows identifiers to be named without quotes (generally case insensitive) or with double quotes (case sensitive). This blog post discusses some of the potential negative consequences of mixing quoted (or delimited) identifiers and case-insensitive identifiers in an Oracle or PostgreSQL database and then trying to port SQL to the other database.

Advantages of Case-Sensitive Quoted/Delimited Identifiers

There are multiple advantages of case sensitive identifiers. Some of the advertised (real and perceived) benefits of case sensitive database identifiers include:

Ability to use reserved words, key words, and special symbols not available to identifiers without quotes.
- PostgreSQL's keywords:
  - reserved ("only real key words" that "are never allowed as identifiers")
  - unreserved ("special meaning in particular contexts," but "can be used as identifiers in other contexts").
  - "Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.) This allows constructing table or column names that would otherwise not be possible, such as ones containing spaces or ampersands."
- Oracle reserved words and keywords:
  - Oracle SQL Reserved Words that can only be used as "quoted identifiers, although this is not recommended."
  - Oracle SQL Keywords "are not reserved," but using these keywords as names can lead to "SQL statements [that] may be more difficult to read and may lead to unpredictable results."
  - "Nonquoted identifiers must begin with an alphabetic character from your database character set. Quoted identifiers can begin with any character."
  - "Quoted identifiers can contain any characters and punctuations marks as well as spaces."
Ability to use the same characters for two different identifiers with case being the differentiation feature.
Avoid dependency on a database's implementation's case assumptions and provide "one universal version."
Explicit case specification avoids issues with case assumptions that might be changeable in some databases such as SQL Server.
Consistency with most programming languages and operating systems' file systems.
Specified in SQL specification and explicitly spells out case of identifiers rather than relying on specific implementation details (case folding) of particular database.
Additional protection in cases where external users are allowed to specify SQL that is to be interpreted as identifiers.

Advantages of Case-Insensitive Identifiers

There are also advantages associated with use of case-insensitive identifiers. It can be argued that case-insensitive identifiers are the "default" in Oracle database and PostgreSQL database because one must use quotes to specify when this default case-insensitivity is not the case.

Case-insensitivity is the "default" in Oracle and PostgreSQL databases.
The best case for readability can be used in any particular context. For example, this allows DML and DDL statements to be written to a particular coding convention and then be automatically mapped to the appropriate case folding for various databases.
Avoids errors introduced by developers who are unaware of or unwilling to follow case conventions.
Double quotes (" ") are very different from single quotes (' ') in at least some contexts in both the Oracle and PostgreSQL databases and not using case-sensitive identifier double quotes eliminates need to remember the difference or worry about the next developer not remembering the difference.
- Oracle database single quotes: "Text, character, and string literals are always surrounded by single quotation marks."
- Oracle database double quotes: "A quoted identifier begins and ends with double quotation marks (")."
- PostgreSQL single quotes: Use single quotes in PostgreSQL for "literals"/"string values".
- PostgreSQL double quotes: "delimited identifier or quoted identifier ... is formed by enclosing an arbitrary sequence of characters in double-quotes (")."
Many of the above listed "advantages" may not really be good practices:
- Using reserved words and keywords as identifiers is probably not good for readability anyway.
- Using symbols allowed in quoted identifiers that are not allowed in unquoted identifiers may not be necessary or even desirable.
- Having two different identifiers whose names differ only in character case is probably not a good idea.

Default Case-Insensitive or Quoted Case-Sensitive Identifiers?

In "Don’t use double quotes in PostgreSQL," Reuven Lerner makes a case for using PostgreSQL's "default" (no double quotes) case-insensitive identifiers. Lerner also points out that pgAdmin implicitly creates double-quoted case-sensitive identifiers. From an Oracle DBA perspective, @MBigglesworth79 calls quoted identifiers in Oracle an "Oracle Gotcha" and concludes, "My personal recommendation would be against the use of quoted identifiers as they appear to cause more problems and confusion than they are worth."

A key trade-off to be considered when debating quoted case-sensitive identifiers versus default case-insensitive identifiers is one of being able to (but also required to) explicitly specify identifiers' case versus not being able to (but not having to) specify case of characters used in the identifiers.

Choose One or the Other: Don't Mix Them!

It has been my experience that the worst choice one can make when designing database constructs is to mix case-sensitive and case-insensitive identifiers. Mixing these makes it difficult for developers to know when case matters and when it doesn't, but developers must be aware of the differences in order to use them appropriately. Mixing identifiers with implicit case and explicit case definitely violates the Principle of Least Surprise and will almost certainly result in a frustrating runtime bug.

Another factor to consider in this discussion is the case folding implemented in Oracle database and PostgreSQL database. This case folding can cause unintended consequences, especially when porting between two databases with different case folding assumptions. The PostgreSQL database folds to lowercase characters (non-standard) while the Oracle database folds to uppercase characters. The significance of this difference is exemplified in one of the first PostgreSQL Wiki "Oracle Compatibility Tasks": "Quoted identifiers, upper vs. lower case folding." Indeed, while I have found PostgreSQL to be heavily focused on being standards-compliant, this case folding behavior is one place that is very non-standard and cannot be easily changed.

About the only "safe" strategy to mix case-sensitive and case-insensitive identifiers in the same database is to know that particular database's default case folding strategy and to name even explicitly named (double quoted) identifiers with exactly the same case as the database will case fold non-quoted identifiers. For example, in PostgreSQL, one could name all identifiers in quotes with completely lowercase characters because PostgreSQL will default unquoted identifiers to all lowercase characters. However, when using Oracle, the opposite approach would be needed: all quoted identifiers should be all uppercase to allow case-sensitive and case-insensitive identifiers to be intermixed. Problems will arise, of course, when one attempts to port from one of these databases to the other because the assumption of lowercase or uppercase changes. The better approach, then, for database portability between Oracle and PostgreSQL databases is to commit either to using quoted case-sensitive identifiers everywhere (they are then explicitly named the same for both databases) or to using default case-insensitive identifiers everywhere (and each database will case fold them in its own way).
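The folding rules described above can be modeled in a few lines of Java. This is a deliberately simplified sketch and not a real SQL parser: the class IdentifierFolding, its enum, and its methods are hypothetical names, and it ignores details such as doubled quotes embedded in a quoted identifier. It only captures the rule that quoted identifiers keep their case while unquoted ones are folded to lowercase (PostgreSQL) or uppercase (Oracle).

```java
import java.util.Locale;

/** Hypothetical illustration of identifier case folding; not a real parser. */
final class IdentifierFolding {
   enum Database { POSTGRESQL, ORACLE }

   private IdentifierFolding() {}

   /** Returns the name the database would store for the identifier as written in SQL. */
   static String storedName(String identifier, Database database) {
      final boolean quoted = identifier.length() >= 2
         && identifier.startsWith("\"") && identifier.endsWith("\"");
      if (quoted) {
         // Quoted identifiers keep their exact case; strip the delimiters.
         return identifier.substring(1, identifier.length() - 1);
      }
      // Unquoted identifiers are folded: lowercase in PostgreSQL, uppercase in Oracle.
      return database == Database.POSTGRESQL
         ? identifier.toLowerCase(Locale.ROOT)
         : identifier.toUpperCase(Locale.ROOT);
   }

   /** True when both spellings resolve to the same stored name in the given database. */
   static boolean sameObject(String first, String second, Database database) {
      return storedName(first, database).equals(storedName(second, database));
   }

   public static void main(String[] args) {
      for (Database database : Database.values()) {
         System.out.println(database
            + ": \"albums\" vs albums -> " + sameObject("\"albums\"", "albums", database)
            + ", \"ALBUMS\" vs albums -> " + sameObject("\"ALBUMS\"", "albums", database));
      }
   }
}
```

In this model, a table created as "albums" is reachable through the unquoted name albums only under PostgreSQL-style folding, and one created as "ALBUMS" only under Oracle-style folding, which is the porting hazard in miniature.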

Conclusion

There are advantages to both identifiers with implicit case (case insensitive) and to identifiers with explicit (quoted and case sensitive) case in both Oracle database and PostgreSQL database with room for personal preferences and tastes to influence any decision on which approach to use. Although I prefer (at least at the time of this writing) to use the implicit (default) case-insensitive approach, I would rather use the explicitly spelled-out (with double quotes) identifier cases in all cases than mix the approach and use explicit case specification for identifiers in some cases and implicit specification of case of identifiers in other cases. Mixing the approaches makes it difficult to know which is being used in each table and column in the database and makes it more difficult to port the SQL code between databases such as PostgreSQL and Oracle that make different assumptions regarding case folding.

Additional Reading

Database identifiers, quoting and case sensitivity
Oracle Database 11g (11.1.1) Database SQL Language Reference: Schema Object Names and Qualifiers
PostgreSQL 9.4.4 Documentation: Identifiers and Keywords
Are there benefits to a case sensitive database?
Reason why oracle is case sensitive?

Friday, September 11, 2015

Passing Arrays to a PostgreSQL PL/pgSQL Function

It can be handy to pass a collection of strings to a PL/pgSQL stored function via a PostgreSQL array. This is generally a very easy thing to accomplish, but this post demonstrates a couple of nuances to be aware of when passing an array to a PL/pgSQL function from JDBC or psql.

The next code listing is for a contrived PL/pgSQL stored function that will be used in this post. This function accepts an array of text variables, loops over them based on array length, and reports these strings via the PL/pgSQL RAISE statement.

printStrings.sql

CREATE OR REPLACE FUNCTION printStrings(strings text[]) RETURNS void AS $printStrings$
DECLARE
   number_strings integer := array_length(strings, 1);
   string_index integer := 1;
BEGIN
   WHILE string_index <= number_strings LOOP
      RAISE NOTICE '%', strings[string_index];
      string_index = string_index + 1;
   END LOOP;
END;
$printStrings$ LANGUAGE plpgsql;

The above PL/pgSQL code in file printStrings.sql can be executed in psql with \ir as shown in the next screen snapshot.

The syntax for invoking a PL/pgSQL stored function with an array as an argument is described in the section "Array Value Input" in the PostgreSQL Arrays documentation. This documentation explains that the "general format of an array constant" is '{ val1 delim val2 delim ... }' where delim is the delimiter character, which is a comma (,) in most cases. The same documentation shows an example: '{{1,2,3},{4,5,6},{7,8,9}}'. This example is a two-dimensional array made up of three sub-arrays with three integers in each.

The array literal syntax just shown is straightforward to use with numeric types such as the integers in the example shown. Strings need more care because the entire array literal is already wrapped in single quotes ('{}'). Within the literal, elements can be written without quotes, or wrapped in double quotes when they contain commas, braces, or whitespace that must be preserved. A single quote inside the literal must be doubled, and it then becomes part of the element's value. For example, the stored function just shown can be invoked in psql with SELECT printstrings('{''Inspired'', ''Actual'', ''Events''}'); as shown in the next screen snapshot, and each string is then reported with single quotes around it. To pass the bare strings "Inspired", "Actual", and "Events" instead, use SELECT printstrings('{"Inspired", "Actual", "Events"}'); or the array constructor form SELECT printstrings(ARRAY['Inspired', 'Actual', 'Events']);.
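When a psql script is generated from application code, the array constant can be assembled programmatically. The next listing is a minimal sketch of one way to do that; the class ArrayLiteralBuilder and its methods are hypothetical names. It assumes non-null elements and a server running with standard_conforming_strings on (so that backslashes in the SQL string literal are taken literally). Each element is wrapped in double quotes, with backslashes and double quotes inside an element escaped by a backslash as the Arrays documentation describes, and single quotes are doubled for the enclosing SQL string literal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for generating psql scripts; JDBC code can use createArrayOf instead. */
final class ArrayLiteralBuilder {
   private ArrayLiteralBuilder() {}

   /** Builds the array constant, e.g. {"a","b"}, double-quoting every (non-null) element. */
   static String toArrayConstant(List<String> elements) {
      return elements.stream()
         // Inside a double-quoted element, backslash and double quote are backslash-escaped.
         .map(e -> "\"" + e.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
         .collect(Collectors.joining(",", "{", "}"));
   }

   /** Wraps the array constant in a SQL string literal, doubling any single quotes. */
   static String toSqlLiteral(List<String> elements) {
      return "'" + toArrayConstant(elements).replace("'", "''") + "'";
   }

   public static void main(String[] args) {
      System.out.println("SELECT printstrings("
         + toSqlLiteral(Arrays.asList("Inspired", "Actual", "Events")) + ");");
   }
}
```

Always double-quoting the elements is a design choice that keeps the helper simple: it avoids having to decide which values (empty strings, the word NULL, values containing commas or braces) would otherwise need quoting.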

Arrays can be passed to PL/pgSQL functions from Java code as well. This provides an easy approach for passing Java collections to PL/pgSQL functions. The following Java code snippet demonstrates how to call the stored function shown earlier with JDBC. Because this stored function returns void (it's more like a stored procedure), the JDBC code does not need to invoke any of CallableStatement's overloaded registerOutParameter() methods.

JDBC Code Invoking Stored Function with Java Array

final CallableStatement callable =
   connection.prepareCall("{ call printstrings ( ? ) }");
final String[] strings = {"Inspired", "Actual", "Events"};
final Array stringsArray = connection.createArrayOf("varchar", strings);
callable.setArray(1, stringsArray);
callable.execute();
callable.close();

Java applications often work more with Java collections than with arrays, but fortunately Collection provides the toArray(T[]) method for easily getting an array representation of a collection. For example, the next code listing is adapted from the previous code listing, but works against an ArrayList rather than an array.

JDBC Code Invoking Stored Function with Java Collection

final CallableStatement callable =
   connection.prepareCall("{ call printstrings ( ? ) }");
final ArrayList<String> strings = new ArrayList<>();
strings.add("Inspired");
strings.add("Actual");
strings.add("Events");
final Array stringsArray =
   connection.createArrayOf(
      "varchar",
      strings.toArray(new String[strings.size()]));
callable.setArray(1, stringsArray);
callable.execute();
callable.close();

Conclusion

Passing an array as a parameter to a PostgreSQL PL/pgSQL stored function is a straightforward process. This post specifically demonstrated passing an array of strings (including the quoting involved) to a PL/pgSQL stored function from psql and passing an array of Strings to a PL/pgSQL stored function from JDBC using java.sql.Array and Connection.createArrayOf(String, Object[]).

Tuesday, August 18, 2015

Setting PostgreSQL psql Variable Based Upon Query Result

When using PostgreSQL's psql command-line tool to interact with a PostgreSQL database via operator interaction or script, it is not uncommon to want to set psql variables based on the results of a particular query. While PostgreSQL's procedural language PL/pgSQL supports approaches such as SELECT INTO and assignment (:=) to set PL/pgSQL variables based on a query result, these approaches are not supported for psql variable assignment.

The typical way to make a psql variable assignment is via use of \set. This allows for the setting of the psql variable to a literal value. However, there are situations in which it is desirable to set the psql variable based upon the result of a query. This is done with the \gset option in psql. Unlike the \set operation in psql which sets a variable with an explicitly specified name to an explicitly specified literal value, \gset implicitly names the psql variables after the names of the columns (or aliases if columns are aliased) returned by the query to which the \gset is associated. The \gset is specified after the query (no semicolon generally on the query) and there is no semicolon after the \gset statement (just as no semicolon should be placed after a \set statement).

It is easier to see how \gset works with a code sample. The next code listing shows a small psql file that uses \gset to set a psql variable named "name" from a query result and then displays that value with \echo and psql's colon prefix notation.

CREATE TABLE person
(
   name text
);

INSERT INTO person (name) VALUES ('Dustin');

SELECT name FROM person \gset
\echo :name

DROP TABLE person;

In the previous code listing, lines 8-9 are the relevant lines for this discussion (the remainder of the lines are for setup and teardown of the demonstration). Line 8 contains the query (sans semicolon) followed by \gset. A psql variable of 'name' is set by that as evidenced by the echo-ing of its value in line 9. The output showing this works looks like this in a psql terminal window:

CREATE TABLE
INSERT 0 1
Dustin
DROP TABLE

Additional Considerations When Using psql's \gset

Placement of a semicolon between the query and the \gset affects the output.
- Placing a semicolon after the query and before the \gset will execute the query and display the query results before setting the variable(s).
- Leaving the semicolon out will execute the query to populate parameters with the names of the query's columns and aliases, but will not display the actual query results.
There should be no semicolon after the entire statement and placing a semicolon after the \gset will mess up the variable setting.
- Error: invalid command \gset;
Query being used to set variable via \gset should return exactly one row.
- ERROR: more than one row returned for \gset
When a column in the SELECT clause of a query associated with \gset is aliased, the psql variable is named after the alias, because \gset names variables after the column names of the query's result.
- This allows a psql developer to alias a predefined column to any name he or she prefers for the variable set by \gset.

Conclusion

When using psql, use \set variable_name variable_value to explicitly set a psql variable with the name provided by the first argument and an associated value provided by the second argument. To set a psql variable based on query results, append \gset after the query (without semicolon generally) and access returned values by column names (or by columns' aliased names).

Monday, August 17, 2015

Procedure-Like Functions in PostgreSQL PL/pgSQL

PostgreSQL does not support stored procedures in the sense that a database such as Oracle does, but it does support stored functions. In this post, I look at a few tactics that can make the use of a stored function in PostgreSQL (stored function and its calling code both written in PL/pgSQL) feel like using a stored procedure. These simple approaches allow developers to use PostgreSQL stored functions in a manner that is more consistent with use of stored procedures.

Stored procedures and stored functions are very similar and, in fact, I've often heard the term "stored procedure" used interchangeably for stored procedures and for stored functions. For purposes of this post, the essential differences between the two can be summarized as:

Functions are created with the FUNCTION keyword and procedures are created with the PROCEDURE keyword.
Stored procedures do not return a value, but stored functions return a single value.
- The stored function's return value can be used in SELECT statements.

Because PostgreSQL PL/pgSQL only supports stored functions, the defined functions need to declare a return type. Fortunately, in the case of our emulated "stored procedure," we can declare void as the return type. This is demonstrated in the code listing below for a "Hello World" implementation written in PL/pgSQL.

CREATE OR REPLACE FUNCTION helloWorld(name text) RETURNS void AS $helloWorld$
DECLARE
BEGIN
    RAISE LOG 'Hello, %', name;
END;
$helloWorld$ LANGUAGE plpgsql;

With a PostgreSQL stored function with void return type written, we now can invoke it from a client. In this case, I will look at three approaches for calling the stored function from other PL/pgSQL code.

PL/pgSQL: Invoke Function Via SELECT Statement

One approach for calling the stored function in PL/pgSQL code is to use SELECT INTO. The most obvious disadvantage is that, in the case of a procedure-like function returning void, nothing useful is selected and so the variable being selected into must be ignored anyway. The next code listing demonstrates using SELECT INTO to invoke the procedure-like function. The variable in this example, called "dumped", will not have anything useful selected into it, but this statement successfully invokes the stored function. Besides the line shown here, I also need a line in the DECLARE section to declare the "dumped" variable.

SELECT INTO dumped helloWorld('Dustin');

PL/pgSQL: Invoke Function Via Variable Assignment

The PL/pgSQL assignment operator provides another way to invoke the procedure-like stored function. As with the previous example, this approach requires a variable ("ignored") be declared in the DECLARE section and then that variable is assigned the result of the function that returns void, making it effectively a throw-away variable as well.

ignored := helloWorld('Dustin');

PL/pgSQL: Use PERFORM to Explicitly Ignore Returned Value

The PL/pgSQL command PERFORM provides some syntactical advantages when invoking procedure-like stored functions. This command does not require a PL/pgSQL variable to be declared. This saves the line to declare the variable and avoids the pretense of setting a variable that is never really set. It's really just a shortcut for SELECT, but it's syntactically sweeter and makes for more readable code because code maintainers don't have to figure out that a statement that appears to make an assignment actually does not do so.

PERFORM helloWorld('Dustin');

Conclusion

Although PostgreSQL only supports stored "functions" (and not stored "procedures"), it provides syntax that allows for functions to take on procedure-like qualities. By allowing stored functions to return void and to be called from PL/pgSQL code via PERFORM, which expects no result value, PostgreSQL allows client code to appear as if it is invoking a stored procedure rather than a stored function.
