0% found this document useful (0 votes)
252 views

Essential Postgres Database Development Using PostgreSQL (Rick Silva)

Uploaded by

hocine djafar
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
252 views

Essential Postgres Database Development Using PostgreSQL (Rick Silva)

Uploaded by

hocine djafar
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 176

Essential Postgres: Database Development using PostgreSQL

Copyright 2020, 2021 © Rick Silva

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted,
in any form without prior written permission of the author.

Every effort has been made to ensure the accuracy of the information presented. However, the
information contained in the book is sold without warranty, either express or implied. The author will not
be held liable for any damages caused directly or indirectly by this book.

The image of the elephant on the front cover by Ksyu Deniska/Shutterstock.com

ISBN: 9798687854184 (paperback)


To Patti, Jacki, and Rj - the best family in the history of families.
TABLE OF CONTENTS

Introduction

Audience

Conventions
Ellipses Used in Examples
Parens

1 Selecting Data from a Postgres Database


Introducing Database Tables
Introducing SQL
Ordering Rows
Null Values
Formatting SQL
Adding Comments to SQL
Calling Postgres Functions
The upper( ) Function
The coalesce( ) Function
Calling Aggregate Functions
Using “Group By” with Aggregate Functions
A Rule of Thumb for Using “Group By”
Using “Having” in SQL
Don’t Repeat Yourself
Joining Database Tables
Table Aliasing
Types of Joins
Inner Joins
Outer Joins
Cross Joins
Self Joins
Column Aliasing
Alternate Join Syntax
Using Parens
Joining with “Using” Instead of “On”
Old-School SQL
Complex Joins with Multiple Tables.
Complex SQL Statements with Inner and Outer Joins
String Concatenation
Comparing Values in SQL
Greater Than
Greater Than or Equal to
Less Than
Less Than or Equal to
Not Equal
In
Between
Like
Is Null
Not
Subqueries
Correlated Subqueries
Subqueries that Return One Row
Distinct
Limit

2 Creating and Changing Database Tables


Data Types
Commonly-Used Data Types
Text
Integer
Numeric
Date
Time Without Time Zone
Time With Time Zone
Timestamp Without Time Zone
Timestamp With Time Zone
Boolean
Character
Less Commonly-Used Data Types
Numbers
Formatted Data
Serial
Bits and Bytes
Geometry-Related
Specialty Addresses
Miscellaneous
Constraints
Primary Keys
Single Column Primary Keys
Multi-Column Primary Keys
Foreign Keys
Not Null Columns
Unique
Check
Indexes
Dropping Tables
Altering Tables
Adding and Modifying Data
Inserting Data
The Number of Columns Must Match the Number of Values
Always List your Columns
Inserting Data from Another Table
Updating Data
Deleting Data
The Delete Statement
The Truncate Statement

3 More Database Objects


Views
Using Views to Hide Sensitive Data
Materialized Views
Sequences
Creating Functions
Parameters and Arguments
SQL vs. PL/pgSQL
Writing Functions with SQL
Writing Functions with PL/PgSQL
Return Types
Dollar Quoting with Tags
Parameters by Position vs Name
Parameter Modes: In, Out, Inout, and Variadic
Procedures
Triggers
Audit Triggers
Creating the Audit Table
Creating the Trigger Function
Creating the Trigger
Watching it Work
Auditing Updates and Deletes
Special Variables for Triggers
Altering Triggers
Dropping Triggers
Disabling Triggers
Enabling Triggers

4 Built-In Functions
Aggregate Functions
abs( )
upper( )
lower( )
initcap( )
round( )
trunc( )
ceil( )
floor( )
length( )
substr( )
trim( )
ltrim( )
rtrim( )
left( )
right( )
lpad( )
replace( )
format( )
extract( )
exp( )
pi( )
power( )
random( )
position( )
version( )

5 Tips and Tricks


PgAdmin
Connecting PgAdmin to a Postgres Server
Production Servers and Development Servers
Creating the Server Connections
The Tree Control Pane
The Query Tool
Changing a Table’s Column Order
Debugging with Raise
All Balls
Casting Data Types
Functions for Casting
to_date( )
to_timestamp( )
to_char( )
to_number( )
Using Parens for Math
Transferring Data To and From Files
Loading Unstructured Data
Common Table Expressions
Calling Postgres from Programming Languages
Final Thoughts
INTRODUCTION

PostgreSQL is a popular free and open source relational database


management system. It is also widely known as “Postgres”. I will refer to it
by its more informal alias, “Postgres”, in this book rather than as
“PostgreSQL” because that’s the way I hear it said most often in the real
world.

A relational database management system (RDMS) or “relational database”,


at its heart, is a system to store and retrieve data from a database. The term
“relation” describes a structure that stores data in a grid of rows and
columns. A relation is essentially a database table, so a relational database
is a database system that stores data in tables that have rows and columns.

Structured Query Language (SQL) is the language we use to interact with


relational databases, including Postgres. SQL allows us to store, modify, and
delete data from our database. It also allows us to create and remove tables,
query our data, and much more.

Postgres is a great technology to learn. It is a powerful, secure, and fast


relational database system that is growing in popularity. It is a mature, stable
database that has been around for over 30 years. Postgres runs on a wide
variety of operating systems, including Linux, Windows, macOS, and it even
runs in the cloud. Postgres helps you to maintain the integrity of your data. It
has a robust set of features, and is extensible if you ever need to do
something unconventional that it doesn’t already do.

This book, Essential Postgres, will give you exposure to the most used – and
useful – parts of Postgres development, as well as tips and insights that I
have picked up over my years of working with Postgres. Writing SQL
statements, creating tables, functions, triggers, views, and data integrity are
explained.
AUDIENCE

This book is suitable for anybody who wants to learn to use Postgres. That
includes folks who are new to Postgres and databases, intermediate-level
developers who would like a refresher, and even seasoned software
developers who are transitioning from another database to Postgres. This
book is for anybody who is interested in learning about the essentials of
using the Postgres database.

The aim of this book is to show you how to do things with the Postgres
database. The book is short on theory and long on examples. If you are the
type of person who likes that sort of thing, read on.

Since this book focuses on Postgres development rather than administration,


if you are a Postgres Database Administrator (DBA) you may want to choose
another book. This book does not discuss backup, recovery, security or other
DBA-related issues.
CONVENTIONS

In this book, I have made some stylistic choices that you might want to be
aware of.

ELLIPSES USED IN EXAMPLES


There are several examples in this book that show data being returned by the
Postgres database. In cases where many rows are returned, I have shown a
few of the rows and added an ellipsis (“…”) rather than showing all the
results and making the book 900 pages long. When you see “…” be aware
that I am not showing all of the results in order to save space. For example:

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
… … …
44 Barack Obama Democrat
45 Donald Trump Republican
46 Joe Biden Democrat

In this case, a full set of all 46 presidents was returned from the database, but
I omitted presidents 4 – 43 for the sake of brevity.

PARENS
I often refer to parentheses as “parens” because that’s the way I most often
hear them referred to among developers.

Also, I add parens to the end of function names. For example, Postgres has a
function called “upper” that I write as “upper( )”. You would say this aloud
as “the upper function”, but in this book I add the “parens” to the end of
function name to make it clear that “upper( )” is a function. Functions are
explained later in this book, and we’ll see what the “parens” are for then.
1
SELECTING DATA FROM A
POSTGRES DATABASE

INTRODUCING DATABASE TABLES


In Postgres, data is stored in database tables. Think of a table as a grid of
rows and columns where data can be saved. You can create your own tables
and choose the names of the columns that you want, and specify what type of
data they should contain. Each record is saved as a row in the table.

For example, here is a table called us_president that has been defined with
three columns: president_id, president_name, and president_party.

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
6 John Quincy Adams Democratic-Republican
7 Andrew Jackson Democrat
8 Martin Van Buren Democrat
9 William Harrison Whig
10 John Tyler Whig
11 James Polk Democrat
12 Zachary Taylor Whig
13 Millard Fillmore Whig
14 Franklin Pierce Democrat
15 James Buchanan Democrat
16 Abraham Lincoln Republican
17 Andrew Johnson Democrat
18 Ulysses Grant Republican
19 Rutherford Hayes Republican
20 James Abram Garfield Republican
21 Chester Arthur Republican
22 Grover Cleveland Democrat
23 Benjamin Harrison Republican
24 Grover Cleveland Democrat
25 William McKinley Republican
26 Theodore Roosevelt Republican
27 William Taft Republican
28 Woodrow Wilson Democrat
29 Warren Harding Republican
30 Calvin Coolidge Republican
31 Herbert Hoover Republican
32 Franklin Roosevelt Democrat
33 Harry Truman Democrat
34 Dwight Eisenhower Republican
35 John Kennedy Democrat
36 Lyndon Johnson Democrat
37 Richard Nixon Republican
38 Gerald Ford Republican
39 Jimmy Carter Democrat
40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
42 Bill Clinton Democrat
43 George W. Bush Republican
44 Barack Obama Democrat
45 Donald Trump Republican
46 Joe Biden Democrat

This table contains 46 rows of data, one row for each president.

When naming database tables, it’s a good idea to stick with singular names
like us_president rather than plural names like us_presidents. Although
Postgres will allow you to name tables in the plural form, table names should
be singular by convention.

INTRODUCING SQL
In order to interact with Postgres, you will use Structured Query Language
(SQL). You will use SQL often for interacting with Postgres. Using SQL, you
can create tables, show the contents of tables, create new rows, update
values, and much more.

For example, you could see the data in our us_president table by using this
SQL select statement.

select president_id,
president_name,
president_party
from us_president;

This select statement returns data from the us_president database table and
displays it for you. Here we chose to select the president_id,
president_name, and president_party columns. Note that you should type a
semi-colon at the end of your SQL statements. The results are:

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
… … …
44 Barack Obama Democrat
45 Donald Trump Republican
46 Joe Biden Democrat

The query returns all 46 US presidents (although I have not shown presidents
4 – 43 for brevity).

Now let’s say you want to show only the rows for Republicans in the table.
You could make a change to the SQL and add a “where” clause.

select president_id,
president_name,
president_party
from us_president
where president_party = 'Republican';

president_id president_name president_party


16 Abraham Lincoln Republican
18 Ulysses Grant Republican
19 Rutherford Hayes Republican
20 James Garfield Republican
21 Chester Arthur Republican
23 Benjamin Harrison Republican
25 William McKinley Republican
26 Theodore Roosevelt Republican
27 William Taft Republican
29 Warren Harding Republican
30 Calvin Coolidge Republican
31 Herbert Hoover Republican
34 Dwight Eisenhower Republican
37 Richard Nixon Republican
38 Gerald Ford Republican
40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
43 George W. Bush Republican
45 Donald Trump Republican

The SQL query now returns 19 rows because there are 19 Republicans in the
table.

Since we are now retrieving a list of Republicans only, let’s change the SQL
to not select the president_party column. Let’s select just the president_id
and president_name columns.

select president_id,
president_name
from us_president
where president_party = 'Republican';

president_id president_name
16 Abraham Lincoln
18 Ulysses Grant
19 Rutherford Hayes
20 James Garfield
21 Chester Arthur
23 Benjamin Harrison
25 William McKinley
26 Theodore Roosevelt
27 William Taft
29 Warren Harding
30 Calvin Coolidge
31 Herbert Hoover
34 Dwight Eisenhower
37 Richard Nixon
38 Gerald Ford
40 Ronald Reagan
41 George Herbert Walker Bush
43 George W. Bush
45 Donald Trump

The SQL query didn’t show the party_name column because we didn’t select
it, and again, the query returned 19 rows.

ORDERING ROWS
Now let’s display all the rows and columns in the table ordered by political
party and then by the president’s ID. To do that, we can add an “order by”
clause to our SQL statement.

select president_id,
president_name,
president_party
from us_president
order by president_party,
president_id;

president_id president_name president_party


7 Andrew Jackson Democrat
8 Martin Van Buren Democrat
11 James Polk Democrat
14 Franklin Pierce Democrat
15 James Buchanan Democrat
17 Andrew Johnson Democrat
22 Grover Cleveland Democrat
24 Grover Cleveland Democrat
28 Woodrow Wilson Democrat
32 Franklin Roosevelt Democrat
33 Harry Truman Democrat
35 John Kennedy Democrat
36 Lyndon Johnson Democrat
39 Jimmy Carter Democrat
42 Bill Clinton Democrat
44 Barack Obama Democrat
46 Joe Biden Democrat
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
6 John Quincy Adams Democratic-Republican
2 John Adams Federalist
16 Abraham Lincoln Republican
18 Ulysses Grant Republican
19 Rutherford Hayes Republican
20 James Garfield Republican
21 Chester Arthur Republican
23 Benjamin Harrison Republican
25 William McKinley Republican
26 Theodore Roosevelt Republican
27 William Taft Republican
29 Warren Harding Republican
30 Calvin Coolidge Republican
31 Herbert Hoover Republican
34 Dwight Eisenhower Republican
37 Richard Nixon Republican
38 Gerald Ford Republican
40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
43 George W. Bush Republican
45 Donald Trump Republican
9 William Harrison Whig
10 John Tyler Whig
12 Zachary Taylor Whig
13 Millard Fillmore Whig
1 George Washington null

Adding the “order by” clause had the effect of showing us the rows ordered
alphabetically by the president_party column and then ordered by the
president_id column in lowest to highest order numerically. Democrats were
shown first because the string “Democrat” comes alphabetically before the
other political parties (“Democratic-Republican”, “Federalist”,
“Republican”, and “Whig”). Of the Democrat presidents, Andrew Jackson
appeared before Martin Van Buren because Jackson’s president_id value
was lower.
You can also specify whether you want to order in ascending or descending
order. That is, in low-to-high order or in high-to-low order.

select president_id,
president_name,
president_party
from us_president
order by president_party desc,
president_id asc;

president_id president_name president_party


1 George Washington null
9 William Harrison Whig
10 John Tyler Whig
12 Zachary Taylor Whig
13 Millard Fillmore Whig
16 Abraham Lincoln Republican
18 Ulysses Grant Republican
… … …
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
6 John Quincy Adams Democratic-Republican
7 Andrew Jackson Democrat
8 Martin Van Buren Democrat
11 James Polk Democrat
14 Franklin Pierce Democrat
15 James Buchanan Democrat
17 Andrew Johnson Democrat
… … …

The query returned all 46 rows (Again, I haven’t shown all the rows for
brevity). The data is displayed in descending (reverse) order for the
president_party column and then displayed in ascending (low-to-high) order
for the president_id column. If you don’t specify descending or ascending
(which can be abbreviated “desc” or “asc”), Postgres defaults to ascending
order and you will see your results in low-to-high order.

NULL VALUES
What’s going on with George Washington? Why is his political party listed as
null? Null is a special value in Postgres that represents an empty value.
George Washington was never a member of a political party so his
political_party value is set to null in our database table.

To show only the rows for presidents that have a null political_party, you
could run this query:

select president_id,
president_name,
president_party
from us_president
where president_party is null;

That SQL query would return results for only George.

president_id president_name president_party


1 George Washington null

On the other hand, if you wanted to exclude rows that have a null
president_party, you would run this query:

select president_id,
president_name,
president_party
from us_president
where president_party is not null;

All presidents except George Washington would be returned by this query


because we specified that we want to see rows where the president_party
column is not null.

Null is a special value in Postgres. Null is not the same as zero, and it is not
the same as an empty string or a space character. It is its own value, and
Postgres has special syntax to help you deal with it.

The word “syntax”, in software development speak, means the words and
symbols you can use that are part of the language. Some of the special syntax
Postgres has to handle null values includes “is null”, “is not null”, and
“coalesce”. (We will see “coalesce” in a moment).

FORMATTING SQL
So far, the SQL we have used has been in a nice readable format.

1 select president_id,
2 president_name,
3 president_party
4 from us_president;

The column names and the table name all line up together vertically. There is
only one column name on each of lines 1 - 3. It is a good idea to write your
SQL statements in a neat, maintainable format like this, but SQL will also
allow you to write SQL statements that look like this:

select president_id, president_name, president_party from


us_president;

or like this:

select president_id, president_name, president_party


from us_president;

As long as SQL can figure out what you intended, it will run your statement.
You can add extra spaces between commas if you want. You can write the
entire statement on one line or you can use many lines.

You can also use the asterisk wildcard character as a shortcut to tell SQL to
retrieve all columns.

select * from us_president;

would return the same results as:

select president_id, president_name, president_party from


us_president;
“select * from” means to select all columns from the table. Since
president_id, president_name, and president_party are all of the columns we
have in the us_president table, “select *” will produce the same results as
listing the column names individually. People will often say “select * from”
aloud as “select star from”.

ADDING COMMENTS TO SQL


Sometimes you will want to add comments to your SQL code to add
explanations that will help developers maintain the code in the future. That’s
a good idea, especially because the developer who has to maintain the code
in the future may be you.

To add single-line comments, use “--” (dash dash).

-- This SQL selects all US Presidents from the database


select * from us_president;

You can also use “--” to add a comment at the end of a line of SQL.

select president_id,
president_name,
president_party -- Political party, not birthday
from us_president;

To add comments that will be spread over more than one line, you can also
use “/*” at the beginning and “*/” at the end of comment.

/*
This query retrieves all the US Presidents.
There sure are a lot of them!
*/
select * from us_president;

A well-placed comment that explains a complex SQL statement will save


time for anybody who has to maintain your code in the future. But avoid
adding comments that are obvious or can be easily understood by simply
reading the code. In the example above, for instance, it isn’t worth the time to
comment that “This SQL selects all US Presidents from the database” when
that is obvious from the SQL on the next line.
CALLING POSTGRES FUNCTIONS
Postgres comes with several pre-written functions that you can call. We’ll
learn more about these functions in the “Built-In Functions” section a little
later, but let’s start by taking a look at the upper( ) function and the coalesce(
) function.

The upper( ) Function


One function Postgres comes with is the upper( ) function. The upper( )
function converts a string to all uppercase letters. You could call it from a
SQL statement like this:

select upper('Say it loud');

The string “Say it loud” is an argument to the function. In other words, the
string “Say it loud” is sent to the upper( ) function, and the upper( ) function
will use that value to do something. In this case, it returns the uppercase
version of it, which is “SAY IT LOUD”.

Note that Postgres allows you to use a “select” statement without the “from”
clause. Normally, you would use a select statement to select data from a
database table, but in this case, since we are passing a hardcoded string
(“Say it loud”) into the function, we do not need to specify which database
table to get the data from. We don’t need to say “from table_name” because
there is no table involved. We can just “select upper(‘Say it loud’)” without
the word “from”.

To call the upper( ) function and pass a value to it from the database, we can
use a SQL statement like this:

select upper(president_name)
from us_president
where president_id = 10;

The result is:

upper
JOHN TYLER
In our us_president table, president 10 is “John Tyler”. We got “John Tyler”
from the president_name column of the us_president table and then used it as
an argument to the upper( ) function, which used it to return the text “JOHN
TYLER”.

The coalesce( ) Function


Another example of a Postgres function is the coalesce( ) function. coalesce(
) helps us to deal with null values in our data. Sometimes when we select a
null value from a table, we want to display something other than null.

For example, when we selected from the us_president table, we saw that
George Washington’s political party was set to null.

select president_name,
president_party
from us_president;

president_name president_party
George Washington null
John Adams Federalist
Thomas Jefferson Democratic-Republican
… …

If we want to display “No Political Party” instead of null, we could use the
coalesce( ) function.

select president_name,
coalesce(president_party, 'No Political Party')
from us_president;

president_name president_party
George Washington No Political Party
John Adams Federalist
Thomas Jefferson Democratic-Republican
… …

The coalesce( ) function takes a list of values and returns the first value that
isn’t null. In this case, we sent the coalesce( ) function two values: the
president_party column from the us_president table, and the text “No
Political Party”. If the president_party column in the us_president table is not
null, it will be returned. But if the president_party column in the us_president
table is null, the text “No Political Party” will be returned. The effect is that
each president’s political party is shown, and for George Washington, null is
replaced by the text “No Political Party”.

Calling Aggregate Functions


An aggregate function is a function that can take multiple values in the
database and return a single value. Some examples of aggregate functions
are:

count( ) Returns a count of rows


max( ) Returns the maximum, or largest, of a set
min( ) Returns the minimum, or smallest, of a set
avg( ) Returns the average of a set of numbers
sum( ) Returns the sum of a set of numbers

Let’s take a look at how we might call the count( ) function.

select count(*)
from us_president;

This count( ) function will return the number of rows in the us_president
table, which is 46. Note that with the count( ) function you use an asterisk
between the parentheses, like this: count(*).

Now let’s look at calling the max( ) function.

select max(president_id)
from us_president;

This max( ) function returns the maximum president_id value in the table. The
row in this table with the highest president_id has a president_id of 46, so 46
is returned.

Here is how we can call the min( ) function.

select min(president_id)
from us_president;

This min( ) function returns the minimum president_id value in the table. The
lowest president_id in the table is 1, so 1 is returned.

Using “Group By” with Aggregate Functions


Let’s say we wanted to get a count of how many presidents there are of each
political party. We could use this SQL query.

select president_party,
count(*)
from us_president
group by president_party;

president_party count
null 1
Republican 19
Democratic-Republican 4
Federalist 1
Democrat 17
Whig 4

The query’s results show us that there was one president without a political
party, 19 Republicans, 4 Democratic-Republicans, 1 Federalist, 17
Democrats, and 4 Whigs.

Let’s look at that last line of SQL that says “group by president_party”. We
said that an aggregate function returns a single value based on multiple
values in the database. For example, there are 19 rows in the table with
Republican presidents and the count( ) function returns just one row for them
that shows the total of 19. The “group by” in the SQL statement tells Postgres
to group the results by the president_party column.

A common mistake is to use an aggregate function and forget to use a “group


by” clause. For example, if you were to run this SQL statement without the
words “group by”:

select president_party,
count(*)
from us_president;

Postgres would return this error:

ERROR: column "us_president.president_party" must appear in the


GROUP BY clause or be used in an aggregate function
LINE 1: select president_party, count(*) from us_president;
^
SQL state: 42803
Character: 8

To correct the problem, add the “group by” clause and tell SQL which
column to group by.

A Rule of Thumb for Using “Group By”


Here is a rule of thumb for figuring out which column or columns to group
by: The part of the query between the word “select” and the word “from” is
called the “select list”. It lists the things that we want to select from the
database. In this case there are two things: president_name and count(*).
Determine which ones are calls to aggregate functions and which are not, and
group by the ones that are not. In other words, we want to group by the non-
aggregates.

In this example, count(*) is a call to an aggregate function and


president_name is not, so we want to group by the president_name column.

Here is another example. Let’s say we want to list the highest president_id
value for each political party. We want to do something like this:

select president_party,
max(president_id)
from us_president;

But the query fails because we forgot the “group by” statement.

ERROR: column "us_president.president_party" must appear in the


GROUP BY clause or be used in an aggregate function
LINE 1: select president_party, max(president_id)
^
SQL state: 42803
Character: 8

To figure out what things should go in the “group by statement”, we look at


the select list. We are selecting two things: the president_party column and
max(president_id). Remember, we almost always want to group by all of the
non-aggregates. The max( ) function is an aggregate function so the only non-
aggregate being selected is the president_party column. If we group by the
president_party column our query works nicely.

select president_party,
max(president_id)
from us_president
group by president_party;

president_party max
null 1
Republican 45
Democratic-Republican 6
Federalist 2
Democrat 46
Whig 13

Here is a more complex example using a table containing world population


data:

select continent,
country,
sum(population),
sum(area)
from world_population;

This query will not run because we forgot the “group by” clause. But what
should be in the “group by” clause? Everything in the select clause that isn’t
an aggregate function.

select continent,
country,
sum(population),
sum(area)
from world_population
group by continent,
country;

continent country sum sum


Africa Algeria 35978000 2381741
Africa Angola 18992708 1246700
South America Uruguay 3356584 176215
South America Venezuela 28833845 716445

USING “HAVING” IN SQL


There is an interesting aspect of our president data that is exposed by this
query:

select president_name,
count(*)
from us_president

group by president_name;

I would expect this query to return a list of all the presidents with a count
of 1. After all, it’s not like there were two presidents named George
Washington. But when we run the query, we see one president comes back
with a count of 2.

president_name count
… …
Ronald Reagan 1
John Tyler 1
Grover Cleveland 2
Chester Arthur 1
George Washington 1
… …

What is going on here? If we run a query to look at the data in the table, we
can see an interesting wrinkle in the data.

select * from us_president;

president_id president_name president_party


… … …
20 James Garfield Republican
21 Chester Arthur Republican
22 Grover Cleveland Democrat
23 Benjamin Harrison Republican
24 Grover Cleveland Democrat
25 William McKinley Republican
26 Theodore Roosevelt Republican
… … …

It turns out that Grover Cleveland served two non-consecutive terms as


president. That makes him both the 22nd and 24th president. Our query that
selected a count of 2 actually worked correctly because Grover Cleveland
was a glutton for punishment!

When we ran this query, it returned 45 rows:

select president_name,
count(*)
from us_president
group by president_name;

president_name count
… …
Franklin Pierce 1
Ronald Reagan 1
John Tyler 1
Grover Cleveland 2
Chester Arthur 1
George Washington 1
… …

We had to look through all of those 45 resulting rows to see if there were any
counts that were more than 1. But there is an easier way. We can display only
counts that are higher than 1 by using the “having” statement.

select president_name,
count(*)
from us_president
group by president_name
having count(*) > 1;

Now only Grover Cleveland is shown in the results.

president_name count
Grover Cleveland 2

The “having” clause is a bit like the “where” clause, but it can be used after
a “group by” clause to operate on grouped rows.

DON’T REPEAT YOURSELF


“Don’t Repeat Yourself”, or DRY, is a principle of software development
that we should adhere to. One thing that isn’t ideal about the way the
us_president table was designed is that there is a lot of repeated data in the
table. For example, there are 19 Republicans in the table, and each one of
them has the text “Republican” repeated in the president_party column.

select president_id,
president_name,
president_party
from us_president
where president_party = 'Republican';

president_id president_name president_party


16 Abraham Lincoln Republican
18 Ulysses Grant Republican
19 Rutherford Hayes Republican
20 James Garfield Republican
21 Chester Arthur Republican
23 Benjamin Harrison Republican
25 William McKinley Republican
26 Theodore Roosevelt Republican
27 William Taft Republican
29 Warren Harding Republican
30 Calvin Coolidge Republican
31 Herbert Hoover Republican
34 Dwight Eisenhower Republican
37 Richard Nixon Republican
38 Gerald Ford Republican
40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
43 George W. Bush Republican
45 Donald Trump Republican

A better approach is to create a second table that contains just a list of


political parties and an identifier to represent each one.

select * from political_party;

party_id party_name
1 Republican
2 Democratic-Republican
3 Federalist
4 Democrat
5 Whig

We should also change the us_president table and replace the president_party
column with a party_id column that matches the party_id from the
political_party table.

select * from us_president;

president_id president_name party_id


1 George Washington null
2 John Adams 3
3 Thomas Jefferson 2
… … …
44 Barack Obama 4
45 Donald Trump 1
46 Joe Biden 4

Now the text “Republican”, “Democrat” and “Whig” isn’t repeated over and
over in the table, which is an improvement.

JOINING DATABASE TABLES


Because we added the political_party table, our president data isn’t all in
one table anymore. It is spread across two tables. So now, in order to get a
list of presidents and their political parties, we would need to join those two
tables. That is, we need to link the two tables on the party_id column of each
table and then select data from columns in both the tables.

1 select us_president.president_id,
2 us_president.president_name,
3 political_party.party_name
4 from us_president
5 inner join political_party
6 on us_president.party_id = political_party.party_id;

The results of the query are:

president_id president_name party_name


2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
6 John Quincy Adams Democratic-Republican
7 Andrew Jackson Democrat
8 Martin Van Buren Democrat
9 William Harrison Whig
… … …

We are selecting from two database tables: the us_president table in line 4 of
our SQL and the political_party table in line 5. Both tables have a party_id
column so we join the tables based on the party_id column of both tables in
line 6.

Postgres allows you to do different types of joins, like inner joins, outer
joins, and cross joins. These will be explained momentarily, but you can see
in line 5 that this query does an inner join.
Since we are now selecting from two tables in our query, every time we
reference a column name, we should specify which table the column comes
from. We can do that by using the table name followed by a period followed
by the column name. For example us_president.president_id refers to the
president_id column in the us_president table.

1 select us_president.president_id,
2 us_president.president_name,
3 political_party.party_name
4 from us_president
5 inner join political_party
6 on us_president.party_id = political_party.party_id;

In this query, the president_id and president_name columns come from the
us_president table, and the party_name column comes from the
political_party column.

TABLE ALIASING
One time-saving shortcut we can use when writing SQL is to make a short
alias for our table names. A table alias is a short, temporary, substitute name
for the table. We can save keystrokes by using an alias of “a” for the
us_president table, and “b” for political_party table.

1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a
5 inner join political_party b
6 on a.party_id = b.party_id;

This query returns the same information as the previous query, but it requires
us to do a lot less typing to create the query.

On line 4 we declare that “a” will be the alias for the us_president table. On
line 5 we declare that “b” will be the alias for the political_party table. We
can choose any letter or letters we want, but this example uses “a” and “b”.
Now we can type just “a” or “b” instead of the table names in lines 1, 2, 3,
and 6. The benefit of table aliasing, as you can see, is that it saves us quite a
bit of typing.

You may see queries that use a slight variation of this syntax, using the word
“as” to define table aliases.

1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president as a
5 inner join political_party as b
6 on a.party_id = b.party_id;

You can see on lines 4 and 5 the word “as” was used between the table
names and the alias names. It’s fine to use the “as” syntax, but since the
objective of using table aliases is to cut down on typing, personally I prefer
not to use the word “as”.

TYPES OF JOINS

Inner Joins
Inner joins are the most common type of join. In an inner join, there must be a
match in both tables in order for data to be retrieved from the database. Let’s
take another look at the inner join from the example above.

1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a
5 inner join political_party b
6 on a.party_id = b.party_id;

The query joins the us_president and political_party tables based on the
party_id columns of each of the tables.

We know there are 46 presidents in the us_president table, but strangely,


when we run this query only 45 rows are returned.
Why?

The reason is that the political_party table has one row for each of these
parties: Democrat, Democratic-Republican, Federalist, Republican, and
Whig. But there is one president in the us_president table that isn’t from any
of those parties. George Washington’s party in the us_president table is null.
So, when we joined the two tables, there was no match for George
Washington’s party_id and he did not appear in the results of the query.

That is the nature of an inner join. In an inner join, there must be a match in
both tables in order for the data to be retrieved.

Also, where the “inner join” is specified on line 5, the word “inner” is
optional. The default join type is inner. So this query:

1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a
5 inner join political_party b
6 on a.party_id = b.party_id;

Will produce the same results as this query that does not use the word
“inner” on line 5:

1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a
5 join political_party b
6 on a.party_id = b.party_id;

Let’s look at another example of an inner join using different tables. Let’s say
we have a customer table and an item_ordered table. The customer table and
the item_ordered table both have a customer_id column so we can join the
tables based on customer_id.

Here are the contents of the customer table:


customer_id customer_name
1 Penny Sato
2 Dixie Gooseman
3 Molly Terry

Here is what the item_ordered table looks like:

order_id customer_id item_ordered


90 1 Hat
91 2 Gloves

We can use this inner join to see which customer ordered which item.

select a.customer_name,
b.item_ordered
from customer a
join item_ordered b
on a.customer_id = b.customer_id;

Remember, the word “inner” is optional. This query does an inner join even
though the syntax says “join” instead of “inner join”. Inner is the default type
of join.

In this query we saved typing by aliasing the “customer” table as “a”, and the
“item_ordered” table as “b”.

Here are the results:

customer_name item_ordered
Penny Sato Hat
Dixie Gooseman Gloves

Since this was an inner join, results appeared only when there was a match
in both tables. For this query, that means only customers who ordered an item
will appear in the results. “Molly Terry” is in the customer table but she
doesn’t have an order in the item_ordered table so she doesn’t appear in the
results at all.
Outer Joins
Outer joins work a little differently. An outer join will display all rows from
one table, whether or not there is a matching row in the other table. If there
are matching rows in the other table, it will show the columns from that table
too.

1 select a.customer_name,
2 b.item_ordered
3 from customer a left outer join item_ordered b
4 on a.customer_id = b.customer_id;

customer_name item_ordered
Penny Sato Hat
Dixie Gooseman Gloves
Molly Terry null

Using an outer join, Molly Terry appears in the results of the query, but the
item_ordered column appears as null because there was no matching row in
the item_ordered table. In other words, SQL can’t show us which item she
ordered because she didn’t order an item, so it displays null.

The customer table is the left table because it is to the left of the “left outer
join” syntax on line 3. The item_ordered table is the right table because it is
to the right of the “left outer join” syntax.

Here is another outer join example using our US Presidents data.

1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a left outer join political_party b
5 on a.party_id = b.party_id;

president_id president_name party_name


… … …
9 William Harrison Whig
10 John Tyler Whig
1 George Washington null
“left outer join” returns all the rows from the left table (us_president) even if
there is no match in the right table (the political_party table).

Using this outer join, the query returns all 46 presidents. George Washington
appears in the results even though he has no matching political party in the
political_party table. This is different than the inner join we did on these
tables that returned only 45 rows.

The outer join worked as advertised. It showed all of the rows in the
us_president table and showed matching party_name columns from the
political_party table where there were some.

I like to think of the tables that are joined in an outer join as a mandatory
table and an optional table. In order for the president data to appear in the
results, it is mandatory that rows exist in the us_president table. But having
matching rows in the political_party table is optional. Rows will be returned
in our result set even if there isn’t a matching row in the political_party table.

To my way of thinking, us_president is the mandatory table and


political_party is the optional table. The “left outer join” syntax tells
Postgres to do an outer join and treat the table on the left as the mandatory
table.

The first table mentioned on line 4 is the us_president table. It is the left table
because it is to left of the “left outer join” syntax. The political_party table is
the right table because it is to the right of the “left outer join” syntax. Or
more simply stated, the us_president table is on the left in line 4 and the
political_party table is on the right.

This is how Postgres knows to select all rows from the us_president table
and only matching rows from the political_party table. “left outer join”
returns all the rows from the left table (us_president) even if there are no
matches in the right table (political_party).

Also note that the word “outer” is optional. I recommend that you use it for
clarity, but you will undoubtedly come across outer joins that use the syntax
“left join” instead of “left outer join”. The same results will be returned.
You can also do an outer join with the “right outer join” syntax rather than
“left outer join”. The “right outer join” syntax tells Postgres to do an outer
join and treat the table on the right side as the mandatory table.

These two outer joins will return the same results:

Left outer join:

1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a left outer join political_party b
5 on a.party_id = b.party_id;

Right outer join:

1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from political_party b right outer join us_president a
5 on a.party_id = b.party_id;

Notice that in the SQL statement with the right outer join, the order of the
tables has been switched. The political_party table is now the left table and
the us_president table is now on the right side.

Cross Joins
Cross joins are used less frequently than inner and outer joins, but there are
occasions when they come in handy.

Let’s say a chain of bookstores wants to send one copy of each new book to
each of their stores. They have a table called bookstore that looks like this:

store_id store_name
1 Mall Store
2 College Campus Bookstore
3 Suburban Store
4 Cellar Store
And they have a table called new_book that looks like this:

book_id book_name
100 Essential Postgres
200 How to Knit
300 Poetry for Dogs

To get a list of which books should go to which stores, we could do a query


that does a cross join between the two tables.

select b.store_name,
n.book_name
from bookstore b
cross join new_book n;

store_name book_name
Mall Store Essential Postgres
Mall Store How to Knit
Mall Store Poetry for Dogs
College Campus Bookstore Essential Postgres
College Campus Bookstore How to Knit
College Campus Bookstore Poetry for Dogs
Suburban Store Essential Postgres
Suburban Store How to Knit
Suburban Store Poetry for Dogs
Cellar Store Essential Postgres
Cellar Store How to Knit
Cellar Store Poetry for Dogs

A cross join joins every row of one table to every row of another table.
There are four rows in our bookstore table and three rows in our new_book
table. So it makes sense that 12 rows are returned by the cross join because 4
x 3 = 12.

Self Joins
A self join is a join where the same table is joined to itself.

Let’s take a look at the church_hierarchy table.


position_id position_desc supervisor_id
1 God null
2 Pope 1
3 Cardinal 2
4 Bishop 3
5 Priest 4

Each row represents one position in the church. The supervisor_id column
represents the position_id of the supervisor. For example, position_id 5 –
“Priest” – has a supervisor_id of 4. If we look at the row in the table that has
a position_id of 4, we see Bishop. So, we know that the Priest’s supervisor
is the Bishop.

In order to select an org chart from the database, we can join the
church_hierarchy table to itself.

select a.position_desc,
b.position_desc
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;

position_desc position_desc
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest

In the result set, the left column is a list of supervisors and the right column is
a list of their subordinates.

By the way, when I named the column “position_desc” in the table I used
“desc” to stand for “description”. That column is a text description of the
position.

The table alias “a” represents the church_hierarchy table, and the table alias
“b” also represents the church_hierarchy table, but we are treating them as
different tables. We joined on the position_id column of the “a” table to the
supervisor_id column of the “b” table.
COLUMN ALIASING
We have seen that we can create aliases for tables. Now let’s create aliases
for columns.

The query above gave us the information we needed for our org chart, but
both columns names appeared as “position_desc”.

position_desc position_desc
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest

These column headings could be confusing to our end users. Really, the left
column is a list of supervisors and the right column is a list of workers. Let’s
change those column headings to show “Supervisor” and “Worker” instead of
the column headings from the table, “position_desc” and “position_desc”.

select a.position_desc Supervisor,


b.position_desc Worker
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;

Now the query returns the same results, but the column headings make the
meaning of the data more clear.

Supervisor Worker
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest

Just as with table aliases, you may see queries that use the word “as” when
defining column aliases, like this:

select a.position_desc as Supervisor,


b.position_desc as Worker
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;

Either way is fine. You can use the word “as” or you can leave it out.

ALTERNATE JOIN SYNTAX

Using Parens
Postgres allows you to write SQL queries that accomplish the same results
using different syntax. We saw this query earlier.

1 select a.customer_name,
2 b.item_ordered
3 from customer a
4 join item_ordered b
5 on a.customer_id = b.customer_id;

You may come across queries where they use parens on line 5.

1 select a.customer_name,
2 b.item_ordered
3 from customer a
4 join item_ordered b
5 on (a.customer_id = b.customer_id);

That syntax produces the same results.

Joining with “Using” Instead of “On”


Also, because we are joining the tables on columns that are called
“customer_id” in both tables, we can use the following “using” syntax
instead of the “on” syntax used above.

select a.customer_name,
b.item_ordered
from customer a
join item_ordered b
using (customer_id);
Either approach is fine. You will come across queries that use both the “on”
and the “using” styles.

Old-School SQL
This inner join between the customer and item_ordered tables looked like
this.

select a.customer_name,
b.item_ordered
from customer a
join item_ordered b
on a.customer_id = b.customer_id;

But you may see this query rewritten like this:

select a.customer_name,
b.item_ordered
from customer a,
item_ordered b
where a.customer_id = b.customer_id;

This is an older style of SQL that is still supported by Postgres. This syntax
doesn’t include the word “join”; it just lists the table names separated by
commas. I recommend that you use the syntax shown in the first example, but
you may come across queries written using this older “ANSI-89” style
shown in the second example.

COMPLEX JOINS WITH MULTIPLE TABLES.


So far all of our SQL examples have selected from only two tables. But in
the real world, your SQL queries will get more complicated.

Let’s take a look at the tables for a group of music stores.


store
store_id store_name
1 Bill's Banjos
2 Mike's Music Emporium
3 Tennessee Instruments

inventory
store_id product_id quantity
1 11 5
2 22 9
2 11 2
3 33 12
3 11 4

product
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300

supplier
supplier_id supplier_name
100 Missing Teeth Mfg.
200 Peg's Board Guitar Co.
300 Scratchy Cat Fiddle Co.

Let’s join these four tables to get an inventory list of the products at all of our
music stores:

1 select a.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5 from store a
6 join inventory d on d.store_id = a.store_id
7 join product b on b.product_id = d.product_id
8 join supplier c on c.supplier_id = b.supplier_id
9 order by a.store_name;

Here are the results:


store_name product_name supplier_name quantity
Bill's Banjos Banjo Missing Teeth Mfg. 5
Mike's Music Banjo Missing Teeth Mfg. 2
Emporium
Mike's Music Guitar Peg's Board Guitar 9
Emporium Co.
Tennessee Banjo Missing Teeth Mfg. 4
Instruments
Tennessee Fiddle Scratchy Cat Fiddle 12
Instruments Co.

That query worked nicely. It was more complicated than our previous SQL
examples because the data we needed for this query was spread across four
tables. That required us to join across all four tables in order to produce our
inventory list.

In lines 5 – 8, we defined tables aliases for all four tables. We used “a” for
the store table, “b” for the product table, “c” for the supplier table, and “d”
for the inventory table. That saved us quite a bit of typing.

In lines 6 – 8, we used inner joins on the tables and specified which columns
the tables should be matched on. Recall that the “join” syntax is shorthand for
“inner join”.

In line 6, we said “join inventory d on d.store_id = a.store_id”. That defined


“d” as the table alias for the inventory table and joined the inventory table to
the store table using the store_id columns of both tables. In line 5, “a” had
already defined as the table alias for the store table, so when we joined
where “d.store_id = a.store_id”, Postgres knew to join the store and
inventory tables based on the store_id columns of both of those tables.

So far all of the table aliases we have seen have been “a”, “b”, “c”, or “d”,
but you can make up your own table aliases. When you are writing a complex
query, you can make your SQL easier to read by making the table aliases
more meaningful. For example, you could write the query like this:

1 select st.store_name,
2 p.product_name,
3 sp.supplier_name,
4 i.quantity
5 from store st
6 join inventory i on i.store_id = st.store_id
7 join product p on p.product_id = i.product_id
8 join supplier sp on sp.supplier_id = p.supplier_id
9 order by st.store_name;

Instead of using “a”, “b”, “c”, and “d”, this time we used table aliases that
are shorthand for the table names. For example, “p” for the product table and
“i” for the inventory table. That can make it easier for you to write the join
condition on say, line 7. It’s harder to lose sight of which tables you are
joining when the table alias gives you a hint like the first letter of the table.

Unfortunately, in this join, two of the tables start with the same letter: store
and supplier. Since there are two tables that start with the letter “s”, for those
tables we used two-letter table aliases of “st” for the store table, and “sp”
for the supplier table.

Also, when you write queries that are joining many tables, note that the order
of the joins matters. For example, let’s switch the order of lines 6 and 7 in
the query above.

1 select st.store_name,
2 p.product_name,
3 sp.supplier_name,
4 i.quantity
5 from store st
6 join product p on p.product_id = i.product_id
7 join inventory i on i.store_id = st.store_id
8 join supplier sp on sp.supplier_id = p.supplier_id
9 order by st.store_name;

When we run the query, Postgres returns this error:

ERROR: missing FROM-clause entry for table "i"


LINE 6: join product p on p.product_id = i.product_id
^
SQL state: 42P01
Character: 125
The reason is that line 6 makes reference to a table alias of “i” but “i” isn’t
defined until the “inventory i” part of line 7. We tried to use the table alias
“i” before it was defined. The fix is to change the order of lines 6 and 7 back
to their original positions.

COMPLEX SQL STATEMENTS WITH INNER AND


OUTER JOINS
Let’s say the “Tennessee Instruments” store discovered a new mandolin in
their stockroom. They added it to our database tables, but they don’t know
who the supplier is. The inventory table would now look like this:

store_id product_id quantity


1 11 5
2 22 9
2 11 2
3 33 12
3 11 4
3 44 1

The last row represents the new mandolin in store_id 3 (“Tennessee


Instruments”). The new product_id is 44 (“Mandolin”), and the quantity is 1.

The mandolin has also been added to the product table, but the supplier was
set to null because we don’t know the supplier.

product_id product_name supplier_id


11 Banjo 100
22 Guitar 200
33 Fiddle 300
44 Mandolin null

The trouble is that now our inventory list query isn’t showing the new
mandolin.

1 select a.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5 from store a
6 join inventory d on d.store_id = a.store_id
7 join product b on b.product_id = d.product_id
8 join supplier c on c.supplier_id = b.supplier_id
9 order by a.store_name;

store_name product_name supplier_name quantity


Bill's Banjos Banjo Missing Teeth Mfg. 5
Mike's Music Banjo Missing Teeth Mfg. 2
Emporium
Mike's Music Guitar Peg's Board Guitar 9
Emporium Co.
Tennessee Banjo Missing Teeth Mfg. 4
Instruments
Tennessee Fiddle Scratchy Cat Fiddle 12
Instruments Co.

These are the same results that we saw before. Why isn’t the mandolin
appearing?

You can see that this query uses all inner joins. Lines 6 – 8 use just the word
“join”, which is shorthand for an inner join. In order to display a product that
is in the product table but doesn’t have a supplier in the supplier table, we
need to do an outer join between the product and supplier tables. But we
want the other joins in the query to continue to be inner joins.

It turns out that all we need to do is add the word “left” to the query to
change the inner join to an outer join.

1 select a.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5 from store a
6 join inventory d on d.store_id = a.store_id
7 join product b on b.product_id = d.product_id
8 left join supplier c on c.supplier_id = b.supplier_id
9 order by a.store_name;

Recall that “left join” is shorthand for “left outer join”. So now we have a
query that does two inner joins (on lines 6 and 7) and one outer join (on line
8). Now the mandolin appears in our inventory list.
store_name product_name supplier_name quantity
Bill's Banjos Banjo Missing Teeth Mfg. 5
Mike's Music Banjo Missing Teeth Mfg. 2
Emporium
Mike's Music Guitar Peg's Board Guitar 9
Emporium Co.
Tennessee Banjo Missing Teeth Mfg. 4
Instruments
Tennessee Fiddle Scratchy Cat Fiddle 12
Instruments Co.
Tennessee Mandolin null 1
Instruments

STRING CONCATENATION
Let’s say we want to get a list of presidents in this format: “President Name
(Political Party)”. To do that, we can take the president_name column and
add a space, then and a left paren to it, then add the president_party value,
then add a right paren to it, like this:

select president_name || ' (' || president_party || ')'


from us_president;

The SQL will return:


George W. Bush (Republican)
Barack Obama (Democrat)
Donald Trump (Republican)
Joe Biden (Democrat)

Concatenation is a fancy word for joining things together. The string


concatenation operator is “||”. It is made by typing two vertical bars, or
“pipe” characters. The pipe character is usually on the right side of the
keyboard above the backslash character.

COMPARING VALUES IN SQL


So far, all of our joins have compared columns based on equality, like
“a.position_id = b.supervisor_id”. But we can also check for these
conditions:
>
Greater Than
>= Greater Than or Equal to
< Less Than
<= Less Than or Equal to
!= or <> Not Equal

And we can use these keywords in our queries:

in Checks that a value is in a list of values


between Checks that a value is in a range of values
like Checks that a string value is like some pattern
is null Checks that a value is null
not Combined with another statement, checks that the condition is not true

Greater Than
Using “>” we can list all the presidents with a president_id that is more than,
say, 40:

select *
from us_president
where president_id > 40;

president_id president_name president_party


41 George Herbert Walker Bush Republican
42 Bill Clinton Democrat
43 George W. Bush Republican
44 Barack Obama Democrat
45 Donald Trump Republican
46 Joe Biden Democrat

Notice that president 40, Ronald Reagan, doesn’t appear in our results. To
show Reagan as well, we can use “>=” instead of “>”.

Greater Than or Equal to


Using “>=” we can list all the presidents with a president_id that is greater
than 40 or equal to 40. Now Reagan appears with the others.

select *
from us_president
where president_id >= 40;

president_id president_name president_party


40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
42 Bill Clinton Democrat
43 George W. Bush Republican
44 Barack Obama Democrat
45 Donald Trump Republican
46 Joe Biden Democrat

Less Than
Using “<” we can list all the presidents with a president_id that is less than,
say, 5:

select *
from us_president
where president_id < 5;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican

Notice that president 5, James Monroe, doesn’t appear in our results. To


show Monroe as well, we can use “<=” instead of “<”.

Less Than or Equal to


Using “<=” we can list all the presidents with a president_id that is less than
5, or equal to 5. Now Monroe appears with the others.

select *
from us_president
where president_id <= 5;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican

Not Equal
There are two ways to represent “not equal” in SQL: “!=” and “<>”.

Here is a query using the != syntax:

select *
from product
where product_name != 'Mandolin'
and product_id != 11;

The “!” is the “not” operator. So “!=” means “not equal to”. This query
selects products where the product_name is not Mandolin and the product_id
is not 11 (Banjo).

Here is the same query using the “<>” syntax:

select *
from product
where product_name <> 'Mandolin'
and product_id <> 11;

I think of the “<>” syntax as saying, “show me everything that is less than (<)
11, and also show me everything that is greater than (>) 11”, which is really
another way to say, “not equal to 11”.

Both queries produce the same results: They both produce a list of all the
products except for Mandolin and Banjo, because Banjo is product_id 11.

product_id product_name supplier_id


22 Guitar 200
33 Fiddle 300

I personally like to use the “!=” syntax more than the “<>” syntax, but they do
the same thing. There is no advantage to using one over the other.
In
The “in” keyword checks that a value is in a list of values.

select *
from store
where store_id in (1,3);

store_id store_name
1 Bill's Banjos
3 Tennessee Instruments

Our query returns all rows from the store table where the store_id is a 1 or a
3. Another way to write this query is using the “or” keyword:

select *
from store
where store_id = 1
or store_id = 3;

The advantage to using the “in” syntax is that it’s less typing than using “or”,
and with the “in” clause, you can also select the list of values from the
database, like this:

select *
from store
where store_id in
(
select store_id
from south_east_store
);

The part of the SQL statement within the parens is called a subquery. We’ll
explore subqueries in more detail momentarily.

Between
The “between” keyword checks that a value is within a range of values.

select *
from product
where product_id between 11 and 33;
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300

Postgres returns a list of rows from the product table where the product_id is
between 11 and 33. Note that “between” returns rows where the product_id
equals 11 or equals 33, as well as any values that are between 11 and 33.

Like
The “like” keyword checks that a string value matches some pattern. In the
following query, we are returning values from the us_president table where
the values for president_name are like the pattern “George%”.

select president_name
from us_president
where president_name like 'George%';

president_name
George Washington
George Herbert Walker Bush
George W. Bush

This query returns the names of presidents where the president_name value
starts with the text “George”. The percent character represents any
characters. So, the pattern “George%” checks for “George” followed by any
characters at all.

The percent sign will also match no characters. If a president were named
just “George” with no last name, the “where president_name like
‘George%’” condition would return him as well.

The percent sign can be placed at any position in the string. In the query
below, the “%” is placed in the middle of the string, right between “George”
and “Bush”.

select president_name
from us_president
where president_name like 'George%Bush';
president_name
George Herbert Walker Bush
George W. Bush

When using “like”, you can also use the underscore character (“_”) to
represent one character. The following query searches for presidents with
names that have the pattern of “_onald%”. That is, any one character
followed by the text “onald” followed by anything at all.

select president_name
from us_president
where president_name like'_oNaLd%';

president_name
Ronald Reagan
Donald Trump

“Like” is case-sensitive. If we search for lower case letters, we won’t find


uppercase letters. The following query won’t match any rows:

select president_name
from us_president
where president_name like '_oNaLd%';

If you want to do a case-insensitive “like”, use the ilike keyword instead of


like. ilike will return president_name values regardless of whether the letters
are upper or lowercase.

select president_name
from us_president
where president_name ilike '_oNaLd%';

president_name
Ronald Reagan
Donald Trump

Is Null
“is null” checks if a value is null. Recall that null is a special value in
Postgres that represents an empty value.
select *
from product
where supplier_id is null;

product_id product_name supplier_id


44 Mandolin null

Postgres comes with special keywords to deal with null values. A common
mistake is to use the syntax “where supplier_id = null”. That will return
unexpected results. The correct syntax is “where supplier_id is null”.

Not
The word “not” can be used to negate other keywords. For example:

select *
from product
where supplier_id is not null;

select *
from product
where supplier_id not in (11, 22);

select president_name
from us_president
where president_name not like '_onald%';

select *
from product
where product_id not between 11 and 33;

“Not” has the effect of reversing the search criteria. “where supplier_id in
(11, 22)” returns only 11 and 22, but “where supplier_id not in (11, 22)”
returns everything except 11 and 22.

SUBQUERIES
A subquery is a query that is nested within another query.

select tornado_alert_message
from emergency_message
where exists
(
select *
from tornado_alert
where alert_flag = 'Y'
);

This SQL statement has an inner query and an outer query. The part of the
SQL statement that is the inner query is also called a subquery. In the SQL
statement below, I have added comments to show which part is the outer
query and which part is the inner query or subquery.

1 select tornado_alert_message -- outer query


2 from emergency_message -- outer query
3 where exists -- outer query
4(
5 select * -- inner query (subquery)
6 from tornado_alert -- inner query (subquery)
7 where alert_flag = 'Y' -- inner query (subquery)
8 );

The word “exists” on line 3 checks for the existence of rows in the subquery.
The inner query runs first, and checks the tornado_alert table to see if any
rows exist where the alert_flag column is set to “Y”. Let’s take a look at the
tornado_alert table.

select * from tornado_alert;

State alert_flag
Massachusetts N
North Carolina Y
Florida N

There is a row in the table with an alert_flag of Y. It is the row for North
Carolina. Since at least one row with a Y exists, the outer query then selects
the tornado_alert_message column from the emergency_message table and
returns it. The results are:

tornado_alert_message
There is a tornado alert
If there had been more than one row with an alert_flag of Y, the results would
have been the same. “Exists” checks that one or more rows exist in the
subquery.

If there had not been any rows returned by the subquery with an alert flag of
Y, nothing would have been returned by the SQL statement at all.

Correlated Subqueries
Let’s take another look at our music store data. Let’s get a list of all stores
that have more than 10 of an item in stock.

1 select store_name
2 from store s
3 where exists
4 (
5 select i.store_id
6 from inventory i
7 where i.store_id = s.store_id
8 and i.quantity > 10
9 );

In this SQL statement, we have an inner query and an outer query like in the
last example, but there is something different about this SQL statement. In
line 7, the store table from the outer query is joined with the inventory table
from the inner query. This is a correlated subquery because there is a
correlation between the inner and outer queries. In other words, the inner and
outer queries are joined.

Because the inner and outer queries are correlated, the query can show us not
only that there is a store with more than 10 of an item in stock, but it can
show us which store it is.

The result of this SQL statement is:

store_name
Tennessee Instruments

Subqueries that Return One Row


If we know for sure that the subquery will only return one row, we could
rewrite the SQL statement this way.

1 select store_name
2 from store s
3 where s.store_id =
4 (
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8 );

Instead of using the “exists” keyword, we are checking that the store_id in the
outer query matches the store_id in the inner query, on lines 3 and 5.

That SQL statement produces the same result.

store_name
Tennessee Instruments

The trouble is that if the data in the inventory table changes, and a second
store begins stocking more than 10 of an item, our query would fail.

Right now, if we were to run the just the inner query by itself, we would see
that it returns one row that represents store_id 3, “Tennessee Instruments”.

select i.store_id
from inventory i
where i.quantity > 10;

store_id
3

But now if another store, say “Bill’s Banjos”, got a new shipment of banjos,
the inner query would return two rows.

select i.store_id
from inventory i
where i.quantity > 10;
store_id
3
1

Now when we run our complete query, we get an error:

1 select store_name
2 from store s
3 where s.store_id =
4 (
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8 );

ERROR: more than one row returned by a subquery used as an


expression SQL state: 21000

That is because the equal sign on line 3 tells SQL to expect only one value to
be returned from the subquery. When the subquery returns more than one
value, Postgres returns an error.

This is a relatively common problem. To avoid it, we shouldn’t have used an


equal sign on line 3. Unless you are sure that there is no possibility of the
subquery returning more than one row, use “in” instead of “=”.

1 select store_name
2 from store s
3 where s.store_id in
4 (
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8 );

Now that we have used “in” on line 3, our results come back nicely:

store_name
Bill's Banjos
Tennessee Instruments
Now the outer query isn’t surprised when the inner query returns multiple
rows. The “in” keyword can handle a set of values while “=” expects just
one value.

DISTINCT
The “distinct” keyword removes duplicates from our result set. If we select
the president_party column from our us_president table, we get lots of
duplicates.

select president_party
from us_president;

president_party
null
Federalist
Democratic-Republican
Democratic-Republican
Democratic-Republican
Democratic-Republican
Democrat
Democrat
Whig
Whig

That query returns 45 values, one showing the party for each president. We
could use “distinct” if we want to see each party only once.

select distinct president_party


from us_president;

president_party
null
Republican
Democratic-Republican
Federalist
Democrat
Whig
Now the query returns only 6 rows. Selecting the distinct parties shows each
party value only once.

LIMIT
The “limit” keyword allows us to select a limited number of rows. If we are
interested in seeing what the us_president table looks like, but we don’t need
to see all 45 rows in it, we could limit the results to 5 rows.

select *
from us_president
limit 5;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
2
CREATING AND CHANGING
DATABASE TABLES

So far, we have been selecting from database tables that already exist. Now
let’s look at how to create tables in the first place.

To create a table in Postgres, we use the “create table” statement.

1 create table us_president (


2 president_id int,
3 president_name text,
4 president_party text
5 );

In line 1, we provide the name of the table we want to create, us_president in


this case. In lines 2 – 4, we provide the column names we want, and the data
type for each column.

We can make up our own names for tables and columns. The names must be
63 characters or fewer, and they need to start with a letter or an underscore.
For readability, I recommend keeping your table and column names under 30
characters, and starting table and column names with a letter. Underscores
are often used between words in table and column names, but dashes are not
allowed.

By convention, tables are given singular names, not plural names. For
example, we called this table us_president instead of us_presidents.
In lines 2 – 4, we defined three columns for the table: president_id,
president_name, and president_party. We specified a data type for each
column. president_id is an integer, president_name is text, and
president_party also has a text data type.

DATA TYPES
Postgres provides lots of data types you can use to create your columns.
Some data types you will use frequently and some you will use once in a
while or only when some special situation arises.

Commonly-Used Data Types


Postgres gives us a choice of many data types that we can use for our
columns. Here are the most commonly-used data types.

text A character string of any length


integer An integer (often abbreviated as “int”)
numeric A floating point number (also called “decimal”)
date A calendar date with year, month, and day
time without time zone The time of day without a time zone (also called “time”)
time with time zone The time of day with a time zone (also called “timetz”)
timestamp without time zone Date and time without a time zone (also called “timestamp”)
timestamp with time zone Date and time with a time zone (also called “timestamptz”)
boolean A true or false value (often abbreviated as “bool”)
character A fixed-length character string (often abbreviated as “char”)

Text
The text data type stores a string of characters. A string can be made up of
almost any characters, like letters, numbers, underscores, dashes,
ampersands, and lots more. Here the product_name column of the product
table is created with the text data type.

create table product


(
product_id int,
product_name text,
supplier_id int
);
When dealing with text value in Postgres, we use single quotes not double-
quotes.

-- Surrounding Banjo with single quotes works nicely


select *
from product
where product_name = 'Banjo';

product_id product_name supplier_id


11 Banjo 100

-- Using double-quotes produces an error. Don't do this.


select *
from product
where product_name = "Banjo";

ERROR: column "Banjo" does not exist


LINE 3: where product_name = "Banjo";
^
SQL state: 42703
Character: 47

If you are creating a table and you need a column for text data of unknown
length, use the text data type.

Integer
The integer data type is used for storing whole numbers. An example of a
whole number is 3. int would not be the right data type to store 3.1415,
because that is not a whole number. That is because 3.1415 has a fractional
part (the part after the decimal point).

To create an integer column, you can save keystrokes by abbreviating


“integer” as “int”.

create table product


(
product_id int,
product_name text,
supplier_id int
);
The int data type can store numbers between -2147483648 and
+2147483647. If you need to store larger numbers you can use the bigint data
type instead. If you are dealing with smaller numbers and you want to save
disk space, you can use the smallint data type.

Data Type Range of Values


int -2147483648 to +2147483647
smallint -32768 to +32767
bigint -9223372036854775808 to +9223372036854775807

The smallint and bigint data types are available if you need them, but most
times, if you need to store numbers that don’t have a decimal point, the int
data type is what you want.

Numeric
The “numeric” data type is used for storing numbers that have a fractional
amount.

For example, let’s take a look at a table called special_number. The


special_value column in the table has been defined using the numeric data
type.

In Postgres, the “numeric” data type and the “decimal” data type are the
same. You can create the column using either word.

create table special_number


(
name text,
special_value numeric
);

Is the same as:

create table special_number


(
name text,
special_value decimal
);
We might see values like these in the special_number table:

select * from special_number;


name special_value
Pi 3.14159
Golden Ratio 1.61803
Euros to Dollars 1.17
US National Debt 28523782444392.17

Numbers like these that have a decimal point are said to have a precision and
scale. Precision is the total number of digits. Scale is the number of digits
after the decimal point. For example, 3.14159 has a precision of 6 and a
scale of 5.

We can use the “numeric” data type without specifying a precision and scale,
like we did above, or we can define a precision and scale for our numeric
column. When we created the special_number table, we could have created
the special_value column with a precision of, say, 10 and a scale of 4.

create table special_number


(
name text,
special_value numeric(10,4)
);

Now the values appear with 4 digits to the right of the decimal point, and up
to 6 digits to the left of the decimal point. Since only 10 digits total are
allowed in the column, and 4 are to the right of the decimal point, that leaves
us with 6 digits to the left of the decimal point.

select * from special_number;


name special_value
Pi 3.14159
Golden Ratio 1.61803
Euros to Dollars 1.17

We could not add the US National Debt to this column anymore because it is
too large for our column definition. The number has more than 6 digits to the
left of the decimal point.
Date
To store dates in Postgres, we can use the “date” data type. For example,
here is a table called clock_change that has one column called
clock_change_date. The clock_change_date column was created with the
“date” data type, so we can store dates in that column.

create table clock_change


(
clock_change_date date
);

We might see dates like these in the table.

select *
from clock_change;

clock_change_date
2020-11-01
2021-03-14
2021-11-07
2022-03-13
2022-11-06

Postgres typically displays dates in YYYY-MM-DD format. That is, as a 4-


digit year, followed by a dash, a 2-digit month, another dash, and a 2-digit
day. We can use that same format to select a particular date from the table.

1 select *
2 from clock_change
3 where clock_change_date = '2021-11-07';

The “date” data type is handy when you need to store a date but when you
don’t need to store the time of day or a time zone. Notice that the date is
surrounded by single quotes on line 3.

Postgres provides an easy way to get today’s date called current_date. The
current_date function gives you today’s date as a “date” data type.
Strangely, the current_date function is one of a handful of Postgres functions
that you call without parens. The syntax to call the current_date function is
“select current_date” not “select current_date( )”.

select current_date;

current_date
2021-06-29

We could also use current_date to check for dates in our clock_change table
that have today’s date.

select *
from clock_change
where clock_change_date = current_date;

You can also get the current_date and then add or subtract from it. For
example, the following SQL statement displays all rows in the table with a
clock_change_date from the last ten days:

select *
from clock_change
where clock_change_date > current_date – 10;

We are checking that the clock_change_date column is newer than today’s


date minus 10 days. This uses, but doesn’t change the value of, current_date.
The next time you make use of current_date it will be set to the current date.

Time Without Time Zone


The “time without time zone” data type stores a time of day without a date or
time zone.

create table start_time


(
twotz time without time zone
);

I named this column twotz because it stands for “Time Without Time Zone”.
The following query returns the column twotz from the start_time table:
select twotz
from start_time;

The query might return this value.

twotz
15:55:25.832142

The time that was returned consists of several parts. 15 is military time for
3pm because 15 minus 12 is 3. 55 is the minutes. 25 is the seconds. 832142
is the milliseconds. So, this time represents 3:55 pm and 25 seconds and
832142 milliseconds.

Notice that the “time without time zone” data type doesn’t include a date or a
time zone.

When using the “time without time zone” data type, you can either spell out
“time without time zone” or shorten it to “time”. The default for “time” is
“without time zone”. These two tables would be the same:

create table start_time


(
twotz time without time zone
);
create table start_time
(
twotz time
);

Time With Time Zone


The “time with time zone” data type stores a time of day without a date, but
with a time zone.

select twtz
from start_time;

This query returns the twtz column from the start_time table.

twtz
15:55:25.832142-04:00
This returned the same time as the “time without time zone” data type
example above, but now there is a “-4:00” at the end of it.

The “-4:00” is an offset that represents the time zone. This offset tells us the
difference between the time zone of this time and the UTC (Universal Time
Coordinated) time zone. The time “15:55:25.832142-04:00” is still 3:55 pm
and 25 seconds and 832142 milliseconds, but the “-4:00” tells us that it is in
the EDT time zone. EDT is Eastern Daylight Time, which is a time zone used
in the eastern part of the United States. EDT is 4 hours behind UTC, so the
offset is negative 4.

When using the “time with time zone” data type, you can either spell out
“time with time zone” or shorten it to “timetz”. These two tables would be
the same:

create table start_time


(
twtz time with time zone
);

create table start_time


(
twtz timetz
);

Postgres provides an easy way to get the current time called current_time.
The current_time function gives you the current time as a “time with time
zone” data type.

select current_time;

14:34:19.274711-04:00

You could compare the times in our table to the current time using this query.

select *
from start_time
where twtz = current_time;
We are unlikely to have any rows in the table with a time that is exactly the
same as the current time including the milliseconds, but we can check for
times within the last hour using a query like this:

select *
from start_time
where twtz > (current_time - interval '1 hour');

The “interval” syntax allows us to do math on dates or times.

Timestamp Without Time Zone


The “timestamp without time zone” data type stores a date and time without a
time zone. The format of the field is “2021-06-29 14:54:56.710457”. That is
the date in YYYY-MM-DD format, followed by the time in hours, minutes,
seconds, and milliseconds.

The “timestamp without time zone” data type can also be referred to as
simply “timestamp”. These two tables would be the same:

create table start_time


(
tswotz timestamp without time zone
);

create table start_time


(
tswotz timestamp
);

Timestamp With Time Zone


The “timestamp with time zone” data type stores a date and time with a time
zone.

The “timestamp with time zone” data type can also be shortened to
“timestamptz”. These two tables would be the same:

create table start_time


(
tswtz timestamp with time zone
);
create table start_time
(
tswtz timestamptz
);

Postgres provides a function called now( ) that is an easy way for you to get
a current “timestamp with time zone” value.

select now();

now
2021-06-29 14:54:56.710457-04

You can also use the Postgres current_timestamp function, which does this
same thing as now( ).

select current_timestamp;

current_timestamp
2021-06-29 14:54:56.710457-04

You could also use now( ) or current_timestamp to check the table to see if
there are any values with the current timestamp, like this:

select *
from start_time
where tswtz = now();

or:

select *
from start_time
where tswtz = current_timestamp;

Boolean
The boolean data type, often abbreviated as “bool”, stores a value that
represents true or false. Let’s take a look at the presidential_hair table.

create table presidential_hair (


president_name text,
good_hair bool
);

You might see data like this in the table:

select * from presidential_hair;

president_name good_hair
Richard Nixon false
Martin Van Buren false
John F. Kennedy true

You may see queries that use a slightly different syntax for checking
booleans, for example:

select president_name from presidential_hair where good_hair is


true;
select president_name from presidential_hair where good_hair =
true;
select president_name from presidential_hair where good_hair = 't';

All of those statements do the same thing.

To check a boolean for a value of false, you can use the “not” syntax.

select president_name
from presidential_hair
where not good_hair;

president_name
Richard Nixon
Martin Van Buren

And again, you may come across queries that use a slightly different syntax to
do the same thing.

select president_name from presidential_hair where good_hair is


false;
select president_name from presidential_hair where good_hair =
false;
select president_name from presidential_hair where good_hair = 'f';

In some databases, you will see columns named with the suffix “_flag” that
have a boolean data type. Since a boolean is either true or false, columns
will be given names like “active_user_flag” because if the user is active,
setting the column to true can be compared to raising a flag.

create table application_user (


user_name text,
active_user_flag bool,
speaks_english_flag bool,
speaks_spanish_flag bool
);

That table could contain data like this.

User_name active_user_flag speaks_english_flag speaks_spanish_flag


Rick Silva true true false
Kendra true true true
Duncan
Jose false false true
Gonzalez

If you wanted a list of active users who speak Spanish, you could use this
query:

select *
from application_user
where active_user_flag
and speaks_spanish_flag;

User_name active_user_flag speaks_english_flag speaks_spanish_flag


Kendra true true true
Duncan

Character
The “character” data type allows us to store fixed-length character strings. It
is often abbreviated as “char”.

For example, the following table was created to store data about states in the
USA:

create table us_state (


state_code char(2),
state_name text
);

The state_code column is defined as a character data type with a length of 2.


This table might store data that looks like this:

select * from us_state;

state_code state_name
MA Massachusetts
NC North Carolina
HI Hawaii

Since we defined the state_code column as char(2), Postgres won’t allow us


to enter more than 2 characters in that column. On the other hand, the
state_name column was defined with the “text” data type because we don’t
know how many characters a state name will have.

Less Commonly-Used Data Types


Postgres is a robust database. Sometimes it feels like they have thought of
everything. Postgres provides so many data types that you can use to create
your columns that it’s hard to imagine using them all. Some of these less
commonly-used data types you will use once in a while, some you will use
only when some special situation arises, and some you will probably never
use.

Some of the data types - like point, circle, and polygon - are for geographical
data. There are data types for binary data, MAC addresses, host addresses,
and all manner of other things. You can find a complete description of all of
the Postgres data types on the Postgres website. You can even create your
own data types using the “create type” command if you ever need a really
unconventional data type.

Numbers

bigint A larger version of "int"


smallint A smaller version of "int"
money A currency amount
real A floating point number
double precision A larger floating point number (also called "double" or "float")

Formatted Data

json
Stores textual JSON (JavaScript Object Notation) data
jsonb Stores binary JSON data, decomposed
xml Stores XML (Extensible Markup Language) data

Serial

serial An autoincrementing integer


smallserial A smaller autoincrementing integer
bigserial A larger autoincrementing integer

Bits and Bytes

bit A fixed-length bit string


bit varying A variable-length bit string
bytea Binary data ("byte array")

Geometry-Related

box A rectangular box on a plane


circle A circle on a plane
line An infinite line on a plane
lseg A line segment on a plane
path A geometric path on a plane
point A geometric point on a plane
polygon A closed geometric path on a plane

Specialty Addresses

cidr An IPv4 or IPv6 network address


inet An IPv4 or IPv6 host address
macaddr A MAC (Media Access Control) address
macaddr8 A MAC (Media Access Control) address (EUI-64 format)

Miscellaneous

interval A time span


pg_lsn A Postgres log sequence number
tsquery A text search query
tsvector A text search document
txid_snapshot A user-level transaction ID snapshot
uuid A universally unique identifier
character varying A variable-length character string

CONSTRAINTS
When you create your own database tables, Postgres gives you a way to put
“constraints” on your columns. Constraints are rules about the data that can
be saved in your columns. Can there be two rows in the table with the same
value in this column? Are null values allowed for this column? Using
constraints, you can define rules about the data that is allowed, and Postgres
will enforce those rules for you.
Constraints help us with “data integrity”. That is, they help us to keep the
data in our database accurate and consistent.

Primary Keys
A “primary key” is a column, or more than one column, that uniquely
identifies the rows in a table. When you create a database table, one of the
most important questions to consider is “what should the primary key for this
table be?”

Single Column Primary Keys


In our us_president table, which column uniquely identifies each row?

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
… … …

The answer is the president_id column. That is because no two rows can
have the same president_id. There will never be two different rows with the
same president_id.

That can’t be said of the president_party column. We can see that there are
multiple rows that have a president_party of “Democratic-Republican”.

It can’t even be said of the president_name column. It is possible that there


could be two presidents with exactly the same name.

But we can always be sure that president_id 3 will refer to just one row in
the table. So, when we create the us_president table, we should make the
president_id column the primary key.

To designate the president_id column the primary key for the table, we can
use the “primary key” syntax:
1 create table us_president (
2 president_id int primary key,
3 president_name text,
4 president_party text
5 );

In line 2, we named the column “president_id”, we set its data type to


integer, and we defined the column as the primary key for this table by using
the syntax “primary key”.

Making president_id the primary key for this table does three good things for
us.

First, it prevents duplicate President IDs from being inserted into the table. If
a user of our database tries to add president_id 40 and there already is a
president_id 40 in the table, Postgres will stop them and give them an error
message.

Secondly, making president_id the primary key prevents users from adding a
null value to the president_id column. When we defined president_id as the
primary key, we designated it as a special column that cannot be null.

Those two things fall under the category of “data integrity”. They help to
keep the data in our database accurate and consistent. Once we define this
primary key, we can be assured that all rows in the table will have a unique
president_id, and that no president_id will be null. Postgres will enforce that
for us, and that will help to keep the data in our database of a high quality.

The third advantage to creating a primary key is that it causes Postgres to


create an index for us. An index will help to speed up queries that select
from this table. We’ll look at indexes more in a moment.

Multi-Column Primary Keys


Sometimes a primary key consists of more than one column. For example, in
the following table we are storing the daily high temperature by city.

city temp_date high_temp


Boston 2021-12-01 36
Boston 2021-12-02 31
Boston 2021-12-03 27
Raleigh 2021-12-01 54
Raleigh 2021-12-02 56
Raleigh 2021-12-03 49

For this table, the primary key should consist of the city and temp_date
columns. That is because there should be only one row with the same city
and date. For example, we have a row for Boston for 2021-12-01 with a high
temperature of 36. If we make the city and temp_date the primary key for this
table, Postgres will prevent users from adding a second row for Boston for
2021-12-01.

There would never be a reason to add a second row for Boston for 2021-12-
01, so if we make city and temp_date the primary key, Postgres will make
sure that doesn’t happen. To make city and temp_date the primary key for this
table, we can use the “primary key” syntax.

1 create table high_temperature (


2 city text,
3 temp_date date,
4 high_temp int
5 primary key (city, temp_date)
6 );

In line 5, we told Postgres that the primary key should be comprised of the
city and temp_date columns. When a primary key is made up of more than
one column this way, it is called a “composite” key.

Postgres doesn’t require you to define a primary key for the tables you
create, but I recommend that you do. Defining primary keys for your tables
sets Postgres up to help you with data integrity. It also helps you to
understand your data and the best way to define your tables. Most times, if
you can’t figure out what the primary key should be for a new table, that
means you need to rethink your table design. Except in rare cases, every table
should have a primary key.

Foreign Keys
A foreign key is a column, or more than one column, that has a relationship
with the primary key of another table. You can use a table’s foreign key to
join it to another table. A foreign key consists of the columns that are used to
match this table to another table’s primary key columns.

Let’s take another look at the tables for our music stores:

supplier
supplier_id supplier_name
100 Missing Teeth Mfg.
200 Peg's Board Guitar Co.
300 Scratchy Cat Fiddle Co.

product
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300

When we create the supplier table, we should make the primary key the
supplier_id column because we never want there to be two rows in the
supplier table with the same supplier_id.

create table supplier


(
supplier_id int primary key,
supplier_name text
);

The product table also has a supplier_id column. That column can be used to
link the product table to the supplier table. Product 11 – Banjo – in the
product table has a supplier_id of 100. We can go to the supplier table and
see that supplier_id 100 is “Missing Teeth Mfg” so we know that the
supplier of the banjo is Missing Teeth Mfg.

When we create the product table, we should define the supplier_id column
as a foreign key that references the supplier_id column of the supplier table.

1 create table product


2(
3 product_id int,
4 product_name text,
5 supplier_id int,
6 foreign key (supplier_id) references supplier (supplier_id)
7 );

In line 6, we defined the supplier_id column of the product table as a foreign


key. We also told Postgres that the supplier_id column of the product table is
related to the supplier_id column of the supplier table.

Now if a user tries to add a product to the product table with a supplier_id
of, say, 900 then Postgres will prevent them from adding the product and will
display an error message. Postgres will recognize that there is no supplier_id
900 in the supplier table, so adding that row to the product table would
violate the foreign key we set up when we defined the table.

This is known as “referential integrity”. By setting up this foreign key, we


can be assured that every supplier_id in the product table refers to a
supplier_id that exists in the supplier table.

Not Null Columns


Sometimes we want to create a database table with columns that should not
allow null values. To do that we can use the “not null” syntax.

1 create table us_president (


2 president_id int primary key,
3 president_name text not null,
4 president_party text
5 );

On line 3, we set the president_name column to not allow null values. It


doesn’t make sense to have a null value in the president_name column
because if we don’t know the president’s name, why would we add the row
in the table at all?

Now that we have defined the president_name column as “not null”, if a user
in our database tries to add a null value for president_name, Postgres will
stop them and display an error message.

By the way, the president_id column defined in line 2 will also not allow
null values because it has been defined as the table’s primary key. Recall that
primary keys columns cannot have a null value. Because it is the primary key,
we didn’t need to specify “not null” for the president_id column. Postgres
does that for us automatically.

Unique
Sometimes we want to prevent duplicate values in a column. We might want
to create a customer table, for example, that will not allow duplicate email
addresses.

1 create table customer (


2 customer_id int primary key,
3 customer_name text not null,
4 email_address text unique
5 );

On line 4 of our “create table” SQL statement, we used the word “unique”.
Because the table was created this way, Postgres will not allow two
customers in the table to have the same email address. Postgres will display
an error message if a user tries to create a customer with an email address
that is already in the table.

Check
We can use the “check” constraint to make sure that a column contains certain
values, or a certain range of values. We can define the high_temperature table
this way.

create table high_temperature (


city text,
temp_date date check (temp_date between '2020-01-01' and '2030-
01-01'),
high_temp int check (high_temp < 150)
);

If a user tries to add a row to the high_temperature table with a temp_date


before January 1, 2020 or after January 1, 2030, Postgres will prevent the
row from being added and will display an error message. If a user tries to
add a row with a temperature that is 150 or higher, Postgres will likewise
prevent the row from being added, and will display an error message.

INDEXES
Postgres lets us create indexes on our tables that will help to speed up the
SQL queries that use those tables. You can create an index using the “create
index” syntax.

create index product_supplier_index on product(supplier_id);

This SQL command creates an index named product_supplier_index on the


supplier_id column of the product table. The new index won’t change
anything about the way we use the table. It will just speed up access to it.

Again, you don’t need to create indexes for columns that have been defined
as primary keys. When you create a primary key for a table, Postgres
automatically indexes those columns for you.

DROPPING TABLES
The syntax to drop a table is “drop table”. Be careful with this command
because it will remove the table and all of the data in it.

drop table product;

ALTERING TABLES
The alter command lets us add columns to the table. We can also use it to
drop a column, change the data type of a column, rename a column, and more.

alter table customer add column county text;


alter table customer drop column zip;

alter table customer rename column city to city_or_town;

ADDING AND MODIFYING DATA


Once we have created our database tables, we will certainly want to insert
data, update that data, and delete data. Often, we will use SQL to add,
change, and delete data. The SQL commands to do that are insert, update,
delete, and truncate. We will see examples of those commands in a moment.

One way to execute SQL commands against a Postgres database is using a


graphical database management tool like PgAdmin. PgAdmin comes with a
query tool that allows us to enter SQL and run that SQL against a Postgres
database. We will take a closer look at PgAdmin momentarily.

There are other ways to insert, update, and delete data from a Postgres
database in addition to using PgAdmin. Many applications have a User
Interface (UI) that allows end users to enter and change data and to have
those changes affect a Postgres database. An application’s UI might be web-
based, or it might be part of a mobile application, or it may be part of a
desktop application. Often, an application’s user interface sends SQL
commands like insert, update, and delete to a Postgres database, where those
commands are executed.

Inserting Data
To use SQL to insert data into a table, use the “insert” command.

1 insert into us_president (president_id, president_name,


president_party)
2 values (33, 'Harry Truman', 'Democrat');

Here we are inserting a row into the us_president table. In line 1, we are
telling Postgres that we want to insert this data into the president_id,
president_name, and president_party columns. Then on line 2, we provide
the data to insert, in this case, 33, “Harry Truman”, and “Democrat”. “Harry
Truman” and “Democrat” have single quotes around them because they are
text fields. 33 is an integer so it does not have quotes around it. This SQL
statement will add a new row to the us_president table.
Notice that we listed three columns in line 1 and three values in line 2.

The Number of Columns Must Match the Number of Values


A common mistake is to provide more or fewer values than columns. For
example:

1 -- This is wrong. Too few columns for the data.


2 insert into us_president (president_id, president_name)
3 values (33, 'Harry Truman', 'Democrat');

Postgres looks for a column to insert the text “Democrat” in, but since we
removed the reference to the “president_party” column from line 2, it can’t
find one. We are trying to insert three values into two columns, so Postgres
returns this error message:

ERROR: INSERT has more expressions than target columns


LINE 2: values (33, 'Harry Truman', 'Democrat');
^
SQL state: 42601
Character: 85

Always List your Columns


One “gotcha” that has burned many a developer is caused by the fact that
Postgres doesn’t require us to list the columns in the table we are inserting
data into. Instead of specifying the president_id, president_name, and
president_party columns in our insert statement, we can leave those column
names out entirely, like this:

1 -- Don't do this.
2 insert into us_president
3 values (33, 'Harry Truman', 'Democrat');

Even though we aren’t specifying any columns to insert the data into on line
2, Postgres accepts this syntax and adds the row to the table. Postgres sees
that there are three columns in the table and since we provided three values,
it writes the values to those three columns in the table without displaying an
error.
The problem comes later when we decide to make a change to the
us_president table. Let’s say we add a new column to the table called
inauguration_date. Now when we run our insert statement, there are four
columns in the table and there are only three values being added. That causes
Postgres to reject the new row and display an error message.

It can be perplexing when an insert statement is part of an application that has


been running without a problem and now all of a sudden it is failing. It can
take some time to figure out that the insert statement hasn’t changed, but the
database table has.

For this reason, I recommend that you always list the column names that you
want to insert into.

1 -- Don't do this, even though it will work.


2 insert into us_president
3 values (33, 'Harry Truman', 'Democrat');

1 -- Do this instead.
2 insert into us_president (president_id, president_name,
president_party)
3 values (33, 'Harry Truman', 'Democrat');

Listing the column names on line 2 allows this insert statement to work even
if we add new columns to the table later.

Inserting Data from Another Table


We can also insert data from one table into another table.

1 insert into us_republican_president (


2 president_id,
3 president_name
4 )
5 select president_id,
6 president_name
7 from us_president
8 where president_party = 'Republican';
This SQL statement will insert all 19 Republican presidents from the
us_president table into the us_republican_president table. If we were to run
just lines 5 – 8 in isolation, it would display the Republican presidents from
the us_president table. Lines 1 – 4 take that data and insert it into the
us_republican_president table. Notice that the word “values” is not part of
this syntax as it was in the previous example.

Updating Data
To change data that already exists in a table, we can use the “update”
statement and specify which data we want to change in our “where” clause.
Here we are replacing any null values in the president_party column with the
text “None”.

update us_president
set president_party = 'None'
where president_party is null;

This changes George Washington’s political party from null to the text
“None”.

Now let’s change the row for Lyndon Johnson to include his middle name.

update us_president
set president_name = 'Lyndon Baines Johnson'
where president_name = 'Lyndon Johnson';

This updates the row for Lyndon Johnson and sets his president_name to
Lyndon Baines Johnson. We can see that the update worked by querying the
table:

select *
from us_president
where president_id = 36;

president_id president_name president_party


36 Lyndon Baines Johnson Democrat
One thing to be aware of is that the update statement will update as many
rows as meet the criteria in the “where” clause. In this case, there was only
one row in the table with a president_id of 36, so only that row was updated.

But let’s imagine that we forgot to include the where clause in our update
statement. Instead of running this SQL statement:

update us_president
set president_name = 'Lyndon Baines Johnson'
where president_name = 'Lyndon Johnson';

By mistake, we ran this one:

update us_president
set president_name = 'Lyndon Baines Johnson';

Now instead of updating one row in the table, we would update all the rows
in the table! Let’s look at the table now:

select * from us_president;

president_id president_name president_party


1 Lyndon Baines Johnson None
2 Lyndon Baines Johnson Federalist
3 Lyndon Baines Johnson Democratic-Republican
4 Lyndon Baines Johnson Democratic-Republican
5 Lyndon Baines Johnson Democratic-Republican
6 Lyndon Baines Johnson Democratic-Republican
8 Lyndon Baines Johnson Democrat
9 Lyndon Baines Johnson Whig
… … …

We have mistakenly changed every president’s name to Lyndon Baines


Johnson!

So be aware that update statements can update more than one row. It will
update as many rows as the where clause specifies. If there is no “where”
clause at all, all rows in the table will be updated.
Deleting Data
To delete data from database tables, we usually use the “delete” statement.
There is also a “truncate” statement that we can use if we want to quickly
remove all rows in a table.

The Delete Statement


Let’s say we wanted to remove the row for supplier 500 from the supplier
table. We can use the delete keyword.

delete from supplier


where supplier_id = 500;

If there were no supplier 500 in the table to begin with, the delete statement
would not display an error message. It would simply delete zero rows.

To delete all rows in a table, you can use the “delete” syntax without a
“where” clause.

delete from supplier;

This will delete all the rows in the table, but it will not drop the table. The
table will still exist, but there will be no rows of data in it. If you want to
remove the entire table, you can use the “drop table” statement instead.

Like the “insert” statement, “delete” will remove as many rows that match
the criteria in the where clause. “Delete” can also delete many rows in a
single SQL statement. This one delete statement will delete every Republican
in the table:

delete from us_president


where president_party = 'Republican';

The Truncate Statement


A faster way to delete all rows from a large table is to use the “truncate”
statement.

truncate supplier;
Truncate has the same effect as deleting all rows from a table with the
“delete” statement. It will delete all the rows in the table, but will not drop
the table. The table will still exist, but there will be no rows in it. The
difference is that “truncate” is faster because of the way Postgres implements
“truncate” behind the scenes.
3
MORE DATABASE OBJECTS

Tables are by far the most-used database objects, but there are others we will
use as well, like views, materialized views, sequences, functions, and
triggers.

You can name your objects using the same rules that we used for tables. Their
names must be 63 characters or fewer, and they need to start with a letter or an
underscore. I like to name my database objects with a prefix that indicates
what type of objects they are. For example:

dog A table containing data about dogs


v_dog A view containing data about dogs
mv_dog A materialized view containing data about dogs
f_dog( ) A function that processes data about dogs
tr_dog A trigger that processes data about dogs

These aren’t industry-standard naming conventions, just something I like to do.


If you are maintaining an existing codebase, you should continue using
whatever naming conventions were established when the database was
created.

VIEWS
A “view” in Postgres is a saved SQL statement that you can query like it’s a
table. To create a view, use the “create view” statement.

create view v_democrat_president as


select president_id,
president_name
from us_president
where president_party = 'Democrat';

Now that you have defined the v_democrat_president view, you can query it
as if it were a table.

select * from v_democrat_president;

president_id president_name
7 Andrew Jackson
8 Martin Van Buren
11 James Polk
14 Franklin Pierce
… …

You can also use a “where” clause when selecting from a view.

select *
from v_democrat_president
where president_id = 7;

president_id president_name
7 Andrew Jackson

Each time we select from a view, Postgres runs the query that we defined in
our “create view” statement. That query typically selects rows from a
database table. That means that if we make a change to the data in our
underlying database table, the change will also appear when we select from
the view. For example, when we selected from the v_democrat_president
view for president_id 7, Postgres ran the query that we defined when we
created the v_democrat_president view.

create view v_democrat_president as


select president_id,
president_name
from us_president
where president_party = ‘Democrat’;

That caused Postgres to return a set of Democrat presidents including


president_id 7. But now let’s see what happens if we change the political
party of president_id 7 to a Republican in the underlying us_president table.
update us_president -- This is the table that the view is based on
set president_party = 'Republican'
where president_id = 7;

Now if we run our select statement against the view again, we see that
president_id 7 is not returned. This query returns no rows:

select *
from v_democrat_president
where president_id = 7;

That makes sense because this view is supposed to return only Democrat
presidents. We can see that the changes to the underlying table took effect
when we queried the view.

Using Views to Hide Sensitive Data


Views can be also be handy for making sure sensitive data is seen only by the
right users. For example, let’s say we have a table called employee:

select * from employee;

employee_id employee_name title salary


1 Dave Nelson CEO $980,000.00
2 Rob Field Sales Manager $349,000.00
3 Cindy Rhodes Software Developer $574,000.00

We might want all users to be able to see the employee_id, employee_name,


and title columns, but we only want Human Resources to be able to see the
salaries.

We could create a view called v_employee that has all of the columns except
for salary.

create view v_employee as


select employee_id,
employee_name,
title
from employee;
When we query the new v_employee view, we see that it has been created
correctly and that it has the information we intended.

select * from v_employee;

employee_id employee_name title


1 Dave Nelson CEO
2 Rob Field Sales Manager
3 Cindy Rhodes Software Developer

Now we have a table called employee that includes salary data, and a view
called v_employee that does not include salary data. When data changes in the
employee table, the v_employee view will automatically reflect those
changes, so we’ll only have to make our changes in one place.

Now we can give users in the Human Resources department access to the
employee table, and give all other users access to the v_employee view.

MATERIALIZED VIEWS
A “materialized view” in Postgres is created by running a SQL statement, but
unlike a regular view, the results of the SQL statement are saved. To create a
materialized view, use the “create materialized view” syntax.

1 create materialized view mv_democrat_president as


2 select president_id,
3 president_name
4 from us_president
5 where president_party = 'Democrat';

The results of the query in lines 2 – 5 are saved in the mv_democrat_president


materialized view. You can query a materialized view as if it’s a table.

select *
from mv_democrat_president
where president_id = 7;

president_id president_name
7 Andrew Jackson
Now let’s see what happens if we change the political party of president_id 7
to “Republican” in the underlying us_president table.

update update us_president -- The underlying table


set president_party = 'Republican'
where president_id = 7;

Now if you run the select statement against the mv_democrat_president


materialized view again, you might be surprised to see that president_id 7 is
still being returned.

select *
from mv_democrat_president
where president_id = 7;

president_id president_name
7 Andrew Jackson

Even though we changed the data in the underlying us_president table, the data
in the materialized view did not get refreshed. The materialized view is still
reflecting the state of the us_president table before we changed it. Unlike a
regular view, we need to run the “refresh materialized view” SQL command
to refresh a materialized view.

refresh materialized view mv_democrat_president;

Now when we check the materialized view, president 7 does not appear. The
change we made to the underlying table is now reflected in the materialized
view.

select *
from mv_democrat_president
where president_id = 7;

Now that we have refreshed the materialized view, the query returns no rows,
as we would expect.

Materialized views are good for situations where you either want to save
point-in-time data or where the query that creates the materialized view is
complex and takes a long time to run. If the query for a view takes, say, 5
minutes to run then you may make the decision to use a materialized view
instead. You may decide to refresh a materialized view every hour rather than
using a regular view that will run the query every time somebody selects from
the view.

SEQUENCES
Sometimes we want a column in a table to contain consecutive numbers. In the
us_president table for example, the row for the first president has a
president_id of 1. The next row has a president_id of 2. Then 3, 4, 5, etc.

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
… … …

One way to accomplish that is to create a “sequence” that starts at 1 and


increments by 1 each time it is used.

create sequence president_sequence;

Each time a new row is added to the us_president table, we can use the
“president_sequence” sequence we just created to keep a running tally and
insert the right president_id into the table. Postgres provides a function called
nextval( ) to help us with that. Calling nextval( ) adds 1 to our sequence and
saves the new value.

Let’s assume that the us_president table is empty when we start. We can add
rows to it using the sequence and the nextval( ) function to create a value for
the president_id column.

insert into us_president (president_id, president_name,


president_party)
values (nextval(‘president_sequence’), ‘George Washington’, null);
insert into us_president (president_id, president_name,
president_party)
values (nextval(‘president_sequence’), ‘John Adams’, ‘Federalist’);

insert into us_president (president_id, president_name,


president_party)
values (nextval(‘president_sequence’), ‘Thomas Jefferson’,
‘Democratic-Republican’);

When we query the us_president table, we can see that the


“president_sequence” sequence inserted the correct values into the
president_id column in the table.

select * from us_president;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican

CREATING FUNCTIONS
We saw earlier how to call functions that have been provided to us by
Postgres, but now let’s take a look at creating our own functions.

Functions are helpful when you have a complex SQL statement, or a group of
SQL statements that have several steps, and you know that you will need to run
those same SQL statements again sometime in the future. It makes sense to
save that SQL as a function so you can call it by name later. This will prevent
you from having to recreate the SQL next time, and it will make it easy to run
the SQL the next time you need it. Functions are stored in the Postgres
database, so we can write the SQL once and simply call the function whenever
we need to.

Recall that Postgres comes with functions like the upper( ) function. We saw
that calling upper( ) with a value of “Say it loud” returned the text “SAY IT
LOUD”.

select upper('Say it loud');


upper
SAY IT LOUD

Notice that when you call the upper( ) function, there are left and right parens
after the word “upper”. You can pass values to the function by putting the
values between the parens. In this case, we wrapped the value “Say it loud” in
single quotes because it is text. If the value had been a number, we wouldn’t
have used quotes. You will call the functions that you write the same way that
you call functions provided by Postgres.

Parameters and Arguments


Many developers use the words “parameters” and “arguments”
interchangeably. Technically, there is a slight difference between the two
terms. Arguments are the values that are sent to functions, and parameters are
the variables that receive the values in those functions. The main point,
however, is that we can call functions and send values to them, and we can
write functions that receive those values and do something with them.

In our “select upper(‘Say it loud’)” example, the text “Say it loud” was the
only argument passed to the upper( ) function. The upper( ) function was
written to accept one value and to do something with it. In this case, the
function takes the value sent to it and returns the uppercase version of it.

Functions can be written to accept no parameters at all, or they can be written


to take any number of parameters – even thousands of parameters. Typically,
you wouldn’t write a function that accepts more than 5 or 10 parameters.

SQL vs. PL/pgSQL


We can write simple Postgres functions using the SQL language. For more
complex functions, we can use the PL/pgSQL language. PL/pgSQL stands for
“Procedural Language/PostgreSQL”.

We have been using plain SQL up until now and it has done everything we
needed it to do, but there are times when we need more functionality than the
SQL language can offer. SQL is a non-procedural language. It can query
database tables nicely, but you can’t define variables or create loops with it.
You can’t add if/then logic to your programs with plain SQL. If you need to
write a function using a procedural language to do those types of more
advanced things, use PL/pgSQL.

Writing Functions with SQL


Let’s use SQL to write our own function called f_get_president_and_party ( ).
The function will take a president_id value and return the president’s name
and political party wrapped in parens.

1 create or replace function f_get_president_and_party(int) returns


text as $$
2 select president_name || ' (' || president_party || ')'
3 from us_president
4 where president_id = $1;
5 $$ language sql;

On line 1, we create the function. We could drop and then recreate the function
in two steps, but the “create or replace function” syntax will do that for us in
one step. If the function doesn’t exist, it will be created. If the function already
exists, it will be recreated.

We named the function f_get_president_and_party( ) and we said it takes one


parameter that is an integer. This function will accept the argument being
passed into it and return a text value.

The body of the function goes within the “$$” symbols (two dollar signs).
Surrounding text with $$ in Postgres is known as “dollar quoting”. The
function’s body goes from the $$ at the end of line 1 until the $$ at the
beginning of line 5. Also on line 5, we are specifying that this function has
been written in SQL, not PL/PgSQL.

Lines 2 – 4 are the body of the function. They contain a SQL query that selects
the president’s name, concatenates it to a space and a left paren, adds the
president’s political party, and concatenates a right paren. The effect is to
return the president’s name followed by his political party in parens. Recall
that two pipe characters (“||”) is the concatenation operator; it appends strings
together.
On line 4, “$1” represents the first argument passed into this function. In this
case there is only one argument passed in, but if there were others, we could
have referenced them as $2, $3, $4 for the second argument, the third
argument, the fourth argument, etc.

We can call the function like this:

select f_get_president_and_party(7);

and it will return:

Andrew Jackson (Republican)

The function queried the us_president table for president_id 7 and returned the
president_name and president_party columns, nicely formatted. We could also
call this function embedded in a SQL query like this:

1 select f_get_president_and_party(president_id)
2 from us_president
3 where president_id > 40;

George Herbert Walker Bush (Republican)


Bill Clinton (Democrat)
George W. Bush (Republican)
Barack Obama (Democrat)
Donald Trump (Republican)
Joe Biden (Democrat)

In line 1, we got the president_id from the us_president table and sent it as an
argument to the f_get_president_and_party( ) function. The function returned a
president name and the political party shown within parens. On line 3, in our
“where” clause, we asked for only presidents that have a president_id greater
than 40, so the function retrieved 5 presidents from the us_president table. The
f_get_president_and_party( ) function was called 5 times, with 5 different
president IDs, returning 5 rows.

Writing Functions with PL/PgSQL


Writing Postgres functions using PL/PgSQL is similar to writing them with
SQL, but PL/PgSQL has more capabilities.
Here is an example of a PL/PgSQL function that returns not only the
president’s ID, name, and party, but if that president is currently in office it
also displays the text “**Current President**”.

1 create or replace function


f_get_president_note_current(p_president_id int)
2 returns text as $$
3 declare
4 v_max_president_id int;
5 v_president_name text;
6 v_president_party text;
7 v_current_text text = ";
8 begin
9 select max(president_id)
10 into v_max_president_id
11 from us_president;
12
13 if (v_max_president_id = p_president_id) then
14 v_current_text = '**Current President**';
15 end if;
16
17 select president_name,
18 president_party
19 into v_president_name,
20 v_president_party
21 from us_president
22 where president_id = p_president_id;
23
24 return 'President ID: ' || p_president_id || ' Name: ' ||
v_president_name ||
25 ' Political Party: ' || v_president_party || ' ' ||
v_current_text;
26
27 end;
28 $$ language plpgsql;

In line 1, we defined the f_get_president_note_current( ) function using the


“create or replace” syntax. We are telling Postgres that if this function doesn’t
exist to create it, and if it does exist to replace it with this new code.

In line 1, we also defined the function to take one parameter called


p_president_id. I like to name parameters starting with “p_” for clarity, but
this isn’t required.
In line 2, we said that this function returns text. Then the function’s body goes
from the “$$” symbol at the end of line 2 until the other “$$” on line 28.

In lines 3 – 7, we are declaring variables to be used in the function. I like to


name variables starting with “v_” for clarity, but again, this isn’t required. On
line 7, we declared the v_current_text variable and we also set its value to an
empty string.

The “begin” on line 8 goes with the “end” on line 27. This groups those lines
together in a code block. Notice that the “end” in line 27 has a semi-colon
after it, but the “begin” on line 8 does not.

In lines 9 – 11, we are putting a value in the v_max_president_id variable


from the database. We select the maximum president_id from the us_president
table, which is 46, and put that value into the v_max_president_id variable.

On line 13, we have an “if statement”. The “if” on line 13 goes with the “end-
if” on line 15. If the value of the v_max_president_id variable is the same as
the value of the p_president_id parameter then we set the v_current_text
variable to the text “**Current President**”. If the value of the
v_max_president_id variable is not the same as the value of the
p_president_id parameter then we don’t set v_max_president_id, so it remains
equal to the empty string we set it to in line 7. Notice that the “end-if” on line
15 has a semi-colon after it, but the “if” on line 13 does not.

Lines 17 – 22, we get the president_name and president_party from the


Postgres database and write them to our v_president_name and
v_president_party variables.

In line 2, we promised that this function “returns text”. In line 24, we actually
return that text. We return a text value that is created by combining the
p_president_id parameter and the v_president_name, v_president_party, and
v_current_text variables formatted with punctuation. Line 24 is a long line so
we continued it on line 25. Postgres knows we are done with the line when it
sees the semi-colon at the end of line 25.
Let’s test the function by calling it with an argument of 33 and then again with
an argument of 46.

select f_get_president_note_current(33);

President ID: 33 Name: Harry Truman Political Party: Democrat

select f_get_president_note_current(45);

President ID: 46 Name: Joe Biden Political Party: Democrat **Current


President**

We know the function is working because the data appears in the format we
expected and the current president shows the text “**Current President**”.

Return Types
Functions can return a value of any Postgres data type. The example we just
saw returned a text value, but functions can also return other data types like
int, date, or boolean.

Functions can also return nothing. To define a function that doesn’t return a
value, declare the function to return “void”.

create or replace function f_no_return_type(p_president_id int)


returns void as $$
begin
insert into some_table (president_id)
select p_president_id;
end;
$$ language plpgsql;

Dollar Quoting with Tags


We have seen how dollar quoting is used when creating Postgres functions.
The examples we have seen show how we can use the “$$” symbol to mark
the beginning and the end of the function body. You are also allowed to add a
“tag” between the dollar signs. A tag is a small piece of text. You may see
functions that use text between the dollar signs, for example the following
function uses the text “funct” between the dollar signs on lines 2 and 7.

1 create or replace function f_no_return_type(p_president_id int)


2 returns void as $funct$
3 begin
4 insert into some_table (president_id)
5 select p_president_id;
6 end;
7 $funct$ language plpgsql;

Some developers like to use tags because they feel that it makes the code more
readable. If you plan to use a tag, the text within the dollar signs must be the
same on lines 2 and 7.

Parameters by Position vs Name


When we write a function, we can specify the parameters that the function will
accept, like this:

create or replace function f_get_president(


p_president_id int,
p_party text) ...

This function takes two parameters: an integer called p_president_id and a


text field called p_party. When we call the f_get_president( ) function, we can
pass arguments to it by putting the arguments between commas and within
parens.

select f_get_president(33, 'Republican');

Postgres knows that we want to send the value 33 into the p_president_id
parameter and the text “Republican” into the p_party parameter. It knows that
because of the order of the parameters and arguments. When we called the
function, we provided 33 first and when we created the function, we accepted
p_president_id first. So Postgres knows that we want 33 to be the
p_president_id value. This is called “positional notation”.

Alternatively, we can pass arguments to a function using “named notation”.


When we call the function, we can provide the name of the parameters with
the values that we want them to have.

select f_get_president(
p_president_id => 33,
p_party => 'Republican'
);

The equal sign followed by the greater than sign (“=>”) is intended to look
like an arrow, as if to say p_president_id points to 33 and p_party points to
“Republican”.

Using this named notation, we could switch the order that we provide the
arguments in, and the function would still work. Because we are specifying the
name of the parameter for each value, calling the function with the arguments
in a different order will work just as long as the correct parameter name is
still associated with the correct value.

select f_get_president(
p_party => 'Republican',
p_president_id => 33
);

Named notation is especially helpful when a function takes many arguments.


Using named notation helps to make it easier to see which parameter takes
which value.

Parameter Modes: In, Out, Inout, and Variadic


A function parameter can have one of four possible modes: in, out, inout, and
variadic. The default mode is “in”. All of the parameters we have seen so far
have been “in” parameters. Since we didn’t specify “in”, “out”, “inout”, or
“variadic” mode, Postgres assumed they were “in”.

We can use an “in” parameter to pass a value into the function. We can use an
“out” parameter to return a value from the function. An “inout” parameter is
used when we want to pass an argument into a function, change it, and then
return it. To send an array of values to a function, we can use “variadic”
mode.

For example, here is a function called f_multiply_by_5( ) that multiplies a


number by 5.

1 create or replace function f_multiply_by_5(


2 in p_in int,
3 out p_out int
4 ) as $$
5 begin
6 select p_in * 5
7 into ;p_out;
8 end;
9 $$ language plpgsql;

select f_multiply_by_5(3);

f_multiply_by_5
15

In line 2, we defined p_in as an “in” parameter. In line 3, we defined p_out as


an “out” parameter. Both parameters are integers.

On line 4, we didn’t need to say “returns int” because we defined p_out as an


integer with a mode of “out” in line 3. Postgres already knows to return an
integer from this function.

In lines 6 and 7, we multiplied the p_in parameter by 5 and put the resulting
value into p_out. The p_out parameter gets returned from the function because
we declared its mode as “out” in line 3.

We could rewrite that function using just one “inout” parameter, like this:

1 create or replace function f_multiply_by_5(


2 inout p_inout int
4 ) as $$
5 begin
6 select p_inout * 5
7 into p_inout;
8 end;
9 $$ language plpgsql;

select f_multiply_by_5(3);

f_multiply_by_5
15
Now the function accepts one parameter called p_inout on line 2. The value of
p_inout gets changed on lines 6 and 7 and is automatically returned to the
caller of the function because we declared its mode as “inout” on line 2.

Variadic mode is used less commonly than the in, out, and inout modes.
Variadic mode lets us pass an array into a function. An “array” is a group of
values that all have the same data type. Let’s write a function that accepts an
array of integers and multiplies them together.

1 create or replace function f_multiply_together(variadic p_nums


int[])
2 returns int as $$
3 declare
4 v_tot int = 1;
5 v_ind int;
6 begin
7 foreach v_ind in array p_nums loop
8 v_tot = v_tot * v_ind;
9 end loop;
10 return v_tot;
11 end;
12 $$ language plpgsql;

select f_multiply_together(3,5,2);

f_multiply_together
30

In line 1, we defined the f_multiply_together( ) function and declared one


parameter called “p_nums”, and set its mode to “variadic”. The “int[]” syntax
means that this parameter is an array of integers.

In line 2, we promised that this function will return an integer, which in this
case will be all the parameter values multiplied together.

In lines 4 and 5, we declared two variables. We will use the “v_tot” variable
to calculate the product of the parameter values. We will use the “v_ind”
variable for each individual value.
In lines 7 - 9, we are looping through each value in p_nums. The v_ind
variable is assigned to each value, one value at a time. In line 8, we take the
value of our v_tot variable and multiply it by the value in the v_ind variable.

When we call the function with arguments of 3, 5, and 2, the function loops
through lines 7 – 9 three times. The first time v_ind is 3. The second time
v_ind is 5. The third time v_ind is 2.

The v_tot variable started out as 1 because we defined it as 1 in line 4. The


first time we looped through lines 7 – 9, v_tot became 3 because 1 times 3 is
3. The second time we looped through, v_tot became 15 because 3 times 5 is
15. The third time we looped through, v_tot became 30 because 15 times 2 is
30. When the function was finished looping through the parameter values, it
returned the value of 30 in line 10.

We passed three values into the function, but because the function was written
to accept an array of integers, it can accept any number of integer values.

select f_multiply_together(1,2,3,4,5);

f_multiply_together
120

select f_multiply_together(12,4,9,8,1,333,1,87,5,2);

f_multiply_together
1001237760

PROCEDURES
Procedures, sometimes called “stored procedures”, are similar to functions.
Procedures were added to Postgres fairly recently, in version 11. In the real
world, I see functions being used more frequently than procedures in Postgres.

We saw that you can call a function by using the “select” keyword, like this:

select f_multiply_by_5(3);

To call a procedure, use the “call” keyword instead:


call proc_multiply_by_5(3);

Writing a procedure is similar to writing a function.

1 create or replace procedure proc_multiply_by_5(


2 inout p_inout int
3 ) as $$
4 begin
5 select p_inout * 5
6 into p_inout;
7 end;
8 $$ language plpgsql;

In line 1, we created the procedure using the “create or replace procedure”


statement.

In line 2, we defined an integer parameter called p_inout that has a mode of


“inout”. Parameters for procedures can have one of three modes: in, inout, or
variadic. There is no “out” mode for procedures.

TRIGGERS
You can use a trigger to automatically run a Postgres function when some event
occurs. For example, when a new row gets inserted into a particular table, you
can have a trigger “fire” and call a function to insert a row in another table.

Anytime you think “wouldn’t it be nice if when X happened in the database


that Y automatically happened too”, think of triggers.

Audit Triggers
One common use of triggers is for keeping track of which users have made
changes to a table. In cases where we have several different database users
making changes to data in an important table, it can be useful to create a
second table where we can log which user made which changes.

When new rows get inserted into the us_president table, let’s track which user
inserted them. To do that, we will create three new database objects:
A new table called us_president_insert_audit that will store an audit
trail of the inserts made to the us_president table.
A new trigger called tr_us_president_insert_audit that will fire when a
new row is inserted into the us_president table. This trigger will call a
function.
A new function called tf_us_president_insert_audit( ) that the trigger
will call to insert the audit information into the
us_president_insert_audit table.

When a user inserts a row for a new president in the us_president table, the
trigger will automatically fire and call the function. The function will write a
row to the us_president_insert_audit table. The audit table will store the name
of the user who inserted the row into the us_president table as well as what
time that row was inserted.

Creating the Audit Table


1 create table us_president_insert_audit (
2 insert_user text,
3 insert_date timestamp with time zone,
4 president_id int,
5 president_name text,
6 president_party text
7 );

In line 2, we are creating a column in the audit table called “insert_user” to


track the user who inserted the row into the us_president table.

In line 3, we create a column called “insert_date” to track the date that the
new row was inserted into the us_president table.

In lines 4 – 6, we created columns in the audit table that will track the
president_id, president_name, and president_party columns that were inserted
into the us_president table. The same values for the columns that the user
inserted into the us_president table will be written to these columns in the
us_president_insert_audit table.

Creating the Trigger Function


Now we need to create a function that will insert a row into the
us_president_insert_audit table when a row is inserted into the us_president
table. This type of function is called a “trigger function”. A trigger function is
a function that doesn’t take any parameters and returns “trigger”. Let’s name
the trigger function tf_us_president_insert_audit( ).

1 create or replace function tf_us_president_insert_audit( )


2 returns trigger as $$
3 begin
4 insert into us_president_insert_audit (
5 insert_user,
6 insert_date,
7 president_id,
8 president_name,
9 president_party
10 )
11 values
12 (
13 current_user,
14 now(),
15 new.president_id,
16 new.president_name,
17 new.president_party
18 );
19 return new;
20 end;
21 $$ language plpgsql;

In line 1, I named the function starting with the text “tf_” for “trigger function”,
but this isn’t necessary. Postgres will allow you to pick your own naming
conventions. You can see there are no parameters defined between the parens.
For trigger functions, you don’t define any parameters.

You can see in line 2 that the return type is “trigger”.

In line 13, we got the user’s username by calling the “current_user” Postgres
function. You can test the current_user function by selecting your own
username from the database, like this:

select current_user;
current_user
Rick

In line 14, we got the date and time that the row was inserted by using the
Postgres now( ) function. By the way, you can test this function by selecting the
current date and time where you are.

select now();

now
2021-06-23 10:29:44.216481-04

We use the “new” syntax in lines 15 – 17, for example “new.president_id”.


The “new” tells us that this is the new president_id value that was inserted
into the us_president table.

Creating the Trigger


Now let’s create the trigger that will fire and call our
tf_us_president_insert_audit( ) function.

1 create trigger tr_us_president_insert_audit


2 after insert on us_president
3 for each row
4 execute function tf_us_president_insert_audit();

In line 1, I named the trigger tr_us_president_insert_audit, where “tr” stands


for “trigger”.

In line 2, we specified that the trigger should fire “after insert”. That is, after
the insert has been made to the us_president table. When a user inserts a row
into the us_president table, the row will be written to us_president first and
then the audit record will be written to the us_president_insert_audit table
afterwards. Our options are “before”, “after”, or “instead of”. “Instead of”
means don’t insert the record to the us_president table at all; instead run the
function.

On line 3, we said “for each row”. That means for every row we insert into
the us_president table, the tf_us_president_insert_audit( ) function will be
called once. Our other option is “for each statement”. Consider this insert
statement:

insert into us_president (


president_id,
president_name,
president_party
)
select president_id,
president_name,
president_party
from a_table_with_10_rows;

This one insert statement is inserting 10 rows into the us_president table. If
our trigger specified “for each row”, it would call the
tf_us_president_insert_audit ( ) function 10 times. On the other hand, if we
specified “for each statement” in the trigger, it would call the
tf_us_president_insert_audit( ) function just once. Depending upon what you
are trying to accomplish, there are times that you might want to choose “for
each statement” or “for each row”.

Watching it Work
Now that the trigger is in place, any new rows that get inserted to the
us_president table will get audited to the us_president_insert_audit table. For
example, a user might add a new row to the us_president table using this insert
statement:

insert into us_president (


president_id,
president_name,
president_party
)
values
(
47,
'Hillary Clinton',
'Democrat'
);
We might see this row in the us_president table and wonder who added that
data. Because we have set up an audit trigger, we can query the
us_president_insert_audit table to find out.

insert_user insert_date president_id president_name president_party


HRC 2021-01-21 47 Hillary Democrat
14:39:44.184637- Clinton
04

The audit trail shows that a database user named “HRC” created the row in
the us_president table on 1/21/2021 at 2:39pm Eastern Daylight Time. That
row of data doesn’t appear to be correct, so we might want to pay a visit to
the HRC user and possibly revoke the user’s database privileges. We should
probably also delete that row from the us_president table.

Auditing Updates and Deletes


Triggers have a lot of flexibility, and there is more we can do with them. We
looked at an audit trigger that tracked when users inserted rows into a table,
but you can also audit when users update and delete rows as well. We saw
how the trigger was defined for inserts:

create trigger tr_us_president_insert_audit


after insert on us_president
for each row
execute function tf_us_president_insert_audit();

We could write another trigger for deletes:

create trigger tr_us_president_delete_audit


after delete on us_president
for each row
execute function tf_us_president_delete_audit();

And we could write another trigger for updates:

create trigger tr_us_president_update_audit


after update on us_president
for each row
execute function tf_us_president_update_audit();
We could even write one trigger that fires for inserts, updates, and deletes.

create trigger tr_us_president_audit


after insert or update or delete on us_president
for each row
execute function tf_us_president_audit();

Special Variables for Triggers


Trigger functions can make use of some special variables that regular
functions cannot. The most commonly-used ones are “new”, “old”, and
“tg_op”.

If you are writing a trigger to track rows that have been updated in a table, it is
helpful to know what the original values were and what the new values are.
For example, if a user runs this update statement:

update us_president
set president_name = 'Harry S. Truman'
where president_name = 'Harry Truman';

The original president name, old.president_name, would be “Harry Truman”


and new.president_name would be “Harry S. Truman”. We could use those
variables in our trigger function to write a message like “Harry Truman was
changed to Harry S. Truman” to our audit table.

We already saw “new” used in our tf_us_president_insert_audit( ) function


trigger in lines 15 – 17:

new.president_id,
new.president_name,
new.president_party

Since tf_us_president_insert_audit( ) is used to track inserts, it will have


“new” values but not “old” values. That is because for inserts we are tracking
rows that are being inserted that never existed before.

The special variable “tg_op” tells us if a row was inserted, updated, or


deleted. If you want to write one trigger function that will be called for inserts,
updates, and deletes, it is helpful to know which command caused the trigger
to fire. You can add “if statements” to your trigger functions and take different
actions based on whether a row was inserted, deleted, or updated.

if (tg_op = 'INSERT') then


...
end if;
if (tg_op = 'DELETE') then
...
end if;
if (tg_op = 'UPDATE') then
...
end if;

Altering Triggers
If we wanted to rename a trigger, we can use the “alter trigger” command.

alter trigger tr_us_president_insert_audit on us_president


rename to tr_us_pres_ins_aud;

Here we renamed the tr_us_president_insert_audit to tr_us_pres_ins_aud.

Dropping Triggers
To remove a trigger, we can use the “drop trigger” syntax.

drop trigger tr_us_president_insert_audit on us_president;

Notice that we need to tell Postgres that the trigger was associated with the
us_president table. If we are sure we won’t need them anymore, we could also
drop the other database objects that we created for use with the trigger.

drop function tf_us_president_insert_audit();

drop table us_president_insert_audit;

Disabling Triggers
If you have a trigger that you know you will not need again, you can drop the
trigger. Dropping the trigger will remove it permanently. On the other hand, if
you want to keep the trigger but prevent it from firing temporarily, you can
disable the trigger and reenable it when you want it to start firing again.

alter table us_president


disable trigger tr_us_president_insert_audit;

Notice that to disable a trigger, you alter the table that it is associated with.

Enabling Triggers
After you have disabled a trigger temporarily, you can enable it again by using
the “alter table” command.

alter table us_president


enable trigger tr_us_president_insert_audit;

Disabling a trigger temporarily and reenabling it later could be useful in a


situation where you need to load some data into a table, but for this load you
don’t want the trigger to audit the inserts.
4
BUILT-IN FUNCTIONS

As mentioned, Postgres comes with lots of functions that you can call from
your SQL statements. There are functions to transform strings, do
mathematical calculations, get today’s date, and perform all manner of other
helpful operations.

Postgres comes with over 2,000 functions. You won’t need to use – or even
know about – most of them. The majority of the functions that Postgres
provides are not commonly-used and are not shown here. Most of them can
be found on the Postgres website if you do ever have occasion to use them.

Let’s take a look at some of the built-in Postgres functions that you are most
likely to use.

AGGREGATE FUNCTIONS
We have already talked about aggregate functions. You will use these
functions frequently:

count( )
Returns a count of rows
max( ) Returns the maximum, or largest, of a set
min( ) Returns the minimum, or smallest, of a set
avg( ) Returns the average of a set of numbers
sum( ) Returns the sum of a set of numbers

ABS()
The abs( ) function returns the absolute value of a number. “Absolute value”
measures how far away from zero a number is.
select abs(8),
abs(-8);

abs abs
8 8

The results show that the absolute value of 8 is 8, and the absolute value of
-8 is also 8.

Consider the following table that contains a list of meteorologists, their most
recent forecasted temperature, and the actual temperature:

meteorologist forecasted_high actual_high


Gail Winds 74 80
Sunny Cloudfoot 77 86
Storm Warning 59 38

Which meteorologist made the closest prediction? We can use the abs( )
function to help find out:

select meteorologist,
abs(forecasted_high – actual_high) degrees_off
from weather_forecast_vs_actual;

meteorologist degrees_off
Gail Winds 6
Sunny Cloudfoot 9
Storm Warning 21

We can see that Gail Winds was the most accurate with only 6 degrees
difference between her forecasted temperature and the actual temperature.
Sunny Cloudfoot was 9 degrees off, and Storm Warning’s forecast was wrong
by a whopping 21 degrees.

We subtracted the actual_high from the forecasted_high and then got the
absolute value of that number. If we had not used the abs( ) function, the
results would have been -6, -9, and 21. The abs( ) function had the effect of
removing the minus signs, which was useful for our purposes because we
don’t care whether the forecasted temperature was higher or lower than the
actual temperature.

We used a column alias of “degrees_off” so that the column header of the


result set would be more descriptive.

You can call Postgres functions in the “select list” of your SQL statements, as
shown above, and you can also call functions in the “where clause” of your
SQL statement, as shown below:

select *
from weather_forecast_vs_actual
where abs(forecasted_high – actual_high) < 10;

meteorologist forecasted_high actual_high


Gail Winds 74 80
Sunny Cloudfoot 77 86

UPPER()
The upper( ) function takes a string and returns an all-uppercase version of it.

select upper('MiXed');

upper
MIXED

The upper function can be helpful in all sorts of scenarios. For example, you
may have two tables that were loaded from different data sources that use
different formatting rules for cities.

select * from city;

city state
Boston Massachusetts
Raleigh North Carolina
Sacramento California

select * from yearly_precipitation;


city_name precipitation
BOSTON 47.1
RALEIGH 50.9
SACRAMENTO 21.4

If you were to join the tables, no rows would be returned because the cities
are all in uppercase in the yearly_precipitation table and therefore they
wouldn’t match the cities in the city table:

1 select a.city_name,
2 a.state,
3 b.precipitation
4 from city a
5 join yearly_precipitation b
6 on a.city_name = b.city_name;

The query returns no rows, but if we change the query to join using the upper(
) function on line 6, we get the expected results.

1 select a.city_name,
2 a.state,
3 b.precipitation
4 from city a
5 join yearly_precipitation b
6 on upper(a.city_name) = upper(b.city_name);

city_name state precipitation


Boston Massachusetts 47.1
Raleigh North Carolina 50.9
Sacramento California 21.4

In this case, we actually didn’t need to use the upper( ) function on the cities
in the yearly_precipitation table because they were already in uppercase
letters, but I recommend using upper( ) for both tables just to make sure you
are comparing apples to apples. In the future there could be cities in the
yearly_precipitation table that are not in uppercase letters.

By the way, it’s not a great idea to have different tables in your database that
represent cities in different ways. In a real, production database, we would
pick some standard format for cities and eliminate the need to call upper( ) in
our where clause. We might use upper( ) to help build our real tables based
on raw data tables that were loaded from some external source.

LOWER()
The lower( ) function does the opposite of what the upper( ) function does.
The lower( ) function takes a string and returns the all-lowercase version of
it.

select lower('MiXed');

lower
mixed

The lower( ) function can also be used to join values that are in different
formats, in the same way that we used the upper( ) function above.

1 select a.city_name,
2 a.state,
3 b.precipitation
4 from city a
5 join yearly_precipitation b
6 on lower(a.city_name) = lower(b.city_name);

city_name state precipitation


Boston Massachusetts 47.1
Raleigh North Carolina 50.9
Sacramento California 21.4

In this case, we are matching “boston” to “boston” in line 6, instead of


matching “BOSTON” to “BOSTON” as we did when we used the upper( )
function. As long as we remember to convert the values on both sides of the
equal sign using upper( ) or lower( ), we will get our expected results.

INITCAP()
The initcap( ) function returns the first letter of each word as a capital letter
and all the other letters as lowercase. In other words, the initcap( ) function
takes the initial letter of each word and capitalizes it.
select initcap('george HERBERT wAlKeR BuSh');

initcap
George Herbert Walker Bush

Another example:

select initcap('e e cummings');

initcap
E E Cummings

ROUND()
The round( ) function takes a number with a decimal point and returns the
number rounded to its closest integer value.

select round(3.1415);

round
3

select round(3.5141);

round
4

In the examples above, we supplied one argument to the round( ) function.


3.1415 was rounded to 3, and 3.5141 was rounded to 4. You can also send a
second argument to the round( ) function if you want to round to a digit that is
after the decimal point. The second argument specifies how many places
after the decimal point you want to round to.

select round(3.49999, 2);

round
3.50
This is useful when creating reports or displaying values to users who want
to see numeric values in a consistent format.

TRUNC()
The trunc( ) function truncates a number to a certain number of decimal
places. By default, it truncates all of the digits after the decimal point.

select trunc(3.1415);

trunc
3

select trunc(3.5141);

trunc
3

We got different results using the trunc( ) function than we did with the round(
) function. When we called “trunc(3.5141)” it returned 3 because it truncated
– or chopped off – the “.5141”. The round( ) function, on the other hand,
rounded 3.5141 up to 4.

You can send the trunc( ) function a second parameter to tell it how many
places after the decimal point to keep:

select trunc(3.5141, 2);

trunc
3.51

Here it kept the 2 digits after the decimal point and truncated everything to
the right of that.

CEIL()
The ceil( ) function selects the closest integer that is more than the argument
passed in.

select ceil(3.1415);
ceil
4

select ceil(-3.1415);

ceil
-3

FLOOR()
The floor( ) function selects the closest integer that is less than the argument
passed in.

select floor(3.1415);

floor
3

select floor(-3.1415);

floor
-4

LENGTH()
The length( ) function counts the number of characters in a string.

select length('A long string');

length
13

Let’s get a list of the lengths of the cities in our city table:

select city_name,
length(city_name)
from city;

city_name length
Boston 6
Raleigh 7
Sacramento 10

Those results make sense because “Boston” has 6 letters, “Raleigh” has 7
letters, and “Sacramento” has 10 letters.

One of the aspects of Postgres functions that makes them so useful is that you
can string them together. Earlier we saw that we can use the max( ) function
to get the maximum, or highest, value. Let’s use the max( ) function together
with the length( ) function to see what the longest length for our cities is:

select max(length(city_name))
from city;

max
10

We selected each city_name from the city table and wrapped it in the length(
) function. Then we wrapped that in the max( ) function. At the end of the first
line, there are two right parens. The first right paren ends the call to the
length( ) function. The second right paren ends the call to the max( ) function.

The results show that the longest city name has 10 characters. That makes
sense because our longest city name, Sacramento, has 10 characters.

In a similar way, we could use the min( ) function to get the shortest
character length for our cities:

select min(length(city_name))
from city;

min
6

The number 6 is returned because the shortest city name in our table, Boston,
has six characters.

SUBSTR()
The substr( ) function takes a string and returns some part of it. It returns a
substring.

The substr( ) function takes a string, the position of the character in the string
that you want to start at, and the number of characters that you want to select
from the string.

Let’s say that we have a table that stores barcode numbers.


select * from barcode;

code
811399100131
096244031005
789661001783

Barcodes are typically comprised of several parts. In our example, we want


to extract the manufacturing code, which starts on the 2nd character and is 5
characters long.

select substr(code, 2, 5) mfg_code


from barcode;

mfg_code
11399
96244
89661

Calling “substr(code, 2, 5)” got us the values from the “code” column
starting at position 2 and going for 5 characters. We used a column alias of
“mfg_code” so that the results would display nicely under the heading
“mfg_code”.

If you want to get the string starting at a character position and going all the
way to the end of the string, you can leave off the 3rd argument to the substr( )
function:

select code,
substr(code, 9)
from barcode;

Here we are selecting the part of the “code” column that goes from the 9th
position all the way until the end of the value.

code substr
811399100131 0131
096244031005 1005
789661001783 1783

TRIM()
The trim( ) function returns a string with any extra characters at the beginning
and end of a string removed.

Sometimes when you load data from files into a database, the fields can have
extra spaces at the beginning or end of them. In the raw_data_load table
below, each of the column values have some number of spaces at both the
beginning and end.
select * from raw_data_load;

col1 col2 col3


" Apple " " Penny " " String "

I have included quotes here to highlight where the leading and trailing spaces
are.

If we use the trim( ) function, it will remove the spaces from both ends of
each string:

select trim(col1),
trim(col2),
trim(col3)
from raw_data_load;

col1 col2 col3


"Apple" "Penny" "String"
The trim function also allows you to remove characters other than the space
character. You can specify which character you want to remove. Let’s
remove the letter “k” from the beginning and end of the word “kayak”.

select trim('kayak', 'k');

The result is:

aya

LTRIM()
The ltrim( ) function returns a string with the extra characters at the beginning
of a string removed. It trims characters from the left of the string. Like the
trim( ) function, the default character to remove is the space character.

select ltrim (' Penny ');

This function returns “Penny ” with the spaces removed from the left side, but
not from the right side.

You can also specify which character you want to remove from the left
instead of removing the space characters. Let’s remove the leading zeroes
from “0000738”:

select ltrim('0000738', '0');

ltrim
738

RTRIM()
The rtrim( ) function, as you probably guessed, returns a string with the extra
characters removed from the right side of a string. The default character to
remove is a space character, but you can specify other characters. Here we
trimmed the text “Massachusetts” from the right side of the text “Boston
Massachusetts”:

select rtrim('Boston Massachusetts', 'Massachusetts');


rtrim
Boston

As mentioned, we can string functions together, taking the results of one


function and using them as an argument for another function. We can, for
example, trim “Massachusetts” from the right side of “Boston Massachusetts”
and then take the results and run them through the upper( ) function.

select upper(
rtrim('Boston Massachusetts', 'Massachusetts')
);

upper
BOSTON

LEFT()
The left( ) function returns the first – or leftmost – characters of a string. We
can get the first 3 characters of our customers’ last names using this query:

select last_name,
left(last_name, 3)
from customer;

last_name left
Smith Smi
Guy Guy
Jones Jon

If we call the left( ) function with a negative number of, say -1, it will return
everything except the last 1 character:

select last_name,
left(last_name, -1)
from customer;

last_name left
Smith Smit
Guy Gu
Jones Jone
RIGHT()
The right( ) function returns the last – or rightmost – characters of a string.
We can get the last 3 characters of our customers’ last names using this
query:

select last_name,
right(last_name, 3)
from customer;

last_name right
Smith ith
Guy Guy
Jones nes

If you call the right( ) function with a negative number, say -2, it will return
everything except the first 2 characters:

select last_name,
right(last_name, -2)
from customer;

last_name right
Smith ith
Guy y
Jones nes

LPAD()
The lpad( ) function adds characters to the left of a value. It pads characters
to the left. You can specify the characters you want added and what the total
length should be.

Our send_check table has a column called “amt” that represents the amount
of the checks that we need to send out.

select amt
from send_check;

amt
123
854
24

We can add asterisks to the beginning of the amount to prevent people from
fraudulently writing in their own numbers before our amounts:

select lpad(amt, 5, '*')


from send_check;

lpad
**123
**854
***24

REPLACE()
The replace( ) function replaces some value in a string with a new value.

Let’s say we have a table of airport runways that includes runways for
“Orange County Airport”. That airport was renamed to “John Wayne
Airport”, so we want to display “John Wayne Airport” in our results instead
of “Orange County Airport”.

select runway_name
from runway;

runway_name
Runway 172 – Orange County Airport
Runway 213 – Orange County Airport
Runway 985 – Washington National Airport

We can use the replace( ) function to replace “Orange County Airport” with
“John Wayne Airport”:

select replace(
runway_name,
'Orange County Airport',
'John Wayne Airport'
)
from runway;
Runway 172 – John Wayne Airport
Runway 213 – John Wayne Airport
Runway 985 – Washington National Airport

Notice that the replace( ) function didn’t change the entire string, just the part
that contained “Orange County Airport”. The text “Runway 172” and
“Runway 213” is still intact.

Also notice that we have not changed the values in the table with the replace(
) function, we are just selecting these values for display. If we wanted to
actually update the runway table, we could use the replace( ) function as part
of an update statement:

update runway
set runway_name =
replace(
runway_name,
'Orange County Airport',
'John Wayne Airport'
)
;

This update statement will change the runways that have “Orange County
Airport” in their name but won’t affect runways that do not.

FORMAT()
The format( ) function formats values based on a format string that you
provide.

select format(
'President %s was a %s',
president_name,
president_party
)
from us_president
where president_id between 31 and 34;

format
President Herbert Hoover was a Republican
President Franklin Roosevelt was a Democrat
President Dwight Eisenhower was a Republican
President Harry Truman was a Democrat

The format string used here is “President %s was a %s”. The “%s” is a
placeholder for a string value. The format string in this case has two
placeholders, the first one was replaced by the president_name value from
the us_president table, and the second one was replaced by the value of the
president_party column.

EXTRACT()
The extract( ) function selects parts from a timestamp or time value.

We saw earlier that the now( ) function returns the current date and time. The
data type that the now( ) function returns is a “timestamp with time zone”. It
includes several subfields that you can access by using the extract( ) function.

select now();

now
2021-06-29 12:13:00.439298-04

We can extract the century from the current date and time:

select extract(century from now());

date_part
21

We can extract the day from the current date and time:

select extract(day from now());

date_part
28

We can extract the month from the current date and time:

select extract(month from now());


date_part
10

Other subfields that we can extract include:

dow The day of the week. 0 (Sunday) to 6 (Saturday)


doy The day of the year. 1 – 366
hour Hour. 0 – 23
minute Minutes. 0 – 59
quarter The quarter of the year. 1 – 4
second Seconds. 0-59
timezone The time zone offset from UTC
week Week. 1-53
year Year expressed in 4 digits, like 2020

EXP()
The exp() function returns the exponential of the number supplied.

select exp(1);

exp
2.718281828459045

The results show that the exponential of 1 is 2.718281828459045.

PI()
The pi( ) function returns the value of pi.

select pi();

pi
3.141592653589793

POWER()
The power( ) function returns the value of a number raised to the power of
another number.
select power(3, 2);

power
9

RANDOM()
The random( ) function returns a random number between 0 and 1.

select random();

random
0.41278845070616654

POSITION()
The position( ) function finds the position of a substring within a string.

select position('pal' in 'principal');

position
7

The results tell us that the string “pal” is found at position 7 of the string
“principal”.

If we search for the position of a substring that doesn’t exist within a string,
Postgres will return a zero:

select position('i' in 'team');

position
0

The query returned zero because there is no “i” in “team”.

Strangely, Postgres has another function called strpos( ) that does the same
thing that the position( ) function does, using slightly different syntax. The
strpos( ) function also returns the position of a substring within a string, but
with the strpos( ) function, you supply the string first and the substring
second, separated by commas:

select strpos('principal', 'pal');

strpos
7

VERSION()
The version( ) function tells you which version of Postgres you are running.

select version();

version
PostgreSQL 13.2, compiled by Visual C++ build 1914, 64-bit

The results show that we are using Postgres version 13.


5
TIPS AND TRICKS

PGADMIN
Using PgAdmin to interact with Postgres will make life easier for you.
PgAdmin is a free and open source tool designed for Postgres. PgAdmin is a
graphical tool that runs in your web browser. PgAdmin is the Swiss Army
Knife of database tools. It has a slew of features for seemingly anything you
could ever want to do with Postgres.

You can download PgAdmin from the PgAdmin website. As of this writing, it
can be found at https://www.pgadmin.org/download/

The first time you use PgAdmin, it will ask you to create a master password.
Each time you start PgAdmin it will ask you for this password.
Connecting PgAdmin to a Postgres Server
To use PgAdmin with your Postgres dstabase servers, you will create
connections to the Postgres servers that you want to use. Here I created two
connections to two different servers. I called the connections “Server 1 –
Development” and “Server 2 – Production”.

Production Servers and Development Servers


Many times, companies will set up Postgres database servers for
development or testing purposes. These servers store test data for testing
SQL queries or for testing a set of features before they “go live” to the
production environment.

The production environment is where the real, live data is stored. Production
Postgres servers are where the data that the users of our applications see is
located.

It is important that the data in the production environment is accurate and


error-free. It is not as important that the data on our development or test
servers is perfect. The data on test and development servers is usually seen
only by developers who are using the data to test or develop new
functionality. That data is usually not visible to our end users.

For this reason, some companies will give developers access to test or
development Postgres servers, but not to production servers. The thinking is
that there is no way for a developer to make a mistake in production if he or
she doesn’t have access to the production environment.
If you do have access to both production and development Postgres database
servers, however, it’s a good idea to mark each server connection clearly in
PgAdmin. That will help to prevent you from making a mistake in the
production environment. You want to make sure you don’t, say, delete data
from the production environment thinking that you are in the development
environment.

Creating the Server Connections


To create the connections to Postgres database servers in PgAdmin, click the
“Object” pull-down menu and choose “Create” and then “Server…”.

That will take you to the “General” tab of PgAdmin’s “Create – Server”
screen. The only value you need to enter on this tab is the server connection
name. You can call the connection whatever you want, but it’s a good idea to
name it something meaningful that describes what the server is used for.
As mentioned, if you have access to production Postgres servers as well as
non-production Postgres servers, I recommend that you clearly mark which
connection is which when you set up your server connections. Here I called
this connection “Postgres Essentials – Production”.
Another way to make it clear that this is a production server is to use color.
Some PgAdmin users will use the “Background” color picker to change a
connection’s background to red for a production database server.

The connections below show “Server 1 – Development” with the default


white background and “Server 2 – Production” with a bright red background.
After you have entered a server connection name in the “General” tab, you
can go to the “Connection” tab and supply PgAdmin with the connection
information for your Postgres database server. The “General” and
“Connection” tabs are the only tabs you will typically need for entering your
server connection information. Most times, the default values will suffice on
the other tabs.
The person who set up your Postgres server, probably your Database
Administrator, can provide you with the information to enter on this screen.
The “host name/address” is the IP address or hostname of the Postgres
database server. The “port” is usually 5432. The “maintenance database” is
usually “postgres”. You will also enter your database username and
password. If you want PgAdmin to save your password for next time, you can
click the “save password” checkbox. When you click the “save” button, your
new connection will be saved and you can use PgAdmin to quickly connect
to it from now on.

Now let’s take a look at two of the most useful parts of PgAdmin: The Tree
Control pane and the Query Tool.

The Tree Control Pane


When you open PgAdmin, you will see the Tree Control pane on the left.
PgAdmin’s Tree Control shows you the database objects on your Postgres
server organized as a hierarchical “tree”. You can click on the right arrow
(“>”) symbol to expand a node and click on the down arrow (“v”) symbol to
collapse a node. For example, if we wanted to take a look at the “bookstore”
table, we could navigate to that table by expanding “Servers”, “Server 1 –
Development”, “Databases”, “Database1”, “Schemas”, “Schema1”,
“Tables”, and then “bookstore”. If you wanted to see which columns are in
the bookstore table, you could expand “columns” and store_id and
store_name would appear.

In other words, the bookstore table is on the Postgres database server with
the connection I called “Server 1 – Development”. That server has a
database called “Database1”. That database has a schema called “Schema1”.
That schema has a table called “bookstore”. That table has columns called
store_id and store_name. In the Tree Control pane, we can navigate to the
bookstore table by expanding nodes until we reach it.

The numbers shown in the parens to the right of “Servers” and “Databases”
tell us how many objects of those types there are. It says “Servers (2)”
because I have 2 server connections set up in PgAdmin. It says “Databases
(3)” because there are 3 databases under the “Server1 - Development”
server.

The Tree Control pane is powerful because it lets us quickly drill down and
examine the part of the database we are interested in. It gives us a visual
sense of how our database objects are organized in Postgres, as well as what
naming conventions were used to create those objects.
The Query Tool
The Query Tool lets you type in some SQL, run that SQL in Postgres, and see
the results. To start the Query Tool, choose “Query Tool” from the “Tools”
drop-down menu.

Below, I entered the SQL “select * from us_president;” and clicked the
“execute” icon (the one shown at the top right), and the results appeared in
the lower “data output” pane. You can also press F5 instead of clicking the
execute icon to execute a query.
The Query Tool can be used for all sorts of SQL statements. You can use it to
run SQL to create tables, create functions, run procedures, create triggers,
and more.

CHANGING A TABLE’S COLUMN ORDER


If you want to make changes to an existing table, you have two options: You
can drop and recreate the table, or you can “alter” the table.

Let’s say we have a table called customer that looks like this:

customer_id first_name last_name city state zip


1 Joe Smith San Jose CA 12121
2 Sum Guy Omaha NE 49494
3 Mary Jones Merrimack NH 78787

Now imagine that the boss tells us that we have to start tracking customer
middle names. We need to add a middle name column to the customer table.
We can do that by using the “alter table” command.

alter table customer add column middle_name text;


We added a new column to the customer table called middle_name and we
used the “text” data type. Now if we take another look at the table, we see
that the middle_name column is there, and all the middle name values were
set to null by default.

select * from customer;

customer_id first_name last_name city state zip middle_name


1 Joe Smith San Jose CA 12121 null
2 Sum Guy Omaha NE 49494 null
3 Mary Jones Merrimack NH 78787 null

Altering the table worked nicely. We didn’t lose any of our data and the new
column now exists. Now we could update the middle names for existing
customers and insert middle names for our new customers and we could
move on to our next task.

But there is something bugging me about this table. It is the order of the
columns. I would expect the middle_name column to come between the
first_name and last_name columns. But in our customer table, the “alter”
command added middle_name as the last column.

In relational databases like Postgres, the order of the columns doesn’t really
matter, so the “alter table / add column” statement just adds the column as the
last column in the table. Many Postgres developers would declare victory
and leave this table the way it is, with middle_name as the last column. After
all, when we select from the table, we can specify the column order we want,
like this:

select customer_id,
first_name,
middle_name,
last_name
from customer;

And we will see the columns in the order we expect them:

customer_id first_name middle_name last_name


1 Joe null Smith
2 Sum null Guy
3 Mary null Jones

Still, some developers will be bothered by having middle_name as the last


column in the table, and will want to change the order of the columns in the
table. To reorder the columns, you could drop and recreate the customer
table with a new column order. The trouble is that when you drop a table, it
not only removes the table but it also removes all the data in the table!

To reorder the columns in the customer table without losing our data, we can
copy our customer data to another table temporarily, drop and recreate our
customer table with a new column order, and then copy the saved data from
the temporary table back to our new customer table.

-- Make a copy customer table copy that includes all data


create table customer_temp as
select * from customer;

-- Drop the original customer table


drop table customer;

-- Recreate a new customer table with the new column order


create table customer
(
customer_id int,
first_name text,
middle_name text, -- Put middle_name here
last_name text,
city text,
state text,
zip text
);

-- Put the saved data into the new customer table


insert into customer (
customer_id,
first_name,
middle_name,
last_name,
city,
state,
zip
)
select customer_id,
first_name,
middle_name,
last_name,
city,
state,
zip
from customer_temp;

-- Remove the copied data table


drop table customer_temp;

Now if you select from the customer table the column order will be updated.

select * from customer;

customer_id first_name middle_name last_name city state zip


1 Joe null Smith San Jose CA 12121
2 Sum null Guy Omaha NE 49494
3 Mary null Jones Merrimack NH 78787

DEBUGGING WITH RAISE


The “raise” statement can be a big help when you want to debug your
PL/pgSQL code. The purpose of the raise command is to report messages
and errors. There are six levels of severity that you can raise: debug, log,
info, notice, warning, and exception. All of the levels raise messages of
different priorities. The “exception” level is the most severe level, and will
not only raise a message but will stop the execution of your SQL. You can
raise a message by providing the level and a message. I like to use the
“notice” level for debugging.

raise notice 'Test Message';

Let’s take a look at a function that isn’t working properly. Let’s see if we can
figure out what the problem is. The function is supposed to compare the
number of Republican presidents with the number of Democrat presidents
and return the ratio of Democrats to Republicans. The problem is that the
function always returns 1.0. We would expect that about half of the presidents
that were either Democrats or Republicans would have been Democrats, so
the ratio should be about .5 or 50%, not 1.0 or 100%.

Here is the function:

1 create or replace function f_dem_to_rep_ratio() returns numeric


2 as $$
3 declare
4 v_num_democrats numeric;
5 v_num_republicans numeric;
6 v_dem_to_rep_ratio numeric;
7 begin
8 select count(*)
9 into v_num_democrats
10 from us_president
11 where president_party = 'Democrat';
12
13 select count(*)
14 into v_num_republicans
15 from us_president
16 where president_party = 'Republicans';
17
18 v_dem_to_rep_ratio = v_num_democrats / (v_num_republicans +
v_num_democrats);
19
20 return v_dem_to_rep_ratio;
21 end;
22 $$ language plpgsql;

Our f_dem_to_rep_ratio( ) function takes no arguments and returns one


numeric value that is the ratio of Democrats to Republicans.

In lines 8 - 11, we are selecting the count of Democrats from the us_president
table. We put that value in a variable called v_num_democrats.

In lines 13 - 16, we are selecting the count of Republicans from the


us_president table. We put that value in a variable called
v_num_republicans.

In line 18 we calculate the ratio of Democrats to Republicans by dividing the


count of the Democrats by the total of the counts for Democrats and
Republicans. In line 20 we return that ratio.

We know the function is returning 1.0 and we know that is wrong, but it’s a
little hard to look at the function and determine what the problem is. We
aren’t sure what part of the function is causing the problem. So let’s add
some raise statements to the function and see if that will help us to debug it.

1 create or replace function f_dem_to_rep_ratio() returns numeric


2 as $$
3 declare
4 v_num_democrats numeric;
5 v_num_republicans numeric;
6 v_dem_to_rep_ratio numeric;
7 begin
8 select count(*)
9 into v_num_democrats
10 from us_president
11 where president_party = 'Democrat';
12
13 raise notice 'The count of Democrats is %', v_num_democrats;
14
15 select count(*)
16 into v_num_republicans
17 from us_president
18 where president_party = 'Republicans';
19
20 raise notice 'The count of Republicans is %', v_num_republicans;
21
22 v_dem_to_rep_ratio = v_num_democrats / (v_num_republicans +
v_num_democrats);
23
24 raise notice 'The ratio is %', v_dem_to_rep_ratio;
25
26 return v_dem_to_rep_ratio;
27 end;
28 $$ language plpgsql;

We added three “raise notice” statements at lines 13, 20, and 24. These
statements will tell us the numbers of Democrats and Republicans that are
being returned from the database, as well as the ratio that was calculated in
line 22.
In the “raise notice” statements, we used a percentage sign as a place holder
for the variable and we supplied the variable name. In the line:

raise notice ‘The count of Democrats is %’, v_num_democrats;

The value of the v_num_democrats variable will be shown where the percent
sign is, like “The count of Democrats is 17”.

When we run the function now, we can see the numbers being displayed from
our “raise notice” statements:

select f_dem_to_rep_ratio( );

NOTICE: The count of Democrats is 17


NOTICE: The count of Republicans is 0
NOTICE: The ratio is 1.00000000000000000000

This gives us a lot of insight into why the function is returning the wrong
ratio. The function is showing that there are zero Republicans. That can’t be
right. Now we can take a look at the part of the function that gets the number
of Republicans and determine what the problem is. The part of the function
that gets the count of Republicans looks like this:

select count(*)
into v_num_republicans
from us_president
where president_party = 'Republicans';

If we look at the rows of the us_president table we see that there are no rows
for ‘Republicans’ in the table. The values are stored in the database as
‘Republican’, without the “s”.

If we update our f_dem_to_rep_ratio( ) function and change ‘Republicans’ to


‘Republican’ on line 18, we can then rerun the function and see the results
we had hoped for. The ratio being returned is now
0.47222222222222222222 and the “raise notice” messages look a lot more
reasonable:
NOTICE: The count of Democrats is 17
NOTICE: The count of Republicans is 19
NOTICE: Ratio = 0.47222222222222222222

Adding the “raise notice” messages helped us to figure out which part of the
function was causing the problem. That made fixing the function much easier.

Now that we have fixed the problem, we can remove the “raise notice”
statements from the function.

ALL BALLS
One of the quirkier features of Postgres is the “all balls” syntax. “All balls”
is slang for all zeroes, since zeroes look like balls. If you want to insert all
zeroes into a column that has a time data type, you can surround the text
“allballs” in single quotes.

insert into start_time (twotz)


select 'allballs';

select * from start_time;

twotz
00:00:00

CASTING DATA TYPES


We have seen how Postgres tables have columns and those columns have
data types like integer, text, and boolean. There are times when we need to
convert a value from one data type to another data type. This is known as
“casting”.

Postgres provides us with multiple ways to cast. There is the cast( ) function,
the cast operator (“::”), and there are other functions available to help us cast
like the to_char( ), to_date( ), and the to_number( ) functions.

Here is an example of a scenario where you might want to cast from one data
type to another: Let’s say we want to see the current year and month in
“YYYY-MM” format. We don’t need the day or the time. We know that there
is a now( ) function that can get the current date for us, and we know there is
a substr( ) function that we can use to get a part of a value that we care about.
Let’s use those functions:

select now();

now
2021-06-28 12:13:00.439298-04

We only need the “2021-06” part of the date, so let’s wrap the now( )
function in the substr( ) function to select just the first 7 characters.

select substr(now(), 1, 7);

Postgres returns an error message:

ERROR: function substr(timestamp with time zone, integer, integer)


does not exist
LINE 1: select substr(now(),1,7);
^ HINT: No function matches the given name and
argument types. You might need to add explicit type casts.
SQL state: 42883
Character: 8

The problem is that the substr( ) function expects the first argument it gets
sent to be a string, and we sent it the results of the now( ) function, which is a
datetime.

The error message that Postgres gave us tells us how to resolve the problem:
“You might need to add explicit type casts.” We need to send the substr( )
function a string instead of a datetime. Let’s try calling the now( ) function
and then casting the results to the text data type before we send it to the
substr( ) function.

select substr(now()::text, 1, 7);

substr
2021-06

It worked.
The “::” symbol (two colons) is the Postgres cast operator. The syntax “now(
)::text” calls the now( ) function and then casts the results to a text data type.
Another way to accomplish the same thing is to use the cast( ) function:

select substr(cast(now() as text), 1, 7);

substr
2021-06

We got the same results. The syntax “cast(now( ) as text)” also calls the now(
) function and then casts the results to the text data type, but this time we are
using the cast( ) function instead of “::”.

With the cast function, we specify the data type that we want to cast to after
the word “as”:

select cast('123' as int);

Functions for Casting


Postgres also provides us with some other functions that we can use to cast.
The functions are to_date( ), to_timestamp( ), to_char( ), and to_number( ).

These functions take two arguments: The value that needs to be converted,
and the format to use when converting the value. The format is made up of a
set of character patterns that identify the parts of the field. This group of
formatting characters is sometimes called a “format mask”.

to_date( )
The to_date( ) function converts a value to a date. For example, if we wanted
to convert a string to a date, we might use this SQL statement:

select to_date('28 Oct 2021', 'DD Mon YYYY');

We know that the original value “28 Oct 2021” is a string because it is
surrounded by single quotes. The query will return the string “28 Oct 2021”
converted to a “date” data type.
to_date
2021-10-28

The “DD Mon YYYY” format that we used when calling the function is a set
of characters that identify which part of the string “28 Oct 2021” is the day,
month, and year. By using the format “‘DD Mon YYYY” we let Postgres
know that the “28” in our string is a 2-digit day, the “Oct” in our string is a 3-
character month that is capitalized, and the “2021” in our string is a 4-digit
year. With this information, Postgres can convert the string to a date.

Postgres provides more character patterns than you will ever need, but the
most common patterns for dates are:

DD Day of month (01–31)


D Day of the week, Sunday (1) to Saturday (7)
MM Month number (01–12)
MONTH Uppercase month name
Month Capitalized month name
MON 3-character uppercase month name
Mon 3-character capitalized month name
YYYY 4-digit year
YY 2-digit year

to_timestamp( )
The to_timestamp( ) function converts a value to a timestamp. Recall that a
“timestamp” data type in Postgres contains not only a date, but also a time
and a time zone. For that reason, the to_timestamp( ) function can make use of
some character patterns that weren’t useful for the to_date( ) function. The
most commonly-used of those character patterns are:

HH Hour of day (0-12)


HH24 Hour of day (0-23)
MI Minute (00–59)
SS Second (00–59)
MS Millisecond (000–999)
AM, am, PM or pm Meridiem indicator
TZH Time zone hours

If we wanted to convert a string to a timestamp, we might use this SQL


statement:

select to_timestamp(
'28 Oct 2021 09:36:47 am',
'DD Mon YYYY HH:MI:SS AM'
);

The query will return the string “28 Oct 2021 09:36:47 am” converted to a
“timestamp” data type. The format mask told Postgres how to parse the parts
of the string into the correct parts of the timestamp.

to_timestamp
2021-10-28 09:36:47-04

to_char( )
The to_char( ) function converts a value to a string.

select to_char(clock_change_date, 'Mon DD, YYYY')


from clock_change;

to_char
Nov 01, 2020
Mar 14, 2021
Nov 07, 2021
Mar 13, 2022
Nov 06, 2022

The to_char( ) function converted the dates in our clock_change table to


strings. The format mask of “Mon DD, YYYY” tells Postgres to display the
string as a 3-character month with the first character in uppercase followed
by a space, then a 2-digit day followed by a comma and a space, then a 4-
digit year. By using the to_char( ) function to convert dates to strings, we can
control the format that the value is displayed in.

The to_char( ) function can use some character patterns that we haven’t seen
before. These patterns are for converting numbers to strings. The most
commonly-used are:

9 1-digit position that will not show for leading zeroes.


0 1-digit position that will show for leading zeroes.
. Decimal point
, Comma (thousands separator)
L Currency symbol that uses locale
D Decimal point that uses locale
G Group separator that uses locale
MI Minus sign if the number is negative
PL Plus sign if the number is positive
SG Plus or minus sign

select to_char(123, '99999'),


to_char(123, '00000'),
to_char(1234.56, 'PL9G999.99');

to_char to_char to_char


123 00123 + 1,234.56

You can see that the “99999” format mask did not display any leading zeroes,
but the “00000” format mask did. The “PL9G999.99” format mask added a
plus sign and a “group separator” to our number, which in the United States
displays as a comma. It also added a decimal point.

One of the wackiest character formats is “RN”, which will return a Roman
numeral:

select to_char(2021, 'RN');

to_char
MMXXI

to_number( )
The to_number( ) function converts a value to a number.

select to_number(
'$1,234.56',
'L9G999.99'
);

to_number
1234.56

The format mask of “L9G999.99” helped Postgres determine which character


in the “$1,234.56” string was a currency symbol and which character was a
group separator (a comma in the US) so that it could convert the string to the
number 1234.56.

USING PARENS FOR MATH


Let’s take a look at a table of loan information:

select * from loan_info;

property_tax insurance rate


50 40 3.25

The following query does a calculation using the numbers from the table:

select property_tax + insurance * 12 / rate


from loan_info;

197.6923076923076923

That result doesn’t seem right to me. If the property tax was 50 and the
insurance was 40, those two values added together would be 90. If you
multiply 90 by 12 you get 1,080. If you divide 1,080 by 3.25 you get 332.30,
not 197.69. What’s going on here?

It turns out that Postgres didn’t solve the calculation in a left-to-right manner.
Instead, it did the multiplication and division first and then it added the
property tax. It multiplied the insurance of 40 by 12, which came to 480.
Then it divided 480 by the rate of 3.25, which came to 147.69, and then it
added the property tax of 50, for a total of 197.69.
That’s not what I had intended when I wrote the query, but the way Postgres
performed the calculation was actually correct. The problem was in the way
I wrote the query, but luckily there is an easy solution: Use parens.

select ((property_tax + insurance) * 12) / rate


from loan_info;

332.3076923076923077

Here I added parens to group parts of the calculation. I put parens around
“property_tax + insurance” so that they would be calculated together. I also
put parens around “(property_tax + insurance) * 12”. Postgres honored my
groupings and produced the results that I wanted.

As a check, I like to count the number of left and right parens to make sure
they match. In “((property_tax + insurance) * 12) / rate” we have 2 left
parens and 2 right parens.

TRANSFERRING DATA TO AND FROM FILES


At some point you may need to load data from a data file into a Postgres
database, or you may need to take data from a Postgres database and copy it
to a file. While there is more than one way to accomplish that, my favorite
approach is to use the “copy” command.

In its simplest form, you can copy data from a Postgres table to a file by
using the copy command like this:

copy us_president to '/home/pres/us_president.txt';

We are copying the us_president table to a new file called us_president.txt in


the “/home/pres/” directory. Using the command line interface of your
computer (also called the console, the command prompt, or terminal), you
can go to the /home/pres/ directory and see that the file got created. We can
check the contents of the file by typing it out:

> cd /home/pres/
> cat us_president.txt
1, George Washington
2, John Adams, Federalist
3, Thomas Jefferson, Democratic-Republican
4, James Madison, Democratic-Republican
5, James Monroe, Democratic-Republican

We can see that the copy command copied all 45 rows from the us_president
table to the /home/pres/us_president.txt file. Postgres didn’t remove the rows
from the us_president table, it just created a data file with a copy of the
values from the us_president table.

The copy command lets us create files in different formats. For example, we
can create a CSV (Comma Separated Values) file by specifying “with csv”:

copy us_president to '/home/pres/us_president.csv' with csv;

The file now gets created with commas between each field:

> cd /home/pres/
> cat us_president.csv

1, George Washington,
2, John Adams,Federalist
3, Thomas Jefferson, Democratic-Republican
4, James Madison, Democratic-Republican
5, James Monroe, Democratic-Republican

Using SQL, let’s create a new file and use a pipe character as a delimiter
between the fields in the file:

copy us_president to '/home/pres/us_president_pipe.txt' with


delimiter '|';

In the last data file we created, we saw commas between the fields, but in
this file we see pipe characters instead.

> cd /home/pres/
> cat us_president_pipe.txt
1|George Washington|
2|John Adams|Federalist
3|Thomas Jefferson|Democratic-Republican
4|James Madison|Democratic-Republican
5|James Monroe|Democratic-Republican

Now let’s look at loading data from a file into a table. Let’s load the rows
from our us_president.txt file into a new table called us_president_staging.
To load data from a file into a table, you can specify that you want to copy
“from” the file instead of copying “to” the file.

copy us_president_staging from '/home/pres/us_president.txt';

When you load data from a file into a database, it’s a good idea to load the
data into a staging table first and move the data to its permanent table later. A
“staging table” is a table that you create just for the purposes of loading data.
Using this approach, you can decrease the risk of causing problems with the
data in your permanent table if there is a problem loading the data from the
file.

First, let’s create the us_president_staging table with the same column
definitions as the us_president table, but with no rows of data:

1 create table us_president_staging as


2 select *
3 from us_president
4 where 1 = 2;

Lines 1 – 3 tell Postgres to create a table named us_president_staging that is


the same as the us_president table. The “where clause” in line 4 controls
how many rows from the us_president table will be written to the
us_president_staging table. Since the number 1 can never equal 2, no rows
are written to us_president_staging, and we are left with a table in the image
of us_president but with no rows.

Now we can load data from the data file into our staging table using SQL:

copy us_president_staging from '/home/pres/us_president.txt';


Checking the us_president_staging table, we can see that it worked:

select * from us_president_staging;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
… … …

Now we could load the contents of the staging table into a permanent table.

LOADING UNSTRUCTURED DATA


Earlier, we said that you should create a primary key for most of your
Postgres tables. A primary key is a column, or columns, in a table that
uniquely identifies the rows in the table.

For example, a table containing data about authors might have a primary key
of the author_id column. That means that there can’t be more than one row in
the table that has the same value for the author_id column.

In another table containing data about books, the primary key might be the
author_id and the book_name columns. That would allow one – but only one
– row per author ID and book name. There could never be more than one row
in the table with a particular author_id and book_name combination.

But here is a scenario that comes up occasionally where it makes sense to


have a table without a primary key: Let’s say you have a data file containing
a list of donations for your non-profit organization. You are being asked to
create a report of the total donation amount by donor. The trouble is, you
don’t know how many records there will be per donor because sometimes a
donor will donate more than once.

One way to approach this is to create a staging table with no primary key to
store the donation detail information. We could load the data file into that
table. Then we could use the data in that detail staging table to build a
summary table.

create table donation_detail_staging

(
donor text,
donation_amount int
);

When we load our data file into the donation_detail_staging table, we might
see data that looks like this:

donor donation_amount
Jacki Smith 200
Jacki Smith 850
RJ Boyle 100
Jacki Smith 100

We can see that Jacki Smith made 3 different donations.

We could then use the data in the donation_detail_staging table to create a


donation_summary table that sums up the donation amounts by donor:

create table donation_summary as


select donor,
sum(donation_amount) as donation_sum
from donation_detail_staging
group by donor;

If we look at the data in the donation_summary table, we see only two rows:

select * from donation_summary;

donor donation_sum
Jacki Smith 1150
RJ Boyle 100

Now we can create a primary key on the donation_summary table because


the donor column uniquely identifies the rows. There will never be more than
one row in this summary table for each donor.

alter table donation_summary add primary key (donor);

There is no primary key for the donation_detail_staging table because there


are no columns that uniquely identify the rows in the table. We can’t make the
donor column the primary key for the donation_detail_staging table because
there can be multiple rows for the same donor. We can’t make the
donation_amount column the primary key because there can be multiple rows
that have the same amount. We can’t even make the primary key the
combination of the donor and donation_amount columns because the same
donor could make another donation of the same amount. So, we could leave
the donation_detail_staging with no primary key.

Primary keys are important because they prevent duplicate rows, and they
speed up joins between tables. For that reason, we might choose to use the
donation_summary table in any future SQL joins, and drop the
donation_detail_staging table once we are done using it to build the
donation_summary table.

COMMON TABLE EXPRESSIONS


Common Table Expressions, or CTEs, are SQL statements that create a
temporary result set. Common Table Expressions can be used to simplify
complex queries.

In your query, you can join to the CTE as if it were a table. You can create a
CTE using the “with” keyword:

1 with party_cte as
2 (
3 select president_party,
4 count(*) as president_count
5 from us_president
6 group by president_party
7 )
8 select avg (president_count)
9 from party_cte;
avg
7.6666666666666667

In line 1, we created a CTE called “party_cte” using the query in lines 3 – 6.


If we were to run lines 3 – 6 in isolation, we would see these results:

select president_party,
count(*) as president_count
from us_president
group by president_party;

president_party president_count
null 1
Republican 19
Democratic-Republican 4
Federalist 1
Democrat 17
Whig 4

In lines 8 and 9, we selected from the CTE as if it were a table. We selected


the average number of presidents per political party, which turned out to be
7.6666666666666667.

CALLING POSTGRES FROM PROGRAMMING


LANGUAGES
Many computer programs that you write will need to make calls to a Postgres
database. You may just need to select data from the database and display it,
or you may need to insert, update, or delete data from Postgres in your
programs.

The syntax for calling Postgres, of course, will vary depending upon the
computer language that you are using. The way that Python programs call
Postgres will look a little different from the way that Java or PHP programs
call Postgres, but there are similarities.

Let’s take a look at how we would call a Postgres database from the Python
programming language.
A common way to access a Postgres database from Python programs is to use
the “psycopg2” database adapter. Psycopg2 is compatible with Python
version 3, which – at the time of this writing – is the version of Python that
most people are using.

As of this writing, psycopg3 is in development, but not yet ready for prime
time.

Apparently, the name “psycopg2” comes from combining “psyco” with “pg”
and adding the number 2. Rumor has it that the name “psyco” is a reference to
an old Python compiler, “pg” stands for Postgres, and the number 2 means
that it implements version 2 of the Python DB API.

Here is a Python program that I wrote called “python_calling_postgres.py”


that gets US President data from a Postgres database and displays it to the
user:

1 import psycopg2
2
3 conn = psycopg2.connect(
4 host="111.11.11.11",
5 port="5432",
6 database="president",
7 user="Rick",
8 password=”guacamole83”)
9
10 cur = conn.cursor()
11 cur.execute ('select * from us_president;')
12 presidents = cur.fetchall()
13 for president in presidents:
14 print(f"President {president[0]} was {president[1]}")
15 cur.close()

In line 1, we import the psycopg2 module so that we can interact with our
Postgres database.

In line 3, we connect to our Postgres database and create a Python variable


called “conn” which represents the connection to our database. In lines 4 – 8,
we provide our authorization criteria to connect to the database. Here we are
connecting to a database named “president” on a server with the IP address
of 111.11.11.11. The database User ID is “Rick”, and the database password
is “guacamole83”. The port is usually 5432, so if you leave off the port it
will default to 5432.

In line 10, we use the “conn” Python variable to create a Python variable
named “cur”, which represents a cursor. A cursor allows us to send
commands to the Postgres database and return data back to our Python
program.

In line 11, we use our cursor to execute a SQL command. We select “star”
(everything) from the us_president database table.

In line 12, we fetch all the results and write them to a Python variable named
“presidents”.

In line 13, we use a Python “for loop” to iterate through all the presidents
data and write each individual row to a Python variable called “president”
(with no “s” at the end).

In line 14, we use Python’s f-string formatting to display the first two
columns we got from the Postgres database for the row. The columns are
zero-based, so president[0] is the first column and president[1] is the second
column. We could have also displayed each president’s party – the third
column - by printing president[2], but we chose not to.

In line 15, we closed the cursor that we opened in line 10.

Running this Python program produces a list of our presidents:

> python python_calling_postgres.py

President 1 was George Washington


President 2 was John Adams
President 3 was Thomas Jefferson

President 44 was Barack Obama
President 45 was Donald Trump
President 46 was Joe Biden
We could add a lot to this Python program. For example, we could add
exception handling to display a message to the user if the us_president
database table no longer exists, or if the Postgres database password has
changed and we can’t access the database anymore.

Regardless of the programming language you use, there will be some module,
method, or driver that creates a connection to the database using credentials
like server name, database name, database user, and database password.
There will be a way to provide SQL, execute it against the database, and
return the results to your program. Sometimes that SQL will be a select,
insert, update, or delete statement, but sometimes it will be a call to a stored
procedure or a database function.

Another approach to interacting with Postgres from a programming language


is to use Object-Relational Mapping (ORM). ORM allows us to map
database tables to objects in your program and to deal with the objects in
your programming language of choice instead of using SQL to access your
data.

The most common ORM for the Python language is SQLAlchemy. Django – a
Python Web framework – uses it’s own ORM called “Django ORM”. The
most popular ORM for the Java language is Hibernate.

FINAL THOUGHTS
Thank you for reading this book. I hope it helped you to understand Postgres.
If you enjoyed the book, I would love it if you would leave a review on
Amazon. If you would like to suggest changes to future editions, or if you
found an error in the book, feel free to email me at
[email protected].

Learning to work with relational databases, like Postgres, can stretch your
mind. Seeing data in terms of rows and columns can be a different way to
conceptualize the world. Understanding an organization’s data by
understanding its primary keys may be a new way of thinking for you, so I
congratulate you for working your way through this book.
I expect to see Postgres take market share away from the big proprietary
database vendors in the future, and I hope you will benefit from that trend.

I hope that you will continue to enjoy learning new technologies and
techniques all throughout your career. Believe in yourself and do your best to
guard against “imposter syndrome” – that nagging doubting of your own
abilities and the feeling of being a fraud - which is so prevalent among
people in our industry. By continuing to learn and by having fun in the
process, you can’t help but to accumulate expertise and become skilled in
whichever technologies you choose to learn.

I wish you the best of luck!

You might also like