Essential Postgres_ Database Development Using PostgreSQL
Essential Postgres_ Database Development Using PostgreSQL
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted,
in any form without prior written permission of the author.
Every effort has been made to ensure the accuracy of the information presented. However, the
information contained in the book is sold without warranty, either express or implied. The author will
not be held liable for any damages caused directly or indirectly by this book.
Introduction
Audience
Conventions
Ellipses Used in Examples
Parens
4 Built-In Functions
Aggregate Functions
abs( )
upper( )
lower( )
initcap( )
round( )
trunc( )
ceil( )
floor( )
length( )
substr( )
trim( )
ltrim( )
rtrim( )
left( )
right( )
lpad( )
replace( )
format( )
extract( )
exp( )
pi( )
power( )
random( )
position( )
version( )
This book, Essential Postgres, will give you exposure to the most used – and
useful – parts of Postgres development, as well as tips and insights that I have
picked up over my years of working with Postgres. Writing SQL statements,
creating tables, functions, triggers, views, and data integrity are explained.
AUDIENCE
This book is suitable for anybody who wants to learn to use Postgres. That
includes folks who are new to Postgres and databases, intermediate-level
developers who would like a refresher, and even seasoned software
developers who are transitioning from another database to Postgres. This
book is for anybody who is interested in learning about the essentials of using
the Postgres database.
The aim of this book is to show you how to do things with the Postgres
database. The book is short on theory and long on examples. If you are the
type of person who likes that sort of thing, read on.
In this book, I have made some stylistic choices that you might want to be
aware of.
In this case, a full set of all 45 presidents was returned from the database, but
I omitted presidents 4 – 42 for the sake of brevity.
PARENS
I often refer to parentheses as “parens” because that’s the way I most often
hear them referred to among developers.
Also, I add parens to the end of function names. For example, Postgres has a
function called “upper” that I write as “upper( )”. You would say this aloud
as “the upper function”, but in this book I add the “parens” to the end of
function name to make it clear that “upper( )” is a function. Functions are
explained later in this book, and we’ll see what the “parens” are for then.
1
SELECTING DATA FROM A
POSTGRES DATABASE
For example, here is a table called us_president that has been defined with
three columns: president_id, president_name, and president_party.
This table contains 45 rows of data, one row for each president.
When naming database tables, it’s a good idea to stick with singular names
like us_president rather than plural names like us_presidents. Although
Postgres will allow you to name tables in the plural form, table names should
be singular by convention.
INTRODUCING SQL
In order to interact with Postgres, you will use Structured Query Language
(SQL). You will use SQL often for interacting with Postgres. Using SQL,
you can create tables, show the contents of tables, create new rows, update
values, and much more.
For example, you could see the data in our us_president table by using this
SQL select statement.
selectpresident_id,
president_name,
president_party
from us_president;
This select statement returns data from the us_president database table and
displays it for you. Here we chose to select the president_id, president_name,
and president_party columns. Note that you should type a semi-colon at the
end of your SQL statements. The results are:
The query returns all 45 US presidents (although I have not shown presidents
4 – 42 for brevity).
Now let’s say you want to show only the rows for Republicans in the table.
You could make a change to the SQL and add a “where” clause.
selectpresident_id,
president_name,
president_party
from us_president
where president_party = 'Republican';
The SQL query now returns 19 rows because there are 19 Republicans in the
table.
Since we are now retrieving a list of Republicans only, let’s change the SQL
to not select the president_party column. Let’s select just the president_id and
president_name columns.
selectpresident_id,
president_name
from us_president
where president_party = 'Republican';
president_id president_name
16 Abraham Lincoln
18 Ulysses Grant
19 Rutherford Hayes
20 James Garfield
21 Chester Arthur
23 Benjamin Harrison
25 William McKinley
26 Theodore Roosevelt
27 William Taft
29 Warren Harding
30 Calvin Coolidge
31 Herbert Hoover
34 Dwight Eisenhower
37 Richard Nixon
38 Gerald Ford
40 Ronald Reagan
41 George Herbert Walker Bush
43 George W. Bush
45 Donald Trump
The SQL query didn’t show the party_name column because we didn’t select
it, and again, the query returned 19 rows.
ORDERING ROWS
Now let’s display all the rows and columns in the table ordered by political
party and then by the president’s ID. To do that, we can add an “order by”
clause to our SQL statement.
select president_id,
president_name,
president_party
from us_president
order by president_party,
president_id;
Adding the “order by” clause had the effect of showing us the rows ordered
alphabetically by the president_party column and then ordered by the
president_id column in lowest to highest order numerically. Democrats were
shown first because the string “Democrat” comes alphabetically before the
other political parties (“Democratic-Republican”, “Federalist”, “Republican”,
and “Whig”). Of the Democratic presidents, Andrew Jackson appeared before
Martin Van Buren because Jackson’s president_id value was lower.
You can also specify whether you want to order in ascending or descending
order. That is, in low-to-high order or in high-to-low order.
select president_id,
president_name,
president_party
from us_president
order by president_party desc,
president_id asc;
The query returned all 45 rows (Again, I haven’t shown all the rows for
brevity). The data is displayed in descending (reverse) order for the
president_party column and then displayed in ascending (low-to-high) order
for the president_id column. If you don’t specify descending or ascending
(which can be abbreviated “desc” or “asc”), Postgres defaults to ascending
order and you will see your results in low-to-high order.
NULL VALUES
What’s going on with George Washington? Why is his political party listed
as null? Null is a special value in Postgres that represents an empty value.
George Washington was never a member of a political party so his
political_party value is set to null in our database table.
To show only the rows for presidents that have a null political_party, you
could run this query:
selectpresident_id,
president_name,
president_party
from us_president
where president_party is null;
selectpresident_id,
president_name,
president_party
from us_president
where president_party is not null;
Null is a special value in Postgres. Null is not the same as zero, and it is not
the same as an empty string or a space character. It is its own value, and
Postgres has special syntax to help you deal with it.
The word “syntax”, in software development speak, means the words and
symbols you can use that are part of the language. Some of the special syntax
Postgres has to handle null values includes “is null”, “is not null”, and
“coalesce”. (We will see “coalesce” in a moment).
FORMATTING SQL
So far, the SQL we have used has been in a nice readable format.
1selectpresident_id,
2 president_name,
3 president_party
4from us_president;
The column names and the table name all line up together vertically. There is
only one column name on each of lines 1 - 3. It is a good idea to write your
SQL statements in a neat, maintainable format like this, but SQL will also
allow you to write SQL statements that look like this:
select president_id, president_name, president_party from
us_president;
or like this:
select president_id, president_name, president_party
from us_president;
As long as SQL can figure out what you intended, it will run your statement.
You can add extra spaces between commas if you want. You can write the
entire statement on one line or you can use many lines.
You can also use the asterisk wildcard character as a shortcut to tell SQL to
retrieve all columns.
select * from us_president;
“select * from” means to select all columns from the table. Since
president_id, president_name, and president_party are all of the columns we
have in the us_president table, “select *” will produce the same results as
listing the column names individually. People will often say “select * from”
aloud as “select star from”.
You can also use “--” to add a comment at the end of a line of SQL.
selectpresident_id,
president_name,
president_party -- Political party, not birthday
from us_president;
To add comments that will be spread over more than one line, you can also
use “/*” at the beginning and “*/” at the end of comment.
/*
This query retrieves all the US Presidents.
There sure are a lot of them!
*/
select * from us_president;
The string “Say it loud” is an argument to the function. In other words, the
string “Say it loud” is sent to the upper( ) function, and the upper( ) function
will use that value to do something. In this case, it returns the uppercase
version of it, which is “SAY IT LOUD”.
Note that Postgres allows you to use a “select” statement without the “from”
clause. Normally, you would use a select statement to select data from a
database table, but in this case, since we are passing a hardcoded string (“Say
it loud”) into the function, we do not need to specify which database table to
get the data from. We don’t need to say “from table_name” because there is
no table involved. We can just “select upper(‘Say it loud’)” without the word
“from”.
To call the upper( ) function and pass a value to it from the database, we can
use a SQL statement like this:
selectupper(president_name)
from us_president
where president_id = 10;
upper
JOHN TYLER
For example, when we selected from the us_president table, we saw that
George Washington’s political party was set to null.
selectpresident_name,
president_party
from us_president;
president_name president_party
George Washington null
John Adams Federalist
Thomas Jefferson Democratic-Republican
… …
If we want to display “No Political Party” instead of null, we could use the
coalesce( ) function.
selectpresident_name,
coalesce(president_party, 'No Political Party')
from us_president;
president_name president_party
George Washington No Political Party
John Adams Federalist
Thomas Jefferson Democratic-Republican
… …
The coalesce( ) function takes a list of values and returns the first value that
isn’t null. In this case, we sent the coalesce( ) function two values: the
president_party column from the us_president table, and the text “No
Political Party”. If the president_party column in the us_president table is not
null, it will be returned. But if the president_party column in the us_president
table is null, the text “No Political Party” will be returned. The effect is that
each president’s political party is shown, and for George Washington, null is
replaced by the text “No Political Party”.
selectcount(*)
from us_president;
This count( ) function will return the number of rows in the us_president
table, which is 45. Note that with the count( ) function you use an asterisk
between the parentheses, like this: count(*).
selectmax(president_id)
from us_president;
This max( ) function returns the maximum president_id value in the table.
The row in this table with the highest president_id has a president_id of 45,
so 45 is returned.
selectmin(president_id)
from us_president;
This min( ) function returns the minimum president_id value in the table. The
lowest president_id in the table is 1, so 1 is returned.
select president_party,
count(*)
from us_president
group by president_party;
president_party count
null 1
Republican 19
Democratic-Republican 4
Federalist 1
Democratic 16
Whig 4
The query’s results show us that there was one president without a political
party, 19 Republicans, 4 Democratic-Republicans, 1 Federalist, 16
Democrats, and 4 Whigs.
Let’s look at that last line of SQL that says “group by president_party”. We
said that an aggregate function returns a single value based on multiple
values in the database. For example, there are 19 rows in the table with
Republican presidents and the count( ) function returns just one row for them
that shows the total of 19. The “group by” in the SQL statement tells Postgres
to group the results by the president_party column.
selectpresident_party,
count(*)
from us_president;
To correct the problem, add the “group by” clause and tell SQL which
column to group by.
Here is another example. Let’s say we want to list the highest president_id
value for each political party. We want to do something like this:
selectpresident_party,
max(president_id)
from us_president;
But the query fails because we forgot the “group by” statement.
ERROR: column "us_president.president_party" must appear in the GROUP
BY clause or be used in an aggregate function
LINE 1: select president_party, max(president_id)
^
SQL state: 42803
Character: 8
select president_party,
max(president_id)
from us_president
group by president_party;
president_party max
null 1
Republican 45
Democratic-Republican 6
Federalist 2
Democratic 44
Whig 13
This query will not run because we forgot the “group by” clause. But what
should be in the “group by” clause? Everything in the select clause that isn’t
an aggregate function.
selectcontinent,
country,
sum(population),
sum(area)
from world_population
group by continent,
country;
selectpresident_name,
count(*)
from us_president
group by president_name;
I would expect this query to return a list of all the presidents with a count
of 1. After all, it’s not like there were two presidents named George
Washington. But when we run the query, we see one president comes back
with a count of 2.
president_name count
… …
Ronald Reagan 1
John Tyler 1
Grover Cleveland 2
Chester Arthur 1
George Washington 1
… …
What is going on here? If we run a query to look at the data in the table, we
can see an interesting wrinkle in the data.
select * from us_president;
select president_name,
count(*)
from us_president
group by president_name;
president_name count
… …
Franklin Pierce 1
Ronald Reagan 1
John Tyler 1
Grover Cleveland 2
Chester Arthur 1
George Washington 1
… …
We had to look through all of those 44 resulting rows to see if there were any
counts that were more than 1. But there is an easier way. We can display only
counts that are higher than 1 by using the “having” statement.
select president_name,
count(*)
from us_president
group by president_name
having count(*) > 1;
president_name count
Grover Cleveland 2
The “having” clause is a bit like the “where” clause, but it can be used after a
“group by” clause to operate on grouped rows.
selectpresident_id,
president_name,
president_party
from us_president
where president_party = 'Republican';
party_id party_name
1 Republican
2 Democratic-Republican
3 Federalist
4 Democrat
5 Whig
We should also change the us_president table and replace the president_party
column with a party_id column that matches the party_id from the
political_party table.
select * from us_president;
Now the text “Republican”, “Democrat” and “Whig” isn’t repeated over and
over in the table, which is an improvement.
1selectus_president.president_id,
2 us_president.president_name,
3 political_party.party_name
4from us_president
5inner join political_party
6on us_president.party_id = political_party.party_id;
We are selecting from two database tables: the us_president table in line 4 of
our SQL and the political_party table in line 5. Both tables have a party_id
column so we join the tables based on the party_id column of both tables in
line 6.
Postgres allows you to do different types of joins, like inner joins, outer joins,
and cross joins. These will be explained momentarily, but you can see in line
5 that this query does an inner join.
Since we are now selecting from two tables in our query, every time we
reference a column name, we should specify which table the column comes
from. We can do that by using the table name followed by a period followed
by the column name. For example us_president.president_id refers to the
president_id column in the us_president table.
1selectus_president.president_id,
2 us_president.president_name,
3 political_party.party_name
4from us_president
5inner join political_party
6on us_president.party_id = political_party.party_id;
In this query, the president_id and president_name columns come from the
us_president table, and the party_name column comes from the
political_party column.
TABLE ALIASING
One time-saving shortcut we can use when writing SQL is to make a short
alias for our table names. A table alias is a short, temporary, substitute name
for the table. We can save keystrokes by using an alias of “a” for the
us_president table, and “b” for political_party table.
1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a
5inner join political_party b
6on a.party_id = b.party_id;
This query returns the same information as the previous query, but it requires
us to do a lot less typing to create the query.
On line 4 we declare that “a” will be the alias for the us_president table. On
line 5 we declare that “b” will be the alias for the political_party table. We
can choose any letter or letters we want, but this example uses “a” and “b”.
Now we can type just “a” or “b” instead of the table names in lines 1, 2, 3,
and 6. The benefit of table aliasing, as you can see, is that it saves us quite a
bit of typing.
You may see queries that use a slight variation of this syntax, using the word
“as” to define table aliases.
1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president as a
5inner join political_party as b
6on a.party_id = b.party_id;
You can see on lines 4 and 5 the word “as” was used between the table names
and the alias names. It’s fine to use the “as” syntax, but since the objective of
using table aliases is to cut down on typing, personally I prefer not to use the
word “as”.
TYPES OF JOINS
Inner Joins
Inner joins are the most common type of join. In an inner join, there must be
a match in both tables in order for data to be retrieved from the database.
Let’s take another look at the inner join from the example above.
1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a
5inner join political_party b
6on a.party_id = b.party_id;
The query joins the us_president and political_party tables based on the
party_id columns of each of the tables.
Why?
The reason is that the political_party table has one row for each of these
parties: Democrat, Democratic-Republican, Federalist, Republican, and
Whig. But there is one president in the us_president table that isn’t from any
of those parties. George Washington’s party in the us_president table is null.
So, when we joined the two tables, there was no match for George
Washington’s party_id and he did not appear in the results of the query.
That is the nature of an inner join. In an inner join, there must be a match in
both tables in order for the data to be retrieved.
Also, where the “inner join” is specified on line 5, the word “inner” is
optional. The default join type is inner. So this query:
1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a
5inner join political_party b
6on a.party_id = b.party_id;
Will produce the same results as this query that does not use the word “inner”
on line 5:
1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a
5join political_party b
6on a.party_id = b.party_id;
Let’s look at another example of an inner join using different tables. Let’s say
we have a customer table and an item_ordered table. The customer table and
the item_ordered table both have a customer_id column so we can join the
tables based on customer_id.
customer_id customer_name
1 Penny Sato
2 Dixie Gooseman
3 Molly Terry
Here is what the item_ordered table looks like:
We can use this inner join to see which customer ordered which item.
selecta.customer_name,
b.item_ordered
from customer a
join item_ordered b
on a.customer_id = b.customer_id;
Remember, the word “inner” is optional. This query does an inner join even
though the syntax says “join” instead of “inner join”. Inner is the default type
of join.
In this query we saved typing by aliasing the “customer” table as “a”, and the
“item_ordered” table as “b”.
customer_name item_ordered
Penny Sato Hat
Dixie Gooseman Gloves
Since this was an inner join, results appeared only when there was a match in
both tables. For this query, that means only customers who ordered an item
will appear in the results. “Molly Terry” is in the customer table but she
doesn’t have an order in the item_ordered table so she doesn’t appear in the
results at all.
Outer Joins
Outer joins work a little differently. An outer join will display all rows from
one table, whether or not there is a matching row in the other table. If there
are matching rows in the other table, it will show the columns from that table
too.
1selecta.customer_name,
2 b.item_ordered
3from customer a left outer join item_ordered b
4on a.customer_id = b.customer_id;
customer_name item_ordered
Penny Sato Hat
Dixie Gooseman Gloves
Molly Terry null
Using an outer join, Molly Terry appears in the results of the query, but the
item_ordered column appears as null because there was no matching row in
the item_ordered table. In other words, SQL can’t show us which item she
ordered because she didn’t order an item, so it displays null.
The customer table is the left table because it is to the left of the “left outer
join” syntax on line 3. The item_ordered table is the right table because it is
to the right of the “left outer join” syntax.
1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a left outer join political_party b
5on a.party_id = b.party_id;
“left outer join” returns all the rows from the left table (us_president) even if
there is no match in the right table (the political_party table).
Using this outer join, the query returns all 45 presidents. George Washington
appears in the results even though he has no matching political party in the
political_party table. This is different than the inner join we did on these
tables that returned only 44 rows.
The outer join worked as advertised. It showed all of the rows in the
us_president table and showed matching party_name columns from the
political_party table where there were some.
I like to think of the tables that are joined in an outer join as a mandatory
table and an optional table. In order for the president data to appear in the
results, it is mandatory that rows exist in the us_president table. But having
matching rows in the political_party table is optional. Rows will be returned
in our result set even if there isn’t a matching row in the political_party table.
The first table mentioned on line 4 is the us_president table. It is the left table
because it is to left of the “left outer join” syntax. The political_party table is
the right table because it is to the right of the “left outer join” syntax. Or
more simply stated, the us_president table is on the left in line 4 and the
political_party table is on the right.
This is how Postgres knows to select all rows from the us_president table and
only matching rows from the political_party table. “left outer join” returns all
the rows from the left table (us_president) even if there are no matches in the
right table (political_party).
Also note that the word “outer” is optional. I recommend that you use it for
clarity, but you will undoubtedly come across outer joins that use the syntax
“left join” instead of “left outer join”. The same results will be returned.
You can also do an outer join with the “right outer join” syntax rather than
“left outer join”. The “right outer join” syntax tells Postgres to do an outer
join and treat the table on the right side as the mandatory table.
1selecta.president_id,
2 a.president_name,
3 b.party_name
4from political_party b right outer join us_president a
5on a.party_id = b.party_id;
Notice that in the SQL statement with the right outer join, the order of the
tables has been switched. The political_party table is now the left table and
the us_president table is now on the right side.
Cross Joins
Cross joins are used less frequently than inner and outer joins, but there are
occasions when they come in handy.
Let’s say a chain of bookstores wants to send one copy of each new book to
each of their stores. They have a table called bookstore that looks like this:
store_id store_name
1 Mall Store
2 College Campus Bookstore
3 Suburban Store
4 Cellar Store
And they have a table called new_book that looks like this:
book_id book_name
100 Essential Postgres
200 How to Knit
300 Poetry for Dogs
store_name book_name
Mall Store Essential Postgres
Mall Store How to Knit
Mall Store Poetry for Dogs
College Campus Bookstore Essential Postgres
College Campus Bookstore How to Knit
College Campus Bookstore Poetry for Dogs
Suburban Store Essential Postgres
Suburban Store How to Knit
Suburban Store Poetry for Dogs
Cellar Store Essential Postgres
Cellar Store How to Knit
Cellar Store Poetry for Dogs
A cross join joins every row of one table to every row of another table. There
are four rows in our bookstore table and three rows in our new_book table.
So it makes sense that 12 rows are returned by the cross join because 4 x
3 = 12.
Self Joins
A self join is a join where the same table is joined to itself.
Each row represents one position in the church. The supervisor_id column
represents the position_id of the supervisor. For example, position_id 5 –
“Priest” – has a supervisor_id of 4. If we look at the row in the table that has
a position_id of 4, we see Bishop. So, we know that the Priest’s supervisor is
the Bishop.
In order to select an org chart from the database, we can join the
church_hierarchy table to itself.
selecta.position_desc,
b.position_desc
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;
position_desc position_desc
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest
In the result set, the left column is a list of supervisors and the right column is
a list of their subordinates.
By the way, when I named the column “position_desc” in the table I used
“desc” to stand for “description”. That column is a text description of the
position.
The table alias “a” represents the church_hierarchy table, and the table alias
“b” also represents the church_hierarchy table, but we are treating them as
different tables. We joined on the position_id column of the “a” table to the
supervisor_id column of the “b” table.
COLUMN ALIASING
We have seen that we can create aliases for tables. Now let’s create aliases
for columns.
The query above gave us the information we needed for our org chart, but
both columns names appeared as “position_desc”.
position_desc position_desc
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest
These column headings could be confusing to our end users. Really, the left
column is a list of supervisors and the right column is a list of workers. Let’s
change those column headings to show “Supervisor” and “Worker” instead of
the column headings from the table, “position_desc” and “position_desc”.
selecta.position_desc Supervisor,
b.position_desc Worker
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;
Now the query returns the same results, but the column headings make the
meaning of the data more clear.
Supervisor Worker
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest
Just as with table aliases, you may see queries that use the word “as” when
defining column aliases, like this:
selecta.position_desc as Supervisor,
b.position_desc as Worker
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;
Either way is fine. You can use the word “as” or you can leave it out.
Using Parens
Postgres allows you to write SQL queries that accomplish the same results
using different syntax. We saw this query earlier.
1selecta.customer_name,
2 b.item_ordered
3from customer a
4join item_ordered b
5on a.customer_id = b.customer_id;
You may come across queries where they use parens on line 5.
1selecta.customer_name,
2 b.item_ordered
3from customer a
4join item_ordered b
5on (a.customer_id = b.customer_id);
selecta.customer_name,
b.item_ordered
from customer a
join item_ordered b
using (customer_id);
Either approach is fine. You will come across queries that use both the “on”
and the “using” styles.
Old-School SQL
This inner join between the customer and item_ordered tables looked like
this.
selecta.customer_name,
b.item_ordered
from customer a
join item_ordered b
on a.customer_id = b.customer_id;
selecta.customer_name,
b.item_ordered
from customer a,
item_ordered b
where a.customer_id = b.customer_id;
This is an older style of SQL that is still supported by Postgres. This syntax
doesn’t include the word “join”; it just lists the table names separated by
commas. I recommend that you use the syntax shown in the first example, but
you may come across queries written using this older “ANSI-89” style shown
in the second example.
store
store_id store_name
1 Bill's Banjos
2 Mike's Music Emporium
3 Tennessee Instruments
inventory
store_id product_id quantity
1 11 5
2 22 9
2 11 2
3 33 12
3 11 4
product
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300
supplier
supplier_id supplier_name
100 Missing Teeth Mfg.
200 Peg's Board Guitar Co.
300 Scratchy Cat Fiddle Co.
Let’s join these four tables to get an inventory list of the products at all of our
music stores:
1selecta.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5from store a
6join inventory d on d.store_id = a.store_id
7join product b on b.product_id = d.product_id
8join supplier c on c.supplier_id = b.supplier_id
9order by a.store_name;
That query worked nicely. It was more complicated than our previous SQL
examples because the data we needed for this query was spread across four
tables. That required us to join across all four tables in order to produce our
inventory list.
In lines 5 – 8, we defined tables aliases for all four tables. We used “a” for
the store table, “b” for the product table, “c” for the supplier table, and “d”
for the inventory table. That saved us quite a bit of typing.
In lines 6 – 8, we used inner joins on the tables and specified which columns
the tables should be matched on. Recall that the “join” syntax is shorthand for
“inner join”.
So far all of the table aliases we have seen have been “a”, “b”, “c”, or “d”,
but you can make up your own table aliases. When you are writing a complex
query, you can make your SQL easier to read by making the table aliases
more meaningful. For example, you could write the query like this:
1selectst.store_name,
2 p.product_name,
3 sp.supplier_name,
4 i.quantity
5from store st
6join inventory i on i.store_id = st.store_id
7join product p on p.product_id = i.product_id
8join supplier sp on sp.supplier_id = p.supplier_id
9order by st.store_name;
Instead of using “a”, “b”, “c”, and “d”, this time we used table aliases that are
shorthand for the table names. For example, “p” for the product table and “i”
for the inventory table. That can make it easier for you to write the join
condition on say, line 7. It’s harder to lose sight of which tables you are
joining when the table alias gives you a hint like the first letter of the table.
Unfortunately, in this join, two of the tables start with the same letter: store
and supplier. Since there are two tables that start with the letter “s”, for those
tables we used two-letter table aliases of “st” for the store table, and “sp” for
the supplier table.
Also, when you write queries that are joining many tables, note that the order
of the joins matters. For example, let’s switch the order of lines 6 and 7 in the
query above.
1selectst.store_name,
2 p.product_name,
3 sp.supplier_name,
4 i.quantity
5from store st
6join product p on p.product_id = i.product_id
7join inventory i on i.store_id = st.store_id
8join supplier sp on sp.supplier_id = p.supplier_id
9order by st.store_name;
The reason is that line 6 makes reference to a table alias of “i” but “i” isn’t
defined until the “inventory i” part of line 7. We tried to use the table alias “i”
before it was defined. The fix is to change the order of lines 6 and 7 back to
their original positions.
The mandolin has also been added to the product table, but the supplier was
set to null because we don’t know the supplier.
The trouble is that now our inventory list query isn’t showing the new
mandolin.
1selecta.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5from store a
6join inventory d on d.store_id = a.store_id
7join product b on b.product_id = d.product_id
8join supplier c on c.supplier_id = b.supplier_id
9order by a.store_name;
These are the same results that we saw before. Why isn’t the mandolin
appearing?
You can see that this query uses all inner joins. Lines 6 – 8 use just the word
“join”, which is shorthand for an inner join. In order to display a product that
is in the product table but doesn’t have a supplier in the supplier table, we
need to do an outer join between the product and supplier tables. But we want
the other joins in the query to continue to be inner joins.
It turns out that all we need to do is add the word “left” to the query to
change the inner join to an outer join.
1select a.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5from store a
6join inventory d on d.store_id = a.store_id
7join product b on b.product_id = d.product_id
8left join supplier c on c.supplier_id = b.supplier_id
9order by a.store_name;
Recall that “left join” is shorthand for “left outer join”. So now we have a
query that does two inner joins (on lines 6 and 7) and one outer join (on line
8). Now the mandolin appears in our inventory list.
STRING CONCATENATION
Let’s say we want to get a list of presidents in this format: “President Name
(Political Party)”. To do that, we can take the president_name column and
add a space, then and a left paren to it, then add the president_party value,
then add a right paren to it, like this:
select president_name || ' (' || president_party || ')'
from us_president;
…
Bill Clinton (Democratic)
George W. Bush (Republican)
Barack Obama (Democratic)
Donald Trump (Republican)
Greater Than
Using “>” we can list all the presidents with a president_id that is more than,
say, 40:
select*
from us_president
where president_id > 40;
select*
from us_president
where president_id >= 40;
Less Than
Using “<” we can list all the presidents with a president_id that is less than,
say, 5:
select*
from us_president
where president_id < 5;
select*
from us_president
where president_id <= 5;
Not Equal
There are two ways to represent “not equal” in SQL: “!=” and “<>”.
select*
from product
where product_name != 'Mandolin'
and product_id != 11;
The “!” is the “not” operator. So “!=” means “not equal to”. This query
selects products where the product_name is not Mandolin and the product_id
is not 11 (Banjo).
select*
from product
where product_name <> 'Mandolin'
and product_id <> 11;
I think of the “<>” syntax as saying, “show me everything that is less than (<)
11, and also show me everything that is greater than (>) 11”, which is really
another way to say, “not equal to 11”.
Both queries produce the same results: They both produce a list of all the
products except for Mandolin and Banjo, because Banjo is product_id 11.
In
The “in” keyword checks that a value is in a list of values.
select*
from store
where store_id in (1,3);
store_id store_name
1 Bill's Banjos
3 Tennessee Instruments
Our query returns all rows from the store table where the store_id is a 1 or a
3. Another way to write this query is using the “or” keyword:
select*
from store
where store_id = 1
or store_id = 3;
The advantage to using the “in” syntax is that it’s less typing than using “or”,
and with the “in” clause, you can also select the list of values from the
database, like this:
select *
from store
where store_id in
(
selectstore_id
from south_east_store
);
The part of the SQL statement within the parens is called a subquery. We’ll
explore subqueries in more detail momentarily.
Between
The “between” keyword checks that a value is within a range of values.
select *
from product
where product_id between 11 and 33;
Postgres returns a list of rows from the product table where the product_id is
between 11 and 33. Note that “between” returns rows where the product_id
equals 11 or equals 33, as well as any values that are between 11 and 33.
Like
The “like” keyword checks that a string value matches some pattern. In the
following query, we are returning values from the us_president table where
the values for president_name are like the pattern “George%”.
selectpresident_name
from us_president
where president_name like 'George%';
president_name
George Washington
George Herbert Walker Bush
George W. Bush
This query returns the names of presidents where the president_name value
starts with the text “George”. The percent character represents any characters.
So, the pattern “George%” checks for “George” followed by any characters at
all.
The percent sign will also match no characters. If a president were named just
“George” with no last name, the “where president_name like ‘George%’”
condition would return him as well.
The percent sign can be placed at any position in the string. In the query
below, the “%” is placed in the middle of the string, right between “George”
and “Bush”.
selectpresident_name
from us_president
where president_name like 'George%Bush';
president_name
George Herbert Walker Bush
George W. Bush
When using “like”, you can also use the underscore character (“_”) to
represent one character. The following query searches for presidents with
names that have the pattern of “_onald%”. That is, any one character
followed by the text “onald” followed by anything at all.
selectpresident_name
from us_president
where president_name like'_oNaLd%';
president_name
Ronald Reagan
Donald Trump
selectpresident_name
from us_president
where president_name like '_oNaLd%';
selectpresident_name
from us_president
where president_name ilike '_oNaLd%';
president_name
Ronald Reagan
Donald Trump
Is Null
“is null” checks if a value is null. Recall that null is a special value in
Postgres that represents an empty value.
select*
from product
where supplier_id is null;
Postgres comes with special keywords to deal with null values. A common
mistake is to use the syntax “where supplier_id = null”. That will return
unexpected results. The correct syntax is “where supplier_id is null”.
Not
The word “not” can be used to negate other keywords. For example:
select*
from product
where supplier_id is not null;
select*
from product
where supplier_id not in (11, 22);
selectpresident_name
from us_president
where president_name not like '_onald%';
select*
from product
where product_id not between 11 and 33;
“Not” has the effect of reversing the search criteria. “where supplier_id in
(11, 22)” returns only 11 and 22, but “where supplier_id not in (11, 22)”
returns everything except 11 and 22.
SUBQUERIES
A subquery is a query that is nested within another query.
selecttornado_alert_message
from emergency_message
where exists
(
select*
from tornado_alert
where alert_flag = 'Y'
);
This SQL statement has an inner query and an outer query. The part of the
SQL statement that is the inner query is also called a subquery. In the SQL
statement below, I have added comments to show which part is the outer
query and which part is the inner query or subquery.
The word “exists” on line 3 checks for the existence of rows in the subquery.
The inner query runs first, and checks the tornado_alert table to see if any
rows exist where the alert_flag column is set to “Y”. Let’s take a look at the
tornado_alert table.
select * from tornado_alert;
State alert_flag
Massachusetts N
North Carolina Y
Florida N
There is a row in the table with an alert_flag of Y. It is the row for North
Carolina. Since at least one row with a Y exists, the outer query then selects
the tornado_alert_message column from the emergency_message table and
returns it. The results are:
tornado_alert_message
There is a tornado alert
If there had been more than one row with an alert_flag of Y, the results
would have been the same. “Exists” checks that one or more rows exist in the
subquery.
If there had not been any rows returned by the subquery with an alert flag of
Y, nothing would have been returned by the SQL statement at all.
Correlated Subqueries
Let’s take another look at our music store data. Let’s get a list of all stores
that have more than 10 of an item in stock.
1 select store_name
2 from store s
3 where exists
4 (
5 select i.store_id
6 from inventory i
7 where i.store_id = s.store_id
8 and i.quantity > 10
9 );
In this SQL statement, we have an inner query and an outer query like in the
last example, but there is something different about this SQL statement. In
line 7, the store table from the outer query is joined with the inventory table
from the inner query. This is a correlated subquery because there is a
correlation between the inner and outer queries. In other words, the inner and
outer queries are joined.
Because the inner and outer queries are correlated, the query can show us not
only that there is a store with more than 10 of an item in stock, but it can
show us which store it is.
store_name
Tennessee Instruments
1 select store_name
2 from store s
3 where s.store_id =
4 (
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8 );
Instead of using the “exists” keyword, we are checking that the store_id in
the outer query matches the store_id in the inner query, on lines 3 and 5.
store_name
Tennessee Instruments
The trouble is that if the data in the inventory table changes, and a second
store begins stocking more than 10 of an item, our query would fail.
Right now, if we were to run the just the inner query by itself, we would see
that it returns one row that represents store_id 3, “Tennessee Instruments”.
selecti.store_id
from inventory i
where i.quantity > 10;
store_id
3
But now if another store, say “Bill’s Banjos”, got a new shipment of banjos,
the inner query would return two rows.
selecti.store_id
from inventory i
where i.quantity > 10;
store_id
3
1
1select store_name
2from store s
3where s.store_id =
4(
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8);
That is because the equal sign on line 3 tells SQL to expect only one value to
be returned from the subquery. When the subquery returns more than one
value, Postgres returns an error.
1select store_name
2from store s
3where s.store_id in
4(
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8);
Now that we have used “in” on line 3, our results come back nicely:
store_name
Bill's Banjos
Tennessee Instruments
Now the outer query isn’t surprised when the inner query returns multiple
rows. The “in” keyword can handle a set of values while “=” expects just one
value.
DISTINCT
The “distinct” keyword removes duplicates from our result set. If we select
the president_party column from our us_president table, we get lots of
duplicates.
selectpresident_party
from us_president;
president_party
null
Federalist
Democratic-Republican
Democratic-Republican
Democratic-Republican
Democratic-Republican
Democratic
Democratic
Whig
Whig
…
That query returns 45 values, one showing the party for each president. We
could use “distinct” if we want to see each party only once.
selectdistinct president_party
from us_president;
president_party
null
Republican
Democratic-Republican
Federalist
Democratic
Whig
Now the query returns only 6 rows. Selecting the distinct parties shows each
party value only once.
LIMIT
The “limit” keyword allows us to select a limited number of rows. If we are
interested in seeing what the us_president table looks like, but we don’t need
to see all 45 rows in it, we could limit the results to 5 rows.
select *
from us_president
limit 5;
So far, we have been selecting from database tables that already exist. Now
let’s look at how to create tables in the first place.
We can make up our own names for tables and columns. The names must be
63 characters or fewer, and they need to start with a letter or an underscore.
For readability, I recommend keeping your table and column names under 30
characters, and starting table and column names with a letter. Underscores are
often used between words in table and column names, but dashes are not
allowed.
By convention, tables are given singular names, not plural names. For
example, we called this table us_president instead of us_presidents.
In lines 2 – 4, we defined three columns for the table: president_id,
president_name, and president_party. We specified a data type for each
column. president_id is an integer, president_name is text, and
president_party also has a text data type.
DATA TYPES
Postgres provides lots of data types you can use to create your columns.
Some data types you will use frequently and some you will use once in a
while or only when some special situation arises.
Text
The text data type stores a string of characters. A string can be made up of
almost any characters, like letters, numbers, underscores, dashes,
ampersands, and lots more. Here the product_name column of the product
table is created with the text data type.
When dealing with text values in Postgres, we use single quotes not double-
quotes.
If you are creating a table and you need a column for text data of unknown
length, use the text data type.
Integer
The integer data type is used for storing whole numbers. An example of a
whole number is 3. int would not be the right data type to store 3.1415,
because that is not a whole number. That is because 3.1415 has a fractional
part (the part after the decimal point).
The int data type can store numbers between -2147483648 and +2147483647.
If you need to store larger numbers you can use the bigint data type instead.
If you are dealing with smaller numbers and you want to save disk space, you
can use the smallint data type.
The smallint and bigint data types are available if you need them, but most
times, if you need to store numbers that don’t have a decimal point, the int
data type is what you want.
Numeric
The “numeric” data type is used for storing numbers that have a fractional
amount.
In Postgres, the “numeric” data type and the “decimal” data type are the
same. You can create the column using either word.
Numbers like these that have a decimal point are said to have a precision and
scale. Precision is the total number of digits. Scale is the number of digits
after the decimal point. For example, 3.14159 has a precision of 6 and a scale
of 5.
We can use the “numeric” data type without specifying a precision and scale,
like we did above, or we can define a precision and scale for our numeric
column. When we created the special_number table, we could have created
the special_value column with a precision of, say, 10 and a scale of 4.
Now the values appear with 4 digits to the right of the decimal point, and up
to 6 digits to the left of the decimal point. Since only 10 digits total are
allowed in the column, and 4 are to the right of the decimal point, that leaves
us with 6 digits to the left of the decimal point.
We could not add the US National Debt to this column anymore because it is
too large for our column definition. The number has more than 6 digits to the
left of the decimal point.
Date
To store dates in Postgres, we can use the “date” data type. For example, here
is a table called clock_change that has one column called clock_change_date.
The clock_change_date column was created with the “date” data type, so we
can store dates in that column.
create table clock_change
(
clock_change_date date
);
select*
from clock_change;
clock_change_date
2020-11-01
2021-03-14
2021-11-07
2022-03-13
2022-11-06
1select*
2from clock_change
3where clock_change_date = '2021-11-07';
The “date” data type is handy when you need to store a date but when you
don’t need to store the time of day or a time zone. Notice that the date is
surrounded by single quotes on line 3.
Postgres provides an easy way to get today’s date called current_date. The
current_date function gives you today’s date as a “date” data type.
current_date
2020-12-27
We could also use current_date to check for dates in our clock_change table
that have today’s date.
select *
from clock_change
where clock_change_date = current_date;
You can also get the current_date and then add or subtract from it. For
example, the following SQL statement displays all rows in the table with a
clock_change_date from the last ten days:
select *
from clock_change
where clock_change_date > current_date – 10;
I named this column twotz because it stands for “Time Without Time Zone”.
The following query returns the column twotz from the start_time table:
selecttwotz
from start_time;
The time that was returned consists of several parts. 15 is military time for
3pm because 15 minus 12 is 3. 55 is the minutes. 25 is the seconds. 832142 is
the milliseconds. So, this time represents 3:55 pm and 25 seconds and
832142 milliseconds.
Notice that the “time without time zone” data type doesn’t include a date or a
time zone.
When using the “time without time zone” data type, you can either spell out
“time without time zone” or shorten it to “time”. The default for “time” is
“without time zone”. These two tables would be the same:
create table start_time
(
twotz time without time zone
);
create table start_time
(
twotz time
);
selecttwtz
from start_time;
This query returns the twtz column from the start_time table.
twtz
15:55:25.832142-04:00
This returned the same time as the “time without time zone” data type
example above, but now there is a “-4:00” at the end of it.
The “-4:00” is an offset that represents the time zone. This offset tells us the
difference between the time zone of this time and the UTC (Universal Time
Coordinated) time zone. The time “15:55:25.832142-04:00” is still 3:55 pm
and 25 seconds and 832142 milliseconds, but the “-4:00” tells us that it is in
the EDT time zone. EDT is Eastern Daylight Time, which is a time zone used
in the eastern part of the United States. EDT is 4 hours behind UTC, so the
offset is negative 4.
When using the “time with time zone” data type, you can either spell out
“time with time zone” or shorten it to “timetz”. These two tables would be
the same:
create table start_time
(
twtz time with time zone
);
Postgres provides an easy way to get the current time called current_time.
The current_time function gives you the current time as a “time with time
zone” data type.
select current_time;
14:34:19.274711-04:00
You could compare the times in our table to the current time using this query.
select*
from start_time
where twtz = current_time;
We are unlikely to have any rows in the table with a time that is exactly the
same as the current time including the milliseconds, but we can check for
times within the last hour using a query like this:
select*
from start_time
where twtz > (current_time - interval '1 hour');
The “timestamp without time zone” data type can also be referred to as
simply “timestamp”. These two tables would be the same:
create table start_time
(
tswotz timestamp without time zone
);
The “timestamp with time zone” data type can also be shortened to
“timestamptz”. These two tables would be the same:
create table start_time
(
tswtz timestamp with time zone
);
now
2020-12-29 14:54:56.710457-04
You can also use the Postgres current_timestamp function, which does this
same thing as now( ).
select current_timestamp;
current_timestamp
2020-12-29 14:54:56.710457-04
You could also use now( ) or current_timestamp to check the table to see if
there are any values with the current timestamp, like this:
select*
from start_time
where tswtz = now();
or:
select*
from start_time
where tswtz = current_timestamp;
Boolean
The boolean data type, often abbreviated as “bool”, stores a value that
represents true or false. Let’s take a look at the presidential_hair table.
create table presidential_hair (
president_name text,
good_hair bool
);
president_name good_hair
Richard Nixon false
Martin Van Buren false
John F. Kennedy true
To select all the presidents with good hair, you can say “where good_hair” in
your query.
selectpresident_name
from presidential_hair
where good_hair;
president_name
John F. Kennedy
You may see queries that use a slightly different syntax for checking
booleans, for example:
select president_name from presidential_hair where good_hair is true;
select president_name from presidential_hair where good_hair = true;
select president_name from presidential_hair where good_hair = 't';
To check a boolean for a value of false, you can use the “not” syntax.
selectpresident_name
from presidential_hair
where not good_hair;
president_name
Richard Nixon
Martin Van Buren
And again, you may come across queries that use a slightly different syntax
to do the same thing.
select president_name from presidential_hair where good_hair is
false;
select president_name from presidential_hair where good_hair = false;
select president_name from presidential_hair where good_hair = 'f';
In some databases, you will see columns named with the suffix “_flag” that
have a boolean data type. Since a boolean is either true or false, columns will
be given names like “active_user_flag” because if the user is active, setting
the column to true can be compared to raising a flag.
If you wanted a list of active users who speak Spanish, you could use this
query:
select*
from application_user
where active_user_flag
and speaks_spanish_flag;
User_name active_user_flag speaks_english_flag speaks_spanish_flag
Kendra Duncan true true true
Character
The “character” data type allows us to store fixed-length character strings. It
is often abbreviated as “char”.
For example, the following table was created to store data about states in the
USA:
create table us_state (
state_code char(2),
state_name text
);
state_code state_name
MA Massachusetts
NC North Carolina
HI Hawaii
Some of the data types - like point, circle, and polygon - are for geographical
data. There are data types for binary data, MAC addresses, host addresses,
and all manner of other things. You can find a complete description of all of
the Postgres data types on the Postgres website. You can even create your
own data types using the “create type” command if you ever need a really
unconventional data type.
Numbers
bigint A larger version of "int"
smallint A smaller version of "int"
money A currency amount
real A floating point number
double precision A larger floating point number (also
called "double" or "float")
Formatted Data
json Stores textual JSON (JavaScript Object Notation)
data
jsonb Stores binary JSON data, decomposed
xml Stores XML (Extensible Markup Language) data
Serial
serial An autoincrementing integer
smallserial A smaller autoincrementing integer
bigserial A larger autoincrementing integer
Geometry-Related
box A rectangular box on a plane
circle A circle on a plane
line An infinite line on a plane
lseg A line segment on a plane
path A geometric path on a plane
point A geometric point on a plane
polygon A closed geometric path on a plane
Specialty Addresses
cidr An IPv4 or IPv6 network address
inet An IPv4 or IPv6 host address
macaddr A MAC (Media Access Control) address
macaddr8 A MAC (Media Access Control) address
(EUI-64 format)
Miscellaneous
interval A time span
pg_lsn A Postgres log sequence number
tsquery A text search query
tsvector A text search document
txid_snapshot A user-level transaction ID snapshot
uuid A universally unique identifier
character varying A variable-length character string
CONSTRAINTS
When you create your own database tables, Postgres gives you a way to put
“constraints” on your columns. Constraints are rules about the data that can
be saved in your columns. Can there be two rows in the table with the same
value in this column? Are null values allowed for this column? Using
constraints, you can define rules about the data that is allowed, and Postgres
will enforce those rules for you.
Constraints help us with “data integrity”. That is, they help us to keep the
data in our database accurate and consistent.
Primary Keys
A “primary key” is a column, or more than one column, that uniquely
identifies the rows in a table. When you create a database table, one of the
most important questions to consider is “what should the primary key for this
table be?”
Single Column Primary Keys
In our us_president table, which column uniquely identifies each row?
The answer is the president_id column. That is because no two rows can have
the same president_id. There will never be two different rows with the same
president_id.
That can’t be said of the president_party column. We can see that there are
multiple rows that have a president_party of “Democratic-Republican”.
But we can always be sure that president_id 3 will refer to just one row in the
table. So, when we create the us_president table, we should make the
president_id column the primary key.
To designate the president_id column the primary key for the table, we can
use the “primary key” syntax:
In line 2, we named the column “president_id”, we set its data type to integer,
and we defined the column as the primary key for this table by using the
syntax “primary key”.
Making president_id the primary key for this table does three good things for
us.
First, it prevents duplicate President IDs from being inserted into the table. If
a user of our database tries to add president_id 40 and there already is a
president_id 40 in the table, Postgres will stop them and give them an error
message.
Secondly, making president_id the primary key prevents users from adding a
null value to the president_id column. When we defined president_id as the
primary key, we designated it as a special column that cannot be null.
Those two things fall under the category of “data integrity”. They help to
keep the data in our database accurate and consistent. Once we define this
primary key, we can be assured that all rows in the table will have a unique
president_id, and that no president_id will be null. Postgres will enforce that
for us, and that will help to keep the data in our database of a high quality.
For this table, the primary key should consist of the city and temp_date
columns. That is because there should be only one row with the same city and
date. For example, we have a row for Boston for 2020-12-01 with a high
temperature of 36. If we make the city and temp_date the primary key for this
table, Postgres will prevent users from adding a second row for Boston for
2020-12-01.
There would never be a reason to add a second row for Boston for 2020-12-
01, so if we make city and temp_date the primary key, Postgres will make
sure that doesn’t happen. To make city and temp_date the primary key for
this table, we can use the “primary key” syntax.
In line 5, we told Postgres that the primary key should be comprised of the
city and temp_date columns. When a primary key is made up of more than
one column this way, it is called a “composite” key.
Postgres doesn’t require you to define a primary key for the tables you create,
but I recommend that you do. Defining primary keys for your tables sets
Postgres up to help you with data integrity. It also helps you to understand
your data and the best way to define your tables. Most times, if you can’t
figure out what the primary key should be for a new table, that means you
need to rethink your table design. Except in rare cases, every table should
have a primary key.
Foreign Keys
A foreign key is a column, or more than one column, that has a relationship
with the primary key of another table. You can use a table’s foreign key to
join it to another table. A foreign key consists of the columns that are used to
match this table to another table’s primary key columns.
Let’s take another look at the tables for our music stores:
supplier
supplier_id supplier_name
100 Missing Teeth Mfg.
200 Peg's Board Guitar Co.
300 Scratchy Cat Fiddle Co.
product
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300
When we create the supplier table, we should make the primary key the
supplier_id column because we never want there to be two rows in the
supplier table with the same supplier_id.
The product table also has a supplier_id column. That column can be used to
link the product table to the supplier table. Product 11 – Banjo – in the
product table has a supplier_id of 100. We can go to the supplier table and
see that supplier_id 100 is “Missing Teeth Mfg” so we know that the supplier
of the banjo is Missing Teeth Mfg.
When we create the product table, we should define the supplier_id column
as a foreign key that references the supplier_id column of the supplier table.
Now that we have defined the president_name column as “not null”, if a user
in our database tries to add a null value for president_name, Postgres will
stop them and display an error message.
By the way, the president_id column defined in line 2 will also not allow null
values because it has been defined as the table’s primary key. Recall that
primary keys columns cannot have a null value. Because it is the primary
key, we didn’t need to specify “not null” for the president_id column.
Postgres does that for us automatically.
Unique
Sometimes we want to prevent duplicate values in a column. We might want
to create a customer table, for example, that will not allow duplicate email
addresses.
On line 4 of our “create table” SQL statement, we used the word “unique”.
Because the table was created this way, Postgres will not allow two
customers in the table to have the same email address. Postgres will display
an error message if a user tries to create a customer with an email address that
is already in the table.
Check
We can use the “check” constraint to make sure that a column contains
certain values, or a certain range of values. We can define the
high_temperature table this way.
INDEXES
Postgres lets us create indexes on our tables that will help to speed up the
SQL queries that use those tables. You can create an index using the “create
index” syntax.
Again, you don’t need to create indexes for columns that have been defined
as primary keys. When you create a primary key for a table, Postgres
automatically indexes those columns for you.
DROPPING TABLES
The syntax to drop a table is “drop table”. Be careful with this command
because it will remove the table and all of the data in it.
drop table product;
ALTERING TABLES
The alter command lets us add columns to the table. We can also use it to
drop a column, change the data type of a column, rename a column, and
more.
alter table customer add column county text;
There are other ways to insert, update, and delete data from a Postgres
database in addition to using PgAdmin. Many applications have a User
Interface (UI) that allows end users to enter and change data and to have
those changes affect a Postgres database. An application’s UI might be web-
based, or it might be part of a mobile application, or it may be part of a
desktop application. Often, an application’s user interface sends SQL
commands like insert, update, and delete to a Postgres database, where those
commands are executed.
Inserting Data
To use SQL to insert data into a table, use the “insert” command.
Here we are inserting a row into the us_president table. In line 1, we are
telling Postgres that we want to insert this data into the president_id,
president_name, and president_party columns. Then on line 2, we provide the
data to insert, in this case, 33, “Harry Truman”, and “Democratic”. “Harry
Truman” and “Democratic” have single quotes around them because they are
text fields. 33 is an integer so it does not have quotes around it. This SQL
statement will add a new row to the us_president table.
Notice that we listed three columns in line 1 and three values in line 2.
Even though we aren’t specifying any columns to insert the data into on line
2, Postgres accepts this syntax and adds the row to the table. Postgres sees
that there are three columns in the table and since we provided three values, it
writes the values to those three columns in the table without displaying an
error.
Listing the column names on line 2 allows this insert statement to work even
if we add new columns to the table later.
This SQL statement will insert all 19 Republican presidents from the
us_president table into the us_republican_president table. If we were to run
just lines 5 – 8 in isolation, it would display the Republican presidents from
the us_president table. Lines 1 – 4 take that data and insert it into the
us_republican_president table. Notice that the word “values” is not part of
this syntax as it was in the previous example.
Updating Data
To change data that already exists in a table, we can use the “update”
statement and specify which data we want to change in our “where” clause.
Here we are replacing any null values in the president_party column with the
text “None”.
updateus_president
set president_party = 'None'
where president_party is null;
This changes George Washington’s political party from null to the text
“None”.
Now let’s change the row for Lyndon Johnson to include his middle name.
updateus_president
set president_name = 'Lyndon Baines Johnson'
where president_name = 'Lyndon Johnson';
This updates the row for Lyndon Johnson and sets his president_name to
Lyndon Baines Johnson. We can see that the update worked by querying the
table:
select*
from us_president
where president_id = 36;
One thing to be aware of is that the update statement will update as many
rows as meet the criteria in the “where” clause. In this case, there was only
one row in the table with a president_id of 36, so only that row was updated.
But let’s imagine that we forgot to include the where clause in our update
statement. Instead of running this SQL statement:
updateus_president
set president_name = 'Lyndon Baines Johnson'
where president_name = 'Lyndon Johnson';
updateus_president
set president_name = 'Lyndon Baines Johnson';
Now instead of updating one row in the table, we would update all the rows
in the table! Let’s look at the table now:
select * from us_president;
So be aware that update statements can update more than one row. It will
update as many rows as the where clause specifies. If there is no “where”
clause at all, all rows in the table will be updated.
Deleting Data
To delete data from database tables, we usually use the “delete” statement.
There is also a “truncate” statement that we can use if we want to quickly
remove all rows in a table.
If there were no supplier 500 in the table to begin with, the delete statement
would not display an error message. It would simply delete zero rows.
To delete all rows in a table, you can use the “delete” syntax without a
“where” clause.
delete from supplier;
This will delete all the rows in the table, but it will not drop the table. The
table will still exist, but there will be no rows of data in it. If you want to
remove the entire table, you can use the “drop table” statement instead.
Like the “insert” statement, “delete” will remove as many rows that match
the criteria in the where clause. “Delete” can also delete many rows in a
single SQL statement. This one delete statement will delete every Republican
in the table:
delete from us_president
where president_party = 'Republican';
Truncate has the same effect as deleting all rows from a table with the
“delete” statement. It will delete all the rows in the table, but will not drop the
table. The table will still exist, but there will be no rows in it. The difference
is that “truncate” is faster because of the way Postgres implements “truncate”
behind the scenes.
3
MORE DATABASE OBJECTS
Tables are by far the most-used database objects, but there are others we will
use as well, like views, materialized views, sequences, functions, and
triggers.
You can name your objects using the same rules that we used for tables.
Their names must be 63 characters or fewer, and they need to start with a
letter or an underscore. I like to name my database objects with a prefix that
indicates what type of objects they are. For example:
VIEWS
A “view” in Postgres is a saved SQL statement that you can query like it’s a
table. To create a view, use the “create view” statement.
Now that you have defined the v_democratic_president view, you can query
it as if it were a table.
select * from v_democratic_president;
president_id president_name
7 Andrew Jackson
8 Martin Van Buren
11 James Polk
14 Franklin Pierce
… …
You can also use a “where” clause when selecting from a view.
select*
from v_democratic_president
where president_id = 7;
president_id president_name
7 Andrew Jackson
Each time we select from a view, Postgres runs the query that we defined in
our “create view” statement. That query typically selects rows from a
database table. That means that if we make a change to the data in our
underlying database table, the change will also appear when we select from
the view. For example, when we selected from the v_democratic_president
view for president_id 7, Postgres ran the query that we defined when we
created the v_democratic_president view.
Now if we run our select statement against the view again, we see that
president_id 7 is not returned. This query returns no rows:
select*
from v_democratic_president
where president_id = 7;
That makes sense because this view is supposed to return only Democratic
presidents. We can see that the changes to the underlying table took effect
when we queried the view.
We could create a view called v_employee that has all of the columns except
for salary.
When we query the new v_employee view, we see that it has been created
correctly and that it has the information we intended.
select * from v_employee;
Now we have a table called employee that includes salary data, and a view
called v_employee that does not include salary data. When data changes in
the employee table, the v_employee view will automatically reflect those
changes, so we’ll only have to make our changes in one place.
Now we can give users in the Human Resources department access to the
employee table, and give all other users access to the v_employee view.
MATERIALIZED VIEWS
A “materialized view” in Postgres is created by running a SQL statement, but
unlike a regular view, the results of the SQL statement are saved. To create a
materialized view, use the “create materialized view” syntax.
select*
from mv_democratic_president
where president_id = 7;
president_id president_name
7 Andrew Jackson
Now let’s see what happens if we change the political party of president_id 7
to “Republican” in the underlying us_president table.
select*
from mv_democratic_president
where president_id = 7;
president_id president_name
7 Andrew Jackson
Even though we changed the data in the underlying us_president table, the
data in the materialized view did not get refreshed. The materialized view is
still reflecting the state of the us_president table before we changed it. Unlike
a regular view, we need to run the “refresh materialized view” SQL
command to refresh a materialized view.
refresh materialized view mv_democratic_president;
Now when we check the materialized view, president 7 does not appear. The
change we made to the underlying table is now reflected in the materialized
view.
select*
from mv_democratic_president
where president_id = 7;
Now that we have refreshed the materialized view, the query returns no rows,
as we would expect.
Materialized views are good for situations where you either want to save
point-in-time data or where the query that creates the materialized view is
complex and takes a long time to run. If the query for a view takes, say, 5
minutes to run then you may make the decision to use a materialized view
instead. You may decide to refresh a materialized view every hour rather than
using a regular view that will run the query every time somebody selects
from the view.
SEQUENCES
Sometimes we want a column in a table to contain consecutive numbers. In
the us_president table for example, the row for the first president has a
president_id of 1. The next row has a president_id of 2. Then 3, 4, 5, etc.
Each time a new row is added to the us_president table, we can use the
“president_sequence” sequence we just created to keep a running tally and
insert the right president_id into the table. Postgres provides a function called
nextval( ) to help us with that. Calling nextval( ) adds 1 to our sequence and
saves the new value.
Let’s assume that the us_president table is empty when we start. We can add
rows to it using the sequence and the nextval( ) function to create a value for
the president_id column.
insert into us_president (president_id, president_name,
president_party)
values (nextval(‘president_sequence’), ‘George Washington’, null);
CREATING FUNCTIONS
We saw earlier how to call functions that have been provided to us by
Postgres, but now let’s take a look at creating our own functions.
Functions are helpful when you have a complex SQL statement, or a group of
SQL statements that have several steps, and you know that you will need to
run those same SQL statements again sometime in the future. It makes sense
to save that SQL as a function so you can call it by name later. This will
prevent you from having to recreate the SQL next time, and it will make it
easy to run the SQL the next time you need it. Functions are stored in the
Postgres database, so we can write the SQL once and simply call the function
whenever we need to.
Recall that Postgres comes with functions like the upper( ) function. We saw
that calling upper( ) with a value of “Say it loud” returned the text “SAY IT
LOUD”.
select upper('Say it loud');
upper
SAY IT LOUD
Notice that when you call the upper( ) function, there are left and right parens
after the word “upper”. You can pass values to the function by putting the
values between the parens. In this case, we wrapped the value “Say it loud”
in single quotes because it is text. If the value had been a number, we
wouldn’t have used quotes. You will call the functions that you write the
same way that you call functions provided by Postgres.
In our “select upper(‘Say it loud’)” example, the text “Say it loud” was the
only argument passed to the upper( ) function. The upper( ) function was
written to accept one value and to do something with it. In this case, the
function takes the value sent to it and returns the uppercase version of it.
We have been using plain SQL up until now and it has done everything we
needed it to do, but there are times when we need more functionality than the
SQL language can offer. SQL is a non-procedural language. It can query
database tables nicely, but you can’t define variables or create loops with it.
You can’t add if/then logic to your programs with plain SQL. If you need to
write a function using a procedural language to do those types of more
advanced things, use PL/pgSQL.
On line 1, we create the function. We could drop and then recreate the
function in two steps, but the “create or replace function” syntax will do that
for us in one step. If the function doesn’t exist, it will be created. If the
function already exists, it will be recreated.
The body of the function goes within the “$$” symbols (two dollar signs).
Surrounding text with $$ in Postgres is known as “dollar quoting”. The
function’s body goes from the $$ at the end of line 1 until the $$ at the
beginning of line 5. Also on line 5, we are specifying that this function has
been written in SQL, not PL/PgSQL.
Lines 2 – 4 are the body of the function. They contain a SQL query that
selects the president’s name, concatenates it to a space and a left paren, adds
the president’s political party, and concatenates a right paren. The effect is to
return the president’s name followed by his political party in parens. Recall
that two pipe characters (“||”) is the concatenation operator; it appends strings
together.
On line 4, “$1” represents the first argument passed into this function. In this
case there is only one argument passed in, but if there were others, we could
have referenced them as $2, $3, $4 for the second argument, the third
argument, the fourth argument, etc.
We can call the function like this:
select f_get_president_and_party(7);
The function queried the us_president table for president_id 7 and returned
the president_name and president_party columns, nicely formatted. We could
also call this function embedded in a SQL query like this:
1selectf_get_president_and_party(president_id)
2from us_president
3where president_id > 40;
In line 1, we got the president_id from the us_president table and sent it as an
argument to the f_get_president_and_party( ) function. The function returned
a president name and the political party shown within parens. On line 3, in
our “where” clause, we asked for only presidents that have a president_id
greater than 40, so the function retrieved 5 presidents from the us_president
table. The f_get_president_and_party( ) function was called 5 times, with 5
different president IDs, returning 5 rows.
In line 2, we said that this function returns text. Then the function’s body
goes from the “$$” symbol at the end of line 2 until the other “$$” on line 28.
The “begin” on line 8 goes with the “end” on line 27. This groups those lines
together in a code block. Notice that the “end” in line 27 has a semi-colon
after it, but the “begin” on line 8 does not.
On line 13, we have an “if statement”. The “if” on line 13 goes with the “end-
if” on line 15. If the value of the v_max_president_id variable is the same as
the value of the p_president_id parameter then we set the v_current_text
variable to the text “**Current President**”. If the value of the
v_max_president_id variable is not the same as the value of the
p_president_id parameter then we don’t set v_max_president_id, so it
remains equal to the empty string we set it to in line 7. Notice that the “end-
if” on line 15 has a semi-colon after it, but the “if” on line 13 does not.
In line 2, we promised that this function “returns text”. In line 24, we actually
return that text. We return a text value that is created by combining the
p_president_id parameter and the v_president_name, v_president_party, and
v_current_text variables formatted with punctuation. Line 24 is a long line so
we continued it on line 25. Postgres knows we are done with the line when it
sees the semi-colon at the end of line 25.
Let’s test the function by calling it with an argument of 33 and then again
with an argument of 45.
select f_get_president_note_current(33);
President ID: 33 Name: Harry Truman Political Party: Democratic
select f_get_president_note_current(45);
We know the function is working because the data appears in the format we
expected and the current president shows the text “**Current President**”.
Return Types
Functions can return a value of any Postgres data type. The example we just
saw returned a text value, but functions can also return other data types like
int, date, or boolean.
Functions can also return nothing. To define a function that doesn’t return a
value, declare the function to return “void”.
create or replace function f_no_return_type(p_president_id int)
returns void as $$
begin
insert into some_table (president_id)
select p_president_id;
end;
$$ language plpgsql;
Postgres knows that we want to send the value 33 into the p_president_id
parameter and the text “Republican” into the p_party parameter. It knows that
because of the order of the parameters and arguments. When we called the
function, we provided 33 first and when we created the function, we accepted
p_president_id first. So Postgres knows that we want 33 to be the
p_president_id value. This is called “positional notation”.
The equal sign followed by the greater than sign (“=>”) is intended to look
like an arrow, as if to say p_president_id points to 33 and p_party points to
“Republican”.
Using this named notation, we could switch the order that we provide the
arguments in, and the function would still work. Because we are specifying
the name of the parameter for each value, calling the function with the
arguments in a different order will work just as long as the correct parameter
name is still associated with the correct value.
select f_get_president(
p_party => 'Republican',
p_president_id => 33
);
We can use an “in” parameter to pass a value into the function. We can use
an “out” parameter to return a value from the function. An “inout” parameter
is used when we want to pass an argument into a function, change it, and then
return it. To send an array of values to a function, we can use “variadic”
mode.
f_multiply_by_5
15
In lines 6 and 7, we multiplied the p_in parameter by 5 and put the resulting
value into p_out. The p_out parameter gets returned from the function
because we declared its mode as “out” in line 3.
We could rewrite that function using just one “inout” parameter, like this:
select f_multiply_by_5(3);
f_multiply_by_5
15
Now the function accepts one parameter called p_inout on line 2. The value
of p_inout gets changed on lines 6 and 7 and is automatically returned to the
caller of the function because we declared its mode as “inout” on line 2.
Variadic mode is used less commonly than the in, out, and inout modes.
Variadic mode lets us pass an array into a function. An “array” is a group of
values that all have the same data type. Let’s write a function that accepts an
array of integers and multiplies them together.
select f_multiply_together(3,5,2);
f_multiply_together
30
In line 2, we promised that this function will return an integer, which in this
case will be all the parameter values multiplied together.
In lines 4 and 5, we declared two variables. We will use the “v_tot” variable
to calculate the product of the parameter values. We will use the “v_ind”
variable for each individual value.
When we call the function with arguments of 3, 5, and 2, the function loops
through lines 7 – 9 three times. The first time v_ind is 3. The second time
v_ind is 5. The third time v_ind is 2.
The v_tot variable started out as 1 because we defined it as 1 in line 4. The
first time we looped through lines 7 – 9, v_tot became 3 because 1 times 3 is
3. The second time we looped through, v_tot became 15 because 3 times 5 is
15. The third time we looped through, v_tot became 30 because 15 times 2 is
30. When the function was finished looping through the parameter values, it
returned the value of 30 in line 10.
We passed three values into the function, but because the function was
written to accept an array of integers, it can accept any number of integer
values.
select f_multiply_together(1,2,3,4,5);
f_multiply_together
120
select f_multiply_together(12,4,9,8,1,333,1,87,5,2);
f_multiply_together
1001237760
PROCEDURES
Procedures, sometimes called “stored procedures”, are similar to functions.
Procedures were added to Postgres fairly recently, in version 11. In the real
world, I see functions being used more frequently than procedures in
Postgres.
We saw that you can call a function by using the “select” keyword, like this:
select f_multiply_by_5(3);
TRIGGERS
You can use a trigger to automatically run a Postgres function when some
event occurs. For example, when a new row gets inserted into a particular
table, you can have a trigger “fire” and call a function to insert a row in
another table.
Audit Triggers
One common use of triggers is for keeping track of which users have made
changes to a table. In cases where we have several different database users
making changes to data in an important table, it can be useful to create a
second table where we can log which user made which changes.
When new rows get inserted into the us_president table, let’s track which
user inserted them. To do that, we will create three new database objects:
When a user inserts a row for a new president in the us_president table, the
trigger will automatically fire and call the function. The function will write a
row to the us_president_insert_audit table. The audit table will store the name
of the user who inserted the row into the us_president table as well as what
time that row was inserted.
In line 3, we create a column called “insert_date” to track the date that the
new row was inserted into the us_president table.
In lines 4 – 6, we created columns in the audit table that will track the
president_id, president_name, and president_party columns that were inserted
into the us_president table. The same values for the columns that the user
inserted into the us_president table will be written to these columns in the
us_president_insert_audit table.
In line 1, I named the function starting with the text “tf_” for “trigger
function”, but this isn’t necessary. Postgres will allow you to pick your own
naming conventions. You can see there are no parameters defined between
the parens. For trigger functions, you don’t define any parameters.
In line 13, we got the user’s username by calling the “current_user” Postgres
function. You can test the current_user function by selecting your own
username from the database, like this:
select current_user;
current_user
Rick
In line 14, we got the date and time that the row was inserted by using the
Postgres now( ) function. By the way, you can test this function by selecting
the current date and time where you are.
select now();
now
2020-12-23 10:29:44.216481-04
In line 2, we specified that the trigger should fire “after insert”. That is, after
the insert has been made to the us_president table. When a user inserts a row
into the us_president table, the row will be written to us_president first and
then the audit record will be written to the us_president_insert_audit table
afterwards. Our options are “before”, “after”, or “instead of”. “Instead of”
means don’t insert the record to the us_president table at all; instead run the
function.
On line 3, we said “for each row”. That means for every row we insert into
the us_president table, the tf_us_president_insert_audit( ) function will be
called once. Our other option is “for each statement”. Consider this insert
statement:
insertinto us_president (
president_id,
president_name,
president_party
)
selectpresident_id,
president_name,
president_party
from a_table_with_10_rows;
This one insert statement is inserting 10 rows into the us_president table. If
our trigger specified “for each row”, it would call the
tf_us_president_insert_audit ( ) function 10 times. On the other hand, if we
specified “for each statement” in the trigger, it would call the
tf_us_president_insert_audit( ) function just once. Depending upon what you
are trying to accomplish, there are times that you might want to choose “for
each statement” or “for each row”.
Watching it Work
Now that the trigger is in place, any new rows that get inserted to the
us_president table will get audited to the us_president_insert_audit table. For
example, a user might add a new row to the us_president table using this
insert statement:
insertinto us_president (
president_id,
president_name,
president_party
)
values
(
46,
'Hillary Clinton',
'Democratic'
);
We might see this row in the us_president table and wonder who added that
data. Because we have set up an audit trigger, we can query the
us_president_insert_audit table to find out.
The audit trail shows that a database user named “HRC” created the row in
the us_president table on 1/21/2017 at 2:39pm Eastern Daylight Time. That
row of data doesn’t appear to be correct, so we might want to pay a visit to
the HRC user and possibly revoke the user’s database privileges. We should
probably also delete that row from the us_president table.
We could even write one trigger that fires for inserts, updates, and deletes.
create trigger tr_us_president_audit
after insert or update or delete on us_president
for each row
execute function tf_us_president_audit();
If you are writing a trigger to track rows that have been updated in a table, it
is helpful to know what the original values were and what the new values are.
For example, if a user runs this update statement:
updateus_president
set president_name = 'Harry S. Truman'
where president_name = 'Harry Truman';
Altering Triggers
If we wanted to rename a trigger, we can use the “alter trigger” command.
alter trigger tr_us_president_insert_audit on us_president
rename to tr_us_pres_ins_aud;
Dropping Triggers
To remove a trigger, we can use the “drop trigger” syntax.
drop trigger tr_us_president_insert_audit on us_president;
Notice that we need to tell Postgres that the trigger was associated with the
us_president table. If we are sure we won’t need them anymore, we could
also drop the other database objects that we created for use with the trigger.
drop function tf_us_president_insert_audit();
Disabling Triggers
If you have a trigger that you know you will not need again, you can drop the
trigger. Dropping the trigger will remove it permanently. On the other hand,
if you want to keep the trigger but prevent it from firing temporarily, you can
disable the trigger and reenable it when you want it to start firing again.
alter table us_president
disable trigger tr_us_president_insert_audit;
Notice that to disable a trigger, you alter the table that it is associated with.
Enabling Triggers
After you have disabled a trigger temporarily, you can enable it again by
using the “alter table” command.
alter table us_president
enable trigger tr_us_president_insert_audit;
As mentioned, Postgres comes with lots of functions that you can call from
your SQL statements. There are functions to transform strings, do
mathematical calculations, get today’s date, and perform all manner of other
helpful operations.
Postgres comes with over 2,000 functions. You won’t need to use – or even
know about – most of them. The majority of the functions that Postgres
provides are not commonly-used and are not shown here. Most of them can
be found on the Postgres website if you do ever have occasion to use them.
Let’s take a look at some of the built-in Postgres functions that you are most
likely to use.
AGGREGATE FUNCTIONS
We have already talked about aggregate functions. You will use these
functions frequently:
ABS()
The abs( ) function returns the absolute value of a number. “Absolute value”
measures how far away from zero a number is.
selectabs(8),
abs(-8);
abs abs
8 8
The results show that the absolute value of 8 is 8, and the absolute value of -8
is also 8.
Consider the following table that contains a list of meteorologists, their most
recent forecasted temperature, and the actual temperature:
Which meteorologist made the closest prediction? We can use the abs( )
function to help find out:
selectmeteorologist,
abs(forecasted_high – actual_high) degrees_off
from weather_forecast_vs_actual;
meteorologist degrees_off
Gail Winds 6
Sunny Cloudfoot 9
Storm Warning 21
We can see that Gail Winds was the most accurate with only 6 degrees
difference between her forecasted temperature and the actual temperature.
Sunny Cloudfoot was 9 degrees off, and Storm Warning’s forecast was
wrong by a whopping 21 degrees.
We subtracted the actual_high from the forecasted_high and then got the
absolute value of that number. If we had not used the abs( ) function, the
results would have been -6, -9, and 21. The abs( ) function had the effect of
removing the minus signs, which was useful for our purposes because we
don’t care whether the forecasted temperature was higher or lower than the
actual temperature.
We used a column alias of “degrees_off” so that the column header of the
result set would be more descriptive.
You can call Postgres functions in the “select list” of your SQL statements, as
shown above, and you can also call functions in the “where clause” of your
SQL statement, as shown below:
select*
from weather_forecast_vs_actual
where abs(forecasted_high – actual_high) < 10;
UPPER()
The upper( ) function takes a string and returns an all-uppercase version of it.
select upper('MiXed');
upper
MIXED
The upper function can be helpful in all sorts of scenarios. For example, you
may have two tables that were loaded from different data sources that use
different formatting rules for cities.
select * from city;
city state
Boston Massachusetts
Raleigh North Carolina
Sacramento California
city_name precipitation
BOSTON 47.1
RALEIGH 50.9
SACRAMENTO 21.4
If you were to join the tables, no rows would be returned because the cities
are all in uppercase in the yearly_precipitation table and therefore they
wouldn’t match the cities in the city table:
1selecta.city_name,
2 a.state,
3 b.precipitation
4from city a
5join yearly_precipitation b
6on a.city_name = b.city_name;
The query returns no rows, but if we change the query to join using the
upper( ) function on line 6, we get the expected results.
1selecta.city_name,
2 a.state,
3 b.precipitation
4from city a
5join yearly_precipitation b
6on upper(a.city_name) = upper(b.city_name);
In this case, we actually didn’t need to use the upper( ) function on the cities
in the yearly_precipitation table because they were already in uppercase
letters, but I recommend using upper( ) for both tables just to make sure you
are comparing apples to apples. In the future there could be cities in the
yearly_precipitation table that are not in uppercase letters.
By the way, it’s not a great idea to have different tables in your database that
represent cities in different ways. In a real, production database, we would
pick some standard format for cities and eliminate the need to call upper( ) in
our where clause. We might use upper( ) to help build our real tables based
on raw data tables that were loaded from some external source.
LOWER()
The lower( ) function does the opposite of what the upper( ) function does.
The lower( ) function takes a string and returns the all-lowercase version of it.
select lower('MiXed');
lower
mixed
The lower( ) function can also be used to join values that are in different
formats, in the same way that we used the upper( ) function above.
1selecta.city_name,
2 a.state,
3 b.precipitation
4from city a
5join yearly_precipitation b
6on lower(a.city_name) = lower(b.city_name);
INITCAP()
The initcap( ) function returns the first letter of each word as a capital letter
and all the other letters as lowercase. In other words, the initcap( ) function
takes the initial letter of each word and capitalizes it.
select initcap('george HERBERT wAlKeR BuSh');
initcap
George Herbert Walker Bush
Another example:
select initcap('e e cummings');
initcap
E E Cummings
ROUND()
The round( ) function takes a number with a decimal point and returns the
number rounded to its closest integer value.
select round(3.1415);
round
3
select round(3.5141);
round
4
round
3.50
This is useful when creating reports or displaying values to users who want to
see numeric values in a consistent format.
TRUNC()
The trunc( ) function truncates a number to a certain number of decimal
places. By default, it truncates all of the digits after the decimal point.
select trunc(3.1415);
trunc
3
select trunc(3.5141);
trunc
3
We got different results using the trunc( ) function than we did with the
round( ) function. When we called “trunc(3.5141)” it returned 3 because it
truncated – or chopped off – the “.5141”. The round( ) function, on the other
hand, rounded 3.5141 up to 4.
You can send the trunc( ) function a second parameter to tell it how many
places after the decimal point to keep:
select trunc(3.5141, 2);
trunc
3.51
Here it kept the 2 digits after the decimal point and truncated everything to
the right of that.
CEIL()
The ceil( ) function selects the closest integer that is more than the argument
passed in.
select ceil(3.1415);
ceil
4
select ceil(-3.1415);
ceil
-3
FLOOR()
The floor( ) function selects the closest integer that is less than the argument
passed in.
select floor(3.1415);
floor
3
select floor(-3.1415);
floor
-4
LENGTH()
The length( ) function counts the number of characters in a string.
select length('A long string');
length
13
Let’s get a list of the lengths of the cities in our city table:
selectcity_name,
length(city_name)
from city;
city_name length
Boston 6
Raleigh 7
Sacramento 10
Those results make sense because “Boston” has 6 letters, “Raleigh” has 7
letters, and “Sacramento” has 10 letters.
One of the aspects of Postgres functions that makes them so useful is that you
can string them together. Earlier we saw that we can use the max( ) function
to get the maximum, or highest, value. Let’s use the max( ) function together
with the length( ) function to see what the longest length for our cities is:
selectmax(length(city_name))
from city;
max
10
We selected each city_name from the city table and wrapped it in the length(
) function. Then we wrapped that in the max( ) function. At the end of the
first line, there are two right parens. The first right paren ends the call to the
length( ) function. The second right paren ends the call to the max( ) function.
The results show that the longest city name has 10 characters. That makes
sense because our longest city name, Sacramento, has 10 characters.
In a similar way, we could use the min( ) function to get the shortest
character length for our cities:
selectmin(length(city_name))
from city;
min
6
The number 6 is returned because the shortest city name in our table, Boston,
has six characters.
SUBSTR()
The substr( ) function takes a string and returns some part of it. It returns a
substring.
The substr( ) function takes a string, the position of the character in the string
that you want to start at, and the number of characters that you want to select
from the string.
code
811399100131
096244031005
789661001783
Barcodes are typically comprised of several parts. In our example, we want to
extract the manufacturing code, which starts on the 2nd character and is 5
characters long.
selectsubstr(code, 2, 5) mfg_code
from barcode;
mfg_code
11399
96244
89661
Calling “substr(code, 2, 5)” got us the values from the “code” column starting
at position 2 and going for 5 characters. We used a column alias of
“mfg_code” so that the results would display nicely under the heading
“mfg_code”.
If you want to get the string starting at a character position and going all the
way to the end of the string, you can leave off the 3rd argument to the substr(
) function:
selectcode,
substr(code, 9)
from barcode;
Here we are selecting the part of the “code” column that goes from the 9th
position all the way until the end of the value.
code substr
811399100131 0131
096244031005 1005
789661001783 1783
TRIM()
The trim( ) function returns a string with any extra characters at the beginning
and end of a string removed.
Sometimes when you load data from files into a database, the fields can have
extra spaces at the beginning or end of them. In the raw_data_load table
below, each of the column values have some number of spaces at both the
beginning and end.
select * from raw_data_load;
I have included quotes here to highlight where the leading and trailing spaces
are.
If we use the trim( ) function, it will remove the spaces from both ends of
each string:
selecttrim(col1),
trim(col2),
trim(col3)
from raw_data_load;
The trim function also allows you to remove characters other than the space
character. You can specify which character you want to remove. Let’s
remove the letter “k” from the beginning and end of the word “kayak”.
select trim('kayak', 'k');
aya
LTRIM()
The ltrim( ) function returns a string with the extra characters at the
beginning of a string removed. It trims characters from the left of the string.
Like the trim( ) function, the default character to remove is the space
character.
select ltrim (' Penny ');
This function returns “Penny ” with the spaces removed from the left side,
but not from the right side.
You can also specify which character you want to remove from the left
instead of removing the space characters. Let’s remove the leading zeroes
from “0000738”:
select ltrim('0000738', '0');
ltrim
738
RTRIM()
The rtrim( ) function, as you probably guessed, returns a string with the extra
characters removed from the right side of a string. The default character to
remove is a space character, but you can specify other characters. Here we
trimmed the text “Massachusetts” from the right side of the text “Boston
Massachusetts”:
select rtrim('Boston Massachusetts', 'Massachusetts');
rtrim
Boston
selectupper(
rtrim('Boston Massachusetts', 'Massachusetts')
);
upper
BOSTON
LEFT()
The left( ) function returns the first – or leftmost – characters of a string. We
can get the first 3 characters of our customers’ last names using this query:
selectlast_name,
left(last_name, 3)
from customer;
last_name left
Smith Smi
Guy Guy
Jones Jon
If we call the left( ) function with a negative number of, say -1, it will return
everything except the last 1 character:
selectlast_name,
left(last_name, -1)
from customer;
last_name left
Smith Smit
Guy Gu
Jones Jone
RIGHT()
The right( ) function returns the last – or rightmost – characters of a string.
We can get the last 3 characters of our customers’ last names using this
query:
selectlast_name,
right(last_name, 3)
from customer;
last_name right
Smith ith
Guy Guy
Jones nes
If you call the right( ) function with a negative number, say -2, it will return
everything except the first 2 characters:
selectlast_name,
right(last_name, -2)
from customer;
last_name right
Smith ith
Guy y
Jones nes
LPAD()
The lpad( ) function adds characters to the left of a value. It pads characters to
the left. You can specify the characters you want added and what the total
length should be.
Our send_check table has a column called “amt” that represents the amount
of the checks that we need to send out.
selectamt
from send_check;
amt
123
854
24
We can add asterisks to the beginning of the amount to prevent people from
fraudulently writing in their own numbers before our amounts:
selectlpad(amt, 5, '*')
from send_check;
lpad
**123
**854
***24
REPLACE()
The replace( ) function replaces some value in a string with a new value.
Let’s say we have a table of airport runways that includes runways for
“Orange County Airport”. That airport was renamed to “John Wayne
Airport”, so we want to display “John Wayne Airport” in our results instead
of “Orange County Airport”.
selectrunway_name
from runway;
runway_name
Runway 172 – Orange County Airport
Runway 213 – Orange County Airport
Runway 985 – Washington National Airport
We can use the replace( ) function to replace “Orange County Airport” with
“John Wayne Airport”:
select replace(
runway_name,
'Orange County Airport',
'John Wayne Airport'
)
from runway;
Notice that the replace( ) function didn’t change the entire string, just the part
that contained “Orange County Airport”. The text “Runway 172” and
“Runway 213” is still intact.
Also notice that we have not changed the values in the table with the replace(
) function, we are just selecting these values for display. If we wanted to
actually update the runway table, we could use the replace( ) function as part
of an update statement:
updaterunway
set runway_name =
replace(
runway_name,
'Orange County Airport',
'John Wayne Airport'
)
;
This update statement will change the runways that have “Orange County
Airport” in their name but won’t affect runways that do not.
FORMAT()
The format( ) function formats values based on a format string that you
provide.
selectformat(
'President %s was a %s',
president_name,
president_party
)
from us_president
where president_id between 31 and 34;
format
President Herbert Hoover was a Republican
President Franklin Roosevelt was a Democratic
President Dwight Eisenhower was a Republican
President Harry Truman was a Democratic
The format string used here is “President %s was a %s”. The “%s” is a
placeholder for a string value. The format string in this case has two
placeholders, the first one was replaced by the president_name value from the
us_president table, and the second one was replaced by the value of the
president_party column.
EXTRACT()
The extract( ) function selects parts from a timestamp or time value.
We saw earlier that the now( ) function returns the current date and time. The
data type that the now( ) function returns is a “timestamp with time zone”. It
includes several subfields that you can access by using the extract( ) function.
select now();
now
2020-10-28 12:13:00.439298-04
We can extract the century from the current date and time:
select extract(century from now());
date_part
21
We can extract the day from the current date and time:
select extract(day from now());
date_part
28
We can extract the month from the current date and time:
select extract(month from now());
date_part
10
EXP()
The exp() function returns the exponential of the number supplied.
select exp(1);
exp
2.718281828459045
PI()
The pi( ) function returns the value of pi.
select pi();
pi
3.141592653589793
POWER()
The power( ) function returns the value of a number raised to the power of
another number.
select power(3, 2);
power
9
RANDOM()
The random( ) function returns a random number between 0 and 1.
select random();
random
0.41278845070616654
POSITION()
The position( ) function finds the position of a substring within a string.
select position('pal' in 'principal');
position
7
The results tell us that the string “pal” is found at position 7 of the string
“principal”.
If we search for the position of a substring that doesn’t exist within a string,
Postgres will return a zero:
select position('i' in 'team');
position
0
Strangely, Postgres has another function called strpos( ) that does the same
thing that the position( ) function does, using slightly different syntax. The
strpos( ) function also returns the position of a substring within a string, but
with the strpos( ) function, you supply the string first and the substring
second, separated by commas:
select strpos('principal', 'pal');
strpos
7
VERSION()
The version( ) function tells you which version of Postgres you are running.
select version();
version
PostgreSQL 13.0 on x86_64-apple-darwin, compiled by Apple clang
version 11.0.3 (clang-1103.0.32.59), 64-bit
PGADMIN
Using PgAdmin to interact with Postgres will make life easier for you.
PgAdmin is a free and open source tool designed for Postgres. PgAdmin is a
graphical tool that runs in your web browser. PgAdmin is the Swiss Army
Knife of database tools. It has a slew of features for seemingly anything you
could ever want to do with Postgres.
You can download PgAdmin from the PgAdmin website. As of this writing,
it can be found at https://www.pgadmin.org/download/
The first time you use PgAdmin, it will ask you to create a master password.
Each time you start PgAdmin it will ask you for this password.
Connecting PgAdmin to a Postgres Server
To use PgAdmin with your Postgres dstabase servers, you will create
connections to the Postgres servers that you want to use. Here I created two
connections to two different servers. I called the connections “Server 1 –
Development” and “Server 2 – Production”.
The production environment is where the real, live data is stored. Production
Postgres servers are where the data that the users of our applications see is
located.
For this reason, some companies will give developers access to test or
development Postgres servers, but not to production servers. The thinking is
that there is no way for a developer to make a mistake in production if he or
she doesn’t have access to the production environment.
If you do have access to both production and development Postgres database
servers, however, it’s a good idea to mark each server connection clearly in
PgAdmin. That will help to prevent you from making a mistake in the
production environment. You want to make sure you don’t, say, delete data
from the production environment thinking that you are in the development
environment.
That will take you to the “General” tab of PgAdmin’s “Create – Server”
screen. The only value you need to enter on this tab is the server connection
name. You can call the connection whatever you want, but it’s a good idea to
name it something meaningful that describes what the server is used for.
As mentioned, if you have access to production Postgres servers as well as
non-production Postgres servers, I recommend that you clearly mark which
connection is which when you set up your server connections. Here I called
this connection “Postgres Essentials – Production”.
Another way to make it clear that this is a production server is to use color.
Some PgAdmin users will use the “Background” color picker to change a
connection’s background to red for a production database server.
Now let’s take a look at two of the most useful parts of PgAdmin: The Tree
Control pane and the Query Tool.
In other words, the bookstore table is on the Postgres database server with the
connection I called “Server 1 – Development”. That server has a database
called “Database1”. That database has a schema called “Schema1”. That
schema has a table called “bookstore”. That table has columns called store_id
and store_name. In the Tree Control pane, we can navigate to the bookstore
table by expanding nodes until we reach it.
The numbers shown in the parens to the right of “Servers” and “Databases”
tell us how many objects of those types there are. It says “Servers (2)”
because I have 2 server connections set up in PgAdmin. It says “Databases
(3)” because there are 3 databases under the “Server1 - Development” server.
The Tree Control pane is powerful because it lets us quickly drill down and
examine the part of the database we are interested in. It gives us a visual
sense of how our database objects are organized in Postgres, as well as what
naming conventions were used to create those objects.
The Query Tool
The Query Tool lets you type in some SQL, run that SQL in Postgres, and see
the results. To start the Query Tool, choose “Query Tool” from the “Tools”
drop-down menu.
Below, I entered the SQL “select * from us_president;” and clicked the
“execute” icon (the one shown at the top right), and the results appeared in
the lower “data output” pane. You can also press F5 instead of clicking the
execute icon to execute a query.
The Query Tool can be used for all sorts of SQL statements. You can use it to
run SQL to create tables, create functions, run procedures, create triggers, and
more.
Let’s say we have a table called customer that looks like this:
Now imagine that the boss tells us that we have to start tracking customer
middle names. We need to add a middle name column to the customer table.
We can do that by using the “alter table” command.
alter table customer add column middle_name text;
We added a new column to the customer table called middle_name and we
used the “text” data type. Now if we take another look at the table, we see
that the middle_name column is there, and all the middle name values were
set to null by default.
select * from customer;
Altering the table worked nicely. We didn’t lose any of our data and the new
column now exists. Now we could update the middle names for existing
customers and insert middle names for our new customers and we could
move on to our next task.
But there is something bugging me about this table. It is the order of the
columns. I would expect the middle_name column to come between the
first_name and last_name columns. But in our customer table, the “alter”
command added middle_name as the last column.
In relational databases like Postgres, the order of the columns doesn’t really
matter, so the “alter table / add column” statement just adds the column as the
last column in the table. Many Postgres developers would declare victory and
leave this table the way it is, with middle_name as the last column. After all,
when we select from the table, we can specify the column order we want, like
this:
selectcustomer_id,
first_name,
middle_name,
last_name
from customer;
To reorder the columns in the customer table without losing our data, we can
copy our customer data to another table temporarily, drop and recreate our
customer table with a new column order, and then copy the saved data from
the temporary table back to our new customer table.
-- Make a copy customer table copy that includes all data
create table customer_temp as
select * from customer;
Now if you select from the customer table the column order will be updated.
select * from customer;
Let’s take a look at a function that isn’t working properly. Let’s see if we can
figure out what the problem is. The function is supposed to compare the
number of Republican presidents with the number of Democratic presidents
and return the ratio of Democrats to Republicans. The problem is that the
function always returns 1.0. We would expect that about half of the
presidents that were either Democrats or Republicans would have been
Democrats, so the ratio should be about .5 or 50%, not 1.0 or 100%.
In lines 8 - 11, we are selecting the count of Democrats from the us_president
table. We put that value in a variable called v_num_democrats.
We know the function is returning 1.0 and we know that is wrong, but it’s a
little hard to look at the function and determine what the problem is. We
aren’t sure what part of the function is causing the problem. So let’s add
some raise statements to the function and see if that will help us to debug it.
We added three “raise notice” statements at lines 13, 20, and 24. These
statements will tell us the numbers of Democrats and Republicans that are
being returned from the database, as well as the ratio that was calculated in
line 22.
The value of the v_num_democrats variable will be shown where the percent
sign is, like “The count of Democrats is 15”.
When we run the function now, we can see the numbers being displayed from
our “raise notice” statements:
select f_dem_to_rep_ratio( );
NOTICE: The count of Democrats is 15
NOTICE: The count of Republicans is 0
NOTICE: The ratio is 1.00000000000000000000
This gives us a lot of insight into why the function is returning the wrong
ratio. The function is showing that there are zero Republicans. That can’t be
right. Now we can take a look at the part of the function that gets the number
of Republicans and determine what the problem is. The part of the function
that gets the count of Republicans looks like this:
selectcount(*)
into v_num_republicans
from us_president
where president_party = 'Republicans';
If we look at the rows of the us_president table we see that there are no rows
for ‘Republicans’ in the table. The values are stored in the database as
‘Republican’, without the “s”.
Adding the “raise notice” messages helped us to figure out which part of the
function was causing the problem. That made fixing the function much
easier.
Now that we have fixed the problem, we can remove the “raise notice”
statements from the function.
ALL BALLS
One of the quirkier features of Postgres is the “all balls” syntax. “All balls” is
slang for all zeroes, since zeroes look like balls. If you want to insert all
zeroes into a column that has a time data type, you can surround the text
“allballs” in single quotes.
insert into start_time (twotz)
select 'allballs';
twotz
00:00:00
Postgres provides us with multiple ways to cast. There is the cast( ) function,
the cast operator (“::”), and there are other functions available to help us cast
like the to_char( ), to_date( ), and the to_number( ) functions.
Here is an example of a scenario where you might want to cast from one data
type to another: Let’s say we want to see the current year and month in
“YYYY-MM” format. We don’t need the day or the time. We know that
there is a now( ) function that can get the current date for us, and we know
there is a substr( ) function that we can use to get a part of a value that we
care about. Let’s use those functions:
select now();
now
2020-10-28 12:13:00.439298-04
We only need the “2020-10” part of the date, so let’s wrap the now( )
function in the substr( ) function to select just the first 7 characters.
select substr(now(), 1, 7);
Postgres returns an error message:
ERROR: function substr(timestamp with time zone, integer, integer)
does not exist
LINE 1: select substr(now(),1,7);
^ HINT: No function matches the given name and argument
types. You might need to add explicit type casts.
SQL state: 42883
Character: 8
The problem is that the substr( ) function expects the first argument it gets
sent to be a string, and we sent it the results of the now( ) function, which is a
datetime.
The error message that Postgres gave us tells us how to resolve the problem:
“You might need to add explicit type casts.” We need to send the substr( )
function a string instead of a datetime. Let’s try calling the now( ) function
and then casting the results to the text data type before we send it to the
substr( ) function.
select substr(now()::text, 1, 7);
substr
2020-09
It worked.
The “::” symbol (two colons) is the Postgres cast operator. The syntax “now(
)::text” calls the now( ) function and then casts the results to a text data type.
Another way to accomplish the same thing is to use the cast( ) function:
select substr(cast(now() as text), 1, 7);
substr
2020-09
We got the same results. The syntax “cast(now( ) as text)” also calls the now(
) function and then casts the results to the text data type, but this time we are
using the cast( ) function instead of “::”.
With the cast function, we specify the data type that we want to cast to after
the word “as”:
select cast('123' as int);
These functions take two arguments: The value that needs to be converted,
and the format to use when converting the value. The format is made up of a
set of character patterns that identify the parts of the field. This group of
formatting characters is sometimes called a “format mask”.
to_date( )
The to_date( ) function converts a value to a date. For example, if we wanted
to convert a string to a date, we might use this SQL statement:
select to_date('28 Oct 2020', 'DD Mon YYYY');
We know that the original value “28 Oct 2020” is a string because it is
surrounded by single quotes. The query will return the string “28 Oct 2020”
converted to a “date” data type.
to_date
2020-10-28
The “DD Mon YYYY” format that we used when calling the function is a set
of characters that identify which part of the string “28 Oct 2020” is the day,
month, and year. By using the format “‘DD Mon YYYY” we let Postgres
know that the “28” in our string is a 2-digit day, the “Oct” in our string is a 3-
character month that is capitalized, and the “2020” in our string is a 4-digit
year. With this information, Postgres can convert the string to a date.
Postgres provides more character patterns than you will ever need, but the
most common patterns for dates are:
DD Day of month (01–31)
D Day of the week, Sunday (1) to Saturday (7)
MM Month number (01–12)
MONTHUppercase month name
Month Capitalized month name
MON 3-character uppercase month name
Mon 3-character capitalized month name
YYYY 4-digit year
YY 2-digit year
to_timestamp( )
The to_timestamp( ) function converts a value to a timestamp. Recall that a
“timestamp” data type in Postgres contains not only a date, but also a time
and a time zone. For that reason, the to_timestamp( ) function can make use
of some character patterns that weren’t useful for the to_date( ) function. The
most commonly-used of those character patterns are:
The query will return the string “28 Oct 2020 09:36:47 am” converted to a
“timestamp” data type. The format mask told Postgres how to parse the parts
of the string into the correct parts of the timestamp.
to_timestamp
2020-10-28 09:36:47-04
to_char( )
The to_char( ) function converts a value to a string.
to_char
Nov 01, 2020
Mar 14, 2021
Nov 07, 2021
Mar 13, 2022
Nov 06, 2022
The to_char( ) function can use some character patterns that we haven’t seen
before. These patterns are for converting numbers to strings. The most
commonly-used are:
selectto_char(123, '99999'),
to_char(123, '00000'),
to_char(1234.56, 'PL9G999.99');
You can see that the “99999” format mask did not display any leading zeroes,
but the “00000” format mask did. The “PL9G999.99” format mask added a
plus sign and a “group separator” to our number, which in the United States
displays as a comma. It also added a decimal point.
One of the wackiest character formats is “RN”, which will return a Roman
numeral:
select to_char(2021, 'RN');
to_char
MMXXI
to_number( )
The to_number( ) function converts a value to a number.
selectto_number(
'$1,234.56',
'L9G999.99'
);
to_number
1234.56
The following query does a calculation using the numbers from the table:
197.6923076923076923
That result doesn’t seem right to me. If the property tax was 50 and the
insurance was 40, those two values added together would be 90. If you
multiply 90 by 12 you get 1,080. If you divide 1,080 by 3.25 you get 332.30,
not 197.69. What’s going on here?
It turns out that Postgres didn’t solve the calculation in a left-to-right manner.
Instead, it did the multiplication and division first and then it added the
property tax. It multiplied the insurance of 40 by 12, which came to 480.
Then it divided 480 by the rate of 3.25, which came to 147.69, and then it
added the property tax of 50, for a total of 197.69.
That’s not what I had intended when I wrote the query, but the way Postgres
performed the calculation was actually correct. The problem was in the way I
wrote the query, but luckily there is an easy solution: Use parens.
332.3076923076923077
Here I added parens to group parts of the calculation. I put parens around
“property_tax + insurance” so that they would be calculated together. I also
put parens around “(property_tax + insurance) * 12”. Postgres honored my
groupings and produced the results that I wanted.
As a check, I like to count the number of left and right parens to make sure
they match. In “((property_tax + insurance) * 12) / rate” we have 2 left
parens and 2 right parens.
In its simplest form, you can copy data from a Postgres table to a file by
using the copy command like this:
copy us_president to '/home/pres/us_president.txt';
1, George Washington
2, John Adams, Federalist
3, Thomas Jefferson, Democratic-Republican
4, James Madison, Democratic-Republican
5, James Monroe, Democratic-Republican
…
We can see that the copy command copied all 45 rows from the us_president
table to the /home/pres/us_president.txt file. Postgres didn’t remove the rows
from the us_president table, it just created a data file with a copy of the
values from the us_president table.
The copy command lets us create files in different formats. For example, we
can create a CSV (Comma Separated Values) file by specifying “with csv”:
copy us_president to '/home/pres/us_president.csv' with csv;
The file now gets created with commas between each field:
> cd /home/pres/
> cat us_president.csv
1, George Washington,
2, John Adams,Federalist
3, Thomas Jefferson, Democratic-Republican
4, James Madison, Democratic-Republican
5, James Monroe, Democratic-Republican
…
Using SQL, let’s create a new file and use a pipe character as a delimiter
between the fields in the file:
copy essential_postgres.us_president to
'/home/pres/us_president_pipe.txt' with delimiter '|';
In the last data file we created, we saw commas between the fields, but in this
file we see pipe characters instead.
> cd /home/pres/
> cat us_president_pipe.txt
1|George Washington|
2|John Adams|Federalist
3|Thomas Jefferson|Democratic-Republican
4|James Madison|Democratic-Republican
5|James Monroe|Democratic-Republican
…
Now let’s look at loading data from a file into a table. Let’s load the rows
from our us_president.txt file into a new table called us_president_staging.
To load data from a file into a table, you can specify that you want to copy
“from” the file instead of copying “to” the file.
copy us_president_staging from '/home/pres/us_president.txt';
When you load data from a file into a database, it’s a good idea to load the
data into a staging table first and move the data to its permanent table later. A
“staging table” is a table that you create just for the purposes of loading data.
Using this approach, you can decrease the risk of causing problems with the
data in your permanent table if there is a problem loading the data from the
file.
First, let’s create the us_president_staging table with the same column
definitions as the us_president table, but with no rows of data:
Now we can load data from the data file into our staging table using SQL:
copy us_president_staging from '/home/pres/us_president.txt';
Now we could load the contents of the staging table into a permanent table.