Essential Postgres Database Development Using PostgreSQL (Rick Silva)
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted,
in any form without prior written permission of the author.
Every effort has been made to ensure the accuracy of the information presented. However, the
information contained in the book is sold without warranty, either express or implied. The author will not
be held liable for any damages caused directly or indirectly by this book.
Introduction
Audience
Conventions
Ellipses Used in Examples
Parens
4 Built-In Functions
Aggregate Functions
abs( )
upper( )
lower( )
initcap( )
round( )
trunc( )
ceil( )
floor( )
length( )
substr( )
trim( )
ltrim( )
rtrim( )
left( )
right( )
lpad( )
replace( )
format( )
extract( )
exp( )
pi( )
power( )
random( )
position( )
version( )
This book, Essential Postgres, will give you exposure to the most used – and
useful – parts of Postgres development, as well as tips and insights that I
have picked up over my years of working with Postgres. It explains how to
write SQL statements; how to create tables, functions, triggers, and views;
and how to maintain data integrity.
AUDIENCE
This book is suitable for anybody who wants to learn to use Postgres. That
includes folks who are new to Postgres and databases, intermediate-level
developers who would like a refresher, and even seasoned software
developers who are transitioning from another database to Postgres. This
book is for anybody who is interested in learning about the essentials of
using the Postgres database.
The aim of this book is to show you how to do things with the Postgres
database. The book is short on theory and long on examples. If you are the
type of person who likes that sort of thing, read on.
CONVENTIONS
In this book, I have made some stylistic choices that you might want to be
aware of.
ELLIPSES USED IN EXAMPLES
In this case, a full set of all 46 presidents was returned from the database, but
I omitted presidents 4 – 43 for the sake of brevity.
PARENS
I often refer to parentheses as “parens” because that’s the way I most often
hear them referred to among developers.
Also, I add parens to the end of function names. For example, Postgres has a
function called “upper” that I write as “upper( )”. You would say this aloud
as “the upper function”, but in this book I add the “parens” to the end of
the function name to make it clear that “upper( )” is a function. Functions are
explained later in this book, and we’ll see what the “parens” are for then.
1
SELECTING DATA FROM A POSTGRES DATABASE
For example, here is a table called us_president that has been defined with
three columns: president_id, president_name, and president_party.
This table contains 46 rows of data, one row for each president.
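As a rough sketch, a table like this could be created and loaded as shown below. The column data types, the primary key, and the choice of three sample rows are assumptions made for illustration; the actual table definition is not shown in this section.

```sql
-- Hypothetical definition: the data types and constraints are assumed.
create table us_president (
    president_id    integer primary key,
    president_name  text not null,
    president_party text              -- null when the president had no party
);

-- Three sample rows only; the full table would hold all 46 presidents.
insert into us_president (president_id, president_name, president_party)
values (1, 'George Washington', null),
       (2, 'John Adams', 'Federalist'),
       (3, 'Thomas Jefferson', 'Democratic-Republican');
```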
When naming database tables, it’s a good idea to stick with singular names
like us_president rather than plural names like us_presidents. Although
Postgres will allow you to name tables in the plural form, table names should
be singular by convention.
INTRODUCING SQL
To interact with Postgres, you will use Structured Query Language
(SQL). Using SQL, you
can create tables, show the contents of tables, create new rows, update
values, and much more.
For example, you could see the data in our us_president table by using this
SQL select statement.
select president_id,
president_name,
president_party
from us_president;
This select statement returns data from the us_president database table and
displays it for you. Here we chose to select the president_id,
president_name, and president_party columns. Note that you should type a
semicolon at the end of your SQL statements. The results are:
The query returns all 46 US presidents (although I have not shown presidents
4 – 43 for brevity).
Now let’s say you want to show only the rows for Republicans in the table.
You could make a change to the SQL and add a “where” clause.
select president_id,
president_name,
president_party
from us_president
where president_party = 'Republican';
The SQL query now returns 19 rows because there are 19 Republicans in the
table.
Since we are now retrieving a list of Republicans only, let’s change the SQL
to not select the president_party column. Let’s select just the president_id
and president_name columns.
select president_id,
president_name
from us_president
where president_party = 'Republican';
president_id president_name
16 Abraham Lincoln
18 Ulysses Grant
19 Rutherford Hayes
20 James Garfield
21 Chester Arthur
23 Benjamin Harrison
25 William McKinley
26 Theodore Roosevelt
27 William Taft
29 Warren Harding
30 Calvin Coolidge
31 Herbert Hoover
34 Dwight Eisenhower
37 Richard Nixon
38 Gerald Ford
40 Ronald Reagan
41 George Herbert Walker Bush
43 George W. Bush
45 Donald Trump
The SQL query didn’t show the president_party column because we didn’t select
it, and again, the query returned 19 rows.
ORDERING ROWS
Now let’s display all the rows and columns in the table ordered by political
party and then by the president’s ID. To do that, we can add an “order by”
clause to our SQL statement.
select president_id,
president_name,
president_party
from us_president
order by president_party,
president_id;
Adding the “order by” clause had the effect of showing us the rows ordered
alphabetically by the president_party column and then ordered by the
president_id column in lowest to highest order numerically. Democrats were
shown first because the string “Democrat” comes alphabetically before the
other political parties (“Democratic-Republican”, “Federalist”,
“Republican”, and “Whig”). Of the Democrat presidents, Andrew Jackson
appeared before Martin Van Buren because Jackson’s president_id value
was lower.
You can also specify whether you want to order in ascending or descending
order. That is, in low-to-high order or in high-to-low order.
select president_id,
president_name,
president_party
from us_president
order by president_party desc,
president_id asc;
The query returned all 46 rows (again, I haven’t shown all the rows for
brevity). The data is displayed in descending (reverse) order for the
president_party column and then displayed in ascending (low-to-high) order
for the president_id column. If you don’t specify descending or ascending
(which can be abbreviated “desc” or “asc”), Postgres defaults to ascending
order and you will see your results in low-to-high order.
NULL VALUES
What’s going on with George Washington? Why is his political party listed as
null? Null is a special value in Postgres that represents a missing or unknown value.
George Washington was never a member of a political party so his
president_party value is set to null in our database table.
To show only the rows for presidents that have a null president_party, you
could run this query:
select president_id,
president_name,
president_party
from us_president
where president_party is null;
On the other hand, if you wanted to exclude rows that have a null
president_party, you would run this query:
select president_id,
president_name,
president_party
from us_president
where president_party is not null;
Null is a special value in Postgres. Null is not the same as zero, and it is not
the same as an empty string or a space character. It is its own value, and
Postgres has special syntax to help you deal with it.
The word “syntax”, in software development speak, means the words and
symbols you can use that are part of the language. Some of the special syntax
Postgres has to handle null values includes “is null”, “is not null”, and
“coalesce”. (We will see “coalesce” in a moment.)
FORMATTING SQL
So far, the SQL we have used has been in a nice readable format.
1 select president_id,
2 president_name,
3 president_party
4 from us_president;
The column names and the table name all line up together vertically. There is
only one column name on each of lines 1 - 3. It is a good idea to write your
SQL statements in a neat, maintainable format like this, but SQL will also
allow you to write SQL statements that look like this:
or like this:
As long as SQL can figure out what you intended, it will run your statement.
You can add extra spaces between commas if you want. You can write the
entire statement on one line or you can use many lines.
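For instance, Postgres should treat each of the following as the same query as the neatly formatted version. These particular layouts are illustrations only, not the book's original examples.

```sql
-- Everything on one line
select president_id, president_name, president_party from us_president;

-- Uneven spacing and line breaks: harder to read, but still valid
select president_id   ,
president_name,president_party
      from
us_president;
```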
You can also use the asterisk wildcard character as a shortcut to tell SQL to
retrieve all columns.
You can also use “--” to add a comment at the end of a line of SQL.
select president_id,
president_name,
president_party -- Political party, not birthday
from us_president;
To add comments that will be spread over more than one line, you can also
use “/*” at the beginning and “*/” at the end of the comment.
/*
This query retrieves all the US Presidents.
There sure are a lot of them!
*/
select * from us_president;
The string “Say it loud” is an argument to the function. In other words, the
string “Say it loud” is sent to the upper( ) function, and the upper( ) function
will use that value to do something. In this case, it returns the uppercase
version of it, which is “SAY IT LOUD”.
Note that Postgres allows you to use a “select” statement without the “from”
clause. Normally, you would use a select statement to select data from a
database table, but in this case, since we are passing a hardcoded string
(“Say it loud”) into the function, we do not need to specify which database
table to get the data from. We don’t need to say “from table_name” because
there is no table involved. We can just “select upper('Say it loud')” without
the word “from”.
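Written out as a complete statement, that call looks like the sketch below. Note that SQL needs plain, straight single quotes around the string, not the curly quotes a word processor might produce.

```sql
-- No "from" clause is needed because no table is involved.
select upper('Say it loud');
```

This should return the single value SAY IT LOUD.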
To call the upper( ) function and pass a value to it from the database, we can
use a SQL statement like this:
select upper(president_name)
from us_president
where president_id = 10;
upper
JOHN TYLER
In our us_president table, president 10 is “John Tyler”. We got “John Tyler”
from the president_name column of the us_president table and then used it as
an argument to the upper( ) function, which used it to return the text “JOHN
TYLER”.
For example, when we selected from the us_president table, we saw that
George Washington’s political party was set to null.
select president_name,
president_party
from us_president;
president_name president_party
George Washington null
John Adams Federalist
Thomas Jefferson Democratic-Republican
… …
If we want to display “No Political Party” instead of null, we could use the
coalesce( ) function.
select president_name,
coalesce(president_party, 'No Political Party')
from us_president;
president_name coalesce
George Washington No Political Party
John Adams Federalist
Thomas Jefferson Democratic-Republican
… …
The coalesce( ) function takes a list of values and returns the first value that
isn’t null. In this case, we sent the coalesce( ) function two values: the
president_party column from the us_president table, and the text “No
Political Party”. If the president_party column in the us_president table is not
null, it will be returned. But if the president_party column in the us_president
table is null, the text “No Political Party” will be returned. The effect is that
each president’s political party is shown, and for George Washington, null is
replaced by the text “No Political Party”.
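Because coalesce( ) accepts a list of values, it is not limited to two arguments. Here is a small sketch using hardcoded values (chosen only for illustration) to show that the first non-null value wins:

```sql
-- The first two arguments are null, so the third argument is returned.
select coalesce(null, null, 'third value', 'fourth value');
```

This should return “third value”. The fourth argument is never used because a non-null value was already found.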
select count(*)
from us_president;
This count( ) function will return the number of rows in the us_president
table, which is 46. Note that with the count( ) function you use an asterisk
between the parentheses, like this: count(*).
select max(president_id)
from us_president;
This max( ) function returns the maximum president_id value in the table. The
row in this table with the highest president_id has a president_id of 46, so 46
is returned.
select min(president_id)
from us_president;
This min( ) function returns the minimum president_id value in the table. The
lowest president_id in the table is 1, so 1 is returned.
select president_party,
count(*)
from us_president
group by president_party;
president_party count
null 1
Republican 19
Democratic-Republican 4
Federalist 1
Democrat 17
Whig 4
The query’s results show us that there was one president without a political
party, 19 Republicans, 4 Democratic-Republicans, 1 Federalist, 17
Democrats, and 4 Whigs.
Let’s look at that last line of SQL that says “group by president_party”. We
said that an aggregate function returns a single value based on multiple
values in the database. For example, there are 19 rows in the table with
Republican presidents and the count( ) function returns just one row for them
that shows the total of 19. The “group by” in the SQL statement tells Postgres
to group the results by the president_party column.
select president_party,
count(*)
from us_president;
To correct the problem, add the “group by” clause and tell SQL which
column to group by.
Here is another example. Let’s say we want to list the highest president_id
value for each political party. We want to do something like this:
select president_party,
max(president_id)
from us_president;
But the query fails because we forgot the “group by” clause.
select president_party,
max(president_id)
from us_president
group by president_party;
president_party max
null 1
Republican 45
Democratic-Republican 6
Federalist 2
Democrat 46
Whig 13
select continent,
country,
sum(population),
sum(area)
from world_population;
This query will not run because we forgot the “group by” clause. But what
should be in the “group by” clause? Everything in the select clause that isn’t
an aggregate function.
select continent,
country,
sum(population),
sum(area)
from world_population
group by continent,
country;
select president_name,
count(*)
from us_president
group by president_name;
I would expect this query to return a list of all the presidents with a count
of 1. After all, it’s not like there were two presidents named George
Washington. But when we run the query, we see one president comes back
with a count of 2.
president_name count
… …
Ronald Reagan 1
John Tyler 1
Grover Cleveland 2
Chester Arthur 1
George Washington 1
… …
What is going on here? If we run a query to look at the data in the table, we
can see an interesting wrinkle in the data.
select president_name,
count(*)
from us_president
group by president_name;
president_name count
… …
Franklin Pierce 1
Ronald Reagan 1
John Tyler 1
Grover Cleveland 2
Chester Arthur 1
George Washington 1
… …
We had to look through all of those 45 resulting rows to see if there were any
counts that were more than 1. But there is an easier way. We can display only
counts that are higher than 1 by using the “having” clause.
select president_name,
count(*)
from us_president
group by president_name
having count(*) > 1;
president_name count
Grover Cleveland 2
The “having” clause is a bit like the “where” clause, but it can be used after
a “group by” clause to operate on grouped rows.
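To make the difference concrete, here is a sketch that uses both clauses in one query against the us_president table. The “where” clause filters individual rows before they are grouped, and the “having” clause filters the groups after they are counted. The threshold of 4 is an arbitrary value picked for this example.

```sql
select president_party,
       count(*)
from   us_president
where  president_party is not null   -- filters individual rows before grouping
group by president_party
having count(*) > 4;                 -- filters whole groups after counting
```

Going by the party counts shown earlier, only the Republican and Democrat groups (19 and 17 presidents) should be returned.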
select president_id,
president_name,
president_party
from us_president
where president_party = 'Republican';
party_id party_name
1 Republican
2 Democratic-Republican
3 Federalist
4 Democrat
5 Whig
We should also change the us_president table and replace the president_party
column with a party_id column that matches the party_id from the
political_party table.
Now the text “Republican”, “Democrat” and “Whig” isn’t repeated over and
over in the table, which is an improvement.
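A minimal sketch of what the two tables might look like after this change is shown below. The data types, the primary keys, and the foreign key reference are assumptions for illustration, and only three presidents are loaded as sample rows.

```sql
-- Hypothetical definitions: data types and constraints are assumed.
create table political_party (
    party_id   integer primary key,
    party_name text not null
);

insert into political_party (party_id, party_name)
values (1, 'Republican'),
       (2, 'Democratic-Republican'),
       (3, 'Federalist'),
       (4, 'Democrat'),
       (5, 'Whig');

-- us_president now stores a party_id instead of the party name.
-- (Drop any earlier version of the table first if you have created one.)
create table us_president (
    president_id   integer primary key,
    president_name text not null,
    party_id       integer references political_party (party_id)  -- null for no party
);

insert into us_president (president_id, president_name, party_id)
values (1, 'George Washington', null),
       (2, 'John Adams', 3),
       (16, 'Abraham Lincoln', 1);
```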
1 select us_president.president_id,
2 us_president.president_name,
3 political_party.party_name
4 from us_president
5 inner join political_party
6 on us_president.party_id = political_party.party_id;
We are selecting from two database tables: the us_president table in line 4 of
our SQL and the political_party table in line 5. Both tables have a party_id
column so we join the tables based on the party_id column of both tables in
line 6.
Postgres allows you to do different types of joins, like inner joins, outer
joins, and cross joins. These will be explained momentarily, but you can see
in line 5 that this query does an inner join.
Since we are now selecting from two tables in our query, every time we
reference a column name, we should specify which table the column comes
from. We can do that by using the table name followed by a period followed
by the column name. For example us_president.president_id refers to the
president_id column in the us_president table.
1 select us_president.president_id,
2 us_president.president_name,
3 political_party.party_name
4 from us_president
5 inner join political_party
6 on us_president.party_id = political_party.party_id;
In this query, the president_id and president_name columns come from the
us_president table, and the party_name column comes from the
political_party table.
TABLE ALIASING
One time-saving shortcut we can use when writing SQL is to make a short
alias for our table names. A table alias is a short, temporary, substitute name
for the table. We can save keystrokes by using an alias of “a” for the
us_president table, and “b” for the political_party table.
1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a
5 inner join political_party b
6 on a.party_id = b.party_id;
This query returns the same information as the previous query, but it requires
us to do a lot less typing to create the query.
On line 4 we declare that “a” will be the alias for the us_president table. On
line 5 we declare that “b” will be the alias for the political_party table. We
can choose any letter or letters we want, but this example uses “a” and “b”.
Now we can type just “a” or “b” instead of the table names in lines 1, 2, 3,
and 6. The benefit of table aliasing, as you can see, is that it saves us quite a
bit of typing.
You may see queries that use a slight variation of this syntax, using the word
“as” to define table aliases.
1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president as a
5 inner join political_party as b
6 on a.party_id = b.party_id;
You can see on lines 4 and 5 the word “as” was used between the table
names and the alias names. It’s fine to use the “as” syntax, but since the
objective of using table aliases is to cut down on typing, personally I prefer
not to use the word “as”.
TYPES OF JOINS
Inner Joins
Inner joins are the most common type of join. In an inner join, there must be a
match in both tables in order for data to be retrieved from the database. Let’s
take another look at the inner join from the example above.
1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a
5 inner join political_party b
6 on a.party_id = b.party_id;
The query joins the us_president and political_party tables based on the
party_id columns of each of the tables.
The reason is that the political_party table has one row for each of these
parties: Democrat, Democratic-Republican, Federalist, Republican, and
Whig. But there is one president in the us_president table that isn’t from any
of those parties. George Washington’s party in the us_president table is null.
So, when we joined the two tables, there was no match for George
Washington’s party_id and he did not appear in the results of the query.
That is the nature of an inner join. In an inner join, there must be a match in
both tables in order for the data to be retrieved.
Also, where the “inner join” is specified on line 5, the word “inner” is
optional. The default join type is inner. So this query:
1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a
5 inner join political_party b
6 on a.party_id = b.party_id;
will produce the same results as this query that does not use the word
“inner” on line 5:
1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a
5 join political_party b
6 on a.party_id = b.party_id;
Let’s look at another example of an inner join using different tables. Let’s say
we have a customer table and an item_ordered table. The customer table and
the item_ordered table both have a customer_id column so we can join the
tables based on customer_id.
We can use this inner join to see which customer ordered which item.
select a.customer_name,
b.item_ordered
from customer a
join item_ordered b
on a.customer_id = b.customer_id;
Remember, the word “inner” is optional. This query does an inner join even
though the syntax says “join” instead of “inner join”. Inner is the default type
of join.
In this query we saved typing by aliasing the “customer” table as “a”, and the
“item_ordered” table as “b”.
customer_name item_ordered
Penny Sato Hat
Dixie Gooseman Gloves
Since this was an inner join, results appeared only when there was a match
in both tables. For this query, that means only customers who ordered an item
will appear in the results. “Molly Terry” is in the customer table but she
doesn’t have an order in the item_ordered table so she doesn’t appear in the
results at all.
Outer Joins
Outer joins work a little differently. An outer join will display all rows from
one table, whether or not there is a matching row in the other table. If there
are matching rows in the other table, it will show the columns from that table
too.
1 select a.customer_name,
2 b.item_ordered
3 from customer a left outer join item_ordered b
4 on a.customer_id = b.customer_id;
customer_name item_ordered
Penny Sato Hat
Dixie Gooseman Gloves
Molly Terry null
Using an outer join, Molly Terry appears in the results of the query, but the
item_ordered column appears as null because there was no matching row in
the item_ordered table. In other words, SQL can’t show us which item she
ordered because she didn’t order an item, so it displays null.
The customer table is the left table because it is to the left of the “left outer
join” syntax on line 3. The item_ordered table is the right table because it is
to the right of the “left outer join” syntax.
1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a left outer join political_party b
5 on a.party_id = b.party_id;
Using this outer join, the query returns all 46 presidents. George Washington
appears in the results even though he has no matching political party in the
political_party table. This is different than the inner join we did on these
tables that returned only 45 rows.
The outer join worked as advertised. It showed all of the rows in the
us_president table and showed matching party_name columns from the
political_party table where there were some.
I like to think of the tables that are joined in an outer join as a mandatory
table and an optional table. In order for the president data to appear in the
results, it is mandatory that rows exist in the us_president table. But having
matching rows in the political_party table is optional. Rows will be returned
in our result set even if there isn’t a matching row in the political_party table.
The first table mentioned on line 4 is the us_president table. It is the left table
because it is to the left of the “left outer join” syntax. The political_party table is
the right table because it is to the right of the “left outer join” syntax. Or
more simply stated, the us_president table is on the left in line 4 and the
political_party table is on the right.
This is how Postgres knows to select all rows from the us_president table
and only matching rows from the political_party table. “left outer join”
returns all the rows from the left table (us_president) even if there are no
matches in the right table (political_party).
Also note that the word “outer” is optional. I recommend that you use it for
clarity, but you will undoubtedly come across outer joins that use the syntax
“left join” instead of “left outer join”. The same results will be returned.
You can also do an outer join with the “right outer join” syntax rather than
“left outer join”. The “right outer join” syntax tells Postgres to do an outer
join and treat the table on the right side as the mandatory table.
1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from us_president a left outer join political_party b
5 on a.party_id = b.party_id;
1 select a.president_id,
2 a.president_name,
3 b.party_name
4 from political_party b right outer join us_president a
5 on a.party_id = b.party_id;
Notice that in the SQL statement with the right outer join, the order of the
tables has been switched. The political_party table is now the left table and
the us_president table is now on the right side.
Cross Joins
Cross joins are used less frequently than inner and outer joins, but there are
occasions when they come in handy.
Let’s say a chain of bookstores wants to send one copy of each new book to
each of their stores. They have a table called bookstore that looks like this:
store_id store_name
1 Mall Store
2 College Campus Bookstore
3 Suburban Store
4 Cellar Store
And they have a table called new_book that looks like this:
book_id book_name
100 Essential Postgres
200 How to Knit
300 Poetry for Dogs
select b.store_name,
n.book_name
from bookstore b
cross join new_book n;
store_name book_name
Mall Store Essential Postgres
Mall Store How to Knit
Mall Store Poetry for Dogs
College Campus Bookstore Essential Postgres
College Campus Bookstore How to Knit
College Campus Bookstore Poetry for Dogs
Suburban Store Essential Postgres
Suburban Store How to Knit
Suburban Store Poetry for Dogs
Cellar Store Essential Postgres
Cellar Store How to Knit
Cellar Store Poetry for Dogs
A cross join joins every row of one table to every row of another table.
There are four rows in our bookstore table and three rows in our new_book
table. So it makes sense that 12 rows are returned by the cross join because 4
x 3 = 12.
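If you want to check that arithmetic, one quick sketch is to count the rows that the cross join produces, using the same bookstore and new_book tables:

```sql
-- Counts every store/book combination produced by the cross join.
select count(*)
from   bookstore
cross join new_book;
```

With four stores and three new books in the tables, this should return 12.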
Self Joins
A self join is a join where the same table is joined to itself.
Each row represents one position in the church. The supervisor_id column
represents the position_id of the supervisor. For example, position_id 5 –
“Priest” – has a supervisor_id of 4. If we look at the row in the table that has
a position_id of 4, we see Bishop. So, we know that the Priest’s supervisor
is the Bishop.
In order to select an org chart from the database, we can join the
church_hierarchy table to itself.
select a.position_desc,
b.position_desc
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;
position_desc position_desc
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest
In the result set, the left column is a list of supervisors and the right column is
a list of their subordinates.
By the way, when I named the column “position_desc” in the table I used
“desc” to stand for “description”. That column is a text description of the
position.
The table alias “a” represents the church_hierarchy table, and the table alias
“b” also represents the church_hierarchy table, but we are treating them as
different tables. We joined on the position_id column of the “a” table to the
supervisor_id column of the “b” table.
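To try the self join yourself, the table could be set up roughly as sketched below. The data types are assumptions, and the position_id values 1 through 3 are inferred from the query results rather than taken from a table listing; only the Bishop (4) and Priest (5) values are stated in the text.

```sql
-- Hypothetical setup: data types and position_ids 1-3 are assumed.
create table church_hierarchy (
    position_id   integer primary key,
    position_desc text not null,
    supervisor_id integer            -- position_id of the supervisor; null at the top
);

insert into church_hierarchy (position_id, position_desc, supervisor_id)
values (1, 'God',      null),
       (2, 'Pope',     1),
       (3, 'Cardinal', 2),
       (4, 'Bishop',   3),
       (5, 'Priest',   4);

-- "a" supplies the supervisor and "b" supplies the subordinate.
select a.position_desc,
       b.position_desc
from   church_hierarchy a
join   church_hierarchy b
  on   a.position_id = b.supervisor_id
order by a.position_id;
```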
COLUMN ALIASING
We have seen that we can create aliases for tables. Now let’s create aliases
for columns.
The query above gave us the information we needed for our org chart, but
both column names appeared as “position_desc”.
position_desc position_desc
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest
These column headings could be confusing to our end users. Really, the left
column is a list of supervisors and the right column is a list of workers. Let’s
change those column headings to show “Supervisor” and “Worker” instead of
the column headings from the table, “position_desc” and “position_desc”.
Now the query returns the same results, but the column headings make the
meaning of the data clearer.
Supervisor Worker
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest
Just as with table aliases, you may see queries that use the word “as” when
defining column aliases, like this:
Either way is fine. You can use the word “as” or you can leave it out.
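As a sketch of both styles, here is the org chart query written with column aliases, first without “as” and then with it. This is an illustration rather than the book's exact listing. One detail worth knowing: Postgres folds unquoted names to lowercase, so an alias needs double quotes if you want the heading to keep its capital letters.

```sql
-- Without "as": the headings will appear in lowercase (supervisor, worker).
select a.position_desc supervisor,
       b.position_desc worker
from   church_hierarchy a
join   church_hierarchy b
  on   a.position_id = b.supervisor_id;

-- With "as": double quotes preserve the capitals (Supervisor, Worker).
select a.position_desc as "Supervisor",
       b.position_desc as "Worker"
from   church_hierarchy a
join   church_hierarchy b
  on   a.position_id = b.supervisor_id;
```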
Using Parens
Postgres allows you to write SQL queries that accomplish the same results
using different syntax. We saw this query earlier.
1 select a.customer_name,
2 b.item_ordered
3 from customer a
4 join item_ordered b
5 on a.customer_id = b.customer_id;
You may come across queries where they use parens on line 5.
1 select a.customer_name,
2 b.item_ordered
3 from customer a
4 join item_ordered b
5 on (a.customer_id = b.customer_id);
select a.customer_name,
b.item_ordered
from customer a
join item_ordered b
using (customer_id);
Either approach is fine. You will come across queries that use both the “on”
and the “using” styles.
Old-School SQL
This inner join between the customer and item_ordered tables looked like
this.
select a.customer_name,
b.item_ordered
from customer a
join item_ordered b
on a.customer_id = b.customer_id;
select a.customer_name,
b.item_ordered
from customer a,
item_ordered b
where a.customer_id = b.customer_id;
This is an older style of SQL that is still supported by Postgres. This syntax
doesn’t include the word “join”; it just lists the table names separated by
commas. I recommend that you use the syntax shown in the first example, but
you may come across queries written using this older “ANSI-89” style
shown in the second example.
inventory
store_id product_id quantity
1 11 5
2 22 9
2 11 2
3 33 12
3 11 4
product
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300
supplier
supplier_id supplier_name
100 Missing Teeth Mfg.
200 Peg's Board Guitar Co.
300 Scratchy Cat Fiddle Co.
Let’s join these four tables to get an inventory list of the products at all of our
music stores:
1 select a.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5 from store a
6 join inventory d on d.store_id = a.store_id
7 join product b on b.product_id = d.product_id
8 join supplier c on c.supplier_id = b.supplier_id
9 order by a.store_name;
That query worked nicely. It was more complicated than our previous SQL
examples because the data we needed for this query was spread across four
tables. That required us to join across all four tables in order to produce our
inventory list.
In lines 5 – 8, we defined table aliases for all four tables. We used “a” for
the store table, “b” for the product table, “c” for the supplier table, and “d”
for the inventory table. That saved us quite a bit of typing.
In lines 6 – 8, we used inner joins on the tables and specified which columns
the tables should be matched on. Recall that the “join” syntax is shorthand for
“inner join”.
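If you would like to run the inventory query yourself, the four tables could be set up roughly as sketched here. The column data types and constraints are assumptions. The inventory, product, and supplier rows match the tables listed in this section, but the store rows are inferred from the store names and quantities in the query results, not copied from a table listing.

```sql
-- Hypothetical setup: data types and constraints are assumed.
create table store (
    store_id   integer primary key,
    store_name text not null
);

create table supplier (
    supplier_id   integer primary key,
    supplier_name text not null
);

create table product (
    product_id   integer primary key,
    product_name text not null,
    supplier_id  integer references supplier (supplier_id)
);

create table inventory (
    store_id   integer references store (store_id),
    product_id integer references product (product_id),
    quantity   integer not null
);

-- Store rows are inferred from the results, so treat the ids as a guess.
-- A single quote inside a string is written as two single quotes.
insert into store (store_id, store_name)
values (1, 'Bill''s Banjos'),
       (2, 'Mike''s Music Emporium'),
       (3, 'Tennessee Instruments');

insert into supplier (supplier_id, supplier_name)
values (100, 'Missing Teeth Mfg.'),
       (200, 'Peg''s Board Guitar Co.'),
       (300, 'Scratchy Cat Fiddle Co.');

insert into product (product_id, product_name, supplier_id)
values (11, 'Banjo', 100),
       (22, 'Guitar', 200),
       (33, 'Fiddle', 300);

insert into inventory (store_id, product_id, quantity)
values (1, 11, 5),
       (2, 22, 9),
       (2, 11, 2),
       (3, 33, 12),
       (3, 11, 4);
```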
So far all of the table aliases we have seen have been “a”, “b”, “c”, or “d”,
but you can make up your own table aliases. When you are writing a complex
query, you can make your SQL easier to read by making the table aliases
more meaningful. For example, you could write the query like this:
1 select st.store_name,
2 p.product_name,
3 sp.supplier_name,
4 i.quantity
5 from store st
6 join inventory i on i.store_id = st.store_id
7 join product p on p.product_id = i.product_id
8 join supplier sp on sp.supplier_id = p.supplier_id
9 order by st.store_name;
Instead of using “a”, “b”, “c”, and “d”, this time we used table aliases that
are shorthand for the table names. For example, “p” for the product table and
“i” for the inventory table. That can make it easier for you to write the join
condition on, say, line 7. It’s harder to lose sight of which tables you are
joining when the table alias gives you a hint like the first letter of the table.
Unfortunately, in this join, two of the tables start with the same letter: store
and supplier. Since there are two tables that start with the letter “s”, for those
tables we used two-letter table aliases of “st” for the store table, and “sp”
for the supplier table.
Also, when you write queries that are joining many tables, note that the order
of the joins matters. For example, let’s switch the order of lines 6 and 7 in
the query above.
1 select st.store_name,
2 p.product_name,
3 sp.supplier_name,
4 i.quantity
5 from store st
6 join product p on p.product_id = i.product_id
7 join inventory i on i.store_id = st.store_id
8 join supplier sp on sp.supplier_id = p.supplier_id
9 order by st.store_name;
This version of the query fails. Line 6 refers to the table alias “i”, but the
inventory table isn’t joined, and its alias isn’t defined, until line 7.
The mandolin has also been added to the product table, but the supplier was
set to null because we don’t know the supplier.
The trouble is that now our inventory list query isn’t showing the new
mandolin.
1 select a.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5 from store a
6 join inventory d on d.store_id = a.store_id
7 join product b on b.product_id = d.product_id
8 join supplier c on c.supplier_id = b.supplier_id
9 order by a.store_name;
These are the same results that we saw before. Why isn’t the mandolin
appearing?
You can see that this query uses all inner joins. Lines 6 – 8 use just the word
“join”, which is shorthand for an inner join. In order to display a product that
is in the product table but doesn’t have a supplier in the supplier table, we
need to do an outer join between the product and supplier tables. But we
want the other joins in the query to continue to be inner joins.
It turns out that all we need to do is add the word “left” to the query to
change the inner join to an outer join.
1 select a.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5 from store a
6 join inventory d on d.store_id = a.store_id
7 join product b on b.product_id = d.product_id
8 left join supplier c on c.supplier_id = b.supplier_id
9 order by a.store_name;
Recall that “left join” is shorthand for “left outer join”. So now we have a
query that does two inner joins (on lines 6 and 7) and one outer join (on line
8). Now the mandolin appears in our inventory list.
store_name             product_name  supplier_name            quantity
Bill's Banjos          Banjo         Missing Teeth Mfg.       5
Mike's Music Emporium  Banjo         Missing Teeth Mfg.       2
Mike's Music Emporium  Guitar        Peg's Board Guitar Co.   9
Tennessee Instruments  Banjo         Missing Teeth Mfg.       4
Tennessee Instruments  Fiddle        Scratchy Cat Fiddle Co.  12
Tennessee Instruments  Mandolin      null                     1
STRING CONCATENATION
Let’s say we want to get a list of presidents in this format: “President Name
(Political Party)”. To do that, we can take the president_name column and
add a space, then a left paren to it, then add the president_party value,
then add a right paren to it, like this:
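A minimal sketch of such a query, assuming the “||” string concatenation operator. The temporary table and its two sample rows are stand-ins for illustration, not the full us_president data set:

```sql
-- Scratch stand-in for the us_president table (assumed sample rows).
create temp table us_president (
    president_id    int,
    president_name  text,
    president_party text
);

insert into us_president (president_id, president_name, president_party)
values (44, 'Barack Obama', 'Democrat'),
       (45, 'Donald Trump', 'Republican');

-- "||" glues strings together: the name, a space and a left paren,
-- the party, and finally a right paren.
select president_name || ' (' || president_party || ')' as president
from   us_president
order by president_id;
```

Keep in mind that “||” produces null when either side is null, so a president with a null party would come back as null rather than as a name with empty parens.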
…
George W. Bush (Republican)
Barack Obama (Democrat)
Donald Trump (Republican)
Joe Biden (Democrat)
Greater Than
Using “>” we can list all the presidents with a president_id that is more than,
say, 40:
select *
from us_president
where president_id > 40;
Notice that president 40, Ronald Reagan, doesn’t appear in our results. To
show Reagan as well, we can use “>=” instead of “>”.
select *
from us_president
where president_id >= 40;
Less Than
Using “<” we can list all the presidents with a president_id that is less than,
say, 5:
select *
from us_president
where president_id < 5;
To include president 5 in the results as well, we can use “<=” instead of “<”:
select *
from us_president
where president_id <= 5;
Not Equal
There are two ways to represent “not equal” in SQL: “!=” and “<>”.
select *
from product
where product_name != 'Mandolin'
and product_id != 11;
In many programming languages, “!” means “not”, so “!=” reads as “not equal to”. This query
selects products where the product_name is not Mandolin and the product_id
is not 11 (Banjo).
select *
from product
where product_name <> 'Mandolin'
and product_id <> 11;
I think of the “<>” syntax as saying, “show me everything that is less than (<)
11, and also show me everything that is greater than (>) 11”, which is really
another way to say, “not equal to 11”.
Both queries produce the same results: They both produce a list of all the
products except for Mandolin and Banjo, because Banjo is product_id 11.
I personally like to use the “!=” syntax more than the “<>” syntax, but they do
the same thing. There is no advantage to using one over the other.
In
The “in” keyword checks that a value is in a list of values.
select *
from store
where store_id in (1,3);
store_id store_name
1 Bill's Banjos
3 Tennessee Instruments
Our query returns all rows from the store table where the store_id is a 1 or a
3. Another way to write this query is using the “or” keyword:
select *
from store
where store_id = 1
or store_id = 3;
The advantage to using the “in” syntax is that it’s less typing than using “or”,
and with the “in” clause, you can also select the list of values from the
database, like this:
select *
from store
where store_id in
(
select store_id
from south_east_store
);
The part of the SQL statement within the parens is called a subquery. We’ll
explore subqueries in more detail momentarily.
Between
The “between” keyword checks that a value is within a range of values.
select *
from product
where product_id between 11 and 33;
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300
Postgres returns a list of rows from the product table where the product_id is
between 11 and 33. Note that “between” returns rows where the product_id
equals 11 or equals 33, as well as any values that are between 11 and 33.
Like
The “like” keyword checks that a string value matches some pattern. In the
following query, we are returning values from the us_president table where
the values for president_name are like the pattern “George%”.
select president_name
from us_president
where president_name like 'George%';
president_name
George Washington
George Herbert Walker Bush
George W. Bush
This query returns the names of presidents where the president_name value
starts with the text “George”. The percent character represents any
characters. So, the pattern “George%” checks for “George” followed by any
characters at all.
The percent sign will also match no characters. If a president were named
just “George” with no last name, the “where president_name like
‘George%’” condition would return him as well.
The percent sign can be placed at any position in the string. In the query
below, the “%” is placed in the middle of the string, right between “George”
and “Bush”.
select president_name
from us_president
where president_name like 'George%Bush';
president_name
George Herbert Walker Bush
George W. Bush
When using “like”, you can also use the underscore character (“_”) to
represent one character. The following query searches for presidents with
names that have the pattern of “_onald%”. That is, any one character
followed by the text “onald” followed by anything at all.
select president_name
from us_president
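where president_name like '_onald%';
president_name
Ronald Reagan
Donald Trump
The “like” keyword is case sensitive. If we change the capitalization in the
pattern, the query returns no rows:
select president_name
from us_president
where president_name like '_oNaLd%';
To ignore case when matching, use the Postgres “ilike” keyword instead of
“like”:
select president_name
from us_president
where president_name ilike '_oNaLd%';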
president_name
Ronald Reagan
Donald Trump
Is Null
“is null” checks if a value is null. Recall that null is a special value in
Postgres that represents an empty value.
select *
from product
where supplier_id is null;
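Postgres comes with special keywords to deal with null values. A common
mistake is to use the syntax “where supplier_id = null”. Comparing anything
to null with “=” never evaluates to true, so that query returns no rows, even
when the table contains rows with a null supplier_id. The correct syntax is
“where supplier_id is null”.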
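The difference between the two spellings can be seen in a small sketch. The temporary table, the product_id of 44 for the mandolin, and the sample rows are assumptions for illustration:

```sql
-- Scratch stand-in for the product table (assumed sample rows).
create temp table product (
    product_id   int,
    product_name text,
    supplier_id  int
);

insert into product (product_id, product_name, supplier_id)
values (11, 'Banjo', 100),
       (44, 'Mandolin', null);

-- Returns no rows: "supplier_id = null" evaluates to null, never to true.
select *
from   product
where  supplier_id = null;

-- Returns the Mandolin row.
select *
from   product
where  supplier_id is null;
```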
Not
The word “not” can be used to negate other keywords. For example:
select *
from product
where supplier_id is not null;
select *
from product
where supplier_id not in (11, 22);
select president_name
from us_president
where president_name not like '_onald%';
select *
from product
where product_id not between 11 and 33;
“Not” has the effect of reversing the search criteria. “where supplier_id in
(11, 22)” returns only 11 and 22, but “where supplier_id not in (11, 22)”
returns everything except 11 and 22.
SUBQUERIES
A subquery is a query that is nested within another query.
select tornado_alert_message
from emergency_message
where exists
(
select *
from tornado_alert
where alert_flag = 'Y'
);
This SQL statement has an inner query and an outer query. The part of the
SQL statement that is the inner query is also called a subquery. In the SQL
statement below, I have added comments to show which part is the outer
query and which part is the inner query or subquery.
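Here is a sketch of that statement with the two parts labeled. The setup statements at the top are assumptions added so the example can run on its own (the column types are guesses); the query itself comes last:

```sql
-- Scratch stand-ins for the two tables (assumed definitions and rows).
create temp table emergency_message (tornado_alert_message text);
create temp table tornado_alert (state text, alert_flag text);

insert into emergency_message (tornado_alert_message)
values ('There is a tornado alert');

insert into tornado_alert (state, alert_flag)
values ('Massachusetts', 'N'),
       ('North Carolina', 'Y'),
       ('Florida', 'N');

-- outer query
select tornado_alert_message
from   emergency_message
where  exists
(
    -- inner query (subquery)
    select *
    from   tornado_alert
    where  alert_flag = 'Y'
);
```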
The word “exists” on line 3 checks for the existence of rows in the subquery.
The inner query runs first, and checks the tornado_alert table to see if any
rows exist where the alert_flag column is set to “Y”. Let’s take a look at the
tornado_alert table.
State alert_flag
Massachusetts N
North Carolina Y
Florida N
There is a row in the table with an alert_flag of Y. It is the row for North
Carolina. Since at least one row with a Y exists, the outer query then selects
the tornado_alert_message column from the emergency_message table and
returns it. The results are:
tornado_alert_message
There is a tornado alert
If there had been more than one row with an alert_flag of Y, the results would
have been the same. “Exists” checks that one or more rows exist in the
subquery.
If there had not been any rows returned by the subquery with an alert flag of
Y, nothing would have been returned by the SQL statement at all.
Correlated Subqueries
Let’s take another look at our music store data. Let’s get a list of all stores
that have more than 10 of an item in stock.
1 select store_name
2 from store s
3 where exists
4 (
5 select i.store_id
6 from inventory i
7 where i.store_id = s.store_id
8 and i.quantity > 10
9 );
In this SQL statement, we have an inner query and an outer query like in the
last example, but there is something different about this SQL statement. In
line 7, the store table from the outer query is joined with the inventory table
from the inner query. This is a correlated subquery because there is a
correlation between the inner and outer queries. In other words, the inner and
outer queries are joined.
Because the inner and outer queries are correlated, the query can show us not
only that there is a store with more than 10 of an item in stock, but it can
show us which store it is.
store_name
Tennessee Instruments
We could also try to get the same list with a subquery that isn’t correlated,
comparing the store_id values directly:
1 select store_name
2 from store s
3 where s.store_id =
4 (
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8 );
Instead of using the “exists” keyword, we are checking that the store_id in the
outer query matches the store_id in the inner query, on lines 3 and 5.
store_name
Tennessee Instruments
The trouble is that if the data in the inventory table changes, and a second
store begins stocking more than 10 of an item, our query would fail.
Right now, if we were to run just the inner query by itself, we would see
that it returns one row that represents store_id 3, “Tennessee Instruments”.
select i.store_id
from inventory i
where i.quantity > 10;
store_id
3
But now if another store, say “Bill’s Banjos”, got a new shipment of banjos,
the inner query would return two rows.
select i.store_id
from inventory i
where i.quantity > 10;
store_id
3
1
Now when we run our query, Postgres returns an error instead of a list of
stores:
1 select store_name
2 from store s
3 where s.store_id =
4 (
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8 );
That is because the equal sign on line 3 tells SQL to expect only one value to
be returned from the subquery. When the subquery returns more than one
value, Postgres returns an error.
We can fix the query by replacing the equal sign on line 3 with the “in”
keyword:
1 select store_name
2 from store s
3 where s.store_id in
4 (
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8 );
Now that we have used “in” on line 3, our results come back nicely:
store_name
Bill's Banjos
Tennessee Instruments
Now the outer query isn’t surprised when the inner query returns multiple
rows. The “in” keyword can handle a set of values while “=” expects just
one value.
DISTINCT
The “distinct” keyword removes duplicates from our result set. If we select
the president_party column from our us_president table, we get lots of
duplicates.
select president_party
from us_president;
president_party
null
Federalist
Democratic-Republican
Democratic-Republican
Democratic-Republican
Democratic-Republican
Democrat
Democrat
Whig
Whig
…
That query returns 46 values, one showing the party for each president. We
could use “distinct” if we want to see each party only once.
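A minimal sketch of the “distinct” query. The temporary table holds only the first four presidents as assumed sample rows, so it returns fewer rows than the full table would:

```sql
-- Scratch stand-in for the us_president table (assumed sample rows).
create temp table us_president (
    president_id    int,
    president_name  text,
    president_party text
);

insert into us_president (president_id, president_name, president_party)
values (1, 'George Washington', null),
       (2, 'John Adams', 'Federalist'),
       (3, 'Thomas Jefferson', 'Democratic-Republican'),
       (4, 'James Madison', 'Democratic-Republican');

-- Each party value is listed once; null counts as one value too.
select distinct president_party
from   us_president;
```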
president_party
null
Republican
Democratic-Republican
Federalist
Democrat
Whig
Now the query returns only 6 rows. Selecting the distinct parties shows each
party value only once.
LIMIT
The “limit” keyword allows us to select a limited number of rows. If we are
interested in seeing what the us_president table looks like, but we don’t need
to see all 46 rows in it, we could limit the results to 5 rows.
select *
from us_president
limit 5;
So far, we have been selecting from database tables that already exist. Now
let’s look at how to create tables in the first place.
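A minimal sketch of a “create table” statement for the us_president table, with three columns. The “temp” keyword is an addition for illustration only: it makes this a scratch table that disappears when the session ends, so it won’t collide with a table you already have.

```sql
create temp table us_president (
    president_id    int,
    president_name  text,
    president_party text
);
```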
We can make up our own names for tables and columns. The names must be
63 characters or fewer, and they need to start with a letter or an underscore.
For readability, I recommend keeping your table and column names under 30
characters, and starting table and column names with a letter. Underscores
are often used between words in table and column names, but dashes are not
allowed.
By convention, tables are given singular names, not plural names. For
example, we called this table us_president instead of us_presidents.
In lines 2 – 4, we defined three columns for the table: president_id,
president_name, and president_party. We specified a data type for each
column. president_id is an integer, president_name is text, and
president_party also has a text data type.
DATA TYPES
Postgres provides lots of data types you can use to create your columns.
Some data types you will use frequently and some you will use once in a
while or only when some special situation arises.
Text
The text data type stores a string of characters. A string can be made up of
almost any characters, like letters, numbers, underscores, dashes,
ampersands, and lots more. Here the product_name column of the product
table is created with the text data type.
If you are creating a table and you need a column for text data of unknown
length, use the text data type.
Integer
The integer data type, often abbreviated as “int”, is used for storing whole
numbers. An example of a whole number is 3. int would not be the right data
type to store 3.1415, because 3.1415 has a fractional part (the part after the
decimal point) and so is not a whole number.
The smallint and bigint data types are available if you need them, but most
times, if you need to store numbers that don’t have a decimal point, the int
data type is what you want.
Numeric
The “numeric” data type is used for storing numbers that have a fractional
amount.
In Postgres, the “numeric” data type and the “decimal” data type are the
same. You can create the column using either word.
Numbers like these that have a decimal point are said to have a precision and
scale. Precision is the total number of digits. Scale is the number of digits
after the decimal point. For example, 3.14159 has a precision of 6 and a
scale of 5.
We can use the “numeric” data type without specifying a precision and scale,
like we did above, or we can define a precision and scale for our numeric
column. When we created the special_number table, we could have created
the special_value column with a precision of, say, 10 and a scale of 4.
Now the values appear with 4 digits to the right of the decimal point, and up
to 6 digits to the left of the decimal point. Since only 10 digits total are
allowed in the column, and 4 are to the right of the decimal point, that leaves
us with 6 digits to the left of the decimal point.
We could not add the US National Debt to this column anymore because it is
too large for our column definition. The number has more than 6 digits to the
left of the decimal point.
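As a sketch of how a precision of 10 and a scale of 4 behave, consider the following. The special_name column and the sample values are assumptions made up for this illustration:

```sql
create temp table special_number (
    special_name  text,            -- hypothetical column
    special_value numeric(10, 4)   -- 10 digits in total, 4 after the point
);

-- The value is rounded to 4 decimal places as it is stored.
insert into special_number (special_name, special_value)
values ('pi', 3.14159);

select special_name, special_value
from   special_number;

-- A value with more than 6 digits to the left of the decimal point
-- does not fit and is rejected with a "numeric field overflow" error:
-- insert into special_number (special_name, special_value)
-- values ('too big', 1234567.0);
```

The select returns 3.1416 for pi, because Postgres rounds the stored value to the scale of the column.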
Date
To store dates in Postgres, we can use the “date” data type. For example,
here is a table called clock_change that has one column called
clock_change_date. The clock_change_date column was created with the
“date” data type, so we can store dates in that column.
select *
from clock_change;
clock_change_date
2020-11-01
2021-03-14
2021-11-07
2022-03-13
2022-11-06
1 select *
2 from clock_change
3 where clock_change_date = '2021-11-07';
The “date” data type is handy when you need to store a date but don’t need
to store the time of day or a time zone. Notice that the date is
surrounded by single quotes on line 3.
Postgres provides an easy way to get today’s date called current_date. The
current_date function gives you today’s date as a “date” data type.
Strangely, the current_date function is one of a handful of Postgres functions
that you call without parens. The syntax to call the current_date function is
“select current_date” not “select current_date( )”.
select current_date;
current_date
2021-06-29
We could also use current_date to check for dates in our clock_change table
that have today’s date.
select *
from clock_change
where clock_change_date = current_date;
You can also get the current_date and then add or subtract from it. For
example, the following SQL statement displays all rows in the table with a
clock_change_date from the last ten days:
select *
from clock_change
where clock_change_date > current_date - 10;
I named this column twotz because it stands for “Time Without Time Zone”.
The following query returns the column twotz from the start_time table:
select twotz
from start_time;
twotz
15:55:25.832142
The time that was returned consists of several parts. 15 is the hour on a
24-hour clock, which is 3 pm because 15 minus 12 is 3. 55 is the minutes. 25
is the seconds. 832142 is the fraction of a second, in microseconds. So, this
time represents 3:55 pm and 25 seconds and 832142 microseconds.
Notice that the “time without time zone” data type doesn’t include a date or a
time zone.
When using the “time without time zone” data type, you can either spell out
“time without time zone” or shorten it to “time”. The default for “time” is
“without time zone”. These two tables would be the same:
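A sketch of two such definitions. The table names are hypothetical and “temp” is added so they are scratch tables; both columns end up with the same data type:

```sql
-- Spelled out in full.
create temp table start_time_long (
    twotz time without time zone
);

-- Shortened form; "time" means "time without time zone".
create temp table start_time_short (
    twotz time
);
```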
select twtz
from start_time;
This query returns the twtz column from the start_time table.
twtz
15:55:25.832142-04:00
This returned the same time as the “time without time zone” data type
example above, but now there is a “-04:00” at the end of it.
The “-04:00” is an offset that represents the time zone. This offset tells us
the difference between the time zone of this time and UTC (Coordinated
Universal Time). The time “15:55:25.832142-04:00” is still 3:55 pm
and 25 seconds and 832142 microseconds, but the “-04:00” tells us that it is
4 hours behind UTC. In this case, the time is in the EDT time zone. EDT is
Eastern Daylight Time, which is a time zone used in the eastern part of the
United States.
When using the “time with time zone” data type, you can either spell out
“time with time zone” or shorten it to “timetz”. These two tables would be
the same:
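A sketch of two such definitions, again with hypothetical table names and “temp” added so they are scratch tables:

```sql
-- Spelled out in full.
create temp table start_time_tz_long (
    twtz time with time zone
);

-- Shortened form.
create temp table start_time_tz_short (
    twtz timetz
);
```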
Postgres provides an easy way to get the current time called current_time.
The current_time function gives you the current time as a “time with time
zone” data type.
select current_time;
14:34:19.274711-04:00
You could compare the times in our table to the current time using this query.
select *
from start_time
where twtz = current_time;
We are unlikely to have any rows in the table with a time that is exactly the
same as the current time down to the microsecond, but we can check for
times within the last hour using a query like this:
select *
from start_time
where twtz > (current_time - interval '1 hour');
The “timestamp without time zone” data type can also be referred to as
simply “timestamp”. These two tables would be the same:
The “timestamp with time zone” data type can also be shortened to
“timestamptz”. These two tables would be the same:
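Both timestamp shorthands can be sketched the same way. The table names and the tswotz column name are hypothetical, and “temp” is added so these are scratch tables:

```sql
-- "timestamp" is short for "timestamp without time zone".
create temp table event_ts_long  (tswotz timestamp without time zone);
create temp table event_ts_short (tswotz timestamp);

-- "timestamptz" is short for "timestamp with time zone".
create temp table event_tstz_long  (tswtz timestamp with time zone);
create temp table event_tstz_short (tswtz timestamptz);
```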
Postgres provides a function called now( ) that is an easy way for you to get
a current “timestamp with time zone” value.
select now();
now
2021-06-29 14:54:56.710457-04
You can also use the Postgres current_timestamp function, which does the
same thing as now( ).
select current_timestamp;
current_timestamp
2021-06-29 14:54:56.710457-04
You could also use now( ) or current_timestamp to check the table to see if
there are any values with the current timestamp, like this:
select *
from start_time
where tswtz = now();
or:
select *
from start_time
where tswtz = current_timestamp;
Boolean
The boolean data type, often abbreviated as “bool”, stores a value that
represents true or false. Let’s take a look at the presidential_hair table.
president_name good_hair
Richard Nixon false
Martin Van Buren false
John F. Kennedy true
You may see queries that use a slightly different syntax for checking
booleans, for example:
To check a boolean for a value of false, you can use the “not” syntax.
select president_name
from presidential_hair
where not good_hair;
president_name
Richard Nixon
Martin Van Buren
And again, you may come across queries that use a slightly different syntax to
do the same thing.
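A sketch of what such alternative syntax can look like, assuming the variant meant is an explicit comparison with true or false. The temporary table is a scratch copy holding the three sample rows shown above:

```sql
create temp table presidential_hair (
    president_name text,
    good_hair      boolean
);

insert into presidential_hair (president_name, good_hair)
values ('Richard Nixon', false),
       ('Martin Van Buren', false),
       ('John F. Kennedy', true);

-- Returns the same rows as "where good_hair".
select president_name
from   presidential_hair
where  good_hair = true;

-- Returns the same rows as "where not good_hair".
select president_name
from   presidential_hair
where  good_hair = false;
```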
In some databases, you will see columns named with the suffix “_flag” that
have a boolean data type. Since a boolean is either true or false, columns
will be given names like “active_user_flag” because if the user is active,
setting the column to true can be compared to raising a flag.
If you wanted a list of active users who speak Spanish, you could use this
query:
select *
from application_user
where active_user_flag
and speaks_spanish_flag;
Character
The “character” data type allows us to store fixed-length character strings. It
is often abbreviated as “char”.
For example, the following table was created to store data about states in the
USA:
state_code state_name
MA Massachusetts
NC North Carolina
HI Hawaii
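A table like this could be created along the following lines. The table name us_state is hypothetical, and “temp” is added so that it is a scratch table:

```sql
create temp table us_state (
    state_code char(2),   -- fixed length: always two characters
    state_name text
);

insert into us_state (state_code, state_name)
values ('MA', 'Massachusetts'),
       ('NC', 'North Carolina'),
       ('HI', 'Hawaii');

select state_code, state_name
from   us_state;
```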
Some of the data types - like point, circle, and polygon - are for geometric
data. There are data types for binary data, MAC addresses, host addresses,
and all manner of other things. You can find a complete description of all of
the Postgres data types on the Postgres website. You can even create your
own data types using the “create type” command if you ever need a really
unconventional data type.
Numbers
Formatted Data
json   Stores textual JSON (JavaScript Object Notation) data
jsonb  Stores binary JSON data, decomposed
xml    Stores XML (Extensible Markup Language) data
Serial
Geometry-Related
Specialty Addresses
Miscellaneous
CONSTRAINTS
When you create your own database tables, Postgres gives you a way to put
“constraints” on your columns. Constraints are rules about the data that can
be saved in your columns. Can there be two rows in the table with the same
value in this column? Are null values allowed for this column? Using
constraints, you can define rules about the data that is allowed, and Postgres
will enforce those rules for you.
Constraints help us with “data integrity”. That is, they help us to keep the
data in our database accurate and consistent.
Primary Keys
A “primary key” is a column, or more than one column, that uniquely
identifies the rows in a table. When you create a database table, one of the
most important questions to consider is “what should the primary key for this
table be?”
The answer is the president_id column. That is because no two rows can
have the same president_id. There will never be two different rows with the
same president_id.
That can’t be said of the president_party column. We can see that there are
multiple rows that have a president_party of “Democratic-Republican”.
But we can always be sure that president_id 3 will refer to just one row in
the table. So, when we create the us_president table, we should make the
president_id column the primary key.
To designate the president_id column the primary key for the table, we can
use the “primary key” syntax:
1 create table us_president (
2 president_id int primary key,
3 president_name text,
4 president_party text
5 );
Making president_id the primary key for this table does three good things for
us.
First, it prevents duplicate President IDs from being inserted into the table. If
a user of our database tries to add president_id 40 and there already is a
president_id 40 in the table, Postgres will stop them and give them an error
message.
Secondly, making president_id the primary key prevents users from adding a
null value to the president_id column. When we defined president_id as the
primary key, we designated it as a special column that cannot be null.
Those two things fall under the category of “data integrity”. They help to
keep the data in our database accurate and consistent. Once we define this
primary key, we can be assured that all rows in the table will have a unique
president_id, and that no president_id will be null. Postgres will enforce that
for us, and that will help to keep the data in our database of a high quality.
Third, Postgres automatically creates an index on the primary key column,
which helps to speed up queries that look up rows by president_id.
For this table, the primary key should consist of the city and temp_date
columns. That is because there should be only one row with the same city
and date. For example, we have a row for Boston for 2021-12-01 with a high
temperature of 36. If we make the city and temp_date the primary key for this
table, Postgres will prevent users from adding a second row for Boston for
2021-12-01.
There would never be a reason to add a second row for Boston for 2021-12-01,
so if we make city and temp_date the primary key, Postgres will make
sure that doesn’t happen. To make city and temp_date the primary key for this
table, we can use the “primary key” syntax.
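A sketch of such a table definition. The table name high_temperature and the column name high_temp are assumptions, and “temp” is added so that this is a scratch table:

```sql
create temp table high_temperature (
    city      text,
    temp_date date,
    high_temp int,
    primary key (city, temp_date)
);

insert into high_temperature (city, temp_date, high_temp)
values ('Boston', '2021-12-01', 36);

-- A second Boston row for the same date would be rejected
-- as a duplicate key:
-- insert into high_temperature (city, temp_date, high_temp)
-- values ('Boston', '2021-12-01', 40);
```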
In line 5, we told Postgres that the primary key should consist of the
city and temp_date columns. When a primary key is made up of more than
one column this way, it is called a “composite” key.
Postgres doesn’t require you to define a primary key for the tables you
create, but I recommend that you do. Defining primary keys for your tables
sets Postgres up to help you with data integrity. It also helps you to
understand your data and the best way to define your tables. Most times, if
you can’t figure out what the primary key should be for a new table, that
means you need to rethink your table design. Except in rare cases, every table
should have a primary key.
Foreign Keys
A foreign key is a column, or more than one column, that has a relationship
with the primary key of another table. You can use a table’s foreign key to
join it to another table. A foreign key consists of the columns that are used to
match this table to another table’s primary key columns.
Let’s take another look at the tables for our music stores:
supplier
supplier_id supplier_name
100 Missing Teeth Mfg.
200 Peg's Board Guitar Co.
300 Scratchy Cat Fiddle Co.
product
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300
When we create the supplier table, we should make the primary key the
supplier_id column because we never want there to be two rows in the
supplier table with the same supplier_id.
The product table also has a supplier_id column. That column can be used to
link the product table to the supplier table. Product 11 – Banjo – in the
product table has a supplier_id of 100. We can go to the supplier table and
see that supplier_id 100 is “Missing Teeth Mfg.”, so we know that the
supplier of the banjo is Missing Teeth Mfg.
When we create the product table, we should define the supplier_id column
as a foreign key that references the supplier_id column of the supplier table.
Now if a user tries to add a product to the product table with a supplier_id
of, say, 900 then Postgres will prevent them from adding the product and will
display an error message. Postgres will recognize that there is no supplier_id
900 in the supplier table, so adding that row to the product table would
violate the foreign key we set up when we defined the table.
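A sketch of how the two tables could be defined with the “references” syntax. The sample rows come from the tables above, the Kazoo row is made up, and “temp” is added so that these are scratch tables:

```sql
create temp table supplier (
    supplier_id   int primary key,
    supplier_name text
);

create temp table product (
    product_id   int primary key,
    product_name text,
    supplier_id  int references supplier (supplier_id)
);

insert into supplier (supplier_id, supplier_name)
values (100, 'Missing Teeth Mfg.');

insert into product (product_id, product_name, supplier_id)
values (11, 'Banjo', 100);

-- Rejected: there is no supplier 900 for the product to point to.
-- insert into product (product_id, product_name, supplier_id)
-- values (99, 'Kazoo', 900);
```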
Now that we have defined the president_name column as “not null”, if a user
in our database tries to add a null value for president_name, Postgres will
stop them and display an error message.
By the way, the president_id column defined in line 2 will also not allow
null values because it has been defined as the table’s primary key. Recall that
primary key columns cannot have a null value. Because it is the primary key,
we didn’t need to specify “not null” for the president_id column. Postgres
does that for us automatically.
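A sketch of a table definition with a “not null” column, along the lines described above. The “temp” keyword is added so that this is a scratch table:

```sql
create temp table us_president (
    president_id    int primary key,
    president_name  text not null,
    president_party text
);

-- Rejected: president_name may not be null.
-- insert into us_president (president_id, president_name, president_party)
-- values (1, null, null);
```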
Unique
Sometimes we want to prevent duplicate values in a column. We might want
to create a customer table, for example, that will not allow duplicate email
addresses.
On line 4 of our “create table” SQL statement, we used the word “unique”.
Because the table was created this way, Postgres will not allow two
customers in the table to have the same email address. Postgres will display
an error message if a user tries to create a customer with an email address
that is already in the table.
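A sketch of a customer table with a unique email address. The column names and sample rows are assumptions, and “temp” is added so that this is a scratch table:

```sql
create temp table customer (
    customer_id   int primary key,
    customer_name text,
    email_address text unique
);

insert into customer (customer_id, customer_name, email_address)
values (1, 'Pat', 'pat@example.com');

-- Rejected: that email address is already in the table.
-- insert into customer (customer_id, customer_name, email_address)
-- values (2, 'Patricia', 'pat@example.com');
```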
Check
We can use the “check” constraint to make sure that a column contains certain
values, or a certain range of values. We can define the high_temperature table
this way.
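A sketch of a “check” constraint on such a table. The column names and the allowed range of -50 to 150 are assumptions chosen for illustration, and “temp” is added so that this is a scratch table:

```sql
create temp table high_temperature (
    city      text,
    temp_date date,
    high_temp int check (high_temp between -50 and 150),
    primary key (city, temp_date)
);

insert into high_temperature (city, temp_date, high_temp)
values ('Boston', '2021-12-01', 36);

-- Rejected: 500 is outside the range allowed by the check constraint.
-- insert into high_temperature (city, temp_date, high_temp)
-- values ('Boston', '2021-12-02', 500);
```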
INDEXES
Postgres lets us create indexes on our tables that will help to speed up the
SQL queries that use those tables. You can create an index using the “create
index” syntax.
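A minimal sketch of the “create index” syntax. The index name product_supplier_idx and the choice of the supplier_id column are assumptions for illustration, and “temp” is added so that the table is a scratch table:

```sql
create temp table product (
    product_id   int primary key,
    product_name text,
    supplier_id  int
);

-- Index the column that queries are expected to search or join on.
create index product_supplier_idx on product (supplier_id);
```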
Again, you don’t need to create indexes for columns that have been defined
as primary keys. When you create a primary key for a table, Postgres
automatically indexes those columns for you.
DROPPING TABLES
The syntax to drop a table is “drop table”. Be careful with this command
because it will remove the table and all of the data in it.
ALTERING TABLES
The alter command lets us add columns to the table. We can also use it to
drop a column, change the data type of a column, rename a column, and more.
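A sketch of those four kinds of changes, run against a scratch copy of the us_president table. The new column name political_party and the varchar(100) length are assumptions for illustration:

```sql
create temp table us_president (
    president_id    int primary key,
    president_name  text,
    president_party text
);

-- Add a column.
alter table us_president add column inauguration_date date;

-- Rename a column.
alter table us_president rename column president_party to political_party;

-- Change the data type of a column.
alter table us_president alter column president_name type varchar(100);

-- Drop a column.
alter table us_president drop column inauguration_date;
```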
There are other ways to insert, update, and delete data from a Postgres
database in addition to using pgAdmin. Many applications have a User
Interface (UI) that allows end users to enter and change data and to have
those changes affect a Postgres database. An application’s UI might be web-
based, or it might be part of a mobile application, or it may be part of a
desktop application. Often, an application’s user interface sends SQL
commands like insert, update, and delete to a Postgres database, where those
commands are executed.
Inserting Data
To use SQL to insert data into a table, use the “insert” command.
Here we are inserting a row into the us_president table. In line 1, we are
telling Postgres that we want to insert this data into the president_id,
president_name, and president_party columns. Then on line 2, we provide
the data to insert, in this case, 33, “Harry Truman”, and “Democrat”. “Harry
Truman” and “Democrat” have single quotes around them because they are
text fields. 33 is an integer so it does not have quotes around it. This SQL
statement will add a new row to the us_president table.
Notice that we listed three columns in line 1 and three values in line 2.
Postgres looks for a column to insert the text “Democrat” in, but since we
removed the reference to the “president_party” column from line 2, it can’t
find one. We are trying to insert three values into two columns, so Postgres
returns this error message:
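The following sketch shows both cases against a scratch copy of the us_president table (“temp” is added for illustration). The failing statement is commented out so that the rest can run:

```sql
create temp table us_president (
    president_id    int primary key,
    president_name  text,
    president_party text
);

-- Three columns listed, three values supplied: this works.
insert into us_president (president_id, president_name, president_party)
values (33, 'Harry Truman', 'Democrat');

-- Two columns listed, three values supplied: Postgres rejects this with
-- "INSERT has more expressions than target columns".
-- insert into us_president (president_id, president_name)
-- values (33, 'Harry Truman', 'Democrat');
```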
1 -- Don't do this.
2 insert into us_president
3 values (33, 'Harry Truman', 'Democrat');
Even though we aren’t specifying any columns to insert the data into on line
2, Postgres accepts this syntax and adds the row to the table. Postgres sees
that there are three columns in the table and since we provided three values,
it writes the values to those three columns in the table without displaying an
error.
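The problem comes later when we decide to make a change to the
us_president table. Let’s say we add a new column to the table called
inauguration_date. Now when we run our insert statement, there are four
columns in the table and only three values being added. Postgres assigns the
values to the first three columns in the order they were declared and fills
inauguration_date with its default, so the statement silently depends on the
column order of the table. If the new column were defined as “not null”
without a default, or if the table were ever recreated with its columns in a
different order, the insert would fail or write values to the wrong columns.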
For this reason, I recommend that you always list the column names that you
want to insert into.
1 -- Do this instead.
2 insert into us_president (president_id, president_name, president_party)
3 values (33, 'Harry Truman', 'Democrat');
Listing the column names on line 2 allows this insert statement to work even
if we add new columns to the table later.
Updating Data
To change data that already exists in a table, we can use the “update”
statement and specify which data we want to change in our “where” clause.
Here we are replacing any null values in the president_party column with the
text “None”.
update us_president
set president_party = 'None'
where president_party is null;
This changes George Washington’s political party from null to the text
“None”.
Now let’s change the row for Lyndon Johnson to include his middle name.
update us_president
set president_name = 'Lyndon Baines Johnson'
where president_name = 'Lyndon Johnson';
This updates the row for Lyndon Johnson and sets his president_name to
Lyndon Baines Johnson. We can see that the update worked by querying the
table:
select *
from us_president
where president_id = 36;
But let’s imagine that we forgot to include the where clause in our update
statement. Instead of running this SQL statement:
update us_president
set president_name = 'Lyndon Baines Johnson'
where president_name = 'Lyndon Johnson';
we accidentally ran this one:
update us_president
set president_name = 'Lyndon Baines Johnson';
Now instead of updating one row in the table, we would update all the rows
in the table! Let’s look at the table now:
So be aware that update statements can update more than one row. An update
changes every row that matches its “where” clause. If there is no “where”
clause at all, all rows in the table will be updated.
Deleting Data
To delete data from database tables, we usually use the “delete” statement.
There is also a “truncate” statement that we can use if we want to quickly
remove all rows in a table.
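As a minimal sketch, assuming a supplier table whose key column is named supplier_id (the column name is an assumption), deleting a single supplier might look like this:

```sql
-- Remove only the row for supplier 500.
-- supplier_id is an assumed column name.
delete from supplier
where supplier_id = 500;
```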
If there were no supplier 500 in the table to begin with, the delete statement
would not display an error message. It would simply delete zero rows.
To delete all rows in a table, you can use the “delete” syntax without a
“where” clause.
This will delete all the rows in the table, but it will not drop the table. The
table will still exist, but there will be no rows of data in it. If you want to
remove the entire table, you can use the “drop table” statement instead.
Like the “update” statement, “delete” will remove as many rows as match
the criteria in the where clause, so one delete statement can remove many
rows. This one delete statement will delete every Republican in the table:
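A statement along these lines would do it. It is a sketch that assumes the party is stored as the text 'Republican' in the president_party column of us_president:

```sql
-- Deletes every row whose party is 'Republican'.
-- Assumes the us_president table from this chapter.
delete from us_president
where president_party = 'Republican';
```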
To remove every row from the supplier table quickly, we can truncate it:
truncate supplier;
Truncate has the same effect as deleting all rows from a table with the
“delete” statement. It will delete all the rows in the table, but will not drop
the table. The table will still exist, but there will be no rows in it. The
difference is that “truncate” is faster because of the way Postgres implements
“truncate” behind the scenes.
3
MORE DATABASE OBJECTS
Tables are by far the most-used database objects, but there are others we will
use as well, like views, materialized views, sequences, functions, and
triggers.
You can name your objects using the same rules that we used for tables. Their
names must be 63 characters or fewer, and they need to start with a letter or an
underscore. I like to name my database objects with a prefix that indicates
what type of objects they are. For example:
VIEWS
A “view” in Postgres is a saved SQL statement that you can query like it’s a
table. To create a view, use the “create view” statement.
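A sketch of such a statement follows. It assumes the us_president table and the 'Democrat' party value used earlier; the two selected columns match the query results shown in this section.

```sql
-- Save a query for Democratic presidents under the name v_democrat_president.
-- Assumes the us_president table from this chapter.
create view v_democrat_president as
select president_id,
       president_name
from   us_president
where  president_party = 'Democrat';
```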
Now that you have defined the v_democrat_president view, you can query it
as if it were a table.
president_id president_name
7 Andrew Jackson
8 Martin Van Buren
11 James Polk
14 Franklin Pierce
… …
You can also use a “where” clause when selecting from a view.
select *
from v_democrat_president
where president_id = 7;
president_id president_name
7 Andrew Jackson
Each time we select from a view, Postgres runs the query that we defined in
our “create view” statement. That query typically selects rows from a
database table. That means that if we make a change to the data in our
underlying database table, the change will also appear when we select from
the view. For example, when we selected from the v_democrat_president
view for president_id 7, Postgres ran the query that we defined when we
created the v_democrat_president view.
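To watch this happen, we can change the underlying data. The sketch below, which assumes the us_president table from this chapter, changes the party recorded for president_id 7 to “Republican”:

```sql
-- Change the party for president_id 7 in the underlying table.
update us_president
set president_party = 'Republican'
where president_id = 7;
```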
Now if we run our select statement against the view again, we see that
president_id 7 is not returned. This query returns no rows:
select *
from v_democrat_president
where president_id = 7;
That makes sense because this view is supposed to return only Democrat
presidents. We can see that the changes to the underlying table took effect
when we queried the view.
We could create a view called v_employee that has all of the columns except
for salary.
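A sketch of what that view could look like is shown here. Apart from salary, the column names are hypothetical, since the employee table's full definition is not shown:

```sql
-- Every column except salary. The column names are hypothetical.
create view v_employee as
select employee_id,
       first_name,
       last_name,
       department
from   employee;
```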
Now we have a table called employee that includes salary data, and a view
called v_employee that does not include salary data. When data changes in the
employee table, the v_employee view will automatically reflect those
changes, so we’ll only have to make our changes in one place.
Now we can give users in the Human Resources department access to the
employee table, and give all other users access to the v_employee view.
MATERIALIZED VIEWS
A “materialized view” in Postgres is created by running a SQL statement, but
unlike a regular view, the results of the SQL statement are saved. To create a
materialized view, use the “create materialized view” syntax.
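A sketch, assuming the same us_president table and the same query used for the regular view:

```sql
-- The query runs once, now, and its results are stored.
-- Assumes the us_president table from this chapter.
create materialized view mv_democrat_president as
select president_id,
       president_name
from   us_president
where  president_party = 'Democrat';
```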
select *
from mv_democrat_president
where president_id = 7;
president_id president_name
7 Andrew Jackson
Now let’s see what happens if we change the political party of president_id 7
to “Republican” in the underlying us_president table.
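A statement like this one would make that change (a sketch, assuming the us_president table and column names used throughout the chapter):

```sql
-- Change the underlying row; the materialized view is not touched.
update us_president
set president_party = 'Republican'
where president_id = 7;
```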
select *
from mv_democrat_president
where president_id = 7;
president_id president_name
7 Andrew Jackson
Even though we changed the data in the underlying us_president table, the data
in the materialized view did not get refreshed. The materialized view is still
reflecting the state of the us_president table before we changed it. Unlike a
regular view, we need to run the “refresh materialized view” SQL command
to refresh a materialized view.
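A minimal example, using the materialized view name from this section:

```sql
-- Re-run the stored query and replace the saved results.
refresh materialized view mv_democrat_president;
```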
Now when we check the materialized view, president 7 does not appear. The
change we made to the underlying table is now reflected in the materialized
view.
select *
from mv_democrat_president
where president_id = 7;
Now that we have refreshed the materialized view, the query returns no rows,
as we would expect.
Materialized views are good for situations where you either want to save
point-in-time data or where the query that creates the materialized view is
complex and takes a long time to run. If the query for a view takes, say, 5
minutes to run then you may make the decision to use a materialized view
instead. You may decide to refresh a materialized view every hour rather than
using a regular view that will run the query every time somebody selects from
the view.
SEQUENCES
Sometimes we want a column in a table to contain consecutive numbers. In the
us_president table for example, the row for the first president has a
president_id of 1. The next row has a president_id of 2. Then 3, 4, 5, etc.
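A sequence can generate those numbers for us. As a minimal sketch, a sequence named president_sequence (the name used in the following paragraphs) can be created with the default settings, which start at 1 and increase by 1:

```sql
-- Defaults: starts at 1, increments by 1.
create sequence president_sequence;
```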
Each time a new row is added to the us_president table, we can use the
“president_sequence” sequence we just created to keep a running tally and
insert the right president_id into the table. Postgres provides a function called
nextval( ) to help us with that. Calling nextval( ) adds 1 to our sequence,
saves the new value, and returns it.
Let’s assume that the us_president table is empty when we start. We can add
rows to it using the sequence and the nextval( ) function to create a value for
the president_id column.
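The inserts might look like the following sketch. It assumes the us_president table and the president_sequence sequence named above; each call to nextval( ) hands back the next number.

```sql
-- On a new sequence, the first call to nextval() returns 1, the second 2.
insert into us_president (president_id, president_name, president_party)
values (nextval('president_sequence'), 'George Washington', null);

insert into us_president (president_id, president_name, president_party)
values (nextval('president_sequence'), 'John Adams', 'Federalist');
```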
CREATING FUNCTIONS
We saw earlier how to call functions that have been provided to us by
Postgres, but now let’s take a look at creating our own functions.
Functions are helpful when you have a complex SQL statement, or a group of
SQL statements that have several steps, and you know that you will need to run
those same SQL statements again sometime in the future. It makes sense to
save that SQL as a function so you can call it by name later. This will prevent
you from having to recreate the SQL next time, and it will make it easy to run
the SQL the next time you need it. Functions are stored in the Postgres
database, so we can write the SQL once and simply call the function whenever
we need to.
Recall that Postgres comes with functions like the upper( ) function. We saw
that calling upper( ) with a value of “Say it loud” returned the text “SAY IT
LOUD”.
Notice that when you call the upper( ) function, there are left and right parens
after the word “upper”. You can pass values to the function by putting the
values between the parens. In this case, we wrapped the value “Say it loud” in
single quotes because it is text. If the value had been a number, we wouldn’t
have used quotes. You will call the functions that you write the same way that
you call functions provided by Postgres.
In our “select upper(‘Say it loud’)” example, the text “Say it loud” was the
only argument passed to the upper( ) function. The upper( ) function was
written to accept one value and to do something with it. In this case, the
function takes the value sent to it and returns the uppercase version of it.
We have been using plain SQL up until now and it has done everything we
needed it to do, but there are times when we need more functionality than the
SQL language can offer. SQL is a non-procedural language. It can query
database tables nicely, but you can’t define variables or create loops with it.
You can’t add if/then logic to your programs with plain SQL. If you need to
write a function using a procedural language to do those types of more
advanced things, use PL/pgSQL.
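Before turning to PL/pgSQL, it helps to see a function written in plain SQL. The following is a sketch reconstructed from the walkthrough that comes next, so the parameter type (int) and the exact layout are assumptions. It is laid out so that the create statement is line 1 and the closing $$ is on line 5.

```sql
create or replace function f_get_president_and_party(int) returns text as $$
  select president_name || ' (' || president_party || ')'
  from   us_president
  where  president_id = $1;
$$ language sql;
-- Assumes the us_president table; the int parameter type is an assumption.
```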
On line 1, we create the function. We could drop and then recreate the function
in two steps, but the “create or replace function” syntax will do that for us in
one step. If the function doesn’t exist, it will be created. If the function already
exists, it will be recreated.
The body of the function goes within the “$$” symbols (two dollar signs).
Surrounding text with $$ in Postgres is known as “dollar quoting”. The
function’s body goes from the $$ at the end of line 1 until the $$ at the
beginning of line 5. Also on line 5, we are specifying that this function has
been written in SQL, not PL/pgSQL.
Lines 2 – 4 are the body of the function. They contain a SQL query that selects
the president’s name, concatenates it to a space and a left paren, adds the
president’s political party, and concatenates a right paren. The effect is to
return the president’s name followed by his political party in parens. Recall
that two pipe characters (“||”) is the concatenation operator; it appends strings
together.
On line 4, “$1” represents the first argument passed into this function. In this
case there is only one argument passed in, but if there were others, we could
have referenced them as $2, $3, $4 for the second argument, the third
argument, the fourth argument, etc.
select f_get_president_and_party(7);
The function queried the us_president table for president_id 7 and returned the
president_name and president_party columns, nicely formatted. We could also
call this function embedded in a SQL query like this:
1 select f_get_president_and_party(president_id)
2 from us_president
3 where president_id > 40;
In line 1, we got the president_id from the us_president table and sent it as an
argument to the f_get_president_and_party( ) function. The function returned a
president name and the political party shown within parens. On line 3, in our
“where” clause, we asked for only presidents that have a president_id greater
than 40, so the function retrieved 6 presidents from the us_president table. The
f_get_president_and_party( ) function was called 6 times, with 6 different
president IDs, returning 6 rows.
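PL/pgSQL adds variables and if/then logic. The function discussed in the next few paragraphs, f_get_president_note_current( ), can be sketched as follows. This is a reconstruction from the walkthrough, not the original listing. The parameter and variable names come from the text, but the output punctuation, the assumption that the highest president_id belongs to the current president, and the line layout are guesses, so the line numbers cited below will not line up with it exactly.

```sql
-- Sketch only: reconstructed from the prose, not the original listing.
create or replace function f_get_president_note_current(p_president_id int)
returns text as $$
declare
  v_president_name   text;
  v_president_party  text;
  v_max_president_id int;
  v_current_text     text := '';
begin
  -- Look up the requested president.
  select president_name, president_party
  into   v_president_name, v_president_party
  from   us_president
  where  president_id = p_president_id;

  -- Assumption: the highest president_id is the current president.
  select max(president_id)
  into   v_max_president_id
  from   us_president;

  if v_max_president_id = p_president_id then
    v_current_text := '**Current President**';
  end if;

  return p_president_id::text || ': ' || v_president_name ||
         ' (' || v_president_party || ') ' || v_current_text;
end;
$$ language plpgsql;
```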
The “begin” on line 8 goes with the “end” on line 27. This groups those lines
together in a code block. Notice that the “end” in line 27 has a semi-colon
after it, but the “begin” on line 8 does not.
On line 13, we have an “if statement”. The “if” on line 13 goes with the “end
if” on line 15. If the value of the v_max_president_id variable is the same as
the value of the p_president_id parameter then we set the v_current_text
variable to the text “**Current President**”. If the value of the
v_max_president_id variable is not the same as the value of the
p_president_id parameter then we don’t set v_current_text, so it remains
equal to the empty string we set it to in line 7. Notice that the “end if” on line
15 has a semi-colon after it, but the “if” on line 13 does not.
In line 2, we promised that this function “returns text”. In line 24, we actually
return that text. We return a text value that is created by combining the
p_president_id parameter and the v_president_name, v_president_party, and
v_current_text variables formatted with punctuation. Line 24 is a long line so
we continued it on line 25. Postgres knows we are done with the line when it
sees the semi-colon at the end of line 25.
Let’s test the function by calling it with an argument of 33 and then again with
an argument of 46.
select f_get_president_note_current(33);
select f_get_president_note_current(46);
We know the function is working because the data appears in the format we
expected and the current president shows the text “**Current President**”.
Return Types
Functions can return a value of any Postgres data type. The example we just
saw returned a text value, but functions can also return other data types like
int, date, or boolean.
Functions can also return nothing. To define a function that doesn’t return a
value, declare the function to return “void”.
Some developers like to use tags because they feel that it makes the code more
readable. If you plan to use a tag, the text within the dollar signs must be the
same on lines 2 and 7.
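As a sketch of both ideas, the hypothetical function below returns void and uses a tag, “body”, between the dollar signs. The function name and what it does are illustrative only, and its line layout differs from the listing discussed above.

```sql
-- Hypothetical function: raises a notice and returns nothing.
-- The opening and closing tags ($body$) must match exactly.
create or replace function f_say_hello(p_name text)
returns void as $body$
begin
  raise notice 'Hello, %', p_name;
end;
$body$ language plpgsql;

select f_say_hello('Harry');
```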
Postgres knows that we want to send the value 33 into the p_president_id
parameter and the text “Republican” into the p_party parameter. It knows that
because of the order of the parameters and arguments. When we called the
function, we provided 33 first and when we created the function, we accepted
p_president_id first. So Postgres knows that we want 33 to be the
p_president_id value. This is called “positional notation”.
We can also call the function using “named notation”, where we name the
parameter that each value is intended for:
select f_get_president(
p_president_id => 33,
p_party => 'Republican'
);
The equal sign followed by the greater than sign (“=>”) is intended to look
like an arrow, as if to say p_president_id points to 33 and p_party points to
“Republican”.
Using this named notation, we could switch the order that we provide the
arguments in, and the function would still work. Because we are specifying the
name of the parameter for each value, calling the function with the arguments
in a different order will work just as long as the correct parameter name is
still associated with the correct value.
select f_get_president(
p_party => 'Republican',
p_president_id => 33
);
We can use an “in” parameter to pass a value into the function. We can use an
“out” parameter to return a value from the function. An “inout” parameter is
used when we want to pass an argument into a function, change it, and then
return it. To send an array of values to a function, we can use “variadic”
mode.
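A function with one “in” and one “out” parameter can be sketched like this. The parameter names p_in and p_out come from the walkthrough below; the exact layout of the original listing is an assumption, so the line numbers cited there may not match.

```sql
-- Sketch: one value goes in through p_in, one comes back through p_out.
create or replace function f_multiply_by_5(
  in  p_in  int,
  out p_out int
) as $$
begin
  p_out := p_in * 5;
end;
$$ language plpgsql;
```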
select f_multiply_by_5(3);
f_multiply_by_5
15
In lines 6 and 7, we multiplied the p_in parameter by 5 and put the resulting
value into p_out. The p_out parameter gets returned from the function because
we declared its mode as “out” in line 3.
We could rewrite that function using just one “inout” parameter, like this:
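A sketch of the “inout” version follows; the layout is an assumption. Because “create or replace function” cannot rename the parameters of an existing function, the sketch drops any earlier f_multiply_by_5(int) first.

```sql
-- Drop any earlier version; this does nothing if the function is absent.
drop function if exists f_multiply_by_5(int);

-- p_inout carries the value in and carries the result back out.
create function f_multiply_by_5(
  inout p_inout int
) as $$
begin
  p_inout := p_inout * 5;
end;
$$ language plpgsql;
```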
select f_multiply_by_5(3);
f_multiply_by_5
15
Now the function accepts one parameter called p_inout on line 2. The value of
p_inout gets changed on lines 6 and 7 and is automatically returned to the
caller of the function because we declared its mode as “inout” on line 2.
Variadic mode is used less commonly than the in, out, and inout modes.
Variadic mode lets us pass an array into a function. An “array” is a group of
values that all have the same data type. Let’s write a function that accepts an
array of integers and multiplies them together.
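Such a function can be sketched as follows. The names p_nums, v_tot, and v_ind come from the walkthrough below; the rest is a reconstruction, laid out so that the line numbers cited there roughly correspond.

```sql
create or replace function f_multiply_together(variadic p_nums int[])
returns int as $$
declare
  v_tot int := 1;
  v_ind int;
begin
  foreach v_ind in array p_nums loop
    v_tot := v_tot * v_ind;
  end loop;
  return v_tot;
end;
$$ language plpgsql;
-- Sketch only: starting v_tot at 1 is an assumption about the original.
```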
select f_multiply_together(3,5,2);
f_multiply_together
30
In line 2, we promised that this function will return an integer, which in this
case will be all the parameter values multiplied together.
In lines 4 and 5, we declared two variables. We will use the “v_tot” variable
to calculate the product of the parameter values. We will use the “v_ind”
variable for each individual value.
In lines 7 - 9, we are looping through each value in p_nums. The v_ind
variable is assigned to each value, one value at a time. In line 8, we take the
value of our v_tot variable and multiply it by the value in the v_ind variable.
When we call the function with arguments of 3, 5, and 2, the function loops
through lines 7 – 9 three times. The first time v_ind is 3. The second time
v_ind is 5. The third time v_ind is 2.
We passed three values into the function, but because the function was written
to accept an array of integers, it can accept any number of integer values.
select f_multiply_together(1,2,3,4,5);
f_multiply_together
120
select f_multiply_together(12,4,9,8,1,333,1,87,5,2);
f_multiply_together
1001237760
PROCEDURES
Procedures, sometimes called “stored procedures”, are similar to functions.
Procedures were added to Postgres fairly recently, in version 11. In the real
world, I see functions being used more frequently than procedures in Postgres.
We saw that you can call a function by using the “select” keyword, like this:
select f_multiply_by_5(3);
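Procedures are invoked differently: with the “call” statement rather than “select”. The sketch below uses a hypothetical procedure, p_multiply_notice, purely to show the calling syntax.

```sql
-- Hypothetical procedure: reports 5 times its argument as a notice.
create or replace procedure p_multiply_notice(p_in int)
as $$
begin
  raise notice '% times 5 is %', p_in, p_in * 5;
end;
$$ language plpgsql;

-- Procedures are run with "call", not "select".
call p_multiply_notice(3);
```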
TRIGGERS
You can use a trigger to automatically run a Postgres function when some event
occurs. For example, when a new row gets inserted into a particular table, you
can have a trigger “fire” and call a function to insert a row in another table.
Audit Triggers
One common use of triggers is for keeping track of which users have made
changes to a table. In cases where we have several different database users
making changes to data in an important table, it can be useful to create a
second table where we can log which user made which changes.
When new rows get inserted into the us_president table, let’s track which user
inserted them. To do that, we will create three new database objects:
A new table called us_president_insert_audit that will store an audit
trail of the inserts made to the us_president table.
A new trigger called tr_us_president_insert_audit that will fire when a
new row is inserted into the us_president table. This trigger will call a
function.
A new function called tf_us_president_insert_audit( ) that the trigger
will call to insert the audit information into the
us_president_insert_audit table.
When a user inserts a row for a new president in the us_president table, the
trigger will automatically fire and call the function. The function will write a
row to the us_president_insert_audit table. The audit table will store the name
of the user who inserted the row into the us_president table as well as what
time that row was inserted.
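The three objects can be sketched together as shown below. The object names come from the text, but the listings are reconstructions: the insert_user column name and the column data types are assumptions, and the line numbers cited in the discussion that follows refer to the original listings rather than to this sketch. The “execute function” wording requires Postgres 11 or later; older versions use “execute procedure”.

```sql
-- Audit table. insert_user is an assumed column name.
create table us_president_insert_audit (
  insert_user     text,
  insert_date     timestamp with time zone,
  president_id    int,
  president_name  text,
  president_party text
);

-- Trigger function: takes no parameters and returns "trigger".
create or replace function tf_us_president_insert_audit()
returns trigger as $$
begin
  insert into us_president_insert_audit (
    insert_user, insert_date,
    president_id, president_name, president_party
  )
  values (
    current_user, now(),
    new.president_id, new.president_name, new.president_party
  );
  return new;
end;
$$ language plpgsql;

-- Call the function once for each row inserted into us_president.
create trigger tr_us_president_insert_audit
  after insert on us_president
  for each row
  execute function tf_us_president_insert_audit();
```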
In line 3, we create a column called “insert_date” to track the date that the
new row was inserted into the us_president table.
In lines 4 – 6, we created columns in the audit table that will track the
president_id, president_name, and president_party columns that were inserted
into the us_president table. The same values for the columns that the user
inserted into the us_president table will be written to these columns in the
us_president_insert_audit table.
In line 1, I named the function starting with the text “tf_” for “trigger function”,
but this isn’t necessary. Postgres will allow you to pick your own naming
conventions. You can see there are no parameters defined between the parens.
For trigger functions, you don’t define any parameters.
In line 13, we got the user’s username by calling the “current_user” Postgres
function. You can test the current_user function by selecting your own
username from the database, like this:
select current_user;
current_user
Rick
In line 14, we got the date and time that the row was inserted by using the
Postgres now( ) function. By the way, you can test this function by selecting the
current date and time where you are.
select now();
now
2021-06-23 10:29:44.216481-04
In line 2, we specified that the trigger should fire “after insert”. That is, after
the insert has been made to the us_president table. When a user inserts a row
into the us_president table, the row will be written to us_president first and
then the audit record will be written to the us_president_insert_audit table
afterwards. Our options are “before”, “after”, or “instead of”. “Instead of”
means don’t insert the record at all; instead run the function. Note that
“instead of” triggers can only be defined on views, not on tables like
us_president.
On line 3, we said “for each row”. That means for every row we insert into
the us_president table, the tf_us_president_insert_audit( ) function will be
called once. Our other option is “for each statement”. Consider this insert
statement:
This one insert statement is inserting 10 rows into the us_president table. If
our trigger specified “for each row”, it would call the
tf_us_president_insert_audit( ) function 10 times. On the other hand, if we
specified “for each statement” in the trigger, it would call the
tf_us_president_insert_audit( ) function just once. Depending upon what you
are trying to accomplish, there are times that you might want to choose “for
each statement” or “for each row”.
Watching it Work
Now that the trigger is in place, any new rows that get inserted to the
us_president table will get audited to the us_president_insert_audit table. For
example, a user might add a new row to the us_president table using this insert
statement:
The audit trail shows that a database user named “HRC” created the row in
the us_president table on 1/21/2021 at 2:39pm Eastern Daylight Time. That
row of data doesn’t appear to be correct, so we might want to pay a visit to
the HRC user and possibly revoke the user’s database privileges. We should
probably also delete that row from the us_president table.
If you are writing a trigger to track rows that have been updated in a table, it is
helpful to know what the original values were and what the new values are.
For example, if a user runs this update statement:
update us_president
set president_name = 'Harry S. Truman'
where president_name = 'Harry Truman';
Postgres makes both versions of the row available to a row-level trigger
function. The “old” record holds the row as it was before the update, so
old.president_name is “Harry Truman”. The “new” record holds the row as it will
be after the update, so new.president_name is “Harry S. Truman”. The new values
can be referenced like this:
new.president_id,
new.president_name,
new.president_party
Altering Triggers
If we want to rename a trigger, we can use the “alter trigger” command.
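As a sketch, renaming the audit trigger created earlier might look like this. The new name, tr_us_president_audit, is hypothetical.

```sql
-- tr_us_president_audit is a hypothetical new name.
alter trigger tr_us_president_insert_audit on us_president
  rename to tr_us_president_audit;
```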
Dropping Triggers
To remove a trigger, we can use the “drop trigger” syntax.
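A minimal example, assuming the trigger still has the name it was created with:

```sql
-- The table name is required because triggers belong to a table.
drop trigger tr_us_president_insert_audit on us_president;
```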
Notice that we need to tell Postgres that the trigger was associated with the
us_president table. If we are sure we won’t need them anymore, we could also
drop the other database objects that we created for use with the trigger.
Disabling Triggers
If you have a trigger that you know you will not need again, you can drop the
trigger. Dropping the trigger will remove it permanently. On the other hand, if
you want to keep the trigger but prevent it from firing temporarily, you can
disable the trigger and reenable it when you want it to start firing again.
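A sketch of disabling the audit trigger, assuming the trigger and table names used in this chapter:

```sql
-- The trigger stays defined but stops firing.
alter table us_president
  disable trigger tr_us_president_insert_audit;
```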
Notice that to disable a trigger, you alter the table that it is associated with.
Enabling Triggers
After you have disabled a trigger temporarily, you can enable it again by using
the “alter table” command.
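A matching sketch for turning the same trigger back on:

```sql
-- The trigger resumes firing for new inserts.
alter table us_president
  enable trigger tr_us_president_insert_audit;
```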
4
BUILT-IN FUNCTIONS
As mentioned, Postgres comes with lots of functions that you can call from
your SQL statements. There are functions to transform strings, do
mathematical calculations, get today’s date, and perform all manner of other
helpful operations.
Postgres comes with over 2,000 functions. You won’t need to use – or even
know about – most of them. The majority of the functions that Postgres
provides are not commonly-used and are not shown here. Most of them can
be found on the Postgres website if you do ever have occasion to use them.
Let’s take a look at some of the built-in Postgres functions that you are most
likely to use.
AGGREGATE FUNCTIONS
We have already talked about aggregate functions. You will use these
functions frequently:
count( ) Returns a count of rows
max( ) Returns the maximum, or largest, of a set
min( ) Returns the minimum, or smallest, of a set
avg( ) Returns the average of a set of numbers
sum( ) Returns the sum of a set of numbers
ABS()
The abs( ) function returns the absolute value of a number. “Absolute value”
measures how far away from zero a number is.
select abs(8),
abs(-8);
abs abs
8 8
The results show that the absolute value of 8 is 8, and the absolute value of
-8 is also 8.
Consider the following table that contains a list of meteorologists, their most
recent forecasted temperature, and the actual temperature:
Which meteorologist made the closest prediction? We can use the abs( )
function to help find out:
select meteorologist,
abs(forecasted_high - actual_high) degrees_off
from weather_forecast_vs_actual;
meteorologist degrees_off
Gail Winds 6
Sunny Cloudfoot 9
Storm Warning 21
We can see that Gail Winds was the most accurate with only 6 degrees
difference between her forecasted temperature and the actual temperature.
Sunny Cloudfoot was 9 degrees off, and Storm Warning’s forecast was wrong
by a whopping 21 degrees.
We subtracted the actual_high from the forecasted_high and then got the
absolute value of that number. If we had not used the abs( ) function, the
results would have been -6, -9, and 21. The abs( ) function had the effect of
removing the minus signs, which was useful for our purposes because we
don’t care whether the forecasted temperature was higher or lower than the
actual temperature.
You can call Postgres functions in the “select list” of your SQL statements, as
shown above, and you can also call functions in the “where clause” of your
SQL statement, as shown below:
select *
from weather_forecast_vs_actual
where abs(forecasted_high - actual_high) < 10;
UPPER()
The upper( ) function takes a string and returns an all-uppercase version of it.
select upper('MiXed');
upper
MIXED
The upper function can be helpful in all sorts of scenarios. For example, you
may have two tables that were loaded from different data sources that use
different formatting rules for cities.
city state
Boston Massachusetts
Raleigh North Carolina
Sacramento California
If you were to join the tables, no rows would be returned because the cities
are all in uppercase in the yearly_precipitation table and therefore they
wouldn’t match the cities in the city table:
1 select a.city_name,
2 a.state,
3 b.precipitation
4 from city a
5 join yearly_precipitation b
6 on a.city_name = b.city_name;
The query returns no rows, but if we change the query to join using the upper(
) function on line 6, we get the expected results.
1 select a.city_name,
2 a.state,
3 b.precipitation
4 from city a
5 join yearly_precipitation b
6 on upper(a.city_name) = upper(b.city_name);
In this case, we actually didn’t need to use the upper( ) function on the cities
in the yearly_precipitation table because they were already in uppercase
letters, but I recommend using upper( ) for both tables just to make sure you
are comparing apples to apples. In the future there could be cities in the
yearly_precipitation table that are not in uppercase letters.
By the way, it’s not a great idea to have different tables in your database that
represent cities in different ways. In a real, production database, we would
pick some standard format for cities and eliminate the need to call upper( ) in
our join condition. We might use upper( ) to help build our real tables based
on raw data tables that were loaded from some external source.
LOWER()
The lower( ) function does the opposite of what the upper( ) function does.
The lower( ) function takes a string and returns the all-lowercase version of
it.
select lower('MiXed');
lower
mixed
The lower( ) function can also be used to join values that are in different
formats, in the same way that we used the upper( ) function above.
1 select a.city_name,
2 a.state,
3 b.precipitation
4 from city a
5 join yearly_precipitation b
6 on lower(a.city_name) = lower(b.city_name);
INITCAP()
The initcap( ) function returns the first letter of each word as a capital letter
and all the other letters as lowercase. In other words, the initcap( ) function
takes the initial letter of each word and capitalizes it.
select initcap('george HERBERT wAlKeR BuSh');
initcap
George Herbert Walker Bush
Another example:
initcap
E E Cummings
ROUND()
The round( ) function takes a number with a decimal point and returns the
number rounded to its closest integer value.
select round(3.1415);
round
3
select round(3.5141);
round
4
round
3.50
This is useful when creating reports or displaying values to users who want
to see numeric values in a consistent format.
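The round( ) function also accepts an optional second argument giving the number of decimal places to keep. The values in this sketch are illustrative; note that the result is padded with zeroes to the requested number of places.

```sql
-- Round to 2 decimal places.
select round(3.5141, 2);  -- 3.51

-- The result always shows 2 decimal places, even when padding is needed.
select round(3.5, 2);     -- 3.50
```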
TRUNC()
The trunc( ) function truncates a number to a certain number of decimal
places. By default, it truncates all of the digits after the decimal point.
select trunc(3.1415);
trunc
3
select trunc(3.5141);
trunc
3
We got different results using the trunc( ) function than we did with the round(
) function. When we called “trunc(3.5141)” it returned 3 because it truncated
– or chopped off – the “.5141”. The round( ) function, on the other hand,
rounded 3.5141 up to 4.
You can send the trunc( ) function a second parameter to tell it how many
places after the decimal point to keep:
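For example, to keep two decimal places of the value used above (a sketch consistent with the result that follows):

```sql
-- Keep 2 digits after the decimal point and drop the rest.
select trunc(3.5141, 2);
```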
trunc
3.51
Here it kept the 2 digits after the decimal point and truncated everything to
the right of that.
CEIL()
The ceil( ) function returns the closest integer that is greater than or equal
to the argument passed in.
select ceil(3.1415);
ceil
4
select ceil(-3.1415);
ceil
-3
FLOOR()
The floor( ) function returns the closest integer that is less than or equal to
the argument passed in.
select floor(3.1415);
floor
3
select floor(-3.1415);
floor
-4
LENGTH()
The length( ) function counts the number of characters in a string.
length
13
Let’s get a list of the lengths of the cities in our city table:
select city_name,
length(city_name)
from city;
city_name length
Boston 6
Raleigh 7
Sacramento 10
Those results make sense because “Boston” has 6 letters, “Raleigh” has 7
letters, and “Sacramento” has 10 letters.
One of the aspects of Postgres functions that makes them so useful is that you
can string them together. Earlier we saw that we can use the max( ) function
to get the maximum, or highest, value. Let’s use the max( ) function together
with the length( ) function to see what the longest length for our cities is:
select max(length(city_name))
from city;
max
10
We selected each city_name from the city table and wrapped it in the length(
) function. Then we wrapped that in the max( ) function. At the end of the first
line, there are two right parens. The first right paren ends the call to the
length( ) function. The second right paren ends the call to the max( ) function.
The results show that the longest city name has 10 characters. That makes
sense because our longest city name, Sacramento, has 10 characters.
In a similar way, we could use the min( ) function to get the shortest
character length for our cities:
select min(length(city_name))
from city;
min
6
The number 6 is returned because the shortest city name in our table, Boston,
has six characters.
SUBSTR()
The substr( ) function takes a string and returns some part of it. It returns a
substring.
The substr( ) function takes a string, the position of the character in the string
that you want to start at, and the number of characters that you want to select
from the string.
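As a sketch, the query below pulls out five characters starting at position 2 and labels the result mfg_code. It uses the barcode values shown in this section, supplied inline so that the query runs on its own; against a real barcode table the “with” clause would be omitted.

```sql
-- Inline sample rows stand in for the barcode table.
with barcode (code) as (
  values ('811399100131'),
         ('096244031005'),
         ('789661001783')
)
select substr(code, 2, 5) as mfg_code
from   barcode;
```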
code
811399100131
096244031005
789661001783
mfg_code
11399
96244
89661
Calling “substr(code, 2, 5)” got us the values from the “code” column
starting at position 2 and going for 5 characters. We used a column alias of
“mfg_code” so that the results would display nicely under the heading
“mfg_code”.
If you want to get the string starting at a character position and going all the
way to the end of the string, you can leave off the 3rd argument to the substr( )
function:
select code,
substr(code, 9)
from barcode;
Here we are selecting the part of the “code” column that goes from the 9th
position all the way until the end of the value.
code substr
811399100131 0131
096244031005 1005
789661001783 1783
TRIM()
The trim( ) function returns a string with any extra characters at the beginning
and end of a string removed.
Sometimes when you load data from files into a database, the fields can have
extra spaces at the beginning or end of them. In the raw_data_load table
below, each of the column values has some number of spaces at both the
beginning and end.
select * from raw_data_load;
I have included quotes here to highlight where the leading and trailing spaces
are.
If we use the trim( ) function, it will remove the spaces from both ends of
each string:
select trim(col1),
trim(col2),
trim(col3)
from raw_data_load;
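To try this without the raw_data_load table, a self-contained sketch can trim a literal string instead. The square brackets and the column aliases are assumptions added only to make any remaining spaces visible:

```sql
select '[' || '   Penny   ' || ']'       as untrimmed,  -- [   Penny   ]
       '[' || trim('   Penny   ') || ']' as trimmed,    -- [Penny]
       trim(both 'x' from 'xxPennyxx')   as trimmed_x;  -- Penny
```

The last column shows that trim( ) can remove a character other than a space when you name it explicitly.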
LTRIM()
The ltrim( ) function returns a string with the extra characters at the beginning
of a string removed. It trims characters from the left of the string. Like the
trim( ) function, the default character to remove is the space character.
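A call of roughly this shape illustrates the default behavior. The literal value is an assumption, chosen to match the result described next:

```sql
-- Removes the leading spaces only; the trailing spaces remain.
select ltrim('   Penny   ');
```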
This function returns “Penny ” with the spaces removed from the left side, but
not from the right side.
You can also specify which character you want to remove from the left
instead of removing the space characters. Let’s remove the leading zeroes
from “0000738”:
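A sketch of that call, with the character to remove passed as the second argument:

```sql
-- Strip every leading '0' from the left side of the string.
select ltrim('0000738', '0');
```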
ltrim
738
RTRIM()
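The rtrim( ) function, as you probably guessed, returns a string with the extra
characters removed from the right side of a string. The default character to
remove is a space character, but you can specify other characters. The second
argument is treated as a set of characters rather than as a literal word:
rtrim( ) keeps removing characters from the right for as long as each one
appears in that set. Here we trimmed the characters in “Massachusetts” from the
right side of the text “Boston Massachusetts”. The trimming stops at the space
after “Boston” because a space is not in the set: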
select upper(
rtrim('Boston Massachusetts', 'Massachusetts')
);
upper
BOSTON
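Because the second argument to rtrim( ) is a set of characters, the space after “Boston” is still there in the result above. A small sketch makes that visible; the square brackets and column aliases are assumptions added only for illustration:

```sql
select '[' || rtrim('Boston Massachusetts', 'Massachusetts') || ']'
           as space_kept,      -- [Boston ]
       '[' || rtrim('Boston Massachusetts', ' Massachusetts') || ']'
           as space_trimmed;   -- [Boston]
```

Adding a space to the set of characters in the second call removes the trailing space as well.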
LEFT()
The left( ) function returns the first – or leftmost – characters of a string. We
can get the first 3 characters of our customers’ last names using this query:
select last_name,
left(last_name, 3)
from customer;
last_name left
Smith Smi
Guy Guy
Jones Jon
If we call the left( ) function with a negative number, say -1, it will return
everything except the last character:
select last_name,
left(last_name, -1)
from customer;
last_name left
Smith Smit
Guy Gu
Jones Jone
RIGHT()
The right( ) function returns the last – or rightmost – characters of a string.
We can get the last 3 characters of our customers’ last names using this
query:
select last_name,
right(last_name, 3)
from customer;
last_name right
Smith ith
Guy Guy
Jones nes
If you call the right( ) function with a negative number, say -2, it will return
everything except the first 2 characters:
select last_name,
right(last_name, -2)
from customer;
last_name right
Smith ith
Guy y
Jones nes
LPAD()
The lpad( ) function adds characters to the left of a value. It pads characters
to the left. You can specify the characters you want added and what the total
length should be.
Our send_check table has a column called “amt” that represents the amount
of the checks that we need to send out.
select amt
from send_check;
amt
123
854
24
We can add asterisks to the beginning of the amount to prevent people from
fraudulently writing in their own numbers before our amounts:
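One way to produce the padded amounts is to pad each value to a total length of 5 using asterisks. This sketch assumes the amt column is numeric, so it is cast to text first; if amt is already a text column, the cast is unnecessary:

```sql
-- Pad each amount on the left with '*' up to a total length of 5.
select lpad(amt::text, 5, '*')
from   send_check;
```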
lpad
**123
**854
***24
REPLACE()
The replace( ) function replaces some value in a string with a new value.
Let’s say we have a table of airport runways that includes runways for
“Orange County Airport”. That airport was renamed to “John Wayne
Airport”, so we want to display “John Wayne Airport” in our results instead
of “Orange County Airport”.
select runway_name
from runway;
runway_name
Runway 172 – Orange County Airport
Runway 213 – Orange County Airport
Runway 985 – Washington National Airport
We can use the replace( ) function to replace “Orange County Airport” with
“John Wayne Airport”:
select replace(
runway_name,
'Orange County Airport',
'John Wayne Airport'
)
from runway;
replace
Runway 172 – John Wayne Airport
Runway 213 – John Wayne Airport
Runway 985 – Washington National Airport
Notice that the replace( ) function didn’t change the entire string, just the part
that contained “Orange County Airport”. The text “Runway 172” and
“Runway 213” is still intact.
Also notice that we have not changed the values in the table with the
replace( ) function; we are just selecting these values for display. If we wanted
to actually update the runway table, we could use the replace( ) function as
part of an update statement:
update runway
set runway_name =
replace(
runway_name,
'Orange County Airport',
'John Wayne Airport'
)
;
This update statement will change the runways that have “Orange County
Airport” in their name but won’t affect runways that do not.
FORMAT()
The format( ) function formats values based on a format string that you
provide.
select format(
'President %s was a %s',
president_name,
president_party
)
from us_president
where president_id between 31 and 34;
format
President Herbert Hoover was a Republican
President Franklin Roosevelt was a Democrat
President Dwight Eisenhower was a Republican
President Harry Truman was a Democrat
The format string used here is “President %s was a %s”. The “%s” is a
placeholder for a string value. The format string in this case has two
placeholders. The first one was replaced by the president_name value from
the us_president table, and the second one was replaced by the value of the
president_party column.
EXTRACT()
The extract( ) function selects parts from a timestamp or time value.
We saw earlier that the now( ) function returns the current date and time. The
data type that the now( ) function returns is a “timestamp with time zone”. It
includes several subfields that you can access by using the extract( ) function.
select now();
now
2021-06-28 12:13:00.439298-04
We can extract the century from the current date and time:
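A call of this form would do it (a sketch; the date_part heading in the result is how the Postgres 13 release used in this book labels an extract( ) column):

```sql
select extract(century from now());
```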
date_part
21
We can extract the day from the current date and time:
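The same pattern applies here, with day as the field name:

```sql
select extract(day from now());
```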
date_part
28
We can extract the month from the current date and time:
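Following the same pattern, a sketch of the month query looks like this. For the June timestamp shown earlier, it would return 6:

```sql
select extract(month from now());
```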
EXP()
The exp( ) function returns the exponential of the number supplied, that is, e
raised to the power of that number.
select exp(1);
exp
2.718281828459045
PI()
The pi( ) function returns the value of pi.
select pi();
pi
3.141592653589793
POWER()
The power( ) function returns the value of a number raised to the power of
another number.
select power(3, 2);
power
9
RANDOM()
The random( ) function returns a random number that is greater than or equal
to 0 and less than 1.
select random();
random
0.41278845070616654
POSITION()
The position( ) function finds the position of a substring within a string.
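Postgres uses the form position(substring in string). A query matching the result described below would look like this:

```sql
-- Find where 'pal' starts within 'principal'.
select position('pal' in 'principal');
```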
position
7
The results tell us that the string “pal” is found at position 7 of the string
“principal”.
If we search for the position of a substring that doesn’t exist within a string,
Postgres will return a zero:
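For instance, with a substring that does not occur in the string (the value 'xyz' is just an illustrative choice):

```sql
select position('xyz' in 'principal');
```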
position
0
Strangely, Postgres has another function called strpos( ) that does the same
thing that the position( ) function does, using slightly different syntax. The
strpos( ) function also returns the position of a substring within a string, but
with the strpos( ) function, you supply the string first and the substring
second, separated by a comma:
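A sketch of the equivalent strpos( ) call, using the same values as the position( ) example:

```sql
-- String first, substring second.
select strpos('principal', 'pal');
```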
strpos
7
VERSION()
The version( ) function tells you which version of Postgres you are running.
select version();
version
PostgreSQL 13.2, compiled by Visual C++ build 1914, 64-bit
PGADMIN
Using PgAdmin to interact with Postgres will make life easier for you.
PgAdmin is a free and open source tool designed for Postgres. PgAdmin is a
graphical tool that runs in your web browser. PgAdmin is the Swiss Army
Knife of database tools. It has a slew of features for seemingly anything you
could ever want to do with Postgres.
You can download PgAdmin from the PgAdmin website. As of this writing, it
can be found at https://www.pgadmin.org/download/
The first time you use PgAdmin, it will ask you to create a master password.
Each time you start PgAdmin it will ask you for this password.
Connecting PgAdmin to a Postgres Server
To use PgAdmin with your Postgres database servers, you will create
connections to the Postgres servers that you want to use. Here I created two
connections to two different servers. I called the connections “Server 1 –
Development” and “Server 2 – Production”.
The production environment is where the real, live data is stored. Production
Postgres servers are where the data that the users of our applications see is
located.
For this reason, some companies will give developers access to test or
development Postgres servers, but not to production servers. The thinking is
that there is no way for a developer to make a mistake in production if he or
she doesn’t have access to the production environment.
If you do have access to both production and development Postgres database
servers, however, it’s a good idea to mark each server connection clearly in
PgAdmin. That will help to prevent you from making a mistake in the
production environment. You want to make sure you don’t, say, delete data
from the production environment thinking that you are in the development
environment.
That will take you to the “General” tab of PgAdmin’s “Create – Server”
screen. The only value you need to enter on this tab is the server connection
name. You can call the connection whatever you want, but it’s a good idea to
name it something meaningful that describes what the server is used for.
As mentioned, if you have access to production Postgres servers as well as
non-production Postgres servers, I recommend that you clearly mark which
connection is which when you set up your server connections. Here I called
this connection “Postgres Essentials – Production”.
Another way to make it clear that this is a production server is to use color.
Some PgAdmin users will use the “Background” color picker to change a
connection’s background to red for a production database server.
Now let’s take a look at two of the most useful parts of PgAdmin: The Tree
Control pane and the Query Tool.
In other words, the bookstore table is on the Postgres database server with
the connection I called “Server 1 – Development”. That server has a
database called “Database1”. That database has a schema called “Schema1”.
That schema has a table called “bookstore”. That table has columns called
store_id and store_name. In the Tree Control pane, we can navigate to the
bookstore table by expanding nodes until we reach it.
The numbers shown in the parens to the right of “Servers” and “Databases”
tell us how many objects of those types there are. It says “Servers (2)”
because I have 2 server connections set up in PgAdmin. It says “Databases
(3)” because there are 3 databases under the “Server 1 – Development”
server.
The Tree Control pane is powerful because it lets us quickly drill down and
examine the part of the database we are interested in. It gives us a visual
sense of how our database objects are organized in Postgres, as well as what
naming conventions were used to create those objects.
The Query Tool
The Query Tool lets you type in some SQL, run that SQL in Postgres, and see
the results. To start the Query Tool, choose “Query Tool” from the “Tools”
drop-down menu.
Below, I entered the SQL “select * from us_president;” and clicked the
“execute” icon (the one shown at the top right), and the results appeared in
the lower “data output” pane. You can also press F5 instead of clicking the
execute icon to execute a query.
The Query Tool can be used for all sorts of SQL statements. You can use it to
run SQL to create tables, create functions, run procedures, create triggers,
and more.
Let’s say we have a table called customer that looks like this:
Now imagine that the boss tells us that we have to start tracking customer
middle names. We need to add a middle name column to the customer table.
We can do that by using the “alter table” command.
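A minimal sketch of that statement follows. The text data type for the new column is an assumption, since the definition of the customer table is not shown:

```sql
-- Add a new, initially empty middle_name column to the existing table.
alter table customer
  add column middle_name text;
```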
Altering the table worked nicely. We didn’t lose any of our data and the new
column now exists. Now we could update the middle names for existing
customers and insert middle names for our new customers and we could
move on to our next task.
But there is something bugging me about this table. It is the order of the
columns. I would expect the middle_name column to come between the
first_name and last_name columns. But in our customer table, the “alter”
command added middle_name as the last column.
In relational databases like Postgres, the order of the columns doesn’t really
matter, so the “alter table / add column” statement just adds the column as the
last column in the table. Many Postgres developers would declare victory
and leave this table the way it is, with middle_name as the last column. After
all, when we select from the table, we can specify the column order we want,
like this:
select customer_id,
first_name,
middle_name,
last_name
from customer;
To reorder the columns in the customer table without losing our data, we can
copy our customer data to another table temporarily, drop and recreate our
customer table with a new column order, and then copy the saved data from
the temporary table back to our new customer table.
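Those steps can be sketched as follows. The column data types, the primary key, and the customer_temp table name are assumptions made for illustration. Any indexes, constraints, or permissions on the original table would need to be recreated too, and the drop will fail if other objects, such as foreign keys or views, depend on the table:

```sql
begin;

-- 1. Save the existing rows in a temporary holding table.
create table customer_temp as
select * from customer;

-- 2. Drop the original table and recreate it with the new column order.
drop table customer;

create table customer
(
    customer_id  int primary key,
    first_name   text,
    middle_name  text,
    last_name    text
);

-- 3. Copy the saved rows back, naming the columns explicitly.
insert into customer (customer_id, first_name, middle_name, last_name)
select customer_id, first_name, middle_name, last_name
from   customer_temp;

-- 4. Remove the holding table.
drop table customer_temp;

commit;
```

Wrapping the steps in a transaction means that if any step fails, the original table is left untouched.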
Now if you select from the customer table the column order will be updated.
Let’s take a look at a function that isn’t working properly. Let’s see if we can
figure out what the problem is. The function is supposed to compare the
number of Republican presidents with the number of Democrat presidents
and return the ratio of Democrats to the combined number of Democrats and
Republicans. The problem is that the function always returns 1.0. We would
expect that about half of the presidents that were either Democrats or
Republicans would have been Democrats, so the ratio should be about .5 or
50%, not 1.0 or 100%.
In lines 8 - 11, we are selecting the count of Democrats from the us_president
table. We put that value in a variable called v_num_democrats.
We know the function is returning 1.0 and we know that is wrong, but it’s a
little hard to look at the function and determine what the problem is. We
aren’t sure what part of the function is causing the problem. So let’s add
some raise statements to the function and see if that will help us to debug it.
We added three “raise notice” statements at lines 13, 20, and 24. These
statements will tell us the numbers of Democrats and Republicans that are
being returned from the database, as well as the ratio that was calculated in
line 22.
In the “raise notice” statements, we used a percent sign as a placeholder for
the variable and supplied the variable name after the message text. In the
statement that reports the number of Democrats, the value of the
v_num_democrats variable will be shown where the percent sign is, like “The
count of Democrats is 17”.
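As a rough reconstruction of what such a function might look like with the “raise notice” statements added, consider the sketch below. The variable v_ratio, the return type, and the exact formula are assumptions, and the layout will not match the line numbers mentioned in the text:

```sql
create or replace function f_dem_to_rep_ratio()
returns numeric
language plpgsql
as $$
declare
    v_num_democrats    int;
    v_num_republicans  int;
    v_ratio            numeric;   -- hypothetical variable name
begin
    select count(*)
    into   v_num_democrats
    from   us_president
    where  president_party = 'Democrat';

    raise notice 'The count of Democrats is %', v_num_democrats;

    select count(*)
    into   v_num_republicans
    from   us_president
    where  president_party = 'Republicans';  -- the bug under investigation

    raise notice 'The count of Republicans is %', v_num_republicans;

    -- Assumed formula: Democrats as a share of both parties combined.
    v_ratio := v_num_democrats::numeric
               / (v_num_democrats + v_num_republicans);

    raise notice 'The ratio is %', v_ratio;

    return v_ratio;
end;
$$;
```

Each notice prints the value of the variable listed after the message in place of the percent sign.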
When we run the function now, we can see the numbers being displayed from
our “raise notice” statements:
select f_dem_to_rep_ratio( );
This gives us a lot of insight into why the function is returning the wrong
ratio. The function is showing that there are zero Republicans. That can’t be
right. Now we can take a look at the part of the function that gets the number
of Republicans and determine what the problem is. The part of the function
that gets the count of Republicans looks like this:
select count(*)
into v_num_republicans
from us_president
where president_party = 'Republicans';
If we look at the rows of the us_president table we see that there are no rows
for ‘Republicans’ in the table. The values are stored in the database as
‘Republican’, without the “s”.
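One way to check the stored values, and then the corrected filter, is sketched below. These are standalone queries; inside the function, the corrected select would keep its “into v_num_republicans” clause:

```sql
-- List the party values that are actually stored in the table.
select distinct president_party
from   us_president;

-- Corrected count: 'Republican', not 'Republicans'.
select count(*)
from   us_president
where  president_party = 'Republican';
```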
Adding the “raise notice” messages helped us to figure out which part of the
function was causing the problem. That made fixing the function much easier.
Now that we have fixed the problem, we can remove the “raise notice”
statements from the function.
ALL BALLS
One of the quirkier features of Postgres is the “all balls” syntax. “All balls”
is slang for all zeroes, since zeroes look like balls. If you want to insert all
zeroes into a column that has a time data type, you can surround the text
“allballs” in single quotes.
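A quick way to see what the value becomes is to cast it to the time data type. This is a sketch using a literal rather than a table column:

```sql
-- 'allballs' is accepted as a time input and means midnight.
select 'allballs'::time;
```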
time
00:00:00
Postgres provides us with multiple ways to cast. There is the cast( ) function,
the cast operator (“::”), and there are other functions available to help us cast
like the to_char( ), to_date( ), and the to_number( ) functions.
Here is an example of a scenario where you might want to cast from one data
type to another: Let’s say we want to see the current year and month in
“YYYY-MM” format. We don’t need the day or the time. We know that there
is a now( ) function that can get the current date for us, and we know there is
a substr( ) function that we can use to get a part of a value that we care about.
Let’s use those functions:
select now();
now
2021-06-28 12:13:00.439298-04
We only need the “2021-06” part of the date, so let’s wrap the now( )
function in the substr( ) function to select just the first 7 characters.
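A sketch of that first attempt is shown below. Postgres rejects it with an error along the lines of the one in the comments:

```sql
select substr(now(), 1, 7);
-- ERROR:  function substr(timestamp with time zone, integer, integer) does not exist
-- HINT:  No function matches the given name and argument types.
--        You might need to add explicit type casts.
```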
The problem is that the substr( ) function expects the first argument it gets
sent to be a string, and we sent it the results of the now( ) function, which is a
“timestamp with time zone”.
The error message that Postgres gave us tells us how to resolve the problem:
“You might need to add explicit type casts.” We need to send the substr( )
function a string instead of a timestamp. Let’s try calling the now( ) function
and then casting the results to the text data type before we send it to the
substr( ) function.
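A sketch of that query, using the cast operator on the result of now( ):

```sql
-- Cast the timestamp to text, then take the first 7 characters.
select substr(now()::text, 1, 7);
```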
substr
2021-06
It worked.
The “::” symbol (two colons) is the Postgres cast operator. The syntax “now(
)::text” calls the now( ) function and then casts the results to a text data type.
Another way to accomplish the same thing is to use the cast( ) function:
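A sketch of the same query written with cast( ):

```sql
select substr(cast(now() as text), 1, 7);
```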
substr
2021-06
We got the same results. The syntax “cast(now( ) as text)” also calls the now(
) function and then casts the results to the text data type, but this time we are
using the cast( ) function instead of “::”.
With the cast function, we specify the data type that we want to cast to after
the word “as”:
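A few illustrative casts follow. The literal values and column aliases are assumptions chosen for the example:

```sql
select cast('42' as int)          as int_value,   -- text to integer
       cast('2021-10-28' as date) as date_value,  -- text to date
       cast(42 as text)           as text_value;  -- integer to text
```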
The to_date( ), to_timestamp( ), to_char( ), and to_number( ) functions take
two arguments: The value that needs to be converted, and the format to use
when converting the value. The format is made up of a set of character
patterns that identify the parts of the field. This group of formatting
characters is sometimes called a “format mask”.
to_date( )
The to_date( ) function converts a value to a date. For example, if we wanted
to convert a string to a date, we might use this SQL statement:
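A sketch of that statement, using the string and format discussed below:

```sql
select to_date('28 Oct 2021', 'DD Mon YYYY');
```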
We know that the original value “28 Oct 2021” is a string because it is
surrounded by single quotes. The query will return the string “28 Oct 2021”
converted to a “date” data type.
to_date
2021-10-28
The “DD Mon YYYY” format that we used when calling the function is a set
of characters that identify which part of the string “28 Oct 2021” is the day,
month, and year. By using the format “DD Mon YYYY” we let Postgres
know that the “28” in our string is a 2-digit day, the “Oct” in our string is a
3-character month that is capitalized, and the “2021” in our string is a 4-digit
year. With this information, Postgres can convert the string to a date.
Postgres provides more character patterns than you will ever need, but the
most common patterns for dates are:
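As a hedged illustration, here are a few date patterns that Postgres supports, shown with the to_char( ) function (covered later in this chapter) so that the effect of each pattern is visible. The date literal and the column aliases are arbitrary choices:

```sql
select to_char(date '2021-10-28', 'YYYY') as four_digit_year,  -- 2021
       to_char(date '2021-10-28', 'MM')   as month_number,     -- 10
       to_char(date '2021-10-28', 'Mon')  as month_abbrev,     -- Oct
       to_char(date '2021-10-28', 'DD')   as day_of_month;     -- 28
```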
to_timestamp( )
The to_timestamp( ) function converts a value to a timestamp. The value it
returns is a “timestamp with time zone”, which contains not only a date, but
also a time and a time zone. For that reason, the to_timestamp( ) function can
make use of some character patterns that weren’t useful for the to_date( )
function. The most commonly used of those character patterns are:
select to_timestamp(
'28 Oct 2021 09:36:47 am',
'DD Mon YYYY HH:MI:SS AM'
);
The query will return the string “28 Oct 2021 09:36:47 am” converted to a
“timestamp” data type. The format mask told Postgres how to parse the parts
of the string into the correct parts of the timestamp.
to_timestamp
2021-10-28 09:36:47-04
to_char( )
The to_char( ) function converts a value to a string.
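For example, a date can be formatted as text. This sketch uses a date literal rather than a table, and the “Mon DD, YYYY” format mask is an assumption chosen to match the style of the results below:

```sql
select to_char(date '2021-03-14', 'Mon DD, YYYY');  -- Mar 14, 2021
```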
to_char
Nov 01, 2020
Mar 14, 2021
Nov 07, 2021
Mar 13, 2022
Nov 06, 2022
The to_char( ) function can use some character patterns that we haven’t seen
before. These patterns are for converting numbers to strings. The most
commonly-used are:
You can see that the “99999” format mask did not display any leading zeroes,
but the “00000” format mask did. The “PL9G999.99” format mask added a
plus sign and a “group separator” to our number, which in the United States
displays as a comma. It also added a decimal point.
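To experiment with these three format masks, a sketch like the following can be run as-is. The numbers 1234 and 1234.56 and the column aliases are assumptions chosen for illustration:

```sql
select to_char(1234, '99999')         as nines,         -- no leading zeroes
       to_char(1234, '00000')         as zeroes,        -- leading zeroes shown
       to_char(1234.56, 'PL9G999.99') as plus_grouped;  -- plus sign, group separator
```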
One of the wackiest character formats is “RN”, which will return a Roman
numeral:
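A sketch of such a call, using the year 2021 to match the result below:

```sql
select to_char(2021, 'RN');
```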
to_char
MMXXI
to_number( )
The to_number( ) function converts a value to a number.
select to_number(
'$1,234.56',
'L9G999.99'
);
to_number
1234.56
The following query does a calculation using the numbers from the table:
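The same calculation can be sketched with literal values in place of the table columns. The values are taken from the discussion that follows: a property_tax of 50, an insurance of 40, and a rate of 3.25:

```sql
-- property_tax + insurance * 12 / rate, written without parens
select 50 + 40 * 12 / 3.25;
```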
197.6923076923076923
That result doesn’t seem right to me. If the property tax was 50 and the
insurance was 40, those two values added together would be 90. If you
multiply 90 by 12 you get 1,080. If you divide 1,080 by 3.25 you get about
332.31, not 197.69. What’s going on here?
It turns out that Postgres didn’t solve the calculation in a left-to-right manner.
Instead, it did the multiplication and division first and then it added the
property tax. It multiplied the insurance of 40 by 12, which came to 480.
Then it divided 480 by the rate of 3.25, which came to 147.69, and then it
added the property tax of 50, for a total of 197.69.
That’s not what I had intended when I wrote the query, but the way Postgres
performed the calculation was actually correct. The problem was in the way
I wrote the query, but luckily there is an easy solution: Use parens.
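Using literal values in place of the property_tax, insurance, and rate columns (an assumption made so the sketch runs on its own), the grouped calculation looks like this:

```sql
-- ((property_tax + insurance) * 12) / rate
select ((50 + 40) * 12) / 3.25;
```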
332.3076923076923077
Here I added parens to group parts of the calculation. I put parens around
“property_tax + insurance” so that they would be calculated together. I also
put parens around “(property_tax + insurance) * 12”. Postgres honored my
groupings and produced the results that I wanted.
As a check, I like to count the number of left and right parens to make sure
they match. In “((property_tax + insurance) * 12) / rate” we have 2 left
parens and 2 right parens.
In its simplest form, you can copy data from a Postgres table to a file by
using the copy command like this:
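A sketch of that command, using the file path shown in the listing below. Note that the file is written on the database server, so the account running the command needs permission to write files there:

```sql
copy us_president
to '/home/pres/us_president.txt';
```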
> cd /home/pres/
> cat us_president.txt
1, George Washington
2, John Adams, Federalist
3, Thomas Jefferson, Democratic-Republican
4, James Madison, Democratic-Republican
5, James Monroe, Democratic-Republican
…
We can see that the copy command copied all 45 rows from the us_president
table to the /home/pres/us_president.txt file. Postgres didn’t remove the rows
from the us_president table, it just created a data file with a copy of the
values from the us_president table.
The copy command lets us create files in different formats. For example, we
can create a CSV (Comma Separated Values) file by specifying “with csv”:
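A sketch of the CSV version, using the file name shown in the listing below:

```sql
copy us_president
to '/home/pres/us_president.csv'
with csv;
```

Newer-style syntax writes the same option as “with (format csv)”.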
The file now gets created with commas between each field:
> cd /home/pres/
> cat us_president.csv
1,George Washington,
2,John Adams,Federalist
3,Thomas Jefferson,Democratic-Republican
4,James Madison,Democratic-Republican
5,James Monroe,Democratic-Republican
…
Using SQL, let’s create a new file and use a pipe character as a delimiter
between the fields in the file:
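A sketch of that command, using the file name shown in the listing below:

```sql
copy us_president
to '/home/pres/us_president_pipe.txt'
with delimiter '|';
```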
In the last data file we created, we saw commas between the fields, but in
this file we see pipe characters instead.
> cd /home/pres/
> cat us_president_pipe.txt
1|George Washington|
2|John Adams|Federalist
3|Thomas Jefferson|Democratic-Republican
4|James Madison|Democratic-Republican
5|James Monroe|Democratic-Republican
…
Now let’s look at loading data from a file into a table. Let’s load the rows
from our us_president.txt file into a new table called us_president_staging.
To load data from a file into a table, you can specify that you want to copy
“from” the file instead of copying “to” the file.
When you load data from a file into a database, it’s a good idea to load the
data into a staging table first and move the data to its permanent table later. A
“staging table” is a table that you create just for the purposes of loading data.
Using this approach, you can decrease the risk of causing problems with the
data in your permanent table if there is a problem loading the data from the
file.
First, let’s create the us_president_staging table with the same column
definitions as the us_president table, but with no rows of data:
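One way to do that is with the “like” clause, which copies the column names and data types of an existing table without copying any rows. This is a sketch of one possible approach:

```sql
create table us_president_staging
(
    like us_president
);
```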
Now we can load data from the data file into our staging table using SQL:
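A sketch of the load, assuming the file created earlier in /home/pres/ is still in place on the database server:

```sql
copy us_president_staging
from '/home/pres/us_president.txt';
```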
Now we could load the contents of the staging table into a permanent table.
For example, a table containing data about authors might have a primary key
of the author_id column. That means that there can’t be more than one row in
the table that has the same value for the author_id column.
In another table containing data about books, the primary key might be the
author_id and the book_name columns. That would allow one – but only one
– row per author ID and book name. There could never be more than one row
in the table with a particular author_id and book_name combination.
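These two kinds of primary keys could be declared as shown below. The table names, the author_name column, and the data types are assumptions made for illustration:

```sql
-- Single-column primary key: one row per author_id.
create table author
(
    author_id    int primary key,
    author_name  text
);

-- Two-column primary key: one row per author_id and book_name combination.
create table book
(
    author_id  int,
    book_name  text,
    primary key (author_id, book_name)
);
```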
One way to approach this is to create a staging table with no primary key to
store the donation detail information. We could load the data file into that
table. Then we could use the data in that detail staging table to build a
summary table.
create table donation_detail_staging
(
donor text,
donation_amount int
);
When we load our data file into the donation_detail_staging table, we might
see data that looks like this:
donor donation_amount
Jacki Smith 200
Jacki Smith 850
RJ Boyle 100
Jacki Smith 100
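The summary table can then be built from the detail rows. In this sketch, the primary key on donor and the bigint data type for donation_sum are assumptions:

```sql
create table donation_summary
(
    donor         text primary key,
    donation_sum  bigint
);

-- One row per donor, with that donor's amounts added together.
insert into donation_summary (donor, donation_sum)
select donor,
       sum(donation_amount)
from   donation_detail_staging
group by donor;
```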
If we look at the data in the donation_summary table, we see only two rows:
donor donation_sum
Jacki Smith 1150
RJ Boyle 100
Primary keys are important because they prevent duplicate rows, and they
speed up joins between tables. For that reason, we might choose to use the
donation_summary table in any future SQL joins, and drop the
donation_detail_staging table once we are done using it to build the
donation_summary table.
In your query, you can join to the CTE as if it were a table. You can create a
CTE using the “with” keyword:
with party_cte as
(
select president_party,
count(*) as president_count
from us_president
group by president_party
)
select avg(president_count)
from party_cte;
avg
7.6666666666666667
select president_party,
count(*) as president_count
from us_president
group by president_party;
president_party president_count
null 1
Republican 19
Democratic-Republican 4
Federalist 1
Democrat 17
Whig 4
The syntax for calling Postgres, of course, will vary depending upon the
computer language that you are using. The way that Python programs call
Postgres will look a little different from the way that Java or PHP programs
call Postgres, but there are similarities.
Let’s take a look at how we would call a Postgres database from the Python
programming language.
A common way to access a Postgres database from Python programs is to use
the “psycopg2” database adapter. Psycopg2 is compatible with Python
version 3, which – at the time of this writing – is the version of Python that
most people are using.
As of this writing, psycopg3 is in development, but not yet ready for prime
time.
Apparently, the name “psycopg2” comes from combining “psyco” with “pg”
and adding the number 2. Rumor has it that the name “psyco” is a reference to
an old Python compiler, “pg” stands for Postgres, and the number 2 means
that it implements version 2 of the Python DB API.
1 import psycopg2
2
3 conn = psycopg2.connect(
4 host="111.11.11.11",
5 port="5432",
6 database="president",
7 user="Rick",
8 password="guacamole83")
9
10 cur = conn.cursor()
11 cur.execute ('select * from us_president;')
12 presidents = cur.fetchall()
13 for president in presidents:
14 print(f"President {president[0]} was {president[1]}")
15 cur.close()
In line 1, we import the psycopg2 module so that we can interact with our
Postgres database.
In line 10, we use the “conn” Python variable to create a Python variable
named “cur”, which represents a cursor. A cursor allows us to send
commands to the Postgres database and return data back to our Python
program.
In line 11, we use our cursor to execute a SQL command. We select “star”
(everything) from the us_president database table.
In line 12, we fetch all the results and write them to a Python variable named
“presidents”.
In line 13, we use a Python “for loop” to iterate through all the presidents
data and write each individual row to a Python variable called “president”
(with no “s” at the end).
In line 14, we use Python’s f-string formatting to display the first two
columns we got from the Postgres database for the row. The columns are
zero-based, so president[0] is the first column and president[1] is the second
column. We could have also displayed each president’s party – the third
column – by printing president[2], but we chose not to.
Regardless of the programming language you use, there will be some module,
method, or driver that creates a connection to the database using credentials
like server name, database name, database user, and database password.
There will be a way to provide SQL, execute it against the database, and
return the results to your program. Sometimes that SQL will be a select,
insert, update, or delete statement, but sometimes it will be a call to a stored
procedure or a database function.
The most common ORM for the Python language is SQLAlchemy. Django – a
Python Web framework – uses its own ORM called “Django ORM”. The
most popular ORM for the Java language is Hibernate.
FINAL THOUGHTS
Thank you for reading this book. I hope it helped you to understand Postgres.
If you enjoyed the book, I would love it if you would leave a review on
Amazon. If you would like to suggest changes to future editions, or if you
found an error in the book, feel free to email me at
[email protected].
Learning to work with relational databases, like Postgres, can stretch your
mind. Seeing data in terms of rows and columns can be a different way to
conceptualize the world. Understanding an organization’s data by
understanding its primary keys may be a new way of thinking for you, so I
congratulate you for working your way through this book.
I expect to see Postgres take market share away from the big proprietary
database vendors in the future, and I hope you will benefit from that trend.
I hope that you will continue to enjoy learning new technologies and
techniques throughout your career. Believe in yourself and do your best to
guard against “imposter syndrome” – that nagging doubt about your own
abilities and the feeling of being a fraud – which is so prevalent among
people in our industry. By continuing to learn and by having fun in the
process, you can’t help but accumulate expertise and become skilled in
whichever technologies you choose to learn.