
CENG301 Database Management Systems

Session-12
Asst. Prof. Mustafa YENIAD
[email protected]
PostgreSQL Vacuum
• The term "vacuum" refers to a database maintenance procedure that reclaims storage space
and optimizes database performance.
• Whenever data is inserted, updated, or deleted in a PostgreSQL database, it can generate "dead tuples"
- rows that have become obsolete or are inaccessible.
• Sometime in the 2000s, the developers of PostgreSQL found a major drawback in the design of their
relational database management system with respect to storage space and transaction speed: it turned
out that the UPDATE query was becoming an expensive routine.
DBMS

• UPDATE duplicated the old row and wrote the new data into the copy, which meant that the size of
the database (or of individual tables) was not bound by any limit! Additionally, deleting a row
only MARKED the row as deleted while the actual data remained untouched - which, as a side effect,
later supported data forensics.
• This may sound familiar, since it is what present-day file
systems and data recovery software rely on: data, when
deleted, remains intact on the magnetic disk in its raw form
but is hidden from the interface. However, keeping old row
versions was also important for older, still-running transactions,
so it would not have been right to compromise transactional
integrity. With this as sufficient stimulus, the Postgres team
soon introduced the 'vacuum' feature, which literally vacuumed
away the deleted rows. However, this was a manual process, and
because of the several parameters involved it wasn't convenient.
Hence, autovacuum was developed.
PostgreSQL Vacuum
• Remember: UPDATE in PostgreSQL would perform an insert and a delete. Hence, all the records being
UPDATED have been deleted and inserted back with the new value.
• As mentioned above, every such record that has been deleted but is still taking some space is called a
dead tuple.
• Once no running transaction depends on those dead tuples, they are no longer needed.
• Thus, PostgreSQL runs VACUUM on such tables. VACUUM reclaims the storage occupied by these dead
tuples.
• The VACUUM operation in PostgreSQL identifies and eliminates these dead tuples, freeing up disk space to
utilize for future operations.
• In a large-scale datacenter, the tables in the Application Server PostgreSQL database can grow quite large.
Performance can degrade significantly if stale and temporary data are not systematically removed.
• Vacuuming cleans up stale or temporary data in a table, and analyzing refreshes its knowledge of all the
tables for the query planner.
PostgreSQL Vacuum
• The "autovacuum_vacuum_threshold" parameter in PostgreSQL determines the minimum number of
updated or deleted tuples required in a table before the autovacuum process is triggered.

• The default is 50 tuples, meaning that if 50 or more tuples are modified in a table, autovacuum will be
triggered. However, you can adjust this value in the PostgreSQL configuration file (postgresql.conf) or by
changing table storage parameters.
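Per the PostgreSQL documentation, the actual trigger condition also involves a scale factor: autovacuum fires when dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples. A sketch of tuning both as per-table storage parameters (the table name orders is a hypothetical example):

```sql
-- Make autovacuum fire earlier on one busy table:
ALTER TABLE orders SET (
    autovacuum_vacuum_threshold    = 1000,
    autovacuum_vacuum_scale_factor = 0.01   -- 1% of the table instead of the 20% default
);
```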

• The default auto-vacuum analyze and vacuum settings are sufficient for a small deployment, but the
percentage thresholds take longer to trigger as the tables grow larger. Performance can degrade significantly
before the auto-vacuum vacuuming and analyzing occurs.

• Autovacuum is one of the background utility processes that starts automatically when you start PostgreSQL.

• To confirm whether the autovacuum daemon is running on LINUX, use the command below:
$ ps aux|grep autovacuum|grep -v grep
• Alternatively, the SQL query below can be used to check the status of the autovacuum in the pg_settings:
$ sudo --login -u postgres # switch to the postgres account
postgres@[hostname]:~ $ psql # then access the PostgreSQL prompt immediately
postgres=# SELECT name, setting FROM pg_settings WHERE name LIKE '%autovacuum%';
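If autovacuum seems idle, the statistics collector can show whether dead tuples are actually piling up; the standard view pg_stat_user_tables reports per-table counts and the last (auto)vacuum times:

```sql
-- Dead-tuple counts and last (auto)vacuum times, worst tables first:
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```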
PostgreSQL Vacuum
• The VACUUM command will reclaim storage space occupied by dead tuples.
• VACUUM can be run on its own, or together with the ANALYZE command.

• In the examples below, [tablename] is optional. Without a table specified, VACUUM will be run on ALL available tables in
the current schema that the user has access to.
• Plain VACUUM: Frees up space for re-use.
postgres=# VACUUM [tablename];
• Full VACUUM: Locks the database table, and reclaims more space than a plain VACUUM.
postgres=# VACUUM(FULL) [tablename];
• A plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. This form of the command
can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. However,
extra space is not returned to the operating system (in most cases), it is made available for re-use within the same
table.
• VACUUM(FULL) rewrites the entire contents of the table into a new disk file with no extra space, allowing unused
space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on
each table while it is being processed, and usage of the table is blocked until it completes. We may consider
this an outage of the table.

• VACUUM(FULL) is useful when a particular table is full of dead rows and not expected to become that big again.
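To verify how much space a VACUUM(FULL) actually returned, the table size can be compared before and after with the built-in size functions (mytable is a placeholder name):

```sql
SELECT pg_size_pretty(pg_table_size('mytable'));   -- size before
VACUUM (FULL) mytable;
SELECT pg_size_pretty(pg_table_size('mytable'));   -- size after the rewrite
```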
PostgreSQL Vacuum
• Full VACUUM and ANALYZE: Performs a full VACUUM and gathers fresh statistics on query execution paths using ANALYZE:
postgres=# VACUUM(FULL, ANALYZE) [tablename];
• Verbose Full VACUUM and ANALYZE: Performs a full VACUUM and gathers fresh statistics on query execution paths using
ANALYZE, with verbose progress output:
postgres=# VACUUM(FULL, ANALYZE, VERBOSE) [tablename];

• As can be seen in the figure, with PostgreSQL's regular VACUUM operation, the DBMS only removes
dead tuples from each table page and reorganizes the page to put all the live tuples at the end of the page.
• With VACUUM(FULL), PostgreSQL removes the dead tuples from each page, coalesces and compacts the
remaining live tuples onto a new page (Table Page #3), and then deletes the unneeded pages (Table Pages
#1 / #2).

• ANALYZE gathers statistics for the query planner to create the most efficient query execution paths. Per PostgreSQL
documentation, accurate statistics will help the planner to choose the most appropriate query plan, and thereby improve the
speed of query processing:
postgres=# ANALYZE [tablename];
PostgreSQL Index
• Fundamentally, a database index is a strategically designed data structure that enhances the speed of data retrieval in a table:
a sorted structure holding, for each entry, a pointer to the corresponding record in the original table, where the data is actually stored.
• PostgreSQL indexes are effective tools to enhance database performance. Indexes help the database server find specific rows much faster than it
could do without indexes.
• Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a
table. An index in a database is very similar to an index in the back of a book. For example, if you want to reference all pages in a book that
discusses a certain topic, you have to first refer to the index, which lists all topics alphabetically and then refer to one or more specific page
numbers.
• An index helps to speed up SELECT queries and WHERE clauses; however, it slows down data modification with UPDATE and INSERT
statements. Indexes can be created or dropped with no effect on the data. Adding an index can improve query time from minutes to milliseconds.
• Keep in mind: Indexes add write and storage overheads to the database system. Therefore, using them appropriately is very important! This
means you shouldn't create indexes unnecessarily.
• Before moving on you should know that indexes are not perfect and can also be a query performance killer.
• While indexes are good at speeding up the time it takes to find data, they can actually slow down UPDATE, INSERT, or DELETE queries
because of the need to reindex the data when the table changes.
• As a rule, if your tables are constantly modified with frequent INSERTs and UPDATEs (more often than you read the data), then indexes will
cause performance degradation.
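As a quick check on this trade-off, the standard statistics view pg_stat_user_indexes can reveal indexes that are never read yet still slow down every write:

```sql
-- Indexes that have never been used by any scan:
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```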
PostgreSQL Index
• Advantages of Indexes in PostgreSQL
Leveraging indexes in PostgreSQL provides several key advantages:
• Rapid data access: Indexes are instrumental in drastically slashing the time needed to retrieve data, particularly from large tables. Without
indexes, a complete table scan would be required, which can be quite time-consuming.
• Boosted query efficiency: Queries that include conditions in the WHERE clause or require table joins see marked improvements in
performance with indexing. Such queries leverage indexes to quickly pinpoint rows that fulfill the set criteria.
• Reduced disk I/O: As indexes hold a subset of the table's data, disk I/O operations are significantly lessened. This not only speeds up query
execution but also lightens the load on the storage system.
• Data integrity maintenance: Unique indexes act as safeguards against duplicate values within specific columns, thereby maintaining data
integrity by ensuring no two rows have identical values in the designated columns.
• Disadvantages of Indexes in PostgreSQL
Despite the benefits, there are some potential pitfalls to using indexes in PostgreSQL:
• Increased storage requirement: The most obvious downside of utilizing indexes is the extra storage space they require. The precise amount
depends on the size of the table and the number of indexed columns. Usually, it's a small fraction of the total table size. However, for large
datasets, adding multiple indexes can lead to a significant increase in storage usage.
• Slower write operations: Every time a row is inserted, updated, or deleted, the index must be updated too. As a result, write operations may
become slower. It's crucial to balance read and write operations when considering index usage. If your application relies heavily on write
operations, the benefits of faster reads should be carefully weighed against the cost of slower writes.
• Fewer HOT updates: PostgreSQL employs a mechanism called Multi-Version Concurrency Control (MVCC), so every update effectively
creates a new row version. Normally a "HOT" (Heap-Only Tuple) update lets PostgreSQL skip touching the indexes, but this optimization
only applies when no indexed column is modified. The more columns you index, the fewer updates qualify as HOT: each update then
generates a new entry in every associated index, which leads to increased I/O activity and more dead rows in the database.

Despite PostgreSQL indexes offering remarkable benefits in query performance, it's crucial to judiciously balance these advantages against
potential drawbacks, especially in situations where storage efficiency and write performance are key considerations.
PostgreSQL Index - Common PostgreSQL Index Types
1. B-Tree indexes
The B-tree (Balanced Tree) index stands as the default and most widely employed index type within PostgreSQL. When an
indexed column participates in a comparison employing any of these operators: <, <=, =, >=, >, the query planner will consider
employing a B-tree index.

B-tree indexes can also be effectively utilized with operators like BETWEEN and IN. Furthermore, an IS NULL or IS NOT NULL
condition on an indexed column can be combined with a B-tree index. By default, B-tree indexes organize their entries in
ascending order, with null last. If you wish to gain more insights into index ordering and its potential advantages in specific
scenarios, you can refer to the following link: https://www.postgresql.org/docs/current/indexes-ordering.html
In the following example, a B-tree index is created on the product_id column of the products table. This index enhances the
speed of queries that seek specific product_id values or product_id ranges.
postgres=# CREATE INDEX index_product_id ON products (product_id);

2. Hash indexes
Hash indexes are ideal for equality-based lookups, but they don't support range queries or sorting. They work well with data
types like integers and are usually faster than B-tree indexes for equality checks.
postgres=# CREATE INDEX hash_index_product_id ON products USING HASH (product_id);
PostgreSQL Index - Common PostgreSQL Index Types
3. Composite indexes
A multicolumn index is defined on more than one column of a table.
postgres=# CREATE INDEX index_product_id_name ON products (product_id, product_name);
It is presumed that the product_id column is heavily utilized when retrieving data from the products table. Therefore, we are
giving precedence to the product_id column over the product_name column.

When deciding whether to create a single-column index or a multicolumn index, it's essential to consider the column or
columns that are frequently used in a query's WHERE clause as filter conditions.
4. Partial indexes
A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate
of the partial index). The index contains entries only for those table rows that satisfy the predicate.
postgres=# CREATE INDEX index_product_id ON products (product_id) WHERE product_available = 'true';
One common use case for partial indexes is to filter out rows that are irrelevant for most queries. For example, suppose you
have a table of products with a column called product_available, which can be 'true' or 'false'. Most queries will only need to
access the available products, so you can create a partial index on the availability column, where product_available = 'true'.
PostgreSQL Index - Common PostgreSQL Index Types
5. Covering indexes
Covering index allows a user to perform an index-only scan if the select list in the query matches the columns that are included
in the index. Additional columns can be specified using the INCLUDE keyword.
postgres=# CREATE INDEX index_product_id_name_status ON products (product_id, product_name)
           INCLUDE (status);

When the user runs the following query:
postgres=# EXPLAIN ANALYZE SELECT product_id, product_name, status FROM products
           WHERE product_id < 10;

All the columns specified in the SELECT clause will be retrieved directly from the index pages. In theory, this can significantly
reduce the amount of I/O (input/output) your query requires to access information.

Traditionally, I/O operations represent a significant bottleneck in database systems, so by avoiding access to the heap (table
data) and minimizing the need for multiple I/O operations, PostgreSQL can enhance query performance.

However, it's important to exercise caution when implementing covering indexes. Each column added to the index still takes up
space on disk, and there is an associated cost for maintaining the index, especially when it comes to row updates.
PostgreSQL Index - Common PostgreSQL Index Types
6. Block Range Index (BRIN)
BRIN indexes are designed for large tables with sorted data, such as time-series data. They divide the table into logical blocks
where each block contains a range of values. Instead of storing individual index entries for each row, the index stores the
minimum and maximum values within each block, making them smaller in size compared to other index types.

postgres=# CREATE INDEX btree_example_index ON logs (log_date);            -- index size, e.g. 528 MB

postgres=# CREATE INDEX brin_example_index ON logs USING BRIN (log_date);  -- index size, e.g. 521 kB
7. Other indexes
For complex data types like arrays or geometric shapes, use GiST, GIN, or SP-GIST indexes.
PostgreSQL Index - Drop or List Indexes
• Sometimes, you may want to remove an existing index from the database system. To do it, you use the DROP
INDEX statement as follows:
postgres=# DROP INDEX index_name;

• PostgreSQL does not provide a command like SHOW INDEXES to list the index information of a table or database.
However, it does provide you with access to the pg_indexes view so that you can query the index information.
• The following statement lists all indexes of the schema public in the current database:
postgres=# SELECT tablename, indexname, indexdef FROM pg_indexes
           WHERE schemaname = 'public' ORDER BY tablename, indexname;

• Also you can use the \d meta-command to view the index information for a table.
postgres=# \d table_name;
PostgreSQL Reindex
• The REINDEX command rebuilds one or more indices, replacing the previous version of the index. REINDEX can be
used in many scenarios, including the following (from PostgreSQL documentation):
• An index has become corrupted, and no longer contains valid data. Although in theory, this should never happen, in practice
indexes can become corrupted due to software bugs or hardware failures. REINDEX provides a recovery method.

• An index has become "bloated", that is it contains many empty or nearly-empty pages. This can occur with B-tree indexes in
PostgreSQL under certain uncommon access patterns. REINDEX provides a way to reduce the space consumption of the index by
writing a new version of the index without the dead pages.
• A storage parameter (such as fillfactor) has been changed for an index, and you need to ensure that the change has taken full effect.

• An index build with the CONCURRENTLY option failed, leaving an "invalid" index. Such indexes are useless but it can be convenient
to use REINDEX to rebuild them. Note that REINDEX will not perform a concurrent build. To build the index without interfering with
production it is necessary to drop the index and reissue the CREATE INDEX CONCURRENTLY command *.

postgres=# REINDEX INDEX myindex;       -- recreate a single index, myindex
postgres=# REINDEX TABLE mytable;       -- recreate all indices in a table, mytable
postgres=# REINDEX SCHEMA public;       -- recreate all indices in schema public
postgres=# REINDEX DATABASE postgres;   -- recreate all indices in database postgres
postgres=# REINDEX SYSTEM postgres;     -- recreate all indices on system catalogs in database postgres
* In older releases, any of these could be forced by adding the keyword FORCE after the command; in current versions the keyword is deprecated and ignored.
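Note that since PostgreSQL 12, REINDEX also accepts a CONCURRENTLY option that rebuilds an index without taking locks that block writes, avoiding the drop-and-recreate workaround described above (at the cost of a slower build that cannot run inside a transaction block):

```sql
-- Rebuild without blocking concurrent writes (PostgreSQL 12+):
REINDEX INDEX CONCURRENTLY myindex;
REINDEX TABLE CONCURRENTLY mytable;
```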
PostgreSQL Write-Ahead Log (WAL)
• The Write-Ahead Log (WAL) is a very important term in transaction processing.
• In PostgreSQL, it is also known as a transaction log.
• A log is a record of all the events or changes and WAL data is just a description of changes made to the actual data.
So, it is "data about data" or metadata.
• Using the Postgres WAL entries you can:
• restore the database back to its state at any previous point in time
• replicate the changes on a byte-by-byte level, creating an identical copy of the database in another server.
• In practice, a process called WAL receiver, running on the standby server, connects to the primary server using a TCP/IP
connection. On the primary server, another process, named WAL sender, is in charge of sending the WAL registries
to the standby server as they happen.


• When configuring streaming replication, you have the option to enable WAL archiving. This is not mandatory, but it is extremely
important for a robust replication setup, as it is necessary to prevent the main server from recycling old WAL files that have not
yet been applied to the standby server. If this occurs, you will need to recreate the replica from scratch.
PostgreSQL Write-Ahead Log (WAL)
• The term "Write-Ahead Log" implies that any change that you make to the database must first be appended
to the log file, and then the log file should be flushed to disk. What is the consequence of this? The basic
purpose of Write-Ahead Logging is to ensure that when there is a crash in the operating system or
PostgreSQL or the hardware, the database can be recovered. This is because, with the help of the log
records, you can recover all the transactions to the data pages.

• With most installations and packages, 16 MB is the size of the WAL segments. Unless your transaction rate is
through the roof, a 16 MB segment size is good enough.
• You can change this size by adjusting the --with-wal-segsize build-time option.

• As new records are written, they are appended to WAL logs. Its position is defined by a Log Sequence
Number. The Log Sequence Number (LSN) is a unique identifier in the transaction log. It represents a position
in the WAL stream. That is, as records are added to the Postgres WAL log, their insert positions are described
by the Log Sequence Number.

• You can look at two LSN values and based on the difference, determine how much WAL data lies in between
them. This will let you estimate the advancement of recovery.
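PostgreSQL exposes this calculation as the pg_wal_lsn_diff() function; the arithmetic itself is simple enough to sketch in a few lines of Python (the sample LSN values below are made up):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN of the form 'XXX/YYYYYYYY' to a byte position.
    The part before the slash is the high 32 bits, the part after
    is the low 32 bits of a 64-bit WAL position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def wal_bytes_between(lsn_a: str, lsn_b: str) -> int:
    """How much WAL lies between two positions (lsn_b ahead of lsn_a)."""
    return lsn_to_bytes(lsn_b) - lsn_to_bytes(lsn_a)

# e.g. estimating replication lag between two reported positions:
print(wal_bytes_between("16/B374D848", "16/B3751000"))  # prints 14264
```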
PostgreSQL Write-Ahead Log (WAL)
• Benefits of WAL in PostgreSQL:
• As only log files are flushed to disk during transactional
commit, it reduces the number of disk writes.
• The cost of syncing the log file is low, as log files are
written sequentially.
• It adds data page consistency.
• Postgres WAL offers on-line backup and point-in-time
recovery.
• WAL files are stored in $PGDATA/pg_wal. Typically these are 16 MB files with a 24-character
filename made of hex digits (0-9, A-F).
[Fig-1: How replication works]
• New WAL files keep getting created in the course of operation of
the server, and the old ones are effectively deleted (they're
actually renamed and reused, as this incurs slightly less disk I/O
than deleting and creating).

• The WAL file name is in the format TTTTTTTTXXXXXXXXYYYYYYYY. Here 'T' is the timeline,
'X' is the high 32 bits of the LSN, and 'Y' is the low 32 bits of the LSN.
[Fig-2: WAL file naming]
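As an illustration of the naming scheme, here is a short Python sketch that decomposes a WAL file name, assuming the default 16 MB segment size (strictly speaking, the last field is the segment index - the low 32 bits of the LSN divided by the segment size - rather than the raw low 32 bits):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default 16 MB segments

def parse_wal_filename(name: str):
    """Split a 24-hex-character WAL file name into its parts."""
    assert len(name) == 24
    timeline = int(name[0:8], 16)
    log_high = int(name[8:16], 16)   # high 32 bits of the LSN
    seg      = int(name[16:24], 16)  # segment index within that 4 GB "log"
    # Byte position in the WAL stream where this segment starts:
    start_lsn = (log_high << 32) + seg * WAL_SEG_SIZE
    return timeline, log_high, seg, start_lsn

print(parse_wal_filename("000000010000001600000059"))
```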
PostgreSQL Write-Ahead Log (WAL)
• Why should we follow WAL files?
• Unfortunately, it can be difficult to predict the number of WAL files needed (and therefore the disk space needed for them)
for the normal operation of the server.
• Configuration settings related to checkpoint (timeout, completion targets), WAL files (min and max wal file sizes,
compression) and archiving (timeout) will also influence the number of WAL files lying in $PGDATA/pg_wal directory. On
top of all this, features like WAL archiving and replication slots can cause the retention of WAL files.
• Increases in the WAL file count are typically caused by VACUUM-like maintenance tasks that create a large volume of changes,
or by temporary tables and objects, in a short period of time. These should come slowly back down to normal levels. Such episodes
typically result in a lot of disk I/O and CPU activity, causing application queries to run slower until things are back to
normal.

• Increases in the count that refuse to come back down have to be dealt with quickly. These can be because of:
• Archival failures: If the archive script fails for a certain WAL file, Postgres will retain it and keep retrying until it
succeeds. In the meantime, new WAL files will keep getting created. Ensure that WAL archival processes are not
broken, and can keep up with the WAL creation rate.

• Replication failures: When using streaming replication if the standby goes offline for extended periods of time,
or if someone forgot to delete the replication slot on the primary, the WAL files can be retained indefinitely.

• Long running transactions: These can prevent checkpoints, and therefore the WAL files have to be retained until
the time checkpointer can make progress. Ensure that you have no long running transactions, especially ones
that mutate a lot of data.
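One way to hunt for such long running transactions is the standard pg_stat_activity view (the 10-minute cutoff below is an arbitrary choice):

```sql
-- Transactions that have been open for more than 10 minutes:
SELECT pid, now() - xact_start AS duration, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '10 minutes'
ORDER BY duration DESC;
```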
PostgreSQL Write-Ahead Log (WAL)
• How Can We Monitor / Follow WAL Files?
• Scripts:
• Shell scripts that simply monitor the count of files in the $PGDATA/pg_wal directory, and send the values to your
existing monitoring systems should help you keep track of the WAL file count.
• Existing script-based tools like check_postgres can also collect this information. You should also have a way to
correlate this count with the PostgreSQL activity going on at a specific time.
• Queries:
• PostgreSQL does not have a built-in function or view that directly returns WAL file related information. You can,
however, use this query (the WAL directory is pg_wal since PostgreSQL 10; on older versions it was pg_xlog):
postgres=# SELECT COUNT(*) FROM pg_ls_dir('pg_wal') WHERE pg_ls_dir ~ '^[0-9A-F]{24}';
that does the job of getting the count of WAL files. (Note that you'll need superuser privileges or explicit GRANTs to do a pg_ls_dir)
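On PostgreSQL 10 and later there is also a dedicated function, pg_ls_waldir(), which lists the WAL directory together with file sizes, so the same check can be done in one query:

```sql
-- Count and total size of WAL segment files (PostgreSQL 10+):
SELECT COUNT(*) AS wal_files, pg_size_pretty(SUM(size)) AS total_size
FROM pg_ls_waldir()
WHERE name ~ '^[0-9A-F]{24}$';
```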
• pgmetrics:
• pgmetrics (https://pgmetrics.io) is an open-source tool that can collect and report a lot of PostgreSQL metrics,
including WAL file counts.
• pgDash:
• pgDash (https://pgdash.io/) is a modern, in-depth monitoring solution designed specifically for PostgreSQL
deployments.
• pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected
using the open-source tool pgmetrics. With pgDash you can correlate WAL file activity at any given time with
the SQL queries that were running at the time, and system-level metrics like CPU and memory usage.
Database Query - JOINS
• PostgreSQL join is used to combine columns from one (self-join) or more tables based on the values of the common
columns between related tables. The common columns are typically the primary key columns of the first table and
foreign key columns of the second table.
• PostgreSQL supports: inner join, left join, right join, full outer join, cross join, natural join and a special kind of join
called self-join.

[Figure: Venn diagrams of the join types - inner join; left outer join; right outer join; left/right outer join restricted to rows found only in one table; full outer join; full outer join restricted to rows unique to either table]
Database Query - JOINS - Setting up sample tables
• Suppose you have two tables called basket_a and basket_b that store fruits:
postgres=# CREATE TABLE basket_a (
    a INT PRIMARY KEY,
    fruit_a VARCHAR (100) NOT NULL
);

CREATE TABLE basket_b (
    b INT PRIMARY KEY,
    fruit_b VARCHAR (100) NOT NULL
);

INSERT INTO basket_a (a, fruit_a)
VALUES
    (1, 'Apple'),
    (2, 'Orange'),
    (3, 'Plum'),
    (4, 'Pineapple');

INSERT INTO basket_b (b, fruit_b)
VALUES
    (1, 'Orange'),
    (2, 'Apple'),
    (3, 'Watermelon'),
    (4, 'Pear');
Database Query - JOINS - Inner Join
• The following statement joins the first table (basket_a) with the second table (basket_b) by matching the values in the
fruit_a and fruit_b columns (returns matching rows in both tables):
postgres=# SELECT a, fruit_a, b, fruit_b
FROM basket_a
INNER JOIN basket_b
ON fruit_a = fruit_b;
• The inner join examines each row in the first table (basket_a).
• It compares the value in the fruit_a column with the value in the fruit_b column of each row in the second table
(basket_b).
• If these values are equal, the inner join creates a new row that contains columns from both tables and adds this new
row to the result set.
Database Query - JOINS - Left Join
• The following statement uses the left join to join the first table (basket_a) with the second table (basket_b). In the left join
context, the first table is called the left table and the second table is called the right table:
postgres=# SELECT a, fruit_a, b, fruit_b
FROM basket_a
LEFT JOIN basket_b
ON fruit_a = fruit_b;
• The left join starts selecting data from the left table. It compares values in the fruit_a column with the values in the
fruit_b column in the basket_b table. If these values are equal, the left join creates a new row that contains columns
of both tables and adds this new row to the result set. (see the row #1 and #2 in the result set).
• In case the values are not equal, the left join still creates a new row that contains columns from both tables and adds
it to the result set. However, it fills the columns of the right table (basket_b) with NULL (rows #3 and #4 in the
result).
Database Query - JOINS - Left Join - only rows from the left table
• To select rows from the left table that do not have matching rows in the right table, you use the left join with a WHERE
clause. For example:
postgres=# SELECT a, fruit_a, b, fruit_b
FROM basket_a
LEFT JOIN basket_b
ON fruit_a = fruit_b
WHERE b IS NULL;
• Note that the LEFT JOIN is the same as the LEFT OUTER JOIN so you can use them interchangeably.
• The following diagram illustrates the left join that returns rows from the left table that do not have matching rows from
the right table:
Database Query - JOINS - Right Join
• The right join is a reversed version of the left join. The right join starts selecting data from the right table. It compares each value
in the fruit_b column of every row in the right table with each value in the fruit_a column of every row in the basket_a table.
• If these values are equal, the right join creates a new row that contains columns from both tables.
• In case these values are not equal, the right join also creates a new row that contains columns from both tables. However, it fills
the columns in the left table with NULL.
• The following statement uses the right join to join the basket_a table with the basket_b table:

postgres=# SELECT a, fruit_a, b, fruit_b
FROM basket_a
RIGHT JOIN basket_b
ON fruit_a = fruit_b;
Database Query - JOINS - Right Join - only rows from the right table
• Similarly, you can get rows from the right table that do not have matching rows from the left table by adding a WHERE clause as
follows:

postgres=# SELECT a, fruit_a, b, fruit_b
FROM basket_a
RIGHT JOIN basket_b
ON fruit_a = fruit_b
WHERE a IS NULL;
• Note that the RIGHT JOIN and RIGHT OUTER JOIN are the same therefore you can use them interchangeably.
• The following diagram illustrates the right join that returns rows from the right table that do not have matching rows in
the left table:
Database Query - JOINS - Full Outer Join
• The full outer join or full join returns a result set that contains all rows from both left and right tables, with the matching rows
from both sides if available. In case there is no match, the columns of the table will be filled with NULL:

postgres=# SELECT a, fruit_a, b, fruit_b
FROM basket_a
FULL OUTER JOIN basket_b
ON fruit_a = fruit_b;
• The following diagram illustrates the full outer join:
Database Query - JOINS - Full Outer Join - only rows unique to both tables
• To return rows in a table that do not have matching rows in the other, you use the full join with a WHERE clause like this:

postgres=# SELECT a, fruit_a, b, fruit_b
FROM basket_a
FULL JOIN basket_b
ON fruit_a = fruit_b
WHERE a IS NULL OR b IS NULL;
• The following Venn diagram illustrates the full outer join that returns rows from a table that do not have the
corresponding rows in the other table:
Database Query - JOINS - Briefly
🔹 INNER JOIN: Returns matching rows in both tables
🔹 LEFT JOIN: Returns all records from the left table, and the matching records from the right table
🔹 RIGHT JOIN: Returns all records from the right table, and the matching records from the left table
🔹 FULL OUTER JOIN: Returns all records from both tables, combining them where there is a match
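The join behaviours summarized above can be imitated in plain Python on the sample baskets from earlier - purely an illustration of the matching logic, not how PostgreSQL executes joins (a right join is just left_join with the two tables swapped):

```python
basket_a = [(1, 'Apple'), (2, 'Orange'), (3, 'Plum'), (4, 'Pineapple')]
basket_b = [(1, 'Orange'), (2, 'Apple'), (3, 'Watermelon'), (4, 'Pear')]

def inner_join(left, right):
    # one output row per pair with equal fruit values
    return [(a, fa, b, fb) for a, fa in left for b, fb in right if fa == fb]

def left_join(left, right):
    rows = []
    for a, fa in left:
        matches = [(b, fb) for b, fb in right if fb == fa]
        # unmatched left rows are kept, right columns padded with NULL (None)
        rows += [(a, fa, b, fb) for b, fb in matches] or [(a, fa, None, None)]
    return rows

def full_outer_join(left, right):
    rows = left_join(left, right)
    matched = {fb for _, fa in left for _, fb in right if fa == fb}
    # add right rows that found no partner, left columns padded with NULL
    rows += [(None, None, b, fb) for b, fb in right if fb not in matched]
    return rows

print(inner_join(basket_a, basket_b))
# [(1, 'Apple', 2, 'Apple'), (2, 'Orange', 1, 'Orange')]
```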
Database Query - See also other SQL operators may be required
• UNION
• INTERSECT
• EXCEPT
• EXPLAIN
• HAVING
• ROLLUP
• ANY
• ALL
• EXISTS
• UPDATE JOIN
• DELETE JOIN
...
and so on... :)
Tutorials & Other Resources
• PostgreSQL Official Documentation - The PostgreSQL Global Development Group's documentation; everything about PostgreSQL is here!
• W3Schools PostgreSQL Documentation - Quickly learn PostgreSQL and test yourself with exercises.
• Tutorials Point PostgreSQL - A full, free online course walking through PostgreSQL, from the basics to advanced administration.
• PG Exercises - Free online exercises for learning PostgreSQL in an interactive manner.
• PostgreSQL Primer for Busy People - A handy single-page resource and reference guide for getting started with PostgreSQL.
• PostgreSQL Tutorial - Learn PostgreSQL and how to get started quickly through practical examples.
• Schemaverse - A space-based strategy game implemented entirely within a PostgreSQL database.
• Awesome Postgres - A curated list of awesome PostgreSQL software, libraries, tools and resources.
