PostgreSQL Essentials v16 Student
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Course Agenda
Introduction and Architectural Overview
Database Security
Module Objectives
EDB Portfolio
History of PostgreSQL
Major Features
Architectural Overview
Open source Postgres: EDB continues to be committed to advancing features in collaboration with the broader community
EDB proprietary distribution for Postgres Distributed use cases, with Transparent Data Encryption: SQL compatible with Postgres, extended for stringent availability and advanced replication needs; formerly known as 2ndQPostgres
EDB proprietary distribution with Transparent Data Encryption: SQL compatible with Oracle, reduces effort to migrate applications and data to Postgres; additional value-add enterprise features
PostgreSQL Lineage
Postgres95 (1994-1995)
PostgreSQL (1996-current)
PostgreSQL History
(Related lineage: Sybase SQL Server became Microsoft SQL Server)
Multi-node PITR
Application Assessment
Advanced Security ✔ ✔
Advanced SQL ✔ ✔
Advanced Performance ✔ ✔
Resource Manager ✔ ✔
Oracle Compatibility ✔
Capabilities And Tools
Reliable:
ACID Compliant
Supports Transactions and Savepoints
Uses Write Ahead Logging (WAL)
Scalable:
Uses Multi-version Concurrency Control
Table Partitioning and Tablespaces
Parallel Sequential Scans and DDL (Table and Index Creation)
Major Features (continued)
Secure:
Employs Host-Based Access Control, SSL Connections and Logging
Provides Object-Level Permissions and Row Level Security
Advanced:
Supports Triggers, Functions and Procedures using Custom Procedural Languages
Major Database Version Upgrades using pg_upgrade
Connectors
PERL DBI
NODE.JS
PYTHON
LIBPQ
ODBC
ECPG
JDBC
.NET
TCL
PostgreSQL
Limit - Value
Maximum Database Size - Unlimited
Terminology: a Row is called a Tuple, a Column is called an Attribute
History of PostgreSQL
Major Features
Architectural Overview
Module Objectives
Architectural Summary
Shared Memory: Shared Buffers, WAL Buffers, Process Array
Utility processes: BGWRITER, WAL WRITER, CHECKPOINTER, ARCHIVER, LOGGER, AUTOVACUUM, LOGICAL REPLICATION
On disk: Data Files, WAL Segments, Archived WAL, Error Log Files
Background writer
Writes dirty data blocks to disk
WAL writer
Flushes write-ahead log to disk
Checkpointer
Automatically performs a checkpoint based on config parameters
Logging collector
Routes log messages to syslog, eventlog, or log files
More Utility Process
Autovacuum launcher
Starts Autovacuum workers as needed
Autovacuum workers
Recover free space for reuse
Archiver
Archives write-ahead log files
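These utility processes can be observed in a running cluster through the pg_stat_activity view. A minimal sketch, assuming a superuser session:

```sql
-- backend_type distinguishes utility processes such as
-- 'checkpointer', 'walwriter', 'background writer',
-- 'autovacuum launcher' and 'archiver' from client backends
SELECT pid, backend_type, state
FROM pg_stat_activity
ORDER BY backend_type;
```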
Client requests a
connection
Postmaster
Postmaster is the main process called postgres
User Backend Process
The Postmaster process spawns a new server process for each connection request detected
Shared Memory
Authorization - Verify permissions
Respond to Client
The user backend process, called postgres, is allocated work_mem and attaches to shared memory
It calls back to the client and then waits for SQL
CHECKPOINT
Writes all dirty shared buffers to the stable databases on disk
Background Writer
A cleaning scan writes dirty buffers to the stable databases ahead of backend demand
Write Ahead Logging (WAL)
Changes are recorded in the transaction log before reaching the stable databases; the archive command copies completed WAL segments away
Commit and Checkpoint
Parse:
Check syntax
Call traffic cop
Identify query type
Command processor if needed
Break query into tokens
Optimize:
Planner generates a plan
Uses database statistics
Apply Optimizer Hints
Query cost calculation
Choose best plan
Execute:
Execute query based on query plan
Physical Database Architecture
Database Cluster
Collection of databases managed by single server instance
Databases
A cluster can contain multiple databases
Installation Directory Layout
Default Installation Directory Location:
Linux - /usr/pgsql-16
bin – Programs
lib – Libraries
DATA
global - cluster-wide database objects
base - contains databases
pg_tblspc - symbolic links to tablespaces
pg_wal - write-ahead logs
log - startup logs and error logs
pg_xact, pg_multixact, pg_snapshots, pg_stat, pg_subtrans, pg_notify, pg_serial, pg_replslot, pg_logical, pg_dynshmem - transaction status and other runtime state
postgresql.conf, pg_hba.conf, pg_ident.conf, postgresql.auto.conf - configuration files
Physical Database Architecture
File-per-table, file-per-index
A tablespace is a directory
Each relation using that tablespace/database combination gets one or more files, in 1GB chunks
Additional files are used to hold auxiliary information (free space map, visibility map)
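This layout can be inspected from SQL. A sketch, assuming a table named accounts exists (the name is illustrative):

```sql
-- Path of the first segment file, relative to the data directory;
-- additional 1GB segments appear as <filenode>.1, .2, ...
SELECT pg_relation_filepath('accounts');
-- Total on-disk size of the relation across all segments
SELECT pg_size_pretty(pg_relation_size('accounts'));
```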
(Diagram: base contains one subdirectory per database OID holding the relation files; pg_tblspc contains a symbolic link per tablespace OID, e.g. 16650, pointing to an external directory such as /storage/pg_tab.)
Page Layout
Page header: General information about the page, including pointers to free space; 24 bytes long
Row/index pointers: Array of offset/length pairs pointing to the actual rows/index entries; 4 bytes per item
Free space: Unallocated space; new pointers are allocated from the front, new rows/index entries from the rear
Row/index entry: The actual row or index entry data
Special: Index access method specific data; empty in ordinary tables
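These header fields can be examined with the contrib extension pageinspect. A sketch, assuming the extension is available and a table t1 exists (the table name is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;
-- lower/upper delimit the free space between the item pointer
-- array (front) and the row data (rear); pagesize is 8192
SELECT lower, upper, special, pagesize
FROM page_header(get_raw_page('t1', 0));
```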
Page Structure
(Diagram: an 8K page with a header, an array of item pointers at the front, and tuples filling from the rear.)
Module Objectives
Deployment Options
Package Installation
Native packages or installers: PostgreSQL Yum Repository can be used for YUM and RPM based
installation
Source Code Installation: PostgreSQL source code is open-source and free to use
This user account should only own the data directory that is
managed by the server
https://www.postgresql.org/download/linux/redhat/
https://www.postgresql.org/download/linux/ubuntu/
It will provide you with repository location and the post installation
steps to be performed for setup of the initial database cluster
Example – Download the PostgreSQL YUM Repository
Example – Download the PostgreSQL APT Repository
Practice Lab - Install PostgreSQL on Rocky Linux
Install the repository RPM:
sudo dnf install -y
https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-
x86_64/pgdg-redhat-repo-latest.noarch.rpm
Disable the built-in PostgreSQL module:
sudo dnf -qy module disable postgresql
Install PostgreSQL:
sudo dnf install -y postgresql16-server
Change the authentication method to scram-sha-256 in pg_hba.conf file and reload the server
export PATH
export PGDATA=/var/lib/pgsql/16/data/
export PGUSER=postgres
export PGPORT=5432
export PGDATABASE=postgres
Logoff and Login
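To avoid retyping these after every login, the exports can be appended to the login profile. A minimal sketch; the values match the lab setup, adjust for your own installation:

```shell
# Persist the connection defaults for future shells
cat >> ~/.bash_profile <<'EOF'
export PGDATA=/var/lib/pgsql/16/data/
export PGUSER=postgres
export PGPORT=5432
export PGDATABASE=postgres
EOF
# Pick them up now instead of logging off and back in
source ~/.bash_profile
echo "$PGPORT"   # prints 5432
```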
Package Installation
Module Objectives
Introduction to psql
Connecting to Database
psql Meta-Commands
postgres=# \q
Connecting to a Database
-p <Database Port>
-U <Database Username>
Conventions
psql has its own set of commands, all of which start with a backslash (\).
Some commands accept a pattern. This pattern is a modified regex. Key points:
Double-quotes are used to specify an exact name, ignoring all special characters and
preserving case
On Startup…
During startup, psql considers environment variables for connection
psql will then execute commands from the $HOME/.psqlrc file; this can be
skipped using the -X option
On UNIX, there is tab completion for various things, such as SQL commands
History and Query Buffer
\s will show the command history
Example:
Once AUTOCOMMIT is set to off, use COMMIT/ROLLBACK to complete the running transaction
Conditional Commands
Conditional commands are primarily helpful for scripting
\d[+] [pattern]
\l[ist][+]
\df[+] [pattern]
Lists functions
\cd [ directory ]
\! [ command ]
\?
Shows help information about psql commands
\h [command]
Shows information about SQL commands
psql --help
Lists command line options for psql
Module Summary
Introduction to psql
Connecting to Database
psql Meta-Commands
Run the psql command with the -f option to execute the edbstore.sql file and install all the sample
objects required for this training
psql -p 5432 -f edbstore.sql -d postgres -U postgres
Enter postgres database user password
After successful execution, a new database named edbstore owned by a new database user edbuser
is created. Default password for edbuser is edbuser
Connect to edbstore database and verify newly created objects using psql meta commands.
8. Execute a SQL statement, saving the output to a file
14. View the current working directory
Database
Clusters
Module Objectives
Database Clusters
Port
template1
postgres
Creating a Database Cluster
Choose the data directory location for new cluster
Initialize the database cluster storage area (data directory) using the initdb
utility
You must have permissions on the parent directory so that initdb can create
the data directory
The data directory can be created manually by superuser and the ownership
can be given to postgres user
initdb Utility
$ initdb [OPTION]... [DATADIR]
Options:
-D, --pgdata location for this database cluster
-E, --encoding set default encoding for new databases
-U, --username database superuser name
-W, --pwprompt prompt for a password for the new superuser
-X, --waldir location for the write-ahead log directory
--wal-segsize size of WAL segments, in megabytes
-k, --data-checksums use data page checksums
-?, --help show this help, then exit
If the data directory is not specified, the environment variable PGDATA is used
Example - initdb
[root@Base ~]# mkdir /edbstore
[root@Base ~]# chown postgres:postgres /edbstore
[root@Base ~]# su - postgres
It provides options for redirecting start log, controlled startup and shutdown
pg_ctl -D datadir
edb=# \q
Reload a Database Cluster
Some configuration parameter changes do not require a restart
Syntax:
Reload your cluster with pg_ctl utility and using pg_reload_conf() function
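Either method reloads the configuration without a restart. A minimal sketch, from a superuser session:

```sql
-- Returns true if the reload signal was sent to the postmaster
SELECT pg_reload_conf();
```

From the shell, pg_ctl reload -D datadir sends the same signal.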
Module Objectives
Server Parameter File - postgresql.conf
integer
floating point
string
enum
One way to set these parameters is to edit the file postgresql.conf, which is
normally kept in the data directory
The Server Parameter File - postgresql.conf
Holds parameters used by a cluster
Parameters are case-insensitive
Normally stored in data directory
initdb installs default copy
Some parameters only take effect on server restart (pg_ctl restart)
# used for comments
One parameter per line
Use include directive to read and process another file
Can also be set using the command-line option
Viewing and Changing Server Parameters
ssl_cert_file - Specifies the name of the file containing the SSL server certificate
ssl_key_file - Specifies the name of the file containing the SSL server private key
ssl_ciphers - List of SSL ciphers that may be used for secure connections
shared_buffers - Size of shared buffer pool for a cluster (server scope)
temp_buffers - Amount of memory used for caching temporary tables (session scope)
work_mem - Amount of memory used for sorting and hashing operations
maintenance_work_mem - Amount of memory used for maintenance commands
autovacuum_work_mem - Amount of memory used by autovacuum workers
temp_file_limit - Amount of disk space used for temporary files
Query Planner Settings
random_page_cost (default 4.0) - Estimated cost of a random page fetch.
May need to be reduced to account for caching effects
log_statement_sample_rate - Fraction of statements running longer than log_min_duration_sample to be logged
log_disconnections - Log some information each time a session disconnects, including the duration of the session
log_checkpoints - Causes checkpoints and restart points to be logged in the server log
log_lock_waits - Log information if a session waits longer than deadlock_timeout to acquire a lock
log_error_verbosity - How detailed the logged message is; can be set to terse, default, or verbose
log_line_prefix - Additional details to log with each line; default is '%m [%p] ', which logs a timestamp and the process ID
log_statement - Legal values are none, ddl, mod (DDL and all other data-modifying statements), or all
Background Writer Settings
bgwriter_delay (default 200 ms) - Specifies time between activity rounds for
the background writer
parallel_tuple_cost (default 0.1): Estimated cost of transferring one tuple from a parallel worker
process to another process
min_parallel_table_scan_size (default 8MB): Sets the minimum amount of table data that must be scanned for a parallel scan to be considered
min_parallel_index_scan_size (default 512kB): Sets the minimum amount of index data that must be scanned for a parallel scan to be considered
force_parallel_mode (default off): Useful when testing parallel query scan even when there is no
performance benefit
Parallel Maintenance Settings
PostgreSQL supports parallel processes for creating an index
Example: index creation with max_parallel_maintenance_workers=0 versus max_parallel_maintenance_workers=4
Vacuum Cost Settings
vacuum_cost_delay (default 0 ms) - The length of time, in milliseconds, that the process will
wait when the cost limit is exceeded
vacuum_cost_page_hit (default 1) - The estimated cost of vacuuming a buffer found in the
buffer pool
vacuum_cost_page_miss (default 10) - The estimated cost of vacuuming a buffer that must be
read into the buffer pool
vacuum_cost_page_dirty (default 20) - The estimated cost charged when vacuum modifies a
buffer that was previously clean
vacuum_cost_limit (default 200) - The accumulated cost that will cause the vacuuming process
to sleep
vacuum_buffer_usage_limit (default 256kB) - The size of the Buffer Access Strategy used by the
VACUUM and ANALYZE commands
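A worked example of the cost model: with the defaults above, and vacuum_cost_delay raised above zero so throttling is active, the cost budget determines how many pages each round can touch before sleeping. The numbers below are the documented defaults:

```shell
cost_limit=200   # vacuum_cost_limit
page_miss=10     # vacuum_cost_page_miss
page_dirty=20    # vacuum_cost_page_dirty
# Pages per round if every page must be read from disk:
echo $(( cost_limit / page_miss ))                  # 20
# Pages per round if every page is read and then dirtied:
echo $(( cost_limit / (page_miss + page_dirty) ))   # 6
```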
Autovacuum Settings
autovacuum (default on) - Controls whether the autovacuum launcher runs, and
starts worker processes to vacuum and analyze tables
block_size, wal_block_size, segment_size, wal_segment_size, data_checksums, data_directory_mode, server_encoding, max_function_args, max_index_keys, ssl_library
Configuration File Includes
The postgresql.conf file can now contain include directives
include 'filename'
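A sketch of how include directives compose; the file name memory.conf is illustrative:

```
# --- postgresql.conf ---
shared_buffers = 128MB
include 'memory.conf'     # read and processed at this point

# --- memory.conf ---
work_mem = 64MB           # later settings override earlier ones
```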
Save all the error message in a file inside the log folder in your cluster data directory
(e.g. c:\edbdata or /edbdata)
Log all queries which are taking more than 5 seconds to execute, and their time
Autovacuum workers to 6
Module Objectives
The System Catalog Schema
Contains:
pg_conf_load_time() pg_jit_available()
System Administration Functions
current_setting, set_config Return or modify configuration variables
pg_ls_logdir() Returns the name, size, and last modified time of each file in the log directory
pg_ls_waldir() Returns the name, size, and last modified time of each file in the WAL directory
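A sketch of using these functions, assuming a superuser (or pg_monitor member) session:

```sql
-- Three most recently modified WAL segment files
SELECT name, size, modification
FROM pg_ls_waldir()
ORDER BY modification DESC
LIMIT 3;
-- When the configuration was last loaded
SELECT pg_conf_load_time();
```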
More System Administration Functions
pg_*_size - Disk space used by a tablespace, database, or relation; pg_total_relation_size includes indexes and TOASTed data
pg_hba_file_rules Provides a summary of the contents of the client authentication configuration file, pg_hba.conf
Module Objectives
Object Hierarchy
Tablespaces
Databases
Access Control
Creating Schemas
(Diagram: a Database Cluster contains Users/Groups (Roles), Databases and Tablespaces; each Database contains Tables, Views, Sequences, Functions and Event Triggers.)
Users and Roles
Database Users
Are global within a database cluster
Syntax: Example:
CREATE USER name [ [ WITH ] option [ ... ] ]
where option can be:
SUPERUSER | CREATEDB | CREATEROLE |
INHERIT | LOGIN | NOLOGIN | REPLICATION
| BYPASSRLS |
| CONNECTION LIMIT connlimit | [
ENCRYPTED ] PASSWORD 'password'
| VALID UNTIL 'timestamp'
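A sketch combining a few of the options; the role name and password are illustrative:

```sql
CREATE USER appuser WITH
    LOGIN
    CONNECTION LIMIT 10
    PASSWORD 'ChangeMe!'
    VALID UNTIL '2030-01-01';
```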
Creating Users Using createuser
The createuser utility can also be used to create a user
Syntax:
$ createuser [OPTION]... [ROLENAME]
Example: create a new user with read only access or a new user with
access to view monitoring data only
pg_checkpoint, pg_create_subscription, pg_database_owner, pg_execute_server_program, pg_monitor, pg_read_all_data, pg_read_all_settings, pg_read_all_stats, pg_read_server_files, pg_signal_backend, pg_stat_scan_tables, pg_use_reserved_connections, pg_write_all_data, pg_write_server_files
Tablespaces
Tablespaces and Data Files
Data is stored logically in tablespaces and physically in data files
Tablespaces:
Can belong to only one database cluster
Data Files:
Can belong to only one tablespace
Indexes
Tablespace A
Database Instance
Fast Storage
Transactional Tables
Historical Tables
Tablespace B
Slow Storage
Seldom Used Partition
Pre-Configured Tablespaces
PGDATA/global directory
pg_global
Tablespace
Database Instance
PGDATA/base directory
pg_default
Tablespace
Databases, schemas and other objects
Creating Tablespaces
How to create? Use the CREATE TABLESPACE command
Syntax:
CREATE TABLESPACE tablespace_name [ OWNER user_name ] LOCATION 'directory';
(Diagram: the cluster data directory holds a symbolic link, named after the tablespace OID, to the physical tablespace directory; inside it, a directory per database holds the database objects as files.)
Example - CREATE TABLESPACE
[training@Base ~]$ sudo mkdir /newtab1
[training@Base ~]$ sudo chown postgres:postgres /newtab1
[training@Base ~]$ su - postgres
[postgres@Base ~]$ psql -p 5432 postgres postgres
(1 row)
edb=# show temp_tablespaces;
temp_tablespaces
------------------
(1 row)
Altering Tablespaces
ALTER TABLESPACE can be used to rename a tablespace, change ownership
and set a custom value for a configuration parameter
Syntax:
ALTER TABLESPACE name RENAME TO new_name
ALTER TABLESPACE name OWNER TO { new_owner | CURRENT_USER | SESSION_USER }
ALTER TABLESPACE name SET ( tablespace_option = value [, ... ] )
ALTER TABLESPACE name RESET ( tablespace_option [, ... ] )
Note: If PATH is not set, you can execute the psql command from the bin
directory of the Postgres installation
Privileges
Cluster level
Granted to a user during CREATE or later using ALTER USER
Object Level
Granted to user using GRANT command
These privileges allow a user to perform particular actions on a database object, such as
tables, views, or sequences
Can be granted by owner, superuser or someone who has been given permission to grant
privileges (WITH GRANT OPTION)
GRANT Statement
Grants object level privileges to database users, groups or roles
REVOKE [ GRANT OPTION FOR ] can be used to revoke only the grant
option without revoking the actual privilege
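A sketch of granting and then narrowing object-level privileges; the table and role names are illustrative:

```sql
-- Let reporting_user read and insert, and re-grant those rights
GRANT SELECT, INSERT ON orders TO reporting_user WITH GRANT OPTION;
-- Take back only the right to re-grant SELECT, keeping SELECT itself
REVOKE GRANT OPTION FOR SELECT ON orders FROM reporting_user;
```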
(Diagram: a USER owns a SCHEMA; the schema contains Tables, Views, Sequences, Functions and Domains.)
Benefits of Schemas
A database can contain one or more named schemas
There are several reasons why one might want to use schemas:
To allow many users to use one database without interfering with each other
To organize database objects into logical groups to make them more manageable
Third-party applications can be put into separate schemas so they cannot collide
with the names of other objects
Creating Schemas
Schemas can be added using the CREATE SCHEMA SQL command
Syntax:
CREATE SCHEMA IF NOT EXISTS schema_name [ AUTHORIZATION
role_specification ]
Example:
CREATE SCHEMA IF NOT EXISTS sales AUTHORIZATION edbuser;
What is a Schema Search Path
The schema search path determines which schemas are searched
for matching table names
Search path is used when fully qualified object names are not used
in a query
Example:
SELECT * FROM employee;
This statement will find the first employee table from the schemas listed in the
search path
Determine the Schema Search Path
To show the current search path, execute the following command in psql:
SHOW search_path;
Modifying search_path:
Cluster/Instance Level: postgresql.conf or ALTER SYSTEM
Module Summary
Object Hierarchy
Tablespaces
Databases
Access Control
Creating Schemas
Module Objectives
Database Security Requirements and
Protection Plan
Levels of Security in Postgres
Data Encryption
Data Security Requirements
Stopping improper disclosure, modification, and denial of access to information is very important
No one wants an employee finding out the boss's salary, changing their own salary, or stopping HR from printing paychecks
Database Security Requirements
includes:
Confidentiality
Integrity
Availability
Protection Plan –
We all need one
Access Control
Prevent
Authentication and Authorization
Data Control
Views, Row Level Security, Encryptions
(Diagram: a client at IP 10.8.99.30 connecting as appuser1 reaches the postmaster, which checks pg_hba.conf; authentication uses user/password or a certificate, then the connect privilege on the database and schema permissions apply.)
File Structure: Comprises individual records (one per line); Records are processed from top
to bottom
Record Details: Specifies connection type, database name, user name, client address (hostnames,
IPv6, and IPv4), and authentication method
Authentication Methods: trust, reject, password, gss, sspi, ident, peer, pam, ldap, radius,
bsd, cert, scram-sha-256, md5
reject - Unconditionally rejects the connection; useful for blocking specific hosts
ident - Obtains the client's OS username by contacting the ident server for TCP/IP connections
peer - Obtains the client's OS username and matches it with the requested database user name (local connections)
pam - Authenticates using Pluggable Authentication Modules (PAM) provided by the OS
bsd - Authenticates using the BSD Authentication service provided by the OS
pg_hba.conf Example
# TYPE DATABASE USER ADDRESS METHOD
SQL:
select rule_number, type, database, user_name, address, netmask, auth_method
from pg_hba_file_rules;
Authentication Problems
FATAL: no pg_hba.conf entry for host "192.168.10.23", user
"edbstore", database "edbuser"
FATAL: password authentication failed for user "edbuser"
FATAL: user "edbuser" does not exist
FATAL: database "edbstore" does not exist
To allow all users to view their own row in a user table, a simple policy can be
used:
postgres=# CREATE POLICY user_policy ON users USING (user_name =
current_user);
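Note that a policy is not enforced until row level security is enabled on the table. A sketch continuing the example above:

```sql
-- Without this, the policy exists but is not applied
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
-- Table owners normally bypass RLS unless it is forced:
ALTER TABLE users FORCE ROW LEVEL SECURITY;
```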
Data Encryption
Database Level Encryption
Encrypting everything does not make data secure
Resources are consumed when you query encrypted data
pgcrypto provides mechanism for encrypting selected columns
pgcrypto supports one-way and two-way data encryption
Install pgcrypto using CREATE EXTENSION command
CREATE EXTENSION pgcrypto;
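A sketch of both encryption styles with pgcrypto; the values and key are illustrative:

```sql
-- One-way: store a salted hash, verify by re-hashing
SELECT crypt('myPassword', gen_salt('bf'));   -- store the result
-- verify later with: crypt('attempt', stored_hash) = stored_hash

-- Two-way: symmetric encryption of a selected column value
SELECT pgp_sym_decrypt(
         pgp_sym_encrypt('4111-1111-1111-1111', 'secretkey'),
         'secretkey');   -- returns the original text
```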
General Security
Recommendations
General Recommendations - Database Server
Always keep your system patched to the latest version
If that's not possible, make a read-only Replica database available on the port, not a R/W master
Shared credentials make auditing more complicated and violate HIPAA, PCI, etc.
Ensure the login(s) have minimum rights needed to do their work (e.g. SELECT
rights and only to specified tables)
General Recommendations - Database Superuser
Only allow the database superuser to log in from the server machine
itself, with local or localhost connection
Use a separate database login to own each database and own everything in it
General Recommendations - Database Backups
Keep backups and have a tested recovery plan. No matter how well you secure
things, it's still possible an intruder could get in and delete or modify your data
Have scripts perform backups and immediately test them and alert DBA on any
failures
Keep backups physically separate from the database server. A disaster can
strike and take out an entire location, whether that’s environmental (e.g.
earthquake), malicious (e.g. hacker, insider), or human error
General Recommendations - Think AAA
Data Encryption
2. Configure your server so that the new developer can connect from
their machine
Monitoring and
Admin Tools
Module Objectives
Overview and Features of pgAdmin
Access pgAdmin
sudo rpm -i
https://ftp.postgresql.org/pub/pgadmin/pgadmin4/yum/pgadmin4-redhat-
repo-2-1.noarch.rpm
/usr/pgadmin4/bin/setup-web.sh
Access pgAdmin Web Interface
Open a browser and type: http://<IP>/pgadmin4
Enter email address and password provided during post install script.
Click Login
pgAdmin - User Interface
Registering a Server
This error occurs when either the database server isn't running OR the port 5432 may
not be open on database server to accept external TCP/IP connections.
This means your server can be contacted over the network but is not configured to
accept the connection. Your client is not detected as a legal user for the database.
You will have to add an entry for each of your clients to the pg_hba.conf file.
Query Tool
Click on a
Database
Click on
Query Tool
Query Tool - Data Output
View Results
Databases
The databases menu allows you to create a new database
Perform maintenance
Backup or Restore
(Dashboard graphs: Tuples in, Tuples out, Block I/O.)
Monitoring
(Diagram: PEM Agents send monitoring data via HTTPD to PEM Storage, the backend database named pem.)
Access pgAdmin
Columns - Change the default value for the TeamID column to seq_teamid
View - Display all teams in ascending order of their ratings. Name the view as vw_top_teams
Lab Exercise 3
View all rows in the Teams table.
Using the Edit data window, you just opened in the previous step, insert the
following rows into the Teams table:
Module Objectives
Data Types Sequences
BOOL, CHAR, VARCHAR, TEXT, INTEGER, NUMERIC, MONEY, SERIAL, DATE, TIME, TIMESTAMP, INTERVAL, BYTEA, XML, JSON, JSONB
Structured Query Language
DDL: CREATE, ALTER, DROP, TRUNCATE
DML: INSERT, UPDATE, DELETE
DCL: GRANT, REVOKE
TCL: COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION
DDL Statements
Statement Syntax
ALTER TABLE ALTER TABLE [IF EXISTS] [ONLY] name [*] action [,…]
Object Description
Table: emp
Column: cityname
Data Type: city
Table: clients
Column: res_city
Data Type: city
Types of JOINS
Type - Description
LEFT OUTER JOIN - Returns all matching rows and rows from the left-hand table even if there is no corresponding row in the joined table
RIGHT OUTER JOIN - Returns all matching rows and rows from the right-hand table even if there is no corresponding row in the joined table
FULL OUTER JOIN - Returns all matching as well as non-matching rows from both tables
CROSS JOIN - Returns the Cartesian product of the two tables: every row of one paired with every row of the other
Using SQL Functions
Can be used in SELECT statements and WHERE clauses
Includes
String Functions
Format Functions
Date and Time Functions
Aggregate Functions
Example:
=> SELECT lower(name) FROM departments;
EXPLAIN ANALYZE is used to run the query to get actual runtime stats
Execution Plan Components
Execution Plan Components: Syntax:
Estimated average width (in bytes) of rows output by this plan node
PEM - Query Tool’s Visual Explain
Quoting
Single quotes and dollar quotes are used to specify non-numeric values
Example:
'hello world'
'2011-07-04 13:36:24'
'{1,4,5}'
$$A string "with" various 'quotes' in.$$
$foo$A string with $$ quotes in $foo$
Double quotes are used for names of database objects which either clash with
keywords, contain mixed case letters, or contain characters other than a-z, 0-9
or underscore
Example:
SELECT * FROM "select"
CREATE TABLE "HelloWorld" ...
SELECT * FROM "Hi everyone and everything"
Indexes
Indexes are a common way to enhance performance
B-tree (default), Hash, Block Range Index (BRIN), GIN, GiST, SP-GiST, Indexes on Expressions
Example Index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [
ONLY ] table_name [ USING method ] ( { column_name | ( expression ) } [
COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [
ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ INCLUDE ( column_name [, ...] ) ]
[ NULLS [ NOT ] DISTINCT ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
Example:
CREATE INDEX idx_emp_name ON emp (name);
Module Summary
Data Types Sequences
Confirm that the view works. Display the contents of the EMPVU view.
Using your EMPVU view, write a query for the SALES department to display all
employee names and department numbers.
Lab Exercise - 3
You need a sequence that can be used with the primary key column of the dept
table. The sequence should start at 60 and have a maximum value of 200. Have
your sequence increment by 10. Name the sequence dept_id_seq.
To test your sequence, write a script to insert two rows in the dept table.
Backup, Recovery
and PITR
Module Objectives
Backup Types
Database SQL Dumps
Restoring SQL Dumps
Offline Physical Backups
Continuous Archiving
Online Physical Backups Using pg_basebackup
Point-in-time Recovery
Recovery Settings
Backup Tools – Barman and pgBackRest
Types of Backup
As with any database, PostgreSQL databases should be backed up regularly
Logical Backups
Physical Backups
Standard Unix tools can be used to work around this potential problem
The split command allows you to split the output into smaller files:
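A sketch of splitting and reassembling a dump: a dummy file stands in for pg_dump output so the commands can be tried without a server; against a real cluster the pipeline would be pg_dump edbstore | split -b 1G - edbstore_dump_.

```shell
# Create a 10 MB dummy file standing in for a large plain-text dump
dd if=/dev/zero of=dump.sql bs=1M count=10 2>/dev/null
# Split it into 4 MB pieces: dump_part_aa, dump_part_ab, dump_part_ac
split -b 4M dump.sql dump_part_
# Reassemble the pieces and confirm nothing was lost
cat dump_part_* > dump_joined.sql
cmp -s dump.sql dump_joined.sql && echo "pieces match original"
```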
-d <database name> - Connect to the specified database. Also restores to this database if -C option is omitted
-C - Create the database named in the dump file and restore directly into it
-v - Verbose option
Entire Cluster - SQL Dump
pg_dumpall is used to dump an entire database cluster in plain-text SQL format
Syntax:
The database server must be shut down or in backup mode in order to get a
usable backup
File system backups only work for complete backup and restoration of an entire
database cluster
Two types of File system backup
Offline backups
Online backups
File System Backups
Offline Backups
Online Backups
archive_command must be set in postgresql.conf which archives WAL logs and supports PITR
Continuous Archiving Methods
wal_level = replica
archive_command = 'cp -i %p /home/postgres/archive/%f'
archive_mode = on
max_wal_senders = 3
wal_keep_size = 512
Backup Command:
$ pg_basebackup [options] ..
Options for pg_basebackup command
-D <directory name> - Location of backup
-F <p or t> - Backup files format. Plain(p) or tar(t)
-R - write standby.signal and append postgresql.auto.conf
-T OLDDIR=NEWDIR - relocate tablespace in OLDDIR to NEWDIR
--waldir - Write ahead logs location
-z - Enable compression (tar format) for files
-Z - Compress backup based on setting set to none, client or server
-P - Progress Reporting
-h host - host on which cluster is running
-p port - cluster port
To create a base backup of the server at localhost and store it in the local
directory /home/postgres/pgbackup
$ pg_basebackup -h localhost -D /home/postgres/pgbackup
Verify Base Backups
Verify backup taken by pg_basebackup using pg_verifybackup utility
Backup is verified against a backup_manifest generated by the server at the
time of the backup
Only plain format backups can be verified
Restoring Physical Backups
Point-in-time Recovery
Point-in-time recovery (PITR) is the ability to restore a database cluster
up to the present or to a specified point of time in the past
Uses a full database cluster backup and the write-ahead logs found in
the /pg_wal subdirectory
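A sketch of the recovery settings used for PITR; the path and target time are illustrative. After restoring the base backup, set these in postgresql.conf and create an empty recovery.signal file in the data directory:

```
restore_command = 'cp /home/postgres/archive/%f %p'
recovery_target_time = '2024-05-01 12:00:00'
recovery_target_action = 'promote'   # leave recovery once the target is reached
```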
Manage backups and the recovery phase of multiple servers from one location
Remote backup and restore with rsync and the PostgreSQL protocol
Fully supported open source tool with troubleshooting support
Solves common bottleneck problems with parallel processing for backup, restoring and archiving
Supports capabilities like symmetric encryption, compression, and partial restore
Feature comparison
Capability Added value Barman pgBackRest pg_basebackup
SSH protocol support Yes Yes -
PostgreSQL protocol Works without passwordless ssh. Yes - Yes
Incremental backups Yes Yes -
RPO=0 Restore up to the last commit Yes - -
Rate limiting Preserve IO for Postgres Yes - Yes
Retention and List backups Yes Yes -
Backup compression Less backup space required - Yes -
Symmetric encryption Lower security footprint for the backup data - Yes -
Partial restore (only selected databases) Restore required data for analysis purposes - Yes -
S3 and Azure Blob Support Use flexible Cloud Storage for backup storage Yes Yes -
Take a full database dump of the edbstore database with the pg_dump utility. The dump
should be in plain text format
Name the dump file as edbstore_full.sql and store it in the /pgbackup directory
Lab Exercise - 2
1. Take a dump of the edbuser schema from the edbstore database
and name the file as edbstore_schema.sql
3. Restore the full dump from edbstore_full.sql and verify all the objects
and their ownership.
2. Configure your cluster to run in archive mode and set the archive log location
to be /opt/arch or c:\arch.
3. Take a full online base backup of your cluster in the /pgbackup directory
using the pg_basebackup utility.
Routine
Maintenance
Tasks
© EDB 2024. All Rights Reserved.
Module Objectives
Updating Optimizer Statistics
Re-indexing in Postgres
Database Maintenance
Data files become fragmented as data is modified and deleted
If done on time, nobody notices; when it is not done, everyone knows
Collects information for relations including size, row counts, average row size
and row sampling
Long running transactions may block vacuuming, thus it should be done during
low usage times
Vacuuming Commands
Two variants are available: VACUUM and VACUUM FULL
Vacuum and Vacuum Full
VACUUM
Removes dead rows and marks the space available for future reuse
VACUUM FULL
Compacts tables by writing a complete new version of the table file with no dead space
Requires extra disk space for the new copy of the table, until the operation completes
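The difference is easiest to see side by side; a short sketch using the customers table from the lab exercises:

```sql
-- Mark dead rows as reusable space; runs alongside normal reads and writes
VACUUM customers;

-- Rewrite the table into a new, compact file; takes an exclusive lock
-- and needs extra disk space until the operation completes
VACUUM FULL customers;
```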
VACUUM Syntax
Options:
FULL [ boolean ]
PARALLEL integer
A cluster that runs for a long time (more than 4 billion transactions)
would suffer transaction ID wraparound
VACUUM normally skips pages without dead row versions, but some rows may
need FREEZE
vacuum_freeze_table_age controls when a whole table must be scanned
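Wraparound risk can be monitored from the catalogs and addressed with an aggressive vacuum; a sketch, again using the customers table as an example:

```sql
-- How old is each table's oldest unfrozen transaction ID?
SELECT relname, age(relfrozenxid)
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC;

-- Aggressively freeze tuples in a table to push back wraparound
VACUUM (FREEZE) customers;
```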
The Visibility Map
Each heap relation has a Visibility Map which keeps track of which pages
contain only tuples that are visible to all active transactions
Stored at <relfilenode>_vm
Syntax:
vacuumdb --help
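Typical invocations of the vacuumdb wrapper; these are command fragments that assume a running server and, for the second one, the edbstore database from the labs:

```shell
# Vacuum and analyze every database in the cluster
vacuumdb --all --analyze

# Vacuum one table with parallel index processing, printing progress
vacuumdb -d edbstore -t customers --parallel=2 --verbose
```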
Autovacuuming
Highly recommended feature of Postgres
It automates the execution of VACUUM, FREEZE and ANALYZE commands
Autovacuum consists of a launcher and many worker processes
A maximum of autovacuum_max_workers worker processes are allowed
Launcher will start one worker within each database every
autovacuum_naptime seconds
Workers check for inserts, updates and deletes and execute VACUUM and/or
ANALYZE as needed
track_counts must be set to TRUE as autovacuum depends on statistics
Temporary tables cannot be accessed by autovacuum
Autovacuuming Parameters
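The parameters from the bullets above are set in postgresql.conf; the values shown here are the shipped defaults:

```
# postgresql.conf -- common autovacuum settings (values shown are the defaults)
autovacuum = on
track_counts = on                     # required: autovacuum relies on statistics
autovacuum_max_workers = 3            # maximum number of worker processes
autovacuum_naptime = 1min             # delay between runs for a given database
autovacuum_vacuum_threshold = 50      # minimum row changes before a VACUUM
autovacuum_vacuum_scale_factor = 0.2  # fraction of table size added to the threshold
```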
Indexes are stored on data pages and become fragmented over time
REINDEX rebuilds an index using the data stored in the index's table
REINDEX is useful when:
An index has become "bloated", meaning it contains many empty or nearly-empty pages
An index built with the CONCURRENTLY option failed, leaving an "invalid" index
Syntax:
=> REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
Module Summary
Updating Optimizer Statistics
Re-indexing in Postgres
Lab Exercise - 1
1. While monitoring table statistics on the edbstore database, you found that
some tables are not automatically maintained by autovacuum. You decided to
perform manual maintenance on these tables. Write a SQL script to perform
the following maintenance:
Reclaim obsolete row space from the customers table.
Mark all the obsolete rows in the orders table for reuse.
Module Objectives
Loading flat files
The COPY Command
COPY moves data between Postgres tables and standard file-system files
Copy To:
COPY { table_name [ ( column_list ) ] | ( query ) } TO { 'filename' | PROGRAM 'command' | STDOUT } [ options ]
Usage:
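A round trip using the emp and copyemp tables from the lab exercise; note that server-side file paths require the pg_write_server_files/pg_read_server_files roles or superuser:

```sql
-- Export a table to CSV with a header row
COPY emp TO '/tmp/emp.csv' WITH (FORMAT csv, HEADER true);

-- Load the file back into a table with the same structure
COPY copyemp FROM '/tmp/emp.csv' WITH (FORMAT csv, HEADER true);
```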
Lab Exercise - 1
In this lab exercise you will demonstrate your ability to copy data:
1. Unload the emp table from the edbuser schema to a csv file, with column headers
2. Create a copyemp table with the same structure as the emp table
3. Load the csv file (from step 1) into the copyemp table
Replication and
High Availability
Tools
Module Objectives
Data Replication
Failover can be configured to a level where applications may not even notice
that the primary server is offline
Data Replication in Postgres
Data replication options:
Physical Streaming Replication
Patroni
Replication manager(repmgr)
Streaming Replication
Streaming Replication (Hot Standby) is a major feature of Postgres
The primary database streams WAL to the replica database, which reports status back
Asynchronous Replication
Streaming replication is asynchronous by default but can be configured as
synchronous
Asynchronous:
Disconnected architecture
Synchronous:
Transaction must apply changes to the primary and to synchronously replicated replicas using two-
phase commit actions
User gets a commit message after confirmation from both primary and replica
max_wal_senders
max_replication_slots
wal_keep_size
max_slot_wal_keep_size
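These parameters are set on the primary in postgresql.conf; the values below are illustrative, not recommendations:

```
# postgresql.conf (primary) -- streaming replication settings (values are illustrative)
max_wal_senders = 10           # max concurrent WAL sender processes
max_replication_slots = 10     # max replication slots
wal_keep_size = 1GB            # extra WAL retained for lagging replicas
max_slot_wal_keep_size = 10GB  # cap on WAL retained on behalf of slots
```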
Synchronous Streaming Replication Configuration
Default level of Streaming Replication is Asynchronous
synchronous_commit=on
synchronous_standby_names
-R option
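Turning the default asynchronous setup into a synchronous one is a two-line change on the primary (the standby name below is illustrative); the pg_basebackup -R option then writes primary_conninfo into postgresql.auto.conf on the standby and creates standby.signal:

```
# postgresql.conf (primary) -- synchronous replication sketch
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (replica1)'   # 'replica1' is an illustrative name
```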
max_standby_streaming_delay Duration for which replica has to wait during query conflicts
wal_receiver_create_temp_slot Authorize WAL receiver process to be able to create a temporary replication slot
pg_stat_subscription
Shows the status of subscription when using logical replication
pg_stat_wal_receiver
Shows the WAL receiver process status on Replica
pg_current_wal_lsn()
pg_last_wal_receive_lsn()
pg_last_xact_replay_timestamp()
Example - Monitoring Replication
Execute:
=# SELECT * FROM pg_stat_replication;
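The functions above combine into practical lag queries; a sketch to run on each side of the replication pair:

```sql
-- On the primary: replication lag per standby, in bytes
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On a replica: time since the last replayed transaction
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```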
Multi-Master Replication for Postgres
Synchronous or Asynchronous Replication
Flexible Deployment Architectures
Always-ON Architectures
Row Level Consistency
DDL Replication
DDL and Row Filters
Configurable Conflict Resolution
Conflict-free Replicated Data Types (CRDTs)
Parallel Apply
Auto Partitioning
Column-level Data filtering
Database Rolling Upgrades
Scheduling
Replica
Replication Server
Replicates between Postgres and non-Postgres databases
Client Applications
Load Balancer (e.g. Pgpool)
repmgr:
Monitors database health, detects failures and takes action
Performs switchovers
Promotes a replica
repmgrd tool:
Monitors replication
Email notification
repmgr Architecture
Replica - 1, Primary, and Replica - 2, each running repmgr
See the Training Portal for the full library of Postgres training classes
Get familiar with the EDB Tools available as part of the EDB Postgres Platform
For any questions related to EDB Postgres Trainings and Certifications,
or for additional information, write to:
[email protected]
Thank you!
© EDB 2024. All Rights Reserved.