PostgreSQL Essentials v16 Student

The document outlines the PostgreSQL Essentials v16 course agenda, covering topics such as system architecture, database security, installation, and routine maintenance tasks. It highlights the major features of PostgreSQL, including its extensibility, scalability, and performance, as well as the various EDB Postgres server offerings. Additionally, it provides insights into the architectural overview, deployment options, and installation guidelines for PostgreSQL.

PostgreSQL Essentials v16

© EDB 2024 — ALL RIGHTS RESERVED.
Course Agenda
 Introduction and Architectural Overview
 System Architecture
 Installation
 User Tools - Command Line Interfaces
 Database Clusters
 Database Configuration
 Data Dictionary
 Creating and Managing Database Objects
 Database Security
 Monitoring and Admin Tools Overview
 SQL Primer
 Backup and Recovery
 Routine Maintenance Tasks
 Data Loading
 Data Replication and High Availability


Introduction

Module Objectives
 EDB Portfolio

 History of PostgreSQL

 Major Features

 Architectural Overview

 General Database Limits

 Common Database Object Names


EDB Supported Databases

 Postgres
 Open source Postgres
 EDB continues to be committed to advancing features in collaboration with the broader community

 Postgres Extended
 EDB proprietary distribution for EDB Postgres Distributed use cases with Transparent Data Encryption
 SQL compatible with Postgres, extended for stringent availability and advanced replication needs
 Transparent Data Encryption
 Formerly known as 2ndQPostgres

 Postgres Advanced Server
 EDB proprietary distribution with Transparent Data Encryption
 SQL compatible with Oracle, reduces effort to migrate applications and data to Postgres
 Transparent Data Encryption
 Additional value-add enterprise features
PostgreSQL

The open-source database of choice

 Performance - Handles enterprise workloads with 50% improvement in the last 4 years
 Scalability - Multiple technical options for operating Postgres at scale
 Extensibility - Supported by a wide array of extensions plus multiple SQL and NoSQL data models
 Community-driven - Multiple companies and individuals contribute to the project and drive innovation
Facts about PostgreSQL

 The world’s most advanced open-source database

 Designed for extensibility and customization

 ANSI/ISO compliant SQL support

 Actively developed for more than 30 years

 University Postgres (1986-1993)

 Postgres95 (1994-1995)

 PostgreSQL (1996-current)
PostgreSQL Lineage
PostgreSQL History

 Ingres (UC Berkeley, 1975) led to Postgres (UC Berkeley), which became PostgreSQL (Community, EnterpriseDB)
 Sybase SQL Server, derived from the Ingres line, became Microsoft SQL Server
 Illustra, a Postgres offshoot, went to Informix, which went to IBM
 Ingres was commercialized by Ingres Corporation, then Computer Associates, as CA Ingres

(Timeline: 1975 to today)


EDB Postgres Extended Server

 Replication Enhancements
 Enables EDB Postgres Distributed functionality such as:
 Group Commit, Commit at Most Once, and Eager all-node synchronous replication
 Timestamp-based Snapshots
 Estimates for Replication Catch-up times
 Selective Backup of a Single Database
 Hold back freezing to assist resolution of UPDATE/DELETE conflicts
 Multi-node PITR
 Application Assessment
 Only available for use with an additional subscription for Extreme HA
EDB Postgres Advanced Server

 Oracle Compatibility - Compatibility for schemas, data types, indexes, users, roles, partitioning, packages, views, PL/SQL triggers, stored procedures, functions, and utilities
 Additional Security - Password policy management, session tag auditing, data redaction, SQL injection protection, and procedural language code obfuscation
 Developer Productivity - Over 200 pre-packaged utility functions, user-defined object types, autonomous transactions, nested tables, synonyms, advanced queueing
 DBA Productivity - Throttle CPU and I/O at the process level, over 55 extended catalog views to profile all the objects and processing that occurs in the database
 Performance - Query optimizer hints, SQL session/system wait diagnostics
 Replication Enhancements - Enables EDB Postgres Distributed functionality such as Group Commit, Commit at Most Once and Eager all-node synchronous replication, timestamp-based snapshots, estimates for replication catch-up times, selective backup of a single database, hold back freezing to assist resolution of UPDATE/DELETE conflicts, multi-node PITR
Database Servers - High Level Overview

Feature                     | PostgreSQL | EDB Postgres Extended Server | EDB Postgres Advanced Server: Berkeley | EDB Postgres Advanced Server: Redwood
SQL Compatibility           | PostgreSQL | PostgreSQL                   | PostgreSQL                             | PostgreSQL + Oracle
Binary Compatibility        | Yes        | No                           | No                                     | No
Advanced PGD Features       | -          | ✔                            | 14+                                    | 14+
Transparent Data Encryption | -          | 15+                          | 15+                                    | 15+
Advanced Security           | -          | -                            | ✔                                      | ✔
Advanced SQL                | -          | -                            | ✔                                      | ✔
Advanced Performance        | -          | -                            | ✔                                      | ✔
Resource Manager            | -          | -                            | ✔                                      | ✔
Bulk Data Loader            | -          | -                            | ✔                                      | ✔
Oracle Compatibility        | -          | -                            | -                                      | ✔
Capabilities And Tools

 Management/Monitoring - Postgres Enterprise Manager, pgAdmin
 High Availability - EDB Postgres Distributed, Failover Manager, Repmgr, Patroni
 Backup and Recovery - Barman, pgBackRest
 Migration - Migration Portal, Migration Toolkit, Replication Server
 Integration - Connectors, Foreign Data Wrappers, Connection Poolers
 Kubernetes - EDB Postgres for Kubernetes, CloudNativePG
Major Features
 Portable:
 Written in ANSI C
 Supports Windows, Linux, Mac OS/X and major UNIX platforms

 Reliable:
 ACID Compliant
 Supports Transactions and Savepoints
 Uses Write Ahead Logging (WAL)

 Scalable:
 Uses Multi-version Concurrency Control
 Table Partitioning and Tablespaces
 Parallel Sequential Scans, DDL (Table and Index Creation)
Major Features (continued)
 Secure:
 Employs Host-Based Access Control, SSL Connections and Logging
 Provides Object-Level Permissions and Row Level Security

 Recovery and Availability:
 Physical and Logical Streaming Replication
 Support for Sync, Async and Cascaded Replication
 Supports Hot-Backup using pg_basebackup and Point-in-Time Recovery

 Advanced:
 Supports Triggers, Functions and Procedures using Custom Procedural Languages
 Major Database Version Upgrades using pg_upgrade
 Unlogged Tables and Materialized Views
Postgres for Big Data
 Postgres enables you to support a wider range of workloads with your relational database
 An object-relational design and decades of proven reliability make Postgres the most flexible, extensible and performant database available
 Document store capabilities: XML, JSON, PLV8; HStore (key-value store); non-durable storage; full text indexing
Architectural Overview

 Connectors: libpq, ECPG, JDBC, ODBC, .NET, Perl DBI, Python, Node.js, TCL
 PostgreSQL server: shared memory, background processes, user (backend) processes
 OS: kernel, storage, cache
General Database Limits

Limit Value
Maximum Database Size Unlimited

Maximum Table Size 32 TB

Maximum Row Size 1.6 TB

Maximum Field Size 1 GB

Maximum Rows per Table Unlimited

Maximum Columns per Table 250-1600 (Depending on Column types)

Maximum Indexes per Table Unlimited


Common Database Object Names

Industry Term Postgres Term


Table or Index Relation

Row Tuple

Column Attribute

Data Block Page (when block is on disk)

Page Buffer (when block is in memory)


Lab Setup Guidelines
 All the instructor demos and labs are based on Linux
 A Rocky Linux machine or virtual machine with at least 1 GB RAM and 20 GB storage space is recommended
 Participants using Linux must follow the instructor during the installation module and install PostgreSQL
Module Summary
 EDB Portfolio

 History of PostgreSQL

 Major Features

 Architectural Overview

 General Database Limits

 Common Database Object Names


System
Architecture

Module Objectives
 Architectural Summary
 Process and Memory Architecture
 Utility Processes
 Connection Request-Response
 Disk Read Buffering
 Disk Write Buffering
 Background Writer Cleaning Scan
 Commit and Checkpoint
 Statement Processing
 Physical Database Architecture
 Data Directory Layout
 Installation Directory Layout
 Page Layout

Architectural Summary

 Postgres uses processes, not threads
 The “Postmaster” process acts as a supervisor
 Several utility processes perform background work
 Postmaster starts them and restarts them if they die
 One backend process is created per user session
 Postmaster listens for new connections
Process and Memory Architecture

 Postmaster supervises all server processes
 Shared Memory: Shared Buffers, WAL Buffers, Process Array
 Utility processes: BGWRITER, CHECKPOINTER, AUTOVACUUM, WAL WRITER, LOGGER, ARCHIVER, LOGICAL REPLICATION
 On-disk structures: Data Files, WAL Segments, Archived WAL, Error Log Files
Utility Processes
 Background writer
 Writes dirty data blocks to disk

 WAL writer
 Flushes write-ahead log to disk

 Checkpointer
 Automatically performs a checkpoint based on config parameters

 Logging collector
 Routes log messages to syslog, eventlog, or log files
More Utility Processes
 Autovacuum launcher
 Starts Autovacuum workers as needed

 Autovacuum workers
 Recover free space for reuse

 Archiver
 Archives write-ahead log files

 Logical replication launcher


 Starts logical replication apply process for logical replication
Postmaster as Listener

 Postmaster is the main process, called postgres
 Listens on 1, and only 1, TCP port
 Receives client connection requests
User Backend Process
 Postmaster process spawns a new server process (postgres) for each connection request detected
 Communication is done using semaphores and shared memory
 Authentication - IP, user and password
 Authorization - Verify permissions
Respond to Client
 User backend process called postgres
 Callback to client
 Waits for SQL
 Query is transmitted using plain text

Disk Read Buffering
 Postgres buffer cache (shared_buffers) reduces OS reads
 Read the block once, then examine it many times in cache
Disk Write Buffering
 Blocks are written to disk only when needed:
 To make room for new blocks
 At checkpoint time
Background Writer Cleaning Scan
 Background writer scan attempts to ensure an adequate supply of clean buffers
 Backends write dirty buffers as needed
Write Ahead Logging (WAL)
 Backends write data changes to WAL buffers
 WAL buffers are flushed periodically (WAL writer), on commit, or when buffers are full
 Group commit
Transaction Log Archiving
 Archiver spawns a task to copy away pg_wal log files when full, using the archive command
Commit and Checkpoint

 Before commit - Uncommitted updates are in memory
 After commit - WAL buffers are written to the disk (write-ahead log file) and shared buffers are marked as committed
 After checkpoint - Modified data pages are written from shared memory to the data files
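The WAL-before-data ordering described above can be sketched with plain files. This is a toy simulation, not PostgreSQL internals: wal.log stands in for the write-ahead log, datafile for a data page, and the record text is invented for illustration.

```shell
# Toy simulation of WAL-before-data ordering (not real PostgreSQL internals).
workdir=$(mktemp -d)
echo "old value" > "$workdir/datafile"            # on-disk data page

# COMMIT: the change record is flushed to the WAL first;
# the data page on disk is still untouched.
echo "UPDATE row 1 -> new value" >> "$workdir/wal.log"
after_commit=$(cat "$workdir/datafile")

# CHECKPOINT: modified pages are written from shared memory to the data files.
echo "new value" > "$workdir/datafile"
after_checkpoint=$(cat "$workdir/datafile")

echo "after commit: $after_commit / after checkpoint: $after_checkpoint"
```

If the server crashed between the two steps, replaying wal.log over datafile would reproduce the committed change; that is why the WAL write alone is enough to make the commit durable.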
Statement Processing

 Parse:
 Check syntax
 Call traffic cop
 Identify query type
 Command processor if needed
 Break query into tokens

 Optimize:
 Planner generates a plan
 Uses database statistics
 Apply Optimizer Hints
 Query cost calculation
 Choose best plan

 Execute:
 Execute query based on query plan
Physical Database Architecture

 Database Cluster - Collection of databases managed by a single server instance
 Each cluster has a separate data directory, TCP port, and set of processes
 A cluster can contain multiple databases
Installation Directory Layout
 Default Installation Directory Location:

 Linux - /usr/pgsql-16

 bin – Programs

 lib – Libraries

 share – Shared data

 Default Data directory - /var/lib/pgsql/16/data


Database Cluster Data Directory Layout

The DATA directory contains:

 global - Cluster-wide database objects
 base - Contains databases
 pg_tblspc - Symbolic links to tablespaces
 pg_wal - Write-ahead logs
 pg_log / log - Startup and error logs
 Status info - pg_xact, pg_multixact, pg_snapshots, pg_stat, pg_subtrans, pg_notify, pg_serial, pg_replslot, pg_logical, pg_dynshmem
 Configuration files - postgresql.conf, pg_hba.conf, pg_ident.conf, postgresql.auto.conf
Physical Database Architecture
 File-per-table, file-per-index
 A tablespace is a directory
 Each database that uses that tablespace gets a subdirectory
 Each relation using that tablespace/database combination gets one or more files, in 1GB chunks
 Additional files are used to hold auxiliary information (free space map, visibility map)
 Each file name is a number (see pg_class.relfilenode)
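As a quick arithmetic check of the 1 GB segmenting rule: a hypothetical 2.5 GB (2560 MB) table would be split across three on-disk segment files named relfilenode, relfilenode.1, relfilenode.2. The table size here is invented for illustration.

```shell
# A relation larger than 1 GB is split into 1 GB segment files:
# <relfilenode>, <relfilenode>.1, <relfilenode>.2, ...
table_mb=2560      # hypothetical 2.5 GB table
segment_mb=1024    # 1 GB default segment size
# Ceiling division gives the number of segment files
segments=$(( (table_mb + segment_mb - 1) / segment_mb ))
echo "$segments segment files"   # 3 segment files
```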


Sample - Data Directory Layout

 DATA/base/<database OID>/<relfilenode> - for example base/14297/14307
 DATA/pg_tblspc/<tablespace OID> - symbolic link to the tablespace directory (for example /storage/pg_tab), which in turn contains <database OID>/<relfilenode> files (for example 16650/14307)
Page Layout
 Page header
 General information about the page
 Pointers to free space
 24 bytes long
 Row/index pointers
 Array of offset/length pairs pointing to the actual rows/index entries
 4 bytes per item
 Free space
 Unallocated space
 New pointers allocated from the front, new rows/index entries from the rear
 Row/index entry
 The actual row or index entry data
 Special
 Index access method specific data
 Empty in ordinary tables
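The per-page bookkeeping above can be checked with simple arithmetic, using the figures quoted on this slide (24-byte header, 4-byte item pointers, 8 KB page). The row count and average row size are made up for illustration.

```shell
page_size=8192      # default 8 KB page
header=24           # page header, bytes
item_ptr=4          # one offset/length pair per row
rows=100            # hypothetical number of rows on the page
avg_row=60          # hypothetical average row size in bytes

# Space consumed: header + item pointer array (front) + row data (rear)
used=$(( header + rows * item_ptr + rows * avg_row ))
free=$(( page_size - used ))
echo "free space on page: $free bytes"   # 1768 bytes
```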
Page Structure

An 8K page consists of a page header, item pointers allocated from the front, free space, tuples filled from the rear, and a special area at the end.

Module Summary
 Architectural Summary
 Shared Memory
 Inter-process Communication
 Statement Processing
 Utility Processes
 Disk Read Buffering
 Disk Write Buffering
 Background Writer Cleaning Scan
 Commit and Checkpoint
 Physical Database Architecture
 Data Directory Layout
 Installation Directory Layout
 Page Layout

PostgreSQL
Installation

Module Objectives
 Deployment Options

 OS User and Permissions

 Package Installation

 Installation Example and Practice Labs

 Setting Environmental Variables


Deployment Options
 Deployment methods for PostgreSQL and supported Tools:

 BigAnimal: Fully managed database-as-a-service.

 CloudNativePG: Operator designed for managing PostgreSQL workloads on Kubernetes clusters.

 Native packages or installers: PostgreSQL Yum Repository can be used for YUM and RPM based
installation

 Source Code Installation: PostgreSQL source code is open-source and free to use

 Note: This training is based on PostgreSQL YUM Repository deployment.


OS User and Permissions
 PostgreSQL runs as a daemon (Unix / Linux) or service (Windows)

 The PostgreSQL Installation requires superuser/admin access

 All processes and data files must be owned by a user in the OS

 During installation, a postgres locked user will be created on Linux

 On Windows a password is required

 SELinux must be set to permissive mode on systems with SELinux


The postgres User Account
 It is advised to run Postgres under a separate user account

 This user account should only own the data directory that is
managed by the server

 The useradd or adduser Unix command can be used to add a user

 The user account named postgres is used throughout this training


Practice Lab - Add postgres User
 Connect to your Linux machine as root or sudo user
 Use the useradd command to create a new user:

[root@Base ~]# useradd postgres
[root@Base ~]# passwd postgres
Changing password for user postgres.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Package Installation Options

 Wizard Installer
 Interactive Method
 Graphical or Command Line Mode, available for Windows
 Easy Download from www.enterprisedb.com

 RPM Installer
 Preferred Installation Method on Linux
 Dependencies are resolved manually
 Can be used to install Postgres in Isolated Environments

 YUM/APT Installer
 Attempts to install required package dependencies
YUM Installation
Configure Repositories
 PostgreSQL can be installed using a yum or apt repo:
 https://www.postgresql.org/download/linux/redhat/
 https://www.postgresql.org/download/linux/ubuntu/
 On this page, select the version and the platform on which PostgreSQL needs to be installed
 It will provide you with the repository location and the post-installation steps to be performed for setup of the initial database cluster
Example – Download the PostgreSQL YUM Repository
Example – Download the PostgreSQL APT Repository
Practice Lab - Install PostgreSQL on Rocky Linux
 Install the repository RPM:
 sudo dnf install -y
https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-
x86_64/pgdg-redhat-repo-latest.noarch.rpm
 Disable the built-in PostgreSQL module:
 sudo dnf -qy module disable postgresql

 Install PostgreSQL:
 sudo dnf install -y postgresql16-server

 Configure a Package Installation using the service configuration file (Optional)
# /usr/lib/systemd/system/postgresql-16.service
 Create a database cluster and start the cluster using services:
 sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
 sudo systemctl enable postgresql-16
 sudo systemctl start postgresql-16
After Installation
Database Cluster Defaults
 Data directory – /var/lib/pgsql/16/data

 Default authentication – peer and scram-sha-256

 Default database superuser – postgres

 Default password of database superuser – blank

 Default port – 5432


Practice Lab - Connecting to a Database
 Connect to the default database using psql and change password of the superuser postgres:

[root@pgsrv1 ~]# su - postgres


[postgres@pgsrv1 ~]$ /usr/pgsql-16/bin/psql -d postgres -U postgres
postgres=# ALTER USER postgres PASSWORD 'postgres';
ALTER ROLE
postgres=# \q

 Change the authentication method to scram-sha-256 in pg_hba.conf file and reload the server

[postgres@pgsrv1 ~]$ vi /var/lib/pgsql/16/data/pg_hba.conf


local all all scram-sha-256
host all all 127.0.0.1/32 scram-sha-256
host all all ::1/128 scram-sha-256

[postgres@pgsrv1 ~]$ /usr/pgsql-16/bin/pg_ctl -D /var/lib/pgsql/16/data/ reload


server signaled
Setting Environmental Variables
 Setting environment variables is very important for trouble-free startup/shutdown of the database server
 PATH – should point to the correct bin directory
 PGDATA – should point to the correct data cluster directory
 PGPORT – should point to the correct port on which the database cluster is running
 PGUSER – specifies the default database user name
 PGDATABASE – specifies the default database
 PGPASSWORD – specifies the default password
 Edit .profile or .bash_profile to set the variables
 On Windows, set these variables using the System Properties page
Example – Environmental Variables setup

Edit User Profile:

[postgres@pgsrv1 ~]$ vi .bash_profile
PATH=/usr/pgsql-16/bin/:$PATH:$HOME/.local/bin:$HOME/bin
export PATH
export PGDATA=/var/lib/pgsql/16/data/
export PGUSER=postgres
export PGPORT=5432
export PGDATABASE=postgres

Logoff and Login:

[postgres@pgsrv1 ~]$ exit
logout
[root@pgsrv1 ~]# su - postgres

Verify Environmental Settings:

[postgres@pgsrv1 ~]$ which psql
/usr/pgsql-16/bin/psql

[postgres@pgsrv1 ~]$ pg_ctl status
pg_ctl: server is running (PID: 1663)
/usr/pgsql-16/bin/postgres "-D" "/var/lib/pgsql/16/data/"
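A minimal sketch of the same idea: export the variables in the current shell and echo the connection psql would make by default. The paths and values mirror the example above; adjust them for your own installation.

```shell
# Set the connection-related variables for the current shell session.
export PGDATA=/var/lib/pgsql/16/data
export PGPORT=5432
export PGUSER=postgres
export PGDATABASE=postgres
export PATH=/usr/pgsql-16/bin:$PATH

# With these set, running psql with no arguments is equivalent to:
conn="psql -p $PGPORT -U $PGUSER -d $PGDATABASE"
echo "$conn"
```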
Module Summary
 Deployment Options

 OS User and Permissions

 Package Installation

 Installation Example and Practice Labs

 Setting Environmental Variables


Lab Exercise - 1
 Choose the platform on which you want to install PostgreSQL

 Download the PostgreSQL installer from postgresql.org for the chosen platform

 Prepare the platform for installation

 Install PostgreSQL and connect to a database using psql


User Tools -
Command Line
Interfaces

Module Objectives
 Introduction to psql

 Connecting to Database

 psql Command Line Parameters

 psql Meta-Commands

 Conditional and Information Commands


psql
Introduction to psql
 psql is a command line interface (CLI) to Postgres

 Can be used to execute SQL queries and psql meta commands

[postgres@pgsrv1 ~]$ psql -p 5432 -U postgres -d postgres


Password for user postgres:
psql (16.0)
Type "help" for help.

postgres=# \q
Connecting to a Database

psql Connection Options: Environmental Variables

 -d <Database Name>  PGDATABASE, PGHOST,


PGPORT and PGUSER
 -h <Hostname>

 -p <Database Port>

 -U <Database Username>
Conventions
 psql has its own set of commands, all of which start with a backslash (\).

 Some commands accept a pattern. This pattern is a modified regex. Key points:

 * and ? are wildcards

 Double-quotes are used to specify an exact name, ignoring all special characters and
preserving case
On Startup…
 During startup, psql considers environment variables for connection

 psql will then execute commands from $HOME/.psqlrc file, this can be
skipped using -X option

 -f FILENAME will execute the commands in FILENAME, then exit

 -c COMMAND will execute COMMAND (SQL or internal) and then exit

 --help will display all the startup options, then exit

 --version will display version info and then exit


Entering Commands
 psql uses the command line editing capabilities that are available
in the native OS. Generally, this means:

 Up and Down arrows cycle through command history

 On UNIX, there is tab completion for various things, such as SQL commands
History and Query Buffer
 \s will show the command history

 \s FILENAME will save the command history

 \e will edit the query buffer and then execute it

 \e FILENAME will edit FILENAME and then execute it

 \w FILENAME will save the query buffer to FILENAME


Controlling Output
 psql -o FILENAME or meta command \o FILENAME will send query
output (excluding STDERR) to FILENAME

 \g FILENAME executes the query buffer sending output to FILENAME

 \watch <seconds> can be used to run previous query repeatedly


Advanced Features - Variables
 psql provides variable substitution

 Variables are simply name/value pairs

 Use \set meta command to set a variable


=> \set city Edmonton

=> \echo :city

Edmonton

 Use \unset to delete a variable


=> \unset city
Advanced Features - Special Variables
 Settings can be changed at runtime by altering special variables

 Some important special variables include:

 AUTOCOMMIT, ENCODING, HISTFILE, ON_ERROR_ROLLBACK, ON_ERROR_STOP, PROMPT1


and VERBOSITY

 Example:

=# \set AUTOCOMMIT off

 Once AUTOCOMMIT is set to off use COMMIT/ROLLBACK to complete the running transaction
Conditional Commands
 Conditional commands primarily helpful for scripting

 \if EXPR begin conditional block

 \elif EXPR alternative within current conditional block

 \else final alternative within current conditional block

 \endif end conditional block
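To see how these fit together, the fragment below writes a small psql script that uses \if / \else / \endif — a hypothetical check that only echoes a message when connected as a superuser. Building the file needs no server; running it would require psql -f check.sql against a live cluster.

```shell
# Write a psql script using conditional meta-commands (hypothetical example).
# \gset stores the query result into a psql variable, which \if then tests.
cat > check.sql <<'EOF'
SELECT usesuper AS is_super FROM pg_user WHERE usename = current_user \gset
\if :is_super
  \echo 'connected as a superuser'
\else
  \echo 'not a superuser'
\endif
EOF
lines=$(wc -l < check.sql)
echo "wrote check.sql ($lines lines)"
```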


Information Commands
 \d[(i|s|t|v|b|S)][+] [pattern]

 List of objects (indexes, sequences, tables, views, tablespaces and dictionaries)

 \d[+] [pattern]

 Describe structure details of an object

 \l[ist][+]

 Lists the databases in a database cluster


Information Commands (continued)
 \dn+ [pattern]

 Lists schemas (namespaces)

 + adds permissions and description to output

 \df[+] [pattern]

 Lists functions

 + adds owner, language, source code and description to output


Common psql Meta Commands
 \q or ^d or quit or exit

 Quits the psql program

 \cd [ directory ]

 Change current working directory

 Tip - To display your current working directory, use \! pwd

 \! [ command ]

 Executes the specified Unix or Windows command

 If no command is specified, escapes to a separate Unix shell (CMD.EXE in Windows)


Help
 \conninfo
 Current connection information

 \?
 Shows help information about psql commands

 \h [command]
 Shows information about SQL commands

 If command isn't specified, lists all SQL commands

 psql --help
 Lists command line options for psql
Module Summary
 Introduction to psql

 Connecting to Database

 psql Command Line Parameters

 psql Meta-Commands

 Conditional and Information Commands


Prepare Lab Environment
 In the training materials provided by EnterpriseDB there is a script file edbstore.sql that can be
executed using psql to create a sample edbstore database. Here are the steps:
 Download the edbstore.sql file and place in a directory which is accessible to the postgres user

 Login as postgres OS user

 Run the psql command with the -f option to execute the edbstore.sql file and install all the sample
objects required for this training
psql -p 5432 -f edbstore.sql -d postgres -U postgres
 Enter postgres database user password

 After successful execution, a new database named edbstore owned by a new database user edbuser
is created. Default password for edbuser is edbuser
 Connect to edbstore database and verify newly created objects using psql meta commands.

psql -p 5432 -h localhost -d edbstore -U edbuser


Lab Exercise - 1
 In this lab exercise you will have a chance to practice what you have learned through using command line interfaces:
1. Connect to a database using psql
2. Switch databases
3. Describe the customers table
4. Describe the customers table including description
5. List all databases
6. List all schemas
7. List all tablespaces
8. Execute a sql statement, saving the output to a file
9. Do the same thing, just saving data, not the column headers
10. Create a script via another method, and execute from psql
11. Turn on the expanded table formatting mode
12. List tables, views and sequences with their associated access privileges
13. Which meta command displays the SQL text for a function?
14. View the current working directory
Database
Clusters

Module Objectives
 Database Clusters
 Creating a Database Cluster
 Starting and Stopping the Server (pg_ctl)
 Connecting to the Server Using psql
Database Clusters
 A Database Cluster is a collection of databases managed by a single server instance
 Database Clusters consist of:
 Data directory

 Port

 Default databases are created named:


 template0

 template1

 postgres
Creating a Database Cluster
 Choose the data directory location for new cluster

 Initialize the database cluster storage area (data directory) using the initdb
utility

 initdb will create the data directory if it doesn’t exist

 You must have permissions on the parent directory so that initdb can create
the data directory

 The data directory can be created manually by superuser and the ownership
can be given to postgres user
initdb Utility
$ initdb [OPTION]... [DATADIR]
 Options:
-D, --pgdata location for this database cluster
-E, --encoding set default encoding for new databases
-U, --username database superuser name
-W, --pwprompt prompt for a password for the new superuser
-X, --waldir location for the write-ahead log directory
--wal-segsize size of WAL segments, in megabytes
-k, --data-checksums use data page checksums
-?, --help show this help, then exit

 If the data directory is not specified, the environment variable PGDATA is used
Example - initdb

[root@Base ~]# mkdir /edbstore
[root@Base ~]# chown postgres:postgres /edbstore
[root@Base ~]# su - postgres

[postgres@Base ~]$ initdb -D /edbstore --wal-segsize 1024 -W

 In the above example the database system will be owned by user postgres
 The postgres user is the database superuser
 --wal-segsize 1024 specifies a 1024 MB write-ahead log file segment size
 -W is used to force initdb to prompt for the superuser password
 The default server config file, named postgresql.conf, will be created in /edbstore
pg_ctl Utility
 pg_ctl is a command line utility provided by Postgres to initialize, start, stop and control a Postgres instance
 It provides options for redirecting the start log, and for controlled startup and shutdown
 The -D option or the environment variable PGDATA can be used to specify the cluster data directory

pg_ctl -D datadir [ start | stop | restart | reload | status | promote | init | logrotate | kill ]
Starting a Database Cluster
 After initializing a database cluster, a unique port must be assigned
 Choose a unique port for postmaster in postgresql.conf
 Start the database cluster using pg_ctl utility
 Example:

[postgres@Base ~]$ vi /edbstore/postgresql.conf


port = 5434

[postgres@Base ~]$ pg_ctl -D /edbstore/ -l /edbstore/startlog start


waiting for server to start.... done
server started

[postgres@Base ~]$ pg_ctl -D /edbstore/ status


pg_ctl: server is running (PID: 62239)
Connecting To a Database Cluster
 The psql and pgAdmin can be used for connections
[postgres@Base ~]$ psql -p 5434 -d edb -U postgres
Type "help" for help.

edb=# show port;


port
------
5434
(1 row)

edb=# show data_directory;


data_directory
----------------
/edbstore
(1 row)

edb=# \q
Reload a Database Cluster
 Some configuration parameter changes do not require a restart

 Changes can be reloaded using the pg_ctl utility

 Changes can also be reloaded using pg_reload_conf()

 Syntax:

$ pg_ctl reload [options]

-D location of the database cluster’s data directory

-s only print errors, no informational messages


Stopping a Database Cluster
 pg_ctl supports three modes of shutdown
 smart - quit after all clients have disconnected
 fast - quit directly, with proper shutdown (default)
 immediate - quit without complete shutdown; will lead to recovery on restart
 Syntax:
$ pg_ctl stop [-W] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]
 Example:
[postgres@Base ~]$ pg_ctl -D /edbstore/ stop
waiting for server to shut down.... done
server stopped

[postgres@Base ~]$ pg_ctl -D /edbstore/ status
pg_ctl: no server running
View Cluster Control Information
 pg_controldata can be used to view the control information for
a database cluster
 It can be run with data directory as an option

[postgres@Base ~]$ pg_controldata /edbstore/


……………………………………………………………………………………………….
Database system identifier: 6724770293870218226
Database cluster state: shut down
Latest checkpoint location: 0/41A3AA40
Latest checkpoint's REDO WAL file: 000000010000000000000001
Latest checkpoint's TimeLineID: 1
Backup start location: 0/0
Backup end location: 0/0
wal_level setting: replica
Database block size: 8192
WAL block size: 8192
Data page checksum version: 0
Module Summary
 Database Clusters
 Creating a Database Cluster
 Starting and Stopping the Server (pg_ctl)
 Connecting to the Server Using psql
Lab Exercise - 1
1. A new website is to be developed for an online music store.

 Create a new cluster edbdata with ownership of postgres user

 Start your edbdata cluster

 Reload your cluster with pg_ctl utility and using pg_reload_conf() function

 Stop your edbdata cluster with fast mode


Configuration

© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
 Server Parameter File - postgresql.conf

 Viewing and Changing Server Parameters

 Configuration Parameters - Security, Resources and WAL

 Configuration Parameters - Error Logging, Planner and Maintenance

 Viewing Compilation Settings

 Using File Includes


Setting Server Parameters
There are many configuration parameters that affect the behavior of the
database system
All parameter names are case-insensitive
Every parameter takes a value of one of five types:
 boolean

 integer

 floating point

 string

 enum

One way to set these parameters is to edit the file postgresql.conf, which is
normally kept in the data directory
The Server Parameter File - postgresql.conf
 Holds parameters used by a cluster
 Parameters are case-insensitive
 Normally stored in data directory
 initdb installs default copy
 Some parameters only take effect on server restart (pg_ctl restart)
 # used for comments
 One parameter per line
 Use include directive to read and process another file
 Can also be set using the command-line option
Viewing and Changing Server Parameters

Configuration parameters can be viewed using:

 SHOW command

 pg_settings

 pg_file_settings

Configuration parameters can be modified for:

 Single session using the SET command

 Database user using ALTER USER

 Single database using ALTER DATABASE
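A short psql session sketch tying these together (the appuser role and edb database are illustrative):

```shell
edb=# SHOW work_mem;                                   -- view a single parameter
edb=# SELECT name, setting, source FROM pg_settings WHERE name = 'work_mem';
edb=# SELECT * FROM pg_file_settings;                  -- values as read from the config files
edb=# SET work_mem = '8MB';                            -- current session only
edb=# ALTER USER appuser SET work_mem = '16MB';        -- per-user default
edb=# ALTER DATABASE edb SET work_mem = '32MB';        -- per-database default
```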
Changing Configuration Parameter at Cluster Level

 Use the ALTER SYSTEM command to change cluster level settings without editing postgresql.conf

 ALTER SYSTEM writes the new setting to the postgresql.auto.conf file, which is read last during server reloads/restarts

 Parameters can be modified using ALTER SYSTEM when required

[postgres@pgsrv1 ~] psql edb postgres
edb=# ALTER SYSTEM SET work_mem=20480;
ALTER SYSTEM
edb=# SELECT pg_reload_conf();
edb=# ALTER SYSTEM RESET work_mem;
ALTER SYSTEM
edb=# SELECT pg_reload_conf();
Connection Settings
 listen_addresses (default localhost) - Specifies the addresses on which the server is to listen for
connections. Use * for all
 port (default 5432) - The port the server listens on
 max_connections (default 100) - Maximum number of concurrent connections the server can
support
 superuser_reserved_connections (default 3) - Number of connection slots reserved for
superusers
 reserved_connections (default 0) - Reserved slots for users with
pg_use_reserved_connections role
 unix_socket_directories (default /tmp) - Directories to be used for UNIX socket connections to the
server
 unix_socket_permissions (default 0777) - access permissions of the Unix-domain socket
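In postgresql.conf these settings might look like the following sketch (the values are illustrative, not recommendations):

```shell
# postgresql.conf - connection settings (illustrative values)
listen_addresses = '192.168.10.5,localhost'   # addresses to listen on; '*' for all
port = 5432                                   # TCP port
max_connections = 200                         # change requires restart
superuser_reserved_connections = 5            # slots kept free for superusers
```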
Security and Authentication Settings
 authentication_timeout (default 1 minute) - Maximum time to complete client authentication, in seconds

 row_security (default on) - Controls row security policy behavior

 password_encryption (default scram-sha-256) - Determines the algorithm used to encrypt passwords

 ssl (default off) - Enables SSL connections
SSL Settings
ssl_ca_file - Specifies the name of the file containing the SSL server certificate
authority (CA) certificates

ssl_cert_file - Specifies the name of the file containing the SSL server certificate

ssl_key_file - Specifies the name of the file containing the SSL server private key

ssl_ciphers - List of SSL ciphers that may be used for secure connections

ssl_dh_params_file – Specifies the file name for custom OpenSSL DH parameters


Memory Settings

 shared_buffers - Size of the shared buffer pool for a cluster (server level)

 temp_buffers - Amount of memory used for caching temporary tables (session level)

 work_mem - Amount of memory used for sorting and hashing operations (session level)

 maintenance_work_mem - Amount of memory used for maintenance commands (session level)

 autovacuum_work_mem - Amount of memory used by autovacuum worker processes (server level)

 temp_file_limit - Amount of disk space used for temporary files (session level)
Query Planner Settings
 random_page_cost (default 4.0) - Estimated cost of a random page fetch.
May need to be reduced to account for caching effects

 seq_page_cost (default 1.0) - Estimated cost of a sequential page fetch.

 effective_cache_size (default 4GB) - Used to estimate the cost of an index


scan.

 plan_cache_mode (default auto) – Controls custom or generic plan execution


for prepared statements. Can be set to auto, force_custom_plan and
force_generic_plan
Write Ahead Log Settings
 wal_level (default replica) - Determines how much information is written to the WAL. Other
values are minimal and logical
 fsync (default on) – Forces a WAL buffer flush at each commit. Turning this off can lead to
arbitrary corruption in case of a system crash
 wal_buffers (default -1, autotune) - The amount of memory used in shared memory for WAL
data. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers
 min_wal_size (default 80 MB) – The WAL size to start recycling the WAL files
 max_wal_size (default 1GB) – The WAL size at which a checkpoint is forced. Controls the number of WAL
segments (16MB each by default) that can accumulate before a checkpoint is triggered
 checkpoint_timeout (default 5 minutes) - Maximum time between checkpoints
 wal_compression (default off) – Full-page writes in the WAL are compressed before being written
Where To Log

 log_destination - Controls logging type for a database cluster. Can be set to stderr, csvlog, jsonlog, syslog, and eventlog

 logging_collector - Enables the logger process to capture stderr and csv logging messages. These messages can be redirected based on configuration settings

 Log file and directory settings:

 log_directory - Directory where log files are written

 log_filename - Format of log file name (e.g. postgresql-%Y-%m-%d_%H%M%S.log)

 log_file_mode - Permissions for log files

 log_rotation_age - Used for file age-based log rotation

 log_rotation_size - Used for file size-based log rotation
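Put together, a file-based logging setup might look like this postgresql.conf sketch (values are illustrative):

```shell
# postgresql.conf - logging to rotated files (illustrative values)
logging_collector = on                          # change requires restart
log_destination = 'stderr'
log_directory = 'log'                           # relative to the data directory
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d                           # rotate daily...
log_rotation_size = 100MB                       # ...or when a file reaches 100MB
```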
When To Log

 log_min_messages - Messages of this severity level or above are sent to the server log

 log_min_error_statement - When a message of this severity or higher is written to the server log, the statement that caused it is logged along with it

 log_min_duration_statement - When a statement runs for at least this long, it is written to the server log

 log_autovacuum_min_duration - Logs any autovacuum activity running for at least this long

 log_statement_sample_rate - Percentage of queries (above log_min_duration_sample) to be logged

 log_transaction_sample_rate - Sample a percentage of transactions by logging statements
What To Log
log_connections Log successful connections to the server log

Log some information each time a session disconnects, including the duration of
log_disconnections the session

log_temp_files Log temporary files of this size or larger, in kilobytes

log_checkpoints Causes checkpoints and restart points to be logged in the server log

log_lock_waits Log information if a session waits longer than deadlock_timeout to acquire a lock

log_error_verbosity How detailed the logged message is. Can be set to default, terse or verbose

Additional details to log with each line. Default is '%m [%p] ', which logs a timestamp
log_line_prefix and the process ID

log_statement Legal values are none, ddl, mod (DDL and all other data-modifying statements), or all
Background Writer Settings
 bgwriter_delay (default 200 ms) - Specifies time between activity rounds for
the background writer

 bgwriter_lru_maxpages (default 100) - Maximum number of pages that the


background writer may clean per activity round

 bgwriter_lru_multiplier (default 2.0) - Multiplier on buffers scanned per round.


By default, if system thinks 10 pages will be needed, it cleans 10 *
bgwriter_lru_multiplier of 2.0 = 20

 Primary tuning technique is to lower bgwriter_delay


Statement Behavior
 search_path - This parameter specifies the order in which schemas are
searched. The default value for this parameter is "$user", public
 default_tablespace - Name of the tablespace in which objects are created by
default
 temp_tablespaces - Tablespaces name(s) in which temporary objects are
created
 statement_timeout - Postgres will abort any statement that takes over the
specified number of milliseconds. A value of zero (the default) turns this off
 idle_in_transaction_session_timeout – Terminates any session with an open
transaction that has been idle for longer than the specified duration in
milliseconds
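These statement-behavior parameters can also be set per session, as in this illustrative sketch (the sales schema and fast_tab tablespace are assumptions):

```shell
edb=# SET search_path = sales, public;                    -- schemas searched in order
edb=# SET default_tablespace = fast_tab;                  -- where new objects are placed
edb=# SET statement_timeout = '5s';                       -- abort statements running over 5 seconds
edb=# SET idle_in_transaction_session_timeout = '10min';  -- end idle open transactions
```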
Parallel Query Scan Settings
 Advanced Server supports parallel execution of read-only queries

 Can be enabled and configured by using configuration parameters

 max_parallel_workers_per_gather (default 2): Enables parallel query scan

 parallel_tuple_cost (default 0.1): Estimated cost of transferring one tuple from a parallel worker
process to another process

 parallel_setup_cost (default 1000): Estimates cost of launching parallel worker processes

 min_parallel_table_scan_size (default 8MB): Sets minimum amount of table data that must be
scanned in order for a parallel scan

 min_parallel_index_scan_size (default 512 KB): Sets the minimum amount of index data that must
be scanned in order for a parallel scan

 debug_parallel_query (default off; named force_parallel_mode before version 16): Useful when testing parallel
query scans even when there is no performance benefit
Parallel Maintenance Settings
 PostgreSQL supports parallel processes for creating an index

 Currently this feature is only available for btree index type

 max_parallel_maintenance_workers (default 2): Enables parallel index creation

(comparison of index build timings with max_parallel_maintenance_workers=0 vs. max_parallel_maintenance_workers=4)
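For instance, parallel index creation might be exercised as follows (the orders table and index name are illustrative):

```shell
edb=# SET max_parallel_maintenance_workers = 4;
edb=# CREATE INDEX orders_amount_idx ON orders (amount);  -- btree builds may use parallel workers
```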
Vacuum Cost Settings
 vacuum_cost_delay (default 0 ms) - The length of time, in milliseconds, that the process will
wait when the cost limit is exceeded
 vacuum_cost_page_hit (default 1) - The estimated cost of vacuuming a buffer found in the
buffer pool
 vacuum_cost_page_miss (default 10) - The estimated cost of vacuuming a buffer that must be
read into the buffer pool
 vacuum_cost_page_dirty (default 20) - The estimated cost charged when vacuum modifies a
buffer that was previously clean
 vacuum_cost_limit (default 200) - The accumulated cost that will cause the vacuuming process
to sleep
 vacuum_buffer_usage_limit (default 256kB) - The size of the Buffer Access Strategy used by the
VACUUM and ANALYZE commands
Autovacuum Settings
 autovacuum (default on) - Controls whether the autovacuum launcher runs, and
starts worker processes to vacuum and analyze tables

 log_autovacuum_min_duration (default -1) - Autovacuum tasks running longer


than this duration are logged

 autovacuum_max_workers (default 3) - Maximum number of autovacuum


worker processes which may be running at one time

 autovacuum_work_mem (default -1, to use maintenance_work_mem) -


Maximum amount of memory used by each autovacuum worker
Just-in-Time Compilation
 Just-in-Time (JIT) compilation is a core feature of Postgres for accomplishing high performance

 JIT in Postgres supports accelerating expression evaluation and tuple deforming

 JIT configuration parameters include jit, jit_above_cost, jit_inline_above_cost and jit_optimize_above_cost
Preset Options - Read Only Parameters
 Postgres sources are compiled using various settings.
 Various read-only configuration parameters can be used to view build settings

block_size data_directory_mode

wal_block_size server_encoding

segment_size max_function_args

wal_segment_size max_index_keys

data_checksums ssl_library
Configuration File Includes
 The postgresql.conf file can contain include directives

 Allows configuration file to be divided in separate files

 Usage in postgresql.conf file:

 include ‘filename’

 include_dir ‘directory name’
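A minimal sketch of the include directives (the file and directory names are illustrative):

```shell
# postgresql.conf
include 'memory.conf'      # read one additional file, relative to the data directory
include_dir 'conf.d'       # read every *.conf file in the directory, in name order
```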


Module Summary
 Server Parameter File - postgresql.conf

 Viewing and Changing Server Parameters

 Configuration Parameters - Security, Resources and WAL

 Configuration Parameters - Error Logging, Planner and Maintenance

 Viewing Compilation Settings

 Using File Includes


Lab Exercise - 1
1. You are working as a DBA. It is recommended to keep a backup copy of the
postgresql.conf file before making any changes. Make the necessary changes
in the server parameter file for the following settings:

 Allow up to 200 connected users on the server

 Reserve 10 connection slots for DBA users on the server

 Maximum time to complete client authentication will be 10 seconds


Lab Exercise - 2
1. Working as a DBA is a challenging job and to track down certain activities
on the database server, logging has to be implemented. Go through the
server parameters that control logging and implement the following:

 Save all the error message in a file inside the log folder in your cluster data directory
(e.g. c:\edbdata or /edbdata)

 Log all queries which are taking more than 5 seconds to execute, and their time

 Log the users who are connecting to the database cluster

 Make the above changes and verify them


Lab Exercise - 3
1. Perform the following changes recommended by a senior DBA and
verify them. Set:

 Shared buffer to 256MB

 Effective cache for indexes to 512MB

 Maintenance memory to 64MB

 Temporary memory to 8MB


Lab Exercise - 4
1. Vacuuming is an important maintenance activity and needs to be
properly configured. Change the following autovacuum parameters
on the production server. Set:

 Autovacuum workers to 6

 Autovacuum threshold to 100

 Autovacuum scale factor to 0.3

 Auto analyze threshold to 100

 Autovacuum cost limit to 100


Data Dictionary

© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
 The System Catalog Schema

 System Information Tables


and Views

 System Information and


Administration Functions
The System Catalog Schema
 Stores information about table and other objects

 Created and maintained automatically in pg_catalog schema

 pg_catalog is always effectively part of the search_path

 Contains:

 System Tables like pg_class etc.

 System Function like pg_database_size() etc.

 System Views like pg_stat_activity etc.


System Information Tables
 \dS in psql prompt will give you the list of pg_* tables and views

 This list is from pg_catalog schema

pg_tables list of tables

pg_constraint list of constraints

pg_indexes list of indexes

pg_trigger list of triggers

pg_views list of views


More System Information Tables

pg_file_settings Provides a summary of the contents of the server configuration file

pg_policy Stores row level security policies for tables

pg_policies Provides access to useful information about each row-level security policy in the database
System Information Functions

current_database() current_schema[()] pg_postmaster_start_time() version()

current_user current_schemas(boolean) pg_current_logfile() txid_status()

pg_conf_load_time() pg_jit_available()
System Administration Functions
current_setting, set_config Return or modify configuration variables

pg_cancel_backend Cancel a backend's current query

pg_terminate_backend Terminates backend process

pg_reload_conf Reload configuration files

pg_rotate_logfile Rotate the server's log file

pg_backup_start, pg_backup_stop Used with point-in-time recovery (named pg_start_backup/pg_stop_backup before version 15)

pg_ls_logdir() Returns the name, size, and last modified time of each file in the log directory

pg_ls_waldir() Returns the name, size, and last modified time of each file in the WAL directory
More System Administration Functions
pg_*_size Disk space used by a tablespace, database, or relation; pg_total_relation_size includes indexes and TOAST data

pg_column_size Bytes used to store a particular value

pg_size_pretty Convert a raw size to something more human-readable

File operation functions. Restricted to superuser use and


pg_ls_dir, pg_read_file only on files in the data or log directories

pg_blocking_pids() Function to reliably identify which sessions block others


System Information Views
pg_stat_activity Details of open connections and running transactions

pg_locks List of current locks being held

pg_stat_database Details of databases

pg_stat_user_* Details of tables, indexes and functions

pg_stat_archiver Status of the archiver process

pg_stat_progress_basebackup View pg_basebackup progress

pg_stat_progress_vacuum Provides progress reporting for VACUUM operations

pg_stat_progress_analyze Provides progress details for ANALYZE operations

pg_hba_file_rules Provides a summary of the contents of the client authentication configuration file, pg_hba.conf

pg_stat_io Provides I/O information
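Typical monitoring queries against these views, sketched here with an illustrative pid:

```shell
edb=# SELECT pid, usename, state, query FROM pg_stat_activity;
edb=# SELECT datname, numbackends, xact_commit FROM pg_stat_database;
edb=# SELECT pg_terminate_backend(12345);   -- gracefully end the session with pid 12345 (illustrative)
```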


Module Summary
 The System Catalog Schema

 System Information Tables


and Views

 System Information and


Administration Functions
Lab Exercise - 1
1. You are working with
different schemas in a
database. After a while you
need to determine all the
schemas in your search
path. Write a query to find
the list of schemas currently
in your search path.
Lab Exercise - 2
1. You need to determine the
names and definitions of all of
the views in your schema.
Create a report that retrieves
view information - the view
name and definition text.
Lab Exercise - 3
1. Create a report of all the users
who are currently connected. The
report must display total session
time of all connected users.

2. You found that a user has


connected to the server for a very
long time and have decided to
gracefully kill its connection. Write
a statement to perform this task.
Lab Exercise - 4
1. Write a query to display the
name and size of all the
databases in your cluster.
Size must be displayed using
a meaningful unit.
Creating and
Managing
Databases

© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
 Object Hierarchy

 Users and Roles

 Tablespaces

 Databases

 Access Control

 Creating Schemas

 Schema Search Path


Object Hierarchy

Database
Cluster

Users/Groups
Database Tablespaces
(Roles)

Catalogs Schema Extensions

Event
Table View Sequence Functions
Triggers
Users and Roles
Database Users
 Are global within a database cluster

 Are not the operating system users

 Are used for connecting to a database

 Have a unique name not starting with pg_

 postgres is a predefined superuser


Creating Users Using psql
 How to create? CREATE USER sql command
 How to delete? DROP USER sql command
 superuser or createrole privilege is required for creating a database user

Syntax:
CREATE USER name [ [ WITH ] option [ ... ] ]
where option can be:
SUPERUSER | CREATEDB | CREATEROLE
| INHERIT | LOGIN | NOLOGIN | REPLICATION
| BYPASSRLS
| CONNECTION LIMIT connlimit
| [ ENCRYPTED ] PASSWORD 'password'
| VALID UNTIL 'timestamp'
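For example (the user name and password are illustrative):

```shell
edb=# CREATE USER appuser WITH LOGIN PASSWORD 'secret' CONNECTION LIMIT 10;
CREATE ROLE
edb=# DROP USER appuser;
DROP ROLE
```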
Creating Users Using createuser
 The createuser utility can also be used to create a user
 Syntax:
$ createuser [OPTION]... [ROLENAME]

 Use --help option to view the full list of options available


 Example:
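A sketch of createuser usage (the role name is illustrative):

```shell
[postgres@Base ~]$ createuser --createdb --pwprompt devuser
Enter password for new role:
Enter it again:
```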
Roles
 Role is a collection of cluster and object level privileges

 Role makes it easier to manage multiple privileges

 How to create? CREATE ROLE statement

 How to assign? GRANT statement

 Who it can be assigned to? user or a group


Predefined Roles
 Provide certain administrative capabilities using these default roles

 Example: create a new user with read only access or a new user with
access to view monitoring data only

pg_checkpoint pg_read_server_files
pg_database_owner pg_signal_backend
pg_execute_server_program pg_stat_scan_tables
pg_monitor pg_write_all_data
pg_read_all_data pg_create_subscription
pg_read_all_settings pg_use_reserved_connections
pg_read_all_stats pg_write_server_files
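For instance, a monitoring-only account might be set up like this (the names are illustrative):

```shell
edb=# CREATE USER monitor_user WITH LOGIN PASSWORD 'secret';
edb=# GRANT pg_monitor TO monitor_user;          -- may read monitoring views and statistics
edb=# GRANT pg_read_all_data TO monitor_user;    -- read-only access to all tables
```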
Tablespaces
Tablespaces and Data Files
 Data is stored logically in tablespaces and physically in data files

 Tablespaces:
 Can belong to only one database cluster

 Consist of multiple data files

 Can be used by multiple databases

 Data Files:
 Can belong to only one tablespace

 Are used to store database objects

 Cannot be shared by multiple tables (one or more per table)


Advantages of Tablespaces
 Control the disk layout for a database cluster
 Store indexes and data physically separated for performance

Indexes
Tablespace A
Database Instance

Fast Storage
Transactional Tables

Historical Tables
Tablespace B
Slow Storage
Seldom Used Partition
Pre-Configured Tablespaces

 pg_global - PGDATA/global directory; stores cluster-wide tables and catalog objects

 pg_default - PGDATA/base directory; stores databases, schemas and other objects
Creating Tablespaces
 How to create? CREATE TABLESPACE command

 The tablespace directory must already exist, with appropriate ownership and permissions

 A symbolic link named after the tablespace OID is created in the pg_tblspc directory of the cluster data directory, pointing to the physical tablespace directory; inside it, a version-specific catalog directory holds one subdirectory per database containing the object files

 Syntax:

CREATE TABLESPACE tablespace_name [ OWNER user_name ]
LOCATION 'directory';
Example - CREATE TABLESPACE
[training@Base ~]$ sudo mkdir /newtab1
[training@Base ~]$ sudo chown postgres:postgres /newtab1
[training@Base ~]$ su - postgres
[postgres@Base ~]$ psql -p 5432 postgres postgres

postgres=# CREATE TABLESPACE fast_tab LOCATION '/newtab1';


CREATE TABLESPACE
postgres=# \db
List of tablespaces
Name | Owner | Location
------------+--------------+----------
fast_tab | postgres | /newtab1
pg_default | postgres |
pg_global | postgres |
(3 rows)
Using Tablespaces
 Use the TABLESPACE keyword while creating databases, tables and indexes

edb=# CREATE TABLE account(acno INT PRIMARY KEY,


ac_hldr_fname VARCHAR(20)) TABLESPACE fast_tab;
CREATE TABLE
Default and Temp Tablespace
 default_tablespace server parameter sets default tablespace
 default_tablespace parameter can also be set using the SET command at the session level
 temp_tablespaces parameter determines the placement of temporary tables and indexes and
temporary files
 temp_tablespaces can be a list of tablespace names

edb=# show default_tablespace;


default_tablespace
--------------------

(1 row)
edb=# show temp_tablespaces;
temp_tablespaces
------------------

(1 row)
Altering Tablespaces
 ALTER TABLESPACE can be used to rename a tablespace, change ownership
and set a custom value for a configuration parameter

 Only the owner or superuser can alter a tablespace

 The seq_page_cost and random_page_cost parameters can be altered


for a tablespace
Example - Alter Tablespace

Syntax:
ALTER TABLESPACE name RENAME TO new_name
ALTER TABLESPACE name OWNER TO { new_owner | CURRENT_USER | SESSION_USER }
ALTER TABLESPACE name SET ( tablespace_option = value [, ... ] )
ALTER TABLESPACE name RESET ( tablespace_option [, ... ] )

edb=# ALTER TABLESPACE fast_tab RENAME TO new_tab;


ALTER TABLESPACE
edb=# \db
List of tablespaces
Name | Owner | Location
------------+--------------+----------
new_tab | postgres | /newtab1
pg_default | postgres |
pg_global | postgres |
Dropping a Tablespace
 DROP TABLESPACE removes a tablespace from the system

 Only the owner or superuser can drop a tablespace

 The tablespace must be empty

 If a tablespace is listed in the temp_tablespaces parameter,


make sure current sessions are not using the tablespace

 DROP TABLESPACE cannot be executed inside a transaction


Databases
What Is a Database?
 A database is a named collection of SQL objects

 A running Postgres instance can manage multiple databases

 How to create? CREATE DATABASE command

 How to delete? DROP DATABASE command

 To determine the set of existing databases:

 SQL - SELECT datname FROM pg_database;

 psql META COMMAND - \l (backslash lowercase L)


Creating Databases
 Database can be created using:

1. createdb utility program

2. CREATE DATABASE SQL command

 SQL Command syntax:

CREATE DATABASE name [ [ WITH ] [ OWNER [=] user_name ]


[ TEMPLATE [=] template ]
[ ENCODING [=] encoding ]
[ TABLESPACE [=] tablespace_name ]
[ ALLOW_CONNECTIONS [=] allowconn ]
[ CONNECTION LIMIT [=] connlimit ]
Example - Creating Databases
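A sketch of both approaches (the database names are illustrative):

```shell
# With the createdb utility
[postgres@Base ~]$ createdb --owner=postgres prod

# With the SQL command
edb=# CREATE DATABASE hr WITH OWNER postgres TEMPLATE template1 CONNECTION LIMIT 50;
CREATE DATABASE
```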
Accessing a Database
 pgAdmin4 or psql can be used to access a database

 To use psql, open a terminal and execute:

$ psql -U postgres -d prod

 Note: If PATH is not set you can execute psql command from the bin
directory of postgres installation
Privileges
 Cluster level
 Granted to a user during CREATE or later using ALTER USER

 These privileges are granted by superuser

 Object Level
 Granted to user using GRANT command

 These privileges allow a user to perform particular actions on a database object, such as
tables, views, or sequence
 Can be granted by owner, superuser or someone who has been given permission to grant
privileges (WITH GRANT OPTION)
GRANT Statement
 Grants object level privileges to database users, groups or roles

 GRANT can also be used to grant a role to a user

 How to view syntax and available privileges?

 Type \h GRANT in psql


Example – GRANT Statement
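An illustrative sketch (the table, user and role names are assumptions):

```shell
edb=# GRANT SELECT, INSERT ON accounts TO appuser;            -- object privileges
edb=# GRANT SELECT ON accounts TO appuser WITH GRANT OPTION;  -- allow appuser to re-grant
edb=# GRANT managers TO appuser;                              -- grant a role to a user
```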
REVOKE Statement
 Revokes object level privileges from database users, groups or roles

 REVOKE [ GRANT OPTION FOR ] can be used to revoke only the grant
option without revoking the actual privilege

 How to view syntax and available privileges?

 Type \h REVOKE in psql


Example - REVOKE Statement
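An illustrative sketch (same hypothetical names as the GRANT examples):

```shell
edb=# REVOKE INSERT ON accounts FROM appuser;
edb=# REVOKE GRANT OPTION FOR SELECT ON accounts FROM appuser;  -- keep SELECT, remove re-grant right
```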
Database Schemas
What is a Schema

SCHEMA

Tables Views

Owns
Sequences Functions
USER

Domains
Benefits of Schemas
 A database can contain one or more named schemas

 By default, all databases contain a public schema

 There are several reasons why one might want to use schemas:

 To allow many users to use one database without interfering with each other

 To organize database objects into logical groups to make them more manageable

 Third-party applications can be put into separate schemas so they cannot collide
with the names of other objects
Creating Schemas
 Schemas can be added using the CREATE SCHEMA SQL command
 Syntax:
CREATE SCHEMA IF NOT EXISTS schema_name [ AUTHORIZATION
role_specification ]
 Example:
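For example (the schema and owner names are illustrative):

```shell
edb=# CREATE SCHEMA IF NOT EXISTS sales AUTHORIZATION appuser;
CREATE SCHEMA
edb=# CREATE TABLE sales.orders (id int);   -- fully qualified object name
CREATE TABLE
```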
What is a Schema Search Path
 The schema search path determines which schemas are searched
for matching table names

 Search path is used when fully qualified object names are not used
in a query

 Example:

SELECT * FROM employee;

This statement will find the first employee table from the schemas listed in the
search path
Determine the Schema Search Path
 To show the current search path, execute the following command in psql:
SHOW search_path;

 Default search_path is "$user",public

 Modifying search_path:
 Cluster/Instance Level: postgresql.conf or ALTER SYSTEM

 Database Level: ALTER DATABASE

 User Level: ALTER USER

 Session Level: SET
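The different levels can be sketched as follows (the schema, user and database names are illustrative):

```shell
edb=# SHOW search_path;
edb=# SET search_path = sales, public;                     -- session level
edb=# ALTER USER appuser SET search_path = sales, public;  -- user level
edb=# ALTER DATABASE edb SET search_path = sales, public;  -- database level
```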


Object Ownership

Database
Cluster
Owner

Users/Groups
Database Tablespaces
(Roles)

Catalogs Schema Extensions

Event
Table View Sequence Functions
Triggers
Module Summary
 Object Hierarchy

 Users and Roles

 Tablespaces

 Databases

 Access Control

 Creating Schemas

 Schema Search Path


Lab Exercise - 1
 An e-music online store website
application developer wants to add an
online buy/sell facility and has asked you
to separate all tables used in online
transactions. Here you have suggested to
use schemas. Implement the following
suggested options:
 Create an ebuy user with password ‘lion’

 Create an ebuy schema which can be used


by user ebuy

 Login as the ebuy user, create a table


sample1 and check whether that table
belongs to the ebuy schema or not
Lab Exercise - 2
 Retrieve a list of databases
using a SQL query

 Retrieve a list of databases


using the psql meta command

 Retrieve a list of tables in the


edbstore database and check
which schema and owner they
have
Database
Security

© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
 Database Security Requirements and
Protection Plan
 Levels of Security in Postgres

 Access Control using pg_hba.conf

 Introduction to Row Level Security

 Data Encryption

 General Security Recommendations


Why Database Security
 Databases are a core component
of many computing systems
 Confidential data such as SINs
(Social Insurance Numbers),
healthcare records, and banking
details is stored and shared using
databases
 It is very critical to protect stored
information from hackers,
insiders, and other groups who
intend to steal valuable data
 Database Security is a
mechanism to protect the data
against threats

Data Security Requirements
 Stopping improper disclosure,
modification and denial of access
to information is very important
 No one wants an employee finding
out the boss's salary, changing
their own salary, or stopping HR
from printing paychecks
 Database Security Requirements
includes:
 Confidentiality

 Integrity

 Availability
Protection Plan –
We all need one

 Prevent attacks:
 Access Control - Authentication and Authorization
 Data Control - Views, Row Level Security, Encryption
 Network Control - SSL Connections, Firewalls

 Discover and Protect:
 Auditing
 Monitoring
Levels of Security

 Server and Application
 Check Client IP
 pg_hba.conf

 Database
 User/Password
 Connect Privilege
 Schema Permissions

 Object
 Table Level Privileges
 Grant/Revoke
Host Based Access Control
Host Based Access Control

(diagram: a client at IP 10.8.99.30 connecting as user appuser1 is checked by the postmaster against the rules in pg_hba.conf; a certificate may be required)

 pg_hba.conf can be used to restrict the ability to connect to a database


 SSL can be forced for selected clients based on hostname or IP address
 Different authentication methods can be used
 Superuser access can be locked down to certain IPs using pg_hba.conf
Host-Based Access Control File - pg_hba.conf
 Location: Cluster data directory

 Read Behavior: Loaded at startup; Any changes require a service reload

 File Structure: Comprises individual records (one per line); Records are processed from top
to bottom

 Record Details: Specifies connection type, database name, user name, client IP(Hostnames,
IPv6, and IPv4), and authentication method

 Authentication Methods: trust, reject, scram-sha-256, md5, password, gss, sspi, ident, peer, pam, ldap,
radius, bsd, cert

 Password Encryption: Availability of password-based authentication methods depends on
the password_encryption setting
Authentication Methods
trust Allows unconditional connection, no password required.

reject Unconditionally rejects the connection; useful for blocking specific hosts.

scram-sha-256 Performs SCRAM-SHA-256 authentication for password verification.

md5 Performs SCRAM-SHA-256 or MD5 authentication for password verification.

password Requires an unencrypted password, not recommended for untrusted networks.

gss Uses GSSAPI for user authentication (TCP/IP connections only).

sspi Uses SSPI for user authentication (Windows only).

ident Obtains the client's OS username by contacting the ident server for TCP/IP connections.

peer Obtains the client's OS username and matches it with the requested database user name (local connections).

ldap Authenticates using an LDAP server.

radius Authenticates using a RADIUS server.

cert Authenticates using SSL client certificates.

pam Authenticates using Pluggable Authentication Modules (PAM) provided by the OS.

bsd Authenticates using the BSD Authentication service provided by the OS.
pg_hba.conf Example
# TYPE DATABASE USER ADDRESS METHOD

# "local" is for Unix domain socket connections only


local all all peer
# IPv4 local connections:
host all all 127.0.0.1/32 scram-sha-256
# IPv6 local connections:
host all all ::1/128 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all peer
host replication all 127.0.0.1/32 scram-sha-256
host replication all ::1/128 scram-sha-256

 SQL:
select rule_number,type,database,user_name,address,netmask,auth_method
from pg_hba_file_rules ;
Authentication Problems
FATAL: no pg_hba.conf entry for host "192.168.10.23", user
"edbuser", database "edbstore"
FATAL: password authentication failed for user "edbuser"
FATAL: user "edbuser" does not exist
FATAL: database "edbstore" does not exist

 Self-explanatory message is displayed


 Verify database name, username and Client IP in pg_hba.conf
 Reload Cluster after changing pg_hba.conf
 Check server log for more information
Row Level Security
Row Level Security (RLS)
 GRANT and REVOKE can be used at table level

 PostgreSQL supports security policies for limiting access at row level

 By default, all rows of a table are visible

 Once RLS is enabled on a table, all queries must go through the security policy

 Security policies are controlled by DBA rather than application

 RLS offers stronger security as it is enforced by the database

 Illustration from the slide - an accounts table whose rows are filtered per manager:

Account      Balance
Company J    $23,925
Company M    $133,007
Company Z    $17,092
Company L    $997,654
Company R    $72,871
Company A    $0.00
Company T    $50,194
Company Q    $67,892
Example - Row Level Security
 For example, to enable row level security for the accounts table:
 Create the table first
postgres=# CREATE TABLE accounts (manager text, company text,
contact_email text);
 Then alter the table
postgres=# ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
 Syntax:
CREATE POLICY name ON table_name
[ AS { PERMISSIVE | RESTRICTIVE } ]
[ FOR { ALL | SELECT | INSERT | UPDATE | DELETE } ]
[ TO{ role_name | PUBLIC | CURRENT_USER | SESSION_USER}[,...] ]
[ USING ( using_expression ) ]
[ WITH CHECK ( check_expression ) ]
Example - Row Level Security (continued)
 To create a policy on the accounts table to allow the managers role to view
the rows of their accounts, the CREATE POLICY command can be used:
postgres=# CREATE POLICY account_managers ON accounts TO managers
USING (manager = current_user);

 To allow all users to view their own row in a user table, a simple policy can be
used:
postgres=# CREATE POLICY user_policy ON users USING (user_name =
current_user);
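Putting the pieces together, a minimal end-to-end sketch (the managers role, sample rows, and email addresses are illustrative and not part of the slide):

```sql
-- Create and populate the example table
CREATE TABLE accounts (manager text, company text, contact_email text);
INSERT INTO accounts VALUES
  ('alice', 'Company J', 'j@example.com'),
  ('bob',   'Company M', 'm@example.com');

-- Enable RLS; from now on all access goes through the policies
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Members of the managers role see only the rows they manage
CREATE POLICY account_managers ON accounts TO managers
  USING (manager = current_user);
```

Connected as alice (a member of managers), `SELECT company FROM accounts;` would return only 'Company J'.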
Data Encryption
Database Level Encryption
 Encrypting everything does not make data secure
 Resources are consumed when you query encrypted data
 pgcrypto provides mechanism for encrypting selected columns
 pgcrypto supports one-way and two-way data encryption
 Install pgcrypto using CREATE EXTENSION command
CREATE EXTENSION pgcrypto;
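As a sketch of both modes (column values and the key are illustrative): crypt() with gen_salt() gives one-way password hashing, while pgp_sym_encrypt()/pgp_sym_decrypt() give two-way symmetric encryption.

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- One-way: store a salted Blowfish hash; verify by re-hashing
-- with the stored value as the salt
SELECT crypt('secret', gen_salt('bf')) AS pwd_hash;
-- verification test: crypt('secret', pwd_hash) = pwd_hash

-- Two-way: encrypt a selected column with a symmetric key
SELECT pgp_sym_encrypt('4111-1111-1111-1111', 'mykey') AS ciphertext;
SELECT pgp_sym_decrypt(
         pgp_sym_encrypt('4111-1111-1111-1111', 'mykey'),
         'mykey');  -- returns the original plaintext
```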
General Security
Recommendations
General Recommendations - Database Server
 Always keep your system patched to the latest version

 Don't put a postmaster port on the Internet

 Firewall this port appropriately

 If that's not possible, make a read-only Replica database available on the port, not a R/W master

 Isolate the database port from other network traffic

 Don't rely solely on your front-end application to prevent unauthorized access to


your database

 Avoid using trust authentication in pg_hba.conf


General Recommendations - Database Users
 Provide each user with their own login

 Shared credentials make auditing more complicated and violate HIPAA, PCI, etc.

 Allow users the minimum access to do their jobs

 Use Roles and classes of privileges

 Use Views and View Security Barriers

 Use Row Level Security


General Recommendations - Connection Pooling
 When not practical to provide each user with their own login (i.e. connection
pooling is in use):

 Have one or more logins related to the application

 Limit access to the database by the specific IP addresses where the


application is certified to run

 Ensure the login(s) have minimum rights needed to do their work (e.g. SELECT
rights and only to specified tables)
General Recommendations - Database Superuser
 Only allow the database superuser to log in from the server machine
itself, with local or localhost connection

 Reserve use of superuser accounts for tasks or roles where it is


absolutely required

 Make as few objects owned by the superuser as necessary

 Restrict access to configuration files (postgresql.conf and pg_hba.conf)


and error log files to administrators

 Disallow host system login by database superuser roles ('postgres')


General Recommendations - Database Superuser
(continued)
 Do not allow superuser to log into database server OS. Use personal OS login
and then "sudo" to create an audit trail

 Use a separate database login to own each database and own everything in it
General Recommendations - Database Backups
 Keep backups and have a tested recovery plan. No matter how well you secure
things, it's still possible an intruder could get in and delete or modify your data

 Have scripts perform backups and immediately test them and alert DBA on any
failures

 Keep backups physically separate from the database server. A disaster can
strike and take out an entire location, whether that’s environmental (e.g.
earthquake), malicious (e.g. hacker, insider), or human error
General Recommendations - Think AAA

 Authenticate - verify the user is who she claims to be

 Authorize - verify the user is allowed access

 Audit - record which user did what and when they did it
Module Summary
 Database Security Requirements and Protection Plan

 Levels of Security in Postgres

 Access Control using pg_hba.conf

 Introduction to Row Level Security

 Data Encryption

 General Security Recommendations


Lab Exercise - 1
1. You are working as a Postgres DBA. Your server box has 2 network cards with
IP addresses 192.168.30.10 and 10.4.2.10. 192.168.30.10 is used for the
internal LAN and 10.4.2.10 is used by the web server to connect users from an
external network. Your server should accept TCP/IP connections both from
internal and external users.
 Configure your server to accept connections from external and internal networks.
Lab Exercise - 2
1. A new developer has joined the team with ID number 89

 Create a new user by name dev89 and password password89

 Then assign the necessary privileges to dev89 so they can


connect to the edbstore database and view all tables
Lab Exercise - 3
1. A new developer joins e-music corp. Their machine has the IP address
192.168.30.89. They are not able to connect from their machine to
Postgres and get the following error on the server:

FATAL: no pg_hba.conf entry for host "192.168.30.89", user
"dev89", database "edbstore", SSL off

2. Configure your server so that the new developer can connect from
their machine
Monitoring and
Admin Tools

© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
 Overview and Features of pgAdmin

 Access pgAdmin

 Register and Connect to a Database Server

 General Database Administration

 Object Browser - View Data, Query Tool, Server Status

 Overview of Postgres Enterprise Manager


Introduction to pgAdmin
 Open-source graphical user interface for Postgres

 Create, manage and maintain database objects

 pgAdmin is web based and requires Apache HTTP server

 Download and Install: https://www.pgadmin.org/download/


pgAdmin Features
 Multi-platform
 Supports PostgreSQL and EDB Postgres Advanced Server
 Multi-deployment Mode – Desktop, Server
 Integrated SQL IDE
 pl/pgsql and edb-spl Debugger
 Schema Diff Tool
 ERD Tool
 Perform Maintenance Tasks – Vacuum, Backups, Restore etc.
 Job Scheduler
 Multibyte server-side encoding support
Installing pgAdmin on Linux
 Install the EPEL repository:
 sudo dnf install -y epel-release

 Install pgadmin repository:


 sudo rpm -i
https://ftp.postgresql.org/pub/pgadmin/pgadmin4/yum/pgadmin4-redhat-
repo-2-1.noarch.rpm
 sudo dnf makecache

 Use the yum command to install pgAdmin package:


 sudo dnf install -y pgadmin4
Post Installation Script
 Run post-install script to configure the Apache-HTTP:
 sudo dnf install -y policycoreutils-python-utils
 /usr/pgadmin4/bin/setup-web.sh

[rocky@pgsrv1 ~]$ sudo /usr/pgadmin4/bin/setup-web.sh


Setting up pgAdmin 4 in web mode on a Redhat based platform...
……………………
Enter the email address and password to use for the initial pgAdmin user account:
Email address: [email protected]
Password:
Retype password:
pgAdmin 4 - Application Initialisation
======================================
Creating storage and log directories...
Configuring SELinux...
……………………
The Apache web server is not running. We can enable and start the web server for you to finish pgAdmin 4
installation. Continue (y/n)? Y
Created symlink /etc/systemd/system/multi-user.target.wants/httpd.service → /usr/lib/systemd/system/httpd.service.
Apache successfully enabled.
Apache successfully started.
You can now start using pgAdmin 4 in web mode at http://127.0.0.1/pgadmin4
[rocky@pgsrv1 ~]$
Practice Lab - Install pgAdmin
 Connect to the Linux VM as sudo user

 Execute following commands to install and configure pgAdmin:

 sudo dnf install -y epel-release

 sudo rpm -i
https://ftp.postgresql.org/pub/pgadmin/pgadmin4/yum/pgadmin4-redhat-
repo-2-1.noarch.rpm

 sudo dnf makecache

 sudo dnf install -y pgadmin4 policycoreutils-python-utils

 /usr/pgadmin4/bin/setup-web.sh
Access pgAdmin Web Interface
 Open a browser and type: http://<IP>/pgadmin4

 Enter email address and password provided during post install script.

 Click Login
pgAdmin - User Interface
Registering a Server

 Right Click on the


server to add a server
Common Connection Problems
 There are 2 common error messages that you encounter while
connecting to a PostgreSQL database:

Could not connect to Server - Connection refused

 This error occurs when either the database server isn't running OR the port 5432 may
not be open on database server to accept external TCP/IP connections.

FATAL: no pg_hba.conf entry

 This means your server can be contacted over the network but is not configured to
accept the connection. Your client is not detected as a legal user for the database.
You will have to add an entry for each of your clients to the pg_hba.conf file.
Query Tool

Click on a
Database

Click on
Query Tool
Query Tool - Data Output

Type SQL Query Click on Execute Button

View Results
Databases
 The databases menu allows you to create a new database

 The menu for an individual database allows you to perform


operations on that database
 Create a new object in the database

 Drop the database

 Open the Query Tool with a script to re-create the database

 Perform maintenance

 Backup or Restore

 Modify the database properties


Creating a Database
Backup and Restore
Schemas
Schemas - Grant Wizard
Domains
Sequences
Tables
Tables - Indexes
Tables - Maintenance
Rules
 Rules can be
applied to tables
or views
Triggers
 Create a trigger
function before
creating a trigger
Views
Create Tablespaces
Roles
Dashboard
 Server Sessions

 Transaction per second

 Tuples in

 Tuples out

 Block I/O

 Server activity - sessions


Overview of Postgres
Enterprise Manager (PEM)
Postgres Enterprise Manager

Manage, monitor, and tune Postgres at scale

 Manage from one interface - one place to visualize and manage everything

 Optimize database performance - in-depth diagnostics for database reports and tuning

 Monitor system health - built-in dashboards and customizable alert thresholds

 Integrate with other tools - APIs and webhooks to fetch data, send alerts, and manage servers
PEM - Features
Manage, Monitor and Tune PostgreSQL and EDB Postgres Advanced
Server running on multiple Platforms

 Management:
  Integrated SQL IDE
  Built-in query debugger
  User/group access management
  Schema Diff
  Session profiling
  Job scheduling
  Backup and failover management

 Monitoring:
  Customizable charts and dashboards
  Predefined and custom alerts via email or SNMP
  User-defined metrics
  Log analysis
  Database and OS level monitoring
  Web hooks and REST API for integrations

 Tuning:
  Detailed performance diagnostics
  SQL profiler
  Capacity management
  Log manager
  Expert wizards for configuration setup
PEM Architecture

 [Architecture diagram] The PEM server host runs the PEM web application (behind
HTTPD) and the PEM storage backend (database: pem). Clients connect through a
browser. PEM agents installed on the PEM server host and on each managed host
collect monitoring data from the local PostgreSQL or EPAS instances and send it
to the PEM storage database.


Module Summary
 Overview and Features of pgAdmin

 Access pgAdmin

 Register and Connect to a Database Server

 General Database Administration

 Object Browser - View Data, Query Tool, Server Status

 Overview of Postgres Enterprise Manager


Lab Exercise 1
 Open pgAdmin 4 and connect to the default PostgreSQL database cluster

 Create a user named pguser

 Create a database named pgdb owned by pguser

 After creating the pgdb database change its connection limit to 4

 Create a schema named pguser inside the pgdb database

 The schema owner should be pguser


Lab Exercise 2
 You have created the pgdb database with the pguser schema. Create following
objects in the pguser schema:
 Table - Teams with columns TeamID, TeamName, TeamRatings

 Sequence - seq_teamid start value - 1 increment by 1

 Columns - Change the default value for the TeamID column to seq_teamid

 Constraint - TeamRatings must be between 1 and 10

 Index - Primary Key TeamID

 View - Display all teams in ascending order of their ratings. Name the view as vw_top_teams
Lab Exercise 3
 View all rows in the Teams table.

 Using the Edit data window, you just opened in the previous step, insert the
following rows into the Teams table:

TeamID TeamName TeamRatings


Auto generated Oilers 1
Auto generated Rangers 6
Auto generated Canucks 8
Auto generated Blackhawks 5
Auto generated Bruins 2
SQL Primer

Module Objectives
 Data Types  Sequences

 Structured Query Language  Domains

 DDL, DML and DCL Statements  SQL Joins and Functions

 Transaction Control Statements  Explain Plans

 Tables and Constraints  Quoting in PostgreSQL

 Views and Materialized Views  Indexes


Data Types
 Common Data Types:

Numeric Types   Character Types   Date/Time Types   Other Types
NUMERIC         CHAR              TIMESTAMP         BYTEA
INTEGER         VARCHAR           DATE              BOOL
SERIAL          TEXT              TIME              MONEY
                                  INTERVAL          XML
                                                    JSON
                                                    JSONB
Structured Query Language

Data Definition   Data Manipulation   Transaction Control   Data Control
Language          Language            Language              Language
CREATE            INSERT              COMMIT                GRANT
ALTER             UPDATE              ROLLBACK              REVOKE
DROP              DELETE              SAVEPOINT
TRUNCATE                              SET TRANSACTION
DDL Statements
Statement Syntax

CREATE [TEMPORARY][UNLOGGED] TABLE table_name


( [ column_name data_type [ column_constraint ] ] )
[ INHERITS ( parent_table) ]
CREATE TABLE
[ TABLESPACE tablespace_name ]
[ USING INDEX TABLESPACE tablespace_name ]
[ PARTITION BY { RANGE | LIST | HASH } ( column_name | ( expression ) ) ]

ALTER TABLE ALTER TABLE [IF EXISTS] [ONLY] name [*] action [,…]

DROP TABLE DROP TABLE [ IF EXISTS ] name [, …] [ CASCADE | RESTRICT ]

TRUNCATE TABLE TRUNCATE [ TABLE ] [ ONLY ] name [ * ] [, ….]


DML Statements
Statement Syntax

INSERT INTO table_name [ ( column_name [, ...] ) ]


INSERT
{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [,...] | query }

UPDATE [ ONLY ] table_name


UPDATE SET column_name = { expression | DEFAULT }
[ WHERE condition]

DELETE FROM [ ONLY ] table_name


DELETE
[ WHERE condition ]

SELECT [ ALL | DISTINCT ] [ * | expression ]


SELECT
[ FROM table [, ...] ]
DCL Statements

Statement Syntax

GRANT { { SELECT | INSERT | UPDATE ……} [, … ] | ALL [PRIVILEGES ] }


GRANT ON { [ TABLE ] table_name [, …] | ALL TABLES IN SCHEMA schema_name [ ,…] }
TO role_specification [, …] [ WITH GRANT OPTION ]

REVOKE [ GRANT OPTION FOR ]


{ { SELECT | INSERT | UPDATE ……} [, … ] | ALL [PRIVILEGES ] }
REVOKE
ON { [ TABLE ] table_name [, …] | ALL TABLES IN SCHEMA schema_name [ ,…] }
FROM { [ GROUP ] role_name | PUBLIC } [, …]
Transaction Control Language

Statement Syntax

COMMIT COMMIT [ WORK | TRANSACTION ]

ROLLBACK ROLLBACK [ WORK | TRANSACTION ]

SAVEPOINT SAVEPOINT savepoint_name

SET TRANSACTION SET TRANSACTION transaction_mode [, …]
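A short sketch of how these statements combine inside one transaction (the accounts table and its columns are illustrative):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SAVEPOINT before_fee;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
ROLLBACK TO SAVEPOINT before_fee;  -- undo the fee, keep the first update
COMMIT;                            -- only the -100 change is made durable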


Database Objects

Object Description

TABLE Named collection of rows

VIEW Virtual table, can be used to hide complex queries

SEQUENCE Used to automatically generate integer values that follow a pattern

INDEX A common way to enhance query performance

DOMAIN A data type with optional constraints


Tables
 A table is a named collection of rows
 Each table row has same set of columns
 Each column has a data type
 Tables can be created using the CREATE TABLE statement
 Syntax:
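A minimal sketch of the CREATE TABLE syntax, using illustrative names:

```sql
CREATE TABLE departments (
    deptno   integer PRIMARY KEY,
    dname    varchar(30) NOT NULL,
    location text
) TABLESPACE pg_default;
```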
Types of Constraints
 Constraints are used to enforce data integrity
 PostgreSQL supports different types of constraints:
 NOT NULL
 CHECK
 UNIQUE
 PRIMARY KEY
 FOREIGN KEY

 Constraints can be defined at the column level or table level


 Constraints can be added to an existing table using the ALTER TABLE
statement
 Constraints can be declared DEFERRABLE or NOT DEFERRABLE
 Constraints prevent the deletion of a table if there are dependencies
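A sketch of column-level and table-level constraints, plus adding one with ALTER TABLE (assumes a departments table with a deptno primary key; all names are illustrative):

```sql
CREATE TABLE emp (
    empno  integer PRIMARY KEY,                -- column-level constraint
    ename  text NOT NULL,
    sal    numeric CHECK (sal > 0),
    deptno integer,
    CONSTRAINT emp_dept_fk                     -- table-level constraint
        FOREIGN KEY (deptno) REFERENCES departments (deptno)
);

-- Adding a constraint to an existing table
ALTER TABLE emp ADD CONSTRAINT emp_ename_uq UNIQUE (ename);
```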
Views
 A View is a Virtual Table and can be used to hide complex queries
 Can also be used to represent a selected view of data
 Simple views are updatable and allow non-updatable columns
 Views can be created using the CREATE VIEW statement
 Syntax:
=> CREATE [ OR REPLACE ] VIEW name [ ( column_name [, ...] ) ]
[ WITH ( view_option_name [= view_option_value] [, ... ] ) ]
AS query
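For example, a view hiding an aggregate query (table and column names follow the emp examples used elsewhere in this course):

```sql
CREATE OR REPLACE VIEW dept_salaries AS
SELECT deptno, count(*) AS headcount, avg(sal) AS avg_sal
FROM emp
GROUP BY deptno;

-- The view is queried like a table
SELECT * FROM dept_salaries;
```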
Sequences
 A sequence is used to automatically generate integer values that follow a
pattern
 A sequence has a name, start point and an end point
 Sequence values can be cached for performance
 Sequence values are accessed using the CURRVAL and NEXTVAL functions
 Syntax:
=> CREATE SEQUENCE name [ INCREMENT [ BY ] increment ]
[ MINVALUE minvalue] [ MAXVALUE maxvalue]
[ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
[ OWNED BY { table_name.column_name | NONE } ]
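For example (the sequence name and bounds are illustrative):

```sql
CREATE SEQUENCE seq_deptno START WITH 60 INCREMENT BY 10 MAXVALUE 200;

SELECT nextval('seq_deptno');  -- 60; 70 on the next call
SELECT currval('seq_deptno');  -- last value returned in this session
```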
Domains
 A domain is a data type with optional constraints
 Domains can be used to create a data type which allows a selected list of values

 Example from the slide - a city domain shared by several tables:

Domain: city (allowed values: Edmonton, Calgary, Red Deer)
Used as the data type of:
 emp.cityname
 shop.shoplocation
 clients.res_city
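The diagram above can be sketched in SQL (the shop table is illustrative):

```sql
CREATE DOMAIN city AS text
  CHECK (VALUE IN ('Edmonton', 'Calgary', 'Red Deer'));

CREATE TABLE shop (shopname text, shoplocation city);

INSERT INTO shop VALUES ('North Store', 'Edmonton');  -- accepted
INSERT INTO shop VALUES ('Bad Store', 'Toronto');     -- rejected by the CHECK
```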
Types of JOINS

Type              Description
INNER JOIN        Returns all matching rows from both tables
LEFT OUTER JOIN   Returns all matching rows and rows from the left-hand table even if there is no corresponding row in the joined table
RIGHT OUTER JOIN  Returns all matching rows and rows from the right-hand table even if there is no corresponding row in the joined table
FULL OUTER JOIN   Returns all matching as well as not matching rows from both tables
CROSS JOIN        Returns all rows of both tables with Cartesian product on number of rows
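A few of these join types sketched against the emp and departments tables used in this course:

```sql
-- INNER JOIN: only employees whose deptno has a matching department
SELECT e.ename, d.dname
FROM emp e JOIN departments d ON e.deptno = d.deptno;

-- LEFT OUTER JOIN: all employees; dname is NULL where no department matches
SELECT e.ename, d.dname
FROM emp e LEFT JOIN departments d ON e.deptno = d.deptno;

-- CROSS JOIN: Cartesian product of the two tables
SELECT e.ename, d.dname
FROM emp e CROSS JOIN departments d;
```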
Using SQL Functions
 Can be used in SELECT statements and WHERE clauses
 Includes
 String Functions
 Format Functions
 Date and Time Functions
 Aggregate Functions

 Example:
=> SELECT lower(name) FROM departments;

=> SELECT * FROM departments


WHERE lower(name) = 'development';
SQL Format Functions

Function                         Return Type               Description                            Example
to_char(timestamp, text)         text                      convert time stamp to string           to_char(current_timestamp, 'HH12:MI:SS')
to_char(interval, text)          text                      convert interval to string             to_char(interval '15h 2m 12s', 'HH24:MI:SS')
to_char(int, text)               text                      convert integer to string              to_char(125, '999')
to_char(double precision, text)  text                      convert real/double precision          to_char(125.8::real, '999D9')
                                                           to string
to_char(numeric, text)           text                      convert numeric to string              to_char(-125.8, '999D99S')
to_date(text, text)              date                      convert string to date                 to_date('05 Dec 2000', 'DD Mon YYYY')
to_number(text, text)            numeric                   convert string to numeric              to_number('12,454.8-', '99G999D9S')
to_timestamp(text, text)         timestamp with time zone  convert string to time stamp           to_timestamp('05 Dec 2000', 'DD Mon YYYY')
to_timestamp(double precision)   timestamp with time zone  convert Unix epoch to time stamp       to_timestamp(1284352323)
Execution Plan
 An execution plan shows the detailed steps necessary to execute a SQL statement

 Planner is responsible for generating the execution plan

 The Optimizer determines the most efficient execution plan

 Optimization is cost-based, cost is estimated resource usage for a plan

 Cost estimates rely on accurate table statistics, gathered with ANALYZE

 Costs also rely on seq_page_cost, random_page_cost, and others

 The EXPLAIN command is used to view a query plan

 EXPLAIN ANALYZE is used to run the query to get actual runtime stats
Execution Plan Components
 Execution Plan Components:
  Cardinality - Row Estimates
  Access Method - Sequential or Index
  Join Method - Hash, Nested Loop etc.
  Join Type, Join Order
  Sort and Aggregates

 Syntax:
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
where option can be one of:
ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
SETTINGS [ boolean ]
GENERIC_PLAN [ boolean ]
BUFFERS [ boolean ]
WAL [ boolean ]
TIMING [ boolean ]
SUMMARY [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
Explain Example
 Example
postgres=# EXPLAIN SELECT * FROM emp;
QUERY PLAN
------------------------------------------------------
Seq Scan on emp (cost=0.00..1.14 rows=14 width=145)

 The numbers that are quoted by EXPLAIN are:


 Estimated start-up cost

 Estimated total cost

 Estimated number of rows output by this plan node

 Estimated average width (in bytes) of rows output by this plan node
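To compare estimates with actual runtime statistics, the same query can be run under EXPLAIN ANALYZE (the WHERE clause here is illustrative):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM emp WHERE deptno = 10;
```

Note that EXPLAIN ANALYZE actually executes the statement, so wrap data-modifying statements in a transaction you can roll back.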
PEM - Query Tool’s Visual Explain
Quoting
 Single quotes and dollar quotes are used to specify non-numeric values
 Example:
'hello world'
'2011-07-04 13:36:24'
'{1,4,5}'
$$A string "with" various 'quotes' in.$$
$foo$A string with $$ quotes in $foo$
 Double quotes are used for names of database objects which either clash with
keywords, contain mixed case letters, or contain characters other than a-z, 0-9
or underscore
 Example:
SELECT * FROM "select"
CREATE TABLE "HelloWorld" ...
SELECT * FROM "Hi everyone and everything"
Indexes
 Indexes are a common way to enhance performance

 Postgres supports several index types:

B-tree (default), Hash, Block Range Index (BRIN), GIN, GiST, SP-GiST, and Indexes on Expressions
Example Index
 Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [
ONLY ] table_name [ USING method ] ( { column_name | ( expression ) } [
COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [
ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ INCLUDE ( column_name [, ...] ) ]
[ NULLS [ NOT ] DISTINCT ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
 Example:
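A few illustrative examples of the syntax above (index and table names are hypothetical):

```sql
-- Default B-tree index
CREATE INDEX idx_emp_deptno ON emp (deptno);

-- Unique index on an expression, built without blocking writes
CREATE UNIQUE INDEX CONCURRENTLY idx_emp_lower_ename ON emp (lower(ename));

-- Partial index restricted by a WHERE predicate
CREATE INDEX idx_emp_high_sal ON emp (sal) WHERE sal > 5000;
```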
Module Summary
 Data Types  Sequences

 Structured Query Language  Domains

 DDL, DML and DCL Statements  SQL Joins and Functions

 Transaction Control Statements  Explain Plans

 Tables and Constraints  Quoting in PostgreSQL

 Views and Materialized Views  Indexes


Lab Exercise - 1
 Test your knowledge:
1. Initiate a psql session
2. psql commands access the database True/False
3. The following SELECT statement executes successfully: True/False
=> SELECT ename, job, sal AS Salary FROM emp;
4. The following SELECT statement executes successfully: True/False
=> SELECT * FROM emp;
5. There are coding errors in the following statement. Can you identify them?
=> SELECT empno, ename, sal * 12 annual salary FROM emp;
Lab Exercise - 2
 The staff in the HR department wants to hide some of the data in the EMP table.
They want a view called EMPVU based on the employee numbers, employee
names, and department numbers from the EMP table. They want the heading
for the employee name to be EMPLOYEE.

 Confirm that the view works. Display the contents of the EMPVU view.

 Using your EMPVU view, write a query for the SALES department to display all
employee names and department numbers.
Lab Exercise - 3
 You need a sequence that can be used with the primary key column of the dept
table. The sequence should start at 60 and have a maximum value of 200. Have
your sequence increment by 10. Name the sequence dept_id_seq.

 To test your sequence, write a script to insert two rows in the dept table.
Backup, Recovery
and PITR

Module Objectives
 Backup Types
 Database SQL Dumps
 Restoring SQL Dumps
 Offline Physical Backups
 Continuous Archiving
 Online Physical Backups Using pg_basebackup
 Point-in-time Recovery
 Recovery Settings
 Backup Tools – Barman and pgBackRest
Types of Backup
 As with any database, PostgreSQL databases should be backed up regularly

Logical Backups

 Database SQL Dumps using pg_dump


 Database Cluster SQL Dump using pg_dumpall

Physical Backups

 Offline File System Level Backups using OS commands


 Online File System Level Backups using pg_basebackup
 Backup Tool – Barman and pgBackRest
Logical Backups
Database SQL Dump
 Generate a text file with SQL commands
 PostgreSQL provides the utility program pg_dump for this purpose
 pg_dump does not block readers or writers
 pg_dump does not operate with special permissions
 Dumps created by pg_dump are internally consistent, that is, the dump
represents a snapshot of the database as of the time pg_dump begins running
 Syntax:
$ pg_dump [options] [dbname]
pg_dump Options
-a - Data only. Do not dump the data definitions (schema)
-s - Data definitions (schema) only. Do not dump the data
-n <schema> - Dump from the specified schema only
-t <table> - Dump specified table only
-f <file name> - Send dump to specified file. Filename can be specified using absolute or relative location
-Fp - Dump in plain-text SQL script (default)
-Ft - Dump in tar format
-Fc - Dump in compressed, custom format
-Fd - Dump in directory format
-j njobs - Dump in parallel by dumping njobs tables simultaneously. Only supported with -Fd
-B, --no-blobs - Excludes large objects in dump
-v - Verbose option
SQL Dump - Large Databases
 If the operating system has maximum file size limits, it can cause problems
when creating large pg_dump output files

 Standard Unix tools can be used to work around this potential problem

 Use a compression program, for example gzip:

$ pg_dump dbname | gzip > filename.gz

 The split command allows you to split the output into smaller files:

$ pg_dump dbname | split -b 1m - filename


Restore – SQL Dump

 Restore with the psql client:
  Backups taken using pg_dump with plain-text format (-Fp)
  Backups taken using pg_dumpall

 Restore with the pg_restore utility:
  Backups taken using pg_dump with custom (-Fc), tar (-Ft) or directory (-Fd) formats
  Supports parallel jobs during restore
  Selected objects can be restored
pg_restore Options
-l - Display TOC of the archive file

-F [c|d|t] - Backup file format

-d <database name> - Connect to the specified database. Also restores to this database if -C option is omitted

-C - Create the database named in the dump file and restore directly into it

-a - Restore the data only, not the data definitions (schema)

-s - Restore the data definitions (schema) only, not the data

-n <schema> - Restore only objects from specified schema

-N <schema> - do not restore objects in this schema

-t <table> - Restore only specified table

-v - Verbose option
Entire Cluster - SQL Dump
 pg_dumpall is used to dump an entire database cluster in plain-text SQL format

 Dumps global objects - users, groups, and associated permissions

 Use psql to restore

 Syntax:

$ pg_dumpall [options…] > filename.backup


pg_dumpall Options
-a - Data only. Do not dump schema
-s - Data definitions (schema) only
-g - Dump global objects only - not databases
-r - Dump only roles
-c - Clean (drop) databases before recreating
-O - Skip restoration of object ownership
-x - do not dump privileges (grant/revoke)
-v - Verbose option
--disable-triggers - disable triggers during data-only restore
--no-role-passwords - do not dump passwords for roles. This allows use of pg_dumpall by non-superusers
--exclude-database - exclude database whose name match with given pattern
Physical Backups
Backup - File system level backup
 An alternative backup strategy is to directly copy the files that Postgres uses to
store the data in the database
 You can use whatever method you prefer for doing usual file system backups,
for example:
$ tar -cf backup.tar /usr/local/edb/data

 The database server must be shut down or in backup mode in order to get a
usable backup
 File system backups only work for complete backup and restoration of an entire
database cluster
 Two types of File system backup
 Offline backups

 Online backups
File System Backups

Offline Backups

 Taken using OS Copy command


 Database Server must be shutdown
 Cluster Level Backup and Restore

Online Backups

 Continuous archiving must be enabled


 Database server start/end backup mode
 Cluster Level Backup and Restore with PITR
 Methods - pg_basebackup, Barman, pgBackRest
Continuous Archiving
 Postgres maintains WAL files for all transactions in pg_wal directory
 Postgres automatically maintains the WAL logs which are full and switched
 Continuous archiving can be setup to keep a copy of switched WAL Logs which
can be later used for recovery
 It also enables online file system backup of a database cluster
 Requirements:
 wal_level must be set to replica

 archive_mode must be set to on (can be set to always)

 archive_command must be set in postgresql.conf which archives WAL logs and supports PITR
Continuous Archiving Methods

 Parameters in postgresql.conf file


 wal_level = replica
Archiver Process  archive_mode = on
 archive_command = 'cp -i %p /edb/archive/%f'
 Restart the database server
 Archive files are generated after every log switch

 Parameters in postgresql.conf file


 wal_level = replica
 archive_mode = on
Streaming WAL  max_wal_senders = 3
 Restart the database server
 pg_receivewal -h localhost -D /edb/archive
 Transactions are streamed and written to archive files
Base Backup Using pg_basebackup Tool
 pg_basebackup can take an online base backup of a database cluster

 This backup can be used for PITR or Streaming Replication

 pg_basebackup makes a binary copy of the database cluster files

 System is automatically put in and out of backup mode


pg_basebackup - Online Backup
 Steps require to take Base Backup:
 Modify pg_hba.conf
host replication postgres [IPv4 address of client]/32 scram-sha-256
 Modify postgresql.conf

wal_level = replica
archive_command = 'cp -i %p /home/postgres/archive/%f'
archive_mode = on
max_wal_senders = 3
wal_keep_size = 512

 Backup Command:
$ pg_basebackup [options] ..
Options for pg_basebackup command
-D <directory name> - Location of backup
-F <p or t> - Backup files format. Plain(p) or tar(t)
-R - write standby.signal and append postgresql.auto.conf
-T OLDDIR=NEWDIR - relocate tablespace in OLDDIR to NEWDIR
--waldir - Write ahead logs location
-z - Enable compression(tar) for files
-Z - Compress backup based on setting set to none, client or server
-P - Progress Reporting
-h host - host on which cluster is running
-p port - cluster port

 To create a base backup of the server at localhost and store it in the local
directory /home/postgres/pgbackup
$ pg_basebackup -h localhost -D /home/postgres/pgbackup
Verify Base Backups
 Verify backup taken by pg_basebackup using pg_verifybackup utility
 Backup is verified against a backup_manifest generated by the server at the
time of the backup
 Only plain format backups can be verified
Restoring Physical Backups
Point-in-time Recovery
 Point-in-time recovery (PITR) is the ability to restore a database cluster
up to the present or to a specified point of time in the past

 Uses a full database cluster backup and the write-ahead logs found in
the /pg_wal subdirectory

 Must be configured before it is needed (write-ahead log archiving must


be enabled)
Performing Point-in-Time Recovery

 Prepare:
  Stop the server
  Take a file system level backup if possible
  Clean the data directory

 Restore:
  Copy data cluster files and folders from the backup location to the data directory
  Use cp -rp to preserve privileges

 Configure:
  Configure recovery settings in the postgresql.conf file
  Create the recovery.signal file in the data directory

 Recover:
  Start the server using service or the pg_ctl utility
  Check the error log for any issues
  The recovery.signal file is removed automatically after recovery
Point-in-Time Recovery Settings
 Restoring archived WAL using restore_command parameter:
 Unix:
restore_command = 'cp /home/postgres/archive/%f "%p"'
 Windows:
restore_command = 'copy c:\\mnt\\server\\archivedir\\"%f" "%p"'
 Recovery target settings:
recovery_target_name
recovery_target_time
recovery_target_xid
recovery_target_action
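Combining these, a postgresql.conf fragment for a timestamp-based PITR might look like this (path and timestamp are illustrative):

```ini
restore_command = 'cp /home/postgres/archive/%f "%p"'
recovery_target_time = '2024-05-01 12:00:00'
recovery_target_action = 'promote'   # pause (default), promote or shutdown
```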
Backup and Recovery Tools
Backup And Recovery Manager(Barman)
 Open-source administration tool for remote backups and disaster recovery

 Manage backups and the recovery phase of multiple servers from one location

 Distributed under GNU GPL 3 and maintained by EDB


Barman Architecture (https://docs.pgbarman.org/)

 One Barman server for multiple Postgres servers

 Standard connection to Postgres for management, coordination and monitoring

 Standard replication connection for running pg_basebackup and pg_receivewal

 Supports rsync/SSH

(Diagram: primary and replica Postgres servers feed the Barman backup server; Barman's processing tier writes to a local tier and a remote tier such as S3/Azure storage)
Barman - Features (https://www.pgbarman.org/about/)

 Remote backup and restore with rsync and the PostgreSQL protocol

 Support for file level incremental backups with rsync

 Retention policy support

 WAL Archive Compression with gzip, bzip2, or pigz

 Backup data verification

 Backup with RPO=0 using a synchronous physical streaming replication


connection
 Rate limiting
Postgres Backup And Restore
pgBackRest
 Fully supported open-source tool with troubleshooting support

 Solves common bottleneck problems with parallel processing for backup, compression, restoring and archiving

 Supports capabilities like symmetric encryption and partial restore

 Feature comparison

Capability                           | Added value                                   | Barman | pgBackRest | pg_basebackup
SSH protocol support                 |                                               | Yes    | Yes        | -
PostgreSQL protocol                  | Works without passwordless ssh                | Yes    | -          | Yes
Incremental backups                  |                                               | Yes    | Yes        | -
RPO=0                                | Restore up to the last commit                 | Yes    | -          | -
Rate limiting                        | Preserve IO for Postgres                      | Yes    | -          | Yes
Retention and list backups           |                                               | Yes    | Yes        | -
Backup compression                   | Less backup space required                    | -      | Yes        | -
Symmetric encryption                 | Lower security footprint for the backup data  | -      | Yes        | -
Partial restore (selected databases) | Restore required data for analysis purposes   | -      | Yes        | -
S3 and Azure Blob support            | Use flexible cloud storage for backup storage | Yes    | Yes        | -
Nagios integration                   | Monitor your backups with Nagios              | Yes    | Yes        | -

Module Summary
 Backup Types
 Database SQL Dumps
 Restoring SQL Dumps
 Offline Physical Backups
 Continuous Archiving
 Online Physical Backups Using pg_basebackup
 Point-in-time Recovery
 Recovery Settings
 Backup Tools – Barman and pgBackRest
Lab Exercise - 1
1. The edbstore website database is all set up, and as the DBA you need to plan a
proper backup strategy and implement it
 As the root user, create a folder /pgbackup and assign ownership to the Postgres user
using the chown utility or the Windows security tab in folder properties

 Take a full database dump of the edbstore database with the pg_dump utility. The dump
should be in plain text format

 Name the dump file as edbstore_full.sql and store it in the /pgbackup directory
Lab Exercise - 2
1. Take a dump of the edbuser schema from the edbstore database
and name the file as edbstore_schema.sql

2. Take a data-only dump of the edbstore database, disable all triggers


for a faster restore, use the INSERT command instead of COPY, and
name the file as edbstore_data.sql

3. Take a full dump of customers table and name the file as


edbstore_customers.sql
Lab Exercise - 3
1. Take a full database dump of edbstore in compressed format using the
pg_dump utility, name the file as edbstore_full_fc.dmp

2. Take a full database cluster dump using pg_dumpall. Remember pg_dumpall


supports only plain text format; name the file edbdata.sql
Lab Exercise - 4
 In this exercise you will demonstrate your ability to restore a database.

1. Drop database edbstore.

2. Create database edbstore with owner edbuser.

3. Restore the full dump from edbstore_full.sql and verify all the objects
and their ownership.

4. Drop database edbstore.

5. Create database edbstore with edbuser owner.

6. Restore the full dump from the compressed file edbstore_full_fc.dmp


and verify all the objects and their ownership.
Lab Exercise - 5
1. Create a directory /opt/arch or c:\arch and give ownership to the
Postgres user.

2. Configure your cluster to run in archive mode and set the archive log location
to be /opt/arch or c:\arch.

3. Take a full online base backup of your cluster in the /pgbackup directory
using the pg_basebackup utility.
Routine
Maintenance
Tasks

© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Module Objectives
 Updating Optimizer Statistics

 Handling Data Fragmentation using Routine Vacuuming

 Preventing Transaction ID Wraparound Failures

 Automatic Maintenance using Autovacuum

 Re-indexing in Postgres
Database Maintenance
 Data files become fragmented as data is modified and deleted

 Database maintenance helps reconstruct the data files

 If done on time nobody notices but when not done everyone knows

 Must be done before you need it

 Improves performance of the database

 Saves database from transaction ID wraparound failures


Maintenance Tools
 Maintenance thresholds can be configured using the pgAdmin Client
 Postgres maintenance thresholds can be configured in postgresql.conf
 Manual scripts can be written to watch statistics tables like pg_stat_user_tables
 Maintenance commands:
 ANALYZE
 VACUUM
 CLUSTER
 Maintenance command vacuumdb can be run from OS prompt
 Autovacuum can help in automatic database maintenance
Optimizer Statistics
 Optimizer statistics play a vital role in query planning

 Not updated in real time

 Collects information for relations including size, row counts, average row size
and row sampling

 Stored permanently in catalog tables

 The maintenance command ANALYZE updates the statistics


Example - Updating Statistics
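The example slide is a screenshot in the deck; an equivalent psql session might look like this (table name is illustrative):

```sql
-- Refresh planner statistics for one table, with progress output
ANALYZE VERBOSE customers;

-- Refresh statistics for every table in the current database
ANALYZE;

-- See when statistics were last updated
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'customers';
```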
Data Fragmentation and Bloat
 Data is stored in data file pages

 An update or delete of a row


does not immediately remove
the row from the disk page

 Eventually this row space


becomes obsolete and causes
fragmentation and bloating
Routine Vacuuming
 Obsoleted rows can be removed or reused using vacuuming

 Helps in shrinking data file size when required

 Vacuuming can be automated using autovacuum

 The VACUUM command locks tables in access exclusive mode

 Long running transactions may block vacuuming, thus it should be done during
low usage times
Vacuuming Commands
 When executed, the VACUUM command:

 Can recover or reuse disk space occupied by obsolete rows

 Updates data statistics

 Updates the visibility map, which speeds up index-only scans

 Protects against loss of very old data due to transaction ID wraparound

 The VACUUM command can be run in two modes:


 VACUUM

 VACUUM FULL
Vacuum and Vacuum Full
 VACUUM

 Removes dead rows and marks the space available for future reuse

 Does not return the space to the operating system

 Space is reclaimed if obsolete rows are at the end of a table

 VACUUM FULL

 More aggressive algorithm compared to VACUUM

 Compacts tables by writing a complete new version of the table file with no dead space

 Takes more time

 Requires extra disk space for the new copy of the table, until the operation completes
VACUUM Syntax

VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]

 Options:
 FULL [ boolean ]
 FREEZE [ boolean ]
 VERBOSE [ boolean ]
 ANALYZE [ boolean ]
 DISABLE_PAGE_SKIPPING [ boolean ]
 SKIP_LOCKED [ boolean ]
 TRUNCATE [ boolean ]
 PARALLEL integer
 INDEX_CLEANUP { AUTO | ON | OFF }
 PROCESS_MAIN [ boolean ]
 PROCESS_TOAST [ boolean ]
 SKIP_DATABASE_STATS [ boolean ]
 ONLY_DATABASE_STATS [ boolean ]
 BUFFER_USAGE_LIMIT [ size ]


Example - Vacuuming
Example – Vacuuming (continued)
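The example slides are screenshots in the deck; a comparable psql session might look like this (table names are illustrative):

```sql
-- Reclaim dead row space for reuse and refresh statistics in one pass
VACUUM (VERBOSE, ANALYZE) orders;

-- Rewrite the table to return space to the operating system
-- (takes an ACCESS EXCLUSIVE lock and needs extra disk space)
VACUUM FULL orders;

-- Find the tables with the most dead tuples
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```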
Preventing Transaction ID Wraparound Failures
 MVCC depends on transaction ID numbers

 Transaction IDs have limited size (32 bits at this writing)

 A cluster that runs for a long time (more than 4 billion transactions)
would suffer transaction ID wraparound

 This causes a catastrophic data loss

 To avoid this problem, every table in the database must be


vacuumed at least once for every two billion transactions
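The arithmetic behind these numbers can be checked directly; a short sketch of why the safe horizon is half the XID space:

```python
# Transaction IDs are 32-bit unsigned integers.
XID_SPACE = 2 ** 32                # ~4.29 billion possible XIDs

# XID comparisons use modulo-2^32 arithmetic, so from any transaction's
# point of view only half the circle can be "in the past" -- hence every
# table must be vacuumed (frozen) within ~2 billion transactions.
FREEZE_HORIZON = XID_SPACE // 2    # ~2.15 billion

print(f"total XID space: {XID_SPACE:,}")      # 4,294,967,296
print(f"freeze horizon:  {FREEZE_HORIZON:,}") # 2,147,483,648
```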
Vacuum Freeze
 VACUUM FREEZE will mark rows as frozen

 Postgres reserves a special XID, FrozenTransactionId

 FrozenTransactionId is always considered older than every normal XID

 VACUUM FREEZE replaces transaction IDs with FrozenTransactionId, thus


rows will appear to be “in the past”
 vacuum_freeze_min_age controls when a row will be frozen

 VACUUM normally skips pages without dead row versions, but some rows may
need FREEZE
 vacuum_freeze_table_age controls when a whole table must be scanned
The Visibility Map
 Each heap relation has a Visibility Map which keeps track of which pages
contain only tuples that are visible to all active transactions

 Stored at <relfilenode>_vm

 Helps vacuum to determine whether pages contain dead rows

 Can also be used by index-only scans to answer queries

 VACUUM command updates the visibility map

 The visibility map is vastly smaller, so can be cached easily


vacuumdb Utility
 The VACUUM command has a command-line executable wrapper called
vacuumdb

 vacuumdb can VACUUM all databases using a single command

 Syntax:

vacuumdb [OPTION]... [DBNAME]

 Available options can be listed using:

vacuumdb --help
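Typical invocations might look like this (database and table names are illustrative):

```shell
# Vacuum and analyze every database in the cluster
vacuumdb --all --analyze

# Vacuum a single table in one database, with progress messages
vacuumdb --verbose --table=orders edbstore

# Use several parallel jobs for a large database
vacuumdb --jobs=4 --analyze edbstore
```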
Autovacuuming
 Highly recommended feature of Postgres
 It automates the execution of VACUUM, FREEZE and ANALYZE commands
 Autovacuum consists of a launcher and many worker processes
 A maximum of autovacuum_max_workers worker processes are allowed
 Launcher will start one worker within each database every
autovacuum_naptime seconds
 Workers check for inserts, updates and deletes and execute VACUUM and/or
ANALYZE as needed
 track_counts must be set to TRUE as autovacuum depends on statistics
 Temporary tables cannot be accessed by autovacuum
Autovacuuming Parameters

Autovacuum Launcher Process:
 autovacuum
 autovacuum_naptime

Autovacuum Worker Processes:
 autovacuum_max_workers

Vacuuming Thresholds:
 autovacuum_vacuum_scale_factor
 autovacuum_vacuum_threshold
 autovacuum_analyze_scale_factor
 autovacuum_analyze_threshold
 autovacuum_vacuum_insert_scale_factor
 autovacuum_vacuum_insert_threshold
 autovacuum_freeze_max_age
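The trigger condition the workers evaluate can be sketched in a few lines (the formula is from the PostgreSQL documentation; the defaults shown match a stock postgresql.conf):

```python
def vacuum_needed(dead_tuples: int, reltuples: int,
                  threshold: int = 50, scale_factor: float = 0.2) -> bool:
    """A table is queued for VACUUM once its dead tuples exceed
    autovacuum_vacuum_threshold +
    autovacuum_vacuum_scale_factor * reltuples."""
    return dead_tuples > threshold + scale_factor * reltuples

# For a 10,000-row table the trigger point is 50 + 0.2 * 10000 = 2050
print(vacuum_needed(2000, 10_000))   # False
print(vacuum_needed(2051, 10_000))   # True
```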
Per-Table Thresholds
 Autovacuum workers are resource intensive
 Table-by-table autovacuum parameters can be configured for large tables
 Configure the following parameters using ALTER TABLE or CREATE TABLE:
 autovacuum_enabled
 autovacuum_vacuum_threshold
 autovacuum_vacuum_scale_factor
 autovacuum_analyze_threshold
 autovacuum_analyze_scale_factor
 autovacuum_vacuum_insert_scale_factor
 autovacuum_vacuum_insert_threshold
 autovacuum_freeze_max_age
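For example, a large table can be given a much smaller scale factor so autovacuum does not wait for millions of dead rows to accumulate (table names and numbers are illustrative):

```sql
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- trigger at 1% dead rows
    autovacuum_vacuum_threshold    = 1000
);

-- Disable autovacuum on a table maintained by a manual job
ALTER TABLE staging_load SET (autovacuum_enabled = false);
```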
Routine Reindexing
 Indexes are used for faster data access

 UPDATE and DELETE on a table modify underlying index entries

 Indexes are stored on data pages and become fragmented over time

 REINDEX rebuilds an index using the data stored in the index's table

 Time required depends on:
 Number of indexes
 Size of indexes
 Load on server when running command
When to Reindex
 There are several reasons to use REINDEX:

 An index has become "bloated", meaning it contains many empty or nearly-empty pages

 You have altered a storage parameter (such as fillfactor) for an index

 An index built with the CONCURRENTLY option failed, leaving an "invalid" index

 Syntax:
=> REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [
CONCURRENTLY ] name
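For example (index and table names are from the lab exercises; CONCURRENTLY avoids blocking writes at the cost of a slower rebuild):

```sql
-- Rebuild one index without blocking concurrent writes
REINDEX INDEX CONCURRENTLY ix_orderlines_orderid;

-- Rebuild every index on a table
REINDEX TABLE orderlines;

-- Rebuild all indexes in the current database, with progress output
REINDEX (VERBOSE) DATABASE edbstore;
```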
Module Summary
 Updating Optimizer Statistics

 Handling Data Fragmentation using Routine Vacuuming

 Preventing Transaction ID Wraparound Failures

 Automatic Maintenance using Autovacuum

 Re-indexing in Postgres
Lab Exercise - 1
1. While monitoring table statistics on the edbstore database, you found that
some tables are not automatically maintained by autovacuum. You decided to
perform manual maintenance on these tables. Write a SQL script to perform
the following maintenance:
 Reclaim obsolete row space from the customers table.

 Update statistics for emp and dept tables.

 Mark all the obsolete rows in the orders table for reuse.

2. Execute the newly created maintenance script on edbstore database.


Lab Exercise - 2
1. The composite index named ix_orderlines_orderid on (orderid,
orderlineid) columns of the orderlines table is performing very slowly.
Write a statement to reindex this index for better performance.
Moving Data
Using COPY
Command

Module Objectives
 Loading flat files

 Import and export data using COPY

 Examples of COPY Command

 Using COPY FREEZE for performance


Loading Flat Files into Database Tables
 A "flat file" is a plain text or mixed text file which usually
contains one record per line

 Postgres COPY command can be used to load flat files


into a database table

The COPY Command
 COPY moves data between Postgres tables and standard file-system files

 COPY TO copies the contents of a table or a query to a file

 COPY FROM copies data from a file to a table

 The file must be accessible to the server


COPY Command Syntax
Copy From:
 COPY table_name [(column list)] FROM 'filename'|PROGRAM 'command'|STDIN [options][WHERE cond.]

Copy To:
 COPY table_name[(column list])|(query) TO 'filename'|PROGRAM 'command'|STDOUT [options]

Copy Command Options


 FORMAT format_name
 FREEZE [ boolean ]
 DELIMITER 'delimiter_character'
 NULL 'null_string'
 DEFAULT 'default_string'
 HEADER [ boolean ]
 QUOTE 'quote_character'
 ESCAPE 'escape_character'
 FORCE_QUOTE { ( column_name [, ...] ) | * }
 FORCE_NOT_NULL ( column_name [, ...] )
 FORCE_NULL ( column_name [, ...] )
 ENCODING 'encoding_name'
Example Export to File
=> COPY emp (empno,ename,job,sal,comm,hiredate) TO '/tmp/emp.csv' CSV HEADER;
COPY
=> \! cat /tmp/emp.csv
empno,ename,job,sal,comm,hiredate
7369,SMITH,CLERK,800.00,,17-DEC-80 00:00:00
7499,ALLEN,SALESMAN,1600.00,300.00,20-FEB-81 00:00:00
7521,WARD,SALESMAN,1250.00,500.00,22-FEB-81 00:00:00
7566,JONES,MANAGER,2975.00,,02-APR-81 00:00:00
7654,MARTIN,SALESMAN,1250.00,1400.00,28-SEP-81 00:00:00
7698,BLAKE,MANAGER,2850.00,,01-MAY-81 00:00:00
7782,CLARK,MANAGER,2450.00,,09-JUN-81 00:00:00
7788,SCOTT,ANALYST,3000.00,,19-APR-87 00:00:00
7839,KING,PRESIDENT,5000.00,,17-NOV-81 00:00:00
7844,TURNER,SALESMAN,1500.00,0.00,08-SEP-81 00:00:00
7876,ADAMS,CLERK,1100.00,,23-MAY-87 00:00:00
7900,JAMES,CLERK,950.00,,03-DEC-81 00:00:00
7902,FORD,ANALYST,3000.00,,03-DEC-81 00:00:00
7934,MILLER,CLERK,1300.00,,23-JAN-82 00:00:00
Example Import from File
edb=# CREATE TEMP TABLE empcsv (LIKE emp);
CREATE TABLE
edb=# COPY empcsv (empno, ename, job, sal, comm, hiredate)
edb-# FROM '/tmp/emp.csv' CSV HEADER;
COPY
edb=# SELECT * FROM empcsv;
empno | ename | job | mgr | hiredate | sal | comm | deptno
-------+--------+-----------+-----+--------------------+---------+---------+--------
7369 | SMITH | CLERK | | 17-DEC-80 00:00:00 | 800.00 | |
7499 | ALLEN | SALESMAN | | 20-FEB-81 00:00:00 | 1600.00 | 300.00 |
7521 | WARD | SALESMAN | | 22-FEB-81 00:00:00 | 1250.00 | 500.00 |
7566 | JONES | MANAGER | | 02-APR-81 00:00:00 | 2975.00 | |
7654 | MARTIN | SALESMAN | | 28-SEP-81 00:00:00 | 1250.00 | 1400.00 |
7698 | BLAKE | MANAGER | | 01-MAY-81 00:00:00 | 2850.00 | |
7782 | CLARK | MANAGER | | 09-JUN-81 00:00:00 | 2450.00 | |
7788 | SCOTT | ANALYST | | 19-APR-87 00:00:00 | 3000.00 | |
7839 | KING | PRESIDENT | | 17-NOV-81 00:00:00 | 5000.00 | |
7844 | TURNER | SALESMAN | | 08-SEP-81 00:00:00 | 1500.00 | 0.00 |
7876 | ADAMS | CLERK | | 23-MAY-87 00:00:00 | 1100.00 | |
7900 | JAMES | CLERK | | 03-DEC-81 00:00:00 | 950.00 | |
7902 | FORD | ANALYST | | 03-DEC-81 00:00:00 | 3000.00 | |
7934 | MILLER | CLERK | | 23-JAN-82 00:00:00 | 1300.00 | |
(14 rows)
Example - COPY Command on Remote Host
 COPY command on remote host using psql
$ cat emp.csv | ssh 192.168.192.83 "psql -U edbstore edbstore -c 'copy emp from stdin;'"
COPY FREEZE
 FREEZE option of COPY statement

 Adds rows to a newly created table and freezes them

 Table must be created or truncated in current subtransaction

 Improves performance of initial bulk load

 Violates the normal rules of MVCC: frozen rows are immediately visible to all other transactions

 Usage:

=> COPY tablename FROM filename FREEZE;


Module Summary
 Loading flat files

 Import and export data using COPY

 Examples of COPY Command

 Using COPY FREEZE for performance

Lab Exercise - 1
 In this lab exercise you will demonstrate your ability to copy data:

1. Unload the emp table from the edbuser schema to a csv file, with column headers

2. Create a copyemp table with the same structure as the emp table

3. Load the csv file (from step 1) into the copyemp table
Replication and
High Availability
Tools

Module Objectives
 Data Replication

 Data Replication in Postgres

 Streaming Replication and Architecture

 Synchronous, Asynchronous and Cascaded Replication

 Setup Streaming Replication

 Logical Replication Architecture

 Overview: EDB Postgres Distributed, EDB Failover Manager, Replication Server


and Replication Manager (repmgr)
Data Replication
 Replication is the process of copying data and changes to a secondary location
for data safety and availability
 Data loss can occur due to several reasons

 Replication is aimed towards availability of the data when a primary source


goes offline
 Data can be recovered from backup but downtimes are costly

 Replication aims towards lowering downtime

 Failovers can be configured to such a level where application may not notice
the primary source is offline
Data Replication in Postgres
 Data replication options:
 Physical Streaming Replication

 Logical Streaming Replication

 EDB Postgres Distributed

 Cluster management tools:


 EDB Failover Manager

 High Availability Routing for Postgres (HARP)

 Patroni

 Replication Manager (repmgr)
Streaming Replication
 Streaming Replication (Hot Standby) is a major feature of Postgres

 Replica connects to the primary node using REPLICATION protocol

 WAL segments are streamed to replica server

 No log shipping delays, stream WAL content across to replica immediately

 Synchronous/Asynchronous options available

 Supports cascaded streaming replication


Hot Streaming Architecture
 Primary (production) server: a WAL sender process streams WAL content to the replica

 Replica server: a WAL receiver process applies the stream to the replica database and reports progress back to the primary
Asynchronous Replication
 Streaming replication is asynchronous by default but can be configured as
synchronous

 Asynchronous

 Disconnected architecture

 Transaction is committed on primary and flushed to WAL segment

 Later transaction is transmitted to replica server(s) using stream

 Some data loss is possible

 Replication using WAL Archive method is always asynchronous


Synchronous Replication
 Synchronous Replication

 A 2-safe replication method offering zero data loss

 Transaction must apply changes to primary and synchronously replicated replicas using two-
phase commit actions

 User gets a commit message after confirmation from both primary and replica

 This will introduce a delay in committing transactions


Cascading Replication
 Streaming replication supports a single primary node

 Cascading replication can be used to share the replication overhead of the primary with other replicas

 A replica can stream changes to other replicas

 Helps minimize inter-site bandwidth overheads on the primary node

 Asynchronous only

(Diagram: Primary → Replica 1 → Replica 2 and Replica 3)
Setup Streaming Replication
Primary Server Configuration
 For Physical Streaming Replication:
 Change WAL Content parameter:

 wal_level = replica #Default is replica

 Two options to allow streaming connection:

 max_wal_senders

 max_replication_slots

 Wal files to be retained in pg_wal for streaming:

 wal_keep_size

 max_slot_wal_keep_size
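A primary-side postgresql.conf fragment for streaming replication might look like this (values are illustrative):

```ini
wal_level = replica              # default since PostgreSQL 10
max_wal_senders = 5              # concurrent streaming connections
max_replication_slots = 5
wal_keep_size = '512MB'          # WAL retained in pg_wal for replicas
max_slot_wal_keep_size = '2GB'   # cap WAL retained by replication slots
```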
Synchronous Streaming Replication Configuration
 Default level of Streaming Replication is Asynchronous

 Synchronous level can also be configured using additional parameters:

synchronous_commit=on
synchronous_standby_names

 If the synchronous replica stops responding, then COMMITs will be blocked


forever until someone manually intervenes
 Transactions can be configured not to wait for replication by setting the
synchronous_commit parameter to local or off
Configure Authentication
 Authentication setting on the primary server must allow replication connections
from the replica(s)
 Provide a suitable entry or entries in pg_hba.conf with the database field set to
replication
 Open pg_hba.conf of primary server:

host replication all 192.168.56.2/32 md5


Note - You will need to reload the primary server
Take a Full Backup of the Primary Server
 Backup the Primary Server using pg_basebackup:

pg_basebackup -h localhost -U postgres -p 5444 -D /backup/data1 -R

-R option

 Creates a default copy of standby.signal file

 Add primary server connection info to postgresql.auto.conf


Replica Configuration

hot_standby Set this parameter to “ON” for read-only replica

primary_conninfo Set connection string to connect with primary or cascaded replica

primary_slot_name Specify replication slot name to be used for connection

max_standby_streaming_delay Duration for which replica has to wait during query conflicts

wal_receiver_create_temp_slot Authorize WAL receiver process to be able to create a temporary replication slot

recovery_min_apply_delay Parameter used for delayed replication


Replica Recovery Settings
 Replica configuration settings must be set in postgresql.conf or
postgresql.auto.conf

 Create a file named standby.signal in the data directory

 standby.signal indicates the server should start as a replica

 Start the replica using system services or pg_ctl


Logical Replication
Logical Replication
 Logical replication is a method of replicating selected data objects and their changes

 Based on publications and subscriptions

 Can be used to consolidate data

 Portable across hardware and software versions

 Tables on the standby server which are part of a subscription must be treated as read only to avoid conflicts

(Diagram: publications on the primary are streamed by the WAL sender to subscription workers on the standby, coordinated by the logical replication launcher; typical uses are reporting, consolidation and upgrades)
When to Use Logical Replication
 Sending incremental changes in a single database or a subset of a
database to subscribers

 Consolidating multiple databases into a single one

 Replicating between different major versions of Postgres

 Giving access to replicated data to different groups of users

 Sharing a subset of the database between multiple databases


Setting Up Logical Replication

 Change wal_level to logical in postgresql.conf

 Add pg_hba.conf entry in each server to allow connection

 Connect to database in publication instance

 Create a publication using CREATE PUBLICATION statement

 A published table must have a “replica identity” configured in order to be able


to replicate UPDATE and DELETE operations

 Connect to database in subscription instance and create a subscription using


CREATE SUBSCRIPTION statement
Example – Logical Replication Setup
 Initialize sample publication(primarypub) and subscription(primarysub)
instance
[postgres@localhost ~]$ initdb --version
[postgres@localhost ~]$ initdb -D primarypub -U pubdba
[postgres@localhost ~]$ initdb -D primarysub -U subdba

 Edit postgresql.conf parameter for both instances


[postgres@localhost ~]$ vi primarypub/postgresql.conf
 port=5420
 wal_level=logical
[postgres@localhost ~]$ vi primarysub/postgresql.conf
 port=5421
 wal_level=logical
Example – HBA Entries and Starting Instances
 Add pg_hba.conf entries for connections
[postgres@localhost ~]$ vi primarysub/pg_hba.conf
host all pubdba 192.168.56.101/32 md5
[postgres@localhost ~]$ vi primarypub/pg_hba.conf
host all subdba 192.168.56.101/32 md5

 Start both instances


[postgres@localhost ~]$ pg_ctl -D primarypub/ start
[postgres@localhost ~]$ pg_ctl -D primarysub/ start
Example – Create Tables and Publication
 Connect to default database in publication instance
[postgres@localhost ~]$ psql -p 5420 -U pubdba postgres

 Create a sample table and publication


=# CREATE TABLE pubexample(id INT PRIMARY KEY,
name VARCHAR(30));
=# INSERT INTO pubexample
VALUES(generate_series(1,5000),'Test1');
=# SELECT count(*) FROM pubexample;
=# CREATE PUBLICATION testpub FOR TABLE pubexample;
Example – Create Tables and Subscription
 Connect to default database in subscription instance
[postgres@localhost ~]$ psql -p 5421 -U subdba postgres

 Create a sample table and subscription


=# CREATE TABLE pubexample(id INT PRIMARY KEY,
name VARCHAR(30));
=# CREATE SUBSCRIPTION testsub CONNECTION
'host=localhost port=5420 user=pubdba dbname=postgres'
PUBLICATION testpub;
=# SELECT count(*) FROM pubexample;
Example – Test Logical Replication
 Add data to publication
[postgres@localhost ~]$ psql -p 5420 -U pubdba postgres
postgres=# INSERT INTO pubexample
VALUES (generate_series(5001,10000),'Test1');
postgres=# \q

 Check changes on Subscription


[postgres@localhost ~]$ psql -p 5421 -U subdba postgres
postgres=# SELECT count(*) FROM pubexample;
Monitoring Basics
Monitoring Replication
 pg_stat_replication
 Show connected replicas and their status on the primary

 pg_stat_subscription
 Shows the status of subscription when using logical replication

 pg_stat_wal_receiver
 Shows the WAL receiver process status on Replica

 Recovery information functions:


 pg_is_in_recovery()

 pg_current_wal_lsn()

 pg_last_wal_receive_lsn()

 pg_last_xact_replay_timestamp()
Example - Monitoring Replication
 Execute:
=# SELECT * FROM pg_stat_replication;

 Find lag (bytes):


=# SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM
pg_stat_replication;

 Find lag (seconds):


=# SELECT CASE WHEN pg_last_wal_receive_lsn() =
pg_last_wal_replay_lsn()
THEN 0 ELSE
EXTRACT (EPOCH FROM now() -pg_last_xact_replay_timestamp())
END AS stream_delay;
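The byte arithmetic behind pg_wal_lsn_diff can be reproduced outside the server; a sketch assuming the standard "high/low" hexadecimal LSN text format:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '0/3000148' to an absolute byte position.
    The text form is two hexadecimal numbers: high and low 32 bits."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Equivalent of pg_wal_lsn_diff(sent_lsn, replay_lsn)."""
    return lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)

print(lag_bytes("0/3000148", "0/3000100"))  # 72
```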
Recovery Control Functions

Name Return Type Description

pg_is_wal_replay_paused() bool True if recovery is paused.

pg_wal_replay_pause() void Pauses recovery immediately.

pg_wal_replay_resume() void Restarts recovery if it was paused.


EDB Postgres Distributed -
Overview
EDB Postgres Distributed

The most advanced replication solution for Postgres

 Maintain extreme high availability: Postgres clusters deployed with EDB Postgres Distributed keep top-tier enterprise applications running

 Upgrade with near zero downtime: rolling upgrades of application and database software eliminate the largest source of downtime

 Choose the level of consistency: robust capabilities provide the flexibility to meet application data loss requirements
Always ON

Top-tier enterprise applications are critical to an organization's success in all regions where business is conducted, whether a single region or globally

 The application represents a promise to its customers

 The application must perform well for a good user experience

 The availability of the application directly ties to revenue generation

 The application data must always be current and available, or the user loses trust
More Than Bi-directional Replication
MULTI-MASTER REPLICATION ENABLING HIGHLY AVAILABLE AND
GEOGRAPHICALLY DISTRIBUTED POSTGRES CLUSTERS

 Logical replication of data and schema


enabled via standard Postgres extension
 Data consistency options that span from
immediate to eventual consistency
 Robust tooling to manage conflicts, monitor
performance, and validate consistency
 Deploy natively to cloud, virtual, or bare
metal environments
 Geo-fencing, allowing you to selectively replicate data for security compliance and jurisdiction control
EDB Postgres Distributed Features
 Multi-Master Replication for Postgres
 Synchronous or Asynchronous Replication
 Flexible Deployment Architectures
 Always-ON Architectures
 Row Level Consistency
 DDL Replication
 DDL and Row Filters
 Parallel Apply
 Auto Partitioning
 Configurable Column-level Conflict Resolution
 Conflict-free Replicated Data Types (CRDTs)
 Database Rolling Upgrades
 Transaction State Tracking Across Nodes
 Subscriber-only Nodes
 PGD CLI
 Open Telemetry Integration
 Next Generation Connection Routing and Failovers using PGD-Proxy
Deployment - Single Location
 Locations = 1, local redundancy = 3,
nodes = 3, active locations = 1
 Global group with single data group of
A1, A2 and A3
 Lead Primary A1 receiving all writes, but changes can also be received by A2 and A3
 Shadow Primary A2, A3 receiving writes
 Can be 3 data nodes (recommended)

 Can be 2 data nodes and 1 witness that doesn't


hold data (not depicted)
Example Deployment - Multiple Location
 Locations = 2,
local redundancy = 3,
nodes = 3,
active locations = 1
Replication Server Overview
EDB Postgres Replication Server
Single Master Replication (SMR) for Reporting or Migration
 Master (read/write): EDB Advanced Server, PostgreSQL, Oracle®, or SQL Server®

 Replica (read only): EDB Advanced Server, PostgreSQL, Oracle®, or SQL Server®

 Data filtering
 Scheduling
Replication Server
REPLICATES BETWEEN POSTGRES AND NON-POSTGRES DATABASES

 Integrate with Oracle or SQL Server


databases to offload reporting or to
feed data to legacy applications

 Flexibility to replicate a subset of data


from the source database

 Graphical user interface provides easy


configuration and management

 Includes utility to validate data consistency between the source and target databases

(Diagram: replication from SQL Server or Oracle to Postgres)
EDB Replication Server Features
 Replicate Oracle or SQL Server data to EDB Postgres Advanced Server
 Distributed multi-Publication/Subscription architecture
 Synchronize data across geographies
 Replicate tables and sequences
 Controlled switchover and failover
 Supports cascading replication
 Trigger and log-based replication
 Snapshot and continuous modes
 Define and apply row filters
 Flexible replication scheduler
 Replication History Viewer
 Graphical Replication Console and CLI
Failover Manager Overview
Why Failover Manager

 Ensure business continuity: monitor the health of databases and identify failures quickly

 Maintain high availability: meet your SLAs by switching over to the most recent standby

 Upgrade with minimal downtime: switchover on demand to move the primary to a standby for maintenance
EDB Postgres Failover Manager
AUTOMATICALLY DETECT FAILURES

 Monitors database health: detects failures and takes action

 Automatically fails over to the most current standby and reconfigures the others

 Reconfigures load balancers on failover: integrates with pgPool and others

 Avoids "split brain" scenarios: prevents two nodes from thinking that each is primary

(Diagram: client applications connect through a load balancer such as Pgpool to the primary and standbys joined by streaming replication; an EDB Postgres Failover Manager agent runs on each node, plus an optional witness)
EFM Features
 Multiple health checks for Primary & Replica nodes
 Automatic Failover from Primary to Replica node
 Controlled switchovers for planned events on primary
 Configurable fencing operations
 User configurable failure detection wait times
 Witness node protects against ‘split brain’ scenarios
 Support for multiple streaming replicas
 Replica promotion based on WAL location and node priority
 Real-time notifications to chat rooms, SNMP and SMTP for all
cluster status changes
Setup an EFM Cluster
 Set up streaming replication between the database servers
 Install EFM
 Configure the efm.properties file
 Start EFM
 Add nodes to the EFM cluster
 Monitor the EFM agents and database servers
[Diagram: client connection pools and an application load balancer in front of a Primary, Replica-1, and Replica-2; the nodes are linked by streaming replication and each runs an EFM agent.]
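The setup steps above can be sketched roughly as follows. This is a minimal, illustrative outline only: the version directory (efm-4.9), cluster name (efm), property values, and all addresses are placeholder assumptions, not values from the course.

```shell
# 1. Encrypt the password for the EFM database user (prompts interactively):
/usr/edb/efm-4.9/bin/efm encrypt efm

# 2. Set key entries in /etc/edb/efm-4.9/efm.properties on every node, e.g.:
#      db.user=efm_user
#      db.password.encrypted=<output of 'efm encrypt'>
#      db.port=5432
#      db.database=edb
#      bind.address=192.168.1.10:7800
#      is.witness=false
#      auto.failover=true

# 3. Start the local agent, then register the other cluster members:
systemctl start edb-efm-4.9
/usr/edb/efm-4.9/bin/efm allow-node efm 192.168.1.11

# 4. Monitor the cluster:
/usr/edb/efm-4.9/bin/efm cluster-status efm
```

The `cluster-status` output shows each agent, its node type (Primary/Standby/Witness), and replication status, which covers the "Monitor" step above.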
Replication Manager Overview
Replication Manager (repmgr)
 Cluster management tool for Postgres
 Maintain high availability: automatic failover to a replica in a streaming replication environment
 Perform upgrades using switchovers: add/remove replicas and switch over the primary instance
 Open source: from EnterpriseDB and licensed under the GPL
repmgr Features
 Open-source tool for managing replication and failover
 Supports Postgres Streaming Replication
 repmgr tool for setup:
 Add/remove replicas
 Perform switchovers
 Promote a replica
 repmgrd tool:
 Monitor replication
 Automatic failover detection with witness protection
 Email notification
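The repmgr workflow above maps to a small set of standard repmgr commands. A rough sketch follows; the hostname (node1), the repmgr user/database names, and the config path are placeholder assumptions.

```shell
# Register the existing primary (run on the primary node):
repmgr -f /etc/repmgr.conf primary register

# Clone and register a new replica (run on the standby host):
repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
repmgr -f /etc/repmgr.conf standby register

# Inspect the cluster state:
repmgr -f /etc/repmgr.conf cluster show

# Controlled switchover (run on the standby that should become primary):
repmgr -f /etc/repmgr.conf standby switchover

# Manual promotion if the primary is lost:
repmgr -f /etc/repmgr.conf standby promote

# For automatic failover detection, run the repmgrd daemon on each node:
repmgrd -f /etc/repmgr.conf --daemonize
```

`standby clone` takes a base backup of the primary and writes the replication configuration, so the new node starts streaming as soon as Postgres is started.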
repmgr Architecture
[Diagram: a Primary with Replica-1 and Replica-2 attached via streaming replication; each node runs the repmgr tool and the repmgrd daemon, and repmgr keeps its own user and metadata in the cluster.]
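Each node in the architecture above carries a repmgr configuration file. A minimal sketch for the primary is shown below; node names, the conninfo string, and the data directory are placeholder assumptions, not values from the course.

```shell
# Write a minimal /etc/repmgr.conf for the primary node:
cat > /etc/repmgr.conf <<'EOF'
node_id=1
node_name='primary'
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/16/data'

# repmgrd settings for automatic failover:
failover='automatic'
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
EOF
```

The replicas use the same file with their own `node_id`, `node_name`, and `conninfo`; `promote_command` and `follow_command` are what repmgrd executes when it detects a failed primary.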
Module Summary
 Data Replication
 Data Replication in Postgres
 Streaming Replication and Architecture
 Synchronous, Asynchronous and Cascaded Replication
 Setup Streaming Replication
 Logical Replication Architecture
 Overview: EDB Postgres Distributed, EDB Failover Manager, Replication Server and Replication Manager (repmgr)
Course Summary
 Introduction and Architectural Overview
 System Architecture
 Installation
 User Tools - Command Line Interfaces
 Database Clusters
 Database Configuration
 Data Dictionary
 Creating and Managing Database Objects
 Database Security
 Monitoring and Admin Tools Overview
 SQL Primer
 Backup and Recovery
 Routine Maintenance Tasks
 Data Loading
 Data Replication and High Availability
Next Steps
 Certify your Postgres skills with EDB Certifications for Postgres
 Continue your skills development with the following classes:
 Advanced Database Administration
 Monitoring and Alerting with Postgres Enterprise Manager
 Tuning and Maintenance
 See the Training Portal for the full library of Postgres training classes
 Get familiar with the EDB Tools available as part of the EDB Postgres Platform
 For any questions related to EDB Postgres Trainings and Certifications, or for additional information, write to: [email protected]
Thank you!
Please visit our Training Portal for more courses and workshops!
© EDB 2024 — ALL RIGHTS RESERVED.
