The Best of Bruce’s Postgres Slides
BRUCE MOMJIAN
This talk has the best slides from my 25+ Postgres presentations.
Creative Commons Attribution License http://momjian.us/presentations
Last updated: May, 2017
1 / 26
Postgres System Architecture
Main
Libpq
Postmaster
Postgres Postgres
Parse Statement
Traffic Cop
utility: Utility Command (e.g. CREATE TABLE, COPY)
Query: SELECT, INSERT, UPDATE, DELETE
Rewrite Query
Generate Paths
Optimal Path
Generate Plan
Plan
Execute Plan
Utilities Catalog Storage Managers
Access Methods Nodes / Lists
Mastering PostgreSQL Administration
2 / 26
Shared Memory Creation
postmaster fork() postgres postgres
Program (Text) Program (Text) Program (Text)
Data Data Data
Shared Memory Shared Memory Shared Memory
Stack Stack Stack
Inside PostgreSQL Shared Memory
3 / 26
Shared Buffers and WAL
[Figure: buffer_stack.eps (image not available)]
PostgreSQL Performance Tuning
4 / 26
Backend Flowchart - Magnified
Parse Statement
Traffic Cop
utility: Utility Command (e.g. CREATE TABLE, COPY)
Query: SELECT, INSERT, UPDATE, DELETE
Rewrite Query
Generate Paths
Optimal Path
Generate Plan
Plan
Execute Plan
PostgreSQL Internals Through Pictures
5 / 26
Query Processing
FindExec: found "/var/local/postgres/./bin/postmaster" using argv[0]
./bin/postmaster: BackendStartup: pid 3320 user postgres db test socket 5
./bin/postmaster child[3320]: starting with (postgres -d99 -F -d99 -v131072 -p test )
FindExec: found "/var/local/postgres/./bin/postgres" using argv[0]
DEBUG: connection: host=[local] user=postgres database=test
DEBUG: InitPostgres
DEBUG: StartTransactionCommand
DEBUG: query: SELECT firstname
FROM friend
WHERE age = 33;
DEBUG: parse tree: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAgg
s false :hasSubLinks false :rtable ({ RTE :relname friend :relid 26912 :subquery <> :alias <> :eref { ATTR :relname friend :attrs (
"firstname" "lastname" "city" "state" "age" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUse
r 0}) :jointree { FROMEXPR :fromlist ({ RANGETBLREF 1 }) :quals { EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 0 :opresu
lttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype
23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}} :rowMarks () :targetList ({ TARGETENTRY :resdom
{ RESDOM :resno 1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr {
VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :groupClause <> :havingQual <> :dis
tinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}
DEBUG: rewritten parse tree:
DEBUG: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :has
SubLinks false :rtable ({ RTE :relname friend :relid 26912 :subquery <> :alias <> :eref { ATTR :relname friend :attrs ( "firstname"
"lastname" "city" "state" "age" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0}) :joint
ree { FROMEXPR :fromlist ({ RANGETBLREF 1 }) :quals { EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 0 :opresulttype 16 }
:args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 23 :constle
n 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :r
esno 1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1
:varattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :groupClause <> :havingQual <> :distinctClause
<> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}
DEBUG: plan: { SEQSCAN :startup_cost 0.00 :total_cost 22.50 :rows 10 :width 12 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno
1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :va
rattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :qpqual ({ EXPR :typeOid 16 :opType op :oper { OPE
R :opno 96 :opid 65 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varo
attno 5} { CONST :consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}) :lefttree <> :rightt
ree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 }
DEBUG: ProcessQuery
DEBUG: CommitTransactionCommand
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
./bin/postmaster: reaping dead processes...
./bin/postmaster: CleanupProc: pid 3320 exited with status 0
PostgreSQL Internals Through Pictures
6 / 26
EXPLAIN with Constants of Various Frequencies
l | count | lookup_letter
---+-------+-----------------------------------------------------------------------
p | 199 | Seq Scan on sample (cost=0.00..13.16 rows=199 width=2)
s | 9 | Seq Scan on sample (cost=0.00..13.16 rows=9 width=2)
c | 8 | Seq Scan on sample (cost=0.00..13.16 rows=8 width=2)
r | 7 | Seq Scan on sample (cost=0.00..13.16 rows=7 width=2)
t | 5 | Bitmap Heap Scan on sample (cost=4.29..12.76 rows=5 width=2)
f | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)
v | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)
d | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)
a | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)
_ | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)
u | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)
e | 2 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)
i | 1 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)
k | 1 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)
(14 rows)
Explaining the Postgres Query Optimizer
7 / 26
Deadlocks
SELECT pg_sleep(0.500); SELECT * FROM lockview1;
pid | vxid | lock_type | lock_mode | granted | xid_lock | relname
-------+-------+---------------+------------------+---------+----------+------------
11306 | 2/61 | transactionid | ExclusiveLock | t | 710 |
11306 | 2/61 | relation | RowExclusiveLock | t | | i_lockdemo
11306 | 2/61 | relation | RowExclusiveLock | t | | lockdemo
11306 | 2/61 | tuple | ExclusiveLock | t | | lockdemo
11306 | 2/61 | transactionid | ShareLock | f | 711 |
11642 | 3/116 | transactionid | ExclusiveLock | t | 711 |
11642 | 3/116 | relation | RowExclusiveLock | t | | i_lockdemo
11642 | 3/116 | relation | RowExclusiveLock | t | | lockdemo
11642 | 3/116 | tuple | ExclusiveLock | t | | lockdemo
11642 | 3/116 | transactionid | ShareLock | f | 710 |
(10 rows)
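The two ungranted ShareLock rows above are what make this a deadlock: each backend waits on the transaction id the other one holds. A minimal sketch of that reasoning, using only the transactionid rows from the output above. The function names are hypothetical, and this is a toy wait-for graph that assumes each pid waits on at most one lock, not how the Postgres lock manager detects deadlocks internally.

```python
from typing import Dict, List, Optional, Tuple

# (pid, lock_type, granted, xid_lock) copied from the transactionid rows above.
LOCK_ROWS: List[Tuple[int, str, bool, int]] = [
    (11306, "transactionid", True, 710),
    (11306, "transactionid", False, 711),
    (11642, "transactionid", True, 711),
    (11642, "transactionid", False, 710),
]


def build_wait_for(rows: List[Tuple[int, str, bool, int]]) -> Dict[int, int]:
    """Map each waiting pid to the pid that holds the xid it is waiting on."""
    holder = {
        xid: pid
        for pid, kind, granted, xid in rows
        if kind == "transactionid" and granted
    }
    return {
        pid: holder[xid]
        for pid, kind, granted, xid in rows
        if kind == "transactionid" and not granted and xid in holder
    }


def find_cycle(wait_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the pids of one wait cycle, or None if nobody waits in a circle."""
    for start in sorted(wait_for):
        path = [start]
        current = wait_for.get(start)
        # Follow the chain of waiters until it ends or repeats a pid.
        while current is not None and current not in path:
            path.append(current)
            current = wait_for.get(current)
        if current == start:
            return path
    return None


if __name__ == "__main__":
    print(find_cycle(build_wait_for(LOCK_ROWS)))
```

Here pid 11306 waits for xid 711 (held by 11642) and pid 11642 waits for xid 710 (held by 11306), so the graph contains a two-node cycle.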
Unlocking the Postgres Lock Manager
8 / 26
MVCC Behavior
INSERT: Cre 40, Exp (none)
DELETE: Cre 40, Exp 47
UPDATE, old (delete): Cre 64, Exp 78
UPDATE, new (insert): Cre 78, Exp (none)
UPDATE is effectively a DELETE and an INSERT.
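The bookkeeping on this slide can be sketched as a toy in-memory heap. The class and method names below are hypothetical and chosen for illustration; real Postgres stores these ids in tuple headers on disk pages, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    value: str
    cre: int                    # id of the creating transaction
    exp: Optional[int] = None   # id of the expiring transaction, None while live


class ToyHeap:
    """Append-only list of row versions; nothing is overwritten in place."""

    def __init__(self) -> None:
        self.versions: List[RowVersion] = []

    def insert(self, xid: int, value: str) -> RowVersion:
        version = RowVersion(value=value, cre=xid)
        self.versions.append(version)
        return version

    def delete(self, xid: int, version: RowVersion) -> None:
        # A delete only stamps the expiring transaction id on the old version.
        version.exp = xid

    def update(self, xid: int, version: RowVersion, new_value: str) -> RowVersion:
        # UPDATE = DELETE of the old version + INSERT of a new one, same xid.
        self.delete(xid, version)
        return self.insert(xid, new_value)


heap = ToyHeap()
old = heap.insert(64, "old")          # Cre 64
new = heap.update(78, old, "new")     # old gets Exp 78, new gets Cre 78
```

After the update, both versions are still present in the heap, which is why an UPDATE leaves an expired row behind for later cleanup.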
MVCC Unmasked
9 / 26
MVCC Examples
Sequential Scan
Snapshot
The highest-numbered committed transaction: 100
Open Transactions: 25, 50, 75
For simplicity, assume all other transactions are committed.
Create-Only
Cre 30, Exp (none): Visible
Cre 50, Exp (none): Invisible
Cre 110, Exp (none): Invisible
Create & Expire
Cre 30, Exp 80: Invisible
Cre 30, Exp 75: Visible
Cre 30, Exp 110: Visible
Internally, the creation xid is stored in the system column 'xmin', and the expiration xid in 'xmax'.
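The six cases on this slide follow from one rule, sketched below under the slide's own simplification that every transaction not listed as open is committed. The names Snapshot, xid_visible and row_visible are hypothetical; the real visibility checks in Postgres handle more cases (aborted transactions, hint bits, a row's own transaction) that are ignored here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    highest_committed: int        # highest-numbered committed transaction
    open_xids: FrozenSet[int]     # transactions still in progress


def xid_visible(xid: int, snap: Snapshot) -> bool:
    """A transaction's effects are visible if it committed before the snapshot."""
    return xid <= snap.highest_committed and xid not in snap.open_xids


def row_visible(cre: int, exp: Optional[int], snap: Snapshot) -> bool:
    """Visible if the creation is visible and the expiration (if any) is not."""
    if not xid_visible(cre, snap):
        return False
    return exp is None or not xid_visible(exp, snap)


# The snapshot from the slide: highest committed 100, open 25, 50, 75.
SNAP = Snapshot(highest_committed=100, open_xids=frozenset({25, 50, 75}))
```

For example, row_visible(30, 75, SNAP) is true because transaction 75 is still open, so its delete has not yet taken effect for this snapshot.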
MVCC Unmasked
10 / 26
Heap Page Structure
Page Header Item Item Item
8K
Tuple
Tuple Tuple Special
PostgreSQL Internals Through Pictures
11 / 26
Pg_upgrade: Restore Schema In New Cluster
pg_dumpall --schema-only
Old Cluster New Cluster
System Tables and Indexes System Tables and Indexes
1 4 7 1 4 7
2 5 8 2 5 8
3 6 9 3 6 9
pg_class pg_class
User Tables and Indexes User Tables and Indexes
10 16 22 10 16 22
11 17 23 11 17 23
12 18 24 12 18 24
13 19 25 13 19 25
14 20 26 14 20 26
15 21 27 15 21 27
clog clog
Rapid Upgrades With Pg_Upgrade
12 / 26
Pg_upgrade: Copy User Heap/Index Files
Old Cluster New Cluster
System Tables and Indexes System Tables and Indexes
1 4 7 1 4 7
2 5 8 2 5 8
3 6 9 3 6 9
pg_class pg_class
User Tables and Indexes User Tables and Indexes
10 16 22 10 16 22
11 17 23 11 17 23
12 18 24 12 18 24
13 19 25 13 19 25
14 20 26 14 20 26
15 21 27 15 21 27
clog clog
Rapid Upgrades With Pg_Upgrade
13 / 26
Continuous Archiving
02:00 09:00 11:00 13:00
WAL WAL WAL
File System-Level Backup | Continuous Archive (WAL)
The Magic of Hot Streaming Replication
14 / 26
Point-in-Time Recovery
[Timeline with four ticks between 17:00 and 18:00]
WAL WAL WAL
File System-Level Backup | Continuous Archive (WAL)
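The idea behind this slide can be sketched as a toy model: start from the file system-level backup, then replay archived changes in order and stop at the chosen recovery target. Everything below is hypothetical (the record layout, the recover function, the sample data); real WAL records describe low-level page changes, not key/value pairs.

```python
from typing import Dict, Iterable, Tuple

# Toy WAL record: (minutes since midnight, key, new value).
WalRecord = Tuple[int, str, int]


def recover(
    base_backup: Dict[str, int],
    wal: Iterable[WalRecord],
    target_minute: int,
) -> Dict[str, int]:
    """Replay WAL records on a copy of the base backup up to the target time."""
    state = dict(base_backup)          # never modify the backup itself
    for minute, key, value in sorted(wal):
        if minute > target_minute:
            break                      # stop replay at the recovery target
        state[key] = value
    return state


# Hypothetical data: a backup plus three later changes.
BACKUP = {"balance": 100}
WAL = [(1020, "balance", 90), (1050, "balance", 0), (1060, "balance", 50)]
```

Choosing target_minute=1040 recovers the state just before the change at minute 1050, which is the point of point-in-time recovery: an unwanted change can be excluded by stopping replay before it.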
The Magic of Hot Streaming Replication
15 / 26
Streaming Replication Setup
02:00 09:00 11:00 13:00
WAL WAL WAL
File System-Level Backup | Standby Server
The Magic of Hot Streaming Replication
16 / 26
Streaming Replication in Operation
Primary Standby
Network
/pg_xlog /pg_xlog
archive command -> WAL Archive Directory -> restore command
The Magic of Hot Streaming Replication
17 / 26
Read Scaling Using Pgpool & Streaming Replication
pgpool
INSERT, UPDATE, DELETE to master host
SELECT to any host
Master, Slave, Slave (streaming replication from Master to each Slave)
A full copy of the data exists on every node.
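The routing rule on this slide (writes to the master, reads to any host) can be sketched as follows. ToyRouter and its host names are hypothetical, and the round-robin choice is an assumption made for illustration; pgpool's actual load balancing is configurable and more involved than this.

```python
import itertools
from typing import Iterator, List

WRITE_COMMANDS = {"INSERT", "UPDATE", "DELETE"}


class ToyRouter:
    """Send writes to the master and spread SELECTs across all hosts."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # Reads may go to any host, including the master (round-robin here).
        self._all_hosts: Iterator[str] = itertools.cycle([master] + slaves)

    def route(self, sql: str) -> str:
        # First keyword of the statement; assumes a non-empty statement.
        command = sql.lstrip().split(None, 1)[0].upper()
        if command in WRITE_COMMANDS:
            return self.master
        if command == "SELECT":
            return next(self._all_hosts)
        # Assumption: anything else is sent to the master to be safe.
        return self.master
```

Because every node holds a full copy of the data, any host can answer a SELECT, so this scales reads but not writes.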
PostgreSQL Replication Solutions
18 / 26
Write Scaling Using FDW-Based Sharding
SQL Queries
PG FDW
SQL Queries with joins, sorts, aggregates
Foreign Server, Foreign Server, Foreign Server
The Future of Postgres Sharding
19 / 26
Database Server Hardware Priorities
CPU
Memory
I/O
Database Hardware Selection Guidelines
20 / 26
Postgres’s Central Role
Foreign Data Wrappers: Oracle, MongoDB, Twitter
Extensions: ISN, PostGIS, PL/R
Postgres
Data Warehouse: Window Functions, Data Partitioning, Bitmap Scans
NoSQL: JSON, Easy DDL, Sharding
Making Postgres Central in Your Data Center
21 / 26
Use of the Contains Operator @>
\do @>
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Description
------------+------+---------------+----------------+-------------+-------------
pg_catalog | @> | aclitem[] | aclitem | boolean | contains
pg_catalog | @> | anyarray | anyarray | boolean | contains
pg_catalog | @> | anyrange | anyelement | boolean | contains
pg_catalog | @> | anyrange | anyrange | boolean | contains
pg_catalog | @> | box | box | boolean | contains
pg_catalog | @> | box | point | boolean | contains
pg_catalog | @> | circle | circle | boolean | contains
pg_catalog | @> | circle | point | boolean | contains
pg_catalog | @> | jsonb | jsonb | boolean | contains
pg_catalog | @> | path | point | boolean | contains
pg_catalog | @> | polygon | point | boolean | contains
pg_catalog | @> | polygon | polygon | boolean | contains
pg_catalog | @> | tsquery | tsquery | boolean | contains
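For readers unfamiliar with containment, here is a rough analog of two of the overloads listed above (arrays and ranges). The function names are hypothetical, and the sketch ignores NULLs, empty ranges and non-default range bounds, so it approximates the operator's behavior for simple integer cases only.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def array_contains(left: Sequence[T], right: Sequence[T]) -> bool:
    """Analog of anyarray @> anyarray: every element of right appears in left.

    Duplicates do not matter, so [1] contains [1, 1].
    """
    return all(item in left for item in right)


def range_contains_element(lower: int, upper: int, element: int) -> bool:
    """Analog of anyrange @> anyelement for a '[)' range.

    The lower bound is inclusive and the upper bound exclusive.
    """
    return lower <= element < upper


def range_contains_range(
    lower: int, upper: int, other_lower: int, other_upper: int
) -> bool:
    """Analog of anyrange @> anyrange when both ranges use '[)' bounds."""
    return lower <= other_lower and other_upper <= upper
```

The same operator name covers all of these data types, which is what the list of operators above illustrates.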
Non-Relational Postgres
22 / 26
Postgres System Tables
pg_database: datlastsysoid
pg_trigger: tgrelid, tgfoid
pg_aggregate: aggfnoid, aggtransfn, aggfinalfn, aggtranstype
pg_amproc: amopclaid, amproc
pg_conversion: conproc
pg_language
pg_cast: castsource, casttarget, castfunc
pg_proc: prolang, prorettype
pg_constraint: contypid
pg_am: amgettuple, aminsert, ambeginscan, amrescan, amendscan, ammarkpos, amrestrpos, ambuild, ambulkdelete, amcostestimate
pg_rewrite: ev_class
pg_opclass: opcdeftype
pg_index: indexrelid, indrelid
pg_class: reltype, relam, relfilenode, reltoastrelid, reltoastidxid
pg_type: typrelid, typelem, typinput, typoutput, typbasetype
pg_operator: oprleft, oprright, oprresult, oprcom, oprnegate, oprlsortop, oprrsortop, oprcode, oprrest, oprjoin
pg_inherits: inhrelid, inhparent
pg_attribute: attrelid, attnum, atttypid
pg_attrdef: adrelid, adnum
pg_amop: amopclaid, amopopr
pg_statistic: starelid, staattnum, staop
pg_depend, pg_namespace, pg_shadow, pg_group, pg_description
PostgreSQL Internals Through Pictures
http://www.postgresql.org/docs/current/static/catalogs.html
23 / 26
CTEs: Mixing Modification Commands
CREATE TEMPORARY TABLE old_orders (order_id INTEGER);
WITH source (order_id) AS (
    DELETE FROM orders WHERE name = 'my order' RETURNING order_id
), source2 AS (
    DELETE FROM items USING source WHERE source.order_id = items.order_id
)
INSERT INTO old_orders SELECT order_id FROM source;
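To make the data flow of this statement explicit, here is an in-memory analog using plain lists of dictionaries. The function name archive_order and the sample rows are hypothetical. The steps run one after another here, whereas Postgres runs all parts of the statement against the same snapshot; for this example the end result is the same.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, str]]


def archive_order(
    orders: List[Row],
    items: List[Row],
    old_orders: List[int],
    name: str,
) -> None:
    """Mimic the three parts of the CTE on in-memory tables (modified in place)."""
    # source: DELETE FROM orders WHERE name = ... RETURNING order_id
    source = [row["order_id"] for row in orders if row["name"] == name]
    orders[:] = [row for row in orders if row["name"] != name]

    # source2: DELETE FROM items USING source
    #          WHERE source.order_id = items.order_id
    items[:] = [row for row in items if row["order_id"] not in source]

    # Main query: INSERT INTO old_orders SELECT order_id FROM source
    old_orders.extend(source)
```

The ids returned by the first DELETE feed both the second DELETE and the final INSERT, which is what lets a single statement remove an order, remove its items, and record the order id.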
Programming the SQL Way with Common Table Expressions
24 / 26
SSL ’VERIFY-CA’ Is Secure From Spoofing
Database Client (SSL verify-ca, root.crt)
X Fake PostgreSQL Database Server: invalid certificate (no CA signature)
PostgreSQL Database Server (server.crt)
Securing PostgreSQL From External Attack
25 / 26
Conclusion: Release Dates and Sizes After 2000
version | reldate | months | relnotes | lines | change | % change
----------+------------+--------+----------+---------+--------+----------
7.0 | 2000-05-08 | 11 | | 383270 | 51992 | 15
7.1 | 2001-04-13 | 11 | | 410500 | 27230 | 7
7.2 | 2002-02-04 | 10 | 250 | 394274 | -16226 | -3
7.3 | 2002-11-27 | 10 | 305 | 453282 | 59008 | 14
7.4 | 2003-11-17 | 12 | 263 | 508523 | 55241 | 12
8.0 | 2005-01-19 | 14 | 230 | 654437 | 145914 | 28
8.1 | 2005-11-08 | 10 | 174 | 630422 | -24015 | -3
8.2 | 2006-12-05 | 13 | 215 | 684646 | 54224 | 8
8.3 | 2008-02-04 | 14 | 223 | 762697 | 78051 | 11
8.4 | 2009-07-01 | 17 | 314 | 939098 | 176401 | 23
9.0 | 2010-09-20 | 15 | 237 | 999862 | 60764 | 6
9.1 | 2011-09-12 | 12 | 203 | 1069547 | 69685 | 6
9.2 | 2012-09-10 | 12 | 238 | 1148192 | 78645 | 7
9.3 | 2013-09-09 | 12 | 177 | 1195627 | 47435 | 4
9.4 | 2014-12-18 | 15 | 211 | 1261024 | 65397 | 5
9.5 | 2016-01-07 | 13 | 193 | 1340005 | 78981 | 6
9.6 | 2016-09-29 | 8 | 214 | 1380458 | 40453 | 3
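The change and % change columns can be reproduced from the lines column. The sketch below assumes the percentage is taken relative to the previous release and truncated rather than rounded, which is what the table values suggest (7.3 works out to about 14.97% and is shown as 14). The function name line_change is hypothetical.

```python
from typing import Tuple


def line_change(previous_lines: int, current_lines: int) -> Tuple[int, int]:
    """Return (change, percent change) between two consecutive releases."""
    change = current_lines - previous_lines
    # int() truncates toward zero, matching the table (e.g. -3.95 becomes -3).
    percent = int(100 * change / previous_lines)
    return change, percent


if __name__ == "__main__":
    print(line_change(410500, 394274))    # 7.1 to 7.2
    print(line_change(1340005, 1380458))  # 9.5 to 9.6
```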
PostgreSQL: Past, Present, and Future
26 / 26