PostgreSQL Performance Tuning
Ibrar Ahmed
Who am I?
Software Career
• In the software industry since 1998.
PostgreSQL Career
• Working on PostgreSQL since 2006.
Agenda
• Database Performance
• PostgreSQL Performance Tuning
• Parallel Query Optimization
• Questions & Answers
• Hardware: choose better hardware.
• Database: tune your database (PostgreSQL).
PostgreSQL Performance Tuning
PostgreSQL Tuning Parameters
• shared_buffers
• wal_buffers
• effective_cache_size
• maintenance_work_mem
• synchronous_commit
• checkpoint_timeout
• checkpoint_completion_target
• temp_buffers
• huge_pages
PostgreSQL Tuning / shared_buffers
• PostgreSQL uses its own buffer cache in addition to the kernel's buffered I/O.
• PostgreSQL does not change data on disk directly; it modifies pages in the shared buffers, and dirty pages are later written out to disk.
The proper size for the POSTGRESQL shared buffer cache is the largest useful size that does not adversely affect other activity.
—Bruce Momjian
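A minimal sketch of setting it (the 4GB value is illustrative, not a recommendation):
-- postgresql.conf
shared_buffers = '4GB'                          -- changing it requires a server restart
-- or from psql:
postgres=# ALTER SYSTEM SET shared_buffers = '4GB';   -- takes effect after restart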
PostgreSQL Tuning / max_connections
• Maximum number of concurrent connections to PostgreSQL.
• Default is 100
postgres=# SELECT pg_backend_pid();
 pg_backend_pid
----------------
           3214
(1 row)

postgres=# SELECT pg_backend_pid();
 pg_backend_pid
----------------
           3590
(1 row)

ps aux | grep postgres | grep idle
vagrant  3214  0.0  1.2 194060 12948 ?  Ss  15:09  0:00 postgres: vagrant postgres [local] idle
vagrant  3590  0.0  1.2 193840 12936 ?  Ss  15:11  0:00 postgres: vagrant postgres [local] idle
vagrant  3616  0.0  1.2 193840 13072 ?  Ss  15:11  0:00 postgres: vagrant postgres [local] idle
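A minimal sketch of checking and changing the limit (the value 200 is illustrative):
postgres=# SHOW max_connections;
postgres=# ALTER SYSTEM SET max_connections = 200;   -- takes effect after a server restart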
shared_buffers vs TPS

shared_buffers    TPS
128 Megabyte      500
1 Gigabyte        12898
2 Gigabyte        30987
4 Gigabyte        54536
8 Gigabyte        55352
16 Gigabyte       55364
PostgreSQL Tuning / wal_buffers
• Do you have transactions? Obviously.
• WAL segment files are 16MB with an 8K block size (both can be changed at compile time).
• PostgreSQL first writes WAL records into the WAL buffers (wal_buffers); these buffers are then flushed to disk.
• A larger wal_buffers value can give better performance when there are many concurrent connections.
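A minimal sketch (the value is illustrative; the default of -1 sizes the buffers automatically from shared_buffers):
postgres=# ALTER SYSTEM SET wal_buffers = '16MB';   -- requires a server restart
postgres=# SHOW wal_buffers;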
PostgreSQL Tuning / effective_cache_size
• This is used by the optimizer to estimate the size of the kernel's disk buffer cache.
• The effective_cache_size provides an estimate of the memory available for disk caching.
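A minimal sketch (the value is illustrative; this is only a planner hint and allocates no memory):
postgres=# SET effective_cache_size = '8GB';              -- per session
postgres=# ALTER SYSTEM SET effective_cache_size = '8GB'; -- cluster-wide; reload with SELECT pg_reload_conf();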
work_mem
• The value applies per sort operation and per session: if you set it to 10MB and 10 users issue sort queries, up to 100MB may be allocated.
• In case of a merge sort, if x tables are involved in the sort, then up to x * work_mem will be used.
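A minimal sketch (using the table foo created later in the deck; the point is that EXPLAIN ANALYZE reports whether the sort stayed in memory or spilled to disk):
postgres=# SET work_mem = '4MB';
postgres=# EXPLAIN (ANALYZE) SELECT * FROM foo ORDER BY name;
-- look for "Sort Method: external merge  Disk: ..." vs "Sort Method: quicksort  Memory: ..."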
maintenance_work_mem
● Setting a large value helps tasks like VACUUM, RESTORE, CREATE INDEX, ADD FOREIGN KEY and ALTER TABLE.
postgres=# CHECKPOINT;
postgres=# SET maintenance_work_mem='10MB';
postgres=# SHOW maintenance_work_mem;
maintenance_work_mem
----------------------
10MB
(1 row)
postgres=# CREATE INDEX idx_foo ON foo(id);
Time: 12374.931 ms (00:12.375)
postgres=# CHECKPOINT;
postgres=# SET maintenance_work_mem='1GB';
postgres=# SHOW maintenance_work_mem;
maintenance_work_mem
----------------------
1GB
(1 row)
postgres=# CREATE INDEX idx_foo ON foo(id);
Time: 9550.766 ms (00:09.551)
synchronous_commit
• This enforces that a commit waits for its WAL records to be written to disk before returning a success status to the client.
• Turning synchronous_commit off does not introduce any risk of corruption (which would be really bad), only some risk of data loss.
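A minimal sketch: the setting can be relaxed per session or per transaction rather than globally (illustrative only):
postgres=# SET synchronous_commit = off;          -- current session only
postgres=# BEGIN;
postgres=# SET LOCAL synchronous_commit = off;    -- just this transaction
postgres=# COMMIT;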
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
checkpoint_timeout
• PostgreSQL writes changes into WAL. The checkpoint process flushes the data into the data files.
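A minimal sketch of the related settings in postgresql.conf (values are illustrative):
checkpoint_timeout = 15min              # maximum time between automatic checkpoints
checkpoint_completion_target = 0.9      # spread checkpoint I/O over 90% of the interval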
• PostgreSQL assumes that the operating system (kernel) knows much better about storage and I/O scheduling.
• PostgreSQL has its own buffer cache and also relies on the kernel page cache, which leads to double buffering.
• Virtual memory is divided into sections of memory called pages.
• Page tables are used to translate the virtual addresses seen by the application into physical addresses.
https://en.wikipedia.org/wiki/Virtual_memory
https://en.wikipedia.org/wiki/Page_table
Translation Lookaside Buffer (TLB)
• The Translation Lookaside Buffer is a memory cache that stores recent translations of virtual addresses to physical addresses.
• If a match is found, the physical address of the page is returned → TLB hit.
• Linux has a concept of Classic Huge Pages and Transparent Huge Pages.
Page size    Pages needed to map 256 GB of memory
4K           67108864
2M           131072     (large/huge pages)
1G           256        (large/huge pages)
Classic Huge Pages
# cat /proc/meminfo
MemTotal: 264041660 kB
...
Hugepagesize: 2048 kB
DirectMap4k: 128116 kB
DirectMap2M: 3956736 kB
DirectMap1G: 266338304 kB
sysctl -w vm.nr_hugepages=256
Classic Huge Pages
# vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="hugepagesz=1GB default_hugepagesz=1G”
# update-grub
Done
# shutdown -r now
Classic Huge Pages
# vim /etc/postgresql/10/main/postgresql.conf
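A minimal sketch of the line this edit typically adds (the value shown is an assumption, not taken from the original slide):
huge_pages = on     # try (default) / on / off; with 'on' the server refuses to start if huge pages are unavailable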
Transparent Huge Pages
AnonHugePages: 2048 kB
To disable it:
• at runtime:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
• at boot time:
GRUB_CMDLINE_LINUX_DEFAULT="(...) transparent_hugepage=never"
vm.swappiness
• This is another kernel parameter that can affect database performance.
• It controls how aggressively the kernel swaps memory pages out of RAM to swap space.
• In some cases an application acquires too much memory and does not release it; this can invoke the OOM killer.
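A minimal sketch (the value 10 is illustrative, not a recommendation):
# sysctl -w vm.swappiness=10                        # runtime
# echo "vm.swappiness = 10" >> /etc/sysctl.conf     # persistent across reboots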
Parallel Query Optimization
• Parallel query operations are parallel safe, parallel restricted, or parallel unsafe.
https://www.postgresql.org/docs/12/parallel-plans.html
How Parallel Query Works - Configuration
• max_worker_processes (integer). The default is 8.
Sets the maximum number of background processes that the system can support. This parameter can only be set at server start.
• max_parallel_workers_per_gather (integer)
Sets the maximum number of workers that can be started by a single Gather or Gather Merge node.
• max_parallel_workers (integer)
Sets the maximum number of workers that the system can support for parallel queries.
• dynamic_shared_memory_type (enum)
Parallel query requires dynamic shared memory in order to pass data between cooperating processes, so this should not be set to "none". Possible values:
● Posix
● sysv
● windows
● mmap
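A minimal sketch of these settings in postgresql.conf (values are illustrative):
max_worker_processes = 8               # total background workers (restart required)
max_parallel_workers = 8               # workers available for parallel queries
max_parallel_workers_per_gather = 4    # workers per Gather / Gather Merge node
dynamic_shared_memory_type = posix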
• parallel_workers (integer)
Sets the number of workers that should be used to assist a parallel scan of this table (a per-table storage parameter).

EXPLAIN ANALYZE SELECT COUNT(*) FROM foo;
 Finalize Aggregate
   -> Gather
        Workers Planned: 1
        Workers Launched: 1
        -> Partial Aggregate
             -> Parallel Seq Scan on foo

ALTER TABLE foo SET (parallel_workers = 2);

EXPLAIN ANALYZE SELECT COUNT(*) FROM foo;
 Finalize Aggregate
   -> Gather
        Workers Planned: 2
        Workers Launched: 2
        -> Partial Aggregate
             -> Parallel Seq Scan on foo
If the Gather or Gather Merge node is at the very top of the plan tree, then the entire query will execute in parallel; otherwise, only the portion of the plan below it will run in parallel.
When Can Parallel Query Be Used? 1/2
• The number of background workers in use must stay below max_worker_processes (the default value of max_worker_processes and max_parallel_workers is 8).
A parallel plan will not be used when:
  ▪ The query is run through DECLARE CURSOR.
  ▪ The query writes any data or locks any database rows, or contains a data-modifying operation (at the top level or within a CTE)*.
* Limitation of the current implementation.
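A minimal sketch: parallel query can be switched off for a session to compare plans (illustrative):
postgres=# SET max_parallel_workers_per_gather = 0;   -- planner will not use parallel workers
postgres=# EXPLAIN SELECT COUNT(*) FROM foo;          -- plan falls back to a plain Aggregate over a Seq Scan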
When Can Parallel Query Be Used? 2/2
• Window functions and ordered-set aggregate functions are non-parallel*.
Parallel operations include:
● Parallel Joins
● Parallel Aggregation
● Parallel Append
Parallel Sequential Scan
CREATE TABLE foo AS SELECT id AS id, 'name'::TEXT||id::TEXT AS name FROM generate_series(1,10000000) AS id;
EXPLAIN ANALYZE SELECT * FROM foo WHERE id % 2 = 10;
                          QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..204052.90 rows=49999 width=15)
 Planning Time: 0.063 ms

[Diagram: a worker process reading the table blocks Block0, Block1, ...]
[Diagram: B-tree index with root keys 100, 200, 300 and leaf pages (100 101 102), (200 210 220), (300 301 302)]
• Worker-0 will start from the root node and scan until the leaf node (200 in the diagram).
[Diagram: the table blocks 0, 1, 2, ... n are divided among the workers]
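A minimal sketch: for demonstration, the planner can be pushed toward a parallel plan by zeroing the parallel cost settings (illustrative, not a production recommendation):
postgres=# SET parallel_setup_cost = 0;
postgres=# SET parallel_tuple_cost = 0;
postgres=# SET min_parallel_table_scan_size = 0;
postgres=# EXPLAIN SELECT * FROM foo WHERE id % 2 = 10;   -- now expect Gather -> Parallel Seq Scan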
Parallel Joins

Parallel Hash Join
• In case of a hash join, the inner side is executed in full by every cooperating process to build identical copies of the hash table. What if the hash table is too big?
• In a parallel hash join, the inner side is a parallel hash that divides the work of building a shared hash table over the cooperating processes.
• All workers participate in creating the hash table.
• After the hash table is built, each worker can perform the join.

EXPLAIN ANALYZE SELECT COUNT(*)
FROM foo JOIN bar ON id = v WHERE v > 10;
                         QUERY PLAN
-------------------------------------------------------------
 Finalize Aggregate
   -> Gather
        Workers Planned: 2
        Workers Launched: 2
        -> Partial Aggregate
             -> Parallel Hash Join
                  Hash Cond: (bar.v = foo.id)
                  -> Parallel Seq Scan on bar
                       Filter: (v > 10)
                       Rows Removed by Filter: 500058
                  -> Parallel Hash
                       -> Parallel Seq Scan on foo
 Planning Time: 0.211 ms
 Execution Time: 758.847 ms

Merge Join
• The inner side is always a non-parallel plan and is therefore executed in full.
• This may be inefficient, especially if a sort must be performed, because the work and resulting data are duplicated in every cooperating process.

Nested Loop Join
• The inner side is always non-parallel.
• The outer tuples, and thus the loops that look up values in the index, are divided over the cooperating processes.
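A minimal sketch: the shared-hash-table behaviour can be toggled with the enable_parallel_hash planner setting to compare plans (illustrative):
postgres=# SET enable_parallel_hash = off;   -- each worker builds its own copy of the hash table
postgres=# EXPLAIN SELECT COUNT(*) FROM foo JOIN bar ON id = v WHERE v > 10;
postgres=# RESET enable_parallel_hash;       -- back to the default (shared hash table)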
Parallel Aggregation
• Each worker performs the scan on its share of the pages and applies the filter.
• The workers transfer their results to the master.
• The master performs the final aggregate.

[Diagram: Master runs a Finalize Aggregate over a Gather Merge; Worker-0 ... Worker-4 each run a Partial Aggregate over a Sequential Scan of the table blocks 0, 1, 2, ... n]
EXPLAIN (COSTS FALSE) SELECT sum(id) FROM foo WHERE id % 2 = 10;
                  QUERY PLAN
-----------------------------------------------
 Aggregate                               <- Aggregate node to handle aggregates
   -> Seq Scan on foo
        Filter: ((id % 2) = 10)
(3 rows)

EXPLAIN (COSTS FALSE) SELECT sum(id) FROM foo WHERE id % 2 = 10;
                  QUERY PLAN
---------------------------------------------------------------
 Finalize Aggregate                      <- a new Finalize Aggregate node combines the results
   -> Gather
        Workers Planned: 4
        -> Partial Aggregate             <- a new Partial Aggregate node produces transition state outputs
             -> Parallel Seq Scan on foo
                  Filter: ((id % 2) = 10)
Parallel aggregation is not supported if any aggregate function call contains a DISTINCT or ORDER BY clause.
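A minimal sketch of the restriction (illustrative; the resulting plan is not shown on the original slide):
postgres=# EXPLAIN (COSTS FALSE) SELECT count(DISTINCT id) FROM foo;
-- the DISTINCT inside the aggregate keeps the plan non-parallel (plain Aggregate over a Seq Scan)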
Parallel Append
\d+ foo
        Partitioned table "public.foo"
 Column |  Type
--------+---------
 id     | integer
 name   | text
Partition key: RANGE (id)
Partitions: foo1 FOR VALUES FROM ('-1000000') TO (1),
            foo2 FOR VALUES FROM (1) TO (1000000)

EXPLAIN (costs off) SELECT count(id) FROM foo WHERE id <= 201;
                        QUERY PLAN
---------------------------------------------------------
 Finalize Aggregate
   -> Gather
        Workers Planned: 2
        -> Partial Aggregate
             -> Parallel Append               <- combines rows from multiple sources
                  -> Parallel Seq Scan on foo1 foo_1
                       Filter: (id <= 201)
                  -> Parallel Seq Scan on foo2 foo_2
                       Filter: (id <= 201)
• The Append node is used to combine rows from multiple sources into a single result set.
• With a plain Append, all the participating processes cooperate to execute the first child plan until it is complete, and then move to the second plan at around the same time.
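A minimal sketch (enable_parallel_append is a real planner setting, on by default; the comparison is illustrative):
postgres=# SET enable_parallel_append = off;   -- fall back to a regular Append inside parallel plans
postgres=# EXPLAIN (costs off) SELECT count(id) FROM foo WHERE id <= 201;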
Parallel Labeling for Functions and Aggregates 1/2
• All user-defined functions are assumed to be parallel unsafe unless otherwise marked.
Parallel Labeling for Functions and Aggregates 2/2
● The following operations are always parallel restricted:
• Scans of foreign tables, unless the foreign data wrapper has an IsForeignScanParallelSafe API
which indicates otherwise.
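A minimal sketch of labeling a user-defined function (the function itself is a hypothetical example):
CREATE FUNCTION add_one(i integer) RETURNS integer
    AS $$ SELECT i + 1 $$
    LANGUAGE SQL IMMUTABLE PARALLEL SAFE;

-- or change the label on an existing function:
ALTER FUNCTION add_one(integer) PARALLEL SAFE;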
Parallel Index Build
● PostgreSQL can build B-tree indexes while leveraging multiple CPUs in order to process the table rows faster. This feature is known as parallel index build.
● Generally, a cost model automatically determines how many worker processes should be requested, if any.

CREATE INDEX
Time: 11815.685 ms (00:11.816)
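A minimal sketch (the index name is hypothetical; the setting is real and applies to PostgreSQL 11 and later):
postgres=# SET max_parallel_maintenance_workers = 4;   -- workers used by CREATE INDEX
postgres=# CREATE INDEX idx_foo_name ON foo(name);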
• Tuning PostgreSQL Database Parameters to Optimize Performance
○ https://www.percona.com/blog/2018/08/31/tuning-postgresql-database-parameters-to-optimize-performance/
• Tune Linux Kernel Parameters For PostgreSQL Optimization
○ https://www.percona.com/blog/2018/08/29/tune-linux-kernel-parameters-for-postgresql-optimization/
Thanks!
Any questions?
You can find me at:
● [email protected]
● Pgelephant.com