A Deep Dive Into PostgreSQL Indexing
Ibrar Ahmed
Senior Database Architect - Percona LLC
May 2019
Table Characteristics
• The physical files on disk can be seen in the PostgreSQL $PGDATA directory.
$ ls -lrt $PGDATA/base/13680/16384
-rw------- 1 vagrant vagrant 0 Apr 29 11:48 $PGDATA/base/13680/16384
Selecting Data 1/2
• Selecting the whole table requires a sequential scan.
Selecting Data 2/2
CREATE TABLE foo(id INTEGER, name TEXT);
SELECT ctid, * FROM foo;
 ctid  | id | name
-------+----+------
 (0,1) |  1 | Alex
 (0,2) |  2 | Bob
(2 rows)
[Figure: the HEAP drawn as pages 0/N, 1/N and 2/N, each holding tuples 1 to n]
Cost?
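The page-and-tuple layout in the figure can be sketched as a toy model. This is not PostgreSQL's storage code: PAGE_CAPACITY, build_heap, seq_scan and the third row ("Carol") are assumptions made up for illustration. It shows why the cost of a sequential scan grows with the number of pages: every tuple on every page is visited.

```python
# Toy model of a heap: a list of pages, each page a list of tuples.
# PAGE_CAPACITY is a made-up number chosen to keep the example small.
PAGE_CAPACITY = 2


def build_heap(rows, page_capacity=PAGE_CAPACITY):
    """Pack rows into fixed-capacity pages, in insertion order."""
    pages = []
    for row in rows:
        if not pages or len(pages[-1]) == page_capacity:
            pages.append([])
        pages[-1].append(row)
    return pages


def seq_scan(pages):
    """Visit every tuple on every page, yielding (ctid, row).

    ctid is (page number, 1-based tuple position), mirroring the
    (0,1), (0,2) values in the query output above.
    """
    for page_no, page in enumerate(pages):
        for offset, row in enumerate(page, start=1):
            yield (page_no, offset), row


# "Carol" is a hypothetical extra row so that a second page is needed.
heap = build_heap([(1, "Alex"), (2, "Bob"), (3, "Carol")])
scanned = list(seq_scan(heap))
for ctid, row in scanned:
    print(ctid, row)
```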
PostgreSQL Indexes
https://www.postgresql.org/docs/current/indexes.html
Why Index?
• Indexes are entry points for tables
• An index is used to locate the tuples in the table
• The sole reason to have an index is performance
• An index is stored separately from the table’s main storage (the PostgreSQL heap)
• Extra storage is required to hold the index alongside the original table
postgres=# EXPLAIN SELECT name FROM bar WHERE id = 5432;
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on bar (cost=0.00..159235.00 rows=38216 width=32)
Filter: (id = 5432)
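A minimal sketch of the idea that an index is a separate structure used to locate tuples, assuming a tiny hypothetical heap keyed by ctid. The names heap, seq_scan_lookup and index_lookup are invented here, and a sorted Python list stands in for the real index structure.

```python
import bisect

# Hypothetical heap rows keyed by ctid: (page, offset) -> (id, name).
heap = {(0, 1): (30, "c"), (0, 2): (10, "a"), (1, 1): (20, "b")}


def seq_scan_lookup(heap, wanted_id):
    """Check every tuple; also report how many tuples were examined."""
    examined = 0
    hits = []
    for ctid, (row_id, name) in heap.items():
        examined += 1
        if row_id == wanted_id:
            hits.append(name)
    return hits, examined


# The "index": a separate, sorted list of (id, ctid) entries.
# It costs extra storage on top of the heap itself.
index = sorted((row_id, ctid) for ctid, (row_id, _) in heap.items())
keys = [k for k, _ in index]


def index_lookup(heap, wanted_id):
    """Binary-search the index, then fetch only matching tuples from the heap."""
    pos = bisect.bisect_left(keys, wanted_id)
    hits = []
    while pos < len(index) and index[pos][0] == wanted_id:
        hits.append(heap[index[pos][1]][1])
        pos += 1
    return hits
```

Both functions return the same names; the difference is that seq_scan_lookup touches every tuple, while index_lookup touches only the index entries around the key and the matching heap tuples.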
Index
The physical file on disk can be seen in the PostgreSQL $PGDATA directory.
$ ls -lrt $PGDATA/base/13680/16425
-rw------- 1 vagrant vagrant 1073741824 Apr 29 13:05 $PGDATA/base/13680/16425
Creating Index 1/2
Creating Index 2/2
PostgreSQL locks the table against writes while it builds an index; reads are still allowed.
Expression Index 1/2
EXPLAIN SELECT * FROM bar WHERE lower(name) LIKE 'Text1';
QUERY PLAN
-------------------------------------------------------------
Seq Scan on bar (cost=0.00..213694.00 rows=50000 width=40)
Filter: (lower((name)::text) ~~ 'Text1'::text)
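The plan above filters on lower(name), which an ordinary index on name cannot serve. A rough sketch of what an expression index stores instead, assuming hypothetical rows and invented helper names (build_expression_index, lookup): the index key is the computed value of the expression, not the raw column.

```python
from collections import defaultdict

# Hypothetical rows: (id, name).
rows = [(1, "Text1"), (2, "TEXT1"), (3, "text2")]


def build_expression_index(rows, expr):
    """Index the value of expr(name) instead of the raw column value."""
    index = defaultdict(list)
    for position, (_, name) in enumerate(rows):
        index[expr(name)].append(position)
    return index


# str.lower plays the role of lower(name) in the query.
lower_index = build_expression_index(rows, str.lower)


def lookup(index, value):
    """Return the rows whose precomputed expression equals value."""
    return [rows[p] for p in index.get(value, [])]


print(lookup(lower_index, "text1"))
```

Because the expression is evaluated once at index build time, a query on the same expression can go straight to the matching entries.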
Expression Index 2/2
postgres=# EXPLAIN SELECT * FROM bar WHERE (dt + (INTERVAL '2 days')) < now();
QUERY PLAN
---------------------------------------------------------------
Seq Scan on bar (cost=0.00..238694.00 rows=3333333 width=40)
Filter: ((dt + '2 days'::interval) < now())
postgres=# EXPLAIN SELECT * FROM bar WHERE (dt + (INTERVAL '2 days')) < now();
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on bar (cost=62449.77..184477.10 rows=3333333 width=40)
Recheck Cond: ((dt + '2 days'::interval) < now())
-> Bitmap Index Scan on idx_math_exp (cost=0.00..61616.43 rows=3333333 width=0)
Index Cond: ((dt + '2 days'::interval) < now())
Partial Index
Index
CREATE INDEX idx_full ON bar(id);
------------------------------------------------------------------------
Bitmap Heap Scan on bar (cost=61568.60..175262.59 rows=16667 width=40)
  Recheck Cond: (id < 1000)
  Filter: ((name)::text ~~ 'text1000'::text)
  -> Bitmap Index Scan on idx_full (cost=0.00..61564.43 rows=3333333 width=0)
        Index Cond: (id < 1000)
SELECT pg_size_pretty(pg_total_relation_size('idx_full'));
 pg_size_pretty
----------------
 214 MB
(1 row)

Partial Index
CREATE INDEX idx_part ON bar(id) where id < 10000;
------------------------------------------------------------------------
Bitmap Heap Scan on bar (cost=199.44..113893.44 rows=16667 width=40)
  Recheck Cond: (id < 1000)
  Filter: ((name)::text ~~ 'text1000'::text)
  -> Bitmap Index Scan on idx_part (cost=0.00..195.28 rows=3333333 width=0)
        Index Cond: (id < 1000)
SELECT pg_size_pretty(pg_total_relation_size('idx_part'));
 pg_size_pretty
----------------
 240 kB
(1 row)

Q: What will happen when we query where id > 1000?
A: The answer is simple: this index won’t be selected.
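The size difference and the Q&A above can be illustrated with a small sketch. It assumes a hypothetical 20-row table, and PREDICATE_UPPER stands in for the "id < 10000" predicate of idx_part. The can_use_partial helper is a deliberate simplification of how a planner decides, covering only simple upper-bound conditions.

```python
# Hypothetical table: ids 0..19, names derived from the id.
rows = [(i, f"text{i}") for i in range(20)]

PREDICATE_UPPER = 10  # stands in for "WHERE id < 10000" in idx_part

# The full index covers every row; the partial index covers only rows
# that satisfy the predicate, which is why it is so much smaller.
full_index = sorted((row_id, pos) for pos, (row_id, _) in enumerate(rows))
partial_index = [(k, pos) for k, pos in full_index if k < PREDICATE_UPPER]


def can_use_partial(query_upper=None):
    """Simplified usability check for a query of the form "id < query_upper".

    The partial index is usable only if the query condition implies the
    index predicate, i.e. query_upper <= PREDICATE_UPPER. A query with no
    upper bound (such as "id > 1000") may need rows the index never stored.
    """
    return query_upper is not None and query_upper <= PREDICATE_UPPER


print(len(full_index), len(partial_index))
```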
Index Types
https://www.postgresql.org/docs/current/indexes-types.html
B-Tree Index 1/2
• What is a B-Tree index?
  Wikipedia (https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree):
  In computer science, a self-balancing (or height-balanced) binary search tree
  is any node-based binary search tree that automatically keeps its height
  small in the face of arbitrary item insertions and deletions.
• Supported Operators
  • Less than <
  • Less than or equal to <=
  • Equal =
  • Greater than or equal to >=
  • Greater than >
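All five operators listed above work because the index keeps its keys in sorted order. As a rough sketch, a sorted Python list with binary search stands in for the tree here; the keys and function names are hypothetical. Each operator becomes a contiguous slice of the sorted keys.

```python
import bisect

keys = [10, 20, 20, 30, 40]  # hypothetical indexed values, kept sorted


def less_than(v):
    """key < v: everything before the first occurrence of v."""
    return keys[:bisect.bisect_left(keys, v)]


def less_equal(v):
    """key <= v: everything up to and including the last occurrence of v."""
    return keys[:bisect.bisect_right(keys, v)]


def equal(v):
    """key = v: the run of entries between the two insertion points."""
    return keys[bisect.bisect_left(keys, v):bisect.bisect_right(keys, v)]


def greater_equal(v):
    """key >= v: everything from the first occurrence of v onward."""
    return keys[bisect.bisect_left(keys, v):]


def greater_than(v):
    """key > v: everything after the last occurrence of v."""
    return keys[bisect.bisect_right(keys, v):]
```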
B-Tree Index 2/2
CREATE TABLE foo(id INTEGER, name TEXT);
INSERT INTO foo VALUES(1, 'Alex');
-------+------
 (0,1) | Alex
 (0,2) | Bob
[Figure: heap pages 0/N through N/N, each holding tuples 1 to n]
HASH Index
• What is a Hash index?
• Hash indexes only handle equality operators
• A hash function is used to locate the tuples
CREATE INDEX idx_hash ON bar USING HASH (name);
postgres=# \d bar
                      Table "public.bar"
 Column |       Type        | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
 id     | integer           |           |          |
 name   | character varying |           |          |
 dt     | date              |           |          |
Indexes:
    "idx_btree" btree (name)
    "idx_hash" hash (name)
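To see why a hash index handles only equality, here is a minimal bucket-based sketch. NUM_BUCKETS, the sample rows and the helper names are assumptions for illustration, and Python's built-in hash() stands in for the database's hash function.

```python
NUM_BUCKETS = 8  # arbitrary bucket count for the sketch

# Hypothetical rows: (id, name).
rows = [(1, "Alex"), (2, "Bob"), (3, "Alex")]


def build_hash_index(rows):
    """Place a (name, row position) entry in the bucket chosen by hash(name)."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for pos, (_, name) in enumerate(rows):
        buckets[hash(name) % NUM_BUCKETS].append((name, pos))
    return buckets


def hash_lookup(buckets, rows, name):
    """Equality lookup: hash the key, scan one bucket, recheck the value."""
    bucket = buckets[hash(name) % NUM_BUCKETS]
    return [rows[pos] for key, pos in bucket if key == name]


buckets = build_hash_index(rows)
print(hash_lookup(buckets, rows, "Alex"))
```

Hashing scatters neighbouring values across unrelated buckets, so the structure has no notion of order and cannot answer a range condition such as name < 'Bob'.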
BRIN Index 1/2
BRIN Index 2/2
Sequential Scan BRIN Index
GIN Index 1/2
• Generalized Inverted Index
• GIN handles cases where we need to index composite values
• Index creation is slow because it needs to scan the document up front
postgres=# \d bar
Table "public.bar"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
name | jsonb | | |
dt | date | | |
GIN Index 2/2
Without an index:
postgres=# EXPLAIN ANALYZE SELECT * FROM bar
WHERE name @> '{"name": "Alex"}';
QUERY PLAN
-----------------------------------------------------
Seq Scan on bar (cost=0.00..108309.34 rows=3499 width=96) (actual time=396.019..1050.143 rows=1000000 loops=1)
  Filter: (name @> '{"name": "Alex"}'::jsonb)
  Rows Removed by Filter: 3000000
Planning Time: 0.107 ms
Execution Time: 1079.861 ms

Even if you create a BTREE index, it won’t be considered, because it does not know the individual elements in the value.

With a GIN index:
CREATE INDEX idx_gin ON bar USING GIN (name);
postgres=# EXPLAIN ANALYZE SELECT * FROM bar
WHERE name @> '{"name": "Alex"}';
QUERY PLAN
-----------------------------------------------------
Bitmap Heap Scan on bar (cost=679.00..13395.57 rows=4000 width=96) (actual time=91.110..445.112 rows=1000000 loops=1)
  Recheck Cond: (name @> '{"name": "Alex"}'::jsonb)
  Heap Blocks: exact=16394
  -> Bitmap Index Scan on idx_gin (cost=0.00..678.00 rows=4000 width=0) (actual time=89.033..89.033 rows=1000000 loops=1)
        Index Cond: (name @> '{"name": "Alex"}'::jsonb)
Planning Time: 0.168 ms
Execution Time: 475.447 ms
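An inverted index maps each element inside a composite value to the rows that contain it, which is what lets the containment query above avoid a full scan. This is a toy sketch for flat documents only, with hypothetical sample data and invented helper names; the real GIN operator classes for jsonb are more involved.

```python
from collections import defaultdict

# Hypothetical documents, standing in for the jsonb column "name".
docs = [
    {"name": "Alex", "city": "Oslo"},
    {"name": "Bob"},
    {"name": "Alex"},
]


def build_inverted_index(docs):
    """Map each top-level (key, value) pair to the set of rows containing it.

    Every document has to be read in full up front, which is why
    building this kind of index is slow.
    """
    index = defaultdict(set)
    for row_no, doc in enumerate(docs):
        for item in doc.items():
            index[item].add(row_no)
    return index


def contains(index, query, total_rows):
    """Rows whose document contains every (key, value) pair of query.

    A flat-document analogue of the @> operator; an empty query
    matches every row.
    """
    if not query:
        return list(range(total_rows))
    postings = [index.get(item, set()) for item in query.items()]
    return sorted(set.intersection(*postings))


inverted = build_inverted_index(docs)
print(contains(inverted, {"name": "Alex"}, len(docs)))
```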
GiST Index
Where and What?
• B-Tree: Use this index for most of the queries and different data types
Index Only Scans
• Index is stored separately from the table’s main storage (PostgreSQL Heap)
• Index-only scans are used only when all the columns in the query are part of the index
Index Only Scans
CREATE INDEX idx_btree_ios ON bar (id,name);
EXPLAIN SELECT id, name, dt FROM bar WHERE id > 100000 AND id <100010;
QUERY PLAN
Index Scan using idx_btree_ios on bar (cost=0.56..99.20 rows=25 width=19)
Index Cond: ((id > 100000) AND (id < 100010))
(2 rows)
EXPLAIN SELECT id, name FROM bar WHERE id > 100000 AND id <100010;
QUERY PLAN
Index Only Scan using idx_btree_ios on bar (cost=0.56..99.20 rows=25 width=15)
Index Cond: ((id > 100000) AND (id < 100010))
(2 rows)
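The difference between the two plans comes down to a column-coverage check, sketched below. INDEX_COLUMNS mirrors idx_btree_ios from the slide; scan_type is a hypothetical helper, and it ignores everything else a real planner weighs, such as cost estimates and tuple visibility.

```python
INDEX_COLUMNS = ("id", "name")  # columns of idx_btree_ios from the slide


def scan_type(selected, filtered, index_columns=INDEX_COLUMNS):
    """Return "Index Only Scan" when every referenced column is in the index.

    Otherwise return "Index Scan": the heap must be visited to fetch
    the columns the index does not hold.
    """
    referenced = set(selected) | set(filtered)
    if referenced <= set(index_columns):
        return "Index Only Scan"
    return "Index Scan"


# First query on the slide: dt is not in the index.
print(scan_type(["id", "name", "dt"], ["id"]))
# Second query on the slide: id and name are both in the index.
print(scan_type(["id", "name"], ["id"]))
```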
Duplicate Indexes
SELECT indrelid::regclass relname,
indexrelid::regclass indexname, indkey
FROM pg_index
GROUP BY relname,indexname,indkey;
relname | indexname | indkey
--------------------------+-----------------------------------------------+---------
pg_index | pg_index_indexrelid_index | 1
pg_toast.pg_toast_2615 | pg_toast.pg_toast_2615_index | 1 2
pg_constraint | pg_constraint_conparentid_index | 11
Unused Indexes
SELECT relname, indexrelname, idx_scan
FROM pg_catalog.pg_stat_user_indexes;
?
“Poor leaders rarely ask questions of
themselves or others. Good leaders, on
the other hand, ask many questions.
Great leaders ask the great questions.”