
CLUSTER: Improving PostgreSQL performance

03.2020

When designing a database application, the layout of data on disk is often neglected. However, the way data is clustered by PostgreSQL can have a major performance impact. Therefore, it makes sense to take a look at what can be done to improve speed and throughput. In this post you will learn one of the most important tricks.

PostgreSQL: To sort or not to sort

To demonstrate the importance of the on-disk layout I have created a simple test set:
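The setup can be reproduced with two statements along the following lines (t_random and its column generate_series match what a commenter uses further down; t_sorted is a name I am assuming for the ordered table):

-- 10 million rows, generated and stored in ascending order
CREATE TABLE t_sorted AS
    SELECT * FROM generate_series(1, 10000000);

-- the same 10 million rows, shuffled before they hit the disk
CREATE TABLE t_random AS
    SELECT * FROM generate_series(1, 10000000) ORDER BY random();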

Note that both data sets are absolutely identical. I have loaded 10 million rows into a simple table. However, in the first case the data has been sorted, then inserted. generate_series returns data in ascending order, and because the table is new, the data will be written to disk in that order.
In the second case I decided to shuffle the data before insertion. We are still talking about the same data set. However, it is not in the same order:
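A quick look at the first rows makes the difference visible (the values coming back from t_random will of course vary from run to run):

SELECT * FROM t_sorted LIMIT 5;   -- 1, 2, 3, 4, 5 in physical order
SELECT * FROM t_random LIMIT 5;   -- five arbitrary values, reflecting the shuffled insertion order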

In both cases the size on disk is the same. There are no changes in terms of space consumption, which can be an important factor as well.
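If you want to double-check that, pg_relation_size does the job:

SELECT pg_size_pretty(pg_relation_size('t_sorted')) AS sorted_size,
       pg_size_pretty(pg_relation_size('t_random')) AS random_size;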

Creating an index in PostgreSQL

Let us create an index on both tables:
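Roughly as follows (idx_random is the index name showing up in the comments further down; idx_sorted is an assumed counterpart for the ordered table):

CREATE INDEX idx_sorted ON t_sorted (generate_series);
CREATE INDEX idx_random ON t_random (generate_series);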

Even creating the index is already faster on sorted data for various reasons. However, creating initial indexes does not happen too often, so you should not worry too much.

In the next step we can already create optimizer statistics and make sure that all hint bits are set to ensure a fair performance comparison:
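A VACUUM ANALYZE on both tables achieves this, since VACUUM sets the hint bits and the visibility map while ANALYZE collects the optimizer statistics:

VACUUM ANALYZE t_sorted;
VACUUM ANALYZE t_random;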

Reading database blocks

Now that all the test data sets are in place we can run a simple test: Let us fetch 49000 rows from the sorted data set first:
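The query presumably looks like the one a commenter reproduces further down, just aimed at the sorted table; the numbers discussed next come from its EXPLAIN (analyze, buffers) output:

EXPLAIN (analyze, buffers)
SELECT *
FROM t_sorted
WHERE generate_series BETWEEN 1000 AND 50000;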

Not bad. We need 11.785 milliseconds to read the data. What is most important to consider here is that the number of 8k blocks needed is 138, which is not much. “shared hit” means that all the data has come from memory.

Let me run the same test again:
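That is, the same range query, this time against the shuffled table (this matches the statement quoted in the comment section below):

EXPLAIN (analyze, buffers)
SELECT *
FROM t_random
WHERE generate_series BETWEEN 1000 AND 50000;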

In this case the query took a bit longer: 13.4 ms. However, let us talk about the most important number here: the number of blocks needed to return this result. 18799 blocks. Wow. That is roughly 135 times more.

One could argue that the query is not really that much slower. This is true. However, in my example all data is coming from memory. Let us assume for a moment that data has to be read from disk because for some reason we get no cache hits. The situation would change dramatically. Let us assume that reading one block from disk takes 0.1 ms:

138 * 0.1 + 11.7 = 25.5 ms
vs.
18799 * 0.1 + 13.4 = 1893.3 ms

That is a major difference. This is why the number of blocks does make a difference – even if it might not appear to be the case at first glance. The lower your cache hit rates are, the bigger the problem will become.

There is one more aspect to consider in this example: if you want to read only a handful of rows, the on-disk layout does not make too much of a difference. However, if the subset of data contains thousands of rows, the way the data is ordered on disk does have an impact on performance.

CLUSTER: PostgreSQL comes to the rescue

The CLUSTER command was introduced many years ago to address exactly the issues I have just outlined. It allows you to organize data according to an index. Here is the syntax:
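From the PostgreSQL 12 documentation:

CLUSTER [VERBOSE] table_name [ USING index_name ]
CLUSTER [VERBOSE]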

URL: https://www.postgresql.org/docs/12/sql-cluster.html

Utilizing the CLUSTER command is easy. The following code snippet will show how you can do that:
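Given the table and index names used above, that boils down to a single statement:

-- rewrite t_random so that its physical order follows idx_random
CLUSTER t_random USING idx_random;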

To see what happens, I have executed the same query as before once again. However, there is something important to notice:

PostgreSQL has changed the execution plan. This happens due to stale statistics. Therefore, it is important to run ANALYZE to make sure that the optimizer has up-to-date information:
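For the freshly clustered table:

ANALYZE t_random;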

Once the new optimizer statistics are in place, the execution plan will be as expected again.

Maintaining order

If you have decided to cluster a table, it does NOT mean that the order on disk is maintained forever. If you run UPDATEs etc. frequently, the table might gradually lose its order again. Therefore, CLUSTER is especially useful if your data is rather static. It can also make sense to order data as you import it to ensure the desired physical order.
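As a side note, PostgreSQL remembers which index a table was last clustered on, so the operation can be repeated later without the USING clause, and a bare CLUSTER re-clusters every table that has been clustered before:

CLUSTER t_random;   -- re-cluster using the previously chosen index
CLUSTER;            -- re-cluster all previously clustered tables in the database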

Finally …

If you want to learn more about database performance and storage consider checking out my post about shrinking the storage footprint of PostgreSQL.


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

5 responses to “CLUSTER: Improving PostgreSQL performance”

  1. Thanks for sharing! I like the idea in the docs to reduce the table fillfactor a little if the data is not static. It sounds like this could avoid requiring a re-cluster, or at least less often. Is it worth adding to the maintaining order section? Or is this a bad idea? Thanks again!

  2. I don't get it. If I am understanding it correctly, the last example is actually the worst performing of the whole article: 15 ms, when before CLUSTERing it was 13ms.

  3. Getting columns that are not part of the index would be a better way to demonstrate the advantages of clustering a table.

  4. Thanks for this great article !

    But why is the Buffers count so high on the index idx_random in the first place - since the DB is using an "Index Only Scan"?

    According to the documentation (postgres.org) only index pages are used in this case:
    "To solve this performance problem, PostgreSQL supports index-only scans, which can answer queries from an index alone without any heap access."

    In my case I got the following execution plan:

    user1db=> explain (analyze, buffers) SELECT *
    FROM t_random
    WHERE generate_series BETWEEN 1000 AND 50000;
    QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------
    Index Only Scan using idx_random on t_random (cost=0.43..1690.11 rows=50684 width=4) (actual time=0.022..9.955 rows=49001 loops=1)
    Index Cond: ((generate_series >= 1000) AND (generate_series <= 50000))
    Heap Fetches: 0
    Buffers: shared hit=18859 read=135
    Planning:
    Buffers: shared hit=19 read=3
    Planning Time: 0.344 ms
    Execution Time: 12.337 ms
    (8 rows)

    So 18859 buffers were accessed; the whole index has about 27323 leaf pages.

    # SELECT leaf_pages FROM pgstatindex('idx_random');
    leaf_pages
    ------------
    27323
    (1 row)

    So for reading about 50000 values (out of 10000000 values), most of the leaf nodes were read.

    Could you perhaps shed some light on that?

    Thanks

    • A very observant question!
      These buffers are not index buffers, but buffers from the visibility map.
      When PostgreSQL performs an index-only scan, it checks the visibility map if the block is all-visible so that it can skip the expensive row fetch.
      Now if the table is clustered, the next row is probably in the same block as the previous one, and the same visibility map entry applies.
      If the table is not clustered, PostgreSQL will often have to fetch a different visibility map entry to check the all-visible flag.
      The "buffers" statistics don't take duplicate access into account, so if PostgreSQL switches between the same visibility map buffers repeatedly, it will count them repeatedly.
      That is the reason for the high number.
      You can see that the execution time is not greatly affected, because all these buffers are cached in RAM.
