
Commit 9da0cc3

Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build.

The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature.

Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
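A minimal usage sketch of the feature described above (the table and column names are hypothetical; the actual speedup depends on hardware, data, and the cost model's worker choice):

```sql
-- Allow up to 4 parallel workers for utility commands in this session.
SET max_parallel_maintenance_workers = 4;

-- Only B-tree builds can run in parallel; a cost model decides
-- how many workers, if any, are actually requested.
CREATE INDEX measurements_ts_idx ON measurements (ts);
```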
1 parent 9aef173 commit 9da0cc3

51 files changed, +2237 -361 lines

contrib/bloom/blinsert.c (+2 -1)

@@ -135,7 +135,8 @@ blbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 
 	/* Do the heap scan */
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
-								   bloomBuildCallback, (void *) &buildstate);
+								   bloomBuildCallback, (void *) &buildstate,
+								   NULL);
 
 	/*
 	 * There are could be some items in cached page. Flush this page if

doc/src/sgml/config.sgml (+42 -2)

@@ -2022,7 +2022,8 @@ include_dir 'conf.d'
 
        <para>
         When changing this value, consider also adjusting
-        <xref linkend="guc-max-parallel-workers"/> and
+        <xref linkend="guc-max-parallel-workers"/>,
+        <xref linkend="guc-max-parallel-workers-maintenance"/>, and
         <xref linkend="guc-max-parallel-workers-per-gather"/>.
        </para>
      </listitem>
@@ -2070,6 +2071,44 @@ include_dir 'conf.d'
      </listitem>
     </varlistentry>
 
+     <varlistentry id="guc-max-parallel-workers-maintenance" xreflabel="max_parallel_maintenance_workers">
+      <term><varname>max_parallel_maintenance_workers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_parallel_maintenance_workers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum number of parallel workers that can be
+        started by a single utility command.  Currently, the only
+        parallel utility command that supports the use of parallel
+        workers is <command>CREATE INDEX</command>, and only when
+        building a B-tree index.  Parallel workers are taken from the
+        pool of processes established by <xref
+        linkend="guc-max-worker-processes"/>, limited by <xref
+        linkend="guc-max-parallel-workers"/>.  Note that the requested
+        number of workers may not actually be available at runtime.
+        If this occurs, the utility operation will run with fewer
+        workers than expected.  The default value is 2.  Setting this
+        value to 0 disables the use of parallel workers by utility
+        commands.
+       </para>
+
+       <para>
+        Note that parallel utility commands should not consume
+        substantially more memory than equivalent non-parallel
+        operations.  This strategy differs from that of parallel
+        query, where resource limits generally apply per worker
+        process.  Parallel utility commands treat the resource limit
+        <varname>maintenance_work_mem</varname> as a limit to be applied to
+        the entire utility command, regardless of the number of
+        parallel worker processes.  However, parallel utility
+        commands may still consume substantially more CPU resources
+        and I/O bandwidth.
+       </para>
+      </listitem>
+     </varlistentry>
+
     <varlistentry id="guc-max-parallel-workers" xreflabel="max_parallel_workers">
      <term><varname>max_parallel_workers</varname> (<type>integer</type>)
      <indexterm>
@@ -2079,8 +2118,9 @@ include_dir 'conf.d'
      <listitem>
       <para>
        Sets the maximum number of workers that the system can support for
-       parallel queries.  The default value is 8.  When increasing or
+       parallel operations.  The default value is 8.  When increasing or
        decreasing this value, consider also adjusting
+       <xref linkend="guc-max-parallel-workers-maintenance"/> and
        <xref linkend="guc-max-parallel-workers-per-gather"/>.
        Also, note that a setting for this value which is higher than
        <xref linkend="guc-max-worker-processes"/> will have no effect,
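A hedged sketch of how the settings documented above relate in postgresql.conf (the values are purely illustrative, not recommendations):

```
max_worker_processes = 8              # cluster-wide pool of background workers
max_parallel_workers = 8              # cap across all parallel operations
max_parallel_workers_per_gather = 2   # per parallel query Gather node
max_parallel_maintenance_workers = 2  # per utility command (CREATE INDEX)
```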

doc/src/sgml/monitoring.sgml (+9 -3)

@@ -1263,7 +1263,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
          <entry>Waiting in an extension.</entry>
         </row>
         <row>
-         <entry morerows="32"><literal>IPC</literal></entry>
+         <entry morerows="33"><literal>IPC</literal></entry>
          <entry><literal>BgWorkerShutdown</literal></entry>
          <entry>Waiting for background worker to shut down.</entry>
         </row>
@@ -1371,6 +1371,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
          <entry><literal>ParallelBitmapScan</literal></entry>
          <entry>Waiting for parallel bitmap scan to become initialized.</entry>
         </row>
+        <row>
+         <entry><literal>ParallelCreateIndexScan</literal></entry>
+         <entry>Waiting for parallel <command>CREATE INDEX</command> workers to finish heap scan.</entry>
+        </row>
         <row>
          <entry><literal>ProcArrayGroupUpdate</literal></entry>
          <entry>Waiting for group leader to clear transaction id at transaction end.</entry>
@@ -3900,13 +3904,15 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
      </row>
      <row>
       <entry><literal>sort-start</literal></entry>
-      <entry><literal>(int, bool, int, int, bool)</literal></entry>
+      <entry><literal>(int, bool, int, int, bool, int)</literal></entry>
       <entry>Probe that fires when a sort operation is started.
       arg0 indicates heap, index or datum sort.
       arg1 is true for unique-value enforcement.
       arg2 is the number of key columns.
       arg3 is the number of kilobytes of work memory allowed.
-      arg4 is true if random access to the sort result is required.</entry>
+      arg4 is true if random access to the sort result is required.
+      arg5 indicates serial when <literal>0</literal>, parallel worker when
+      <literal>1</literal>, or parallel leader when <literal>2</literal>.</entry>
      </row>
      <row>
       <entry><literal>sort-done</literal></entry>
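Assuming a server built with --enable-dtrace, the new arg5 of the sort-start probe could be observed with a script along these lines (a sketch, not taken from the patch):

```
postgresql$target:::sort-start
{
	/* arg5: 0 = serial, 1 = parallel worker, 2 = parallel leader */
	printf("sort started, mode = %d\n", arg5);
}
```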

doc/src/sgml/ref/create_index.sgml (+58 -0)

@@ -599,6 +599,64 @@ Indexes:
     which would drive the machine into swapping.
    </para>
 
+   <para>
+    <productname>PostgreSQL</productname> can build indexes while
+    leveraging multiple CPUs in order to process the table rows faster.
+    This feature is known as <firstterm>parallel index
+    build</firstterm>.  For index methods that support building indexes
+    in parallel (currently, only B-tree),
+    <varname>maintenance_work_mem</varname> specifies the maximum
+    amount of memory that can be used by each index build operation as
+    a whole, regardless of how many worker processes were started.
+    Generally, a cost model automatically determines how many worker
+    processes should be requested, if any.
+   </para>
+
+   <para>
+    Parallel index builds may benefit from increasing
+    <varname>maintenance_work_mem</varname> where an equivalent serial
+    index build will see little or no benefit.  Note that
+    <varname>maintenance_work_mem</varname> may influence the number of
+    worker processes requested, since parallel workers must have at
+    least a <literal>32MB</literal> share of the total
+    <varname>maintenance_work_mem</varname> budget.  There must also be
+    a remaining <literal>32MB</literal> share for the leader process.
+    Increasing <xref linkend="guc-max-parallel-workers-maintenance"/>
+    may allow more workers to be used, which will reduce the time
+    needed for index creation, so long as the index build is not
+    already I/O bound.  Of course, there should also be sufficient
+    CPU capacity that would otherwise lie idle.
+   </para>
+
+   <para>
+    Setting a value for <literal>parallel_workers</literal> via <xref
+    linkend="sql-altertable"/> directly controls how many parallel
+    worker processes will be requested by a <command>CREATE
+    INDEX</command> against the table.  This bypasses the cost model
+    completely, and prevents <varname>maintenance_work_mem</varname>
+    from affecting how many parallel workers are requested.  Setting
+    <literal>parallel_workers</literal> to 0 via <command>ALTER
+    TABLE</command> will disable parallel index builds on the table in
+    all cases.
+   </para>
+
+   <tip>
+    <para>
+     You might want to reset <literal>parallel_workers</literal> after
+     setting it as part of tuning an index build.  This avoids
+     inadvertent changes to query plans, since
+     <literal>parallel_workers</literal> affects
+     <emphasis>all</emphasis> parallel table scans.
+    </para>
+   </tip>
+
+   <para>
+    While <command>CREATE INDEX</command> with the
+    <literal>CONCURRENTLY</literal> option supports parallel builds
+    without special restrictions, only the first table scan is actually
+    performed in parallel.
+   </para>
+
    <para>
     Use <xref linkend="sql-dropindex"/>
     to remove an index.
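The parallel_workers workflow added to create_index.sgml can be sketched as a session (the table and column names are hypothetical):

```sql
-- Force exactly 4 workers for index builds on this table,
-- bypassing the cost model entirely.
ALTER TABLE measurements SET (parallel_workers = 4);

CREATE INDEX measurements_val_idx ON measurements (val);

-- Reset afterwards so query plans for ordinary parallel
-- table scans are not affected.
ALTER TABLE measurements RESET (parallel_workers);
```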

doc/src/sgml/ref/create_table.sgml (+2 -2)

@@ -1228,8 +1228,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     This sets the number of workers that should be used to assist a parallel
     scan of this table.  If not set, the system will determine a value based
     on the relation size.  The actual number of workers chosen by the planner
-    may be less, for example due to
-    the setting of <xref linkend="guc-max-worker-processes"/>.
+    or by utility statements that use parallel scans may be less, for example
+    due to the setting of <xref linkend="guc-max-worker-processes"/>.
    </para>
   </listitem>
  </varlistentry>

src/backend/access/brin/brin.c (+2 -2)

@@ -706,7 +706,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 * heap blocks in physical order.
 	 */
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, false,
-								   brinbuildCallback, (void *) state);
+								   brinbuildCallback, (void *) state, NULL);
 
 	/* process the final batch */
 	form_and_insert_tuple(state);
@@ -1205,7 +1205,7 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 	state->bs_currRangeStart = heapBlk;
 	IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true,
 							heapBlk, scanNumBlks,
-							brinbuildCallback, (void *) state);
+							brinbuildCallback, (void *) state, NULL);
 
 	/*
 	 * Now we update the values obtained by the scan with the placeholder

src/backend/access/gin/gininsert.c (+1 -1)

@@ -391,7 +391,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 * prefers to receive tuples in TID order.
 	 */
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, false,
-								   ginBuildCallback, (void *) &buildstate);
+								   ginBuildCallback, (void *) &buildstate, NULL);
 
 	/* dump remaining entries to the index */
 	oldCtx = MemoryContextSwitchTo(buildstate.tmpCtx);

src/backend/access/gist/gistbuild.c (+1 -1)

@@ -203,7 +203,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 * Do the heap scan.
 	 */
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
-								   gistBuildCallback, (void *) &buildstate);
+								   gistBuildCallback, (void *) &buildstate, NULL);
 
 	/*
 	 * If buffering was used, flush out all the tuples that are still in the

src/backend/access/hash/hash.c (+1 -1)

@@ -159,7 +159,7 @@ hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 
 	/* do the heap scan */
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
-								   hashbuildCallback, (void *) &buildstate);
+								   hashbuildCallback, (void *) &buildstate, NULL);
 
 	if (buildstate.spool)
 	{

src/backend/access/hash/hashsort.c (+1 -0)

@@ -82,6 +82,7 @@ _h_spoolinit(Relation heap, Relation index, uint32 num_buckets)
 											   hspool->low_mask,
 											   hspool->max_buckets,
 											   maintenance_work_mem,
+											   NULL,
 											   false);
 
 	return hspool;

src/backend/access/heap/heapam.c (+24 -4)

@@ -1627,7 +1627,16 @@ heap_parallelscan_initialize(ParallelHeapScanDesc target, Relation relation,
 	SpinLockInit(&target->phs_mutex);
 	target->phs_startblock = InvalidBlockNumber;
 	pg_atomic_init_u64(&target->phs_nallocated, 0);
-	SerializeSnapshot(snapshot, target->phs_snapshot_data);
+	if (IsMVCCSnapshot(snapshot))
+	{
+		SerializeSnapshot(snapshot, target->phs_snapshot_data);
+		target->phs_snapshot_any = false;
+	}
+	else
+	{
+		Assert(snapshot == SnapshotAny);
+		target->phs_snapshot_any = true;
+	}
 }
 
 /* ----------------
@@ -1655,11 +1664,22 @@ heap_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
 	Snapshot	snapshot;
 
 	Assert(RelationGetRelid(relation) == parallel_scan->phs_relid);
-	snapshot = RestoreSnapshot(parallel_scan->phs_snapshot_data);
-	RegisterSnapshot(snapshot);
+
+	if (!parallel_scan->phs_snapshot_any)
+	{
+		/* Snapshot was serialized -- restore it */
+		snapshot = RestoreSnapshot(parallel_scan->phs_snapshot_data);
+		RegisterSnapshot(snapshot);
+	}
+	else
+	{
+		/* SnapshotAny passed by caller (not serialized) */
+		snapshot = SnapshotAny;
+	}
 
 	return heap_beginscan_internal(relation, snapshot, 0, NULL, parallel_scan,
-								   true, true, true, false, false, true);
+								   true, true, true, false, false,
+								   !parallel_scan->phs_snapshot_any);
 }
