Skip to content

Commit 8aba932

Browse files
committed
Fix relcache inconsistency hazard in partition detach
During queries coming from ri_triggers.c, we need to omit partitions that are marked pending detach -- otherwise, the RI query is tricked into allowing a row into the referencing table whose corresponding row is in the detached partition. Which is bogus: once the detach operation completes, the row becomes an orphan. However, the code was not doing that in repeatable-read transactions, because relcache kept a copy of the partition descriptor that included the partition, and used it in the RI query. This commit changes the partdesc cache code to only keep descriptors that aren't dependent on a snapshot (namely: those where no detached partition exist, and those where detached partitions are included). When a partdesc-without- detached-partitions is requested, we create one afresh each time; also, those partdescs are stored in PortalContext instead of CacheMemoryContext. find_inheritance_children gets a new output *detached_exist boolean, which indicates whether any partition marked pending-detach is found. Its "include_detached" input flag is changed to "omit_detached", because that name captures desired the semantics more naturally. CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are identically renamed. This was noticed because a buildfarm member that runs with relcache clobbering, which would not keep the improperly cached partdesc, broke one test, which led us to realize that the expected output of that test was bogus. This commit also corrects that expected output. Author: Amit Langote <[email protected]> Author: Álvaro Herrera <[email protected]> Discussion: https://postgr.es/m/[email protected]
1 parent 82b13db commit 8aba932

File tree

12 files changed

+160
-101
lines changed

12 files changed

+160
-101
lines changed

src/backend/catalog/heap.c

+1-1
Original file line numberDiff line numberDiff line change
@@ -3840,7 +3840,7 @@ StorePartitionBound(Relation rel, Relation parent, PartitionBoundSpec *bound)
38403840
* removed.
38413841
*/
38423842
defaultPartOid =
3843-
get_default_oid_from_partdesc(RelationGetPartitionDesc(parent, false));
3843+
get_default_oid_from_partdesc(RelationGetPartitionDesc(parent, true));
38443844
if (OidIsValid(defaultPartOid))
38453845
CacheInvalidateRelcacheByRelid(defaultPartOid);
38463846

src/backend/catalog/pg_inherits.c

+39-24
Original file line numberDiff line numberDiff line change
@@ -52,13 +52,19 @@ typedef struct SeenRelsEntry
5252
* then no locks are acquired, but caller must beware of race conditions
5353
* against possible DROPs of child relations.
5454
*
55-
* include_detached says to include all partitions, even if they're marked
56-
* detached. Passing it as false means they might or might not be included,
57-
* depending on the visibility of the pg_inherits row for the active snapshot.
55+
* If a partition's pg_inherits row is marked "detach pending",
56+
* *detached_exist (if not null) is set true, otherwise it is set false.
57+
*
58+
* If omit_detached is true and there is an active snapshot (not the same as
59+
* the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
60+
* marked "detach pending" is visible to that snapshot, then that partition is
61+
* omitted from the output list. This makes partitions invisible depending on
62+
* whether the transaction that marked those partitions as detached appears
63+
* committed to the active snapshot.
5864
*/
5965
List *
60-
find_inheritance_children(Oid parentrelId, bool include_detached,
61-
LOCKMODE lockmode)
66+
find_inheritance_children(Oid parentrelId, bool omit_detached,
67+
LOCKMODE lockmode, bool *detached_exist)
6268
{
6369
List *list = NIL;
6470
Relation relation;
@@ -78,6 +84,9 @@ find_inheritance_children(Oid parentrelId, bool include_detached,
7884
if (!has_subclass(parentrelId))
7985
return NIL;
8086

87+
if (detached_exist)
88+
*detached_exist = false;
89+
8190
/*
8291
* Scan pg_inherits and build a working array of subclass OIDs.
8392
*/
@@ -99,29 +108,35 @@ find_inheritance_children(Oid parentrelId, bool include_detached,
99108
{
100109
/*
101110
* Cope with partitions concurrently being detached. When we see a
102-
* partition marked "detach pending", we only include it in the set of
103-
* visible partitions if caller requested all detached partitions, or
104-
* if its pg_inherits tuple's xmin is still visible to the active
105-
* snapshot.
111+
* partition marked "detach pending", we omit it from the returned set
112+
* of visible partitions if caller requested that and the tuple's xmin
113+
* does not appear in progress to the active snapshot. (If there's no
114+
* active snapshot set, that means we're not running a user query, so
115+
* it's OK to always include detached partitions in that case; if the
116+
* xmin is still running to the active snapshot, then the partition
117+
* has not been detached yet and so we include it.)
106118
*
107-
* The reason for this check is that we want to avoid seeing the
119+
* The reason for this hack is that we want to avoid seeing the
108120
* partition as alive in RI queries during REPEATABLE READ or
109-
* SERIALIZABLE transactions. (If there's no active snapshot set,
110-
* that means we're not running a user query, so it's OK to always
111-
* include detached partitions in that case.)
121+
* SERIALIZABLE transactions: such queries use a different snapshot
122+
* than the one used by regular (user) queries.
112123
*/
113-
if (((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhdetachpending &&
114-
!include_detached &&
115-
ActiveSnapshotSet())
124+
if (((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhdetachpending)
116125
{
117-
TransactionId xmin;
118-
Snapshot snap;
126+
if (detached_exist)
127+
*detached_exist = true;
128+
129+
if (omit_detached && ActiveSnapshotSet())
130+
{
131+
TransactionId xmin;
132+
Snapshot snap;
119133

120-
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
121-
snap = GetActiveSnapshot();
134+
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
135+
snap = GetActiveSnapshot();
122136

123-
if (!XidInMVCCSnapshot(xmin, snap))
124-
continue;
137+
if (!XidInMVCCSnapshot(xmin, snap))
138+
continue;
139+
}
125140
}
126141

127142
inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
@@ -235,8 +250,8 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
235250
ListCell *lc;
236251

237252
/* Get the direct children of this rel */
238-
currentchildren = find_inheritance_children(currentrel, false,
239-
lockmode);
253+
currentchildren = find_inheritance_children(currentrel, true,
254+
lockmode, NULL);
240255

241256
/*
242257
* Add to the queue only those children not already seen. This avoids

src/backend/commands/indexcmds.c

+2-2
Original file line numberDiff line numberDiff line change
@@ -1123,7 +1123,7 @@ DefineIndex(Oid relationId,
11231123
*/
11241124
if (partitioned && stmt->relation && !stmt->relation->inh)
11251125
{
1126-
PartitionDesc pd = RelationGetPartitionDesc(rel, false);
1126+
PartitionDesc pd = RelationGetPartitionDesc(rel, true);
11271127

11281128
if (pd->nparts != 0)
11291129
flags |= INDEX_CREATE_INVALID;
@@ -1180,7 +1180,7 @@ DefineIndex(Oid relationId,
11801180
*
11811181
* If we're called internally (no stmt->relation), recurse always.
11821182
*/
1183-
partdesc = RelationGetPartitionDesc(rel, false);
1183+
partdesc = RelationGetPartitionDesc(rel, true);
11841184
if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0)
11851185
{
11861186
int nparts = partdesc->nparts;

src/backend/commands/tablecmds.c

+20-20
Original file line numberDiff line numberDiff line change
@@ -1041,7 +1041,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
10411041
*/
10421042
defaultPartOid =
10431043
get_default_oid_from_partdesc(RelationGetPartitionDesc(parent,
1044-
false));
1044+
true));
10451045
if (OidIsValid(defaultPartOid))
10461046
defaultRel = table_open(defaultPartOid, AccessExclusiveLock);
10471047

@@ -3507,7 +3507,7 @@ renameatt_internal(Oid myrelid,
35073507
* expected_parents will only be 0 if we are not already recursing.
35083508
*/
35093509
if (expected_parents == 0 &&
3510-
find_inheritance_children(myrelid, false, NoLock) != NIL)
3510+
find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
35113511
ereport(ERROR,
35123512
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
35133513
errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -3706,7 +3706,7 @@ rename_constraint_internal(Oid myrelid,
37063706
else
37073707
{
37083708
if (expected_parents == 0 &&
3709-
find_inheritance_children(myrelid, false, NoLock) != NIL)
3709+
find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
37103710
ereport(ERROR,
37113711
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
37123712
errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -6580,7 +6580,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
65806580
*/
65816581
if (colDef->identity &&
65826582
recurse &&
6583-
find_inheritance_children(myrelid, false, NoLock) != NIL)
6583+
find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
65846584
ereport(ERROR,
65856585
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
65866586
errmsg("cannot recursively add identity column to table that has child tables")));
@@ -6826,7 +6826,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
68266826
* use find_all_inheritors to do it in one pass.
68276827
*/
68286828
children =
6829-
find_inheritance_children(RelationGetRelid(rel), false, lockmode);
6829+
find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL);
68306830

68316831
/*
68326832
* If we are told not to recurse, there had better not be any child
@@ -6980,7 +6980,7 @@ ATPrepDropNotNull(Relation rel, bool recurse, bool recursing)
69806980
*/
69816981
if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
69826982
{
6983-
PartitionDesc partdesc = RelationGetPartitionDesc(rel, false);
6983+
PartitionDesc partdesc = RelationGetPartitionDesc(rel, true);
69846984

69856985
Assert(partdesc != NULL);
69866986
if (partdesc->nparts > 0 && !recurse && !recursing)
@@ -7689,7 +7689,7 @@ ATPrepDropExpression(Relation rel, AlterTableCmd *cmd, bool recurse, bool recurs
76897689
* resulting state can be properly dumped and restored.
76907690
*/
76917691
if (!recurse &&
7692-
find_inheritance_children(RelationGetRelid(rel), false, lockmode))
7692+
find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL))
76937693
ereport(ERROR,
76947694
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
76957695
errmsg("ALTER TABLE / DROP EXPRESSION must be applied to child tables too")));
@@ -8297,7 +8297,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
82978297
* use find_all_inheritors to do it in one pass.
82988298
*/
82998299
children =
8300-
find_inheritance_children(RelationGetRelid(rel), false, lockmode);
8300+
find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL);
83018301

83028302
if (children)
83038303
{
@@ -8785,7 +8785,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
87858785
* use find_all_inheritors to do it in one pass.
87868786
*/
87878787
children =
8788-
find_inheritance_children(RelationGetRelid(rel), false, lockmode);
8788+
find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL);
87898789

87908790
/*
87918791
* Check if ONLY was specified with ALTER TABLE. If so, allow the
@@ -9400,7 +9400,7 @@ addFkRecurseReferenced(List **wqueue, Constraint *fkconstraint, Relation rel,
94009400
*/
94019401
if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
94029402
{
9403-
PartitionDesc pd = RelationGetPartitionDesc(pkrel, false);
9403+
PartitionDesc pd = RelationGetPartitionDesc(pkrel, true);
94049404

94059405
for (int i = 0; i < pd->nparts; i++)
94069406
{
@@ -9534,7 +9534,7 @@ addFkRecurseReferencing(List **wqueue, Constraint *fkconstraint, Relation rel,
95349534
}
95359535
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
95369536
{
9537-
PartitionDesc pd = RelationGetPartitionDesc(rel, false);
9537+
PartitionDesc pd = RelationGetPartitionDesc(rel, true);
95389538

95399539
/*
95409540
* Recurse to take appropriate action on each partition; either we
@@ -11318,8 +11318,8 @@ ATExecDropConstraint(Relation rel, const char *constrName,
1131811318
* use find_all_inheritors to do it in one pass.
1131911319
*/
1132011320
if (!is_no_inherit_constraint)
11321-
children =
11322-
find_inheritance_children(RelationGetRelid(rel), false, lockmode);
11321+
children = find_inheritance_children(RelationGetRelid(rel), true,
11322+
lockmode, NULL);
1132311323
else
1132411324
children = NIL;
1132511325

@@ -11703,8 +11703,8 @@ ATPrepAlterColumnType(List **wqueue,
1170311703
}
1170411704
}
1170511705
else if (!recursing &&
11706-
find_inheritance_children(RelationGetRelid(rel), false,
11707-
NoLock) != NIL)
11706+
find_inheritance_children(RelationGetRelid(rel), true,
11707+
NoLock, NULL) != NIL)
1170811708
ereport(ERROR,
1170911709
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
1171011710
errmsg("type of inherited column \"%s\" must be changed in child tables too",
@@ -16875,7 +16875,7 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
1687516875
}
1687616876
else if (scanrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
1687716877
{
16878-
PartitionDesc partdesc = RelationGetPartitionDesc(scanrel, false);
16878+
PartitionDesc partdesc = RelationGetPartitionDesc(scanrel, true);
1687916879
int i;
1688016880

1688116881
for (i = 0; i < partdesc->nparts; i++)
@@ -16935,7 +16935,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd,
1693516935
* new partition will change its partition constraint.
1693616936
*/
1693716937
defaultPartOid =
16938-
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel, false));
16938+
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel, true));
1693916939
if (OidIsValid(defaultPartOid))
1694016940
LockRelationOid(defaultPartOid, AccessExclusiveLock);
1694116941

@@ -17551,7 +17551,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
1755117551
* will change its partition constraint.
1755217552
*/
1755317553
defaultPartOid =
17554-
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel, false));
17554+
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel, true));
1755517555
if (OidIsValid(defaultPartOid))
1755617556
{
1755717557
/*
@@ -18148,7 +18148,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
1814818148
RelationGetRelationName(partIdx))));
1814918149

1815018150
/* Make sure it indexes a partition of the other index's table */
18151-
partDesc = RelationGetPartitionDesc(parentTbl, false);
18151+
partDesc = RelationGetPartitionDesc(parentTbl, true);
1815218152
found = false;
1815318153
for (i = 0; i < partDesc->nparts; i++)
1815418154
{
@@ -18302,7 +18302,7 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
1830218302
* If we found as many inherited indexes as the partitioned table has
1830318303
* partitions, we're good; update pg_index to set indisvalid.
1830418304
*/
18305-
if (tuples == RelationGetPartitionDesc(partedTbl, false)->nparts)
18305+
if (tuples == RelationGetPartitionDesc(partedTbl, true)->nparts)
1830618306
{
1830718307
Relation idxRel;
1830818308
HeapTuple newtup;

src/backend/commands/trigger.c

+3-3
Original file line numberDiff line numberDiff line change
@@ -1119,7 +1119,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
11191119
*/
11201120
if (partition_recurse)
11211121
{
1122-
PartitionDesc partdesc = RelationGetPartitionDesc(rel, false);
1122+
PartitionDesc partdesc = RelationGetPartitionDesc(rel, true);
11231123
List *idxs = NIL;
11241124
List *childTbls = NIL;
11251125
ListCell *l;
@@ -1141,8 +1141,8 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
11411141
ListCell *l;
11421142
List *idxs = NIL;
11431143

1144-
idxs = find_inheritance_children(indexOid, false,
1145-
ShareRowExclusiveLock);
1144+
idxs = find_inheritance_children(indexOid, true,
1145+
ShareRowExclusiveLock, NULL);
11461146
foreach(l, idxs)
11471147
childTbls = lappend_oid(childTbls,
11481148
IndexGetRelation(lfirst_oid(l),

src/backend/executor/execPartition.c

+8-11
Original file line numberDiff line numberDiff line change
@@ -991,19 +991,16 @@ ExecInitPartitionDispatchInfo(EState *estate,
991991

992992
/*
993993
* For data modification, it is better that executor does not include
994-
* partitions being detached, except in snapshot-isolation mode. This
995-
* means that a read-committed transaction immediately gets a "no
996-
* partition for tuple" error when a tuple is inserted into a partition
997-
* that's being detached concurrently, but a transaction in repeatable-
998-
* read mode can still use the partition. Note that because partition
999-
* detach uses ShareLock on the partition (which conflicts with DML),
1000-
* we're certain that the detach won't be able to complete until any
1001-
* inserting transaction is done.
994+
* partitions being detached, except when running in snapshot-isolation
995+
* mode. This means that a read-committed transaction immediately gets a
996+
* "no partition for tuple" error when a tuple is inserted into a
997+
* partition that's being detached concurrently, but a transaction in
998+
* repeatable-read mode can still use such a partition.
1002999
*/
10031000
if (estate->es_partition_directory == NULL)
10041001
estate->es_partition_directory =
10051002
CreatePartitionDirectory(estate->es_query_cxt,
1006-
IsolationUsesXactSnapshot());
1003+
!IsolationUsesXactSnapshot());
10071004

10081005
oldcxt = MemoryContextSwitchTo(proute->memcxt);
10091006

@@ -1571,10 +1568,10 @@ ExecCreatePartitionPruneState(PlanState *planstate,
15711568
ListCell *lc;
15721569
int i;
15731570

1574-
/* Executor must always include detached partitions */
1571+
/* For data reading, executor always omits detached partitions */
15751572
if (estate->es_partition_directory == NULL)
15761573
estate->es_partition_directory =
1577-
CreatePartitionDirectory(estate->es_query_cxt, true);
1574+
CreatePartitionDirectory(estate->es_query_cxt, false);
15781575

15791576
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
15801577
Assert(n_part_hierarchies > 0);

src/backend/optimizer/util/plancat.c

+1-1
Original file line numberDiff line numberDiff line change
@@ -2200,7 +2200,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
22002200
if (root->glob->partition_directory == NULL)
22012201
{
22022202
root->glob->partition_directory =
2203-
CreatePartitionDirectory(CurrentMemoryContext, false);
2203+
CreatePartitionDirectory(CurrentMemoryContext, true);
22042204
}
22052205

22062206
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,

src/backend/partitioning/partbounds.c

+3-3
Original file line numberDiff line numberDiff line change
@@ -2798,7 +2798,7 @@ check_new_partition_bound(char *relname, Relation parent,
27982798
PartitionBoundSpec *spec, ParseState *pstate)
27992799
{
28002800
PartitionKey key = RelationGetPartitionKey(parent);
2801-
PartitionDesc partdesc = RelationGetPartitionDesc(parent, true);
2801+
PartitionDesc partdesc = RelationGetPartitionDesc(parent, false);
28022802
PartitionBoundInfo boundinfo = partdesc->boundinfo;
28032803
int with = -1;
28042804
bool overlap = false;
@@ -3991,7 +3991,7 @@ get_qual_for_list(Relation parent, PartitionBoundSpec *spec)
39913991
{
39923992
int i;
39933993
int ndatums = 0;
3994-
PartitionDesc pdesc = RelationGetPartitionDesc(parent, true); /* XXX correct? */
3994+
PartitionDesc pdesc = RelationGetPartitionDesc(parent, false);
39953995
PartitionBoundInfo boundinfo = pdesc->boundinfo;
39963996

39973997
if (boundinfo)
@@ -4191,7 +4191,7 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
41914191
if (spec->is_default)
41924192
{
41934193
List *or_expr_args = NIL;
4194-
PartitionDesc pdesc = RelationGetPartitionDesc(parent, true); /* XXX correct? */
4194+
PartitionDesc pdesc = RelationGetPartitionDesc(parent, false);
41954195
Oid *inhoids = pdesc->oids;
41964196
int nparts = pdesc->nparts,
41974197
i;

0 commit comments

Comments
 (0)