Teach predtest about IS [NOT] <boolean> proofs

Lists: pgsql-hackers
From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Teach predtest about IS [NOT] <boolean> proofs
Date: 2023-12-11 19:59:46
Message-ID: CAAaqYe8Bo4bf_i6qKj8KBsmHMYXhe3Xt6vOe3OBQnOaf3_XBWg@mail.gmail.com

Hello,

I recently encountered a case where partial indexes were surprisingly not
being used. The issue is that predtest doesn't understand how boolean
values and IS <boolean> expressions relate.

For example if I have:

create table foo(i int, bar boolean);
create index on foo(i) where bar is true;

then this query:

select * from foo where i = 1 and bar;

doesn't use the partial index.
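
To see why the proof ought to succeed, a small model of SQL's three-valued logic is enough. The sketch below is standalone C, not planner code, and TriBool, where_keeps_row, is_true_test and where_bar_implies_bar_is_true are hypothetical names introduced only for illustration. It checks that every value of bar that lets a row through "WHERE bar" also satisfies the index predicate "bar IS TRUE".

```c
#include <stdbool.h>

/* Hypothetical model of a nullable SQL boolean; not a PostgreSQL type. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* A WHERE clause keeps a row only when its condition evaluates to true. */
static bool where_keeps_row(TriBool cond)
{
    return cond == TV_TRUE;
}

/* "x IS TRUE" never yields null: it is true only for a true input. */
static TriBool is_true_test(TriBool x)
{
    return x == TV_TRUE ? TV_TRUE : TV_FALSE;
}

/* Returns true if every row kept by "WHERE bar" satisfies "bar IS TRUE". */
static bool where_bar_implies_bar_is_true(void)
{
    const TriBool values[] = { TV_FALSE, TV_TRUE, TV_NULL };

    for (int i = 0; i < 3; i++)
    {
        TriBool bar = values[i];

        if (where_keeps_row(bar) && !where_keeps_row(is_true_test(bar)))
            return false;
    }
    return true;
}
```

In this model the only value that passes "WHERE bar" is true, and "bar IS TRUE" is also true for it, so the rows the query wants are all covered by the partial index.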

Attached is a patch that solves that issue. It also teaches predtest about
quite a few more cases involving BooleanTest expressions (e.g., how they
relate to NullTest expressions). One thing I could imagine being an
objection is that not all of these warrant cycles in planning. If that
turns out to be the case there's not a particularly clear line in my mind
about where to draw that line.

As noted in a TODO in the patch itself, I think it may be worth refactoring
the test_predtest module to run the "x, y" case as well as the "y, x" case
with a single call so as to eliminate a lot of repetition in
clause/expression test cases. If reviewers agree that's desirable, then I
could do that as a precursor.

Regards,
James Coleman

Attachment Content-Type Size
v1-0001-Teach-predtest-about-IS-NOT-bool-proofs.patch application/octet-stream 26.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2023-12-13 18:36:14
Message-ID: [email protected]
Lists: pgsql-hackers

James Coleman <jtc331(at)gmail(dot)com> writes:
> Attached is a patch that solves that issue. It also teaches predtest about
> quite a few more cases involving BooleanTest expressions (e.g., how they
> relate to NullTest expressions). One thing I could imagine being an
> objection is that not all of these warrant cycles in planning. If that
> turns out to be the case there's not a particularly clear line in my mind
> about where to draw that line.

I don't have an objection in principle to adding more smarts to
predtest.c. However, we should be wary of slowing down cases where
no BooleanTests are present to be optimized. I wonder if it could
help to use a switch on nodeTag rather than a series of if(IsA())
tests. (I'd be inclined to rewrite the inner if-then-else chains
as switches too, really. You get some benefit from the compiler
noticing whether you've covered all the enum values.)
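
The compiler-checking benefit mentioned above can be illustrated outside PostgreSQL. In the sketch below, DemoTag and describe_tag are hypothetical stand-ins, not the real NodeTag enum or any predtest.c function. When a switch over an enum has no default label, GCC and Clang can warn (-Wswitch, enabled by -Wall) if a newly added enum value has no case, which a chain of if tests cannot do.

```c
#include <stddef.h>

/* Hypothetical node tags; the real NodeTag enum is far larger. */
typedef enum { DEMO_OP_EXPR, DEMO_NULL_TEST, DEMO_BOOLEAN_TEST } DemoTag;

/*
 * Dispatch on the tag with a switch.  There is deliberately no default
 * label, so adding a DemoTag value without a matching case can trigger
 * a -Wswitch warning at compile time.
 */
static const char *describe_tag(DemoTag tag)
{
    switch (tag)
    {
        case DEMO_OP_EXPR:
            return "operator expression";
        case DEMO_NULL_TEST:
            return "IS [NOT] NULL test";
        case DEMO_BOOLEAN_TEST:
            return "IS [NOT] TRUE/FALSE/UNKNOWN test";
    }
    return NULL;    /* reached only for an out-of-range value */
}
```

This only helps for small enums where listing every value is practical; for a switch over something as large as the node tag set, a default label is still needed.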

I note you've actively broken the function's ability to cope with
NULL input pointers. Maybe we don't need it to, but I'm not going
to accept a patch that just side-swipes that case without any
justification.

Another way in which the patch needs more effort is that you've
not bothered to update the large comment block atop the function.
Perhaps, rather than hoping people will notice comments that are
potentially offscreen from what they're modifying, we should relocate
those comment paras to be adjacent to the relevant parts of the
function?

I've not gone through the patch in detail to see whether I believe
the proposed proof rules. It would help to have more comments
justifying them.

> As noted in a TODO in the patch itself, I think it may be worth refactoring
> the test_predtest module to run the "x, y" case as well as the "y, x" case
> with a single call so as to eliminate a lot of repetition in
> clause/expression test cases. If reviewers agree that's desirable, then I
> could do that as a precursor.

I think that's actively undesirable. It is not typically the case that
a proof rule for A => B also works in the other direction, so this would
encourage wasting cycles in the tests. I fear it might also cause
confusion about which direction a proof rule is supposed to work in.

regards, tom lane


From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2023-12-14 00:35:01
Message-ID: CAAaqYe-CGx7sEm1QRNaY=Pfti4KinLOPtgVr4LKoXY6wLa68Ag@mail.gmail.com
Lists: pgsql-hackers

Thanks for taking a look!

On Wed, Dec 13, 2023 at 1:36 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> James Coleman <jtc331(at)gmail(dot)com> writes:
> > Attached is a patch that solves that issue. It also teaches predtest about
> > quite a few more cases involving BooleanTest expressions (e.g., how they
> > relate to NullTest expressions). One thing I could imagine being an
> > objection is that not all of these warrant cycles in planning. If that
> > turns out to be the case there's not a particularly clear line in my mind
> > about where to draw that line.
>
> I don't have an objection in principle to adding more smarts to
> predtest.c. However, we should be wary of slowing down cases where
> no BooleanTests are present to be optimized. I wonder if it could
> help to use a switch on nodeTag rather than a series of if(IsA())
> tests. (I'd be inclined to rewrite the inner if-then-else chains
> as switches too, really. You get some benefit from the compiler
> noticing whether you've covered all the enum values.)

I think I could take this on; would you prefer it as a patch in this
series? Or as a new patch thread?

> I note you've actively broken the function's ability to cope with
> NULL input pointers. Maybe we don't need it to, but I'm not going
> to accept a patch that just side-swipes that case without any
> justification.

I should have explained that. I don't think I've broken it:

1. predicate_implied_by_simple_clause() is only ever called by
predicate_implied_by_recurse()
2. predicate_implied_by_recurse() starts with:
pclass = predicate_classify(predicate, &pred_info);
3. predicate_classify(Node *clause, PredIterInfo info) starts off with:
Assert(clause != NULL);

I believe this means we are currently guaranteed by the caller to
receive a non-NULL pointer, but I could be missing something.

The same argument (just substituting the equivalent "refute" function
names) applies to predicate_refuted_by_simple_clause().

> Another way in which the patch needs more effort is that you've
> not bothered to update the large comment block atop the function.
> Perhaps, rather than hoping people will notice comments that are
> potentially offscreen from what they're modifying, we should relocate
> those comment paras to be adjacent to the relevant parts of the
> function?

Splitting up that block comment makes sense to me.

> I've not gone through the patch in detail to see whether I believe
> the proposed proof rules. It would help to have more comments
> justifying them.

Most of them are sufficiently simple -- e.g., X IS TRUE implies X --
that I don't think there's a lot to say in justification. In some
cases I've noted the cases that force only strong or weak implication.

There are a few cases, though, (e.g., "X is unknown weakly implies X
is not true") that, reading over this again, don't immediately strike
me as obvious, so I'll expand on those.

> > As noted in a TODO in the patch itself, I think it may be worth refactoring
> > the test_predtest module to run the "x, y" case as well as the "y, x" case
> > with a single call so as to eliminate a lot of repetition in
> > clause/expression test cases. If reviewers agree that's desirable, then I
> > could do that as a precursor.
>
> I think that's actively undesirable. It is not typically the case that
> a proof rule for A => B also works in the other direction, so this would
> encourage wasting cycles in the tests. I fear it might also cause
> confusion about which direction a proof rule is supposed to work in.

That makes sense in the general case.

Boolean expressions seem like a special case in that regard: (subject
to what it looks like) would you be OK with a wrapping function that
does both directions (with output that shows which direction is being
tested) used only for the cases where we do want to check both
directions?

Thanks,
James Coleman


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2023-12-14 21:38:17
Message-ID: [email protected]
Lists: pgsql-hackers

James Coleman <jtc331(at)gmail(dot)com> writes:
> On Wed, Dec 13, 2023 at 1:36 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I don't have an objection in principle to adding more smarts to
>> predtest.c. However, we should be wary of slowing down cases where
>> no BooleanTests are present to be optimized. I wonder if it could
>> help to use a switch on nodeTag rather than a series of if(IsA())
>> tests. (I'd be inclined to rewrite the inner if-then-else chains
>> as switches too, really. You get some benefit from the compiler
>> noticing whether you've covered all the enum values.)

> I think I could take this on; would you prefer it as a patch in this
> series? Or as a new patch thread?

No, keep it in the same thread (and make a CF entry, if you didn't
already). It might be best to make a series of 2 patches, first
just refactoring what's there per this discussion, and then a
second one to add BooleanTest logic.

>> I note you've actively broken the function's ability to cope with
>> NULL input pointers. Maybe we don't need it to, but I'm not going
>> to accept a patch that just side-swipes that case without any
>> justification.

> [ all callers have previously used predicate_classify ]

OK, fair enough. The checks for nulls are probably from ancient
habit, but I agree we could remove 'em here.

>> Perhaps, rather than hoping people will notice comments that are
>> potentially offscreen from what they're modifying, we should relocate
>> those comment paras to be adjacent to the relevant parts of the
>> function?

> Splitting up that block comment makes sense to me.

Done, let's make it so.

>> I've not gone through the patch in detail to see whether I believe
>> the proposed proof rules. It would help to have more comments
>> justifying them.

> Most of them are sufficiently simple -- e.g., X IS TRUE implies X --
> that I don't think there's a lot to say in justification. In some
> cases I've noted the cases that force only strong or weak implication.

Yeah, it's the strong-vs-weak distinction that makes me cautious here.
One's high-school-algebra instinct for what's obviously true tends to
not think about NULL/UNKNOWN, and you do have to consider that.
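
The strong-vs-weak distinction can be checked by brute force over the three possible values of X. The sketch below is standalone C with hypothetical names (TriBool, TriExpr, implies, expr_x, expr_x_is_true); it is not how predtest.c works, which reasons over expression trees. It assumes the usual definitions: strong implication means truth of A implies truth of B, and weak implication means non-falsity of A implies non-falsity of B.

```c
#include <stdbool.h>

/* Hypothetical model of a nullable SQL boolean. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;
typedef TriBool (*TriExpr)(TriBool x);

/* The bare expression "X". */
static TriBool expr_x(TriBool x)
{
    return x;
}

/* "X IS TRUE": true for a true input, false for false or null. */
static TriBool expr_x_is_true(TriBool x)
{
    return x == TV_TRUE ? TV_TRUE : TV_FALSE;
}

/*
 * Strong implication: whenever A is true, B is true.
 * Weak implication: whenever A is true or null, B is true or null.
 */
static bool implies(TriExpr a, TriExpr b, bool weak)
{
    const TriBool values[] = { TV_FALSE, TV_TRUE, TV_NULL };

    for (int i = 0; i < 3; i++)
    {
        TriBool av = a(values[i]);
        TriBool bv = b(values[i]);

        if (weak ? (av != TV_FALSE && bv == TV_FALSE)
                 : (av == TV_TRUE && bv != TV_TRUE))
            return false;   /* found a counterexample */
    }
    return true;
}
```

Under this model "X IS TRUE" implies "X" both strongly and weakly, while "X" implies "X IS TRUE" only strongly: a null X is non-false, but "X IS TRUE" is then false.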

>>> As noted in a TODO in the patch itself, I think it may be worth refactoring
>>> the test_predtest module to run the "x, y" case as well as the "y, x" case

>> I think that's actively undesirable. It is not typically the case that
>> a proof rule for A => B also works in the other direction, so this would
>> encourage wasting cycles in the tests. I fear it might also cause
>> confusion about which direction a proof rule is supposed to work in.

> That makes sense in the general case.

> Boolean expressions seem like a special case in that regard: (subject
> to what it looks like) would you be OK with a wrapping function that
> does both directions (with output that shows which direction is being
> tested) used only for the cases where we do want to check both
> directions?

Using a wrapper where appropriate would remove the inefficiency
concern, but I still worry whether it will promote confusion about
which direction we're proving things in. You'll need to be very clear
about the labeling.
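
One way to keep the direction unambiguous is for such a wrapper to print the premise and conclusion of each proof explicitly. The following is only a sketch in standalone C under assumed names (NamedExpr, strongly_implies, report_both_directions); it does not reflect the actual test_predtest interface, which evaluates SQL queries rather than C callbacks.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a nullable SQL boolean. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;
typedef TriBool (*TriExpr)(TriBool x);

/* An expression paired with the text used to label it in the report. */
typedef struct
{
    const char *label;
    TriExpr     eval;
} NamedExpr;

static TriBool eval_x(TriBool x)
{
    return x;
}

static TriBool eval_x_is_not_null(TriBool x)
{
    return x == TV_NULL ? TV_FALSE : TV_TRUE;
}

/* Strong implication only: whenever "from" is true, "to" must be true. */
static bool strongly_implies(TriExpr from, TriExpr to)
{
    const TriBool values[] = { TV_FALSE, TV_TRUE, TV_NULL };

    for (int i = 0; i < 3; i++)
        if (from(values[i]) == TV_TRUE && to(values[i]) != TV_TRUE)
            return false;
    return true;
}

/* Writes one line per direction, each naming its premise first. */
static int report_both_directions(char *buf, size_t buflen,
                                  NamedExpr a, NamedExpr b)
{
    return snprintf(buf, buflen,
                    "\"%s\" => \"%s\": %s\n\"%s\" => \"%s\": %s\n",
                    a.label, b.label,
                    strongly_implies(a.eval, b.eval) ? "t" : "f",
                    b.label, a.label,
                    strongly_implies(b.eval, a.eval) ? "t" : "f");
}
```

Because every output line restates both expressions in order, a reader of the expected-output file does not have to remember which argument position was the clause and which was the predicate.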

regards, tom lane


From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2023-12-22 15:00:50
Message-ID: CAAaqYe_H=_SsisUuJUtTfsyw50+o+iqJtMgKd1B0qBAv3V_WNw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 14, 2023 at 4:38 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> James Coleman <jtc331(at)gmail(dot)com> writes:
> > On Wed, Dec 13, 2023 at 1:36 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I don't have an objection in principle to adding more smarts to
> >> predtest.c. However, we should be wary of slowing down cases where
> >> no BooleanTests are present to be optimized. I wonder if it could
> >> help to use a switch on nodeTag rather than a series of if(IsA())
> >> tests. (I'd be inclined to rewrite the inner if-then-else chains
> >> as switches too, really. You get some benefit from the compiler
> >> noticing whether you've covered all the enum values.)
>
> > I think I could take this on; would you prefer it as a patch in this
> > series? Or as a new patch thread?
>
> No, keep it in the same thread (and make a CF entry, if you didn't
> already). It might be best to make a series of 2 patches, first
> just refactoring what's there per this discussion, and then a
> second one to add BooleanTest logic.

CF entry is already created; I'll keep it here then.

> >> I note you've actively broken the function's ability to cope with
> >> NULL input pointers. Maybe we don't need it to, but I'm not going
> >> to accept a patch that just side-swipes that case without any
> >> justification.
>
> > [ all callers have previously used predicate_classify ]
>
> OK, fair enough. The checks for nulls are probably from ancient
> habit, but I agree we could remove 'em here.
>
> >> Perhaps, rather than hoping people will notice comments that are
> >> potentially offscreen from what they're modifying, we should relocate
> >> those comment paras to be adjacent to the relevant parts of the
> >> function?
>
> > Splitting up that block comment makes sense to me.
>
> Done, let's make it so.
>
> >> I've not gone through the patch in detail to see whether I believe
> >> the proposed proof rules. It would help to have more comments
> >> justifying them.
>
> > Most of them are sufficiently simple -- e.g., X IS TRUE implies X --
> > that I don't think there's a lot to say in justification. In some
> > cases I've noted the cases that force only strong or weak implication.
>
> Yeah, it's the strong-vs-weak distinction that makes me cautious here.
> One's high-school-algebra instinct for what's obviously true tends to
> not think about NULL/UNKNOWN, and you do have to consider that.
>
> >>> As noted in a TODO in the patch itself, I think it may be worth refactoring
> >>> the test_predtest module to run the "x, y" case as well as the "y, x" case
>
> >> I think that's actively undesirable. It is not typically the case that
> >> a proof rule for A => B also works in the other direction, so this would
> >> encourage wasting cycles in the tests. I fear it might also cause
> >> confusion about which direction a proof rule is supposed to work in.
>
> > That makes sense in the general case.
>
> > Boolean expressions seem like a special case in that regard: (subject
> > to what it looks like) would you be OK with a wrapping function that
> > does both directions (with output that shows which direction is being
> > tested) used only for the cases where we do want to check both
> > directions?
>
> Using a wrapper where appropriate would remove the inefficiency
> concern, but I still worry whether it will promote confusion about
> which direction we're proving things in. You'll need to be very clear
> about the labeling.

I've not yet applied all of your feedback, but I wanted to get an
initial read on your thoughts on how using switch statements ends up
looking. Attached is a single (pure refactor) patch that converts the
various if/else levels that check things like node tag and
boolean/null test type into switch statements. I've retained 'default'
keyword usages for now for simplicity (my intuition is that we
generally prefer to list out all options for compiler safety benefits,
though I'm not 100% sure that's useful in the outer node tag check
since it's unlikely that someone adding a new node would modify
this...).

My big question is: are you comfortable with the indentation explosion
this creates? IMO it's a lot wordier, but it is also more obvious what
the structural goal is. I'm not sure how we want to make the right
trade-off though.

Once there's agreement on this part, I'll add back a second patch
applying my changes on top of the refactor as well as apply other
feedback (e.g., splitting up the block comment).

Regards,
James Coleman

Attachment Content-Type Size
v2-0001-WIP-use-switch-statements.patch application/octet-stream 7.7 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2023-12-22 19:48:29
Message-ID: [email protected]
Lists: pgsql-hackers

James Coleman <jtc331(at)gmail(dot)com> writes:
> I've not yet applied all of your feedback, but I wanted to get an
> initial read on your thoughts on how using switch statements ends up
> looking. Attached is a single (pure refactor) patch that converts the
> various if/else levels that check things like node tag and
> boolean/null test type into switch statements. I've retained 'default'
> keyword usages for now for simplicity (my intuition is that we
> generally prefer to list out all options for compiler safety benefits,
> though I'm not 100% sure that's useful in the outer node tag check
> since it's unlikely that someone adding a new node would modify
> this...).

> My big question is: are you comfortable with the indentation explosion
> this creates? IMO it's a lot wordier, but it is also more obvious what
> the structural goal is. I'm not sure how we want to make the right
> trade-off though.

Yeah, I see what you mean. Also, I'd wanted to shove most of
the text in the function header in-line and get rid of the short
restatements of those paras. I carried that through just for
predicate_implied_by_simple_clause, as attached. The structure is
definitely clearer, but we end up with an awful lot of indentation,
which makes the comments less readable than I'd like. (I did some
minor rewording to make them flow better.)

On balance I think this is probably better than what we have, but
maybe we'd be best off to avoid doubly nested switches? I think
there's a good argument for the outer switch on nodeTag, but
maybe we're getting diminishing returns from an inner switch.

regards, tom lane

Attachment Content-Type Size
WIP-refactor-predicate_implied_by_simple_clause.patch text/x-diff 7.0 KB

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-01-18 00:34:36
Message-ID: CAAaqYe93AkueRS-=yH7VgD+6c2gz+ip=oW3nWSqZXmf-ieW4hQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 22, 2023 at 2:48 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> James Coleman <jtc331(at)gmail(dot)com> writes:
> > I've not yet applied all of your feedback, but I wanted to get an
> > initial read on your thoughts on how using switch statements ends up
> > looking. Attached is a single (pure refactor) patch that converts the
> > various if/else levels that check things like node tag and
> > boolean/null test type into switch statements. I've retained 'default'
> > keyword usages for now for simplicity (my intuition is that we
> > generally prefer to list out all options for compiler safety benefits,
> > though I'm not 100% sure that's useful in the outer node tag check
> > since it's unlikely that someone adding a new node would modify
> > this...).
>
> > My big question is: are you comfortable with the indentation explosion
> > this creates? IMO it's a lot wordier, but it is also more obvious what
> > the structural goal is. I'm not sure how we want to make the right
> > trade-off though.
>
> Yeah, I see what you mean. Also, I'd wanted to shove most of
> the text in the function header in-line and get rid of the short
> restatements of those paras. I carried that through just for
> predicate_implied_by_simple_clause, as attached. The structure is
> definitely clearer, but we end up with an awful lot of indentation,
> which makes the comments less readable than I'd like. (I did some
> minor rewording to make them flow better.)
>
> On balance I think this is probably better than what we have, but
> maybe we'd be best off to avoid doubly nested switches? I think
> there's a good argument for the outer switch on nodeTag, but
> maybe we're getting diminishing returns from an inner switch.
>
> regards, tom lane
>

Apologies for the long delay.

Attached is a new patch series.

0001 does the initial pure refactor. 0002 makes a lot of modifications
to what we can prove about implication and refutation. Finally, 0003
isn't intended to be committed, but attempts to validate more
holistically that none of the changes creates any invalid proofs
beyond the mostly happy-path tests added in 0002.

I ended up not tackling changing how test_predtest tests run for now.
That's plausibly still useful, and I'd be happy to add that if you
generally agree with the direction of the patch and with that
abstraction being useful.

I added some additional verifications to the test_predtest module to
prevent additional obvious flaws.

Regards,
James Coleman

Attachment Content-Type Size
v4-0003-Add-temporary-all-permutations-test.patch application/octet-stream 32.9 KB
v4-0001-Use-switch-statements-in-predicate_-implied-refut.patch application/octet-stream 14.6 KB
v4-0002-Teach-predtest.c-about-BooleanTest.patch application/octet-stream 51.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-01-22 17:57:41
Message-ID: [email protected]
Lists: pgsql-hackers

James Coleman <jtc331(at)gmail(dot)com> writes:
> 0001 does the initial pure refactor. 0002 makes a lot of modifications
> to what we can prove about implication and refutation. Finally, 0003
> isn't intended to be committed, but attempts to validate more
> holistically that none of the changes creates any invalid proofs
> beyond the mostly happy-path tests added in 0002.

> I ended up not tackling changing how test_predtest tests run for now.
> That's plausibly still useful, and I'd be happy to add that if you
> generally agree with the direction of the patch and with that
> abstraction being useful.

> I added some additional verifications to the test_predtest module to
> prevent additional obvious flaws.

I looked through 0001 and made some additional cosmetic changes,
mostly to get comments closer to the associated code; I also
ran pgindent on it (see v5-0001 attached). That seems pretty
committable to me at this point. I also like your 0002 additions to
test_predtest.c (although why the mixture of ERROR and WARNING?
ISTM they should all be WARNING, so we can press on with the test).

One other thought is that maybe separating out
predicate_implied_not_null_by_clause should be part of 0001?

I'm less excited about the rest of v4-0002.

@@ -740,6 +747,16 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
!weak))
return true;

+ /*
+ * Because weak refutation expands the allowed outcomes for B
+ * from "false" to "false or null", we can additionally prove
+ * weak refutation in the case that strong refutation is proven.
+ */
+ if (weak && not_arg &&
+ predicate_implied_by_recurse(predicate, not_arg,
+ true))
+ return true;
+
switch (pclass)
{
case CLASS_AND:

I don't buy this bit at all. If the prior recursive call fails to
prove weak refutation in a case where strong refutation holds, how is
that not a bug lower down? Moreover, in order to mask such a bug,
you're doubling the time taken by failed proofs, which is an
unfortunate thing --- we don't like spending a lot of time on
something that fails to improve the plan.

@@ -1138,32 +1155,114 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
Assert(list_length(op->args) == 2);
rightop = lsecond(op->args);

- /*
- * We might never see a null Const here, but better check
- * anyway
- */
- if (rightop && IsA(rightop, Const) &&
- !((Const *) rightop)->constisnull)
+ if (rightop && IsA(rightop, Const))
{
+ Const *constexpr = (Const *) rightop;
Node *leftop = linitial(op->args);

- if (DatumGetBool(((Const *) rightop)->constvalue))
- {
- /* X = true implies X */
- if (equal(predicate, leftop))
- return true;
- }
+ if (constexpr->constisnull)
+ return false;
+
+ if (DatumGetBool(constexpr->constvalue))
+ return equal(predicate, leftop);
else
- {
- /* X = false implies NOT X */
- if (is_notclause(predicate) &&
- equal(get_notclausearg(predicate), leftop))
- return true;
- }
+ return is_notclause(predicate) &&
+ equal(get_notclausearg(predicate), leftop);
}
}
}
break;

I don't understand what this bit is doing ... and the fact that
the patch removes all the existing comments and adds none isn't
helping that. What it seems to mostly be doing is adding early
"return false"s, which I'm not sure is a good thing, because
it seems possible that operator_predicate_proof could apply here.

+ case IS_UNKNOWN:
+ /*
+ * When the clause is in the form "foo IS UNKNOWN" then
+ * we can prove weak implication of a predicate that
+ * is strict for "foo" and negated. This doesn't work
+ * for strong implication since if "foo" is "null" then
+ * the predicate will evaluate to "null" rather than
+ * "true".
+ */

The phrasing of this comment seems randomly inconsistent with others
making similar arguments.

+ case IS_TRUE:
/*
- * If the predicate is of the form "foo IS NOT NULL",
- * and we are considering strong implication, we can
- * conclude that the predicate is implied if the
- * clause is strict for "foo", i.e., it must yield
- * false or NULL when "foo" is NULL. In that case
- * truth of the clause ensures that "foo" isn't NULL.
- * (Again, this is a safe conclusion because "foo"
- * must be immutable.) This doesn't work for weak
- * implication, though. Also, "row IS NOT NULL" does
- * not act in the simple way we have in mind.
+ * X implies X is true
+ *
+ * We can only prove strong implication here since
+ * `null is true` is false rather than null.
*/

This hardly seems like an improvement on the comment. (Also, here and
elsewhere, could we avoid using two different types of quotes?)

+ /* X is unknown weakly implies X is not true */
+ if (weak && clausebtest->booltesttype == IS_UNKNOWN &&
+ equal(clausebtest->arg, predbtest->arg))
+ return true;

Maybe I'm confused, but why is it only weak?

+ /*
+ * When we know what the predicate is in the form
+ * "foo IS UNKNOWN" then we can prove strong and
+ * weak refutation together. This is because the
+ * limits imposed by weak refutation (allowing
+ * "false" instead of just "null") is equivalently
+ * helpful since "foo" being "false" also refutes
+ * the predicate. Hence we pass weak=false here
+ * always.
+ */

This comment doesn't make sense to me either.

+ /* TODO: refactor this into switch statements also? */

Let's drop the TODO comments.

+ /*
+ * We can recurse into "not foo" without any additional processing because
+ * "not (null)" evaluates to null. That doesn't work for allow_false,
+ * however, since "not (false)" is true rather than null.
+ */
+ if (is_notclause(clause) &&
+ clause_is_strict_for((Node *) get_notclausearg(clause), subexpr, false))
+ return true;

Not exactly convinced by this. The way the comment is written, I'd
expect to not call clause_is_strict_for at all if allow_false. If
it's okay to call it anyway and pass allow_false = false, you need
to defend that, which this comment isn't doing.
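
One possible reading of why passing allow_false = false is safe can be checked with a small model. The sketch below is standalone C with hypothetical names (TriBool, strict_for_null_input and the clause_* functions); it only models the question clause_is_strict_for asks, namely what a clause yields when "foo" is NULL, and is not the real function.

```c
#include <stdbool.h>

/* Hypothetical model of a nullable SQL boolean. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;
typedef TriBool (*TriExpr)(TriBool foo);

/* SQL NOT: null stays null, true and false swap. */
static TriBool tri_not(TriBool v)
{
    if (v == TV_NULL)
        return TV_NULL;
    return v == TV_TRUE ? TV_FALSE : TV_TRUE;
}

/*
 * Does the clause yield NULL when foo is NULL?  With allow_false, a
 * FALSE result is accepted as well.
 */
static bool strict_for_null_input(TriExpr clause, bool allow_false)
{
    TriBool result = clause(TV_NULL);

    return result == TV_NULL || (allow_false && result == TV_FALSE);
}

/* "foo": NULL for NULL input. */
static TriBool clause_foo(TriBool foo)
{
    return foo;
}

/* "foo IS TRUE": FALSE for NULL input. */
static TriBool clause_foo_is_true(TriBool foo)
{
    return foo == TV_TRUE ? TV_TRUE : TV_FALSE;
}

/* "NOT foo": NULL for NULL input. */
static TriBool clause_not_foo(TriBool foo)
{
    return tri_not(clause_foo(foo));
}

/* "NOT (foo IS TRUE)": TRUE for NULL input. */
static TriBool clause_not_foo_is_true(TriBool foo)
{
    return tri_not(clause_foo_is_true(foo));
}
```

In this model, an argument that is NULL-strict stays NULL under NOT, and a NULL result satisfies the caller whether or not it allows FALSE. An argument that is strict only in the allow_false sense turns into TRUE under NOT, which is why the recursion would have to demand the stricter property.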

regards, tom lane

Attachment Content-Type Size
v5-0001-Use-switch-statements-in-predicate_-implied-refut.patch text/x-diff 14.3 KB

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-01-25 01:08:47
Message-ID: CAAaqYe8CRCdMceeWkbGbrjQEyiP1S21hZYCKjuu=YFygpfJJdQ@mail.gmail.com
Lists: pgsql-hackers

Thanks for the feedback.

On Mon, Jan 22, 2024 at 12:57 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> James Coleman <jtc331(at)gmail(dot)com> writes:
> > 0001 does the initial pure refactor. 0002 makes a lot of modifications
> > to what we can prove about implication and refutation. Finally, 0003
> > isn't intended to be committed, but attempts to validate more
> > holistically that none of the changes creates any invalid proofs
> > beyond the mostly happy-path tests added in 0002.
>
> > I ended up not tackling changing how test_predtest tests run for now.
> > That's plausibly still useful, and I'd be happy to add that if you
> > generally agree with the direction of the patch and with that
> > abstraction being useful.
>
> > I added some additional verifications to the test_predtest module to
> > prevent additional obvious flaws.
>
> I looked through 0001 and made some additional cosmetic changes,
> mostly to get comments closer to the associated code; I also
> ran pgindent on it (see v5-0001 attached). That seems pretty
> committable to me at this point.

Great.

> I also like your 0002 additions to
> test_predtest.c (although why the mixture of ERROR and WARNING?
> ISTM they should all be WARNING, so we can press on with the test).

My reasoning is that one is a major error in something larger than
predtest, while the other is clearly "your code change isn't
accurate". The surrounding code seems to be drawing a distinction also
(it uses both ERROR and WARNING), and so I was trying to parallel that
appropriately.

I'm fine with making both WARNING though.

But does that also mean we should make other such cases WARNING as
well? For example, the query not returning two boolean columns doesn't
really seem like a reason to break subsequent tests.

I haven't changed this yet pending these questions.

> One other thought is that maybe separating out
> predicate_implied_not_null_by_clause should be part of 0001?

Would you prefer to commit a refactor along with some functionality
changes? Or one patch with the pure refactor and then a second patch
with the predicate_implied_not_null_by_clause changes?

> I'm less excited about the rest of v4-0002.
>
> @@ -740,6 +747,16 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
> !weak))
> return true;
>
> + /*
> + * Because weak refutation expands the allowed outcomes for B
> + * from "false" to "false or null", we can additionally prove
> + * weak refutation in the case that strong refutation is proven.
> + */
> + if (weak && not_arg &&
> + predicate_implied_by_recurse(predicate, not_arg,
> + true))
> + return true;
> +
> switch (pclass)
> {
> case CLASS_AND:
>
> I don't buy this bit at all. If the prior recursive call fails to
> prove weak refutation in a case where strong refutation holds, how is
> that not a bug lower down?

This is one of the last additions I made while authoring the most
recent version of the patch, and at first I thought it suggested a bug
lower down also.

However, the cases proven by these lines ("x is not false" is weakly
refuted by "not x", "x is false", and "x = false") correctly do not
have their not-arg ("x") strongly implied by "x is not false": strong
implication would require both "x is null" and "x is true" to imply
"x", which obviously doesn't hold. These aren't cases we're handling
directly in predicate_refuted_by_simple_clause.

This is caused by the asymmetry between implication and refutation
that I noted in my addition to the comments nearer the top of the
file:

+ * A notable difference between implication and refutation proofs is that
+ * strong/weak refutations don't vary the input of A (both must be true) but
+ * vary the allowed outcomes of B (false vs. non-truth), while for implications
+ * we vary both A (truth vs. non-falsity) and B (truth vs. non-falsity).
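
To make the asymmetry concrete, here is a minimal brute-force sketch (not code from the patch) that checks the proof types over all three values of x. The TriBool type, the expr_* functions, and implies()/refutes() are names invented for this illustration. It assumes the usual predtest.c definitions: strong implication is "A true => B true", weak implication is "A not false => B not false", strong refutation is "A true => B false", and weak refutation is "A true => B not true".

```c
#include <stdbool.h>

/* SQL three-valued logic: a boolean expression is false, true, or null. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* Illustrative stand-in for an expression over one boolean column x. */
typedef TriBool (*Expr)(TriBool x);

static TriBool expr_x(TriBool x) { return x; }

static TriBool expr_not_x(TriBool x)
{
    if (x == TV_NULL)
        return TV_NULL;
    return (x == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

/* "x IS NOT FALSE" never yields null: it is true for x = true and x = null. */
static TriBool expr_x_is_not_false(TriBool x)
{
    return (x == TV_FALSE) ? TV_FALSE : TV_TRUE;
}

/* Strong: A true => B true.  Weak: A not false => B not false. */
static bool implies(Expr a, Expr b, bool weak)
{
    for (int v = TV_FALSE; v <= TV_NULL; v++)
    {
        TriBool av = a((TriBool) v);
        TriBool bv = b((TriBool) v);

        if (weak ? (av != TV_FALSE && bv == TV_FALSE)
                 : (av == TV_TRUE && bv != TV_TRUE))
            return false;
    }
    return true;
}

/* Strong: A true => B false.  Weak: A true => B not true. */
static bool refutes(Expr a, Expr b, bool weak)
{
    for (int v = TV_FALSE; v <= TV_NULL; v++)
    {
        TriBool av = a((TriBool) v);
        TriBool bv = b((TriBool) v);

        if (av == TV_TRUE && (weak ? bv == TV_TRUE : bv != TV_FALSE))
            return false;
    }
    return true;
}
```

Under these definitions, refutes(expr_not_x, expr_x_is_not_false, true) holds, implies(expr_x_is_not_false, expr_x, false) does not (x = null makes the clause true while x itself is null), and implies(expr_x_is_not_false, expr_x, true) does.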

Put another way in the comments I added in test_predtest.c:

+ /* Because weak refutation proofs are a strict subset of strong refutation
+ * proofs (since for "A => B" "A" is always true) we ought never have strong
+ * refutation hold when weak refutation does not.
+ *
+ * We can't make the same assertion for implication since moving from strong
+ * to weak implication expands the allowed values of "A" from true to either
+ * true or NULL.
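
The invariant described in that comment can be checked exhaustively on a toy model. The sketch below is an illustration only, not the test added to test_predtest.c: it models every expression over a single boolean input by its three-entry truth table (all names here, such as Table and proof_holds, are made up), then counts the pairs where a strong proof holds but the corresponding weak proof does not.

```c
#include <stdbool.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/*
 * An expression over one boolean input x, modelled by its truth table:
 * t[0], t[1], t[2] are its values for x = false, true, null.
 */
typedef struct { TriBool t[3]; } Table;

/* Decode n (0..26) into one of the 27 possible truth tables. */
static Table nth_table(int n)
{
    Table tab;

    for (int i = 0; i < 3; i++)
    {
        tab.t[i] = (TriBool) (n % 3);
        n /= 3;
    }
    return tab;
}

/* Check one of the four proof types by enumerating the inputs. */
static bool proof_holds(Table a, Table b, bool refutation, bool weak)
{
    for (int i = 0; i < 3; i++)
    {
        TriBool av = a.t[i];
        TriBool bv = b.t[i];
        bool a_holds;
        bool b_ok;

        if (refutation)
        {
            /* A must be true for both strengths; only B's target varies. */
            a_holds = (av == TV_TRUE);
            b_ok = weak ? (bv != TV_TRUE) : (bv == TV_FALSE);
        }
        else
        {
            /* For implication, both A's and B's conditions vary. */
            a_holds = weak ? (av != TV_FALSE) : (av == TV_TRUE);
            b_ok = weak ? (bv != TV_FALSE) : (bv == TV_TRUE);
        }
        if (a_holds && !b_ok)
            return false;
    }
    return true;
}

/* Count pairs (A, B) where the strong proof holds but the weak one fails. */
static int count_strong_without_weak(bool refutation)
{
    int count = 0;

    for (int a = 0; a < 27; a++)
        for (int b = 0; b < 27; b++)
            if (proof_holds(nth_table(a), nth_table(b), refutation, false) &&
                !proof_holds(nth_table(a), nth_table(b), refutation, true))
                count++;
    return count;
}
```

In this model count_strong_without_weak(true) is zero, since "B false" always satisfies "B not true". For implication the count is nonzero; one such pair is A = "x", B = "x IS TRUE", where x = null makes A non-false while B is false.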

We could decide to handle this particular failing case explicitly in
predicate_refuted_by_simple_clause as opposed to inferring it by
whether or not implication by the not-arg holds, but I suspect that
leaves us open to other cases we should be able to prove refutation
for but don't.

Alternatively (to avoid unnecessary CPU burn) we could modify
predicate_implied_by_recurse (and the functions called by it) to take
an argument richer than "weak = true/false", e.g., an enum that allows
for something like "WEAK", "STRONG", and "EITHER". That's a bigger
change, so I didn't want to do that right away unless there was
agreement on that direction.
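
As a rough sketch of what such an argument could look like (the ProofMode enum, its member names, and implied_by() are hypothetical and not taken from any posted patch), the toy prover below evaluates both strengths in one pass over truth tables, so asking for "either" does not require a second traversal:

```c
#include <stdbool.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* Hypothetical replacement for the "bool weak" argument. */
typedef enum { PROOF_STRONG, PROOF_WEAK, PROOF_EITHER } ProofMode;

/* Truth tables indexed by x = false, true, null. */
static const TriBool X[3] = { TV_FALSE, TV_TRUE, TV_NULL };
static const TriBool X_IS_NOT_FALSE[3] = { TV_FALSE, TV_TRUE, TV_TRUE };

/*
 * Toy stand-in for an implication proof: one pass over the inputs tracks
 * both strengths, so PROOF_EITHER costs no second traversal.
 */
static bool implied_by(const TriBool clause[3], const TriBool predicate[3],
                       ProofMode mode)
{
    bool strong_ok = true;
    bool weak_ok = true;

    for (int i = 0; i < 3; i++)
    {
        /* Strong: clause true => predicate true. */
        if (clause[i] == TV_TRUE && predicate[i] != TV_TRUE)
            strong_ok = false;
        /* Weak: clause not false => predicate not false. */
        if (clause[i] != TV_FALSE && predicate[i] == TV_FALSE)
            weak_ok = false;
    }

    switch (mode)
    {
        case PROOF_STRONG:
            return strong_ok;
        case PROOF_WEAK:
            return weak_ok;
        case PROOF_EITHER:
            return strong_ok || weak_ok;
    }
    return false;
}
```

With the clause "x is not false" and the predicate "x", implied_by(X_IS_NOT_FALSE, X, mode) fails for PROOF_STRONG but succeeds for PROOF_WEAK and PROOF_EITHER. The real prover recurses over expression trees rather than enumerating values, so this only illustrates the shape of the interface.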

I haven't changed this yet pending this discussion.

> Moreover, in order to mask such a bug,
> you're doubling the time taken by failed proofs, which is an
> unfortunate thing --- we don't like spending a lot of time on
> something that fails to improve the plan.

See above.

> @@ -1138,32 +1155,114 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
> Assert(list_length(op->args) == 2);
> rightop = lsecond(op->args);
>
> - /*
> - * We might never see a null Const here, but better check
> - * anyway
> - */
> - if (rightop && IsA(rightop, Const) &&
> - !((Const *) rightop)->constisnull)
> + if (rightop && IsA(rightop, Const))
> {
> + Const *constexpr = (Const *) rightop;
> Node *leftop = linitial(op->args);
>
> - if (DatumGetBool(((Const *) rightop)->constvalue))
> - {
> - /* X = true implies X */
> - if (equal(predicate, leftop))
> - return true;
> - }
> + if (constexpr->constisnull)
> + return false;
> +
> + if (DatumGetBool(constexpr->constvalue))
> + return equal(predicate, leftop);
> else
> - {
> - /* X = false implies NOT X */
> - if (is_notclause(predicate) &&
> - equal(get_notclausearg(predicate), leftop))
> - return true;
> - }
> + return is_notclause(predicate) &&
> + equal(get_notclausearg(predicate), leftop);
> }
> }
> }
> break;
>
> I don't understand what this bit is doing ... and the fact that
> the patch removes all the existing comments and adds none isn't
> helping that. What it seems to mostly be doing is adding early
> "return false"s, which I'm not sure is a good thing, because
> it seems possible that operator_predicate_proof could apply here.

I was mostly bringing it in line with the style I have elsewhere in
the patch by pulling out the Const* into a variable to avoid repeated
casting.

That being said, you're right that I didn't catch in the many
revisions along the way that I'd added unnecessary early returns and
lost the comments. Fixed both of those in the next version.

> + case IS_UNKNOWN:
> + /*
> + * When the clause is in the form "foo IS UNKNOWN" then
> + * we can prove weak implication of a predicate that
> + * is strict for "foo" and negated. This doesn't work
> + * for strong implication since if "foo" is "null" then
> + * the predicate will evaluate to "null" rather than
> + * "true".
> + */
>
> The phrasing of this comment seems randomly inconsistent with others
> making similar arguments.

Changed.

> + case IS_TRUE:
> /*
> - * If the predicate is of the form "foo IS NOT NULL",
> - * and we are considering strong implication, we can
> - * conclude that the predicate is implied if the
> - * clause is strict for "foo", i.e., it must yield
> - * false or NULL when "foo" is NULL. In that case
> - * truth of the clause ensures that "foo" isn't NULL.
> - * (Again, this is a safe conclusion because "foo"
> - * must be immutable.) This doesn't work for weak
> - * implication, though. Also, "row IS NOT NULL" does
> - * not act in the simple way we have in mind.
> + * X implies X is true
> + *
> + * We can only prove strong implication here since
> + * `null is true` is false rather than null.
> */
>
> This hardly seems like an improvement on the comment. (Also, here and
> elsewhere, could we avoid using two different types of quotes?)

I think the git diff is confusing here. The old comment was about a
predicate "foo IS NOT NULL", but the new comment is about a predicate
"foo IS TRUE".

I did fix the usage of backticks though.

> + /* X is unknown weakly implies X is not true */
> + if (weak && clausebtest->booltesttype == IS_UNKNOWN &&
> + equal(clausebtest->arg, predbtest->arg))
> + return true;
>
> Maybe I'm confused, but why is it only weak?

You're not confused; this seems like a mistake (same with the IS NOT
FALSE below it).
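
One way to see why the restriction to weak implication is unnecessary: an IS [NOT] <boolean> test never yields null, so when both clause and predicate are such tests on the same argument, the strong and weak conditions coincide. The following is a small illustrative sketch; the BoolTest enum and both functions are invented names that only mirror the six tests and are not PostgreSQL's own definitions.

```c
#include <stdbool.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* Illustrative mirror of the six IS [NOT] <boolean> tests. */
typedef enum
{
    BT_IS_TRUE, BT_IS_NOT_TRUE,
    BT_IS_FALSE, BT_IS_NOT_FALSE,
    BT_IS_UNKNOWN, BT_IS_NOT_UNKNOWN
} BoolTest;

/* Evaluate "x IS [NOT] ..."; the result is always true or false, never null. */
static bool eval_booltest(BoolTest test, TriBool x)
{
    switch (test)
    {
        case BT_IS_TRUE:        return x == TV_TRUE;
        case BT_IS_NOT_TRUE:    return x != TV_TRUE;
        case BT_IS_FALSE:       return x == TV_FALSE;
        case BT_IS_NOT_FALSE:   return x != TV_FALSE;
        case BT_IS_UNKNOWN:     return x == TV_NULL;
        case BT_IS_NOT_UNKNOWN: return x != TV_NULL;
    }
    return false;
}

/* Does "x <clause_test>" imply "x <pred_test>" for every value of x? */
static bool booltest_implies(BoolTest clause_test, BoolTest pred_test, bool weak)
{
    for (int v = TV_FALSE; v <= TV_NULL; v++)
    {
        bool c = eval_booltest(clause_test, (TriBool) v);
        bool p = eval_booltest(pred_test, (TriBool) v);

        /*
         * Strong: clause true => predicate true.
         * Weak: clause not false => predicate not false.
         * With two-valued results these are the same condition.
         */
        bool clause_holds = weak ? (c != false) : (c == true);
        bool pred_ok = weak ? (p != false) : (p == true);

        if (clause_holds && !pred_ok)
            return false;
    }
    return true;
}
```

Here booltest_implies(BT_IS_UNKNOWN, BT_IS_NOT_TRUE, weak) holds for both values of weak, while the converse direction fails because x = false satisfies IS NOT TRUE but not IS UNKNOWN.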

> + /*
> + * When we know what the predicate is in the form
> + * "foo IS UNKNOWN" then we can prove strong and
> + * weak refutation together. This is because the
> + * limits imposed by weak refutation (allowing
> + * "false" instead of just "null") is equivalently
> + * helpful since "foo" being "false" also refutes
> + * the predicate. Hence we pass weak=false here
> + * always.
> + */
>
> This comment doesn't make sense to me either.

I rewrote the comment in the attached revision; let me know if that helps.

> + /* TODO: refactor this into switch statements also? */
>
> Let's drop the TODO comments.

This one was meant to be a question for you in review: do we want to
make that change? Or are we content to leave it as-is?

Either way, removed.

> + /*
> + * We can recurse into "not foo" without any additional processing because
> + * "not (null)" evaluates to null. That doesn't work for allow_false,
> + * however, since "not (false)" is true rather than null.
> + */
> + if (is_notclause(clause) &&
> + clause_is_strict_for((Node *) get_notclausearg(clause), subexpr, false))
> + return true;
>
> Not exactly convinced by this. The way the comment is written, I'd
> expect to not call clause_is_strict_for at all if allow_false. If
> it's okay to call it anyway and pass allow_false = false, you need
> to defend that, which this comment isn't doing.

I updated the comment to clarify. The restriction on allow_false
(always passing along false on the recursion case) is already
documented as a requirement in the function comment, but I wanted the
comment here to explain why that was necessary here, since in my
opinion it's not immediately obvious reading the function comment why
such a restriction would necessarily hold true for all recursion
cases.
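
The reasoning can be illustrated with a few lines of three-valued logic. This sketch assumes that "strict for foo" means the clause yields null when foo is null, or null-or-false when allow_false is set; the function names below are made up for the example and are not part of predtest.c.

```c
#include <stdbool.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* SQL NOT: null stays null, true and false swap. */
static TriBool tri_not(TriBool v)
{
    if (v == TV_NULL)
        return TV_NULL;
    return (v == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

/*
 * Is "result" an acceptable output, when foo is null, for a clause that is
 * claimed to be strict for foo?
 */
static bool strict_result_ok(TriBool result, bool allow_false)
{
    return result == TV_NULL || (allow_false && result == TV_FALSE);
}

/*
 * Given what the inner clause yields when foo is null, is "NOT inner"
 * still acceptable under the caller's allow_false setting?
 */
static bool not_clause_result_ok(TriBool inner_when_null, bool allow_false)
{
    return strict_result_ok(tri_not(inner_when_null), allow_false);
}
```

If the inner clause yields null, NOT keeps it null and the result is acceptable for either setting. If the inner clause was only shown to yield false, NOT turns that into true, which is never acceptable; that is why the recursive call has to demand the null-only form of strictness from the argument of NOT.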

Regards,
James Coleman

Attachment Content-Type Size
v6-0002-Teach-predtest.c-about-BooleanTest.patch application/octet-stream 51.1 KB
v6-0003-Add-temporary-all-permutations-test.patch application/octet-stream 32.9 KB
v6-0001-Use-switch-statements-in-predicate_-implied-refut.patch application/octet-stream 14.7 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-03-25 21:53:06
Message-ID: [email protected]
Lists: pgsql-hackers

James Coleman <jtc331(at)gmail(dot)com> writes:
> [ v6 patchset ]

I went ahead and committed 0001 after one more round of review

statements; my bad). I also added the changes in test_predtest.c from
0002. I attach a rebased version of 0002, as well as 0003 which isn't
changed, mainly to keep the cfbot happy.

I'm still not happy with what you did in predicate_refuted_by_recurse:
it feels wrong and rather expensively so. There has to be a better
way. Maybe strong vs. weak isn't quite the right formulation for
refutation tests?

regards, tom lane

Attachment Content-Type Size
v7-0002-Teach-predtest.c-about-BooleanTest.patch text/x-diff 48.3 KB
v6-0003-Add-temporary-all-permutations-test.patch text/x-diff 32.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-03-26 03:45:42
Message-ID: [email protected]
Lists: pgsql-hackers

I wrote:
> I went ahead and committed 0001 after one more round of review
>
> statements; my bad). I also added the changes in test_predtest.c from
> 0002. I attach a rebased version of 0002, as well as 0003 which isn't
> changed, mainly to keep the cfbot happy.

[ squint.. ] Apparently I managed to hit ^K right before sending this
email. The missing line was meant to be more or less

> which found a couple of missing "break"

Not too important, but perhaps future readers of the archives will
be confused.

regards, tom lane


From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-04-01 12:05:02
Message-ID: CAAaqYe-+ai70UYGpFAejF1EDWwtf0Ob46+hjiyt4eSrz-cgMsQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 25, 2024 at 11:45 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > I went ahead and committed 0001 after one more round of review
> >
> > statements; my bad). I also added the changes in test_predtest.c from
> > 0002. I attach a rebased version of 0002, as well as 0003 which isn't
> > changed, mainly to keep the cfbot happy.
>
> [ squint.. ] Apparently I managed to hit ^K right before sending this
> email. The missing line was meant to be more or less
>
> > which found a couple of missing "break"
>
> Not too important, but perhaps future readers of the archives will
> be confused.

I was wondering myself :) so thanks for clarifying.

Regards,
James Coleman


From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-04-01 12:06:42
Message-ID: CAAaqYe9Cs6RttpMo1x0MdJKV9wxYJC5iknE6S7+5+dtY7q25Pg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 25, 2024 at 5:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> James Coleman <jtc331(at)gmail(dot)com> writes:
> > [ v6 patchset ]
>
> I went ahead and committed 0001 after one more round of review
>
> statements; my bad). I also added the changes in test_predtest.c from
> 0002. I attach a rebased version of 0002, as well as 0003 which isn't
> changed, mainly to keep the cfbot happy.
>
> I'm still not happy with what you did in predicate_refuted_by_recurse:
> it feels wrong and rather expensively so. There has to be a better
> way. Maybe strong vs. weak isn't quite the right formulation for
> refutation tests?

Possibly. Earlier I'd mused that:

> Alternatively (to avoid unnecessary CPU burn) we could modify
> predicate_implied_by_recurse (and the functions called by it) to take
> an argument richer than "weak = true/false", e.g., an enum that allows
> for something like "WEAK", "STRONG", and "EITHER". That's a bigger
> change, so I didn't want to do that right away unless there was
> agreement on that direction.

I'm going to try implementing that and see how I feel about what it
looks like in practice.

Regards,
James Coleman


From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-04-06 00:43:33
Message-ID: CAAaqYe-s59GXnfEb-V39oBhp=W5zRDABn3Zk77vYOgZi8k+T6A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 1, 2024 at 8:06 AM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Mon, Mar 25, 2024 at 5:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > James Coleman <jtc331(at)gmail(dot)com> writes:
> > > [ v6 patchset ]
> >
> > I went ahead and committed 0001 after one more round of review
> >
> > statements; my bad). I also added the changes in test_predtest.c from
> > 0002. I attach a rebased version of 0002, as well as 0003 which isn't
> > changed, mainly to keep the cfbot happy.
> >
> > I'm still not happy with what you did in predicate_refuted_by_recurse:
> > it feels wrong and rather expensively so. There has to be a better
> > way. Maybe strong vs. weak isn't quite the right formulation for
> > refutation tests?
>
> Possibly. Earlier I'd mused that:
>
> > Alternatively (to avoid unnecessary CPU burn) we could modify
> > predicate_implied_by_recurse (and the functions called by it) to take
> > an argument richer than "weak = true/false", e.g., an enum that allows
> > for something like "WEAK", "STRONG", and "EITHER". That's a bigger
> > change, so I didn't want to do that right away unless there was
> > agreement on that direction.
>
> I'm going to try implementing that and see how I feel about what it
> looks like in practice.

Attached is v8 which does this. Note that I kept the patch 0001 as
before and inserted a new 0002 to show exactly what's changed from the
previous version -- I wouldn't expect that to be committed
separately, of course. With this change we only need to recurse a
single time and can check for both strong and weak refutation when
either will do for proving refutation of the "NOT x" construct.

Regards,
James Coleman

Attachment Content-Type Size
v8-0001-Teach-predtest.c-about-BooleanTest.patch application/octet-stream 49.3 KB
v8-0003-Add-temporary-all-permutations-test.patch application/octet-stream 32.9 KB
v8-0002-Recurse-weak-and-strong-implication-at-the-same-t.patch application/octet-stream 12.1 KB