Lists: | pgsql-hackers |
---|
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Speed up Hash Join by teaching ExprState about hashing |
Date: | 2024-05-13 09:23:49 |
Message-ID: | CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm=m0L+Gc=T=kEa4g@mail.gmail.com |
In master, if you look at ExecHashGetHashValue() in nodeHash.c, you
can see that it calls ExecEvalExpr() and then manually calls the hash
function on the returned value. This process is repeated once for each
hash key. This is inefficient for a few reasons:
1) ExecEvalExpr() will only deform tuples up to the max varattno that's
mentioned in the hash key. That means we might have to deform
attributes in multiple steps, once for each hash key.
2) ExecHashGetHashValue() is very branchy and checks hashStrict[]
and keep_nulls on every loop iteration. There's also a branch to check
which hash functions to use.
3) foreach isn't exactly the pinnacle of efficiency either.
All of the above points can be improved by making ExprState handle
hashing. This means we'll deform all attributes that are needed for
hashing once, rather than incrementally once per key. This also allows
JIT compilation of hashing ExprStates, which will make things even
faster.
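The difference between evaluating each key separately and running one hashing program can be sketched in plain C. This is a toy model under stated assumptions, not PostgreSQL's ExprState machinery: the step opcodes, the `Tuple` struct, `key_hash()` (a multiplicative hash) and the rotate-and-XOR combine are all hypothetical stand-ins. The builder looks at every key up front, emits one deform step covering the highest attribute any key needs, then emits one hash step per key, so the interpreter loop has no per-key branching on settings at run time.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ATTS 8

/* Hypothetical, simplified step opcodes; not PostgreSQL's real EEOP_* set. */
typedef enum
{
    STEP_DEFORM,        /* deform the tuple up to 'arg' attributes, once */
    STEP_HASH_FIRST,    /* hash = key_hash(values[arg]) */
    STEP_HASH_NEXT,     /* hash = rotl1(hash) ^ key_hash(values[arg]) */
    STEP_DONE           /* return the accumulated hash */
} StepOp;

typedef struct
{
    StepOp      op;
    int         arg;
} Step;

/* Hypothetical tuple: 'raw' stands in for the stored form of the row. */
typedef struct
{
    const int32_t *raw;
    int32_t     values[MAX_ATTS];   /* deformed attribute values */
    int         ndeformed;          /* number of attributes deformed so far */
    int         deform_calls;       /* instrumentation for this example */
} Tuple;

/* Hypothetical per-key hash function for int4 keys. */
static uint32_t
key_hash(int32_t v)
{
    return (uint32_t) v * 0x9E3779B1u;
}

/* Rotate left by one bit. */
static uint32_t
rotl1(uint32_t x)
{
    return (x << 1) | (x >> 31);
}

/* Deform attributes [ndeformed, natts) and count the call. */
static void
deform(Tuple *t, int natts)
{
    for (int i = t->ndeformed; i < natts; i++)
        t->values[i] = t->raw[i];
    if (natts > t->ndeformed)
        t->ndeformed = natts;
    t->deform_calls++;
}

/* Build one program for all keys; 'out' needs room for nkeys + 2 steps. */
static int
build_hash_program(const int *keyatts, int nkeys, Step *out)
{
    int         maxatts = 0;
    int         n = 0;

    assert(nkeys > 0);
    for (int i = 0; i < nkeys; i++)
    {
        assert(keyatts[i] >= 0 && keyatts[i] < MAX_ATTS);
        if (keyatts[i] + 1 > maxatts)
            maxatts = keyatts[i] + 1;
    }
    out[n++] = (Step) {STEP_DEFORM, maxatts};   /* single deform for all keys */
    for (int i = 0; i < nkeys; i++)
        out[n++] = (Step) {i == 0 ? STEP_HASH_FIRST : STEP_HASH_NEXT, keyatts[i]};
    out[n++] = (Step) {STEP_DONE, 0};
    return n;
}

/* Interpret the program step by step until STEP_DONE. */
static uint32_t
run_hash_program(const Step *steps, Tuple *t)
{
    uint32_t    hash = 0;

    for (const Step *s = steps;; s++)
    {
        switch (s->op)
        {
            case STEP_DEFORM:
                deform(t, s->arg);
                break;
            case STEP_HASH_FIRST:
                hash = key_hash(t->values[s->arg]);
                break;
            case STEP_HASH_NEXT:
                hash = rotl1(hash) ^ key_hash(t->values[s->arg]);
                break;
            case STEP_DONE:
                return hash;
        }
    }
}
```

With keys on 0-based attributes (2, 0, 3), this program deforms four attributes in one call. Evaluating the keys one at a time in that order would deform twice: once for the first key and again for the last.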
The attached patch implements this. Here are some performance numbers.
## Test 1: rows=1000 jit=0
| Hash keys | master (tps) | patched (tps) | Change |
|---|---|---|---|
| 1 | 4938.5 | 5126.7 | +3.81% |
| 2 | 4326.4 | 4520.2 | +4.48% |
| 3 | 4145.5 | 4559.7 | +9.99% |

## Test 2: rows=1000000 jit=1 (with opt and inline)
| Hash keys | master (tps) | patched (tps) | Change |
|---|---|---|---|
| 1 | 3.663 | 3.816 | +4.16% |
| 2 | 3.392 | 3.550 | +4.67% |
| 3 | 3.086 | 3.411 | +10.55% |

Benchmark script attached.
Notes:
The ExecBuildHash32Expr() function to build the ExprState isn't called
from the same location as the previous ExecInitExprList() code. The
reason for this is that it's not possible to build the ExprState for
hashing in ExecInitHash() because we don't yet know the jointype there,
and we need to know it because the expression built by
ExecBuildHash32Expr() must allow NULLs for outer join types. I've put the
ExecBuildHash32Expr() call in ExecInitHashJoin() just after we set the
hj_NullOuterTupleSlot and hj_NullInnerTupleSlot fields. I tried
having this code in ExecHashTableCreate(), but that's no good as we
only call that during executor run, which is too late, as any SubPlans
in the hash keys need to be attributed to the correct parent. Since
EXPLAIN shows the subplans, this needs to be done before executor run.
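Why the join type has to be known when the hashing expression is built can be shown with a toy model. In this sketch every name is hypothetical: `op_strict[]` plays the role of the per-key strictness flags, and `keep_nulls` is assumed to be derived from the join type. The NULL-handling decision is made once while building the steps, so evaluation only looks at which kind of step it is running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical step kinds: strictness is decided when the steps are built. */
typedef enum
{
    HSTEP_NEXT,         /* a NULL key contributes nothing; keep the tuple */
    HSTEP_NEXT_STRICT   /* a NULL key can never match: reject the tuple */
} HashStepKind;

typedef struct
{
    HashStepKind kind;
    int         keyno;
} HashStep;

/*
 * keep_nulls must already be known here, which is why the build needs the
 * join type: e.g. the outer side of a LEFT JOIN must keep NULL-keyed rows so
 * they can be emitted null-extended.
 */
static void
build_hash_steps(int nkeys, const bool *op_strict, bool keep_nulls,
                 HashStep *out)
{
    assert(nkeys >= 0);
    for (int i = 0; i < nkeys; i++)
    {
        out[i].kind = (op_strict[i] && !keep_nulls) ? HSTEP_NEXT_STRICT
                                                    : HSTEP_NEXT;
        out[i].keyno = i;
    }
}

/* Returns false if the tuple is rejected; otherwise stores the hash. */
static bool
eval_hash_steps(const HashStep *steps, int nsteps, const uint32_t *keyhash,
                const bool *isnull, uint32_t *result)
{
    uint32_t    h = 0;

    for (int i = 0; i < nsteps; i++)
    {
        const HashStep *s = &steps[i];

        h = (h << 1) | (h >> 31);   /* rotate for every key, NULL or not */
        if (isnull[s->keyno])
        {
            if (s->kind == HSTEP_NEXT_STRICT)
                return false;       /* strict and not keeping NULLs */
            continue;               /* treat NULL as hash code 0 */
        }
        h ^= keyhash[s->keyno];
    }
    *result = h;
    return true;
}
```

Building with `keep_nulls = false` rejects a row whose second key is NULL, while building with `keep_nulls = true` keeps it and hashes the NULL as 0.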
I've not hacked on llvmjit_expr.c much before, so I'd be happy for a
detailed review of that code.
I manually checked hash values between JIT and non-JIT. They matched.
If we ever consider JITting more granularly, it might be worth always
applying the same jit flags to the hash exprs on either side of the
join. I have slight concerns about compiler bugs producing different
hash codes. I'm unsure whether there are non-bug reasons for them to
differ on the same CPU architecture.
I've not looked at applications of this beyond hash join. I'm
considering other executor nodes to be follow-on material.
Thanks to Andres Freund for mentioning this idea to me.
Attachment | Content-Type | Size |
---|---|---|
bench.sh.txt | text/plain | 1.2 KB |
v1-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patch | application/octet-stream | 35.1 KB |
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Speed up Hash Join by teaching ExprState about hashing |
Date: | 2024-07-11 04:47:09 |
Message-ID: | CAApHDvpm1Kt8XyoQSASJ3b4wi_TnGNdOOPminNcioG5p7xFJyA@mail.gmail.com |
On Mon, 13 May 2024 at 21:23, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> The attached patch implements this. Here are some performance numbers.
I've been doing a bit more work on this to start adding support for
faster hashing in places other than Hash Join. In the attached, I've
added support for giving the hash an initial value. That is required
for Hash Aggregate to work. If you look at what's being done now inside
BuildTupleHashTableExt(), you'll see that "hash_iv" exists there to
allow an initial hash value. This seems to be used to allow some
variation in hash values calculated inside parallel workers, per
hashtable->hash_iv = murmurhash32(ParallelWorkerNumber). One of my aims
for this patch is to always produce the same hash value before and
after the patch, so I've implemented the equivalent functionality,
which can be enabled or disabled as required depending on the use case.
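A minimal sketch of an optional initial value, assuming the rotate-and-XOR combine used for multi-key hashing and a per-worker IV obtained by finalizing a worker number, as described above for hash_iv. `fmix32()` is the MurmurHash3 32-bit finalizer, which is the computation PostgreSQL's murmurhash32() performs; `hash_keys_with_iv()` and the worker numbers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MurmurHash3's 32-bit finalizer (fmix32). */
static uint32_t
fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/*
 * Combine per-key hashes starting from an initial value.  Passing iv = 0
 * gives the same result as starting from zero with no initial value, so
 * the feature can effectively be switched off per use case.
 */
static uint32_t
hash_keys_with_iv(uint32_t iv, const uint32_t *keyhash, const bool *isnull,
                  int nkeys)
{
    uint32_t    h = iv;

    assert(nkeys >= 0);
    for (int i = 0; i < nkeys; i++)
    {
        h = (h << 1) | (h >> 31);   /* rotate for every key */
        if (!isnull[i])
            h ^= keyhash[i];
    }
    return h;
}
```

For example, `hash_keys_with_iv(fmix32(1), ...)` and `hash_keys_with_iv(fmix32(2), ...)` give different hashes for the same keys. Because this combine is linear in the IV, the two results differ only by the XOR of the two IVs rotated by the number of keys.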
I've not added support for Hash Aggregate quite yet. I did look at
doing that, but it seems to need quite a bit of refactoring to do it
nicely. The problem is that BuildTupleHashTableExt() receives
keyColIdx with the attribute numbers to hash, whereas the new
ExecBuildHash32Expr() function requires a List of Exprs. It looks
like the keyColIdx array comes directly from the planner, which is many
layers up, and changing that would mean a lot of churn in function
signatures. While I could form Vars from the keyColIdx array to populate
the required List of Exprs, so far I can't decide where exactly that
should happen. I think the planner should probably form the Expr List.
It seems a bit strange to be doing makeVar() in the executor.
I currently think that it's fine to speed up Hash Join as phase one
for this patch. I can work more on improving hash value generation in
other locations later.
I'd be happy if someone else were to give this patch a review and
test. One part I struggled a bit with was finding a way to cast the
Size variable down to uint32 in LLVM. I tried to add a new supported
type for uint32 but just couldn't get it to work. Instead, I did:
v_tmp1 = LLVMBuildAnd(b, v_tmp1,
l_sizet_const(0xffffffff), "");
which works, and I imagine it compiles to the same code as a cast. It
just looks a bit strange.
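The AND with 0xffffffff and a cast agree on the value, which plain C can illustrate assuming a 64-bit Size (Size is size_t, so this holds on 64-bit platforms only). What differs is the type of the result: the AND leaves a 64-bit value with its high half cleared, while a cast narrows to 32 bits. In the LLVM-C API the narrowing form is LLVMBuildTrunc(), which takes a destination type such as i32; whether swapping it in suits the surrounding code's types is not settled here.

```c
#include <assert.h>
#include <stdint.h>

/* What the AND with 0xffffffff computes: still 64 bits, high half cleared. */
static uint64_t
mask_low32(uint64_t v)
{
    return v & UINT64_C(0xffffffff);
}

/* What a C cast to uint32 (an LLVM trunc to i32) computes: the low 32 bits. */
static uint32_t
cast_low32(uint64_t v)
{
    return (uint32_t) v;
}

/* The two agree on the value; only the width of the result differs. */
static int
same_low32(uint64_t v)
{
    return mask_low32(v) == (uint64_t) cast_low32(v);
}
```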
David
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patch | application/octet-stream | 36.7 KB |
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Alexey Dvoichenkov <alexey(at)hyperplane(dot)net> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Speed up Hash Join by teaching ExprState about hashing |
Date: | 2024-08-14 23:36:57 |
Message-ID: | CAApHDvpncdT9JJbnLrB7UU=E1LnK8JOm4w+vkO6mBak1NN1YEw@mail.gmail.com |
On Sun, 11 Aug 2024 at 22:09, Alexey Dvoichenkov <alexey(at)hyperplane(dot)net> wrote:
> I like the idea so I started looking at this patch. I ran some tests,
> the query is an aggregation over a join of two tables with 5M rows,
> where "columns" is the number of join conditions. (Mostly the same as
> in your test.) The numbers are the average query run-time in seconds.
Thanks for running those tests.
I wondered whether, with 5M items in the hash table, the unpredictable
memory access pattern when probing that table might be drowning out
some of the gains of producing hash values faster. I wrote the
attached script, which creates a fairly small table but probes that
table much more than once per hash value. I tried to do that in a way
that didn't read or process lots of shared buffers, so as not to put
additional pressure on the CPU caches, which could evict cache lines
of the hash table. I am seeing much larger performance gains from
that test: up to 26% faster. Please see the attached .png file for the
results. I've also attached the script I used to get those results.
This time I tried 1-6 join columns and also included the test results
for jit=off, jit=on, jit optimize and jit inline for each of the 6
queries. You can see that with 5 and 6 columns, jit inline was
26% faster than master, but just 14% faster with 1 column. The
smallest improvement was with 1 column and jit=on, at just 7% faster.
> - ExecHashGetHashValue, and
> - TupleHashTableHash_internal
>
> .. currently rotate the initial and previous hash values regardless of
> the NULL check. So the rotation should probably be placed before the
> NULL check in NEXT states if you want to preserve the existing
> behavior.
That's my mistake. I think originally I didn't see the sense in
rotating, but you're right. I think not doing that would have (1,
NULL) and (NULL, 1) hash to the same value. Maybe that's ok, but I
think it's much better not to take the risk and keep the behaviour the
same as master. The attached v3 patch does that. I've left the
client_min_messages=debug1 output in the patch for now. I checked that
the hash values match master's using a 3-column FULL OUTER JOIN over
1000 random INTs, 10% of them NULL.
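The (1, NULL) versus (NULL, 1) collision can be reproduced with a small sketch. It assumes NULL contributes nothing to the hash and uses a hypothetical multiplicative `key_hash()` for int4 keys; the two functions differ only in whether the rotation happens before or after the NULL check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t
rotl1(uint32_t x)
{
    return (x << 1) | (x >> 31);
}

/* Hypothetical per-key hash for an int4 key. */
static uint32_t
key_hash(int32_t v)
{
    return (uint32_t) v * 0x9E3779B1u;
}

/* Rotation only for non-NULL keys: the position of a NULL is lost. */
static uint32_t
combine_rotate_after_null_check(const int32_t *vals, const bool *isnull, int n)
{
    uint32_t    h = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        if (isnull[i])
            continue;
        h = rotl1(h) ^ key_hash(vals[i]);
    }
    return h;
}

/* Rotate for every key, then XOR in the hash of non-NULL keys. */
static uint32_t
combine_rotate_before_null_check(const int32_t *vals, const bool *isnull, int n)
{
    uint32_t    h = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        h = rotl1(h);
        if (!isnull[i])
            h ^= key_hash(vals[i]);
    }
    return h;
}
```

With rotation after the NULL check, (1, NULL) and (NULL, 1) both reduce to `key_hash(1)`. Rotating first moves the first key's bits one position further in the (1, NULL) case, so the two rows hash differently.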
David
Attachment | Content-Type | Size |
---|---|---|
hashjoin_bench.sh.txt | text/plain | 2.0 KB |
(PNG image) | image/png | 72.4 KB |
v3-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patch | application/octet-stream | 37.4 KB |
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Alexey Dvoichenkov <alexey(at)hyperplane(dot)net> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Speed up Hash Join by teaching ExprState about hashing |
Date: | 2024-08-17 05:14:10 |
Message-ID: | CAApHDvpuVM43t763Rp6P6QfRh+G+36zVuLRtpdq2+vtdi4CuhA@mail.gmail.com |
On Thu, 15 Aug 2024 at 19:50, Alexey Dvoichenkov <alexey(at)hyperplane(dot)net> wrote:
> I gave v3 another look. One tiny thing I've noticed is that you
> removed ExecHashGetHashValue() but not its forward declaration in
> include/executor/nodeHash.h
Fixed
> I also reviewed the JIT code this time, it looks reasonable to
> me. I've added names to some variables to make the IR easier to
> read. (Probably best to squash it into your patch, if you want to
> apply this.)
Thanks. I've included that.
I made another complete pass over this today and I noticed that there
were a few cases where I wasn't properly setting resnull and resvalue
to (Datum) 0.
I'm happy with the patch now. I am aware nothing currently uses
EEOP_HASHDATUM_SET_INITVAL, but I want to get moving with the Hash
Aggregate usages of this code fairly quickly and I'd rather get the
ExprState step code done now and not have to change it again.
v4 patch attached. If nobody else wants to look at this then I'm
planning on pushing it soon.
David
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Speed-up-Hash-Join-by-making-ExprStates-support-h.patch | application/octet-stream | 39.3 KB |