Oracle Hash Join
Background
It is very common to hear that hash join is good for joining huge tables while nested loop is good for small
tables. I will not comment on that statement, but to me the fundamental
difference between hash join and nested loop is that in a nested loop we can look up the inner table using any
value from the outer table (and get the benefit of index access, if any), while in a hash join we cannot do that. The
only way to look up the inner table (the probe table, in hash join terms) is by using a constant value.
In the example below, we can actually divide the query into 2 separate queries.
Query 1:
Query 2:
So, even though we have an index on the ID column in those 2 tables, Oracle won't be able to use index access
(it does a full table scan instead).
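Since the original query is only shown in the (missing) screenshots, here is a minimal sketch of the idea; the table names T1 and T2 are assumptions, while the ID join column is mentioned in the text:

```sql
-- Hypothetical join; an index exists on ID in both tables
SELECT t1.id, t2.id
FROM   t1, t2
WHERE  t1.id = t2.id;

-- Conceptually, the hash join processes each table independently,
-- as if it ran two separate queries:
SELECT id FROM t1;   -- Query 1: build phase reads every row of T1
SELECT id FROM t2;   -- Query 2: probe phase reads every row of T2
-- Neither scan has a constant predicate on ID, so the indexes cannot help.
```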
Based on how Oracle performs these 2 steps (build and probe), there are 3 types of hash join:
1. Optimal
2. Onepass
3. Multipass
The objective of this article is to see the difference between those 3 types, along with a few other scenarios, to see
how Oracle handles them.
During the build phase, Oracle creates in-memory hash buckets (we may call this structure the hash table). The number
of hash buckets should be more than enough to avoid hash collisions. Technically, the hash buckets are split into
several partitions. Every partition has several slots, and those slots hold several blocks. If I can
give an analogy, it is very close to a partitioned table. Apart from that, there is a bitmap structure that maintains the slot
usage.
In-Memory Hash Table                          Table Segment
Every partition consists of several slots     Every segment consists of several extents
Every slot consists of several blocks         Every extent consists of several blocks
In these first 3 exercises I will use the attached script (create_tables.txt) to build the required tables.
Each table has 10,000 unique rows. To get detailed information on the hash join operation, we need to turn on
event 10104. Besides that, we need to change "workarea_size_policy" to MANUAL and
configure "hash_area_size" to create the required scenarios (ibegin.sql).
create_tables.txt ibegin.sql
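ibegin.sql is attached rather than reproduced here, but the setup boils down to something like the following sketch; the hash_area_size value is an example and varies per scenario:

```sql
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET hash_area_size = 10485760;  -- example value in bytes; changed per scenario

-- Turn on hash join tracing (event 10104); output goes to the session trace file
ALTER SESSION SET events '10104 trace name context forever, level 10';
```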
Optimal
An optimal hash join is the best type: Oracle doesn't require any temporary space to store the hash
buckets. Everything is done in memory, since there is enough memory for it.
Let's analyze the statistics which we can see in the above section of the trace file.
In the first 3 lines after "Join Type" we can see the memory sizes:
o Hash Area is 10,304,443
o Slot Table is 9,478,144
This value is calculated as 13 * 712 * 1,024, where:
13 is the number of slots
712 kB is the size of each slot
1,024 converts kB to bytes
o Overhead is 826,299
This is calculated as Hash Area – Slot Table
The number of slots/clusters is 13
The number of partitions is 8
The number of blocks in every slot is 89
The block size is 8 kB
The slot size is 89 * 8 kB = 712 kB
The bitmap size for each partition is 64 kB
The bitmap for all partitions is 8 * 64 kB
The size of a row is 220 bytes
The overhead is approximately 15 bytes per row in this table (220 – 205).
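The arithmetic above can be double-checked with a quick query (a sketch, using the values from this trace file):

```sql
-- Verifying the trace-file arithmetic:
SELECT 89 * 8                        AS slot_size_kb,      -- 712 kB per slot
       13 * 712 * 1024               AS slot_table_bytes,  -- 9,478,144
       10304443 - (13 * 712 * 1024)  AS overhead_bytes     -- 826,299
FROM   dual;
```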
Next, in the section below, the Build phase starts, and since we have an optimal hash join, we cannot see any
operation against the temporary tablespace.
Later is the Probe phase. Let's check a few statistics from the trace file:
All partitions (8) fit and are available in memory
Only 8 out of 13 slots are used
In the "Partition Distribution" (taking 1 line as an example) we can see:
o the number of rows across all partitions
o how many clusters/slots are in each partition
o how many slots are available in memory for each partition
o the status, an indication of whether the partition is still in memory or not:
If all partitions have kept=1 → optimal hash join
If all partitions have kept=0 → multipass hash join
If at least 1 partition has kept=1 → onepass hash join
The number of buckets is 16,384
This value is 2^14, the smallest power of 2 above 10,000; with this many buckets we can be reasonably
sure that hash collisions will not happen
Partition distribution
The part below shows the histogram of the number of rows inside the hash buckets.
The last part is the overall statistics, which are quite self-explanatory. The most interesting part is
the first line after the title. Out of 16,384 available buckets, Oracle uses only 7,538 (the other 8,846
buckets are empty). That means some buckets hold more than 1 row (this can be seen in the
above histogram as well). So the hash function is efficient enough to manage/address more than 1 row per
bucket without any collision.
Lastly, below is the output of autotrace from the SQL*Plus session, along with a few session statistics. We can see the
statistic "workarea executions – optimal" increases by 3, but actually only 1 is relevant for the above
query (the other scenarios also show an increment of 2 in this statistic, so we should count only 1).
There is no temporary tablespace activity in this test case.
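One way to read these session statistics directly (assuming SELECT privileges on the v$ views) is:

```sql
-- Workarea statistics for the current session
SELECT sn.name, ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#
AND    sn.name LIKE 'workarea executions%';
```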
Onepass
In the above capture, the important point is the number of blocks in each slot, which is only 13 (compared to 89 in
the optimal type). This makes the slot size 104 kB (13 blocks * 8 kB). The reason why
Oracle reduces the number of blocks is to ensure that at least 1 slot of every partition stays available in memory.
The next section (Build phase) is quite interesting: Oracle starts spilling data to the temporary tablespace.
Once the Build phase is completed, Oracle starts the Probe phase as below. Let's highlight a few points:
In the trace file output we see a lot of writing and reading, and we see a new section in the Probe phase
(HASH JOIN GET FLUSHED PARTITIONS). This is the process of reading back the build table and then continuing
to probe the second table to get the result. Oracle does this operation for the rest of the partitions, and since
the memory is not sufficient to do the operation in one shot, Oracle iterates the operation.
We can see clearly in the partition below that 1,224 rows are being processed from the build and probe tables (which is
Partition: 0 if we trace back to the initial step of the Probe phase), and at the end of the iteration, the number of rows
left to be iterated over is 0.
This is the list of all iterations in this test case. I am not sure why Oracle didn't do the operation in an ordered
fashion (from Partition 0 up to 5, or from Partition 5 down to 0).
This is the overall statistics for the onepass type (not all rows are shown).
Below is the output of autotrace from the SQL*Plus session, along with a few session statistics. We can see the
statistic "workarea executions – onepass" increases by 1 (we can ignore the increment of "workarea
executions – optimal", as I mentioned before). There is balanced activity between reads and writes against the
temporary tablespace, which means Oracle writes to the temporary tablespace once and reads it back
once.
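That balance can be confirmed from the session statistics as well; the statistic names below are the standard v$statname entries for direct temporary tablespace I/O:

```sql
-- Temporary tablespace I/O for the current session; for a onepass
-- hash join the direct reads and writes should be roughly equal
SELECT sn.name, ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#
AND    sn.name IN ('physical reads direct temporary tablespace',
                   'physical writes direct temporary tablespace');
```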
Multipass
In the multipass type, Oracle also needs to dump data to the temporary tablespace due to insufficient hash
area memory, but when probing the second table, Oracle iterates SEVERAL times over each available
partition. This is why it is called multipass. It is the least efficient type of hash join.
In the above output, every slot has a single block only, so the slot size is 8 kB. The total memory for the slot table is 14
* 8 * 1,024 = 114,688. The Build phase takes longer since the slot size is smaller (please check
the attached trace file if you want to see how long the "writing" activity of the Build phase takes).
Again, the number of buckets is a power of 2.
Below are the detailed iterations for Partition 0. There are 4 iterations to process the 1,224 rows in this partition.
Theoretically, if we want to change the type to a onepass operation, we need to multiply "hash_area_size" by 4,
so in this case we need to configure at least 131,072 * 4 = 524,288 (512 kB).
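In SQL terms the change is simply this (a sketch; the value follows from the 4 iterations observed above):

```sql
-- 131,072 bytes (current) * 4 iterations = 524,288 bytes (512 kB)
ALTER SESSION SET hash_area_size = 524288;
-- Re-running the join should now increment "workarea executions - onepass"
```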
Below is the output of autotrace from the SQL*Plus session, along with a few session statistics. We can see the
statistic "workarea executions – multipass" increases by 1 (we can ignore the increment of "workarea
executions – optimal", as I mentioned before). There is imbalanced activity between reads and writes against the
temporary tablespace: Oracle does more reads than writes.
Now let's create another test case which shows us the impact of the build table's size on performance and
memory. The configuration is as follows:
Create 2 tables, one with 10,000 rows (TBIG) and the other with 2,500 rows (TSMALL). Both
tables have 100 distinct values
Set "workarea_size_policy" to MANUAL
Set "hash_area_size" to 3 MB
Create 2 scenarios: the first uses TBIG as the build table and the second uses TSMALL as the build
table
The complete table creation script is attached
create_tables2.txt
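create_tables2.txt is attached rather than reproduced here; based on the description, a rough equivalent might look like this sketch (the column names and padding are assumptions):

```sql
-- Hypothetical recreation of create_tables2.txt:
-- 10,000 and 2,500 rows, each table with 100 distinct values in ID
CREATE TABLE tbig AS
SELECT MOD(LEVEL, 100) AS id, RPAD('x', 200, 'x') AS pad
FROM   dual CONNECT BY LEVEL <= 10000;

CREATE TABLE tsmall AS
SELECT MOD(LEVEL, 100) AS id, RPAD('x', 200, 'x') AS pad
FROM   dual CONNECT BY LEVEL <= 2500;
```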
Again, I attach below the statistics from the TBIG and TSMALL tables. The next capture is a summary of consistent gets
for both scenarios (TSMALL as "build" table and TBIG as "build" table).
During the Build phase, Oracle reads all rows from the "build" table. So in the case of TBIG, Oracle requires 1,000
consistent gets (the number of blocks in TBIG), and in the case of TSMALL, Oracle requires 292 consistent gets (though I
cannot explain the difference of 78 in this case).
The next Probe phase is more interesting: instead of loading all available blocks of the "probe" table, it looks like
Oracle reads the second table row by row, 1 consistent get per row. The result is 10,000 consistent gets
when we use TSMALL as the "build" table (there are 10,000 rows in TBIG). In the case of TBIG as the "build" table, Oracle
requires 2,542 consistent gets (there are 2,500 rows in TSMALL; again I cannot explain the difference of 42, but
during the test I filled up the buffer cache by doing full table scans against TBIG and TSMALL, and I am not sure if this was the
root cause).
From the above comparison, we see that Oracle works more efficiently when the build table is small. We can see that the
number of blocks per slot is bigger and the number of buckets is smaller. This makes the overall memory
consumption smaller when we have a smaller build table.
Before we jump to the conclusion that a smaller build table is better than a bigger one, let's retry the test case
with "workarea_size_policy" = AUTO, which is the default and recommended by Oracle.
Again we see that the memory consumption is better for the smaller build table. But this time, Oracle decided to
configure more blocks for each slot when the bigger build table is used (the reverse of the
previous test case, where we set "workarea_size_policy" to MANUAL).
Conclusion
1. Multipass hash join is the most inefficient type, and we can change it to at least onepass by multiplying
the "hash_area_size" by the number of iterations in one of the hash partitions.
2. A smaller table is not always good as the "build" table; it depends ;-)
o It is good as the "build" table since it requires a smaller in-memory hash table to start the join
o On the other hand, it generates more consistent gets (again, it depends on the size of the probe
table, i.e. its number of rows)
Saying that this table is good as a "build" table, or that table is not, without confirming it with numbers
is not wise.
-heri-