Oracle Hash Join
Background
It is very common to hear that hash join is good for joining huge tables while nested loop is good for small
tables. I will not comment on that statement, but to me the fundamental
difference between hash join and nested loop is that in a nested loop we can look up the inner table using any
value from the outer table (and get the benefit of index access, if any), while in a hash join we cannot do that. The
only way to look up the inner table (the probe table, in hash join terms) is by using a constant value.
In the example below, we can actually divide the query into 2 separate queries.
Query 1:
Query 2:
So, even though we have an index on the ID column in those 2 tables, Oracle won't be able to use index access
(it does a full table scan instead).
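Since the original query is only shown in the (missing) screenshots, here is a minimal sketch of the idea; the table names T1 and T2 are assumptions, while the ID join column is mentioned in the text:

```sql
-- Hypothetical join; an index exists on ID in both tables
SELECT t1.id, t2.id
FROM   t1, t2
WHERE  t1.id = t2.id;

-- Conceptually, the hash join processes each table independently,
-- as if it ran two separate queries:
SELECT id FROM t1;   -- Query 1: build phase reads every row of T1
SELECT id FROM t2;   -- Query 2: probe phase reads every row of T2
-- Neither scan has a constant predicate on ID, so the indexes cannot help.
```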
Based on how Oracle performs these 2 steps (build and probe), there are 3 types of hash join:
1. Optimal
2. Onepass
3. Multipass
The objective of this article is to see the difference between those 3 types, along with a few other scenarios, to see
how Oracle handles them.
During the build phase, Oracle creates in-memory hash buckets (we may call this structure the hash table). The number
of hash buckets should be more than enough to avoid hash collisions. Technically, the hash buckets are split into
several partitions. Every partition has several slots, and those slots hold several blocks. If I can
give an analogy, it is very close to a partitioned table. Apart from that, there is a bitmap structure that maintains the slot
usage.
In-Memory Hash Table                          Table Segment
Every partition consists of several slots     Every segment consists of several extents
Every slot consists of several blocks         Every extent consists of several blocks
In these first 3 exercises I will use the attached script (create_tables.txt) to build the required tables.
Each table has 10,000 unique rows. To get detailed information on the hash join operation, we need to turn on
event 10104. Besides that, we need to change "workarea_size_policy" to MANUAL and
configure "hash_area_size" to create the required scenarios (ibegin.sql).
create_tables.txt ibegin.sql
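ibegin.sql is attached rather than reproduced here, but the setup boils down to something like the following sketch; the hash_area_size value is an example and varies per scenario:

```sql
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET hash_area_size = 10485760;  -- example value in bytes; changed per scenario

-- Turn on hash join tracing (event 10104); output goes to the session trace file
ALTER SESSION SET events '10104 trace name context forever, level 10';
```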
Optimal
An optimal hash join is the best type: Oracle doesn't require any temporary space to store the hash
buckets. Everything is done in memory, since there is enough memory for it.
Let's analyze the statistics which we can see in the above section of the trace file.
In the first 3 lines after "Join Type" we can see the memory sizes:
o Hash Area is 10,304,443
o Slot Table is 9,478,144
This value is calculated as 13 * 712 * 1,024, where:
13 is the number of slots
712 kB is the size of each slot
1,024 converts kB to bytes
o Overhead is 826,299
This is calculated as Hash Area – Slot Table
The number of slots/clusters is 13
The number of partitions is 8
The number of blocks in every slot is 89
The block size is 8 kB
The slot size is 89 * 8 kB = 712 kB
The bitmap size for each partition is 64 kB
The bitmap for all partitions is 8 * 64 kB
The size of a row is 220 bytes
The overhead is approximately 15 bytes per row in this table (220 – 205).
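The arithmetic above can be double-checked with a quick query (a sketch, using the values from this trace file):

```sql
-- Verifying the trace-file arithmetic:
SELECT 89 * 8                        AS slot_size_kb,      -- 712 kB per slot
       13 * 712 * 1024               AS slot_table_bytes,  -- 9,478,144
       10304443 - (13 * 712 * 1024)  AS overhead_bytes     -- 826,299
FROM   dual;
```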
Next, in the section below, the Build phase starts, and since we have an optimal hash join, we cannot see any
operation against the temporary tablespace.
Later is the Probe phase. Let's check a few statistics from the trace file:
All partitions (8) fit and are available in memory
Only 8 out of 13 slots are used
In the "Partition Distribution" (taking 1 line as an example) we can see:
o the number of rows across all partitions
o how many clusters/slots are in each partition
o how many slots are available in memory for each partition
o the status, an indication of whether the partition is still in memory or not:
If all partitions have kept=1 → optimal hash join
If all partitions have kept=0 → multipass hash join
If at least 1 partition has kept=1 → onepass hash join
The number of buckets is 16,384
This value is 2^14, the smallest power of 2 above 10,000; with this many buckets we can be reasonably
sure that hash collisions will not happen
Partition distribution
The part below shows the histogram of the number of rows inside the hash buckets.
The last part is the overall statistics, which are quite self-explanatory. The most interesting part is
the first line after the title. Out of 16,384 available buckets, Oracle uses only 7,538 (the other 8,846
buckets are empty). That means some buckets hold more than 1 row (this can be seen in the
above histogram as well). So the hash function is efficient enough to manage/address more than 1 row per
bucket without any collision.
Lastly, below is the output of autotrace from the SQL*Plus session, along with a few session statistics. We can see the
statistic "workarea executions – optimal" increases by 3, but actually only 1 is relevant for the above
query (the other scenarios also show an increment of 2 in this statistic, so we should count only 1).
There is no temporary tablespace activity in this test case.
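One way to read these session statistics directly (assuming SELECT privileges on the v$ views) is:

```sql
-- Workarea statistics for the current session
SELECT sn.name, ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#
AND    sn.name LIKE 'workarea executions%';
```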
Onepass
In the above capture, the important point is the number of blocks in each slot, which is only 13 (compared to 89 in
the optimal type). This makes the slot size 104 kB (13 blocks * 8 kB). The reason why
Oracle reduces the number of blocks is to ensure that at least 1 slot of every partition stays available in memory.
The next section (Build phase) is quite interesting: Oracle starts spilling data to the temporary tablespace.
Once the Build phase is completed, Oracle starts the Probe phase as below. Let's highlight a few points:
In the trace file output we see a lot of writing and reading, and we see a new section in the Probe phase
(HASH JOIN GET FLUSHED PARTITIONS). This is the process of reading back the build table and then continuing
to probe the second table to get the result. Oracle does this operation for the rest of the partitions, and since
the memory is not sufficient to do the operation in one shot, Oracle iterates the operation.
We can see clearly in the partition below that 1,224 rows are being processed from the build and probe tables (which is
Partition: 0 if we trace back to the initial step of the Probe phase), and at the end of the iteration, the number of rows
left to be iterated over is 0.
This is the list of all iterations in this test case. I am not sure why Oracle didn't do the operation in an ordered
fashion (from Partition 0 up to 5, or from Partition 5 down to 0).
This is the overall statistics for the onepass type (not all rows are shown).
Below is the output of autotrace from the SQL*Plus session, along with a few session statistics. We can see the
statistic "workarea executions – onepass" increases by 1 (we can ignore the increment of "workarea
executions – optimal", as I mentioned before). There is balanced activity between reads and writes against the
temporary tablespace, which means Oracle writes to the temporary tablespace once and reads it back
once.
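That balance can be confirmed from the session statistics as well; the statistic names below are the standard v$statname entries for direct temporary tablespace I/O:

```sql
-- Temporary tablespace I/O for the current session; for a onepass
-- hash join the direct reads and writes should be roughly equal
SELECT sn.name, ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#
AND    sn.name IN ('physical reads direct temporary tablespace',
                   'physical writes direct temporary tablespace');
```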
Multipass
In the multipass type, Oracle also needs to dump data to the temporary tablespace due to insufficient hash
area memory, but when probing the second table, Oracle iterates SEVERAL times over each available
partition. This is why it is called multipass. It is the least efficient type of hash join.
In the above output, every slot has a single block only, so the slot size is 8 kB. The total memory for the slot table is 14
* 8 * 1,024 = 114,688. The Build phase takes longer since the slot size is smaller (please check
the attached trace file if you want to see how long the "writing" activity of the Build phase takes).
Again, the number of buckets is a power of 2.
Below are the detailed iterations for Partition 0. There are 4 iterations to process the 1,224 rows in this partition.
Theoretically, if we want to change the type to a onepass operation, we need to multiply "hash_area_size" by 4,
so in this case we need to configure at least 131,072 * 4 = 524,288 (512 kB).
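In SQL terms the change is simply this (a sketch; the value follows from the 4 iterations observed above):

```sql
-- 131,072 bytes (current) * 4 iterations = 524,288 bytes (512 kB)
ALTER SESSION SET hash_area_size = 524288;
-- Re-running the join should now increment "workarea executions - onepass"
```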
Below is the output of autotrace from the SQL*Plus session, along with a few session statistics. We can see the
statistic "workarea executions – multipass" increases by 1 (we can ignore the increment of "workarea
executions – optimal", as I mentioned before). There is imbalanced activity between reads and writes against the
temporary tablespace: Oracle does more reads than writes.
Now let's create another test case which shows us the impact of the build table's size on performance and
memory. The configuration is as follows:
Create 2 tables, one with 10,000 rows (TBIG) and the other with 2,500 rows (TSMALL). Both
tables have 100 distinct values
Set "workarea_size_policy" to MANUAL
Set "hash_area_size" to 3 MB
Create 2 scenarios: the first uses TBIG as the build table and the second uses TSMALL as the build
table
The complete table creation script is attached
create_tables2.txt
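create_tables2.txt is attached rather than reproduced here; based on the description, a rough equivalent might look like this sketch (the column names and padding are assumptions):

```sql
-- Hypothetical recreation of create_tables2.txt:
-- 10,000 and 2,500 rows, each table with 100 distinct values in ID
CREATE TABLE tbig AS
SELECT MOD(LEVEL, 100) AS id, RPAD('x', 200, 'x') AS pad
FROM   dual CONNECT BY LEVEL <= 10000;

CREATE TABLE tsmall AS
SELECT MOD(LEVEL, 100) AS id, RPAD('x', 200, 'x') AS pad
FROM   dual CONNECT BY LEVEL <= 2500;
```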
Again, I attach below the statistics from the TBIG and TSMALL tables. The next capture is a summary of consistent gets
for both scenarios (TSMALL as "build" table and TBIG as "build" table).
During the Build phase, Oracle reads all rows from the "build" table. So in the case of TBIG, Oracle requires 1,000
consistent gets (the number of blocks in TBIG), and in the case of TSMALL, Oracle requires 292 consistent gets (though I
cannot explain the difference of 78 in this case).
The next Probe phase is more interesting: instead of loading all available blocks of the "probe" table, it looks like
Oracle reads the second table row by row, 1 consistent get per row. The result is 10,000 consistent gets
when we use TSMALL as the "build" table (there are 10,000 rows in TBIG). In the case of TBIG as the "build" table, Oracle
requires 2,542 consistent gets (there are 2,500 rows in TSMALL; again I cannot explain the difference of 42, but
during the test I filled up the buffer cache by doing full table scans against TBIG and TSMALL, and I am not sure if this was the
root cause).
From the above comparison, we see that Oracle works more efficiently when the build table is small. We can see that the
number of blocks per slot is bigger and the number of buckets is smaller. This makes the overall memory
consumption smaller when we have a smaller build table.
Before we jump to the conclusion that a smaller build table is better than a bigger one, let's retry the test case
with "workarea_size_policy" = AUTO, which is the default and recommended by Oracle.
Again we see that the memory consumption is better for the smaller build table. But this time, Oracle decided to
configure more blocks for each slot when the bigger build table is used (the reverse of the
previous test case, where we set "workarea_size_policy" to MANUAL).
Conclusion
1. Multipass hash join is the most inefficient type, and we can change it to at least onepass by multiplying
the "hash_area_size" by the number of iterations in one of the hash partitions.
2. A smaller table is not always good as the "build" table; it depends ;-)
o It is good as the "build" table since it requires a smaller in-memory hash table to start the join
o On the other hand, it generates more consistent gets (again, it depends on the size of the probe
table, i.e. its number of rows)
Saying that this table is good as a "build" table, or that table is not, without confirming it with numbers
is not wise.
-heri-