Adams - Hash Joins Oracle

Hash joins work by mapping join keys to hash values and hash buckets using a hash function. Both inputs are partitioned by hash value. For each pair of partitions, an in-memory hash table is built from one input and then probed with rows from the other input to find matches and return joined rows. A hash bucket bitmap and a partition histogram are used to filter rows early and to optimize memory use during the build and probe phases.


Hash Join Internals

Unix + Oracle =
Steve Adams
Ixora [email protected]
Hash Functions
• A hash function maps arbitrary key values to integer “hash values”
• Data is stored in an array based on the hash values
• Data access is very efficient
  • just compute the hash value of the key and look up the corresponding array element
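The slides do not say which hash function Oracle uses internally. As a stand-in, the sketch below uses the well-known 32-bit FNV-1a hash, which is an assumption and not Oracle's function. It shows the first step on the slide: an arbitrary key is mapped deterministically to an integer hash value. The function name `fnv1a_32` is hypothetical.

```python
def fnv1a_32(key: str) -> int:
    """Map an arbitrary string key to a 32-bit integer hash value (FNV-1a).

    Stand-in hash function for illustration; not Oracle's internal function.
    """
    h = 0x811C9DC5                          # FNV-1a 32-bit offset basis
    for byte in key.encode("utf-8"):
        h ^= byte                           # mix in one byte of the key
        h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by the FNV prime, keep 32 bits
    return h


for name in ("Adams", "Millsap"):
    print(name, fnv1a_32(name))
```

The same key always yields the same hash value. The range of possible values (0 to 2**32 - 1) is far too large to use directly as an array index, which is why hash values are reduced to bucket numbers.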

Hash Buckets
• Too many possible hash values
  • hash tables would be large and sparse
• Hash values are mapped to hash buckets using the mod() function
  • 100056732 % 100 = 32
• Each hash table element is a “bucket”
• Hash values that map to the same bucket are stored on a “collision chain”
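A minimal sketch of a bucketed hash table follows. It uses the slide's mod() mapping and keeps one collision chain per bucket, storing (hash value, key) pairs. The class name `BucketHashTable` is hypothetical, and a Python list stands in for each collision chain.

```python
class BucketHashTable:
    """Hash table whose elements are buckets; colliding entries share a chain."""

    def __init__(self, n_buckets: int) -> None:
        self.n_buckets = n_buckets
        # one collision chain (a Python list) per hash bucket
        self.buckets = [[] for _ in range(n_buckets)]

    def bucket_of(self, hash_value: int) -> int:
        return hash_value % self.n_buckets      # map hash value to bucket with mod()

    def insert(self, hash_value: int, key: object) -> None:
        self.buckets[self.bucket_of(hash_value)].append((hash_value, key))

    def lookup(self, hash_value: int, key: object) -> bool:
        # only the collision chain of one bucket has to be scanned
        chain = self.buckets[self.bucket_of(hash_value)]
        return any(h == hash_value and k == key for h, k in chain)


table = BucketHashTable(100)
print(table.bucket_of(100056732))   # 32, as on the slide
table.insert(100056732, "row A")
table.insert(132, "row B")          # 132 % 100 = 32: same bucket, same chain
print(len(table.buckets[32]))       # 2
```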
Numeric keys
• Key data itself is the hash value
• Hash bucket is computed using MOD
  • binary hashes (a power-of-two number of buckets) are cheap to compute
    • just a bitwise operation on the low-order bits rather than a division
  • prime number hashes (a prime number of buckets) spread keys more evenly across hash buckets
    • this prevents uneven or “skewed” distributions
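The sketch below contrasts the two choices on this slide. With a power-of-two bucket count, MOD reduces to a bitwise AND with a mask of the low-order bits. With a prime bucket count, a true MOD is needed. The key set (numeric keys that are all multiples of 8) and the helper name `bucket_counts` are hypothetical. They were chosen to show how a regular key pattern can skew a binary hash.

```python
def bucket_counts(keys, n_buckets, use_mask):
    """Count how many numeric keys land in each hash bucket."""
    counts = [0] * n_buckets
    for k in keys:
        if use_mask:
            # n_buckets is a power of two here, so k % n_buckets == k & (n_buckets - 1)
            b = k & (n_buckets - 1)
        else:
            b = k % n_buckets
        counts[b] += 1
    return counts


keys = [i * 8 for i in range(64)]             # numeric keys that are all multiples of 8
binary = bucket_counts(keys, 8, use_mask=True)
prime = bucket_counts(keys, 7, use_mask=False)
print(binary)   # [64, 0, 0, 0, 0, 0, 0, 0]: every key in one bucket
print(prime)    # [10, 9, 9, 9, 9, 9, 9]: keys spread across all buckets
```

The binary hash is cheaper per key, but here it maps every key to bucket 0. The prime modulus costs a division yet spreads the same keys almost evenly.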

Non-numeric Keys
• Internal hash function to get hash value
  • examples:
    • ‘Adams’ → 3016007180
    • ‘Millsap’ → 1765538108
  • DBMS_UTILITY.GET_HASH_VALUE()
• Binary MOD() used to map hash values to hash buckets
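A minimal sketch of the two steps for a non-numeric key follows. First the key is hashed to an integer, then a binary MOD maps that integer to a bucket. Python's `zlib.crc32` stands in for Oracle's internal hash function, so the hash values will not match the examples above. The helper names `hash_value` and `hash_bucket` are hypothetical.

```python
import zlib


def hash_value(key: str) -> int:
    """Map a string key to a 32-bit hash value (zlib.crc32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8"))


def hash_bucket(key: str, n_buckets: int) -> int:
    """Binary MOD: with a power-of-two bucket count, keep the low-order bits."""
    if n_buckets <= 0 or n_buckets & (n_buckets - 1):
        raise ValueError("n_buckets must be a power of two")
    return hash_value(key) & (n_buckets - 1)


for name in ("Adams", "Millsap"):
    print(name, hash_value(name), hash_bucket(name, 1024))
```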

Key Value Hash Tables
[Diagram: key value → hash function → hash value → hash bucket number; the hash table is an array of hash bucket headers (Bucket 1 to Bucket 8), each heading a collision chain of (hash value, key) entries]

Hash Joins
• Concept
  • read first row source and build a hash table
  • read second row source and join via hash table
  • applicable to equijoins & CBO only
• Approach
  • first “partition” both inputs by hash value
  • for corresponding pairs of partitions
    • build an in-memory hash table from one input
    • probe the hash table with rows from the other
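The approach on this slide can be sketched as an in-memory partitioned equijoin. Both inputs are partitioned by the hash value of the join key. Then, for each corresponding pair of partitions, a hash table is built from one input and probed with rows from the other. This is a simplification under stated assumptions. Python's built-in `hash()` is the hash function, a dict is the hash table, nothing is spilled to disk, and the function name `hash_join` and the EMP/DEPT-style rows are hypothetical.

```python
from collections import defaultdict


def hash_join(build, probe, build_key, probe_key, n_partitions=4):
    """Equijoin two row lists: partition both inputs, then build and probe per pair."""
    # Step 1: "partition" both inputs by the hash value of the join key
    build_parts = defaultdict(list)
    probe_parts = defaultdict(list)
    for row in build:
        build_parts[hash(build_key(row)) % n_partitions].append(row)
    for row in probe:
        probe_parts[hash(probe_key(row)) % n_partitions].append(row)

    # Step 2: for each corresponding pair of partitions, build then probe
    joined = []
    for p in range(n_partitions):
        table = {}                                # in-memory hash table for this partition
        for row in build_parts[p]:
            table.setdefault(build_key(row), []).append(row)
        for row in probe_parts[p]:
            for match in table.get(probe_key(row), []):
                joined.append((match, row))       # equal join keys: emit a joined row
    return joined


depts = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]
emps = [("SMITH", 20), ("ALLEN", 30), ("KING", 10), ("JONES", 20), ("NOBODY", 99)]
rows = hash_join(depts, emps, build_key=lambda d: d[0], probe_key=lambda e: e[1])
print(sorted((e[0], d[1]) for d, e in rows))
```

Rows with equal join keys always fall into the same partition pair, so each partition pair can be joined independently. Only one partition's hash table needs to be in memory at a time.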
Hash Table Partitioning
[Diagram: the hash table data comprises a hash bucket bitmap and hash table partition buffers (Partition 1 to Partition 4), plus saved partition extents]

Bit Vector Filtering
• When partitioning the first input
  • build a bitmap of non-empty hash buckets
• When partitioning the second input
  • check the hash bucket bitmap
  • keys that map to empty hash buckets cannot be joined
  • for equality joins these rows can be immediately excluded
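A minimal sketch of bit vector filtering follows, with a Python integer serving as the bitmap (one bit per hash bucket). The bucket count and the helper names `build_bitmap` and `filter_probe` are hypothetical, and Python's `hash()` is the hash function.

```python
def build_bitmap(build_keys, n_buckets):
    """While partitioning the first input, set one bit per non-empty hash bucket."""
    bitmap = 0
    for k in build_keys:
        bitmap |= 1 << (hash(k) % n_buckets)
    return bitmap


def filter_probe(probe_keys, bitmap, n_buckets):
    """While partitioning the second input, drop keys whose bucket is empty."""
    kept, excluded = [], []
    for k in probe_keys:
        if (bitmap >> (hash(k) % n_buckets)) & 1:
            kept.append(k)        # bucket is non-empty: key may join
        else:
            excluded.append(k)    # bucket is empty: key cannot join
    return kept, excluded


N_BUCKETS = 64
bitmap = build_bitmap([10, 20, 30], N_BUCKETS)
kept, excluded = filter_probe([10, 20, 25, 74, 99], bitmap, N_BUCKETS)
print(kept, excluded)   # [10, 20, 74] [25, 99]
```

Key 74 survives the filter only because 74 % 64 = 10 shares a bucket with build key 10. The bitmap can let through rows that will not join, but it never excludes a row that could join.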

Partition Histogram
• Partition sizes will be uneven if the data distribution is skewed
• Partition histogram records for each partition pair
  • number of keys
  • bytes of memory required
• Allows dynamic role reversal
  • for each partition, the hash table is built from the input with the smaller memory requirement
• Allows optimum memory use when joining multiple partitions simultaneously
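A minimal sketch of a partition histogram and dynamic role reversal follows. Per partition, it records the number of rows (keys) and the memory they need, then picks the input with the smaller memory requirement to build that partition's hash table. Some simplifying assumptions apply: each input has a fixed row width, Python's `hash()` is the hash function, and the helper names and sample inputs are hypothetical.

```python
from collections import Counter


def partition_histogram(keys, row_bytes, n_partitions):
    """Per partition: (number of keys, bytes of memory needed) for one input."""
    counts = Counter(hash(k) % n_partitions for k in keys)
    return {p: (counts[p], counts[p] * row_bytes) for p in range(n_partitions)}


def choose_build_inputs(hist_a, hist_b):
    """Dynamic role reversal: build each partition from the input needing less memory."""
    return {p: "A" if hist_a[p][1] <= hist_b[p][1] else "B" for p in hist_a}


# hypothetical inputs: A has wide rows skewed to key 0, B has narrow rows skewed to key 1
hist_a = partition_histogram([0, 0, 0, 0, 1, 2, 3], row_bytes=100, n_partitions=4)
hist_b = partition_histogram([0, 1, 1, 1, 1, 2, 3], row_bytes=40, n_partitions=4)
roles = choose_build_inputs(hist_a, hist_b)
print(roles)   # {0: 'B', 1: 'A', 2: 'B', 3: 'B'}
```

In partition 1, input A is chosen even though it has fewer rows in total. Its 100 bytes are less than B's 160 bytes, so the choice is made partition by partition, on memory rather than row count.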
Hash Join Processing
• Phases
  • initialization (planning memory use)
  • build input partitioning
  • probe input partitioning
    • may begin to return rows
  • joining of saved partitions
    • may require sub-partitioning of large partitions
• Hash area memory is used differently in each phase
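The slides do not describe how large partitions are sub-partitioned. The sketch below assumes one plausible scheme: any partition that is still too large to fit in memory is split again using the next group of bits from the same hash value. The limit `max_rows` stands in for the available hash area. `zlib.crc32` is a stand-in hash function, and the helper names `hash32` and `partition` are hypothetical.

```python
import zlib


def hash32(key) -> int:
    """Stand-in 32-bit hash function (not Oracle's)."""
    return zlib.crc32(repr(key).encode("utf-8"))


def partition(rows, key, fanout_bits, max_rows, level=0):
    """Split rows by hash value until each partition holds at most max_rows rows.

    Each level uses the next fanout_bits bits of the hash value, so rows that
    share a partition at one level can be separated at the next.
    """
    if len(rows) <= max_rows:
        return [rows]                         # fits in memory: no further splitting
    if (level + 1) * fanout_bits > 32:
        return [rows]                         # hash bits used up (e.g. one very common key)
    mask = (1 << fanout_bits) - 1
    parts = [[] for _ in range(1 << fanout_bits)]
    for row in rows:
        h = hash32(key(row))
        parts[(h >> (level * fanout_bits)) & mask].append(row)
    result = []
    for part in parts:
        if part:
            result.extend(partition(part, key, fanout_bits, max_rows, level + 1))
    return result


big = list(range(1000))
parts = partition(big, key=lambda r: r, fanout_bits=2, max_rows=100)
print(len(parts), max(len(p) for p in parts))
```

A partition made up of a single repeated key can never be split by hashing. The guard on hash bits stops the recursion in that case instead of looping forever.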
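Build Input Partitioning
[Diagram: in the hash area, each input row (join key, non-key columns) is read into the input buffers and passed through the hash function; the hash value yields a hash partition number, which selects the partition buffer that receives (hash value, join key, non-key columns), and a hash bucket number, which is set in the hash bucket bitmap; a partition histogram is kept alongside]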
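The data flow in the diagram above can be sketched as follows. One hash value per row yields both a partition number and a bucket number. The row is stored with its hash value in a partition buffer, the histogram and bucket bitmap are updated, and a full partition buffer is written out. Several details are assumptions, not taken from the slides: which bits give the partition versus the bucket, the buffer capacity, the use of `zlib.crc32` as the hash function, and a Python list in place of saved extents on disk. All names are hypothetical.

```python
import zlib

N_PARTITIONS = 4     # hypothetical fan-out (power of two)
N_BUCKETS = 1024     # hypothetical number of hash buckets (power of two)
BUFFER_ROWS = 3      # hypothetical partition buffer capacity, in rows


def hash32(key) -> int:
    return zlib.crc32(repr(key).encode("utf-8"))    # stand-in hash function


def partition_build_input(rows, key):
    buffers = [[] for _ in range(N_PARTITIONS)]     # in-memory partition buffers
    saved = [[] for _ in range(N_PARTITIONS)]       # stands in for saved extents on disk
    histogram = [0] * N_PARTITIONS                  # rows per partition
    bitmap = 0                                      # hash bucket bitmap
    for row in rows:
        h = hash32(key(row))
        p = h % N_PARTITIONS                        # low-order bits: partition number
        b = (h // N_PARTITIONS) % N_BUCKETS         # next bits: hash bucket number
        buffers[p].append((h, row))                 # keep the hash value with the row
        histogram[p] += 1
        bitmap |= 1 << b
        if len(buffers[p]) == BUFFER_ROWS:          # buffer full: write it out
            saved[p].extend(buffers[p])
            buffers[p] = []
    return buffers, saved, histogram, bitmap


rows = [(i, f"name{i}") for i in range(20)]
buffers, saved, histogram, bitmap = partition_build_input(rows, key=lambda r: r[0])
print(histogram, bin(bitmap).count("1"))
```

Storing the hash value alongside each row means it need not be recomputed when saved partitions are read back and joined.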

Probe Input Partitioning
[Diagram: the hash area holds the input buffers, partition histogram, hash table, partition buffers and hash bucket bitmap; probe rows pass through the hash function and joined rows leave as output rows]
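Probe input partitioning "may begin to return rows" because some build partitions may still be in memory as a hash table. The sketch below routes each probe row accordingly. If its build partition is resident, the row is joined at once. Otherwise the row is saved to be joined later with the saved build partition. The in-memory layout (partition number → {join key: [build rows]}), the use of Python's `hash()`, and all names are hypothetical.

```python
from collections import defaultdict


def partition_probe_input(rows, key, in_memory, n_partitions, saved_partitions):
    """Join probe rows whose build partition is in memory; save the rest."""
    output = []
    for row in rows:
        k = key(row)
        p = hash(k) % n_partitions
        if p in in_memory:
            # build partition is resident: probe now and return rows early
            for match in in_memory[p].get(k, []):
                output.append((match, row))
        else:
            saved_partitions[p].append(row)   # join later with the saved build partition
    return output


N_PARTITIONS = 2
# hypothetical state: build partition 0 is in memory, partition 1 was written out
in_memory = {0: {10: [(10, "ACCOUNTING")], 20: [(20, "RESEARCH")]}}
saved = defaultdict(list)
emps = [("SMITH", 20), ("ALLEN", 31), ("KING", 10), ("WARD", 31)]
out = partition_probe_input(emps, lambda e: e[1], in_memory, N_PARTITIONS, saved)
print([e[0] for _, e in out], [e[0] for e in saved[1]])
```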

Joining Saved Partitions
[Diagram: the hash area holds the input buffers, partition histogram and hash table; joined rows leave as output rows]

Demonstration
• Event 10104 to show hash join internals
