Creating Hash Clusters
A hash cluster is created using a CREATE CLUSTER statement that includes a HASHKEYS clause. The
following example contains a statement to create a cluster named trial_cluster that stores the trial table,
clustered by the trialno column (the cluster key), and another statement creating a table in the cluster.
CREATE CLUSTER trial_cluster (trialno NUMBER(5,0))
    TABLESPACE users
    STORAGE (INITIAL 250K
             MINEXTENTS 1
             NEXT 50K
             MAXEXTENTS 3
             PCTINCREASE 0)
    HASH IS trialno HASHKEYS 150;

CREATE TABLE trial (
    trialno NUMBER(5,0) PRIMARY KEY,
    ...)
    CLUSTER trial_cluster (trialno);
As with index clusters, the key of a hash cluster can be a single column or a composite key (multiple column
key). In this example, it is a single column.
The HASHKEYS value, in this case 150, specifies and limits the number of unique hash values that can be
generated by the hash function used by the cluster. The database rounds the number specified up to the
nearest prime number, so this cluster has 151 hash values.
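The effect of the rounding can be checked with a short calculation. The following Python sketch is an illustration only and is not part of the database. The helper names is_prime and round_up_to_prime are hypothetical, and the sketch assumes that the value is always rounded up to the smallest prime at or above the number specified.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using simple trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def round_up_to_prime(hashkeys: int) -> int:
    """Smallest prime greater than or equal to hashkeys (assumed rounding rule)."""
    candidate = max(hashkeys, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


# HASHKEYS 150, as in the trial_cluster example
print(round_up_to_prime(150))  # 151
```

Under the same assumption, a HASHKEYS value of 500 yields 503 hash values.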
If no HASH IS clause is specified, the database uses an internal hash function. If the cluster key is already a
unique identifier that is uniformly distributed over its range, you can bypass the internal hash function and
specify the cluster key as the hash value, as is the case in the preceding example. You can also use the HASH
IS clause to specify a user-defined hash function.
You cannot create a cluster index on a hash cluster, and you need not create an index on a hash cluster key.
For additional information about creating tables in a cluster, guidelines for setting parameters of the CREATE
CLUSTER statement common to index and hash clusters, and the privileges required to create any cluster,
see Chapter 20, "Managing Clusters". The following sections explain and provide guidelines for setting the
parameters of the CREATE CLUSTER statement specific to hash clusters:
Sample telephone numbers (hash key values): 650-555-1212, 650-555-1213, 650-555-1214, ...
In the following SQL statements, the telephone_number column is the hash key. The hash cluster
is sorted on the call_timestamp and call_duration columns. The number of hash keys is based
on 10-digit telephone numbers.
CREATE CLUSTER call_detail_cluster (
    telephone_number NUMBER,
    call_timestamp   NUMBER SORT,
    call_duration    NUMBER SORT )
    HASHKEYS 10000 HASH IS telephone_number
    SIZE 256;
CREATE TABLE call_detail (
    telephone_number NUMBER,
    call_timestamp   NUMBER SORT,
    call_duration    NUMBER SORT,
    other_info       VARCHAR2(30) )
    CLUSTER call_detail_cluster (
    telephone_number, call_timestamp, call_duration );
Given the sort order of the data, the following query would return the call records for a specified
hash key by oldest record first.
SELECT * FROM call_detail WHERE telephone_number = 6505551212;
The database rounds the HASHKEYS value up to the nearest prime number, so this cluster has a
maximum of 503 hash key values, each of size 512 bytes. The SINGLE TABLE clause is valid only
for hash clusters. HASHKEYS must also be specified.
See Also:
Oracle Database SQL Language Reference for the syntax of the CREATE CLUSTER statement
Setting HASH IS
Specify the HASH IS parameter only if the cluster key is a single column of the NUMBER datatype,
and contains uniformly distributed integers. If these conditions apply, you can distribute rows in
the cluster so that each unique cluster key value hashes, with no collisions (two cluster key
values having the same hash value), to a unique hash value. If these conditions do not apply,
omit this clause so that you use the internal hash function.
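The reason for the uniformity condition can be seen in a small simulation. The following Python sketch is an illustration only. It assumes a simplified model in which the hash value is the cluster key modulo the number of hash values; this model, the function names, and the sample key sets are assumptions made for the example and do not describe the database's internal implementation.

```python
from collections import Counter


def bucket_counts(keys, hash_values):
    """Count distinct keys per hash value under the assumed model key mod hash_values."""
    return Counter(key % hash_values for key in set(keys))


def max_keys_per_hash_value(keys, hash_values):
    """Largest number of distinct keys that share one hash value."""
    return max(bucket_counts(keys, hash_values).values())


HASH_VALUES = 151  # HASHKEYS 150 rounded up to a prime

# 150 consecutive integers: uniformly distributed over their range
sequential_keys = range(1, 151)

# 150 keys that are all multiples of 151: badly distributed for this model
clumped_keys = [151 * i for i in range(1, 151)]

print(max_keys_per_hash_value(sequential_keys, HASH_VALUES))  # 1
print(max_keys_per_hash_value(clumped_keys, HASH_VALUES))     # 150
```

In this model, the consecutive keys each map to their own hash value, while the clumped keys all collide on a single hash value. The second case is the kind of key distribution for which the internal hash function is the better choice.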
Setting SIZE
SIZE should be set to the average amount of space required to hold all rows for any given hash
key. Therefore, to properly determine SIZE, you must be aware of the characteristics of your data:
If the hash cluster is to contain only a single table and the hash key values of the rows in
that table are unique (one row for each value), SIZE can be set to the average row size in
the cluster.
If the hash cluster is to contain multiple tables, SIZE can be set to the average amount of
space required to hold all rows associated with a representative hash value.
Further, once you have determined a (preliminary) value for SIZE, consider the following. If
the SIZE value is small (more than four hash keys can be assigned for each data block) you can
use this value for SIZE in the CREATE CLUSTER statement. However, if the value of SIZE is large
(four or fewer hash keys can be assigned for each data block), then you should also consider the
expected frequency of collisions and whether performance of data retrieval or efficiency of space
usage is more important to you.
If the hash cluster does not use the internal hash function (if you specified HASH IS) and
you expect few or no collisions, you can use your preliminary value of SIZE. No collisions
occur and space is used as efficiently as possible.
If you expect frequent collisions on inserts, the likelihood of overflow blocks being
allocated to store rows is high. To reduce the possibility of overflow blocks and maximize
performance when collisions are frequent, you should adjust SIZE as shown in the
following chart.
Available Space per Block / Calculated SIZE    Setting for SIZE
1                                              SIZE
2                                              SIZE + 15%
3                                              SIZE + 12%
4                                              SIZE + 8%
>4                                             SIZE
Overestimating the value of SIZE increases the amount of unused space in the cluster. If
space efficiency is more important than the performance of data retrieval, disregard the
adjustments shown in the preceding table and use the original value for SIZE.
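The adjustment can be expressed as a small calculation. The following Python sketch is an illustration only. It assumes that the chart is keyed by the whole number of hash keys that fit in one data block (1, 2, 3, 4, or more than 4), obtained by dividing the available space in a block by the preliminary SIZE and discarding the remainder. The function name adjusted_size and the figure of 1900 available bytes per block are hypothetical.

```python
# Percentage added to SIZE, keyed by the whole number of hash keys per block.
# Any ratio above 4 leaves SIZE unchanged.
ADJUSTMENT_PERCENT = {1: 0, 2: 15, 3: 12, 4: 8}


def adjusted_size(size: int, available_bytes: int) -> int:
    """Return SIZE adjusted for frequent collisions, rounded up to a whole byte."""
    keys_per_block = available_bytes // size
    if keys_per_block < 1:
        raise ValueError("SIZE exceeds the available space in one data block")
    percent = ADJUSTMENT_PERCENT.get(keys_per_block, 0)
    # Integer arithmetic avoids floating-point rounding surprises.
    return (size * (100 + percent) + 99) // 100


AVAILABLE_BYTES = 1900  # hypothetical usable space in one data block

print(adjusted_size(500, AVAILABLE_BYTES))  # 3 keys per block, +12%: 560
print(adjusted_size(900, AVAILABLE_BYTES))  # 2 keys per block, +15%: 1035
print(adjusted_size(200, AVAILABLE_BYTES))  # 9 keys per block, unchanged: 200
```

If space efficiency matters more than retrieval performance, the unadjusted SIZE is used instead, as described above.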
Setting HASHKEYS
For maximum distribution of rows in a hash cluster, the database rounds the HASHKEYS value up to
the nearest prime number.
In this example, only one hash key can be assigned for each data block. Therefore, the initial
space required for the hash cluster is at least 100*2K or 200K. The settings for the storage
parameters do not account for this requirement. Therefore, an initial extent of 100K and a second
extent of 150K are allocated to the hash cluster.
Alternatively, assume the HASH parameters are specified as follows:
SIZE 500 HASHKEYS 100
In this case, three hash keys are assigned to each data block. Therefore, the initial space required
for the hash cluster is at least 34*2K or 68K. The initial settings for the storage parameters are
sufficient for this requirement (an initial extent of 100K is allocated to the hash cluster).
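The arithmetic in both cases can be reproduced with a short calculation. The following Python sketch is an illustration only. The function names are hypothetical, and the sketch takes the number of hash keys for each data block directly from the text (one in the first case, three in the second) instead of deriving it from the block overhead. It uses the HASHKEYS value of 100 as written, as the text does.

```python
import math


def blocks_needed(hashkeys: int, keys_per_block: int) -> int:
    """Number of data blocks required to hold every hash key."""
    return math.ceil(hashkeys / keys_per_block)


def initial_space_kb(hashkeys: int, keys_per_block: int, block_kb: int = 2) -> int:
    """Minimum initial space for the hash cluster, in kilobytes."""
    return blocks_needed(hashkeys, keys_per_block) * block_kb


def initial_extent_is_sufficient(required_kb: int, initial_extent_kb: int) -> bool:
    """True if the first extent alone covers the required space."""
    return required_kb <= initial_extent_kb


first_case = initial_space_kb(100, keys_per_block=1)   # one hash key per 2K block
second_case = initial_space_kb(100, keys_per_block=3)  # three hash keys per 2K block

print(first_case, initial_extent_is_sufficient(first_case, 100))    # 200 False
print(second_case, initial_extent_is_sufficient(second_case, 100))  # 68 True
```

The first case needs more than the 100K initial extent, which is why a second extent is allocated. The second case fits in the initial extent.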