
CH 09

The chapter covers searching and hashing algorithms. Sequential search scans a list linearly and works on unsorted data. Binary search is more efficient but requires a sorted array; it uses a divide-and-conquer approach. Hashing achieves order-one search time on average by storing data in a hash table, using a hash function to map keys to array indices; distinct keys can map to the same index, which causes a collision. Open addressing techniques such as linear probing and quadratic probing resolve collisions by probing through the array, while chaining stores colliding items in linked lists.

Uploaded by

Momin Aziz
Copyright
© All Rights Reserved

Data Structures Using C++ 2E

Chapter 9
Searching and Hashing Algorithms
Search Algorithms

• Item key
– Unique member of the item
– Used in searching, sorting, insertion, deletion
• Number of key comparisons
– Comparing the key of the search item with the key of
an item in the list
• Can use class arrayListType (Chapter 3)
– Implements a list and basic operations in an array



Sequential Search

• Array-based lists
– Covered in Chapter 3
• Linked lists
– Covered in Chapter 5
• Works the same for array-based lists and linked lists
• See code on page 499
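
The code on page 499 is not reproduced in these slides. A minimal sketch of sequential search, assuming a std::vector<int> in place of the book's arrayListType (the name seqSearch and the -1 "not found" convention are illustrative assumptions, not the book's code):

```cpp
#include <cstddef>
#include <vector>

// Sequential search: compare the search item with each element in turn.
// Returns the index of the first match, or -1 if the item is not in the list.
// Hypothetical helper; not the book's arrayListType member function.
int seqSearch(const std::vector<int>& list, int item)
{
    for (std::size_t loc = 0; loc < list.size(); ++loc)
        if (list[loc] == item)              // one key comparison per element
            return static_cast<int>(loc);
    return -1;                              // unsuccessful search: n comparisons
}
```

The same loop works on a linked list by following node links instead of incrementing an index, which is why the algorithm is the same for both representations.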



Binary Search
• Performed only on ordered arrays
• Uses divide-and-conquer technique

FIGURE 9-1 List of length 12

FIGURE 9-2 Search list, list[0]...list[11]

FIGURE 9-3 Search list, list[6]...list[11]


Binary Search (cont’d.)

• C++ function implementing binary search algorithm
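
The function itself did not survive in these slides. One possible sketch of an iterative binary search, assuming a std::vector<int> sorted in ascending order (the name binarySearch and the -1 return value for an unsuccessful search are assumptions of this sketch):

```cpp
#include <vector>

// Binary search on a list sorted in ascending order.
// Returns the index of item, or -1 if it is not found.
int binarySearch(const std::vector<int>& list, int item)
{
    int first = 0;
    int last = static_cast<int>(list.size()) - 1;

    while (first <= last)
    {
        // Same value as (first + last) / 2, written to avoid integer overflow.
        int mid = first + (last - first) / 2;

        if (list[mid] == item)
            return mid;
        else if (list[mid] > item)
            last = mid - 1;     // continue in the lower half
        else
            first = mid + 1;    // continue in the upper half
    }
    return -1;                  // search interval is empty: item not found
}
```

Each pass through the loop halves the search interval, which is where the logarithmic number of key comparisons comes from.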



Binary Search (cont’d.)
• Example 9-1

FIGURE 9-4 Sorted list for a binary search

TABLE 9-1 Values of first, last, and mid and the
number of comparisons for search item 89



Binary Search (cont’d.)
TABLE 9-2 Values of first, last, and mid and the
number of comparisons for search item 34

TABLE 9-3 Values of first, last, and mid and the
number of comparisons for search item 22



Hashing
• Algorithm of order one (on average)
• Requires data to be specially organized
– Hash table
• Helps organize data
• Stored in an array
• Denoted by HT
– Hash function
• Arithmetic function denoted by h
• Applied to key X
• Compute h(X): read as h of X
• h(X) gives address of the item



Hashing (cont’d.)
• Organizing data in the hash table
– Store data within the hash table (array)
– Store data in linked lists
• Hash table HT divided into b buckets
– HT[0], HT[1], . . ., HT[b – 1]
– Each bucket capable of holding r items
– Follows that br = m, where m is the size of HT
– Generally r = 1
• Each bucket can hold one item
• The hash function h maps key X onto an integer t
– h(X) = t, such that 0 <= h(X) <= b – 1



Hashing (cont’d.)

• See Examples 9-2 and 9-3


• Synonym
– Occurs if h(X1) = h(X2)
• Given two keys X1 and X2, such that X1 ≠ X2
• Overflow
– Occurs if bucket t full
• Collision
– Occurs if h(X1) = h(X2)
• Given X1 and X2 nonidentical keys



Example 9-2
• Suppose there are six students a1, a2, a3, a4, a5,
a6 in the Data Structures class and their IDs are a1:
197354863; a2: 933185952; a3: 132489973; a4:
134152056; a5: 216500306; and a6:106500306.
• Let k1= 197354863, k2=933185952,
k3=132489973, k4=134152056, k5=216500306,
and k6=106500306.
• Suppose that HT denotes the hash table and HT is
of size 13 indexed 0, 1, 2, . . ., 12.
• Define the function h: {k1, k2, k3, k4, k5, k6} →
{0, 1, 2, . . ., 12} by h(ki) = ki % 13. (Note that %
denotes the mod operator.)

Example 9-2 (cont’d.)
• h (k1) = h(197354863) = 197354863 % 13 = 4
• h (k2) = h(933185952) = 933185952 % 13 = 10
• h (k3) = h(132489973) = 132489973 % 13 = 5
• h (k4) = h(134152056) = 134152056 % 13 = 12
• h (k5) = h(216500306) = 216500306 % 13 = 9
• h (k6) = h(106500306) = 106500306 % 13 = 3
• As a result,
• HT[4] ← 197354863; HT[5] ← 132489973;
• HT[9] ← 216500306; HT[10] ← 933185952;
• HT[12] ← 134152056; HT[3] ← 106500306
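
The arithmetic in Example 9-2 can be checked with a short sketch. The names HT_SIZE and buildTable, and the use of 0 to mark an empty slot, are assumptions made for illustration; only h(k) = k % 13 comes from the example.

```cpp
#include <vector>

const int HT_SIZE = 13;   // table size from Example 9-2

// Division hash function from the example: h(k) = k % 13.
int h(long key)
{
    return static_cast<int>(key % HT_SIZE);
}

// Place each key in slot h(key) and return the table; 0 marks an empty slot.
// Assumes no two keys hash to the same slot, which holds for the six IDs
// in Example 9-2 but not in general.
std::vector<long> buildTable(const std::vector<long>& keys)
{
    std::vector<long> HT(HT_SIZE, 0);
    for (long k : keys)
        HT[h(k)] = k;
    return HT;
}
```

Calling buildTable with the six student IDs fills slots 3, 4, 5, 9, 10, and 12 and leaves the other seven slots empty.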



Example 9-3
• Suppose there are eight students in the class in a
college and their IDs are 197354864,933185952,
132489973, 134152056, 216500306, 106500306,
216510306, and 197354865. We want to store each
student’s data into HT in this order using the same
hashing function. The result:
– h(k1) = 197354864 % 13 = 5; h(k2) = 933185952 % 13 = 10
– h(k3) = 132489973 % 13 = 5; h(k4) = 134152056 % 13 = 12
– h(k5) = 216500306 % 13 = 9; h(k6) = 106500306 % 13 = 3
– h(k7) = 216510306 % 13 = 12; h(k8) = 197354865 % 13 = 6
• Note that collisions have occurred: k1 and k3 both hash
to 5, and k4 and k7 both hash to 12. We shall discuss
some ways to handle collisions.
Hashing (cont’d.)
• Overflow and collision occur at same time
– If r = 1 (bucket size = one)
• Choosing a hash function
– Main objectives
• Choose an easy to compute hash function
• Minimize number of collisions
• If HTSize denotes the size of hash table (array size
holding the hash table)
– Assume bucket size = one
• Each bucket can hold one item
• Overflow and collision occur simultaneously



Hash Functions: Some Examples
• Division (modular arithmetic)
– In C++
• h(X) = iX % HTSize; (iX is the key X interpreted as an integer)
– C++ function
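
The function referred to here is missing from the slides. A minimal sketch of the division method for string keys, assuming the key is converted to an integer iX by summing its character codes (that conversion, the name hashFunction, and the table size 101 are all assumptions of this sketch):

```cpp
#include <string>

const int HTSize = 101;   // hypothetical table size chosen for illustration

// Convert the key to an integer iX by summing its character codes,
// then apply the division method: h(X) = iX % HTSize.
int hashFunction(const std::string& insertKey)
{
    int sum = 0;
    for (char ch : insertKey)
        sum += static_cast<unsigned char>(ch);   // keep each code nonnegative
    return sum % HTSize;                         // always in 0 .. HTSize - 1
}
```

Summing character codes is easy to compute but maps every permutation of the same characters to the same slot, so it illustrates the trade-off between the two objectives listed earlier: ease of computation and few collisions.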



Collision Resolution

• Desirable to minimize number of collisions


– Collisions unavoidable in reality
• Hash function always maps a larger domain onto a
smaller range
• Collision resolution technique categories
– Open addressing (closed hashing)
• Data stored within the hash table
– Chaining (open hashing)
• Data organized in linked lists
• Hash table: array of pointers to the linked lists



Collision Resolution: Open Addressing

• Data stored within the hash table


– For each key X, h(X) gives index in the array
• Where item with key X likely to be stored



Linear Probing
• Starting at location t
– Search array sequentially to find next available slot
• Assume circular array
– If lower portion of array full
• Can continue search in top portion of array using mod
operator
– Starting at t, check array locations using probe
sequence
• t, (t + 1) % HTSize, (t + 2) % HTSize, . . ., (t + j) %
HTSize



Example 9-4

• Using linear probing, the array position where each
student’s data from Example 9-3 is stored is:

ID          h(ID)   (h(ID) + 1) % 13   (h(ID) + 2) % 13
197354864   5
933185952   10
132489973   5       6
134152056   12
216500306   9
106500306   3
216510306   12      0
197354865   6       7



Linear Probing (cont’d.)
• The next array slot is given by
– (h(X) + j) % HTSize where j is the jth probe
• See Example 9-4
• C++ code implementing linear probing
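
The code itself is not reproduced in the slides. A minimal sketch of insertion using the probe sequence (h(X) + j) % HTSize, assuming integer keys, h(X) = X % HTSize, and 0 as the marker for an empty slot (the name linearProbeInsert and these conventions are illustrative assumptions):

```cpp
#include <vector>

// Insert key into HT using linear probing and return the slot used,
// or -1 if the table is full. A slot holding 0 is treated as empty,
// so keys are assumed to be positive.
int linearProbeInsert(std::vector<long>& HT, long key)
{
    const int HTSize = static_cast<int>(HT.size());
    const int t = static_cast<int>(key % HTSize);   // home position h(X)

    for (int j = 0; j < HTSize; ++j)
    {
        int slot = (t + j) % HTSize;    // jth probe, wrapping around the array
        if (HT[slot] == 0)
        {
            HT[slot] = key;
            return slot;
        }
    }
    return -1;   // every slot was probed and none was free
}
```

Inserting the eight IDs of Example 9-3, in order, into an empty table of size 13 places them in slots 5, 10, 6, 12, 9, 3, 0, and 7, matching Example 9-4.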



Linear Probing (cont’d.)
• Causes clustering
– More and more new keys would likely be hashed to
the array slots already occupied

FIGURE 9-5 Hash table of size 20

FIGURE 9-6 Hash table of size 20 with certain positions occupied

FIGURE 9-7 Hash table of size 20 with certain positions occupied



Rehashing

• If collision occurs with hash function h


– Use a series of hash functions: h1, h2, . . ., hs
– If collision occurs at h(X)
• Array slots hi(X), 1 <= i <= s, examined



Quadratic Probing
• Suppose
– Item with key X hashed at t (h(X) = t and 0 <= t <=
HTSize – 1)
– Position t already occupied
• Starting at position t
– Search array at locations (t + 1) % HTSize, (t
+ 2^2) % HTSize = (t + 4) % HTSize, (t + 3^2) % HTSize
= (t + 9) % HTSize, . . ., (t + i^2) % HTSize
• Probe sequence: t, (t + 1) % HTSize, (t + 2^2) %
HTSize, (t + 3^2) % HTSize, . . ., (t + i^2) % HTSize



Example 9-6
• Suppose that the size of the hash table is 101 and for
the keys X1, X2, and X3, h(X1) = 25, h(X2) = 96, and
h(X3) = 34. Then the probe sequence for X1 is 25, 26,
29, 34, 41, and so on. The probe sequence for X2 is 96,
97, 100, 4, 11, and so on. (Notice that (96 + 3^2) % 101 =
105 % 101 = 4.)
• The probe sequence for X3 is 34, 35, 38, 43, 50, 59, and
so on. Even though the first element, 34, of the probe
sequence of X3 is the same as the fourth element of the
probe sequence of X1, the two probe sequences differ
after 34.
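
The probe sequences in Example 9-6 can be reproduced with a short sketch. The function name quadraticProbeSequence and its parameters are assumptions made for illustration; the formula (t + i^2) % HTSize is the one given for quadratic probing.

```cpp
#include <vector>

// Return the first `count` positions of the quadratic probe sequence
// t, (t + 1) % HTSize, (t + 2^2) % HTSize, ..., (t + i^2) % HTSize,
// starting at home position t.
std::vector<int> quadraticProbeSequence(int t, int HTSize, int count)
{
    std::vector<int> seq;
    for (int i = 0; i < count; ++i)
        seq.push_back((t + i * i) % HTSize);
    return seq;
}
```

With HTSize = 101, the calls quadraticProbeSequence(25, 101, 5) and quadraticProbeSequence(96, 101, 5) give 25, 26, 29, 34, 41 and 96, 97, 100, 4, 11, the sequences listed for X1 and X2.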



Collision Resolution: Chaining (Open Hashing)
• Hash table HT: array of pointers
– For each j, where 0 <= j <= HTSize – 1
• HT[j] is a pointer to a linked list
• Hash table size (HTSize): less than or equal to the
number of items
FIGURE 9-10 Linked hash table



Collision Resolution: Chaining (cont’d.)

• Item insertion and collision


– For each key X (in the item)
• First find h(X) = t, where 0 <= t <= HTSize – 1
• Item with this key inserted in linked list pointed to by
HT[t]
– For nonidentical keys X1 and X2
• If h(X1) = h(X2)
– Items with keys X1 and X2 inserted in same linked list
• Collision handled quickly, effectively



Collision Resolution: Chaining (cont’d.)
• Search
– Determine whether item R with key X is in the hash
table
• First calculate h(X)
– Example: h(X) = t
• Linked list pointed to by HT[t] searched sequentially
• Deletion
– Delete item R from the hash table
• Search hash table to find where in a linked list R exists
• Adjust pointers at appropriate locations
• Deallocate memory occupied by R
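
Insertion, search, and deletion with chaining can be sketched as follows. This is an illustration under stated assumptions, not the book's implementation: the class name ChainedHashTable, the helper chainLength, integer keys, and h(X) = X % HTSize are all hypothetical, and std::list replaces the hand-written linked lists so that node memory is released by the container.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Hash table with chaining: HT[t] is a linked list of the keys with h(X) = t.
// Keys are assumed to be nonnegative integers.
class ChainedHashTable
{
public:
    explicit ChainedHashTable(int size) : HT(size) {}

    // Insert key at the end of the list for its home position.
    void insert(long key) { HT[h(key)].push_back(key); }

    // Sequentially search the one list that can contain key.
    bool search(long key) const
    {
        const std::list<long>& chain = HT[h(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Remove one occurrence of key; returns false if key is not in the table.
    bool remove(long key)
    {
        std::list<long>& chain = HT[h(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end())
            return false;
        chain.erase(it);    // unlinks the node and frees its memory
        return true;
    }

    // Number of items in bucket t (for illustration only).
    int chainLength(int t) const { return static_cast<int>(HT[t].size()); }

private:
    int h(long key) const
    {
        return static_cast<int>(key % static_cast<long>(HT.size()));
    }

    std::vector<std::list<long>> HT;
};
```

With a table of size 13, inserting 197354864 and 132489973 puts both keys in the list at HT[5], so the collision from Example 9-3 needs no probing at all.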



Collision Resolution: Chaining (cont’d.)

• Overflow
– No longer a concern
• Data stored in linked lists
• Memory space to store data allocated dynamically
– Hash table size
• No longer needs to be greater than number of items
– Hash table size less than the number of items
• Some linked lists contain more than one item
• With a good hash function, the average linked list length
stays small (search is efficient)



Collision Resolution: Chaining (cont’d.)

• Advantages of chaining
– Item insertion and deletion: straightforward
– Efficient hash function
• Few keys hashed to same home position
• Short linked list (on average)
– Shorter search length
• If item size is large
– Saves a considerable amount of space



Collision Resolution: Chaining (cont’d.)
• Disadvantage of chaining
– Small item size wastes space
• Example: 1000 items, each requiring one word of
storage
– Chaining
• Requires 3000 words of storage (assuming a table of
1000 pointers, plus two words per node: item and link)
– Quadratic probing
• If hash table size is twice the number of items: 2000 words
• If table size is three times the number of items (3000 words)
– Keys reasonably spread out
– Results in fewer collisions



Summary
• Sequential search
– Order n
• Ordered lists
– Elements ordered according to some criteria
• Binary search
– Order log2 n (logarithm base 2 of n)
• Hashing
– Data organized using a hash table
– Apply hash function to determine if item with a key is
in the table
– Two ways to organize data
Summary (cont’d.)

• Collision resolution technique categories


– Open addressing (closed hashing)
– Chaining (open hashing)
• Search analysis
– Review number of key comparisons
– Worst case, best case, average case

