Lecture 6 - Searching

Searching: Hash Tables and Binary Search Trees

Juliet Moso
Department of Computer Science
SOME TERMINOLOGY
Ancestor of a node: any node on the path from the root to
that node
Descendant of a node: any node on a path from that node
down to a leaf
Level (depth) of a node: number of edges in the path from the
root to that node
Height of a tree: number of levels in the tree
BINARY SEARCH TREES (BST)
Binary Search Tree Property:
For every node, the value stored at the node is greater than any value in
its left subtree and less than any value in its right subtree
In particular, the value stored at a node is greater than the value stored at
its left child and less than the value stored at its right child
SEARCHING A BST
(1) Start at the root
(2) If the current subtree is empty, then the item is
not found
(3) Compare the value of the item you are searching
for with the value stored at the root of the current
subtree; if the values are equal, then the item is found
(4) If it is less than the value stored at the root, then
search the left subtree
(5) If it is greater than the value stored at the root,
then search the right subtree
(6) Repeat steps 2-5 for the root of the subtree
chosen in step 4 or 5
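
The search steps above can be sketched as a loop. This is a minimal illustration, not the lecture's own code: the Node class, the bst_search name and the example tree are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node layout: a value plus links to two subtrees.
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_search(root: Optional[Node], target: int) -> Optional[Node]:
    """Return the node holding target, or None if it is not in the tree."""
    current = root                      # step 1: start at the root
    while current is not None:
        if target == current.value:
            return current              # step 3: values are equal, item found
        if target < current.value:
            current = current.left      # step 4: continue in the left subtree
        else:
            current = current.right     # step 5: continue in the right subtree
    return None                         # ran out of tree: not found


# Made-up example tree:    8
#                         / \
#                        3   10
#                       / \
#                      1   6
example = Node(8, Node(3, Node(1), Node(6)), Node(10))
print(bst_search(example, 6) is not None)   # True
print(bst_search(example, 7) is not None)   # False
```

Each pass through the loop discards one subtree, so the number of comparisons is at most the number of levels in the tree.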
NUMBER OF NODES
Recursive implementation
#nodes in a tree = #nodes in left subtree + #nodes in right
subtree + 1
What is the size factor?
Number of nodes in the tree we are examining
What is the base case?
The tree is empty
What is the general case?
CountNodes(Left(tree)) + CountNodes(Right(tree)) + 1
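
A possible Python rendering of CountNodes is sketched below. The TreeNode class and the sample tree are assumptions for illustration; only the base case and the general case come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    # Hypothetical node layout used only for this sketch.
    value: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def count_nodes(tree: Optional[TreeNode]) -> int:
    if tree is None:
        return 0  # base case: the tree is empty
    # general case: nodes on the left + nodes on the right + this node
    return count_nodes(tree.left) + count_nodes(tree.right) + 1


# Made-up sample tree with five nodes.
sample = TreeNode(8, TreeNode(3, TreeNode(1), TreeNode(6)), TreeNode(10))
print(count_nodes(sample))  # 5
```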
NUMBER OF NODES
[Figure: the first few steps of the recursive node count]
BST OPERATIONS: RETRIEVE ITEM
What is the size of the problem?
Number of nodes in the tree we are examining

What is the base case(s)?


1. When the key is found
2. The tree is empty (key was not found)

What is the general case?


Search in the left or right subtrees
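
The two base cases and the general case map directly onto a recursive function. The sketch below assumes each node stores a key together with some attached information; the RecordNode class and the retrieve name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordNode:
    # Hypothetical node: a key plus the information attached to it.
    key: int
    info: str
    left: Optional["RecordNode"] = None
    right: Optional["RecordNode"] = None


def retrieve(tree: Optional[RecordNode], key: int) -> Optional[str]:
    if tree is None:
        return None                      # base case 2: empty tree, key not found
    if key == tree.key:
        return tree.info                 # base case 1: the key is found
    if key < tree.key:
        return retrieve(tree.left, key)  # general case: search the left subtree
    return retrieve(tree.right, key)     # general case: search the right subtree


# Made-up records for illustration.
records = RecordNode(8, "eight",
                     RecordNode(3, "three"),
                     RecordNode(10, "ten"))
print(retrieve(records, 10))  # ten
print(retrieve(records, 5))   # None
```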
BST OPERATIONS: INSERT ITEM
What is the size of the problem?
Number of nodes in the tree we are examining
What is the base case(s)?
The tree is empty
What is the general case?
Choose the left or right subtree

• Use the binary search tree property to insert the new item
at the correct place
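
One way to write the insertion recursively is sketched here. The TreeNode class, the helper names and the choice to ignore duplicate values are assumptions of this sketch, not something the slides specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeNode:
    value: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def insert(tree: Optional[TreeNode], item: int) -> TreeNode:
    """Insert item and return the (possibly new) root of this subtree."""
    if tree is None:
        return TreeNode(item)                    # base case: empty tree
    if item < tree.value:
        tree.left = insert(tree.left, item)      # general case: go left
    elif item > tree.value:
        tree.right = insert(tree.right, item)    # general case: go right
    # Assumption: an item equal to an existing value is ignored.
    return tree


def inorder(tree: Optional[TreeNode]) -> List[int]:
    if tree is None:
        return []
    return inorder(tree.left) + [tree.value] + inorder(tree.right)


root = None
for item in [8, 3, 10, 1, 6, 11]:   # made-up values; 11 is inserted last
    root = insert(root, item)
print(inorder(root))                # [1, 3, 6, 8, 10, 11]
print(root.right.right.value)       # 11
```

The new item always ends up as a leaf, which is why the base case is the only place a node is created.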
BST OPERATIONS: INSERT ITEM
[Figure: inserting 11 into the example tree]
DOES THE ORDER OF INSERTING
ELEMENTS INTO A TREE MATTER?
Yes, certain orders produce very unbalanced trees
(for example, inserting the elements in sorted order)
Unbalanced trees are not desirable because search
time increases
There are advanced tree structures (e.g., "red-black
trees") which keep the tree approximately balanced
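
The effect of insertion order can be checked with a small experiment. The sketch below is illustrative only: the TreeNode class, the helper names and the two insertion orders are assumptions, and height is measured as the number of levels, following the terminology at the start of the lecture.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TreeNode:
    value: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def insert(tree: Optional[TreeNode], item: int) -> TreeNode:
    if tree is None:
        return TreeNode(item)
    if item < tree.value:
        tree.left = insert(tree.left, item)
    elif item > tree.value:
        tree.right = insert(tree.right, item)
    return tree


def height(tree: Optional[TreeNode]) -> int:
    """Number of levels in the tree (0 for an empty tree)."""
    if tree is None:
        return 0
    return 1 + max(height(tree.left), height(tree.right))


def build(items: Iterable[int]) -> Optional[TreeNode]:
    root = None
    for item in items:
        root = insert(root, item)
    return root


sorted_order = build([1, 2, 3, 4, 5, 6, 7])   # every node has only a right child
mixed_order = build([4, 2, 6, 1, 3, 5, 7])    # same values, middle value first
print(height(sorted_order))  # 7
print(height(mixed_order))   # 3
```

Both trees hold the same seven values, but a search in the first one may need seven comparisons while the second never needs more than three.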
BST OPERATIONS: DELETE ITEM
What is the size of the problem?
Number of nodes in the tree we are examining
What is the base case(s)?
Key to be deleted was found, or the tree is empty (key is not in the tree)
What is the general case?
Choose the left or right subtree

First, find the item; then, delete it


Important: binary search tree property must be preserved!!
We need to consider three different cases:
(1) Deleting a leaf
(2) Deleting a node with only one child
(3) Deleting a node with two children
DELETING A LEAF
Remove the leaf by clearing its parent's link to it
DELETING A NODE WITH ONLY ONE CHILD
Link the parent of the deleted node directly to that child
DELETING A NODE WITH TWO CHILDREN
Find predecessor (it is the rightmost node in the left subtree)
Replace the data of the node to be deleted with predecessor's data
Delete predecessor node
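
The three deletion cases can be combined in one recursive function. This is a sketch under assumptions: the TreeNode class, the function names and the sample tree are made up, and the two-children case uses the predecessor as described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeNode:
    value: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def delete(tree: Optional[TreeNode], key: int) -> Optional[TreeNode]:
    """Delete key from the subtree and return the subtree's new root."""
    if tree is None:
        return None                              # key is not in the tree
    if key < tree.value:
        tree.left = delete(tree.left, key)       # keep looking on the left
    elif key > tree.value:
        tree.right = delete(tree.right, key)     # keep looking on the right
    else:
        if tree.left is None:
            return tree.right                    # leaf, or only a right child
        if tree.right is None:
            return tree.left                     # only a left child
        pred = tree.left                         # two children: find predecessor,
        while pred.right is not None:            # the rightmost node on the left
            pred = pred.right
        tree.value = pred.value                  # copy the predecessor's data
        tree.left = delete(tree.left, pred.value)  # then delete the predecessor
    return tree


def inorder(tree: Optional[TreeNode]) -> List[int]:
    if tree is None:
        return []
    return inorder(tree.left) + [tree.value] + inorder(tree.right)


# Made-up sample tree.
root = TreeNode(8,
                TreeNode(3, TreeNode(1), TreeNode(6, TreeNode(4), TreeNode(7))),
                TreeNode(10, None, TreeNode(14)))
root = delete(root, 1)    # case 1: a leaf
root = delete(root, 10)   # case 2: a node with only one child
root = delete(root, 8)    # case 3: a node with two children
print(root.value)         # 7
print(inorder(root))      # [3, 4, 6, 7, 14]
```

The inorder listing stays sorted after every deletion, which is a quick check that the binary search tree property was preserved.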
TREE TRAVERSALS
There are mainly three ways to traverse a tree:
1) Inorder Traversal
2) Postorder Traversal
3) Preorder Traversal
TreeWalk(x)
    if x is not empty
        TreeWalk(left[x]);
        print(x);
        TreeWalk(right[x]);
• Prints elements in sorted (increasing) order
• This is called an Inorder Traversal: print left, then root, then right
• Preorder Traversal: print root, then left, then right
• Postorder Traversal: print left, then right, then root
INORDER TRAVERSAL: A E H J M T Y

         tree
           |
          'J'            Visit second
        /     \
     'E'       'T'
    /   \     /   \
  'A'   'H' 'M'   'Y'

Visit left subtree first        Visit right subtree last


POSTORDER TRAVERSAL: A H E M Y T J

         tree
           |
          'J'            Visit last
        /     \
     'E'       'T'
    /   \     /   \
  'A'   'H' 'M'   'Y'

Visit left subtree first        Visit right subtree second


PREORDER TRAVERSAL: J E A H T M Y

         tree
           |
          'J'            Visit first
        /     \
     'E'       'T'
    /   \     /   \
  'A'   'H' 'M'   'Y'

Visit left subtree second       Visit right subtree last
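
The three traversals differ only in where the root is visited. The sketch below rebuilds the example tree rooted at 'J' and reproduces the three orders listed above; the CharNode class and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharNode:
    value: str
    left: Optional["CharNode"] = None
    right: Optional["CharNode"] = None


def inorder(x: Optional[CharNode]) -> List[str]:
    if x is None:
        return []
    return inorder(x.left) + [x.value] + inorder(x.right)      # left, root, right


def preorder(x: Optional[CharNode]) -> List[str]:
    if x is None:
        return []
    return [x.value] + preorder(x.left) + preorder(x.right)    # root, left, right


def postorder(x: Optional[CharNode]) -> List[str]:
    if x is None:
        return []
    return postorder(x.left) + postorder(x.right) + [x.value]  # left, right, root


# The example tree from the traversal slides.
tree = CharNode("J",
                CharNode("E", CharNode("A"), CharNode("H")),
                CharNode("T", CharNode("M"), CharNode("Y")))
print(" ".join(inorder(tree)))    # A E H J M T Y
print(" ".join(preorder(tree)))   # J E A H T M Y
print(" ".join(postorder(tree)))  # A H E M Y T J
```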


WHAT IS A HASH TABLE ?
Hash tables are an array-based method for
implementing a Dictionary
The simplest kind of hash table is an array of
records.
This example has 701 records.

[Figure: an array of records with locations [0] [1] [2] [3] [4] [5] ... [700]]
WHAT IS A HASH TABLE ?
Each record has a special field, called its key.
In this example, the key is a long integer field called Number.
The number might be a person's identification number, and the
rest of the record has information about the person.
[Figure: a record with Number 506643548 stored at location [4] of the array [0] ... [700]]
WHAT IS A HASH TABLE ?
When a hash table is in use, some spots contain
valid records, and other spots are "empty".
The empty spots are identified by a special key.
For example, if all our identification numbers are
positive, then we could use 0 as the Number that
indicates an empty spot.

[Figure: the array [0] ... [700] holding, in index order, Number 281942902, Number 233667136, Number 506643548 and Number 155778322; all other spots are empty]
INSERTING A NEW RECORD
In order to insert a new record, the key must somehow be converted
to an array index between 0 and 700.
The conversion process is called hashing.
The index is called the hash value of the key.
[Figure: a new record with Number 580625685 waiting to be inserted into the array [0] ... [700], which holds Number 281942902, Number 233667136, Number 506643548 and Number 155778322]
INSERTING A NEW RECORD
Typical way to create a hash value:
Take the key mod 701 (the result could be anywhere from 0 to 700).
hash value = (Number mod 701)
What is (580625685 mod 701)? It is 3.
[Figure: the new record with Number 580625685 and the array [0] ... [700], which holds Number 281942902, Number 233667136, Number 506643548 and Number 155778322]
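
The mod computation is easy to check in code. A minimal sketch, assuming the names TABLE_SIZE and hash_value (they do not appear in the slides):

```python
TABLE_SIZE = 701  # the example array has locations [0] to [700]


def hash_value(number: int) -> int:
    """Convert a key to an array index between 0 and TABLE_SIZE - 1."""
    return number % TABLE_SIZE


print(hash_value(580625685))  # 3
print(hash_value(701466868))  # 2
```

The second key is the one used in the collision example later in the lecture.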
INSERTING A NEW RECORD
The hash value is used for the location of the new record.
So, this new item (Number 580625685) will be placed at location [3] of the
array.
[Figure: the array [0] ... [700] holding Number 281942902, Number 233667136, Number 506643548 and Number 155778322, with the new record about to go into location [3]]
COLLISIONS
Sometimes, two different records might end up with the same hash
value.
Here is another new record to insert, Number 701466868, with a hash
value of 2.
[Figure: the new record says "My hash value is [2]"; the array [0] ... [700] holds, in index order, Number 281942902, Number 233667136, Number 580625685, Number 506643548 and Number 155778322]
COLLISION RESOLUTION
If, when an element is inserted, it hashes to the
same value as an already inserted element, then we
have a collision and need to resolve it.

There are several methods for dealing with this:
• Separate chaining
• Open addressing
    • Linear Probing
    • Quadratic Probing
    • Double Hashing
SEPARATE CHAINING
The idea is to keep a list of all elements that hash
to the same value.
• The array elements are pointers to the first nodes of the
lists.
• A new item is inserted at the front of the list.
Advantages:
• Better space utilization for large items.
• Simple collision handling: searching linked list.
• Overflow: we can store more items than the hash table
size.
• Deletion is quick and easy: deletion from the linked list.
SEPARATE CHAINING: EXAMPLE
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key mod 10.
[0]: 0
[1]: 81 -> 1
[2]:
[3]:
[4]: 64 -> 4
[5]: 25
[6]: 36 -> 16
[7]:
[8]:
[9]: 49 -> 9
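
The example can be reproduced with a small table of linked lists. This is a sketch under assumptions: the ChainNode and ChainedHashTable classes and their method names are made up for illustration; only the keys, the hash function and the insert-at-front rule come from the slides.

```python
from typing import List, Optional


class ChainNode:
    """One node of a singly linked list of keys."""

    def __init__(self, key: int, next_node: Optional["ChainNode"] = None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size: int = 10):
        self.size = size
        self.slots: List[Optional[ChainNode]] = [None] * size

    def insert(self, key: int) -> None:
        index = key % self.size                    # hash(key) = key mod 10
        # The new node goes to the front of the list for this slot.
        self.slots[index] = ChainNode(key, self.slots[index])

    def contains(self, key: int) -> bool:
        node = self.slots[key % self.size]
        while node is not None:                    # search the linked list
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, index: int) -> List[int]:
        """Keys stored in one slot, front of the list first."""
        keys = []
        node = self.slots[index]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(10)
for key in [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]:
    table.insert(key)
print(table.chain(1))      # [81, 1]
print(table.chain(6))      # [36, 16]
print(table.contains(49))  # True
print(table.contains(50))  # False
```

Because each slot holds a whole list, the table can store more keys than it has slots.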
OPEN ADDRESSING
Separate chaining has the disadvantage of using linked
lists.
• Requires the implementation of a second data structure.

In an open addressing hashing system, all the data go
inside the table.
• Thus, a bigger table is needed.
• If a collision occurs, alternative cells are tried until an
empty cell is found.
There are three common collision resolution strategies:
• Linear Probing
• Quadratic probing
• Double hashing
OPEN ADDRESSING
This is called a collision, because there is already another valid
record at [2].
When a collision occurs, move forward until you
find an empty spot.
[Figure: the new record Number 701466868 moving forward from [2] past the occupied locations; the array [0] ... [700] holds, in index order, Number 281942902, Number 233667136, Number 580625685, Number 506643548 and Number 155778322]
OPEN ADDRESSING
The new record is always placed in the first available
empty spot, after the hash value.
The new record goes in the empty spot.
[Figure: the array [0] ... [700] now holds, in index order, Number 281942902, Number 233667136, Number 580625685, Number 506643548, Number 701466868 and Number 155778322]
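
Insertion with this "move forward" rule (linear probing) can be sketched as follows. The names TABLE_SIZE, EMPTY and insert are assumptions, and so is the wrap-around from the last location back to [0], which the slides do not show. The key 0 marks an empty spot, as suggested earlier in the lecture.

```python
TABLE_SIZE = 701
EMPTY = 0  # all identification numbers are positive, so 0 marks an empty spot


def insert(table: list, key: int) -> int:
    """Place key in the first empty spot at or after its hash value."""
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] == EMPTY:
            table[index] = key
            return index
        # Collision: move forward (assumption: wrap around at the end).
        index = (index + 1) % TABLE_SIZE
    raise RuntimeError("hash table is full")


table = [EMPTY] * TABLE_SIZE
for key in [281942902, 233667136, 506643548, 155778322, 580625685]:
    insert(table, key)

# 701466868 hashes to 2, but [2], [3] and [4] are already occupied.
print(insert(table, 701466868))  # 5
```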
SEARCHING FOR A KEY
The data that's attached to a key can be found fairly quickly.
Start by computing the hash value, which is 2 in this case.
Then check location 2.
If location 2 has a different key than the one you are looking for, then
move forward.
[Figure: searching for Number 701466868, whose hash value is [2]; the record at [2] answers "Not me."]
SEARCHING FOR A KEY
Keep moving forward until you find the key, or you reach an
empty spot.
[Figure: searching for Number 701466868, whose hash value is [2]; the records at [3] and [4] answer "Not me.", and the record at [5] answers "Yes!"]
SEARCHING FOR A KEY
When the item is found, the information can be copied to the
necessary location.
What happens if a search reaches an empty spot?
It can halt and indicate that the key was not in the hash
table.
[Figure: the search for Number 701466868 ends with "Yes!" at [5]; the array [0] ... [700] holds, in index order, Number 281942902, Number 233667136, Number 580625685, Number 506643548, Number 701466868 and Number 155778322]
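
The search procedure can be sketched in the same style. The function names, the EMPTY marker value and the wrap-around at the end of the array are assumptions of this sketch; the keys are the ones from the example.

```python
from typing import Optional

TABLE_SIZE = 701
EMPTY = 0  # assumption: key 0 marks an empty spot


def insert(table: list, key: int) -> int:
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] == EMPTY:
            table[index] = key
            return index
        index = (index + 1) % TABLE_SIZE
    raise RuntimeError("hash table is full")


def search(table: list, key: int) -> Optional[int]:
    """Return the location of key, or None if it is not in the table."""
    index = key % TABLE_SIZE                 # start at the hash value
    for _ in range(TABLE_SIZE):
        if table[index] == key:
            return index                     # "Yes!"
        if table[index] == EMPTY:
            return None                      # empty spot: the key is not here
        index = (index + 1) % TABLE_SIZE     # "Not me." Move forward.
    return None                              # every location was checked


table = [EMPTY] * TABLE_SIZE
for key in [281942902, 233667136, 506643548, 155778322, 580625685, 701466868]:
    insert(table, key)

print(search(table, 701466868))  # 5
print(search(table, 123456789))  # None
```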
DELETING A RECORD
Records may also be deleted from a hash table.
But the location must not be left as an ordinary
"empty spot" since that could interfere with searches.
(Remember that a search can stop when it reaches
an empty spot.)

[Figure: the record Number 506643548 at [4] says "Please delete me."; the array [0] ... [700] holds, in index order, Number 281942902, Number 233667136, Number 580625685, Number 506643548, Number 701466868 and Number 155778322]
DELETING A RECORD
The location must be marked in some special way so
that a search can tell that the spot used to have
something in it.
In any case, a search cannot stop when it reaches "a
location that used to have something here".
A search can only stop when it reaches a true empty
spot.
[Figure: location [4] is marked as deleted; the array [0] ... [700] holds, in index order, Number 281942902, Number 233667136, Number 580625685, Number 701466868 and Number 155778322]
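
One common way to mark a deleted location is a second special key. The sketch below assumes a marker value DELETED = -1, lets insert reuse marked locations, and wraps around at the end of the array; none of these choices is stated in the slides.

```python
from typing import Optional

TABLE_SIZE = 701
EMPTY = 0      # a spot that has never held a record
DELETED = -1   # assumption: marks a spot that used to hold a record


def insert(table: list, key: int) -> int:
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] in (EMPTY, DELETED):   # assumption: reuse deleted spots
            table[index] = key
            return index
        index = (index + 1) % TABLE_SIZE
    raise RuntimeError("hash table is full")


def search(table: list, key: int) -> Optional[int]:
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] == key:
            return index
        if table[index] == EMPTY:              # only a true empty spot stops
            return None                        # the search
        index = (index + 1) % TABLE_SIZE       # occupied or DELETED: move on
    return None


def delete(table: list, key: int) -> bool:
    index = search(table, key)
    if index is None:
        return False
    table[index] = DELETED                     # do not leave an ordinary empty spot
    return True


table = [EMPTY] * TABLE_SIZE
for key in [281942902, 233667136, 506643548, 155778322, 580625685, 701466868]:
    insert(table, key)

delete(table, 506643548)             # frees location [4]
print(search(table, 701466868))      # 5
print(search(table, 506643548))      # None
```

If location [4] had been reset to EMPTY instead, the search for 701466868 would have stopped at [4] and wrongly reported that the key was missing.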
