Ch-2: Abstract Data Structures

Advanced Computing Concepts [COMP 8547]. University of Windsor. My notes.

Uploaded by

Rashed Hasan

Class 2 – Abstract Data Types

Abstract Data Type: An abstract data type (ADT) is a set of objects together with a
set of operations defined on those objects.

Common ADTs:
● Stacks: LIFO (Last-In-First-Out) data structure.
  ○ Operations: push, pop, peek, isEmpty.
● Queues: FIFO (First-In-First-Out) data structure.
  ○ Operations: enqueue, dequeue, peek, isEmpty.
● Lists: Ordered collection of elements.
  ○ Operations: insert, delete, search, traverse.
● Trees: Hierarchical data structure.
  ○ Types: binary trees, binary search trees, AVL trees, etc.
● Graphs: Non-linear data structure with nodes and edges.
  ○ Types: directed graphs, undirected graphs, weighted graphs, etc.
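As an illustration, the stack ADT above can be sketched in a few lines of Python (a minimal list-backed version; the class name is my own choice, not a reference implementation):

```python
# Minimal stack ADT sketch backed by a Python list.
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)       # add on top

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()       # remove from top (LIFO)

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]         # inspect top without removing

    def is_empty(self):
        return len(self._items) == 0

s = Stack()
s.push(1)
s.push(2)
print(s.peek())   # 2: the last item pushed is on top
print(s.pop())    # 2
```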
Maps:
Maps or dictionaries are abstract data types that store key-value pairs. This
allows for efficient retrieval of values based on their corresponding keys.
Key Components of a Map ADT:
● Keys: Unique identifiers used to access values.
● Values: Data associated with the keys.
● Operations:
  ○ insert(key, value): Adds a new key-value pair to the map.
  ○ get(key): Retrieves the value associated with the given key.
  ○ remove(key): Removes the key-value pair from the map.
  ○ contains(key): Checks if the map contains the given key.
  ○ size(): Returns the number of key-value pairs in the map.
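These five operations map directly onto Python's built-in dict, which serves as a quick illustration (the keys and values here are made up):

```python
# The map ADT operations above, expressed with Python's built-in dict.
m = {}
m["alice"] = 90            # insert(key, value)
m["bob"] = 85
print(m.get("alice"))      # get(key): 90
print("bob" in m)          # contains(key): True
del m["bob"]               # remove(key)
print(len(m))              # size(): 1
```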

Common Implementations of Maps:


● Hash Tables [unsorted]: Use hash functions to map keys to indices in an array.
  ○ Advantages: Efficient (expected O(1)) for most operations.
  ○ Disadvantages: Can suffer from collisions (multiple keys mapped to the same index).
● Binary Search Trees: Store key-value pairs in a tree structure, with keys kept in sorted order.
  ○ Advantages: Efficient for ordered operations (e.g., finding the minimum or maximum key).
  ○ Disadvantages: Can degrade to O(n) per operation in the worst case, when the tree
    becomes unbalanced (e.g., keys inserted in sorted order).
● Red-Black Trees: A self-balancing binary search tree that maintains balance properties.
  ○ Advantages: Guaranteed logarithmic time for all operations.
  ○ Disadvantages: More complex implementation than plain binary search trees.

Hash Tables:
Hash tables are a type of data structure that use a hash function to map keys to indices in
an array. This allows for efficient storage and retrieval of key-value pairs.

Key Components of a Hash Table:


● Hash Function: A function that takes a key as input and returns an integer index within
the array.
● Array: The underlying data structure that stores the key-value pairs.
● Collision Handling: A mechanism to handle cases where multiple keys map to the same
index (collision).

Common Collision Handling Techniques:


A collision occurs when two objects map to the same cell in the table.
1. Separate Chaining:
a. Each index in the array points to a linked list that stores all key-value pairs that
hashed to that index.
b. Requires additional memory.

2. Open Addressing: When a collision occurs, the algorithm probes other indices in the
array until an empty slot is found.
a. Linear probing: Search sequentially from the collision point.
b. Quadratic probing: Search using a quadratic function of the probe index.
c. Double hashing: Use a second hash function to determine the probe sequence.
Worst-Case Performance: In the worst case, when all keys hash to the same index, operations can
degrade to O(n).
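A minimal separate-chaining table might look like the sketch below (the class name and 13-slot default are illustrative choices, not a reference implementation). Note how the worst case arises: if every key lands in one chain, lookups scan all n pairs.

```python
# Minimal separate-chaining sketch: each array slot holds a list (chain)
# of the key-value pairs that hashed to that index.
class ChainedHashTable:
    def __init__(self, size=13):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                   # key already present: update in place
                chain[i] = (key, value)
                return
        chain.append((key, value))         # new key (or collision): extend the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                        # worst case O(n): all keys in one chain

t = ChainedHashTable()
t.insert(18, "a")
t.insert(31, "b")      # 18 % 13 == 31 % 13 == 5, so both share bucket 5
print(t.get(31))       # b
```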

Linear Probing
When a collision occurs, linear probing sequentially searches for the next available index in
the hash table.
How Linear Probing Works:
1. Hash the key: Calculate the hash value using the hash function.
2. Check the index: If the corresponding index in the hash table is empty, insert the
key-value pair.
3. Collision: If the index is occupied, increment the index by 1 and repeat step 2.

Example: x = [18, 41, 22, 44, 59, 32, 31]
N = 13 // table size
hash(x) = x mod N = [5, 2, 9, 5, 7, 6, 5]
Cons of Linear Probing:
● Clustering: Can lead to clustering of elements, where consecutive indices are
filled. This can degrade performance, especially for high load factors.
● Deletion difficulties: Deleting elements can create "holes" in the table, which can
make subsequent searches less efficient.
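The three steps above can be sketched on the example keys [18, 41, 22, 44, 59, 32, 31] with table size 13 (the function name is illustrative):

```python
# Linear-probing insertion: on collision, advance one slot at a time.
def linear_probe_insert(table, key):
    n = len(table)
    home = key % n                   # step 1: hash the key
    for step in range(n):
        idx = (home + step) % n      # steps 2-3: check slot, else advance by 1
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("table is full")

table = [None] * 13
positions = {k: linear_probe_insert(table, k)
             for k in [18, 41, 22, 44, 59, 32, 31]}
# 18, 41, 22, 59 land at home slots 5, 2, 9, 7; 44 (home 5) probes to 6;
# 32 (home 6) probes 7 then 8; 31 (home 5) probes 6..10 -- a cluster forming.
```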
Quadratic Probing:
Collision: If the index is occupied, offset the index by a quadratic function of the
probe number: the j-th probe checks (hash + j^2) mod N, with j starting from 1.
Note that quadratic probing is not guaranteed to find an empty bucket even when the
table still has free slots (the guarantee holds when N is prime and the load factor
stays below 0.5).

Example: x = [18, 41, 31, 54, 28, 44, 15]
N = 13 // table size
hash(x) = x mod N = [5, 2, 5, 2, 2, 5, 2]
When a collision occurs, place the key in
A[(i + j^2) % N]
where i = hash(x) and j = 1, 2, 3, ..., N-1 is the probe number.

Key  hash (x % 13)  Final index  Calculation

18   5              5            -
41   2              2            -
31   5              6            (5 + 1^2) % 13 = 6
54   2              3            (2 + 1^2) % 13 = 3
28   2              11           (2 + 1^2) % 13 = 3 occupied,
                                 (2 + 2^2) % 13 = 6 occupied,
                                 (2 + 3^2) % 13 = 11
44   5              9            (5 + 1^2) % 13 = 6 occupied,
                                 (5 + 2^2) % 13 = 9
15   2              1            probes 3, 6, 11 occupied,
                                 (2 + 4^2) % 13 = 5 occupied,
                                 (2 + 5^2) % 13 = 1
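A quadratic-probing sketch on the same data reproduces these placements (the function name is illustrative; note the probe loop may give up even when slots remain free):

```python
# Quadratic probing: the j-th probe checks (home + j^2) mod N.
def quadratic_probe_insert(table, key):
    n = len(table)
    home = key % n
    for j in range(n):                 # may fail even if empty slots remain
        idx = (home + j * j) % n       # A[(i + j^2) % N]; j = 0 is the home slot
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no empty bucket found within N probes")

table = [None] * 13
positions = {k: quadratic_probe_insert(table, k)
             for k in [18, 41, 31, 54, 28, 44, 15]}
```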

Double Hashing
Aims to minimize clustering and improve performance by using two hash functions.
How Double Hashing Works:
1. Primary hash function
2. Secondary hash function
3. Probe sequence: Use the secondary hash value to determine the probe sequence.
The probe sequence is calculated as follows:
○ index = (initial_hash + j * secondary_hash) % table_size
○ j starts from 0 and increments with each probe.

Common Secondary Hash Function:


d2(k) = q - (k mod q)
where
▪ q < N
▪ q and N are prime
Collision:
( Hash1(x) + j * Hash2(x) ) mod N
where
j = 1, 2, 3, ..., N-1 is the probe number
N = table_size

Index: 0   1   2   3   4   5   6   7   8   9   10  11  12
Value: 31  -   41  -   -   18  32  59  73  22  44  -   -

Key  Hash1 (x % 13)  Hash2 (q - x % q, q = 7)  Final index  Calculation

18   5               3                         5            -
41   2               1                         2            -
22   9               6                         9            -
44   5               5                         10           (5 + 1*5) % 13 = 10
59   7               4                         7            -
32   6               3                         6            -
31   5               4                         0            (5 + 1*4) % 13 = 9 occupied,
                                                            (5 + 2*4) % 13 = 0
73   8               4                         8            -
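The worked example can be sketched as follows (q = 7 is an assumption, but it is the only prime below 13 consistent with the Hash2 values above; the function names are illustrative):

```python
# Double hashing: probe step size comes from a second hash function.
N, Q = 13, 7

def h1(x):
    return x % N

def h2(x):
    return Q - (x % Q)          # d2(k) = q - (k mod q); always in 1..q, never 0

def double_hash_insert(table, key):
    step = h2(key)
    for j in range(N):
        idx = (h1(key) + j * step) % N   # (Hash1 + j * Hash2) mod N
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("table is full")

table = [None] * N
for k in [18, 41, 22, 44, 59, 32, 31, 73]:
    double_hash_insert(table, k)
# Final array: 31 at 0, 41 at 2, 18 at 5, 32 at 6, 59 at 7, 73 at 8,
# 22 at 9, 44 at 10 -- matching the table above.
```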
Load Factor:
The load factor of a hash table is the ratio of the number of elements stored in the table to
the total size of the table. It is calculated as:
Load factor (α) = Number of elements (n) / Table size (N)
α = n / N

If
α → 0, the expected number of probes p is a constant
α → 1, the expected number of probes p → ∞ (infinite)
The ideal load factor is α < 0.5; then the expected running time of a
search/insertion/deletion is O(1).

Rehashing
1. Create a new hash table with a new size.
2. Remove all elements from the old table (one at a time).
3. Insert them into the new table (one at a time).

What’s the size of the new table?


It’s usually decided by the target load factor:
new table size = number of elements / target load factor
N = n / α
In practice the new size is often about double the old size, rounded up to a prime.
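The three rehashing steps can be sketched as below (linear probing is assumed for the probe scheme; `insert` and `rehash` are illustrative names). The key point is that every element must be re-inserted, because its index depends on the table size.

```python
# Rehashing sketch: move every key into a fresh, larger table.
def insert(table, key):
    n = len(table)
    for step in range(n):
        idx = (key % n + step) % n     # linear probing, for brevity
        if table[idx] is None:
            table[idx] = key
            return

def rehash(old_table, new_size):
    new_table = [None] * new_size
    for key in old_table:              # take each element out of the old table
        if key is not None:
            insert(new_table, key)     # indices change with the new size
    return new_table

old = [None] * 13
for k in [18, 41, 22, 44, 59, 32, 31]:
    insert(old, k)
bigger = rehash(old, 29)               # e.g. grow to a larger prime
```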

Cuckoo hashing
It's named after the cuckoo bird, which lays its eggs in other birds' nests, causing the
original eggs to be evicted. In cuckoo hashing, when a collision occurs, one of the
conflicting elements is evicted and rehashed to a different location.

How it works:

1. Multiple hash functions: Cuckoo hashing uses two independent hash functions, h1
and h2.
2. Initial insertion:
a. Compute h1 and h2 for the key.
b. First try to place it in table 1 using h1.
3. Collision:
a. Kick out the current occupant and take its place.
b. The evicted key uses h2 to get placed in the second table.
Key  h1  h2  Placement (insertion order A, B, D, C, F, E)

A    0   2   Tab1[0]; evicted by B, moves to Tab2[2]
B    0   0   Tab1[0] (evicts A)
D    1   0   Tab1[1]; evicted by C, moves to Tab2[0]
C    1   4   Tab1[1] (evicts D)
F    3   4   Tab1[3]; evicted by E, moves to Tab2[4]
E    3   2   Tab1[3] (evicts F)

Final: Tab1 = [B, C, -, E, -], Tab2 = [D, -, A, -, F]

What if the second table is also occupied?

- Evict the occupant of table 2 and try to place the evicted key back in table 1.


- This process continues. If it cycles indefinitely (causing many displacements),
the algorithm detects a cycle or limit of moves and triggers a rehashing
(resizing the hash tables and recalculating the hash functions) to resolve the
issue.
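The eviction loop can be sketched on the example's h1/h2 values (the table size of 5 and the kick limit are assumptions; only indices 0-4 appear in the example):

```python
# Cuckoo insertion sketch: evicted keys bounce between the two tables
# until everyone has a slot, or a kick limit suggests a cycle.
H1 = {"A": 0, "B": 0, "D": 1, "C": 1, "F": 3, "E": 3}
H2 = {"A": 2, "B": 0, "D": 0, "C": 4, "F": 4, "E": 2}

def cuckoo_insert(t1, t2, key, max_kicks=20):
    for _ in range(max_kicks):
        idx = H1[key]
        key, t1[idx] = t1[idx], key    # place in table 1, grabbing any occupant
        if key is None:
            return True
        idx = H2[key]
        key, t2[idx] = t2[idx], key    # the evicted key tries table 2
        if key is None:
            return True
    return False                       # likely a cycle: trigger a rehash

t1, t2 = [None] * 5, [None] * 5
for k in ["A", "B", "D", "C", "F", "E"]:
    cuckoo_insert(t1, t2, k)
# t1 = [B, C, None, E, None], t2 = [D, None, A, None, F]
```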

Advantages of cuckoo hashing:

● High load factors: Cuckoo hashing can handle very high load factors (ratio of
elements to buckets) without significant performance degradation.
● Constant time operations: On average, operations like insertion, deletion, and
search have constant time complexity.
● No clustering: Unlike linear probing or quadratic probing, cuckoo hashing doesn't
suffer from clustering issues.

Disadvantages of cuckoo hashing:

● Worst-case performance: In rare cases, cuckoo hashing can require a large
number of rehashings, leading to linear time complexity.
● Complexity: The implementation of cuckoo hashing can be more complex than
other collision resolution techniques.

Binary Heap
A heap is a binary tree satisfying the following properties:
▪ Heap-Order: for every node v other than the root,
key(v) ≥ key(parent(v))
▪ Complete Binary Tree: let h be the height of the heap
➢ for i = 0, ..., h-1, there are 2^i nodes at depth i
➢ at depth h, the external nodes are arranged to the
left of the tree

Height of a Heap
• Theorem: A heap storing n keys has height O(log n).
(Depths 0, ..., h-1 are full, so n ≥ 1 + 2 + ... + 2^(h-1) + 1 = 2^h,
hence h ≤ log2 n.)
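Python's heapq module maintains exactly this (min-)heap-order on a plain list, which is the array encoding of a complete binary tree: the children of index i sit at 2i+1 and 2i+2. A quick illustration, using keys from the earlier examples:

```python
import heapq

# heapq keeps the heap-order invariant on an ordinary list.
h = []
for key in [59, 18, 44, 31, 22]:
    heapq.heappush(h, key)    # O(log n): sift the new key up

print(h[0])                   # 18: the root holds the minimum key
print(heapq.heappop(h))       # 18: removing the root is O(log n)
print(h[0])                   # 22: the next-smallest key rises to the root
```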
