DS Unit-II

Dictionary:

What is dictionary:
A dictionary is a general-purpose data structure for storing a group of objects. A dictionary is a
collection of key-value pairs: it is a set of keys, and each key has a single associated value. When
presented with a key, the dictionary returns the associated value.
Keys in a dictionary must be unique; an attempt to create a duplicate key will typically overwrite
the existing value for that key.
Note that there is a difference (which may be important) between a key not existing in a dictionary
and the key existing but with its corresponding value being null.
Dictionaries are often implemented as hash tables.
Usage: The concept of a key-value store is widely used in various computing systems, such as
caches and high-performance databases.
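As a quick illustration, here is a minimal sketch of dictionary behaviour using Python's built-in dict (the keys and values are invented for the example):

    # A dictionary maps unique keys to values; assigning to an existing key overwrites its value.
    capitals = {"IN": "New Delhi", "FR": "Paris"}
    capitals["FR"] = "Paris"            # duplicate key: the old value is overwritten
    print(capitals["IN"])               # lookup by key -> "New Delhi"

    # A key that is absent is different from a key that is present with a null (None) value.
    capitals["XX"] = None
    print("XX" in capitals)             # True  -> key exists, value is None
    print("YY" in capitals)             # False -> key does not exist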

SkipList:
The skip list is a probabilistic data structure that is built upon the general idea of a linked list. The
skip list uses probability to build subsequent layers of linked lists upon an original linked list. Each
additional layer of links contains fewer elements.
Example: We can think about the skip list like a subway system. There is one train that stops at every
single stop. However, there is also an express train. This train doesn't stop anywhere the regular
train doesn't, but it stops at far fewer places. This makes the express train an attractive option if
you know where it stops.
Why we need a skip list:
The worst-case search time for a sorted linked list is O(n), as we can only traverse the list linearly
and cannot skip nodes while searching.
Can we search a sorted linked list in better than O(n) time? The answer is the skip list.
Skip lists are a linked-list-like structure that allows faster search, insertion, and deletion.
They organize an ordered list hierarchically so we don't need to scan all elements during a search.
The average time complexity of search in a skip list is O(log n), and its space complexity is O(n).

What is a skip list:
• A skip list for a set S of n distinct keys is a series of lists S0, S1, …, Sh (where h is the height of the skip list) such that
• Each list Si contains the special keys +∞ and -∞
• List S0 contains the keys of S in nondecreasing order
• Each list is a subsequence (subset) of the previous one, i.e., S0 ⊇ S1 ⊇ … ⊇ Sh
• List Sh (i.e. the top layer) contains only the two special keys

Search Operation:
Search operation, with an example:
• Steps to search for a key x in a skip list:
• Start at the first position of the top list.
• At the current position p, compare x with y ← key(next(p)):
  x = y: return next(p)
  x > y: scan forward
  x < y: drop down
• Repeat the above step. (If a "drop down" goes past the bottom list, return null.)
• Example of a search:
Element to be searched: 78.
1. At first our pointer is at the top level S3 (or whichever level is the top) of the skip list.
2. We compare our search element with the key of the next node: if our element is larger,
we scan forward in the same level; otherwise, if the next node's key is larger,
we drop down to the next level S2.
In our example 78 < ∞, so we drop down to the level below, S2.
3. We repeat the same operation in level S2: 78 > 31, so we scan forward in the same level.
Our pointer is now at 31, and we compare 78 with the next key (i.e. ∞); since 78 < ∞,
we drop down to the next level, S1.
4. In level S1 the same operation gives: 78 > 34 → scan forward; 78 > 64 → scan forward; 78 < ∞ → drop down to S0.
5. In level S0, 78 = 78 → element found. (A code sketch of this search follows.)
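The following is a minimal Python sketch of this search procedure. The node layout (a key plus an array of forward pointers, one per level) and the -∞ sentinel head node are assumptions made for illustration, not taken verbatim from the notes above:

    class SkipNode:
        def __init__(self, key, level):
            self.key = key
            # forward[i] is the next node on level i
            self.forward = [None] * (level + 1)

    def skiplist_search(head, top_level, x):
        # Search for key x; head is the -infinity sentinel node.
        p = head
        for level in range(top_level, -1, -1):          # start at the top list
            # scan forward while the next key on this level is still smaller than x
            while p.forward[level] is not None and p.forward[level].key < x:
                p = p.forward[level]
            # otherwise drop down to the next lower level
        p = p.forward[0]
        return p if p is not None and p.key == x else None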
Insertion:
• For n nodes, the maximum number of levels is log2(n) + 1.
Example: n = 10 nodes → max levels: log2(10) + 1 = 3.32 + 1 ≈ 5 (ceiled).
• 5 levels: Level 0 to Level 4
• Level 4 - n/16 nodes = 0 (the highest level may have no node)
• Level 3 - n/8 nodes = 1
• Level 2 - n/4 nodes = 2 or 3
• Level 1 - n/2 nodes = 5
• Level 0 - n nodes = 10
• We search for x in the skip list and find the positions p0, p1, …, pi of the items with the largest
key less than x in each list S0, S1, …, Si
• We insert item (x, o) into list Sj after position pj, for j = 0, …, i
• k > y: scan forward (where k is the search key and y ← key(next(p)))
• k < y: drop down
• To insert an item into a skip list, we use a randomized algorithm:
• We repeatedly toss a coin until we get tails, and we denote by i the number of
times the coin came up heads.

Example: Insert the keys {5, 6, 7, 9, 1, 10, 2, 3, 4, 8} with the level i of each key decided by the randomized algorithm.

Suppose we want to insert these elements into the skip list.

Element:  5  4  7  9  1  10  2  6  3
Level i:  2  3  1  1  1  1   2  1  1  (obtained from the randomized algorithm at different times)
Steps:
Let x be the element to be inserted and y the next element.
• Element 5 is to be inserted into the skip list; start at the first position p of the top list.
• At the current position p, we compare the next element y with the element x to be inserted:
If x > y → scan forward.
If x < y → drop down. We continue until we reach the insertion point, then insert the element
into the lists S0 up to Si (here i = 2, as returned by the randomized algorithm), updating the
links to the previous and next elements on each of those levels.
• We insert all the elements into the skip list at their proper positions in sorted order.
(A code sketch of this insertion follows.)
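A hedged sketch of the insertion routine, reusing the SkipNode class from the search sketch above. The coin-toss level generation follows the randomized idea described in these notes; the helper names are invented:

    import random

    def random_level(max_level):
        # Toss a coin until tails; the number of heads is the level i of the new node.
        i = 0
        while i < max_level and random.random() < 0.5:
            i += 1
        return i

    def skiplist_insert(head, top_level, max_level, x):
        # head must have been created with max_level + 1 forward pointers.
        # Record, for each level, the rightmost node whose key is < x (position p_j).
        update = [head] * (max_level + 1)
        p = head
        for level in range(top_level, -1, -1):
            while p.forward[level] is not None and p.forward[level].key < x:
                p = p.forward[level]
            update[level] = p
        # Choose the height of the new node and splice it into lists S0 .. Si.
        i = random_level(max_level)
        node = SkipNode(x, i)
        for level in range(i + 1):
            node.forward[level] = update[level].forward[level]
            update[level].forward[level] = node
        return max(top_level, i)        # the (possibly new) top level of the skip list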
Deletion:
Suppose we want to delete element x from the skip list.
• To remove an item with key x from a skip list, we proceed as follows:
• We search for x in the skip list and find the positions p0, p1, …, pi of the items with
key x, where position pj is in list Sj
• We remove positions p0, p1, …, pi from the lists S0, S1, …, Si
• We remove all but one of the lists containing only the two special keys
• Example: remove key 34
Hash table (Hash Map):

A hash table is a data structure that is used to store key/value pairs.


It uses a hash function to compute an index into an array in which an element will be inserted or
searched. By using a good hash function, hashing can work well.
Under reasonable assumptions, the average time required to search for an element in a hash table
is O(1).
When a hash table is in use, some spots contain valid records, and other spots are "empty".
Hash table doesn’t accept any NULL key or value.

What is hashing:
Hashing is a technique that is used to uniquely identify a specific object from a group of similar
objects. Some examples of how hashing is used in our lives include:
In universities, each student is assigned a unique roll number that can be used to retrieve
information about them.
In libraries, each book is assigned a unique number that can be used to determine information about
the book, such as its exact position in the library or the users to whom it has been issued.
In both these examples the students and books were hashed to a unique number.
Assume that you have an object and you want to assign a key to it to make searching easy. To store
the key/value pair, you can use a simple array-like data structure where keys (integers) can be
used directly as indices to store values. However, in cases where the keys are large and cannot be
used directly as an index, you should use hashing techniques.
Different time complexities of operations on a hash table are as follows:

Algorithm    Average Case    Worst Case
Space        O(n)            O(n)
Search       O(1)            O(n)
Insert       O(1)            O(n)
Delete       O(1)            O(n)


Applications:
Compilers use hash tables to implement the symbol table (a data structure to keep track of declared
variables).
Game programs use hash tables to keep track of positions they have encountered (transposition tables).
Online spelling checkers.
What is a hash function:
A hash function is any function that can be used to map a data set of an arbitrary size to a data set of
a fixed size, which falls into the hash table. The values returned by a hash function are called hash
values, hash codes, hash sums, or simply hashes.
In hashing, large keys are converted into small keys by using hash functions. The values are then
stored in a data structure called hash table. The idea of hashing is to distribute entries (key/value
pairs) uniformly across an array. Each element is assigned a key (converted key). By using that key,
you can access the element in O(1) time. Using the key, the algorithm (hash function) computes an
index that suggests where an entry can be found or inserted.
In this method, the hash is independent of the array size and it is then reduced to an index.
(a number between 0 and array_size − 1) by using the modulo operator (%).
To achieve a good hashing mechanism, it is important to have a good hash function with the
following basic requirements:
Easy to compute: It should be easy to compute and must not become an algorithm in itself.
Uniform distribution: It should provide a uniform distribution across the hash table and should not
result in clustering.
Fewer collisions: Collisions occur when pairs of elements are mapped to the same hash value. These
should be avoided.
Example:
Hash function: index = key % table_size
table_size = 9; element to be inserted = {key: 201, value: "Chandigarh"}
→ index = key % table_size
→ index = 201 % 9 = 3
The element is inserted at index 3.
Types of Hash Functions:
There are various types of hash functions that are used to place the data in a hash table.
Popular hash functions are:
• Division Method
• Mid Square Method
• Digit Folding Method
Division Method:
The hash function depends upon the remainder of the division of the key by the table size.
For example, if the records 52, 68, 99, 84 are to be placed in a hash table and the table size is 10:
h(key) = key % table_size
h(52) = 52 % 10 = 2
h(68) = 68 % 10 = 8
h(99) = 99 % 10 = 9
h(84) = 84 % 10 = 4
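A small Python sketch of the division method applied to the records above:

    def division_hash(key, table_size):
        # h(key) = key % table_size
        return key % table_size

    for key in (52, 68, 99, 84):
        print(key, "->", division_hash(key, 10))   # 52->2, 68->8, 99->9, 84->4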

Mid Square Method:
In this method, the key is squared and then the middle part of the result is taken as the index.
• First compute K² for the given key K.
• To obtain the index, select the middle r bits (or digits) of K².
• The same positions must be used for all keys.
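A minimal sketch of the mid-square idea in Python; taking r = 2 middle digits (rather than bits) is an assumption made only to keep the example readable:

    def mid_square_hash(key, r=2):
        squared = str(key * key)                 # K^2 written out as digits
        mid = len(squared) // 2
        start = max(mid - r // 2, 0)
        return int(squared[start:start + r])     # the middle r digits form the index

    print(mid_square_hash(99))   # 99^2 = 9801 -> middle digits "80" -> index 80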
Digit Folding Method:
In this method the key is divided into separate parts. These parts are combined using simple
operations (usually addition) to produce the hash value, ignoring the last carry.

H(k) = k1 + k2 + … + kn

Example: Consider a record key 12465512. It is divided into parts, i.e. 124, 655, 12.
After dividing, the parts are combined by adding them:
H(key) = 124 + 655 + 12 = 791
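A small sketch of digit folding for the record 12465512, splitting the key into 3-digit groups as in the example above:

    def folding_hash(key, group_size=3):
        digits = str(key)
        # split the key into groups of group_size digits and add the parts
        parts = [int(digits[i:i + group_size]) for i in range(0, len(digits), group_size)]
        return sum(parts)      # any final carry beyond the table width would be ignored

    print(folding_hash(12465512))   # 124 + 655 + 12 = 791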

What is a perfect hash function:

A perfect hash function maps each different key to a distinct integer index. Usually all possible keys
must be known beforehand. A hash table that uses a perfect hash function has no collisions.
In other words, a hash function is perfect if it maps each item to a unique slot.
It is also known as optimal hashing.

What is a collision:
Since a hash function maps a big key to a small number, there is a possibility that two keys result
in the same value.
The situation where a newly inserted key maps to an already occupied slot in the hash table is called
a collision, and it must be handled using some collision-handling technique.

What are collision resolution strategies? Explain all.

• Separate chaining
• Open addressing
  • Linear probing
  • Quadratic probing
  • Double hashing

A. Separate Chaining:
To handle collisions, the hash table has a technique known as separate chaining. Separate
chaining is defined as a method by which linked lists of values are built in association with each
location within the hash table when a collision occurs.

So, instead of the collision that would otherwise occur, the cell now contains a linked list
holding the strings 'Janet' and 'Martha'.
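A minimal separate-chaining sketch in Python. Each bucket is an ordinary Python list standing in for a linked chain; the class and method names are invented for illustration:

    class ChainedHashTable:
        def __init__(self, size=10):
            self.size = size
            self.buckets = [[] for _ in range(size)]   # one chain per slot

        def _index(self, key):
            return hash(key) % self.size

        def put(self, key, value):
            chain = self.buckets[self._index(key)]
            for i, (k, _) in enumerate(chain):
                if k == key:                    # key already present: overwrite the value
                    chain[i] = (key, value)
                    return
            chain.append((key, value))          # collision: simply extend the chain

        def get(self, key):
            for k, v in self.buckets[self._index(key)]:
                if k == key:
                    return v
            return None

    table = ChainedHashTable()
    table.put("Janet", 1)
    table.put("Martha", 2)        # may land in the same chain as "Janet"
    print(table.get("Martha"))    # 2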

B. Open addressing
1. Linear Probing:
In linear probing, we probe linearly for the next free slot; the gap between two successive
probes is 1, as in the example below.
Let hash_value be the slot index computed using the hash function and table_size be the size of the
table:
hash_value = key % table_size
index = (hash_value + i) % table_size
• If slot (hash + 0) % table_size is full, then we try (hash + 1) % table_size
• If slot (hash + 1) % table_size is also full, then we try (hash + 2) % table_size
• If slot (hash + 2) % table_size is also full, then we try (hash + 3) % table_size
• . . . . . . . . . .
• If slot (hash + n-2) % table_size is also full, then we try (hash + (n-1)) % table_size.

Steps:
• Initially the hash table is empty (table_size = 7).
• For the first element 50, calculate the hash value using key % table_size → 50 % 7 → 1.
• Then the probing formula is applied.
At the first iteration, i = 0:
index → (hash_value + i) % table_size → (1 + 0) % 7 → 1.
At index 1 the element 50 is inserted.
• For the second element 700, calculate the hash value using key % table_size → 700 % 7 → 0.
• Then the probing formula is applied.
At the first iteration, i = 0:
index → (hash_value + i) % table_size → (0 + 0) % 7 → 0.
At index 0 the element 700 is inserted.
• Likewise:
key % table_size → 76 % 7 → 6.
index → (hash_value + i) % table_size → (6 + 0) % 7 → 6.
At index 6 the element 76 is inserted.
• A collision occurs when we try to insert 85:
key % table_size → 85 % 7 → 1.
index → (hash_value + i) % table_size → (1 + 0) % 7 → 1.
At index 1 there is a collision with another element.
So we increment i and probe again, incrementing i until we find an empty slot (wrapping
around the array thanks to the modulo).
At index 2 the element 85 is inserted.
• Likewise, we insert all the elements (see the sketch after these steps).
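Here is a hedged Python sketch of the linear-probing insertion walked through above, with the same keys and table size 7:

    def linear_probe_insert(table, key):
        table_size = len(table)
        hash_value = key % table_size
        for i in range(table_size):                    # probe at most table_size slots
            index = (hash_value + i) % table_size
            if table[index] is None:                   # empty slot found
                table[index] = key
                return index
        raise RuntimeError("hash table is full")

    table = [None] * 7
    for key in (50, 700, 76, 85):
        print(key, "->", linear_probe_insert(table, key))
    # 50 -> 1, 700 -> 0, 76 -> 6, 85 -> 2 (85 collides at index 1 and moves on to 2)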

Problem with Linear Probing:

• Clustering:
  • In linear probing, elements tend to cluster together in the hash table.
  • One part of the table may be much denser than another part.
  • This causes long probe searches.
To avoid clustering, we try another open addressing technique called quadratic probing.

2. Quadratic Probing:
In quadratic probing, we probe quadratically for the next slot; the gap between successive probes
grows as i².
Let hash_value be the slot index computed using the hash function and table_size be the table size
for quadratic probing:
hash_value = key % table_size
index = (hash_value + i * i) % table_size
As in linear probing, we insert at the index positions returned by the i * i formula.
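A minimal quadratic-probing sketch; it differs from the linear version only in the i * i step:

    def quadratic_probe_insert(table, key):
        table_size = len(table)
        hash_value = key % table_size
        for i in range(table_size):
            index = (hash_value + i * i) % table_size   # probe offsets 0, 1, 4, 9, ...
            if table[index] is None:
                table[index] = key
                return index
        raise RuntimeError("no free slot found by quadratic probing")

Note that, unlike linear probing, this probe sequence is not guaranteed to visit every slot; the table-size conditions listed in the comparison below address that.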

Linear vs Quadratic Probing:

• Linear probing
  • Clustering occurs – it causes long probe searches
  • Always finds an open spot if one exists (it might be a long search, but we will find it)
  • Cache performance is best
• Quadratic probing
  • Primary clustering is eliminated
  • Cache performance is moderate
  • In order to guarantee that quadratic probes hit an empty slot, the table size must meet these requirements:
    • Be a prime number
    • Never be more than half full

3. Double Hashing:

Double hashing uses the idea of applying a second hash function to the key when a collision occurs.
The result of the second hash function gives the number of positions from the point of collision at
which to try the insert.

There are a couple of requirements for the second function:

• it must never evaluate to 0
• it must make sure that all cells can be probed

The first hash function is: Hash1(key) = key % Table_Size.

Whenever a collision occurs with the first hash function, we use the second hash function.

A popular second hash function is: Hash2(key) = P − (key % P), where P is a prime number that is
smaller than the size of the table (P < Table_Size).

The index value in double hashing is calculated as: index = (Hash1(key) + i * Hash2(key)) % Table_Size.
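A hedged sketch of double hashing with Hash1(key) = key % table_size and Hash2(key) = P − (key % P); the table size 7 and the prime P = 5 are assumptions chosen only for illustration:

    def double_hash_insert(table, key, prime=5):
        table_size = len(table)
        h1 = key % table_size              # first hash function
        h2 = prime - (key % prime)         # second hash function; never evaluates to 0
        for i in range(table_size):
            index = (h1 + i * h2) % table_size
            if table[index] is None:
                table[index] = key
                return index
        raise RuntimeError("no free slot found by double hashing")

    table = [None] * 7
    for key in (50, 700, 76, 85):
        print(key, "->", double_hash_insert(table, key))
    # 50 -> 1, 700 -> 0, 76 -> 6, 85 -> 4 (85 collides at 1, then probes 6, then 4)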

Advantages of Double Hashing:

• It reduces clustering in a better way.
• It requires fewer comparisons.
• Smaller hash tables can be used.

Disadvantages:

• Like all other forms of open addressing, double hashing becomes linear as the hash table
approaches maximum capacity.

Separate Chaining vs Open Addressing:

• Key storage: In separate chaining, keys are stored inside the hash table as well as outside it.
  In open addressing, all keys are stored only inside the hash table; no key is present outside it.
• Capacity: In separate chaining, the number of keys stored can even exceed the size of the hash
  table. In open addressing, the number of keys can never exceed the size of the hash table.
• Deletion: Easier in separate chaining; difficult in open addressing.
• Extra space: Separate chaining requires extra space for the pointers that store keys outside the
  hash table. Open addressing requires no extra space.
• Cache performance: Poor in separate chaining, because of the linked lists that store keys outside
  the hash table. Better in open addressing, because no linked lists are used.
• Space utilization: In separate chaining, some buckets of the hash table may never be used, which
  leads to wastage of space. In open addressing, buckets may be used even if no key maps to those
  particular buckets.

Rehashing (Hashing again):


When the hash table becomes nearly full, the number of collisions increases, thereby degrading
the performance of insertion and search operations. In such cases, a better option is to create a new
hash table with double the size of the original hash table.
All the entries in the original hash table then have to be moved to the new hash table. This is
done by taking each entry, computing its new hash value, and inserting it into the new hash table.
Though rehashing seems to be a simple process, it is quite expensive and must therefore not be done
frequently.
Load Factor:
• If there are n entries and b is the size of the array, there would be n/b entries per index on
average. This value n/b is called the load factor; it represents the load on our map.
• When the load factor increases beyond its pre-defined value, the complexity of operations
increases.
• The default load factor is 0.75.
How rehashing is done:
• When the table is too full, create a new table at least twice as big whose size is a prime number.
• Compute the new hash value of each element and insert it into the new table.
• Rehash when the table is half full, or when a certain load factor (default value 0.75) is reached
(insertions, deletions, and searches start to take longer), or when an insertion fails.
• Cost of rehashing = O(N).
Example:
Suppose we have a hash table that already contains some values.

The hash function used is h(x) = x % 5. Rehash the entries into a new hash table.
The size of the hash table is doubled.
Before that, we calculate the load factor whenever we insert a new element into the hash table.
If it is greater than its pre-defined value (or the default value of 0.75 if none is given), then rehash.

Now, rehash the key values from the old hash table into the new one using the hash function h(x) =
x % 10.
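A minimal rehashing sketch following the example above (old size 5 with h(x) = x % 5, doubled to size 10 with h(x) = x % 10). The chained-bucket layout and the sample keys are assumptions for illustration:

    def rehash(old_buckets):
        new_size = len(old_buckets) * 2                  # double the table size
        new_buckets = [[] for _ in range(new_size)]
        for chain in old_buckets:                        # take every existing entry,
            for key in chain:
                new_buckets[key % new_size].append(key)  # recompute its hash, reinsert
        return new_buckets

    def insert(buckets, key, max_load_factor=0.75):
        count = sum(len(chain) for chain in buckets) + 1
        if count / len(buckets) > max_load_factor:       # check the load factor n/b first
            buckets = rehash(buckets)
        buckets[key % len(buckets)].append(key)
        return buckets

    buckets = [[] for _ in range(5)]
    for key in (15, 23, 37, 44):
        buckets = insert(buckets, key)                   # the 4th insert triggers a rehash
    print(len(buckets))                                  # 10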

Disadvantages of Rehashing:
Rehashing is expensive: every entry must be moved to the new table, which costs O(n) time.
However, if it is not done, the growing load factor means the table might not give the required time
complexity of O(1); hence rehashing must be done, increasing the size of the bucket array so as to
reduce the load factor and restore the time complexity.

Extendible Hashing:
Extendible hashing is a form of dynamic hashing, since the table grows dynamically as we insert
elements.
It is a hash table in which the hash function uses the last few bits of the key and the table is a
directory that refers to buckets.
Features of Extendible Hashing:

The main features in this hashing technique are:

• Directories: The directories store addresses of the buckets as pointers. An id is assigned to
each directory entry, and it may change each time a directory expansion takes place.
• Buckets: The buckets store the actual hashed data.
How Extendible Hashing is done?
Step 1 – Analyse Data Elements: Data elements may exist in various forms, e.g. integer, string,
float, etc. Here, let us consider data elements of type integer, e.g. 49.

Step 2 – Convert into binary format: Convert the data element in Binary form. For string elements,
consider the ASCII equivalent integer of the starting character and then convert the integer into
binary form. Since we have 49 as our data element, its binary form is 110001.

Step 3 – Check Global Depth of the directory. Suppose the global depth of the Hash-directory is 3.
Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in the binary number
and match it to the directory id.
E.g. : The binary obtained is: 110001 and the global-depth is 3. So, the hash function will return 3
LSBs of 110001 viz. 001.
Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id 001.
Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket overflows. If an
overflow is encountered, go to step 7 followed by Step 8, otherwise, go to step 9.
Step 7 – Tackling the Overflow Condition during Data Insertion: Many times, while inserting data in
the buckets, it might happen that a bucket overflows. In such cases, we need to follow an
appropriate procedure to avoid mishandling of data.
First, Check if the local depth is less than or equal to the global depth. Then choose one of the cases
below.
Case1: If the local depth of the overflowing Bucket is equal to the global depth, then
Directory Expansion, as well as Bucket Split, needs to be performed. Then increment the global
depth and the local depth value by 1. And, assign appropriate pointers.
Directory expansion will double the number of directories present in the hash structure.
Case2: In case the local depth is less than the global depth, then only Bucket Split takes
place. Then increment only the local depth value by 1. And, assign appropriate pointers.

Step 8 – Rehashing of Split Bucket Elements: The Elements present in the overflowing bucket that
is split are rehashed w.r.t the new global depth of the directory.
Step 9 – The element is successfully hashed.
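A small sketch of steps 2-5 above: extracting the "global depth" least significant bits of the key's binary form to pick the directory entry (the helper name is invented for illustration):

    def directory_index(key, global_depth):
        # keep only the global_depth least significant bits of the key
        return key & ((1 << global_depth) - 1)

    key = 49                                        # binary 110001
    print(format(key, "b"))                         # 110001
    print(format(directory_index(key, 3), "03b"))   # 001 -> directory id 001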
Advantages of Extendible Hashing:
• It gives the ability to design a hash function that automatically changes underneath when
the hash table is resized.
• Secondly, there is no need to recalculate the new bucket address for all the records in the
hash table. For example, as explained in Linear Hashing, we split an existing bucket B,
create a new bucket B*, and redistribute B's contents between B and B*.
• This implies that rehashing or redistribution is limited only to the particular bucket that is being
split. There is absolutely no need to touch items in all the other buckets in the hash table.

Disadvantage of an extendible hash table:

• The size of the directory doubles each time it is expanded
(an exponential rate of increase).
