Da Ds Notes
By Quantum City
struct node{
int data;
struct node *next;
};
What happens in memory when we write struct node x; ? – It allocates a memory block according to the size of the struct, as you have learned in C programming. That memory block consists of the data field and the next pointer.
struct node{
int data;
struct node *next;
}x, y, *p;
...
p = &x;
x.next = &y;
y.next = NULL;
But why do we need a linked list ? – Linked lists are used to implement playlists in music apps, the back/forward history of browser pages, and in operating systems to move from one tab or process to another.
In the case of an array the memory assigned is contiguous, while in a linked list it is not. So when we access any element of an array, the operating system fetches a few contiguous blocks from RAM into the cache: the first access is slow, but every later access is fast because the OS has already brought the data into the super-fast cache. In a linked list, since the allocated memory is not contiguous, every access may take time.
But we can grow a linked list at run time, while the size of an array is fixed (at compile time for a static array). This is the main advantage of a linked list over an array. On the other hand, a linked list uses extra memory for the pointers, and random access is not possible: we have to traverse the whole list just to reach the last element.
Comparison (Array vs Linked List):
Memory : Array ❌, Linked List ✅
Random access : Array ✅, Linked List ❌
3) Insertion at beginning :
4) Insertion at end : the new node must become the last node, so newNode->next = NULL;
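A minimal sketch of both insertions, using the struct node defined earlier (function names are assumptions; needs <stdlib.h> for malloc):
struct node* insertAtBeginning(struct node *head, int k){
    struct node *newNode = malloc(sizeof(struct node));
    newNode->data = k;
    newNode->next = head;      // new node points to the old first node
    return newNode;            // new node becomes the head
}
struct node* insertAtEnd(struct node *head, int k){
    struct node *newNode = malloc(sizeof(struct node));
    newNode->data = k;
    newNode->next = NULL;      // last node always points to NULL
    if(head == NULL) return newNode;               // empty list: new node is the head
    struct node *curr = head;
    while(curr->next != NULL) curr = curr->next;   // walk to the last node
    curr->next = newNode;
    return head;
}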
6) Deleting first node : We could simply move head to head->next, but then the old first node is never freed, so there would be a memory leak.
Deleting a node in the middle : first find the node just before the one to delete (call it curr), save curr->next, set curr->next = curr->next->next, and then free the saved node.
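A sketch of both deletions under the same assumptions (free() from <stdlib.h> releases the node, avoiding the leak; helper names are assumptions):
struct node* deleteFirst(struct node *head){
    if(head == NULL) return NULL;
    struct node *temp = head;     // remember the old first node
    head = head->next;            // move head forward
    free(temp);                   // release it, so no memory leak
    return head;
}
void deleteAfter(struct node *curr){          // curr points to the node before the one to delete
    if(curr == NULL || curr->next == NULL) return;
    struct node *temp = curr->next;
    curr->next = temp->next;      // bypass the node being deleted
    free(temp);
}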
You can also implement these functions with recursion instead of a while loop.
//Lecture 4a
When we follow the choice-1 template in recursion we lose the connection with the next node (the node labelled 8 in the lecture's example). So somehow we need to reach the last node first and only then do l->next->next = l; on the way back, and that is why the second choice is better.
reverserecursive(head->next);
head->next->next = head;
}
But after executing this function our head still points to what is now the last node, because the list has been reversed: the old first node is now last, and it does not yet point to NULL. Thus, in the main function we keep one extra pointer that points to the last element of the original linked list.
int main(){
...
Node *last = head;
while(last->next)last = last->next;
reverserecursive(head);
head->next = NULL;
head = last;
...
}
Another method, without the last pointer and the head->next = NULL line in main, is to make reverserecursive return the new head pointer and to set head->next to NULL after the last recursive call. The final recursive program is sketched below.
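A sketch of that final version, using the struct node type from the beginning of the notes (the exact code is not reproduced in the notes, so this is a reconstruction):
struct node* reverserecursive(struct node *head){
    if(head == NULL || head->next == NULL) return head;   // empty list or last node
    struct node *newHead = reverserecursive(head->next);  // reverse the rest of the list
    head->next->next = head;   // make the next node point back to head
    head->next = NULL;         // head becomes the (current) last node
    return newHead;            // new head is the old last node
}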
In the above example the time complexity is O(n), as it traverses the whole linked list. But the space complexity is not O(1); it is O(n), because each recursive call uses O(1) space for its activation record and there are n such calls on the stack. Compare this with a version where only a while loop runs O(n) times: there the space complexity is O(1), since only one stack frame is in use, whereas with recursion n activation records are pushed onto the stack.
//Lecture 4c
Till now we have seen the linked list and operations on it. So it is logical to ask: which operation takes more time ? And what if the input is too large to fit in memory ?
The goal of asymptotic analysis is to simplify the analysis of running time by getting rid of "details", like rounding: 1,000,001 ≈ 1,000,000 and 3n² ≈ n².
//Lecture 5b
Now, we are going to represent time complexity and space complexity as a function of the input size.
1) Big-oh O notation :
T(n) is O(g(n)) if there exist constants c > 0 and n₀ ≥ 0 so that for all n ≥ n₀, T(n) ≤ c·g(n).
Provides asymptotic upper bound.
2) Ω Big-omega notation :
T(n) is Ω(g(n)) if there exist constants c > 0 and n₀ ≥ 0 so that for all n ≥ n₀, T(n) ≥ c·g(n).
Provides an asymptotic lower bound.
3) 𝜽 Theta notation :
θ(g(n)) = { f(n) : there exist positive constants c₁, c₂, and n₀ such that 0 ≤ c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀ }
//Lecture 5d
What does it mean to say asymptotically larger ? – It means we are ignoring constants and looking only at the most significant terms. For example, clearly 2n² > n², but asymptotically they are equal. However n^(5+n) is not asymptotically equal to n^n, because n^(5+n) = n^5 · n^n, which exceeds n^n by the non-constant factor n^5.
Q : Prove that n^ε is asymptotically larger than (log n)^k where ε > 0 and k > 0. –
//Lecture 5e
But we know that n³ is asymptotically larger than n², yet after taking logs we get 3·log n and 2·log n, which are asymptotically equal — and we know that conclusion is wrong for n³ and n². So, when may we take logs and when not ?
If you look carefully, after taking the log the values become so close together that we can no longer tell whether they are asymptotically unequal. In other words, if things are far apart even after taking the log, then before taking the log they were very, very far apart. Thus, if a > b asymptotically we cannot conclude log a > log b asymptotically; but if log a > log b asymptotically then we can definitely say a > b (although from log a = log b we cannot conclude a = b). We can say,
4) o little-oh :
Definition : T(n) is o(g(n)) if for any constant c > 0 there is n₀ > 0 so that for all n ≥ n₀, T(n) < c·g(n).
That is why n² ≠ o(2n²): for c = ½ we get c·g(n) = n², so T(n) and c·g(n) are equal, but we need T(n) to be strictly smaller.
5) 𝝎 little-omega :
Definition : T(n) is ω(g(n)) if for any constant c > 0 there is n₀ > 0 so that for all n ≥ n₀, T(n) > c·g(n).
And we can observe one thing: T(n) = ω(g(n)) if and only if g(n) = o(T(n)).
Which means lg(n!) ≤ lg(n) + lg(n) + … (n times) ⇒ lg(n!) ≤ n·lg n.
And for the other direction, lg(n!) ≥ lg(n/2) + lg(n/2) + … (the first n/2 terms are each at least lg(n/2)), the remaining terms lg(n/2 − 1) + … + lg(1) being non-negative, so lg(n!) ≥ (n/2)·lg(n/2). Together these give lg(n!) = Θ(n lg n).
NOTE :
1) If we say n² = O(n³), formally it is n² ∈ O(n³), because if you look at the definition, O(n³) is a set of functions.
2) All the asymptotic notations are sets, but we use = loosely for membership.
Q : For any two non-negative functions f(n) and g(n), both of which tend to infinity, must we have either f(n) = O(g(n)) or g(n) = O(f(n)) ? – This seems true, but take f(n) = n² and g(n) = n for even inputs and n³ for odd inputs. Here g(n) keeps crossing above and below f(n), so it is sometimes O(f(n)) and sometimes Ω(f(n)), and neither f(n) = O(g(n)) nor g(n) = O(f(n)) holds.
//Lecture 6c
main(){
for(int i = 0;i<N;i++)
for(int j = i+1;j<N;j++){
statement;
}
}
In this example we say that after k steps i becomes equal to N and then the loop is over. N = 1 + 2k, meaning k = (N − 1)/2, and we know that the complexity is O(k) (why?) = O(N/2) = O(N).
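The loop being analysed here is not reproduced in the notes; a loop consistent with N = 1 + 2k would look like this (assumed reconstruction, in the same pseudocode style as above):
for(int i = 1; i < N; i = i + 2){   // i takes the values 1, 3, 5, ..., so after k steps i = 1 + 2k
    statement;                      // executes about (N-1)/2 times, i.e. O(N)
}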
In the first example, the time complexity still remains the same, i.e. O(N/2) = O(N).
Thus, to find the time complexity, first assume the loop runs for k iterations (so the cost is O(k)), meaning after k steps the loop condition becomes false, and then find the relation between k and n.
We again assume that the i loop terminates after k iterations; at the end of the k-th iteration the value of i is 2^k = N, so k = log₂ N.
In the above case, the value of i remains zero because of the initial condition, so this is an infinite loop.
Increment : O(n) if (i = i + c)
Doubling : O(log n) if (i = 2 * i)
Exponentiation : O(lg lg n) if (i = i * i)
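Sketches of the doubling and the i = i*i patterns (assumed examples, in the same pseudocode style):
for(int i = 1; i < N; i = 2 * i){   // i = 1, 2, 4, 8, ... -> about log2(N) iterations
    statement;
}
for(int i = 2; i < N; i = i * i){   // i = 2, 4, 16, 256, ... = 2^(2^k) after k iterations,
    statement;                      // so the loop stops after about log2(log2(N)) iterations
}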
//Lecture 7a
Best case : the minimum time over all possible inputs. Even if on just one input you take O(1) and on all other inputs you take O(n⁴), the best-case time complexity is O(1).
Worst case : the maximum time over all possible inputs. Even if on just one input you take O(n³) and on all other inputs you take O(n), the worst-case time complexity is O(n³).
Q : Consider an algorithm A which takes 𝜃(𝑛) in best case and 𝜃(𝑛2 ) in worst case. Then which of the
following is/are true ?
Abstract data types : An abstract data type (ADT) specifies the operations that can be performed on a collection. It is abstract because it does not specify how the ADT will be implemented; a given ADT can have multiple implementations. For example, linked list, stack, queue.
1.3.1) Stack :
Why stack ? – Some of the applications include reversing a word, the undo mechanism in text editors, etc.
Stack permutation of a sequence : a permutation obtained by pushing and popping every element of the sequence, where pushes and pops may be interleaved arbitrarily (i.e. can happen at any time between elements).
Number of stack permutations : also known as the Catalan number, Cₙ = (1/(n+1)) · C(2n, n).
//Lecture 3b
Implementing stack :
We can implement a stack using an array. We take an array of some size N and a variable top holding an index, which initially has the value -1.
push(k){
    if(top == N - 1) "Overflow", return;   // check for overflow before moving top
    top++;
    a[top] = k;
}
pop(){
    if(top == -1) "Underflow", return;     // nothing to pop
    return a[top--];
}
Why Queue ? – Sometimes it is also desirable to access the element which got inserted first. For
example, in networking, call center phone systems, operating system.
//Lecture 9b
We can implement a queue using a plain array, but it is not as effective as a circular-array-based implementation. We will use two pointers, front and rear, and initially front = rear = -1.
How to check if the queue is full or empty ? – The queue is full when (rear + 1) % N == front, and empty when front == -1.
Enqueue(data){
if((rear + 1) % N == front){
printf("Queue is full");
return;
}
front = (front == -1) ? 0 : front;
rear = (rear+1) % N;
array[rear] = data;
}
Dequeue(){
    if(front == -1){
        printf("Queue is empty");
        return;
    }
    data = array[front];
    if(front == rear) front = rear = -1;   // removed the last element, queue is empty again
    else front = (front+1) % N;            // otherwise just advance front
    return data;
}
Getsize(){
if(front == -1 && rear == -1) return 0;
return (front>rear) ? N-front+rear+1 : rear-front+1;
}
//Lecture 9c
There is also an implementation where initially front = rear = 0 instead of -1, but in that version one array slot is always wasted.
We use one stack for insertion (S1) and one for deletion of elements (S2).
For Dequeue() : if S2 is not empty then pop(S2); if S2 is empty, transfer all the elements from S1 to S2 and then pop(S2).
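A self-contained sketch with two array-based stacks (the array sizes and the names S1, S2, top1, top2 are assumptions):
int S1[100], S2[100];
int top1 = -1, top2 = -1;
void Enqueue(int x){
    S1[++top1] = x;                   // insertions always go to S1
}
int Dequeue(){
    if(top2 == -1){                   // S2 empty: move everything from S1 to S2
        while(top1 != -1)
            S2[++top2] = S1[top1--];  // order gets reversed, oldest element ends on top of S2
    }
    if(top2 == -1) return -1;         // whole queue empty (sentinel value)
    return S2[top2--];                // pop the front of the queue
}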
//Lecture 10a
For pop() : we dequeue n−1 elements (meaning all except one) from the queue and enqueue them into the second queue. The one element remaining is the element that would be on top of the stack.
//Lecture 10b
1) Balancing parentheses :
Create a stack.
while(input is not finished){
if(character is an opening delimiter like (, {, [)
PUSH it into the stack;
if(character is a closing symbol like ), }, ]){
if(stack is empty) report error;
POP the stack;
if(symbol POP-ed is not the corresponding opening delimiter) report error;
}
}
//At the end of the input
if(stack is not empty) report error;
To implement two stacks in one array, we maintain two pointers, Top1 and Top2, pointing to the top elements of the two stacks (one stack grows from the left end, the other from the right). After pushing elements, if we reach the situation Top1 = Top2, we say stack overflow. Remember that in this implementation we first increment Top1 (or decrement Top2) and then store the data.
1. Infix notation A * B + C / D
2. Prefix notation (also known as “Polish Notation”) + * A B / C D
3. Postfix notation (also known as “Reverse Polish Notation”) A B * C D / +
Example :
input is -
if('(') push in stack;
if(')') pop until left parenthesis is popped;
if(operator){
    if the incoming operator has lower priority than the operator on top of the stack, pop (and keep popping);
    if it has higher priority, push it;
    if it has the same priority, pop, except for ↑ which is right-associative, so push;
}
We can evaluate a postfix expression using a single, initially empty stack: scan from left to right, push numbers, and whenever we encounter a binary operator, pop the top two elements, perform the operation, and push the result back onto the stack.
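A sketch for single-digit operands, using a small local array as the stack (function name and sizes are assumptions):
int evaluatePostfix(const char *expr){
    int stack[100], top = -1;                 // small fixed-size stack for this sketch
    for(int i = 0; expr[i] != '\0'; i++){
        char c = expr[i];
        if(c >= '0' && c <= '9'){
            stack[++top] = c - '0';           // operand: push its numeric value
        } else if(c == '+' || c == '-' || c == '*' || c == '/'){
            int b = stack[top--];             // right operand
            int a = stack[top--];             // left operand
            int r = (c == '+') ? a + b : (c == '-') ? a - b
                  : (c == '*') ? a * b : a / b;
            stack[++top] = r;                 // push the result back
        }
    }
    return stack[top];                        // the final result is the only value left
}
For example, evaluatePostfix("23*4+") computes 2*3 + 4 = 10.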
Depth (or level) of a node : The depth of a node is the number of edges from the node to the root.
Root node will have depth (or level) of 0.
Full binary tree : In a full binary tree all nodes have either 0 or 2 children.
Complete tree : All levels except last are full. Last level is left-filled.
Perfect binary tree : Complete binary tree when last level is also full.
Remark : some authors consider a complete tree to be the same as a perfect binary tree, but that is just a matter of convention. If some unusual term appears in the exam, they will specify it.
In full binary tree all internal nodes have degree 2 so here no. of leaves = Internal nodes + 1.
In general, in a full m-ary tree, if n is the total number of nodes and i the number of internal nodes, then n = mi + 1 and there are (m−1)i + 1 leaves.
Q : The height of a binary tree is the maximum number of edges in any root-to-leaf path. The maximum number of nodes in a binary tree of height h is ? – The maximum occurs in a perfect binary tree, where every internal node has two children. The number of nodes at level h of a perfect binary tree is 2^h, and the total number of nodes in the tree is 2^(h+1) − 1.
1) Array representation :
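The indexing convention commonly used for the array representation (stated here as an assumption, since the original figure is not reproduced): the tree is stored level by level, and with 0-based indexing the node at index i has its left child at 2*i + 1, its right child at 2*i + 2, and its parent at (i − 1) / 2.
int tree[15];   // e.g. enough for a perfect binary tree of height 3, stored level by level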
struct node{
int data; //element
struct node *left; //pointer to l child
struct node *right; //pointer to r child
};
struct node *root;
root = (struct node*)malloc(sizeof(struct node));
root->data = 3;
root->left = NULL;
root->right = NULL;
//Lecture 12a
//Lecture 12b
Inorder = g d h b e i a f j c
Preorder = a b d g h e i c f j
You know that preorder lists the roots in sequence, meaning the first element of preorder is the root of the tree. And in inorder the root sits in the middle: all nodes of the left subtree appear to the left of the root and all nodes of the right subtree appear on its right-hand side.
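A sketch of this reconstruction from preorder + inorder, using the tree node struct defined above (assumes distinct keys; needs <stdlib.h> for malloc; function and parameter names are assumptions):
struct node* build(int preorder[], int inorder[], int inStart, int inEnd, int *preIndex){
    if(inStart > inEnd) return NULL;
    int rootVal = preorder[(*preIndex)++];          // next preorder element is the current root
    struct node *root = malloc(sizeof(struct node));
    root->data = rootVal;
    int mid = inStart;
    while(inorder[mid] != rootVal) mid++;           // locate the root inside inorder
    root->left  = build(preorder, inorder, inStart, mid - 1, preIndex);  // left part of inorder
    root->right = build(preorder, inorder, mid + 1, inEnd,  preIndex);   // right part of inorder
    return root;
}
Usage: with int preIndex = 0; the call build(pre, in, 0, n - 1, &preIndex) returns the root of the reconstructed tree.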
If inorder and postorder are given, we follow the same procedure, but instead of going left to right for selecting roots we go right to left, since the overall root of the tree appears last in postorder.
But if only the postorder or only the preorder is given, then we cannot uniquely construct the binary tree.
Q : But how many binary trees are possible when only the postorder traversal, or only the preorder traversal, is given ? – The Catalan number = the number of stack permutations.
//Lecture 12c
Q : But what if both preorder and postorder are given, is it possible to construct the tree ? – With preorder = 1 2 3 and postorder = 3 2 1, the number of possible binary trees…
Preorder = a b c d f g e
Postorder = c f g d b e a
Here a is the root, as it appears at the start of the preorder and at the end of the postorder. Then proceed by inspection. If a complete binary tree were given, things would be easy: first see whether the number of elements is a power of 2 minus 1; if not, you have to add one more node to satisfy the left-fill property. In short, we can construct a unique BT iff
From the above we can conclude that we can construct a unique binary tree (of any shape) iff we have the inorder plus any one other order.
//Lecture 13a
A binary search tree is a binary tree such that, for every node, all the nodes in its right subtree hold values greater than the value stored at that node, and all the nodes in its left subtree hold values less than or equal to it. For example, for any node
Now, the inorder traversal of any tree can be thought of as projecting the value of each node, from left to right, onto the number line. From the BST property, the minimum value sits at the far left of the tree and the maximum at the far right, so when we project onto the number line we get a sorted sequence: the inorder traversal of a BST always gives the numbers in increasing (non-decreasing) order.
1) Search in BST : Let's say you want to search for k. We start at the root node.
Step 1 : if k is less than the current node's value, visit its left child and go to Step 1.
Step 2 : if k is greater than the current node's value, visit its right child and go to Step 1.
Step 3 : if k equals the current node's value, report that k is found.
Step 4 : if you fall off a leaf node (and k was never matched), report that k is not found.
2) Insertion in BST : same as searching, we first search for the position; when we reach a leaf, we check whether k is less than or equal to the leaf's value: if it is, k becomes the left child, otherwise k becomes the right child.
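A sketch of search and insertion on this BST, using the tree node struct from above (duplicates go to the left, matching the convention stated earlier; needs <stdlib.h> for malloc):
struct node* searchBST(struct node *root, int k){
    if(root == NULL || root->data == k) return root;    // not found / found
    if(k < root->data)  return searchBST(root->left, k); // smaller keys are on the left
    return searchBST(root->right, k);                    // larger keys are on the right
}
struct node* insertBST(struct node *root, int k){
    if(root == NULL){                                    // reached an empty position
        struct node *n = malloc(sizeof(struct node));
        n->data = k; n->left = n->right = NULL;
        return n;
    }
    if(k <= root->data) root->left  = insertBST(root->left, k);   // <= goes left
    else                root->right = insertBST(root->right, k);
    return root;
}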
Kth maximum or minimum element in BST : first find the inorder traversal of the BST; the kth element from the left is the kth minimum, and the kth element from the right is the kth maximum.
Complexity of range searches : you will definitely visit the nodes whose keys lie between k1 and k2; say there are m nodes in that range. So the cost is m + something, where the "something" counts the extra nodes traversed. At worst we walk down the longest chain in the BST, which has length h (the height), so the total time complexity is O(h + m), where h = lg n for a balanced BST; if n >> m then the time complexity becomes O(lg n).
//Lecture 13c
4) Deletion in BST :
//Lecture 14a
Let's say we want to find which sequences of nodes could legally be visited while searching a BST for the key 10. Suppose 1, 2, 5, 20, 25 are the nodes encountered (not necessarily in this order); how many orderings of these nodes are possible ? – A legal sequence has the values greater than 10 in decreasing order and the values less than 10 in increasing order. The smaller and larger values can be interleaved arbitrarily, but each group must stay in its order. For example, 1, 20, 2, 5, 25 is not a legal sequence because the values greater than 10 are not in decreasing order; if it were 1, 25, 2, 5, 20 then it would be legal. So the number of such legal orders is 5! / (3! × 2!), because we form two groups {1, 2, 5} and {25, 20}; they may be interleaved arbitrarily but each must keep its internal order.
So, whenever you are given sequences and asked which one is valid, do not waste time building the tree. Just collect the values less than and greater than the key, then check that the smaller values are in increasing order and the larger values are in decreasing order. This method is only valid when a successful search for the key is given; if an unsuccessful search is given, the method can give wrong results. For example,
Suppose the BST has been unsuccessfully searched for the key 273, and the sequence given is 550, 149, 507, 395, 463, 402, 270. We see that all the values less than 273 are in increasing order and the values greater than 273 are in decreasing order, yet the sequence is still invalid: after 395 the search should have gone to the left child (a smaller value), because 273 is less than 395, but instead it moved to the right child, which makes the sequence impossible.
In the remaining positions we can always place 6 before 5 and 7, but 5 and 7 can be in any order, so 2! orderings for 6, 5, 7. Total number of insertion permutations = 2 × C(6, 3) × 2.
There is one problem you may have observed with binary search trees: in the worst case operations take O(n) time because the tree can degenerate into a chain-like structure; we say "the BST can be skewed".
Suggestion 1 : the right and left subtrees of the root have an equal number of nodes. But this can still result in a structure such as,
Suggestion 2 : the right and left subtrees of the root have an equal number of nodes and equal height. But we can still get a structure such as,
Suggestion 3 : the right and left subtrees of every node have an equal number of nodes. This will always make sure the height is lg n, but the condition is too strong: we can only build perfect trees.
Final suggestion : the balance of every node is between -1 and 1, i.e. balance(node) ∈ {-1, 0, +1}
Application :
Direct addressing table : a fancy name for an "array"… elements are stored at the index equal to their key. Its limitations :
1) Keys must be non-negative integers. If a string or another datatype is given, we first map it to an int and then store it.
2) The range of keys must be small.
3) Keys must be dense, i.e. not many gaps in the key values.
With direct addressing, an element with key k is stored in slot k. With hashing, this element is stored
in slot h(k); that is, we use a hash function h to compute the slot from the key k. Here, h maps the
universe U of keys into the slots of a hash table T[0…m-1] :
h : U → {0, 1, 2, 3, …, m-1}
Division method (mod operator) : map into a hash table of m slots. ℎ𝑎𝑠ℎ(𝑘) = 𝑘%𝑚
Q : how to pick m (table size) ? – If m is a power of two, say 2ⁿ, then (key mod m) is the same as extracting the last n bits of the key. This is not a good idea, because it only considers the last n bits: two keys sharing those bits collide. What if m is 10ⁿ ? Then the hash value is the last n decimal digits of the key, which is also not a good idea, because e.g. 4 and 34 map to the same location (for n = 1).
We want h(k) to depend on every bit of k, so that the differences between different k's are fully considered.
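A minimal sketch of the division method (the choice m = 97, a prime not close to a power of 2, is only an illustrative assumption):
int hash(int k, int m){
    return k % m;        // division method; e.g. m = 97 rather than 64 or 100
}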
Load factor : α = n/m
4.1.1) Simple uniform hashing : is when any given element is equally likely to hash into any of the m
slots, independently of where any other element has hashed to
Uniform : Prob[h(x) = i] = 1/m for all x and all i
4.1.2) Collisions and its resolution techniques : Since |U|> m, there must be 2 keys that have the
same hash value. We need a mechanism for handling collisions.
• Separate chaining – also called closed addressing or open hashing. Closed addressing because we are not allowed to place or search for an element anywhere besides its own index; open hashing because we are allowed to use an external data structure such as a linked list.
• Linear probing
• Quadratic probing
• Double hashing
//Lecture 20c
1) Separate chaining : all keys that map to the same hash value are kept in a list.
• Search (k) : worst case = O(length of chain); the worst length of a chain is O(n), when all keys map to the same slot.
• Insert (k) : Need to check whether key already exists, still takes O(length of chain)
• Delete (k) : need searching so O(length of chain)
However, in practice, hash tables work really well, that is because the worst case almost never
happens. And average case performance is really good.
//Lecture 20d
Theorem : In a hash table in which collisions are resolved by chaining, an unsuccessful search takes
average-case time O(1+𝛼), under the assumption of simple uniform hashing.
Proof. We know that here α represents the load factor, which in chaining equals the average length of a chain. So consider searching for a key that is not present in the hash table: we are first mapped to some entry of the table (which takes O(1)), and then on average we go through the whole chain, of expected length α, without finding the key, for a total of O(1 + α).
Q : What if we ask the above theorem for a successful search ? – Then also the answer remains the same, O(1 + α).
Time for a successful search of the ith item in a table of n items = time to insert the ith item when there were i−1 items in the hash table = time for an unsuccessful search with i−1 items in the table.
//Lecture 21a
From now on we will talk about open-address hash tables (closed hashing), where all elements are stored in the hash table itself, i.e. n ≤ m, and there are no chains. To avoid collisions, we use probing.
Q : How to probe ? – we want to design a function h, with the property that for all k ∈ U :
ℎ: 𝑈 × {0, 1, 2, … , 𝑚 − 1} → {0, 1, 2, … , 𝑚 − 1}
2) Linear probing :
In general, we probe with (h(key) + f(i)) % TableSize, where f can be any function of the probe number i; for linear probing, f(i) = i.
Search in linear probing : Continue looking at successive locations till we find key or encounter an
empty location.
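A sketch of insertion and search with linear probing, assuming non-negative keys and a sentinel value EMPTY for unused slots (all names here are assumptions):
#define EMPTY -1
int probeInsert(int table[], int m, int key){
    for(int i = 0; i < m; i++){
        int idx = (key % m + i) % m;                    // h(key) + i, wrapped around
        if(table[idx] == EMPTY){ table[idx] = key; return idx; }
    }
    return -1;                                           // table is full
}
int probeSearch(int table[], int m, int key){
    for(int i = 0; i < m; i++){
        int idx = (key % m + i) % m;
        if(table[idx] == key)   return idx;              // found
        if(table[idx] == EMPTY) return -1;               // empty slot: key cannot be further ahead
    }
    return -1;
}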
Deletion in linear probing : we cannot simply delete an element, because doing so creates an empty slot in the middle of a cluster, and a later search may stop at that empty slot and give a wrong result. For example, here we delete 76 and then search for 55: we get "Not found" although 55 is present.
Primary clustering : in linear probing, occupied slots tend to form long contiguous runs ("primary clusters"), which make probe sequences longer and longer.
//Lecture 21b
3) Quadratic probing :
Problem with quadratic probing : if two keys have the same initial probe position, then their probe sequences are the same, since h(k₁, 0) = h(k₂, 0) implies h(k₁, i) = h(k₂, i). This is called secondary clustering.
But the cycling problem (the probe sequence failing to reach every slot) can be eliminated with a careful selection of c1, c2 and h(k).
4) Double hashing :
Here h and h1 are two different hash functions. But how does it solve the secondary clustering problem ? – Secondary clustering means h(k₁, 0) = h(k₂, 0) implies h(k₁, i) = h(k₂, i). Double hashing does not have this problem: even if h(k₁, 0) = h(k₂, 0), the subsequent probes also depend on the second hash function (the ith probe is determined by both h(k) and h1(k)), so the two probe sequences differ.
NOTE : The main advantage of chaining over open addressing is that in open addressing, even though an element is present, we may fail to find (and hence delete) it if an empty bucket comes in between while searching for it; such a limitation does not exist in chaining.
//Lecture 21c
The number of possible probe sequences in linear probing = m, because once the first position is decided the rest of the sequence is fixed.
For quadratic probing it is also m, for the same reason.
In uniform hashing we can have m! probe sequences (permutations of the slots), since each slot is equally likely to come next even after some insertions.
//Lecture 22a
In this section we assume uniform hashing, so we are not concerned with any particular collision pattern, only with which cell an element ends up in; thus all m! probe permutations are possible.
Load factor 𝜶 in open addressing : In open addressing, the hash table can “fill up” so that no further
insertions can be made; one consequence is that the load factor 𝛼 can never exceed 1.
1) Unsuccessful search time : Given an open-address hash table with load factor α, the expected number of probes in an unsuccessful search is at most 1/(1 − α), assuming uniform hashing.
Proof. We have an open-address hash table with m slots, load factor α (0 < α < 1), and uniform hashing. This means there are n = α·m elements stored in the hash table.
In an unsuccessful search, we're looking for an element that is not in the hash table. We start by
hashing the key and checking the slot. If it's empty, we're done. If it's occupied, we need to probe
further. The probability that a slot is occupied is α and the probability that it is empty is 1 − α. Now, let X be the number of probes required to find an empty slot. Then,
E[X] = (1 − α) + 2α(1 − α) + 3α²(1 − α) + ⋯ = (1 − α)(1 + 2α + 3α² + ⋯) = 1 + α/(1 − α) = 1/(1 − α)
The unsuccessful search time is the same as the number of probes required to insert an element into an open-address hash table holding n elements, because in both cases we stop as soon as we encounter an empty slot.
2) Successful search time : Given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most (1/α)·ln(1/(1 − α)), assuming uniform hashing and that each key in the table is equally likely to be searched for.
A successful search for a key k reproduces the same probe sequence as when the element with key k
was inserted. If k was the (i+1)th key inserted into the hash table, then there were i keys already in it, so the expected number of probes made in a search for k is at most 1/(1 − i/m) = m/(m − i).
Taking average of all n keys in the hash table :
(1/n) · Σ_{i=0}^{n−1} m/(m−i) = (m/n) · Σ_{i=0}^{n−1} 1/(m−i) = (1/α) · Σ_{k=m−n+1}^{m} 1/k ≤ (1/α) · ∫_{m−n}^{m} dx/x = (1/α) · ln(m/(m−n)) = (1/α) · ln(1/(1−α))
No. of comparisons made during an unsuccessful search : the key first maps to a particular location, where we do our first comparison; then we follow the probe sequence according to the type of probing (for example, linear steps in linear probing, quadratic jumps in quadratic probing) till we hit an empty space. We still have to check that this space is empty, so in total we do (cluster comparisons) + 1, where the +1 is due to the empty-space check.
No. of comparisons made during a successful search : at worst we have to cover the whole cluster, and the last element of the cluster is our key (since the search is successful), so the number of comparisons = cluster comparisons only.
//Lecture 23b
1) Expected items per slot : we know that it is α, but let us still prove it.
Let X = number of items in a particular slot, and Xi = indicator that the ith item maps to that particular slot.
E[Xi] = 1/m, as there are m slots and the ith item is equally likely to map to any of them under uniform hashing.
Therefore, E[X] = E[X1] + ⋯ + E[Xn] = n/m = α.
Therefore, E[X] = m × (1 − 1/m)ⁿ
Now let X count collisions, with Xi = the expected number of collisions caused by the ith insertion (i.e. the number of earlier items hashed to the same slot):
E[X1] = 0, because at the first insertion the whole hash table is empty, so there is no collision.
E[X2] = 1/m, because after the first insertion one slot is occupied, and the 2nd element must map to exactly that slot to collide.
E[X3] = 2/m, because after two insertions 2 slots are (at most) filled, and the 3rd element must map to one of those filled slots. Similarly for E[X4], …
E[X] = E[X1] + E[X2] + … + E[Xn] = 0 + 1/m + 2/m + … + (n−1)/m = n(n−1)/(2m)
NOTE : when open addressing is given without specifying probing strategy consider random probing.
//Lecture 24a
When we have a 2-D array we cannot store it in memory in its 2-D form, because memory is linear; that is why we get two choices, i.e. either to lay it out row-wise (row-major) or column-wise (column-major).
Now, in memory we do not simply have addresses 1, 2, 3, …; if we want to get an element, we need the base address of the array, and then we can use the index to reach the desired element.
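A small sketch of the row-major address calculation (names are assumptions):
// Row-major layout of an R x C array: element A[i][j] sits at offset (i*C + j) elements
// from the base address; column-major layout would use (j*R + i) instead.
int* address(int *base, int C, int i, int j){
    return base + (i * C + j);    // pointer arithmetic already scales by sizeof(int)
}
// Usage: for int A[3][4]; the call address(&A[0][0], 4, 2, 1) points to A[2][1].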