Da Ds Notes

The document provides an overview of data structures, focusing on linked lists, their operations, and comparisons with arrays. It discusses various linked list operations such as insertion, deletion, and reversal, along with asymptotic analysis and complexity notations like Big O, Big Omega, and Theta. Additionally, it covers types of linked lists and the importance of memory allocation and access speed in data structure performance.

INDEX

1. Linked list
   Time complexity
   Stack and Queue
2. Binary tree
   AVL tree
3. Heap
4. Hashing
   Row/Column major in array

By Quantum City

“Not too Detailed Not too Short” Notes



1. Linked list, Complexity, Stack and Queue


//Lecture 1

1.1) Linked list :

A linked list is a linear data structure made of nodes, where each node consists of data and a pointer to the next node.

struct node{
int data;
struct node *next;
};

What happens in memory when we write struct node x; ? – It allocates a memory block according to the size of the struct, as you have learned in C programming. That memory block consists of the data field and the next pointer.

struct node{
    int data;
    struct node *next;
}x, y, *p;
...
p = &x;
x.next = &y;   /* next stores the address of node y */
y.next = NULL;

But why do we need a linked list? – Linked lists are used to implement playlists in music apps, the back and forward history of browser pages, and, in an operating system, navigation from one tab or process to another.

1.1.1) Array vs Linked list :

In the case of an array the assigned memory is contiguous, while in a linked list it is not. So when we access any element of an array, the operating system fetches a few contiguous blocks from RAM into the cache: the first access is slow, but every following access is fast because the OS has already brought the data into the superfast cache. In a linked list, since the allocated memory is not contiguous, every access takes time.

However, we can grow a linked list at run time, while the size of an array is fixed at compile time. This is the main advantage of a linked list over an array. On the other hand, a linked list uses extra memory for the pointers, and random access is not possible: we have to traverse the whole list just to get the last element.
                           Array     Linked list

Cache locality              ✅           ❌
No. of elements dynamic     ❌           ✅
Memory                      ❌           ✅
Random access               ✅           ❌

1.1.2) Operations on linked list :

1) Finding length of linked list : count until head is null

int length(struct node *head){


int count = 0;
while(head != NULL){
count ++;
head = head->next;
}
return count;
}

2) Printing linked list : print until head is null

void printList(struct node *head){

    while(head != NULL){
        printf("%d ", head->data);
        head = head->next;
    }
}

3) Insertion at beginning :

struct node *insertatbegin(struct node *head, int data){

    struct node *newNode = (struct node *)malloc(sizeof(struct node));
    newNode->data = data;
    newNode->next = head;
    return newNode;            /* caller: head = insertatbegin(head, 13); */
}

4) Insertion at end :

struct node *insertatend(struct node *head, int data){

    struct node *newNode = (struct node *)malloc(sizeof(struct node));
    newNode->data = data;
    newNode->next = NULL;
    if(head == NULL) return newNode;          /* empty list: new node becomes head */
    struct node *curr = head;
    while(curr->next != NULL) curr = curr->next;
    curr->next = newNode;
    return head;
}

5) Insertion in the middle : suppose we want to insert just before the node whose data is 23, then

int insertatmid(struct node *head){

    struct node *newNode = (struct node *)malloc(sizeof(struct node));
    newNode->data = 13;
    newNode->next = NULL;
    while(head->next->data != 23) head = head->next;   /* assumes a node with data 23 exists */
    newNode->next = head->next;
    head->next = newNode;
    return 0;
}

6) Deleting the first node : we could simply move head to head->next, but then the old first node is never freed and we get a memory leak.

struct node *deleteatbegin(struct node *head){

    if(head == NULL) return NULL;
    struct node *temp = head;
    head = head->next;
    free(temp);                 /* free the old first node to avoid the leak */
    return head;                /* caller: head = deleteatbegin(head); */
}

7) Deleting last node :

struct node *deleteatend(struct node *head){

    if(head == NULL) return NULL;
    if(head->next == NULL){
        free(head);
        return NULL;            /* the list had a single node */
    }
    struct node *curr = head;
    while(curr->next->next != NULL) curr = curr->next;
    free(curr->next);
    curr->next = NULL;
    return head;
}

8) Deleting an intermediate node :

First find the node just before the one to be deleted (call it curr), save the node to delete, set curr->next = curr->next->next, and then free the saved node. A sketch is given below.
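A minimal sketch of this deletion, reusing the struct node defined above (the function name and the assumption that the key is not in the head node are mine, not from the notes; headers such as <stdlib.h> are omitted as elsewhere in these notes):

void deleteintermediate(struct node *head, int key){
    if(head == NULL) return;
    struct node *curr = head;
    /* stop at the node just before the one to delete */
    while(curr->next != NULL && curr->next->data != key)
        curr = curr->next;
    if(curr->next == NULL) return;      /* key not found */
    struct node *temp = curr->next;     /* node to delete */
    curr->next = temp->next;            /* bypass it */
    free(temp);                         /* avoid the memory leak */
}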

You can also implement these functions with recursion instead of a while loop.
//Lecture 4a

1.1.3) Reverse a linked list :

Iteratively, we keep three pointers: prev, curr and next. In each step we first set curr->next = prev, and then advance: prev = curr; curr = next; next = next->next;

We repeat this until curr == NULL.

Node *reverseIterative(Node *head){


if(head == NULL || head->next == NULL) return head;
Node *prep = NULL;
Node *currp = head;
Node *nextp = head->next;
while(currp){
currp->next = prep;
prep = currp;
currp = nextp;
if(nextp) nextp = nextp->next;
}
return prep;
}
//Lecture 4b

Reverse a linked list using recursion :

If we reverse the link before recursing (choice 1 of the recursion template), we lose the connection to the rest of the list. So instead we first recurse all the way to the last node and only then, on the way back, set l->next->next = l; that is why the second choice is best.

void reverserecursive(Node *head){

    if(head == NULL || head->next == NULL) return;
    reverserecursive(head->next);
    head->next->next = head;
}

But after executing this function, head still points to the original first node, which is now the last node of the reversed list, and that node does not yet point to NULL. Thus, in the main function we keep one extra pointer to the last node of the original linked list, which becomes the new head.

int main(){
...
Node *last = head;
while(last->next)last = last->next;
reverserecursive(head);
head->next = NULL;
head = last;
...
}

Another method, which avoids the extra last pointer and the head->next = NULL in main, is to return the new head from the reverserecursive call itself; after the last recursive call, head->next should point to NULL. The final recursive program is:

Node *reverserecursive(Node *head){

    if(head == NULL || head->next == NULL) return head;
    Node *n = reverserecursive(head->next);
    head->next->next = head;
    head->next = NULL;
    return n;
}

In the above example the time complexity is O(n), as it traverses the whole linked list. But the space complexity is not O(1); it is O(n), because each recursive call pushes an activation record onto the call stack and there are n such calls. Compare this with code where only a while loop runs O(n) times: its space complexity is O(1), since only one stack frame is in use, whereas with recursion n activation records are pushed onto the stack.
//Lecture 4c

1.1.4) Types of linked list :

• Circular linked list : simply a singly linked list whose last node points back to the head of the list.
• Doubly linked list : each node also keeps a pointer to the previous node, so we can navigate in both directions (see the node sketch below).
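A minimal sketch of a doubly linked list node, by analogy with the singly linked struct node above (the field names prev/next are an assumption, not from the notes):

struct dnode{
    int data;
    struct dnode *prev;   /* pointer to the previous node */
    struct dnode *next;   /* pointer to the next node */
};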
//Lecture 5a

1.2) Asymptotic Analysis :

Till now we have seen the linked list and operations on it. So it is natural to ask: which operation takes more time? And how do the operations behave when the input becomes very large?

Analyzing running time :

The same program can be run on different computers, with

• different clock speeds
• different instruction sets
• different memory access speeds

The goal of asymptotic analysis is to simplify the analysis of running time by getting rid of such "details", like rounding: 1,000,001 ≈ 1,000,000 and 3n² ≈ n².
//Lecture 5b

1.2.1) Big oh and big omega asymptotic notations :

Now we are going to represent time complexity and space complexity as a function of the input size.

1) Big-oh O notation :

T(n) is O(g(n)) if there exist constants c > 0 and n₀ ≥ 0 such that T(n) ≤ c·g(n) for all n ≥ n₀.
Provides an asymptotic upper bound.
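A quick worked example of this definition (added here, not from the notes): 3n² + 5n is O(n²). For n ≥ 5 we have 5n ≤ n², hence

3n² + 5n ≤ 3n² + n² = 4n² for all n ≥ 5,

so the definition holds with c = 4 and n₀ = 5.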

2) Big Omega Ω notation :

T(n) is Ω(g(n)) if there exist constants c > 0 and n₀ ≥ 0 such that T(n) ≥ c·g(n) for all n ≥ n₀.
Provides an asymptotic lower bound.

We can say that T(n) is Ω(g(n)) if and only if g(n) is O(T(n)).


//Lecture 5c

3) θ Theta notation :

For a given function g(n) we denote by θ(g(n)) the set of functions:

θ(g(n)) = { f(n) : there exist positive constants c₁, c₂ and n₀ such that 0 ≤ c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀ }

//Lecture 5d

What does it mean to say asymptotically larger? – It means we ignore constants and look only at the significant terms. For example,

clearly 2n² > n², but asymptotically they are equal. However n^(5+n) is not asymptotically equal to n^n, because n^(5+n) = n^5 · n^n, which grows strictly faster than n^n.

Q : Prove that n^ε is asymptotically larger than (log n)^k, where ε > 0 and k > 0. –

//Lecture 5e

But we know that n³ > n² asymptotically, yet after taking logs we get 3·log n versus 2·log n, which are asymptotically equal; clearly that conclusion would be wrong for n³ and n². So when can we take logs and when not?

If you look carefully, after taking logs the values become so small that we can no longer tell whether the originals were asymptotically unequal. In other words, if two functions are still far apart after taking logs, then before taking logs they were very, very far apart. Thus if a > b asymptotically we cannot conclude log a > log b asymptotically; but if log a > log b asymptotically then we can definitely say a > b. And if log a = log b asymptotically we cannot conclude a = b. We can say,
Some operations that always work:

• Cancel out common terms (multiply or divide).
• Ignore a constant only when it appears in the form c·f(n); otherwise never ignore it.
• For logs, follow the rule discussed above.
• Sometimes we have to plug in a concrete value of n (e.g. 2^128, 2^256, 2^1024, …).
//Lecture 5f

4) o little-oh :

Definition : T(n) is o(g(n)) if for any constant c > 0 there is an n₀ > 0 such that T(n) < c·g(n) for all n ≥ n₀.

Compare with the definition of big O, which says "there exists a constant c"; little-oh says "for any constant c", meaning the inequality must hold for every c.

That is why n² is not o(2n²): for c = ½ the two sides are equal, but we want strictly less.

5) ω little-omega :

Definition : T(n) is ω(g(n)) if for any constant c > 0 there is an n₀ > 0 such that T(n) > c·g(n) for all n ≥ n₀.

In this definition too, "for any constant" is written.

And we can observe that T(n) = ω(g(n)) if and only if g(n) = o(T(n)).

Incomparability : n^(1 + sin n) is incomparable with n, since sin n oscillates between -1 and 1.


Stirling approximation : n! ~ √(2πn) · (n/e)^n

If we take logs instead, lg(n!) = lg(n) + lg(n−1) + … + lg(1).

Which means lg(n!) ≤ lg(n) + lg(n) + … (n times) ➔ lg(n!) ≤ n·lg n.

Also, keeping only the largest n/2 terms, lg(n!) ≥ lg(n) + lg(n−1) + … + lg(n/2) ≥ (n/2)·lg(n/2), which is Ω(n·lg n).

Combining these two we get lg(n!) = θ(n·lg n).

NOTE :

1) If we write n² = O(n³), formally it is n² ∈ O(n³), because by the definition O(n³) is a set of functions.
2) All the asymptotic notations are sets, but we use "=" loosely.

Q : For any two non-negative functions f(n) and g(n), both of which tend to infinity, must we have either f(n) = O(g(n)) or g(n) = O(f(n))? – This seems true, but take f(n) = n² and g(n) = n for even inputs and n³ for odd inputs. Here g(n) keeps crossing above and below f(n), so it is sometimes O(f(n)) and sometimes Ω(f(n)), and neither relation holds for all n.
//Lecture 6c

1.2.2) Analyzing the loop time complexity :

main(){
for(int i = 0;i<N;i++)
for(int j = i+1;j<N;j++){
statement;
}
}

In the above case we unroll the loops and count the total number of iterations of the inner statement, as worked out below.
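Worked count for the nested loop above (a standard derivation, filled in here): for a given i the inner loop runs N − 1 − i times, so the total is

Σ_{i=0}^{N−1} (N − 1 − i) = (N−1) + (N−2) + … + 1 + 0 = N(N−1)/2 = O(N²).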

for(int i = 1; i<N; i+=2){


sum += i;
}

In this example we say that after k steps i becomes equal to N and the loop gets over: N = 1 + 2k, meaning k = (N−1)/2, and we know that the complexity is O(k) (why?) = O(N/2) = O(N).

for(int i = N; i>=1; i-=2){

    sum += i;
}

In the first example we count up; in the second example we do the reverse and count down, so the time complexity still remains the same, i.e. O(N/2) = O(N).

Thus, to find the time complexity, first assume the loop stops after k steps (i.e. the condition becomes false after k steps) and then find the relation between k and n.

for(int i = 2; i<=n; i=i*i){


sum += i;
}
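A quick derivation for the squaring loop above (standard reasoning, filled in here): starting from i = 2, after k iterations i = 2^(2^k), so the loop stops when 2^(2^k) > n, i.e. 2^k > lg n, i.e. k > lg lg n. Hence the loop runs O(lg lg n) times.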

for(int i = 1; i<=N; i*=2){


for(int j = 1; j<=i*i; j++){
sum++;
}
}

We again assume that the i loop terminates after k iterations; at the end of the k-th iteration 2^k = N. The total work is dominated by the inner loop, as worked out below.
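Worked count for the doubling loop above (a standard derivation, filled in here): in the iteration where i = 2^j the inner loop runs i² = 4^j times, so the total is

Σ_{j=0}^{k} 4^j = (4^(k+1) − 1)/3 = O(4^k) = O((2^k)²) = O(N²).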

for(int i = 0;i<n; i*=2) sum += i;

In the above case, the value of i stays zero because of its initial value, so this is an infinite loop.
In short, we can say that

Increment : O(n) if (i = i + c)

Doubling : O(log n) if (i = 2 * i)

Exponentiation : O(lg lg n) if (i = i * i)
//Lecture 7a

1.2.3) Best case, Worst case :

Best case : the minimum time over all possible inputs. Even if on only one input you take O(1) and on all other inputs you take O(n⁴), the best-case time complexity is O(1).

Worst case : the maximum time over all possible inputs. Even if on only one input you take O(n³) and on all other inputs you take O(n), the worst-case time complexity is O(n³).

O is used for describing the worst-case running time.

Ω is used for describing the best-case running time.


//Lecture 7c

Q : Consider an algorithm A which takes θ(n) in the best case and θ(n²) in the worst case. Which of the following is/are true?

1) Algorithm time complexity is O(n²) – true
2) Algorithm time complexity is Ω(n) – true
3) Algorithm time complexity is O(n³) – true
4) Algorithm time complexity is Ω(n²) – false, because that would mean every input takes at least n², but the best case is only θ(n).
5) Algorithm time complexity is Ω(1) – true
6) Algorithm best case time complexity is θ(n) – true
7) Algorithm worst case time complexity is θ(n²) – true
8) Algorithm best case time complexity is O(n) – true
9) Algorithm worst case time complexity is Ω(n²) – true
10) Algorithm best case time complexity is O(n²) – true
//Lecture 8a

1.3) Stack and Queue :

Abstract data types : An abstract data type (ADT) specifies the operations that can be performed on the collection. It's abstract because it doesn't specify how the ADT will be implemented; a given ADT can have multiple implementations. Examples: linked list, stack, queue.

1.3.1) Stack :

Why stack ? – Some of the applications include reversing a word, the undo mechanism in text editors, function calls, expression evaluation, and balancing parentheses.

Stack permutation of a sequence : a permutation obtained by pushing and popping every element of the sequence, where the pushes and pops may be interleaved arbitrarily.

Number of stack permutations : the Catalan number Cₙ = (1/(n+1)) · C(2n, n).

//Lecture 3b

Implementing stack :

We can implement a stack using an array: we take an array of some fixed size N and a top index which initially has value -1.

push(k){
    if(top == N-1){ printf("Overflow"); return; }
    a[++top] = k;
}
pop(){
    if(top == -1){ printf("Underflow"); return -1; }
    return a[top--];
}

Using a linked list, we can also implement a stack, as sketched below.
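A minimal sketch of a linked-list stack, reusing the struct node from the beginning of these notes; push and pop both work at the head, so each is O(1) (the function names are assumptions, and headers are omitted as elsewhere in the notes):

/* push: insert at the head; returns the new top */
struct node *push(struct node *top, int k){
    struct node *n = (struct node *)malloc(sizeof(struct node));
    n->data = k;
    n->next = top;
    return n;
}

/* pop: remove the head; *out receives the popped value */
struct node *pop(struct node *top, int *out){
    if(top == NULL) return NULL;        /* underflow */
    *out = top->data;
    struct node *rest = top->next;
    free(top);
    return rest;
}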


//Lecture 9a

1.3.2) Introduction to queue :

Why Queue ? – Sometimes it is also desirable to access the element which got inserted first. For
example, in networking, call center phone systems, operating system.
//Lecture 9b

1) Implementing queue using array :

We can implement a queue using a plain array, but it is not as effective as the circular array-based implementation, so we use the circular one.

We use two indices, front and rear, and initially front = rear = -1.

How to check if

• Queue is full – when (rear + 1) mod N == front
• Queue is empty – when front == -1 (and rear == -1).

Enqueue(data){
if((rear + 1) % N == front){
printf("Queue is full");
return;
}
front = (front == -1) ? 0 : front;
rear = (rear+1) % N;
array[rear] = data;
}

Dequeue(){
    if(front == -1){
        printf("Queue is empty");
        return -1;                          /* or some error value */
    }
    data = array[front];
    if(front == rear) front = rear = -1;    /* queue became empty */
    else front = (front+1) % N;
    return data;
}

There is also operation of getsize() which returns no. of elements in queue.

Getsize(){
if(front == -1 && rear == -1) return 0;
return (front>rear) ? N-front+rear+1 : rear-front+1;
}
//Lecture 9c

There is also an implementation where initially front = rear = 0 instead of -1, but then one array slot is always wasted.

In such an implementation, how to check if

• Queue is full – front == (rear + 1) % N
• Queue is empty – front == rear
//Lecture 9d

2) Implementing queue using linked list :

Enqueue( ) : insert an element at the end of the list (keeping a rear pointer makes this O(1)).

Dequeue( ) : delete an element from the beginning of the list.
//Lecture 9e

3) Queue using two stacks :



We use one stack for insertion (S1) and one for deletion of element (S2).

For Enqueue ( ) : we will simply push into S1

For Dequeue ( ) : if S2 is not empty then pop(S2) and if S2 is empty we will transfer all the elements
from S1 to S2 and then pop(S2).
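A minimal C-style sketch of this two-stack queue, using array-based stacks (the names s1/s2 and the -1 "empty" sentinel are assumptions; overflow checks are omitted):

#define N 100
int s1[N], s2[N], top1 = -1, top2 = -1;

void enqueue(int x){ s1[++top1] = x; }            /* always push into S1 */

int dequeue(){
    if(top2 == -1){                               /* S2 empty: transfer everything from S1 */
        if(top1 == -1) return -1;                 /* whole queue is empty */
        while(top1 != -1) s2[++top2] = s1[top1--];
    }
    return s2[top2--];                            /* front of the queue */
}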
//Lecture 10a

Stack using two queues :

For push() : we simply push into whichever queue is non-empty (if both are empty, push into either one).

For pop() : we dequeue n-1 elements (meaning all except one) from the non-empty queue and enqueue them into the second queue. The one remaining element is the top of the stack; dequeue and return it.

1.3.3) Few applications of stack :

1) Balancing parentheses :

Create a stack.
while(input is not finished){
    if(character is an opening delimiter like (, {, [)
        PUSH it onto the stack;
    if(character is a closing symbol like ), }, ]){
        if(stack is empty) report error;
        POP the stack;
        if(symbol POP-ed is not the corresponding delimiter) report error;
    }
}
//At the end of the input
if(stack is not empty) report error;

2) Two stacks in one array :

To implement two stacks in a single array, we maintain two indices, Top1 and Top2, pointing to the top elements of the two stacks; one stack grows from the left end and the other from the right end. While pushing, if we reach the situation where the two tops meet, we say stack overflow. Remember that in this implementation we first increment Top1 (or decrement Top2) and then store the data. A sketch follows.
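A minimal sketch of two stacks sharing one array (names are assumptions; stack 1 grows from the left, stack 2 from the right, and overflow is detected before the tops cross):

#define N 100
int arr[N], top1 = -1, top2 = N;

void push1(int x){
    if(top1 + 1 == top2){ printf("Overflow"); return; }
    arr[++top1] = x;                   /* increment first, then store */
}
void push2(int x){
    if(top1 + 1 == top2){ printf("Overflow"); return; }
    arr[--top2] = x;                   /* decrement first, then store */
}
int pop1(){ return (top1 == -1) ? -1 : arr[top1--]; }
int pop2(){ return (top2 == N)  ? -1 : arr[top2++]; }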

3) Infix, prefix and postfix and conversion to each other :

Ways of writing expression :

1. Infix notation A * B + C / D
2. Prefix notation (also known as “Polish Notation”) + * A B / C D
3. Postfix notation (also known as “Reverse Polish Notation”) A B * C D / +

Convert infix to prefix and postfix :

Step 1 : apply appropriate brackets based on precedence and associativity.

Step 2 : convert each innermost bracket to prefix/postfix separately.

Step 3 : combine and repeat step 2.



Example :

Now we will see how a computer performs these conversions: it uses a stack.

• Infix to postfix using stack :

if(operand) append it to the output;
if('(') push it on the stack;
if(')') pop and output until the left parenthesis is popped;
if(operator){
    if(incoming operator has lower priority than the stack top) pop all such operators to the output, then push it;
    if(incoming operator has higher priority than the stack top) push it;
    if(same priority) pop then push, except for ↑ (right-associative), which is pushed;
}

We can calculate the result value of a postfix expression with a single empty stack: scan left to right, push numbers, and whenever we encounter a binary operator, pop the top two elements, perform the operation, and push the result back onto the stack.
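A minimal sketch of this postfix evaluation for single-digit operands (the function name and fixed stack size are assumptions; error handling is omitted):

#include <ctype.h>

int evalPostfix(const char *s){
    int st[100], top = -1;                      /* operand stack */
    for(; *s; s++){
        if(isdigit((unsigned char)*s)){
            st[++top] = *s - '0';               /* push operand */
        }else if(*s == '+' || *s == '-' || *s == '*' || *s == '/'){
            int b = st[top--], a = st[top--];   /* pop two operands */
            int r;
            if(*s == '+') r = a + b;
            else if(*s == '-') r = a - b;
            else if(*s == '*') r = a * b;
            else r = a / b;
            st[++top] = r;                      /* push the result */
        }
        /* other characters (e.g. spaces) are ignored */
    }
    return st[top];
}

For example, evalPostfix("23*4+") returns 10.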


2. Binary, AVL tree


//Lecture 11a

2.1) Binary trees :

A binary tree is a tree in which every node has at most 2 children.

Depth (or level) of a node : the number of edges from the node to the root. The root node has depth (level) 0.

There are few types of binary tree based on different conditions :

Full binary tree : every node has either 0 or 2 children.

Complete binary tree : all levels except possibly the last are full, and the last level is filled from the left.

Perfect binary tree : a complete binary tree in which the last level is also full.

Remark : some authors use "complete binary tree" to mean a perfect binary tree; that is just a matter of convention. If some unusual term appears in an exam, it will be specified.

No. of leaves = (number of nodes with two children) + 1.

In a full binary tree all internal nodes have two children, so the number of leaves = internal nodes + 1.

In general, in an m-ary tree, if i is the number of internal nodes then the total number of nodes is n = m·i + 1 and the number of leaves is (m − 1)·i + 1.

Q : The height of a binary tree is the maximum number of edges in any root-to-leaf path. What is the maximum number of nodes in a binary tree of height h? – The maximum occurs for a perfect binary tree, where every internal node has two children. The number of nodes at level h is 2^h, and the total number of nodes is 2^(h+1) − 1.

2.1.1) representation of tree :

1) Array representation :
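The original figure for this representation is not reproduced here. The usual convention (an assumption, matching standard textbooks) is that the tree is stored level by level in an array, so with 0-based indexing the node at index i has its left child at 2i + 1, its right child at 2i + 2, and its parent at ⌊(i − 1)/2⌋.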

2) Linked list representation :

struct node{
int data; //element
struct node *left; //pointer to l child
struct node *right; //pointer to r child
};
struct node *root;
root = (struct node*)malloc(sizeof(struct node));

root->data = 3;
root->left = NULL;
root->right = NULL;
//Lecture 12a

2.1.2) Binary tree traversals :

• Pre-order (root left right)
• In-order (left root right)
• Post-order (left right root)
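A minimal recursive sketch of these traversals, shown for in-order using the struct node defined above (the other two orders just move the printf):

void inorder(struct node *root){
    if(root == NULL) return;
    inorder(root->left);              /* left  */
    printf("%d ", root->data);        /* root  */
    inorder(root->right);             /* right */
}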

//Lecture 12b

Binary tree construction :

If only the inorder and preorder traversals are given, e.g.,

Inorder  = g d h b e i a f j c
Preorder = a b d g h e i c f j

then we use the fact that the preorder lists the roots in sequence: the first element of the preorder is the root of the tree. In the inorder traversal the root sits in the middle, with all nodes of the left subtree on its left-hand side and all nodes of the right subtree on its right-hand side. Applying this recursively reconstructs the tree.

If the inorder and postorder are given, we follow the same procedure, but instead of reading roots left to right we read them right to left, since the root of the tree appears last in the postorder.

But if only the postorder or only the preorder is given, we cannot uniquely construct the binary tree.

Q : How many binary trees are possible given only the postorder traversal, or only the preorder traversal? – The Catalan number, the same as the number of stack permutations.
//Lecture 12c

Q : What if both the preorder and postorder are given, is it possible to construct the tree? – With preorder = 1 2 3 and postorder = 3 2 1, more than one binary tree is possible, so in general we cannot construct it uniquely.

But if the question is to construct a FULL binary tree using

Preorder = a b c d f g e
Postorder = c f g d b e a

then a is the root, as it appears at the start of the preorder and at the end of the postorder, and the rest follows by matching subtrees. If a complete binary tree were given, things would be even easier: first check whether the number of elements is of the form 2^k − 1; if not, one extra node has to go to the last level to satisfy the left-fill property.

From the above we can conclude that, in general, we can construct a unique binary tree iff we have the inorder traversal together with any one other traversal.
//Lecture 13a

2.2) Binary search tree :

A binary search tree is a binary tree such that, for every node, all nodes in its right subtree store values greater than the value stored at that node, and all nodes in its left subtree store values less than or equal to it.

Now, the inorder traversal of any tree can be thought of as projecting the value of each node from left to right onto the number line. From the BST property we know that the minimum value occurs at the far left of the tree and the maximum at the far right, so this projection gives a sorted sequence: the inorder traversal of a BST always produces the values in increasing order.

2.2.1) Operation on BST :

1) Search in BST : say we want to search for k. We first visit the root node.

Step 1 : if k is less than the current node's value, move to its left child and repeat from Step 1.

Step 2 : if k is greater than the current node's value, move to its right child and repeat from Step 1.

Step 3 : if the values are equal, return the node.

Step 4 : if you fall off a leaf (reach NULL) without finding k, return "k is not found".
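A minimal recursive sketch of this search, using the struct node with left/right pointers from the tree section (the function name is an assumption):

struct node *searchBST(struct node *root, int k){
    if(root == NULL) return NULL;            /* k is not present */
    if(k == root->data) return root;         /* found            */
    if(k < root->data)
        return searchBST(root->left, k);     /* go left          */
    return searchBST(root->right, k);        /* go right         */
}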

2) Insertion in BST : same as searching. We search for the position; when we reach a leaf, if k is less than or equal to the leaf's value then k becomes its left child, otherwise its right child.

Q : Build BST from given numbers. 10, 12, 5, 4, 20, 8, 7, 15 and 13 –

Kth maximum or minimum element in BST : first find the inorder traversal of the BST (which is sorted); the kth element from the left is the kth minimum, and the kth element from the right is the kth maximum.

3) Range search in BST :

void RangePrinter(node *root, int k1, int k2){

    if(root == NULL) return;
    if(root->data >= k1 && root->data <= k2){
        printf("%d ", root->data);
        RangePrinter(root->left, k1, k2);
        RangePrinter(root->right, k1, k2);
    }
    else if(root->data < k1)
        RangePrinter(root->right, k1, k2);
    else
        RangePrinter(root->left, k1, k2);
}

Complexity of range search : we will definitely visit all the nodes whose values lie between k1 and k2; say there are m such nodes. So the cost is m plus some extra nodes traversed while locating the range. In the worst case this extra work is one root-to-leaf chain of length h (the height), so the total time complexity is O(h + m), where h = lg n for a balanced tree. If n >> m, the time complexity effectively becomes O(lg n).
//Lecture 13c

4) Deletion in BST :

But before that we need to understand few concepts clearly.

In-order predecessor : Maximum value in left-subtree

In-order successor : Minimum value in right subtree

Now we can delete node from BST,

Three cases in delete :

• Case 1 : Leaf node – just delete it.
• Case 2 : One child – delete the node and connect its child to its parent.
• Case 3 : Two children – replace the node you want to delete with its in-order successor or its in-order predecessor, then delete that node. In case 3 we have two subcases,

//Lecture 14a

2.2.2) Possible probe sequence :

Let's say we want to find the legal sequences of nodes a BST search can visit while searching for the key 10, given that the nodes encountered are 1, 2, 5, 20, 25 (not necessarily in that order). How many orderings of these nodes are possible? – In a legal sequence, the values greater than 10 must appear in decreasing order and the values less than 10 in increasing order; the two groups may interleave arbitrarily, but each must keep its order. For example,

1, 20, 2, 5, 25 is not a legal sequence because the values greater than 10 are not in decreasing order; if it were 1, 25, 2, 5, 20 then it would be legal. So the number of legal orders is 5! / (3! × 2!), because we form the two groups {1, 2, 5} and {25, 20}, which may interleave arbitrarily as long as each stays in order.
y

So whenever you are given candidate sequences and asked which one is valid, do not waste time building the tree: just collect the values less than and greater than the key, and check that the smaller values are in increasing order and the larger values are in decreasing order. This shortcut is only reliable when a successful search for the key is given; if an unsuccessful search is given, it can report wrong results. For example,

suppose the BST has been unsuccessfully searched for the key 273, and the sequence 550, 149, 507, 395, 463, 402, 270 is given. Even if the values on either side of 273 appear to pass the ordering check, the sequence is still invalid: after visiting 395 the search should have moved to a left child (a smaller value), because 273 is less than 395, yet it traversed to a right child instead. Hence it is a false sequence.

Number of permutations of inserting :

In remaining position, we can always place 6 before 5 and 7 but 5 and 7 can be in any order so, total
2! For 6 5 7. Total number of permutations of inserting = 2*C(6, 3)*2.

2.2.3) BST time complexity :

Three cases are possible, i.e. best, average and worst case.

Operation    Best Case                           Average Case                Worst Case
Search       O(1) – key is at the root           O(lg n) – height of tree    O(n) – chain-like structure
Deletion     O(1) – the root node is deleted     O(lg n) – height of tree    O(n) – chain-like structure
Insertion    O(1) – insertion into empty tree    O(lg n) – height of tree    O(n) – chain-like structure
//Lecture 15

2.3) AVL tree :

One problem you can observe with a binary search tree is that in the worst case operations take O(n) time because of the possibility of a chain-like structure; we say "the BST can be skewed".

Solution : requires a balance condition that ensures height is always O(lg n)

Any suggestion for balance condition ? –

Suggestion 1 : right and left subtree of root have equal number of nodes. But can result in such
structure,

Suggestion 2 : the right and left subtrees of the root have an equal number of nodes and equal height. But we can still end up with such a structure.

Suggestion 3 : the right and left subtrees of every node have an equal number of nodes. This always guarantees height lg n, but the condition is too strong: only perfect trees satisfy it.

Final suggestion : the balance of every node is between -1 and 1, i.e. balance(node) ∈ {-1, 0, +1}, where balance(node) = height(left subtree) − height(right subtree).

4. Hashing, Row/Column major in Array


//Lecture 20a

Hashing is a superfast method for searching, inserting and deleting.

When to prefer hashing :

• When the order of the data does not matter
• When the relationships between data items do not matter

Application :

• Google search (page rank algorithm)
• File system – c/user/desktop… to sector on disk
• Compiler – symbol table
• Digital signatures
• Machine learning (locality sensitive hashing)

Our goal is to do insertion, deletion and search in O(1) average time complexity.

Direct addressing table : a fancy name for an "array"; an element with key k is simply stored at index k. Its limitations:

1) Keys must be non-negative integers. If a string or another datatype is given, we first map it to an int and then store it.
2) The range of keys must be small.
3) The keys must be dense, i.e. not many gaps in the key values.

How to avoid these limitations?

• Map non-integer (or negative) keys to non-negative integers
• Map large integers to smaller integers
//Lecture 20b

4.1) Hash table :

With direct addressing, an element with key k is stored in slot k. With hashing, this element is stored
in slot h(k); that is, we use a hash function h to compute the slot from the key k. Here, h maps the
universe U of keys into the slots of a hash table T[0…m-1] :

h : U → {0, 1, 2, 3, …, m-1}

Division method (mod operator) : map the key into a hash table of m slots with hash(k) = k mod m.

Q : How to pick m (the table size)? – If m is a power of two, say 2^n, then (key mod m) is the same as extracting the last n bits of the key. This is not a good idea, as it considers only the last n bits; two keys with the same trailing bits collide. What if m is 10^n? Then the hash value is the last n digits of the key. This is also not a good idea because, for example with m = 10, both 4 and 34 map to the same location.

Rule of thumb : pick a prime number, close to a power of two, to be m.


We want h(k) to depend on every bit of k, so that the differences between different keys are fully considered.

Load factor : α = n/m

4.1.1) Simple uniform hashing : any given element is equally likely to hash into any of the m slots, independently of where any other element has hashed to:

Uniform: Pr[h(x) = i] = 1/m for all x and all i.
4.1.2) Collisions and resolution techniques : Since |U| > m, there must be two keys that have the same hash value, so we need a mechanism for handling collisions.

Collision resolution techniques :

• Separate chaining – also called closed addressing or open hashing. "Closed addressing" because we are not allowed to store an element anywhere other than its hashed index; "open hashing" because we are allowed to use an external data structure, such as a linked list, at that index.
• Linear probing
• Quadratic probing
• Double hashing
//Lecture 20c

1) Separate chaining : all keys that map to the same hash value are kept in a list.

Worst case performance :

• Search(k) : O(length of chain); the worst chain length is O(n), when everything maps to the same slot.
• Insert(k) : needs to check whether the key already exists, so still O(length of chain).
• Delete(k) : needs a search first, so O(length of chain).

However, in practice, hash tables work really well, that is because the worst case almost never
happens. And average case performance is really good.
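A minimal sketch of a chained hash table, reusing the struct node from the linked-list section as the chain node (TABLE_SIZE, the function names and the duplicate-check omission are assumptions; headers such as <stdlib.h> are omitted as elsewhere in these notes):

#define TABLE_SIZE 11                       /* a prime table size */
struct node *table[TABLE_SIZE];             /* each slot is a chain head, initially NULL */

int hash(int k){ return k % TABLE_SIZE; }

void insertKey(int k){                      /* insert at the head of the chain: O(1) */
    struct node *n = (struct node *)malloc(sizeof(struct node));
    n->data = k;
    n->next = table[hash(k)];
    table[hash(k)] = n;
}

int searchKey(int k){                       /* O(length of chain) */
    struct node *curr = table[hash(k)];
    while(curr != NULL){
        if(curr->data == k) return 1;
        curr = curr->next;
    }
    return 0;
}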
//Lecture 20d

Theorem : In a hash table in which collisions are resolved by chaining, an unsuccessful search takes
average-case time O(1+𝛼), under the assumption of simple uniform hashing.

Proof. Here α is the load factor, which for chaining equals the average length of a chain. Suppose we search for a key that is not present in the hash table: we first hash to some slot (which takes O(1)) and then, on average, walk through the whole chain without finding the element, which costs O(length of chain) = O(α). Total: O(1 + α).

Q : What about the same theorem for a successful search? – The answer remains the same, O(1 + α), because

time to successfully search for the i-th item in a table of n items = time to insert the i-th item when there were i−1 items in the table = time of an unsuccessful search with i−1 items in the table.
//Lecture 21a

From now on we talk about open-addressed hash tables (closed hashing), where all elements are stored in the hash table itself, i.e. n ≤ m, and there are no chains. To resolve collisions we use probing.

Q : How to probe? – We want to design a probe function h such that, for every key k ∈ U, it produces a sequence of slots:

h : U × {0, 1, 2, …, m−1} → {0, 1, 2, …, m−1}

2) Linear probing :

Idea : use empty space in the table.

if h(key) is already full,


try (h(key) + 1)% TableSize. if full,
try (h(key) + 2)% TableSize. if full,
try (h(key) + 3)% TableSize. if full,...

Example : insert 38, 19, 8, 109, 10

Here the i-th probe is (h(key) + i) % TableSize.

In general we have some function f and probe (h(key) + f(i)) % TableSize, where f(i) can be any function of i.

We name the probing strategies depending on f(i):

• Linear probing when f(i) = i
• Quadratic probing when f(i) = c₁i + c₂i² (or simply i²)
• Double hashing when f(i) involves another hash function: f(i) = i × h₁(key)

Search in linear probing : keep looking at successive locations until we either find the key or encounter an empty location.

Deletion in linear probing : we can't just delete an element, because that creates an empty slot in the middle of a cluster, and a later search may stop at that empty slot and wrongly report "not found". For example, if we delete 76 and then search for 55 (as in the lecture's figure), we get "Not found" although 55 is present. (The usual fix is to mark the slot with a special "deleted" marker instead of truly emptying it.)

Problem with linear probing :

Primary clustering: occupied slots tend to form long contiguous runs, and any key hashing into such a run must probe to its end, so the clusters keep growing and probes get longer.
//Lecture 21b

3) Quadratic probing :

h(k, i) = (h′(k) + c₁·i + c₂·i²) mod m

where c₂ ≠ 0 (if it were 0, h(k, i) would degrade to linear probing). This is the general formula; in most cases we take c₁ = 0.

Note that we apply all of these probing formulas only when a collision happens.

Problem with quadratic probing : if two keys have the same initial probe position, then their probe sequences are exactly the same, since h(k₁, 0) = h(k₂, 0) implies h(k₁, i) = h(k₂, i). This is called secondary clustering.

Cycles in quadratic probing : a probe sequence can cycle without covering all slots, but this can be eliminated with a careful selection of c₁, c₂ and h(k).

4) Double hashing :

h(key, i) = (h(key) + i·h₁(key)) mod m

Here h and h₁ are two different hash functions. How does this solve the secondary clustering problem? Secondary clustering arises because h(k₁, 0) = h(k₂, 0) forces h(k₁, i) = h(k₂, i) for all i. With double hashing this implication no longer holds: even if h(k₁) = h(k₂), the step sizes h₁(k₁) and h₁(k₂) generally differ, so the two probe sequences diverge after the first slot.

NOTE : The main advantage of chaining over open addressing is that in open addressing a search can be misled by an empty bucket encountered in the middle of a probe sequence, so an element that is actually present may not be found (and hence cannot be deleted); such a limitation is not there in chaining.
//Lecture 21c

4.1.3) Possible number of probe sequences :

Possible number of probe sequences in linear probing = m, because only the first position has to be decided; after that the sequence is fixed.

For quadratic probing = m, for the same reason.

In uniform hashing we can have m! permutations of probe sequences, as each position is equally likely to be chosen even after some insertions.

In double hashing we have m² probe sequences. How? – The whole sequence is determined by the pair (h(key), h₁(key)), and each of these two values can take m different values.



//Lecture 22a

4.2) Analysis of open addressing :



In this section we assume uniform hashing, so every permutation of the probe sequence is equally likely (there are m! of them); we are only concerned with which element ends up in which cell, not with a particular collision-resolution strategy.

Load factor 𝜶 in open addressing : In open addressing, the hash table can “fill up” so that no further
insertions can be made; one consequence is that the load factor 𝛼 can never exceed 1.

4.2.1) Search time open addressing :

1) Unsuccessful search time : given an open-address hash table with load factor α, the expected number of probes in an unsuccessful search is at most 1/(1 − α), assuming uniform hashing.

Proof. We have an open-address hash table with m slots, load factor α (0 < α < 1), and uniform hashing, so there are n = α·m elements stored in the table.

In an unsuccessful search we are looking for an element that is not in the hash table. We start by hashing the key and checking the slot: if it is empty we are done; if it is occupied we need to probe further. The probability that a slot is occupied is α and the probability that it is empty is 1 − α. Let X be the number of probes required to find an empty slot. Then

E[X] = 1·(1 − α) + 2·α(1 − α) + 3·α²(1 − α) + …
     = (1 − α)(1 + 2α + 3α² + …) = (1 − α) · 1/(1 − α)² = 1/(1 − α).

The unsuccessful search time is the same as the number of probes required to insert a new element into an open-address table holding n elements, because in both cases we stop as soon as we encounter an empty slot.

2) Successful search time : given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most (1/α)·ln(1/(1 − α)), assuming uniform hashing and that each key in the table is equally likely to be searched for.

A successful search for a key k reproduces the same probe sequence as when the element with key k was inserted. If k was the (i+1)-th key inserted into the hash table, then there were i keys already inserted, so the expected number of probes made in a search for k is at most 1/(1 − i/m) = m/(m − i). Averaging over all n keys in the hash table:

(1/n) Σ_{i=0}^{n−1} m/(m − i) = (1/α) Σ_{k=m−n+1}^{m} 1/k ≤ (1/α) ∫_{m−n}^{m} dx/x = (1/α) ln( m/(m − n) ) = (1/α) ln( 1/(1 − α) ).

No. of comparisons made during an unsuccessful search : the key is first mapped to a particular location and we do our first comparison there; then we continue the search according to the type of probing (for example, a linear scan in linear probing, a quadratic scan in quadratic probing) until we hit an empty slot, which also has to be checked. So in total we do (cluster comparisons) + 1, where the +1 is the check of the empty slot.

No. of comparisons made during a successful search : at worst we have to cover the whole cluster, and the last element of the cluster is our key (a successful search is given), so the number of comparisons equals the cluster comparisons only.
//Lecture 23b

4.2.2) Expectation in hashing :



Here Bernoulli (indicator) random variables are being used.

1) Expected items per slot : we know it is α, but let us still prove it.

Let X be the number of items in a particular slot, and let Xᵢ be the indicator that the i-th item maps to that slot.

X = X₁ + X₂ + X₃ + … + Xₙ (up to Xₙ because there are n items in total). E[X] = ?

E[Xᵢ] = 1/m, as there are m slots and under uniform hashing the i-th item can map to any of them with equal probability. Therefore E[X] = n/m = α.

2) Expected number of empty locations :

Let X be the number of empty locations, and let Xᵢ be the indicator that the i-th slot is empty.

X = X₁ + X₂ + X₃ + … + Xₘ (up to Xₘ because there are m slots). E[X] = ?

E[X₁] is the probability that all n keys map to locations other than location 1, which is (1 − 1/m)ⁿ.

Therefore E[X] = m × (1 − 1/m)ⁿ.

3) Expected number of collisions :

Let X be the number of collisions and Xᵢ the number of collisions at the i-th insertion.

E[X₁] = 0, because at the first insertion the whole hash table is empty, so there can be no collision.

E[X₂] = 1/m, because after the first insertion exactly one slot is occupied, and the 2nd element collides only if it maps to that slot.

E[X₃] = 2/m, because after two insertions 2 slots are filled and the 3rd element collides if it maps to one of those 2 slots. Similarly for E[X₄], …

E[X] = E[X₁] + E[X₂] + … + E[Xₙ] = 0 + 1/m + 2/m + … + (n−1)/m = n(n−1)/(2m).

NOTE : when open addressing is given without specifying probing strategy consider random probing.
//Lecture 24a

4.3) Row major and column major in arrays :

When we have a 2-D array, we cannot store it in memory directly in its grid form because memory is linear; therefore we have two choices: store it row by row (row major) or column by column (column major).

In memory there are no row/column coordinates; if we want an element, we need the base address of the array and then we use the indices to reach the desired element.

Row major : suppose a is an m × n array (0-indexed). Address of a[i][j] = base + n·i + j (assuming unit element size; otherwise multiply the offset by the element size).



Column major : suppose a is an m × n array (0-indexed). Address of a[i][j] = base + m·j + i (again assuming unit element size).
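A quick worked example (the numbers are assumptions, not from the notes): let a be a 3 × 4 int array (m = 3, n = 4) with base address 1000 and 4-byte elements. Then in row major, address of a[1][2] = 1000 + (4·1 + 2)·4 = 1000 + 24 = 1024, while in column major, address of a[1][2] = 1000 + (3·2 + 1)·4 = 1000 + 28 = 1028.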


//Lecture 24b