Data Structures and Algorithms Made Easy
Data Structures and Algorithmic Puzzles
By
Narasimha Karumanchi
All rights reserved. No part of this book may be reproduced in any form or by any electronic or mechanical means,
including information storage and retrieval systems, without written permission from the publisher or author.
Every effort has been made, with the consent of the author, to make the material in this book error-free. However, the author and the publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause.
While every effort has been made to avoid any mistake or omission, this publication is being sold on the condition
and understanding that neither the author nor the publishers or printers would be liable in any manner to any
person by reason of any mistake or omission in this publication or for any action taken or omitted to be taken or
advice rendered or accepted on the basis of this work. For any defect in printing or binding the publishers will be
liable only to replace the defective copy by another copy of this work then available.
Acknowledgements
Dear Mother and Father, I cannot find the words to express my gratitude for everything you have done for me.
Your unwavering love and dedication in providing a stable and nurturing household has allowed me to flourish
and embrace life with open arms. Your traditional values and persistent efforts have instilled in me the belief that
with faith, hard work, and determination, anything is possible. You are not just my parents, but my role models,
and I feel blessed to have you in my life.
I would like to acknowledge the many people who have contributed to the creation of this book. Without their
support, advice, feedback, and assistance in editing, proofreading, and design, this book would not have been
possible. Thank you for everything.
—Narasimha Karumanchi
M.Tech, IIT Bombay
Founder, CareerMonk.com
Preface
Dear Reader,
I urge you to pause for a moment! Although many individuals tend to skip the Preface when reading a book, I
implore you to make an exception and read this particular preface.
The primary aim of this book is not to provide you with a comprehensive list of theorems and proofs on data
structures and algorithms. Instead, I have adopted a method of enhancing problem-solving skills by presenting
multiple solutions for each problem, each with varying complexities. Essentially, this book is an enumeration of
potential solutions that will assist you in your thought process when approaching new questions. Whether you
are preparing for interviews, competitive exams, or campus interviews, you will find this book to be an invaluable
resource.
If you are a job seeker and read this book in its entirety, I am confident that you will be well-equipped to impress
your interviewers with your knowledge and skills. As an instructor, this book will enable you to deliver lectures in
a clear and straightforward manner, resulting in your students developing a greater appreciation for their chosen
field of study in Computer Science/Information Technology.
This book is a valuable resource not only for students pursuing Engineering or Master's degrees, but also for those preparing for competitive exams in Computer Science/Information Technology. The book places a greater
emphasis on problem-solving and analysis, rather than just theory. Each chapter begins with an introduction to
the necessary theory, followed by a range of problem sets. There are approximately 700 algorithmic problems,
each with its solution, presented in the book.
My primary focus in writing this book was to aid students in their preparation for competitive exams. For each
problem, multiple solutions are provided, each with varying levels of complexity. The solutions start with a brute
force approach and gradually progress to the most efficient solution. For each problem, we also explore the time
and memory complexities of the algorithm.
To gain a complete understanding of all the topics covered, it is recommended that the reader does at least one
complete reading of the book. Subsequently, readers can refer to specific chapters or topics as required. While we
have made every effort to eliminate errors, there may still be minor typos in the book. Corrections can be found
on www.CareerMonk.com, along with new problems and solutions. I also welcome your valuable suggestions at:
[email protected].
I wish you the best of luck and am confident that you will find this book to be an invaluable resource.
—Narasimha Karumanchi
M.Tech, IIT Bombay
Founder, CareerMonk.com
Other Books by Narasimha Karumanchi
IT Interview Questions
Data Structures and Algorithms Made Easy in Java
Coding Interview Questions
Data Structures and Algorithmic Thinking with Python
Data Structures and Algorithmic Thinking with Go
Algorithm Design Techniques
Data Structure Operations Cheat Sheet

| Data Structure | Accessing nth element (Avg) | Search (Avg) | Insertion (Avg) | Deletion (Avg) | Accessing nth element (Worst) | Search (Worst) | Insertion (Worst) | Deletion (Worst) | Space Complexity (Worst) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arrays | O(1) | O(n) | O(n) | O(n) | O(1) | O(n) | O(n) | O(n) | O(n) |
| Stacks | O(n) | O(n) | O(1) | O(1) | O(n) | O(n) | O(1) | O(1) | O(n) |
| Queues | O(n) | O(n) | O(1) | O(1) | O(n) | O(n) | O(1) | O(1) | O(n) |
| Binary Trees | O(n) | O(n) | O(n) | O(n) | O(n) | O(n) | O(n) | O(n) | O(n) |
| Binary Search Trees | O(log n) | O(log n) | O(log n) | O(log n) | O(n) | O(n) | O(n) | O(n) | O(n) |
| Balanced Binary Search Trees | O(log n) | O(log n) | O(log n) | O(log n) | O(log n) | O(log n) | O(log n) | O(log n) | O(log n) |
| Hash Tables | N/A | O(1) | O(1) | O(1) | N/A | O(n) | O(n) | O(n) | O(n) |
Note: For best case operations, the time complexities are O(1).
Chapter 1
Introduction
The objective of this chapter is to explain the importance of the analysis of algorithms, the notations used in that analysis and their relationships, and to solve as many problems as possible. Let us first focus on understanding the basic elements of algorithms and the importance of algorithm analysis, and then slowly move toward the other topics mentioned above. After completing this chapter, you should be able to find the complexity of any given algorithm (especially recursive functions).
1.1 Variables
Before going to the definition of variables, let us relate them to old mathematical equations. All of us have solved
many mathematical equations since childhood. As an example, consider the following equation:
x^2 + 2y − 2 = 1
We don't have to worry about the use of this equation. The important thing to understand is that the equation has names (x and y), which hold values (data). That means the names (x and y) are placeholders for representing data. Similarly, in computer science programming we need something for holding data, and variables are the way to do that.
For the above-mentioned example, we can represent the cost of the car and the cost of the bicycle in terms of a function, and for a given function we ignore the low-order terms that are relatively insignificant (for large values of the input size, n). As an example, in the case below, n^4, 2n^2, 100n and 500 are the individual costs of some function, which we approximate to n^4 since n^4 has the highest rate of growth.
n^4 + 2n^2 + 100n + 500 ≈ n^4
The diagram below shows the relationship between different rates of growth.
[Figure: rates of growth in decreasing order — 2^(2^n), n!, 4^n, 2^n, n^2, n log n, log(n!), n, 2^(log n), log^2 n, √(log n), log log n]
[Figure: O notation — f(n) staying below cg(n) for input sizes n ≥ n0]
Let us see the O notation in a little more detail. The O notation is defined as O(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0}. g(n) is an asymptotic tight upper bound for f(n). Our objective is to give the smallest rate of growth g(n) which is greater than or equal to the given algorithm's rate of growth f(n).
Generally, we discard lower values of n. That means the rate of growth at lower values of n is not important. In the figure, n0 is the point from which we need to consider the rate of growth for a given algorithm. Below n0, the rate of growth could be different. n0 is called the threshold for the given function.
Big-O Visualization
O(g(n)) is the set of functions with smaller or the same order of growth as g(n). For example, O(n^2) includes O(1), O(n), O(n log n), etc.
Note: Focus on analyzing the algorithms for larger values of n, as the rate of growth below n0 is not of concern to us.
Big-O Examples
Example-1 Find the upper bound for f(n) = 3n + 8.
Solution: 3n + 8 ≤ 4n, for all n ≥ 8.
∴ 3n + 8 = O(n) with c = 4 and n0 = 8.
Example-2 Find the upper bound for f(n) = n^2 + 1.
Solution: n^2 + 1 ≤ 2n^2, for all n ≥ 1.
∴ n^2 + 1 = O(n^2) with c = 2 and n0 = 1.
No Uniqueness?
There is no unique set of values for n0 and c in proving the asymptotic bounds. Let us consider 100n + 5 = O(n). For this function there are multiple n0 and c values possible.
Solution 1: 100n + 5 ≤ 100n + n = 101n, for all n ≥ 5; n0 = 5 and c = 101 is a solution.
Solution 2: 100n + 5 ≤ 100n + 5n = 105n, for all n ≥ 1; n0 = 1 and c = 105 is also a solution.
The Ω notation is defined symmetrically: Ω(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0}; g(n) is an asymptotic tight lower bound for f(n).
[Figure: Ω notation — f(n) staying above cg(n) for input sizes n ≥ n0]
Ω Examples
Example-1 Find the lower bound for f(n) = 5n^2.
Solution: 5n^2 ≥ cn^2 holds, for example, with c = 1 for all n ≥ 1.
∴ 5n^2 = Ω(n^2) with c = 1 and n0 = 1.
[Figure: Θ notation — f(n) sandwiched between c1 g(n) and c2 g(n) for input sizes n ≥ n0]
Now consider the definition of the Θ notation. It is defined as Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0}. g(n) is an asymptotic tight bound for f(n). Θ(g(n)) is the set of functions with the same order of growth as g(n).
Θ Examples
Example-1 Find the Θ bound for f(n) = n^2/2 − n/2.
Solution: n^2/5 ≤ n^2/2 − n/2 ≤ n^2, for all n ≥ 2.
∴ n^2/2 − n/2 = Θ(n^2) with c1 = 1/5, c2 = 1 and n0 = 2.
Important Notes
For analysis (best case, worst case and average), we try to give the upper bound (O), the lower bound (Ω) and the average running time (Θ). From the above examples, it should also be clear that, for a given function (algorithm), getting the upper bound (O), the lower bound (Ω) and the average running time (Θ) may not always be possible. For example, if we are discussing the best case of an algorithm, then we try to give the upper bound (O), the lower bound (Ω) and the average running time (Θ) of that best case.
In the remaining chapters, we generally focus on the upper bound (O), because knowing the lower bound (Ω) of an algorithm is of no practical importance; we use the Θ notation if the upper bound (O) and lower bound (Ω) are the same.
if(length() == 0) {
    return false;            // then part: constant
}
else {                       // else part: (constant + constant) * n
    for (int n = 0; n < length(); n++) {
        // another if: constant + constant (no else part)
        if(!list[n].equals(otherList.list[n]))
            return false;    // constant
    }
}
Total time = c0 + (c1 + c2) * n = O(n).
5) Logarithmic complexity: An algorithm is O(𝑙𝑜𝑔𝑛) if it takes a constant time to cut the problem size by a fraction
(usually by ½). As an example, let us consider the following program:
for (i=1; i<=n;)
i = i*2;
If we observe carefully, the value of i is doubling every time. Initially i = 1, in the next step i = 2, and in subsequent steps i = 4, 8 and so on. Let us assume that the loop is executing some k times. At the k-th step 2^k = n, and at the (k + 1)-th step we come out of the loop. Taking logarithms on both sides gives
log(2^k) = log n
k log 2 = log n
k = log n    // if we assume base 2
Total time = O(log n).
Note: Similarly, for the case below, the worst-case rate of growth is O(log n) [the discussion holds good for the decreasing sequence as well].
for (i=n; i>=1;)
i = i/2;
Another example: binary search (finding a word in a dictionary of n pages)
• Look at the center point in the dictionary.
• Is the word towards the left or right of center?
• Repeat the process with the left or right part of the dictionary until the word is found.
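To make the halving concrete, here is a minimal iterative binary search over a sorted integer array (an illustrative sketch, not the book's dictionary example). Each iteration halves the search range, so the loop runs O(log n) times.
int binarySearch(int A[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // look at the center point
        if (A[mid] == target)
            return mid;                     // found
        else if (A[mid] < target)
            low = mid + 1;                  // repeat with the right half
        else
            high = mid - 1;                 // repeat with the left half
    }
    return -1;                              // not found
}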
Harmonic series
∑_{k=1}^{n} 1/k = 1 + 1/2 + … + 1/n ≈ log n
Other important formulae
∑_{k=1}^{n} log k ≈ n log n
∑_{k=1}^{n} k^p = 1^p + 2^p + ⋯ + n^p ≈ n^{p+1}/(p + 1)
T(n) = √n T(√n) + n
     ≤ √n · c√n log √n + n
     = n · c log √n + n
     = n · c · (1/2) · log n + n
     ≤ c n log n
The last inequality assumes only that 1 ≤ c · (1/2) · log n. This is correct if n is sufficiently large, for any constant c, no matter how small. From the above proof, we can see that our guess is correct for the upper bound. Now, let us prove the lower bound for this recurrence.
T(n) = √n T(√n) + n
     ≥ √n · k√n log √n + n
     = n · k log √n + n
     = n · k · (1/2) · log n + n
     ≥ k n log n
The last inequality assumes only that 1 ≥ k · (1/2) · log n. This is incorrect if n is sufficiently large, for any constant k. From the above proof, we can see that our guess is incorrect for the lower bound.
From the above discussion, we understood that Θ(n log n) is too big. How about Θ(n)? The lower bound is easy to prove directly:
T(n) = √n T(√n) + n ≥ n
Now, let us prove the upper bound for this Θ(n):
T(n) = √n T(√n) + n
     ≤ √n · c · √n + n
     = n · c + n
     = n(c + 1)
     ≰ c n
From the above induction, we understood that Θ(n) is too small and Θ(n log n) is too big. So we need something bigger than n and smaller than n log n. How about n√(log n)?
Proving the upper bound for n√(log n):
T(n) = √n T(√n) + n
     ≤ √n · c · √n √(log √n) + n
     = n · c · (1/√2) · √(log n) + n
     ≤ c n √(log n)
Proving the lower bound for n√(log n):
T(n) = √n T(√n) + n
     ≥ √n · k · √n √(log √n) + n
     = n · k · (1/√2) · √(log n) + n
     ≱ k n √(log n)
The last step doesn't work. So Θ(n√(log n)) doesn't work. What else is there between n and n log n? How about n log log n?
Proving the upper bound for n log log n:
T(n) = √n T(√n) + n
     ≤ √n · c · √n log log √n + n
     = n · c · (log log n − 1) + n
     = n · c · log log n − c · n + n
     ≤ c n log log n, if c ≥ 1
Proving the lower bound for n log log n:
T(n) = √n T(√n) + n
     ≥ √n · k · √n log log √n + n
     = n · k · log log n − k · n + n
     ≥ k n log log n, if k ≤ 1
From the above proofs, we can see that T(n) ≤ c n log log n if c ≥ 1, and T(n) ≥ k n log log n if k ≤ 1. Technically, we're still missing the base cases in both proofs, but we can be fairly confident at this point that T(n) = Θ(n log log n).
Solution: Consider the comments in the below function:
void function (int n) {
    int i=1, s=1;
    // s is increasing not at rate 1 but at rate i
    while (s <= n) {
        i++;
        s = s + i;
        printf("*");
    }
}
We can define the s terms according to the relation s_i = s_{i−1} + i. The value of i increases by 1 for each iteration. The value contained in s at the i-th iteration is the sum of the first i positive integers. If k is the total number of iterations taken by the program, then the while loop terminates once
1 + 2 + ... + k = k(k + 1)/2 > n ⟹ k = O(√n).
Problem-24 Find the complexity of the function given below.
void function(int n) {
int i, count =0;
for(i=1; i*i<=n; i++)
count++;
}
Solution:
void function(int n) {
int i, count =0;
for(i=1; i*i<=n; i++)
count++;
}
In the above-mentioned function the loop will end when i^2 > n ⟹ T(n) = O(√n). This is similar to Problem-23.
Problem-25 What is the complexity of the program given below:
void function(int n) {
int i, j, k , count =0;
for(i=n/2; i<=n; i++)
for(j=1; j + n/2<=n; j= j+1)
for(k=1; k<=n; k= k * 2)
count++;
}
Solution: Consider the comments in the following function.
void function (int n) {
    int i, j, k, count = 0;
    // outer loop executes n/2 times
    for (i = n / 2; i <= n; i++)
        // middle loop executes n/2 times
        for (j = 1; j + n / 2 <= n; j = j + 1)
            // inner loop executes logn times
            for (k = 1; k <= n; k = k * 2)
                count++;
}
The complexity of the above function is O(n^2 log n).
Problem-26 What is the complexity of the program given below:
void function(int n) {
int i, j, k , count =0;
for(i=n/2; i<=n; i++)
for(j=1; j<=n; j= 2 * j)
for(k=1; k<=n; k= k * 2)
count++;
}
Solution: Consider the comments in the following function.
void function(int n) {
    int i, j, k, count = 0;
    // outer loop executes n/2 times
    for (i = n/2; i <= n; i++)
        // middle loop executes logn times
        for (j = 1; j <= n; j = 2 * j)
            // inner loop executes logn times
            for (k = 1; k <= n; k = k * 2)
                count++;
}
The complexity of the above function is O(n log^2 n).
Solution: Using the Divide and Conquer master theorem, we get O(n log^2 n).
Problem-30 Determine Θ bounds for the recurrence: T(n) = T(n/2) + T(n/4) + T(n/8) + n.
Solution: Substituting in the recurrence equation, we get: T(n) ≤ c1 · n/2 + c2 · n/4 + c3 · n/8 + cn ≤ k · n, where k is a constant. This clearly says Θ(n).
Problem-31 Determine Θ bounds for the recurrence relation: 𝑇(𝑛) = 𝑇(𝑛/2) + 7.
Solution: Using Master Theorem we get: Θ(𝑙𝑜𝑔𝑛).
Problem-32 Prove that the running time of the code below is Ω(𝑙𝑜𝑔𝑛).
void Read(int n) {
int k = 1;
while( k < n )
k = 3*k;
}
Solution: The while loop will terminate once the value of k is greater than or equal to the value of n. In each iteration the value of k is multiplied by 3. If i is the number of iterations, then k has the value 3^i after i iterations. The loop is terminated upon reaching i iterations when 3^i ≥ n ⟺ i ≥ log₃ n, which shows that i = Ω(log n).
Problem-33 Solve the following recurrence.
T(n) = 1, if n = 1
T(n) = T(n − 1) + n(n − 1), if n ≥ 2
Solution: By iteration:
T(n) = T(n − 2) + (n − 1)(n − 2) + n(n − 1)
…
T(n) = T(1) + ∑_{i=1}^{n} i^2 − ∑_{i=1}^{n} i
T(n) = 1 + n(n + 1)(2n + 1)/6 − n(n + 1)/2
T(n) = Θ(n^3)
Note: We can use the Subtraction and Conquer master theorem for this problem.
Problem-34 Consider the following program:
int fib(int n) {
    if(n == 0) return 0;
    else if(n == 1) return 1;
    else return fib(n-1) + fib(n-2);
}
Solution: The recurrence relation for the running time of this program is: T(n) = T(n − 1) + T(n − 2) + c. Note that T(n) has two recursive calls, indicating a binary tree. Each step recursively calls the program with n reduced by 1 and by 2, so the depth of the recurrence tree is O(n). The number of leaves at depth n is 2^n since this is a full binary tree, and each leaf takes at least O(1) computation for the constant factor. The running time is clearly exponential in n, and it is O(2^n).
Problem-35 What is the running time of the following program?
void function(int n) {
    for(int i = 1; i <= n; i++)
        for(int j = 1; j <= n; j += i)
            printf("*");
}
Solution: Consider the comments in the function below:
void function (int n) {
    // this loop executes n times
    for(int i = 1; i <= n; i++)
        // this loop executes n/i times, since j increases at the rate of i
        for(int j = 1; j <= n; j += i)
            printf("*");
}
In the above code, the inner loop executes n/i times for each value of i. Its running time is ∑_{i=1}^{n} (n/i) = n ∑_{i=1}^{n} 1/i = O(n log n).
Problem-36 What is the complexity of ∑𝑛𝑖=1 𝑙𝑜𝑔 𝑖 ?
Solution: Using the logarithmic property log xy = log x + log y, we can see that this problem is equivalent to
∑_{i=1}^{n} log i = log 1 + log 2 + ⋯ + log n = log(1 × 2 × ⋯ × n) = log(n!) ≤ log(n^n) ≤ n log n
This shows that the time complexity = O(n log n).
Problem-37 What is the running time of the following recursive function (specified as a function of the input
value 𝑛)? First write the recurrence formula and then find its complexity.
void function(int n) {
    if(n <= 1) return;
    for (int i = 1; i <= 3; i++)
        function(n/3);
}
Solution: Consider the comments in the function below:
void function (int n) {
    if(n <= 1) return;   // constant time
    // this loop executes 3 times, each with a recursive call on value n/3
    for (int i = 1; i <= 3; i++)
        function(n/3);
}
We can assume that for asymptotic analysis ⌈k⌉ = k for every integer k ≥ 1. The recurrence for this code is T(n) = 3T(n/3) + Θ(1). Using the master theorem, we get T(n) = Θ(n).
Problem-38 What is the running time of the following recursive function (specified as a function of the input
value 𝑛)? First write a recurrence formula and show its solution using induction.
void function(int n) {
    if(n <= 1) return;
    for (int i = 1; i <= 3; i++)
        function(n - 1);
}
Solution: Consider the comments in the function below:
void function (int n) {
    if(n <= 1) return;   // constant time
    // this loop executes 3 times with a recursive call on value n-1
    for (int i = 1; i <= 3; i++)
        function(n - 1);
}
The if statement requires constant time [O(1)]. For the for loop, we neglect the loop overhead and only count the three recursive calls. This implies the time complexity recurrence:
T(n) = c, if n ≤ 1;
     = c + 3T(n − 1), if n > 1.
Using the Subtraction and Conquer master theorem, we get T(n) = Θ(3^n).
Problem-39 Write a recursion formula for the running time 𝑇(𝑛) of the function whose code is below.
void function (int n) {
    if(n <= 1) return;
    for(int i = 1; i < n; i++)
        printf("*");
    function(0.8 * n);
}
Solution: Consider the comments in the function below:
void function (int n) {
    if(n <= 1) return;   // constant time
    // this loop executes n times with constant-time work per iteration
    for(int i = 1; i < n; i++)
        printf("*");
    // recursive call with 0.8n
    function(0.8 * n);
}
The recurrence for this piece of code is T(n) = T(0.8n) + O(n) = T(4n/5) + O(n). Applying the master theorem, we get T(n) = O(n).
        temp = temp + 1;
    n = n/2;
until n <= 1
Solution: Consider the comments in the pseudocode below:
temp = 1                  // constant time
repeat                    // this loop executes logn times
    for i = 1 to n        // this loop executes n times
        temp = temp + 1;
    n = n/2;              // n is halved on each pass
until n <= 1
The recurrence for this function is T(n) = T(n/2) + n. Using the master theorem, we get T(n) = O(n).
Problem-46 What is the running time of the following program?
void function(int n) {
    for(int i = 1; i <= n; i++)
        for(int j = 1; j <= n; j *= 2)
            printf("*");
}
Solution: Consider the comments in the function below:
void function(int n) {
    for(int i = 1; i <= n; i++)           // this loop executes n times
        for(int j = 1; j <= n; j *= 2)    // this loop executes logn times, from our logarithms guideline
            printf("*");
}
The complexity of the above program is O(n log n).
Problem-47 What is the running time of the following program?
void function(int n) {
    for(int i = 1; i <= n/3; i++)
        for(int j = 1; j <= n; j += 4)
            printf("*");
}
Solution: Consider the comments in the function below:
void function(int n) {
    for(int i = 1; i <= n/3; i++)         // this loop executes n/3 times
        for(int j = 1; j <= n; j += 4)    // this loop executes n/4 times
            printf("*");
}
The time complexity of this program is O(n^2).
Problem-48 Find the complexity of the below function:
void function(int n) {
    if(n <= 1) return;
    if(n > 1) {
        printf(" * ");
        function(n/2);
        function(n/2);
    }
}
Solution: Consider the comments in the function below:
void function(int n) {
    if(n <= 1) return;       // constant time
    if(n > 1) {
        printf(" * ");       // constant time
        function(n/2);       // recursion with n/2 value
        function(n/2);       // recursion with n/2 value
    }
}
The recurrence for this function is T(n) = 2T(n/2) + 1. Using the master theorem, we get T(n) = O(n).
on both sides gives k = log₂ n. Since we are doing one more comparison for exiting from the loop, the answer is ⌈log₂ n⌉ + 1.
Problem-54 Consider the following C code segment. Let T(𝑛) denote the number of times the for loop is
executed by the program on input 𝑛. Which of the following is true?
int isPrime(int n){
for(int i=2;i<=sqrt(n);i++)
if(n%i == 0){
printf("Not Prime\n");
return 0;
}
return 1;
}
(A) T(n) = O(√n) and T(n) = Ω(√n)  (B) T(n) = O(√n) and T(n) = Ω(1)  (C) T(n) = O(n) and T(n) = Ω(√n)  (D) None of the above
Solution: (B). Big-O notation describes the tight upper bound and Big-Omega notation describes the tight lower bound for an algorithm. The for loop in the question runs a maximum of √n times and a minimum of 1 time. Therefore, T(n) = O(√n) and T(n) = Ω(1).
Problem-55 In the following C function, let 𝑛 ≥ 𝑚. How many recursive calls are made by this function?
int gcd(n,m){
if (n%m ==0)
return m;
n = n%m;
return gcd(m,n);
}
(A) Θ(log₂ n)  (B) Ω(n)  (C) Θ(log₂ log₂ n)  (D) Θ(n)
Solution: No option is correct. Big-O notation describes the tight upper bound and Big-Omega notation describes the tight lower bound for an algorithm. For m = 2 and for all n = 2^i, the running time is O(1), which contradicts every option.
Problem-56 Suppose T(n) = 2T(n/2) + n, with T(0) = T(1) = 1. Which one of the following is false?
(A) T(n) = O(n^2)  (B) T(n) = Θ(n log n)  (C) T(n) = Ω(n^2)  (D) T(n) = O(n log n)
Solution: (C). Big-O notation describes the tight upper bound and Big-Omega notation describes the tight lower bound for an algorithm. Based on the master theorem, we get T(n) = Θ(n log n). This indicates that the tight lower bound and the tight upper bound are the same. That means O(n log n) and Ω(n log n) are correct for the given recurrence. So option (C) is wrong.
Problem-57 Find the complexity of the function below:
void function(int n) {
for (int i = 0; i<n; i++)
for(int j=i; j<i*i; j++)
if (j %i == 0){
for (int k = 0; k < j; k++)
printf(" * ");
}
}
Solution:
void function(int n) {
    for (int i = 0; i < n; i++)               // executes n times
        for(int j = i; j < i*i; j++)          // executes n*n times
            if (j % i == 0) {
                for (int k = 0; k < j; k++)   // executes j times = (n*n) times
                    printf(" * ");
            }
}
Time Complexity: O(n^5).
Problem-58 To calculate 9^n, give an algorithm and discuss its complexity.
Solution: Start with 1 and multiply by 9 until reaching 9^n.
Time Complexity: There are Θ(n) multiplications and each takes constant time, giving a Θ(n) algorithm.
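A minimal sketch of this loop in C (the name power9 is illustrative; long long overflows for large n, and the point here is only the Θ(n) multiplication count):
long long power9(int n) {
    long long result = 1;
    for (int i = 0; i < n; i++)    // n constant-time multiplications
        result *= 9;
    return result;
}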
Problem-59 For Problem-58, can we improve the time complexity?
Solution: Refer to the 𝐷𝑖𝑣𝑖𝑑𝑒 𝑎𝑛𝑑 𝐶𝑜𝑛𝑞𝑢𝑒𝑟 chapter.
Problem-60 Find the time complexity of the recurrence T(n) = T(n/2) + T(n/4) + T(n/8) + n.
Solution: Let us solve this problem by the method of guessing. The total size on each level of the recurrence tree is less than n, so we guess that f(n) = n will dominate. Assume for all i < n that c1·i ≤ T(i) ≤ c2·i. Then,
c1·n/2 + c1·n/4 + c1·n/8 + kn ≤ T(n) ≤ c2·n/2 + c2·n/4 + c2·n/8 + kn
c1·n(1/2 + 1/4 + 1/8 + k/c1) ≤ T(n) ≤ c2·n(1/2 + 1/4 + 1/8 + k/c2)
c1·n(7/8 + k/c1) ≤ T(n) ≤ c2·n(7/8 + k/c2)
If c1 ≤ 8k and c2 ≥ 8k, then c1·n ≤ T(n) ≤ c2·n. So, T(n) = Θ(n). In general, if you have multiple recursive calls, the sum of the arguments to those calls is less than n (in this case n/2 + n/4 + n/8 < n), and f(n) is reasonably large, a good guess is T(n) = Θ(f(n)).
Problem-61 Find the complexity of the below function:
void function(int n) {
    int sum = 0, j = n / 2;    // j is not initialized in the original; any fixed value preserves the analysis
    for (int i = 0; i < n; i++)
        if (i > j)
            sum = sum + 1;
        else {
            for (int k = 0; k < n; k++)
                sum = sum - 1;
        }
}
Solution: Consider the worst case.
void function(int n) {
    int sum = 0, j = n / 2;
    for (int i = 0; i < n; i++)               // executes n times
        if (i > j)
            sum = sum + 1;                    // executes n times
        else {
            for (int k = 0; k < n; k++)       // executes n times
                sum = sum - 1;
        }
}
Time Complexity: O(n^2).
Problem-62 Solve the following recurrence relation using the recursion tree method: T(n) = T(n/2) + T(2n/3) + n^2.
Solution: How much work do we do in each level of the recursion tree?
[Figure: recursion tree — the root T(n) costs n^2 and splits into T(n/2) and T(2n/3); those split in turn into T(n/4) and T(n/3), and into T(n/3) and T(4n/9), respectively, and so on]
In level 0, we take n^2 time. At level 1, the two subproblems take time:
(n/2)^2 + (2n/3)^2 = (1/4 + 4/9) n^2 = (25/36) n^2
At level 2, the four subproblems are of size n/4, n/3, n/3 and 4n/9 respectively. These subproblems take time:
(n/4)^2 + (n/3)^2 + (n/3)^2 + (4n/9)^2 = (625/1296) n^2 = (25/36)^2 n^2
Similarly, the amount of work at level k is at most (25/36)^k n^2.
Let α = 25/36; the total runtime is then:
T(n) ≤ ∑_{k=0}^{∞} α^k n^2
     = 1/(1 − α) · n^2
     = 1/(1 − 25/36) · n^2
     = (36/11) n^2
     = O(n^2)
That is, the first level provides a constant fraction of the total runtime.
Problem-63 Rank the following functions by order of growth: (n + 1)!, n!, 4^n, n × 3^n, 3^n + n^2 + 20n, (3/2)^n, 4n^2, 4^{lg n}, n^2 + 200, 20n + 500, 2^{lg n}, n^{2/3}, 1.
Solution: The functions below are listed in decreasing order of growth rate.

| Function | Rate of Growth |
| --- | --- |
| (n + 1)! | O((n + 1)!) |
| n! | O(n!) |
| 4^n | O(4^n) |
| n × 3^n | O(n3^n) |
| 3^n + n^2 + 20n | O(3^n) |
| (3/2)^n | O((3/2)^n) |
| 4n^2 | O(n^2) |
| 4^{lg n} | O(n^2) |
| n^2 + 200 | O(n^2) |
| 20n + 500 | O(n) |
| 2^{lg n} | O(n) |
| n^{2/3} | O(n^{2/3}) |
| 1 | O(1) |
Problem-64 Can we say 3^{n^0.75} = O(3^n)?
Solution: Yes, because 3^{n^0.75} < 3^{n^1} = 3^n.
Problem-65 Can we say 2^{3n} = O(2^n)?
Solution: No, because 2^{3n} = (2^3)^n = 8^n, which is not less than any constant multiple of 2^n.
Chapter 2
Recursion and Backtracking
2.1 Introduction
In this chapter, we will look at one of the important topics, recursion, which will be used in almost every chapter, and also its relative, backtracking.
// calculates the factorial of a positive integer (the opening lines are reconstructed from context)
int Fact(int n) {
    if (n <= 1)    // base case: factorial of 0 or 1 is 1
        return 1;
    else           // recursive case: multiply n by (n-1) factorial
        return n * Fact(n-1);
}
[Figure: recursive calls printFunc(3) → printFunc(2) → printFunc(1) → printFunc(0), each returning 0 back up to the main function]
Now, let us consider our factorial function. The visualization of the factorial function with n = 4 will look like:
[Figure: 4! expands to 4 × 3!, which expands to 3 × 2!, and so on down to the base case; finally 4 × 6 = 24 is returned]
Recursion
• Terminates when a base case is reached.
• Each recursive call requires extra space on the stack frame (memory).
• If we get infinite recursion, the program may run out of memory and result in stack overflow.
• Solutions to some problems are easier to formulate recursively.
Iteration
• Terminates when a condition is proven to be false.
• Each iteration does not require extra space.
• An infinite loop could run forever since no extra memory is being consumed.
• Iterative solutions to a problem may not always be as obvious as a recursive solution.
Using the Subtraction and Conquer master theorem we get: T(n) = O(2^n). This means the algorithm for generating bit strings is optimal.
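The code being analyzed is not shown above; a minimal sketch of a bit-string generator consistent with this recurrence (and with the global-array style of Problem-4 below; the buffer A and function name binary are assumptions) might look like:
char A[100];    // global buffer; unused trailing bytes stay '\0', terminating the string
void binary(int n) {
    if (n < 1)
        printf("%s ", A);    // a complete string of length n is ready
    else {
        A[n - 1] = '0';      // fix the last position to '0' and recurse on the rest
        binary(n - 1);
        A[n - 1] = '1';      // then fix it to '1' and recurse again
        binary(n - 1);
    }
}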
Problem-4 Generate all the strings of length n drawn from 0 . . . k − 1.
Solution: Let us assume we keep the current k-ary string in an array A[0..n − 1]. Call the function k_strings(n, k):
void k_strings(int n, int k) {
    // process all k-ary strings of length n
if (n < 1)
printf(" %s ", A); // assume array A is a global variable
else {
for (int j = 0; j < k; j++) {
A[n - 1] = j;
k_strings(n - 1, k);
}
}
}
Let T(n) be the running time of k_strings(n). Then,
T(n) = c, if n < 1
     = kT(n − 1) + d, otherwise
Using the Subtraction and Conquer master theorem we get: T(n) = O(k^n).
Note: For more problems, refer to 𝑆𝑡𝑟𝑖𝑛𝑔 𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚𝑠 chapter.
Problem-5 Finding the length of connected cells of 1s (regions) in a matrix of 0s and 1s: Given a matrix, each cell of which may be 1 or 0. The filled cells that are connected form a region. Two cells are said to be connected if they are adjacent to each other horizontally, vertically or diagonally. There may be several regions in the matrix. How do you find the largest region (in terms of number of cells) in the matrix?
Sample Input: 11000 Sample Output: 5
01100
00101
10001
01011
Solution: The simplest idea is: for each location traverse in all 8 directions and in each of those directions keep
track of the maximum region found.
int getval(int (*A)[5],int i,int j,int L, int H){
if (i< 0 || i >= L || j< 0 || j >= H)
return 0;
else
return A[i][j];
}
void findMaxBlock(int (*A)[5], int r, int c,int L,int H,int size, bool **cntarr,int &maxsize){
if ( r >= L || c >= H)
return;
cntarr[r][c]=true;
size++;
if (size > maxsize)
maxsize = size;
//search in eight directions
int direction[][2]={{-1,0},{-1,-1},{0,-1},{1,-1},{1,0},{1,1},{0,1},{-1,1}};
for(int i=0; i<8; i++) {
int newi =r+direction[i][0];
int newj=c+direction[i][1];
int val=getval (A,newi,newj,L,H);
if (val>0 && (cntarr[newi][newj]==false)){
findMaxBlock(A,newi,newj,L,H,size,cntarr,maxsize);
}
}
cntarr[r][c]=false;
}
int getMaxOnes(int (*A)[5], int rmax, int colmax){
    int maxsize=0;
    int size=0;
    bool **cntarr=create2darr(rmax,colmax); // assumed helper: allocates an rmax x colmax boolean matrix (not shown in this excerpt)
    for (int i=0; i<rmax; i++)
        for (int j=0; j<colmax; j++)
            if (A[i][j] == 1)
                findMaxBlock(A, i, j, rmax, colmax, size, cntarr, maxsize);
    return maxsize;
}
int main(){
    int zarr[][5]={{1,1,0,0,0},{0,1,1,0,1},{0,0,0,1,1},{1,0,0,1,1},{0,1,0,1,1}};
    cout << "Number of maximum 1s are " << getMaxOnes(zarr,5,5) << endl;
    return 0;
}
Problem-6 Solve the recurrence T(n) = 2T(n − 1) + 2^n.
Solution: At each level of the recurrence tree, the number of problems doubles from the previous level, while the amount of work being done in each problem halves from the previous level. Formally, the i-th level has 2^i problems, each requiring 2^{n−i} work. Thus the i-th level requires exactly 2^n work. The depth of this tree is n, because at the i-th level the originating call will be T(n − i). Summing over the n levels, the total complexity for T(n) is Θ(n2^n).
Chapter 3
Linked Lists
3.1 What is a Linked List?
One disadvantage of using arrays to store data is that arrays are static structures and therefore cannot be easily extended or reduced to fit the data set. Arrays are also expensive for arbitrary insertions and deletions. In this chapter we consider another data structure, called Linked Lists, that addresses some of the limitations of arrays.
A linked list is a data structure used for storing collections of data. A linked list has the following properties.
A linked list is a linear dynamic data structure. The number of nodes in a list is not fixed and can grow and shrink on demand. Each node of a linked list is made up of two items: the data and a reference to the next node.
The last node has a reference to null. The entry point into a linked list is called the head of the list. It should be
noted that the ℎ𝑒𝑎𝑑 is not a separate node, but the reference to the first node. If the list is empty, then the head
is a null reference.
• Pointers connect successive elements.
• The final element points to NULL.
• Size can change during program execution.
• It can be made as long as necessary (until system memory is depleted).
• It avoids wasting memory space but requires additional memory for pointers. Memory is allocated as the
list expands.
[Figure: a linked list — head → 4 → 15 → 7 → 40 → NULL]
[Figure: an array with values 3, 2, 1, 2, 2, 3 stored at indices 0 through 5]
Advantages of Arrays
• Simple and easy to use
• Faster access to the elements (constant access)
Disadvantages of Arrays
• Preallocates all needed memory up front and wastes memory space for indices in the array that are empty.
• Fixed size: The size of the array is static (specify the array size before using it).
• One block allocation: To allocate the array itself at the beginning, sometimes it may not be possible to
get the memory for the complete array (if the array size is big).
• Complex position-based insertion: To insert an element at a given position, we may need to shift the
existing elements. This will create a position for us to insert the new element at the desired position. If
the position at which we want to add an element is at the beginning, then the shifting operation is more
expensive.
Dynamic Arrays
Dynamic array (also called as 𝑔𝑟𝑜𝑤𝑎𝑏𝑙𝑒 𝑎𝑟𝑟𝑎𝑦, 𝑟𝑒𝑠𝑖𝑧𝑎𝑏𝑙𝑒 𝑎𝑟𝑟𝑎𝑦, 𝑑𝑦𝑛𝑎𝑚𝑖𝑐 𝑡𝑎𝑏𝑙𝑒, or 𝑎𝑟𝑟𝑎𝑦 𝑙𝑖𝑠𝑡) is a random access,
variable-size list data structure that allows elements to be added or removed.
One simple way of implementing dynamic arrays is to initially start with some fixed-size array. As soon as that array becomes full, create a new array of double the size of the original array. Similarly, reduce the array size to half if the number of elements in the array falls below half the capacity.
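A minimal sketch of this doubling strategy in C (the names DynArray and da_push are illustrative, not from the book; items must start as NULL with size = capacity = 0, and error handling is omitted for brevity):
#include <stdlib.h>
struct DynArray {
    int *items;
    int size;        // number of elements currently stored
    int capacity;    // number of allocated slots
};
void da_push(struct DynArray *a, int value) {
    if (a->size == a->capacity) {    // array is full: double its size
        a->capacity = (a->capacity == 0) ? 1 : 2 * a->capacity;
        a->items = realloc(a->items, a->capacity * sizeof(int));
    }
    a->items[a->size++] = value;     // amortized O(1) insertion
}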
Note: We will see the implementation for 𝑑𝑦𝑛𝑎𝑚𝑖𝑐 𝑎𝑟𝑟𝑎𝑦𝑠 in the 𝑆𝑡𝑎𝑐𝑘𝑠, 𝑄𝑢𝑒𝑢𝑒𝑠 and 𝐻𝑎𝑠ℎ𝑖𝑛𝑔 chapters.
are frequently being added and deleted, and the location of new elements added to the list is significant, then
benefits of a linked list increase.
| Parameter | Linked list | Array | Dynamic array |
| --- | --- | --- | --- |
| Indexing | O(n) | O(1) | O(1) |
| Insertion/deletion at beginning | O(1) | O(n), if array is not full (for shifting the elements) | O(n) |
| Insertion at ending | O(n) | O(1), if array is not full | O(1), if array is not full; O(n), if array is full |
| Deletion at ending | O(n) | O(1) | O(n) |
| Insertion in middle | O(n) | O(n), if array is not full (for shifting the elements) | O(n) |
| Deletion in middle | O(n) | O(n), if array is not full (for shifting the elements) | O(n) |
| Wasted space | O(n) (for pointers) | 0 | O(n) |
[Figure: a singly linked list — head → 4 → 15 → 7 → 40 → NULL]
The following is a type declaration for a linked list of integers:
struct ListNode { // defines a ListNode in a linked list
int data; // the datum
struct ListNode *next; // pointer to the next ListNode
};
[Figure: head → 4 → 15 → 7 → 40 → NULL]
The print() function below traverses the list and prints the data in each node. The length() function takes a linked list as input and counts the number of nodes in the list.
void print(struct ListNode *head) {
struct ListNode *cur = head;
for (cur = head; cur != NULL; cur = cur->next) {
printf("%d ", cur->data);
}
printf("\n");
}
// Calculating the size of the list
int length(struct ListNode *head) {
    struct ListNode *cur;
    int count = 0;
    for (cur = head; cur != NULL; cur = cur->next) {
        count++;
    }
    return count;
}
Time Complexity: O(𝑛), for scanning the list of size 𝑛. Space Complexity: O(1), for creating a temporary variable.
• Update the head pointer to point to the new node.
[Figure: new node (data) → 15 → 7 → 40 → NULL, with head now pointing to the new node]
struct ListNode *insertAtBeginning(struct ListNode *head, int data){ // add to beginning of list
struct ListNode *temp;
// create a new temp node
temp=(struct ListNode *)malloc(sizeof(struct ListNode)); //create unnamed node and pointer
temp->data=data; //set temp data value
temp->next=NULL; // set as end of list
if (head== NULL){ // then list empty, so set as head node
head=temp;
head->next=NULL;
}
else{ // else add to left of list
temp->next=head;
head=temp;
}
return head;
}
Time Complexity: O(1). Space Complexity: O(1).
[Figure: 4 → 15 → 7 → 40 → NULL, and a new node (data) to be appended]
• The last node's next pointer points to the new node.
[Figure: 4 → 15 → 7 → 40 → new node (data) → NULL]
struct ListNode *insertAtEnd(struct ListNode *head, int data){ //add to end of list
struct ListNode *temp,*cur;
// create a new temp node
temp= (struct ListNode *)malloc(sizeof(struct ListNode)); //create temp node
temp->data=data; //set temp value
temp->next=NULL; // set as end of list
cur = head;
if (cur==NULL){
// then no head node exists yet, so set temp as head
head=temp;
}
else{
// find end of current list
while(cur->next != NULL)
cur=cur->next;
// cur is now the last node in list (next==NULL)
// set temp as new next node
cur->next =temp;
}
return head;
}
Time Complexity: O(𝑛), for scanning the list of size 𝑛. Space Complexity: O(1).
• To insert in the middle: update the new node's next pointer to the node at the desired position, and the previous node's next pointer to the new node.
[Figure: a new node (data) being linked between two existing nodes]
Let us write the code for all three cases. We must update the first element pointer in the calling function, not just
in the called function. As a result, we require a double pointer to be passed. The subsequent code is used to insert
a node into a singly linked list.
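A minimal sketch of position-based insertion covering the three cases described above (a sketch using a double pointer and a 1-based position, not the book's verbatim code):
void insert(struct ListNode **head, int data, int position) {
    struct ListNode *newNode = (struct ListNode *) malloc(sizeof(struct ListNode));
    newNode->data = data;
    newNode->next = NULL;
    if (position == 1 || *head == NULL) {    // case 1: insert at the beginning (or into an empty list)
        newNode->next = *head;
        *head = newNode;
        return;
    }
    struct ListNode *prev = *head;
    for (int k = 1; k < position - 1 && prev->next != NULL; k++)
        prev = prev->next;                   // stop at the node before the desired position
    newNode->next = prev->next;              // cases 2 and 3: middle or end insertion
    prev->next = newNode;
}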
• Create a temporary node pointing to the head node.
[Figure: head, temp → 4 → 15 → 7 → 40 → NULL]
• Now move the head pointer to the next node and dispose of the temporary node.
[Figure: temp → 4; head → 15 → 7 → 40 → NULL]
void deleteFirst(struct ListNode **head){
struct ListNode *temp;
if (*head == NULL)
return;
temp = *head;
*head = (*head)->next;
free(temp);
return;
}
Time Complexity: O(1). Space Complexity: O(1).
• Traverse the list and while traversing maintain the previous node address also. By the time we reach the
end of the list, we will have two pointers, one pointing to the 𝑡𝑎𝑖𝑙 node and the other pointing to the node
𝑏𝑒𝑓𝑜𝑟𝑒 the tail node.
[Figure: 4 → 15 → 7 → 40 → NULL, with pointers at 7 (previous node to tail) and 40 (tail)]
• Update the next pointer of the previous node to NULL.
• Dispose of the tail node.
[Figure: 4 → 15 → 7 → NULL, with the old tail node (40) removed]
For deleting an intermediate node:
[Figure: 4 → 15 → 7 → 40 → NULL, with pointers marking the previous node and the node to be deleted]
void deleteNode(struct ListNode **head, int position) {   // opening lines reconstructed from context
    int k = 1;
    struct ListNode *p, *q;
    if (*head == NULL) {
        printf("List is empty.\n");
        return;
    }
    p = *head;
if(position == 1) { /* from the beginning */
*head = (*head)->next;
free (p);
return;
}
else {
while ((p != NULL) && k < position ) { // traverse the list to the position from which we want to delete
k++;
q = p;
p = p->next;
}
if(p == NULL){ /* at the end */
printf ("Position does not exist.\n");
return;
}
else { /* from the middle */
q->next = p->next;
free(p);
}
return;
}
}
Time Complexity: O(𝑛). In the worst case, we may need to delete the node at the end of the list.
Space Complexity: O(1).
struct DLLNode {
int data;
struct DLLNode *next;
struct DLLNode *prev;
};
• Create a new node with both of its pointers set to NULL.
• Update the head node's prev pointer to point to the new node and make the new node the head.
[Figure: NULL ← new node (data) ⇄ 15 ⇄ 7 ⇄ 40 → NULL, with head pointing to the new node]
// Insert at the beginning of the list
void insertAtBeginning(struct DLLNode **head, int data){
struct DLLNode *current = *head;
struct DLLNode *newNode = (struct DLLNode *) (malloc(sizeof(struct DLLNode)));
if(!newNode) {
printf("Memory Error");
return;
}
newNode->prev = NULL;
newNode->data = data;
newNode->next = NULL;
if(current == NULL){
*head = newNode;
return;
}
newNode->next = *head;
(*head)->prev = newNode;
*head = newNode;
}
Time Complexity: O(1). Space Complexity: O(1).
Inserting a node at the end of a doubly linked list:
[Figure: NULL ← 4 ⇄ 15 ⇄ 7 → NULL, and a new node (data)]
• Set the new node's prev pointer to the current last node, and the last node's next pointer to the new node.
[Figure: NULL ← 4 ⇄ 15 ⇄ 7 ⇄ new node (data) → NULL]
Inserting a node in the middle:
• The position node's next pointer points to the new node, and the prev pointer of the position node's next node points to the new node.
[Figure: new node (data) linked between the position node and its former next node in 4 ⇄ 15 ⇄ 7 ⇄ 40 → NULL]
Next, we will provide code for all three cases. It is important to note that we must update the first element pointer
in the calling function and not just within the called function. Therefore, we need to pass a double pointer to
ensure proper updating. The code below demonstrates how to insert a node into a doubly linked list.
void insert(struct DLLNode **head, int data, int position) {
int k = 1;
struct DLLNode *temp, *newNode;
newNode = (struct DLLNode *) malloc(sizeof ( struct DLLNode ));
if(!newNode) { //Always check for memory errors
printf ("Memory Error");
return;
}
newNode->prev = NULL;
newNode->data = data;
newNode->next = NULL;
if(position == 1) { //Inserting a node at the beginning
newNode->next = *head;
newNode->prev = NULL;
if(*head)
(*head)->prev = newNode;
*head = newNode;
return;
}
temp = *head;
while (k < position-1 && temp->next != NULL) {
temp = temp->next;
k++;
}
if(k < position-1){
printf("Desired position does not exist\n");
return;
}
newNode->next = temp->next;
newNode->prev = temp;
if(temp->next)
temp->next->prev = newNode;
temp->next = newNode;
return;
}
Time Complexity: O(𝑛). In the worst case, we may need to insert the item at the end of the list. Space Complexity: O(1).
Deleting the first node:
• Create a temporary node pointing to head.
[Figure: head, temp → 4 ⇄ 15 ⇄ 7 ⇄ 40 → NULL]
• Now move the head pointer to the next node, change the new head's prev pointer to NULL, and dispose of the temporary node.
Deleting the last node:
• Traverse the list keeping two pointers: one to the tail node and one to the node before the tail.
[Figure: 4 ⇄ 15 ⇄ 7 ⇄ 40 → NULL, with pointers at 7 (previous node to tail) and 40 (tail)]
• Update the next pointer of the previous node to NULL.
• Dispose of the tail node.
[Figure: 4 ⇄ 15 ⇄ 7 → NULL]
void deleteLastNode(struct DLLNode **head) {
    struct DLLNode *temp, *current = *head;
    if(*head == NULL) {
        printf("List empty!");
        return;
    }
    while (current->next != NULL)
        current = current->next;
    if (current == *head) {    // only one node in the list
        *head = NULL;
        free(current);
        return;
    }
    temp = current->prev;
    temp->next = NULL;
    free(current);
    return;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
[Figure: 4 ⇄ 15 ⇄ 7 ⇄ 40 → NULL, with pointers marking the previous node and the node to be deleted]
• Update the previous node's next pointer to the node after the one being deleted, and that node's prev pointer to the previous node.
• Dispose of the current node to be deleted.
[Figure: the node removed, with its two neighbors linked to each other]
void delete(struct DLLNode **head, int position) {
struct DLLNode *temp2, *temp = *head;
int k = 1;
if(*head == NULL) {
printf("List is empty");
return;
}
if(position == 1) {
*head = (*head)->next;
if(*head != NULL)
(*head)->prev = NULL;
free(temp);
return;
}
while(k < position && temp->next!=NULL) {
temp = temp->next;
k++;
}
if(k < position){
printf("Desired position does not exist\n");
return;
}
temp2 = temp->prev;
temp2->next = temp->next;
if(temp->next) // Deletion from Intermediate Node
temp->next->prev = temp2;
free(temp);
return;
}
Time Complexity: O(𝑛), for scanning the complete list of size 𝑛. Space Complexity: O(1).
[Figure: a circular linked list — head → 4 → 15 → 7 → 40 → back to 4]
In some situations, circular linked lists are useful. For example, when several processes are using the same
computer resource (say, CPU) for the same amount of time, we have to assure that no process accesses the
resource before all other processes do (round robin algorithm). The following is a type declaration for a circular
linked list of integers:
struct CLLNode {
    int data;
    struct CLLNode *next;
};
In a circular linked list, we access the elements using the ℎ𝑒𝑎𝑑 node (similar to ℎ𝑒𝑎𝑑 node in singly linked list and
doubly linked lists).
[Figure: a circular linked list — head → 4 → 15 → 7 → 40 → back to 4]
The circular list is accessible through the node marked ℎ𝑒𝑎𝑑. To count the nodes, the list has to be traversed from
the node marked ℎ𝑒𝑎𝑑, with the help of a dummy node 𝑐𝑢𝑟𝑟𝑒𝑛𝑡, and stop the counting when 𝑐𝑢𝑟𝑟𝑒𝑛𝑡 reaches the
starting node ℎ𝑒𝑎𝑑. If the list is empty, ℎ𝑒𝑎𝑑 will be NULL, and in that case set 𝑐𝑜𝑢𝑛𝑡 = 0. Otherwise, set the
current pointer to the first node, and keep on counting till the current pointer reaches the starting node.
int length(struct CLLNode *head) {
    struct CLLNode *current = head;
    int count = 0;
    if(head == NULL) return 0;
    do {
        current = current->next;
        count++;
    } while (current != head);
    return count;
}
Time Complexity: O(𝑛), for scanning the complete list of size 𝑛. Space Complexity: O(1), for creating one temporary variable.
[Figure: circular list head → 4 → 15 → 7 → 40 → back to 4, and a new node (data)]
• Update the next pointer of the new node to head, and traverse the list to the tail. That means, in a circular list we should stop at the node whose next node is head.
[Figure: the tail node is the node whose next pointer is head]
• Update the next pointer of the tail node to point to the new node, and we get the list as shown below.
[Figure: head → 4 → 15 → 7 → 40 → new node (data) → back to head]
void insertAtEnd(struct CLLNode **head, int data){
    struct CLLNode *current = *head;
    struct CLLNode *newNode = (struct CLLNode *) (malloc(sizeof(struct CLLNode)));
    if(!newNode) {
        printf("Memory Error");
        return;
    }
    newNode->data = data;
    newNode->next = newNode;    // a single node points to itself
    if(*head == NULL){          // empty list: the new node becomes the head
        *head = newNode;
        return;
    }
    while (current->next != *head)    // traverse to the tail node
        current = current->next;
    newNode->next = *head;
    current->next = newNode;
}
Time Complexity: O(𝑛), for scanning the complete list of size 𝑛. Space Complexity: O(1), for temporary variable.
[Figure: circular list head → 4 → 15 → 7 → 40 → back to 4, and a new node (data)]
• Update the next pointer of the new node to the head node, and traverse the list until the tail node (the node with data 40).
[Figure: the new node (data) points to the head node 4]
• Update the tail node's next pointer to point to the new node.
[Figure: new node (data) → 4 → 15 → 7 → 40 → back to the new node]
• Make the new node the head.
[Figure: head → new node (data) → 4 → 15 → 7 → 40 → back to head]
void insertAtBegin(struct CLLNode **head, int data){
struct CLLNode *current = *head;
struct CLLNode *newNode = (struct CLLNode *) (malloc(sizeof(struct CLLNode)));
if(!newNode) {
printf("Memory Error");
return;
}
newNode->data = data;
newNode->next = newNode;
if(current == NULL){
*head = newNode;
return;
}
while (current->next != *head)
current = current->next;
newNode->next = *head;
current->next = newNode;
*head = newNode;
}
Time Complexity: O(𝑛), for scanning the complete list of size 𝑛. Space Complexity: O(1), for temporary variable.
Deleting the last node in a circular list:
[Figure: head → 60 → 4 → 15 → 7 → 40 → back to 60, with pointers at the node previous to the tail and at the tail node to be deleted]
• Traverse the list to find the tail node and the node before it.
• Update the next pointer of the tail node's previous node to point to head.
• Dispose of the tail node.
Deleting the first node:
[Figure: head → 60 → 4 → 15 → 7 → 40 → back to 60, with the first node (60) marked for deletion]
• Create a temporary node which will point to the head. Also, update the tail node's next pointer to point to the next node of head (as shown below).
[Figure: temp → 60; the tail node (40) now points to 4]
• Now, move the head pointer to the next node and dispose of the temporary node (as shown below).
[Figure: head → 4 → 15 → 7 → 40 → back to 4]
void deleteFrontNode(struct CLLNode **head) {   // opening lines reconstructed from the steps above
    struct CLLNode *temp = *head, *current = *head;
    if (*head == NULL) {
        printf("List Empty");
        return;
    }
    while (current->next != *head)    // find the tail node
        current = current->next;
    current->next = (*head)->next;    // the tail now skips the old head
    *head = (*head)->next;
    free(temp);
    return;
}
Time Complexity: O(𝑛), for scanning the complete list of size 𝑛. Space Complexity: O(1), for a temporary variable.
[Figure: memory-efficient doubly linked list A → B → C → D → NULL, where each node stores a single pointer field holding the XOR of the previous and next node addresses]
In the example above, the pointer field of C stores B ⊕ D, so (B ⊕ D) ⊕ B = D (since B ⊕ B = 0).
From the above discussion we can see that just by using a single pointer field, we can move back and forth. A memory-efficient implementation of a doubly linked list is possible with minimal compromising of timing efficiency.
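A minimal sketch of the idea in C (names are illustrative; uintptr_t casts are assumed for the pointer arithmetic):
#include <stdint.h>
struct XORNode {
    int data;
    uintptr_t both;    // XOR of the previous and next node addresses
};
// Walk forward: given the previous node and the current one,
// recover the next node as both ^ prev (for the head, pass prev = NULL).
struct XORNode *nextNode(struct XORNode *prev, struct XORNode *cur) {
    return (struct XORNode *)(cur->both ^ (uintptr_t)prev);
}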
[Figure: an unrolled linked list — the list head points to a chain of blocks holding the elements 10 1 30 6 | 70 3 45 2 | 91 19 4 17]
Assume that there will be no more than 𝑛 elements in the unrolled linked list at any time. To simplify this problem,
all blocks, except the last one, should contain exactly ⌈√𝑛⌉ elements. Thus, there will be no more than ⌊√𝑛⌋ blocks
at any time.
[Figure: inserting 22 into the first block of the unrolled list; the overflow element of each full block is shifted into the next block]
To shift the tail node of block A into block B:
[Figure: blocks A (70 3 45) and B (19 4 17) before the shift]
2. In block A, move the next pointer of the head node to point to the second-to-last node, so that the tail node of A can be removed.
[Figure: temp marks the tail node (45) of block A]
3. Let the next pointer of the node which will be shifted (the tail node of A) point to the tail node of B.
[Figure: 45 now points to the tail node of B]
4. Let the next pointer of the head node of B point to the node temp points to.
[Figure: the head node of B points to 45]
5. Finally, set the head pointer of B to point to the node temp points to. Now the node temp points to becomes the new head node of B.
[Figure: 45 is the new head node of B — block A: 70 3; block B: 45 19 4 17]
6. The temp pointer can be thrown away. We have completed the shift operation to move the original tail node of A to become the new head node of B.
Performance
With unrolled linked lists, there are a couple of advantages, one in speed and one in space. First, if the number
of elements in each block is appropriately sized (e.g., at most the size of one cache line), we get noticeably better
cache performance from the improved memory locality. Second, since we have O(𝑛/𝑚) links, where 𝑛 is the number
of elements in the unrolled linked list and 𝑚 is the number of elements we can store in any block, we can also
save an appreciable amount of space, which is particularly noticeable if each element is small.
struct ListNode {
int data;
struct ListNode *prev;
struct ListNode *next;
};
Assuming we have 4-byte pointers, the two pointers in each node take 8 bytes. But the allocation overhead for the node could be anywhere between 8 and 16 bytes. Let's go with the best case and assume it will be 8 bytes. So, if we want to store 1K items in this list, we are going to have 16KB of overhead.
Now, let’s think about an unrolled linked list node (let us call it 𝐿𝑖𝑛𝑘𝑒𝑑𝐵𝑙𝑜𝑐𝑘). It will look something like this:
struct LinkedBlock{
struct LinkedBlock *next;
struct ListNode *head;
int nodeCount;
};
Therefore, allocating a single node (12 bytes + 8 bytes of overhead) with an array of 100 elements (400 bytes + 8
bytes of overhead) will now cost 428 bytes, or 4.28 bytes per element. Thinking about our 1K items from above, it
would take about 4.2KB of overhead, which is close to 4x better than our original list. Even if the list becomes
severely fragmented and the item arrays are only 1/2 full on average, this is still an improvement. Also, note that
we can tune the array size to whatever gets us the best overhead for our application.
Implementation
#include <stdio.h>
#include <stdlib.h>
#define MAXSKIPLEVEL 5
struct ListNode {
    int data;
    struct ListNode *next[1];    // variable-sized array of forward links
};
struct SkipList {
    struct ListNode *header;
    int listLevel;               // current level of list
};
struct SkipList list;
struct ListNode *insertElement(int data) {
    int i, newLevel;
    struct ListNode *update[MAXSKIPLEVEL+1];
    struct ListNode *temp;
    temp = list.header;
    for (i = list.listLevel; i >= 0; i--) {
        while (temp->next[i] != list.header && temp->next[i]->data < data)
            temp = temp->next[i];
        update[i] = temp;
    }
    temp = temp->next[0];
    if (temp != list.header && temp->data == data)
        return temp;
    // determine level
    for (newLevel = 0; rand() < RAND_MAX/2 && newLevel < MAXSKIPLEVEL; newLevel++);
    if (newLevel > list.listLevel) {
        for (i = list.listLevel + 1; i <= newLevel; i++)
            update[i] = list.header;
        list.listLevel = newLevel;
    }
    // make new node
    if ((temp = malloc(sizeof(struct ListNode) + newLevel*sizeof(struct ListNode *))) == 0) {
        printf("insufficient memory (insertElement)\n");
        exit(1);
    }
    temp->data = data;
    for (i = 0; i <= newLevel; i++) {    // update next links
        temp->next[i] = update[i]->next[i];
        update[i]->next[i] = temp;
    }
    return temp;
}
// delete node containing data
void deleteElement(int data) {
    int i;
    struct ListNode *update[MAXSKIPLEVEL+1], *temp;
    temp = list.header;
    for (i = list.listLevel; i >= 0; i--) {
        while (temp->next[i] != list.header && temp->next[i]->data < data)
            temp = temp->next[i];
        update[i] = temp;
    }
    temp = temp->next[0];
    if (temp == list.header || !(temp->data == data)) return;
    // adjust next pointers
    for (i = 0; i <= list.listLevel; i++) {
        if (update[i]->next[i] != temp) break;
        update[i]->next[i] = temp->next[i];
    }
    free(temp);
    // adjust header level
    while ((list.listLevel > 0) && (list.header->next[list.listLevel] == list.header))
        list.listLevel--;
}
// find node containing data
struct ListNode *findElement(int data) {
    struct ListNode *temp = list.header;
    for (int i = list.listLevel; i >= 0; i--) {
        while (temp->next[i] != list.header && temp->next[i]->data < data)
            temp = temp->next[i];
    }
    temp = temp->next[0];
    if (temp != list.header && temp->data == data) return temp;
    return 0;
}
// initialize skip list
void initList() {
    int i;
    if ((list.header = malloc(sizeof(struct ListNode) + MAXSKIPLEVEL*sizeof(struct ListNode *))) == 0) {
        printf("Memory Error\n");
        exit(1);
    }
    for (i = 0; i <= MAXSKIPLEVEL; i++)
        list.header->next[i] = list.header;
    list.listLevel = 0;
}
/* command-line: skipList maxnum; e.g., skipList 2000 processes 2000 records */
int main(int argc, char **argv) {
    int i, *a, maxnum = atoi(argv[1]);
    initList();
    if ((a = malloc(maxnum * sizeof(*a))) == 0) {
        fprintf(stderr, "insufficient memory (a)\n");
        exit(1);
    }
    for (i = 0; i < maxnum; i++) a[i] = rand();
    printf("Random, %d items\n", maxnum);
    for (i = 0; i < maxnum; i++)
        insertElement(a[i]);
    for (i = maxnum-1; i >= 0; i--)
        findElement(a[i]);
    for (i = maxnum-1; i >= 0; i--)
        deleteElement(a[i]);
    return 0;
}
Performance
In a simple linked list that consists of 𝑛 elements, to perform a search 𝑛 comparisons are required in the worst
case. If a second pointer pointing two nodes ahead is added to every node, the number of comparisons goes down
to 𝑛/2 + 1 in the worst case.
Adding one more pointer to every fourth node and making them point to the fourth node ahead reduces the number of comparisons to ⌈n/4⌉ + 2. If this strategy is continued so that every node with i pointers points 2^{i−1} nodes ahead, O(log n) performance is obtained and the number of pointers has only doubled (n + n/2 + n/4 + n/8 + n/16 + ⋯ = 2n).
The find, insert, and remove operations on ordinary binary search trees are efficient, O(𝑙𝑜𝑔𝑛), when the input data
is random; but less efficient, O(𝑛), when the input data is ordered. Skip List performance for these same operations
and for any data set is about as good as that of randomly-built binary search trees - namely O(𝑙𝑜𝑔𝑛).
of the list (option C) is also an O(1) operation, but it requires updating the head pointer, which can be more
complicated than simply adding a new node to the beginning of the list.
Problem-2 Which of the following best describes the advantages of a doubly linked list over a singly linked
list?
A. Doubly linked lists have faster insertion and deletion operations.
B. Doubly linked lists require less memory than singly linked lists.
C. Doubly linked lists provide more efficient traversal of the list in reverse order.
D. Doubly linked lists are less prone to pointer errors than singly linked lists.
Solution: The correct answer is: C. Doubly linked lists provide more efficient traversal of the list in reverse order.
While doubly linked lists do allow for faster deletion and insertion operations at certain points in the list, this is
not a universal advantage over singly linked lists. Additionally, doubly linked lists require more memory than
singly linked lists due to the extra pointer for each node. The claim that doubly linked lists are less prone to
pointer errors is also false - both types of linked lists require careful management of pointers to prevent errors.
The primary advantage of a doubly linked list is its ability to efficiently traverse the list in both directions, due to
the presence of a previous pointer in each node. This is not possible in a singly linked list, where traversal is only
possible in the forward direction.
Problem-3 Which of the following is a disadvantage of a circular linked list compared to a singly linked list?
A. Circular linked lists require more memory than singly linked lists.
B. Circular linked lists can be more difficult to implement than singly linked lists.
C. Circular linked lists cannot be traversed in reverse order.
D. Circular linked lists can result in an infinite loop if not implemented carefully.
Solution: The correct answer is: D. Circular linked lists can result in an infinite loop if not implemented carefully.
In a circular linked list, the last node's next pointer simply points back to the head instead of NULL, so circular linked lists do not require any extra memory per node compared to singly linked lists. Additionally, while they may be slightly more complex
to implement, this is not a universal disadvantage over singly linked lists.
One potential disadvantage of circular linked lists is that they cannot be traversed in reverse order, as the nodes
only have pointers to the next node, not the previous one. However, this is not always a critical concern for all use
cases.
The most significant disadvantage of circular linked lists is that if the pointers are not managed carefully, traversal can
turn into an infinite loop that causes the program to crash or hang. For example, if a traversal keeps checking for a NULL next
pointer, which never occurs in a circular list, instead of stopping when it returns to the starting node, or if the same node is inserted into the list multiple times, an infinite loop can occur.
Problem-4 Implement Stack using Linked List.
Solution: Refer to 𝑆𝑡𝑎𝑐𝑘𝑠 chapter.
Problem-5 True or False: A linked list can contain cycles, where a node points back to an earlier node in the
list, creating a loop.
Solution: True. A linked list is a data structure that consists of a sequence of nodes, each containing a value and
a reference (or pointer) to the next node in the sequence. In a singly-linked list, each node only has a reference to
the next node in the list. In a doubly-linked list, each node has references to both the next and previous nodes in
the list.
In a linked list, it's possible for a node to have a reference that points back to an earlier node in the list, creating
a loop or cycle. This can happen when two or more nodes in the list have the same reference to the next node. For
example, if we have a linked list with nodes A, B, C, and D, and node D has a reference back to node B, then the
linked list will contain a cycle. Here's an example of a linked list with a cycle:
A -> B -> C -> D -> B (cycle)
In this example, node D points back to node B, creating a cycle in the linked list. So, to sum up: a linked list can
contain cycles, where a node points back to an earlier node in the list, creating a loop. Therefore, the answer to
the true/false question "A linked list can contain cycles" is true.
Problem-6 Find 𝑘 𝑡ℎ node from the end of a Linked List.
Solution: Brute-Force Method: Start with the first node and count the number of nodes present after that node.
If the number of nodes is less than 𝑘 − 1, then return saying "fewer number of nodes in the list". If the number of nodes
is greater than 𝑘 − 1, then go to the next node. Continue this until the number of nodes after the current node is exactly 𝑘 − 1.
Time Complexity: O(𝑛2 ), for scanning the remaining list (from current node) for each node. Space Complexity:
O(1).
Problem-7 Can we improve the complexity of Problem-6?
Solution: Yes, using a hash table. As an example, consider the following list.
Head → 5 → 1 → 17 → 4 → NULL
In this approach, create a hash table whose entries are < 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛 𝑜𝑓 𝑛𝑜𝑑𝑒, 𝑛𝑜𝑑𝑒 𝑎𝑑𝑑𝑟𝑒𝑠𝑠 >. That means, key is the
position of the node in the list and value is the address of that node.
Position in List Address of Node
1 Address of 5 node
2 Address of 1 node
3 Address of 17 node
4 Address of 4 node
By the time we traverse the complete list (for creating the hash table), we can find the list length. Let us say the
list length is 𝑀. To find the 𝑘 𝑡ℎ node from the end of the linked list, we can convert this to the (𝑀 − 𝑘 + 1)𝑡ℎ node from the beginning. Since
we already know the length of the list, it is just a matter of returning the value stored at key 𝑀 − 𝑘 + 1 in the hash table.
Time Complexity: Time for creating the hash table, 𝑇(𝑀) = O(𝑀). Space Complexity: Since we need to create a hash
table of size 𝑀, O(𝑀).
Problem-8 Can we solve Problem-6 without creating the hash table?
Solution: Yes. Upon examining the previous solution, it becomes apparent that we are determining the size of the
linked list. This involves utilizing a hash table to accomplish this task. However, we could also find the length of
the linked list by starting at the head node and traversing the list. Therefore, we do not necessarily need to create
a hash table to determine the length of the list. After finding the length, compute 𝑛 − 𝑘 + 1 and with one more scan
we can get the (𝑛 − 𝑘 + 1)𝑡ℎ node from the beginning. This solution needs two scans: one for finding the length of
the list and the other for finding (𝑛 − 𝑘 + 1)𝑡ℎ node from the beginning.
Time Complexity: Time for finding the length + Time for finding the (𝑛 − 𝑘 + 1)𝑡ℎ node from the beginning.
Therefore, 𝑇(𝑛) = O(𝑛) + O(𝑛) ≈ O(𝑛). Space Complexity: O(1). Hence, no need to create the hash table.
Problem-9 Can we solve Problem-6 in one scan?
Solution: Yes. Efficient Approach: Use two pointers 𝑘𝑡ℎ𝑁𝑜𝑑𝑒 and 𝑝𝑇𝑒𝑚𝑝. Initially, both point to head node of the
list. 𝑘𝑡ℎ𝑁𝑜𝑑𝑒 starts moving only after 𝑝𝑇𝑒𝑚𝑝 has made 𝑘 − 1 moves. From there both move forward until 𝑝𝑇𝑒𝑚𝑝
reaches the end of the list. As a result 𝑘𝑡ℎ𝑁𝑜𝑑𝑒 points to 𝑘 𝑡ℎ node from the end of the linked list.
Note: Both pointers move one node at a time.
struct ListNode* kthNodeFromEnd(struct ListNode* head, int k){
struct ListNode *pTemp, *kthNode;
int i;
pTemp = kthNode = head;
    /* k should be at least 1; if the list has fewer than k nodes, pTemp
       will run off the end while making its k-1 moves below */
    if(head == NULL || k < 1)
        return NULL;
    /* Move pTemp pointer k-1 ListNodes. This will create a difference of k-1 ListNodes
       between pTemp and kthNode */
    for(i = 0; i < k-1; i++){
        pTemp = pTemp->next;
        if(pTemp == NULL){
            printf("Error: k is greater than length of linked list\n");
            return NULL;
        }
    }
/* Now, move both pointers together till pTemp reaches last ListNode of linked list.
when pTemp reaches last ListNode kthNode pointer will be pointing to Nth last ListNode*/
while(pTemp->next != NULL){
pTemp = pTemp->next;
kthNode = kthNode->next;
}
return kthNode;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-10 Check whether the given linked list is either NULL-terminated or ends in a cycle (cyclic).
Solution: Brute-Force Approach. To check whether a linked list is NULL-terminated or not, we can simply
traverse the linked list from the head node to the tail node, following each node's next pointer until we reach the
end of the list. If the last node's next pointer is NULL, then the linked list is NULL-terminated. If the last node's
next pointer is not NULL, then the linked list contains a cycle.
As an example, consider the following linked list which has a loop in it. The difference between this list and the
regular list is that, in this list, there are two nodes whose next pointers are the same. In regular singly linked lists
(without a loop) each node’s next pointer is unique. That means the repetition of next pointers indicates the
existence of a loop.
One simple and brute force way of solving this is, start with the first node and see whether there is any node
whose next pointer is the current node’s address. If there is a node with the same address, then that indicates
that some other node is pointing to the current node and we can say a loop exists. Continue this process for all
the nodes of the linked list.
Does this method work? As per the algorithm, we are checking for the next pointer addresses, but how do we
find the end of the linked list (otherwise we will end up in an infinite loop)?
Note: If we start with a node in a loop, this method may work depending on the size of the loop.
Problem-11 Can we use the hashing technique for solving Problem-10?
Solution: Yes. Using Hash Tables, we can solve this problem.
Algorithm:
• Traverse the linked list nodes one by one.
• Check if the address of the node is available in the hash table or not.
• If it is already available in the hash table, that indicates that we are visiting the node that was already
visited. This is possible only if the given linked list has a loop in it.
• If the address of the node is not available in the hash table, insert that node’s address into the hash table.
• Continue this process until we reach the end of the linked list 𝑜𝑟 we find the loop.
Time Complexity: O(𝑛) for scanning the linked list. Note that we are doing a scan of only the input.
Space Complexity: O(𝑛) for hash table.
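In C, the "hash table of node addresses" can be sketched with a small open-addressing table. This is a minimal sketch, not the book's listing: hasLoopUsingHashing and TABLE_SIZE are illustrative names, the table is assumed larger than the number of nodes, and a production version would grow it dynamically.
int hasLoopUsingHashing(struct ListNode *head) {
    #define TABLE_SIZE 100003   /* assumed larger than the number of nodes */
    struct ListNode **table = (struct ListNode **)calloc(TABLE_SIZE, sizeof(struct ListNode *));
    struct ListNode *cur = head;
    int found = 0;
    while (cur != NULL) {
        unsigned long h = ((unsigned long)cur / sizeof(struct ListNode)) % TABLE_SIZE;
        while (table[h] != NULL && table[h] != cur)
            h = (h + 1) % TABLE_SIZE;   /* linear probing on collisions */
        if (table[h] == cur) {          /* address seen before: loop detected */
            found = 1;
            break;
        }
        table[h] = cur;                 /* first visit: remember this node's address */
        cur = cur->next;
    }
    free(table);
    return found;
}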
Problem-12 Can we solve Problem-10 using the sorting technique?
Solution: No. Consider the following algorithm which is based on sorting. Then we see why this algorithm fails.
Algorithm:
• Traverse the linked list nodes one by one and take all the next pointer values into an array.
• Sort the array that has the next node pointers.
• If there is a loop in the linked list, definitely two next node pointers will be pointing to the same node.
• After sorting if there is a loop in the list, the nodes whose next pointers are the same will end up adjacent
in the sorted list.
• If any such pair exists in the sorted list then we say the linked list has a loop in it.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛) for sorting the next pointers array. Space Complexity: O(𝑛) for the next pointers array.
Problem with the above algorithm: The above algorithm works only if we can find the length of the list. But if
the list has a loop then we may end up in an infinite loop. Due to this reason the algorithm fails.
Problem-13 Can we solve the Problem-10 in O(𝑛)?
Solution: Yes. Efficient Approach (Memoryless Approach): The space complexity can be reduced to O(1) by
considering two pointers at different speed - a slow pointer and a fast pointer. The slow pointer moves one step at
a time while the fast pointer moves two steps at a time. This problem was solved by 𝐹𝑙𝑜𝑦𝑑. The solution is named
the Floyd cycle finding algorithm (also known as the "tortoise and hare" algorithm). It uses 𝑡𝑤𝑜 pointers moving at
different speeds to walk the linked list. If there is no cycle in the list, the fast pointer will eventually reach the end
and we can return false in this case. Now consider a cyclic list and imagine the slow and fast pointers are two
runners racing around a circle track. Once they enter the loop they are expected to meet, which denotes that there
is a loop.
This works because the only way a faster moving pointer would point to the same location as a slower moving
pointer is if somehow the entire list or a part of it is circular. Think of a tortoise and a hare running on a track.
The faster running hare will catch up with the tortoise if they are running in a loop. As an example, trace out the
Floyd algorithm on a list containing a loop: after the final step the two pointers meet at some node inside the loop,
which may not be the starting point of the loop.
Note: 𝑠𝑙𝑜𝑤𝑃𝑡𝑟 (𝑡𝑜𝑟𝑡𝑜𝑖𝑠𝑒) moves one node at a time and 𝑓𝑎𝑠𝑡𝑃𝑡𝑟 (ℎ𝑎𝑟𝑒) moves two nodes at a time.
int detectCycle(struct ListNode *head) {
    struct ListNode *slow = head, *fast = head;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            printf("\n Linked list contains a loop\n");
            return 1;
        }
    }
    printf("\n No loop in linked list\n");
    return 0;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-14 We are given a pointer to the first element of a linked list 𝐿. There are two possibilities for 𝐿: it
either ends (snake) or its last element points back to one of the earlier elements in the list (snail). Give an
algorithm that tests whether a given list 𝐿 is a snake or a snail.
Solution: It is the same as Problem-10.
Problem-15 Check whether the given linked list is NULL-terminated or not. If there is a cycle, find the start
node of the loop.
Solution: This solution builds upon the approach used in Problem-10. Once a loop is detected in the linked list,
we set the slow pointer to the head of the linked list. We then move both the slow and fast pointers one node at a
time until they meet again at the start of the loop. This approach is commonly used for removing loops in a linked
list.
struct ListNode* findLoopBeginning(struct ListNode * head) {
struct ListNode *slow = head, *fast = head;
int loopExists = 0;
while (fast && fast->next) {
slow = slow->next;
fast = fast->next->next;
if (slow == fast){
loopExists = 1;
break;
}
}
if(loopExists) {
slow = head;
while(slow != fast) {
fast = fast->next;
slow = slow->next;
}
return slow;
}
return NULL;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-16 From the previous discussion and problems, we understand that the meeting of tortoise and hare
concludes the existence of the loop, but how does moving the tortoise to the beginning of the linked list while
keeping the hare at the meeting place, followed by moving both one step at a time, make them meet at the
starting point of the cycle?
Solution: This problem is at the heart of number theory. In the Floyd cycle finding algorithm, notice that when the
tortoise and the hare meet, the hare has traveled exactly twice as far as the tortoise, so the extra distance covered by
the hare is a multiple of the loop length: it equals 𝑛 × 𝐿, where 𝐿 is the loop length. Since that extra distance is also
equal to the distance traveled by the tortoise, the tortoise itself is 𝑛 × 𝐿 away from the beginning of the sequence. If
we move two pointers one step at a time, one from the position of the tortoise and one from the start of the sequence,
we know that they will meet as soon as both are in the loop, since they stay exactly 𝑛 × 𝐿, a multiple of the loop
length, apart. One of them is already in the loop, so we just move the other one a single step at a time until it enters
the loop, keeping the other 𝑛 × 𝐿 away from it at all times.
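As a concrete check, suppose the non-loop part has 3 nodes and the loop length is 𝐿 = 5. The tortoise enters the loop after 3 steps, and the two pointers first meet after the tortoise has taken 5 steps in total, which is exactly 1 × 𝐿. Moving one pointer from the head and the other from the meeting point, one step at a time, both arrive at the first node of the loop after 3 more steps.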
Problem-17 In the Floyd cycle finding algorithm, does it work if we use steps 2 and 3 instead of 1 and 2?
Solution: Yes, but the complexity might be high. Trace out an example.
Problem-18 Check whether the given linked list is NULL-terminated. If there is a cycle, find the length of the loop.
Solution: This solution is also an extension of the basic cycle detection problem. After finding the loop in the
linked list, keep the 𝑠𝑙𝑜𝑤 pointer where it is, and keep moving 𝑓𝑎𝑠𝑡 one node at a time until it comes back to 𝑠𝑙𝑜𝑤,
incrementing a counter variable by 1 on each move. The counter then holds the length of the loop.
int loopLength(struct ListNode *head) {
    struct ListNode *slow = head, *fast = head;
    int loopExists = 0, length = 0;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            loopExists = 1;
            break;
        }
    }
    if (loopExists) {           // move fast around the loop once, counting nodes
        fast = fast->next;
        length = 1;
        while (slow != fast) {
            fast = fast->next;
            length++;
        }
    }
    return length;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Recursive version: The recursive version is slightly trickier, and the key is to work backwards. We will find it
easier to start from the bottom up, by asking and answering tiny questions:
• What is the reverse of NULL (the empty list)? NULL.
• What is the reverse of a one element list? The element itself.
• What is the reverse of an 𝑛 element list? The reverse of the rest of the list (from the second element on), followed by the first element.
struct ListNode* reverseLinkedListRecursive(struct ListNode *head) {
if (head == NULL || head->next == NULL)
return head;
struct ListNode *secondElem = head->next;
// unlink list from the rest or you will get a cycle
head->next = NULL;
// reverse everything from the second element on
struct ListNode *reverseRest = reverseLinkedListRecursive(secondElem);
secondElem->next = head; // then we join the two lists
return reverseRest;
}
Time Complexity: O(𝑛).
Space Complexity: O(𝑛). The extra space comes from implicit stack space due to recursion. The recursion could
go up to 𝑛 levels deep.
Problem-21 Suppose there are two singly linked lists both of which intersect at some point and become a
single linked list. The head or start pointers of both the lists are known, but the intersecting node is not
known. Also, the number of nodes in each of the lists before they intersect is unknown and may be different
in each list. 𝐿𝑖𝑠𝑡1 may have 𝑛 nodes before it reaches the intersection point, and 𝐿𝑖𝑠𝑡2 might have 𝑚 nodes
before it reaches the intersection point where 𝑚 and 𝑛 may be 𝑚 = 𝑛, 𝑚 < 𝑛 or 𝑚 > 𝑛. Give an algorithm for
finding the merging point.
Solution: Brute-Force Approach: One easy solution is to compare every node pointer in the first list with every
other node pointer in the second list by which the matching node pointers will lead us to the intersecting node.
But, the time complexity in this case will be O(𝑚𝑛) which will be high.
struct ListNode * intersectingNodeBruteForce(struct ListNode *head1, struct ListNode *head2){
// stores the result, if no ListNode is intersecting we return NULL
struct ListNode *temp;
while(head1 != NULL){
temp = head2;
while(temp != NULL){
if(temp == head1){
// found a matching node
return head1;
}
temp = temp -> next;
}
head1 = head1 -> next;
}
return NULL;
}
Time Complexity: O(𝑚𝑛). Space Complexity: O(1).
Problem-22 Can we solve Problem-21 using the sorting technique?
Solution: No. Consider the following algorithm which is based on sorting and see why this algorithm fails.
Algorithm:
• Take first list node pointers and keep them in some array and sort them.
• Take second list node pointers and keep them in some array and sort them.
• After sorting, use two indexes: one for the first sorted array and the other for the second sorted array.
• Start comparing values at the indexes and increment the index according to whichever has the lower value
(increment only if the values are not equal).
• At any point, if we are able to find two indexes whose values are the same, then that indicates that those
two nodes are pointing to the same node and we return that node.
Time Complexity: Time for sorting lists + Time for scanning (for comparing)
= O(𝑚𝑙𝑜𝑔𝑚) + O(𝑛𝑙𝑜𝑔𝑛) + O(𝑚 + 𝑛). We need to consider the term that gives the maximum value.
Space Complexity: O(𝑚 + 𝑛), for the two arrays of node pointers.
Any problem with the above algorithm? Yes. In the algorithm, we are storing all the node pointers of both the
lists and sorting. But we are forgetting the fact that there can be many repeated elements. This is because after
the merging point, all node pointers are the same for both the lists. The algorithm works fine only in one case and
it is when both lists have the ending node at their merge point.
Problem-23 Can we solve Problem-21 using hash tables?
Solution: Yes.
Algorithm:
• Select the list which has the smaller number of nodes (if we do not know the lengths beforehand, then select
one list randomly).
• Traverse the selected list and insert each node's address into a hash table.
• Now, traverse the other list and for each node pointer of this list check whether the same node pointer
exists in the hash table.
• If there is a merge point for the given lists, then we will definitely encounter the first common node pointer
in the hash table.
Time Complexity: Time for creating the hash table + Time for scanning the second list = O(𝑚) + O(𝑛) (or O(𝑛) +
O(𝑚), depending on which list we select for creating the hash table); in both cases the time complexity is the
same. Space Complexity: O(𝑛) or O(𝑚).
Problem-24 Can we use stacks for solving the Problem-21?
Solution: Yes.
Algorithm:
• Create two stacks: one for the first list and one for the second list.
• Traverse the first list and push all the node addresses onto the first stack.
• Traverse the second list and push all the node addresses onto the second stack.
• Now both stacks contain the node address of the corresponding lists.
• Now compare the top node addresses of both stacks.
• If they are the same, pop the top elements from both stacks and keep the popped address in a temporary variable
(since both addresses are the same, it is enough to use one temporary variable).
• Continue this process until the top node addresses of the stacks differ.
• The address saved in the temporary variable is the point where the lists merge into a single list.
• Return the value of the temporary variable.
Time Complexity: O(𝑚 + 𝑛), for scanning both the lists. Space Complexity: O(𝑚 + 𝑛), for creating two stacks for
both the lists.
Problem-25 Is there any other way of solving Problem-21?
Solution: Yes. Using “finding the first repeating number” approach in an array (for algorithm refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔
chapter).
Algorithm:
• Create an array 𝐴 and keep all the next pointers of both the lists in the array.
• In the array find the first repeating element [Refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter for algorithm].
• The first repeating number indicates the merging point of both the lists.
Time Complexity: O(𝑚 + 𝑛). Space Complexity: O(𝑚 + 𝑛).
Problem-26 Can we still think of finding an alternative solution for Problem-21?
Solution: Yes. By combining sorting and search techniques we can reduce the complexity.
Algorithm:
• Create an array 𝐴 and keep all the next pointers of the first list in the array.
• Sort these array elements.
• Then, for each of the second list elements, search in the sorted array (let us assume that we are using
binary search which gives O(𝑙𝑜𝑔𝑛)).
• Since we are scanning the second list one by one, the first repeating element that appears in the array is
nothing but the merging point.
Time Complexity: Time for sorting + Time for searching = O(𝑀𝑎𝑥(𝑚𝑙𝑜𝑔𝑚, 𝑛𝑙𝑜𝑔𝑛)). Space Complexity: O(𝑀𝑎𝑥(𝑚, 𝑛)).
Problem-27 Can we improve the complexity for Problem-21?
Solution: Yes. First calculate the lengths of the two lists and find the difference 𝑑. Starting from the head of the
longer list, advance a pointer 𝑑 nodes. Then move one pointer in each list in parallel, one node at a time; the node
at which they meet is the merging point.
Efficient Approach:
• Find lengths (L1 and L2) of both lists -- O(𝑛) + O(𝑚) = O(𝑚𝑎𝑥(𝑚, 𝑛)).
• Take the difference 𝑑 of the lengths -- O(1).
• Make 𝑑 steps in longer list -- O(𝑑).
• Step in both lists in parallel until the two pointers meet at the same node -- O(𝑚𝑖𝑛(𝑚, 𝑛)).
• Total time complexity = O(𝑚𝑎𝑥(𝑚, 𝑛)).
• Space Complexity = O(1).
struct ListNode * intersectingListNode(struct ListNode * head1, struct ListNode * head2){
// get lengths of both the lists
int m = getLength(head1);
int n = getLength(head2);
struct ListNode * mergePoint = NULL; // to store the merge point
// finding the value of d based on the longer list
int diff = (m > n) ? (m-n) : (n-m);
    //traverse the longer list for 'diff' steps
if(m > n){
while(diff--)
head1 = head1 -> next;
}
else{
while(diff--)
head2 = head2 -> next;
}
    // now both pointers are the same distance from the merge point
    while(head1 && head2){
        if(head1 == head2){
            mergePoint = head1;
            break;
        }
        head1 = head1 -> next;
        head2 = head2 -> next;
    }
return mergePoint;
}
Problem-28 How will you find the middle of the linked list?
Solution: Brute-Force Approach: For each node, count how many nodes there are in the list after it, and check
whether the current node is the middle node of the list.
Time Complexity: O(𝑛2 ). Space Complexity: O(1).
Problem-29 Can we improve the complexity of Problem-28?
Solution: Yes.
Algorithm:
• Traverse the list and find the length of the list.
• After finding the length, again scan the list and locate the 𝑛/2 𝑡ℎ node from the beginning (see the sketch below).
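A minimal sketch of this two-scan approach, reusing the getLength helper used elsewhere in this chapter (findMiddle is an illustrative name):
struct ListNode *findMiddle(struct ListNode *head) {
    int i, len = getLength(head);   // first scan: find the length of the list
    struct ListNode *temp = head;
    for (i = 0; i < len/2; i++)     // second scan: advance to the middle node
        temp = temp->next;
    return temp;
}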
Time Complexity: Time for finding the length of the list + Time for locating middle node = O(𝑛) + O(𝑛) ≈ O(𝑛).
Space Complexity: O(1).
Problem-30 Can we use the hash table for solving Problem-28?
Solution: Yes. The reasoning is the same as that of Problem-7.
Time Complexity: Time for creating the hash table. Therefore, 𝑇(𝑛) = O(𝑛).
Space Complexity: O(𝑛). Since we need to create a hash table of size 𝑛.
temp2 = temp1;
    current = current->next;
}
return temp2;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-37 Given a binary search tree convert it to a doubly linked list.
Solution: Refer 𝐵𝑖𝑛𝑎𝑟𝑦 𝑆𝑒𝑎𝑟𝑐ℎ 𝑇𝑟𝑒𝑒𝑠 section of 𝑇𝑟𝑒𝑒𝑠 chapter.
Problem-38 How do we sort the Linked Lists?
Solution: Refer 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 chapter.
Problem-39 Split a Circular Linked List into two equal parts. If the number of nodes in the list is odd, then
make the first list one node longer than the second list.
Alternative problem statement: Given a linked list, split it into two sublists — one for the front half, and one
for the back half. If the number of elements is odd, the extra element should go in the front list. So the algorithm
on the list {2, 3, 5, 7, 11} should yield the two lists {2, 3, 5} and {7, 11}.
Solution:
Algorithm:
• Store the mid and last pointers of the circular linked list using Floyd cycle finding algorithm.
• Make the second half circular.
• Make the first half circular.
• Set head pointers of the two linked lists.
As an example, consider the circular list: Head → 4 → 15 → 7 → 40 → (back to 4).
After the split, the above list will look like: Head1 → 4 → 15 (circular) and Head2 → 7 → 40 (circular).
The code below implements the alternative (front/back) version of the split:
void frontBackSplit(struct ListNode *head, struct ListNode **head1, struct ListNode **head2) {
    struct ListNode *slow, *fast;
    if (head == NULL || head->next == NULL) {   // lists of length < 2: everything goes to the front list
        *head1 = head;
        *head2 = NULL;
    }
    else {
        slow = head;
        fast = head->next;
        // advance fast two nodes and slow one node per iteration;
        // slow stops at the last node of the front half
        while (fast != NULL) {
            fast = fast->next;
            if (fast != NULL) {
                slow = slow->next;
                fast = fast->next;
            }
        }
        *head1 = head;
        *head2 = slow->next;
        slow->next = NULL;
    }
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-40 If we want to concatenate two linked lists which of the following gives O(1) complexity?
1) Singly linked lists 2) Doubly linked lists 3) Circular doubly linked lists
Solution: Circular Doubly Linked Lists. This is because for singly and doubly linked lists, we need to traverse the
first list till the end and append the second list. But in the case of circular doubly linked lists we don’t have to
traverse the lists.
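For instance, concatenating two non-empty circular doubly linked lists is just a constant number of pointer updates. A minimal sketch, assuming a node type with 𝑝𝑟𝑒𝑣 and 𝑛𝑒𝑥𝑡 pointers (DListNode and concatCircularLists are illustrative names, not from the original text):
struct DListNode {
    int data;
    struct DListNode *prev, *next;
};
// Splice list 2 after list 1 in O(1); both lists are assumed non-empty and circular.
struct DListNode *concatCircularLists(struct DListNode *head1, struct DListNode *head2) {
    struct DListNode *tail1 = head1->prev;   // last node of list 1
    struct DListNode *tail2 = head2->prev;   // last node of list 2
    tail1->next = head2;    // end of list 1 now leads into list 2
    head2->prev = tail1;
    tail2->next = head1;    // end of list 2 wraps back to the combined head
    head1->prev = tail2;
    return head1;
}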
Problem-41 How will you check if the linked list is palindrome or not?
Solution:
Algorithm:
1. Get the middle of the linked list.
2. Reverse the second half of the linked list.
3. Compare the first half and second half.
4. Construct the original linked list by reversing the second half again and attaching it back to the first half.
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-42 For a given 𝐾 value (𝐾 > 0) reverse blocks of 𝐾 nodes in a list.
Example: Input: 1 2 3 4 5 6 7 8 9 10. Output for different 𝐾 values:
For 𝐾 = 2: 2 1 4 3 6 5 8 7 10 9 For 𝐾 = 3: 3 2 1 6 5 4 9 8 7 10 For 𝐾 = 4: 4 3 2 1 8 7 6 5 9 10
Solution:
Algorithm: This is an extension of swapping nodes in a linked list.
1) Check if the remaining list has 𝐾 nodes.
   a. If yes, get the pointer of the (𝐾 + 1)𝑡ℎ node.
   b. Else return.
2) Reverse the first 𝐾 nodes.
3) Set the next of the last node (after reversal) to the (𝐾 + 1)𝑡ℎ node.
4) Move to the (𝐾 + 1)𝑡ℎ node.
5) Go to step 1.
6) The 𝐾𝑡ℎ node of the first 𝐾 nodes becomes the new head if available. Otherwise, we can return the head.
struct ListNode *getKPlusOneThNode(int K, struct ListNode *head) {
    struct ListNode *Kth;
    int i = 0;
    if(!head)
        return head;
    for (i = 0, Kth = head; Kth && (i < K); i++, Kth = Kth->next);
    if(i == K && Kth != NULL)
        return Kth;
    return head->next;
}
int hasKnodes(struct ListNode *head, int K) {
    int i = 0;
    for(i = 0; head && (i < K); i++, head = head->next);
    if(i == K)
        return 1;
    return 0;
}
struct ListNode *reverseBlockOfKNodesInLinkedList(struct ListNode *head, int K) {
    struct ListNode *cur = head, *temp, *next, *newHead;
    int i;
    if(K == 0 || K == 1)
        return head;
    if(hasKnodes(cur, K-1))
        newHead = getKPlusOneThNode(K-1, cur);
    else
        newHead = head;
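A compact, self-contained sketch of the same block-reversal idea, reusing the hasKnodes helper above (the function and variable names here are illustrative):
struct ListNode *reverseBlockOfKNodes(struct ListNode *head, int K) {
    struct ListNode *cur = head, *prev, *next = NULL;
    struct ListNode *newHead = NULL, *prevBlockTail = NULL;
    if (K <= 1 || head == NULL)
        return head;
    while (hasKnodes(cur, K)) {
        struct ListNode *blockHead = cur;   // will become the tail of this block
        int i;
        prev = NULL;
        for (i = 0; i < K; i++) {           // reverse exactly K nodes
            next = cur->next;
            cur->next = prev;
            prev = cur;
            cur = next;
        }
        if (newHead == NULL)
            newHead = prev;                 // head of the first reversed block
        else
            prevBlockTail->next = prev;     // link the previous block to this one
        prevBlockTail = blockHead;
    }
    if (prevBlockTail != NULL)
        prevBlockTail->next = cur;          // attach the remaining (fewer than K) nodes
    return newHead != NULL ? newHead : head;
}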
Problem-45 Clone a linked list with random pointers: every node of the list has a regular next pointer and an additional random pointer which can point to any node in the list (or be None). For example: 10 → 20 → 5 → 19 → 16 → None, with random pointers between arbitrary nodes. Give an algorithm to clone the list.
Solution: To clone a linked list with random pointers, the idea is to maintain a hash table for storing the mappings
from a original linked list node to its clone. For each node in the original linked list, we create a new node with
the same data and set its next pointers. While doing so, we also create a mapping from the original node to the
duplicate node in the hash table. Finally, we traverse the original linked list again and update random pointers of
the duplicate nodes using the hash table.
Algorithm:
• Scan the original list and for each node 𝑋, create a new node 𝑌 with data of 𝑋, then store the pair (𝑋, 𝑌) in
hash table using 𝑋 as a key. Note that during this scan set 𝑌 → 𝑛𝑒𝑥𝑡 with 𝑋 → 𝑛𝑒𝑥𝑡 and 𝑌 → 𝑟𝑎𝑛𝑑𝑜𝑚 to
𝑁𝑈𝐿𝐿 and we will fix it in the next scan. Now for each node 𝑋 in the original list we have a copy 𝑌 stored
in our hash table.
• To update the random pointers, for each node 𝑋 in the original list, read 𝑋 → 𝑟𝑎𝑛𝑑𝑜𝑚 and look up the
corresponding cloned node in the hash table created in the previous step. Assign that cloned node to
𝑌 → 𝑟𝑎𝑛𝑑𝑜𝑚, where 𝑌 is the clone of 𝑋.
struct ListNode *clone(struct ListNode *head){
    struct ListNode *X, *Y;
    // hash table helpers assumed: insert(table, key, value) and get(table, key);
    // get returns NULL when the key is NULL or absent
    struct HashTable *HT = createHashTable();
    X = head;
    while (X != NULL) {
        Y = (struct ListNode *)malloc(sizeof(struct ListNode));
        Y->data = X->data;
        Y->next = NULL;
        Y->random = NULL;
        insert(HT, X, Y);
        X = X->next;
    }
    X = head;
    while (X != NULL) {
        // get the node Y corresponding to X from the hash table
        Y = get(HT, X);
        Y->next = get(HT, X->next);
        Y->random = get(HT, X->random);
        X = X->next;
    }
    // Return the head of the new list, that is the clone of the original head
    return get(HT, head);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-46 Can we solve Problem-45 without any extra space?
Solution: Yes. First, a new node is inserted after each node in the original linked list. The content of the new node
is the same as the previous node. For example, in the figure, insert 10 after 10, insert 20 after 20, and so on.
Second, how does the random pointer in the original linked list map? For example, in the figure above, the random
pointer of node 10 points to 5, and the random pointer of node 19 points to 16. For every node 𝑋 in the original
list, the statement 𝑋 → 𝑛𝑒𝑥𝑡 → 𝑟𝑎𝑛𝑑𝑜𝑚 = 𝑋 → 𝑟𝑎𝑛𝑑𝑜𝑚 → 𝑛𝑒𝑥𝑡 solves the problem. This works because 𝑋 → 𝑛𝑒𝑥𝑡
is nothing but the copy of 𝑋, and 𝑋 → 𝑟𝑎𝑛𝑑𝑜𝑚 → 𝑛𝑒𝑥𝑡 is nothing but the copy of 𝑋's random node.
[Figure: the original list 10 → 20 → 5 → 19 → 16 → None shown alongside its copy.]
The third step is to split the new linked list from the linked list.
struct ListNode *clone(struct ListNode *head){
    struct ListNode *X, *Y, *newHead;
    //Step1: put X->random in Y->next, so that we can reuse the X->random field to point to Y.
    X = head;
    while (X != NULL) {
        Y = (struct ListNode *)malloc(sizeof(struct ListNode));
        Y->data = X->data;
        Y->next = X->random;
        X->random = Y;
        X = X->next;
    }
    //Step2: Setting Y->random. Y->next is the old node that Y->random should point to,
    //so Y->next->random is the new copy of that node.
    X = head;
    while (X != NULL) {
        Y = X->random;
        Y->random = (Y->next != NULL) ? Y->next->random : NULL;
        X = X->next;
    }
    // remember the clone of the head before the original random pointers are repaired
    newHead = (head != NULL) ? head->random : NULL;
    //Step3: Repair damage to old list and fill in next pointer in new list.
    X = head;
    while (X != NULL) {
        Y = X->random;
        X->random = Y->next;
        Y->next = (X->next != NULL) ? X->next->random : NULL;
        X = X->next;
    }
    return newHead;
}
Time Complexity: O(3𝑛) ≈O(𝑛). Space Complexity: O(1).
Problem-47 We are given a pointer to a node (not the tail node) in a singly linked list. Delete that node from
the linked list.
Solution: To remove a node, the next pointer of the previous node must be adjusted to point to the following node
instead of the current one. However, if we don't have access to a pointer for the previous node, we cannot redirect
its next pointer. In such a scenario, we can simply transfer the data from the next node into the current node,
and then delete the next node to circumvent the issue.
void deleteNodeinLinkedList( struct ListNode * node ){
struct ListNode * temp = node->next;
node->data = node->next->data;
node->next = temp->next;
free(temp);
}
Time Complexity: O(1). Space Complexity: O(1).
Problem-48 Given a linked list with even and odd numbers, create an algorithm for making changes to the
list in such a way that all even numbers appear at the beginning.
Solution: One approach to resolve this issue is to utilize a splitting logic. During the linked list traversal, the list
can be split into two parts: one consisting of all even nodes, and the other comprising all odd nodes. After this, to
obtain the final list, we can concatenate the odd node linked list to the even node linked list.
To accomplish the splitting of the linked list, we can iterate through the original linked list and transfer all odd
nodes to a separate linked list dedicated to odd nodes. Upon completion of the loop, the original list will contain
only the even nodes, and the odd node list will include all the odd nodes. To maintain the original ordering of all
nodes, we must insert all odd nodes at the end of the odd node list.
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-49 In a linked list with 𝑛 nodes, the time taken to insert an element after an element pointed by some
pointer is
(A) O(1) (B) O(𝑙𝑜𝑔𝑛) (C) O(𝑛) (D) O(𝑛𝑙𝑜𝑔𝑛)
Solution: A.
Problem-50 Find modular node: Given a singly linked list, write a function to find the last node from the
beginning whose position 𝑖 (counting from 1) satisfies 𝑖%𝑘 == 0, where 𝑛 is the number of elements in the list and 𝑘 is an integer constant. For
example, if 𝑛 = 19 and 𝑘 = 3 then we should return the 18𝑡ℎ node.
Solution: For this problem the value of 𝑛 is not known in advance.
struct ListNode *modularNodeFromBegin(struct ListNode *head, int k){
    struct ListNode *modularNode = NULL;
    int i = 1;    // 1-based position of the current node
    if(k <= 0)
        return NULL;
    for (; head != NULL; head = head->next){
        if(i%k == 0)
            modularNode = head;   // remember the latest position divisible by k
        i++;
    }
    return modularNode;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-51 Find modular node from the end: Given a singly linked list, write a function to find the first node
from the end whose 𝑛%𝑘 == 0, where 𝑛 is the number of elements in the list and 𝑘 is an integer constant. If 𝑛 =
19 and 𝑘 = 3 then we should return the 16𝑡ℎ node.
Solution: For this problem the value of 𝑛 is not known in advance. To solve this problem, we are going to combine
the solutions of finding the 𝑘 𝑡ℎ element from the end of the linked list and finding the modular node from the
beginning. Everytime, whenever 𝑐𝑢𝑟𝑟𝑒𝑛𝑡 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛 𝑚𝑜𝑑 𝑘 becomes zero, we reinitialize the modular node to point to
𝑘 𝑡ℎ node behind the current location.
struct ListNode *modularNodeFromEnd(struct ListNode *head, int k) {
    struct ListNode *modularNode = NULL, *trailing = head;
    int i;
    if (k <= 0)
        return NULL;
    // Move head k-1 nodes ahead so that trailing always stays k-1 nodes behind it
    for (i = 1; i < k && head != NULL; i++)
        head = head->next;
    // Whenever head stands at a position divisible by k, the trailing pointer
    // (k-1 nodes behind it) becomes the current candidate answer
    i = k;    // position of head, if the list is long enough
    while (head != NULL) {
        if (i % k == 0)
            modularNode = trailing;
        head = head->next;
        trailing = trailing->next;
        i++;
    }
    return modularNode;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-52 Find fractional node: Given a singly linked list, write a function to find the (𝑛/𝑘)𝑡ℎ element, where
𝑛 is the number of elements in the list.
Solution: For this problem the value of 𝑛 is not known in advance.
struct ListNode *fractionalNodes(struct ListNode *head, int k){
    struct ListNode *fractionalNode = NULL;
    int i = 0;
    if(k <= 0)
        return NULL;
    for (; head != NULL; head = head->next){
        if(i%k == 0){
            if(fractionalNode == NULL)
                fractionalNode = head;
            else
                fractionalNode = fractionalNode->next;
        }
        i++;
    }
    return fractionalNode;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-53 Find the √𝒏 𝑡ℎ node: Given a singly linked list, write a function to find the √𝑛 𝑡ℎ element, where 𝑛 is
the number of elements in the list. Assume the value of 𝑛 is not known in advance.
Solution: For this problem the value of 𝑛 is not known in advance. Hence, we would increase the second pointer
for every perfect square position.
struct ListNode *sqrtNode(struct ListNode *head){
    struct ListNode *sqrtN = NULL;
    int i = 1, j = 1;
    for (; head != NULL; head = head->next){
        if(i == j*j){    // advance sqrtN at every perfect square position
            if(sqrtN == NULL)
                sqrtN = head;
            else
                sqrtN = sqrtN->next;
            j++;
        }
        i++;
    }
    return sqrtN;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-54 Given two lists List1 = {𝐴1 , 𝐴2 , . . . , 𝐴𝑛 } and List2 = {𝐵1 , 𝐵2 , . . . , 𝐵𝑚 } with data (both lists) in
ascending order. Merge them into the third list in ascending order so that the merged list will be:
{𝐴1 , 𝐵1 , 𝐴2 , 𝐵2 ..... 𝐴𝑚 , 𝐵𝑚 , 𝐴𝑚+1 .... 𝐴𝑛 } if 𝑛 >= 𝑚
{𝐴1 , 𝐵1 , 𝐴2 , 𝐵2 ..... 𝐴𝑛 , 𝐵𝑛 , 𝐵𝑛+1 .... 𝐵𝑚 } if 𝑚 >= 𝑛
Solution: To merge the two lists alternately, we iterate over both lists at the same time, taking one node from
List1 and then one node from List2, appending them to the result in turn. When one list runs out of nodes, we
attach whatever remains of the other list at the end. We have to handle the cases when one or the other list is empty.
struct ListNode *alternateMerge(struct ListNode *List1, struct ListNode *List2){
    // temporary dummy node to simplify building the result list
    struct ListNode *newNode = (struct ListNode*) malloc(sizeof(struct ListNode));
    struct ListNode *temp;
    newNode->next = NULL;
    temp = newNode;
    while (List1 != NULL && List2 != NULL){
        temp->next = List1;    // take one node from List1
        temp = temp->next;
        List1 = List1->next;
        temp->next = List2;    // then one node from List2
        List2 = List2->next;
        temp = temp->next;
    }
    if (List1 != NULL)         // attach the remainder of the longer list
        temp->next = List1;
    else
        temp->next = List2;
    temp = newNode->next;
    free(newNode);
    return temp;
}
Time Complexity: The 𝑤ℎ𝑖𝑙𝑒 loop takes O(𝑚𝑖𝑛(𝑛, 𝑚)) time as it will run for 𝑚𝑖𝑛(𝑛, 𝑚) times. The other steps run in
O(1). Therefore the total time complexity is O(𝑚𝑖𝑛(𝑛, 𝑚)). Space Complexity: O(1).
Problem-55 Find median in an infinite series of integers.
Solution: Median is the middle number in a sorted list of numbers (if we have an odd number of elements). If we
have an even number of elements, the median is the average of two middle numbers in a sorted list of numbers.
We can solve this problem with linked lists (with both sorted and unsorted linked lists).
𝐹𝑖𝑟𝑠𝑡, let us try with an 𝑢𝑛𝑠𝑜𝑟𝑡𝑒𝑑 linked list. In an unsorted linked list, we can insert the element either at the
head or at the tail. The disadvantage with this approach is that finding the median takes O(𝑛), although the insertion
operation takes only O(1).
Now, let us try with a 𝑠𝑜𝑟𝑡𝑒𝑑 linked list. We can find the median in O(1) time if we keep track of the middle
elements. Insertion to a particular location is also O(1) in any linked list. But, finding the right location to insert
is not O(𝑙𝑜𝑔𝑛) as in a sorted array, it is instead O(𝑛) because we can’t perform binary search in a linked list even
if it is sorted. So, using a sorted linked list isn’t worth the effort as insertion is O(𝑛) and finding median is O(1),
the same as the sorted array. In the sorted array the insertion is linear due to shifting, but here it’s linear because
we can’t do a binary search in a linked list.
Note: For an efficient algorithm refer to the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 𝑎𝑛𝑑 𝐻𝑒𝑎𝑝𝑠 chapter.
Problem-56 Given a linked list, how do you modify it such that all the even numbers appear before all the odd
numbers in the modified linked list?
Solution:
struct ListNode *exchangeEvenOddList(struct ListNode *head){
    // initializing the odd and even list headers
    struct ListNode *oddList = NULL, *evenList = NULL;
    // creating tail variables for both the lists
    struct ListNode *oddListEnd = NULL, *evenListEnd = NULL;
    struct ListNode *itr = head;
    if( head == NULL )
        return NULL;
    while( itr != NULL ){
        if( itr->data % 2 == 0 ){
            if( evenList == NULL )
                evenList = evenListEnd = itr;   // first even node
            else{
                // inserting the node at the end of the even list
                evenListEnd->next = itr;
                evenListEnd = itr;
            }
        }
        else{
            if( oddList == NULL )
                oddList = oddListEnd = itr;     // first odd node
            else{
                // inserting the node at the end of the odd list
                oddListEnd->next = itr;
                oddListEnd = itr;
            }
        }
        itr = itr->next;
    }
    if( oddListEnd != NULL )
        oddListEnd->next = NULL;    // terminate the odd list
    if( evenList == NULL )          // no even nodes: the odd list is the whole answer
        return oddList;
    evenListEnd->next = oddList;    // append the odd list after the even list
    return evenList;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-57 Given two linked lists, each list node with one integer digit, add these two linked lists. The result
should be stored in the third linked list. Also note that the head node contains the most significant digit of the
number.
Solution: Since integer addition starts from the least significant digit, we first need to visit the last nodes of
both lists and add them up, create a new node to store the result, take care of the carry if any, and link the
resulting node to the node that holds the sum of the second least significant digits, and continue.
First of all, we need to take into account the difference in the number of digits in the two numbers. So before
starting recursion, we need to do some calculation and move the longer list pointer to the appropriate place so
that we need the last node of both lists at the same time. The other thing we need to take care of is 𝑐𝑎𝑟𝑟𝑦. If two
digits add up to more than 10, we need to forward the 𝑐𝑎𝑟𝑟𝑦 to the next node and add it. If the most significant
digit addition results in a 𝑐𝑎𝑟𝑟𝑦, we need to create an extra node to store the 𝑐𝑎𝑟𝑟𝑦.
The function below is actually a wrapper function which does all the housekeeping like calculating lengths of lists,
calling recursive implementation, creating an extra node for the 𝑐𝑎𝑟𝑟𝑦 in the most significant digit, and adding
any remaining nodes left in the longer list.
void addListNumbersWrapper(struct ListNode *list1, struct ListNode *list2, int *carry, struct ListNode **result){
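A minimal sketch of the recursive core that such a wrapper could call, assuming it has already aligned both lists to equal length and initialized *carry to 0 (addListNumbers and its exact signature are illustrative assumptions):
// Recursively descend to the least significant digits, then add on the way back up,
// prepending one result node per digit so the most significant digit ends up first.
void addListNumbers(struct ListNode *list1, struct ListNode *list2,
                    int *carry, struct ListNode **result){
    int sum;
    struct ListNode *node;
    if (list1 == NULL)    // both lists are assumed to have equal length
        return;
    addListNumbers(list1->next, list2->next, carry, result);
    sum = list1->data + list2->data + *carry;
    *carry = sum / 10;    // carry forwarded to the more significant digits
    node = (struct ListNode *)malloc(sizeof(struct ListNode));
    node->data = sum % 10;
    node->next = *result; // prepend the digit node to the result built so far
    *result = node;
}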
Solution: Simple Insertion sort is easily adaptable to singly linked lists. To insert an element, the linked list is
traversed until the proper position is found, or until the end of the list is reached. It is inserted into the list by
merely adjusting the pointers without shifting any elements, unlike in the array. This reduces the time required
for insertion but not the time required for searching for the proper position.
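A minimal sketch of insertion sort on a singly linked list (insertionSortList is an illustrative name):
struct ListNode *insertionSortList(struct ListNode *head){
    struct ListNode *sorted = NULL;    // result list, kept sorted at all times
    while (head != NULL) {
        struct ListNode *node = head;
        head = head->next;             // detach the front node from the input
        if (sorted == NULL || node->data < sorted->data) {
            node->next = sorted;       // new smallest element: insert at the beginning
            sorted = node;
        } else {
            struct ListNode *cur = sorted;   // walk to the node before the insertion point
            while (cur->next != NULL && cur->next->data <= node->data)
                cur = cur->next;
            node->next = cur->next;    // splice in by adjusting pointers only
            cur->next = node;
        }
    }
    return sorted;
}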
Problem-59 Given a list, List1 = {𝐴1 , 𝐴2 , . . . 𝐴𝑛−1 , 𝐴𝑛 } with data, reorder it to {𝐴1 , 𝐴𝑛 , 𝐴2 , 𝐴𝑛−1 ..... } without using
any extra space.
Solution: To reorder a list in the desired way without using any extra space, follow the steps below (a code sketch follows the list):
• Find the middle point of the input list using the slow and fast pointer technique. If the length of the list
is even, the middle point will be the second of the two middle nodes.
• Reverse the second half of the list starting from the node after the middle point.
• Merge the first half of the list with the reversed second half of the list by alternating the nodes from each
half.
• Set the next pointer of the last node in the merged list to NULL to terminate the list.
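A minimal sketch of these steps (reorderList is an illustrative name):
void reorderList(struct ListNode *head){
    struct ListNode *slow, *fast, *second, *prev, *first;
    if (head == NULL || head->next == NULL)
        return;
    // 1) Find the middle; slow stops at the last node of the first half
    slow = head; fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;
    slow->next = NULL;    // terminate the first half
    // 2) Reverse the second half
    prev = NULL;
    while (second) {
        struct ListNode *next = second->next;
        second->next = prev;
        prev = second;
        second = next;
    }
    // 3) Merge the two halves by alternating nodes
    first = head;
    second = prev;
    while (second) {
        struct ListNode *n1 = first->next, *n2 = second->next;
        first->next = second;
        second->next = n1;    // NULL at the very end terminates the list
        first = n1;
        second = n2;
    }
}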
The time complexity of the function is O(n), where n is the length of the input linked list. This is because we
traverse each node in the list exactly once and perform constant time operations on each node. The space
complexity is O(1), because we are only using a constant amount of extra space for the temporary pointers.
Problem-60 Given two sorted linked lists, given an algorithm to return the common elements of them.
Solution: The easiest way to build up a list with common elements is by adding nodes at the beginning. The
disadvantage is that the elements will appear in the list in the reverse order that they are added.
// Solution to give common elements in the reverse
struct ListNode * intersection1(struct ListNode *list1, struct ListNode *list2) {
struct ListNode *head = NULL;
while (list1 != NULL && list2 != NULL) {
if (list1->data == list2->data) {
push(&head, list1->data); // Copy common element.
list1 = list1->next;
list2 = list2->next;
} else if (list1->data > list2->data) {
list2 = list2->next;
} else { // list1->data < list2->data
list1 = list1->next;
}
}
return head;
}
Time complexity O(𝑚 + 𝑛), where 𝑚 is the length of list1 and 𝑛 is the length of list2. Space Complexity: O(1).
What if we want to get the common elements in the increasing order (or in the same order of their appearance)?
i.e., What about adding nodes at the "tail end" of the list? Adding a node at the tail of a list most often involves
locating the last node in the list, and then changing its .next field from NULL to point to the new node, such as
the tail variable in the following example of adding a "3" node to the end of the list {1, 2}...
This is just a special case of the general rule: to insert or delete a node inside a list, you need a pointer to the node
just before that position, so you can change its .next field. Many list problems include the sub-problem of
advancing a pointer to the node before the point of insertion or deletion. The one exception is if the operation falls
on the first node in the list — in that case the head pointer itself must be changed.
Consider the problem of building up the list {1, 2, 3, 4, 5} by appending the nodes to the tail end. The difficulty is
that the very first node must be added at the head pointer, but all the other nodes are inserted after the last node
using a tail pointer. The simplest way to deal with both cases is to just have two separate cases in the code. Special
case code first adds the head node {1}. Then there is a separate loop that uses a tail pointer to add all the other
nodes. The tail pointer is kept pointing at the last node, and each new node is added at tail->next. The only
"problem" with this solution is that writing separate special case code for the first node is a little unsatisfying.
Nonetheless, this approach is a solid one for production code — it is simple and runs fast.
struct ListNode * intersection(struct ListNode *list1, struct ListNode *list2) {
struct ListNode *head = NULL;
struct ListNode* tail;
while (list1 != NULL && list2 != NULL) {
if (list1->data == list2->data) {
if (head == NULL) {
push(&head, list1->data);
tail = head;
}
else{
push(&tail->next, list1->data);
tail = tail->next;
}
list1 = list1->next;
list2 = list2->next;
} else if (list1->data > list2->data) {
list2 = list2->next;
} else { // list1->data < list2->data
list1 = list1->next;
}
}
return head;
}
There is a slightly unusual technique that can be used to shorten the code: The strategy here uses a temporary
dummy node as the start of the result list. The trick is that with the dummy, every node appears to be added after
the next field of some other node. That way the code for the first node is the same as for the other nodes. The tail
pointer plays the same role as in the previous example. The difference is that now it also handles the first node as
well.
The pointer tail always points to the last node in the result list, so appending new nodes is easy. The dummy node
gives tail something to point to initially when the result list is empty. This dummy node is efficient, since it is only
temporary, and it is allocated on the stack. The loop proceeds, advancing through 'list1' and 'list2' and appending
a node for each common element at tail. When we are done, the result is in dummy.next.
// Solution uses the temporary dummy node to build up the result list
struct ListNode * intersection(struct ListNode *list1, struct ListNode *list2) {
struct ListNode dummy;
struct ListNode *tail = &dummy;
dummy.next = NULL;
while (list1 != NULL && list2 != NULL) {
if (list1->data == list2->data) {
push((&tail->next), list1->data); // Copy common element.
list1 = list1->next;
list2 = list2->next;
tail = tail->next;
} else if (list1->data > list2->data) {
list2 = list2->next;
} else { // list1->data < list2->data
list1 = list1->next;
}
}
return dummy.next;
}
Problem-61 Sort the linked list elements in O(𝑛), where 𝑛 is the number of elements in the linked list.
Solution: Refer 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 chapter.
Problem-62 Partition list: Given a linked list and a value X, partition it such that all nodes less than X come
before nodes greater than or equal to X. Notice that, you should preserve the original relative order of the nodes
in each of the two partitions.
For example, consider the list Head → 2 → 5 → 4 → 3 → 6 → 3. With X = 4, the partition should return the list
Head → 2 → 3 → 3 → 5 → 4 → 6.
Solution: The problem wants us to rearrange the linked list elements, such that the elements lesser than value
X, come before the elements greater or equal to X. This essentially means in this rearranged list, there would be
a point in the linked list before which all the elements would be smaller than X and after which all the elements
would be greater or equal to X. Let's call this point the 𝑝𝑖𝑣𝑜𝑡. Careful observation tells us that if we break the
rearranged list at the 𝑝𝑖𝑣𝑜𝑡, we will get two smaller linked lists, one with lesser elements and the other with
elements greater or equal to X. In the solution, our main aim is to create these two linked lists and join them.
We can take two pointers lesser and greater to keep track of the two linked lists as described above. These two
pointers could be used to create two separate lists and then these lists could be combined to form the desired
rearranged list. We will traverse the original linked list as usual, and depending upon a node's value, we will
append it into one of the partitions.
Algorithm:
1. Initialize two pointers lesser and greater with NULL.
2. Iterate the original linked list, using the head pointer. If the node's value pointed by head is lesser than
X, the node should be part of the lesser list. So, we move it to lesser list. Else, the node should be part of
greater list. So, we move it to greater list.
3. Once we are done with all the nodes in the original linked list, we would have two list lesser and greater.
The original list nodes are either part of lesser list or greater list, depending on its value.
4. Now, these two lists lesser and greater can be combined to form the reformed list.
Since we traverse the original linked list from left to right, at no point would the order of nodes change relatively
in the two lists. Another important thing to note here is that we show the original linked list intact in the above
diagrams. However, in the implementation, we remove the nodes from the original linked list and attach them in
the lesser or greater list. We don't utilize any additional space. We simply move the nodes from the original list
around.
struct ListNode {
int data;
ListNode * next;
ListNode( int d ) : data{ d }, next{ NULL } { }
};
/* Start with a new list. Elements bigger than the pivot element are put at the greater list
and elements smaller are put at the lesser list*/
ListNode * partition( ListNode * head, int X ) {
    ListNode * lesser = NULL;        /* tail of the lesser list */
    ListNode * lesserHead = NULL;    /* the initial node of list lesser */
    ListNode * greater = NULL;       /* tail of the greater list */
    ListNode * greaterHead = NULL;   /* the initial node of list greater */
    while( head != NULL ) {
        ListNode * nextNode = head->next;
        if ( head->data < X ) {
            if (lesser == NULL)
                lesserHead = lesser = head;     /* first node of the lesser list */
            else {
                lesser->next = head;            /* append head to the lesser list */
                lesser = head;
            }
        } else {
            if (greater == NULL)
                greaterHead = greater = head;   /* first node of the greater list */
            else {
                greater->next = head;           /* append head to the greater list */
                greater = head;
            }
        }
        head = nextNode;
    }
    if (greater != NULL)
        greater->next = NULL;        /* terminate the combined list */
    if (lesser == NULL)              /* no node smaller than X */
        return greaterHead;
    lesser->next = greaterHead;      /* connect the lesser list to the greater list */
    return lesserHead;
}
Time complexity: O(𝑛), where 𝑛 is the number of nodes in the original linked list and we iterate the original list.
Space complexity: O(1), we have not utilized any extra space, the point to note is that we are reforming the original
list, by moving the original nodes, we have not used any extra space as such.
Problem-63 Give an algorithm to delete duplicate nodes from the sorted linked list. Ideally, the list should
only be traversed once.
Solution: Since the list is sorted, to remove duplicates from a sorted linked list, we can simply traverse the list
and keep track of the last unique node we encountered. When we encounter a new node, we check if it is equal to
the last unique node. If it is, we skip it and continue to the next node. If it is not, we set the last unique node to
the current node and continue.
struct ListNode {
int data;
struct ListNode *next;
};
struct ListNode* deleteDuplicates(struct ListNode* head) {
if (head == NULL) {
return NULL;
}
struct ListNode *last_unique = head;
struct ListNode *node = head->next;
while (node != NULL) {
if (node->data == last_unique->data) {
last_unique->next = node->next;
} else {
last_unique = node;
}
node = node->next;
}
return head;
}
The time complexity of the algorithm to remove duplicates from a sorted linked list is O(n), where n is the number
of nodes in the list. This is because we need to visit each node in the list exactly once in the worst case.
The space complexity of the algorithm is O(1), because we are modifying the existing linked list in place and not
using any additional data structures. We only need to store a constant number of pointers to keep track of the
last unique node and the current node. Therefore, the space used by the algorithm is independent of the size of
the input list.
Problem-64 Give an algorithm that takes a list and divides up its nodes to make two smaller lists. The sublists
should be made from alternating elements in the original list. So if the original list is {1, 2, 3, 4, 5}, then one
sublist should be {1, 3, 5} and the other should be {2, 4}.
Solution: The simplest approach iterates over the given linked list, pulling nodes off the list and alternately putting
them on lists 'a' and 'b'. The only strange part is that the nodes will end up in the reverse of the order in which they
occurred in the source list.
void alternatingSplit(struct ListNode* head, struct ListNode** head1, struct ListNode** head2) {
struct ListNode* a = NULL; // Split the nodes to these 'a' and 'b' lists
struct ListNode* b = NULL;
struct ListNode* current = head;
while (current != NULL) {
// Move a ListNode to 'a'
struct ListNode* newNode = current; // the front current node
current = newNode->next; // Advance the current pointer
newNode->next = a; // Link the node with the head of list a
a = newNode;
// Move a ListNode to 'b'
if (current != NULL) {
struct ListNode* newNode = current; // the front source node
current = newNode->next; // Advance the source pointer
newNode->next = b; // Link the node with the head of list b
b = newNode;
}
}
*head1 = a;
*head2 = b;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-65 We are given head, the head node of a linked list containing unique integer values. We are also
given the list S, a subset of the values in the linked list. Return the number of connected components in S,
where two values are connected if they appear consecutively in the linked list.
Solution: Scanning through the list, if node.data is in S and node.next.data isn't (including if node.next is null),
then this must be the end of a connected component. For example, if the list is 0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7,
and S = [0, 2, 3, 5, 7], then when scanning through the list, we fulfill the above condition at 0, 3, 5, 7, for a total
answer of 4.
class Solution {
public:
int numComponents(ListNode* head, vector<int>& S) {
// set of all the subset elements which are considered unvisited initially
unordered_set<int> unvisited(begin(S), end(S));
int connComponents = 0;
ListNode* iter = head;
// As we traverse through the list, we find all the nodes connected in S
// from the current node and remove them from the unvisited set
while (iter) {
// check if current node is in S or not
if (unvisited.find(iter->data) != unvisited.end()) {
++connComponents;
// now find all the nodes connected via current node
// and remove them from unvisited set
while (iter && unvisited.find(iter->data) != unvisited.end()) {
// remove the current from list of unvisited
unvisited.erase(iter->data);
iter = iter->next;
}
}
if (iter)
iter = iter->next;
}
return connComponents;
}
};
Problem-66 Given a pointer to head of a linked list, give an algorithm to repeatedly delete consecutive
sequences of nodes that sum to 0 until there are no such sequences. Input: 1->2->-3->3->1 Output: 3->1
Solution: The basic idea of this problem is to record an accumulated sum from head to current position. If we
define sum(i) = summation from index 0 to index i and we find sum(i) == sum(j) when i != j, then we can imply
that the sum within index (i, j] is 0. So, we can use a map to record the first time this accumulated sum appears,
and if we find the same accumulated sum, we discard all nodes between these two nodes. Although we may modify
some discarded nodes' next pointer value, it has no impact on our answer.
class RemoveZeroSumSublists {
public:
// Deletes a linked list
void deleteList(ListNode* head) {
    ListNode *nextptr = nullptr;
    while(head) {
        nextptr = head->next;
        delete head;
        head = nextptr;
    }
}
ListNode* removeZeroSumSublists(ListNode* head) {
    // (sum -> node): the latest node at which each cumulative sum occurs
    unordered_map<long long, ListNode*> sum_pos;
    ListNode *after_start = nullptr, *after_end = nullptr;
long long sum = 0;
// dummy head for easier head ops
ListNode *dummy = new ListNode(0);
dummy->next = head;
// initial sum 0 for position before head
sum_pos[0] = dummy;
// Store the cumulative sum till each node, for the same cumulative sum we store the latest position
ListNode *curr = head;
while(curr) {
sum += curr->data;
sum_pos[sum] = curr;
curr = curr->next;
}
// We compute the cumulative sum again, this time for each sum we check that farthest
// position where it is found again and delete the list till that position
sum = 0;
curr = dummy;
while(curr) {
sum += curr->data;
// check the position this sum is last seen
ListNode *last_seen_pos = sum_pos[sum];
// delete the sublist to avoid memory leaks
after_start = curr->next;
after_end = last_seen_pos->next;
            // when it is not the same node and there is something to delete
if(last_seen_pos != curr) {
last_seen_pos->next = nullptr;
deleteList(after_start);
}
curr->next = after_end;
curr = curr->next;
}
head = dummy->next;
delete dummy;
return head;
}
};
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-67 How do you determine whether a singly linked list is a palindrome?
Solution: To determine whether a singly linked list is a palindrome or not, we can follow these steps:
• Traverse the linked list and push each node's value into a stack.
• Traverse the linked list again and compare each node's value with the value at the top of the stack. If they
match, pop the value from the stack and continue. If they don't match, return false.
• If we have traversed the entire linked list and all values have matched, return true.
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for stack.
For code, refer 𝑆𝑡𝑎𝑐𝑘𝑠 chapter.
Problem-68 Can we solve the above problem without using stacks?
Solution: As an alternative, we can use the following approach:
• Traverse the list and create a new list in reverse order.
• Traverse the original list and the new list in parallel, comparing each node to see if they are equal.
• If all nodes are equal, the list is a palindrome; otherwise, it is not.
struct ListNode {
int val;
struct ListNode *next;
};
bool isPalindrome(struct ListNode* head) {
// Step 1: Traverse the list and create a new list in reverse order.
struct ListNode *reversed_list = NULL;
struct ListNode *current_node = head;
while (current_node != NULL) {
struct ListNode *new_node = (struct ListNode*) malloc(sizeof(struct ListNode));
new_node->val = current_node->val;
new_node->next = reversed_list;
reversed_list = new_node;
current_node = current_node->next;
}
// Step 2: Traverse the original list and the new list in parallel, comparing each node to see if they are equal.
current_node = head;
while (current_node != NULL) {
if (current_node->val != reversed_list->val) {
return false;
}
current_node = current_node->next;
reversed_list = reversed_list->next;
}
// Step 3: If all nodes are equal, the list is a palindrome; otherwise, it is not.
return true;
}
The time complexity of the above code is O(n), where n is the number of nodes in the linked list. This is because
we traverse the linked list twice - once to reverse it and once to compare it with the original list. Both traversals
take O(n) time.
The space complexity of the above code is also O(n), because we create a new linked list to store the reversed
nodes of the original linked list. The space required for the reversed linked list is the same as the space required
for the original linked list, which is O(n).
Problem-69 Can we solve the above problem without extra space?
Solution: To improve the algorithm for determining whether a linked list is a palindrome, consider using the
following alternative approach:
• Find the middle of the linked list using the slow and fast pointer technique.
• Reverse the second half of the linked list.
• Compare the first half of the linked list with the reversed second half.
• If the two halves are equal, then the linked list is a palindrome. Otherwise, it is not.
struct ListNode {
int val;
struct ListNode *next;
};
bool isPalindrome(struct ListNode* head) {
// Find the middle of the linked list
struct ListNode *slow = head, *fast = head;
while (fast && fast->next) {
slow = slow->next;
fast = fast->next->next;
}
// Reverse the second half of the linked list
struct ListNode *prev = NULL, *curr = slow;
while (curr) {
struct ListNode *next = curr->next;
curr->next = prev;
prev = curr;
curr = next;
}
// Traverse both halves and compare the nodes' values
struct ListNode *first_half = head, *second_half = prev;
while (second_half) {
if (first_half->val != second_half->val) {
return false;
}
first_half = first_half->next;
second_half = second_half->next;
}
return true;
}
This solution has a time complexity of O(n) and a space complexity of O(1), since we're only using constant space
for the slow and fast pointers, and the reversing of the second half of the linked list is done in place.
Problem-70 Add Two Numbers: We are given two non-empty linked lists representing two non-negative
integers. The most significant digit comes first and each of their nodes contains a single digit. Add the two
numbers and return the sum as a linked list.
Solution: To solve this problem, we can traverse both linked lists simultaneously and add the corresponding
digits, starting from the least significant digit. We can keep track of a carry variable that will be added to the sum
of the next pair of digits. We will create a new linked list to store the sum and keep adding new nodes to the front
of the list as we compute the sum. At the end, if there is a carry, we add it to the front of the list as well. The steps
are:
1. Initialize a carry variable to 0.
2. Reverse both linked lists.
3. Traverse the linked lists simultaneously until both are empty:
a. If one linked list is longer than the other, treat the missing nodes as having a value of 0.
b. Add the values of the current nodes, along with the carry, and update the carry if necessary.
c. Create a new node for the sum, with a value equal to the sum mod 10.
d. Update the carry to be the sum divided by 10.
e. Add the new node to the front of the result linked list.
f. Move to the next nodes in both linked lists.
4. If there is a remaining carry, create a new node for it and add it to the front of the result linked list.
5. Reverse the result linked list.
6. Return the head of the result linked list.
struct ListNode* addTwoNumbers(struct ListNode* l1, struct ListNode* l2) {
// Reverse the input linked lists
struct ListNode *prev1 = NULL, *curr1 = l1;
while (curr1 != NULL) {
struct ListNode *next = curr1->next;
curr1->next = prev1;
prev1 = curr1;
curr1 = next;
}
l1 = prev1;
struct ListNode *prev2 = NULL, *curr2 = l2;
while (curr2 != NULL) {
struct ListNode *next = curr2->next;
curr2->next = prev2;
prev2 = curr2;
curr2 = next;
}
l2 = prev2;
// Add the two linked lists
int carry = 0;
struct ListNode *head = NULL;
while (l1 != NULL || l2 != NULL || carry > 0) {
int val1 = (l1 != NULL) ? l1->val : 0;
int val2 = (l2 != NULL) ? l2->val : 0;
int sum = val1 + val2 + carry;
carry = sum / 10;
sum = sum % 10;
struct ListNode *newNode = (struct ListNode*) malloc(sizeof(struct ListNode));
newNode->val = sum;
newNode->next = head;
head = newNode;
if (l1 != NULL) {
l1 = l1->next;
}
if (l2 != NULL) {
l2 = l2->next;
}
}
// Reverse the result linked list
struct ListNode *prev = NULL, *curr = head;
while (curr != NULL) {
struct ListNode *next = curr->next;
curr->next = prev;
prev = curr;
curr = next;
}
return prev;
}
The time complexity of this algorithm is O(max(m, n)), where m and n are the lengths of the two input linked lists,
since we need to traverse both lists once. The space complexity is also O(max(m, n)), since we are creating a new
linked list to store the result.
Problem-71 Plus One Linked List: Given a non-negative integer represented as a linked list of digits, plus
one to the integer.
Solution: To solve this problem, we can use a modified version of the typical "adding two numbers" linked list
algorithm. Instead of adding two numbers, we just need to add 1 to the given linked list. We can follow the following
algorithm:
1. Traverse the linked list and find the last node that is not 9.
2. If all nodes are 9, create a new node with a value of 1 and add it to the front of the list.
3. Otherwise, add 1 to the value of the last non-9 node.
4. Set all subsequent nodes to 0.
5. Return the head of the updated linked list.
struct ListNode* plusOneLinkedList(struct ListNode* head) {
// Step 1: Traverse the linked list and find the last non-9 node
struct ListNode* curr = head;
struct ListNode* lastNonNineNode = NULL;
while (curr != NULL) {
if (curr->val != 9) {
lastNonNineNode = curr;
}
curr = curr->next;
}
// Step 2: If all nodes are 9, create a new node with value 1 and add it to the front of the list
if (lastNonNineNode == NULL) {
struct ListNode* newHead = malloc(sizeof(struct ListNode));
newHead->val = 1;
newHead->next = head;
curr = head;
while (curr != NULL) {
curr->val = 0;
curr = curr->next;
}
return newHead;
}
// Step 3: Add 1 to the value of the last non-9 node
lastNonNineNode->val += 1;
// Step 4: Set all subsequent nodes to 0
curr = lastNonNineNode->next;
while (curr != NULL) {
curr->val = 0;
curr = curr->next;
}
// Step 5: Return the head of the updated linked list
return head;
}
This algorithm has a time complexity of O(n), where n is the length of the linked list, and a space complexity of
O(1), as we are only using constant space to store pointers to nodes in the linked list.
Problem-72 Delete N Nodes After M Nodes of a Linked List: We are given the head of a linked list and two
integers m and n. Traverse the linked list and remove some nodes in the following way:
• Start with the head as the current node.
• Keep the first m nodes starting with the current node.
• Remove the next n nodes.
• Keep repeating steps 2 and 3 until you reach the end of the list.
• Return the head of the modified list after removing the mentioned nodes.
Solution: To solve this problem, we can simply traverse the linked list and keep track of two pointers - one pointing
to the current node and the other pointing to the previous node. We can use these pointers to skip over the nodes
that need to be removed. Here is the step-by-step algorithm:
1. Initialize curr to point to the head of the list and prev to NULL.
2. Traverse the list using a loop that runs until the end of the list is reached.
3. Inside the loop, skip over the first m nodes by moving both the prev and curr pointers forward m times;
prev trails at the last kept node.
4. Skip over the next n nodes by moving the curr pointer n times.
5. If the curr pointer becomes NULL after skipping, set the next pointer of the node pointed to by prev to
NULL; we have reached the end of the list.
6. Otherwise, set the next pointer of the node pointed to by prev to the node pointed to by curr, and set
prev to the node pointed to by curr.
7. Repeat steps 3-6 until the end of the list is reached.
struct ListNode {
int val;
struct ListNode* next;
};
struct ListNode* deleteNodes(struct ListNode* head, int m, int n) {
struct ListNode* curr = head;
struct ListNode* prev = NULL;
while (curr != NULL) {
for (int i = 0; i < m && curr != NULL; i++) {
prev = curr;
curr = curr->next;
}
for (int i = 0; i < n && curr != NULL; i++) {
curr = curr->next;
}
if (curr == NULL) {
prev->next = NULL;
break;
}
prev->next = curr;
prev = curr;
}
return head;
}
The time complexity of this algorithm is O(n), where n is the number of nodes in the list, because we traverse the
list once and perform constant-time operations on each node. The space complexity of the algorithm is O(1),
because we use a constant amount of extra space to store the curr and prev pointers.
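A hypothetical usage sketch for deleteNodes (the buildList helper and the sample values below are illustrative, not from the text):
#include <stdio.h>
#include <stdlib.h>
// assumes the ListNode definition and deleteNodes from above
struct ListNode* buildList(int values[], int n) {
struct ListNode *head = NULL, *tail = NULL;
for (int i = 0; i < n; i++) {
struct ListNode *node = malloc(sizeof(struct ListNode));
node->val = values[i];
node->next = NULL;
if (tail) tail->next = node; else head = node;
tail = node;
}
return head;
}
int main(void) {
int values[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13};
struct ListNode *head = buildList(values, 13);
head = deleteNodes(head, 2, 3); // keep 2 nodes, remove 3, repeatedly
for (struct ListNode *p = head; p; p = p->next)
printf("%d ", p->val); // prints: 1 2 6 7 11 12
printf("\n");
return 0;
}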
Chapter
Stacks 4
4.1 What is a Stack Data Structure?
A stack is a simple data structure used for storing data (similar to Linked Lists). In a stack, the order in which the
data arrives is important. A stack data structure is an abstract data type that represents a collection of elements
with two main operations: push and pop. It is a Last-In-First-Out (LIFO) data structure, meaning that the last
element added to the stack will be the first one to be removed.
A stack can be thought of as a physical stack of plates, where you can add a new plate on top of the stack (push
operation) and remove the topmost plate from the stack (pop operation). You can only access the top element of
the stack, and you cannot access or modify the elements below it until the top element is removed.
Definition: A 𝑠𝑡𝑎𝑐𝑘 is an ordered list in which insertion and deletion are done at one end, called 𝑡𝑜𝑝. The last
element inserted is the first one to be deleted. Hence, it is called the Last in First out (LIFO) or First in Last out
(FILO) list.
Special names are given to the two changes that can be made to a stack. When an element is inserted in a stack,
the concept is called 𝑝𝑢𝑠ℎ, and when an element is removed from the stack, the concept is called 𝑝𝑜𝑝. Trying to
pop out an empty stack is called 𝑢𝑛𝑑𝑒𝑟𝑓𝑙𝑜𝑤 and trying to push an element in a full stack is called 𝑜𝑣𝑒𝑟𝑓𝑙𝑜𝑤.
Generally, we treat them as exceptions. As an example, consider the snapshots of the stack.
[Figure: three snapshots of a stack holding A, B, C (top = C). Push D places D on top (top = D); Pop D removes it, making C the top again.]
Exceptions
Attempting the execution of an operation may sometimes cause an error condition, called an exception. Exceptions
are said to be “thrown” by an operation that cannot be executed. In the Stack ADT, operations pop and top cannot
be performed if the stack is empty. Attempting the execution of pop (top) on an empty stack throws an exception.
Trying to push an element in a full stack throws an exception.
4.4 Applications
Following are some of the applications in which stacks play an important role.
Direct applications
• Balancing of symbols
• Infix-to-postfix conversion
• Evaluation of postfix expression
• Implementing function calls (including recursion)
• Finding of spans (finding spans in stock markets, refer to 𝑃𝑟𝑜𝑏𝑙𝑒𝑚𝑠 section)
• Page-visited history in a Web browser [Back Buttons]
• Undo sequence in a text editor
• Matching Tags in HTML and XML
Indirect applications
• Auxiliary data structure for other algorithms (Example: Tree traversal algorithms)
• Component of other data structures (Example: Simulating queues, refer 𝑄𝑢𝑒𝑢𝑒𝑠 chapter)
4.5 Implementation
There are many ways of implementing stack ADT; below are the commonly used methods.
• Simple array based implementation
• Dynamic array based implementation
• Linked lists implementation
[Figure: array S holding the stack elements, with top indexing the most recently inserted element.]
The array storing the stack elements may become full. A push operation will then throw a 𝑓𝑢𝑙𝑙 𝑠𝑡𝑎𝑐𝑘 𝑒𝑥𝑐𝑒𝑝𝑡𝑖𝑜𝑛.
Similarly, if we try deleting an element from an empty stack it will throw 𝑠𝑡𝑎𝑐𝑘 𝑒𝑚𝑝𝑡𝑦 𝑒𝑥𝑐𝑒𝑝𝑡𝑖𝑜𝑛.
#include<stdio.h>
#include <stdlib.h>
#include <limits.h>
struct Stack {
int top;
int capacity;
int *array;
};
struct Stack *createStack(int capacity) {
struct Stack *S = malloc(sizeof(struct Stack));
if(!S)
return NULL;
S->capacity = capacity;
S->top = -1;
S->array= malloc(S->capacity * sizeof(int));
if(!S->array)
return NULL;
return S;
}
int isEmpty(struct Stack *S) {
return (S->top == -1); // if the condition is true then 1 is returned else 0 is returned
}
int size(struct Stack *S) {
return (S->top + 1);
}
int isFull(struct Stack *S){
//if the condition is true then 1 is returned else 0 is returned
return (S->top == S->capacity - 1);
}
void push(struct Stack *S, int data){
/* S->top == capacity -1 indicates that the stack is full*/
if(isFull(S))
printf( "Stack Overflow\n");
else /*Increasing the ‘top’ by 1 and storing the value at ‘top’ position*/
S->array[++S->top] = data;
}
int pop(struct Stack *S){
/* S->top == - 1 indicates empty stack*/
if(isEmpty(S)){
printf("Stack is Empty\n");
return INT_MIN;
}
else /* Removing element from ‘top’ of the array and reducing ‘top’ by 1*/
return (S->array[S->top--]);
}
int peek(struct Stack *S){
/* S->top == - 1 indicates empty stack*/
if(isEmpty(S)){
printf("Stack is Empty");
return INT_MIN;
}
else
return (S->array[S->top]);
}
void deleteStack(struct Stack *S){
if(S) {
if(S->array)
free(S->array);
free(S);
}
}
int main(){
int i = 0, capacity = 15;
// create a stack of capacity 15
struct Stack *stk = createStack(capacity);
for(i = 0; i <= capacity; i++){
push(stk, i);
}
printf("Top element is %d\n", peek(stk));
printf("Stack size is %d\n", size(stk));
for (i = 0; i <= capacity; i++){
printf("Popped element is %d\n", pop(stk));
}
if (isEmpty(stk))
printf("Stack is empty");
else
printf("Stack is not empty");
deleteStack(stk);
return 0;
}
Limitations
The maximum size of the stack must first be defined, and it cannot be changed. Trying to push a new element
into a full stack causes an implementation-specific exception.
Dynamic Array Implementation (Repeated Doubling)
A better alternative is to start with a small array and double its capacity whenever it becomes full. The total cost T(n) of n push operations is dominated by the copying done at each doubling:
T(n) = 1 + 2 + 4 + 8 + … + n/4 + n/2 + n
     = n + n/2 + n/4 + … + 4 + 2 + 1
     = n(1 + 1/2 + 1/4 + … + 4/n + 2/n + 1/n)
     ≤ 2n = O(n)
T(n) is O(n) and the amortized time of a push operation is O(1).
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
struct Stack {
int top;
int capacity;
int *array;
};
struct Stack *createStack(int capacity) {
struct Stack *S = malloc(sizeof(struct Stack));
if(!S)
return NULL;
S->capacity = capacity;
S->top = -1;
S->array= malloc(S->capacity * sizeof(int));
if(!S->array)
return NULL;
return S;
}
int isEmpty(struct Stack *S) {
return (S->top == -1); // if the condition is true then 1 is returned else 0 is returned
}
int size(struct Stack *S) {
return (S->top + 1);
}
int isFull(struct Stack *S){
//if the condition is true then 1 is returned else 0 is returned
return (S->top == S->capacity - 1);
}
void doubleStack(struct Stack *S){
S->capacity *= 2;
S->array = realloc(S->array, S->capacity * sizeof(int));
}
void push(struct Stack *S, int data){
if(isFull(S))
doubleStack(S);
S->array[++S->top] = data;
}
int pop(struct Stack *S){
/* S->top == - 1 indicates empty stack*/
if(isEmpty(S)){
printf("Stack is Empty\n");
return INT_MIN;
}
else /* Removing element from ‘top’ of the array and reducing ‘top’ by 1*/
return (S->array[S->top--]);
}
int peek(struct Stack *S){
/* S->top == - 1 indicates empty stack*/
if(isEmpty(S)){
printf("Stack is Empty");
return INT_MIN;
}
else
return (S->array[S->top]);
}
void deleteStack(struct Stack *S){
if(S) {
if(S->array)
free(S->array);
free(S);
}
}
int main(){
int i = 0, capacity = 5;
// create a stack of capacity 5
struct Stack *stk = createStack(capacity);
for(i = 0; i <= 2 * capacity; i++){
push(stk, i);
}
printf("Top element is %d\n", peek(stk));
printf("Stack size is %d\n", size(stk));
for (i = 0; i <= capacity; i++){
printf("Popped element is %d\n", pop(stk));
}
if (isEmpty(stk))
printf("Stack is empty");
else
printf("Stack is not empty");
deleteStack(stk);
return 0;
}
Performance
Let 𝑛 be the number of elements in the stack. The complexities for operations with this representation can be
given as:
Space complexity (for 𝑛 push operations) O(𝑛)
Time complexity of createStack() O(1)
Time complexity of push() O(1) (Average)
Time complexity of pop() O(1)
Time complexity of top() O(1)
Time complexity of isEmpty() O(1)
Time complexity of isFull() O(1)
Time complexity of deleteStack() O(1)
Note: Too many doublings may cause memory overflow exception.
The other way of implementing stacks is by using Linked lists. Push operation is implemented by inserting element
at the beginning of the list. Pop operation is implemented by deleting the node from the beginning (the header/top
node).
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
struct ListNode{
int data;
struct ListNode *next;
};
struct Stack{
struct ListNode *top;
};
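The push and pop routines are omitted above; the following is a minimal sketch based on the description (insert at the front for push, delete the front node for pop):
struct Stack *createStack() {
struct Stack *S = malloc(sizeof(struct Stack));
if(!S)
return NULL;
S->top = NULL;
return S;
}
int isEmpty(struct Stack *S) {
return S->top == NULL;
}
void push(struct Stack *S, int data) {
struct ListNode *node = malloc(sizeof(struct ListNode));
if(!node)
return; // allocation failure: element is dropped
node->data = data;
node->next = S->top; // the new node becomes the new top
S->top = node;
}
int pop(struct Stack *S) {
if(isEmpty(S)) {
printf("Stack is Empty\n");
return INT_MIN;
}
struct ListNode *node = S->top;
int data = node->data;
S->top = node->next; // unlink the old top
free(node);
return data;
}
void deleteStack(struct Stack *S) {
struct ListNode *curr = S->top, *next;
while(curr) { // freeing every node makes deleteStack O(n), as noted below
next = curr->next;
free(curr);
curr = next;
}
free(S);
}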
Performance
Let 𝑛 be the number of elements in the stack. The complexities for operations with this representation can be
given as:
Space complexity (for 𝑛 push operations) O(𝑛)
Time complexity of createStack() O(1)
Time complexity of push() O(1) (Average)
Time complexity of pop() O(1)
Time complexity of top() O(1)
Time complexity of isEmpty() O(1)
Time complexity of deleteStack() O(𝑛)
Incremental Strategy
The amortized time (average time per operation) of a push operation is O(𝑛) [O(𝑛²)/𝑛].
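As a quick check (a sketch assuming the array grows by a fixed constant c each time it fills; c is an illustrative parameter, not from the text), the copying cost of n push operations under the incremental strategy is
$$T(n) = c + 2c + 3c + \cdots + n = c\left(1 + 2 + \cdots + \frac{n}{c}\right) = O\!\left(\frac{n^2}{c}\right) = O(n^2),$$
so the average over n pushes is O(n²)/n = O(n) per operation, matching the figure above.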
Doubling Strategy
In this method, the amortized time of a push operation is O(1) [O(𝑛)/𝑛].
Note: For analysis, refer to the 𝐼𝑚𝑝𝑙𝑒𝑚𝑒𝑛𝑡𝑎𝑡𝑖𝑜𝑛 section.
Problem-1 Discuss how stacks can be used for checking balancing of symbols.
Solution: Stacks can be used to check whether the given expression has balanced symbols. This algorithm is very
useful in compilers. The parser reads one character at a time. If the character is an opening delimiter
such as (, {, or [, it is pushed onto the stack. When a closing delimiter such as ), }, or ] is encountered, the stack is
popped.
The opening and closing delimiters are then compared. If they match, the parsing of the string continues. If they
do not match, the parser indicates that there is an error on the line. A linear-time O(𝑛) algorithm based on stack
can be given as:
Algorithm:
a) Create a stack.
b) while (end of input is not reached) {
1) If the character read is not a symbol to be balanced, ignore it.
2) If the character is an opening symbol like (, [, {, push it onto the stack
3) If it is a closing symbol like ),],}, then if the stack is empty report an error. Otherwise pop the
stack.
4) If the symbol popped is not the corresponding opening symbol, report an error.
}
c) At end of input, if the stack is not empty report an error.
Examples:
Example Valid? Description
(A+B)+(C-D) Yes The expression has a balanced symbol
((A+B)+(C-D) No One closing brace is missing
((A+B)+[C-D]) Yes Opening and immediate closing braces correspond
((A+B)+[C-D]} No The last closing brace does not correspond with the first opening parenthesis
For tracing the algorithm let us assume that the input is: () (() [()])
Input Symbol A[i]   Operation                                  Stack    Output
(                   Push (                                     (
)                   Pop (; test if ( and A[i] match? YES       empty
(                   Push (                                     (
(                   Push (                                     ((
)                   Pop (; test if ( and A[i] match? YES       (
[                   Push [                                     ([
(                   Push (                                     ([(
)                   Pop (; test if ( and A[i] match? YES       ([
]                   Pop [; test if [ and A[i] match? YES       (
)                   Pop (; test if ( and A[i] match? YES       empty
End of input        Test if stack is empty? YES                         TRUE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct Stack {
int top;
int capacity;
int *array;
};
struct Stack *createStack(int capacity) {
struct Stack *S = malloc(sizeof(struct Stack));
if(!S)
return NULL;
S->capacity = capacity;
S->top = -1;
S->array = malloc(S->capacity * sizeof(int));
if(!S->array)
return NULL;
return S;
}
// isEmpty, isFull, push and pop are the same as in the array-based implementation above
int matchSymbol(char opening, char closing){
if(opening == '(' && closing == ')') return 1;
if(opening == '[' && closing == ']') return 1;
if(opening == '{' && closing == '}') return 1;
return 0;
}
int checkExpression(const char expression[]){
char temp;
struct Stack *stk = createStack(strlen(expression));
for(int count = 0; expression[count] != '\0'; count++){
if(expression[count] == '(' || expression[count] == '[' || expression[count] == '{'){
push(stk, expression[count]);
}
else if(expression[count] == ')' || expression[count] == ']' || expression[count] == '}'){
if(isEmpty(stk)){
printf("Closing symbol %c without matching opening symbol\n", expression[count]);
return 0;
}
else{
temp = pop(stk);
if(!matchSymbol(temp, expression[count])){
printf("The mismatched symbols are:\t%c and %c\n", temp, expression[count]);
return 0;
}
}
}
}
if(isEmpty(stk)){
printf("\nThe Expression has Balanced Parentheses\n");
return 1;
}
else{
printf("The Expression has unbalanced parentheses\n");
return 0;
}
}
int main(){ // test code
int validity;
validity = checkExpression("[a+(b*{d+2})]");
if(validity == 1){
printf("The expression is valid\n");
}
else{
printf("The expression is invalid\n");
}
return 0;
}
Time Complexity: O(𝑛). Since we are scanning the input only once. Space Complexity: O(𝑛) [for stack].
Problem-2 Discuss infix to postfix conversion algorithm using stack.
Solution: Before discussing the algorithm, first let us see the definitions of infix, prefix and postfix expressions.
Infix: An infix expression is a single letter, or an operator, preceded by one infix string and followed by another
infix string.
A
A+B
(A+B)+ (C-D)
Prefix: A prefix expression is a single letter, or an operator, followed by two prefix strings. Every prefix string
longer than a single variable contains an operator, first operand and second operand.
A
+AB
++AB-CD
Postfix: A postfix expression (also called Reverse Polish Notation) is a single letter or an operator, preceded by two
postfix strings. Every postfix string longer than a single variable contains first and second operands followed by
an operator.
A
AB+
AB+CD-+
Prefix and postfix notations are methods of writing mathematical expressions without parentheses. The time to evaluate
a postfix or prefix expression is O(𝑛), where 𝑛 is the number of symbols in the expression.
Infix Prefix Postfix
A+B +AB AB+
A+B-C -+ABC AB+C-
(A+B)*C-D -*+ABCD AB+C*D-
Now, let us focus on the algorithm. In infix expressions, the operator precedence is implicit unless we use
parentheses. Therefore, for the infix to postfix conversion algorithm we have to define the operator precedence (or
priority) inside the algorithm.
The table shows the precedence and their associativity (order of evaluation) among operators.
Operator                                       Precedence   Associativity
?: (conditional)                               3            right-to-left
= += -= /= *= %= <<= >>= &= ^= (assignment)    2            right-to-left
, (comma)                                      1            left-to-right
Important Properties
• Let us consider the infix expression 2 + 3 * 4 and its postfix equivalent 2 3 4 * +. Notice that between infix
and postfix the order of the numbers (or operands) is unchanged. It is 2 3 4 in both cases. But the order
of the operators * and + is affected in the two expressions.
• Only one stack is enough to convert an infix expression to postfix expression. The stack that we use in
the algorithm will be used to change the order of operators from infix to postfix. The stack we use will only
contain operators and the open parentheses symbol ‘(‘.
Postfix expressions do not contain parentheses. We shall not output the parentheses in the postfix output.
Algorithm:
a) Create a stack
b) for each character t in the input stream{
if(t is an operand)
output t
else if(t is a right parenthesis){
Pop and output tokens until a left parenthesis is popped (but not output)
}
else // t is an operator or left parenthesis{
pop and output tokens until one of lower priority than t is encountered or a left parenthesis
is encountered or the stack is empty
Push t
}
}
c) pop and output tokens until the stack is empty.
For better understanding let us trace out an example: A * B- (C + D) + E
Input Character Operation on Stack Stack Postfix Expression
A Empty A
* Push * A
B * AB
- Check and Push - AB*
( Push -( AB*
C -( AB*C
+ Check and Push -(+ AB*C
D -(+ AB*CD
) Pop and append to postfix till ‘(’ - AB*CD+
+ Check and Push + AB*CD+-
E + AB*CD+-E
End of input Pop till empty AB*CD+-E+
// Refer stack implementation from previous section; requires <ctype.h> for isalnum
int priority(char x){
if(x == '(')
return 0;
if(x == '+' || x == '-')
return 1;
if(x == '*' || x == '/')
return 2;
return -1; // any other character (including the empty-stack sentinel): lowest priority
}
void infixToPostfix(char expression[]){
char *e, x;
// create a stack of capacity 5
struct Stack *stk = createStack(5);
e = expression;
while(*e != '\0'){
if(isalnum(*e))
printf("%c",*e);
else if(*e == '(')
push(stk, *e);
else if(*e == ')'){
while((x = pop(stk)) != '(')
printf("%c", x);
}
else{
while(!isEmpty(stk) && priority(peek(stk)) >= priority(*e))
printf("%c",pop(stk));
push(stk, *e);
}
e++;
}
while(!isEmpty(stk)){
printf("%c",pop(stk));
}
}
int main(){ // test code
infixToPostfix("(a+(b*(d+2)))");
return 0;
}
Problem-3 Discuss postfix evaluation using stacks?
Solution:
Algorithm:
1 Scan the Postfix string from left to right.
2 Initialize an empty stack.
3 Repeat steps 4 and 5 till all the characters are scanned.
4 If the scanned character is an operand, push it onto the stack.
5 If the scanned character is an operator, and if the operator is a unary operator, then pop an element from
the stack. If the operator is a binary operator, then pop two elements from the stack. After popping the
elements, apply the operator to those popped elements. Let the result of this operation be retVal; push
retVal onto the stack.
6 After all characters are scanned, we will have only one element in the stack.
7 Return top of the stack as result.
Example: Let us see how the above-mentioned algorithm works using an example. Assume that the postfix string
is 123*+5-.
Initially the stack is empty. Now, the first three characters scanned are 1, 2 and 3, which are operands. They are
pushed onto the stack in that order.
        Stack (bottom to top): 1, 2, 3
The next character scanned is "*", which is an operator. Thus, we pop the top two elements from the stack and
perform the "*" operation with the two operands; the second operand is the first element popped: 2 * 3 = 6.
        Stack: 1
The value of the expression (2*3) that has been evaluated, 6, is pushed onto the stack.
        Stack: 1, 6
The next character scanned is "+", which is an operator. Thus, we pop the top two elements from the stack and
perform the "+" operation with the two operands; the second operand is the first element popped: 1 + 6 = 7.
The value of the expression (1+6) that has been evaluated, 7, is pushed onto the stack.
        Stack: 7
The next character scanned is "5", an operand, which is pushed onto the stack.
        Stack: 7, 5
The next character scanned is "-", which is an operator. Thus, we pop the top two elements from the stack and
perform the "-" operation with the two operands; the second operand is the first element popped: 7 - 5 = 2.
The value of the expression (7-5) that has been evaluated, 2, is pushed onto the stack.
        Stack: 2
Now, since all the characters are scanned, the remaining element in the stack (there will be only one element in
the stack) will be returned. End result:
• Postfix String : 123*+5-
• Result : 2
// Refer stack implementation from previous section; requires <string.h> for strlen and <ctype.h> for isdigit
int postfixEvaluation(char expression[]){
// Create a stack of capacity equal to expression size
struct Stack* stk = createStack(strlen(expression));
int i;
// Scan all characters one by one
for (i = 0; expression[i]; ++i){
// If the scanned character is an operand (number here),
// push it to the Stack.
if (isdigit(expression[i]))
push(stk, expression[i] - '0');
// If the scanned character is an operator, pop top two
// elements from stack apply the operator
else{
int topElement = pop(stk);
int secondTopElement = pop(stk);
switch (expression[i]){
case '+': push(stk, secondTopElement + topElement); break;
case '-': push(stk, secondTopElement - topElement); break;
case '*': push(stk, secondTopElement * topElement); break;
case '/': push(stk, secondTopElement/topElement); break;
}
}
}
return pop(stk);
}
// test code
int main() {
printf ("postfix evaluation: %d", postfixEvaluation("123*+5-"));
return 0;
}
Problem-4 Can we evaluate the infix expression with stacks in one pass?
Solution: Using 2 stacks we can evaluate an infix expression in 1 pass without converting to postfix.
Algorithm:
1) Create an empty operator stack.
2) Create an empty operand stack.
3) For each token in the input string
a. Get the next token in the infix string.
b. If next token is an operand, place it on the operand stack.
c. If next token is an operator
Problem-5 How do we design a stack such that getMinimum() returns the minimum element in O(1)?
Solution: Keep an auxiliary min stack alongside the element stack. For every push, also push the smaller of the new element and the current minimum onto the min stack; for every pop, pop from both stacks. The top of the min stack is then always the minimum of the element stack. A sketch of the structure and operations (the helper names follow the test code below):
struct AdvancedStack {
struct Stack *elementStack;
struct Stack *minStack;
};
void pushA(struct AdvancedStack *S, int data){
push(S->elementStack, data);
// mirror the element stack: push the running minimum for every element
if(isEmpty(S->minStack) || data < peek(S->minStack))
push(S->minStack, data);
else
push(S->minStack, peek(S->minStack));
}
int popA(struct AdvancedStack *S){
pop(S->minStack); // the min stack mirrors the element stack one-for-one
return pop(S->elementStack);
}
int peekA(struct AdvancedStack *S){
return peek(S->elementStack);
}
int sizeA(struct AdvancedStack *S){
return size(S->elementStack);
}
int isEmptyA(struct AdvancedStack *S){
return isEmpty(S->elementStack);
}
int getMinimum(struct AdvancedStack *S){
return peek(S->minStack);
}
struct AdvancedStack * createAdvancedStack(int capacity){
struct AdvancedStack *S = malloc (sizeof (struct AdvancedStack));
if(!S)
return NULL;
S->elementStack = createStack(capacity);
S->minStack = createStack(capacity);
return S;
}
void deleteStackA(struct AdvancedStack *S){
if(S) {
deleteStack(S->elementStack); // delete the underlying simple stacks, not recursively
deleteStack(S->minStack);
free(S);
}
}
int main(){
int i = 0, capacity = 5;
// create a stack of capacity 5
struct AdvancedStack *stk = createAdvancedStack(capacity);
for(i = 0; i <= 2 * capacity; i++){
pushA(stk, (7*i)%4);
}
printf("Top element is %d\n", peekA(stk));
printf("Stack size is %d\n", sizeA(stk));
for (i = 0; i <= capacity; i++){
printf("Popped element is %d\n", popA(stk));
printf("Minimum element is %d\n", getMinimum(stk));
}
if (isEmptyA(stk))
printf("Stack is empty");
else
printf("Stack is not empty");
deleteStackA(stk);
return 0;
}
Time complexity: O(1) for push, pop and getMinimum.
Space complexity: O(𝑛) [for the min stack]. The space usage can be much better if we rarely get a "new
minimum or equal", as the next problem shows.
Problem-6 For Problem-5 is it possible to improve the space complexity?
Solution: 𝐘𝐞𝐬. The main problem of the previous approach is, for each push operation we are pushing the element
on to min stack also (either the new element or existing minimum element). That means, we are pushing the
duplicate minimum elements on to the stack.
Now, let us change the algorithm to improve the space complexity. We still have the min stack, but we only pop
from it when the value we pop from the main stack is equal to the one on the min stack. We only 𝑝𝑢𝑠ℎ to the min
stack when the value being pushed onto the main stack is less than 𝑜𝑟 𝑒𝑞𝑢𝑎𝑙 to the current min value. In this
modified algorithm also, if we want to get the minimum then we just need to return the top element from the min
stack. For example, taking the original version and pushing 1 again, we'd get:
Main stack (top to bottom): 1, 5, 1, 4, 6, 2
Min stack (top to bottom): 1, 1, 2
Popping from the above pops from both stacks because 1 == 1, leaving the main stack as 5, 1, 4, 6, 2 and the min stack as 1, 2.
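A minimal sketch of the modified push and pop (assuming the AdvancedStack structure and the array-based stack routines used earlier; getMinimum is unchanged):
void pushA(struct AdvancedStack *S, int data){
// push onto the min stack only for a new minimum or a duplicate of the current minimum
if(isEmpty(S->minStack) || data <= peek(S->minStack))
push(S->minStack, data);
push(S->elementStack, data);
}
int popA(struct AdvancedStack *S){
int data = pop(S->elementStack);
// pop the min stack only when the popped value equals the current minimum
if(!isEmpty(S->minStack) && data == peek(S->minStack))
pop(S->minStack);
return data;
}
With this change, the min stack holds an entry only per distinct running minimum (plus duplicates of it), instead of one entry per element.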
Problem-8 Given an array of characters with a marker character X at the middle (for example, ababaXababa), give an algorithm to check whether the array is a palindrome.
Solution: Move one index forward from the start and another backward from the end, comparing characters until the indexes meet.
// requires <string.h> for strlen
int isPalindrome(char A[]){
int i = 0, j = strlen(A) - 1;
while(i < j && A[i] == A[j]){
i++;
j--;
}
if(i < j){
//did not reach the center
printf("Not a palindrome\n");
return 0;
}
else{
//reached the center
printf("Palindrome\n");
return 1;
}
}
int main(void){
isPalindrome("ababaXababa");
isPalindrome("ababababXbababbbbabba");
return 0;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-9 For Problem-8, if the input is in singly linked list then how do we check whether the list elements
form a palindrome (That means, moving backward is not possible).
Solution: Refer Linked Lists chapter.
Problem-10 Can we solve Problem-8 using stacks?
Solution: Yes.
Algorithm:
• Traverse the list till we encounter X as input element.
• During the traversal push all the elements (until X) on to the stack.
• For the second half of the list, compare each element’s content with top of the stack. If they are the same
then pop the stack and go to the next element in the input list.
• If they are not the same then the given string is not a palindrome.
• Continue this process until the stack is empty or the string is not a palindrome.
// Refer basic stack implementation from previous section
int isPalindrome(char A[]){
int i=0;
struct Stack *stk = createStack(strlen(A));
while(A[i] && A[i] != 'X') {
push(stk, A[i]);
i++;
}
i++;
while(A[i]) {
if(isEmpty(stk) || A[i] != pop(stk)) {
printf("Not a palindrome\n");
return 0;
}
i++;
}
printf("Palindrome\n");
return 1;
}
int main(void){
isPalindrome("ababaXababa");
isPalindrome("ababababXbababbbbabba");
return 0;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛/2) ≈O(𝑛).
Problem-11 Given a stack, how to reverse the elements of the stack using only stack operations (push & pop)?
Solution:
Algorithm:
• First pop all the elements of the stack till it becomes empty.
• For each upward step in recursion, insert the element at the bottom of the stack.
// Refer basic stack implementation from previous section
void insertAtBottom(struct Stack *S, char data); // forward declaration
void reverseStack(struct Stack *S){
char data;
if(isEmpty(S))
return;
data = pop(S);
reverseStack(S);
insertAtBottom(S, data);
}
void insertAtBottom(struct Stack *S, char data){
char temp;
if(isEmpty(S)) {
push(S, data);
return;
}
temp = pop(S);
insertAtBottom(S, data);
push(S, temp);
}
int main(void){
int i = 0, capacity = 2;
// create a stack of capacity 2
struct Stack *stk = createStack(capacity);
for(i = 0; i <= capacity; i++){
push(stk, i);
}
reverseStack(stk);
printf("Top element is %d\n", peek(stk));
printf("Stack size is %d\n", size(stk));
for (i = 0; i <= capacity; i++){
printf("Popped element is %d\n", pop(stk));
}
deleteStack(stk);
return 0;
}
Time Complexity: O(𝑛²). Space Complexity: O(𝑛), for recursive stack.
Problem-12 Show how to implement one queue efficiently using two stacks. Analyze the running time of the
queue operations.
Solution: Refer Queues chapter.
Problem-13 Show how to implement one stack efficiently using two queues. Analyze the running time of the
stack operations.
Solution: Refer Queues chapter.
Problem-14 How do we implement 𝑡𝑤𝑜 stacks using only one array? Our stack routines should not report
an exception unless every slot in the array is used.
Solution:
[Figure: one array shared by two stacks; Stack-1 grows rightward from the left end (Top1) and Stack-2 grows leftward from the right end (Top2).]
Algorithm:
• Start two indexes one at the left end and the other at the right end.
• The left index simulates the first stack and the right index simulates the second stack.
• If we want to push an element into the first stack then put the element at the left index.
• Similarly, if we want to push an element into the second stack then put the element at the right index.
• The first stack grows towards the right, and the second stack grows towards the left.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
struct MultiStacks {
int top1, top2;
int capacity;
char *array;
};
struct MultiStacks *createStack(int capacity) {
struct MultiStacks *twoStacks = malloc(sizeof(struct MultiStacks));
if(!twoStacks)
return NULL;
twoStacks->capacity = capacity;
twoStacks->top1 = -1;
twoStacks->top2 = capacity;
twoStacks->array = malloc(twoStacks->capacity * sizeof(char));
if(!twoStacks->array)
return NULL;
return twoStacks;
}
int isEmpty(struct MultiStacks *twoStacks, int stackNumber) {
if (stackNumber == 1){
return (twoStacks->top1 == -1);
} else {
return (twoStacks->top2 == twoStacks->capacity);
}
}
int size(struct MultiStacks *twoStacks, int stackNumber) {
if (stackNumber == 1){
return (twoStacks->top1 + 1);
} else {
return (twoStacks->capacity - twoStacks->top2);
}
}
int isFull(struct MultiStacks *twoStacks){
return (size(twoStacks, 1) + size(twoStacks, 2) == twoStacks->capacity);
}
void push(struct MultiStacks *twoStacks, int stackNumber, char data){
if(isFull(twoStacks)){
printf("Stack overflow\n");
return;
}
if (stackNumber == 1){
twoStacks->array[++twoStacks->top1] = data;
} else {
twoStacks->array[--twoStacks->top2] = data;
}
}
char pop(struct MultiStacks *twoStacks, int stackNumber){
/* twoStacks->top == - 1 indicates empty stack*/
if(isEmpty(twoStacks, stackNumber)){
printf("Stack is Empty\n");
return '\0';
}
if (stackNumber == 1){
return (twoStacks->array[twoStacks->top1--]);
} else {
return (twoStacks->array[twoStacks->top2++]);
}
}
Problem-15 How do we implement 3 stacks in one array?
Solution: [Figure: one array shared by three stacks; Stack-1 grows from the left end (Top1), Stack-2 grows from the right end (Top2), and Stack-3 sits in the middle (Top3).]
To implement 3 stacks we keep the following information.
• The index of the first stack (Top1): this indicates the size of the first stack.
• The index of the second stack (Top2): this indicates the size of the second stack.
• Starting index of the third stack (base address of third stack).
• Top index of the third stack.
Now, let us define the push and pop operations for this implementation.
Pushing:
• For pushing on to the first stack, we need to see if adding a new element causes it to bump into the third
stack. If so, try to shift the third stack upwards. Insert the new element at (start1 + Top1).
• For pushing to the second stack, we need to see if adding a new element causes it to bump into the third
stack. If so, try to shift the third stack downward. Insert the new element at (start2 – Top2).
• When pushing to the third stack, see if it bumps into the second stack. If so, try to shift the third stack
downward and try pushing again. Insert the new element at (start3 + Top3).
Time Complexity: O(𝑛). Since we may need to adjust the third stack. Space Complexity: O(1).
Popping: For popping, we don’t need to shift, just decrement the size of the appropriate stack.
Time Complexity: O(1). Space Complexity: O(1).
Problem-16 For Problem-15, is there any other way implementing the middle stack?
Solution: Yes. When either the left stack (which grows to the right) or the right stack (which grows to the left)
bumps into the middle stack, we need to shift the entire middle stack to make room. The same happens if a push
on the middle stack causes it to bump into the right stack.
To solve the above-mentioned problem (number of shifts) what we can do is: alternating pushes can be added at
alternating sides of the middle list (For example, even elements are pushed to the left, odd elements are pushed
to the right). This would keep the middle stack balanced in the center of the array but it would still need to be
shifted when it bumps into the left or right stack, whether by growing on its own or by the growth of a neighboring
stack. We can optimize the initial locations of the three stacks if they grow/shrink at different rates and if they
have different average sizes. For example, suppose one stack doesn't change much. If we put it at the left, then
the middle stack will eventually get pushed against it and leave a gap between the middle and right stacks, which
grow toward each other. If they collide, then it's likely we've run out of space in the array. There is no change in
the time complexity but the average number of shifts will get reduced.
Problem-17 Multiple (𝑚) stacks in one array: Similar to Problem-15, what if we want to implement 𝑚 stacks
in one array?
Solution: Let us assume that array indexes are from 1 to n. Similar to the discussion in Problem-15, to implement
𝑚 stacks in one array, we divide the array into 𝑚 parts (as shown below). The size of each part is n/m.
[Figure: array A partitioned at indexes 1, n/m, 2n/m, …, n.]
From the above representation we can see that the first stack starts at index 1 (starting index is stored in Base[1]),
the second stack starts at index n/m (starting index is stored in Base[2]), the third stack starts at index 2n/m (starting
index is stored in Base[3]), and so on. Similar to the 𝐵𝑎𝑠𝑒 array, let us assume that the 𝑇𝑜𝑝 array stores the top indexes
for each of the stack. Consider the following terminology for the discussion.
• Top[i], for 1 ≤ 𝑖 ≤ 𝑚 will point to the topmost element of the stack 𝑖.
• If Base[i] == Top[i], then we can say the stack 𝑖 is empty.
• If Top[i] == Base[i+1], then we can say the stack i is full.
• Initially, Base[i] = Top[i] = (n/m)(𝑖 − 1), for 1 ≤ 𝑖 ≤ 𝑚.
• The 𝑖 𝑡ℎ stack grows from Base[i]+1 to Base[i+1].
Pushing on to 𝒊𝒕𝒉 stack:
1) For pushing on to the 𝑖 𝑡ℎ stack, we check whether the top of 𝑖 𝑡ℎ stack is pointing to Base[i+1] (this case
defines that 𝑖 𝑡ℎ stack is full). That means, we need to see if adding a new element causes it to bump into
the 𝑖 + 1𝑡ℎ stack. If so, try to shift the stacks from 𝑖 + 1𝑡ℎ stack to 𝑚𝑡ℎ stack toward the right. Insert the
new element at (Base[i] + Top[i]).
2) If right shifting is not possible then try shifting the stacks from 1 to 𝑖 − 1𝑡ℎ stack toward the left.
3) If both of them are not possible then we can say that all stacks are full.
void push(int i, int data) {
if(top[i] == base[i+1]) {
// i-th stack is full: shift the stacks to make room, as described above
}
top[i] = top[i] + 1;
A[top[i]] = data;
}
Time Complexity: O(𝑛). Since we may need to adjust the stacks. Space Complexity: O(1).
Popping from 𝒊𝒕𝒉 stack: For popping, we don’t need to shift, just decrement the size of the appropriate stack. The
only case to check is stack empty case.
int pop(int i) {
if(top[i] == base[i]) {
// i-th stack is empty: report underflow
}
return A[top[i]--];
}
Problem-21 Finding Spans: Given an array A, the span S[i] of A[i] is the number of consecutive elements ending at A[i] (and including it) that are less than or equal to A[i]. In stock markets, the span of a stock's price on a given day is the number of consecutive days, up to and including that day, on which the price was at most the price on that day. Give an algorithm for computing the spans.
Solution:
int* find_spans(int A[], int n) {
int* spans = (int*) malloc(n * sizeof(int));
for (int i = 0; i < n; i++) {
int j = i - 1;
// walk left, jumping over whole spans, until a greater element is found
while (j >= 0 && A[j] <= A[i])
j -= spans[j];
spans[i] = i - j;
}
return spans;
}
The algorithm works by starting from the first element and computing its span as 1. Then, for each subsequent
element, we iterate backwards through the array, starting from the previous element, and subtracting the spans
of the elements we encounter until we find an element that is greater than the current element. The span of the
current element is then the difference between its index and the index of the last element we encountered that
was greater than it.
Time Complexity: O(𝑛²). Space Complexity: O(1).
Problem-22 Can we improve the complexity of Problem-21?
Solution: Alternative way to solve this problem is to use a stack. From the example above, we can see that span
𝑆[𝑖] on day 𝑖 can be easily calculated if we know the closest day preceding 𝑖, such that the price is greater on that
day than the price on day 𝑖. Let us call such a day as 𝑃. If such a day exists then the span is now defined as
𝑠𝑝𝑎𝑛𝑠[𝑖] = 𝑖 − 𝑃.
We can iterate through the array from left to right, and for each element A[i], we pop elements from the stack as
long as they are less than or equal to A[i]. The stack will contain the indices of the elements that are larger than
A[i] and that have been seen so far. The span of A[i] is then the difference between i and the index of the top
element of the stack.
int* find_spans(int A[], int n) {
int* spans = (int*) malloc(n * sizeof(int));
int* stack = (int*) malloc(n * sizeof(int));
int top = -1;
for (int i = 0; i < n; i++) {
while (top >= 0 && A[stack[top]] <= A[i])
top--;
spans[i] = (top >= 0) ? (i - stack[top]) : (i + 1);
stack[++top] = i;
}
free(stack);
return spans;
}
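As a quick illustration of the stack-based version, a hypothetical driver (the sample values are chosen here for demonstration):
#include <stdio.h>
#include <stdlib.h>
// assumes find_spans as defined above
int main(void) {
int A[] = {6, 3, 4, 5, 2};
int n = sizeof(A) / sizeof(A[0]);
int *spans = find_spans(A, n);
for (int i = 0; i < n; i++)
printf("spans[%d] = %d\n", i, spans[i]); // prints 1, 1, 2, 3, 1
free(spans);
return 0;
}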
Time Complexity: Each index of the array is pushed into the stack exactly once and also popped from the stack
at most once. The statements in the while loop are executed at most 𝑛 times. Even though the algorithm has
nested loops, the complexity is O(𝑛) as the inner loop is executing only 𝑛 times during the course of the algorithm
(trace out an example and see how many times the inner loop becomes successful). Space Complexity: O(𝑛) [for
stack].
Problem-23 Largest rectangle under histogram: A histogram is a polygon composed of a sequence of
rectangles aligned at a common base line. For simplicity, assume that the rectangles have equal widths but
may have different heights. For example, the figure on the left shows a histogram that consists of rectangles
with the heights 3, 2, 5, 6, 1, 4, 4, measured in units where 1 is the width of the rectangles. Here our problem is:
given an array with heights of rectangles (assuming width is 1), we need to find the largest rectangle possible.
For the given example, the largest rectangle is the shared part.
Solution: The first insight is to identify which rectangles to be considered for the solution: those which cover a
contiguous range of the input histogram and whose height equals the minimum bar height in the range (rectangle
height cannot exceed the minimum height in the range and there’s no point in considering a height less than the
minimum height because we can just increase the height to the minimum height in the range and get a better
solution). This greatly constrains the set of rectangles we need to consider. Formally, we need to consider only
those rectangles with 𝑤𝑖𝑑𝑡ℎ = 𝑗 − 𝑖 + 1 (0 ≤ 𝑖 ≤ 𝑗 < 𝑛) and ℎ𝑒𝑖𝑔ℎ𝑡 = 𝑚𝑖𝑛(𝐴[𝑖. . 𝑗]).
At this point, we can directly implement this solution.
int findMin(vector<int> &A, int i, int j){
int min = A[i];
while(i <= j){
if (min > A[i])
min = A[i];
i++;
}
return min;
}
int largestHistogram(vector<int> &A) {
int maxArea = 0;
for (int i = 0; i < A.size(); i++) {
for (int j = i, minimum_height = A[i]; j < A.size(); j++) {
minimum_height = findMin(A, i, j);
maxArea = max(maxArea, (j-i+1) * minimum_height);
}
}
return maxArea;
}
There are only O(𝑛²) choices for i and j. If we naively calculate the minimum height in the range [i..j], this will have
time complexity O(𝑛³).
Instead, we can keep track of the minimum height in the inner loop for j, leading to the following implementation
with O(𝑛²) time complexity and O(1) auxiliary space complexity.
int largestHistogram(vector<int> &A) {
int maxArea = 0;
for (int i = 0; i < A.size(); i++) {
for (int j = i, minimum_height = A[i]; j < A.size(); j++) {
minimum_height = min(minimum_height, A[j]);
maxArea = max(maxArea, (j-i+1) * minimum_height);
}
}
return maxArea;
}
Problem-24 For Problem-23, can we improve the time complexity?
Solution: We are still doing a lot of repeated work by considering all O(𝑛²) rectangles. There are only 𝑛 possible
heights. For each position j, we need to consider only 1 rectangle: the one with height = A[j] and width = k-i+1,
where 0 ≤ i ≤ j ≤ k < n, A[i..k] ≥ A[j], A[i-1] < A[j] and A[k+1] < A[j].
Linear search using a stack of incomplete sub problems: There are many ways of solving this problem. 𝐽𝑢𝑑𝑔𝑒
has given a nice algorithm for this problem which is based on stack. Process the elements in left-to-right order
and maintain a stack of information about started but unfinished sub histograms. The idea is to traverse the
histogram from left to right, maintaining a stack of heights in non-decreasing order. Whenever a new height is
encountered that is smaller than the top of the stack, the rectangles that can be formed with the heights in the
stack are computed and the maximum area is recorded.
Here are the steps:
1. Create an empty stack and initialize the maximum area to 0.
2. Traverse the histogram from left to right:
a. If the stack is empty, or the current bar is at least as tall as the bar at the index on top of the
stack, push the current index onto the stack.
b. Otherwise, pop an index from the stack and compute the area of the rectangle whose height is
the popped bar's height and whose width is the distance between the current index and the index
at the top of the stack (or -1 if the stack is empty). Update the maximum area seen so far accordingly.
3. After traversing the entire array, if there are any indices remaining in the stack, repeat step 2b for each
index until the stack is empty.
The maximum area seen during the traversal is the answer to the problem.
// Calculate the area of the rectangle with given height and width
int getArea(int height, int width) {
return height * width;
}
// Find the largest rectangle under the histogram
int findLargestRectangle(int* heights, int n) {
struct Stack* stack = createStack(n);
int maxArea = INT_MIN;
int i = 0;
while (i < n) {
if (isEmpty(stack) || heights[i] >= heights[peek(stack)]) {
push(stack, i++);
}
else {
int topIndex = pop(stack);
int width = isEmpty(stack) ? i : i - peek(stack) - 1;
int area = getArea(heights[topIndex], width);
if (area > maxArea) {
maxArea = area;
}
}
}
while (!isEmpty(stack)) {
int topIndex = pop(stack);
int width = isEmpty(stack) ? i : i - peek(stack) - 1;
int area = getArea(heights[topIndex], width);
if (area > maxArea) {
maxArea = area;
}
}
free(stack->array);
free(stack);
return maxArea;
}
At first impression, this solution seems to have O(𝑛²) complexity. But if we look carefully, every element
is pushed and popped at most once, and in every step of the function at least one element is pushed or popped.
Therefore, the number of iterations is proportional to the number of elements in the input array, which gives a
time complexity of O(𝑛). Space Complexity: O(𝑛) [for stack].
Problem-25 On a given machine, how do you check whether the stack grows up or down?
Solution: Try noting down the address of a local variable. Call another function with a local variable declared in
it and check the address of that local variable and compare.
void stackGrowth(int *temp); // forward declaration
int testStackGrowth() {
int temporary;
stackGrowth(&temporary);
exit(0);
}
void stackGrowth(int *temp){
int temp2;
printf("\nAddress of first local variable: %p", (void *)temp);
printf("\nAddress of second local: %p", (void *)&temp2);
if(temp < &temp2)
printf("\n Stack is growing downwards");
else
printf("\n Stack is growing upwards");
}
Time Complexity: O(1). Space Complexity: O(1).
Problem-26 Suppose there are two singly linked lists which intersect at some point and become a single linked
list. The head or start pointers of both the lists are known, but the intersecting node is not known. Also, the
number of nodes in each of the lists before they intersect are unknown and both lists may have a different
number. 𝐿𝑖𝑠𝑡1 may have 𝑛 nodes before it reaches the intersection point and 𝐿𝑖𝑠𝑡2 may have 𝑚 nodes before it
reaches the intersection point where 𝑚 and 𝑛 may be 𝑚 = 𝑛, 𝑚 < 𝑛 or 𝑚 > 𝑛. Can we find the merging point
using stacks?
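Solution: Yes. Push the nodes of each list onto its own stack. Since the two lists share a common tail after the intersection, the same node pointers appear at the tops of both stacks. Pop from both stacks in lockstep as long as the popped pointers are equal; the last equal node is the merging point. Time and Space Complexity: O(𝑚 + 𝑛). A sketch (assuming the ListNode type used in this chapter; the fixed capacity below is illustrative):
#define MAX_NODES 10000 // illustrative bound on list length
struct ListNode* findMergePoint(struct ListNode *head1, struct ListNode *head2) {
struct ListNode *stack1[MAX_NODES], *stack2[MAX_NODES];
int top1 = -1, top2 = -1;
for (struct ListNode *p = head1; p; p = p->next) stack1[++top1] = p;
for (struct ListNode *p = head2; p; p = p->next) stack2[++top2] = p;
struct ListNode *merge = NULL;
// equal pointers at the stack tops correspond to the shared tail
while (top1 >= 0 && top2 >= 0 && stack1[top1] == stack2[top2]) {
merge = stack1[top1];
top1--;
top2--;
}
return merge; // NULL if the lists do not intersect
}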
Problem-29 Given an array of elements, replace every element with the nearest greater element on the right of
that element.
Solution: A brute force approach: for each element, scan to its right until a greater element is found.
void replaceWithNearestGreaterElement(int A[], int n){
for (int i = 0; i < n; i++){
int nextNearestGreater = INT_MIN; // sentinel: no greater element found
for (int j = i + 1; j < n; j++){
if (A[i] < A[j]){
nextNearestGreater = A[j];
break;
}
}
printf("For the element %d, %d is the nearest greater element\n", A[i], nextNearestGreater);
}
}
Time Complexity: O(𝑛²). Space Complexity: O(1).
Problem-30 For Problem-29, can we improve the complexity?
Solution: The approach is pretty much similar to Problem-21. Create a stack and push the first element. For the
rest of the elements, mark the current element as 𝑛𝑒𝑥𝑡𝑁𝑒𝑎𝑟𝑒𝑠𝑡𝐺𝑟𝑒𝑎𝑡𝑒𝑟. If stack is not empty, then pop an element
from stack and compare it with 𝑛𝑒𝑥𝑡𝑁𝑒𝑎𝑟𝑒𝑠𝑡𝐺𝑟𝑒𝑎𝑡𝑒𝑟. If 𝑛𝑒𝑥𝑡𝑁𝑒𝑎𝑟𝑒𝑠𝑡𝐺𝑟𝑒𝑎𝑡𝑒𝑟 is greater than the popped element, then
𝑛𝑒𝑥𝑡𝑁𝑒𝑎𝑟𝑒𝑠𝑡𝐺𝑟𝑒𝑎𝑡𝑒𝑟 is the next greater element for the popped element. Keep popping from the stack while the
popped element is smaller than 𝑛𝑒𝑥𝑡𝑁𝑒𝑎𝑟𝑒𝑠𝑡𝐺𝑟𝑒𝑎𝑡𝑒𝑟. 𝑛𝑒𝑥𝑡𝑁𝑒𝑎𝑟𝑒𝑠𝑡𝐺𝑟𝑒𝑎𝑡𝑒𝑟 becomes the next greater element for all
such popped elements. If 𝑛𝑒𝑥𝑡𝑁𝑒𝑎𝑟𝑒𝑠𝑡𝐺𝑟𝑒𝑎𝑡𝑒𝑟 is smaller than the popped element, then push the popped element
back.
void replaceWithNearestGreaterElement(int A[], int n){
int i = 0;
struct Stack *S = createStack(n);
int element, nextNearestGreater;
push(S, A[0]);
for (i=1; i<n; i++){
nextNearestGreater = A[i];
if (!isEmpty(S)){
element = pop(S);
while (element < nextNearestGreater){
printf("For the element %d, %d is the nearest greater element\n", A[i], nextNearestGreater);
if(isEmpty(S))
break;
element = pop(S);
}
if (element > nextNearestGreater)
push(S, element);
}
push(S, nextNearestGreater);
}
while (!isEmpty(S)){
element = pop(S);
nextNearestGreater = INT_MIN; // no greater element exists to the right
printf("For the element %d, %d is the nearest greater element\n", element, nextNearestGreater);
}
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-31 How to implement a stack which will support following operations in O(1) time complexity?
• Push which adds an element to the top of stack.
• Pop which removes an element from top of stack.
• Find middle which will return middle element of the stack.
• Delete middle which will delete the middle element.
Solution: We can use a LinkedList data structure with an extra pointer to the middle element. Also, we need
another variable to store whether the LinkedList has an even or odd number of elements.
• 𝑃𝑢𝑠ℎ: Add the element to the head of the LinkedList and update the pointer to the middle element
according to the new count.
• 𝑃𝑜𝑝: Remove the head of the LinkedList and update the pointer to the middle element according to the new count.
• 𝐹𝑖𝑛𝑑 𝑚𝑖𝑑𝑑𝑙𝑒: Return the element pointed to by the middle pointer.
• 𝐷𝑒𝑙𝑒𝑡𝑒 𝑚𝑖𝑑𝑑𝑙𝑒: Unlink the node pointed to by the middle pointer (use the logic of Problem-43 from the
𝐿𝑖𝑛𝑘𝑒𝑑 𝐿𝑖𝑠𝑡𝑠 chapter) and update the middle pointer. A sketch follows below.
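A minimal sketch of this design (assumptions: a doubly linked list so the middle node can be unlinked in O(1); all names here are illustrative, not from the text):
#include <stdio.h>
#include <stdlib.h>
struct DLLNode {
int data;
struct DLLNode *prev, *next;
};
struct MidStack {
struct DLLNode *head; // top of the stack
struct DLLNode *mid;  // middle element
int count;
};
void push(struct MidStack *s, int data) {
struct DLLNode *node = malloc(sizeof(struct DLLNode));
node->data = data;
node->prev = NULL;
node->next = s->head;
if (s->head) s->head->prev = node;
s->head = node;
s->count++;
if (s->count == 1)
s->mid = node;         // first element is also the middle
else if (s->count % 2 == 1)
s->mid = s->mid->prev; // odd size: middle moves one step toward the top
}
int pop(struct MidStack *s) {
struct DLLNode *node = s->head;
int data = node->data;
s->head = node->next;
if (s->head) s->head->prev = NULL;
s->count--;
if (s->count == 0)
s->mid = NULL;
else if (s->count % 2 == 0)
s->mid = s->mid->next; // even size after pop: middle moves toward the bottom
free(node);
return data;
}
int findMiddle(struct MidStack *s) {
return s->mid->data; // caller must ensure the stack is not empty
}
void deleteMiddle(struct MidStack *s) {
struct DLLNode *m = s->mid;
if (!m) return;
if (m->prev) m->prev->next = m->next; else s->head = m->next;
if (m->next) m->next->prev = m->prev;
s->count--;
// pick the new middle: toward the bottom for even size, toward the top for odd
s->mid = (s->count == 0) ? NULL : (s->count % 2 == 0 ? m->next : m->prev);
free(m);
}
int main(void) {
struct MidStack s = {NULL, NULL, 0};
for (int i = 1; i <= 5; i++) push(&s, i); // stack (top to bottom): 5 4 3 2 1
printf("middle = %d\n", findMiddle(&s));  // prints 3
deleteMiddle(&s);                         // stack becomes 5 4 2 1
printf("middle = %d\n", findMiddle(&s));  // prints 2
printf("popped = %d\n", pop(&s));         // prints 5
return 0;
}
The parity rule does the bookkeeping: on push, the middle pointer steps toward the top when the count becomes odd; on pop or delete-middle, it steps toward the bottom when the count becomes even.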
Problem-32 How do you determine whether a singly linked list is a palindrome?
Solution: To determine whether a singly linked list is a palindrome or not, we can follow these steps:
• Traverse the linked list and push each node's value into a stack.
• Traverse the linked list again and compare each node's value with the value at the top of the stack. If they
match, pop the value from the stack and continue. If they don't match, return false.
• If we have traversed the entire linked list and all values have matched, return true.
// Definition for singly-linked list.
struct ListNode {
int val;
struct ListNode *next;
};
bool isPalindrome(struct ListNode* head){
// Step 1: Create an empty stack and a pointer to the head of the linked list
int stack[100000]; // Assuming maximum of 100000 nodes in the linked list
int top = -1;
struct ListNode* cur = head;
// Step 2: Traverse the linked list, pushing each node's value onto the stack
while (cur != NULL) {
stack[++top] = cur->val;
cur = cur->next;
}
// Step 3: Pop values from the stack and compare them with the corresponding values in the linked list
cur = head;
while (cur != NULL && top >= 0) {
if (cur->val != stack[top--]) {
// If the values don't match, the linked list is not a palindrome
return false;
}
cur = cur->next;
}
// If we reach this point, all values in the linked list matched their corresponding values in the stack,
// so the linked list is a palindrome
return true;
}
The time complexity of the code is O(n), where n is the number of nodes in the linked list. We traverse the list
once to push all the values onto the stack, and then traverse it again, comparing each node's value with the value
popped from the stack. Both traversals take O(n) time.
The space complexity of the algorithm is also O(n), because we push all n values of the linked list onto the stack.
For alternative solutions, refer 𝐿𝑖𝑛𝑘𝑒𝑑 𝐿𝑖𝑠𝑡𝑠 chapter.
Problem-33 Remove All Adjacent Duplicates in String: We are given a string s consisting of lowercase
English letters. A duplicate removal consists of choosing two adjacent and equal letters and removing them.
Solution: One way to solve this problem is to use a stack data structure. We can iterate through each character
in the string and push it onto the stack. If the current character is equal to the top of the stack, we pop the top
element of the stack and continue. If not, we continue pushing elements onto the stack. At the end, we concatenate
the elements left in the stack and return the result.
Let's walk through an example to see how this algorithm works. Suppose we have the input string "abbaca". Here
are the steps of the algorithm:
1. Initialize an empty stack.
2. Iterate through each character in the string:
• Push 'a' onto the stack.
• Push 'b' onto the stack.
• The next character 'b' equals the top of the stack, so pop 'b'.
• The next character 'a' equals the new top 'a', so pop 'a'.
• Push 'c' onto the stack.
• The next character 'a' does not equal the top 'c', so push 'a'.
3. Concatenate the remaining elements in the stack, which are 'c' and 'a', and return the result "ca".
So, the output for the input "abbaca" is "ca", which is the expected result after removing all adjacent duplicates.
char* removeDuplicates(char* s) {
int len = strlen(s);
char* stack = (char*)malloc(sizeof(char) * (len + 1)); // add 1 for null terminator
int top = -1; // initialize stack top to -1
for (int i = 0; i < len; i++) {
if (top >= 0 && s[i] == stack[top]) {
top--; // remove top element from stack
} else {
top++; // add element to stack
stack[top] = s[i];
}
}
stack[top + 1] = '\0'; // add null terminator
return stack;
}
The time complexity of the algorithm is O(n), where n is the length of the input string. This is because we iterate
through each character in the string once, and each character is pushed and popped from the stack at most once.
The space complexity of the algorithm is also O(n), since the stack can hold up to n characters (for example, when
the string contains no adjacent duplicates at all). Therefore, the algorithm is both time and space efficient.
Problem-34 Minimum Number of Swaps to Make the String Balanced: We are given a 0-indexed string s
of even length n. The string consists of exactly n / 2 opening brackets '[' and n / 2 closing brackets ']'. A string
is called balanced if and only if:
• It is the empty string, or
• It can be written as AB, where both A and B are balanced strings, or
• It can be written as [C], where C is a balanced string.
You may swap the brackets at any two indices any number of times. Return the minimum number of swaps to
make s balanced.
Solution: We can solve this problem using a stack. We iterate through the string s, pushing the index of each
opening bracket. When we see a closing bracket, we check whether there is an opening bracket on top of the
stack; if there is, we pop it, since it pairs with the current closing bracket. If there is not, the closing bracket is
unmatched. Each swap places an opening bracket in front of an unmatched closing bracket and thereby fixes two
unmatched brackets at once, so the minimum number of swaps is the number of unmatched closing brackets
divided by two, rounded up.
#include <string.h>
int min_swaps(char* s) {
    int n = strlen(s);
    int stack[n / 2];      // holds indices of unmatched '['
    int top = -1;
    int unmatched = 0;     // unmatched closing brackets
    for (int i = 0; i < n; i++) {
        if (s[i] == '[') {
            stack[++top] = i;
        } else {
            if (top >= 0) {
                top--;        // pair this ']' with the '[' on top
            } else {
                unmatched++;  // ']' with no '[' before it
            }
        }
    }
    return (unmatched + 1) / 2;   // each swap fixes two mismatches
}
The time complexity of the min_swaps function is O(n), where n is the length of the input string s, because we
examine each character exactly once. The space complexity is O(n): the stack holds at most n/2 opening-bracket
indices (the input contains exactly n/2 opening brackets). In fact, the stack only ever holds opening brackets, so
a single counter suffices, as shown below.
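Here is that O(1)-space variant as a sketch (the name min_swaps_counter is ours):
int min_swaps_counter(char* s) {
    int open = 0, unmatched = 0;
    for (int i = 0; s[i] != '\0'; i++) {
        if (s[i] == '[')
            open++;
        else if (open > 0)
            open--;          // matched with an earlier '['
        else
            unmatched++;     // ']' with no '[' before it
    }
    return (unmatched + 1) / 2;   // each swap fixes two mismatches
}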
Problem-35 Given a stack of integers, how do you check whether each successive pair of numbers in the
stack is consecutive or not? The pairs can be increasing or decreasing, and if the stack has an odd number of
elements, the element at the top is left out of a pair. For example, if the stack of elements is [4, 5, -2, -3, 11,
10, 5, 6, 20], then the output should be true because each of the pairs (4, 5), (-2, -3), (11, 10), and (5, 6) consists
of consecutive numbers.
Solution: To solve this problem, we can pop elements off the stack two at a time and check each pair.
Here is the algorithm:
1. If the stack has an odd number of elements, pop the top element and set it aside; it is not part of any pair.
2. Pop the next two elements from the stack.
3. Check if the absolute difference between the two elements is 1. If it is not, return false.
4. Repeat steps 2-3 until there are no more pairs to check.
5. If all pairs have passed the consecutive test, return true.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define STACK_SIZE 100
struct Stack {
int items[STACK_SIZE];
int top;
};
void push(struct Stack *s, int value) {
if (s->top == STACK_SIZE - 1) {
printf("Stack is full\n");
} else {
s->top++;
s->items[s->top] = value;
}
}
int pop(struct Stack *s) {
if (s->top == -1) {
printf("Stack is empty\n");
return -1;
} else {
int value = s->items[s->top];
s->top--;
return value;
}
}
bool isConsecutivePair(struct Stack *s) {
    // Note: this check consumes the contents of the stack.
    if ((s->top + 1) % 2 == 1)
        pop(s);                 // odd count: the top element is left out
    while (s->top >= 1) {       // at least one full pair remains
        int first = pop(s);
        int second = pop(s);
        if (abs(first - second) != 1)
            return false;
    }
    return true;
}
The time complexity of the given algorithm is O(n), where n is the number of elements in the stack, because each
element is popped exactly once. The extra space used is O(1), since the algorithm works directly on the given
stack. Note that it consumes the stack's contents in the process; if the original stack must be preserved, an
auxiliary stack or queue costing O(n) space is needed.
Chapter 5: Queues
5.1 What is a Queue Data Structure?
A queue is a data structure used for storing data (similar to Linked Lists and Stacks). In queue, the order in which
data arrives is important. In general, a queue is a line of people or things waiting to be served in sequential order
starting at the beginning of the line or sequence.
Definition: A 𝑞𝑢𝑒𝑢𝑒 is an ordered list in which insertions are done at one end (𝑟𝑒𝑎𝑟) and deletions are done at
the other end (𝑓𝑟𝑜𝑛𝑡). The first element to be inserted is the first one to be deleted. Hence, it is called a First in
First out (FIFO) or Last in Last out (LILO) list.
Similar to 𝑆𝑡𝑎𝑐𝑘𝑠, special names are given to the two changes that can be made to a queue. When an element is
inserted in a queue, the concept is called 𝐸𝑛𝑄𝑢𝑒𝑢𝑒, and when an element is removed from the queue, the concept
is called 𝐷𝑒𝑄𝑢𝑒𝑢𝑒.
𝐷𝑒𝑄𝑢𝑒𝑢𝑒𝑖𝑛𝑔 an empty queue is called 𝑢𝑛𝑑𝑒𝑟𝑓𝑙𝑜𝑤 and 𝐸𝑛𝑄𝑢𝑒𝑢𝑖𝑛𝑔 an element in a full queue is called 𝑜𝑣𝑒𝑟𝑓𝑙𝑜𝑤.
Generally, we treat them as exceptions.
5.4 Exceptions
Similar to other ADTs, executing 𝐷𝑒𝑄𝑢𝑒𝑢𝑒 on an empty queue throws an “𝐸𝑚𝑝𝑡𝑦 𝑄𝑢𝑒𝑢𝑒 𝐸𝑥𝑐𝑒𝑝𝑡𝑖𝑜𝑛” and executing
𝐸𝑛𝑄𝑢𝑒𝑢𝑒 on a full queue throws “𝐹𝑢𝑙𝑙 𝑄𝑢𝑒𝑢𝑒 𝐸𝑥𝑐𝑒𝑝𝑡𝑖𝑜𝑛”.
5.5 Applications
Following are some of the applications that use queues.
Direct Applications
• Operating systems schedule jobs (with equal priority) in the order of arrival (e.g., a print queue).
• Simulation of real-world queues such as lines at a ticket counter or any other first-come first-served
scenario requires a queue.
• Multiprogramming.
• Asynchronous data transfer (file IO, pipes, sockets).
• Waiting times of customers at a call center.
• Determining the number of cashiers needed at a supermarket.
Indirect Applications
• Auxiliary data structure for algorithms
• Component of other data structures
5.6 Implementation
There are many ways (similar to Stacks) of implementing queue operations and some of the commonly used
methods are listed below.
• Simple circular array based implementation
• Dynamic circular array based implementation
• Linked list implementation
Simple Circular Array Implementation
This simple implementation of Queue ADT uses an array. In the array, we add elements circularly and use two
variables to keep track of the start element and end element. Generally, 𝑓𝑟𝑜𝑛𝑡 is used to indicate the start element
and 𝑟𝑒𝑎𝑟 is used to indicate the end element in the queue. The array storing the queue elements may become full.
An 𝐸𝑛𝑄𝑢𝑒𝑢𝑒 operation will then throw a 𝑓𝑢𝑙𝑙 𝑞𝑢𝑒𝑢𝑒 𝑒𝑥𝑐𝑒𝑝𝑡𝑖𝑜𝑛. Similarly, if we try deleting an element from an empty
queue it will throw 𝑒𝑚𝑝𝑡𝑦 𝑞𝑢𝑒𝑢𝑒 𝑒𝑥𝑐𝑒𝑝𝑡𝑖𝑜𝑛.
With stacks, the push method increased the single index field (the top) by one, while pop decreased it by one.
With queues, however, there are two index fields, 𝑓𝑟𝑜𝑛𝑡 and 𝑟𝑒𝑎𝑟, and each is only ever increased: 𝑓𝑟𝑜𝑛𝑡 is
increased by one by the deQueue method, while 𝑟𝑒𝑎𝑟 is increased by one by the 𝑒𝑛𝑄𝑢𝑒𝑢𝑒 method. So as items
leave and join the queue, the section of the array used to store the queue items gradually shifts along the array
and eventually reaches its end. This is different from stacks, where the section of the array used to store the items
only reaches the end of the array if the size of the stack exceeds the size of the array.
To deal with this problem, a "wraparound" technique is used with queues implemented with arrays. When either
the 𝑓𝑟𝑜𝑛𝑡 or 𝑟𝑒𝑎𝑟 field is increased to the point where it would index past the end of the array, it is set back to 0
(it wraps around). Thus we can reach a state where the front section of the queue is in the higher indexed part of
the array and the rear section of the queue is in the lower indexed part.
Note: Initially, both front and rear point to -1, which indicates that the queue is empty.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
struct Queue {
int front, rear;
int capacity;
int size;
int *array;
};
// Create an empty queue
struct Queue *createQueue(int capacity) {
struct Queue *Q = malloc(sizeof(struct Queue));
if(!Q)
return NULL;
Q->capacity = capacity;
Q->front = Q->rear = -1;
Q->size = 0;
Q->array= malloc(Q->capacity * sizeof(int));
if(!Q->array)
return NULL;
return Q;
}
// Returns queue size
int size(struct Queue *Q) {
return Q->size;
}
// Returns the front element of the Queue
int frontElement(struct Queue *Q) {
return Q->array[Q->front];
}
// Returns the Rear Element of the Queue
int rearElement(struct Queue *Q) {
return Q->array[Q->rear];
}
// Checks if the Queue is empty or not
int isEmpty(struct Queue *Q) {
// if the condition is true then 1 is returned else 0 is returned
return (Q->size == 0);
}
// Checks if the Queue is full or not
int isFull(struct Queue *Q) {
// if the condition is true then 1 is returned else 0 is returned
return (Q->size == Q->capacity);
}
// Adding elements in Queue
void enQueue(struct Queue *Q, int data) {
if(isFull(Q))
printf("Queue overflow\n");
else {
Q->rear = (Q->rear+1) % Q->capacity;
Q->array[Q->rear]= data;
if(Q->front == -1)
Q->front = Q->rear;
Q->size += 1;
}
}
// Removes an element from front of the queue
int deQueue(struct Queue *Q) {
int data = INT_MIN; //or element which does not exist in Queue
if(isEmpty(Q)){
printf("Queue is empty\n");
return data;
}
data = Q->array[Q->front];
if(Q->front == Q->rear) {
Q->front = Q->rear = -1;
Q->size = 0;
} else {
Q->front = (Q->front+1) % Q->capacity;
Q->size -= 1;
}
return data;
}
void deleteQueue(struct Queue *Q) {
if(Q) {
if(Q->array)
free(Q->array);
free(Q);
}
}
int main() {
// Initializing Queue
struct Queue *Q;
Q = createQueue(4);
// Adding elements in Queue
enQueue(Q, 1);
enQueue(Q, 3);
enQueue(Q, 7);
enQueue(Q, 5);
enQueue(Q, 10);
// Printing size of Queue
printf("\nSize of queue : %d\n", size(Q));
// Printing front and rear element of Queue
printf("Front element : %d\n", frontElement(Q));
printf("Rear element : %d\n", rearElement(Q));
// Removing Element from Queue
printf("\nDequeued element : %d\n", deQueue(Q));
printf("Dequeued element : %d\n", deQueue(Q));
printf("Dequeued element : %d\n", deQueue(Q));
printf("Dequeued element : %d\n", deQueue(Q));
printf("Dequeued element : %d\n", deQueue(Q));
printf("Dequeued element : %d\n", deQueue(Q));
enQueue(Q, 15);
enQueue(Q, 100);
// Printing size of Queue
printf("\nSize of queue : %d\n", size(Q));
// Printing front and rear element of Queue
printf("Front element : %d\n", frontElement(Q));
printf("Rear element : %d\n", rearElement(Q));
// Removing Queue
deleteQueue(Q);
return 0;
}
Performance
Let 𝑛 be the number of elements in the queue.
Space Complexity (for 𝑛 enQueue operations)   O(𝑛)
Time Complexity of enQueue()                  O(1)
Time Complexity of deQueue()                  O(1)
Time Complexity of size()                     O(1)
Time Complexity of isEmpty()                  O(1)
Time Complexity of isFull()                   O(1)
Time Complexity of deleteQueue()              O(1)
Linked List Implementation
#include <stdio.h>
#include <stdlib.h>
struct ListNode {
int data;
struct ListNode *next;
};
struct Queue {
struct ListNode *front;
struct ListNode *rear;
};
// Create an empty queue
struct Queue *createQueue() {
    struct Queue *Q = malloc(sizeof(struct Queue));
    if(!Q)
        return NULL;
    Q->front = Q->rear = NULL;
    return Q;
}
// Returns queue size
int size(struct Queue *Q) {
struct ListNode *temp = Q->front;
int count = 0;
if(Q->front == NULL && Q->rear == NULL)
return 0;
while(temp != Q->rear){
count++;
temp = temp->next;
}
if(temp == Q->rear)
count++;
return count;
}
// Returns the front element of the Queue
int frontElement(struct Queue *Q) {
return Q->front->data;
}
// Returns the Rear Element of the Queue
int rearElement(struct Queue *Q) {
return Q->rear->data;
}
// Checks if the Queue is empty or not
void isEmpty(struct Queue *Q) {
if (Q->front == NULL && Q->rear == NULL)
printf("Empty Queue\n");
else
printf("Queue is not Empty\n");
}
// Adding elements in Queue
void enQueue(struct Queue *Q, int num) {
struct ListNode *temp;
temp = (struct ListNode *)malloc(sizeof(struct ListNode));
temp->data = num;
temp->next = NULL;
if (Q->rear == NULL) {
Q->front = Q->rear = temp;
} else {
Q->rear->next = temp;
Q->rear = temp;
}
}
// Removes an element from front of the queue
void deQueue(struct Queue *Q) {
struct ListNode *temp;
if (Q->front == NULL) {
printf("\nQueue is Empty \n");
return;
} else {
temp = Q->front;
Q->front = Q->front->next;
if(Q->front == NULL){
Q->rear = NULL;
}
printf("Removed Element : %d\n", temp->data);
free(temp);
}
}
// Prints the queue from front to rear
void printQueue(struct Queue *Q) {
    struct ListNode *temp = Q->front;
    while (temp != NULL) {
        printf("%d ", temp->data);
        temp = temp->next;
    }
    printf("\n");
}
Performance
Let 𝑛 be the number of elements in the queue, then
Space Complexity (for 𝑛 enQueue operations)   O(𝑛)
Time Complexity of enQueue()                  O(1)
Time Complexity of deQueue()                  O(1)
Time Complexity of size()                     O(𝑛)
Time Complexity of isEmpty()                  O(1)
Comparison of Implementations
Note: The comparison is very similar to that of stack implementations; refer to the 𝑆𝑡𝑎𝑐𝑘𝑠 chapter.
Time Complexity: From the algorithm, if the stack S2 is not empty then the complexity is O(1). If the stack S2 is
empty, then we need to transfer the elements from S1 to S2. But if we observe carefully, each element is
transferred from S1 to S2 at most once, and popped from S2 at most once, over its lifetime in the queue. Spreading
the transfer cost over the operations that benefit from it, the amortized complexity of the pop operation is O(1).
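Since the code for this approach does not appear in full above, here is a minimal sketch of the two-stack queue
(the struct and function names are illustrative; push, pop and isEmptyStack are assumed from the 𝑆𝑡𝑎𝑐𝑘𝑠 chapter):
struct QueueWithStacks {
    struct Stack *S1;   // used for enQueue
    struct Stack *S2;   // used for deQueue
};
void enQueueWS(struct QueueWithStacks *Q, int data) {
    push(Q->S1, data);                 // O(1)
}
int deQueueWS(struct QueueWithStacks *Q) {
    if (isEmptyStack(Q->S2))           // refill only when S2 is empty
        while (!isEmptyStack(Q->S1))
            push(Q->S2, pop(Q->S1));   // reverses order: front ends up on top
    return pop(Q->S2);                 // amortized O(1)
}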
Problem-3 Show how you can efficiently implement one stack using two queues. Analyze the running time
of the stack operations.
Solution: Yes, it is possible to implement the Stack ADT using 2 implementations of the Queue ADT. One of the
queues will be used to store the elements and the other to hold them temporarily during the 𝑝𝑜𝑝 and 𝑡𝑜𝑝 methods.
The 𝑝𝑢𝑠ℎ method would 𝑒𝑛𝑞𝑢𝑒𝑢𝑒 the given element onto the storage queue. The 𝑡𝑜𝑝 method would transfer all but
the last element from the storage queue onto the temporary queue, save the front element of the storage queue to
be returned, transfer the last element to the temporary queue, then transfer all elements back to the storage
queue. The 𝑝𝑜𝑝 method would do the same as top, except instead of transferring the last element onto the
temporary queue after saving it for return, that last element would be discarded. Let Q1 and Q2 be the two queues
to be used in the implementation of stack. All we have to do is to define the 𝑝𝑢𝑠ℎ and 𝑝𝑜𝑝 operations for the stack.
struct Stack {
struct Queue *Q1;
struct Queue *Q2;
};
In the algorithms below, we make sure that one queue is always empty.
Push Operation Algorithm: Insert the element in whichever queue is not empty.
• Check whether queue Q1 is empty or not. If Q1 is empty then Enqueue the element into Q2.
• Otherwise enQueue the element into Q1.
void push(struct Stack *S, int data) {
    if(isEmpty(S->Q1))
        enQueue(S->Q2, data);
    else
        enQueue(S->Q1, data);
}
Time Complexity: O(1).
Pop Operation Algorithm: Transfer 𝑛 − 1 elements to the other queue and delete last from queue for performing
pop operation.
• If queue Q1 is not empty then transfer 𝑛 − 1 elements from Q1 to Q2 and then, deQueue the last element
of Q1 and return it.
• If queue Q2 is not empty then transfer 𝑛 − 1 elements from Q2 to Q1 and then, deQueue the last element
of Q2 and return it.
int pop(struct Stack *S) {
    int i, count;
    if(isEmpty(S->Q2)) {
        count = size(S->Q1);
        i = 0;
        while(i < count - 1) {
            enQueue(S->Q2, deQueue(S->Q1));
            i++;
        }
        return deQueue(S->Q1);
    }
    else {
        count = size(S->Q2);
        i = 0;
        while(i < count - 1) {
            enQueue(S->Q1, deQueue(S->Q2));
            i++;
        }
        return deQueue(S->Q2);
    }
}
Time Complexity: Running time of pop operation is O(𝑛) as each time pop is called, we are transferring all the
elements from one queue to the other.
Problem-4 Maximum sum in sliding window: Given array A[] with sliding window of size 𝑤 which is moving
from the very left of the array to the very right. Assume that we can only see the 𝑤 numbers in the window.
Each time the sliding window moves rightwards by one position. For example: The array is [1 3 -1 -3 5 3 6 7],
and 𝑤 is 3.
Window position Max
[1 3 -1] -3 5 3 6 7 3
1 [3 -1 -3] 5 3 6 7 3
1 3 [-1 -3 5] 3 6 7 5
1 3 -1 [-3 5 3] 6 7 5
1 3 -1 -3 [5 3 6] 7 6
1 3 -1 -3 5 [3 6 7] 7
Input: A long array A[], and a window width 𝑤. Output: An array B[], B[i] is the maximum value from A[i] to
A[i+w-1].
Requirement: Find a good optimal way to get B[i]
Solution: This problem can be solved with a double-ended queue (which supports insertion and deletion at both
ends). Refer to the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 chapter for the algorithms; a sketch of the idea follows.
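Here is a minimal sketch (the array-backed deque and the function name maxSlidingWindow are ours): the deque
holds indices of useful window elements whose values are decreasing, so the front always holds the index of the
current maximum.
#include <stdlib.h>
void maxSlidingWindow(int A[], int n, int w, int B[]) {
    int *deq = malloc(n * sizeof(int));   // deque of indices: deq[front..back]
    int front = 0, back = -1;
    for (int i = 0; i < n; i++) {
        if (front <= back && deq[front] <= i - w)
            front++;                      // drop the index that left the window
        while (front <= back && A[deq[back]] <= A[i])
            back--;                       // drop smaller values: never a max again
        deq[++back] = i;
        if (i >= w - 1)
            B[i - w + 1] = A[deq[front]]; // front of deque is the window maximum
    }
    free(deq);
}
Each index enters and leaves the deque at most once, so the whole scan is O(n).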
Problem-5 Given a queue Q containing 𝑛 elements, transfer these items on to a stack S (initially empty) so
that front element of Q appears at the top of the stack and the order of all other items is preserved. Using
enQueue and deQueue operations for the queue, and push and pop operations for the stack, outline an
efficient O(𝑛) algorithm to accomplish the above task, using only a constant amount of additional storage.
Solution: Assume the elements of queue Q are 𝑎1 , 𝑎2 … 𝑎𝑛 . Dequeuing all elements and pushing them onto the
stack results in a stack with 𝑎𝑛 at the top and 𝑎1 at the bottom; the queue is now empty. Next, pop every element
and enqueue it back into the queue, so the queue holds 𝑎𝑛 , 𝑎𝑛−1 , … 𝑎1 . Finally, dequeue all elements and push
them onto the stack once more: now 𝑎1 is at the top of the stack and the relative order of all other items is
preserved. Each of the three passes takes O(𝑛) time, since deQueue, enQueue, push and pop each require constant
time per operation. As in big-oh arithmetic we can ignore constant factors, the whole process is carried out in
O(𝑛) time. The additional storage needed (beyond the queue and the stack themselves) only has to be big enough
to temporarily hold one item.
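The three passes as a sketch (assuming the Queue and Stack ADTs from this chapter; the function name is ours):
void queueToStack(struct Queue *Q, struct Stack *S) {
    while (!isEmpty(Q))          // pass 1: a1..an -> stack (an on top)
        push(S, deQueue(Q));
    while (!isEmptyStack(S))     // pass 2: stack -> queue (an at front)
        enQueue(Q, pop(S));
    while (!isEmpty(Q))          // pass 3: queue -> stack (a1 on top)
        push(S, deQueue(Q));
}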
Problem-6 A queue is set up in a circular array A[0..n - 1] with front and rear defined as usual. Assume that
𝑛 − 1 locations in the array are available for storing the elements (with the other element being used to detect
full/empty condition). Give a formula for the number of elements in the queue in terms of 𝑟𝑒𝑎𝑟, 𝑓𝑟𝑜𝑛𝑡, and 𝑛.
Solution: Consider the following figure to get a clear idea of the queue.
• Rear of the queue is somewhere clockwise from the front.
• To enQueue an element, we move 𝑟𝑒𝑎𝑟 one position clockwise and write the element in that position.
• To deQueue, we simply move 𝑓𝑟𝑜𝑛𝑡 one position clockwise.
• Queue migrates in a clockwise direction as we enQueue and deQueue.
• Emptiness and fullness to be checked carefully.
• Analyze the possible situations (make some drawings to see where 𝑓𝑟𝑜𝑛𝑡 and 𝑟𝑒𝑎𝑟 are when the queue is
empty, and partially and totally filled). We will get this:
𝑁𝑢𝑚𝑏𝑒𝑟 𝑂𝑓 𝐸𝑙𝑒𝑚𝑒𝑛𝑡𝑠 = { 𝑟𝑒𝑎𝑟 − 𝑓𝑟𝑜𝑛𝑡          if 𝑟𝑒𝑎𝑟 ≥ 𝑓𝑟𝑜𝑛𝑡
                      { 𝑟𝑒𝑎𝑟 − 𝑓𝑟𝑜𝑛𝑡 + 𝑛      otherwise
Here 𝑓𝑟𝑜𝑛𝑡 indexes the unused slot just before the first element, so the queue is empty exactly when 𝑟𝑒𝑎𝑟 == 𝑓𝑟𝑜𝑛𝑡,
and both cases combine into the single expression (𝑟𝑒𝑎𝑟 − 𝑓𝑟𝑜𝑛𝑡 + 𝑛) mod 𝑛.
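The same formula as code (a sketch, under the convention just described):
int numberOfElements(int front, int rear, int n) {
    return (rear - front + n) % n;
}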
Problem-7 What is the most appropriate data structure to print elements of queue in reverse order?
Solution: Stack.
Problem-8 Implement a doubly ended queue. A double-ended queue (deque) is an abstract data type that
generalizes a queue: elements can be added to or removed from either the front (head) or the back (tail). It is
also often called a head-tail linked list.
Solution: We maintain the deque as a circular doubly linked list, so the node needs a 𝑝𝑟𝑒𝑣 pointer in addition
to 𝑛𝑒𝑥𝑡:
struct ListNode {
    int data;
    struct ListNode *prev;
    struct ListNode *next;
};
void pushBackDEQ(struct ListNode **head, int data){
    struct ListNode *newNode = (struct ListNode*) malloc(sizeof(struct ListNode));
    newNode->data = data;
    if(*head == NULL){
        *head = newNode;
        (*head)->next = *head;
        (*head)->prev = *head;
    }
    else{
        newNode->prev = (*head)->prev;
        newNode->next = *head;
        (*head)->prev->next = newNode;
        (*head)->prev = newNode;
    }
}
void pushFrontDEQ(struct ListNode **head, int data){
    pushBackDEQ(head, data);
    *head = (*head)->prev;
}
int popBackDEQ(struct ListNode **head){
    int data;
    if( (*head)->prev == *head ){       // single element left
        data = (*head)->data;
        free(*head);
        *head = NULL;
    }
    else{
        struct ListNode *newTail = (*head)->prev->prev;
        data = (*head)->prev->data;
        newTail->next = *head;
        free((*head)->prev);
        (*head)->prev = newTail;
    }
    return data;
}
int popFrontDEQ(struct ListNode **head){
    int data;
    *head = (*head)->next;              // old head becomes the tail ...
    data = popBackDEQ(head);            // ... and is removed from the back
    return data;
}
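Illustrative usage of the deque above:
#include <stdio.h>
int main() {
    struct ListNode *deq = NULL;
    pushBackDEQ(&deq, 2);
    pushBackDEQ(&deq, 3);
    pushFrontDEQ(&deq, 1);                // deque is now: 1 2 3
    printf("%d\n", popFrontDEQ(&deq));    // prints 1
    printf("%d\n", popBackDEQ(&deq));     // prints 3
    return 0;
}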
Problem-9 Given a stack of integers, how do you check whether each successive pair of numbers in the stack
is consecutive or not. The pairs can be increasing or decreasing, and if the stack has an odd number of elements,
the element at the top is left out of a pair. For example, if the stack of elements are [4, 5, -2, -3, 11, 10, 5, 6,
20], then the output should be true because each of the pairs (4, 5), (-2, -3), (11, 10), and (5, 6) consists of
consecutive numbers.
Solution:
int checkStackPairwiseOrder(struct Stack *s) {
struct Queue *q = createQueue();
int pairwiseOrdered = 1;
while (!isEmptyStack(s))
enQueue (q, pop(s));
while (!isEmptyQueue(q))
push(s, deQueue(q));
while (!isEmptyStack(s)) {
int n = pop(s);
enQueue(q, n);
if (!isEmptyStack(s)) {
int m = pop(s);
enQueue(q, m);
if (abs(n - m) != 1) {
pairwiseOrdered = 0;
}
}
}
while (!isEmptyQueue(q))
push(s, deQueue(q));
return pairwiseOrdered;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-10 Given a queue of integers, rearrange the elements by interleaving the first half of the list with the
second half of the list. For example, suppose a queue stores the following sequence of values: [11, 12, 13, 14,
15, 16, 17, 18, 19, 20]. Consider the two halves of this list: first half: [11, 12, 13, 14, 15] second half: [16, 17,
18, 19, 20]. These are combined in an alternating fashion to form a sequence of interleave pairs: the first values
from each half (11 and 16), then the second values from each half (12 and 17), then the third values from each
half (13 and 18), and so on. In each pair, the value from the first half appears before the value from the second
half. Thus, after the call, the queue stores the following values: [11, 16, 12, 17, 13, 18, 14, 19, 15, 20].
Solution: Although this can be done with an auxiliary queue, the code below uses a stack. Here is the algorithm
(assume the queue has an even number of elements):
• Push the first half of the queue onto the stack; popping the stack back into the queue appends the first
half, reversed, behind the second half.
• Move the second half (now at the front) to the back of the queue; the queue now holds the reversed first
half followed by the second half.
• Push the first half onto the stack again; this restores its original order, with the first element on top.
• Finally, alternate: pop one element from the stack and enqueue it, then dequeue one element (from the
second half) and enqueue it.
void interLeavingQueue(struct Queue *q) {
if (size(q) % 2 != 0) return;
struct Stack *s = createStack();
int halfSize = size(q) / 2;
for (int i = 0; i < halfSize; i++)
push(s, deQueue(q));
while (!isEmptyStack(s))
enQueue (q, pop(s));
for (int i = 0; i < halfSize; i++)
enQueue (q, deQueue(q));
for (int i = 0; i < halfSize; i++)
push(s, deQueue(q));
while (!isEmptyStack(s)) {
enQueue (q, pop(s));
enQueue (q, deQueue(q));
}
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-11 Given an integer 𝑘 and a queue of integers, how do you reverse the order of the first 𝑘 elements
of the queue, leaving the other elements in the same relative order? For example, if 𝑘=4 and queue has the
elements [10, 20, 30, 40, 50, 60, 70, 80, 90]; the output should be [40, 30, 20, 10, 50, 60, 70, 80, 90].
Solution: To reverse the order of the first k elements of the queue, we can follow the following algorithm:
• Initialize an empty stack.
• Dequeue the first k elements from the queue and push them onto the stack.
• Enqueue the elements from the stack back into the queue.
• Dequeue the remaining elements from the queue and enqueue them back into the queue.
void reverseQueueFirstKElements(int k, struct Queue *q) {
if (q == NULL || k > size(q)) {
return;
}
else if (k > 0) {
struct Stack *s = createStack();
for (int i = 0; i < k; i++) {
push(s, deQueue(q));
}
while (!isEmptyStack(s)) {
enQueue(q, pop(s));
}
for (int i = 0; i < size(q) - k; i++) { // wrap around rest of elements
enQueue (q, deQueue(q));
}
}
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-12 Number of Recent Calls: We have a RecentCounter class which counts the number of recent
requests within a certain time frame. Implement the RecentCounter class:
• RecentCounter() Initializes the counter with zero recent requests.
• int ping(int t) Adds a new request at time t, where t represents some time in milliseconds, and returns the
number of requests that has happened in the past 3000 milliseconds (including the new request).
Specifically, return the number of requests that have happened in the inclusive range [t - 3000, t].
Solution: The RecentCounter class can be implemented using a queue data structure. When a new request is
made, it is added to the end of the queue. Then, we iterate through the queue from the front, removing any
elements that are older than 3000 milliseconds from the current time t. Finally, we return the size of the queue,
which represents the number of requests made in the past 3000 milliseconds.
#include <stdio.h>
#include <stdlib.h>
#define MAX_QUEUE_SIZE 10000
typedef struct {
int queue[MAX_QUEUE_SIZE];
int front;
int rear;
} RecentCounter;
RecentCounter* recentCounterCreate() {
RecentCounter* obj = (RecentCounter*)malloc(sizeof(RecentCounter));
obj->front = 0;
obj->rear = -1;
return obj;
}
int ping(RecentCounter* obj, int t) {
obj->rear++;
obj->queue[obj->rear] = t;
while (obj->queue[obj->front] < t - 3000) {
obj->front++;
}
return obj->rear - obj->front + 1;
}
void recentCounterFree(RecentCounter* obj) {
free(obj);
}
int main() {
RecentCounter* obj = recentCounterCreate();
printf("%d\n", ping(obj, 100)); // expected output: 1
printf("%d\n", ping(obj, 3001)); // expected output: 0
printf("%d\n", ping(obj, 3002)); // expected output: 1
printf("%d\n", ping(obj, 7000)); // expected output: 0
recentCounterFree(obj);
return 0;
}
A single call to the ping() function may take O(n) time in the worst case, where n is the number of requests
currently in the queue, because it may need to remove every request older than 3000 ms from the front. However,
each request is enqueued once and dequeued at most once, so over any sequence of calls the amortized cost of
ping() is O(1).
The space complexity of the RecentCounter implementation is O(n), where n is the number of requests retained
in the queue; this implementation additionally caps the buffer at MAX_QUEUE_SIZE (10000) entries. Note that
because front only moves forward, a long-running counter would eventually exhaust the fixed array; a circular
buffer (as in the next problem) avoids this.
Problem-13 Design Hit Counter: Design a hit counter which counts the number of hits received in the past
5 minutes (i.e., the past 300 seconds). Your system should accept a timestamp parameter (in seconds
granularity), and you may assume that calls are being made to the system in chronological order (i.e., timestamp
is monotonically increasing). Several hits may arrive roughly at the same time. Implement the HitCounter class:
• HitCounter() Initializes the object of the hit counter system.
• void hit(int timestamp) Records a hit that happened at timestamp (in seconds). Several hits may happen
at the same timestamp.
• int getHits(int timestamp) Returns the number of hits in the past 5 minutes from timestamp (i.e., the past
300 seconds).
Solution: Here's the algorithm for designing a hit counter:
• Create a data structure to store the timestamps of the hits, such as a queue or a circular buffer.
• When a hit occurs, add its timestamp to the data structure.
• When getHits() is called, iterate through the data structure and count the number of timestamps that fall
within the past 5 minutes from the given timestamp.
Here's a high-level algorithm for designing a hit counter class that counts the number of hits received in the past
5 minutes:
• Create a circular buffer of up to 300 buckets, each holding a timestamp and the number of hits at that
timestamp.
• Initialize the head index of the buffer to 0, the tail index to -1, and the size to 0.
• Implement a hit method that takes a timestamp parameter and records a hit that happened at that
timestamp. If the timestamp is the same as the tail (newest) timestamp in the buffer, increment that
bucket's count. Otherwise, append a new bucket with count 1; if the buffer already holds 300 buckets,
evict the oldest one by advancing the head index (at most 300 distinct timestamps can lie inside a 300-
second window, so the evicted bucket is always outside the window).
• Implement a getHits method that takes a timestamp parameter and returns the number of hits in the past
5 minutes from that timestamp. Iterate over the buckets starting from the tail (newest) and moving
backwards, summing the counts of the buckets that fall within the 5-minute window. As soon as a bucket
falls outside the window, stop iterating, since all older buckets are outside it too.
This algorithm should provide an efficient solution to the problem of counting the number of hits received in the
past 5 minutes, assuming that the calls are being made to the system in chronological order.
#include <stdlib.h>
typedef struct {
    int *timestamps;   // timestamps[i] holds the time of bucket i
    int *counts;       // counts[i] holds the number of hits at that time
    int head;          // index of the oldest bucket
    int tail;          // index of the newest bucket
    int size;          // number of buckets in use
} HitCounter;
/** Initialize your data structure here. */
HitCounter* hitCounterCreate() {
    HitCounter *hc = (HitCounter *) malloc(sizeof(HitCounter));
    hc->timestamps = (int *) calloc(300, sizeof(int));
    hc->counts = (int *) calloc(300, sizeof(int));
    hc->head = 0;
    hc->tail = -1;     // no buckets yet
    hc->size = 0;
    return hc;
}
/** Record a hit.
@param timestamp - The current timestamp (in seconds granularity). */
void hitCounterHit(HitCounter* obj, int timestamp) {
    if (obj->size > 0 && timestamp == obj->timestamps[obj->tail]) {
        // Repeated timestamp: just bump the count of the newest bucket
        obj->counts[obj->tail]++;
    } else {
        // New timestamp: append a bucket, evicting the oldest if full
        obj->tail = (obj->tail + 1) % 300;
        obj->timestamps[obj->tail] = timestamp;
        obj->counts[obj->tail] = 1;
        if (obj->size == 300)
            obj->head = (obj->head + 1) % 300;  // evicted bucket is stale
        else
            obj->size++;
    }
}
/** Return the number of hits in the past 5 minutes.
@param timestamp - The current timestamp (in seconds granularity). */
int hitCounterGetHits(HitCounter* obj, int timestamp) {
    int total = 0;
    for (int i = 0; i < obj->size; i++) {
        // Walk from the newest bucket backwards; once a bucket falls
        // outside the 5-minute window, all older ones do too
        int j = (obj->tail - i + 300) % 300;
        if (timestamp - obj->timestamps[j] < 300)
            total += obj->counts[j];
        else
            break;
    }
    return total;
}
/** Deallocates memory previously allocated for the hit counter. */
void hitCounterFree(HitCounter* obj) {
    free(obj->timestamps);
    free(obj->counts);
    free(obj);
}
The hit method has O(1) time complexity since it only appends to or updates the newest bucket of the circular
buffer, which has a fixed capacity of 300.
The getHits method has O(min(n, 300)) = O(1) time complexity, where n is the number of distinct timestamps
in the buffer. The method walks backwards from the newest bucket and stops at the first bucket outside the 5-
minute window; in the worst case all buckets fall within the window, but since the buffer holds at most 300
buckets the iteration is bounded by 300 steps. The space used is likewise a constant 2 × 300 integers, no matter
how many hits arrive.
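Illustrative usage (timestamps are in seconds and monotonically increasing):
#include <stdio.h>
int main() {
    HitCounter *hc = hitCounterCreate();
    hitCounterHit(hc, 1);
    hitCounterHit(hc, 2);
    hitCounterHit(hc, 3);
    printf("%d\n", hitCounterGetHits(hc, 4));    // 3: hits at 1, 2, 3
    hitCounterHit(hc, 300);
    printf("%d\n", hitCounterGetHits(hc, 300));  // 4: all hits in [1, 300]
    printf("%d\n", hitCounterGetHits(hc, 301));  // 3: the hit at 1 expired
    hitCounterFree(hc);
    return 0;
}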
Chapter 6: Trees
6.1 What is a Tree?
A 𝑡𝑟𝑒𝑒 is a data structure similar to a linked list, but instead of each node pointing simply to the next node in a
linear fashion, each node points to a number of nodes. A tree is an example of a non-linear data structure. A 𝑡𝑟𝑒𝑒
structure is a way of representing the hierarchical nature of a structure in a graphical form.
In trees ADT (Abstract Data Type), the order of the elements is not important. If we need ordering information,
linear data structures like linked lists, stacks, queues, etc. can be used.
6.2 Glossary
              A      (root)
           /  |  \
          B   C   D
         / \  |  / \
        E   F G H   I
            | |
            J K
• The 𝑟𝑜𝑜𝑡 of a tree is the node with no parents. There can be at most one root node in a tree (node 𝐴 in the
above example).
• An 𝑒𝑑𝑔𝑒 refers to the link from parent to child (all links in the figure).
• A node with no children is called a 𝑙𝑒𝑎𝑓 node (𝐸, 𝐽, 𝐾, 𝐻 and 𝐼).
• Children of the same parent are called 𝑠𝑖𝑏𝑙𝑖𝑛𝑔𝑠 (𝐵, 𝐶 and 𝐷, the children of 𝐴, are siblings of one another,
as are 𝐸 and 𝐹, the children of 𝐵).
• A node 𝑝 is an 𝑎𝑛𝑐𝑒𝑠𝑡𝑜𝑟 of node 𝑞 if there exists a path from 𝑟𝑜𝑜𝑡 to 𝑞 and 𝑝 appears on the path. The node 𝑞
is called a 𝑑𝑒𝑠𝑐𝑒𝑛𝑑𝑎𝑛𝑡 of 𝑝. For example, 𝐴, 𝐶 and 𝐺 are the ancestors of 𝐾.
• The set of all nodes at a given depth is called the 𝑙𝑒𝑣𝑒𝑙 of the tree (𝐵, 𝐶 and 𝐷 are the same level). The root node
is at level zero.
[Example: node 1 (root) at Level-0; nodes 2 and 3 at Level-1; nodes 6 and 7 at Level-2]
• The 𝑑𝑒𝑝𝑡ℎ of a node is the length of the path from the root to the node (depth of 𝐺 is 2, 𝐴 − 𝐶 − 𝐺).
• The ℎ𝑒𝑖𝑔ℎ𝑡 of a node is the length of the path from that node to the deepest node. The height of a tree is the
length of the path from the root to the deepest node in the tree. A (rooted) tree with only one node (the root)
has a height of zero. In the previous example, the height of 𝐵 is 2 (𝐵 − 𝐹 − 𝐽).
• 𝐻𝑒𝑖𝑔ℎ𝑡 𝑜𝑓 𝑡ℎ𝑒 𝑡𝑟𝑒𝑒 is the maximum height among all the nodes in the tree and 𝑑𝑒𝑝𝑡ℎ 𝑜𝑓 𝑡ℎ𝑒 𝑡𝑟𝑒𝑒 is the maximum
depth among all the nodes in the tree. For a given tree, depth and height return the same value. But for
individual nodes we may get different results.
• The size of a node is the number of descendants it has including itself (the size of the subtree 𝐶 is 3).
• If every node in a tree has only one child (except leaf nodes) then we call such trees 𝑠𝑘𝑒𝑤 𝑡𝑟𝑒𝑒𝑠. If every node
has only left child then we call them 𝑙𝑒𝑓𝑡 𝑠𝑘𝑒𝑤 𝑡𝑟𝑒𝑒𝑠. Similarly, if every node has only right child then we call
them 𝑟𝑖𝑔ℎ𝑡 𝑠𝑘𝑒𝑤 𝑡𝑟𝑒𝑒𝑠.
[Figures: examples of left skew and right skew trees; a binary tree showing a root with its left subtree and right subtree]
Full Binary Tree: A binary tree is called 𝑓𝑢𝑙𝑙 𝑏𝑖𝑛𝑎𝑟𝑦 𝑡𝑟𝑒𝑒 if each node has exactly two children and all leaf nodes
are at the same level.
        root 1
        /      \
       2        3
      / \      / \
     4   5    6   7
Complete Binary Tree: Before defining the 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒 𝑏𝑖𝑛𝑎𝑟𝑦 𝑡𝑟𝑒𝑒, let us assume that the height of the binary tree
is ℎ. In complete binary trees, if we give numbering for the nodes by starting at the root (let us say the root node
has 1) then we get a complete sequence from 1 to the number of nodes in the tree. While traversing we should give
numbering for NULL pointers also. A binary tree is called 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒 𝑏𝑖𝑛𝑎𝑟𝑦 𝑡𝑟𝑒𝑒 if all leaf nodes are at height ℎ or
ℎ − 1 and also without any missing number in the sequence.
        root 1
        /      \
       2        3
      / \      /
     4   5    6
[Figures: complete binary trees of height ℎ = 1 (2¹ = 2 nodes at the last level) and height ℎ = 2 (up to 2² = 4 nodes at the last level)]
[Figure: a binary tree node containing data and pointers to its left and right children]
struct BinaryTreeNode {
int data;
struct BinaryTreeNode *left;
struct BinaryTreeNode *right;
};
Note: In trees, the default flow is from parent to children and it is not mandatory to show directed branches. For
our discussion, we assume both the representations shown below are the same.
Traversal Possibilities
Starting at the root of a binary tree, there are three main steps that can be performed and the order in which they
are performed defines the traversal type. These steps are: performing an action on the current node (referred to
as "visiting" the node and denoted with “𝐷”), traversing to the left child node (denoted with “𝐿”), and traversing to
the right child node (denoted with “𝑅”). This process can be easily described through recursion. Based on the
above definition there are 6 possibilities:
1. 𝐿𝐷𝑅: Process left subtree, process the current node data and then process right subtree
2. 𝐿𝑅𝐷: Process left subtree, process right subtree and then process the current node data
3. 𝐷𝐿𝑅: Process the current node data, process left subtree and then process right subtree
4. 𝐷𝑅𝐿: Process the current node data, process right subtree and then process left subtree
5. 𝑅𝐷𝐿: Process right subtree, process the current node data and then process left subtree
6. 𝑅𝐿𝐷: Process right subtree, process left subtree and then process the current node data
        root 1
        /      \
       2        3
      / \      / \
     4   5    6   7
PreOrder Traversal
In preorder traversal, each node is processed before (pre) either of its subtrees. This is the simplest traversal to
understand. However, even though each node is processed before the subtrees, it still requires that some
information must be maintained while moving down the tree. In the example above, 1 is processed first, then the
left subtree, and this is followed by the right subtree.
Therefore, processing must return to the right subtree after finishing the processing of the left subtree. To move
to the right subtree after processing the left subtree, we must maintain the root information. The obvious ADT for
such information is a stack. Because of its LIFO structure, it is possible to get the information about the right
subtrees back in the reverse order.
Preorder traversal is defined as follows:
• Visit the root.
• Traverse the left subtree in Preorder.
• Traverse the right subtree in Preorder.
The nodes of tree would be visited in the order: 1 2 4 5 3 6 7
void preOrder(struct BinaryTreeNode *root){
    if(root) {
        printf("%d ", root->data);
        preOrder(root->left);
        preOrder(root->right);
    }
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
InOrder Traversal
In Inorder Traversal the root is visited between the subtrees. Inorder traversal is defined as follows:
• Traverse the left subtree in Inorder.
• Visit the root.
• Traverse the right subtree in Inorder.
The nodes of tree would be visited in the order: 4 2 5 1 6 3 7
void inOrder(struct BinaryTreeNode *root){
    if(root) {
        inOrder(root->left);
        printf("%d ", root->data);
        inOrder(root->right);
    }
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
PostOrder Traversal
In postorder traversal, the root is visited after both subtrees. Postorder traversal is defined as follows:
• Traverse the left subtree in Postorder.
• Traverse the right subtree in Postorder.
• Visit the root.
The nodes of the tree would be visited in the order: 4 5 2 6 7 3 1
void postOrder(struct BinaryTreeNode *root){
    if(root) {
        postOrder(root->left);
        postOrder(root->right);
        printf("%d ", root->data);
    }
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
// Non-recursive postorder traversal, using one stack
void postOrderNonRecursive(struct BinaryTreeNode *root){
    struct Stack *S = createStack();
    struct BinaryTreeNode *previous = NULL;
    do {
        while(root != NULL){
            push(S, root);
            root = root->left;
        }
while(root == NULL && !isEmpty(S)){
root = top(S);
if(root->right == NULL || root->right == previous){
printf("%d ", root->data);
pop(S);
previous = root;
root = NULL;
}
else
root = root->right;
}
} while(!isEmpty(S)); // End of do-while
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-1 Give an algorithm for finding the maximum element in a binary tree.
Solution: One simple way is to recursively find the maximum of the left and right subtrees and compare those
with the root's data:
int findMaxRecursive(struct BinaryTreeNode *root){
    int root_val, left, right, max = INT_MIN;
    if(root != NULL) {
        root_val = root->data;
        left = findMaxRecursive(root->left);
        right = findMaxRecursive(root->right);
        // largest of the three values
        if(left > right)
            max = left;
        else
            max = right;
        if(root_val > max)
            max = root_val;
    }
    return max;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-2 Give an algorithm for finding the maximum element in binary tree without recursion.
Solution: Using level order traversal: just observe the element’s data while deleting.
int findMaxUsingLevelOrder(struct BinaryTreeNode *root){
struct BinaryTreeNode *temp;
int max = INT_MIN;
struct Queue *Q = createQueue();
enQueue(Q,root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
// largest of the three values
if(max < temp->data)
max = temp->data;
if(temp->left)
enQueue (Q, temp->left);
if(temp->right)
enQueue (Q, temp->right);
}
deleteQueue(Q);
return max;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-3 Give an algorithm for searching an element in binary tree.
Solution: Given a binary tree, return true if a node with data is found in the tree. Recurse down the tree, choose
the left or right branch by comparing data with each node’s data.
int findInBinaryTreeUsingRecursion(struct BinaryTreeNode *root, int data) {
int temp;
// Base case == empty tree, in that case, the data is not found so return false
if(root == NULL)
return 0;
else {
//see if found here
if(data == root->data)
    return 1;
else {
    // otherwise recur down the correct subtree
    temp = findInBinaryTreeUsingRecursion(root->left, data);
    if(temp != 0)
        return temp;
    else
        return findInBinaryTreeUsingRecursion(root->right, data);
}
}
return 0;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-4 Give an algorithm for searching an element in binary tree without recursion.
Solution: We can use level order traversal for solving this problem. The only change required in level order
traversal is, instead of printing the data, we just need to check whether the root data is equal to the element we
want to search.
int searchUsingLevelOrder(struct BinaryTreeNode *root, int data){
    struct BinaryTreeNode *temp;
    struct Queue *Q;
    if(!root)
        return 0;
    Q = createQueue();
    enQueue(Q, root);
    while(!isEmpty(Q)) {
        temp = deQueue(Q);
        if(data == temp->data) {
            deleteQueue(Q);
            return 1;   // found
        }
        if(temp->left)
            enQueue(Q, temp->left);
        if(temp->right)
            enQueue(Q, temp->right);
    }
    deleteQueue(Q);
    return 0;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-6 Give an algorithm for finding the size of a binary tree.
Solution: Calculate the size of left and right subtrees recursively, add 1 (current node) and return to its parent.
// Compute the number of nodes in a tree.
int sizeOfBinaryTree(struct BinaryTreeNode *root) {
    if(root == NULL)
        return 0;
    else
        return sizeOfBinaryTree(root->left) + 1 + sizeOfBinaryTree(root->right);
}
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-7 Can we solve Problem-6 without recursion?
Solution: Yes, using level order traversal.
int sizeofBTUsingLevelOrder(struct BinaryTreeNode *root){
struct BinaryTreeNode *temp;
if(!root)
return 0;
struct Queue *Q = createQueue();
int count = 0;
enQueue(Q,root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
count++;
if(temp->left)
    enQueue(Q, temp->left);
if(temp->right)
    enQueue(Q, temp->right);
}
deleteQueue(Q);
return count;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-8 Give an algorithm for printing the level order data in reverse order. For example, the output for
the below tree should be: 4 5 6 7 2 3 1
        root 1
        /      \
       2        3
      / \      / \
     4   5    6   7
Solution:
void levelOrderTraversalInReverse(struct BinaryTreeNode *root){
if(!root)
return;
struct Queue *Q = createQueue();
struct Stack *S = createStack();
struct BinaryTreeNode *temp;
enQueue(Q, root);
while(!isEmptyQueue(Q)) {
temp = deQueue(Q);
if(temp->right)
enQueue(Q, temp->right);
if(temp->left)
enQueue (Q, temp->left);
push(S, temp);
}
while(!isEmptyStack(S))
printf("%d",pop(S)->data);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-9 Give an algorithm for deleting a binary tree.
        root 1
        /      \
       2        3
      / \
     4   5
Solution: To delete a tree, we must traverse all the nodes of the tree and delete them one by one. So which traversal should
we use: Inorder, Preorder, Postorder or Level order Traversal?
Before deleting the parent node we should delete its children nodes first. We can use postorder traversal as it does
the work without storing anything. We can delete tree with other traversals also with extra space complexity. For
the following, tree nodes are deleted in order – 4, 5, 2, 3, 1.
void deleteBinaryTree(struct BinaryTreeNode *root){
    if(root == NULL)
        return;
    /* first delete both subtrees */
    deleteBinaryTree(root->left);
    deleteBinaryTree(root->right);
    // Delete current node only after deleting subtrees
    free(root);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-10 Give an algorithm for finding the height (or depth) of the binary tree.
Solution: Recursively calculate height of left and right subtrees of a node and assign height to the node as max
of the heights of two children plus 1. This is similar to 𝑃𝑟𝑒𝑂𝑟𝑑𝑒𝑟 tree traversal (and 𝐷𝐹𝑆 of Graph algorithms).
int heightOfBinaryTree(struct BinaryTreeNode *root){
int leftheight, rightheight;
if(root == NULL)
return 0;
else {
/* compute the depth of each subtree */
leftheight = heightOfBinaryTree(root->left);
rightheight = heightOfBinaryTree(root->right);
if(leftheight > rightheight)
return(leftheight + 1);
else
return(rightheight + 1);
}
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-11 Can we solve Problem-10 without recursion?
Solution: Yes, using level order traversal. This is similar to 𝐵𝐹𝑆 of Graph algorithms. End of level is identified with
NULL.
int findHeightofBinaryTree(struct BinaryTreeNode *root){
if(!root)
return 0;
int level = 0;
struct Queue *Q = createQueue();
enQueue(Q, root);
// End of first level
enQueue(Q, NULL);
while(!isEmpty(Q)) {
root=deQueue(Q);
// Completion of current level.
if(root==NULL) {
// Put another marker for next level.
if(!isEmpty(Q))
enQueue(Q, NULL);
level++;
}
else {
if(root->left)
enQueue(Q, root->left);
if(root->right)
enQueue(Q, root->right);
}
}
return level;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-12 Give an algorithm for finding the deepest node of the binary tree.
Solution:
struct BinaryTreeNode *deepestNodeinBinaryTree(struct BinaryTreeNode *root){
if(!root)
return NULL;
struct BinaryTreeNode *temp;
struct Queue *Q = createQueue();
enQueue(Q,root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
if(temp->left)
enQueue(Q, temp->left);
if(temp->right)
enQueue(Q, temp->right);
}
deleteQueue(Q);
return temp;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-13 Give an algorithm for deleting an element (assuming data is given) from binary tree.
Solution: The deletion of a node in a binary tree can be implemented as follows (a sketch is given below):
• Starting at root, find the node which we want to delete.
• Find the deepest node in the tree.
• Replace the data of the node to be deleted with the deepest node's data.
• Then delete the deepest node.
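A minimal sketch of these steps, reusing the level order idea from Problem-12 (the function name and the
two-pass structure are ours; it assumes the target exists and the tree has more than one node):
void deleteNodeInBinaryTree(struct BinaryTreeNode *root, int data) {
    struct BinaryTreeNode *target = NULL, *deepest = NULL, *temp;
    struct Queue *Q = createQueue();
    enQueue(Q, root);
    while (!isEmpty(Q)) {              // pass 1: find both nodes
        temp = deQueue(Q);
        deepest = temp;                // the last node dequeued is the deepest
        if (temp->data == data)
            target = temp;
        if (temp->left)
            enQueue(Q, temp->left);
        if (temp->right)
            enQueue(Q, temp->right);
    }
    deleteQueue(Q);
    if (target == NULL)
        return;                        // nothing to delete
    target->data = deepest->data;      // overwrite with the deepest node's data
    Q = createQueue();                 // pass 2: detach the deepest node
    enQueue(Q, root);
    while (!isEmpty(Q)) {
        temp = deQueue(Q);
        if (temp->left == deepest) { temp->left = NULL; break; }
        if (temp->right == deepest) { temp->right = NULL; break; }
        if (temp->left)
            enQueue(Q, temp->left);
        if (temp->right)
            enQueue(Q, temp->right);
    }
    deleteQueue(Q);
    free(deepest);
}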
Problem-14 Give an algorithm for finding the number of leaves in the binary tree without using recursion.
Solution: The set of nodes whose both left and right children are NULL are called leaf nodes.
int numberOfLeavesInBTusingLevelOrder(struct BinaryTreeNode *root){
    if(!root)
        return 0;
    struct BinaryTreeNode *temp;
    struct Queue *Q = createQueue();
    int count = 0;
    enQueue(Q, root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
if(!temp->left && !temp->right)
count++;
else {
if(temp->left)
enQueue(Q, temp->left);
if(temp->right)
enQueue(Q, temp->right);
}
}
deleteQueue(Q);
return count;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-15 Give an algorithm for finding the number of full nodes in the binary tree without using recursion.
Solution: The set of all nodes with both left and right children are called full nodes.
int numberOfFullNodesInBTusingLevelOrder(struct BinaryTreeNode *root){
if(!root)
return 0;
struct BinaryTreeNode *temp;
struct Queue *Q = createQueue();
int count = 0;
enQueue(Q,root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
if(temp->left && temp->right)
    count++;
if(temp->left)
    enQueue(Q, temp->left);
if(temp->right)
    enQueue(Q, temp->right);
}
deleteQueue(Q);
return count;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-16 Give an algorithm for finding the number of half nodes (nodes with only one child) in the binary
tree without using recursion.
Solution: The set of all nodes with either left or right child (but not both) are called half nodes.
int numberOfHalfNodesInBTusingLevelOrder(struct BinaryTreeNode *root){
if(!root)
return 0;
struct BinaryTreeNode *temp;
struct Queue *Q = createQueue();
int count = 0;
enQueue(Q,root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
// exactly one child; equivalently: (temp->left != NULL) ^ (temp->right != NULL)
if((!temp->left && temp->right) || (temp->left && !temp->right))
count++;
if(temp->left)
enQueue (Q, temp->left);
if(temp->right)
enQueue (Q, temp->right);
}
deleteQueue(Q);
return count;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-17 Given two binary trees, return true if they are structurally identical.
Solution:
Algorithm:
• If both trees are NULL then return true.
• If both trees are not NULL, then compare data and recursively check left and right subtree structures.
//Return true if they are structurally identical.
int areStructurullySameTrees(struct BinaryTreeNode *root1, struct BinaryTreeNode *root2) {
// both empty->1
if(root1==NULL && root2==NULL)
return 1;
if(root1==NULL || root2==NULL)
return 0;
// both non-empty->compare them
return(root1->data == root2->data && areStructurullySameTrees(root1->left, root2->left)
&& areStructurullySameTrees(root1->right, root2->right));
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for recursive stack.
Problem-18 Give an algorithm for finding the diameter of the binary tree. The diameter of a tree (sometimes
called the 𝑤𝑖𝑑𝑡ℎ) is the number of nodes on the longest path between two leaves in the tree.
Solution: To find the diameter of a tree, first calculate the heights of the left and right subtrees recursively. At
each node, the longest leaf-to-leaf path through that node contains left + right + 1 nodes; we track the maximum
of this quantity in *ptr while returning the height (in nodes) to the parent.
int diameterOfTree(struct BinaryTreeNode *root, int *ptr){
    int left, right;
    if(!root)
        return 0;
    left = diameterOfTree(root->left, ptr);
    right = diameterOfTree(root->right, ptr);
    if(left + right + 1 > *ptr)
        *ptr = left + right + 1;    // number of nodes on the path through root
    return max(left, right) + 1;    // height of this subtree, in nodes
}
// Alternative coding
static int height(struct BinaryTreeNode *root);
static int diameter(struct BinaryTreeNode *root) {
    if (root == NULL)
        return 0;
    int lHeight = height(root->left);
    int rHeight = height(root->right);
    int lDiameter = diameter(root->left);
    int rDiameter = diameter(root->right);
    return max(lHeight + rHeight + 1, max(lDiameter, rDiameter));
}
/* This function computes the "height" of a tree: the number of nodes along
   the longest path from the root node down to the farthest leaf node. */
static int height(struct BinaryTreeNode *root) {
    if (root == NULL)
        return 0;
    return 1 + max(height(root->left), height(root->right));
}
There is another solution whose complexity is O(n). The main idea is that each node stores the length of the
longest downward path through its left child (nMaxLeft) and through its right child (nMaxRight); since the
children already carry their own maxima, there is no need to call the height method repeatedly. The drawback is
that we need to add two extra fields to the node structure.
void findMaxLen(struct BinaryTreeNode *root, int *nMaxLen) {
    if (root == NULL)
        return;
    if (root->left == NULL)
        root->nMaxLeft = 0;
    else {
        findMaxLen(root->left, nMaxLen);
        int t = (root->left->nMaxLeft > root->left->nMaxRight) ?
                 root->left->nMaxLeft : root->left->nMaxRight;
        root->nMaxLeft = t + 1;
    }
    if (root->right == NULL)
        root->nMaxRight = 0;
    else {
        findMaxLen(root->right, nMaxLen);
        int t = (root->right->nMaxLeft > root->right->nMaxRight) ?
                 root->right->nMaxLeft : root->right->nMaxRight;
        root->nMaxRight = t + 1;
    }
    if (root->nMaxLeft + root->nMaxRight > *nMaxLen)
        *nMaxLen = root->nMaxLeft + root->nMaxRight;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-19 Give an algorithm for finding the level that has the maximum sum in the binary tree.
Solution: The logic is very much similar to finding the number of levels. The only change is, we need to keep track
of the sums as well.
int findLevelwithMaxSum(struct BinaryTreeNode *root){
if(!root)
return 0;
struct BinaryTreeNode *temp;
struct Queue *Q = createQueue();
int level=0, maxLevel=0, currentSum = 0, maxSum = 0;
enQueue(Q,root);
enQueue(Q,NULL); // End of first level.
while(!isEmpty(Q)) {
temp =deQueue(Q);
// If the current level is completed then compare sums
if(temp == NULL) {
if(currentSum> maxSum) {
maxSum = currentSum;
maxLevel = level;
}
currentSum = 0;
//place the indicator for end of next level at the end of queue
if(!isEmpty(Q))
enQueue(Q,NULL);
level++;
}
else {
    currentSum += temp->data;
    if(temp->left)
        enQueue(Q, temp->left);
    if(temp->right)
        enQueue(Q, temp->right);
}
}
return maxLevel;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-20 Given a binary tree, print out all its root-to-leaf paths.
Solution: Refer to comments in functions.
void printPathsRecur(struct BinaryTreeNode *root, int path[], int pathLen) {
    if(root == NULL)
        return;
    // append this node to the path array
    path[pathLen] = root->data;
    pathLen++;
    // it's a leaf, so print the path that led to here
    if(root->left == NULL && root->right == NULL)
        printArray(path, pathLen);
    else {
        // otherwise try both subtrees
        printPathsRecur(root->left, path, pathLen);
        printPathsRecur(root->right, path, pathLen);
    }
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-24 Give an algorithm for converting a tree to its mirror. Mirror of a tree is another tree with left and
right children of all non-leaf nodes interchanged. The trees below are mirrors to each other.
   root 1            root 1
   /     \           /     \
  2       3         3       2
 / \                       / \
4   5                     5   4
Solution:
struct BinaryTreeNode *mirrorOfBinaryTree(struct BinaryTreeNode *root){
struct BinaryTreeNode * temp;
if(root) {
mirrorOfBinaryTree(root->left);
mirrorOfBinaryTree(root->right);
/* swap the pointers in this node */
temp = root->left;
root->left = root->right;
root->right = temp;
}
return root;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-25 Given two trees, give an algorithm for checking whether they are mirrors of each other.
Solution:
int areMirrors(struct BinaryTreeNode * root1, struct BinaryTreeNode * root2) {
if(root1 == NULL && root2 == NULL) return 1;
if(root1 == NULL || root2 == NULL) return 0;
if(root1->data != root2->data)
return 0;
else
return areMirrors(root1->left, root2->right) && areMirrors(root1->right, root2->left);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-26 Give an algorithm for finding LCA (Least Common Ancestor) of two nodes in a Binary Tree.
Solution:
struct BinaryTreeNode *LCA(struct BinaryTreeNode *root, struct BinaryTreeNode *alpha,
                           struct BinaryTreeNode *beta){
    struct BinaryTreeNode *left, *right;
    if(root == NULL)
        return root;
    if(root == alpha || root == beta)
        return root;
    left = LCA(root->left, alpha, beta);
    right = LCA(root->right, alpha, beta);
    if(left && right)
        return root;   // one node in each subtree: root is the LCA
    else
        return (left ? left : right);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛) for recursion.
Problem-27 Give an algorithm for constructing binary tree from given Inorder and Preorder traversals.
Solution: Let us consider the traversals below:
Inorder sequence: D B E A F C
Preorder sequence: A B D E C F
In a Preorder sequence, leftmost element denotes the root of the tree. So we know ‘𝐴’ is the root for given sequences.
By searching ‘𝐴’ in Inorder sequence we can find out all elements on the left side of ‘𝐴’, which come under the left
subtree, and elements on the right side of ‘𝐴’, which come under the right subtree. So we get the structure as seen
below:
        A
      /    \
  (D B E)  (F C)
We recursively follow the above steps and get the following tree.
      root A
      /     \
     B       C
    / \     /
   D   E   F
Algorithm: buildTree()
1 Select an element from 𝑃𝑟𝑒𝑜𝑟𝑑𝑒𝑟. Increment a 𝑃𝑟𝑒𝑜𝑟𝑑𝑒𝑟 index variable (𝑝𝑟𝑒𝑂𝑟𝑑𝑒𝑟𝐼𝑛𝑑𝑒𝑥 in code below) to
pick next element in next recursive call.
2 Create a new tree node (𝑛𝑒𝑤𝑁𝑜𝑑𝑒) from heap with the data as selected element.
3 Find the selected element’s index in Inorder. Let the index be 𝑖𝑛𝑂𝑟𝑑𝑒𝑟𝐼𝑛𝑑𝑒𝑥.
4 Call BuildBinaryTree for elements before 𝑖𝑛𝑂𝑟𝑑𝑒𝑟𝐼𝑛𝑑𝑒𝑥 and make the built tree as left subtree of 𝑛𝑒𝑤𝑁𝑜𝑑𝑒.
5 Call BuildBinaryTree for elements after 𝑖𝑛𝑂𝑟𝑑𝑒𝑟𝐼𝑛𝑑𝑒𝑥 and make the built tree as right subtree of 𝑛𝑒𝑤𝑁𝑜𝑑𝑒.
6 return 𝑛𝑒𝑤𝑁𝑜𝑑𝑒.
struct BinaryTreeNode *buildBinaryTree(int inOrder[], int preOrder[], int inOrderStart, int inOrderEnd){
    static int preOrderIndex = 0;
    struct BinaryTreeNode *newNode;
    if(inOrderStart > inOrderEnd)
        return NULL;
    newNode = (struct BinaryTreeNode *) malloc(sizeof(struct BinaryTreeNode));
    if(!newNode) {
        printf("Memory Error");
        return NULL;
    }
    // Select current node from Preorder traversal using preOrderIndex
    newNode->data = preOrder[preOrderIndex];
    preOrderIndex++;
    if(inOrderStart == inOrderEnd)
        return newNode;
    // find the index of this node in Inorder traversal (linear search helper)
    int inOrderIndex = search(inOrder, inOrderStart, inOrderEnd, newNode->data);
    // Fill the left and right subtrees using the index in Inorder traversal
    newNode->left = buildBinaryTree(inOrder, preOrder, inOrderStart, inOrderIndex - 1);
    newNode->right = buildBinaryTree(inOrder, preOrder, inOrderIndex + 1, inOrderEnd);
    return newNode;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-28 If we are given two traversal sequences, can we construct the binary tree uniquely?
Solution: It depends on what traversals are given. If one of the traversal methods is 𝐼𝑛𝑜𝑟𝑑𝑒𝑟 then the tree can be
constructed uniquely, otherwise not.
Therefore, the following combinations can uniquely identify a tree:
• Inorder and Preorder
• Inorder and Postorder
• Inorder and Level-order
The following combinations do not uniquely identify a tree.
• Postorder and Preorder
• Preorder and Level-order
• Postorder and Level-order
For example, Preorder, Level-order and Postorder traversals are the same for the above trees:
   A          A
  /             \
 B               B
Problem-29 Give an algorithm for printing all the ancestors of a node in a binary tree. For example, for the
tree below, the ancestors of node 7 are 3 and 1.
        root 1
        /      \
       2        3
      / \      / \
     4   5    6   7
Solution: Apart from the Depth First Search of this tree, we can use the following recursive way to print the
ancestors.
int printAllAncestors(struct BinaryTreeNode *root, struct BinaryTreeNode *node){
    if(root == NULL)
        return 0;
    if(root->left == node || root->right == node ||
       printAllAncestors(root->left, node) || printAllAncestors(root->right, node)) {
        printf("%d ", root->data);
        return 1;
    }
    return 0;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛) for recursion.
Problem-30 Zigzag Tree Traversal: Give an algorithm to traverse a binary tree in Zigzag order. For example,
the output for the tree below should be: 1 3 2 4 5 6 7
        root 1
        /      \
       2        3
      / \      / \
     4   5    6   7
Solution: This problem can be solved easily using two stacks. Assume the two stacks are: 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐿𝑒𝑣𝑒𝑙 and
𝑛𝑒𝑥𝑡𝐿𝑒𝑣𝑒𝑙. We would also need a variable to keep track of the current level order (whether it is left to right or right
to left). We pop from 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐿𝑒𝑣𝑒𝑙 stack and print the node’s value. Whenever the current level order is from left to
right, push the node’s left child, then its right child, to stack 𝑛𝑒𝑥𝑡𝐿𝑒𝑣𝑒𝑙. Since a stack is a Last In First Out (𝐿𝐼𝐹𝑂)
structure, the next time that nodes are popped off nextLevel, it will be in the reverse order.
On the other hand, when the current level order is from right to left, we would push the node’s right child first,
then its left child. Finally, don't forget to swap those two stacks at the end of each level (𝑖. 𝑒., when 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐿𝑒𝑣𝑒𝑙 is
empty).
void zigZagTraversal(struct BinaryTreeNode *root){
    struct BinaryTreeNode *temp;
    int leftToRight = 1;
    if(!root)
        return;
    struct Stack *currentLevel = createStack(), *nextLevel = createStack();
    push(currentLevel, root);
    while(!isEmpty(currentLevel)) {
        temp = pop(currentLevel);
        if(temp) {
            printf("%d ", temp->data);
            if(leftToRight) {
                if(temp->left)
                    push(nextLevel, temp->left);
                if(temp->right)
                    push(nextLevel, temp->right);
            }
            else {
                if(temp->right)
                    push(nextLevel, temp->right);
                if(temp->left)
                    push(nextLevel, temp->left);
            }
        }
        if(isEmpty(currentLevel)) {
            leftToRight = 1 - leftToRight;
            // swap the two stack pointers at the end of each level
            struct Stack *tempStack = currentLevel;
            currentLevel = nextLevel;
            nextLevel = tempStack;
        }
    }
}
Time Complexity: O(𝑛). Space Complexity: Space for two stacks = O(𝑛) + O(𝑛) = O(𝑛).
Problem-31 Give an algorithm for finding the vertical sum of a binary tree. For example,
The tree has 5 vertical lines
Vertical-1: nodes-4 => vertical sum is 4
Vertical-2: nodes-2 => vertical sum is 2
Vertical-3: nodes-1,5,6 => vertical sum is 1 + 5 + 6 = 12
Vertical-4: nodes-3 => vertical sum is 3
Vertical-5: nodes-7 => vertical sum is 7
We need to output: 4 2 12 3 7
        root 1
        /      \
       2        3
      / \      / \
     4   5    6   7
Solution: This problem can be easily solved with the help of a hash table. The idea is to create an empty map
where each key represents the relative horizontal distance of a node from the root node and value in the map
maintains sum of all nodes present at same horizontal distance. Then we do a pre-order traversal of the tree and
we update the sum for current horizontal distance in the map. For each node, we recur for its left subtree by
decreasing horizontal distance by 1 and recur for right subtree by increasing horizontal distance by 1.
void verticalSumInBinaryTree (struct BinaryTreeNode *root, int column){
if(root==NULL)
return;
verticalSumInBinaryTree(root→left, column-1);
//Refer Hashing chapter for implementation of hash table
Hash[column] += root→data;
verticalSumInBinaryTree(root→right, column+1);
}
The traversal is started with verticalSumInBinaryTree(root, 0), and at the end the contents of Hash (one sum per column) are printed.
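If a ready-made hash table is not at hand, a minimal self-contained sketch of the same idea can use a plain array indexed by column plus a fixed offset (the array bound and offset below are illustrative assumptions, not from the original code):
#define MAX_COLUMNS 100 /* assumed bound on the number of vertical lines */
int columnSums[MAX_COLUMNS]; /* columnSums[column + MAX_COLUMNS/2] holds one vertical sum */
void verticalSum(struct BinaryTreeNode *root, int column) {
if (root == NULL)
return;
verticalSum(root->left, column - 1); /* left child lies one column to the left */
columnSums[column + MAX_COLUMNS / 2] += root->data;
verticalSum(root->right, column + 1); /* right child lies one column to the right */
}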
Problem-32 How many different binary trees are possible with 𝑛 nodes?
Solution: For example, consider a tree with 3 nodes (𝑛 = 3). It will have the maximum combination of 5 different
(i.e., 2³ − 3 = 5) trees.
root I
L I
L L
Problem-33 Given the preorder traversal of a binary tree in which every node has either two children or none, with each internal node marked 'I' and each leaf marked 'L' (for example, ILILL for the tree above), give an algorithm for constructing the tree.
Solution: First, we should see how preorder traversal is arranged. Pre-order traversal means first put root node,
then pre-order traversal of left subtree and then pre-order traversal of right subtree. In a normal scenario, it’s not
possible to detect where left subtree ends and right subtree starts using only pre-order traversal. Since every node
has either 2 children or no child, we can surely say that if a node exists then its sibling also exists. So every time
when we are computing a subtree, we need to compute its sibling subtree as well.
Secondly, whenever we get 'L' in the input string, that is a leaf, and we can stop for that particular subtree at that point. After an 'L' node that is the left child of its parent, its sibling starts. If an 'L' node is the right child of its parent, then we need to go up in the hierarchy to find the next subtree to compute.
Keeping the above invariant in mind, we can easily determine when a subtree ends and the next one starts. It means that we can give any start node to our method and it can easily complete the subtree it generates without going outside of its nodes. We just need to take care of passing the correct start nodes to the different subtrees.
struct BinaryTreeNode *buildTreeFromPreOrder(char* A, int *i){
struct BinaryTreeNode *newNode;
if(A == NULL) //Boundary Condition: check the input before using it
return NULL;
newNode = (struct BinaryTreeNode *) malloc(sizeof(struct BinaryTreeNode));
newNode→data = A[*i];
newNode→left = newNode→right = NULL;
if(A[*i] == 'L') //On reaching leaf node, return
return newNode;
*i = *i + 1; //populate left sub tree
newNode→left = buildTreeFromPreOrder(A, i);
*i = *i + 1; //populate right sub tree
newNode→right = buildTreeFromPreOrder(A, i);
return newNode;
}
Time Complexity: O(𝑛).
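As a usage sketch (the variable names are illustrative): for the input string ILILL, the call below builds the tree shown above, with the index variable passed by address exactly as in the function.
int i = 0;
struct BinaryTreeNode *root = buildTreeFromPreOrder("ILILL", &i);
/* "ILILL": the root is internal, its left child is a leaf, and its right child is internal with two leaf children */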
Problem-34 Given a binary tree with three pointers (left, right and nextSibling), give an algorithm for filling
the 𝑛𝑒𝑥𝑡𝑆𝑖𝑏𝑙𝑖𝑛𝑔 pointers assuming they are NULL initially.
Solution: We can use simple queue (similar to the solution of Problem-11). Let us assume that the structure of
binary tree is:
struct BinaryTreeNode {
struct BinaryTreeNode* left;
struct BinaryTreeNode* right;
struct BinaryTreeNode* nextSibling;
};
void fillNextSiblings(struct BinaryTreeNode *root){
if(!root)
return;
struct BinaryTreeNode *temp;
struct Queue *Q = createQueue();
enQueue(Q,root);
enQueue(Q,NULL);
while(!isEmpty(Q)) {
temp =deQueue(Q);
// Completion of current level.
if(temp ==NULL) { //Put another marker for next level.
if(!isEmpty(Q))
enQueue(Q,NULL);
}
else {
temp->nextSibling = QueueFront(Q);
if(temp->left)
enQueue(Q, temp->left);
if(temp->right)
enQueue(Q, temp->right);
}
}
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-35 Is there any other way of solving Problem-34?
Solution: The trick is to re-use the already populated 𝑛𝑒𝑥𝑡𝑆𝑖𝑏𝑙𝑖𝑛𝑔 pointers. Before we recurse on the 𝑙𝑒𝑓𝑡 and 𝑟𝑖𝑔ℎ𝑡 children, we connect the current node's children: the left child's 𝑛𝑒𝑥𝑡𝑆𝑖𝑏𝑙𝑖𝑛𝑔 is the right child, and the right child's 𝑛𝑒𝑥𝑡𝑆𝑖𝑏𝑙𝑖𝑛𝑔 is the left child of the current node's 𝑛𝑒𝑥𝑡𝑆𝑖𝑏𝑙𝑖𝑛𝑔. In order for this to work, the current node's 𝑛𝑒𝑥𝑡𝑆𝑖𝑏𝑙𝑖𝑛𝑔 pointer must already be populated, which is true here because we fill the levels from the top down.
void fillNextSiblings(struct BinaryTreeNode* root) {
if (!root) return;
if (root→left) root→left→nextSibling = root→right;
if (root→right) root→right→nextSibling = (root→nextSibling) ? root→nextSibling→left: NULL;
fillNextSiblings(root→left);
fillNextSiblings(root→right);
}
Time Complexity: O(𝑛).
Problem-36 Given a binary tree, find its minimum depth. The minimum depth of a binary tree is the number
of nodes along the shortest path from the root node down to the nearest leaf node. For example, minimum depth
of the following binary tree is 3.
(Figure: an example binary tree; the depth of the nearest leaf, which determines the minimum depth, is marked along with the depths of the other leaves.)
Solution: The algorithm is similar to the algorithm for finding the depth (or height) of a binary tree, except that here we are finding the minimum depth. The simplest approach would be to use recursion. But the question is, when do we stop? We stop the recursive calls when we reach a leaf node or NULL.
Algorithm:
Let 𝑟𝑜𝑜𝑡 be the pointer to the root node of a subtree.
• If the 𝑟𝑜𝑜𝑡 is equal to 𝑁𝑈𝐿𝐿, then the minimum depth of the binary tree would be 0.
• If the 𝑟𝑜𝑜𝑡 is a leaf node, then the minimum depth of the binary tree would be 1.
• If the 𝑟𝑜𝑜𝑡 is not a leaf node and if left subtree of the 𝑟𝑜𝑜𝑡 is 𝑁𝑈𝐿𝐿, then find the minimum depth in the
right subtree. Otherwise, find the minimum depth in the left subtree.
• If the 𝑟𝑜𝑜𝑡 is not a leaf node and both left subtree and right subtree of the 𝑟𝑜𝑜𝑡 are not 𝑁𝑈𝐿𝐿, then
recursively find the minimum depth of left and right subtree. Let it be 𝑙𝑒𝑓𝑡𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑀𝑖𝑛𝐷𝑒𝑝𝑡ℎ and
𝑟𝑖𝑔ℎ𝑡𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑀𝑖𝑛𝐷𝑒𝑝𝑡ℎ respectively.
• To get the minimum height of the binary tree rooted at root, we will take minimum of 𝑙𝑒𝑓𝑡𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑀𝑖𝑛𝐷𝑒𝑝𝑡ℎ
and 𝑟𝑖𝑔ℎ𝑡𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑀𝑖𝑛𝐷𝑒𝑝𝑡ℎ and 1 for the 𝑟𝑜𝑜𝑡 node.
class Solution {
public:
int minDepth(BinaryTreeNode *root) {
// If root (tree) is empty, minimum depth would be 0
if(!root)
return 0;
// If one subtree is empty, the minimum depth lies in the other subtree
if(!root->left)
return minDepth(root->right) + 1;
if(!root->right)
return minDepth(root->left) + 1;
// Both subtrees exist: take the smaller of the two minimum depths
int l = minDepth(root->left), r = minDepth(root->right);
return (l < r ? l : r) + 1;
}
};
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for the recursive stack.
Problem-37 Given a binary tree, how do we check whether it is a complete binary tree?
Solution: A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible. A full binary tree (sometimes called a proper binary tree or 2-tree), in contrast, is a tree in which every node other than the leaves has two children.
If we have, say, 4 nodes in a row with depth 3 and positions 0, 1, 2, 3, and we want 8 new nodes in a row with depth 4 and positions 0, 1, 2, 3, 4, 5, 6, 7, then we can see that the rule for going from a node to its left child is (depth, position) → (depth + 1, position × 2), and the rule for going from a node to its right child is (depth, position) → (depth + 1, position × 2 + 1). Then, the row at depth 𝑑 is completely filled when it has 2^(𝑑−1) nodes, and all the nodes in the last level are left-justified when their positions take the form 0, 1, ... in sequence with no gaps.
bool isCompleteBinaryTree(BinaryTreeNode* root){
if (root == NULL)
return true;
BinaryTreeNode *p, *r;
std::queue<BinaryTreeNode*> q{}; // Queue is created
q.push(root);
int count = 0;
while (!q.empty()){ // run while the queue is not empty
int z = q.size();
for (int i = 0; i < z; i++) {
p = q.front();
q.pop();
if (p == NULL){ // Once a NULL is received, all remaining elements must also be
count++; // NULL for the tree to be a complete binary tree
while (!q.empty()) {
r = q.front();
q.pop();
if (r != NULL)
return false;
}
break;
}
q.push(p->left);
q.push(p->right);
}
}
return true; // If all nodes after first null are null, it returns true;
}
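The depth/position rule discussed above also yields an alternative recursive check: number the nodes as in an array-stored heap and verify that no index falls outside the node count. This is a sketch under that numbering convention (the helper names are illustrative, not from the original text):
// Count the nodes, then verify every node's heap index lies in [0, count).
int countNodes(struct BinaryTreeNode *root) {
if (!root) return 0;
return 1 + countNodes(root->left) + countNodes(root->right);
}
int isCompleteUtil(struct BinaryTreeNode *root, int index, int count) {
if (!root) return 1; // an empty subtree cannot violate completeness
if (index >= count) return 0; // there is a gap before this node
return isCompleteUtil(root->left, 2 * index + 1, count) &&
isCompleteUtil(root->right, 2 * index + 2, count);
}
// Usage: isCompleteUtil(root, 0, countNodes(root));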
A
B C D E F G
H I J K L M N
P Q
How do we represent the tree?
In the above tree, there are nodes with 6 children, with 3 children, with 2 children, with 1 child, and with zero
children (leaves). To represent this tree we have to consider the worst case (6 children) and allocate that many child
pointers for each node. Based on this, the node representation can be given as:
struct TreeNode{
int data;
struct TreeNode *firstChild;
struct TreeNode *secondChild;
struct TreeNode *thirdChild;
struct TreeNode *fourthChild;
struct TreeNode *fifthChild;
struct TreeNode *sixthChild;
};
Since we are not using all the pointers in all the cases, there is a lot of memory wastage. Another problem is that
we do not know the number of children for each node in advance. In order to solve this problem we need a
representation that minimizes the wastage and also accepts nodes with any number of children.
A
B C D E F G
H I J K L M N
P Q
The idea behind this representation is that if we have a link between the children of the same parent, then we do not need extra links from the parent to all of its children. This is because we can traverse all the children by starting at the first child of the parent. So if we have a link between the parent and its first child, and also links between all children of the same parent, it solves our problem.
This representation is sometimes called first child/next sibling representation. First child/next sibling
representation of the generic tree is shown above. The actual representation for this tree is:
(Each node in this representation stores three fields: the element, a pointer to its first child, and a pointer to its next sibling. For example, node A stores its element, a first-child pointer to B, and a NULL next-sibling pointer; node B points to its sibling C, and so on.)
Based on this discussion, the tree node declaration for general tree can be given as:
struct TreeNode {
int data;
struct TreeNode *firstChild;
struct TreeNode *nextSibling;
};
Note: Since we are able to convert any generic tree to binary representation; in practice we use binary trees. We
can treat all generic trees with a first child/next sibling representation as binary trees.
Problem-40 Given a 4-ary tree (each node can have at most 4 children) with 𝑛 nodes, what is the minimum possible height of the tree?
Solution: Similar to the above discussion, if we want to get the minimum height, then we need to fill all nodes with the maximum number of children (in this case 4). Now let's see the following table, which indicates the maximum number of nodes for a given height.
Height ℎ	Maximum nodes at height ℎ = 4^ℎ	Total nodes of a tree with height ℎ = (4^(ℎ+1) − 1)/3
0	1	1
1	4	1 + 4
2	4 × 4	1 + 4 + 4 × 4
3	4 × 4 × 4	1 + 4 + 4 × 4 + 4 × 4 × 4
For a given height ℎ, the maximum possible number of nodes is (4^(ℎ+1) − 1)/3. To get the minimum height, set 𝑛 equal to this maximum and take the logarithm on both sides:
𝑛 = (4^(ℎ+1) − 1)/3 ⟹ 4^(ℎ+1) = 3𝑛 + 1 ⟹ (ℎ + 1) log 4 = log(3𝑛 + 1) ⟹ ℎ + 1 = log₄(3𝑛 + 1) ⟹ ℎ = log₄(3𝑛 + 1) − 1
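As a quick check of the formula (a worked example, not part of the original text): a full 4-ary tree of height 2 has $1 + 4 + 16 = 21$ nodes, and the formula recovers its height:
$$h = \log_4(3 \cdot 21 + 1) - 1 = \log_4 64 - 1 = 3 - 1 = 2.$$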
Problem-41 Given a node in the generic tree, give an algorithm for counting the number of siblings for that
node.
Solution: Since tree is represented with the first child/next sibling method, the tree structure can be given as:
struct TreeNode{
int data;
struct TreeNode *firstChild;
struct TreeNode *nextSibling;
};
For a given node in the tree, we just need to traverse all its next siblings.
int SiblingsCount(struct TreeNode *current){
int count = 0;
while(current) {
count++;
current = current→nextSibling;
}
return count;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-42 Given a node in the generic tree, give an algorithm for counting the number of children for that
node.
Solution: Since the tree is represented as first child/next sibling method, the tree structure can be given as:
struct TreeNode {
int data;
struct TreeNode * firstChild;
struct TreeNode * nextSibling;
};
For a given node in the tree, we just need to point to its first child and keep traversing all its next siblings.
int ChildCount(struct TreeNode * current) {
int count = 0;
current = current→ firstChild;
while (current) {
count++;
current = current→ nextSibling;
}
return count;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-43 Given two trees how do we check whether the trees are isomorphic to each other or not?
Solution: Two binary trees 𝑟𝑜𝑜𝑡1 and 𝑟𝑜𝑜𝑡2 are isomorphic if they have the same structure. The values of the nodes do not affect whether two trees are isomorphic or not. In the diagram below, the tree in the middle is not isomorphic to the other trees, but the tree on the right is isomorphic to the tree on the left.
int isIsomorphic(struct TreeNode *root1, struct TreeNode *root2){
if(!root1 && !root2)
return 1;
if((!root1 && root2) || (root1 && !root2))
return 0;
return (isIsomorphic(root1→left, root2→left) && isIsomorphic(root1→right, root2→right));
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for the recursive stack.
Problem-44 Given two trees, how do we check whether they are quasi-isomorphic to each other or not?
Solution:
Two trees 𝑟𝑜𝑜𝑡1 and 𝑟𝑜𝑜𝑡2 are quasi-isomorphic if 𝑟𝑜𝑜𝑡1 can be transformed into 𝑟𝑜𝑜𝑡2 by swapping the left and
right children of some of the nodes of 𝑟𝑜𝑜𝑡1. Data in the nodes are not important in determining quasi-
isomorphism; only the shape is important. The trees below are quasi-isomorphic because if the children of the
nodes on the left are swapped, the tree on the right is obtained.
int quasiIsomorphic(struct TreeNode * root1, struct TreeNode * root2) {
if (!root1 && !root2) return 1;
if ((!root1 && root2) || (root1 && !root2))
return 0;
return (quasiIsomorphic(root1→ left, root2→ left) && quasiIsomorphic(root1→ right, root2→ right) ||
quasiIsomorphic(root1→ right, root2→ left) && quasiIsomorphic(root1→ left, root2→ right));
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-45 Given a parent array 𝑃, where 𝑃[𝑖] indicates the parent of 𝑖 𝑡ℎ node in the tree (assume parent of
root node is indicated with −1). Give an algorithm for finding the height or depth of the tree.
Solution: For example, if the parent array 𝑃 is:
P[i]: -1 0 1 6 6 0 0 2 7
i: 0 1 2 3 4 5 6 7 8
Its corresponding tree is: 0
5 1 6
2 3 4
7
8
From the problem definition, the given array represents the parent array. That means, we need to consider the
tree for that array and find the depth of the tree. The depth of this given tree is 4. If we carefully observe, we just
need to start at every node and keep going to its parent until we reach −1 and also keep track of the maximum
depth among all nodes.
int findDepthInGenericTree(int P[], int n){
int maxDepth = -1, currentDepth = -1, j;
for(int i = 0; i < n; i++) {
currentDepth = 0;
j = i;
while(P[j] != -1) { //follow parent links until the root is reached
currentDepth++;
j = P[j];
}
if(currentDepth > maxDepth)
maxDepth = currentDepth;
}
return maxDepth;
}
Time Complexity: O(𝑛²), in the worst case (for skew trees). Space Complexity: O(1).
A
B C D
E F G I J K
Problem-46 Given the preorder traversal of a full 𝑘-ary tree (a tree in which every node has either zero or exactly 𝑘 children), give an algorithm for constructing the tree.
Solution: In a preorder traversal, the root node is processed first, followed by the preorder traversal of each of its subtrees from left to right. Because of this, to construct a full 𝑘-ary tree, we just need to keep on creating the nodes without bothering about the previously constructed nodes. We can use this trick to build the tree recursively by using one global index. The declaration for a 𝑘-ary tree can be given as:
struct KaryTreeNode {
char data;
struct KaryTreeNode **child; //array of k child pointers
};
int Ind = 0; //global index into the preorder array
struct KaryTreeNode *buildKaryTree(char A[], int n, int k) {
if (n <= 0)
return NULL;
struct KaryTreeNode *newNode = (struct KaryTreeNode *) malloc(sizeof(struct KaryTreeNode));
if (!newNode) {
printf("Memory Error");
return NULL;
}
newNode→child = (struct KaryTreeNode **) malloc(k * sizeof(struct KaryTreeNode *));
if (!newNode→child) {
printf("Memory Error");
return NULL;
}
newNode→data = A[Ind];
for (int i = 0; i < k; i++) {
if (k * Ind + i < n) {
Ind++;
newNode→child[i] = buildKaryTree(A, n, k);
} else
newNode→child[i] = NULL;
}
return newNode;
}
Time Complexity: O(𝑛), where 𝑛 is the size of the pre-order array. This is because we are moving sequentially and
not visiting the already constructed nodes.
The preceding traversal techniques have the following issues:
• The storage space required for the stack and queue is large.
• The majority of the pointers in any binary tree are NULL. For example, a binary tree with 𝑛 nodes has 𝑛 + 1 NULL pointers, and these are wasted.
• It is difficult to find the successor node (preorder, inorder and postorder successors) for a given node.
Threaded binary trees address these issues by storing useful information (called threads) in the otherwise wasted NULL pointers; tag fields indicate whether a pointer is a regular child link or a thread:
struct ThreadedBinaryTreeNode{
struct ThreadedBinaryTreeNode *left;
int LTag;
int data;
int RTag;
struct ThreadedBinaryTreeNode *right;
};
5 11
2 16 31
With this convention, the above tree can be represented using an extra dummy node: the dummy node's left pointer points to the root of the tree, and the threads of the leftmost and rightmost nodes point back to the dummy node.
For inserting a node 𝑄 as the right child of a node 𝑃, there are two cases to consider:
• Node 𝑃 does not have a right child: In this case we just need to attach 𝑄 as the right child of 𝑃 and update 𝑄's left and right threads.
• Node 𝑃 has right child (say, 𝑅): In this case we need to traverse 𝑅’𝑠 left subtree and find the left most node
and then update the left and right pointer of that node (as shown below).
void insertRightInInorderTBT(struct ThreadedBinaryTreeNode *P, struct ThreadedBinaryTreeNode *Q){
Q→right = P→right;
Q→RTag = P→RTag;
Q→left = P; //Q's left pointer becomes a thread to its inorder predecessor P
Q→LTag = 0;
P→right = Q;
P→RTag = 1;
if(Q→RTag == 1) { //Case-2
Temp = Q→right;
while(Temp→LTag)
Temp = Temp→left;
Temp→left = Q;
}
}
Time Complexity: O(𝑛). Space Complexity: O(1).
For example, consider constructing an expression tree from the postfix expression A B C ∗ + D /. The first three symbols are the operands A, B and C; for each of them a one-node tree is created and a pointer to it is pushed onto a stack.
Next, an operator '*' is read, so two pointers to trees are popped, a new tree is formed and a pointer to it is pushed
onto the stack.
(The stack now contains the one-node tree A and the tree for B ∗ C.)
Next, an operator '+' is read, so two pointers to trees are popped, a new tree is formed and a pointer to it is pushed
onto the stack.
(The stack now contains a single tree: A + (B ∗ C).)
Next, an operand ‘D’ is read, a one-node tree is created and a pointer to the corresponding tree is pushed onto the
stack.
(The stack now contains the tree for A + (B ∗ C) and the one-node tree D.)
Finally, the last symbol (‘/’) is read, two trees are merged and a pointer to the final tree is left on the stack.
(The final tree has '/' at the root, the tree for A + (B ∗ C) as its left subtree, and D as its right child.)
6.10 XOR Trees
This concept is similar to the 𝑚𝑒𝑚𝑜𝑟𝑦 𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡 𝑑𝑜𝑢𝑏𝑙𝑦 𝑙𝑖𝑛𝑘𝑒𝑑 𝑙𝑖𝑠𝑡𝑠 of the 𝐿𝑖𝑛𝑘𝑒𝑑 𝐿𝑖𝑠𝑡𝑠 chapter. Also, like threaded binary trees, this representation does not need stacks or queues for traversing the tree. The representation allows traversing back (to the parent) and forth (to the children) using the ⊕ operation. For each node, the following rules are used in the representation:
• Each node's left pointer stores the ⊕ (XOR) of the addresses of its parent and its left child.
• Each node's right pointer stores the ⊕ of the addresses of its parent and its right child.
• The root node's parent is NULL, and the children of leaf nodes are NULL.
A
B C
D E F
Based on the above rules and discussion, the tree can be represented as follows: node A stores NULL⊕B in its left field and NULL⊕C in its right field; node B stores A⊕D in its left field and A⊕E in its right field; and similarly for the remaining nodes, with each leaf storing parent⊕NULL in both of its fields.
The major objective of this representation is the ability to move to the parent as well as to the children. Now, let us see how to use it for traversing the tree. For example, if we are at node B and want to move to its parent node A, then we just need to perform ⊕ of its left content with its left child's address (we can use the right child as well for going to the parent node). Similarly, if we want to move to its child (say, left child D), then we have to perform ⊕ of its left content with its parent node's address. One important point to understand about this representation is: when we are at node B, how do we know the address of its child D? Since the traversal starts at the root node, we can apply ⊕ of the root's left content with NULL; as a result we get its left child, B. When we are at B, we can apply ⊕ of its left content with A's address.
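A minimal sketch of this representation in C, assuming the addresses are XOR-ed via uintptr_t (the type choice and helper names are illustrative, not from the original text):
#include <stdint.h>
struct XORTreeNode {
int data;
uintptr_t left; /* address of parent ⊕ address of left child */
uintptr_t right; /* address of parent ⊕ address of right child */
};
/* Given a node and its parent, recover the left child. */
struct XORTreeNode *leftChild(struct XORTreeNode *node, struct XORTreeNode *parent) {
return (struct XORTreeNode *)(node->left ^ (uintptr_t)parent);
}
/* Given a node and its left child, recover the parent. */
struct XORTreeNode *parentOf(struct XORTreeNode *node, struct XORTreeNode *lchild) {
return (struct XORTreeNode *)(node->left ^ (uintptr_t)lchild);
}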
In binary search trees, all the left subtree elements must be less than the root's data and all the right subtree elements must be greater than the root's data; this ordering property must hold recursively at every node.
Example: The left tree is a binary search tree and the right tree is not a binary search tree (at node 6 it’s not
satisfying the binary search tree property).
7 3
4 9 1 6
2 5 2 7
In a BST, the maximum element is the right-most node of the tree and the minimum element is the left-most node; finding the maximum simply follows right pointers, recursively, until there is no right child.
𝑁𝑜𝑛 𝑟𝑒𝑐𝑢𝑟𝑠𝑖𝑣𝑒 version of the above algorithm can be given as:
struct BinarySearchTreeNode *findMax(struct BinarySearchTreeNode * root ) {
if( root == NULL )
return NULL;
while( root→right != NULL )
root = root→right;
return root;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
If a node X has a left subtree, then the inorder predecessor of X is the maximum value in that left subtree. If it does not have a left child, then a node's inorder predecessor is its first left ancestor. Symmetrically, the inorder successor of X is the minimum value in X's right subtree or, when there is no right subtree, its first right ancestor.
2 8
1 4
3 5
To insert 𝑑𝑎𝑡𝑎 into a binary search tree, first we need to find the location for that element. We can find the location of insertion by following the same mechanism as that of the 𝑓𝑖𝑛𝑑 operation. While finding the location, if the 𝑑𝑎𝑡𝑎 is already there, we can simply ignore it and return. Otherwise, we insert 𝑑𝑎𝑡𝑎 at the last location on the path traversed.
As an example let us consider the following tree. The dotted node indicates the element (5) to be inserted. To insert
5, traverse the tree using 𝑓𝑖𝑛𝑑 function. At node with key 4, we need to go right, but there is no subtree, so 5 is
not in the tree, and this is the correct location for insertion.
struct BinarySearchTreeNode *insert(struct BinarySearchTreeNode *root, int data) {
if( root == NULL ) {
root = (struct BinarySearchTreeNode *) malloc(sizeof(struct BinarySearchTreeNode));
if( root == NULL ) {
printf(“Memory Error”);
return;
}
else {
root→data = data;
root→left = root→right = NULL;
}
}
else {
if( data < root→data )
root→left = insert(root→left, data);
else if( data > root→data )
root→right = insert(root→right, data);
}
return root;
}
Time Complexity: O(𝑛), for skew trees. Space Complexity: O(𝑛), for the recursive stack.
For deleting an element from a BST, there are three cases to consider:
• If the element to be deleted is a leaf node: return NULL to its parent. That means, make the corresponding child pointer NULL. In the tree below, to delete 5, set NULL to its parent node 2.
6
2 8
1 5
• If the element to be deleted has one child: In this case we just need to send the current node's child to its parent. In the tree below, to delete 4, 4's left subtree is set to its parent node 2.
6
2 8
1 4
• If the element to be deleted has both children: The general strategy is to replace the key of this node with
the largest element of the left subtree and recursively delete that node (which is now empty). The largest
node in the left subtree cannot have a right child, so the second 𝑑𝑒𝑙𝑒𝑡𝑒 is an easy one. As an example, let
us consider the following tree. In the tree below, to delete 8, it is the right child of the root. The key value
is 8. It is replaced with the largest key in its left subtree (7), and then that node is deleted as before (second
case).
4 4
2 8 2 7
5 1 5 1
7 7
6 6
Problem-49 Give an algorithm for finding the least common ancestor (LCA) of two nodes 𝛼 and 𝛽 in a BST.
Solution:
The main idea of the solution is: while traversing BST from root to bottom, the first node we encounter with value
between 𝛼 and 𝛽, i.e., 𝛼 < 𝑛𝑜𝑑𝑒 → 𝑑𝑎𝑡𝑎 < 𝛽, is the Least Common Ancestor(LCA) of 𝛼 and 𝛽 (where 𝛼 < 𝛽). So just
traverse the BST in pre-order, and if we find a node with value in between 𝛼 and 𝛽, then that node is the LCA. If
its value is greater than both 𝛼 and 𝛽, then the LCA lies on the left side of the node, and if its value is smaller than
both α and β, then the LCA lies on the right side.
struct BinarySearchTreeNode *findLCA(struct BinarySearchTreeNode *root, struct BinarySearchTreeNode *α,
struct BinarySearchTreeNode * β) {
while(1) {
if(α→data == root→data || β→data == root→data) //one node is an ancestor of the other
return root;
if((α→data < root→data && β→data > root→data) ||
(α→data > root→data && β→data < root→data))
return root;
if(α→data < root→data)
root = root→left;
else root = root→right;
}
}
Time Complexity: O(𝑛), for skew trees. Space Complexity: O(1), since the loop is iterative.
Problem-50 Give an algorithm for finding the shortest path between two nodes in a BST.
Solution: It’s nothing but finding the LCA of two nodes in BST.
Problem-51 Give an algorithm for counting the number of BSTs possible with 𝑛 nodes.
Solution: This is a DP problem. Refer to chapter on 𝐷𝑦𝑛𝑎𝑚𝑖𝑐 𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 for the algorithm.
Problem-52 Give an algorithm to check whether the given binary tree is a BST or not.
Solution:
root 6
2 8
1 9
Consider the following simple program. For each node, check if the node on its left is smaller and if the node on its right is greater. This approach is wrong, as it will return true for the binary tree above: 9 lies in the left subtree of the root 6 but is greater than 6, so the tree is not a BST even though every parent-child pair looks fine. Checking only at the current node is not enough.
int isBST(struct BinaryTreeNode* root) {
if(root == NULL)
return 1;
// false if left is > than root
if(root→left != NULL && root→left→data > root→data)
return 0;
// false if right is < than root
if(root→right != NULL && root→right→data < root→data)
return 0;
// false if, recursively, the left or right is not a BST
if(!isBST(root→left) || !isBST(root→right))
return 0;
// passing all that, it's a BST
return 1;
}
Problem-53 Can we think of getting the correct algorithm?
Solution: For each node, check if the max value in the left subtree is smaller than the current node's data and the min value in the right subtree is greater than the node's data. It is assumed that we have helper functions 𝑓𝑖𝑛𝑑𝑀𝑖𝑛() and 𝑓𝑖𝑛𝑑𝑀𝑎𝑥()
that return the min or max integer value from a non-empty tree.
/* Returns true if a binary tree is a binary search tree */
int isBST(struct BinaryTreeNode* root) {
if(root == NULL)
return 1;
/* false if the max of the left is > than root */
if(root→left != NULL && findMax(root→left) > root→data)
return 0;
/* false if the min of the right is < than root */
if(root→right != NULL && findMin(root→right) < root→data)
return 0;
/* false if, recursively, the left or right is not a BST */
if(!isBST(root→left) || !isBST(root→right))
return 0;
/* passing all that, it's a BST */
return 1;
}
Time Complexity: O(𝑛2 ). Space Complexity: O(𝑛).
Problem-54 Can we improve the complexity of Problem-53?
Solution: Yes. A better solution is to look at each node only once. The trick is to write a utility helper function
isBSTUtil(struct BinaryTreeNode* root, int min, int max) that traverses down the tree keeping track of the
narrowing min and max allowed values as it goes, looking at each node only once. The initial values for min and
max should be INT_MIN and INT_MAX — they narrow from there.
Initial call: isBST(root, INT_MIN, INT_MAX);
int isBST(struct BinaryTreeNode *root, int min, int max) {
if(!root)
return 1;
return (root->data >min && root->data < max && isBST(root->left, min, root->data) && isBST(root->right,
root->data, max));
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for stack space.
Problem-55 Can we further improve the complexity of Problem-53?
Solution: Yes, by using inorder traversal. The idea behind this solution is that inorder traversal of BST produces
sorted lists. While traversing the BST in inorder, at each node check the condition that its key value should be
greater than the key value of its previous visited node. Also, we need to initialize the prev with possible minimum
integer value (say, INT_MIN).
int prev = INT_MIN;
int isBST(struct BinaryTreeNode *root, int *prev) {
if(!root) return 1;
if(!isBST(root→left, prev))
return 0;
if(root→data < *prev)
return 0;
*prev = root→data;
return isBST(root→right, prev);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for stack space.
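The initial call can be made as follows (a usage sketch; the address of prev is passed so that updates persist across the recursive calls):
int prev = INT_MIN;
int result = isBST(root, &prev); /* 1 if the inorder sequence is non-decreasing, 0 otherwise */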
Problem-56 Give an algorithm for converting BST to circular DLL with space complexity O(1).
Solution: Convert left and right subtrees to DLLs and maintain end of those lists. Then, adjust the pointers.
struct BinarySearchTreeNode *BST2DLL(struct BinarySearchTreeNode *root, struct BinarySearchTreeNode **tail) {
struct BinarySearchTreeNode *left, *ltail, *right, *rtail;
if(!root) {
*tail = NULL;
return NULL;
}
left = BST2DLL(root→left, &ltail); //convert the left subtree; ltail is its last node
right = BST2DLL(root→right, &rtail); //convert the right subtree; rtail is its last node
root→left = ltail;
root→right = right;
if(!right)
*tail = root;
else {
right→left = root;
*tail = rtail;
}
if(!left)
return root;
else {
ltail→right = root;
return left;
}
}
Time Complexity: O(𝑛).
Problem-57 For Problem-56, is there any other way of solving it?
Solution: Yes. There is an alternative solution based on the divide and conquer method which is quite neat.
struct BinarySearchTreeNode *Append(struct BinarySearchTreeNode *a, struct BinarySearchTreeNode *b) {
struct BinarySearchTreeNode *aLast, *bLast;
if (a==NULL) return b;
if (b==NULL) return a;
aLast = a→left;
bLast = b→left;
aLast→right = b;
b→left = aLast;
bLast→right = a;
a→left = bLast;
return a;
}
struct BinarySearchTreeNode* TreeToList(struct BinarySearchTreeNode *root) {
struct BinarySearchTreeNode *aList, *bList;
if (root==NULL)
return NULL;
aList = TreeToList(root→left);
bList = TreeToList(root→right);
root→left = root;
root→right = root;
aList = Append(aList, root);
aList = Append(aList, bList);
return(aList);
}
Time Complexity: O(𝑛).
Problem-58 Given a sorted doubly linked list, give an algorithm for converting it into balanced binary search
tree.
Solution: Find the middle node and adjust the pointers.
struct DLLNode * DLLtoBalancedBST(struct DLLNode *head) {
struct DLLNode *temp, *p, *q;
if( !head || !head→next)
return head;
temp = findMiddleNode(head);
p = head;
while(p→next != temp)
p = p→next;
p→next = NULL;
q = temp→next;
temp→next = NULL;
temp→prev = DLLtoBalancedBST(head);
temp→next = DLLtoBalancedBST(q);
return temp;
}
Time Complexity: 2𝑇(𝑛/2) + O(𝑛) [for finding the middle node] = O(𝑛𝑙𝑜𝑔𝑛).
Note: For 𝐹𝑖𝑛𝑑𝑀𝑖𝑑𝑑𝑙𝑒𝑁𝑜𝑑𝑒 function refer 𝐿𝑖𝑛𝑘𝑒𝑑 𝐿𝑖𝑠𝑡𝑠 chapter.
Problem-59 Given a sorted array, give an algorithm for converting the array to BST.
Solution: If we have to choose an array element to be the root of a balanced BST, which element should we pick?
The root of a balanced BST should be the middle element from the sorted array. We would pick the middle element
from the sorted array in each iteration. We then create a node in the tree initialized with this element. After the
element is chosen, what is left? Could you identify the sub-problems within the problem?
There are two arrays left — the one on its left and the one on its right. These two arrays are the sub-problems of
the original problem, since both of them are sorted. Furthermore, they are subtrees of the current node’s left and
right child.
The code below creates a balanced BST from the sorted array in O(𝑛) time (𝑛 is the number of elements in the
array). Compare how similar the code is to a binary search algorithm. Both are using the divide and conquer
methodology.
struct BinaryTreeNode * buildBST(int A[], int left, int right) {
struct BinaryTreeNode * newNode;
int mid;
if (left > right)
return NULL;
newNode = (struct BinaryTreeNode * ) malloc(sizeof(struct BinaryTreeNode));
if (!newNode) {
printf(“Memory Error”);
return;
}
if (left == right) {
newNode→data = A[left];
newNode→left = newNode→right = NULL;
}
else {
mid = left + (right - left) / 2;
newNode→data = A[mid];
newNode→left = buildBST(A, left, mid - 1);
newNode→right = buildBST(A, mid + 1, right);
}
return newNode;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for the recursive stack.
Problem-62 Give an algorithm for finding the 𝑘𝑡ℎ smallest element in a BST.
Solution: The idea is that an inorder traversal of a BST visits the elements in sorted order. While traversing in inorder, keep a counter of the visited nodes and return the node at which the counter reaches 𝑘.
struct BinarySearchTreeNode *kthSmallestInBST(struct BinarySearchTreeNode *root, int k, int *count){
struct BinarySearchTreeNode *temp;
if(!root)
return NULL;
temp = kthSmallestInBST(root→left, k, count);
if(temp)
return temp;
if(++(*count) == k)
return root;
return kthSmallestInBST(root→right, k, count);
}
Time Complexity: O(𝑛). Space Complexity: O(1).
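A usage sketch (the counter starts at zero and is passed by address, matching the signature above):
int count = 0;
struct BinarySearchTreeNode *third = kthSmallestInBST(root, 3, &count); /* NULL if the tree has fewer than 3 nodes */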
Problem-63 Floor and ceiling: If a given key is less than the key at the root of a BST then the floor of the key
(the largest key in the BST less than or equal to the key) must be in the left subtree. If the key is greater than
the key at the root, then the floor of the key could be in the right subtree, but only if there is a key smaller than
or equal to the key in the right subtree; if not (or if the key is equal to the key at the root) then the key at the
root is the floor of the key. Finding the ceiling is similar, with interchanging right and left. For example, if the
sorted input array is {1, 2, 8, 10, 10, 12, 19}, then
For 𝑥 = 0: floor doesn't exist in array, ceil = 1, For 𝑥 = 1: floor = 1, ceil = 1
For 𝑥 = 5: floor = 2, ceil = 8, For 𝑥 = 20: floor = 19, ceil doesn't exist in array
Solution: The idea behind this solution is that the inorder traversal of a BST produces a sorted list. While traversing the BST in inorder, keep track of the values being visited. If the root's data is greater than the given value, then return the previous value which we have maintained during the traversal. If the root's data is equal to the given data, then return the root.
struct BinaryTreeNode *floorInBST(struct BinaryTreeNode *root, int data) {
struct BinaryTreeNode *prev = NULL;
return floorInBSTUtil(root, &prev, data);
}
struct BinaryTreeNode *floorInBSTUtil(struct BinaryTreeNode *root, struct BinaryTreeNode **prev, int data) {
struct BinaryTreeNode *left;
if (!root)
return NULL;
left = floorInBSTUtil(root→left, prev, data);
if (left) //floor already found in the left subtree
return left;
if (root→data == data)
return root;
if (root→data > data) //first node greater than data: floor is the previously visited node
return *prev;
*prev = root;
return floorInBSTUtil(root→right, prev, data);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for stack space.
For ceiling, we just need to call the right subtree first, followed by left subtree.
struct BinaryTreeNode *ceilingInBST(struct BinaryTreeNode *root, int data){
struct BinaryTreeNode *prev = NULL;
return ceilingInBSTUtil(root, &prev, data);
}
struct BinaryTreeNode *ceilingInBSTUtil(struct BinaryTreeNode *root, struct BinaryTreeNode **prev, int data){
struct BinaryTreeNode *right;
if(!root)
return NULL;
right = ceilingInBSTUtil(root→right, prev, data);
if(right) //ceiling already found in the right subtree
return right;
if(root→data == data)
return root;
if(root→data < data) //first node smaller than data: ceiling is the previously visited node
return *prev;
*prev = root;
return ceilingInBSTUtil(root→left, prev, data);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for stack space.
Problem-64 Give an algorithm for finding the union and intersection of BSTs. Assume parent pointers are
available (say threaded binary trees). Also, assume the lengths of two BSTs are 𝑚 and 𝑛 respectively.
Solution: If parent pointers are available then the problem is same as merging of two sorted lists. This is because
if we call inorder successor each time we get the next highest element. It’s just a matter of which inorderSuccessor
to call.
Time Complexity: O(𝑚 + 𝑛). Space complexity: O(1).
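A sketch of the merge for the union (assuming each traversal starts at the minimum node of its tree and that an inorderSuccessor() helper is available, as described for threaded binary trees; the function name is illustrative):
/* Print the union of two BSTs in sorted order; a and b start at the minimum nodes. */
void printUnion(struct BinarySearchTreeNode *a, struct BinarySearchTreeNode *b) {
while (a != NULL && b != NULL) {
if (a->data < b->data) {
printf("%d ", a->data); a = inorderSuccessor(a);
} else if (b->data < a->data) {
printf("%d ", b->data); b = inorderSuccessor(b);
} else { /* common element: this is also part of the intersection */
printf("%d ", a->data);
a = inorderSuccessor(a); b = inorderSuccessor(b);
}
}
for (; a != NULL; a = inorderSuccessor(a)) printf("%d ", a->data);
for (; b != NULL; b = inorderSuccessor(b)) printf("%d ", b->data);
}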
Problem-65 For Problem-64, what if parent pointers are not available?
Solution: If parent pointers are not available, the BSTs can be converted to linked lists and then merged.
1 Convert both the BSTs into sorted doubly linked lists in O(𝑛 + 𝑚) time. This produces 2 sorted lists.
2 Merge the two double linked lists into one and also maintain the count of total elements in O(𝑛 + 𝑚) time.
3 Convert the sorted doubly linked list into height balanced tree in O(𝑛 + 𝑚) time.
Problem-66 For Problem-64, is there any alternative way of solving the problem?
Solution: Yes, by using inorder traversal.
• Perform inorder traversal on one of the BSTs.
• While performing the traversal store them in table (hash table).
• After completion of the traversal of first 𝐵𝑆𝑇, start traversal of second 𝐵𝑆𝑇 and compare them with hash
table contents.
Time Complexity: O(𝑚 + 𝑛). Space Complexity: O(𝑀𝑎𝑥(𝑚, 𝑛)).
Problem-67 Given a 𝐵𝑆𝑇 and two numbers 𝐾1 and 𝐾2, give an algorithm for printing all the elements of 𝐵𝑆𝑇
in the range 𝐾1 and 𝐾2.
Solution:
void rangePrinter(struct BinarySearchTreeNode * root, int K1, int K2) {
if (root == NULL) return;
if (root→ data >= K1) rangePrinter(root→ left, K1, K2);
if (root→data >= K1 && root→data <= K2) printf("%d ", root→data);
if (root→ data <= K2) rangePrinter(root→ right, K1, K2);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for stack space.
Problem-68 For Problem-67, is there any alternative way of solving the problem?
Solution: We can use level order traversal: while adding the elements to queue check for the range.
void rangeSearchLevelOrder(struct BinarySearchTreeNode *root, int K1, int K2){
struct BinarySearchTreeNode *temp;
struct Queue *Q = createQueue();
if(!root)
return;
enQueue(Q, root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
if(temp→data >= K1 && temp→data <= K2)
printf("%d ", temp→data);
if(temp→left && temp→data >= K1) //left subtree can contain in-range values only if data >= K1
enQueue(Q, temp→left);
if(temp→right && temp→data <= K2) //right subtree can contain in-range values only if data <= K2
enQueue(Q, temp→right);
}
deleteQueue(Q);
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for queue.
Problem-69 For Problem-67, can we still think of an alternative way to solve the problem?
Solution: First locate 𝐾1 with normal binary search and after that use InOrder successor until we encounter 𝐾2.
For algorithm, refer to problems section of threaded binary trees.
Problem-70 Given root of a Binary Search tree, trim the tree, so that all elements returned in the new tree are
between the inputs 𝐴 and 𝐵.
Solution: It’s just another way of asking Problem-67.
Problem-71 Given two BSTs, check whether the elements of them are the same or not. For example: two BSTs
with data 10 5 20 15 30 and 10 20 15 30 5 should return true and the dataset with 10 5 20 15 30 and 10 15
30 20 5 should return false. Note: BSTs data can be in any order.
Solution: One simple way is performing an inorder traversal on first tree and storing its data in hash table. As a
second step, perform inorder traversal on second tree and check whether that data is already there in hash table
or not (if it exists in hash table then mark it with -1 or some unique value).
During the traversal of the second tree, if we find any mismatch, return false. After the traversal of the second tree, check whether the hash table contains only -1s (this ensures that the first tree has no extra data).
Time Complexity: O(𝑚𝑎𝑥(𝑚, 𝑛)), where 𝑚 and 𝑛 are the number of elements in first and second BST.
Space Complexity: O(𝑚𝑎𝑥(𝑚, 𝑛)). This depends on the size of the first tree.
Problem-72 For Problem-71, can we reduce the time complexity?
Solution: Instead of performing the traversals one after the other, we can perform 𝑖𝑛 − 𝑜𝑟𝑑𝑒𝑟 traversal of both the
trees in parallel. Since the 𝑖𝑛 − 𝑜𝑟𝑑𝑒𝑟 traversal gives the sorted list, we can check whether both the trees are
generating the same sequence or not.
Time Complexity: O(𝑚𝑎𝑥(𝑚, 𝑛)). Space Complexity: O(𝑚𝑎𝑥(ℎ1 , ℎ2 )), where ℎ1 and ℎ2 are the heights of the two trees, for the stacks that drive the two inorder traversals in parallel.
Problem-73 For the key values 1. . . 𝑛, how many structurally unique BSTs are possible that store those keys.
Solution: Strategy: consider that each value could be the root. Recursively find the size of the left and right
subtrees.
int countTrees(int n) {
if (n <= 1)
return 1;
else {
// there will be one value at the root, with whatever remains on the left and right
// each forming their own subtrees. Iterate through all the values that could be the root...
int sum = 0;
int left, right, root;
for (root=1; root<=n; root++) {
left = countTrees(root - 1);
right = countTrees(n - root);
// number of possible trees with this root == left*right
sum += left*right;
}
return(sum);
}
}
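For reference (a standard identity, not derived in the text), the count produced by this recurrence is the 𝑛𝑡ℎ Catalan number:
$$C_n = \frac{1}{n+1}\binom{2n}{n}, \qquad C_1 = 1,\; C_2 = 2,\; C_3 = 5,\; C_4 = 14.$$
Note also that the plain recursion recomputes the same subproblems many times; memoizing countTrees brings the running time down to O(𝑛²).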
Problem-74 Given a BST of size 𝑛, in which each node 𝑟 has an additional field 𝑟 → 𝑠𝑖𝑧𝑒, the number of the
keys in the sub-tree rooted at 𝑟 (including the root node 𝑟). Give an O(ℎ) algorithm 𝐺𝑟𝑒𝑎𝑡𝑒𝑟𝑡ℎ𝑎𝑛𝐶𝑜𝑛𝑠𝑡𝑎𝑛𝑡(𝑟, 𝑘) to
find the number of keys that are strictly greater than 𝑘 (ℎ is the height of the binary search tree).
Solution:
int greaterthanConstant(struct BinarySearchTreeNode *r, int k){
int keysCount = 0;
while (r != NULL){
if (k < r→data){
//r itself and everything in its right subtree are greater than k
keysCount = keysCount + (r→right ? r→right→size : 0) + 1;
r = r→left;
}
else if (k > r→data)
r = r→right;
else { // k == r→data
keysCount = keysCount + (r→right ? r→right→size : 0);
break;
}
}
return keysCount;
}
The suggested algorithm works well if the key is a unique value for each node. Otherwise, when reaching 𝑘 = 𝑟→𝑑𝑎𝑡𝑎, we should start a process of moving to the right until reaching a node 𝑦 with a key that is bigger than 𝑘, and then we should return 𝑘𝑒𝑦𝑠𝐶𝑜𝑢𝑛𝑡 + 𝑦→𝑠𝑖𝑧𝑒. Time Complexity: O(ℎ), where ℎ = O(𝑛) in the worst case and O(𝑙𝑜𝑔𝑛) in the average case.
In general, the height balanced trees are represented with 𝐻𝐵(𝑘), where 𝑘 is the difference between left subtree
height and right subtree height. Sometimes 𝑘 is called balance factor.
4
2 6
1 3 5 7
Note: For constructing 𝐻𝐵(0) tree refer to 𝑃𝑟𝑜𝑏𝑙𝑒𝑚𝑠 section.
6 12 6 10
3 10 3 9 12
As an example, among the above binary search trees, the left one is not an AVL tree, whereas the right binary
search tree is an AVL tree.
To compute the minimum number of nodes 𝑁(ℎ) in an AVL tree of height ℎ, observe that the minimum is reached when one subtree of the root has height ℎ − 1 and the other has height ℎ − 2. This gives the recurrence 𝑁(ℎ) = 𝑁(ℎ − 1) + 𝑁(ℎ − 2) + 1.
Rotations
When the tree structure changes (e.g., with insertion or deletion), we need to modify the tree to restore the AVL
tree property. This can be done using single rotations or double rotations. Since an insertion/deletion involves
adding/deleting a single node, this can only increase/decrease the height of a subtree by 1.
So, if the AVL tree property is violated at a node 𝑋, it means that the heights of left(𝑋) and right(𝑋) differ by exactly 2. This is because, if we rebalance the tree after every operation, a single insertion or deletion can make the two heights differ by at most 2, never more. Rotation is the technique used for restoring the AVL tree property. This means we need to apply rotations at the node 𝑋.
Observation: One important observation is that, after an insertion, only nodes that are on the path from the
insertion point to the root might have their balances altered, because only those nodes have their subtrees altered.
To restore the AVL tree property, we start at the insertion point and keep going to the root of the tree.
While moving to the root, we need to consider the first node that is not satisfying the AVL property. From that
node onwards, every node on the path to the root will have the issue.
Also, if we fix the issue for that first node, then all other nodes on the path to the root will automatically satisfy
the AVL tree property. That means we always need to care for the first node that is not satisfying the AVL property
on the path from the insertion point to the root and fix it.
Types of Violations
Let us assume the node that must be rebalanced is 𝑋. Since any node has at most two children, and a height
imbalance requires that 𝑋’𝑠 two subtree heights differ by two, we can observe that a violation might occur in four
cases:
1. An insertion into the left subtree of the left child of 𝑋.
2. An insertion into the right subtree of the left child of 𝑋.
3. An insertion into the left subtree of the right child of 𝑋.
4. An insertion into the right subtree of the right child of 𝑋.
Cases 1 and 4 are symmetric and easily solved with single rotations. Similarly, cases 2 and 3 are also symmetric
and can be solved with double rotations (needs two single rotations).
Single Rotations
Left Left Rotation (LL Rotation) [Case-1]: In the case below, node 𝑋 is not satisfying the AVL tree property. As
discussed earlier, the rotation does not have to be done at the root of a tree. In general, we start at the node
inserted and travel up the tree, updating the balance information at every node on the path.
X W
W X
C A
A B B C
root root
6 6
5 9 5 8
3 8 3 7 9
7
For example, in the figure above, after the insertion of 7 in the original AVL tree on the left, node 9 becomes
unbalanced. So, we do a single left-left rotation at 9. As a result we get the tree on the right.
struct AVLTreeNode *singleRotateLeft(struct AVLTreeNode *X ){
struct AVLTreeNode *W = X→left;
X→left = W→right;
W→right = X;
X→height = max( height(X→left), height(X→right) ) + 1;
W→height = max( height(W→left), X→height ) + 1;
return W; /* New root */
}
Time Complexity: O(1). Space Complexity: O(1).
Right Right Rotation (RR Rotation) [Case-4]: In this case, node 𝑋 is not satisfying the AVL tree property.
W X
X W
A C
B C A B
root 8 root 8
6 15 6 19
3 19 3 15 29
29
For example, in the above figure, after the insertion of 29 in the original AVL tree on the left, node 15 becomes
unbalanced. So, we do a single right-right rotation at 15. As a result we get the tree on the right.
struct AVLTreeNode *singleRotateRight(struct AVLTreeNode *W ) {
struct AVLTreeNode *X = W→right;
W→right = X→left;
X→left = W;
W→height = max( height(W→right), height(W→left) ) + 1;
X→height = max( height(X→right), W→height) + 1;
return X;
}
Time Complexity: O(1). Space Complexity: O(1).
Double Rotations
Left Right Rotation (LR Rotation) [Case-2]: For case-2 and case-3 single rotation does not fix the problem. We
need to perform two rotations.
(Figure: the left-right double rotation. The new node is inserted into the right subtree 𝑌 of the unbalanced node's left child; after a left rotation at the left child followed by a right rotation at the unbalanced node, 𝑌 becomes the new root of the subtree, with 𝑋 and 𝑍 as its children and the four subtrees A, B, C and D reattached in order.)
As an example, let us consider the following tree: Insertion of 7 is creating the case-2 scenario and right side tree
is the one after double rotation.
8 6
5 9 5 8
3 6 3 7 9
7
Right Left Rotation (RL Rotation) [Case-3]: This is the mirror image of case-2. Here the node 𝑋 is unbalanced because of an insertion into the left subtree 𝑌 of its right child 𝑍. A right rotation at 𝑍 followed by a left rotation at 𝑋 makes 𝑌 the new root of the subtree, with 𝑋 as its left child, 𝑍 as its right child, and the four subtrees A, B, C and D reattached in order.
As an example, let us consider the following tree: The insertion of 6 is creating the case-3 scenario and the right
side tree is the one after the double rotation.
(Figure: the tree before and after the double rotation; the insertion of 6 makes the tree unbalanced, and the right-left double rotation restores the balance.)
Problem-75 What is the minimum number of nodes in an AVL tree of height ℎ?
Solution: Let 𝑁(ℎ) be the minimum number of nodes in an AVL tree of height ℎ. As observed earlier, such a minimal tree has one subtree of height ℎ − 1 and another of height ℎ − 2, so
𝑁(ℎ) = 𝑁(ℎ − 1) + 𝑁(ℎ − 2) + 1, with base cases 𝑁(0) = 1 and 𝑁(1) = 2.
Problem-76 For Problem-75, how many different shapes can there be of a minimal AVL tree of height ℎ?
Solution: Let 𝑁𝑆(ℎ) be the number of different shapes of a minimal AVL tree of height ℎ.
𝑁𝑆(0) = 1
𝑁𝑆(1) = 2
In general, 𝑁𝑆(ℎ) = 2 × 𝑁𝑆(ℎ − 1) × 𝑁𝑆(ℎ − 2): the taller subtree (of height ℎ − 1) can be either the left or the right child, and the two subtrees choose their shapes independently.
Problem-79 Given a binary search tree (an AVL tree, say) with integer items and two integers 𝑎 and 𝑏 with 𝑎 ≤ 𝑏, give an algorithm for counting the number of nodes whose data lies in the range [𝑎, 𝑏].
Solution:
The idea is to make use of the recursive property of binary search trees. There are three cases to consider: whether
the current node is in the range [𝑎, 𝑏], on the left side of the range [𝑎, 𝑏], or on the right side of the range [𝑎, 𝑏]. Only
subtrees that possibly contain the nodes will be processed under each of the three cases.
int rangeCount(struct AVLNode *root, int a, int b) {
if(root == NULL)
return 0;
else if(root→data > b)
return rangeCount(root→left, a, b);
else if(root→data < a)
return rangeCount(root→right, a, b);
else if(root→data >= a && root→data <= b)
return rangeCount(root→left, a, b) + rangeCount(root→right, a, b) + 1;
}
The complexity is similar to 𝑖𝑛 − 𝑜𝑟𝑑𝑒𝑟 traversal of the tree but skipping left or right sub-trees when they do not
contain any answers. So in the worst case, if the range covers all the nodes in the tree, we need to traverse all the
𝑛 nodes to get the answer. The worst time complexity is therefore O(𝑛).
If the range is small, which only covers a few elements in a small subtree at the bottom of the tree, the time
complexity will be O(ℎ) = O(𝑙𝑜𝑔𝑛), where ℎ is the height of the tree. This is because only a single path is traversed
to reach the small subtree at the bottom and many higher level subtrees have been pruned along the way.
Note: Refer similar problem in BST.
Problem-80 Given a BST (applicable to AVL trees as well) where each node contains two data elements (its
data and also the number of nodes in its subtrees) as shown below. Convert the tree to another BST by replacing
the second data element (number of nodes in its subtrees) with previous node data in inorder traversal. Note
that each node is merged with 𝑖𝑛𝑜𝑟𝑑𝑒𝑟 previous node data. Also make sure that conversion happens in-place.
(Figure: on the left, the input BST, where each node stores its data together with the count of nodes in its subtrees; on the right, the converted BST, where each node's second field has been replaced with its inorder-previous node's data.)
Solution: The simplest way is to use level order traversal. If the number of elements in the left subtree is greater than the number of elements in the right subtree, find the maximum element in the left subtree and replace the current node's second data element with it. Similarly, if the number of elements in the left subtree is less than the number of elements in the right subtree, find the minimum element in the right subtree and replace the current node's second data element with it.
struct BST *treeCompression (struct BST *root){
struct BST *temp, *temp2;
struct Queue *Q = createQueue();
if(!root)
return NULL;
enQueue(Q, root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
if(temp→left && temp→right && temp→left→data2 > temp→right→data2)
temp2 = findMax(temp→left);
else temp2 = findMin(temp→right);
temp→data2 = temp2→data2; //Process current node
//Remember to delete this node.
deleteNodeInBST(temp2);
if(temp→left)
enQueue(Q, temp→left);
if(temp→right)
enQueue(Q, temp→right);
}
deleteQueue(Q);
}
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛) on average since BST takes O(𝑙𝑜𝑔𝑛) on average to find the maximum or minimum
element. Space Complexity: O(𝑛). Since, in the worst case, all the nodes on the entire last level could be in the
queue simultaneously.
Problem-81 Can we reduce time complexity for the previous problem?
Solution: Let us try using an approach that is similar to what we followed in Problem-62. The idea behind this
solution is that inorder traversal of BST produces sorted lists. While traversing the BST in inorder, keep track of
the elements visited and merge them.
struct BinarySearchTreeNode *treeCompression(struct BinarySearchTreeNode *root, int *previousNodeData){
if(!root) return NULL;
treeCompression(root→left, previousNodeData);
if(*previousNodeData == INT_MIN) //first node in inorder: it has no predecessor
*previousNodeData = root→data;
else { //Process current node: merge it with the inorder previous node's data
root→data2 = *previousNodeData;
*previousNodeData = root→data;
}
return treeCompression(root→right, previousNodeData);
}
Time Complexity: O(𝑛).
Space Complexity: O(1). Note that, we are still having recursive stack space for inorder traversal.
Problem-82 Given a BST and a key, find the element in the BST which is closest to the given key.
Solution: As a simple solution, we can use level-order traversal and for every element compute the difference
between the given key and the element’s value. If that difference is less than the previous maintained difference,
then update the difference with this new minimum value. With this approach, at the end of the traversal we will
get the element which is closest to the given key.
int closestInBST(struct BinaryTreeNode *root, int key){
struct BinaryTreeNode *temp, *element;
struct Queue *Q;
int difference = INT_MAX;
if(!root)
return 0;
Q = createQueue();
enQueue(Q,root);
while(!isEmpty(Q)) {
temp = deQueue(Q);
if(difference > abs(temp→data - key)){
difference = abs(temp→data-key);
element = temp;
}
if(temp→left)
enQueue (Q, temp→left);
if(temp→right)
enQueue (Q, temp→right);
}
deleteQueue(Q);
return element→data;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-83 For Problem-82, can we solve it using the recursive approach?
Solution: The approach is similar to Problem-18. Following is a simple algorithm for finding the closest Value in
BST.
1. If the root is NULL, then the closest value is zero (or NULL).
2. If the root’s data matches the given key, then the closest is the root.
3. Else, consider the root as the closest and do the following:
a. If the key is smaller than the root data, find the closest on the left side tree of the root recursively
and call it temp.
b. If the key is larger than the root data, find the closest on the right side tree of the root recursively
and call it temp.
4. Return the root or temp depending on whichever is nearer to the given key.
struct BinaryTreeNode * closestInBST(struct BinaryTreeNode *root, int key){
struct BinaryTreeNode *temp;
if(root == NULL)
return root;
if(root→data == key)
return root;
if(key < root→data){
if(!root→left)
return root;
temp = closestInBST(root→left, key);
return abs(temp→data-key) > abs(root→data-key) ? root : temp;
}else{
if(!root→right)
return root;
temp = closestInBST(root→right, key);
return abs(temp→data-key) > abs(root→data-key) ? root : temp;
}
return NULL;
}
Time Complexity: O(𝑛) in worst case, and in average case it is O(𝑙𝑜𝑔𝑛). Space Complexity: O(𝑛).
Problem-84 Median in an infinite series of integers
Solution: Median is the middle number in a sorted list of numbers (if we have odd number of elements). If we
have even number of elements, median is the average of two middle numbers in a sorted list of numbers.
For solving this problem we can use a binary search tree with additional information at each node, and the number
of children on the left and right subtrees. We also keep the number of total nodes in the tree. Using this additional
information we can find the median in O(𝑙𝑜𝑔𝑛) time, taking the appropriate branch in the tree based on the number
of children on the left and right of the current node. But, the insertion complexity is O(𝑛) because a standard
binary search tree can degenerate into a linked list if we happen to receive the numbers in sorted order.
So, let’s use a balanced binary search tree to avoid worst case behavior of standard binary search trees. For this
problem, the balance factor is the number of nodes in the left subtree minus the number of nodes in the right
subtree. And only the nodes with a balance factor of +1 or 0 are considered to be balanced.
So, the number of nodes on the left subtree is either equal to or 1 more than the number of nodes on the right
subtree, but not less.
If we ensure this balance factor on every node in the tree, then the root of the tree is the median if the number of elements is odd. If the number of elements is even, the median is the average of the root and its inorder successor, which is the leftmost descendant of its right subtree. So, the complexity of insertion maintaining a balanced
condition is O(𝑙𝑜𝑔𝑛) and finding a median operation is O(1) assuming we calculate the inorder successor of the
root at every insertion if the number of nodes is even.
Insertion and balancing is very similar to AVL trees. Instead of updating the heights, we update the number of
nodes information. Balanced binary search trees seem to be the most optimal solution, insertion is O(𝑙𝑜𝑔𝑛) and
find median is O(1).
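A sketch of such a count-augmented node (the field names are illustrative assumptions):
struct MedianTreeNode {
int data;
int leftCount; /* number of nodes in the left subtree */
int rightCount; /* number of nodes in the right subtree */
struct MedianTreeNode *left, *right;
};
/* Balance invariant: leftCount == rightCount or leftCount == rightCount + 1,
so the root is the median whenever the total number of elements is odd. */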
Note: For an efficient algorithm refer to the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 𝑎𝑛𝑑 𝐻𝑒𝑎𝑝𝑠 chapter.
Problem-85 Given a binary tree, how do you remove all the half nodes (which have only one child)? Note that
we should not touch leaves.
Solution: By using post-order traversal we can solve this problem efficiently. We first process the left children,
then the right children, and finally the node itself. So we form the new tree bottom up, starting from the leaves
towards the root. By the time we process the current node, both its left and right subtrees have already been
processed.
struct BinaryTreeNode *removeHalfNodes(struct BinaryTreeNode *root){
if (!root)
return NULL;
root→left=removeHalfNodes(root→left);
root→right=removeHalfNodes(root→right);
if (root→left == NULL && root→right == NULL)
return root;
if (root→left == NULL)
return root→right;
if (root→right == NULL)
return root→left;
return root;
}
Time Complexity: O(𝑛).
Problem-86 Given a binary tree, how do you remove its leaves?
Solution: By using post-order traversal we can solve this problem (other traversals would also work).
struct BinaryTreeNode* removeLeaves(struct BinaryTreeNode* root) {
if (root != NULL) {
if (root→left == NULL && root→right == NULL) {
free(root);
return NULL;
} else {
root→left = removeLeaves(root→left);
root→right = removeLeaves(root→right);
}
}
return root;
}
Time Complexity: O(𝑛).
Problem-87 Given a BST and two integers (minimum and maximum integers) as parameters, how do you
remove (prune) elements that are not within that range?
(Figure: a sample BST and, next to it, the pruned trees that result for different [𝐴, 𝐵] ranges.)
Solution: Observation: Since we need to check each and every element in the tree, and the subtree changes
should be reflected in the parent, we can think about using post order traversal. So we process the nodes starting
from the leaves towards the root. As a result, while processing the node itself, both its left and right subtrees are
valid pruned BSTs. At each node we will return a pointer based on its value, which will then be assigned to its
parent’s left or right child pointer, depending on whether the current node is the left or right child of the parent.
If the current node’s value is between 𝐴 and 𝐵 (𝐴 <= 𝑛𝑜𝑑𝑒’𝑠 𝑑𝑎𝑡𝑎 <= 𝐵) then no action needs to be taken, so we
return the reference to the node itself.
If the current node’s value is less than 𝐴, then we return the reference to its right subtree and discard the left
subtree. Because if a node’s value is less than 𝐴, then its left children are definitely less than A since this is a
binary search tree. But its right children may or may not be less than A; we can’t be sure, so we return the
reference to it. Since we’re performing bottom-up post-order traversal, its right subtree is already a trimmed valid
binary search tree (possibly NULL), and its left subtree is definitely NULL because those nodes were surely less
than A and they were eliminated during the post-order traversal.
A similar situation occurs when the node’s value is greater than 𝐵, so we now return the reference to its left
subtree. Because if a node’s value is greater than 𝐵, then its right children are definitely greater than 𝐵. But its
left children may or may not be greater than 𝐵; So we discard the right subtree and return the reference to the
already valid left subtree.
struct BinarySearchTreeNode* pruneBST(struct BinarySearchTreeNode *root, int A, int B){
if(!root) return NULL;
root→left= pruneBST(root→left,A,B);
root→right= pruneBST(root→right,A,B);
if(A<=root→data && root→data<=B)
return root;
if(root→data<A)
return root→right;
if(root→data>B)
return root→left;
}
Time Complexity: O(𝑛) in worst case and in average case it is O(𝑙𝑜𝑔𝑛).
Note: If the given BST is an AVL tree then O(𝑛) is the average time complexity.
Problem-88 Given a binary tree, how do you connect all the adjacent nodes at the same level? Assume that
given binary tree has next pointer along with left and right pointers as shown below.
struct BinaryTreeNode {
int data;
struct BinaryTreeNode *left;
struct BinaryTreeNode *right;
struct BinaryTreeNode *next;
};
Solution: One simple approach is to use level-order traversal and keep updating the next pointers. While
traversing, we will link the nodes on the next level. If the node has left and right node, we will link left to right. If
node has next node, then link rightmost child of current node to leftmost child of next node.
void linkingNodesOfSameLevel(struct BinaryTreeNode *root){
struct Queue *Q = createQueue();
struct BinaryTreeNode *prev=NULL; // Pointer to the previous node of the current level
struct BinaryTreeNode *temp;
int currentLevelNodeCount, nextLevelNodeCount;
if(!root) return;
enQueue(Q, root);
currentLevelNodeCount = 1;
nextLevelNodeCount = 0;
while (!isEmpty(Q)) {
temp = deQueue(Q);
if (temp→left){
enQueue(Q, temp→left);
nextLevelNodeCount++;
}
if (temp→right){
enQueue(Q, temp→right);
nextLevelNodeCount++;
}
// Link the previous node of the current level to this node
if (prev)
prev→next = temp;
// Set the previous node to the current
prev = temp;
currentLevelNodeCount--;
if (currentLevelNodeCount == 0) { // Current level is over
prev→next = NULL; // Terminate the sibling list of the current level
currentLevelNodeCount = nextLevelNodeCount;
nextLevelNodeCount = 0;
prev = NULL; // Start fresh for the next level
}
}
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for the queue.
6.14.3 B-Trees
A B-Tree, like other self-balancing trees such as the AVL and Red-Black trees, maintains its balance while operations are performed against it. A B-Tree has the following properties:
• Minimum degree "𝑡": except for the root node, all other nodes must have no fewer than 𝑡 − 1 keys
• Each node with 𝑛 keys has 𝑛 + 1 children
• Keys in each node are arranged in increasing order: 𝑘1 < 𝑘2 < ⋯ < 𝑘𝑛
• Each node cannot have more than 2𝑡 − 1 keys, and thus no more than 2𝑡 children
• The root node must contain at least one key; there is no root node if the tree is empty
• The tree grows in depth only when the root node is split
Unlike a binary-tree, each node of a b-tree may have a variable number of keys and children. The keys are stored
in non-decreasing order. Each key has an associated child that is the root of a subtree containing all nodes with
keys less than or equal to the key but greater than the preceding key. A node also has an additional rightmost
child that is the root for a subtree containing all keys greater than any keys in the node.
A b-tree has a minimum number of allowable children for each node known as the 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑎𝑡𝑖𝑜𝑛 𝑓𝑎𝑐𝑡𝑜𝑟. If 𝑡 is this
𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑎𝑡𝑖𝑜𝑛 𝑓𝑎𝑐𝑡𝑜𝑟, every node must have at least 𝑡 − 1 keys. Under certain circumstances, the root node is
allowed to violate this property by having fewer than 𝑡 − 1 keys. Every node may have at most 2𝑡 − 1 keys or,
equivalently, 2𝑡 children.
Since each node tends to have a large branching factor (a large number of children), it is typically necessary to
traverse relatively few nodes before locating the desired key. If access to each node requires a disk access, then a
B-tree will minimize the number of disk accesses required. The minimization factor is usually chosen so that the
total size of each node corresponds to a multiple of the block size of the underlying storage device. This choice
simplifies and optimizes disk access. Consequently, a B-tree is an ideal data structure for situations where all
data cannot reside in primary storage and accesses to secondary storage are comparatively expensive (or time
consuming).
𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 the tree is similar to searching a binary search tree, except that the key is compared against several
keys within a node, since a node contains more than one key. If the key is found in the node, the search
terminates. Otherwise, the search moves down to the child 𝑐𝑖 for which 𝑘𝑖−1 < 𝑘 < 𝑘𝑖 .
Key 𝑖𝑛𝑠𝑒𝑟𝑡𝑖𝑜𝑛 in a B-tree happens at the bottom: we walk down the tree from the root to the target leaf node
first. If the leaf is not full, the key is simply inserted. If it is full, the node is split in the middle, the median
key moves up to the parent, and then the new key is inserted. While walking down the tree, if the root node is
found to be full, it is split first and we get a new root node. Then the normal insertion operation is performed.
Key 𝑑𝑒𝑙𝑒𝑡𝑖𝑜𝑛 is more complicated, as it needs to maintain the minimum number of keys in each node.
If the key is found in a leaf node and deleting it does not leave the node with too few keys, it is simply removed
right away. If the key is in an inner node, the predecessor of the key in the corresponding child node is moved up
to replace the key in the inner node. If moving the predecessor would cause the child node to violate the
key-count constraint, sibling child nodes are combined and the key in the inner node is deleted.
One approach is to maintain a separate array of 𝑛 elements, where 𝑛 is the size of the original array, in which
each index stores the sum of all elements from 0 up to that index. Essentially, with a bit of preprocessing we have
brought the query time down from a worst case of O(𝑛) to O(1). This is great as far as static arrays are concerned,
but what if we are required to perform updates on the array too?
The first approach gives us O(𝑛) query time but O(1) update time. The second approach, on the other hand,
gives us O(1) query time but O(𝑛) update time. So, which one do we choose?
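As a quick illustration of the second approach, here is a minimal prefix-sum sketch in C (the names are illustrative):
#include <stdio.h>
/* prefix[i] holds the sum of A[0..i]; built once in O(n) */
void buildPrefix(int A[], int prefix[], int n) {
    prefix[0] = A[0];
    for (int i = 1; i < n; i++)
        prefix[i] = prefix[i - 1] + A[i];
}
/* sum of A[lo..hi] in O(1) */
int rangeSum(int prefix[], int lo, int hi) {
    return lo == 0 ? prefix[hi] : prefix[hi] - prefix[lo - 1];
}
int main(void) {
    int A[] = {3, 1, 4, 1, 5, 9}, prefix[6];
    buildPrefix(A, prefix, 6);
    printf("%d\n", rangeSum(prefix, 1, 4)); /* 1 + 4 + 1 + 5 = 11 */
    return 0;
}
Note that a single update of A[i] forces us to recompute prefix[i..n−1], which is exactly the O(𝑛) update cost mentioned above.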
Interval trees are also binary search trees, and they store interval information in the node structure. That means,
we maintain a set of 𝑛 intervals [𝑖1 , 𝑖2 ] such that one of the intervals containing a query point 𝑄 (if any) can be
found efficiently. Interval trees are used for performing range queries efficiently.
A segment tree is a heap-like data structure that can be used for making update/query operations upon array
intervals in logarithmical time. We define the segment tree for the interval [𝑖, 𝑗] in the following recursive manner:
• The root (first node in the array) will hold the information for the interval [𝑖, 𝑗]
• If 𝑖 < 𝑗, the left and right children will hold the information for the intervals [𝑖, ⌊(𝑖+𝑗)/2⌋] and [⌊(𝑖+𝑗)/2⌋ + 1, 𝑗]
A segment tree (also called a 𝑠𝑒𝑔𝑡𝑟𝑒𝑒, and sometimes an 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝑡𝑟𝑒𝑒) is a height-balanced binary tree with a
static structure, primarily used for range queries. The nodes of a segment tree correspond to various intervals
and can be augmented with appropriate information pertaining to those intervals. It is somewhat less powerful
than a balanced binary search tree because of its static structure, but due to the recursive nature of operations
on the segtree, it is incredibly easy to think about and code.
We can use segment trees to solve a range of minimum/maximum query problems. Building the tree takes O(𝑛)
time and each query takes O(𝑙𝑜𝑔𝑛) time.
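To make the recursive definition concrete, here is a minimal array-based range-minimum segment tree sketch in C (node 1 is the root; all names are illustrative):
#include <stdio.h>
#include <limits.h>
#define N 6
int A[N] = {5, 2, 8, 1, 9, 3};
int tree[4 * N];   /* node v covers an interval; children of v are 2v and 2v+1 */
int minOf(int a, int b) { return a < b ? a : b; }
void build(int v, int lo, int hi) {
    if (lo == hi) { tree[v] = A[lo]; return; }
    int mid = (lo + hi) / 2;
    build(2 * v, lo, mid);
    build(2 * v + 1, mid + 1, hi);
    tree[v] = minOf(tree[2 * v], tree[2 * v + 1]);
}
/* minimum of A[i..j]; initial call uses the root: query(1, 0, N-1, i, j) */
int query(int v, int lo, int hi, int i, int j) {
    if (j < lo || hi < i) return INT_MAX;   /* no overlap */
    if (i <= lo && hi <= j) return tree[v]; /* total overlap */
    int mid = (lo + hi) / 2;
    return minOf(query(2 * v, lo, mid, i, j),
                 query(2 * v + 1, mid + 1, hi, i, j));
}
int main(void) {
    build(1, 0, N - 1);
    printf("%d\n", query(1, 0, N - 1, 1, 3)); /* min of {2, 8, 1} = 1 */
    return 0;
}
Each query visits only the O(𝑙𝑜𝑔𝑛) nodes whose intervals straddle the query range, which is where the O(𝑙𝑜𝑔𝑛) bound comes from.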
Example: Given a set of intervals: 𝑆 = {[2-5], [6-7], [6-10], [8-9], [12-15], [15-23], [25-30]}. A query with 𝑄 = 9
returns [6, 10] or [8, 9] (assume these are the intervals which contain 9 among all the intervals). A query with 𝑄 =
23 returns [15, 23].
(Figure: a set of intervals on the number line and a vertical query line through the query point.)
Construction of Interval Trees: Let us assume that we are given a set 𝑆 of 𝑛 intervals (called 𝑠𝑒𝑔𝑚𝑒𝑛𝑡𝑠). These 𝑛
intervals will have 2𝑛 endpoints. Now, let us see how to construct the interval tree.
Algorithm: Recursively build the tree on interval set 𝑆 as follows:
• Sort the 2𝑛 endpoints and let 𝑋𝑚𝑖𝑑 be the median point
• Store the intervals that contain 𝑋𝑚𝑖𝑑 at the root; recursively build the left subtree on the intervals lying entirely to the left of 𝑋𝑚𝑖𝑑 and the right subtree on those lying entirely to its right
Time Complexity for building interval trees: O(𝑛𝑙𝑜𝑔𝑛). Since we are choosing the median, Interval Trees will be
approximately balanced. This ensures that we split the set of end points up in half each time. The depth of the
tree is O(𝑙𝑜𝑔𝑛). To simplify the search process, generally 𝑋𝑚𝑖𝑑 is stored with each node.
After an insertion, we traverse back up the tree. If we find an imbalance, where a child’s size exceeds 𝛼 times the
parent’s size, we must rebuild the subtree at the parent, the 𝑠𝑐𝑎𝑝𝑒𝑔𝑜𝑎𝑡.
There might be more than one possible scapegoat, but we only have to pick one; the best choice is determined
by height balance. When removing a node, we check whether the total size of the tree is less than 𝛼 times the largest
size since the last rebuild of the tree. If so, we rebuild the entire tree. The 𝛼 for a scapegoat tree can be any
number between 0.5 and 1.0: the value 0.5 forces perfect balance, while 1.0 causes rebalancing never to
occur, effectively turning the structure into an ordinary BST.
(Figure: two example trees, one in which every node holds the value 9, which is univalued, and one containing a 5 and a 6, which is not univalued.)
Solution: A tree is univalued if both its children are univalued and the root has the same value as its child
nodes. We can write our function recursively. 𝑖𝑠𝐿𝑒𝑓𝑡𝑈𝑛𝑖𝑣𝑎𝑙𝑇𝑟𝑒𝑒 will represent that the left subtree is correct: i.e.,
that it is univalued and the root value equals the left child’s value. 𝑖𝑠𝑅𝑖𝑔ℎ𝑡𝑈𝑛𝑖𝑣𝑎𝑙𝑇𝑟𝑒𝑒 represents the same
thing for the right subtree. We need both properties to be true.
class Solution {
public:
bool isUnivalueTree(BinaryTreeNode* root) {
if (root == NULL) return true;
if (root->left != NULL) {
if (root->data != root->left->data || !isUnivalueTree(root->left)) {
return false;
}
}
if (root->right != NULL) {
if (root->data != root->right->data || !isUnivalueTree(root->right)) {
return false;
}
}
return true;
}
};
Time complexity: O(𝑛), where 𝑛 is the number of nodes in the given tree.
Space complexity: O(ℎ), where ℎ is the height of the given tree.
Chapter
Priority Queues
and Heaps 7
7.1 What is a Priority Queue?
In some situations, we may need to find the minimum/maximum element among a collection of elements. We can
do this with the help of Priority Queue ADT. A priority queue ADT is a data structure that supports the operations
𝐼𝑛𝑠𝑒𝑟𝑡 and 𝐷𝑒𝑙𝑒𝑡𝑒𝑀𝑖𝑛 (which returns and removes the minimum element) or 𝐷𝑒𝑙𝑒𝑡𝑒𝑀𝑎𝑥 (which returns and removes
the maximum element).
These operations are equivalent to the 𝐸𝑛𝑄𝑢𝑒𝑢𝑒 and 𝐷𝑒𝑄𝑢𝑒𝑢𝑒 operations of a queue. The difference is that, in priority
queues, the order in which the elements enter the queue may not be the order in which they are processed. An
example application of a priority queue is job scheduling, where jobs are served according to their priority rather
than first come, first served.
(Figure: a priority queue, with 𝐼𝑛𝑠𝑒𝑟𝑡 feeding elements in and 𝐷𝑒𝑙𝑒𝑡𝑒𝑀𝑎𝑥 removing the highest-priority element.)
A priority queue is called an 𝑎𝑠𝑐𝑒𝑛𝑑𝑖𝑛𝑔 − 𝑝𝑟𝑖𝑜𝑟𝑖𝑡𝑦 queue, if the item with the smallest key has the highest priority
(that means, delete the smallest element always). Similarly, a priority queue is said to be a 𝑑𝑒𝑠𝑐𝑒𝑛𝑑𝑖𝑛𝑔 − 𝑝𝑟𝑖𝑜𝑟𝑖𝑡𝑦
queue if the item with the largest key has the highest priority (delete the maximum element always). Since these
two types are symmetric we will be concentrating on one of them: ascending-priority queue.
Comparing Implementations
Implementation Insertion Deletion (deleteMax) Find Max
Unordered array 1 𝑛 𝑛
Unordered list 1 𝑛 𝑛
Ordered array 𝑛 1 1
Ordered list 𝑛 1 1
Binary Search Trees 𝑙𝑜𝑔𝑛 (average) 𝑙𝑜𝑔𝑛 (average) 𝑙𝑜𝑔𝑛 (average)
Balanced Binary Search Trees 𝑙𝑜𝑔𝑛 𝑙𝑜𝑔𝑛 𝑙𝑜𝑔𝑛
Binary Heaps 𝑙𝑜𝑔𝑛 𝑙𝑜𝑔𝑛 1
In the examples below, the left tree is a heap (each element is greater than its children) and the right tree is not a
heap (since 11 is greater than 2).
(Figure: two binary trees; in the tree on the right, the node 11 appears below the node 2, violating the heap property.)
Types of Heaps?
Based on the property of a heap we can classify heaps into two types:
• Min heap: The value of a node must be less than or equal to the values of its children.
(Figure: an example min-heap.)
• Max heap: The value of a node must be greater than or equal to the values of its children.
(Figure: an example max-heap.)
Declaration of Heap
struct Heap {
int *array;
int count; // Number of elements in Heap
int capacity; // Size of the heap
int heap_type; // Min Heap or Max Heap
};
Creating Heap
struct Heap * createHeap(int capacity, int heap_type) {
struct Heap * h = (struct Heap *)malloc(sizeof(struct Heap));
if(h == NULL) {
printf("Memory Error");
return NULL;
}
h→heap_type = heap_type;
h→count = 0;
h→capacity = capacity;
h→array = (int *) malloc(sizeof(int) * h→capacity);
if(h→array == NULL) {
printf("Memory Error");
return;
}
return h;
}
Parent of a Node
For a node at the 𝑖 𝑡ℎ location, its parent is at location ⌊(𝑖 − 1)/2⌋. In the previous example, the element 6 is at the second location
and its parent is at the 0𝑡ℎ location.
int parent (struct Heap * h, int i) {
if(i <= 0 || i >= h→count)
return -1;
return (i-1)/2;
}
Time Complexity: O(1).
Children of a Node
Similar to the above discussion, for a node at the 𝑖 𝑡ℎ location, its children are at locations 2 ∗ 𝑖 + 1 and 2 ∗ 𝑖 + 2. For
example, in the above tree the element 6 is at the second location and its children 2 and 5 are at locations
5 (2 ∗ 2 + 1) and 6 (2 ∗ 2 + 2).
int leftChild(struct Heap *h, int i) {
    int left = 2 * i + 1;
    if(left >= h→count)
        return -1;
    return left;
}
int rightChild(struct Heap *h, int i) {
    int right = 2 * i + 2;
    if(right >= h→count)
        return -1;
    return right;
}
Time Complexity: O(1) for each.
Heapifying an Element
After inserting an element into the heap, it may not satisfy the heap property. In that case we need to adjust the
positions of elements in the heap to make it a heap again. This process is called ℎ𝑒𝑎𝑝𝑖𝑓𝑦𝑖𝑛𝑔. In a max-heap, to heapify
an element, we find the maximum of its children, swap it with the current element, and continue this process until
the heap property is satisfied at every node.
(Figure: a max-heap whose root 31 has children 1 and 21; the element 1 violates the heap property, since its children are 9 and 10.)
Observation: One important property of a heap is that, if an element does not satisfy the heap property, then the
subtrees rooted at its ancestors are not heaps either: in the example above, element 1 violates the heap property,
so the subtree rooted at its parent 31 also fails to be a heap. Conversely, once we heapify an element, the heap
property is restored everywhere below it automatically. Let us go through an example. In the above heap, the
element 1 is not satisfying the heap property; let us try heapifying this element.
To heapify 1, find the maximum of its children and swap with that.
(Figure: after swapping 1 with its larger child 10; the children of 1 are now 8 and 7.)
We need to continue this process until the element satisfies the heap properties. Now, swap 1 with 8.
(Figure: after swapping 1 with 8; every node now satisfies the heap property.)
Now the tree satisfies the heap property. Since this heapifying process moves from top to
bottom, it is sometimes called 𝑝𝑒𝑟𝑐𝑜𝑙𝑎𝑡𝑒 𝑑𝑜𝑤𝑛. Similarly, if we heapify starting from some node and move toward
the root, we call that process 𝑝𝑒𝑟𝑐𝑜𝑙𝑎𝑡𝑒 𝑢𝑝, as we move from bottom to top.
//Heapifying the element at location 𝑖.
void percolateDown(struct Heap *h, int i) {
int l, r, max, temp;
l = leftChild(h, i);
r = rightChild(h, i);
if(l != -1 && h→array[l] > h→array[i])
max = l;
else
max = i;
if(r != -1 && h→array[r] > h→array[max])
max = r;
if(max != i) {
//Swap h→array[i] and h→array[max];
temp = h→array[i];
h→array[i] = h→array[max];
h→array[max] = temp;
percolateDown(h, max);
}
}
Time Complexity: O(𝑙𝑜𝑔𝑛). Heap is a complete binary tree and in the worst case we start at the root and come down to the leaf. This is equal
to the height of the complete binary tree. Space Complexity: O(1).
Deleting an Element
To delete an element from heap, we just need to delete the element from the root. This is the only operation
(maximum element) supported by standard heap. After deleting the root element, copy the last element of the heap
(tree) and delete that last element. After replacing the last element, the tree may not satisfy the heap property. To
make it heap again, call the 𝑃𝑒𝑟𝑐𝑜𝑙𝑎𝑡𝑒𝐷𝑜𝑤𝑛 function.
• Copy the first element into some variable
• Copy the last element into first element location
• 𝑃𝑒𝑟𝑐𝑜𝑙𝑎𝑡𝑒𝐷𝑜𝑤𝑛 the first element
int deleteMax(struct Heap *h) {
int data;
if(h→count == 0)
return -1;
data = h→array[0];
h→array[0] = h→array[h→count-1];
h→count--; //reducing the heap size
percolateDown(h, 0);
return data;
}
Note: Deleting an element uses 𝑃𝑒𝑟𝑐𝑜𝑙𝑎𝑡𝑒𝐷𝑜𝑤𝑛, and inserting an element uses 𝑃𝑒𝑟𝑐𝑜𝑙𝑎𝑡𝑒𝑈𝑝.
Time Complexity: same as 𝐻𝑒𝑎𝑝𝑖𝑓𝑦 function and it is O(𝑙𝑜𝑔𝑛).
Inserting an Element
Insertion of an element is similar to the heapify and deletion process.
• Increase the heap size
• Keep the new element at the end of the heap (tree)
• Heapify the element from bottom to top
For example, let us insert the element 19 at the end of the heap below.
(Figure: heap after placing 19 at the last position; its parent 14 is smaller, so the heap property is violated.)
In order to heapify this element (19), we need to compare it with its parent and adjust them. Swapping 19 and 14 gives:
(Figure: after swapping 19 and 14.)
Again, swap 19 and16:
(Figure: after swapping 19 and 16; the heap property is restored.)
Now the tree is satisfying the heap property. Since we are following the bottom-up approach we sometimes call this process 𝑝𝑒𝑟𝑐𝑜𝑙𝑎𝑡𝑒 𝑢𝑝.
int insert(struct Heap *h, int data) {
    int i;
    if(h→count == h→capacity)
        resizeHeap(h);
    h→count++; //increasing the heap size to hold this new item
    i = h→count-1;
    while(i > 0 && data > h→array[(i-1)/2]) {
        h→array[i] = h→array[(i-1)/2];
        i = (i-1)/2;
    }
    h→array[i] = data;
    return i; //final position of the inserted element
}
void resizeHeap(struct Heap * h) {
int *array_old = h→array;
h→array = (int *) malloc(sizeof(int) * h→capacity * 2);
if(h→array == NULL) {
printf("Memory Error");
h→array = array_old; //keep the old array on allocation failure
return;
}
for (int i = 0; i < h→capacity; i ++)
h→array[i] = array_old[i];
h→capacity *= 2;
free(array_old);
}
Time Complexity: O(𝑙𝑜𝑔𝑛). The explanation is the same as that of the 𝐻𝑒𝑎𝑝𝑖𝑓𝑦 function.
Destroying Heap
void DestroyHeap (struct Heap *h) {
if(h == NULL)
return;
free(h→array);
free(h);
h = NULL;
}
7.7 Heapsort
One main application of heap ADT is sorting (heap sort). The heap sort algorithm inserts all elements (from an
unsorted array) into a heap, then removes them from the root of a heap until the heap is empty. Note that heap
sort can be done in place with the array to be sorted. Instead of deleting an element, exchange the first element
(maximum) with the last element and reduce the heap size (array size). Then, we heapify the first element. Continue
this process until the number of remaining elements is one.
void Heapsort(int A[], int n) {
    struct Heap *h = createHeap(n, 0); //assuming heap_type 0 selects a max-heap
    int old_size, i, temp;
    buildHeap(h, A, n);
    old_size = h→count;
    for(i = n-1; i > 0; i--) {
        //h→array[0] is the largest element
        temp = h→array[0];
        h→array[0] = h→array[h→count-1];
        h→array[h→count-1] = temp;
        h→count--;
        percolateDown(h, 0);
    }
    h→count = old_size;
}
Time complexity: As we remove elements from the heap, the values come out in sorted order (since the maximum
element is always at the 𝑟𝑜𝑜𝑡). Since the time complexity of both the insertion and deletion algorithms is O(𝑙𝑜𝑔𝑛)
(where 𝑛 is the number of items in the heap), the time complexity of the heap sort algorithm is O(𝑛𝑙𝑜𝑔𝑛).
Problem-3 Is there a max-heap with seven distinct elements so that the preorder traversal of it gives the
elements in sorted order?
Solution: Yes. For the tree below, preorder traversal produces descending order.
(Figure: max-heap with root 7, children 6 and 3, and leaves 5, 4, 2, 1; its preorder traversal is 7, 6, 5, 4, 3, 2, 1.)
Problem-4 Is there a min-heap/max-heap with seven distinct elements so that the inorder traversal of it
gives the elements in sorted order?
Solution: No. Since a heap must be either a min-heap or a max-heap, the root will hold the smallest element or
the largest. An inorder traversal will visit the root of the tree as its second step, which is not the appropriate place
if the tree’s root contains the smallest or largest element.
Problem-5 Is there a min-heap/max-heap with seven distinct elements so that the postorder traversal of it
gives the elements in sorted order?
Solution: Yes. With a max-heap a postorder traversal can give the elements in ascending order, and with a
min-heap it can give them in descending order. For example, the max-heap with root 7, children 3 and 6, and
leaves 1, 2, 4, 5 has postorder traversal 1, 2, 3, 4, 5, 6, 7; the min-heap with root 1, children 5 and 2, and leaves
6, 7, 4, 3 has postorder traversal 6, 7, 5, 4, 3, 2, 1.
Problem-6 Show that the height of a heap with 𝑛 elements is ⌊𝑙𝑜𝑔𝑛⌋.
Solution: A heap is a complete binary tree; all the levels, except possibly the lowest, are completely full. A heap
of height ℎ therefore has at least 2^ℎ and at most 2^(ℎ+1) − 1 elements: 2^ℎ ≤ 𝑛 ≤ 2^(ℎ+1) − 1. Taking logarithms,
ℎ ≤ 𝑙𝑜𝑔𝑛 < ℎ + 1. Since ℎ is an integer, ℎ = ⌊𝑙𝑜𝑔𝑛⌋.
Problem-7 Given a min-heap, give an algorithm for finding the maximum element.
Solution: For a given min heap, the maximum element will always be at leaf only. Now, the next question is how
to find the leaf nodes in the tree.
(Figure: a min-heap; the leaf nodes occupy the second half of the array.)
If we observe carefully, the first leaf node comes right after the parent of the last element. Since the last element
is always at location 𝑠𝑖𝑧𝑒 − 1, its parent is at location ⌊(𝑠𝑖𝑧𝑒 − 1)/2⌋, so the first leaf node is at location:
⌊(𝑠𝑖𝑧𝑒 − 1)/2⌋ + 1 ≈ (𝑠𝑖𝑧𝑒 + 1)/2
Now, the only step remaining is scanning the leaf nodes and finding the maximum among them.
int findMaxInMinHeap(struct Heap *h) {
    int Max = -1;
    for(int i = (h→count+1)/2; i < h→count; i++)
        if(h→array[i] > Max)
            Max = h→array[i];
    return Max;
}
Time Complexity: O(𝑛/2) ≈ O(𝑛).
Problem-8 Give an algorithm for deleting an arbitrary element from min heap.
Solution: To delete an element, first we need to search for an element. Let us assume that we are using level order
traversal for finding the element. After finding the element we need to follow the deleteMin process.
Time Complexity = Time for finding the element + Time for deleting an element
= O(𝑛) + O(𝑙𝑜𝑔𝑛) ≈O(𝑛). //Time for searching is dominated.
Problem-9 Give an algorithm for deleting the 𝑖 𝑡ℎ indexed element in a given min-heap.
Solution:
int delete(struct Heap *h, int i) {
    int key;
    if(i >= h→count) {
        printf("Wrong position");
        return -1;
    }
    key = h→array[i];
    h→array[i] = h→array[h→count-1];
    h→count--;
    percolateDown(h, i);
    //Note: if the replacement is smaller than its parent (min-heap),
    //a percolate-up step would also be needed here.
    return key;
}
Time Complexity = O(𝑙𝑜𝑔𝑛).
Problem-10 Prove that, for a complete binary tree of height ℎ, the sum of the heights of all nodes is O(𝑛 − ℎ).
Solution: A complete binary tree has 2^𝑖 nodes on level 𝑖. Also, a node on level 𝑖 has depth 𝑖 and height ℎ − 𝑖. Let
𝑆 denote the sum of the heights of all these nodes:
𝑆 = ∑𝑖=0..ℎ 2^𝑖 (ℎ − 𝑖) = ℎ + 2(ℎ − 1) + 4(ℎ − 2) + ⋯ + 2^(ℎ−1) · 1
Multiplying both sides by 2 gives: 2𝑆 = 2ℎ + 4(ℎ − 1) + 8(ℎ − 2) + ⋯ + 2^ℎ · 1
Now, subtract 𝑆 from 2𝑆: 2𝑆 − 𝑆 = −ℎ + 2 + 4 + ⋯ + 2^ℎ ⟹ 𝑆 = (2^(ℎ+1) − 2) − ℎ
But we already know that the total number of nodes 𝑛 in a complete binary tree of height ℎ is 𝑛 = 2^(ℎ+1) − 1, so
ℎ = 𝑙𝑜𝑔(𝑛 + 1) − 1.
Finally, replacing 2^(ℎ+1) − 1 with 𝑛 gives: 𝑆 = 𝑛 − (ℎ + 1) = O(𝑛 − ℎ).
Problem-11 Give an algorithm to find all elements less than some value 𝑘 in a binary min-heap.
Solution: Start from the root of the heap. If the value of the root is smaller than 𝑘, print its value and recurse
once for its left child and once for its right child. If the value of a node is greater than or equal to 𝑘, the function
stops without printing that value.
The complexity of this algorithm is O(𝑛), where 𝑛 is the total number of nodes in the heap. This bound is achieved
in the worst case, where the value of every node in the heap is smaller than 𝑘, so the function has to visit every
node.
Problem-12 Give an algorithm for merging two binary max-heaps. Let us assume that the size of the first heap
is 𝑚 + 𝑛 and the size of the second heap is 𝑛.
Solution: One simple way of solving this problem is:
• Assume that the elements of the first array (with size 𝑚 + 𝑛) are at the beginning. That means, the first 𝑚 cells
are filled and the remaining 𝑛 cells are empty.
• Without changing the first heap, just append the second heap and heapify the array.
• Since the total number of elements in the new array is 𝑚 + 𝑛, each heapify operation takes O(𝑙𝑜𝑔(𝑚 + 𝑛)).
The complexity of this algorithm is : O((𝑚 + 𝑛)𝑙𝑜𝑔(𝑚 + 𝑛)).
Problem-13 Can we improve the complexity of Problem-12?
Solution: Instead of heapifying all the elements of the 𝑚 + 𝑛 array, we can use the technique of “building heap
with an array of elements (heapifying array)”. We can start with non-leaf nodes and heapify them. The algorithm
can be given as:
• Assume that the elements of the first array (with size 𝑚 + 𝑛) are at the beginning. That means, the first 𝑚
cells are filled and the remaining 𝑛 cells are empty.
• Without changing the first heap, just append the second heap.
• Now, find the first non-leaf node and start heapifying from that element.
In the theory section, we have already seen that building a heap with 𝑛 elements takes O(𝑛) complexity. The
complexity of merging with this technique is: O(𝑚 + 𝑛).
Problem-14 Is there an efficient algorithm for merging 2 max-heaps (stored as an array)? Assume both arrays
have 𝑛 elements.
Solution: The alternative solution for this problem depends on what type of heap it is. If it's a standard heap
where every node has up to two children and which gets filled up so that the leaves are on a maximum of two
different rows, we cannot get better than O(𝑛) for the merge.
There is an O(𝑙𝑜𝑔𝑚 × 𝑙𝑜𝑔𝑛) algorithm for merging two binary heaps with sizes 𝑚 and 𝑛. For 𝑚 = 𝑛, this algorithm
takes O(𝑙𝑜𝑔²𝑛) time. We will skip it here due to its difficulty and scope.
For better merging performance, we can use another variant of binary heap like a 𝐹𝑖𝑏𝑜𝑛𝑎𝑐𝑐𝑖-𝐻𝑒𝑎𝑝 which can merge
in O(1) on average (amortized).
Problem-15 Give an algorithm for finding the 𝑘 𝑡ℎ smallest element in min-heap.
Solution: One simple solution to this problem is: perform deletion 𝑘 times from min-heap.
int findKthLargestEle(struct Heap *h, int k) {
//Just delete first k-1 elements and return the k-th element.
for(int i=0;i<k-1;i++)
deleteMin(h);
return deleteMin(h);
}
Time Complexity: O(𝑘𝑙𝑜𝑔𝑛). Since we are performing deletion operation 𝑘 times and each deletion takes O(𝑙𝑜𝑔𝑛).
Problem-16 For Problem-15, can we improve the time complexity?
Solution: Assume that the original min-heap is called 𝐻𝑂𝑟𝑖𝑔 and the auxiliary min-heap is named 𝐻𝐴𝑢𝑥. Initially,
the element at the top of 𝐻𝑂𝑟𝑖𝑔, the minimum one, is inserted into 𝐻𝐴𝑢𝑥. Here we don’t do the operation of
deleteMin with 𝐻𝑂𝑟𝑖𝑔.
Heap HOrig;
Heap HAux;
int findKthLargestEle( int k ) { //returns the k-th smallest element of HOrig
    int heapElement; //Assuming heap data is of integers
    int count = 1;
    HAux.insert(HOrig.min());
    while( true ) {
        //return the minimum element and delete it from the HAux heap
        heapElement = HAux.deleteMin();
        if( count == k )
            return heapElement;
        count++;
        //insert the children (in HOrig) of the deleted element into HAux;
        //HOrig.leftChild/rightChild are assumed helpers returning those children
        HAux.insert(HOrig.leftChild(heapElement));
        HAux.insert(HOrig.rightChild(heapElement));
    }
}
int deQueue() {
return PQ.deleteMin();
}
int Front() {
return PQ.min();
}
int size() {
return PQ.size();
}
int isEmpty() {
return PQ.isEmpty();
}
Note: We could also decrement 𝑐 when popping.
Observation: We could use just the current system time instead of 𝑐 (to avoid overflow). The implementation based
on this can be given as:
void enQueue(int element) {
PQ.insert(gettime(),element);
}
Note: The only change is that we need to take a positive 𝑐 value instead of negative.
Problem-21 Given a big file containing billions of numbers, how can you find the 10 maximum numbers from
that file?
Solution: Always remember that when you need to find the top 𝑛 elements, the best data structure to use is a priority
queue.
One solution for this problem is to divide the data into blocks of 1000 elements, build a heap of each block,
and then take the top 10 elements from each heap one by one. Finally, heap sort all the collected 10-element groups
and take the top 10 among those. But the problem with this approach is where to store the 10 elements from each
heap; that may require a large amount of memory, as we have billions of numbers.
Reusing the top 10 elements (from the earlier heap) in the subsequent blocks can solve this problem. That means,
take the first block of 1000 elements and subsequent blocks of 990 elements each. Initially, heapsort the first set
of 1000 numbers, take max 10 elements, and mix them with 990 elements of the 2𝑛𝑑 set. Again, Heapsort these
1000 numbers (10 from the first set and 990 from the 2𝑛𝑑 set), take 10 max elements, and mix them with
990 elements of the 3𝑟𝑑 set. Repeat till the last set of 990 (or less) elements and take max 10 elements from the final
heap. These 10 elements will be your answer.
Time Complexity: ≈ (𝑛/1000) × (cost of heapsorting 1000 elements). Since the cost of heapsorting 1000
elements is a constant, the overall complexity is O(𝑛), i.e. linear.
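A common streaming alternative to the block scheme above, built on the same priority-queue idea: keep a min-heap of the 10 largest elements seen so far, and compare each new number against the heap’s root. A sketch in C (the names are illustrative):
#include <stdio.h>
#define K 10
int heap[K];   /* min-heap: heap[0] is the smallest of the K largest so far */
void siftDown(int i, int n) {
    for (;;) {
        int m = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && heap[l] < heap[m]) m = l;
        if (r < n && heap[r] < heap[m]) m = r;
        if (m == i) return;
        int t = heap[i]; heap[i] = heap[m]; heap[m] = t;
        i = m;
    }
}
/* feed each number from the file through this; O(logK) per number */
void offer(int x, int *count) {
    if (*count < K) {                /* heap not full yet: append and sift up */
        int i = (*count)++;
        heap[i] = x;
        while (i > 0 && heap[i] < heap[(i - 1) / 2]) {
            int t = heap[i]; heap[i] = heap[(i - 1) / 2]; heap[(i - 1) / 2] = t;
            i = (i - 1) / 2;
        }
    } else if (x > heap[0]) {        /* larger than the current 10th maximum */
        heap[0] = x;
        siftDown(0, K);
    }
}
int main(void) {
    int count = 0;
    for (int x = 1; x <= 1000000; x++)  /* stand-in for reading the big file */
        offer(x, &count);
    for (int i = 0; i < K; i++) printf("%d ", heap[i]); /* the 10 maxima, in heap order */
    printf("\n");
    return 0;
}
This keeps only 10 numbers in memory and runs in O(𝑛 𝑙𝑜𝑔 10) ≈ O(𝑛) time.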
Problem-22 Merge 𝒌 sorted lists with total of 𝒏 elements: We are given 𝑘 sorted lists with total 𝑛 inputs in
all the lists. Give an algorithm to merge them into one single sorted list.
Solution: Since there are 𝑘 equal-size lists with a total of 𝑛 elements, the size of each list is 𝑛/𝑘. One simple way of solving this problem is:
• Take the first list and merge it with the second list. Since the size of each list is 𝑛/𝑘, this step produces a sorted list of size 2𝑛/𝑘. This is similar to merge sort logic. The time complexity of this step is 2𝑛/𝑘, because we need to scan all the elements of both lists.
• Then, merge the output of the previous step with the third list. As a result, this step produces a sorted list of size 3𝑛/𝑘. The time complexity of this step is 3𝑛/𝑘, because we need to scan all the elements of both lists (one of size 2𝑛/𝑘 and the other of size 𝑛/𝑘).
• Continue this process until all the lists are merged into one list.
Total time complexity: 2𝑛/𝑘 + 3𝑛/𝑘 + 4𝑛/𝑘 + ⋯ + 𝑘𝑛/𝑘 = ∑𝑖=2..𝑘 (𝑖𝑛/𝑘) = (𝑛/𝑘) ∑𝑖=2..𝑘 𝑖 ≈ (𝑛/𝑘) · 𝑘²/2 = O(𝑛𝑘).
Space Complexity: O(1).
Problem-23 For Problem-22, can we improve the time complexity?
Solution:
1 Divide the lists into pairs and merge them. That means, first take two lists at a time and merge them so
that the total elements parsed for all lists is O(𝑛). This operation gives 𝑘/2 lists.
2 Repeat step-1 until the number of lists becomes one.
Time complexity: Step-1 executes 𝑙𝑜𝑔𝑘 times, and each pass parses all 𝑛 elements in the lists to halve the number of
lists. For example, if we have 8 lists, then the first pass makes 4 lists by parsing all 𝑛 elements, the
second pass makes 2 lists by again parsing 𝑛 elements, and the third pass gives 1 list by again parsing
𝑛 elements. As a result, the total time complexity is O(𝑛𝑙𝑜𝑔𝑘).
Space Complexity: O(𝑛).
Problem-24 For Problem-23, can we improve the space complexity?
Solution: Let us use heaps for reducing the space complexity.
1. Build the max-heap with all the first elements from each list in O(𝑘).
2. In each step, extract the maximum element of the heap and add it at the end of the output.
3. Add the next element from the list of the one extracted. That means we need to select the next element of
the list which contains the extracted element of the previous step.
4. Repeat step-2 and step-3 until all the elements are completed from all the lists.
Time Complexity = O(𝑛𝑙𝑜𝑔𝑘). At any time the max-heap holds 𝑘 elements, and for each of the 𝑛 output elements we
have to readjust the heap in O(𝑙𝑜𝑔𝑘) time, so the total time is O(𝑛𝑙𝑜𝑔𝑘). Space Complexity: O(𝑘) [for the max-heap].
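A sketch of this heap-based merge in C, using a min-heap of (value, list, position) entries so that the output comes out in ascending order (the text above phrases it with a max-heap, which is the mirror image; all names are illustrative):
#include <stdio.h>
struct Item { int value, list, pos; };   /* the current head of one list */
void itemSiftDown(struct Item h[], int n, int i) {
    for (;;) {
        int m = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[l].value < h[m].value) m = l;
        if (r < n && h[r].value < h[m].value) m = r;
        if (m == i) return;
        struct Item t = h[i]; h[i] = h[m]; h[m] = t;
        i = m;
    }
}
/* merge k sorted arrays lists[0..k-1] (each of length len) into out[] */
void kWayMerge(int *lists[], int k, int len, int out[]) {
    struct Item h[16];                   /* assumes k <= 16 for this sketch */
    int n = k;
    for (int i = 0; i < k; i++)          /* step 1: heap of the first elements */
        h[i] = (struct Item){ lists[i][0], i, 0 };
    for (int i = k / 2 - 1; i >= 0; i--)
        itemSiftDown(h, n, i);
    for (int m = 0; n > 0; m++) {        /* steps 2-4: extract, then refill */
        out[m] = h[0].value;
        if (++h[0].pos < len)            /* next element of the same list */
            h[0].value = lists[h[0].list][h[0].pos];
        else
            h[0] = h[--n];               /* that list is exhausted */
        itemSiftDown(h, n, 0);
    }
}
int main(void) {
    int a[] = {1, 4, 7}, b[] = {2, 5, 8}, c[] = {3, 6, 9};
    int *lists[] = {a, b, c}, out[9];
    kWayMerge(lists, 3, 3, out);
    for (int i = 0; i < 9; i++) printf("%d ", out[i]); /* 1 2 3 ... 9 */
    printf("\n");
    return 0;
}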
Problem-25 Given 2 arrays 𝐴 and 𝐵 each with 𝑛 elements. Give an algorithm for finding largest 𝑛 pairs
(𝐴[𝑖], 𝐵[𝑗]).
Solution:
Algorithm:
• Heapify 𝐴 and 𝐵. This step takes O(2𝑛) ≈O(𝑛).
• Then keep on deleting the elements from both the heaps. Each step takes O(2𝑙𝑜𝑔𝑛) ≈O(𝑙𝑜𝑔𝑛).
Total Time complexity: O(𝑛𝑙𝑜𝑔𝑛).
Problem-26 Min-Max heap: Give an algorithm that supports min and max in O(1) time, insert, delete min,
and delete max in O(𝑙𝑜𝑔𝑛) time. That means, design a data structure which supports the following operations:
Operation Complexity
init O(𝑛)
insert O(𝑙𝑜𝑔𝑛)
findMin O(1)
findMax O(1)
deleteMin O(𝑙𝑜𝑔𝑛)
deleteMax O(𝑙𝑜𝑔𝑛)
Solution: This problem can be solved using two heaps. Let us say two heaps are: Minimum-Heap Hmin and
Maximum-Heap Hmax. Also, assume that elements in both the arrays have mutual pointers. That means, an
element in Hmin will have a pointer to the same element in Hmax and an element in Hmax will have a pointer to the
same element in Hmin.
init: Build Hmin in O(𝑛) and Hmax in O(𝑛)
insert(x): Insert x into Hmin in O(𝑙𝑜𝑔𝑛). Insert x into Hmax in O(𝑙𝑜𝑔𝑛). Update the pointers in O(1)
findMin(): Return root(Hmin) in O(1)
findMax(): Return root(Hmax) in O(1)
deleteMin(): Delete the minimum from Hmin in O(𝑙𝑜𝑔𝑛). Delete the same element from Hmax by using the mutual pointer in O(𝑙𝑜𝑔𝑛)
deleteMax(): Delete the maximum from Hmax in O(𝑙𝑜𝑔𝑛). Delete the same element from Hmin by using the mutual pointer in O(𝑙𝑜𝑔𝑛)
Problem-27 Dynamic median finding. Design a heap data structure that supports finding the median.
Solution: In a set of 𝑛 elements, median is the middle element, such that the number of elements lesser than the
median is equal to the number of elements larger than the median. If 𝑛 is odd, we can find the median by sorting
the set and taking the middle element. If 𝑛 is even, the median is usually defined as the average of the two middle
elements. This algorithm works even when some of the elements in the list are equal. For example, the median of
the multiset {1, 1, 2, 3, 5} is 2, and the median of the multiset {1, 1, 2, 3, 5, 8} is 2.5.
“𝑀𝑒𝑑𝑖𝑎𝑛 ℎ𝑒𝑎𝑝𝑠” are a variant of heaps that give access to the median element. A median heap can be implemented
using two heaps, each containing half the elements. One is a max-heap, containing the smallest elements; the
other is a min-heap, containing the largest elements. The size of the max-heap may be equal to the size of the
min-heap, if the total number of elements is even. In this case, the median is the average of the maximum element
of the max-heap and the minimum element of the min-heap. If there is an odd number of elements, the max-heap
will contain one more element than the min-heap. The median in this case is simply the maximum element of the
max-heap.
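A self-contained sketch of such a median heap in C, with a max-heap 𝑙𝑜 holding the smaller half and a min-heap ℎ𝑖 holding the larger half (all names are illustrative):
#include <stdio.h>
#define CAP 1000
static int lo[CAP], hi[CAP];   /* lo: max-heap of the smaller half; hi: min-heap of the larger half */
static int nlo = 0, nhi = 0;
/* dir = +1 sifts as a max-heap, dir = -1 as a min-heap */
static void siftUp(int a[], int i, int dir) {
    while (i > 0 && dir * a[i] > dir * a[(i - 1) / 2]) {
        int t = a[i]; a[i] = a[(i - 1) / 2]; a[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}
static void siftDown(int a[], int n, int i, int dir) {
    for (;;) {
        int m = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && dir * a[l] > dir * a[m]) m = l;
        if (r < n && dir * a[r] > dir * a[m]) m = r;
        if (m == i) return;
        int t = a[i]; a[i] = a[m]; a[m] = t;
        i = m;
    }
}
static int popRoot(int a[], int *n, int dir) {
    int root = a[0];
    a[0] = a[--*n];
    siftDown(a, *n, 0, dir);
    return root;
}
void medianInsert(int x) {
    if (nlo == 0 || x <= lo[0]) { lo[nlo] = x; siftUp(lo, nlo++, +1); }
    else { hi[nhi] = x; siftUp(hi, nhi++, -1); }
    /* rebalance so that nlo == nhi or nlo == nhi + 1 */
    if (nlo > nhi + 1) { hi[nhi] = popRoot(lo, &nlo, +1); siftUp(hi, nhi++, -1); }
    else if (nhi > nlo) { lo[nlo] = popRoot(hi, &nhi, -1); siftUp(lo, nlo++, +1); }
}
double median(void) { /* assumes at least one element has been inserted */
    if (nlo == nhi) return (lo[0] + hi[0]) / 2.0;
    return lo[0];
}
int main(void) {
    int a[] = {1, 1, 2, 3, 5, 8};
    for (int i = 0; i < 6; i++) medianInsert(a[i]);
    printf("%.1f\n", median()); /* 2.5, matching the example above */
    return 0;
}
Insertion costs O(𝑙𝑜𝑔𝑛) because of the heap operations, and the median itself is read from the roots in O(1).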
Problem-28 Maximum sum in sliding window: Given array A[] with sliding window of size 𝑤 which is moving
from the very left of the array to the very right. Assume that we can only see the 𝑤 numbers in the window.
Each time the sliding window moves rightwards by one position. For example: The array is [1 3 -1 -3 5 3 6 7],
and 𝑤 is 3.
Window position Max
[1 3 -1] -3 5 3 6 7 3
1 [3 -1 -3] 5 3 6 7 3
1 3 [-1 -3 5] 3 6 7 5
1 3 -1 [-3 5 3] 6 7 5
1 3 -1 -3 [5 3 6] 7 6
1 3 -1 -3 5 [3 6 7] 7
Input: A long array A[], and a window width 𝑤. Output: An array B[], where B[i] is the maximum value from A[i]
to A[i+w-1].
Requirement: Find a good optimal way to get B[i].
Solution: The brute force solution is to search for the maximum among the 𝑤 elements of the window every time
the window moves.
Time complexity: O(𝑛𝑤).
Problem-29 For Problem-28, can we reduce the complexity?
Solution: Yes, we can use heap data structure. This reduces the time complexity to O(𝑛𝑙𝑜𝑔𝑤). Insert operation
takes O(𝑙𝑜𝑔𝑤) time, where 𝑤 is the size of the heap. However, getting the maximum value is cheap; it merely takes
constant time as the maximum value is always kept in the root (head) of the heap. As the window slides to the
right, some elements in the heap might not be valid anymore (range is outside of the current window). How should
we remove them? We would need to be somewhat careful here. Since we only remove elements that are out of the
window’s range, we would need to keep track of the elements’ indices too.
Problem-30 For Problem-28, can we further reduce the complexity?
Solution: Yes. The double-ended queue is the perfect data structure for this problem. It supports
insertion/deletion from the front and back. The trick is to find a way such that the largest element in the window
would always appear in the front of the queue. How would you maintain this requirement as you push and pop
elements in and out of the queue?
Besides, you will notice that there are some redundant elements in the queue that we shouldn’t even consider.
For example, suppose the current queue holds the elements [10 5 3] and a new element 11 enters the window.
We can empty the queue, discarding the elements 10, 5, and 3, and insert only the element 11, since they can
never again be the window maximum.
Typically, most people try to maintain the queue size the same as the window’s size. Try to break away from this
thought and think out of the box. Removing redundant elements and storing only elements that need to be
considered in the queue is the key to achieving the efficient O(𝑛) solution below. This is because each element in
the list is being inserted and removed at most once. Therefore, the total number of insert + delete operations is
2𝑛.
void maxSlidingWindow(int A[], int n, int w, int B[]) {
struct DoubleEndQueue *Q = createDoubleEndQueue();
for (int i = 0; i < w; i++) {
while (!isEmptyQueue(Q) && A[i] >= A[QBack(Q)])
popBack(Q);
pushBack(Q, i);
}
for (int i = w; i < n; i++) {
B[i-w] = A[QFront(Q)];
while (!isEmptyQueue(Q) && A[i] >= A[QBack(Q)])
popBack(Q);
while (!isEmptyQueue(Q) && QFront(Q) <= i-w)
popFront(Q);
pushBack(Q, i);
}
B[n-w] = A[QFront(Q)];
}
Problem-31 A priority queue is a list of items in which each item has associated with it a priority. Items are
withdrawn from a priority queue in order of their priorities starting with the highest priority item first. If the
maximum priority item is required, then a heap is constructed such that the priority of every node is greater than
the priority of its children.
Design such a heap where the item with the middle priority is withdrawn first. If there are n items in the heap,
then the number of items with priority smaller than the middle priority is 𝑛/2 if 𝑛 is odd, else 𝑛/2 ∓ 1.
Explain how the withdraw and insert operations work, calculate their complexity, and explain how the data
structure is constructed.
Solution: We can use one min-heap and one max-heap such that the root of the min-heap is larger than the root of
the max-heap. The size of the min-heap should be equal to, or one less than, the size of the max-heap. So the middle
element is always the root of the max-heap.
For the insert operation, if the new item is less than the root of the max-heap, then insert it into the max-heap; else
insert it into the min-heap. After a withdraw or insert operation, if the sizes of the heaps are not as specified above,
transfer the root element of the max-heap to the min-heap or vice versa. With this implementation, the insert and
withdraw operations take O(𝑙𝑜𝑔𝑛) time.
Problem-32 Given two heaps, how do you merge (union) them?
Solution: Binary heap supports various operations quickly: Find-min, insert, decrease-key. If we have two min-
heaps, H1 and H2, there is no efficient way to combine them into a single min-heap.
For solving this problem efficiently, we can use mergeable heaps. Mergeable heaps support efficient union
operation. It is a data structure that supports the following operations:
• create-Heap(): creates an empty heap
• insert(H,X,K): insert an item X with key K into heap H
• find-Min(H) : return item with min key
• delete-Min(H): return and remove the item with min key
• Union(H1, H2) : merge heaps H1 and H2
Examples of mergeable heaps are:
• Binomial Heaps
• Fibonacci Heaps
Both heaps also support:
• decrease-Key(H,X,K): assign item X a smaller key K
• delete(H,X) : remove item X
Binomial Heaps: Unlike binary heap which consists of a single tree, a 𝑏𝑖𝑛𝑜𝑚𝑖𝑎𝑙 heap consists of a small set of
component trees and no need to rebuild everything when union is performed. Each component tree is in a special
format, called a 𝑏𝑖𝑛𝑜𝑚𝑖𝑎𝑙 𝑡𝑟𝑒𝑒.
A binomial tree of order 𝑘, denoted by 𝐵𝑘 is defined recursively as follows:
• 𝐵0 is a tree with a single node
• For 𝑘 ≥ 1, 𝐵𝑘 is formed by joining two 𝐵𝑘−1 , such that the root of one tree becomes the leftmost child of
the root of the other.
Example:
(Figure: the binomial trees 𝐵0 , 𝐵1 , 𝐵2 and 𝐵3 .)
Fibonacci Heaps: Fibonacci heap is another example of mergeable heap. It has no good worst-case guarantee for
any operation (except insert/create-Heap). Fibonacci Heaps have excellent amortized cost to perform each
operation. Like 𝑏𝑖𝑛𝑜𝑚𝑖𝑎𝑙 heap, 𝑓𝑖𝑏𝑜𝑛𝑎𝑐𝑐𝑖 heap consists of a set of min-heap ordered component trees. However,
unlike binomial heap, it has
• No limit on number of trees (up to O(𝑛)), and
• No limit on height of a tree (up to O(𝑛))
Also, 𝐹𝑖𝑛𝑑-𝑀𝑖𝑛, 𝐷𝑒𝑙𝑒𝑡𝑒-𝑀𝑖𝑛, 𝑈𝑛𝑖𝑜𝑛, 𝐷𝑒𝑐𝑟𝑒𝑎𝑠𝑒-𝐾𝑒𝑦, 𝐷𝑒𝑙𝑒𝑡𝑒 all have worst-case O(𝑛) running time. However, in the
amortized sense, each operation performs very quickly.
Operation Binary Heap Binomial Heap Fibonacci Heap
create-Heap O(1) O(1) O(1)
find-Min O(1) O(𝑙𝑜𝑔𝑛) O(1)
delete-Min O(𝑙𝑜𝑔𝑛) O(𝑙𝑜𝑔𝑛) O(𝑙𝑜𝑔𝑛)
Problem-35 A complete binary min-heap is made by including each integer in [1,1023] exactly once. The depth
of a node in the heap is the length of the path from the root of the heap to that node. Thus, the root is at depth
0. The maximum depth at which integer 9 can appear is___.
Solution: As shown in the figure below, for a given number 𝑖, we can place the element 𝑖 at depth 𝑖 − 1 and arrange
the numbers 1 to 𝑖 − 1 on the path above it. Since the root is at depth 𝑧𝑒𝑟𝑜, the maximum depth of the element 𝑖 in a
min-heap is 𝑖 − 1. Hence, the maximum depth at which integer 9 can appear is 8.
(Figure: a min-heap in which the path 1 → 2 → 3 → ⋯ → 9 descends from the root; 9 is at depth 8.)
Problem-36 A 𝑑-ary heap is like a binary heap, but instead of 2 children, nodes have 𝑑 children. How would
you represent a 𝑑-ary heap with 𝑛 elements in an array? What are the expressions for determining the parent
of a given element, 𝑃𝑎𝑟𝑒𝑛𝑡(𝑖), and a 𝑗𝑡ℎ child of a given element, 𝐶ℎ𝑖𝑙𝑑(𝑖, 𝑗), where 1 ≤ j ≤ d?
Solution: The following expressions determine the parent and 𝑗𝑡ℎ child of element i (where 1 ≤ j ≤ d):
𝑃𝑎𝑟𝑒𝑛𝑡(𝑖) = ⌊(𝑖 + 𝑑 − 2)/𝑑⌋
𝐶ℎ𝑖𝑙𝑑(𝑖, 𝑗) = (𝑖 − 1) · 𝑑 + 𝑗 + 1
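In C, the same expressions read as follows (1-based positions, as in the formulas above; the function names are illustrative):
/* parent of position i in a d-ary heap (positions start at 1) */
int dAryParent(int i, int d) { return (i + d - 2) / d; }
/* j-th child of position i, where 1 <= j <= d */
int dAryChild(int i, int j, int d) { return (i - 1) * d + j + 1; }
For example, with 𝑑 = 3 the children of the root (position 1) are positions 2, 3 and 4, and dAryParent returns 1 for each of them.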
Problem-37 Given an integer array, sort the integers in ascending order by the number of 1’s in their binary
representation; if two or more integers have the same number of 1’s, sort them in ascending order of
value.
Solution: The basic approach is to count the number of 1’s (set bits) in each element of
the array and then use a comparator function to sort the array. The comparator compares two elements:
if they contain different numbers of set bits, the one with fewer set bits comes first; otherwise the smaller
number comes first. We can use a min-heap for solving this problem: by
storing pair<count of ones, number> as the values, the pair with the minimum number of ones will be at the top.
vector<int> sortByBits(vector<int>& A) {
priority_queue<pair<int,int>, vector<pair<int,int>>, greater<pair<int,int>>> pq;
for(auto element: A){
int counterSetBits = 0;
int t = element;
while(element){
element = (element&(element-1));
counterSetBits++;
}
pq.push({counterSetBits, t});
}
vector<int> v;
while(pq.size()){
v.push_back(pq.top().second);
pq.pop();
}
return v;
}
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛).
Problem-38 The K Weakest Rows in a Matrix: We are given an m × n binary matrix mat of 1’s (representing
soldiers) and 0’s (representing civilians). The soldiers are positioned in front of the civilians; that is, all the 1’s
will appear to the left of all the 0’s in each row. A row i is weaker than a row j if one of the following is true:
• The number of soldiers in row i is less than the number of soldiers in row j.
• Both rows have the same number of soldiers and i < j.
Return the indices of the k weakest rows in the matrix, ordered from weakest to strongest.
Solution: We can solve this problem using a combination of priority queues and binary search. The idea is to use
a priority queue to keep track of the k weakest rows seen so far. We can maintain the size of the priority queue at
k by popping the strongest row from the queue whenever its size exceeds k.
To determine the number of soldiers in each row, we can use binary search to find the index of the first zero in
each row. The number of soldiers in a row will be equal to the index of the first zero.
Here is the algorithm in more detail:
• Create an empty priority queue called "pq".
• Iterate through each row i in the matrix mat.
• Use binary search to find the index of the first zero in row i. Let this index be "j".
• Add a tuple (number of soldiers, row index) = (j, i) to the priority queue "pq".
• If the size of "pq" exceeds k, pop the strongest row (i.e., the row with the largest number of soldiers) from
"pq".
• After iterating through all rows, extract the row indices from "pq" and return them in a list.
The C implementation below takes a simpler route to the same answer: it computes the soldier counts with binary
search and then sorts all the rows with qsort.
#include <stdio.h>
#include <stdlib.h>
typedef struct {
int numSoldiers;
int rowIndex;
} Row;
int binarySearch(int *row, int n) {
int lo = 0, hi = n - 1;
while (lo <= hi) {
int mid = (lo + hi) / 2;
if (row[mid] == 1) {
lo = mid + 1;
} else {
hi = mid - 1;
}
}
return lo;
}
int cmp(const void *a, const void *b) {
Row *rowA = (Row *)a;
Row *rowB = (Row *)b;
if (rowA->numSoldiers == rowB->numSoldiers) {
return rowA->rowIndex - rowB->rowIndex;
}
return rowA->numSoldiers - rowB->numSoldiers;
}
int *kWeakestRows(int **mat, int matSize, int *matColSize, int k, int *returnSize) {
Row *rows = (Row *)malloc(matSize * sizeof(Row));
for (int i = 0; i < matSize; i++) {
rows[i].numSoldiers = binarySearch(mat[i], matColSize[i]);
rows[i].rowIndex = i;
}
qsort(rows, matSize, sizeof(Row), cmp);
*returnSize = k;
int *result = (int *)malloc(k * sizeof(int));
for (int i = 0; i < k; i++) {
result[i] = rows[i].rowIndex;
}
free(rows);
return result;
}
With the priority-queue approach described above, the time complexity is O(m log n + m log k): the binary search
takes O(log n) per row for each of the m rows, and each of the m push/pop operations on the size-k queue takes
O(log k); the space complexity is O(k). The qsort-based code above instead takes O(m log n) for the binary
searches plus O(m log m) for the sort, with O(m) space for the row array.
Chapter
Disjoint Sets
ADT 8
8.1 Introduction
In this chapter, we will look at an important mathematical concept: 𝑠𝑒𝑡𝑠, that is, how to represent a group
of elements that do not need any order among them. The disjoint sets ADT is the one used for this purpose. It is used for
solving the equivalence problem and is very simple to implement: a simple array can be used,
and each function takes only a few lines of code. The disjoint sets ADT acts as an auxiliary data structure for many
other algorithms (for example, 𝐾𝑟𝑢𝑠𝑘𝑎𝑙’𝑠 algorithm in graph theory). Before starting our discussion on disjoint sets
ADT, let us look at some basic properties of sets.
8.4 Applications
Disjoint sets ADT have many applications and a few of them are:
• To represent network connectivity
• Image processing
• To find least common ancestor
• To define equivalence of finite state automata
• Kruskal's minimum spanning tree algorithm (graph theory)
• In game algorithms
To differentiate the root node, let us assume its parent is the node itself: in the array, the root of a set stores its
own index. Based on this representation, the MAKESET, FIND, and UNION operations can be defined as:
• MAKESET(𝑋): Creates a new set containing a single element 𝑋 and in the array updates the parent of 𝑋 as 𝑋. That
means root (set name) of 𝑋 is 𝑋.
• UNION(𝑋, 𝑌): Replaces the two sets containing 𝑋 and 𝑌 by their union and in the array updates the parent
of 𝑋 as 𝑌.
(Figure: UNION(𝑋, 𝑌) merges the two trees by making the root of 𝑋’s tree point to the root of 𝑌’s tree.)
• FIND(X): Returns the name of the set containing the element 𝑋. We keep on searching for 𝑋’𝑠 set name
until we come to the root of the tree.
(Figure: FIND(𝑋) walks up from 𝑋 until it reaches the root of its tree.)
To perform a UNION on two sets, we merge the two trees by making the root of one tree point to the root of the
other.
Initial configuration for the elements 0 to 6 (each element is its own root):
Parent array: [0, 1, 2, 3, 4, 5, 6]
After UNION(5,6), the root of 5’s tree points to 6:
Parent array: [0, 1, 2, 3, 4, 6, 6]
After UNION(1,2):
Parent array: [0, 2, 2, 3, 4, 6, 6]
After UNION(0,2):
Parent array: [2, 2, 2, 3, 4, 6, 6]
One important thing to observe here is, UNION operation is changing the root’s parent only, but not for all the
elements in the sets. Due to this, the time complexity of UNION operation is O(1). A FIND(𝑋) on element 𝑋 is
performed by returning the root of the tree containing 𝑋.
The time to perform this operation is proportional to the depth of the node representing 𝑋. Using this method, it
is possible to create a tree of depth 𝑛 – 1 (Skew Trees). The worst-case running time of a FIND is O(𝑛) and 𝑚
consecutive FIND operations take O(𝑚𝑛) time in the worst case.
MAKESET
void MAKESET( int S[], int size) {
for(int i = size-1; i >=0; i-- )
S[i] = i;
}
FIND
int FIND(int S[], int size, int X) {
if(!(X >= 0 && X < size))
return -1;
if( S[X] == X )
return X;
else return FIND(S, size, S[X]);
}
}
UNION
void UNION( int S[], int size, int root1, int root2 ) {
if(!((root1 >= 0 && root1 < size) && (root2 >= 0 && root2 < size)))
return;
if(FIND(S, size, root1) == FIND(S, size, root2))
return;
S[root1] = root2;
}
UNION by Size
In the earlier representation, the parent array stored 𝑖 itself for a root element 𝑖, and the parent of 𝑖 for the
other elements. In this approach we instead store the negative of the size of the tree in the root’s slot (that
means, if the size of the tree is 3, we store −3 in the parent array for the root element). For the previous example
(after UNION(0,2)), the new representation will look like:
Parent array: [2, 2, −3, −1, −1, 6, −2]
Here 𝑆[2] = −3 because the tree rooted at 2 has 3 nodes (0, 1 and 2), and 𝑆[6] = −2 because the tree rooted at 6 has 2 nodes (5 and 6).
Assume that the size of a one-element set is 1, so we store −1 for it. Other than this there is no change.
MAKESET
void MAKESET( int S[], int size) {
for(int i = size-1; i >= 0; i-- )
S[i] = -1;
}
FIND
int FIND(int S[], int size, int X) {
if(!(X >= 0 && X < size))
return -1;
if( S[X] < 0 ) // a negative value means X is a root
return X;
else return FIND(S, size, S[X]);
}
UNION by Size
void UNIONBySize(int S[], int size, int root1, int root2) {
    if((FIND(S, size, root1) == FIND(S, size, root2)) && FIND(S, size, root1) != -1)
        return;
    if( S[root2] < S[root1] ) { // root2’s tree is bigger (sizes are negative)
        S[root2] += S[root1];   // add root1’s size first
        S[root1] = root2;       // then attach root1 under root2
    }
    else {
        S[root1] += S[root2];
        S[root2] = root1;
    }
}
Note: There is no change in FIND operation implementation.
Parent array: [2, 2, −2, −1, −1, 6, −2]
For UNION by height, the negative entries record tree heights instead of sizes (a height of 0 is stored as −1).
UNION by Height
void UNIONByHeight(int S[], int size, int root1, int root2) {
    if((FIND(S, size, root1) == FIND(S, size, root2)) && FIND(S, size, root1) != -1)
        return;
    if( S[root2] < S[root1] )   // root2’s tree is taller (heights are negative)
        S[root1] = root2;
    else {
        if( S[root2] == S[root1] ) // equal heights: the union grows by one
            S[root1]--;
        S[root2] = root1;
    }
}
Note: For FIND operation there is no change in the implementation.
Similarly with UNION by height, if we take the UNION of two trees of the same height, the height of the UNION is
one larger than the common height, and otherwise equal to the max of the two heights. This will keep the height
of tree of 𝑛 nodes from growing past O(𝑙𝑜𝑔𝑛). A sequence of 𝑚 UNIONs and FINDs can then still cost O(𝑚 𝑙𝑜𝑔𝑛).
Path Compression
FIND operation traverses a list of nodes on the way to the root. We can make later FIND operations efficient by
making each of these vertices point directly to the root. This process is called 𝑝𝑎𝑡ℎ 𝑐𝑜𝑚𝑝𝑟𝑒𝑠𝑠𝑖𝑜𝑛. For example, in
the FIND(𝑋) operation, we travel from 𝑋 to the root of the tree. The effect of path compression is that every node
on the path from 𝑋 to the root has its parent changed to the root.
(Figure: the tree before and after path compression; after FIND(𝑋), every node on the path from 𝑋 to the root 1 points directly to the root.)
With path compression, the only change to the FIND function is that 𝑆[𝑋] is made equal to the value returned by
FIND. That means, after the root of the set is found recursively, 𝑋 is made to point directly to it. This happens
recursively to every node on the path to the root.
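With the negative-size representation used above, the change is a one-line modification to FIND (a sketch):
int FIND(int S[], int size, int X) {
    if (!(X >= 0 && X < size))
        return -1;
    if (S[X] < 0)            /* X is a root */
        return X;
    /* after finding the root recursively, point X directly at it */
    return S[X] = FIND(S, size, S[X]);
}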
8.10 Summary
Performing 𝑚 union-find operations on a set of 𝑛 objects.
Algorithm Worst-case time
Quick-Find 𝑚𝑛
Quick-Union 𝑚𝑛
Quick-Union by Size/Height 𝑛 + 𝑚 𝑙𝑜𝑔𝑛
Path compression 𝑛 + 𝑚 𝑙𝑜𝑔𝑛
Quick-Union by Size/Height + Path Compression (𝑚 + 𝑛) 𝑙𝑜𝑔𝑛
Chapter
Graph
Algorithms 9
9.1 Introduction
In the real world, many problems are represented in terms of objects and connections between them. For example,
in an airline route map, we might be interested in questions like: “What’s the fastest way to go from Hyderabad to
New York?” 𝑜𝑟 “What is the cheapest way to go from Hyderabad to New York?” To answer these questions we need
information about connections (airline routes) between objects (towns). Graphs are data structures used for
solving these kinds of problems.
As part of this chapter, you will learn several ways to traverse graphs and how you can do useful things while
traversing the graph in some order. We will also talk about shortest paths algorithms. We will finish with minimum
spanning trees, which are used to plan road, telephone and computer networks and also find applications in
clustering and approximate algorithms.
9.2 Glossary
Graph: A graph G is simply a way of encoding pairwise relationships among a set of objects: it consists of a collection
V of nodes and a collection E of edges, each of which “joins” two of the nodes. We thus represent an edge e in E
as a two-element subset of V: e = {u, v} for some u, v in V, where we call u and v the ends of e.
Edges in a graph indicate a symmetric relationship between their ends. Often we want to encode asymmetric
relationships, and for this, we use the closely related notion of a directed graph. A directed graph G’ consists of a
set of nodes V and a set of directed edges E’. Each e’ in E’ is an ordered pair (u, v); in other words, the roles of u
and v are not interchangeable, and we call u the tail of the edge and v the head. We will also say that edge e’ leaves
node u and enters node v.
When we want to emphasize that the graph we are considering is 𝑛𝑜𝑡 𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑, we will call it an 𝑢𝑛𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑 𝑔𝑟𝑎𝑝ℎ;
by default, however, the term “graph” will mean an undirected graph. It is also worth mentioning two warnings in
our use of graph terminology. First, although an edge e in an undirected graph should properly be written as a
set of nodes {u, v}, one will more often see it written in the notation used for ordered pairs: e = (u, v). Second, a
node in a graph is also frequently called a vertex; in this context, the two words have exactly the same meaning.
• 𝑉𝑒𝑟𝑡𝑖𝑐𝑒𝑠 and 𝑒𝑑𝑔𝑒𝑠 are positions and store elements
• Definitions that we use:
o 𝐷𝑖𝑟𝑒𝑐𝑡𝑒𝑑 𝑒𝑑𝑔𝑒:
▪ Ordered pair of vertices (𝑢, 𝑣)
▪ First vertex 𝑢 is the origin
▪ Second vertex 𝑣 is the destination
▪ Example: one-way road traffic
o 𝑈𝑛𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑 𝑒𝑑𝑔𝑒:
▪ Unordered pair of vertices (𝑢, 𝑣)
▪ Example: railway lines
o 𝐷𝑖𝑟𝑒𝑐𝑡𝑒𝑑 𝑔𝑟𝑎𝑝ℎ:
▪ All the edges are directed
▪ Example: route network
o 𝑈𝑛𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑 𝑔𝑟𝑎𝑝ℎ:
▪ All the edges are undirected
▪ Example: flight network
• When an edge connects two vertices, the vertices are said to be adjacent to each other and the edge is
incident on both vertices.
• A graph with no cycles is called a 𝑡𝑟𝑒𝑒. A tree is an acyclic connected graph.
• A self loop is an edge that connects a vertex to itself.
• Two edges are parallel if they connect the same pair of vertices.
• A path in a graph is a sequence of adjacent vertices. A 𝑠𝑖𝑚𝑝𝑙𝑒 𝑝𝑎𝑡ℎ is a path with no repeated vertices.
• A cycle is a path where the first and last vertices are the same. A simple cycle is a cycle with no repeated
vertices or edges (except the first and last vertices).
We say that an undirected graph is connected if, for every pair of nodes u and v, there is a path from u to v.
Choosing how to define connectivity of a directed graph is a bit more subtle, since it’s possible for u to have a path
to v while v has no path to u. We say that a directed graph is strongly connected if, for every two nodes u and v,
there is a path from u to v and a path from v to u.
• We say that one vertex is connected to another if there is a path that contains both of them.
• A graph is connected if there is a path from 𝑒𝑣𝑒𝑟𝑦 vertex to every other vertex.
• If a graph is not connected then it consists of a set of connected components.
• A 𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑 𝑎𝑐𝑦𝑐𝑙𝑖𝑐 𝑔𝑟𝑎𝑝ℎ [DAG] is a directed graph with no cycles.
• In 𝑤𝑒𝑖𝑔ℎ𝑡𝑒𝑑 𝑔𝑟𝑎𝑝ℎ𝑠, integers (𝑤𝑒𝑖𝑔ℎ𝑡𝑠) are assigned to each edge to represent distances or costs.
• In addition to simply knowing about the existence of a path between some pair of nodes u and v, we may
also want to know whether there is a short path. Thus we define the distance between two nodes u and v
to be the minimum number of edges in a u-v path.
• A forest is a disjoint set of trees.
• A spanning tree of a connected graph is a subgraph that contains all of that graph’s vertices and is a single
tree. A spanning forest of a graph is the union of spanning trees of its connected components.
• A bipartite graph is a graph whose vertices can be divided into two sets such that all edges connect a vertex
in one set with a vertex in the other set.
• Graphs with relatively few edges (generally |𝐸| < |𝑉| log |𝑉|) are called 𝑠𝑝𝑎𝑟𝑠𝑒 𝑔𝑟𝑎𝑝ℎ𝑠.
• Graphs with relatively few of the possible edges missing are called 𝑑𝑒𝑛𝑠𝑒 graphs.
• Directed weighted graphs are sometimes called 𝑛𝑒𝑡𝑤𝑜𝑟𝑘𝑠.
• We will denote the number of vertices in a given graph by |𝑉| and the number of edges by |𝐸|. Note that |𝐸| can range anywhere from 0 to |𝑉|(|𝑉| − 1)/2 (in an undirected graph), because each node can connect to every other node.
Adjacency Matrix
Graph Declaration for Adjacency Matrix
First, let us look at the components of the graph data structure. To represent graphs, we need the number of
vertices, the number of edges and also their interconnections. So, the graph can be declared as:
struct Graph {
int V;
int E;
int **adjMatrix; // we need two dimensional matrix
};
Description
The adjacency matrix of a graph is a square matrix of size 𝑉 × 𝑉, where 𝑉 is the number of vertices of the graph G.
The values of the matrix are boolean: the value 𝑎𝑑𝑗𝑀𝑎𝑡𝑟𝑖𝑥[𝑢, 𝑣] is set to 1 if
there is an edge from vertex 𝑢 to vertex 𝑣, and 0 otherwise. In the matrix, each edge of an undirected graph is
represented by two entries: an edge between 𝑢 and 𝑣 is represented by a 1 in both 𝑎𝑑𝑗𝑀𝑎𝑡𝑟𝑖𝑥[𝑢, 𝑣] and
𝑎𝑑𝑗𝑀𝑎𝑡𝑟𝑖𝑥[𝑣, 𝑢]. To save time, we can process only half of this symmetric matrix. Also, we can assume that there
is an “edge” from each vertex to itself, so 𝑎𝑑𝑗𝑀𝑎𝑡𝑟𝑖𝑥[𝑢, 𝑢] is set to 1 for all vertices.
If the graph is a directed graph then we need to mark only one entry in the adjacency matrix. As an example,
consider the directed graph below.
(Figure: a directed graph with vertices A, B, C, D and edges A→B, A→D, B→C, C→A, C→D.)
The adjacency matrix for this graph can be given as:
A B C D
A 0 1 0 1
B 0 0 1 0
C 1 0 0 1
D 0 0 0 0
Now, let us concentrate on the implementation. To read a graph, one way is to first read the vertex names and
then read pairs of vertex names (edges). The code below reads an undirected graph.
// This code creates a graph with adj matrix representation
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
#define MAX_VERTICES 50 // max number of vertices for our graph
#define MAX_DEGREE 50 // max degree for a vertex
struct graph{
int V; // number of vertices
int E; // number of edges
int **adjMatrix; // adjacency matrix
};
struct edge{
int source;
int destination;
};
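The code below relies on createGraph, insertEdge, displayGraph, displayEdges and rand_init. A minimal sketch of these helpers, consistent with the declarations above (illustrative; the original listings may differ):
struct graph* createGraph(const int numVertices){
    struct graph* G = (struct graph *) malloc(sizeof(struct graph));
    assert(G != NULL);
    G->V = numVertices;
    G->E = 0;
    G->adjMatrix = (int **) malloc(numVertices * sizeof(int *));
    assert(G->adjMatrix != NULL);
    for (int i = 0; i < numVertices; i++){
        G->adjMatrix[i] = (int *) calloc(numVertices, sizeof(int)); // row initialised to 0
        assert(G->adjMatrix[i] != NULL);
    }
    return G;
}
void insertEdge(struct graph* G, const struct edge E){
    if (G->adjMatrix[E.source][E.destination] == 0){
        G->adjMatrix[E.source][E.destination] = 1;
        G->adjMatrix[E.destination][E.source] = 1; // undirected graph: mark both entries
        (G->E)++;
    }
}
void displayGraph(const struct graph* G){ // print the adjacency matrix
    for (int i = 0; i < G->V; i++){
        for (int j = 0; j < G->V; j++)
            printf("%2d", G->adjMatrix[i][j]);
        printf("\n");
    }
}
void displayEdges(const struct graph* G){ // list the edges (u < v)
    printf("%d edges:", G->E);
    for (int i = 0; i < G->V; i++)
        for (int j = i + 1; j < G->V; j++)
            if (G->adjMatrix[i][j]) printf(" (%d,%d)", i, j);
    printf("\n");
}
void rand_init(void){ // seed rand(); used by randomGraph below
    srand((unsigned) time(NULL));
}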
void removeEdge(struct graph* G, struct edge E){ // delete edge E, if present
int x = E.source, y = E.destination;
if (G->adjMatrix[x][y] == 1){
G->adjMatrix[x][y] = 0;
G->adjMatrix[y][x] = 0;
(G->E)--;
}
}
void destroyGraph(struct graph* G){ // to free memory
if (G){
if (G->adjMatrix){
int i;
for (i = 0; i < G->V; i++)
free(G->adjMatrix[i]);
free(G->adjMatrix);
}
free(G);
}
}
struct edge newEdge(int x, int y){
// return an edge with ends x and y
struct edge e;
e.source = x;
e.destination = y;
return e;
}
struct graph* randomGraph(const int N, const float p){
// A random graph with N vertices and probability p for each edge
int i, j;
struct edge E;
struct graph* G = createGraph(N);
rand_init();
for (i=0; i < N; i++) for(j=i+1; j < N; j++) {
if (rand() < p * RAND_MAX) { // rand() returns an integer between 0 and RAND_MAX
E = newEdge(i,j);
insertEdge(G, E);
}
}
return G;
}
int main(void){ // Test code
struct edge E;
struct graph* G = randomGraph(10, 0.15);
displayGraph(G);
E = newEdge(5,6);
insertEdge(G, E);
displayGraph(G);
printf("\n");
displayEdges(G);
removeEdge(G, E);
displayGraph(G);
printf("\n");
displayEdges(G);
destroyGraph(G);
return 0;
}
The adjacency matrix representation is good if the graphs are dense. The matrix requires O(V²) bits of storage and
O(V²) time for initialization. If the number of edges is proportional to V², then there is no problem, because V² steps
are required to read the edges anyway. If the graph is sparse, the initialization of the matrix dominates the running
time of the algorithm, as it takes O(V²).
The downsides of adjacency matrices are that enumerating the outgoing edges of a vertex takes O(V) time even
if there aren’t very many, and the O(V²) space cost is high for sparse graphs, those with far fewer than V² edges.
The adjacency matrix representation always takes O(V²) space: whether the graph has the maximum number of
edges or the minimum, the required space is the same.
Adjacency List
Graph Declaration for Adjacency List
In this representation all the vertices connected to a vertex 𝑣 are listed on an adjacency list for that vertex 𝑣. This
can be easily implemented with linked lists. That means, for each vertex 𝑣 we use a linked list and list nodes
represents the connections between 𝑣 and other vertices to which 𝑣 has an edge.
The total number of linked lists is equal to the number of vertices in the graph. The graph ADT can be declared
as:
struct Graph {
    int V;
    int E;
    struct ListNode **adjList; // head pointers to the linked lists
};
Description
Considering the same example as that of the adjacency matrix, the adjacency list representation can be given as:
A: B → D
B: C
C: A → D
D: (empty)
Since vertex A has edges to B and D, we have added them to the adjacency list for A. The same applies to the other vertices as well.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <time.h>
struct ListNode {
    int vertex;
    struct ListNode *next;
};
struct edge{
    int source;
    int destination;
};
struct graph{
    int V; // number of vertices
    int E; // number of edges
    struct ListNode *adjList[]; // adjacency lists (flexible array member)
};
void rand_init(void){
    // Initializes the random generator rand()
    time_t t;
    srand((unsigned) time(&t));
}
int insertEdge(struct graph* G, const struct edge E) {
    int n = G->V;
    int from = E.source, to = E.destination;
    if (from < 0 || from >= n || to < 0 || to >= n) return -1;
    struct ListNode *prev = NULL, *ptr = G->adjList[from];
    while (ptr != NULL) {
        if (ptr->vertex == to) return 0; // the edge is already present
        prev = ptr;
        ptr = ptr->next;
    }
    struct ListNode *newNode = (struct ListNode *) malloc(sizeof(struct ListNode));
    newNode->vertex = to;
    newNode->next = NULL;
    if (prev == NULL)
        G->adjList[from] = newNode;
    else
        prev->next = newNode;
    (G->E)++;
    return 1;
}
int removeEdge(struct graph* G, const struct edge E) {
    int n = G->V;
    int from = E.source, to = E.destination;
    if (from < 0 || from >= n || to < 0 || to >= n) return -1;
    struct ListNode *prev = NULL, *ptr = G->adjList[from];
    while (ptr != NULL) {
        if (ptr->vertex == to) {
            if (prev == NULL)
                G->adjList[from] = ptr->next;
            else
                prev->next = ptr->next;
            free(ptr);
            (G->E)--;
            return 1;
        }
        prev = ptr;
        ptr = ptr->next;
    }
    return 0; // no such edge
}
struct graph* createGraph(const int numVertices) {
    assert(numVertices >= 0);
    // Create an empty graph with numVertices; the flexible array member
    // adjList[] must be included in the allocation
    struct graph* G = (struct graph *) malloc(sizeof(struct graph) +
                            numVertices * sizeof(struct ListNode *));
    assert(G != NULL);
    G->V = numVertices;
    G->E = 0;
    // start each adjacency list with a node for the vertex itself
    for (int i = 0; i < G->V; i++) {
        G->adjList[i] = (struct ListNode *) malloc(sizeof(struct ListNode));
        assert(G->adjList[i] != NULL);
        G->adjList[i]->vertex = i;
        G->adjList[i]->next = NULL;
    }
    return G;
}
struct edge newEdge(int x, int y){ // return an edge with ends x and y
    struct edge e;
    e.source = x;
    e.destination = y;
    return e;
}
struct graph* randomGraph(const int N, const float p){
    // A random graph with N vertices and probability p for each edge
    int i, j;
    struct edge E;
    struct graph* G = createGraph(N);
    rand_init();
    for (i = 0; i < N; i++) for (j = i+1; j < N; j++) {
        if (rand() < p * RAND_MAX) { // rand() returns an integer between 0 and RAND_MAX
            E = newEdge(i, j);
            insertEdge(G, E);
        }
    }
    return G;
}
void displayGraph(struct graph* G) {
    struct ListNode *ptr;
    int i;
    for (i = 0; i < G->V; i++) {
        ptr = G->adjList[i];
        printf("\nnode %d neighbors:", i);
        while (ptr != NULL) {
            printf(" %d", ptr->vertex);
            ptr = ptr->next;
        }
    }
}
void destroyGraph(struct graph* G) {
    int i;
    struct ListNode *temp, *ptr;
    for (i = 0; i < G->V; i++) {
        ptr = G->adjList[i];
        while (ptr != NULL) {
            temp = ptr;
            ptr = ptr->next;
            free(temp);
        }
        G->adjList[i] = NULL;
    }
    printf("\nGraph is deleted");
    free(G); // release the graph record itself
}
int main(int argc, char *args[]) { // Test code
    struct edge E;
    struct graph* G = randomGraph(10, 0.15);
    displayGraph(G);
    E = newEdge(5,6);
    insertEdge(G, E);
    displayGraph(G);
    printf("\n");
    removeEdge(G, E);
    displayGraph(G);
    printf("\n");
    destroyGraph(G);
    return 0;
}
For this representation, the order of edges in the input is 𝑖𝑚𝑝𝑜𝑟𝑡𝑎𝑛𝑡. This is because they determine the order of
the vertices on the adjacency lists. The same graph can be represented in many different ways in an adjacency
list. The order in which edges appear on the adjacency list affects the order in which edges are processed by
algorithms.
Adjacency Set
It is very much similar to adjacency list but instead of using Linked lists, Disjoint Sets [Union-Find] are used. For more details refer to the
𝐷𝑖𝑠𝑗𝑜𝑖𝑛𝑡 𝑆𝑒𝑡𝑠 𝐴𝐷𝑇 chapter.
9.5 Graph Traversals
To solve problems on graphs, we need a mechanism for traversing the graphs. Graph traversal algorithms are also
called 𝑔𝑟𝑎𝑝ℎ 𝑠𝑒𝑎𝑟𝑐ℎ algorithms. Like tree traversal algorithms (Inorder, Preorder, Postorder and Level-Order
traversals), graph search algorithms can be thought of as starting at some source vertex in a graph and "searching"
the graph by going through the edges and marking the vertices. Now, we will discuss two such algorithms for
traversing the graphs.
• Depth First Search [DFS]
• Breadth First Search [BFS]
A graph can contain cycles, which may bring you back to a node you have already seen while traversing the graph. To avoid processing the same node again, use a boolean array which marks each node after it is processed. While visiting the nodes of one layer of the graph, store them in such a way that their child nodes can be traversed in the same order.
Depth First Search [DFS]
Initially all vertices are marked unvisited (false). The DFS algorithm starts at a vertex 𝑢 in the graph. By starting
at vertex 𝑢 it considers the edges from 𝑢 to other vertices. If the edge leads to an already visited vertex, then
backtrack to current vertex 𝑢. If an edge leads to an unvisited vertex, then go to that vertex and start processing
from that vertex. That means the new vertex becomes the current vertex. Follow this process until we reach the
dead-end. At this point start 𝑏𝑎𝑐𝑘𝑡𝑟𝑎𝑐𝑘𝑖𝑛𝑔. The process terminates when backtracking leads back to the start
vertex.
As an example, consider the following graph. We can see that sometimes an edge leads to an already discovered
vertex. These edges are called 𝑏𝑎𝑐𝑘 𝑒𝑑𝑔𝑒𝑠, and the other edges are called 𝑡𝑟𝑒𝑒 𝑒𝑑𝑔𝑒𝑠 because deleting the back edges
from the graph generates a tree.
The final generated tree is called the DFS tree and the order in which the vertices are processed is called
𝐷𝐹𝑆 𝑛𝑢𝑚𝑏𝑒𝑟𝑠 of the vertices. In the graph below, the gray color indicates that the vertex is visited (there is no other
significance). We need to see when the visited table is being updated. In the following example, DFS algorithm
traverses from A to B to C to D first, then to E, F, and G and lastly to H. It employs the following rules.
1. Visit the adjacent unvisited vertex. Mark it as visited. Display it (processing). Push it onto a stack.
2. If no adjacent vertex is found, pop up a vertex from the stack. (It will pop up all the vertices from the
stack, which do not have adjacent vertices.)
3. Repeat step 1 and step 2 until the stack is empty.
Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. We have only one adjacent
node B and we can pick that. For this example, we shall take the node in an alphabetical order. Then, mark B as
visited and put it onto the stack.
[Figure: the starting vertex A is marked visited (Visited Table: 1 0 0 0 0 0 0 0; Stack: A); then vertex B is visited (Visited Table: 1 1 0 0 0 0 0 0; Stack: A, B)]
Explore any unvisited adjacent node from B. Both C and H are adjacent to B but we are concerned for unvisited
nodes only. Visit C and mark it as visited and put onto the stack. Here, we have B, D, and E nodes, which are
adjacent to C and nodes D and E are unvisited. Let us choose one of them; say, D.
[Figure: recursive call of DFS, vertex C is visited (Stack: A, B, C); then vertex D is visited (Stack: A, B, C, D)]
Here D does not have any unvisited adjacent node, so we pop D from the stack. To return to the previous node, we check the top of the stack and see whether it has any unvisited adjacent nodes. We find C on the top of the stack; B, D, and E are the nodes adjacent to C, and node E is unvisited.
[Figure: backtrack from D (Visited Table: 1 1 1 1 0 0 0 0; Stack: A, B, C); then vertex E is visited (Visited Table: 1 1 1 1 1 0 0 0; Stack: A, B, C, E)]
Here, we find E to be on the top of the stack. Here, we have C, F, G, and H nodes which are adjacent to E and nodes F, G, and H
are unvisited. Let us choose one of them; say, F. Here node F does not have any unvisited adjacent node. So, we pop F from
the stack.
[Figure: recursive call of DFS, vertex F is visited (Visited Table: 1 1 1 1 1 1 0 0; Stack: A, B, C, E, F); then backtrack from F (Stack: A, B, C, E)]
Here, we find E to be on the top of the stack. Here, we have C, F, G, and H nodes which are adjacent to E and nodes G, and H
are unvisited. Let us choose one of them; say, G. Here node G does not have any unvisited adjacent node. So, we pop G
from the stack.
[Figure: vertex G is visited (Visited Table: 1 1 1 1 1 1 1 0; Stack: A, B, C, E, G); then backtrack from G (Stack: A, B, C, E)]
Here, we find E to be on the top of the stack. Here, we have C, F, G, and H nodes which are adjacent to E and node H is unvisited.
Let us choose that remaining node H. Here node H does not have any unvisited adjacent node. So, we pop H from the
stack.
[Figure: vertex H is visited (Visited Table: 1 1 1 1 1 1 1 1; Stack: A, B, C, E, H); then backtrack from H (Stack: A, B, C, E)]
Now, we find E to be on the top of the stack with no unvisited nodes adjacent to it. So, we pop E from the stack.
Then, node C becomes the top of the stack. For node C too, there were no adjacent unvisited nodes. Hence, pop
node C from the stack.
[Figure: backtrack from E (Stack: A, B, C); then backtrack from C (Stack: A, B)]
Similarly, we find B to be on the top of the stack with no unvisited nodes adjacent to it. So, we pop B from the
stack. Then, node A becomes the top of the stack. For node A too, there were no adjacent unvisited nodes. Hence,
pop node A from the stack. With this, the stack is empty.
[Figure: backtrack from B (Stack: A); then backtrack from A (Stack: empty, traversal complete)]
From the above diagrams, it can be seen that the DFS traversal creates a tree (without back edges), and we call such a tree a 𝐷𝐹𝑆 𝑡𝑟𝑒𝑒. In DFS, starting from a node marks all the nodes reachable from that node as visited. Therefore, if we choose any node in a connected component and run DFS on it, the whole connected component gets marked as visited. By restarting DFS from each still-unvisited node, the algorithm works even if the given graph has multiple connected components.
Advantages
• Depth-first search on a binary tree generally requires less memory than breadth-first.
• Depth-first search can be easily implemented with recursion.
Disadvantages
• A DFS doesn't necessarily find the shortest path to a node, while breadth-first search does.
Applications of DFS
• Topological sorting
• Finding connected components
• Finding articulation points (cut vertices) of the graph
• Finding strongly connected components
• Solving puzzles such as mazes
For algorithms refer to 𝑃𝑟𝑜𝑏𝑙𝑒𝑚𝑠 𝑆𝑒𝑐𝑡𝑖𝑜𝑛.
Implementation
The algorithm based on this mechanism is given below; the 𝑣𝑖𝑠𝑖𝑡𝑒𝑑[] array is passed in and is assumed to be initialized to zero.
// Refer graph implementation from previous section
void DFS_iterative(struct graph* G, int visited[], int start){
    int stack[G->V];
    int top = -1, i;
    visited[start] = 1;
    stack[++top] = start;
    struct ListNode *p = NULL;
    while (top != -1) {
        start = stack[top--];
        printf("%d ", start);
        p = G->adjList[start];
        while (p) {
            i = p->vertex;
            if (visited[i] == 0) {
                stack[++top] = i;
                visited[i] = 1;
            }
            p = p->next;
        }
    }
}
void DFS_recursive(struct graph* G, int visited[], int start){
    int i;
    struct ListNode *p = NULL;
    visited[start] = 1;
    printf("%d ", start);
    p = G->adjList[start];
    while (p) {
        i = p->vertex;
        if (visited[i] == 0)
            DFS_recursive(G, visited, i);
        p = p->next;
    }
}
int main(int argc, char *args[]) {
    // Test code
    int n = 10;
    struct graph* G = randomGraph(n, 0.15);
    displayGraph(G);
    int i;
    int visited[n];
    for (i = 0; i < n; i++) visited[i] = 0; // initialization of visited array
    printf("\nDFS recursive order:\n");
    DFS_recursive(G, visited, 0); // DFS starts from 0
    printf("\nvisited by DFS:\n");
    for (i = 0; i < n; i++) {
        if (visited[i] == 1) printf("%d ", i);
    }
    for (i = 0; i < n; i++) visited[i] = 0; // re-initialize visited array
    printf("\nDFS iterative order:\n");
    DFS_iterative(G, visited, 0); // DFS starts from 0
    printf("\nvisited by DFS:\n");
    for (i = 0; i < n; i++) {
        if (visited[i] == 1) printf("%d ", i);
    }
    return 0;
}
The time complexity of DFS is O(𝑉 + 𝐸) if we use adjacency lists for representing the graph. This is because we start at a vertex and process its adjacent nodes only if they are not visited, so each edge is examined a constant number of times. If an adjacency matrix is used instead, the edges adjacent to a vertex can't be found efficiently: each vertex requires an O(𝑉) row scan, which gives O(𝑉²) complexity overall.
Breadth First Search [BFS]
BFS visits the vertices of the example graph level by level, using a queue instead of a stack. Start at the source vertex A: enqueue A, then dequeue it, mark it as visited, and enqueue its only unvisited adjacent node, B.
[Figure: the Visited Table changes from 0 0 0 0 0 0 0 0 to 1 0 0 0 0 0 0 0 once A is dequeued and marked visited]
Next, dequeue the element B and mark it as visited. We then see the unvisited adjacent nodes C and H of node B, and enqueue them. Then dequeue the element C, mark it as visited, and enqueue its unvisited adjacent nodes D and E. Next, dequeue the element H and mark it as visited. Observe that node H does not have any unvisited adjacent nodes.
[Figure: B is completed and its neighbors, level 2, are added to the queue (Queue: C, H); then C and H are completed and level 3 is added (Queue: D, E)]
Next, dequeue the element D and mark it as visited. We see no unvisited adjacent nodes from node D. Then dequeue the element E, mark it as visited, and enqueue its unvisited adjacent nodes F and G. Next, dequeue the element F and mark it as visited. Observe that node F does not have any unvisited adjacent nodes. At this point, the queue has only one element, G. Dequeue it and mark it visited. Node G, too, does not have any unvisited adjacent nodes, so the traversal is complete.
Time complexity of BFS is O(𝑉 + 𝐸) if we use adjacency lists for representing the graph, and O(𝑉²) for the adjacency matrix representation.
Advantages
• A BFS will find the shortest path between the starting point and any other reachable node. A depth-first
search will not necessarily find the shortest path.
Disadvantages
• A BFS on a binary tree generally requires more memory than a DFS.
Applications of BFS
• Finding all connected components in a graph
• Finding all nodes within one connected component
• Finding the shortest path between two nodes
• Testing a graph for bipartiteness
Implementation
The implementation for the above discussion can be given as:
#include <stdio.h>
#include <stdlib.h>
#define SIZE 40
struct Queue {
    int items[SIZE];
    int front;
    int rear;
};
struct ListNode{
    int vertex;
    struct ListNode* next;
};
struct Graph{
    int V;
    struct ListNode** adjLists;
    int* visited;
};
struct ListNode* createNode(int v){
    struct ListNode* newNode = malloc(sizeof(struct ListNode));
    newNode->vertex = v;
    newNode->next = NULL;
    return newNode;
}
struct Graph* createGraph(int vertices){
    struct Graph* graph = malloc(sizeof(struct Graph));
    graph->V = vertices;
    graph->adjLists = malloc(vertices * sizeof(struct ListNode*));
    graph->visited = malloc(vertices * sizeof(int));
    for (int i = 0; i < vertices; i++) {
        graph->adjLists[i] = NULL;
        graph->visited[i] = 0;
    }
    return graph;
}
void addEdge(struct Graph* graph, int src, int dest){
    // add edge from src to dest
    struct ListNode* newNode = createNode(dest);
    newNode->next = graph->adjLists[src];
    graph->adjLists[src] = newNode;
    // add edge from dest to src (the graph is undirected)
    newNode = createNode(src);
    newNode->next = graph->adjLists[dest];
    graph->adjLists[dest] = newNode;
}
struct Queue* createQueue(){
    struct Queue* q = malloc(sizeof(struct Queue));
    q->front = -1;
    q->rear = -1;
    return q;
}
int isEmpty(struct Queue* q){
    return q->rear == -1;
}
void enQueue(struct Queue* q, int value){
    if (q->rear == SIZE - 1)
        printf("\nQueue is full");
    else {
        if (q->front == -1)
            q->front = 0;
        q->items[++q->rear] = value;
    }
}
int deQueue(struct Queue* q){
    if (isEmpty(q))
        return -1;
    int item = q->items[q->front++];
    if (q->front > q->rear) // the queue became empty; reset it
        q->front = q->rear = -1;
    return item;
}
void displayGraph(struct Graph* graph) {
    struct ListNode *ptr;
    for (int i = 0; i < graph->V; i++) {
        ptr = graph->adjLists[i];
        printf("\nnode %d neighbors:", i);
        while (ptr != NULL) {
            printf(" %d", ptr->vertex);
            ptr = ptr->next;
        }
    }
}
void BFS(struct Graph* graph, int startVertex) {
    struct Queue* q = createQueue();
    graph->visited[startVertex] = 1; // mark when enqueued so each
    enQueue(q, startVertex);         // vertex is added only once
    while (!isEmpty(q)){
        int currentVertex = deQueue(q);
        printf("\nVisited %d", currentVertex);
        struct ListNode* temp = graph->adjLists[currentVertex];
        while (temp) {
            int adjVertex = temp->vertex;
            if (graph->visited[adjVertex] == 0){
                graph->visited[adjVertex] = 1;
                enQueue(q, adjVertex);
            }
            temp = temp->next;
        }
    }
}
int main(){
    struct Graph* graph = createGraph(8);
    addEdge(graph, 0, 1);
    addEdge(graph, 1, 2);
    addEdge(graph, 1, 7);
    addEdge(graph, 2, 3);
    addEdge(graph, 2, 5);
    addEdge(graph, 4, 5);
    addEdge(graph, 4, 6);
    addEdge(graph, 4, 7);
    displayGraph(graph);
    BFS(graph, 0);
    return 0;
}
If someone asks whether DFS is better or BFS is better, the answer depends on the type of problem that we are trying to solve. BFS visits the graph one level at a time, so if we know the solution we are searching for is at a low depth, then BFS is a good choice. DFS is a better choice if the solution lies at maximum depth. The table below shows the differences between DFS and BFS in terms of their applications.
Applications                                            DFS    BFS
Spanning forest, connected components, paths, cycles    Yes    Yes
Shortest paths                                                 Yes
Minimal use of memory space                             Yes
9.6 Topological Sort
𝑇𝑜𝑝𝑜𝑙𝑜𝑔𝑖𝑐𝑎𝑙 𝑠𝑜𝑟𝑡 is an ordering of the vertices of a directed acyclic graph [DAG] in which each vertex comes before all the vertices to which it has outgoing edges.
We can implement topological sort using a queue. First visit all edges, counting the number of edges that lead to each vertex (i.e., count the number of prerequisites for each vertex). Initially, the 𝑖𝑛𝑑𝑒𝑔𝑟𝑒𝑒 is computed for all vertices, and we start with the vertices whose indegree is 0, that is, the vertices which do not have any prerequisite. To keep track of vertices with indegree zero we can use a queue.
All vertices with no prerequisites (indegree 0) are placed on the queue. We then begin processing the queue. While the queue is not empty, a vertex 𝑣 is removed, and all vertices adjacent to 𝑣 have their indegrees decremented. A vertex is put on the queue as soon as its indegree falls to 0. The topological ordering is the order in which the vertices deQueue. If the queue becomes empty without printing all of the vertices, then the graph contains a cycle (i.e., there is no possible ordering of the tasks that does not violate some prerequisite).
// Refer to the adjacency matrix graph representation code from previous sections.
int queue[MAX_VERTICES], front = -1, rear = -1;
int findIndegree(struct graph* G, int node) {
    int i, indegree = 0;
    for (i = 0; i < G->V; i++) {
        if (G->adjMatrix[i][node] == 1)
            indegree++;
    }
    return indegree;
}
void insertQueue(int node) {
    if (rear == MAX_VERTICES - 1)
        printf("\nOVERFLOW");
    else {
        if (front == -1) /* If the queue is initially empty */
            front = 0;
        queue[++rear] = node;
    }
}
int deleteQueue() {
    int del_node;
    if (front == -1 || front > rear) {
        printf("\nUNDERFLOW %d %d", front, rear);
        return -1;
    } else {
        del_node = queue[front++];
        return del_node;
    }
}
void topologicalSort(struct graph *G) {
    int topsort[G->V], indeg[G->V];
    int i;
    for (i = 0; i < G->V; i++) { /* Find the in-degree of each node */
        indeg[i] = findIndegree(G, i);
        if (indeg[i] == 0)
            insertQueue(i);
    }
    int j = 0;
    int del_node;
    while (front != -1 && front <= rear) { /* Continue until the queue is empty */
        del_node = deleteQueue();
        topsort[j] = del_node; /* Add the deleted node to topsort */
        j++;
        for (i = 0; i < G->V; i++) { /* Delete the del_node edges */
            if (G->adjMatrix[del_node][i] == 1) {
                G->adjMatrix[del_node][i] = 0;
                indeg[i] = indeg[i] - 1;
                if (indeg[i] == 0)
                    insertQueue(i);
            }
        }
    }
    printf("The topological sorting can be given as:\n");
    for (i = 0; i < j; i++)
        printf("%d ", topsort[i]);
}
int main(void){ // Test code
    struct graph* G = randomGraph(10, 0.15); // note: topologicalSort expects a DAG
    displayGraph(G);
    topologicalSort(G);
    return 0;
}
The time complexity of this algorithm is O(|𝑉|²) with the adjacency matrix used above (computing the indegrees alone scans the whole matrix); with adjacency lists it is O(|𝑉| + |𝐸|).
Note: The Topological sorting problem can be solved with DFS. Refer to the 𝑃𝑟𝑜𝑏𝑙𝑒𝑚𝑠 𝑆𝑒𝑐𝑡𝑖𝑜𝑛 for the algorithm.
9.7 Shortest Path Algorithms
Let 𝐺 = (𝑉, 𝐸) be a given graph and 𝑠 a distinguished source vertex. The 𝑠ℎ𝑜𝑟𝑡𝑒𝑠𝑡 𝑝𝑎𝑡ℎ problem is to find a shortest path from 𝑠 to every other vertex in 𝐺. There are variations in the shortest path algorithms which depend on the type of the input graph:
• Shortest path in unweighted graph
• Shortest path in weighted graph
• Shortest path in weighted graph with negative edges
Shortest Path in Unweighted Graph
Given an unweighted graph, the shortest path from a source 𝑠 can be computed with a BFS-style traversal that records, for each vertex, its distance from 𝑠 and the vertex from which it was reached.
[Figure: example unweighted graph with vertices including D, E, F and G]
Algorithm
void UnweightedShortestPath(struct Graph *G, int s) {
    struct Queue *Q = createQueue();
    int v, w;
    enQueue(Q, s);
    for (int i = 0; i < G->V; i++)
        Distance[i] = -1;
    Distance[s] = 0;
    while (!isEmpty(Q)) {
        v = deQueue(Q); // each vertex is examined at most once
        for each w adjacent to v
            if (Distance[w] == -1) {
                Distance[w] = Distance[v] + 1;
                Path[w] = v;
                enQueue(Q, w); // each vertex is enQueue'd at most once
            }
    }
    deleteQueue(Q);
}
Running time: O(|𝐸| + |𝑉|) if adjacency lists are used. In the for loop we examine the outgoing edges of the current vertex, and the total number of edges examined over all iterations of the while loop equals the number of edges, which gives O(|𝐸|). If we use the matrix representation, the complexity is O(|𝑉|²), because we need to read an entire row of the matrix, of length |𝑉|, in order to find the adjacent vertices of a given vertex.
Shortest path in Weighted Graph without Negative Edge Weights [Dijkstra’s Algorithm]
A famous solution for the shortest path problem was developed by 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎. 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎’𝑠 algorithm is a generalization
of the BFS algorithm. The regular BFS algorithm cannot solve the shortest path problem as it cannot guarantee
that the vertex at the front of the queue is the vertex closest to source 𝑠.
Dijkstra's algorithm generalizes the idea behind breadth-first search (which by itself is not a single-source shortest path algorithm for weighted graphs) to solve the single-source problem. It does place one constraint on the graph: there can be no negative weight edges. Dijkstra's algorithm is also sometimes used to solve the all-pairs shortest path problem by simply running it on all vertices in 𝑉. Again, this requires all edge weights to be non-negative.
Before going to code let us understand how the algorithm works. As in unweighted shortest path algorithm, here
too we use the distance table. The algorithm works by keeping the shortest distance of vertex 𝑣 from the source
in the 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 table. The value 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒[𝑣] holds the distance from s to v. The shortest distance of the source to
itself is zero. The 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 table for all other vertices is set to −1 to indicate that those vertices are not already
processed.
Vertex Distance[v] Previous vertex which gave Distance[v]
𝐴 -1 -
𝐵 -1 -
𝐶 0 -
𝐷 -1 -
𝐸 -1 -
𝐹 -1 -
𝐺 -1 -
After the algorithm finishes, the 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 table will have the shortest distance from source 𝑠 to each other vertex
𝑣. To simplify the understanding of 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎’𝑠 algorithm, let us assume that the given vertices are maintained in
two sets. Initially the first set contains only the source element and the second set contains all the remaining
elements. After the 𝑘 𝑡ℎ iteration, the first set contains 𝑘 vertices which are closest to the source. These 𝑘 vertices
are the ones for which we have already computed the shortest distances from source.
The 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎’𝑠 algorithm can be better understood through an example, which will explain each step that is taken
and how 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 is calculated. The weighted graph below has 5 vertices from 𝐴 − 𝐸.
The value between the two vertices is known as the edge cost between two vertices. For example, the edge cost
between 𝐴 and 𝐶 is 1. Dijkstra’s algorithm can be used to find the shortest path from source 𝐴 to the remaining
vertices in the graph.
[Figure: weighted graph with vertices A, B, C, D, E and edge costs A-B = 4, A-C = 1, C-B = 2, C-D = 4, B-E = 4, D-E = 4]
Initially the 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 table is:
Vertex Distance[v] Previous vertex which gave Distance[v]
𝐴 0 -
𝐵 -1 -
𝐶 -1 -
𝐷 -1 -
𝐸 -1 -
After the first step, from vertex 𝐴, we can reach 𝐵 and 𝐶. So, in the 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 table we update the reachability of 𝐵
and 𝐶 with their costs and the same is shown below.
Vertex   Distance[v]   Previous vertex which gave Distance[v]
𝐴        0             -
𝐵        4             A
𝐶        1             A
𝐷        -1            -
𝐸        -1            -
[Figure: shortest paths to B and C from A]
Now, let us select the minimum distance among all. The minimum distance vertex is 𝐶. That means, we have to
reach other vertices from these two vertices (𝐴 and 𝐶). For example, 𝐵 can be reached from 𝐴 and also from 𝐶. In
this case we have to select the one which gives the lowest cost. Since reaching 𝐵 through 𝐶 is giving the minimum
cost (1 + 2), we update the 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 table for vertex 𝐵 with cost 3 and the vertex from which we got this cost as 𝐶.
Vertex   Distance[v]   Previous vertex which gave Distance[v]
𝐴        0             -
𝐵        3             C
𝐶        1             A
𝐷        5             C
𝐸        -1            -
[Figure: shortest paths to B and D using C as the intermediate vertex]
The only vertex remaining is 𝐸. To reach 𝐸, we have to see all the paths through which we can reach 𝐸 and select the one which gives the
minimum cost. We can see that if we use 𝐵 as the intermediate vertex through 𝐶 we get the minimum cost.
Vertex   Distance[v]   Previous vertex which gave Distance[v]
𝐴        0             -
𝐵        3             C
𝐶        1             A
𝐷        5             C
𝐸        7             B
The final minimum cost tree which Dijkstra’s algorithm generates is:
[Figure: tree with edges A-C (1), C-B (2), C-D (4) and B-E (4)]
void Dijkstra(struct Graph *G, int s) {
    struct PriorityQueue *PQ = createPriorityQueue();
    int v, w;
    enQueue(PQ, s);
    for (int i = 0; i < G->V; i++)
        Distance[i] = -1;
    Distance[s] = 0;
    while (!isEmpty(PQ)) {
        v = deleteMin(PQ);
        for all adjacent vertices w of v {
            Compute new distance d = Distance[v] + weight[v][w];
            if (Distance[w] == -1) {
                Distance[w] = d;
                Insert w in the priority queue with priority d
                Path[w] = v;
            }
            else if (Distance[w] > d) {
                Distance[w] = d;
                Update priority of vertex w to be d
                Path[w] = v;
            }
        }
    }
}
Performance
In Dijkstra’s algorithm, the efficiency depends on the number of deleteMins (𝑉 deleteMins) and updates of the priority queue (𝐸 updates). If a 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑏𝑖𝑛𝑎𝑟𝑦 ℎ𝑒𝑎𝑝 is used, then the complexity is O(𝐸𝑙𝑜𝑔𝑉). The term 𝐸𝑙𝑜𝑔𝑉 comes from the 𝐸 updates (each update takes 𝑙𝑜𝑔𝑉) on the standard heap. If a plain array is used instead of a priority queue, the complexity is O(𝐸 + 𝑉²).
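To make the array variant concrete, here is a minimal C sketch of Dijkstra's algorithm, assuming an adjacency-matrix graph in which a weight of 0 means "no edge" (the function and variable names, and the fixed array size, are ours). For the A-E example above, with A = 0, B = 1, C = 2, D = 3 and E = 4, dist ends up as {0, 3, 1, 5, 7}, matching the final table.
#include <limits.h>

// Array-based Dijkstra: O(V^2). dist[v] is the shortest distance from src;
// prev[v] is the previous vertex on that path (the Path table of the text).
void dijkstra(int V, int weight[][5], int src, int dist[], int prev[]) {
    int done[5] = {0}; // vertices whose shortest distance is final
    for (int v = 0; v < V; v++) { dist[v] = INT_MAX; prev[v] = -1; }
    dist[src] = 0;
    for (int iter = 0; iter < V; iter++) {
        int u = -1;
        for (int v = 0; v < V; v++) // pick the closest unfinished vertex
            if (!done[v] && (u == -1 || dist[v] < dist[u])) u = v;
        if (u == -1 || dist[u] == INT_MAX) break; // the rest are unreachable
        done[u] = 1;
        for (int v = 0; v < V; v++) // relax every edge out of u
            if (weight[u][v] && !done[v] && dist[u] + weight[u][v] < dist[v]) {
                dist[v] = dist[u] + weight[u][v];
                prev[v] = u;
            }
    }
}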
Bellman-Ford Algorithm
The Bellman-Ford algorithm is a graph search algorithm that finds the shortest path between a given source vertex
and all other vertices in the graph. This algorithm can be used on both weighted and unweighted graphs. Like
Dijkstra's shortest path algorithm, the Bellman-Ford algorithm is guaranteed to find the shortest path in a graph.
If the graph has negative edge costs, then 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎′𝑠 algorithm does not work.
Though it is slower than Dijkstra's algorithm, Bellman-Ford is capable of handling graphs that contain negative
edge weights, so it is more versatile. It is worth noting that if there exists a negative cycle in the graph, then there
is no shortest path. Going around the negative cycle an infinite number of times would continue to decrease the
cost of the path (even though the path length is increasing). Because of this, Bellman-Ford can also detect negative
cycles which is a useful feature.
The Bellman-Ford algorithm operates on an input graph, G, with |V| vertices and |E| edges. A single source
vertex, s, must be provided as well, as the Bellman-Ford algorithm is a single-source shortest path algorithm. No
destination vertex needs to be supplied, however, because Bellman-Ford calculates the shortest distance to all
vertices in the graph from the source vertex.
The Bellman-Ford algorithm, like Dijkstra's algorithm, uses the principle of relaxation to find increasingly accurate
path length. Bellman-Ford, though, tackles two main issues with this process:
1. If there are negative weight cycles, the search for a shortest path will go on forever.
2. Choosing a bad ordering for relaxations leads to exponential relaxations.
The detection of negative cycles is important, but the main contribution of this algorithm is in its ordering of
relaxations. Relaxation is the most important step in Bellman-Ford. It is what increases the accuracy of the
distance to any given vertex. Relaxation works by continuously shortening the calculated distance between vertices
comparing that distance with other known distances.
Bellman Ford algorithm works by overestimating the length of the path from the starting vertex to all other vertices.
Then it iteratively relaxes those estimates by finding new paths that are shorter than the previously overestimated
paths. Take the baseball example. Let's say I think the distance to the baseball stadium is 30 miles. However, I
know that the distance to the corner right before the stadium is 15 miles, and I know that from the corner to the
stadium, the distance is 1 mile. Clearly, the distance from me to the stadium is at most 16 miles. So, I can update
my belief to reflect that. That is one cycle of relaxation, and it's done over and over until the shortest paths are
found.
The problem with 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎′𝑠 algorithm is that once a vertex 𝑢 is declared known, it is possible that from some
other, unknown vertex 𝑣 there is a path back to 𝑢 that is very negative. In such case, taking a path from 𝑠 to 𝑣
back to 𝑢 is better than going from 𝑠 to 𝑢 without using 𝑣. A combination of Dijkstra's algorithm and unweighted
algorithms will solve the problem. Initialize the queue with 𝑠. Then, at each stage, we 𝐷𝑒𝑄𝑢𝑒𝑢𝑒 a vertex 𝑣. We find
all vertices w adjacent to 𝑣 such that,
𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑡𝑜 𝑣 + 𝑤𝑒𝑖𝑔ℎ𝑡(𝑣, 𝑤) < old distance to w
We update 𝑤's old distance and path, and place 𝑤 on the queue if it is not already there. A bit can be set for each vertex to indicate its presence in the queue. We repeat the process until the queue is empty.
void BellmanFordAlgorithm(struct graph *G, int s) {
    struct Queue *Q = createQueue();
    int v, w;
    enQueue(Q, s);
    Distance[s] = 0; // assume the Distance table is filled with INT_MAX
    while (!isEmpty(Q)){
        v = deQueue(Q);
        for all adjacent vertices w of v {
            Compute new distance d = Distance[v] + weight[v][w];
            if (old distance to w > d) {
                Distance[w] = d;
                Path[w] = v;
                if (w is not already in the queue)
                    enQueue(Q, w)
            }
        }
    }
}
This algorithm works if there are no negative-cost cycles. Each vertex can deQueue at most |𝑉| times, so the running time is O(|𝐸| · |𝑉|) if adjacency lists are used.
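The queue-based routine above is pseudocode. For contrast, here is a minimal sketch of the classic edge-list formulation (the WEdge record and the names are our own assumptions): relax every edge |𝑉| − 1 times, then make one more pass over the edges; any further improvement signals a reachable negative cycle.
#include <limits.h>

struct WEdge { int u, v, w; }; // hypothetical weighted-edge record

// Returns 0 if a negative cycle is reachable from src, 1 otherwise.
// On success, dist[] holds the shortest distances from src.
int bellmanFord(int V, int E, struct WEdge edges[], int src, int dist[]) {
    for (int i = 0; i < V; i++) dist[i] = INT_MAX;
    dist[src] = 0;
    for (int pass = 1; pass <= V - 1; pass++)   // V-1 relaxation rounds
        for (int j = 0; j < E; j++)
            if (dist[edges[j].u] != INT_MAX &&
                dist[edges[j].u] + edges[j].w < dist[edges[j].v])
                dist[edges[j].v] = dist[edges[j].u] + edges[j].w;
    for (int j = 0; j < E; j++)                 // one extra pass: any further
        if (dist[edges[j].u] != INT_MAX &&      // improvement means a cycle
            dist[edges[j].u] + edges[j].w < dist[edges[j].v])
            return 0;
    return 1;
}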
9.8 Minimal Spanning Tree
A 𝑠𝑝𝑎𝑛𝑛𝑖𝑛𝑔 𝑡𝑟𝑒𝑒 of a graph is a subgraph that contains all the vertices and is also a tree.
[Figure: a simple graph, with its edges and vertices labeled]
For this simple graph, we can have multiple spanning trees as shown below.
[Figure: several spanning trees of the graph above]
The cost of a spanning tree is the sum of the weights of all the edges in the tree. There can be many spanning trees. A minimum spanning tree is a spanning tree whose cost is minimum among all the spanning trees. There can also be many minimum spanning trees.
The algorithm we will discuss now is 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑠𝑝𝑎𝑛𝑛𝑖𝑛𝑔 𝑡𝑟𝑒𝑒 in an undirected graph. We assume that the given
graphs are weighted graphs. If the graphs are unweighted graphs then we can still use the weighted graph
algorithms by treating all weights as equal. A 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑠𝑝𝑎𝑛𝑛𝑖𝑛𝑔 𝑡𝑟𝑒𝑒 of an undirected graph 𝐺 is a tree formed
from graph edges that connect all the vertices of 𝐺 with minimum total cost (weights). A minimum spanning tree
exists only if the graph is connected. There are two famous algorithms for this problem:
• 𝑃𝑟𝑖𝑚′𝑠 Algorithm
• 𝐾𝑟𝑢𝑠𝑘𝑎𝑙′𝑠 Algorithm
Prim's Algorithm
Prim’s algorithm also uses a 𝑔𝑟𝑒𝑒𝑑𝑦 approach to find the minimum spanning tree, and it shares a similarity with shortest-path-first algorithms: in Prim’s algorithm we grow the spanning tree from a starting position. Prim's algorithm is almost the same as Dijkstra's algorithm. As in Dijkstra's algorithm, we keep the values 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 and 𝑝𝑎𝑡ℎ in the distance table. The only difference is that, since the definition of 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 is different, the updating statement also changes a little. The update statement is simpler than before.
In Prim’s algorithm, we will start with an arbitrary node (it doesn’t matter which one) and mark it. In each iteration
we will mark a new vertex that is adjacent to the one that we have already marked. As a greedy algorithm, Prim’s
algorithm will select the cheapest edge and mark the vertex.
Algorithm
• Maintain two disjoint sets of vertices: one containing the vertices that are in the growing spanning tree and the other containing the vertices that are not.
• Select the cheapest vertex that is connected to the growing spanning tree and is not in the growing
spanning tree and add it into the growing spanning tree. This can be done using priority queues. Insert
the vertices, that are connected to growing spanning tree, into the priority queue.
• Check for cycles. To do that, mark the nodes which have been already selected and insert only those
nodes in the priority queue that are not marked.
void Prims(struct Graph *G, int s) {
    struct PriorityQueue *PQ = createPriorityQueue();
    int v, w;
    enQueue(PQ, s);
    Distance[s] = 0; // assume the Distance table is filled with -1
    while (!isEmpty(PQ)) {
        v = deleteMin(PQ);
        for all adjacent vertices w of v {
            Compute new distance d = weight[v][w]; // just the edge cost, not the path cost
            if (Distance[w] == -1) {
                Distance[w] = d;
                Insert w in the priority queue with priority d
                Path[w] = v;
            }
            else if (Distance[w] > d) {
                Distance[w] = d;
                Update priority of vertex w to be d
                Path[w] = v;
            }
        }
    }
}
The time complexity of Prim’s algorithm is O((𝑉 + 𝐸)𝑙𝑜𝑔𝑉) with a binary heap, because each vertex is inserted in the priority queue only once and each insertion takes logarithmic time. The entire implementation of this algorithm is identical to that of Dijkstra's algorithm. The running time is O(|𝑉|²) without heaps [good for dense graphs], and O(𝐸𝑙𝑜𝑔𝑉) using binary heaps [good for sparse graphs].
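To make the O(|𝑉|²) array variant concrete, here is a minimal sketch mirroring the Dijkstra sketch earlier in this section, assuming an adjacency-matrix graph in which a weight of 0 means "no edge" (the names and the fixed size are ours). Note that key[v] stores only the cost of the cheapest single edge connecting v to the growing tree, which is the simpler update mentioned above.
#include <limits.h>

// Array-based Prim's algorithm: O(V^2). parent[] receives the MST edges.
void prim(int V, int weight[][7], int src, int parent[]) {
    int key[7], inTree[7] = {0}; // key[v]: cheapest edge from the tree to v
    for (int v = 0; v < V; v++) { key[v] = INT_MAX; parent[v] = -1; }
    key[src] = 0;
    for (int iter = 0; iter < V; iter++) {
        int u = -1;
        for (int v = 0; v < V; v++) // cheapest vertex not yet in the tree
            if (!inTree[v] && (u == -1 || key[v] < key[u])) u = v;
        if (u == -1 || key[u] == INT_MAX) break; // remaining part unreachable
        inTree[u] = 1;
        for (int v = 0; v < V; v++) // the update uses just the edge cost
            if (weight[u][v] && !inTree[v] && weight[u][v] < key[v]) {
                key[v] = weight[u][v];
                parent[v] = u;
            }
    }
}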
Kruskal’s Algorithm
Kruskal's algorithm is a minimum spanning tree algorithm that uses the 𝑔𝑟𝑒𝑒𝑑𝑦 approach. This algorithm treats the graph as a forest, and every node in it as an individual tree. A tree connects to another if and only if it has the least cost among all available options and does not violate the MST properties. Kruskal’s algorithm builds the spanning tree by adding edges one by one into a growing spanning tree.
The algorithm starts with V different trees (V is the vertices in the graph). While constructing the minimum
spanning tree, every time Kruskal’s algorithm selects an edge that has minimum weight and then adds that edge
if it doesn’t create a cycle. So, initially, there are |V| single-node trees in the forest. Adding an edge merges two
trees into one. When the algorithm is completed, there will be only one tree, and that is the minimum spanning
tree.
Algorithm
• Sort the graph edges with respect to their weights.
• Start adding edges to the minimum spanning tree from the edge with the smallest weight until the edge of the largest weight.
• Only add edges which don't form a cycle, that is, edges which connect previously disconnected components.
The greedy choice is to pick the smallest weight edge that does not cause a cycle in the MST constructed so far.
There are two ways of implementing Kruskal’s algorithm:
• By using disjoint sets: Using UNION and FIND operations
• By using priority queues: Maintains weights in priority queue
So now the question is how to check whether two vertices are already connected or not. This could be done with a DFS which starts from the first vertex and checks whether the second vertex is visited. But DFS would make the time complexity large, as each check takes O(𝐸 + 𝑉), where 𝑉 is the number of vertices and 𝐸 is the number of edges. Disjoint sets are sets whose intersection is the empty set, meaning they don't have any element in common.
The appropriate data structure is the UNION/FIND algorithm [for implementing forests]. Two vertices belong to
the same set if and only if they are connected in the current spanning forest. Each vertex is initially in its own set.
If 𝑢 and 𝑣 are in the same set, the edge is rejected because it forms a cycle. Otherwise, the edge is accepted, and
a UNION is performed on the two sets containing 𝑢 and 𝑣.
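Before tracing the example below, here is a minimal C sketch of this approach; the WEdge record (the same shape as in the Bellman-Ford sketch), the array-based FIND with path halving, and the function names are our own assumptions, not the book's full implementation.
#include <stdlib.h>

struct WEdge { int u, v, w; }; // hypothetical weighted-edge record

static int setParent[128]; // UNION/FIND forest over vertices 0..V-1

static int find(int x) { // find the set root, halving the path as we go
    while (setParent[x] != x) {
        setParent[x] = setParent[setParent[x]];
        x = setParent[x];
    }
    return x;
}
static int compareEdges(const void *a, const void *b) {
    return ((const struct WEdge *)a)->w - ((const struct WEdge *)b)->w;
}
// Kruskal's algorithm: sort the edges, accept an edge iff its endpoints
// lie in different sets. Returns the total MST cost (graph assumed connected).
int kruskal(int V, int E, struct WEdge edges[]) {
    int cost = 0, accepted = 0;
    for (int i = 0; i < V; i++) setParent[i] = i; // each vertex in its own set
    qsort(edges, E, sizeof(struct WEdge), compareEdges);
    for (int j = 0; j < E && accepted < V - 1; j++) {
        int ru = find(edges[j].u), rv = find(edges[j].v);
        if (ru != rv) {         // different sets: the edge closes no cycle
            setParent[ru] = rv; // UNION of the two trees
            cost += edges[j].w;
            accepted++;
        }
    }
    return cost;
}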
As an example, consider the following graph (the edges show the weights).
[Figure: weighted undirected graph on the vertices A through G, with edge weights 5, 5, 6, 7, 7, 8, 8, 9, 9, 11 and 15; in particular AD = 5, BE = 5, DF = 6, AC = 7 and CE = 7]
Now let us perform Kruskal’s algorithm on this graph. We always select the edge which has minimum weight.
[Figure: Kruskal's algorithm, step 1] From the above graph, the edges which have the minimum weight (cost) are AD and BE (both of weight 5). From these two we can select one of them; let us assume that we select AD (shown with a dotted line).
[Figure: Kruskal's algorithm, step 2] BE now has the lowest cost and we select it (dotted lines indicate selected edges).
[Figure: Kruskal's algorithm, step 3] DF is the next edge that has the lowest cost (6).
[Figure: Kruskal's algorithm, step 4] Next, AC and CE have the low cost of 7, and we select AC.
[Figure: Kruskal's algorithm, step 5] Then we select CE as its cost is 7 and it does not form a cycle.
Problem-6 What is the maximum number of edges possible in a directed graph with no cycles, in terms of 𝑉?
Solution: The number is 𝑉(𝑉 − 1)/2. Any directed graph can have at most 𝑉² edges. However, since the graph has no cycles it cannot contain a self-loop, and for any pair 𝑥, 𝑦 of vertices, at most one of the edges (𝑥, 𝑦) and (𝑦, 𝑥) can be included. Therefore the number of edges can be at most (𝑉² − 𝑉)/2, as desired. It is possible to achieve 𝑉(𝑉 − 1)/2 edges: label the nodes 1, 2, . . . , 𝑉 and add an edge (𝑥, 𝑦) if and only if 𝑥 < 𝑦. This graph has the required number of edges and cannot contain a cycle (any path visits an increasing sequence of nodes).
Problem-7 What is the maximum number of edges in a simple directed graph with no parallel edges and no self-loops, in terms of 𝑉?
Solution: 𝑉 × (𝑉 − 1). Each vertex can connect to the 𝑉 − 1 other vertices without self-loops, and each ordered pair of distinct vertices gives one possible directed edge.
Problem-8 What are the differences between DFS and BFS?
Solution:
DFS                                                     BFS
Backtracking is possible from a dead end.               Backtracking is not possible.
Vertices from which exploration is incomplete are       The vertices to be explored are organized as a
processed in a LIFO order.                              FIFO queue.
The search is done in one particular direction.         The vertices at the same level are maintained
                                                        in parallel.
Problem-9 Earlier in this chapter, we discussed minimum spanning tree algorithms. Now, give an algorithm
for finding the maximum-weight spanning tree in a graph.
Solution:
[Figure: a given weighted graph (left) and the transformed graph with negated edge weights (right)]
Using the given graph, construct a new graph with the same nodes and edges, but instead of the original weights take their negations. That means, the weight of an edge in the new graph equals the negative of the weight of the corresponding edge in the given graph. Now, we can use the existing 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑠𝑝𝑎𝑛𝑛𝑖𝑛𝑔 𝑡𝑟𝑒𝑒 algorithms on this new graph. As a result, we will get the maximum-weight spanning tree of the original one.
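A minimal sketch of the transformation, assuming an adjacency-matrix weight representation in which 0 means "no edge" (the function name is ours); after negating, run Prim's or Kruskal's algorithm as usual:
// Negate every edge weight. A minimum spanning tree of the negated graph
// is a maximum-weight spanning tree of the original graph.
void negateWeights(int V, int weight[][7]) {
    for (int i = 0; i < V; i++)
        for (int j = 0; j < V; j++)
            if (weight[i][j])
                weight[i][j] = -weight[i][j];
}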
Problem-10 Give an algorithm for checking whether a given graph 𝐺 has simple path from source 𝑠 to
destination 𝑑. Assume the graph 𝐺 is represented using the adjacent matrix.
Solution: Let us assume that the structure for the graph is:
struct Graph {
int V; //Number of vertices
int E; //Number of edges
int ** adjMatrix; //Two dimensional array for storing the connections
};
Start a 𝐷𝐹𝑆 at the source vertex and check whether the current vertex is the same as the destination vertex. If they are the same, then return 1. Otherwise, recur on the unvisited neighbors of the current vertex. One important thing to note here is that we recur only on vertices which are not yet visited.
int HasSimplePath(struct Graph *G, int s, int d) {
    int t;
    Visited[s] = 1;
    if (s == d)
        return 1;
    for (t = 0; t < G->V; t++) {
        if (G->adjMatrix[s][t] && !Visited[t])
            if (HasSimplePath(G, t, d)) // DFS on the unvisited neighbor
                return 1;
    }
    return 0;
}
Time Complexity: O(𝑉²) with the adjacency matrix used here, since each visited vertex scans one full row of the matrix; with adjacency lists it is the usual O(𝑉 + 𝐸) of DFS, because the 𝑖𝑓 condition discards already-visited neighbors. Space Complexity: O(𝑉).
Problem-11 Count simple paths for a given graph 𝐺 has simple path from source s to destination d? Assume
the graph is represented using the adjacent matrix.
Solution: Similar to the discussion in Problem-10, start a DFS at the source vertex 𝑠. Whenever the destination 𝑑 is reached, increment the counter and 𝑏𝑎𝑐𝑘𝑡𝑟𝑎𝑐𝑘: unmark the vertices on the way out of the recursion so that they can be reused by other paths from 𝑠 to 𝑑. Each time the recursion reaches 𝑑, exactly one distinct simple path has been traced, so at the end 𝑐𝑜𝑢𝑛𝑡 holds the total number of simple paths. The implementation based on this logic is given below.
int count = 0;
void CountSimplePaths(struct Graph *G, int s, int d) {
    int t;
    Visited[s] = 1;
    if (s == d) {
        count++;
        Visited[s] = 0; // unmark d so other paths can reach it
        return;
    }
    for (t = 0; t < G->V; t++) {
        if (G->adjMatrix[s][t] && !Visited[t]) {
            CountSimplePaths(G, t, d); // recur on the unvisited neighbor
            Visited[t] = 0;            // backtrack: free t for other paths
        }
    }
}
Problem-12 All pairs shortest path problem: Find the shortest graph distances between every pair of vertices in a given graph. Let us
assume that the given graph does not have negative edges.
Solution: The problem can be solved using 𝑛 applications of 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎′𝑠 algorithm. That means we apply 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎′𝑠
algorithm on each vertex of the given graph. This algorithm does not work if the graph has edges with negative
weights.
Problem-13 In Problem-12, how do we solve the all pairs shortest path problem if the graph has edges with negative weights?
Solution: This can be solved by using the 𝐹𝑙𝑜𝑦𝑑 − 𝑊𝑎𝑟𝑠ℎ𝑎𝑙𝑙 𝑎𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚. This algorithm also works in the case of a weighted graph where
the edges have negative weights. This algorithm is an example of Dynamic Programming – refer to the 𝐷𝑦𝑛𝑎𝑚𝑖𝑐 𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 chapter.
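Although the details are deferred to that chapter, the core of the 𝐹𝑙𝑜𝑦𝑑 − 𝑊𝑎𝑟𝑠ℎ𝑎𝑙𝑙 algorithm is short enough to sketch here. This is a minimal version assuming dist[][] is initialized with the edge weights, a large INF value for missing edges and 0 on the diagonal (N, INF and the function name are ours):
#include <limits.h>
#define N 8                 // number of vertices (example size)
#define INF (INT_MAX / 2)   // "no edge"; halved so additions cannot overflow

// After the three loops, dist[i][j] holds the shortest distance from i to j.
void floydWarshall(int dist[N][N]) {
    for (int k = 0; k < N; k++)        // allow k as an intermediate vertex
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}
The running time is O(|𝑉|³), independent of the number of edges.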
Problem-14 DFS Application: 𝐶𝑢𝑡 𝑉𝑒𝑟𝑡𝑒𝑥 or 𝐴𝑟𝑡𝑖𝑐𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑃𝑜𝑖𝑛𝑡𝑠.
Solution: In an undirected graph, a 𝑐𝑢𝑡 𝑣𝑒𝑟𝑡𝑒𝑥 (or 𝑎𝑟𝑡𝑖𝑐𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑝𝑜𝑖𝑛𝑡) is a vertex whose removal splits the graph into two or more disconnected components. As an example, consider the following figure. Removal of the “𝐷” vertex divides the graph into two connected components ({𝐸, 𝐹} and {𝐴, 𝐵, 𝐶, 𝐺}). Similarly, removal of the “𝐶” vertex divides the graph into ({𝐺} and {𝐴, 𝐵, 𝐷, 𝐸, 𝐹}). For this graph, 𝐷 and 𝐶 are the cut vertices.
[Figure: undirected graph on the vertices A, B, C, D, E, F, G]
Note: A connected, undirected graph is called 𝑏𝑖 − 𝑐𝑜𝑛𝑛𝑒𝑐𝑡𝑒𝑑 if the graph is still connected after removing any vertex.
[Figure: DFS tree with 𝑑𝑓𝑠𝑛𝑢𝑚/𝑙𝑜𝑤 labels: A,1/1; B,2/1; C,3/1; D,4/1; E,5/4; F,6/4; G,7/7]
𝐷𝐹𝑆 provides a linear-time algorithm (O(𝑛)) to find all cut vertices in a connected graph. Starting at any vertex, call a 𝐷𝐹𝑆 and number the
nodes as they are visited. For each vertex 𝑣, we call this DFS number 𝑑𝑓𝑠𝑛𝑢𝑚(v). The tree generated with DFS traversal is called 𝐷𝐹𝑆
𝑠𝑝𝑎𝑛𝑛𝑖𝑛𝑔 𝑡𝑟𝑒𝑒. Then, for every vertex 𝑣 in the 𝐷𝐹𝑆 spanning tree, we compute the lowest-numbered vertex, which we call 𝑙𝑜𝑤(𝑣), that is
reachable from 𝑣 by taking zero or more tree edges and then possibly one back edge (in that order).
Based on the above discussion, we need the following information for this algorithm: the 𝑑𝑓𝑠𝑛𝑢𝑚 of each vertex in the 𝐷𝐹𝑆 tree (once it gets
visited), and for each vertex 𝑣, the lowest depth of neighbors of all descendants of 𝑣 in the 𝐷𝐹𝑆 tree, called the 𝑙𝑜𝑤.
The 𝑑𝑓𝑠𝑛𝑢𝑚 can be computed during DFS. The low of 𝑣 can be computed after visiting all descendants of 𝑣 (i.e.,
just before 𝑣 gets popped off the 𝐷𝐹𝑆 stack) as the minimum of the 𝑑𝑓𝑠𝑛𝑢𝑚 of all neighbors of 𝑣 (other than the parent of 𝑣
in the 𝐷𝐹𝑆 tree) and the 𝑙𝑜𝑤 of all children of 𝑣 in the 𝐷𝐹𝑆 tree.
The root vertex is a cut vertex if and only if it has at least two children. A non-root vertex u is a cut vertex if and only if there is a son 𝑣 of 𝑢
such that 𝑙𝑜𝑤(𝑣) ≥ 𝑑𝑓𝑠𝑛𝑢𝑚(𝑢). This property can be tested once the 𝐷𝐹𝑆 is returned from every child of 𝑢 (that means,
just before u gets popped off the DFS stack), and if true, 𝑢 separates the graph into different bi-connected
components. This can be represented by computing one bi-connected component out of every such 𝑣 (a component
which contains 𝑣 will contain the sub-tree of 𝑣, plus 𝑢), and then erasing the sub-tree of 𝑣 from the tree.
For the given graph, the 𝐷𝐹𝑆 tree with 𝑑𝑓𝑠𝑛𝑢𝑚/𝑙𝑜𝑤 can be given as shown in the figure below. The implementation for the above
discussion is:
int adjMatrix[256][256];
int dfsnum[256], num = 0, low[256];
// Before the first call, initialize every entry of dfsnum[] to -1.
void CutVertices(int u) {
    low[u] = dfsnum[u] = num++;
    for (int v = 0; v < 256; ++v) {
        if (adjMatrix[u][v] && dfsnum[v] == -1) {
            CutVertices(v);
            if (low[v] >= dfsnum[u]) // no back edge from v's subtree above u
                printf("Cut Vertex: %d", u);
            low[u] = min(low[u], low[v]);
        }
        else if (adjMatrix[u][v]) // (u,v) is a back edge
            low[u] = min(low[u], dfsnum[v]);
    }
}
Problem-15 Let 𝐺 be a connected graph of order 𝑛. What is the maximum number of cut-vertices that 𝐺 can
contain?
Solution: 𝑛 − 2. As an example, consider the following path graph. In it, except for the end vertices 1 and 𝑛, all the remaining vertices are cut vertices: removing vertex 1 or vertex 𝑛 does not split the graph into two, while removing any internal vertex does. This is a case where we get the maximum number of cut vertices.
1 - 2 - 3 - 4 - ... - 𝑛
Problem-16 DFS Application: 𝐶𝑢𝑡 𝐸𝑑𝑔𝑒𝑠 or 𝐵𝑟𝑖𝑑𝑔𝑒𝑠. An edge in a connected graph is called a 𝑏𝑟𝑖𝑑𝑔𝑒 if its removal disconnects the graph. Give an algorithm for finding all the bridges of a graph.
Solution:
[Figure: a graph containing an edge 𝑢𝑣 whose removal disconnects it]
In the above graph, if we remove the edge 𝑢𝑣 then the graph splits into two components. For this graph, 𝑢𝑣 is a bridge. The discussion we had for cut vertices holds good for bridges also. The only change is that, instead of printing the vertex, we print the edge. The main observation is that an edge (𝑢, 𝑣) cannot be a bridge if it is part of a cycle. If (𝑢, 𝑣) is not part of a cycle, then it is a bridge.
We can detect cycles in 𝐷𝐹𝑆 by the presence of back edges. (𝑢, 𝑣) is a bridge if and only if none of v or 𝑣’𝑠 children
has a back edge to 𝑢 or any of 𝑢’𝑠 ancestors. To detect whether any of 𝑣’𝑠 children has a back edge to 𝑢’𝑠 parent,
we can use a similar idea as above to see what is the smallest 𝑑𝑓𝑠𝑛𝑢𝑚 reachable from the subtree rooted at 𝑣.
int dfsnum[256], num = 0, low[256];
// Before the first call, initialize every entry of dfsnum[] to -1.
void Bridges(struct Graph *G, int u) {
    low[u] = dfsnum[u] = num++;
    for (int v = 0; v < G->V; ++v) {
        if (G->adjMatrix[u][v] && dfsnum[v] == -1) {
            Bridges(G, v);
            if (low[v] > dfsnum[u]) // no back edge from v's subtree to u or above
                printf("Bridge: %d-%d", u, v);
            low[u] = min(low[u], low[v]);
        }
        else if (G->adjMatrix[u][v]) // (u,v) is a back edge
            low[u] = min(low[u], dfsnum[v]);
    }
}
Problem-17 DFS Application: 𝐸𝑢𝑙𝑒𝑟 𝐶𝑖𝑟𝑐𝑢𝑖𝑡𝑠. An 𝐸𝑢𝑙𝑒𝑟 circuit of a graph is a cycle that traverses every edge exactly once. Give an algorithm for finding an Euler circuit, if one exists.
Solution: The idea is to perform repeated depth-first searches and splice the resulting circuits together. Consider the following graph on the vertices 0 through 5.
[Figure: example undirected graph on the vertices 0, 1, 2, 3, 4, 5]
If we start at vertex 0, we can select the edge to vertex 1, then select the edge to vertex 2, then select the edge to
vertex 0. There are now no remaining unchosen edges from vertex 0:
[Figure: the remaining unchosen edges after removing the circuit 0,1,2,0]
We now have a circuit 0,1,2,0 that does not traverse every edge. So, we pick some other vertex that is on that
circuit, say vertex 1. We then do another depth first search of the remaining edges. Say we choose the edge to
node 3, then 4, then 1. Again we are stuck. There are no more unchosen edges from node 1. We now splice this
path 1,3,4,1 into the old path 0,1,2,0 to get: 0,1,3,4,1,2,0. The unchosen edges now look like this:
[Figure: the remaining unchosen edges after splicing in the path 1,3,4,1]
We can pick yet another vertex to start another DFS. If we pick vertex 2, and splice the path 2,3,5,4,2, then we get the final circuit
0,1,3,4,1,2,3,5,4,2,0.
A similar problem is to find a simple cycle in an undirected graph that visits every vertex. This is known as the
𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 𝑐𝑦𝑐𝑙𝑒 𝑝𝑟𝑜𝑏𝑙𝑒𝑚. Although it seems almost identical to the 𝐸𝑢𝑙𝑒𝑟 circuit problem, no efficient algorithm
for it is known.
Notes:
• A connected undirected graph has an 𝐸𝑢𝑙𝑒𝑟 circuit if and only if every vertex has an even degree. If exactly two vertices have an odd degree, the graph has an 𝐸𝑢𝑙𝑒𝑟 path but not an 𝐸𝑢𝑙𝑒𝑟 circuit.
• A directed graph is 𝐸𝑢𝑙𝑒𝑟𝑖𝑎𝑛 if it is strongly connected and every vertex has an equal 𝑖𝑛 and 𝑜𝑢𝑡 degree.
Application: A postman has to visit a set of streets in order to deliver mails and packages. He needs to find a path
that starts and ends at the post-office, and that passes through each street (edge) exactly once. This way the
postman will deliver mails and packages to all the necessary streets, and at the same time will spend minimum
time/effort on the road.
Problem-18 DFS Application: Finding Strongly Connected Components.
Solution: This is another application of DFS. In a directed graph, two vertices 𝑢 and 𝑣 are strongly connected if and
only if there exists a path from 𝑢 to 𝑣 and there exists a path from 𝑣 to 𝑢. The strong connectedness is an
equivalence relation.
• A vertex is strongly connected with itself
• If a vertex 𝑢 is strongly connected to a vertex 𝑣, then 𝑣 is strongly connected to 𝑢
• If a vertex 𝑢 is strongly connected to a vertex 𝑣, and 𝑣 is strongly connected to a vertex 𝑥, then 𝑢 is strongly
connected to 𝑥
What this says is, for a given directed graph we can divide it into strongly connected components. This problem
can be solved by performing two depth-first searches. With two DFS searches we can test whether a given directed
graph is strongly connected or not. We can also produce the subsets of vertices that are strongly connected.
Algorithm
• Perform DFS on given graph 𝐺.
• Number vertices of given graph 𝐺 according to a post-order traversal of depth-first spanning forest.
• Construct graph 𝐺𝑟 by reversing all edges in 𝐺.
• Perform DFS on 𝐺𝑟 : Always start a new DFS (initial call to Visit) at the highest-numbered vertex.
• Each tree in the resulting depth-first spanning forest corresponds to a strongly-connected component.
Why this algorithm works?
Let us consider two vertices, 𝑣 and 𝑤. If they are in the same strongly connected component, then there are paths
from 𝑣 to w and from 𝑤 to 𝑣 in the original graph 𝐺, and hence also in 𝐺𝑟 . If two vertices 𝑣 and 𝑤 are not in the
same depth-first spanning tree of 𝐺𝑟 , clearly they cannot be in the same strongly connected component. As an
example, consider the graph shown below on the left. Let us assume this graph is 𝐺.
[Figure: graph 𝐺 on the vertices A, B, C, D (left) and its DFS tree (right)]
Now, as per the algorithm, performing 𝐷𝐹𝑆 on this G graph gives the following diagram. The dotted line from 𝐶 to
𝐴 indicates a back edge.
Now, performing post order traversal on this tree gives: 𝐷, 𝐶, 𝐵 and 𝐴.
Vertex Post Order Number
A 4
B 3
C 2
D 1
Now reverse the given graph 𝐺 and call it 𝐺𝑟 and at the same time assign postorder numbers to the vertices. The reversed graph 𝐺𝑟
will look like:
[Figure: the reversed graph 𝐺ʳ with postorder numbers A,4; B,3; C,2; D,1]
The last step is performing DFS on this reversed graph 𝐺𝑟 . While doing 𝐷𝐹𝑆, we need to consider the vertex which
has the largest DFS number. So, first we start at 𝐴 and with 𝐷𝐹𝑆 we go to 𝐶 and then 𝐵. At B, we cannot move
further. This says that {𝐴, 𝐵, 𝐶} is a strongly connected component. Now the only remaining element is 𝐷 and we
end our second 𝐷𝐹𝑆 at 𝐷. So the connected components are: {𝐴, 𝐵, 𝐶} and {𝐷}.
[Figure: the DFS forest of 𝐺ʳ: one tree containing A, C and B, and a separate tree containing D]
Problem-19 Count the number of connected components of a graph.
Solution: This can be solved with DFS: run DFS from every still-unvisited vertex, incrementing a counter each time a new DFS call is started. Each such call marks exactly one connected component, so the final counter value is the number of components.
Time Complexity: Same as that of DFS, and it depends on the implementation. With adjacency lists the complexity is O(|𝐸| + |𝑉|) and with an adjacency matrix it is O(|𝑉|²).
Problem-20 Can we solve the Problem-19, using BFS?
Solution: Yes. This problem can be solved with one extra counter in BFS.
int components = 0; // the extra counter
void BFS(struct Graph *G, int u) {
    int v;
    struct Queue *Q = createQueue();
    enQueue(Q, u);
    while (!isEmpty(Q)) {
        u = deQueue(Q);
        Process u; // For example, print u
        Visited[u] = 1;
        /* For example, if the adjacency matrix is used for representing the
           graph, then the condition to be used for finding an unvisited
           adjacent vertex v of u is: if(!Visited[v] && G->Adj[u][v]) */
        for each unvisited adjacent node v of u {
            enQueue(Q, v);
        }
    }
}
void BFSTraversal(struct Graph *G) {
    for (int i = 0; i < G->V; i++)
        Visited[i] = 0;
    // This loop is required if the graph has more than one component
    for (int i = 0; i < G->V; i++)
        if (!Visited[i]) {
            components++; // each new BFS start is one connected component
            BFS(G, i);
        }
}
Time Complexity: Same as that of 𝐵𝐹𝑆, and it depends on the implementation. With adjacency lists the complexity is O(|𝐸| + |𝑉|) and with an adjacency matrix it is O(|𝑉|²).
Problem-21 Let us assume that 𝐺(𝑉, 𝐸) is an undirected graph. Give an algorithm for finding a spanning tree
which takes O(|𝐸|) time complexity (not necessarily a minimum spanning tree).
Solution: The test for a cycle can be done in constant time, by marking vertices that have been added to the set 𝑆. An edge will introduce a
cycle, if both its vertices have already been marked.
Algorithm:
S = {}; // Assume S is a set of edges
for each edge e ∈ E {
    if (adding e to S doesn’t form a cycle) {
        add e to S;
        mark the vertices of e;
    }
}
Problem-22 Is there any other way of solving Problem-21?
Solution: Yes. We can run 𝐵𝐹𝑆 and find the 𝐵𝐹𝑆 tree for the graph (level order tree of the graph). Then start at the root element and keep
moving to the next levels and at the same time we have to consider the nodes in the next level only once. That means, if we have a node with
multiple input edges then we should consider only one of them; otherwise they will form a cycle.
Problem-23 Detecting a cycle in an undirected graph
Solution: An undirected graph is acyclic if and only if a 𝐷𝐹𝑆 yields no back edges, edges (𝑢, 𝑣) where 𝑣 has already
been discovered and is an ancestor of 𝑢.
• Execute 𝐷𝐹𝑆 on the graph.
• If there is a back edge - the graph has a cycle.
If the graph does not contain a cycle, then |𝐸| < |𝑉| and 𝐷𝐹𝑆 cost O(|𝑉|). If the graph contains a cycle, then a
back edge is discovered after 2|𝑉| steps at most.
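A minimal sketch of this test, assuming an adjacency-matrix representation (the function name and the fixed size are ours). An edge that leads to an already visited vertex other than the DFS parent is a back edge:
// Returns 1 if a cycle (back edge) is found in the component of u.
// visited[] must start all zero; pass parent = -1 for the initial call.
int hasCycleDFS(int V, int adj[][8], int visited[], int u, int parent) {
    visited[u] = 1;
    for (int v = 0; v < V; v++) {
        if (!adj[u][v] || v == parent) continue; // skip non-edges and the tree edge to the parent
        if (visited[v]) return 1;                // back edge: cycle found
        if (hasCycleDFS(V, adj, visited, v, u)) return 1;
    }
    return 0;
}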
Problem-24 Detecting a cycle in DAG
Solution:
[Figure: a DAG with edges 1→2, 1→3 and 2→3; node 3 is reachable from node 1 along two different paths]
Cycle detection on a graph is different than on a tree. This is because in a graph, a node can have multiple parents.
In a tree, the algorithm for detecting a cycle is to do a depth first search, marking nodes as they are encountered.
If a previously marked node is seen again, then a cycle exists. This won’t work on a graph. Let us consider the
graph shown in the figure below. If we use a tree cycle detection algorithm, then it will report the wrong result.
That means that this graph has a cycle in it. But the given graph does not have a cycle in it. This is because node
3 will be seen twice in a 𝐷𝐹𝑆 starting at node 1.
The cycle detection algorithm for trees can easily be modified to work for graphs. The key is that in a 𝐷𝐹𝑆 of an
acyclic graph, a node whose descendants have all been visited can be seen again without implying a cycle. But, if
a node is seen for the second time before all its descendants have been visited, then there must be a cycle. Can
you see why this is? Suppose there is a cycle containing node A. This means that A must be reachable from one
of its descendants. So when the 𝐷𝐹𝑆 is visiting that descendant, it will see 𝐴 again, before it has finished visiting
all of 𝐴’𝑠 descendants. So there is a cycle. In order to detect cycles, we can modify the depth first search.
int DetectCycle(struct Graph *G) {
    for (int i = 0; i < G->V; i++) {
        Visited[i] = 0;
        Predecessor[i] = 0;
    }
    for (int i = 0; i < G->V; i++) {
        if (!Visited[i] && HasCycle(G, i))
            return 1;
    }
    return 0;
}
int HasCycle(struct Graph *G, int u) {
    Visited[u] = 1;
    for (int i = 0; i < G->V; i++) {
        if (G->Adj[u][i]) {
            if (Predecessor[i] != u && Visited[i])
                return 1;
            else {
                Predecessor[i] = u;
                if (HasCycle(G, i)) // check this neighbor, then continue with the rest
                    return 1;
            }
        }
    }
    return 0;
}
Time Complexity: O(𝑉 + 𝐸).
Problem-25 Given a directed acyclic graph, give an algorithm for finding its depth.
Solution: If it is an undirected graph, we can use the simple unweighted shortest path algorithm (check
𝑆ℎ𝑜𝑟𝑡𝑒𝑠𝑡 𝑃𝑎𝑡ℎ 𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚𝑠 section). We just need to return the highest number among all distances. For directed
acyclic graph, we can solve by following the similar approach which we used for finding the depth in trees. In
trees, we have solved this problem using level order traversal (with one extra special symbol to indicate the end of
the level).
// Assuming the given graph is a DAG
int DepthInDAG(struct Graph *G) {
    struct Queue *Q;
    int counter;
    int v, w;
    Q = createQueue();
    counter = 0;
    for (v = 0; v < G->V; v++)
        if (indegree[v] == 0)
            enQueue(Q, v);
    enQueue(Q, '$'); // special marker: end of a level
    while (!isEmpty(Q)) {
        v = deQueue(Q);
        if (v == '$') {
            counter++;
            if (!isEmpty(Q))
                enQueue(Q, '$');
        }
        else
            for each w adjacent to v
                if (--indegree[w] == 0)
                    enQueue(Q, w);
    }
    deleteQueue(Q);
    return counter;
}
Total running time is O(𝑉 + 𝐸).
Problem-26 How many topological sorts of the following DAG are there?
[Figure: a DAG with three stages, each stage containing two vertices]
Solution: If we observe the above graph, there are three stages with 2 vertices each. In the early discussion of this chapter, we saw that topological sort picks the elements with zero indegree at any point of time. At each of the two-vertex stages, we can first process either the top vertex or the bottom vertex, so each stage contributes two possibilities. The total number of possibilities is the product over the stages, that is, 2 × 2 × 2 = 8.
Problem-27 Unique topological ordering: Design an algorithm to determine whether a directed graph has a
unique topological ordering.
Solution: A directed graph has a unique topological ordering if and only if there is a directed edge between each pair
of consecutive vertices in the topological order. This can also be defined as: a directed graph has a unique
topological ordering if and only if it has a Hamiltonian path. If the digraph has multiple topological orderings, then
a second topological order can be obtained by swapping a pair of consecutive vertices.
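A minimal sketch of this check, assuming an adjacency-matrix graph and a topological order topo[] that has already been computed (the names and the fixed size are ours):
// A DAG has a unique topological ordering iff every pair of consecutive
// vertices in a topological order is joined by a directed edge, i.e.,
// the order forms a Hamiltonian path.
int isUniqueTopoOrder(int V, int adj[][8], int topo[]) {
    for (int i = 0; i + 1 < V; i++)
        if (!adj[topo[i]][topo[i+1]])
            return 0; // these two consecutive vertices could be swapped
    return 1;
}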
Problem-28 Let us consider the prerequisites for courses at 𝐼𝐼𝑇 𝐵𝑜𝑚𝑏𝑎𝑦. Suppose that all prerequisites are
mandatory, every course is offered every semester, and there is no limit to the number of courses we can take
in one semester. We would like to know the minimum number of semesters required to complete the major.
Describe the data structure we would use to represent this problem, and outline a linear time algorithm for
solving it.
Solution: Use a directed acyclic graph (DAG). The vertices represent courses and the edges represent the prerequisite relation between
courses at 𝐼𝐼𝑇 𝐵𝑜𝑚𝑏𝑎𝑦. It is a DAG, because the prerequisite relation has no cycles.
The number of semesters required to complete the major is one more than the longest path in the DAG. This can be calculated on the DFS
tree recursively in linear time. The longest path out of a vertex 𝑥 is 0 if 𝑥 has outdegree 0; otherwise it is
1 + 𝑚𝑎𝑥{𝑙𝑜𝑛𝑔𝑒𝑠𝑡 𝑝𝑎𝑡ℎ 𝑜𝑢𝑡 𝑜𝑓 𝑦 | (𝑥, 𝑦) 𝑖𝑠 𝑎𝑛 𝑒𝑑𝑔𝑒 𝑜𝑓 𝐺}.
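A minimal C sketch of this recursion with memoization, so each vertex is solved only once. The adjacency-matrix representation, the MAXV bound, and a global longest[] array initialized to -1 are assumptions for illustration; with adjacency lists the same idea runs in O(𝑉 + 𝐸), while the matrix form shown is O(𝑉2) but keeps the sketch short:
int longest[MAXV];	/* MAXV: assumed bound on the number of vertices */
int LongestPathFrom(struct Graph *G, int x) {
	if (longest[x] != -1)
		return longest[x];	/* already computed for this vertex */
	int best = 0;		/* outdegree 0: longest path out of x is 0 */
	for (int y = 0; y < G->V; y++)
		if (G->Adj[x][y]) {
			int candidate = 1 + LongestPathFrom(G, y);
			if (candidate > best)
				best = candidate;
		}
	return longest[x] = best;
}
/* Minimum number of semesters = 1 + the maximum of LongestPathFrom(G, x) over all x. */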
Problem-29 At a university (let’s say 𝐼𝐼𝑇 𝐵𝑜𝑚𝑏𝑎𝑦), there is a list of courses along with their prerequisites. That
means, two lists are given:
𝐴 - Courses list
𝐵 – Prerequisites: B contains pairs (𝑥, 𝑦) where 𝑥, 𝑦 ∈ 𝐴, indicating that course 𝑥 can't be taken before course
𝑦.
Let us consider a student who wants to take only one course in a semester. Design a schedule for this
student.
Example: A = {C-Lang, Data Structures, OS, CO, Algorithms, Design Patterns, Programming }. B = { (C-Lang,
CO), (OS, CO), (Data Structures, Algorithms), (Design Patterns, Programming) }. 𝑂𝑛𝑒 𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒 𝑠𝑐ℎ𝑒𝑑𝑢𝑙𝑒 𝑐𝑜𝑢𝑙𝑑
𝑏𝑒:
Semester 1: Data Structures
Semester 2: Algorithms
Semester 3: C-Lang
Semester 4: OS
Semester 5: CO
Semester 6: Design Patterns
Semester 7: Programming
Solution: The solution to this problem is exactly the same as that of topological sort. Assume that the courses
names are integers in the range [1. . 𝑛], 𝑛 is known (𝑛 is not constant). The relations between the courses will be
represented by a directed graph 𝐺 = (𝑉, 𝐸), where 𝑉 are the set of courses and if course 𝑖 is prerequisite of course
𝑗, 𝐸 will contain the edge (𝑖, 𝑗). Let us assume that the graph will be represented as an Adjacency list.
First, let's observe another algorithm to topologically sort a DAG in O(|𝑉| + |𝐸|).
• Find the in-degree of all the vertices - O(|𝑉| + |𝐸|)
• Repeat until all the vertices are removed:
	Find a vertex 𝑣 with in-degree = 0 - O(|𝑉|)
	Output 𝑣 and remove it from 𝐺, along with its edges - O(|𝑉|)
	Reduce the in-degree of each node 𝑢 such that (𝑣, 𝑢) was an edge in 𝐺, and keep a list of vertices with in-degree = 0 - O(𝑑𝑒𝑔𝑟𝑒𝑒(𝑣))
The time complexity of this algorithm is also the same as that of the topological sort and it is O(|𝑉| + |𝐸|).
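A C sketch of these steps, assuming the adjacency-matrix Graph and the queue helpers (createQueue, enQueue, deQueue, isEmpty, deleteQueue) used earlier in this chapter; MAXV is an assumed bound on the number of vertices:
void TopologicalSortByIndegree(struct Graph *G) {
	int indegree[MAXV] = {0};
	for (int v = 0; v < G->V; v++)		/* compute all in-degrees: O(V + E) */
		for (int w = 0; w < G->V; w++)
			if (G->Adj[v][w])
				indegree[w]++;
	struct Queue *Q = createQueue();
	for (int v = 0; v < G->V; v++)		/* collect the current sources */
		if (indegree[v] == 0)
			enQueue(Q, v);
	while (!isEmpty(Q)) {
		int v = deQueue(Q);
		printf("%d ", v);		/* output v in topological order */
		for (int w = 0; w < G->V; w++)	/* "remove" v's outgoing edges */
			if (G->Adj[v][w] && --indegree[w] == 0)
				enQueue(Q, w);
	}
	deleteQueue(Q);
}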
Problem-30 In Problem-29, a student wants to take all the courses in 𝐴, in the minimal number of semesters.
That means the student is ready to take any number of courses in a semester. Design a schedule for this
scenario. 𝑂𝑛𝑒 𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒 𝑠𝑐ℎ𝑒𝑑𝑢𝑙𝑒 𝑖𝑠:
𝑆𝑒𝑚𝑒𝑠𝑡𝑒𝑟 1: C-Lang, OS, Design Patterns
𝑆𝑒𝑚𝑒𝑠𝑡𝑒𝑟 2: Data Structures, CO, Programming
𝑆𝑒𝑚𝑒𝑠𝑡𝑒𝑟 3: Algorithms
Solution: A variation of the above topological sort algorithm with a slight change: in each semester, instead of
taking one subject, take all the subjects with zero indegree. That means, execute the algorithm on all the vertices
with in-degree 0 at once (instead of dealing with one source in each stage, all the current sources are dealt with and printed together).
Time Complexity: O(|𝑉| + |𝐸|).
Problem-31 LCA of a DAG: Given a DAG and two vertices 𝑣 and 𝑤, find the 𝑙𝑜𝑤𝑒𝑠𝑡 𝑐𝑜𝑚𝑚𝑜𝑛 𝑎𝑛𝑐𝑒𝑠𝑡𝑜𝑟 (LCA) of 𝑣
and 𝑤. The LCA of 𝑣 and 𝑤 is an ancestor of 𝑣 and 𝑤 that has no descendants that are also ancestors of 𝑣 and
𝑤.
Hint: Define the height of a vertex 𝑣 in a DAG to be the length of the longest path from 𝑟𝑜𝑜𝑡 to 𝑣. Among the
vertices that are ancestors of both 𝑣 and 𝑤, the one with the greatest height is an LCA of 𝑣 and 𝑤.
Problem-32 Shortest ancestral path: Given a DAG and two vertices 𝑣 and 𝑤, find the 𝑠ℎ𝑜𝑟𝑡𝑒𝑠𝑡 𝑎𝑛𝑐𝑒𝑠𝑡𝑟𝑎𝑙 𝑝𝑎𝑡ℎ
between 𝑣 and 𝑤. An ancestral path between 𝑣 and 𝑤 is a common ancestor 𝑥 along with a shortest path from
𝑣 to 𝑥 and a shortest path from 𝑤 to 𝑥. The shortest ancestral path is the ancestral path whose total length is
minimized.
Hint: Run BFS two times. First run from 𝑣 and second time from 𝑤. Find a DAG where the shortest ancestral path
goes to a common ancestor 𝑥 that is not an LCA.
Problem-33 Let us assume that we have two graphs 𝐺1 and 𝐺2 . How do we check whether they are isomorphic
or not?
Solution: There are many ways of representing the same graph. As an example, consider the following simple graph.
It can be seen that all the representations below have the same number of vertices and the same number of edges.
Problem-34 How many simple undirected graphs are there with 𝑛 vertices?
Solution: We will try to answer this question in two steps. First, we count all labeled graphs. Assume all the
representations below are labeled with {1, 2, 3} as vertices. The set of all such graphs for 𝑛 = 3 is:
There are only two choices for each edge: it either exists or it does not. Since the maximum number of edges in
an undirected graph with 𝑛 vertices is C(𝑛, 2) = 𝑛(𝑛 − 1)/2, the total number of undirected labeled graphs is
2^C(𝑛, 2) = 2^(𝑛(𝑛−1)/2).
Problem-35 Hamiltonian path in DAGs: Given a DAG, design a linear time algorithm to determine whether
there is a path that visits each vertex exactly once.
Solution: The 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 path problem is NP-Complete for general graphs (for more details refer to the 𝐶𝑜𝑚𝑝𝑙𝑒𝑥𝑖𝑡𝑦 𝐶𝑙𝑎𝑠𝑠𝑒𝑠 chapter).
For a DAG, however, the problem can be solved exactly in linear time using topological sort.
Topological sort has an interesting property: if all pairs of consecutive vertices in the sorted order are connected
by edges, then these edges form a directed 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 path in the DAG. If a 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 path exists, the topological
sort order is unique. Conversely, if the topological order does not form a 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 path, the DAG has two or more topological orderings.
𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚: Compute a topological sort and check if there is an edge between each consecutive pair
of vertices in the topological order.
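A minimal C sketch of this check. It assumes a helper topologicalSort() (not shown here) that fills order[] with one valid topological order, an adjacency-matrix representation, and a constant MAXV bounding the number of vertices; these names are illustrative, not a fixed API:
int HasHamiltonianPathDAG(struct Graph *G) {
	int order[MAXV];
	topologicalSort(G, order);
	for (int i = 0; i + 1 < G->V; i++)
		if (!G->Adj[order[i]][order[i+1]])
			return 0;	/* a consecutive pair is not connected */
	return 1;		/* the topological order itself is a directed Hamiltonian path */
}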
In a general unweighted graph, suppose we want to find a path from 𝐬 to 𝐭 that visits each vertex exactly once. The basic solution based on
backtracking is: start at 𝑠 and try all of its neighbors recursively, making sure we never visit the same vertex twice. The algorithm based on
this implementation can be given as:
bool seenTable[32];	// assumes at most 32 vertices; n and t are globals
void HamiltonianPath( struct Graph *G, int u ) {
	if( u == t ) {
		/* Check that we have seen all vertices; if so, report the path. */
	}
	else {
		for( int v = 0; v < n; v++ )
			if( !seenTable[v] && G->Adj[u][v] ) {
				seenTable[v] = true;
				HamiltonianPath( G, v );
				seenTable[v] = false;
			}
	}
}
Note that if we have a partial path from 𝑠 to 𝑢 using vertices 𝑠 = 𝑣1, 𝑣2, . . . , 𝑣𝑘 = 𝑢, then we don't care about the order in which we visited
these vertices when figuring out which vertex to visit next. All that we need to know is the set of vertices we have seen (the seenTable[] array)
and which vertex we are at right now (𝑢).
There are 2^𝑛 possible sets of vertices and 𝑛 choices for 𝑢. In other words, there are 2^𝑛 possible 𝑠𝑒𝑒𝑛𝑇𝑎𝑏𝑙𝑒[ ] arrays and 𝑛 different parameters
to HamiltonianPath(). What HamiltonianPath() does during any particular recursive call is completely determined by the 𝑠𝑒𝑒𝑛𝑇𝑎𝑏𝑙𝑒[ ] array
and the parameter 𝑢.
Problem-36 For a given graph 𝐺 with 𝑛 vertices how many trees we can construct?
Solution: There is a simple formula for this problem and it is named after Arthur Cayley. For 𝑛 labeled vertices, the number of distinct trees
is 𝑛^(𝑛−2) (equivalently, the number of spanning trees of the complete graph on 𝑛 vertices). Below, the count for small 𝑛 values is shown.
n value		Formula value: 𝑛^(𝑛−2)		Number of Trees
2		1				1 (the single edge 1−2)
3		3				3 (the three labeled trees on {1, 2, 3})
Problem-37 For a given graph G with 𝑛 vertices how many spanning trees can we construct?
Solution: The solution to this problem is the same as that of Problem-36; it is just another way of asking the same question, because the
number of edges in a tree and in a spanning tree is the same.
Problem-38 The 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 𝑐𝑦𝑐𝑙𝑒 problem: Is it possible to traverse each of the vertices of a graph exactly
once, starting and ending at the same vertex?
Solution: Since the 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 path problem is NP-Complete, the 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 cycle problem is NP-Complete as well.
A 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 cycle is a cycle that traverses every vertex of a graph exactly once. There are no known conditions
that are both necessary and sufficient, but there are a few sufficient conditions.
• For a graph to have a 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 cycle, the degree of each vertex must be two or more.
• The Petersen graph does not have a 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 cycle.
• In general, the more edges a graph has, the more likely it is to have a 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 cycle.
• Let 𝐺 be a simple graph with 𝑛 ≥ 3 vertices. If every vertex has degree at least 𝑛/2, then 𝐺 has a 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 cycle (Dirac's theorem).
• The best known algorithm for finding a 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 cycle has an exponential worst-case complexity.
Note: For the approximation algorithm of 𝐻𝑎𝑚𝑖𝑙𝑡𝑜𝑛𝑖𝑎𝑛 path, refer to the 𝐷𝑦𝑛𝑎𝑚𝑖𝑐 𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 chapter.
Problem-39 What is the difference between 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎’𝑠 and 𝑃𝑟𝑖𝑚′𝑠 algorithm?
Solution: 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎′𝑠 algorithm is almost identical to that of 𝑃𝑟𝑖𝑚′𝑠. The algorithm begins at a specific vertex and
extends outward within the graph until all vertices have been reached. The only distinction is that 𝑃𝑟𝑖𝑚′𝑠 algorithm
stores a minimum cost edge whereas 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎′𝑠 algorithm stores the total cost from a source vertex to the current
vertex. More simply, 𝐷𝑖𝑗𝑘𝑠𝑡𝑟𝑎′𝑠 algorithm stores a summation of minimum cost edges whereas 𝑃𝑟𝑖𝑚′𝑠 algorithm
stores at most one minimum cost edge.
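The contrast can be seen in the relaxation step alone. A schematic C sketch, where weight, key, dist and MAXV are illustrative names rather than a fixed API; u is the vertex just extracted from the priority queue and v an unvisited neighbor:
void relaxPrim(int u, int v, int weight[][MAXV], int key[]) {
	if (weight[u][v] < key[v])
		key[v] = weight[u][v];		/* cheapest single edge connecting v to the tree */
}
void relaxDijkstra(int u, int v, int weight[][MAXV], int dist[]) {
	if (dist[u] + weight[u][v] < dist[v])
		dist[v] = dist[u] + weight[u][v];	/* total cost from the source through u */
}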
Problem-40 Reversing Graph: Give an algorithm that returns the reverse of the directed graph (each edge
from 𝑣 to 𝑤 is replaced by an edge from 𝑤 to 𝑣).
Solution: In graph theory, the reverse (also called 𝑡𝑟𝑎𝑛𝑠𝑝𝑜𝑠𝑒) of a directed graph 𝐺 is another directed graph on the same set of vertices with
all the edges reversed. That means, if 𝐺 contains an edge (𝑢, 𝑣) then the reverse of 𝐺 contains an edge (𝑣, 𝑢) and vice versa.
Algorithm:
struct Graph *ReverseTheDirectedGraph(struct Graph *G) {
	// Create a new graph for the result; it has the same number of vertices and
	// edges (createGraph is assumed to allocate an empty adjacency matrix).
	struct Graph *ReversedGraph = createGraph(G->V);
	for (int v = 0; v < G->V; v++)
		for (int w = 0; w < G->V; w++)
			if (G->Adj[v][w])
				ReversedGraph->Adj[w][v] = 1;	// reverse the bits in the adjacency matrix
	return ReversedGraph;
}
}
Problem-41 Travelling Sales Person Problem: Find the shortest path in a graph that visits each vertex at
least once, starting and ending at the same vertex.
Solution: The Traveling Salesman Problem (𝑇𝑆𝑃) is related to finding a Hamiltonian cycle. Given a weighted graph 𝐺, we want to find the
shortest cycle (may be non-simple) that visits all the vertices.
Approximation algorithm: This algorithm does not solve the problem but gives a solution which is within a factor of 2 of optimal (in the worst-
case).
1) Find a Minimal Spanning Tree (MST).
2) Do a DFS of the MST.
For details, refer to the chapter on 𝐶𝑜𝑚𝑝𝑙𝑒𝑥𝑖𝑡𝑦 𝐶𝑙𝑎𝑠𝑠𝑒𝑠.
Problem-42 Discuss Bipartite matchings?
Solution: In Bipartite graphs, we divide the graphs in to two disjoint sets, and each edge connects a vertex from one
set to a vertex in another subset (as shown in figure).
Definition: A simple graph 𝐺 = (𝑉, 𝐸) is called a 𝑏𝑖𝑝𝑎𝑟𝑡𝑖𝑡𝑒 𝑔𝑟𝑎𝑝ℎ if its vertices can be divided into two disjoint sets 𝑉 =
𝑉1 ∪ 𝑉2, such that every edge has the form 𝑒 = (𝑎, 𝑏) where 𝑎 ∈ 𝑉1 and 𝑏 ∈ 𝑉2. In other words, no edge connects
two vertices that are both in 𝑉1 or both in 𝑉2.
[Figure: the complete bipartite graphs 𝐾2,3 and 𝐾3,3]
• A subset of edges 𝑀 ⊆ 𝐸 is a 𝑚𝑎𝑡𝑐ℎ𝑖𝑛𝑔 if no two edges have a common vertex. In the example figures, matching sets
of edges are represented with dotted lines. A matching 𝑀 is called 𝑚𝑎𝑥𝑖𝑚𝑢𝑚 if it has the largest possible number
of edges; the dotted edges represent an alternative matching for the given graph.
• A matching 𝑀 is 𝑝𝑒𝑟𝑓𝑒𝑐𝑡 if it matches all vertices. We must have |𝑉1| = |𝑉2| in order to have a perfect matching.
• An 𝑎𝑙𝑡𝑒𝑟𝑛𝑎𝑡𝑖𝑛𝑔 𝑝𝑎𝑡ℎ is a path whose edges alternate between matched and unmatched edges. If we find an
alternating path that starts and ends at unmatched vertices (an 𝑎𝑢𝑔𝑚𝑒𝑛𝑡𝑖𝑛𝑔 path), then we can improve the
matching: along such a path the number of unmatched edges exceeds the number of matched edges by one,
so flipping the matched and unmatched edges increases the matching by one.
The next question is, how do we find a perfect matching? Based on the above theory and definition, we can
find the perfect matching with the following approximation algorithm.
Matching Algorithm (Hungarian algorithm)
1) Start at unmatched vertex.
2) Find an alternating path.
3) If it exists, flip the edges along the path (matched edges become unmatched and conversely). If it does not
exist, choose another unmatched vertex.
4) If the number of matched edges equals 𝑉/2, stop. Otherwise return to step 1; stop when all vertices
have been examined without finding any augmenting path.
Time Complexity of the Matching Algorithm: The number of iterations is in O(𝑉). The complexity of finding an
alternating path using BFS is O(𝐸). Therefore, the total time complexity is O(𝑉 × 𝐸).
Problem-43 Marriage and Personnel Problem?
Marriage Problem: There are 𝑋 men and 𝑌 women who desire to get married. Participants indicate who among
the opposite sex could be a potential spouse for them. Every woman can be married to at most one man, and every
man to at most one woman. How can we marry everybody to someone they like?
Personnel Problem: You are the boss of a company. The company has 𝑀 workers and 𝑁 jobs. Each worker is
qualified to do some jobs, but not others. How will you assign jobs to each worker?
Solution: These two cases are just another way of asking about bipartite graphs, and the solution is the same as
that of Problem-42.
Problem-44 How many edges will be there in complete bipartite graph 𝐾𝑚,𝑛 ?
Solution: 𝑚 × 𝑛. This is because each vertex in the first set can connect all vertices in the second set.
Problem-45 A graph is called a 𝑟𝑒𝑔𝑢𝑙𝑎𝑟 graph if it has no loops or multiple edges and every vertex has the
same number of neighbors; i.e., every vertex has the same degree. Now, if 𝐾𝑚,𝑛 is a regular graph, what is the
relation between 𝑚 and 𝑛?
Solution: Since each vertex should have the same degree, the relation should be 𝑚 = 𝑛.
Problem-46 What is the maximum number of edges in the maximum matching of a bipartite graph with 𝑛
vertices?
Solution: From the definition of 𝑚𝑎𝑡𝑐ℎ𝑖𝑛𝑔, we should not have edges with common vertices, so in a bipartite
graph each vertex can be matched to at most one vertex. Since we divide the total vertices into two sets, we get
the maximum number of edges when we divide them in half. Finally, the answer is 𝑛/2.
Problem-47 Discuss Planar Graphs. 𝑃𝑙𝑎𝑛𝑎𝑟 𝑔𝑟𝑎𝑝ℎ: Is it possible to draw the edges of a graph in such a way
that the edges do not cross?
Solution: A graph G is said to be planar if it can be drawn in the plane in such a way that no two edges meet each
other except at a vertex to which they are incident. Any such drawing is called a plane drawing of G. As an example
consider the below graph:
[Figure: a graph on vertices A, B, C, D drawn with two crossing edges]
This graph can easily be redrawn as a planar graph (without any crossed edges):
[Figure: the same graph redrawn with no crossings]
Problem-51 Given the adjacency matrix 𝑀 of a directed graph, give an algorithm to determine whether the
graph contains a 𝑠𝑖𝑛𝑘: a vertex with an incoming edge from every other vertex and no outgoing edges.
Solution: A vertex 𝑖 is a sink if and only if 𝑀[𝑖, 𝑗] = 0 for all 𝑗 and 𝑀[𝑗, 𝑖] = 1 for all 𝑗 ≠ 𝑖. For any pair of vertices 𝑖
and 𝑗:
𝑀[𝑖, 𝑗] = 1 → vertex i can't be a sink
𝑀[𝑖, 𝑗] = 0 → vertex j can't be a sink
Algorithm:
• Start at 𝑖 = 1, 𝑗 = 1
• If 𝑀[𝑖, 𝑗] = 0 → 𝑖 wins, 𝑗 + +
• If 𝑀[𝑖, 𝑗] = 1 → 𝑗 wins, 𝑖 + +
• Proceed with this process until 𝑗 = 𝑛 or 𝑖 = 𝑛 + 1
• If 𝑖 == 𝑛 + 1 , the graph does not contain a sink
• Otherwise, check row i – it should be all zeros; and check column 𝑖 – it should be all but 𝑀[𝑖, 𝑖] ones; – if
so, 𝑖 is a sink.
Time Complexity: O(𝑉), because at most 2|𝑉| cells in the matrix are examined.
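A 0-indexed C sketch of the same elimination idea, written in the common single-candidate form; M, n and MAXV are illustrative names:
int FindSink(int M[][MAXV], int n) {
	int candidate = 0;
	for (int j = 1; j < n; j++)
		if (M[candidate][j])		/* edge candidate -> j exists: candidate loses */
			candidate = j;
	for (int j = 0; j < n; j++) {	/* verify the surviving candidate */
		if (M[candidate][j])			/* its row must be all zeros */
			return -1;
		if (j != candidate && !M[j][candidate])	/* its column must be all ones */
			return -1;
	}
	return candidate;	/* the graph contains a sink */
}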
Problem-52 What is the worst – case memory usage of DFS?
Solution: The worst case is O(|V|) memory, which happens if the graph is actually a list (a path), as below: the
recursion stack then holds all the vertices. So the algorithm is memory efficient on graphs with small diameter.
1 → 2 → 3 → 4 → ⋯ → 𝑛
Problem-53 Does DFS find the shortest path from start node to some node w ?
Solution: No. DFS does not visit vertices in order of their distance from the start node, so the first path it finds to 𝑤 need not be the shortest.
Problem-54 True or False: Dijkstra’s algorithm does not compute the “all pairs” shortest paths in a directed
graph with positive edge weights because, running the algorithm a single time, starting from some single vertex
𝑥, it will compute only the min distance from 𝑥 to 𝑦 for all nodes 𝑦 in the graph.
Solution: True.
Problem-55 True or False: Prim’s and Kruskal’s algorithms may compute different minimum spanning trees
when run on the same graph.
Solution: True.
Chapter 10: Sorting
10.1 What is Sorting?
𝑆𝑜𝑟𝑡𝑖𝑛𝑔 is the process of arranging the elements of a list in a certain order [either 𝑎𝑠𝑐𝑒𝑛𝑑𝑖𝑛𝑔 or 𝑑𝑒𝑠𝑐𝑒𝑛𝑑𝑖𝑛𝑔]. The
output is a permutation or reordering of the input.
By Number of Comparisons
In this method, sorting algorithms are classified based on the number of comparisons they perform. For comparison-based
sorting algorithms, best case behavior is O(𝑛𝑙𝑜𝑔𝑛) and worst case behavior is O(𝑛2). Comparison-based sorting
algorithms evaluate the elements of the list by a key comparison operation and need at least Ω(𝑛𝑙𝑜𝑔𝑛) comparisons
in the worst case.
Later in this chapter we will discuss a few 𝑛𝑜𝑛 − 𝑐𝑜𝑚𝑝𝑎𝑟𝑖𝑠𝑜𝑛 (𝑙𝑖𝑛𝑒𝑎𝑟) sorting algorithms like Counting sort, Bucket
sort, Radix sort, etc. Linear Sorting algorithms impose few restrictions on the inputs to improve the complexity.
By Number of Swaps
In this method, sorting algorithms are categorized by the number of 𝑠𝑤𝑎𝑝𝑠 (also called 𝑖𝑛𝑣𝑒𝑟𝑠𝑖𝑜𝑛𝑠).
By Memory Usage
Some sorting algorithms are "𝑖𝑛 𝑝𝑙𝑎𝑐𝑒" and they need O(1) or O(𝑙𝑜𝑔𝑛) memory to create auxiliary locations for
sorting the data temporarily.
By Recursion
Sorting algorithms are either recursive [quick sort] or non-recursive [selection sort, and insertion sort], and there
are some algorithms which use both (merge sort).
By Stability
A sorting algorithm is 𝑠𝑡𝑎𝑏𝑙𝑒 if, for all indices 𝑖 and 𝑗 such that the key 𝐴[𝑖] equals the key 𝐴[𝑗], record 𝑅[𝑖] precedes record 𝑅[𝑗] in the sorted
list whenever 𝑅[𝑖] precedes 𝑅[𝑗] in the original file. That is, a stable sorting algorithm maintains the relative order of elements with equal
keys (equivalent elements retain their relative positions even after sorting).
By Adaptability
With a few sorting algorithms, the complexity changes based on pre-sortedness [quick sort]: pre-sortedness of the
input affects the running time. Algorithms that take this into account are known to be adaptive.
Internal Sort
Sort algorithms that use main memory exclusively during the sort are called 𝑖𝑛𝑡𝑒𝑟𝑛𝑎𝑙 sorting algorithms. This kind
of algorithm assumes high-speed random access to all memory.
External Sort
Sorting algorithms that use external memory, such as tape or disk, during the sort come under this category.
Implementation
void bubbleSort(int data[], int size){
for(int step=0; step<size-1; ++step){
for(int i=0; i<size-step-1; ++i){
// To sort in descending order, change > to <.
if (data[i]>data[i+1]){
int temp = data[i];
data[i] = data[i+1];
data[i+1]= temp;
}
}
}
}
Algorithm takes O(𝑛2 ) (even in best case). In the above code, all the comparisons are made even if the array is already sorted at some
point. It increases the execution time.
The code can be optimized by introducing an extra variable 𝑠𝑤𝑎𝑝𝑝𝑒𝑑. After every pass, if there is no swapping taking place then, there is no
need for performing further loops. Variable 𝑠𝑤𝑎𝑝𝑝𝑒𝑑 is false if there is no swapping. Thus, we can prevent further iterations. No more swaps
indicate the completion of sorting. If the list is already sorted, we can use this flag to skip the remaining passes.
void bubbleSortImproved(int data[], int size) {
int pass, i, temp, swapped = 1;
for (pass = size - 1; pass >= 0 && swapped; pass--) {
swapped = 0;
for (i = 0; i <= pass - 1 ; i++) {
if(data[i] > data[i+1]) {
// swap elements
temp = data[i];
data[i] = data[i+1];
data[i+1] = temp;
swapped = 1;
}
}
}
}
This modified version improves the best case of bubble sort to O(𝑛).
Performance
Worst case complexity O(𝑛2 )
Best case complexity (Improved version) O(𝑛)
Average case complexity (Basic version) O(𝑛2 )
Worst case space complexity O(1) auxiliary
4 5 7 10 57 43 45 9 91	Swap 10 and 9
4 5 7 9 57 43 45 10 91	Swap 57 and 10
4 5 7 9 10 43 45 57 91	43 is the next smallest, skip
4 5 7 9 10 43 45 57 91	45 is the next smallest, skip
4 5 7 9 10 43 45 57 91	57 is the next smallest, skip
4 5 7 9 10 43 45 57 91	List is ordered
Advantages
• Easy to implement
• In-place sort (requires no additional storage space)
Disadvantages
• Doesn't scale well: O(𝑛2 )
Algorithm
1. Find the minimum value in the list
2. Swap it with the value in the current position
3. Repeat this process for all the elements until the entire array is sorted
This algorithm is called 𝑠𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛 𝑠𝑜𝑟𝑡 since it repeatedly 𝑠𝑒𝑙𝑒𝑐𝑡𝑠 the smallest element.
Implementation
void selectionSort(int data[], int size) {
int i, j, min, temp;
for (i = 0; i < size - 1; i++) {
min = i;
for (j = i+1; j < size; j++) {
if(data[j] < data[min])
min = j;
}
temp = data[min]; // Swap the elements
data[min] = data[i];
data[i] = temp;
}
}
Performance
Worst case complexity O(𝑛2 )
Best case complexity O(𝑛2)
Average case complexity O(𝑛2)
Worst case space complexity O(1) auxiliary
Advantages
• Easy to implement
• Efficient for small data
• Adaptive: If the input list is presorted [maybe not completely], then insertion sort takes O(𝑛 + 𝑑), where
𝑑 is the number of inversions
• Practically more efficient than selection and bubble sorts, even though all of them have O(𝑛2 ) worst case
complexity
• Stable: Maintains relative order of input data if the keys are same
• In-place: It requires only a constant amount O(1) of additional memory space
• Online: Insertion sort can sort the list as it receives it
Algorithm
Every repetition of insertion sort removes an element from the input list, and inserts it into the correct position in
the already-sorted list until no input elements remain. Sorting is typically done in-place. The resulting array after
𝑘 iterations has the property where the first 𝑘 + 1 entries are sorted.
[Figure: one step of insertion sort. Before: sorted partial result (≤ 𝑥 | > 𝑥), the current element 𝑥, unordered elements.
After: sorted partial result (≤ 𝑥 | 𝑥 | > 𝑥), unordered elements.]
Example
Following table shows the sixth pass in detail. At this point in the algorithm, a sorted sublist of six elements
consisting of 4, 5, 10, 43, 57 and 91 exists. We want to insert 45 back into the already sorted items. The first
comparison against 91 causes 91 to be shifted to the right. 57 is also shifted. When the item 43 is encountered,
the shifting process stops and 45 is placed in the open position. Now we have a sorted sublist of seven elements.
Remarks: Sixth pass
0 1 2 3 4 5 6 7 8
4 5 10 43 57 91 45 9 7	Hold the current element A[6] = 45 in a variable; copy 91 to A[6] as 91 > 45
4 5 10 43 57 91 91 9 7	copy 57 to A[5] as 57 > 45
4 5 10 43 57 57 91 9 7	43 < 45, so stop shifting
4 5 10 43 45 57 91 9 7	insert 45 at A[4]; the sublist of seven elements is sorted
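For reference, a minimal C implementation consistent with this description and with the analysis that follows (it shifts larger elements to the right rather than performing full swaps):
void insertionSort(int A[], int n) {
	int i, j, v;
	for (i = 1; i < n; i++) {	/* A[0..i-1] is already sorted */
		v = A[i];		/* hold the current element */
		j = i;
		while (j > 0 && A[j-1] > v) {
			A[j] = A[j-1];	/* shift larger elements one position right */
			j--;
		}
		A[j] = v;		/* insert v into the open position */
	}
}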
Analysis
The implementation of insertionSort shows that there are again 𝑛 − 1 passes to sort 𝑛 items. The iteration starts at position 1 and moves
through position 𝑛 − 1, as these are the items that need to be inserted back into the sorted sublists. Notice that this is not a complete swap as
was performed in the previous algorithms.
The maximum number of comparisons for an insertion sort is the sum of the first 𝑛 − 1 integers. Again, this is O(𝑛2 ). However, in the best
case, only one comparison needs to be done on each pass. This would be the case for an already sorted list.
One note about shifting versus exchanging is also important. In general, a shift operation requires approximately a third of the processing
work of an exchange since only one assignment is performed. In benchmark studies, insertion sort will show very good performance.
Performance
If every element is greater than or equal to every element to its left, the running time of insertion sort is Θ(𝑛). This situation occurs if the array
starts out already sorted, and so an already-sorted array is the best case for insertion sort.
Worst case complexity O(𝑛2 )
Best case complexity (Improved version) O(𝑛)
Average case complexity (Basic version) O(𝑛2 )
Worst case space complexity O(𝑛) total, O(1) auxiliary
Implementation
void ShellSort(int A[], int array_size) {
	int i, j, h, v;
	// Generate the initial increment: 1, 4, 13, 40, ... (Knuth's sequence)
	for (h = 1; h <= array_size/9; h = 3*h + 1);
	for ( ; h > 0; h = h/3) {
		// For each increment, do an insertion sort with stride h
		for (i = h; i < array_size; i++) {
			v = A[i];
			j = i;
			while (j >= h && A[j-h] > v) {
				A[j] = A[j-h];
				j -= h;
			}
			A[j] = v;
		}
	}
}
Note that when ℎ == 1, the algorithm makes a pass over the entire list, comparing adjacent elements, but doing
very few element exchanges. For ℎ == 1, shell sort works just like insertion sort, except the number of inversions
that have to be eliminated is greatly reduced by the previous steps of the algorithm with ℎ > 1.
Analysis
Shell sort is efficient for medium size lists. For bigger lists, the algorithm is not the best choice. It is the fastest of
all O(𝑛2 ) sorting algorithms.
The disadvantage of Shell sort is that it is a complex algorithm and not nearly as efficient as the merge, heap, and
quick sorts. Shell sort is significantly slower than the merge, heap, and quick sorts, but is a relatively simple
algorithm, which makes it a good choice for sorting lists of less than 5000 items unless speed is important. It is
also a good choice for repetitive sorting of smaller lists.
The best case in Shell sort is when the array is already sorted in the right order. The number of comparisons is
less. The running time of Shell sort depends on the choice of increment sequence.
Performance
Worst case complexity depends on gap sequence. Best known: O(𝑛𝑙𝑜𝑔2 𝑛)
Best case complexity O(𝑛)
Average case complexity depends on gap sequence
Worst case space complexity O(𝑛)
Algorithm
Because we're using divide-and-conquer to sort, we need to decide what our subproblems are going to look like.
The full problem is to sort an entire array. Let's say that a subproblem is to sort a subarray. In particular, we'll
think of a subproblem as sorting the subarray starting at index 𝑙𝑒𝑓𝑡 and going through index 𝑟𝑖𝑔ℎ𝑡. It will be
convenient to have a notation for a subarray, so let's say that 𝐴[𝑙𝑒𝑓𝑡. . 𝑟𝑖𝑔ℎ𝑡] denotes this subarray of array 𝐴. In
terms of our notation, for an array of 𝑛 elements, we can say that the original problem is to sort A[0..n-1].
Algorithm Merge-sort(A):
• 𝐷𝑖𝑣𝑖𝑑𝑒 by finding the number 𝑚𝑖𝑑 of the position midway between 𝑙𝑒𝑓𝑡 and 𝑟𝑖𝑔ℎ𝑡. Do this step the same
way we found the midpoint in binary search:
𝑚𝑖𝑑 = 𝑙𝑒𝑓𝑡 + (𝑟𝑖𝑔ℎ𝑡 − 𝑙𝑒𝑓𝑡)/2 or (𝑙𝑒𝑓𝑡 + 𝑟𝑖𝑔ℎ𝑡)/2.
• 𝐶𝑜𝑛𝑞𝑢𝑒𝑟 by recursively sorting the subarrays in each of the two subproblems created by the divide step.
That is, recursively sort the subarray 𝐴[𝑙𝑒𝑓𝑡. . 𝑚𝑖𝑑] and recursively sort the subarray 𝐴[𝑚𝑖𝑑 + 1. . 𝑟𝑖𝑔ℎ𝑡].
• 𝐶𝑜𝑚𝑏𝑖𝑛𝑒 by merging the two sorted subarrays back into the single sorted subarray 𝐴[𝑙𝑒𝑓𝑡. . 𝑟𝑖𝑔ℎ𝑡].
We need a base case. The base case is a subarray containing fewer than two elements, that is, when 𝑙𝑒𝑓𝑡 ≥ 𝑟𝑖𝑔ℎ𝑡,
since a subarray with no elements or just one element is already sorted. So we'll divide-conquer-combine only
when 𝑙𝑒𝑓𝑡 < 𝑟𝑖𝑔ℎ𝑡.
Example
To understand merge sort, let us walk through an example:
54 26 93 17 77 31 44 55
We know that merge sort first divides the whole array iteratively into equal halves unless the atomic values are
achieved. We see here that an array of 8 items is divided into two arrays of size 4.
54 26 93 17 77 31 44 55
This does not change the sequence of appearance of items in the original. Now we divide these two arrays into
halves.
54 26 93 17 77 31 44 55
We further divide these arrays and we achieve atomic values which can no longer be divided.
54 26 93 17 77 31 44 55
Now, we combine them in exactly the same manner as they were broken down.
We first compare the elements of each pair of arrays and then combine them into another array in sorted order. We
see that 54 and 26 are compared and, in the target array of 2 values, we put 26 first, followed by 54.
Similarly, we compare 93 and 17 and in the target array of 2 values we put 17 first, followed by 93. Along the same
lines, we change the order of 77 and 31, whereas 44 and 55 are already in order and are placed sequentially.
26 54 17 93 31 77 44 55
In the next iteration of the combining phase, we compare lists of two data values and merge them into arrays
of four data values, placing all in sorted order.
17 26 54 93 31 44 55 77
After the final merging, the array should look like this:
17 26 31 44 54 55 77 93
The overall flow of above discussion can be depicted as:
54 26 93 17 77 31 44 55
54 26 93 17 77 31 44 55
54 26 93 17 77 31 44 55
54 26 93 17 77 31 44 55
26 54 17 93 31 77 44 55
17 26 54 93 31 44 55 77
17 26 31 44 54 55 77 93
Implementation
void Mergesort(int A[], int temp[], int left, int right) {
int mid;
if(right > left) {
mid = (right + left) / 2;
Mergesort(A, temp, left, mid);
Mergesort(A, temp, mid+1, right);
Merge(A, temp, left, mid+1, right);
}
}
void Merge(int A[], int temp[], int left, int mid, int right) {
	int i, left_end, size, temp_pos;
	left_end = mid - 1;
	temp_pos = left;
	size = right - left + 1;
	while ((left <= left_end) && (mid <= right)) {	// merge the two sorted runs
		if (A[left] <= A[mid])
			temp[temp_pos++] = A[left++];
		else
			temp[temp_pos++] = A[mid++];
	}
	while (left <= left_end)	// copy any leftover of the left run
		temp[temp_pos++] = A[left++];
	while (mid <= right)		// copy any leftover of the right run
		temp[temp_pos++] = A[mid++];
	for (i = 0; i < size; i++, right--)	// copy the merged result back to A
		A[right] = temp[right];
}
Analysis
In merge-sort the input array is divided into two parts and these are solved recursively. After solving the subarrays,
they are merged by scanning the resultant subarrays. In merge sort, the comparisons occur during the merging
step, when two sorted arrays are combined to output a single sorted array. During the merging step, the first
available element of each array is compared and the lower value is appended to the output array. When either
array runs out of values, the remaining elements of the opposing array are appended to the output array.
How do we determine the complexity of merge-sort? We start by thinking about the three parts of divide-and-conquer
and how to account for their running times. We assume that we're sorting a total of 𝑛 elements in the entire array.
The divide step takes constant time, regardless of the subarray size. After all, the divide step just computes the
midpoint 𝑚𝑖𝑑 of the indices 𝑙𝑒𝑓𝑡 and 𝑟𝑖𝑔ℎ𝑡. Recall that in big-Θ notation, we indicate constant time by Θ(1).
The conquer step, where we recursively sort two subarrays of approximately 𝑛/2 elements each, takes some amount
of time, but we'll account for that time when we consider the subproblems. The combine step merges a total of 𝑛
elements, taking Θ(𝑛) time.
If we think about the divide and combine steps together, the Θ(1) running time for the divide step is a low-order
term when compared with the Θ(𝑛) running time of the combine step. So let's think of the divide and combine
steps together as taking Θ(𝑛) time. To make things more concrete, let's say that the divide and combine steps
together take 𝑐𝑛 time for some constant 𝑐.
Let us assume 𝑇(𝑛) is the complexity of merge-sort with 𝑛 elements. The recurrence for the merge-sort can be
defined as:
𝑇(𝑛) = 2𝑇(𝑛/2) + Θ(𝑛)
Using the Master theorem, we get 𝑇(𝑛) = Θ(𝑛𝑙𝑜𝑔𝑛).
For merge-sort there is no running time difference between best, average and worse cases as the division of input arrays
happen irrespective of the order of the elements. Above merge sort algorithm uses an auxiliary space of O(𝑛) for left and
right subarrays together. Merge-sort is a recursive algorithm and each recursive step puts another frame on the run time
stack. Sorting 32 items will take one more recursive step than 16 items, and it is in fact the size of the stack that is
referred to when the space requirement is said to be O(𝑙𝑜𝑔𝑛).
Worst case complexity Θ(𝑛𝑙𝑜𝑔𝑛)
Best case complexity Θ(𝑛𝑙𝑜𝑔𝑛)
Average case complexity Θ(𝑛𝑙𝑜𝑔𝑛)
Space complexity Θ(𝑙𝑜𝑔𝑛) for the runtime stack and O(𝑛) for the auxiliary space
Performance
Worst case performance Θ(𝑛𝑙𝑜𝑔𝑛)
Best case performance Θ(𝑛𝑙𝑜𝑔𝑛)
Average case performance Θ(𝑛𝑙𝑜𝑔𝑛)
Worst case space complexity Θ(𝑛) total, Θ(1) auxiliary space
For other details on Heapsort refer to the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 chapter.
Algorithm
The recursive algorithm consists of four steps:
1) If there are one or no elements in the list to be sorted, return.
2) Pick an element in the list to serve as the 𝑝𝑖𝑣𝑜𝑡 point. Usually the first element in the list is used as a 𝑝𝑖𝑣𝑜𝑡.
3) Split the list into two parts - one with elements larger than the 𝑝𝑖𝑣𝑜𝑡 and the other with elements smaller than
the 𝑝𝑖𝑣𝑜𝑡.
4) Recursively repeat the algorithm for both halves of the original list.
In the above algorithm, the important step is partitioning the list into two sublists. The basic steps to partition a
list are:
1. Select the first element as a 𝑝𝑖𝑣𝑜𝑡 in the list.
2. Start a pointer (the 𝑙𝑒𝑓𝑡 pointer) at the second item in the list.
3. Start a pointer (the 𝑟𝑖𝑔ℎ𝑡 pointer) at the last item in the list.
4. While the value at the 𝑙𝑒𝑓𝑡 pointer in the list is lesser than the 𝑝𝑖𝑣𝑜𝑡 value, move the 𝑙𝑒𝑓𝑡 pointer to the
right (add 1). Continue this process until the value at the 𝑙𝑒𝑓𝑡 pointer is greater than or equal to the 𝑝𝑖𝑣𝑜𝑡
value.
5. While the value at the 𝑟𝑖𝑔ℎ𝑡 pointer in the list is greater than the 𝑝𝑖𝑣𝑜𝑡 value, move the 𝑟𝑖𝑔ℎ𝑡 pointer to the
left (subtract 1). Continue this process until the value at the 𝑟𝑖𝑔ℎ𝑡 pointer is lesser than or equal to the
𝑝𝑖𝑣𝑜𝑡 value.
6. If the 𝑙𝑒𝑓𝑡 pointer position is still to the left of the 𝑟𝑖𝑔ℎ𝑡 pointer position, swap the values at these two
locations in the list.
7. If the 𝑙𝑒𝑓𝑡 and 𝑟𝑖𝑔ℎ𝑡 pointers have not crossed, go to step 4. Once they cross, swap the 𝑝𝑖𝑣𝑜𝑡 with the value
at the 𝑟𝑖𝑔ℎ𝑡 pointer; that position is the pivot's final split point.
Example
Following example shows that 50 will serve as our first pivot value. The partition process will happen next. It will find the 𝑝𝑎𝑟𝑡𝑖𝑡𝑖𝑜𝑛 point
and at the same time move other items to the appropriate side of the list, either lesser than or greater than the 𝑝𝑖𝑣𝑜𝑡 value.
50 25 92 16 76 30 43 54 19
pivot
Partitioning begins by locating two position markers, let's call them 𝑙𝑒𝑓𝑡 and 𝑟𝑖𝑔ℎ𝑡, at the beginning and end of the remaining items in the
list (positions 1 and 8 in the figure). The goal of the partition process is to move items that are on the wrong side with respect to the pivot value
while also converging on the split point. The process below shows how we locate the final position of 50.
We advance 𝑙𝑒𝑓𝑡 past values smaller than the pivot, retreat 𝑟𝑖𝑔ℎ𝑡 past values larger than the pivot, and swap each out-of-place pair
(92 with 19, then 76 with 43):
50 25 19 16 43 30 76 54 92
Now 𝑙𝑒𝑓𝑡 stops at 76 (76 > 50) and 𝑟𝑖𝑔ℎ𝑡 stops at 30 (30 < 50); the markers have crossed, so the split point is found. Finally, the pivot is
swapped with the value at the 𝑟𝑖𝑔ℎ𝑡 marker, placing 50 at its split point:
30 25 19 16 43 50 76 54 92
Implementation
void quickSort( int A[], int low, int high ) {
int pivot;
/* Termination condition! */
if( high > low ) {
pivot = partition( A, low, high );
quickSort ( A, low, pivot-1 );
quickSort ( A, pivot+1, high );
}
}
int partition( int A[], int low, int high ) {
	int left, right, pivot_item = A[low];
	left = low;
	right = high;
	while ( left < right ) {
		/* Move left while item <= pivot (stop at the array boundary) */
		while( left <= high && A[left] <= pivot_item )
			left++;
		/* Move right while item > pivot */
		while( A[right] > pivot_item )
			right--;
if( left < right )
swap(A,left,right);
}
/* right is final position for the pivot */
A[low] = A[right];
A[right] = pivot_item;
return right;
}
Analysis
Let us assume that 𝑇(𝑛) is the complexity of Quick sort and that all elements are distinct. The recurrence for 𝑇(𝑛) depends on two
subproblem sizes, which depend on the partition element. If the pivot is the 𝑖-th smallest element, then exactly (𝑖 − 1) items will be in the left
part and (𝑛 − 𝑖) in the right part. Let us call this an 𝑖-split. Since each element has an equal probability of being selected as the pivot, the
probability of selecting the 𝑖-th element is 1/𝑛.
Best Case: Each partition splits the array in halves and gives
𝑇(𝑛) = 2𝑇(𝑛/2) + Θ(𝑛) = Θ(𝑛𝑙𝑜𝑔𝑛) [using the 𝐷𝑖𝑣𝑖𝑑𝑒 𝑎𝑛𝑑 𝐶𝑜𝑛𝑞𝑢𝑒𝑟 master theorem]
Worst Case: Each partition gives unbalanced splits and we get
𝑇(𝑛) = 𝑇(𝑛 − 1) + Θ(𝑛) = Θ(𝑛2) [using the 𝑆𝑢𝑏𝑡𝑟𝑎𝑐𝑡𝑖𝑜𝑛 𝑎𝑛𝑑 𝐶𝑜𝑛𝑞𝑢𝑒𝑟 master theorem]
The worst case occurs when the list is already sorted and the first (or last) element is chosen as the pivot.
Average Case: In the average case of Quick sort, we do not know where the split happens. For this reason, we take all possible split
locations, add all their complexities and divide by 𝑛 to get the average case complexity.
𝑇(𝑛) = (1/𝑛) ∑_{𝑖=1}^{𝑛} (𝑟𝑢𝑛𝑡𝑖𝑚𝑒 𝑤𝑖𝑡ℎ 𝑖-𝑠𝑝𝑙𝑖𝑡) + 𝑛 + 1
     = (1/𝑛) ∑_{𝑖=1}^{𝑛} (𝑇(𝑖 − 1) + 𝑇(𝑛 − 𝑖)) + 𝑛 + 1
//the two sums ∑𝑇(𝑖 − 1) and ∑𝑇(𝑛 − 𝑖) are identical by symmetry
     = (2/𝑛) ∑_{𝑖=1}^{𝑛} 𝑇(𝑖 − 1) + 𝑛 + 1
     = (2/𝑛) ∑_{𝑖=0}^{𝑛−1} 𝑇(𝑖) + 𝑛 + 1
Multiply both sides by 𝑛:
𝑛𝑇(𝑛) = 2 ∑_{𝑖=0}^{𝑛−1} 𝑇(𝑖) + 𝑛2 + 𝑛
Writing the same formula for 𝑛 − 1:
(𝑛 − 1)𝑇(𝑛 − 1) = 2 ∑_{𝑖=0}^{𝑛−2} 𝑇(𝑖) + (𝑛 − 1)2 + (𝑛 − 1)
Performance
Worst case complexity O(𝑛2 )
Best case complexity O(𝑛𝑙𝑜𝑔𝑛)
Average case complexity O(𝑛𝑙𝑜𝑔𝑛)
Space complexity O(𝑙𝑜𝑔𝑛) auxiliary on average (recursion stack); O(𝑛) in the worst case
Performance
The average number of comparisons for this method is O(𝑛𝑙𝑜𝑔𝑛). But in the worst case, the number of comparisons
grows to O(𝑛2), a case which arises when the sort tree is a skew tree.
K-Way Mergesort
Complexity of the 2-way External Merge sort: In each pass we read and write each page in the file. Let us assume that
there are 𝑛 pages in the file; then we need ⌈𝑙𝑜𝑔𝑛⌉ + 1 passes, and the total cost is 2𝑛(⌈𝑙𝑜𝑔𝑛⌉ + 1) page reads and writes.
Problem-3 Given an array 𝐴 of 𝑛 votes, each vote being the ID of the chosen candidate, determine which candidate wins the election.
Solution: This problem is nothing but finding the element which is repeated the maximum number of times. The
solution is similar to the Problem-1 solution: keep track of a counter.
int CheckWhoWinsTheElection(int A[], int n) {
int i, j, counter = 0, maxCounter = 0, candidate;
candidate = A[0];
for (i = 0; i < n; i++) {
candidate = A[i];
counter = 0;
for (j = i + 1; j < n; j++) {
if(A[i]==A[j]) counter++;
}
if(counter > maxCounter) {
maxCounter = counter;
candidate = A[i];
}
}
return candidate;
}
Time Complexity: O(𝑛2 ). Space Complexity: O(1).
Note: For variations of this problem, refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-4 Can we improve the time complexity of Problem-3? Assume we don’t have any extra space.
Solution: Yes. The approach is to sort the votes based on candidate ID, then scan the sorted array and count up
which candidate so far has the most votes. We only have to remember the winner, so we don’t need a clever data
structure. We can use Heapsort as it is an in-place sorting algorithm.
int CheckWhoWinsTheElection(int A[], int n) {
	int i, currentCounter = 1, maxCounter = 1;
	int currentCandidate, maxCandidate;
	currentCandidate = maxCandidate = A[0];
	//for the heap sort algorithm refer to the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 chapter
	Heapsort( A, n );
	for (i = 1; i < n; i++) {
		if( A[i] == currentCandidate )
			currentCounter++;
		else {
			currentCandidate = A[i];
			currentCounter = 1;
		}
		if( currentCounter > maxCounter ) {
			maxCandidate = currentCandidate;
			maxCounter = currentCounter;
		}
	}
	return maxCandidate;
}
Since Heapsort time complexity is O(𝑛𝑙𝑜𝑔𝑛) and in-place, so it only uses an additional O(1) of storage in addition
to the input array. The scan of the sorted array does a constant-time conditional 𝑛 − 1 times, thus using O(𝑛)
time. The overall time bound is O(𝑛𝑙𝑜𝑔𝑛).
Problem-5 Can we further improve the time complexity of Problem-3?
Solution: In the given problem, number of candidates is less but the number of votes is significantly large. For
this problem we can use counting sort.
Time Complexity: O(𝑛), 𝑛 is the number of votes (elements) in array.
Space Complexity: O(𝑘), 𝑘 is the number of candidates participated in election.
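A minimal C sketch of this counting approach; it assumes candidate IDs are integers in the range [0, 𝑘), which is an assumption for illustration:
#include <stdlib.h>
int whoWinsByCounting(int A[], int n, int k) {
	int *counts = (int *)calloc(k, sizeof(int));
	int i, winner = 0;
	for (i = 0; i < n; i++)		/* O(n): tally every vote */
		counts[A[i]]++;
	for (i = 1; i < k; i++)		/* O(k): pick the largest tally */
		if (counts[i] > counts[winner])
			winner = i;
	free(counts);
	return winner;
}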
Problem-6 Consider the voting problem from the previous exercise, but now suppose that we know the
number k < n of candidates running. Describe an O(𝑛𝑙𝑜𝑔𝑘)-time algorithm for determining who wins the
election.
Solution: In this case, the candidates can be stored in a balanced binary tree (for example, an AVL Tree). Each
node should store a candidate ID and the number of votes they have received so far. As each vote in the sequence
is processed, search the tree for the candidate ID of the chosen candidate (which takes O(𝑙𝑜𝑔𝑘) time). If the ID is
found, add one to its number of votes. Otherwise, create a new node with this ID and with one vote. At the end,
go through all the nodes in the tree to find the one with the most votes. The process could be sped up even further
in the average case (though not in the worst case) by replacing the AVL Tree with a Hash Table.
Problem-7 Given an array 𝐴 of 𝑛 elements, each of which is an integer in the range [1, 𝑛2 ], how do we sort the
array in O(𝑛) time?
Solution: If we subtract 1 from each number, we get the range [0, 𝑛2 – 1]. Consider all numbers as 2-digit numbers in
base 𝑛, where each digit ranges from 0 to 𝑛 − 1. Sort these using radix sort; this uses only two calls to counting sort.
Finally, add 1 back to all the numbers. Since there are 2 calls, the complexity is O(2𝑛) ≈ O(𝑛).
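A C sketch of this idea. The helper below performs one stable counting-sort pass keyed on a base-𝑛 digit of A[i] − 1; two passes (divisors 1 and 𝑛) complete the radix sort. The function names are illustrative:
#include <stdlib.h>
static void countingSortByDigit(int A[], int n, int divisor) {
	int *count = (int *)calloc(n, sizeof(int));	/* one slot per digit value 0..n-1 */
	int *output = (int *)malloc(n * sizeof(int));
	int i;
	for (i = 0; i < n; i++)				/* histogram of this digit */
		count[((A[i] - 1) / divisor) % n]++;
	for (i = 1; i < n; i++)				/* prefix sums: final positions */
		count[i] += count[i - 1];
	for (i = n - 1; i >= 0; i--)			/* walk backwards for stability */
		output[--count[((A[i] - 1) / divisor) % n]] = A[i];
	for (i = 0; i < n; i++)
		A[i] = output[i];
	free(count); free(output);
}
void sortRangeNSquared(int A[], int n) {	/* values in [1, n^2] */
	countingSortByDigit(A, n, 1);		/* low digit */
	countingSortByDigit(A, n, n);		/* high digit */
}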
Problem-8 For Problem-7, what if the range is [1. . . 𝑛³]?
Solution: If we subtract 1 from each number, we get the range [0, 𝑛3 – 1]. Consider all numbers as 3-digit numbers in
base 𝑛, where each digit ranges from 0 to 𝑛 − 1. Sort these using radix sort; this uses only three calls to counting sort.
Finally, add 1 back to all the numbers. Since there are 3 calls, the complexity is O(3𝑛) ≈ O(𝑛).
Problem-9 Given an array with 𝑛 integers, each of value less than 𝑛^100, can it be sorted in linear time?
Solution: Yes. The reasoning is the same as in Problem-7 and Problem-8: treat each number as a 100-digit number in base 𝑛 and apply radix sort.
Problem-10 Let 𝐴 and 𝐵 be two arrays of 𝑛 elements each. Given a number 𝐾, give an O(𝑛𝑙𝑜𝑔𝑛) time algorithm
for determining whether there exists a ∈ A and b ∈ B such that 𝑎 + 𝑏 = 𝐾.
Solution: Since we are allowed O(𝑛𝑙𝑜𝑔𝑛) time, that is a hint that we need to sort. So, we will do that.
int find( int A[], int B[], int n, int K ) {
	int i, c;
	Heapsort( A, n ); // O(𝑛𝑙𝑜𝑔𝑛)
	for (i = 0; i < n; i++) { // O(𝑛)
		c = K - B[i]; // O(1)
		if(BinarySearch(A, n, c)) // O(𝑙𝑜𝑔𝑛); BinarySearch also takes the array size
return 1;
}
return 0;
}
Note: For variations of this problem, refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-11 Let 𝐴, 𝐵 and 𝐶 be three arrays of 𝑛 elements each. Given a number 𝐾, give an O(𝑛𝑙𝑜𝑔𝑛) time
algorithm for determining whether there exists 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵 and 𝑐 ∈ 𝐶 such that 𝑎 + 𝑏 + 𝑐 = 𝐾.
Solution: Refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-12 Given an array of 𝑛 elements, can we output in sorted order the 𝐾 elements following the median,
in time O(𝑛 + 𝐾𝑙𝑜𝑔𝐾)?
Solution: Yes. Find the median and partition the array around it; with this we get all the elements greater than it.
Now find the 𝐾-th smallest element of this upper set and partition around it; this gives the 𝐾 elements between
the median and it. Output the sorted list of this final set of 𝐾 elements. Clearly, this operation takes
O(𝑛 + 𝐾𝑙𝑜𝑔𝐾) time: two linear-time selection/partition steps plus sorting 𝐾 elements.
Problem-13 Consider the sorting algorithms: Bubble sort, Insertion sort, Selection sort, Merge sort, Heap sort,
and Quick sort. Which of these are stable?
Solution: Let us assume that 𝐴 is the array to be sorted. Also, let us say 𝑅 and 𝑆 have the same key and 𝑅 appears
earlier in the array than 𝑆. That means, 𝑅 is at 𝐴[𝑖] and 𝑆 is at 𝐴[𝑗], with 𝑖 < 𝑗. To show that an algorithm is stable,
we must show that in the sorted output 𝑅 precedes 𝑆.
Bubble sort: Yes. Elements change order only when a smaller record follows a larger. Since 𝑆 is not smaller than
𝑅 it cannot precede it.
Selection sort: No. It divides the array into sorted and unsorted portions and iteratively finds the minimum values
in the unsorted portion. After finding a minimum 𝑥, if the algorithm moves 𝑥 into the sorted portion of the array
by means of a swap, then the element swapped could be 𝑅 which then could be moved behind 𝑆. This would invert
the positions of 𝑅 and 𝑆, so in general it is not stable. If swapping is avoided, it could be made stable but the cost
in time would probably be very significant.
Insertion sort: Yes. As presented, when 𝑆 is to be inserted into sorted subarray 𝐴[1. . 𝑗 − 1], only records larger
than 𝑆 are shifted. Thus 𝑅 would not be shifted during 𝑆’𝑠 insertion and hence would always precede it.
Merge sort: Yes, In the case of records with equal keys, the record in the left subarray gets preference. Those are
the records that came first in the unsorted array. As a result, they will precede later records with the same key.
Heap sort: No. Suppose 𝑖 = 1 and 𝑅 and 𝑆 happen to be the two records with the largest keys in the input. Then
𝑅 will remain in location 1 after the array is heapified, and will be placed in location 𝑛 in the first iteration of
Heapsort. Thus 𝑆 will precede 𝑅 in the output.
Quick sort: No. The partitioning step can swap the location of records many times, and thus two records with
equal keys could swap position in the final output.
Problem-14 Consider the same sorting algorithms as that of Problem-13. Which of them are in-place?
Solution:
Bubble sort: Yes, because only two integers are required.
Insertion sort: Yes, since we need to store two integers and a record.
Selection sort: Yes. This algorithm would likely need space for two integers and one record.
Merge sort: No. Arrays need to perform the merge. (If the data is in the form of a linked list, the sorting can be
done in-place, but this is a nontrivial modification.)
Heap sort: Yes, since the heap and partially-sorted array occupy opposite ends of the input array.
Quicksort: No, since it is recursive and stores O(𝑙𝑜𝑔𝑛) activation records on the stack. Modifying it to be non-
recursive is feasible but nontrivial.
Problem-15 Among Quick sort, Insertion sort, Selection sort, and Heap sort algorithms, which one needs the
minimum number of swaps?
Solution: Selection sort – it needs at most 𝑛 − 1 swaps (refer to the theory section).
Problem-16 What is the minimum number of comparisons required to determine if an integer appears more
than 𝑛/2 times in a sorted array of 𝑛 integers?
Solution: Refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-17 Sort an array of 0’s, 1’s and 2’s: Given an array A[] consisting of 0’𝑠, 1’𝑠 and 2’𝑠, give an
algorithm for sorting 𝐴[]. The algorithm should put all 0’𝑠 first, then all 1’𝑠 and all 2’𝑠 last.
Example: Input = {0,1,1,0,1,2,1,2,0,0,0,1}, Output = {0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2}
Solution: Use Counting sort. Since there are only three distinct values and the maximum value is 2, we need a
temporary count array with 3 elements.
Time Complexity: O(𝑛). Space Complexity: O(1).
Note: For variations of this problem, refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
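For concreteness, a minimal C sketch of the counting approach:
void sort012(int A[], int n) {
	int counts[3] = {0, 0, 0};
	int i, v, k = 0;
	for (i = 0; i < n; i++)		/* count each of the three values */
		counts[A[i]]++;
	for (v = 0; v < 3; v++)		/* rewrite the array: all 0s, then 1s, then 2s */
		for (i = 0; i < counts[v]; i++)
			A[k++] = v;
}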
Problem-18 Is there any other way of solving Problem-17?
Solution: Using Quick sort. Since we know that there are only 3 distinct values, 0, 1 and 2, in the array, we can select 1
as a pivot element for Quick sort. Quick sort finds the correct place for 1 by moving all 0’s to the left of 1 and all
2’s to the right of 1. For doing this it uses only one scan.
Time Complexity: O(𝑛). Space Complexity: O(1).
Note: For efficient algorithm, refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-19 How do we find the number that appeared the maximum number of times in an array?
Solution: One simple approach is to sort the given array and scan the sorted array. While scanning, keep track
of the elements that occur the maximum number of times.
Algorithm:
int mostFrequent(int A[], int n) {
// Sort the array (std::sort from <algorithm>; this solution is in C++)
sort(A, A + n);
// find the max frequency using linear traversal
int maxCounter = 1, res = A[0], currentCounter = 1;
for (int i = 1; i < n; i++) {
if (A[i] == A[i - 1])
currentCounter++;
else {
if (currentCounter > maxCounter) {
maxCounter = currentCounter;
res = A[i - 1];
}
currentCounter = 1;
}
}
// If last element is most frequent
if (currentCounter > maxCounter) {
maxCounter = currentCounter;
res = A[n - 1];
}
return res;
}
Time Complexity = Time for Sorting + Time for Scan = O(𝑛𝑙𝑜𝑔𝑛) +O(𝑛) = O(𝑛𝑙𝑜𝑔𝑛). Space Complexity: O(1).
Note: For variations of this problem, refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-20 Is there any other way of solving Problem-19?
Solution: Using Binary Tree. Create a binary tree with an extra field 𝑐𝑜𝑢𝑛𝑡 which indicates the number of times
an element appeared in the input. Let us say we have created a Binary Search Tree [BST]. Now, do the In-Order
traversal of the tree. The In-Order traversal of BST produces the sorted list. While doing the In-Order traversal
keep track of the maximum element.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛) on average for constructing the BST plus O(𝑛) for the In-Order traversal, i.e., O(𝑛𝑙𝑜𝑔𝑛)
overall. Space Complexity: O(2𝑛) ≈ O(𝑛), since every node in the BST needs two extra pointers.
Problem-21 Is there yet another way of solving Problem-19?
Solution: Using Hash Table. For each element of the given array we use a counter, and for each occurrence of
the element we increment the corresponding counter. At the end we can just return the element which has the
maximum counter.
Time Complexity: O(𝑛). Space Complexity: O(𝑛). For constructing the hash table we need O(𝑛).
Note: For the efficient algorithm, refer to the 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-22 Given a 2 GB file with one string per line, which sorting algorithm would we use to sort the file
and why?
Solution: When we have a size limit of 2GB, it means that we cannot bring all the data into the main memory.
Algorithm: How much memory do we have available? Let’s assume we have 𝑋 MB of memory available. Divide
the file into 𝐾 chunks, where 𝑋 ∗ 𝐾 ~2 𝐺𝐵.
• Bring each chunk into memory and sort the lines as usual (any O(𝑛𝑙𝑜𝑔𝑛) algorithm).
• Save the lines back to the file.
• Now bring the next chunk into memory and sort.
• Once we’re done, merge them one by one; in the case of one set finishing, bring more data from the
particular chunk.
The above algorithm is also known as external sort. Steps 3 − 4 are known as K-way merge. The idea behind going
for an external sort is the size of data. Since the data is huge and we can’t bring it to the memory, we need to go
for a disk-based sorting algorithm.
Problem-23 Nearly sorted: Given an array of 𝑛 elements, each which is at most 𝐾 positions from its target
position, devise an algorithm that sorts in O(𝑛 𝑙𝑜𝑔𝐾) time.
Solution: Divide the elements into 𝑛/𝐾 groups of size 𝐾, and sort each piece in O(𝐾𝑙𝑜𝑔𝐾) time, let’s say using
Mergesort. This preserves the property that no element is more than 𝐾 elements out of position. Now, merge each
block of 𝐾 elements with the block to its left.
Problem-24 Is there any other way of solving Problem-23?
Solution: Insert the first 𝐾 elements into a binary heap. Insert the next element from the array into the heap, and
delete the minimum element from the heap. Repeat.
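A C sketch of this heap approach, using a small hand-rolled min-heap rather than the book's Priority Queue API (the helper names are illustrative). The heap never holds more than 𝐾 + 1 elements, so each operation costs O(𝑙𝑜𝑔𝐾) and the whole sort is O(𝑛𝑙𝑜𝑔𝐾):
#include <stdlib.h>
static void siftDown(int H[], int size, int i) {
	for (;;) {
		int smallest = i, l = 2*i + 1, r = 2*i + 2;
		if (l < size && H[l] < H[smallest]) smallest = l;
		if (r < size && H[r] < H[smallest]) smallest = r;
		if (smallest == i) return;
		int t = H[i]; H[i] = H[smallest]; H[smallest] = t;
		i = smallest;
	}
}
void sortNearlySorted(int A[], int n, int K) {
	int size = (K + 1 < n) ? K + 1 : n;
	int *H = (int *)malloc(size * sizeof(int));
	int i, out = 0;
	for (i = 0; i < size; i++)		/* heap holds the first K+1 items */
		H[i] = A[i];
	for (i = size / 2 - 1; i >= 0; i--)	/* build the min-heap: O(K) */
		siftDown(H, size, i);
	for (i = size; i < n; i++) {		/* pop the min, push the next element */
		A[out++] = H[0];
		H[0] = A[i];
		siftDown(H, size, 0);
	}
	while (size > 0) {			/* drain the remaining heap */
		A[out++] = H[0];
		H[0] = H[--size];
		siftDown(H, size, 0);
	}
	free(H);
}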
Problem-25 Merging K sorted lists: Given 𝐾 sorted lists with a total of 𝑛 elements, give an O(𝑛𝑙𝑜𝑔𝐾) algorithm
to produce a sorted list of all 𝑛 elements.
Solution: Simple algorithm for merging 𝐾 sorted lists: consider the lists as 𝐾 groups, each having 𝑛/𝐾 elements. Take the first list
and merge it with the second list using a linear-time algorithm for merging two sorted lists, such as the merging
algorithm used in merge sort. Then, merge the resulting list of 2𝑛/𝐾 elements with the third list, and then merge the
resulting list of 3𝑛/𝐾 elements with the fourth list. Repeat this until we end up with a single sorted list of all 𝑛
elements.
Time Complexity: In iteration 𝑖 we merge a list of 𝑖𝑛/𝐾 elements with a list of 𝑛/𝐾 elements.
𝑇(𝑛) = 2𝑛/𝐾 + 3𝑛/𝐾 + 4𝑛/𝐾 + ⋯ + 𝐾𝑛/𝐾 = (𝑛/𝐾) ∑_{𝑖=2}^{𝐾} 𝑖 ≈ (𝑛/𝐾) · 𝐾(𝐾 + 1)/2 ≈ O(𝑛𝐾)
Problem-35 Nuts and Bolts Problem: Given a set of 𝑛 nuts of different sizes and 𝑛 bolts such that there is a
one-to-one correspondence between the nuts and the bolts, find for each nut its corresponding bolt. Assume
that we can only compare nuts to bolts: we cannot compare nuts to nuts and bolts to bolts.
Alternative way of framing the question: We are given a box which contains bolts and nuts. Assume there
are 𝑛 nuts and 𝑛 bolts and that each nut matches exactly one bolt (and vice versa). By trying to match a bolt
and a nut we can see which one is bigger, but we cannot compare two bolts or two nuts directly. Design an
efficient algorithm for matching the nuts and bolts.
Solution: Brute Force Approach: Start with the first bolt and compare it with each nut until we find a match. In
the worst case, we require 𝑛 comparisons. Repeating this for all successive bolts on the remaining nuts gives O(𝑛2) complexity.
Problem-36 For Problem-35, can we improve the complexity?
Solution: In Problem-35, we got O(𝑛2 ) complexity in the worst case (if bolts are in ascending order and nuts are
in descending order). Its analysis is the same as that of Quick Sort. The improvement is also along the same lines.
To reduce the worst case complexity, instead of selecting the first bolt every time, we can select a random bolt and
match it with nuts. This randomized selection reduces the probability of getting the worst case, but still the worst
case is O(𝑛2 ).
Problem-37 For Problem-35, can we further improve the complexity?
Solution: We can use a divide-and-conquer technique for solving this problem and the solution is very similar to
randomized Quick Sort. For simplicity let us assume that bolts and nuts are represented in two arrays 𝐵 and 𝑁.
The algorithm first performs a partition operation as follows: pick a random bolt 𝐵[𝑖]. Using this bolt, rearrange
the array of nuts into three groups of elements:
• First the nuts smaller than 𝐵[𝑖]
• Then the nut that matches 𝐵[𝑖], and
• Finally, the nuts larger than 𝐵[𝑖].
Next, using the nut that matches 𝐵[𝑖], perform a similar partition on the array of bolts. This pair of partitioning
operations can easily be implemented in O(𝑛) time, and it leaves the bolts and nuts nicely partitioned so that the
“𝑝𝑖𝑣𝑜𝑡" bolt and nut are aligned with each other and all other bolts and nuts are on the correct side of these pivots
– smaller nuts and bolts precede the pivots, and larger nuts and bolts follow the pivots. Our algorithm then
completes by recursively applying itself to the subarray to the left and right of the pivot position to match these
remaining bolts and nuts. We can assume by induction on 𝑛 that these recursive calls will properly match the
remaining bolts.
To analyze the running time of our algorithm, we can use the same analysis as that of randomized Quick Sort.
Therefore, applying the analysis from Quick Sort, the time complexity of our algorithm is O(𝑛𝑙𝑜𝑔𝑛).
Alternative Analysis: We can solve this problem by making a small change to Quick Sort. Let us assume that we
pick the last element as the pivot, say it is a nut. Compare the nut with only bolts as we walk down the array.
This will partition the array for the bolts. Every bolt less than the partition nut will be on the left. And every bolt
greater than the partition nut will be on the right.
While traversing down the list, find the matching bolt for the partition nut. Now we do the partition again using
the matching bolt. As a result, all the nuts less than the matching bolt will be on the left side and all the nuts
greater than the matching bolt will be on the right side. Recursively call on the left and right arrays.
The time complexity is O(2nlogn) ≈O(nlogn).
Problem-38 Given a binary tree, can we print its elements in sorted order in O(𝑛) time by performing an In-
order tree traversal?
Solution: Yes, if the tree is a Binary Search Tree [BST]. For more details refer to the 𝑇𝑟𝑒𝑒𝑠 chapter.
Problem-39 Given an array of elements, convert it into an array such that A < B > C < D > E < F and so on.
Solution: Sort the array; then, in a single pass, swap each adjacent pair A[i] and A[i+1] for i = 1, 3, 5, … (0-based) to get the final result.
#include <algorithm>
#include <cstdio>
using namespace std;
void convertArraytoSawToothWave(){
	int A[] = {0,-6,9,13,10,-1,8,12,54,14,-5};
	int n = sizeof(A)/sizeof(A[0]), i, temp;
	sort(A, A+n);
	for(i = 1; i < n; i += 2){	// swap each odd-indexed element with its right neighbor
		if(i+1 < n){
			temp = A[i]; A[i] = A[i+1]; A[i+1] = temp;
		}
	}
	for(i = 0; i < n; i++){
		printf("%d ", A[i]);
	}
}
The time complexity is O(nlogn + n) ≈ O(nlogn), for sorting and a scan.
Problem-40 Can we do Problem-39 with O(n) time?
Solution: Make sure all even positioned elements (1-based, i.e., B, D, F, ...) are greater than their adjacent odd positioned elements;
then we don't need to worry about the odd positioned elements. Traverse these elements of the input array and do the following:
• If the current element is smaller than the previous element, swap the previous and current elements.
• If the current element is smaller than the next element, swap the next and current elements.
#include <cstdio>
void convertArraytoSawToothWave(){
	int A[] = {0,-6,9,13,10,-1,8,12,54,14,-5};
	int n = sizeof(A)/sizeof(A[0]), i, temp;
	// No sorting here: a single O(n) pass makes every odd-indexed element a local peak
	for(i = 1; i < n; i += 2){
		if (A[i-1] > A[i]){
			temp = A[i]; A[i] = A[i-1]; A[i-1] = temp;
		}
		if (i < n-1 && A[i] < A[i+1]){
			temp = A[i]; A[i] = A[i+1]; A[i+1] = temp;
		}
	}
	for(i = 0; i < n; i++){
		printf("%d ", A[i]);
	}
}
The time complexity is O(n).
Problem-41 Merge sort uses
(a) Divide and conquer strategy (b) Backtracking approach (c) Heuristic search (d) Greedy approach
Solution: (a). Refer theory section.
Problem-42 Which of the following algorithm design techniques is used in the quicksort algorithm?
(a) Dynamic programming (b) Backtracking (c) Divide and conquer (d) Greedy method
Solution: (c). Refer theory section.
Problem-43 Sort the linked list elements in O(𝑛), where 𝑛 is the number of elements in the linked list.
Solution: As stated many times, the lower bound on comparison based sorting for general data is going to be
O(𝑛𝑙𝑜𝑔𝑛). So, for general data on a linked list, the best possible sort that will work on any data that can compare
two objects is going to be O(𝑛𝑙𝑜𝑔𝑛). However, if you have a more limited domain of things to work in, you can
improve the time it takes (at least proportional to n). For instance, if you are working with integers no larger than
some value, you could use Counting Sort or Radix Sort, as these use the specific objects you're sorting to reduce
the complexity with proportion to 𝑛. Be careful, though, these add some other things to the complexity that you
may not consider (for instance, Counting Sort and Radix sort both add in factors that are based on the size of the
numbers you're sorting, O(𝑛 + 𝑘) where 𝑘 is the size of largest number for Counting Sort, for instance).
Also, if you happen to have objects with a perfect hash (or at least a hash that maps all values differently),
you can run a counting or radix sort on their hash values. The algorithm below is an application of counting sort.
Algorithm:
1) In the given linked list, find the maximum element (𝑘). This would take O(𝑛) time.
2) Create an array H of size 𝑘 + 1, initialized with zeros. This would need O(𝑘) space.
3) Scan through the linked list and, for each element, set the corresponding index in the array to 1. Suppose an element in the linked list is 19; then set H[19] = 1. This would take O(𝑛) time.
4) Now the array represents the sorted order, as in counting sort.
5) Read the elements from the array and add them back to the list, with a time complexity of O(𝑛 + 𝑘).
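A rough sketch of these steps is given below, assuming non-negative integer elements; the node type and function name are ours, and counts are kept instead of a plain 0/1 flag so that duplicate values are also handled:
#include <stdlib.h>
struct ListNode { int data; struct ListNode *next; };
void countingSortList(struct ListNode *head) {
	int k = 0;
	for (struct ListNode *cur = head; cur; cur = cur->next)  // step 1: find the maximum, O(n)
		if (cur->data > k) k = cur->data;
	int *H = (int *)calloc(k + 1, sizeof(int));              // step 2: O(k) space
	for (struct ListNode *cur = head; cur; cur = cur->next)  // step 3: mark occurrences, O(n)
		H[cur->data]++;
	struct ListNode *cur = head;                             // step 5: write values back in sorted order
	for (int i = 0; i <= k; i++)
		while (H[i]-- > 0) { cur->data = i; cur = cur->next; }
	free(H);
}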
Problem-44 For merging two sorted lists of sizes m and n into a sorted list of size m+n, the number of comparisons required is
(a) O(𝑚) (b) O(𝑛) (c) O(𝑚 + 𝑛) (d) O(𝑙𝑜𝑔𝑚 + 𝑙𝑜𝑔𝑛)
Solution: (c). We can use merge sort logic. Refer theory section.
Problem-45 Quick-sort is run on two inputs shown below to sort in ascending order
(i) 1,2,3 ….n (ii) n, n – 1, n – 2, …. 2, 1
Let C1 and C2 be the number of comparisons made for the inputs (i) and (ii) respectively. Then,
(a) C1 < C2 (b) C1 > C2 (c) C1 = C2 (d) we cannot say anything for arbitrary 𝑛.
Solution: (b). Since the problem needs the output in ascending order and the first element is taken as pivot, Quicksort on an already-sorted (ascending) input gives the worst case (O(𝑛²)). So, input (i) generates the worst case and input (ii) needs fewer comparisons.
Problem-46 Give the correct matching for the following pairs:
(A) O(𝑙𝑜𝑔𝑛) (P) Selection
(B) O(𝑛) (Q) Insertion sort
(C) O(𝑛𝑙𝑜𝑔𝑛) (R) Binary search
(D) O(𝑛2 ) (S) Merge sort
(a) A–R, B–P, C–Q, D–S (b) A–R, B–P, C–S, D–Q (c) A–P, B–R, C–S, D–Q (d) A–P, B–S, C–R, D–Q
Solution: (b). Refer theory section.
Problem-47 Let 𝑠 be a sorted array of 𝑛 integers. Let 𝑡(𝑛) denote the time taken for the most efficient algorithm to determine if there are two elements with sum less than 1000 in 𝑠. Which of the following statements is true?
(a) 𝑡(𝑛) is O(1) (b) 𝑛 < 𝑡(𝑛) < 𝑛𝑙𝑜𝑔₂𝑛 (c) 𝑛𝑙𝑜𝑔₂𝑛 < 𝑡(𝑛) < 𝑛² (d) 𝑡(𝑛) = 𝑛²
Solution: (a). Since the given array is already sorted it is enough if we check the first two elements of the array.
Problem-48 The usual Θ(𝑛²) implementation of Insertion Sort to sort an array uses linear search to identify the position where an element is to be inserted into the already sorted part of the array. If, instead, we use binary search to identify the position, the worst case running time will
(a) remain Θ(𝑛²) (b) become Θ(𝑛(𝑙𝑜𝑔𝑛)²) (c) become Θ(𝑛𝑙𝑜𝑔𝑛) (d) become Θ(𝑛)
Solution: (a). If we use binary search then there will be 𝑙𝑜𝑔₂(𝑛!) comparisons in the worst case, which is Θ(𝑛𝑙𝑜𝑔𝑛). But the algorithm as a whole still has a Θ(𝑛²) worst case running time, because of the series of swaps required for each insertion.
Problem-49 In quick sort, for sorting 𝑛 elements, the (𝑛/4)th smallest element is selected as pivot using an O(𝑛) time algorithm. What is the worst case time complexity of the quick sort?
(A) Θ(𝑛) (B) Θ(𝑛𝑙𝑜𝑔𝑛) (C) Θ(𝑛²) (D) Θ(𝑛²𝑙𝑜𝑔𝑛)
Solution: (B). The recurrence becomes: T(𝑛) = T(𝑛/4) + T(3𝑛/4) + 𝑐𝑛. Solving the recurrence using a variant of the master theorem, we get Θ(𝑛𝑙𝑜𝑔𝑛).
Problem-50 Consider the Quicksort algorithm. Suppose there is a procedure for finding a pivot element which
splits the list into two sub-lists each of which contains at least one-fifth of the elements. Let T(𝑛) be the number
of comparisons required to sort n elements. Then
A) T (n) ≤ 2T (n /5) + n B) T (n) ≤ T (n /5) + T (4n /5) + n C) T (n) ≤ 2T (4n /5) + n D) T (n) ≤ 2T (n /2) + n
Solution: (C). If the pivot splits the list so that one subset has exactly 𝑛/5 elements, T(𝑛/5) comparisons are needed for that subset, T(4𝑛/5) for the remaining 4𝑛/5 elements, and 𝑛 for finding the pivot. If one set has more than 𝑛/5 elements, then the other set has fewer than 4𝑛/5 elements and the cost is less than T(𝑛/5) + T(4𝑛/5) + 𝑛. Since each sub-list contains at most 4𝑛/5 elements, every case is bounded by 2T(4𝑛/5) + 𝑛.
Problem-51 Which of the following sorting algorithms has the lowest worst-case complexity?
(A) Merge sort (B) Bubble sort (C) Quick sort (D) Selection sort
Solution: (A). Refer theory section.
Problem-52 Which one of the following in place sorting algorithms needs the minimum number of swaps?
(A) Quick sort (B) Insertion sort (C) Selection sort (D) Heap sort
Solution: (C). Refer theory section.
Problem-53 You have an array of n elements. Suppose you implement quicksort by always choosing the
central element of the array as the pivot. Then the tightest upper bound for the worst case performance is
(A) O(𝑛²) (B) O(𝑛𝑙𝑜𝑔𝑛) (C) Θ(𝑛𝑙𝑜𝑔𝑛) (D) O(𝑛³)
Solution: (A). Even when the central element is chosen as the pivot, an adversarial input can make one partition nearly empty at every step (for instance, by always placing an extreme value at the centre), so the worst case remains O(𝑛²).
Problem-54 Let P be a QuickSort Program to sort numbers in ascending order using the first element as pivot.
Let t1 and t2 be the number of comparisons made by P for the inputs {1, 2, 3, 4, 5} and {4, 1, 5, 3, 2} respectively.
Which one of the following holds?
(A) t1 = 5 (B) t1 < t2 (C) t1 > t2 (D) t1 = t2
Solution: (C). Quick Sort’s worst case occurs when the first (or last) element is chosen as pivot and the input is already sorted, as {1, 2, 3, 4, 5} is.
Problem-55 The minimum number of comparisons required to find the minimum and the maximum of 100
numbers is __
Solution: 148 (the minimum number of comparisons required for 𝑛 numbers is ⌈3𝑛/2⌉ − 2: compare the numbers in pairs, then compare the smaller of each pair with the running minimum and the larger with the running maximum).
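A sketch of the pairwise technique behind this bound is shown below (𝑛 is assumed even and at least 2; the function name is illustrative):
void minMax(int A[], int n, int *mn, int *mx) {
	if (A[0] < A[1]) { *mn = A[0]; *mx = A[1]; }   // 1 comparison for the first pair
	else { *mn = A[1]; *mx = A[0]; }
	for (int i = 2; i + 1 < n; i += 2) {
		int lo = A[i], hi = A[i + 1];
		if (lo > hi) { lo = A[i + 1]; hi = A[i]; } // 1 comparison within the pair
		if (lo < *mn) *mn = lo;                    // 1 comparison against the minimum
		if (hi > *mx) *mx = hi;                    // 1 comparison against the maximum
	}
}
Each of the 𝑛/2 − 1 later pairs costs 3 comparisons, plus 1 for the first pair: 3(𝑛/2 − 1) + 1 = 3𝑛/2 − 2, which is 148 for 𝑛 = 100.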
Problem-56 The number of elements that can be sorted in Θ(𝑙𝑜𝑔𝑛) time using heap sort is
(A) Θ(1) (B) Θ(√𝑙𝑜𝑔𝑛) (C) Θ(𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛) (D) Θ(𝑙𝑜𝑔𝑛)
Solution: (C). Sorting an array with 𝑘 elements takes time Θ(𝑘𝑙𝑜𝑔𝑘) as 𝑘 grows. We want to choose 𝑘 such that Θ(𝑘𝑙𝑜𝑔𝑘) = Θ(𝑙𝑜𝑔𝑛). Choosing 𝑘 = Θ(𝑙𝑜𝑔𝑛) doesn't work, since Θ(𝑘𝑙𝑜𝑔𝑘) = Θ(𝑙𝑜𝑔𝑛 𝑙𝑜𝑔𝑙𝑜𝑔𝑛) ≠ Θ(𝑙𝑜𝑔𝑛). On the other hand, if we choose 𝑘 = Θ(𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛), then the runtime of the sort will be
Θ((𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛) 𝑙𝑜𝑔(𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛))
= Θ((𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛)(𝑙𝑜𝑔𝑙𝑜𝑔𝑛 − 𝑙𝑜𝑔𝑙𝑜𝑔𝑙𝑜𝑔𝑛))
= Θ(𝑙𝑜𝑔𝑛 − 𝑙𝑜𝑔𝑛 ∙ 𝑙𝑜𝑔𝑙𝑜𝑔𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛)
= Θ(𝑙𝑜𝑔𝑛 (1 − 𝑙𝑜𝑔𝑙𝑜𝑔𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛))
Notice that 1 − 𝑙𝑜𝑔𝑙𝑜𝑔𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛 tends toward 1 as 𝑛 goes to infinity, so the above expression actually is Θ(𝑙𝑜𝑔𝑛), as required. Therefore, an array of size Θ(𝑙𝑜𝑔𝑛/𝑙𝑜𝑔𝑙𝑜𝑔𝑛) can be sorted with heap sort in Θ(𝑙𝑜𝑔𝑛) time, as a function of 𝑛.
Problem-57 Which one of the following is the tightest upper bound that represents the number of swaps
required to sort 𝑛 numbers using selection sort?
(A) O(𝑙𝑜𝑔𝑛) (B) O(𝑛) (C) O(𝑛𝑙𝑜𝑔𝑛) (D) O(𝑛²)
Solution: (B). Selection sort requires only O(𝑛) swaps.
Problem-58 Which one of the following is the recurrence equation for the worst case time complexity of the
Quicksort algorithm for sorting n(≥ 2) numbers? In the recurrence equations given in the options below, c is a
constant.
(A)T(n) = 2T (n/2) + cn (B) T(n) = T(n – 1) + T(0) + cn (C) T(n) = 2T (n – 2) + cn (D) T(n) = T(n/2) + cn
Solution: (B). When the pivot is the smallest (or largest) element, partitioning a block of size 𝑛 yields one empty sub-block, one element (the pivot) in its correct place, and a sub-block of size 𝑛 − 1.
Problem-59 True or False. In randomized quicksort, each key is involved in the same number of comparisons.
Solution: False.
Problem-60 True or False: If Quicksort is written so that the partition algorithm always uses the median value
of the segment as the pivot, then the worst-case performance is O(𝑛𝑙𝑜𝑔𝑛).
Solution: True.
Problem-61 Squares of a sorted array: Given an array of numbers A sorted in ascending order, return an
array of the squares of each number, also in sorted ascending order. For array = [-6, -4, 1, 2, 3, 5], the output
should be [1, 4, 9, 16, 25, 36].
Solution: Intuitive approach: One simplest approach to solve this problem is to create an array of the squares
of each element, and sort them.
#include<bits/stdc++.h>
using namespace std;
void sortedSquaredArray(int A[], int n) {
int result[n];
for (int i = 0; i < n; ++i)
result[i] = A[i] * A[i];
sort(result, result+n);
cout << "\nSorted squares array " << endl;
for (int i = 0 ; i < n ; i++)
cout << result[i] << " " ;
}
int main(){
int A[] = { -4, -3, -1, 3, 4, 5 };
int n = sizeof(A)/sizeof(A[0]);
cout << "Given sorted array " << endl;
for (int i = 0; i < n; i++)
cout << A[i] << " " ;
sortedSquaredArray(A, n);
return 0;
}
Time complexity: O(𝑛𝑙𝑜𝑔𝑛), for sorting the array. Space complexity: O(𝑛), for the result array.
Elegant approach: The given array A is sorted, but it may contain negative elements. The squares of the negative numbers are in decreasing order, and the squares of the positive numbers are in increasing order. For example, with [-4, -3, -1, 3, 4, 5], we have the negative part [-4, -3, -1] with squares [16, 9, 1], and the positive part [3, 4, 5] with squares [9, 16, 25]. Our strategy is to iterate over the negative part in reverse, and over the positive part in the forward direction.
We can use two pointers to read the positive and negative parts of the array - one pointer i in the positive direction,
and another j in the negative direction.
Now that we are reading two increasing arrays (the squares of the elements), we can merge these arrays together
using a two-pointer technique.
#include<bits/stdc++.h>
using namespace std;
void sortedSquaredArray(int A[], int n) {
int result[n];
int j = 0;
// Find the last index of the negative numbers
while (j < n && A[j] < 0)
j++;
// i points to the last index of negative numbers
int i = j-1;
int t = 0;
// j points to the first index of the positive numbers
while (i >= 0 && j < n) {
if (A[i] * A[i] < A[j] * A[j]) {
result[t++] = A[i] * A[i];
i--;
} else {
result[t++] = A[j] * A[j];
j++;
}
}
// add the remaining negative numbers squares to result
while (i >= 0) {
result[t++] = A[i] * A[i];
i--;
}
// add the remaining positive numbers squares to result
while (j < n) {
result[t++] = A[j] * A[j];
j++;
}
cout << "\nSorted squares array " << endl;
for (int i = 0 ; i < n ; i++)
cout << result[i] << " " ;
}
int main(){
int A[] = { -4, -3, -1, 3, 4, 5 };
int n = sizeof(A)/sizeof(A[0]);
cout << "Given sorted array " << endl;
for (int i = 0; i < n; i++)
cout << A[i] << " " ;
sortedSquaredArray(A, n);
return 0;
}
Time complexity: O(𝑛). Space complexity: O(𝑛), for the result array.
Problem-62 Height Checker: A school is trying to take an annual photo of all the students. The students are
asked to stand in a single file line in non-decreasing order by height. Let this ordering be represented by the
integer array expected where expected[i] is the expected height of the ith student in line. We are given an integer
array ℎ𝑒𝑖𝑔ℎ𝑡𝑠 representing the current order that the students are standing in. Each ℎ𝑒𝑖𝑔ℎ𝑡𝑠[i] is the height of
the ith student in line (0-indexed). Return the number of indices where heights[i] != expected[i].
Solution: The goal of the problem is to count the number of students in a line who are not standing in non-decreasing order of height. We could create a sorted version of the line (the "expected" order) with a general-purpose sorting algorithm and compare it to the original line, but since the heights are small integers we can use counting sort instead:
• Create a counting array of length 101 (heights are between 1 and 100) and initialize all its elements to zero.
• Loop through the heights array and use each value to increment the count of that value in the counting array.
• Replay the counts in increasing order of height; this enumerates the expected (sorted) heights one by one.
• Compare each expected height with the height standing at the same index; if they are not equal, increment a result counter.
• Return the result counter.
int heightChecker(int* heights, int heightsSize){
int expected[101] = {0}; // initialize expected array with 0s
int count = 0;
// count the number of occurrences of each height in the heights array
for (int i = 0; i < heightsSize; i++) {
expected[heights[i]]++;
}
// compare the heights in the heights array with the expected heights
// increment count for each height that is out of order
int j = 0;
for (int i = 1; i < 101; i++) {
while (expected[i]-- > 0) {
if (heights[j++] != i) {
count++;
}
}
}
return count;
}
The time complexity of the counting sort algorithm is O(n + k), where n is the length of the input array and k is
the range of the input values. In this specific problem, the range is 100 since the heights are between 1 and 100.
Therefore, the time complexity of the counting sort implementation for this problem is O(n + 100), which is
equivalent to O(n). Additionally, there is a single pass over the input array to compare the sorted array with the
original array, which takes O(n) time. Thus, the overall time complexity of the algorithm is O(n). The space
complexity of the algorithm is O(k), which is equivalent to O(1) in this problem since the range is constant.
Chapter 11
Searching
11.1 What is Searching?
In computer science, 𝑠𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 is the process of finding an item with specified properties from a collection of items.
The items may be stored as records in a database, simple data elements in arrays, text in files, nodes in trees,
vertices and edges in graphs, or they may be elements of other search spaces.
int orderedLinearSearch(int A[], int n, int data) {
	for (int i = 0; i < n; i++) {
		if(A[i] == data)
			return i;
		else if(A[i] > data)
			return -1;
	}
	return -1;
}
Time complexity of this algorithm is O(𝑛), because in the worst case we need to scan the complete array. But in the average case, the early exit on a sorted list reduces the number of comparisons even though the growth rate is the same.
Space complexity: O(1).
Note: For the above algorithm we can make further improvement by incrementing the index at a faster rate (say,
2). This will reduce the number of comparisons for searching in the sorted list.
In binary search, the middle point is computed as:
𝑚𝑖𝑑 = 𝑙𝑜𝑤 + (ℎ𝑖𝑔ℎ − 𝑙𝑜𝑤)/2, or equivalently, 𝑚𝑖𝑑 = (𝑙𝑜𝑤 + ℎ𝑖𝑔ℎ)/2
In mathematics, interpolation is a process of constructing new data points within the range of a discrete set of
known data points. In computer science, one often has a number of data points which represent the values of a
function for a limited number of values of the independent variable. It is often required to interpolate (i.e. estimate)
the value of that function for an intermediate value of the independent variable.
For example, suppose we have a table like this, which gives some values of an unknown function f. Interpolation
provides a means of estimating the function at intermediate points, such as x = 5.5.
𝑥:    1  2  3  4  5  6  7
𝑓(𝑥): 10 20 30 40 50 60 70
There are many different interpolation methods, and one of the simplest methods is linear interpolation. Consider
the above example of estimating f(5.5). Since 5.5 is midway between 5 and 6, it is reasonable to take 𝑓(5.5) midway
between 𝑓(5) = 50 and 𝑓(6) = 60, which yields 55 ((50+60)/2).
Linear interpolation takes two data points, say (𝑥1, 𝑦1) and (𝑥2, 𝑦2), and the interpolant at a point (𝑥, 𝑦) is given by:
𝑦 = 𝑦1 + (𝑦2 − 𝑦1) × (𝑥 − 𝑥1)/(𝑥2 − 𝑥1)
With the above inputs, what will happen if we don’t use the constant ½, but another, more accurate constant 𝐾 that can lead us closer to the searched item?
[Figure: an array with 𝑙𝑜𝑤 at one end, ℎ𝑖𝑔ℎ at the other, and the 𝑑𝑎𝑡𝑎 to be searched somewhere in between.]
𝐾 = (𝑑𝑎𝑡𝑎 − 𝑙𝑜𝑤)/(ℎ𝑖𝑔ℎ − 𝑙𝑜𝑤)
This algorithm tries to follow the way we search for a name in a phone book, or a word in a dictionary. We humans know in advance that if the name we’re searching for starts with an “m”, like “monk” for instance, we should start searching near the middle of the phone book. Similarly, if we’re searching for the word “career” in the dictionary, we know that it should be placed somewhere near the beginning. This is because we know the order of the letters, we know the interval (a-z), and we intuitively assume that the words are dispersed roughly evenly. These facts are enough to realize that binary search can be a bad choice here. Indeed, binary search divides the list into two equal sub-lists, which is wasteful if we know in advance that the searched item is somewhere near the beginning or the end of the list. Yes, we can also use jump search if the item is near the beginning, but not if it is at the end, in which case that algorithm is not so effective.
The interpolation search algorithm tries to improve on binary search. The question is how to find the constant 𝐾. We know the bounds of the interval, and from the figure above we can define the following formula:
𝐾 = (𝑑𝑎𝑡𝑎 − 𝑙𝑜𝑤)/(ℎ𝑖𝑔ℎ − 𝑙𝑜𝑤)
This constant 𝐾 is used to narrow down the search space. In binary search, the corresponding constant is fixed at ½ (giving 𝑚𝑖𝑑 = (𝑙𝑜𝑤 + ℎ𝑖𝑔ℎ)/2); interpolation search instead places the probe proportionally to where 𝑑𝑎𝑡𝑎 lies in the interval.
Now we can be sure that we’re closer to the searched value. On average the interpolation search makes about
𝑙𝑜𝑔(𝑙𝑜𝑔𝑛) comparisons (if the elements are uniformly distributed), where 𝑛 is the number of elements to be
searched. In the worst case (for instance where the numerical values of the keys increase exponentially) it can
make up to O(𝑛) comparisons. In interpolation-sequential search, interpolation is used to find an item near the
one being searched for, then linear search is used to find the exact item. For this algorithm to give best results,
the dataset should be ordered and uniformly distributed.
int interpolationSearch(int A[], int n, int data){
	int low = 0, mid, high = n - 1;
	while (low <= high) {
		mid = low + (((data - A[low]) * (high - low))/(A[high] - A[low]));
		if (data == A[mid])
			return mid;
		if (data < A[mid])
			high = mid - 1;
		else
			low = mid + 1;
	}
	return -1;
}
Solution: Yes, using hash table. Hash tables are a simple and effective method used to implement dictionaries.
𝐴𝑣𝑒𝑟𝑎𝑔𝑒 time to search for an element is O(1), while worst-case time is O(𝑛). Refer to 𝐻𝑎𝑠ℎ𝑖𝑛𝑔 chapter for more
details on hashing algorithms. As an example, consider the array, 𝐴 = {3, 2, 1, 2, 2, 3}.
Scan the input array and insert the elements into the hash. For each inserted element, keep the 𝑐𝑜𝑢𝑛𝑡𝑒𝑟 as 1
(assume initially all entries are filled with zeros). This indicates that the corresponding element has occurred
already. For the given array, the hash table will look like (after inserting the first three elements 3, 2 and 1):
1 → 1
2 → 1
3 → 1
Now if we try inserting 2, since the counter value of 2 is already 1, we can say the element has appeared twice.
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-4 Can we further improve the complexity of Problem-1 solution?
Solution: Let us assume that the array elements are positive numbers and also all the elements are in the range
0 to 𝑛 − 1. For each element 𝐴[𝑖], go to the array element whose index is 𝐴[𝑖]. That means select 𝐴[𝐴[𝑖]] and mark
- 𝐴[𝐴[𝑖]] (negate the value at 𝐴[𝐴[𝑖]]). Continue this process until we encounter the element whose value is already
negated. If one such element exists then we say duplicate elements exist in the given array. As an example,
consider the array, 𝐴 = {3, 2, 1, 2, 2, 3}.
Initially:
Values: 3  2  1  2  2  3
Index:  0  1  2  3  4  5
At step-1, negate A[abs(A[0])] = A[3]:
Values: 3  2  1 -2  2  3
At step-2, negate A[abs(A[1])] = A[2]:
Values: 3  2 -1 -2  2  3
At step-3, negate A[abs(A[2])] = A[1]:
Values: 3 -2 -1 -2  2  3
At step-4, try to negate A[abs(A[3])] = A[2]:
Values: 3 -2 -1 -2  2  3
At step-4, observe that 𝐴[𝑎𝑏𝑠(𝐴[3])] is already negative. That means we have encountered the same value twice.
void checkDuplicates(int A[], int n) {
	for(int i = 0; i < n; i++) {
		if(A[abs(A[i])] < 0) {
			printf("Duplicates exist: %d", abs(A[i]));
			return;
		}
		else A[abs(A[i])] = - A[abs(A[i])];
	}
	printf("No duplicates in given array.");
}
Time Complexity: O(𝑛). Since only one scan is required. Space Complexity: O(1).
Notes:
• This solution does not work if the given array is read only.
• This solution will work only if all the array elements are positive.
• If the elements range is not in 0 to 𝑛 − 1 then it may give exceptions.
Problem-5 Given an array of 𝑛 numbers. Give an algorithm for finding the element which appears the
maximum number of times in the array?
Brute Force Solution: One simple solution to this is, for each input element check whether there is any element
with the same value, and for each such occurrence, increment the counter. Each time, check the current counter
with the 𝑚𝑎𝑥 counter and update it if this value is greater than 𝑚𝑎𝑥 counter. This we can solve just by using two
simple 𝑓𝑜𝑟 loops.
int maxRepetitions(int A[], int n) {
int counter =0, max=0;
for(int i = 0; i < n; i++) {
counter=0;
for(int j = 0; j < n; j++) {
if(A[i] == A[j])
counter++;
}
if(counter > max) max = counter;
}
return max;
}
Time Complexity: O(𝑛2 ), for two nested 𝑓𝑜𝑟 loops. Space Complexity: O(1).
Problem-6 Can we improve the complexity of Problem-5 solution?
Solution: Yes. Sort the given array. After sorting, all the elements with equal values come adjacent. Now, just do
another scan on this sorted array and see which element is appearing the maximum number of times.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛). (for sorting). Space Complexity: O(1).
Problem-7 Is there any other way of solving Problem-5?
Solution: Yes, using hash table. For each element of the input, keep track of how many times that element
appeared in the input. That means the counter value represents the number of occurrences for that element.
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-8 For Problem-5, can we improve the time complexity? Assume that the elements’ range is 1 to 𝑛.
That means all the elements are within this range only.
Solution: Yes. We can solve this problem in two scans. We 𝑐𝑎𝑛𝑛𝑜𝑡 use the negation technique of Problem-3 for
this problem because of the number of repetitions. In the first scan, instead of negating, add the value 𝑛. That
means for each occurrence of an element, add the array size at the index that the element maps to. In the second scan, check each entry by dividing its value by 𝑛 and return the index which gives the maximum quotient. The code based on this method is given below.
int maxRepetitions(int A[], int n) {
int i = 0, max = 0, maxIndex;
for(i = 0; i < n; i++)
A[A[i]%n] +=n;
for(i = 0; i < n; i++)
if(A[i]/n > max) {
max = A[i]/n;
maxIndex =i;
}
return maxIndex;
}
Notes:
• This solution does not work if the given array is read only.
• This solution will work only if the array elements are positive.
• If the elements range is not in 1 to 𝑛 then it may give exceptions.
Time Complexity: O(𝑛). Since no nested 𝑓𝑜𝑟 loops are required. Space Complexity: O(1).
Problem-9 Given an array of 𝑛 numbers, give an algorithm for finding the first element in the array which is
repeated. For example, in the array 𝐴 = {3, 2, 1, 2, 2, 3}, the first repeated number is 3 (not 2). That means, we
need to return the first element among the repeated elements.
Solution: We can use the brute force solution that we used for Problem-1. For each element, since it checks
whether there is a duplicate for that element or not, whichever element duplicates first will be returned.
Problem-10 For Problem-9, can we use the sorting technique?
Solution: No. To see a failing case, consider the array 𝐴 = {3, 2, 1, 2, 2, 3}.
After sorting we get 𝐴 = {1, 2, 2, 2, 3, 3}. In this sorted array the first repeated element is 2 but the actual answer is
3.
Problem-11 For Problem-9, can we use the hashing technique?
Solution: Yes. But the simple hashing technique which we used for Problem-3 will not work. For example, if we
consider the input array as A = {3, 2, 1, 2, 3}, then the first repeated element is 3, but using our simple hashing
technique we get the answer as 2. This is because 2 is coming twice before 3. Now let us change the hashing table
behavior so that we get the first repeated element. Let us say, instead of storing 1 value, initially we store the
position of the element in the array. As a result the hash table will look like (after inserting 3, 2 and 1):
1 → 3
2 → 2
3 → 1
Now, if we see 2 again, we just negate the current value of 2 in the hash table. That means, we make its counter
value as −2. The negative value in the hash table indicates that we have seen the same element two times.
Similarly, for 3 (the next element in the input) also, we negate the current value of the hash table and finally the
hash table will look like:
1 → 3
2 → -2
3 → -1
After processing the complete input array, scan the hash table and return the element whose negative value is the highest (i.e., closest to zero; −1 in our case). The highest negative value means the smallest stored position, so that element was seen first among the repeated elements.
What if an element is repeated more than twice? In this case, just skip the element if its value in the hash table is already negative.
Problem-12 For Problem-9, can we use the technique that we used for Problem-3 (negation technique)?
Solution: No. As an example of contradiction, for the array 𝐴 = {3, 2, 1, 2, 2, 3} the first repeated element is 3. But
with negation technique the result is 2.
Problem-13 Find the Missing Number: We are given a list of 𝑛 − 1 integers and these integers are in the range
of 1 to 𝑛. There are no duplicates in the list. One of the integers is missing in the list. Given an algorithm to
find the missing integer. Example: I/P: [1, 2, 4, 6, 3, 7, 8] O/P: 5
Alternative problem statement: There is an array of numbers. A second array is formed by shuffling the
elements of the first array and deleting a random element. Given these two arrays, find which element is
missing in the second array.
Brute Force Solution: One naive way to solve this problem is for each number 𝑖 in the range 1 to 𝑛, check whether
number 𝑖 is in the given array or not.
int findMissingNumber(int A[], int n) {
int i, j, found=0;
for (i = 1; i <= n; i++) {
found = 0;
for (j = 0; j < n - 1; j++)
if(A[j]==i)
found = 1;
if(!found) return i;
}
return -1;
}
Time Complexity: O(𝑛2 ). Space Complexity: O(1).
Problem-14 For Problem-13, can we use the sorting technique?
Solution: Yes. A more efficient solution is to sort the first array; then, while checking whether an element in the range 1 to 𝑛 appears in the given array, we can use binary search.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛), for sorting. Space Complexity: O(1).
Problem-15 For Problem-13, can we use the hashing technique?
Solution: Yes. Scan the input array and insert elements into the hash. For inserted elements, keep 𝑐𝑜𝑢𝑛𝑡𝑒𝑟 as 1 (assume initially all entries are filled with zeros). This indicates that the corresponding element has occurred already. Now, for each element in the range 1 to 𝑛 check the hash table and return the element which has counter value zero. That is, once we hit an element with zero count, that’s the missing element.
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-16 For Problem-13, can we improve the complexity?
Solution: Yes. We can use summation formula.
1) Get the sum of numbers, 𝑠𝑢𝑚 = 𝑛 × (𝑛 + 1)/2.
2) Subtract all the numbers from 𝑠𝑢𝑚 and you will get the missing number.
Time Complexity: O(𝑛), for scanning the complete array.
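A small sketch of this approach is given below (the array holds the 𝑛 − 1 given numbers; the possible overflow is exactly the concern raised in the next problem):
int findMissingNumber(int A[], int n) {
	int sum = n * (n + 1) / 2;      // sum of 1 to n
	for (int i = 0; i < n - 1; i++)
		sum -= A[i];                // remove every number that is present
	return sum;                     // what remains is the missing number
}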
Problem-17 In Problem-16, if the sum of the numbers goes beyond the maximum
allowed integer, then there can be integer overflow and we may not get the correct answer. Can we solve this
problem?
Solution:
1) 𝑋𝑂𝑅 all the array elements, let the result of 𝑋𝑂𝑅 be 𝑋.
2) 𝑋𝑂𝑅 all numbers from 1 to 𝑛, let 𝑋𝑂𝑅 be Y.
3) 𝑋𝑂𝑅 of 𝑋 and 𝑌 gives the missing number.
int findMissingNumber(int A[], int n) {
	int i, X = 0, Y = 0;
	for (i = 0; i < n - 1; i++)
		X ^= A[i];
	for (i = 1; i <= n; i++)
		Y ^= i;
	//In fact, one variable is enough.
	return X ^ Y;
}
Let’s analyze why this approach works. What happens when we XOR two numbers? We should think bitwise,
instead of decimal. XORing a 4-bit number with 1101 would flip the first, second, and fourth bits of the number.
XORing the result again with 1101 would flip those bits back to their original value. So, if we XOR a number two
times with some number nothing will change. We can also XOR with multiple numbers and the order would not
matter. For example, say we XOR the number number1 with number2, then XOR the result with number3, then
XOR their result with number2, and then with number3. The final result would be the original number number1.
Because every XOR operation flips some bits, and XORing with the same number again flips those bits back, the order of XOR operations is not important. If we XOR a number with some value an even number of times, there is no effect; doing it an odd number of times is the same as XORing once.
Above we XOR all the numbers in the given range 1 to 𝑛 and given array A. All numbers in given array A are from
the range 1 to 𝑛, but there is an extra number in range 1 to 𝑛. So the effect of each XOR from array A is being
reset by the corresponding same number in the range 1 to 𝑛 (remember that the order of XOR is not important).
But we can’t reset the XOR of the extra number in the range 1 to 𝑛, because it doesn’t appear in array A. So the
result is as if we XOR 0 with that extra number, which gives the number itself (XOR of a number with 0 is the number). Therefore, in the end we get the missing number in array A. The space complexity of this solution is
constant O(1) since we only use one extra variable. Time complexity is O(𝑛) because we perform a single pass from
the array A and the range 1 to 𝑛.
Time Complexity: O(𝑛), for scanning the complete array. Space Complexity: O(1).
Problem-18 Find the Number Occurring an Odd Number of Times: Given an array of positive integers, all
numbers occur an even number of times except one number which occurs an odd number of times. Find the
number in O(𝑛) time & constant space. Example: I/P = [1, 2, 3, 2, 3, 1, 3] O/P = 3
Solution: Do a bitwise 𝑋𝑂𝑅 of all the elements. We get the number which has odd occurrences. This is because,
𝐴 𝑋𝑂𝑅 𝐴 = 0.
Time Complexity: O(𝑛). Space Complexity: O(1).
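A minimal sketch of this XOR scan (the function name is ours):
int getOddOccurrence(int A[], int n) {
	int res = 0;
	for (int i = 0; i < n; i++)
		res ^= A[i];   // pairs cancel out (A XOR A = 0); the odd one survives
	return res;
}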
Problem-19 Find the two repeating elements in a given array: Given an array with 𝑠𝑖𝑧𝑒, all elements of the
array are in range 1 to 𝑛 and also all elements occur only once except two numbers which occur twice. Find
those two repeating numbers. For example: if the array is 4, 2, 4, 5, 2, 3, 1 with 𝑠𝑖𝑧𝑒 = 7 and 𝑛 = 5. This input
has 𝑛 + 2 = 7 elements with all elements occurring once except 2 and 4 which occur twice. So the output
should be 4 2.
Solution: One simple way is to scan the complete array for each element of the input elements. That means use
two loops. In the outer loop, select elements one by one and count the number of occurrences of the selected
element in the inner loop. For the code below, assume that 𝑃𝑟𝑖𝑛𝑡𝑅𝑒𝑝𝑒𝑎𝑡𝑒𝑑𝐸𝑙𝑒𝑚𝑒𝑛𝑡𝑠 is called with 𝑛 + 2 to indicate
the size.
void PrintRepeatedElements(int A[], int size) {
for(int i = 0; i < size; i++)
for(int j = i+1; j < size; j++)
if(A[i] == A[j])
printf("%d ", A[i]);
}
Time Complexity: O(𝑛2 ). Space Complexity: O(1).
Problem-20 For Problem-19, can we improve the time complexity?
Solution: Sort the array using any comparison sorting algorithm and see if there are any elements which are
contiguous with the same value.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛). Space Complexity: O(1).
Solution: This solution works only if the array has positive integers and all the elements are in the range from 1 to 𝑛. The algorithm involves navigating the array and, for every index 𝑖, updating 𝐴[𝑎𝑏𝑠(𝐴[𝑖])] = 𝐴[𝑎𝑏𝑠(𝐴[𝑖])] ∗ −1. If 𝐴[𝑎𝑏𝑠(𝐴[𝑖])] is already negative, then it means we are visiting that position a second time, so the value is repeated.
void PrintRepeatedElementsWithNegationTechnique(int A[], int size) {
int i;
printf("n The repeating elements are");
for(i = 0; i < size; i++) {
if(A[abs(A[i])] > 0)
A[abs(A[i])] = -A[abs(A[i])];
else
printf(" %d ", abs(A[i]));
}
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-25 Similar to Problem-19, let us assume that the numbers are in the range 1 to 𝑛, that 𝑛 − 1 of the values are each repeated thrice, and that the remaining value is repeated twice. Find the element which is repeated twice.
Solution: If we 𝑋𝑂𝑅 all the elements in the array together with all integers from 1 to 𝑛, then every element which is repeated thrice contributes four 𝑋𝑂𝑅s in total (three from the array plus one from the range) and cancels out: 𝑎 𝑋𝑂𝑅 𝑎 𝑋𝑂𝑅 𝑎 𝑋𝑂𝑅 𝑎 = 0. It is the same for all elements that are repeated three times.
With the same logic, the element which is repeated twice appears three times in the combined 𝑋𝑂𝑅 (twice from the array plus once from the range), and 𝑎 𝑋𝑂𝑅 𝑎 𝑋𝑂𝑅 𝑎 = 𝑎. So the final result is exactly the element repeated twice.
Time Complexity: O(𝑛). Space Complexity: O(1).
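A compact sketch of this argument (here 𝑠𝑖𝑧𝑒 = 3(𝑛 − 1) + 2; the function name is illustrative):
int elementRepeatedTwice(int A[], int size, int n) {
	int X = 0;
	for (int i = 0; i < size; i++)
		X ^= A[i];   // thrice-repeated values contribute 3 XORs here ...
	for (int v = 1; v <= n; v++)
		X ^= v;      // ... and 1 more here: four in total, so they cancel
	return X;        // the twice-repeated value is XORed 3 times and survives
}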
Problem-26 Given an array of 𝑛 elements. Find two elements in the array such that their sum is equal to given
element 𝐾.
Brute Force Solution: One simple solution is, for each input element, to check whether there is any other element such that their sum is 𝐾. This we can solve just by using two simple for loops. The code for this solution can be given as:
void bruteForceSearch(int A[], int n, int K) {
	for (int i = 0; i < n; i++) {
		for(int j = i + 1; j < n; j++) {
			if(A[i] + A[j] == K) {
				printf("Items Found: %d %d", i, j);
				return;
			}
		}
	}
	printf("Items not found: No such elements");
}
Time Complexity: O(𝑛2 ). This is because of two nested 𝑓𝑜𝑟 loops. Space Complexity: O(1).
Problem-27 For Problem-26, can we improve the time complexity?
Solution: Yes. Let us assume that we have sorted the given array. This operation takes O(𝑛𝑙𝑜𝑔𝑛). On the sorted array, maintain indices 𝑙𝑜𝐼𝑛𝑑𝑒𝑥 = 0 and ℎ𝑖𝐼𝑛𝑑𝑒𝑥 = 𝑛 − 1 and compute 𝐴[𝑙𝑜𝐼𝑛𝑑𝑒𝑥] + 𝐴[ℎ𝑖𝐼𝑛𝑑𝑒𝑥]. If the sum equals 𝐾, then we are done with the solution. If the sum is less than 𝐾, increment 𝑙𝑜𝐼𝑛𝑑𝑒𝑥; if the sum is greater than 𝐾, decrement ℎ𝑖𝐼𝑛𝑑𝑒𝑥.
void search(int A[], int n, int K) {
int loIndex, hiIndex, sum;
sort(A, n);
for(loIndex = 0, hiIndex = n-1; loIndex < hiIndex; ) {
sum = A[loIndex] + A[hiIndex];
if(sum == K) {
printf("Elements Found: %d %d", loIndex, hiIndex);
return;
}
else if(sum < K)
loIndex = loIndex + 1;
else hiIndex = hiIndex - 1;
}
return;
}
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛). If the given array is already sorted then the complexity is O(𝑛).
Space Complexity: O(1).
Problem-28 Does the solution of Problem-26 work even if the array is not sorted?
Solution: Yes. Since we are checking all possibilities, the algorithm ensures that we get the pair of numbers if
they exist.
Problem-29 Is there any other way of solving Problem-26?
Solution: Yes, using hash table. Since our objective is to find two indexes of the array whose sum is 𝐾. Let us say
those indexes are 𝑋 and 𝑌. That means, 𝐴[𝑋] + 𝐴[𝑌] = 𝐾. What we need is, for each element of the input array
𝐴[𝑋], check whether 𝐾 − 𝐴[𝑋] also exists in the input array. Now, let us simplify that searching with hash table.
Algorithm:
• For each element of the input array, insert it into the hash table. Let us say the current element is 𝐴[𝑋].
• Before proceeding to the next element we check whether 𝐾 – 𝐴[𝑋] also exists in the hash table or not.
• The existence of such a number indicates that we are able to find the indexes.
• Otherwise proceed to the next input element.
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-30 Given an array A of 𝑛 elements. Find three indices 𝑖, 𝑗 &amp; 𝑘 such that 𝐴[𝑖]² + 𝐴[𝑗]² = 𝐴[𝑘]².
Solution:
Algorithm:
• Sort the given array in-place.
• For each array index 𝑖, compute 𝐴[𝑖]² and store it back in the array (assuming non-negative values, the array stays sorted).
• For each index 𝑘 from 𝑛 − 1 down to 2, search for two numbers in 𝐴[0 .. 𝑘 − 1] which add up to 𝐴[𝑘], similar to Problem-26. Each such search takes O(𝑛) time. If we find such a pair, return true; otherwise continue.
sort(A, A + n);             // sort the input array
for (int i = 0; i < n; i++)
	A[i] = A[i] * A[i];     // square every element (non-negative values keep the order sorted)
for (int k = n - 1; k >= 2; k--) {
	// Problem-26 style two-pointer search for A[i] + A[j] == A[k] in A[0..k-1]
	int i = 0, j = k - 1;
	while (i < j) {
		if (A[i] + A[j] == A[k]) return true;
		else if (A[i] + A[j] < A[k]) i++;
		else j--;
	}
}
return false;
Time Complexity: Time for sorting + 𝑛 × (time for finding the pair) = O(𝑛𝑙𝑜𝑔𝑛) + 𝑛 × O(𝑛) = O(𝑛²).
Space Complexity: O(1).
Problem-31 Two elements whose sum is closest to zero. Given an array with both positive and negative
numbers, find the two elements such that their sum is closest to zero. For the array below, the algorithm should give −80 and 85. Example: 1, 60, −10, 70, −80, 85.
Brute Force Solution: For each element, compute the 𝑠𝑢𝑚 with every other element in the array, and keep track of the pair whose sum is smallest in absolute value. Finally, print that pair.
void twoElementsWithMinSum(int A[], int n) {
int i, j, min_sum, sum, min_i, min_j;
if(n < 2) {
printf("Invalid Input");
return;
}
// Initialization
min_i = 0;
min_j = 1;
min_sum = A[0] + A[1];
for(i= 0; i < n - 1; i ++) {
for(j = i + 1; j < n; j++) {
sum = A[i] + A[j];
if(abs(min_sum) > abs(sum)) {
min_sum = sum;
min_i = i;
min_j = j;
}
}
}
printf(" The two elements are %d and %d", A[min_i], A[min_j]);
}
Time complexity: O(𝑛2 ). Space Complexity: O(1).
Problem-32 Can we improve the time complexity of Problem-31?
Solution: Use Sorting.
Algorithm:
1. Sort all the elements of the given input array.
2. Maintain two indexes, one at the beginning (𝑖 = 0) and the other at the ending (𝑗 = 𝑛 − 1). Also, maintain
two variables to keep track of the smallest positive sum closest to zero and the smallest negative sum
closest to zero.
3. While 𝑖 < 𝑗:
a. If the current pair sum is > zero and < positiveClosest, then update positiveClosest. Decrement 𝑗.
b. If the current pair sum is < zero and > negativeClosest, then update negativeClosest. Increment 𝑖.
c. Else, the sum is exactly zero, which is the closest possible; report the pair and stop.
int twoElementsWithMinSum(int A[], int n) {
	int i = 0, j = n-1, temp, positiveClosest = INT_MAX, negativeClosest = INT_MIN;
	sort(A, n);
	while(i < j) {
		temp = A[i] + A[j];
		if(temp > 0) {
			if (temp < positiveClosest)
				positiveClosest = temp;
			j--;
		}
		else if (temp < 0) {
			if (temp > negativeClosest)
				negativeClosest = temp;
			i++;
		}
		else return 0;   // the pair sums to exactly zero: closest possible
	}
	return (abs(negativeClosest) > positiveClosest) ? positiveClosest : negativeClosest;
}
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛), for sorting. Space Complexity: O(1).
Problem-33 Find elements whose sum equals a given target. Given an array of integers, find the two elements such that their sum equals the given target, and return their indices. Given A = [2, 7, 11, 15], target = 9: because A[0] + A[1] = 2 + 7 = 9, return [0, 1].
Brute Force Solution: The brute force approach is simple. Loop through each element 𝐴[𝑖] and check whether there is another value equal to 𝑡𝑎𝑟𝑔𝑒𝑡 − 𝐴[𝑖].
int* twoSum(int* A, int numsSize, int target, int* returnSize){
int i, j;
int *result = (int*)malloc(2 * sizeof(int));
*returnSize = 2;
for(i=0; i < numsSize; i++){
for(j=i+1; j < numsSize; j++){
if(A[i] + A[j] == target){
result[0] = i;
result[1] = j;
}
}
}
return result;
}
Time complexity of the brute force approach: O(𝑛²) for the two nested loops. We can do better with a hash table: insert every element into the table in one pass, then in a second pass look up 𝑡𝑎𝑟𝑔𝑒𝑡 − 𝐴[𝑖] for each element. We traverse the list containing 𝑛 elements exactly twice, and since the hash table reduces the look-up time to O(1), the time complexity becomes O(𝑛). Space complexity: O(𝑛); the extra space depends on the number of items stored in the hash table, which stores exactly 𝑛 elements.
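A sketch of that two-pass approach, in the same style as the one-pass version below (the function name is ours):
vector&lt;int&gt; twoSumTwoPass(vector&lt;int&gt;&amp; A, int target) {
	unordered_map&lt;int, int&gt; m;
	int n = A.size();
	for (int i = 0; i &lt; n; ++i)        // pass 1: record each value's index
		m[A[i]] = i;
	for (int i = 0; i &lt; n; ++i) {      // pass 2: look up the complement
		auto it = m.find(target - A[i]);
		if (it != m.end() &amp;&amp; it-&gt;second != i)
			return {i, it-&gt;second};
	}
	return {};
}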
It turns out we can do it in one pass. While we iterate and insert elements into the table, we also look back to check if the current element's complement already exists in the table. If it exists, we have found a solution and return immediately.
class Solution {
public:
vector<int> twoSum(vector<int>& A, int target) {
int i = 0, n = A.size();
unordered_map<int, int> m;
vector<int> ret;
for (i = 0; i < n; ++i){
if (m.find(target - A[i]) != m.end()){
ret.push_back(m[target - A[i]]);
ret.push_back(i);
break;
}
m[A[i]] = i;
}
return ret;
}
};
Time complexity: O(𝑛). We traverse the list containing 𝑛 elements only once. Each look up in the table costs only
O(1) time.
Space complexity: O(𝑛). The extra space required depends on the number of items stored in the hash table, which
stores at most 𝑛 elements.
Problem-36 Given an array of 𝑛 elements. Find three elements in the array such that their sum is equal to
given element 𝐾?
Brute Force Solution: The default solution to this is, for each pair of input elements check whether there is any
element whose sum is 𝐾. This we can solve just by using three simple for loops. The code for this solution can be
given as:
void bruteForceSearch(int A[], int n, int data) {
	for (int i = 0; i < n; i++) {
		for(int j = i+1; j < n; j++) {
			for(int k = j+1; k < n; k++) {
				if(A[i] + A[j] + A[k] == data) {
					printf("Items Found: %d %d %d", i, j, k);
					return;
				}
			}
		}
	}
	printf("Items not found: No such elements");
}
Time Complexity: O(𝑛3 ), for three nested 𝑓𝑜𝑟 loops. Space Complexity: O(1).
Problem-37 Does the solution of Problem-36 work even if the array is not sorted?
Solution: Yes. Since we are checking all possibilities, the algorithm ensures that we can find three numbers
whose sum is 𝐾 if they exist.
Problem-38 Can we use the sorting technique for solving Problem-36?
Solution: Yes.
void search(int A[], int n, int data) {
	sort(A, n);
	for(int k = 0; k < n; k++) {
		for(int i = k + 1, j = n - 1; i < j; ) {
			if(A[k] + A[i] + A[j] == data) {
				printf("Items Found: %d %d %d", k, i, j);
				return;
			}
			else if(A[k] + A[i] + A[j] < data) i++;
			else j--;
		}
	}
	printf("Items not found: No such elements");
}
The recursion equation is 𝑇(𝑛) = 𝑇(𝑛/2) + 𝑐. Using the master theorem, we get O(𝑙𝑜𝑔𝑛).
Problem-42 If we don't know 𝑛, how do we solve the Problem-41?
Solution: Repeatedly compute 𝐴[1], 𝐴[2], 𝐴[4], 𝐴[8], 𝐴[16] and so on, until we find a value of 𝑛 such that 𝐴[𝑛] > 0.
Time Complexity: O(𝑙𝑜𝑔𝑛), since we are moving at the rate of 2. Refer to 𝐼𝑛𝑡𝑟𝑜𝑑𝑢𝑐𝑡𝑖𝑜𝑛 𝑡𝑜 𝐴𝑛𝑎𝑙𝑦𝑠𝑖𝑠 𝑜𝑓 𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚𝑠
chapter for details on this.
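A sketch of this doubling probe is given below, assuming the Problem-41 setup in which the values become positive from some index ≥ 1 onward and every read is valid:
int findFirstPositive(int A[]) {
	int hi = 1;
	while (A[hi] <= 0)        // probe indices 1, 2, 4, 8, ...: O(logn) steps
		hi *= 2;
	int lo = hi / 2;          // the boundary lies in (lo, hi]
	while (lo + 1 < hi) {     // finish with binary search on the last interval
		int mid = lo + (hi - lo) / 2;
		if (A[mid] > 0) hi = mid;
		else lo = mid;
	}
	return hi;                // index of the first positive value
}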
Problem-43 Given an input array of size unknown with all 1′𝑠 in the beginning and 0′𝑠 in the end. Find the
index in the array from where 0′𝑠 start. Consider there are millions of 1′𝑠 and 0′𝑠 in the array. E.g. array
contents 1111111. . . . . . .1100000. . . . . . . . .0000000.
Solution: This problem is almost similar to Problem-42. Check the elements at positions 2^𝑘 for 𝑘 = 0, 1, 2, …. Since we are moving at the rate of 2, the complexity is O(𝑙𝑜𝑔𝑛).
Problem-44 Given a sorted array of 𝑛 integers that has been rotated an unknown number of times, give a
O(𝑙𝑜𝑔𝑛) algorithm that finds an element in the array.
Example: Find 5 in array (15 16 19 20 25 1 3 4 5 7 10 14) Output: 8 (the index of 5 in the array)
Solution: Let us assume that the given array is 𝐴[] and use the solution of Problem-41 with an extension. The function 𝑓𝑖𝑛𝑑𝑃𝑖𝑣𝑜𝑡 below returns the index of the pivot element. Find the pivot point, divide the array into two sub-arrays and call binary search.
The main idea for finding the pivot point is – for a sorted (in increasing order) and pivoted array, the pivot element
is the only element for which the next element to it is smaller than it. Using the above criteria and the binary
search methodology we can get pivot element in O(𝑙𝑜𝑔𝑛) time.
Algorithm:
1) Find out the pivot point and divide the array into two sub-arrays.
2) Now call binary search for one of the two sub-arrays.
a. if the element is greater than the first element then search in left subarray.
b. else search in right subarray.
3) If element is found in selected sub-array, then return index 𝑒𝑙𝑠𝑒 return −1.
int findPivot(int A[], int start, int finish) {
if(finish - start == 0)
return start;
else if(start == finish - 1) {
if(A[start] >= A[finish])
return start;
else return finish;
}
else {
int mid = start + (finish-start)/2;
if(A[start] >= A[mid])
return findPivot(A, start, mid);
else return findPivot(A, mid, finish);
}
}
int search(int A[], int n, int x) {
int pivot = findPivot(A, 0, n-1);
if(A[pivot] == x)
return pivot;
if(x >= A[0])
return BinarySearch(A, 0, pivot-1, x);
else return BinarySearch(A, pivot+1, n-1, x);
}
int BinarySearch(int A[], int low, int high, int x) {
if(high >= low) {
int mid = low + (high - low)/2;
if(x == A[mid])
return mid;
if(x > A[mid])
return BinarySearch(A, (mid + 1), high, x);
else return BinarySearch(A, low, (mid - 1), x);
}
return -1;
}
Solution: To find the last occurrence of a number we need to check for the following condition. Return the position
if any one of the following is true:
(mid == high && A[mid] == data) || (A[mid] == data && A[mid+1] > data)
int binarySearchLastOccurrence(int A[], int low, int high, int data) {
int mid;
if(high >= low) {
mid = low + (high-low) / 2;
if((mid == high && A[mid] == data) || (A[mid] == data && A[mid + 1] > data))
return mid;
// Give preference to right half of the array
else if(A[mid] <= data)
return binarySearchLastOccurrence (A, mid + 1, high, data);
else return binarySearchLastOccurrence (A, low, mid - 1, data);
}
return -1;
}
Time Complexity: O(𝑙𝑜𝑔𝑛).
Problem-52 Given a sorted array of 𝑛 elements, possibly with duplicates. Find the number of occurrences of
a number.
Brute Force Solution: Do a linear search of the array and increment count as and when we find the element
data in the array.
int LinearSearchCount(int A[], int n, int data) {
int count = 0;
for (int i = 0; i < n; i++)
if(A[i] == data)
count++;
return count;
}
Time Complexity: O(𝑛).
Problem-53 Can we improve the time complexity of Problem-52?
Solution: Yes. We can solve this by using one binary search call followed by another small scan.
Algorithm:
• Do a binary search for the 𝑑𝑎𝑡𝑎 in the array. Let us assume its position is 𝐾.
• Now traverse towards the left from K and count the number of occurrences of 𝑑𝑎𝑡𝑎. Let this count be
𝑙𝑒𝑓𝑡𝐶𝑜𝑢𝑛𝑡.
• Similarly, traverse towards right and count the number of occurrences of 𝑑𝑎𝑡𝑎. Let this count be
𝑟𝑖𝑔ℎ𝑡𝐶𝑜𝑢𝑛𝑡.
• Total number of occurrences = 𝑙𝑒𝑓𝑡𝐶𝑜𝑢𝑛𝑡 + 1 + 𝑟𝑖𝑔ℎ𝑡𝐶𝑜𝑢𝑛𝑡
Time Complexity – O(𝑙𝑜𝑔𝑛 + 𝑆) where 𝑆 is the number of occurrences of 𝑑𝑎𝑡𝑎.
Problem-54 Is there any alternative way of solving Problem-52?
Solution:
Algorithm:
• Find the first occurrence of 𝑑𝑎𝑡𝑎 and call its index 𝑓𝑖𝑟𝑠𝑡𝑂𝑐𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒 (for the algorithm refer to Problem-50)
• Find last occurrence of 𝑑𝑎𝑡𝑎 and call its index as 𝑙𝑎𝑠𝑡𝑂𝑐𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒 (for algorithm refer to Problem-51)
• Return 𝑙𝑎𝑠𝑡𝑂𝑐𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒 – 𝑓𝑖𝑟𝑠𝑡𝑂𝑐𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒 + 1
Time Complexity = O(𝑙𝑜𝑔𝑛 + 𝑙𝑜𝑔𝑛) = O(𝑙𝑜𝑔𝑛).
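A sketch combining both steps (𝑏𝑖𝑛𝑎𝑟𝑦𝑆𝑒𝑎𝑟𝑐ℎ𝐹𝑖𝑟𝑠𝑡𝑂𝑐𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒 is assumed to be the Problem-50 routine, symmetric to the last-occurrence code above):
int countOccurrences(int A[], int n, int data) {
	int first = binarySearchFirstOccurrence(A, 0, n - 1, data);
	if (first == -1)
		return 0;                  // data is not present at all
	int last = binarySearchLastOccurrence(A, 0, n - 1, data);
	return last - first + 1;       // the occurrences span [first, last]
}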
Problem-55 What is the next number in the sequence 1, 11, 21, and why?
Solution: Read the given number aloud. This is just a fun problem.
1 → "one 1" → 11
11 → "two 1s" → 21
21 → "one 2, one 1" → 1211
So the answer is: the next number is the representation of the previous number when read aloud.
Problem-56 Finding the second smallest number efficiently.
Solution: We can construct a min-heap of the given elements using just less than 𝑛 comparisons (refer to the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 chapter for the algorithm). Then we find the second smallest using 𝑙𝑜𝑔𝑛 comparisons for the 𝑑𝑒𝑙𝑒𝑡𝑒𝑀𝑖𝑛() operation: after removing the minimum, the new root is the second smallest. Overall, we get 𝑛 + 𝑙𝑜𝑔𝑛 + 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡.
Problem-57 Is there any other solution for Problem-56?
Solution: Alternatively, split the 𝑛 numbers into pairs and run a knockout tournament: the first round compares the pairs, and subsequent rounds compare the winners, finding the smallest with 𝑛 − 1 comparisons in total. The second smallest must be one of the 𝑙𝑜𝑔𝑛 elements that were compared directly with the smallest, so it can be found with 𝑙𝑜𝑔𝑛 − 1 further comparisons, for a total of 𝑛 + 𝑙𝑜𝑔𝑛 − 2. This approach is called the 𝑡𝑜𝑢𝑟𝑛𝑎𝑚𝑒𝑛𝑡 𝑚𝑒𝑡ℎ𝑜𝑑.
Problem-58 An element is a majority if it appears more than 𝑛/2 times. Give an algorithm that takes an array
of 𝑛 element as argument and identifies a majority (if it exists).
Solution: The basic solution is to have two loops and keep track of the maximum count for all different elements.
If the maximum count becomes greater than 𝑛/2, then break the loops and return the element having maximum
count. If maximum count doesn’t become more than 𝑛/2, then the majority element doesn’t exist.
Time Complexity: O(𝑛²). Space Complexity: O(1).
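A short sketch of this basic approach (the function name is ours; −1 is an assumed sentinel meaning "no majority"):
int majorityBruteForce(int A[], int n) {
	for (int i = 0; i < n; i++) {
		int count = 0;
		for (int j = 0; j < n; j++)
			if (A[j] == A[i]) count++;
		if (count > n / 2) return A[i];   // majority found; stop early
	}
	return -1;                            // no element occurs more than n/2 times
}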
Problem-59 Can we improve Problem-58 time complexity to O(𝑛𝑙𝑜𝑔𝑛)?
Solution: Using a binary search tree we can achieve this. The node of the Binary Search Tree (used in this approach) will be as follows.
struct TreeNode {
int element;
int count;
struct TreeNode *left;
struct TreeNode *right;
} BST;
Insert elements in BST one by one and if an element is already present then increment the count of the node. At
any stage, if the count of a node becomes more than 𝑛/2, then return. This method works well for the cases where
𝑛/2 + 1 occurrences of the majority element are present at the start of the array, for example {1, 1, 1, 1, 1, 2, 3, 4}.
Time Complexity: If a binary search tree is used then worst time complexity will be O(𝑛2 ). If a balanced-binary-
search tree is used then O(𝑛𝑙𝑜𝑔𝑛). Space Complexity: O(𝑛).
Problem-60 Is there any other of way of achieving O(𝑛𝑙𝑜𝑔𝑛) complexity for Problem-58?
Solution: Sort the input array and scan the sorted array to find the majority element.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛). Space Complexity: O(1).
Problem-61 Can we improve the complexity for Problem-58?
Solution: If an element occurs more than 𝑛/2 times in 𝐴 then it must be the median of 𝐴. But, the reverse is not
true, so once the median is found, we must check to see how many times it occurs in 𝐴. We can use linear selection
which takes O(𝑛) time (for algorithm, refer to 𝑆𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛 𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚𝑠 chapter).
int CheckMajority(int A[], int n) {
1) Use linear selection to find the median 𝑚 of 𝐴.
2) Do one more pass through 𝐴 and count the number of occurrences of 𝑚.
a. If 𝑚 occurs more than 𝑛/2 times then return true;
b. Otherwise return false.
}
Problem-62 Is there any other way of solving Problem-58?
Solution: We can find the majority element using linear time and constant space using Boyer–Moore majority vote
algorithm. Since a majority element occurs more than half the time, we can use a simple scan of the input array, keeping track of a candidate and a count. If the count is 0, we treat the current element as the new candidate; otherwise we adjust the count depending on whether the element matches the candidate.
The algorithm can be expressed in pseudocode as the following steps. The algorithm processes each element of
the sequence, one at a time. While processing an element:
• If the counter is 0, we set the current candidate to element and we set the counter to 1.
• If the counter is not 0, we increment or decrement the counter according to whether element is the current
candidate.
At the end of this process, if the sequence has a majority, it will be the element stored by the algorithm. If there
is no majority element, the algorithm will not detect that fact, and will still output one of the elements. We can
modify the algorithm to verify that the element found really is a majority element.
int MajorityNum(int[] A, int n) {
int count = 0, element = -1;
for(int i = 0; i < n; i++) {
// If the counter is 0 then set the current candidate to majority num and set the counter to 1.
if(count == 0) {
element = A[i];
count = 1;
}
else if(element == A[i]) {
// Increment counter If the counter is not 0 and element is same as current candidate.
count++;
}
else {
// Decrement counter If the counter is not 0 and element is different from current candidate.
count--;
}
}
return element;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-63 Given an array of 2𝑛 elements of which 𝑛 elements are the same and the remaining 𝑛 elements
are all different. Find the majority element.
Solution: The repeated elements will occupy half the array. No matter what arrangement it is, only one of the
below will be true:
• All duplicate elements will be at a relative distance of 2 from each other. Ex: 𝐧, 1, 𝐧, 100, 𝐧, 54, 𝐧 ...
• At least two duplicate elements will be next to each other.
Ex: 𝑛, 𝑛, 1, 100, 𝑛, 54, 𝑛, . . . .
𝑛, 1, 𝑛, 𝑛, 𝑛, 54, 100 . . .
1, 100, 54, 𝑛, 𝑛, 𝑛, 𝑛. . . .
In worst case, we will need two passes over the array:
• First Pass: compare 𝐴[𝑖] and 𝐴[𝑖 + 1]
• Second Pass: compare 𝐴[𝑖] and 𝐴[𝑖 + 2]
Something will match and that's your element. This will cost O(𝑛) in time and O(1) in space.
Problem-64 Given an array with 2𝑛 + 1 integer elements, 𝑛 elements appear twice in arbitrary places in the
array and a single integer appears only once somewhere inside. Find the lonely integer with O(𝑛) operations
and O(1) extra memory.
Solution: Except for one element, all elements are repeated. We know that 𝐴 𝑋𝑂𝑅 𝐴 = 0. Based on this, if we 𝑋𝑂𝑅 all the input elements then we get the remaining element.
int Solution(int* A, int n) {
	int i, res;
	for (i = res = 0; i < 2*n + 1; i++)
		res = res ^ A[i];
	return res;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-65 Throwing eggs from an n-story building: Suppose we have an 𝑛 story building and a number of
eggs. Also assume that an egg breaks if it is thrown from floor 𝐹 or higher, and will not break otherwise. Devise
a strategy to determine floor 𝐹, while breaking O(𝑙𝑜𝑔𝑛) eggs.
Solution: Refer to 𝐷𝑖𝑣𝑖𝑑𝑒 𝑎𝑛𝑑 𝐶𝑜𝑛𝑞𝑢𝑒𝑟 chapter.
Problem-66 Local minimum of an array: Given an array 𝐴 of 𝑛 distinct integers, design an O(𝑙𝑜𝑔𝑛) algorithm to find a 𝑙𝑜𝑐𝑎𝑙 𝑚𝑖𝑛𝑖𝑚𝑢𝑚: an index 𝑖 such that 𝐴[𝑖] < 𝐴[𝑖 − 1] and 𝐴[𝑖] < 𝐴[𝑖 + 1].
Solution: Check the middle value 𝐴[𝑛/2], and two neighbors 𝐴[𝑛/2 − 1] and 𝐴[𝑛/2 + 1]. If 𝐴[𝑛/2] is local minimum,
stop; otherwise search in half with smaller neighbor.
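A recursive sketch of this search (a boundary index counts as a local minimum when its only neighbor is larger):
int localMin(int A[], int lo, int hi) {
	int mid = lo + (hi - lo) / 2;
	if ((mid == lo || A[mid] < A[mid - 1]) && (mid == hi || A[mid] < A[mid + 1]))
		return mid;                          // every existing neighbor is larger
	if (mid > lo && A[mid - 1] < A[mid])
		return localMin(A, lo, mid - 1);     // recurse toward the smaller left neighbor
	return localMin(A, mid + 1, hi);         // otherwise the right half must contain one
}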
Problem-67 Given an 𝑛 × 𝑛 array of elements such that each row is in ascending order and each column is in
ascending order, devise an O(𝑛) algorithm to determine if a given element 𝑥 is in the array. You may assume
all elements in the 𝑛 × 𝑛 array are distinct.
Solution: Let us assume that the given matrix is 𝐴[𝑛][𝑛]. Start with the last row, first column [or the first row, last column]. If the element we are searching for is greater than the element at 𝐴[𝑛][1], then the first column can be eliminated. If the search element is less than the element at 𝐴[𝑛][1], then the last row can be completely eliminated. Once the first column or the last row is eliminated, start the process again with the bottom-left corner of the remaining array. In this algorithm, the search element is compared with at most 2𝑛 elements.
Time Complexity: O(𝑛). This is because we will traverse at most 2𝑛 points. Space Complexity: O(1).
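A sketch of this staircase walk is shown below, starting from the bottom-left corner (𝑁 is an assumed compile-time column count, used only to keep the sketch self-contained):
#define N 4
int staircaseSearch(int A[][N], int n, int x) {
	int row = n - 1, col = 0;              // bottom-left corner
	while (row >= 0 && col < n) {
		if (A[row][col] == x) return 1;    // found
		else if (A[row][col] > x) row--;   // the rest of this row is even larger
		else col++;                        // the rest of this column is even smaller
	}
	return 0;                              // not found
}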
Problem-68 Given an 𝑛 × 𝑛 array of 𝑛² numbers, give an O(𝑛) algorithm to find a pair of indices 𝑖 and 𝑗 such
that 𝐴[𝑖][𝑗] < 𝐴[𝑖 + 1][𝑗], 𝐴[𝑖][𝑗] < 𝐴[𝑖][𝑗 + 1], 𝐴[𝑖][𝑗] < 𝐴[𝑖 − 1][𝑗], and 𝐴[𝑖][𝑗] < 𝐴[𝑖][𝑗 − 1].
Solution: This problem is the same as Problem-67.
Problem-69 Given 𝑛 × 𝑛 matrix, and in each row all 1’s are followed by 0’𝑠. Find the row with the maximum
number of 0’𝑠.
Solution: Start with the first row, last column. If the element is 0 then move to the previous column in the same row and at the same time increment the counter of 0’𝑠. If the element is 1 then move to the next row in the same column. Repeat this process until you reach the last row, first column.
Time Complexity: O(2𝑛) ≈O(𝑛) (similar to Problem-67).
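A sketch of this walk, with the same assumed dimension 𝑁 as in the previous sketch; it returns the row index, or −1 if no row contains a 0:
int rowWithMaxZeros(int A[][N], int n) {
	int row = 0, col = n - 1, bestRow = -1;
	while (row < n && col >= 0) {
		if (A[row][col] == 0) {
			bestRow = row;   // this row pushes the zero frontier further left
			col--;
		} else {
			row++;           // a 1 here means this row cannot beat the current best
		}
	}
	return bestRow;
}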
Problem-70 Given an input array of size unknown, with all numbers in the beginning and special symbols in
the end. Find the index in the array from where the special symbols start.
Solution: Refer to 𝐷𝑖𝑣𝑖𝑑𝑒 𝑎𝑛𝑑 𝐶𝑜𝑛𝑞𝑢𝑒𝑟 chapter.
Problem-71 Separate even and odd numbers: Given an array 𝐴[ ], write a function that segregates even and
odd numbers. The functions should put all even numbers first, and then odd numbers. Example: Input =
{12, 34, 45, 9, 8, 90, 3} Output = {12, 34, 90, 8, 9, 45, 3}
Note: In the output, the order of numbers can be changed, i.e., in the above example 34 can come before 12,
and 3 can come before 9.
Solution: The problem is very similar to 𝑆𝑒𝑝𝑎𝑟𝑎𝑡𝑒 0’𝑠 𝑎𝑛𝑑 1’𝑠 (Problem-72) in an array, and both problems are
variations of the famous 𝐷𝑢𝑡𝑐ℎ 𝑛𝑎𝑡𝑖𝑜𝑛𝑎𝑙 𝑓𝑙𝑎𝑔 𝑝𝑟𝑜𝑏𝑙𝑒𝑚.
Algorithm: The logic is similar to Quick sort.
1) Initialize two index variables left and right: 𝑙𝑒𝑓𝑡 = 0, 𝑟𝑖𝑔ℎ𝑡 = 𝑛 − 1
2) Keep incrementing the left index until you see an odd number.
3) Keep decrementing the right index until you see an even number.
4) If 𝑙𝑒𝑓𝑡 < 𝑟𝑖𝑔ℎ𝑡 then swap 𝐴[𝑙𝑒𝑓𝑡] and 𝐴[𝑟𝑖𝑔ℎ𝑡]
void DutchNationalFlag(int A[], int n) {
int left = 0, right = n-1;
while(left < right) {
// Increment left index while the element at left is even
while(A[left]%2 == 0 && left < right)
left++;
// Decrement right index while the element at right is odd
while(A[right]%2 == 1 && left < right)
right--;
if(left < right) {
// Swap A[left] and A[right]
swap(&A[left], &A[right]);
left++;
right--;
}
}
}
Time Complexity: O(𝑛).
Problem-72 The following is another way of structuring Problem-71, but with a slight difference.
Separate 0’s and 1’s in an array: We are given an array of 0’s and 1’s in random order. Separate 0’s on the
left side and 1’s on the right side of the array. Traverse the array only once.
Input array = [0, 1, 0, 1, 0, 0, 1, 1, 1, 0] Output array = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
Problem-76 Given an array A[], find the maximum j – i such that A[j] > A[i]. For example, Input: {34, 8, 10, 3,
2, 80, 30, 33, 1} and Output: 6 (j = 7, i = 1).
Solution: Brute Force Approach: Run two loops. In the outer loop, pick elements one by one from the left. In the
inner loop, compare the picked element with the elements starting from the right side. Stop the inner loop when
you see an element greater than the picked element and keep updating the maximum j-i so far.
int maxIndexDiff(int A[], int n){
int maxDiff = -1;
int i, j;
for (i = 0; i < n; ++i){
for (j = n-1; j > i; --j){
if(A[j] > A[i] && maxDiff < (j - i))
maxDiff = j - i;
}
}
return maxDiff;
}
Time Complexity: O(𝑛²). Space Complexity: O(1).
Problem-77 Can we improve the complexity of Problem-76?
Solution: To solve this problem, we need to get two optimum indexes of A[]: left index 𝑖 and right index 𝑗. For an
element A[i], we do not need to consider A[i] for the left index if there is an element smaller than A[i] on the left
side of A[i]. Similarly, if there is a greater element on the right side of A[j] then we do not need to consider this j
for the right index.
So we construct two auxiliary Arrays LeftMins[] and RightMaxs[] such that LeftMins[i] holds the smallest element
on the left side of A[i] including A[i], and RightMaxs[j] holds the greatest element on the right side of A[j] including
A[j]. After constructing these two auxiliary arrays, we traverse both these arrays from left to right.
While traversing LeftMins[] and RightMaxs[], if we see that LeftMins[i] is greater than or equal to RightMaxs[j],
then no valid pair uses this 𝑖 (RightMaxs[] only gets smaller as 𝑗 grows), so we must move ahead in LeftMins[]
(do i++). Otherwise LeftMins[i] < RightMaxs[j] guarantees a valid pair, so we record 𝑗 − 𝑖 and move ahead in
RightMaxs[] (do j++) to look for a greater 𝑗 − 𝑖 value.
// min/max helpers assumed by the code below
#define min(a,b) ((a) < (b) ? (a) : (b))
#define max(a,b) ((a) > (b) ? (a) : (b))
int maxIndexDiff(int A[], int n){
int maxDiff, i, j;
int *LeftMins = (int *)malloc(sizeof(int)*n);
int *RightMaxs = (int *)malloc(sizeof(int)*n);
LeftMins[0] = A[0];
for (i = 1; i < n; ++i)
LeftMins[i] = min(A[i], LeftMins[i-1]);
RightMaxs[n-1] = A[n-1];
for (j = n-2; j >= 0; --j)
RightMaxs[j] = max(A[j], RightMaxs[j+1]);
i = 0, j = 0, maxDiff = -1;
while (j < n && i < n){
if (LeftMins[i] < RightMaxs[j]){
maxDiff = max(maxDiff, j-i);
j = j + 1;
}
else
i = i+1;
}
free(LeftMins);
free(RightMaxs);
return maxDiff;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-78 Given an array of elements, how do you check whether the list is pairwise sorted or not? A list is
considered pairwise sorted if each successive pair of numbers is in sorted (non-decreasing) order.
Solution:
int checkPairwiseSorted(int A[], int n) {
if (n == 0 || n == 1)
return 1;
for (int i = 0; i < n - 1; i += 2){
if (A[i] > A[i+1])
return 0;
}
return 1;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-79 Given an array of 𝑛 elements, how do you print the frequencies of the elements without using extra
space? Assume all elements are positive, editable, and in the range [1, 𝑛].
Solution: Use 𝑛𝑒𝑔𝑎𝑡𝑖𝑜𝑛 technique.
void frequencyCounter(int A[],int n){
int pos = 0;
while(pos < n){
int expectedPos = A[pos] - 1;
if(A[pos] > 0 && A[expectedPos] > 0){
swap(&A[pos], &A[expectedPos]);
A[expectedPos] = -1;
}
else if(A[pos] > 0){
A[expectedPos] --;
A[pos ++] = 0;
}
else{
pos ++;
}
}
for(int i = 0; i < n; ++i){
printf("%d frequency is %d\n", i + 1 ,abs(A[i]));
}
}
int main(int argc, char* argv[]){
int A[] = {10, 10, 9, 4, 7, 6, 5, 2, 3, 2, 1};
frequencyCounter(A, sizeof(A)/ sizeof(A[0]));
return 0;
}
Array should have numbers in the range [1, 𝑛] (where 𝑛 is the size of the array). The if condition (A[𝑝𝑜𝑠] > 0 &&
A[𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑𝑃𝑜𝑠] > 0) means that both the numbers at indices 𝑝𝑜𝑠 and 𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑𝑃𝑜𝑠 are actual numbers in the array
but not their frequencies. So we will swap them so that the number at the index 𝑝𝑜𝑠 will go to the position where
it should have been if the numbers 1, 2, 3, ...., 𝑛 are kept in 0, 1, 2, ..., 𝑛 − 1 indices. In the above example input
array, initially 𝑝𝑜𝑠 = 0, so 10 at index 0 will go to index 9 after the swap. As this is the first occurrence of 10, make
it to -1. Note that we are storing the frequencies as negative numbers to differentiate between actual numbers and
frequencies.
The else if condition (A[𝑝𝑜𝑠] > 0) means A[𝑝𝑜𝑠] is a number and A[𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑𝑃𝑜𝑠] already holds its frequency (not
counting this occurrence of A[𝑝𝑜𝑠]). So we increment the frequency by 1 (that is, decrement by 1 in terms of
negative numbers). Having counted this occurrence we move to the next position (𝑝𝑜𝑠++), but before moving on
we set A[𝑝𝑜𝑠] to zero, since the number 𝑝𝑜𝑠 + 1 (which corresponds to index 𝑝𝑜𝑠) has not yet occurred.
The final else part means the current index 𝑝𝑜𝑠 already holds the frequency of the number 𝑝𝑜𝑠 + 1, so we move to
the next 𝑝𝑜𝑠 (𝑝𝑜𝑠++).
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-80 Which is faster and by how much, a linear search of only 1000 elements on a 5-GHz computer or
a binary search of 1 million elements on a 1-GHz computer. Assume that the execution of each instruction on
the 5-GHz computer is five times faster than on the 1-GHz computer and that each iteration of the linear
search algorithm is twice as fast as each iteration of the binary search algorithm.
Solution: A binary search of 1 million elements would require 𝑙𝑜𝑔2 1,000,000 or about 20 iterations at most (i.e., worst
case). A linear search of 1000 elements would require 500 iterations on the average (i.e., going halfway through
the array). Therefore, binary search would be 500/20 = 25 times faster (in terms of iterations) than linear search.
However, since linear search iterations are twice as fast, binary search would be 25/2 or about 12 times faster
than linear search overall, on the same machine. Since we run them on different machines, where an instruction
on the 5-GHz machine is 5 times faster than an instruction on the 1-GHz machine, binary search would be 12/5
or about 2 times faster than linear search! The key idea is that software improvements can make an algorithm
run much faster without having to use more powerful hardware.
Problem-81 Given an array of integers, give an algorithm that returns the 𝑝𝑖𝑣𝑜𝑡 index of this array. 𝑃𝑖𝑣𝑜𝑡 index
is the index where the sum of the numbers to the left of the index is equal to the sum of the numbers to the
right of the index. If no such index exists, we should return -1. If there are multiple pivot indexes, you should
return the left-most pivot index.
Example 1: Input: A = [1, 8, 4, 7, 6, 7], Output: 3
Explanation: The sum of the numbers to the left of index 3 (A[3] = 7) is equal to the sum of numbers to the right
of index 3. Also, 3 is the first index where this occurs.
Example 2: Input: A = [2, 3, 4], Output: -1
Explanation: There is no index that satisfies the conditions in the problem statement.
Solution: We need to quickly compute the sum of values to the left and the right of every index. Let's say we knew
𝑡𝑜𝑡𝑎𝑙𝑆𝑢𝑚 as the sum of the numbers, and we are at index 𝑖. If we knew the sum of numbers leftsum that are to
the left of index 𝑖, then the other sum to the right of the index would just be 𝑡𝑜𝑡𝑎𝑙𝑆𝑢𝑚 − 𝐴[𝑖] − 𝑙𝑒𝑓𝑡𝑠𝑢𝑚.
As such, we only need to know about 𝑙𝑒𝑓𝑡𝑠𝑢𝑚 to check whether an index is a pivot index in constant time. Let's
do that: as we iterate through candidate indexes i, we will maintain the correct value of leftsum.
#include <vector>
#include <numeric>   // for accumulate
using namespace std;
class PivotIndex {
public:
int pivotIndex(vector<int>& A) {
int sum= accumulate(A.begin(),A.end(),0);
int ls=0;
for(int i=0;i<A.size();i++)
{
sum-=A[i];
if(sum==ls) return i;
ls+=A[i];
}
return -1;
}
};
Time Complexity: O(𝑛), where 𝑛 is the length of array A. Space Complexity: O(1), the space used by 𝑙𝑒𝑓𝑡𝑠𝑢𝑚 and
𝑡𝑜𝑡𝑎𝑙𝑆𝑢𝑚.
Problem-82 Given two strings s and t which consist of only lowercase letters. String t is generated by randomly
shuffling string s and then adding one more letter at a random position. Find the letter that was added in t.
Example Input: s = "abcd" t = "abcde" Output: e
Explanation: 'e' is the letter that was added.
Solution: Refer to the 𝑂𝑡ℎ𝑒𝑟 𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 𝑄𝑢𝑒𝑠𝑡𝑖𝑜𝑛𝑠 section in the 𝐻𝑎𝑐𝑘𝑠 𝑜𝑛 𝐵𝑖𝑡𝑤𝑖𝑠𝑒 𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 chapter.
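For reference, a minimal sketch of the XOR idea discussed in that chapter: XOR-ing every character of 𝑠 and 𝑡 cancels each paired character, leaving only the added letter (the function name is illustrative).
char findAddedLetter(const char *s, const char *t) {
    char c = 0;
    while (*s) c ^= *s++;   // every character of s ...
    while (*t) c ^= *t++;   // ... cancels its copy in t
    return c;               // e.g., s = "abcd", t = "abcde" gives 'e'
}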
Chapter
Selection Algorithms [Medians] 12
12.1 What are Selection Algorithms?
𝑆𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛 𝑎𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 is an algorithm for finding the 𝑘𝑡ℎ smallest/largest number in a list (also called the 𝑘𝑡ℎ order
statistic). This includes finding the minimum, maximum, and median elements. For finding the 𝑘𝑡ℎ order statistic,
there are multiple solutions which provide different complexities, and in this chapter we will enumerate those
possibilities.
Problem-6 Find the 𝑘 smallest elements in an array 𝑆 of 𝑛 elements using the partitioning method.
Solution: Brute Force Approach: Scan through the numbers 𝑘 times to find the desired elements. This is the
method used in bubble sort (and selection sort): every time we find the smallest remaining element by comparing
every element. In this method, the sequence has to be traversed 𝑘 times, so the complexity is O(𝑛 × 𝑘).
Problem-7 Can we use the sorting technique for solving Problem-6?
Solution: Yes. Sort and take the first 𝑘 elements.
1. Sort the numbers.
2. Pick the first 𝑘 elements.
The time complexity calculation is trivial. Sorting of 𝑛 numbers is of O(𝑛𝑙𝑜𝑔𝑛) and picking 𝑘 elements is of O(𝑘).
The total complexity is O(𝑛𝑙𝑜𝑔𝑛 + 𝑘) = O(𝑛𝑙𝑜𝑔𝑛).
Problem-8 Can we use the 𝑡𝑟𝑒𝑒 𝑠𝑜𝑟𝑡𝑖𝑛𝑔 technique for solving Problem-6?
Solution: Yes.
1. Insert all the elements in a binary search tree.
2. Do an InOrder traversal and print 𝑘 elements which will be the smallest ones. So, we have the 𝑘 smallest
elements.
The cost of creation of a binary search tree with 𝑛 elements is O(𝑛𝑙𝑜𝑔𝑛) and the traversal up to 𝑘 elements is O(𝑘).
Hence the complexity is O(𝑛𝑙𝑜𝑔𝑛 + 𝑘) = O(𝑛𝑙𝑜𝑔𝑛).
Disadvantage: If the numbers are sorted in descending order, we will be getting a tree which will be skewed
towards the left. In that case, the construction of the tree will be 0 + 1 + 2 + ... + (𝑛 − 1) = 𝑛(𝑛 − 1)/2, which is O(𝑛²).
To escape from this, we can keep the tree balanced, so that the cost of constructing the tree will be only O(𝑛𝑙𝑜𝑔𝑛).
Problem-9 Can we improve the 𝑡𝑟𝑒𝑒 𝑠𝑜𝑟𝑡𝑖𝑛𝑔 technique for solving Problem-6?
Solution: Yes. Use a smaller tree to give the same result.
1. Take the first 𝑘 elements of the sequence to create a balanced tree of 𝑘 nodes (this will cost 𝑘𝑙𝑜𝑔𝑘).
2. Take the remaining numbers one by one, and
a. If the number is larger than the largest element of the tree, ignore it and move to the next number.
b. If the number is smaller than the largest element of the tree, remove the largest element of the
tree and add the new element. This step is to make sure that a smaller element replaces a larger
element from the tree. And of course the cost of this operation is 𝑙𝑜𝑔𝑘 since the tree is a balanced
tree of 𝑘 elements.
Once Step 2 is over, the balanced tree with 𝑘 elements will have the smallest 𝑘 elements. The only remaining task
is to print out the largest element of the tree.
Time Complexity:
1. For the first 𝑘 elements, we make the tree. Hence the cost is 𝑘𝑙𝑜𝑔𝑘.
2. For each of the remaining 𝑛 − 𝑘 elements, the cost is at most O(𝑙𝑜𝑔𝑘).
Step 2 has a complexity of (𝑛 − 𝑘) 𝑙𝑜𝑔𝑘. The total cost is 𝑘𝑙𝑜𝑔𝑘 + (𝑛 − 𝑘) 𝑙𝑜𝑔𝑘 = 𝑛𝑙𝑜𝑔𝑘 which is O(𝑛𝑙𝑜𝑔𝑘). This
bound is actually better than the ones provided earlier.
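A minimal sketch of this idea in C, using an array-based max-heap of size 𝑘 in place of the balanced tree (the heap root plays the role of the tree’s largest element; the names are illustrative):
static void siftDown(int heap[], int k, int i) {
    while (2*i + 1 < k) {
        int child = 2*i + 1;
        if (child + 1 < k && heap[child+1] > heap[child]) child++;
        if (heap[i] >= heap[child]) break;
        int tmp = heap[i]; heap[i] = heap[child]; heap[child] = tmp;
        i = child;
    }
}
// After the call, heap[0..k-1] holds the k smallest elements of A (unordered).
void kSmallest(int A[], int n, int heap[], int k) {
    for (int i = 0; i < k; i++) heap[i] = A[i];
    for (int i = k/2 - 1; i >= 0; i--) siftDown(heap, k, i);  // build max-heap: O(k)
    for (int i = k; i < n; i++)
        if (A[i] < heap[0]) {      // smaller than the current largest of the k?
            heap[0] = A[i];        // replace the largest ...
            siftDown(heap, k, 0);  // ... and restore the heap: O(logk)
        }
}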
Problem-10 Can we use the partitioning technique for solving Problem-6?
Solution: Yes.
Algorithm
1. Choose a pivot from the array.
2. Partition the array so that: 𝐴[𝑙𝑜𝑤 … 𝑝𝑖𝑣𝑜𝑡𝑝𝑜𝑖𝑛𝑡 − 1] ≤ 𝐴[𝑝𝑖𝑣𝑜𝑡𝑝𝑜𝑖𝑛𝑡] ≤ 𝐴[𝑝𝑖𝑣𝑜𝑡𝑝𝑜𝑖𝑛𝑡 + 1 … ℎ𝑖𝑔ℎ].
3. if 𝑘 < 𝑝𝑖𝑣𝑜𝑡𝑝𝑜𝑖𝑛𝑡 then it must be on the left of the pivot, so do the same method recursively on the left
part.
4. if 𝑘 = 𝑝𝑖𝑣𝑜𝑡𝑝𝑜𝑖𝑛𝑡 then it must be the pivot and print all the elements from 𝑙𝑜𝑤 to 𝑝𝑖𝑣𝑜𝑡𝑝𝑜𝑖𝑛𝑡.
5. if 𝑘 > 𝑝𝑖𝑣𝑜𝑡𝑝𝑜𝑖𝑛𝑡 then it must be on the right of pivot, so do the same method recursively on the right
part.
The top-level call would be kthSmallest = Selection(1, n, k).
int selection (int low, int high, int k) {
int pivotpoint;
if(low == high)
return S[low];
else {
pivotpoint = partition(low, high);
if(k == pivotpoint)
return S[pivotpoint]; //we can print all the elements from 𝑙𝑜𝑤 to 𝑝𝑖𝑣𝑜𝑡𝑝𝑜𝑖𝑛𝑡.
else if(k < pivotpoint)
return selection (low, pivotpoint - 1, k);
else return selection (pivotpoint + 1, high, k);
}
}
int partition (int low, int high) {
int i, j, temp, pivotitem, pivotpoint;
pivotitem = S[low];
j = low;
for (i = low + 1; i <= high; i++)
if(S[i] < pivotitem) {
j++;
temp = S[i]; S[i] = S[j]; S[j] = temp; // swap S[i] and S[j]
}
pivotpoint = j;
temp = S[low]; S[low] = S[pivotpoint]; S[pivotpoint] = temp; // move pivot into place
return pivotpoint;
}
Time Complexity: O(𝑛²) in the worst case, as with Quicksort. Although the worst case is the same as that of
Quicksort, this performs much better on the average [O(𝑛) average case].
Problem-11 Find the 𝑘𝑡ℎ smallest element in an array 𝑆 of 𝑛 elements in the best possible way.
Solution: This problem is similar to Problem-6 and all the solutions discussed for Problem-6 are valid for this
problem. The only difference is that instead of printing all the 𝑘 elements, we print only the 𝑘 𝑡ℎ element. We can
improve the solution by using the 𝑚𝑒𝑑𝑖𝑎𝑛 𝑜𝑓 𝑚𝑒𝑑𝑖𝑎𝑛𝑠 algorithm. Median is a special case of the selection algorithm.
The algorithm Selection(A, 𝑘) to find the 𝑘 𝑡ℎ smallest element from set 𝐴 of 𝑛 elements is as follows:
Algorithm: 𝑠𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛(A, k)
1. Partition 𝐴 into ⌈𝑙𝑒𝑛𝑔𝑡ℎ(𝐴)/5⌉ groups, with each group having five items (the last group may have fewer
items).
2. Sort each group separately (e.g., insertion sort).
3. Find the median of each of the ⌈𝑛/5⌉ groups and store them in some array (let us say 𝐴′).
4. Use 𝑆𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛 recursively to find the median of 𝐴′ (median of medians). Let us say the median of medians
is 𝑚.
𝑚 = 𝑆𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛(𝐴′, ⌈𝑙𝑒𝑛𝑔𝑡ℎ(𝐴)/10⌉); /* the median position of 𝐴′ */
5. Let 𝑞 = # elements of 𝐴 smaller than 𝑚;
6. If(𝑘 == 𝑞 + 1)
return 𝑚;
/* Partition with pivot */
7. Else partition 𝐴 into 𝑋 and 𝑌
• 𝑋 = {items smaller than 𝑚}
• 𝑌 = {items larger than 𝑚}
/* Next, form a subproblem */
8. If(𝑘 < 𝑞 + 1)
return Selection(X, k);
9. Else
return Selection(Y, k – (q+1));
Before developing recurrence, let us consider the representation of the input below. In the figure, each circle is an
element and each column is grouped with 5 elements. The black circles indicate the median in each group of 5
elements. As discussed, sort each column using constant time insertion sort.
[Figure: items arranged in columns of five; black circles mark each group’s median, the gray circle is the median of medians, and the items on one side of it are ≥ it.]
In the figure above the gray circled item is the median of medians (let us call this 𝑚). It can be seen that at least
1/2 of the 5-element group medians are ≤ 𝑚. Also, these 1/2 of the 5-element groups contribute 3 elements each
that are ≤ 𝑚, except 2 groups [the last group, which may contain fewer than 5 elements, and the group which
contains 𝑚]. Similarly, at least 1/2 of the 5-element groups contribute 3 elements that are ≥ 𝑚 as shown above.
The 1/2 of the 5-element groups contributing 3 elements, except 2 groups, gives: 3(𝑛/10 − 2) ≈ 3𝑛/10 − 6. The
remaining are 𝑛 − (3𝑛/10 − 6) ≈ 7𝑛/10 + 6. Since 7𝑛/10 + 6 is greater than 3𝑛/10 − 6, we need to consider
7𝑛/10 + 6 for the worst case.
Components in recurrence:
• In our selection algorithm, we choose 𝑚, which is the median of medians, to be a pivot, and partition A into
two sets 𝑋 and 𝑌. We need to select the set which gives maximum size (to get the worst case).
• The time in function 𝑆𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛 when called from procedure 𝑝𝑎𝑟𝑡𝑖𝑡𝑖𝑜𝑛. The number of keys in the input to this
call to 𝑆𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛 is 𝑛/5.
• The number of comparisons required to partition the array. This number is 𝑙𝑒𝑛𝑔𝑡ℎ(𝑆), let us say 𝑛.
We have established the following recurrence: 𝑇(𝑛) = 𝑇(𝑛/5) + Θ(𝑛) + 𝑀𝑎𝑥{𝑇(𝑋), 𝑇(𝑌)}
From the above discussion we have seen that, if we select the median of medians 𝑚 as pivot, the partition sizes
are: 3𝑛/10 − 6 and 7𝑛/10 + 6. If we select the maximum of these, then we get:
𝑇(𝑛) = 𝑇(𝑛/5) + Θ(𝑛) + 𝑇(7𝑛/10 + 6)
≈ 𝑇(𝑛/5) + Θ(𝑛) + 𝑇(7𝑛/10) + O(1)
≤ 𝑐 ⋅ 7𝑛/10 + 𝑐 ⋅ 𝑛/5 + Θ(𝑛) + O(1)
Finally, 𝑇(𝑛) = Θ(𝑛).
Problem-12 In Problem-11, we divided the input array into groups of 5 elements. The constant 5 plays an
important part in the analysis. Can we divide the input into groups of 3 and still get linear time?
Solution: In this case the modification causes the routine to take more than linear time. In the worst case, at
least half of the ⌈𝑛/3⌉ medians found in the grouping step are greater than the median of medians 𝑚, but two of
those groups contribute less than two elements larger than 𝑚. So, as a lower bound, the number of elements
larger than the pivot is at least:
2(𝑛/6 − 2) ≥ 𝑛/3 − 4
Likewise for the elements smaller than the pivot. Thus up to 𝑛 − (𝑛/3 − 4) = 2𝑛/3 + 4 elements are fed into the
recursive call to 𝑆𝑒𝑙𝑒𝑐𝑡. The recursive step that finds the median of medians runs on a problem of size 𝑛/3, and
consequently the time recurrence is:
𝑇(𝑛) = 𝑇(𝑛/3) + 𝑇(2𝑛/3 + 4) + Θ(𝑛)
Assuming that 𝑇(𝑛) is monotonically increasing, we may conclude that 𝑇(2𝑛/3 + 4) ≥ 𝑇(2𝑛/3) ≥ 2𝑇(𝑛/3), and we
can lower-bound this as 𝑇(𝑛) ≥ 3𝑇(𝑛/3) + Θ(𝑛), which is Ω(𝑛𝑙𝑜𝑔𝑛). Therefore, we cannot select 3 as the group
size.
Problem-13 As in Problem-12, can we use groups of size 7?
Solution: Following a similar reasoning, we once more modify the routine, now using groups of 7 instead of 5. In
the worst case, at least half the ⌈𝑛/7⌉ medians found in the grouping step are greater than the median of medians
𝑚, but two of those groups contribute less than four elements larger than 𝑚. So, as a lower bound, the number
of elements larger than the pivot is at least:
4(𝑛/14 − 2) ≥ 2𝑛/7 − 8
Likewise for the elements smaller than the pivot. Thus up to 𝑛 − (2𝑛/7 − 8) = 5𝑛/7 + 8 elements are fed into the
recursive call to 𝑆𝑒𝑙𝑒𝑐𝑡. The recursive step that finds the median of medians runs on a problem of size 𝑛/7, and
consequently the time recurrence is
𝑇(𝑛) = 𝑇(𝑛/7) + 𝑇(5𝑛/7 + 8) + O(𝑛)
𝑇(𝑛) ≤ 𝑐 ⋅ 𝑛/7 + 𝑐(5𝑛/7 + 8) + 𝑎𝑛, where 𝑎 is a constant
= 𝑐𝑛 − 𝑐 ⋅ 𝑛/7 + 𝑎𝑛 + 8𝑐
= (𝑎 + 𝑐)𝑛 − (𝑐 ⋅ 𝑛/7 − 8𝑐)
This is bounded above by (𝑎 + 𝑐)𝑛 provided that 𝑐 ⋅ 𝑛/7 − 8𝑐 ≥ 0. Therefore, we can select 7 as the group size.
Problem-14 Given two arrays each containing 𝑛 sorted elements, give an O(𝑙𝑜𝑔𝑛)-time algorithm to find the
median of all 2𝑛 elements.
Solution: The simple solution to this problem is to merge the two lists and then take the average of the middle
two elements (note the union always contains an even number of values). But, the merge would be Θ(𝑛), so that
doesn't satisfy the problem statement. To get 𝑙𝑜𝑔𝑛 complexity, let 𝑚𝑒𝑑𝑖𝑎𝑛𝐴 and 𝑚𝑒𝑑𝑖𝑎𝑛𝐵 be the medians of the
respective lists (which can be easily found since both lists are sorted). If 𝑚𝑒𝑑𝑖𝑎𝑛𝐴 == 𝑚𝑒𝑑𝑖𝑎𝑛𝐵, then that is the
overall median of the union and we are done. Otherwise, the median of the union must be between 𝑚𝑒𝑑𝑖𝑎𝑛𝐴 and
𝑚𝑒𝑑𝑖𝑎𝑛𝐵. Suppose that 𝑚𝑒𝑑𝑖𝑎𝑛𝐴 < 𝑚𝑒𝑑𝑖𝑎𝑛𝐵 (the opposite case is entirely similar). Then we need to find the median
of the union of the following two sets:
{𝑥 𝑖𝑛 𝐴 | 𝑥 >= 𝑚𝑒𝑑𝑖𝑎𝑛𝐴} {𝑥 𝑖𝑛 𝐵 | 𝑥 <= 𝑚𝑒𝑑𝑖𝑎𝑛𝐵}
So, we can do this recursively by resetting the 𝑏𝑜𝑢𝑛𝑑𝑎𝑟𝑖𝑒𝑠 of the two arrays. The algorithm tracks both arrays
(which are sorted) using two indices. These indices are used to access and compare the median of both arrays to
find where the overall median lies.
void findMedian(int A[], int alo, int ahi, int B[], int blo, int bhi) {
int amid = alo + (ahi-alo)/2;
int amed = A[amid];
int bmid = blo + (bhi-blo)/2;
int bmed = B[bmid];
if( ahi - alo + bhi - blo < 4) {
// Handle the boundary cases and solve the smaller problem directly in O(1) time.
return;
}
else if(amed < bmed)
findMedian(A, amid, ahi, B, blo, bmid+1);
else findMedian(A, alo, amid+1, B, bmid+1, bhi);
}
Time Complexity: O(𝑙𝑜𝑔𝑛), since we are reducing the problem size by half every time.
Problem-15 Let 𝐴 and 𝐵 be two sorted arrays of 𝑛 elements each. We can easily find the 𝑘𝑡ℎ smallest element
in 𝐴 in O(1) time by just outputting 𝐴[𝑘]. Similarly, we can easily find the 𝑘𝑡ℎ smallest element in 𝐵. Give an O(𝑙𝑜𝑔𝑘)
time algorithm to find the 𝑘𝑡ℎ smallest element overall (i.e., the 𝑘𝑡ℎ smallest in the union of 𝐴 and 𝐵).
Solution: It’s just another way of asking Problem-14.
Problem-16 Find the 𝒌 smallest elements in sorted order: Given a set of 𝑛 elements from a totally-ordered
domain, find the 𝑘 smallest elements, and list them in sorted order. Analyze the worst-case running time of the
best implementation of the approach.
Solution: Sort the numbers, and list the 𝑘 smallest.
𝑇(𝑛) = Time complexity of sort + listing 𝑘 smallest elements = Θ(𝑛𝑙𝑜𝑔𝑛) + Θ(𝑛) = Θ(𝑛𝑙𝑜𝑔𝑛).
Problem-17 For Problem-16, if we follow the approach below, then what is the complexity?
Solution: Using the priority queue data structure from heap sort, construct a min-heap over the set, and perform
extract-min 𝑘 times. Building the heap takes O(𝑛) and each extract-min costs O(𝑙𝑜𝑔𝑛), so the total is O(𝑛 + 𝑘𝑙𝑜𝑔𝑛).
Refer to the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 (𝐻𝑒𝑎𝑝𝑠) chapter for more details.
Problem-18 For Problem-16, if we follow the approach below then what is the complexity?
Find the 𝑘 𝑡ℎ -smallest element of the set, partition around this pivot element, and sort the 𝑘 smallest elements.
Solution:
𝑇(𝑛) = 𝑇𝑖𝑚𝑒 𝑐𝑜𝑚𝑝𝑙𝑒𝑥𝑖𝑡𝑦 𝑜𝑓 𝑘𝑡ℎ − 𝑠𝑚𝑎𝑙𝑙𝑒𝑠𝑡 + 𝐹𝑖𝑛𝑑𝑖𝑛𝑔 𝑝𝑖𝑣𝑜𝑡 + 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 𝑝𝑟𝑒𝑓𝑖𝑥
= Θ(𝑛) + Θ(𝑛) + Θ(𝑘𝑙𝑜𝑔𝑘) = Θ(𝑛 + 𝑘𝑙𝑜𝑔𝑘)
Since, 𝑘 ≤ 𝑛, this approach is better than Problem-16 and Problem-17.
Problem-19 Find 𝑘 nearest neighbors to the median of 𝑛 distinct numbers in O(𝑛) time.
Solution: Let us assume that the array elements are sorted. Now find the median of the 𝑛 numbers and call its
index 𝑋 (since the array is sorted, the median will be at location 𝑛/2). All we need to do is select the 𝑘 elements
with the smallest absolute differences from the median, moving from 𝑋 − 1 towards 0 and from 𝑋 + 1 towards
𝑛 − 1.
Time Complexity: Each step takes Θ(𝑛). So the total time complexity of the algorithm is Θ(𝑛).
Problem-20 Is there any other way of solving Problem-19?
Solution: Assume for simplicity that n is odd and k is even. If set A is in sorted order, the median is in position
𝑛/2 and the 𝑘 numbers in A that are closest to the median are in positions (𝑛 − 𝑘)/2 through (𝑛 + 𝑘)/2.
We first use linear time selection to find the (𝑛 − 𝑘)/2, 𝑛/2, and (𝑛 + 𝑘)/2 elements and then pass through set A
to find the numbers less than the (𝑛 + 𝑘)/2 element, greater than the (𝑛 − 𝑘)/2 element, and not equal to the
𝑛/2 element. The algorithm takes O(𝑛) time as we use linear time selection exactly three times and traverse the 𝑛
numbers in 𝐴 once.
Problem-21 Given (𝑥, 𝑦) coordinates of 𝑛 houses, where should you build a road parallel to 𝑥-axis to minimize
the construction cost of building driveways?
Solution: The road costs nothing to build. It is the driveways that cost money. The driveway cost is proportional
to its distance from the road. Obviously, they will be perpendicular. The solution is to put the street at the median
of the 𝑦 coordinates.
Problem-22 Given a big file containing billions of numbers, find the maximum 10 numbers from that file.
Solution: Refer to the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 chapter.
Problem-23 Suppose there is a milk company. The company collects milk every day from all its agents. The
agents are located at different places. To collect the milk, what is the best place to start so that the least amount
of total distance is travelled?
Solution: Starting at the median reduces the total distance travelled because it is the place which is at the center
of all the places.
Chapter
Symbol Tables 13
13.1 Introduction
Since childhood, we all have used a dictionary, and many of us have a word processor (say, Microsoft Word) which
comes with a spell checker. The spell checker is also a dictionary, but limited in scope. There are many real-world
examples of dictionaries, and a few of them are:
• Spell checker
• The data dictionary found in database management applications
• Symbol tables generated by loaders, assemblers, and compilers
• Routing tables in networking components (DNS lookup)
In computer science, we generally use the term ‘symbol table’ rather than ‘dictionary’ when referring to the abstract
data type (ADT).
Hashing Implementation
This method is important. For a complete discussion, refer to the 𝐻𝑎𝑠ℎ𝑖𝑛𝑔 chapter.
Chapter
Hashing 14
14.1 What is Hashing?
In this chapter we introduce so-called 𝑎𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑖𝑣𝑒 𝑎𝑟𝑟𝑎𝑦𝑠, that is, data structures that are similar to arrays but are
indexed not by integers but by other forms of data such as strings. One popular data structure for the
implementation of associative arrays is the hash table. To analyze the asymptotic efficiency of hash tables we have
to explore a new point of view, that of average case complexity. Hashing is a technique used for storing and
retrieving information as quickly as possible. It is used to perform optimal searches and is useful in implementing
symbol tables.
[Figure: a large universe of possible keys mapped into a small number of memory locations; only the used keys occupy slots.]
In this case the set of possible values is infinite (or at least very big). Creating a huge array and storing the counters
is not possible. That means there are a set of universal keys and limited locations in the main memory. To solve
this problem we need to somehow map all these possible keys to the possible memory locations.
From the above discussion and diagram it can be seen that we need a mapping of possible keys to one of the
available locations. As a result, using simple arrays is not the correct choice for solving the problems where the
possible keys are very big. The process of mapping the keys to available main memory locations is called ℎ𝑎𝑠ℎ𝑖𝑛𝑔.
Note: For now, do not worry about how the keys are mapped to locations. That depends on the function used for
conversions. One such simple function is 𝑘𝑒𝑦 % 𝑡𝑎𝑏𝑙𝑒 𝑠𝑖𝑧𝑒.
Direct addressing is applicable when we can afford to allocate an array with one position for every possible key.
But if we do not have enough space to allocate a location for each possible key, then we need a mechanism to
handle this case. Another way of defining the scenario is: if we have less locations and more possible keys, then
simple array implementation is not enough.
In these cases one option is to use hash tables. Hash table or hash map is a data structure that stores the keys
and their associated values, and hash table uses a hash function to map keys to their associated values. The
general convention is that we use a hash table when the number of keys actually stored is small relative to the
number of possible keys.
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each
position of the hash table, often called a 𝑠𝑙𝑜𝑡 (or a 𝑏𝑢𝑐𝑘𝑒𝑡), can hold an item and is named by an integer value
starting at 0.
For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on. Initially, the hash table
contains no items, so every slot is empty. We can implement a hash table by using a list with each element
initialized to the special value NULL.
[Figure: the universe of possible keys hashed into a fixed set of slots (buckets) numbered from 0.]
phone number 436-555-4601 hashes to slot 1. Some folding methods go one step further and reverse every other
piece before the addition. For the above example, we get 43+56+55+64+01=219 which gives 219 % 11=10.
found our entry, if not we continue the search. If we reach the end of the chain and do not find an entry with key
k, then no entry with the given key exists.
In separate chaining, each slot of the hash table is a linked list. To store an element in the hash table you must
insert it into a specific linked list. If there is any collision (i.e. two different elements have same hash value) then
store both the elements in the same linked list.
As an example, consider the following simple hash function:
ℎ(𝑘𝑒𝑦) = 𝑘𝑒𝑦 % 𝑡𝑎𝑏𝑙𝑒 𝑠𝑖𝑧𝑒
In a hash table with size 7, keys 27 and 130 would get 6 and 4 as hash indices respectively.
Slot
0
1
2
3
4 → (130, “John”)
5
6 → (27, “Ram”)
If we insert a new element (18, “Saleem”), that would also go to the fourth index as 18%7 is 4.
Slot
0
1
2
3
4 → (130, “John”) → (18, “Saleem”)
5
6 → (27, “Ram”)
The cost of a lookup is that of scanning the entries of the selected linked list for the required key. If the distribution
of the keys is sufficiently uniform, then the average cost of a lookup depends only on the average number of keys
per linked list. For this reason, chained hash tables remain effective even when the number of table entries (𝑛) is
much higher than the number of slots.
For separate chaining, the worst-case scenario is when all the entries are inserted into the same linked list. The
lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the number (𝑛) of
entries in the table.
The worst-case behavior of hashing with chaining is terrible: all 𝑛 keys hash to the same slot, creating a list of
length 𝑛. The worst-case time for searching is thus Θ(𝑛) plus the time to compute the hash function – no better
than if we used one linked list for all the elements. Clearly, hash tables are not used for their worst-case
performance.
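A minimal sketch of a chaining insert in C (the names HashNode and hashInsert are illustrative, not from the text):
#include <stdlib.h>
struct HashNode {
    int key;
    struct HashNode *next;
};
void hashInsert(struct HashNode *table[], int tablesize, int key) {
    int slot = key % tablesize;                 // h(key) = key % tablesize
    struct HashNode *node = (struct HashNode *) malloc(sizeof(struct HashNode));
    node->key = key;
    node->next = table[slot];                   // prepend to the slot's chain
    table[slot] = node;                         // a collision simply extends the list
}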
Linear Probing
The interval between probes is fixed at 1. In linear probing, we search the hash table sequentially, starting from
the original hash location. If a location is occupied, we check the next location. We wrap around from the last
table location to the first table location if necessary. The function for rehashing is the following (where 𝑛 is the
previously probed location):
𝑟𝑒ℎ𝑎𝑠ℎ(𝑘𝑒𝑦) = (𝑛 + 1) % 𝑡𝑎𝑏𝑙𝑒𝑠𝑖𝑧𝑒
One of the problems with linear probing is that table items tend to cluster together in the hash table: the table
develops groups of consecutively occupied locations, called 𝑐𝑙𝑢𝑠𝑡𝑒𝑟𝑠; the phenomenon is called 𝑐𝑙𝑢𝑠𝑡𝑒𝑟𝑖𝑛𝑔.
Clusters can get close to one another, and merge into a larger cluster. Thus, the one part of the table might be
quite dense, even though another part has relatively few items. Clustering causes long probe searches and
therefore decreases the overall efficiency.
The next location to be probed is determined by the step-size, where other step-sizes (more than one) are possible.
The step-size should be relatively prime to the table size, i.e. their greatest common divisor should be equal to 1.
If we choose the table size to be a prime number, then any step-size is relatively prime to the table size. Clustering
cannot be avoided by larger step-sizes.
Quadratic Probing
The interval between probes increases with the probe count (the increments grow linearly, so the probed indices
are described by a quadratic function). The problem of primary clustering can be eliminated if we use the
quadratic probing method. Quadratic probing is also referred to as the 𝑚𝑖𝑑 − 𝑠𝑞𝑢𝑎𝑟𝑒 method.
In quadratic probing, we start from the original hash location 𝑖. If a location is occupied, we check the locations
𝑖 + 12 , 𝑖 + 22 , 𝑖 + 32 , 𝑖 + 42 ... We wrap around from the last table location to the first table location if necessary.
The function for rehashing is the following (where 𝑛 is the original hash location and 𝑘 is the probe number):
𝑟𝑒ℎ𝑎𝑠ℎ(𝑘𝑒𝑦) = (𝑛 + 𝑘²) % 𝑡𝑎𝑏𝑙𝑒𝑠𝑖𝑧𝑒
𝐸𝑥𝑎𝑚𝑝𝑙𝑒: Let us assume that the table size is 11 (0..10) and the hash function is h(key) = key mod 11.
𝐼𝑛𝑠𝑒𝑟𝑡 𝑘𝑒𝑦𝑠:
31 𝑚𝑜𝑑 11 = 9
19 𝑚𝑜𝑑 11 = 8
2 𝑚𝑜𝑑 11 = 2
13 𝑚𝑜𝑑 11 = 2 → 2 + 1² = 3
25 𝑚𝑜𝑑 11 = 3 → 3 + 1² = 4
5 𝑚𝑜𝑑 11 = 5
24 𝑚𝑜𝑑 11 = 2 → 2 + 1², 2 + 2² = 6
21 𝑚𝑜𝑑 11 = 10
9 𝑚𝑜𝑑 11 = 9 → 9 + 1², 9 + 2² 𝑚𝑜𝑑 11, 9 + 3² 𝑚𝑜𝑑 11 = 7
Resulting table (slot: key): 0: –, 1: –, 2: 2, 3: 13, 4: 25, 5: 5, 6: 24, 7: 9, 8: 19, 9: 31, 10: 21
Even though primary clustering is avoided by quadratic probing, there is still a milder form of clustering: multiple search keys
mapped to the same hash key follow the same probe sequence, which is prolonged by repeated conflicts along it.
Both linear and quadratic probing use a probe sequence that is independent of the search key.
Double Hashing
The interval between probes is computed by another hash function. Double hashing reduces clustering in a better
way. The increments for the probing sequence are computed by using a second hash function. The second hash
function ℎ2 should be:
ℎ2(𝑘𝑒𝑦) ≠ 0 and ℎ2 ≠ ℎ1
We first probe the location ℎ1(𝑘𝑒𝑦). If the location is occupied, we probe the location ℎ1(𝑘𝑒𝑦) + ℎ2(𝑘𝑒𝑦), ℎ1(𝑘𝑒𝑦) +
2 ∗ ℎ2(𝑘𝑒𝑦), ...
𝐸𝑥𝑎𝑚𝑝𝑙𝑒:
Table size is 11 (0..10)
Hash functions: assume ℎ1(𝑘𝑒𝑦) = 𝑘𝑒𝑦 𝑚𝑜𝑑 11 and ℎ2(𝑘𝑒𝑦) = 7 − (𝑘𝑒𝑦 𝑚𝑜𝑑 7)
𝐼𝑛𝑠𝑒𝑟𝑡 𝑘𝑒𝑦𝑠:
58 𝑚𝑜𝑑 11 = 3
14 𝑚𝑜𝑑 11 = 3 → 3 + 7 = 10
91 𝑚𝑜𝑑 11 = 3 → 3 + 7, 3 + 2 ∗ 7 𝑚𝑜𝑑 11 = 6
25 𝑚𝑜𝑑 11 = 3 → 3 + 3, 3 + 2 ∗ 3 = 9
Resulting table (slot: key): 3: 58, 6: 91, 9: 25, 10: 14 (all other slots empty)
14.13 Comparison of Collision Resolution Techniques
Comparisons: Linear Probing vs. Double Hashing
The choice between linear probing and double hashing depends on the cost of computing the hash function and
on the load factor [number of elements per slot] of the table. Both use few probes, but double hashing takes more
time per probe because it must compute a second hash function, which can be expensive for long keys.
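For reference, the probe sequences discussed above can be written side by side as below (a sketch using the running example’s hash functions; 𝑖 is the probe number, and the function names are illustrative):
int linearProbe(int key, int i, int size)    { return (key % size + i) % size; }
int quadraticProbe(int key, int i, int size) { return (key % size + i * i) % size; }
int doubleHashProbe(int key, int i, int size) {
    int h2 = 7 - (key % 7);            // the second hash function from the example
    return (key % size + i * h2) % size;
}
Only the double-hashing step depends on the key itself, which is why it breaks up clusters best.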
Static Hashing
If the data is fixed then static hashing is useful. In static hashing, the set of keys is kept fixed and given in advance,
and the number of primary pages in the directory are kept fixed.
Dynamic Hashing
If the data is not fixed, static hashing can give bad performance, in which case dynamic hashing is the alternative,
in which case the set of keys can change dynamically.
How it works?
A Bloom filter starts off with a bit array initialized to zero. To store a data value, we simply apply 𝑘 different hash
functions and treat the resulting 𝑘 values as indices in the array, and we set each of the 𝑘 array elements to 1.
We repeat this for every element that we encounter.
Now suppose an element turns up and we want to know if we have seen it before. What we do is apply the 𝑘 hash
functions and look up the indicated array elements. If any of them are 0 we can be 100% sure that we have never
encountered the element before - if we had, the bit would have been set to 1. However, even if all of them are one,
we still can't conclude that we have seen the element before because all of the bits could have been set by the 𝑘
hash functions applied to multiple other elements. All we can conclude is that it is likely that we have encountered
the element before.
[Figure: Element1 and Element2 are each hashed by HashFunction1 and HashFunction2, and the corresponding bits in the bit vector are set to 1.]
Now that the bits in the bit vector have been set for 𝐸𝑙𝑒𝑚𝑒𝑛𝑡1 and 𝐸𝑙𝑒𝑚𝑒𝑛𝑡2, we can query the Bloom filter to tell
us if something has been seen before. The element is hashed, but instead of setting the bits, this time a check is
done: if the bits that would have been set are already set, the Bloom filter returns true, indicating that the element
has been seen before.
Note that it is not possible to remove an element from a Bloom filter. The reason is simply that we can't unset a
bit that appears to belong to an element because it might also be set by another element.
If the bit array is mostly empty, i.e., set to zero, and the 𝑘 hash functions are independent of one another, then
the probability of a false positive (i.e., concluding that we have seen a data item when we actually haven't) is low.
For example, if there are only 𝑘 bits set, we can conclude that the probability of a false positive is very close to
zero, as the only possibility of error is that we entered a data item that produced the same 𝑘 hash values – which
is unlikely as long as the hash functions are independent.
As the bit array fills up, the probability of a false positive slowly increases. Of course when the bit array is full,
every element queried is identified as having been seen before. So clearly we can trade space for accuracy as well
as for time.
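For reference, the standard estimate (assuming independent hash functions): with 𝑚 bits, 𝑘 hash functions, and 𝑛 inserted elements, the false-positive probability is approximately (1 − 𝑒^(−𝑘𝑛/𝑚))^𝑘, which is minimized by choosing 𝑘 ≈ (𝑚/𝑛) 𝑙𝑛 2.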
One-time removal of an element from a Bloom filter can be simulated by having a second Bloom filter that contains
elements that have been removed. However, false positives in the second filter become false negatives in the
composite filter, which may be undesirable. In this approach, re-adding a previously removed item is not possible,
as one would have to remove it from the 𝑟𝑒𝑚𝑜𝑣𝑒𝑑 filter.
Space advantages
While risking false positives, Bloom filters have a strong space advantage over other data structures for
representing sets, such as self-balancing binary search trees, tries, hash tables, or simple arrays or linked lists of
the entries. Most of these require storing at least the data items themselves, which can require anywhere from a
small number of bits, for small integers, to an arbitrary number of bits, such as for strings (tries are an exception,
since they can share storage between elements with equal prefixes). Linked structures incur an additional linear
space overhead for pointers.
However, if the number of potential values is small and many of them can be in the set, the Bloom filter is easily
surpassed by the deterministic bit array, which requires only one bit for each potential element.
Time advantages
Bloom filters also have the unusual property that the time needed either to add items or to check whether an item
is in the set is a fixed constant, O(𝑘), completely independent of the number of items already in the set. No other
constant-space set data structure has this property, but the average access time of sparse hash tables can make
them faster in practice than some Bloom filters. In a hardware implementation, however, the Bloom filter shines
because its k lookups are independent and can be parallelized.
Implementation
Refer to 𝑃𝑟𝑜𝑏𝑙𝑒𝑚𝑠 𝑆𝑒𝑐𝑡𝑖𝑜𝑛.
s[last++] = s[current];
HashInsert(h, s[current]);
}
}
s[last] = '\0';
}
Time Complexity: Θ(𝑛) on average. Space Complexity: O(𝑛).
Problem-5 Given two arrays of unordered numbers, check whether both arrays have the same set of
numbers?
Solution: Let us assume that two given arrays are A and B. A simple solution to the given problem is: for each
element of A, check whether that element is in B or not. A problem arises with this approach if there are duplicates.
For example consider the following inputs:
𝐴 = {2,5,6,8,10,2,2}
𝐵 = {2,5,5,8,10,5,6}
The above algorithm gives the wrong result, because for each element of A there is an element in B as well. But if
we look at the number of occurrences, they are not the same. We can solve this by moving each element that has
already been matched to the end of the list. That means, if we find an element in B, we move that element to the
end of B, so subsequent searches will not find those elements again. The disadvantage of this is that it needs
extra swaps. Time Complexity of this approach is O(𝑛²), since for each element of A we have to scan B.
Problem-6 Can we improve the time complexity of Problem-5?
Solution: Yes. To improve the time complexity, let us assume that we have sorted both the lists. Since the sizes
of both arrays are n, we need O(𝑛 log 𝑛) time for sorting them. After sorting, we just need to scan both the arrays
with two pointers and see whether they point to the same element every time, and keep moving the pointers until
we reach the end of the arrays.
Time Complexity of this approach is O(𝑛 log 𝑛). This is because we need O(𝑛 log 𝑛) for sorting the arrays. After
sorting, we need O(𝑛) time for scanning but it is less compared to O(𝑛 log 𝑛).
Problem-7 Can we further improve the time complexity of Problem-5?
Solution: Yes, by using a hash table. For this, consider the following algorithm.
Algorithm:
• Construct the hash table with array 𝐴 elements as keys.
• While inserting the elements, keep track of the number frequency for each number. That means, if there
are duplicates, then increment the counter of that corresponding key.
• After constructing the hash table for 𝐴’𝑠 elements, now scan the array 𝐵.
• For each occurrence of 𝐵’𝑠 elements reduce the corresponding counter values.
• At the end, check whether all counters are zero or not.
• If all counters are zero, then both arrays are the same otherwise the arrays are different.
Time Complexity: O(𝑛) for scanning the arrays. Space Complexity: O(𝑛) for hash table.
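A minimal sketch of this counting idea, assuming for brevity that all values lie in [0, RANGE) so a direct-address table can stand in for the hash table (a real hash table works the same way; the names are illustrative):
#include <string.h>
#define RANGE 1024
int sameMultiset(int A[], int B[], int n) {
    int count[RANGE];
    memset(count, 0, sizeof(count));
    for (int i = 0; i < n; i++) count[A[i]]++;   // count A's elements
    for (int i = 0; i < n; i++) count[B[i]]--;   // cancel them with B's elements
    for (int v = 0; v < RANGE; v++)
        if (count[v] != 0) return 0;             // some frequency did not match
    return 1;                                    // same elements with same frequencies
}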
Problem-8 Given a list of number pairs; if 𝑝𝑎𝑖𝑟(𝑖, 𝑗) exists, and 𝑝𝑎𝑖𝑟 (𝑗, 𝑖) exists, report all such pairs. For
example, in {{1,3}, {2,6}, {3,5}, {7,4}, {5,3}, {8,7}}, we see that {3,5} and {5,3} are present. Report this pair when you
encounter {5,3}. We call such pairs ‘symmetric pairs’. So, give an efficient algorithm for finding all such pairs.
Solution: By using hashing, we can solve this problem in just one scan. Consider the following algorithm.
Algorithm:
• Read the pairs of elements one by one and insert them into the hash table. For each pair, consider the
first element as key and the second element as value.
• While inserting each pair (𝑖, 𝑗), look up 𝑗 in the hash table and check whether its stored value is 𝑖.
• If it is, then a symmetric pair exists: output that pair.
• Otherwise, insert the current pair into the hash table: use the first number of the pair as the key and
the second number as the value.
• By the time we complete the scanning of all pairs, we have output all the symmetric pairs.
Time Complexity: O(𝑛) for scanning the arrays. Note that we are doing a scan only of the input. Space Complexity:
O(𝑛) for hash table.
Problem-9 Given a singly linked list, check whether it has a loop in it or not.
Solution: Using Hash Tables
Algorithm:
• Traverse the linked list nodes one by one.
• Check if the node’s address is there in the hash table or not.
• If it is already there in the hash table, that indicates we are visiting a node which was already visited. This
is possible only if the given linked list has a loop in it.
• If the address of the node is not there in the hash table, then insert that node’s address into the hash
table.
• Continue this process until we reach the end of the linked list or we find the loop.
Time Complexity: O(𝑛) for scanning the linked list. Note that we are doing a scan only of the input. Space
Complexity: O(𝑛) for hash table.
Note: for an efficient solution, refer to the 𝐿𝑖𝑛𝑘𝑒𝑑 𝐿𝑖𝑠𝑡𝑠 chapter.
Problem-10 Given an array of 101 elements. Out of them 50 elements are distinct, 24 elements are repeated 2
times, and one element is repeated 3 times. Find the element that is repeated 3 times in O(1).
Solution: Using Hash Tables
Algorithm:
• Scan the input array one by one.
• Check if the element is already there in the hash table or not.
• If it is already there in the hash table, increment its counter value [this indicates the number of
occurrences of the element].
• If the element is not there in the hash table, insert that node into the hash table with counter value 1.
• Continue this process until reaching the end of the array.
Time Complexity: O(𝑛), since we scan the input array once. Space Complexity: O(𝑛), for the hash table.
Note: For an efficient solution refer to the 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-11 Given 𝑚 sets of integers that have 𝑛 elements in them, provide an algorithm to find an element
which appeared in the maximum number of sets?
Solution: Using Hash Tables
Algorithm:
• Scan the input sets one by one.
• For each element keep track of the counter. The counter indicates the frequency of occurrences in all the
sets.
• After completing the scan of all the sets, select the one which has the maximum counter value.
Time Complexity: O(𝑚𝑛), because we need to scan all the sets. Space Complexity: O(𝑚𝑛), for hash table. Because,
in the worst case all the elements may be different.
Problem-12 Given two sets 𝐴 and 𝐵, and a number 𝐾, Give an algorithm for finding whether there exists a pair
of elements, one from 𝐴 and one from 𝐵, that add up to 𝐾.
Solution: For simplicity, let us assume that the size of 𝐴 is 𝑚 and the size of 𝐵 is 𝑛.
Algorithm:
• Select the set which has minimum elements.
• For the selected set create a hash table. We can use both key and value as the same.
• Now scan the second array and check whether (𝐾-𝑠𝑒𝑙𝑒𝑐𝑡𝑒𝑑 𝑒𝑙𝑒𝑚𝑒𝑛𝑡) exists in the hash table or not.
• If it exists then return the pair of elements.
• Otherwise continue until we reach the end of the set.
Time Complexity: O(𝑀𝑎𝑥(𝑚, 𝑛)), because we are doing two scans.
Space Complexity: O(𝑀𝑖𝑛(𝑚, 𝑛)), for hash table. We can select the small set for creating the hash table.
Problem-13 Give an algorithm to remove the specified characters from a given string which are given in
another string?
Solution: For simplicity, let us assume that the maximum number of different characters is 256. First we create
an auxiliary array initialized to 0. Scan the characters to be removed, and for each of those characters we set the
value to 1, which indicates that we need to remove that character.
After initialization, scan the input string, and for each of the characters, we check whether that character needs
to be deleted or not. If the flag is set then we simply skip to the next character, otherwise we keep the character
in the input string. Continue this process until we reach the end of the input string. All these operations we can
do in-place as given below.
break;
}
}
if(i==len)
printf("No non-repeated characters");
return 0;
}
Time Complexity: We have O(𝑛) to create the hash table and another O(𝑛) to read the entries of the hash table. So
the total time is O(𝑛) + O(𝑛) = O(2𝑛) ≈ O(𝑛). Space Complexity: O(1), since the count array has a fixed size (256).
Problem-16 Given a string, give an algorithm for finding the first repeating letter in a string?
Solution: The solution to this problem is somewhat similar to Problem-13 and Problem-15. The only difference is,
instead of scanning the hash table twice we can give the answer in just one scan. This is because while inserting
into the hash table we can see whether that element already exists or not. If it already exists then we just need to
return that character.
char firstRepeatedCharUsinghash( char * str ) {
int i, len=strlen(str);
int count[256]; // additional array
for(i=0;i<256;++i) // initialize all 256 counters, not just the first len of them
count[i] = 0;
for(i=0; i<len; ++i) {
if(count[(unsigned char)str[i]]==1) {
printf("%c",str[i]);
break;
}
else count[(unsigned char)str[i]]++;
}
if(i==len)
printf("No Repeated Characters");
return 0;
}
Time Complexity: We have O(𝑛) for scanning and creating the hash table. Note that we need only one scan for this
problem. So the total time is O(𝑛). Space Complexity: O(1), since the count array has a fixed size (256).
Problem-17 Given an array of 𝑛 numbers, create an algorithm which displays all pairs whose sum is 𝑆.
Solution: This problem is similar to Problem-12. But instead of using two sets we use only one set.
Algorithm:
• Scan the elements of the input array one by one and create a hash table. Both key and value can be the
same.
• After creating the hash table, again scan the input array and check whether (𝑆 − 𝑠𝑒𝑙𝑒𝑐𝑡𝑒𝑑 𝑒𝑙𝑒𝑚𝑒𝑛𝑡) exists in
the hash table or not.
• If it exists then return the pair of elements.
• Otherwise continue and read all the elements of the array.
Time Complexity: We have O(𝑛) to create the hash table and another O(𝑛) to read the entries of the hash table.
So the total time is O(𝑛) + O(𝑛) = O(2𝑛) ≈O(𝑛). Space Complexity: O(𝑛) for keeping the count values.
Problem-18 Is there any other way of solving Problem-17?
Solution: Yes. The alternative solution to this problem involves sorting. First sort the input array. After sorting,
use two pointers, one at the starting and another at the ending. Each time add the values of both the indexes and
see if their sum is equal to 𝑆. If they are equal then print that pair. Otherwise increase the left pointer if the sum
is less than S and decrease the right pointer if the sum is greater than 𝑆.
Time Complexity: Time for sorting + Time for scanning = O(𝑛𝑙𝑜𝑔𝑛) + O(𝑛) ≈ O(𝑛𝑙𝑜𝑔𝑛). Space Complexity: O(1).
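A minimal sketch of this two-pointer scan in C, using the library qsort (the names are illustrative):
#include <stdio.h>
#include <stdlib.h>
static int cmpInt(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);   // overflow-safe comparison
}
void printPairsWithSum(int A[], int n, int S) {
    qsort(A, n, sizeof(int), cmpInt);
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int sum = A[lo] + A[hi];
        if (sum == S) { printf("(%d, %d)\n", A[lo], A[hi]); lo++; hi--; }
        else if (sum < S) lo++;   // need a bigger sum
        else hi--;                // need a smaller sum
    }
}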
Problem-19 We have a file with millions of lines of data. Only two lines are identical; the rest are unique. Each
line is so long that it may not even fit in the memory. What is the most efficient solution for finding the identical
lines?
Solution: Since a complete line may not fit into the main memory, read the line partially and compute the hash
from that partial line. Then read the next part of the line and compute the hash. This time use the previous hash
also while computing the new hash value. Continue this process until we find the hash for the complete line. Do
this for each line and store all the hash values in a file [or maintain a hash table of these hashes]. If at any point
you get same hash value, read the corresponding lines part by part and compare.
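A minimal sketch of such incremental hashing, using 64-bit FNV-1a as one reasonable choice (the text does not fix a particular hash function); each chunk of a long line is fed through the update function, carrying the hash forward, so the full line never has to fit in memory:
#include <stdint.h>
#include <stddef.h>
uint64_t fnv1aUpdate(uint64_t h, const char *chunk, size_t len) {
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)chunk[i];
        h *= 1099511628211ULL;        // 64-bit FNV prime
    }
    return h;
}
// Usage: start with h = 14695981039346656037ULL (the FNV offset basis), then
// call h = fnv1aUpdate(h, part, partLen) for each successive part of the line.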
return 0;
}
int checkElementBloom(struct Bloom *blm, const char *s){
for(int n=0; n<blm→nHashFunctions; ++n) {
if(!(GETBLOOMBIT(blm→bloomArray, blm→funcsArray[n](s)%blm→bloomArraySize))) return 0;
}
return 1;
}
unsigned int shiftAddXORHash(const char *key){
unsigned int h=0;
while(*key) h^=(h<<5)+(h>>2)+(unsigned char)*key++;
return h;
}
unsigned int XORHash(const char *key){
unsigned int h=0;
while(*key) h^=*key++;
return h;
}
int test(){
FILE *fp;
char line[1024];
char *p;
struct Bloom *blm;
if(!(blm=createBloom(1500000, 2, shiftAddXORHash, XORHash))) {
fprintf(stderr, "ERROR: Could not create Bloom filter\n");
return -1;
}
if(!(fp=fopen("path", "r"))) {
fprintf(stderr, "ERROR: Could not open file\n");
return -1;
}
while(fgets(line, 1024, fp)) {
if((p=strchr(line, '\r'))) *p='\0';
if((p=strchr(line, '\n'))) *p='\0';
addElementBloom(blm, line);
}
fclose(fp);
while(fgets(line, 1024, stdin)) {
if((p=strchr(line, '\r'))) *p='\0';
if((p=strchr(line, '\n'))) *p='\0';
p=strtok(line, " \t,.;:\r\n?!-/()");
while(p) {
if(!checkElementBloom(blm, p)) {
printf("No match for word \"%s\"\n", p);
}
p=strtok(NULL, " \t,.;:\r\n?!-/()");
}
}
deleteBloom(blm);
return 1;
}
Problem-22 Given a hash table with size=11 entries and the following hash function ℎ1 and step function ℎ2 :
ℎ1 (𝑘𝑒𝑦) = 𝑘𝑒𝑦 % size
ℎ2 (𝑘𝑒𝑦) = {key % (size-1)} + 1
Insert the keys {22, 1, 13, 11, 24, 33, 18, 42, 31} in the given order (from left to right) to the hash table
using each of the following hash methods:
o Chaining with ℎ1 [ℎ(𝑘𝑒𝑦) = ℎ1 (𝑘𝑒𝑦)]
o Linear-Probing with ℎ1 [ℎ(𝑘𝑒𝑦, 𝑖) = (ℎ1 (𝑘𝑒𝑦) + 𝑖) % size]
o Double-Hashing with ℎ1 as the hash function and ℎ2 as the step function [ℎ(𝑘𝑒𝑦, 𝑖) = (ℎ1 (𝑘𝑒𝑦) +
iℎ2 (𝑘𝑒𝑦)) % size].
Solution:
Slot  Chaining        Linear Probing  Double Hashing
0     33 → 11 → 22    22              22
1     1               1               1
2     24 → 13         13              13
3     –               11              –
4     –               24              11
5     –               33              18
6     –               –               31
7     18              18              24
8     –               –               33
9     31 → 42         42              42
10    –               31              –
Chapter
String Algorithms 15
15.1 Introduction
To understand the importance of string algorithms let us consider the case of entering the URL (Uniform Resource
Locator) in any browser (say, Internet Explorer, Firefox, or Google Chrome). You will observe that after typing the
prefix of the URL, a list of all possible URLs is displayed. That means, the browsers are doing some internal
processing and giving us the list of matching URLs. This technique is sometimes called 𝑎𝑢𝑡𝑜 − 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑖𝑜𝑛.
Similarly, consider the case of entering the directory name in the command line interface (in both 𝑊𝑖𝑛𝑑𝑜𝑤𝑠 and
𝑈𝑁𝐼𝑋). After typing the prefix of the directory name, if we press the 𝑡𝑎𝑏 button, we get a list of all matched directory
names available. This is another example of auto completion.
In order to support these kinds of operations, we need a data structure which stores the string data efficiently. In
this chapter, we will look at the data structures that are useful for implementing string algorithms.
We start our discussion with the basic problem of strings: given a string, how do we search a substring (pattern)?
This is called a 𝑠𝑡𝑟𝑖𝑛𝑔 𝑚𝑎𝑡𝑐ℎ𝑖𝑛𝑔 problem. After discussing various string matching algorithms, we will look at
different data structures for storing strings.
return -1;
}
Time Complexity: O((𝑛 − 𝑚 + 1) × 𝑚) ≈O(𝑛 × 𝑚). Space Complexity: O(1).
Finite Automata
A finite automaton F is a 5-tuple (𝑄, 𝑞0 , 𝐴, Σ, 𝛿), where
• 𝑄 is a finite set of states
• q0 ∈ 𝑄 is the start state
Matching Algorithm
Now, let us concentrate on the matching algorithm.
• For a given pattern 𝑃[0. . 𝑚 − 1], first we need to build a finite automaton 𝐹
o The state set is 𝑄 = {0, 1, 2, … , 𝑚}
o The start state is 0
o The only accepting state is 𝑚
o Time to build 𝐹 can be large if Σ is large
• Scan the text string 𝑇[0. . 𝑛 − 1] to find all occurrences of the pattern 𝑃[0. . 𝑚 − 1]
• String matching is efficient: Θ(𝑛)
o Each character is examined exactly once
o Constant time for each character
o But the time to compute 𝛿 (transition function) is O(𝑚|Σ|). This is because 𝛿 has O(𝑚|Σ|) entries.
If we assume |Σ| is constant then the complexity becomes O(𝑚).
Algorithm:
//Input: Text string T[0..n-1], pattern length m, transition function δ of F
//Goal: All valid shifts displayed
FiniteAutomataStringMatcher(char T[], int n, int m, F, δ) {
int q = 0;
for (int i = 0; i < n; i++) {
q = δ(q, T[i]);
if(q == m)
printf("Pattern occurs with shift: %d", i - m + 1);
}
}
Time Complexity: O(𝑛).
The prefix function 𝐹 stores the knowledge about how the pattern matches against shifts of itself. This information
can be used to avoid useless shifts of the pattern 𝑃. It means that this table can be used for avoiding backtracking
on the string 𝑇.
Prefix Table
int F[]; //assume F is a global array
void PrefixTable(char P[], int m) {
int i=1, j=0;
F[0]=0;
while(i<m) {
if(P[i]==P[j]) {
F[i]=j+1;
i++;
j++;
}
else if(j>0)
j=F[j-1];
else {
F[i]=0;
i++;
}
}
}
As an example, assume that 𝑃 = 𝑎 𝑏 𝑎 𝑏 𝑎 𝑐 𝑎. For this pattern, let us follow the step-by-step instructions for filling
the prefix table F. Initially: 𝑚 = 𝑙𝑒𝑛𝑔𝑡ℎ[𝑃] = 7, 𝐹[0] = 0 and 𝐹[1] = 0.
Step 1: 𝑖 = 1, 𝑗 = 0, F[1] = 0 0 1 2 3 4 5 6
P a b a b a c a
F 0 0
Step 2: 𝑖 = 2, 𝑗 = 0, F[2] = 1 0 1 2 3 4 5 6
P a b a b a c a
F 0 0 1
Step 3: 𝑖 = 3, 𝑗 = 1, F[3] = 2 0 1 2 3 4 5 6
P a b a b a c a
F 0 0 1 2
Step 4: 𝑖 = 4, 𝑗 = 2, F[4] = 3 0 1 2 3 4 5 6
P a b a b a c a
F 0 0 1 2 3
Step 5: 𝑖 = 5, 𝑗 = 3, F[5] = 0 0 1 2 3 4 5 6
P a b a b a c a
F 0 0 1 2 3 0
Step 6: 𝑖 = 6, 𝑗 = 0, F[6] = 1 0 1 2 3 4 5 6
P a b a b a c a
F 0 0 1 2 3 0 1
At this step the filling of the prefix table is complete.
Matching Algorithm
The KMP algorithm takes pattern 𝑃, string 𝑇 and prefix function 𝐹 as input, and finds a match of 𝑃 in 𝑇.
int KMP(char T[], int n, char P[], int m) {
int i=0,j=0;
PrefixTable(P,m);
while(i<n) {
if(T[i]==P[j]) {
if(j==m-1)
return i-j;
else {
i++;
j++;
}
}
else if(j>0)
j=F[j-1];
else i++;
}
return -1;
}
Time Complexity: O(𝑚 + 𝑛), where 𝑚 is the length of the pattern and 𝑛 is the length of the text to be searched.
Space Complexity: O(𝑚).
Now, to understand the process let us go through an example. Assume that 𝑇 = 𝑏 𝑎 𝑐 𝑏 𝑎 𝑏 𝑎 𝑏 𝑎 𝑏 𝑎 𝑐 𝑎 𝑐 𝑎 & 𝑃 =
𝑎 𝑏 𝑎 𝑏 𝑎 𝑐 𝑎. Since we have already filled the prefix table, let us use it and go to the matching algorithm. Initially:
𝑛 = 𝑠𝑖𝑧𝑒 𝑜𝑓 𝑇 = 15; 𝑚 = 𝑠𝑖𝑧𝑒 𝑜𝑓 𝑃 = 7.
Step 1: 𝑖 = 0, 𝑗 = 0, comparing 𝑃[0] with 𝑇[0]. 𝑃[0] does not match with 𝑇[0]. 𝑃 will be shifted one position to the
right.
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 2: 𝑖 = 1, 𝑗 = 0, comparing 𝑃[0] with 𝑇[1]. 𝑃[0] matches with 𝑇[1]. Since there is a match, 𝑃 is not shifted.
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 3: 𝑖 = 2, 𝑗 = 1, comparing 𝑃[1] with 𝑇[2]. 𝑃[1] does not match with 𝑇[2]. Backtracking on 𝑃, comparing 𝑃[0]
and 𝑇[2].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 4: 𝑖 = 3, 𝑗 = 0, comparing 𝑃[0] with 𝑇[3]. 𝑃[0] does not match with 𝑇[3].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 5: 𝑖 = 4, 𝑗 = 0, comparing 𝑃[0] with 𝑇[4]. 𝑃[0] matches with 𝑇[4].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 6: 𝑖 = 5, 𝑗 = 1, comparing 𝑃[1] with 𝑇[5]. 𝑃[1] matches with T[5].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 7: 𝑖 = 6, 𝑗 = 2, comparing 𝑃[2] with 𝑇[6]. 𝑃[2] matches with 𝑇[6].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 8: 𝑖 = 7, 𝑗 = 3, comparing 𝑃[3] with 𝑇[7]. 𝑃[3] matches with 𝑇[7].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 9: 𝑖 = 8, 𝑗 = 4, comparing 𝑃[4] with 𝑇[8]. 𝑃[4] matches with 𝑇[8].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 10: 𝑖 = 9, 𝑗 = 5, comparing 𝑃[5] with 𝑇[9]. 𝑃[5] does not match with 𝑇[9]. Backtracking on 𝑃, comparing 𝑃[4]
with 𝑇[9] because after mismatch 𝑗 = 𝐹[4] = 3.
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Comparing 𝑃[3] with 𝑇[9].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 11: 𝑖 = 10, 𝑗 = 4, comparing 𝑃[4] with 𝑇[10]. 𝑃[4] matches with 𝑇[10].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 12: 𝑖 = 11, 𝑗 = 5, comparing 𝑃[5] with 𝑇[11]. 𝑃[5] matches with 𝑇[11].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Step 13: 𝑖 = 12, 𝑗 = 6, comparing 𝑃[6] with 𝑇[12]. 𝑃[6] matches with 𝑇[12].
𝑇 b a c b a b a b a b a c a c a
𝑃 a b a b a c a
Pattern 𝑃 has been found to completely occur in string 𝑇. The total number of shifts that took place for the match
to be found are: 𝑖 – 𝑚 = 13 – 7 = 6 shifts.
Notes:
• KMP performs the comparisons from left to right
• KMP algorithm needs a preprocessing (prefix function) which takes O(𝑚) space and time complexity
• Searching takes O(𝑛 + 𝑚) time complexity (does not depend on alphabet size)
15.11 Tries
Now, let us see the alternative representation that reduces the time complexity of the search operation. The name
𝑡𝑟𝑖𝑒 is taken from the middle letters of the word re“trie”val.
What is a Trie?
A 𝑡𝑟𝑖𝑒 is a tree and each node in it contains the number of pointers equal to the number of characters of the
alphabet. For example, if we assume that all the strings are formed with English alphabet characters “𝑎” to “𝑧”
then each node of the trie contains 26 pointers. A trie data structure can be declared as:
struct TrieNode {
    char data;                   // the character stored at this node
    int is_End_Of_String;        // 1 if the path from the root to this node forms a complete string
    struct TrieNode *child[26];  // pointers to child trie nodes
};
Suppose we want to store the strings “𝑎”, “𝑎𝑙𝑙”, “𝑎𝑙𝑠”, and “𝑎𝑠”; the 𝑡𝑟𝑖𝑒 for these strings will look like:
[Figure: the root has an ‘a’ child with is_End_Of_String = 1; that node has ‘l’ and ‘s’ children. The ‘s’ node (for “as”) is marked as end of string, and the ‘l’ node in turn has ‘l’ and ‘s’ children (for “all” and “als”), both marked as end of string.]
Why Tries?
The tries can insert and find strings in O(𝐿) time (where 𝐿 represents the length of a single word). This is much
faster than hash table and binary search tree representations.
Trie Declaration
The structure of the TrieNode has data (char), is_End_Of_String (boolean), and has a collection of child nodes
(Collection of TrieNodes). It also has one more method called subNode(char). This method takes a character as
argument and will return the child node of that character type if that is present. The basic element - TrieNode of
a TRIE data structure looks like this:
struct TrieNode {
    char data;
    int is_End_Of_String;
    struct TrieNode *child[26];
};
struct TrieNode *subNode(struct TrieNode *root, char c) {
    if (root != NULL) {
        for (int i = 0; i < 26; i++) {
            if (root->child[i] != NULL && root->child[i]->data == c)
                return root->child[i];
        }
    }
    return NULL;
}
Now that we have defined our TrieNode, let’s go ahead and look at the other operations of TRIE. Fortunately, the
TRIE data structure is simple to implement since it has two major methods: insert() and search(). Let’s look at the
elementary implementation of both these methods.
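A minimal sketch of both methods, consistent with the TrieNode declaration above (assuming a lowercase ‘a’ to ‘z’ alphabet and NULL for absent children; zero-initialized allocation keeps new nodes clean):
#include <stdlib.h>
// Insert 'word' into the trie rooted at 'root'.
void insertInTrie(struct TrieNode *root, char *word) {
    for (; *word; word++) {
        int index = *word - 'a';
        if (root->child[index] == NULL) {
            // calloc zeroes the node, so all children start as NULL
            root->child[index] = (struct TrieNode *) calloc(1, sizeof(struct TrieNode));
            root->child[index]->data = *word;
        }
        root = root->child[index];
    }
    root->is_End_Of_String = 1;   // mark the end of a complete word
}
// Return 1 if 'word' was inserted as a complete string, 0 otherwise.
int searchInTrie(struct TrieNode *root, char *word) {
    for (; *word; word++) {
        int index = *word - 'a';
        if (root->child[index] == NULL)
            return 0;             // the path breaks: the word is absent
        root = root->child[index];
    }
    return root->is_End_Of_String;
}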
Ternary Search Trees (TSTs)
In a ternary search tree, each node contains a character, an end-of-string indicator, and three pointers:
struct TSTNode {
    char data;
    int is_End_Of_String;
    struct TSTNode *left;
    struct TSTNode *eq;
    struct TSTNode *right;
};
The Ternary Search Tree (TST) uses three pointers:
• The 𝑙𝑒𝑓𝑡 pointer points to the TST containing all the strings which are alphabetically less than 𝑑𝑎𝑡𝑎.
• The 𝑟𝑖𝑔ℎ𝑡 pointer points to the TST containing all the strings which are alphabetically greater than 𝑑𝑎𝑡𝑎.
• The 𝑒𝑞 pointer points to the TST containing all the strings which are alphabetically equal to 𝑑𝑎𝑡𝑎. That
means, if we want to search for a string, and if the current character of the input string and the 𝑑𝑎𝑡𝑎 of
current node in TST are the same, then we need to proceed to the next character in the input string and
search it in the subtree which is pointed by 𝑒𝑞. A search routine based on this idea is sketched right after this list.
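The following is a minimal sketch of such a search (assuming the TSTNode declaration above; absent subtrees are NULL):
int searchInTST(struct TSTNode *root, char *word) {
    if (root == NULL)
        return 0;                           // ran off the tree: the word is absent
    if (*word < root->data)                 // current character is smaller: go left
        return searchInTST(root->left, word);
    else if (*word > root->data)            // current character is larger: go right
        return searchInTST(root->right, word);
    else {                                  // characters match: follow eq
        if (*(word + 1) == '\0')            // last character of the word
            return root->is_End_Of_String;
        return searchInTST(root->eq, word + 1);
    }
}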
As an example, start by inserting the word 𝑏𝑜𝑎𝑡𝑠 into an empty TST:
[Figure: a single chain of 𝑒𝑞 pointers ‘b’ → ‘o’ → ‘a’ → ‘t’ → ‘s’, with is_End_Of_String = 1 only at the ‘s’ node; all 𝑙𝑒𝑓𝑡 and 𝑟𝑖𝑔ℎ𝑡 pointers are None]
Now if we want to insert the string 𝑏𝑜𝑎𝑡, then the TST becomes [the only change is setting the 𝑖𝑠𝐸𝑛𝑑𝑂𝑓𝑆𝑡𝑟𝑖𝑛𝑔 flag
of “𝑡” node to 1]:
[Figure: the same chain ‘b’ → ‘o’ → ‘a’ → ‘t’ → ‘s’, now with is_End_Of_String = 1 at both the ‘t’ and ‘s’ nodes]
Next, let us insert the word 𝑏𝑎𝑡. Since ‘a’ is alphabetically less than ‘o’, a new subtree hangs off the 𝑙𝑒𝑓𝑡 pointer of the ‘o’ node:
[Figure: root ‘b’; its 𝑒𝑞 child ‘o’ has a 𝑙𝑒𝑓𝑡 child ‘a’ whose 𝑒𝑞 child is ‘t’ with is_End_Of_String = 1; the original ‘o’ → ‘a’ → ‘t’ → ‘s’ chain is unchanged]
Now, let us insert the final word: 𝑏𝑎𝑡𝑠.
[Figure: the same TST with a new ‘s’ node (is_End_Of_String = 1) added as the 𝑒𝑞 child of the ‘t’ node on the 𝑏𝑎𝑡 branch, completing 𝑏𝑎𝑡𝑠]
Based on these examples, we can write the insertion algorithm as below. We will combine the insertion operation
of BST and tries.
struct TSTNode *insertInTST(struct TSTNode *root, char *word) {
    if (root == NULL) {
        root = (struct TSTNode *) malloc(sizeof(struct TSTNode));
        root->data = *word;
        root->is_End_Of_String = 0;   // set to 1 only when a word actually ends at this node
        root->left = root->eq = root->right = NULL;
    }
    if (*word < root->data)
        root->left = insertInTST(root->left, word);
    else if (*word == root->data) {
        if (*(word + 1))
            root->eq = insertInTST(root->eq, word + 1);
        else
            root->is_End_Of_String = 1;
    }
    else
        root->right = insertInTST(root->right, word);
    return root;
}
Time Complexity: O(𝐿), where 𝐿 is the length of the string to be inserted.
• TSTs can grow and shrink dynamically but hash tables resize only based on load factor.
• TSTs allow partial search whereas BSTs and hash tables do not support it.
• TSTs can display the words in sorted order, but in hash tables we cannot get the sorted order.
• Tries perform search operations very fast but they take huge memory for storing the string.
• TSTs combine the advantages of BSTs and Tries. That means they combine the memory efficiency of BSTs
and the time efficiency of tries.
Observation
From the above example, we can easily see that for a given text 𝑇 and pattern 𝑃, the exact string matching problem
can also be defined as:
• Find a suffix of 𝑇 such that 𝑃 is a prefix of this suffix 𝑜𝑟
• Find a prefix of 𝑇 such that 𝑃 is a suffix of this prefix.
Example: Let the text to be searched be 𝑇 = 𝑎𝑐𝑐𝑏𝑘𝑘𝑏𝑎𝑐 and the pattern be 𝑃 = 𝑘𝑘𝑏. For this example, 𝑃 is a prefix
of the suffix 𝑘𝑘𝑏𝑎𝑐 and also a suffix of the prefix 𝑎𝑐𝑐𝑏𝑘𝑘𝑏.
Now, for 𝑆2 and 𝑆3 (as they have more than one element), let us find the longest prefix in the group, and the result
is shown below.
Group Indexes for this group Longest Prefix of Group Suffixes
𝑆2 3, 5 𝑎𝑡
𝑆3 2, 4, 6 𝑡
For 𝑆2 and 𝑆3 , create internal nodes, and the edge contains the longest common prefix of those groups.
[Figure: the root has three edges: $ (a leaf for 𝑆1), 𝑎𝑡 (the internal node for 𝑆2, with pending suffixes $ and 𝑎𝑡$), and 𝑡 (the internal node for 𝑆3, with pending suffixes $, 𝑎𝑡$, and 𝑎𝑡𝑎𝑡$)]
Now we have to remove the longest common prefix from the 𝑆2 and 𝑆3 group elements.
Group Indexes for this group Longest Prefix of Group Suffixes Resultant Suffixes
𝑆2 3, 5 𝑎𝑡 $, 𝑎𝑡$
𝑆3 2, 4, 6 𝑡 $, 𝑎𝑡$, 𝑎𝑡𝑎𝑡$
Our next step is solving 𝑆2 and 𝑆3 recursively. First let us take 𝑆2 . In this group, if we sort the suffixes based on their
first character, it is easy to see that the first group contains only one element $, and the second group also contains
only one element, 𝑎𝑡$. Since both groups have only one element, we can directly create leaf nodes for them.
[Figure: the 𝑆2 internal node (edge 𝑎𝑡) is now expanded into two leaves with edge labels $ and 𝑎𝑡$; the 𝑆3 node (edge 𝑡) is still pending with suffixes $, 𝑎𝑡$, and 𝑎𝑡𝑎𝑡$]
At this step, both 𝑆1 and 𝑆2 elements are done and the only remaining group is 𝑆3 . As in the earlier steps, in the
𝑆3 group, if we sort them based on their first character, it is easy to see that there is only one element in the first
group and it is $. For the remaining 𝑆3 elements, remove the longest common prefix.
Group Indexes for this group Longest Prefix of Group Suffixes Resultant Suffixes
𝑆3 4, 6 𝑎𝑡 $, 𝑎𝑡$
In the 𝑆3 second group, there are two elements: $ and 𝑎𝑡$. We can directly add the leaf nodes for the first group
element $. Let us add 𝑆3 subtree as shown below.
[Figure: the 𝑆3 internal node (edge 𝑡) now has a leaf child $ and an internal child with edge 𝑎𝑡, whose pending suffixes are $ and 𝑎𝑡$]
Now, 𝑆3 contains two elements. If we sort them based on their first character, it is easy to see that there are only
two elements and among them one is $ and other is 𝑎𝑡$. We can directly add the leaf nodes for them. Let us add
𝑆3 subtree as shown below.
[Figure: the completed suffix tree; the 𝑎𝑡 node under 𝑆3 has two leaf children with edge labels $ and 𝑎𝑡$]
Since there are no more elements, this is the completion of the construction of the suffix tree for string 𝑇 = 𝑡𝑎𝑡𝑎𝑡.
The time-complexity of the construction of a suffix tree using the above algorithm is O(𝑛2 ) where 𝑛 is the length
of the input string because there are 𝑛 distinct suffixes. The longest has length 𝑛, the second longest has length
𝑛 − 1, and so on.
Note:
• There are O(𝑛) algorithms for constructing suffix trees.
• To improve the complexity, we can use indices instead of string for branches.
Another way of doing this is: We can build a suffix tree for the string 𝑇1 $𝑇2 #. This is equivalent to building a
common suffix tree for both the strings.
Time Complexity: O(𝑚 + 𝑛), where 𝑚 and 𝑛 are the lengths of input strings 𝑇1 and 𝑇2 .
Problem-3 Longest Palindrome: Given a text 𝑇 how do we find the substring of 𝑇 which is the longest
palindrome of 𝑇?
Solution: The longest palindrome of 𝑇[1. . 𝑛] can be found in O(𝑛) time. The algorithm is: first build a suffix tree
for 𝑇$𝑟𝑒𝑣𝑒𝑟𝑠𝑒(𝑇)# or build a generalized suffix tree for 𝑇 and 𝑟𝑒𝑣𝑒𝑟𝑠𝑒(𝑇). After building the suffix tree, find the
deepest node marked with both $ and #. Basically it means find the longest common substring.
Problem-4 Given a string (word), give an algorithm for finding the next word in the dictionary.
Solution: Let us assume that we are using Trie for storing the dictionary words. To find the next word in Tries we
can follow a simple approach as shown below. Starting from the rightmost character, increment the characters
one by one. Once we reach 𝑍, move to the next character on the left side.
Whenever we increment, check if the word with the incremented character exists in the dictionary or not. If it
exists, then return the word, otherwise increment again. If we use 𝑇𝑆𝑇, then we can find the inorder successor for
the current word.
Problem-5 Give an algorithm for reversing a string.
Solution:
//If the 𝑠𝑡𝑟 is 𝑒𝑑𝑖𝑡𝑎𝑏𝑙𝑒
char *ReversingString(char str[]) {
    char temp;
    int start, end;
    if (str == NULL || *str == '\0')
        return str;
    for (end = 0; str[end]; end++);   // find the index of the last character
    end--;
    for (start = 0; start < end; start++, end--) {
        temp = str[start]; str[start] = str[end]; str[end] = temp;
    }
    return str;
}
Time Complexity: O(𝑛), where 𝑛 is the length of the given string. Space Complexity: O(1), since the reversal is done in place.
Problem-6 If the string is not editable, how do we create a string that is the reverse of the given string?
Solution: If the string is not editable, then we need to create an array and return the pointer of that.
//If 𝑠𝑡𝑟 is a 𝑐𝑜𝑛𝑠𝑡 string (not editable)
char* ReversingString(char* str) {
int start, end, len;
char temp, *ptr=NULL;
len=strlen(str);
ptr=malloc(sizeof(char)*(len+1));
ptr=strcpy(ptr,str);
for (start=0, end=len-1; start<=end; start++, end--) { //Swapping
temp=ptr[start]; ptr[start]=ptr[end]; ptr[end]=temp;
}
return ptr;
}
Time Complexity: O(𝑛/2) ≈ O(𝑛), where 𝑛 is the length of the given string. Space Complexity: O(𝑛), for the copy of the string.
Problem-7 Can we reverse the string without using any temporary variable?
Solution: Yes, we can use XOR logic for swapping the variables.
char* ReversingString(char *str) {
int start = 0, end= strlen(str)-1;
while( start<end ) {
str[start] ^= str[end]; str[end] ^= str[start]; str[start] ^= str[end];
++start;
--end;
}
return str;
}
Time Complexity: O(𝑛/2) ≈ O(𝑛), where 𝑛 is the length of the given string. Space Complexity: O(1), since the swaps happen in place.
Problem-8 Given a text and a pattern, give an algorithm for matching the pattern in the text. Assume ? (single
character matcher) and ∗ (multi character matcher) are the wild card characters.
Solution: Brute Force Method. For efficient method, refer to the theory section.
int PatternMatching(char *text, char *pattern) {
    if (*pattern == 0)
        return 1;
    if (*text == 0)
        return *pattern == 0;
    if ('?' == *pattern)      // '?' consumes exactly one character of the text
        return PatternMatching(text + 1, pattern + 1);
    if ('*' == *pattern)      // '*' consumes zero or more characters of the text
        return PatternMatching(text + 1, pattern) || PatternMatching(text, pattern + 1);
    if (*text == *pattern)
        return PatternMatching(text + 1, pattern + 1);
    return 0;
}
Time Complexity: exponential in the worst case, since the recursion can branch at every ‘∗’; with memoization it reduces to O(𝑚𝑛), where 𝑚 is the length of the text and 𝑛 is the length of the pattern. Space Complexity: O(𝑚 + 𝑛) for the recursion depth.
Problem-9 Give an algorithm for reversing words in a sentence.
Example: Input: “This is a Career Monk String” Output: “String Monk Career a is This”
Solution: Start from the beginning and keep on reversing the words. The implementation below assumes that ' '
(space) is the delimiter for words in the given sentence.
void ReverseWordsInSentences(char *text) {
    int wordStart, wordEnd, length;
    length = strlen(text);
    ReversingString(text, 0, length - 1);     // first reverse the whole sentence
    for (wordStart = wordEnd = 0; wordEnd < length; wordEnd++) {
        if (text[wordEnd] != ' ') {
            wordStart = wordEnd;
            while (wordEnd < length && text[wordEnd] != ' ')  // bounds check must come first
                wordEnd++;
            wordEnd--;
            ReversingString(text, wordStart, wordEnd);  // found current word, reverse it now
        }
    }
}
void ReversingString(char text[], int start, int end) {
    for (char temp; start < end; start++, end--) {
        temp = text[end];
        text[end] = text[start];
        text[start] = temp;
    }
}
Time Complexity: O(2𝑛) ≈O(𝑛), where 𝑛 is the length of the string. Space Complexity: O(1).
Problem-10 Permutations of a string [anagrams]: Give an algorithm for printing all possible permutations of the
characters in a string. Unlike combinations, two permutations are considered distinct if they contain the same
characters but in a different order. For simplicity assume that each occurrence of a repeated character is a
distinct character. That is, if the input is “aaa”, the output should be six repetitions of “aaa”. The permutations
may be output in any order.
Solution: The solution is reached by generating n! strings, each of length n, where n is the length of the input string.
void Permutations(int depth, char *permutation, int *used, char *original) {
int length = strlen(original);
if(depth == length)
printf("%s\n", permutation);
else {
for (int i = 0; i < length; i++) {
if(!used[i]) {
used[i] = 1;
permutation[depth] = original[i];
Permutations(depth + 1, permutation, used, original);
used[i] = 0;
}
}
}
}
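A hypothetical driver for the routine above could set up the scratch buffers like this (the zero-initialized buffers are what the routine relies on for the string terminator and the used flags):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    char original[] = "abc";
    int length = strlen(original);
    // +1 for the terminator; calloc zeroes both buffers
    char *permutation = (char *) calloc(length + 1, sizeof(char));
    int *used = (int *) calloc(length, sizeof(int));
    Permutations(0, permutation, used, original);   // prints all 3! = 6 permutations
    free(permutation);
    free(used);
    return 0;
}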
Problem-11 Combinations of a String: Unlike permutations, two combinations are considered to be the same if
they contain the same characters, but may be in a different order. Give an algorithm that prints all possible
combinations of the characters in a string. For example, "𝑎𝑐" and "𝑎𝑏" are different combinations from the input
string "𝑎𝑏𝑐", but "𝑎𝑏" is the same as "𝑏𝑎".
Solution: The solution is achieved by generating, for each length 𝑟 between 1 and 𝑛, all 𝑛!/(𝑟! (𝑛 − 𝑟)!) combinations, where 𝑛 is the length of the given input string.
Algorithm:
For each of the input characters
a. Put the current character in output string and print it.
b. If there are any remaining characters, generate combinations with those remaining characters.
void combinations(int depth, char *combination, int start, char *original) {
    int length = strlen(original);
    for (int i = start; i < length; i++) {
        combination[depth] = original[i];
        combination[depth + 1] = '\0';
        printf("%s\n", combination);
        if (i < length - 1)
            combinations(depth + 1, combination, i + 1, original);  // recurse past position i, not past start
    }
}
Problem-12 Given a string "ABCCBCBA", give an algorithm for recursively removing the adjacent characters
if they are the same. For example, ABCCBCBA --> ABBCBA-->ACBA
Solution: First we need to check if we have a character pair; if yes, then cancel it. Now check for next character
and previous element. Keep canceling the characters until we either reach the start of the array, reach the end of
the array, or don’t find a pair.
void removeAdjacentPairs(char* str) {
    int len = strlen(str), i, j = 0;
    for (i = 1; i <= len; i++) {    // i <= len so the terminator gets copied too
        while (j >= 0 && i <= len && str[i] == str[j]) { // cancel pairs; check j before indexing
            i++;
            j--;
        }
        str[++j] = str[i];
    }
}
Problem-13 Given a set of characters 𝐶𝐻𝐴𝑅𝑆 and an input string 𝐼𝑁𝑃𝑈𝑇, find the minimum window in 𝐼𝑁𝑃𝑈𝑇 which
will contain all the characters in 𝐶𝐻𝐴𝑅𝑆 in O(𝑛) complexity. For example, 𝐼𝑁𝑃𝑈𝑇 = 𝐴𝐵𝐵𝐴𝐶𝐵𝐴𝐴 and 𝐶𝐻𝐴𝑅𝑆 =
𝐴𝐴𝐵 has the minimum window 𝐵𝐴𝐴.
Solution: This algorithm is based on the sliding window approach. In this approach, we start from the beginning of the array and move to the
right. As soon as we have a window which has all the required elements, try sliding the window as far right as possible with all the required
elements. If the current window length is less than the minimum length found until now, update the minimum length. For example, if the
input array is 𝐴𝐵𝐵𝐴𝐶𝐵𝐴𝐴 and the minimum window should cover characters 𝐴𝐴𝐵, then the sliding window will
move like this:
[Figure: the sliding window shown at three successive positions over the array A B B A C B A A]
Algorithm: The input is the given array and chars is the array of characters that need to be found.
1 Make an integer array shouldfind[] of length 256. The 𝑖𝑡ℎ element of this array will have the count of how many times we need to
find the character with ASCII value 𝑖.
2 Make another array hasfound[] of 256 elements, which will have the count of the required characters found so far.
3 count ← 0
4 While input[i]:
a. If input[i] is not a required character → continue.
b. If input[i] is a required character ⇒ increase count by 1.
c. If count equals the length of chars[], slide the window as far right as possible.
d. If the current window length is less than the minimum length found until now, update the minimum length.
#define MAX 256
void minLengthWindow(char input[], char chars[]) {
int shouldfind[MAX] = {0,}, hasfound[MAX] = {0,};
int j=0, cnt = 0, start=0, finish, minwindow = INT_MAX;
int charlen = strlen(chars), iplen = strlen(input);
for (int i=0; i< charlen; i++)
shouldfind[chars[i]] += 1;
finish = iplen;
for (int i=0; i< iplen; i++) {
if(!shouldfind[input[i]])
continue;
hasfound[input[i]] += 1;
if(shouldfind[input[i]] >= hasfound[input[i]])
cnt++;
if(cnt == charlen) {
while (shouldfind[input[j]] == 0 || hasfound[input[j]] > shouldfind[input[j]]) {
if(hasfound[input[j]] > shouldfind[input[j]])
hasfound[input[j]]--;
j++;
}
if(minwindow > (i - j +1)) {
minwindow = i - j +1;
finish = i;
start = j;
}
}
}
printf("Start:%d and Finish: %d", start, finish);
}
Complexity: If we walk through the code, 𝑖 and 𝑗 can traverse at most 𝑛 steps (where 𝑛 is the input size) in the worst case, adding to a total of
2𝑛 times. Therefore, time complexity is O(𝑛).
Problem-14 We are given a 2D array of characters and a character pattern. Give an algorithm to find if the
pattern is present in the 2D array. The pattern can be in any order (all 8 neighbors to be considered) but we
can’t use the same character twice while matching. Return 1 if match is found, 0 if not. For example: Find
“MICROSOFT” in the below matrix.
A C P R C
X S O P C
V O V N I
W G F M N
Q A T I T
Solution: Manually finding the solution to this problem is relatively intuitive; the harder part is describing that intuition as an algorithm.
How do we do it manually? First we match the first element, and when it is matched we match the second element in the 8 neighbors of the
first match. We do this process recursively, and when the last character of the input pattern matches, return true.
During the above process, take care not to use any cell in the 2D array twice. For this purpose, you mark every visited cell with some sign. If
your pattern matching fails at some point, start matching from the beginning (of the pattern) in the remaining cells. When returning, you
unmark the visited cells.
Let’s convert the above intuitive method into an algorithm. Since we are doing similar checks for pattern matching every time, a recursive
solution is what we need. In a recursive solution, we need to check if the substring passed is matched in the given matrix or not. The condition
is not to use the already used cell, and to find the already used cell, we need to add another 2D array to the function (or we can use an unused
bit in the input array itself.) Also, we need the current position of the input matrix from where we need to start. Since we need to pass a lot
more information than is actually given, we should be having a wrapper function to initialize the extra information to be passed.
Algorithm:
If we are past the last character in the pattern
Return true
If we got past the 2D matrix
Return false
If we get a used cell again
Return false
If searching for first element and cell doesn’t match
FindMatch with next cell in row-first order (or column-first order)
Otherwise if character matches
mark this cell as used
res = FindMatch with next position of pattern in 8 neighbors
mark this cell as unused
Return res
Otherwise
Return false
#define MAX 100
boolean findMatch_wrapper(char mat[MAX][MAX], char *pat, int nrow, int ncol) {
if(strlen(pat) > nrow*ncol) return false;
int used[MAX][MAX] = {{0,},};
return findMatch(mat, pat, used, 0, 0, nrow, ncol, 0);
}
//level: index till which pattern is matched & x, y: current position in 2D array
boolean findMatch(char mat[MAX][MAX], char *pat, int used[MAX][MAX],
int x, int y, int nrow, int ncol, int level) {
if(level == strlen(pat)) //pattern matched
return true;
if(nrow == x || ncol == y) return false;
if(used[x][y]) return false;
if(mat[x][y] != pat[level] && level == 0) {
if(x < (nrow - 1))
return findMatch(mat, pat, used, x+1, y, nrow, ncol, level); //next element in same row
else if(y < (ncol - 1))
return findMatch(mat, pat, used, 0, y+1, nrow, ncol, level); //first element from same column
else return false;
}
else if(mat[x][y] == pat[level]) {
boolean res;
used[x][y] = 1; //marking this cell as used
//finding subpattern in 8 neighbors
res = (x > 0 ? findMatch(mat, pat, used, x-1, y, nrow, ncol, level+1) : false) ||
      (x < (nrow - 1) ? findMatch(mat, pat, used, x+1, y, nrow, ncol, level+1) : false) ||
      (y > 0 ? findMatch(mat, pat, used, x, y-1, nrow, ncol, level+1) : false) ||
      (y < (ncol - 1) ? findMatch(mat, pat, used, x, y+1, nrow, ncol, level+1) : false) ||
      (x < (nrow - 1) && y < (ncol - 1) ? findMatch(mat, pat, used, x+1, y+1, nrow, ncol, level+1) : false) ||
      (x < (nrow - 1) && y > 0 ? findMatch(mat, pat, used, x+1, y-1, nrow, ncol, level+1) : false) ||
      (x > 0 && y < (ncol - 1) ? findMatch(mat, pat, used, x-1, y+1, nrow, ncol, level+1) : false) ||
      (x > 0 && y > 0 ? findMatch(mat, pat, used, x-1, y-1, nrow, ncol, level+1) : false);
used[x][y] = 0; //marking this cell as unused
return res;
}
else return false;
}
Problem-15 Given two strings 𝑠𝑡𝑟1 and 𝑠𝑡𝑟2, write a function that prints all interleavings of the given two strings. We may assume
that all characters in both strings are different. Example: Input: 𝑠𝑡𝑟1 = "AB", 𝑠𝑡𝑟2 = "CD" and Output: ABCD ACBD ACDB CABD
CADB CDAB. An interleaved string of given two strings preserves the order of characters in individual strings. For example, in all the
interleavings of above first example, ‘A’ comes before ‘B’ and ‘C’ comes before ‘D’.
Solution: Let the length of 𝑠𝑡𝑟1 be 𝑚 and the length of 𝑠𝑡𝑟2 be 𝑛. Let us assume that all characters in 𝑠𝑡𝑟1 and 𝑠𝑡𝑟2 are different. Let
Count(𝑚, 𝑛) be the count of all interleaved strings in such strings. The value of Count(𝑚, 𝑛) can be written as following.
Count(m, n) = Count(m-1, n) + Count(m, n-1)
Count(m, 0) = 1 and Count(0, n) = 1
To print all interleavings, we can first fix the first character of str1[0..m-1] in output string, and recursively call for str1[1..m-1] and str2[0..n-
1]. And then we can fix the first character of str2[0..n-1] and recursively call for str1[0..m-1] and str2[1..n-1].
void PrintInterleavings(char *str1, char *str2, char *iStr, int m, int n, int i){
// Base case: If all characters of str1 & str2 have been included in output string,
// then print the output string
if ( m==0 && n ==0 )
printf("%s\n", iStr) ;
// If some characters of str1 are left to be included, then include the
// first character from the remaining characters and recur for rest
if ( m != 0 ) {
iStr[i] = str1[0];
PrintInterleavings(str1 + 1, str2, iStr, m-1, n, i+1);
}
// If some characters of str2 are left to be included, then include the
// first character from the remaining characters and recur for rest
if ( n != 0 ) {
iStr[i] = str2[0];
PrintInterleavings(str1, str2+1, iStr, m, n-1, i+1);
}
}
// Allocates memory for the output string and uses PrintInterleavings() for printing all interleavings
void Print(char *str1, char *str2, int m, int n){
char *iStr= (char*)malloc((m+n+1)*sizeof(char)); // allocate memory for the output string
iStr[m+n] = '\0'; // Set the terminator for the output string
PrintInterleavings(str1, str2, iStr, m, n, 0); // print all interleavings using PrintInterleavings()
free(iStr);
}
Problem-16 Given a matrix with size 𝑛 × 𝑛 containing random integers. Give an algorithm which checks whether rows match with a
column(s) or not. For example, if 𝑖 𝑡ℎ row matches with 𝑗𝑡ℎ column, and 𝑖 𝑡ℎ row contains the elements - [2,6,5,8,9]. Then 𝑗𝑡ℎ column
would also contain the elements - [2,6,5,8,9].
Solution: We can build a trie for the data in the columns (rows would also work). Then we can compare the rows with the trie. This would
allow us to exit as soon as the beginning of a row does not match any column (backtracking). Also this would let us check a row against all
columns in one pass.
If we do not want to waste memory for empty pointers then we can further improve the solution by constructing a suffix tree.
Problem-17 Write a method to replace all spaces in a string with '%20'. Assume string has sufficient space at end of string to hold
additional characters.
Solution: Find the number of spaces. Then, starting from end (assuming string has enough space), replace the characters. Starting from end
reduces the overwrites.
void encodeSpaceWithString(char* A){
char *space = "%20";
int stringLength = strlen(A);
if(stringLength ==0){
return;
}
int i, numberOfSpaces = 0;
for(i = 0; i < stringLength; i++){
if(A[i] == ' ' || A[i] == '\t'){
numberOfSpaces ++;
}
}
if(!numberOfSpaces)
return;
int newLength = stringLength + numberOfSpaces * 2;
A[newLength] = '\0';
for(i = stringLength-1; i >= 0; i--){
if(A[i] == ' ' || A[i] == '\t'){
A[newLength--] = '0';
A[newLength--] = '2';
A[newLength--] = '%';
}
else{
A[newLength--] = A[i];
}
}
}
Time Complexity: O(𝑛). Space Complexity: O(1). Here, we do not have to worry about the space needed for extra characters.
Problem-18 Run-length encoding: Write an algorithm to compress the given string by using the counts
of repeated characters, and if the new compressed string is not smaller than the original string, then return
the original string.
Solution:
𝑊𝑖𝑡ℎ 𝑒𝑥𝑡𝑟𝑎 𝑠𝑝𝑎𝑐𝑒 𝑜𝑓 O(1) (a small temporary buffer):
string CompressString(string inputStr){
    char last = inputStr.at(0);
    int count = 1;
    char temp[12];   // enough to hold the digits of an int count
    string str;
    for (int i = 1; i < inputStr.length(); i++){
        if(last == inputStr.at(i))
            count++;
        else{
            itoa(count, temp, 10);  // itoa is non-standard; sprintf(temp, "%d", count) is portable
            str += last;
            str += temp;
            last = inputStr.at(i);
            count = 1;
        }
    }
    itoa(count, temp, 10);          // do not forget the final run
    str = str + last + temp;
    // If the compressed string is not smaller than the input string, return the input string
    if(str.length() >= inputStr.length())
        return inputStr;
    else return str;
}
Time Complexity: O(𝑛). Space Complexity: O(1), but it uses a temporary array of size two.
𝑊𝑖𝑡ℎ𝑜𝑢𝑡 𝑒𝑥𝑡𝑟𝑎 𝑠𝑝𝑎𝑐𝑒 (𝑖𝑛𝑝𝑙𝑎𝑐𝑒):
char CompressString(char *inputStr, char currentChar, int lengthIndex, int& countChar, int& index){
if(lengthIndex == -1)
return currentChar;
char lastChar = CompressString(inputStr, inputStr[lengthIndex], lengthIndex-1, countChar, index);
if(lastChar == currentChar)
countChar++;
    else {
        inputStr[index++] = lastChar;
        string countStr = NumToString(countChar); // NumToString: assumed helper converting an int to a string
        for(int i = 0; i < countStr.length(); i++)
            inputStr[index++] = countStr.at(i);
        countChar = 1;
    }
return currentChar;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Chapter 16. Algorithm Design Techniques
16.1 Introduction
In the previous chapters, we have seen many algorithms for solving different kinds of problems. Before solving a
new problem, the general tendency is to look for the similarity of the current problem to other problems for which
we have solutions. This helps us in getting the solution easily.
In this chapter, we will see different ways of classifying the algorithms and in subsequent chapters we will focus
on a few of them (Greedy, Divide and Conquer, Dynamic Programming).
16.2 Classification
There are many ways of classifying algorithms and a few of them are shown below:
• Implementation Method
• Design Method
• Other Classifications
Deterministic or Non-Deterministic
𝐷𝑒𝑡𝑒𝑟𝑚𝑖𝑛𝑖𝑠𝑡𝑖𝑐 algorithms solve the problem with a predefined process, whereas 𝑛𝑜𝑛 − 𝑑𝑒𝑡𝑒𝑟𝑚𝑖𝑛𝑖𝑠𝑡𝑖𝑐 algorithms
guess the best solution at each step through the use of heuristics.
Exact or Approximate
As we have seen, for many problems we are not able to find the optimal solutions. That means, the algorithms for
which we are able to find the optimal solutions are called 𝑒𝑥𝑎𝑐𝑡 algorithms. In computer science, if we do not have
the optimal solution, we give approximation algorithms.
Approximation algorithms are generally associated with NP-hard problems (refer to the 𝐶𝑜𝑚𝑝𝑙𝑒𝑥𝑖𝑡𝑦 𝐶𝑙𝑎𝑠𝑠𝑒𝑠 chapter
for more details).
Greedy Method
𝐺𝑟𝑒𝑒𝑑𝑦 algorithms work in stages. In each stage, a decision is made that is good at that point, without bothering
about the future consequences. Generally, this means that some 𝑙𝑜𝑐𝑎𝑙 𝑏𝑒𝑠𝑡 is chosen. It assumes that the local
best selection also makes for the 𝑔𝑙𝑜𝑏𝑎𝑙 optimal solution.
Dynamic Programming
Dynamic programming (DP) and memoization work together. The difference between DP and divide and conquer
is that in the case of the latter there is no dependency among the sub problems, whereas in DP there will be an
overlap of sub-problems. By using memoization [maintaining a table for already solved sub problems], DP reduces
the exponential complexity to polynomial complexity (O(𝑛2 ), O(𝑛3 ), etc.) for many problems.
The difference between dynamic programming and plain recursion is the memoization of recursive calls. When
the subproblems are independent and there is no repetition, memoization does not help, so dynamic programming
is not a solution for all problems.
Linear Programming
Linear programming is not a programming language like C++, Java, or Visual Basic. Linear programming can be
defined as:
A method to allocate scarce resources to competing activities in an optimal manner when the problem can be
expressed using a linear objective function and linear inequality constraints.
A linear program consists of a set of variables, a linear objective function indicating the contribution of each
variable to the desired outcome, and a set of linear constraints describing the limits on the values of the variables.
The 𝑠𝑜𝑙𝑢𝑡𝑖𝑜𝑛 to a linear program is a set of values for the problem variables that results in the best --
𝑙𝑎𝑟𝑔𝑒𝑠𝑡 𝑜𝑟 𝑠𝑚𝑎𝑙𝑙𝑒𝑠𝑡 -- value of the objective function and yet is consistent with all the constraints. Formulation is
the process of translating a real-world problem into a linear program.
Once a problem has been formulated as a linear program, a computer program can be used to solve the problem.
In this regard, solving a linear program is relatively easy. The hardest part about applying linear programming is
formulating the problem and interpreting the solution. In linear programming, there are inequalities in terms of
inputs and 𝑚𝑎𝑥𝑖𝑚𝑖𝑧𝑖𝑛𝑔 (or 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑖𝑛𝑔) some linear function of the inputs. Many problems (example: maximum
flow for directed graphs) can be discussed using linear programming.
Reduction
In this method we solve a difficult problem by transforming it into a known problem for which we have efficient
algorithms, provided the cost of the transformation is not dominated by the resulting reduced algorithms. For
example, a selection algorithm for finding the median in a list involves first sorting the list and then finding the
middle element in the sorted list. These techniques are also called 𝑡𝑟𝑎𝑛𝑠𝑓𝑜𝑟𝑚 𝑎𝑛𝑑 𝑐𝑜𝑛𝑞𝑢𝑒𝑟.
Classification by Complexity
In this classification, algorithms are classified by the time they take to find a solution based on their input size.
Some algorithms take linear time complexity (O(𝑛)) and others take exponential time, and some never halt. Note
that some problems may have multiple algorithms with different complexities.
Randomized Algorithms
A few algorithms make choices randomly. For some problems, the fastest solutions must involve randomness.
Example: Quick Sort.
Chapter 17. Greedy Algorithms
17.1 Introduction
Let us start our discussion with simple theory that will give us an understanding of the Greedy technique. In the
game of 𝐶ℎ𝑒𝑠𝑠, every time we make a decision about a move, we have to also think about the future consequences.
Whereas, in the game of 𝑇𝑒𝑛𝑛𝑖𝑠 (or 𝑉𝑜𝑙𝑙𝑒𝑦𝑏𝑎𝑙𝑙), our action is based on the immediate situation.
This means that in some cases making a decision that looks right at that moment gives the best solution (𝐺𝑟𝑒𝑒𝑑𝑦),
but in other cases it doesn’t. The Greedy technique is best suited for looking at the immediate situation.
Optimal substructure
A problem exhibits optimal substructure if an optimal solution to the problem contains optimal solutions to the
subproblems. That means we can solve subproblems and build up the solutions to solve larger problems.
An Example
Let's assume that after scanning a file we find the following character frequencies:
Character Frequency
𝑎 12
𝑏 2
𝑐 7
𝑑 13
𝑒 14
𝑓 85
Given this, create a binary tree for each character that also stores the frequency with which it occurs (as shown below).
The algorithm works as follows: In the list, find the two binary trees that store minimum frequencies at their nodes.
Connect these two nodes at a newly created common node that will store no character but will store the sum of the frequencies of all the
nodes connected below it. So our picture looks like this:
[Figure: the trees produced by repeated merging: b(2) and c(7) merge into a node of frequency 9; 9 and a(12) merge into 21; d(13) and e(14) merge into 27; 21 and 27 merge into 48; finally 48 and f(85) merge into the root, 133]
Once the tree is built, each leaf node corresponds to a letter with a code. To determine the code for a particular
node, traverse from the root to the leaf node. For each move to the left, append a 0 to the code, and for each move
to the right, append a 1. As a result, for the above generated tree, we get the following codes:
Letter Code
a 001
b 0000
c 0001
d 010
e 011
f 1
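The tree-building loop described above can be sketched in code. The following is a minimal illustration rather than the book's implementation: the forest is kept in a plain array and the two minimum-frequency trees are found by linear scans, which is O(𝑛2) overall (a priority queue brings it down to O(𝑛𝑙𝑜𝑔𝑛)); the node layout and helper names are illustrative assumptions.
#include <stdlib.h>
struct HuffNode {
    char ch;                    // character, or 0 for internal nodes
    int freq;                   // frequency stored at this node
    struct HuffNode *left, *right;
};
static struct HuffNode *newNode(char ch, int freq,
                                struct HuffNode *l, struct HuffNode *r) {
    struct HuffNode *n = (struct HuffNode *) malloc(sizeof(struct HuffNode));
    n->ch = ch; n->freq = freq; n->left = l; n->right = r;
    return n;
}
// Index of the minimum-frequency tree among nodes[0..count-1].
static int minIndex(struct HuffNode *nodes[], int count) {
    int best = 0;
    for (int i = 1; i < count; i++)
        if (nodes[i]->freq < nodes[best]->freq)
            best = i;
    return best;
}
struct HuffNode *buildHuffman(struct HuffNode *nodes[], int count) {
    while (count > 1) {
        int a = minIndex(nodes, count);     // smallest tree
        struct HuffNode *x = nodes[a];
        nodes[a] = nodes[--count];          // remove it from the forest
        int b = minIndex(nodes, count);     // next smallest tree
        struct HuffNode *y = nodes[b];
        // connect both under a new node storing the sum of frequencies
        nodes[b] = newNode(0, x->freq + y->freq, x, y);
    }
    return nodes[0];                        // the root of the Huffman tree
}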
Solution: Using the Greedy algorithm we can reduce the total time for merging the given files. Let us consider the
following algorithm.
Algorithm:
1. Store file sizes in a priority queue. The key of elements are file lengths.
2. Repeat the following until there is only one file:
a. Extract two smallest elements 𝑋 and 𝑌.
b. Merge 𝑋 and 𝑌 and insert this new file in the priority queue.
Variant of same algorithm:
1. Sort the file sizes in ascending order.
2. Repeat the following until there is only one file:
a. Take the first two elements (smallest) 𝑋 and 𝑌.
b. Merge 𝑋 and 𝑌 and insert this new file in the sorted list.
To check the above algorithm, let us trace it with the previous example. The given array is:
𝐹 = {10,5,100,50,20,15}
As per the above algorithm, after sorting the list it becomes: {5, 10, 15, 20, 50,100}. We need to merge the two smallest
files (5 and 10 size files) and as a result we get the following list of files. In the list below, 15 indicates the cost of
merging two files with sizes 10 and 5.
{15,15,20,50,100}
Similarly, merging the two smallest elements (15 and 15) produces: {20,30,50,100}. For the subsequent steps the
list becomes
{50,50,100} //merging 20 and 30
{100,100} //merging 50 and 50
Finally, {200}
The total cost of merging = Cost of all merging operations = 15 + 30 + 50 + 100 + 200 = 395. So, this algorithm is
producing the optimal solution for this merging problem.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛) time using heaps to find best merging pattern plus the optimal cost of merging the
files.
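To make the computation concrete, here is a small sketch (an illustration, not the book's code) following the sorted-list variant above; it returns the total merge cost, and on the example array it prints 395:
#include <stdio.h>
// Total cost of optimally merging files of the given sizes (sizes[] is modified in place).
int optimalMergeCost(int sizes[], int n) {
    int totalCost = 0;
    // selection sort keeps the sketch self-contained; any sort works
    for (int i = 0; i < n - 1; i++)
        for (int j = i + 1; j < n; j++)
            if (sizes[j] < sizes[i]) {
                int t = sizes[i]; sizes[i] = sizes[j]; sizes[j] = t;
            }
    for (int k = 0; k < n - 1; k++) {
        int merged = sizes[k] + sizes[k + 1];   // merge the two smallest files
        totalCost += merged;
        int pos = k + 1;                        // re-insert merged file, keeping order
        while (pos + 1 < n && sizes[pos + 1] < merged) {
            sizes[pos] = sizes[pos + 1];
            pos++;
        }
        sizes[pos] = merged;
    }
    return totalCost;
}
int main(void) {
    int F[] = {10, 5, 100, 50, 20, 15};
    printf("%d\n", optimalMergeCost(F, 6));     // prints 395
    return 0;
}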
Problem-4 Interval Scheduling Algorithm: Given a set of 𝑛 intervals S = {(start i , endi )|1 ≤ i ≤ n}. Let us
assume that we want to find a maximum subset 𝑆′ of 𝑆 such that no pair of intervals in 𝑆′ overlaps. Check
whether the following algorithm works or not.
Algorithm: while (𝑆 is not empty) {
Select the interval 𝐼 that overlaps the least number of other intervals.
Add 𝐼 to final solution set 𝑆′.
Remove all intervals from 𝑆 that overlap with 𝐼.
}
Solution: This algorithm does not solve the problem of finding a maximum subset of non-overlapping intervals.
Consider the following intervals. The optimal solution is {𝑀, 𝑂, 𝑁, 𝐾}. However, the interval that overlaps with the
fewest others is 𝐶, and the given algorithm will select 𝐶 first.
[Figure: a set of intervals in which the optimal selection is {𝑀, 𝑂, 𝑁, 𝐾}, but interval 𝐶, overlapping the fewest others, is chosen first by the algorithm]
Problem-5 In Problem-4, if we select the interval that starts earliest (also not overlapping with already chosen
intervals), does it give the optimal solution?
Solution: No. It will not give the optimal solution. Let us consider the example below. It can be seen that the optimal solution is 4 whereas
the given algorithm gives 1.
[Figure: counterexample intervals for the earliest-start rule; the optimal solution selects 4 non-overlapping intervals while the rule selects only 1]
Problem-6 In Problem-4, if we select the shortest interval (but it is not overlapping the already chosen
intervals), does it give the optimal solution?
Solution: This also will not give the optimal solution. Let us consider the example below. It can be seen that the optimal solution is 2 whereas
the algorithm gives 1.
[Figure: counterexample intervals for the shortest-interval rule; the optimal solution selects 2 intervals while the rule selects only 1]
[Figure: classes 𝐴 through 𝐺 arranged on a timeline, with 𝐵, 𝐶, 𝐷 in one row and 𝐸, 𝐹, 𝐺 in another]
Maximizing the number of classes in the first room results in having {𝐵, 𝐶, 𝐹, 𝐺} in one room, and classes 𝐴, 𝐷, and
𝐸 each in their own rooms, for a total of 4. The optimal solution is to put 𝐴 in one room, { 𝐵, 𝐶, 𝐷 } in another, and
{𝐸, 𝐹, 𝐺} in another, for a total of 3 rooms.
Problem-9 For Problem-8, consider the following algorithm. Process the classes in increasing order of start
times. Assume that we are processing class 𝐶. If there is a room 𝑅 such that 𝑅 has been assigned to an earlier
class, and 𝐶 can be assigned to 𝑅 without overlapping previously assigned classes, then assign 𝐶 to 𝑅.
Otherwise, put 𝐶 in a new room. Does this algorithm solve the problem?
Solution: This algorithm solves the interval-coloring problem. Note that if the greedy algorithm creates a new room
for the current class 𝑐𝑖 , then because it examines classes in order of start times, 𝑐𝑖 start point must intersect with
the last class in all of the current rooms. Thus when greedy creates the last room, 𝑛, it is because the start time
of the current class intersects with 𝑛 − 1 other classes. But we know that any single point in any class can
only intersect with at most 𝑠 other classes, so it must then be that 𝑛 ≤ 𝑠. As 𝑠 is a lower bound on the total number
of rooms needed, and greedy is feasible, it is thus also optimal.
Note: For optimal solution refer to Problem-7 and for code refer to Problem-10.
Problem-10 Suppose we are given two arrays 𝑆𝑡𝑎𝑟𝑡[1 . . 𝑛] and 𝐹𝑖𝑛𝑖𝑠ℎ[1 . . 𝑛] listing the start and finish times of each class. Our task
is to choose the largest possible subset 𝑋 ⊆ {1, 2, . . . , 𝑛} so that for any pair 𝑖, 𝑗 ∈ 𝑋, either 𝑆𝑡𝑎𝑟𝑡[𝑖] > 𝐹𝑖𝑛𝑖𝑠ℎ[𝑗] or
𝑆𝑡𝑎𝑟𝑡[𝑗] > 𝐹𝑖𝑛𝑖𝑠ℎ[𝑖].
Solution: Our aim is to finish the first class as early as possible, because that leaves us with the most remaining classes. We scan through the
classes in order of finish time, and whenever we encounter a class that doesn’t conflict with the latest class so far, then we take that class.
int largestTasks(int Start[], int n, int Finish[]) {
    sort Finish[];                   // pseudocode: sort by finish time
    rearrange Start[] to match;      // keep Start[i] paired with Finish[i]
    count = 1;
    X[count] = 1;
    for (i = 2; i <= n; i++) {
        if(Start[i] > Finish[X[count]]) {
            count = count + 1;
            X[count] = i;            // class i starts after the last chosen class finishes
        }
    }
    return X[1 .. count];            // pseudocode: the chosen set of classes
}
This algorithm clearly runs in O(𝑛𝑙𝑜𝑔𝑛) time due to sorting.
Problem-11 Consider the making change problem in the country of India. The input to this problem is an
integer 𝑀. The output should be the minimum number of coins to make 𝑀 rupees of change. In India, assume
the available coins are 1, 5, 10, 20, 25, 50 rupees. Assume that we have an unlimited number of coins of each
type.
For this problem, does the following algorithm produce the optimal solution or not? Take as many coins
as possible from the highest denominations. So for example, to make change for 234 rupees the greedy
algorithm would take four 50 rupee coins, one 25 rupee coin, one 5 rupee coin, and four 1 rupee coins.
Solution: The greedy algorithm is not optimal for the problem of making change with the minimum number of coins
when the denominations are 1, 5, 10, 20, 25, and 50. In order to make 40 rupees, the greedy algorithm would use
three coins of 25, 10, and 5 rupees. The optimal solution is to use two 20-rupee coins.
Note: For the optimal solution, refer to the 𝐷𝑦𝑛𝑎𝑚𝑖𝑐 𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 chapter.
Problem-12 Let us assume that we are going for a long drive between cities A and B. In preparation for our
trip, we have downloaded a map that contains the distances in miles between all the petrol stations on our
route. Assume that our car’s tank can hold enough petrol to cover 𝑛 miles, and that the value 𝑛 is given. Suppose we
stop at every point. Does it give the best solution?
Solution: Here the algorithm does not produce optimal solution. Obvious Reason: filling at each petrol station does
not produce optimal solution.
Problem-13 For problem Problem-12, stop if and only if you don’t have enough petrol to make it to the next
gas station, and if you stop, fill the tank up all the way. Prove or disprove that this algorithm correctly solves
the problem.
Solution: The greedy approach works: We start our trip from 𝐴 with a full tank. We check our map to determine the
farthest petrol station on our route within 𝑛 miles. We stop at that petrol station, fill up our tank and check our
map again to determine the farthest petrol station on our route within n miles from this stop. Repeat the process
until we get to 𝐵.
Note: For code, refer to 𝐷𝑦𝑛𝑎𝑚𝑖𝑐 𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 chapter.
Problem-14 Fractional Knapsack problem: Given items 𝑡1 , 𝑡2 , . . . , 𝑡𝑛 (items we might want to carry in our backpack)
with associated weights s1 , s2 , … , 𝑠𝑛 and benefit values 𝑣1 , 𝑣2 , . . . , 𝑣𝑛 , how can we maximize the total benefit
considering that we are subject to an absolute weight limit 𝐶?
Solution:
Algorithm:
1) Compute the value density of each item: 𝑑𝑖 = 𝑣𝑖 /𝑠𝑖 .
2) Sort the items by decreasing value density.
3) Take as much as possible of the highest-density item not already in the bag (a code sketch follows the note below).
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛) for sorting and O(𝑛) for greedy selections.
Note: The items can be entered into a priority queue and retrieved one by one until either the bag is full or all
items have been selected. This actually has a better runtime of O(𝑛 + 𝑐𝑙𝑜𝑔𝑛) where 𝑐 is the number of items that
actually get selected in the solution. There is a savings in runtime if 𝑐 = O(𝑛), but otherwise there is no change
in the complexity.
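A sketch of this greedy procedure (an illustration under the stated algorithm; the parallel arrays, the in-place selection sort, and the function name are assumptions):
#include <stdio.h>
// Fractional knapsack: take items greedily by value density v[i]/s[i],
// splitting the last item if needed. v[] and s[] are reordered in place.
double fractionalKnapsack(double v[], double s[], int n, double C) {
    // sort by decreasing density (selection sort keeps the sketch short)
    for (int i = 0; i < n - 1; i++)
        for (int j = i + 1; j < n; j++)
            if (v[j] / s[j] > v[i] / s[i]) {
                double t = v[i]; v[i] = v[j]; v[j] = t;
                t = s[i]; s[i] = s[j]; s[j] = t;
            }
    double benefit = 0.0, remaining = C;
    for (int i = 0; i < n && remaining > 0; i++) {
        if (s[i] <= remaining) {                 // take the whole item
            benefit += v[i];
            remaining -= s[i];
        } else {                                 // take only a fraction of it
            benefit += v[i] * (remaining / s[i]);
            remaining = 0;
        }
    }
    return benefit;
}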
Problem-15 Number of railway-platforms: At a railway station, we have a time-table with the trains’ arrivals and departures. We need
to find the minimum number of platforms so that all the trains can be accommodated as per their schedule.
Example: The timetable is as given below, the answer is 3. Otherwise, the railway station will not be able to
accommodate all the trains.
Rail Arrival Departure
Rail A 0900 hrs 0930 hrs
1 1 -1 1 1 -1 -1 -1
Finally make a cumulative array out of this:
1 2 1 2 3 2 1 0
Our solution will be the maximum value in this array. Here it is 3.
Note: If we have a train arriving and another departing at the same time, then put the departure time first in the
sorted array.
Problem-16 Consider a country with very long roads and houses along the road. Assume that the residents of all houses use cell
phones. We want to place cell phone towers along the road, and each cell phone tower covers a range of 7 kilometers. Create an efficient
algorithm that allow for the fewest cell phone towers.
Solution:
[Figure: houses along a road, with towers placed so that each tower covers a 7-kilometer range]
Problem-17 Suppose 𝑛 songs are stored on a tape, and song 𝑖 has length 𝐴[𝑖]. The cost of reading song 𝑘 from the tape is:
𝐶(𝑘) = 𝐴[1] + 𝐴[2] + ⋯ + 𝐴[𝑘]
The cost reflects the fact that before we read song 𝑘 we must first scan past all the earlier songs on the tape.
If we change the order of the songs on the tape, we change the cost of accessing the songs, with the result
that some songs become more expensive to read, but others become cheaper. Different song orders are likely
to result in different expected costs. If we assume that each song is equally likely to be accessed, which order
should we use if we want the expected cost to be as small as possible?
Solution: The answer is simple. We should store the songs in the order from shortest to longest. Storing the short songs at the beginning
reduces the forwarding times for the remaining jobs.
Problem-18 Let us consider a set of events at 𝐻𝐼𝑇𝐸𝑋 (𝐻𝑦𝑑𝑒𝑟𝑎𝑏𝑎𝑑 𝐶𝑜𝑛𝑣𝑒𝑛𝑡𝑖𝑜𝑛 𝐶𝑒𝑛𝑡𝑒𝑟). Assume that there are 𝑛 events where
each takes one unit of time. Event 𝑖 will provide a profit of 𝑝𝑖 (𝑝𝑖 > 0) if started at or before time 𝑡𝑖 , where 𝑡𝑖 is an arbitrary number. If
an event is not started by 𝑡𝑖 then there is no benefit in scheduling it at all. All events can start as early as time 0. Give the efficient algorithm
to find a schedule that maximizes the profit.
Solution: This problem can be solved with greedy technique. The setting is that we have 𝑛 events, each of which takes unit time, and a
convention center on which we would like to schedule them in as profitable a manner as possible. Each event has a profit associated with it,
as well as a deadline; if the event is not scheduled by the deadline, then we don’t get the profit.
Because each event takes the same amount of time, we will think of a 𝑆𝑐ℎ𝑒𝑑𝑢𝑙𝑒 𝐸 as consisting of a sequence of event “slots” 0, 1, 2, 3, . . .
where 𝐸(𝑡) is the event scheduled in slot 𝑡.
More formally, the input is a sequence (𝑡0 , 𝑝0 ),( 𝑡1 , 𝑝1 ), (𝑡2 , 𝑝2 ) · · · ,( 𝑡𝑛−1 , 𝑝𝑛−1 ) where 𝑝𝑖 is a nonnegative real number representing the
profit obtainable from event 𝑖, and 𝑡𝑖 is the deadline for event 𝑖. Notice that, even if some event deadlines were bigger than 𝑛, we can schedule
them in a slot less than 𝑛 as each event takes only one unit of time.
Algorithm:
1. Sort the events according to their profits 𝑝𝑖 in the decreasing order.
2. Now, for each of the events:
o Schedule event 𝑖 in the latest possible free slot meeting its deadline.
o If there is no such slot, do not schedule event 𝑖.
The sort takes O(𝑛𝑙𝑜𝑔𝑛) and the scheduling takes O(𝑛) for 𝑛 events. So the overall running time of the algorithm is O(𝑛𝑙𝑜𝑔𝑛) time.
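A sketch of this greedy schedule follows (an illustration only: the linear scan for the latest free slot makes the scheduling loop O(𝑛2) in the worst case; the O(𝑛) scheduling quoted above needs a cleverer free-slot structure, and the array layout here is an assumption):
// p[] holds profits, t[] holds deadlines; both arrays are reordered in place.
// Slots are start times 0..n-1; event i must start at or before t[i].
int scheduleEvents(int p[], int t[], int n) {
    int slot[n];                               // slot[d] = scheduled event, or -1 if free
    for (int d = 0; d < n; d++) slot[d] = -1;
    // sort events by decreasing profit (selection sort keeps the sketch short)
    for (int i = 0; i < n - 1; i++)
        for (int j = i + 1; j < n; j++)
            if (p[j] > p[i]) {
                int tmp = p[i]; p[i] = p[j]; p[j] = tmp;
                tmp = t[i]; t[i] = t[j]; t[j] = tmp;
            }
    int totalProfit = 0;
    for (int i = 0; i < n; i++) {
        int d = t[i] < n - 1 ? t[i] : n - 1;   // latest useful start time
        while (d >= 0 && slot[d] != -1)        // walk back to a free slot
            d--;
        if (d >= 0) {                          // schedule event i at time d
            slot[d] = i;
            totalProfit += p[i];
        }                                      // otherwise the event is dropped
    }
    return totalProfit;
}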
Problem-19 Let us consider a customer-care server (say, mobile customer-care) with 𝑛 customers to be served in the queue. For
simplicity assume that the service time required by each customer is known in advance and is 𝑤𝑖 minutes for
customer 𝑖. So if, for example, the customers are served in order of increasing 𝑖, then the 𝑖𝑡ℎ customer has to wait
∑𝑗=1 𝑡𝑜 𝑖−1 𝑤𝑗 minutes. The total waiting time of all customers can be given as ∑𝑖=1 𝑡𝑜 𝑛 ∑𝑗=1 𝑡𝑜 𝑖−1 𝑤𝑗 .
What is the best way to serve the customers so that the total waiting time can be reduced?
Solution: This problem can be easily solved using greedy technique. Since our objective is to reduce the total waiting time, what we can do is,
select the customer whose service time is less. That means, if we process the customers in the increasing order of service time then we can
reduce the total waiting time.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛).
Chapter 18. Divide and Conquer Algorithms
18.1 Introduction
In the 𝐺𝑟𝑒𝑒𝑑𝑦 chapter, we have seen that for many problems the Greedy strategy failed to provide optimal solutions. Among those problems,
there are some that can be easily solved by using the 𝐷𝑖𝑣𝑖𝑑𝑒 𝑎𝑛𝑑 𝐶𝑜𝑛𝑞𝑢𝑒𝑟 (D & 𝐶) technique. Divide and Conquer is an important algorithm
design technique based on recursion.
The 𝐷 & 𝐶 algorithm works by recursively breaking down a problem into two or more sub problems of the same type, until they become
simple enough to be solved directly. The solutions to the sub problems are then combined to give a solution to the original problem.
[Figure: a problem of size 𝑛 is divided into subproblems, each of size 𝑛/𝑏]
DivideAndConquer ( P ) {
    if( small ( P ) )
        // base case: the problem is small enough to solve directly
        return Solve ( P );
    else {
        divide P into smaller instances P1 , P2 , ..., Pk ;
        return (
            Combine (
                DivideAndConquer ( P1 ),
                DivideAndConquer ( P2 ),
                ...
                DivideAndConquer ( Pk )
            )
        );
    }
}
The following generalized Master theorem applies to recurrences of the form T(n) = a T(n/b) + Θ(n^k log^p n), where a ≥ 1, b > 1, k ≥ 0 and p is a real number:
1) If a > b^k, then T(n) = Θ(n^(log_b a))
2) If a = b^k:
   a. If p > −1, then T(n) = Θ(n^(log_b a) log^(p+1) n)
   b. If p = −1, then T(n) = Θ(n^(log_b a) loglog n)
   c. If p < −1, then T(n) = Θ(n^(log_b a))
3) If a < b^k:
   a. If p ≥ 0, then T(n) = Θ(n^k log^p n)
   b. If p < 0, then T(n) = O(n^k)
Solution: Let us assume that the input size is 𝑛 and 𝑇(𝑛) defines the solution to the given problem. As per the given
code, after printing the character we divide the problem into 2 subproblems, each of size 𝑛/2, and solve them; so we
need to solve 2𝑇(𝑛/2) subproblems. After solving these subproblems, the algorithm does nothing to combine the
solutions. The total recurrence for this problem can be given as:
𝑇(𝑛) = 2𝑇(𝑛/2) + O(1)
Using Master theorem (of D & C), we get the complexity as O(n^(log_2 2)) = O(n^1) = O(𝑛).
Problem-5 Given an array, give an algorithm for finding the maximum and minimum.
Solution: Refer to 𝑆𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛 𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚𝑠 chapter.
Problem-6 Discuss Binary Search and its complexity.
Solution: Refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter for discussion on Binary Search.
Analysis: Let us assume that input size is 𝑛 and 𝑇(𝑛) defines the solution to the given problem. The elements are
in sorted order. In binary search we take the middle element and check whether the element to be searched is
equal to that element or not. If it is equal then we return that element.
If the element to be searched is less than the middle element then we consider the left sub-array for finding the
element and discard the right sub-array. Similarly, if the element to be searched is greater than the middle element
then we consider the right sub-array for finding the element and discard the left sub-array.
What this means is, in both cases we discard half of the sub-array and consider only the remaining half. Also,
at every iteration we divide the elements into two equal halves.
As per the above discussion, every time we divide the problem into 2 subproblems, each of size 𝑛/2, and solve
only one 𝑇(𝑛/2) subproblem. The total recurrence for binary search can be given as:
𝑇(𝑛) = 𝑇(𝑛/2) + O(1)
Using Master theorem (of D & C), we get the complexity as O(𝑙𝑜𝑔𝑛).
Problem-7 Consider the modified version of binary search. Let us assume that the array is divided into 3
equal parts (ternary search) instead of 2 equal parts. Write the recurrence for this ternary search and find its
complexity.
Solution: From the discussion on Problem-6, binary search has the recurrence relation: 𝑇(𝑛) = 𝑇(𝑛/2) + O(1). Similar
to that discussion, instead of 2 in the recurrence relation we use "3". That indicates that we are dividing
the array into 3 sub-arrays of equal size and considering only one of them. So, the recurrence for the ternary
search can be given as:
𝑇(𝑛) = 𝑇(𝑛/3) + O(1)
Using Master theorem (of 𝐷 & 𝐶), we get the complexity as O(𝑙𝑜𝑔3 𝑛) ≈ O(𝑙𝑜𝑔𝑛) (we don’t have to worry about the base
of 𝑙𝑜𝑔 as it is a constant).
Problem-8 In Problem-7, what if we divide the array into two sets of sizes approximately one-third and two-
thirds?
Solution: We now consider a slightly modified version of ternary search in which only one comparison is made,
which creates two partitions, one of roughly 𝑛/3 elements and the other of 2𝑛/3. The worst case comes when the
recursive call is on the larger part, the 2𝑛/3 elements. So the recurrence corresponding to this worst case is:
𝑇(𝑛) = 𝑇(2𝑛/3) + O(1)
Using Master theorem (of D & C), we get the complexity as O(𝑙𝑜𝑔𝑛). It is interesting to note that we will get the
same result for general 𝑘-ary search (as long as 𝑘 is a fixed constant which does not depend on 𝑛) as 𝑛 approaches
infinity.
Problem-9 Discuss Merge Sort and its complexity.
Solution: Refer to 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 chapter for discussion on Merge Sort. In Merge Sort, if the number of elements are
greater than 1, then divide them into two equal subsets, the algorithm is recursively invoked on the subsets, and
the returned sorted subsets are merged to provide a sorted list of the original set. The recurrence equation of the
Merge Sort algorithm is:
𝑇(𝑛) = 2𝑇(𝑛/2) + O(𝑛), 𝑖𝑓 𝑛 > 1
𝑇(𝑛) = 0, 𝑖𝑓 𝑛 = 1
If we solve this recurrence using D & C Master theorem it gives O(nlogn) complexity.
Problem-10 Discuss Quick Sort and its complexity.
Solution: Refer to 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 chapter for discussion on Quick Sort. For Quick Sort we have different complexities for
best case and worst case.
Best Case: In 𝑄𝑢𝑖𝑐𝑘 𝑆𝑜𝑟𝑡, if the number of elements is greater than 1 then they are divided into two equal subsets,
and the algorithm is recursively invoked on the subsets. After solving the sub problems we don’t need to combine
them. This is because in 𝑄𝑢𝑖𝑐𝑘 𝑆𝑜𝑟𝑡 they are already in sorted order. But, we need to scan the complete elements
to partition the elements. The recurrence equation of 𝑄𝑢𝑖𝑐𝑘 𝑆𝑜𝑟𝑡 best case is
𝑇(𝑛) = 2𝑇(𝑛/2) + O(𝑛), 𝑖𝑓 𝑛 > 1
𝑇(𝑛) = 0, 𝑖𝑓 𝑛 = 1
If we solve this recurrence using Master theorem of D & C gives O(𝑛𝑙𝑜𝑔𝑛) complexity.
Worst Case: In the worst case, Quick Sort divides the input elements into two sets and one of them contains only
one element. That means other set has 𝑛 − 1 elements to be sorted. Let us assume that the input size is 𝑛 and
𝑇(𝑛) defines the solution to the given problem. So we need to solve 𝑇(𝑛 − 1), 𝑇(1) subproblems. But to divide the
input into two sets Quick Sort needs one scan of the input elements (this takes O(𝑛)).
After solving these sub problems the algorithm takes only a constant time to combine these solutions. The total
recurrence algorithm for this problem can be given as:
𝑇(𝑛) = 𝑇(𝑛 − 1) + O(1) + O(𝑛).
This is clearly a summation recurrence equation. So, 𝑇(𝑛) = 𝑛(𝑛 + 1)/2 = O(𝑛2).
Note: For the average case analysis, refer to 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 chapter.
Problem-11 Given an infinite array in which the first 𝑛 cells contain integers in sorted order and the rest of
the cells are filled with some special symbol (say, $). Assume we do not know the 𝑛 value. Give an algorithm
that takes an integer 𝐾 as input and finds a position in the array containing 𝐾, if such a position exists, in
O(𝑙𝑜𝑔𝑛) time.
Solution: Since we need an O(𝑙𝑜𝑔𝑛) algorithm, we should not search for all the elements of the given list (which
gives O(𝑛) complexity). To get O(𝑙𝑜𝑔𝑛) complexity one possibility is to use binary search. But in the given scenario
we cannot use binary search as we do not know the end of the list. Our first problem is to find the end of the list.
To do that, we can start at the first element and keep searching with doubled index. That means we first search
at index 1 then, 2, 4, 8 …
int findInInfiniteSeries(int A[]) {
    int mid, l = 1, r = 1;
    // double the right index until we cross the end marker '$'
    while (A[r] != '$') {
        l = r;
        r = r * 2;
    }
    // binary search for the boundary between the numbers and the '$' cells
    while (r - l > 1) {
        mid = (r - l) / 2 + l;
        if (A[mid] == '$')
            r = mid;
        else
            l = mid;
    }
    return l;   // position of the last number; now binary search for K in A[1..l]
}
It is clear that, once we have identified a possible interval 𝐴[𝑖, . . . ,2𝑖] in which 𝐾 might be, its length is at most 𝑛
(since we have only 𝑛 numbers in the array 𝐴), so searching for 𝐾 using binary search takes O(𝑙𝑜𝑔𝑛) time.
Problem-12 Given a sorted array of non-repeated integers 𝐴[1. . 𝑛], check whether there is an index 𝑖 for which
𝐴[𝑖] = 𝑖. Give a divide-and-conquer algorithm that runs in time O(𝑙𝑜𝑔𝑛).
Solution: We can't use binary search on the array as it is. Conceptually we want to subtract 𝑖 from 𝐴[𝑖] and look
for a zero, but physically modifying the array (in place or in a copy) would cost O(𝑛). Instead, we compute
𝐴[𝑚𝑖𝑑] − 𝑚𝑖𝑑 on the fly during the binary search; since the elements are sorted and distinct, 𝐴[𝑖] − 𝑖 is
non-decreasing, so binary search applies directly and the O(𝑙𝑜𝑔𝑛) bound is preserved.
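A sketch of this idea (0-indexed for convenience):
// Returns an index i with A[i] == i, or -1 if no such index exists.
// A[] is sorted with no repeated integers, so A[i] - i is non-decreasing.
int findFixedPoint(int A[], int n) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (A[mid] == mid)
            return mid;            // fixed point found
        else if (A[mid] < mid)
            low = mid + 1;         // fixed point, if any, lies to the right
        else
            high = mid - 1;        // fixed point, if any, lies to the left
    }
    return -1;
}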
Problem-13 We are given two sorted lists of size 𝑛. Give an algorithm for finding the median element in the
union of the two lists.
Solution: We use the Merge Sort process. Use 𝑚𝑒𝑟𝑔𝑒 procedure of merge sort (refer to 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 chapter). Keep track
of the count while comparing elements of two arrays. If the count becomes 𝑛 (since there are 2𝑛 elements), we
have reached the median. Take the average of the elements at indexes 𝑛 − 1 and 𝑛 in the merged array.
Time Complexity: O(𝑛).
Problem-14 Can we give the algorithm if the size of the two lists are not the same?
Solution: The solution is similar to the previous problem. Let us assume that the lengths of two lists are 𝑚 and
𝑛. In this case we need to stop when the counter reaches (𝑚 + 𝑛)/2.
Time Complexity: O((𝑚 + 𝑛)/2) ≈ O(𝑚 + 𝑛).
Problem-15 Can we improve the time complexity of Problem-13 to O(𝑙𝑜𝑔𝑛)?
Solution: Yes, using the D & C approach. Let us assume that the given two lists are 𝐿1 and 𝐿2.
Algorithm:
1. Find the medians of the given sorted input arrays 𝐿1[] and 𝐿2[]. Assume that those medians are 𝑚1 and
𝑚2.
2. If 𝑚1 and 𝑚2 are equal then return 𝑚1 (or 𝑚2).
3. If 𝑚1 is greater than 𝑚2, then the final median is present in one of the two sub arrays below.
4. From first element of 𝐿1 to 𝑚1.
5. From 𝑚2 to last element of 𝐿2.
6. If 𝑚2 is greater than 𝑚1, then median is present in one of the two sub arrays below.
7. From 𝑚1 to last element of 𝐿1.
8. From first element of 𝐿2 to 𝑚2.
9. Repeat the above process until the size of both the sub arrays becomes 2.
10. If the size of both the arrays is 2, then use the formula below to get the median.
11. Median = (𝑚𝑎𝑥(𝐿1[0], 𝐿2[0]) + 𝑚𝑖𝑛(𝐿1[1], 𝐿2[1]))/2
Time Complexity: O(𝑙𝑜𝑔𝑛) since we are considering only half of the input and throwing the remaining half.
Problem-16 Given an input array 𝐴. Let us assume that there can be duplicates in the list. Now search for an
element in the list in such a way that we get the highest index if there are duplicates.
Solution: Refer to 𝑆𝑒𝑎𝑟𝑐ℎ𝑖𝑛𝑔 chapter.
Problem-17 Discuss Strassen's Matrix Multiplication Algorithm using Divide and Conquer. That means, given
two 𝑛 × 𝑛 matrices 𝐴 and 𝐵, compute the 𝑛 × 𝑛 matrix 𝐶 = 𝐴 × 𝐵, where the elements of 𝐶 are given by
𝐶𝑖,𝑗 = ∑ 𝐴𝑖,𝑘 × 𝐵𝑘,𝑗 , with the sum running over 𝑘 = 0 to 𝑛 − 1.
Solution: In the straightforward block decomposition, each half-sized step needs eight matrix multiplications.
Fortunately, it turns out that one of the eight matrix multiplications is redundant (as found by Strassen). Consider
the following series of seven 𝑛/2 × 𝑛/2 matrices:
𝑀0 = (𝐴1,1 + 𝐴2,2 ) × (𝐵1,1 + 𝐵2,2 )
𝑀1 = (𝐴1,2 − 𝐴2,2 ) × (𝐵2,1 + 𝐵2,2 )
𝑀2 = (𝐴1,1 − 𝐴2,1 ) × (𝐵1,1 + 𝐵1,2 )
𝑀3 = (𝐴1,1 + 𝐴1,2 ) × 𝐵2,2
𝑀4 = 𝐴1,1 × (𝐵1,2 − 𝐵2,2 )
𝑀5 = 𝐴2,2 × (𝐵2,1 − 𝐵1,1 )
𝑀6 = (𝐴2,1 + 𝐴2,2 ) × 𝐵1,1
Each equation above has only one multiplication. Ten additions and seven multiplications are required to compute
𝑀0 through 𝑀6 . Given 𝑀0 through 𝑀6 , we can compute the elements of the product matrix C as follows:
𝐶1,1 = 𝑀0 + 𝑀1 − 𝑀3 + 𝑀5
𝐶1,2 = 𝑀3 + 𝑀4
𝐶2,1 = 𝑀5 + 𝑀6
𝐶2,2 = 𝑀0 − 𝑀2 + 𝑀4 − 𝑀6
This approach requires seven 𝑛/2 × 𝑛/2 matrix multiplications and 18 𝑛/2 × 𝑛/2 additions. Therefore, the worst-case
running time is given by the following recurrence:
𝑇(𝑛) = O(1), 𝑓𝑜𝑟 𝑛 = 1
𝑇(𝑛) = 7𝑇(𝑛/2) + O(𝑛2), 𝑓𝑜𝑟 𝑛 > 1
Using Master theorem, we get 𝑇(𝑛) = O(n^(log_2 7)) = O(𝑛2.81).
Problem-18 Stock Pricing Problem: Consider the stock price of 𝐶𝑎𝑟𝑒𝑒𝑟𝑀𝑜𝑛𝑘. 𝑐𝑜𝑚 in 𝑛 consecutive days. That
means the input consists of an array with stock prices of the company. We know that the stock price will not
be the same on all the days. In the input stock prices there may be dates where the stock is high when we
can sell the current holdings, and there may be days when we can buy the stock. Now our problem is to find
the day on which we can buy the stock and the day on which we can sell the stock so that we can make
maximum profit.
Solution: As given in the problem, let us assume that the input is an array with stock prices [integers]. Let us say
the given array is 𝐴[1], . . . , 𝐴[𝑛]. From this array we have to find two days [one for buy and one for sell] in such a
way that we can make maximum profit. Also, note that the buy date must come before the sell date. One simple
approach is to look at all possible buy and sell dates.
void stockStrategy(int A[], int n, int *buyDateIndex, int *sellDateIndex) {
    int profit = 0;
    *buyDateIndex = 0; *sellDateIndex = 0;
    for(int i = 0; i < n; i++)            // indicates buy date
        for(int j = i + 1; j < n; j++)    // indicates sell date
            if(A[j] - A[i] > profit) {
                profit = A[j] - A[i];
                *buyDateIndex = i;
                *sellDateIndex = j;
            }
}
The two nested loops take 𝑛(𝑛 + 1)/2 computations, so this takes O(𝑛²) time.
Problem-19 For Problem-18, can we improve the time complexity?
Solution: Yes, by opting for the Divide-and-Conquer O(𝑛𝑙𝑜𝑔𝑛) solution. Divide the input list into two parts and
recursively find the solution in both the parts. Here, we get three cases:
• 𝑏𝑢𝑦𝐷𝑎𝑡𝑒𝐼𝑛𝑑𝑒𝑥 and 𝑠𝑒𝑙𝑙𝐷𝑎𝑡𝑒𝐼𝑛𝑑𝑒𝑥 both are in the earlier time period.
• 𝑏𝑢𝑦𝐷𝑎𝑡𝑒𝐼𝑛𝑑𝑒𝑥 and 𝑠𝑒𝑙𝑙𝐷𝑎𝑡𝑒𝐼𝑛𝑑𝑒𝑥 both are in the later time period.
• 𝑏𝑢𝑦𝐷𝑎𝑡𝑒𝐼𝑛𝑑𝑒𝑥 is in the earlier part and 𝑠𝑒𝑙𝑙𝐷𝑎𝑡𝑒𝐼𝑛𝑑𝑒𝑥 is in the later part of the time period.
The first two cases can be solved with recursion. The third case needs care, because 𝑏𝑢𝑦𝐷𝑎𝑡𝑒𝐼𝑛𝑑𝑒𝑥 is on one
side and 𝑠𝑒𝑙𝑙𝐷𝑎𝑡𝑒𝐼𝑛𝑑𝑒𝑥 is on the other side. In this case we need to find the minimum and maximum prices in the
two sub-parts, and we can solve this in linear time.
// Returns the best (profit, buyDateIndex, sellDateIndex) for the days left..right
struct Interval { int profit, buyDateIndex, sellDateIndex; };
struct Interval stockStrategy(int A[], int left, int right) {
    // Base case: only one day in the range, so no transaction is possible
    if (left + 1 == right)
        return (struct Interval){0, left, left};
    // Recursive cases: solve the two halves, then the crossing case
    // (buy at the minimum of the left half, sell at the maximum of the right half)
𝑇(1) = 1
𝑇(𝑛) = 2𝑇(𝑛/2) + 𝑛
Using 𝐷 & 𝐶 Master theorem, we get the time complexity as 𝑇(𝑛) = O(𝑛𝑙𝑜𝑔𝑛).
Note: For an efficient solution refer to the 𝐷𝑦𝑛𝑎𝑚𝑖𝑐 𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 chapter.
Problem-26 Closest-Pair of Points: Given a set of 𝑛 points, 𝑆 = {p1 , p2 , p3 , … , pn }, where pi = (xi , yi ). Find the pair
of points having the smallest distance among all pairs (assume that all points are in one dimension).
Solution: Let us assume that we have sorted the points. Since the points are in one dimension, all the points are
in a line after we sort them (either on 𝑋-axis or 𝑌-axis). The complexity of sorting is O(𝑛𝑙𝑜𝑔𝑛). After sorting we can
go through them to find the consecutive points with the least difference. So the problem in one dimension is solved
in O(𝑛𝑙𝑜𝑔𝑛) time which is mainly dominated by sorting time.
Time Complexity: O(𝑛𝑙𝑜𝑔𝑛).
Problem-27 For Problem-26, how do we solve it if the points are in two-dimensional space?
Solution: Before going to the algorithm, let us consider the following mathematical equation:
𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒(𝑝1 , 𝑝2 ) = √((𝑥1 − 𝑥2 )² + (𝑦1 − 𝑦2 )²)
The above equation calculates the distance between two points 𝑝1 = (𝑥1 , 𝑦1 ) and 𝑝2 = (𝑥2 , 𝑦2 ).
Brute Force Solution:
• Calculate the distances between all the pairs of points. From 𝑛 points there are 𝑛𝐶2 ways of selecting 2
points, and 𝑛𝐶2 = O(𝑛²).
• After finding distances for all O(𝑛²) possibilities, we select the one which gives the minimum distance, and
this takes O(𝑛²).
The overall time complexity is O(𝑛2 ).
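A minimal sketch of this brute force approach, assuming the points are supplied as coordinate arrays 𝑋[] and 𝑌[] (the representation is an assumption):
#include <math.h>
#include <float.h>
/* returns the smallest distance among all pairs of the n points (X[i], Y[i]) */
double closestPairBruteForce(double X[], double Y[], int n) {
    double best = DBL_MAX;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {   // every unordered pair once
            double dx = X[i] - X[j], dy = Y[i] - Y[j];
            double d = sqrt(dx * dx + dy * dy);
            if (d < best)
                best = d;
        }
    return best;    // O(n^2) pairs, O(1) work per pair
}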
Problem-28 Give O(𝑛𝑙𝑜𝑔𝑛) solution for the 𝑐𝑙𝑜𝑠𝑒𝑠𝑡 𝑝𝑎𝑖𝑟 problem (Problem-27)?
Solution: To find O(𝑛𝑙𝑜𝑔𝑛) solution, we can use the D & C technique. Before starting the divide-and-conquer
process let us assume that the points are sorted by increasing 𝑥-coordinate. Divide the points into two equal
halves based on median of 𝑥-coordinates. That means the problem is divided into that of finding the closest pair
in each of the two halves. For simplicity let us consider the following algorithm to understand the process.
Algorithm:
1) Sort the given points in 𝑆 (given set of points) based on their 𝑥 −coordinates. Partition 𝑆 into two subsets, 𝑆1
and 𝑆2 , about the line 𝑙 through median of 𝑆. This step is the 𝐷𝑖𝑣𝑖𝑑𝑒 part of the 𝐷 & 𝐶 technique.
2) Recursively find the closest-pair distances in 𝑆1 and 𝑆2 , and call them 𝐿 and 𝑅.
3) Now, steps 4 to 8 form the Combining component of the 𝐷 & 𝐶 technique.
4) Let us assume that 𝛿 = 𝑚𝑖𝑛 (𝐿, 𝑅).
5) Eliminate points that are farther than 𝛿 apart from 𝑙.
6) Consider the remaining points and sort based on their 𝑦-coordinates.
7) Scan the remaining points in the 𝑦 order and compute the distances of each point to all its neighbors that
are distanced no more than 2 × 𝛿 (that's the reason for sorting according to 𝑦).
8) If any of these distances is less than 𝛿 then update 𝛿.
[Figure: the points sorted by 𝑥-coordinate; the line 𝑙 passes through the median point and divides the set into 2 equal parts, with a 2 × 𝛿 strip around 𝑙.]
Let 𝛿 = 𝑚𝑖𝑛(𝐿, 𝑅), where L is the solution to first sub problem and R is the solution to second sub problem. The
possible candidates for closest-pair, which are across the dividing line, are those which are less than δ distance
from the line. So we need only the points which are inside the 2 × δ area across the dividing line as shown in the
figure. Now, to check all points within distance δ from the line, consider the following figure.
[Figure: a 2𝛿 × 2𝛿 square straddling the dividing line, split into 𝛿-wide halves on either side.]
From the above diagram we can see that a maximum of 12 points can be placed inside the square with a distance
not less than 𝛿. That means, we need to check only the distances which are within 11 positions in the sorted list.
This is similar to the one above, but with the difference that in the above combining of subproblems, there are no
vertical bounds. So we can apply the 12-point box tactic over all the possible boxes in the 2 × 𝛿 area with the
dividing line as the middle line. As there can be a maximum of 𝑛 such boxes in the area, the total time for finding
the closest pair in the corridor is O(𝑛).
Analysis:
1) Step-1 and Step-2 take O(𝑛𝑙𝑜𝑔𝑛) for sorting and recursively finding the minimum.
2) Step-4 takes O(1).
3) Step-5 takes O(𝑛) for scanning and eliminating.
4) Step-6 takes O(𝑛𝑙𝑜𝑔𝑛) for sorting.
5) Step-7 takes O(𝑛) for scanning.
The total complexity: 𝑇(𝑛) = O(𝑛𝑙𝑜𝑔𝑛) + O(1) + O(𝑛) + O(𝑛) + O(𝑛) ≈ O(𝑛𝑙𝑜𝑔𝑛).
Problem-29 To calculate 𝑘 𝑛 , give the algorithm and discuss its complexity.
Solution: The naive algorithm to compute 𝑘ⁿ is: start with 1 and multiply by 𝑘 until reaching 𝑘ⁿ. For this approach,
there are 𝑛 − 1 multiplications and each takes constant time, giving an O(𝑛) algorithm. But there is a faster way to
compute 𝑘 𝑛 . For example,
9²⁴ = (9¹²)² = ((9⁶)²)² = (((9³)²)²)² = (((9² · 9)²)²)²
Note that taking the square of a number needs only one multiplication; this way, to compute 924 we need only 5
multiplications instead of 23.
int exponential(int k, int n) {
    int a;
    if (n == 0)                // k^0 = 1
        return 1;
    if (n % 2 == 1) {          // odd exponent: k^n = k^(n-1) * k
        a = exponential(k, n - 1);
        return a * k;
    }
    else {                     // even exponent: k^n = (k^(n/2))^2
        a = exponential(k, n / 2);
        return a * a;
    }
}
Let 𝑇(𝑛) be the number of multiplications required to compute 𝑘ⁿ. For simplicity, assume 𝑛 = 2^𝑖 for some 𝑖 ≥ 1.
𝑇(𝑛) = 𝑇(𝑛/2) + 1
Using master theorem we get T(𝑛) = O (𝑙𝑜𝑔𝑛).
Problem-30 The Skyline Problem: Given the exact locations and shapes of 𝑛 rectangular buildings in a 2-
dimensional city. There is no particular order for these rectangular buildings. Assume that the bottom of all
buildings lie on a fixed horizontal line (bottom edges are collinear). The input is a list of triples; one per building.
A building 𝐵𝑖 is represented by the triple (𝑙𝑖 , ℎ𝑖 , 𝑟𝑖 ) where 𝑙𝑖 denotes the 𝑥-position of the left edge, 𝑟𝑖 denotes
the 𝑥-position of the right edge, and ℎ𝑖 denotes the building’s height. Give an algorithm that computes the
skyline (in 2 dimensions) of these buildings, eliminating hidden lines. In the diagram below there are 8
buildings, represented from left to right by the triplets (1, 14, 7), (3, 9, 10), (5, 17, 12), (14, 11, 18), (15, 6, 27),
(20, 19, 22), (23, 15, 30) and (26, 14, 29).
[Figure: the eight buildings above drawn on a common horizontal baseline, with the skyline outlined by a thick line.]
The output is a collection of points which describe the path of the skyline. In some versions of the problem this
collection of points is represented by a sequence of numbers 𝑝1 , 𝑝2 , ..., 𝑝𝑛 , such that the point 𝑝𝑖 represents a
horizontal line drawn at height 𝑝𝑖 if 𝑖 is even, and it represents a vertical line drawn at position 𝑝𝑖 if 𝑖 is odd. In
our case the collection of points will be a sequence of 𝑝1 , 𝑝2 , ..., 𝑝𝑛 pairs of (𝑥𝑖 , ℎ𝑖 ) where 𝑝𝑖 (𝑥𝑖 , ℎ𝑖 ) represents the ℎ𝑖
height of the skyline at position 𝑥𝑖 . In the diagram above the skyline is drawn with a thick line around the buildings
and it is represented by the sequence of position-height pairs (1, 14), (5, 17), (12, 0), (14, 11), (18, 6), (20, 19), (22,
6), (23, 15) and (30, 0). Also, assume that the 𝑙𝑖 coordinate of the leftmost building is at least 1 and that the 𝑟𝑖
coordinate of the rightmost building is at most 1000.
Solution: The most important piece of information is that we know that the left and right coordinates of each and
every building are non-negative integers not larger than 1000. Now why is this important? Because we can assign a
height-value to every distinct 𝑥 coordinate between 0 and 999 (the building ranges are half-open, [𝑙𝑖 .. 𝑟𝑖 )).
Algorithm:
• Allocate an array for 1000 elements and initialize all of the elements to 0. Let's call this array 𝑎𝑢𝑥𝐻𝑒𝑖𝑔ℎ𝑡𝑠.
• Iterate over all of the buildings and for every 𝐵𝑖 building iterate on the range of [𝑙𝑖 .. 𝑟𝑖 ) where 𝑙𝑖 is the left,
𝑟𝑖 is the right coordinate of the building 𝐵𝑖 .
• For every 𝑥𝑗 element of this range check if ℎ𝑖 >𝑎𝑢𝑥𝐻𝑒𝑖𝑔ℎ𝑡𝑠[xj], that is if building 𝐵𝑖 is taller than the current
height-value at position 𝑥𝑗 . If so, replace 𝑎𝑢𝑥𝐻𝑒𝑖𝑔ℎ𝑡𝑠[𝑥𝑗 ] with ℎ𝑖 .
Once we have checked all the buildings, the 𝑎𝑢𝑥𝐻𝑒𝑖𝑔ℎ𝑡𝑠 array stores the height of the tallest building at every position.
There is one more thing to do: convert the 𝑎𝑢𝑥𝐻𝑒𝑖𝑔ℎ𝑡𝑠 array to the expected output format, that is to a sequence
of position-height pairs. It's also easy: just map each and every 𝑖 index to an (i, 𝑎𝑢𝑥𝐻𝑒𝑖𝑔ℎ𝑡𝑠[i]) pair.
#include <stdio.h>
#define MaxRightMostBuildingRi 1000
int auxHeights[MaxRightMostBuildingRi + 1];   // +1 so the final index is valid
int skyLineBruteForce() {
    int left, h, right, i, prev;
    int rightMostBuildingRi = 0;
    // read triples (left, height, right); record the tallest building at every x
    while (scanf("%d %d %d", &left, &h, &right) == 3) {
        for (i = left; i < right; i++)
            if (auxHeights[i] < h)
                auxHeights[i] = h;
        if (rightMostBuildingRi < right)
            rightMostBuildingRi = right;
    }
    // print a position-height pair wherever the skyline height changes
    prev = 0;
    for (i = 1; i < rightMostBuildingRi; i++) {
        if (prev != auxHeights[i]) {
            printf("%d %d ", i, auxHeights[i]);
            prev = auxHeights[i];
        }
    }
    printf("%d %d\n", rightMostBuildingRi, auxHeights[rightMostBuildingRi]);
    return 0;
}
Let's have a look at the time complexity of this algorithm. Assume that, 𝑛 indicates the number of buildings in the
input sequence and 𝑚 indicates the maximum coordinate (right most building 𝑟𝑖 ). From the above code, it is clear
that for every new input building, we are traversing from 𝑙𝑒𝑓𝑡 (𝑙𝑖 ) to 𝑟𝑖𝑔ℎ𝑡 (𝑟𝑖 ) to update the heights. In the worst
case, with 𝑛 equal-size buildings, each having 𝑙 = 0 left and 𝑟 = 𝑚 − 1 right coordinates, that is every building
spans over the whole [0. . 𝑚) interval. Thus the running time of setting the height of every position is O(𝑛 × 𝑚). The
overall time-complexity is O(𝑛 × 𝑚), which is a lot larger than O(𝑛2 ) if 𝑚 > 𝑛.
Problem-31 Can we improve the solution of the Problem-30?
Solution:
[Figure: the same eight buildings, with the merged skyline drawn over them.]
It would be a huge speed-up if somehow we could determine the skyline by calculating the height for those
coordinates only where it matters, wouldn't it? Intuition tells us that if we can insert a building into an 𝑒𝑥𝑖𝑠𝑡𝑖𝑛𝑔
𝑠𝑘𝑦𝑙𝑖𝑛𝑒 then instead of all the coordinates the building spans over we only need to check the height at the left and
right coordinates of the building plus those coordinates of the skyline the building overlaps with and may modify.
Is merging two skylines substantially different from merging a building with a skyline? The answer is, of course,
No. This suggests that we use divide-and-conquer. Divide the input of 𝑛 buildings into two equal sets. Compute
(recursively) the skyline for each set then merge the two skylines. Inserting the buildings one after the other is not
the fastest way to solve this problem, as we've seen above. If, however, we first merge pairs of buildings into
skylines, then we merge pairs of these skylines into bigger skylines (and not two sets of buildings), and then merge
pairs of these bigger skylines into even bigger ones, then - since the problem size is halved in every step - after
𝑙𝑜𝑔𝑛 steps we can compute the final skyline.
#include <vector>
using namespace std;
class SkyLineDivideandConquer {
public:
vector<pair<int, int>> getSkyline(int start, int end, vector<vector<int>>& buildings) {
if (start == end) {
vector<pair<int, int>> result;
result.push_back(pair<int, int>(buildings[start][0], buildings[start][1]));
result.push_back(pair<int, int>(buildings[start][2], 0));
return result;
}
int mid = (end + start) / 2;
vector<pair<int, int>> leftSkyline = getSkyline(start, mid, buildings);
vector<pair<int, int>> rightSkyline = getSkyline(mid + 1, end, buildings);
vector<pair<int, int>> result = mergeSkylines(leftSkyline, rightSkyline);
return result;
}
vector<pair<int, int>> mergeSkylines(vector<pair<int, int>>& leftSkyline, vector<pair<int, int>>& rightSkyline) {
vector<pair<int, int>> result;
int i = 0, j = 0, currentHeight1 = 0, currentHeight2 = 0;
int maxH = max(currentHeight1, currentHeight2);
while (i != leftSkyline.size() && j != rightSkyline.size()) {
if (leftSkyline[i].first < rightSkyline[j].first) {
currentHeight1 = leftSkyline[i].second;
if (maxH != max(currentHeight1, currentHeight2))
result.push_back(pair<int, int>(leftSkyline[i].first, max(currentHeight1, currentHeight2)));
maxH = max(currentHeight1, currentHeight2);
i++;
}
else if (leftSkyline[i].first > rightSkyline[j].first) {
currentHeight2 = rightSkyline[j].second;
if (maxH != max(currentHeight1, currentHeight2))
result.push_back(pair<int, int>(rightSkyline[j].first, max(currentHeight1, currentHeight2)));
maxH = max(currentHeight1, currentHeight2);
j++;
} else {
if(leftSkyline[i].second >= rightSkyline[j].second) {
currentHeight1 = leftSkyline[i].second;
currentHeight2 = rightSkyline[j].second;
if(maxH != max(currentHeight1, currentHeight2))
result.push_back(pair<int, int>(leftSkyline[i].first, leftSkyline[i].second));
maxH = max(currentHeight1, currentHeight2);
i++;
j++;
} else {
currentHeight1 = leftSkyline[i].second;
currentHeight2 = rightSkyline[j].second;
if(maxH != max(currentHeight1, currentHeight2))
result.push_back(pair<int, int>(rightSkyline[j].first, rightSkyline[j].second));
maxH = max(currentHeight1, currentHeight2);
i++;
j++;
}
}
}
while (j < rightSkyline.size()) {
result.push_back(rightSkyline[j]);
j++;
}
while (i != leftSkyline.size()) {
result.push_back(leftSkyline[i]);
i++;
}
return result;
}
};
For example, given two skylines A=(𝑎1 , h𝑎1 , 𝑎2 , h𝑎2 , …, 𝑎𝑛 , 0) and B=(𝑏1 , h𝑏1 , 𝑏2 , h𝑏2 , …, 𝑏𝑚 , 0), we merge these
lists as the new list: (𝑐1 , h𝑐1 , 𝑐2 , h𝑐2 , …, 𝑐𝑛+𝑚 , 0). Clearly, we merge the list of 𝑎’s and 𝑏’s just like in the standard
Merge algorithm. But, in addition to that, we have to decide on the correct height in between these boundary
values. We use two variables 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐻𝑒𝑖𝑔ℎ𝑡1 and 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐻𝑒𝑖𝑔ℎ𝑡2 (note that these are the heights prior to
encountering the heads of the lists) to store the current height of the first and the second skyline, respectively.
When comparing the head entries (𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐻𝑒𝑖𝑔ℎ𝑡1, 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐻𝑒𝑖𝑔ℎ𝑡2) of the two skylines, we introduce a new strip
(and append to the output skyline) whose x-coordinate is the minimum of the entries’ x-coordinates and whose
height is the maximum of 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐻𝑒𝑖𝑔ℎ𝑡1 and 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝐻𝑒𝑖𝑔ℎ𝑡2. This algorithm has a structure similar to Mergesort.
So the overall running time of the divide and conquer approach will be O(𝑛𝑙𝑜𝑔𝑛).
Chapter 19: Dynamic Programming
19.1 Introduction
In this chapter, we will try to solve a few of the problems for which we failed to get optimal solutions using other
techniques (say, 𝐺𝑟𝑒𝑒𝑑𝑦 and 𝐷𝑖𝑣𝑖𝑑𝑒 & 𝐶𝑜𝑛𝑞𝑢𝑒𝑟 approaches). Dynamic Programming is a simple technique but it
can be difficult to master. Being able to tackle problems of this type would greatly increase your skill.
Dynamic programming (usually referred to as DP) is a very powerful technique to solve a particular class of
problems. It demands very elegant formulation of the approach and simple thinking and the coding part is very
easy. The idea is very simple, if you have solved a problem with the given input, then save the result for future
reference, so as to avoid solving the same problem again. Simply, we need to remember the past.
One easy way to identify and master DP problems is by solving as many problems as possible. The term DP is not
related to coding; it comes from the older mathematical literature, where 𝑝𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 means planning by filling tables.
A greedy algorithm, in contrast, never reconsiders its choices. This is the main difference from dynamic
programming, which is exhaustive and is guaranteed to find the solution. After every stage, dynamic programming
makes decisions based on all the decisions made in the previous stage, and may reconsider the previous stage's
algorithmic path to solution.
The main difference between dynamic programming and divide and conquer is that in the case of the latter, sub
problems are independent, whereas in DP there can be an overlap of sub problems.
In the above example, 𝑓𝑖𝑏(2) was calculated three times (overlapping of subproblems). If 𝑛 is big, then many more
values of 𝑓𝑖𝑏 (subproblems) are recalculated, which leads to an exponential time algorithm. Instead of solving the
same subproblems again and again we can store the previous calculated values and reduce the complexity.
[Figure: recursion tree of fib(5); subtrees such as fib(2) and fib(3) appear multiple times.]
Note: For all problems, it may not be possible to find both top-down and bottom-up programming solutions.
Both versions of the Fibonacci series implementations clearly reduce the problem complexity to O(𝑛). This is
because if a value is already computed then we are not calling the subproblems again. Instead, we are directly
taking its value from the table.
Time Complexity: O(𝑛). Space Complexity: O(𝑛), for table.
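For reference, a minimal top-down (memoized) sketch along these lines (the table size and name are assumptions):
#define MAXN 1024
int fibTable[MAXN];             // zero-initialized; 0 means "not computed yet"
int fib(int n) {
    if (n <= 1)
        return n;               // fib(0) = 0, fib(1) = 1
    if (fibTable[n] != 0)
        return fibTable[n];     // reuse the stored value instead of recursing
    fibTable[n] = fib(n - 1) + fib(n - 2);   // each subproblem is solved once
    return fibTable[n];
}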
Further Improving
One more observation from the Fibonacci series is: The current value is the sum of the previous two calculations
only. This indicates that we don’t have to store all the previous values. Instead, if we store just the last two values,
we can calculate the current value. The implementation for this is given below:
int fibonacci(int n) {
    int a = 0, b = 1, sum, i;
    if (n == 0)
        return a;
    for (i = 2; i <= n; i++) {   // current value is the sum of the previous two
        sum = a + b;
        a = b;
        b = sum;
    }
    return b;
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Note: This method may not be applicable (available) for all problems.
Observations
While solving the problems using DP, try to figure out the following:
• See how the problems are defined in terms of subproblems recursively.
• See if we can use some table [memoization] to avoid the repeated calculations.
[Figure: the example strings 𝑋 and 𝑌 = B D C A B A compared character by character.]
From the above observation, we can see that the current characters of 𝑋 and 𝑌 may or may not match. That
means, suppose that the two first characters differ. Then it is not possible for both of them to be part of a common
subsequence - one or the other (or maybe both) will have to be removed. Finally, observe that once we have decided
what to do with the first characters of the strings, the remaining sub problem is again a 𝐿𝐶𝑆 problem, on two
shorter strings. Therefore, we can solve it recursively.
The solution to 𝐿𝐶𝑆 should find two sequences in 𝑋 and 𝑌 and let us say the starting index of sequence in 𝑋 is
𝑖 and the starting index of sequence in 𝑌 is 𝑗. Also, assume that 𝑋[𝑖 … 𝑚] is a substring of 𝑋 starting at character 𝑖
and going until the end of 𝑋, and that 𝑌[𝑗 … 𝑛] is a substring of 𝑌 starting at character 𝑗 and going until the end of
𝑌.
Based on the above discussion, here we get the possibilities described below:
1) If 𝑋[𝑖] == 𝑌[𝑗] : 1 + 𝐿𝐶𝑆(𝑖 + 1, 𝑗 + 1)
2) If 𝑋[𝑖] ≠ 𝑌[𝑗]: 𝐿𝐶𝑆(𝑖, 𝑗 + 1) // skipping 𝑗𝑡ℎ character of 𝑌
3) If 𝑋[𝑖] ≠ 𝑌[𝑗]: 𝐿𝐶𝑆(𝑖 + 1, 𝑗) // skipping 𝑖 𝑡ℎ character of 𝑋
In the first case, if 𝑋[𝑖] is equal to 𝑌[𝑗], we get a matching pair and can count it towards the total length of the 𝐿𝐶𝑆.
Otherwise, we need to skip either 𝑖 𝑡ℎ character of 𝑋 or 𝑗𝑡ℎ character of 𝑌 and find the longest common subsequence.
Now, 𝐿𝐶𝑆(𝑖, 𝑗) can be defined as:
𝐿𝐶𝑆(𝑖, 𝑗) = 0, if 𝑖 = 𝑚 or 𝑗 = 𝑛
𝐿𝐶𝑆(𝑖, 𝑗) = 𝑀𝑎𝑥{𝐿𝐶𝑆(𝑖, 𝑗 + 1), 𝐿𝐶𝑆(𝑖 + 1, 𝑗)}, if 𝑋[𝑖] ≠ 𝑌[𝑗]
𝐿𝐶𝑆(𝑖, 𝑗) = 1 + 𝐿𝐶𝑆(𝑖 + 1, 𝑗 + 1), if 𝑋[𝑖] == 𝑌[𝑗]
LCS has many applications. In web searching, for example, we may want to find the smallest number of changes
needed to change one word into another, where a 𝑐ℎ𝑎𝑛𝑔𝑒 is an insertion, deletion, or replacement of a single character.
//Initial Call: LCSLength(X, 0, m, Y, 0, n);
int LCSLength( int X[], int i, int m, int Y[], int j, int n) {
if (i == m || j == n)
return 0;
else if (X[i] == Y[j]) return 1 + LCSLength(X, i+1, m, Y, j+1, n);
else return max( LCSLength(X, i+1, m, Y, j, n), LCSLength(X, i, m, Y, j+1, n));
}
This is a correct solution, but it is very time consuming. For example, if the two strings have no matching
characters, the last line always gets executed, which gives (if 𝑚 == 𝑛) close to O(2ⁿ) running time.
DP Solution: Adding Memoization: The problem with the recursive solution is that the same subproblems get
called many different times. A subproblem consists of a call to LCSLength, with the arguments being two suffixes
of 𝑋 and 𝑌, so there are exactly (𝑚 + 1)(𝑛 + 1) possible subproblems (a relatively small number). If there are nearly
2ⁿ recursive calls, some of these subproblems must be solved over and over.
The DP solution is to check, whenever we want to solve a subproblem, whether we've already solved it before. If so,
we look up the solution instead of solving it again. Implemented in the most direct way, we just add some table
lookup code to our recursive solution, as shown below:
int LCS[1024][1024];
int LCSLength( int X[], int m, int Y[], int n) {
[Figure: the value 𝐿[𝑖][𝑗] is computed from its neighbors 𝐿[𝑖][𝑗 + 1], 𝐿[𝑖 + 1][𝑗], and 𝐿[𝑖 + 1][𝑗 + 1].]
The value of 𝐿𝐶𝑆[𝑖][𝑗] depends on 3 other values (𝐿𝐶𝑆[𝑖 + 1][𝑗 + 1], 𝐿𝐶𝑆[𝑖][𝑗 + 1] and 𝐿𝐶𝑆[𝑖 + 1][𝑗]), all of which have
larger values of 𝑖 or 𝑗. They go through the table in the order of decreasing 𝑖 and 𝑗 values. This will guarantee that
when we need to fill in the value of 𝐿𝐶𝑆[𝑖][𝑗], we already know the values of all the cells on which it depends.
Time Complexity: O(𝑚𝑛), since 𝑖 takes values from 1 to 𝑚 and 𝑗 takes values from 1 to 𝑛. Space
Complexity: O(𝑚𝑛).
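A minimal bottom-up sketch of this table fill (the array bound and the function name are assumptions):
int LCS[1024][1024];
/* LCS[i][j] = length of the LCS of the suffixes X[i..m-1] and Y[j..n-1];
   filled with decreasing i and j, as described above. */
int LCSLengthDP(int X[], int m, int Y[], int n) {
    for (int i = m; i >= 0; i--)
        for (int j = n; j >= 0; j--) {
            if (i == m || j == n)
                LCS[i][j] = 0;                   // base case: an empty suffix
            else if (X[i] == Y[j])
                LCS[i][j] = 1 + LCS[i+1][j+1];   // matching pair
            else
                LCS[i][j] = LCS[i][j+1] > LCS[i+1][j] ? LCS[i][j+1] : LCS[i+1][j];
        }
    return LCS[0][0];
}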
Note: In the above discussion, we have assumed 𝐿𝐶𝑆(𝑖, 𝑗) is the length of the 𝐿𝐶𝑆 with 𝑋[𝑖 … 𝑚] and 𝑌[𝑗 … 𝑛]. We can
solve the problem by changing the definition as 𝐿𝐶𝑆(𝑖, 𝑗) is the length of the 𝐿𝐶𝑆 with 𝑋[1 … 𝑖] and 𝑌[1 … 𝑗].
Printing the subsequence: The above algorithm can find the length of the longest common subsequence but
cannot give the actual longest subsequence. To get the sequence, we trace it through the table. Start at cell
(0, 0). We know that the value of 𝐿𝐶𝑆[0][0] was the maximum of 3 values of the neighboring cells. So we simply
recompute 𝐿𝐶𝑆[0][0] and note which cell gave the maximum value. Then we move to that cell (it will be one of (1, 1),
(0, 1) or (1, 0)) and repeat this until we hit the boundary of the table. Every time we pass through a cell (𝑖, 𝑗) where
𝑋[𝑖] == 𝑌[𝑗], we have a matching pair and print 𝑋[𝑖]. At the end, we will have printed the longest common
subsequence in O(𝑚𝑛) time.
An alternative way of getting the path is to keep a separate table with an entry for each cell. This will tell us which direction we came
from when computing the value of that cell. At the end, we again start at cell (0, 0) and follow these directions
until the opposite corner of the table.
From the above examples, I hope you understood the idea behind DP. Now let us see more problems which can
be easily solved using the DP technique.
Note: As we have seen above, in DP the main component is recursion. If we know the recurrence then converting
that to code is a minimal task. For the problems below, we concentrate on getting the recurrence.
Solution: The code for the given recursive formula can be given as:
int f(int n) {
int sum = 0;
if(n==0 || n==1) //Base Case
return 2;
//recursive case
for(int i=1; i < n;i++)
sum += 2 * f(i) * f(i-1);
return sum;
}
Problem-2 Can we improve the solution to Problem-1 using memoization of DP?
Solution: Yes. Before finding a solution, let us see how the values are calculated.
𝑇(0) = 𝑇(1) = 2
𝑇(2) = 2 ∗ 𝑇(1) ∗ 𝑇(0)
𝑇(3) = 2 ∗ 𝑇(1) ∗ 𝑇(0) + 2 ∗ 𝑇(2) ∗ 𝑇(1)
𝑇(4) = 2 ∗ 𝑇(1) ∗ 𝑇(0) + 2 ∗ 𝑇(2) ∗ 𝑇(1) + 2 ∗ 𝑇(3) ∗ 𝑇(2)
From the above calculations it is clear that there are lots of repeated calculations with the same input values. Let
us use a table for avoiding these repeated calculations, and the implementation can be given as:
int T[1024];   // table for storing already-computed values
int f(int n) {
T[0] = T[1] = 2;
for(int i=2; i <= n; i++) {
T[i] = 0;
for (int j=1; j < i; j++)
T[i] +=2 * T[j] * T[j-1];
}
return T[n];
}
Time Complexity: O(𝑛2 ), two 𝑓𝑜𝑟 loops. Space Complexity: O(𝑛), for table.
Problem-3 Can we further improve the complexity of Problem-2?
Solution: Yes, since all sub-problem calculations are dependent only on previous calculations, the code can be
modified as:
int T[1024];   // table for storing already-computed values
int f(int n) {
T[0] = T[1] = 2;
T[2] = 2 * T[0] * T[1];
for(int i=3; i <= n; i++)
T[i]=T[i-1] + 2 * T[i-1] * T[i-2];
return T[n];
}
Time Complexity: O(𝑛), since only one 𝑓𝑜𝑟 loop. Space Complexity: O(𝑛).
Problem-4 Maximum Value Contiguous Subsequence: Given an array of 𝑛 numbers, give an algorithm for
finding a contiguous subsequence 𝐴(𝑖). . . 𝐴(𝑗) for which the sum of elements is maximum.
Example: {-2, 11, -4, 13, -5, 2} → 20 and {1, -3, 4, -2, -1, 6} → 7
Solution: Goal: If there are no negative numbers, then the solution is just the sum of all elements in the given
array. If negative numbers are there, then our aim is to maximize the sum [there can be a negative number in the
contiguous sum].
One simple and brute force approach is to see all possible sums and select the one which has maximum value.
int maxContiguousSum(int A[], int n) {
    int maxSum = 0;
    for(int i = 0; i < n; i++) {        // for each possible start point
        for(int j = i; j < n; j++) {    // for each possible end point
            int currentSum = 0;
            for(int k = i; k <= j; k++)
                currentSum += A[k];
            if(currentSum > maxSum)
                maxSum = currentSum;
        }
    }
    return maxSum;
}
Time Complexity: O(𝑛3 ). Space Complexity: O(1).
Problem-5 Can we improve the complexity of Problem-4?
Solution: Yes. One important observation is that, if we have already calculated the sum for the subsequence
𝑖, … , 𝑗 − 1, then we need only one more addition to get the sum for the subsequence 𝑖, … , 𝑗. But, the Problem-4
algorithm ignores this information. If we use this fact, we can get an improved algorithm with the running time
O(𝑛2 ).
int maxContiguousSum(int A[], int n) {
int maxSum = 0;
for( int i = 0; i < n; i++) {
int currentSum = 0;
for( int j = i; j < n; j++) {
currentSum += A[j];
if(currentSum > maxSum)
maxSum = currentSum;
}
}
return maxSum;
}
Time Complexity: O(𝑛2 ). Space Complexity: O(1).
Problem-6 Can we solve Problem-4 using Dynamic Programming?
Solution: Yes. For simplicity, let us say 𝑀(𝑖) indicates the maximum sum over all windows ending at 𝑖.
To find the maximum sum we have to do one of the following and select the maximum among them:
• either extend the old sum by adding 𝐴[𝑖],
• or start a new window with the single element 𝐴[𝑖].
This gives the recurrence 𝑀(𝑖) = 𝑀𝑎𝑥(𝑀(𝑖 − 1) + 𝐴[𝑖], 𝐴[𝑖]) with 𝑀(0) = 𝐴[0], and the final answer is 𝑀𝑎𝑥𝑖 {𝑀(𝑖)}.
int maxContiguousSumDP(int A[], int n) {
    int M[n], maxSum = A[0];
    M[0] = A[0];
    for (int i = 1; i < n; i++) {
        // either extend the previous window or start fresh at A[i]
        M[i] = (M[i-1] + A[i] > A[i]) ? M[i-1] + A[i] : A[i];
        if (M[i] > maxSum)
            maxSum = M[i];
    }
    return maxSum;
}
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-10 In Problem-9, we assumed that 𝑀(𝑖) represents the maximum sum from 1 to 𝑖 numbers without
selecting two contiguous numbers. Can we solve the same problem by changing the definition as: 𝑀(𝑖)
represents the maximum sum from 𝑖 to 𝑛 numbers without selecting two contiguous numbers?
Solution: Yes. Let us assume that 𝑀(𝑖) represents the maximum sum from 𝑖 to 𝑛 numbers without selecting two
contiguous numbers.
Given an array 𝐴, the recursive formula considers the case of selecting the 𝑖 𝑡ℎ element:
[Figure: the elements 𝐴[𝑖], 𝐴[𝑖 + 1], 𝐴[𝑖 + 2] at the start of the remaining range.]
While computing 𝑀(𝑖), the decision we have to make is, whether to select 𝑖 𝑡ℎ element or not. This gives us the
following possibilities:
𝑀(𝑖) = 𝑀𝑎𝑥{𝐴[𝑖] + 𝐴[𝑖 + 1] + 𝑀(𝑖 + 3), 𝐴[𝑖] + 𝑀(𝑖 + 2), 𝑀(𝑖 + 1)}
• In the given problem the restriction is to not select three continuous numbers, but we can select two
elements continuously and skip the third one. That is what the first case says in the above recursive
formula. That means we are skipping 𝐴[𝑖 + 2].
• The other possibility is selecting the 𝑖 𝑡ℎ element and skipping the (𝑖 + 1)𝑡ℎ element. This is the second case
(skipping 𝐴[𝑖 + 1]).
• And the third case is not selecting 𝑖 𝑡ℎ element and as a result we should solve the problem with 𝑖 + 1
elements.
Time Complexity: O(𝑛). Space Complexity: O(𝑛).
Problem-13 Catalan Numbers: How many binary search trees are there with 𝑛 vertices?
Solution: Binary Search Tree (BST) is a tree where the left subtree elements are less than the root element, and
the right subtree elements are greater than the root element. This property should be satisfied at every node in
the tree. The number of BSTs with 𝑛 nodes is called 𝐶𝑎𝑡𝑎𝑙𝑎𝑛 𝑁𝑢𝑚𝑏𝑒𝑟 and is denoted by 𝐶𝑛 . For example, there are
2 BSTs with 2 nodes (2 choices for the root) and 5 BSTs with 3 nodes.
[Table: the number of BSTs for small 𝑛: 1 tree with one node, 2 trees with two nodes, and 5 trees with three nodes.]
Let us assume that the nodes of the tree are numbered from 1 to 𝑛. Among the nodes, we have to select some
node as root, and then divide the nodes which are less than root node into left sub tree, and elements greater than
root node into right sub tree. Since we have already numbered the vertices, let us assume that the root element
we selected is 𝑖 𝑡ℎ element.
If we select 𝑖 𝑡ℎ element as root then we get 𝑖 − 1 elements on left sub-tree and 𝑛 − 𝑖 elements on right sub tree.
Since 𝐶𝑛 is the Catalan number for 𝑛 elements, 𝐶𝑖−1 represents the Catalan number for left sub tree elements
(𝑖 − 1 elements) and 𝐶𝑛−𝑖 represents the Catalan number for right sub tree elements. The two sub trees are
independent of each other, so we simply multiply the two numbers. That means, the Catalan number for a fixed 𝑖
value is 𝐶𝑖−1 × 𝐶𝑛−𝑖 .
Since there are 𝑛 nodes, for 𝑖 we will get 𝑛 choices. The total Catalan number with 𝑛 nodes can be given as:
𝐶𝑛 = ∑_{𝑖=1}^{𝑛} 𝐶𝑖−1 × 𝐶𝑛−𝑖
int CatalanNumber( int n ) {
if( n == 0 ) return 1;
int count = 0;
for( int i = 1; i <= n; i++ )
count += CatalanNumber (i -1) * CatalanNumber (n -i);
return count;
}
Time Complexity: O(4𝑛 ). For proof, refer to 𝐼𝑛𝑡𝑟𝑜𝑑𝑢𝑐𝑡𝑖𝑜𝑛 chapter.
Problem-14 Can we improve the time complexity of Problem-13 using DP?
Solution: The recursive call 𝐶𝑛 depends only on the numbers 𝐶0 to 𝐶𝑛−1 and for any value of 𝑖, there are a lot of
recalculations. We will keep a table of previously computed values of 𝐶𝑖 . If the function 𝐶𝑎𝑡𝑎𝑙𝑎𝑛𝑁𝑢𝑚𝑏𝑒𝑟() is called
with parameter 𝐢, and if it has already been computed before, then we can simply avoid recalculating the same
subproblem.
int Table[1024];   // zero-initialized global; Table[n] == 0 means "not yet computed"
int CatalanNumber( int n ) {
    if( n == 0 )
        return 1;
    if( Table[n] != 0 )
        return Table[n];
    Table[n] = 0;
    for( int i = 1; i <= n; i++ )
        Table[n] += CatalanNumber( i - 1 ) * CatalanNumber( n - i );
    return Table[n];
}
The time complexity of this implementation is O(𝑛²), because to compute 𝐶𝑎𝑡𝑎𝑙𝑎𝑛𝑁𝑢𝑚𝑏𝑒𝑟(𝑛) we need to compute all
of the 𝐶𝑎𝑡𝑎𝑙𝑎𝑛𝑁𝑢𝑚𝑏𝑒𝑟(𝑖) values between 0 and 𝑛 − 1, and each one will be computed exactly once, in linear time.
In mathematics, the Catalan number can be given by the direct formula: 𝐶𝑛 = (2𝑛)! / (𝑛! (𝑛 + 1)!).
The base case would be 𝑃(1), and obviously, the number of ways to parenthesize the two matrices is 1.
𝑃(1) = 1
This is related to the Catalan numbers (which in turn are related to the number of different binary trees on 𝑛 nodes).
As said above, applying Stirling’s formula, we find that 𝐶(𝑛) is O(4ⁿ/(𝑛^(3/2)√𝜋)). Since 4ⁿ is exponential and 𝑛^(3/2) is just
polynomial, the exponential will dominate, implying that the function grows very fast. Thus, this will not be practical
except for very small 𝑛. In summary, brute force is not an option.
Now let us use DP to improve this time complexity. Assume that, 𝑀[𝑖, 𝑗] represents the least number of
multiplications needed to multiply 𝐴𝑖 · · · 𝐴𝑗 .
𝑀[𝑖, 𝑗] = 0, if 𝑖 = 𝑗
𝑀[𝑖, 𝑗] = 𝑀𝑖𝑛_{𝑖 ≤ 𝑘 < 𝑗}{𝑀[𝑖, 𝑘] + 𝑀[𝑘 + 1, 𝑗] + 𝑃𝑖−1 𝑃𝑘 𝑃𝑗 }, if 𝑖 < 𝑗
The above recursive formula says that we have to find point 𝑘 such that it produces the minimum number of
multiplications. After computing all possible values for 𝑘, we have to select the 𝑘 value which gives minimum
value. We can use one more table (say, 𝑆[𝑖, 𝑗]) to reconstruct the optimal parenthesizations. Compute the 𝑀[𝑖, 𝑗]
and 𝑆[𝑖, 𝑗] in a bottom-up fashion.
/* P is the sizes of the matrices, Matrix i has the dimension P[i-1] x P[i].
M[i,j] is the best cost of multiplying matrices i through j
S[i,j] saves the multiplication point and we use this for back tracing */
void MatrixChainOrder(int P[], int length) {
    int n = length - 1, M[n + 1][n + 1], S[n + 1][n + 1];   // 1-indexed tables
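    /* Continuation sketch (the loop structure below is an assumption): fill
       M[][] bottom-up; matrix i has dimensions P[i-1] x P[i] as noted above. */
    for (int i = 1; i <= n; i++)
        M[i][i] = 0;                       // a single matrix needs no multiplication
    for (int len = 2; len <= n; len++) {   // chain length
        for (int i = 1; i <= n - len + 1; i++) {
            int j = i + len - 1;
            M[i][j] = INT_MAX;             // needs #include <limits.h>
            for (int k = i; k < j; k++) {  // try every split point k
                int cost = M[i][k] + M[k + 1][j] + P[i - 1] * P[k] * P[j];
                if (cost < M[i][j]) {
                    M[i][j] = cost;
                    S[i][j] = k;           // remember the split for back tracing
                }
            }
        }
    }
    // M[1][n] now holds the least number of multiplications for A1 ... An
}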
Real-time example: Suppose we are going by flight, and we know that there is a limitation on the luggage
weight. Also, the items which we are carrying can be of different types (like laptops, etc.). In this case, our
objective is to select the items with maximum total value. That means, we need to tell the customs officer to leave
behind the items which have more weight and less value (profit).
Solution: Input is a set of 𝑛 items with sizes 𝑠𝑖 and values 𝑣𝑖 and a Knapsack of size 𝐶 which we need to fill with
a subset of items from the given set. Let us try to find the recursive formula for this problem using DP. Let
𝑀(𝑖, 𝑗) represent the optimal value we can get for filling up a knapsack of size 𝑗 with items 1 … 𝑖. The recursive
formula can be given as:
𝑀(𝑖, 𝑗) = 𝑀𝑎𝑥 {𝑀(𝑖 − 1, 𝑗), 𝑀(𝑖 − 1, 𝑗 − 𝑠𝑖 ) + 𝑣𝑖 }
[Figure: 𝑀(𝑖, 𝑗) is obtained from the previous row, from the entries 𝑀(𝑖 − 1, 𝑗) and 𝑀(𝑖 − 1, 𝑗 − 𝑠𝑖 ) + 𝑣𝑖 .]
Since 𝑖 takes values from 1 … 𝑛 and 𝑗 takes values from 1 … 𝐶, there are a total of 𝑛𝐶 subproblems. Now let us see
what the above formula says:
• 𝑀(𝑖 − 1, 𝑗): Indicates the case of not selecting the 𝑖 𝑡ℎ item. In this case, since we are not adding any size to
the knapsack we have to use the same knapsack size for subproblems but excluding the 𝑖 𝑡ℎ item. The
remaining items are 𝑖 − 1.
• 𝑀(𝑖 − 1, 𝑗 − 𝑠𝑖 ) + 𝑣𝑖 indicates the case where we have selected the 𝑖 𝑡ℎ item. If we add the 𝑖 𝑡ℎ item then we
have to reduce the subproblem knapsack size to 𝑗 − 𝑠𝑖 and at the same time we need to add the value 𝐯𝐢
to the optimal solution. The remaining items are 𝑖 − 1.
Now, after finding all 𝑀(𝑖, 𝑗) values, the optimal objective value can be obtained as: 𝑀𝑎𝑥𝑗 {𝑀(𝑛, 𝑗)}
This is because we do not know what amount of capacity gives the best solution.
In order to compute some value 𝑀(𝑖, 𝑗), we take the maximum of 𝑀(𝑖 − 1, 𝑗) and 𝑀(𝑖 − 1, 𝑗 − 𝑠𝑖 ) + 𝑣𝑖 . These two values
(𝑀(𝑖 − 1, 𝑗) and 𝑀(𝑖 − 1, 𝑗 − 𝑠𝑖 )) appear in the previous row, in the same and in an earlier column. So, 𝑀(𝑖, 𝑗) can be
computed just by looking at two values in the previous row of the table.
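Based on this recurrence, a minimal bottom-up sketch (the 1-indexed size and value arrays 𝑠[] and 𝑣[], and the use of C99 variable-length arrays, are assumptions):
/* 0/1 Knapsack: n items with sizes s[1..n] and values v[1..n], capacity C */
int knapsack(int s[], int v[], int n, int C) {
    int M[n + 1][C + 1];
    for (int j = 0; j <= C; j++)
        M[0][j] = 0;                       // no items: value 0
    for (int i = 1; i <= n; i++) {
        for (int j = 0; j <= C; j++) {
            M[i][j] = M[i - 1][j];         // case 1: skip item i
            if (j >= s[i] && M[i - 1][j - s[i]] + v[i] > M[i][j])
                M[i][j] = M[i - 1][j - s[i]] + v[i];   // case 2: take item i
        }
    }
    return M[n][C];   // already the best over all capacities up to C
}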
Problem-19 Making Change: Given 𝑛 types of coin denominations of values 𝑣1 < 𝑣2 < . . . < 𝑣𝑛 (integers).
Assume 𝑣1 = 1, so that we can always make change for any amount of money 𝐶. Give an algorithm which makes
change for an amount of money 𝐶 with as few coins as possible.
Solution:
Now it is easy to see that an optimal way to make change for an amount 𝐶 with the fewest coins is completely
equivalent to an optimal way to fill a Knapsack of size 𝐶. This is because every coin counts as one item, and the
Knapsack algorithm uses as few items as possible, which corresponds to as few coins as possible.
Let us try formulating the recurrence. Let 𝑀(𝑗) indicate the minimum number of coins required to make change
for the amount of money equal to 𝑗.
𝑀(𝑗) = 𝑀𝑖𝑛𝑖 {𝑀(𝑗 − 𝑣𝑖 )} + 1
What this says is, if coin denomination 𝑖 was the last denomination coin added to the solution, then the optimal
way to finish the solution with that one is to optimally make change for the amount of money j − vi and then add
one extra coin of value 𝑣𝑖 .
int Table[128];   // initialize all entries to -1 before the first call
int makingChange(int n) {
    if (n < 0) return -1;
    if (n == 0)
        return 0;
    if (Table[n] != -1)
        return Table[n];
    int ans = -1;
    // num_denomination and denominations[] are globals describing the coin types
    for (int i = 0; i < num_denomination; ++i) {
        int sub = makingChange(n - denominations[i]);
        if (sub >= 0 && (ans == -1 || sub + 1 < ans))
            ans = sub + 1;    // one more coin of value denominations[i]
    }
    return Table[n] = ans;
}
Let 𝐿(𝑖) represent the optimal subsequence which is starting at position 𝐴[𝑖] and ending at 𝐴[𝑛]. The optimal way
to obtain a strictly increasing subsequence starting at position 𝑖 is going to be to extend some subsequence starting
at some later position 𝑗. For this the recursive formula can be written as:
𝐿(𝑖) = 𝑀𝑎𝑥𝑖<𝑗 𝑎𝑛𝑑 𝐴[𝑖]<𝐴[𝑗] {𝐿(𝑗)} + 1
We have to select some later position 𝑗 which gives the maximum sequence. The 1 in the recursive formula is the
addition of 𝑖 𝑡ℎ element. After finding the maximum sequence for all positions select the one among all positions
which gives the maximum sequence and it is defined as:
𝑀𝑎𝑥𝑖 {𝐿(𝑖)}
int LISTable [1024];
int LongestIncreasingSequence( int A[], int n ) {
int i, j, max = 0;
for ( i = 0; i < n; i++ )
LISTable[i] = 1;
for(i = n - 1; i >= 0; i--) {
// try picking a larger second element
for( j = i + 1; j < n; j++ ) {
if( A[i] < A[j] && LISTable [i] < LISTable [j] + 1)
LISTable[i] = LISTable[j] + 1;
}
}
for ( i = 0; i < n; i++ ) {
if( max < LISTable[i] )
max = LISTable[i];
}
return max;
}
Time Complexity: O(𝑛2 ), since two nested 𝑓𝑜𝑟 loops. Space Complexity: O(𝑛), for table.
Problem-22 Is there an alternative way of solving Problem-21?
Solution: Yes. The other method is to sort the given sequence and save it into another array and then take out
the “Longest Common Subsequence” (LCS) of the two arrays. This method has a complexity of O(𝑛2 ). For the LCS
problem refer to the 𝑡ℎ𝑒𝑜𝑟𝑦 𝑠𝑒𝑐𝑡𝑖𝑜𝑛 of this chapter.
Problem-23 Box Stacking: Assume that we are given a set of 𝑛 rectangular 3 − D boxes. The dimensions of 𝑖 𝑡ℎ
box are height ℎ𝑖 , width 𝑤𝑖 and depth 𝑑𝑖 . Now we want to create a stack of boxes which is as tall as possible, but
we can only stack a box on top of another box if the dimensions of the 2 −D base of the lower box are each
strictly larger than those of the 2 −D base of the higher box. We can rotate a box so that any side functions as
its base. It is possible to use multiple instances of the same type of box.
[Figure: a stack of boxes, numbered from the bottom up, with strictly decreasing base area toward the top.]
This simplification allows us to forget about the rotations of the boxes and we just focus on the stacking of n boxes
with each height ℎ𝑖 and a base area of (𝑤𝑖 × 𝑑𝑖 ). Also assume that 𝑤𝑖 ≤ 𝑑𝑖 . Now what we do is make a stack of
boxes that is as tall as possible. We allow box 𝑖 on top of box 𝑗 only if box 𝑖 is smaller than box 𝑗 in both base
dimensions, that is, if 𝑤𝑖 < 𝑤𝑗 && 𝑑𝑖 < 𝑑𝑗 . Now let us solve this using DP. First select
the boxes in the order of decreasing base area.
Now, let us say 𝐻(𝑗) represents the tallest stack of boxes with box 𝑗 on top. This is very similar to the LIS problem
because the stack of 𝑛 boxes with ending box 𝑗 is equal to finding a subsequence with the first 𝑗 boxes due to the
sorting by decreasing base area. The order of the boxes on the stack is going to be equal to the order of the
sequence.
Now we can write 𝐻(𝑗) recursively. In order to form a stack which ends on box 𝑗, we need to extend a previous
stack ending at 𝑖. That means, we need to put 𝑗 box at the top of the stack [𝑖 box is the current top of the stack].
To put 𝑗 box at the top of the stack we should satisfy the condition 𝑤𝑖 > 𝑤𝑗 𝑎𝑛𝑑 𝑑𝑖 > 𝑑𝑗 [this ensures that the low
level box has more base than the boxes above it]. Based on this logic, we can write the recursive formula as:
𝐻(𝑗) = 𝑀𝑎𝑥_{𝑖<𝑗 𝑎𝑛𝑑 𝑤𝑖 >𝑤𝑗 𝑎𝑛𝑑 𝑑𝑖 >𝑑𝑗 }{𝐻(𝑖)} + ℎ𝑗
Similar to the LIS problem, at the end we have to select the best 𝑗 over all potential values. This is because we are
not sure which box might end up on top.
𝑀𝑎𝑥𝑗 {𝐻(𝑗)}
Time Complexity: O(𝑛2 ).
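A minimal sketch of this recurrence, assuming the (already rotated) boxes are sorted by decreasing base area and stored in arrays ℎ[], 𝑤[], 𝑑[] (the representation is an assumption):
int tallestStack(int h[], int w[], int d[], int n) {
    int H[n], best = 0;
    for (int j = 0; j < n; j++) {
        H[j] = h[j];                     // box j alone
        for (int i = 0; i < j; i++)      // try extending a stack ending at box i
            if (w[i] > w[j] && d[i] > d[j] && H[i] + h[j] > H[j])
                H[j] = H[i] + h[j];
        if (H[j] > best)
            best = H[j];                 // best over all possible top boxes
    }
    return best;
}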
Problem-24 Building Bridges in India: Consider a very long, straight river which moves from north to south.
Assume there are 𝑛 cities on both sides of the river: 𝑛 cities on the left of the river and 𝑛 cities on the right side
of the river. Also, assume that these cities are numbered from 1 to 𝑛 but the order is not known. Now we want
to connect as many left-right pairs of cities as possible with bridges such that no two bridges cross. When
connecting cities, we can only connect city 𝑖 on the left side to city 𝑖 on the right side.
Solution:
Input: Two pairs of sets with each numbered from 1 to 𝑛.
Goal: Construct as many bridges as possible without any crosses between the cities on the left side of the river
and the cities on the right side.
[Figure: 𝑛 cities on each side of the river; bridges connect equal-numbered cities without crossing.]
To understand better, let us consider the diagram above. In the diagram it can be seen that there are 𝑛 cities on
the left side of river and 𝑛 cities on the right side of river. Also, note that we are connecting the cities which have
the same number [a requirement in the problem]. Our goal is to connect the maximum cities on the left side of
river to cities on the right side of the river, without any cross edges. Just to make it simple, let us sort the cities
on one side of the river.
If we observe carefully, since the cities on the left side are already sorted, the problem can be simplified to finding
the maximum increasing sequence. That means we have to use the LIS solution for finding the maximum
increasing sequence on the right side cities of the river.
Time Complexity: O(𝑛2 ), (same as LIS).
Problem-25 Subset Sum: Given a sequence of 𝑛 positive numbers 𝐴1 . . .𝐴𝑛 , give an algorithm which checks
whether there exists a subset of 𝐴 whose sum of all numbers is 𝑇?
Solution: This is a variation of the Knapsack problem. As an example, consider the following array:
𝐴 = [3, 2, 4, 19, 3, 7, 13, 10, 6, 11]
Suppose we want to check whether there is any subset whose sum is 17. The answer is yes, because the sum of
4 + 13 = 17 and therefore {4, 13} is such a subset.
Let us try solving this problem using DP. We will define 𝑛 × 𝑇 matrix, where 𝑛 is the number of elements in our
input array and 𝑇 is the sum we want to check.
Let, 𝑀[𝑖, 𝑗] = 1 if it is possible to find a subset of the numbers 1 through 𝑖 that produce sum 𝑗 and 𝑀[𝑖, 𝑗] = 0
otherwise.
𝑀[𝑖, 𝑗] = 𝑀𝑎𝑥(𝑀[𝑖 − 1, 𝑗], 𝑀[𝑖 − 1, 𝑗 − 𝐴𝑖 ])
According to the above recursive formula similar to the Knapsack problem, we check if we can get the sum 𝑗 by
not including the element 𝑖 in our subset, and we check if we can get the sum 𝑗 by including 𝑖 and checking if the
sum 𝑗 − 𝐴𝑖 exists without the 𝑖 𝑡ℎ element. This is identical to Knapsack, except that we are storing 0/1’s instead
of values. In the below implementation we can use binary OR operation to get the maximum among 𝑀[𝑖 − 1, 𝑗]
and 𝑀[𝑖 − 1, 𝑗 − 𝐴𝑖 ].
int SubsetSum( int A[], int n, int T ) {
    int i, j, M[n+1][T+1];
    M[0][0] = 1;                   // the empty subset produces sum 0
    for (i = 1; i <= T; i++)
        M[0][i] = 0;
    for (i = 1; i <= n; i++) {
        for (j = 0; j <= T; j++) {
            M[i][j] = M[i-1][j];   // exclude the i-th element
            if (j >= A[i-1] && M[i-1][j - A[i-1]])
                M[i][j] = 1;       // include the i-th element (A is 0-indexed)
        }
    }
    return M[n][T];
}
How many subproblems are there? In the above formula, 𝑖 can range from 1 𝑡𝑜 𝑛 and 𝑗 can range from 1 𝑡𝑜 𝑇.
There are a total of 𝑛𝑇 subproblems and each one takes O(1). So the time complexity is O(𝑛𝑇). Note that this is
not polynomial in the input size: the running time depends on the magnitude of 𝑇, which can be exponential in the
number of bits needed to represent it.
Space Complexity: O(𝑛𝑇).
Problem-26 Given a set of 𝑛 integers and the sum of all numbers is at most 𝐾. Find the subset of these 𝑛
elements whose sum is exactly half of the total sum of 𝑛 numbers.
Solution: Assume that the numbers are 𝐴1 . . .𝐴𝑛 . Let us use DP to solve this problem. We will create a boolean
array 𝑇 with size equal to 𝐾 + 1. Assume that 𝑇[𝑥] is 1 if there exists a subset of given 𝑛 elements whose sum is 𝑥.
That means, after the algorithm finishes, 𝑇[𝐾] will be 1, if and only if there is a subset of the numbers that has
sum 𝐾. Once we have that value then we just need to return 𝑇[𝐾/2]. If it is 1, then there is a subset that adds up
to half the total sum.
Initially we set all values of 𝑇 to 0. Then we set 𝑇[0] to 1. This is because we can always build 0 by taking an empty
set. If we have no numbers in 𝐴, then we are done! Otherwise, we pick the first number, 𝐴[0]. We can either throw
it away or take it into our subset. This means that the new 𝑇[] should have 𝑇[0] and 𝑇[𝐴[0]] set to 1. This creates
the base case. We continue by taking the next element of 𝐴.
Suppose that we have already taken care of the first i − 1 elements of A. Now we take A[i] and look at our table T[].
After processing i − 1 elements, the array T has a 1 in every location that corresponds to a sum that we can make
from the numbers we have already processed. Now we add the new number, A[i]. What should the table look like?
First of all, we can simply ignore A[i]. That means, no one should disappear from T[] – we can still make all those
sums. Now consider some location of T[j] that has a 1 in it. It corresponds to some subset of the previous numbers
that add up to j. If we add A[i] to that subset, we will get a new subset with total sum j + A[i]. So we should set
T[j + A[i]] to 1 as well. That's all. Based on the above discussion, we can write the algorithm as:
bool T[10240];
bool SubsetHalfSum( int A[], int n ) {
int K = 0;
for( int i = 0; i < n; i++ )
K += A[i];
T[0] = 1; // initialize the table
for( int i = 1; i <= K; i++ )
T[i] = 0;
// process the numbers one by one
for( int i = 0; i < n; i++ ) {
for( int j = K - A[i]; j >= 0; j--) {
if( T[j] )
T[j + A[i]] = 1;
}
}
return T[K / 2];
}
In the above code, the 𝑗 loop moves from right to left. This avoids double counting: if we moved from left to right,
an element could be used more than once within the same pass.
Time Complexity: O(𝑛𝐾), for the two 𝑓𝑜𝑟 loops. Space Complexity: O(𝐾), for the boolean table 𝑇.
Problem-27 Can we improve the performance of Problem-26?
Solution: Yes. In the above code what we are doing is, the inner 𝑗 loop is starting from 𝐾 and moving left. That
means, it is unnecessarily scanning the whole table every time.
What we actually want is to find all the 1 entries. At the beginning, only the 0th entry is 1. If we keep the location
of the rightmost 1 entry in a variable, we can always start at that spot and go left instead of starting at the right
end of the table.
To take full advantage of this, we can sort 𝐴[] first. That way, the rightmost 1 entry will move to the right as slowly
as possible. Finally, we don't really care about what happens in the right half of the table (after 𝑇[𝐾/2]) because if
𝑇[𝑥] is 1, then 𝑇[𝐾 − 𝑥] must also be 1 eventually – it corresponds to the complement of the subset that gave us 𝑥.
The code based on above discussion is given below.
int T[10240];
int SubsetHalfSumEfficient( int A[], int n ) {
    int K = 0;
    for( int i = 0; i < n; i++ )
        K += A[i];
    sort(A, n);                  // assumed helper: sorts A in increasing order
    T[0] = 1;                    // initialize the table (assumes K < 10240)
    for( int i = 1; i <= K; i++ )
        T[i] = 0;
    int R = 0;                   // rightmost 1 entry
    for( int i = 0; i < n; i++ ) {   // process the numbers one by one
        for( int j = R; j >= 0; j-- ) {
            if( T[j] )
                T[j + A[i]] = 1;
        }
        R = (R + A[i] < K/2) ? R + A[i] : K/2;   // R never needs to pass K/2
    }
    return T[K / 2];
}
After the improvements, the time complexity is still O(𝑛𝐾), but we have removed some useless steps.
Problem-28 The partition problem is to determine whether a given set can be partitioned into two subsets
such that the sum of elements in both subsets is the same [the same as the previous problem but a different
way of asking]. For example, if A[] = {1, 5, 11, 5}, the array can be partitioned as {1, 5, 5} and {11}. Similarly, if
A[] = {1, 5, 3}, the array cannot be partitioned into equal sum sets.
Solution: Let us try solving this problem another way. Following are the two main steps to solve this problem:
1. Calculate the sum of the array. If the sum is odd, there cannot be two subsets with an equal sum, so
return false.
2. If the sum of the array elements is even, calculate 𝑠𝑢𝑚/2 and find a subset of the array with a sum equal
to 𝑠𝑢𝑚/2.
The first step is simple. The second step is crucial, and it can be solved either using recursion or Dynamic
Programming.
Recursive Solution: Following is the recursive property of the second step mentioned above. Let 𝑠𝑢𝑏𝑠𝑒𝑡𝑆𝑢𝑚(𝐴,
𝑛, 𝑠𝑢𝑚/2) be the function that returns true if there is a subset of 𝐴[0..𝑛 − 1] with sum equal to 𝑠𝑢𝑚/2. The
problem can be divided into two sub problems:
a) 𝑠𝑢𝑏𝑠𝑒𝑡𝑆𝑢𝑚() without considering the last element (reducing 𝑛 to 𝑛 − 1)
b) 𝑠𝑢𝑏𝑠𝑒𝑡𝑆𝑢𝑚() considering the last element (reducing 𝑠𝑢𝑚/2 by 𝐴[𝑛 − 1] and 𝑛 to 𝑛 − 1)
If either of the above sub problems returns true, then return true.
𝑠𝑢𝑏𝑠𝑒𝑡𝑆𝑢𝑚(𝐴, 𝑛, 𝑠𝑢𝑚/2) = 𝑠𝑢𝑏𝑠𝑒𝑡𝑆𝑢𝑚(𝐴, 𝑛 − 1, 𝑠𝑢𝑚/2) || 𝑠𝑢𝑏𝑠𝑒𝑡𝑆𝑢𝑚(𝐴, 𝑛 − 1, 𝑠𝑢𝑚/2 − 𝐴[𝑛 − 1])
// A utility function that returns true if there is a subset of A[] with sum equal to given sum
bool subsetSum(int A[], int n, int sum){
if (sum == 0)
return true;
if (n == 0 && sum != 0)
return false;
// If last element is greater than sum, then ignore it
if (A[n-1] > sum)
return subsetSum (A, n-1, sum);
return subsetSum (A, n-1, sum) || subsetSum (A, n-1, sum-A[n-1]);
}
// Returns true if A[] can be partitioned in two subsets of equal sum, otherwise false
bool findPartition(int A[], int n){
// Calculate sum of all elements
int sum = 0;
for (int i = 0; i < n; i++)
sum += A[i];
// If sum is odd, there cannot be two subsets with equal sum
if (sum%2 != 0)
return false;
// Find if there is subset with sum equal to half of total sum
return subsetSum(A, n, sum/2);
}
Time Complexity: O(2𝑛 ) In worst case, this solution tries two possibilities (whether to include or exclude) for every
element.
Dynamic Programming Solution: The problem can be solved using dynamic programming when the sum of the
elements is not too big. We can create a 2D array 𝑝𝑎𝑟𝑡[][] of size (𝑠𝑢𝑚/2)*(𝑛 + 1). And we can construct the solution
in a bottom-up manner such that every filled entry has the following property:
𝑝𝑎𝑟𝑡[𝑖][𝑗] = 𝑡𝑟𝑢𝑒 if a subset of {𝐴[0], 𝐴[1], .. , 𝐴[𝑗 − 1]} has sum equal to 𝑖, otherwise 𝑓𝑎𝑙𝑠𝑒
// Returns true if A[] can be partitioned in two subsets of equal sum, otherwise false
bool findPartition (int A[], int n){
int sum = 0;
int i, j;
// calculate sum of all elements
for (i = 0; i < n; i++)
sum += A[i];
if (sum%2 != 0)
return false;
bool part[sum/2+1][n+1];
// initialize top row as true
for (i = 0; i <= n; i++)
part[0][i] = true;
// initialize leftmost column, except part[0][0], as 0
for (i = 1; i <= sum/2; i++)
part[i][0] = false;
// Fill the partition table in bottom up manner
for (i = 1; i <= sum/2; i++) {
for (j = 1; j <= n; j++) {
part[i][j] = part[i][j-1];
if (i >= A[j-1])
part[i][j] = part[i][j] || part[i - A[j-1]][j-1];
}
}
return part[sum/2][n];
}
Time Complexity: O(𝑠𝑢𝑚 × 𝑛). Space Complexity: O(𝑠𝑢𝑚 × 𝑛). Please note that this solution will not be feasible for
arrays with a big sum.
Problem-29 Counting Boolean Parenthesizations: Let us assume that we are given a boolean expression
consisting of symbols ′𝑡𝑟𝑢𝑒′, ′𝑓𝑎𝑙𝑠𝑒′, ′𝑎𝑛𝑑′, ′𝑜𝑟′, 𝑎𝑛𝑑 ′𝑥𝑜𝑟′. Find the number of ways to parenthesize the expression
such that it will evaluate to 𝑡𝑟𝑢𝑒. For example, there is only 1 way to parenthesize ′𝑡𝑟𝑢𝑒 𝑎𝑛𝑑 𝑓𝑎𝑙𝑠𝑒 𝑥𝑜𝑟 𝑡𝑟𝑢𝑒′ such
that it evaluates to 𝑡𝑟𝑢𝑒.
Solution: Let the number of symbols be n and between symbols there are boolean operators like and, or, xor, etc.
For example, if 𝑛 = 4, 𝑇 𝑜𝑟 𝐹 𝑎𝑛𝑑 𝑇 𝑥𝑜𝑟 𝐹. Our goal is to count the numbers of ways to parenthesize the expression
with boolean operators so that it evaluates to 𝑡𝑟𝑢𝑒. In the above case, if we use 𝑇 𝑜𝑟 ( (𝐹 𝑎𝑛𝑑 𝑇) 𝑥𝑜𝑟 𝐹) then it
evaluates to true.
𝑇 𝑜𝑟( (𝐹 𝑎𝑛𝑑 𝑇)𝑥𝑜𝑟 𝐹) = 𝑇𝑟𝑢𝑒
Now let us see how DP solves this problem. Let 𝑇(𝑖, 𝑗) represent the number of ways to parenthesize the sub
expression with symbols 𝑖 … 𝑗 [symbols means only 𝑇 and 𝐹 and not the operators] with boolean operators so that
it evaluates to 𝑡𝑟𝑢𝑒. Also, 𝑖 and 𝑗 take the values from 1 to 𝑛. For example, in the above case, 𝑇(2, 4) = 0 because
there is no way to parenthesize the expression 𝐹 𝑎𝑛𝑑 𝑇 𝑥𝑜𝑟 𝐹 to make it 𝑡𝑟𝑢𝑒.
Just for simplicity and similarity, let 𝐹(𝑖, 𝑗) represent the number of ways to parenthesize the sub expression with
symbols 𝑖 … 𝑗 with boolean operators so that it evaluates to 𝑓𝑎𝑙𝑠𝑒. The base cases are 𝑇(𝑖, 𝑖) and 𝐹(𝑖, 𝑖).
Now we are going to compute 𝑇(𝑖, 𝑖 + 1) and 𝐹(𝑖, 𝑖 + 1) for all values of 𝑖. Similarly, 𝑇(𝑖, 𝑖 + 2) and 𝐹(𝑖, 𝑖 + 2) for all
values of 𝑖 and so on. Now let’s generalize the solution.
𝑇(𝑖, 𝑗) = ∑_{𝑘=𝑖}^{𝑗−1} (term depending on the operator between positions 𝑘 and 𝑘 + 1):
    𝑇(𝑖, 𝑘) × 𝑇(𝑘 + 1, 𝑗), for "𝑎𝑛𝑑"
    𝑇𝑜𝑡𝑎𝑙(𝑖, 𝑘) × 𝑇𝑜𝑡𝑎𝑙(𝑘 + 1, 𝑗) − 𝐹(𝑖, 𝑘) × 𝐹(𝑘 + 1, 𝑗), for "𝑜𝑟"
    𝑇(𝑖, 𝑘) × 𝐹(𝑘 + 1, 𝑗) + 𝐹(𝑖, 𝑘) × 𝑇(𝑘 + 1, 𝑗), for "𝑥𝑜𝑟"
where 𝑇𝑜𝑡𝑎𝑙(𝑖, 𝑘) = 𝑇(𝑖, 𝑘) + 𝐹(𝑖, 𝑘).
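A minimal bottom-up sketch of these recurrences, assuming the symbols are given as characters 'T'/'F' and the operators between them as '&', '|', '^' (both encodings are assumptions):
int countTrueWays(char symbols[], char ops[], int n) {
    int T[n][n], F[n][n];
    for (int i = 0; i < n; i++) {        // base cases: single symbols
        T[i][i] = (symbols[i] == 'T');
        F[i][i] = (symbols[i] == 'F');
    }
    for (int len = 2; len <= n; len++) {
        for (int i = 0; i + len - 1 < n; i++) {
            int j = i + len - 1;
            T[i][j] = F[i][j] = 0;
            for (int k = i; k < j; k++) {          // split at operator k
                int tot1 = T[i][k] + F[i][k], tot2 = T[k+1][j] + F[k+1][j];
                if (ops[k] == '&') {
                    T[i][j] += T[i][k] * T[k+1][j];
                    F[i][j] += tot1 * tot2 - T[i][k] * T[k+1][j];
                } else if (ops[k] == '|') {
                    T[i][j] += tot1 * tot2 - F[i][k] * F[k+1][j];
                    F[i][j] += F[i][k] * F[k+1][j];
                } else {                            // xor
                    T[i][j] += T[i][k] * F[k+1][j] + F[i][k] * T[k+1][j];
                    F[i][j] += T[i][k] * T[k+1][j] + F[i][k] * F[k+1][j];
                }
            }
        }
    }
    return T[0][n-1];
}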
number of times that a particular item is searched in the binary search trees. That means we need to construct
a binary search tree so that the total search time will be reduced.
Solution: Before solving the problem let us understand the problem with an example. Let us assume that the
given array is A = [3, 12, 21, 32, 35]. There are many ways to represent these elements, two of which are listed below.
[Figure: two BSTs storing {3, 12, 21, 32, 35}: one balanced with root 12, and one skewed with root 35.]
Of the two, which representation is better? The search time for an element depends on the depth of the node.
The average number of comparisons for the first tree is (1 + 2 + 2 + 3 + 3)/5 = 11/5, and for the second tree the
average number of comparisons is (1 + 2 + 3 + 3 + 4)/5 = 13/5. Of the two, the first tree gives better results.
If frequencies are not given and if we want to search all elements, then the above simple calculation is enough for
deciding the best tree. If the frequencies are given, then the selection depends on the frequencies of the elements
and also the depth of the elements. An obvious way to find an optimal binary search tree is to generate each
possible binary search tree for the keys, calculate the search time, and keep that tree with the smallest total
search time. This search through all possible solutions is not feasible, since the number of such trees grows
exponentially with 𝑛.
An alternative would be a recursive algorithm. Consider the characteristics of any optimal tree. Of course it has a
root and two subtrees. Both subtrees must themselves be optimal binary search trees with respect to their keys
and frequencies. First, any subtree of any binary search tree must be a binary search tree. Second, the subtrees
must also be optimal.
For simplicity let us assume that, the given array is 𝐴 and the corresponding frequencies are in array 𝐹. 𝐹[𝑖]
indicates the frequency of 𝑖 𝑡ℎ element 𝐴[𝑖]. With this, the total search time S(root) of the tree with root can be
defined as:
𝑆(𝑟𝑜𝑜𝑡) = ∑_{𝑖=1}^{𝑛} 𝐹[𝑖] × 𝑑𝑒𝑝𝑡ℎ(𝐴[𝑖]), where 𝑑𝑒𝑝𝑡ℎ(𝐴[𝑖]) is the number of comparisons needed to reach 𝐴[𝑖] (counting the root as 1).
struct BinarySearchTreeNode *OptimalBST(int A[], int F[], int low, int high) {
    int r, minTime = INT_MAX;   // needs #include <limits.h>
    struct BinarySearchTreeNode *root, *bestRoot = NULL;
    if (low > high)
        return NULL;
    for (r = low; r <= high; r++) {      // try every key in the range as the root
        root = (struct BinarySearchTreeNode *) malloc(sizeof(struct BinarySearchTreeNode));
        if (!root) {
            printf("Memory Error");
            return NULL;
        }
        root->left = OptimalBST(A, F, low, r - 1);
        root->right = OptimalBST(A, F, r + 1, high);
        root->data = A[r];
        if (S(root) < minTime) {         // S(root): total search time defined above
            minTime = S(root);
            bestRoot = root;
        }
    }
    return bestRoot;
}
Problem-31 Edit Distance: Given two strings 𝐴 of length 𝑚 and 𝐵 of length 𝑛, transform 𝐴 into 𝐵 with a
minimum number of operations of the following types: delete a character from 𝐴, insert a character into 𝐴, or
change some character in 𝐴 into a new character. The minimal number of such operations required to transform
𝐴 into 𝐵 is called the 𝑒𝑑𝑖𝑡 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 between 𝐴 and 𝐵.
Solution: Input: Two text strings 𝐴 of length 𝑚 and 𝐵 of length 𝑛. Goal: Convert string 𝐴 into 𝐵 with
minimal conversions.
Before going to a solution, let us consider the possible operations for converting string 𝐴 into 𝐵.
• If 𝑚 > 𝑛, we need to remove some characters of 𝐴
• If 𝑚 == 𝑛, we may need to convert some characters of 𝐴
• If 𝑚 < 𝑛, we need to insert some characters into 𝐴
So the operations we need are the insertion of a character, the replacement of a character and the deletion of a
character, and their corresponding cost codes are defined below.
Costs of operations:
Insertion of a character 𝑐𝑖
Replacement of a character 𝑐𝑟
Deletion of a character 𝑐𝑑
Now let us concentrate on the recursive formulation of the problem. Let, 𝑇(𝑖, 𝑗) represents the minimum cost
required to transform first 𝑖 characters of 𝐴 to first 𝑗 characters of 𝐵. That means, 𝐴[1 … 𝑖] to 𝐵[1 … 𝑗].
𝑇(𝑖, 𝑗) = 𝑚𝑖𝑛 of:
    𝑇(𝑖 − 1, 𝑗) + 𝑐𝑑
    𝑇(𝑖, 𝑗 − 1) + 𝑐𝑖
    𝑇(𝑖 − 1, 𝑗 − 1), if 𝐴[𝑖] == 𝐵[𝑗]
    𝑇(𝑖 − 1, 𝑗 − 1) + 𝑐𝑟 , if 𝐴[𝑖] ≠ 𝐵[𝑗]
Based on the above discussion we have the following cases.
• If we delete 𝑖 𝑡ℎ character from 𝐴, then we have to convert remaining 𝑖 − 1 characters of 𝐴 to 𝑗 characters of
𝐵
• If we insert a character into 𝐴 to match 𝐵[𝑗], then we have to convert the 𝑖 characters of 𝐴 to the first 𝑗 − 1 characters of 𝐵
• If 𝐴[𝑖] == 𝐵[𝑗], then we have to convert the remaining 𝑖 − 1 characters of 𝐴 to 𝑗 − 1 characters of 𝐵
• If 𝐴[𝑖] ≠ 𝐵[𝑗], then we have to replace 𝑖 𝑡ℎ character of 𝐴 to 𝑗𝑡ℎ character of B and convert remaining 𝑖 − 1
characters of 𝐴 to 𝑗 − 1 characters of 𝐵
After calculating all the possibilities we have to select the one which gives the lowest cost.
How many subproblems are there? In the above formula, 𝑖 can range from 1 𝑡𝑜 𝑚 and 𝑗 can range from 1 𝑡𝑜 𝑛.
This gives 𝑚𝑛 subproblems, each taking O(1) time, so the time complexity is O(𝑚𝑛). Space Complexity: O(𝑚𝑛) for the DP table, which has 𝑚 rows and 𝑛 columns.
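To make the recurrence concrete, the following is a minimal bottom-up sketch in C, assuming unit costs (𝑐𝑖 = 𝑐𝑟 = 𝑐𝑑 = 1); the names editDistance and min3 are ours, not from the text above.
#include <string.h>
static int min3(int a, int b, int c) {
	int m = a < b ? a : b;
	return m < c ? m : c;
}
// T[i][j] = minimum cost to transform A[0..i-1] into B[0..j-1]
int editDistance(const char *A, const char *B) {
	int m = strlen(A), n = strlen(B);
	int T[m+1][n+1];
	for(int i = 0; i <= m; i++) T[i][0] = i; // i deletions
	for(int j = 0; j <= n; j++) T[0][j] = j; // j insertions
	for(int i = 1; i <= m; i++) {
		for(int j = 1; j <= n; j++) {
			int replace = (A[i-1] == B[j-1]) ? 0 : 1; // pay c_r only when characters differ
			T[i][j] = min3(T[i-1][j] + 1,   // delete A[i]
			               T[i][j-1] + 1,   // insert B[j]
			               T[i-1][j-1] + replace);
		}
	}
	return T[m][n];
}
For example, editDistance("abcd", "abcde") returns 1 (a single insertion).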
Problem-32 All Pairs Shortest Path Problem: Floyd's Algorithm: Given a weighted directed graph 𝐺 = (𝑉, 𝐸),
where 𝑉 = {1, 2, . . . , 𝑛}. Find the shortest path between any pair of nodes in the graph. Assume the weights are
represented in the matrix 𝐶[𝑉][𝑉], where 𝐶[𝑖][𝑗] indicates the weight (or cost) between the nodes 𝑖 and 𝑗. Also,
𝐶[𝑖][𝑗] = ∞ or -1 if there is no path from node 𝑖 to node 𝑗.
Solution: Let us try to find the DP solution (Floyd’s algorithm) for this problem. The Floyd’s algorithm for all pairs
shortest path problem uses matrix 𝐴[1. . 𝑛][1. . 𝑛] to compute the lengths of the shortest paths. Initially,
𝐴[𝑖, 𝑗] = 𝐶[𝑖, 𝑗] if 𝑖 ≠ 𝑗, and 𝐴[𝑖, 𝑗] = 0 if 𝑖 = 𝑗.
From the definition, 𝐶[𝑖, 𝑗] = ∞ if there is no path from 𝑖 to 𝑗. The algorithm makes 𝑛 passes over A. Let A0 , A1 , . . . , An
be the values of 𝐴 on the 𝑛 passes, with A0 being the initial value.
Just after the (k − 1)𝑡ℎ iteration, Ak−1[𝑖, 𝑗] = the smallest length of any path from vertex 𝑖 to vertex 𝑗 that does not pass through any of the vertices {𝑘, 𝑘 + 1, … , 𝑛} as intermediate vertices. That means, it may pass through only the vertices {1, 2, 3, … , 𝑘 − 1}.
In each iteration, the value 𝐴[𝑖][𝑗] is updated with the minimum of Ak−1[𝑖, 𝑗] and Ak−1[𝑖, 𝑘] + Ak−1[𝑘, 𝑗]:
Ak[𝑖, 𝑗] = min{ Ak−1[𝑖, 𝑗], Ak−1[𝑖, 𝑘] + Ak−1[𝑘, 𝑗] }
The 𝑘 𝑡ℎ pass explores whether the vertex 𝑘 lies on an optimal path from 𝑖 to 𝑗, for all 𝑖, 𝑗. The same is shown in the
diagram below.
[Figure: vertex 𝑘 possibly lying on the path from 𝑖 to 𝑗: the direct estimate Ak−1[𝑖, 𝑗] is compared with the detour Ak−1[𝑖, 𝑘] + Ak−1[𝑘, 𝑗].]
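The update translates directly into three nested loops over 𝑘, 𝑖 and 𝑗. A minimal C sketch, assuming INF is a sentinel for "no edge" chosen large enough that adding two of them cannot overflow:
#define INF 1000000000 // assumed sentinel for "no path"
// A[][] starts as the cost matrix C (A[i][i] = 0, A[i][j] = INF when there is no edge)
// and ends as the all-pairs shortest-path matrix.
void Floyd(int n, int A[n][n]) {
	for(int k = 0; k < n; k++)
		for(int i = 0; i < n; i++)
			for(int j = 0; j < n; j++)
				if(A[i][k] != INF && A[k][j] != INF && A[i][k] + A[k][j] < A[i][j])
					A[i][j] = A[i][k] + A[k][j]; // going through k is shorter
}
Time Complexity: O(𝑛³) for the three nested loops. Space Complexity: O(1) extra, since A is updated in place.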
[Figure: coins 𝑣1, 𝑣2, … , 𝑣𝑛 laid out in a row; the current subproblem is the range 𝑣𝑖 … 𝑣𝑗.]
𝑉(𝑖, 𝑗) = 𝑀𝑎𝑥{ 𝑀𝑖𝑛{𝑉(𝑖 + 1, 𝑗 − 1), 𝑉(𝑖 + 2, 𝑗)} + 𝑣𝑖 , 𝑀𝑖𝑛{𝑉(𝑖, 𝑗 − 2), 𝑉(𝑖 + 1, 𝑗 − 1)} + 𝑣𝑗 }
In the recursive call we have to focus on ith coin to 𝑗𝑡ℎ coin (𝑣𝑖 . . . 𝑣𝑗 ). Since it is our turn to pick the coin, we have
two possibilities: either we can pick 𝑣𝑖 or 𝑣𝑗 . The first term indicates the case if we select 𝑖 𝑡ℎ coin (𝑣𝑖 ) and the second
term indicates the case if we select 𝑗𝑡ℎ coin (𝑣𝑗 ). The outer 𝑀𝑎𝑥 indicates that we have to select the coin which gives
maximum value. Now let us focus on the terms:
• Selecting the 𝑖𝑡ℎ coin: If we select the 𝑖𝑡ℎ coin, the remaining range is from 𝑖 + 1 to 𝑗, and we get the value 𝑣𝑖 for the selected coin. From the remaining range 𝑖 + 1 to 𝑗, the opponent can select either the (𝑖 + 1)𝑡ℎ coin or the 𝑗𝑡ℎ coin, and the opponent's choice leaves us with the worse of the two resulting subranges - hence the 𝑀𝑖𝑛 term. The same is described in the figure below.
[Figure: if we select 𝑣𝑖, the remaining range for the opponent is 𝑣𝑖+1 … 𝑣𝑗; if we select 𝑣𝑗, the remaining range is 𝑣𝑖 … 𝑣𝑗−1.]
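A direct recursive sketch of this recurrence (the helper names minInt/maxInt and the array name v are ours; memoizing V over the O(𝑛²) pairs (𝑖, 𝑗) turns this into a polynomial-time DP):
static int minInt(int a, int b) { return a < b ? a : b; }
static int maxInt(int a, int b) { return a > b ? a : b; }
// V(i, j): the best total we can collect from coins v[i..j] when it is our move.
int V(int v[], int i, int j) {
	if(i > j) return 0;      // empty range
	if(i == j) return v[i];  // only one coin left
	int pickLeft  = v[i] + minInt(V(v, i+2, j),   V(v, i+1, j-1)); // opponent replies
	int pickRight = v[j] + minInt(V(v, i+1, j-1), V(v, i,   j-2));
	return maxInt(pickLeft, pickRight);
}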
If we have to find such arrangements for 12, we can either place a 1 at the end or we can add 2 in the arrangements
possible with 10. Similarly, let us say we have 𝐹𝑛 possible arrangements for n. Then for (𝑛 + 1), we can either place
just 1 at the end 𝑜𝑟 we can find possible arrangements for (𝑛 − 1) and put a 2 at the end. Going by the above
theory:
𝐹𝑛+1 = 𝐹𝑛 + 𝐹𝑛−1
Let’s verify the above theory for our original problem:
• In how many ways can we fill a 2 × 1 strip: 1 → Only one vertical tile.
• In how many ways can we fill a 2 × 2 strip: 2 → Either 2 horizontal or 2 vertical tiles.
• In how many ways can we fill a 2 × 3 strip: 3 → Either put a vertical tile in the 2 solutions possible for a
2 × 2 strip, or put 2 horizontal tiles in the only solution possible for a 2 × 1 strip. (2 + 1 = 3).
• Similarly, in how many ways can we fill a 2 × 𝑛 strip? Either put a vertical tile in each of the solutions for a 2 × (𝑛 − 1) strip, or put 2 horizontal tiles in each of the solutions for a 2 × (𝑛 − 2) strip (𝐹𝑛−1 + 𝐹𝑛−2).
• That’s how we verified that our final solution is: 𝐹𝑛 = 𝐹𝑛−1 + 𝐹𝑛−2 with 𝐹1 = 1 and 𝐹2 = 2.
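An iterative sketch of this recurrence in C (the function name countTilings is ours):
// Number of ways to tile a 2×n strip: F(n) = F(n-1) + F(n-2), with F(1) = 1, F(2) = 2.
int countTilings(int n) {
	if(n <= 2) return n;    // F(1) = 1, F(2) = 2
	int prev = 1, curr = 2;
	for(int i = 3; i <= n; i++) {
		int next = prev + curr;
		prev = curr;
		curr = next;
	}
	return curr;
}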
Problem-35 Longest Palindrome Subsequence: A sequence is a palindrome if it reads the same whether we read it left to right or right to left; for example 𝐴, 𝐶, 𝐺, 𝐺, 𝐺, 𝐺, 𝐶, 𝐴. Given a sequence of length 𝑛, devise an algorithm to output the length of the longest palindrome subsequence.
Problem-38 Given an 𝑛 × 𝑚 grid in which each cell holds a number of apples, start at the top-left corner and move only right or down until the bottom-right corner. Give a DP solution to find the maximum number of apples we can collect.
Solution: Let us assume that the given matrix is 𝐴[𝑛][𝑚]. The first thing that must be observed is that there are at most 2 ways we can come to a cell - from the left (if it is not situated on the first column) and from the top (if it is not situated on the uppermost row).
[Figure: S[i][j] is computed from S[i-1][j] (above) and S[i][j-1] (to the left).]
To find the best solution for a cell, we must already have the best solutions for all of the cells from which we can arrive at the current cell. From the above, a recurrence relation is easily obtained:
𝑆(𝑖, 𝑗) = 𝐴[𝑖][𝑗] + 𝑀𝑎𝑥{ 𝑆(𝑖, 𝑗 − 1) if 𝑗 > 0, 𝑆(𝑖 − 1, 𝑗) if 𝑖 > 0 }
𝑆(𝑖, 𝑗) must be calculated by going from left to right in each row and processing the rows from top to bottom, or by going from top to bottom in each column and processing the columns from left to right.
int findApplesCount(int n, int m, int A[n][m]) {
	int S[n][m];
	for(int i = 0; i < n; i++) {
		for(int j = 0; j < m; j++) {
			int best = 0;
			if(j > 0 && S[i][j-1] > best) best = S[i][j-1]; // arrive from the left
			if(i > 0 && S[i-1][j] > best) best = S[i-1][j]; // arrive from the top
			S[i][j] = A[i][j] + best;
		}
	}
	return S[n-1][m-1];
}
How many such subproblems are there? In the above formula, 𝑖 can range from 1 to 𝑛 and 𝑗 can range from 1 to 𝑚. There are a total of 𝑛𝑚 subproblems and each one takes O(1).
Time Complexity is O(𝑛𝑚). Space Complexity: O(𝑛𝑚), where 𝑛 is the number of rows and 𝑚 is the number of columns in the given matrix.
Problem-39 Similar to Problem-38, assume that we can go down, right one cell, or even in a diagonal direction.
We need to arrive at the bottom-right corner. Give DP solution to find the maximum number of apples we can
collect.
Solution: Yes. The discussion is very similar to Problem-38. Let us assume that the given matrix is A[n][m]. The first thing that must be observed is that there are at most 3 ways we can come to a cell - from the left (if it is not on the first column), from the top (if it is not on the uppermost row), or from the top-left diagonal (if it is on neither). To find the best solution for a cell, we must already have the best solutions for all of the cells from which we can arrive at the current cell. From the above, a recurrence relation is easily obtained:
𝑆(𝑖, 𝑗) = 𝐴[𝑖][𝑗] + 𝑀𝑎𝑥{ 𝑆(𝑖, 𝑗 − 1) if 𝑗 > 0, 𝑆(𝑖 − 1, 𝑗) if 𝑖 > 0, 𝑆(𝑖 − 1, 𝑗 − 1) if 𝑖 > 0 and 𝑗 > 0 }
𝑆(𝑖, 𝑗) must be calculated by going from left to right in each row and processing the rows from top to bottom, or by going from top to bottom in each column and processing the columns from left to right.
[Figure: S[i][j] is computed from S[i-1][j-1], S[i-1][j], and S[i][j-1].]
How many such subproblems are there? In the above formula, 𝑖 can range from 1 to 𝑛 and 𝑗 can range from 1 to 𝑚. There are a total of 𝑛𝑚 subproblems and each one takes O(1). Time Complexity is O(𝑛𝑚).
Space Complexity: O(𝑛𝑚), where 𝑛 is the number of rows and 𝑚 is the number of columns in the given matrix.
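For completeness, a sketch of this variant in the style of the Problem-38 code; the only change from the two-direction version is the extra diagonal candidate:
int findApplesCountDiag(int n, int m, int A[n][m]) {
	int S[n][m];
	for(int i = 0; i < n; i++) {
		for(int j = 0; j < m; j++) {
			int best = 0;
			if(j > 0 && S[i][j-1] > best) best = S[i][j-1];              // from the left
			if(i > 0 && S[i-1][j] > best) best = S[i-1][j];              // from the top
			if(i > 0 && j > 0 && S[i-1][j-1] > best) best = S[i-1][j-1]; // from the diagonal
			S[i][j] = A[i][j] + best;
		}
	}
	return S[n-1][m-1];
}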
Problem-40 Maximum size square sub-matrix with all 1’s: Given a matrix with 0’s and 1’s, give an algorithm
for finding the maximum size square sub-matrix with all 1s. For example, consider the binary matrix below.
0 1 1 0 1
1 1 0 1 0
0 1 1 1 0
1 1 1 1 0
1 1 1 1 1
0 0 0 0 0
The maximum square sub-matrix with all set bits is
1 1 1
1 1 1
1 1 1
Solution: Let us try solving this problem using DP. Let the given binary matrix be 𝐵[𝑚][𝑛]. The idea of the algorithm is to construct a temporary matrix 𝐿[ ][ ] in which each entry 𝐿[𝑖][𝑗] represents the size of the largest all-1's square sub-matrix whose bottom-right entry is 𝐵[𝑖][𝑗].
Algorithm
1) Construct a sum matrix 𝐿[𝑚][𝑛] for the given matrix 𝐵[𝑚][𝑛].
a. Copy first row and first columns as is from 𝐵[ ][ ] to 𝐿[ ][ ].
b. For other entries, use the following expressions to construct L[ ][ ]
if(𝐵[𝑖][𝑗] )
𝐿[𝑖][𝑗] = 𝑚𝑖𝑛(𝐿[𝑖][𝑗 − 1], 𝐿[𝑖 − 1][𝑗], 𝐿[𝑖 − 1][𝑗 − 1]) + 1;
else 𝐿[𝑖][𝑗] = 0;
2) Find the maximum entry in 𝐿[𝑚][𝑛].
3) Using the value and coordinates of the maximum entry in 𝐿[ ][ ], print the corresponding sub-matrix of 𝐵[ ][ ].
static int min3(int a, int b, int c) {
	int m = a < b ? a : b;
	return m < c ? m : c;
}
void MatrixSubSquareWithAllOnes(int m, int n, int B[m][n]) {
	int i, j, L[m][n], max_of_s, max_i, max_j;
	// Copy first column of B[][] into L[][]
	for(i = 0; i < m; i++)
		L[i][0] = B[i][0];
	// Copy first row of B[][] into L[][]
	for(j = 0; j < n; j++)
		L[0][j] = B[0][j];
	// Construct other entries of L[][]
	for(i = 1; i < m; i++) {
		for(j = 1; j < n; j++) {
			if(B[i][j] == 1)
				L[i][j] = min3(L[i][j-1], L[i-1][j], L[i-1][j-1]) + 1;
			else L[i][j] = 0;
		}
	}
	// Locate the maximum entry in L[][]
	max_of_s = L[0][0]; max_i = 0; max_j = 0;
	for(i = 0; i < m; i++) {
		for(j = 0; j < n; j++) {
			if(L[i][j] > max_of_s){
				max_of_s = L[i][j];
				max_i = i;
				max_j = j;
			}
		}
	}
	// Print the sub-matrix whose bottom-right corner is (max_i, max_j)
	printf("Maximum sub-matrix\n");
	for(i = max_i - max_of_s + 1; i <= max_i; i++) {
		for(j = max_j - max_of_s + 1; j <= max_j; j++)
			printf("%d ", B[i][j]);
		printf("\n");
	}
}
How many subproblems are there? In the above formula, 𝑖 can range from 1 to 𝑚 and 𝑗 can range from 1 to 𝑛. There are a total of 𝑚𝑛 subproblems and each one takes O(1). Time Complexity is O(𝑚𝑛). Space Complexity is O(𝑚𝑛), where 𝑚 is the number of rows and 𝑛 is the number of columns in the given matrix.
Problem-41 Maximum size sub-matrix with all 1’s: Given a matrix with 0’s and 1’s, give an algorithm for
finding the maximum size sub-matrix with all 1s. For example, consider the binary matrix below.
1 1 0 0 1 0
0 1 1 1 1 1
1 1 1 1 1 0
0 0 1 1 0 0
The maximum sub-matrix with all set bits is
1 1 1 1
1 1 1 1
Solution: For a particular row, draw a histogram whose bar heights are the number of consecutive 1's in each column, counting upward from (and including) that row; the maximal all-1's sub-matrix ending in that row then equals the maximum-area rectangle in that histogram. Below is an example for the 3𝑟𝑑 row in the above matrix [1]:
[Figure: the same matrix with the 3rd row's histogram highlighted; the bar heights for the 3rd row are 1, 3, 2, 2, 3, 0.]
If we calculate this area for all the rows, maximum area will be our answer. We can extend our solution very easily
to find start and end co-ordinates. For this, we need to generate an auxiliary matrix 𝑆[][] where each element
represents the number of 1s above and including it, up until the first 0. 𝑆[][] for the above matrix will be as shown
below:
1 1 0 0 1 0
0 2 1 1 2 1
1 3 2 2 3 0
0 0 3 3 0 0
Now we can simply call our maximum rectangle in histogram on every row in 𝑆[][] and update the maximum area
every time. Also we don’t need any extra space for saving 𝑆. We can update original matrix (𝐴) to 𝑆 and after
calculation, we can convert 𝑆 back to 𝐴.
#define ROW 10
#define COL 10
int find_max_matrix(int A[ROW][COL]) {
int max, cur_max = 0;
//Calculate the auxiliary matrix in place
for (int i=1; i<ROW; i++)
for(int j=0; j<COL; j++) {
if(A[i][j] == 1)
A[i][j] = A[i-1][j] + 1;
}
//Calculate maximum area in S for each row
for (int i=0; i<ROW; i++) {
max = maxRectangleArea(A[i], COL); //Refer Stacks Chapter
if(max > cur_max)
cur_max = max;
}
//Regenerate Original matrix
for (int i=ROW-1; i>0; i--)
for(int j=0; j<COL; j++) {
if(A[i][j])
A[i][j] = A[i][j] - A[i-1][j];
}
return cur_max;
}
Problem-42 Maximum sum sub-matrix: Given an 𝑛 × 𝑛 matrix 𝑀 of positive and negative integers, give an
algorithm to find the sub-matrix with the largest possible sum.
Solution: Let 𝐴𝑢𝑥[𝑟, 𝑐] represent the sum of rectangular subarray of 𝑀 with one corner at entry [1, 1] and the other
at [𝑟, 𝑐]. Since there are 𝑛2 such possibilities, we can compute them in O(𝑛2 ) time. After computing all possible
sums, the sum of any rectangular subarray of 𝑀 can be computed in constant time. This gives an O(𝑛4 ) algorithm:
we simply guess the lower-left and the upper-right corner of the rectangular subarray and use the 𝐴𝑢𝑥 table to
compute its sum.
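A sketch of the Aux construction with a zero border, so the rectangle-sum formula needs no boundary cases (the name buildAux and the 0-based storage are our assumptions):
// Aux[r][c] = sum of M[0..r-1][0..c-1]; the sum of the rectangle with rows r1..r2-1
// and columns c1..c2-1 is then Aux[r2][c2] - Aux[r1][c2] - Aux[r2][c1] + Aux[r1][c1], in O(1).
void buildAux(int n, int M[n][n], long long Aux[n+1][n+1]) {
	for(int r = 0; r <= n; r++)
		for(int c = 0; c <= n; c++)
			Aux[r][c] = (r == 0 || c == 0) ? 0
			          : M[r-1][c-1] + Aux[r-1][c] + Aux[r][c-1] - Aux[r-1][c-1];
}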
Problem-43 Can we improve the complexity of Problem-42?
Solution: We can use the Problem-4 solution with a little variation. The maximum-sum subarray algorithm for a 1-D array scans the array one entry at a time and keeps a running total of the entries; whenever the total becomes negative, it is reset to 0. This algorithm is called 𝐾𝑎𝑑𝑎𝑛𝑒'𝑠 algorithm. We use it as an auxiliary function to solve the two-dimensional problem in the following way.
public int findMaximumSubMatrix(int[][] A, int n){
//computing the vertical prefix sum for columns
int[][] M = new int[n][n];
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
if (j == 0)
M[j][i] = A[j][i];
else
M[j][i] = A[j][i] + M[j - 1][i];
}
}
int maxSoFar = 0, min, subMatrix;
//iterate over the possible combinations applying Kadane's Alg.
for (int i = 0; i < n; i++) {
for (int j = i; j < n; j++) {
min = 0;
subMatrix = 0;
for (int k = 0; k < n; k++) {
if (i == 0)
subMatrix += M[j][k];
else subMatrix += M[j][k] - M[i - 1 ][k];
if(subMatrix < min)
min = subMatrix;
if((subMatrix - min) > maxSoFar)
maxSoFar = subMatrix - min;
}
}
	}
	return maxSoFar;
}
Time Complexity: O(𝑛3 ).
Problem-44 Given a number 𝑛, find the minimum number of squares required to sum a given number 𝑛.
𝐸𝑥𝑎𝑚𝑝𝑙𝑒𝑠: min[1] = 1 = 12 , min[2] = 2 = 12 + 12 , min[4] = 1 = 22 , min[13] = 2 = 32 + 22 .
Solution: This problem can be reduced to a coin change problem. The denominations are the perfect squares 1², 2², … , ⌊√𝑛⌋². Now, we just need to make change for 𝑛 with a minimum number of these denominations.
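A sketch of that coin-change DP in C, where the "coins" are the perfect squares (the name minSquares is ours):
#include <limits.h>
// dp[k] = fewest perfect squares summing to k; s = 1 always applies, so dp[k] is always finite.
int minSquares(int n) {
	int dp[n+1];
	dp[0] = 0;
	for(int k = 1; k <= n; k++) {
		dp[k] = INT_MAX;
		for(int s = 1; s*s <= k; s++)
			if(dp[k - s*s] + 1 < dp[k])
				dp[k] = dp[k - s*s] + 1;
	}
	return dp[n];
}
For instance, minSquares(13) returns 2, matching the example (3² + 2²).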
Problem-45 Finding Optimal Number of Jumps To Reach Last Element: Given an array, start from the
first element and reach the last by jumping. The jump length can be at most the value at the current position
in the array. The optimum result is when you reach the goal in the minimum number of jumps. Example: Given
array A = {2,3,1,1,4}. Possible ways to reach the end (index list) are:
• 0,2,3,4 (jump 2 to index 2, and then jump 1 to index 3, and then jump 1 to index 4)
• 0,1,4 (jump 1 to index 1, and then jump 3 to index 4)
Since second solution has only 2 jumps it is the optimum result.
Solution: This problem is a classic example of Dynamic Programming. Though we can solve it by brute force, that would be complex. We can use an approach similar to the LIS problem: as we traverse the array, we find the minimum number of jumps for reaching each position (index) and update our result array. Once we reach the end, we have the optimal solution at the last index of the result array.
How can we find the optimum number of jumps for every position (index)? For the first index, the optimum number of jumps is zero. Please note that if the value at the first index is zero, we can't jump to any element and we return infinity. For the (𝑛 + 1)𝑡ℎ element, initialize result[𝑛 + 1] as infinity. Then go through a loop from 0 … 𝑛, and at every index 𝑖 check whether we can jump from 𝑖 to 𝑛 + 1. If we can, and the total number of jumps (result[𝑖] + 1) is less than result[𝑛 + 1], then update result[𝑛 + 1]; otherwise just continue to the next index.
//Define MAX one less than the maximum unsigned value so that adding 1 doesn't wrap to 0
#define MAX 0xFFFFFFFE
unsigned int jump(int *array, int n) {
unsigned int answer, *result = new unsigned int[n];
int i, j;
//Boundary conditions
if(n==0 || array[0] == 0)
return MAX;
result[0] = 0; //no need to jump at first element
for (i = 1; i < n; i++) {
result[i] = MAX; //Initialization of result[i]
for (j = 0; j < i; j++) {
//check if jump is possible from j to i
if(array[j] >= (i-j)) {
//check if better solution available
if((result[j] + 1) < result[i])
result[i] = result[j] + 1; //updating result[i]
}
}
}
answer = result[n-1]; //return result[n-1]
delete[] result;
return answer;
}
The above code will return optimum number of jumps. To find the jump indexes as well, we can very easily modify
the code as per requirement.
Time Complexity: Since we are running 2 nested loops, with the inner loop iterating from 0 to 𝑖, the total work is 1 + 2 + 3 + ⋯ + (𝑛 − 1) = 𝑛(𝑛 − 1)/2, so the time complexity is O(𝑛²).
Space Complexity: O(𝑛) space for result array.
Problem-46 Explain what would happen if a dynamic programming algorithm is designed to solve a problem
that does not have overlapping sub-problems.
Solution: It will be just a waste of memory, because the answers of sub-problems will never be used again. And
the running time will be the same as using the Divide & Conquer algorithm.
Problem-47 Christmas is approaching. You’re helping Santa Claus to distribute gifts to children. For ease of
delivery, you are asked to divide 𝑛 gifts into two groups such that the weight difference of these two groups is
minimized. The weight of each gift is a positive integer. Please design an algorithm to find an optimal division
minimizing the value difference. The algorithm should find the minimal weight difference as well as the
groupings in O(𝑛𝑆) time, where 𝑆 is the total weight of these 𝑛 gifts. Briefly justify the correctness of your
algorithm.
Solution: This problem can be converted into making one set as close to 𝑆/2 as possible. We consider an equivalent problem of making one set as close to W = ⌊𝑆/2⌋ as possible. Define FD(𝑖, 𝑤) to be the minimal gap between the weight of the bag and W when using only the first 𝑖 gifts and a bag of capacity 𝑤. WLOG, we can assume the weight of the bag is always less than or equal to W. Then fill the DP table for 0 ≤ 𝑖 ≤ 𝑛 and 0 ≤ 𝑤 ≤ W, in which FD(0, 𝑤) = W for all 𝑤, and
FD(𝑖, 𝑤) = min{FD(𝑖 − 1, 𝑤 − 𝑤𝑖) − 𝑤𝑖, FD(𝑖 − 1, 𝑤)}    if FD(𝑖 − 1, 𝑤 − 𝑤𝑖) ≥ 𝑤𝑖
FD(𝑖, 𝑤) = FD(𝑖 − 1, 𝑤)    otherwise
This takes O(𝑛𝑆) time. FD(𝑛, W) is the minimum gap. Finally, to reconstruct the answer, we backtrack from (𝑛, W). During backtracking, if FD(𝑖, 𝑗) = FD(𝑖 − 1, 𝑗) then gift 𝑖 is not selected in the bag and we move to FD(𝑖 − 1, 𝑗). Otherwise, gift 𝑖 is selected and we move to FD(𝑖 − 1, 𝑗 − 𝑤𝑖).
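An equivalent way to implement the same O(𝑛𝑆) idea is the standard reachable-sum (subset-sum) table; this sketch returns only the minimal difference, and recovering the grouping itself would keep the backtracking information described above (the function name and details are ours):
#include <stdbool.h>
#include <stdlib.h>
int minWeightDifference(int w[], int n) {
	int S = 0;
	for(int i = 0; i < n; i++) S += w[i];
	int W = S / 2;
	bool *reach = calloc(W + 1, sizeof(bool));
	reach[0] = true; // the empty group weighs 0
	for(int i = 0; i < n; i++)
		for(int j = W; j >= w[i]; j--) // descending so each gift is used at most once
			if(reach[j - w[i]])
				reach[j] = true;
	int best = 0;
	for(int j = W; j >= 0; j--)
		if(reach[j]) { best = j; break; } // heaviest achievable weight <= S/2
	free(reach);
	return S - 2 * best; // minimal difference between the two groups
}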
Problem-48 A circus is designing a tower routine consisting of people standing atop one another's shoulders.
For practical and aesthetic reasons, each person must be both shorter and lighter than the person below him
or her. Given the heights and weights of each person in the circus, write a method to compute the largest
possible number of people in such a tower.
Solution: It is the same as the Box stacking and Longest increasing subsequence (LIS) problems.
Chapter 20: Complexity Classes
20.1 Introduction
In the previous chapters we have solved problems of different complexities. Some algorithms have lower rates of
growth while others have higher rates of growth. The problems with lower rates of growth are called 𝑒𝑎𝑠𝑦 problems (or easily solved problems) and the problems with higher rates of growth are called ℎ𝑎𝑟𝑑 problems (or hard to solve problems). This classification is done based on the running time (or memory) that an algorithm takes for solving
the problem.
Time Complexity   Name                 Example Problems                                   Classification
O(1)              Constant             Adding an element to the front of a linked list    Easily solved
O(logn)           Logarithmic          Finding an element in a binary search tree         Easily solved
O(n)              Linear               Finding an element in an unsorted array            Easily solved
O(nlogn)          Linear Logarithmic   Merge sort                                         Easily solved
O(n²)             Quadratic            Shortest path between two nodes in a graph         Easily solved
O(n³)             Cubic                Matrix multiplication                              Easily solved
O(2ⁿ)             Exponential          The Towers of Hanoi problem                        Hard to solve
O(n!)             Factorial            Permutations of a string                           Hard to solve
There are lots of problems for which we do not know the solutions. All the problems we have seen so far are the
ones which can be solved by computer in deterministic time. Before starting our discussion let us look at the basic
terminology we use in this chapter.
[Figure: a decision algorithm takes an input and answers Yes or No.]
Complexity classes are groups of problems with related complexity. Complexity theory is the branch of the theory of computation that studies the resources required
during computation to solve a given problem.
The most common resources are time (how much time the algorithm takes to solve a problem) and space (how
much memory it takes).
NP Class
The complexity class 𝑁𝑃 (𝑁𝑃 stands for non-deterministic polynomial time) is the set of decision problems that
can be solved by a non-deterministic machine in polynomial time. 𝑁𝑃 class problems refer to a set of problems
whose solutions are hard to find, but easy to verify.
For better understanding, let us consider a college which has 500 students on its rolls and only 100 rooms available. Two hundred of the students must be paired together into the 100 rooms, but the dean of students has a list of pairings of certain students who cannot room together for some reason.
The total possible number of pairings is too large. But the solutions (the list of pairings) provided to the dean, are
easy to check for errors. If one of the prohibited pairs is on the list, that's an error. In this problem, we can see
that checking every possibility is very difficult, but the result is easy to validate.
That means, if someone gives us a solution to the problem, we can tell them whether it is right or not in polynomial
time. Based on the above discussion, for 𝑁𝑃 class problems if the answer is 𝑦𝑒𝑠, then there is a proof of this fact,
which can be verified in polynomial time.
Co-NP Class
𝐶𝑜 − 𝑁𝑃 is the opposite of 𝑁𝑃 (complement of 𝑁𝑃). If the answer to a problem in 𝐶𝑜 − 𝑁𝑃 is 𝑛𝑜, then there is a proof
of this fact that can be checked in polynomial time.
𝑃 Solvable in polynomial time
𝑁𝑃 𝑌𝑒𝑠 answers can be checked in polynomial time
𝐶𝑜 − 𝑁𝑃 𝑁𝑜 answers can be checked in polynomial time
One of the important open questions in theoretical computer science is whether or not 𝑃 = 𝑁𝑃. Nobody knows.
Intuitively, it should be obvious that 𝑃 ≠ 𝑁𝑃, but nobody knows how to prove it.
Another open question is whether 𝑁𝑃 and 𝐶𝑜 − 𝑁𝑃 are different. Even if we can verify every YES answer quickly,
there’s no reason to think that we can also verify NO answers quickly.
It is generally believed that 𝑁𝑃 ≠ 𝐶𝑜 − 𝑁𝑃, but again nobody knows how to prove it.
NP-hard Class
It is a class of problems such that every problem in 𝑁𝑃 reduces to it. Not all 𝑁𝑃-hard problems are in 𝑁𝑃: for some of them, even checking a proposed solution cannot be done in polynomial time. That means, if someone gives us a solution for such an 𝑁𝑃-hard problem, it can take a long time for us to check whether it is right or not.
Saying that a problem 𝐾 is 𝑁𝑃-hard indicates that if a polynomial-time algorithm (solution) exists for 𝐾, then a polynomial-time algorithm exists for every problem in 𝑁𝑃. Thus:
[Figure: every problem in NP reduces to an NP-hard problem.]
NP-complete Class
Finally, a problem is 𝑁𝑃-complete if it is part of both 𝑁𝑃-hard and 𝑁𝑃. 𝑁𝑃-complete problems are the hardest
problems in 𝑁𝑃. If anyone finds a polynomial-time algorithm for one 𝑁𝑃-complete problem, then we can find
polynomial-time algorithm for every 𝑁𝑃-complete problem. This means that we can check an answer fast and every
problem in 𝑁𝑃 reduces to it.
[Figure: Venn diagram of the complexity classes: P lies inside NP; NP-Complete is the part of NP that is also NP-Hard; NP-Hard extends beyond NP; Co-NP sits alongside NP.]
The set of problems that are 𝑁𝑃-hard is a strict superset of the problems that are 𝑁𝑃-complete. Some problems
(like the halting problem) are 𝑁𝑃-hard, but not in 𝑁𝑃. 𝑁𝑃-hard problems might be impossible to solve in general.
We can tell the difference in difficulty between 𝑁𝑃-hard and 𝑁𝑃-complete problems because the class 𝑁𝑃 includes
everything easier than its "toughest" problems – if a problem is not in 𝑁𝑃, it is harder than all the problems in 𝑁𝑃.
Does P==NP?
If 𝑃 = 𝑁𝑃, it means that every problem that can be checked quickly can be solved quickly (remember the difference
between checking if an answer is right and actually solving a problem).
This is a big question (and nobody knows the answer), because right now there are lots of 𝑁𝑃-complete problems that can't be solved quickly. If 𝑃 = 𝑁𝑃, that would mean there is a way to solve all of them fast. Remember that "quickly" here means in polynomial time, not by exhaustive trial and error; a polynomial-time algorithm may still take a long time on large inputs, but it scales far better than brute-force search.
20.7 Reductions
Before discussing reductions, let us consider the following scenario. Assume that we want to solve problem 𝑋 but
feel it’s very complicated. In this case what do we do?
The first thing that comes to mind is, if we have a similar problem to that of 𝑋 (let us say 𝑌), then we try to map 𝑋
to 𝑌 and use 𝑌’𝑠 solution to solve 𝑋 also. This process is called reduction.
[Figure: solving X via a reduction to Y: the input (an instance of X) is transformed into an instance of Y, the algorithm for Y is run, and its output is mapped back to a solution to X.]
In order to map problem 𝑋 to problem 𝑌, we need some algorithm and that may take linear time or more. Based
on this discussion the cost of solving problem 𝑋 can be given as:
𝐶𝑜𝑠𝑡 𝑜𝑓 𝑠𝑜𝑙𝑣𝑖𝑛𝑔 𝑋 = 𝐶𝑜𝑠𝑡 𝑜𝑓 𝑠𝑜𝑙𝑣𝑖𝑛𝑔 𝑌 + 𝑅𝑒𝑑𝑢𝑐𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒
Now, let us consider the other scenario. For solving problem 𝑋, sometimes we may need to use 𝑌’𝑠 algorithm
(solution) multiple times. In that case,
𝐶𝑜𝑠𝑡 𝑜𝑓 𝑠𝑜𝑙𝑣𝑖𝑛𝑔 𝑋 = 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑇𝑖𝑚𝑒𝑠 ∗ 𝐶𝑜𝑠𝑡 𝑜𝑓 𝑠𝑜𝑙𝑣𝑖𝑛𝑔 𝑌 + 𝑅𝑒𝑑𝑢𝑐𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒
The key notion in 𝑁𝑃-Completeness is reducibility: we reduce (or transform) one problem to another. Since 𝑁𝑃-Complete problems are hard to solve, in order to prove that a given problem is hard we take an existing problem that is known to be hard and reduce it to the given problem: if the known hard problem could be solved through the given problem, the given problem must be at least as hard.
Note: reducing the given problem to a known hard problem shows only that the given problem is no harder than the known one; for a hardness proof, the reduction must run from the known hard problem to the given problem.
(One of the problems in the reduction figure, zero-one integer programming, asks whether there exist values 𝑥𝑗 ∈ {0,1}, 1 ≤ 𝑗 ≤ 𝑁, satisfying ∑ from 𝑗 = 1 to 𝑁 of 𝑎𝑖𝑗𝑥𝑗 = 𝑏𝑖 for 1 ≤ 𝑖 ≤ 𝑀.)
In the figure, arrows indicate the reductions. For example, Ham-Cycle (Hamiltonian Cycle Problem) can be reduced
to CNF-SAT. Same is the case with any pair of problems. For our discussion, we can ignore the reduction process
for each of the problems. There is a theorem called 𝐶𝑜𝑜𝑘'𝑠 𝑇ℎ𝑒𝑜𝑟𝑒𝑚 which proves that the Circuit satisfiability problem is 𝑁𝑃-hard. That means, Circuit satisfiability is a known 𝑁𝑃-hard problem.
[Figure: the reduction tree rooted at Circuit-SAT: CNF-SAT reduces to 3-CNF-SAT and to Clique, and so on; none of these problems has a polynomial-time algorithm unless P = NP.]
Chapter 21: Miscellaneous Concepts
21.1 Introduction
In this chapter we will cover the topics which are useful for interviews and exams.
21.2.2 Bitwise OR
The bitwise OR tests two binary numbers and returns a 1 in each bit position where either or both of the corresponding bits are 1; the result is 0 only where both bits are 0:
01001011
| 00010101
----------
01011111
Note: For computing –𝑛, use two’s complement representation. That means, toggle all bits and add 1.
Method2: Divide by 2 and test the remainder:
unsigned int n; // number whose set bits we count
unsigned int count = 0;
while(n) {
	if(n % 2 == 1)
		count++;
	n = n/2;
}
Time Complexity: This requires one iteration per bit; the loop runs once for every bit up to the highest set bit of 𝑛.
Method3: Using toggling approach: 𝑛 & 𝑛 − 1
unsigned int n;
unsigned int count=0;
while(n) {
count++;
n &= n - 1;
}
Time Complexity: The number of iterations depends on the number of 1 bits in the number.
Method4: Using preprocessing idea. In this method, we process the bits in groups. For example if we process
them in groups of 4 bits at a time, we create a table which indicates the number of one’s for each of those
possibilities (as shown below).
0000→0 0100→1 1000→1 1100→2
0001→1 0101→2 1001→2 1101→3
0010→1 0110→2 1010→2 1110→3
0011→2 0111→3 1011→3 1111→4
The following code to count the number of 1s in the number with this approach:
int Table[] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};
int count = 0;
for(; n; n >>= 4)
count = count + Table[n & 0xF];
return count;
Time Complexity: This approach requires one iteration per 4 bits and the number of iterations depends on system.
upper limit. Once the last column is printed, direction changes to left, the column is discarded by decrementing
the right hand limit.
void spiral(int **A, int n) {
	int rowStart=0, columnStart=0;
	int rowEnd=n-1, columnEnd=n-1;
	while(rowStart<=rowEnd && columnStart<=columnEnd) {
		int i=rowStart, j;
		for(j=columnStart; j<=columnEnd; j++)              // top row, left to right
			printf("%d ",A[i][j]);
		for(i=rowStart+1, j=columnEnd; i<=rowEnd; i++)     // right column, top to bottom
			printf("%d ",A[i][j]);
		if(rowStart < rowEnd)                              // bottom row, right to left (skip if only one row remains)
			for(j=columnEnd-1, i=rowEnd; j>=columnStart; j--)
				printf("%d ",A[i][j]);
		if(columnStart < columnEnd)                        // left column, bottom to top (skip if only one column remains)
			for(i=rowEnd-1, j=columnStart; i>=rowStart+1; i--)
				printf("%d ",A[i][j]);
		rowStart++; columnStart++; rowEnd--; columnEnd--;
	}
}
Time Complexity: O(𝑛2 ). Space Complexity: O(1).
Problem-2 Give an algorithm for shuffling the deck of cards.
Solution: Assume that we want to shuffle an array of 52 cards, from 0 to 51 with no repeats, such as we might
want for a deck of cards. First fill the array with the values in order, then go through the array and exchange each
element with a randomly chosen element in the range from itself to the end. It's possible that an element will swap
with itself, but there is no problem with that.
void shuffle(int cards[], int n){
srand(time(0)); // initialize seed randomly
for (int i=0; i<n; i++)
cards[i] = i; // filling the array with card number
for (int i=0; i<n; i++) {
int r = i + (rand() % (n-i)); // random remaining position
int temp = cards[i];
cards[i] = cards[r];
cards[r] = temp;
}
printf("Shuffled Cards:" );
for (int i=0; i<n; i++)
printf("%d ", cards[i]);
}
Time Complexity: O(𝑛). Space Complexity: O(1).
Problem-3 Reversal algorithm for array rotation: Write a function rotate(A[], d, n) that rotates A[] of size 𝑛
by 𝑑 elements. For example, the array 1, 2, 3, 4, 5, 6, 7 becomes 3, 4, 5, 6, 7, 1, 2 after 2 rotations.
Solution: Consider the following algorithm.
Algorithm:
rotate(Array[], d, n)
	reverse(Array[], 1, d);
	reverse(Array[], d + 1, n);
	reverse(Array[], 1, n);
Let AB be the two parts of the input array, where A = Array[0..d-1] and B = Array[d..n-1]. The idea of the algorithm
is:
Reverse A to get ArB. /* Ar is reverse of A */
Reverse B to get ArBr. /* Br is reverse of B */
Reverse all to get (ArBr) r = BA.
For example, if Array[] = [1, 2, 3, 4, 5, 6, 7], d =2 and n = 7 then, A = [1, 2] and B = [3, 4, 5, 6, 7]
Reverse A, we get ArB = [2, 1, 3, 4, 5, 6, 7], Reverse B, we get ArBr = [2, 1, 7, 6, 5, 4, 3]
Reverse all, we get (ArBr)r = [3, 4, 5, 6, 7, 1, 2]
Implementation:
//function to left rotate Array[] of size n by d
void leftRotate(int Array[], int d, int n) {
	reverseArray(Array, 0, d-1);
	reverseArray(Array, d, n-1);
	reverseArray(Array, 0, n-1);
}
//UTILITY FUNCTION: print an array
void printArray(int Array[], int size){
	for(int i = 0; i < size; i++)
		printf("%d ", Array[i]);
	printf("\n");
}
//function to reverse Array[] from index start to end
void reverseArray(int Array[], int start, int end) {
int i;
int temp;
while(start < end){
temp = Array[start];
Array[start] = Array[end];
Array[end] = temp;
start++;
end--;
}
}
Problem-4 Suppose you are given an array s[1...n] and a procedure reverse (s,i,j) which reverses the order of
elements in between positions i and j (both inclusive). What does the following sequence
do, where 1 < k <= n:
reverse (s, 1, k);
reverse (s, k + 1, n);
reverse (s, 1, n);
a) Rotates s left by k positions b) Leaves s unchanged c) Reverses all elements of s d) None of the above
Solution: (a). The effect of the above 3 reversals for any 𝑘 is equivalent to a left rotation of the array of size 𝑛 by 𝑘
[refer to Problem-3].
Problem-5 Finding Anagrams in a Dictionary: You are given these 2 files: dictionary.txt and jumbles.txt.
The jumbles.txt file contains a bunch of scrambled words. Your job is to print those jumbled words,
one word to a line. After each jumbled word, print a list of real dictionary words that could be formed by
unscrambling it. The dictionary words that you have to choose from are in the
dictionary.txt file. Sample output:
nwae: wean anew wane
eslyep: sleepy
rpeoims: semipro imposer promise
ettniner: renitent
ahicryrhe: hierarchy
dica: acid cadi caid
dobol: blood
......
Solution: Step-By-Step
𝑆𝑡𝑒𝑝 1: Initialization
• Open the dictionary.txt file and read the words into an array (before going further, verify by echoing the words from the array back to the screen).
• Declare a hash table variable.
𝑆𝑡𝑒𝑝 2: Process the dictionary. For each dictionary word in the array, do the following:
• Remove the newline off the end of each word via chomp($word);
• Make a sorted copy of the word - i.e., rearrange the individual chars in the string to be sorted alphabetically
• Think of the sorted word as the key value and think of the set of all dictionary words that sort to the exact same key word as being the value of the key
• Query the hash table to see if the sortedWord is already one of the keys
• If it is not already present, then insert the sorted word as key and the unsorted original of the word as the value
• Else concat the unsorted word onto the value string already out there (put a space in between)
We now have a hash table where each key is the sorted form of a dictionary word and the value associated with it is a string or array of dictionary words that sort to that same key.
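The heart of the method is the key computation: two words are anagrams exactly when their sorted characters match. A small C sketch of that step (charCompare and makeKey are hypothetical helper names; the hash table itself would come from whatever library is at hand):
#include <stdlib.h>
#include <string.h>
static int charCompare(const void *a, const void *b) {
	return *(const char *)a - *(const char *)b;
}
// key must point to a buffer at least strlen(word)+1 bytes long.
void makeKey(const char *word, char *key) {
	strcpy(key, word);
	qsort(key, strlen(key), sizeof(char), charCompare); // sort the characters alphabetically
}
For example, makeKey maps both "wean" and "anew" to "aenw", so they land in the same hash bucket.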
Solution: Before finding the solution, we try to understand the problem with a simpler version. The smallest
problem that we can consider is the number of possible routes in a 1×1 grid.
0 1
1 2
From the above figure, it can be seen that:
• From both the bottom-left and the top-right corners there's only one possible route to the destination.
• From the top-left corner there are trivially two possible routes.
Similarly, for the next grid sizes, we can fill the 2×2 and 3×3 tables as:
0 1        0 1 1
1 2        1 2 3
           1 3 6
From the above discussion, it is clear that to reach the bottom right corner from left top corner, the paths are
overlapping. As unique paths could overlap at certain points (grid cells), we could try to alter the previous
algorithm, as a way to avoid following the same path again. If we start filling 4𝑥4 and 5𝑥5, we can easily figure out
the solution based on our childhood mathematics concepts.
0 1 1 1        0 1 1 1 1
1 2 3 4        1 2 3 4 5
1 3 6 10       1 3 6 10 15
1 4 10 20      1 4 10 20 35
               1 5 15 35 70
Are you able to figure out the pattern? It is the same as 𝑃𝑎𝑠𝑐𝑎𝑙'𝑠 triangle. So, to find the number of ways, we can simply fill the table, counting as we move from left to right and top to bottom (starting at the top-left). We can even solve this problem with the closed-form binomial coefficient from 𝑃𝑎𝑠𝑐𝑎𝑙'𝑠 triangle.
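Filling the table exactly as above gives a short C sketch (the name countRoutes is ours; the 0 in the top-left cell follows the tables shown):
long long countRoutes(int n) {
	long long T[n+1][n+1];
	for(int i = 0; i <= n; i++) {
		for(int j = 0; j <= n; j++) {
			if(i == 0 && j == 0)      T[i][j] = 0;                    // matches the tables above
			else if(i == 0 || j == 0) T[i][j] = 1;                    // single route along an edge
			else                      T[i][j] = T[i-1][j] + T[i][j-1]; // Pascal's rule
		}
	}
	return T[n][n];
}
countRoutes(4) returns 70, the bottom-right entry of the 5×5 table above.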
Problem-7 Given a string that has a set of words and spaces, write a program to move the spaces to 𝑓𝑟𝑜𝑛𝑡 of
string. You need to traverse the array only once and you need to adjust the string in place.
𝐼𝑛𝑝𝑢𝑡 = "move these spaces to beginning" 𝑂𝑢𝑡𝑝𝑢𝑡 = "    movethesespacestobeginning"
Solution: Maintain two indices 𝑖 and 𝑗 and traverse from the end to the beginning. Whenever the current index 𝑗 holds a non-space character, swap A[𝑖] with A[𝑗] and decrement 𝑖. This moves all the spaces to the beginning of the array.
void mySwap(char A[], int i, int j){
	char temp = A[i];
	A[i] = A[j];
	A[j] = temp;
}
void moveSpacesToBegin(char A[]){
	int i = strlen(A)-1;
	int j = i;
	for(; j>=0; j--){
		if(!isspace(A[j]))
			mySwap(A, i--, j);
	}
}
void testCode(int argc, char * argv[]){
	char sparr[] = "move these spaces to beginning";
	printf("Value of A is: %s\n", sparr);
	moveSpacesToBegin(sparr);
	printf("Value of A is: %s", sparr);
}
Time Complexity: O(𝑛) where 𝑛 is the number of characters in input array. Space Complexity: O(1).
Problem-8 For Problem-7, can we improve the complexity?
Solution: We can avoid a swap operation with a simple counter. But, it does not reduce the overall complexity.
void moveSpacesToBegin(char A[]){
	int n = strlen(A)-1, count = n;
	int i = n;
	for(; i>=0; i--){
		if(A[i] != ' ')
			A[count--] = A[i];
	}
	while(count >= 0)
		A[count--] = ' ';
}
int testCode(){
	char sparr[] = "move these spaces to beginning";
	printf("Value of A is: %s\n", sparr);
	moveSpacesToBegin(sparr);
	printf("Value of A is: %s", sparr);
	return 0;
}
Time Complexity: O(𝑛) where 𝑛 is the number of characters in input array. Space Complexity: O(1).
Problem-9 Given a string that has a set of words and spaces, write a program to move the spaces to 𝑒𝑛𝑑 of
string. You need to traverse the array only once and you need to adjust the string in place.
𝐼𝑛𝑝𝑢𝑡 = "move these spaces to end" 𝑂𝑢𝑡𝑝𝑢𝑡 = "movethesespacestoend    "
Solution: Traverse the array from left to right. While traversing, maintain a counter for non-space elements in the array. For every non-space character A[𝑖], put the element at A[𝑐𝑜𝑢𝑛𝑡] and increment 𝑐𝑜𝑢𝑛𝑡. After the complete traversal, all non-space elements have been shifted to the front and 𝑐𝑜𝑢𝑛𝑡 is the index of the first position after them. Now, all we need to do is run a loop which fills all elements with spaces from 𝑐𝑜𝑢𝑛𝑡 till the end of the array.
void moveSpacesToEnd(char A[]){
	int count = 0; // count of non-space elements
	int n = strlen(A);
	for(int i = 0; i < n; i++)
		if(A[i] != ' ')
			A[count++] = A[i]; // shift non-space characters to the front
	while(count < n)
		A[count++] = ' '; // fill the remainder with spaces
}
void testCode(int argc, char * argv[]){
	char sparr[] = "move these spaces to end";
	printf("Value of A is: %s\n", sparr);
	moveSpacesToEnd(sparr);
	printf("Value of A is: %s", sparr);
}
Problem-16 Given two strings s and t which consist of only lowercase letters. String t is generated by random
shuffling string s and then add one more letter at a random position. Find the letter that was added in t.
Example Input: s = "abcd" t = "abcde" Output: e
Explanation: 'e' is the letter that was added.
Solution: Since there is only a one-character difference between the two given strings, we can simply XOR all the characters of both strings: every character that appears in both cancels out, leaving the added letter.
char findTheDifference(string s, string t) {
char r=0;
for(char c:s) r ^=c;
for(char c:t) r ^=c;
return r;
}
Time Complexity: O(𝑛), where 𝑛 is the length of arrays. Space Complexity: O(1).
Problem-17 Given an integer array, sort the integers in the array in ascending order by the number of 1's in
their binary representation and in case of two or more integers having the same number of 1's you must sort
them in ascending order.
Solution: Refer to the solution section of the 𝑃𝑟𝑖𝑜𝑟𝑖𝑡𝑦 𝑄𝑢𝑒𝑢𝑒𝑠 chapter.