Data Structures and Algorithms
Nikin Baidar
Year 2022
Chapter 1. Basics of Data Structures
Data Structures
Some things to note:
– Static types support only queries, while dynamic types also support updates.
– A persistent data structure is a dynamic data structure that supports operations on past versions.
Introduction to Big O
Big O is a mathematical notation used to describe the computational complexity of an algorithm. Computational complexity is split into two parts: (1) time complexity and (2) space complexity. The time complexity of an algorithm is the amount of time that the algorithm needs to run, relative to the input size. The space complexity of an algorithm is the amount of memory allocated/needed by the algorithm when run, relative to the input size.
Complexity is described by a function of variables that can change with the input. The most common variable is n, which usually describes the length of an input array or string. This function is wrapped in a capital O. Here are some example complexities:
• O(n)
• O(n^2)
• O(2^n)
• O(log n)
• O(n · m)
These functions describe the limiting behavior of an algorithm as its argument tends towards a particular value or infinity; that is, they describe how the amount of time (operations) or memory required grows as the input tends to infinity. Because the variables are tending to infinity, constants are always ignored. This essentially means O(1000n) = O(100n) = O(10n) = O(n) = O(n/10). It also means that terms dominated by a faster-growing term can be dropped. For example, O(2^n + n^2 − n) = O(2^n), because as n tends to infinity, 2^n completely dominates the other two terms. Essentially, Big O characterizes functions according to their growth rates. The letter O is used because the growth rate of a function is also referred to as the order of the function. A description of a function in terms of big O notation usually provides only an upper bound on the growth rate of the function. In CS, big O is used to classify algorithms according to how their resource requirements grow as the input size grows.
Being able to analyze an algorithm and derive its time and space complexity is a crucial skill; it also enables you to determine which parts of it can be improved.
The best complexity is O(1), called “constant time” or “constant space”. It means
the algorithm ALWAYS uses the same amount of resources, regardless of the input.
When talking about complexity, there are normally three cases:
• Best case
• Average case
• Worst case
In most cases, all three of these will be equal, but some algorithms will have
them differ. If you have to choose one to call the algorithm’s complexity, it is most
correct to use the worst case scenario.
Let's look at some example algorithms in pseudo-code and talk about their complexities.
Example 1:
// Given an array 'arr' of length 'n'.
for (int item: arr) {
echo item
}
This algorithm has a time complexity of O(n). In each iteration, we perform an echo action, which costs O(1) time. Since the for loop runs n times, the complexity becomes O(1 · n) = O(n).
Example 2:
// Given an array 'arr' with length 'n',
for (int item: arr) {
for (int i = 0; i < 500000; i++) {
echo item
}
}
Here, the statement inside the innermost loop, i.e. echo item, takes O(1); the inner loop runs 500,000 times and the outer loop runs n times. The time complexity becomes O(1 · 500,000 · n) = O(500,000n) = O(n).
Even though the first two algorithms technically have the same complexity, in reality the second algorithm is much slower than the first one.
Example 3:
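The listing for this example is a nested scan of the same array; a sketch consistent with the analysis below (the names item and item_2 come from that analysis):

// Given an array 'arr' with length 'n',
for (int item: arr) {
    for (int item_2: arr) {
        echo item * item_2
    }
}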
echo item * item_2 has O(1) complexity, the inner loop runs n times, and the outer loop runs n times too. So the complexity becomes O(1 · n · n) = O(n^2).
Sometimes, work done by an algorithm isn't included in the time complexity. For example, here we said that multiplication costs O(1), even though multiplication takes longer on larger numbers. This is ignored because the input size n has nothing to do with the size of the elements of the array.
Example 4:
// Loop A
for (int num:arr)
echo num
// Loop A
for (int num:arr)
echo num
// Loop B
for (int num:arr2)
echo num
The inner statement in each of loops A and B has O(1) time complexity. Loop A has a complexity of O(1 · n) and loop B has a time complexity of O(1 · m). Loop A runs twice. So the total complexity of the algorithm becomes O(n) + O(n) + O(m) = O(n + n + m) = O(2n + m) = O(n + m).
A related pattern is a nested loop whose inner loop runs 1, 2, 3, ..., n times; the total number of operations is then

1 + 2 + 3 + ... + n = n(n + 1)/2 = (n^2 + n)/2

In terms of big O, (n^2 + n)/2 is O(n^2), because the addition term in the numerator and the constant term in the denominator are both ignored.
When you declare variables and modify the data stored by those variables, your algorithm is allocating memory. We never count the space used by the input (it is bad practice to modify the input), and we don't count the space used by the output.
Example 1:
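The listing for this example is not shown; a minimal sketch consistent with the analysis below, assuming a single scan of arr with one loop variable num:

// Given an array 'arr' of length 'n'.
for (int num: arr) {
    echo num
}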
This algorithm has a space complexity of O(1). The only space allocated is an integer variable num, which is constant relative to n.
Example 2:
Example 3:
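Again a sketch consistent with the analysis below, assuming the algorithm copies the first one percent of arr into a new array nums:

// Given an array 'arr' of length 'n'.
int[] nums = new int[n / 100]
for (int i = 0; i < n / 100; i++) {
    nums[i] = arr[i]
}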
This algorithm has a space complexity of O(n). The array nums stores the first 1% of the numbers in arr. This gives a space complexity of O(n/100) = O(n).
Example 4:
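A sketch consistent with the analysis below, assuming the algorithm builds an n × m grid (what is stored in each entry is illustrative; only the allocation matters for the analysis):

// Given an array 'arr' of length 'n' and an array 'arr2' of length 'm'.
int[][] grid = new int[n][m]
for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        grid[i][j] = arr[i] * arr2[j]
    }
}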
This algorithm has a space complexity of O(n · m). We are creating a grid that has dimensions n × m.
Decomposition (Divide and Conquer) Algorithms
/* Given an integer 'x', compute its 'n'-th power. */
int p(int x, int n) {
    return (n > 0) ? x * p(x, n - 1) : 1;
}
This algorithm has a time complexity of O(n). The number of times x is multiplied
with itself increases as the value of n increases.
One approach to the design of algorithms is to decompose a problem into subproblems that resemble the original problem, but on a reduced scale. Suppose we want to compute x^n; the algorithm above has a linear growth rate (the number of times x is multiplied with itself increases as n increases). However, x^n can be computed from x^(n/2), reasoning that:

x^n = 1                       if n = 0
      (x^(n/2))^2             if n is even        (1.1)
      x · (x^((n-1)/2))^2     if n is odd
/* Compute x^n by decomposition: compute x^(n/2) once, then square it. */
int p(int x, int n) {
    if (n == 0)
        return 1;
    int result = p(x, n / 2);   /* n / 2 is integer division */
    if (n % 2 == 0)             /* isEven(n) */
        return result * result;
    else
        return x * result * result;
}

Since n is halved at every step, this version performs only O(log n) multiplications.
Fundamental data structure types include:
1. Sequence
• Linear order
• Operations supported: retrieving, inserting, and removing elements given their position.
• E.g. arrays, linked lists; stacks and queues are special types of sequences where insertions and deletions can only be done at the head or the tail of the sequence.
• A basic form of data structure that is used to realize and implement other
complex data types and data structures.
2. Priority queue
3. Dictionary
• Basic Example: Hash tables
• Advanced examples: skip lists, tries, and balanced search trees such as AVL trees, red-black trees, 2-3 trees, 2-3-4 trees, weight-balanced trees, biased search trees, and splay trees.
• Applications: when you search for a combination of elements in the data structure that together make up something, for instance anagrams, two sum, three sum; a two-sum sketch follows this list. (In external memory, dictionaries are often implemented as B-trees and their variations.)
4. Union-Find
5. Graphs and networks: adjacency matrices, adjacency lists, link-cut trees, dynamic expression trees, topology trees, sparsification trees.
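To make the dictionary application above concrete, here is a minimal two-sum sketch in Java (class and method names are illustrative, not from the text): a hash table gives expected O(1) lookups, so a single O(n) scan suffices.

import java.util.HashMap;
import java.util.Map;

public class TwoSum {
    /* Returns indices i < j with nums[i] + nums[j] == target, or null. */
    static int[] twoSum(int[] nums, int target) {
        Map<Integer, Integer> seen = new HashMap<>(); // value -> index
        for (int j = 0; j < nums.length; j++) {
            Integer i = seen.get(target - nums[j]);   // O(1) expected lookup
            if (i != null) return new int[] { i, j };
            seen.put(nums[j], j);
        }
        return null; // no pair found
    }

    public static void main(String[] args) {
        int[] ans = twoSum(new int[] {2, 7, 11, 15}, 9);
        System.out.println(ans[0] + ", " + ans[1]); // prints "0, 1"
    }
}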
Sequence (S)
A sequence is a container that stores elements in a certain order, which is imposed by the operations performed. Using locators, we can define a full repertoire of operations for a sequence S.
• SIZE(N)
• HEAD(c)
• TAIL(c)
• LOCATE_RANK(r,c)
• PREV(c', c)
• NEXT(c, c')
• INSERT_BEFORE(e, c', c)
• INSERT_AFTER(e, c, c')
• INSERT_HEAD(e, c)
• INSERT_TAIL(e, c)
• INSERT_RANK(e, r, c)
• REMOVE_HEAD(e, c)
• REMOVE_TAIL(e, c)
• REMOVE_RANK(e, r, c)
• MODIFY_HEAD(e, c)
• MODIFY_TAIL(e, c)
• MODIFY_RANK(e, r, c)
Additional operations, such as MAX and MIN, can be achieved with the iteration operator NEXT.
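As a sketch, the repertoire above maps naturally onto an interface (Java; Locator is an assumed opaque handle standing in for the locators c, c' of the text, and return values replace its output parameters):

interface Locator {}

interface Sequence<E> {
    int size();                             // SIZE
    Locator head();                         // HEAD
    Locator tail();                         // TAIL
    Locator locateRank(int r);              // LOCATE_RANK
    Locator prev(Locator c);                // PREV
    Locator next(Locator c);                // NEXT
    Locator insertBefore(E e, Locator c);   // INSERT_BEFORE
    Locator insertAfter(E e, Locator c);    // INSERT_AFTER
    Locator insertHead(E e);                // INSERT_HEAD
    Locator insertTail(E e);                // INSERT_TAIL
    Locator insertRank(E e, int r);         // INSERT_RANK
    E removeHead();                         // REMOVE_HEAD
    E removeTail();                         // REMOVE_TAIL
    E removeRank(int r);                    // REMOVE_RANK
    void modifyHead(E e);                   // MODIFY_HEAD
    void modifyTail(E e);                   // MODIFY_TAIL
    void modifyRank(E e, int r);            // MODIFY_RANK
}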
Priority Queue (Q)
A priority queue offers two basic operations:
1. INSERT
2. REMOVE_MAX
When Q is realized with an unsorted S, insertions are done only at the head or the tail: a Q INSERT operation can be realized with an S INSERT_HEAD or INSERT_TAIL operation. However, this leaves the S unsorted. The operation MAX can then be performed by scanning the S with the NEXT operation, keeping track of the maximum element encountered until all SIZE elements have been scanned. Finally, MAX followed by a removal realizes the REMOVE_MAX operation.
An alternative approach is to use a sorted S. This, however, breaks the rule that in a Q insertions can only be done at either the head or the tail. With a sorted S, the operation MAX corresponds to the operation TAIL, while the INSERT operation corresponds to INSERT_RANK, as it becomes necessary to scan the S to find the appropriate rank at which a new element (with a certain priority) can be inserted.
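A minimal Java sketch of the unsorted-S realization just described (names are illustrative; insert appends at the tail in O(1), while removeMax scans all elements in O(n) and assumes the queue is non-empty):

import java.util.ArrayList;
import java.util.List;

public class UnsortedListPQ {
    private final List<Integer> s = new ArrayList<>(); // the sequence S

    public void insert(int e) { s.add(e); }  // INSERT_TAIL: O(1)

    public int removeMax() {                 // MAX followed by removal: O(n)
        int best = 0;
        for (int i = 1; i < s.size(); i++)   // scan with NEXT, tracking the max
            if (s.get(i) > s.get(best)) best = i;
        return s.remove(best);
    }
}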
Realization of Q with a Heap
Figure 1.1: (i) {D, E} and {F, G} are cousins because B and C are siblings. (ii) Since
B is the parent of D, B is the grandparent of H and I. A leaf node is a node that does
not contain any child nodes.
HEAP IMPLEMENTATION:
Heaps are usually implemented with an array as follows:
• Each element in the array represents a node of the heap, and
• The parent/child relationship is defined implicitly by the elements’ indices in
the array.
• How the tree is represented:
– The first element of the array, at index 0, is the root node at level 0.
– The next 2 indices (1 and 2) represent the root's children at level 1.
– The next 4 indices represent the nodes of level 2.
– The next 8 indices represent the nodes of level 3, and so on.
• A pattern can be seen between the indices of parents and children. Given a node at index i, its left child is at index 2i + 1 and its right child is at index 2i + 2.
• Given a child node at index i, its parent is found at index floor((i − 1)/2).
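These relations can be written directly as index helpers (a Java sketch; the names mirror the iParent, iLeftChild, iRightChild used in the heapsort steps below):

class HeapIndex {
    static int iParent(int i)     { return (i - 1) / 2; } // floor((i-1)/2) for i >= 1
    static int iLeftChild(int i)  { return 2 * i + 1; }
    static int iRightChild(int i) { return 2 * i + 2; }
}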
HEAPSORT:
• Divides the input into a sorted and an unsorted region, and iteratively shrinks the unsorted region by extracting the largest element from it.
• An in-place sorting algorithm.
• Construction of a heap can be thought of as a step in heapsort, so that the maximum element is always at the root of the heap.
The heapsort (bottom-up approach):
Step 0: Start with a heap that does not yet satisfy the partial order property (recall that the level property is implicit in the array representation).
Step 1: Start from the parent of the last node in the heap, i.e. the last non-leaf node. In a heap with n nodes, its index is given by iLastNonLeaf = iParent(iLastNode) = floor((n − 2)/2).
Step 2: Perform siftDown (not shift down; to "sift" is to go through something, especially to sort out what is useful or valuable). In the siftDown operation, we repair the sub-heap whose root is at index i. While this root has at least one child:

while (exists(iLeftChild(i))) {
    toSwap = i
    /* Pick the largest of i and its children. */
    if (A[toSwap] < A[iLeftChild(i)]) toSwap = iLeftChild(i)
    if (exists(iRightChild(i)) && A[toSwap] < A[iRightChild(i)]) toSwap = iRightChild(i)
    if (toSwap == i) break          /* sub-heap at i is repaired */
    swap(A[i], A[toSwap]); i = toSwap
}
Step 3: (Remember that in Step 1 we started with the last parent node i.) Move to the next parent node, i.e. decrement i, and repeat Step 2 until the root (i = 0) has also been processed.
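Putting Steps 0-3 together with the extraction phase, a compact Java sketch (a standard in-place heapsort consistent with the steps above, not taken verbatim from the text):

public class HeapSort {
    static void heapSort(int[] a) {
        int n = a.length;
        // Steps 1-3: heapify bottom-up, from the last non-leaf node down to the root.
        for (int i = (n - 2) / 2; i >= 0; i--) siftDown(a, i, n);
        // Repeatedly move the max (root) into the sorted region at the end.
        for (int end = n - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t; // swap root with last unsorted element
            siftDown(a, 0, end);                     // repair the shrunken heap
        }
    }

    // Step 2: repair the sub-heap rooted at i within a[0..size).
    static void siftDown(int[] a, int i, int size) {
        while (2 * i + 1 < size) {            // while a left child exists
            int toSwap = i;
            if (a[toSwap] < a[2 * i + 1]) toSwap = 2 * i + 1;
            if (2 * i + 2 < size && a[toSwap] < a[2 * i + 2]) toSwap = 2 * i + 2;
            if (toSwap == i) return;          // partial order restored
            int t = a[i]; a[i] = a[toSwap]; a[toSwap] = t;
            i = toSwap;
        }
    }
}

Calling heapSort on an array sorts it in ascending order, in place.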
Graphs
Low Level Design For Spreadsheet Software Using Graphs
Graphs are suitable for writing spreadsheet software, where each cell is connected to others based on the relations they have (each cell is like a node in a graph). The reasoning for this is as follows:
• One of the biggest points we have to consider in an application like Excel is how to optimize the way dependent cells are updated when a cell changes. This can be done very efficiently using graphs, through DFS (Depth First Search) or BFS (Breadth First Search), as we would have access to the cells that depend on the current cell.
Expanding on the above idea, let's say a cell has the following formula:
cell_X = sum(cell_A, cell_B, cell_C) + product(cell_D, cell_E, cell_F)
Notice that cell_X depends on other cells and uses a different function for each group of cells. We can also expand and nest different functions with their own cell dependencies. We can efficiently update cell_X when any of the dependencies changes, because we are using a graph (we just have to probe all the dependent cells to update their values). To calculate the value itself, we can use a recursive function to deal with multiple nested functions and cells.
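A minimal Java sketch of this idea (all names are illustrative, and the formula evaluation itself is left as a placeholder): a change to a cell triggers a DFS over the cells that depend on it, recomputing each one once.

import java.util.*;

public class Spreadsheet {
    // dependents.get(c) = cells whose formulas reference cell c.
    private final Map<String, Set<String>> dependents = new HashMap<>();
    private final Map<String, Integer> values = new HashMap<>();

    void addDependency(String from, String to) {
        dependents.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    void setValue(String cell, int value) {
        values.put(cell, value);
        recomputeDependents(cell, new HashSet<>());
    }

    // DFS over the cells that depend on 'cell'; visited prevents repeats.
    private void recomputeDependents(String cell, Set<String> visited) {
        for (String d : dependents.getOrDefault(cell, Set.of())) {
            if (visited.add(d)) {
                recompute(d);                    // placeholder: re-evaluate d's formula
                recomputeDependents(d, visited); // propagate further down the graph
            }
        }
    }

    private void recompute(String cell) { /* evaluate the cell's formula here */ }
}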