
Data Structures and Algorithms

Nikin Baidar

Year 2022
Chapter 1. Basics of Data Structures

Data Structures
Three things to note:

• Container: Most data structures can be viewed as containers.
• Elements: A collection of objects of a given type.
• Locators: A means of accessing the elements in the container.

When an object is inserted into a container, a locator is returned, which can later be used to access the object. A locator is typically implemented with a pointer or an index.

A data structure has an associated repertory¹ of operations, classified into (i) queries and (ii) updates. Queries retrieve information from the data structure (e.g. return the number of elements, or test the presence of a given item), whilst updates modify the structure (e.g. insertion, deletion or replacement). The performance of a data structure is characterized by its space requirement and the time complexity of the operations in its repertory. However, efficiency is not the only quality measure of a data structure: simplicity and ease of implementation should also be taken into account when choosing a data structure for solving a practical problem.

Major Issues in the Study of Data Structures

• Static vs. dynamic

– Static types support only queries, while dynamic types also support updates.
– A persistent data structure is a dynamic data structure that supports operations on past versions.

• Implicit vs. explicit

– Explicit data structures make use of pointers (memory addresses) to link the elements and access them. For example, in a singly linked list, each element has a pointer to the next one. Although they are flexible and have a much easier implementation, explicit data structures must use additional space to store the pointers.
– In implicit data structures, mathematical relationships between positions support the retrieval of elements.

• Internal vs. external

For large-scale problems, data structures need to be designed that take into account the two levels of memory: internal (main) memory and external (disk) storage. For example, two-level indices such as B-trees have been designed to efficiently search large tables kept in external memory.

¹ repertory: a place where something can be found; here, the set of operations supported by the data structure.
• Space vs. time

Data structures exhibit a trade-off between space and time complexity. For example, suppose we want to represent a set of integers in the range [0, N] (assume N is very large) such that we can efficiently query whether a given element is in the set, insert an element, or delete an element. Two possible data structures for this problem are:

1. An N-element bit array (where the bit in position i indicates the presence of integer i in the set).
2. A balanced search tree (such as a 2-3 tree or a red-black tree).

The bit array has optimal time complexity, since it supports queries and updates in constant time. However, it uses space proportional to the size N of the range, irrespective of the number of elements actually stored. In contrast, the balanced search tree supports queries, insertions and deletions in logarithmic time, but uses optimal space, proportional to the current number of elements stored.
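A minimal Java sketch of the first option (the class name BitSetOfInts is illustrative, not from the text): membership queries and updates take constant time, at the cost of O(N) space.

// Illustrative sketch: a fixed-range integer set backed by a boolean array.
// Queries and updates are O(1); space is O(N) regardless of how many
// elements are actually stored.
class BitSetOfInts {
    private final boolean[] present;

    BitSetOfInts(int maxValue) {
        present = new boolean[maxValue + 1]; // represents the range [0, N]
    }

    boolean contains(int x) { return present[x]; }
    void insert(int x)      { present[x] = true; }
    void delete(int x)      { present[x] = false; }
}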

• Theory vs. practice

Introduction to Big O
Big O is a mathematical notation used to describe the computational complexity of an algorithm. Computational complexity is split into two parts: (1) time complexity and (2) space complexity. The time complexity of an algorithm is the amount of time that the algorithm needs to run, relative to the input size. The space complexity of an algorithm is the amount of memory allocated or needed by the algorithm when run, relative to the input size.
Complexity is described by a function of variables that can change with the input. The most common variable is 'n', which usually describes the length of an input array or string. This function is wrapped by a capital 'O'. Here are some example complexities:

• O(n)
• O(n^2)
• O(2^n)
• O(log n)
• O(n · m)

These functions describe the limiting behavior of a function as the argument tends towards a particular value or infinity. That is, they describe how the amount of time (operations) or memory required grows as the input size tends to infinity. Because the variables are tending to infinity, constants are always ignored. This essentially means O(1000n) = O(100n) = O(10n) = O(n) = O(n/10). It also means that slower-growing additive terms in the same variable can be ignored. For example, O(2^n + n^2 − n) = O(2^n), because as 'n' tends to infinity, 2^n completely dominates the other two terms. Essentially, Big O characterizes functions according to their growth rates. The letter 'O' is used because the growth rate of a function is also referred to as the order of the function. A description of a function in terms of big O notation usually only provides an upper bound on its growth rate. In CS, big O is used to classify algorithms according to how their resource requirements grow as the input size grows. Being able to analyze an algorithm and derive its time and space complexity is a crucial skill; it also enables you to determine which parts of the algorithm can be improved.
The best complexity is O(1), called "constant time" or "constant space". It means the algorithm ALWAYS uses the same amount of resources, regardless of the input.
When talking about complexity, there are normally three cases:
• Best case
• Average case
• Worst case
Often all three of these are equal, but for some algorithms they differ. If you have to choose one to call the algorithm's complexity, it is most correct to use the worst case.
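As a small illustration (linear search is chosen here for illustration; it is not an example from the text), consider searching an array for a target value: the best case is O(1) when the target is the first element, the worst case is O(n) when it is absent, and we quote the algorithm's complexity as O(n).

// Linear search: best case O(1) (target at index 0),
// worst case O(n) (target absent), so we say it runs in O(n).
static int linearSearch(int[] arr, int target) {
    for (int i = 0; i < arr.length; i++) {
        if (arr[i] == target) return i; // early exit gives the best case
    }
    return -1; // scanned all n elements: the worst case
}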

Analyzing Time Complexity

Let’s look at some example algorithms in pseudo-code and talk about their complex-
ities.

Example 1:
// Given an array 'arr' of length 'n'.
for (int item: arr) {
    echo item
}

This algorithm has a time complexity of O(n). In each iteration, we perform an echo action, which costs O(1) time. Since the for loop runs n times, the complexity becomes O(1 · n) = O(n).

Example 2:
// Given an array 'arr' with length 'n'.
for (int item: arr) {
    for (int i=0; i < 500000; i++) {
        echo item
    }
}

Here, the statement inside the innermost loop, i.e. echo item, takes O(1); the inner loop runs 500000 times, and the outer loop runs n times. The time complexity becomes O(1 · 500000 · n) = O(500000n) = O(n).

Even though the first two algorithms technically have the same complexity, in real-
ity the second algorithm is much slower than the first one.
Example 3:

/* Given an array 'arr' with length 'n' */

for (int item: arr) {
    for (int item_2: arr) {
        echo item * item_2
    }
}

echo item * item_2 has O(1) complexity; the inner loop runs n times and the outer loop runs n times too. So the complexity becomes O(1 · n · n) = O(n^2).

Sometimes an algorithm does work that isn't included in the time complexity. For example, here we said that multiplication costs O(1), but multiplication takes longer on larger numbers. This is ignored because the input size n has nothing to do with the size of the elements of the array.

Example 4:

/* "arr" has length 'n'
 * "arr2" has length 'm'
 */

// Loop A
for (int num: arr)
    echo num

// Loop A
for (int num: arr)
    echo num

// Loop B
for (int num: arr2)
    echo num

The inner statement in each of loops A and B has O(1) time complexity. Loop A has a complexity of O(1 · n) and loop B has a time complexity of O(1 · m). Loop A runs twice. So the total complexity of the algorithm becomes O(n) + O(n) + O(m) = O(n + n + m) = O(2n + m) = O(n + m).

Example 5: Quadratic time from a triangular loop.

/* Given an array 'arr' with length 'n' */

for (int i=0; i < arr.length; i++) {
    for (int j=i; j < arr.length; j++) {
        print(arr[i] + arr[j])
    }
}
The body of the inner loop has O(1). The outer loop runs n times, and the inner loop runs n, then n − 1, then n − 2, ..., then 1 times, depending on which iteration the outer for loop is currently on. The partial sum of this series is given by

1 + 2 + 3 + ... + n = n(n + 1)/2 = (n^2 + n)/2

In terms of big O, (n^2 + n)/2 is O(n^2), because the lower-order addition term in the numerator and the constant factor in the denominator are both ignored.

Example 6: Logarithmic time

A time complexity of O(log n) is called logarithmic time and is extremely fast. A related common time complexity is O(n log n), which is fast for most problems and is also the time complexity of efficient sorting algorithms. Typically, the base of the logarithm is 2. This means that if your input has size n, the algorithm performs x operations, where 2^x = n. However, the base of the logarithm doesn't actually matter for big O.
O(log n) means that somewhere in your algorithm, the input is being reduced by a certain fraction at every step. A good example of this is binary search, which is a searching algorithm that runs in O(log n). With binary search, we initially consider the entire input, i.e. n elements. After the first step, we only consider half of the input, i.e. n/2; after the second step, we consider n/4, and so on. At each step, we are reducing our search space by 50%, which gives us a logarithmic time complexity.
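A quick way to see this halving behavior (a throwaway sketch, not from the text) is to count how many times n can be halved before reaching 1; the count is floor(log2 n).

// Counts the halving steps: for n = 1,000,000 this returns 19,
// since 2^19 < 1,000,000 < 2^20 — the work grows with log n, not n.
static int halvingSteps(int n) {
    int steps = 0;
    while (n > 1) {
        n /= 2;   // the search space shrinks by 50% per step
        steps++;
    }
    return steps;
}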

Analyzing Space Complexity

When you declare variables and modify the data stored by those variables, your algorithm is allocating memory. We never count the space used by the input (it is bad practice to modify the input), and we don't count the space used by the output.

Example 1:

/* Given an array 'arr' with length 'n' */

for (int num: arr)
    print(num)

This algorithm has a space complexity of O(1). The only space allocated is an integer variable num, which is constant relative to n.

Example 2:

/* Given an array "arr" with length 'n' */

Array doubledNums = int[]

for (int num: arr) {
    doubledNums.add(num * 2)
}

This algorithm has a space complexity of O(n), because the array doubledNums stores n integers by the end of the loop.

Example 3:

/* Given an array "arr" with length 'n' */

Array nums = int[]

int oneHundredth = n / 100

for (int i=0; i < oneHundredth; i++) {
    nums.add(arr[i])
}

This algorithm has a space complexity of O(n). The array nums stores the first 1% of the numbers in arr, which gives a space complexity of O(n/100) = O(n).
Example 4:

/* Given an array "arr" with length 'n'
 * Given another array "arr2" with length 'm'
 */

Array grid = int[n][m]

for (int i=0; i < arr.length; i++) {
    for (int j=0; j < arr2.length; j++) {
        grid[i][j] = arr[i] * arr2[j]
    }
}

This algorithm has a space complexity of O(n · m). We are creating a grid with dimensions n × m.
Decomposition (Divide and Conquer) Algorithms

/* Given an integer 'x', compute its 'n'-th power. */
int p(int x, int n) {
    return (n > 0) ? x * p(x, n-1) : 1;
}

This algorithm has a time complexity of O(n): the number of times x is multiplied by itself grows linearly with n.
One approach to the design of algorithms is to decompose a problem into subproblems that resemble the original problem, but on a reduced scale. Suppose we want to compute x^n; the above algorithm has a linear growth rate. However, x^n can be computed from x^(⌊n/2⌋), by reasoning that:

x^n = 1                     if n = 0
x^n = (x^(n/2))^2           if n is even        (1.1)
x^n = x · (x^(⌊n/2⌋))^2     if n is odd

This divide-and-conquer decomposition reduces the growth rate of the time required by the algorithm to T(n) = O(log n). We can implement this decomposition recurrence as follows:

int p(int x, int n) {
    if (n == 0)
        return 1;
    int result = p(x, n/2);    /* x^(n/2), with integer division */
    if (isEven(n))             /* isEven(n) tests n % 2 == 0 */
        return result * result;
    else
        return x * result * result;
}

As n becomes very large, the T(n) improvement achieved by this divide-and-conquer implementation becomes significant: O(log n) multiplications instead of n.
Binary Search Algorithm

The binary search algorithm is an extremely well-known instance of the divide-and-conquer paradigm. The binary search algorithm "probes" the middle element of an ordered array of n elements for a given element, continuing in either the left or the right segment of the array, depending on the outcome of the probe:

/* + An initial call to this would normally look like:
 *   binarySearch(x, arr, 0, getSize(arr)-1)
 * + 'start' is the starting point and 'end' is the
 *   ending point of the search segment.
 */

int binarySearch(int x, int arr[], int start, int end) {
    if (start > end) {
        // x does not exist in arr.
        return -1;
    }
    else {
        int mid = start + (end - start)/2; // avoids overflow of (start + end)
        if (arr[mid] < x)
            return binarySearch(x, arr, mid+1, end);
        else if (arr[mid] > x)
            return binarySearch(x, arr, start, mid-1);
        else
            return mid;
    }
}
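As a hypothetical usage example (the array values here are made up, not from the text), on a sorted array the search needs only a few probes:

// Hypothetical usage: the array must already be sorted.
int[] arr = {2, 5, 8, 12, 16, 23, 38};
binarySearch(16, arr, 0, arr.length - 1); // probes 12, 23, then 16: returns 4
binarySearch(7,  arr, 0, arr.length - 1); // returns -1: 7 is not in arr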

Abstract Data Types (ADT)

1. Sequence

• Linear order.
• Operations supported: retrieving, inserting and removing an element, given its position.
• E.g. arrays, linked lists; stacks are a special type of sequence where insertions and deletions can only be done at one end (the head or the tail) of the sequence.
• A basic form of data structure that is used to realize and implement other, more complex data types and data structures.

2. Priority queue

• Key applications in sorting and scheduling algorithms.
• Realized with: Sequence, Heap, Dictionary.

3. Dictionary

• Basic example: hash tables.
• Advanced examples: skip lists, tries, and balanced search trees such as AVL-trees, red-black trees, 2-3 trees, 2-3-4 trees, weight-balanced trees, biased search trees and splay trees.
• Applications: when you search for a combination of elements in the data structure that together make up something, for instance anagrams, two sum, three sum; a hash-based sketch of two sum appears after this list. (In external memory, dictionaries are often implemented as B-trees and their variations.)

4. Union-Find
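As a minimal sketch of the dictionary-based approach to such problems (the technique is standard; the code itself is my illustration, not from the text), two sum can be solved in expected O(n) time with a hash table that maps each value seen so far to its index:

import java.util.HashMap;

// Two sum via a dictionary: for each element, look up the complement
// that would complete the target. Expected O(n) time, O(n) space.
static int[] twoSum(int[] nums, int target) {
    HashMap<Integer, Integer> seen = new HashMap<>(); // value -> index
    for (int i = 0; i < nums.length; i++) {
        Integer j = seen.get(target - nums[i]);
        if (j != null)
            return new int[] { j, i };                // pair found
        seen.put(nums[i], i);
    }
    return null; // no pair of elements sums to target
}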

Examples of fundamental data structures used in three major computer application domains are mentioned below:

• Graphs and networks: adjacency matrix, adjacency lists, link-cut tree, dynamic expression tree, topology tree, sparsification tree.

• Geometry and graphics: binary space partition tree, chain tree.

• Text processing: string, suffix tree, Patricia tree.

Sequence (S)
A sequence is a container that stores elements in a certain order, which is imposed by the operations performed. Using locators, we can define a full repertory of operations for a sequence S:

• SIZE(N)
• HEAD(c)
• TAIL(c)
• LOCATE_RANK(r, c)
• PREV(c', c)
• NEXT(c, c')
• INSERT_BEFORE(e, c', c)
• INSERT_AFTER(e, c, c')
• INSERT_HEAD(e, c)
• INSERT_TAIL(e, c)
• INSERT_RANK(e, r, c)
• REMOVE_HEAD(e, c)
• REMOVE_TAIL(e, c)
• REMOVE_RANK(e, r, c)
• MODIFY_HEAD(e, c)
• MODIFY_TAIL(e, c)
• MODIFY_RANK(e, r, c)

Additional operations, such as MAX and MIN, can be realized with the iteration operator NEXT.
Priority Queue (Q)
It offers the two most basic operations:

1. INSERT
2. REMOVE_MAX

Note that the operation REMOVE_MAX is equivalent to a MAX followed by a REMOVE.

Realization of Q with an unsorted S

In a Q, insertions are done only at the head or the tail. So, a Q INSERT operation can be realized with an S INSERT_HEAD or INSERT_TAIL operation. However, this leaves the S unsorted. The operation MAX can be performed by scanning the S with the NEXT operation, keeping track of the maximum element encountered, until all SIZE(N) elements have been scanned. Finally, MAX followed by REMOVE realizes the REMOVE_MAX operation.
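A minimal sketch of this realization (plain Java with an ArrayList standing in for the sequence; the class and method names are illustrative): INSERT takes O(1), while REMOVE_MAX must scan all elements in O(n).

import java.util.ArrayList;

// Priority queue realized with an unsorted sequence:
// insert appends in O(1); removeMax scans for the maximum in O(n).
class UnsortedSeqPQ {
    private final ArrayList<Integer> seq = new ArrayList<>();

    void insert(int e) { seq.add(e); }       // INSERT_TAIL

    int removeMax() {                        // MAX followed by REMOVE
        int best = 0;
        for (int i = 1; i < seq.size(); i++) {
            if (seq.get(i) > seq.get(best)) best = i;
        }
        return seq.remove(best);
    }
}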

Realization of Q with a sorted S

An alternative approach is to use a sorted S. This, however, breaks the rule that in a Q insertions can only be done at either the tail or the head. With a sorted S, the operation MAX corresponds to the operation TAIL. However, the INSERT operation corresponds to INSERT_RANK, as it becomes necessary to scan the S to find the appropriate rank at which a new element (with a certain priority) can be inserted.
Realization of Q with a Heap

• Rooted binary tree

A rooted binary tree has a root node (the node at the top of the tree), and every node has at most two children, which are referred to as the left child and the right child of the node.

Figure 1.1: (i) {D, E} and {F, G} are cousins because B and C are siblings. (ii) Since
B is the parent of D, B is the grandparent of H and I. A leaf node is a node that does
not contain any child nodes.

• Partial order property

In a max heap, for any given internal node² C, if P is the parent node of C, then the key of P must be greater than or equal to the key of C. Similarly, in a min heap, for any given internal node C, if P is the parent node of C, then the key of P must be less than or equal to the key of C.

• Level property

All the levels of T are full, except possibly for the bottommost level, which is left-filled. An example can be seen in the above figure, where E has only a single left child J.
Realizing a Q with an S (sorted or unsorted) has the drawback that some operations require linear time O(n) in the worst case. A more sophisticated realization of Q uses a data structure called a heap. A heap³ is a specialized tree-based data structure: a rooted binary tree that satisfies the partial order property may be referred to as a heap.
In a heap, the key of the root is always the element with either the highest or the lowest priority. Nevertheless, the heap is not a sorted structure; it can be regarded as "partially ordered". In a heap, there is no implied ordering between siblings or cousins.
² An internal node is any node in the heap distinct from the root node.
³ A heap in data structures is not the same as the memory heap. See dynamic memory allocation in C for information on memory heaps.
REPERTORY:
• FIND_MAX or FIND_MIN
• INSERT
• REMOVE_MAX or REMOVE_MIN
• REPLACE: pop the root and push a new key.
• EXTRACT_MAX or EXTRACT_MIN: returns the node with the maximum value from a max heap (or the node with the minimum value from a min heap) after removing it from the heap.

HEAP IMPLEMENTATION:
Heaps are usually implemented with an array as follows:
• Each element in the array represents a node of the heap, and
• The parent/child relationship is defined implicitly by the elements’ indices in
the array.
• How the tree is represented:

– The first element of the array, at index 0, is the root node at level 0.
– The next two indices, 1 and 2, represent the root's children at level 1.
– The next 4 indices represent the nodes of level 2.
– The next 8 indices represent the nodes of level 3, and so on.
• A pattern can be seen between the indices of the parents and the children: given a node at index i, its left child is at index 2i + 1 and its right child is at index 2i + 2.
• Given a child node at index i, its parent is found at index floor((i − 1)/2).
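These index relationships can be captured in a few helper functions (a small Java sketch; the names iParent, iLeftChild and iRightChild match the pseudocode used in the construction steps below):

// Implicit tree navigation for an array-backed heap.
static int iParent(int i)     { return (i - 1) / 2; } // floor((i - 1)/2)
static int iLeftChild(int i)  { return 2 * i + 1; }
static int iRightChild(int i) { return 2 * i + 2; }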

HEAP CONSTRUCTION (HeapSort):

• Divides the input into a sorted and an unsorted region, and iteratively shrinks the unsorted region by extracting the largest element from it.
• An in-place sorting algorithm.
• The construction of a heap can be thought of as a step in heap sort, which ensures that the maximum element is always at the root of the heap.

The heapsort (bottom-up approach):

Step 0: Start with a heap that does not follow the partial order property (recall that the level property is implicit in the array representation).
Step 1: Start from the parent of the last node in the heap (i.e. the last non-leaf node). In a heap with 'n' nodes, this index is given by i = iParent(iLastNode), i.e. floor((n − 2)/2).
Step 2: Perform siftDown (not "shift down": to "sift" is to go through something, especially to sort out what is useful or valuable). In the siftDown operation, we repair the sub-heap whose root is at index 'r' (initially r = i), while this suppositional root has at least one child:

/* Assume that the current node, at index r, is in its proper position. */

toSwap = r;

while exists(iLeftChild(r)) {
    /* If the left child is greater. */
    if (A[toSwap] < A[iLeftChild(r)])
        toSwap = iLeftChild(r);

    /* If there is a right child too and that child is greater still. */
    if (exists(iRightChild(r)) && A[toSwap] < A[iRightChild(r)])
        toSwap = iRightChild(r);

    /* If "toSwap" and "r" are equal, nothing is to be swapped. */
    if (toSwap == r)
        return;
    else
        swap(A[r], A[toSwap]);

    /* Whatever was swapped with the root becomes the new root. */
    r = toSwap;
}

Step 3: (Remember that in Step 1 we started with the last parent node, at index i.) Move to the next parent node, i.e. decrement i, and repeat Step 2 while i ≥ 0, so that the root at index 0 is also sifted down.
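Putting the three steps together, a bottom-up construction of a max heap might look like the following sketch (plain Java over an int array, using the index helpers sketched earlier; buildMaxHeap and siftDown are illustrative names):

// Bottom-up max-heap construction: sift down every parent node,
// starting from the last non-leaf node and ending at the root.
static void buildMaxHeap(int[] A) {
    int n = A.length;
    for (int i = iParent(n - 1); i >= 0; i--) { // Steps 1 and 3
        siftDown(A, i, n);                      // Step 2
    }
}

static void siftDown(int[] A, int r, int n) {
    while (iLeftChild(r) < n) {                 // r has at least one child
        int toSwap = r;
        if (A[iLeftChild(r)] > A[toSwap])
            toSwap = iLeftChild(r);
        if (iRightChild(r) < n && A[iRightChild(r)] > A[toSwap])
            toSwap = iRightChild(r);
        if (toSwap == r)
            return;                             // partial order restored
        int tmp = A[r]; A[r] = A[toSwap]; A[toSwap] = tmp;
        r = toSwap;                             // continue from the swapped child
    }
}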

Graphs
Low Level Design For Spreadsheet Software Using Graphs

Graphs are suitable for writing spreadsheet software, where each cell is connected to the others based on the relations they have (each cell is like a node of a graph). The reasons for doing this are as follows:

• One of the biggest points we have to consider in an application like Excel is how we optimize the way we update the dependent cells in a spreadsheet file when a cell changes. This can be done very efficiently by using graphs through DFS (Depth First Search) or BFS (Breadth First Search), as we would have access to the cells that depend on the current cell.

• As a spreadsheet file has many cells, it might be inefficient to instantiate all the cells without them doing anything. Instead, as we are treating them as nodes of a graph, we can instantiate a cell only when it is assigned something.

Expanding on the above idea, let's say a cell has the following formula:
cell_X = sum(cell_A, cell_B, cell_C) + product(cell_D, cell_E, cell_F).
We can see that cell_X depends on other cells, and uses a different function for each group of cells. We can also expand and nest different functions with their own cell dependencies. We can efficiently update cell_X when any of the dependencies changes, as we are using a graph (we just have to visit all the dependent cells to update their values). To calculate the value itself, we can use a recursive function to deal with multiple nested functions and cells.
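As a minimal sketch of this design (all names here are invented for illustration, not from the text), cells can be kept in a map of dependents and the cells to recompute found with a BFS from the changed cell:

import java.util.*;

// Toy dependency graph for a spreadsheet: when a cell changes,
// a BFS visits every cell that (transitively) depends on it.
class SpreadsheetGraph {
    // cell -> cells that depend on it (edges point from a cell to its dependents)
    private final Map<String, List<String>> dependents = new HashMap<>();

    void addDependency(String from, String to) {
        // 'to' depends on 'from'; cells are instantiated lazily, only when used
        dependents.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    List<String> cellsToRecompute(String changed) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        visited.add(changed);
        while (!queue.isEmpty()) {
            String cell = queue.poll();
            for (String dep : dependents.getOrDefault(cell, List.of())) {
                if (visited.add(dep)) { order.add(dep); queue.add(dep); }
            }
        }
        return order; // dependents of 'changed', in BFS order
    }
}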

Shank’s Baby Step Giant Step


Let g ∈ G be an element of order N ≥ 2. Shank’s √ baby step giant step algorithm

solves the discrete logarithm problem go = h in O( N.log(N)) steps using O( N)
storage.
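A compact sketch for the multiplicative group of integers modulo a prime p (my illustration, not from the text; assumes Java 9+ for BigInteger.sqrt()): baby steps tabulate g^j in a hash table, and giant steps multiply h by g^(-m) until a table hit yields x = i·m + j.

import java.math.BigInteger;
import java.util.HashMap;

// Baby-step giant-step: finds x with g^x = h (mod p), if it exists.
static BigInteger babyStepGiantStep(BigInteger g, BigInteger h, BigInteger p) {
    BigInteger m = p.sqrt().add(BigInteger.ONE);      // m >= ceil(sqrt(N)), N <= p - 1
    HashMap<BigInteger, BigInteger> baby = new HashMap<>();
    BigInteger cur = BigInteger.ONE;
    for (BigInteger j = BigInteger.ZERO; j.compareTo(m) < 0; j = j.add(BigInteger.ONE)) {
        baby.put(cur, j);                             // cur = g^j mod p
        cur = cur.multiply(g).mod(p);
    }
    BigInteger factor = g.modPow(m, p).modInverse(p); // g^(-m) mod p
    BigInteger gamma = h.mod(p);
    for (BigInteger i = BigInteger.ZERO; i.compareTo(m) < 0; i = i.add(BigInteger.ONE)) {
        BigInteger j = baby.get(gamma);
        if (j != null) return i.multiply(m).add(j);   // x = i*m + j
        gamma = gamma.multiply(factor).mod(p);        // next giant step
    }
    return null;                                      // no discrete log exists
}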
