Algos Midterm2
List: finite number of ordered values that allows repetition (array or linked list)
Why do we care?
● Widely applicable to all types of problems
● No limitation on instance size
● Expense of designing more efficient algorithm unjustifiable if only a few instances of
a problem need to be solved (simple)
● Useful for small-size instances
● Educational yardstick for judging more efficient alternatives
Why don’t we care?
● Rarely yields efficient algorithms, often unacceptably slow
● Not as constructive as some other design techniques
Selection sort
● Sorting a list of numbers means the smallest element is in position 1, the next
smallest in position 2, etc.
● L[i] <= L[i+1]
● So selection sort scans an array for the smallest element and swaps it with the first
element, then looks for the second-smallest and swaps it with the second, etc.
● If our array is of size n, our indices will be from [0..n-1] and the last element we need
to swap is [n-2] because the final element will automatically be in place if the rest are
Selection sort is Θ(n²) on all inputs, but the number of swaps is only n-1, i.e. Θ(n), in the average
and worst case (or Θ(1) in an already-sorted array, if identical-position swaps are skipped)
Input size is n (list length), basic operation is comparison, no separate best/avg/worst cases;
note that this only holds true assuming swap is a constant time operation (not for linked lists)
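A minimal Python sketch of this idea (the function and variable names are my own, not from the notes):

def selection_sort(a):
    """Sort list a in place by repeatedly selecting the minimum of the unsorted part."""
    n = len(a)
    for i in range(n - 1):                    # the last element falls into place automatically
        min_idx = i
        for j in range(i + 1, n):             # scan a[i..n-1] for the smallest element
            if a[j] < a[min_idx]:
                min_idx = j
        a[i], a[min_idx] = a[min_idx], a[i]   # one swap per pass
    return a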
Sidenote: language API naming conventions might tell you whether an operation is mutating or
non-mutating; e.g. for verb operations, the imperative form is mutating and the ‘-ed’/‘-ing’ form is
non-mutating (sort vs. sorted); for noun operations, a ‘form-’ prefix is mutating and the bare noun is
non-mutating (formUnion vs. union).
Bubble sort
● Compare adjacent elements and exchange if they are out-of-order
● “Bubble up” the largest element to the last position on the list in each pass
The number of swaps, however, depends on the input; almost sorted data will require a lot
fewer swaps.
Can optimise: if a pass through the list results in no swaps, we know our list is sorted so we
can stop; faster on some inputs but still Θ(n²) in the average and worst cases.
Also note that bubble sort is stable—only swap adjacent elements so adjacent equal
elements never get swapped. Non-adjacent equal elements eventually become adjacent and
don’t get swapped.
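A Python sketch of bubble sort with the early-exit optimisation (names are my own):

def bubble_sort(a):
    """Sort list a in place; stop early if a full pass makes no swaps."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):              # the last i elements are already in place
            if a[j] > a[j + 1]:                 # exchange out-of-order neighbours
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                         # no swaps => the list is already sorted
            break
    return a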
Sequential search: compare successive elements of a list with a given search key until a
match is encountered (success) or the list is exhausted without a match (failure); we can append the
search key to the end of the list as a sentinel so that the scan always finds a match and we don’t
need an end-of-list check (an unsuccessful search is then the one that stops at the sentinel).
If we know a list is sorted, we can stop our search as soon as we find an element greater
than or equal to the search key.
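A small Python sketch of sequential search with the appended-key sentinel (names are my own; this is one way to realise the trick described above):

def sequential_search(a, key):
    """Return the index of key in list a, or -1 if it is absent."""
    b = a + [key]                    # sentinel: the scan is guaranteed to find a match
    i = 0
    while b[i] != key:               # no end-of-list check needed
        i += 1
    return i if i < len(a) else -1   # stopping at the sentinel means an unsuccessful search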
Brute force string matching: given a string of n characters called the text, and a string of m
characters (m <= n) called the pattern, find a substring of the text that matches the pattern
● Align the pattern against the first m characters of the text and start matching the
corresponding pairs of characters from left to right until either
○ all the m pairs of the characters match (then the algorithm can stop) or
○ a mismatching pair is encountered, in which case, shift the pattern one
position to the right and resume the character comparisons, starting again
with the first character of the pattern and its counterpart in the text
● Last possible position to analyse is n - m
Worst case: make all m comparisons before shifting the pattern for each of the n - m + 1 tries
● Θ(nm) class because it makes m(n-m+1) comparisons in the worst case
● Typical case is Θ(n) because most shifts happen after a few comparisons
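A straightforward Python sketch of the brute-force matcher (names are my own):

def brute_force_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):          # last possible alignment starts at n - m
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            return i
    return -1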
Small modifications can make a big difference in brute force algorithms. For example:
The efficiency of the first algorithm is Θ(n²) but a small modification (calculating from right to
left) gives it an efficiency of Θ(n)
Closest pair problem: finding the closest points in a set of n points (in a 2-D Cartesian
plane)
● Compute the distance between every pair of distinct points
● Return the indexes of the points for which the distances are the smallest
Efficiency is Θ(n²); however, the square root is the most computationally expensive operation
because it computes an irrational number and is often approximated, so since
sqrt(x1) < sqrt(x2) iff x1 < x2 (for non-negative values), we can just compare squared distances
instead of taking square roots. Same Big-Oh complexity but a constant-factor improvement.
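A brute-force closest-pair sketch in Python using squared distances (names are my own; assumes points are (x, y) tuples):

def closest_pair(points):
    """Return the indexes (i, j) of the closest pair of distinct points."""
    best, best_pair = float('inf'), None
    n = len(points)
    for i in range(n - 1):
        for j in range(i + 1, n):
            (x1, y1), (x2, y2) = points[i], points[j]
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2   # squared distance: no sqrt needed to compare
            if d2 < best:
                best, best_pair = d2, (i, j)
    return best_pair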
Exhaustive search: generating each and every element of a problem domain, selecting the
ones that satisfy constraints, and finding a desired element; often related to combinatorial
objects (domain size grows faster with instance sizes)
Travelling salesman problem (TSP): find the shortest tour through a given set of n cities
that visits each one exactly once before returning to the start (e.g. find Hamiltonian circuit)
● Modelled with a weighted graph
● In other words, n+1 vertices from v0, v1,…vn-1, v0 where the first and last are the same
and the remaining n-1 are distinct
● Generate all permutations of n-1 intermediate cities, compute tour lengths, and find
shortest among them—(n-1)!
● Since some tours only differ by direction, we only have to consider half of the
paths—0.5*(n-1)!
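A brute-force TSP sketch in Python (names are my own; assumes dist is an n×n matrix of pairwise distances and city 0 is the fixed start, so only the (n-1)! permutations of intermediate cities are tried):

from itertools import permutations

def tsp_brute_force(dist):
    """Return (length, tour) of the shortest tour starting and ending at city 0."""
    n = len(dist)
    best_len, best_tour = float('inf'), None
    for perm in permutations(range(1, n)):       # permute the n-1 intermediate cities
        tour = (0,) + perm + (0,)                # n+1 vertices, first and last the same
        length = sum(dist[tour[k]][tour[k + 1]] for k in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour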
Knapsack problem: given n items of known weights and values, and a knapsack of
capacity W, find the most valuable subset of items that fit into the knapsack.
● Generate all subsets of n items, compute the total weight of each subset, discard the
non-feasible ones, and find the subset with the largest value among what’s left
● Generating the subsets takes 2ⁿ steps, so the algorithm is Ω(2ⁿ)
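An exhaustive-search knapsack sketch in Python (names are my own):

from itertools import combinations

def knapsack_brute_force(weights, values, capacity):
    """Try every subset of the n items and keep the most valuable feasible one."""
    n = len(weights)
    best_value, best_subset = 0, ()
    for r in range(n + 1):
        for subset in combinations(range(n), r):   # all 2^n subsets
            w = sum(weights[i] for i in subset)
            if w <= capacity:                      # discard infeasible subsets
                v = sum(values[i] for i in subset)
                if v > best_value:
                    best_value, best_subset = v, subset
    return best_value, best_subset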
Both the TSP and the knapsack problem are NP-hard, and no polynomial-time algorithm is known
for them
Assignment problem: n people need to be assigned to n jobs, one per job. Cost of ith
person assigned to jth job is C[i, j]. Problem is to find the assignment with the minimum total
cost.
● Use a cost matrix and select one element in each row such that all selected elements
are in different columns and the total sum is the smallest possible.
● Express feasible solutions as n-tuples indicating the column of the element selected
from the ith row, which represents the ith person.
● Generate all permutations of 1..n, compute the total cost by summing, and select the
smallest sum.
● Number of permutations is n! so algorithm is only feasible for small instances
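A brute-force assignment sketch in Python (names are my own; cost[i][j] is the cost of giving job j to person i):

from itertools import permutations

def assignment_brute_force(cost):
    """Try all n! assignments and return (total_cost, assignment) with the minimum cost."""
    n = len(cost)
    best_cost, best_assignment = float('inf'), None
    for perm in permutations(range(n)):              # perm[i] = job assigned to person i
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_assignment = total, perm
    return best_cost, best_assignment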
More often than not, there are no known polynomial-time algorithms for problems whose
domain grows exponentially with instance size, provided we want to solve them exactly.
Depth-first forest (for the brave): unvisited vertices are attached as children from their
starting point; edges leading to previously visited vertices other than the predecessor are
back edges while the remaining edges are tree edges. Uses a stack.
Efficiency of DFS
● For adjacency matrix Θ(|V|²)
● For adjacency list Θ(|V|+|E|)
DFS provides two orderings: order in which vertices are reached for the first time (pushed
onto the stack) and the order in which they become dead ends (popped off the stack)
Can be used to check connectivity: since the traversal stops after visiting every vertex reachable
from the start, if it stops before visiting all of the graph’s vertices, you know the graph is not connected.
Can also be used to check acyclicity: if there are no back edges, the graph is acyclic.
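A compact Python sketch of DFS with an explicit stack (names are my own; the graph is an adjacency-list dict, and the function returns the first ordering, i.e. the order in which vertices are pushed/first reached):

def dfs(adj):
    """adj maps each vertex to a list of neighbours; returns vertices in first-visit order."""
    visited, order = set(), []
    for start in adj:                            # restart so every connected component is covered
        if start in visited:
            continue
        stack = [start]
        while stack:
            v = stack.pop()
            if v in visited:
                continue
            visited.add(v)
            order.append(v)
            for w in reversed(adj.get(v, [])):   # reversed so neighbours come off in listed order
                if w not in visited:
                    stack.append(w)
    return order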
Breadth-first search (BFS) (for the cautious): proceeds in a concentric manner by visiting
first all the vertices adjacent to a starting vertex, then all unvisited vertices two edges apart,
and so on, until all vertices in the same connected component have been visited.
● If there remain unvisited unconnected vertices, the algorithm restarts at an arbitrary
point on one of these.
● Use a queue: initialise it with the starting vertex, and on each iteration, add all unvisited
vertices adjacent to the front vertex to the queue; after that, the front vertex is removed.
Breadth-first search forest: the starting node is the root and all adjacent vertices are attached
as children via tree edges. If an edge connects to a previously visited vertex other than its
immediate predecessor, it is a cross edge.
Efficiency of BFS
● For adjacency matrix Θ(|V|²)
● For adjacency list Θ(|V|+|E|)
Only single ordering of vertices because the queue is FIFO. BFS can also be used to check
connectivity and acyclicity.
● Also finds path with fewest number of edges between two given vertices (start BFS
traversal at one of the two vertices, stop as soon as the other vertex is reached)
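A queue-based BFS sketch in Python for a single starting vertex (names are my own; restarting from an unvisited vertex would handle further components):

from collections import deque

def bfs(adj, start):
    """adj maps each vertex to a list of neighbours; returns vertices in visit order."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        v = queue.popleft()              # remove the front vertex
        order.append(v)
        for w in adj.get(v, []):         # enqueue all unvisited adjacent vertices
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order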
Decrease and conquer (aka inductive or incremental approach): reduce the problem to a smaller
instance of the same problem, solve the smaller instance, then extend the solution of the smaller
instance to obtain the solution to the original instance
Three types:
● Decrease by constant (usually 1): insertion sort, permutations, etc.
● Decrease by constant factor (usually half): binary search, exponentiation by squaring,
Russian multiplication
● Variable-size decrease: Euclid’s, selection by partition, N
Insertion sort: assuming you have an array A[0..n-1], you decrease by one and assume you have
already sorted A[0..n-2], then insert A[n-1] into its proper position within that sorted part of the
array.
Input size is list length, n. Basic operation is key comparison A[j]>v. Number of comparisons
depends on input. In worst case, A[j]>v is executed for everything on the left side of the
array, in other words, if the array is strictly decreasing
The best case happens if the comparison is only executed once per outer loop, which is true
for strictly nondecreasing arrays.
In the average case, insertion sort makes about half as many comparisons as on decreasing arrays
Overall, insertion sort is in-place, stable, and the best elementary sorting algorithm overall.
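A minimal insertion sort sketch in Python (names are my own):

def insertion_sort(a):
    """Sort list a in place by inserting a[i] into the already-sorted prefix a[0..i-1]."""
    for i in range(1, len(a)):
        v = a[i]
        j = i - 1
        while j >= 0 and a[j] > v:      # the key comparison A[j] > v
            a[j + 1] = a[j]             # shift the larger element one place right
            j -= 1
        a[j + 1] = v
    return a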
Sentinel value: instead of having j>=0 as a loop condition, we simply put a very small value at the
front of the array, which every element is greater than (but that messes up our indexes a little
because A[0] is now a value we inserted, not part of the original array)
● Can simply do one O(n) pass, find the smallest element, and put it in the 1st place
position so we don’t have to deal with coming up with a sentinel or allocating extra
memory
● Same time complexity because we’re still doing O(n²) comparisons for A[j]>v
Using linked lists, we would simply splice the element A[i] into its place by finding the
appropriate location (unlike arrays, where we’d have to swap elements each time).
● This has the same time complexity for comparisons but lower for swaps
Exponentiation by squaring
The input size is the number of bits in n, the basic operation is multiplication (squaring), and the
algorithm performs roughly log₂(n) multiplications/squarings.
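A recursive exponentiation-by-squaring sketch in Python (names are my own; assumes n is a non-negative integer):

def power(base, n):
    """Compute base**n with roughly log2(n) squarings/multiplications."""
    if n == 0:
        return 1
    half = power(base, n // 2)       # decrease the exponent by a factor of 2
    if n % 2 == 0:
        return half * half           # base^n = (base^(n/2))^2 for even n
    return half * half * base        # odd n needs one extra multiplication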
Binary search (decrease-by-constant-factor): recursively/iteratively compare search key
with array’s middle element until a match is found.
To analyse efficiency, we count the number of times the search key is compared with an
array element. Depends on n and instances of the problem.
Overall it is very fast: C_worst(10⁶) = 20, BUT the array must be sorted to use it
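An iterative binary search sketch in Python (names are my own; the list must be sorted):

def binary_search(a, key):
    """Return an index of key in sorted list a, or -1 if it is absent."""
    l, r = 0, len(a) - 1
    while l <= r:
        m = (l + r) // 2
        if a[m] == key:
            return m
        elif a[m] < key:
            l = m + 1        # key can only be in the right half
        else:
            r = m - 1        # key can only be in the left half
    return -1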
Fake coin problem: among n identical looking coins, one is fake. With a balance scale, we
can compare any two sets of coins (i.e. balance will tip left, right, or even so we can tell if the
sets weigh the same or one is heavier than the other). How can we efficiently detect a fake
coin, assuming the fake is lighter than the genuine ones?
● Divide coins into two piles, ⎣n/2⎦ each (leave a coin out if n is odd) and weigh the two
piles. If the piles differ, proceed with the lighter pile and keep dividing the piles until
you narrow down the fake coin.
Recurrence for this problem is almost identical to binary search except for the initial
condition. Can make it even more efficient if you divide into 3 piles.
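A two-pile sketch of the fake-coin search in Python (names are my own; assumes we are given the coin weights and the fake is strictly lighter):

def find_fake(coins):
    """Return the index of the lighter fake coin in the list of weights."""
    lo, hi = 0, len(coins) - 1
    while lo < hi:
        size = (hi - lo + 1) // 2                     # two piles of floor(n/2) coins each
        left = sum(coins[lo:lo + size])
        right = sum(coins[lo + size:lo + 2 * size])
        if left < right:
            hi = lo + size - 1                        # fake is in the lighter left pile
        elif right < left:
            lo, hi = lo + size, lo + 2 * size - 1     # fake is in the lighter right pile
        else:
            lo = hi = lo + 2 * size                   # piles balance: the left-out coin is fake
    return lo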
Russian peasant multiplication: assume you want to multiply n and m. Depending on whether n is
even or odd, we can break the product down as follows:
Even: n · m = (n/2) · (2m)    Odd: n · m = ((n-1)/2) · (2m) + m
Stop at the trivial case 1·m = m. Using these formulas recursively or iteratively, we can compute
the product very fast (good for hardware implementation because doubling and halving binary
numbers is easy with shifts at the machine level).
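A small Python sketch of the method (names are my own; assumes n >= 1):

def russian_multiply(n, m):
    """Multiply n and m using only halving, doubling, and addition."""
    result = 0
    while n > 1:
        if n % 2 == 1:        # odd case: set the extra m aside
            result += m
        n //= 2               # halve n ...
        m *= 2                # ... and double m
    return result + m         # trivial case: 1 * m = m

For example, russian_multiply(50, 65) returns 3250.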
Josephus problem: let n people, numbered 1..n, stand in a circle. Eliminate every second
person (starting at 1) until one survivor is left.
If n is even, each pass reduces the instance of the problem by half, although the positions of the
survivors change as people are eliminated. The initial position J(n) of the final survivor satisfies
J(1) = 1, J(2k) = 2J(k) - 1, and J(2k + 1) = 2J(k) + 1.
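A brute-force simulation in Python (names are my own); for small n it agrees with the recurrence above:

def josephus(n):
    """1-based position of the survivor when every second person is eliminated."""
    people = list(range(1, n + 1))
    i = 0
    while len(people) > 1:
        i = (i + 1) % len(people)    # skip one person ...
        people.pop(i)                # ... and eliminate the next
    return people[0]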
Lomuto partitioning is one way to partition an array around a pivot (it is the partition quickselect
uses). This approach divides the array into three segments:
● Elements known to be smaller than the pivot
● Elements known to be greater than the pivot
● Elements that have yet to be compared to the pivot
● l is the index of the leftmost element of the (sub)array, and A[l] is used as the pivot
● The segment between l and s marks those elements that are smaller than the pivot (s
marking the end of the smaller-than segment).
● The segment between s and i represents the elements greater than the pivot (i being
the index at the end of the greater-than segment)
● The segment between i and r is the portion of the array that hasn’t been analysed
yet, with r being the rightmost element in the array
Starting with the element after l (after the pivot), we scan the array using the index i,
comparing each element to the pivot. If the element at Array[i] is greater than the pivot, we
just increment i to include it in the greater-than segment. If the element at Array[i] is less
than the pivot, we increase s (to increase the size of the smaller-than segment). At this point
s will have encroached onto the greater-than section, and so A[s] will be greater than the
pivot. Swap A[i] with A[s], moving the greater-than element in A[s] to the greater-than section
and bringing the less-than element A[i] to the less-than segment. Once all of the elements
have been processed, we swap the pivot with A[s] to move it into its rightful place.
A[s] will now hold the pivot in its final sorted position, i.e. the (s + 1)-th smallest element in the
array. Keep calling the Lomuto partition until s is the position of the element you’re looking for.
● If we’re looking for element k, then keep going until s = k - 1 (subtracting one to
account for array indexing starting from 0)
● If s ≠ k - 1, call Lomuto on the left subarray if s > k - 1, or on the right subarray if
s < k - 1
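A Python sketch of Lomuto partitioning and quickselect built on it (names are my own; k here is 0-based, so there is no k - 1 adjustment):

def lomuto_partition(a, l, r):
    """Partition a[l..r] around the pivot a[l]; return the pivot's final index."""
    pivot = a[l]
    s = l                                   # a[l+1..s] holds elements smaller than the pivot
    for i in range(l + 1, r + 1):           # a[s+1..i-1] holds elements >= the pivot
        if a[i] < pivot:
            s += 1
            a[s], a[i] = a[i], a[s]         # grow the smaller-than segment
    a[l], a[s] = a[s], a[l]                 # move the pivot into its final position
    return s

def quickselect(a, k):
    """Return the k-th smallest element of list a (0 <= k < len(a)); reorders a."""
    l, r = 0, len(a) - 1
    while True:
        s = lomuto_partition(a, l, r)
        if s == k:
            return a[s]
        elif s > k:
            r = s - 1                       # the answer lies in the left subarray
        else:
            l = s + 1                       # the answer lies in the right subarray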
Efficiency of quickselect?
● The partition requires n-1 comparisons
● If we only need to do one partition, C_best(n) = n - 1 ∈ Θ(n).
● If the partition is unbalanced (e.g. one side of the partition is empty and the other
contains n - 1 elements, aka a degenerate split), we have
C_worst(n) = (n - 1) + (n - 2) + … + 1 ∈ Θ(n²)
Interpolation search: unlike binary search, which always compares the search key with the
middle term in an array, interpolation search accounts for the value of the key to find an
element. It is similar to looking up a name in a telephone book: if you’re looking for a B-name, you’d
start near the beginning of the book rather than the middle or end.
● Interpolation search assumes array values increase linearly from left to right (so array
must be sorted). The search key’s value is compared to the element whose index is
the x-coordinate on a straight line through (l, A[l]) and (r, A[r])
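A Python sketch of interpolation search (names are my own; assumes a sorted list of integers so the interpolated index stays integral):

def interpolation_search(a, key):
    """Return an index of key in sorted list a, or -1 if it is absent."""
    l, r = 0, len(a) - 1
    while l <= r and a[l] <= key <= a[r]:
        if a[r] == a[l]:                    # all remaining values are equal; avoid dividing by zero
            return l if a[l] == key else -1
        # x-coordinate of key on the straight line through (l, a[l]) and (r, a[r])
        x = l + (key - a[l]) * (r - l) // (a[r] - a[l])
        if a[x] == key:
            return x
        elif a[x] < key:
            l = x + 1
        else:
            r = x - 1
    return -1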
Ferrying soldiers problem: n soldiers must cross a river. They borrow a boat from two boys
on the shore, but only two boys or one soldier can fit in at a time. How can the soldiers cross
the river and leave the boys in possession of the boat and how many times does the boat
cross the river?
● Two boys take the boat across, one comes back.
● Soldier rows across
● Second boy takes the boat back so the boys and boat are back where they started
● 4n crossings total (4 per soldier)
Alternating glasses problem: 2n glasses in a row, first n of them are filled with beer and
the rest are empty. Make them alternate in empty-filled-empty-filled-... pattern with the
minimum number of glass moves
Divide and conquer: divide the instance of the problem into 2+ smaller instances and solve
those instances recursively. Obtain the solution to the original instance by combining the solutions
to the smaller instances.
The general recurrence is T(n) = a·T(n/b) + f(n), where f(n) is the time spent cutting the instance
down and merging the subinstance solutions.
The time complexity of divide and conquer depends on whether solving the subinstances takes
longer, or dividing and merging takes longer. The master theorem gives a shortcut: if f(n) ∈ Θ(n^d)
with d ≥ 0, then T(n) ∈ Θ(n^d) if a < b^d, Θ(n^d · log n) if a = b^d, and Θ(n^(log_b a)) if a > b^d.
f(n) must be of polynomial time complexity, i.e. f(n) must be in Θ(n^d), for this shortcut to apply.
We can understand this theorem as follows:
● a is the number of subproblems we split our problem into on each recursive call
● b is the factor by which the size of our original problem decreases on each call
● d is the exponent from f(n), representing the amount of work done by f(n) for each
recursive call
When a < b^d, as the number of subproblems increases with each recursive call, the amount of
work done per subproblem decreases even more rapidly, so the total work shrinks at each level
and the hardest work is done at the beginning: Θ(n^d).
When a = b^d, the amount of work done per subproblem decreases in exact proportion to the growth
in the number of subproblems with each recursive call. The time complexity is the amount of
work done at each level (n^d) multiplied by the number of levels, which is log_b n, giving Θ(n^d · log n).
When a > b^d, the number of subproblems increases with each recursive call faster than the work
done per subproblem decreases. So the total amount of work grows the more you split the problem
(even though the amount per subproblem is still decreasing). The time complexity is Θ(n^(log_b a)),
which highlights that the complexity is dominated by the depth and breadth of the recursion tree.
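As a quick sanity check of the three cases (my own examples, not from the notes): mergesort satisfies T(n) = 2T(n/2) + Θ(n), so a = 2, b = 2, d = 1 and a = b^d, giving Θ(n log n); binary search satisfies T(n) = T(n/2) + Θ(1), so a = 1, b = 2, d = 0, again a = b^d, giving Θ(log n); and a recurrence like T(n) = 4T(n/2) + Θ(n) has a = 4 > b^d = 2, giving Θ(n^(log_2 4)) = Θ(n²).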
Mergesort: split array A[0..n-1] into two equal halves, make copies of each half in arrays B
and C. Sort arrays B and C recursively. Merge sorted arrays B and C into array A as follows:
● Compare the first elements in the remaining unprocessed portions of the arrays
● Copy the smaller of the two into A, while incrementing the index indicating the
unprocessed portion of that array
● Once all elements in one of the arrays are processed, copy the remaining
unprocessed elements from the other array into A
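A top-down mergesort sketch in Python (names are my own; it returns a new sorted list rather than sorting in place):

def merge_sort(a):
    """Return a sorted copy of list a."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    b, c = merge_sort(a[:mid]), merge_sort(a[mid:])   # sort the two halves recursively
    merged, i, j = [], 0, 0
    while i < len(b) and j < len(c):                  # copy the smaller front element each time
        if b[i] <= c[j]:
            merged.append(b[i]); i += 1
        else:
            merged.append(c[j]); j += 1
    merged.extend(b[i:])                              # one of these is empty; copy the leftovers
    merged.extend(c[j:])
    return merged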
Quicksort just applies the partition described earlier (e.g. Lomuto) recursively until the base case, where the subarray length is 0 or 1
The best case happens if the split occurs in the middle each time, giving Θ(n log n).
The worst case is when the splits are skewed to the ends of the array (e.g. for already increasing or
decreasing arrays), giving Θ(n²).
Improvements can be made by selecting better pivots, switching to insertion sort on small
subarrays, etc.
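A quicksort sketch in Python reusing the lomuto_partition function from the quickselect sketch above (names are my own; a production version would pick better pivots and cut over to insertion sort on small subarrays, as noted):

def quicksort(a, l=0, r=None):
    """Sort list a in place between indexes l and r."""
    if r is None:
        r = len(a) - 1
    if l < r:                              # base case: subarrays of length 0 or 1 are sorted
        s = lomuto_partition(a, l, r)      # the pivot ends up in its final position s
        quicksort(a, l, s - 1)             # sort the two sides recursively
        quicksort(a, s + 1, r)
    return a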