TOA Cheatsheet
Halting problem:
• Write an ALG that, given any program and any input, tells us whether or not the program stops on that input?
• No such ALG exists → the halting problem is undecidable
Algorithm Features:
Feature Def
Finite Terminates after finite no. of steps
Definite Rigorously and unambiguously specified
ALG should only be interpreted in one way with
the same behavior.
Input Valid inputs → clearly specified
Output Can be proved to produce the correct output
given a valid input
Effective Steps are comprehensible → basic & simple
Notion of an Algorithm:
• Diagram: the (well-defined) problem is used to construct/design an algorithm. Then, given a certain input, the
computer uses the algorithm to produce a certain output.
• Again, each step must be unambiguous.
• Range of inputs (for which an ALG works) must be specified carefully.
• Same ALG can be represented in diff ways
• Several ALG’s may be used to solve the same problem → these ALG’s can be based on diff ideas and can solve
the problem at diff speeds.
No point in finding better (or faster) algorithms if
• that part of the sys is not the bottleneck
• time isn’t the issue
• program will only be used a few times
Design Basics:
Algorithm design technique (strategy/paradigm): general approach to solving problems algorithmically that is applicable
to a variety of problems from diff areas of computing.
Proving correctness: have to prove that the algorithm yields the required result for every legitimate input in a finite amount
of time. Common technique: mathematical induction, because ALG iterations provide the natural sequence of steps needed for such
proofs.
Problem domains:
Fibonacci
Brute Force:
The recursive solution below makes a binary tree of calls: it calculates Fibonacci numbers using repeated calls to fibonacci(n-1) and fibonacci(n-2).
Exponential Time Complexity: O(2^n)
Better Solution 1:
• Forward order – work out all elements up to & including the one you need
• Compute nth fib number, progressing from F0 & F1 → working forward
• Array F: set 1st and 2nd elements to 0 and 1
• Then, use for-loop from 3rd element until nth term
o At each step, lookup previous 2 and add them together then place in current element
• Better, but MIGHT still have problems (e.g., fixed-width integers overflow for larger n; storing the whole array wastes memory)
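The forward-order approach above can be sketched in Java (a minimal sketch; class and method names are my own):

```java
public class FibArray {
    // Compute the nth Fibonacci number (F0 = 0, F1 = 1) by filling an
    // array forward; each entry is the sum of the previous two.
    static long fib(int n) {
        if (n < 2) return n;
        long[] f = new long[n + 1];
        f[0] = 0;          // 1st element
        f[1] = 1;          // 2nd element
        for (int i = 2; i <= n; i++) {
            f[i] = f[i - 1] + f[i - 2]; // lookup previous 2, add, store
        }
        return f[n];
    }
}
```

Note the problems mentioned above: `long` overflows beyond F92, and the array keeps every prior element in memory.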
Better Solution 2:
• Use big integer because Fib sequence grows exponentially. This prevents integer overflow.
• Don’t use entire array
o 3-element array (don’t store all prior elements)
o Overwrite data in prev elements
• Good space complexity and solves problem for large data sizes
• Using the fib mathematical formula: gets increasingly worse for higher n; Floating point rounding problems;
Practical implementation problems
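A minimal sketch of Better Solution 2 (my own naming), using two rolling variables instead of the 3-element array described above, with BigInteger to prevent overflow:

```java
import java.math.BigInteger;

public class FibBig {
    // Space-efficient Fibonacci: keep only the last two values and
    // overwrite them each step; BigInteger avoids integer overflow
    // since the sequence grows exponentially.
    static BigInteger fib(int n) {
        BigInteger a = BigInteger.ZERO; // F(i-2)
        BigInteger b = BigInteger.ONE;  // F(i-1)
        if (n == 0) return a;
        for (int i = 2; i <= n; i++) {
            BigInteger next = a.add(b);
            a = b;      // slide the two-element window forward
            b = next;
        }
        return b;
    }
}
```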
Euclid’s algorithm
Interested in:
• Growth order (asymptotic complexity)
o O(n^2) vs O(n^3)
• Performance
o Best/Average/Worst
• Time and memory efficiency for input size n
o Basic operations (time)
o How many extra memory units are required (mem)
• Computational eff
o How long it takes to run on certain inputs
Efficiency Classes:
Big-O:
Example 1:
Example 2:
Definition:
• A function t(n) ∈ O(g(n)) if (and only if) there is a positive constant c and a non-negative integer n0 such that t(n) <= c·g(n) for all n >= n0
o t → function giving the no. of operations in terms of 'n'
o t belongs to class Big-O of some function g if we can determine some constant 'c' and input size n0, so that
t(n) is always at most the constant c times g(n) for all input beyond n0.
o Upper bound on ALG performance
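A worked instance of the definition (my own example):

```latex
t(n) = 3n + 5 \in O(n):\quad \text{choose } c = 4,\ n_0 = 5;\ \text{then } 3n + 5 \le 4n \text{ for all } n \ge 5.
```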
Big-Omega:
• A function t(n) ∈ Ω(g(n)) if there is a positive constant c and a non-negative integer n0, such that t(n) >= cg(n) for
all n>=n0
• Less widely used → gives a lower bound on growth
Big-Theta:
• A function t(n) ∈ Θ(g(n)) if there are positive constants c1 and c2 and a non-negative integer n0 such that c1·g(n) <= t(n) <= c2·g(n) for all n >= n0 → tight bound
• Useful property: if t1(n) ∈ O(g(n)) and t2(n) ∈ O(h(n)) then t1(n) + t2(n) ∈ O(max{g(n), h(n)})
Examples
• Average of O(n^2) – algorithm (runtime) grows at most as fast as n^2 with average case input
• Worst case of O(n^3) – algorithm grows at most as fast as n^3 with its worst case
S1:
• Standard for-loop → goes from standard counter at index ‘j’ to ‘n’
• Operation inside loop = 1
• Number of occurrences is n-j+1 → upper limit minus lower limit, plus 1
R2:
• Summation of ‘c x ai’ where ai depends on index ‘i’
R3:
• Can separate out summations
S2:
• Summation where amount of work done depends on the index
• Use the sum of an arithmetic sequence → Σi = n(n+1)/2, equivalent to Θ(n^2)
Analysis solution 1:
What does the alg do:
• Finds the largest element in the array → “maxVal”
What is the basic operation
• A[i] > maxVal
• comparison
• This is basically checking if the number is bigger than the current biggest number
How many times is the basic operation executed?
Is it a set?
Description:
• Loaded a file of strings into an array
• Want to check if there are any duplicates → each element must appear once
Brute force: 2 loops. Loop through the array twice, compare one value to every other element in the array. As soon as
there’s a matching element = return false.
• Worst case: Completion of both loops
• Core operations: comparisons (if statement)
• How many times is the inner loop executed? Worst case → Θ(n^2)
Decrease and conquer: Recursion. A new index value is passed each time. Still loop through the array until a match is
found. If a match is found, return false because it's not a set.
• Mathematically identical
• Worst case: Θ(n^2)
Transform & conquer: Sort the array and then do a linear search through the array.
• Most efficient sorting algorithms are Θ(n log n)
• The for-loop is linear, so the worst case is Θ(n)
• Sort + For-loop: Θ(max{n log n, n}) = Θ(n log n)
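The transform & conquer version above can be sketched as (a minimal sketch, my own naming):

```java
import java.util.Arrays;

public class SetCheck {
    // Transform & conquer: sort first (Θ(n log n)), then one linear pass
    // comparing neighbours — after sorting, any duplicates are adjacent.
    static boolean isSet(String[] a) {
        String[] s = a.clone();          // don't mutate the caller's array
        Arrays.sort(s);
        for (int i = 0; i < s.length - 1; i++) {
            if (s[i].equals(s[i + 1])) return false; // duplicate found
        }
        return true;
    }
}
```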
‘Brute force’ vs ‘transform and conquer’: Transform and conquer (Θ(n log n)) is faster than brute force (Θ(n^2)).
Recursive Algorithms:
Pattern for determining efficiency:
1. Determine core operation
2. Create equation for number of repetitions
3. Solve
4. Derive efficiency
Basic operation:
• Multiplication
Backward substitution
• Let M(n) be number of multiplications
• Then M(0) = 0
o And
o M(n) = M(n-1) + 1
o M(n) = M(n-1) + 1 = [M(n-2) + 1] + 1 = M(n-2) + 2 = [M(n-3) + 1] + 2 = M(n-3) + 3
• And in general:
o M(n) = M(n-k) + k,
o for k substitutions
o Thus, when k = n (n substitutions) M(n) = M(n-n) + n = M(0) + n = n
o So M(n) ∈ Θ (n) [it is exactly n in this case]
Analysis 5: Fibonacci
Basic operation
• Addition
Recurrence relation:
• A(n) = A(n-1) + 1 + A(n-2)
• Uses “characteristic equation”
• A(n) ∈ Θ(1.61803^n) (!) → the base is the golden ratio φ, so growth is exponential
Master Theorem: not a solution to all recurrences, but it gives the asymptotic efficiency class of recurrence relations in the form:
• T(n) = aT(n/b) + f(n) where f(n) ∈ Θ(n^d)
This provides a shortcut that replaces backward substitution.
Example:
Consider: A(n) = A(n / 2) + 1, A(1) = 0
• For pattern: T(n) = aT(n/b) + f(n), f(n) ∈ Θ(n^d) and T(1) = c
• a = 1, b = 2, c = 0, and f(n) ∈ Θ(1) = Θ(n^0)
• Thus, d = 0
Select master theorem case:
• b^d = 1 → (2^0), therefore, a = b^d
• By master theorem: T(n) ∈ Θ(n^d log n)
Since d = 0
• T(n) ∈ Θ( log n)
• n^0 = 1 → don’t need to show the coefficient.
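The recurrence A(n) = A(n/2) + 1 can be checked directly in Java (a sketch, my own naming): the answer counts how many times n can be halved, matching the Θ(log n) the master theorem predicts.

```java
public class Halving {
    // A(n) = A(n/2) + 1, A(1) = 0 — counts the number of halvings of n.
    static int A(int n) {
        return (n <= 1) ? 0 : A(n / 2) + 1;
    }
}
```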
See example 2 → notebook and slide deck 2.
String matching:
Worst case: the search string matches on every character except the last one, for each iteration of the outer loop.
• The ALG may have to make ‘m’ comparisons before shifting the pattern and this can happen for each of the ‘(n-
m) + 1’ tries
• Therefore, the worst case is m(n-m+1)
o = nm - m^2 + m → as long as n >> m, can remove the lower order terms (-m^2 + m)
• Θ(m x n) for ‘m’ much smaller than n (which is what happens in practice)
• Average case on natural language? – Θ(n)
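The brute-force matcher being analysed can be sketched as (a minimal sketch, my own naming):

```java
public class BruteMatch {
    // Brute-force string matching: try every alignment of pattern p in
    // text t; worst case is m(n - m + 1) character comparisons.
    static int indexOf(String t, String p) {
        int n = t.length(), m = p.length();
        for (int i = 0; i <= n - m; i++) {        // (n - m + 1) tries
            int j = 0;
            while (j < m && t.charAt(i + j) == p.charAt(j)) j++;
            if (j == m) return i;                 // full match at shift i
        }
        return -1;                                // pattern not found
    }
}
```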
Closest Pair:
Problem:
• Find the 2 points that are closest together in a set of ‘n’ 2D points
• Almost like ‘is this a set’.
o Take a given point & compare it against other points → record the shorter distance each time.
o Then, discard that point, take another and do the same
• Set initial distance between points to ∞, this way the first distance is going to be smaller.
• Iterate over points from 1 to n-1 (end)
o Then iterate from current point + 1 to the end
• Record distance between 2 points using formula.
• Return indices of closest distance
Efficiency: Θ(n^2)
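The brute-force closest-pair steps above can be sketched as (my own naming; returns the distance rather than the indices, to keep the sketch short):

```java
public class ClosestPair {
    // Brute force closest pair: compare every pair once (j > i), keep the
    // smallest squared distance, and take the square root at the end.
    static double closest(double[][] pts) {
        double best = Double.POSITIVE_INFINITY; // initial distance = ∞
        for (int i = 0; i < pts.length - 1; i++) {
            for (int j = i + 1; j < pts.length; j++) {
                double dx = pts[i][0] - pts[j][0];
                double dy = pts[i][1] - pts[j][1];
                best = Math.min(best, dx * dx + dy * dy);
            }
        }
        return Math.sqrt(best);
    }
}
```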
Convex Hull:
Problem:
• Find convex hull enclosing ‘n’ 2D points
• Convex Hull: if ‘S’ is a set of points then the convex hull of ‘S’ is the smallest convex set containing ‘S’
o BASICALLY → smallest enclosing boundary around a set of points
• By definition, an extreme point of a convex set is a point of this set that is not a middle point of any line segment
with endpoints in the set.
Convex Set: set of points in the plane is convex if the line segment between 2 points belongs to the set
• 2 points on the set
• If a straight line is drawn between, there is no point on the line that is outside of the convex set
ALG: For each pair of points p1 and p2, determine whether all other points lie on the same side of the straight line
through p1 and p2. If they do, p1 and p2 form an edge of the convex hull.
Efficiency: Θ(n^3)
• Get every single pair of points O(n^2)
• Compare 2 points against other remaining points O(n)
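The same-side test for one pair can be sketched with a cross product (a sketch under my own naming; integer coordinates assumed):

```java
public class HullEdge {
    // Brute-force convex hull test for one pair: (pts[i], pts[j]) is a
    // hull edge iff every other point lies on the same side of the line
    // through them; the side is given by the sign of the cross product.
    static boolean isHullEdge(int[][] pts, int i, int j) {
        int pos = 0, neg = 0;
        for (int k = 0; k < pts.length; k++) {
            if (k == i || k == j) continue;
            long cross = (long) (pts[j][0] - pts[i][0]) * (pts[k][1] - pts[i][1])
                       - (long) (pts[j][1] - pts[i][1]) * (pts[k][0] - pts[i][0]);
            if (cross > 0) pos++;
            else if (cross < 0) neg++;
        }
        return pos == 0 || neg == 0; // all remaining points on one side
    }
}
```

Running this test over all O(n^2) pairs, each costing O(n), gives the Θ(n^3) total above.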
Pros
• Wide applicability
• Simple
• Yields reasonable ALG for some important problems & std ALG for simple computational tasks
• Yardstick for better algorithms
Cons
EXHAUSTIVE SEARCH
Basically there’s a lot of different situations & have to enumerate through all of them to determine which is best.
Def:
• Usually among combinatorial objects such as permutations & subsets (order within subset doesn’t matter)
Method:
1. Generate every candidate solution from the problem domain
2. Evaluate each candidate, keeping track of the best so far
3. Announce ‘winner’
Problem:
• Find the shortest tour that passes through all the cities (once) before returning to the starting city —> Start at ‘a’
& then, end at ‘a’.
• Hamiltonian Circuit is defined as a cycle that passes through all the vertices of the graph exactly once
Solution:
• Improvements
2. Knapsack Problem:
• Think of yourself as a thief, how many items can you fit in your bag for the most value
• ‘n’ items of known weights ‘w1’, ‘w2’, etc. and values ‘v1’, ‘v2’, etc.
• And weight or capacity of the bag ‘W’.
Problem example:
• So let’s say the capacity is W = 16kg; anything heavier than that is infeasible and can’t fit
• We must find which combination of items will give us the most value
• BUT, total weight must not exceed 16kg.
Efficiency: Ω(2^n)
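Exhaustive-search knapsack can be sketched by enumerating all 2^n subsets with bitmasks (my own naming; the item values in the test are hypothetical, not from the notes):

```java
public class Knapsack {
    // Exhaustive search: enumerate every subset of the n items via a
    // bitmask, keep the best total value whose total weight fits in the
    // capacity W — Ω(2^n) subsets to check.
    static int best(int[] w, int[] v, int W) {
        int n = w.length, bestVal = 0;
        for (int mask = 0; mask < (1 << n); mask++) {
            int wt = 0, val = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) { wt += w[i]; val += v[i]; }
            }
            if (wt <= W) bestVal = Math.max(bestVal, val); // feasible subset
        }
        return bestVal;
    }
}
```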
3. Assignment Problem:
About:
• ‘n’ people that need to be assigned to execute ‘n’ jobs – 1 person per job.
• Cost accrued if the ith person is assigned to the jth job is a known quantity C[i, j]
• Point is to find the assignment with the lowest cost.
Example of problem:
Solution:
3 Variants:
• Decrease by a constant
• Decrease by a constant factor – divide size of problem by a constant (e.g., 2)
• Var size decrease – depending on data, might reduce by more/less each iteration
Decrease & Conquer: Throws away half (or constant factor) of work → Binary Search
Divide & conquer: Divides a problem and solves both halves then combine results → Quicksort
Strengths:
Weaknesses:
DECREASE BY A CONSTANT:
Insertion sort:
Idea:
• Take element
• Assume already sorted list of (n-1) for ‘n’ elements
• Insert remaining element in the correct position → find position (where the element should be placed) by
searching through the list, of where this element is (that has been taken away).
• Before doing this, recursively perform insertion sort on smaller list.
• Take & set 1st element aside, insert into already sorted list
o List size = 1
• Reducing the size of the unsorted list at each step
o Reach sorted list
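The idea above, written iteratively in Java (a minimal sketch, my own naming):

```java
public class InsertionSort {
    // Insertion sort: grow a sorted prefix; at step i, slide a[i] left
    // past larger elements until it sits in its correct position.
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];          // element to insert
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];     // shift larger elements right
                j--;
            }
            a[j + 1] = key;          // place into sorted prefix
        }
    }
}
```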
Topological sort
Problem:
• In a directed acyclic graph (DAG) → list the vertices in an order such that edge direction is respected.
• For any directed edge, the source vertex must appear before the destination vertex in the list.
• Multiple possible orderings
• Cycles = not solvable
• DFS (depth first search) can be used to solve
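The DFS approach can be sketched as (my own naming; assumes the input is a DAG, so no cycle check):

```java
import java.util.*;

public class TopoSort {
    // DFS-based topological sort: a vertex is pushed onto the result only
    // after all of its outgoing neighbours are finished, so the stack
    // holds vertices in reverse finishing order — a valid topological order.
    static List<Integer> topo(List<List<Integer>> adj) {
        int n = adj.size();
        boolean[] seen = new boolean[n];
        Deque<Integer> order = new ArrayDeque<>();
        for (int v = 0; v < n; v++) if (!seen[v]) dfs(v, adj, seen, order);
        return new ArrayList<>(order);
    }

    static void dfs(int v, List<List<Integer>> adj, boolean[] seen, Deque<Integer> order) {
        seen[v] = true;
        for (int w : adj.get(v)) if (!seen[w]) dfs(w, adj, seen, order);
        order.push(v); // finished: all descendants already placed after v
    }
}
```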
Applications:
Generating permutations:
Solution:
Alternative permutation generator that avoids permutations of smaller lists → more efficient bc it doesn’t require you to
derive intermediate results (no need for previous tree-like structure).
• An element ‘k’ is mobile if its arrow points to an adjacent element smaller than it.
Johnson-Trotter Example:
• Initial list of numbers: set each element’s arrow to point left → an element stays mobile while the
neighbour its arrow points to is smaller, and stops being mobile when that neighbour is bigger.
• Then, while you still have mobile elements, repeatedly:
o Search & find largest mobile element
o Swap ‘k’ with its immediate neighbour in the direction in which it’s pointing.
o Reverse direction of all elements → ‘k’ arrows flip → REVERSE DIRECTION OF ALL ELEMENTS BIGGER
THAN THE VALUE MOVED.
o Print the current state of those lists of numbers as one of the permutations.
• Carry on until there is no more mobility, then return the list of perm
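The Johnson-Trotter steps above can be sketched as (my own naming; arrows are stored as -1 = left, +1 = right):

```java
import java.util.*;

public class JohnsonTrotter {
    // Johnson-Trotter: repeatedly swap the largest mobile element with the
    // neighbour its arrow points to, then flip the arrows of everything
    // larger than it, printing each state as a permutation.
    static List<String> permutations(int n) {
        int[] a = new int[n], dir = new int[n];
        for (int i = 0; i < n; i++) { a[i] = i + 1; dir[i] = -1; } // all arrows left
        List<String> out = new ArrayList<>();
        out.add(Arrays.toString(a));
        while (true) {
            int m = -1; // index of largest mobile element
            for (int i = 0; i < n; i++) {
                int j = i + dir[i]; // neighbour the arrow points to
                if (j >= 0 && j < n && a[j] < a[i] && (m < 0 || a[i] > a[m])) m = i;
            }
            if (m < 0) break;                          // no mobile element: done
            int k = a[m], j = m + dir[m];
            int t = a[m]; a[m] = a[j]; a[j] = t;       // swap values
            t = dir[m]; dir[m] = dir[j]; dir[j] = t;   // arrows travel with values
            for (int i = 0; i < n; i++) if (a[i] > k) dir[i] = -dir[i]; // flip bigger
            out.add(Arrays.toString(a));
        }
        return out;
    }
}
```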
Problem: Among ‘n’ coins, one is fake (& weighs less). There’s a balance scale which can compare any 2 sets.
Algorithm:
• Divide into 2 piles of size floor(n/2) (keeping a coin aside if ‘n’ is odd, e.g., n = 3)
• If ‘n’ is odd, and if the 2 coins on the scale weigh the same then the one that’s left out is the fake coin.
• Otherwise proceed recursively with lighter pile
• Efficiency:
o |_ _| => floor
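The 2-pile algorithm can be sketched as (my own naming; the balance scale is simulated by summing coin weights, with the fake coin lighter than the rest):

```java
public class FakeCoin {
    // Decrease-by-half fake-coin search on coins[lo..hi]: split into two
    // equal piles of floor(count/2) coins (one coin set aside if the count
    // is odd), "weigh" the piles, and recurse into the lighter pile.
    static int find(int[] coins, int lo, int hi) {
        int count = hi - lo + 1;
        if (count == 1) return lo;            // only candidate left
        int half = count / 2;                 // floor(n/2)
        long left = 0, right = 0;
        for (int i = 0; i < half; i++) {
            left += coins[lo + i];            // left pile
            right += coins[lo + half + i];    // right pile
        }
        if (count % 2 == 1 && left == right) return hi; // odd n, piles equal → aside coin is fake
        return (left < right) ? find(coins, lo, lo + half - 1)
                              : find(coins, lo + half, lo + 2 * half - 1);
    }
}
```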
Algorithm → 3 pile:
Efficiency:
Multiplication A LA Russe:
n*m:
• (n/2) * (2m) → if ‘n’ is even
• [(n – 1)/2] * (2m) + m → if ‘n’ is odd
Example: 50 * 20
• (25 * 40)
• (12 * 80) + 40
• (6 * 160) + 40
• (3 * 320) + 40 → (1 * 640) + 320 + 40 = 1000
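The halve-and-double rule can be sketched as (a minimal sketch, my own naming):

```java
public class Russe {
    // Multiplication à la russe: halve n, double m; when n is odd, the
    // leftover m is set aside and added to the running total.
    static long mult(long n, long m) {
        long total = 0;
        while (n > 1) {
            if (n % 2 == 1) total += m; // odd n: set m aside
            n /= 2;                     // integer division = (n-1)/2 for odd n
            m *= 2;
        }
        return total + m;               // n == 1: add the final m
    }
}
```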
Euclid GCD:
Greatest common divisor of 2 integers ‘m’ and ‘n’ is the largest integer that divides both
Solution:
• Gcd(m, n) = gcd(n, m % n)
• Gcd(m, 0) = m
• Right-side args shrink by neither a constant nor a constant factor → variable-size decrease
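The two rules above translate directly into a loop (a minimal sketch, my own naming):

```java
public class Euclid {
    // Euclid's algorithm: gcd(m, n) = gcd(n, m % n), with gcd(m, 0) = m.
    static int gcd(int m, int n) {
        while (n != 0) {
            int r = m % n; // remainder becomes the new second argument
            m = n;
            n = r;
        }
        return m;
    }
}
```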
Problem: finding the kth smallest element in a list. The median is usually k = n/2. But sorting the list is inefficient.
Solution:
Variable-size decrease because the quicksort-style partition doesn’t split into equal-size sublists. → Not divide & conquer
because the other half of the list is thrown away.
Efficiency: average case Θ(n), worst case Θ(n^2)
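The selection idea above (quickselect) can be sketched as — a minimal sketch under my own naming, using k as a 0-based rank and a last-element pivot:

```java
public class QuickSelect {
    // Quickselect: partition as in quicksort, but continue into only the
    // side containing the kth smallest element; the other side is thrown
    // away — variable-size decrease. Mutates the array.
    static int select(int[] a, int lo, int hi, int k) {
        while (true) {
            int p = partition(a, lo, hi);
            if (p == k) return a[p];
            if (k < p) hi = p - 1; else lo = p + 1; // keep one side only
        }
    }

    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t; // pivot into final position
        return i;
    }
}
```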
E.g.: a list of numbers that needs to be added together. Divide the list in half, recursively add each half, then add the two
answers.
Illustrated:
Merge Sort:
Efficiency: O (n log n)
Algorithm:
Example:
Diagram description:
Recurrence relation:
Quick Sort:
Algorithm steps:
Algorithm:
Example:
Worst-case efficiency:
General efficiency:
Improve efficiency:
• Better pivot selection: E.g., take one element from start, middle, and end of list and select median of those 3.
• Insertion sort: Switch to this on small subfiles
• Eliminate recursion: overhead costs.
Geometric Problems:
Divide and conquer because we divide the space into two halves in terms of the number of points in each half
Example:
o Efficiency: a = 2, b = 2, d =1
o a = b^d in Master Theorem
o Θ(n log n) – Same cost as pre-sorting step
o 2 Sorts in algorithm → initial sort from left to right according to ‘x’
▪ Then another in straddle zone → sort vertically in increasing ‘y’
Limits in the straddle zone:
QuickHull:
Solution:
Diff sub strategies (NB the way you transform the problem):
• Create simpler e.g., of the same problem → sort array before trying to solve
• Use another kind of data structure to support solution to the problem → heaps, balance-search trees.
• Transform instance of current problem into another → use existing solution to solve
Approaches:
Instance Simplification
Gaussian elimination:
First transform to upper triangular form by Gaussian elimination (simplification) and then solve by backward substitution
(conquer).
Convert into RHS form → once have zeroes lower diagonals → easier to solve → use backwards substitution.
Pre-sorting:
Solve instance of problem by pre-processing the problem to transform it into another simpler instance of the same
problem
• Searching
• Finding median
• Finding repeating elements
• Convex hull and closest pair
Efficiency:
Representation Change
Evaluating polynomials:
Fast Fourier Transform (FFT)
→ Brute force polynomial
• For a polynomial of degree ‘n’, just the 1st term requires ‘n’ multiplications using brute force
o Becomes a Θ(n^2) alg
• Improve efficiency by calculating x^n → calculating lower order terms and then gradually building up
• Use Horner’s rule → does better for large polynomials and is easy
→ Horner’s Rule
→ Pseudocode:
Parameters:
1. Array of coefficients
2. Value ‘x’
Method content:
Efficiency:
• For the entire polynomial, it takes as many multiplications as the Brute Force method uses for its first
term.
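Horner's rule can be sketched as (my own naming; coefficients given highest order first):

```java
public class Horner {
    // Horner's rule: p(x) = (...((a_n x + a_{n-1}) x + a_{n-2}) ... ) x + a_0
    // — one multiplication and one addition per coefficient, so n of each
    // for the whole polynomial.
    static double eval(double[] coeff, double x) { // coeff[0] = highest order
        double p = coeff[0];
        for (int i = 1; i < coeff.length; i++) {
            p = p * x + coeff[i]; // multiply running total by x, add next coefficient
        }
        return p;
    }
}
```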
Binary exponentiation:
Solution:
•
Horner for exponentiation:
• Factorize p(x)
• Calculate ‘p’ → get 13, as ‘p’ is a representation of 13.
o When x=2, p(x) should = 13
o ‘P’ is the values of each consecutive bracket
• Calculate a^p → each step of Horner, take current total multiply by ‘x’ and add coefficient in
Algorithm (for exponentiation):
P = 1: Assume the highest-order coefficient will be 1, because in any binary representation the first digit is never 0.
Apply the for-loop → with 2 substituted for ‘x’.
Successively build up the final a^n by applying the relevant formula. Simplify each statement → take the power term and
split it into halves → adding power terms == multiplying (with the same base).
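The left-to-right binary exponentiation described above can be sketched as (my own naming): scanning the exponent's bits from the most significant down, each step squares the accumulator (the "multiply by x" of Horner, with the exponent doubling) and multiplies by a when the bit is 1.

```java
public class BinaryPower {
    // Left-to-right binary exponentiation via Horner on the bits of n:
    // squaring doubles the exponent built so far; a set bit adds 1 to it.
    static long pow(long a, int n) {
        long result = 1;
        for (int bit = 31 - Integer.numberOfLeadingZeros(n); bit >= 0; bit--) {
            result *= result;                       // exponent doubles
            if (((n >> bit) & 1) == 1) result *= a; // exponent += 1
        }
        return result; // n == 0 skips the loop and returns 1
    }
}
```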
PROBLEM REDUCTION
Strengths:
Weaknesses:
Pre-structuring
• Store supportive info about the problem instance which will make subsequently solving it faster
• Problem instance → do preprocessing on it → build up a table that will help accelerate the problem solving
• E.g., sorting by counting, Horspool and Boyer-Moore’s string matching
Sorting by counting:
Idea:
1. Sort list whose elements fall in a restricted range of integers [L…U].
a. Can use pre-processing to make actual stage of sorting, linear → general sort that’s
unrestricted
b. Range within lower and upper integer.
2. Frequency table → counts no. of occurrences of each element
3. Distribution table derived from frequencies → tells where to place the elements
a. Convert freq table to distribution
Example:
• 3 elements → 11,12,13
• Run through (Θ(n) time) → calculate no. of occurrences of integers.
• Prefix sum → See the distribution values 1, 4, 6 → shows where the values will occur.
o E.g., 11 will occur in the 1st position. 12 will occur (last) in position 4 (previous distribution value
+ freq of 12).
• Final run-through → grab elements into input array in linear time and place into their final position →
start from last element in OG array → work backwards
o Start with 12 → maps to position 4 (distributional) → put 12 at position (4-1) 3 → then subtract
one from the distributional value
• 1 linear iteration to get frequencies
• 2nd linear iteration to calculate distributional
• 3rd linear iteration through list to sort accordingly
• Eff = Θ(n) → restricted subset
Pseudocode:
public int[] sortArray(int[] A, int L, int U) {
int n = A.length;
int[] D = new int[U - L + 1];
int[] S = new int[n];
for (int j = 0; j <= U - L; j++) {// Initialize frequency array
D[j] = 0;
}
for (int i = 0; i < n; i++) {// Calculate frequencies
D[A[i] - L]++;
}
for (int j = 1; j <= U - L; j++) {// Calculate distribution
D[j] += D[j - 1];
}
for (int i = n - 1; i >= 0; i--) {// Sort elements into S
int j = A[i] - L;
S[D[j] - 1] = A[i];
D[j]--;
}
return S;
}

Horspool’s String Matching:
Idea:
• Start matching from the end of the pattern and on a mismatch → shift the pattern by more than a single space
• Shift table ‘T’:
o 1 entry per character ‘c’
o Gives no. of places to shift the pattern when that text character is aligned with the last character of the
pattern.
o Max shift is ‘m’ → length of pattern
o Mismatch → retrieve T(c) where ‘c’ is the char in text aligned to the last char in pattern → shift pattern right T(c)
positions
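The shift table and the matching loop can be sketched as (my own naming; assumes 8-bit characters for the table):

```java
public class Horspool {
    // Horspool shift table: T(c) = distance from c's rightmost occurrence
    // (excluding the last position) to the end of the pattern; characters
    // not in the pattern get the full pattern length m.
    static int[] shiftTable(String p) {
        int m = p.length();
        int[] t = new int[256];
        java.util.Arrays.fill(t, m);          // default: max shift = m
        for (int i = 0; i < m - 1; i++) {
            t[p.charAt(i)] = m - 1 - i;       // ignore the last character
        }
        return t;
    }

    static int search(String text, String p) {
        int[] t = shiftTable(p);
        int m = p.length(), n = text.length();
        int i = m - 1;                        // text index under pattern's last char
        while (i < n) {
            int k = 0;                        // match right to left
            while (k < m && p.charAt(m - 1 - k) == text.charAt(i - k)) k++;
            if (k == m) return i - m + 1;     // full match found
            i += t[text.charAt(i)];           // shift by table entry
        }
        return -1;
    }
}
```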
Example:
Overall alg:
Case 3 → last letter in Text is in the pattern but not at the end of the pattern
E.g., E → find the rightmost index that isn’t at the very end
• So, for G the rightmost index that isn’t at the very end is 4.
• ‘m – 1’ → because ignore the last character
• If there’s a mismatch, T (c = character that caused the mismatch). Then, look in the table to see the
corresponding number that it needs to be shifted by.
• If the letter causing the mismatch is not in the pattern at all → shift the pattern by 8 (length of pattern) to the
right.
E.g., 3: Mismatch between last characters ‘N’ and ‘G’. But ‘N’ is in the pattern → go to the table and find its
corresponding value (T(N) = 1) → shift the pattern by 1 so that the characters align.
Boyer-Moore
Idea:
Except:
Efficiency:
Bad-Symbol shift:
• E, R matched → k = 2.
• Mismatch at character (c) = ‘S’
• Check for shift value of ‘S’ in table → not there so T (S) = 6 (length of pattern Barber)
• E, R matched → k =2
• Mismatch at text character = ‘A’
• ‘A’ is in the shift table
o T(A) = 6 - 1 - 1 = 4
• Shift = max (T(A) – k, 1) → max (4 -2, 1) = 2
• Less shifts than with Horspool
Pre-processing
Before searching – know everything about the pattern
Both algorithms aim to extract useful info from the pattern in advance of the search → to maximize the size of the shift
they do each on each mismatch.
Point: Shift table is determined solely by the properties of the pattern, not the text.
• Boyer Moore selects greater shift between good suffix and bad symbol rule
• Based on no. of characters that were successfully matched before had 1st failure
Algorithms:
• Introductory algos:
-Fibonacci
-Change making
• Transitive closure: Warshall’s algo
• All pairs shortest path: Floyd’s algo
• Knapsack problem
Change making algo:
• Given a collection of coins, find exact change for amount n such that minimum number of coins is
used
-Each denomination has unlimited number
-d1 =R1
• For typical sub cases there is an efficient greedy algorithm
• Recurrence relation:
-For each denomination dj that does not exceed n, add one coin and solve for the coins required to make up
the remaining amount n - dj; take the best over all denominations
-F(n) is minimum number of coins adding up to n
-F(0) = 0
-F(n) = min {F(n-dj) : dj <= n} + 1 for n > 0
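The recurrence can be sketched as a bottom-up DP (my own naming; the test denominations are hypothetical and chosen so that greedy would fail):

```java
import java.util.Arrays;

public class ChangeMaking {
    // DP change making: F(0) = 0, F(n) = min over denominations d <= n of
    // F(n - d) + 1; fills the table bottom-up from amount 1 to n.
    static int minCoins(int[] d, int n) {
        int[] f = new int[n + 1];
        Arrays.fill(f, Integer.MAX_VALUE - 1); // "infinity" sentinel
        f[0] = 0;
        for (int amt = 1; amt <= n; amt++) {
            for (int coin : d) {
                if (coin <= amt) f[amt] = Math.min(f[amt], f[amt - coin] + 1);
            }
        }
        return f[n]; // minimum number of coins for amount n
    }
}
```

With denominations {1, 3, 4} and n = 6, greedy picks 4+1+1 (3 coins) but the DP finds 3+3 (2 coins) — illustrating why the DP is needed for non-standard coin sets.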
Greedy change making:
• Problem = give change for a specific amount n, with the least number of coins of the denominations
• E.g. Smallest change for R2.54 using R5, R2, R1, 50c, 20c, 10c, 5c, 2c, 1c
-Fewest coins = R2+50c+2c+2c
• Algo:
-At any step choose the coin of the largest denomination that doesn’t exceed the remaining total, repeat
-Optimal for reasonable sets of coins
Text Encoding:
• Assign bit patterns to an alphabet of characters so that in general texts, fewer bits are used overall (this is called
entropy coding)
• Use a fixed length:
-Same number of bits for each character (e.g. ASCII)
• Use a variable length:
-Number of bits for a character varies according to probability of occurrence
-Leads to better compression (e.g. Morse code encoding uses this)
Huffman Trees:
• Compression ratio(CR):
-Standard measure of compression
-CR = 100 * (y-x) / y where x is compressed and y is uncompressed
-Typically 20-80% in the case of Huffman coding
• Yields optimal compression if:
-Probabilities of character occurrence are independent, known in advance, and are (negative) powers of 2
-Not great with bit strings (alphabet = {0,1})
• More sophisticated approaches = adaptive Huffman, which builds frequency tables on the fly
Strengths:
Weaknesses: