CS2040S cheatsheet
AY20/21 sem 2 · github.com/jovyntls

ORDERS OF GROWTH

definitions
• T(n) = Θ(f(n)) ⟺ T(n) = O(f(n)) and T(n) = Ω(f(n))
• T(n) = O(f(n)) if ∃ c, n₀ > 0 such that for all n > n₀, T(n) ≤ c·f(n)
• T(n) = Ω(f(n)) if ∃ c, n₀ > 0 such that for all n > n₀, T(n) ≥ c·f(n)

properties
Let T(n) = O(f(n)) and S(n) = O(g(n)).
• addition: T(n) + S(n) = O(f(n) + g(n))
• multiplication: T(n) · S(n) = O(f(n) · g(n))
• composition: f₁ ∘ f₂ = O(g₁ ∘ g₂) - only if both functions are increasing
• if/else statements: cost = max(c1, c2) ≤ c1 + c2
• max: max(f(n), g(n)) ≤ f(n) + g(n)

notable
• √n · log n is O(n)
• O(2^(2n)) ≠ O(2^n)
• O(log(n!)) = O(n log n) (Stirling's approximation)
• if T(n) = T(n−1) + T(n−2) + · · · + T(1), then T(n) = 2T(n−1)

master theorem (worked example below)
T(n) = aT(n/b) + f(n), where a ≥ 1, b > 1
• T(n) = Θ(n^(log_b a)) if f(n) < n^(log_b a) polynomially
• T(n) = Θ(n^(log_b a) · log n) if f(n) = n^(log_b a)
• T(n) = Θ(f(n)) if f(n) > n^(log_b a) polynomially

space complexity
• Θ(f(n)) time complexity ⇒ O(f(n)) space complexity
• space complexity = the maximum space in use at any single point in time
• NOT the total space allocated altogether!
• assumption: once we exit a function, we release all memory that it used
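worked example of the master theorem
• mergeSort: T(n) = 2T(n/2) + O(n), so a = 2, b = 2 and n^(log_b a) = n^(log₂ 2) = n
• f(n) = n = n^(log_b a) exactly, so the middle case applies: T(n) = Θ(n log n)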

SORTING

overview
• BubbleSort - compare adjacent items and swap
• SelectionSort - take the smallest element, swap it into place
• InsertionSort - from left to right: swap the element leftwards until everything to its left is smaller; repeat for the next element
  • tends to be faster than the other O(n²) algorithms
• MergeSort - mergeSort the 1st half; mergeSort the 2nd half; merge
• QuickSort (partition sketch after this section)
  • partition algorithm: O(n)
  • stable quicksort: O(log n) space (due to the recursion stack)
  • first element as pivot; 2 pointers scanning inwards from each end
    · left pointer moves right until an element > pivot
    · right pointer moves left until an element < pivot
    · swap; repeat until the pointers meet
    · then swap the pivot into the meeting index

optimisations of QuickSort
• array of duplicates: O(n²) without 3-way partitioning
• stable if the partitioning algo is stable
• extra memory allows quickSort to be stable

choice of pivot
• worst case O(n²): first/last/middle element
• worst case O(n log n): median/random element
• split by fractions: O(n log n)
• choose at random: runtime is a random variable

quickSelect
• O(n) - finds the kth smallest element

properties
• after partitioning, the pivot is always in its correct (final) position
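A minimal Java sketch of the two-pointer partition described above (first element as pivot; the exact index conventions are an illustrative choice):

  // partitions a[lo..hi] around the pivot a[lo]; returns the pivot's final index
  static int partition(int[] a, int lo, int hi) {
      int pivot = a[lo];
      int left = lo + 1, right = hi;
      while (left <= right) {
          while (left <= right && a[left] <= pivot) left++;    // stop at an element > pivot
          while (left <= right && a[right] >= pivot) right--;  // stop at an element < pivot
          if (left < right) { int t = a[left]; a[left] = a[right]; a[right] = t; }
      }
      int t = a[lo]; a[lo] = a[right]; a[right] = t;           // pivot into its final position
      return right;
  }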
TREES

binary search trees (BST)
• a BST is either empty, or a node pointing to 2 BSTs
• tree balance depends on the order of insertion
• balanced tree: O(h) = O(log n)
• for a full binary tree of size n, ∃ k ∈ ℤ⁺ s.t. n = 2^k − 1
• height: h(v) = max(h(v.left), h(v.right)) + 1; leaf nodes: h(v) = 0

BST operations
• modifying operations
  • search, insert - O(h)
  • delete - O(h)
    · case 1: no children - remove the node
    · case 2: 1 child - remove the node, connect the parent to the child
    · case 3: 2 children - delete the successor, then replace the node with the successor
• query operations
  • searchMin - O(h) - recurse into the left subtree
  • searchMax - O(h) - recurse into the right subtree
  • successor - O(h) (sketch below)
    · if the node has a right subtree: searchMin(v.right)
    · else: traverse upwards and return the first parent that contains the key in its left subtree
AVL Trees

• height-balanced (maintained with rotations)
  ⟺ |v.left.height − v.right.height| ≤ 1
• each node is augmented with its height: v.height = h(v)
• heights are updated after each rotation (sketch below)

rebalancing
• insertion: max. 2 rotations
• deletion: recurse all the way up
• rotations can create every possible tree shape
• for a left-heavy node v with left child B (subtrees L, M, R from left to right):
  · [case 1] B is balanced: right-rotate(v); h(L) = h(M), h(R) = h(M) − 1
  · [case 2] B is left-heavy: right-rotate(v); h(L) = h(M) + 1, h(R) = h(M)
  · [case 3] B is right-heavy: left-rotate(v.left) then right-rotate(v); h(L) = h(M) − 1, h(R) = h(L)
• (a right-heavy v is handled symmetrically with left-rotations)
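A minimal Java sketch of rightRotate with height maintenance (node fields are illustrative; null has height −1 so that leaves have height 0, as above):

  static class Node { int key, height; Node left, right; }

  static int h(Node v) { return v == null ? -1 : v.height; }

  // rotates v's left child up; returns the new root of this subtree
  static Node rightRotate(Node v) {
      Node b = v.left;
      v.left = b.right;                               // subtree M crosses over
      b.right = v;
      v.height = 1 + Math.max(h(v.left), h(v.right)); // update v first - it is now lower
      b.height = 1 + Math.max(h(b.left), h(b.right));
      return b;
  }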
Trie
• search, insert - O(L) (for a string of length L)
• space: O(size of text · overhead); O(LN) for N strings of length L

interval trees
• search(key) ⇒ O(log n) (sketch after this section)
  • if the value is in the root's interval, return
  • if value > max(left subtree), recurse right
  • else recurse left (go left only when we can't go right)
• all-overlaps ⇒ O(k log n) for k overlapping intervals

orthogonal range searching
• binary tree; leaves store points, internal nodes store the max value in the left subtree
• buildTree(points[]) ⇒ O(n log n) (space is O(n))
• query(low, high) ⇒ O(k + log n) for k points
  • v = findSplit() ⇒ O(log n) - find the node between low & high
  • leftTraversal(v) ⇒ O(k) - either output all of the right subtree and recurse left, or recurse right
  • rightTraversal(v) - symmetric
• insert(key), delete(key) ⇒ O(log n)
• 2D_buildTree(points[]) ⇒ O(n log n)
  • build an x-tree from x-coordinates; for each node, build a y-tree from the y-coordinates of its subtree
• 2D_query() ⇒ O(log² n + k) (space is O(n log n))

kd-Tree
• stores geometric data (points in an (x, y) plane)
• alternates splitting (partitioning) via x and y coordinates
• construct(points[]) ⇒ O(n log n)
• search(point) ⇒ O(h)
• searchMin() ⇒ T(n) = 2T(n/4) + O(1) ⇒ O(√n)
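A minimal Java sketch of the interval-tree search described above (field names are illustrative; maxEnd is the largest right endpoint in the node's subtree):

  static class Node { int lo, hi, maxEnd; Node left, right; }

  // returns a node whose interval contains x, or null if none exists
  static Node search(Node v, int x) {
      if (v == null) return null;
      if (v.lo <= x && x <= v.hi) return v;        // x is in the root interval
      if (v.left == null || x > v.left.maxEnd)     // left subtree cannot contain x
          return search(v.right, x);
      return search(v.left, x);                    // otherwise go left
  }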
(a, b)-trees

e.g. a (2, 4)-tree storing 18 keys

rules
1. (a, b)-child policy, where 2 ≤ a ≤ (b + 1)/2:

  node type | # keys: min, max | # children: min, max
  root      | 1, b−1           | 2, b
  internal  | a−1, b−1         | a, b
  leaf      | a−1, b−1         | 0, 0

2. an internal node has 1 more child than its number of keys
3. all leaf nodes must be at the same depth from the root

terminology (for a node z)
• key range - range of keys covered in the subtree rooted at z
• keylist - list of keys within z
• treelist - list of z's children

operations
• max height = O(log_a n) + 1; min height = O(log_b n)
• search(key) ⇒ O(log n)
  • = O(log₂ b · log_a n) with binary search at each node
• insert(key) ⇒ O(log n)
  • split() a node with too many children:
    1. use the median to split the keylist into 2 halves
    2. move the median key to the parent; re-connect the remaining nodes
    3. (if the parent is now unbalanced, recurse upwards; if the root is reached, the median key becomes the new root)
• delete(key) ⇒ O(log n)
  • if the node becomes empty: merge(y, z) - join it with its left sibling & replace it with their parent
  • if the combined node exceeds the max size: share(y, z) = merge(y, z) then split()

B-Tree
• (B, 2B)-trees ⇒ an (a, b)-tree with a = B, b = 2B
• possible augmentation: use a linkedList to connect the nodes on each level

Merkle Trees
• binary tree - nodes augmented with a hash of their children (sketch below)
• same root value = identical tree
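A minimal Java sketch of the Merkle node-hash augmentation; SHA-256 is an illustrative choice of hash:

  import java.security.MessageDigest;

  // parent hash = H(leftHash || rightHash); equal root hashes ⇒ identical trees
  static byte[] parentHash(byte[] leftHash, byte[] rightHash) throws Exception {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      md.update(leftHash);
      md.update(rightHash);
      return md.digest();
  }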
• every key has an equal probability of being mapped to • rate of growth • graph is dense if |E| = θ(V 2 )
2. an internal node has 1 more child than its number of keys
every bucket table growth resize insert n items
3. all leaf nodes must be at the same depth from the root adj space (cycle) (clique) use for
• keys are mapped independently increment by 1 O(n) O(n2 )
• terminology (for a node z ) list O(V + E) O(V ) O(V 2 ) sparse
• uniform hashing assumption double O(n) O(n), average O(1)
• key range - range of keys covered in subtree rooted at z matrix O(V 2 ) O(V 2 ) O(V 2 ) dense
• every key is equally likely to be mapped to every square O(n2 ) O(n)
• keylist - list of keys within z
permutation, independent of every other key.
• treelist - list of z ’s children SET ADT searching
• NOT fulfilled by linear probing
• max height = O(loga n) + 1 • breadth-first search ⇒ O(V + E)
• X speed X space × no ordering
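Minimal Java sketches of both methods for 32-bit keys (the constant A and the use of floorMod are illustrative choices):

  // division method: m should be prime, not a power of 2
  static int divisionHash(int k, int m) {
      return Math.floorMod(k, m);              // floorMod keeps the result non-negative
  }

  // multiplication method: m = 2^r buckets, w = 32-bit keys, odd constant A
  static int multiplicationHash(int k, int r) {
      final int A = 0x9E3779B1;                // an odd constant
      return (A * k) >>> (32 - r);             // (A·k) mod 2^32, keep the top r bits
  }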
properties of a good hash function
1. able to enumerate all possible buckets - h : U → {1..m}
   • for every bucket j, ∃ i such that h(key, i) = j
2. simple uniform hashing assumption

rules for the hashCode() method
1. always returns the same value, if the object hasn't changed
2. if two objects are equal, they return the same hashCode

rules for the equals method
• reflexive - x.equals(x) ⇒ true
• symmetric - x.equals(y) ⇒ y.equals(x)
• transitive - x.equals(y), y.equals(z) ⇒ x.equals(z)
• consistent - always returns the same answer
• null is null - x.equals(null) ⇒ false

chaining (assume chaining & simple uniform hashing)
• insert(key, value) - O(1 + cost(h)) ⇒ O(1)
• for n items: expected maximum cost = O(log n)
  • = Θ(log n / log(log n))
• search(key)
  • worst case: O(n + cost(h)) ⇒ O(n)
  • expected case: O(n/m + cost(h)) ⇒ O(1)
• total space: O(m + n)

open addressing - linear probing
• redefined hash function: h(k, i) = (h(k, 1) + i) mod m
• delete(key): use a tombstone value - DON'T set to null
• performance
  • if the table is 1/4 full, there will be clusters of size Θ(log n)
  • expected cost of an operation: E[#probes] ≤ 1/(1 − α) (assume α < 1 and uniform hashing)
  • degrades badly as α → 1
• advantages
  • saves space (uses empty slots vs linked-list nodes)
  • better cache performance (the table is one place in memory)
  • rarely allocates memory (no new list-node allocation)
• disadvantages
  • more sensitive to the choice of hash function (clustering)
  • more sensitive to load (as α → 1, performance degrades)

double hashing (sketch below)
• for 2 functions f, g, define h(k, i) = (f(k) + i · g(k)) mod m
• if g(k) is relatively prime to m, then h(k, i) hits all buckets
  • e.g. for g(k) = n·k, n and m should be coprime

table resizing
Let m1 = size of the old hash table; m2 = size of the new hash table; n = number of elements.
• growing the table: O(m1 + m2 + n)
• rate of growth:

  table growth   | resize | insert n items
  increment by 1 | O(n)   | O(n²)
  double         | O(n)   | O(n), average O(1)
  square         | O(n²)  | O(n)
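A minimal Java sketch of the double-hashing probe sequence (the particular f and g are illustrative; with prime m, g(k) ∈ 1..m−1 is coprime to m, so every bucket is hit):

  // probes h(k, 0), h(k, 1), ... and returns the first empty slot, or -1 if the table is full
  static int probe(int[] table, int k, int m, int EMPTY) {
      int f = Math.floorMod(k, m);              // f(k)
      int g = 1 + Math.floorMod(k, m - 1);      // g(k), never 0
      for (int i = 0; i < m; i++) {
          int slot = (int) ((f + (long) i * g) % m);   // h(k, i) = (f(k) + i·g(k)) mod m
          if (table[slot] == EMPTY) return slot;
      }
      return -1;
  }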
SET ADT
• hashing-based sets: ✓ speed ✓ space ✗ no ordering
• fingerprint/bloom variants: ✓ no false negatives ✗ may have false positives
• hash table: more space, but resolves collisions

fingerprint hash table
• only stores m bits - does not store the key in a table
• P(no false positives) with SUHA = (1 − 1/m)^n ≈ (1/e)^(n/m)
  • i.e. the probability that nothing else lands in the given (same) bucket
• for P(no false positives) < p, need m ≤ n / log(1/(1 − p))

bloom filter (sketch below)
• 2 hash functions - requires 2 collisions for a false positive
• for k hash functions (assume independent slots):
  • P(a given bit is 0) = (1 − 1/m)^(kn) ≈ (1/e)^(kn/m)
  • P(false positive) = (1 − (1/e)^(kn/m))^k
  • for P(no false positives) < p, need m ≤ kn / log(1/(1 − p^(1/k)))
  • optimal k = (m/n) · ln 2 ⇒ error probability = 2^(−k)
• delete operation: store a counter instead of 1 bit
• insert, delete, query ⇒ O(k)
• intersection (bitwise AND), union (OR) ⇒ O(m)
  • gives the same false positives as both
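A minimal Java sketch of a bloom filter; deriving the k probe positions as (f + i·g) mod m is an illustrative construction, not the only one:

  import java.util.BitSet;

  class Bloom {
      private final BitSet bits;
      private final int m, k;

      Bloom(int m, int k) { this.m = m; this.k = k; this.bits = new BitSet(m); }

      private int slot(int key, int i) {                 // i-th hash function
          int f = Math.floorMod(key, m);
          int g = 1 + Math.floorMod(key * 31, m - 1);
          return (int) ((f + (long) i * g) % m);
      }

      void insert(int key) {                             // O(k)
          for (int i = 0; i < k; i++) bits.set(slot(key, i));
      }

      boolean query(int key) {                           // O(k)
          for (int i = 0; i < k; i++)
              if (!bits.get(slot(key, i))) return false; // definitely absent: no false negatives
          return true;                                   // present, or a false positive
      }
  }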
AMORTIZED ANALYSIS

an operation has amortized cost T(n) if,
for every integer k, the cost of k operations is ≤ k·T(n)
• binary counter ADT: increment ⇒ amortized O(1)
• hash table resizing: O(k) for k insertions ⇒ amortized O(1)
• search operation: expected O(1) (not amortized)

PROBABILITY THEORY

• if an event occurs with probability p, the expected number of iterations needed for this event to occur is 1/p
• for indicator (0/1) random variables: the expectation equals the probability
• linearity of expectation: E[A + B] = E[A] + E[B]

UNIFORMLY RANDOM PERMUTATION

• for an array of n items, each of the n! possible permutations is produced with probability exactly 1/n!
• the number of outcomes should distribute over the permutations uniformly (i.e. #outcomes / #permutations ∈ ℕ)
• probability of an item remaining in its initial position = 1/n
• KnuthShuffle ⇒ O(n) - for each i, swap A[i] with A[j] for a uniformly random j ∈ [0, i] (sketch below)
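A minimal Java sketch of KnuthShuffle; restricting j to [0, i] is exactly what makes all n! permutations equally likely (swapping with a random index over the whole array would be biased):

  import java.util.Random;

  static void knuthShuffle(int[] a) {
      Random rng = new Random();
      for (int i = 1; i < a.length; i++) {
          int j = rng.nextInt(i + 1);          // uniform in [0, i] - NOT [0, n-1]
          int t = a[i]; a[i] = a[j]; a[j] = t;
      }
  }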
GRAPHS

• degree (node): number of adjacent edges
• degree (graph): max. degree of a node
• in-/out-degree: number of incoming/outgoing edges
• diameter: max. shortest path
• even cycles are bipartite!
• a graph is dense if |E| = Θ(V²)

  adj    | space    | (cycle) | (clique) | use for
  list   | O(V + E) | O(V)    | O(V²)    | sparse
  matrix | O(V²)    | O(V²)   | O(V²)    | dense

searching
• breadth-first search ⇒ O(V + E) (sketch below)
  • O(V) - every vertex is added exactly once to a frontier
  • O(E) - every neighbourList is enumerated once
  • parent edges form a tree & shortest paths from S
  • implement with a queue
• depth-first search ⇒ O(V + E)
  • O(V) - DFSvisit is called exactly once per node
  • O(E) - DFSvisit enumerates each neighbour
  • with an adjacency matrix: O(V) per node ⇒ total O(V²)
  • implement with a stack
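A minimal Java sketch of BFS on an adjacency list (parent[] doubles as the visited marker and encodes the shortest-path tree):

  import java.util.*;

  // returns parent[] of the BFS tree rooted at s; parent[v] = -2 means unreached
  static int[] bfs(List<List<Integer>> adj, int s) {
      int n = adj.size();
      int[] parent = new int[n];
      Arrays.fill(parent, -2);
      parent[s] = -1;                           // root of the tree
      Deque<Integer> frontier = new ArrayDeque<>();
      frontier.add(s);
      while (!frontier.isEmpty()) {
          int u = frontier.poll();              // each vertex enqueued once: O(V)
          for (int v : adj.get(u)) {            // each neighbourList enumerated once: O(E)
              if (parent[v] == -2) { parent[v] = u; frontier.add(v); }
          }
      }
      return parent;
  }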
shortest paths
• Bellman-Ford ⇒ O(V·E)
  • |V| iterations of relaxing every edge - terminate early once an entire pass of |E| relax operations has no effect
• Dijkstra ⇒ O((V + E) log V) = O(E log V) (sketch below)
  • use a PQ to track the node with the minimum estimate; relax its outgoing edges and add the updated neighbours to the PQ
  • no negative weight edges!
  • |V| inserts/deleteMins (log V each)
  • |E| relaxes/decreaseKeys (log V each)
  • with a fibonacci heap ⇒ O(E + V log V)
• for DAGs ⇒ O(E) (topo-sort, then relax in that order)
  • longest path: negate the edges / modify the relax function
• for Trees ⇒ O(V) (relax each edge in BFS/DFS order)
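A minimal Java sketch of Dijkstra; java.util.PriorityQueue has no decreaseKey, so this common substitute re-inserts entries and skips stale ones:

  import java.util.*;

  // adj.get(u) holds int[]{v, weight}, weights >= 0; returns dist[] from s
  static int[] dijkstra(List<List<int[]>> adj, int s) {
      int n = adj.size();
      int[] dist = new int[n];
      Arrays.fill(dist, Integer.MAX_VALUE);     // MAX_VALUE = unreachable
      dist[s] = 0;
      PriorityQueue<int[]> pq =                 // {node, distance estimate}
          new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
      pq.add(new int[]{s, 0});
      while (!pq.isEmpty()) {
          int[] top = pq.poll();
          int u = top[0];
          if (top[1] > dist[u]) continue;       // stale entry: skip
          for (int[] e : adj.get(u)) {          // relax u's outgoing edges
              int v = e[0], w = e[1];
              if (dist[u] + w < dist[v]) {
                  dist[v] = dist[u] + w;
                  pq.add(new int[]{v, dist[v]}); // "decreaseKey" by re-insertion
              }
          }
      }
      return dist;
  }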
topological ordering
• post-order DFS ⇒ O(V + E)
  • prepend each node from the post-order traversal
• Kahn's algorithm (lecture vers.) ⇒ O(E log V)
  • add nodes without incoming edges to the topological order
  • remove the min in-degree node from the PQ ⇒ O(V log V)
  • decreaseKey (in-degree) of its children ⇒ O(E log V)
• Kahn's algorithm (tutorial vers.) ⇒ O(E + V) (sketch below)
  • add nodes with in-degree = 0 to a queue; dequeue a node, decrement the in-degree of its adjacent nodes & repeat
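A minimal Java sketch of the tutorial version of Kahn's algorithm (plain queue, O(V + E)):

  import java.util.*;

  // returns a topological order; a shorter-than-n list means the graph has a cycle
  static List<Integer> kahn(List<List<Integer>> adj) {
      int n = adj.size();
      int[] indeg = new int[n];
      for (List<Integer> nbrs : adj) for (int v : nbrs) indeg[v]++;
      Deque<Integer> queue = new ArrayDeque<>();
      for (int v = 0; v < n; v++) if (indeg[v] == 0) queue.add(v);
      List<Integer> order = new ArrayList<>();
      while (!queue.isEmpty()) {
          int u = queue.poll();
          order.add(u);
          for (int v : adj.get(u))
              if (--indeg[v] == 0) queue.add(v);   // decrement in-degrees of neighbours
      }
      return order;
  }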
spanning trees
• any 2 subtrees cut from an MST are themselves MSTs (of the nodes they span)
• for every cycle, the maximum weight edge is NOT in the MST
• for every partition of the nodes, the minimum weight edge across the cut is in the MST
• for every vertex, the minimum outgoing edge is in the MST
• Steiner Tree: (NP-hard) minimum weight tree containing a given set of nodes
  1. calculate the shortest path between any 2 vertices
  2. construct a new graph on the required nodes
  3. MST the new graph and map the edges back to the original

MST algorithms
• Prim's - O(E log V) (sketch below)
  • add the minimum edge across the cut to the MST
  • PQ to store nodes (priority: lowest incoming edge weight)
  • each vertex: one insert/extractMin ⇒ O(V log V)
  • each edge: one decreaseKey ⇒ O(E log V)
• Kruskal's - O(E log V)
  • sort edges by weight; add each edge if its endpoints are unconnected
  • sorting ⇒ O(E log E) = O(E log V)
  • each edge: find/union ⇒ O(log V) using a union-find DS
• Boruvka's - O(E log V)
  • each node: store a componentId ⇒ O(V)
  • one Boruvka step: for each connected component, add the minimum weight outgoing edge to merge cc's ⇒ O(V + E) dfs/bfs
  • update componentIds ⇒ O(V)
  • at most O(log V) Boruvka steps
• directed MST with one root ⇒ O(E)
  • for every node (except the root), add the minimum weight incoming edge
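A minimal Java sketch of Prim's; java.util.PriorityQueue has no decreaseKey, so this lazy variant re-inserts entries and skips already-added nodes (the O(E log V) bound is unchanged):

  import java.util.*;

  // adj.get(u) holds int[]{v, weight}; returns the total MST weight of a connected graph
  static long prim(List<List<int[]>> adj) {
      int n = adj.size();
      boolean[] inMst = new boolean[n];
      PriorityQueue<int[]> pq =                     // {node, incoming edge weight}
          new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
      pq.add(new int[]{0, 0});
      long total = 0;
      while (!pq.isEmpty()) {
          int[] top = pq.poll();
          int u = top[0];
          if (inMst[u]) continue;                   // stale entry: skip
          inMst[u] = true;
          total += top[1];                          // minimum edge across the cut
          for (int[] e : adj.get(u))
              if (!inMst[e[0]]) pq.add(new int[]{e[0], e[1]});
      }
      return total;
  }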
HEAPS

• 2 properties:
  1. heap ordering - priority[parent] ≥ priority[child]
  2. complete binary tree - every level (except the last) is full; all nodes are as far left as possible
• operations: all O(max height) = O(⌊log n⌋)
  • insert: insert as a leaf, bubble up to fix the ordering
  • increase/decreaseKey: bubble up/down
  • delete: swap with the bottom-rightmost node in the subtree; bubble down
  • extractMax: delete(root)
• heap as an array (sketch below):
  • left(x) = 2x + 1, right(x) = 2x + 2, parent(x) = ⌊(x − 1)/2⌋
• HeapSort ⇒ O(n log n) always
  • unsorted array to heap: O(n) (bubble down, from the lowest level up)
  • heap to sorted array: O(n log n) (extractMax, swap to the back)
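A minimal Java sketch of the array layout and bubbleUp for a max-heap stored in a[0..size−1]:

  static int left(int x)   { return 2 * x + 1; }
  static int right(int x)  { return 2 * x + 2; }
  static int parent(int x) { return (x - 1) / 2; }

  // restores heap ordering upwards from index i (e.g. i = size - 1 after an insert)
  static void bubbleUp(int[] a, int i) {
      while (i > 0 && a[parent(i)] < a[i]) {   // max-heap: parent must be >= child
          int p = parent(i);
          int t = a[p]; a[p] = a[i]; a[i] = t;
          i = p;
      }
  }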
UNION-FIND

• union - connect 2 objects; find - check if 2 objects are connected
• quick-find - int[] componentId, flat trees
  • O(1) find - check if the objects have the same componentId
  • O(n) union - enumerate all items in the array to update ids
• quick-union - int[] parent, deeper trees
  • O(n) find - check for the same root (common parent)
  • O(n) union - add one tree as a subtree of the other's root
• weighted union - int[] parent, int[] size
  • O(log n) find - check for the same root (common parent)
  • O(log n) union - add the smaller tree as a subtree of the larger tree's root
  • a binomial tree remains a binomial tree
• path compression - set the parent of each traversed node to the root
  • O(log n) find, O(log n) union
• weighted union + path compression (sketch below)
  • for m union/find operations on n objects: O(n + m·α(m, n))
  • O(α(m, n)) find, O(α(m, n)) union
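A minimal Java sketch of weighted union with path compression (this uses the path-halving variant of compression):

  class UnionFind {
      private final int[] parent, size;

      UnionFind(int n) {
          parent = new int[n];
          size = new int[n];
          for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
      }

      int find(int x) {
          while (parent[x] != x) {
              parent[x] = parent[parent[x]];   // path compression (halving)
              x = parent[x];
          }
          return x;
      }

      void union(int x, int y) {
          int rx = find(x), ry = find(y);
          if (rx == ry) return;
          if (size[rx] < size[ry]) { int t = rx; rx = ry; ry = t; }
          parent[ry] = rx;                     // weighted: smaller tree under larger root
          size[rx] += size[ry];
      }
  }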
DYNAMIC PROGRAMMING

1. optimal sub-structure - the optimal solution can be constructed from optimal solutions to smaller sub-problems
   • also holds for greedy algorithms / divide-and-conquer algorithms
2. overlapping sub-problems - can memoize
   • optimal substructure but no overlapping subproblems = divide-and-conquer
• prize collecting ⇒ O(kE) or O(kV²) for k steps
• vertex cover of a tree (set of nodes where every edge is adjacent to at least one node) ⇒ O(V) or O(V²)
• all pairs shortest paths: Dijkstra from every node ⇒ O(VE log V)
• diameter of a graph: SSSP from every node ⇒ O(V² log V)
• Floyd-Warshall ⇒ O(V³) (sketch below)
  • S[v, w, P_k] = shortest path from v to w using only intermediate nodes from the set P_k
  • S[v, w, P₈] = min(S[v, w, P₇], S[v, 8, P₇] + S[8, w, P₇])
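A minimal Java sketch of Floyd-Warshall; the k loop grows the allowed set P_k exactly as in the recurrence above (INF should be large but overflow-safe, e.g. Long.MAX_VALUE / 4):

  // dist[v][w] = direct edge weight, or INF if there is no edge; updated in place
  static void floydWarshall(long[][] dist) {
      int n = dist.length;
      for (int k = 0; k < n; k++)                        // allow node k as an intermediate
          for (int v = 0; v < n; v++)
              for (int w = 0; w < n; w++)
                  if (dist[v][k] + dist[k][w] < dist[v][w])
                      dist[v][w] = dist[v][k] + dist[k][w];
  }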
sorting
  sort      | best       | average    | worst      | stable? | memory
  bubble    | Ω(n)       | O(n²)      | O(n²)      | ✓       | O(1)
  selection | Ω(n²)      | O(n²)      | O(n²)      | ✗       | O(1)
  insertion | Ω(n)       | O(n²)      | O(n²)      | ✓       | O(1)
  merge     | Ω(n log n) | O(n log n) | O(n log n) | ✓       | O(n)
  quick     | Ω(n log n) | O(n log n) | O(n²)      | ✗       | O(1)
  heap      | Ω(n log n) | O(n log n) | O(n log n) | ✗       | O(n)

sorting invariants (after k iterations)
  bubble    | the largest k elements are sorted
  selection | the smallest k elements are sorted
  insertion | the first k slots are sorted
  merge     | the given subarray is sorted
  quick     | the partition is in the right position

searching
  search       | average
  linear       | O(n)
  binary       | O(log n)
  quickSelect  | O(n)
  interval     | O(log n)
  all-overlaps | O(k log n)
  1D range     | O(k + log n)
  2D range     | O(k + log² n)

data structures (assuming O(1) comparison cost)
  data structure            | search           | insert
  sorted array              | O(log n)         | O(n)
  unsorted array            | O(n)             | O(1)
  linked list               | O(n)             | O(1)
  tree (kd/(a, b)/binary)   | O(log n) or O(h) | O(log n) or O(h)
  trie                      | O(L)             | O(L)
  heap                      | O(log n) or O(h) | O(log n) or O(h)
  dictionary                | O(log n)         | O(log n)
  symbol table              | O(1)             | O(1)
  chaining                  | O(n)             | O(1)
  open addressing           | 1/(1−α) = O(1)   | O(1)
  priority queue (contains) | O(1)             | O(log n)
  skip list                 | O(log n)         | O(log n)

orders of growth
  T(n) = 2T(n/2) + O(n)       ⇒ O(n log n)
  T(n) = T(n/2) + O(n)        ⇒ O(n)
  T(n) = 2T(n/2) + O(1)       ⇒ O(n)
  T(n) = T(n/2) + O(1)        ⇒ O(log n)
  T(n) = 2T(n − 1) + O(1)     ⇒ O(2^n)
  T(n) = 2T(n/2) + O(n log n) ⇒ O(n (log n)²)
  T(n) = 2T(n/4) + O(1)       ⇒ O(√n)
  T(n) = T(n − c) + O(n)      ⇒ O(n²)

  1 < log n < √n < n < n log n < n² < n³ < 2^n < 2^(2n)
  log_a n < n^a < a^n < n! < n^n