CSC263 Cheat Sheet


ADTs and Data Structures
Q: What is an ADT? A set of objects together with a set of operations that can be performed on these objects.
1. Objects: integers. Operations: ADD(x,y), SUBTRACT(x,y), MULTIPLY(x,y), QUOTIENT(x,y), REMAINDER(x,y).
2. Objects: stacks. Operations: PUSH(S,x) - add the element x to the top of the stack S; POP(S) - delete the top element from the nonempty stack S and return it; EMPTY(S) - return true if S is empty, false otherwise.
Q: What is a data structure? A data structure is an implementation of an ADT. This includes a way to represent objects and algorithms for the operations. Example: a stack could be implemented by either a singly-linked list or an array with a counter to keep track of the "top."
Q: Why are ADTs important? Important for specification. Modularity: usage depends only on the definition, not on the implementation, so the implementation of the ADT can be changed (corrected or improved) without changing the rest of the program. Reusability: an abstract data type can be implemented once and used in lots of different programs.
Recap: an ADT is a way to describe what the data is and what you can do with it. A data structure is a way to describe how the data is implemented and how the operations are performed.

Analysis of Data Structures and Algorithms
The complexity of an algorithm is the amount of resources it uses. Types of resources: running time, space (memory), number of logic gates (in a circuit), area (in a VLSI chip), messages or bits communicated (in a network).

Dictionaries
A dictionary is an important abstract data type (ADT). It represents the following object and operations:
Object: sets. Each element x has a value key(x). key(x) comes from a totally-ordered universe (for any two keys a and b, either a > b, a < b, or a = b).
Operations (S is a set, x is an element, and k is a key):
ISEMPTY(S): check whether set S is empty or not.
SEARCH(S,k): return some x in S s.t. key(x) = k, or NIL if no such x exists.
INSERT(S,x): insert x in S.
DELETE(S,x): remove x from S.
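The stack ADT above can be implemented either way; here is a minimal sketch of the linked-list version (the class and method names are illustrative, not from the course notes):

```python
class Node:
    """A singly-linked list node: holds one element and a link to the node below it."""
    def __init__(self, value, below):
        self.value = value
        self.below = below

class LinkedStack:
    """Stack ADT backed by a singly-linked list; every operation is O(1)."""
    def __init__(self):
        self.top = None

    def push(self, x):          # PUSH(S, x)
        self.top = Node(x, self.top)

    def pop(self):              # POP(S): precondition is a nonempty stack
        x = self.top.value
        self.top = self.top.below
        return x

    def empty(self):            # EMPTY(S)
        return self.top is None

S = LinkedStack()
S.push(1); S.push(2)
print(S.pop())    # 2
print(S.empty())  # False
```

An array with a "top" counter gives the same O(1) bounds; the linked-list version simply never needs resizing.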

Data Structure: Search / Insert
Unsorted array: n / 1
Sorted array: log n / n
Unsorted singly-linked list: n / 1
Unsorted doubly-linked list: n / 1
Direct access table: 1 / 1
Hash table: n / n
Binary search tree: n / n
Balanced search tree: log n / log n

Binary Search Trees (BSTs)
A binary tree is a BST if it satisfies the BST property: for every node x, if node y is in the left subtree of x, then key(x) ≥ key(y); if node y is in the right subtree of x, then key(x) ≤ key(y).
Worst-case complexity for each of SEARCH, INSERT, DELETE? O(height of tree)... and the height of the tree ranges from log n to n. Since the running times for these operations all depend on the height of the tree, we would like to guarantee that the height is "fairly good". This motivates using 2-3-4 trees.

2-3-4 Trees
Consider the node (4,9): it contains two values and has three children (the nodes (1,2,3), (7,8), and (12)).
Q: What are the possible numbers of values an internal node of a 2-3-4 tree may contain? 1, 2, or 3. Since a node has 1 more child than internal values, the number of children is between 2 and 4.
Q: What range of values would be allowed in any subtree rooted at node (33)? Between 31 and 40 inclusive.
Q: If we were to insert the value 15 into the tree above, where would it need to go to preserve the order property? In the same node as (12), making (12,15). NOTE: this is formally defined in section 3.3.1.
Let's formalize our intuition by introducing some notation: a node with d children is called a d-node; the values stored at the node are labelled k1, k2, ..., k(d-1), and the children are labelled v1, v2, ..., vd.
Q: Given the two properties, size and depth, what can we say about the height h of a 2-3-4 tree which stores n items? 2^h ≤ n+1 ≤ 4^h, so h is Θ(log n).

Relating 2-3-4 Trees to Red-Black Trees
The following properties must hold:
1. The root of the tree is black.
2. Every external node is black.
3. The children of a red node are black.
4. All external nodes have the same black depth (the same number of black ancestors).
A black node can have red children or black children; all red nodes have only black children.
Q: How can we relate black and red nodes to 2-, 3-, or 4-nodes in a 2-3-4 tree? Black nodes with no red children become 2-nodes, black nodes with 1 red child become 3-nodes, and black nodes with 2 red children become 4-nodes.

B-trees
B-trees are a generalization of 2-3-4 trees: multiway trees with all leaves at the same level and a varying number of children per node. A B-tree node can hold at most m keys and pointers to m+1 children. More formally, the following properties must hold in a B-tree of order m: 1) the root must hold at least 1 key and at most m keys; 2) every other node must hold between ⌊m/2⌋ keys and m keys; 3) all leaves must be at the same level.
Q: How does a 2-3-4 tree relate to a B-tree? A: A 2-3-4 tree is a B-tree of order 3.
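The BST property drives both SEARCH and INSERT; a minimal sketch (names are illustrative, and duplicate keys are ignored for simplicity):

```python
class BSTNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(x, k):
    """SEARCH: walk down from node x, going left or right by the BST property.
    O(height of tree) in the worst case."""
    while x is not None and x.key != k:
        x = x.left if k < x.key else x.right
    return x  # the node with key k, or None (NIL) if no such node exists

def insert(root, k):
    """INSERT: descend as in SEARCH and attach a new leaf; also O(height)."""
    if root is None:
        return BSTNode(k)
    if k < root.key:
        root.left = insert(root.left, k)
    else:
        root.right = insert(root.right, k)
    return root

root = None
for k in [30, 15, 56, 3, 27]:
    root = insert(root, k)
print(search(root, 27) is not None)  # True
print(search(root, 40))              # None
```

Inserting keys in sorted order degenerates this tree into a chain of height n, which is exactly what motivates the balanced 2-3-4 trees above.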
Augmenting Data Structures
An "augmented" data structure is simply an existing data structure modified to store additional information and/or perform additional operations.
RANK(k): given a key k, what is its "rank", i.e., its position among the elements in the data structure?
SELECT(r): given a rank r, which key has this rank?
For example, if our set of values is 3, 15, 27, 30, 56, then RANK(15) = 2 and SELECT(4) = 30.
Let's look at 3 different ways we could do this.
1. Use 2-3-4 trees without modification. Queries: simply do an in-order traversal of the tree, keeping track of the number of nodes visited, until the desired rank or key is reached.
Q: What will be the time for a query? At worst O(n), because we may have to visit every node if the rank is n.
2. Augment 2-3-4 trees so that each node has an additional field rank[x] that stores its rank in the tree.
Q: What will be the time for a query? For RANK(k), the same as SEARCH, or O(log n). For SELECT(r), we can use binary search: O(log n).
Q: Will the other operations (SEARCH/INSERT/DELETE) take any longer? INSERT will now require updating the rank fields. How quickly can we do this? We need to update the rank of all subsequent nodes, therefore O(n).
Q: What is the problem? Could we do better? This is still inefficient for INSERT and DELETE.
3. Augment the tree in a more sophisticated way.
Q: How can we augment the nodes of 2-3-4 trees so that we can perform our queries faster? What would help us with questions about rank? Augment each node x with an additional field size[x] that stores the number of keys in the subtree rooted at x (including x itself).
Q: How is this related to "rank"? Suppose we have a 2-node with a single element x. What is RANK(x) in terms of the keys that come before x in the tree? RANK(x) = 1 + the number of keys that come before x in the tree.
Q: Now, with respect to the subtree rooted at x, what is the relative RANK(x)? RANK(x) = SIZE(v1) + 1, where v1 is the left child.
Q: Now suppose we have a 4-node X with elements x1, x2, x3. What is the relative rank of xi with respect to the subtree rooted at X?
RELATIVE-RANK(xi) = 1 + the number of elements that precede xi in the subtree = (SIZE(v1) + 1) + (SIZE(v2) + 1) + ... + (SIZE(vi) + 1) = Σ_{j=1..i} (SIZE(vj) + 1).
So the rank of a node is related to the sizes of the subtrees rooted at neighbouring nodes.
Let's look at rank queries more closely. Computing RANK(k): given key k, do a SEARCH(k), keeping track of the rank of the current node. Each time you go down a level, you must add the sizes of the subtrees to the left that you skipped, plus the keys that you skipped. Think of this as the "relative" rank of the key to the left of the subtree you are exploring.
Q: When we find x, how do we determine its true rank? Take the current rank so far, r, and add the size of x's left child. Note that we did not deal with degenerate cases (such as when k does not belong to the tree), but it is easy to modify the algorithm to treat those cases.

Graph Theory
A graph G = (V,E) consists of a set of vertices (or nodes) V and a set of edges E. Let n = |V|, the number of nodes, and m = |E|, the number of edges.
In a directed graph, each edge is an ordered pair of nodes (u,v) (so (u,v) is considered different from (v,u)); self-loops (edges of the form (u,u)) are allowed. In an undirected graph, each edge is a set of two vertices {u,v} (so {u,v} and {v,u} are the same), and self-loops are disallowed. In a weighted graph, each edge e ∈ E is assigned a real number w(e) called its weight.
An undirected graph is said to be connected if there is a path between every two vertices. A directed graph is said to be strongly connected if, for any two vertices u, v, there is a directed path from u to v.

Operations on Graphs
Some standard operations on graphs are:
Add/Remove a vertex/edge.
Edge query: given two vertices u, v, find out if the edge (u,v) (if the graph is directed) or the edge {u,v} (if it is undirected) is in the graph.
Neighborhood: given a vertex u in an undirected graph, get the set of vertices v such that {u,v} is an edge.
In-neighborhood, out-neighborhood: given a vertex u in a directed graph, get the set of vertices v such that (v,u) (or (u,v), respectively) is an edge.
Degree, in-degree, out-degree: compute the size of the neighborhood, in-neighborhood, or out-neighborhood, respectively.
Traversal: visit each vertex of a graph to perform some task.

Applications of Graphs
The WWW (Google!), scheduling, chip design, network analysis (such as transportation flow, cellular coverage, electrical current, etc.), flow charts, explanatory schematics.

Data Structures for Graphs
There are two reasonable data structures to store graphs: a) adjacency matrix, b) adjacency list.
An adjacency matrix: let V = {v1, v2, ..., vn}. We store information about the edges of the graph in an n x n array A, where A[i,j] = 1 if (vi,vj) ∈ E and A[i,j] = 0 otherwise.
What do we know about the matrix for undirected graphs? The matrix will be symmetric (A[i,j] and A[j,i] will always hold the same value). If the graph is weighted, A[i,j] stores the weight of the edge (vi,vj) if that edge exists, and either 0 or 1 if the edge doesn't exist, depending on the application.
Complexity of the adjacency matrix data structure? Storage requires Θ(n^2) space; edge queries are Θ(1).
An adjacency list: use a 1-dimensional array A of size n. At entry A[i], we store a linked list of the neighbours of vi. If the graph is directed, we store only the out-neighbours. Each edge (vi,vj) of the graph is represented by exactly one linked-list node in the directed case, and by exactly two linked-list nodes in the undirected case.
Complexity? Storage required is Θ(n+m). Edge queries can be made in Θ(log(maximum degree)). How? The lists are stored as balanced trees.
We now look at two common ways to traverse a graph.

Breadth-First Search (BFS)
Intuition: BFS(vertex v). To start, all vertices are unmarked.
1. Start at v.
2. Visit v and mark it as visited.
3. Visit every unmarked neighbour ui of v and mark each ui as visited.
4. Mark v finished.
5. Recurse on each vertex marked as visited, in the order they were visited.
Q: What information about the graph can a BFS be used to find? The shortest path from v to any other vertex u and its distance d(u); whether the graph is connected; the number of connected components.
Q: What is an appropriate ADT to implement a BFS given an adjacency list representation of a graph? A FIFO (first in, first out) queue, which has the operations ENQUEUE(Q,v), DEQUEUE(Q), ISEMPTY(Q).
Q: What information will we need to store along the way? The current node, the predecessor, and the distance so far from v.
Complexity of BFS(G,v): Q: How many times is each node ENQUEUEd? At most once, when it is black, at which point it is coloured green. Therefore, the adjacency list of each node is examined at most once, so the total running time of BFS is O(n+m), i.e., linear in the size of the adjacency list.
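The BFS outline above can be sketched with a FIFO queue over an adjacency-list graph; this is a minimal version (the dict-based representation and names are illustrative):

```python
from collections import deque

def bfs(adj, v):
    """BFS from v over an adjacency list (dict: vertex -> list of neighbours).
    Stores exactly the bookkeeping the notes mention: for each reached vertex,
    its predecessor and its distance from v. Runs in O(n + m)."""
    pred = {v: None}
    dist = {v: 0}
    Q = deque([v])                 # the FIFO queue ADT
    while Q:                       # while not ISEMPTY(Q)
        u = Q.popleft()            # DEQUEUE(Q)
        for w in adj[u]:
            if w not in dist:      # each vertex is enqueued at most once
                pred[w] = u
                dist[w] = dist[u] + 1
                Q.append(w)        # ENQUEUE(Q, w)
    return pred, dist

adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
pred, dist = bfs(adj, 1)
print(dist[4])          # 2: BFS finds shortest (fewest-edge) paths
print(len(dist) == 4)   # True: every vertex was reached, so the graph is connected
```

Running BFS repeatedly from each still-unreached vertex counts the connected components.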
Depth-First Search (DFS)
Intuition: DFS(G,v). All vertices and edges start out unmarked.
1. Walk as far as possible away from v, visiting vertices.
2. If the current vertex has not been visited, mark it as visited and mark the edge that is traversed as a DFS edge.
3. Otherwise, if the vertex has been visited, mark the traversed edge as a back-edge and back up to the previous vertex.
4. When the current vertex has only visited neighbours left, mark it as finished (white).
5. Backtrack to the first vertex that is not finished.
Just like BFS, DFS constructs a spanning tree and gives connected-component information. Q: Does it find the shortest path between v and all other vertices? No.

Implementing a DFS
Q: Which ADT would be the most helpful for implementing DFS given an adjacency list representation of G? A stack S to store edges, with the operations PUSH(S,(u,v)), POP(S), ISEMPTY(S).

Complexity of DFS(G,s)
Q: How many times does DFS visit the neighbours of a node? Once... when the node is green and the neighbour is black. Therefore, the adjacency list of each vertex is visited at most once, so the total running time is, just like for BFS, Θ(n+m), i.e., linear in the size of the adjacency list. Note that the gold edges, the DFS edges, form a tree called the DFS-tree.
We can classify edges according to how they are traversed during the search:
1. Tree-edges are the edges in the DFS tree.
2. Back-edges are edges from a vertex u to an ancestor of u in the DFS tree.
3. Forward-edges are edges from a vertex u to a descendant of u in the DFS tree.
4. Cross-edges are all the other edges that are not part of the DFS tree (from a vertex u to another vertex v that is neither an ancestor nor a descendant of u in the DFS tree). Forward- and cross-edges only apply to directed graphs.
Q: How can a DFS be used to determine whether a graph G has any cycles? It is not hard to see that there is a cycle in G if and only if there are any back-edges when DFS is run.
Q: How can we detect back-edges during a DFS? Add a test after the line marked by (*) in DFS: if the colour of v is green instead of black, then we know that we have seen v before on the current path from the source s. This means that the edge (u,v) is a back-edge and therefore forms a cycle.

Minimum Cost Spanning Trees (MCSTs)
Let G = (V,E) be a connected, undirected graph with edge weights w(e) for each edge e ∈ E. A tree is a subset of edges A ⊆ E such that A is connected and does not contain a cycle.
Q: How many edges must any spanning tree contain? n - 1 edges, where |V| = n.
A minimum cost spanning tree (MCST) is a spanning tree A such that the sum of the weights is minimized over all possible spanning trees B.
Q: What is an example of an application in which you would want to be able to find an MCST?
A: Imagine you have a network of computers that are connected by various links. Some of these links are faster, or more reliable, than others. You might want to pick a minimal set of links that connects every computer (in other words, a spanning tree) such that these links are overall the best (they have minimum cost). Once you have found these links, you never have to use the remaining slower, or less reliable, links.

Prim's Algorithm
Prim's algorithm uses a Priority Queue ADT.
Algorithm overview:
1. Start with an empty set A (A will contain the edges of the MCST).
2. Pick an arbitrary start vertex s.
3. Add (s, NIL) to A.
4. Pick the edge e ∈ E of smallest weight such that exactly one endpoint is a vertex in A.
5. Add this edge e to A.
6. Repeat 4. and 5. until every vertex is an endpoint of an edge in A.
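The back-edge test described above gives cycle detection almost for free; a recursive sketch using the notes' colour scheme (black = unvisited, green = on the current path, white = finished — function names are illustrative):

```python
def has_cycle(adj):
    """Cycle detection in a directed graph via DFS.
    A cycle exists iff some edge (u, v) reaches a green vertex v, i.e. a
    back-edge to a vertex still on the current path from the source."""
    colour = {u: "black" for u in adj}

    def dfs(u):
        colour[u] = "green"
        for v in adj[u]:
            if colour[v] == "green":               # (*) back-edge: v is an ancestor of u
                return True
            if colour[v] == "black" and dfs(v):
                return True
        colour[u] = "white"                        # all neighbours visited: u is finished
        return False

    return any(colour[u] == "black" and dfs(u) for u in adj)

print(has_cycle({1: [2], 2: [3], 3: [1]}))  # True (1 -> 2 -> 3 -> 1)
print(has_cycle({1: [2], 2: [3], 3: []}))   # False
```

The outer `any` restarts DFS from each still-black vertex, so disconnected graphs are handled too; total time stays Θ(n+m).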
Q: How do we do this? Define 2 arrays:
p, such that p[u] contains a vertex v such that (v,u) ∈ E and w(v,u) is minimized over all v ∈ A adjacent to u;
priority, such that priority[u] contains w(p[u], u).
Define a priority queue Q containing vertices and their priorities.
Algorithm outline using priority queues:
1. A = {}
2. For each vertex u, assign NIL to p[u].
3. For each vertex u, assign ∞ to priority[u].
4. Initialize a priority queue Q by inserting (u, ∞) for each u ∈ V.
Then, while Q is not empty:
5. EXTRACT-MIN(), i.e., remove the vertex u with the smallest priority.
6. Insert (p[u], u) into A (assuming that p[u] ≠ NIL).
7. For each vertex v adjacent to u, if v ∈ Q and w(u,v) is smaller than priority[v], then DECREASE-PRIORITY(v, w(u,v)) and assign u to p[v].
More formally, a priority queue consists of a set of elements, each of which has a priority. The operations:
1. INSERT(x,p): insert an element x in the set, with priority value p.
2. ISEMPTY(): return whether the priority queue is empty.
3. EXTRACT-MIN(): remove and return an element x with the smallest priority value.
4. DECREASE-PRIORITY(x,p): decrease the priority of x to p (x is already in the queue).
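The outline above can be sketched with Python's heapq. One caveat: heapq has no DECREASE-PRIORITY, so this version pushes a fresh entry and skips stale ones on extraction — a common workaround, not the course's pseudocode (names and the dict-of-lists graph format are assumptions):

```python
import heapq

def prim(adj, s):
    """Prim's algorithm on a weighted adjacency list (dict: u -> list of (v, w)).
    Returns the MCST as a set of (p[u], u) edges."""
    p = {u: None for u in adj}              # p[u]: cheapest known neighbour inside A
    priority = {u: float("inf") for u in adj}
    priority[s] = 0
    Q = [(0, s)]
    in_A, A = set(), set()
    while Q:
        _, u = heapq.heappop(Q)             # EXTRACT-MIN
        if u in in_A:
            continue                         # stale entry: u was already extracted
        in_A.add(u)
        if p[u] is not None:
            A.add((p[u], u))                 # insert (p[u], u) into A
        for v, w in adj[u]:
            if v not in in_A and w < priority[v]:
                priority[v] = w              # stands in for DECREASE-PRIORITY(v, w(u,v))
                p[v] = u
                heapq.heappush(Q, (w, v))
    return A

adj = {1: [(2, 1), (3, 4)], 2: [(1, 1), (3, 2)], 3: [(1, 4), (2, 2)]}
print(sorted(prim(adj, 1)))   # [(1, 2), (2, 3)]: total weight 3
```

The lazy-deletion trick keeps the O(m log n) bound: every edge pushes at most one heap entry.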
Q: Where are the bottleneck complexities for each of these data structures (implementing priority queues)?
1. Unsorted list: takes time Θ(n) for EXTRACT-MIN in the worst case.
2. Sorted list (by priorities): takes time Θ(n) for INSERT in the worst case.
3. 2-3-4 tree (key values are priorities): INSERT and EXTRACT-MIN take time Θ(log n).
4. Direct addressing: (we'll see this later) if the universe U of priorities is small and the priorities are all distinct, then we can store an element with priority k in the kth cell of an array. INSERT takes time Θ(1); EXTRACT-MIN requires time Θ(|U|) in the worst case (we have to look at each location to find the first nonempty one).

Heaps for Priority Queues
A heap is a binary tree T of elements with priorities such that the following heap properties hold:
1. T is complete: every level of the tree is full except perhaps the bottom one, which fills up from left to right.
2. For each node x in T, if x has a left child, then p(x) ≤ p(left(x)), and if x has a right child, then p(x) ≤ p(right(x)).
We can conclude a few immediate facts about heaps from the definition:
1. The root has minimum priority.
2. Every subtree of a heap is also a heap (in particular, an empty tree is a heap).
3. Since heaps are complete, if a heap contains n nodes, then its height h is Θ(log n).

Storing heaps
Traditionally, a heap is stored using an array A and an integer heap_size that stores the number of elements currently in the heap (i.e., the number of nonempty entries in A). In general, if element x is stored at A[i], then left(x) is stored at A[2i] and right(x) is stored at A[2i+1]. If the size of the array is close to the number of elements in the heap, then this data structure is extremely space-efficient because we don't have to store any pointers.

Implementing priority queues with heaps
We can perform the priority queue operations on a heap as follows.
INSERT: increment heap_size and add the new element at the end of the array.
Q: Are we done? No. The result might violate the heap property.
Q: How can we fix this? Percolate the element up (exchanging it with its parent) until its priority is no smaller than the priority of its parent.
Q: What is the worst-case complexity? Θ(height) = Θ(log n).
EXTRACT-MIN: decrement heap_size and remove the first element of the array.
Q: How can we make the heap valid? Move the last element in the array to the first position (so the heap now has the right "shape"), and percolate this element down until its priority is no greater than the priorities of both its children.
HEAPIFY(A,i) percolates the element stored at A[i] downwards until the subtree rooted at A[i] is a heap. The running time, as with EXTRACT-MIN, is Θ(log n).

Building a heap
Q: Why are we guaranteed that the preconditions for HEAPIFY are met before each call? Because each item in the second half of the array is already a heap (it's a leaf).
Q: How many calls do we make to HEAPIFY? O(n) calls.
Q: How long does each one take? O(log n) time.
Therefore, we get a bound of O(n log n). But in fact, we can do better by analyzing more carefully. We call HEAPIFY on each subtree of height ≥ 1, and HEAPIFY runs in time proportional to the height of that subtree. Therefore, we can estimate the total running time as:
O( Σ_{h=1..log n} h × (number of subtrees of height h) )
Q: What does the summation approach? The sum goes up to log n because the height of the whole tree is Θ(log n), and there are at most n/2^h subtrees of height h. Therefore, using the fact that
Σ_{h=0..∞} h/2^h ≤ 2,
building a heap takes O(n).

Complexity of Prim's Algorithm
Q: What would you expect the complexity of DECREASE-PRIORITY to be? O(log n).
Q: How many times does the while loop iterate? At most once for every vertex in the graph, and each edge causes at most one DECREASE-PRIORITY.
Q: What is the complexity of building the initial heap? O(n).
Therefore, the worst-case running time is O(m log n).

Heap Sort
How can we use a heap to sort an array? HEAP-SORT(array A): build a heap out of A, then repeatedly EXTRACT-MIN; the elements come out in sorted order.
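HEAPIFY, bottom-up heap building, and heap sort can be sketched as follows (0-indexed arrays, so children sit at 2i+1 and 2i+2 rather than the notes' 2i and 2i+1; names are illustrative):

```python
def heapify(A, i, heap_size):
    """Percolate A[i] down until the subtree rooted at i is a (min-)heap.
    Runs in time proportional to the height of that subtree."""
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < heap_size and A[left] < A[smallest]:
            smallest = left
        if right < heap_size and A[right] < A[smallest]:
            smallest = right
        if smallest == i:
            return
        A[i], A[smallest] = A[smallest], A[i]
        i = smallest

def build_heap(A):
    """Call HEAPIFY bottom-up; items in the second half are leaves, hence already
    heaps. Total time O(n) by the summation argument above."""
    for i in range(len(A) // 2 - 1, -1, -1):
        heapify(A, i, len(A))

def heap_sort(A):
    """Build a heap, then repeatedly EXTRACT-MIN."""
    build_heap(A)
    out = []
    heap_size = len(A)
    while heap_size > 0:
        out.append(A[0])          # EXTRACT-MIN: the root has minimum priority
        heap_size -= 1
        A[0] = A[heap_size]       # move the last element to the root...
        heapify(A, 0, heap_size)  # ...and percolate it down
    return out

print(heap_sort([5, 3, 8, 1, 2]))  # [1, 2, 3, 5, 8]
```

Building the heap is O(n), and each of the n extractions costs O(log n), so heap sort is O(n log n) overall.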
This study source was downloaded by 100000883358294 from CourseHero.com on 01-05-2025 12:42:31 GMT -06:00
The Disjoint Set ADT
Objects: a collection of nonempty disjoint sets S = S1, S2, ..., Sk, i.e., each Si is a nonempty set that has no element in common with any other Sj. In mathematical notation: Si ∩ Sj = ∅ for every i ≠ j. Each set is identified by a unique element called its representative.
Operations:
MAKE-SET(x): given an element x that does not already belong to one of the sets, create a new set {x} that contains only x (and assign x as the representative of that new set).
FIND-SET(x): given an element x, return the representative of the set that contains x (or some special value, like NIL, if x does not belong to any set).
UNION(x,y): given two distinct elements x and y, let Sx be the set that contains x and Sy be the set that contains y. Two things need to be updated to maintain a valid collection of sets:
1. Remove Sx and Sy from the collection (since all the sets must be disjoint) and add their union Sx ∪ Sy.
2. Pick a representative for the new set.
Note: if both x and y belong to the same set already (i.e., Sx = Sy), nothing is done by this operation.
Q: How can we test whether u and v are connected? FIND-SET(u) == FIND-SET(v)?

Kruskal's Algorithm
Intuition: grow an MCST A by repeatedly adding the "lightest" edge that does not create a cycle in A.

Data Structures for Disjoint Sets
1. Arrays 2. Linked lists 3. Trees
Arrays: one position for each element. Each position stores the element and an index to the set representative. For example, consider the collection of sets {{A}, {B,E}, {C,F,G}, {D}}.
Operations:
MAKE-SET(x): store x in the next available location, O(1).
FIND-SET(x): the index value stored with x indicates the representative of the set containing x, O(1).
UNION(x,y): let X be the index value stored with x and Y be the index value stored with y. If X ≠ Y, then go through every element in the array and replace every index value equal to Y with X. Takes time O(n), since the whole array is scanned.
Worst-case sequence complexity for m operations:
Upper bound: in terms of n, the number of elements, each operation takes time at most O(n) ≤ O(m). Therefore, any sequence of m operations takes time O(m^2).
Lower bound: do m/2 MAKE-SETs followed by m/2 - 1 UNIONs; each UNION takes time Ω(m/2) (the number of elements), so the total is Ω((m/2)(m/2 - 1)) = Ω(m^2).

Circularly-linked list
MAKE-SET(x): make a new set by creating a new linked list with element x, O(1).
FIND-SET(x): in the worst case, we need to traverse every link in a list before we find the first element. Note that x has a pointer to the next element, but doesn't know which list it belongs to. O(length of list).
UNION(x,y): append list y to list x and update the pointers between the two lists, O(1).
Worst-case sequence complexity for m operations:
Upper bound: the number of elements in the structure at any point in any sequence of m operations is ≤ m. The complexity of each operation in a sequence is O(m), so the total time is O(m^2).
Lower bound: perform m/4 MAKE-SETs with different elements. Then do m/4 - 1 UNIONs constructing one list with m/4 elements, followed by m/2 FIND-SETs on the second element in the list, so that each one requires time Ω(m/4). Total: Ω(m^2).
Q: How can we improve this?

Linked list with front pointer (each node points back to the head of its list)
MAKE-SET(x): create a new linked list with element x, O(1).
FIND-SET(x): follow x's pointer back to the head, O(1).
UNION(x,y): append list y to the end of list x and update the pointers. Complexity: Θ(length of list y).
Worst-case sequence complexity for m operations:
Upper bound: same as before.
Lower bound: perform m/2 MAKE-SETs with different elements. Then do m/2 - 1 UNIONs, each joining a set of size one with the growing list. Total time: Ω(m^2).
Problem? Still very inefficient if we have long unions.

Linked list with front pointer and "union-by-weight"
Now we keep track of the number of elements in each list.
Q: Are MAKE-SET and FIND-SET affected? No.
Q: What about UNION? We always append the smaller set to the longer one. This is called "union-by-weight". The "weight" of a set is simply its size.
Worst-case sequence complexity for m operations:
Upper bound: let n be the number of MAKE-SET operations in the sequence. For some arbitrary element x, how many times can x's back pointer be updated? Consider when this happens: only when list x is UNIONed with a set that is no smaller. So each time x's back pointer is updated, the resulting set must have size at least twice |list x|. So the number of times that x's back pointer can be updated is at most log n. This is true for every element x, so all the back-pointer updates together take O(n log n). Since the time for the other operations is still O(1), and there are m operations in total, the total time is O(m + n log n).
Trees
MAKE-SET(x): just create a new tree with root x, O(1).
FIND-SET(x): simply follow "parent" pointers back to the root of x's tree. Complexity: O(depth of x).
UNION(x,y): just make the root of one of the trees point to the root of the second one. Need to call FIND-SET() for each of x and y.
Lower bound: just like for the linked list with back pointers but no size, we can create a tree that is just one long chain with m/4 elements.
Q: How can we improve the trees data structure representation of disjoint sets?
Add "union-by-weight": keep track of the weight (i.e., size) of each tree and always append the smaller tree to the larger one when performing UNION. The complexities of MAKE-SET and UNION are still O(1) and O(max(height(tree x), height(tree y))), respectively.
What is the complexity of FIND-SET? Suppose that during a sequence of m operations there are n MAKE-SET operations; then the maximum height of any tree is O(log n).
Q: What does this tell us about the running time of any individual FIND-SET operation? O(log n). So the total time is O(m log n).
Q: Can we do better? Add path compression.
When performing FIND-SET(x), keep track of the nodes visited on the path from x to the root of the tree by using a stack or queue; once the root is found, update the parent pointers of each of those nodes to point directly to the root.
Q: How does this affect the complexity of the FIND-SET operation? It doubles it the first time, and makes it constant the rest of the time.
Q: Can we do better? Add "union-by-rank" and path compression. We'll store the rank of a tree at its root.
Operations:
MAKE-SET(x): same as before, and set rank(x) = 0.
UNION(x,y): we know rank(tree x) and rank(tree y). Which root of tree x and tree y becomes the new root? The node with the higher rank.
What is the rank of the new tree? The same as the larger rank, unless the two roots have the same rank, in which case pick either one as the new root and increase its rank by 1.
FIND-SET: nothing changes.
Q: Worst-case sequence complexity? Consider a sequence of operations including n MAKE-SET ops, at most n - 1 UNIONs, and f FIND-SET ops; the worst-case running time of the whole sequence is O(m log* n).
Q: What is log*? It is the number of times that you need to apply log to n until the answer is less than 1.
Complexity of Kruskal's Algorithm
Suppose that we implement the disjoint set ADT using linked lists with union-by-weight (remember, these linked lists have a pointer back to the representative element).
Q: How many MAKE-SETs do we do? n. Complexity? O(n).
Q: How many FIND-SETs do we do? 2m, since we visit the endpoints of an edge 2 times. Complexity? O(m).
Q: How many UNIONs do we do? At most m. Complexity? At most O(n log n).
So the worst-case complexity of Kruskal's is O(m log m + n + m + n log n). The bottleneck is the sorting (priority queue) step; therefore, the complexity is O(m log m).

Amortized Analysis
"Time for the person who processes orders for the whole day, not for each order."
Two cases:
1. We need to know the complexity of each operation in the sequence; then we can simply analyze the worst-case complexity of each operation, or
2. We need to know only the complexity for processing the entire sequence.
Amortized analysis relates the complexity of a sequence of operations to the worst-case complexity of each operation:
worst-case sequence complexity ≤ m × (worst-case complexity of a single operation)
amortized sequence complexity = (worst-case sequence complexity) / m
Example: suppose that we want to maintain a linked list of elements under the operations INSERT, DELETE, SEARCH, starting from an initially empty list. Recall that for a linked list containing k elements: INSERT belongs to Θ(1), DELETE belongs to Θ(k), and SEARCH belongs to Θ(k).
Q: If we perform a sequence of m operations, what is the worst-case total time for all the operations? Each operation acts on a list with at most m elements, so the total is O(m^2).
Motivating examples:
A mail-order company employs a person to read customers' letters and process each order: we care about the time taken to process a day's worth of orders, for example, and not the time for each individual order.
A symbol table in a compiler is used to keep track of information about variables in the program being compiled: we care about the time taken to process the entire program, i.e., the entire sequence of variables, and not about the time taken for each individual variable.

MULTIPOP
Extend the standard Stack ADT with MULTIPOP(S,k): remove the top k elements from the stack. The time complexity of each PUSH and POP operation is O(1). The time complexity of MULTIPOP(S,k) is simply proportional to the number of elements removed, min(k, |S|).

The Aggregate Method
We compute the worst-case sequence complexity of a sequence of operations and divide by the number of operations in the sequence.
For our MULTIPOP example, consider performing a total of n operations from among PUSH, POP, and MULTIPOP, on a stack that is initially empty.
Naive approach: the stack can contain at most n elements; therefore, the cost of each operation is at most O(n), for a total of O(n^2). This gives us an average of O(n).
Better: since there can be at most n PUSH operations, at most n elements can ever be removed by POP and MULTIPOP operations, which implies that the total time taken for the entire sequence is at most O(n). This gives us that each operation takes on average O(1) time.

The Accounting Method
The cost to us for each operation is the operation's worst-case running time. We charge the customer for each operation such that we cover our costs with what we earn in charges. Aim for a total charge as close as possible to the total cost; this will give us the best estimate of the true complexity.
For our MULTIPOP example, the cost of each operation (representing the time complexity of each operation) is as follows:
cost(PUSH(S,x)) = 1
cost(POP(S)) = 1
cost(MULTIPOP(S,k)) = min(k, |S|)
Each element can take part in at most two operations (one PUSH and one POP or MULTIPOP). Therefore, the total "cost" for one element is 2, so we will assign charges as follows:
charge(PUSH) = 2
charge(POP) = 0
charge(MULTIPOP) = 0
Accounting Method Checklist:
- Shown that each operation can be paid for? Yes.
- Shown that the total credit can never be negative? Each element has a credit of 1 while it is in the stack, and there can never be a negative number of elements in the stack... so, yes.
- Shown that the total charge for any sequence of m operations is an upper bound on the total cost for that particular sequence? The total charge for m operations is at most 2m, so the total cost is O(m)... done.
- Computed the amortized complexity? Dividing by the number of operations gives us an amortized complexity of O(1) for each operation.

Another Example: Dynamic Arrays
An array of fixed size, and two operations:
APPEND: store an element in the first free position of the array.
DELETE: remove the element in the last occupied position of the array.
Q: What is an advantage of this data structure? Accessing elements is very efficient.
Disadvantage? The size of the structure is fixed. We can get around the disadvantage with the following idea: when trying to APPEND an element to an array that is full, create a new array that is twice the size of the old one, copy all the elements from the old array into the new one, and then carry out the APPEND operation.
Amortized cost? Think about the cost of performing n APPEND operations, starting from an empty array of size 1, in the amortized sense.
The Aggregate Method: suppose we have an empty dynamic array, and consider adding 6 elements while accumulating the total cost: each APPEND costs 1, plus the cost of copying whenever the array doubles. Over n APPENDs the copying costs total at most 1 + 2 + 4 + ... < 2n, so the total cost for n insertions is O(n), and the amortized cost per APPEND is O(1).
The Accounting Method:
Q: What is the cost per APPEND if we do not need to increase the array size? 1.
Q: What does the cost per APPEND depend on if we need to increase the array size? The current array size.
Q: So the cost per APPEND if we need to increase the array size is: 1 per copied element, plus 1 to add the new element.
Q: What should we charge for each APPEND? Guess: 1, 2, 3 or 4? 3.
Consider again APPENDing to an empty array with this charge set-up: the cost for the first operation is 1 and the charge is 3, which leaves a credit of 2.
Q: What might be an advantage of the accounting method over the aggregate method?
m *from
This study source was downloaded by 100000883358294 worst-case of a single operation
CourseHero.com on 01-05-2025 12:42:31 GMT -06:00
An advantage of the accounting method over the aggregate Recap:
method is that different operations can be assigned different - t(n) is the number of steps algorithm A takes on a
charges, representing more closely the actual amortized cost of specific input of size n
each operation. - Twc(n) is the greatest number of steps algorithm A
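The APPEND operation being charged above can be sketched with an explicit cost counter (an illustrative sketch; the class and field names are ours, not the course's):

```python
class DynamicArray:
    """APPEND with doubling; self.cost tracks the actual work done,
    which the accounting method covers with a flat charge of 3."""
    def __init__(self):
        self.size = 0          # number of used cells
        self.capacity = 0      # number of allocated cells
        self.data = []
        self.cost = 0          # actual cost incurred so far

    def append(self, x):
        if self.size == self.capacity:
            # grow: copy every existing element, cost 1 each
            self.capacity = max(1, 2 * self.capacity)
            self.data = self.data[:self.size] + \
                [None] * (self.capacity - self.size)
            self.cost += self.size
        self.data[self.size] = x
        self.size += 1
        self.cost += 1         # cost 1 to place the new element
```

Running n APPENDs and comparing self.cost against the total charge 3n confirms that the charge always covers the cost.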
Recap:
- t(n) is the number of steps algorithm A takes on a specific input of size n.
- Twc(n) is the greatest number of steps algorithm A will take on any input of size n.
- tn(x) is the number of steps taken by algorithm A on a specific input x; tn is a random variable.
- Tavg(n) = E[tn] is the average number of steps algorithm A will take on inputs of size n.

Mini Stats Review
The sample space of a fair die is S = {1, 2, 3, 4, 5, 6}. An event is, for example, rolling a 5. A random variable assigns a number to each outcome, e.g., the value of the face. With a fair die, each of 1, ..., 6 has probability 1/6, so the expected value of the face is (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.

Upper and Lower Bounds
Recall that Twc(n) = max{t(x) : x has size n}.
Q: What do we need to do to prove that Twc(n) has an upper bound of f(n)? Show that EVERY member of the set of inputs of size n takes at most f(n) time.
Q: If f(n) is an upper bound of Twc(n), how do we denote this? Twc(n) ∈ O(f(n))
Q: What about lower bounds? That at least g(n) steps are needed in the worst case.
Q: What do we need to do to prove that g(n) is a lower bound? For each n, find an example input that needs g(n) steps.
Q: How do we denote that g(n) is a lower bound for Twc(n)? Twc(n) ∈ Ω(g(n))

Average-case complexity
Let A be an algorithm. Consider the sample space Sn of all inputs of size n, with a fixed probability distribution where each input is equally likely. Let tn : Sn -> N be the random variable such that tn(x) is the number of steps taken by algorithm A on a specific input x. Then E[tn] is the expected number of steps taken by algorithm A on inputs of size n. The average-case time complexity of A on inputs of size n is defined as Tavg(n) = E[tn].

Direct Addressing
Recall that a dictionary is an ADT that supports the following operations on a set of elements with well-ordered key-values:
1. INSERT 2. DELETE 3. SEARCH
Q: If we know the key-values are integers from 1 to K, what is a fast and simple way to represent a dictionary? Just allocate an array of size K and store an element with key i in the ith cell of the array. This is called direct addressing.
Q: What is the asymptotic worst-case time for each of the important operations? Θ(1)
Q: What may be a major problem with direct addressing, though? If the key-values are not bounded by a reasonable number, the array will be huge.
Example 1: Reading a text file. Suppose we want to keep track of the frequency of each letter in a text file.
Q: Why is this a good application of direct addressing? There are only 256 ASCII characters, so we could use an array of 256 cells, where the ith cell holds the count of the number of occurrences of the ith ASCII character in our text file.
Example 2: Reading a data file of 32-bit integers. Suppose we want to keep track of the frequency of each number.
Q: Is this a good or bad application of direct addressing? Bad: the array would have to be of size 2^32, which is pretty big!
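Example 1 above can be sketched directly (an illustrative sketch; the function name is ours):

```python
def letter_frequencies(text: bytes) -> list:
    """Direct addressing: the key (a byte value 0..255) is used
    directly as the array index, so each update is Theta(1)."""
    table = [0] * 256          # one cell per possible key
    for b in text:
        table[b] += 1
    return table
```

For instance, letter_frequencies(b"banana") records a count of 3 in the cell indexed by the byte value of "a".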
Example: Let's use LIST_SEARCH as algorithm A.
LIST_SEARCH(L, k)
  z := head(L)
  while (z != NIL) and (key(z) != k) do
    z := next(z)
  return z
Consider any input (L, k) where L has length n. Suppose we partition Sn into n + 1 events such that Ai represents the event that the key k appears in the ith position; if k is not in the list L, let A0 represent this event. Now the sample space on inputs of size n is Sn = {Ai | 0 <= i <= n}. If we assume that each of these n + 1 possible inputs is equally likely, then each input has probability 1/(n + 1) and the number of steps for each event is:
tn(Ai) = { 2i        if i != 0
         { 2n + 1    if i = 0
Then:
Tavg(n) = (1/(n+1)) * ((2n + 1) + sum of 2i for i = 1..n) = (n^2 + 3n + 1)/(n + 1)
This value is somewhat smaller (as expected) than Twc(n).
Q: Asymptotically, are Tavg(n) and Twc(n) the same? Yes, they are both Θ(n). For some algorithms, Twc and Tavg are different even when written asymptotically.

Hashing
Q: Could it be possible to take advantage of the nature of the data in Example 2?
A: Build a hash table:
- Suppose that the key-values of our elements come from a universe (or set) U,
- allocate a table (or an array) of size m (where m < |U|),
- use a hash function h : U -> {0, ..., m-1} to decide where to store the element with key-value x. I.e., x gets stored in position h(x) of the hash table.
The idea is that when we want to access key k, instead of looking up T[k] in a table T, we look in T[h(k)].
Example. Most compilers or interpreters of computer programs construct a symbol table to keep track of the identifiers used in the input program.
Q: If m < |U|, then what must be true about some k1 and k2 in U? There must be k1, k2 ∈ U such that k1 != k2 and yet h(k1) = h(k2). This is called a collision. I.e., an element with key k1 and an element with key k2 can both be stored at position h(k1) = h(k2) (see figure).
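A collision is easy to demonstrate concretely (a sketch; the table size m = 8 is illustrative, and since |U| > m the pigeonhole principle guarantees such pairs exist):

```python
m = 8                 # table size (illustrative)

def h(k):
    return k % m      # division-method hash

# Two distinct keys that hash to the same slot: a collision.
k1, k2 = 3, 11
# h(3) = 3 and h(11) = 3, so both elements would land in slot 3.
```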
Q: How can we resolve a collision?
Example. You have a small address book and one of the letters fills up, for example, the "N"s. Where do you add the next "N" entry?
Closed addressing: write a little note explaining where to find the rest of the N names (explicit directions).
Open addressing: flip to the next page (overflow, a general rule).

Closed Addressing
We handle collisions by enlarging the storage capacity at the relevant entry in the hash table.
Q: How can we resolve a collision using Closed Addressing? Chaining. Idea: store a linked list at each entry in the hash table.

Complexity of Chaining
Assume we can compute the hash function h in constant time.
INSERT takes Θ(1): given an element a, we just compute i = h(key(a)) and insert a at the head of the linked list in position i of the hash table.
DELETE takes Θ(1): assume the list is doubly-linked and we are given a pointer to the element.
SEARCH(S, k) is a little more complicated.

Worst-Case Complexity of Search
Q: What happens if |U| > m(n - 1)? Any given hash function will put at least n key-values in some entry of the hash table.
Q: What is the worst case? Every entry of the table has no elements except for one entry, which has n elements. Then we have to search to the end of that list to find k. This takes time Θ(n) (not so good).

Average-Case Complexity of Search
For the average case, the sample space is the set of elements that have key-values from U. For any probability distribution on U, we assume that our hash function h obeys a property called simple uniform hashing. This means that if Ai is the event (subset of U) {k ∈ U | h(k) = i} (in words, Ai is all those members k of U for which the hash function computes the same index i), then Pr(Ai) = 1/m.
Q: What does Pr(Ai) = 1/m mean? Each entry in the hash table is used just as much as any other. So the expected number of elements in any entry is n/m. We will call this the load factor, denoted by a. This assumption may or may not be accurate depending on U, h and the probability distribution on U. Ideally, the hash table gets evenly used for whatever distribution of keys we are dealing with.
Let T be a random variable which counts the number of elements checked when searching for key k, and let Li be the length of the list at entry i in the hash table. Then the average-case running time of SEARCH under simple uniform hashing with chaining is O(a).
Q: Does this make intuitive sense? Yes! If we are searching for a key that is not in the hash table, we need to traverse one complete linked list whose average size is a.
Q: What if k ∈ Lj for some j? Then we want to consider k chosen uniformly at random from the elements in hash table T. So the probability that k is the jth element in bucket i, conditional on the fact that h(k) = i, is uniform, i.e., equals 1/Li.
Depending on the application, we can sometimes consider a to be constant, since we can make m bigger when we know that n will be large. When this is the case, SEARCH takes time O(1) on average.

Examples of Hash Functions
Recall the definition of simple uniform hashing: if Ai (subset of U) is the event {k ∈ U | h(k) = i}, then Pr(Ai) = 1/m.
Q: How can we choose a good hash function? For uniformly distributed keys in the range 1 through K (for large K), the following methods come close to simple uniform hashing.

The Division Method
First choose a natural number m. Then the hash function h(k) is just h(k) = k mod m.
m: a prime number not too close to a power of 2.
Q: Given that the set of all keys is the set of all possible integer values from 0 to 2^32 - 1, what are some possible hash functions if m = 1,024 (2^10)? Take key mod m, which gives the last 10 bits of the key, or divide the key by a power of 2, which keeps its leading bits.
Q: We would like to use the division method for hashing identifiers (e.g., for a symbol table). How can we do this? Turn the identifiers (strings of text) into positive integers. We can do this by:
- considering each string to be a number in base 128 (if there are 128 text characters);
- representing each character x by a number from 1 through 128, denoted num(x);
- representing a string of characters xn x(n-1) ... x1 uniquely by the number num(xn)*128^(n-1) + ... + num(x2)*128 + num(x1).

The Multiplication Method
h(k) = floor(m * fract(k * A))
A: a fixed real number
fract(x): the fractional part of a real number x
m: a power of 2
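The two methods, plus the base-128 string encoding, can be sketched as follows (an illustrative sketch; the choice A = (sqrt(5) - 1)/2 is a common one but is our assumption here, and we simply use ord(ch) in place of num(x)):

```python
import math

def h_division(k, m):
    # m should be a prime not too close to a power of 2
    return k % m

def h_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    # h(k) = floor(m * fract(k * A)); m is typically a power of 2,
    # A is a fixed real in (0, 1)
    return math.floor(m * ((k * A) % 1.0))

def string_key(s):
    # treat the identifier as a number in base 128
    n = 0
    for ch in s:
        n = 128 * n + (ord(ch) % 128)
    return n
```

With this, an identifier is first converted by string_key and the result is fed to h_division or h_multiplication.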
Open Addressing
Each entry in the hash table stores only one element; in particular, we only use open addressing when n < m.
Q: How can we insert a new element if we get a collision?
Answer:
- Find a new location to store the new element.
- We need to know where we put it as well.
- Search a well-defined sequence of other locations in the hash table until we find one that's not full. This sequence is called a probe sequence.

Probe Sequences (Linear Probing)
si = (h(k) + i) mod m for i = 0, 1, 2, ...
Q: What is the problem with linear probing? Clustering.
Q: What happens when we hash to something within a group of filled locations?
- We have to probe the whole group until we reach an empty slot.
- We increase the size of the cluster, resulting in two keys that didn't necessarily share the same "home" location ending up with almost identical probe sequences.

Non-Linear Probing
Non-linear probing includes schemes where the probe sequence does not involve steps of fixed size. Example: quadratic probing, where the probe sequence is calculated as:
si = (h(k) + i^2) mod m for i = 0, 1, 2, ...
Q: Now what problem may occur? Probe sequences will still be identical for elements that hash to the same home location.
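The two probe sequences above can be written as generators (a sketch; "home" stands for the value h(k)):

```python
def linear_probes(home, m):
    # s_i = (h(k) + i) mod m for i = 0, 1, 2, ...
    for i in range(m):
        yield (home + i) % m

def quadratic_probes(home, m):
    # s_i = (h(k) + i^2) mod m for i = 0, 1, 2, ...
    for i in range(m):
        yield (home + i * i) % m
```

Note that the first m quadratic probes need not all be distinct (for m = 8 and home 0 they begin 0, 1, 4, 1, ...), which is one reason the choice of m and probing scheme matters.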
Double Hashing
In double hashing we use a different hash function h2(k) to calculate the step size. The probe sequence is:
sj = (h(k) + j * h2(k)) mod m for j = 0, 1, 2, ...
Also, we want to choose h2 so that, if h(k1) = h(k2) for two keys k1, k2, it won't be the case that h2(k1) = h2(k2).

Analysis of Open Addressing
Notice that in open addressing, INSERT and SEARCH take the same amount of work. Let's consider the complexity of INSERT for a key k. It's not hard to come up with worst-case situations where the above types of open addressing require O(n) time for INSERT. To simplify the analysis of the average case, we make some assumptions:
- there is a hash table with m locations;
- the hash table contains n elements and we want to insert a new key k;
- consider a random probe sequence for k, that is, its probe sequence is equally likely to be any permutation of (0, 1, ..., m - 1).

Computing the Expected Average Complexity
Let T denote the number of probes performed in the INSERT. Let Aj denote the event that every location up to the j-th probe is occupied. Intuition: the probability that the j-th probed location is also occupied, given that the first j - 1 were, is (n - j + 1)/(m - j + 1), because there are n - j + 1 occupied locations that we haven't seen among the remaining m - j + 1 slots that we haven't seen. Hence we can calculate the expected value of T, which gives the average-case complexity of INSERT.

Proof of Kruskal's Algorithm
KRUSKAL-MST(G = (V, E), w : E -> Z)
  A := {}
  insert the edges into a priority queue Q
  for each vertex v in V
    MAKE-SET(v)
  end (for)
  while (Q not empty)
    e = (u, v) = EXTRACT-MIN(Q)
    if FIND-SET(u) =/= FIND-SET(v)
      UNION(u, v)
      A := A U {e}
    end (if)
  end (while)
END KRUSKAL-MST

Theorem. If G is connected, Kruskal's algorithm produces an MST (given by the edges in A).
Proof. We'll prove the following three claims:
(1) "The graph F = (V, A) is a forest" is a loop invariant,
(2) when the while loop terminates, F = (V, A) is connected,
(3) the sum of weights of edges in A is the same as the sum of weights in an MST.
Taken all together these imply that A is an MST, because (1) and (2) show A is a spanning tree and (3) shows its weight is minimal.
1. The quoted statement is true the first time we enter the loop body, since no edges have been added to A (so there are no cycles in A). If the statement is true upon entering the loop body then it will be true at the end of the body, because an edge e is added to A only if its endpoints are in different components of F, in which case adding e cannot produce a new cycle.
2. Suppose F is not connected and consider its components; say they have vertex sets V1, V2, ..., Vk. Since G is connected, there must be at least one edge of E with an endpoint in V1 and the other endpoint in one of the Vi, i = 2, ..., k. If there are several such edges, let e be the one of least weight. By our assumption, it is an element of E but not of A. However, since the algorithm considers edges in order of weight, when it considers e there would be no other edges from V1 to the "outside world" (i.e., vertices of the other Vi), so the two endpoints of e would at that moment be in different components and e would have been added to A, a contradiction.
3. Let T be (the edge set of) an MST. If A = T we are done. If A != T, let e ∈ T be the smallest-weight edge in T \ A. Note that A ∪ {e} contains a single cycle C, and all its other edges have weight at most that of e (because the algorithm adds edges in order of weight and would otherwise have added e to A before one of the other edges of C). Also, some edge f of C is in A but not in T (because T is a tree). Let A2 := A \ {f} ∪ {e}. This has one more edge in common with T than A did, and weight(A) <= weight(A2). Continuing in this way we can keep producing spanning trees which have more and more edges in common with T. Eventually we have a tree Ar with all edges in common with T (i.e., T = Ar). On the other hand, the total weights were non-decreasing too:
weight(A) <= weight(A2) <= weight(A3) <= ... <= weight(Ar) = weight(T).
Thus A must have the same weight as T.

Asymptotic Notation
Let d(n), e(n), f(n) and g(n) be functions mapping nonnegative integers to nonnegative reals. Then:
1. If d(n) is O(f(n)), then a*d(n) is O(f(n)), for any constant a > 0.
2. If d(n) is O(f(n)) and e(n) is O(g(n)), then d(n) + e(n) is O(f(n) + g(n)).
3. If d(n) is O(f(n)) and e(n) is O(g(n)), then d(n)*e(n) is O(f(n)*g(n)).
4. If d(n) is O(f(n)) and f(n) is O(g(n)), then d(n) is O(g(n)).
5. If f(n) is a polynomial of degree d, then f(n) is O(n^d).
6. n^x is O(a^n) for any fixed x > 0 and a > 1.
7. log(n^x) is O(log n) for any fixed x > 0.
8. (log n)^x is O(n^y) for any fixed constants x > 0 and y > 0.

Traversals
Pre-order traversal: node, left, right. Yes, it is possible that a pre-order traversal of a (max-)heap returns the keys in decreasing order. This occurs when at every node the key of the left child is greater than the key of the right child (if it exists). For any set of keys, there is exactly one heap that obeys this property.
In-order traversal: left, node, right. No, it is not possible that an in-order traversal of a heap returns the keys in sorted order. This is because any heap containing at least 3 keys will contain a parent x with children y and z such that x > y and x > z, but an in-order traversal of this subtree will necessarily visit x between y and z, which is not in sorted order.
Post-order traversal: left, right, node.
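The KRUSKAL-MST pseudocode above translates directly into code (a sketch; it uses a plain tree-based disjoint set without union-by-weight, and heapq as the priority queue on edge weight):

```python
import heapq

def kruskal_mst(n, edges):
    """Kruskal's algorithm following the pseudocode in these notes.
    n: number of vertices, labelled 0..n-1; edges: list of (w, u, v)."""
    parent = list(range(n))          # tree-based disjoint set: MAKE-SET

    def find_set(x):
        while parent[x] != x:        # follow parent pointers to the root
            x = parent[x]
        return x

    pq = list(edges)
    heapq.heapify(pq)                # priority queue of edges by weight
    A = []
    while pq:
        w, u, v = heapq.heappop(pq)  # EXTRACT-MIN
        ru, rv = find_set(u), find_set(v)
        if ru != rv:                 # endpoints in different components
            parent[ru] = rv          # UNION
            A.append((w, u, v))
    return A
```

On a connected graph this returns n - 1 edges forming an MST; the cycle-creating edge is skipped because both endpoints already share a representative.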
