DAA Lecture Notes
Ref: https://cp-algorithms.com/data_structures/disjoint_set_union.html
Ref: https://web.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/16/Small16.pdf
Q4 What are path compression and union by rank? What is the disadvantage of the array implementation of Union-Find?
Data structure Disjoint Set Union or DSU. Often it is also called Union Find because of its two
main operations.
This data structure provides the following capabilities. We are given several elements, each of
which is a separate set.
A DSU will have an operation to combine any two sets, and it will be able to tell in which set a
specific element is.
The classical version also introduces a third operation, it can create a set from a new element.
Thus the basic interface of this data structure consists of only three operations:
● make_set(v) - creates a new set consisting of the new element v
● union_sets(a, b) - merges the two specified sets (the set in which the element a is
located, and the set in which the element b is located)
● find_set(v) - returns the representative (also called leader) of the set that contains the
element v.
This representative is an element of its corresponding set. It is selected in each set by the data
structure itself (and can change over time, namely after union_sets calls). This representative can
be used to check if two elements are part of the same set or not. a and b are exactly in the same
set, if find_set(a) == find_set(b). Otherwise they are in different sets.
As described in more detail later, the data structure allows you to do each of these operations in
almost O(1) time on average.
In the following image you can see the representation of such trees.
In the beginning, every element starts as a single set, therefore each vertex is its own tree.
Then we combine the set containing the element 1 and the set containing the element 2
Then we combine the set containing the element 3 and the set containing the element 4. And
in the last step, we combine the set containing the element 1 and the set containing the element 3.
For the implementation this means that we will have to maintain an array parent that stores, for
each element, a reference to its immediate ancestor in the tree.
#include <iostream>
#include <vector>
#include <string>
using namespace std;

vector<int> parent;   // parent[v] = immediate ancestor of v (naive version)

void make_set(int v) { parent[v] = v; }

int find_set(int v) {
    if (v == parent[v])
        return v;
    return find_set(parent[v]);      // walk up until the root is reached
}

void union_sets(int a, int b) {
    a = find_set(a);
    b = find_set(b);
    if (a != b)
        parent[b] = a;               // attach one representative under the other
}

int main() {
    // Number of elements
    int n;
    cout << "Enter the number of elements: ";
    cin >> n;
    parent.resize(n + 1);
    for (int i = 1; i <= n; i++) make_set(i);

    int q;
    cout << "Enter the number of operations (union/find): ";
    cin >> q;
    while (q--) {
        string op;
        cout << "Enter operation (union/find): ";
        cin >> op;
        if (op == "union") {
            int u, v;
            cout << "Enter two elements to union: ";
            cin >> u >> v;
            union_sets(u, v);
            cout << "Union of " << u << " and " << v << " completed.\n";
        } else if (op == "find") {
            int u;
            cout << "Enter the element to find the set: ";
            cin >> u;
            cout << "The representative of " << u << " is: " << find_set(u) << endl;
        } else {
            cout << "Invalid operation. Use 'union' or 'find'.\n";
        }
    }
    cout << "Final parent array: ";
    for (int i = 1; i <= n; i++) cout << find_set(i) << " ";
    cout << "\n";
    return 0;
}
However, in this naive implementation a tree can degenerate into a long chain, so a single call of
find_set can take O(n) time.
This is far away from the complexity that we want to have (nearly constant time).
Therefore we will consider two optimizations that will allow us to significantly accelerate
the work.
Explanation
Input:
Enter the number of elements: 5
Enter the number of operations (union/find): 4
Enter operation (union/find): union
Enter two elements to union: 1 2
Enter operation (union/find): union
Enter two elements to union: 3 4
Enter operation (union/find): union
Enter two elements to union: 2 3
Enter operation (union/find): find
Enter the element to find the set: 4
Output:
Union of 1 and 2 completed.
Union of 3 and 4 completed.
Union of 2 and 3 completed.
The representative of 4 is: 1
Final parent array: 1 1 1 1 5
Visualization
https://www.cs.usfca.edu/~galles/visualization/DisjointSets.html
Path compression optimization
If we call find_set(v) for some vertex v, we actually find the representative p for all
vertices that we visit on the path between v and the actual representative p. The trick is
to make the paths for all those nodes shorter, by setting the parent of each visited
vertex directly to p.
You can see the operation in the following image. On the left there is a tree, and on the
right side there is the compressed tree after calling find_set(7), which shortens the
paths for the visited nodes 7, 5, 3 and 2.
int find_set(int v) {
    if (v == parent[v])
        return v;
    return parent[v] = find_set(parent[v]);
}
The simple implementation does what was intended: first find the representative of the
set (root vertex), and then in the process of stack unwinding the visited nodes are
attached directly to the representative.
This simple modification of the operation already achieves the time complexity
O(logn) per call on average (here without proof). There is a second modification, that
will make it even faster.
Union by size / union by rank
In the naive implementation a tree can degenerate into a chain of depth O(n). With this
optimization we will avoid this by choosing very carefully which tree gets attached.
There are many possible heuristics that can be used. Most popular are the following
two approaches: in the first one we use the size of the tree as the rank,
and in the second one we use the depth of the tree (more precisely, an upper bound
on the tree depth, because the depth will get smaller when applying path compression).
In both approaches the essence of the optimization is the same: we attach the tree with
the lower rank to the one with the bigger rank.
void make_set(int v) {
    parent[v] = v;
    size[v] = 1;
}

void union_sets(int a, int b) {
    a = find_set(a);
    b = find_set(b);
    if (a != b) {
        if (size[a] < size[b])
            swap(a, b);        // make a the root of the larger tree
        parent[b] = a;
        size[a] += size[b];
    }
}
And here is the implementation of union by rank based on the depth of the trees:
void make_set(int v) {
    parent[v] = v;
    rank[v] = 0;
}

void union_sets(int a, int b) {
    a = find_set(a);
    b = find_set(b);
    if (a != b) {
        if (rank[a] < rank[b])
            swap(a, b);        // make a the root of the deeper tree
        parent[b] = a;
        if (rank[a] == rank[b])
            rank[a]++;
    }
}
Both optimizations are equivalent in terms of time and space complexity. So in practice
you can use any of them.
Time complexity
If we combine both optimizations, path compression with union by size / rank, a single find_set
call can still take O(log n) in the worst case, but if we do m such calls back to back we will end up
with an average time of O(α(n)) per call, where α(n) is the inverse Ackermann function, which
grows extremely slowly.
Also, it's worth mentioning that DSU with union by size / rank, but without path
compression, works in O(log n) time per query.
The linked list representation of disjoint sets is one way to implement the
disjoint-set data structure. In this representation, each set is maintained as a
linked list, and every element in the set points to its parent or the head of the list
(which represents the set leader). Below is a detailed explanation of this
approach:
Key Components
1. make_set(x):
○ Creates a new set with a single element x.
○ x becomes the head of the list and points to itself.
2. find_set(x):
○ Returns the representative (leader) of the set containing x.
○ Traverse the linked list to find the head node.
3. union_sets(a, b):
○ Merges the sets containing a and b.
○ Append the smaller linked list to the larger one (union by size).
○ Update all the elements in the smaller list to point to the leader of the
larger list.
Code Implementation
#include <iostream>
#include <unordered_map>
using namespace std;

// Linked list representation: every element stores its leader and the next
// element of its list; each leader also stores the tail and size of its list.
unordered_map<int, int> leader, nxt, tail, sz;

void make_set(int x) {
    leader[x] = x; nxt[x] = -1; tail[x] = x; sz[x] = 1;
}

int find_set(int x) { return leader[x]; }

void union_sets(int a, int b) {
    a = find_set(a); b = find_set(b);
    if (a == b) return;
    if (sz[a] < sz[b]) swap(a, b);     // union by size: append smaller to larger
    for (int cur = b; cur != -1; cur = nxt[cur])
        leader[cur] = a;               // re-point every element of the smaller list
    nxt[tail[a]] = b;                  // splice the two lists together
    tail[a] = tail[b];
    sz[a] += sz[b];
}

int main() {
    // Example usage
    for (int i = 1; i <= 5; i++) make_set(i);
    union_sets(1, 2);
    union_sets(3, 4);
    union_sets(2, 3);
    cout << "Representative of 4: " << find_set(4) << endl;
    return 0;
}
1. make_set(x):
○ Creates a new node for x and initializes its leader to itself.
2. find_set(x):
○ Accesses the leader node through the leader pointer.
3. union_sets(a, b):
○ Finds the leaders of the two sets.
○ Appends the second list to the first and updates the leader pointers
of all elements in the second list.
4. display_set(x):
○ Traverses the linked list starting from the leader and prints all
elements in the set.
Advantages:
● Simple to implement.
● Intuitive representation of sets.
Disadvantages:
● The find_set operation is slow (O(n)) due to the need to traverse the
entire list.
● Updating all elements in one set during union_sets can be inefficient.
Conclusion
The linked list representation of disjoint sets is useful for understanding the basic
concept, but it is inefficient compared to more advanced implementations like the
forest representation with union by rank and path compression, which achieve
nearly constant time for both union and find.
Forest representation
The forest representation is a highly efficient way to implement disjoint sets. In
this approach, each set is represented as a tree, where every node points to its
parent. The root of the tree serves as the representative of the set. Two critical
optimizations, union by rank and path compression, ensure that operations are
nearly constant time.
Key Concepts
Operations
1. make_set(x):
○ Creates a new set with a single element x.
○ The parent of x is set to itself, and its rank is initialized to 0.
2. find_set(x):
○ Returns the representative of the set containing x.
○ Uses path compression to make future queries faster by linking all
nodes on the path directly to the root.
3. union_sets(a, b):
○ Merges the sets containing a and b.
○ Uses union by rank to attach the smaller tree to the root of the larger
tree.
Code Implementation
#include <iostream>
#include <vector>
using namespace std;

vector<int> parent, rnk;   // rnk is used instead of rank to avoid std::rank

void make_set(int v) {
    parent[v] = v;
    rnk[v] = 0;
}

int find_set(int v) {
    if (v == parent[v])
        return v;
    return parent[v] = find_set(parent[v]);   // path compression
}

void union_sets(int a, int b) {
    a = find_set(a);
    b = find_set(b);
    if (a != b) {
        // Union by rank
        if (rnk[a] < rnk[b]) {
            parent[a] = b;        // attach smaller tree to larger tree
        } else if (rnk[a] > rnk[b]) {
            parent[b] = a;
        } else {
            parent[b] = a;
            rnk[a]++;             // increase rank when ranks are equal
        }
    }
}

// Example usage
int main() {
    int n = 10;                   // number of elements
    parent.resize(n + 1);
    rnk.resize(n + 1);
    for (int i = 1; i <= n; i++) {
        make_set(i);
    }
    union_sets(1, 2);
    union_sets(2, 3);
    cout << "Representative of 3: " << find_set(3) << endl;
    return 0;
}
Advantages
1. Efficiency:
○ Both find_set and union_sets operations run in nearly constant
time, specifically O(α(n)), where α(n) is the inverse Ackermann
function (extremely small for all practical inputs).
2. Scalability:
○ Works well for large datasets and dynamic connectivity problems.
Example Execution
● Initial State:
○ make_set(1), make_set(2), ..., make_set(7) create individual
sets:
■ parent: [1, 2, 3, 4, 5, 6, 7]
■ rank: [0, 0, 0, 0, 0, 0, 0]
● Union Operations:
○ union_sets(1, 2): Attach 2 to 1, increase rank of 1:
■ parent: [1, 1, 3, 4, 5, 6, 7]
■ rank: [1, 0, 0, 0, 0, 0, 0]
○ union_sets(2, 3): Attach 3 to 1:
■ parent: [1, 1, 1, 4, 5, 6, 7]
○ union_sets(5, 6) and union_sets(4, 5): Merge 4, 5, 6 into one set
(6 is attached to 5, whose rank becomes 1, and then 4 is attached to 5):
■ parent: [1, 1, 1, 5, 5, 5, 7]
● Path Compression:
○ After find_set(3), the parent of 3 points directly to 1.
This diagram shows three disjoint sets represented as trees in a forest structure:
Applications
This representation is widely used in scenarios where quick union and find
operations are critical, for example Kruskal's minimum spanning tree algorithm,
dynamic connectivity queries, and cycle detection in undirected graphs.
The electrical cables are expensive, and digging ditches for the cables,
or stretching the cables in the air is expensive as well. The terrain can
certainly be a challenge, and then there is perhaps a future cost for
maintenance that is different depending on where the cables end up.
After such a graph is created, the Minimum Spanning Tree (MST) can be
found, and that will be the most effective way to connect these villages
to the electrical grid.
And this is actually what the first MST algorithm (Borůvka's algorithm)
was made for in 1926: to find the best way to connect the historical
region of Moravia, in the Czech Republic, to the electrical grid.
Ref:
https://www.w3schools.com/dsa/dsa_theory_mst_minspantree.php
Spanning Tree:
A spanning tree is a tree in which we have N nodes (i.e., all the nodes present in
the original graph) and N-1 edges, and all nodes are reachable from each other.
For the above graph, if we try to draw a spanning tree, the following illustration
will be one:
We can draw more spanning trees for the given graph. Two of them are like the
following:
Note: A point to remember is that a graph may have more than one spanning
tree.
All the above spanning trees contain some edge weights. For each of them, if
we add the edge weights we can get the sum for that particular tree. Now, let’s
try to figure out the minimum spanning tree:
Among all possible spanning trees of a graph, the minimum spanning tree is the
one for which the sum of all the edge weights is the minimum.
Let’s understand the definition using the given graph drawn above. Until now,
for the given graph we have drawn three spanning trees with the sum of edge
weights 18, 24, and 18. If we can draw all possible spanning trees, we will find
that the following spanning tree with the minimum sum of edge weights 16 is
the minimum spanning tree for the given graph:
Note: There may exist multiple minimum spanning trees for a graph, just as a graph
may have multiple spanning trees.
Prims MST
How it works:
1. Choose a random vertex as the starting point, and include it as the first
vertex in the MST.
2. Compare the edges going out from the MST. Choose the edge with the
lowest weight that connects a vertex among the MST vertices to a vertex
outside the MST.
3. Add that edge and vertex to the MST.
4. Keep doing steps 2 and 3 until all vertices belong to the MST.
In Prim’s Algorithm, we will start with an arbitrary node and mark it.
In each iteration we will mark a new vertex that is adjacent to the one that we have already
marked.
As a greedy algorithm, Prim’s algorithm will select the cheapest edge and mark the vertex. So we
will simply choose the edge with weight 1.
In the next iteration we have three options, edges with weight 2, 3 and 4.
So, we will select the edge with weight 2 and mark the vertex
So we will select the edge with weight 4 and we end up with the minimum spanning tree of total cost
7 (= 1 + 2 + 4).
#include <iostream>
#include <vector>
#include <queue>
#include <functional>
#include <utility>
using namespace std;

typedef pair<int, int> PII;   // (weight, vertex)

const int MAX = 100005;
vector<PII> adj[MAX];         // adj[u] holds (weight, v) pairs
bool marked[MAX];

int prim(int x) {
    // min-heap keyed on edge weight
    priority_queue<PII, vector<PII>, greater<PII>> Q;
    int minimumCost = 0;
    int y;
    PII p;
    Q.push(make_pair(0, x));
    while (!Q.empty()) {
        // Select the edge with minimum weight
        p = Q.top();
        Q.pop();
        x = p.second;
        if (marked[x] == true)    // skip vertices already in the MST
            continue;
        minimumCost += p.first;
        marked[x] = true;
        for (size_t i = 0; i < adj[x].size(); ++i) {
            y = adj[x][i].second;
            if (marked[y] == false)
                Q.push(adj[x][i]);
        }
    }
    return minimumCost;
}

int main() {
    int nodes, edges, x, y, weight, minimumCost;
    cin >> nodes >> edges;
    for (int i = 0; i < edges; ++i) {
        cin >> x >> y >> weight;
        adj[x].push_back(make_pair(weight, y));
        adj[y].push_back(make_pair(weight, x));
    }
    minimumCost = prim(1);
    cout << minimumCost << endl;
    return 0;
}
Illustration:
https://www.cs.usfca.edu/~galles/visualization/Kruskal.html
Kruskal's algorithm finds a minimum spanning forest of an undirected
edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. It is a
greedy algorithm that in each step adds to the forest the lowest-weight edge that will not
form a cycle. The key steps of the algorithm are sorting and the use of a disjoint-set
data structure to detect cycles. Its running time is dominated by the time to sort all of the
graph edges by their weight.
A minimum spanning tree of a connected weighted graph is a connected subgraph,
without cycles, for which the sum of the weights of all the edges in the subgraph is
minimal. For a disconnected graph, a minimum spanning forest is composed of a
minimum spanning tree for each connected component.
Kruskal's algorithm is a well-known algorithm for finding the minimum spanning tree of
a graph. It is a greedy algorithm that relies on the cut property: the minimum-weight edge
crossing any cut of the graph belongs to some minimum spanning tree.
The time complexity of Kruskal's Algorithm is O(E log E), where E is the number of edges
in the graph. This complexity comes from sorting all the edges; each union-find operation
afterwards takes nearly constant amortized time. The space complexity of the algorithm is
O(E), which is relatively high.
Kruskal's algorithm is popular in computer science for finding the minimum spanning
tree in a graph. A greedy algorithm selects the cheapest edge that does not form a cycle
in the graph. The following are some of the applications of Kruskal's algorithm:
● Network Design: Kruskal's algorithm can be used to design networks with the
least cost. It can be used to find the least expensive network connections that
can connect all the nodes in the network.
● Approximation Algorithms: Kruskal's algorithm can find approximate solutions
to several complex optimization problems. For example, MSTs computed this
way underlie the classic 2-approximation for the metric traveling salesman
problem and other NP-hard optimization problems.
● Image Segmentation: Image segmentation is partitioning an image into
multiple segments. Kruskal's algorithm can be used to break down an image
into its constituent parts in an efficient manner.
● Clustering: Clustering is grouping data points based on their similarity. Stopping
Kruskal's algorithm before the last k-1 merges partitions the points into k clusters
(single-linkage clustering).
Conclusion
Kruskal's Algorithm is a greedy algorithm for finding the Minimum Spanning Tree (MST)
of a connected, weighted graph. It works by selecting the edges with the smallest
weights and adding them to the spanning tree, provided they do not form a cycle.
Steps of Kruskal's Algorithm
● List all the edges in the graph and sort them in ascending order of their
weights.
● This ensures that the edges with the smallest weights are considered first.
● Continue adding edges until the spanning tree has exactly (V - 1) edges, where
V is the number of vertices.
Example Walkthrough
Graph: Vertices: {A, B, C, D}
(The edge list, the step-by-step execution, and the final MST for this example were
given as figures in the source.)
Complexity Analysis
1. Sorting Edges: O(E log E).
2. Union-Find Operations: nearly constant amortized time each, O(E α(V)) in total.
Ref: https://www.simplilearn.com/tutorials/data-structure-tutorial/kruskal-algorithm
Algorithm
The algorithm performs the following steps:
● Create a forest (a set of trees) initially consisting of a separate single-vertex tree
for each vertex in the input graph.
● Sort the graph edges by weight.
● Loop through the edges in ascending order of weight. For each edge: if it joins
two different trees, add it to the forest and combine the two trees into a single tree.
At the termination of the algorithm, the forest forms a minimum spanning forest of the graph.
Pseudocode
The following code is implemented with a disjoint-set data structure. It represents the
forest F as a set of undirected edges, and uses the disjoint-set data structure to
efficiently determine whether two vertices are part of the same tree.
algorithm Kruskal(G) is
    F := ∅
    for each v in G.V do
        MAKE-SET(v)
    for each {u, v} in G.E ordered by weight({u, v}), increasing do
        if FIND-SET(u) ≠ FIND-SET(v) then
            F := F ∪ { {u, v} }
            UNION(FIND-SET(u), FIND-SET(v))
    return F
Complexity
For a graph with E edges and V vertices, Kruskal's algorithm can be shown to run in
O(E log E) time, with simple data structures. This time bound is often written
instead as O(E log V), which is equivalent for graphs with no isolated vertices, because
for these graphs V/2 ≤ E < V², and the logarithms of V and E are again within a
constant factor of each other.
To achieve this bound, first sort the edges by weight using a comparison sort in O(E log
E) time. Once sorted, it is possible to loop through the edges in sorted order in constant
time per edge. Next, use a disjoint-set data structure, with a set of vertices for each
component, to keep track of which vertices are in which components. Creating this
structure, with a separate set for each vertex, takes V operations and O(V) time. The
final iteration through all edges performs two find operations and possibly one union
operation per edge. These operations take amortized time O(α(V)) time per operation,
giving worst-case total time O(E α(V)) for this loop, where α is the extremely slowly
growing inverse Ackermann function. This part of the time bound is much smaller than
the time for the sorting step, so the total time for the algorithm can be simplified to the
time for the sorting step.
In cases where the edges are already sorted, or where they have small enough integer
weight to allow integer sorting algorithms such as counting sort or radix sort to sort them
in linear time, the disjoint set operations are the slowest remaining part of the algorithm
and the total time is O(E α(V)).
Example
(The worked example was presented as a table of images with descriptions; see the
step-by-step walkthrough below.)
Proof of correctness
The proof consists of two parts. First, it is proved that the algorithm produces a
spanning tree. Second, it is proved that the constructed spanning tree is of minimal
weight.
Example 1:
Input Format:
V = 5, edges = { {0, 1, 2}, {0, 3, 6}, {1, 2, 3}, {1, 3, 8}, {1, 4, 5}, {4, 2, 7} }
Result: 16
Explanation: The minimum spanning tree for the given graph is drawn
below:
Example 2:
Input Format:
V = 5, edges = { {0, 1, 2}, {0, 2, 1}, {1, 2, 1}, {2, 3, 2}, {3, 4, 1}, {4, 2, 2} }
Result: 5
Solution:
In the previous article on the minimum spanning tree, we had already discussed that
there are two ways to find the minimum spanning tree for a given weighted and
undirected graph. Among those two algorithms, we have already discussed Prim’s
algorithm.
In this article, we will be discussing another algorithm, named Kruskal’s algorithm, that
is also useful in finding the minimum spanning tree.
Approach:
We will be implementing Kruskal’s algorithm using the Disjoint Set data structure that
we have previously learned.
Now, we know Disjoint Set provides two methods named findUPar() (this function helps
to find the ultimate parent of a particular node) and Union() (this basically helps to add
an edge between two nodes). To know more about these functionalities, do refer to
the article on Disjoint Set.
● First, we need to extract the edge information (if not given already) from the
given adjacency list in the format of (wt, u, v), where u is the current node, v is
the adjacent node, and wt is the weight of the edge between nodes u and v, and
we will store the tuples in an array.
● Then the array must be sorted in the ascending order of the weights so that
while iterating we can get the edges with the minimum weights first.
● After that, we will iterate over the edge information, and for each tuple, we will
apply the following operation:
● First, we will take the two nodes u and v from the tuple and check if
the ultimate parents of both nodes are the same or not using the
findUPar() function provided by the Disjoint Set data structure.
● If the ultimate parents are the same, we need not do anything to
that edge as there already exists a path between the nodes and we
will continue to the next tuple.
● If the ultimate parents are different, we will add the weight of the
edge to our final answer (i.e., the mstWt variable used in the following
code) and apply the union operation (i.e., either unionBySize(u, v)
or unionByRank(u, v)) with the nodes u and v. The union operation
is also provided by the Disjoint Set.
● Finally, we will get our answer (in the mstWt variable as used in the following
code) successfully.
Note: A point to remember is that if the graph is given as an adjacency list, we must
extract the edge information first. As the graph contains bidirectional edges, we can get a
single edge twice in our array (for example, (wt, u, v) and (wt, v, u), like (5, 1, 2) and
(5, 2, 1)). But we should not worry about that, as the Disjoint Set data structure will
automatically discard the duplicate one.
Note: This algorithm mainly contains the Disjoint Set data structure used to find the
minimum spanning tree of a given graph. So, we just need to know the data structure.
Ref:
https://takeuforward.org/data-structure/kruskals-algorithm-minimum-spanning-tree-g-47/
Unlike Prim's algorithm, Kruskal's algorithm can be used for such graphs that
are not connected, which means that it can find more than one MST, and that is
what we call a Minimum Spanning Forest.
Ref: https://www.w3schools.com/dsa/dsa_algo_mst_kruskal.php
With E as the number of edges in our graph, the time complexity for Kruskal's
algorithm is O(E log E).
We get this time complexity because the edges must be sorted before Kruskal's
can start adding edges to the MST. Using a fast algorithm like Quick Sort or
Merge Sort gives us a time complexity of
O(E⋅logE)
After the edges are sorted, they are all checked one by one, to see if they will
create a cycle, and if not, they are added to the MST.
Although it looks like a lot of work to check if a cycle will be created using the
find method, and then to include an edge to the MST using the union method,
this can still be viewed as one operation. The reason we can see this as just one
operation is that it takes approximately constant time. That means that the time
this operation takes grows very little as the graph grows, and so it does actually
not contribute to the overall time complexity.
Since the time complexity for Kruskal's algorithm only varies with the number of
edges, it is especially fast for sparse graphs where the ratio between the number of
edges and the number of vertices is relatively low.
Spanning tree
AD and CE are the shortest edges, with length 5, and AD has been arbitrarily chosen,
so it is highlighted.
CE is now the shortest edge that does not form a cycle, with length 5, so it is highlighted
as the second edge.
The next edge, DF with length 6, is highlighted using much the same method.
The next-shortest edges are AB and BE, both with length 7. AB is chosen arbitrarily,
and is highlighted. The edge BD has been highlighted in red, because there already
exists a path (in green) between B and D, so it would form a cycle (ABD) if it were
chosen.
The process continues to highlight the next-smallest edge, BE with length 7. Many more
edges are highlighted in red at this stage: BC because it would form the loop BCE, DE
because it would form the loop DEBA, and FE because it would form FEBAD.
Finally, the process finishes with the edge EG of length 9, and the minimum spanning
tree is found.
Minimality
We show that the following proposition P is true by induction: If F is the set of edges
chosen at any stage of the algorithm, then there is some minimum spanning tree that
contains F and none of the edges rejected by the algorithm.
ref: https://www.geeksforgeeks.org/kruskals-minimum-spanning-tree-algorithm-greedy-algo-2/
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed
so far. If a cycle is not formed, include this edge; else, discard it.
3. Repeat step 2 until there are (V-1) edges in the spanning tree.
Kruskal's algorithm to find the minimum cost spanning tree uses the
greedy approach. The Greedy Choice is to pick the smallest weight edge
that does not cause a cycle in the MST constructed so far. Let us
understand it with an example.
Input Graph:
The graph contains 9 vertices and 14 edges. So, the minimum spanning
tree formed will be having (9 – 1) = 8 edges.
After sorting:
Weight  Source  Destination
1       7       6
2       8       2
2       6       5
4       0       1
4       2       5
6       8       6
7       2       3
7       7       8
8       0       7
8       1       2
9       3       4
10      5       4
11      1       7
14      3       5
Now pick all edges one by one from the sorted list of edges
Step 6: Pick edge 8-6. Since including this edge results in the cycle,
discard it. Pick edge 2-3: No cycle is formed, include it.
Add edge 2-3 in the MST
Step 7: Pick edge 7-8. Since including this edge results in the cycle,
discard it. Pick edge 0-7. No cycle is formed, include it.
Add edge 0-7 in MST
Step 8: Pick edge 1-2. Since including this edge results in the cycle,
discard it. Pick edge 3-4. No cycle is formed, include it.
Add edge 3-4 in the MST
Note: Since the number of edges included in the MST equals (V - 1), the
algorithm stops here.
Kruskal's Algorithm
Kruskal's algorithm is a minimum spanning tree algorithm that takes a
graph as input and finds the subset of the edges of that graph which
form a tree that includes every vertex and has the minimum possible total edge weight.
We start from the edges with the lowest weight and keep adding edges
until we reach our goal.
Choose the edge with the least weight; if there is more than one, choose any one of them
Choose the next shortest edge that doesn't create a cycle and add it
Repeat until you have a spanning tree
The most common way to find this out is an algorithm called Union-Find.
The Union-Find algorithm divides the vertices into clusters and allows
us to check if two vertices belong to the same cluster or not and hence
decide whether adding an edge creates a cycle.
KRUSKAL(G):
A = ∅
For each vertex v ∈ G.V:
    MAKE-SET(v)
For each edge (u, v) ∈ G.E ordered by increasing weight(u, v):
    if FIND-SET(u) ≠ FIND-SET(v):
        A = A ∪ {(u, v)}
        UNION(u, v)
return A
Kruskal's algorithm
This algorithm was described by Joseph Bernard Kruskal, Jr. in 1956.
Kruskal's algorithm initially places all the nodes of the original graph isolated from
each other, to form a forest of single node trees, and then gradually merges these
trees, combining at each iteration any two of all the trees with some edge of the
original graph. Before the execution of the algorithm, all edges are sorted by weight (in
non-decreasing order). Then begins the process of unification: pick all edges from the
first to the last (in sorted order), and if the ends of the currently picked edge belong to
different subtrees, these subtrees are combined, and the edge is added to the answer.
After iterating through all the edges, all the vertices will belong to the same sub-tree,
and we will get the answer.
The simplest implementation
The following code directly implements the algorithm described above and has
O(M log M + N²) time complexity. Sorting the edges requires O(M log M) operations,
which is the same as O(M log N) up to a constant factor. Information about the subtree
to which a vertex belongs is maintained with the help of an array tree_id[]: for each
vertex v, tree_id[v] stores the number of the tree to which v belongs. For each edge,
whether its ends belong to different trees can be determined in O(1). Finally, the union
of two trees is carried out in O(N) by a simple pass through the tree_id[] array. Given
that the total number of merge operations is N−1, we obtain the asymptotic behavior of
O(M log N + N²).
struct Edge {
    int u, v, weight;
    bool operator<(Edge const& other) const {
        return weight < other.weight;
    }
};

int n;
vector<Edge> edges;

int cost = 0;
vector<int> tree_id(n);
vector<Edge> result;
for (int i = 0; i < n; i++)
    tree_id[i] = i;

sort(edges.begin(), edges.end());

for (Edge e : edges) {
    if (tree_id[e.u] != tree_id[e.v]) {
        cost += e.weight;
        result.push_back(e);
        // merge the two trees: relabel every vertex of one tree
        int old_id = tree_id[e.u], new_id = tree_id[e.v];
        for (int i = 0; i < n; i++)
            if (tree_id[i] == old_id)
                tree_id[i] = new_id;
    }
}
Proof of correctness
Why does Kruskal's algorithm give us the correct result?
Ref: https://cp-algorithms.com/graph/mst_kruskal.html#practice-problems
If the original graph was connected, then also the resulting graph will be connected.
Because otherwise there would be two components that could be connected with at
least one edge. Though this is impossible, because Kruskal would have chosen one of
these edges, since the ids of the components are different. Also the resulting graph
doesn't contain any cycles, since we forbid this explicitly in the algorithm. Therefore
the algorithm generates a spanning tree.
So why is the resulting spanning tree minimal? We show the following proposition: if F
is the set of edges chosen by the algorithm at any stage of the algorithm, then there
exists a MST that contains all edges of F.

The proposal is obviously true at the beginning: the empty set is a subset of any MST.

Now assume F is some edge set at some stage of the algorithm, T is a MST containing F,
and e is the new edge we want to add using Kruskal's algorithm.

If e generates a cycle, then we don't add it, and so the proposal is still true after this step.

In case T already contains e, the proposal is also true after this step.

In case T doesn't contain the edge e, then T + e contains a cycle C. This cycle contains
at least one edge f that is not in F (otherwise e would have generated a cycle in F
already). The edge f cannot have a smaller weight than e, since Kruskal's algorithm
sorts the edges and would have chosen f earlier. It also cannot have a bigger weight,
since that would make the total weight of T - f + e smaller than the total weight of T,
which is impossible since T is already a MST. This means that the weight of e has to be
the same as the weight of f. Therefore T - f + e is also a MST, and it contains all edges
of F + e.
This proves the proposal. Which means that after iterating over all edges the resulting
edge set will be connected, and will be contained in a MST, which means that it has to
be a MST already.
For an explanation of the MST problem and the Kruskal algorithm, first see the main
article on Kruskal's algorithm.
In this article we will consider the data structure "Disjoint Set Union" for implementing
Kruskal's algorithm, which will allow the algorithm to achieve the time complexity of
O(M log N).
Description
Just as in the simple version of the Kruskal algorithm, we sort all the edges of the
graph in non-decreasing order of weights. Then we put each vertex in its own tree (i.e.,
its own set) via calls to the make_set function, which takes a total of O(N). We iterate
through all the edges (in sorted order) and for each edge determine whether its ends
belong to different trees (with two find_set calls in O(1) each). Finally, we perform the
union of the two trees (sets), for which the DSU union_sets function is called, also in
O(1). So we get the total time complexity of O(M log N + N + M) = O(M log N).
Implementation
Here is an implementation of Kruskal's algorithm with Union by Rank.
vector<int> parent, rank;

void make_set(int v) {
    parent[v] = v;
    rank[v] = 0;
}

int find_set(int v) {
    if (v == parent[v])
        return v;
    return parent[v] = find_set(parent[v]);   // path compression
}

void union_sets(int a, int b) {
    a = find_set(a);
    b = find_set(b);
    if (a != b) {
        if (rank[a] < rank[b])
            swap(a, b);                       // union by rank
        parent[b] = a;
        if (rank[a] == rank[b])
            rank[a]++;
    }
}

struct Edge {
    int u, v, weight;
    bool operator<(Edge const& other) const {
        return weight < other.weight;
    }
};

int n;
vector<Edge> edges;

int cost = 0;
vector<Edge> result;
parent.resize(n);
rank.resize(n);
for (int i = 0; i < n; i++)
    make_set(i);

sort(edges.begin(), edges.end());

for (Edge e : edges) {
    if (find_set(e.u) != find_set(e.v)) {
        cost += e.weight;
        result.push_back(e);
        union_sets(e.u, e.v);
    }
}
Notice: since the MST will contain exactly N−1 edges, we can stop the for loop once we
have found that many.
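To see the snippet in action, here is a minimal self-contained driver around it. This is a
sketch rather than part of the original article; it uses Example 1 given earlier (expected
MST weight 16) and names the rank array rnk to avoid clashing with std::rank:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct Edge {
    int u, v, weight;
    bool operator<(const Edge& other) const { return weight < other.weight; }
};

vector<int> parent, rnk;

void make_set(int v) { parent[v] = v; rnk[v] = 0; }

int find_set(int v) {
    if (v == parent[v]) return v;
    return parent[v] = find_set(parent[v]);        // path compression
}

void union_sets(int a, int b) {
    a = find_set(a); b = find_set(b);
    if (a != b) {
        if (rnk[a] < rnk[b]) swap(a, b);           // union by rank
        parent[b] = a;
        if (rnk[a] == rnk[b]) rnk[a]++;
    }
}

int main() {
    int n = 5;
    // Example 1 from earlier: {u, v, weight} triples; expected MST weight 16.
    vector<Edge> edges = {{0, 1, 2}, {0, 3, 6}, {1, 2, 3},
                          {1, 3, 8}, {1, 4, 5}, {4, 2, 7}};
    parent.resize(n);
    rnk.resize(n);
    for (int i = 0; i < n; i++) make_set(i);

    sort(edges.begin(), edges.end());
    int cost = 0, taken = 0;
    for (const Edge& e : edges) {
        if (find_set(e.u) != find_set(e.v)) {
            cost += e.weight;
            union_sets(e.u, e.v);
            if (++taken == n - 1) break;           // MST is complete
        }
    }
    cout << cost << endl;                          // prints 16
    return 0;
}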
Dijkstra's algorithm
is an algorithm for finding the shortest paths between nodes in a weighted graph,
which may represent, for example, a road network. It was conceived by computer
scientist Edsger W. Dijkstra in 1956 and published three years later.
Dijkstra's algorithm finds the shortest path from a given source node to every
other node. It can be used to find the shortest path to a specific
destination node, by terminating the algorithm after determining the shortest path
to the destination node. For example, if the nodes of the graph represent cities,
and the costs of edges represent the average distances between pairs of cities
connected by a direct road, then Dijkstra's algorithm can be used to find the
shortest route between one city and all other cities. A common application of
shortest path algorithms is network routing protocols, most notably IS-IS
(Intermediate System to Intermediate System) and OSPF (Open Shortest Path
First). It is also employed as a subroutine in algorithms such as Johnson's
algorithm.
The algorithm uses a min-priority queue data structure for selecting the shortest
paths known so far. Before more advanced priority queue structures were
discovered, Dijkstra's original algorithm ran in Θ(|V|²) time, where |V| is the number
of nodes. Fredman & Tarjan (1984) proposed a Fibonacci heap priority queue to
optimize the running time complexity to Θ(|E| + |V| log |V|).
How it works:
1. Set initial distances for all vertices: 0 for the source vertex, and
infinity for all the others.
2. Choose the unvisited vertex with the shortest distance from the
start to be the current vertex. So the algorithm will always start
with the source as the current vertex.
3. For each of the current vertex's unvisited neighbor vertices,
calculate the distance from the source and update the distance if
the new, calculated, distance is lower.
4. We are now done with the current vertex, so we mark it as
visited. A visited vertex is not checked again.
5. Go back to step 2 to choose a new current vertex, and keep
repeating these steps until all vertices are visited.
6. In the end we are left with the shortest path from the source
vertex to every other vertex in the graph.
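These steps map directly onto an implementation with a min-priority queue. Below is a
compact C++ sketch; the small 4-vertex graph built in main is a hypothetical example,
not the graph from the figures:

#include <iostream>
#include <queue>
#include <vector>
using namespace std;

const long long INF = 1e18;

// Dijkstra with a min-priority queue: O((V + E) log V).
vector<long long> dijkstra(int n, const vector<vector<pair<int,int>>>& adj, int src) {
    vector<long long> dist(n, INF);
    priority_queue<pair<long long,int>, vector<pair<long long,int>>, greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});                       // (distance, vertex), smallest first
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;           // stale entry; u was already finished
        for (auto [v, w] : adj[u]) {
            if (dist[u] + w < dist[v]) {     // relax edge (u, v)
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}

int main() {
    int n = 4;
    vector<vector<pair<int,int>>> adj(n);    // adj[u] = list of (v, weight)
    auto addEdge = [&](int u, int v, int w) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    };
    addEdge(0, 1, 3); addEdge(0, 2, 8); addEdge(1, 3, 5); addEdge(2, 3, 2);
    for (long long d : dijkstra(n, adj, 0)) cout << d << " ";   // 0 3 8 8
    cout << endl;
    return 0;
}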
Let’s apply Dijkstra’s Algorithm for the graph given below, and find the shortest path
from node A to node C:
Solution:
1. Initially, all the distances from node A to the rest of the nodes are ∞, and the distance to A itself is 0.
2. Calculating the distance between node A and the immediate nodes (node B & node
D):
For node B,
Node A to Node B = 3
For node D,
Node A to Node D = 8
3. Choose the node with the shortest distance to be the current node from unvisited
nodes, i.e., node B. Calculating the distance between node B and the immediate nodes:
For node E,
4. Choose the node with the shortest distance to be the current node from unvisited
nodes, i.e., node D. Calculating the distance between node D and the immediate nodes:
For node E,
For node F,
5. Choose the node with the shortest distance to be the current node from unvisited
nodes, i.e., node E. Calculating the distance between node E and the immediate nodes:
For node C,
For node F,
6. Choose the node with the shortest distance to be the current node from unvisited
nodes, i.e., node F. Calculating the distance between node F and the immediate nodes:
For node C,
Node F to Node C = 10+3 = 13 (13 < 18 is TRUE: so, change the previous value)
So, after performing all the steps, we have the shortest path from node A to node C,
i.e., a value of 13 units.
Dijkstra's Algorithm
Dijkstra's algorithm is a method for finding the shortest path from a starting node to all
other nodes in a weighted graph. Here's a simple explanation with an example:
How It Works
1. Assign a tentative distance value to every node: set the starting node to 0 and
all other nodes to infinity
2. Set the starting node as current and mark it as visited
3. For the current node, consider all unvisited neighbors and calculate their
tentative distances
4. When done considering all neighbors, mark the current node as visited
5. Select the unvisited node with the smallest tentative distance as the new
current node
6. Repeat steps 3-5 until all nodes are visited
Example
Where A is our starting node, and each edge has a weight (the number shown).
Step-by-Step Execution:
(The initialization, the four iterations, and the final result were presented as figures
in the source.)
Ref:
https://medium.com/basecs/finding-the-shortest-path-with-a-little-help-from-dijkstra
-613149fbdc8e
Simulator:
https://www.cs.usfca.edu/~galles/visualization/Dijkstra.html
Comparing the two algorithms, Dijkstra and Bellman-Ford, to conclude which of them is
more efficient for finding the shortest path between two vertices: results show that the
Dijkstra algorithm is much faster than the Bellman-Ford algorithm and is commonly used
in real-time applications.
The key advantage of Bellman-Ford over Dijkstra's algorithm is its ability to handle
negative weights and detect negative cycles, though it has a higher time complexity of
O(|V|×|E|) compared to Dijkstra's O(|E| + |V|log|V|) with a priority queue implementation.
How It Works
1. Initialize distances from the source to all vertices as infinite and distance to the
source itself as 0
2. Relax all edges |V| - 1 times, where |V| is the number of vertices
○ For each edge (u, v) with weight w, if distance[u] + w < distance[v], then
update distance[v] = distance[u] + w
3. Check for negative weight cycles by trying to relax edges one more time
○ If any distance gets updated, then there is a negative weight cycle
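A compact C++ sketch of these three steps follows; the directed edge list is a
hypothetical example (note the negative edge from 2 to 3, which Dijkstra's algorithm
could not handle safely):

#include <iostream>
#include <vector>
using namespace std;

struct Edge { int u, v, w; };      // directed edge u -> v with weight w

const int INF = 1e9;

int main() {
    int n = 5;
    vector<Edge> edges = {{0, 1, 6}, {0, 2, 7}, {1, 3, 5}, {2, 3, -3}, {3, 4, 2}};
    int src = 0;
    vector<int> dist(n, INF);
    dist[src] = 0;

    // Step 2: relax all edges |V| - 1 times.
    for (int i = 0; i < n - 1; i++)
        for (const Edge& e : edges)
            if (dist[e.u] != INF && dist[e.u] + e.w < dist[e.v])
                dist[e.v] = dist[e.u] + e.w;

    // Step 3: one more pass; any further improvement means a negative cycle.
    bool negativeCycle = false;
    for (const Edge& e : edges)
        if (dist[e.u] != INF && dist[e.u] + e.w < dist[e.v])
            negativeCycle = true;

    if (negativeCycle) {
        cout << "Negative weight cycle detected" << endl;
    } else {
        for (int v = 0; v < n; v++) cout << dist[v] << " ";   // 0 6 7 4 6
        cout << endl;
    }
    return 0;
}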
Shortest Paths
In many applications one wants to obtain the shortest path from a to b. Depending on the context, the length of the
path does not necessarily have to be the length in meters: one can as well look at the cost of a path, whether we have
to pay for using it or we receive something.
In general we speak of cost. Therefore one assigns a cost to each part of the path, also called an "edge".
Dijkstra's Algorithm computes shortest – or cheapest paths, if all cost are positive numbers. However, if one allows
negative numbers, the algorithm will fail.
The Bellman-Ford Algorithm by contrast can also deal with negative cost.
These can for example occur when a taxi driver receives more money for a tour than he spends on fuel. If he does
not transport anybody, his costs are positive.
The Bellman-Ford Algorithm computes the cost of the cheapest paths from a starting node to all other nodes in the
graph, so it can also construct the paths afterwards.
The algorithm proceeds in an iterative manner, beginning with a bad estimate of the cost and then improving it
until the correct value is found.
Afterwards, the algorithm checks every edge for the following condition: Are the cost of the source of the edge
plus the cost for using the edge smaller than the cost of the edge's target?
If this is the case, we have found a short-cut: It is more profitable to use the edge which was just checked, than
using the path used so far. Therefore the cost of the edge's target get updated: They are set to the cost of the source
plus the cost for using the edge (compare example on the right).
Looking at all edges of the graph and updating the cost of the nodes is called a phase. Unfortunately, it is not
sufficient to look at all edges only once. After the first phase, the cost of all nodes for which the shortest path only
uses one edge have been calculated correctly. After two phases all paths that use at most two edges have been
computed correctly, and so on.
The green path from the starting node is the cheapest path. It uses 3 edges.
How many phases are necessary? To answer this question, we use the observation that a shortest path has to use
fewer edges than there are nodes in the graph. Thus, we need at most one phase less than the number of nodes in the
graph. A shortest path that uses more edges than the number of nodes would visit some node twice and thus form a
cycle.
At the end of the algorithm, the shortest path to each node can be constructed by going backwards using the
predecessor edges until the starting node is reached.
If the graph contains a cycle with a negative sum of edge weights (a negative cycle), the algorithm probably will not
find a cheapest path.
As can be seen in the example on the right, paths in this case can be infinitely cheap: one keeps on going through
the cycle.
This problem occurs if the negative cycle can be reached from the starting node. Luckily, the algorithm can detect
whether a negative cycle exists. This is checked in the last step of the algorithm.
A negative cycle can be reached if and only if after iterating all phases, one can still find a short-cut. Therefore, at
the end the algorithm checks one more time for all edges whether the cost of the source node plus the cost of the
edge is less than the cost of the target node. If this is the case for an edge, the message "Negative cycle found" is
returned.
One can even find the negative cycle with the help of the predecessor edges: one just goes back until one has
traversed a cycle (that had negative weight).
Pseudocode:
BEGIN
    d(v[1]) ← 0
    FOR j = 2,..,n DO
        d(v[j]) ← ∞
    FOR i = 1,..,(|V|-1) DO
        FOR ALL (u,v) in E DO
            d(v) ← min(d(v), d(u) + l(u,v))
    FOR ALL (u,v) in E DO
        IF d(v) > d(u) + l(u,v) DO
            Message: "Negative cycle found"
END
At the beginning, the value ∞ is assigned to each node. We need n steps for that.
Then we do the n-1 phases of the algorithm – one phase less than the number of nodes. In each phase,
all edges of the graph are checked, and the distance value of the target node may be changed. We can
interpret this check and assignment of a new value as one step and therefore have m steps in each
phase. In total all phases together require m · (n-1) steps.
Afterwards, the algorithm checks whether there is a negative cycle, for which it looks at each edge once.
Altogether it needs m steps for the check.
The total running time of the algorithm is of the magnitude m · n , as the n steps at the beginning
and the m steps at the end can be neglected compared to the m · (n-1) steps for the phases.
Simulator
https://algorithms.discrete.ma.tum.de/graph-algorithms/spp-bellman-ford/index_en.html
Let's consider that you have n activities with their start and finish
times, the objective is to find solution set having maximum number of
non-conflicting activities that can be executed in a single time frame,
assuming that only one person or machine is available for execution.
Step 1: Sort the activities in ascending order of their finish times and store them in an
array act[].
Step 2: Select the first activity from the sorted array act[] and add it to the
sol[] array.
Step 3: Repeat steps 4 and 5 for the remaining activities in act[].
Step 4: If the start time of the currently selected activity is greater than
or equal to the finish time of the previously selected activity, then add it to
the sol[] array.
Step 5: Select the next activity in act[].
Start  Finish  Activity
5      9       a1
1      2       a2
3      4       a3
0      6       a4
5      7       a5
8      9       a6

After sorting by finish time:

Start  Finish  Activity
1      2       a2
3      4       a3
0      6       a4
5      7       a5
5      9       a1
8      9       a6
Step 2: Select the first activity from sorted array act[] and add it to
the sol[] array, thus sol = {a2}.
Step 3: Repeat the steps 4 and 5 for the remaining activities in act[].
Step 4: If the start time of the currently selected activity is greater than
or equal to the finish time of the previously selected activity, then add
it to sol[].
1. Select activity a3. Since the start time of a3 is greater than the
finish time of a2 (i.e. s(a3) > f(a2)), we add a3 to the solution
set. Thus sol = {a2, a3}.
2. Select a4. Since s(a4) < f(a3), it is not added to the solution
set.
3. Select a5. Since s(a5) > f(a3), a5 gets added to solution set.
Thus sol = {a2, a3, a5}
4. Select a1. Since s(a1) < f(a5), a1 is not added to the solution
set.
5. Select a6. a6 is added to the solution set since s(a6) > f(a5).
Thus sol = {a2, a3, a5, a6}.
The selected activities, as (start, finish) pairs, are: (1,2), (3,4), (5,7), (8,9).
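A compact C++ sketch of the same procedure, run on the activities from the walkthrough
above, reproduces exactly this solution set:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// Greedy activity selection: sort by finish time, then scan once.
int main() {
    vector<pair<int,int>> act = {{5, 9}, {1, 2}, {3, 4}, {0, 6}, {5, 7}, {8, 9}};

    // Sort by finish time (the second member of each pair).
    sort(act.begin(), act.end(),
         [](const pair<int,int>& a, const pair<int,int>& b) {
             return a.second < b.second;
         });

    // Always take the first activity, then every activity whose start time
    // is at least the finish time of the last selected one.
    int lastFinish = act[0].second;
    cout << "(" << act[0].first << "," << act[0].second << ") ";
    for (size_t i = 1; i < act.size(); i++) {
        if (act[i].first >= lastFinish) {
            cout << "(" << act[i].first << "," << act[i].second << ") ";
            lastFinish = act[i].second;
        }
    }
    cout << endl;      // prints (1,2) (3,4) (5,7) (8,9)
    return 0;
}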
Another input, in Python:

arr = [[3, 4], [2, 5], [1, 3], [5, 9], [0, 7], [11, 12], [8, 10]]
arr.sort(key=lambda a: a[1])          # sort by finish time
res, last = [], -1
for s, f in arr:
    if s >= last:                     # non-conflicting activity
        res.append([s, f]); last = f
print(res)
Output

[[1, 3], [3, 4], [5, 9], [11, 12]]
Complexity Analysis: Sorting the activities takes O(n log n), and the single scan that
follows takes O(n), so the overall time complexity is O(n log n) with O(1) auxiliary
space.
A greedy algorithm works for the activity selection problem because of the following
properties of the problem:
1. The problem has the 'greedy-choice property', which means that the locally
optimal choice (the activity with the earliest finish time) leads to a globally
optimal solution.
2. The problem has the 'optimal substructure' property, which means that an
optimal solution to the problem can be constructed efficiently from optimal
solutions to subproblems.
The greedy algorithm makes the locally optimal choice at each step by selecting the
activity with the earliest finish time. This choice ensures that there is no overlap
between the selected activities and that the maximum number of activities can be
completed. Since the problem has the 'optimal substructure' property, a globally optimal
solution can be constructed efficiently by making locally optimal choices at each step.
Introduction to Fractional Knapsack
Problem
Given two arrays named value and weight of size n each, where value[i] represents the
value and weight[i] represents the weight associated with the ith item, and a knapsack
with a weight capacity W.
The task is to pick some items (possibly all of them) such that the sum of values of all
the picked items is maximized and the sum of their weights is at most W.
As the name Fractional suggests, here we are allowed to take a fraction of an item. For
example, if the weight of the ith item is x units, then we can have the following choices:
take all x units of it, take any fraction of it, or skip it entirely.
Note that if we are taking a fraction of an item, we get the value in the same fraction.
Example
Input:
(the value and weight arrays and the knapsack capacity W, as given in the source)
Output:
162
Explanation :
Let's say we take the 1st item as a whole, so we obtain a value of 40 and the capacity of
the knapsack is reduced to 50. Then we pick the 3rd item as a whole, after which we
obtain a value of 120 and the capacity of the knapsack is reduced to 30. Finally we take
30 units of the 4th item, which adds (70/50)×30 = 42 to our value, due to which we
obtain a value of 162 and are left with no space in the knapsack; hence we can conclude
the result to be 162.
It can be proved that there is no other choice of picking item(s) such that the value of the chosen
item(s) exceeds 162.
Techniques to Solve the Fractional
Knapsack Problem
Brute Force Approach
Since for an item of weight x, we have x+1 choices, i.e., taking 0 kg of the item, taking
1 kg of the item, taking 2 kg of the item, ..., taking x kg of the item. Therefore, with
the brute force approach, we can check the value of all the possibilities from the set of
items provided.
Example:

Item  Weight  Value
1     1       3
2     1       6
3     1       5

So, as per the brute force algorithm, we can have the following choices. In the matrix
below, the value in cell (i, j) tells whether the jth item is taken in the ith choice,
and the last column gives the total value of that choice.

x1  x2  x3  Total value
0   0   0   0
0   0   1   5
0   1   0   6
0   1   1   11
1   0   0   3
1   0   1   8
1   1   0   9
1   1   1   14
The best choice is the last one, i.e., taking all three items, which gives the maximum
total value of 14.
Greedy Approach
We have seen that the brute force approach works every time, but on a reasonably large
input it will take a huge amount of time to calculate the answer.
So, we can opt for a greedy algorithm. We can have several potentially correct
strategies; some of the obvious ones are: greedy by price (take the most valuable item
first), greedy by weight (take the lightest item first), and greedy by price per unit
weight (take the item with the highest price/weight ratio first).
Sorting: Initially sort the array based on the profit/weight ratio. The sorted
array will be {{60, 10}, {100, 20}, {120, 30}}.
Iteration:
Follow the given steps to solve the problem using the above approach:
● Calculate the ratio value/weight for each item and sort the items in decreasing
order of this ratio.
● Initialize res = 0 and iterate through the sorted items: if the item fits completely
in the remaining capacity, add its whole value to res and reduce the capacity;
otherwise add the proportional fraction of its value and stop.
● Return res.
Pseudocode
function fractionalKnapsack(value, weight, capacity):
    // items are assumed sorted in decreasing order of value/weight
    ans = 0
    n = weight.length
    For i = 0 To i = n - 1:
        wt = weight[i]
        val = value[i]
        valuePerUnitWeight = value[i]/weight[i]
        If wt <= capacity:
            ans = ans + val                             // take the whole item
            capacity = capacity - wt
        Else:
            ans = ans + valuePerUnitWeight * capacity   // take a fraction
            capacity = 0
            break
    return ans
end function
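The pseudocode translates almost line for line into C++. Here is a small sketch using
the {{60, 10}, {100, 20}, {120, 30}} example from the sorting step above with capacity
W = 50 (expected answer 240):

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// Greedy fractional knapsack: take items in decreasing value/weight order,
// taking a fraction of the first item that no longer fits whole.
double fractionalKnapsack(vector<pair<int,int>> items /* {value, weight} */, int capacity) {
    sort(items.begin(), items.end(),
         [](const pair<int,int>& a, const pair<int,int>& b) {
             return (double)a.first / a.second > (double)b.first / b.second;
         });
    double ans = 0;
    for (auto [val, wt] : items) {
        if (wt <= capacity) {                  // take the whole item
            ans += val;
            capacity -= wt;
        } else {                               // take the fraction that fits, then stop
            ans += (double)val / wt * capacity;
            break;
        }
    }
    return ans;
}

int main() {
    vector<pair<int,int>> items = {{60, 10}, {100, 20}, {120, 30}};
    cout << fractionalKnapsack(items, 50) << endl;   // prints 240
    return 0;
}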
A thief goes to a house to steal n objects having price values, but his knapsack can
carry only a limited weight, and he wants to maximize the money he can make from them.
The bonus is that the objects can be broken into pieces. So let's see how we can help
the thief!
So now you see, the entire aim of this problem is to help the thief
efficiently steal the items so they fit in his knapsack while giving him
the maximum value. But this isn't the only application of the Knapsack problem.
3. We need to select the objects such that the total weight of the selected
objects is at most the knapsack capacity and the total price is maximized.
Example
Problem Statement: We have 5 objects with given weights and prices and a knapsack of
limited capacity; the total price of the chosen objects (or fractions of them) must be
maximized.
The maximum price comes out to be 500. One combination to get that is when we take the
whole of items 3, 1, 5, and 2 and a 2/7th fraction of item 4. Let us also check the price
for another combination.
P=340+((6/7)*140)=340+120=460.
Now the remaining weight has become 0 so we don’t consider item 5.
The total price comes out to be 460, which is less than the maximum of 500, so not every
combination reaches the most optimal solution. It is safe to say that the brute force
approach works, but checking every combination is far too slow on large inputs.
Greedy by Price
The first greedy method that comes to our mind is the greedy by price method, i.e., we
take the items in decreasing order of their price. Next we take a 1/5 fraction of item 2
(as the remaining weight is less than the weight of item 2):
P=410+((1/5)*100)=410+20=430.
So the total max price is 430, which is still less than 500. Hence we can conclude that
greedy by price does not give the most optimal solution here.
Greedy by Weight
In the greedy by weight method, we rank the items in increasing
order of their weight. We first take the item having the smallest weight
and then so on. This is so that we can fit the maximum number of
items in the knapsack. This approach may do well in some cases, but it
does not always give the most optimal solution!
We will now look at another approach, which always gives the most
optimal solution: the greedy by price per unit approach. Price per unit is the ratio of
price/weight for each item. The steps for implementing this are:
Calculating ratios
We calculate the ratio by dividing the price of every item by its weight.
From the above table calculations in step 1, we now arrange the items in such a
manner that the item having the largest ratio value is first and the one
having the smallest is last:
3 > 1 > 5 > 2 > 4
The item having a greater ratio value will be considered first and so
on.
Note that here both the items 2 and 4 have the same ratio values. We
can take either item first and it won't make any difference, as the ratio
values are equal.
So this is the order in which we will select the items: 3 -> 1 -> 5 -> 2 -> 4.
The Xi column in the table is for the fraction of the weight that we will take. The WiXi
column is the product of the fraction selected and the weight, i.e., the total weight of
the item to be considered, and the PiXi column does the same for the price.
If the weight Wi of the item is at most the total remaining capacity W, we set Xi = 1 and
take the whole item. Otherwise, if the total remaining weight W is less than Wi, we take
only the fraction Xi = W/Wi of it and add the corresponding fraction of its price to the
total price.
The next item is item 1. Again Wi < W, so Xi = 1, and we calculate the rest of the values
accordingly. The next item is item 5: again Wi < W, so Xi = 1. The next item is item 2:
again Wi < W, so Xi = 1.
The last item that we have is item 4. The weight of item 4 is 70, but the remaining
capacity is only 20, so Xi = 20/70 = 2/7.
As we are taking the fraction of item 4, the weight and price gets
divided accordingly.
So the weight to be considered will be 70*(2/7)=20 and price will be
140*(2/7)=40.
The total price P comes out to be 500 which is the optimal solution.
From the tabular calculation we can see that we have taken the whole of items 1, 2, 3
and 5 and a 2/7 fraction of item 4, i.e., X = {1, 1, 1, 2/7, 1}.
Overview
In the article we have seen 4 methods to solve the Fractional Knapsack
Problem:
The second and third methods may not always give the most optimal
solution.
The fourth method always gives the most optimal solution and is an
efficient way to solve the problem.
You can find the code to implement the Fractional Knapsack Problem here:
GitHub- https://github.com/riya1620/FractionalKnapsack
Ref:
https://medium.com/@riya.tendulkar/how-to-solve-the-fractional-knapsack-p
roblem-d2c11b56aa38
Ref: https://www.geeksforgeeks.org/fractional-knapsack-problem/
Job Sequencing Problem
Ref: https://www.geeksforgeeks.org/job-sequencing-problem/
Given three arrays id[], deadline[], profit[], where each job i is associated
with id[i], deadline[i], and profit[i]. Each job takes 1 unit of time to complete,
and only one job can be scheduled at a time. You will earn the profit
associated with a job only if it is completed by its deadline. The task is to
find the maximum profit that can be gained by completing the jobs and the
count of jobs completed to earn the maximum profit.
Examples:
Problem Statement: You are given a set of N jobs where each job comes with a
deadline and profit. The profit can only be earned upon completing the job within its
deadline. Find the number of jobs done and the maximum profit that can be obtained.
Each job takes a single unit of time and only one job can be performed at a time.
Examples
Example 1:
Output: 2 60
Explanation: The 3rd job with a deadline of 1 is performed during the first unit of time.
The 1st job is performed during the second unit of time, as its deadline is 4.
Profit = 40 + 20 = 60
Example 2:
Output: 2 127
Explanation: The first and third job both having a deadline 2 give the highest profit.
Profit = 100 + 27 = 127
Suppose there are n jobs; then to find all the sequences of jobs we need to consider all
the possible subsets, and a total of 2^n subsets will be created. Thus, the time
complexity of this solution would be O(2^n).
To optimize this algorithm, we can make use of a greedy approach that produces an
optimal result, which works by selecting the best and the most profitable option
available at the moment.
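As a rough illustration of the greedy approach, here is a minimal C++ sketch;
the function name jobSequencing and the boolean slot array are implementation
choices, not part of the original problem statement.

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// Greedy job sequencing: consider jobs in decreasing order of profit and place
// each job in the latest free time slot on or before its deadline.
pair<int, int> jobSequencing(const vector<int>& deadline, const vector<int>& profit) {
    int n = deadline.size();
    vector<int> order(n);
    for (int i = 0; i < n; i++) order[i] = i;
    sort(order.begin(), order.end(),
         [&](int a, int b) { return profit[a] > profit[b]; });

    int maxDeadline = *max_element(deadline.begin(), deadline.end());
    vector<bool> taken(maxDeadline + 1, false);   // taken[t] = true if slot t is used
    int count = 0, total = 0;
    for (int i : order) {
        for (int t = deadline[i]; t >= 1; t--) {  // try the latest free slot first
            if (!taken[t]) {
                taken[t] = true;
                count++;
                total += profit[i];
                break;
            }
        }
    }
    return {count, total};
}

int main() {
    // Example 1 from above: expected output "2 60".
    pair<int, int> result = jobSequencing({4, 1, 1, 1}, {20, 10, 40, 30});
    cout << result.first << " " << result.second << endl;
}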
Ref: https://medium.com/@dillihangrae/job-sequence-with-deadlines-greedy-algorithm-18d3734d6d1c
Naive string matching algorithm
Ref: https://www.geeksforgeeks.org/naive-algorithm-for-pattern-searching/
Given a text string of length n and a pattern of length m, the task is to
print all occurrences of the pattern in the text.
Note: You may assume that n > m.
A straightforward Python implementation:

m = len(pattern)
n = len(text)
for i in range(n - m + 1):            # try every alignment of the pattern
    j = 0
    while j < m and text[i + j] == pattern[j]:
        j += 1                        # extend the match while characters agree
    if j == m:                        # the whole pattern matched at index i
        print("Pattern found at index", i)
● Best case: when the pattern is found at the very beginning of the text (or very
early on), only a few comparisons are needed before we find the pattern.
● Worst case: when the pattern doesn’t appear in the text at all or appears only
at the very end.
● In the worst case, for each position in the text, the algorithm may have to
compare all m characters of the pattern, giving O(n*m) comparisons in total.
● Comparison: Start from the beginning of the text and compare every character
of the current window with the pattern.
● Repeat: Keep repeating this process till you discover a match or reach the
end of the text.
● Result: When a match is found, or you have examined the whole text, you
know whether the pattern exists in the text.
Algorithm
1. There will be two loops: the outer loop will range from i = 0 to i ≤ n-m, and the
inner loop will range from j = 0 to j < m, where ‘m’ is the length of the input
pattern and ‘n’ is the length of the text string.
2. Now, while searching within a window, there will be two cases:
○ Case 1: If the current characters match, we keep matching the entire
pattern with the current window of the text string. If the whole pattern
is found in the current window after traversing it, print the result.
○ Case 2: If a mismatch is found, stop comparing within the current window
and slide to the next window (i.e. increment i by one).
Given a text T[0...n-1] and a pattern P[0...m-1], write a function search(char P[], char
T[]) that prints all occurrences of P[] present in T[] using the Rabin-Karp algorithm. You
may assume that n > m.
Like the Naive algorithm, the Rabin-Karp algorithm also checks every
substring. But unlike the Naive algorithm, the Rabin-Karp algorithm compares
the hash value of the pattern with the hash value of the current substring
of the text, and only if the hash values match does it start matching
individual characters. So the Rabin-Karp algorithm needs to calculate hash
values for the following strings:
● The pattern itself
● All substrings of the text of length m, where m is the length of the pattern
Step-by-step approach:
● Select a prime number ‘p‘ as the modulus. This choice helps avoid overflow
issues and ensures a good distribution of hash values.
● Choose a base ‘b‘ (usually a prime number as well), which is often the size of the
character set (e.g., 256 for ASCII characters).
● Start by calculating the hash value for the first substring of the text that is the
same length as the pattern.
● To slide the pattern one position to the right, you remove the contribution of the
leftmost character and add the contribution of the new character on the right.
● The formula for updating the hash value when moving from position ‘i’ to ‘i+1’ is:
hash(i+1) = (b × (hash(i) − T[i] × b^(m-1)) + T[i+m]) mod p
(remove the leftmost character, shift the remaining characters up by one power
of the base, and add the new character on the right).
The idea behind string hashing is the following: we map each string into an integer
and compare those instead of the strings. Doing this allows us to reduce the execution
time of the string comparison to O(1).
Hash Function
First of all, the algorithm is only as good as its hash function. If a hash function which
results in many false positives is used, then character comparisons will be done far too
often to deem this method any more performant than a naive approach.
Secondly, you might have noticed that a new hash is computed each time the substring
window traverses through the text. This is highly inefficient as it results in the same
performance (if not worse) as a naive approach.
Both these problems can be solved using polynomial hashing with additions and
multiplications. Although this is not a Rabin-fingerprint, it works equally well.
Hashing
Ref: https://brilliant.org/wiki/rabin-karp-algorithm/
Say you have a hash function that maps each letter in the alphabet to a unique prime number
[prime_a…prime_z]. You are searching a text T of length n for a word
W of length m. Let’s say that the text is the following string “abcdefghijklmnopqrstuvwxyz”
and the word is “fgh”. How can we use a rolling hash to search for “fgh”?
At each step, we compare three letters of the text to the three letters of the word. At the first
step, the hash value of “abc” is prime_a×prime_b×prime_c.
Similarly, the hash value of “fgh” is prime_f×prime_g×prime_h. Compare the hash value of “abc”
with that of “fgh” and see that the numbers do not match.
To move on to the next iteration, we need to calculate the hash value of “bcd”. How can we do
this? We could compute prime_b×prime_c×prime_d from scratch, but that repeats work.
We already know the hash of “abc”, which is prime_a×prime_b×prime_c.
If we divide out prime_a and multiply in prime_d, we get the hash value of “bcd”. We
compare this value with the hash of “fgh”, and so on.
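A short C++ sketch of this prime-based rolling hash is shown below; the mapping
of letters to the first 26 primes (primeOf) is an illustrative assumption, not
something fixed by the method.

#include <iostream>
#include <string>
using namespace std;

// Illustrative mapping of 'a'..'z' to the first 26 primes.
long long primeOf(char c) {
    static const long long primes[26] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
                                         31, 37, 41, 43, 47, 53, 59, 61, 67,
                                         71, 73, 79, 83, 89, 97, 101};
    return primes[c - 'a'];
}

int main() {
    string text = "abcdefghijklmnopqrstuvwxyz", word = "fgh";
    int n = text.size(), m = word.size();
    long long target = 1, h = 1;              // product-of-primes hashes
    for (int i = 0; i < m; i++) {
        target *= primeOf(word[i]);           // hash of "fgh"
        h *= primeOf(text[i]);                // hash of the first window, "abc"
    }
    for (int i = 0; i + m <= n; i++) {
        if (h == target) {
            cout << "Match at index " << i << endl;   // "fgh" starts at index 5
            break;
        }
        if (i + m < n)                        // divide out the old letter, multiply in the new
            h = h / primeOf(text[i]) * primeOf(text[i + m]);
    }
}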
We can compute the polynomial hash with multiplications and additions as shown
below, where c_1 … c_m are the character values, b is the base, and Q is the modulus:
H = (c_1 × b^(m-1) + c_2 × b^(m-2) + … + c_m × b^0) mod Q
Example
For the sake of brevity, let’s use integers directly instead of character conversions in this
example.
Given the pattern ‘135’ and a text ‘2135’ with a base b = 10, the hash of the pattern is
1×10^2 + 3×10 + 5 = 135.
Next, we compute the hash of the first m = 3 characters of the text, which is ‘213’:
2×10^2 + 1×10 + 3 = 213.
This is clearly not a match (213 ≠ 135). So, let’s slide the window by dropping the first
character of the previous window and adding the next character to it. The window now
represents ‘135’, whose hash is 135.
Now our hashes are a match and the algorithm essentially comes to an end.
Rolling Hash
Notice that we had to compute the entire hash of ‘213’ and ‘135’ after moving the sliding
window. This is undesirable as we had to compute the hash of some integers we had
already accounted for in the previous window.
The rolling hash function can effectively eliminate these additional computations by
acknowledging that the hash of a new window skips the first character of a previous
window and adds the computation of a new character.
In theory, we can get rid of the hash value of the skipped character, multiply the
resulting value by the base (to restore the correct order of the exponents of the previous
untouched characters), and finally add the value of the new character.
Therefore, we can compute the hash of the new window by using the equation shown
below:
new_hash = (old_hash − dropped_char × b^(m−1)) × b + new_char
Using the previous example of moving from ‘213’ to ‘135’, we can plug in the values to
get the new hash: (213 − 2×10^2) × 10 + 5 = 13 × 10 + 5 = 135.
By using the rolling hash function, we can calculate the hash of each new sliding window
in constant time, and hence the hashes of all windows in linear time. This is the main
component of the Rabin-Karp algorithm that provides its best case time complexity of O(n+m).
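Putting the pieces together, here is a minimal C++ sketch of the Rabin-Karp search
with a polynomial rolling hash. The base 256 and the large prime modulus are common
but arbitrary choices; the modular arithmetic used here is discussed in the next section.

#include <iostream>
#include <string>
using namespace std;

// Rabin-Karp search: b is the base (size of the character set), q a large prime modulus.
void rabinKarp(const string& text, const string& pattern) {
    const long long b = 256, q = 1000000007LL;
    int n = text.size(), m = pattern.size();
    if (m > n) return;

    long long bm = 1;                        // b^(m-1) mod q, used when sliding the window
    for (int i = 0; i < m - 1; i++) bm = (bm * b) % q;

    long long hp = 0, ht = 0;                // hashes of the pattern and the current window
    for (int i = 0; i < m; i++) {
        hp = (hp * b + pattern[i]) % q;      // Horner’s method (see below)
        ht = (ht * b + text[i]) % q;
    }

    for (int i = 0; i + m <= n; i++) {
        // Compare characters only when the hashes match, to rule out spurious hits.
        if (hp == ht && text.compare(i, m, pattern) == 0)
            cout << "Pattern found at index " << i << endl;
        if (i + m < n)                       // rolling update: drop text[i], add text[i+m]
            ht = ((ht - text[i] * bm % q + q) % q * b + text[i + m]) % q;
    }
}

int main() {
    rabinKarp("abcdabcabcd", "abcd");        // finds occurrences at indices 0 and 7
}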
Modular Arithmetic
All math in the Rabin-Karp algorithm needs to be done in modulo Q to avoid
manipulating large H values and integer overflows. This is done at the expense of
increased hash collisions, also known as spurious hits.
The value for Q would usually be a large prime number — as large as it can
be without compromising arithmetic performance. The smaller the value of
Q, the higher the chances of spurious hits.
There is a potential problem with the above approach. To understand what it is, let’s
have a look at a simple code snippet of it. (The original article gives the snippet in
JavaScript; a C++ reconstruction of the same idea is shown below.)
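The following sketch is a reconstruction, not the article’s original code; the name
hashDirect, the signature, and the exact loop shape are assumptions made for illustration.

#include <iostream>
#include <string>
using namespace std;

// Direct polynomial hash of the first m characters of s.
long long hashDirect(const string& s, int m, long long b, long long q) {
    long long h = 0, power = 1;          // power runs through b^0, b^1, b^2, ...
    for (int i = m - 1; i >= 0; i--) {
        h = (h + s[i] * power) % q;      // s[i] * power grows large before % q is applied
        power = power * b;               // no modulo here: power overflows for large m
    }
    return h;
}

int main() {
    // Hash of "abc" with an assumed base of 256 and a large prime modulus.
    cout << hashDirect("abc", 3, 256, 1000000007LL) << endl;
}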
Notice that there are two multiplications and an addition done inside the loop. Not only
is that inefficient, but it also fails to prevent integer overflows as larger sums are
calculated before the modulo operator is even used. We can overcome this problem by
using Horner’s method.
Horner’s Method
Horner’s method simplifies the process of evaluating a polynomial by dividing it into
monomials (polynomials of the 1st degree).
Using this method, we can eliminate one multiplication from our previous
implementation. This leaves us with only one multiplication and one addition at each
step in the loop, which in turn allows us to prevent integer overflows.
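A minimal sketch of the same hash computed with Horner’s method, using the same
assumed signature as the hypothetical hashDirect above:

#include <iostream>
#include <string>
using namespace std;

// Horner’s method: H = (((c1*b + c2)*b + c3)*b + ...) mod q.
// One multiplication and one addition per iteration, and the modulo is applied
// at every step, so intermediate values never exceed roughly b * q.
long long hashHorner(const string& s, int m, long long b, long long q) {
    long long h = 0;
    for (int i = 0; i < m; i++)
        h = (h * b + s[i]) % q;
    return h;
}

int main() {
    // Produces the same value as the direct computation above.
    cout << hashHorner("abc", 3, 256, 1000000007LL) << endl;
}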
Complexity
Given a word of length m and a text of length n, the best case time complexity is O(n +
m) and space complexity is O(m). The worst case time complexity is O(nm). This can
occur when an extremely poorly performing hash function is chosen, but a good hash
function, such as polynomial hashing, can fix this problem.