Optimal Merge Pattern
Optimal merge pattern is a pattern that relates to the merging of two or more sorted files into a
single sorted file. This type of merging can be done by the two-way merging method. When two or
more sorted files are merged together to form a single file, the minimum number of computations
required to reach this file is known as the Optimal Merge Pattern.
If more than two files need to be merged, this can be done in pairs. For example, to merge four
files A, B, C, and D: first merge A with B to get X1, merge X1 with C to get X2, and merge X2 with
D to get X3, the final output file.
If we have two files of sizes m and n, the total computation time will be O(m+n).
An optimal merge pattern corresponds to a binary merge tree with minimum weighted external
path length. The function tree algorithm uses the greedy rule to build a two-way merge tree for n
files. The algorithm takes an input list of n trees. Each node of a tree has three fields: lchild,
rchild, and weight.
Initially, each tree in the list contains just one node. This external node has its lchild and rchild
fields set to zero, while weight is the length of one of the n files to be merged. For any tree in the
list with root node t, t→weight gives the length of the corresponding merged file.
The function tree algorithm uses two functions, least(list) and insert(list, t). least(list) finds a tree
in the list whose root has the least weight and returns a pointer to this tree; the tree is then deleted
from the list. insert(list, t) inserts the tree with root t into the list.
The main for loop in this algorithm executes n - 1 times. If the list is kept in increasing order
according to the weight values in the roots, then least(list) needs only O(1) time but insert(list, t)
takes O(n) time, so the total time is O(n^2). If the list is instead represented as a min-heap, in
which the root value is less than or equal to the values of its children, then least(list) and
insert(list, t) can each be done in O(log n) time. In this case, the computing time for tree is
O(n log n).
Example
Let us consider the given files f1, f2, f3, f4 and f5 with 20, 30, 10, 5 and 30 elements
respectively.
If merge operations are performed in the given order, then:
M1 = merge f1 and f2 => 20 + 30 = 50
M2 = merge M1 and f3 => 50 + 10 = 60
M3 = merge M2 and f4 => 60 + 5 = 65
M4 = merge M3 and f5 => 65 + 30 = 95
Hence, the total number of operations is 50 + 60 + 65 + 95 = 270.
Sorting the files by size in ascending order, we get the sequence f4, f3, f1, f2, f5, and merge
operations can be performed on this sequence:
M1 = merge f4 and f3 => 5 + 10 = 15
M2 = merge M1 and f1 => 15 + 20 = 35
M3 = merge M2 and f2 => 35 + 30 = 65
M4 = merge M3 and f5 => 65 + 30 = 95
Therefore, the total number of operations is 15 + 35 + 65 + 95 = 210. Obviously, this is better than
the previous one. In this context, we now solve the problem using the greedy algorithm: pick the
two smallest numbers, merge them, and repeat with the result until only one number is left.
Hence, the optimal solution takes 15 + 35 + 60 + 95 = 205 comparisons.
The greedy procedure above can be implemented with a min-heap (priority queue) of file sizes:
#include <bits/stdc++.h>
using namespace std;

// Returns the minimum total cost of merging n files of the given sizes
int minComputation(int n, int files[])
{
    // Min-heap keeps the two smallest file sizes on top
    priority_queue<int, vector<int>, greater<int> > pq;
    for (int i = 0; i < n; i++)
        pq.push(files[i]);
    int count = 0;
    while (pq.size() > 1) {
        int first = pq.top(); pq.pop();
        int second = pq.top(); pq.pop();
        int temp = first + second; // cost of this merge
        count += temp;
        pq.push(temp);
    }
    return count;
}

// Driver code
int main()
{
    int files[] = { 2, 3, 4, 5, 6, 7 }; // No of files: 6
    cout << minComputation(6, files) << endl;
    return 0;
}
Huffman Coding
Huffman Coding is a technique of compressing data to reduce its size without losing any of the details. It
was first developed by David Huffman.
Huffman Coding is generally useful to compress the data in which there are frequently occurring characters.
Using the Huffman Coding technique, we can compress the string to a smaller size.
Huffman coding first creates a tree using the frequencies of the character and then generates code for each
character.
Once the data is encoded, it has to be decoded. Decoding is done using the same tree.
Huffman Coding prevents any ambiguity in the decoding process by using the concept of prefix
codes, i.e., the code associated with one character is never a prefix of the code of any other
character. The tree created above maintains this property.
1. Calculate the frequency of each character in the string.
2. Sort the characters in increasing order of the frequency. These are stored in a priority queue Q.
3. Make each unique character a leaf node.
4. Create an empty node z. Assign the minimum frequency to the left child of z and assign the second
   minimum frequency to the right child of z. Set the value of z as the sum of the above two
   minimum frequencies.
5. Remove these two minimum frequencies from Q and add the sum into the list of frequencies (*
   denotes the internal nodes in the figure above).
6. Insert node z into the tree.
7. Repeat steps 3 to 5 for all the characters.
8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.
For sending the above string over a network, we have to send the tree as well as the above compressed-code.
The total size is given by the table below.
Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to
32 + 15 + 28 = 75 bits.
For decoding the code, we take the encoded bits and traverse the tree to find the character.
Suppose 101 is to be decoded; we traverse from the root as in the figure below.
The time complexity for encoding each unique character based on its frequency is O(n log n).
Extracting the minimum frequency from the priority queue takes place 2(n - 1) times, and each
extraction costs O(log n). Thus the overall complexity is O(n log n).
Minimum spanning tree (MST)
A minimum spanning tree (MST) is a fundamental concept in graph theory and computer science.
Given a connected, undirected graph, the goal is to find a subset of the edges that connects all the
vertices together in the most efficient way possible. An MST is defined as a spanning tree that has
the minimum weight among all possible spanning trees. The minimum spanning tree has all the
properties of a spanning tree with an added constraint of having the minimum possible weights
among all possible spanning trees. Like a spanning tree, there can also be many possible MSTs for
a graph.
- The number of vertices (V) in the graph and the spanning tree is the same.
- There is a fixed number of edges in the spanning tree which is equal to one less than the
total number of vertices ( E = V-1 ).
- The spanning tree should not be disconnected; that is, it should form a single connected
component, not more than that.
- The spanning tree should be acyclic, which means there would not be any cycle in the tree.
- There can be many possible spanning trees for a graph.
Kruskal's Algorithm
At each iteration, the algorithm adds the next lowest-weight edge, such that the edges picked so
far do not form a cycle.
The key idea behind Kruskal's algorithm is to greedily add the cheapest available edge that does
not create a cycle. By sorting the edges first, we can efficiently check for cycles and ensure the
final result is the minimum spanning tree.
To efficiently check for cycles, Kruskal's algorithm uses a data structure called a disjoint-set, also
known as a union-find data structure. This allows for quick determination of whether two vertices
are in the same connected component, and thus whether adding an edge would create a cycle.
The time complexity of Kruskal's algorithm is O(E log E), where E is the number of edges in the
graph. This is due to the sorting step, which dominates the overall runtime.
Example
The graph contains 9 vertices and 14 edges, so the minimum spanning tree formed will have
(9 – 1) = 8 edges.
After sorting:
Weight   Source   Destination
1        7        6
2        8        2
2        6        5
4        0        1
4        2        5
6        8        6
7        2        3
7        7        8
8        0        7
8        1        2
9        3        4
10       5        4
11       1        7
14       3        5
Now pick the edges one by one from this sorted list:
Step 1: Pick edge 7-6. No cycle is formed, include it.
Step 2: Pick edge 8-2. No cycle is formed, include it.
Step 3: Pick edge 6-5. No cycle is formed, include it.
Step 4: Pick edge 0-1. No cycle is formed, include it.
Step 5: Pick edge 2-5. No cycle is formed, include it.
Step 6: Pick edge 8-6. Since including this edge results in a cycle, discard it. Pick edge 2-3:
No cycle is formed, include it.
Step 7: Pick edge 7-8. Since including this edge results in a cycle, discard it. Pick edge 0-7.
No cycle is formed, include it.
Step 8: Pick edge 1-2. Since including this edge results in a cycle, discard it. Pick edge 3-4.
No cycle is formed, include it.
Since the number of edges included in the MST equals (V – 1), the algorithm stops here.
Algorithm
#include <bits/stdc++.h>
using namespace std;

// Disjoint-set (union-find) with union by rank
class DSU {
    int* parent;
    int* rank;

public:
    DSU(int n)
    {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = -1;
            rank[i] = 1;
        }
    }

    // Find function
    int find(int i)
    {
        if (parent[i] == -1)
            return i;
        return parent[i] = find(parent[i]); // path compression
    }

    // Union function
    void unite(int x, int y)
    {
        int s1 = find(x);
        int s2 = find(y);
        if (s1 != s2) {
            if (rank[s1] < rank[s2]) {
                parent[s1] = s2;
            }
            else if (rank[s1] > rank[s2]) {
                parent[s2] = s1;
            }
            else {
                parent[s2] = s1;
                rank[s1] += 1;
            }
        }
    }
};

class Graph {
    vector<vector<int> > edgelist;
    int V;

public:
    Graph(int V) { this->V = V; }

    // Edges are stored as { weight, x, y } so that sort()
    // orders them by weight
    void addEdge(int x, int y, int w)
    {
        edgelist.push_back({ w, x, y });
    }

    void kruskals_mst()
    {
        sort(edgelist.begin(), edgelist.end());
        DSU s(V);
        int ans = 0;
        cout << "Following are the edges in the "
                "constructed MST"
             << endl;
        for (auto edge : edgelist) {
            int w = edge[0];
            int x = edge[1];
            int y = edge[2];
            // Take the edge only if it joins two different components
            if (s.find(x) != s.find(y)) {
                s.unite(x, y);
                ans += w;
                cout << x << " -- " << y << " == " << w
                     << endl;
            }
        }
        cout << "Minimum Cost Spanning Tree: " << ans << endl;
    }
};
// Driver code
int main()
{
    Graph g(4);
    g.addEdge(0, 1, 10);
    g.addEdge(1, 3, 15);
    g.addEdge(2, 3, 4);
    g.addEdge(2, 0, 6);
    g.addEdge(0, 3, 5);

    // Function call
    g.kruskals_mst();
    return 0;
}
Prim's algorithm is another popular approach for computing the minimum spanning tree. Unlike
Kruskal's, which starts with an empty set and greedily adds edges, Prim's algorithm starts with a
single vertex and gradually expands the tree.
Then, it repeatedly checks for the minimum edge weight that connects one vertex of MST
to another vertex that is not yet in the MST.
This process is continued until all the vertices are included in the MST.
Prim's algorithm works by always adding the cheapest available edge that connects a vertex in the
current MST to a vertex outside of it. This ensures that the final result is the minimum spanning
tree.
The overall time complexity of Prim's algorithm is O(E log V) with a binary heap, where E is the
number of edges and V is the number of vertices. This is comparable to Kruskal's O(E log E); in
practice, Kruskal's is often preferred for sparse graphs, while Prim's (particularly the O(V^2)
adjacency-matrix version) can be more efficient for dense graphs.
Algorithm
#include <cstring>
#include <iostream>
using namespace std;

#define INF 9999999
#define V 5 // number of vertices

// Adjacency matrix (0 = no edge). The weights are sample values,
// since the original matrix was not preserved in these notes.
int G[V][V] = { { 0, 9, 75, 0, 0 },
                { 9, 0, 95, 19, 42 },
                { 75, 95, 0, 51, 66 },
                { 0, 19, 51, 0, 31 },
                { 0, 42, 66, 31, 0 } };

int main() {
    int no_edge = 0; // number of edges in the MST so far
    int selected[V]; // true once a vertex is in the MST
    memset(selected, false, sizeof(selected));

    // Start growing the MST from vertex 0 of the graph
    selected[0] = true;

    cout << "Edge"
         << " : "
         << "Weight";
    cout << endl;
    while (no_edge < V - 1) {
        // For every vertex in the set S (the MST so far), find the
        // cheapest edge to a vertex that is not yet selected
        int min = INF, x = 0, y = 0;
        for (int i = 0; i < V; i++)
            if (selected[i])
                for (int j = 0; j < V; j++)
                    if (!selected[j] && G[i][j] && min > G[i][j]) {
                        min = G[i][j];
                        x = i;
                        y = j;
                    }
        cout << x << " - " << y << " : " << G[x][y];
        cout << endl;
        selected[y] = true;
        no_edge++;
    }
    return 0;
}