
Optimal File Merge Patterns

An optimal merge pattern concerns merging two or more sorted files into a single sorted file, which can be done with the two-way merge method. When two or more sorted files are merged to form a single file, a merge order that produces that file with the minimum number of computations is known as an optimal merge pattern.

If more than two files need to be merged, it can be done in pairs. For example, to merge four files A, B, C and D: first merge A with B to get X1, then merge X1 with C to get X2, and finally merge X2 with D to get X3, the output file.

If we have two files of sizes m and n, the total computation time will be O(m+n).

Algorithm: TREE (n)

for i := 1 to n - 1 do
    declare new node
    node.leftchild := least (list)
    node.rightchild := least (list)
    node.weight := ((node.leftchild).weight) + ((node.rightchild).weight)
    insert (list, node)
return least (list)

Analysis of Optimal Merge Pattern

An optimal merge pattern corresponds to a binary merge tree with minimum weighted external path length. The function TREE uses the greedy rule to get a two-way merge tree for n files. The algorithm takes as input a list of n trees. Each node of a tree has three fields: lchild, rchild, and weight.

Initially, each tree in the list contains just one node. This external node has its lchild and rchild fields set to zero, while its weight is the length of one of the n files to be merged. For any tree in the list with root node t, t.weight gives the length of the merged file.

The function TREE uses two auxiliary functions, least (list) and insert (list, t). Least (list) finds a tree in the list whose root has the least weight, deletes that tree from the list, and returns a pointer to it. Insert (list, t) inserts the tree with root t into the list.

The main for loop in this algorithm is executed n - 1 times. If the list is kept in increasing order of the weight values in the roots, then least (list) needs only O(1) time and insert (list, t) can be performed in O(n) time; hence, the total time taken is O(n^2). If the list is represented as a min-heap, in which the root value is less than or equal to the values of its children, then least (list) and insert (list, t) can each be done in O(log n) time, and the computing time for TREE becomes O(n log n).

Example

Let us consider the given files f1, f2, f3, f4 and f5 with 20, 30, 10, 5 and 30 elements respectively.

If merge operations are performed according to the provided sequence, then

M1 = merge f1 and f2 => 20 + 30 = 50

M2 = merge M1 and f3 => 50 + 10 = 60

M3 = merge M2 and f4 => 60 + 5 = 65

M4 = merge M3 and f5 => 65 + 30 = 95

Hence, the total number of operations is 50 + 60 + 65 + 95 = 270.

Now, the question arises: is there any better solution?

Sorting the file sizes in ascending order, we get the following sequence: f4, f3, f1, f2, f5. Merge operations can now be performed on this sequence.

M1 = merge f4 and f3 => 5 + 10 = 15

M2 = merge M1 and f1 => 15 + 20 = 35

M3 = merge M2 and f2 => 35 + 30 = 65

M4 = merge M3 and f5 => 65 + 30 = 95

Therefore, the total number of operations is 15 + 35 + 65 + 95 = 210. Obviously, this is better than the previous one. The greedy algorithm does better still: after each merge, the combined file is inserted back into the sorted list, so f2 and f5 are merged with each other (30 + 30 = 60) rather than with the running total. The resulting merge order costs 15 + 35 + 60 + 95 = 205 computations, which is optimal.

Example 2: Given a set of unsorted files: 5, 3, 2, 7, 9, 13

Now, arrange these elements in ascending order: 2, 3, 5, 7, 9, 13

After this, repeatedly pick the two smallest numbers, merge them, and insert the sum back, until only one number is left:

2 + 3 = 5, 5 + 5 = 10, 7 + 9 = 16, 10 + 13 = 23, 16 + 23 = 39

So, the merging cost = 5 + 10 + 16 + 23 + 39 = 93

It can also be implemented as follows:

#include <bits/stdc++.h>
using namespace std;

int minComputation(int size, int files[])
{
    // Create a min heap
    priority_queue<int, vector<int>, greater<int> > pq;

    for (int i = 0; i < size; i++) {
        // Add sizes to the priority queue
        pq.push(files[i]);
    }

    // Variable to count the total computation
    int count = 0;

    while (pq.size() > 1) {
        // Pop the two smallest size elements
        // from the min heap
        int first_smallest = pq.top();
        pq.pop();
        int second_smallest = pq.top();
        pq.pop();

        int temp = first_smallest + second_smallest;

        // Add the current computation
        // to the previous ones
        count += temp;

        // Push the new combined file size
        // back into the min heap
        pq.push(temp);
    }

    return count;
}

// Driver code
int main()
{
    // Number of files
    int n = 6;

    // 6 files with their sizes
    int files[] = { 2, 3, 4, 5, 6, 7 };

    // Total number of computations to be done (the final answer)
    cout << "Minimum Computations = " << minComputation(n, files);

    return 0;
}

Time Complexity: O(nlogn)

Auxiliary Space: O(n)

Huffman Coding
Huffman Coding is a technique of compressing data to reduce its size without losing any of the details. It
was first developed by David Huffman.

Huffman Coding is generally useful to compress the data in which there are frequently occurring characters.

How Huffman Coding works

Suppose the string BCAADDDCCACACAC is to be sent over a network.
Each character occupies 8 bits. There are a total of 15 characters in the above string. Thus, a total of 8 * 15
= 120 bits are required to send this string.

Using the Huffman Coding technique, we can compress the string to a smaller size.

Huffman coding first creates a tree using the frequencies of the characters and then generates a code for each character.

Once the data is encoded, it has to be decoded. Decoding is done using the same tree.

Huffman Coding prevents any ambiguity in the decoding process using the concept of a prefix code, i.e., a code associated with a character should not be a prefix of any other code. The tree created above helps maintain this property.

Huffman coding is done with the help of the following steps.

1. Calculate the frequency of each character in the string.

2. Sort the characters in increasing order of the frequency. These are stored in a priority queue Q.

3. Make each unique character as a leaf node.

4. Create an empty node z. Assign the minimum frequency to the left child of z and assign the second
minimum frequency to the right child of z. Set the value of the z as the sum of the above two
minimum frequencies.
5. Remove these two minimum frequencies from Q and add the sum to the list of frequencies (* denotes an internal node in the figure above).

6. Insert node z into the tree.

7. Repeat steps 3 to 5 for all the characters.

8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.

For sending the above string over a network, we have to send the tree as well as the compressed code. The total size is given by the table below.
Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to 32 + 15 + 28 = 75 bits.

Decoding the code

For decoding the code, we can take the code and traverse through the tree to find the character.

Suppose 101 is to be decoded; we can traverse from the root as in the figure below.

Huffman Coding Complexity

The time complexity for encoding each unique character based on its frequency is O(n log n).

Extracting the minimum frequency from the priority queue takes place 2*(n-1) times and its complexity is O(log n). Thus the overall complexity is O(n log n).
Minimum spanning tree (MST)

A minimum spanning tree (MST) is a fundamental concept in graph theory and computer science. Given a connected, undirected graph, the goal is to find a subset of the edges that connects all the vertices together in the most efficient way possible. An MST is defined as a spanning tree that has the minimum weight among all possible spanning trees. The minimum spanning tree has all the properties of a spanning tree, with the added constraint of having the minimum possible total weight among all possible spanning trees. Like spanning trees, there can be many possible MSTs for a graph.

Properties of spanning tree

- The number of vertices (V) in the graph and in the spanning tree is the same.
- There is a fixed number of edges in the spanning tree, equal to one less than the total number of vertices (E = V - 1).
- The spanning tree should not be disconnected: there should be only a single connected component, not more than that.
- The spanning tree should be acyclic, meaning there is no cycle in the tree.
- There can be many possible spanning trees for a graph.

Algorithms to find minimum spanning tree

1. Kruskal’s minimum spanning tree


Kruskal's algorithm is a greedy approach to constructing the minimum spanning tree. The basic
steps are as follows:

 First, it sorts all the edges of the graph by their weights.

 Then it starts the iterations of building the spanning tree.

 At each iteration, the algorithm adds the next lowest-weight edge, such that the edges picked so far do not form a cycle.

The key idea behind Kruskal's algorithm is to greedily add the cheapest available edge that does
not create a cycle. By sorting the edges first, we can efficiently check for cycles and ensure the
final result is the minimum spanning tree.

To efficiently check for cycles, Kruskal's algorithm uses a data structure called a disjoint-set, also
known as a union-find data structure. This allows for quick determination of whether two vertices
are in the same connected component, and thus whether adding an edge would create a cycle.

The time complexity of Kruskal's algorithm is O(E log E), where E is the number of edges in the
graph. This is due to the sorting step, which dominates the overall runtime.
Example

The graph contains 9 vertices and 14 edges, so the minimum spanning tree formed will have (9 - 1) = 8 edges.
After sorting:

Step 1: Pick edge 7-6. No cycle is formed, include it.

Step 2: Pick edge 8-2. No cycle is formed, include it.

Step 3: Pick edge 6-5. No cycle is formed, include it.


Step 4: Pick edge 0-1. No cycle is formed, include it.

Step 5: Pick edge 2-5. No cycle is formed, include it.

Step 6: Pick edge 8-6. Since including this edge results in a cycle, discard it. Pick edge 2-3. No cycle is formed, include it.

Step 7: Pick edge 7-8. Since including this edge results in a cycle, discard it. Pick edge 0-7. No cycle is formed, include it.

Step 8: Pick edge 1-2. Since including this edge results in a cycle, discard it. Pick edge 3-4. No cycle is formed, include it.
Algorithm

#include <bits/stdc++.h>
using namespace std;

// DSU data structure
// path compression + union by rank
class DSU {
    int* parent;
    int* rank;

public:
    DSU(int n)
    {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = -1;
            rank[i] = 1;
        }
    }

    // Find function
    int find(int i)
    {
        if (parent[i] == -1)
            return i;
        return parent[i] = find(parent[i]);
    }

    // Union function
    void unite(int x, int y)
    {
        int s1 = find(x);
        int s2 = find(y);
        if (s1 != s2) {
            if (rank[s1] < rank[s2]) {
                parent[s1] = s2;
            }
            else if (rank[s1] > rank[s2]) {
                parent[s2] = s1;
            }
            else {
                parent[s2] = s1;
                rank[s1] += 1;
            }
        }
    }
};

class Graph {
    vector<vector<int> > edgelist;
    int V;

public:
    Graph(int V) { this->V = V; }

    // Function to add an edge to the graph
    void addEdge(int x, int y, int w)
    {
        edgelist.push_back({ w, x, y });
    }

    void kruskals_mst()
    {
        // Sort all edges by weight
        sort(edgelist.begin(), edgelist.end());

        // Initialize the DSU
        DSU s(V);
        int ans = 0;
        cout << "Following are the edges in the "
                "constructed MST"
             << endl;
        for (auto edge : edgelist) {
            int w = edge[0];
            int x = edge[1];
            int y = edge[2];

            // Take this edge in the MST if it does
            // not form a cycle
            if (s.find(x) != s.find(y)) {
                s.unite(x, y);
                ans += w;
                cout << x << " -- " << y << " == " << w
                     << endl;
            }
        }
        cout << "Minimum Cost Spanning Tree: " << ans;
    }
};

// Driver code
int main()
{
    Graph g(4);
    g.addEdge(0, 1, 10);
    g.addEdge(1, 3, 15);
    g.addEdge(2, 3, 4);
    g.addEdge(2, 0, 6);
    g.addEdge(0, 3, 5);

    // Function call
    g.kruskals_mst();

    return 0;
}

2. Prim's minimum spanning tree

Prim's algorithm is another popular approach for computing the minimum spanning tree. Unlike
Kruskal's, which starts with an empty set and greedily adds edges, Prim's algorithm starts with a
single vertex and gradually expands the tree.

 It starts by selecting an arbitrary vertex and then adding it to the MST.

 Then, it repeatedly checks for the minimum edge weight that connects one vertex of MST
to another vertex that is not yet in the MST.

 This process is continued until all the vertices are included in the MST.
Prim's algorithm works by always adding the cheapest available edge that connects a vertex in the
current MST to a vertex outside of it. This ensures that the final result is the minimum spanning
tree.

The overall time complexity of Prim's algorithm is O(E log V) when implemented with a binary heap, where E is the number of edges and V is the number of vertices. Kruskal's algorithm tends to be preferred for sparse graphs, while Prim's algorithm (especially the O(V^2) adjacency-matrix version) can be more efficient for dense graphs.

Example

Step 1. Start with a weighted graph.

Step 2. Choose a vertex.

Step 3. Choose the shortest edge from this vertex and add it.

Step 4. Choose the nearest vertex not yet in the solution.

Step 5. Choose the nearest edge not yet in the solution; if there are multiple choices, choose one at random. Repeat this step until all vertices are in the tree.

Algorithm

// Prim's Algorithm in C++

#include <cstring>
#include <iostream>
using namespace std;

#define INF 9999999

// number of vertices in the graph
#define V 5

// create a 2d array of size 5x5
// for the adjacency matrix that represents the graph
int G[V][V] = {
  {0, 9, 75, 0, 0},
  {9, 0, 95, 19, 42},
  {75, 95, 0, 51, 66},
  {0, 19, 51, 0, 31},
  {0, 42, 66, 31, 0}};

int main() {
  int no_edge;  // number of edges

  // create an array to track the selected vertices:
  // selected[i] becomes true once vertex i is in the MST
  int selected[V];

  // initially no vertex is selected
  memset(selected, false, sizeof(selected));

  // set number of edges to 0
  no_edge = 0;

  // the number of edges in a minimum spanning tree is
  // always (V - 1), where V is the number of vertices in the graph

  // choose the 0th vertex and mark it selected
  selected[0] = true;

  int x;  // row number
  int y;  // col number

  // print the edges and their weights
  cout << "Edge"
       << " : "
       << "Weight";
  cout << endl;

  while (no_edge < V - 1) {
    // For every vertex already in the MST, look at all adjacent
    // vertices that are not yet selected and pick the edge of
    // minimum weight crossing from the MST to the rest of the graph.
    int min = INF;
    x = 0;
    y = 0;

    for (int i = 0; i < V; i++) {
      if (selected[i]) {
        for (int j = 0; j < V; j++) {
          if (!selected[j] && G[i][j]) {  // not selected and there is an edge
            if (min > G[i][j]) {
              min = G[i][j];
              x = i;
              y = j;
            }
          }
        }
      }
    }
    cout << x << " - " << y << " : " << G[x][y];
    cout << endl;
    selected[y] = true;
    no_edge++;
  }

  return 0;
}
