
Optimal File Merge Patterns

An optimal merge pattern concerns merging two or more sorted files into a single sorted file, which can be done with the two-way merge method. When two or more sorted files are merged to form a single file, a merge order that produces that file with the minimum number of computations is known as an optimal merge pattern.

If more than two files need to be merged, it can be done in pairs. For example, to merge four files A, B, C and D: first merge A with B to get X1, then merge X1 with C to get X2, and finally merge X2 with D to get X3, the output file.

If we have two files of sizes m and n, the total computation time will be O(m+n).

Algorithm: TREE (n)

for i := 1 to n - 1 do
    declare new node
    node.leftchild := least (list)
    node.rightchild := least (list)
    node.weight := ((node.leftchild).weight) + ((node.rightchild).weight)
    insert (list, node)
return least (list)

Analysis of Optimal Merge Pattern

An optimal merge pattern corresponds to a binary merge tree with minimum weighted external path length. The function TREE uses the greedy rule to get a two-way merge tree for n files. The algorithm takes as input a list of n trees. Each node of a tree has three fields: lchild, rchild, and weight.

Initially, each tree in the list contains just one node. This external node has its lchild and rchild fields set to zero, while its weight is the length of one of the n files to be merged. For any tree in the list with root node t, t.weight gives the length of the merged file.

The function TREE uses two auxiliary functions, least (list) and insert (list, t). Least (list) finds a tree in the list whose root has the least weight, deletes that tree from the list, and returns a pointer to it. Insert (list, t) inserts the tree with root t into the list.

The main for loop in this algorithm is executed n - 1 times. If the list is kept in increasing order of the weight values in the roots, then least (list) needs only O(1) time and insert (list, t) can be performed in O(n) time; hence, the total time taken is O(n^2). If the list is represented as a min-heap, in which the root value is less than or equal to the values of its children, then least (list) and insert (list, t) can each be done in O(log n) time, and the computing time for TREE becomes O(n log n).

Example

Let us consider the given files f1, f2, f3, f4 and f5 with 20, 30, 10, 5 and 30 elements respectively.

If merge operations are performed according to the provided sequence, then

M1 = merge f1 and f2 => 20 + 30 = 50

M2 = merge M1 and f3 => 50 + 10 = 60

M3 = merge M2 and f4 => 60 + 5 = 65

M4 = merge M3 and f5 => 65 + 30 = 95

Hence, the total number of operations is 50 + 60 + 65 + 95 = 270.

Now, the question arises: is there any better solution?

Sorting the file sizes in ascending order, we get the following sequence: f4, f3, f1, f2, f5. Merge operations can now be performed on this sequence.

M1 = merge f4 and f3 => 5 + 10 = 15

M2 = merge M1 and f1 => 15 + 20 = 35

M3 = merge M2 and f2 => 35 + 30 = 65

M4 = merge M3 and f5 => 65 + 30 = 95

Therefore, the total number of operations is 15 + 35 + 65 + 95 = 210. Obviously, this is better than the previous one. The greedy algorithm does better still: after each merge, the combined file is inserted back into the sorted list, so f2 and f5 are merged with each other (30 + 30 = 60) rather than with the running total. The resulting merge order costs 15 + 35 + 60 + 95 = 205 computations, which is optimal.

Example 2: Given a set of unsorted files: 5, 3, 2, 7, 9, 13

Now, arrange these elements in ascending order: 2, 3, 5, 7, 9, 13

After this, repeatedly pick the two smallest numbers, merge them, and insert the sum back, until only one number is left:

2 + 3 = 5, 5 + 5 = 10, 7 + 9 = 16, 10 + 13 = 23, 16 + 23 = 39

So, the merging cost = 5 + 10 + 16 + 23 + 39 = 93

It can also be implemented as follows:

#include <bits/stdc++.h>
using namespace std;

int minComputation(int size, int files[])
{
    // Create a min heap
    priority_queue<int, vector<int>, greater<int> > pq;

    for (int i = 0; i < size; i++) {
        // Add sizes to the priority queue
        pq.push(files[i]);
    }

    // Variable to count the total computation
    int count = 0;

    while (pq.size() > 1) {
        // Pop the two smallest size elements
        // from the min heap
        int first_smallest = pq.top();
        pq.pop();
        int second_smallest = pq.top();
        pq.pop();

        int temp = first_smallest + second_smallest;

        // Add the current computation
        // to the previous ones
        count += temp;

        // Push the new combined file size
        // back into the min heap
        pq.push(temp);
    }

    return count;
}

// Driver code
int main()
{
    // Number of files
    int n = 6;

    // 6 files with their sizes
    int files[] = { 2, 3, 4, 5, 6, 7 };

    // Total number of computations to be done (the final answer)
    cout << "Minimum Computations = " << minComputation(n, files);

    return 0;
}

Time Complexity: O(nlogn)

Auxiliary Space: O(n)

Huffman Coding
Huffman Coding is a technique of compressing data to reduce its size without losing any of the details. It
was first developed by David Huffman.

Huffman Coding is generally useful to compress the data in which there are frequently occurring characters.

How Huffman Coding works

Suppose the string BCAADDDCCACACAC is to be sent over a network.
Each character occupies 8 bits. There are a total of 15 characters in the above string. Thus, a total of 8 * 15
= 120 bits are required to send this string.

Using the Huffman Coding technique, we can compress the string to a smaller size.

Huffman coding first creates a tree using the frequencies of the characters and then generates a code for each character.

Once the data is encoded, it has to be decoded. Decoding is done using the same tree.

Huffman Coding prevents any ambiguity in the decoding process using the concept of a prefix code, i.e., a code associated with a character should not be a prefix of any other code. The tree created above helps maintain this property.

Huffman coding is done with the help of the following steps.

1. Calculate the frequency of each character in the string.

2. Sort the characters in increasing order of the frequency. These are stored in a priority queue Q.

3. Make each unique character as a leaf node.

4. Create an empty node z. Assign the minimum frequency to the left child of z and assign the second
minimum frequency to the right child of z. Set the value of the z as the sum of the above two
minimum frequencies.
5. Remove these two minimum frequencies from Q and add the sum to the list of frequencies (* denotes an internal node in the figure above).

6. Insert node z into the tree.

7. Repeat steps 3 to 5 for all the characters.

8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.

For sending the above string over a network, we have to send the tree as well as the compressed code. The total size is given by the table below.
Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to 32 + 15 + 28 = 75 bits.

Decoding the code

For decoding the code, we can take the code and traverse through the tree to find the character.

Suppose 101 is to be decoded; we can traverse from the root as in the figure below.

Huffman Coding Complexity

The time complexity for encoding each unique character based on its frequency is O(n log n).

Extracting the minimum frequency from the priority queue takes place 2*(n-1) times and its complexity is O(log n). Thus the overall complexity is O(n log n).
Minimum spanning tree (MST)

A minimum spanning tree (MST) is a fundamental concept in graph theory and computer science. Given a connected, undirected graph, the goal is to find a subset of the edges that connects all the vertices together in the most efficient way possible. An MST is defined as a spanning tree that has the minimum weight among all possible spanning trees. The minimum spanning tree has all the properties of a spanning tree, with the added constraint of having the minimum possible total weight among all possible spanning trees. Like spanning trees, there can be many possible MSTs for a graph.

Properties of spanning tree

- The number of vertices (V) in the graph and in the spanning tree is the same.
- There is a fixed number of edges in the spanning tree, equal to one less than the total number of vertices (E = V - 1).
- The spanning tree should not be disconnected: there should be only a single connected component, not more than that.
- The spanning tree should be acyclic, meaning there is no cycle in the tree.
- There can be many possible spanning trees for a graph.

Algorithms to find minimum spanning tree

1. Kruskal’s minimum spanning tree


Kruskal's algorithm is a greedy approach to constructing the minimum spanning tree. The basic
steps are as follows:

 First, it sorts all the edges of the graph by their weights.

 Then it starts the iterations of building the spanning tree.

 At each iteration, the algorithm adds the next lowest-weight edge, such that the edges picked so far do not form a cycle.

The key idea behind Kruskal's algorithm is to greedily add the cheapest available edge that does
not create a cycle. By sorting the edges first, we can efficiently check for cycles and ensure the
final result is the minimum spanning tree.

To efficiently check for cycles, Kruskal's algorithm uses a data structure called a disjoint-set, also
known as a union-find data structure. This allows for quick determination of whether two vertices
are in the same connected component, and thus whether adding an edge would create a cycle.

The time complexity of Kruskal's algorithm is O(E log E), where E is the number of edges in the
graph. This is due to the sorting step, which dominates the overall runtime.
Example

The graph contains 9 vertices and 14 edges, so the minimum spanning tree formed will have (9 - 1) = 8 edges.
After sorting:

Step 1: Pick edge 7-6. No cycle is formed, include it.

Step 2: Pick edge 8-2. No cycle is formed, include it.

Step 3: Pick edge 6-5. No cycle is formed, include it.


Step 4: Pick edge 0-1. No cycle is formed, include it.

Step 5: Pick edge 2-5. No cycle is formed, include it.

Step 6: Pick edge 8-6. Since including this edge results in a cycle, discard it. Pick edge 2-3. No cycle is formed, include it.

Step 7: Pick edge 7-8. Since including this edge results in a cycle, discard it. Pick edge 0-7. No cycle is formed, include it.

Step 8: Pick edge 1-2. Since including this edge results in a cycle, discard it. Pick edge 3-4. No cycle is formed, include it.
Algorithm

#include <bits/stdc++.h>
using namespace std;

// DSU data structure
// path compression + union by rank
class DSU {
    int* parent;
    int* rank;

public:
    DSU(int n)
    {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = -1;
            rank[i] = 1;
        }
    }

    // Find function
    int find(int i)
    {
        if (parent[i] == -1)
            return i;
        return parent[i] = find(parent[i]);
    }

    // Union function
    void unite(int x, int y)
    {
        int s1 = find(x);
        int s2 = find(y);
        if (s1 != s2) {
            if (rank[s1] < rank[s2]) {
                parent[s1] = s2;
            }
            else if (rank[s1] > rank[s2]) {
                parent[s2] = s1;
            }
            else {
                parent[s2] = s1;
                rank[s1] += 1;
            }
        }
    }
};

class Graph {
    vector<vector<int> > edgelist;
    int V;

public:
    Graph(int V) { this->V = V; }

    // Function to add an edge to the graph
    void addEdge(int x, int y, int w)
    {
        edgelist.push_back({ w, x, y });
    }

    void kruskals_mst()
    {
        // Sort all edges by weight
        sort(edgelist.begin(), edgelist.end());

        // Initialize the DSU
        DSU s(V);
        int ans = 0;
        cout << "Following are the edges in the "
                "constructed MST"
             << endl;
        for (auto edge : edgelist) {
            int w = edge[0];
            int x = edge[1];
            int y = edge[2];

            // Take this edge in the MST if it does
            // not form a cycle
            if (s.find(x) != s.find(y)) {
                s.unite(x, y);
                ans += w;
                cout << x << " -- " << y << " == " << w
                     << endl;
            }
        }
        cout << "Minimum Cost Spanning Tree: " << ans;
    }
};

// Driver code
int main()
{
    Graph g(4);
    g.addEdge(0, 1, 10);
    g.addEdge(1, 3, 15);
    g.addEdge(2, 3, 4);
    g.addEdge(2, 0, 6);
    g.addEdge(0, 3, 5);

    // Function call
    g.kruskals_mst();

    return 0;
}

2. Prim's minimum spanning tree

Prim's algorithm is another popular approach for computing the minimum spanning tree. Unlike
Kruskal's, which starts with an empty set and greedily adds edges, Prim's algorithm starts with a
single vertex and gradually expands the tree.

 It starts by selecting an arbitrary vertex and then adding it to the MST.

 Then, it repeatedly checks for the minimum edge weight that connects one vertex of MST
to another vertex that is not yet in the MST.

 This process is continued until all the vertices are included in the MST.
Prim's algorithm works by always adding the cheapest available edge that connects a vertex in the
current MST to a vertex outside of it. This ensures that the final result is the minimum spanning
tree.

The overall time complexity of Prim's algorithm is O(E log V) when implemented with a binary heap, where E is the number of edges and V is the number of vertices. Kruskal's algorithm tends to be preferred for sparse graphs, while Prim's algorithm (especially the O(V^2) adjacency-matrix version) can be more efficient for dense graphs.

Example

Step 1. Start with a weighted graph.

Step 2. Choose a vertex.

Step 3. Choose the shortest edge from this vertex and add it.

Step 4. Choose the nearest vertex not yet in the solution.

Step 5. Choose the nearest edge not yet in the solution; if there are multiple choices, choose one at random. Repeat this step until all vertices are in the tree.

Algorithm

// Prim's Algorithm in C++

#include <cstring>
#include <iostream>
using namespace std;

#define INF 9999999

// number of vertices in the graph
#define V 5

// create a 2d array of size 5x5
// for the adjacency matrix that represents the graph
int G[V][V] = {
  {0, 9, 75, 0, 0},
  {9, 0, 95, 19, 42},
  {75, 95, 0, 51, 66},
  {0, 19, 51, 0, 31},
  {0, 42, 66, 31, 0}};

int main() {
  int no_edge;  // number of edges

  // create an array to track the selected vertices:
  // selected[i] becomes true once vertex i is in the MST
  int selected[V];

  // initially no vertex is selected
  memset(selected, false, sizeof(selected));

  // set number of edges to 0
  no_edge = 0;

  // the number of edges in a minimum spanning tree is
  // always (V - 1), where V is the number of vertices in the graph

  // choose the 0th vertex and mark it selected
  selected[0] = true;

  int x;  // row number
  int y;  // col number

  // print the edges and their weights
  cout << "Edge"
       << " : "
       << "Weight";
  cout << endl;

  while (no_edge < V - 1) {
    // For every vertex already in the MST, look at all adjacent
    // vertices that are not yet selected and pick the edge of
    // minimum weight crossing from the MST to the rest of the graph.
    int min = INF;
    x = 0;
    y = 0;

    for (int i = 0; i < V; i++) {
      if (selected[i]) {
        for (int j = 0; j < V; j++) {
          if (!selected[j] && G[i][j]) {  // not selected and there is an edge
            if (min > G[i][j]) {
              min = G[i][j];
              x = i;
              y = j;
            }
          }
        }
      }
    }
    cout << x << " - " << y << " : " << G[x][y];
    cout << endl;
    selected[y] = true;
    no_edge++;
  }

  return 0;
}
