Unit 3: Approximation Algorithms
Intuitively, the approximation ratio measures how far the approximate solution is from an optimal
solution. A large approximation ratio means the solution is much worse than an optimal solution,
while a ratio close to 1 means it is more or less the same as an optimal solution.
Observe that ρ(n) is always ≥ 1; if the ratio does not depend on n, we may simply write ρ. Therefore, a
1-approximation algorithm gives an optimal solution. Some problems have polynomial-time
approximation algorithms with small constant approximation ratios, while for others the best-known
polynomial-time approximation algorithms have approximation ratios that grow with n.
Approximation Algorithms
Approximation algorithms are designed to find approximate (near-optimal) solutions to optimization
problems that are not known to be solvable exactly in polynomial time; the hardest of these are the
NP-complete problems. Because such problems model many important real-world tasks, it becomes
important to attack them with a different approach.
NP-complete problems can still be handled in three ways: the input may be small enough that an
exponential-time algorithm finishes quickly, important special cases of the problem may be solvable
in polynomial time, or approximation algorithms can be used to find near-optimal solutions.
Performance Ratios
The main idea behind calculating the performance ratio of an approximation algorithm, also called its
approximation ratio, is to find how close the approximate solution is to the optimal solution.
The approximation ratio is written ρ(n), where n is the input size of the algorithm, C is the cost of the
near-optimal solution obtained by the algorithm, and C* is the cost of an optimal solution for the
problem. The algorithm has an approximation ratio of ρ(n) if and only if
max(C / C*, C* / C) ≤ ρ(n)
The algorithm is then called a ρ(n)-approximation algorithm. Approximation algorithms can be applied
to two types of optimization problems: minimization problems and maximization problems. If the
goal is to find a solution of maximum cost (value), the problem is known as a maximization problem;
if the goal is to find a solution of minimum cost, the problem is known as a minimization problem.
For maximization problems, the approximation ratio is calculated by C*/C since 0 ≤ C ≤ C*. For
minimization problems, the approximation ratio is calculated by C/C* since 0 ≤ C* ≤ C.
Assuming that the costs of the solutions involved are all positive, the performance ratio is well
defined and never less than 1. A ratio of exactly 1 means the approximation algorithm produces an
optimal solution.
A good approximation algorithm aims for an approximation ratio as close as possible to 1, meaning the
algorithm’s output is as close to optimal as possible.
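As a small illustration (not part of the original notes), the ratio for a single instance can be computed
directly from the two costs. The following Python sketch assumes both costs are positive; the function
name is illustrative.

    def approximation_ratio(C, C_star, maximization=False):
        """Approximation ratio rho for a single instance.

        C      -- cost/value of the approximate solution (assumed > 0)
        C_star -- cost/value of an optimal solution (assumed > 0)
        For minimization problems rho = C / C_star; for maximization
        problems rho = C_star / C.  Either way rho >= 1, and rho == 1
        means the approximate solution is optimal.
        """
        return (C_star / C) if maximization else (C / C_star)

    # Example: a tour of length 120 against an optimal tour of length 100
    print(approximation_ratio(120, 100))   # 1.2 (20% worse than optimal)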
Performance Guarantee: Approximation algorithms often come with performance guarantees, which
bound how much worse the algorithm's solution can be compared to the optimal solution. Such a
guarantee ensures that the algorithm's solution will always be within a known factor of the optimal.
Types of Approximation Algorithms:
o Greedy Algorithms: Make locally optimal choices at each step with the hope of finding
a global optimum. Common in problems like the Set Cover or Knapsack problems.
o Primal-Dual Algorithms: These algorithms solve optimization problems by
simultaneously considering the primal and dual problems and maintaining a
relationship between them.
o Local Search Algorithms: Involve iterating over possible solutions and moving to a
neighboring solution that improves the objective function.
Conclusion
Approximation algorithms offer a powerful way to deal with NP-hard optimization problems by trading
off perfect accuracy for efficiency. While they may not always provide the exact optimal solution, their
ability to find near-optimal solutions in polynomial time makes them invaluable in real-world
applications where exact solutions would be too slow or infeasible to compute.
There exist problems whose efficient solutions have not yet been found; such problems are grouped
into classes known as Complexity Classes. In complexity theory, a Complexity Class is a set of problems
of related complexity. These classes help us group problems based on how much time and space they
require to solve and to verify solutions. Complexity theory is the branch of the theory of computation
that deals with the resources required to solve a problem.
The common resources are time and space, meaning how much time the algorithm takes to solve a
problem and the corresponding memory usage.
• The time complexity of an algorithm is used to describe the number of steps required to solve
a problem, but it can also be used to describe how long it takes to verify the answer.
• The space complexity of an algorithm describes how much memory is required for the
algorithm to operate.
P Class
The P in the P class stands for Polynomial Time. It is the collection of decision problems (problems with
a “yes” or “no” answer) that can be solved by a deterministic machine in polynomial time.
Features:
• The solution to P problems is easy to find.
• P is often a class of computational problems that are solvable and tractable. Tractable means
that the problems can be solved in theory as well as in practice. But the problems that can be
solved in theory but not in practice are known as intractable.
This class contains many problems:
1. Calculating the greatest common divisor.
2. Finding a maximum matching.
3. Merge Sort
NP Class
The NP in NP class stands for Non-deterministic Polynomial Time. It is the collection of decision
problems that can be solved by a non-deterministic machine in polynomial time.
Features:
• Solutions to problems in the NP class may be hard to find, since they are defined in terms of a
non-deterministic machine, but the solutions are easy to verify.
• Problems of NP can be verified by a Turing machine in polynomial time.
This class contains many problems that one would like to be able to solve effectively:
1. Boolean Satisfiability Problem (SAT).
2. Hamiltonian Path Problem.
3. Graph coloring.
Co-NP Class
Co-NP stands for the complement of the NP class. A problem is in Co-NP if, whenever the answer is
"No", there is a proof of that fact which can be checked in polynomial time.
Features:
• If a problem X is in NP, then its complement X’ is also in CoNP.
Some example problems for CoNP are:
1. Primality testing (checking whether a number is prime).
2. Integer Factorization.
NP-hard class
An NP-hard problem is at least as hard as the hardest problem in NP and it is a class of problems such
that every problem in NP reduces to NP-hard.
Features:
• Not all NP-hard problems are in NP.
• Their solutions may take a long time even to check. That is, given a claimed solution to an
NP-hard problem, there may be no fast way to verify whether it is correct.
• A problem A is in NP-hard if, for every problem L in NP, there exists a polynomial-time reduction
from L to A.
Some examples of problems in NP-hard are:
1. The Halting problem.
2. Quantified Boolean formulas (QBF).
3. The no-Hamiltonian-cycle problem (deciding that a graph has no Hamiltonian cycle).
NP-complete class
A problem is NP-complete if it is both NP and NP-hard. NP-complete problems are the hard problems
in NP.
Features:
• NP-complete problems are special as any problem in NP class can be transformed or reduced
into NP-complete problems in polynomial time.
• If one could solve an NP-complete problem in polynomial time, then one could also solve any
NP problem in polynomial time.
Definition of NP class Problem: - The set of decision problems whose solutions cannot necessarily be
produced within polynomial time but can be verified in polynomial time. The NP class contains the P
class as a subset. NP problems are, in general, hard to solve.
Definition of P class Problem: - The set of decision problems that can be solved, i.e., an output
produced, within polynomial time. P problems are easy to solve.
Definition of Polynomial time: - An algorithm runs in polynomial time if it produces its output within a
number of steps bounded by a polynomial in the size of the input.
Definition of Non-Polynomial time: - If an algorithm produces an output for the given input but the
number of steps is not bounded by any polynomial in the input size, it is said to run in non-polynomial
time. The output is eventually produced, but no polynomial bound on the time is guaranteed.
Definition of Decision Based Problem: - A problem is called a decision problem if its output is a simple
"yes" or "no" (you may think of this as true/false, 0/1, accept/reject). Many optimization problems can
be phrased as decision problems; for example, given a graph G = (V, E), does there exist a Hamiltonian
cycle in G?
Definition of NP-hard class: - A problem belongs to the NP-hard class if it satisfies the following points:
1. If we could solve this problem in polynomial time, then we could solve all NP problems in
polynomial time.
2. Every problem in NP can be converted (reduced) to it in polynomial time.
Definition of NP-complete class: - A problem is in NP-complete, if
1. It is in NP
2. It is NP-hard
Pictorial representation of the relationships among P, NP, NP-hard, and NP-complete (figure omitted).
In computational complexity theory, P, NP, NP-Hard, and NP-Complete are classes used to categorize
problems based on their computational difficulty and the resources required to solve them. These
concepts help us understand the inherent difficulty of solving various computational problems and
how they relate to one another.
P (Polynomial Time)
• Definition: P is the class of decision problems (problems with a yes/no answer) that can be
solved in polynomial time by a deterministic algorithm. In other words, problems in P can be
solved efficiently.
• Examples:
o Sorting algorithms (like MergeSort and QuickSort).
o Finding the shortest path in a graph (using Dijkstra's algorithm).
o Matrix multiplication.
NP (Nondeterministic Polynomial Time)
• Definition: NP is the class of decision problems for which a proposed solution can be verified
in polynomial time by a deterministic algorithm. In simpler terms, if someone gives you a
solution, you can check whether it’s correct in polynomial time.
• Key Point: Problems in NP might not be solvable efficiently, but if you are given a potential
solution, you can verify its correctness quickly (in polynomial time).
• Examples:
o Sudoku (given a completed grid, it's easy to verify if it's correct).
o Graph Coloring (given a coloring, it's easy to check if it satisfies the constraints).
NP-Hard
• Definition: NP-Hard is a class of problems that are at least as hard as the hardest problems in
NP. In other words, a problem is NP-Hard if any problem in NP can be reduced to it in
polynomial time.
• Key Point: An NP-Hard problem does not necessarily have to be in NP itself (i.e., it might not
have a polynomial-time verification procedure), but it is at least as difficult as the hardest NP
problems.
• Examples:
o Travelling Salesman Problem (TSP) (finding the shortest possible route through all
cities).
o Knapsack Problem (select items to maximize value within weight limits).
o Halting Problem (deciding whether a program will halt or run forever).
NP-Complete
• Definition: NP-Complete is a subset of NP problems that are both in NP and NP-Hard. In other
words, a problem is NP-Complete if:
1. It is in NP (its solution can be verified in polynomial time).
2. Every problem in NP can be reduced to it in polynomial time (meaning it is at least as
hard as the hardest problems in NP).
• Key Point: If you can solve any NP-Complete problem in polynomial time, then every problem
in NP can also be solved in polynomial time (this would imply P = NP).
• Examples:
o Boolean Satisfiability Problem (SAT) (given a boolean expression, determine if there's
an assignment that makes it true).
o Knapsack Problem (with integer weights and values).
o Graph Coloring (finding the smallest number of colors to color a graph).
An optimization problem can be solved using Greedy if the problem has the following property:
• At every step, we can make a choice that looks best at the moment, and we get the optimal
solution to the complete problem.
• Some popular Greedy Algorithms are Fractional Knapsack, Dijkstra’s algorithm, Kruskal’s
algorithm, Huffman coding and Prim’s Algorithm
• Greedy algorithms are sometimes also used to obtain approximations for hard optimization
problems. For example, the Traveling Salesman Problem is NP-Hard; a greedy choice for this
problem is to pick the nearest unvisited city from the current city at every step. These
solutions don't always produce an optimal solution, but they can be used to get an
approximately optimal one.
However, it’s important to note that not all problems are suitable for greedy algorithms. They work
best when the problem exhibits the following properties:
• Greedy Choice Property: The optimal solution can be constructed by making the best local
choice at each step.
• Optimal Substructure: The optimal solution to the problem contains the optimal solutions to
its subproblems.
Characteristics of Greedy Algorithm
Here are the characteristics of a greedy algorithm:
• Greedy algorithms are simple and easy to implement.
• They are efficient in terms of time complexity, often providing quick solutions. Where both
approaches apply, greedy algorithms are typically preferred over dynamic programming; for
example, in the Jump Game problem and the Single Source Shortest Path problem (Dijkstra's
algorithm is preferred over Bellman-Ford when there are no negative edge weights).
• These algorithms do not reconsider previous choices, as they make decisions based on current
information without looking ahead.
These characteristics help to define the nature and usage of greedy algorithms in problem-solving.
How does the Greedy Algorithm work?
Greedy algorithms solve optimization problems by making the best local choice at each step in the
hope of finding the global optimum. It's like taking the best option available at each moment, hoping
it will lead to the best overall outcome.
Here’s how it works:
1. Start with the initial state of the problem. This is the starting point from where you begin
making choices.
2. Evaluate all possible choices you can make from the current state. Consider all the options
available at that specific moment.
3. Choose the option that seems best at that moment, regardless of future consequences. This
is the “greedy” part – you take the best option available now, even if it might not be the best
in the long run.
4. Move to the new state based on your chosen option. This becomes your new starting point for
the next iteration.
5. Repeat steps 2-4 until you reach the goal state or no further progress is possible. Keep making
the best local choices until you reach the end of the problem or get stuck.
Example:
Let’s say you have a set of coins with values {1, 2, 5, 10, 20, 50, 100} and you need to make change
for 36 using the minimum number of coins.
The greedy algorithm for making change would work as follows:
1. Start with the largest coin value that is less than or equal to the amount to be changed. In this
case, the largest such coin is 20.
2. Subtract that coin value from the amount to be changed, and add the coin to the
solution. In this case, subtracting 20 from 36 gives 16, and we add a 20 coin to the solution.
3. Repeat steps 1 and 2 until the amount to be changed becomes 0.
So, using the greedy algorithm, making change for 36 requires one 20 coin, one 10 coin, one 5 coin
and one 1 coin (four coins in total).
Note: This is just one example, and other greedy choices could have been made at each step. However,
in this case, the greedy approach leads to the optimal solution.
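A minimal Python sketch of this greedy change-making procedure is shown below; the function name
and the default denomination set are illustrative, not part of the original text. As the note above says,
this greedy choice happens to be optimal for canonical coin systems like this one, but not for
arbitrary denominations.

    def greedy_change(amount, denominations=(100, 50, 20, 10, 5, 2, 1)):
        """Greedy coin change: repeatedly take the largest coin that still fits.
        Not optimal for arbitrary denominations (e.g. {1, 3, 4} and amount 6)."""
        coins = []
        for coin in sorted(denominations, reverse=True):
            while amount >= coin:
                amount -= coin
                coins.append(coin)
        return coins

    print(greedy_change(36))   # [20, 10, 5, 1]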
1. Applications: A greedy algorithm is used when a greedy choice at each step leads to the
globally optimal solution, whereas dynamic programming is applied when the problem can be
broken down into overlapping subproblems.
2. Approximation Ratio:
o For NP-hard problems, greedy algorithms often provide approximation ratios. These ratios
tell us how close the greedy solution is to the optimal one.
o For example, in the Set Cover Problem, the greedy algorithm achieves a ratio of about
ln n, meaning its solution is at most roughly ln n times worse than the optimal solution
(a sketch of greedy set cover appears after this list).
3. Time Complexity:
o Greedy algorithms are generally efficient; their running time is often dominated by an
initial sort of the input, giving roughly O(n log n) overall.
4. Limitations:
o Greedy algorithms do not always give the optimal solution for all problems. They can
sometimes be short-sighted, making decisions that seem good in the short term but
are not optimal in the long run.
o For the 0/1 (non-fractional) knapsack and other combinatorial problems, greedy algorithms
may fail to provide a good solution unless the problem has a specific structure.
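To make the Set Cover example mentioned above concrete, here is a small Python sketch of the
greedy set-cover heuristic; the function name and the sample universe/subsets are illustrative
assumptions for demonstration.

    def greedy_set_cover(universe, subsets):
        """Greedy set cover: repeatedly pick the subset covering the most
        still-uncovered elements.  Achieves an O(ln n) approximation ratio."""
        uncovered = set(universe)
        cover = []
        while uncovered:
            # choose the subset that covers the largest number of uncovered elements
            best = max(subsets, key=lambda s: len(uncovered & set(s)))
            if not uncovered & set(best):
                raise ValueError("the given subsets do not cover the universe")
            cover.append(best)
            uncovered -= set(best)
        return cover

    universe = {1, 2, 3, 4, 5}
    subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
    print(greedy_set_cover(universe, subsets))   # e.g. [{1, 2, 3}, {4, 5}]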
Conclusion
The Greedy Approach is a powerful tool in the design of approximation algorithms. Although it does
not always guarantee optimal solutions, it often provides efficient, simple, and good-enough solutions
for NP-hard problems. By choosing the best option at each step, greedy algorithms offer practical and
scalable solutions for complex problems across a variety of fields.
o Approximation: The 0/1 knapsack problem can be solved exactly with dynamic programming,
but for fractional or large-scale versions, relaxed or approximate solutions may be used,
such as the Fractional Knapsack relaxation or Bounded Knapsack approximations.
o Approximation: For some string matching or sequence alignment problems, approximate
string matching or a restricted LCS can be used for large datasets, where the solution is
relaxed to keep the running time manageable.
Approximation Algorithms:
Approximation algorithms play a vital role for the knapsack problem because, in real-world scenarios,
finding the exact optimal solution is often impractical due to the problem’s NP-hard nature. These
algorithms offer a reasonable solution to the knapsack problem while taking both time and space
complexity into consideration. Two common approaches are:
• Greedy Algorithms
• Dynamic Programming
The greedy approach is a simple and intuitive algorithm for solving the fractional knapsack problem.
It selects items based on their value-to-weight ratios, choosing the items with the highest ratios first.
Step-by-step algorithm:
• Sort the items in decreasing order of their value-to-weight ratio.
• Initialize the knapsack as empty and set the total value to zero.
• For each item, in that order:
o If the current item can be fully included in the knapsack, add it completely and
update the total value.
o Otherwise, include the fractional part of the item that fits the remaining capacity of
the knapsack, proportionally increasing the total value.
Time Complexity: O(N log N) where N is the number of items due to sorting
Auxiliary Space: O(N) where N is the number of items.
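The steps above can be turned into a short Python sketch; the function name and the example item
values are illustrative, not part of the original text.

    def fractional_knapsack(values, weights, capacity):
        """Greedy fractional knapsack: take items in decreasing value/weight
        order, splitting the last item if it does not fully fit.
        Returns the maximum total value (optimal for the fractional variant)."""
        # sort item indices by value-to-weight ratio, highest first
        order = sorted(range(len(values)),
                       key=lambda i: values[i] / weights[i], reverse=True)
        total = 0.0
        for i in order:
            if capacity <= 0:
                break
            take = min(weights[i], capacity)        # whole item, or whatever fits
            total += values[i] * (take / weights[i])
            capacity -= take
        return total

    # Example items (value, weight): (60, 10), (100, 20), (120, 30), capacity 50
    print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))   # 240.0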
Using dynamic programming, we break the problem down into smaller subproblems and use a table
to store the optimal solutions to these subproblems. We iterate through each item and weight
combination, deciding whether to include or exclude the item based on its value and weight, and we
avoid redundant calculations by reusing the stored results.
Step-by-step algorithm:
• Create a 2D table called dp with dimensions (N+1) x (W+1) and initialize all values to 0. This
table will store the maximum achievable value for each combination of items considered and
capacity.
• For every item i from 1 to N and every capacity j from 0 to W:
o If the weight of the current item is greater than the current capacity j, it cannot be
included in the knapsack, so copy down the value for the previous item at the same
capacity: dp[i][j] = dp[i-1][j].
o If the weight of the current item is less than or equal to the current capacity, we
have two choices:
o Include the current item: add its value to the value obtained with the remaining
capacity after including it (values[i-1] + dp[i-1][j – weights[i-1]]).
o Exclude the current item: take the value obtained by excluding it (dp[i-1][j]).
Choose the maximum of the two choices and assign it to dp[i][j].
• After completing the iterations, the value at dp[N][W] represents the maximum achievable
value for the given knapsack capacity.
• Return the value dp[N][W] as the maximum value that can be obtained.
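The table-filling procedure described above corresponds to the following Python sketch; the function
name and sample data are illustrative assumptions.

    def knapsack_01(values, weights, W):
        """Bottom-up DP for 0/1 knapsack.  dp[i][j] = best value using the
        first i items with capacity j.  Runs in O(N*W) time (pseudo-polynomial)."""
        N = len(values)
        dp = [[0] * (W + 1) for _ in range(N + 1)]
        for i in range(1, N + 1):
            for j in range(W + 1):
                if weights[i - 1] > j:
                    dp[i][j] = dp[i - 1][j]              # item does not fit
                else:
                    dp[i][j] = max(dp[i - 1][j],         # exclude item i
                                   values[i - 1] + dp[i - 1][j - weights[i - 1]])  # include it
        return dp[N][W]

    print(knapsack_01([60, 100, 120], [10, 20, 30], 50))   # 220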
The Knapsack Problem is one of the most well-known optimization problems and is often used to
demonstrate approximation algorithms. It involves selecting a subset of items, each with a given
weight and value, such that the total weight of the selected items does not exceed a given weight
capacity, and the total value is maximized.
1. 0/1 Knapsack Problem: Each item is either taken in its entirety or not taken at all.
2. Fractional Knapsack Problem: Items can be broken into smaller fractions, allowing you to
take a part of an item.
While the Fractional Knapsack problem can be solved optimally in polynomial time using a greedy
approach, the 0/1 Knapsack Problem is NP-hard, and finding the optimal solution for large instances
becomes computationally expensive. Thus, approximation algorithms are often used when finding an
exact solution is not feasible.
Knapsack Problem Variants in Approximation Algorithms
In the 0/1 Knapsack Problem, each item is either included or excluded, and the goal is to maximize
the total value while ensuring that the total weight does not exceed the given capacity of the
knapsack.
In the Fractional Knapsack Problem, items can be divided into fractions, meaning you can take a
fraction of any item rather than just an entire item.
Since the 0/1 Knapsack Problem is NP-Hard, we often use approximation algorithms to find good
solutions in reasonable time, especially when an exact solution is not feasible due to the problem's
computational complexity.
• Greedy Solution:
Although the greedy algorithm does not guarantee an optimal solution for the 0/1 Knapsack, it can
be adapted as an approximation strategy for certain cases. A simple greedy approach would be to
sort items by their value-to-weight ratio and try to include them in the knapsack, but we can only
take full items.
However, the Greedy Approach for 0/1 Knapsack does not always yield the optimal solution. For
example, sometimes including a lower value item might lead to a better solution than including a
high value-to-weight ratio item.
For the 0/1 Knapsack, the greedy approach might be used as a heuristic, but there is no fixed
approximation ratio that applies universally. Instead, we typically rely on more sophisticated
techniques.
Approximation Algorithms for the Knapsack Problem: Summary
Key Takeaways
• The 0/1 Knapsack Problem is NP-hard, meaning that finding an exact solution can be
computationally expensive for large problem instances.
• Greedy algorithms provide a fast heuristic but do not guarantee optimal solutions for the
0/1 Knapsack Problem.
• Dynamic Programming can provide an exact solution but is pseudo-polynomial in time, and
scaling techniques can be used for approximation.
In practice, when an exact solution is infeasible, approximation algorithms like FPTAS or dynamic
programming with scaling offer a good balance between computational time and solution quality.
Huffman Coding is a popular lossless data compression algorithm used to minimize the size of data
while preserving the integrity of the original information. It works by assigning variable-length binary
codes to each input character, with shorter codes assigned to more frequent characters, and longer
codes assigned to less frequent ones. The main goal of Huffman coding is to reduce the total number
of bits required to represent a given set of symbols (or characters).
The algorithm was developed by David A. Huffman in 1952 while he was a Ph.D. student at MIT, and
it has since become one of the foundational algorithms in data compression, particularly used in
formats like ZIP files, JPEG image compression, and MP3 audio compression.
How Huffman Coding Works
1. Frequency Analysis:
o The first step is to analyze the frequency of each character (or symbol) in the input
data. Characters that appear more frequently are given shorter codes, while those
that appear less frequently are assigned longer codes.
2. Building the Huffman Tree:
o A binary tree is built in which each leaf node represents a symbol, and its weight
corresponds to the frequency of that symbol.
o Two nodes with the smallest frequencies are repeatedly combined into a parent
node. The frequency of the parent node is the sum of the frequencies of the two
child nodes.
o This process continues until there is only one node left, which becomes the root of
the Huffman tree.
3. Assigning Codes:
o After the binary tree is constructed, the Huffman codes are assigned. Starting from
the root, move left for 0 and right for 1. The code for each symbol is the sequence of
0s and 1s from the root to the corresponding leaf node.
4. Compression:
o The input data is then encoded using these binary codes. Since more frequent
symbols have shorter codes, the total length of the compressed data is minimized.
• Optimality: It produces an optimal prefix code for a given set of symbols, ensuring the
smallest possible number of bits required for encoding, given the frequencies of the
symbols.
• Binary Tree Structure: The algorithm works by constructing a binary tree where the leaves
represent the symbols, and the tree is built based on the frequencies of those symbols.
o Place each symbol along with its frequency into a priority queue (or min-heap)
ordered by frequency.
o Extract the two nodes with the smallest frequencies from the queue.
o Create a new internal node with these two nodes as children. The frequency of the
internal node is the sum of the two children's frequencies.
o Insert the new internal node back into the priority queue.
o Repeat this process until there is only one node left, which will be the root of the
Huffman tree.
o Traverse the tree starting from the root. Assign 0 for left branches and 1 for right
branches.
o The path from the root to each leaf node represents the Huffman code for that
symbol.
o Use the Huffman codes to replace each symbol in the input data with its
corresponding Huffman code.
5. Decoding:
o To decode the compressed data, traverse the Huffman tree according to the
sequence of 0s and 1s, starting from the root. Each time a leaf is reached, the
corresponding symbol is decoded.
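The priority-queue construction and code assignment described above can be sketched in Python as
follows; the function name, the tuple-based tree representation, and the sample string are illustrative
choices, not part of the original text.

    import heapq
    from collections import Counter

    def huffman_codes(text):
        """Build Huffman codes for the characters of `text` using a min-heap.
        Returns a dict mapping each character to its binary code string."""
        freq = Counter(text)
        if len(freq) == 1:                       # edge case: one distinct symbol
            return {next(iter(freq)): "0"}
        # heap entries: (frequency, tie-breaker, tree); a tree is either a
        # leaf character or a pair (left_subtree, right_subtree)
        heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # two smallest frequencies
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, counter, (left, right)))
            counter += 1
        codes = {}
        def assign(node, prefix):
            if isinstance(node, tuple):          # internal node: recurse
                assign(node[0], prefix + "0")    # left edge gets '0'
                assign(node[1], prefix + "1")    # right edge gets '1'
            else:
                codes[node] = prefix             # leaf: record the code
        assign(heap[0][2], "")
        return codes

    codes = huffman_codes("abracadabra")
    encoded = "".join(codes[ch] for ch in "abracadabra")
    print(codes, len(encoded), "bits")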
• Efficient Compression: Huffman coding minimizes the number of bits required to represent
data, making it highly efficient for data storage and transmission.
• Optimal for Known Frequencies: If the frequency distribution of symbols is known, Huffman
coding produces the optimal prefix-free binary code.
• Widely Used: It is the basis for many common compression algorithms, including ZIP files,
JPEG image format, and MP3 audio encoding.
• Requires Frequency Information: Huffman coding requires knowing the frequency of each
symbol in advance, which may not always be feasible or efficient if the symbol frequencies
change over time.
• No Support for Adaptive Compression: While Huffman coding works optimally for static
frequency distributions, it doesn't adapt well to data streams where frequencies may change
dynamically. This limitation can be addressed by Adaptive Huffman Coding.
• It uses a binary tree structure, built using a greedy algorithm, to generate optimal codes for
the symbols.
• Efficiency: Huffman coding achieves optimal compression when symbol frequencies are
known in advance.
• It is widely used in compression schemes for file formats such as ZIP, JPEG, and MP3.
• Time Complexity: Building the tree takes O(n log n), where n is the number of unique symbols.
• Decoding: Requires the tree for decoding, which can be done efficiently by traversing the
tree.
Huffman Coding is also used as a component in many different compression algorithms. It is used as
a component in lossless compressions such as zip, gzip, and png, and even as part of lossy
compression algorithms like mp3 and jpeg.
How it works:
1. Count how often each piece of data occurs.
2. Build a binary tree, starting with the nodes with the lowest count. The new parent node has
the combined count of its child nodes.
the combined count of its child nodes.
3. The edge from a parent gets '0' for the left child, and '1' for the edge to the right child.
4. In the finished binary tree, follow the edges from the root node, adding '0' or '1' for each
branch, to find the new Huffman code for each piece of data.
5. Create the Huffman code by converting the data, piece-by-piece, into a binary code using the
binary tree.
Huffman Coding uses a variable number of bits to represent each piece of data, with a shorter bit
representation for the pieces of data that occur more often.
Furthermore, Huffman Coding ensures that no code is the prefix of another code, which makes the
compressed data easy to decode.
Data compression is when the original data size is reduced, but the information is mostly, or fully,
kept. Sound or music files are for example usually stored in a compressed format, roughly just 10% of
the original data size, but with most of the information kept.
Lossless means that even after the data is compressed, all the information is still there. This means
that for example a compressed text still has all the same letters and characters as the original.
Lossy is the other variant of data compression, where some of the original information is lost, or
sacrificed, so that the data can be compressed even more. Music, images, and video are normally
stored and streamed with lossy compression such as MP3, JPEG, and MP4.
Data may be compressed using the Huffman Coding technique to become smaller without losing any
of its information. The technique is named after David A. Huffman, who developed it. Data that
contains frequently repeated characters is typically compressed well using Huffman coding.
Huffman Coding is a well-known greedy algorithm. The size of the code allocated to a character
depends on the frequency of the character, which is why it is referred to as a greedy algorithm. The
shortest variable-length code is assigned to the character with the highest frequency, and vice versa
for characters with lower frequencies. It employs variable-length encoding, meaning it gives each
character in the provided data stream a variable-length code of its own.
Prefix Rule
Essentially, this rule states that the code allocated to a character must not be a prefix of another
code. If this rule is broken, ambiguities can arise when decoding the encoded bit stream.
What is the Huffman Coding process?
The Huffman Code is obtained for each distinct character in primarily two steps:
o Create a Huffman Tree first using only the unique characters in the data stream provided.
o Second, we must traverse the constructed Huffman Tree, assign codes to the
characters, and then use those codes to encode the provided text.
To decode data compressed with Huffman Coding, the decoder needs the Huffman tree (or,
equivalently, the code assigned to each distinct character) in addition to the encoded bit stream.
Key points about the technique:
o Huffman developed a greedy technique that generates a Huffman Code, an optimal prefix
code, for each distinct character in the input data stream.
o The approach repeatedly merges the two lowest-frequency nodes, building the Huffman tree
from the bottom up.
o Because each character receives a code whose length depends on how frequently it appears
in the given stream of data, this method is a greedy approach: characters that occur more
often in the data receive shorter codes.
o Here, we'll talk about some practical uses for Huffman Coding:
o Conventional compression formats like PKZIP, GZIP, etc. typically employ Huffman coding.
o Huffman Coding is used for data transfer by fax and text because it minimizes file size and
increases transmission speed.
o Huffman encoding (particularly the prefix codes) is used by several multimedia storage
formats, including JPEG, PNG, and MP3, to compress the files.
o It is most helpful when the data to be transmitted contains frequently recurring characters.
Conclusion
o In general, Huffman Coding is helpful for compressing data that contains frequently occurring
characters.
o We can see that the character that occurs most frequently gets the shortest code, whereas
the one that occurs least frequently gets the longest code.
o The Huffman Code compression technique is used to create variable-length coding, which
uses a varied amount of bits for each letter or symbol. This method is superior to fixed-
length coding since it uses less memory and transmits data more quickly.
The Traveling Salesman Problem (TSP) is a classic NP-hard problem in the field of optimization. The
problem asks for the shortest possible route that visits each of a given set of cities exactly once and
returns to the starting city. Despite its simple statement, the TSP is notoriously difficult to solve
efficiently for large datasets, as the number of possible solutions grows factorially with the number
of cities.
• Input: A set of cities and the distances between each pair of cities.
• Output: A Hamiltonian circuit (a path that visits each city exactly once and returns to the
starting point) with the minimum total distance.
• NP-Hard: Computing an exact solution to TSP is computationally expensive; for large
instances, exact algorithms require exponential time.
• Exponential Solutions: The brute-force solution involves checking all possible permutations
of the cities, which has a time complexity of O(n!), where n is the number of cities.
Given the computational intractability of the TSP, approximation algorithms aim to find solutions that
are close to the optimal, with a guaranteed performance ratio (i.e., how far the approximate
solution is from the optimal solution). The goal is to find a polynomial-time approximation that gives
a solution within a known factor of the optimal one.
Christofides' Algorithm is the best-known approximation algorithm for the metric TSP
(where the triangle inequality holds, i.e., the direct path between two cities is never longer than any
indirect route).
• Approximation Ratio: Christofides' algorithm guarantees that the total length of the tour will
be at most 1.5 times the optimal length. This was for decades the best approximation ratio
known to be achievable in polynomial time for the metric TSP.
• Steps of Christofides' Algorithm:
1. Minimum Spanning Tree (MST): Find the MST of the given graph (using algorithms
like Prim's or Kruskal's). The MST represents the least-cost way to connect all the
cities.
2. Find Odd-Degree Vertices: In the MST, some vertices will have an odd degree (i.e.,
an odd number of incident edges). To make the graph Eulerian (so that it can be turned
into a closed walk), these odd-degree vertices must be paired up.
3. Minimum-Weight Perfect Matching: Compute a minimum-weight perfect matching on
the odd-degree vertices.
4. Combine MST and Matching: Combine the MST with the edges of the perfect
matching. This creates a multigraph where every vertex has an even degree.
5. Eulerian Circuit: Find an Eulerian circuit in the multigraph (which can be done in
linear time). This Eulerian circuit is not a valid TSP tour, but it visits every edge.
6. Shortcutting: Finally, remove any repeated cities from the Eulerian circuit to get a
valid TSP tour.
The Nearest Neighbor Algorithm is a simple and greedy approach to the TSP, often used for heuristic
solutions.
• Steps:
1. Start at an arbitrary city.
2. From the current city, move to the nearest unvisited city (the one with the smallest
distance that hasn't been visited yet).
3. Repeat until all cities have been visited, then return to the starting city.
• Approximation Quality:
o Worst-case performance: The Nearest Neighbor algorithm can perform poorly; its
approximation ratio can grow with the number of cities n, because the greedy choices
may lead to suboptimal paths.
o Practical use: Despite its poor worst-case performance, it is fast and often yields
relatively good solutions for smaller instances.
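A minimal Python sketch of the Nearest Neighbor heuristic, assuming a symmetric distance matrix;
the function name and the example matrix are illustrative.

    def nearest_neighbor_tour(dist, start=0):
        """Nearest-neighbor TSP heuristic: from the current city always move to
        the closest unvisited city, then return to the start."""
        n = len(dist)
        tour = [start]
        unvisited = set(range(n)) - {start}
        while unvisited:
            current = tour[-1]
            nxt = min(unvisited, key=lambda city: dist[current][city])
            tour.append(nxt)
            unvisited.remove(nxt)
        tour.append(start)                      # close the cycle
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
        return tour, length

    dist = [[0, 2, 9, 10],
            [2, 0, 6, 4],
            [9, 6, 0, 8],
            [10, 4, 8, 0]]
    print(nearest_neighbor_tour(dist))          # ([0, 1, 3, 2, 0], 23)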
3. Minimum Spanning Tree-Based Approximations
Another approximation strategy for TSP is based on constructing a Minimum Spanning Tree (MST).
This method works as follows:
• Steps:
1. Compute a Minimum Spanning Tree (MST) of the graph.
2. Double the MST: Since an MST is a tree, it does not form a cycle. To make it a cycle,
"double" the tree by traversing each edge twice, once in each direction.
3. Shortcut: Walk along the doubled tree and skip any vertex that has already been
visited, producing a Hamiltonian cycle.
• Approximation Ratio:
o The resulting approximation algorithm guarantees that the length of the tour is at
most twice the length of the optimal TSP solution.
• Time Complexity: This algorithm runs in polynomial time, as the MST can be computed in
O(n log n) time using algorithms like Prim’s or Kruskal’s, and the shortcutting step is linear.
1. No Exact Solution: Approximation algorithms do not provide the exact optimal solution;
they only provide near-optimal solutions.
2. Worst-Case Performance: Some algorithms, like Nearest Neighbor, can have poor worst-case
performance, especially for poorly structured instances of the problem.
• Logistics and Route Planning: Optimizing delivery routes, traveling sales, and logistics for
minimizing travel time or cost.
• Manufacturing: Minimizing the time spent on tasks or machines (e.g., in cutting, drilling, or
assembly lines).
• Computer Networks: Optimizing the layout of network paths for minimizing latency and
maximizing efficiency.
• Circuit Design: Minimizing wire length in the design of electronic circuits, where routing a
circuit between multiple points is analogous to solving TSP.
Conclusion
The Traveling Salesman Problem (TSP) is one of the most famous NP-hard problems. Due to its
computational complexity, approximation algorithms are used in practice to find near-optimal
solutions efficiently. Among the most prominent is Christofides' algorithm, which guarantees a
solution within 1.5 times the optimal for metric TSP. While greedy algorithms like Nearest Neighbor
are faster, they may result in poor solutions in the worst case. Still, approximation algorithms are
invaluable for real-world applications where finding an optimal solution is impractical, but a near-
optimal solution is good enough.
Minimum Spanning Tree − A minimum spanning tree of a weighted graph is a tree that contains all the
vertices of the main graph and whose edges have the minimum possible total weight. We apply Prim's
algorithm to construct the minimum spanning tree in this case.
Pre-order Traversal − Pre-order traversal is performed on tree data structures by visiting the nodes in
[root – left subtree – right subtree] order.
Algorithm
Step 1 − Choose any vertex of the given graph randomly as the starting and ending point.
Step 2 − Construct a minimum spanning tree of the graph with the vertex chosen as the root using
prim’s algorithm.
Step 3 − Once the spanning tree is constructed, pre-order traversal is performed on the minimum
spanning tree obtained in the previous step.
Step 4 − The pre-order traversal order, with the starting vertex appended at the end, is the tour of the
travelling salesperson.
Pseudocode
APPROX_TSP_TOUR(G, c)
    select a vertex r of G to be the root
    T <- MST_Prim(G, c, r)
    H <- list of vertices in a preorder walk of T
    return the cycle that visits the vertices in the order H and then returns to r
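A Python sketch of the same idea, assuming a complete graph given as a distance matrix; Prim's
algorithm is implemented with a simple heap, and the helper names and sample matrix are
illustrative.

    import heapq

    def approx_tsp_tour(dist, root=0):
        """2-approximation for metric TSP: build an MST with Prim's algorithm,
        then output the vertices in preorder-walk order and return to the root."""
        n = len(dist)
        in_tree = [False] * n
        children = {v: [] for v in range(n)}    # MST stored as a rooted tree
        heap = [(0, root, -1)]                  # (edge weight, vertex, parent)
        while heap:
            w, v, p = heapq.heappop(heap)
            if in_tree[v]:
                continue
            in_tree[v] = True
            if p >= 0:
                children[p].append(v)
            for u in range(n):
                if not in_tree[u]:
                    heapq.heappush(heap, (dist[v][u], u, v))
        tour = []
        def preorder(v):                        # preorder walk of the MST
            tour.append(v)
            for c in children[v]:
                preorder(c)
        preorder(root)
        tour.append(root)                       # return to the starting city
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
        return tour, length

    dist = [[0, 2, 9, 10],
            [2, 0, 6, 4],
            [9, 6, 0, 8],
            [10, 4, 8, 0]]
    print(approx_tsp_tour(dist))                # ([0, 1, 3, 2, 0], 23)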
Analysis
To prove the approximation guarantee, we need to show that the cost of the tour produced is at most
double the optimal cost. A few observations support this claim −
• The cost of the minimum spanning tree is never more than the cost of an optimal Hamiltonian
cycle, since deleting any edge from the optimal tour yields a spanning tree. That is, c(T) ≤ c(H*).
• The cost of the full walk is twice the cost of the minimum spanning tree. A full walk is the path
traced while traversing the minimum spanning tree in preorder; it traverses every edge of the
tree exactly twice. Therefore, c(W) = 2c(T).
• Since shortcutting the repeated vertices of the full walk never increases the cost (by the
triangle inequality), the cost of the output tour is at most the cost of the full walk.
Combining these observations, c(H) ≤ c(W) = 2c(T) ≤ 2c(H*).
Example
Let us look at an example graph to visualize this approximation algorithm (figure omitted).
Solution
Consider vertex 1 of the graph as the starting and ending point of the travelling salesperson and begin
the algorithm from there.
Step 1 − Construct a minimum spanning tree of the graph rooted at vertex 1 (figure omitted).
Step 2 − Once the minimum spanning tree is constructed, consider the starting vertex as the root node
(i.e., vertex 1) and walk through the spanning tree in preorder.
Step 3 − Adding the root node at the end of the traced path, we get 1 → 2 → 5 → 6 → 3 → 4 → 1.
This is the output tour of the travelling salesperson approximation algorithm. The total cost of this
path in the example is 55.
The All-Pairs Shortest Path (APSP) problem is a fundamental problem in graph theory. The task is to
find the shortest paths between all pairs of vertices in a weighted graph. Given a graph with n
vertices and m edges, the goal is to compute the shortest path from every vertex to every other
vertex.
While the APSP problem can be solved exactly using algorithms like Floyd-Warshall or Johnson's
Algorithm, these methods can be computationally expensive for large graphs, especially when the
graph has a large number of vertices. For this reason, approximation algorithms are often used in
scenarios where exact solutions are not feasible due to time or resource constraints.
In cases where the exact solution is not required or is too computationally expensive, approximation
algorithms can be used to compute the shortest paths in a more efficient manner. Approximation
algorithms for APSP aim to provide solutions that are close to the optimal, but with reduced time
complexity.
The Thorup-Zwick algorithm provides an approximation for the All-Pairs Shortest Path (APSP)
problem on unweighted or weighted graphs. It is particularly efficient for dense graphs and is known
for providing small approximation guarantees.
Steps of Matrix Multiplication Approximation:
1. Adjacency Matrix: Represent the graph as an adjacency matrix A, where each entry A[i][j]
contains the weight of the edge between vertex i and vertex j (or infinity if no edge exists).
2. Repeated (min, +) Multiplication: Repeatedly "multiply" the matrix with itself under the
(min, +) product, so that after each multiplication the entries account for shorter paths using
more edges.
3. Approximate the Distances: The resulting matrix entries represent the shortest-path
distances, approximated to within the desired factor.
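For reference, the exact (min, +) repeated-squaring formulation that such approximation schemes
start from can be sketched in Python as follows; approximation variants replace the exact (min, +)
product with a faster approximate one. Function names and the sample matrix are illustrative.

    INF = float("inf")

    def min_plus_square(A):
        """One (min, +) 'multiplication' of a distance matrix with itself:
        C[i][j] = min over k of A[i][k] + A[k][j]."""
        n = len(A)
        return [[min(A[i][k] + A[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    def apsp_repeated_squaring(W):
        """Exact APSP by repeated (min, +) squaring: after about log2(n)
        squarings every shortest path (at most n-1 edges) is accounted for.
        Runs in O(n^3 log n) time."""
        n = len(W)
        D = [row[:] for row in W]
        length = 1
        while length < n - 1:
            D = min_plus_square(D)
            length *= 2
        return D

    W = [[0, 3, INF, 7],
         [8, 0, 2, INF],
         [5, INF, 0, 1],
         [2, INF, INF, 0]]
    for row in apsp_repeated_squaring(W):
        print(row)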
For certain types of graphs, particularly planar graphs or graphs with low edge density, local search-
based algorithms can be used to approximate APSP solutions.
• Approximation Ratio: The approximation ratio can vary depending on the type of graph and
the specific algorithm used, but it is typically (1 + ε), where ε is a small constant.
• Time Complexity: These algorithms can run in polynomial time, depending on the graph
structure and the method used.
• Termination: Continue refining until the approximation is close enough to the optimal
solution or until a predefined error threshold is reached.
Advantages of APSP Approximation Algorithms
• Efficiency: These algorithms are designed to be much more efficient than exact algorithms,
especially in dense or large graphs.
Limitations of APSP Approximation Algorithms
• Approximation Factor: These algorithms do not provide exact solutions but rather
approximate solutions, which may not be optimal in all cases.
• Specialized for Certain Graphs: Some algorithms, like Thorup-Zwick and local search-based
approaches, are more suitable for specific types of graphs (e.g., dense graphs, planar
graphs).
• Error Margin: The approximation ratio typically has the form (1 + ε) for a small constant ε,
meaning there is always some error margin in the computed distances.
1. Network Routing: In communication networks, determining the shortest paths between all
pairs of nodes can be useful for routing decisions.
2. Transportation Systems: In logistics and route planning, approximate APSP solutions can
help optimize the routes between multiple cities or locations.
3. Geospatial Analysis: In geography and urban planning, approximate APSP solutions are used
for determining the shortest travel paths between multiple locations in large-scale maps.
4. Data Mining: In some machine learning applications, APSP can be used to compute distances
between points (e.g., in clustering algorithms).
Conclusion
Approximation algorithms for All-Pairs Shortest Path (APSP) offer practical solutions when exact
algorithms are computationally expensive or infeasible for large graphs. Thorup-Zwick’s algorithm
and matrix multiplication-based methods provide strong approximation guarantees, making them
suitable for dense graphs, while local search-based methods work well for specific graph classes such
as planar graphs.
Given two strings, s1 and s2, the task is to find the length of the Longest Common Subsequence. If
there is no common subsequence, return 0.
A subsequence is a string generated from the original string by deleting 0 or more characters, without
changing the relative order of the remaining characters. For example, the subsequences of
“ABC” are “”, “A”, “B”, “C”, “AB”, “AC”, “BC” and “ABC”.
In general, a string of length n has 2^n subsequences.
Example: for s1 = “ABCDGH” and s2 = “AEDFHR”, the LCS is “ADH”, so the answer is 3.
The Longest Common Subsequence (LCS) problem is a classical problem in computer science, often
encountered in string matching, bioinformatics, and data comparison. It deals with finding the
longest sequence that appears in both strings in the same relative order but not necessarily
consecutively.
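The standard dynamic-programming solution for LCS can be sketched in Python as follows; the
function name and test strings are illustrative.

    def lcs_length(s1, s2):
        """Classic DP for the Longest Common Subsequence.
        dp[i][j] = length of the LCS of s1[:i] and s2[:j]; O(len(s1)*len(s2))."""
        m, n = len(s1), len(s2)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if s1[i - 1] == s2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1      # characters match
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        return dp[m][n]

    print(lcs_length("ABCDGH", "AEDFHR"))   # 3  ("ADH")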
Applications of LCS
1. Bioinformatics: Finding similarities between genetic sequences. For example, aligning DNA
or protein sequences to identify conserved subsequences.
2. Text Comparison: Used in diff tools to compare files or texts and highlight changes.
3. Version Control: In version control systems, LCS is used to detect changes between versions
of a file.