
Unit 3 (Approximation Algorithms)

An approximation algorithm is a way of dealing with NP-completeness for an optimization problem.


This technique does not guarantee the best solution. The goal of an approximation algorithm is to
come as close as possible to the optimal solution in polynomial time. Such algorithms are called
approximation algorithms or heuristic algorithms.
• An approximation algorithm is guaranteed to run in polynomial time, though it does not
guarantee an optimal solution.
• An approximation algorithm guarantees a solution of provable quality (say, within 1% of the
optimum).
• Approximation algorithms are used to get an answer near the optimal solution of an
optimization problem in polynomial time.
Scenario 1:
1. Suppose that we are working on an optimization problem in which each potential solution has
a cost, and we wish to find a near-optimal solution. Depending on the problem, we may define
an optimal solution as one with maximum possible cost or one with minimum possible cost, i.e.,
the problem can be either a maximization or a minimization problem.
2. We say that an algorithm for a problem has an approximation ratio of P(n) if, for any input of
size n, the cost C of the solution produced by the algorithm is within a factor of P(n) of the cost
C* of an optimal solution, as follows:
max(C/C*, C*/C) ≤ P(n)
Scenario 2:
If an algorithm achieves an approximation ratio of P(n), then we call it a P(n)-approximation algorithm.
• For a maximization problem, 0 < C ≤ C*, and the ratio C*/C gives the factor by which the cost
of an optimal solution is larger than the cost of the approximate solution.
• For a minimization problem, 0 < C* ≤ C, and the ratio C/C* gives the factor by which the cost
of an approximate solution is larger than the cost of an optimal solution.
Some examples of approximation algorithms:
Here, we will discuss a few examples, as follows.
1. The Vertex Cover Problem –
In the vertex cover problem, the optimization problem is to find a vertex cover with the
fewest vertices, and the approximation problem is to find a vertex cover
with few vertices.
2. Travelling Salesman Problem –
In the traveling salesperson problem, the optimization problem is to find the shortest cycle,
and the approximation problem is to find a short cycle.
3. The Set Covering Problem –
This is an optimization problem that models many problems that require resources to be
allocated. Here, a logarithmic approximation ratio is used.
4. The Subset Sum Problem –
In the subset sum problem, the optimization problem is to find a subset of {x1, x2, x3, ..., xn}
whose sum is as large as possible but not larger than the target value t.

An approximation algorithm is a way of approaching NP-completeness for an optimization problem.


This technique does not guarantee the best solution. The goal of an approximation algorithm is to
come as close as possible to the optimum value in a reasonable amount of time, which is at most
polynomial time. Such algorithms are called approximation algorithms or heuristic algorithms.
o For the traveling salesperson problem, the optimization problem is to find the shortest cycle,
and the approximation problem is to find a short cycle.
o For the vertex cover problem, the optimization problem is to find a vertex cover with the fewest
vertices, and the approximation problem is to find a vertex cover with few vertices.
Performance Ratios
Suppose we work on an optimization problem where every solution carries a cost. An approximation
algorithm returns a legal solution, but the cost of that legal solution may not be optimal.
For example, suppose we are looking for a minimum-size vertex cover (VC). An approximation
algorithm returns a VC for us, but its size (cost) may not be minimal.
Another example: suppose we are looking for a maximum-size independent set (IS). An approximation
algorithm returns an IS for us, but its size (cost) may not be maximal. Let C be the cost of the solution
returned by an approximation algorithm, and C* the cost of the optimal solution.
We say the approximation algorithm has an approximation ratio P(n) for an input size n, where

max(C/C*, C*/C) ≤ P(n)
Intuitively, the approximation ratio measures how far the approximate solution is from
the optimal solution. A large (small) approximation ratio means the solution is much worse than
(more or less the same as) an optimal solution.
Observe that P(n) is always ≥ 1. If the ratio does not depend on n, we may simply write P. Therefore, a
1-approximation algorithm gives an optimal solution. Some problems have polynomial-time
approximation algorithms with small constant approximation ratios, while others have best-known
polynomial-time approximation algorithms whose approximation ratios grow with n.
Approximation Algorithms
Approximation algorithms are designed to find approximate solutions to problems that are not
known to be solvable in polynomial time. These problems are known as NP-complete problems.
Such problems arise frequently in real-world applications, so it becomes important to solve them
using a different approach.
NP-complete problems can still be handled in three cases: the input could be so small that the execution
time stays reasonable, some instances can be restricted to special cases that are solvable in polynomial
time, or approximation algorithms can be used to find near-optimal solutions for the problems.
Performance Ratios
The main idea behind calculating the performance ratio of an approximation algorithm, which is also
called the approximation ratio, is to measure how close the approximate solution is to the optimal
solution.
The approximation ratio is represented using ρ(n), where n is the input size of the algorithm, C is the
cost of the near-optimal solution obtained by the algorithm, and C* is the cost of the optimal solution
for the problem. The algorithm has an approximation ratio of ρ(n) if and only if

max(C/C*, C*/C) ≤ ρ(n)
The algorithm is then called a ρ(n)-approximation algorithm. Approximation algorithms can be applied
to two types of optimization problems: minimization problems and maximization problems. If the
goal is to find a solution of maximum cost, the problem is known as a
maximization problem; and if the goal is to find a solution of minimum cost, then
the problem is known as a minimization problem.
For maximization problems, the approximation ratio is C*/C, since 0 < C ≤ C*. For
minimization problems, the approximation ratio is C/C*, since 0 < C* ≤ C.
Assuming that the costs involved are all positive, the performance ratio is well
defined and is never less than 1. If the value is 1, the approximation algorithm produces
exactly the optimal solution.

A few popular examples of approximation algorithms are:


• Vertex Cover Algorithm
• Set Cover Problem
• Travelling Salesman Problem (Approximation Approach)
• The Subset Sum Problem

Approximation Algorithms: Explanation


An approximation algorithm is an algorithm used to find near-optimal solutions to optimization
problems, particularly for problems that are NP-hard or intractable. These algorithms do not
guarantee finding the exact optimal solution, but they provide solutions that are close to the best
possible one within a guaranteed factor. This makes approximation algorithms especially useful when
exact algorithms would be too slow or impractical due to the problem’s complexity.
Key Concepts in Approximation Algorithms
1. Optimization Problems: These problems aim to find the best solution from a set of possible
solutions. Examples include:
o Maximizing some objective (e.g., profit, efficiency, coverage).
o Minimizing some cost (e.g., distance, time, error).
2. NP-Hard Problems: Approximation algorithms are especially valuable for NP-hard problems,
which do not have known polynomial-time solutions. These problems include:
o Traveling Salesman Problem (TSP).
o Knapsack Problem.
o Set Cover Problem.
o Vertex Cover Problem.
o Max Cut Problem.
3. Approximation Ratio:
o This is the measure of how close the solution provided by an approximation algorithm
is to the optimal solution.

A good approximation algorithm aims for an approximation ratio as close as possible to 1, meaning the
algorithm’s output is as close to optimal as possible.
4. Performance Guarantee: Approximation algorithms often come with performance
guarantees, which provide bounds on how much worse the algorithm's solution is compared
to the optimal solution. This guarantee ensures that the algorithm's solution will always be
within a known factor of the optimal.
5. Types of Approximation Algorithms:
o Greedy Algorithms: Make locally optimal choices at each step with the hope of finding
a global optimum. Common in problems like the Set Cover or Knapsack problems.
o Primal-Dual Algorithms: These algorithms solve optimization problems by
simultaneously considering the primal and dual problems and maintaining a
relationship between them.
o Local Search Algorithms: Involve iterating over possible solutions and moving to a
neighboring solution that improves the objective function.

Example of Approximation Algorithms


1. Vertex Cover Problem:
o Problem: Find the smallest set of vertices that cover all edges in a graph.
o Greedy Approximation Algorithm: At each step, pick an uncovered edge and add both its
endpoints to the vertex cover. Repeat until all edges are covered (see the sketch after this list).
o Approximation Ratio: The approximation ratio is 2, meaning the algorithm will never
produce a vertex cover larger than twice the size of the optimal solution.
2. Set Cover Problem:
o Problem: Given a universe of elements and a collection of sets, find the smallest
number of sets that cover all the elements.
o Greedy Approximation Algorithm: Repeatedly pick the set that covers the largest
number of uncovered elements.
o Approximation Ratio: The greedy algorithm provides a solution that is at most
ln n times the optimal solution, where n is the number of elements in the
universe.
3. Traveling Salesman Problem (TSP):
o Problem: Find the shortest possible route that visits each city exactly once and returns
to the origin city.
o Christofides’ Algorithm: For the metric TSP (where distances satisfy the triangle
inequality), this algorithm provides a solution within 1.5 times the optimal.
o Approximation Ratio: 1.5, meaning the solution will be at most 1.5 times the length
of the optimal tour.
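
To make the vertex cover example above concrete, here is a minimal Python sketch of the 2-approximation (the function and variable names are illustrative, not from the text). It assumes the graph is given as a list of edges and takes both endpoints of any edge that is not yet covered.

```python
def vertex_cover_2approx(edges):
    """Greedy 2-approximation for vertex cover:
    repeatedly pick an uncovered edge and add both of its endpoints."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # this edge is still uncovered
            cover.add(u)
            cover.add(v)
    return cover

# Hypothetical example: a path 1-2-3-4
# vertex_cover_2approx([(1, 2), (2, 3), (3, 4)]) -> {1, 2, 3, 4} (an optimal cover is {2, 3})
```

The edges the loop picks are pairwise disjoint, so they form a matching; any optimal cover must contain at least one endpoint of each picked edge, which is exactly the factor-2 argument.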

Key Points of Approximation Algorithms


• Used for NP-Hard Problems: Approximation algorithms are applied when exact solutions are
computationally expensive or impossible for NP-hard problems.
• Approximation Ratio: The quality of an approximation algorithm is measured by its
approximation ratio, which bounds how close the solution is to the optimal.
• Performance Guarantees: They provide performance guarantees, ensuring that the solution
is not arbitrarily worse than the optimal solution.
• Polynomial Time: Approximation algorithms typically run in polynomial time, making them
feasible for large inputs where exact solutions would be too time-consuming.
• Greedy, Primal-Dual, and Local Search: Common strategies for developing approximation
algorithms, depending on the problem structure.
• Common Applications: Used in problems like Knapsack, Set Cover, Traveling Salesman, Vertex
Cover, and Max Cut.

Summary of Approximation Algorithms (Bullet Points)


• Approximation algorithms are used for solving NP-hard optimization problems when exact
solutions are computationally impractical.
• Approximation ratio measures the quality of the solution and guarantees that the algorithm's
solution will be close to the optimal solution within a specified factor.
• Common strategies for approximation algorithms:
o Greedy algorithms (locally optimal choices).
o Primal-dual algorithms (simultaneously consider primal and dual).
o Local search (exploring neighbors for improvements).
• Key characteristics:
o Polynomial time complexity: Typically efficient for large inputs.
o Guaranteed bounds on the performance (approximation ratio).
• Examples of problems solved with approximation algorithms include:
o Set Cover, Vertex Cover, Knapsack, Traveling Salesman, Max Cut.
• Approximation algorithms provide an effective way to find near-optimal solutions for problems
where exact solutions are not feasible due to time or computational constraints.

Applications of Approximation Algorithms


• Network Design: Finding the minimum cost network that connects various points or covers
certain resources.
• Resource Allocation: Problems like knapsack or budgeting, where a solution needs to be near-
optimal due to constraints.
• Scheduling and Routing: Problems like TSP and vehicle routing where exact solutions may be
too expensive, but near-optimal solutions are sufficient.
• Data Mining: Approximate solutions for clustering or classification problems where exact
solutions are computationally prohibitive.

Conclusion
Approximation algorithms offer a powerful way to deal with NP-hard optimization problems by trading
off perfect accuracy for efficiency. While they may not always provide the exact optimal solution, their
ability to find near-optimal solutions in polynomial time makes them invaluable in real-world
applications where exact solutions would be too slow or infeasible to compute.
There exist problems whose efficient solutions have not yet been found; such problems are divided into
classes known as Complexity Classes. In complexity theory, a Complexity Class is a set of problems with
related complexity. These classes help scientists group problems based on how much time and space
they require to solve the problems and verify the solutions. Complexity theory is the branch of the
theory of computation that deals with the resources required to solve a problem.

The common resources are time and space, meaning how much time the algorithm takes to solve a
problem and the corresponding memory usage.
• The time complexity of an algorithm is used to describe the number of steps required to solve
a problem, but it can also be used to describe how long it takes to verify the answer.
• The space complexity of an algorithm describes how much memory is required for the
algorithm to operate.

Complexity classes are useful in organising similar types of problems.


Types of Complexity Classes
1. P Class
2. NP Class
3. NP-hard
4. NP-complete

P Class
The P in the P class stands for Polynomial Time. It is the collection of decision problems(problems with
a “yes” or “no” answer) that can be solved by a deterministic machine in polynomial time.
Features:
• The solution to P problems is easy to find.
• P is often a class of computational problems that are solvable and tractable. Tractable means
that the problems can be solved in theory as well as in practice. But the problems that can be
solved in theory but not in practice are known as intractable.
This class contains many problems:
1. Calculating the greatest common divisor.
2. Finding a maximum matching.
3. Merge Sort

NP Class
The NP in NP class stands for Non-deterministic Polynomial Time. It is the collection of decision
problems that can be solved by a non-deterministic machine in polynomial time.
Features:
• The solutions of NP problems may be hard to find, since they are defined in terms of a non-deterministic
machine, but the solutions are easy to verify.
• Problems in NP can be verified by a (deterministic) Turing machine in polynomial time.

This class contains many problems that one would like to be able to solve effectively:
1. Boolean Satisfiability Problem (SAT).
2. Hamiltonian Path Problem.
3. Graph coloring.

Co-NP Class
Co-NP stands for the complement of the NP class. If the answer to a problem in Co-NP is "no", then
there is a proof that can be checked in polynomial time.
Features:
• If a problem X is in NP, then its complement X’ is also in CoNP.
Some example problems for CoNP are:
1. To check prime number.
2. Integer Factorization.

NP-hard class
An NP-hard problem is at least as hard as the hardest problem in NP; it belongs to a class of problems such
that every problem in NP reduces to it.
Features:
• Not all NP-hard problems are in NP.
• They can take a long time to check. This means that even if a solution to an NP-hard problem is
given, it may take a long time to verify whether it is correct.
• A problem A is NP-hard if, for every problem L in NP, there exists a polynomial-time reduction
from L to A.
Some examples of problems in NP-hard are:
1. Halting problem.
2. Quantified Boolean formulas (QBF).
3. The no-Hamiltonian-cycle problem (deciding that a graph has no Hamiltonian cycle).

NP-complete class
A problem is NP-complete if it is both in NP and NP-hard. NP-complete problems are the hardest problems
in NP.
Features:
• NP-complete problems are special as any problem in NP class can be transformed or reduced
into NP-complete problems in polynomial time.
• If one could solve an NP-complete problem in polynomial time, then one could also solve any
NP problem in polynomial time.

Some example problems include:


1. Hamiltonian Cycle.
2. Satisfiability.

Complexity Class: Characteristic feature

P: Easily solvable in polynomial time.

NP: "Yes" answers can be checked in polynomial time.

Co-NP: "No" answers can be checked in polynomial time.

NP-hard: Not all NP-hard problems are in NP, and they can take a long time to check.

NP-complete: A problem that is both in NP and NP-hard is NP-complete.

Definition of NP class problem: The set of decision problems that cannot (as far as we know) be solved
in polynomial time, but whose solutions can be verified in polynomial time. The NP class contains the
P class as a subset. NP problems are considered hard to solve.

Definition of P class problem: The set of decision problems that can be solved, i.e., an output produced,
in polynomial time. P problems are considered easy to solve.
Definition of polynomial time: If an algorithm produces its output within a number of steps bounded by a
polynomial in the input size, it is said to run in polynomial time.
Definition of non-polynomial time: If an algorithm produces its output but its running time is not bounded
by any polynomial in the input size, it is said to take non-polynomial time. The output will eventually be
produced, but no polynomial time bound can be given.
Definition of decision problem: A problem is called a decision problem if its output is a simple
"yes" or "no" (you may also think of this as true/false, 0/1, accept/reject). We will phrase many
optimization problems as decision problems. For example: given a graph G = (V, E), does there exist a
Hamiltonian cycle?
Definition of NP-hard class: A problem belongs to the NP-hard class if it satisfies the following:
1. If we could solve this problem in polynomial time, then we could solve all NP problems in
polynomial time.
2. Every problem in NP can be converted (reduced) to it in polynomial time.
Definition of NP-complete class: A problem is NP-complete if
1. It is in NP, and
2. It is NP-hard.

Pictorial representation of all NP classes which includes NP, NP-hard, and NP-complete


Introduction to P, NP, NP-Hard, and NP-Complete


In computational complexity theory, P, NP, NP-Hard, and NP-Complete are classes used to categorize
problems based on their computational difficulty and the resources required to solve them. These
concepts help us understand the inherent difficulty of solving various computational problems and
how they relate to one another.
Here’s a breakdown of each class:

P (Polynomial Time)
• Definition: P is the class of decision problems (problems with a yes/no answer) that can be
solved in polynomial time by a deterministic algorithm. In other words, problems in P can be
solved efficiently.
• Examples:
o Sorting algorithms (like MergeSort and QuickSort).
o Finding the shortest path in a graph (using Dijkstra's algorithm).
o Matrix multiplication.
NP (Nondeterministic Polynomial Time)
• Definition: NP is the class of decision problems for which a proposed solution can be verified
in polynomial time by a deterministic algorithm. In simpler terms, if someone gives you a
solution, you can check whether it’s correct in polynomial time.
• Key Point: Problems in NP might not be solvable efficiently, but if you are given a potential
solution, you can verify its correctness quickly (in polynomial time).
• Examples:
o Sudoku (given a completed grid, it's easy to verify if it's correct).
o Graph Coloring (given a coloring, it's easy to check if it satisfies the constraints).

NP-Hard
• Definition: NP-Hard is a class of problems that are at least as hard as the hardest problems in
NP. In other words, a problem is NP-Hard if any problem in NP can be reduced to it in
polynomial time.
• Key Point: An NP-Hard problem does not necessarily have to be in NP itself (i.e., it might not
have a polynomial-time verification procedure), but it is at least as difficult as the hardest NP
problems.
• Examples:
o Travelling Salesman Problem (TSP) (finding the shortest possible route through all
cities).
o Knapsack Problem (select items to maximize value within weight limits).
o Halting Problem (deciding whether a program will halt or run forever).

NP-Complete
• Definition: NP-Complete is a subset of NP problems that are both in NP and NP-Hard. In other
words, a problem is NP-Complete if:
1. It is in NP (its solution can be verified in polynomial time).
2. Every problem in NP can be reduced to it in polynomial time (meaning it is at least as
hard as the hardest problems in NP).
• Key Point: If you can solve any NP-Complete problem in polynomial time, then every problem
in NP can also be solved in polynomial time (this would imply P = NP).
• Examples:
o Boolean Satisfiability Problem (SAT) (given a boolean expression, determine if there's
an assignment that makes it true).
o Knapsack Problem (with integer weights and values).
o Graph Coloring (finding the smallest number of colors to color a graph).

Comparison Table: P, NP, NP-Hard, and NP-Complete


Class: P
Definition: Problems that can be solved in polynomial time by a deterministic algorithm.
Key Characteristics: Efficiently solvable in polynomial time.
Examples: Sorting, matrix multiplication, shortest path in a graph.

Class: NP
Definition: Problems whose solutions can be verified in polynomial time by a deterministic algorithm.
Key Characteristics: Solutions can be verified in polynomial time, but are not necessarily solvable in polynomial time.
Examples: Sudoku, Graph Coloring, Subset Sum.

Class: NP-Hard
Definition: Problems that are at least as hard as the hardest problems in NP. They may not be in NP (not verifiable in polynomial time).
Key Characteristics: No guarantee that solutions can be verified in polynomial time; these problems are at least as hard as NP problems.
Examples: TSP, Halting Problem, Knapsack Problem (general case).

Class: NP-Complete
Definition: Problems that are both in NP and NP-Hard; they are the hardest problems in NP.
Key Characteristics: If any NP-Complete problem can be solved in polynomial time, all NP problems can be solved in polynomial time.
Examples: SAT, Knapsack Problem (0/1), Graph Coloring, Clique Problem.
Greedy is an algorithmic paradigm that builds up a solution piece by piece, always choosing the next
piece that offers the most obvious and immediate benefit. Greedy algorithms are used for optimization
problems.

An optimization problem can be solved using Greedy if the problem has the following property:
• At every step, we can make a choice that looks best at the moment, and we get the optimal
solution to the complete problem.
• Some popular Greedy Algorithms are Fractional Knapsack, Dijkstra’s algorithm, Kruskal’s
algorithm, Huffman coding and Prim’s Algorithm
• Greedy algorithms are sometimes also used to get an approximation for hard optimization
problems. For example, the Traveling Salesman Problem is an NP-hard problem. A greedy choice
for this problem is to pick the nearest unvisited city from the current city at every step. These
choices do not always produce the best (optimal) solution, but they can be used to get an
approximately optimal solution (a sketch of this nearest-neighbor heuristic follows this list).
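
As a concrete sketch of that nearest-neighbor choice, the following Python function (hypothetical names; it assumes a complete, symmetric distance matrix `dist`) builds a tour greedily. It is only a heuristic and carries no general optimality guarantee.

```python
def nearest_neighbor_tsp(dist, start=0):
    """Greedy TSP heuristic: always move to the nearest city not yet visited.
    `dist` is assumed to be a full n x n distance matrix."""
    n = len(dist)
    visited = [False] * n
    visited[start] = True
    tour, total, current = [start], 0, start
    for _ in range(n - 1):
        # pick the closest unvisited city from the current one
        nxt = min((c for c in range(n) if not visited[c]),
                  key=lambda c: dist[current][c])
        total += dist[current][nxt]
        visited[nxt] = True
        tour.append(nxt)
        current = nxt
    total += dist[current][start]   # close the cycle by returning to the start
    tour.append(start)
    return tour, total
```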
However, it’s important to note that not all problems are suitable for greedy algorithms. They work
best when the problem exhibits the following properties:
• Greedy Choice Property: The optimal solution can be constructed by making the best local
choice at each step.
• Optimal Substructure: The optimal solution to the problem contains the optimal solutions to
its subproblems.
Characteristics of Greedy Algorithm
Here are the characteristics of a greedy algorithm:
• Greedy algorithms are simple and easy to implement.
• They are efficient in terms of time complexity, often providing quick solutions. Greedy
algorithms are typically preferred over Dynamic Programming for problems where both
apply, for example the Jump Game problem and the Single-Source Shortest Path problem
(Dijkstra is preferred over Bellman-Ford when there are no negative weights).
• These algorithms do not reconsider previous choices, as they make decisions based on current
information without looking ahead.
These characteristics help to define the nature and usage of greedy algorithms in problem-solving.
How does the Greedy Algorithm work?
Greedy algorithms solve optimization problems by making the best local choice at each step in the
hope of finding the global optimum. It’s like taking the best option available at each moment, hoping
it will lead to the best overall outcome.
Here’s how it works:
1. Start with the initial state of the problem. This is the starting point from where you begin
making choices.
2. Evaluate all possible choices you can make from the current state. Consider all the options
available at that specific moment.
3. Choose the option that seems best at that moment, regardless of future consequences. This
is the “greedy” part – you take the best option available now, even if it might not be the best
in the long run.
4. Move to the new state based on your chosen option. This becomes your new starting point for
the next iteration.
5. Repeat steps 2-4 until you reach the goal state or no further progress is possible. Keep making
the best local choices until you reach the end of the problem or get stuck.
Example:
Let’s say you have a set of coins with values {1, 2, 5, 10, 20, 50, 100} and you need to give change
for 36 using the minimum number of coins.
The greedy algorithm for making change works as follows:
1. Start with the largest coin value that is less than or equal to the amount to be changed. In this
case, the largest coin not exceeding 36 is 20.
2. Subtract that coin value from the amount to be changed, and add the coin to the
solution. In this case, subtracting 20 from 36 gives 16, and we add a 20 coin to the solution.
3. Repeat steps 1 and 2 until the amount to be changed becomes 0.
So, using the greedy algorithm, the change for 36 is one 20 coin, one 10 coin, one 5 coin, and one 1 coin.
Note: This is just one example, and other greedy choices could have been made at each step. However,
in this case, the greedy approach leads to the optimal solution.
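
A minimal Python sketch of this change-making procedure is shown below (the coin set is the one from the example; the function name is illustrative). For this particular coin system the greedy choice is optimal, but that is not true for arbitrary coin systems.

```python
def greedy_change(amount, coins=(1, 2, 5, 10, 20, 50, 100)):
    """Greedy change-making: always take the largest coin that still fits."""
    result = []
    for coin in sorted(coins, reverse=True):   # try denominations from largest to smallest
        while amount >= coin:
            result.append(coin)
            amount -= coin
    return result

# greedy_change(36) -> [20, 10, 5, 1]
```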

Greedy Algorithm Vs Dynamic Programming


Below is a comparison of Greedy Algorithms and Dynamic Programming based on various criteria:

Basic Idea
• Greedy Algorithm: Makes the locally optimal choice at each stage.
• Dynamic Programming: Solves subproblems and builds up to the optimal solution.

Optimal Solution
• Greedy Algorithm: Not always guaranteed to provide the globally optimal solution.
• Dynamic Programming: Guarantees the globally optimal solution.

Time Complexity
• Greedy Algorithm: Typically faster; often linear or polynomial time.
• Dynamic Programming: Usually slower due to solving overlapping subproblems.

Space Complexity
• Greedy Algorithm: Requires less memory; often constant or linear space.
• Dynamic Programming: Requires more memory due to storing intermediate results.

Overlapping Subproblems
• Greedy Algorithm: Does not handle overlapping subproblems.
• Dynamic Programming: Handles overlapping subproblems efficiently.

Examples
• Greedy Algorithm: Finding a minimum spanning tree, Huffman coding.
• Dynamic Programming: Matrix chain multiplication, shortest path problems.

Applications
• Greedy Algorithm: Used when a greedy choice at each step leads to the globally optimal solution.
• Dynamic Programming: Applied when the problem can be broken down into overlapping subproblems.

Applications of Greedy Algorithms:


• We use greedy algorithms in our day-to-day life to find the minimum number of coins or notes for
a given amount. We first begin with the largest denomination and use as many of it as possible,
then the second largest, and so on.
• Dijkstra’s shortest path algorithm: Finds the shortest path between two nodes in a graph.
• Kruskal’s and Prim’s minimum spanning tree algorithm: Finds the minimum spanning tree for
a weighted graph. Minimum Spanning Trees are used in Computer Networks Designs and have
many real world applications
• Huffman coding: Creates an optimal prefix code for a set of symbols based on their
frequencies.
• Fractional knapsack problem: Determines the most valuable items to carry in a knapsack with
a limited weight capacity.
• Activity selection problem: Chooses the maximum number of non-overlapping activities from
a set of activities.
• Job Sequencing and Job Scheduling Problems.
• Finding close-to-optimal solutions for NP-hard problems like TSP.
• Network design: a wide range of network design problems, such as routing, resource allocation,
and capacity planning.
• Machine learning: Greedy algorithms can be used in machine learning applications, such as
feature selection, clustering, and classification. In feature selection, greedy algorithms are
used to select a subset of features that are most relevant to a given problem. In clustering and
classification, greedy algorithms can be used to optimize the selection of clusters or classes.
• Image processing: Greedy algorithms can be used to solve a wide range of image processing
problems, such as image compression, denoising, and segmentation. For example, Huffman
coding is a greedy algorithm that can be used to compress digital images by efficiently
encoding the most frequent pixels.
• Combinatorics optimization: Greedy algorithms can be used to solve combinatorial
optimization problems, such as the traveling salesman problem, graph coloring, and
scheduling. Although these problems are typically NP-hard, greedy algorithms can often
provide close-to-optimal solutions that are practical and efficient.
• Game theory: Greedy algorithms can be used in game theory applications, such as finding the
optimal strategy for games like chess or poker. In these applications, greedy algorithms can be
used to identify the most promising moves or actions at each turn, based on the current state
of the game.
Advantages of Greedy Algorithms:
• Simple and easy to understand: Greedy algorithms are often straightforward to implement
and reason about.
• Efficient for certain problems: They can provide optimal solutions for specific problems, like
finding the shortest path in a graph with non-negative edge weights.
• Fast execution time: Greedy algorithms generally have lower time complexity compared to
other algorithms for certain problems.
• Intuitive and easy to explain : The decision-making process in a greedy algorithm is often easy
to understand and justify.
• Can be used as building blocks for more complex algorithms: Greedy algorithms can be
combined with other techniques to design more sophisticated algorithms for challenging
problems.
Disadvantages of the Greedy Approach:
• Not always optimal: Greedy algorithms prioritize local optima over global optima, leading to
suboptimal solutions in some cases.
• Difficult to prove optimality: Proving the optimality of a greedy algorithm can be challenging,
requiring careful analysis.
• Sensitive to input order: The order of input data can affect the solution generated by a greedy
algorithm.
• Limited applicability: Greedy algorithms are not suitable for all problems and may not be
applicable to problems with complex constraints.

Greedy Approach in Approximation Algorithms


The Greedy Approach is a simple and effective strategy used in many approximation algorithms,
particularly when solving optimization problems. In a greedy algorithm, the problem is solved by
making a sequence of choices that look the best at the moment, with the hope of finding an optimal
or near-optimal solution.
In the context of approximation algorithms, the greedy approach is often used for problems where
finding the exact optimal solution is computationally hard (i.e., the problem is NP-Hard), but a good
enough solution can be found efficiently.

How Greedy Algorithms Work


1. Local Optimal Choice: In each step, the algorithm makes a decision that seems the best, based
only on the information available at that moment (i.e., a local optimal choice).
2. No Backtracking: Once a decision is made, the algorithm doesn't revisit or reconsider previous
decisions. There’s no backtracking or undoing of choices, making it greedy in nature.
3. Hope for Global Optimality: The key idea is that making the best choice at each step leads to
a globally good solution. While this is not always true, it works well for many problems where
a near-optimal solution is sufficient.

Greedy Approximation Algorithms: Key Characteristics


• Efficiency: Greedy algorithms are typically very efficient, running in polynomial time (often
linear or logarithmic) because they only make a small number of decisions.
• Simplicity: They are easy to implement due to their straightforward decision-making process.
• No Guarantees of Optimality: Greedy algorithms do not always guarantee an optimal solution,
but they often produce good approximations to the optimal solution.
• Performance Guarantee: In the case of approximation algorithms, greedy algorithms come
with a performance guarantee, typically an approximation ratio that bounds how far the
solution is from the optimal.

Summary of Key Points (Greedy Approach in Approximation Algorithms)


• Greedy Approach: Involves making a sequence of locally optimal choices to solve an
optimization problem, with the hope of finding a global optimum or a good approximation.
• Efficiency: Greedy algorithms are typically fast and efficient, running in polynomial time.
• Approximation: While they may not always find the optimal solution, greedy algorithms often
provide good approximation ratios for NP-hard problems.
• Applications: Greedy algorithms are used for a wide range of problems like the Set Cover,
Knapsack, Minimum Spanning Tree, and Activity Selection problems.
• Optimality: In some cases (like Fractional Knapsack), the greedy approach yields the optimal
solution. In other cases (like Set Cover), it provides a near-optimal solution with a known
performance guarantee.

Examples of Approximation Problems Solved Using Greedy Algorithms


1. Set Cover Problem:
o Problem: Given a universe of elements and a collection of subsets, choose the
smallest number of subsets such that all elements in the universe are covered.
o Greedy Approach: In each step, choose the subset that covers the largest number of
uncovered elements. Repeat this until all elements are covered.
o Approximation Ratio: The greedy algorithm provides an approximation ratio of
ln n, where n is the number of elements in the universe (see the sketch after this list).
2. Fractional Knapsack Problem:
o Problem: Given a set of items, each with a weight and value, and a knapsack with a
weight capacity, determine the maximum value that can be obtained by placing items
in the knapsack (allowing fractions of items).
o Greedy Approach: The algorithm sorts items by their value-to-weight ratio and picks
items starting from the highest ratio, placing as much of each item as possible in the
knapsack.
o Approximation Ratio: The greedy algorithm finds the optimal solution in this case
because the problem is fractional, meaning it's possible to take fractions of items.
3. Activity Selection Problem:
o Problem: Given a set of activities with start and end times, select the maximum
number of non-overlapping activities.
o Greedy Approach: Select the activity that finishes the earliest, then repeat for the
remaining activities, always picking the next one that starts after the previous one
finishes.
o Approximation Ratio: This greedy algorithm provides an optimal solution (i.e., it is
exact, not just an approximation).
4. Huffman Coding (used in data compression):
o Problem: Given a set of symbols with frequencies, build a prefix-free binary tree to
minimize the total length of the encoded messages.
o Greedy Approach: Merge the two symbols with the lowest frequencies at each step,
creating a binary tree that represents the optimal encoding.
o Approximation Ratio: The greedy algorithm produces the optimal solution for this
problem.
5. Prim’s Algorithm for Minimum Spanning Tree (MST):
o Problem: Given a connected, weighted graph, find the minimum spanning tree (MST),
i.e., a tree that connects all vertices with the minimum possible total edge weight.
o Greedy Approach: Start with an arbitrary node and iteratively add the smallest edge
that connects a vertex in the tree to a vertex outside the tree, until all vertices are
included.
o Approximation Ratio: The greedy algorithm always produces the exact solution for
the MST problem (optimal).
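
As an illustration of the first item in the list above, here is a small Python sketch of the greedy set-cover rule (names are illustrative, not from the text). It assumes the union of the given subsets covers the universe and repeatedly picks the subset covering the most still-uncovered elements, which is the choice behind the ln n guarantee.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: pick the subset covering the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))  # most new elements covered
        if not best & uncovered:        # the subsets cannot cover the rest; stop
            break
        chosen.append(best)
        uncovered -= best
    return chosen

# Hypothetical example:
# greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
# -> [{1, 2, 3}, {4, 5}]
```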

Analysis of Greedy Algorithms


1. Correctness:
o Greedy algorithms can provide the optimal solution in some cases (like MST and
Fractional Knapsack).
o In other cases, they provide a good approximation but not necessarily the optimal
solution (like Set Cover or the 0/1 Knapsack problem).

2. Approximation Ratio:
o For NP-hard problems, greedy algorithms provide approximation ratios. These ratios
tell us how close the greedy solution is to the optimal one.
o For example, in the Set Cover Problem, the greedy algorithm achieves a ratio of
ln n, meaning the solution is at most ln n times worse than the
optimal solution.
3. Time Complexity:
o Greedy algorithms are generally efficient; for example, the sorting step that dominates the
greedy fractional knapsack algorithm takes O(N log N) time.
4. Limitations:
o Greedy algorithms do not always give the optimal solution for all problems. They can
sometimes be short-sighted, making decisions that seem good in the short term but
are not optimal in the long run.
o For non-fractional knapsack or other combinatorial problems, greedy algorithms may
fail to provide a good solution without a specific structure in the problem.

Summary of Example Problems Solved by Greedy Algorithms


Set Cover
• Greedy Approach: Choose the set that covers the most uncovered elements.
• Approximation Ratio / Optimality: Approximation ratio of ln n.

Fractional Knapsack
• Greedy Approach: Sort items by value-to-weight ratio and take fractions.
• Approximation Ratio / Optimality: Optimal solution.

Activity Selection
• Greedy Approach: Select the activity that finishes the earliest.
• Approximation Ratio / Optimality: Optimal solution.

Huffman Coding
• Greedy Approach: Merge the two least frequent symbols iteratively.
• Approximation Ratio / Optimality: Optimal solution.

Minimum Spanning Tree (MST)
• Greedy Approach: Add the smallest edge connecting a new vertex to the MST.
• Approximation Ratio / Optimality: Optimal solution.

Conclusion
The Greedy Approach is a powerful tool in the design of approximation algorithms. Although it does
not always guarantee optimal solutions, it often provides efficient, simple, and good-enough solutions
for NP-hard problems. By choosing the best option at each step, greedy algorithms offer practical and
scalable solutions for complex problems across a variety of fields.

Dynamic Programming Approach in Approximation Algorithms


Dynamic Programming (DP) is a powerful method used for solving optimization problems by breaking
them down into smaller subproblems, solving each subproblem just once, and storing their solutions.
While Dynamic Programming is often associated with exact algorithms, it can also be used in
approximation algorithms to efficiently find near-optimal solutions to complex problems.

Dynamic Programming in Approximation Algorithms


In approximation algorithms, Dynamic Programming is typically used when:
1. The problem is too complex to solve in polynomial time, and thus finding an exact solution is
computationally prohibitive.
2. We need to find a good-enough solution that is close to the optimal, often within a provable
approximation ratio.
While greedy algorithms are based on making local optimal choices, Dynamic Programming
approaches look at all possible subproblems and combine them in a way that leads to an optimal or
near-optimal solution, often using memoization or a bottom-up approach.

How Dynamic Programming Works in Approximation Algorithms


1. Breaking the Problem Into Subproblems:
o The problem is divided into smaller subproblems, and each subproblem is solved
independently.
o These subproblems are stored in a table or matrix (memoization), so the solution to
each subproblem is only computed once and can be reused.
2. Recursion and Overlapping Subproblems:
o A key feature of DP is that the subproblems often overlap, meaning the same
subproblem is solved multiple times. DP avoids recomputing the same result by
storing previous results.
3. Optimal Substructure:
o The optimal solution to the problem can be constructed from the solutions of the
subproblems. This is known as optimal substructure, and it is a key characteristic of
DP problems.
4. Approximation via Relaxed Solutions:
o In approximation problems, DP can be modified or "relaxed" to give near-optimal
solutions that are efficient to compute, often with an approximation ratio.
Examples of Approximation Problems Using Dynamic Programming
1. Knapsack Problem (0/1 Knapsack)
• Problem: Given a set of items, each with a weight and value, and a knapsack with a capacity,
select a subset of items such that their total weight does not exceed the knapsack's capacity,
and the total value is maximized.
• Dynamic Programming Approach:

o Approximation: The 0/1 knapsack problem itself can be solved exactly using this DP
approach, but when faced with fractional or large-scale versions, relaxed or approximate
solutions may be used, such as the Fractional Knapsack version or Bounded Knapsack
Approximation.

2. Longest Common Subsequence (LCS)


• Problem: Given two sequences, find the longest subsequence common to both.
• Dynamic Programming Approach:

Approximation: For some string matching or sequence alignment problems, we can use
approximate string matching or a restricted (limited) LCS for large datasets, where the solution is
relaxed to reduce the running time and memory needed on very large inputs.
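
For reference, the exact LCS dynamic program described above fits in a short Python sketch (illustrative names). It fills an (m+1) x (n+1) table and runs in O(m*n) time and space.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b (exact DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]   # dp[i][j] = LCS length of a[:i], b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:             # characters match: extend the LCS
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:                                # otherwise drop one character
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# lcs_length("ABCBDAB", "BDCABA") -> 4
```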

TSP (Traveling Salesman Problem) Approximation


• Problem: Find the shortest possible route that visits a set of cities and returns to the starting
city. This problem is NP-hard.
• Dynamic Programming Approximation:
o The Held-Karp algorithm is a dynamic programming approach that solves TSP
exactly, but in exponential time, so it is practical only for small instances.
o This approach uses a memoization technique where we store the shortest path that
visits a set of cities and returns to the starting point.
• Approximation: While Held-Karp provides an exact solution for TSP, Approximation
Algorithms such as Christofides’ Algorithm are often used for practical solutions with a
guaranteed approximation ratio (within 1.5 times the optimal for metric TSP).
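
A compact Python sketch of the Held-Karp recursion is shown below (illustrative names; it assumes a complete distance matrix). It is exact, not an approximation, and runs in O(n^2 * 2^n) time, which is why approximation algorithms are preferred for larger instances.

```python
from functools import lru_cache

def held_karp(dist):
    """Exact TSP by dynamic programming over subsets of visited cities."""
    n = len(dist)

    @lru_cache(maxsize=None)
    def best(mask, last):
        if mask == (1 << n) - 1:            # every city visited: return to the start
            return dist[last][0]
        return min(dist[last][nxt] + best(mask | (1 << nxt), nxt)
                   for nxt in range(n) if not mask & (1 << nxt))

    return best(1, 0)                       # start the tour at city 0
```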
Dynamic Programming Approximation Ratios
When using DP for approximation, the goal is often to provide a polynomial-time approximation
scheme (PTAS) or an approximation algorithm with a guaranteed ratio.
1. Approximation Algorithms using Dynamic Programming can often find solutions within a
constant factor of the optimal solution.
2. For problems where exact solutions are difficult to compute (like NP-hard problems), Dynamic
Programming can help find a solution that is close enough to the optimal with a known
approximation ratio.

Advantages and Disadvantages of Dynamic Programming in Approximation Algorithms


Advantages:
• Optimal Substructure: DP works well for problems with overlapping subproblems and optimal
substructure, making it suitable for many approximation algorithms.
• Polynomial Time: Many problems that would otherwise take exponential time can be solved
in polynomial time using DP.
• Performance Guarantees: DP-based approximation algorithms often come with a known
approximation ratio that guarantees the solution is close to optimal.
Disadvantages:
• Memory Intensive: DP can be memory-intensive, especially for large problem sizes, as it stores
solutions to all subproblems.
• Computational Complexity: While DP is more efficient than brute force, the complexity of the
algorithm can still be high (e.g., O(n^2) or O(n^3)), which might be infeasible for very large
instances.

Summary of Key Points (Dynamic Programming in Approximation Algorithms)


• Dynamic Programming (DP) can be used to solve approximation problems by breaking the
problem into smaller subproblems and solving them optimally.
• Approximation through Relaxation: DP often gives exact solutions but can also be adapted to
provide near-optimal solutions (especially for NP-hard problems).
• Optimal Substructure and Overlapping Subproblems make DP suitable for approximation,
especially in problems like the Knapsack problem, TSP, and LCS.
• Approximation Ratio: Many DP-based approximation algorithms provide performance
guarantees and can be shown to be within a factor of the optimal solution.
• Efficiency: While DP can be efficient, it may have high time and space complexity, which limits
its scalability for very large inputs.

Example of Approximation Algorithms Using Dynamic Programming


Knapsack (0/1 Knapsack)
• Dynamic Programming Approach: Use a 2D DP table to track the optimal value at each weight.
• Approximation/Exact Solution: Exact solution using DP.

TSP (Traveling Salesman)
• Dynamic Programming Approach: Use memoization to calculate the shortest path recursively.
• Approximation/Exact Solution: Held-Karp provides an exact solution but is exponential in
complexity. Approximate solutions (Christofides) give a 1.5 approximation ratio.

Longest Common Subsequence
• Dynamic Programming Approach: Use a 2D DP table to track the length of the LCS.
• Approximation/Exact Solution: Exact solution using DP.
Conclusion
Dynamic Programming is a key technique in approximation algorithms for problems where exact
solutions are too computationally expensive. DP-based algorithms often provide efficient, near-
optimal solutions for complex problems and guarantee a performance ratio close to the optimal.
While DP solutions can be memory- and time-intensive, they are essential for problems with
overlapping subproblems and optimal substructure, and are widely used in real-world applications in
optimization, bioinformatics, and network design.
Knapsack problems are those in which we are given a set of items, each with a weight and a value, and
we are asked to find the most valuable combination: maximize the total value of the chosen items while
keeping the total weight within the knapsack's capacity.

Approximation Algorithms:

Approximation algorithms play a vital role in solving knapsack problems because, in real-world scenarios,
finding the exact optimal solution to the knapsack problem is quite impractical due to the problem’s
NP-hard nature. These approximation algorithms offer a reasonable solution to the knapsack
problem while taking both time and space complexity into consideration.

We will discuss two approaches:

• Greedy Algorithms

• Dynamic Programming

Greedy Algorithms to the Knapsack Problem

The greedy approach is a simple and intuitive algorithm for solving the knapsack problem. It selects
the items based on their value-to-weight ratios, choosing the items with the highest ratios first.

Step-by-step algorithm:

• Sort the items in descending order of their value-to-weight ratios.

• Initialize the knapsack as empty and set the total value to zero.

• Iterate through the items in the sorted order:

o If the current item can be fully included in the knapsack, add it completely and
update the total value.

o Otherwise, include a fractional part of the item that fits the remaining capacity of
the knapsack, proportionally increasing the total value.

• Return the knapsack’s final configuration and the total value.

Time Complexity: O(N log N) where N is the number of items due to sorting
Auxiliary Space: O(N) where N is the number of items.
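
A minimal Python sketch of this greedy procedure is given below (illustrative names; item weights are assumed to be positive).

```python
def fractional_knapsack(values, weights, capacity):
    """Greedy fractional knapsack: take items in decreasing value/weight order,
    splitting the last item if it does not fully fit. Optimal for the fractional variant."""
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    total_value, remaining = 0.0, capacity
    for value, weight in items:
        if weight <= remaining:                    # the whole item fits
            total_value += value
            remaining -= weight
        else:                                      # take only the fraction that fits, then stop
            total_value += value * remaining / weight
            break
    return total_value

# fractional_knapsack([60, 100, 120], [10, 20, 30], 50) -> 240.0
```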

Dynamic Programming Approach for the Knapsack Problem

Using dynamic programming, we break the problem down into smaller subproblems and use
a table to store the optimal solutions to these subproblems. We iterate through each item
and weight combination, deciding whether to include or exclude the item based on its value
and weight, and we obtain the result while avoiding redundant calculations.
Step-by-step algorithm:

• Create a 2D list called dp with dimensions (N+1) x (W+1) and initialize all values to 0. This
table will store the maximum achievable values for different combinations of items and
capacities.

• Iterate through each item from 1 to N and each capacity from 1 to W:

o If the weight of the current item is greater than the current capacity, it cannot be
included in the knapsack. So, assign the value at the previous item and the same
capacity to dp[i][j].

o If the weight of the current item is less than or equal to the current capacity, we
have two choices:

o Include the current item: Add its value to the value obtained by considering the
remaining capacity after including the item (values[i-1] + dp[i-1][j - weights[i-1]]).

o Exclude the current item: Consider the value obtained by excluding the item (dp[i-
1][j]). Choose the maximum value between the two choices and assign it to dp[i][j].

• After completing the iterations, the value at dp[N][W] represents the maximum achievable
value for the given knapsack capacity.

• Return the value dp[N][W] as the maximum value that can be obtained.

Time Complexity: O(N * W), where N is the number of items and W is the knapsack capacity.


Auxiliary Space: O(N * W), where N is the number of items and W is the knapsack capacity.
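
The step-by-step procedure above corresponds directly to the following Python sketch (illustrative names; weights and the capacity are assumed to be non-negative integers).

```python
def knapsack_01(values, weights, capacity):
    """Bottom-up 0/1 knapsack DP: dp[i][j] is the best value using the first i items
    with remaining capacity j. Runs in O(N * W) time and space."""
    n = len(values)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, capacity + 1):
            if weights[i - 1] > j:                 # item i cannot fit at capacity j
                dp[i][j] = dp[i - 1][j]
            else:                                  # best of excluding vs. including item i
                dp[i][j] = max(dp[i - 1][j],
                               values[i - 1] + dp[i - 1][j - weights[i - 1]])
    return dp[n][capacity]

# knapsack_01([60, 100, 120], [10, 20, 30], 50) -> 220
```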

Knapsack Problem in Approximation Algorithms

The Knapsack Problem is one of the most well-known optimization problems and is often used to
demonstrate approximation algorithms. It involves selecting a subset of items, each with a given
weight and value, such that the total weight of the selected items does not exceed a given weight
capacity, and the total value is maximized.

The Knapsack Problem has two main variants:

1. 0/1 Knapsack Problem: Each item can either be included or excluded.

2. Fractional Knapsack Problem: Items can be broken into smaller fractions, allowing you to
take a part of an item.

While the Fractional Knapsack problem can be solved optimally in polynomial time using a greedy
approach, the 0/1 Knapsack Problem is NP-hard, and finding the optimal solution for large instances
becomes computationally expensive. Thus, approximation algorithms are often used when finding an
exact solution is not feasible.
Knapsack Problem Variants in Approximation Algorithms

1. 0/1 Knapsack Problem (NP-Hard)

In the 0/1 Knapsack Problem, each item is either included or excluded, and the goal is to maximize
the total value while ensuring that the total weight does not exceed the given capacity of the
knapsack.

2. Fractional Knapsack Problem (Greedy Approach)

In the Fractional Knapsack Problem, items can be divided into fractions, meaning you can take a
fraction of any item rather than just an entire item.

• Greedy Solution:

Approximation Algorithms for 0/1 Knapsack Problem

Since the 0/1 Knapsack Problem is NP-Hard, we often use approximation algorithms to find good
solutions in reasonable time, especially when an exact solution is not feasible due to the problem's
computational complexity.

1. Greedy Approximation (for 0/1 Knapsack)

Although the greedy algorithm does not guarantee an optimal solution for the 0/1 Knapsack, it can
be adapted as an approximation strategy for certain cases. A simple greedy approach would be to
sort items by their value-to-weight ratio and try to include them in the knapsack, but we can only
take full items.

However, the Greedy Approach for 0/1 Knapsack does not always yield the optimal solution. For
example, sometimes including a lower value item might lead to a better solution than including a
high value-to-weight ratio item.
For the 0/1 Knapsack, the greedy approach might be used as a heuristic, but there is no fixed
approximation ratio that applies universally. Instead, we typically rely on more sophisticated
techniques.
Approximation Algorithms for the Knapsack Problem: Summary

Key Takeaways

• The 0/1 Knapsack Problem is NP-hard, meaning that finding an exact solution can be
computationally expensive for large problem instances.

• Greedy algorithms provide a fast heuristic but do not guarantee optimal solutions for the
0/1 Knapsack Problem.

• Dynamic Programming can provide an exact solution but is pseudo-polynomial in time, and
scaling techniques can be used for approximation.

In practice, when an exact solution is infeasible, approximation algorithms like FPTAS or dynamic
programming with scaling offer a good balance between computational time and solution quality.
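
The scaling idea mentioned above can be sketched as follows: scale every item value down by a factor K = eps * vmax / n, run a DP indexed by scaled total value (storing the minimum weight needed to reach it), and return the best feasible solution. This is a minimal, illustrative sketch of the standard value-scaling FPTAS (names are hypothetical, not from the notes); it returns a total value of at least (1 - eps) times the optimum.

```python
import math

def knapsack_fptas(values, weights, capacity, eps):
    """Value-scaling FPTAS sketch for the 0/1 knapsack problem."""
    items = [(v, w) for v, w in zip(values, weights) if w <= capacity]  # drop items that never fit
    if not items:
        return 0
    n = len(items)
    vmax = max(v for v, _ in items)
    K = eps * vmax / n                               # scaling factor
    # dp maps a scaled total value to (min weight achieving it, its original total value)
    dp = {0: (0, 0)}
    for v, w in items:
        sv = math.floor(v / K)                       # scaled value of this item
        for s, (wt, orig) in list(dp.items()):       # snapshot, so each item is used at most once
            ns, nw, norig = s + sv, wt + w, orig + v
            if nw <= capacity and (ns not in dp or nw < dp[ns][0]):
                dp[ns] = (nw, norig)
    best = max(dp)                                   # largest reachable scaled value
    return dp[best][1]

# knapsack_fptas([60, 100, 120], [10, 20, 30], 50, eps=0.1) -> a value >= 0.9 * 220
```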

Huffman Coding: Explanation

Huffman Coding is a popular lossless data compression algorithm used to minimize the size of data
while preserving the integrity of the original information. It works by assigning variable-length binary
codes to each input character, with shorter codes assigned to more frequent characters, and longer
codes assigned to less frequent ones. The main goal of Huffman coding is to reduce the total number
of bits required to represent a given set of symbols (or characters).

The algorithm was developed by David A. Huffman in 1952 while he was a Ph.D. student at MIT, and
it has since become one of the foundational algorithms in data compression, particularly used in
formats like ZIP files, JPEG image compression, and MP3 audio compression.
How Huffman Coding Works

1. Frequency Analysis:

o The first step is to analyze the frequency of each character (or symbol) in the input
data. Characters that appear more frequently are given shorter codes, while those
that appear less frequently are assigned longer codes.

2. Building a Binary Tree:

o A binary tree is built in which each leaf node represents a symbol, and its weight
corresponds to the frequency of that symbol.

o Two nodes with the smallest frequencies are repeatedly combined into a parent
node. The frequency of the parent node is the sum of the frequencies of the two
child nodes.

o This process continues until there is only one node left, which becomes the root of
the Huffman tree.

3. Assigning Codes:

o After the binary tree is constructed, the Huffman codes are assigned. Starting from
the root, move left for 0 and right for 1. The code for each symbol is the sequence of
0s and 1s from the root to the corresponding leaf node.

4. Compression:

o The input data is then encoded using these binary codes. Since more frequent
symbols have shorter codes, the total length of the compressed data is minimized.

Key Points about Huffman Coding

• Lossless Compression: Huffman coding is a lossless compression technique, meaning no


data is lost during compression.

• Variable-Length Codes: Huffman assigns variable-length codes to symbols, where more


frequent symbols get shorter codes and less frequent ones get longer codes.

• Optimality: It produces an optimal prefix code for a given set of symbols, ensuring the
smallest possible number of bits required for encoding, given the frequencies of the
symbols.

• Binary Tree Structure: The algorithm works by constructing a binary tree where the leaves
represent the symbols, and the tree is built based on the frequencies of those symbols.

• Greedy Algorithm: Huffman coding is a greedy algorithm, meaning it makes a series of


locally optimal choices (combining the two least frequent symbols) in the hope of finding a
globally optimal solution.
Steps of Huffman Coding Algorithm

1. Create a Priority Queue (Min-Heap):

o Place each symbol along with its frequency into a priority queue (or min-heap)
ordered by frequency.

2. Build the Huffman Tree:

o Extract the two nodes with the smallest frequencies from the queue.

o Create a new internal node with these two nodes as children. The frequency of the
internal node is the sum of the two children's frequencies.

o Insert the new internal node back into the priority queue.

o Repeat this process until there is only one node left, which will be the root of the
Huffman tree.

3. Generate Huffman Codes:

o Traverse the tree starting from the root. Assign 0 for left branches and 1 for right
branches.

o The path from the root to each leaf node represents the Huffman code for that
symbol.

4. Encode the Input:

o Use the Huffman codes to replace each symbol in the input data with its
corresponding Huffman code.

o The result is a compressed representation of the input.

5. Decoding:

o To decode the compressed data, traverse the Huffman tree according to the
sequence of 0s and 1s, starting from the root. Each time a leaf is reached, the
corresponding symbol is decoded.
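
The steps above can be sketched in Python using the standard-library heapq module as the min-heap. The sample text and names below are illustrative assumptions; this is a sketch of the technique, not a reference implementation.

import heapq
from collections import Counter

def build_huffman_codes(text):
    freq = Counter(text)                       # Step 1: frequency analysis
    # Heap entries: (frequency, tie_breaker, tree). A leaf is the symbol itself;
    # an internal node is a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)

    if len(heap) == 1:                         # single-symbol edge case
        return {heap[0][2]: "0"}

    while len(heap) > 1:                       # Step 2: build the tree
        f1, _, left = heapq.heappop(heap)      # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1

    codes = {}
    def assign(node, prefix):                  # Step 3: 0 = left, 1 = right
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix
    assign(heap[0][2], "")
    return codes

sample = "this is an example of huffman coding"
codes = build_huffman_codes(sample)
encoded = "".join(codes[ch] for ch in sample)   # Step 4: encode the input
print(codes)          # shorter codes for more frequent characters
print(len(encoded))   # total number of bits after compression

Because the two lowest-frequency nodes are merged at every step, frequently occurring characters end up close to the root and therefore receive short codes.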

Advantages of Huffman Coding

• Efficient Compression: Huffman coding minimizes the number of bits required to represent
data, making it highly efficient for data storage and transmission.

• Optimal for Known Frequencies: If the frequency distribution of symbols is known, Huffman
coding produces the optimal prefix-free binary code.

• Widely Used: It is the basis for many common compression algorithms, including ZIP files,
JPEG image format, and MP3 audio encoding.

Disadvantages of Huffman Coding

• Requires Frequency Information: Huffman coding requires knowing the frequency of each
symbol in advance, which may not always be feasible or efficient if the symbol frequencies
change over time.
• No Support for Adaptive Compression: While Huffman coding works optimally for static
frequency distributions, it doesn't adapt well to data streams where frequencies may change
dynamically. This limitation can be addressed by Adaptive Huffman Coding.

Summary of Key Points

• Huffman Coding is a lossless compression algorithm that assigns variable-length binary
codes to symbols based on their frequencies, with more frequent symbols assigned shorter
codes.

• It uses a binary tree structure, built using a greedy algorithm, to generate optimal codes for
the symbols.

• Efficiency: Huffman coding achieves optimal compression when symbol frequencies are
known in advance.

• It is widely used in compression schemes for file formats such as ZIP, JPEG, and MP3.

Summary Bullet Points

• Type: Lossless data compression algorithm.

• Approach: Assigns variable-length codes based on symbol frequencies (greedy).

• Optimality: Generates the optimal prefix-free code for a set of symbols.

• Data Structure: Uses a binary tree to assign codes.

• Applications: Used in ZIP, JPEG, MP3, and other compression formats.

• Time Complexity: Building the tree takes O(n log n), where n is the number of unique
symbols.

• Decoding: Requires the tree for decoding, which can be done efficiently by traversing the
tree.

Example of Huffman Coding

Step 1: Build the Huffman Tree

Step 2: Assign Huffman Codes

Step 3: Encode the Data

Huffman Coding is an algorithm used for lossless data compression.

Huffman Coding is also used as a component in many different compression algorithms. It is used as
a component in lossless compressions such as zip, gzip, and png, and even as part of lossy
compression algorithms like mp3 and jpeg.

How it works:

1. Count how often each piece of data occurs.

2. Build a binary tree, starting with the nodes with the lowest count. The new parent node has
the combined count of its child nodes.

3. The edge from a parent gets '0' for the left child, and '1' for the edge to the right child.

4. In the finished binary tree, follow the edges from the root node, adding '0' or '1' for each
branch, to find the new Huffman code for each piece of data.

5. Create the Huffman code by converting the data, piece-by-piece, into a binary code using the
binary tree.

Huffman Coding uses a variable number of bits to represent each piece of data, with a shorter bit
representation for the pieces of data that occur more often.

Furthermore, Huffman Coding ensures that no code is the prefix of another code, which makes the
compressed data easy to decode.
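
As a small illustration of why prefix-freeness makes decoding unambiguous, the sketch below decodes a bit string with a hypothetical prefix-free code table; both the table and the bit string are made up for the example.

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:      # prefix-freeness guarantees a unique match
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

codes = {"a": "0", "b": "10", "c": "11"}   # illustrative prefix-free code table
print(decode("010110", codes))             # prints "abca"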

Data compression is when the original data size is reduced, but the information is mostly, or fully,
kept. Sound or music files, for example, are usually stored in a compressed format at roughly 10% of
the original data size, with most of the information preserved.

Lossless means that even after the data is compressed, all the information is still there. This means
that for example a compressed text still has all the same letters and characters as the original.

Lossy is the other variant of data compression, where some of the original information is lost, or
sacrificed, so that the data can be compressed even more. Music, images, and video are normally
stored and streamed with lossy compression such as MP3, JPEG, and MP4.

Huffman Coding Algorithm

Data may be compressed using the Huffman Coding technique to become smaller without losing any
of its information. The technique is named after David Huffman, who first developed it. Data that
contains frequently repeated characters is typically compressed using Huffman coding.

Huffman Coding is a well-known greedy algorithm. The length of the code allocated to a character depends on
the frequency of that character, which is why it is referred to as a greedy algorithm. The shortest
variable-length code is assigned to the character with the highest frequency, and vice versa for characters
with lower frequencies. It employs variable-length encoding, which means that it gives each
character in the provided data stream a different variable-length code.

Prefix Rule

Essentially, this rule states that the code allocated to a character must not be a prefix of another
character's code. If this rule is broken, ambiguities can appear when decoding with the Huffman tree
that has been created. For example, if 'a' were assigned the code 0 and 'b' the code 01, a received bit
stream beginning with 0 could not be decoded unambiguously.
What is the Huffman Coding process?

The Huffman Code is obtained for each distinct character in primarily two steps:

o Create a Huffman Tree first using only the unique characters in the data stream provided.

o Second, traverse the constructed Huffman Tree, assign codes to the characters, and then
use those codes to encode the provided text.

Steps to Take in Huffman Coding

The steps used to construct the Huffman tree using the characters provided

Input:

string str = "abbcdbccdaabbeeebeab"

If Huffman Coding is employed in this case for data compression, the following information must be
determined for decoding:

o For each character, the Huffman Code

o Huffman-encoded message length (in bits), average code length

o The last two quantities are computed from the character frequencies and the lengths of
their codes, as illustrated in the sketch below.
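
As an illustration (not taken from the original text), the following sketch computes these quantities for the string above, using a compact heap-based Huffman construction.

import heapq
from collections import Counter

text = "abbcdbccdaabbeeebeab"
freq = Counter(text)

# Each heap entry: [frequency, [symbol, code], [symbol, code], ...]
heap = [[f, [sym, ""]] for sym, f in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
    lo = heapq.heappop(heap)
    hi = heapq.heappop(heap)
    for pair in lo[1:]:
        pair[1] = "0" + pair[1]      # prepend 0 for the lighter subtree
    for pair in hi[1:]:
        pair[1] = "1" + pair[1]      # prepend 1 for the heavier subtree
    heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])

codes = {sym: code for sym, code in heap[0][1:]}
encoded_bits = sum(freq[s] * len(codes[s]) for s in freq)
avg_code_len = encoded_bits / len(text)
print(codes)          # the Huffman code for each character
print(encoded_bits)   # Huffman-encoded message length in bits
print(avg_code_len)   # average code length per symbol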

Greedy Huffman Code Construction Algorithm

o Huffman developed a greedy technique that generates a Huffman Code, an ideal prefix code,
for each distinct character in the input data stream.

o The approach repeatedly combines the two lowest-frequency nodes to create the Huffman tree
from the bottom up.

o Because each character receives a code whose length depends on how frequently it appears in the
given stream of data, this method is known as a greedy approach: the shorter the retrieved code,
the more frequently that element occurs in the data.

The use of Huffman Coding

o Here, we'll talk about some practical uses for Huffman Coding:

o Conventional compression formats like PKZIP, GZIP, etc. typically employ Huffman coding.

o Huffman Coding is used for data transfer by fax and text because it minimizes file size and
increases transmission speed.

o Huffman encoding (particularly the prefix codes) is used by several multimedia storage
formats, including JPEG, PNG, and MP3, to compress the files.

o Huffman Coding is mostly used for image compression.

o It is especially helpful when a string of frequently recurring characters has to be transmitted.

Conclusion

o In general, Huffman Coding is helpful for compressing data that contains frequently occurring
characters.
o We can see that the character that occurs most frequently has the shortest code, whereas
the one that occurs least frequently has the longest code.

o The Huffman Code compression technique is used to create variable-length coding, which
uses a varying number of bits for each letter or symbol. This method is superior to fixed-
length coding since it uses less memory and transmits data more quickly.


Traveling Salesman Problem (TSP) in Approximation Algorithms

The Traveling Salesman Problem (TSP) is a classic NP-hard problem in the field of optimization. The
problem asks for the shortest possible route that visits each of a given set of cities exactly once and
returns to the starting city. Despite its simple statement, the TSP is notoriously difficult to solve
efficiently for large datasets, as the number of possible solutions grows factorially with the number
of cities.

Problem Statement (TSP)

• Input: A set of cities and the distances between each pair of cities.

• Output: A Hamiltonian circuit (a path that visits each city exactly once and returns to the
starting point) with the minimum total distance.

Challenges with TSP

• NP-Hard: The exact solution to TSP is computationally expensive and can only be solved
optimally in exponential time for large instances.

• Exponential Solutions: The brute-force solution involves checking all possible permutations
of the cities, which has a time complexity of O(n!), where n is the number of cities.

• Approximation: Since finding an optimal solution for large instances is infeasible,
approximation algorithms are used to find a good-enough solution in polynomial time.

TSP Approximation Algorithms

Given the computational intractability of the TSP, approximation algorithms aim to find solutions that
are close to the optimal, with a guaranteed performance ratio (i.e., how far the approximate
solution is from the optimal solution). The goal is to find a polynomial-time approximation that gives
a solution within a known factor of the optimal one.

1. Christofides' Algorithm (Best Known Approximation for Metric TSP)

Christofides' Algorithm is the most widely used approximation algorithm for solving the metric TSP
(where the triangle inequality holds, i.e., the direct path between two cities is never longer than any
indirect route).

• Approximation Ratio: Christofides' algorithm guarantees that the total length of the tour will
be at most 1.5 times the optimal length. For decades this was the best approximation ratio
known to be achievable in polynomial time for the metric TSP.
• Steps of Christofides' Algorithm:

1. Minimum Spanning Tree (MST): Find the MST of the given graph (using algorithms
like Prim's or Kruskal's). The MST represents the least-cost way to connect all the
cities.

2. Find Odd Degree Vertices: In the MST, some vertices will have an odd degree (i.e.,
an odd number of connections). To make the MST Eulerian (so that it can be turned
into a cycle), we need to pair up the odd-degree vertices.

3. Minimum Weight Perfect Matching: Find a minimum weight perfect matching
among the odd-degree vertices. This ensures that the total weight of the matching is
as small as possible.

4. Combine MST and Matching: Combine the MST with the edges of the perfect
matching. This creates a multigraph where every vertex has an even degree.

5. Eulerian Circuit: Find an Eulerian circuit in the multigraph (which can be done in
linear time). This Eulerian circuit is not a valid TSP tour, but it visits every edge.

6. Shortcutting: Finally, remove any repeated cities from the Eulerian circuit to get a
valid TSP tour.

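
A hedged sketch of these six steps is shown below. It assumes the NetworkX library is available for the MST, matching, and Eulerian-circuit subroutines, and that the input is a complete graph whose weights satisfy the triangle inequality; the 4-city instance is made up for illustration.

import networkx as nx

def christofides_sketch(G):
    # 1. Minimum spanning tree
    T = nx.minimum_spanning_tree(G, weight="weight")
    # 2. Vertices of odd degree in the MST
    odd = [v for v, deg in T.degree() if deg % 2 == 1]
    # 3. Minimum-weight matching on the odd-degree vertices
    #    (perfect here, since the subgraph on the odd vertices is complete)
    matching = nx.min_weight_matching(G.subgraph(odd), weight="weight")
    # 4. Combine MST and matching into a multigraph: every vertex now has even degree
    M = nx.MultiGraph(T)
    for u, v in matching:
        M.add_edge(u, v, weight=G[u][v]["weight"])
    # 5. Eulerian circuit of the multigraph
    circuit = nx.eulerian_circuit(M)
    # 6. Shortcut repeated vertices to obtain the TSP tour
    tour, seen = [], set()
    for u, _ in circuit:
        if u not in seen:
            seen.add(u)
            tour.append(u)
    tour.append(tour[0])
    return tour

# Illustrative 4-city metric instance (made-up distances):
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 2), (0, 2, 3), (0, 3, 4), (1, 2, 4), (1, 3, 3), (2, 3, 2)])
print(christofides_sketch(G))

Adding the matching edges to the MST makes every degree even, which is exactly the condition the Eulerian-circuit step requires; the triangle inequality then guarantees that shortcutting repeated vertices never increases the tour length.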

2. Nearest Neighbor Algorithm (Greedy Approximation)

The Nearest Neighbor Algorithm is a simple and greedy approach to the TSP, often used for heuristic
solutions.

• Steps:

1. Start at an arbitrary city.

2. From the current city, move to the nearest city (the one with the smallest distance
that hasn't been visited yet).

3. Repeat the process until all cities are visited.

4. Return to the starting city to complete the tour.

• Approximation Quality:

o Worst-case performance: The Nearest Neighbor algorithm can perform poorly; its
approximation ratio is not bounded by any constant and can grow with the number of
cities, because the greedy choices may lead to suboptimal paths.

o Practical use: Despite its poor worst-case performance, it is fast and often yields
relatively good solutions for smaller instances.
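
The greedy procedure can be sketched in a few lines of Python; the distance matrix below is an illustrative assumption.

def nearest_neighbor_tour(dist, start=0):
    n = len(dist)
    unvisited = set(range(n)) - {start}
    tour = [start]
    current = start
    while unvisited:
        # Greedy step: jump to the closest city not yet visited.
        nxt = min(unvisited, key=lambda city: dist[current][city])
        unvisited.remove(nxt)
        tour.append(nxt)
        current = nxt
    tour.append(start)  # return to the starting city
    return tour

# Symmetric 4-city distance matrix (made-up numbers):
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]
tour = nearest_neighbor_tour(dist)
cost = sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
print(tour, cost)
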
3. Minimum Spanning Tree-Based Approximations

Another approximation strategy for TSP is based on constructing a Minimum Spanning Tree (MST).
This method works as follows:

• Steps:

1. Compute the Minimum Spanning Tree (MST) of the graph.

2. Double the MST: Since an MST is a tree, it does not form a cycle. To make it a cycle,
"double" the tree by traversing each edge twice, once in each direction.

3. Shortcutting: Perform shortcutting to eliminate repeated visits to the same city.

• Approximation Ratio:

o The resulting approximation algorithm guarantees that the length of the tour is at
most twice the length of the optimal TSP solution.

• Time Complexity: This algorithm runs in polynomial time: the MST can be computed in
O(E log V) time using algorithms like Prim’s or Kruskal’s, and the shortcutting step is linear.
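
A sketch of this 2-approximation using NetworkX is shown below, on the same illustrative instance used earlier. A preorder (DFS) walk of the MST already performs the shortcutting of the doubled tree, so the doubling never has to be carried out explicitly.

import networkx as nx

def mst_double_tree_tour(G, root=None):
    # 1. Compute the MST.
    T = nx.minimum_spanning_tree(G, weight="weight")
    root = root if root is not None else next(iter(G.nodes))
    # 2-3. A preorder (DFS) walk of the MST visits each vertex once,
    #      implicitly doubling the tree and shortcutting repeated visits.
    order = list(nx.dfs_preorder_nodes(T, source=root))
    return order + [root]  # close the cycle

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 2), (0, 2, 3), (0, 3, 4), (1, 2, 4), (1, 3, 3), (2, 3, 2)])
tour = mst_double_tree_tour(G, root=0)
cost = sum(G[u][v]["weight"] for u, v in zip(tour, tour[1:]))
print(tour, cost)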

Summary of TSP Approximation Algorithms

Advantages of Approximation Algorithms for TSP

1. Polynomial-Time Solutions: Approximation algorithms provide polynomial-time solutions,
making them feasible for large instances of TSP.

2. Performance Guarantees: Algorithms like Christofides' algorithm provide guaranteed
performance ratios (e.g., at most 1.5 times the optimal), making them reliable for near-
optimal solutions.

3. Practicality: While exact solutions are computationally expensive, approximation algorithms
offer practical solutions in real-world applications like logistics, manufacturing, and network
design.
Disadvantages of Approximation Algorithms

1. No Exact Solution: Approximation algorithms do not provide the exact optimal solution;
they only provide near-optimal solutions.

2. Worst-Case Performance: Some algorithms, like Nearest Neighbor, can have poor worst-case
performance, especially for poorly structured instances of the problem.

3. Applicability: Many approximation algorithms (like Christofides' algorithm) are limited to
specific variants of TSP, such as metric TSP (where the triangle inequality holds), and may not
work optimally for general TSP.

Applications of TSP Approximation Algorithms

• Logistics and Route Planning: Optimizing delivery routes, traveling sales, and logistics for
minimizing travel time or cost.

• Manufacturing: Minimizing the time spent on tasks or machines (e.g., in cutting, drilling, or
assembly lines).

• Computer Networks: Optimizing the layout of network paths for minimizing latency and
maximizing efficiency.

• Circuit Design: Minimizing wire length in the design of electronic circuits, where routing a
circuit between multiple points is analogous to solving TSP.

Conclusion

The Traveling Salesman Problem (TSP) is one of the most famous NP-hard problems. Due to its
computational complexity, approximation algorithms are used in practice to find near-optimal
solutions efficiently. Among the most prominent is Christofides' algorithm, which guarantees a
solution within 1.5 times the optimal for metric TSP. While greedy algorithms like Nearest Neighbor
are faster, they may result in poor solutions in the worst case. Still, approximation algorithms are
invaluable for real-world applications where finding an optimal solution is impractical, but a near-
optimal solution is good enough.

Travelling Salesperson Approximation Algorithm

The travelling salesperson approximation algorithm requires some prerequisite algorithms to be
performed so that we can achieve a near-optimal solution. Let us look at those prerequisite algorithms
briefly −

Minimum Spanning Tree − The minimum spanning tree is a tree that contains all the
vertices of the main graph, connected by edges of minimum possible total weight. We apply Prim’s algorithm
for the minimum spanning tree in this case.

Pre-order Traversal − The pre-order traversal is done on tree data structures where a pointer is
walked through all the nodes of the tree in a [root – left child – right child] order.

Algorithm

Step 1 − Choose any vertex of the given graph randomly as the starting and ending point.
Step 2 − Construct a minimum spanning tree of the graph with the vertex chosen as the root using
prim’s algorithm.

Step 3 − Once the spanning tree is constructed, pre-order traversal is performed on the minimum
spanning tree obtained in the previous step.

Step 4 − The pre-order traversal obtained, with the starting vertex appended at the end, is the approximate travelling salesperson tour.

Pseudocode

APPROX_TSP(G, c)

   select a vertex r of G.V to be the root

   T <- MST_Prim(G, c, r)

   H <- Preorder_Traversal(T)

   return the cycle obtained by appending r to the end of H

Analysis

The approximation algorithm for the travelling salesperson problem is a 2-approximation algorithm if
the triangle inequality is satisfied.

To prove this, we need to show that the approximate cost of the solution is at most double the optimal cost.
A few observations that support this claim are as follows −

• The cost of the minimum spanning tree is never more than the cost of the optimal Hamiltonian
cycle, since deleting any edge from the optimal cycle yields a spanning tree. That is, c(T) ≤ c(H*).

• The cost of the full walk is twice the cost of the minimum spanning tree. A full walk is defined
as the path traced while traversing the minimum spanning tree in preorder; it traverses every
edge of the spanning tree exactly twice. Therefore, c(W) = 2c(T).

• Since the preorder walk shortcuts the full walk, and shortcutting never increases the cost when
the triangle inequality holds, the output of the algorithm costs no more than the full walk.
Hence c(H) ≤ c(W) = 2c(T) ≤ 2c(H*).

Example
Let us look at an example graph to visualize this approximation algorithm −
Solution
Consider vertex 1 from the above graph as the starting and ending point of
the travelling salesperson and begin the algorithm from here.

Step 1

Starting the algorithm from vertex 1, construct a minimum spanning tree
from the graph.

Step 2

Once the minimum spanning tree is constructed, consider the starting vertex
as the root node (i.e., vertex 1) and walk through the spanning tree in
preorder.

Rotating the spanning tree for easier interpretation, we get −

The preorder traversal of the tree is found to be − 1 → 2 → 5 → 6 → 3 → 4

Step 3

Adding the root node at the end of the traced path, we get 1 → 2 → 5 → 6 → 3
→ 4 → 1.
This is the output tour of the travelling salesperson approximation
algorithm. The cost of this tour, obtained by summing the edge weights along
the path, is 55 in this example.

All-Pairs Shortest Path (APSP) in Approximation Algorithms

The All-Pairs Shortest Path (APSP) problem is a fundamental problem in graph theory. The task is to
find the shortest paths between all pairs of vertices in a weighted graph. Given a graph with n
vertices and m edges, the goal is to compute the shortest path from every vertex to every other
vertex.

While the APSP problem can be solved exactly using algorithms like Floyd-Warshall or Johnson's
Algorithm, these methods can be computationally expensive for large graphs, especially when the
graph has a large number of vertices. For this reason, approximation algorithms are often used in
scenarios where exact solutions are not feasible due to time or resource constraints.

Approximation Algorithms for APSP

In cases where the exact solution is not required or is too computationally expensive, approximation
algorithms can be used to compute the shortest paths in a more efficient manner. Approximation
algorithms for APSP aim to provide solutions that are close to the optimal, but with reduced time
complexity.

1. Thorup-Zwick Approximation Algorithm

The Thorup-Zwick algorithm provides an approximation for the All-Pairs Shortest Path (APSP)
problem on unweighted or weighted graphs. It is particularly efficient for dense graphs and is known
for providing small approximation guarantees.
2. Matrix Multiplication-Based Approximation

Steps of Matrix Multiplication Approximation:

1. Adjacency Matrix: Represent the graph as an adjacency matrix, where each entry
A[i][j] contains the weight (or infinity if no edge exists) between vertex i and
vertex j.

2. Exponentiation: Use matrix exponentiation or matrix multiplication to approximate the
shortest path distances between all pairs.

3. Approximate the Distances: The matrix product represents the shortest paths, and the
entries of the matrix are approximated within the desired factor.
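
To make the matrix-based idea concrete, the sketch below uses the min-plus ("distance") product with repeated squaring, which computes exact all-pairs distances; the approximation variants replace this product with faster, rounded multiplications. The example matrix is an illustrative assumption.

INF = float("inf")

def min_plus_product(A, B):
    # C[i][j] = min over k of A[i][k] + B[k][j]
    n = len(A)
    C = [[INF] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if A[i][k] == INF:
                continue
            for j in range(n):
                if A[i][k] + B[k][j] < C[i][j]:
                    C[i][j] = A[i][k] + B[k][j]
    return C

def apsp_by_squaring(W):
    # After about log2(n) squarings, the entries are shortest-path distances.
    n = len(W)
    D = W
    length = 1
    while length < n - 1:
        D = min_plus_product(D, D)
        length *= 2
    return D

# Small weighted adjacency matrix (0 on the diagonal, INF where no edge exists):
W = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
for row in apsp_by_squaring(W):
    print(row)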

3. Local Search-Based Approximation

For certain types of graphs, particularly planar graphs or graphs with low edge density, local search-
based algorithms can be used to approximate APSP solutions.
• Approximation Ratio: The approximation ratio can vary depending on the type of graph and
the specific algorithm used, but it is typically around (1 + ε), where ε is a small
constant.

• Time Complexity: These algorithms can run in polynomial time, depending on the graph
structure and the method used.

Steps in Local Search Approximation:

1. Initial Approximation: Start by computing an initial approximation using simpler algorithms
(e.g., Dijkstra’s algorithm or BFS).

2. Refinement: Iteratively refine the approximation by exploring local neighborhoods around
each vertex and improving the shortest path estimate.

3. Termination: Continue refining until the approximation is close enough to the optimal
solution or until a predefined error threshold is reached.
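
A sketch of the initial-approximation step, running Dijkstra's algorithm from every vertex of a small illustrative graph, is shown below; a local-search refinement would then try to improve these estimates. The graph, weights, and names are assumptions for the example.

import heapq

def dijkstra(adj, src):
    dist = {v: float("inf") for v in adj}
    dist[src] = 0
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue              # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def all_pairs_initial(adj):
    # One single-source computation per vertex gives a first APSP estimate.
    return {u: dijkstra(adj, u) for u in adj}

adj = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 2), ("D", 5)],
    "C": [("A", 4), ("B", 2), ("D", 1)],
    "D": [("B", 5), ("C", 1)],
}
print(all_pairs_initial(adj))
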
Advantages of APSP Approximation Algorithms

• Polynomial Time: Approximation algorithms provide polynomial-time solutions, making
them feasible for large graphs.

• Good Approximation Guarantees: Algorithms like Thorup-Zwick and matrix multiplication-based
methods give strong guarantees on how close the approximation is to the optimal solution.

• Efficiency: These algorithms are designed to be much more efficient than exact algorithms,
especially in dense or large graphs.

Disadvantages of APSP Approximation Algorithms

• Approximation Factor: These algorithms do not always provide exact solutions but rather
approximate solutions, which may not be optimal in all cases.

• Specialized for Certain Graphs: Some algorithms, like Thorup-Zwick and local search-based
approaches, are more suitable for specific types of graphs (e.g., dense graphs, planar
graphs).

• Error Margin: The approximation factor typically involves a small constant ε, meaning there is
always some error margin in the computed distances.

Applications of APSP Approximation Algorithms

1. Network Routing: In communication networks, determining the shortest paths between all
pairs of nodes can be useful for routing decisions.

2. Transportation Systems: In logistics and route planning, approximate APSP solutions can
help optimize the routes between multiple cities or locations.

3. Geospatial Analysis: In geography and urban planning, approximate APSP solutions are used
for determining the shortest travel paths between multiple locations in large-scale maps.

4. Data Mining: In some machine learning applications, APSP can be used to compute distances
between points (e.g., in clustering algorithms).

Conclusion

Approximation algorithms for All-Pairs Shortest Path (APSP) offer practical solutions when exact
algorithms are computationally expensive or infeasible for large graphs. Thorup-Zwick’s algorithm
and matrix multiplication-based methods provide strong approximation guarantees, making them
suitable for dense graphs. Local search-based methods, in turn, work well for specific graph classes
such as planar or low-density graphs.

Longest Common Subsequence (LCS)

Given two strings, s1 and s2, the task is to find the length of the Longest Common Subsequence. If
there is no common subsequence, return 0.
A subsequence is a string generated from the original string by deleting 0 or more characters,
without changing the relative order of the remaining characters. For example, the subsequences of
“ABC” are “”, “A”, “B”, “C”, “AB”, “AC”, “BC” and “ABC”.
In general, a string of length n has 2^n subsequences.

Examples:

Input: s1 = “ABC”, s2 = “ACD”


Output: 2
Explanation: The longest subsequence which is present in both strings is “AC”.

Input: s1 = “AGGTAB”, s2 = “GXTXAYB”


Output: 4
Explanation: The longest common subsequence is “GTAB”.

Input: s1 = “ABC”, s2 = “CBA”


Output: 1
Explanation: There are three longest common subsequences of length 1, “A”, “B” and “C”.
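
The standard dynamic-programming solution can be sketched as follows; it reproduces the outputs of the three examples above.

def lcs_length(s1, s2):
    m, n = len(s1), len(s2)
    # dp[i][j] = length of the LCS of s1[:i] and s2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("ABC", "ACD"))         # 2  ("AC")
print(lcs_length("AGGTAB", "GXTXAYB"))  # 4  ("GTAB")
print(lcs_length("ABC", "CBA"))         # 1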

Longest Common Subsequence (LCS) Problem

The Longest Common Subsequence (LCS) problem is a classical problem in computer science, often
encountered in string matching, bioinformatics, and data comparison. It deals with finding the
longest sequence that appears in both strings in the same relative order but not necessarily
consecutively.
Applications of LCS

1. Bioinformatics: Finding similarities between genetic sequences. For example, aligning DNA
or protein sequences to identify conserved subsequences.

2. Text Comparison: Used in diff tools to compare files or texts and highlight changes.

3. Data Deduplication: Identifying duplicate or similar data by finding common subsequences.

4. Version Control: In version control systems, LCS is used to detect changes between versions
of a file.

5. Speech and Language Processing: Finding common patterns or subsequences in speech or
text data.
