M5 Daa-Cs201
Content
Graph Algorithms and NP Completeness
Connectivity
Topological Sort
Shortest Path Network Flow
Disjoint Set Union Problem
String Matching
Disjoint Set Manipulation
Classification Of Problems- Decision & Optimisation Problems
Classification Of Algorithms- Deterministic & Non-Deterministic Problems
Classes Of Problems
Relationship Among The Classes Of Problems
Reducibility
Cook’s Theorem
Satisfiability
C-SAT Problem
Clique Decision Problem
Topological Sort:-
Topological sorting is an ordering of the vertices of a directed acyclic graph (DAG)
such that for every directed edge (u, v), vertex u comes before vertex v in the
ordering. This ordering is useful in scenarios where tasks or activities have
dependencies, and the order of execution must respect these dependencies.
Algorithm: The most common algorithm for topological sorting is based on depth-
first search (DFS). The basic idea is to visit nodes in a depth-first manner and assign
ordering numbers to the nodes based on the finishing times of the DFS visits.
Pseudocode: The pseudocode for topological sorting using DFS might look like this:
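1. Initialization:
- Initialise an empty stack to keep track of the ordering.
- Mark all vertices as not visited.
2. DFS Traversal:
- For each vertex that has not been visited, start a DFS from it.
- During the DFS, mark the current vertex as visited and recursively visit each unvisited neighbour.
- After all neighbours of a vertex have been visited, push that vertex onto the stack.
3. Ordering:
- A vertex is pushed only after every vertex reachable from it has been pushed, so the push order is the reverse of a topological ordering.
4. Result:
- Pop elements from the stack to get the final topological ordering.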
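For a DAG on the vertices 0 to 5, topological sorting might result in the order `[5, 2, 0, 3, 1, 4]`,
indicating a valid order in which tasks or activities can be executed without violating
any dependencies. A DAG can have several valid topological orderings, so the result
depends on the order in which vertices are visited.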
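The DFS steps above can be sketched in Python as follows. This is a minimal illustration, assuming vertices are numbered 0 to n-1 and that the input graph is already known to be acyclic (no cycle check is performed). The function name `topological_sort` and the edge list in the usage line are hypothetical examples, not taken from these notes.

```python
from collections import defaultdict


def topological_sort(num_vertices, edges):
    """Return a topological order of a DAG using DFS finishing times.

    Assumes vertices are labelled 0..num_vertices-1 and that the graph
    has no cycles (no cycle check is performed).
    """
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)

    visited = [False] * num_vertices
    stack = []  # a vertex is pushed once its DFS call has finished

    def dfs(u):
        visited[u] = True
        for v in graph[u]:
            if not visited[v]:
                dfs(v)
        stack.append(u)  # every vertex reachable from u is already on the stack

    for u in range(num_vertices):
        if not visited[u]:
            dfs(u)

    return stack[::-1]  # popping the stack = reading it in reverse


# Hypothetical dependency edges: (u, v) means u must come before v.
print(topological_sort(6, [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)]))
# [5, 4, 2, 3, 1, 0]
```

Recursion is used here to mirror the pseudocode; for very deep graphs an iterative DFS would avoid Python's recursion limit.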
Shortest Path Network Flow:-
In the context of network flow, the shortest path problem and maximum flow
problem are two fundamental and closely related problems. Both problems deal
with finding paths through a network, but they have different objectives and
constraints.
The shortest path problem aims to find the path between two nodes in a
network that minimizes the total cost of traversing the edges along the path.
The cost of an edge can represent distance, time, or any other relevant metric. The
shortest path problem is often applied to applications such as navigation, routing,
and supply chain optimization.
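These notes do not fix a particular shortest path algorithm. As one standard choice, the sketch below uses Dijkstra's algorithm with a binary heap, assuming every edge cost is non-negative (Dijkstra's algorithm is not correct otherwise). The `roads` graph and its costs are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return the minimum total cost from source to every reachable node.

    graph maps each node to a list of (neighbor, cost) pairs; all costs are
    assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already found
        for v, cost in graph.get(u, []):
            new_d = d + cost
            if new_d < dist.get(v, float("inf")):
                dist[v] = new_d
                heapq.heappush(heap, (new_d, v))
    return dist


# Hypothetical road network with travel costs.
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

The cheapest route from A to D here goes A, C, B, D (1 + 2 + 1 = 4), even though it uses more edges than A, C, D.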
The shortest path problem and the maximum flow problem are related in two ways:
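1. Unit-capacity maximum flow: When all edge capacities are set to 1, the
maximum flow from the source to the sink equals the maximum number of
edge-disjoint paths between them.
2. Minimum-cost flow: When the goal is to send flow from a source node to a
destination node with the minimum total cost, the shortest path algorithm can be
used to find a single path with the minimum cost.
The choice of algorithm (for example, Dijkstra's algorithm for non-negative edge
costs, or Bellman-Ford when negative costs are allowed) depends on the specific
characteristics of the network and the desired performance.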
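The interplay between the two problems is also visible in the Edmonds-Karp method, which computes a maximum flow by repeatedly sending flow along a shortest (fewest-edge) augmenting path found with breadth-first search. The sketch below is a minimal illustration of that method, not a procedure taken from these notes, and assumes capacities are given as an n x n matrix. With all capacities set to 1, the returned value is the number of edge-disjoint source-to-sink paths; the example matrix is hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest paths in the residual graph.

    capacity[u][v] is the capacity of edge u -> v (0 if there is no edge).
    """
    n = len(capacity)
    residual = [row[:] for row in capacity]  # remaining capacity per edge
    flow = 0
    while True:
        # BFS finds a shortest augmenting path (fewest edges).
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return flow  # no augmenting path left: the flow is maximum
        # Bottleneck = smallest residual capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push the flow and open reverse edges so it can be undone later.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical unit-capacity graph: 0->1, 0->2, 1->2, 1->3, 2->3.
unit_graph = [[0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
print(max_flow(unit_graph, 0, 3))  # 2 edge-disjoint paths from 0 to 3
```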
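Disjoint Set Union Problem:-
A disjoint-set (union-find) data structure maintains a collection of non-overlapping
sets, each identified by a representative element, and supports two main operations:
1. Find(x): Returns the representative of the set containing x.
2. Union(x, y): Merges the sets containing x and y into a single set.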
String Matching:-
String matching is the process of locating the occurrence(s) of a specific
sequence of characters (pattern) within another longer sequence of characters
(text). Various algorithms, like KMP, Boyer-Moore, and Rabin-Karp, are employed
for efficient and quick identification of these patterns in the text.
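Brute-force (naive) matching:
def naive_search(text, pattern):
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        # Compare pattern with the window of text that starts at position i
        while j < m and text[i + j] == pattern[j]:
            j = j + 1
        if j == m:
            return i  # pattern found at position i in the text
    return -1  # pattern does not occur in the text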
Complexity:-
1. Brute Force:
- Time: O((n - m + 1) * m)
- Space: O(1)
2. Optimization- Knuth-Morris-Pratt (KMP):
- Time: O(n + m)
- Space: O(m)
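The KMP bound above comes from a prefix table that lets the search reuse earlier comparisons instead of restarting at every shift. A minimal Python sketch, assuming the task is to report every (possibly overlapping) occurrence; the helper names `build_prefix_table` and `kmp_search` are illustrative.

```python
def build_prefix_table(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (O(m) time and space)."""
    lps = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = lps[k - 1]  # fall back to a shorter prefix that still matches
        if pattern[i] == pattern[k]:
            k += 1
        lps[i] = k
    return lps


def kmp_search(text, pattern):
    """Return every starting index of pattern in text in O(n + m) time."""
    if not pattern:
        return []
    lps = build_prefix_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = lps[k - 1]  # reuse earlier matches instead of restarting
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = lps[k - 1]  # keep going to find overlapping matches
    return matches


print(kmp_search("abababca", "abab"))  # [0, 2]
```

Each text character is examined a bounded number of times in amortized terms, which is where the O(n + m) total comes from.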
Advantages:-
1. Efficiency: Designed for quick and efficient identification of patterns in large texts.
2. Flexibility: Different algorithms cater to various scenarios and types of patterns,
providing flexibility in choosing the most suitable approach.
3. Pattern Recognition: Useful in applications that involve recognizing patterns, such
as text processing and bioinformatics.
4. Multiple Pattern Matching: Some algorithms efficiently handle multiple patterns
simultaneously, beneficial in tasks like virus scanning and content filtering.
5. Hashing Techniques: Algorithms like Rabin-Karp leverage hashing for a balance
between simplicity and speed in pattern matching.
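A minimal sketch of the rolling-hash idea behind Rabin-Karp. The `base` and `mod` values are illustrative assumptions; every hash hit is re-checked character by character, so hash collisions never produce a reported false match.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return every starting index of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal hashes are only a hint; confirm with a direct comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the hash: remove text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Updating the hash in O(1) per shift is what makes the method fast on average; the worst case is still O(n * m) when many windows collide with the pattern's hash.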
Disadvantages:-
1. Complexity: Some algorithms are complex and challenging to implement.
2. Memory Usage: Certain algorithms may require significant memory.
3. Sensitive to Input: Performance may vary with specific data or patterns.
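4. False Positives/Negatives: Hash-based or approximate methods risk reporting incorrect matches or missing valid ones; Rabin-Karp, for example, must verify each hash hit character by character to rule out spurious matches.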
Applications :-
1. Search Engines: Locate relevant documents based on user queries.
2. Data Mining: Identify patterns in large datasets for information extraction.
3. Plagiarism Detection: Identify similarities in documents to detect plagiarism.
4. Virus/Malware Detection: Identify malicious code patterns in files and processes.
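Complexity (Disjoint Set Union):-
1. Find Operation: - Amortized nearly constant time, O(α(n)) where α is the inverse Ackermann function, when path compression is combined with union by rank.
2. Union Operation: - Amortized nearly constant time as well, since it consists of two Find calls plus a constant-time link; union by rank keeps the trees shallow.
3. Space Complexity: - Linear in terms of the number of elements in the disjoint-set.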
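The bounds above assume both optimizations are used together. A minimal Python sketch of a disjoint-set structure with path compression in `find` and union by rank in `union`, assuming elements are the integers 0 to n-1; the class name `DisjointSet` is hypothetical.

```python
class DisjointSet:
    """Union-find with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts in its own set
        self.rank = [0] * n           # upper bound on each tree's height

    def find(self, x):
        """Return the representative (root) of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        """Merge the sets of x and y; return False if they were already joined."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        # Union by rank: attach the shorter tree under the taller one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True


ds = DisjointSet(5)
ds.union(0, 1)
ds.union(3, 4)
print(ds.find(1) == ds.find(0))  # True
print(ds.find(1) == ds.find(3))  # False
```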
Advantages :-
1. Efficiency: - Quick set membership checks and set merging.
2. Path Compression: - Efficient representative element lookup.
3. Rank-Based Union: - Maintains balanced trees, preventing performance
degradation.
4. Cycle Detection: - Useful in algorithms like Kruskal's for detecting cycles.
Disadvantages :-
1. Dynamic Changes: - Supports merging sets but not splitting them or deleting
elements efficiently, so it is a poor fit when sets must frequently be broken apart.
2. Memory Overhead: - Requires additional memory for parent and rank arrays.
3. Sequential Nature: - Operations may be inherently sequential, limiting
parallelization.
4. Dependency on Input Order: - Efficiency may depend on the order of operations.
Applications:-
1. Connected Components:- Identify connected components in graphs.
2. Dynamic Connectivity: - Track network connectivity as edges are added (edge deletions are not supported).
3. Image Segmentation: - Group pixels with similar attributes in images.
4. Maze Generation: - Connect disjoint cells for maze creation.
5. Network Design: - Ensure efficient connectivity in computer networks.
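The cycle-detection use mentioned above is easiest to see in Kruskal's minimum spanning tree algorithm: an edge is skipped exactly when its endpoints are already in the same set. A minimal sketch, assuming an undirected graph given as `(weight, u, v)` tuples over vertices 0 to n-1; it uses path halving, a one-pass variant of path compression, and the edge list is hypothetical.

```python
def kruskal(num_vertices, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest."""
    parent = list(range(num_vertices))
    rank = [0] * num_vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):  # consider the cheapest edges first
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # u and v already connected: this edge would close a cycle
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru  # union by rank
        if rank[ru] == rank[rv]:
            rank[ru] += 1
        total += w
        chosen.append((u, v, w))
    return total, chosen


edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
print(kruskal(4, edges))  # (7, [(0, 1, 1), (1, 2, 2), (2, 3, 4)])
```

The edges of weight 3 and 5 are rejected because their endpoints are already in the same set when they are examined.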
Classification Of Problems:-
In computational complexity theory, problems are often categorized based on the
type of task they require a computer to perform. Two fundamental categories are
decision problems and optimization problems.
1. Decision Problems:-
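A decision problem is a problem where the answer is a simple
"yes" or "no" (true or false). The goal is to determine whether a
given input satisfies a certain property or condition.
Example: Given a list of cities, the distances between them, and a
number k, is there a tour that visits each city exactly once, returns
to the starting city, and has total length at most k?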
2. Optimization Problems:-
An optimization problem involves finding the best solution
from all feasible solutions. The goal is to optimize
(minimize or maximize) a certain objective function, subject
to given constraints.
Example: The Traveling Salesman problem (TSP) is an
optimization problem. Given a list of cities and the distances
between each pair of cities, the task is to find the shortest
possible tour that visits each city exactly once and returns to
the starting city.
These categories are not mutually exclusive, and an optimization problem can often
be reformulated as a decision problem and vice versa.
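To illustrate how TSP moves between the two forms, the sketch below solves the optimization form by brute force and answers the decision form ("is there a tour of length at most k?") from it. It assumes a small, hypothetical symmetric distance matrix and tries every tour starting at city 0, so it takes factorial time and is only meant for tiny inputs.

```python
from itertools import permutations


def tour_length(dist, tour):
    """Total length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def tsp_optimize(dist):
    """Optimization form: length of the shortest tour (brute force)."""
    n = len(dist)
    # Fix city 0 as the start so rotations of the same tour are not repeated.
    return min(tour_length(dist, (0,) + rest) for rest in permutations(range(1, n)))


def tsp_decide(dist, k):
    """Decision form: is there a tour of length at most k?"""
    return tsp_optimize(dist) <= k


dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(tsp_optimize(dist))    # 80
print(tsp_decide(dist, 80))  # True
print(tsp_decide(dist, 79))  # False
```

Conversely, given only a procedure for the decision form, the optimal length can be recovered by binary search over k when distances are integers.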
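Classification Of Algorithms:-
Classification of algorithms into deterministic and non-deterministic is based on the
predictability of their behavior.
1. Deterministic Algorithms:
Given the same input, a deterministic algorithm always executes the same sequence of
steps and produces the same output.
2. Non-Deterministic Algorithms:
Given the same input, a non-deterministic algorithm may follow different execution
paths and produce different outputs on different runs; the practical examples below
achieve this through random choices. In complexity theory the term has a narrower
meaning: an abstract machine that "guesses" a candidate solution and then checks it,
which is the sense used in the definition of NP.
Examples: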
- Deterministic: Binary search, linear search, bubble sort, quicksort.
- Non-deterministic: Genetic algorithms, simulated annealing, some machine
learning algorithms like stochastic gradient descent.
Use Cases:
- Deterministic: Situations where reproducibility and predictability are crucial, such
as in financial calculations or critical systems.
- Non-deterministic: Optimization problems where exploring different possibilities is
beneficial, like in evolutionary algorithms or certain machine learning tasks.
Classes Of Problems:-
1. P (Polynomial Time):
Definition:
P is the class of decision problems for which a deterministic Turing machine can
solve instances in polynomial time. In simpler terms, it includes problems with
efficient algorithms.
Essential Features:
1) Efficient Algorithms: P problems have algorithms with polynomial time
complexity.
2) Polynomial Bound: The running time is bounded by a polynomial in terms of
the input size.
3) Deterministic Computation: Solutions can be found deterministically in
polynomial time.
Areas of Application:
Many practical problems with efficient algorithms fall into P, such as sorting,
searching, and basic graph algorithms.
Examples: Linear search, bubble sort, matrix multiplication with known efficient
algorithms.
2. NP (Nondeterministic Polynomial Time):
Definition:
NP is the class of decision problems for which a solution, once proposed, can be
verified in polynomial time by a deterministic Turing machine. The term
"nondeterministic" does not imply randomness; it refers to a nondeterministic Turing
machine that can "guess" a candidate solution, which is then checked deterministically
in polynomial time.
Essential Features:
1) Efficient Verification: Given a solution, it can be verified in polynomial time.
2) Nondeterministic Computation: While verification is efficient, finding solutions
is not necessarily efficient.
Areas of Application:
Problems where it's easy to check a given solution but might be hard to find one, like
the decision versions of certain optimization problems.
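The "Efficient Verification" feature can be made concrete with the decision form of TSP: checking a proposed tour (the certificate) takes only linear time, even though finding a short tour appears to be hard. A minimal sketch; the function name `verify_tour` and the distance matrix are hypothetical.

```python
def verify_tour(dist, tour, k):
    """Polynomial-time verifier for the decision form of TSP.

    Accepts the certificate `tour` (a list of city indices) only if it visits
    every city exactly once and the closed tour has total length at most k.
    Runs in O(n) time for n cities.
    """
    n = len(dist)
    if len(tour) != n or set(tour) != set(range(n)):
        return False  # not a permutation of all the cities
    length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return length <= k


dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(verify_tour(dist, [0, 1, 3, 2], 80))  # True  (10 + 25 + 30 + 15 = 80)
print(verify_tour(dist, [0, 1, 2, 3], 80))  # False (length 95)
```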
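3. NP-Hard:
Definition:
A problem is NP-hard if every problem in NP can be reduced to it in polynomial time.
An NP-hard problem does not have to be a decision problem or belong to NP itself.
Essential Features: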
1) Superior Hardness: NP-hard problems are at least as hard as the hardest
problems in NP.
2) No Efficient Solutions: No known polynomial-time algorithm exists to solve all
instances of an NP-hard problem.
3) Reduction: Any problem in NP can be reduced to an NP-hard problem in
polynomial time.
Areas of Application:
Serves as a benchmark for the inherent difficulty of solving certain problems.
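4. NP-Complete:
Definition:
NP-complete problems are the hardest problems in NP: a decision problem is
NP-complete if it belongs to NP and every problem in NP can be reduced to it in
polynomial time.
Essential Features: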
NP and NP-hard: NP-complete problems are in NP and NP-hard.
Areas of Application:
Identifying NP-complete problems is crucial because they represent a class of
problems that, if solved efficiently, would imply efficient solutions for all of NP.
Examples: Boolean Satisfiability (SAT), Traveling Salesman Problem (in its decision
form).
NP-complete problems play a central role in theoretical computer science, and their
study has far-reaching implications for the feasibility of efficient algorithms in
various application domains.
Cook’s Theorem:-
Previously, we have seen the Circuit-SAT problem, which asks: given a Boolean
circuit and the values of some of its inputs, is there a way to set the remaining
inputs so that the circuit outputs 1?
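A minimal sketch of Circuit-SAT for a tiny, hypothetical circuit computing (x1 AND x2) OR (NOT x3). Evaluating the circuit for one assignment takes a single pass over the gates, which is why a proposed assignment is easy to check; the search over the free inputs, however, tries all 2^k settings. The gate-list encoding and the function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical circuit: out = (x1 AND x2) OR (NOT x3).
# Each gate is (name, operation, input names), listed in topological order.
CIRCUIT = [
    ("g1", "AND", ("x1", "x2")),
    ("g2", "NOT", ("x3",)),
    ("out", "OR", ("g1", "g2")),
]

OPS = {
    "AND": lambda vals: all(vals),
    "OR": lambda vals: any(vals),
    "NOT": lambda vals: not vals[0],
}


def evaluate(circuit, inputs):
    """Evaluate every gate once, in order: polynomial in the circuit size."""
    values = dict(inputs)
    for name, op, args in circuit:
        values[name] = OPS[op]([values[a] for a in args])
    return values[circuit[-1][0]]  # the last gate is the output


def circuit_sat(circuit, fixed, free):
    """Try every setting of the free inputs (2 ** len(free) of them).

    Returns a satisfying assignment, or None if the output can never be 1.
    """
    for bits in product([False, True], repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, bits))}
        if evaluate(circuit, assignment):
            return assignment
    return None


print(circuit_sat(CIRCUIT, {"x3": True}, ["x1", "x2"]))
# {'x3': True, 'x1': True, 'x2': True}
```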
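Cook's Statement:
Theorem:
Circuit-SAT is NP-complete.
Proof (sketch):
1. Circuit-SAT ∈ NP: Given values for the remaining inputs, evaluate the gates one by
one in topological order and check that the output is 1. This takes time polynomial in
the size of the circuit.
2. Circuit-SAT is NP-hard: Every problem in NP has a polynomial-time verifier. For
inputs of a fixed length, the computation of that verifier can be unrolled into a
Boolean circuit of polynomial size whose free inputs are the bits of the certificate.
The original instance is a "yes" instance exactly when this circuit can be made to
output 1, so every problem in NP reduces to Circuit-SAT in polynomial time.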
SATISFIABILITY:-
A Boolean formula is said to be satisfiable (SAT) if there is at least one assignment of
values to its inputs for which the output is true/high/1.
1. CONCEPT: - The SAT problem asks whether a given Boolean formula is satisfiable.
2. CIRCUIT SAT: - The corresponding question for Boolean circuits: can the inputs be
set so that the circuit outputs 1? (See Cook's Theorem above.)
3. SAT ≤p CIRCUIT SAT: - Any SAT formula can be converted into an equivalent
Boolean circuit within polynomial time. A proposed assignment can then be verified by
evaluating the circuit in polynomial time, so SAT ∈ NP.
4. SAT ∈ NPC: - In the other direction, CIRCUIT SAT ≤p SAT: any Boolean circuit can
be converted within polynomial time into a formula that is satisfiable exactly when the
circuit is (one variable per wire and a few clauses per gate).
Proof of NPC: - SAT is in NP (point 3), and the NP-complete problem CIRCUIT SAT
reduces to SAT within polynomial time (point 4), so SAT is NP-hard and therefore
NP-complete.
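The conversion from CIRCUIT SAT to SAT can be sketched as follows, assuming circuits built from AND, OR and NOT gates (often called the Tseitin encoding). Each wire becomes a variable, each gate contributes a constant number of clauses forcing its output wire to equal the gate's value, and a final unit clause requires the circuit output to be 1, so the formula grows only linearly with the circuit. The brute-force `is_satisfiable` helper is a hypothetical checker for tiny examples, not part of the reduction.

```python
from itertools import product


def circuit_to_cnf(gates, output):
    """Build a CNF formula that is satisfiable exactly when the circuit is.

    gates: list of (wire, op, inputs) with op in {"AND", "OR", "NOT"};
    AND/OR take two inputs. A literal is a (variable, polarity) pair.
    """
    clauses = []
    for c, op, args in gates:
        if op == "AND":  # c <-> (a AND b)
            a, b = args
            clauses += [[(c, False), (a, True)], [(c, False), (b, True)],
                        [(c, True), (a, False), (b, False)]]
        elif op == "OR":  # c <-> (a OR b)
            a, b = args
            clauses += [[(c, True), (a, False)], [(c, True), (b, False)],
                        [(c, False), (a, True), (b, True)]]
        elif op == "NOT":  # c <-> NOT a
            (a,) = args
            clauses += [[(c, True), (a, True)], [(c, False), (a, False)]]
    clauses.append([(output, True)])  # require the circuit output to be 1
    return clauses


def is_satisfiable(clauses):
    """Brute-force check, only to confirm the reduction on tiny examples."""
    variables = sorted({v for clause in clauses for v, _ in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[v] == pol for v, pol in clause) for clause in clauses):
            return True
    return False


# Hypothetical circuits: g = x AND (NOT x) can never output 1,
# while g = x OR (NOT x) always can.
contradiction = [("n", "NOT", ("x",)), ("g", "AND", ("x", "n"))]
tautology = [("n", "NOT", ("x",)), ("g", "OR", ("x", "n"))]
print(is_satisfiable(circuit_to_cnf(contradiction, "g")))  # False
print(is_satisfiable(circuit_to_cnf(tautology, "g")))      # True
```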
Cliques:-
The clique problem is a fundamental concept in graph theory, focusing on the
identification of complete subgraphs within a given graph. In simple terms, a
clique is a set of vertices where every pair of distinct vertices is connected by an
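edge. The clique decision problem asks: given a graph G and an integer k, does G
contain a clique of k vertices? It is NP-complete.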
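A minimal sketch of the clique decision problem (given a graph and an integer k, is there a clique of k vertices?), assuming an undirected graph stored as adjacency sets. `is_clique` checks a proposed set of vertices with O(k^2) lookups, which is the polynomial-time verification that places the problem in NP; `has_clique` simply tries every k-vertex subset, which is exponential in general. The graph and the function names are hypothetical.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """Certificate check: every pair of the given vertices must be adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def has_clique(adj, k):
    """Brute-force decision procedure: try every subset of k vertices."""
    return any(is_clique(adj, subset) for subset in combinations(adj, k))


# Hypothetical graph: a triangle 1-2-3 plus the edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(has_clique(graph, 3))  # True  (vertices 1, 2, 3)
print(has_clique(graph, 4))  # False
```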
Importance of Cliques:-
1. Network Structure:
Cliques are essential in understanding the structural patterns within networks. They
help identify densely connected subgroups of nodes, providing insights into the
organization of complex systems.
2. Biological Networks:
In bioinformatics, cliques are used to model interactions in biological networks, such
as protein-protein interaction networks. Identifying cliques aids in understanding
functional relationships among biological entities.
3. Computer Vision:
In computer vision, cliques play a role in image segmentation. They help identify
coherent regions within an image by recognizing clusters of pixels with strong
connections.
KEY COMPONENTS (CIRCUIT SAT AND SAT):-
1. Boolean Circuit:
- Composed of logical gates representing boolean operations.
- Input variables have truth values (true/false).
- Output is computed through a combination of gates.
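2. Boolean Formula:
- The boolean circuit can be converted into an equivalent boolean formula in
conjunctive normal form (CNF) or disjunctive normal form (DNF). A direct equivalent
conversion can grow exponentially, so reductions instead build an equisatisfiable CNF
of polynomial size by introducing one new variable per gate.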
3. SAT Instance:
- A specific boolean formula, usually given in CNF. An assignment gives each
variable a truth value (true/false).
- The SAT problem is to determine if there exists any assignment that makes
the formula true.
CHARACTERISTICS:-
1. Decision Problem:
- SAT is a decision problem, with the goal of answering "yes" or "no" based
on the existence of a satisfying assignment.
2. NP-Completeness:
- SAT is one of the first problems proven to be NP-complete by Stephen Cook.
This means that any problem in the class NP can be reduced to SAT in polynomial
time.
3. Complexity:
- The general SAT problem is NP-complete, but restricted forms such as 2-SAT and
Horn-SAT can be solved in polynomial time, and many large practical instances are
solved quickly by modern solvers.
- Algorithms like the Davis-Putnam-Logemann-Loveland (DPLL) algorithm and its
variations are commonly used to solve SAT problems.
4. Applications:
- SAT solvers are extensively used in various areas, including hardware and
software verification, artificial intelligence, automated planning, and
optimization problems.
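A minimal sketch of the DPLL idea: repeatedly apply unit propagation, then branch on a variable and backtrack on conflict. It assumes clauses are lists of non-zero integers (3 means x3, -3 means NOT x3, as in the DIMACS format) and omits the pure-literal rule and the clause learning used by modern solvers.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment {variable: bool}, or None if unsatisfiable.

    Variables whose value does not matter may be left out of the result.
    """
    assignment = dict(assignment or {})
    while True:
        simplified = []  # unsatisfied clauses, restricted to unassigned literals
        unit = None
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue  # clause already satisfied
            remaining = [lit for lit in clause if abs(lit) not in assignment]
            if not remaining:
                return None  # every literal is false: conflict
            if len(remaining) == 1:
                unit = remaining[0]
            simplified.append(remaining)
        if not simplified:
            return assignment  # every clause is satisfied
        if unit is None:
            break  # no unit clause left: time to branch
        assignment[abs(unit)] = unit > 0  # unit propagation
    var = abs(simplified[0][0])  # branch on an unassigned variable
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None  # both branches failed


# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x3)
print(dpll([[1, 2], [-1, 3], [-3]]))  # {3: False, 1: False, 2: True}
```

In this example unit propagation alone settles every variable, so no branching is needed; branching only happens when no unit clause remains.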