DAA Complete
• Chapter-1 (Introduction): Algorithms, Analysing Algorithms, Efficiency of an Algorithm, Time and Space Complexity,
Asymptotic notations: Big-Oh, Time-Space trade-off, Complexity of Algorithms, Growth of Functions, Performance
Measurements.
• Chapter-2 (Sorting and Order Statistics): Concept of Searching, Sequential search, Index Sequential Search, Binary Search, Shell
Sort, Quick Sort, Merge Sort, Heap Sort, Comparison of Sorting Algorithms, Sorting in Linear Time. Sequential search, Binary
Search, Comparison and Analysis Internal Sorting: Insertion Sort, Selection, Bubble Sort, Quick Sort, Two Way Merge Sort,
Heap Sort, Radix Sort, Practical consideration for Internal Sorting.
• Chapter-3 (Divide and Conquer): with Examples Such as Sorting, Matrix Multiplication, Convex Hull and Searching.
• Chapter-4 (Greedy Methods): with Examples Such as Optimal Reliability Allocation, Knapsack, Huffman algorithm
• Chapter-5 (Minimum Spanning Trees) – Prim’s and Kruskal’s Algorithms,
• Chapter-6 (Single Source Shortest Paths): - Dijkstra’s and Bellman Ford Algorithms.
• Chapter-7 (Dynamic Programming) with Examples Such as Knapsack. All Pair Shortest Paths – Warshal’s and Floyd’s
Algorithms, Resource Allocation Problem. Backtracking, Branch and Bound with Examples Such as Travelling Salesman
Problem, Graph Coloring, n-Queen Problem, Hamiltonian Cycles and Sum of Subsets.
• Chapter-8 (Advanced Data Structures): Red-Black Trees, B – Trees, Binomial Heaps, Fibonacci Heaps, Tries, Skip List,
Introduction to Activity Networks Connected Component.
• Chapter-9 (Selected Topics): Algebraic Computation, Fast Fourier Transform, String Matching, Theory of NP-Completeness,
Approximation Algorithms and Randomized Algorithms.
Chapter-1 (Introduction)
Algorithms, Analysing Algorithms,
Efficiency of an Algorithm, Time and Space
Complexity, Asymptotic notations: Big-Oh,
Time-Space trade-off, Complexity of
Algorithms, Growth of Functions,
Performance Measurements.
Q Find the Largest Number Among Three Numbers ?
1. Start
2. Read the three numbers to be compared, as A, B and C.
3. Check if A is greater than B.
1. If true, then check if A is greater than C.
1. If true, print 'A' as the greatest number.
2. If false, print 'C' as the greatest number.
2. If false, then check if B is greater than C.
1. If true, print 'B' as the greatest number.
2. If false, print 'C' as the greatest number.
4. End
#include <stdio.h>
int main()
{
    int A, B, C;
    scanf("%d %d %d", &A, &B, &C);          /* read the three numbers */
    if (A > B && A > C)  printf("A is the greatest\n");
    else if (B > C)      printf("B is the greatest\n");
    else                 printf("C is the greatest\n");
    return 0;
}
Introduction to Algorithm
• In mathematics and computer science, an algorithm is a finite
sequence of well-defined, computer-implementable instructions,
typically to solve a class of problems or to perform a computation. It is a
step-by-step procedure.
• Algorithms are unambiguous specifications for performing calculations, data processing, automated reasoning, and other tasks.
• An algorithm will accept zero or more inputs, but generate at least one output.
Properties of Algorithm
• Should terminate in finite time
• Unambiguous
• Zero or more inputs, at least one output
• Every instruction in the algorithm should be effective
• It should be deterministic
Algorithm vs Program
• An algorithm is written in the design phase, while a program is written in the implementation phase.
Analysis of an Algorithm
• Following are the parameters which can be considered
while analysing an algorithm:
• Time
• Space
• Bandwidth
• Register
• Battery power
• Out of all these, time is the most important criterion for the analysis of an algorithm
• How do we analyse time?
Types of Analysis
• Experimental analysis (also called a posteriori or relative analysis)
• Apriori analysis (also called independent or absolute analysis)
• Experimental or a posteriori or relative analysis: means analysis of an
algorithm after it is converted to code. Implement both the algorithms
and run the two programs on your computer for different inputs and
see which one takes less time.
• Advantage: exact values, not rough estimates.
• Disadvantage: the final result, instead of depending only on the algorithm,
depends on many other factors like background software and hardware,
the programming language, and even the temperature of the room.
• Apriori analysis or independent analysis or absolute analysis: we analyse
only the algorithm, using asymptotic notations and mathematical tools,
i.e., before converting it into a program of a particular programming language.
• It is a determination of the order of magnitude of a statement.
• In Asymptotic Analysis, we evaluate the performance of an algorithm in terms of
input size (we don’t measure the actual running time). We calculate how
the time (or space) taken by an algorithm increases with the input size.
• Asymptotic Analysis is not perfect, but that’s the best way available for analyzing
algorithms.
• It might be possible that those large inputs are never given to your software, and
an algorithm which is asymptotically slower always performs better for your
particular situation. So, you may end up choosing an algorithm that is
asymptotically slower but faster for your software.
• The Omega notation is the least used among the three asymptotic notations.
Master Theorem
• In the analysis of algorithms, the master theorem for divide-and-conquer
recurrences provides an asymptotic analysis (using Big O notation) for recurrence
relations of types that occur in the analysis of many divide and conquer
algorithms.
• The approach was first presented by Jon Bentley, Dorothea Haken, and James B.
Saxe in 1980, where it was described as a "unifying method" for solving such
recurrences. The name "master theorem" was popularized by the widely used
algorithms textbook Introduction to Algorithms by Cormen, Leiserson, Rivest,
and Stein.
• The recurrence below divides the problem into ‘a’ subproblems recursively, a >= 1.
• Each subproblem is of size n/b (b > 1); subproblems of size below some constant
are solved directly and do not recurse.
• f(n) is the time to create the subproblems and combine their results.
T(n) = a T(n/b) + f(n)
Case 1
• Q T(n) = 4T(n/2) + n ?
• Case 1: If f(n) = O(n^(log_b a − ϵ)) for some constant ϵ > 0, then T(n) = Θ(n^(log_b a))
• Q T(n) = 9T(n/3) + n ?
• Case 1: If f(n) = O(n^(log_b a − ϵ)) for some constant ϵ > 0, then T(n) = Θ(n^(log_b a))
• Q T(n) = 2T(n/2) + n ?
• Case 1: If f(n) = O(n^(log_b a − ϵ)) for some constant ϵ > 0, then T(n) = Θ(n^(log_b a))
• Case 2: If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · lg n)
• Q T(n) = T(2n/3) + 1?
Case 3
• If f(n) = Ω(n^(log_b a + ϵ)) for some constant ϵ > 0, and if a·f(n/b) <= c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n))
• Case 1: If f(n) = O(n^(log_b a − ϵ)) for some constant ϵ > 0, then T(n) = Θ(n^(log_b a))
• Case 2: If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · lg n)
• Case 3: If f(n) = Ω(n^(log_b a + ϵ)) for some constant ϵ > 0, and if a·f(n/b) <= c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n))
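A quick worked check of Case 1, using the first question above: for T(n) = 4T(n/2) + n we have a = 4 and b = 2, so n^(log_b a) = n^2. Since f(n) = n = O(n^(2 − ϵ)) for, say, ϵ = 1, Case 1 applies and T(n) = Θ(n^2).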
• T(n) = T(n/3) + n ?
• Case 1: If f(n) = O(n^(log_b a − ϵ)) for some constant ϵ > 0, then T(n) = Θ(n^(log_b a))
• Case 2: If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · lg n)
• Case 3: If f(n) = Ω(n^(log_b a + ϵ)) for some constant ϵ > 0, and if a·f(n/b) <= c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n))
• T(n) = 3T(n/4) + n lgn ?
Iteration method
Q Consider the following three functions.
f1 = 10^n
f2 = n^(log n)
f3 = n^(√n)
Which one of the following options arranges the functions in the
increasing order of asymptotic growth rate?
f2 < f3 < f1
Q Consider the following functions from positive integers to real
numbers: 10, √n, n, log₂ n, 100/n. Give the CORRECT arrangement of the
above functions in increasing order of asymptotic complexity.
• There are a number of approaches available for sorting, and some parameters (best case, worst case,
stability, memory, etc.) based on which we judge the performance of these algorithms.
• Selection sort is noted for its simplicity and has performance advantages over
more complicated algorithms in certain situations (the number of swaps, which is
n − 1 in the worst case).
Bubble / Shell / Sinking Sort (Analysis with flag)
Bubble sort (A, n)
{
    for k ← 1 to n-1
    {
        flag = 0
        ptr = 1
        while (ptr <= n-k)
        {
            if (A[ptr] > A[ptr+1])
            {
                exchange(A[ptr], A[ptr+1])
                flag = 1
            }
            ptr = ptr + 1
        }
        if (!flag)
            break
    }
}
Analysis
• Depends on structure or content? Both
• Internal/External sort algorithm? Internal
• Stable/Unstable sort algorithm? Stable
• Best case time complexity? O(n)
• Worst case time complexity? O(n²)
• Algorithmic approach? Subtract and Conquer
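The flagged bubble sort above can be written in C roughly as follows (a sketch; indices here are 0-based, unlike the 1-based pseudocode):

#include <stdio.h>

/* Flagged bubble sort: stop early if a whole pass makes no swaps. */
void bubble_sort(int A[], int n)
{
    for (int k = 0; k < n - 1; k++) {
        int swapped = 0;
        for (int i = 0; i < n - 1 - k; i++) {
            if (A[i] > A[i + 1]) {
                int t = A[i]; A[i] = A[i + 1]; A[i + 1] = t;
                swapped = 1;
            }
        }
        if (!swapped)            /* no swaps: already sorted, best case O(n) */
            break;
    }
}

int main(void)
{
    int A[] = {5, 1, 4, 2, 8};
    bubble_sort(A, 5);
    for (int i = 0; i < 5; i++) printf("%d ", A[i]);
    return 0;
}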
Bubble / Shell / Sinking Sort (Conclusion)
• Even other O(n²) sorting algorithms, such as insertion sort and selection sort, generally run
faster than bubble sort, and are no more complex. Therefore, bubble sort is not a
practical sorting algorithm. This simple algorithm performs poorly in real world use and
is used primarily as an educational tool. More efficient algorithms such as heap sort,
or merge sort are used by the sorting libraries built into popular programming
languages such as Python and Java.
• When the list is already sorted (best-case), the complexity of bubble sort is only O(n).
By contrast, most other algorithms, even those with better average-case complexity,
perform their entire sorting process on the set and thus are more complex. However,
not only does insertion sort share this advantage, but it also performs better on a list
that is substantially sorted (having a small number of inversions).
Insertion Sort
Insertion Sort
• At each iteration, insertion sort removes one element from the input data, finds the location it
belongs within the sorted list, and inserts it there. It repeats until no input elements remain.
• At each array-position, it checks the value there against the largest value in the sorted list
(which happens to be next to it, in the previous array-position checked).
• If larger, it leaves the element in place and moves to the next. If smaller, it finds the correct
position within the sorted list, shifts all the larger values up to make a space, and inserts into
that correct position.
• The resulting array after k iterations has the property where the first k + 1 entries are
sorted ("+1" because the first entry is skipped). In each iteration the first remaining
entry of the input is removed, and inserted into the result at the correct position, thus
extending the result:
Insertion Sort (Algo)
Insertion sort (A, n)
{
for j ← 2 to n
{
key = A[j]
i=j-1
while(i>0 and A[i] > key)
{
A[i+1] = A[i]
i = i-1
}
A[i+1]=key
}
}
Insertion Sort (Analysis)
Insertion sort (A, n)
{
    for j ← 2 to n
    {
        key = A[j]
        i = j - 1
        while (i > 0 and A[i] > key)
        {
            A[i+1] = A[i]
            i = i - 1
        }
        A[i+1] = key
    }
}
Analysis
• Depends on structure or content? Both
• Internal/External sort algorithm? Internal
• Stable/Unstable sort algorithm? Stable
• Best case time complexity? O(n)
• Worst case time complexity? O(n²)
• Algorithmic approach? Subtract and Conquer
Insertion Sort (Conclusion)
• Insertion sort is much less efficient on large lists than more advanced algorithms such
as heapsort(O(nlogn)), or merge sort (O(nlogn)). However, insertion sort provides
several advantages:
• Efficient for (quite) small data sets, much better than other quadratic sorting algorithms
such as selection sort and bubble sort.
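A runnable C version of the insertion-sort pseudocode used in this chapter (a sketch; 0-based indexing):

#include <stdio.h>

/* Insertion sort: grows a sorted prefix, inserting each new key into place. */
void insertion_sort(int A[], int n)
{
    for (int j = 1; j < n; j++) {
        int key = A[j];
        int i = j - 1;
        while (i >= 0 && A[i] > key) {   /* shift larger elements one step right */
            A[i + 1] = A[i];
            i--;
        }
        A[i + 1] = key;                  /* drop the key into its correct slot */
    }
}

int main(void)
{
    int A[] = {12, 11, 13, 5, 6};
    insertion_sort(A, 5);
    for (int i = 0; i < 5; i++) printf("%d ", A[i]);
    return 0;
}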
Merge Sort
• In computer science, merge sort is an efficient, general-purpose, comparison-based sorting algorithm.
• Merge sort is a divide and conquer algorithm that was invented by John von Neumann in 1945.
Von Neumann Architecture
Merge Sort
• Conceptually, a merge sort works as follows:
• Divide the unsorted list into n sublists, each containing one element (a list of one
element is considered sorted).
• Repeatedly merge sublists to produce new sorted sublists until there is only one
sublist remaining. This will be the sorted list.
Merge Sort(Algo)
Merge_Sort(A, p, r)
{
if(p < r)
{
q ← ⌊(p + r)/2⌋
Merge_Sort (A, p, q)
Merge_Sort (A, q + 1, r)
Merge (A, p, q, r)
}
}
Merge (A, p, q, r)
{
    n1 ← q – p + 1
    n2 ← r – q
    Create arrays L[1 ... n1+1] and R[1 ... n2+1]
    for i ← 1 to n1
        do L[i] = A[p + i - 1]
    for j ← 1 to n2
        do R[j] = A[q + j]
    L[n1+1] ← ∞
    R[n2+1] ← ∞
    i ← 1
    j ← 1
    for k ← p to r
    {
        if (L[i] <= R[j])
        {
            A[k] = L[i]
            i = i + 1
        }
        else
        {
            A[k] = R[j]
            j = j + 1
        }
    }
}
Merge Sort(Analysis)
• Depends on structure or content ?
• Structure
• Internal/External sort algorithm ?
• External
• Stable/Unstable sort algorithm ?
• Stable
• Best case time complexity ?
• O(nlogn)
• Worst case time complexity ?
• O(nlogn)
• Algorithmic Approach?
• Divide and Conquer
Merge Sort(Conclusion)
• If the running time of merge sort for a list of length n is T(n), then the recurrence is
• T(n) = 2T(n/2) + n, which by Case 2 of the master theorem gives T(n) = Θ(n log n).
• In the worst case, merge sort does about 39% fewer comparisons than quicksort does
in the average case.
• Merge sort is more efficient than quicksort for some types of lists if the data to be
sorted can only be efficiently accessed sequentially, and is thus popular in languages
such as Lisp, where sequentially accessed data structures are very common.
Quick Sort
• Quick sort picks an element as the pivot and partitions the array around it. The sub-arrays
are then sorted recursively. This can be done in-place, requiring small additional amounts
of memory to perform the sorting.
Quick Sort(Algo)
Quick_Sort(A, p, r)
{
if(p < r)
{
q ← partition(A, p, r)
quick_Sort(A, p, q - 1)
quick_Sort(A, q + 1, r)
}
}
Quick Sort(Algo)
Partition (A, p, r)
{
    x ← A[r]
    i ← p – 1
    for j ← p to r – 1
    {
        if (A[j] <= x)
        {
            i ← i + 1
            Exchange(A[i] ↔ A[j])
        }
    }
    Exchange(A[i + 1] ↔ A[r])
    return i + 1
}
Quick Sort(Analysis)
• Depends on structure or content ?
• Both
• Internal/External sort algorithm ?
• Internal
• Stable/Unstable sort algorithm ?
• Unstable
• Best case time complexity ?
• O(nlogn)
• Worst case time complexity ?
• O(n2)
• Algorithmic Approach?
• Divide and Conquer
Counting Sort
• Although radix sorting itself dates back far longer, counting sort, and its
application to radix sorting, were both invented by Harold H. Seward in 1954.
Counting Sort
Counting Sort
• Counting sort is a non-comparative stable sorting algorithm suitable for sorting
elements within a specific range. It counts the number of objects that have
distinct key values, and then does some arithmetic to calculate the position of
each object in the output sequence.
• Counting sort is efficient if the range of the input data is not significantly greater
than the number of objects to be sorted. It's not a comparison-based sort, and
its time complexity is O(n+k), where n is the number of elements in the input
array and k is the range of the input. Space complexity is O(k).
Counting Sort
Example input (indices 1..8): A = 7 4 6 1 3 1 3 6
Counting_Sort(A, B, k)
{
    let C[0..k] be a new array
    for i ← 0 to k
        do C[i] ← 0
    for j ← 1 to length[A]
        do C[A[j]] ← C[A[j]] + 1
    for i ← 1 to k
        do C[i] ← C[i] + C[i – 1]
    for j ← length[A] downto 1
        do B[C[A[j]]] ← A[j]
           C[A[j]] ← C[A[j]] – 1
}
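A C sketch of the same counting sort for keys in the range 0..k (output positions here are 0-based):

#include <stdio.h>
#include <string.h>

/* Counting sort, stable, for keys in 0..k. */
void counting_sort(const int A[], int B[], int n, int k)
{
    int C[k + 1];
    memset(C, 0, sizeof(C));
    for (int j = 0; j < n; j++)          /* count occurrences of each key       */
        C[A[j]]++;
    for (int i = 1; i <= k; i++)         /* prefix sums give final positions    */
        C[i] += C[i - 1];
    for (int j = n - 1; j >= 0; j--) {   /* place from the right: keeps stability */
        B[C[A[j]] - 1] = A[j];
        C[A[j]]--;
    }
}

int main(void)
{
    int A[] = {7, 4, 6, 1, 3, 1, 3, 6}, B[8];
    counting_sort(A, B, 8, 7);
    for (int i = 0; i < 8; i++) printf("%d ", B[i]);
    return 0;
}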
Radix Sort
• Radix sort dates back as far as 1887 to the work of Herman Hollerith on
tabulating machines. Radix sorting algorithms came into common use as a way
to sort punched cards as early as 1923.
Radix Sort
• An IBM card sorter performing a radix sort on a large set of punched cards. Cards are fed into
a hopper below the operator's chin and are sorted into one of the machine's 13 output
baskets, based on the data punched into one column on the cards. The crank near the input
hopper is used to move the read head to the next column as the sort progresses. The rack in
back holds cards from the previous sorting pass.
Radix Sort
• Radix Sort is an integer sorting algorithm that organizes data by individual digits which have
the same position and value. It starts by sorting integers based on their least significant digit
using a stable sorting method like counting sort to keep the same relative order for similar key
values. After sorting by the least significant digit, it progresses to the next digit to the left,
continuing this process until it has sorted by the most significant digit.
• The time complexity is generally O(nk), where n is the number of elements and k is the
number of passes needed, which depends on the number of digits in the largest number.
• Radix Sort excels at sorting fixed-length number sequences like phone numbers or dates and
may outperform comparison-based sorts such as quicksort or mergesort if the numbers aren’t
much longer than the array size. It’s particularly adept at handling large data sets because its
speed depends more on digit count rather than the actual size of the numbers being sorted.
Radix Sort
radixSort(arr)
{
    max = largest element in the given array
    d = number of digits in the largest element (max)
    create 10 buckets, one for each digit 0 - 9
    for i ← 1 to d
        sort the array elements using counting sort (or any stable sort) according to the
        digits at the i-th place
}
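A compact C sketch of LSD radix sort for non-negative integers, with one stable counting pass per decimal digit (base 10 is assumed):

#include <stdio.h>

/* Stable counting sort on the digit at 'place' (1, 10, 100, ...). */
static void count_by_digit(int A[], int n, int place)
{
    int C[10] = {0};
    int out[n];                               /* temporary output array */
    for (int i = 0; i < n; i++) C[(A[i] / place) % 10]++;
    for (int d = 1; d < 10; d++) C[d] += C[d - 1];
    for (int i = n - 1; i >= 0; i--)          /* right-to-left keeps it stable */
        out[--C[(A[i] / place) % 10]] = A[i];
    for (int i = 0; i < n; i++) A[i] = out[i];
}

/* LSD radix sort: sort by least significant digit first, then next, ... */
void radix_sort(int A[], int n)
{
    int max = A[0];
    for (int i = 1; i < n; i++) if (A[i] > max) max = A[i];
    for (int place = 1; max / place > 0; place *= 10)
        count_by_digit(A, n, place);
}

int main(void)
{
    int A[] = {170, 45, 75, 90, 802, 24, 2, 66};
    radix_sort(A, 8);
    for (int i = 0; i < 8; i++) printf("%d ", A[i]);
    return 0;
}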
Bucket Sort
• Bucket sort, also known as bin sort, is an effective sorting algorithm ideal for uniformly
distributed data. Here are the condensed key points:
• Element Distribution: Elements are distributed across several buckets based on their
value.
• Sorting Buckets: Each bucket is sorted individually, either using another sorting algorithm
or by applying bucket sort recursively.
• Combining Buckets: Sorted buckets are merged back into a single array for the final sorted
order.
• Performance: Best for data evenly distributed over a range. Average and best-case
complexity is O(n+k), with n being the number of elements and k the number of buckets.
• Space Requirement: Occupies O(n⋅k) space due to the buckets.
• Stability: Maintains the relative order of equal elements.
• Ideal Scenarios: Most efficient for large sets of floating-point numbers or uniformly
spread data.
• Drawbacks: Performance drops for non-uniformly distributed data and depends on the
input distribution and bucket count.
• In brief, bucket sort is fast and stable, best for evenly distributed datasets. It segregates
elements into buckets, sorts these, and then merges them into a sorted array.
Bucket Sort
Example input: A = {.79, .43, .60, .11, .32, .29, .57, .82, .94, .07}, distributed into buckets 0 to 9.
BUCKET-SORT (A)
{
    n = A.length
    let B[0 . . n - 1] be a new array
    for i = 0 to n - 1
        make B[i] an empty list
    for i = 1 to n
        insert A[i] into list B[⌊n · A[i]⌋]
    for i = 0 to n - 1
        sort list B[i] with insertion sort
    concatenate the lists B[0], B[1], ... B[n - 1] together (in order)
}
Ch-3
Divide and Conquer
with Examples Such as Sorting,
Matrix Multiplication, Convex
Hull and Searching.
Divide and conquer
• Divide and conquer is a fundamental algorithm design paradigm in computer science.
It works by recursively breaking down a problem into two or more sub-problems of the
same or related type, until these become simple enough to be solved directly. The
solutions to the sub-problems are then combined to give a solution to the original
problem.
• Here are the main steps involved in a divide and conquer algorithm:
• Divide: Split the problem into several sub-problems that are smaller
instances of the same problem.
• Conquer: Solve the sub-problems recursively. If the sub-problems are small
enough, solve them in a straightforward manner.
• Combine: Combine the solutions of the sub-problems into the solution for
the original problem.
• The efficiency of a divide and conquer algorithm depends on the size reduction
at each division and the efficiency of the combine step. When properly
designed, such algorithms can lead to significant reductions in time complexity,
often achieving logarithmic growth in computational cost.
Merge Sort
Quick Sort
What is Matrix Multiplication
Matrix Multiplication with Divide and Conquer
• Volker Strassen first published this algorithm in
1969 and thereby proved that the O(n³) general
matrix multiplication algorithm was not optimal.
The Strassen algorithm's publication resulted in
more research about matrix multiplication that led
to both asymptotically lower bounds and improved
computational upper bounds.
Strassen's Matrix Multiplication
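For reference, the standard seven products in Strassen's method for 2×2 block matrices, and how the result blocks are assembled from them:
M1 = (A11 + A22)(B11 + B22)
M2 = (A21 + A22) B11
M3 = A11 (B12 − B22)
M4 = A22 (B21 − B11)
M5 = (A11 + A12) B22
M6 = (A21 − A11)(B11 + B12)
M7 = (A12 − A22)(B21 + B22)
C11 = M1 + M4 − M5 + M7
C12 = M3 + M5
C21 = M2 + M4
C22 = M1 − M2 + M3 + M6
Since only 7 half-size multiplications are needed instead of 8, the recurrence becomes T(n) = 7T(n/2) + Θ(n²), which gives T(n) = Θ(n^(log₂ 7)) ≈ Θ(n^2.81).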
Convex Hull
• The convex hull problem is a classic algorithmic problem in the field of computational
geometry. The goal is to find the smallest convex polygon that encloses a set of points in a
plane. In simple terms, if you imagine a set of nails hammered into a board and you wrap a
rubber band around all the nails, the rubber band would outline the convex hull.
• Convex Hull: The smallest convex polygon formed by a set of points such that no point from
the set lies outside the polygon.
• The convex hull is a fundamental structure in computational geometry, with
applications in pattern recognition, image processing, GIS (Geographic
Information Systems), and in solving other geometric problems.
Applications:
• Pathfinding and motion planning problems.
• Collision detection in physical simulations and computer games.
• Determining the boundary of an object in machine learning and computer
vision.
• Clustering analysis in data mining.
• Supporting GIS operations like creating the boundary for geographical datasets.
GRAHAM-SCAN(Q)
{
Let p0 be the point in Q with the minimum y-coordinate, or the leftmost such point in
case of a tie.
Let <p1, p2, ..., pm> be the remaining points in Q, sorted by polar angle in counter
clockwise order around p0.
Top[S] ← 0
PUSH(p0, S)
PUSH(p1, S)
PUSH(p2, S)
for i ← 3 to m
do while the angle formed by points NEXT-TO-TOP(S), Top(S), and pi makes a non-
left turn
POP(S)
PUSH(pi, S)
Return S
}
Binary Search
• Binary search is a classic searching algorithm used to find the position
of a target value within a sorted array. It is much more efficient than a
linear search, offering O(log n) time complexity, where n is the number
of elements in the array.
• Binary search locates a target value within a sorted array using a divide-and-conquer
approach:
• Initialization: Set low and high pointers at the array's start and end.
• Middle Index: Calculate the middle of low and high. Use low + (high - low) / 2 to
prevent overflow.
• Comparison: Check if the middle element is the target. If yes, return its index.
• Direction: If the target is smaller, search the left side (high becomes middle - 1); if
larger, search the right (low becomes middle + 1).
• Iteration: Repeat until low exceeds high.
• Outcome: Return the target's index or indicate it's not found.
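The steps above translate directly into C (a sketch for a sorted array of ints):

#include <stdio.h>

/* Iterative binary search over a sorted array; returns the index or -1. */
int binary_search(const int A[], int n, int target)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* avoids overflow of (low + high) */
        if (A[mid] == target)
            return mid;
        else if (A[mid] < target)
            low = mid + 1;                  /* target is in the right half */
        else
            high = mid - 1;                 /* target is in the left half  */
    }
    return -1;                              /* not found */
}

int main(void)
{
    int A[] = {2, 5, 8, 12, 16, 23, 38};
    printf("%d\n", binary_search(A, 7, 23));   /* prints 5 */
    return 0;
}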
Ch-4
Greedy Methods
with Examples Such as Optimal
Reliability Allocation, Knapsack,
Huffman algorithm
लालच बुरी बला है (Greed is a curse.)
Greedy Algorithm
• A greedy algorithm is a problem-solving approach, like subtract and conquer,
divide and conquer, and dynamic programming, which is used for solving
optimization problems (finding one optimal solution out of all feasible solutions).
IAS Officer
Bank Officer
• Will greedy always work? - In many problems, a greedy strategy does not usually produce an
optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that
approximate a globally optimal solution in a reasonable amount of time.
• For example, a greedy strategy for the traveling salesman problem (which is of a high
computational complexity) is the following heuristic:
• "At each step of the journey, visit the nearest unvisited city." This heuristic does not intend to
find a best solution, but it terminates in a reasonable number of steps; finding an optimal
solution to such a complex problem typically requires unreasonably many steps.
• We can make whatever choice seems best at the moment and then
solve the subproblems that arise later.
• It iteratively makes one greedy choice after another, reducing each
given problem into a smaller one. In other words, a greedy algorithm
never reconsiders its choices.
• This is the main difference from dynamic programming, which is exhaustive and
is guaranteed to find the solution. After every stage, dynamic programming
makes decisions based on all the decisions made in the previous stage, and may
reconsider the previous stage's algorithmic path to solution.
Optimization Problem
• An optimization problem is a type of problem that seeks to find the best solution from
all feasible solutions. Here’s a simple breakdown:
• Optimization Problems:
• Objective: Minimize or maximize some quantity (like cost, profit, distance).
• Constraints: Set of restrictions or conditions that the solution must satisfy.
• Feasible Solution: A solution that meets all constraints.
• Optimal Solution: A feasible solution that yields the best value of the objective
function.
Activity Selection Problem
• The Activity Selection Problem is a classic problem and often used to illustrate
the concept of greedy algorithms. Here's a simplified explanation:
• You have a set of activities, each with a start and an end time. Each activity has
si a start time, and fi a finish time. If activity i is selected, the resource is
occupied in the intervals (si, fi).We say i and j are compatible activities if their
start and finish time does not overlap i.e., i and j compatible if si >= fj and sj >= fi
• You need to select the maximum number of activities that don't overlap in time.
The Goal is to Maximize the number of activities selected.
• Approach:
• Sort Activities: First, sort all activities by their finish time.
• Select First Activity: Choose the activity that finishes first.
• Iterate and Select: For each subsequent activity, if its start time is after or at
the finish time of the previously selected activity, select it.
• Activities (start time, end time): (1, 3), (2, 4), (3, 5), (5, 7)
• Sorted by end time: (1, 3), (3, 5), (2, 4), (5, 7)
• Selected Activities: (1, 3), (3, 5), (5, 7)
• At each step, you make the choice that seems best at the moment (choosing the activity that
ends earliest). This local optimization leads to a globally optimal solution.
• Use Cases:
• Scheduling tasks in a single resource environment (like a single meeting room).
• Allocating time slots for interviews or exams where no overlap is allowed.
• Complexity:
• If the activities are not sorted, the main complexity is in the sorting step, which is
typically O(n log n) for n activities.
• The selection process is O(n), as it involves iterating through the list once.
• This problem is an excellent example to teach students about greedy algorithms, showing
how a locally optimal choice can lead to a globally optimal solution in certain scenarios.
ii. Pseudo code for the iterative greedy algorithm:
GREEDY-ACTIVITY-SELECTOR (s, f)
{
    n ← length[s]
    A ← {a1}
    k ← 1
    for m ← 2 to n
        do if s[m] >= f[k]
            then A ← A U {am}
                 k ← m
    return A
}
Knap Sack Problem
• The knapsack problem or rucksack problem is a problem in combinatorial
optimization: Given a set of items, each with a weight and a value, determine the
number of each item to include in a collection so that the total weight is less
than or equal to a given limit and the total value is as large as possible.
• Knap Sack problem can be studied in two versions fractional Knap Sack
and 0/1 Knap Sack, here we will be discussing Fractional Knap Sack and
then 0/1 Knap Sack will be solved in Dynamic Programming.
Q Consider the weights and values of items listed below. Note that there is only
one unit of each item.
Object          O1      O2      O3
Profit          25      24      15
Weight          18      15      10
Profit/Weight   1.38    1.6     1.5
Solution: for the fractional knapsack, pick objects greedily in decreasing order of profit/weight.
Problem Definition
• More formally, there are n objects O1, O2, O3, …, On; each object has an associated
weight Wi and an associated profit Pi, and we can take a fraction xi of the object,
where 0 <= xi <= 1.
• Profit = ∑ (i = 1 to n) Pi · xi, subject to ∑ (i = 1 to n) Wi · xi <= W (the knapsack capacity).
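A small C sketch of this greedy strategy: sort objects by decreasing profit/weight, take whole objects while they fit, then a fraction of the next one. The capacity value 20 in main() is only an assumed figure for illustration, since no capacity is stated here.

#include <stdio.h>
#include <stdlib.h>

typedef struct { double profit, weight; } Item;

/* Sort items by decreasing profit/weight ratio. */
static int by_ratio(const void *a, const void *b)
{
    double ra = ((const Item *)a)->profit / ((const Item *)a)->weight;
    double rb = ((const Item *)b)->profit / ((const Item *)b)->weight;
    return (ra < rb) - (ra > rb);
}

double fractional_knapsack(Item items[], int n, double W)
{
    qsort(items, n, sizeof(Item), by_ratio);
    double total = 0.0;
    for (int i = 0; i < n && W > 0; i++) {
        if (items[i].weight <= W) {            /* take the whole object */
            total += items[i].profit;
            W -= items[i].weight;
        } else {                               /* take only a fraction  */
            total += items[i].profit * (W / items[i].weight);
            W = 0;
        }
    }
    return total;
}

int main(void)
{
    Item items[] = {{25, 18}, {24, 15}, {15, 10}};   /* O1, O2, O3 from the question */
    printf("%.2f\n", fractional_knapsack(items, 3, 20.0));   /* capacity 20 is assumed */
    return 0;
}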
• The knapsack problem has been studied for more
than a century, with early works dating as far back
as 1897. The name "knapsack problem" dates back
to the early works of mathematician Tobias
Dantzig (1884–1956).
Conclusion
• It derives its name from the problem faced by someone who is constrained by a fixed-
size knapsack and must fill it with the most valuable items.
• The problem often arises in resource allocation where there are financial
constraints and is studied in fields such as
• Combinatorics
• Computer science
• Complexity theory
• Cryptography
• Applied mathematics.
Job sequencing with Deadline
• We are given n jobs, where each job is associated with a deadline Di and a profit
Pi if the job is finished before the deadline.
• We have a single CPU with non-preemptive scheduling.
• For each job we assume the arrival time is 0 and the burst time (processing requirement) is 1.
• Select a subset of the ‘n’ jobs such that the jobs in the subset can be completed
within their deadlines and generate the maximum profit. (A C sketch of the greedy approach is given below.)
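One way to code this greedy approach in C (a sketch; it uses the T1–T4 instance from the question below, for which it prints 127: T1 in slot 2 and T3 in slot 1):

#include <stdio.h>
#include <stdlib.h>

typedef struct { int profit, deadline; } Job;

static int by_profit_desc(const void *a, const void *b)
{
    return ((const Job *)b)->profit - ((const Job *)a)->profit;
}

/* Greedy job sequencing: consider jobs in decreasing profit order and put
 * each one in the latest free slot on or before its deadline. */
int job_sequencing(Job jobs[], int n, int max_deadline)
{
    int slot[max_deadline + 1];                /* slot[t] = 1 if time slot t is used */
    for (int t = 0; t <= max_deadline; t++) slot[t] = 0;

    qsort(jobs, n, sizeof(Job), by_profit_desc);
    int total = 0;
    for (int i = 0; i < n; i++) {
        for (int t = jobs[i].deadline; t >= 1; t--) {
            if (!slot[t]) {                    /* latest free slot before the deadline */
                slot[t] = 1;
                total += jobs[i].profit;
                break;
            }
        }
    }
    return total;
}

int main(void)
{
    Job jobs[] = {{100, 2}, {10, 1}, {27, 2}, {15, 1}};   /* T1..T4 from the question */
    printf("%d\n", job_sequencing(jobs, 4, 2));
    return 0;
}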
Q If we have four tasks T1, T2, T3, T4, having deadlines D1 = 2, D2 = 1, D3 = 2, D4 = 1,
and profits P1 = 100, P2 = 10, P3 = 27, P4 = 15, find the maximum profit possible?
Task        T1     T2     T3     T4
Profit      100    10     27     15
Deadline    2      1      2      1

Q Task      T1     T2     T3     T4     T5     T6     T7
Profit      35     30     25     20     15     12     5
Deadline    3      4      4      2      3      1      2
Huffman coding
• In computer science and information theory, a Huffman code is a particular type of
optimal prefix code that is commonly used for lossless data compression.
• The process of finding or using such a code proceeds by means of Huffman coding, an
algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published
in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
• In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term
paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding
the most efficient binary code.
• Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for
the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this
method the most efficient.
• In doing so, Huffman outdid Fano, who had worked with Claude Shannon to develop a similar code.
Building the tree from the bottom up guaranteed optimality, unlike the top-down approach of Shannon–
Fano coding.
• The output from Huffman's algorithm can be viewed as a variable-
length code table for encoding a source symbol (such as a character in
a file).
HUFFMAN (C)
{
    n ← |C|
    Q ← C
    for i ← 1 to n - 1
    {
        allocate a new node z
        left[z] ← x ← EXTRACT-MIN(Q)
        right[z] ← y ← EXTRACT-MIN(Q)
        f[z] ← f[x] + f[y]
        INSERT(Q, z)
    }
    return EXTRACT-MIN(Q)     (the root of the tree)
}
Q Consider the following character with frequency and generate Huffman tree, find
Huffman code for each character, find the number of bits required for a message of
100 characters? Character Frequency
M1 12
M2 4
M3 45
M4 17
M5 23
Ch-5
Minimum Spanning Trees
Prim’s and Kruskal’s
Algorithms
Spanning tree
• A tree T is said to be spanning tree of a connected graph G, if T is a
subgraph of G and T contains all vertices of G.
• A minimum spanning tree (MST) or minimum weight spanning tree is
a subset of the edges of a connected, edge-weighted undirected graph
that connects all the vertices together, without any cycles and with the
minimum possible total edge weight.
• Minimum spanning tree (MST) can be more than one
Kruskal Algorithm
• Joseph Bernard Kruskal, Jr. was an American mathematician,
statistician, computer scientist and psychometrician.
• Kruskal had two notable brothers
Martin David Kruskal (September 28, 1925 – December 26,
2006) was an American mathematician and physicist. He made
fundamental contributions in many areas of mathematics and
science, ranging from plasma physics to general relativity and
from nonlinear analysis to asymptotic analysis. His most celebrated
contribution was in the theory of solitons.
William Henry Kruskal (October 10, 1919 – April 21, 2005) was an
American mathematician and statistician. He is best known for having
formulated the Kruskal–Wallis one-way analysis of variance (together
with W. Allen Wallis), a widely used nonparametric statistical method.
• Kruskal's algorithm, the actual idea: keep going (adding edges) until the graph becomes connected.
Kruskal
Minimum_Spanning_Tree (G, w)
{
    A ← ɸ
    for each vertex v ϵ V(G)
    {
        do Make_Set(v)
    }
    Sort the edges of E into non-decreasing order by weight w
    for each edge (u, v) ϵ E, taken in non-decreasing order by weight
    {
        if (Find_Set(u) != Find_Set(v))
        {
            A ← A U {(u, v)}
            UNION(u, v)
        }
    }
    return A
}
Q Consider the following graph: Which one of the following is NOT the sequence of edges added
to the minimum spanning tree using Kruskal’s algorithm?
(A) (b, e),(e, f),(a, c),(b, c),(f, g),(c, d)
Prim's Algorithm
MST-PRIM (G, w, r)
{
    for each u ϵ V[G]
    {
        key[u] ← ∞
        ∏[u] ← NIL
    }
    key[r] ← 0
    Q ← V[G]
    while (Q != ɸ)
    {
        u ← Extract-Min(Q)
        for each v ϵ adj[u]
        {
            if (v ϵ Q and w(u, v) < key[v])
            {
                ∏[v] ← u
                key[v] ← w(u, v)
            }
        }
    }
}
Q For the undirected, weighted graph given below, which of the following sequences of
edges represents a correct execution of Prim’s algorithm to construct a Minimum Span-
ning Tree?
(A) (a, b), (d, f), (f, c), (g, i), (d, a), (g, h), (c, e), (f, h)
(B) (c, e), (c, f), (f, d), (d, a), (a, b), (g, h), (h, f), (g, i)
(C) (d, f), (f, c), (d, a), (a, b), (c, e), (f, h), (g, h), (g, i)
(D) (h, g),(g,i),(h,f),(f, c),(f,d),(d,a),(a,b),(c,e)
Ch-6
Single Source Shortest Paths
Dijkstra’s and Bellman
Ford Algorithm
Single Source Shortest Path
• In graph theory, the shortest path problem is the problem of finding a path between
two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is
minimized.
• Edsger Wybe Dijkstra (11 May 1930 – 6 August 2002) was a
Dutch computer scientist, programmer, software engineer, systems
scientist, and science essayist. He received the 1972 Turing Award for
fundamental contributions to developing programming languages.
• One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on
the café terrace to drink a cup of coffee and I was just thinking about whether I could do this,
and I then designed the algorithm for the shortest path.
• As I said, it was a twenty-minute invention. In fact, it was published in ’59, three years later.
The publication is still readable, it is, in fact, quite nice.
• One of the reasons that it is so nice was that I designed it without pencil and paper. I learned
later that one of the advantages of designing without pencil and paper is that you are almost
forced to avoid all avoidable complexities.
• A widely used application of shortest path algorithm is network routing protocols, most
notably IS-IS (Intermediate System to Intermediate System) and Open Shortest Path First
(OSPF).
Dijkstra algorithm (G, W, S)
{
    Initialize-Single-Source (G, S)
    S ← ɸ
    Q ← V[G]
    while (Q != ɸ)
    {
        u ← Extract-Min(Q)
        S ← S U {u}
        for each vertex v ϵ adj(u)
        {
            Relax(u, v, w)
        }
    }
}
Initialize_Single_Source (G, S)
{
    for each vertex v ϵ V[G]
    {
        d[v] ← ∞
        ∏[v] ← NIL
    }
    d[S] ← 0
}
Relax (u, v, w)
{
    if (d[v] > d[u] + w(u, v))
    {
        d[v] ← d[u] + w(u, v)
        ∏[v] ← u
    }
}
Dynamic Programming
• Divide and conquer partition the problem into independent
subproblem, solve the subproblems recursively and then combine
their solutions to solve the original problems.
(Example: the table of Fibonacci numbers F(n) for n = 0 to 5.)
• Dynamic programming is like the divide and conquer method, solve
problems by combining the solutions to the subproblems.
• In contrast, dynamic programming is applicable when the subproblems
are not independent, i.e. when subproblems share subsubproblems.
• A dynamic-programming algorithm solves every subsubproblem just once and
then saves its answer in a table, thereby avoiding the work of recomputing the
answer every time the subproblem is encountered.
• With such a table (memoization), computing F(n) takes only O(n) time.
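As a concrete illustration of saving sub-problem answers in a table, a memoized (top-down) Fibonacci in C; MAXN and the use of 0 as the "not computed yet" marker are just illustrative choices:

#include <stdio.h>

#define MAXN 64
static long long memo[MAXN];     /* memo[i] holds F(i) once computed, 0 = unknown */

/* Top-down DP: each F(i) is computed once and reused, so the cost is O(n). */
long long fib(int n)
{
    if (n <= 1) return n;
    if (memo[n] != 0) return memo[n];          /* answer already in the table */
    return memo[n] = fib(n - 1) + fib(n - 2);
}

int main(void)
{
    for (int n = 0; n <= 5; n++)
        printf("F(%d) = %lld\n", n, fib(n));
    return 0;
}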
• Dynamic programming is typically applied to optimization problems. In such case
there can be many possible solutions. Each solution has a value, and we wish to
find a solution with the optimal value (minimum, maximum).
• There are four steps of dynamic programming:
• Characterize the structure of an optimal solution.
• Recursively define the value of an optimal solution.
• Compute the value of an optimal solution, typically in a bottom-up fashion.
• Construct an optimal solution from the computed information.
• Profit = ∑ (i = 1 to n) Pi · xi, where for the 0/1 knapsack each xi ϵ {0, 1} (take the whole object or leave it).
Object O1 O2 O3 O4
Profit 1 2 5 6
Weight 2 3 4 5
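A bottom-up C sketch of the 0/1 knapsack DP for the table above; the capacity 8 used in main() is an assumed value for illustration, since the capacity is not stated here:

#include <stdio.h>

/* Bottom-up 0/1 knapsack: K[i][w] = best profit using the first i objects
 * with remaining capacity w. */
int knapsack01(const int profit[], const int weight[], int n, int W)
{
    int K[n + 1][W + 1];
    for (int i = 0; i <= n; i++) {
        for (int w = 0; w <= W; w++) {
            if (i == 0 || w == 0)
                K[i][w] = 0;
            else if (weight[i - 1] <= w) {
                int take = profit[i - 1] + K[i - 1][w - weight[i - 1]];
                int skip = K[i - 1][w];
                K[i][w] = take > skip ? take : skip;
            } else
                K[i][w] = K[i - 1][w];       /* object i does not fit */
        }
    }
    return K[n][W];
}

int main(void)
{
    int profit[] = {1, 2, 5, 6}, weight[] = {2, 3, 4, 5};   /* O1..O4 above */
    printf("%d\n", knapsack01(profit, weight, 4, 8));       /* capacity 8 assumed */
    return 0;
}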
• The Floyd-Warshall algorithm is used to find the shortest paths between all
pairs of vertices in a weighted graph.
• It uses a dynamic programming approach to incrementally improve the solution
by considering all possible paths.
Floyd-Warshall worked example
(Distance matrices D0, Da, Db, Dc and predecessor matrices ∏0, ∏a, ∏b, ∏c over the vertices a, b, c, updated as each vertex in turn is allowed as an intermediate vertex.)
• Algorithm Steps:
• Initially, the distance between all pairs of vertices is assumed to be infinity,
except for the distance from a vertex to itself, which is zero.
• The algorithm then iteratively updates the distance matrix to include the
shortest path using at most k vertices, where k goes from 1 to the number of
vertices in the graph.
• If a shorter path is found, the distance matrix is updated accordingly.
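The update rule above is just three nested loops. A C sketch over a small illustrative 3-vertex graph (the weights are made up for the example):

#include <stdio.h>

#define V 3
#define INF 99999

/* Floyd-Warshall: after iteration k, dist[i][j] is the shortest i->j path
 * using only vertices 0..k as intermediates. */
void floyd_warshall(int dist[V][V])
{
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}

int main(void)
{
    /* illustrative graph on vertices a, b, c; INF = no direct edge */
    int dist[V][V] = {
        {0,   4,  11},
        {6,   0,   2},
        {3, INF,   0}
    };
    floyd_warshall(dist);
    for (int i = 0; i < V; i++) {
        for (int j = 0; j < V; j++) printf("%d ", dist[i][j]);
        printf("\n");
    }
    return 0;
}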
Q Consider a set of non-negative integers S = {2, 3, 7, 8, 10}; find if there is a subset of S with sum equal to 14.
(The DP table has one column for each target sum 0 to 14 and one row per element.)
• The subset-sum problem is defined as follows. Given a set of n positive integers,
S = {a1, a2 ,a3 ,…,an} and positive integer W, is there a subset of S whose elements
sum to W
• A dynamic program for solving this problem uses a 2-dimensional Boolean array
X, with n rows and W+1 column. X[i, j],1 <= i <= n, 0 <= j <= W, is TRUE if and only
if there is a subset of {a1 ,a2 ,...,ai} whose elements sum to j.
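A direct C sketch of this Boolean table X[i][j], run on the instance S = {2, 3, 7, 8, 10}, W = 14 from the question above:

#include <stdio.h>
#include <stdbool.h>

/* X[i][j] is true iff some subset of the first i elements sums to j. */
bool subset_sum(const int a[], int n, int W)
{
    bool X[n + 1][W + 1];
    for (int i = 0; i <= n; i++) {
        for (int j = 0; j <= W; j++) {
            if (j == 0)            X[i][j] = true;               /* empty subset */
            else if (i == 0)       X[i][j] = false;
            else if (a[i - 1] > j) X[i][j] = X[i - 1][j];         /* cannot use a[i-1] */
            else                   X[i][j] = X[i - 1][j] || X[i - 1][j - a[i - 1]];
        }
    }
    return X[n][W];
}

int main(void)
{
    int S[] = {2, 3, 7, 8, 10};
    printf("%s\n", subset_sum(S, 5, 14) ? "yes" : "no");   /* checks the instance above */
    return 0;
}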
Backtracking vs Branch and Bound
• Approach: Backtracking — systematic search for a solution by trying and eliminating possibilities. Branch and Bound — systematic search for a solution using optimization and partitioning.
• Goal: Backtracking — to find a feasible solution that satisfies all constraints. Branch and Bound — to find the optimal solution with respect to a given objective.
• Technique: Backtracking — removes candidates that fail to satisfy the constraints of the problem. Branch and Bound — uses bounds to estimate the optimality of partial solutions.
• State space tree: Backtracking — pruning is used to chop off branches that cannot possibly lead to a solution. Branch and Bound — branches are pruned based on bounds of the cost function (lower and upper bounds).
• Find the Node: Locate the node you wish to delete using standard binary search tree deletion steps. If the node has two children, find the in-
order successor (or predecessor) and swap its value with the node to delete.
• Remove the Node: Delete the node from the tree. There are three possible scenarios:
• Deleting a node with no children: simply remove the node.
• Deleting a node with one child: replace the node with its child.
• Deleting a node with two children: replace the node with its in-order successor (or predecessor), which will have at most one child, and
then delete the successor.
• Fixing Double Black Issue:
• If you removed a red node, the properties still hold. If a black node was removed, this creates a "double black" issue, where one path has
one fewer black node.
• The node that replaces the deleted node (if any) is marked as "double black," which means it either has an extra black than it should, or
it's black and has taken the place of a removed black node.
• Rebalance the Tree: To resolve the "double black" issue, several cases need to be handled until the extra blackness is moved up to the root
(which can then be discarded) or until re-balancing is done:
• Case 1: Sibling is red - Perform rotations to change the structure such that the sibling becomes black, and then proceed to other cases.
• Case 2: Sibling is black with two black children - Repaint the sibling red and move the double black up to the parent.
• Case 3: Sibling is black with at least one red child - Perform rotations and recoloring so that the red child of the sibling becomes the
sibling's parent. This corrects the double black without introducing new problems.
• Termination: The process terminates when:
• The double black node becomes the root (simply remove the extra blackness).
• The double black node is red (simply repaint it to black).
• The tree has been adjusted to redistribute the black heights and remove the double blackness.
• Update Root: After rotations and recoloring, the root might change, so ensure the root of the tree is still black.
Q Insert the nodes 15, 13, 12, 16, 19, 23, 5, 8 in empty red-black tree and delete in
the reverse order of insertion?
Q Insert the following element in an initially empty RB-Tree 12, 9, 81, 76, 23, 43, 65, 88, 76, 32,
54. Now delete 23 and 81 ?
Q Insert the elements 8, 20, 11, 14, 9, 4, 12 in a Red-Black tree and delete 12, 4, 9, 14
respectively ?
Q Consider the following RB Tree and delete 55, 30, 90, 80, 50, 35, 15 respectively ?
Binomial Tree
• Development and Context: Binomial trees as a data structure were developed as part of the broader
exploration of efficient data structures for priority queue operations. They were introduced as a key
component of binomial heaps.
• Key Contributors: The binary heap was introduced by J. W. J. Williams in 1964. The binomial
heap, which uses binomial trees, was introduced and popularized by Jean Vuillemin in 1978.
• Evolution of Heaps: Prior to binomial heaps, binary heaps were commonly used for priority queues.
Binomial trees and heaps were an evolution in this area, offering more efficient solutions for certain
operations.
Binomial Tree
• A binomial tree is a specific kind of tree used in certain data structures such as
binomial heaps.
• Recursive Definition: A binomial tree of order 0 is a single node. A binomial
tree of order k consists of two binomial trees of order k−1 linked together:
one is the leftmost child of the root of the other.
Binomial Tree Properties
• A binomial tree of order k has exactly 2^k nodes.
• The height of a binomial tree Bk is k; that is, the tree has k+1 levels, indexed from 0 to k.
• The number of nodes at depth i in a binomial tree of order k is given by the binomial coefficient C(k, i), for i = 0, 1, …, k.
• The root of a binomial tree Bk has degree k, which is greater than that of any other node.
• The root node of a binomial tree of order k has k children; the children are themselves
roots of binomial trees of orders k−1, k−2, …, 0, respectively.
Binomial Heap
• A binomial heap is implemented as a collection of binomial tree. It keeps data sorted and
allows insertion and deletion in amortized time.
• Properties:
• Unlike a binary heap, which is a single tree, a binomial heap is a collection of trees. which
are ordered and comply with the min-heap (or max-heap) property.
• Each binomial tree in a binomial heap has a different order
• A heap with n elements will have at most ⌊log n⌋ + 1 binomial trees, because the
binary representation of n has ⌊log n⌋ + 1 bits.
Q Construct the binomial heap for the following sequence of number 4, 6, 3, 11,
9, 5, 14, 10, 21, 7, 13, 20, 2 ?
Insertion in Binomial Heap Algorithm
• Create New Tree: Form a new single-node binomial tree with the insertion
value.
• Union Heaps: Merge this new tree with the existing binomial heap.
• Combine trees of the same degree as necessary.
• Update Heap: Set the merged heap as the main heap.
Extract Min in Binomial Heap Algorithm
• Find Minimum: Traverse the root list to find the binomial tree with the minimum root
key. Let's call this tree B_min.
• Remove B_min: Remove B_min from the root list of the heap.
• Reverse Children of B_min: Reverse the order of the children of B_min, which are
now roots of binomial trees, and create a new binomial heap H' with these trees.
• Union Heaps: Perform a union operation between the original heap (without B_min)
and H'.
• Update Pointers: Update the root list and the minimum element pointer of the heap.
Decrease Key in Binomial Heap Algorithm
• Decrease the key of the node, then repeatedly compare it with its parent and swap if the min-heap property is violated (bubble it up); this takes O(log n) time.
Fibonacci Heap
• Structure: It consists of a set of marked trees. Each tree is a rooted but unordered tree that obeys the min-heap
property.
• Lazy Organization: Trees within a Fibonacci Heap are not necessarily binomial trees, and the heap does not
enforce strict structure. Trees are linked together only as necessary, which usually occurs during extract-min
operations.
• Node Marking: Nodes in a Fibonacci Heap can be marked, which indicates that a child has been lost since this
node was added to its current parent. This is part of the mechanism to limit the degree (number of children) of
nodes.
• Mechanism: When a child node is removed from a parent (which can happen during certain heap operations),
the parent node gets marked. If a marked node loses another child, it's removed from its parent, potentially
causing a cascade of changes up the heap.
• Consolidation: After the removal of the minimum node, the remaining trees are consolidated into a more
structured heap. This operation is "lazy" and is done only when necessary, such as during extract-min operations,
to keep the number of trees small.
• Degree and Number of Children: The size of a tree rooted at a node is at least F(k+2), where k is the degree of the
node and F is the Fibonacci sequence. This ensures that the heap has a low maximum degree.
• Amortized Costs:
• Insert: O(1) amortized
• Decrease Key: O(1) amortized
• Merge/Union: O(1) amortized
• Extract Min: O(logn) amortized
• Delete: O(logn) amortized
• The Fibonacci Heap is ideal for applications where the number of insertions and decrease-key
operations vastly outnumber the number of extract-min and delete operations, thus taking
advantage of the low amortized costs for the former operations. Its complexity comes into play
in theory more than in practical applications due to constants and lower-order terms involved
in its operations.
• Insertion in a Fibonacci heap is a simple and efficient operation. Here's how it is typically
implemented:
• Create a New Node: A new node containing the key to be inserted is created.
• Add to Root List: This new node is added to the list of roots in the Fibonacci heap.
• Update Minimum: If the new key is smaller than the current minimum key in the heap,
the minimum pointer is updated to this new node.
• Amortized Cost: The operation runs in O(1) amortized time, making it very efficient.
• The simplicity of the insertion operation is due to the lazy approach of Fibonacci heaps, where
no immediate reorganization or consolidation of the heap is performed at the time of
insertion.
• Decrease key in a Fibonacci heap is an operation to reduce the value of a given node and then adjust the heap to
maintain the heap property. Here's how it works:
• Decrease the Key: Update the value of the node to the new lower key.
• Check Heap Property: If the new key is still greater than or equal to the parent's key, the heap property is
maintained, and no further action is required.
• Cut and Add to Root List: If the new key violates the heap property (it's now less than the parent's key), cut
this node from its parent and add it to the root list.
• Mark the Parent: If the parent node was unmarked, mark it. If it was already marked, cut it as well and add
it to the root list, then continue this process recursively up the tree.
• Update Minimum: If the new key is less than the current heap's minimum key, update the minimum pointer.
• Merging in a Fibonacci heap, also known as the union operation, is the process of combining two Fibonacci
heaps into a single heap. This operation is efficient and is performed in the following way:
• Concatenate Root Lists: Combine the root lists of the two heaps into a single root list, which can be done in
O(1) time since the root lists are typically circular doubly linked lists.
• Update Minimum: Check the minimum nodes of both heaps and update the pointer to the minimum node if
necessary.
• No Immediate Consolidation: Unlike other heap structures, the trees are not consolidated right away. This is
part of the lazy strategy of Fibonacci heaps, which delays work until it's needed (e.g., during an extract-min
operation).
• Amortized Cost: The amortized cost of the merge operation is O(1).
• The merge operation in Fibonacci heaps is a foundational operation that supports the efficiency of other heap
operations, like insertions and decrease-key, because it enables the heap to maintain a more flexible structure.
• The "Extract Min" operation in a Fibonacci heap is used to remove and return the smallest key in the heap.
It's a key operation that involves more restructuring of the heap compared to other operations. Here's a
step-by-step description:
• Remove the Minimum Node: Take out the node containing the minimum key from the root list. This node
will be returned at the end of the operation.
• Add Children to Root List: The children of the minimum node are added to the root list of the heap.
• Consolidate the Heap: This step restructures the heap to ensure that no two trees have the same degree in
the root list:
• Pairwise combine trees in the root list that have the same degree until every tree has a unique degree.
• During this process, link trees by making the one with the larger root a child of the one with the smaller
root.
• Find New Minimum: Traverse the root list to find the new minimum node and update the heap's minimum
pointer.
• Time Complexity: The worst-case time complexity for this operation is O(logn), but this is the amortized
complexity because the actual work is done during the consolidation step, which doesn't happen on every
operation.
• Amortized Cost: The amortized cost is O(logn) due to the potential increase from the previous operations
which pays for the consolidation.
• The delete operation in a Fibonacci heap removes a specific node from the heap. It's more complex
than the extract-min operation because it requires locating the node to be deleted, regardless of its
position in the heap. Here's how the delete operation works:
• Decrease Key to Minus Infinity: Decrease the key of the node to be deleted to the smallest
possible value. This is often done by setting it to negative infinity or a value lower than any other in
the heap.
• Extract Min: Perform the extract-min operation, which will now remove the node with the
decreased key since it's the smallest in the heap.
• Consolidate the Heap: If necessary, during the extract-min operation, the heap is consolidated to
ensure that it maintains the correct structure, merging trees where necessary.
• Time Complexity: The time complexity for the delete operation is O(logn) amortized, due to the
extract-min operation that is invoked.
• By combining the decrease-key and extract-min operations, the delete operation ensures that the heap
remains properly structured and that the min-heap property is maintained throughout.
Procedure Binary Heap Binomial Heap Fibonacci Heap
Make Heap O(1) O(1) O(1)
Insert O(logn) O(logn) O(1)
Min O(1) O(logn) O(1)
Extract min O(logn) O(logn) O(logn)
• Pros:
• Very simple to understand and implement.
• Works well when the patterns are short and the text is not too large.
• Cons:
• Can be very slow, especially if the pattern occurs frequently but with mismatches.
String : a b c d e f g h
Pattern : d e f
String: a b c d a b c a b c d f
Pattern: a b c d f
NAIVE-STRING-MATCHER (T, P)
{
n = T.length
m = P.length
for s = 0 to n- m
if P[1..m] == T[s + 1 ..s + m]
print "Pattern occurs with shift" s
}
Rabin-Karp Algorithm
• In computer science, the Rabin–Karp algorithm or Karp–Rabin
algorithm is a string-searching algorithm created by Richard M.
Karp and Michael O. Rabin (1987) that uses hashing to find an exact
match of a pattern string in a text.
Rabin-Karp Algorithm
• The Rabin-Karp algorithm uses hashing to find any set of pattern occurrences. Instead of
checking all characters of the pattern at every position (like the naive algorithm), it checks a
hash value.
• Think of it like looking for a specific page in a book by its unique code instead of by reading
every word. If the page number (hash) matches, then you check to make sure it's really the
page you're looking for.
• Pros:
• Faster than the naive approach on average.
• Very efficient for multiple pattern searches at once.
• Cons:
• Requires a good hash function to avoid frequent spurious hits.
• The worst-case time complexity can be as bad as the Naive algorithm if many hash
collisions occur.
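A C sketch of the rolling-hash search; the radix d and the prime modulus q are common illustrative choices, and the text/pattern pair in main() comes from the third example below:

#include <stdio.h>
#include <string.h>

/* Rabin-Karp: compare rolling hashes first, characters only on a hash match. */
void rabin_karp(const char *T, const char *P, int d, int q)
{
    int n = strlen(T), m = strlen(P);
    int h = 1;                          /* h = d^(m-1) mod q */
    for (int i = 0; i < m - 1; i++) h = (h * d) % q;

    int p = 0, t = 0;                   /* hash of pattern and of current window */
    for (int i = 0; i < m; i++) {
        p = (d * p + P[i]) % q;
        t = (d * t + T[i]) % q;
    }

    for (int s = 0; s <= n - m; s++) {
        if (p == t && memcmp(T + s, P, m) == 0)    /* verify to rule out spurious hits */
            printf("Pattern occurs with shift %d\n", s);
        if (s < n - m) {                           /* roll the hash to the next window */
            t = (d * (t - T[s] * h) + T[s + m]) % q;
            if (t < 0) t += q;
        }
    }
}

int main(void)
{
    rabin_karp("ccaccaaedba", "dba", 256, 101);    /* expected shift: 8 */
    return 0;
}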
(Worked examples, using character values a = 1, b = 2, c = 3, …, j = 10 for the hash.)
Text: a a a a a b              Pattern: a a b
Text: a b c d a b c e          Pattern: b c e
Text: c c a c c a a e d b a    Pattern: d b a
• Spurious hits: when the hash of a window matches but the characters do not, the extra verification costs O(m) per window, so the worst case is O(mn).
Knuth-Morris-Pratt (KMP) Algorithm
P2: a b c d a b c y
P3: a b c d a b e a b f
P4: a b c d e a b f a b c
String: a b a b c a b c a b a b a b d
Pattern: a b a b d
KMP-MATCHER (T, P)
{
    n = T.length
    m = P.length
    pie = COMPUTE-PREFIX-FUNCTION(P)
    q = 0
    for i = 1 to n
        while q > 0 and P[q + 1] != T[i]
            q = pie[q]
        if P[q + 1] == T[i]
            q = q + 1
        if q == m
            print "Pattern occurs with shift" i - m
            q = pie[q]
}
COMPUTE-PREFIX-FUNCTION (P)
{
    m = P.length
    let pie[1 . . . m] be a new array
    pie[1] = 0
    k = 0
    for q = 2 to m
        while k > 0 and P[k + 1] != P[q]
            k = pie[k]
        if P[k + 1] == P[q]
            k = k + 1
        pie[q] = k
    return pie
}
What computer science deals with?
• Don’t
• So we do not study how to design a computer
• We do not study how to run a computer
• Do
• We deal with problem solving, according to computer science a problem can
be divided as follows
PROBLEM
SOLVABLE UNSOLVABLE
DECIDABLE UNDECIDABLE
P TYPE NP TYPE
Konigsberg Bridge Problem
• SOLVABLE - A problem is said to be solvable if either we can solve it or if we can
prove that the problem cannot be solved
• NP type – a problem is said to be of NP type if there exists a polynomial-time algorithm to solve it
on a non-deterministic machine, or equivalently a polynomial-time algorithm to verify a given solution on a deterministic machine.
• In computational complexity theory, the Cook–Levin theorem, also known
as Cook's theorem, states that the Boolean satisfiability problem is NP-complete.
That is, it is in NP, and any problem in NP can be reduced in polynomial time by
a deterministic Turing machine to the Boolean satisfiability problem.
• An important consequence of this theorem is that if there exists a deterministic
polynomial time algorithm for solving Boolean satisfiability, then
every NP problem can be solved by a deterministic polynomial time algorithm.
The question of whether such an algorithm for Boolean satisfiability exists is thus
equivalent to the P versus NP problem, which is widely considered the most
important unsolved problem in theoretical computer science.
Stephen A. Cook 1968
Stephen A. Cook 2008
• During his PhD, Cook worked on complexity of functions, mainly on multiplication. In his seminal 1971 paper "The
Complexity of Theorem Proving Procedures", Cook formalized the notions of polynomial-time reduction (a.k.a. Cook
reduction) and NP-completeness, and proved the existence of an NP-complete problem by showing that
the Boolean satisfiability problem (usually known as SAT) is NP-complete.
• The P versus NP question asks whether every optimization problem whose answers can be efficiently verified for
correctness/optimality can be solved optimally with an efficient algorithm.
• Cook conjectures that there are optimization problems (with easily checkable solutions) which cannot be solved by
efficient algorithms, i.e., P is not equal to NP. This conjecture has generated a great deal of research
in computational complexity theory, which has considerably improved our understanding of the inherent difficulty of
computational problems and what can be computed efficiently. Yet, the conjecture remains open.
• In 1982, Cook received the Turing award for his contributions to complexity theory.
• Graph coloring
• 3-Sat
• Clique
• Hamiltonian circuit
• Knapsack problem
• Vertex cover
• Set cover
• Partition problem
• Independent set
• Travelling salesman problem
• Job scheduling
Approximation Algorithm
• An approximation algorithm is a type of algorithm used for solving optimization problems. The key
characteristics of an approximation algorithm are:
• Purpose: It is used when finding the exact solution is too time-consuming, complex, or when an
exact solution may not even be necessary.
• Performance: It quickly finds a solution that is close to the best possible answer, or "optimal
solution."
• Guarantee: It provides a provable guarantee on how close the output is to the optimal solution,
usually expressed as a factor of the optimal value.
• Complexity: Often, the problems tackled by approximation algorithms are NP-hard, meaning that
no known polynomial-time algorithms can solve these problems to optimality.
• Use Cases: They are commonly used in fields such as operations research, computer science, and
engineering for problems like scheduling, packing, and routing where exact solutions are less
feasible as the problem size grows.
• The concept of approximation is fundamental in scenarios where a perfectly precise answer is
either impossible or impractical to obtain, and thus a solution that is "good enough" is acceptable
given the constraints of time and resources.
Feature: Result Accuracy
• Deterministic algorithm — always accurate; results are either correct or the algorithm does not provide an answer.
• Approximation algorithm — results are approximately (probabilistically) accurate; there is a non-zero chance of error.
Feature: Execution Time
• Deterministic algorithm — variable; can take a long time to find a solution, but the solution is correct.
• Approximation algorithm — typically faster; provides approximate solutions in a reasonable time frame.