MADF Unit 4

The document discusses string-matching algorithms, focusing on the naive string matching method and the Boyer-Moore algorithm, explaining their mechanisms and performance in terms of best and worst-case scenarios. It also introduces the concept of a trie, a tree-based data structure used for efficient string storage and retrieval. The document highlights the properties and types of tries, emphasizing their utility in pattern matching and prefix matching tasks.

Internet Algorithms

Fig: The string-matching problem.


The goal is to find all occurrences of the pattern P = abaa in the text T = abcabaabcabac.
The pattern occurs only once in the text, at shift s = 3. The shift s = 3 is said to be a valid shift. Each
character of the pattern is connected by a vertical line to the matching character in the text, and all
matched characters are shown shaded.
• Valid shift: If pattern P occurs with shift s in T, then we call s a valid shift.
• Invalid shift: If pattern P does not occur with shift s in T, then s is an invalid shift.

• The string-matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.
Algorithms used for string matching
• Naïve String matching algorithm
• Boyer Moore Algorithm
• Knuth Morris Pratt Algorithm
• Rabin Karp Algorithm
Naïve String Matching Algorithm (Brute Force Method)
• Naive pattern searching is the simplest method among the pattern searching algorithms. It checks every character of the main string against the pattern.
• This algorithm is helpful for smaller texts. It does not need any pre-processing phase, finds the substring by scanning the text once, and does not occupy extra space to perform the operation.
• The naive approach tests all possible placements of pattern P[1…m] relative to text T[1…n].
• Try shifts s = 0, 1, …, n-m successively, and for each shift s compare T[s+1…s+m] to P[1…m]. It returns all the valid shifts found.
• That is, starting from the first letter of the text and the first letter of the pattern, check whether these two letters are equal. If they are, then check the second letters of the text and pattern.
• If they are not equal, then move the first letter of the pattern to the second letter of the text, and then check these two letters.
• Example: Suppose T = 1011101110 and P = 111. Find all the valid shifts.

Pattern found at position: 2
Pattern found at position: 6
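A minimal Python sketch of the naive matcher (0-indexed positions, matching the example above):

```python
def naive_match(text, pattern):
    """Return all valid shifts (0-indexed) of pattern in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # try every shift s = 0..n-m
        if text[s:s + m] == pattern:    # compare T[s..s+m-1] with P
            shifts.append(s)
    return shifts

print(naive_match("1011101110", "111"))   # -> [2, 6]
```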
What is the best case?
→ The best case occurs when the first character of the pattern is not present in the text at all.
txt[] = "BBACCAADDEE";
pat[] = "HBB";

The number of comparisons in the best case is O(n).

What is the worst case?

→ The worst case of naive pattern searching occurs in the following scenarios:

1) When all characters of the text and pattern are the same.
txt[] = "DDDD";
pat[] = "DD";   (2*(4-2+1) => 6 comparisons)

2) The worst case also occurs when only the last character is different.
txt[] = "VVVVVVVVVVVVK";
pat[] = "VVVK";

The number of comparisons in the worst case is O(m*(n-m+1)).


Boyer Moore Pattern Matching Algorithm

• Robert Boyer and J Strother Moore established it in 1977.


• The B-M string search algorithm is a particularly efficient algorithm.
• The B-M algorithm takes a 'backward' approach: the pattern string (P) is aligned with the start of the text string (T), and then it compares the characters of the pattern from right to left, beginning with the rightmost character.
• If a text character is compared that does not occur within the pattern, no match can be found at this position, so the pattern can be shifted entirely past the mismatching character.
• The main idea is to improve the running time of the brute-force algorithm by
adding two potentially time-saving heuristics:
1. Looking-Glass Heuristic:
When testing a possible placement of P against T, begin the
comparisons from the end of P and move backward to the
front of P.
2. Character-Jump Heuristic (also known as Bad Match table):
During the testing of a possible placement of P against T, a
mismatch of text character T [i] = c with the corresponding
pattern character P[j] is handled as follows.
a) If c is not contained anywhere in P, then shift P completely
past T [i] (for it cannot match any character in P).
b) Otherwise, shift P until an occurrence of character c in P
gets aligned with T [i].
Last Occurrence function
• Boyer-Moore’s algorithm preprocesses the pattern P and the
alphabet Σ to build the last-occurrence function last() mapping Σ to
integers
• To implement this heuristic, define a function last (c) that takes a
character c from the alphabet and specifies how far to shift the
pattern P if a character equal to c is found in the text that does not
match the pattern.
• The last-occurrence function can be represented by an array indexed by the numeric codes of the characters.
• The last-occurrence function can be computed in time O(m + s), where m is the size of P and s is the size of Σ.
Example: For the pattern P = abacab (indices 0-5):
• last(a) is the index of the last (rightmost) occurrence of 'a' in P, which is 4.
• last(c) is the index of the last occurrence of 'c' in P, which is 3.
• 'd' does not occur anywhere in the pattern, therefore last(d) = -1.
• last(b) is the index of the last occurrence of 'b' in P, which is 5.
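A small Python sketch of this preprocessing step (a dictionary stands in for the code-indexed array):

```python
def last_occurrence(pattern, alphabet):
    """Map each character of the alphabet to its rightmost index in the
    pattern, or -1 if it does not occur. Runs in O(m + s) time."""
    last = {c: -1 for c in alphabet}
    for i, c in enumerate(pattern):    # later occurrences overwrite earlier ones
        last[c] = i
    return last

print(last_occurrence("abacab", "abcd"))  # {'a': 4, 'b': 5, 'c': 3, 'd': -1}
```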


Illustration of the jump in the BM algorithm, where l denotes last(T[i]):

Case 1: if j < 1 + l, we shift the pattern by one unit.
Case 2: if 1 + l ≤ j, we shift the pattern by j - l units.

• Example
T = ABACAABADCABACABAABB
P = ABACAB (indices 0 1 2 3 4 5)
Step 1: Construct the Bad Match Table for P = ABACAB:

c        A   B   C   *
last(c)  4   5   3   -1

Step 2: Searching P in T

Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
T:     A B A C A A B A D C A  B  A  C  A  B  A  A  B  B

Step a) j = 5, i = 5. Check if P[5] = T[5]: A ≠ B (mismatch).
last(T[5]) => last(A) => 4; assign this to l, i.e. l = 4.
If (1 + l ≤ j) => (1 + 4 ≤ 5) is true, so shift the pattern by j - l = (5 - 4) = 1 unit (Case 2). Hence, now i = 6.


Step b) j = 5, i = 6: P[5] = T[6], B = B (match).
j = 4, i = 5: P[4] = T[5], A = A (match).
j = 3, i = 4: P[3] = T[4], A ≠ C (mismatch).
last(T[4]) => last(A) => 4; assign this to l, i.e. l = 4.
If (1 + l ≤ j) => (1 + 4 ≤ 3) is false.
If (j < 1 + l) => (3 < 1 + 4) is true, so shift the pattern by 1 unit (Case 1). Hence, now i = 7.
Step c) j = 5, i = 7: P[5] = T[7], B ≠ A (mismatch).
last(T[7]) => last(A) => 4; assign this to l, i.e. l = 4.
If (1 + l ≤ j) => (1 + 4 ≤ 5) is true, so shift the pattern by j - l = (5 - 4) = 1 unit (Case 2). Hence, now i = 8.


Step d) j = 5, i = 8: P[5] = T[8], B ≠ D (mismatch).
last(T[8]) => last(D) => -1; assign this to l, i.e. l = -1.
If (1 + l ≤ j) => (1 + (-1) ≤ 5) is true, so shift the pattern by j - l = (5 + 1) = 6 units (Case 2). Hence, now i = 14.


Step e) j = 5, i = 14: P[5] = T[14], B ≠ A (mismatch).
last(T[14]) => last(A) => 4; assign this to l, i.e. l = 4.
If (1 + l ≤ j) => (1 + 4 ≤ 5) is true, so shift the pattern by j - l = (5 - 4) = 1 unit (Case 2). Hence, now i = 15.


Step f) j = 5, i = 15: P[5] = T[15], B = B (match).
j = 4, i = 14: P[4] = T[14], A = A.
j = 3, i = 13: P[3] = T[13], C = C.
j = 2, i = 12: P[2] = T[12], A = A.
j = 1, i = 11: P[1] = T[11], B = B.
j = 0, i = 10: P[0] = T[10], A = A.

Since j = 0, return i = 10.

Hence the pattern is present at index i = 10.
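Putting the looking-glass and character-jump heuristics together, a compact Python sketch of the search (the published Boyer-Moore algorithm adds a good-suffix rule on top of this):

```python
def bm_match(text, pattern):
    """Boyer-Moore with the looking-glass and character-jump heuristics.
    Returns the index of the first occurrence of pattern, or -1."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1
    last = {c: k for k, c in enumerate(pattern)}   # last-occurrence function
    i = j = m - 1                       # compare right to left
    while i < n:
        if text[i] == pattern[j]:
            if j == 0:
                return i                # full match found
            i -= 1
            j -= 1
        else:
            l = last.get(text[i], -1)   # l = last(T[i]), -1 if absent
            i += m - min(j, 1 + l)      # case 1 (shift 1) or case 2 (shift j-l)
            j = m - 1
    return -1

print(bm_match("ABACAABADCABACABAABB", "ABACAB"))  # -> 10
```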


Best Case: Text length n = 12, pattern length m = 3.

Index: 0 1 2 3 4 5 6 7 8 9 10 11
T:     A B A C A A B A D C A  B

P = XYZ, with last(X) = 0, last(Y) = 1, last(Z) = 2, last(*) = -1.

Every comparison of Z against a text character mismatches with a character not in P (l = -1), so the pattern jumps j - l = 2 - (-1) = 3 = m positions each time.

Best time complexity = (12/3) = (n/m) comparisons, i.e. O(n/m).
Worst Case: Text length n = 5, pattern length m = 3.

Index: 0 1 2 3 4
T:     A A A A A

P = AAA, with last(A) = 2, last(*) = -1.

Every alignment matches all m characters before the pattern shifts by one unit, giving 3 + 3 + 3 comparisons here.

Worst time complexity = O(m*(n-m+1)) = O(mn).
What is a TRIE?
• Tree-based data structure used for Information ReTRIEval tasks
• Also called a Digital Tree, Prefix Tree, or Radix Tree
• A trie is used mostly for storing strings in a compact way, e.g. the words in a dictionary
• A trie supports fast pattern matching and prefix matching
• A trie is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.
• Unlike a binary search tree, no node in the tree stores the key
associated with that node; instead, its position in the tree defines the
key with which it is associated.
• All the descendants of a node have a common prefix of the string
associated with that node, and the root is associated with the empty
string.
• Values are normally not associated with every node, only with leaves
and some inner nodes that correspond to keys of interest.
Properties of a trie
• A multi-way tree.
• Each node has from 1 to n children.
• Each edge of the tree is labeled with a character.
• Each leaf node corresponds to a stored string, which is the concatenation of the characters on the path from the root to this node.
Different types of tries
• Standard trie
• Compressed trie
• Suffix trie
Standard trie
• Let S be a set of s strings from an alphabet L, such that no string in S is a prefix of another string.
• A standard trie for S is an ordered tree T with the following properties:
1. Each node of T, except the root, is labeled with a character of L.
2. The ordering of the children of an internal node of T is determined by a canonical ordering of the alphabet L.
3. T has s external nodes, each associated with a string of S, such that the concatenation of the labels of the nodes on the path from the root to an external node v of T yields the string of S associated with v.
Thus, a trie T represents the strings of S with paths from the root to the external nodes of T.
• Standard trie for the strings S = {ant, any, at, bet, bear}

[Fig: the root branches on 'a' and 'b'; the 'a' subtree holds the external nodes for "at", "ant" and "any", and the 'b' subtree holds the external nodes for "bet" and "bear".]
Performance of Standard Trie
• The worst case for the number of nodes of a trie occurs when no two
strings share a common non-empty prefix; that is, except for the
root, all internal nodes have one child.
• A trie T for a set S of strings can be used to implement a dictionary
whose keys are the strings of S.
• A search for a string X is performed in T by tracing down from the
root the path indicated by the characters in X.
• If this path can be traced and terminates at an external node, then X
is in the dictionary. If the path cannot be traced or the path can be
traced but terminates at an internal node, then X is not in the
dictionary.
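A minimal Python sketch of a standard trie with the insert and search operations just described (a terminal flag marks the external nodes):

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> child TrieNode
        self.is_end = False    # True if a string of S ends here (external node)

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):
        """Trace down from the root the path indicated by the characters."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False   # the path cannot be traced
            node = node.children[ch]
        return node.is_end     # the path must terminate at an external node

t = Trie()
for w in ["ant", "any", "at", "bet", "bear"]:
    t.insert(w)
print(t.search("ant"), t.search("an"))  # True False
```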
Important structural properties of a standard trie

• A standard trie storing a collection S of s strings of total length n from an alphabet of size d has the following properties:
1. Every internal node of T has at most d children.
2. T has s external nodes.
3. The height of T is equal to the length of the longest string in S.
4. The number of nodes of T is O(n).
Word Matching – an application of Tries

• A trie can be used to perform a special type of pattern matching, called word matching, where we want to determine whether a given pattern matches one of the words of the text exactly.
• Word matching differs from standard pattern matching since the pattern cannot match an arbitrary substring of the text, but only one of its words.
• Using a trie, word matching for a pattern of length m takes O(dm) time, where d is the size of the alphabet, independent of the size of the text.
Fig: A standard trie for the words in the text, with articles and prepositions (which are known as stop words) excluded.
COMPRESSED TRIES
• A compressed trie storing a collection S of s strings from an alphabet
of size d has the following properties:
1. Every internal node of T has at least two children and at most
d children
2. T has s external nodes
3. The number of nodes of T is O (s).

• Thus, nodes in a compressed trie are labeled with strings, which are
substrings of strings in the collection, rather than with individual
characters.

• The advantage of a compressed trie over a standard trie is that the number of nodes of the compressed trie is proportional to the number of strings and not to their total length.
Example

[Fig: a standard trie (left) and the corresponding compressed trie (right).]
COMPACT REPRESENTATION in memory OF COMPRESSED TRIES
• Problem with the standard trie: we cannot search for a substring of a particular string in a text, because the standard trie is based on prefix matching.
• To handle this situation, we can use compressed tries. Instead of storing the substrings themselves, we store ranges of indices in the nodes of the compressed trie.
• A compressed trie uses O(s) space, where s is the number of strings in the array, which is much more efficient than a standard trie.
• Let the collection S of strings be an array of strings S[0], S[1], …, S[s-1]. Instead of storing the label X of a node explicitly, it can be represented implicitly by a triplet of integers (i, j, k) such that X = S[i][j..k]; that is, X is the substring of S[i] consisting of the characters from the jth to the kth, inclusive.
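For instance, a minimal sketch of decoding such a triplet (the collection S below is a hypothetical illustration):

```python
# Hypothetical collection S; a node label (i, j, k) denotes S[i][j..k] inclusive.
S = ["minimize"]

def label(i, j, k):
    return S[i][j:k + 1]   # Python slices exclude the end index, hence k + 1

print(label(0, 1, 3))      # 'ini' = characters 1..3 of S[0]
```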
Performance
• This compression scheme reduces the total space for the trie itself from O(n) for the standard trie to O(s) for the compressed trie, where n is the total length of the strings in S and s is the number of strings in S.
Suffix tries
• One of the primary applications of tries is the case when the strings in the collection S are all the suffixes of a string X. Such a trie is called the suffix trie (also known as a suffix tree or position tree) of string X.
• For a suffix trie, the compact representation can be further simplified by constructing the trie so that the label of each vertex is a pair (i, j) indicating the substring X[i..j].
• To satisfy the rule that no suffix of X is a prefix of another suffix, a special character, denoted $, that is not in the original alphabet Σ is added at the end of X (and thus to every suffix).
• That is, if string X has length n, build a trie for the set of n strings X[i..n-1]$, for i = 0, …, n-1.
Example
• Obtain the suffix trie of X = MINIMIZE.
• Identify the suffixes of the word minimize:
S = {e$, ze$, ize$, mize$, imize$, nimize$, inimize$, minimize$}
The trie is built by inserting the suffixes from shortest to longest. Each node label is shown as a string together with its compact pair (i, j) into X$ = MINIMIZE$ (indices 0-8):

1) Empty trie.
2) Insert $: root children are $ (8,8)
3) Insert e$: $ (8,8), e$ (7,8)
4) Insert ze$: $ (8,8), e$ (7,8), ze$ (6,8)
5) Insert ize$: $ (8,8), e$ (7,8), ize$ (5,8), ze$ (6,8)
6) Insert mize$: $ (8,8), e$ (7,8), ize$ (5,8), mize$ (4,8), ze$ (6,8)
7) Insert imize$: ize$ and imize$ share the prefix i, so an internal node i (3,3) is created with children mize$ (4,8) and ze$ (6,8); the root children are now $ (8,8), e$ (7,8), i (3,3), mize$ (4,8), ze$ (6,8)
8) Insert nimize$: $ (8,8), e$ (7,8), i (3,3), mize$ (4,8), nimize$ (2,8), ze$ (6,8)
9) Insert inimize$: under the internal node i (now labeled (1,1)) the children become mize$ (4,8), nimize$ (2,8), ze$ (6,8)
10) Insert minimize$: mize$ and minimize$ share the prefix mi, so an internal node mi (0,1) is created with children nimize$ (2,8) and ze$ (6,8); the final root children are $ (8,8), e$ (7,8), i (1,1), mi (0,1), nimize$ (2,8), ze$ (6,8)
• Construct a standard trie for the word MINIMIZE, compress it, and derive the suffix trie (2nd method).

A) Standard trie: [Fig: the standard trie of the suffixes e, ze, ize, mize, imize, nimize, inimize, minimize, with one character per edge.]

B) Compressed trie: the root children are e, i, mi, nimize, ze; under i the children are mize, nimize, ze; under mi the children are nimize, ze.

C) Suffix trie (compact pairs into MINIMIZE, indices 0-7): the root children are e (7,7), i (1,1), mi (0,1), nimize (2,7), ze (6,7); under i: mize (4,7), nimize (2,7), ze (6,7); under mi: nimize (2,7), ze (6,7).
Text Compression
Compression
• Text compression involves changing the representation of a file
so that the compressed output takes less space to store, or less
time to transmit, but still the original file can be
reconstructed exactly from its compressed representation.
Huffman Coding for Text Compression
• Text compression algorithms aim at statistical reductions in the
volume of data.
• One commonly used compression algorithm is Huffman coding, which
makes use of information on the frequency of characters to assign
variable-length codes to characters.
• Standard encoding schemes, such as the ASCII and Unicode systems, use fixed-length binary strings to encode characters (with 7 bits in the ASCII system and 16 in the Unicode system).
Huffman Coding
• Huffman Coding is a famous Greedy Algorithm.
• It is used for the lossless compression of data.
• It uses variable length encoding.
• It assigns variable length code to all the characters.
• The code length of a character depends on how frequently it occurs in
the given text.
• The character which occurs most frequently gets the smallest code.
• The character which occurs least frequently gets the largest code.
• It is also known as Huffman Encoding.
Prefix Rule-

• Huffman Coding implements a rule known as a prefix rule.


• This is to prevent ambiguities while decoding.
• It ensures that the code assigned to any character is not a prefix of
the code assigned to any other character.
• Example
• {a=0, b=110, c=10, d=111} is a prefix code.
• 01101100 = 0 110 110 0= abba
Example (for three codes C1, C2, C3):
• Relative to C1, 010011 is uniquely decodable to bad.
• Relative to C2, 1100111 is uniquely decodable to bad.
• But relative to C3, 1101111 is not uniquely decipherable, since it could have encoded either bad or acad.
The Huffman Coding Algorithm
• The algorithm for Huffman encoding involves the following steps:
1. Constructing a frequency table sorted in descending order.
2. Building a binary tree, carrying out iterations until a single complete binary tree is obtained:
   a) Merge the last two items (which have the minimum frequencies) of the frequency table to form a new combined item with a frequency equal to the sum of the two.
   b) Insert the combined item and update the frequency table.
3. Deriving the Huffman tree: starting at the root, trace down to every leaf (mark '0' for a left branch and '1' for a right branch).
4. Generating the Huffman code: collect the 0s and 1s for each path from the root to a leaf and assign a 0-1 code word to each symbol.
Symbol:    A  C  E  H  I
Frequency: 3  5  8  2  7

Step 1: Arrange all the nodes in increasing order of their frequency value.

H:2  A:3  C:5  I:7  E:8

Step 2: Considering the first two nodes having minimum frequency:
• Create a new internal node.
• The frequency of this new node is the sum of the frequencies of those two nodes.
• Make the first node the left child and the other node the right child of the newly created node.

Merging H:2 and A:3 gives the internal node 5. Remaining items: 5 (H, A), C:5, I:7, E:8.

• Keep repeating Step 1 and Step 2 until all the nodes form a single tree.
• The tree finally obtained is the desired Huffman tree.

The merges proceed as follows:
• Merge H:2 and A:3 into node 5; items: 5, C:5, I:7, E:8
• Merge node 5 and C:5 into node 10; items: 10, I:7, E:8
• Merge I:7 and E:8 into node 15; items: 10, 15
• Merge node 10 and node 15 into the root, 25
Deriving the Huffman tree: starting at the root (25), trace down to every leaf, marking '0' for a left branch and '1' for a right branch.

Generating the Huffman code: collecting the 0s and 1s on each path from the root to a leaf and assigning a 0-1 code word to each symbol gives:

E = 11
I = 10
C = 01
A = 001
H = 000

Input: ACE
Output: (001)(01)(11)
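A Python sketch of the whole construction using a heap of (frequency, tree) pairs; ties in the heap may be broken differently than in the worked example, which yields a different but equally optimal code:

```python
import heapq

def huffman_codes(freq):
    """Greedy Huffman coding from a {symbol: frequency} map."""
    # Heap entries: (frequency, tie_breaker, tree); a tree is a symbol
    # (leaf) or a (left, right) tuple (internal node).
    heap = [(f, i, s) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # the two minimum frequencies
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):       # internal node: 0 = left, 1 = right
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
        else:
            codes[tree] = code
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"A": 3, "C": 5, "E": 8, "H": 2, "I": 7})
print("".join(codes[ch] for ch in "ACE"))   # a prefix-free encoding of ACE
```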
• Decoding
• Read the compressed file and the binary tree.
• Use the binary tree to decode the file.
• Follow the path from the root to a leaf; emit the leaf's symbol and restart at the root.

Input: 0010111
Trace: 0 -> 00 -> 001 (A); 0 -> 01 (C); 1 -> 11 (E)
Output: ACE
COMP 6.2 DAA UNIT – 4, Part 1

CLASS P AND CLASS NP


Polynomial Problem / Algorithm
• An algorithm is said to be polynomially bounded if its worst-case complexity is bounded by a polynomial function of the input size.
• A problem is said to be polynomially bounded if there is a polynomially bounded algorithm for it.

Class P
• P is the class of all decision problems that are polynomially bounded. The implication is that a decision problem X in P can be solved in polynomial time on a deterministic computation model (or can be solved by a deterministic algorithm).
• A deterministic machine, at each point in time, executes an instruction. Depending on the outcome of executing the instruction, it then executes some next instruction, which is unique.
• The class P consists of problems that are solvable in polynomial time. These problems are also called tractable. The advantage of considering the class of polynomial-time algorithms is that all reasonable deterministic single-processor models of computation can be simulated on each other.

Class NP (NP stands for Non-deterministically Polynomial)

• NP represents the class of problems which can be solved in polynomial time by a non-deterministic model of computation (or by a non-deterministic algorithm). That is, a problem X in NP can be solved in polynomial time on a non-deterministic computation model. A non-deterministic model can make the right guesses on every move and race towards the solution much faster than a deterministic model.
• A non-deterministic machine has a choice of next steps. It is free to choose any step that it wishes. For example, it can always choose a next step that leads to the best solution for the problem. A non-deterministic machine thus has the power of extremely good, optimal guessing.
• The class NP consists of those problems that are verifiable in polynomial time. NP is the class of decision problems for which it is easy to check the correctness of a claimed answer, with the aid of a little extra information. Hence, we aren't asking for a way to find a solution, but only to verify that an alleged solution really is correct. Every problem in this class can be solved in exponential time using exhaustive search.


OPTIMIZATION PROBLEM
Optimization problems are those for which the objective is to maximize or minimize some value. For example:
• Finding the minimum number of colors needed to color a given graph.
• Finding the shortest path between two vertices in a graph.

DECISION PROBLEM
There are many problems for which the answer is a Yes or a No. These types of problems are known as decision problems. For example:
• Whether a given graph can be colored with only 4 colors.
• Finding a Hamiltonian cycle in a graph is not a decision problem, whereas checking whether a graph is Hamiltonian or not is a decision problem.

CLASS - P AND CLASS - NP

• Every decision problem that is solvable by a deterministic polynomial-time algorithm is also solvable by a polynomial-time non-deterministic algorithm.
• All problems in P can be solved with polynomial-time algorithms, whereas all problems in NP - P are intractable.
• It is not known whether P = NP. However, many problems are known in NP with the property that if they belong to P, then it can be proved that P = NP.
• If P ≠ NP, there are problems in NP that are neither in P nor NP-Complete.
• A problem belongs to class P if it is easy to find a solution for the problem. A problem belongs to NP if it is easy to check a solution that may have been very tedious to find.

NON-DETERMINISTIC ALGORITHMS

A deterministic algorithm has the property that the result of every operation is uniquely defined, whereas a non-deterministic algorithm contains operations whose outcomes are not uniquely defined but are limited to a specified set of possibilities.


To specify such algorithms, some new functions are introduced as follows:

Choice (S) – arbitrarily chooses one of the elements of the set S. There is no rule specifying how this choice is to be made.
Failure () – signals an unsuccessful completion. The non-deterministic algorithm terminates unsuccessfully iff there exists no set of choices leading to a success signal.
Success () – signals a successful completion.

The computing time of all three functions is taken to be O(1), constant.

Sample Non-deterministic Algorithms

1) Non-deterministic Search: takes an array A[1..n] containing n elements and the element to be searched, Key. The ND function Choice() will return the position of Key in A[1..n]. Time complexity = O(1).

Algorithm ND_Search (A, n, Key)
    j = Choice(1..n)
    if A[j] == Key then Write(j); Success();
    else Write(0); Failure();
End

2) Non-deterministic Sort: the function Choice() will read the ith element from the array A[1..n] and return the proper position where it has to be placed in the sorted sequence; a second pass verifies the order.

Algorithm ND_Sort (A, n)
    for i = 1 to n do
        j = Choice(1..n);
        B[j] = A[i];
    for i = 1 to n-1 do      // verify the order
        if B[i] > B[i+1] then Failure();
    Write(B[1..n]);
    Success();
End

3) Non-deterministic Knapsack (decision problem): the Choice() function assigns a 0 or 1 value to the solution vector x[1..n] at every step. Time complexity = O(n), linear.

Algorithm ND_Knapsack (n, p, w, m)
    P = 0; W = 0;            // accumulated profit and weight
    for i = 1 to n do
        x[i] = Choice(0, 1);
        W = W + x[i]*w[i]; P = P + x[i]*p[i];
    if W > m then Failure();
    Success();
End

4) Non-deterministic Clique (decision problem): the non-deterministic function Choice() will determine the subset of k out of n vertices that forms a clique. Time complexity = O(n), linear (for the guessing phase).

Algorithm ND_Clique (G, k)
    S = ∅
    for i = 1 to k do
        t = Choice(1..n);
        if t ∈ S then Failure();
        S = S ∪ {t}
    for all pairs (i, j) such that i, j ∈ S and i ≠ j do
        if (i, j) ∉ E(G) then Failure();
    Write(S); Success();
End


NP – HARD & NP – COMPLETE

A problem / language B is NP-complete if it satisfies the following two conditions:
1. The problem B is in Class NP.
2. Every problem A in NP is reducible to B in polynomial time.
If a problem or a language satisfies the second property, but not necessarily the first one, then the problem / language B is known as NP-Hard.

• Informally, a search problem B is NP-Hard if there exists some NP-Complete problem A that reduces to B.
• The problems in NP-Hard cannot be solved in polynomial time unless P = NP. If a problem is proved to be NP-Complete, there is no need to waste time on trying to find an efficient algorithm for it. Instead, we can focus on designing approximation algorithms.

Polynomially equivalent problems:

• Two problems L1 and L2 are said to be polynomially equivalent if and only if L1 ∝ L2 and L2 ∝ L1.
• To show that a problem L2 is NP-hard, it is adequate to show L1 ∝ L2, where L1 is some problem already known to be NP-Hard.
• Since ∝ is a transitive relation, it follows that if L1 ∝ L2 and L2 ∝ L3, then L3 is also NP-Hard (when L1 is NP-Hard).
• To show that an NP-hard problem A is NP-Complete, just exhibit a polynomial-time non-deterministic algorithm for A.

REDUCTION (∝)
How do we prove that some problems are computationally difficult?
Consider the statement: "Problem X is at least as hard as problem Y."
To prove such a statement: reduce problem Y to problem X.
(i) If problem Y can be reduced to problem X, we denote this by Y ≤P X (also written Y ∝ X).
(ii) This means "Y is polynomial-time reducible to X."
(iii) It also means that X is at least as hard as Y, because if we can solve X, we can solve Y.
COMP 6.2 DAA UNIT – 4, Part 2

NP-HARD GRAPH PROBLEMS


1. CDP – CLIQUE DECISION PROBLEM

Problem Description: Given a graph G = (V, E) and a positive integer K, the Clique Decision
Problem is to determine whether the graph G contains a CLIQUE of size K or not.

To Show CDP is in Class – NP: [Verify the result in polynomial time]

For the given graph G = (V, E), use the subset V' ⊆ V of vertices in the clique as a certificate for G. Checking that V' is a clique, i.e. that for each pair u ∈ V', v ∈ V' the edge (u, v) ∈ E, requires O(n²) time.
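A minimal Python sketch of this polynomial-time verification (the edge set and certificate below are hypothetical illustrations):

```python
from itertools import combinations

def verify_clique(edges, certificate, k):
    """Check, with O(n^2) pair tests, that the certificate is a k-clique."""
    if len(set(certificate)) < k:
        return False
    return all((u, v) in edges or (v, u) in edges
               for u, v in combinations(certificate, 2))

E = {(1, 2), (1, 3), (2, 3), (3, 4)}
print(verify_clique(E, [1, 2, 3], 3))   # True: {1, 2, 3} is a 3-clique
```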

To Show CDP is NP-hard: [Reduce an instance of a known NP-hard problem into a CDP instance]

(i) Pick a known NP-hard problem: the CNF-Satisfiability problem.
(ii) Transform (reduce) the instance of CNF-SAT into an instance of the CDP problem in polynomial time.
(iii) If CNF-SAT ∝ CDP, then CDP is also NP-Hard.

Proof: Let F = ⋀(i=1..K) Ci be a propositional calculus formula in CNF having K clauses, and let xi, 1 ≤ i ≤ n, be the n boolean variables or literals used in F. Construct from F a graph G = (V, E) such that G has a clique of size at least K if and only if F is satisfiable.
For any F, the graph G = (V, E) is defined as follows:
The vertices: V = {<a, i> | a is a literal / variable in the clause Ci (i.e. a ∈ Ci)}
The edges: E = {(<a, i>, <b, j>) | a and b belong to different clauses (i.e. i ≠ j) and a ≠ b'}

Claim: F is satisfiable if and only if G has a clique of size at least K.

Proof of Claim: If F is satisfiable, then there is a set of truth values for all xi such that each clause is true with this assignment.
Let S = {<a, i> | a is true in Ci} be a set containing exactly one <a, i> for each i. Between any two nodes <a, i> and <b, j> in S there is an edge in G, since i ≠ j and both a and b have the value true (so a is not the complement of b). Thus S forms a clique of size K.
Example: Please refer Class-work note-book.

2. NCDP – NODE COVER DECISION PROBLEM

Problem Description: A set S ⊆ V is a node cover for a graph G = (V, E) if and only if all edges in E are incident to at least one vertex in the set S.

To Show NCDP is in Class – NP: [Verify the vertex cover in polynomial time]

• For the given graph G = (V, E), use the subset S ⊆ V of vertices in the vertex cover as a certificate for G. Checking that the vertices in S cover all the edges of E requires O(|E|) time, which is linear.
• Since the verification of a node cover is done in polynomial time, NCDP is in Class NP.

To Show NCDP is NP-hard: [Reduce an instance of a known NP-hard problem into an NCDP instance]

(i) Pick a known NP-hard problem: CDP, the Clique Decision Problem.
(ii) Transform (reduce) the instance of CDP into an instance of the NCDP problem in polynomial time.
(iii) If CDP ∝ NCDP, then NCDP is also NP-Hard.

Proof: Let G = (V, E) and K define an instance of CDP, and let |V| = n. Construct from G a new graph G' = (V, E') such that G' has a node cover of at most n - K vertices if and only if G has a clique of size K.
The graph G' = (V, E') is defined as follows:
The edges: E' = {(u, v) | u ∈ V, v ∈ V and (u, v) ∉ E}. The graph G' is the complement of G.
Claim: The graph G has a clique of size K if and only if G' has a vertex cover of at most n - K vertices.

Proof of Claim:
• Let K be any clique in G. Since there are no edges in E' connecting vertices of the clique, the remaining n - |K| vertices in G' must cover all edges in E'.
• Since G' can be obtained from G in polynomial time, CDP can be solved in polynomial time if we have a polynomial-time deterministic algorithm for NCDP.
• So, NCDP is also NP-Hard.

Example: Please refer Classwork note-book.



3. CNDP – Chromatic Number Decision Problem

Problem Description: A colouring of a graph G = (V, E) is a function f: V → {1, 2, …, k} defined for every i ∈ V such that, if an edge (u, v) is in E, then f(u) ≠ f(v). The Chromatic Number Decision Problem is to determine whether G has a colouring for a given k.

To Show CNDP is in Class – NP: [Verify the coloured graph in polynomial time]

• Given a graph G = (V, E) and a positive integer m, is it possible to assign one of the numbers (colours) 1, 2, ..., m to the vertices of G so that for no edge in E do the two vertices on that edge get the same colour?
• The verification of a coloured graph is done in polynomial time, and so CNDP is in Class NP.

To Show CNDP is NP-hard: [Reduce an instance of a known NP-hard problem into a CNDP instance]

(i) Pick a known NP-hard problem: the CNF-Satisfiability problem.
(ii) Transform (reduce) the instance of the CNF-SAT problem into an instance of CNDP in polynomial time.
(iii) If CNF-SAT ∝ CNDP, then CNDP is also NP-Hard.

Proof: Let F be a propositional calculus formula having at most three literals in each clause and having r clauses C1, C2, …, Cr. Let xi, 1 ≤ i ≤ n, be the n boolean variables or literals used in F. Construct in polynomial time a graph G that is n+1 colourable if and only if F is satisfiable.

For any F, the graph G = (V, E) is defined as:

V = {x1, x2, …, xn} ∪ {x'1, x'2, …, x'n} ∪ {y1, y2, …, yn} ∪ {C1, C2, …, Cr}, where y1, y2, …, yn are new variables.
E = {(xi, x'i) | 1 ≤ i ≤ n} ∪ {(yi, yj) | i ≠ j} ∪ {(yi, xj) | i ≠ j} ∪ {(yi, x'j) | i ≠ j} ∪ {(xi, Cj) | xi ∈ Cj} ∪ {(x'i, Cj) | x'i ∈ Cj}.

Claim: A graph G is n+1 colorable if and only if F is satisfiable.

Proof of Claim:
• First observe that all the yi form a complete sub-graph on n vertices. Since yi is connected to all the xj and x'j except xi and x'i, the colour i can be assigned only to xi and x'i. However, (xi, x'i) is in E, so a new colour n+1 is needed for one of these two vertices. The vertex assigned the new colour n+1 is called a false vertex.
• Each clause has at most three literals, and each Ci is adjacent to a pair of vertices xj, x'j for at least one j, so no Ci can be assigned the colour n+1. This implies that the only colours that can be assigned to Ci correspond to vertices xj or x'j that are in clause Ci and are true vertices. Hence G is n+1 colourable if and only if there is a true vertex corresponding to each Ci.
• So G is n+1 colourable iff F is satisfiable.
COMP 6.2 DAA UNIT – 4, Part 3

NP HARD SCHEDULING PROBLEMS

1. SCHEDULING IDENTICAL PROCESSORS

Problem Description:
• There are m identical processors (or machines), P1, P2, …, Pm.
• There are n different jobs J1, J2, …, Jn to be processed.
• Each job Ji requires some processing time ti.
• A schedule S is an assignment of jobs to processors, which specifies the time interval and the processor on which each job Ji is to be processed.
• A schedule can be either non-preemptive (the processing of a job is not terminated until the job is complete) or preemptive.
• Constraint: a job cannot be processed by more than one processor at any given time.
• The problem is obtaining a minimum finish time non-preemptive schedule.

The Mean Finish Time (MFT) of a schedule S is:

MFT(S) = (1/n) Σ(i=1..n) fi, where fi is the time at which the processing of job Ji is completed.

The Weighted Mean Finish Time (WMFT) of a schedule S is:

WMFT(S) = (1/n) Σ(i=1..n) wi·fi, where wi is the weight associated with each job Ji.

The overall Finish Time (FT) of a schedule S is:

FT(S) = max(1≤i≤m) Ti, where Ti is the time at which the processor Pi finishes processing all the jobs assigned to it.

To show that the minimum finish time non-preemptive schedule is NP – Hard:

(i) Choose a known NP-hard problem: the Partition problem.
(ii) Reduce: Partition ∝ MFT non-preemptive schedule instance.
Proof: Let ai, 1 ≤ i ≤ n, be an instance of the Partition problem. Define n jobs with processing time requirements ti = ai, 1 ≤ i ≤ n. There is a non-preemptive schedule for this set of jobs on two processors with finish time at most (1/2) Σ ai if and only if there is a partition of the ai's.
Claim: There is a minimum WMFT non-preemptive schedule with n jobs on two processors iff there is a partition of the ai's.

Proof of Claim: From the instance of the partition problem, construct a two-processor scheduling problem with n jobs and wi = ti = ai, 1 ≤ i ≤ n.

For this set of jobs there is a schedule S with weighted flow time Σ wi·fi at most (1/2) Σ ai² + (1/4) (Σ ai)² iff the ai's have a partition.
COMP 6.2 DAA UNIT – 4, Part 3

Let the weights and times of the jobs on processor P1 be (w1, t1), (w2, t2), …, (wk, tk) and on processor P2 be (w1', t1'), (w2', t2'), …, (wj', tj'), listed in the order in which the jobs are processed.

Then for the schedule S:

Σ wi·fi = {w1t1 + w2(t1+t2) + … + wk(t1+t2+…+tk)} + {w1't1' + w2'(t1'+t2') + … + wj'(t1'+t2'+…+tj')}
        = (1/2) Σ ai² + (1/4) (Σ ai)² + (1/4) (ΣP1 ai - ΣP2 ai)²,

where ΣP1 ai and ΣP2 ai are the total processing times assigned to P1 and P2. The minimum value (1/2) Σ ai² + (1/4) (Σ ai)² is obtainable if and only if the wi's, and so also the ai's, have a partition (the last term vanishes exactly when the two loads are equal).
Thus the minimum WMFT non-preemptive schedule problem is NP – Hard.

2. FLOW SHOP SCHEDULING

Problem Description:
• There are m processors P1, P2, P3, ….., Pm and there are n jobs J1, J2, J3, ….., Jn.
• Each job Ji, 1 ≤ i ≤ n, requires processing on every processor Pj, 1 ≤ j ≤ m, in sequence.
• Each job requires m tasks T1,i, T2,i, …, Tm,i for 1 ≤ i ≤ n to be performed, and task Tj,i must be assigned to processor Pj.
• A schedule for the n jobs is an assignment of tasks to time intervals on the processors.
• For any job Jk, the processing of task Tj,k, j > 1, cannot be started until task Tj-1,k has been completed.
• The problem of flow shop sequencing is to assign jobs to the processors in such a manner that every processor is engaged all the time without being left idle.
• Obtaining a minimum finish time preemptive schedule is NP-hard.
To show that the minimum finish time preemptive FS schedule is NP – Hard:
(i) Choose a known NP-hard problem: the Partition problem.
(ii) Reduce: Partition ∝ minimum FT preemptive FS schedule instance.
Proof: Let ai, 1 ≤ i ≤ n, be an instance of the Partition problem.
Let m = 3 and construct the following preemptive FS instance with n+2 jobs, having at most two non-zero tasks per job:
t1,i = ai,  t2,i = 0,  t3,i = ai,  for 1 ≤ i ≤ n
t1,n+1 = T/2,  t2,n+1 = T,  t3,n+1 = 0
t1,n+2 = 0,  t2,n+2 = T,  t3,n+2 = T/2,  where T = Σ ai
Claim: The constructed FS instance has a preemptive schedule with finish time at most 2T if and only if A has a partition.
Proof of claim: If the partition problem instance A has a partition u, then there is even a non-preemptive schedule with finish time 2T. If A has no partition, then all preemptive schedules for FS must have a finish time greater than 2T. This can be shown by contradiction.
Assume that there is a preemptive schedule for FS with finish time at most 2T:
(a) Task t1,n+1 must finish by time T, as t2,n+1 = T and it cannot start until t1,n+1 finishes.
(b) Task t3,n+2 cannot start before T units of time have elapsed, as t2,n+2 = T.

From the above observations, a finish-time-2T schedule must have the following shape:

P1: t1,n+1 (plus some tasks t1,i)  | idle time of P1
P2: t2,n+2                         | t2,n+1
P3: idle time of P3                | t3,n+2 (plus tasks t3,i)
    0                              T                              2T

Let V be the set of indices of tasks completed on processor P1 by time T, excluding task t1,n+1. Since A has no partition, Σ(i∈V) t1,i < T/2; hence Σ(i∉V) t3,i > T/2.

The processing of the jobs not included in V cannot commence on processor P3 until after time T, since their processor-P1 processing is not completed until after T. So the total amount of processing time left for processor P3 at time T is t3,n+2 + Σ(i∉V) t3,i > T/2 + T/2 = T.
The schedule length must therefore be more than 2T.

3. JOB SHOP SCHEDULING

Problem Description:
• There are m processors P1, P2, P3, ….., Pm and there are n jobs J1, J2, J3, ….., Jn.
• The time of the jth task of job Ji is denoted tk,i,j, and the task is to be processed by Pk.
• The tasks of any job are to be carried out in the order 1, 2, 3, and so on; task j cannot begin until task j-1 has been completed.
• Obtaining a minimum finish time preemptive or non-preemptive schedule is NP-hard even when m = 2.
To show that the minimum finish time preemptive JS schedule is NP – Hard:
• (i) Choose a known NP-hard problem: the Partition problem.
• (ii) Reduce: Partition ∝ minimum FT preemptive JS schedule instance.
Proof: Let ai, 1 ≤ i ≤ n, be an instance of the Partition problem. Construct the following JS instance with n+1 jobs and m = 2 processors:
Jobs 1…n: t1,i,1 = t2,i,2 = ai for 1 ≤ i ≤ n.
Job n+1: t1,n+1,1 = t2,n+1,2 = t2,n+1,3 = t1,n+1,4 = T/2, where T = Σ ai.
Claim: The job shop instance has a preemptive schedule with finish time at most 2T if and only if A has a partition.

Proof of claim:
• If A has a partition u, then there is a schedule with finish time 2T.
• If A has no partition, then all schedules for JS must have a finish time greater than 2T.
• Assume that there is a schedule S with finish time at most 2T; then there can be no idle time on either P1 or P2.
• Let R be the set of jobs scheduled on P1 in the interval [0, T/2]. Let R' be the subset of R representing the jobs whose first task is completed on P1 in this interval.
• Since A has no partition, Σ(j∈R') t1,j,1 < T/2, and consequently Σ(j∈R') t2,j,2 < T/2.
• Since only the second tasks of the jobs in R' can be scheduled on P2 in the interval [T/2, T], it follows that there is some idle time on P2 in this interval.
Hence S must have a finish time greater than 2T.
COMP 6.2 DAA UNIT – 4, Part 4

4. APPROXIMATION ALGORITHMS

• An algorithm that runs in polynomial time and yields a solution close to the optimal solution is called an approximation algorithm.
• We will explore polynomial-time approximation algorithms for several NP-hard problems.

Formal Definition:
• Let P be a minimization problem and I be an instance of P.
• Let A be an algorithm that finds a feasible solution to instances of P.
• Let A(I) be the cost of the solution returned by A for instance I, and OPT(I) the cost of the optimal solution for I. Then A is said to be an α-approximation algorithm for P if, for every instance I, A(I)/OPT(I) ≤ α, where α ≥ 1.
• For any minimization problem, A(I) ≥ OPT(I); therefore a 1-approximation algorithm produces an optimal solution.
• An approximation algorithm with a large α may return a solution that is much worse than optimal. So the smaller α is, the better the quality of the approximation the algorithm produces.

For instance size n, the most common approximation classes are:

a. α = O(n^c) for c < 1, e.g. Clique.
b. α = O(log n), e.g. Set Cover.
c. α = O(1), e.g. Vertex Cover.

1. VERTEX COVER PROBLEM


Problem Description:
• Given a graph G = (V, E), find a minimum subset C ⊆ V such that C covers all edges in E, i.e., every edge in E is incident to at least one vertex in C.
• The optimum vertex cover must cover every edge in E(G). So it must include at least one of the endpoints of each edge in E(G).
Example: Consider the following graph G = (V, E):
V = {a, b, c, d, e, f, g, h, i}
E = {(a,b), (a,c), (b,c), (c,e), (d,e), (e,f), (e,i), (f,g), (f,i), (g,h), (g,i), (h,i)}
An optimal vertex cover for the graph is C = {b, c, e, i, g}.
Algorithm APPROX – VERTEX – COVER (G)
    C = ∅; E' = E
    Mark the degree of every vertex in V
    while C does not cover all edges in E'
        pick a vertex u with the highest degree
        C = C ∪ {u}
        delete the vertex u, and all the edges incident on u, from E'
    return C
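A Python sketch of this greedy highest-degree heuristic (note it is the degree-based variant given above, not the classical 2-approximation that picks both endpoints of an edge):

```python
def approx_vertex_cover(vertices, edges):
    """Greedy: repeatedly take the vertex covering the most remaining edges."""
    remaining = set(edges)
    cover = set()
    while remaining:
        # pick a vertex u with the highest degree among the remaining edges
        u = max(vertices, key=lambda v: sum(v in e for e in remaining))
        cover.add(u)
        remaining = {e for e in remaining if u not in e}
    return cover

V = list("abcdefghi")
E = [("a","b"),("a","c"),("b","c"),("c","e"),("d","e"),("e","f"),
     ("e","i"),("f","g"),("f","i"),("g","h"),("g","i"),("h","i")]
print(approx_vertex_cover(V, E))   # prints a vertex cover, e.g. {'a','b','e','g','i'}
```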
COMP 6.2 DAA UNIT – 4, Part 4

2. TRAVELING SALESPERSON PROBLEM


Problem Description:
• A salesman wants to visit each of n cities exactly once, minimizing the total distance travelled and returning to the starting point.
• Input: A complete undirected graph G = (V, E) with edge weights (costs) w, and |V| = n.
• Output: A tour (a cycle that visits all n vertices exactly once and returns to the starting vertex) of minimum cost.
Approximation Algorithm for metric TSP
A metric space is a pair (S, d), where S is a set and d is a distance function that satisfies, for all u, v, w ∈ S, the following conditions:
1. d(u, v) = d(v, u)
2. d(u, v) + d(v, w) ≥ d(u, w) (triangle inequality: the least distance to reach a vertex w from a vertex u is always to go directly from u to w, rather than through some other vertex v).
The approximation algorithm works only if the problem instance satisfies the triangle inequality.
Algorithm APPROX – TSP - TOUR (G)
1. Select a starting vertex r ∈ V to be the root vertex.
2. Compute a weighted MST T for G from r.
3. Traverse the MST T in pre-order: v1, v2, . . . , vn.
4. Return the tour: v1 → v2 → · · · → vn → v1.
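A compact Python sketch of APPROX – TSP - TOUR, using Prim's algorithm for the MST and an iterative pre-order walk (the distance matrix is assumed to satisfy the triangle inequality):

```python
def approx_tsp_tour(d, r=0):
    """d: symmetric distance matrix of a metric TSP instance.
    Returns a tour starting and ending at the root vertex r."""
    n = len(d)
    # Prim's algorithm: grow an MST from r, recording each vertex's parent.
    best = {v: (d[r][v], r) for v in range(n) if v != r}
    children = {v: [] for v in range(n)}
    while best:
        v = min(best, key=lambda x: best[x][0])   # cheapest vertex to attach
        _, p = best.pop(v)
        children[p].append(v)
        for u in best:
            if d[v][u] < best[u][0]:
                best[u] = (d[v][u], v)
    # A pre-order traversal of the MST gives the tour order.
    tour, stack = [], [r]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour + [r]

d = [[0, 2, 3, 4],
     [2, 0, 4, 5],
     [3, 4, 0, 2],
     [4, 5, 2, 0]]
print(approx_tsp_tour(d))   # [0, 1, 2, 3, 0]
```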

3. SET COVER PROBLEM


• An instance (X, F) of the set-covering problem consists of a finite set X and a family F of subsets of X, such that every element of X belongs to at least one subset in F:
X = ⋃(S∈F) S
• We say that a subset S ∈ F covers the elements of X it contains. The goal is to find a minimum-size subfamily C ⊆ F whose members cover all of X:
X = ⋃(S∈C) S
• The cost of the set cover is the size of C, defined as the number of sets it contains, and we want |C| to be minimum.
Example 1: Consider an instance (X, F) of the set-covering problem where X consists of 12 elements and F = {T1, T2, T3, T4, T5, T6}. The minimum-size set cover is C = {T3, T4, T5}, and it has size 3.
COMP 6.2 DAA UNIT – 4, Part 4

Greedy Approximation Algorithm for Set Cover

• Idea: At each stage, the greedy algorithm picks the set S ∈ F that covers the greatest number of elements not yet covered.
The description of this algorithm is as follows:
1. First, start with an empty set C.
2. Let K contain, at each stage, the set of remaining uncovered elements.
3. While there exist remaining uncovered elements:
   a. choose the set S from F that covers as many uncovered elements as possible,
   b. put that set in C, and
   c. remove the covered elements from K.
4. When all elements are covered, C contains a subfamily of F that covers X; return C.
Algorithm GREEDY – SET COVER (X, F)
    K = X
    C = ∅
    while K ≠ ∅ do
        select a subset S ∈ F that maximizes |S ∩ K|
        K = K - S
        C = C ∪ {S}
    return C
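A direct Python transcription of GREEDY – SET COVER (the family F below is a hypothetical illustration):

```python
def greedy_set_cover(X, F):
    """At each stage, pick the set covering the most uncovered elements."""
    K = set(X)        # remaining uncovered elements
    C = []
    while K:
        S = max(F, key=lambda s: len(s & K))   # maximizes |S ∩ K|
        K -= S
        C.append(S)
    return C

X = set(range(1, 10))
F = [{1, 2, 3, 4}, {5, 6, 7, 8}, {2, 4, 6, 8}, {1, 3, 5, 7, 9}]
print(greedy_set_cover(X, F))   # [{1, 3, 5, 7, 9}, {2, 4, 6, 8}]
```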

Example 2: An instance (X, F) of the set-covering problem where X consists of 9 elements and F = {T1, T2, T3, T4}.

The greedy algorithm produces a set cover of size 3 by selecting the sets T1, T3 and T2, in that order.
Probabilistic Algorithms
Probabilistic Algorithm
• A probabilistic algorithm is an algorithm where the result and/or the
way the result is obtained depend on chance. These algorithms are
also sometimes called randomized algorithms.
• A random source is an idealized device that outputs a sequence of
bits that are uniformly and independently distributed. For example
the random source could be a device that tosses coins, observes the
outcome, and outputs it.
• Types of Probabilistic algorithms - There is a variety of behaviours
associated with probabilistic algorithms:
• Monte Carlo algorithms
• Las Vegas algorithms
• Numerical Approximation algorithms
Pseudo Random number generation
• Random Number - A random selection of a number from a set or range of
numbers is one in which each number in the range is equally likely to be
selected.
A) True random numbers can only be generated by observations of random
physical events, like dice throws or radioactive decay. Generation of random
numbers by observation of physical events can be slow and impractical.
B) Pseudo random numbers: sequences of numbers that approximate
randomness are generated using algorithms. These numbers are inherently
non random because they are generated by deterministic mathematical
processes. Hence, these numbers are known as pseudorandom numbers.
The algorithms used to generate them are called pseudorandom number
generators.
Linear Congruence Method
• The method uses the following formula:
Xn+1 = (a * Xn + b) mod c
given seed value X0 and integer values of a, b, and c
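A minimal Python sketch (the parameter values below are arbitrary illustrations, not recommended generator constants):

```python
def lcg(seed, a, b, c, count):
    """Linear congruential generator: X(n+1) = (a*X(n) + b) mod c."""
    x = seed
    out = []
    for _ in range(count):
        x = (a * x + b) % c
        out.append(x)
    return out

# Deterministic, hence pseudo-random: the same seed reproduces the sequence.
print(lcg(seed=7, a=5, b=3, c=16, count=8))
```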
Las Vegas Algorithm
• Las Vegas algorithms, on the other hand, also use randomness in their approach,
but will always return the correct output.

• Example :
• Randomized Quick Sort: an element i of the list is chosen at random. Then the elements of the list are compared with i to create a batch of numbers L that are less than i and a batch of numbers R that are greater than i. Then L and R are recursively sorted, and the three groups are placed back in order to obtain a sorting of the original set.

• At the end of the running of this algorithm, a correct output has been obtained:
the numbers will be in sorted order. But the number of comparisons used by the
algorithm depends on the element i chosen at random.
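A Python sketch of randomized quicksort; the output is always correct, and only the comparison count varies with the random pivot choices:

```python
import random

def randomized_quicksort(a):
    """Las Vegas algorithm: always returns the list in sorted order."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)            # the element i chosen at random
    L = [x for x in a if x < pivot]     # batch of numbers less than the pivot
    M = [x for x in a if x == pivot]
    R = [x for x in a if x > pivot]     # batch of numbers greater than the pivot
    return randomized_quicksort(L) + M + randomized_quicksort(R)

print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```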
Monte Carlo algorithms

• Monte Carlo Algorithms: These algorithms always give an answer, but


may occasionally produce an answer that is incorrect.
• A Monte Carlo algorithm always gives an answer but occasionally makes a mistake. However, it finds a correct solution with high probability whatever instance is processed, i.e. there is no instance on which the probability of error is high.
• However, no warning is usually given when the algorithm gives a wrong solution.
• Example: Primality testing
Freivalds Algorithm
• Freivalds algorithm is used to verify correctness of matrix
multiplication. It is a randomized Monte Carlo algorithm and has a
small probability of error.
There are four two-element 0/1 vectors, and half of them give the zero vector in this case: the vectors r = [0,0] and r = [1,1]. So the chance of randomly selecting these in two trials (and falsely concluding that AB = C) is (1/2)² or 1/4.
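A Python sketch of Freivalds' check: verify A·B = C by multiplying with random 0/1 vectors; k independent trials bound the error probability by 2^(-k) (the matrices below are a hypothetical example):

```python
import random

def freivalds(A, B, C, k=10):
    """Monte Carlo check of A*B == C. False is certain; True is probable."""
    n = len(A)
    for _ in range(k):
        r = [random.randint(0, 1) for _ in range(n)]   # random 0/1 vector
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False        # definitely A*B != C
    return True                 # probably A*B == C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 22], [43, 50]]        # the true product A*B
print(freivalds(A, B, C))       # True
```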
Primality Testing

• A primality test is a test to determine whether or not a given number is prime, without factoring it.
Fermat Primality Test
• The Fermat test is the first probabilistic method.
• If a number fails the test, it is not a prime number.
• If it passes, the number MAY BE a prime number.
• It is based on Fermat's little theorem:

If n is a prime number, then
a^n ≡ a (mod n)
Algorithm:

1. Pick a random value for n of the right size.
2. Pick a value of a such that 1 < a < n-1.
3. Check if a^(n-1) ≡ 1 (mod n):
   if NO, then "n is composite"
   if YES, then "n MAY BE a probable prime"
• Given n = 15, a = 2:
a^(n-1) ≡ 1 (mod n)?
2^(15-1) ≡ 1 (mod 15)?
2^14 mod 15 = 4

The number 15 is not a prime, as it does not pass the test.
• Given n = 561, a = 2:
a^(n-1) ≡ 1 (mod n)?
2^(561-1) ≡ 1 (mod 561)?
2^560 mod 561 = 1

The number 561 is a probable prime, as it passes the test.
But 561 = 17 * 33, so it is not actually a prime number (such composites that fool the Fermat test are known as Carmichael numbers).
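A one-line Python sketch using the built-in fast modular exponentiation:

```python
def fermat_test(n, a=2):
    """Fermat test: False means surely composite; True means probable prime."""
    return pow(a, n - 1, n) == 1   # checks a^(n-1) ≡ 1 (mod n)

print(fermat_test(15))    # False: 2^14 mod 15 = 4, so 15 is composite
print(fermat_test(97))    # True: 97 is prime
print(fermat_test(561))   # True, yet 561 = 3 * 11 * 17 is composite
```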
Miller-Rabin Test

Algorithm:
Step 1: Compute n - 1 = 2^k · m, where m is odd.
Step 2: Choose a random integer a such that 1 ≤ a ≤ n-1.
Step 3: Compute b = a^m mod n.
    if (b ≡ 1 mod n) then return ("n is probably prime")
    for (i = 0 to k - 1)
    {
        if (b ≡ -1 mod n) then return ("n is probably prime")
        else
            b = b² mod n
    }
    return ("n is composite")

Examples: 61 and 97 are reported probably prime; 561 is reported composite.
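A Python sketch with several random witnesses (each trial that passes lowers the error probability by a factor of at most 1/4):

```python
import random

def miller_rabin(n, trials=5):
    """Miller-Rabin: False is certainly composite, True is probably prime."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    k, m = 0, n - 1
    while m % 2 == 0:              # write n - 1 = 2^k * m with m odd
        k += 1
        m //= 2
    for _ in range(trials):
        a = random.randint(2, n - 2)
        b = pow(a, m, n)           # b = a^m mod n
        if b in (1, n - 1):
            continue               # this witness says "probably prime"
        for _ in range(k - 1):
            b = pow(b, 2, n)       # b = b^2 mod n
            if b == n - 1:
                break
        else:
            return False           # witness a proves n composite
    return True

print(miller_rabin(61), miller_rabin(97), miller_rabin(561))  # True True False
```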
NUMERICAL PROBABILISTIC ALGORITHMS
• For certain real-life problems, computation of an exact solution is not possible, maybe:
• because of uncertainties in the experimental data to be used,
• because a digital computer cannot represent an irrational number exactly,
• because a precise answer would take too long to compute.
• A numerical probabilistic algorithm will give an approximate answer.
• The precision of the answer increases when more time is given to the algorithm to work on the problem.
Buffon Needle problem
• Buffon's needle is an early problem in geometrical probability that
was investigated experimentally in 1777 by the French naturalist and
mathematician Comte Georges Louis de Buffon.
• It involves dropping a needle repeatedly onto a lined sheet of paper
and finding the probability of the needle crossing one of the lines on
the page. The result, surprisingly, is directly related to the value of pi.
• Dropping a needle many times on to lined paper gives an interesting
(but slow) way to find π.
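A numerical Monte Carlo sketch: with the needle length equal to the line spacing, the crossing probability is 2/π, so π ≈ 2·drops/crossings. (Note the simulation samples the needle's angle, which itself uses π; the physical experiment needs no such knowledge, and the code is only the classic illustration.)

```python
import math
import random

def estimate_pi(drops, needle=1.0, spacing=1.0):
    """Buffon's needle: drop a needle onto lined paper and count crossings."""
    crossings = 0
    for _ in range(drops):
        y = random.uniform(0, spacing / 2)        # center's distance to nearest line
        theta = random.uniform(0, math.pi / 2)    # acute angle with the lines
        if y <= (needle / 2) * math.sin(theta):   # the needle crosses a line
            crossings += 1
    return 2 * needle * drops / (spacing * crossings)

print(estimate_pi(1_000_000))   # converges (slowly) towards 3.14159...
```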
