Week 11
Shahid Hussain
Weeks 11, 12, and 13: October 28 – November 13, 2024: Fall 2024
1
Probability Theory
2
Probability Theory: Distribution
• If S is finite and Pr{s} = 1/|S| for any s ∈ S then we have the uniform
probability distribution on S
• Two events A and B are independent if Pr{A ∩ B} = Pr{A} Pr{B}
• A collection of events A_1, . . . , A_n is independent if, for every set of indices
I ⊆ {1, . . . , n}, we have

Pr{⋂_{i∈I} A_i} = ∏_{i∈I} Pr{A_i}
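The product condition can be checked by brute-force enumeration on a small sample space. A minimal Python sketch (the coin-flip events here are illustrative, not from the slides):

```python
import itertools

# Sample space: two fair coin flips under the uniform distribution.
space = list(itertools.product("HT", repeat=2))

def pr(event):
    # Probability of an event (a subset of the sample space),
    # using Pr{s} = 1/|S| for each outcome s.
    return len(event) / len(space)

A = [s for s in space if s[0] == "H"]          # first flip is heads
B = [s for s in space if s[1] == "H"]          # second flip is heads
A_and_B = [s for s in space if s in A and s in B]

# Independence: Pr{A ∩ B} = Pr{A} · Pr{B}
print(pr(A_and_B) == pr(A) * pr(B))  # True
```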
4
Probability Theory: Random Variables
• The function f (x) = Pr{X = x} is the probability mass function (PMF) of the
random variable X
• From the probability axioms, f(x) ≥ 0 and Σ_x f(x) = 1
5
Probability Theory: Expected Values
Lemma
If we repeatedly perform independent trials of an experiment, each of which
succeeds with probability p > 0, then the expected number of trials we need to
perform until the first success is 1/p.
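The lemma can be checked empirically. A small simulation sketch (the value of p and the number of runs are arbitrary illustrative choices):

```python
import random

def trials_until_success(p):
    # Repeat independent trials, each succeeding with probability p,
    # and count how many are needed until the first success.
    count = 0
    while True:
        count += 1
        if random.random() < p:
            return count

p = 0.25
runs = 20_000
avg = sum(trials_until_success(p) for _ in range(runs)) / runs
print(avg)  # close to 1/p = 4 on average
```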
7
Probability Theory: Bernoulli/Binomial Trial
8
Randomized Algorithms
Randomized Algorithms
9
Randomized Algorithms
• Let us consider one problem and two randomized algorithms for it: one Las
Vegas and one Monte Carlo
• Consider an array A of n elements such that half of the array contains 0’s and
the other half contains 1’s
• We need to find an index j such that A[j] = 1
10
Randomized Algorithms: Las Vegas Algorithm
Algorithm: find-one-LasVegas
Input: An array A of n elements, s.t. half of A is 1’s and half is 0’s
Output: The index j such that A[j] = 1
1. while true
2. j = random(1, n)
3. if A[j] = 1 return j
• The above algorithm will eventually find an index j such that A[j] = 1,
provided the random number generator used in Line 2 does not repeatedly
select the same element again and again.
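The pseudocode above translates directly to Python; this sketch uses 0-based indexing in place of the slides' random(1, n):

```python
import random

def find_one_las_vegas(A):
    # Keep probing random positions until a 1 is found.
    # Always correct; since half of A is 1's, each probe succeeds
    # with probability 1/2, so the expected number of probes is 2.
    n = len(A)
    while True:
        j = random.randrange(n)   # 0-based analogue of random(1, n)
        if A[j] == 1:
            return j

arr = [0, 1, 0, 1, 1, 0, 1, 0]    # half 0's, half 1's
idx = find_one_las_vegas(arr)
print(arr[idx])  # 1
```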
11
Randomized Algorithms: Monte Carlo Algorithm
Algorithm: find-one-MonteCarlo
Input: An array A of n elements, s.t. half of A is 1’s and half is 0’s, and an integer k > 0
Output: An index j s.t. A[j] = 1 (success), otherwise nil with probability (1/2)^k (failure)
1. i = 0
2. while i < k
3. j = random(1, n)
4. i=i+1
5. if A[j] = 1 return j
6. return nil
• This algorithm does not guarantee a correct answer every time
• When successful, it returns an index j s.t. A[j] = 1; otherwise it fails with probability (1/2)^k
• If k is sufficiently large, the probability of failure is very small
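A Python sketch of the Monte Carlo variant (0-based indexing; None plays the role of nil):

```python
import random

def find_one_monte_carlo(A, k):
    # Probe at most k random positions; on a half-1's array each probe
    # misses with probability 1/2, so all k probes fail together with
    # probability (1/2)**k.
    n = len(A)
    for _ in range(k):
        j = random.randrange(n)
        if A[j] == 1:
            return j
    return None

arr = [0, 1, 0, 1, 1, 0, 1, 0]
res = find_one_monte_carlo(arr, 32)
print(res is None or arr[res] == 1)  # True
```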
12
Selection Problem
13
Randomized Selection Algorithm
Independent of the choice of the splitter, this algorithm finds the k-th smallest
element of S
14
Example
15
Analysis of Selection Algorithm: Worst-Case
(n − 1) + (n − 2) + · · · + 1 = n(n − 1)/2 = Ω(n²)
16
Analysis of Selection Algorithm: Average-Case
17
Analysis of Selection Algorithm: Average-Case
• We say that the algorithm is in phase j when the size of the set under
consideration (denoted as m) satisfies the following inequality:
n(3/4)^(j+1) < m ≤ n(3/4)^j
• In a given iteration we say that an element is central if there are at least
⌊m/4⌋ elements which are smaller than it and at least ⌊m/4⌋ elements which
are larger than it
• If a central element is chosen as the splitter, then the number of elements the
algorithm has to work with will be at most m − ⌊m/4⌋ − 1, and clearly

m − ⌊m/4⌋ − 1 ≤ (3/4)m ≤ n(3/4)^(j+1)
• Now the algorithm is in phase j + 1
18
Analysis of Selection Algorithm: Average-Case
19
QuickSort
20
Analysis of QuickSort
(n − 1) + (n − 2) + · · · + 1 = n(n − 1)/2 = Ω(n²)
21
Average Case Analysis of QuickSort
22
Average Case Analysis of QuickSort (cont.)
• Let Z_ij = {z_i, z_{i+1}, . . . , z_j} be the set of elements between z_i and z_j in the
sorted array
• In order for z_i and z_j to be compared, the first pivot chosen from Z_ij must be
either z_i or z_j
• The probability of this event is 2/|Z_ij| = 2/(j − i + 1)
• We get:
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1) = 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 1/(j − i + 1) ≤ 2 Σ_{i=1}^{n−1} Σ_{k=2}^{n} 1/k ≤ 2n ln n
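The 2n ln n bound can be observed by counting pivot comparisons in a randomized quicksort. A sketch (counting one comparison per non-pivot element in each partition step; the three-way partition is an implementation choice):

```python
import math
import random

def quicksort(a):
    # Returns (sorted copy, number of element-vs-pivot comparisons).
    if len(a) <= 1:
        return list(a), 0
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    comparisons = len(a) - 1       # every non-pivot element meets the pivot
    ls, lc = quicksort(less)
    gs, gc = quicksort(greater)
    return ls + equal + gs, comparisons + lc + gc

n = 500
data = random.sample(range(10 * n), n)
s, c = quicksort(data)
print(c)  # typically a bit below the 2n ln n ≈ 6215 bound
```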
23
String Equality Testing with Randomized Algorithms
• Problem: Alice and Bob want to check if their strings x and y are equal.
• Solution: Use a fingerprint (hash) to represent each string.
Algorithm
1. Alice selects a prime p from the set of primes less than M
2. Alice computes fp (x).
3. Alice sends p and fp (x) to Bob.
4. Bob compares fp (x) with fp (y) to check equality.
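A sketch of the fingerprint protocol, with the string encoded as a number and f_p(x) taken as that number mod p (the bound M here is an arbitrary illustrative choice):

```python
import random

def primes_below(M):
    # All primes less than M, by a simple sieve of Eratosthenes.
    sieve = [True] * M
    sieve[0] = sieve[1] = False
    for i in range(2, int(M ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def fingerprint(x, p):
    # f_p(x): the string's numeric value modulo the prime p.
    return int.from_bytes(x.encode(), "big") % p

M = 10_000                  # illustrative; in the analysis M depends on |x|
p = random.choice(primes_below(M))

x, y = "hello world", "hello world"
# Alice sends (p, fingerprint(x, p)); Bob compares against fingerprint(y, p).
print(fingerprint(x, p) == fingerprint(y, p))  # True — equal strings always match
```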
24
Probability of False Positives in String Equality
25
Probability of False Positives in String Equality
• Which is negligible
27
Pattern Matching
28
Pattern Matching
29
Pattern Matching
• Let I_q(T(j)) represent the fingerprint of the block of text T(j) modulo q
• Then the fingerprint of the block of text T(j + 1) can be computed as:

I_q(T(j + 1)) = (2 I_q(T(j)) − W_q t_j + t_{j+m}) mod q, where W_q = 2^m mod q
30
Pattern Matching
Algorithm: pattern-matching
Input: A text T of length n and a pattern P of length m
Output: The first index j such that t_j = p_1, . . . , t_{j+m−1} = p_m, or 0 otherwise
1. q = random(1, M)
2. j = 1
3. W_q = 2^m mod q
4. Compute I_q(P) and I_q(T(j))
5. while j ≤ n − m + 1
6. if I_q(P) = I_q(T(j)) return j
7. I_q(T(j + 1)) = (2 I_q(T(j)) − W_q t_j + t_{j+m}) mod q
8. j = j + 1
9. return 0
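A Python sketch of the algorithm over bit strings (0-based indexing internally, returning the slides' 1-based index; the bound M is an arbitrary illustrative choice):

```python
import random

def pattern_matching(T, P, M=2**31 - 1):
    # Monte Carlo Karp–Rabin matching of a bit pattern P in a bit text T.
    # Returns the first 1-based index with a fingerprint match, or 0.
    # A false positive occurs only when q divides a nonzero difference
    # of fingerprints, which happens with small probability.
    n, m = len(T), len(P)
    q = random.randrange(2, M)      # the slides pick a random q below M
    W = pow(2, m, q)                # W_q = 2^m mod q
    Ip = It = 0
    for i in range(m):              # initial fingerprints I_q(P), I_q(T(1))
        Ip = (2 * Ip + P[i]) % q
        It = (2 * It + T[i]) % q
    for j in range(n - m + 1):
        if Ip == It:
            return j + 1
        if j + m < n:
            # I_q(T(j+1)) = (2 I_q(T(j)) − W_q t_j + t_{j+m}) mod q
            It = (2 * It - W * T[j] + T[j + m]) % q
    return 0

print(pattern_matching([0, 1, 1, 0, 1, 0, 1], [1, 0, 1]))  # 3 (w.h.p.)
```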
31
Analysis of Pattern Matching Algorithm
• ζ ≤ (2^m)^n = 2^(mn)
• Therefore, the number of primes that divide it cannot exceed π(mn)
32
Analysis of Pattern Matching Algorithm
33
Converting Monte Carlo to Las Vegas
• That is, the Las Vegas algorithm has the same expected running time as the
Monte Carlo algorithm
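The conversion is just a retry loop: run the Monte Carlo algorithm until it succeeds. A sketch using the earlier find-one problem (function names are illustrative):

```python
import random

def find_one_monte_carlo(A, k):
    # One Monte Carlo run: at most k random probes, may return None.
    for _ in range(k):
        j = random.randrange(len(A))
        if A[j] == 1:
            return j
    return None

def find_one_las_vegas(A, k=10):
    # Repeat the Monte Carlo algorithm until it succeeds.  Each run fails
    # with probability (1/2)**k, so the expected number of runs is at most
    # 1/(1 - (1/2)**k): the expected running time stays the same up to a
    # constant factor, but the answer is now always correct.
    while True:
        j = find_one_monte_carlo(A, k)
        if j is not None:
            return j

arr = [0, 1, 0, 1, 1, 0, 1, 0]
print(arr[find_one_las_vegas(arr)])  # 1
```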
34
Random Sampling
35
Random Sampling
Algorithm: random-sampling
Input: Two positive integers n and k such that k < n
Output: An array A[1..k] of k distinct elements from {1, 2, . . . , n}
1. B = ⟨0⟩_n // B is an n-bit vector of 0’s
2. j = 0
3. while j < k
4. r = random(1, n)
5. if B_r = 0
6. j = j + 1
7. A[j] = r
8. B_r = 1
9. return A
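A direct Python transcription of the algorithm (1-based values, with the bit vector as a Boolean list):

```python
import random

def random_sampling(n, k):
    # Returns k distinct elements of {1, ..., n} by drawing random values
    # and retrying whenever the drawn value was already selected.
    assert 0 < k < n
    B = [False] * (n + 1)         # B[r] marks whether r has been chosen
    A = []
    while len(A) < k:
        r = random.randint(1, n)  # random(1, n)
        if not B[r]:
            A.append(r)
            B[r] = True
    return A

sample = random_sampling(10, 4)
print(sample)  # four distinct values from 1..10
```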
36
Analysis of Random Sampling Algorithm
• If k ≈ n (i.e., much larger than n/2), we can instead select the n − k elements
to discard and return the rest; so assume k ≤ n/2
• Let p_j be the probability that a trial succeeds (draws a not-yet-selected
element) given that j − 1 distinct elements have already been selected,
1 ≤ j ≤ k; clearly

p_j = (n − j + 1)/n

• Let X_j be the number of trials needed to select the j-th distinct element
• Then the expected value of X_j is

E[X_j] = 1/p_j = n/(n − j + 1)

• Let X = X_1 + X_2 + · · · + X_k; then the expected value of X is

E[X] = Σ_{j=1}^{k} E[X_j] = Σ_{j=1}^{k} n/(n − j + 1) = n Σ_{j=1}^{n} 1/j − n Σ_{j=1}^{n−k} 1/j
37
Analysis of Random Sampling Algorithm
• We know that Σ_{j=1}^{n} 1/j ≤ ln n + O(1), therefore
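Using the harmonic-sum bounds and the earlier assumption k ≤ n/2, the estimate can be completed as follows (a sketch; constants are absorbed into the O(n) term):

```latex
\begin{aligned}
E[X] &= n\sum_{j=1}^{n}\frac{1}{j} - n\sum_{j=1}^{n-k}\frac{1}{j}
      \le n\bigl(\ln n + O(1)\bigr) - n\ln(n-k) \\
     &= n\ln\frac{n}{n-k} + O(n)
      \le n\ln 2 + O(n) = O(n) \qquad (\text{since } k \le n/2)
\end{aligned}
```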
38