Randomized Algorithms Notes
1 Hashing
• General introductory description of Hashing
– Want to store a set S of n (integer) keys in a memory-economical data structure that supports membership queries (MQ) in an efficient manner,
i.e. generally supports MQ in (expected) sub-linear time.
– Generally accomplished through the utilization of a hash function h ∈ H (where we assume that h can be
evaluated in O(1) time). One could consider the Hashing w. Chaining (HWC) data structure w. an array A
of size m, then:
  h : [U] → [m] (2)
– Since we must have m ≪ U, there must always be collisions, i.e. ∃ x ≠ y ∈ [U] with h(x) = h(y).
– To remedy this, i.e. minimize the expected length of a linked list in HWC (which corresponds to the expected query time), we choose:
  H = Family of truly random hash functions (4)
Any h ∈ H is defined by:
1. ∀ x ∈ [U], h(x) is independent of the hashes of all other keys, i.e. of { h(y) : y ∈ [U] \ {x} }.
2. h is uniformly random in its range, i.e. ∀ x ∈ [U], ∀ i ∈ [m]: P[h(x) = i] = 1/m
and prior to storing S, we randomly pick h ∈ H.
– Now, it is easy to realize that we have linear worst-case MQ time (in case the entire S is mapped to the same
linked list):
  Worst Case: O(n) with P[h(x1) = h(x2) = ... = h(xn)] = 1/m^(n−1) (Due to independence of h.) (5)
and we can also easily determine expected MQ time by realizing that it corresponds to expected length of a
linked list in HWC data structure.
Specifically consider the expected length of the linked list that some query element q hashes to - define the binary indicator variable:
  Xi ≡ 1 if h(xi) = h(q), and 0 otherwise (6)
and then:
  X ≡ Σ_{i=1}^{n} Xi (7)
and then utilize Linearity of Expectation + the definition of the expectation of a discrete random variable:
  E[|A[h(q)]|] = E[MQ time] = E[X] = Σ_{i=1}^{n} E[Xi] = Σ_{i=1}^{n} ( P[Xi = 1] · 1 + P[Xi = 0] · 0 ) = Σ_{i=1}^{n} P[Xi = 1] (8)
and since each P[Xi = 1] = 1/m by uniformity of h, the expected MQ time is n/m - i.e. O(1) whenever m = Ω(n).
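To make the above concrete, here is a minimal Python sketch of Hashing w. Chaining (my own illustrative code - a salted built-in hash stands in for a truly random h ∈ H):

import random

class HashWithChaining:
    """Array A of m buckets; each bucket is a Python list playing the role of a linked list."""
    def __init__(self, m, seed=0):
        self.m = m
        # Picking a random salt stands in for drawing a truly random h from H.
        self.salt = random.Random(seed).getrandbits(64)
        self.A = [[] for _ in range(m)]

    def _h(self, x):
        # Stand-in for h : [U] -> [m]; a truly random h would be uniform and independent per key.
        return hash((self.salt, x)) % self.m

    def insert(self, x):
        bucket = self.A[self._h(x)]
        if x not in bucket:
            bucket.append(x)

    def member(self, q):
        # Membership query: scan the single bucket q hashes to; expected bucket length is n/m.
        return q in self.A[self._h(q)]

# Example: store S and answer MQs.
table = HashWithChaining(m=128)
for key in [3, 1, 4, 1, 5, 9, 2, 6]:
    table.insert(key)
print(table.member(5), table.member(7))  # True False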
– In the following it is assumed that we are dealing w. binary vectors x ∈ {0, 1}^d, with dist(·,·) being the Hamming distance (cf. eq. (19)).
– One of the practical applications of hashing is its utilization in Nearest Neighbour Search (NNS). NNS is
defined as follows: Given n data points
  S ≡ {x1, x2, ..., xn} (21)
stored in a data structure, and a relevant distance measure dist(·,·) defined on the 'universe', answer a query q by returning the nearest point:
  x* = argmin_{x∈S} dist(x, q) (23)
– The problem w. exact NNS is that it is (mostly) believed that there exists no alg. w. sub-linear expected query time when
using O(poly(n)) space and having dim(U) ≥ Ω(lg(n)).
– Therefore we instead consider an approximate formulation called c−Approx R−NNS: if
  min_{x∈S} dist(x, q) ≤ R, (24)
return any
  xj with dist(xj, q) ≤ cR. (27)
– Turns out that we can solve c−Approx R−NNS w. acceptable overhead by construction of multiple relaxed
c−Approx R−NNS data structures w. increasing R.
– Now, say the smallest of these that returns an xj has R = (1 + α)^i R0 → we know that there are no points with
dist(x, q) < (1 + α)^(i−1) R0 and therefore that
  dist(xj, q) ≤ c(1 + α)^i R0 = c(1 + α) · (1 + α)^(i−1) R0 ≤ c(1 + α) · min_{x∈S} dist(x, q),
and as such we have solved the original c−Approx R−NNS w. approximation factor c(1 + α).
– It turns out to be reasonable to set R0 = min_{x≠y∈S} dist(x, y), s.t.:
  # relaxed c−Approx R−NNS structures = log_{1+α}( max_{x≠y∈S} dist(x, y) / min_{x≠y∈S} dist(x, y) ) ≤ log_{1+α}(d/1) = log_{1+α}(d) (31)
and to get rid of the weird logarithmic basis, we utilize the 1st-order Maclaurin expansion e^x ≈ 1 + x (i.e. ln(1 + α) ≈ α) and write:
  log_{1+α}(d) = ln(d)/ln(1 + α) ≈ ln(d)/α
– Now, say that each of them takes space s0 and time t0, and that we utilize binary search to find the smallest one
returning some xj, then the time and space consumption becomes:
  t = t0 · log( ln(d)/α ) = t0 · ( log ln(d) + log(1/α) ) (33)
  s = s0 · ln(d)/α (34)
sub-linear overhead → acceptable.
– Now, let's see how we might utilize Locality Sensitive Hashing in this context. This is a type of hash function
where the collision probability decreases with the distance → close points collide often and far points rarely.
– We say that the family H is (R, cR, P1 , P2 )−sensitive iff:
1. ∀ x, y with dist(x, y) ≤ R ⇒ PH [h(x) = h(y)] ≥ P1
2. ∀ x, y with dist(x, y) ≥ cR ⇒ PH [h(x) = h(y)] ≤ P2
(for P1 > P2 ).
– In the case of eq. (19), we can check that the family of hash functions that returns a (uniformly random) bit of
the binary vectors,
  H ≡ { h(x) = x[i] | i = 1, 2, ..., d } (35)
s.t.
  ∀ x ∈ [U]: P_H[h(x) = x[1]] = P_H[h(x) = x[2]] = ... = P_H[h(x) = x[d]] = 1/d, (36)
is (R, cR, P1, P2)−sensitive. This is seen by picking some x, y ∈ U w. dist(x, y) ≤ R and observing that
  P_H[h(x) = h(y)] = 1 − dist(x, y)/d ≥ 1 − R/d (37)
s.t. P1 = 1 − R/d. Equivalently one can determine P2 = 1 − cR/d, s.t. H is (R, cR, 1 − R/d, 1 − cR/d)−sensitive.
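A tiny Python sketch of this bit-sampling family (names are my own; it also checks eq. (37) empirically):

import random

def draw_bit_sampling_hash(d, rng=random):
    """Draw h from H: h(x) = x[i] for a uniformly random coordinate i in {0, ..., d-1}."""
    i = rng.randrange(d)
    return lambda x: x[i]

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

# Empirical check of eq. (37): P[h(x) = h(y)] = 1 - dist(x, y)/d.
x = (1, 0, 1, 1, 0, 0, 1, 0)
y = (1, 0, 0, 1, 0, 1, 1, 0)
d = len(x)
hits = 0
for _ in range(100_000):
    h = draw_bit_sampling_hash(d)
    hits += (h(x) == h(y))
print(hits / 100_000, 1 - hamming(x, y) / d)  # both approx. 0.75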
– Now, it is naive and tempting to use HWC to store S, but it turns out that this results in both P2 (prob. of 'far away'
points hashing to the same bucket) being too big, and P1 (prob. of 'close' points hashing to the same bucket) being too small for usual
vals. of R, d - in fact, for typical choices of R, d the gap P1 − P2 is only a small constant (we need to create a big - possibly non-constant - one).
– What we do to remedy this is to draw k hash functions from H (independently) and define the hash of some point
x ∈ U as the concatenation (a bit-string in {0, 1}^k):
  g(x) = h1(x) ◦ h2(x) ◦ ... ◦ hk(x) (38)
resulting in a 'far away' collision prob. bound:
  P_H[g(x) = g(y)] ≤ P2^k for dist(x, y) ≥ cR.
– However, we are still faced w. a problem from the definition of the alg. - if there is any xi ∈ S with dist(xi, q) ≤ R the
algorithm should return some xj with dist(xj, q) ≤ cR, but what if no x ∈ S w. dist(x, q) ≤ cR hashes to the
same bucket as the query → then the alg. doesn't return anything valid.
– In particular, if we consider the 'close point' collision probability now,
  P_H[g(x) = g(q)] ≥ P1^k for dist(x, q) ≤ R,
we can see that unfortunately P1^k is too small for common vals. of R, d (even though the concatenation of k hash
functions has turned the initially constant gap P1 − P2 into a polynomially big gap in n → P1^k − P2^k is now a
function of n).
– The final step, to remedy this and guarantee a success prob. of at least 1/2, is to draw L independent g's from H (a
total of L × k independent h's):
  g1(x) = h11(x) ◦ h12(x) ◦ ... ◦ h1k(x)
  g2(x) = h21(x) ◦ h22(x) ◦ ... ◦ h2k(x)
  ...
  gL(x) = hL1(x) ◦ hL2(x) ◦ ... ◦ hLk(x) (43)
and thus create L hash tables (one per gj).
– The final algorithm then becomes the one sketched below, and as it solves the problem w. probability 1/2, one could always repeat it to boost the success probability.
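Since the algorithm box itself is not reproduced in these notes, the following Python sketch is one plausible reading of the complete structure: L tables keyed by k-bit concatenations gj of bit-sampling hashes, and a query that stops after inspecting 3L candidates (all names and the exact stopping rule are my own illustrative choices):

import random
from collections import defaultdict

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

class LSHIndex:
    def __init__(self, points, d, k, L, seed=0):
        rng = random.Random(seed)
        # L independent g_j's, each a concatenation of k bit-sampling hashes (random coordinates).
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = list(points)
        for idx, x in enumerate(self.points):
            for j in range(L):
                self.tables[j][self._g(j, x)].append(idx)

    def _g(self, j, x):
        # g_j(x) = h_j1(x) ◦ ... ◦ h_jk(x), represented as a k-tuple of bits.
        return tuple(x[i] for i in self.coords[j])

    def query(self, q, cR):
        """Return some stored point within distance cR of q, or None.
        Inspects at most 3L colliding candidates before giving up (the event E2 below)."""
        L = len(self.tables)
        inspected = 0
        for j in range(L):
            for idx in self.tables[j][self._g(j, q)]:
                x = self.points[idx]
                if hamming(x, q) <= cR:
                    return x
                inspected += 1
                if inspected >= 3 * L:
                    return None
        return None

# Example usage with 8-bit vectors (illustrative parameters).
pts = [(1, 0, 1, 1, 0, 0, 1, 0), (0, 1, 1, 0, 1, 0, 0, 1)]
index = LSHIndex(pts, d=8, k=3, L=5)
print(index.query((1, 0, 1, 1, 0, 0, 1, 1), cR=2))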
To see that the success prob. is indeed at least 1/2, consider the event E2 that more than 3L 'far away' points (dist(x, q) ≥ cR) collide with q across the L tables. Writing X ≡ Σ_{j=1}^{L} Σ_{x∈S: dist(x,q)≥cR} 1_{gj(x)=gj(q)}, we bound the probability of E2 via Markov's inequality (using that k is chosen s.t. P2^k ≤ 1/n):
  P[E2] = P[X > 3L] < E[X]/(3L) = ( Σ_{j=1}^{L} Σ_{x∈S: dist(x,q)≥cR} E[1_{gj(x)=gj(q)}] )/(3L) ≤ ( Σ_{j=1}^{L} n · (1/n) )/(3L) = L/(3L) = 1/3 (51)
such that, by utilizing the union bound (with E1 denoting the other bad event - a 'close' point failing to collide with q in any of the L tables - whose probability is bounded by 1/8 for the chosen L), we see:
  P[¬E1 ∧ ¬E2] = 1 − P[E1 ∪ E2] ≥ 1 − Σ_i P[Ei] = 1 − 1/8 − 1/3 = 13/24 > 1/2 (52)
Furthermore, we may determine a bound on L. By getting rid of the annoying basis in eq. (47) and plugging in the values
of P1, P2, we arrive at:
  L ≤ 3n^(1/c) (53)
and then one can see the effect of the choice of approximation factor on the space and time consumption as defined
in eqs. (45) and (46).
2 Streaming
– Even though we cannot access all the data at once (it arrives as a stream), we might still be able to calculate interesting properties of the data
with high accuracy (with high probability). A typical goal is to determine frequencies of items - i.e. how often
various things occur in the stream.
– A naive approach would be to store a counter for every xi ∈ U, but this would typically be very memory expensive → it
is not immediately clear that one can determine the most frequent item (Heavy Hitter) in a stream w. memory
≪ min(n, |U|).
• Let's consider a simple problem that only requires approximation (and not also randomization) to solve - we solve
it w. a deterministic algorithm.
– The Frequency estimation / Point queries problem - defined by query(i): How many times has the ith
element occurred (so far)?
– (Think of elems. as keys from now on.) Quickly define the frequency vector:
  f ∈ R^|U|, (An entry for each elem. in the 'universe'.) (56)
with the i-th entry defined as the # occurrences of xi ∈ U in the stream so far.
As such, one might consider the problem as that of creating a (much smaller than |U| entries) representation f̃ of f → then
query(i) should return f̃i.
– A simple algorithm achieving this is Misra-Gries - it takes a space budget k (the size of f̃) and supports Update(i) and
Estimate(i), as sketched below:
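A minimal Python sketch of Misra-Gries as commonly stated (the notes' own pseudocode box is not reproduced here, so names are illustrative):

class MisraGries:
    def __init__(self, k):
        self.k = k
        self.counters = {}  # the compressed representation f~ (at most k keys)

    def update(self, i):
        if i in self.counters:
            self.counters[i] += 1
        elif len(self.counters) < self.k:
            self.counters[i] = 1
        else:
            # No free counter: decrement all, and drop counters hitting zero.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def estimate(self, i):
        return self.counters.get(i, 0)

# Example usage.
mg = MisraGries(k=2)
for item in ["a", "b", "a", "c", "a", "a", "b"]:
    mg.update(item)
print(mg.estimate("a"), mg.estimate("c"))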
– Obviously it doesn't always return the exact count (it decrements and removes counters sometimes). Let's analyze how
close it comes. It's fairly obvious that the estimate never exceeds the true count, and that at most n/k 'decrement-all' rounds can occur (each one cancels k counted occurrences), s.t.:
  fi − n/k ≤ f̃i ≤ fi
and therefore we also know that every item/key i which occurs more than n/k times in the stream is guaranteed to have
a counter ci > 0 in f̃.
• Now, let's generalize the problem in a way that requires randomization for its solution (if one wants to store anything less than
the entire stream).
– Specifically, we now consider the possibility of performing generally-sized integer updates of the counters (instead of just
±1): for key i we can perform Update(i, Δ), corresponding to fi ← fi + Δ. The sketch is a single array A of k counters together with a hash function h : U → [k]; Update(i, Δ) adds Δ to A[h(i)] and Estimate(i) returns A[h(i)].
– Now, let's determine the space budget (value of k) required to have an additive error of at most ε||f||1 w.
probability 1 − δ - corresponding to failing w. prob. δ.
– Consider the value that the array in principle holds for any key i (true frequency plus noise X from colliding keys):
  A[h(i)] = fi + Σ_{j≠i} 1_{h(j)=h(i)} fj ≡ fi + X (62)
and specifically (as the noise X is a non-negative R.V.) utilize Markov's Inequality to determine the probability of
having too large an additive error:
  P[X > ε||f||1] < E[X]/(ε||f||1) (63)
and to that end bound the expectation utilizing Linearity of Expectation + the definition of the expectation of a
discrete random variable:
  E[X] = E[ Σ_{j≠i} 1_{h(j)=h(i)} fj ] = Σ_{j≠i} fj E[1_{h(j)=h(i)}] = Σ_{j≠i} fj P[h(i) = h(j)] ≤ Σ_{j≠i} fj/k ≤ ||f||1/k (64)
such that:
  P[X > ε||f||1] < 1/(εk) (65)
and then, for some given success prob. 1 − δ (and resulting failure prob. δ), and additive error factor ε, one can
always choose k:
  1/(εk) < δ ⇔ k > 1/(εδ) (66)
– Now, it actually turns out that one can get an even smaller memory dependence on δ. In fact it suffices to choose
k:
  k = O( (1/ε) · log(1/δ) ) (67)
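As a quick numerical illustration (my own numbers): for ε = δ = 0.01, eq. (66) demands k > 1/(εδ) = 10 000 counters, whereas the bound in eq. (67) is on the order of (1/ε) · log2(1/δ) ≈ 100 · 6.64 ≈ 664 counters in total.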
and what we do to achieve this dependence is the usual trick of repeating the data structure. Specifically, just perform
t independent repetitions (each an array of k counters with its own hash function):
  A1 (k counters), h1
  A2 (k counters), h2
  ...
  At (k counters), ht
where we always return the value from the Aj w. the lowest estimate: as we per definition only overestimate the
exact frequency (and never underestimate), this will always be the result w. the lowest additive error →
the structure is therefore also named the Count-min Sketch.
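A minimal Python sketch of the Count-min structure just described, with t rows of k counters; salted built-in hashes stand in for the 1-approx universal hj's, and the choices k = ceil(2/ε), t = ceil(log2(1/δ)) follow the analysis below (my own illustrative code):

import math
import random

class CountMinSketch:
    def __init__(self, eps, delta, seed=0):
        self.k = math.ceil(2 / eps)                       # row width: 1/(eps*k) <= 1/2 per row
        self.t = max(1, math.ceil(math.log2(1 / delta)))  # number of rows: 2^(-t) <= delta
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(64) for _ in range(self.t)]
        self.rows = [[0] * self.k for _ in range(self.t)]

    def _h(self, j, key):
        # Salted built-in hash as a stand-in for a 1-approx universal h_j : U -> [k].
        return hash((self.salts[j], key)) % self.k

    def update(self, key, delta=1):
        # Strict turnstile update f_key += delta (delta assumed non-negative here).
        for j in range(self.t):
            self.rows[j][self._h(j, key)] += delta

    def estimate(self, key):
        # Every row only over-estimates, so the minimum has the smallest additive error.
        return min(self.rows[j][self._h(j, key)] for j in range(self.t))

# Example usage.
cms = CountMinSketch(eps=0.01, delta=0.01)
for key in ["a", "b", "a", "c", "a"]:
    cms.update(key)
print(cms.estimate("a"))  # >= 3, and close to 3 w. high probability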
– Now, the probability of this failing corresponds to the probability of all the individual ones failing simultaneously,
which is just the product, as the hash functions are drawn independently. Now, if we impose the reasonable req.
of wanting each individual one to fail w. prob. at most 1/2, s.t. δj = 1/2 and:
  P[X > ε||f||1] < 1/(εk) ≤ 1/2 (68)
then the total prob. of failure becomes exponentially decreasing in t:
  P[all t estimates have error > ε||f||1] < 2^(−t) (69)
and then requiring this to be at most δ:
  2^(−t) ≤ δ ⇔ t ≥ log2(1/δ) (70)
we get the logarithmic dependency on δ.
– From here it is also evident that both Update and Estimate take logarithmic time (in the chosen failure prob.):
  Time: O(t) = O( log2(1/δ) ) (71)
– Let's now consider the General Turnstile setting, and instead of aiming for a per-array failure prob. δ = 1/2 (as before w. Count-
min), we will start out w. aiming for δ = 1/4; the guarantee that we will be considering can both over- and
underestimate:
  fi − ε||f||2 ≤ f̃i ≤ fi + ε||f||2, with the 2-norm defined as ||f||2 ≡ sqrt( Σ_i fi² ) (72)
As before, the sketch is a single array A of k counters, but this time the array is accompanied by 2 hash functions, h and g, where h is 1-approx universal (as
before) but g is 2-wise independent and maps to {±1}:
  h : U → [k], with ∀ x ≠ y ∈ U: P_{h∼H}[h(x) = h(y)] ≤ 1/k (73)
  g : U → {±1}, with ∀ x1 ≠ x2 ∈ U and ∀ y1, y2 ∈ {±1}: P_{g∼H}[g(x1) = y1 ∧ g(x2) = y2] = 1/4 (74)
– The general idea behind the randomization of the sign from g is that it will hopefully reduce the noise, as parts
of it cancel out, while in Estimate the sign of g on the 'real' value - applied in Update - is cancelled by
multiplying by g(i) again (because (±1)² = 1).
– Now, to see what value of k we must choose for eq. (72) to fail w. prob. δ = 1/4, let's repeat the strategy from the strict
turnstile and consider the 'theoretical' output of Estimate (signed true frequency + signed noise from colliding keys):
  g(i)·A[h(i)] = g(i)( g(i)fi + Σ_{j≠i} 1_{h(j)=h(i)} fj g(j) ) = g(i)²fi + Σ_{j≠i} 1_{h(j)=h(i)} g(i)g(j)fj = fi + X (75)
where g(i)² = (±1)² = 1 and X ≡ Σ_{j≠i} 1_{h(j)=h(i)} g(i)g(j)fj denotes the noise.
and with the intention of determining the failure probability, let's start out by calculating the expectation of the
noise part, by utilizing linearity of expectation, the fact that for any 2 independent R.V.'s E[a · b] = E[a] · E[b],
and the definition of the expectation of a discrete random variable:
  E[X] = E[ Σ_{j≠i} 1_{h(j)=h(i)} g(i)g(j)fj ] = Σ_{j≠i} E[1_{h(j)=h(i)}] E[g(i)g(j)] fj = Σ_{j≠i} P[h(j) = h(i)] E[g(i)] E[g(j)] fj = 0
since E[g(i)] = E[g(j)] = 0.
Now, because X is not in general a non-negative R.V. we cannot use Markov's Inequality, but we can still use Chebyshev's Inequality:
  P[|X − E[X]| > t] < Var[X]/t² = (E[X²] − E[X]²)/t² (76)
and by virtue of the fact that E[X] = 0, we can estimate the probability of violating eq. (72) by setting t = ε||f||2:
  P[|X| > ε||f||2] < E[X²]/(ε²||f||2²) (77)
s.t. we simply need to determine (bound) E[X²] in order to bound the failure prob. in terms of k, ε:
  E[X²] = E[ ( Σ_{j≠i} 1_{h(j)=h(i)} g(i)g(j)fj )( Σ_{l≠i} 1_{h(l)=h(i)} g(i)g(l)fl ) ] = E[ Σ_{j≠i} Σ_{l≠i} 1_{h(j)=h(i)} 1_{h(l)=h(i)} g(i)² g(j)g(l) fj fl ]
and since E[g(j)g(l)] = 0 for j ≠ l (2-wise independence), only the diagonal terms j = l survive:
  E[X²] = Σ_{j≠i} E[1_{h(j)=h(i)}] fj² ≤ Σ_{j≠i} fj²/k ≤ ||f||2²/k
s.t.:
  P[|X| > ε||f||2] < E[X²]/(ε²||f||2²) < 1/(ε²k) (79)
and imposing a failure prob. of at most δ = 1/4 we get:
  k > 4/ε² (80)
however, as before (with the strict turnstile), this is not the optimal space usage, and once again what we do is to
create t independent copies of the data structure (each an array of k = 4/ε² counters with its own pair of hash functions):
  A1 (k counters), (h1, g1)
  A2 (k counters), (h2, g2)
  ...
  At (k counters), (ht, gt)
and as before with the Count-min sketch in the strict turnstile, we consider how many of the t arrays have to
fail for the Estimate to fail.
– Specifically, consider the estimates from each of the t arrays for some i-th key, ordered numerically, and return the median:
now, clearly, if the median is too low (< fi − ε||f||2), then all the values to its left (half of them) are also too low, and if the median is
too high (> fi + ε||f||2), then all the values to its right (half of them) are also too high.
– As such, we can bound the failure prob. (which we want to be at most δ) by the prob. of at least half the arrays failing - the median estimate can only fail if at least t/2 of the individual estimates fail:
  P[X > t/2] ≤ δ, X ≡ Σ_{j=1}^{t} Xj, Xj ≡ 1 if Aj fails and 0 o.w. (82)
as we are dealing w. a sum of independent 0-1 R.V.'s we can utilize Chernoff's Inequality, and if we do, we get:
  P[X > t/2] ≤ (e/4)^(t/4) ≤ δ (83)
from which the number of arrays becomes:
  t ≥ 4 log_{4/e}(1/δ) (84)
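A minimal Python sketch of the resulting structure (commonly named the Count-Sketch): t rows, each with its own (hj, gj) pair approximated here by salted built-in hashes, and a median over the signed row estimates; the parameter choices mirror eqs. (80) and (84) (my own illustrative code):

import math
import random

class CountSketch:
    def __init__(self, eps, delta, seed=0):
        self.k = math.ceil(4 / eps ** 2)  # per-row failure prob. <= 1/(eps^2 k) <= 1/4, cf. eq. (80)
        self.t = max(1, math.ceil(4 * math.log(1 / delta) / math.log(4 / math.e)))  # cf. eq. (84)
        rng = random.Random(seed)
        self.salts = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(self.t)]
        self.rows = [[0] * self.k for _ in range(self.t)]

    def _h(self, j, key):
        # Salted built-in hash as a stand-in for the 1-approx universal h_j : U -> [k].
        return hash((self.salts[j][0], key)) % self.k

    def _g(self, j, key):
        # Salted built-in hash as a stand-in for the 2-wise independent sign g_j : U -> {+1, -1}.
        return 1 if hash((self.salts[j][1], key)) & 1 else -1

    def update(self, key, delta):
        # General turnstile: delta may be negative.
        for j in range(self.t):
            self.rows[j][self._h(j, key)] += self._g(j, key) * delta

    def estimate(self, key):
        # Median (here: upper median) of the signed per-row estimates g_j(i) * A_j[h_j(i)].
        ests = sorted(self._g(j, key) * self.rows[j][self._h(j, key)] for j in range(self.t))
        return ests[len(ests) // 2]

# Example usage.
cs = CountSketch(eps=0.1, delta=0.05)
cs.update("a", 5)
cs.update("b", -2)
print(cs.estimate("a"), cs.estimate("b"))  # approx. 5 and -2 w. high probability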
3 Applications
• Let's consider a specific application of randomized algorithms, specifically Minimum Cut.
– Given an undirected and unweighted graph:
  G ≡ (V, E), with vertex set V and edge set E consisting of unordered pairs {i, j} with i, j ∈ V, i ≠ j, (85)
the problem of determining a Minimum cut is defined as that of partitioning the nodes/vertices of the graph
into two disjoint sets, s.t. the number of edges between the two sets is as small as possible, e.g.:
  [Figure: an example graph on vertices 1-8, shown before and after partitioning into the two vertex sets realizing a minimum cut.]
– Also quickly define the problem of determining the Minimum s-t cut as the problem of partitioning the
nodes/vertices of the graph into two disjoint sets, s.t. the number of edges between the two sets is as small
as possible, but where it is predetermined that node s has to be in the one set and node t in the other, e.g.:
  [Figure: the same example graph with designated vertices s and t, shown with a minimum s-t cut separating them.]
• There exist multiple ways to deterministically compute the (global) Minimum cut, e.g. by fixing a vertex s and computing a Minimum s-t cut (via max-flow) for each of the other n − 1 choices of t, keeping the smallest.
• A randomized approach is the alg. known as Karger’s (contraction) algorithm:
– Let's start by defining a contraction of a graph. A Contraction of 2 nodes a, b in a graph G is simply the process
of merging them into a 'super node' ab, creating a new graph G′:
  [Figure: before and after contracting a and b into the supernode ab; the edges a-4 and b-4 become two parallel edges ab-4.]
s.t. there are now 2 parallel edges from 'ab' to 4 (both the one from a and the one from b).
– We might repeat this process until there are only two nodes/supernodes left in the resulting graph. Now, obviously,
depending on the sequence of contractions performed, the resulting graph might have a different number of
edges between the 2 supernodes.
– As such, the naive implementation of Karger's alg. becomes (see the sketch below):
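A minimal Python sketch of one such naive O(n²)-style implementation (my own; the notes' pseudocode box is not reproduced here):

import random

def karger_min_cut(edges, n, rng=random):
    """One run of the naive contraction algorithm on a multigraph.

    edges: list of (u, v) pairs over vertices 0..n-1 (parallel edges allowed).
    Returns the number of edges crossing the 2-supernode cut this run ends with."""
    label = list(range(n))   # label[v] = supernode currently containing v
    live = list(edges)       # edges whose endpoints lie in different supernodes
    supernodes = n
    while supernodes > 2 and live:
        u, v = rng.choice(live)          # uniformly random remaining edge
        a, b = label[u], label[v]
        for w in range(n):               # contract: merge supernode b into a (O(n))
            if label[w] == b:
                label[w] = a
        live = [(x, y) for (x, y) in live if label[x] != label[y]]
        supernodes -= 1
    return len(live)

def repeated_karger(edges, n, runs, seed=0):
    # Repeating and keeping the smallest cut found boosts the success prob., as analysed below.
    rng = random.Random(seed)
    return min(karger_min_cut(edges, n, rng) for _ in range(runs))

# Example: two triangles joined by a single edge (min cut = 1).
E = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(repeated_karger(E, n=6, runs=50))  # 1 w. high probability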
– By the assumption that it takes O(n) time to update the node-edge information per contraction, and given the fact
that we will perform n − 2 contractions, the algorithm has running time O(n · (n − 2)) = O(n²).
– Let's analyze the algorithm by considering the probability of it determining the (global) minimum cut → we
consider the absolute success prob., and no approximation ratio.
– Let's define the minimum cut of a graph as the set C of edges involved in the cut → the size of the cut is then
|C|.
– Begin by making the observation that if none of the edges in C are contracted during the n − 2 (where n = |V|) contractions, then C
has survived and the edges connecting the two supernodes in the resulting graph correspond exactly to the min. cut C.
– Define Ei as the event that no edge of C is picked in the i-th contraction, and Si ≡ E1 ∩ E2 ∩ ... ∩ Ei.
As such, P[Sn−2] corresponds to having contracted none of the edges in the (global) min. cut C at the end of
the algorithm, and is given as:
  P[Sn−2] = P[E1] · P[E2 | E1] · P[E3 | E1 ∩ E2] · ... · P[En−2 | E1 ∩ E2 ∩ ... ∩ En−3] (87)
  = ∏_{i=1}^{n−2} P[Ei | ∩_{j=1}^{i−1} Ej], (Just the chain rule for conditional probabilities.) (88)
∗ Furthermore, at (just before) the i-th contraction, the graph has n − i + 1 nodes/vertices, and at every step
in the sequence of contractions, every remaining node v has deg(v) ≥ |C| (else the edges around v would form a smaller cut and C wouldn't be the global min.
cut). Therefore:
  # Edges at i-th contraction ≥ (n − i + 1)|C|/2 (90)
and we might (upper) bound the probability of picking some e ∈ C at the i-th contraction (without having picked
any in advance):
  P[¬Ei | ∩_{j=1}^{i−1} Ej] ≤ (# Edges in min. cut)/(# Edges at i-th contraction) ≤ |C| / ( (n − i + 1)|C|/2 ) = 2/(n − i + 1) (91)
– As such, the probability of not having picked any e ∈ C at the i-th contraction is bounded as:
  P[Ei | ∩_{j=1}^{i−1} Ej] ≥ 1 − 2/(n − i + 1) (92)
resulting in an algorithm that takes O(n²) running time, but has a polynomially decreasing success probability:
  P[Sn−2] = ∏_{i=1}^{n−2} P[Ei | ∩_{j=1}^{i−1} Ej] ≥ ∏_{i=1}^{n−2} ( 1 − 2/(n − i + 1) )
  = (1 − 2/n)(1 − 2/(n−1)) ... (1 − 2/3) = (n−2)/n · (n−3)/(n−1) · (n−4)/(n−2) · ... · 1/3 = 2/(n(n − 1)) = (n choose 2)^(−1) (93)
– However, as so often before, what we can do is simply repeat the algorithm k times (keeping the smallest cut found) to improve the success probability.
Specifically, say that we require the failure prob. (the prob. that none of the k runs finds C) to be:
  P[all k runs miss C] ≤ 1/n (94)
we can solve for k, and find that:
  ( 1 − 2/(n(n−1)) )^k ≤ 1/n ⇔ k ≥ ln(1/n)/ln( 1 − 2/(n(n−1)) ) = −ln(n)/ln( 1 − 2/(n(n−1)) ) (95)
and then, utilizing that ln(·) is monotonically increasing, s.t. ∀ a < b: ln(a) < ln(b), together w. the fact that
∀ x: e^(−x) ≥ 1 − x (obvious from the Maclaurin expansion of e^x), one sees that it suffices to take k = (n(n−1)/2) · ln(n) = O(n² log n) repetitions, giving a total running time of O(n⁴ log n).
– A further observation is that the early contractions are the 'safe' ones: if the contractions are stopped while n/√2 (super)nodes still remain, the min. cut survives w. prob. at least:
  P[S_{n − n/√2}] ≥ 1/2 (100)
– The idea is then that, if one only performs the first n − l contractions in each run of Karger's algorithm, the prob.
of a specific min. cut surviving is bigger. Based on this, the Fast Cut algorithm is devised as sketched below:
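Since the pseudocode box is not reproduced here, the following Python sketch shows one plausible reading of Fast Cut (the Karger-Stein scheme): contract down to roughly n/√2 supernodes twice, recurse on each contracted copy, and return the smaller answer. The contract_to helper and the small base case are my own illustrative choices:

import math
import random

def contract_to(edges, n, target, rng):
    """Contract uniformly random edges until only `target` supernodes remain.
    Returns (edges of the contracted multigraph, number of supernodes), relabelled 0..m-1."""
    label = list(range(n))
    live = list(edges)
    supernodes = n
    while supernodes > target and live:
        u, v = rng.choice(live)
        a, b = label[u], label[v]
        for w in range(n):
            if label[w] == b:
                label[w] = a
        live = [(x, y) for (x, y) in live if label[x] != label[y]]
        supernodes -= 1
    names = {old: new for new, old in enumerate(sorted(set(label)))}
    return [(names[label[x]], names[label[y]]) for (x, y) in live], supernodes

def fast_cut(edges, n, rng=random):
    """Recursive Fast Cut sketch: contract to ~n/sqrt(2) supernodes twice, recurse on each copy."""
    if not edges:
        return 0
    if n <= 6:
        # Tiny base case: a few full contraction runs (illustrative choice of repetition count).
        return min(len(contract_to(edges, n, 2, rng)[0]) for _ in range(n * n))
    t = math.ceil(1 + n / math.sqrt(2))
    candidates = []
    for _ in range(2):
        e2, n2 = contract_to(edges, n, t, rng)
        candidates.append(fast_cut(e2, n2, rng))
    return min(candidates)

# Example on the two-triangles graph from the previous sketch (min cut = 1).
E = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(fast_cut(E, n=6))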
– The first call takes O(n²) because the algorithm has to perform Karger's contractions on the full n-node graph.
3 Specifically it should be 1/√1.69722 to be accurate.