
Greedy Algorithms and Data Compression.

Course, Fall 2017


Greedy Algorithms
An optimization problem: given a pair (S, f), where S is a set of
feasible elements and f : S → R is the objective function, find a
u ∈ S which optimizes (maximizes or minimizes) f.
A greedy algorithm always makes a locally optimal choice, in the
(myopic) hope that this choice will lead to a globally optimal
solution.
Greedy algorithms do not always yield optimal solutions, but for
some problems they are very efficient.
Greedy Algorithms

- At each step we make the best choice at the moment, and then
  solve the subproblem that arises afterwards.
- The choice may depend on previous choices, but not on future
  choices.
- At each choice, the algorithm reduces the problem to a smaller
  one.
- A greedy algorithm never backtracks.
Greedy Algorithms

For the greedy strategy to work correctly, the problem under
consideration must have two characteristics:
- Greedy choice property: we can arrive at a global optimum by
  making locally optimal choices.
- Optimal substructure: an optimal solution to the problem
  contains optimal solutions to its subproblems.
Fractional knapsack problem

Fractional Knapsack
INPUT: a set I = {1, . . . , n} of items that can be fractioned, each
item i with weight wi and value vi, and a maximum permissible
weight W.
QUESTION: select whole or partial items to maximize the profit,
within the allowed weight W.
Example.
Item i:    1   2   3
Value v:  60 100 120
Weight w: 10  20  30
W = 28
Fractional knapsack

Greedy for fractional knapsack (I, V, W)

Sort I in decreasing order of vi /wi
Take the maximum amount of the first item
while total weight taken ≤ W do
  Take the maximum amount of the next item
end while

If n is the number of items, the algorithm has a cost of
T(n) = O(n + n log n), dominated by the sorting step.
Example.
Item i:    1   2   3
Value v:  60 100 120
Weight w: 10  20  30
v/w:       6   5   4
As W = 28, we take all 10 units of item 1 and 18 units of item 2.
Correctness?
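
As a sanity check, here is a minimal Python sketch of the greedy above (the function name and the representation of items as (value, weight) pairs are our own, not from the slides):

    def fractional_knapsack(items, W):
        """items: list of (value, weight) pairs; W: maximum allowed weight.
        Greedy: take items in decreasing order of value density v/w."""
        order = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
        profit, remaining = 0.0, W
        for v, w in order:
            if remaining <= 0:
                break
            take = min(w, remaining)     # whole item, or the fraction that fits
            profit += v * (take / w)
            remaining -= take
        return profit

    # The slides' example: values 60, 100, 120, weights 10, 20, 30, W = 28
    print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 28))  # 150.0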
Greedy does not always work

0-1 Knapsack
INPUT: a set I = {1, . . . , n} of items that can NOT be fractioned,
each item i with weight wi and value vi, and a maximum
permissible weight W.
QUESTION: select the items to maximize the profit, within the
allowed weight W.
For example, with W = 50:
Item i:    1   2   3
Value v:  60 100 120
Weight w: 10  20  30
v/w:       6   5   4
Then any solution which includes item 1 is not optimal: the
optimal solution consists of items 2 and 3.
Activity scheduling problems

A set of activities S = {1, 2, . . . , n} is to be processed by a single
processor, according to different constraints.
1. Interval scheduling problem: each i ∈ S has a start time si
   and a finish time fi. Maximize the number of mutually
   compatible activities.
2. Weighted interval scheduling problem: each i ∈ S has a start
   time si, a finish time fi, and a weight wi. Find a set A of
   mutually compatible activities maximizing Σi∈A wi.
3. Job scheduling problem (lateness minimisation): each i ∈ S
   has a processing time ti (it could start at any time si) and a
   deadline di; define the lateness of i as Li = max{0, (si + ti) − di}.
   Find a schedule (the si for all the tasks) such that no two tasks
   are processed at the same time and the maximum lateness is
   minimized.
The interval scheduling problem

Activity Selection Problem

INPUT: a set S = {1, 2, . . . , n} of activities to be processed by a
single resource. Each activity i has a start time si and a finish time
fi, with fi > si.
QUESTION: maximize the number of mutually compatible
activities, where activities i and j are compatible if
[si, fi) ∩ [sj, fj) = ∅.
In the general framework, the feasible set is {A | A ⊆ S, A is
compatible} and f(A) = |A|.
Example.

Activity (A): 1 2 3 4 5 6 7 8
Start (s):    3 2 2 1 8 6 4 7
Finish (f):   5 5 3 5 9 9 5 8

[Diagram: the eight activities drawn as intervals on a time line from 1 to 10.]
To apply the greedy technique to a problem, we must take into
consideration the following:
- a local criterion to allow the selection,
- a condition to determine if a partial solution can be
  completed,
- a procedure to test that we have the optimal solution.
The Activity Selection problem.
Given a set A of activities, we wish to maximize the number of
compatible activities.

Activity selection (A)
Sort A by increasing order of fi
Let a1, a2, . . . , an be the resulting sorted list of activities
S := {a1}
j := 1 {pointer in the sorted list}
for i = 2 to n do
  if si ≥ fj then
    S := S ∪ {ai} and j := i
  end if
end for
return S.
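
A minimal Python sketch of this algorithm (our own rendering of the pseudocode above; activities are (start, finish) pairs):

    def activity_selection(activities):
        """activities: list of (start, finish) pairs.
        Returns a maximum-size set of compatible activities (as indices)."""
        # Sort indices by increasing finish time (the greedy criterion)
        order = sorted(range(len(activities)), key=lambda i: activities[i][1])
        selected, last_finish = [], float('-inf')
        for i in order:
            s, f = activities[i]
            if s >= last_finish:       # compatible with the last chosen activity
                selected.append(i)
                last_finish = f
        return selected

    # The slides' example (activities 1..8, stored 0-indexed)
    acts = [(3, 5), (2, 5), (2, 3), (1, 5), (8, 9), (6, 9), (4, 5), (7, 8)]
    print([i + 1 for i in activity_selection(acts)])  # [3, 1, 8, 5]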

Sorted by increasing fi, the activities are 3, 1, 2, 4, 7, 8, 5, 6
(with fi : 3, 5, 5, 5, 5, 8, 9, 9) ⇒ SOL: {3, 1, 8, 5}

[Diagram: the greedy scan over the intervals on the time line from 1 to 10.]
Notice: in the activity selection problem we are maximizing the
number of activities, independently of the occupancy of the
resource under consideration. For example, in the instance above,
solution {3, 1, 8, 5} is as valid as {3, 7, 8, 5}. If we were asking for
maximum occupancy, {4, 6} (occupying 7 time units) would be a
solution.
How would you modify the previous algorithm to deal with the
maximum occupancy problem?
Theorem
The previous algorithm produces an optimal solution to the
activity selection problem: there is an optimal solution that
includes the activity with the earliest finishing time.
Proof.
Given A = {1, . . . , n} sorted by finishing time, we must show there
is an optimal solution that begins with activity 1. Let
S = {k, . . . , m} be an optimal solution. If k = 1 we are done.
Otherwise, define B = (S − {k}) ∪ {1}. As f1 ≤ fk, the activities
in B are disjoint. As |B| = |S|, B is also an optimal solution.
Moreover, if S is an optimal solution to A containing activity 1,
then S′ = S − {1} is an optimal solution for
A′ = {i ∈ A | si ≥ f1}. Therefore, after each greedy choice we are
left with an optimization problem of the same form as the original.
By induction on the number of choices, the greedy strategy
produces an optimal solution.
Notice the optimal substructure of the problem: if an optimal
solution S to a problem includes ak, then the partial solution
obtained by excluding ak from S must also be optimal in its
corresponding domain.
Greedy does not always work.
Weighted Activity Selection Problem
INPUT: a set S = {1, 2, . . . , n} of activities to be processed by a
single resource. Each activity i has a start time si and a finish time
fi, with fi > si, and a weight wi.
QUESTION: find a set A of mutually compatible activities that
maximizes Σi∈A wi.

A natural greedy attempt:
S := ∅
sort W = {wi} by decreasing value
choose the interval m = (lm, um) with maximum weight wm
add m to S
remove all intervals overlapping with m
while there are intervals left do
  repeat the greedy procedure
end while
return S
Correctness?
Greedy does not always work.

The previous greedy does not always solve the problem!

[Diagram: intervals (2, 5) and (5, 9), each of weight 6, and the
interval (1, 10) of weight 10, on a time line from 1 to 10.]

The algorithm chooses the interval (1, 10) with weight 10, but the
optimal solution consists of the intervals (2, 5) and (5, 9), with
total weight 12.
Job scheduling problem
Also known as the Lateness minimisation problem.
We have a single resource and n requests to use the resource, each
request i taking time ti.
In contrast to the previous problem, each request has a deadline di
instead of a start and finish time. The goal is to schedule the
tasks so as to minimize, over all the requests, the maximal amount
of time by which a request exceeds its deadline.
Minimize Lateness
- We have a single processor.
- We have n jobs such that job i:
  - requires ti units of processing time,
  - has to be finished by time di.
- Lateness of i: Li = max{0, fi − di}.

     i:  1  2  3  4  5  6
    ti:  1  2  2  3  3  4
    di:  9  8 15  6 14  9

Goal: schedule the jobs to minimize the maximum lateness,
i.e., we must assign a starting time si to each job i so as to
minimize maxi Li.

[Diagram: the jobs scheduled in index order on the time line 0..15;
the latenesses of jobs 1..6 are 0, 0, 0, 2, 0, 6.]
Minimize Lateness
Schedule the jobs according to some ordering:

(1.-) Sort in increasing order of ti: process jobs with short
processing times first.

     i:  1  2
    ti:  1  5
    di:  6  5

Here job 1 is processed first and job 2 finishes at time 6 with
lateness 1, while processing job 2 first gives lateness 0.

(2.-) Sort in increasing order of di − ti: process first the jobs
with less slack time.

          i:  1   2
         ti:  1  10
         di:  2  10
    di − ti:  1   0

In this case job 2 is processed first, which does not minimise
lateness.
Process urgent jobs first

(3.-) Sort in increasing order of di (Earliest Deadline First).

LatenessA ({i, ti, di})
Sort the jobs by increasing order of di: {d1, d2, . . . , dn}
Rearrange (renumber) the jobs as i : 1, 2, . . . , n
t := 0
for i = 1 to n do
  si := t; fi := t + ti
  Assign job i to the interval [si, fi]
  t := t + ti
end for
return S = {[s1, f1], . . . , [sn, fn]}.

On the previous instance the sorted order is 4, 2, 1, 6, 5, 3:

     i:  1  2  3  4  5  6
    ti:  1  2  2  3  3  4
    di:  9  8 15  6 14  9
position in sorted order: 3  2  6  1  5  4

[Diagram: the EDF schedule on the time line 0..15, with deadlines
6, 8, 9, 9, 14, 15 in sorted order; the latenesses are 0, 0, 0, 1, 0, 0.]
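
A minimal Python sketch of LatenessA (our own rendering; a job is a (ti, di) pair, and the returned schedule keeps the input order of the jobs):

    def minimize_lateness(jobs):
        """jobs: list of (t, d) pairs (processing time, deadline).
        Earliest-Deadline-First: returns the intervals [s_i, f_i], indexed
        as in the input, and the maximum lateness of the schedule."""
        order = sorted(range(len(jobs)), key=lambda i: jobs[i][1])  # by deadline
        schedule = [None] * len(jobs)
        t, max_lateness = 0, 0
        for i in order:
            ti, di = jobs[i]
            schedule[i] = (t, t + ti)        # job i runs in [t, t + ti]
            max_lateness = max(max_lateness, t + ti - di)
            t += ti
        return schedule, max_lateness

    # The slides' instance, jobs 1..6 as (ti, di):
    jobs = [(1, 9), (2, 8), (2, 15), (3, 6), (3, 14), (4, 9)]
    print(minimize_lateness(jobs))
    # maximum lateness 1 (job 6 finishes at time 10, one unit late)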
Complexity and idle time

Time complexity
Running time of the algorithm beyond the comparison sorting: O(n).
Total running time: O(n lg n).
Idle steps
From an optimal schedule with idle steps, we can always eliminate
the gaps to obtain another optimal schedule:

[Diagram: a schedule on the time line 0..8 with a gap, and the same
schedule with the gap removed.]

There exists an optimal schedule with no idle steps.


Inversions and exchange argument
A schedule has an inversion if i is scheduled before j with dj < di.

[Diagram: jobs i and j with dj < di; exchanging them turns the
schedule (. . . , i, j, . . .) into (. . . , j, i, . . .), where the new finish
time of i equals the old finish time fj of j.]

Lemma
Exchanging two adjacent inverted jobs reduces the number of
inversions by 1 and does not increase the maximum lateness.
Proof Let L be the lateness before the exchange and L′ the
lateness after the exchange; let Li, Lj, L′i, L′j be the corresponding
quantities for i and j. Clearly L′j ≤ Lj, since j is moved earlier.
Notice that f′i = fj; using the fact that dj < di,
  L′i = f′i − di = fj − di < fj − dj = Lj.
Therefore the swap does not increase the maximum lateness of the
schedule. 2
Correctness of LatenessA

Notice that the output S produced by LatenessA has no inversions
and no idle steps.

Theorem
Algorithm LatenessA returns an optimal schedule S.
Proof
Assume Ŝ is an optimal schedule with the minimal number of
inversions (and no idle steps).
If Ŝ has 0 inversions then Ŝ = S.
If the number of inversions in Ŝ is > 0, let i − j be an adjacent
inversion. Exchanging i and j does not increase the maximum
lateness and decreases the number of inversions, contradicting the
choice of Ŝ.
Therefore, max lateness of S ≤ max lateness of Ŝ. 2
Network construction: Minimum Spanning Tree
- We have a set of locations V = {v1, . . . , vn};
- we want to build a communication network on top of them,
- such that any vi can communicate with any vj;
- for any pair (vi, vj) there is a cost w(vi, vj) of building a
  direct link;
- if E is the set of all n(n − 1)/2 possible edges, we want to
  find a subset T(E) ⊆ E s.t. (V, T(E)) is connected and
  minimizes Σe∈T(E) w(e).
Minimum Spanning Tree (MST).

INPUT: an edge-weighted graph G = (V, E), |V| = n, with
w(e) ∈ R for all e ∈ E.
QUESTION: find a tree T with V(T) = V and E(T) ⊆ E that
minimizes w(T) = Σe∈E(T) w(e).

[Diagram: the running example, a weighted graph on vertices
a, . . . , h with edge weights 2, 3, 4, 5, 6, 8, 9, 10, 14, 15, shown next
to one of its spanning trees.]
Some definitions

Given G = (V, E):
A path is a sequence of consecutive edges. A cycle is a path with
no repeated vertices other than the one where it starts and ends.
A cut is a partition of V into S and V − S.
The cut-set of a cut is the set of edges with one end in S and the
other in V − S.

[Diagram: the example graph with a cut (S, V − S) highlighted.]
Overall strategy

Given an MST T in G with distinct edge weights, T has the
following properties:
- Cut property:
  e ∈ T ⇔ e is the lightest edge across some cut in G.
- Cycle property:
  e ∉ T ⇔ e is the heaviest edge on some cycle in G.

The MST algorithms are methods for ruling edges in or out of T.

The ⇐ implication of the cut property will yield the blue (include)
rule, which allows us to include in T a min-weight edge of some cut.
The ⇒ implication of the cycle property will yield the red (exclude)
rule, which allows us to exclude from T a max-weight edge of some
cycle.
The cut rule (Blue rule)
The MST problem has the property of optimal substructure:
given an optimal MST T, removing an edge e yields subtrees T1
and T2 which are optimal for their corresponding subgraphs.

Theorem (The cut rule)

Given G = (V, E), w : E → R, let T be an MST of G and let
(S, V − S) be a cut. Let e = (u, v) be a min-weight edge in G
connecting S to V − S. Then e ∈ T.
The edges incorporated into the solution by this rule are said to be
blue.
Proof.
Assume e ∉ T. Consider the path from u to v in T. Replacing the
first edge on this path that crosses the cut by e must give a
spanning tree of equal or less weight.

[Diagram: the example graph; the path from u to v in T, with the
crossing edge replaced by e.]
The cycle rule (Red rule)

The MST problem has the property that the optimal solution
cannot contain a cycle.

Theorem (The cycle rule)

Given G = (V, E), w : E → R, let C be a cycle in G. The edge
e ∈ C with the greatest weight cannot be part of the optimal
MST T.
The edges discarded by this rule are said to be red.
Proof.
Let C be a cycle spanning vertices {vi, . . . , vl}; removing the
max-weight edge of C from any spanning subgraph containing C
gives a better solution.

[Diagram: the example graph with the cycle C spanning {a, c, d, f}.]
Greedy for MST

The Min. Spanning Tree problem has the optimal substructure
property, so we can apply a greedy strategy.
Robert Tarjan: Data Structures and Network Algorithms, SIAM,
1984.
Blue rule: given a cut-set between S and V − S with no blue
edges, select from the cut-set a non-colored edge with min weight
and paint it blue.
Red rule: given a cycle C with no red edges, select a non-colored
edge in C with max weight and paint it red.

Greedy scheme:
Given G with |V(G)| = n, apply the red and blue rules until
having n − 1 blue edges; those form the MST.

[Diagram: a run of the two rules on the example graph, coloring
one edge at each step.]
Greedy for MST : Correctness

Theorem
There exists an MST T containing all the blue edges and none of
the red ones. Moreover, the algorithm finishes and finds an MST.
Sketch of proof: induction on the number of applications of the
blue and red rules. The base case (no edges colored) is trivial. The
induction step is the same as in the proofs of the cut and cycle
rules. Moreover, if some edge e remains non-colored: if its ends are
in different blue trees, the blue rule applies; otherwise e closes a
cycle and the red rule colors it red. 2

We need implementations for the algorithm! The ones we present
use only the blue rule.
A short history of MST implementations

There has been extensive work to obtain the most efficient
algorithm to find an MST in a given graph:
- O. Borůvka gave the first greedy algorithm for the MST in 1926. V.
  Jarník gave a different greedy for the MST in 1930, which was re-discovered
  by R. Prim in 1957. In 1956 J. Kruskal gave yet another greedy algorithm
  for the MST. All those algorithms run in O(m lg n).
- Fredman and Tarjan (1984) gave an O(m log∗ n) algorithm, introducing a
  new data structure for priority queues, the Fibonacci heap. Recall that
  log∗ n, the iterated logarithm, is the number of times lg must be applied
  to n to reach a value ≤ 1.
- Gabow, Galil, Spencer and Tarjan (1986) improved Fredman-Tarjan to
  O(m log(log∗ n)).
- Karger, Klein and Tarjan (1995) gave an O(m) randomized algorithm.
- In 1997 B. Chazelle gave an O(mα(n)) algorithm, where α(n) is a very
  slowly growing function, the inverse of the Ackermann function.
Basic algorithms
Both classical algorithms use the greedy scheme:
- Jarník-Prim (serial, centralized): starting from a vertex v,
  grows T by adding each time the lightest edge leaving the tree,
  using the blue rule. It uses a priority queue (usually a heap) to
  store the candidate edges and retrieve the lightest one.
- Kruskal (serial, distributed): considers every edge and grows a
  forest F, using the blue and red rules to include or discard each
  edge e. The insight of the algorithm is to consider the edges in
  order of increasing weight, which makes the complexity of
  Kruskal's algorithm dominated by the O(m lg m) sorting step.
  At the end F becomes T. The efficient implementation of the
  algorithm uses the union-find data structure.
Jarník-Prim vs. Kruskal

[Cartoon: Jarník-Prim as "how the blue man can spread his
message to everybody" (one tree growing from a source); Kruskal
as "how to establish a min-distance-cost network among all men"
(a forest of merging components).]
Jarnı́k - Prim greedy algorithm.

V. Jarník, 1930; R. Prim, 1957

Greedy on vertices with a priority queue.
Starting from an arbitrary node r, at each step grow the MST by
adding an edge of minimum weight among those leaving the tree,
i.e. one which does not form a cycle.

MST (G, w, r)
T := ∅
for i = 1 to |V| − 1 do
  Let e ∈ E : e touches T, has min weight, and does not form a
  cycle
  T := T ∪ {e}
end for

Use a priority queue to choose the min-weight edge connected to
the tree already formed.
For every v ∈ V − T, let k[v] = minimum weight of any edge
connecting v to any vertex in T. Start with k[v] = ∞ for all v.
For v ∈ T, let π[v] be the parent of v. During the algorithm
T = {(v, π[v]) : v ∈ V − {r} − Q},
where r is the arbitrary starting vertex and Q is a min-priority
queue storing k[v]. The algorithm finishes when Q = ∅.
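
A minimal Python sketch of Jarník-Prim (our own rendering, as a "lazy" variant: instead of decreasing the keys k[v] in the queue, it pushes candidate edges and skips stale entries; the O(|E| lg |V|) bound is the same):

    import heapq

    def prim_mst(n, adj, r=0):
        """n: number of vertices 0..n-1; adj[u]: list of (weight, v) pairs.
        Grows the MST from r with a min-priority queue."""
        in_tree = [False] * n
        mst_weight, mst_edges = 0, []
        pq = [(0, r, -1)]                 # (key k[v], vertex v, parent pi[v])
        while pq:
            k, v, parent = heapq.heappop(pq)
            if in_tree[v]:
                continue                  # stale entry: v was already added
            in_tree[v] = True
            mst_weight += k
            if parent >= 0:
                mst_edges.append((parent, v))
            for w, u in adj[v]:
                if not in_tree[u]:
                    heapq.heappush(pq, (w, u, v))
        return mst_weight, mst_edges

    # Tiny usage example: triangle a-b (1), b-c (2), a-c (3), as vertices 0,1,2
    adj = [[(1, 1), (3, 2)], [(1, 0), (2, 2)], [(3, 0), (2, 1)]]
    print(prim_mst(3, adj))  # (3, [(0, 1), (1, 2)])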
Example.

[Diagram: eight snapshots of Jarník-Prim on the example graph,
adding one blue edge at each step.]

w(T) = 52

Time: depends on the implementation of Q.

Q an unsorted array: T(n) = O(|V|²);
Q a heap: T(n) = O(|E| lg |V|);
Q a Fibonacci heap: T(n) = O(|E| + |V| lg |V|).
Kruskal’s greedy algorithm.

J. Kruskal, 1956
Similar to Jarník-Prim, but chooses minimum-weight edges
without keeping the selected subgraph connected.
MST (G, w)
Sort E by increasing weight
T := ∅
for i = 1 to |V| − 1 do
  Let e ∈ E be the minimum-weight edge that does not form a
  cycle with T
  T := T ∪ {e}
end for
We have an O(m lg m) cost from sorting the edges.
To implement the cycle tests (adding edges to T and discarding
others) efficiently, we use the union-find data structure.
Example.

[Diagram: six snapshots of Kruskal on the example graph, adding
the edges in order of increasing weight and skipping those that
would close a cycle.]
Union-find: A DS to implement Kruskal

Notice that Kruskal evolves by building clumps of connected nodes
without cycles and merging the clumps into larger clumps, taking
care that no cycles appear.
In a forest, connectivity is an equivalence relation: two nodes are
related if there is a (unique) path between them.
Recall that an equivalence relation R on a set S satisfies, for all
x, y, z ∈ S:
- Reflexive: xRx,
- Symmetric: if xRy then yRx,
- Transitive: if xRy and yRz then xRz.
For the MST on an undirected G = (V, E) and u, v ∈ V: uRv iff
there is a path between u and v in the current forest.
R partitions the elements into equivalence classes.
Union-find: A DS to implement Kruskal
In the case of the MST on G = (V, E), notice that uRv ("there is
a path between u and v") is indeed an equivalence relation:
- Reflexive: there is a path of length 0 between x and itself,
- Symmetric: because G is undirected,
- Transitive: if u is connected to v and v to w, then u is
  connected to w.
The relation partitions V into equivalence classes (connected
components without cycles).
Basic idea of union-find: construct and maintain efficiently the
equivalence classes of the elements of a set.
Given G, for every v ∈ V:
  Sv = Make-set(v)
  S′ = Union(Su, Sv)
  Find(w) (in S)

[Diagram: a small weighted graph whose vertices are being grouped
into classes.]
Union-Find, aka Disjoint-sets
B. Galler, M. Fischer: An improved equivalence algorithm. ACM
Comm., 1964.
A disjoint-set data structure maintains a collection {S1, . . . , Sk}
of disjoint dynamic sets, each set identified by a representative.
Union-find supports three operations on partitions of a set:
Make-set(x): creates a new set containing x.
Union(x, y): merges the sets containing x and y.
Find(x): finds the set x belongs to
  (i.e. given x, finds its representative).
Union-find manipulates all types of objects: names, integers,
web pages, computer IDs, . . .
Union-find can be used in a myriad of applications, for example to
keep subsets of similar integers in a set of integers.
Union-find implementation: First idea

Given {S1, . . . , Sk}, use doubly linked lists:
- Each set Si is represented by a doubly linked list.
- The representative of Si is defined to be the element at the
  head of the list representing the set.

[Diagram: a list x → y → · · · → w representing Si.]

- Make-set(x): initializes x as a lone list. Worst case Θ(1).
- Union(x, y): goes to the tail of Sx and points it to the head of
  Sy, concatenating both lists. Worst case Θ(n).
- Find(x): goes left from x to the head of Sx. Worst case Θ(n).

Can we do it more efficiently?
Union-find implementation: Forest of Trees

Represent each set as a rooted tree of elements:
- The root contains the representative.
- Make-set(x): Θ(1).
- Find(x): find the root of the tree containing x. Θ(height).
- Union(x, y): make the root of one tree point to the root of the
  other.

[Diagram: the tree representation of {A}, {B, D}, {C, G, F, E, H}.]
A better implementation: Link by rank

In this course: rank(x) = height of the subtree rooted at x.
Any singleton element has rank = 0.
Inductively, as we join trees, the rank of the root may increase.

[Diagram: trees of rank 0, 1, 2 and 3, with the rank of every node.]
Link by rank
Union rule: link the root of the smaller-rank tree to the root of the
larger-rank tree.
In case the roots of both trees have the same rank, choose
arbitrarily and increase by 1 the rank of the winner.
Except for the roots, a node never changes rank during the
process.

[Diagram: Union(D, F) linking the rank-1 tree containing B, D
under the rank-2 tree rooted at H.]

Union(x, y): climbs to the roots of the trees containing x and y,
and merges the sets by making the tree with smaller rank a
subtree of the other tree.
This takes Θ(height) steps.
Link by rank

Maintain an integer rank for each node, initially 0. Link the root of
smaller rank to the root of larger rank; in case of a tie, increase the
rank of the new root by 1.

Make-set(x)
  parent(x) = x
  rank(x) = 0

Find(x)
  while x ≠ parent(x) do
    x = parent(x)
  end while
  return x

Union(x, y)
  rx = Find(x); ry = Find(y)
  if rx = ry then
    stop
  else if rank(rx) > rank(ry) then
    parent(ry) = rx
  else if rank(rx) < rank(ry) then
    parent(rx) = ry
  else
    parent(rx) = ry
    rank(ry) = rank(ry) + 1
  end if
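
A minimal Python rendering of the link-by-rank pseudocode above (exactly as in the slides, i.e. without path compression; the class name is our own):

    class UnionFind:
        """Disjoint sets with link-by-rank."""
        def __init__(self, n):
            self.parent = list(range(n))   # Make-set for elements 0..n-1
            self.rank = [0] * n

        def find(self, x):
            while x != self.parent[x]:     # climb to the root (representative)
                x = self.parent[x]
            return x

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return False               # already in the same set
            if self.rank[rx] > self.rank[ry]:
                self.parent[ry] = rx       # link smaller-rank root under larger
            elif self.rank[rx] < self.rank[ry]:
                self.parent[rx] = ry
            else:
                self.parent[rx] = ry       # tie: pick one, increase its rank
                self.rank[ry] += 1
            return True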
Example construction Union-find

[Diagram: a union-find forest over {A, . . . , K} evolving through
four stages, from eleven singletons of rank 0 to a single tree whose
root H has rank 3.]
Properties of Link by rank

P1.- If x is not a root then rank(x) < rank(parent(x)).
P2.- If parent(x) changes, then rank(parent(x)) increases.
P3.- Any root of rank k has ≥ 2^k descendants.
P4.- The highest rank of a root is ≤ ⌊lg n⌋.
P5.- For any r ≥ 0, there are ≤ n/2^r nodes with rank r.
Properties of Link by rank

Theorem
Using link-by-rank, each Union(x, y) and Find(x) operation takes
O(lg n) steps.
Proof The number of steps for each operation is bounded by the
height of the tree, which is O(lg n). 2

Theorem
Starting from an empty data structure with n disjoint singleton
sets, link-by-rank performs any intermixed sequence of m ≥ n Find
and n − 1 Union operations in O(m lg n) steps.
Back to Kruskal

MST (G, w)
Sort E by increasing weight: {e1, . . . , em}
T := ∅
for all v ∈ V do
  Make-set(v)
end for
for i = 1 to m do
  Choose ei = (u, v) in order from E
  if Find(u) ≠ Find(v) then
    T := T ∪ {ei}
    Union(u, v)
  end if
end for
The cost is dominated by the O(m lg m) sorting.
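
A self-contained Python sketch of this algorithm (our own rendering, with the link-by-rank union-find inlined):

    def kruskal_mst(n, edges):
        """n: vertices 0..n-1; edges: list of (w, u, v) triples.
        Returns (total weight, MST edges)."""
        parent, rank = list(range(n)), [0] * n
        def find(x):
            while x != parent[x]:
                x = parent[x]
            return x
        mst_weight, mst_edges = 0, []
        for w, u, v in sorted(edges):        # increasing weight: O(m lg m)
            ru, rv = find(u), find(v)
            if ru != rv:                     # Find(u) != Find(v): no cycle
                if rank[ru] < rank[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru              # link smaller-rank root under larger
                if rank[ru] == rank[rv]:
                    rank[ru] += 1
                mst_weight += w
                mst_edges.append((u, v))
        return mst_weight, mst_edges

    # Tiny usage example: triangle 0-1 (1), 1-2 (2), 0-2 (3)
    print(kruskal_mst(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]))  # (3, [(0, 1), (1, 2)])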
Greedy and approximation algorithms

Many times the greedy strategy yields a feasible solution whose
value is near that of the optimum solution.
In many practical cases, when finding the global optimum is hard,
it is sufficient to find a good approximation to it.
Given an optimization problem (maximization or minimization),
an optimal algorithm computes the best output OPT(e) on any
instance e of size n.
An approximation algorithm for the problem computes any valid
output.
We want to design approximation algorithms that are fast and, in
the worst case, get an output as close as possible to OPT(e).
Greedy and Approximations algorithms

Given an optimization problem, an α-approximation algorithm Apx
computes, on any instance e, an output Apx(e) whose cost is
within an α ≥ 1 factor of OPT(e):

  1/α ≤ Apx(e)/OPT(e) ≤ α.

α is called the approximation ratio.

Notice, α measures the factor by which the output of Apx deviates
from OPT(e) on a worst-case input: the first ≤ matters for
maximization and the second ≤ for minimization.
An easy example: Vertex cover

Given a graph G = (V, E) with |V| = n, |E| = m, find a
minimum set of vertices S ⊆ V that covers every edge of G.

GreedyVC (G = (V, E))
E′ = E, S = ∅
while E′ ≠ ∅ do
  Pick e ∈ E′, say e = (u, v)
  S = S ∪ {u, v}
  E′ = E′ − ({(u, v)} ∪ {edges incident to u, v})
end while
return S.

[Diagram: a run on an example graph with vertices 1, . . . , 7.]
An easy example: Vertex cover
Theorem
The algorithm GreedyVC runs in O(m + n) steps. Moreover,
|Apx(e)| ≤ 2|OPT(e)|.

Proof.
We use induction to prove |Apx(e)| ≤ 2|OPT(e)|. Notice that for
every pair {u, v} we add to Apx(e), either u or v is in OPT(e).
Base: if E = ∅ then |Apx(e)| = |OPT(e)| = 0.
Hypothesis: |Apx(e) − {u, v}| ≤ 2|OPT(e) − {u, v}|. Then,

|Apx(e)| = |Apx(e) − {u, v}| + 2 ≤ 2|OPT(e) − {u, v}| + 2
         ≤ 2(|OPT(e)| − 1) + 2 ≤ 2|OPT(e)|.

The decision problem for Vertex Cover is NP-complete. Moreover,
unless P = NP, vertex cover cannot be approximated within any
factor α < 1.36.
Clustering problems

Clustering: the process of finding interesting structure in a set of
data.
Given a collection of objects, organize them into coherent groups
with respect to some metric (a distance function d(·, ·)).
Recall that if d is a metric: d(x, x) = 0; d(x, y) > 0 for x ≠ y;
d(x, y) = d(y, x); and d(x, z) ≤ d(x, y) + d(y, z).
k-clustering problem: given a set of points X = {x1, x2, . . . , xn}
together with a distance function on X and given a k > 0, we want
to partition X into k disjoint subsets, a k-clustering, so as to
optimize some objective function (depending on d).
In this lecture we use the Euclidean distance on points in Z².
The k-Center clustering problem

Given as input a set X = {x1, . . . , xn}, with distances
D = {d(xi, xj)}, and a given integer k:
find a partition of X into k clusters {C1, . . . , Ck} minimizing the
maximum cluster diameter, maxj maxx,y∈Cj d(x, y).
Each ball Ci will be determined by a center ci and a radius r. Let
C = {c1, . . . , ck} be the set of centers and r = r(C).
Define C to be an r-cover for X if ∀x ∈ X, ∃cj ∈ C s.t.
d(x, cj) ≤ r.
The k-Center clustering problem
Equivalent statement of the problem: given as input (X, D, k),
select the centers C = {c1, . . . , ck} and r = r(C) such that the
resulting {C1, . . . , Ck} is an r-cover for X, with r as small as
possible.
Formal definition of k-center: given points X ⊂ Z² and k ∈ Z,
compute a set C = {c1, . . . , ck} ⊂ X of centers that minimizes the
cover radius maxx∈X d(x, C), where d(x, C) = minc∈C d(x, c).
The k-Center clustering problem: Complexity

For k > 2, the decision version of the k-center clustering problem
is NP-complete.
There is a deterministic algorithm working in O(n^k) time. (Can
you design one?)
For k = 1 the problem is to find the smallest-radius disk enclosing
the point set, which can be solved in O(n lg n) time. (How?)
The k-Center clustering problem: Greedy algorithm

The algorithm iterates k times; at each iteration it chooses a new
center, adds a new cluster, and refines the radius ri of the cluster
balls. T. Gonzalez (1985)
1. Choose an arbitrary x ∈ X and make c1 = x. Let C1 = {c1}.
2. For all xi ∈ X compute d(xi, c1).
3. Choose c2 = the x ∈ X that maximizes d(x, c1).
4. Let r1 = d(c1, c2) and C2 = {c1, c2}.
5. For i = 2 to k − 1:
   5.1 At iteration i + 1: let ci+1 be the element of X \ Ci that
       maximizes the minimum distance to Ci.
   5.2 Let Ci+1 = {c1, c2, . . . , ci+1} and ri = minj≤i d(ci+1, cj).
6. Output the centers C = {c1, . . . , ck} and rk.
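
A minimal Python sketch of Gonzalez's greedy (our own rendering; math.dist requires Python 3.8+), using the distance-update idea (∗) made explicit on the complexity slide below:

    import math

    def gonzalez_k_center(points, k):
        """points: list of (x, y) tuples. Returns (centers, cover radius).
        Greedy: repeatedly add the point farthest from the current centers."""
        n = len(points)
        centers = [points[0]]                  # step 1: arbitrary first center
        d = [math.dist(p, centers[0]) for p in points]   # d_i[x] = d(x, C_i)
        for _ in range(k - 1):
            far = max(range(n), key=lambda i: d[i])  # farthest point so far
            centers.append(points[far])
            # rule (*): d(x, C_{i+1}) = min(d(x, C_i), d(x, c_{i+1}))
            d = [min(d[i], math.dist(points[i], points[far])) for i in range(n)]
        return centers, max(d)                 # max(d) is the cover radius r(C)

Each of the k rounds does O(n) distance updates, giving the O(kn) bound discussed below.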
Greedy algorithm: Example

Given X, k = 3 and the O(n²) pairwise distances D:

[Diagram: three iterations of the greedy on a point set, showing the
chosen centers x1, x2 and the radii r1, r2, r.]
Greedy algorithm: Complexity

We have the set X of points and all their O(n²) distances. We assume a
data structure that keeps the set of distances D ordered, so that we can
quickly retrieve the distance between any two points of X. (How?)
- At each step i we have to compute the distance from every x ∈ X to
  the current centers c ∈ Ci−1, and choose the new ci and ri. But
- for each x ∈ X we can maintain
  di[x] = d(x, Ci) = min{d(x, Ci−1), d(x, ci)} = min{di−1[x], d(x, ci)}   (∗)
- Therefore at each step, to compute ri we only need to update (∗).
- At iteration i, choosing ci and computing ri takes O(n) steps;
  therefore the complexity of the greedy algorithm is O(kn) steps.
Approximation to the k-center problem

Theorem
The previous greedy algorithm is an approximation algorithm for
the k-center problem with approximation ratio α = 2
(i.e. it returns a set C s.t. r(C) ≤ 2r(C∗), where C∗ is an optimal
set of k centers).
Proof
Let C∗ = {c∗1, . . . , c∗k} and r∗ be the optimal values, and let
C = {c1, . . . , ck} and r be the values returned by the algorithm.
We want to prove r ≤ 2r∗.

Case 1: every optimal ball C∗j covers at least one ci.
Every x ∈ X lies in some C∗j, and that ball contains some center
ci; by the triangle inequality, d(x, ci) ≤ 2r∗.
Proof cont.

Case 2: at least one C∗j does not cover any center of C. Then, by
pigeonhole, some C∗l covers at least two centers ci and cj, so
d(ci, cj) ≤ 2r∗.
Wlog assume the algorithm chooses cj at iteration j, and that ci
was selected as a center in a previous iteration; then d(ci, cj) ≥ rj.
Moreover, notice that r1 ≥ r2 ≥ . . . ≥ rk = r,
therefore r ≤ rj ≤ d(ci, cj) ≤ 2r∗. 2
Data Compression

INPUT: a text T over a finite alphabet Σ.
QUESTION: represent T with as few bits as possible.

The goal of data compression is to reduce the time to transmit
large files, and to reduce the space to store them.
If we use a variable-length encoding, we need a system that is
easy to encode and decode.
Example.

AAACAGTTGCAT · · · GGTCCCTAGG   (a text of 130.000.000 symbols)

- Fixed-length encoding: A = 00, C = 01, G = 10 and T = 11.
  Needs 260 Mbits to store.
- Variable-length encoding: if A appears 7 × 10⁸ times, C
  appears 3 × 10⁶ times, G 2 × 10⁸ times and T 37 × 10⁷ times, it
  is better to assign a shorter string to A and a longer one to C.
Prefix property

Given a set of symbols Σ, a prefix code is a map φ : Σ → {0, 1}⁺
(symbols to strings of bits) such that for distinct x, y ∈ Σ, φ(x) is
not a prefix of φ(y).
If φ(A) = 1 and φ(C) = 101, then φ is not a prefix code.
φ(A) = 1, φ(T) = 01, φ(G) = 000, φ(C) = 001 is a prefix code.
Prefix codes are easy to decode (left-to-right):
000101100110100000101
= 000 · 1 · 01 · 1 · 001 · 1 · 01 · 000 · 001 · 01
=  G    A   T    A   C    A   T    G     C    T
Prefix tree.
Represent an encoding with the prefix property as a binary tree,
the prefix tree:
A prefix tree T is a binary tree with the following properties:
- one leaf per symbol,
- the left edge is labeled 0 and the right edge is labeled 1,
- the labels on the path from the root to a leaf spell the code of
  that leaf.

[Diagram: the prefix tree for Σ = {A, T, G, C} with the code above:
A = 1, T = 01, G = 000, C = 001.]
Frequency.

To find an efficient code for a given text S on Σ, with |S| = n,
first we must find the frequencies of the alphabet symbols:
∀x ∈ Σ, define the frequency

  f(x) = (number of occurrences of x in S) / n.

Notice: Σx∈Σ f(x) = 1.
Given a prefix code φ, what is the total length of the encoding?
The encoding length of S is

  B(S) = Σx∈Σ n f(x) |φ(x)| = n · Σx∈Σ f(x) |φ(x)| = n · α.

Given φ, α = Σx∈Σ f(x) |φ(x)| is the average number of bits
required per symbol.
In terms of the prefix tree T of φ, the codeword length |φ(x)| is
also the depth of x in T; let us denote it by dx(T).
Let B(T) = Σx∈Σ f(x) dx(T).
Example.
Let Σ = {a, b, c, d, e} and let S be a text over Σ with
f(a) = .32, f(b) = .25, f(c) = .20, f(d) = .18, f(e) = .05.
If we use a fixed-length code we need ⌈lg 5⌉ = 3 bits per symbol.
Consider the prefix code φ1:
φ1(a) = 11, φ1(b) = 01, φ1(c) = 001, φ1(d) = 10, φ1(e) = 000

[Diagram: the prefix tree of φ1, with leaves b, d, a at depth 2 and
e, c at depth 3.]

α = .32 · 2 + .25 · 2 + .20 · 3 + .18 · 2 + .05 · 3 = 2.25

On average, φ1 reduces the bits per symbol over the fixed-length
code from 3 to 2.25, about 25%.
Is that the maximum reduction?
Consider the prefix code φ2:
φ2(a) = 11, φ2(b) = 10, φ2(c) = 01, φ2(d) = 001, φ2(e) = 000

[Diagram: the prefix tree of φ2, with leaves c, b, a at depth 2 and
e, d at depth 3.]

α = .32 · 2 + .25 · 2 + .20 · 2 + .18 · 3 + .05 · 3 = 2.23

Is that the best (the maximal compression)?
Optimal prefix code.

Given a text, an optimal prefix code is a prefix code that minimizes
the total number of bits needed to encode the text.
Note that an optimal encoding minimizes α.
Intuitively, in the prefix tree T of an optimal prefix code, symbols
with high frequencies should have small depth and symbols with
low frequency should have large depth.
The search for an optimal prefix code is the search for the tree T
that minimizes α.
Characterization of optimal prefix trees.

A binary tree T is full if every interior node has two children.

Lemma
The binary prefix tree corresponding to an optimal prefix code is
full.

Proof.
Let T be the prefix tree of an optimal code, and suppose it
contains a node u with a single child v.
If u is the root, construct T′ by deleting u and using v as root.
T′ yields a code with fewer bits to encode the symbols, a
contradiction to the optimality of T.
If u is not the root, let w be the parent of u. Construct T′ by
deleting u and connecting v directly to w. Again this decreases the
number of bits, a contradiction to the optimality of T.
Greedy approach: Huffman code

A greedy approach due to David Huffman
(1925-99), devised in 1952 while he was a PhD
student at MIT.

We wish to produce a labeled full binary tree in which the leaves
are as close to the root as possible, and where symbols with low
frequency are placed deeper than symbols with high frequency.
Greedy approach: Huffman code

- Given S, assume we have computed f(x) for every x ∈ Σ.
- Sort the symbols by increasing f. Keep the dynamically sorted
  list in a priority queue Q.
- Construct the tree in a bottom-up fashion: extract the two first
  elements of Q, join them under a new virtual node whose f is
  the sum of the f's of its children, and place the new node in Q.
- When only one node remains in Q, the resulting tree is the
  prefix tree of an optimal prefix code.
Huffman Coding: Construction of the tree.

Huffman (Σ, S)
Given Σ and S, compute the frequencies {f}
Construct a priority queue Q of Σ, ordered by increasing f
while |Q| > 1 do
  create a new node z
  x = Extract-Min(Q)
  y = Extract-Min(Q)
  make x, y the children of z
  f(z) = f(x) + f(y)
  Insert(Q, z)
end while
If Q is implemented by a heap, the algorithm has complexity
O(n lg n).
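
A minimal Python sketch of the construction (our own rendering: trees are nested pairs, and an integer tie-breaker keeps heap entries comparable):

    import heapq

    def huffman_code(freq):
        """freq: dict symbol -> frequency. Returns dict symbol -> codeword.
        Builds the prefix tree bottom-up with a min-priority queue."""
        # Queue entries: (frequency, tie-breaker, tree); a tree is a symbol
        # (leaf) or a pair of trees (internal node)
        pq = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(pq)
        count = len(pq)
        while len(pq) > 1:                  # stop when one tree (the root) remains
            f1, _, t1 = heapq.heappop(pq)   # the two least frequent trees...
            f2, _, t2 = heapq.heappop(pq)
            heapq.heappush(pq, (f1 + f2, count, (t1, t2)))  # ...become siblings
            count += 1
        code = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):     # internal node: 0 left, 1 right
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                code[tree] = prefix or "0"  # degenerate one-symbol alphabet
            return code
        return walk(pq[0][2], "")

    print(huffman_code({'a': .32, 'b': .25, 'c': .20, 'd': .18, 'e': .05}))

On the frequencies of the earlier example this yields a code with α = 2.23 bits per symbol, matching φ2.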
Example

Consider the text: for each rose, a rose is a rose, the rose.
with Σ = {for, each, rose, a, is, the, ",", ␣} (␣ = blank).
Frequencies: f(for) = 1/21, f(each) = 1/21, f(is) = 1/21,
f(the) = 1/21, f(a) = 2/21, f(,) = 2/21, f(rose) = 4/21,
f(␣) = 9/21.
Priority queue:
Q = (for(1/21), each(1/21), is(1/21), the(1/21), a(2/21), ,(2/21),
rose(4/21), ␣(9/21))

First merge: z1(2/21), with children for and each.

Q = (is(1/21), the(1/21), a(2/21), ,(2/21), z1(2/21), rose(4/21),
␣(9/21))
Example.

[Diagram: the remaining merges z2 = {a, is}, z3 = {",", the},
z4 = {z1, z2}, z5 = {rose, z3}, z6 = {z4, z5}, z7 = {␣, z6}, and the
final prefix tree, which gives the codes
␣ = 0, rose = 110, for = 1000, each = 1001, a = 1010, is = 1011,
"," = 1110, the = 1111.]
Example

Therefore for each rose, a rose is a rose, the rose is Huffman
encoded as:
10000100101101110010100110010110101001101110011110110
Notice that with a fixed-length code we would use 4 bits per
symbol, i.e. 84 bits instead of the 53 we use.
The solution is not unique!
Why does Huffman's algorithm produce an optimal prefix code?
Correctness.

Theorem (Greedy property)

Let Σ be an alphabet, and x, y two symbols with the lowest
frequencies. Then there is an optimal prefix code in which the
codewords for x and y have the same length and differ only in the
last bit.

Proof.
Let T be an optimal prefix tree with a and b siblings at maximum
depth; assume f(a) ≤ f(b) and f(x) ≤ f(y). Construct T′ by
exchanging x with a and y with b. As f(x) ≤ f(a) and
f(y) ≤ f(b), then B(T′) ≤ B(T).
Theorem (Optimal substructure)
Assume T′ is an optimal prefix tree for (Σ − {x, y}) ∪ {z}, where
x, y are symbols with the lowest frequencies and z has frequency
f(z) = f(x) + f(y). Then the tree T obtained from T′ by making
x and y children of z is an optimal prefix tree for Σ.

Proof.
Let T0 be any prefix tree for Σ; we must show B(T) ≤ B(T0).
By the greedy property, we only need to consider T0 in which x
and y are siblings. Let T0′ be obtained by removing x, y from T0.
As T0′ is a prefix tree for (Σ − {x, y}) ∪ {z}, we have
B(T0′) ≥ B(T′).
Comparing the trees with and without the leaves x, y we get
B(T0′) + f(x) + f(y) = B(T0) and B(T′) + f(x) + f(y) = B(T).
Putting the three relations together, we get B(T) ≤ B(T0).
Optimality of Huffman

Huffman is optimal under the following assumptions:
- The compression is lossless, i.e. uncompressing the compressed
  file yields the original file.
- We must know the alphabet beforehand (characters, words,
  etc.).
- We must pre-compute the frequencies of the symbols, i.e. read
  the data twice.
For certain applications it is very slow (in the size n of the input
text).
