
Greedy Algorithms and Data Compression.

Course, Fall 2017


Greedy Algorithms
An optimization problem: given a pair (S, f), where S is a set of
feasible elements and f : S → R is the objective function, find a
u ∈ S which optimizes (maximizes or minimizes) f.
A greedy algorithm always makes a locally optimal choice, in the
(myopic) hope that this choice will lead to a globally optimal
solution.
Greedy algorithms do not always yield optimal solutions, but for
some problems they are very efficient.
Greedy Algorithms

- At each step we make the best choice at the moment, and then
  solve the subproblem that arises afterwards.
- The choice may depend on previous choices, but not on future
  choices.
- At each choice, the algorithm reduces the problem to a smaller
  one.
- A greedy algorithm never backtracks.
Greedy Algorithms

For the greedy strategy to work correctly, the problem under
consideration must have two characteristics:
- Greedy choice property: we can arrive at a global optimum by
  making locally optimal choices.
- Optimal substructure: an optimal solution to the problem
  contains optimal solutions to its subproblems.
Fractional knapsack problem

Fractional Knapsack
INPUT: a set I = {1, . . . , n} of items that can be fractioned, each
item i with weight wi and value vi, and a maximum permissible
weight W.
QUESTION: select whole or partial items to maximize the profit,
within the allowed weight W.
Example.
Item i:    1   2   3
Value v:  60 100 120
Weight w: 10  20  30
W = 28
Fractional knapsack

Greedy for fractional knapsack (I, V, W)

Sort I in decreasing order of vi /wi
Take the maximum amount of the first item
while total weight taken ≤ W do
  Take the maximum amount of the next item
end while

If n is the number of items, the algorithm has a cost of
T(n) = O(n + n log n), dominated by the sorting step.
Example.
Item i:    1   2   3
Value v:  60 100 120
Weight w: 10  20  30
v/w:       6   5   4
As W = 28, we take all 10 units of item 1 and 18 units of item 2.
Correctness?
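
As a sanity check, here is a minimal Python sketch of the greedy above (the function name and the representation of items as (value, weight) pairs are our own, not from the slides):

    def fractional_knapsack(items, W):
        """items: list of (value, weight) pairs; W: maximum allowed weight.
        Greedy: take items in decreasing order of value density v/w."""
        order = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
        profit, remaining = 0.0, W
        for v, w in order:
            if remaining <= 0:
                break
            take = min(w, remaining)     # whole item, or the fraction that fits
            profit += v * (take / w)
            remaining -= take
        return profit

    # The slides' example: values 60, 100, 120, weights 10, 20, 30, W = 28
    print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 28))  # 150.0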
Greedy does not always work

0-1 Knapsack
INPUT: a set I = {1, . . . , n} of items that can NOT be fractioned,
each item i with weight wi and value vi, and a maximum
permissible weight W.
QUESTION: select the items to maximize the profit, within the
allowed weight W.
For example, with W = 50:
Item i:    1   2   3
Value v:  60 100 120
Weight w: 10  20  30
v/w:       6   5   4
Then any solution which includes item 1 is not optimal: the
optimal solution consists of items 2 and 3.
Activity scheduling problems

A set of activities S = {1, 2, . . . , n} is to be processed by a single
processor, according to different constraints.
1. Interval scheduling problem: each i ∈ S has a start time si
   and a finish time fi. Maximize the number of mutually
   compatible activities.
2. Weighted interval scheduling problem: each i ∈ S has a start
   time si, a finish time fi, and a weight wi. Find a set A of
   mutually compatible activities maximizing Σi∈A wi.
3. Job scheduling problem (lateness minimisation): each i ∈ S
   has a processing time ti (it could start at any time si) and a
   deadline di; define the lateness of i as Li = max{0, (si + ti) − di}.
   Find a schedule (the si for all the tasks) such that no two tasks
   are processed at the same time and the maximum lateness is
   minimized.
The interval scheduling problem

Activity Selection Problem

INPUT: a set S = {1, 2, . . . , n} of activities to be processed by a
single resource. Each activity i has a start time si and a finish time
fi, with fi > si.
QUESTION: maximize the number of mutually compatible
activities, where activities i and j are compatible if
[si, fi) ∩ [sj, fj) = ∅.
In the general framework, the feasible set is {A | A ⊆ S, A is
compatible} and f(A) = |A|.
Example.

Activity (A): 1 2 3 4 5 6 7 8
Start (s):    3 2 2 1 8 6 4 7
Finish (f):   5 5 3 5 9 9 5 8

[Diagram: the eight activities drawn as intervals on a time line from 1 to 10.]
To apply the greedy technique to a problem, we must take into
consideration the following:
- a local criterion to allow the selection,
- a condition to determine if a partial solution can be
  completed,
- a procedure to test that we have the optimal solution.
The Activity Selection problem.
Given a set A of activities, we wish to maximize the number of
compatible activities.

Activity selection (A)
Sort A by increasing order of fi
Let a1, a2, . . . , an be the resulting sorted list of activities
S := {a1}
j := 1 {pointer in the sorted list}
for i = 2 to n do
  if si ≥ fj then
    S := S ∪ {ai} and j := i
  end if
end for
return S.
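
A minimal Python sketch of this algorithm (our own rendering of the pseudocode above; activities are (start, finish) pairs):

    def activity_selection(activities):
        """activities: list of (start, finish) pairs.
        Returns a maximum-size set of compatible activities (as indices)."""
        # Sort indices by increasing finish time (the greedy criterion)
        order = sorted(range(len(activities)), key=lambda i: activities[i][1])
        selected, last_finish = [], float('-inf')
        for i in order:
            s, f = activities[i]
            if s >= last_finish:       # compatible with the last chosen activity
                selected.append(i)
                last_finish = f
        return selected

    # The slides' example (activities 1..8, stored 0-indexed)
    acts = [(3, 5), (2, 5), (2, 3), (1, 5), (8, 9), (6, 9), (4, 5), (7, 8)]
    print([i + 1 for i in activity_selection(acts)])  # [3, 1, 8, 5]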

Sorted by increasing fi, the activities are 3, 1, 2, 4, 7, 8, 5, 6
(with fi : 3, 5, 5, 5, 5, 8, 9, 9) ⇒ SOL: {3, 1, 8, 5}

[Diagram: the greedy scan over the intervals on the time line from 1 to 10.]
Notice: in the activity selection problem we are maximizing the
number of activities, independently of the occupancy of the
resource under consideration. For example, in the instance above,
solution {3, 1, 8, 5} is as valid as {3, 7, 8, 5}. If we were asking for
maximum occupancy, {4, 6} (occupying 7 time units) would be a
solution.
How would you modify the previous algorithm to deal with the
maximum occupancy problem?
Theorem
The previous algorithm produces an optimal solution to the
activity selection problem: there is an optimal solution that
includes the activity with the earliest finishing time.
Proof.
Given A = {1, . . . , n} sorted by finishing time, we must show there
is an optimal solution that begins with activity 1. Let
S = {k, . . . , m} be an optimal solution. If k = 1 we are done.
Otherwise, define B = (S − {k}) ∪ {1}. As f1 ≤ fk, the activities
in B are disjoint. As |B| = |S|, B is also an optimal solution.
Moreover, if S is an optimal solution to A containing activity 1,
then S′ = S − {1} is an optimal solution for
A′ = {i ∈ A | si ≥ f1}. Therefore, after each greedy choice we are
left with an optimization problem of the same form as the original.
By induction on the number of choices, the greedy strategy
produces an optimal solution.
Notice the optimal substructure of the problem: if an optimal
solution S to a problem includes ak, then the partial solution
obtained by excluding ak from S must also be optimal in its
corresponding domain.
Greedy does not always work.
Weighted Activity Selection Problem
INPUT: a set S = {1, 2, . . . , n} of activities to be processed by a
single resource. Each activity i has a start time si and a finish time
fi, with fi > si, and a weight wi.
QUESTION: find a set A of mutually compatible activities that
maximizes Σi∈A wi.

A natural greedy attempt:
S := ∅
sort W = {wi} by decreasing value
choose the interval m = (lm, um) with maximum weight wm
add m to S
remove all intervals overlapping with m
while there are intervals left do
  repeat the greedy procedure
end while
return S
Correctness?
Greedy does not always work.

The previous greedy does not always solve the problem!

[Diagram: intervals (2, 5) and (5, 9), each of weight 6, and the
interval (1, 10) of weight 10, on a time line from 1 to 10.]

The algorithm chooses the interval (1, 10) with weight 10, but the
optimal solution consists of the intervals (2, 5) and (5, 9), with
total weight 12.
Job scheduling problem
Also known as the Lateness minimisation problem.
We have a single resource and n requests to use the resource, each
request i taking time ti.
In contrast to the previous problem, each request has a deadline di
instead of a start and finish time. The goal is to schedule the
tasks so as to minimize, over all the requests, the maximal amount
of time by which a request exceeds its deadline.
Minimize Lateness
- We have a single processor.
- We have n jobs such that job i:
  - requires ti units of processing time,
  - has to be finished by time di.
- Lateness of i: Li = max{0, fi − di}.

     i:  1  2  3  4  5  6
    ti:  1  2  2  3  3  4
    di:  9  8 15  6 14  9

Goal: schedule the jobs to minimize the maximum lateness,
i.e., we must assign a starting time si to each job i so as to
minimize maxi Li.

[Diagram: the jobs scheduled in index order on the time line 0..15;
the latenesses of jobs 1..6 are 0, 0, 0, 2, 0, 6.]
Minimize Lateness
Schedule the jobs according to some ordering:

(1.-) Sort in increasing order of ti: process jobs with short
processing times first.

     i:  1  2
    ti:  1  5
    di:  6  5

Here job 1 is processed first and job 2 finishes at time 6 with
lateness 1, while processing job 2 first gives lateness 0.

(2.-) Sort in increasing order of di − ti: process first the jobs
with less slack time.

          i:  1   2
         ti:  1  10
         di:  2  10
    di − ti:  1   0

In this case job 2 is processed first, which does not minimise
lateness.
Process urgent jobs first

(3.-) Sort in increasing order of di (Earliest Deadline First).

LatenessA ({i, ti, di})
Sort the jobs by increasing order of di: {d1, d2, . . . , dn}
Rearrange (renumber) the jobs as i : 1, 2, . . . , n
t := 0
for i = 1 to n do
  si := t; fi := t + ti
  Assign job i to the interval [si, fi]
  t := t + ti
end for
return S = {[s1, f1], . . . , [sn, fn]}.

On the previous instance the sorted order is 4, 2, 1, 6, 5, 3:

     i:  1  2  3  4  5  6
    ti:  1  2  2  3  3  4
    di:  9  8 15  6 14  9
position in sorted order: 3  2  6  1  5  4

[Diagram: the EDF schedule on the time line 0..15, with deadlines
6, 8, 9, 9, 14, 15 in sorted order; the latenesses are 0, 0, 0, 1, 0, 0.]
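
A minimal Python sketch of LatenessA (our own rendering; a job is a (ti, di) pair, and the returned schedule keeps the input order of the jobs):

    def minimize_lateness(jobs):
        """jobs: list of (t, d) pairs (processing time, deadline).
        Earliest-Deadline-First: returns the intervals [s_i, f_i], indexed
        as in the input, and the maximum lateness of the schedule."""
        order = sorted(range(len(jobs)), key=lambda i: jobs[i][1])  # by deadline
        schedule = [None] * len(jobs)
        t, max_lateness = 0, 0
        for i in order:
            ti, di = jobs[i]
            schedule[i] = (t, t + ti)        # job i runs in [t, t + ti]
            max_lateness = max(max_lateness, t + ti - di)
            t += ti
        return schedule, max_lateness

    # The slides' instance, jobs 1..6 as (ti, di):
    jobs = [(1, 9), (2, 8), (2, 15), (3, 6), (3, 14), (4, 9)]
    print(minimize_lateness(jobs))
    # maximum lateness 1 (job 6 finishes at time 10, one unit late)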
Complexity and idle time

Time complexity
Running time of the algorithm beyond the comparison sorting: O(n).
Total running time: O(n lg n).
Idle steps
From an optimal schedule with idle steps, we can always eliminate
the gaps to obtain another optimal schedule:

[Diagram: a schedule on the time line 0..8 with a gap, and the same
schedule with the gap removed.]

There exists an optimal schedule with no idle steps.


Inversions and exchange argument
A schedule has an inversion if i is scheduled before j with dj < di.

[Diagram: jobs i and j with dj < di; exchanging them turns the
schedule (. . . , i, j, . . .) into (. . . , j, i, . . .), where the new finish
time of i equals the old finish time fj of j.]

Lemma
Exchanging two adjacent inverted jobs reduces the number of
inversions by 1 and does not increase the maximum lateness.
Proof Let L be the lateness before the exchange and L′ the
lateness after the exchange; let Li, Lj, L′i, L′j be the corresponding
quantities for i and j. Clearly L′j ≤ Lj, since j is moved earlier.
Notice that f′i = fj; using the fact that dj < di,
  L′i = f′i − di = fj − di < fj − dj = Lj.
Therefore the swap does not increase the maximum lateness of the
schedule. 2
Correctness of LatenessA

Notice that the output S produced by LatenessA has no inversions
and no idle steps.

Theorem
Algorithm LatenessA returns an optimal schedule S.
Proof
Assume Ŝ is an optimal schedule with the minimal number of
inversions (and no idle steps).
If Ŝ has 0 inversions then Ŝ = S.
If the number of inversions in Ŝ is > 0, let i − j be an adjacent
inversion. Exchanging i and j does not increase the maximum
lateness and decreases the number of inversions, contradicting the
choice of Ŝ.
Therefore, max lateness of S ≤ max lateness of Ŝ. 2
Network construction: Minimum Spanning Tree
- We have a set of locations V = {v1, . . . , vn};
- we want to build a communication network on top of them,
- such that any vi can communicate with any vj;
- for any pair (vi, vj) there is a cost w(vi, vj) of building a
  direct link;
- if E is the set of all n(n − 1)/2 possible edges, we want to
  find a subset T(E) ⊆ E s.t. (V, T(E)) is connected and
  minimizes Σe∈T(E) w(e).
Minimum Spanning Tree (MST).

INPUT: an edge-weighted graph G = (V, E), |V| = n, with
w(e) ∈ R for all e ∈ E.
QUESTION: find a tree T with V(T) = V and E(T) ⊆ E that
minimizes w(T) = Σe∈E(T) w(e).

[Diagram: the running example, a weighted graph on vertices
a, . . . , h with edge weights 2, 3, 4, 5, 6, 8, 9, 10, 14, 15, shown next
to one of its spanning trees.]
Some definitions

Given G = (V, E):
A path is a sequence of consecutive edges. A cycle is a path with
no repeated vertices other than the one where it starts and ends.
A cut is a partition of V into S and V − S.
The cut-set of a cut is the set of edges with one end in S and the
other in V − S.

[Diagram: the example graph with a cut (S, V − S) highlighted.]
Overall strategy

Given an MST T in G with distinct edge weights, T has the
following properties:
- Cut property:
  e ∈ T ⇔ e is the lightest edge across some cut in G.
- Cycle property:
  e ∉ T ⇔ e is the heaviest edge on some cycle in G.

The MST algorithms are methods for ruling edges in or out of T.

The ⇐ implication of the cut property will yield the blue (include)
rule, which allows us to include in T a min-weight edge of some cut.
The ⇒ implication of the cycle property will yield the red (exclude)
rule, which allows us to exclude from T a max-weight edge of some
cycle.
The cut rule (Blue rule)
The MST problem has the property of optimal substructure:
given an optimal MST T, removing an edge e yields subtrees T1
and T2 which are optimal for their corresponding subgraphs.

Theorem (The cut rule)

Given G = (V, E), w : E → R, let T be an MST of G and let
(S, V − S) be a cut. Let e = (u, v) be a min-weight edge in G
connecting S to V − S. Then e ∈ T.
The edges incorporated into the solution by this rule are said to be
blue.
Proof.
Assume e ∉ T. Consider the path from u to v in T. Replacing the
first edge on this path that crosses the cut by e must give a
spanning tree of equal or less weight.

[Diagram: the example graph; the path from u to v in T, with the
crossing edge replaced by e.]
The cycle rule (Red rule)

The MST problem has the property that the optimal solution
cannot contain a cycle.

Theorem (The cycle rule)

Given G = (V, E), w : E → R, let C be a cycle in G. The edge
e ∈ C with the greatest weight cannot be part of the optimal
MST T.
The edges discarded by this rule are said to be red.
Proof.
Let C be a cycle spanning vertices {vi, . . . , vl}; removing the
max-weight edge of C from any spanning subgraph containing C
gives a better solution.

[Diagram: the example graph with the cycle C spanning {a, c, d, f}.]
Greedy for MST

The Min. Spanning Tree problem has the optimal substructure
property, so we can apply a greedy strategy.
Robert Tarjan: Data Structures and Network Algorithms, SIAM,
1984.
Blue rule: given a cut-set between S and V − S with no blue
edges, select from the cut-set a non-colored edge with min weight
and paint it blue.
Red rule: given a cycle C with no red edges, select a non-colored
edge in C with max weight and paint it red.

Greedy scheme:
Given G with |V(G)| = n, apply the red and blue rules until
having n − 1 blue edges; those form the MST.

[Diagram: a run of the two rules on the example graph, coloring
one edge at each step.]
Greedy for MST : Correctness

Theorem
There exists an MST T containing all the blue edges and none of
the red ones. Moreover, the algorithm finishes and finds an MST.
Sketch of proof: induction on the number of applications of the
blue and red rules. The base case (no edges colored) is trivial. The
induction step is the same as in the proofs of the cut and cycle
rules. Moreover, if some edge e remains non-colored: if its ends are
in different blue trees, the blue rule applies; otherwise e closes a
cycle and the red rule colors it red. 2

We need implementations for the algorithm! The ones we present
use only the blue rule.
A short history of MST implementations

There has been extensive work to obtain the most efficient
algorithm to find an MST in a given graph:
- O. Borůvka gave the first greedy algorithm for the MST in 1926. V.
  Jarník gave a different greedy for the MST in 1930, which was re-discovered
  by R. Prim in 1957. In 1956 J. Kruskal gave yet another greedy algorithm
  for the MST. All those algorithms run in O(m lg n).
- Fredman and Tarjan (1984) gave an O(m log∗ n) algorithm, introducing a
  new data structure for priority queues, the Fibonacci heap. Recall that
  log∗ n, the iterated logarithm, is the number of times lg must be applied
  to n to reach a value ≤ 1.
- Gabow, Galil, Spencer and Tarjan (1986) improved Fredman-Tarjan to
  O(m log(log∗ n)).
- Karger, Klein and Tarjan (1995) gave an O(m) randomized algorithm.
- In 1997 B. Chazelle gave an O(mα(n)) algorithm, where α(n) is a very
  slowly growing function, the inverse of the Ackermann function.
Basic algorithms
Both classical algorithms use the greedy scheme:
- Jarník-Prim (serial, centralized): starting from a vertex v,
  grows T by adding each time the lightest edge leaving the tree,
  using the blue rule. It uses a priority queue (usually a heap) to
  store the candidate edges and retrieve the lightest one.
- Kruskal (serial, distributed): considers every edge and grows a
  forest F, using the blue and red rules to include or discard each
  edge e. The insight of the algorithm is to consider the edges in
  order of increasing weight, which makes the complexity of
  Kruskal's algorithm dominated by the O(m lg m) sorting step.
  At the end F becomes T. The efficient implementation of the
  algorithm uses the union-find data structure.
Jarník-Prim vs. Kruskal

[Cartoon: Jarník-Prim as "how the blue man can spread his
message to everybody" (one tree growing from a source); Kruskal
as "how to establish a min-distance-cost network among all men"
(a forest of merging components).]
Jarnı́k - Prim greedy algorithm.

V. Jarník, 1930; R. Prim, 1957

Greedy on vertices with a priority queue.
Starting from an arbitrary node r, at each step grow the MST by
adding an edge of minimum weight among those leaving the tree,
i.e. one which does not form a cycle.

MST (G, w, r)
T := ∅
for i = 1 to |V| − 1 do
  Let e ∈ E : e touches T, has min weight, and does not form a
  cycle
  T := T ∪ {e}
end for

Use a priority queue to choose the min-weight edge connected to
the tree already formed.
For every v ∈ V − T, let k[v] = minimum weight of any edge
connecting v to any vertex in T. Start with k[v] = ∞ for all v.
For v ∈ T, let π[v] be the parent of v. During the algorithm
T = {(v, π[v]) : v ∈ V − {r} − Q},
where r is the arbitrary starting vertex and Q is a min-priority
queue storing k[v]. The algorithm finishes when Q = ∅.
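
A minimal Python sketch of Jarník-Prim (our own rendering, as a "lazy" variant: instead of decreasing the keys k[v] in the queue, it pushes candidate edges and skips stale entries; the O(|E| lg |V|) bound is the same):

    import heapq

    def prim_mst(n, adj, r=0):
        """n: number of vertices 0..n-1; adj[u]: list of (weight, v) pairs.
        Grows the MST from r with a min-priority queue."""
        in_tree = [False] * n
        mst_weight, mst_edges = 0, []
        pq = [(0, r, -1)]                 # (key k[v], vertex v, parent pi[v])
        while pq:
            k, v, parent = heapq.heappop(pq)
            if in_tree[v]:
                continue                  # stale entry: v was already added
            in_tree[v] = True
            mst_weight += k
            if parent >= 0:
                mst_edges.append((parent, v))
            for w, u in adj[v]:
                if not in_tree[u]:
                    heapq.heappush(pq, (w, u, v))
        return mst_weight, mst_edges

    # Tiny usage example: triangle a-b (1), b-c (2), a-c (3), as vertices 0,1,2
    adj = [[(1, 1), (3, 2)], [(1, 0), (2, 2)], [(3, 0), (2, 1)]]
    print(prim_mst(3, adj))  # (3, [(0, 1), (1, 2)])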
Example.

[Diagram: eight snapshots of Jarník-Prim on the example graph,
adding one blue edge at each step.]

w(T) = 52

Time: depends on the implementation of Q.

Q an unsorted array: T(n) = O(|V|²);
Q a heap: T(n) = O(|E| lg |V|);
Q a Fibonacci heap: T(n) = O(|E| + |V| lg |V|).
Kruskal’s greedy algorithm.

J. Kruskal, 1956
Similar to Jarník-Prim, but chooses minimum-weight edges
without keeping the selected subgraph connected.
MST (G, w)
Sort E by increasing weight
T := ∅
for i = 1 to |V| − 1 do
  Let e ∈ E be the minimum-weight edge that does not form a
  cycle with T
  T := T ∪ {e}
end for
We have an O(m lg m) cost from sorting the edges.
To implement the cycle tests (adding edges to T and discarding
others) efficiently, we use the union-find data structure.
Example.

[Diagram: six snapshots of Kruskal on the example graph, adding
the edges in order of increasing weight and skipping those that
would close a cycle.]
Union-find: A DS to implement Kruskal

Notice that Kruskal evolves by building clumps of connected nodes
without cycles and merging the clumps into larger clumps, taking
care that no cycles appear.
In a forest, connectivity is an equivalence relation: two nodes are
related if there is a (unique) path between them.
Recall that an equivalence relation R on a set S satisfies, for all
x, y, z ∈ S:
- Reflexive: xRx,
- Symmetric: if xRy then yRx,
- Transitive: if xRy and yRz then xRz.
For the MST on an undirected G = (V, E) and u, v ∈ V: uRv iff
there is a path between u and v in the current forest.
R partitions the elements into equivalence classes.
Union-find: A DS to implement Kruskal
In the case of the MST on G = (V, E), notice that uRv ("there is
a path between u and v") is indeed an equivalence relation:
- Reflexive: there is a path of length 0 between x and itself,
- Symmetric: because G is undirected,
- Transitive: if u is connected to v and v to w, then u is
  connected to w.
The relation partitions V into equivalence classes (connected
components without cycles).
Basic idea of union-find: construct and maintain efficiently the
equivalence classes of the elements of a set.
Given G, for every v ∈ V:
  Sv = Make-set(v)
  S′ = Union(Su, Sv)
  Find(w) (in S)

[Diagram: a small weighted graph whose vertices are being grouped
into classes.]
Union-Find, aka Disjoint-sets
B. Galler, M. Fischer: An improved equivalence algorithm. ACM
Comm., 1964.
A disjoint-set data structure maintains a collection {S1, . . . , Sk}
of disjoint dynamic sets, each set identified by a representative.
Union-find supports three operations on partitions of a set:
Make-set(x): creates a new set containing x.
Union(x, y): merges the sets containing x and y.
Find(x): finds the set x belongs to
  (i.e. given x, finds its representative).
Union-find manipulates all types of objects: names, integers,
web pages, computer IDs, . . .
Union-find can be used in a myriad of applications, for example to
keep subsets of similar integers in a set of integers.
Union-find implementation: First idea

Given {S1, . . . , Sk}, use doubly linked lists:
- Each set Si is represented by a doubly linked list.
- The representative of Si is defined to be the element at the
  head of the list representing the set.

[Diagram: a list x → y → · · · → w representing Si.]

- Make-set(x): initializes x as a lone list. Worst case Θ(1).
- Union(x, y): goes to the tail of Sx and points it to the head of
  Sy, concatenating both lists. Worst case Θ(n).
- Find(x): goes left from x to the head of Sx. Worst case Θ(n).

Can we do it more efficiently?
Union-find implementation: Forest of Trees

Represent each set as a rooted tree of elements:
- The root contains the representative.
- Make-set(x): Θ(1).
- Find(x): find the root of the tree containing x. Θ(height).
- Union(x, y): make the root of one tree point to the root of the
  other.

[Diagram: the tree representation of {A}, {B, D}, {C, G, F, E, H}.]
A better implementation: Link by rank

In this course: rank(x) = height of the subtree rooted at x.
Any singleton element has rank = 0.
Inductively, as we join trees, the rank of the root may increase.

[Diagram: trees of rank 0, 1, 2 and 3, with the rank of every node.]
Link by rank
Union rule: link the root of the smaller-rank tree to the root of the
larger-rank tree.
In case the roots of both trees have the same rank, choose
arbitrarily and increase by 1 the rank of the winner.
Except for the roots, a node never changes rank during the
process.

[Diagram: Union(D, F) linking the rank-1 tree containing B, D
under the rank-2 tree rooted at H.]

Union(x, y): climbs to the roots of the trees containing x and y,
and merges the sets by making the tree with smaller rank a
subtree of the other tree.
This takes Θ(height) steps.
Link by rank

Maintain an integer rank for each node, initially 0. Link the root of
smaller rank to the root of larger rank; in case of a tie, increase the
rank of the new root by 1.

Make-set(x)
  parent(x) = x
  rank(x) = 0

Find(x)
  while x ≠ parent(x) do
    x = parent(x)
  end while
  return x

Union(x, y)
  rx = Find(x); ry = Find(y)
  if rx = ry then
    stop
  else if rank(rx) > rank(ry) then
    parent(ry) = rx
  else if rank(rx) < rank(ry) then
    parent(rx) = ry
  else
    parent(rx) = ry
    rank(ry) = rank(ry) + 1
  end if
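
A minimal Python rendering of the link-by-rank pseudocode above (exactly as in the slides, i.e. without path compression; the class name is our own):

    class UnionFind:
        """Disjoint sets with link-by-rank."""
        def __init__(self, n):
            self.parent = list(range(n))   # Make-set for elements 0..n-1
            self.rank = [0] * n

        def find(self, x):
            while x != self.parent[x]:     # climb to the root (representative)
                x = self.parent[x]
            return x

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return False               # already in the same set
            if self.rank[rx] > self.rank[ry]:
                self.parent[ry] = rx       # link smaller-rank root under larger
            elif self.rank[rx] < self.rank[ry]:
                self.parent[rx] = ry
            else:
                self.parent[rx] = ry       # tie: pick one, increase its rank
                self.rank[ry] += 1
            return True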
Example construction Union-find

[Diagram: a union-find forest over {A, . . . , K} evolving through
four stages, from eleven singletons of rank 0 to a single tree whose
root H has rank 3.]
Properties of Link by rank

P1.- If x is not a root then rank(x) < rank(parent(x)).
P2.- If parent(x) changes, then rank(parent(x)) increases.
P3.- Any root of rank k has ≥ 2^k descendants.
P4.- The highest rank of a root is ≤ ⌊lg n⌋.
P5.- For any r ≥ 0, there are ≤ n/2^r nodes with rank r.
Properties of Link by rank

Theorem
Using link-by-rank, each Union(x, y) and Find(x) operation takes
O(lg n) steps.
Proof The number of steps for each operation is bounded by the
height of the tree, which is O(lg n). 2

Theorem
Starting from an empty data structure with n disjoint singleton
sets, link-by-rank performs any intermixed sequence of m ≥ n Find
and n − 1 Union operations in O(m lg n) steps.
Back to Kruskal

MST (G, w)
Sort E by increasing weight: {e1, . . . , em}
T := ∅
for all v ∈ V do
  Make-set(v)
end for
for i = 1 to m do
  Choose ei = (u, v) in order from E
  if Find(u) ≠ Find(v) then
    T := T ∪ {ei}
    Union(u, v)
  end if
end for
The cost is dominated by the O(m lg m) sorting.
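
A self-contained Python sketch of this algorithm (our own rendering, with the link-by-rank union-find inlined):

    def kruskal_mst(n, edges):
        """n: vertices 0..n-1; edges: list of (w, u, v) triples.
        Returns (total weight, MST edges)."""
        parent, rank = list(range(n)), [0] * n
        def find(x):
            while x != parent[x]:
                x = parent[x]
            return x
        mst_weight, mst_edges = 0, []
        for w, u, v in sorted(edges):        # increasing weight: O(m lg m)
            ru, rv = find(u), find(v)
            if ru != rv:                     # Find(u) != Find(v): no cycle
                if rank[ru] < rank[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru              # link smaller-rank root under larger
                if rank[ru] == rank[rv]:
                    rank[ru] += 1
                mst_weight += w
                mst_edges.append((u, v))
        return mst_weight, mst_edges

    # Tiny usage example: triangle 0-1 (1), 1-2 (2), 0-2 (3)
    print(kruskal_mst(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]))  # (3, [(0, 1), (1, 2)])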
Greedy and approximation algorithms

Many times the greedy strategy yields a feasible solution whose
value is near that of the optimum solution.
In many practical cases, when finding the global optimum is hard,
it is sufficient to find a good approximation to it.
Given an optimization problem (maximization or minimization),
an optimal algorithm computes the best output OPT(e) on any
instance e of size n.
An approximation algorithm for the problem computes any valid
output.
We want to design approximation algorithms that are fast and, in
the worst case, get an output as close as possible to OPT(e).
Greedy and Approximations algorithms

Given an optimization problem, an α-approximation algorithm Apx
computes, on any instance e, an output Apx(e) whose cost is
within an α ≥ 1 factor of OPT(e):

  1/α ≤ Apx(e)/OPT(e) ≤ α.

α is called the approximation ratio.

Notice, α measures the factor by which the output of Apx deviates
from OPT(e) on a worst-case input: the first ≤ matters for
maximization and the second ≤ for minimization.
An easy example: Vertex cover

Given a graph G = (V, E) with |V| = n, |E| = m, find a
minimum set of vertices S ⊆ V that covers every edge of G.

GreedyVC (G = (V, E))
E′ = E, S = ∅
while E′ ≠ ∅ do
  Pick e ∈ E′, say e = (u, v)
  S = S ∪ {u, v}
  E′ = E′ − ({(u, v)} ∪ {edges incident to u, v})
end while
return S.

[Diagram: a run on an example graph with vertices 1, . . . , 7.]
An easy example: Vertex cover
Theorem
The algorithm GreedyVC runs in O(m + n) steps. Moreover,
|Apx(e)| ≤ 2|OPT(e)|.

Proof.
We use induction to prove |Apx(e)| ≤ 2|OPT(e)|. Notice that for
every pair {u, v} we add to Apx(e), either u or v is in OPT(e).
Base: if E = ∅ then |Apx(e)| = |OPT(e)| = 0.
Hypothesis: |Apx(e) − {u, v}| ≤ 2|OPT(e) − {u, v}|. Then,

|Apx(e)| = |Apx(e) − {u, v}| + 2 ≤ 2|OPT(e) − {u, v}| + 2
         ≤ 2(|OPT(e)| − 1) + 2 ≤ 2|OPT(e)|.

The decision problem for Vertex Cover is NP-complete. Moreover,
unless P = NP, vertex cover cannot be approximated within any
factor α < 1.36.
Clustering problems

Clustering: the process of finding interesting structure in a set of
data.
Given a collection of objects, organize them into coherent groups
with respect to some metric (a distance function d(·, ·)).
Recall that if d is a metric: d(x, x) = 0; d(x, y) > 0 for x ≠ y;
d(x, y) = d(y, x); and d(x, z) ≤ d(x, y) + d(y, z).
k-clustering problem: given a set of points X = {x1, x2, . . . , xn}
together with a distance function on X and given a k > 0, we want
to partition X into k disjoint subsets, a k-clustering, so as to
optimize some objective function (depending on d).
In this lecture we use the Euclidean distance on points in Z².
The k-Center clustering problem

Given as input a set X = {x1, . . . , xn}, with distances
D = {d(xi, xj)}, and a given integer k:
find a partition of X into k clusters {C1, . . . , Ck} minimizing the
maximum cluster diameter, maxj maxx,y∈Cj d(x, y).
Each ball Ci will be determined by a center ci and a radius r. Let
C = {c1, . . . , ck} be the set of centers and r = r(C).
Define C to be an r-cover for X if ∀x ∈ X, ∃cj ∈ C s.t.
d(x, cj) ≤ r.
The k-Center clustering problem
Equivalent statement of the problem: given as input (X, D, k),
select the centers C = {c1, . . . , ck} and r = r(C) such that the
resulting {C1, . . . , Ck} is an r-cover for X, with r as small as
possible.
Formal definition of k-center: given points X ⊂ Z² and k ∈ Z,
compute a set C = {c1, . . . , ck} ⊂ X of centers that minimizes the
cover radius maxx∈X d(x, C), where d(x, C) = minc∈C d(x, c).
The k-Center clustering problem: Complexity

For k > 2, the decision version of the k-center clustering problem
is NP-complete.
There is a deterministic algorithm working in O(n^k) time. (Can
you design one?)
For k = 1 the problem is to find the smallest-radius disk enclosing
the point set, which can be solved in O(n lg n) time. (How?)
The k-Center clustering problem: Greedy algorithm

The algorithm iterates k times; at each iteration it chooses a new
center, adds a new cluster, and refines the radius ri of the cluster
balls. T. Gonzalez (1985)
1. Choose an arbitrary x ∈ X and make c1 = x. Let C1 = {c1}.
2. For all xi ∈ X compute d(xi, c1).
3. Choose c2 = the x ∈ X that maximizes d(x, c1).
4. Let r1 = d(c1, c2) and C2 = {c1, c2}.
5. For i = 2 to k − 1:
   5.1 At iteration i + 1: let ci+1 be the element of X \ Ci that
       maximizes the minimum distance to Ci.
   5.2 Let Ci+1 = {c1, c2, . . . , ci+1} and ri = minj≤i d(ci+1, cj).
6. Output the centers C = {c1, . . . , ck} and rk.
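
A minimal Python sketch of Gonzalez's greedy (our own rendering; math.dist requires Python 3.8+), using the distance-update idea (∗) made explicit on the complexity slide below:

    import math

    def gonzalez_k_center(points, k):
        """points: list of (x, y) tuples. Returns (centers, cover radius).
        Greedy: repeatedly add the point farthest from the current centers."""
        n = len(points)
        centers = [points[0]]                  # step 1: arbitrary first center
        d = [math.dist(p, centers[0]) for p in points]   # d_i[x] = d(x, C_i)
        for _ in range(k - 1):
            far = max(range(n), key=lambda i: d[i])  # farthest point so far
            centers.append(points[far])
            # rule (*): d(x, C_{i+1}) = min(d(x, C_i), d(x, c_{i+1}))
            d = [min(d[i], math.dist(points[i], points[far])) for i in range(n)]
        return centers, max(d)                 # max(d) is the cover radius r(C)

Each of the k rounds does O(n) distance updates, giving the O(kn) bound discussed below.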
Greedy algorithm: Example

Given X, k = 3 and the O(n²) pairwise distances D:

[Diagram: three iterations of the greedy on a point set, showing the
chosen centers x1, x2 and the radii r1, r2, r.]
Greedy algorithm: Complexity

We have the set X of points and all their O(n²) distances. We assume a
data structure that keeps the set of distances D ordered, so that we can
quickly retrieve the distance between any two points of X. (How?)
- At each step i we have to compute the distance from every x ∈ X to
  the current centers c ∈ Ci−1, and choose the new ci and ri. But
- for each x ∈ X we can maintain
  di[x] = d(x, Ci) = min{d(x, Ci−1), d(x, ci)} = min{di−1[x], d(x, ci)}   (∗)
- Therefore at each step, to compute ri we only need to update (∗).
- At iteration i, choosing ci and computing ri takes O(n) steps;
  therefore the complexity of the greedy algorithm is O(kn) steps.
Approximation to the k-center problem

Theorem
The previous greedy algorithm is an approximation algorithm for
the k-center problem with approximation ratio α = 2
(i.e. it returns a set C s.t. r(C) ≤ 2r(C∗), where C∗ is an optimal
set of k centers).
Proof
Let C∗ = {c∗1, . . . , c∗k} and r∗ be the optimal values, and let
C = {c1, . . . , ck} and r be the values returned by the algorithm.
We want to prove r ≤ 2r∗.

Case 1: every optimal ball C∗j covers at least one ci.
Every x ∈ X lies in some C∗j, and that ball contains some center
ci; by the triangle inequality, d(x, ci) ≤ 2r∗.
Proof cont.

Case 2: at least one C∗j does not cover any center of C. Then, by
pigeonhole, some C∗l covers at least two centers ci and cj, so
d(ci, cj) ≤ 2r∗.
Wlog assume the algorithm chooses cj at iteration j, and that ci
was selected as a center in a previous iteration; then d(ci, cj) ≥ rj.
Moreover, notice that r1 ≥ r2 ≥ . . . ≥ rk = r,
therefore r ≤ rj ≤ d(ci, cj) ≤ 2r∗. 2
Data Compression

INPUT: a text T over a finite alphabet Σ.
QUESTION: represent T with as few bits as possible.

The goal of data compression is to reduce the time to transmit
large files, and to reduce the space to store them.
If we use a variable-length encoding, we need a system that is
easy to encode and decode.
Example.

AAACAGTTGCAT · · · GGTCCCTAGG   (a text of 130.000.000 symbols)

- Fixed-length encoding: A = 00, C = 01, G = 10 and T = 11.
  Needs 260 Mbits to store.
- Variable-length encoding: if A appears 7 × 10⁸ times, C
  appears 3 × 10⁶ times, G 2 × 10⁸ times and T 37 × 10⁷ times, it
  is better to assign a shorter string to A and a longer one to C.
Prefix property

Given a set of symbols Σ, a prefix code is a map φ : Σ → {0, 1}⁺
(symbols to strings of bits) such that for distinct x, y ∈ Σ, φ(x) is
not a prefix of φ(y).
If φ(A) = 1 and φ(C) = 101, then φ is not a prefix code.
φ(A) = 1, φ(T) = 01, φ(G) = 000, φ(C) = 001 is a prefix code.
Prefix codes are easy to decode (left-to-right):
000101100110100000101
= 000 · 1 · 01 · 1 · 001 · 1 · 01 · 000 · 001 · 01
=  G    A   T    A   C    A   T    G     C    T
Prefix tree.
Represent an encoding with the prefix property as a binary tree,
the prefix tree:
A prefix tree T is a binary tree with the following properties:
- one leaf per symbol,
- the left edge is labeled 0 and the right edge is labeled 1,
- the labels on the path from the root to a leaf spell the code of
  that leaf.

[Diagram: the prefix tree for Σ = {A, T, G, C} with the code above:
A = 1, T = 01, G = 000, C = 001.]
Frequency.

To find an efficient code for a given text S on Σ, with |S| = n,
first we must find the frequencies of the alphabet symbols:
∀x ∈ Σ, define the frequency

  f(x) = (number of occurrences of x in S) / n.

Notice: Σx∈Σ f(x) = 1.
Given a prefix code φ, what is the total length of the encoding?
The encoding length of S is

  B(S) = Σx∈Σ n f(x) |φ(x)| = n · Σx∈Σ f(x) |φ(x)| = n · α.

Given φ, α = Σx∈Σ f(x) |φ(x)| is the average number of bits
required per symbol.
In terms of the prefix tree T of φ, the codeword length |φ(x)| is
also the depth of x in T; let us denote it by dx(T).
Let B(T) = Σx∈Σ f(x) dx(T).
Example.
Let Σ = {a, b, c, d, e} and let S be a text over Σ with
f(a) = .32, f(b) = .25, f(c) = .20, f(d) = .18, f(e) = .05.
If we use a fixed-length code we need ⌈lg 5⌉ = 3 bits per symbol.
Consider the prefix code φ1:
φ1(a) = 11, φ1(b) = 01, φ1(c) = 001, φ1(d) = 10, φ1(e) = 000

[Diagram: the prefix tree of φ1, with leaves b, d, a at depth 2 and
e, c at depth 3.]

α = .32 · 2 + .25 · 2 + .20 · 3 + .18 · 2 + .05 · 3 = 2.25

On average, φ1 reduces the bits per symbol over the fixed-length
code from 3 to 2.25, about 25%.
Is that the maximum reduction?
Consider the prefix code φ2:
φ2(a) = 11, φ2(b) = 10, φ2(c) = 01, φ2(d) = 001, φ2(e) = 000

[Diagram: the prefix tree of φ2, with leaves c, b, a at depth 2 and
e, d at depth 3.]

α = .32 · 2 + .25 · 2 + .20 · 2 + .18 · 3 + .05 · 3 = 2.23

Is that the best (the maximal compression)?
Optimal prefix code.

Given a text, an optimal prefix code is a prefix code that minimizes
the total number of bits needed to encode the text.
Note that an optimal encoding minimizes α.
Intuitively, in the prefix tree T of an optimal prefix code, symbols
with high frequencies should have small depth and symbols with
low frequency should have large depth.
The search for an optimal prefix code is the search for the tree T
that minimizes α.
Characterization of optimal prefix trees.

A binary tree T is full if every interior node has two children.

Lemma
The binary prefix tree corresponding to an optimal prefix code is
full.

Proof.
Let T be the prefix tree of an optimal code, and suppose it
contains a node u with a single child v.
If u is the root, construct T′ by deleting u and using v as root.
T′ yields a code with fewer bits to encode the symbols, a
contradiction to the optimality of T.
If u is not the root, let w be the parent of u. Construct T′ by
deleting u and connecting v directly to w. Again this decreases the
number of bits, a contradiction to the optimality of T.
Greedy approach: Huffman code

A greedy approach due to David Huffman
(1925-99), devised in 1952 while he was a PhD
student at MIT.

We wish to produce a labeled full binary tree in which the leaves
are as close to the root as possible, and where symbols with low
frequency are placed deeper than symbols with high frequency.
Greedy approach: Huffman code

- Given S, assume we have computed f(x) for every x ∈ Σ.
- Sort the symbols by increasing f. Keep the dynamically sorted
  list in a priority queue Q.
- Construct the tree in a bottom-up fashion: extract the two first
  elements of Q, join them under a new virtual node whose f is
  the sum of the f's of its children, and place the new node in Q.
- When only one node remains in Q, the resulting tree is the
  prefix tree of an optimal prefix code.
Huffman Coding: Construction of the tree.

Huffman (Σ, S)
Given Σ and S, compute the frequencies {f}
Construct a priority queue Q of Σ, ordered by increasing f
while |Q| > 1 do
  create a new node z
  x = Extract-Min(Q)
  y = Extract-Min(Q)
  make x, y the children of z
  f(z) = f(x) + f(y)
  Insert(Q, z)
end while
If Q is implemented by a heap, the algorithm has complexity
O(n lg n).
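
A minimal Python sketch of the construction (our own rendering: trees are nested pairs, and an integer tie-breaker keeps heap entries comparable):

    import heapq

    def huffman_code(freq):
        """freq: dict symbol -> frequency. Returns dict symbol -> codeword.
        Builds the prefix tree bottom-up with a min-priority queue."""
        # Queue entries: (frequency, tie-breaker, tree); a tree is a symbol
        # (leaf) or a pair of trees (internal node)
        pq = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(pq)
        count = len(pq)
        while len(pq) > 1:                  # stop when one tree (the root) remains
            f1, _, t1 = heapq.heappop(pq)   # the two least frequent trees...
            f2, _, t2 = heapq.heappop(pq)
            heapq.heappush(pq, (f1 + f2, count, (t1, t2)))  # ...become siblings
            count += 1
        code = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):     # internal node: 0 left, 1 right
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                code[tree] = prefix or "0"  # degenerate one-symbol alphabet
            return code
        return walk(pq[0][2], "")

    print(huffman_code({'a': .32, 'b': .25, 'c': .20, 'd': .18, 'e': .05}))

On the frequencies of the earlier example this yields a code with α = 2.23 bits per symbol, matching φ2.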
Example

Consider the text: for each rose, a rose is a rose, the rose.
with Σ = {for, each, rose, a, is, the, ",", ␣} (␣ = blank).
Frequencies: f(for) = 1/21, f(each) = 1/21, f(is) = 1/21,
f(the) = 1/21, f(a) = 2/21, f(,) = 2/21, f(rose) = 4/21,
f(␣) = 9/21.
Priority queue:
Q = (for(1/21), each(1/21), is(1/21), the(1/21), a(2/21), ,(2/21),
rose(4/21), ␣(9/21))

First merge: z1(2/21), with children for and each.

Q = (is(1/21), the(1/21), a(2/21), ,(2/21), z1(2/21), rose(4/21),
␣(9/21))
Example.

[Diagram: the remaining merges z2 = {a, is}, z3 = {",", the},
z4 = {z1, z2}, z5 = {rose, z3}, z6 = {z4, z5}, z7 = {␣, z6}, and the
final prefix tree, which gives the codes
␣ = 0, rose = 110, for = 1000, each = 1001, a = 1010, is = 1011,
"," = 1110, the = 1111.]
Example

Therefore for each rose, a rose is a rose, the rose is Huffman
encoded as:
10000100101101110010100110010110101001101110011110110
Notice that with a fixed-length code we would use 4 bits per
symbol, i.e. 84 bits instead of the 53 we use.
The solution is not unique!
Why does Huffman's algorithm produce an optimal prefix code?
Correctness.

Theorem (Greedy property)

Let Σ be an alphabet, and x, y two symbols with the lowest
frequencies. Then there is an optimal prefix code in which the
codewords for x and y have the same length and differ only in the
last bit.

Proof.
Let T be an optimal prefix tree with a and b siblings at maximum
depth; assume f(a) ≤ f(b) and f(x) ≤ f(y). Construct T′ by
exchanging x with a and y with b. As f(x) ≤ f(a) and
f(y) ≤ f(b), then B(T′) ≤ B(T).
Theorem (Optimal substructure)
Assume T′ is an optimal prefix tree for (Σ − {x, y}) ∪ {z}, where
x, y are symbols with the lowest frequencies and z has frequency
f(z) = f(x) + f(y). Then the tree T obtained from T′ by making
x and y children of z is an optimal prefix tree for Σ.

Proof.
Let T0 be any prefix tree for Σ; we must show B(T) ≤ B(T0).
By the greedy property, we only need to consider T0 in which x
and y are siblings. Let T0′ be obtained by removing x, y from T0.
As T0′ is a prefix tree for (Σ − {x, y}) ∪ {z}, we have
B(T0′) ≥ B(T′).
Comparing the trees with and without the leaves x, y we get
B(T0′) + f(x) + f(y) = B(T0) and B(T′) + f(x) + f(y) = B(T).
Putting the three relations together, we get B(T) ≤ B(T0).
Optimality of Huffman

Huffman is optimal under the following assumptions:
- The compression is lossless, i.e. uncompressing the compressed
  file yields the original file.
- We must know the alphabet beforehand (characters, words,
  etc.).
- We must pre-compute the frequencies of the symbols, i.e. read
  the data twice.
For certain applications it is very slow (in the size n of the input
text).
