
ℓ1-norm Methods for Convex-Cardinality Problems
Part II

• total variation
• iterated weighted ℓ1 heuristic
• matrix rank constraints



Total variation reconstruction

• fit xcor with piecewise constant x̂, no more than k jumps

• convex-cardinality problem: minimize ‖x̂ − xcor‖_2 subject to card(Dx̂) ≤ k (D is the first-order difference matrix)

• heuristic: minimize ‖x̂ − xcor‖_2 + γ‖Dx̂‖_1; vary γ to adjust the number of jumps (sketch below)

• ‖Dx̂‖_1 is the total variation of the signal x̂

• method is called total variation reconstruction

• unlike ℓ2-based reconstruction, TV reconstruction filters out high-frequency noise while preserving sharp jumps
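A minimal CVXPY sketch of the heuristic (CVXPY is not part of the slides; xcor and gamma below are stand-ins for the actual data):

    import cvxpy as cp
    import numpy as np

    # stand-in data: the real example uses the corrupted signal from BV §6.3.3
    n = 2000
    xcor = np.random.randn(n)
    gamma = 1.0                     # vary to trade off fit vs. number of jumps

    xhat = cp.Variable(n)
    # objective ‖x̂ − xcor‖_2 + γ‖Dx̂‖_1; cp.diff applies the difference matrix D
    obj = cp.norm(xhat - xcor, 2) + gamma * cp.norm(cp.diff(xhat), 1)
    cp.Problem(cp.Minimize(obj)).solve()
    # xhat.value is the piecewise-constant estimate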



Example (§6.3.3 in BV book)

signal x ∈ R^2000 and corrupted signal xcor ∈ R^2000

[Figure: x (top) and xcor (bottom) plotted against t, 0 ≤ t ≤ 2000]


Total variation reconstruction for three values of γ

[Figure: TV reconstructions x̂ versus t for three values of γ]


ℓ2 reconstruction for three values of γ

[Figure: ℓ2 reconstructions x̂ versus t for three values of γ]


Example: 2D total variation reconstruction

• x ∈ R^n are values of pixels on an N × N grid (N = 31, so n = 961)

• assumption: x has relatively few big changes in value (i.e., boundaries)

• we have m = 120 linear measurements, y = F x (Fij ∼ N(0, 1))

• as convex-cardinality problem:

   minimize   Σ_{i,j} card(x_{i,j} − x_{i+1,j}) + Σ_{i,j} card(x_{i,j} − x_{i,j+1})
   subject to y = F x

• ℓ1 heuristic (objective is a 2D version of total variation):

   minimize   Σ_{i,j} |x_{i,j} − x_{i+1,j}| + Σ_{i,j} |x_{i,j} − x_{i,j+1}|
   subject to y = F x
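A CVXPY sketch of the ℓ1 heuristic above (F and y are randomly generated stand-ins for the actual measurement data):

    import cvxpy as cp
    import numpy as np

    N, m = 31, 120
    n = N * N
    F = np.random.randn(m, n)               # Fij ~ N(0, 1)
    y = F @ np.random.randn(n)              # stand-in measurements

    X = cp.Variable((N, N))
    # anisotropic 2D TV: sums of |x_{i,j} - x_{i+1,j}| and |x_{i,j} - x_{i,j+1}|
    tv = cp.sum(cp.abs(cp.diff(X, axis=0))) + cp.sum(cp.abs(cp.diff(X, axis=1)))
    cp.Problem(cp.Minimize(tv), [F @ cp.vec(X) == y]).solve()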



TV reconstruction

[Figure: original signal (left) and TV reconstruction (right), shown as surfaces over the 31 × 31 pixel grid]

. . . not bad for 8× more variables than measurements!


ℓ2 reconstruction

[Figure: original signal (left) and ℓ2 reconstruction (right), shown as surfaces over the 31 × 31 pixel grid]

. . . this is what you’d expect with 8× more variables than measurements


Iterated weighted ℓ1 heuristic

• to minimize card(x) over x ∈ C (code sketch below):

   w := 1
   repeat
      minimize ‖diag(w) x‖_1 over x ∈ C
      w_i := 1/(ε + |x_i|)

• first iteration is the basic ℓ1 heuristic

• increases relative weight on small x_i

• typically converges in 5 or fewer steps

• often gives a modest improvement (i.e., reduction in card(x)) over the basic ℓ1 heuristic
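A CVXPY sketch of the loop; the constraint set C is passed as a function returning a constraint list (names are illustrative, not from the slides):

    import cvxpy as cp
    import numpy as np

    def iterated_weighted_l1(make_constraints, n, eps=1e-3, iters=5):
        # iterated weighted l1 heuristic for minimizing card(x) over x in C
        x = cp.Variable(n)
        w = np.ones(n)                          # first pass = basic l1 heuristic
        for _ in range(iters):
            obj = cp.Minimize(cp.norm(cp.multiply(w, x), 1))
            cp.Problem(obj, make_constraints(x)).solve()
            w = 1.0 / (eps + np.abs(x.value))   # upweight small entries
        return x.value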



Interpretation

• wlog we can take x ⪰ 0 (by writing x = x₊ − x₋, with x₊, x₋ ⪰ 0, and replacing card(x) with card(x₊) + card(x₋))

• we’ll use the approximation card(z) ≈ log(1 + z/ε), where ε > 0, z ∈ R₊

• using this approximation, we get the (nonconvex) problem

   minimize   Σ_{i=1}^n log(1 + x_i/ε)
   subject to x ∈ C, x ⪰ 0

• we’ll find a local solution by linearizing the objective at the current point x^(k),

   Σ_{i=1}^n log(1 + x_i/ε) ≈ Σ_{i=1}^n log(1 + x_i^(k)/ε) + Σ_{i=1}^n (x_i − x_i^(k))/(ε + x_i^(k))



and solving the resulting convex problem

   minimize   Σ_{i=1}^n w_i x_i
   subject to x ∈ C, x ⪰ 0

with w_i = 1/(ε + x_i^(k)) (the derivative of log(1 + z/ε) at z = x_i^(k)), to get the next iterate

• repeat until convergence to get a local solution



Sparse solution of linear inequalities

• minimize card(x) over polyhedron {x | Ax ⪯ b}, A ∈ R^{100×50}

• ℓ1 heuristic finds x ∈ R^50 with card(x) = 44

• iterated weighted ℓ1 heuristic finds x with card(x) = 36
  (global solution, via branch & bound, is card(x) = 32)
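Reproducing this setup with random stand-in data (the actual A and b are not given), using the iterated_weighted_l1 sketch defined earlier:

    import numpy as np

    A = np.random.randn(100, 50)
    b = A @ np.random.randn(50) + 1.0    # offset keeps the polyhedron nonempty
    x = iterated_weighted_l1(lambda x: [A @ x <= b], n=50)
    print((np.abs(x) > 1e-5).sum())      # estimated card(x)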
[Figure: card(x) versus iteration (1–6) for the basic ℓ1 and iterated ℓ1 heuristics]



Detecting changes in time series model

• AR(2) scalar time-series model

   y(t + 2) = a(t)y(t + 1) + b(t)y(t) + v(t),   v(t) IID N(0, 0.5²)

• assumption: a(t) and b(t) are piecewise constant, change infrequently

• given y(t), t = 1, . . . , T, estimate a(t), b(t), t = 1, . . . , T − 2

• heuristic: minimize over variables a(t), b(t), t = 1, . . . , T − 2

   Σ_{t=1}^{T−2} (y(t + 2) − a(t)y(t + 1) − b(t)y(t))²
   + γ Σ_{t=1}^{T−3} (|a(t + 1) − a(t)| + |b(t + 1) − b(t)|)

• vary γ to trade off fit versus number of changes in a, b (sketch below)
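A CVXPY sketch of the heuristic (y is a stand-in series; the plots suggest T = 300):

    import cvxpy as cp
    import numpy as np

    T = 300
    y = np.random.randn(T)               # stand-in for the observed series
    gamma = 10.0

    a = cp.Variable(T - 2)
    b = cp.Variable(T - 2)
    # residuals y(t+2) - a(t)y(t+1) - b(t)y(t) for t = 1, ..., T-2
    resid = y[2:] - cp.multiply(a, y[1:-1]) - cp.multiply(b, y[:-2])
    fit = cp.sum_squares(resid)
    tv = cp.norm(cp.diff(a), 1) + cp.norm(cp.diff(b), 1)
    cp.Problem(cp.Minimize(fit + gamma * tv)).solve()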



Time series and true coefficients

[Figure: left, the series y(t); right, the true coefficients a(t) and b(t), for t = 0, . . . , 300]


TV heuristic and iterated TV heuristic

left: TV with γ = 10; right: iterated TV, 5 iterations, ε = 0.005

[Figure: estimated coefficients a(t) and b(t) versus t for each method]


Extension to matrices

• Rank is natural analog of card for matrices

• convex-rank problem: convex, except for Rank in objective or constraints

• rank problem reduces to card problem when matrices are diagonal: Rank(diag(x)) = card(x)

• analog of ℓ1 heuristic: use nuclear norm, ‖X‖_* = Σ_i σ_i(X) (sum of singular values; dual of spectral norm); example sketch below

• for X ⪰ 0, ‖X‖_* reduces to Tr X (for x ⪰ 0, ‖x‖_1 reduces to 1^T x)
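A minimal CVXPY sketch of the nuclear-norm heuristic for a rank problem with affine constraints (all data here are illustrative):

    import cvxpy as cp
    import numpy as np

    m, n, p = 10, 10, 40
    A = np.random.randn(p, m * n)        # linear measurement operator
    Xtrue = np.outer(np.random.randn(m), np.random.randn(n))   # rank-1 target
    b = A @ Xtrue.flatten(order='F')     # 'F' matches cp.vec column ordering

    X = cp.Variable((m, n))
    # minimize ‖X‖_* subject to the affine measurements
    cp.Problem(cp.Minimize(cp.normNuc(X)), [A @ cp.vec(X) == b]).solve()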



Factor modeling

• given matrix Σ ∈ S^n_+, find approximation of form Σ̂ = F F^T + D, where F ∈ R^{n×r}, D is diagonal nonnegative

• gives underlying factor model (with r factors)

   x = F z + v,   v ∼ N(0, D),   z ∼ N(0, I)

• model with fewest factors:

   minimize   Rank X
   subject to X ⪰ 0, D ⪰ 0 diagonal
              X + D ∈ C

with variables X, D ∈ S^n; C is a convex set of acceptable approximations to Σ



• e.g., via KL divergence

   C = {Σ̂ | − log det(Σ^{−1/2} Σ̂ Σ^{−1/2}) + Tr(Σ^{−1/2} Σ̂ Σ^{−1/2}) − n ≤ ε}

• trace heuristic:

   minimize   Tr X
   subject to X ⪰ 0, D ⪰ 0 diagonal
              X + D ∈ C

with variables X, D ∈ S^n



Example

• x = F z + v, z ∼ N(0, I), v ∼ N(0, D), D diagonal; F ∈ R^{20×3}

• Σ is empirical covariance matrix from N = 3000 samples

• set of acceptable approximations

   C = {Σ̂ | ‖Σ^{−1/2}(Σ̂ − Σ)Σ^{−1/2}‖ ≤ β}

• trace heuristic (CVXPY sketch below):

   minimize   Tr X
   subject to X ⪰ 0, d ⪰ 0
              ‖Σ^{−1/2}(X + diag(d) − Σ)Σ^{−1/2}‖ ≤ β
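A CVXPY sketch of this trace heuristic, with Σ built from simulated data matching the setup above (the true F and D here are random stand-ins):

    import cvxpy as cp
    import numpy as np

    n, r, Nsamp = 20, 3, 3000
    F = np.random.randn(n, r)
    Dtrue = np.diag(np.random.rand(n))
    samples = (np.random.randn(Nsamp, r) @ F.T
               + np.random.randn(Nsamp, n) @ np.sqrt(Dtrue))
    Sigma = np.cov(samples, rowvar=False)    # empirical covariance

    w, V = np.linalg.eigh(Sigma)
    Sih = V @ np.diag(w ** -0.5) @ V.T       # Sigma^{-1/2}
    beta = 0.1357

    X = cp.Variable((n, n), PSD=True)
    d = cp.Variable(n, nonneg=True)
    resid = Sih @ (X + cp.diag(d) - Sigma) @ Sih
    cp.Problem(cp.Minimize(cp.trace(X)),
               [cp.sigma_max(resid) <= beta]).solve()
    # np.linalg.matrix_rank(X.value, tol=1e-4) estimates the recovered rank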



Trace approximation results

[Figure: Rank(X) (left) and eigenvalues λ_i(X), log scale (right), versus β for 10^{−2} ≤ β ≤ 10^0]


• for β = 0.1357 (knee of the tradeoff curve) we find

   – ∠(range(X), range(F F^T)) = 6.8°
   – ‖d − diag(D)‖/‖diag(D)‖ = 0.07

• i.e., we have recovered the factor model from the empirical covariance
