This tutorial presents how to derive the proximity operator and subdifferential of the l_2 penalty, and of the l_2 penalty with a nonnegativity constraint.
Let
g:x \mapsto \norm{x}_2 ,
then its Fenchel-Legendre conjugate is
g^{\star}:x \mapsto i_{\norm{x}_2 \leq 1} ,
and for all x \in \mathbb{R}^p
\text{prox}_{g^{\star}}(x) = \text{proj}_{\mathcal{B}_2}(x) = \frac{x}{\max(\norm{x}_2, 1)} .
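This projection is straightforward to implement; below is a minimal NumPy sketch (the helper name ``proj_B2`` is ours, not an existing API):

.. code-block:: python

    import numpy as np

    def proj_B2(x):
        """Projection onto the unit l2 ball, i.e. prox of the indicator i_{||.||_2 <= 1}."""
        return x / max(np.linalg.norm(x), 1.)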
Using the Moreau decomposition, Equations :eq:`fenchel` and :eq:`prox_projection`, one has
\text{prox}_{\lambda g}(x) = x - \lambda \text{prox}_{g^\star/\lambda }(x/\lambda)
= x - \lambda \text{prox}_{g^\star}(x/\lambda)
= x - \lambda \frac{x/\lambda}{\max(\norm{x/\lambda}_2, 1)}
= x - \frac{\lambda x}{\max(\norm{x}_2, \lambda)}
= (1 - \frac{\lambda}{\norm{x}_2})_{+} x .
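The resulting block soft-thresholding operator can be checked numerically against the Moreau decomposition above; here is a minimal sketch (function and variable names are ours):

.. code-block:: python

    import numpy as np

    def prox_group_l2(x, lam):
        """Prox of lam * ||.||_2, i.e. block soft-thresholding."""
        norm_x = np.linalg.norm(x)
        if norm_x == 0:
            return np.zeros_like(x)
        return max(0., 1. - lam / norm_x) * x

    rng = np.random.default_rng(0)
    x, lam = rng.standard_normal(5), 0.7
    # Moreau decomposition: prox_{lam g}(x) = x - lam * proj_{B_2}(x / lam)
    moreau = x - lam * (x / lam) / max(np.linalg.norm(x / lam), 1.)
    np.testing.assert_allclose(prox_group_l2(x, lam), moreau)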
A similar formula can be derived for the group Lasso with nonnegative constraints.
Let
h:x \mapsto \norm{x}_2 + i_{x \geq 0} .
Let x \in \mathbb{R}^p and S = \{ j \in \{1, \ldots, p\} \mid x_j > 0 \}, then
h^{\star} :x \mapsto i_{\norm{x_S}_2 \leq 1} ,
and
\text{prox}_{h^{\star}}(x)_{S^c} = x_{S^c}
\text{prox}_{h^{\star}}(x)_S = \text{proj}_{\mathcal{B}_2}(x_S) = \frac{x_S}{\max(\norm{x_S}_2, 1)} .
As before, using the Moreau decomposition and Equation :eq:`fenchel_nn` yields
\text{prox}_{\lambda h}(x) = x - \lambda \text{prox}_{h^\star / \lambda }(x/\lambda)
= x - \lambda \text{prox}_{h^\star}(x/\lambda) ,
and thus, combined with Equations :eq:`prox_projection_nn_Sc` and :eq:`prox_projection_nn_S`, this yields
\text{prox}_{\lambda h}(x)_{S^c} = 0
\text{prox}_{\lambda h}(x)_{S} = (1 - \frac{\lambda}{\norm{x_S}_2})_{+} x_S .
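These two cases translate directly into code; a minimal sketch could look as follows (``prox_nn_group_l2`` is a name chosen here, not an existing API):

.. code-block:: python

    import numpy as np

    def prox_nn_group_l2(x, lam):
        """Prox of lam * (||.||_2 + i_{. >= 0}): nonpositive coordinates are set
        to zero, and the positive part is block soft-thresholded."""
        prox = np.zeros_like(x)
        S = x > 0                        # support of the positive part
        norm_xS = np.linalg.norm(x[S])
        if norm_xS > lam:
            prox[S] = (1. - lam / norm_xS) * x[S]
        return prox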
For the subdiff_diff
working set strategy, we compute the distance D(v) of a vector v to the subdifferential of the penalty h at a point w.
Since the penalty is group-separable, we reduce to the case where w is a single block of variables in \mathbb{R}^g.
If any component of w is strictly negative, the subdifferential is empty, and the distance is + \infty.
D(v) = + \infty, \quad \forall v \in \mathbb{R}^g .
At w = 0, the subdifferential is:
\lambda \partial || \cdot ||_2 (0) + \partial \iota_{\cdot \geq 0} (0) = \lambda \mathcal{B}_2 + \mathbb{R}_-^g ,
where \mathcal{B}_2 is the unit \ell_2 ball.
Therefore, the distance to the subdifferential is
D(v) = \min_{u \in \lambda \mathcal{B}_2, n \in \mathbb{R}_{-}^g} \ || u + n - v || .
Minimizing over n then over u, thanks to [1], yields
D(v) = \max(0, ||v^+|| - \lambda) ,
where v^+ is v restricted to its positive coordinates. Intuitively, if v_i < 0, we can cancel it exactly in the objective function by taking n_i = - v_i and u_i = 0; on the other hand, if v_i > 0, taking a nonzero n_i only increases the quantity that u_i needs to bring closer to 0.
For a rigorous derivation, introduce the Lagrangian of the squared objective (with dual variables \nu \geq 0 and \mu \in \mathbb{R}_+^g)
\mathcal{L}(u, n, \nu, \mu) = \frac{1}{2}\norm{u + n - v}^2 + \nu(\frac{1}{2} \norm{u}^2 - \lambda^2 / 2) + \langle \mu, n \rangle ,
and write down the optimality conditions with respect to u and n. Treat the case \nu = 0 separately; otherwise, show that v = (1 + \nu) u + n and u = \mu / \nu (hence u is nonnegative), and use complementary slackness to reach the conclusion.
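As a sanity check, the closed form can also be compared to a brute-force minimization of || u + n - v || over u \in \lambda \mathcal{B}_2 and n \leq 0, for instance with scipy (a sketch; variable names are ours):

.. code-block:: python

    import numpy as np
    from scipy.optimize import minimize

    lam, g = 0.5, 4
    rng = np.random.default_rng(0)
    v = rng.standard_normal(g)

    # closed form: distance of v to lam * B_2 + R_-^g
    closed_form = max(0., np.linalg.norm(np.maximum(v, 0.)) - lam)

    # brute force: minimize ||u + n - v|| over ||u||_2 <= lam and n <= 0,
    # with z = [u, n] stacked in a single vector
    objective = lambda z: np.linalg.norm(z[:g] + z[g:] - v)
    constraints = [
        {"type": "ineq", "fun": lambda z: lam - np.linalg.norm(z[:g])},  # ||u||_2 <= lam
        {"type": "ineq", "fun": lambda z: -z[g:]},                       # n <= 0
    ]
    res = minimize(objective, np.zeros(2 * g), constraints=constraints)
    print(closed_form, res.fun)  # both values should match up to solver tolerance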
When w \neq 0 (with w \geq 0), the subdifferential is \lambda w / ||w|| + C_1 \times \ldots \times C_g, where C_j = \{0\} if w_j > 0 and C_j = \mathbb{R}_- otherwise (w_j = 0).
Letting p denote the projection of v onto this set, one has
p_j = \lambda \frac{w_j}{||w||} \text{ if } w_j > 0
and
p_j = \min(v_j, 0) \text{ otherwise}.
The distance to the subdifferential is then:
D(v) = || v - p || = \sqrt{\sum_{j, w_j > 0} (v_j - \lambda \frac{w_j}{||w||})^2 + \sum_{j, w_j=0} \max(0, v_j)^2} ,
since v_j - \min(v_j, 0) = v_j + \max(-v_j, 0) = \max(0, v_j).
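Putting the three cases together (a strictly negative coordinate, w = 0, and w \neq 0), the distance computation for one block might be sketched as follows (``dist_subdiff`` is a name chosen here):

.. code-block:: python

    import numpy as np

    def dist_subdiff(v, w, lam):
        """Distance of v to the subdifferential of lam * (||.||_2 + i_{. >= 0}) at w."""
        if np.any(w < 0):
            return np.inf                      # empty subdifferential
        if not np.any(w):                      # w == 0
            return max(0., np.linalg.norm(np.maximum(v, 0.)) - lam)
        # w != 0: project v onto lam * w / ||w|| + C_1 x ... x C_g
        p = np.where(w > 0, lam * w / np.linalg.norm(w), np.minimum(v, 0.))
        return np.linalg.norm(v - p)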
[1] https://math.stackexchange.com/a/2887332/167258