Data Science - Convex Optimization and Examples
Summary

We begin with some illustrations of challenging topics in modern data science. Then, this session introduces (or recalls) some basics of optimization, and illustrates some key applications in supervised classification.

1 Data Science

1.1 What is data science?

Extract from data some knowledge for industrial or academic exploitation. It generally involves:

In its whole, this sequence of questions is at the core of Artificial Intelligence and may also be referred to as Computer Science problems. In this lecture, we will address some of the issues raised in the red items. Each time, practical examples will be provided.

Most of our motivation comes from the Big Data world, encountered in image processing, finance, genetics and many other fields where knowledge extraction is needed, when facing many observations described by many variables.

1.2 Several examples

Spam detection From a set of labelled messages (spam or not), build a classifier for automatic spam rejection.
● Select the meaningful elements among the words?
● Automatic classification?

Gene expression profiles analysis One measures micro-array datasets built from a huge number of gene expression profiles. Number of genes p (of order thousands). Number of samples n (of order hundreds).
Credit scoring Build an indicator (Q score) from a dataset for the probability of interest in a financial product (Visa Premier credit card).

Recommendation problems
2.4 Why is convexity powerful?

Two kinds of optimization problems:

The set of subgradients ∂f(x) may be empty. Fortunately, this is not the case for convex functions.

PROPOSITION 5. — f : R^d → R is convex if and only if ∂f(x) ≠ ∅ for any x of R^d.

For instance, for the convex function f(x) = |x| on R, ∂f(x) = {sign(x)} for x ≠ 0 and ∂f(0) = [−1, 1].

3 Gradient descent method

Let X ⊂ R^d. The most famous local descent method relies on

y_{t+1} = x_t − η g_t,   where g_t ∈ ∂f(x_t),

and

x_{t+1} = Π_X(y_{t+1}),

where η > 0 is a fixed step-size parameter.

THÉORÈME 6. — [Convergence of the projected gradient descent method, fixed step-size] If f is convex over X with X ⊂ B(0, R) and ∥∂f∥_∞ ≤ L, the choice η = R/(L√t) leads to

f((1/t) ∑_{s=1}^{t} x_s) − min_X f ≤ RL/√t.

THÉORÈME 8. — [Convergence of the gradient descent method, β-smooth function] If f is convex and β-smooth, then the choice η = 1/β leads to convergence of f(x_t) towards min f at rate O(1/t).

Example: Ω is the circle of radius √2; the optimal solution is θ⋆ = (−1, −1)^t and J(θ⋆) = −2.
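As an illustration of THÉORÈME 6, the following short Python sketch implements the projected subgradient method with the fixed step η = R/(L√t) and averaging of the iterates. It is only a sketch: the objective f(x) = ∥x∥₁ (whose subgradient is sign(x)), the radius R = 1, the dimension d = 5 and the horizon t = 1000 are illustrative choices, not taken from the lecture.

import numpy as np

def project_ball(y, radius):
    # Euclidean projection onto the ball B(0, radius).
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def projected_subgradient(subgrad, x0, radius, lipschitz, t):
    # Projected subgradient descent with the fixed step eta = R / (L * sqrt(t)).
    # Returns the averaged iterate (1/t) * sum_s x_s controlled by THEOREME 6.
    eta = radius / (lipschitz * np.sqrt(t))
    x = project_ball(np.asarray(x0, dtype=float), radius)
    running_sum = np.zeros_like(x)
    for _ in range(t):
        g = subgrad(x)                 # any element of the subdifferential at x
        y = x - eta * g                # subgradient step
        x = project_ball(y, radius)    # projection back onto X = B(0, R)
        running_sum += x
    return running_sum / t

# Illustrative toy problem: f(x) = ||x||_1, subgradient sign(x), minimized over B(0, 1).
d = 5
x_avg = projected_subgradient(subgrad=np.sign, x0=np.ones(d),
                              radius=1.0, lipschitz=np.sqrt(d), t=1000)
print(x_avg)   # close to the minimizer 0, within the RL / sqrt(t) guarantee

The averaged iterate printed at the end is exactly the quantity bounded by RL/√t in the theorem.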
3.5.2 KKT Conditions

DÉFINITION 12. — [KKT Conditions] If J and f, g are smooth, we define the Karush-Kuhn-Tucker (KKT) conditions as
● Stationarity: ∇_θ L(λ, µ, θ) = 0.
● Primal admissibility: f(θ) = 0 and g(θ) ≤ 0.
● Dual admissibility: µ_j ≥ 0, ∀ j = 1, . . . , m.

THÉORÈME 13. — A convex minimization problem of J under convex constraints f and g has a solution θ⋆ if and only if there exist λ⋆ and µ⋆ such that the KKT conditions hold.

Example:

J(θ) = (1/2) ∥θ∥₂²   s.t.   θ1 − 2θ2 + 2 ≤ 0.

We get L(θ, µ) = ∥θ∥₂²/2 + µ(θ1 − 2θ2 + 2) with µ ≥ 0.
Stationarity: (θ1 + µ, θ2 − 2µ) = 0, hence θ1 = −µ ≤ 0 and θ2 = 2µ, i.e. θ2 = −2θ1.
Plugging this into the active constraint θ1 − 2θ2 + 2 = 0 gives 5θ1 + 2 = 0.
We deduce that θ⋆ = (−2/5, 4/5).
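As a quick numerical cross-check of this example (not part of the lecture), the same constrained problem can be solved with scipy; the SLSQP solver and its sign convention for inequality constraints are the only assumptions made here.

import numpy as np
from scipy.optimize import minimize

# J(theta) = 0.5 * ||theta||_2^2, subject to theta1 - 2*theta2 + 2 <= 0.
J = lambda th: 0.5 * np.dot(th, th)

# SLSQP uses the convention fun(x) >= 0 for inequality constraints,
# hence the sign flip of the constraint of the example.
constraint = {"type": "ineq", "fun": lambda th: -(th[0] - 2.0 * th[1] + 2.0)}

res = minimize(J, x0=np.zeros(2), method="SLSQP", constraints=[constraint])
print(res.x)   # approximately (-0.4, 0.8) = (-2/5, 4/5), matching the KKT derivation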
3.5.3 Dual function

We introduce the dual function:

L(λ, µ) = min_θ L(θ, λ, µ).

We have the following important result.

THÉORÈME 14. — Denote the optimal value of the constrained problem p⋆ = min {J(θ) ∣ f(θ) = 0, g(θ) ≤ 0}. Then:
● The dual function L is lower than p⋆, for any (λ, µ) ∈ R^n × R^m_+.
● We aim to make this lower bound as close as possible to p⋆: the idea is to maximize the function L with respect to λ, µ.

DÉFINITION 15. — [Dual problem]

max_{λ ∈ R^n, µ ∈ R^m_+} L(λ, µ).

For each fixed θ, L(θ, λ, µ) is an affine function of (λ, µ); hence L, as a pointwise minimum of affine functions, is concave, so the dual problem is a convex and almost unconstrained problem.
● Dual problems are easier than primal ones (almost all the constraints disappear).
● Dual problems are equivalent to primal ones: maximization of the dual ⇔ minimization of the primal (not shown in this lecture).
● Dual solutions permit to recover primal ones through the KKT conditions (Lagrange multipliers).

Example:
● Lagrangian: L(θ, µ) = (θ1² + θ2²)/2 + µ(θ1 − 2θ2 + 2).
● Dual function: L(µ) = min_θ L(θ, µ) = −(5/2) µ² + 2µ (the minimum is attained at θ1 = −µ, θ2 = 2µ).
● Dual solution: max L(µ) such that µ ≥ 0, which gives µ = 2/5.
● Primal solution: KKT ⟹ θ = (−µ, 2µ) = (−2/5, 4/5).

To obtain further details, see von Neumann's Minimax Theorem . . .
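To make the dual route concrete, here is a small Python sketch of the last two bullets for the same toy example (an illustration, not part of the lecture): maximize the dual function over µ ≥ 0, then recover the primal point through the stationarity relation θ = (−µ, 2µ). The bracket (0, 10) used for the bounded search is an arbitrary choice containing the optimum.

import numpy as np
from scipy.optimize import minimize_scalar

# Dual function of the toy example: L(mu) = -(5/2) * mu^2 + 2 * mu.
dual = lambda mu: -(5.0 / 2.0) * mu ** 2 + 2.0 * mu

# Maximize L over mu >= 0 by minimizing -L on a bracket containing the optimum.
res = minimize_scalar(lambda mu: -dual(mu), bounds=(0.0, 10.0), method="bounded")
mu_star = res.x
theta_star = np.array([-mu_star, 2.0 * mu_star])   # stationarity: theta = (-mu, 2*mu)

print(mu_star)      # about 0.4 = 2/5
print(theta_star)   # about (-0.4, 0.8), the primal solution recovered from the dual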
● Many Big Data problems will be translated into the optimization of a convex problem.
● Efficient algorithms are available to optimize them, independently of the dimension of the underlying space.
● Primal-dual formulations are important to overcome some constraints on the optimization.
● Numerical convex solvers are widely and freely distributed.

. . . in the 4/5 pages of the report. The report files should be named lastname.doc or lastname.pdf and are expected in my mailbox before the 8th of February. And to do this, anything is fair game (you can do what you want and find sources everywhere, but take care to avoid plagiarism!).