
Machine learning ISUP - Sorbonne Université

Introduction to machine learning

1 Warm-up: Bayes classifier for scalar Gaussian mixtures


Let (Xi, Yi)1⩽i⩽n be independent variables in R × {0, 1}. Assume that P(Y1 = 0) = 1/2. Assume
also that the distribution of X1 given {Y1 = 0} (resp. {Y1 = 1}) is Gaussian with mean µ0 (resp.
µ1) and variance 1. The probability density function of X1 given {Y1 = 0} (resp. {Y1 = 1}) is written g0 (resp. g1):

g0 : x ↦ (2π)^{−1/2} exp(−(x − µ0)^2/2)   and   g1 : x ↦ (2π)^{−1/2} exp(−(x − µ1)^2/2) .

Figure 1: Samples and densities when µ0 = −2 and µ1 = 0 (left) and µ0 = −2 and µ1 = 2 (right).

1. Provide an expression of a classifier h∗ minimizing h ↦ P(h(X) ̸= Y ).


2. Using Bayes' rule, show that h∗ depends only on the ratio g1/g0.
3. Show that the Bayes classifier classifies a sample by comparing it to the midpoint (µ0 + µ1)/2.
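As a numerical sanity check of question 3 (not part of the exercise; the values µ0 = −2, µ1 = 2 and the sample size are illustrative), the sketch below draws from the mixture and compares the empirical error of the midpoint threshold (µ0 + µ1)/2 with a shifted threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1, n = -2.0, 2.0, 100_000

# Draw labels uniformly from {0, 1}, then X | Y = i from N(mu_i, 1).
y = rng.integers(0, 2, size=n)
x = rng.normal(np.where(y == 0, mu0, mu1), 1.0)

# Candidate Bayes classifier: threshold at the midpoint (mu0 + mu1) / 2.
midpoint = (mu0 + mu1) / 2.0
h_star = (x > midpoint).astype(int)

err_bayes = np.mean(h_star != y)
err_shifted = np.mean(((x > midpoint + 0.5).astype(int)) != y)
print(err_bayes, err_shifted)  # the shifted threshold does worse
```

Here the Bayes error should be close to Φ(−(µ1 − µ0)/2) = Φ(−2) ≈ 0.023, and any other threshold gives a strictly larger error.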

2 Bayes classifier
2.1 Uniform distributions
Assume that (X, Y ) ∈ R × {0, 1} is defined on (Ω, F, P) with P(Y = 1) = π ∈ (0, 1). Assume that,
conditionally on {Y = 0} (resp. {Y = 1}), X has a uniform distribution on [0, θ] with θ ∈ (0, 1)
(resp. on [0, 1]). Compute η(X) = P(Y = 1|X).
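For a quick check of the closed form this exercise leads to, note that on [0, θ] the densities are 1/θ (class 0) and 1 (class 1), so η(x) = πθ/(πθ + 1 − π) there, and η(x) = 1 on (θ, 1]. A minimal Monte Carlo sketch (the values π = 0.4, θ = 0.3 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
pi, theta, n = 0.4, 0.3, 200_000

y = (rng.random(n) < pi).astype(int)
# X | Y=1 ~ Uniform[0, 1];  X | Y=0 ~ Uniform[0, theta]
x = np.where(y == 1, rng.random(n), theta * rng.random(n))

# Closed form: eta(x) = pi*theta / (pi*theta + 1 - pi) on [0, theta], 1 above.
eta_low = pi * theta / (pi * theta + 1 - pi)

mask = x <= theta
print(y[mask].mean(), eta_low)  # empirical P(Y=1 | X <= theta) vs closed form
print(y[~mask].mean())          # should be exactly 1.0: only class 1 exceeds theta
```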

2.2 Weighted risk


Assume that (X, Y ) ∈ R × {0, 1} is defined on (Ω, F, P). Using ω0, ω1 > 0, with ω0 + ω1 = 1, we
consider the weighted risk:

R(h) = E[2ω_Y 1_{Y ̸= h(X)}] .

Compute a classifier h∗ minimizing h ↦ R(h), and compute R(h∗).
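Conditionally on X, predicting 0 costs 2ω1 η(X) while predicting 1 costs 2ω0 (1 − η(X)), so the minimizer predicts 1 exactly when η(x) > ω0. The sketch below checks this numerically on a Gaussian mixture with equal priors (the weights and means are illustrative, not from the exercise):

```python
import numpy as np

rng = np.random.default_rng(2)
w0, w1 = 0.8, 0.2            # heavier penalty on misclassifying Y = 0
mu0, mu1, n = -1.0, 1.0, 200_000

y = rng.integers(0, 2, size=n)
x = rng.normal(np.where(y == 0, mu0, mu1), 1.0)

def eta(x):
    # P(Y=1 | X=x) for this equal-prior Gaussian mixture.
    g0 = np.exp(-(x - mu0) ** 2 / 2)
    g1 = np.exp(-(x - mu1) ** 2 / 2)
    return g1 / (g0 + g1)

def weighted_risk(h):
    w = np.where(y == 0, w0, w1)
    return np.mean(2 * w * (h != y))

h_weighted = (eta(x) > w0).astype(int)    # predict 1 iff w1*eta > w0*(1-eta)
h_unweighted = (eta(x) > 0.5).astype(int) # ignores the weights
print(weighted_risk(h_weighted), weighted_risk(h_unweighted))
```

Raising the threshold from 1/2 to ω0 = 0.8 makes the classifier more reluctant to predict 1, which is exactly what the heavier weight on class-0 errors demands.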

3 Additional exercises
3.1 Bayes classifier: excess risk
Let (X, Y ) ∈ R^d × {0, 1} be random variables defined on the same probability space (Ω, F, P). For
any classifier h : X → {0, 1}, define its classification error by

R(h) = P(Y ̸= h(X)) .

The classifier h∗ defined by

h∗(x) = 1_{η(x) ⩾ 1/2} ,

where

η(X) = P(Y = 1|X) ,

minimizes h ↦ R(h).
1. Prove that

R(h∗) = E [η(X) ∧ (1 − η(X))] ⩽ 1/2 .
2. Prove that for all classifiers h, the excess risk is given by

R(h) − R(h∗) = E [|1 − 2η(X)| |h(X) − h∗(X)|] .

3.2 Plug-in classifier


Let (X, Y ) ∈ R^d × {−1, 1} be random variables defined on the same probability space (Ω, F, P).
For any classifier h : X → {−1, 1}, define its classification error by

R(h) = P(Y ̸= h(X)) .

The classifier h∗ defined by

h∗(x) = sign(η(x) − 1/2) ,

where

η(X) = P(Y = 1|X) ,

minimizes h ↦ R(h). Given n independent couples {(Xi, Yi)}1⩽i⩽n with the same distribution as
(X, Y ), an empirical surrogate for h∗ is obtained from a (possibly nonparametric) estimator η̂n of
η:

ĥn : x ↦ sign(η̂n(x) − 1/2) .

1. Prove that for any classifier h : X → {−1, 1},

P(Y ̸= h(X)|X) = (2η(X) − 1)1_{h(X)=−1} + 1 − η(X)

and

R(h) − R(h∗) = 2E [|η(X) − 1/2| 1_{h(X)̸=h∗(X)}] .

2. Prove that

|η(x) − 1/2| 1_{ĥn(x)̸=h∗(x)} ⩽ |η(x) − η̂n(x)| 1_{ĥn(x)̸=h∗(x)} ,

where ĥn : x ↦ sign(η̂n(x) − 1/2). Deduce that

R(ĥn) − R(h∗) ⩽ 2E[|η(X) − η̂n(X)|^2]^{1/2} .
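The final bound can be observed numerically: estimate η with a crude nonparametric estimator (here a histogram regressor, an illustrative choice, not prescribed by the exercise), then compare the excess risk of the plug-in classifier ĥn with twice the L2 estimation error of η̂n on fresh data:

```python
import numpy as np

rng = np.random.default_rng(4)
mu0, mu1 = -1.0, 1.0

def sample(n):
    # Equal-prior Gaussian mixture with labels in {-1, +1}.
    y = rng.integers(0, 2, size=n)
    x = rng.normal(np.where(y == 0, mu0, mu1), 1.0)
    return x, 2 * y - 1

def eta(x):
    return 1.0 / (1.0 + np.exp(-2.0 * x))  # true P(Y=+1|X=x) for this model

# Histogram estimate of eta: fraction of +1 labels per bin (0.5 if bin empty).
x_tr, y_tr = sample(5_000)
bins = np.linspace(-5, 5, 41)
idx = np.clip(np.digitize(x_tr, bins) - 1, 0, len(bins) - 2)
eta_hat_bin = np.array([
    (y_tr[idx == b] == 1).mean() if np.any(idx == b) else 0.5
    for b in range(len(bins) - 1)
])

def eta_hat(x):
    j = np.clip(np.digitize(x, bins) - 1, 0, len(bins) - 2)
    return eta_hat_bin[j]

# Evaluate the excess risk and the L2 bound on a fresh test sample.
x_te, y_te = sample(200_000)
h_star = np.where(eta(x_te) >= 0.5, 1, -1)
h_n = np.where(eta_hat(x_te) >= 0.5, 1, -1)
excess = np.mean(h_n != y_te) - np.mean(h_star != y_te)
bound = 2.0 * np.sqrt(np.mean((eta(x_te) - eta_hat(x_te)) ** 2))
print(excess, bound)  # excess risk sits well below the L2 bound
```

The bound is typically loose: the plug-in classifier only pays for estimation errors of η̂n that flip the decision near η = 1/2, while the L2 norm charges for errors everywhere.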
