
Machine learning ISUP - Sorbonne Université

Introduction to machine learning

1 Warm-up: Bayes classifier for scalar Gaussian mixtures


Let (X_i, Y_i)_{1 ≤ i ≤ n} be independent variables in R × {0, 1}. Assume that P(Y_1 = 0) = 1/2. Assume also that the distribution of X_1 given {Y_1 = 0} (resp. {Y_1 = 1}) is Gaussian with mean µ_0 (resp. µ_1) and variance 1. The probability density function of X_1 is written g. Write

$$g_0 : x \mapsto (2\pi)^{-1/2} \exp(-(x - \mu_0)^2/2) \quad \text{and} \quad g_1 : x \mapsto (2\pi)^{-1/2} \exp(-(x - \mu_1)^2/2).$$

Figure 1: Samples and density when µ_0 = −2 and µ_1 = 0 (left) and µ_0 = −2 and µ_1 = 2 (right).

1. Provide an expression for a classifier h∗ minimizing h ↦ P(h(X) ≠ Y).

2. Using Bayes' rule, show that h∗ depends only on the ratio g_1/g_0.
3. Show that the Bayes classifier classifies samples by thresholding at the midpoint (µ_0 + µ_1)/2 of the two means.
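The midpoint rule from question 3 can be checked by simulation. A minimal sketch, assuming the means µ_0 = −2, µ_1 = 2 from the right panel of Figure 1 (the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1 = -2.0, 2.0  # means from the right panel of Figure 1

# Sample (X_i, Y_i): P(Y = 0) = 1/2, then X | Y Gaussian with unit variance.
n = 100_000
y = rng.integers(0, 2, size=n)
x = rng.normal(np.where(y == 0, mu0, mu1), 1.0)

# Bayes classifier: comparing g1(x) and g0(x) reduces, with equal priors,
# unit variance and mu1 > mu0, to thresholding x at (mu0 + mu1) / 2.
def h_star(x):
    return (x > (mu0 + mu1) / 2).astype(int)

error = np.mean(h_star(x) != y)
print(f"empirical risk of h*: {error:.4f}")  # close to P(Z > 2), about 0.023
```

The empirical risk matches the theoretical value Φ(−(µ_1 − µ_0)/2), since an error occurs exactly when a sample falls on the wrong side of the midpoint.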

2 Bayes classifier
2.1 Uniform distributions
Assume that (X, Y) ∈ R × {0, 1} is defined on (Ω, F, P) with P(Y = 1) = π ∈ (0, 1). Assume that, conditionally on {Y = 0} (resp. {Y = 1}), X has a uniform distribution on [0, θ] with θ ∈ (0, 1) (resp. on [0, 1]). Compute η(X) = P(Y = 1 | X).
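A Monte Carlo sanity check of the answer, assuming illustrative values π = 0.4 and θ = 0.6 (any values in (0, 1) would do): on [0, θ] the two conditional densities are 1 and 1/θ, so η is constant there, and η = 1 above θ.

```python
import numpy as np

rng = np.random.default_rng(1)
pi_, theta = 0.4, 0.6  # illustrative values for pi and theta, both in (0, 1)

# Sample (X, Y): Y ~ Bernoulli(pi), X | Y=1 ~ U[0, 1], X | Y=0 ~ U[0, theta]
n = 200_000
y = (rng.random(n) < pi_).astype(int)
x = np.where(y == 1, rng.random(n), theta * rng.random(n))

# Closed-form eta: on [0, theta] the densities are 1 (class 1) and 1/theta
# (class 0); above theta only class 1 has mass, so eta(x) = 1 there.
def eta(x):
    return np.where(x <= theta, pi_ / (pi_ + (1 - pi_) / theta), 1.0)

# Monte Carlo check: eta is constant on [0, theta], so it should match the
# empirical fraction of Y = 1 among samples with X <= theta.
print(float(eta(np.array(0.3))), float(y[x <= theta].mean()))
```

Because η is constant on [0, θ], it coincides with P(Y = 1 | X ≤ θ), which is what the empirical fraction estimates.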

2.2 Weighted risk

Assume that (X, Y) ∈ R × {0, 1} is defined on (Ω, F, P). Using ω_0, ω_1 > 0 with ω_0 + ω_1 = 1, we consider the weighted risk:
$$R(h) = \mathbb{E}\left[2\omega_Y \mathbf{1}_{Y \neq h(X)}\right].$$
Compute a classifier h∗ minimizing h ↦ R(h), and compute R(h∗).
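The minimizer classifies 1 whenever η(x) ≥ ω_0 (shift the threshold toward the cheaper error). A numerical sketch on a hypothetical test bed, reusing the Gaussian mixture of Section 1 with assumed means ±1 and weights ω_0 = 0.3, ω_1 = 0.7, sweeping thresholds on η:

```python
import numpy as np

rng = np.random.default_rng(2)
w0, w1 = 0.3, 0.7          # assumed weights with w0 + w1 = 1
mu0, mu1 = -1.0, 1.0       # Gaussian mixture of Section 1 as a test bed

n = 200_000
y = rng.integers(0, 2, size=n)
x = rng.normal(np.where(y == 0, mu0, mu1), 1.0)

def eta(x):  # P(Y = 1 | X = x) for equal priors and unit variances
    g0 = np.exp(-(x - mu0) ** 2 / 2)
    g1 = np.exp(-(x - mu1) ** 2 / 2)
    return g1 / (g0 + g1)

def weighted_risk(t):  # risk of the classifier 1{eta(x) >= t}
    h = (eta(x) >= t).astype(int)
    w = np.where(y == 0, w0, w1)
    return np.mean(2 * w * (h != y))

# The optimal threshold on eta should be w0; check against a coarse grid.
grid = np.linspace(0.05, 0.95, 19)
best = grid[np.argmin([weighted_risk(t) for t in grid])]
print(f"best threshold on grid: {best:.2f} (theory: {w0})")
```

Misclassifying the heavier class costs more, so the empirically best threshold sits near ω_0 rather than 1/2.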

3 Additional exercises
3.1 Bayes classifier: excess risk
Let (X, Y) ∈ R^d × {0, 1} be random variables defined on the same probability space (Ω, F, P). For any classifier h : X → {0, 1}, define its classification error by
$$R(h) = \mathbb{P}(Y \neq h(X)).$$

The classifier h∗ defined by
$$h^*(x) = \operatorname{sign}(\eta(x) - 1/2),$$
where
$$\eta(X) = \mathbb{P}(Y = 1 \mid X),$$
minimizes h ↦ R(h).
1. Prove that
$$R(h^*) = \mathbb{E}\left[\eta(X) \wedge (1 - \eta(X))\right] \leq \frac{1}{2}.$$
2. Prove that for all classifiers h, the excess risk is given by
$$R(h) - R(h^*) = \mathbb{E}\left[\,|1 - 2\eta(X)|\,|h(X) - h^*(X)|\,\right].$$
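The excess-risk identity of question 2 can be verified numerically. A sketch on an assumed test bed (the Gaussian mixture of Section 1 with means ±1), using an arbitrary suboptimal classifier h:

```python
import numpy as np

rng = np.random.default_rng(3)
mu0, mu1 = -1.0, 1.0  # assumed Gaussian mixture test bed

n = 500_000
y = rng.integers(0, 2, size=n)
x = rng.normal(np.where(y == 0, mu0, mu1), 1.0)

def eta(x):  # P(Y = 1 | X = x) for equal priors, unit variances
    g0 = np.exp(-(x - mu0) ** 2 / 2)
    g1 = np.exp(-(x - mu1) ** 2 / 2)
    return g1 / (g0 + g1)

h_star = (eta(x) >= 0.5).astype(int)
h = (x >= 0.7).astype(int)  # an arbitrary, suboptimal classifier

# Left side: Monte Carlo estimate of R(h) - R(h*) on the same samples.
lhs = np.mean(h != y) - np.mean(h_star != y)
# Right side: the identity E[|1 - 2 eta(X)| |h(X) - h*(X)|].
rhs = np.mean(np.abs(1 - 2 * eta(x)) * np.abs(h - h_star))
print(f"excess risk: {lhs:.4f}  identity: {rhs:.4f}")
```

The right-hand side averages the conditional expectation of the left-hand side's summands, so the two agree up to Monte Carlo noise, with the identity version having lower variance.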

3.2 Plug-in classifier

Let (X, Y) ∈ R^d × {−1, 1} be random variables defined on the same probability space (Ω, F, P). For any classifier h : X → {−1, 1}, define its classification error by
$$R(h) = \mathbb{P}(Y \neq h(X)).$$
The classifier h∗ defined by
$$h^*(x) = \operatorname{sign}(\eta(x) - 1/2),$$
where
$$\eta(X) = \mathbb{P}(Y = 1 \mid X),$$
minimizes h ↦ R(h). Given n independent couples {(X_i, Y_i)}_{1 ≤ i ≤ n} with the same distribution as (X, Y), an empirical surrogate for h∗ is obtained from a possibly nonparametric estimator $\hat\eta_n$ of η:
$$\hat{h}_n : x \mapsto \operatorname{sign}(\hat\eta_n(x) - 1/2).$$

1. Prove that for any classifier h : X → {−1, 1},
$$\mathbb{P}(Y \neq h(X) \mid X) = (2\eta(X) - 1)\mathbf{1}_{h(X) = -1} + 1 - \eta(X)$$
and
$$R(h) - R(h^*) = 2\,\mathbb{E}\left[\left|\eta(X) - \frac{1}{2}\right| \mathbf{1}_{h(X) \neq h^*(X)}\right].$$

2. Prove that
$$\left|\eta(x) - \frac{1}{2}\right| \mathbf{1}_{\hat{h}_n(x) \neq h^*(x)} \leq |\eta(x) - \hat\eta_n(x)|\,\mathbf{1}_{\hat{h}_n(x) \neq h^*(x)},$$
where
$$\hat{h}_n : x \mapsto \operatorname{sign}(\hat\eta_n(x) - 1/2).$$
Deduce that
$$R(\hat{h}_n) - R(h^*) \leq 2\,\mathbb{E}\left[|\eta(X) - \hat\eta_n(X)|^2\right]^{1/2}.$$
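The L2 bound of question 2 can be illustrated end to end. A sketch under assumptions not in the statement: the Gaussian mixture of Section 1 (means ±1, labels recoded to {−1, 1}) as the data distribution, and a Nadaraya–Watson smoother with a hand-picked bandwidth as one possible nonparametric estimator of η:

```python
import numpy as np

rng = np.random.default_rng(4)
mu0, mu1 = -1.0, 1.0  # assumed Gaussian mixture test bed, labels in {-1, 1}

def sample(n):
    y = 2 * rng.integers(0, 2, size=n) - 1            # Y uniform on {-1, 1}
    x = rng.normal(np.where(y == -1, mu0, mu1), 1.0)  # X | Y Gaussian
    return x, y

def eta(x):  # true eta(x) = P(Y = 1 | X = x)
    g0 = np.exp(-(x - mu0) ** 2 / 2)
    g1 = np.exp(-(x - mu1) ** 2 / 2)
    return g1 / (g0 + g1)

# One possible estimator eta_hat_n: Nadaraya-Watson smoothing of the
# 0/1-recoded labels with a hand-picked Gaussian kernel bandwidth.
xtr, ytr = sample(1000)
def eta_hat(x, bw=0.3):
    w = np.exp(-(x[:, None] - xtr[None, :]) ** 2 / (2 * bw ** 2))
    return w @ ((ytr + 1) / 2) / w.sum(axis=1)

# Compare the plug-in classifier's excess risk with the L2 bound.
xte, yte = sample(10_000)
h_hat = np.sign(eta_hat(xte) - 0.5)
h_st = np.sign(eta(xte) - 0.5)
excess = np.mean(h_hat != yte) - np.mean(h_st != yte)
bound = 2 * np.sqrt(np.mean((eta(xte) - eta_hat(xte)) ** 2))
print(f"excess risk ~ {excess:.4f}  <=  bound ~ {bound:.4f}")
```

The bound is typically loose: the plug-in classifier only pays for estimation errors that flip the sign of η̂_n − 1/2, while the L2 distance charges every error in η̂_n.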
