Overview of Supervised Learning
Outline
• Linear Regression and Nearest-Neighbor Methods
• Statistical Decision Theory
• Local Methods in High Dimensions
• Statistical Models, Supervised Learning and Function Approximation
• Structured Regression Models
• Classes of Restricted Estimators
• Model Selection and the Bias-Variance Tradeoff
Minimizing the EPE pointwise gives
$$f(x) = \operatorname*{argmin}_{c}\, E_{Y|X}\big([Y - c]^2 \mid X = x\big),$$
and the minimizer is the conditional expectation (the regression function)
$$f(x) = E(Y \mid X = x).$$
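As a quick numerical check (not from the slides), the sketch below simulates draws of $Y$ given a fixed $x$ and scans constant predictions $c$, confirming that the squared-error risk is minimized near the conditional mean. The linear data-generating model inside is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate draws of Y | X = x for one fixed x (assumed model: Y = 2x + noise).
x = 1.5
y = 2.0 * x + rng.normal(scale=0.5, size=100_000)

# Evaluate the pointwise risk E[(Y - c)^2 | X = x] on a grid of constants c.
cs = np.linspace(0.0, 6.0, 601)
risk = [np.mean((y - c) ** 2) for c in cs]

best_c = cs[int(np.argmin(risk))]
print(f"argmin_c risk ~= {best_c:.2f}, E(Y | X = x) ~= {y.mean():.2f}")
```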
Case 2: Qualitative output G:
• Suppose our prediction rule is $\hat G(X)$, where $G$ and $\hat G(X)$ take values in $\mathcal{G}$, with $\mathrm{card}(\mathcal{G}) = K$.
• We have a different loss function for penalizing prediction errors: $L(k, l)$ is the price paid for classifying an observation belonging to class $\mathcal{G}_k$ as $\mathcal{G}_l$.
• Most often we use the 0-1 loss function, where all misclassifications are charged a single unit.
• The expected prediction error is
$$\mathrm{EPE} = E\big[L\big(G, \hat G(X)\big)\big].$$
• Conditioning on $X$ and minimizing pointwise, the solution is
$$\hat G(x) = \operatorname*{argmin}_{g \in \mathcal{G}} \sum_{k=1}^{K} L(\mathcal{G}_k, g)\, P(\mathcal{G}_k \mid X = x).$$
• With the 0-1 loss this simplifies to the Bayes classifier:
$$\hat G(x) = \mathcal{G}_k \quad \text{if } P(\mathcal{G}_k \mid X = x) = \max_{g \in \mathcal{G}} P(g \mid X = x).$$
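A minimal sketch of this rule, with made-up posterior probabilities: it computes $\hat G(x)$ for a general loss matrix and checks that under the 0-1 loss it coincides with the most probable class. The numbers and the value $K = 3$ are assumptions for illustration only.

```python
import numpy as np

# Posterior probabilities P(G_k | X = x) for K = 3 classes at one point x
# (illustrative numbers, not from the slides).
posterior = np.array([0.2, 0.5, 0.3])

# General loss matrix L[k, l] = cost of classifying a class-k point as class l.
# With the 0-1 loss, every misclassification costs one unit.
L = np.ones((3, 3)) - np.eye(3)

# G_hat(x) = argmin_g  sum_k L(G_k, g) P(G_k | X = x)
expected_loss = posterior @ L          # expected loss of predicting each class g
g_hat = int(np.argmin(expected_loss))

# Under 0-1 loss this reduces to the Bayes classifier: the most probable class.
assert g_hat == int(np.argmax(posterior))
print("predicted class:", g_hat, "expected losses:", expected_loss)
```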
For a fit $\hat y_0$ at a test point $x_0$, trained on data $T$, the expected squared estimation error decomposes as
$$
\begin{aligned}
E_T\big[f(x_0) - \hat y_0\big]^2
&= E_T\big[\hat y_0 - E_T(\hat y_0)\big]^2 + \big[E_T(\hat y_0) - f(x_0)\big]^2 \\
&\quad + 2\, E_T\big\{\big[\hat y_0 - E_T(\hat y_0)\big]\big[E_T(\hat y_0) - f(x_0)\big]\big\} \\
&= \mathrm{Var}_T(\hat y_0) + \mathrm{Bias}^2(\hat y_0),
\end{aligned}
$$
since the cross term vanishes: $E_T(\hat y_0) - f(x_0)$ is a constant and $E_T\big[\hat y_0 - E_T(\hat y_0)\big] = 0$.
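One way to see this decomposition numerically is to resample training sets, refit, and record the prediction at a fixed $x_0$. The sketch below does this for an ordinary least-squares line; the true function, noise level, and sample sizes are arbitrary choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                       # assumed true regression function (illustration)
    return np.sin(x)

x0, sigma, n, reps = 1.0, 0.3, 50, 2000
preds = np.empty(reps)

# Repeatedly draw a training set T, fit a straight line by least squares,
# and record the prediction y_hat_0 at the fixed test point x0.
for r in range(reps):
    x = rng.uniform(-2, 2, size=n)
    y = f(x) + rng.normal(scale=sigma, size=n)
    beta = np.polyfit(x, y, deg=1)
    preds[r] = np.polyval(beta, x0)

mse   = np.mean((f(x0) - preds) ** 2)         # E_T [f(x0) - y_hat_0]^2
var   = preds.var()                           # Var_T(y_hat_0)
bias2 = (preds.mean() - f(x0)) ** 2           # Bias^2(y_hat_0)
print(f"MSE {mse:.4f} ~= Var {var:.4f} + Bias^2 {bias2:.4f}")
```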
• Test error: the prediction at a test point $x_0$ is $\hat y_0 = x_0^T \hat\beta = x_0^T \beta + \sum_{i=1}^{N} \ell_i(x_0)\, \varepsilon_i$, where $\ell_i(x_0)$ is the $i$-th element of $X(X^T X)^{-1} x_0$.
In cases like this (and of course, assuming we know this is the case), simple linear regression methods are not affected by the dimension, since they exploit the assumed linear structure of the model directly.
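A small simulation consistent with this claim (all settings assumed for illustration): when the data really follow a linear model, the estimated expected prediction error of least squares stays close to the noise floor $\sigma^2$ as the dimension $p$ grows, provided $N$ is much larger than $p$.

```python
import numpy as np

rng = np.random.default_rng(2)

# When the true model really is linear, Y = x^T beta + eps, the extra prediction
# error of least squares beyond the noise floor sigma^2 is roughly sigma^2 * p / N,
# so for N >> p the dimension barely matters (a minimal simulation sketch).
N, sigma, reps = 500, 1.0, 200

for p in (1, 5, 20):
    beta = rng.normal(size=p)
    errs = []
    for _ in range(reps):
        X = rng.normal(size=(N, p))
        y = X @ beta + rng.normal(scale=sigma, size=N)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        x0 = rng.normal(size=p)                      # a fresh test point
        y0 = x0 @ beta + rng.normal(scale=sigma)     # its noisy response
        errs.append((y0 - x0 @ beta_hat) ** 2)
    print(f"p = {p:2d}: estimated EPE ~= {np.mean(errs):.3f} (noise floor {sigma**2})")
```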
Supervised Learning
• Given: Training examples
{(x1,f(x1)),(x2,f(x2)),…,(xP,f(xP))}
of some unknown function (system) y = f(x)
Assumptions:
• $(x_i, y_i)$ are points in, say, $\mathbb{R}^{p+1}$.
• A (parametric) form for $f(X)$.
• A loss function for measuring the quality of the approximation (see the fitting sketch below).
Figure 2.10 illustrates the situation.
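A minimal sketch of this setup, with an arbitrary choice of true system and of parametric form: fit $f_\theta(x) = \theta_0 + \theta_1 x$ to the training examples by minimizing the squared-error loss.

```python
import numpy as np

rng = np.random.default_rng(3)

# Training examples {(x_i, y_i)} from an unknown system y = f(x) + noise
# (here f is chosen arbitrarily for illustration).
x = rng.uniform(0, 1, size=200)
y = 3.0 * x - 1.0 + rng.normal(scale=0.2, size=200)

# Assumed parametric form f_theta(x) = theta0 + theta1 * x,
# fitted by minimizing the squared-error loss over the training set.
X = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
train_loss = np.mean((y - X @ theta) ** 2)
print("theta ~=", np.round(theta, 2), "training loss ~=", round(train_loss, 4))
```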
Model the class posteriors as $\Pr(G = k \mid X = x) = p_{k,\theta}(x)$; the parameters $\theta$ are fit by maximizing the conditional log-likelihood
$$\ell(\theta) = \sum_{i=1}^{N} \log p_{g_i,\theta}(x_i).$$
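The sketch below evaluates such a conditional log-likelihood for an assumed softmax (multinomial logistic) form of $p_{k,\theta}(x)$; the data, dimensions, and parameter values are placeholders, and the maximization over $\theta$ is not shown.

```python
import numpy as np

rng = np.random.default_rng(4)

def class_probs(theta, X):
    """p_{k,theta}(x): softmax class probabilities, one row per observation."""
    scores = X @ theta                           # (N, K) linear scores
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)

# Toy data: N observations with p features and class labels g_i in {0, ..., K-1}.
N, p, K = 100, 3, 4
X = rng.normal(size=(N, p))
g = rng.integers(0, K, size=N)
theta = rng.normal(size=(p, K)) * 0.1            # illustrative parameter values

# Conditional log-likelihood l(theta) = sum_i log p_{g_i, theta}(x_i)
P = class_probs(theta, X)
log_lik = np.sum(np.log(P[np.arange(N), g]))
print("l(theta) ~=", round(log_lik, 2))
```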
For a $k$-nearest-neighbor fit at $x_0$ (with $x_{(l)}$ the $l$-th nearest neighbor), the expected prediction error is
$$\mathrm{EPE}_k(x_0) = \sigma^2 + \Big[f(x_0) - \frac{1}{k}\sum_{l=1}^{k} f\big(x_{(l)}\big)\Big]^2 + \frac{\sigma^2}{k}.$$
Selecting $k$ involves a bias-variance tradeoff; see Figure 2.11.
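The sketch below evaluates the bias and variance terms of this expression directly on simulated 1-D inputs for several values of $k$, showing the squared bias grow and the variance shrink as $k$ increases. The true function $f$ and the noise variance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def f(x):                                    # assumed true curve (illustration)
    return np.sin(4 * x)

# Fixed training inputs and a fixed test point x0; sigma2 is the noise variance.
x_train = rng.uniform(0, 1, size=200)
x0, sigma2 = 0.5, 0.1 ** 2

order = np.argsort(np.abs(x_train - x0))     # neighbors of x0, nearest first

print("  k   bias^2     var      bias^2 + var")
for k in (1, 5, 20, 50, 100):
    neighbors = x_train[order[:k]]
    bias2 = (f(x0) - f(neighbors).mean()) ** 2   # [f(x0) - (1/k) sum f(x_(l))]^2
    var = sigma2 / k                             # sigma^2 / k
    print(f"{k:3d}  {bias2:.5f}  {var:.5f}  {bias2 + var:.5f}")
```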
Model Selection & the Bias-Variance Tradeoff