
Overview of Supervised Learning
Outline
• Linear Regression and the Nearest-Neighbor Method
• Statistical Decision Theory
• Local Methods in High Dimensions
• Statistical Models, Supervised Learning and Function Approximation
• Structured Regression Models
• Classes of Restricted Estimators
• Model Selection and the Bias-Variance Tradeoff

Notation
• X: inputs, feature vector, predictors, independent variables. Generally X is a vector of p values. Qualitative features are coded in X.
  – Sample values of X are generally written in lower case; $x_i$ is the i-th of N sample values.

• Y: output, response, dependent variable. Typically a scalar (it can be a vector) of real values. Again, $y_i$ is a realized value.

• G: a qualitative response, taking values in a discrete set $\mathcal{G}$; e.g. $\mathcal{G} = \{\text{survived}, \text{died}\}$. We often code G via a binary indicator response vector Y.

Problem
• 200 points generated in $\mathbb{R}^2$ from an unknown distribution; 100 in each of two classes $\mathcal{G} = \{\text{GREEN}, \text{RED}\}$.

• Can we build a rule to predict the color of future points?
Linear regression
• Code Y=1 if G=RED, else Y=0.
• We model Y as a linear function of X:

  $Y \approx \beta_0 + \sum_{j=1}^{p} X_j \beta_j = X^T\beta$

• Obtain $\beta$ by least squares, minimizing the quadratic criterion

  $RSS(\beta) = \sum_{i=1}^{N} (y_i - x_i^T\beta)^2$

• Given an $N \times p$ model matrix $\mathbf{X}$ and a response vector $\mathbf{y}$,

  $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

Linear regression
• Prediction at a future point $x_0$ is $\hat{Y}(x_0) = x_0^T\hat\beta$. Also

  $\hat{G}(x_0) = \text{RED}$ if $\hat{Y}(x_0) > 0.5$, and $\text{GREEN}$ if $\hat{Y}(x_0) \le 0.5$.

• The decision boundary $\{x : x^T\hat\beta = 0.5\}$ is linear (Figure 2.1), and it seems to make many errors on the training data.

Linear regression
• Figure 2.1: A classification example in two dimensions. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by linear regression. The line is the decision boundary defined by $x^T\hat\beta = 0.5$.

• The red shaded region denotes the part of input space classified as RED, while the green region is classified as GREEN.
Possible scenarios
Scenario 1: The data in each class are generated from a Gaussian distribution with uncorrelated components, equal variances, and different means.

Scenario 2: The data in each class are generated from a mixture of 10 Gaussians.

For Scenario 1, the linear regression rule is almost optimal (Chapter 4).

For Scenario 2, it is far too rigid.

K-Nearest Neighbors
A natural way to classify a new point is to look at its neighbors and take a vote:

  $\hat{Y}_k(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$,

where $N_k(x)$ is a neighborhood of $x$ that contains exactly $k$ training points (the k-nearest neighborhood).

If one class clearly dominates in the neighborhood of an observation $x$, then the observation itself is likely to belong to that class too. Thus the classification rule is majority voting among the members of $N_k(x)$. As before,

  $\hat{G}_k(x_0) = \text{RED}$ if $\hat{Y}_k(x_0) > 0.5$, and $\text{GREEN}$ if $\hat{Y}_k(x_0) \le 0.5$.
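A minimal Python sketch of this k-NN vote (NumPy assumed; array names and the default k = 15 are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k=15):
    """k-NN estimate at x0: average of the y-values of the k nearest training points."""
    dists = np.linalg.norm(X_train - x0, axis=1)   # Euclidean distances to x0
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    y_hat = y_train[nearest].mean()                # \hat{Y}_k(x0)
    return 1 if y_hat > 0.5 else 0                 # RED = 1 if the majority vote exceeds 1/2
```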

K-Nearest Neighbors
• Figure 2.2: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by 15-nearest-neighbor averaging.

• The predicted class is chosen by majority vote amongst the 15 nearest neighbors.

K-Nearest Neighbors
• Figure 2.3: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then predicted by 1-nearest-neighbor classification.

Linear regression vs. k-NN
First we expose the oracle. The density for each class was an equal mixture of 10 Gaussians. For the GREEN class, the 10 means were generated from a N((1,0)^T, I) distribution (and then considered fixed). For the RED class, the 10 means were generated from a N((0,1)^T, I) distribution. The within-cluster variances were 1/5.
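A hedged sketch of this generating mechanism in Python (NumPy assumed; the helper `make_class` and the seed are illustrative, not the authors' original code):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_class(center, n=100, n_means=10, within_var=1/5):
    """Draw n_means cluster means from N(center, I), then n points from N(m_k, I/5)."""
    means = rng.multivariate_normal(center, np.eye(2), size=n_means)
    picks = rng.integers(0, n_means, size=n)       # choose one of the 10 means per point
    return means[picks] + rng.normal(scale=np.sqrt(within_var), size=(n, 2))

X_green = make_class([1.0, 0.0])   # GREEN means drawn from N((1,0)^T, I)
X_red   = make_class([0.0, 1.0])   # RED means drawn from N((0,1)^T, I)
X = np.vstack([X_green, X_red])
y = np.concatenate([np.zeros(100), np.ones(100)])   # GREEN = 0, RED = 1
```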

Linear regression vs. k-NN
• Figure 2.4: Misclassification curves for the simulation example above. A test sample of size 10,000 was used. The red curves are test error and the green curves are training error for k-NN classification. The results for linear regression are the larger green and red dots at three degrees of freedom. The purple line is the optimal Bayes error rate.

Other Methods
Many modern procedures are variants of linear regression and k-nearest neighbors:
• Kernel smoothers
• Local linear regression
• Linear basis expansions
• Projection pursuit and neural networks

Statistical decision theory
Case 1: Quantitative output Y

• Let $X \in \mathbb{R}^p$ denote a real-valued random input vector.
• We have a loss function $L(Y, f(X))$ for penalizing errors in prediction.
• Most common and convenient is squared error loss: $L(Y, f(X)) = (Y - f(X))^2$.
• This leads us to a criterion for choosing f:

  $EPE(f) = E(Y - f(X))^2$,

  the expected (squared) prediction error.
• Minimizing $EPE(f)$ leads to the solution $f(x) = E(Y \mid X = x)$, the conditional expectation, also known as the regression function.

The regression function

  $EPE(f) = E[Y - f(X)]^2$
  $\quad\; = \int (y - f(x))^2 \Pr(dx, dy)$
  $\quad\; = E_X E_{Y|X}\big([Y - f(X)]^2 \mid X\big)$

Minimizing EPE pointwise gives

  $f(x) = \arg\min_c E_{Y|X}\big([Y - c]^2 \mid X = x\big)$,

with solution

  $f(x) = E(Y \mid X = x)$.
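A quick Monte Carlo check of this result, sketched under an assumed toy model Y = X² + ε (so that the regression function is x²): the conditional mean attains a smaller estimated EPE than any other candidate, e.g. the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = rng.uniform(-1, 1, size=n)
Y = X**2 + rng.normal(scale=0.3, size=n)          # here E(Y | X = x) = x^2

def epe(f):
    """Monte Carlo estimate of E(Y - f(X))^2."""
    return np.mean((Y - f(X))**2)

print(epe(lambda x: x**2))    # roughly sigma^2 = 0.09, the minimum
print(epe(lambda x: x))       # larger: the identity is not the regression function
```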
Case 2: Qualitative output G

• Suppose our prediction rule is $\hat{G}(X)$, and that $G$ and $\hat{G}(X)$ take values in $\mathcal{G}$, with $\mathrm{card}(\mathcal{G}) = K$.
• We have a different loss function for penalizing prediction errors: $L(k, l)$ is the price paid for classifying an observation belonging to class $G_k$ as $G_l$.
• Most often we use the 0-1 loss function, where all misclassifications are charged a single unit.
• The expected prediction error is

  $EPE = E\big[L(G, \hat{G}(X))\big]$

• The solution is

  $\hat{G}(x) = \arg\min_{g \in \mathcal{G}} \sum_{k=1}^{K} L(G_k, g)\, P(G_k \mid X = x)$

• k-NN tries to implement the conditional expectation directly, by
  – approximating expectations by sample averages, and
  – relaxing the notion of conditioning at a point to conditioning in a region about a point.
• As $N, k \to \infty$ such that $k/N \to 0$, the k-nearest-neighbor estimate $\hat{f}(x) \to E(Y \mid X = x)$; it is consistent.
• Linear regression assumes a (linear) structural form $f(x) = x^T\beta$ and minimizes the sample version of EPE directly.
• As the sample size grows, our estimate of the linear coefficients $\hat\beta$ converges to the optimal $\beta_{opt} = E(XX^T)^{-1} E(XY)$.
• The model is limited by the linearity assumption.

Question: Why not always use k-nearest neighbors?

Bayes Classifier
With the 0-1 loss function this simplifies to

  $\hat{G}(x) = G_k \ \text{ if } \ P(G_k \mid X = x) = \max_{g \in \mathcal{G}} P(g \mid X = x)$

This is known as the Bayes classifier. It simply says that we should pick the class having maximum probability at the input x.

Question: how do we construct the Bayes classifier for our simulation example?
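One possible answer, sketched in Python: since the oracle densities are known (equal mixtures of 10 Gaussians with within-cluster variance 1/5), evaluate each class density at x and pick the larger one, assuming equal class priors. The arrays `means_green` and `means_red` are assumed to hold the fixed mixture centers from the generating step; SciPy is assumed available.

```python
import numpy as np
from scipy.stats import multivariate_normal

def class_density(x, means, within_var=1/5):
    """Equal mixture of Gaussians N(m_k, (1/5) I) evaluated at x."""
    cov = within_var * np.eye(2)
    return np.mean([multivariate_normal.pdf(x, mean=m, cov=cov) for m in means])

def bayes_classify(x, means_green, means_red):
    """With equal class priors, pick the class whose density at x is larger."""
    return 1 if class_density(x, means_red) > class_density(x, means_green) else 0
```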

Bayes Classifier
• Figure 2.5: The optimal
Bayes decision
boundary for the
simulation example
above.
• Since the generating
density is known for
each class, this
boundary can be
calculated exactly.

Curse of dimensionality
k-nearest neighbors can fail in high dimensions, because it becomes difficult to gather k observations close to a target point $x_0$:

• near neighborhoods tend to be spatially large, and estimates are biased;
• reducing the spatial size of the neighborhood means reducing k, and the variance of the estimate increases;
• most points lie near the boundary of the sample;
• the sampling density is proportional to $N^{1/p}$; if 100 points suffice to estimate a function in $\mathbb{R}^1$, then $100^{10}$ are needed to achieve similar accuracy in $\mathbb{R}^{10}$.
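A small numerical illustration of the first point (a sketch, assuming NumPy): the distance from a target point at the origin to its nearest neighbor among N = 1000 uniform points grows quickly with the dimension p.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
for p in (1, 2, 5, 10, 20):
    X = rng.uniform(-1, 1, size=(N, p))         # N points uniform on [-1, 1]^p
    d_min = np.linalg.norm(X, axis=1).min()     # distance from the origin to its nearest neighbor
    print(p, round(d_min, 3))                   # grows quickly with p
```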

Example 1
• 1000 training examples $x_i$ generated uniformly on $[-1, 1]^p$.
• $Y = f(X) = e^{-8\|X\|^2}$ (no measurement error).
• Use the 1-nearest-neighbor rule to predict $\hat{y}_0$ at the test point $x_0 = 0$.

  $EPE(x_0) = E_T[f(x_0) - \hat{y}_0]^2$
  $\quad\; = E_T[\hat{y}_0 - E_T(\hat{y}_0)]^2 + [E_T(\hat{y}_0) - f(x_0)]^2$
  $\quad\; = \mathrm{Var}_T(\hat{y}_0) + \mathrm{Bias}^2(\hat{y}_0)$,

  where the cross term $2E_T\{[\hat{y}_0 - E_T(\hat{y}_0)][E_T(\hat{y}_0) - f(x_0)]\}$ vanishes.
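A Monte Carlo sketch of this setup (NumPy assumed; the sample sizes are illustrative): as p grows, the nearest neighbor drifts away from $x_0 = 0$, f at that neighbor collapses toward 0, and EPE($x_0$) approaches 1, with the bias term coming to dominate.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(X):
    return np.exp(-8 * np.sum(X**2, axis=1))         # f(X) = exp(-8 ||X||^2)

def epe_at_origin(p, n_train=1000, n_reps=200):
    """Monte Carlo EPE(x0 = 0) for the 1-NN prediction over repeated training sets."""
    errs = []
    for _ in range(n_reps):
        X = rng.uniform(-1, 1, size=(n_train, p))
        y = f(X)                                      # no measurement error
        nearest = np.argmin(np.linalg.norm(X, axis=1))
        errs.append((1.0 - y[nearest])**2)            # f(0) = 1
    return np.mean(errs)

for p in (1, 2, 5, 10):
    print(p, round(epe_at_origin(p), 4))              # EPE approaches 1 as p grows
```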

Figure 2.8: A simulation example with the same setup as in Figure 2.7. Here the function is constant in all but one dimension: $f(X) = \tfrac{1}{2}(X_1 + 1)^3$. The variance dominates.

Linear Model
• Linear model: $Y = X^T\beta + \varepsilon$

• Linear regression: $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

• At a test point $x_0$, the prediction is

  $\hat{y}_0 = x_0^T\hat\beta = x_0^T\beta + \sum_{i=1}^{N} \ell_i(x_0)\,\varepsilon_i$,

  where $\ell_i(x_0)$ is the i-th component of $\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}x_0$.
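A small numerical check of this identity (a sketch with simulated Gaussian errors; the dimensions and coefficients are arbitrary): the fitted value at $x_0$ equals $x_0^T\beta$ plus a weighted sum of the training errors, with weights $\ell(x_0) = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}x_0$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 50, 3
X = rng.normal(size=(N, p))
beta = np.array([1.0, -2.0, 0.5])
eps = rng.normal(scale=0.3, size=N)
y = X @ beta + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^{-1} X'y
x0 = rng.normal(size=p)

ell = X @ np.linalg.solve(X.T @ X, x0)          # l_i(x0): i-th component of X (X'X)^{-1} x0
lhs = x0 @ beta_hat
rhs = x0 @ beta + ell @ eps
print(np.isclose(lhs, rhs))                     # True: y0_hat = x0'beta + sum_i l_i(x0) eps_i
```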

Curse of dimensionality
Example 2

If the linear model is correct, or almost correct, k-nearest neighbors will do much worse than linear regression.

In cases like this (and of course, assuming we know this is the case),
simple linear regression methods are not affected by the dimension.

Figure 2.9 illustrates two simple cases.

Here $f(X) = \tfrac{1}{2}(X_1 + 1)^3$, and we have
• $E\hat{f}(x_0) = E f(X_{(1)}) \approx f(E X_{(1)}) = f(x_0)$ for all dimensions;
• $\mathrm{Var}\,\hat{f}(x_0) = \mathrm{Var}\, f(X_{(1)})$ grows with dimension.

Figure 2.9: The curves show the expected prediction error (at $x_0 = 0$) for 1-nearest neighbor relative to least squares for the model $Y = f(X) + \varepsilon$. For the red curve, $f(x) = x_1$, while for the green curve $f(x) = \tfrac{1}{2}(x_1 + 1)^3$.
Statistical Models
$Y = f(X) + \varepsilon$

with $E(\varepsilon) = 0$ and $X$ and $\varepsilon$ independent.
• $E(Y \mid X) = f(X)$
• $\Pr(Y \mid X)$ depends on $X$ only through $f(X)$.
• A useful approximation to the truth: all unmeasured variables are captured by $\varepsilon$.
• $N$ realizations $y_i = f(x_i) + \varepsilon_i$, $i = 1, \ldots, N$.
• Assume $\varepsilon_i$ and $\varepsilon_j$ are independent.

More generally we can have, for example, $\mathrm{Var}(Y \mid X) = \sigma^2(X)$.

For qualitative outcomes, the conditional probabilities $\Pr(G = G_k \mid X) = p_k(X)$, $k = 1, \ldots, K$, are modeled directly.
Supervised Learning
• Given: training examples

  $\{(x_1, f(x_1)), (x_2, f(x_2)), \ldots, (x_P, f(x_P))\}$

  of some unknown function (system) $y = f(x)$.

• Find $\hat{f}(x)$ (i.e. an approximation), and predict $y' = \hat{f}(x')$, where $x'$ is not in the training set.

Two Types of Supervised Learning
• Classification: $y \in \{c_1, c_2, \ldots, c_N\}$
  – The model output is a prediction that the input belongs to some class.
  – If the input is an image, the output might be chair, face, dog, boat, etc.
• Regression: $y \in \mathbb{R}$
  – The output has infinitely many possible values.
  – If the input is stock features, the output could be a prediction of tomorrow's stock price.

Learning Classification Models

Learning Regression Models

Function Approximation
  $RSS(\theta) = \sum_{i=1}^{N} (y_i - f_\theta(x_i))^2$

Assumes:
• $(x_i, y_i)$ are points in, say, $\mathbb{R}^{p+1}$;
• a (parametric) form for $f_\theta(x)$;
• a loss function for measuring the quality of the approximation.

Figure 2.10 illustrates the situation.

Function Approximation
• Figure 2.10: Least
squares fitting of a
function of two inputs.
The parameters of
fθ(x) are chosen so as
to minimize the sum-
of-squared vertical
errors.

Function Approximation
• More generally, Maximum Likelihood Estimation
provides a natural basis for estimation.
• E.g. multinomial:

  $\Pr(G = k \mid X) = p_{k,\theta}(X)$,

  $L(\theta) = \sum_{i=1}^{N} \log p_{g_i,\theta}(x_i)$

Structured Regression Models
  $RSS(f) = \sum_{i=1}^{N} (y_i - f(x_i))^2$

• Any function passing through the points $(x_i, y_i)$ has $RSS = 0$.
• We need to restrict the class of functions.
• Usually the restrictions impose local behavior; see the equivalent kernels in Chapters 5 and 6.
• Any method that attempts to approximate locally varying functions is "cursed".
• Conversely, any method that "overcomes" the curse assumes an implicit metric that does not allow neighborhoods to be simultaneously small in all directions.

Classes of Restricted Estimators
Some of the classes of restricted methods that we cover are:

• Roughness penalty and Bayesian methods

  $PRSS(f, \lambda) = RSS(f) + \lambda J(f)$

• Kernel methods and local regression

  $RSS(f_\theta, x_0) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,(y_i - f_\theta(x_i))^2$

• Basis functions and dictionary methods

  $f_\theta(x) = \sum_{m=1}^{M} \theta_m h_m(x)$
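A sketch of the kernel-weighted criterion above for one-dimensional inputs, taking $f_\theta$ to be locally linear and $K_\lambda$ to be a Gaussian kernel (both choices are assumptions for illustration):

```python
import numpy as np

def local_linear_fit(x0, x, y, lam=0.2):
    """Minimize sum_i K_lam(x0, x_i) (y_i - a - b x_i)^2 and return the fit a + b x0."""
    w = np.exp(-0.5 * ((x - x0) / lam) ** 2)             # Gaussian kernel weights K_lam(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])            # local linear basis (1, x)
    W = np.diag(w)
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)    # weighted least squares at x0
    return theta[0] + theta[1] * x0
```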

Model Selection & the Bias-Variance Tradeoff

Many of the flexible methods have a complexity parameter:

• the multiplier of the penalty term
• the width of the kernel
• the number of basis functions

We cannot use RSS to determine this parameter (why?). We can use prediction error on unseen test cases to guide us.

E.g. $Y = f(X) + \varepsilon$, k-NN (and assume the sample $x_i$ are fixed):

  $E\big[(Y - \hat{f}_k(x_0))^2 \mid X = x_0\big] = \sigma^2 + \mathrm{Bias}^2(\hat{f}_k(x_0)) + \mathrm{Var}_T(\hat{f}_k(x_0))$
  $\quad = \sigma^2 + \Big[f(x_0) - \frac{1}{k}\sum_{l=1}^{k} f(x_{(l)})\Big]^2 + \frac{\sigma^2}{k}$

Selecting k is a bias-variance tradeoff; see Figure 2.11.
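A sketch of this tradeoff for k-NN regression under an assumed one-dimensional model (NumPy assumed; sample sizes and noise level are illustrative): the test error is large for very small k (variance) and for very large k (bias), and smallest in between.

```python
import numpy as np

rng = np.random.default_rng(5)

def knn_reg(x_train, y_train, x_test, k):
    """k-NN regression: average the k nearest training responses for each test point."""
    d = np.abs(x_test[:, None] - x_train[None, :])    # pairwise |x_test - x_train|
    idx = np.argsort(d, axis=1)[:, :k]                # indices of the k nearest neighbors
    return y_train[idx].mean(axis=1)

def f(x):
    return np.sin(2 * np.pi * x)

x_tr = rng.uniform(0, 1, 200)
y_tr = f(x_tr) + rng.normal(scale=0.3, size=200)
x_te = rng.uniform(0, 1, 2000)
y_te = f(x_te) + rng.normal(scale=0.3, size=2000)

for k in (1, 5, 15, 50, 150):
    mse = np.mean((y_te - knn_reg(x_tr, y_tr, x_te, k)) ** 2)
    print(k, round(mse, 3))    # large at k=1 (variance) and k=150 (bias), smaller in between
```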
Model Selection & the Bias-Variance Tradeoff

• Test and training error as a function of model complexity.

• Page 27
• Ex. 2.1, 2.2, 2.6
