
Lecture Slides for

INTRODUCTION TO

Machine Learning
ETHEM ALPAYDIN
The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 8:

Nonparametric
Methods

Nonparametric Estimation

- Parametric (single global model), semiparametric (small number of local models)
- Nonparametric: similar inputs have similar outputs
- Functions (pdf, discriminant, regression) change smoothly
- Keep the training data; let the data speak for itself
- Given x, find a small number of closest training instances and interpolate from these
- Aka lazy / memory-based / case-based / instance-based learning

Density Estimation

- Given the training set $X = \{x^t\}_{t=1}^N$ drawn iid from $p(x)$
- Divide the data into bins of size $h$
- Histogram:
  $\hat{p}(x) = \dfrac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$
- Naive estimator:
  $\hat{p}(x) = \dfrac{\#\{x - h < x^t \le x + h\}}{2Nh}$
  or equivalently
  $\hat{p}(x) = \dfrac{1}{Nh} \sum_{t=1}^{N} w\left(\dfrac{x - x^t}{h}\right)$, where
  $w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$
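As a concrete illustration (not from the slides), a minimal NumPy sketch of the naive estimator; the function name and toy usage are my own:

```python
import numpy as np

def naive_estimator(x, data, h):
    """Naive density estimate: the fraction of training points x^t with
    |x - x^t| < h, normalized by the window width 2h."""
    data = np.asarray(data)
    return np.sum(np.abs(x - data) < h) / (2 * len(data) * h)

# Toy usage: estimate the density of a standard normal sample at x = 0.
sample = np.random.randn(1000)
print(naive_estimator(0.0, sample, h=0.5))   # roughly 1/sqrt(2*pi) ~ 0.40
```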

Kernel Estimator

- Kernel function, e.g., the Gaussian kernel:
  $K(u) = \dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{u^2}{2}\right)$
- Kernel estimator (Parzen windows):
  $\hat{p}(x) = \dfrac{1}{Nh} \sum_{t=1}^{N} K\left(\dfrac{x - x^t}{h}\right)$
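A minimal sketch of the Parzen window estimate with the Gaussian kernel above; the function name is illustrative:

```python
import numpy as np

def parzen_estimate(x, data, h):
    """Kernel (Parzen window) density estimate with a Gaussian kernel:
    each training point contributes a bump of width h centered on it."""
    u = (x - np.asarray(data)) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return np.sum(K) / (len(data) * h)
```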

k-Nearest Neighbor Estimator

- Instead of fixing the bin width $h$ and counting how many instances fall inside, fix the number of instances (neighbors) $k$ and compute the bin width needed to cover them:
  $\hat{p}(x) = \dfrac{k}{2N d_k(x)}$
  where $d_k(x)$ is the distance from $x$ to its $k$th closest training instance
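A direct one-dimensional sketch of this estimator (function name is my own):

```python
import numpy as np

def knn_density(x, data, k):
    """k-NN density estimate: p(x) = k / (2 N d_k(x)), where d_k(x) is
    the distance from x to its kth closest training instance."""
    dists = np.sort(np.abs(x - np.asarray(data)))
    return k / (2 * len(data) * dists[k - 1])
```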

Multivariate Data

- Kernel density estimator for $d$-dimensional $x$:
  $\hat{p}(x) = \dfrac{1}{Nh^d} \sum_{t=1}^{N} K\left(\dfrac{x - x^t}{h}\right)$
- Multivariate Gaussian kernel:
  - spheric: $K(u) = \left(\dfrac{1}{\sqrt{2\pi}}\right)^{d} \exp\left(-\dfrac{\|u\|^2}{2}\right)$
  - ellipsoid: $K(u) = \dfrac{1}{(2\pi)^{d/2}\, |S|^{1/2}} \exp\left(-\dfrac{1}{2} u^T S^{-1} u\right)$
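A minimal sketch of the multivariate estimator with the spheric Gaussian kernel (names and array layout are my assumptions):

```python
import numpy as np

def multivariate_kde(x, data, h):
    """Multivariate KDE with the spheric Gaussian kernel above.
    `data` has shape (N, d); `x` has shape (d,)."""
    data = np.asarray(data)
    N, d = data.shape
    u = (x - data) / h                              # scaled differences, (N, d)
    K = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (d / 2)
    return np.sum(K) / (N * h**d)
```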

Nonparametric Classification

- Estimate $p(x \mid C_i)$ and use Bayes' rule
- Kernel estimator (with $r_i^t = 1$ if $x^t \in C_i$, 0 otherwise):
  $\hat{p}(x \mid C_i) = \dfrac{1}{N_i h^d} \sum_{t=1}^{N} K\left(\dfrac{x - x^t}{h}\right) r_i^t$, $\quad \hat{P}(C_i) = \dfrac{N_i}{N}$
  $g_i(x) = \hat{p}(x \mid C_i)\,\hat{P}(C_i) = \dfrac{1}{N h^d} \sum_{t=1}^{N} K\left(\dfrac{x - x^t}{h}\right) r_i^t$
- k-NN estimator, with $k_i$ the number of the $k$ nearest neighbors of $x$ that belong to $C_i$, and $V^k(x)$ the volume they occupy:
  $\hat{p}(x \mid C_i) = \dfrac{k_i}{N_i V^k(x)}$, $\quad \hat{P}(C_i \mid x) = \dfrac{\hat{p}(x \mid C_i)\,\hat{P}(C_i)}{\hat{p}(x)} = \dfrac{k_i}{k}$
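A minimal sketch of the k-NN posterior $\hat{P}(C_i \mid x) = k_i / k$ (function name and Euclidean distance are my choices):

```python
import numpy as np
from collections import Counter

def knn_posteriors(x, data, labels, k):
    """k-NN class posteriors: P(C_i | x) = k_i / k, where k_i counts how
    many of the k nearest neighbors of x carry label C_i."""
    dists = np.linalg.norm(np.asarray(data) - x, axis=1)
    nearest = np.argsort(dists)[:k]
    counts = Counter(labels[i] for i in nearest)
    return {c: n / k for c, n in counts.items()}
```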

Condensed Nearest Neighbor

- Time/space complexity of k-NN is $O(N)$
- Find a subset $Z$ of $X$ that is small and accurate in classifying $X$ (Hart, 1968), minimizing
  $E'(Z \mid X) = E(X \mid Z) + \lambda |Z|$
- Incremental algorithm: add an instance to $Z$ only if it is needed, i.e., if the current $Z$ misclassifies it (see the sketch below)
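One possible Python rendering of the incremental idea, assuming 1-NN classification and repeated passes over X until Z stabilizes; the seeding and scan order are my choices, not prescribed by the slides:

```python
import numpy as np

def condensed_nn(X, y):
    """Hart's condensed 1-NN: keep only the instances that the current
    subset Z misclassifies; repeat passes until Z stops growing."""
    Z, Zy = [X[0]], [y[0]]               # seed Z with an arbitrary instance
    changed = True
    while changed:
        changed = False
        for xi, yi in zip(X, y):
            dists = [np.linalg.norm(xi - z) for z in Z]
            if Zy[int(np.argmin(dists))] != yi:   # 1-NN on Z gets xi wrong
                Z.append(xi)
                Zy.append(yi)
                changed = True
    return np.array(Z), np.array(Zy)
```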

Nonparametric Regression

- Aka smoothing models
- Regressogram:
  $\hat{g}(x) = \dfrac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}$
  where
  $b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise} \end{cases}$
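A minimal sketch of the regressogram (the bin origin parameter is my assumption; the slides do not specify it):

```python
import numpy as np

def regressogram(x, data_x, data_r, h, origin=0.0):
    """Regressogram: average the labels r^t of the training points that
    fall in the same bin of width h as the query x."""
    data_x = np.asarray(data_x)
    same_bin = np.floor((data_x - origin) / h) == np.floor((x - origin) / h)
    return np.mean(np.asarray(data_r)[same_bin]) if same_bin.any() else np.nan
```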


Running Mean / Kernel Smoother

- Running mean smoother:
  $\hat{g}(x) = \dfrac{\sum_{t=1}^{N} w\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} w\left(\frac{x - x^t}{h}\right)}$, where
  $w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$
- Kernel smoother:
  $\hat{g}(x) = \dfrac{\sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)}$, where $K(\cdot)$ is, e.g., Gaussian
- Running line smoother: fit a local regression line instead of a local average
- Additive models (Hastie and Tibshirani, 1990)
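A minimal sketch of the kernel smoother (this weighted-average form is commonly called the Nadaraya-Watson estimator; the function name is my own):

```python
import numpy as np

def kernel_smoother(x, data_x, data_r, h):
    """Kernel smoother with a Gaussian kernel: a weighted average of the
    labels r^t, with weights decaying in the distance from x."""
    u = (x - np.asarray(data_x)) / h
    K = np.exp(-0.5 * u**2)
    return np.sum(K * np.asarray(data_r)) / np.sum(K)
```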


How to Choose k or h?

- When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity
- As k or h increases, we average over more instances; variance decreases but bias increases (oversmoothing): low complexity
- Cross-validation is used to fine-tune k or h (see the sketch below)
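A minimal validation-set sketch of tuning h, reusing the `kernel_smoother` from the earlier example; the candidate grid and squared-error criterion are my assumptions, and k could be tuned the same way:

```python
import numpy as np

def choose_h(train_x, train_r, val_x, val_r, candidates):
    """Pick the bandwidth h that minimizes squared error on a held-out
    validation set, reusing the kernel smoother defined earlier."""
    def val_error(h):
        preds = np.array([kernel_smoother(x, train_x, train_r, h) for x in val_x])
        return np.sum((np.asarray(val_r) - preds) ** 2)
    return min(candidates, key=val_error)

# Toy usage: best = choose_h(xs, rs, xs_val, rs_val, candidates=[0.1, 0.5, 1.0])
```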
