Univariate Smoothing

The document provides an overview of univariate smoothing techniques. It discusses smoothing problems, which aim to find the best function to minimize prediction error on new data points, and interpolation problems, where the model must match known data points. Common interpolation methods include linear interpolation, nearest neighbor interpolation, and polynomial interpolation of order n-1. Cubic spline interpolation fits cubic polynomials between points with continuous second derivatives. The document also discusses the difference between interpolation, which applies to points within the existing data range, and extrapolation, which applies to points outside that range. An example demonstrates linear and nearest neighbor interpolation on a chirp signal.


Univariate Smoothing Overview

• Problem definition
• Interpolation
• Polynomial smoothing
• Cubic splines
• Basis splines
• Smoothing splines
• Bayes' rule
• Density estimation
• Kernel smoothing
• Local averaging
• Weighted least squares
• Local linear models
• Prediction error estimates

Problem Definition & Interpolation

• Smoothing Problem: Given a data set with a single input variable x, find the best function ĝ(x) that minimizes the prediction error on new inputs (probably not in the data set)
• Interpolation Problem: Same as the smoothing problem except the model is subject to the constraint ĝ(xi) = yi for every input-output pair (xi, yi) in the data set
  – Linear Interpolation: Use a line between each pair of points
  – Nearest Neighbor Interpolation: Find the nearest input in the data set and use the corresponding output as an approximate fit
  – Polynomial Interpolation: Fit a polynomial of order n − 1 to the input-output data: ĝ(x) = Σ_{i=1}^{n} wi x^(i−1)
  – Cubic Spline Interpolation: Fit a cubic polynomial with continuous second derivatives in between each pair of points (more on this later)

Interpolation versus Extrapolation

• Interpolation is technically defined only for inputs that are within the range of the data set: min_i xi ≤ x ≤ max_i xi
• If an input is outside of this range, the model is said to be extrapolating
• A good model should do reasonable things for both cases
• Extrapolation is a much harder problem

Example 1: Linear Interpolation

[Figure: Chirp Linear Interpolation — the linear interpolant drawn through the chirp data set (Output y vs. Input x, 0 ≤ x ≤ 1).]
Example 1: MATLAB Code

% function [] = Interpolation();
close all;

N = 15;
rand('state',2);
x  = rand(N,1);
y  = sin(2*pi*2*x.^2) + 0.2*randn(N,1);
xt = (0:0.0001:1)';    % Test inputs

% ================================================
% Linear Interpolation
% ================================================
figure;
FigureSet(1,'LTX');
yh = interp1(x,y,xt,'linear');
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Linear Interpolation');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationLinear;

% ================================================
% Nearest Neighbor Interpolation
% ================================================
figure;
FigureSet(1,'LTX');
yh = interp1(x,y,xt,'nearest');
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Nearest Neighbor Interpolation');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationNearestNeighbor;

% ================================================
% Polynomial Interpolation
% ================================================
A = zeros(N,N);
for cnt = 1:size(A,2),
  A(:,cnt) = x.^(cnt-1);           % Design matrix: columns 1, x, x^2, ...
end;
w  = pinv(A)*y;                    % Order n-1 polynomial coefficients
At = zeros(length(xt),N);
for cnt = 1:size(A,2),
  At(:,cnt) = xt.^(cnt-1);
end;
yh = At*w;

figure;
FigureSet(1,'LTX');
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Polynomial Interpolation');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationPolynomial;

% ================================================
% Cubic Spline Interpolation
% ================================================
figure;
FigureSet(1,'LTX');
yh = spline(x,y,xt);
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Cubic Spline Interpolation');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationCubicSpline;

% ================================================
% Optimal Model (noise-free target function)
% ================================================
figure;
FigureSet(1,'LTX');
yt = sin(2*pi*2*xt.^2);
h = plot(xt,yt,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Optimal Model');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationOptimalModel;
Example 2: Nearest Neighbor Interpolation

[Figure: Chirp Nearest Neighbor Interpolation — Output y vs. Input x.]

Example 2: MATLAB Code

Same data set and test inputs as the linear interpolation example.

Example 3: Polynomial Interpolation

[Figure: Chirp Polynomial Interpolation — Output y vs. Input x.]

Example 3: MATLAB Code

Same data set and test inputs as the linear interpolation example.
Example 4: Cubic Spline Interpolation

[Figure: Chirp Cubic Spline Interpolation — Output y vs. Input x.]

Example 4: MATLAB Code

Same data set and test inputs as the linear interpolation example.

Interpolation Comments

• There are an infinite number of functions that satisfy the interpolation constraint: ĝ(xi) = yi ∀i
• Of course, we would like to choose the model that minimizes the prediction error
• Given only data, there is no way to do this exactly
• Our data set only specifies what ĝ(x) should be at specific points
• What should it be in between these points?
• In practice, the method of interpolation is usually chosen by the user

Smoothing

• For the smoothing problem, even this constraint is relaxed to ĝ(xi) ≈ yi ∀i
• The data set merely suggests what the model output should be, approximately, at some specified points
• We need another constraint or assumption about the relationship between x and y to have enough constraints to uniquely specify the model
Smoothing Assumptions and Statistical Model

  y = g(x) + ε

• Generally we assume that the data was generated from the statistical model above
• εi is a random variable with the following assumed properties
  – Zero mean: E[ε] = 0
  – εi and εj are independently distributed for i ≠ j
  – εi is identically distributed
• Two additional assumptions are usually made for the smoothing problem
  – g(x) is continuous
  – g(x) is smooth

Example 5: Interpolation Optimal Model

[Figure: Chirp Optimal Model — the noise-free function g(x) plotted with the data (Output y vs. Input x).]

Smoothing

  yi = g(xi) + εi

• When we add noise, we can drop the interpolation constraint ĝ(xi) = yi ∀i
• But we still want ĝ(·) to be consistent with (i.e. close to) the data: ĝ(xi) ≈ yi
• The methods we will discuss are biased in favor of models that are smooth
• This can also be framed as a bias-variance tradeoff

Bias-Variance Tradeoff

Recall that

  MSE(x) = E[(g(x) − ĝ(x))²]
         = (g(x) − E[ĝ(x)])² + E[(ĝ(x) − E[ĝ(x)])²]
         = Bias² + Variance

• Fundamental smoother tradeoff:
  – Smoothness of the estimate ĝ(x)
  – Fit to the data
Bias-Variance Tradeoff Continued

  MSE(x) = (g(x) − E[ĝ(x)])² + E[(ĝ(x) − E[ĝ(x)])²]

• Smooth models
  – Less sensitive to the data
  – Less variance
  – Potentially high bias since they don't fit the data well
• Flexible models
  – Sensitive to the data
  – In the most extreme case, they interpolate the data
  – High variance since they are sensitive to the data
  – Low bias
• A small numerical illustration of this decomposition follows below

Example 6: Univariate Smoothing Data

[Figure: Motorcycle Data Set — Output y vs. Input x scatter plot.]
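The decomposition above can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original slides, that estimates the bias and variance of a simple Gaussian kernel smoother at one query point by repeatedly regenerating the chirp data set from Example 1; the kernel width sigma below is an illustrative assumption.

% Minimal Monte Carlo sketch: MSE(x) = Bias^2 + Variance for a kernel smoother
g  = @(x) sin(2*pi*2*x.^2);              % True chirp function g(x) from Example 1
xq = 0.5;                                % Query point
sigma = 0.05;                            % Kernel width (controls smoothness)
R  = 500;                                % Number of simulated data sets
gh = zeros(R,1);
for r = 1:R,
  x = rand(15,1);                        % New data set from y = g(x) + noise
  y = g(x) + 0.2*randn(15,1);
  b = exp(-(xq-x).^2/(2*sigma^2));       % Kernel weights at the query point
  gh(r) = sum(b.*y)/sum(b);              % Smoother estimate ghat(xq)
end;
bias2    = (g(xq) - mean(gh))^2;         % Squared bias
variance = var(gh,1);                    % Variance over data sets
mse      = mean((g(xq) - gh).^2);        % Matches bias2 + variance up to simulation noise

Making sigma larger trades variance for bias, and making it smaller does the reverse, exactly as the bullets above describe.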

Example 6: Smoothing Problem MATLAB Code

function [] = SmoothingProblem();

A = load('MotorCycle.txt');
x = A(:,1);
y = A(:,2);

figure;
FigureSet(1,'LTX');
h = plot(x,y,'r.');
set(h,'MarkerSize',6);
xlabel('Input x');
ylabel('Output y');
title('Motorcycle Data Set');
set(gca,'Box','Off');
grid on;
ymin = min(y);
ymax = max(y);
yrng = ymax - ymin;
ymin = ymin - 0.05*yrng;
ymax = ymax + 0.05*yrng;
axis([min(x) max(x) ymin ymax]);
AxisSet(8);
% print -depsc Test;
print -depsc SmoothingProblem;

return;

% ================================================
% Linear
% ================================================
figure;
FigureSet(1,4.5,2.8);
A = [ones(N,1) x];
w = pinv(A)*y;
yh = [ones(size(xt)) xt]*w;
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Linear Least Squares');
set(gca,'Box','Off');
grid on;
ymin = min(y);
ymax = max(y);
yrng = ymax - ymin;
ymin = ymin - 0.05*yrng;
ymax = ymax + 0.05*yrng;
axis([min(x) max(x) ymin ymax]);
AxisSet(8);
print -depsc LinearLeastSquares;
Polynomial Smoothing

• We can fit a polynomial ĝ(x) = Σ_{i=0}^{p−1} wi x^i to the data using the linear modeling methods (a small sketch follows below)
• Note that linear models are linear in the parameters wi
• They need not be linear in the inputs
• Alternatively, you can think of this as a linear model with p different inputs where the ith input is given by xi = x^i
• This model is smooth in the sense that all derivatives of ĝ(x) are continuous
• This is one measure of model smoothness
• In general, this is a terrible smoother
  – Terrible at extrapolation
  – The matrix inverse is often poorly conditioned and regularization is necessary
  – The user has to pick the order of the polynomial p − 1

Example 7: Polynomial Smoothing

[Figure: Motorcycle Linear Regression — linear, quadratic, cubic, 4th order, and 5th order polynomial fits to the motorcycle data (Output y vs. Input x).]
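The following is a minimal sketch, not from the original slides, of the polynomial smoother described above fit by ordinary least squares. It assumes the motorcycle data is already loaded into x and y as in the examples below; polyfit/polyval could be used instead, and in practice the inputs are usually centered or the fit regularized because the design matrix is poorly conditioned.

% Minimal polynomial smoothing sketch: ghat(x) = sum_{i=0}^{p-1} w_i x^i
p  = 4;                                  % Number of parameters (order p-1 = 3, a cubic)
xt = (-10:0.1:70)';                      % Test inputs
A  = zeros(length(x),p);
At = zeros(length(xt),p);
for i = 1:p,
  A(:,i)  = x.^(i-1);                    % Design matrix columns: 1, x, x^2, ...
  At(:,i) = xt.^(i-1);
end;
w  = pinv(A)*y;                          % Least squares parameters (pinv tolerates poor conditioning)
yh = At*w;                               % Smoothed (not interpolated) fit at the test inputs
plot(x,y,'k.',xt,yh,'b');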

Example 7: MATLAB Code

Matlab/PolynomialSmoothing.m

Cubic Splines

• Cubic splines are modeled after the properties of the flexible rods ship designers used to use to draw smooth curves
• The rod would be rigidly constrained to go through specific points (interpolation)
• The rod smoothly bent from one point to the next
• The rod naturally minimized its bending energy (i.e. curvature)
• This can be approximated by a piecewise cubic polynomial
Cubic Splines Functional Form

  ĝ(x) = Σ_{i=0}^{3} wi(x) x^i

• Unlike polynomial regression, here the parameters wi(x) are also a function of x
• Consider a class of functions ĝ(x) that have the following properties
  – Continuous
  – Continuous 1st derivative
  – Continuous 2nd derivative
  – Interpolates the data: ĝ(xi) = yi

Cubic Splines Smoothness Definition

• Out of all the functions that meet the above criteria, consider those that also minimize the approximate "curvature" of ĝ(x)

  C ≡ ∫_{xmin}^{xmax} (d²ĝ(x)/dx²)² dx

• These are piecewise cubics and are called cubic splines
• In the sense of satisfying the criteria listed above and minimizing the curvature C, cubic splines are optimal
• Even with all of these constraints, ĝ(x) is not uniquely specified
• There are several cubic splines that meet the strict criteria and have the same curvature
• The most popular additional constraints are

  ĝ''(xmin) = 0    ĝ''(xmax) = 0

• These are called natural cubic splines

Cubic Spline Constraints

[Figure: Cubic Spline — a cubic spline drawn through a set of data points (y vs. x).]

• Cubic splines are piecewise cubic
• This means ĝ(x) = Σ_{i=0}^{3} wi(x) x^i has different weights between each pair of points
• For the entire region between each pair of points, the weights are fixed
• Each polynomial is defined by 4 parameters wi(x)
• We have n + 1 regions, where n is the number of points in the data set
• Thus, we need at least 4 × (n + 1) constraints to uniquely specify the weights

Cubic Spline Constraints Continued

[Figure: Cubic Spline — the same sketch with the piecewise polynomials between knots indicated.]

Let pk(x) be the polynomial between the points xk and xk+1. We need 4 × (n + 1) constraints to have the problem well defined.

  Property                     Expression                        Constraints
  Interpolation                ĝ(xi) = yi                        n
  Continuous                   pk(xk+1) = pk+1(xk+1)             n
  Continuous Derivative        p'k(xk+1) = p'k+1(xk+1)           n
  Continuous 2nd Derivative    p''k(xk+1) = p''k+1(xk+1)         n

Natural splines have 4 additional constraints, which make the two exterior pieces linear:

  p''0(x1) = 0     p'''0(x1) = 0
  p''n(xn) = 0     p'''n(xn) = 0
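The constraint counting above reduces to a small linear system. The following is a minimal sketch, not from the original slides, that solves the classic tridiagonal system for the knot second derivatives of a natural cubic spline (assuming sorted, distinct knots); MATLAB's spline (not-a-knot end conditions) or the Curve Fitting Toolbox routines are what would normally be used in practice.

% Minimal natural cubic spline sketch via the knot second derivatives M
function yt = NaturalCubicSpline(x, y, xt)
  n = length(x);
  h = diff(x);                           % Interval widths h_i = x_{i+1} - x_i
  A = zeros(n,n);                        % System matrix for the second derivatives M
  r = zeros(n,1);
  A(1,1) = 1;  A(n,n) = 1;               % Natural end conditions: M_1 = M_n = 0
  for i = 2:n-1,
    A(i,i-1) = h(i-1);
    A(i,i)   = 2*(h(i-1) + h(i));
    A(i,i+1) = h(i);
    r(i)     = 6*((y(i+1)-y(i))/h(i) - (y(i)-y(i-1))/h(i-1));
  end;
  M  = A\r;                              % Second derivatives at the knots
  yt = zeros(size(xt));
  for k = 1:length(xt),
    i = min(max(sum(x <= xt(k)),1), n-1);     % Interval containing xt(k)
    d = xt(k) - x(i);
    % Standard piecewise-cubic evaluation in terms of M_i and M_{i+1}
    yt(k) = y(i) + d*((y(i+1)-y(i))/h(i) - h(i)*(2*M(i)+M(i+1))/6) ...
            + d^2*M(i)/2 + d^3*(M(i+1)-M(i))/(6*h(i));
  end;
end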
Basis Splines

• You could solve for the 4(n + 1) model coefficients by solving a set of 4(n + 1) linear equations
• This is cumbersome and computationally inefficient
• An easier way is to use basis functions
• Mathematically, each basis function is defined recursively (a small evaluation sketch follows below)

  b_{i,j}(x) = ((x − kj)/(k_{i+j} − kj)) b_{i−1,j}(x) + ((k_{i+j+1} − x)/(k_{i+j+1} − k_{j+1})) b_{i−1,j+1}(x)

• Basis splines also have the nice property that they sum to unity

  Σ_{j=1−i}^{n−1} b_{i,j}(x) = 1    ∀x ∈ [k1, kn]

Basis Splines Continued

• The output of our model can then be written as

  ĝ(x) = Σ_{i=−2}^{n−1} wi b_{3,i}(x)

• Numerically, this can be solved much more quickly (the order is proportional to n)
• Since the basis functions have finite support (i.e. finite span), the equivalent A matrix is banded
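The following is a minimal sketch, not from the original slides, of the Cox–de Boor recursion for a single B-spline basis function. Indexing conventions vary; this uses the common zero-degree indicator start and distinct knots, so it should be read alongside the b_{i,j} recursion above rather than as its exact notation.

% Minimal B-spline basis evaluation sketch (Cox-de Boor recursion)
function v = BSplineBasis(j, d, t, x)
  % j: basis index, d: degree, t: nondecreasing knot vector, x: evaluation points
  if d == 0,
    v = double(t(j) <= x & x < t(j+1));   % Degree-0 basis: indicator of one knot span
    return;
  end;
  a1 = 0; a2 = 0;
  if t(j+d) > t(j),
    a1 = (x - t(j)) / (t(j+d) - t(j));
  end;
  if t(j+d+1) > t(j+1),
    a2 = (t(j+d+1) - x) / (t(j+d+1) - t(j+1));
  end;
  v = a1 .* BSplineBasis(j, d-1, t, x) + a2 .* BSplineBasis(j+1, d-1, t, x);
end

For example, t = 0:10; x = (0:0.01:10)'; plot(x, BSplineBasis(1,3,t,x)); produces a cubic bump qualitatively like the basis functions in Example 8 (the course figures may use a different indexing or normalization).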

Example 8: Basis Function 0

[Figure: Basis Function B0(x) — Output y vs. Input x, 0 ≤ x ≤ 10.]

Example 8: Basis Function 1

[Figure: Basis Function B1(x) — Output y vs. Input x, 0 ≤ x ≤ 10.]
Example 8: Basis Function 2

[Figure: Basis Function B2(x) — Output y vs. Input x, 0 ≤ x ≤ 10.]

Example 8: Basis Function 3

[Figure: Basis Function B3(x) — Output y vs. Input x, 0 ≤ x ≤ 10.]

Example 8: MATLAB Code

Matlab/BasisFunctions.m

Smoothing Splines

• For smoothing, we do not require ĝ(xi) = yi
• But we would like it to be close: ĝ(xi) ≈ yi
• How do we trade off smoothness (low variance) for a good fit to the data (low bias)?
• One way is to find the ĝ(x) that minimizes the following performance criterion:

  E_λ = Σ_{i=1}^{n} (yi − ĝ(xi))² + λ ∫_{−∞}^{+∞} (ĝ''(x))² dx

• Contrast to cubic splines, in which we required the first term to be zero
• The second term is a roughness penalty
Smoothing Splines Continued

  E_λ = Σ_{i=1}^{n} (yi − ĝ(xi))² + λ ∫_{−∞}^{+∞} (ĝ''(x))² dx

• λ is a user-specified parameter that controls the tradeoff
• It turns out the optimal solution (in the sense of minimizing E_λ) is a smoothing spline
• A smoothing spline is identical to a cubic spline in form
  – There is a cubic (3rd order) polynomial between each pair of points
  – Same number of knots
  – Same number of different sets of polynomials
• Unlike the cubic spline, we now drop the constraint that ĝ(xi) = yi
• Instead, ĝ(xi) = ỹi for some set of ỹi

Smoothing Splines Comments

  E_λ = Σ_{i=1}^{n} (yi − ĝ(xi))² + λ ∫_{−∞}^{+∞} (ĝ''(x))² dx

• Smoothing splines are smooth in the same sense as cubic splines
  – If cubic, the second derivative is continuous
  – If quadratic, the first derivative is continuous
  – If linear, the function is continuous
• For a cubic smoothing spline,
  – As λ → ∞, ĝ(x) approaches a linear least squares fit to the data (i.e. ĝ''(x) → 0)
  – As λ → 0, ĝ(x) becomes an interpolating cubic spline
• This is implemented in MATLAB as csaps
• Instead of λ, it takes an equivalent parameter scaled between 0 (linear least squares fit) and 1 (cubic spline interpolation)

Example 9: Smoothing Spline

[Figure: Motorcycle Data Smoothing Spline Regression — fits for α = 1.0, α = 0.5, and α = 0.0 overlaid on the data (Output y vs. Input x).]

Example 10: Smoothing Spline

[Figure: Motorcycle Data Smoothing Spline Regression, α = 0.0001.]
Example 10: Smoothing Spline (continued)

[Figures: Motorcycle Data Smoothing Spline Regression for α = 0.0010, 0.0100, 0.2000, 0.5000, 0.9000, and 0.9900 (Output y vs. Input x).]

Example 10: MATLAB Code

function [] = SmoothingSplineEx();

close all;

A  = load('MotorCycle.txt');
xr = A(:,1);   % Raw values
yr = A(:,2);   % Raw values

x = unique(xr);
y = zeros(size(x));
for cnt = 1:length(x),
  y(cnt) = mean(yr(xr==x(cnt)));
end;

N  = size(A,1);         % No. data set points
xt = (-10:0.2:70)';
NT = length(xt);        % No. test points
NS = 3;                 % No. of different splines

yh = zeros(NT,NS);
yh(:,3) = csaps(x',y',0  ,xt')';
yh(:,2) = csaps(x',y',0.5,xt')';
yh(:,1) = csaps(x',y',1.0,xt')';
FigureSet(1,'LTX');
h = plot(xt,yh,x,y,'k.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Motorcycle Data Smoothing Spline Regression');
set(gca,'Box','Off');
grid on;
axis([-10 70 -150 90]);
AxisSet(8);
legend('\alpha = 1.0','\alpha = 0.5','\alpha = 0.0',4);
print -depsc SmoothingSplineEx;

L = [0.0001 0.001 0.01 0.2 0.5 0.9 0.99];
for cnt = 1:length(L),
  alpha = L(cnt);
  figure;
  FigureSet(1,'LTX');
  yh = csaps(x',y',alpha,xt')';
  h = plot(xt,yh,x,y,'k.');
  set(h,'MarkerSize',8);
  set(h,'LineWidth',1.2);
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data Smoothing Spline Regression \\alpha=%6.4f',alpha);
  title(st);
  set(gca,'Box','Off');
  grid on;
  axis([-10 70 -150 90]);
  AxisSet(8);
  st = sprintf('print -depsc SmoothingSplineEx%04d;',round(alpha*10000));
  eval(st);
end;
Review of Bayes' Rule

• Bayes' rule says that two discrete-valued random variables A and B have the following relationship

  Pr{B|A} = Pr{A,B} / Pr{A} = Pr{A|B} Pr{B} / Pr{A}

• Recall that earlier we found that the ĝ(x) that minimizes the MSE is given by

  Ŷ = g*(x) = E[Y|X = x]

• For smoothing, we can use the continuous analog of Bayes' rule to estimate E[Y|X = x]

Continuous Bayes' Rule

  f(y|X = x) = f(x,y) / f(x) = f(x|Y = y) f(y) / f(x)

• E[Y|X = x] is given by

  E[Y|X = x] = ∫_{−∞}^{+∞} y f(y|X = x) dy

• In order to estimate these equations we need a means of estimating the densities f(x) and f(x,y) (a small numerical sketch follows below)
• A popular method of estimating a density is to add a series of "bumps" together
• The bumps are called kernels and should have the following property

  ∫_{−∞}^{+∞} b_σ(u) du = 1
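The route through Bayes' rule can be followed numerically. The following is a minimal sketch, not part of the original slides, that approximates E[Y|X = x] on a grid from kernel estimates of f(x,y) and f(x); it assumes the data set is already in column vectors x and y (e.g. the motorcycle data used below) and the width sigma is an illustrative assumption. It foreshadows the kernel smoother derived in closed form later.

% Minimal numerical sketch of E[Y|X = xq] from estimated densities
sigma = 2;                               % Assumed kernel width for illustration
xq = 20;                                 % Query input
yg = (-150:1:90)';                       % Grid over the output variable
fxy = zeros(size(yg));
for i = 1:length(x),
  bx  = exp(-(xq - x(i)).^2/(2*sigma^2));
  by  = exp(-(yg - y(i)).^2/(2*sigma^2));
  fxy = fxy + bx*by;                     % Unnormalized estimate of f(xq, y) on the grid
end;
fx = sum(fxy);                           % Unnormalized f(xq) (grid sum approximates the integral)
Ey = sum(yg.*fxy)/fx;                    % E[Y|X = xq]; all normalization constants cancel in the ratio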

Density Estimation

• Then a kernel density estimator is simply expressed as (a minimal implementation sketch follows below)

  f̂(x) = (1/n) Σ_{i=1}^{n} b_σ(x − xi)

• The width of the kernel is specified by σ. Typically

  b_σ(u) = (1/σ) b(u/σ)

  where it is easy to show that ∫ b_σ(u) du = 1 for any value of σ
• Bumps shaped like a Gaussian are popular

  b(u) = (1/√(2π)) e^(−u²/2)

• Typically the bumps have even symmetry: b(u) = b(−u) = b(|u|)

Example 11: Density Estimation

[Figure: Motorcycle Data Density Estimation, w = 0.1 — Density p(x) vs. Input x.]
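The following is a minimal sketch of the estimator defined above, assuming Gaussian bumps and column vectors; the DensityEx script below computes the same quantity inside its plotting loop.

% Minimal kernel density estimator sketch
function f = KernelDensity(x, xt, sigma)
  % x: data, xt: evaluation points, sigma: kernel width
  f = zeros(size(xt));
  for i = 1:length(x),
    f = f + exp(-(xt - x(i)).^2/(2*sigma^2))/sqrt(2*pi*sigma^2);
  end;
  f = f/length(x);                       % Average of n unit-area bumps, so f integrates to 1
end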
Example 11: Density Estimation (continued)

[Figures: Motorcycle Data Density Estimation for w = 0.2, 0.5, 1.0, and 5.0 — Density p(x) vs. Input x.]
Example 11: MATLAB Code

function [] = DensityEx();

close all;

A = load('MotorCycle.txt');
x = A(:,1);   % Raw values
y = A(:,2);   % Raw values

W = [0.1 0.2 0.5 1.0 5.0];

xt = (-10:0.05:70)';

for c1 = 1:length(W),
  w  = W(c1);
  bs = zeros(size(xt));   % Bump sum
  for c2 = 1:length(x),
    bs = bs + exp(-(xt-x(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
  end;
  bs = bs/length(x);

  figure;
  FigureSet(1,'LTX');
  h = plot(x,zeros(size(x)),'k.',xt,bs);
  set(h,'LineWidth',1.5);
  xlabel('Input x');
  ylabel('Density p(x)');
  st = sprintf('Motorcycle Data Density Estimation w=%5.1f',w);
  title(st);
  set(gca,'Box','Off');
  grid on;
  axis([-10 70 0 0.1]);
  AxisSet(8);
  st = sprintf('print -depsc DensityEx%02d;',round(w*10));
  eval(st);
end;

Density Estimation in Higher Dimensions

• Density estimation can be extended to higher dimensions in the obvious way

  f̂(x) = (1/n) Σ_{i=1}^{n} Π_{j=1}^{p} b_σ(xj − x_{i,j})

  where xj is the jth element of the input vector x and x_{i,j} is the jth element of the ith input vector in the data set
• Although you can use this for large values of p, it is not recommended
• The estimate becomes inaccurate very quickly as the number of dimensions grows
• For one or two dimensions this is a pretty good technique

Example 12: 2D Density Estimation

[Figure: Motorcycle Data Input-Output Density Estimation, w = 0.05 — estimated density over (Input x, Output y) with the data points overlaid.]
Example 12: 2D Density Estimation (continued)

[Figures: Motorcycle Data Input-Output Density Estimation for w = 0.10, 0.20, 0.50, and 1.00.]
Example 12: MATLAB Code

function [] = DensityEx2D();

close all;

A  = load('MotorCycle.txt');
xr = A(:,1);   % Raw values
yr = A(:,2);   % Raw values
xm = mean(xr);
ym = mean(yr);
xs = std(xr);
ys = std(yr);

x = (xr-xm)/xs;
y = (yr-ym)/ys;

W = [0.05 0.1 0.2 0.5 1.0];

xst = -2.0:0.02:2.5;             % X-test points
yst = -2.5:0.02:2.5;             % Y-test points
[xmt,ymt] = meshgrid(xst,yst);   % Grids of scaled test points

xt = xst*xs + xm;                % Unscaled x-test values
yt = yst*ys + ym;                % Unscaled y-test values

for c1 = 1:length(W),
  w  = W(c1);
  bs = zeros(size(xmt));         % Bump sum
  for c2 = 1:length(x),
    bx = exp(-(xmt-x(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
    by = exp(-(ymt-y(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
    bs = bs + bx.*by;
  end;
  bs = bs/length(x);

  figure;
  FigureSet(1,'LTX');
  h = imagesc(xt,yt,bs);
  hold on;
  h = plot(xr,yr,'k.',xr,yr,'w.');
  set(h(1),'MarkerSize',4);
  set(h(2),'MarkerSize',2);
  hold off;
  set(gca,'YDir','Normal');
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data Input-Output Density Estimation w=%5.2f',w);
  title(st);
  set(gca,'Box','Off');
  colorbar;
  AxisSet(8);
  st = sprintf('print -depsc DensityEx2D%03d;',round(w*100));
  eval(st);
end;

Density Estimation and Scaling

• In higher dimensions it is important to scale each input to have the same variance
• The following example shows the same data set without scaling
• Notice the oval-shaped bumps

Example 13: 2D Density Estimation

[Figure: Motorcycle Data No Scaling Density Estimation, w = 0.50 — density over (Input x, Output y), color scale ×10⁻³.]
Example 13: 2D Density Estimation (continued)

[Figures: Motorcycle Data No Scaling Density Estimation for w = 1.00, 5.00, 10.00, and 20.00.]
Example 13: 2D Density Estimation (continued)

[Figure: Motorcycle Data No Scaling Density Estimation, w = 50.00.]

Example 13: MATLAB Code

function [] = DensityEx2D();
% This is the same as DensityEx2D, except no scaling is used.
close all;

A = load('MotorCycle.txt');
x = A(:,1);   % Raw values
y = A(:,2);   % Raw values

W = [0.5 1.0 2.0 5.0 10.0 20.0 50.0];

xt = 0:0.5:60;                   % X-test points
yt = -150:90;                    % Y-test points
[xmt,ymt] = meshgrid(xt,yt);     % Grids of test points

for c1 = 1:length(W),
  w  = W(c1);
  bs = zeros(size(xmt));         % Bump sum
  for c2 = 1:length(x),
    bx = exp(-(xmt-x(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
    by = exp(-(ymt-y(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
    bs = bs + bx.*by;
  end;
  bs = bs/length(x);
  figure;
  FigureSet(1,'LTX');
  h = imagesc(xt,yt,bs);
  hold on;

  h = plot(x,y,'k.',x,y,'w.');
  set(h(1),'MarkerSize',4);
  set(h(2),'MarkerSize',2);
  hold off;
  set(gca,'YDir','Normal');
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data No Scaling Density Estimation w=%6.2f',w);
  title(st);
  set(gca,'Box','Off');
  colorbar;
  AxisSet(8);
  st = sprintf('print -depsc DensityEx2Db%03d;',round(w*10));
  eval(st);
end;

Kernel Smoothing Derivation

The following equations compose the Nadaraya-Watson estimator of E[y|x]

  E[y|x] = ∫_{−∞}^{∞} y f(y|x) dy = ∫_{−∞}^{∞} y (f(x,y)/f(x)) dy = (∫_{−∞}^{∞} y f(x,y) dy) / f(x)

The two densities can be estimated as follows

  f̂(x,y) = (1/n) Σ_{i=1}^{n} b_σ(|x − xi|) · b_σ(|y − yi|)

  f̂(x) = (1/n) Σ_{i=1}^{n} b_σ(|x − xi|)
Kernel Smoothing Derivation Continued (1)

  E[y|x] ≈ (∫ y f̂(x,y) dy) / f̂(x)

  f̂(x) E[y|x] ≈ ∫ y [ (1/n) Σ_{i=1}^{n} b_σ(|x − xi|) · b_σ(|y − yi|) ] dy
              = (1/n) Σ_{i=1}^{n} b_σ(|x − xi|) ∫ y b_σ(|y − yi|) dy
              = (1/n) Σ_{i=1}^{n} b_σ(|x − xi|) ∫ (y − yi + yi) b_σ(|y − yi|) dy
              = (1/n) Σ_{i=1}^{n} b_σ(|x − xi|) × [ yi ∫ b_σ(|y − yi|) dy + ∫ (y − yi) b_σ(|y − yi|) dy ]

Kernel Smoothing Derivation Continued (2)

  f̂(x) E[y|x] ≈ (1/n) Σ_{i=1}^{n} b_σ(|x − xi|) × [ yi ∫ b_σ(|y − yi|) dy + ∫ (y − yi) b_σ(|y − yi|) dy ]
              = (1/n) Σ_{i=1}^{n} b_σ(|x − xi|) × [ yi + ∫ u b_σ(|u|) du ]
              = (1/n) Σ_{i=1}^{n} yi b_σ(|x − xi|)

  (the first integral is 1 because the kernel integrates to one, and the second vanishes because the kernel has even symmetry)

Dividing by f̂(x) = (1/n) Σ_{i=1}^{n} b_σ(|x − xi|),

  E[y|x] ≈ Σ_{i=1}^{n} yi b_σ(|x − xi|) / Σ_{i=1}^{n} b_σ(|x − xi|)

Kernel Smoothing Derivation Continued

Thus, by combining the equations on the previous slides we obtain

  E[y|x] ≈ ĝ(x) = Σ_{i=1}^{n} yi b_σ(|x − xi|) / Σ_{i=1}^{n} b_σ(|x − xi|)

• Popular kernels include
  – Epanechnikov: b(u) = c (1 − u²) p(u)
  – Biweight: b(u) = c (1 − u²)² p(u)
  – Triweight: b(u) = c (1 − u²)³ p(u)
  – Triangular: b(u) = c (1 − |u|) p(u)
  – Gaussian: b(u) = c e^(−u²)
  – Sinc: b(u) = c sinc(u)
• Here c is a constant chosen to meet the constraint ∫_{−∞}^{∞} b(u) du = 1
• p(u) is the unit pulse: p(u) = 1 for |u| ≤ 1 and 0 otherwise
• A minimal implementation sketch of this estimator follows below

Example 14: Kernels

[Figure: the six kernels above (Epanechnikov, Biweight, Triweight, Triangular, Gaussian, and Sinc) plotted for −2 ≤ u ≤ 2.]
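The Kernel helper called in Example 15 below is not listed in these notes. The following is a minimal assumed stand-in, not the course's original implementation, that applies the Nadaraya-Watson estimator derived above with either a Gaussian or an Epanechnikov bump.

% Minimal Nadaraya-Watson kernel smoother sketch (assumed stand-in for Kernel())
function yh = KernelSmoother(x, y, xt, w, type)
  % x, y: data set, xt: test inputs, w: kernel width, type: 1 = Gaussian, 2 = Epanechnikov
  yh = zeros(size(xt));
  for cnt = 1:length(xt),
    u = abs(xt(cnt) - x)/w;                 % Scaled distances to every data point
    if type == 1,
      b = exp(-u.^2/2);                     % Gaussian bump (normalization cancels in the ratio)
    else
      b = (1 - u.^2).*(u <= 1);             % Epanechnikov bump with finite support
    end;
    if sum(b) > 0,
      yh(cnt) = sum(b.*y)/sum(b);           % Weighted average of the outputs
    else
      yh(cnt) = NaN;                        % No points in range (finite-support kernels)
    end;
  end;
end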
Example 14: MATLAB Code

function [] = Kernels();

ST = 0.01;
x = (-2.2:ST:2.2)';
u = abs(x);
I = (u <= 1);

kep = (1-u.^2)   .*I;    % Epanechnikov
kbw = (1-u.^2).^2.*I;    % Biweight
ktw = (1-u.^2).^3.*I;    % Triweight
ktr = (1-u)      .*I;    % Triangular
kga = exp(-u.^2);        % Gaussian
ksn = sinc(u);           % Sinc

kep = kep/(sum(kep)*ST); % Normalize
kbw = kbw/(sum(kbw)*ST); % Normalize
ktw = ktw/(sum(ktw)*ST); % Normalize
ktr = ktr/(sum(ktr)*ST); % Normalize
kga = kga/(sum(kga)*ST); % Normalize
ksn = ksn/(sum(ksn)*ST); % Normalize

K = [kep kbw ktw ktr kga ksn];

L = {'Epanechnikov','Biweight','Triweight','Triangular','Gaussian','Sinc'};

FigureSet(1,4.5,2.8);
for cnt = 1:6,
  subplot(2,3,cnt);
  h = plot([-5 5],[0 0],'k:',x,K(:,cnt));
  set(h(2),'LineWidth',1.5);
  title(char(L(cnt)));
  box off;
  axis([min(x) max(x) -0.3 1.2]);
end;

AxisSet(8);
print -depsc Kernels;

Kernel Smoothing Comments

  E[y|x] ≈ ĝ(x) = Σ_{i=1}^{n} yi b_σ(|x − xi|) / Σ_{i=1}^{n} b_σ(|x − xi|)

• Kernel smoothing can be written as a weighted average

  ĝ(x) = Σ_{i=1}^{n} yi wi(x)    where    wi(x) = b_σ(|x − xi|) / Σ_{j=1}^{n} b_σ(|x − xj|)

• Note that by definition Σ_{i=1}^{n} wi(x) = 1
• If all the weights were equal, wi(x) = 1/n, then ĝ(x) = ȳ
• This occurs as σ → ∞

Kernel Smoothing Effect of Support

  E[y|x] ≈ ĝ(x) = Σ_{i=1}^{n} yi b_σ(|x − xi|) / Σ_{i=1}^{n} b_σ(|x − xi|)

As the width decreases (σ ↓) one of two things happens

• If b(u) has infinite support,
  – All of the equivalent weights become nearly equal to zero
  – The weight from the nearest neighbor dominates
  – Thus ĝ(x) does nearest neighbor interpolation as σ → 0
• If b(u) has finite support,
  – At some values of x all of the weights may be 0
  – If this happens, ĝ(x) at these points is not defined (depends on the implementation)
Kernel Smoothing Bias-Variance Tradeoff

  E[y|x] ≈ ĝ(x) = Σ_{i=1}^{n} yi b_σ(|x − xi|) / Σ_{i=1}^{n} b_σ(|x − xi|)

• Thus, as with smoothing splines there is a single parameter that controls the tradeoff of smoothness (high bias) for the ability of the model to fit the data (high variance)
• Kernel smoothers have bounded outputs

  min_i yi ≤ min_x ĝ(x) ≤ ĝ(x) ≤ max_x ĝ(x) ≤ max_i yi

• In this sense, they are more stable than smoothing splines
• Recall smoothing splines diverge outside of the data range
• However, kernel smoothers are more likely to round off sharp edges, peaks, and troughs

Example 15: Kernel Smoothing

[Figure: Motorcycle Data Kernel Smoothing, Epanechnikov Kernel, w = 0.1000 (Output y vs. Input x).]

Example 15: Kernel Smoothing (continued)

[Figures: Motorcycle Data Kernel Smoothing with the Epanechnikov kernel for w = 1, 2, 5, and 10, and with the Gaussian kernel for w = 0.1, 1, 2, and 5.]

Example 15: Kernel Smoothing (continued)

[Figure: Motorcycle Data Kernel Smoothing, Gaussian Kernel, w = 10.0000.]

Example 15: MATLAB Code

function [] = KernelSmoothingEx();

close all;

A = load('MotorCycle.txt');
x = A(:,1);   % Raw values
y = A(:,2);   % Raw values

xt = (-10:0.05:70)';

W = [0.1 1.0 2.0 3.0 5.0 10.0];

% Epanechnikov Kernel
for cnt = 1:length(W),
  w = W(cnt);
  figure;
  FigureSet(1,'LTX');
  yh = Kernel(x,y,xt,w,2);
  h = plot(xt,yh,'b',x,y,'k.');
  set(h,'MarkerSize',8);
  set(h,'LineWidth',1.2);
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data Kernel Smoothing Epanechnikov Kernel w=%6.4f',w);
  title(st);
  set(gca,'Box','Off');
  grid on;
  axis([-10 70 -150 90]);
  AxisSet(8);
  st = sprintf('print -depsc EKernelSmoothingEx%03d;',round(w*10));
  eval(st);
end;

% Gaussian Kernel
for cnt = 1:length(W),
  w = W(cnt);
  figure;
  FigureSet(1,'LTX');
  yh = Kernel(x,y,xt,w,1);
  h = plot(xt,yh,'b',x,y,'k.');
  set(h,'MarkerSize',8);
  set(h,'LineWidth',1.2);
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data Kernel Smoothing Gaussian Kernel w=%6.4f',w);
  title(st);
  set(gca,'Box','Off');
  grid on;
  axis([-10 70 -150 90]);
  AxisSet(8);
  st = sprintf('print -depsc GKernelSmoothingEx%03d;',round(w*10));
  eval(st);
end;

Local Averaging

  ĝ(x) = Σ_{i=1}^{n} wi(x) yi

• We saw that kernel smoothers can be viewed as a weighted average
• Instead, we could take a local average of the k-nearest neighbors of x

  ĝ(x) = (1/k) Σ_{i=1}^{k} y_{c(i)}

  where c(i) is the data set index of the ith nearest point
• For this type of model, k controls the smoothness

Local Averaging Concept

[Figure: Motorcycle Data Set (Head Acceleration (g) vs. Time (ms)); the code below shades the neighborhood of a query point and draws the local average of its k nearest neighbors.]

MATLAB Code

function [] = LocalAverageConcept();

D = load('Motorcycle.txt');
x = D(:,1);
y = D(:,2);

[x,is] = sort(x);
y = y(is);

q = 30;   % Query point (test input)
k = 10;   % No. of neighbors

d = (x-q).^2;
[ds,is] = sort(d);
xs = x(is);
ys = y(is);

xn = xs(1:k);
yn = ys(1:k);

[xsmin,imin] = min(xs(1:k));
[xsmax,imax] = max(xs(1:k));
imin = is(imin);
imax = is(imax);

xll = (x(imin) + x(imin-1))/2;   % lower limit
xul = (x(imax) + x(imax+1))/2;   % upper limit
rg = max(y) - min(y);
ymin = min(y) - 0.1*rg;
ymax = max(y) + 0.1*rg;

xbox = [xll xul xul xll];
ybox = [ymin ymin ymax ymax];

yav = mean(yn)*[1 1];
xav = [xll xul];

A = [xn ones(k,1)];
b = yn;
v = pinv(A)*b;
xl1 = 0;
yl1 = [xl1 1]*v;
xl2 = 1.5;
yl2 = [xl2 1]*v;
xll = [xl1 xl2];
yll = [yl1 yl2];

figure;
FigureSet(1,'LTX');

h = patch(xbox,ybox,'g');
set(h,'FaceColor',.8*[1 1 1]);
set(h,'EdgeColor',.8*[1 1 1]);

hold on;
h = plot(x,y,'k.');
set(h,'MarkerSize',8);
h = plot(xav,yav,'r-');
set(h,'LineWidth',1.5);
% h = plot(xll,yll,'b:');
h = plot(q*[1 1],[ymin ymax],'b--');
set(h,'LineWidth',1.5);
hold off;
rg = max(y) - min(y);
axis([min(x) max(x) ymin ymax]);
xlabel('Time (ms)');
ylabel('Head Acceleration (g)');
title('Motorcycle Data Set');
set(gca,'Layer','top');
set(gca,'Box','off');
AxisSet(8);
print -depsc LocalAverageConcept;

Local Averaging Discussion

  ĝ(x) = (1/k) Σ_{i=1}^{k} y_{c(i)}

• Local averaging has a number of disadvantages
  – The data set must be stored in memory (this is essentially true for kernel smoothers and smoothing splines also)
  – The output ĝ(x) is discontinuous
  – Finding the k nearest neighbors can be computationally expensive

Example 16: Local Averaging

[Figure: Motorcycle Data Set, local average with k = 2 (Head Acceleration (g) vs. Time (ms)).]
Example 16: Local Averaging (continued)

[Figures: Motorcycle Data Set local averages for k = 5, 10, 20, and 50.]
Example 16: MATLAB Code

function [] = LocalAverageFit();

close all;

D = load('Motorcycle.txt');
x = D(:,1);
y = D(:,2);

[x,is] = sort(x);
y = y(is);

xt = (-10:0.1:70)';
yh = zeros(size(xt));

K = [2 5 10 20 50];
for c = 1:length(K),
  k = K(c);

  for cnt = 1:length(xt),
    d = (x-xt(cnt)).^2;
    [ds,is] = sort(d);
    xs = x(is);
    ys = y(is);

    xn = xs(1:k);
    yn = ys(1:k);

    yh(cnt) = mean(yn);
  end;

  figure;
  FigureSet(1,'LTX');
  h = plot(x,y,'k.');
  set(h,'MarkerSize',8);
  hold on;
  h = stairs(xt,yh,'b');
  set(h,'LineWidth',1.2);
  hold off;
  axis([-10 70 -150 90]);
  xlabel('Time (ms)');
  ylabel('Head Acceleration (g)');
  st = sprintf('Motorcycle Data Set k=%d',k);
  title(st);
  set(gca,'Layer','top');
  set(gca,'Box','off');
  AxisSet(8);
  st = sprintf('print -depsc LocalAverageEx%02d;',k);
  eval(st);
end;

Weighted Local Averaging

  ĝ(x) = Σ_{i=1}^{k} b_k(|x − x_{c(i)}|) y_{c(i)} / Σ_{i=1}^{k} b_k(|x − x_{c(i)}|) = Σ_{i=1}^{k} bi y_{c(i)} / Σ_{i=1}^{k} bi

• Local averaging can be tweaked to produce a continuous ĝ(x)
• We simply take a weighted average where b(u) is a smoothly decreasing function of the distance
• We can use our familiar (non-negative) kernels to achieve this
• My favorite is the biweight function

  bi = (1 − di²/d²_{k+1})²

  where di = |x − x_{c(i)}| is the distance between the input and the ith nearest neighbor

Example 17: Local Averaging Weighting Functions

[Figure: Weighted Averaging Weighting Functions — Epanechnikov, Biweight, Triweight, and Triangular weighting functions vs. distance u.]
Example 17: Weighted Averaging

[Figures: Motorcycle Data Set weighted local averages for k = 2, 5, 10, and 20 (Head Acceleration (g) vs. Time (ms)).]
Example 17: Weighted Averaging (continued)

[Figure: Motorcycle Data Set weighted local average for k = 50.]

Example 17: MATLAB Code

function [yt] = WeightedAverage(x,y,xt,k);

xarg = x;
yarg = y;
x = unique(xarg);
y = zeros(size(x));
for cnt = 1:length(x),
  y(cnt) = mean(yarg(xarg==x(cnt)));
end;

yt = zeros(length(xt),1);
[Np,Ni] = size(x);

for cnt = 1:length(xt),
  d = zeros(Np,1);
  for cnt2 = 1:Ni,
    d = d + (x(:,cnt2)-xt(cnt,cnt2)).^2;
  end;
  [ds,is] = sort(d);
  xs = x(is);
  ys = y(is);

  dn   = ds(1:k);
  dmax = ds(k+1);

  xn = xs(1:k);
  yn = ys(1:k);

  w = (1-(dn/dmax)).^2;
  yt(cnt) = sum(w.*yn)/sum(w);
end;

Weighted Local Averaging Comments

  ĝ(x) = Σ_{i=1}^{k} b_k(|x − x_{c(i)}|) y_{c(i)} / Σ_{i=1}^{k} b_k(|x − x_{c(i)}|)

• Like kernel smoothers, weighted local averaging models are stable (bounded):

  min_{c(i)∈1,k} y_{c(i)} ≤ ĝ(x) ≤ max_{c(i)∈1,k} y_{c(i)}

• The key difference here is that the kernel width is determined by the distance to the (k + 1)th nearest neighbor
• This is advantageous
  – In regions of dense data, the equivalent kernel width shrinks
  – In regions of sparse data, the equivalent kernel width expands
Local Model Optimality

It can be shown that for fixed weighting functions, w(x), both kernel smoothers and weighted local averaging models minimize the weighted average squared error

  ASE ≡ (1/n) Σ_{i=1}^{n} (yi − ĝ(xi))² b(|x − xi|)

Setting the derivative with respect to the fitted value ĝ(x) to zero,

  dASE/dĝ(x) ∝ Σ_{i=1}^{n} (yi − ĝ(x)) b(|x − xi|) = 0

  0 = Σ_{i=1}^{n} yi b(|x − xi|) − Σ_{i=1}^{n} ĝ(x) b(|x − xi|)
    = Σ_{i=1}^{n} yi b(|x − xi|) − ĝ(x) Σ_{i=1}^{n} b(|x − xi|)

  ĝ(x) = Σ_{i=1}^{n} yi b(|x − xi|) / Σ_{i=1}^{n} b(|x − xi|)

Local Model Optimality Continued

  ASE ≡ (1/n) Σ_{i=1}^{n} (yi − ĝ(xi))² b(|x − xi|)

  ĝ*(x) = Σ_{i=1}^{n} yi b(|x − xi|) / Σ_{i=1}^{n} b(|x − xi|)

• Thus, we have an alternative derivation of kernel smoothers and weighted local averaging models
• They are the models that minimize the weighted ASE
• The only difference between kernel smoothers, local averaging models, and weighted local averaging models is the weighting function b(·)

Local Model Consistency

• Under general assumptions, it can be shown that smoothing splines, kernel smoothers, and local models are consistent
• Consistency means that as n → ∞, if the following conditions are satisfied
  – ∫ |b_σ(u)| du < ∞
  – lim_{u→∞} u b_σ(u) = 0
  – E[εi²] < ∞
  – As n → ∞, σ → 0 and nσ → ∞
  then at every point where g(x) and f(x) are continuous and f(x) > 0,

  ĝ(x) → g(x)

  with probability 1.

Bias-Variance Tradeoff

  PE ≡ E[(y − ĝ(x))²] = (g(x) − E[ĝ(x)])² + E[(ĝ(x) − E[ĝ(x)])²]
                          (Bias²)            (Variance)

  (strictly, E[(y − ĝ(x))²] also contains the irreducible noise variance E[ε²], which does not depend on the smoother)

• For each, we discussed a bias-variance tradeoff
  – Less smooth ⇒ More variance and less bias
  – More smooth ⇒ Less variance and more bias
• Recall that the prediction error can be written as shown above
• The expectation is taken over the distribution of data sets used to construct ĝ(x)
• Conceptually, this can be plotted
Bias-Variance Tradeoff Continued

[Figure: conceptual plot of Prediction Error, Bias, and Variance as a function of Smoothness.]

• Our goal is to minimize the prediction error
• How do we choose the best smoothing parameter?
• All of the methods we discussed had a single parameter that controlled smoothness
  – Smoothing splines had a smoothness penalty parameter λ
  – Kernel methods had the bump width σ
  – Local averaging models had the number of neighbors k
• How do we pick the best smoothness?
• Would like an accurate estimate of the prediction error

Model Selection

• How do we estimate the prediction error with only one data set?
• The ASE won't work: it monotonically decreases as the smoothness decreases
• All of our smoothers can be written as

  ĝ(x) = ŷ = H(x) y

  for a given input vector x
• This is very similar to the hat matrix of linear models, except now the H matrix is a function of x
• The equivalent degrees of freedom can be estimated by

  p ≈ trace(H Hᵀ)

  (see the sketch below for one way to form H and estimate p)
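The following is a minimal sketch, not from the original slides, of forming the smoother matrix H for a Gaussian kernel smoother evaluated at the training inputs and estimating the equivalent degrees of freedom as defined above; it assumes the data is already in column vectors x and y and the width sigma is an illustrative choice.

% Minimal smoother-matrix sketch: yhat = H*y and p = trace(H*H')
sigma = 2;                               % Kernel width (illustrative)
n = length(x);
H = zeros(n,n);
for i = 1:n,
  b = exp(-(x(i) - x).^2/(2*sigma^2));   % Bumps centered on every training input
  H(i,:) = (b/sum(b))';                  % Row i: weights such that yhat(i) = H(i,:)*y
end;
yhat = H*y;                              % Fitted values at the training inputs
p = trace(H*H');                         % Equivalent degrees of freedom estimate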

Model Selection Continued

• The prediction error can then be estimated by

  PE ≈ r(p/n) × ASE

  where r(·) is a function that adjusts the ASE to be a more accurate estimate of the PE
• A number of different functions r(·) have been proposed
  – Final Prediction Error: r(u) = (1 + u)/(1 − u)
  – Schwartz' Criterion: r(u) = 1 + (log_e n) u / (2(1 − u))
  – Generalized CVE: r(u) = 1/(1 − u)²
  – Shibata's Model Selector: r(u) = 1 + 2u
• In each case u = p/n

Resampling Techniques

• It is also possible to use resampling techniques
  – N-Fold Cross-Validation: Divide the data set into N different sets. Pick the first set as the test set and build the model using the remaining N − 1 sets of points. Calculate the ASE on the test set. Repeat for all of the sets and average all N estimates of the ASE.
  – Leave-one-out Cross-Validation: Same as above for N = n.
  – Bootstrap: Select n points from the data set with replacement and calculate the ASE.
• I personally prefer to use Leave-one-out CVE (a leave-one-out sketch follows below)
• A study I conducted last spring with weighted averaging models indicated that CVE and Generalized CVE were the best (for that type of model)
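The following is a minimal leave-one-out cross-validation sketch, not from the original slides, for picking the number of neighbors k in a local averaging smoother; it assumes the data is in column vectors x and y, and any of the smoothers in these notes could be dropped into the inner loop instead.

% Minimal leave-one-out cross-validation sketch for choosing k
K = 2:30;                                   % Candidate smoothness parameters
cve = zeros(size(K));
n = length(x);
for a = 1:length(K),
  k = K(a);
  se = zeros(n,1);
  for i = 1:n,
    xi = x([1:i-1 i+1:n]);                  % Leave point i out
    yi = y([1:i-1 i+1:n]);
    [ds,is] = sort(abs(xi - x(i)));         % Distances from the held-out input
    se(i) = (y(i) - mean(yi(is(1:k))))^2;   % Squared error of the k-nearest-neighbor average
  end;
  cve(a) = mean(se);
end;
[cvemin,ia] = min(cve);
kopt = K(ia);                               % Smoothness parameter with the lowest CVE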
Example: Weighted Averaging CVE

[Figure: Local Averaging Cross-Validation Error — CVE vs. Number of Neighbors (k).]

[Figure: Motorcycle Data Local Averaging fit at the CVE minimum, kopt = 11 (Head Acceleration (g) vs. Time (ms)).]

Weighted Least Squares

  ASE_b = (1/n) Σ_{i=1}^{n} bi (yi − ŷi)²
        = (y − Aw)ᵀ Bᵀ B (y − Aw)

  where B is the diagonal matrix with entries b1, b2, ..., bn

• When we discussed linear models, we found the w that minimized the average squared error
• We can generalize this easily to find the best linear model that minimizes the weighted ASE

Weighted Least Squares Continued

  ASE_b = (y − Aw)ᵀ Bᵀ B (y − Aw)

• This can be framed as a typical (unweighted) least squares problem if we add the following definitions

  A_b ≡ B A        y_b ≡ B y

• Then the ASE_b can be written as

  ASE_b = (y_b − A_b w)ᵀ (y_b − A_b w)

  which has the known optimal least squares solution

  w = (A_bᵀ A_b)⁻¹ A_bᵀ y_b

• Now we can easily generalize kernel methods and local averaging models to create localized linear models
• We merely specify the weights so that points near the input have the most influence on the model output (a sketch of such a local linear model follows below)
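The LocalLinear helper called in Example 18 below is not listed in these notes. The following is a minimal assumed stand-in, not the course's original code: a k-nearest-neighbor local linear model fit by weighted least squares with biweight weights, in the spirit of the slides above. It assumes k + 1 ≤ n and distinct inputs near each query point.

% Minimal local linear model sketch (assumed stand-in for LocalLinear())
function yt = LocalLinearSketch(x, y, xt, k)
  yt = zeros(length(xt),1);
  for cnt = 1:length(xt),
    [ds,is] = sort(abs(x - xt(cnt)));       % Distances to the query point
    xn = x(is(1:k));                        % k nearest inputs
    yn = y(is(1:k));
    dmax = ds(k+1);                         % Width set by the (k+1)th neighbor
    b  = (1 - (ds(1:k)/dmax).^2).^2;        % Biweight weights
    B  = diag(sqrt(b));                     % So that (B*A)'*(B*A) applies the weights b
    A  = [ones(k,1) xn];                    % Local linear (affine) design matrix
    Ab = B*A;
    yb = B*yn;
    w  = pinv(Ab)*yb;                       % Weighted least squares coefficients
    yt(cnt) = [1 xt(cnt)]*w;                % Evaluate the local line at the query point
  end;
end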
Example 18: Local Linear Model

[Figures: Motorcycle Data Set local linear fits for k = 2, 5, 10, 20, 30, and 50 (Head Acceleration (g) vs. Time (ms)).]

Example 18: Local Linear Model (continued)

[Figure: Motorcycle Data Set local linear fit for k = 93.]

Example 18: MATLAB Code

function [] = LocalLinearEx();

close all;

D = load('Motorcycle.txt');
x = D(:,1);
y = D(:,2);

[x,is] = sort(x);
y = y(is);

xt = (-10:0.05:70)';

k = [2 5 10 20 30 50 93];
for cnt = 1:length(k),
  yh = LocalLinear(x,y,xt,k(cnt));
  figure;
  FigureSet(1,'LTX');
  h = plot(x,y,'k.',xt,yh,'b');
  set(h(1),'MarkerSize',8);
  set(h(2),'LineWidth',1.2);
  axis([-10 70 -150 90]);
  xlabel('Time (ms)');
  ylabel('Head Acceleration (g)');
  st = sprintf('Motorcycle Data Set k=%d',k(cnt));
  title(st);
  set(gca,'Layer','top');
  set(gca,'Box','off');
  AxisSet(8);
  st = sprintf('print -depsc LocalLinearEx%02d;',k(cnt));
  eval(st);
end;

Univariate Smoothing Summary

• We discussed four methods of interpolation
  – Linear Interpolation
  – Nearest Neighbor Interpolation
  – Polynomial Interpolation
  – Cubic Spline Interpolation
• We discussed six methods of univariate smoothing
  – Polynomial regression (generalization of linear models)
  – Smoothing splines
  – Kernel smoothing
  – Local averaging
  – Weighted local averaging
  – Local linear models (weighted)
• We discussed one method of density estimation based on kernels

Univariate Smoothing Summary Continued

• All of the smoothing methods had a single parameter that controls the smoothness of the model
• For each, we discussed a bias-variance tradeoff
  – Less smooth ⇒ More variance and less bias
  – More smooth ⇒ Less variance and more bias
• We discussed several methods of estimating the "true" prediction error of the model
  – Some of the methods were simple modifications of the ASE
  – Other methods were based on resampling (cross-validation & the bootstrap)
• Most can be generalized to the multivariate case