CSC 411 / CSC D11 / CSC C11 Nonlinear Regression

3 Nonlinear Regression
Sometimes linear models are not sufficient to capture real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic form, i.e.,

y = f(x)    (1)

In linear regression, we have f(x) = Wx + b; the parameters W and b must be fit to data.
What nonlinear function do we choose? In principle, f (x) could be anything: it could involve
linear functions, sines and cosines, summations, and so on. However, the form we choose will
make a big difference in the effectiveness of the regression: a more general model will require
more data to fit, and different models are more appropriate for different problems. Ideally, the
form of the model would be matched exactly to the underlying phenomenon. If we’re modeling a
linear process, we’d use a linear regression; if we were modeling a physical process, we could, in
principle, model f (x) by the equations of physics.
In many situations, we do not know much about the underlying nature of the process being
modeled, or else modeling it precisely is too difficult. In these cases, we typically turn to a few
models in machine learning that are widely-used and quite effective for many problems. These
methods include basis function regression (including Radial Basis Functions), Artificial Neural
Networks, and k-Nearest Neighbors.
There is one other important choice to be made, namely, the choice of objective function for
learning, or, equivalently, the underlying noise model. In this section we extend the LS estimators
introduced in the previous chapter to include one or more terms to encourage smoothness in the
estimated models. It is hoped that smoother models will tend to overfit the training data less and
therefore generalize somewhat better.

3.1 Basis function regression


A common choice for the function f(x) is a basis function representation¹:

y = f(x) = Σ_k w_k b_k(x)    (2)

for the 1D case. The functions b_k(x) are called basis functions. Often it will be convenient to express this model in vector form, for which we define b(x) = [b_1(x), ..., b_M(x)]^T and w = [w_1, ..., w_M]^T, where M is the number of basis functions. We can then rewrite the model as

y = f(x) = b(x)^T w    (3)
Two common choices of basis functions are polynomials and Radial Basis Functions (RBF).
A simple and common basis for polynomials is the monomials, i.e.,

b_0(x) = 1,  b_1(x) = x,  b_2(x) = x^2,  b_3(x) = x^3,  ...    (4)
¹ In the machine learning and statistics literature, these representations are often referred to as linear regression, since they are linear functions of the “features” b_k(x).


Figure 1: The first basis functions of a polynomial basis (the monomials x^0, x, x^2, x^3), and Radial Basis Functions, plotted over x ∈ [−2, 2].

With a monomial basis, the regression model has the form

f(x) = Σ_k w_k x^k.    (5)

Radial Basis Functions, and the resulting regression model, are given by

b_k(x) = e^{−(x − c_k)^2 / (2σ^2)},    (6)

f(x) = Σ_k w_k e^{−(x − c_k)^2 / (2σ^2)},    (7)

where c_k is the center (i.e., the location) of the basis function and σ^2 determines the width of the basis function. Both of these are parameters of the model that must be determined somehow.
In practice there are many other possible choices for basis functions, including sinusoidal func-
tions, and other types of polynomials. Also, basis functions from different families, such as mono-
mials and RBFs, can be combined. We might, for example, form a basis using the first few poly-
nomials and a collection of RBFs. In general we ideally want to choose a family of basis functions
such that we get a good fit to the data with a small basis set so that the number of weights to be
estimated is not too large.
To fit these models, we can again use least-squares regression, by minimizing the sum of
squared residual error between model predictions and the training data outputs:
E(w) = Σ_i (y_i − f(x_i))^2 = Σ_i ( y_i − Σ_k w_k b_k(x_i) )^2    (8)

To minimize this function with respect to w, we note that this objective function has the same form
as that for linear regression in the previous chapter, except that the inputs are now the b_k(x) values.


In particular, E is still quadratic in the weights w, and hence the weights w can be estimated the
same way. That is, we can rewrite the objective function in matrix-vector form to produce

E(w) = ||y − Bw||^2    (9)

where ||·|| denotes the Euclidean norm, and the elements of the matrix B are given by B_{i,j} = b_j(x_i) (for row i and column j). In Matlab the least-squares estimate can be computed as w* = B\y.
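As an illustration, here is a minimal NumPy sketch of the same computation: it builds the matrix B for a Gaussian RBF basis and solves the least-squares problem. The centers, width, and data used here are hypothetical choices for the example, not values prescribed by the text.

```python
import numpy as np

def rbf_basis(x, centers, sigma):
    """Evaluate Gaussian RBF basis functions b_k(x) = exp(-(x - c_k)^2 / (2 sigma^2)).

    Returns the matrix B with B[i, j] = b_j(x_i)."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * sigma ** 2))

# Hypothetical 1D training data (a noisy sine curve, as in Figure 2).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 10.0, 30)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.shape)

# Heuristic choices: one center per training point, a hand-picked width.
centers = x_train.copy()
sigma = 0.5

B = rbf_basis(x_train, centers, sigma)           # B[i, j] = b_j(x_i)
w, *_ = np.linalg.lstsq(B, y_train, rcond=None)  # least-squares estimate of w

# Predictions at new inputs: f(x) = b(x)^T w.
x_test = np.linspace(0.0, 10.0, 200)
y_test = rbf_basis(x_test, centers, sigma) @ w
```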

Picking the other parameters. The positions of the centers and the widths of the RBF basis functions cannot be solved for directly in closed form, so we need some other criteria to select them. If we optimize these parameters for the squared error, we will end up with one basis center at each data point, each with a tiny width, so that the model exactly fits the training data. This is a problem, since such a model will not usually provide good predictions for inputs other than those in the training set.
Instead, the following heuristics are commonly used to determine these parameters without overfitting the training data. To pick the basis centers:

1. Place the centers uniformly spaced in the region containing the data. This is quite simple, but it can place basis functions in empty regions that contain no data, and it requires an impractically large number of basis functions in higher-dimensional input spaces.

2. Place one center at each data point. This is used more often, since it limits the number of
centers needed, although it can also be expensive if the number of data points is large.

3. Cluster the data, and use one center for each cluster. We will cover clustering methods later
in the course.
To pick the width parameter:
1. Manually try different values of the width and pick the best by trial-and-error.

2. Use the average squared distances (or median distances) to neighboring centers, scaled by a
constant, to be the width. This approach also allows you to use different widths for different
basis functions, and it allows the basis functions to be spaced non-uniformly.
In later chapters we will discuss other methods for determining these and other parameters of
models.
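As an illustration of these heuristics, here is a minimal NumPy sketch that places one center at each training point and sets a shared width from the median distance to each center's nearest neighboring center, scaled by a constant. The scaling constant and the data are hypothetical choices for the example.

```python
import numpy as np

def pick_rbf_parameters(x_train, scale=1.0):
    """Heuristic RBF parameters: one center per data point, and a width taken
    from the median nearest-neighbor distance between centers, times `scale`."""
    centers = np.unique(x_train)                     # one center per (distinct) data point
    d = np.abs(centers[:, None] - centers[None, :])  # pairwise distances between centers
    np.fill_diagonal(d, np.inf)                      # ignore self-distances
    nearest = d.min(axis=1)                          # distance to the nearest other center
    sigma = scale * np.median(nearest)               # shared width for all basis functions
    return centers, sigma

# Hypothetical usage:
x_train = np.array([0.0, 0.7, 1.1, 2.5, 3.0, 4.2])
centers, sigma = pick_rbf_parameters(x_train, scale=1.0)
```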

3.2 Overfitting and Regularization


Directly minimizing squared-error can lead to an effect called overfitting, wherein we fit the train-
ing data extremely well (i.e., with low error), yet we obtain a model that produces very poor pre-
dictions on future test data whenever the test inputs differ from the training inputs (Figure 2(b)).
Overfitting can be understood in many ways, all of which are variations on the same underlying
pathology:


1. The problem is insufficiently constrained: for example, if we have ten measurements and ten
model parameters, then we can often obtain a perfect fit to the data.

2. Fitting noise: overfitting can occur when the model is so powerful that it can fit the data and
also the random noise in the data.

3. Discarding uncertainty: the posterior probability distribution of the unknowns is insufficiently peaked to pick a single estimate. (We will explain what this means in more detail later.)

There are two important solutions to the overfitting problem: adding prior knowledge and handling uncertainty. We will discuss the latter later in the course.
In many cases, there is some sort of prior knowledge we can leverage. A very common as-
sumption is that the underlying function is likely to be smooth, for example, having small deriva-
tives. Smoothness distinguishes the examples in Figure 2. There is also a practical reason to
prefer smoothness, in that assuming smoothness reduces model complexity: it is easier to estimate
smooth models from small datasets. In the extreme, if we make no prior assumptions about the
nature of the fit then it is impossible to learn and generalize at all; smoothness assumptions are one
way of constraining the space of models so that we have any hope of learning from small datasets.
One way to add smoothness is to parameterize the model in a smooth way (e.g., making the
width parameter for RBFs larger; using only low-order polynomial basis functions), but this limits
the expressiveness of the model. In particular, when we have lots and lots of data, we would like
the data to be able to “overrule” the smoothness assumptions. With large widths, it is impossible
to get highly-curved models no matter what the data says.
Instead, we can add regularization: an extra term to the learning objective function that prefers
smooth models. For example, for RBF regression with scalar outputs, and with many other types
of basis functions or multi-dimensional outputs, this can be done with an objective function of the
form:
E(w) = ||y − Bw||^2 + λ||w||^2    (10)
This objective function has two terms. The first term, called the data term, measures the model fit
to the training data. The second term, often called the smoothness term, penalizes non-smoothness
(rapid changes in f(x)). This particular smoothness term (||w||^2) is called weight decay, because it tends to make the weights smaller.² Weight decay implicitly leads to smoothness with RBF basis
functions because the basis functions themselves are smooth, so rapid changes in the slope of f
(i.e., high curvature) can only be created in RBFs by adding and subtracting basis functions with
large weights. (Ideally, we might penalize non-smoothness directly, e.g., using an objective term that penalizes the integral of the squared curvature of f(x), but this is usually impractical.)
² Estimation with this objective function is sometimes called Ridge Regression in Statistics.


This regularized least-squares objective function is still quadratic with respect to w and can
be optimized in closed-form. To see this, we can rewrite it as follows:

E(w) = (y − Bw)^T (y − Bw) + λ w^T w    (11)
     = w^T B^T B w − 2 w^T B^T y + λ w^T w + y^T y    (12)
     = w^T (B^T B + λI) w − 2 w^T B^T y + y^T y    (13)

To minimize E(w), as above, we solve the normal equations ∇E(w) = 0 (i.e., ∂E/∂w_i = 0 for all i). This yields the following regularized LS estimate for w:

w* = (B^T B + λI)^{−1} B^T y    (14)
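A minimal NumPy sketch of this regularized estimate follows; solving the normal equations with np.linalg.solve is preferred over forming the inverse explicitly. The usage line assumes the hypothetical B and y_train built in the earlier RBF example.

```python
import numpy as np

def fit_regularized_ls(B, y, lam):
    """Regularized least-squares estimate w* = (B^T B + lam * I)^{-1} B^T y.

    Solves the normal equations directly rather than inverting the matrix."""
    M = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(M), B.T @ y)

# Hypothetical usage, with B and y_train from the earlier RBF sketch:
# w_star = fit_regularized_ls(B, y_train, lam=0.1)
```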

3.3 Artificial Neural Networks


Another choice of basis function is the sigmoid function. “Sigmoid” literally means “s-shaped.”
The most common choice of sigmoid is:

g(a) = 1 / (1 + e^{−a})    (15)
Sigmoids can be combined to create a model called an Artificial Neural Network (ANN). For regression with multi-dimensional inputs x ∈ R^{K_2} and multi-dimensional outputs y ∈ R^{K_1}:

y = f(x) = Σ_j w_j^{(1)} g( Σ_k w_{k,j}^{(2)} x_k + b_j^{(2)} ) + b^{(1)}    (16)

This equation describes a process whereby a linear regressor with weights w^{(2)} is applied to x. The output of this regressor is then put through the nonlinear sigmoid function, the outputs of which act as features to another linear regressor. Thus, note that the inner weights w^{(2)} are distinct parameters from the outer weights w_j^{(1)}. As usual, it is easiest to interpret this model in the 1D case, i.e.,

y = f(x) = Σ_j w_j^{(1)} g( w_j^{(2)} x + b_j^{(2)} ) + b^{(1)}    (17)

Figure 3(left) shows plots of g(wx) for different values of w, and Figure 3(right) shows g(x+b)
for different values of b. As can be seen from the figures, the sigmoid function acts more or less
like a step function for large values of w, and more like a linear ramp for small values of w. The
bias b shifts the function left or right. Hence, the neural network is a linear combination of shifted
(smoothed) step functions, linear ramps, and the bias term.
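As a concrete illustration, here is a minimal NumPy sketch of the 1D model in Equation (17). The number of hidden units and the parameter values are hypothetical choices for the example, picked only to show step-like, smooth-step, and ramp-like hidden features.

```python
import numpy as np

def sigmoid(a):
    """The sigmoid g(a) = 1 / (1 + exp(-a)) from Equation (15)."""
    return 1.0 / (1.0 + np.exp(-a))

def ann_1d(x, w1, b1, w2, b2):
    """1D ANN of Equation (17): f(x) = sum_j w1[j] * g(w2[j] * x + b2[j]) + b1.

    x may be a scalar or a 1D array of inputs."""
    x = np.atleast_1d(x)
    hidden = sigmoid(np.outer(x, w2) + b2)  # shape (N, J): features g(w2_j x + b2_j)
    return hidden @ w1 + b1                 # outer linear regressor plus bias

# Hypothetical parameters for a network with J = 3 hidden units:
w2 = np.array([10.0, 2.0, 0.5])   # inner weights: step-like, smooth step, ramp
b2 = np.array([-5.0, 0.0, 2.0])   # inner biases: shift each sigmoid
w1 = np.array([1.0, -0.5, 0.25])  # outer weights
b1 = 0.1                          # outer bias

y = ann_1d(np.linspace(-2.0, 2.0, 5), w1, b1, w2, b2)
```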
To learn an artificial neural network, we can again write a regularized squared-error objective
function:
E(w, b) = ||y − f(x)||^2 + λ||w||^2    (18)


[Figure 2: four panels (a)–(d), each plotting the training data points, the original curve, and the estimated curve over x ∈ [0, 10].]

Figure 2: Least-squares curve fitting of an RBF. (a) Point data (blue circles) was taken from a sine
curve, and a curve was fit to the points by a least-squares fit. The horizontal axis is x, the vertical
axis is y, and the red curve is the estimated f (x). In this case, the fit is essentially perfect. The
curve representation is a sum of Gaussian basis functions. (b) Overfitting. Random noise was
added to the data points, and the curve was fit again. The curve exactly fits the data points, which
does not reproduce the original curve (a green, dashed line) very well. (c) Underfitting. Adding
a smoothness term makes the resulting curve too smooth. (In this case, weight decay was used,
along with reducing the number of basis functions). (d) Reducing the strength of the smoothness
term yields a better fit.


[Figure 3: left panel, g(wx) for w = 10, 2, 1, 0.5, 0.1; right panel, g(x − 4), g(x), g(x + 4); both plotted over x ∈ [−10, 10].]

Figure 3: Left: Sigmoids g(wx) = 1/(1 + e^{−wx}) for various values of w, ranging from linear ramps to smooth steps to nearly hard steps. Right: Sigmoids g(x + b) = 1/(1 + e^{−x−b}) with different shifts b.

where w comprises the weights at both levels for all j. Note that we regularize by applying weight
decay to the weights (both inner and outer), but not the biases, since only the weights affect the
smoothness of the resulting function (why?).
Unfortunately, this objective function cannot be optimized in closed form, and numerical optimization procedures must be used. We will study one such method, gradient descent, in a later chapter.
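As a preview, here is a minimal sketch of fitting the 1D model of Equation (17) under the objective of Equation (18) by generic numerical optimization, using scipy.optimize.minimize as a stand-in for the gradient descent method covered later. The network size, training data, and λ are hypothetical choices for the example.

```python
import numpy as np
from scipy.optimize import minimize

def objective(params, x, y, J, lam):
    """Regularized squared error of Equation (18) for the 1D ANN of Equation (17)."""
    w1, b1 = params[:J], params[J]
    w2, b2 = params[J + 1:2 * J + 1], params[2 * J + 1:]
    hidden = 1.0 / (1.0 + np.exp(-(np.outer(x, w2) + b2)))  # g(w2_j x + b2_j)
    pred = hidden @ w1 + b1                                 # f(x), Equation (17)
    weights = np.concatenate([w1, w2])                      # decay the weights only, not the biases
    return np.sum((y - pred) ** 2) + lam * np.sum(weights ** 2)

# Hypothetical training data and settings.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 10.0, 40)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.shape)
J, lam = 5, 0.01

params0 = 0.1 * rng.standard_normal(3 * J + 2)  # random initial weights and biases
result = minimize(objective, params0, args=(x_train, y_train, J, lam))
```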

3.4 K-Nearest Neighbors


At their heart, many learning procedures — especially when our prior knowledge is weak —
amount to smoothing the training data. RBF fitting is an example of this. However, many of
these fitting procedures require making a number of decisions, such as the locations of the basis
functions, and can be sensitive to these choices. This raises the question: why not cut out the mid-
dleman, and smooth the data directly? This is the idea behind K-Nearest Neighbors regression.
The idea is simple. We first select a parameter K, which is the only parameter to the algorithm. Then, for a new input x, we find the K nearest neighbors to x in the training set, based on their Euclidean distance ||x − x_i||^2. Then, our new output y is simply an average of the training outputs for those nearest neighbors. This can be expressed as:
y = (1/K) Σ_{i ∈ N_K(x)} y_i    (19)

where the set N_K(x) contains the indices of the K training points closest to x. Alternatively, we might take a weighted average of the K nearest neighbors to give more influence to training points


close to x than to those further away:


y = ( Σ_{i ∈ N_K(x)} w(x_i) y_i ) / ( Σ_{i ∈ N_K(x)} w(x_i) ),    w(x_i) = e^{−||x_i − x||^2 / (2σ^2)}    (20)

where σ^2 is an additional parameter to the algorithm. The parameters K and σ control the degree of smoothing performed by the algorithm. In the extreme case of K = 1, the algorithm produces a piecewise-constant function.
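A minimal NumPy sketch of both variants for 1D inputs follows: the plain K-nearest-neighbor average of Equation (19) and the Gaussian-weighted average of Equation (20). The data and parameter values are hypothetical.

```python
import numpy as np

def knn_regress(x, x_train, y_train, K, sigma=None):
    """Predict y at a single input x by averaging the K nearest training outputs.

    If sigma is given, use the Gaussian-weighted average of Equation (20);
    otherwise use the unweighted average of Equation (19)."""
    dists = np.abs(x_train - x)            # Euclidean distance in 1D
    neighbors = np.argsort(dists)[:K]      # indices of the K nearest points, N_K(x)
    if sigma is None:
        return np.mean(y_train[neighbors])
    w = np.exp(-dists[neighbors] ** 2 / (2.0 * sigma ** 2))
    return np.sum(w * y_train[neighbors]) / np.sum(w)

# Hypothetical usage:
x_train = np.linspace(0.0, 10.0, 30)
y_train = np.sin(x_train)
y_hat = knn_regress(2.3, x_train, y_train, K=3, sigma=0.5)
```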
K-nearest neighbors is simple and easy to implement; it doesn’t require us to muck about at
all with different choices of basis functions or regularizations. However, it doesn’t compress the
data at all: we have to keep around the entire training set in order to use it, which could be very
expensive, and we must search the whole data set to make predictions. (The cost of searching can be mitigated with spatial data structures designed for searching, such as k-d trees and locality-sensitive hashing. We will not cover these methods here.)

Copyright © 2015 Aaron Hertzmann, David J. Fleet and Marcus Brubaker
