
PATTERN RECOGNITION AND MACHINE LEARNING

CHAPTER 3: LINEAR MODELS FOR REGRESSION

Outline

Discuss tutorial.
Regression Examples.
The Gaussian distribution.
Linear Regression.
Maximum Likelihood estimation.

Polynomial Curve Fitting

Academia Example
Predict: final percentage mark for a student.
Features: 6 assignment grades, midterm exam, final exam, project, age.
Questions we could ask:
I forgot the weights of the components. Can you recover them from a spreadsheet of the final grades?
I lost the final exam grades. How well can I still predict the final mark?
How important is each component, actually?
How well could I guess someone's final mark given only their assignments? Given only their exams?

The Gaussian Distribution

Central Limit Theorem


The distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows.
Example: the sum of N uniform [0,1] random variables (see the quick sketch below).
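As a quick empirical check (my own sketch, not from the slides), we can draw many sums of N uniform variables and compare the sample mean and variance with the theoretical values N/2 and N/12 of the Gaussian limit:

```python
import numpy as np

def clt_demo(N, n_samples=100_000, rng=np.random.default_rng(0)):
    """Return n_samples sums of N i.i.d. Uniform[0,1] variables."""
    draws = rng.uniform(0.0, 1.0, size=(n_samples, N))
    return draws.sum(axis=1)

for N in (1, 2, 10):
    s = clt_demo(N)
    # For a sum of N Uniform[0,1] variables: mean = N/2, variance = N/12.
    print(f"N={N:2d}  sample mean={s.mean():.3f} (theory {N/2:.3f})  "
          f"sample var={s.var():.3f} (theory {N/12:.3f})")
```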

Reading exponential probability formulas

On an infinite space we cannot simply give every outcome the same probability: the sum Σ_x p(x) would grow to infinity.
Instead, use an exponentially decaying form, e.g. p(n) = (1/2)^n.
Suppose there is a relevant feature f(x) and I want to express that the greater f(x) is, the less probable x is.
Use p(x) = exp(-f(x)) (up to a normalizing constant).
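A hedged illustration of this construction (mine, not from the slides): on a finite grid, exponentiating the negative feature and normalizing gives a valid distribution in which larger f(x) means smaller p(x).

```python
import numpy as np

# Unnormalized weights p(x) proportional to exp(-f(x)) on a finite grid of x values.
x = np.linspace(-3.0, 3.0, 7)
f = x**2                      # example feature: larger f(x) => less probable x
unnorm = np.exp(-f)
p = unnorm / unnorm.sum()     # normalize so the probabilities sum to 1

for xi, pi in zip(x, p):
    print(f"x={xi:+.1f}  p(x)={pi:.4f}")
```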

Example: exponential form and sample size

Fair coin: the longer the sampled sequence, the less likely any particular outcome is.
p(n) = 2^(-n), so ln p(n) = -n ln 2 decreases linearly with the sample size n.
[Figure: ln p(n) plotted against sample size n.]

Exponential Form: Gaussian mean

The further x is from the mean, the less likely it is:
ln p(x) = -(x - μ)²/(2σ²) + const.
[Figure: ln p(x) plotted against (x - μ)².]

Smaller variance decreases probability

The smaller the variance σ², the less likely x is (away from the mean). Equivalently: the greater the precision β = 1/σ², the less likely x is.
[Figure: ln p(x) for several values of the precision β = 1/σ².]

Minimal energy = maximum probability

The greater the energy E(x) of the joint state, the less probable the state is:
ln p(x) = -E(x) + const, i.e. p(x) ∝ exp(-E(x)).
[Figure: ln p(x) plotted against the energy E(x).]

Linear Basis Function Models (1)

Generally,
y(x, w) = Σ_{j=0}^{M-1} w_j φ_j(x) = wᵀφ(x),
where the φ_j(x) are known as basis functions.
Typically, φ_0(x) = 1, so that w_0 acts as a bias.
In the simplest case, we use linear basis functions: φ_d(x) = x_d.

Linear Basis Function Models (2)

Polynomial basis functions:
φ_j(x) = x^j.
These are global; a small change in x affects all basis functions.

Linear Basis Function Models (3)

Gaussian basis functions:
φ_j(x) = exp(-(x - μ_j)²/(2s²)).
These are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (width).
Related to kernel methods.

Linear Basis Function Models (4)

Sigmoidal basis functions:
φ_j(x) = σ((x - μ_j)/s), where σ(a) = 1/(1 + exp(-a)).
These too are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (slope).
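All three families plug into the same design-matrix machinery. A minimal sketch in Python (helper names such as design_matrix are my own, not from the slides):

```python
import numpy as np

def poly_basis(x, j):
    return x ** j                                   # global: phi_j(x) = x^j

def gauss_basis(x, mu_j, s):
    return np.exp(-(x - mu_j) ** 2 / (2 * s ** 2))  # local, width s

def sigmoid_basis(x, mu_j, s):
    return 1.0 / (1.0 + np.exp(-(x - mu_j) / s))    # local, slope controlled by s

def design_matrix(x, basis_fns):
    """Phi[n, j] = phi_j(x_n); the first column is the constant bias phi_0 = 1."""
    cols = [np.ones_like(x)] + [fn(x) for fn in basis_fns]
    return np.stack(cols, axis=1)

x = np.linspace(0, 1, 5)
centres = np.linspace(0, 1, 3)
Phi = design_matrix(x, [lambda x, m=m: gauss_basis(x, m, 0.2) for m in centres])
print(Phi.shape)   # (5, 4): N=5 points, M=4 basis functions including the bias
```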

Curve Fitting With Noise

Maximum Likelihood and Least Squares (1)

Assume observations from a deterministic function with added Gaussian noise:
t = y(x, w) + ε, where p(ε | β) = N(ε | 0, β⁻¹),
which is the same as saying
p(t | x, w, β) = N(t | y(x, w), β⁻¹).
Given observed inputs X = {x_1, ..., x_N} and targets t = (t_1, ..., t_N)ᵀ, we obtain the likelihood function
p(t | X, w, β) = Π_{n=1}^{N} N(t_n | wᵀφ(x_n), β⁻¹).

Maximum Likelihood and Least Squares (2)

Taking the logarithm, we get
ln p(t | w, β) = (N/2) ln β - (N/2) ln(2π) - β E_D(w),
where
E_D(w) = (1/2) Σ_{n=1}^{N} {t_n - wᵀφ(x_n)}²
is the sum-of-squares error.

Maximum Likelihood and Least Squares (3)

Computing the gradient and setting it to zero yields
0 = Σ_{n=1}^{N} {t_n - wᵀφ(x_n)} φ(x_n)ᵀ.
Solving for w, we get
w_ML = (ΦᵀΦ)⁻¹Φᵀt = Φ†t,
where Φ is the N×M design matrix with elements Φ_nj = φ_j(x_n), and
Φ† = (ΦᵀΦ)⁻¹Φᵀ
is the Moore-Penrose pseudoinverse.
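In code, w_ML is just the least-squares solution of Φw ≈ t. A sketch (my own, assuming NumPy; np.linalg.lstsq is used instead of forming the pseudoinverse explicitly, which is numerically better behaved):

```python
import numpy as np

def fit_ml(Phi, t):
    """Maximum-likelihood weights: the least-squares solution of Phi w = t."""
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w_ml

# Tiny synthetic example: noisy sine, polynomial basis of degree 3.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
Phi = np.stack([x ** j for j in range(4)], axis=1)   # columns 1, x, x^2, x^3

w_ml = fit_ml(Phi, t)
beta_ml = 1.0 / np.mean((t - Phi @ w_ml) ** 2)       # ML estimate of the noise precision
print(w_ml, beta_ml)
```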

Linear Algebra/Geometry of Least Squares

Consider the N-dimensional space whose axes are given by the target values t_n, so that t is a vector in this space.
Each basis function, evaluated at the N data points, is also a vector in this space; together the M of them span an M-dimensional subspace S.
w_ML minimizes the distance between t and its orthogonal projection y onto S.
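A quick numerical check of this picture (my own sketch): the least-squares prediction Φw_ML equals the orthogonal projection of t onto the column space of Φ, and the residual is orthogonal to S.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(10, 3))          # N=10 targets, M=3 basis vectors
t = rng.normal(size=10)

w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w_ml                          # least-squares prediction

P = Phi @ np.linalg.inv(Phi.T @ Phi) @ Phi.T   # projection matrix onto span(Phi)
print(np.allclose(y, P @ t))            # True: y is the orthogonal projection of t
print(np.allclose(Phi.T @ (t - y), 0))  # residual is orthogonal to the subspace S
```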

Maximum Likelihood and Least Squares (4)

Maximizing with respect to the bias, w_0, alone, we see that
w_0 = t̄ - Σ_{j=1}^{M-1} w_j φ̄_j,
where t̄ is the mean of the targets and φ̄_j is the mean of the j-th basis function over the training inputs: the bias compensates for the difference between the two.
We can also maximize with respect to β, giving
1/β_ML = (1/N) Σ_{n=1}^{N} {t_n - w_MLᵀφ(x_n)}².

0th Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Over-fitting

Root-Mean-Square (RMS) Error:
E_RMS = √(2 E(w*)/N).

Polynomial Coefficients

Data Set Size:


9th Order Polynomial

1st Order Polynomial

Data Set Size:


9th Order Polynomial

Quadratic Regularization

Penalize large coefficient values:
Ẽ(w) = (1/2) Σ_{n=1}^{N} {y(x_n, w) - t_n}² + (λ/2) ‖w‖².

[Figures: regularized fits for different values of the regularization coefficient λ, and error vs. λ.]

Regularized Least Squares (1)

Consider the error function:
E_D(w) + λ E_W(w)   (data term + regularization term).
With the sum-of-squares error function and a quadratic regularizer, we get
(1/2) Σ_{n=1}^{N} {t_n - wᵀφ(x_n)}² + (λ/2) wᵀw,
which is minimized by
w = (λI + ΦᵀΦ)⁻¹Φᵀt.
λ is called the regularization coefficient.
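A minimal sketch of this closed-form solution (my own code; lam plays the role of λ):

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """Regularized least squares: w = (lam*I + Phi^T Phi)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# Larger lam shrinks the weights toward zero.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
Phi = np.stack([x ** j for j in range(10)], axis=1)    # 9th-order polynomial basis

for lam in (0.0, 1e-3, 1.0):
    w = fit_ridge(Phi, t, lam)
    print(f"lam={lam:g}  max|w|={np.abs(w).max():.2f}")
```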

Regularized Least Squares (2)

With a more general regularizer, we have
(1/2) Σ_{n=1}^{N} {t_n - wᵀφ(x_n)}² + (λ/2) Σ_{j=1}^{M} |w_j|^q,
with q = 1 giving the lasso and q = 2 the quadratic regularizer.

Regularized Least Squares (3)

The lasso tends to generate sparser solutions than a quadratic regularizer: for sufficiently large λ, some of the coefficients w_j are driven exactly to zero.
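This is easy to see with off-the-shelf solvers; a hedged sketch using scikit-learn (a library not mentioned on the slides), counting how many coefficients each regularizer drives exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.0, 0.5]               # only 3 of 20 features matter
y = X @ true_w + rng.normal(scale=0.1, size=50)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print("lasso zeros:", np.sum(lasso.coef_ == 0))   # typically most coefficients exactly 0
print("ridge zeros:", np.sum(ridge.coef_ == 0))   # typically none exactly 0
```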

Cross-Validation for Regularization

Bayesian Linear Regression (1)

Define a conjugate shrinkage prior over the weight vector w:
p(w | α) = N(w | 0, α⁻¹I).
Combining this with the likelihood function and using the results for marginal and conditional Gaussian distributions gives a posterior distribution over w.
The log of the posterior = sum of squared errors + quadratic regularization (up to additive constants), with λ = α/β.
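Concretely, the standard result for this conjugate prior is that the posterior is N(w | m_N, S_N) with S_N = (αI + βΦᵀΦ)⁻¹ and m_N = βS_NΦᵀt. A sketch of the computation (my own code):

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for the zero-mean isotropic prior N(w | 0, alpha^-1 I)."""
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Note: the posterior mean m_N coincides with the ridge solution when lam = alpha / beta.
```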

Bayesian Linear Regression (3)

0 data points observed. [Figure: prior over w and corresponding functions in data space.]

Bayesian Linear Regression (4)

1 data point observed. [Figure: likelihood, posterior, and data space.]

Bayesian Linear Regression (5)

2 data points observed. [Figure: likelihood, posterior, and data space.]

Bayesian Linear Regression (6)

20 data points observed. [Figure: likelihood, posterior, and data space.]

Predictive Distribution (1)

Predict t for new values of x by integrating over w:
p(t | x, t, α, β) = ∫ p(t | x, w, β) p(w | t, α, β) dw.
This integral can be solved analytically, giving a Gaussian predictive distribution.
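The result is Gaussian with mean m_Nᵀφ(x) and variance 1/β + φ(x)ᵀS_Nφ(x); a sketch (my own code, assuming the posterior helper from the previous sketch supplies m_N and S_N):

```python
import numpy as np

def predictive(phi_x, m_N, S_N, beta):
    """Predictive mean and variance at a new input with feature vector phi_x."""
    mean = m_N @ phi_x
    var = 1.0 / beta + phi_x @ S_N @ phi_x   # noise term + weight-uncertainty term
    return mean, var
```

The second variance term, which reflects uncertainty in w, shrinks as more data are observed, which is the behaviour illustrated in the following figures.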

Predictive Distribution (2)

Example: sinusoidal data, 9 Gaussian basis functions, 1 data point.

Predictive Distribution (3)

Example: sinusoidal data, 9 Gaussian basis functions, 2 data points.

Predictive Distribution (4)

Example: sinusoidal data, 9 Gaussian basis functions, 4 data points.

Predictive Distribution (5)

Example: sinusoidal data, 9 Gaussian basis functions, 25 data points.

Limitations of Fixed Basis Functions

Placing M basis functions along each dimension of a D-dimensional input space requires M^D basis functions: the curse of dimensionality (see the quick calculation below).
In later chapters, we shall see how we can get away with fewer basis functions by choosing them using the training data.
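For instance (my own numbers, not from the slides), the grid of basis functions grows very quickly:

```python
# Number of tensor-product basis functions: M per dimension, D dimensions.
for M, D in [(10, 2), (10, 5), (10, 10)]:
    print(f"M={M}, D={D}: {M ** D:,} basis functions")   # 100; 100,000; 10,000,000,000
```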
