Lecture 04

The document summarizes linear models for classification using machine learning. It discusses using linear discriminants to separate classes in input space and classify new data points. The perceptron algorithm is introduced for training linear classifiers by minimizing misclassifications. Probabilistic generative models are also covered, which model class-conditional densities to calculate posterior probabilities for classification. Maximum likelihood estimation is discussed for learning the parameters of these probabilistic linear classifiers from labeled training data.


Advanced Machine Learning

Lecture 4: Classification
Sandjai Bhulai
Vrije Universiteit Amsterdam

[email protected]
15 September 2023
Linear models for classification

Advanced Machine Learning

Classification with linear models
▪ Goal: take input vector x and map it onto one of K discrete
classes

▪ Consider linear models: classes are separable by (D − 1)-dimensional hyperplanes in the D-dimensional input space

▪ Simplest linear regression model: y(x) = w⊤x + w0

▪ Use activation function f( ⋅ ) to map onto discrete classes: y(x) = f(w⊤x + w0)

▪ Due to f( ⋅ ), these models are no longer linear in the parameters

Discriminant functions
▪ The simplest case is the 2-class case: y(x) = w⊤x + w0,
where w is a weight vector and w0 is the bias
▪ The decision boundary is y(x) = 0
▪ Consider two points xa and xb that lie on the decision surface.
Because y(xa) = y(xb) = 0, we have w⊤(xa − xb) = 0.
▪ Thus, vector w is orthogonal to every vector lying within the
decision surface
▪ If x is on the decision surface, then y(x) = 0, indicating that

w⊤x / ∥w∥ = − w0 / ∥w∥
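To make this geometry concrete, here is a minimal numpy sketch (not part of the original slides; the weight vector, bias, and test points are arbitrary assumptions) verifying that w is orthogonal to vectors within the decision surface and that y(x)/∥w∥ is the signed distance from it:

```python
import numpy as np

w = np.array([2.0, -1.0])   # assumed weight vector
w0 = 0.5                    # assumed bias

def y(x):
    """Linear discriminant y(x) = w^T x + w0."""
    return w @ x + w0

def signed_distance(x):
    """Signed perpendicular distance of x from the surface y(x) = 0."""
    return y(x) / np.linalg.norm(w)

# Two points chosen to lie on the decision surface (y = 0):
xa = np.array([0.0, 0.5])   # y(xa) = 2*0 - 1*0.5 + 0.5 = 0
xb = np.array([1.0, 2.5])   # y(xb) = 2*1 - 1*2.5 + 0.5 = 0

print(np.isclose(w @ (xa - xb), 0.0))          # True: w is orthogonal to the surface
print(signed_distance(np.array([3.0, 0.0])))   # positive: x lies on the side w points to
```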

Geometry of linear discriminants

▪ Decision surface is perpendicular to w


▪ Displacement is controlled by w0

Multiple classes
▪ It is generally not a good idea to use multiple 2-class classifiers to do K-class classification
▪ Leads to ambiguous regions

Single K-class classifier
▪ Single discriminant comprising K linear functions of the form yk(x) = wk⊤x + wk0

▪ Point x belongs to class Ck if yk(x) > yj(x) for all j ≠ k (see the sketch after this slide)

▪ Decision boundary between Ck and Cj is given by yk(x) = yj(x) and corresponds to a (D − 1)-dimensional hyperplane: (wk − wj)⊤x + (wk0 − wj0) = 0
▪ Decision region singly connected and convex
(due to linearity of discriminant functions)
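A minimal sketch of this decision rule; the weights, biases, and test point below are assumed values for illustration, not taken from the slides:

```python
import numpy as np

W = np.array([[ 1.0,  0.0],      # row k is the assumed weight vector w_k
              [-1.0,  1.0],
              [ 0.0, -1.0]])
w0 = np.array([0.0, 0.5, -0.5])  # assumed biases w_k0

def classify(x):
    """Assign x to the class with the largest discriminant y_k(x) = w_k^T x + w_k0."""
    return int(np.argmax(W @ x + w0))

print(classify(np.array([2.0, 0.0])))   # 0, since y(x) = [2.0, -1.5, -0.5]
```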

Perceptron algorithm
▪ Rosenblatt (1962)
▪ Linear model with step activation function:

y(x) = f(w⊤φ(x)),  where f(a) = +1 for a ≥ 0 and f(a) = −1 for a < 0

▪ Train using perceptron criterion (here tn ∈ {−1, 1}):

EP(w) = − ∑_{n ∈ ℳ} w⊤φn tn

where ℳ is the set of misclassified patterns


▪ Note that directly minimizing the total number of misclassified patterns will not work because of the non-linear f( ⋅ )

Perceptron algorithm
▪ Total error function is piecewise linear
▪ Stochastic gradient descent:

w(τ+1) = w(τ) − η ∇EP(w) = w(τ) + ηφntn

▪ Update is not a function of w, thus η can be set equal to 1 (see the training sketch after this slide)
▪ Perceptron convergence theorem: if there exists an exact solution, then the perceptron algorithm will find a solution in a finite number of steps
▪ Attacked by Minsky and Papert in Perceptrons (1969). The attack is valid only for single-layer perceptrons. Consequence: research in neural computation stalled for nearly a decade
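A minimal training sketch of the update above with η = 1; the toy data and the identity-plus-bias basis function are assumptions for illustration, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data with targets t_n in {-1, +1}
X = np.vstack([rng.normal(loc=[2, 2], size=(20, 2)),
               rng.normal(loc=[-2, -2], size=(20, 2))])
t = np.hstack([np.ones(20), -np.ones(20)])

def phi(x):
    return np.append(x, 1.0)          # fixed basis: x itself plus a constant bias feature

w = np.zeros(3)
for epoch in range(100):
    misclassified = 0
    for xn, tn in zip(X, t):
        if tn * (w @ phi(xn)) <= 0:   # pattern misclassified (or on the boundary)
            w = w + phi(xn) * tn      # perceptron update with eta = 1
            misclassified += 1
    if misclassified == 0:            # all patterns correct: convergence theorem applies
        break

print("epochs:", epoch + 1, "w:", w)
```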


Probabilistic generative models
▪ Model class-conditional densities p(x | Ck)
▪ Posterior probability for class C1:
p(C1 | x) = p(x | C1)p(C1) / (p(x | C1)p(C1) + p(x | C2)p(C2)) = 1 / (1 + exp(−a)) = σ(a)

where we have defined a = ln [ p(x | C1)p(C1) / (p(x | C2)p(C2)) ]

▪ σ is the logistic sigmoid function; the inverse of σ is the logit function a = ln(σ / (1 − σ))

Probabilistic generative models
▪ Generalization to multiple classes:
p(Ck | x) = p(x | Ck)p(Ck) / ∑j p(x | Cj)p(Cj) = exp(ak) / ∑j exp(aj)

where ak = ln(p(x | Ck)p(Ck))

▪ This is known as the softmax function, because it is a smoothed version of the max (a numerical sketch follows below)

▪ Different representations for class-conditional densities yield different consequences in how classification is done
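A minimal sketch of the softmax; the activation values are assumed, and subtracting the maximum before exponentiating is a standard numerical safeguard that does not change the result:

```python
import numpy as np

def softmax(a):
    """p(C_k | x) = exp(a_k) / sum_j exp(a_j)."""
    a = np.asarray(a, dtype=float)
    e = np.exp(a - a.max())          # shift by max(a) for numerical stability
    return e / e.sum()

a = np.array([2.0, 1.0, 0.1])        # assumed a_k = ln(p(x | C_k) p(C_k)) for some x
print(softmax(a), softmax(a).sum())  # posteriors and their sum (1.0)
```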

Continuous inputs
▪ First assume that all classes share the same covariance
matrix and that there are only 2 classes.
▪ We have

p(C1 | x) = σ(w⊤x + w0)

p(x | Ck) = 1 / ((2π)^{D/2} |Σ|^{1/2}) · exp{ −(1/2) (x − μk)⊤Σ−1(x − μk) }

where

w = Σ−1(μ1 − μ2)
w0 = −(1/2) μ1⊤Σ−1μ1 + (1/2) μ2⊤Σ−1μ2 + ln( p(C1) / p(C2) )
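A minimal sketch of these formulas with assumed means, shared covariance, and priors (not taken from the slides):

```python
import numpy as np

mu1 = np.array([1.0, 1.0])           # assumed class means
mu2 = np.array([-1.0, -1.0])
Sigma = np.array([[1.0, 0.3],        # assumed shared covariance
                  [0.3, 1.0]])
p1, p2 = 0.6, 0.4                    # assumed priors p(C1), p(C2)

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Sigma_inv @ mu1 + 0.5 * mu2 @ Sigma_inv @ mu2 + np.log(p1 / p2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([0.5, 0.0])             # assumed test point
print(sigmoid(w @ x + w0))           # posterior p(C1 | x)
```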
Continuous inputs
▪ Quadratic term from Gaussian vanishes. The priors p(Ck)
only enter via the bias parameter
▪ For the general case of K classes, we have
ak(x) = wk⊤x + wk0

where

wk = Σ−1μk
wk0 = −(1/2) μk⊤Σ−1μk + ln p(Ck)
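The same construction for K classes, as a short sketch with assumed parameters; the linear activations ak(x) are passed through the softmax from the previous slide:

```python
import numpy as np

mus = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]])   # assumed class means
priors = np.array([0.5, 0.3, 0.2])                       # assumed priors p(C_k)
Sigma_inv = np.linalg.inv(np.eye(2))                     # assumed shared covariance

Wk = mus @ Sigma_inv                                     # row k is w_k = Sigma^{-1} mu_k
wk0 = -0.5 * np.einsum('ki,ij,kj->k', mus, Sigma_inv, mus) + np.log(priors)

def posterior(x):
    a = Wk @ x + wk0                 # linear discriminants a_k(x)
    e = np.exp(a - a.max())          # stable softmax
    return e / e.sum()

print(posterior(np.array([1.5, 0.5])))                   # p(C_k | x), sums to 1
```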

Linear versus quadratic
▪ When covariance is shared by classes, the decision
boundary is linear
▪ When the covariances are not shared between classes, the decision boundary is quadratic

Maximum likelihood
▪ Since we have a parametric form for class-conditional
densities p(x | Ck), we can determine values of the
parameters and priors p(Ck)

p(xn, C1) = p(C1)p(xn | C1) = q 𝒩(xn | μ1, Σ)

p(xn, C2) = p(C2)p(xn | C2) = (1 − q) 𝒩(xn | μ2, Σ)

▪ Let tn ∈ {0, 1}; the likelihood is then given by


p(t, X | q, μ1, μ2, Σ) = ∏_{n=1}^{N} [ q 𝒩(xn | μ1, Σ) ]^{tn} [ (1 − q) 𝒩(xn | μ2, Σ) ]^{1−tn}
Maximum likelihood
▪ The log-likelihood function with relevant terms for q is:
∑_{n=1}^{N} { tn ln q + (1 − tn) ln(1 − q) }

▪ Maximizing with respect to q yields

q = (1/N) ∑_{n=1}^{N} tn = N1 / N = N1 / (N1 + N2)

Maximum likelihood
▪ The log-likelihood function with relevant terms for μ1 is:
∑_{n=1}^{N} tn ln 𝒩(xn | μ1, Σ) = −(1/2) ∑_{n=1}^{N} tn (xn − μ1)⊤Σ−1(xn − μ1) + const

▪ Maximizing with respect to μ1 yields

μ1 = (1/N1) ∑_{n=1}^{N} tn xn
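A minimal sketch of these estimators on assumed toy data; q is the fraction of points with tn = 1, and μ1, μ2 are the class means (the shared Σ, not shown on the slides, would be estimated analogously as a weighted average of per-class covariances):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=[1, 1], size=(30, 2)),     # assumed samples from C1
               rng.normal(loc=[-1, -1], size=(70, 2))])  # assumed samples from C2
t = np.hstack([np.ones(30), np.zeros(70)])               # t_n = 1 for C1, 0 for C2

N = len(t)
N1 = t.sum()
N2 = N - N1

q = N1 / N                                    # ML estimate of the prior p(C1)
mu1 = (t[:, None] * X).sum(axis=0) / N1       # ML estimate of mu1
mu2 = ((1 - t)[:, None] * X).sum(axis=0) / N2 # ML estimate of mu2 (by symmetry)

print(q, mu1, mu2)
```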

Logistic regression
▪ Posterior probability of class C1 written as a logistic sigmoid
acting on a linear function of feature vector φ
p(C1 | φ) = y(φ) = σ(w⊤φ),  with dσ(a)/da = σ(a)(1 − σ(a))
▪ More compact than maximum likelihood fitting of Gaussians: for an M-dimensional feature space, logistic regression has M adjustable parameters, whereas the Gaussian model uses 2M parameters for the means and M(M + 1)/2 parameters for the shared covariance matrix
▪ Maximum likelihood: p(t | w) = ∏_{n=1}^{N} yn^{tn} (1 − yn)^{1−tn}

Logistic regression
▪ Negative log of likelihood yields cross entropy
E(w) = − ln p(t | w) = − ∑_{n=1}^{N} { tn ln yn + (1 − tn) ln(1 − yn) }

▪ Gradient with respect to w


∇E(w) = ∑_{n=1}^{N} (yn − tn) φn

▪ Therefore, we have the same form as the gradient for the sum-of-squares error:

∇ ln p(t | w, β) = ∑_{n=1}^{N} { tn − w⊤φ(xn) } φ(xn)⊤
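A minimal sketch of logistic regression trained by batch gradient descent on this cross-entropy gradient; the toy data, step size, and iteration count are assumptions for illustration (the slides themselves move on to Newton-Raphson/IRLS instead):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=[1.5, 1.5], size=(50, 2)),
               rng.normal(loc=[-1.5, -1.5], size=(50, 2))])
t = np.hstack([np.ones(50), np.zeros(50)])            # targets t_n in {0, 1}

Phi = np.hstack([X, np.ones((len(t), 1))])            # design matrix, bias absorbed into w
w = np.zeros(Phi.shape[1])
eta = 0.01                                            # assumed step size

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for _ in range(500):
    y = sigmoid(Phi @ w)                              # y_n = sigma(w^T phi_n)
    grad = Phi.T @ (y - t)                            # sum_n (y_n - t_n) phi_n
    w -= eta * grad                                   # gradient descent step
print(w)
```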
n=1
Iterative reweighted least squares
▪ Efficient iterative optimization: Newton-Raphson

wnew = wold − H −1 ∇E(w)


where H is the Hessian matrix (of second derivatives)

▪ For the sum-of-squares error this can be done in one step, because the error function is quadratic

▪ For cross entropy we get a similar set of normal equations for weighted least squares, which depends on w

▪ This dependency forces us to apply the update iteratively

Iterative reweighted least squares
▪ Apply this to linear regression
∇E(w) = ∑_{n=1}^{N} (w⊤φn − tn) φn = Φ⊤Φw − Φ⊤t

H = ∇∇E(w) = ∑_{n=1}^{N} φnφn⊤ = Φ⊤Φ

▪ The Newton-Raphson update then takes the form

wnew = wold − H −1 ∇E(w) = wold − (Φ⊤Φ)−1{Φ⊤Φwold − Φ⊤t}

= (Φ⊤Φ)−1Φ⊤t
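A quick sketch (assumed data) verifying the claim: for the quadratic sum-of-squares error, one Newton-Raphson step from an arbitrary wold already gives the least-squares solution (Φ⊤Φ)−1Φ⊤t:

```python
import numpy as np

rng = np.random.default_rng(3)
Phi = np.hstack([rng.normal(size=(50, 2)), np.ones((50, 1))])   # assumed design matrix
t = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w_old = rng.normal(size=3)                        # arbitrary starting point
grad = Phi.T @ Phi @ w_old - Phi.T @ t            # gradient of E at w_old
H = Phi.T @ Phi                                   # Hessian
w_new = w_old - np.linalg.solve(H, grad)          # single Newton-Raphson step

w_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)    # direct least-squares solution
print(np.allclose(w_new, w_ls))                   # True
```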

Iterative reweighted least squares
▪ Apply this to logistic regression
∇E(w) = ∑_{n=1}^{N} (yn − tn) φn = Φ⊤(y − t)

H = ∇∇E(w) = ∑_{n=1}^{N} yn(1 − yn) φnφn⊤ = Φ⊤RΦ

where R is the diagonal matrix with Rnn = yn(1 − yn)

Iterative reweighted least squares
▪ The Newton-Raphson update then takes the form

wnew = wold − H −1 ∇E(w) = wold − (Φ⊤RΦ)−1Φ⊤(y − t)

= (Φ⊤RΦ)−1{Φ⊤RΦwold − Φ⊤(y − t)}

= (Φ⊤RΦ)−1Φ⊤Rz

where
z = Φwold − R −1(y − t)
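A minimal IRLS sketch following these equations; the toy data are assumed, and the small ridge term added to Φ⊤RΦ is a practical safeguard against singular matrices that is not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=[1, 1], size=(50, 2)),
               rng.normal(loc=[-1, -1], size=(50, 2))])
t = np.hstack([np.ones(50), np.zeros(50)])
Phi = np.hstack([X, np.ones((len(t), 1))])

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w = np.zeros(Phi.shape[1])
for _ in range(10):
    y = sigmoid(Phi @ w)
    R = y * (1 - y)                                       # diagonal of R, R_nn = y_n(1 - y_n)
    z = Phi @ w - (y - t) / np.maximum(R, 1e-10)          # working targets z
    A = Phi.T @ (R[:, None] * Phi) + 1e-8 * np.eye(Phi.shape[1])  # Phi^T R Phi (+ tiny ridge)
    w = np.linalg.solve(A, Phi.T @ (R * z))               # w_new = (Phi^T R Phi)^{-1} Phi^T R z
print(w)
```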
