
Introduction to Machine Learning

Neural Networks
Yen-Yu Lin (林彥宇), Professor
Department of Computer Science, National Yang Ming Chiao Tung University

Some slides are modified from S.-J. Wang, H.-T. Chen, V. Khalidov, and M. Hansard
Linear model for regression or classification

• A linear model for regression or classification takes the form

  y(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j} w_j \phi_j(\mathbf{x}) \right)

➢ The decision is based on a linear combination of fixed nonlinear basis functions \phi_j(\mathbf{x})
➢ f is the identity function for regression
➢ f is a nonlinear activation function for classification
  ◆ Logistic sigmoid or softmax function
➢ A minimal code sketch of this model follows below

2
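To make the linear basis-function model concrete, here is a minimal NumPy sketch, assuming Gaussian basis functions with fixed, hand-chosen centers (the basis choice and all names are illustrative, not from the slides).

import numpy as np

def gaussian_basis(x, centers, width=1.0):
    # Fixed nonlinear basis functions phi_j(x); centers are chosen by hand.
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def linear_model(x, w, centers, f=lambda a: a):
    # y(x, w) = f( sum_j w_j * phi_j(x) ); f is the identity for regression,
    # a sigmoid or softmax for classification.
    Phi = gaussian_basis(x, centers)   # shape (N, M)
    return f(Phi @ w)                  # shape (N,)

# Example: regression with 5 fixed basis functions on scalar inputs.
x = np.linspace(-3, 3, 10)
centers = np.linspace(-3, 3, 5)
w = np.random.randn(5)
print(linear_model(x, w, centers))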
Linear model and neural networks

• Our goal is to extend the linear model so that

➢ 1. The basis functions depend on parameters
  ◆ Parametric basis functions
➢ 2. These parameters are learnable during training

• This goal leads to the basic neural network model

https://www.houseofbots.com/news-detail/1442-1-what-is-deep-learning-and-neural-network 3
Activations

4
Activations

• Construct M linear combinations of the inputs x_1, …, x_D:

  a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}

➢ where a_j is the activation for j = 1, 2, …, M
➢ \{w_{ji}^{(1)}\}_{i=1}^{D} are the weights. The superscript (1) indicates that these parameters are in the first layer of the neural network
➢ w_{j0}^{(1)} is the bias
➢ Each activation is nonlinearly transformed by a differentiable, nonlinear activation function h, i.e., z_j = h(a_j)

• \{z_j = h(a_j)\}_{j=1}^{M} are called hidden units
5
Output unit activation

• The hidden units \{z_j = h(a_j)\}_{j=1}^{M} are linearly combined in the second layer of the neural network

• Suppose there are K outputs in the neural network. We have

  a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}

➢ where a_k is the output activation for k = 1, 2, …, K
➢ \{w_{kj}^{(2)}\}_{j=1}^{M} are the weights. The superscript (2) indicates that these parameters are in the second layer of the neural network
➢ w_{k0}^{(2)} is the bias

• a_k is further transformed by an output activation function

6
Neural networks for regression and classification

• The output activation a_k is further transformed by an output activation function to give y_k

• \{y_k\}_{k=1}^{K} are the final outputs of the neural network

• For regression, the output activation is the identity function: y_k = a_k

• For two-class classification, it is the logistic sigmoid function: y = 1 / (1 + e^{-a})

• For multi-class classification, it is the softmax function: y_k = e^{a_k} / \sum_j e^{a_j}

• These three output activations are sketched in code below

7
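A minimal NumPy sketch of the three output activation functions listed above; the function names are illustrative.

import numpy as np

def identity(a):
    # Regression: y_k = a_k
    return a

def logistic_sigmoid(a):
    # Two-class classification: y = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    # Multi-class classification: y_k = exp(a_k) / sum_j exp(a_j)
    # Subtracting the max is a standard numerical-stability trick.
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

a = np.array([2.0, -1.0, 0.5])
print(identity(a), logistic_sigmoid(a), softmax(a))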
Two-layer neural networks

• The two-layer neural network model

  y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)

➢ where \mathbf{w} is the set of all weight and bias parameters and \sigma is the output activation function

• The bias parameters can be absorbed into the weight parameters by using one additional input x_0 = 1 (see the sketch below)

8
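A minimal NumPy sketch of absorbing the bias into the weights by appending a constant input x_0 = 1; the array shapes and names are illustrative.

import numpy as np

D, M = 4, 3
x = np.random.randn(D)            # input vector
W1 = np.random.randn(M, D)        # first-layer weights w_ji^(1)
b1 = np.random.randn(M)           # first-layer biases  w_j0^(1)

# Explicit bias:
a = W1 @ x + b1

# Bias absorbed: prepend x_0 = 1 and fold b1 into an extended weight matrix.
x_tilde = np.concatenate(([1.0], x))                  # shape (D + 1,)
W1_tilde = np.concatenate((b1[:, None], W1), axis=1)  # shape (M, D + 1)
a_absorbed = W1_tilde @ x_tilde

print(np.allclose(a, a_absorbed))  # True: same activations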
Feed-forward neural networks

• Evaluating the two-layer network model above, from inputs to outputs, is called forward propagation

[Network diagram: nodes are the input, hidden, and output variables; links are the weights and biases; arrows show the direction of propagation]

• A NumPy sketch of forward propagation is given below

9
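A minimal NumPy sketch of forward propagation through a two-layer network, assuming a tanh hidden activation and an identity output (both choices are illustrative; the slides leave h and the output activation generic).

import numpy as np

def forward(x, W1, b1, W2, b2, h=np.tanh, out=lambda a: a):
    # Forward propagation through a two-layer feed-forward network.
    a_hidden = W1 @ x + b1        # a_j = sum_i w_ji^(1) x_i + w_j0^(1)
    z = h(a_hidden)               # z_j = h(a_j), the hidden units
    a_out = W2 @ z + b2           # a_k = sum_j w_kj^(2) z_j + w_k0^(2)
    return out(a_out)             # y_k = output activation of a_k

D, M, K = 4, 5, 2                 # input, hidden, and output dimensions
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(M, D)), np.zeros(M)
W2, b2 = rng.normal(size=(K, M)), np.zeros(K)
y = forward(rng.normal(size=D), W1, b1, W2, b2)
print(y)                          # the K network outputs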
Generalizations

• There may be more than one layer of hidden units


➢ Deep learning
• Individual units need not be fully connected to the next layer
➢ Convolutional neural networks
• Individual links may skip over one or more subsequent layers
➢ Skip connections

10
Neural networks as universal approximators

[Figure: a regression example illustrating that neural networks are universal approximators. Points: training data. Dashed curves: outputs of three hidden units. Solid curve: prediction by the neural network]

11
Neural networks for classification

• 3-class classification
• 2-layer neural networks with 64 hidden units

https://www.annytab.com/neural-network-classification-in-python/

12
Network training

• Given a set of training data \{\mathbf{x}_n\}, n = 1, 2, …, N, together with a corresponding set of target vectors \{\mathbf{t}_n\}, we can learn the neural network by minimizing an error function, e.g., the sum-of-squares error

  E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \| \mathbf{y}(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \|^2

• Let's consider how to train the network by giving a probabilistic interpretation to the network outputs

13
Neural networks for 1D regression

• We aim to minimize the error between the prediction y(\mathbf{x}_n, \mathbf{w}) and the target t_n

• We assume that the target t is a scalar that is normally distributed around the prediction

  p(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1})

➢ where y(\mathbf{x}, \mathbf{w}) is the prediction by the neural network and \beta^{-1} is the variance

• Suppose the data are i.i.d. The likelihood is

  p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1})

➢ where \mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} and \mathbf{t} = \{t_1, \ldots, t_N\}

14
ML solution for 1D regression

• Taking the negative logarithm, we get the negative log likelihood

  \frac{\beta}{2} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}) - t_n \}^2 - \frac{N}{2} \ln \beta + \frac{N}{2} \ln(2\pi)

• The maximum likelihood solution for \mathbf{w} is therefore equivalent to minimizing the sum-of-squares error

  E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}) - t_n \}^2

• Does setting the gradient of E(\mathbf{w}) to zero work?

➢ No, there is no closed-form solution

15
ML solution for 1D regression

• Optimize by using gradient descent, stochastic gradient descent, or the Newton-Raphson iterative optimization scheme

• The nonlinearity of y(\mathbf{x}, \mathbf{w}) makes E(\mathbf{w}) nonconvex

• In practice, a local minimum of the negative log likelihood may be found

• After having found \mathbf{w}_{\mathrm{ML}}, the value of \beta can be found by minimizing the negative log likelihood:

  \frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}_{\mathrm{ML}}) - t_n \}^2

  (a small numerical sketch follows below)

16
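A minimal NumPy sketch of estimating \beta_{\mathrm{ML}} from the residuals once \mathbf{w}_{\mathrm{ML}} (and hence the predictions) has been found; the arrays are illustrative placeholders.

import numpy as np

# Hypothetical predictions y(x_n, w_ML) and targets t_n after training.
y_pred = np.array([0.9, 2.1, 2.9, 4.2])
t = np.array([1.0, 2.0, 3.0, 4.0])

# 1 / beta_ML = (1/N) * sum_n ( y(x_n, w_ML) - t_n )^2
inv_beta_ml = np.mean((y_pred - t) ** 2)
beta_ml = 1.0 / inv_beta_ml
print(beta_ml)  # ML estimate of the noise precision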
ML solution for 1D regression

• After getting \mathbf{w}_{\mathrm{ML}} and \beta_{\mathrm{ML}}, we can predict the distribution of the target value t for a test input \mathbf{x} via

  p(t \mid \mathbf{x}, \mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}}) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}_{\mathrm{ML}}), \beta_{\mathrm{ML}}^{-1})

17
ML solution for 1D regression

• Two-layer neural networks for one-dimensional regression

[Network diagram: a two-layer network maps the input \mathbf{x}_n = (x_{n0}, x_{n1}, …, x_{nD}) through hidden units z_0, z_1, …, z_M (first-layer weights w_{ji}^{(1)}) to a single output y_n (second-layer weights w_{1j}^{(2)}), which is compared against the target t_n]

18
Neural networks for multi-dimensional regression

• Neural networks can be used for K-dimensional regression

• Construct a neural network with K outputs

• Make the following assumption:

  p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(\mathbf{t} \mid \mathbf{y}(\mathbf{x}, \mathbf{w}), \beta^{-1} \mathbf{I})

• We can use the maximum likelihood solution, which is equivalent to minimizing the sum-of-squares error, to get \mathbf{w}_{\mathrm{ML}}

• Similarly, given \mathbf{w}_{\mathrm{ML}}, the optimal \beta_{\mathrm{ML}} is obtained

19
Neural networks for multi-dimensional regression

• Two-layer neural networks for 𝐾-dimensional regression

[Network diagram: a two-layer network maps the input \mathbf{x}_n = (x_{n0}, x_{n1}, …, x_{nD}) through hidden units z_0, z_1, …, z_M to K outputs \mathbf{y}_n = (y_{n1}, …, y_{nK}), compared against the target \mathbf{t}_n = (t_{n1}, …, t_{nK})]

  E_n(\mathbf{w}) = \frac{1}{2} \| \mathbf{y}_n - \mathbf{t}_n \|^2 = \frac{1}{2} \sum_{k=1}^{K} ( y_{nk} - t_{nk} )^2
20
Neural networks for binary classification

• Neural networks can be used for classification

• Given a set of training data \{\mathbf{x}_n\}, n = 1, 2, …, N, together with a corresponding set of target labels \{t_n\}, where t_n = 1 denotes class C_1 and t_n = 0 denotes class C_2

• Construct a (two-layer) neural network having a single output whose activation function is a logistic sigmoid

  y = \sigma(a) = \frac{1}{1 + \exp(-a)}

➢ where a is the output activation
➢ y(\mathbf{x}, \mathbf{w}) is interpreted as the conditional probability p(C_1 \mid \mathbf{x})
➢ The conditional probability of the other class is given by p(C_2 \mid \mathbf{x}) = 1 - y(\mathbf{x}, \mathbf{w})

21
ML solution for binary classification

• Regression: the target is a real value, which is normally distributed around the prediction

• Classification: the conditional distribution of a target given its input is a Bernoulli distribution of the form

  p(t \mid \mathbf{x}, \mathbf{w}) = y(\mathbf{x}, \mathbf{w})^{t} \, \{ 1 - y(\mathbf{x}, \mathbf{w}) \}^{1 - t}

22
ML solution for binary classification

• When using ML optimization, we minimize the negative log likelihood, here called the cross-entropy error

  E(\mathbf{w}) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}

➢ where y_n denotes y(\mathbf{x}_n, \mathbf{w})

• Optimize \mathbf{w} by using gradient descent or a variant of it

• After getting \mathbf{w}_{\mathrm{ML}}, binary classification is carried out by evaluating y(\mathbf{x}, \mathbf{w}_{\mathrm{ML}}) = p(C_1 \mid \mathbf{x}), e.g., assigning \mathbf{x} to C_1 when this probability exceeds 0.5 (a code sketch of the error follows below)

23
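A minimal NumPy sketch of the binary cross-entropy error above; the clipping constant is an illustrative numerical-safety detail, not from the slides.

import numpy as np

def binary_cross_entropy(y, t, eps=1e-12):
    # E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ]
    y = np.clip(y, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))

y = np.array([0.9, 0.2, 0.7])        # network outputs y_n = p(C1 | x_n)
t = np.array([1.0, 0.0, 1.0])        # binary targets t_n
print(binary_cross_entropy(y, t))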
ML solution for binary classification

[Network diagram: a two-layer network maps the input \mathbf{x}_n = (x_{n0}, x_{n1}, …, x_{nD}) through hidden units z_0, z_1, …, z_M to a single sigmoid output y_n, compared against the binary target t_n]

24
Neural networks for multi-class classification

• Neural networks can be extended to K-class classification

• Given a set of training data \{\mathbf{x}_n\}, n = 1, 2, …, N, together with a corresponding set of target vectors \{\mathbf{t}_n\}, where \mathbf{t}_n is encoded by using the 1-of-K coding scheme

• Construct a (two-layer) neural network having K outputs and use the softmax as the output activation function

  y_k(\mathbf{x}, \mathbf{w}) = \frac{\exp(a_k(\mathbf{x}, \mathbf{w}))}{\sum_j \exp(a_j(\mathbf{x}, \mathbf{w}))}

➢ where 0 \le y_k \le 1 and \sum_k y_k = 1

25
ML solution for multi-class classification

• The negative log likelihood, i.e., the cross-entropy error, is

  E(\mathbf{w}) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_k(\mathbf{x}_n, \mathbf{w})

• Optimize \mathbf{w} by using gradient descent or a variant of it

• After getting \mathbf{w}_{\mathrm{ML}}, multi-class classification is carried out by using the softmax outputs, e.g., assigning \mathbf{x} to the class with the largest y_k(\mathbf{x}, \mathbf{w}_{\mathrm{ML}}) (a code sketch follows below)

26
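A minimal NumPy sketch of the multi-class cross-entropy error with 1-of-K targets; the arrays and names are illustrative.

import numpy as np

def multiclass_cross_entropy(Y, T, eps=1e-12):
    # E(w) = -sum_n sum_k t_nk ln y_k(x_n, w)
    return -np.sum(T * np.log(np.clip(Y, eps, 1.0)))

# Softmax outputs Y (rows sum to 1) and 1-of-K targets T for N = 2, K = 3.
Y = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
print(multiclass_cross_entropy(Y, T))
print(Y.argmax(axis=1))   # predicted classes: index of the largest output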
ML solution for multi-class classification

• Two-layer neural networks for 𝐾-class classification

[Network diagram: a two-layer network maps the input \mathbf{x}_n = (x_{n0}, x_{n1}, …, x_{nD}) through hidden units z_0, z_1, …, z_M to K softmax outputs \mathbf{y}_n = (y_{n1}, …, y_{nK}), compared against the 1-of-K target \mathbf{t}_n = (t_{n1}, …, t_{nK})]

27
Gradient descent

• The simplest approach is to update \mathbf{w} by a displacement in the negative gradient direction

  \mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E(\mathbf{w}^{(\tau)})

➢ This is a steepest descent algorithm
➢ \eta > 0 is the learning rate
➢ This is a batch method, as evaluating \nabla E involves the entire data set
➢ A range of starting points \{\mathbf{w}^{(0)}\} may be needed in order to find a satisfactory minimum
➢ A minimal code sketch of this update is given below

28
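A minimal sketch of batch gradient descent for a sum-of-squares error on a linear model; the model choice is illustrative and only keeps the gradient simple (for a neural network, the gradient would come from backpropagation, described later).

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # design matrix, one row per data point
t = X @ np.array([1.0, -2.0, 0.5])       # targets from a known linear rule

def grad_E(w):
    # Gradient of E(w) = 0.5 * sum_n (y_n - t_n)^2 with y_n = w . x_n
    return X.T @ (X @ w - t)

eta = 0.005                              # learning rate
w = np.zeros(3)                          # starting point w^(0)
for _ in range(200):                     # w^(tau+1) = w^(tau) - eta * grad E(w^(tau))
    w = w - eta * grad_E(w)
print(w)                                 # approaches [1.0, -2.0, 0.5]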
Stochastic gradient descent

• Stochastic gradient descent (also called sequential gradient descent) has proved useful in practice when training neural networks on large data sets

• The error function needs to comprise a sum of terms, one for each data point, i.e.,

  E(\mathbf{w}) = \sum_{n=1}^{N} E_n(\mathbf{w})

➢ Sum-of-squares error for regression
➢ Cross-entropy error for classification

29
Stochastic gradient descent

• Stochastic gradient descent makes an update to the weight vector based on one data point at a time

  \mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n(\mathbf{w}^{(\tau)})

• A minimal code sketch of this update is given below

30
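A minimal sketch of the stochastic update, reusing the illustrative linear least-squares setup from the batch example: one data point n is visited at a time and contributes its own gradient \nabla E_n.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
t = X @ np.array([1.0, -2.0, 0.5])

eta = 0.02
w = np.zeros(3)
for epoch in range(20):
    for n in rng.permutation(len(X)):            # visit data points in random order
        grad_En = (X[n] @ w - t[n]) * X[n]       # gradient of E_n(w) = 0.5 * (y_n - t_n)^2
        w = w - eta * grad_En                    # w^(tau+1) = w^(tau) - eta * grad E_n
print(w)                                         # approaches [1.0, -2.0, 0.5]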
Geometric view of gradient descent

• The error function E(\mathbf{w}) is a surface sitting over the weight space

• [Figure: the error surface E(\mathbf{w}) over weight space] One marked weight vector is a local minimum

• Another marked weight vector is the global minimum

• At any point \mathbf{w}, the local gradient of the error surface is given by the vector \nabla E(\mathbf{w})

31
Error backpropagation

• The computational cost of gradient descent mainly lies in the evaluation of the gradient \nabla E at each iteration

➢ The dimension of the gradient equals the number of learnable parameters

• In feed-forward neural networks, the gradient of an error function E(\mathbf{w}) can be efficiently evaluated via an algorithm called error backpropagation

32
Feed-forward neural networks

• Two-layer feed-forward neural networks for regression


[Network diagram: input \mathbf{x} = (x_0, x_1, …, x_D) → first-layer weights \{w_{ji}^{(1)}\} → activations a_j → hidden units z_0, z_1, …, z_M → second-layer weights \{w_{kj}^{(2)}\} → activations a_k → outputs \mathbf{y} = (y_1, …, y_K)]

  a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}, \quad j = 1, \ldots, M

  z_j = h(a_j)

  a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}, \quad k = 1, \ldots, K

  y_k = a_k

33
Error backpropagation

• Variables/activations dependency:

  \{x_i\} \to \{w_{ji}^{(1)}\} \to \{a_j\} \to \{z_j\} \to \{w_{kj}^{(2)}\} \to \{a_k\} \to \{y_k\} \to E

• Our goal in gradient computation:

  \frac{\partial E}{\partial w_{kj}^{(2)}} \quad \text{and} \quad \frac{\partial E}{\partial w_{ji}^{(1)}}

• In backpropagation, we also need to compute

  \delta_k = \frac{\partial E}{\partial a_k} \quad \text{and} \quad \delta_j = \frac{\partial E}{\partial a_j}

[Network diagram: input \mathbf{x} = (x_0, x_1, …, x_D) → \{w_{ji}^{(1)}\} → a_j → hidden units z_0, z_1, …, z_M → \{w_{kj}^{(2)}\} → a_k → outputs \mathbf{y} = (y_1, …, y_K)]
34
Error backpropagation

• Setting: stochastic gradient descent, multi-dimensional regression

• Forward equations:

  a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}, \quad j = 1, \ldots, M
  z_j = h(a_j)
  a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}, \quad k = 1, \ldots, K
  y_k = a_k

• Error function: E(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{K} (y_k - t_k)^2

• Output layer: \delta_k = \frac{\partial E}{\partial a_k} = y_k - t_k

• Hidden layer: \delta_j = \frac{\partial E}{\partial a_j} = \sum_k \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial a_j} = h'(a_j) \sum_k w_{kj}^{(2)} \delta_k
35
Error backpropagation

• Variables/activations dependency:

  \{x_i\} \to \{w_{ji}^{(1)}\} \to \{a_j\} \to \{z_j\} \to \{w_{kj}^{(2)}\} \to \{a_k\} \to \{y_k\} \to E

• Forward equations and error (as before):

  a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}, \quad z_j = h(a_j), \quad a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}, \quad y_k = a_k, \quad E(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{K} (y_k - t_k)^2

• Output layer: \delta_k = y_k - t_k and

  \frac{\partial E}{\partial w_{kj}^{(2)}} = \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial w_{kj}^{(2)}} = \delta_k z_j

• Hidden layer: \delta_j = h'(a_j) \sum_k w_{kj}^{(2)} \delta_k

36
Error backpropagation

• Variables/activations dependency:

  \{x_i\} \to \{w_{ji}^{(1)}\} \to \{a_j\} \to \{z_j\} \to \{w_{kj}^{(2)}\} \to \{a_k\} \to \{y_k\} \to E

• Output layer: \delta_k = y_k - t_k and

  \frac{\partial E}{\partial w_{kj}^{(2)}} = \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial w_{kj}^{(2)}} = \delta_k z_j

• Hidden layer: \delta_j = h'(a_j) \sum_k w_{kj}^{(2)} \delta_k and

  \frac{\partial E}{\partial w_{ji}^{(1)}} = \frac{\partial E}{\partial a_j} \frac{\partial a_j}{\partial w_{ji}^{(1)}} = \delta_j x_i

37
A review of error backpropagation

[Network diagram: input \mathbf{x} = (x_0, x_1, …, x_D) → \{w_{ji}^{(1)}\} → a_j → hidden units z_0, z_1, …, z_M → \{w_{kj}^{(2)}\} → a_k → outputs \mathbf{y} = (y_1, …, y_K)]

Step 1: \delta_k = y_k - t_k

Step 2: \frac{\partial E}{\partial w_{kj}^{(2)}} = \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial w_{kj}^{(2)}} = \delta_k z_j

Step 3: \delta_j = h'(a_j) \sum_k w_{kj}^{(2)} \delta_k

Step 4: \frac{\partial E}{\partial w_{ji}^{(1)}} = \frac{\partial E}{\partial a_j} \frac{\partial a_j}{\partial w_{ji}^{(1)}} = \delta_j x_i

A code sketch of these four steps is given below.
38
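A minimal NumPy sketch of the four backpropagation steps above for a single data point, using a two-layer regression network with a tanh hidden layer (the tanh choice and random data are illustrative). A finite-difference check confirms the gradient; bias gradients are omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2
x, t = rng.normal(size=D), rng.normal(size=K)
W1, b1 = rng.normal(size=(M, D)), np.zeros(M)
W2, b2 = rng.normal(size=(K, M)), np.zeros(K)

# Forward propagation
a_j = W1 @ x + b1
z = np.tanh(a_j)                  # z_j = h(a_j); h'(a_j) = 1 - tanh(a_j)^2
a_k = W2 @ z + b2
y = a_k                           # identity output for regression

# Step 1: output deltas        delta_k = y_k - t_k
delta_k = y - t
# Step 2: second-layer grads   dE/dw_kj^(2) = delta_k * z_j
dW2 = np.outer(delta_k, z)
# Step 3: hidden deltas        delta_j = h'(a_j) * sum_k w_kj^(2) delta_k
delta_j = (1.0 - np.tanh(a_j) ** 2) * (W2.T @ delta_k)
# Step 4: first-layer grads    dE/dw_ji^(1) = delta_j * x_i
dW1 = np.outer(delta_j, x)

# Finite-difference check on one first-layer weight
def E(W1_):
    return 0.5 * np.sum((W2 @ np.tanh(W1_ @ x + b1) + b2 - t) ** 2)
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
print(dW1[0, 0], (E(W1p) - E(W1)) / eps)   # the two values should match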
Error backpropagation for other tasks

E E yk
• Step 1: k  =
ak yk ak

 1 K
 
2 k =1
( yk − t k ) 2 regression

E (w ) =  − {t ln y (x, w ) + (1 − t ) ln(1 − y (x, w ))} binary classification
 K

−  t k ln yk (x , w ) multi - calss classification


 k =1
𝑦𝑘 = 𝑎𝑘 regression
1
𝑦= binary classification
1 + 𝑒 −𝑎
𝑒 𝑎𝑘
𝑦𝑘 = multi−class classification
σ𝑗 𝑒 𝑎𝑗

• Steps 2 ~ 4 remain unchanged

39
Neural networks’ applications

• Face detection

Rowley et al.
40
Convolutional neural networks

[Figure: feature hierarchy — Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier]

41
Convolutional neural networks’ applications

[Figure: example applications — object recognition, object detection, object segmentation]

42
Recurrent neural networks

• Speech recognition

https://gab41.lab41.org/speech-recognition-you-down-with-ctc-8d3b558943f0

43
Generative adversarial networks

https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016

44
Generative adversarial networks’ applications

Karras et al. Wang et al.

45
References

• Chapters 5.1, 5.2, and 5.3 in the PRML textbook

46
Thank You for Your Attention!

Yen-Yu Lin (林彥宇)


Email: [email protected]
URL: https://www.cs.nctu.edu.tw/members/detail/lin

47
