Foundations of Machine Learning: Module 6: Neural Network
Sudeshna Sarkar
IIT Kharagpur
Introduction
• Inspired by the human brain.
• Some NNs are models of biological neural networks
• The human brain contains a massively interconnected
net of about $10^{10}$ to $10^{11}$ neurons (cortical cells)
– Massive parallelism – large number of simple
processing units
– Connectionism – highly interconnected
– Associative distributed memory
• Pattern and strength of synaptic connections
Neuron
Neural Unit
ANNs
• ANNs incorporate the two fundamental components of
biological neural nets:
1. Nodes - Neurons
2. Weights - Synapses
Perceptrons
• Basic unit in a neural network: Linear separator
– N inputs, x1 ... xn
– Weights for each input, w1 ... wn
– A bias input x0 (constant) and associated weight w0
– Weighted sum of inputs: $y = \sum_{i=0}^{n} w_i x_i$
– A threshold function: output 1 if $y > 0$, $-1$ if $y \le 0$
[Figure: perceptron unit. Inputs $x_0, x_1, \dots, x_n$ with weights $w_0, w_1, \dots, w_n$ feed a summation unit computing $y = \sum_i w_i x_i$, followed by the threshold $\varphi = 1$ if $y > 0$, $-1$ otherwise.]
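A minimal sketch of this threshold unit in Python/NumPy; the function name and the hand-picked weights are illustrative, not from the slides:

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold unit: +1 if w0 + w1*x1 + ... + wn*xn > 0, else -1.
    w includes the bias weight w0; the bias input x0 = 1 is implicit."""
    y = w[0] + np.dot(w[1:], x)
    return 1 if y > 0 else -1

# Hand-picked weights, purely illustrative.
w = np.array([-0.5, 1.0, 1.0])
print(perceptron_output(w, np.array([1.0, 0.0])))   # -> 1
```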
Perceptron training rule
Update the perceptron weights for each training example as follows:
$w_i \leftarrow w_i + \Delta w_i$, where $\Delta w_i = \eta\,(y - \hat{y})\,x_i$
• If the data is linearly separable and 𝜂 is sufficiently small, it will
converge to a hypothesis that classifies all training data correctly in a
finite number of iterations
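A sketch of this training rule; the toy OR data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (y - y_hat) * x_i.
    X: (m, n) inputs; y: (m,) targets in {-1, +1}. A bias input x0 = 1 is prepended."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            y_hat = 1 if np.dot(w, xi) > 0 else -1
            w += eta * (yi - y_hat) * xi        # non-zero only when xi is misclassified
    return w

# Linearly separable toy data (Boolean OR with -1/+1 labels), illustrative only.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
print(perceptron_train(X, y))
```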
Gradient Descent
• Perceptron training rule may not converge if points are not
linearly separable
• Gradient descent changes the weights in proportion to the gradient of the
total error over all training points.
– If the data is not linearly separable, it converges to the best-fit
(minimum-error) weights.
Linear neurons
• The neuron has a real-valued output which is a weighted sum of its inputs:
$\hat{y} = \sum_i w_i x_i = \mathbf{w}^T \mathbf{x}$
• Define the error as the squared residuals summed over all training cases:
$E = \frac{1}{2} \sum_j (y_j - \hat{y}_j)^2$
• Differentiate to get error derivatives for weights:
$\dfrac{\partial E}{\partial w_i} = \frac{1}{2}\sum_{j=1..m} \dfrac{\partial \hat{y}_j}{\partial w_i}\,\dfrac{\partial E_j}{\partial \hat{y}_j} = -\sum_{j=1..m} x_{i,j}\,(y_j - \hat{y}_j)$
• The batch delta rule changes the weights in proportion to
their error derivatives summed over all training cases:
$\Delta w_i = -\eta\, \dfrac{\partial E}{\partial w_i}$
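A small sketch of the batch delta rule for a linear neuron; the toy regression data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def delta_rule_batch(X, y, eta=0.01, epochs=500):
    """Batch delta rule for a linear neuron:
    E = 1/2 * sum_j (y_j - w.x_j)^2, dE/dw_i = -sum_j x_{i,j} (y_j - y_hat_j),
    and dw_i = -eta * dE/dw_i."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y_hat = X @ w                 # predictions for all training cases
        grad = -X.T @ (y - y_hat)     # error derivatives summed over all cases
        w -= eta * grad
    return w

# Toy data generated from a known linear rule, illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0])
print(delta_rule_batch(X, y))         # should approach [2, -1]
```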
Error Surface
• The error surface lies in a space with a horizontal axis for each
weight and one vertical axis for the error.
– For a linear neuron, it is a quadratic bowl.
– Vertical cross-sections are parabolas.
– Horizontal cross-sections are ellipses.
Batch and Stochastic Learning
Batch Learning
• Steepest descent on the error surface, using the error summed over all training cases:
$E = \frac{1}{2}\sum_d (y_d - \hat{y}_d)^2$
Stochastic / Online Learning
• For each example $d$, compute the gradient of the single-case error and update immediately:
$\dfrac{\partial E_d}{\partial w_i} = \dfrac{\partial \hat{y}_d}{\partial w_i}\,\dfrac{\partial E_d}{\partial \hat{y}_d} = -x_{i,d}\,(y_d - \hat{y}_d)$
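For contrast with the batch sketch above, a sketch of the stochastic/online update, which applies the single-example gradient immediately; the learning rate and epoch count are illustrative:

```python
import numpy as np

def delta_rule_online(X, y, eta=0.05, epochs=100):
    """Stochastic / online version: after each example d, apply the single-case
    gradient dE_d/dw_i = -x_{i,d} (y_d - y_hat_d) immediately."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xd, yd in zip(X, y):
            y_hat = np.dot(w, xd)
            w += eta * (yd - y_hat) * xd
    return w
```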
Computation at Units
• Compute a 0-1 or a graded function of the
weighted sum of the inputs
• $\varphi(\cdot)$ is the activation function
[Figure: a unit with inputs $x_1, \dots, x_n$ and weights $w_1, \dots, w_n$ computing $\mathbf{w}\cdot\mathbf{x} = \sum_i w_i x_i$ and output $\varphi(\mathbf{w}\cdot\mathbf{x})$.]
Neuron Model: Logistic Unit
• Sigmoid (logistic) activation:
$\varphi(z) = \dfrac{1}{1 + e^{-z}}$, with $z = \mathbf{w}\cdot\mathbf{x}$, so $\hat{y} = \dfrac{1}{1 + e^{-\mathbf{w}\cdot\mathbf{x}}}$
• Its derivative: $\varphi'(z) = \varphi(z)\,\big(1 - \varphi(z)\big)$
• Squared error over training examples $d$:
$E = \frac{1}{2}\sum_d (y_d - \hat{y}_d)^2 = \frac{1}{2}\sum_d \big(y_d - \varphi(\mathbf{w}\cdot\mathbf{x}_d)\big)^2$
• Error derivative:
$\dfrac{\partial E}{\partial w_i} = \sum_d \dfrac{\partial E_d}{\partial \hat{y}_d}\,\dfrac{\partial \hat{y}_d}{\partial w_i} = -\sum_d (y_d - \hat{y}_d)\,\varphi'(\mathbf{w}\cdot\mathbf{x}_d)\,x_{i,d} = -\sum_d (y_d - \hat{y}_d)\,\hat{y}_d\,(1 - \hat{y}_d)\,x_{i,d}$
• Training rule: $\Delta w_i = \eta \sum_d (y_d - \hat{y}_d)\,\hat{y}_d\,(1 - \hat{y}_d)\,x_{i,d}$
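A sketch of this training rule for a single logistic unit; the OR-style toy data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit_train(X, y, eta=1.0, epochs=5000):
    """Gradient descent on squared error for one sigmoid unit:
    dw_i = eta * sum_d (y_d - y_hat_d) * y_hat_d * (1 - y_hat_d) * x_{i,d}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y_hat = sigmoid(X @ w)
        w += eta * X.T @ ((y - y_hat) * y_hat * (1 - y_hat))
    return w

# Illustrative data: Boolean OR with 0/1 targets; first column is the bias input x0 = 1.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])
w = logistic_unit_train(X, y)
print(np.round(sigmoid(X @ w), 2))   # should approach [0, 1, 1, 1]
```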
Thank You
Foundations of Machine Learning
Module 6: Neural Network
Part B: Multi-layer Neural
Network
Sudeshna Sarkar
IIT Kharagpur
Limitations of Perceptrons
• Perceptrons have a monotonicity property:
If a link has positive weight, activation can only increase as the
corresponding input value increases (irrespective of other
input values)
• Can’t represent functions where input interactions can cancel
one another’s effect (e.g. XOR)
• Can represent only linearly separable functions
A solution: multiple layers
[Figure: a two-layer network with input layer ($x_1, x_2$), hidden layer ($z_1, z_2$), and output layer ($y$), shown next to the decision regions it produces in the ($x_1, x_2$) plane.]
Power/Expressiveness of Multilayer
Networks
• Can represent interactions among inputs
• Two-layer networks can represent any Boolean
function, and any continuous function (to within a
tolerance), as long as the number of hidden units is
sufficient and appropriate activation functions are used
• Learning algorithms exist, but weaker guarantees
than perceptron learning algorithms
Multilayer Network
[Figure: a multilayer network with an input layer, a first hidden layer, a second hidden layer, and an output layer; inputs enter on the left and outputs leave on the right.]
Two-layer back-propagation neural network
[Figure: two-layer back-propagation network. Input signals $x_1, \dots, x_n$ flow forward through hidden units $j$ (input-to-hidden weights $w_{ij}$) to output units $k$ (hidden-to-output weights $w_{jk}$), producing outputs $y_1, \dots, y_{n_2}$; error signals propagate backward from the output layer.]
The back-propagation training algorithm
• Step 1: Initialisation
Set all the weights and threshold levels of the network to
random numbers uniformly distributed inside a small range
[Figure: example network with inputs $x_1, x_2$, hidden units $z_1, z_2$ (weights $v_{ij}$, biases $v_{01}, v_{02}$), and output $y_1$ (weights $w_{11}, w_{21}$, bias $w_{01}$); layers labeled x (input), z (hidden), y (output).]
Backprop
• Initialization
– Set all the weights and threshold levels of the network to
random numbers uniformly distributed inside a small
range
• Forward computing:
– Apply an input vector x to input units
– Compute activation/output vector z on hidden layer
$z_j = \varphi\!\left(\sum_i v_{ij} x_i\right)$
– Compute the output vector y on output layer
$y_k = \varphi\!\left(\sum_j w_{jk} z_j\right)$
y is the result of the computation.
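A sketch of this forward computation for one hidden layer, assuming sigmoid activations; the layer sizes and random weights are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, V, W):
    """z_j = phi(sum_i v_ij * x_i) on the hidden layer,
    y_k = phi(sum_j w_jk * z_j) on the output layer."""
    z = sigmoid(V.T @ x)    # V[i, j] = v_ij: weight from input i to hidden unit j
    y = sigmoid(W.T @ z)    # W[j, k] = w_jk: weight from hidden unit j to output k
    return z, y

# Illustrative sizes and random weights: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
V = rng.uniform(-0.5, 0.5, size=(3, 4))
W = rng.uniform(-0.5, 0.5, size=(4, 2))
z, y = forward(np.array([1.0, 0.0, 1.0]), V, W)
print(z, y)
```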
Learning for BP Nets
• Update of weights in W (between output and hidden layers):
– delta rule
• Not applicable to updating V (between input and hidden)
– we don’t know the target values for the hidden units $z_1, z_2, \dots, z_p$
• Solution: Propagate errors at output units to hidden units to
drive the update of weights in V (again by delta rule)
(error BACKPROPAGATION learning)
• Error backpropagation can be continued downward if the net
has more than one hidden layer.
• How to compute errors on hidden units?
Derivation
• For one output neuron, the error function is
$E = \frac{1}{2}(y - \hat{y})^2$
• For each unit $j$, the output $o_j$ is defined as
$o_j = \varphi(net_j) = \varphi\!\left(\sum_{k=1}^{n} w_{kj} o_k\right)$
The input $net_j$ to a neuron is the weighted sum of outputs $o_k$
of the previous $n$ neurons.
• Finding the derivative of the error:
$\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial net_j}\,\dfrac{\partial net_j}{\partial w_{ij}}$
Derivation
• Finding the derivative of the error:
$\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial net_j}\,\dfrac{\partial net_j}{\partial w_{ij}}$
$\dfrac{\partial net_j}{\partial w_{ij}} = \dfrac{\partial}{\partial w_{ij}}\left(\sum_{k=1}^{n} w_{kj} o_k\right) = o_i$
$\dfrac{\partial o_j}{\partial net_j} = \dfrac{\partial}{\partial net_j}\varphi(net_j) = \varphi(net_j)\,\big(1 - \varphi(net_j)\big)$
Consider $E$ as a function of the inputs of all neurons $Z = \{z_1, z_2, \dots\}$
receiving input from neuron $j$:
$\dfrac{\partial E(o_j)}{\partial o_j} = \dfrac{\partial E(net_{z_1}, net_{z_2}, \dots)}{\partial o_j}$
Taking the total derivative with respect to $o_j$, a recursive expression for
the derivative is obtained:
$\dfrac{\partial E}{\partial o_j} = \sum_{l} \dfrac{\partial E}{\partial net_{z_l}}\,\dfrac{\partial net_{z_l}}{\partial o_j} = \sum_{l} \dfrac{\partial E}{\partial o_l}\,\dfrac{\partial o_l}{\partial net_{z_l}}\,w_{j z_l}$
• Therefore, the derivative with respect to 𝑜𝑗 can be calculated if all the derivatives
with respect to the outputs 𝑜𝑧𝑙 of the next layer – the one closer to the output
neuron – are known.
• Putting it all together:
$\dfrac{\partial E}{\partial w_{ij}} = \delta_j\, o_i$
with
$\delta_j = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial net_j} = \begin{cases} (o_j - t_j)\, o_j\, (1 - o_j) & \text{if } j \text{ is an output neuron} \\ \left(\sum_{z_l \in Z} \delta_{z_l}\, w_{j z_l}\right) o_j\, (1 - o_j) & \text{if } j \text{ is an inner neuron} \end{cases}$
To update the weight $w_{ij}$ using gradient descent, one must choose a learning rate $\eta$:
$\Delta w_{ij} = -\eta\, \dfrac{\partial E}{\partial w_{ij}}$
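A small sketch of the two cases of $\delta_j$ above, written for NumPy arrays; the function names are illustrative:

```python
import numpy as np

def delta_output(o, t):
    """delta_j for output neurons: (o_j - t_j) * o_j * (1 - o_j), element-wise."""
    return (o - t) * o * (1 - o)

def delta_hidden(o_j, deltas_next, w_j_next):
    """delta_j for an inner neuron: (sum_l delta_{z_l} * w_{j z_l}) * o_j * (1 - o_j)."""
    return np.dot(deltas_next, w_j_next) * o_j * (1 - o_j)

# The gradient for weight w_ij is then dE/dw_ij = delta_j * o_i,
# and the gradient-descent update is dw_ij = -eta * delta_j * o_i.
```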
Backpropagation Algorithm
Thank You
Foundations of Machine Learning
Module 6: Neural Network
Part C: Neural Network and
Backpropagation Algorithm
Sudeshna Sarkar
IIT Kharagpur
Single layer Perceptron
• Single-layer perceptrons learn linear decision boundaries
[Figure: two scatter plots in the $(x_1, x_2)$ plane, with x = class I ($y = 1$) and o = class II ($y = -1$): one configuration that a single line can separate, and one XOR-style configuration that it cannot.]
Boolean OR (truth table):
  x1  x2  output
  0   0   0
  0   1   1
  1   0   1
  1   1   1
[Figure: OR is linearly separable in the $(x_1, x_2)$ plane; a single threshold unit with bias weight $w_0 = -0.5$ and weights $w_1 = w_2 = 1$ computes it.]
Boolean AND (truth table):
  x1  x2  output
  0   0   0
  0   1   0
  1   0   0
  1   1   1
[Figure: AND is linearly separable; a single threshold unit with bias weight $w_0 = -1.5$ and weights $w_1 = w_2 = 1$ computes it.]
Boolean XOR (truth table):
  x1  x2  output
  0   0   0
  0   1   1
  1   0   1
  1   1   0
[Figure: the four XOR points in the $(x_1, x_2)$ plane cannot be separated by a single line, so no single-layer perceptron computes XOR.]
Boolean XOR with a hidden layer
[Figure: a two-layer network computing the XOR truth table above. Hidden unit $h_1$ is the OR unit (bias $-0.5$, weights $1, 1$) and hidden unit $h_2$ is the AND unit (bias $-1.5$, weights $1, 1$); the output unit has bias $-0.5$ and weights $+1$ from $h_1$ and $-1$ from $h_2$, so it fires exactly when OR is true and AND is false.]
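A sketch of this XOR network using the threshold units and weights shown above:

```python
def step(y):
    return 1 if y > 0 else 0

def xor_net(x1, x2):
    """XOR built from the OR and AND threshold units above:
    h1 = OR(x1, x2), h2 = AND(x1, x2), output fires when h1 and not h2."""
    h1 = step(-0.5 + x1 + x2)        # OR unit:  bias -0.5, weights 1, 1
    h2 = step(-1.5 + x1 + x2)        # AND unit: bias -1.5, weights 1, 1
    return step(-0.5 + h1 - h2)      # output:   bias -0.5, weights +1, -1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # prints the XOR truth table
```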
Representation Capability of NNs
• Single-layer nets have limited representation power (the linear
separability problem). Multi-layer nets (or nets with non-linear
hidden units) can overcome the linear inseparability
problem.
• Every Boolean function can be represented by a network with
a single hidden layer.
• Every bounded continuous function can be approximated with
arbitrarily small error by a network with one hidden layer
• Any function can be approximated to arbitrary accuracy by a
network with two hidden layers.
Multilayer Network
[Figure: a multilayer network with an input layer, a first hidden layer, a second hidden layer, and an output layer; inputs enter on the left and outputs leave on the right.]
Two-layer back-propagation neural network
[Figure: two-layer back-propagation network. Input signals $x_1, \dots, x_n$ flow forward through hidden units $j$ (input-to-hidden weights $w_{ij}$) to output units $k$ (hidden-to-output weights $w_{jk}$), producing outputs $y_1, \dots, y_{n_2}$; error signals propagate backward from the output layer.]
Derivation
• For one output neuron, the error function is
$E = \frac{1}{2}(y - o)^2$
• For each unit $j$, the output $o_j$ is defined as
$o_j = \varphi(net_j) = \varphi\!\left(\sum_{k=1}^{n} w_{kj} o_k\right)$
The input $net_j$ to a neuron is the weighted sum of outputs $o_k$
of the previous $n$ neurons.
• Finding the derivative of the error:
$\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial net_j}\,\dfrac{\partial net_j}{\partial w_{ij}}$
For one output neuron, the error function is $E = \frac{1}{2}(y - o)^2$.
For each unit $j$, the output $o_j$ is defined as
$o_j = \varphi(net_j) = \varphi\!\left(\sum_{k=1}^{n} w_{kj} o_k\right)$
$\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial net_j}\,\dfrac{\partial net_j}{\partial w_{ij}} = \left(\sum_{l} \dfrac{\partial E}{\partial o_l}\,\dfrac{\partial o_l}{\partial net_{z_l}}\,w_{j z_l}\right)\varphi(net_j)\,\big(1 - \varphi(net_j)\big)\, o_i$
$\dfrac{\partial E}{\partial w_{ij}} = \delta_j\, o_i$
with
$\delta_j = \dfrac{\partial E}{\partial o_j}\,\dfrac{\partial o_j}{\partial net_j} = \begin{cases} (o_j - y_j)\, o_j\, (1 - o_j) & \text{if } j \text{ is an output neuron} \\ \left(\sum_{z_l \in Z} \delta_{z_l}\, w_{j z_l}\right) o_j\, (1 - o_j) & \text{if } j \text{ is an inner neuron} \end{cases}$
To update the weight $w_{ij}$ using gradient descent, one must choose a learning rate $\eta$:
$\Delta w_{ij} = -\eta\, \dfrac{\partial E}{\partial w_{ij}}$
Backpropagation Algorithm
Initialize all weights to small random numbers.
Until satisfied, do
– For each training example, do
• Input the training example to the network and compute the network
outputs
• For each output unit $k$:
$\delta_k \leftarrow o_k (1 - o_k)(y_k - o_k)$
• For each hidden unit $h$:
$\delta_h \leftarrow o_h (1 - o_h) \sum_{k \in outputs} w_{hk}\, \delta_k$
• Update each network weight $w_{ij}$:
$w_{ij} \leftarrow w_{ij} + \eta\, \delta_j\, x_{ij}$, where $x_{ij}$ is the input from unit $i$ into unit $j$
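A sketch of the complete algorithm for one hidden layer, assuming sigmoid units, stochastic updates, and explicit bias inputs; the XOR demo, layer size, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, Y, n_hidden=3, eta=0.5, epochs=10000, seed=0):
    """Stochastic backpropagation for one hidden layer, following the algorithm above.
    A constant bias input of 1 is appended to the input and hidden layers."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])                     # bias input
    V = rng.uniform(-0.5, 0.5, size=(Xb.shape[1], n_hidden))      # input -> hidden
    W = rng.uniform(-0.5, 0.5, size=(n_hidden + 1, Y.shape[1]))   # hidden(+bias) -> output
    for _ in range(epochs):
        for x, y in zip(Xb, Y):
            z = np.append(sigmoid(x @ V), 1.0)     # hidden activations + bias unit
            o = sigmoid(z @ W)                     # network outputs
            delta_o = o * (1 - o) * (y - o)        # output-unit errors
            delta_h = z[:-1] * (1 - z[:-1]) * (W[:-1] @ delta_o)   # hidden-unit errors
            W += eta * np.outer(z, delta_o)        # w_jk <- w_jk + eta * delta_k * z_j
            V += eta * np.outer(x, delta_h)        # v_ij <- v_ij + eta * delta_j * x_i
    return V, W

# Illustrative use on XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
V, W = train_backprop(X, Y)
Xb = np.hstack([X, np.ones((4, 1))])
Z = np.hstack([sigmoid(Xb @ V), np.ones((4, 1))])
# Should approach [0, 1, 1, 0]; squared-error training can occasionally stall
# in a local minimum, in which case a different random seed helps.
print(np.round(sigmoid(Z @ W), 2))
```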
Stopping
1. Fixed maximum number of epochs: most naïve
2. Keep track of the training and validation error
curves; stop when the validation error starts to rise (early stopping).
Overfitting in ANNs
Local Minima
Sudeshna Sarkar
IIT Kharagpur
Deep Learning
• Breakthrough results in
– Image classification
– Speech Recognition
– Machine Translation
– Multi-modal learning
Deep Neural Network
• Problem: training networks with many hidden layers
doesn’t work very well
• Local minima; very slow training if initialized with zero
weights.
• Diffusion of gradients (the error signal becomes very weak by the time it reaches the early layers).
Hierarchical Representation
• Hierarchical Representation help represent complex
functions.
• NLP: character -> word -> chunk -> clause -> sentence
• Image: pixel -> edge -> texton -> motif -> part -> object
• Deep Learning: learning a hierarchy of internal
representations
• Learned internal representation at the hidden layers
(trainable feature extractor)
• Feature learning
[Figure: Input -> Trainable Feature Extractor -> Trainable Feature Extractor -> ... -> Trainable Classifier -> Output]
Unsupervised Pre-training
• We will use greedy, layer-wise pre-training (a conceptual sketch follows below)
• Train one layer at a time
• Fix the parameters of the previous hidden layers
• Previous layers are viewed as feature extraction
• Find hidden-unit features that are more common in training
inputs than in random inputs
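A conceptual sketch of the greedy, layer-wise procedure; `train_autoencoder_layer` and `encode` are hypothetical placeholders for whatever single-layer unsupervised trainer is used (autoencoder, RBM, etc.) and are not defined in these slides:

```python
def greedy_layerwise_pretrain(X, layer_sizes, train_autoencoder_layer, encode):
    """Train one hidden layer at a time; earlier layers are frozen and act as
    fixed feature extractors producing the inputs for the next layer."""
    layers, data = [], X
    for size in layer_sizes:
        params = train_autoencoder_layer(data, n_hidden=size)  # unsupervised fit
        layers.append(params)                                  # freeze this layer
        data = encode(params, data)       # its hidden features feed the next layer
    return layers
```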
Tuning the Classifier
• After pre-training of the layers
– Add output layer
– Train the whole network using
supervised learning (Back propagation)
Deep neural network
• Feed forward NN
• Stacked Autoencoders (multilayer neural net
with target output = input)
• Stacked restricted Boltzmann machine
• Convolutional Neural Network
A Deep Architecture: Multi-Layer Perceptron
[Figure: a multi-layer perceptron stack. The output layer $y$ predicts a supervised target; hidden layers $h^3$, $h^2$, $h^1$ learn more abstract representations as you head up; the input layer $x$ holds the raw sensory inputs.]
A Neural Network
• Training : Back
Propagation of Error
– Calculate total error at
the top
– Calculate contributions
to error at each step
going backwards
[Figure: a network with an input layer, a hidden layer, and an output layer.]
Training of neural networks
• Forward Propagation :
– Sum inputs, produce
activation
– feed-forward
• $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
• $\mathrm{sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$
• Rectified linear
relu(x) = max(0,x)
- Simplifies backprop
- Makes learning faster
- Makes features sparse
→ Preferred option
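The three activation functions, written out directly from the formulas above; the test values are illustrative:

```python
import numpy as np

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)       # rectified linear: max(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh(x), sigmoid(x), relu(x), sep="\n")
```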
Autoencoder
• Unlabeled training set $\{x^{(1)}, x^{(2)}, x^{(3)}, \dots\}$, $x^{(i)} \in \mathbb{R}^n$
• Set the target values to be equal to the inputs: $y^{(i)} = x^{(i)}$
• The network is trained to output the input (learn the identity function): $h_{W,b}(x) \approx x$
• The solution may be trivial!
[Figure: an autoencoder network whose hidden units $a_1, a_2, a_3$ form the compressed code between the input and its reconstruction.]
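A minimal sketch of such an autoencoder, reusing the backprop updates from earlier with the target set to the input itself; the 4-3-4 "encoder" demo, layer size, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden=3, eta=0.5, epochs=10000, seed=0):
    """Autoencoder with one sigmoid hidden layer plus bias units: same backprop
    updates as before, but the target is the input itself (y^(i) = x^(i))."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])                     # bias input
    V = rng.uniform(-0.5, 0.5, size=(Xb.shape[1], n_hidden))      # encoder
    W = rng.uniform(-0.5, 0.5, size=(n_hidden + 1, X.shape[1]))   # decoder
    for _ in range(epochs):
        for x, t in zip(Xb, X):                    # target t = original input
            z = np.append(sigmoid(x @ V), 1.0)     # code + bias unit
            o = sigmoid(z @ W)                     # reconstruction
            delta_o = o * (1 - o) * (t - o)
            delta_h = z[:-1] * (1 - z[:-1]) * (W[:-1] @ delta_o)
            W += eta * np.outer(z, delta_o)
            V += eta * np.outer(x, delta_h)
    return V, W

# Illustrative 4-3-4 "encoder" problem: one-hot inputs squeezed through 3 hidden units.
X = np.eye(4)
V, W = train_autoencoder(X)
Z = np.append(sigmoid(np.hstack([X, np.ones((4, 1))]) @ V), np.ones((4, 1)), axis=1)
print(np.round(sigmoid(Z @ W), 2))   # reconstructions should be close to the inputs
```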
Autoencoders and sparsity
1. Place constraints on the network, such as limiting the
number of hidden units, to discover interesting structure
in the data.
2. Impose a sparsity constraint:
a neuron is “active” if its output value is close to 1,
and “inactive” if its output value is close to 0;
constrain the neurons to be inactive most of the time.
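One common way to implement such a sparsity constraint is a KL-divergence penalty on the average activation of each hidden unit, as in sparse-autoencoder formulations; the target sparsity `rho`, the weight `beta`, and the KL form below are standard choices, not taken from these slides:

```python
import numpy as np

def sparsity_penalty(hidden_activations, rho=0.05, beta=3.0):
    """Penalize the average activation rho_hat_j of each hidden unit for deviating
    from a small target rho; the result is added to the reconstruction error."""
    rho_hat = hidden_activations.mean(axis=0)       # average activation per hidden unit
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return beta * np.sum(kl)

# Illustrative: activations of 3 hidden units over 5 training examples.
A = np.array([[0.9, 0.1, 0.05],
              [0.8, 0.2, 0.04],
              [0.7, 0.1, 0.06],
              [0.9, 0.3, 0.05],
              [0.8, 0.2, 0.05]])
print(sparsity_penalty(A))    # large penalty for the persistently active first unit
```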
Auto-Encoders
Stacked Auto-Encoders
• Do supervised training on the last layer using final
features
• Then do supervised training on the entire network
to fine-tune all weights
Softmax output layer: $y_i = \dfrac{e^{z_i}}{\sum_j e^{z_j}}$
Convolutional Neural Networks
• A CNN consists of a number of convolutional and
subsampling layers.
• The input to a convolutional layer is an m x m x r image,
where m x m is the height and width of the image
and r is the number of channels, e.g. an RGB image
has r = 3
• A convolutional layer has k filters (or kernels)
of size n x n x q, where
• n is smaller than the dimension of the image, and
• q can either be the same as the number of
channels r or smaller, and may vary for each kernel (a small sketch follows below)
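A sketch of a single convolutional layer with these dimensions; the loop-based "valid" convolution and the example sizes are illustrative (a real implementation would also add a bias and a nonlinearity per feature map):

```python
import numpy as np

def conv_layer(image, kernels):
    """Valid 2-D convolution of an m x m x r image with k kernels of size n x n x q,
    with q == r here for simplicity; returns k feature maps of size (m-n+1) x (m-n+1)."""
    m, _, r = image.shape
    k, n, _, q = kernels.shape
    assert q == r, "kernel depth must match image channels in this sketch"
    out = np.zeros((k, m - n + 1, m - n + 1))
    for f in range(k):                       # one feature map per kernel
        for i in range(m - n + 1):
            for j in range(m - n + 1):
                patch = image[i:i + n, j:j + n, :]
                out[f, i, j] = np.sum(patch * kernels[f])
    return out

# Illustrative sizes: a 6x6 RGB image (r = 3) and k = 2 kernels of size 3x3x3.
rng = np.random.default_rng(0)
image = rng.normal(size=(6, 6, 3))
kernels = rng.normal(size=(2, 3, 3, 3))
print(conv_layer(image, kernels).shape)      # -> (2, 4, 4)
```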
Convolutional Neural Networks