Neural Networks

SHAILESH S

This document provides an overview of neural networks and some of their key concepts. It discusses perceptrons, which were the first neural network models developed in the 1960s. While simple, perceptrons established basic concepts used in modern multi-layer models. The document then explains multi-layer perceptron networks, which can represent non-linear functions using hidden layers. It also covers backpropagation, the main algorithm for training deep neural networks by propagating errors backwards. Finally, it discusses common activation functions like sigmoid, tanh, and ReLU, and regularization techniques like dropout that help neural networks generalize.
Perceptron
➢ First neural network learning model, developed in the 1960s

➢ Simple and limited (single-layer models)

➢ Basic concepts carry over to multi-layer models, so it is a good learning tool

➢ Still used in some current applications (modems, etc.)


Perceptron Model
Perceptron Algorithm
Learning AND gate
F = w1·x1 + w2·x2 − θ

Start with w1 = 1, w2 = 1, θ = 2.5, i.e. the decision boundary 1·x1 + 1·x2 − 2.5 = 0:

w1  w2  (x1, x2)   F
1   1   (0, 1)    −1.5
1   1   (1, 1)    −0.5   ← misclassified: AND(1,1) = 1, so the weights are increased
2   2   (0, 0)    −2.5
2   2   (1, 0)    −0.5

After the update (w1 = 2, w2 = 2) the boundary 2·x1 + 2·x2 − 2.5 = 0 classifies all four AND patterns correctly: F(1,1) = +1.5, while the other three inputs remain negative.
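Below is a minimal sketch (not from the slides) of the perceptron update traced in the table above, in plain Python with a step activation, a fixed threshold θ = 2.5 and learning rate 1; the function and variable names are illustrative.

```python
# Minimal perceptron training loop for the AND gate (sketch).
# Assumes a step activation, learning rate 1, and fixed threshold theta = 2.5,
# matching the trace in the table above.

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(w, x, theta=2.5):
    """Fire (1) if the weighted sum exceeds the threshold, else 0."""
    s = w[0] * x[0] + w[1] * x[1] - theta
    return 1 if s > 0 else 0

def train(epochs=10, lr=1.0):
    w = [1.0, 1.0]                          # initial weights, as in the table
    for _ in range(epochs):
        for x, target in AND_DATA:
            error = target - predict(w, x)
            # Perceptron rule: nudge each weight by lr * error * input
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
    return w

if __name__ == "__main__":
    w = train()
    print("weights:", w)                           # [2.0, 2.0] with this data ordering
    print([predict(w, x) for x, _ in AND_DATA])    # [0, 0, 0, 1]
```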
Implementing AND gate

Tools:

• Python

• scikit-learn (sklearn)

Try it for OR, NAND, NOR and XOR as well; a sketch using these tools follows below.

One possible learned decision boundary for AND: 3x + 2y − 4 = 0
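Since the slide names Python and scikit-learn as the tools, here is one possible sketch using sklearn.linear_model.Perceptron for the AND gate; swap in the OR, NAND or NOR targets to experiment (XOR will not be learned correctly, as the following slides discuss).

```python
# Sketch: learning the AND gate with scikit-learn's Perceptron.
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])        # replace with OR / NAND / NOR / XOR targets to experiment

clf = Perceptron(max_iter=100, tol=None, random_state=0)
clf.fit(X, y_and)

print(clf.predict(X))                 # expected: [0 0 0 1]
print(clf.coef_, clf.intercept_)      # learned weights and bias (the decision boundary)
```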
More Gates

?
Multi-Layer Perceptron (MLP)

▪ Feedforward network: the neurons in each layer feed their output forward to the next layer until we get the final output from the neural network.

▪ There can be any number of hidden layers within a feedforward network.

▪ The number of neurons can be completely arbitrary.

▪ MLP is used to describe any general feedforward network (no recurrent connections).
Back to the XOR problem

▪ A single perceptron cannot represent exclusive OR (XOR), since XOR is not linearly separable.

▪ PERCEPTRON → MLP
Solution to the XOR problem (hidden unit O1)

With weights −3 (from x) and −2 (from y) and bias +4, O1 fires when −3x − 2y + 4 > 0, i.e. O1 = NAND(x, y). (In the original figure the shaded region indicates output 1, the positive region.)

x  y  O1
0  0  1
0  1  1
1  0  1
1  1  0
Solution to the XOR problem (hidden unit O2)

With weights 1 (from x) and 1 (from y) and threshold 0.5, O2 fires when x + y − 0.5 > 0, i.e. O2 = OR(x, y). (Again, the shaded region in the figure indicates output 1, the positive region.)

x  y  O2
0  0  0
0  1  1
1  0  1
1  1  1
Solution to the XOR problem (output unit z)

Combining the two hidden units with weights 1 (from O1) and 1 (from O2) and threshold 1.5, z fires when O1 + O2 − 1.5 > 0, i.e. z = AND(O1, O2). Both O1 and O2 fire exactly for the inputs (0,1) and (1,0), so z = XOR(x, y).

O1  O2  z
0   0   0
0   1   0
1   0   0
1   1   1
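A small sketch (assuming a simple step activation) that hard-codes the weights from the diagrams above, with O1 acting as NAND, O2 as OR, and the output z as an AND of the two:

```python
# Sketch: XOR built from two hidden perceptrons with the hand-picked weights above.

def step(s):
    return 1 if s > 0 else 0

def xor(x, y):
    o1 = step(-3 * x - 2 * y + 4)      # NAND: weights -3, -2, bias +4
    o2 = step(1 * x + 1 * y - 0.5)     # OR:   weights  1,  1, threshold 0.5
    z  = step(1 * o1 + 1 * o2 - 1.5)   # AND:  weights  1,  1, threshold 1.5
    return z

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, "->", xor(x, y))       # prints 0, 1, 1, 0
```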
Three-layer networks
▪ No connections within a layer
▪ No direct connections between input and output layers
▪ Fully connected between adjacent layers
▪ Often more than 3 layers
▪ The number of output units need not equal the number of input units
▪ The number of hidden units per layer can be more or less than the number of input or output units
What does each of these layers do?

▪ 1st layer draws linear boundaries
▪ 2nd layer combines the boundaries
▪ 3rd layer can generate arbitrarily complex boundaries
Backpropagation learning algorithm (BP)

▪ BP has two phases:

Forward pass phase: computes the 'functional signal', i.e. feedforward propagation of the input pattern signals through the network.

Backward pass phase: computes the 'error signal', propagating the error backwards through the network starting at the output units (where the error is the difference between the actual and desired output values).
Backpropagation learning algorithm (BP)

● Error gradient along all connection weights were measured by


propagating the error from output layer.
● First, a forward pass is performed - output of every neuron in every
layer is computed.
● Output error is estimated.
● Then compute how much each neuron in last hidden layer contributed to
output error.
● This is repeated backwards until input layer.
● Last step is Gradient Descent on all connection weights using error
gradients estimated in previous steps.
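As an illustration of these steps, here is a minimal NumPy sketch of the forward pass, backward pass and weight update for a small sigmoid network trained on XOR; the layer sizes, learning rate and initialization are illustrative assumptions, not taken from the slides.

```python
# Minimal backprop sketch: forward pass, backward pass, gradient descent step.
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: the XOR truth table (illustrative choice)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 inputs -> 4 hidden units -> 1 output (sizes are arbitrary assumptions)
W1, b1 = rng.normal(0.0, 1.0, (2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(0.0, 1.0, (4, 1)), np.zeros((1, 1))
lr = 0.5

for epoch in range(10000):
    # Forward pass: compute the output of every neuron in every layer
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error from the output layer backwards
    delta_out = (out - y) * out * (1 - out)        # error signal at the output units
    delta_hid = (delta_out @ W2.T) * h * (1 - h)   # contribution of each hidden unit

    # Gradient descent on all connection weights
    W2 -= lr * (h.T @ delta_out)
    b2 -= lr * delta_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ delta_hid)
    b1 -= lr * delta_hid.sum(axis=0, keepdims=True)

print(out.round(2))   # should move towards [[0], [1], [1], [0]] (depends on the initialization)
```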
Activation: Sigmoid

Maps real values into (0, 1): σ(x) = 1 / (1 + e^(−x)). Takes a real-valued number and "squashes" it into the range between 0 and 1.

+ Nice interpretation as the firing rate of a neuron
  • 0 = not firing at all
  • 1 = fully firing
− Sigmoid neurons saturate and kill gradients, so the network will barely learn:
  🙁 when a neuron's activation is near 0 or 1 (saturation), the gradient in these regions is almost zero
  🙁 almost no signal will flow back to its weights
  🙁 if the initial weights are too large, most neurons will saturate
Activation: Tanh

Maps real values into (−1, 1). Takes a real-valued number and "squashes" it into the range between −1 and 1.

− Like sigmoid, tanh neurons saturate
+ Unlike sigmoid, the output is zero-centered
• Tanh is a scaled sigmoid: tanh(x) = 2·σ(2x) − 1
Activation: ReLU

f(x) = max(0, x). Maps real values into [0, ∞). Takes a real-valued number and thresholds it at zero. Most deep networks use ReLU nowadays.

🙂 Trains much faster
  • accelerates the convergence of SGD
  • due to its linear, non-saturating form
🙂 Less expensive operations
  • compared to sigmoid/tanh (no exponentials etc.)
  • implemented by simply thresholding a matrix at zero
🙂 More expressive
🙂 Helps prevent the vanishing gradient problem
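For reference, the three activations above written with NumPy (a minimal sketch; the sample inputs are arbitrary):

```python
# The sigmoid, tanh and ReLU activations discussed above, written with NumPy.
import numpy as np

def sigmoid(x):
    """Squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Zero-centered squashing into (-1, 1); a scaled sigmoid: 2*sigmoid(2x) - 1."""
    return np.tanh(x)

def relu(x):
    """Thresholds at zero: max(0, x), applied elementwise."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), sep="\n")
```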
Regularization

Dropout
• Randomly drop units (along with their connections) during training
• Each unit is retained with a fixed probability p, independent of the other units
• The hyper-parameter p has to be chosen (tuned)

L2 regularization (weight decay)
• Adds a regularization term that penalizes big weights
• The weight decay value determines how dominant regularization is during gradient computation
• A big weight decay coefficient means a big penalty for big weights

Early stopping
• Use the validation error to decide when to stop training
• Stop when the monitored quantity has not improved after n subsequent epochs
• n is called the patience
(A code sketch of all three techniques follows below.)
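The slides do not name a library for these techniques; as one hedged illustration, dropout, L2 weight decay and early stopping could be wired together in Keras (tf.keras) roughly as follows. The layer sizes, coefficients and the commented-out training data (X_train, y_train) are placeholders, not values from the slides.

```python
# Sketch (assuming TensorFlow/Keras): dropout, L2 weight decay, and early stopping.
import tensorflow as tf
from tensorflow.keras import layers, regularizers, callbacks

model = tf.keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on this layer's weights
    layers.Dropout(0.5),   # drop rate 0.5, i.e. each unit retained with probability p = 0.5
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

early_stop = callbacks.EarlyStopping(monitor="val_loss",
                                     patience=5,             # n = 5 epochs of patience
                                     restore_best_weights=True)

# Hypothetical training call; X_train / y_train are placeholders:
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
```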
Implementation of XOR Gate MLP
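In line with the Python and scikit-learn tools named earlier, here is one possible sketch of an MLP learning XOR with sklearn.neural_network.MLPClassifier; the hidden layer size, activation, solver and seed are illustrative choices, and whether such a tiny network trains successfully depends on the initialization.

```python
# Sketch: learning XOR with a small multi-layer perceptron in scikit-learn.
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                       # XOR targets

mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=1000, random_state=1)
mlp.fit(X, y)

print(mlp.predict(X))    # ideally [0 1 1 0]; very small nets can occasionally get stuck
```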
Deep Neural Networks
Thank You

ANY QUERIES ?
