Lecture 15

The document discusses activation functions and backpropagation algorithms in artificial neural networks. It defines key concepts like activation functions, types of activation functions including sigmoid, tanh, ReLU, and describes how non-linear activation functions allow neural networks to learn complex patterns. It then explains the backpropagation algorithm for training multi-layer perceptrons, including initializing weights, propagating inputs forward and errors backward to update weights, and conditions for terminating training.


eMBA933

Data Mining
Tools & Techniques
Lecture 15.1

Dr. Faiz Hamid


Associate Professor
Department of IME
IIT Kanpur
[email protected]
Artificial Neural Networks
Role of Activation Function
• Acts as a mathematical “gate” between the input feeding a neuron and its output going to the next layer
• With linear activation functions, no matter how many layers the neural network has, the last layer is a linear function of the first layer (see the sketch below)
  – A linear combination of linear functions is still a linear function
  – A linear activation function collapses the neural network into a single layer
  – A neural network with only linear activation functions is simply a linear regression model
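A minimal NumPy sketch (layer sizes and random weights are illustrative) showing that two stacked layers with linear activations collapse into a single linear map:

import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with identity (linear) activation: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
two_layer_out = W2 @ (W1 @ x + b1) + b2

# The same mapping expressed as ONE linear layer: W = W2 @ W1, b = W2 @ b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
one_layer_out = W @ x + b

print(np.allclose(two_layer_out, one_layer_out))  # True: the extra depth added nothing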
Role of Activation Function
• Non‐linear activation functions allow complex mappings between the network’s inputs and outputs
  – essential for learning and modeling complex data, such as images, video, audio, and data sets that are non‐linear or high‐dimensional
• Activation functions should be differentiable
  – the derivative (gradient) of the activation function is used for error calculations during the backpropagation algorithm to improve and optimize the results (a quick check is sketched below)
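A small sketch (the function choices are illustrative) that checks the analytic derivatives used during backpropagation against a finite-difference approximation:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

# Backpropagation relies on these gradients; verify them numerically
x = np.linspace(-3, 3, 7)
eps = 1e-6
fd_sigmoid = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
fd_tanh = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)

print(np.allclose(fd_sigmoid, sigmoid_grad(x)))  # True
print(np.allclose(fd_tanh, tanh_grad(x)))        # True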
Types of Activation Functions
• Linear Function
  – Takes the inputs, multiplies them by the weights for each neuron, and produces an output signal proportional to the input
  – Better than a step function: allows multiple output values, not just yes or no
  – Not possible to use backpropagation (gradient descent) to train the model
  – The derivative of the function is a constant and has no relation to the input
Types of Activation Functions
• Nonlinear Activation Functions
  – Allow backpropagation because their derivative is a function of the input
  – Allow “stacking” of multiple layers of neurons to create a deep neural network that can learn complex data sets with high accuracy
• Sigmoid / Logistic (see the sketch below)
  – Smooth gradient, preventing “jumps” in output values
  – Clear predictions: for x above 2 or below ‐2, the output is pushed toward the edge of the curve, very close to 1 or 0
  – Output values between 0 and 1, normalizing the output of each neuron
  – Usually used in the output layer for binary classification
  – Vanishing gradient: for very high or very low values of x there is almost no change in the output, so the network effectively stops learning
  – Computationally expensive
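A minimal sketch of the sigmoid and its gradient (the sample points are illustrative), showing how the gradient vanishes for large |x|:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25 (at x = 0), near zero for large |x|

for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.4f}  gradient={sigmoid_grad(x):.6f}")
# at |x| = 10 the gradient is ~0.000045: almost no learning signal (vanishing gradient)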
Types of Activation Functions
• TanH / Hyperbolic Tangent
  – Scaled version of the sigmoid function, with outputs between ‐1 and 1
  – Easier to model inputs that have strongly negative, neutral, and strongly positive values
  – Still suffers from the vanishing gradient problem
• ReLU (Rectified Linear Unit) (see the sketch below)
  – Most widely used activation function
  – Computationally efficient; allows the network to converge very quickly
  – Non‐linear: although it looks like a linear function, ReLU has a derivative and allows backpropagation
  – Dying ReLU problem: when inputs are negative (or zero), the gradient of the function is zero, so the affected neurons stop learning
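A sketch of ReLU and its (sub)gradient (the sample values are illustrative), showing why negative net inputs pass back no learning signal (the dying ReLU problem):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Subgradient: 1 for x > 0, 0 otherwise
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]  -> zero gradient for x <= 0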
Types of Activation Functions
• Leaky ReLU (see the sketch below)
  – An improved version of the ReLU function
  – Uses a small linear component of x for x < 0
  – Removes the zero gradient and avoids dead neurons for x < 0
  – Dead neuron = a unit that always produces the same output, plays no role in discriminating the input, and is essentially useless
• Softplus
  – A smooth approximation to ReLU
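A sketch of Leaky ReLU and Softplus (the leak coefficient 0.01 is a common but illustrative choice):

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small linear component for x < 0 keeps a non-zero gradient (no dead neurons)
    return np.where(x > 0, x, alpha * x)

def softplus(x):
    # Smooth approximation to ReLU: log(1 + e^x)
    return np.log1p(np.exp(x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(leaky_relu(x))  # [-0.03 -0.01  0.    1.    3.  ]
print(softplus(x))    # smooth and strictly positive; approaches ReLU for large x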
Choosing the Right Activation Function
• Choose an activation function that approximates the target function quickly, leading to a faster training process
• For binary classification, use the sigmoid function in the output layer
• Linear activation functions are used in the output layer for regression problems
• Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem
• ReLU can suffer from the dead neuron problem
• If dead neurons occur in the network, the leaky ReLU function is the best choice
• If you really don’t know which function to use, simply use ReLU
• The ReLU function should only be used in the hidden layers (a small helper summarizing these rules of thumb is sketched below)
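A tiny illustrative helper (the function name and mapping are assumptions, simply restating the rules of thumb above):

def suggest_activations(task: str) -> dict:
    """Rule-of-thumb activation choices; illustrative only."""
    output = {"binary_classification": "sigmoid", "regression": "linear"}.get(task, "sigmoid")
    # ReLU is the default choice for hidden layers
    return {"hidden_layers": "relu", "output_layer": output}

print(suggest_activations("binary_classification"))  # {'hidden_layers': 'relu', 'output_layer': 'sigmoid'}
print(suggest_activations("regression"))             # {'hidden_layers': 'relu', 'output_layer': 'linear'}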
Training Multilayer Perceptrons
• Backpropagation Algorithm (a high‑level training loop is sketched below)
  – Learns using a gradient descent method
  – Minimizes the mean squared difference between the network’s prediction and the known target value
  – The error is propagated backwards to update the weights
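A high-level sketch of the gradient-descent training loop described above; the toy data, network size, and learning rate are illustrative, and the per-unit error terms follow the update rules given later in this lecture:

import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 training tuples, 3 inputs, a binary target (illustrative)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([[0], [1], [1], [1]], dtype=float)

# One hidden layer with 2 units, one output unit
W1, b1 = rng.normal(scale=0.5, size=(3, 2)), np.zeros(2)
W2, b2 = rng.normal(scale=0.5, size=(2, 1)), np.zeros(1)
lr = 0.5

for epoch in range(2000):
    # Propagate the inputs forward
    H = sigmoid(X @ W1 + b1)                  # hidden-layer outputs
    O = sigmoid(H @ W2 + b2)                  # network predictions
    # Backpropagate the error: Err = O(1 - O)(T - O) for the output layer
    err_out = O * (1 - O) * (T - O)           # output-layer error
    err_hid = H * (1 - H) * (err_out @ W2.T)  # hidden-layer error
    # Update the weights and biases
    W2 += lr * H.T @ err_out; b2 += lr * err_out.sum(axis=0)
    W1 += lr * X.T @ err_hid; b1 += lr * err_hid.sum(axis=0)

print(np.round(O.ravel(), 2), T.ravel())  # predictions should approach the targets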
[Figure: flow of the signal forward through the network and backpropagation of the error]
Backpropagation Algorithm
• Neural network learning for classification or numeric prediction, using the backpropagation algorithm
• Input
  – D, a data set consisting of the training tuples and their associated target values
  – l, the learning rate
  – network, a multilayer feed‐forward network
• Output: a trained neural network
Backpropagation Algorithm
• Initialize the weights
  – weights and biases (thresholds) are initialized to small random numbers (e.g., in [‐1.0, 1.0] or [‐0.5, 0.5])
• Propagate the inputs forward
  – the training tuple is fed to the network’s input layer
  – inputs pass through the input units unchanged
  – the net input to a unit j in the hidden or output layers is computed as a linear combination of its inputs: Ij = Σi wij Oi + θj, where Oi is the output of unit i in the previous layer, wij is the connecting weight, and θj is the bias of unit j
Backpropagation Algorithm
• Each unit in the hidden and output layers takes its net input and then applies an activation function to it
• The logistic / sigmoid function is used: Oj = 1 / (1 + e^(-Ij))
Backpropagation Algorithm
• Backpropagate the error
  – The error is propagated backward by updating the weights and biases to reflect the error of the network’s prediction
  – The error Errj of a unit j in the output layer is computed as Errj = Oj (1 - Oj)(Tj - Oj)
  – Oj is the actual output of unit j, and Tj is the known target value of the given training tuple
  – The error of a hidden layer unit j is Errj = Oj (1 - Oj) Σk Errk wjk, where the sum runs over the units k in the next layer connected to j
  – Weights and biases are updated as (l being the learning rate): Δwij = (l) Errj Oi, wij = wij + Δwij, Δθj = (l) Errj, θj = θj + Δθj


Backpropagation Algorithm
• Terminating condition
  – Training stops when
    • all weight changes Δwij in the previous epoch (iteration) were below some specified threshold, or
    • the percentage of tuples misclassified in the previous epoch is below some specified threshold, or
    • a prespecified number of epochs has expired
Backpropagation Algorithm
• Some comments
  – The learning rate helps avoid getting stuck at a local minimum
  – If the learning rate is too small, learning occurs at a very slow pace
  – If the learning rate is too large, oscillation between inadequate solutions may occur
  – Time complexity of backpropagation is O(n · m · h^k · o · i)
    • n training samples
    • m features
    • h neurons per hidden layer
    • k hidden layers
    • o output neurons
    • i is the number of iterations (epochs)
Backpropagation Algorithm

• Traditional default learning rate values are 0.1, 0.01, and 0.001
Backpropagation Algorithm
• Example. A multilayer feed‐forward neural network and initial
weight and bias values are given
– Training tuple X =(1, 0, 1), class label 1
– Learning rate = 0.9
– Sigmoid activation function
Backpropagation Algorithm

The error Errj of a unit j in the output layer is computed as Errj = Oj (1 - Oj)(Tj - Oj)

The error of a hidden layer unit j is Errj = Oj (1 - Oj) Σk Errk wjk

A single forward and backward pass for an example of this form is sketched below.
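A minimal sketch of one forward and backward pass for an example like this one: the training tuple X = (1, 0, 1), class label 1, learning rate 0.9, and sigmoid activation come from the slide, while the network shape (3 inputs, 2 hidden units, 1 output unit) and the initial weights and biases are placeholder values chosen for illustration, since the slide’s figure is not reproduced in this text.

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0])   # training tuple X = (1, 0, 1)
target, lr = 1.0, 0.9           # class label 1, learning rate 0.9

# Placeholder initial weights and biases (illustrative only)
W_ih = np.array([[ 0.2, -0.3],
                 [ 0.4,  0.1],
                 [-0.5,  0.2]])      # input -> hidden weights
b_h = np.array([-0.4, 0.2])          # hidden biases
W_ho = np.array([-0.3, -0.2])        # hidden -> output weights
b_o = 0.1                            # output bias

# Propagate the inputs forward
O_h = sigmoid(x @ W_ih + b_h)        # hidden-unit outputs
O_o = sigmoid(O_h @ W_ho + b_o)      # network prediction

# Backpropagate the error
err_o = O_o * (1 - O_o) * (target - O_o)   # output-unit error
err_h = O_h * (1 - O_h) * (err_o * W_ho)   # hidden-unit errors

# Update the weights and biases
W_ho += lr * err_o * O_h; b_o += lr * err_o
W_ih += lr * np.outer(x, err_h); b_h += lr * err_h

print(round(float(O_o), 3), np.round(err_o, 4))  # prediction and output-layer error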


Backpropagation Algorithm
Representational Power
• A neural network with at least one hidden layer is a universal approximator (it can approximate any continuous function to arbitrary accuracy, given enough hidden units)
• Capacity of the network increases with more hidden units and
more hidden layers
