Unit 1.1

What is an activation function and why use them?

An activation function decides whether a neuron should be activated or not. The neuron computes the weighted sum of its inputs, adds a bias to it, and passes the result through the activation function. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

Explanation: A neural network has neurons that work in correspondence with weights, biases, and their respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation. Activation functions make back-propagation possible, since their gradients are propagated along with the error to update the weights and biases.
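As a minimal sketch of this computation (our own illustrative Python/NumPy code; the names x, w, b, and sigmoid are not from the original notes), a single neuron forms its weighted sum plus bias and passes it through a sigmoid activation:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs, weights, and bias (arbitrary values)
x = np.array([0.5, -1.2, 3.0])   # input signals
w = np.array([0.4, 0.7, -0.2])   # connection weights
b = 0.1                          # bias term

z = np.dot(w, x) + b             # weighted sum plus bias
y = sigmoid(z)                   # neuron's activated output
print(z, y)
```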

Why do we need a non-linear activation function?

A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
Mathematical proof
Suppose we have a neural net with one hidden layer and no activation function, so every layer computes a purely linear map.

Hidden layer, i.e. layer 1:
z1 = W1x + b1

Layer 2, i.e. output layer:
z2 = W2z1 + b2

Substituting the first expression into the second:
z2 = W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2)

This is again of the form Wx + b: the composition of two linear functions is a linear function itself. A neuron therefore cannot learn anything beyond a linear mapping with just a linear function attached to it. A non-linear activation function lets it adjust its output non-linearly with respect to the error, so the network can learn complex patterns. Hence we need a non-linear activation function.
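A quick numerical check of this argument (our own NumPy sketch with made-up weight values):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# Two "layers" with no activation function: purely linear maps
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

two_layer = W2 @ (W1 @ x + b1) + b2

# The same computation collapsed into a single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True: the stack is still linear
```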
Common activation functions:

• Sigmoid function: σ(x) = 1 / (1 + e^(-x)), which squashes its input into the range (0, 1)
• Tanh function: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), which squashes its input into the range (-1, 1)
• ReLU function: ReLU(x) = max(0, x)
• Softmax function: softmax(x)i = e^(xi) / Σj e^(xj), which turns a vector of scores into a probability distribution

Choosing the Right Activation Function: as a rule of thumb, ReLU is the common default for hidden layers, sigmoid suits binary classification outputs, and softmax suits multi-class classification outputs.
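The following Python sketch (using NumPy; the helper names are our own) implements the four functions listed above:

```python
import numpy as np

def sigmoid(x):
    # Maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Maps any real value into (-1, 1); zero-centred, unlike sigmoid
    return np.tanh(x)

def relu(x):
    # Passes positive values through, zeroes out negatives
    return np.maximum(0.0, x)

def softmax(x):
    # Converts a vector of scores into a probability distribution.
    # Subtracting the max first is a standard numerical-stability trick.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([-1.0, 0.0, 2.0])
print(sigmoid(scores), tanh(scores), relu(scores), softmax(scores), sep="\n")
```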
Models of Artificial Neural Network

1. McCulloch-Pitts Model of Neuron

• The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs: excitatory and inhibitory. Excitatory inputs have weights of positive magnitude, and inhibitory inputs have weights of negative magnitude.
• The inputs of the McCulloch-Pitts neuron can be either 0 or 1, and it has a threshold function as its activation function.
• The output signal y_out is 1 if the input sum y_sum is greater than or equal to a given threshold value, and 0 otherwise.
• Simple McCulloch-Pitts neurons can be used to design logical operations. For that purpose, the connection weights need to be decided correctly, along with the threshold value of the activation function.
Example: John carries an umbrella if it is sunny or if it is raining. There are four given situations, and we need to decide when John will carry the umbrella. The situations are as follows:

• First scenario: it is not raining, nor is it sunny
• Second scenario: it is not raining, but it is sunny
• Third scenario: it is raining, and it is not sunny
• Fourth scenario: it is raining as well as sunny

To analyse the situations using the McCulloch-Pitts neural model, we can consider the input signals as follows:

• x1: Is it raining?
• x2: Is it sunny?

Each input can take the value 0 or 1. We can set both weights w1 and w2 to 1 and the threshold value to 1, so the neuron fires whenever x1 + x2 ≥ 1. The truth table for this case is:

Situation   x1   x2   y_sum   y_out
1           0    0    0       0
2           0    1    1       1
3           1    0    1       1
4           1    1    2       1
From the truth table, we can conclude that John needs to carry an umbrella in the situations where y_out is 1. Hence, he will carry an umbrella in scenarios 2, 3 and 4.
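A small Python sketch of this neuron (our own illustrative code) reproduces the truth table:

```python
def mcculloch_pitts(inputs, weights, threshold):
    # Fire (output 1) when the weighted input sum reaches the threshold
    y_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_sum >= threshold else 0

# OR-like umbrella neuron: w1 = w2 = 1, threshold = 1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts((x1, x2), (1, 1), threshold=1))
```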
Perceptron Model

• The Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP neuron.
• A perceptron is an algorithm for supervised learning of binary classifiers. The algorithm enables a neuron to learn by processing the elements of the training set one at a time.
• A perceptron has one or more inputs, a processing step, and a single output.
• The original perceptron was designed to take a number of binary inputs and produce one binary output (0 or 1).

Perceptron Example

Imagine a perceptron (in your brain). The perceptron tries to decide if you should go to a concert: Is the artist good? Is the weather good? What weights should these facts have?

Criteria             Input          Weight
Artist is Good       x1 = 0 or 1    w1 = 0.7
Weather is Good      x2 = 0 or 1    w2 = 0.6
Friend will Come     x3 = 0 or 1    w3 = 0.5
Food is Served       x4 = 0 or 1    w4 = 0.3
Alcohol is Served    x5 = 0 or 1    w5 = 0.4

The Perceptron Algorithm

Frank Rosenblatt suggested this algorithm:
1. Set a threshold value
2. Multiply all inputs by their weights
3. Sum all the results
4. Activate the output

Applied to the concert example, with the artist good, the friend coming, and alcohol served (x1 = x3 = x5 = 1, x2 = x4 = 0):

1. Set a threshold value: Threshold = 1.5
2. Multiply all inputs by their weights:
   x1 * w1 = 1 * 0.7 = 0.7
   x2 * w2 = 0 * 0.6 = 0
   x3 * w3 = 1 * 0.5 = 0.5
   x4 * w4 = 0 * 0.3 = 0
   x5 * w5 = 1 * 0.4 = 0.4
3. Sum all the results: 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (the weighted sum)
4. Activate the output: return true if the sum > 1.5 ("Yes, I will go to the concert")
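In code, a direct Python transcription of the worked example above:

```python
inputs  = [1, 0, 1, 0, 1]            # artist good, friend coming, alcohol served
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))
go_to_concert = weighted_sum > threshold
print(weighted_sum, go_to_concert)   # 1.6 True
```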
How does the Perceptron work?

• The perceptron is regarded as a single-layer neural network comprising four key components: input values (input nodes), weights and bias, net sum, and an activation function.
• The perceptron model starts by multiplying every input value by its weight. It then adds these products together to generate the weighted sum. The weighted sum is applied to the activation function "f" to get the anticipated output. This activation function is also called the step function.
Basic Components of Perceptron

1. Input Layer: The input layer consists of one or more input neurons, which receive input signals from the external world or from other layers of the neural network.

2. Weights: Each input neuron is associated with a weight, which represents the strength of the connection between the input neuron and the output neuron.

3. Bias: A bias term is added to the weighted sum to give the perceptron additional flexibility in modeling complex patterns in the input data.

4. Activation Function: The activation function determines the output of the perceptron based on the weighted sum of the inputs and the bias term. Common activation functions used in perceptrons include the step function, the sigmoid function, and the ReLU function.

5. Output: The output of the perceptron is a single binary value, either 0 or 1, which indicates the class or category to which the input data belongs.

6. Training Algorithm: The perceptron is typically trained using a supervised learning algorithm such as the perceptron learning algorithm or backpropagation. During training, the weights and bias of the perceptron are adjusted to minimize the error between the predicted output and the true output for a given set of training examples (see the sketch after this list).
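A minimal sketch of the perceptron learning rule mentioned in item 6 (our own NumPy implementation, here learning the logical AND function; the learning rate and epoch count are illustrative choices):

```python
import numpy as np

def step(z):
    # Step activation: 1 if the net input is non-negative, else 0
    return 1 if z >= 0 else 0

# Training data for logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, ti in zip(X, t):
        y = step(np.dot(w, xi) + b)
        # Perceptron learning rule: adjust by the prediction error
        w += lr * (ti - y) * xi
        b += lr * (ti - y)

print(w, b)
print([step(np.dot(w, xi) + b) for xi in X])  # [0, 0, 0, 1]
```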
Types of Perceptron:

1. Single layer: A single-layer perceptron can learn only linearly separable patterns.

2. Multilayer: A multilayer perceptron has two or more layers, giving it greater processing power; it can learn patterns that are not linearly separable.

The multi-layer perceptron is trained with the backpropagation algorithm, which executes in two stages as follows:
• Forward Stage: Activations flow from the input layer, through any hidden layers, and terminate at the output layer.
• Backward Stage: The error between the actual output and the desired output is propagated backward from the output layer to the input layer, and the weight and bias values are modified as required.
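The two stages can be made concrete with a small sketch (our own NumPy code: a single hidden layer, sigmoid activations, and squared-error loss are our illustrative assumptions, as are the layer sizes and learning rate):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=3)          # one input example
t = np.array([1.0])             # desired output

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
lr = 0.5

# Forward stage: activations flow input -> hidden -> output
h = sigmoid(W1 @ x + b1)
y = sigmoid(W2 @ h + b2)

# Backward stage: the output error is propagated back through the layers
delta2 = (y - t) * y * (1 - y)          # output-layer error term
delta1 = (W2.T @ delta2) * h * (1 - h)  # hidden-layer error term

W2 -= lr * np.outer(delta2, h)
b2 -= lr * delta2
W1 -= lr * np.outer(delta1, x)
b1 -= lr * delta1
```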
Linear Separability

• Linear separability is an important concept in machine learning, particularly in supervised learning. It refers to the ability of a set of data points to be separated into distinct categories using a linear decision boundary.
• In other words, if there exists a straight line that can cleanly divide the data into two classes, the data is said to be linearly separable.
• Linearly separable data points can be separated using a line, a linear function, or a flat hyperplane.
• In practice, there are several methods to determine whether data is linearly separable.
• Formally, points in n-dimensional space are linearly separable if there exist weights w1, ..., wn and a bias b such that

w1x1 + w2x2 + ... + wnxn + b > 0 for every point in one class, and
w1x1 + w2x2 + ... + wnxn + b < 0 for every point in the other class.

Methods for checking linear separability:

1. Visual Inspection: Plot the data points in 2D or 3D space and look for a distinct straight line or plane that divides the groups. If such a boundary can be seen, the data may be linearly separable.
2. Perceptron Learning Algorithm: This binary linear classifier iteratively learns a hyperplane that divides the input into two classes. If the method converges and finds a separating hyperplane, the data is linearly separable; if not, it is not.
3. Support Vector Machines: SVMs are a popular classification technique that can handle linearly separable data. They find the separating hyperplane that maximizes the margin between the two classes. If the margin is greater than zero, the data can be linearly separated (see the sketch after this list).
4. Kernel Methods: This family of techniques transforms the data into a higher-dimensional space where it may become linearly separable. If the transformed data is linearly separable, the original data can be separated by a (generally non-linear) boundary in its original space.
5. Quadratic Programming: Quadratic programming can be used to find the separating hyperplane that minimizes the classification error. If a solution is found, the data can be separated linearly.
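A quick practical check along the lines of method 3, assuming scikit-learn is available (the dataset here is a made-up illustrative one):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Two small 2D clusters that a straight line can divide
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LinearSVC(C=1e6)  # large C: demand a (near) hard margin
clf.fit(X, y)

# If a linear classifier fits the training set perfectly,
# the data is linearly separable
print(clf.score(X, y) == 1.0)
```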
Linearly Separable 2D Data

We say a two-dimensional dataset is linearly separable if we can separate the positive from the negative objects with a straight line.

For example, consider predicting whether a house sells based on its area and price. We have a number of data points for this, each labelled with its class, Sold or Not Sold; the dataset is linearly separable if a straight line in the area-price plane divides the Sold points from the Not Sold points.
Adaptive Linear Neuron (Adaline)

Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows:

• It uses a bipolar activation function.
• The Adaline neuron can be trained using the delta rule, also known as the Least Mean Square (LMS) rule or Widrow-Hoff rule.
• The net input is compared with the target value to compute the error signal.
• The weights are adjusted on the basis of this adaptive training algorithm.

The basic structure of Adaline is similar to the perceptron, with an extra feedback loop through which the actual output is compared with the desired/target output. After comparison, the weights and bias are updated on the basis of the training algorithm.
Adaptive Linear Neuron Learning Algorithm

Step 0: Initialize the weights and the bias to small random values (not zero), and set the learning rate α.
Step 1: Perform steps 2-7 while the stopping condition is false.
Step 2: Perform steps 3-5 for each bipolar training pair s : t.
Step 3: Activate each input unit: xi = si, for i = 1 to n.
Step 4: Obtain the net input with the following relation:

y_in = b + Σ xiwi (summing over i = 1 to n)

Here 'b' is the bias and 'n' is the total number of input neurons.
Step 5: Adjust the weights and bias to reduce the error (t - y_in):

wi(new) = wi(old) + α(t - y_in)xi
b(new) = b(old) + α(t - y_in)

Step 6: Calculate the error using E = (t - y_in)².
Step 7: Test for the stopping condition: if the error generated is less than or equal to a specified tolerance, stop; otherwise continue training.
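A minimal sketch of this loop (our own NumPy code, training on the bipolar OR problem; the learning rate, tolerance, and epoch cap are illustrative choices):

```python
import numpy as np

# Bipolar OR training pairs s : t (inputs and targets in {-1, +1})
S = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
T = np.array([-1, 1, 1, 1])

rng = np.random.default_rng(0)
w = rng.uniform(-0.1, 0.1, size=2)   # Step 0: small random weights
b = rng.uniform(-0.1, 0.1)           # ... and bias
# LMS converges to the minimum mean-square-error solution, not zero
# error; for this dataset that floor is 0.25 per pattern, so the
# tolerance must sit above it.
alpha, tol = 0.1, 0.3

for epoch in range(100):             # Step 1: loop until stopping condition
    max_error = 0.0
    for x, t in zip(S, T):           # Steps 2-3: each bipolar training pair
        y_in = b + np.dot(w, x)      # Step 4: net input
        w += alpha * (t - y_in) * x  # Step 5: delta (LMS) rule
        b += alpha * (t - y_in)
        max_error = max(max_error, (t - y_in) ** 2)  # Step 6: error
    if max_error <= tol:             # Step 7: stopping condition
        break

print(w, b)
print(np.where(b + S @ w >= 0, 1, -1))  # bipolar outputs after training
```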
Multiple Adaptive Linear Neuron (Madaline)

Madaline, which stands for Multiple Adaptive Linear Neuron, is a network consisting of many Adalines in parallel, with a single output unit. Some important points about Madaline are as follows:

• It is structured like a multilayer perceptron, with the Adalines acting as hidden units between the input layer and the Madaline (output) layer.
• The weights and the bias between the input and Adaline layers are adjustable, as in the Adaline architecture.
• The weights and the bias between the Adaline and Madaline layers are fixed; they are always 1.
• Training can be done with the help of the delta rule.

The network consists of 'n' units in the input layer, 'm' units in the Adaline layer, and 1 unit in the Madaline layer. Each neuron in the Adaline and Madaline layers has a bias of 1. The Adaline layer sits between the input layer and the Madaline layer and is considered the hidden layer.
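A forward-pass sketch of this architecture (our own NumPy code; the layer sizes and input pattern are illustrative, and the output unit uses the fixed weights and bias of 1 described above, which makes it behave like a vote over the Adaline outputs):

```python
import numpy as np

def bipolar_step(z):
    # Bipolar activation: +1 if non-negative, else -1
    return np.where(z >= 0, 1, -1)

rng = np.random.default_rng(2)
n, m = 3, 4                         # n input units, m Adaline units

W = rng.uniform(-0.5, 0.5, (m, n))  # adjustable input-to-Adaline weights
b = rng.uniform(-0.5, 0.5, m)       # adjustable Adaline biases

x = np.array([1, -1, 1])            # one bipolar input pattern

hidden = bipolar_step(W @ x + b)    # Adaline (hidden) layer outputs

# Madaline output unit: fixed weights of 1 and a bias of 1, so it
# fires +1 when the Adalines' summed votes plus the bias are non-negative
y = bipolar_step(np.sum(hidden) + 1)
print(hidden, y)
```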
