DEEP LEARNING
Artificial Neural Networks and Their Applications
As you read this article, which organ in your body is thinking about it? It’s the brain of course! But do
you know how the brain works? Well, it has neurons or nerve cells that are the primary units of both
the brain and the nervous system. These neurons receive sensory input from the outside world which
they process and then provide the output which might act as the input to the next neuron.
Each of these neurons is connected to other neurons in complex arrangements at synapses. Now, are you wondering how this is related to Artificial Neural Networks? Let’s check out what they are in detail and how they learn information.
Well, Artificial Neural Networks are modeled after the neurons in the human brain.
Artificial Neural Networks
Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network in a system. A layer can have only a dozen units or millions of units; this depends on how complex the neural network needs to be to learn the hidden patterns in the dataset. Commonly, an Artificial Neural Network has an input layer, an output layer, as well as hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about. This data then passes through one or multiple hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides an output in the form of the Artificial Neural Network’s response to the input data provided.
In the majority of neural networks, units are interconnected from one layer to another. Each of these
connections has weights that determine the influence of one unit on another unit. As the data transfers
from one unit to another, the neural network learns more and more about the data which eventually
results in an output from the output layer.
Biological Neuron → Artificial Neuron
Dendrite → Inputs
Synapses → Weights
Axon → Output
Synapses: Synapses are the links between biological neurons that enable the transmission of impulses from dendrites to the cell body. In artificial neurons, synapses are the weights that join the nodes of one layer to the nodes of the next layer. The strength of a link is determined by its weight value.
Learning: In biological neurons, learning happens in the cell body, or soma, which has a nucleus that helps to process the impulses. An action potential is produced and travels through the axons if the impulses are powerful enough to reach the threshold. This is made possible by synaptic plasticity, the ability of synapses to become stronger or weaker over time in reaction to changes in their activity. In artificial neural networks, backpropagation is the technique used for learning; it adjusts the weights between nodes according to the error, i.e., the difference between predicted and actual outcomes.
Activation: In biological neurons, activation is the firing rate of the neuron, which happens when the impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical function known as an activation function maps the input to the output and executes the activation.
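To make this concrete, here is a minimal sketch of a single artificial neuron in Python with NumPy (an assumption, since this chapter specifies no code environment); the input values, weights, and bias are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Activation function: maps the net input to an output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Made-up inputs, weights, and bias for one neuron
inputs = np.array([0.5, 0.3, 0.2])
weights = np.array([0.4, 0.7, -0.2])
bias = 0.1

net_input = np.dot(inputs, weights) + bias  # weighted sum plus bias
output = sigmoid(net_input)                 # the neuron's activation
print(output)
```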
How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example, suppose you want to teach an ANN to recognize a cat. It is shown thousands of different images of cats so that the network can learn to identify a cat. Once the neural network has been trained enough using images of cats, you need to check whether it can identify cat images correctly. This is done by having the ANN classify the images it is provided, deciding whether they are cat images or not. The output obtained by the ANN is checked against a human-provided description of whether the image is a cat image or not. If the ANN identifies an image incorrectly, backpropagation is used to adjust what it has learned during training. Backpropagation works by fine-tuning the weights of the connections between ANN units based on the error rate obtained. This process continues until the artificial neural network can correctly recognize a cat in an image with the minimal possible error rate. A simplified sketch of this predict, measure, adjust loop follows.
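Here is that sketch: a single sigmoid neuron trained by repeated error-driven weight updates on a tiny made-up dataset (Python with NumPy). It illustrates the idea rather than the full backpropagation algorithm covered later:

```python
import numpy as np

def sigmoid(z):
    # Squashes the net input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Tiny made-up dataset: 4 examples, 2 features, labels follow an AND pattern
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(size=2)  # initial weights
b = 0.0                 # initial bias
lr = 0.5                # learning rate

for epoch in range(10_000):
    pred = sigmoid(X @ w + b)   # forward pass: predictions
    error = pred - y            # difference from the correct answers
    # Adjust weights and bias against the error (cross-entropy gradient)
    w -= lr * (X.T @ error) / len(y)
    b -= lr * error.mean()

print(np.round(sigmoid(X @ w + b), 2))  # predictions move toward [0, 0, 0, 1]
```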
What are the types of Artificial Neural Networks?
Feedforward Neural Network: The feedforward neural network is one of the most basic artificial neural networks. In this ANN, the data or the input provided travels in a single direction. It enters the ANN through the input layer and exits through the output layer, while hidden layers may or may not exist. So the feedforward neural network has a front-propagated wave only and usually does not use backpropagation.
Convolutional Neural Network : A Convolutional neural network has some similarities to
the feed-forward neural network, where the connections between units have weights that
determine the influence of one unit on another unit. But a CNN has one or more than one
convolutional layer that uses a convolution operation on the input and then passes the result
obtained in the form of output to the next layer. CNNs have applications in speech and image processing, and are particularly useful in computer vision.
Modular Neural Network: A Modular Neural Network contains a collection of different
neural networks that work independently towards obtaining the output with no interaction
between them. Each of the different neural networks performs a different sub-task by
obtaining unique inputs compared to other networks. The advantage of this modular neural
network is that it breaks down a large and complex computational process into smaller
components, thus decreasing its complexity while still obtaining the required output.
Radial basis function Neural Network: Radial basis functions are functions that consider the distance of a point with respect to a center. RBF networks have two layers: in the first layer, the input is mapped onto all the radial basis functions in the hidden layer, and then the output layer computes the output in the next step. Radial basis function networks are normally used to model data that represents an underlying trend or function.
Recurrent Neural Network: The Recurrent Neural Network saves the output of a layer and
feeds this output back to the input to better predict the outcome of the layer. The first layer in the RNN is quite similar to the feed-forward neural network, and the recurrent behavior starts once the output of the first layer is computed. After this layer, each unit remembers some information from the previous step so that it can act as a memory cell while performing computations.
Applications of Artificial Neural Networks
1. Social Media: Artificial Neural Networks are used heavily in Social Media. For example,
let’s take the ‘People you may know’ feature on Facebook that suggests people that you
might know in real life so that you can send them friend requests. Well, this magical effect is
achieved by using Artificial Neural Networks that analyze your profile, your interests, your
current friends, and also their friends and various other factors to calculate the people you
might potentially know. Another common application of Machine Learning in social media
is facial recognition . This is done by finding around 100 reference points on the person’s
face and then matching them with those already available in the database using convolutional
neural networks.
2. Marketing and Sales: When you log onto e-commerce sites like Amazon and Flipkart, they recommend products for you to buy based on your previous browsing history. Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history. This is true across all new-age marketing segments like book sites, movie services, hospitality sites, etc., and it is done by implementing personalized marketing. This uses Artificial Neural Networks to identify a customer’s likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
3. Healthcare: Artificial Neural Networks are used in oncology to train algorithms that can identify cancerous tissue at the microscopic level with the same accuracy as trained physicians. Various rare diseases may manifest in physical characteristics and can be identified in their early stages by using facial analysis on patient photos. So the full-scale implementation of Artificial Neural Networks in the healthcare environment can only enhance the diagnostic abilities of medical experts and ultimately lead to an overall improvement in the quality of medical care all over the world.
4. Personal Assistants: You have surely heard of Siri, Alexa, Cortana, etc., and perhaps even used them on your phones! These personal assistants are an example of speech recognition that uses Natural Language Processing to interact with the users and formulate a response accordingly. Natural Language Processing uses artificial neural networks that are made to handle many tasks of these personal assistants, such as managing the language syntax, semantics, correct speech, and the ongoing conversation.
Deep Learning
In the fast-evolving era of artificial intelligence, Deep Learning stands as a cornerstone technology,
revolutionizing how machines understand, learn, and interact with complex data. At its essence, Deep
Learning AI mimics the intricate neural networks of the human brain, enabling computers to
autonomously discover patterns and make decisions from vast amounts of unstructured data. This
transformative field has propelled breakthroughs across various domains, from computer vision and
natural language processing to healthcare diagnostics and autonomous driving.
As we dive into this introductory exploration of Deep Learning, we uncover its foundational
principles, applications, and the underlying mechanisms that empower machines to achieve human-
like cognitive abilities. This article serves as a gateway into understanding how Deep Learning is
reshaping industries, pushing the boundaries of what’s possible in AI, and paving the way for a future
where intelligent systems can perceive, comprehend, and innovate autonomously.
What is Deep Learning?
Deep learning is the branch of machine learning that is based on artificial neural network architecture. An artificial neural network, or ANN, uses layers of interconnected nodes called neurons that work together to process and learn from the input data.
In a fully connected Deep neural network, there is an input layer and one or more hidden layers
connected one after the other. Each neuron receives input from the previous layer neurons or the input
layer. The output of one neuron becomes the input to other neurons in the next layer of the network,
and this process continues until the final layer produces the output of the network. The layers of the
neural network transform the input data through a series of nonlinear transformations, allowing the
network to learn complex representations of the input data.
Today Deep learning AI has become one of the most popular and visible areas of machine learning,
due to its success in a variety of applications, such as computer vision, natural language processing,
and Reinforcement learning.
Deep learning can be used for supervised, unsupervised, as well as reinforcement machine learning, and it uses a variety of ways to process these.
Supervised Machine Learning: Supervised machine learning is the machine learning technique in which the neural network learns to make predictions or classify data based on labeled datasets. Here we input both the input features and the target variables. The neural network learns to make predictions based on the cost or error that comes from the difference between the predicted and the actual target; this process is known as backpropagation. Deep learning algorithms like Convolutional Neural Networks and Recurrent Neural Networks are used for many supervised tasks like image classification and recognition, sentiment analysis, language translation, etc.
Unsupervised Machine Learning: Unsupervised machine learning is the machine learning technique in which the neural network learns to discover patterns or to cluster the dataset based on unlabeled datasets. Here there are no target variables; the machine has to determine the hidden patterns or relationships within the datasets on its own. Deep learning algorithms like autoencoders and generative models are used for unsupervised tasks like clustering, dimensionality reduction, and anomaly detection.
Reinforcement Machine Learning: Reinforcement machine learning is the machine learning technique in which an agent learns to make decisions in an environment to maximize a reward signal. The agent interacts with the environment by taking actions and observing the resulting rewards. Deep learning can be used to learn policies, or sets of actions, that maximize the cumulative reward over time. Deep reinforcement learning algorithms like Deep Q-Networks and Deep Deterministic Policy Gradient (DDPG) are used for reinforcement tasks like robotics, game playing, etc.
Artificial neural networks
Artificial neural networks are built on the principles of the structure and operation of human neurons. They are also known as neural networks or neural nets. An artificial neural network’s input layer, which is the first layer, receives input from external sources and passes it on to the hidden layer, which is the second layer. Each neuron in the hidden layer gets information from the neurons in the previous layer, computes the weighted total, and then transfers it to the neurons in the next layer. These connections are weighted, which means that the impact of the inputs from the preceding layer is scaled by giving each input a distinct weight. These weights are then adjusted during the training process to enhance the performance of the model. A sketch of this layer-by-layer computation follows.
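Here is that sketch; the layer sizes, weights, and input are arbitrary values, assumed only for illustration (Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary layer sizes: 3 inputs -> 4 hidden units -> 2 outputs
W1 = rng.normal(size=(3, 4))  # weights from input layer to hidden layer
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 2))  # weights from hidden layer to output layer
b2 = np.zeros(2)

x = np.array([0.2, 0.9, -0.5])  # one input example

# Each layer computes a weighted total of the previous layer's outputs,
# then applies a nonlinearity before passing the result onward.
hidden = np.tanh(x @ W1 + b1)
output = hidden @ W2 + b2
print(output)
```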
Machine Learning | Deep Learning
Takes less time to train the model. | Takes more time to train the model.
A model is created from relevant features that are manually extracted from images to detect an object in the image. | Relevant features are automatically extracted from images; it is an end-to-end learning process.
Less complex and easy to interpret the result. | More complex; it works like a black box.
It can work on the CPU or requires less computing power as compared to deep learning. | It requires a high-performance computer with a GPU.
If there are “n” nodes, with each node having “m” weights, then the weights are represented as an n × m matrix:

W = [ w11 w12 ... w1m
      w21 w22 ... w2m
      ...
      wn1 wn2 ... wnm ]
Bias: Bias is a constant that is added to the product of inputs and weights to compute the net input. It is used to shift the result to the positive or negative side: a positive bias increases the net input, while a negative bias decreases it.
Here, {x1, ..., xn} are the inputs, and the output Y of the neuron is computed by the function g(x), which sums up all the inputs and adds the bias to them:

g(x) = Σ xi + b,  for i = 1 to n
     = x1 + x2 + ... + xn + b

The role of the activation function is to provide the output depending on the result of the summation function:

Y = 1 if g(x) >= 0
Y = 0 otherwise
Threshold: A threshold value is a constant that is compared to the net input to get the output. The activation function is defined based on the threshold value. For example:

Y = 1 if net input >= threshold
Y = 0 otherwise

Learning Rate: The learning rate, denoted α, ranges from 0 to 1. It scales the weight adjustments during the learning of the ANN.
Target value: Target values are the correct values of the output variable, also known simply as targets.
Error: The error is the inaccuracy of the predicted output values compared to the target values.
A direct implementation of this summation-and-threshold unit is sketched below.
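Here is that implementation of the summation function g(x) with a bias and a threshold activation; the input values, bias, and threshold are made up:

```python
import numpy as np

def simple_unit(x, b, threshold=0.0):
    g = np.sum(x) + b                  # g(x) = x1 + ... + xn + b
    return 1 if g >= threshold else 0  # threshold activation: Y is 1 or 0

print(simple_unit(np.array([0.4, -0.1, 0.3]), b=-0.5))  # g = 0.1, so Y = 1
```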
Supervised Learning Algorithms:
Delta Learning: It was introduced by Bernard Widrow and Marcian Hoff and is also known as the Least Mean Square method. It reduces the error over the entire learning and training process. In order to minimize error, it follows the gradient descent method, which requires the activation function to be continuous and differentiable.
Outstar Learning: It was first proposed by Grossberg in 1976. It uses the idea that, in a neural network arranged in layers, the weights connected to a particular node should be equal to the desired outputs of the neurons connected through those weights.
Unsupervised Learning Algorithms:
Hebbian Learning: It was proposed by Hebb in 1949 to improve the weights of nodes in a network. The change in weight is based on the input, the output, and the learning rate; the transpose of the output is needed for the weight adjustment (see the sketch after this list).
Competitive Learning: It is a winner-takes-all strategy. Here, when an input pattern is sent to the network, all the neurons in the layer compete with each other to represent the input pattern; the winner’s output is set to 1 and all the others to 0, and only the winning neuron’s weights are adjusted.
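Here is the sketch referenced in the Hebbian Learning item: one Hebbian update in NumPy, where the weight change is the learning rate times the outer product of input and output. The patterns and shapes are assumptions for illustration:

```python
import numpy as np

def hebbian_update(W, x, y, lr=0.1):
    # The weight change is proportional to the product of input and output
    # (an outer product here): connections between co-active units grow.
    return W + lr * np.outer(x, y)

x = np.array([1.0, 0.0, 1.0])  # input pattern
y = np.array([1.0, 0.0])       # observed output pattern
W = np.zeros((3, 2))           # weights from 3 input nodes to 2 output nodes
W = hebbian_update(W, x, y)
print(W)  # only weights between co-active input/output pairs have grown
```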
Understanding Neurons in Deep Learning
Neurons are a critical component of any deep learning model.
In fact, one could argue that you can't fully understand deep learning without having a deep knowledge of how neurons work.
This article will introduce you to the concept of neurons in deep learning. We'll talk about the origin
of deep learning neurons, how they were inspired by the biology of the human brain, and why neurons
are so important in deep learning models today.
What is a Neuron in Biology?
Neurons in deep learning were inspired by neurons in the human brain, which have quite an interesting structure. Groups of neurons work together inside the human brain to perform the functionality that we require in our day-to-day lives.
The question that Geoffrey Hinton asked during his seminal research in neural networks was whether
we could build computer algorithms that behave similarly to neurons in the brain. The hope was that
by mimicking the brain's structure, we might capture some of its capability.
To do this, researchers studied the way that neurons behaved in the brain. One important observation
was that a neuron by itself is useless. Instead, you require networks of neurons to generate any
meaningful functionality.
This is because neurons function by receiving and sending signals. More specifically, the
neuron's dendrites receive signals and pass along those signals through the axon.
The dendrites of one neuron are connected to the axon of another neuron. These connections are
called synapses - which is a concept that has been generalized to the field of deep learning.
What is a Neuron in Deep Learning?
Neurons in deep learning models are nodes through which data and computations flow.
Neurons work like this:
They receive one or more input signals. These input signals can come from either the raw data
set or from neurons positioned at a previous layer of the neural net.
They perform some calculations.
They send some output signals to neurons deeper in the neural net through a synapse.
A neuron in a deep learning neural net works as follows: the weighted input signals are summed, and the activation function calculates the output value for the neuron. This output value is then passed on to the next layer of the neural network through another synapse.
This serves as a broad overview of deep learning neurons. Do not worry if it was a lot to take in -
we'll learn much more about neurons in deep learning throughout this course. For now, it's sufficient
for you to have a high-level understanding of how they are structured in a deep learning model.
Final Thoughts
In this tutorial, you had your first introduction to neurons in deep learning.
Here is a brief summary of what you learned:
A quick overview of how neurons work in the human brain
How neurons work in a deep learning model
The different layers of neurons in a deep learning model
The functionality of deep learning neurons
How weights are applied to input signals within a neuron
That activation functions are applied to the weighted sum of input signals to calculate a
neuron's output value
What is Perceptron | The Simplest Artificial Neural Network
A single-layer feedforward neural network was introduced in the late 1950s by Frank Rosenblatt. It was the starting phase of Deep Learning and Artificial Neural Networks. At that time, statistical machine learning or traditional code programming was used for prediction. The perceptron is one of the first and most straightforward models of artificial neural networks. Despite being a straightforward model, the perceptron has proven to be successful in solving specific categorization issues.
What is Perceptron?
The perceptron is one of the simplest artificial neural network architectures. It was introduced by Frank Rosenblatt in 1957. It is the simplest type of feedforward neural network, consisting of a single layer of input nodes that are fully connected to a layer of output nodes. It can learn linearly separable patterns. It uses a slightly different type of artificial neuron known as the threshold logic unit (TLU), which was first introduced by McCulloch and Walter Pitts in the 1940s.
Types of Perceptron
Single-Layer Perceptron: This type of perceptron is limited to learning linearly separable patterns. It is effective for tasks where the data can be divided into distinct categories by a straight line.
Multilayer Perceptron: Multilayer perceptrons possess enhanced processing capabilities as
they consist of two or more layers, adept at handling more complex patterns and relationships
within the data.
Basic Components of Perceptron
A perceptron, the basic unit of a neural network, comprises essential components that collaborate in
information processing.
Input Features: The perceptron takes multiple input features; each input feature represents a characteristic or attribute of the input data.
Weights: Each input feature is associated with a weight, determining the significance of each
input feature in influencing the perceptron’s output. During training, these weights are
adjusted to learn the optimal values.
Summation Function: The perceptron calculates the weighted sum of its inputs using the
summation function. The summation function combines the inputs with their respective
weights to produce a weighted sum.
Activation Function: The weighted sum is then passed through an activation function. The perceptron uses the Heaviside step function, which takes the summed value as input, compares it with the threshold, and provides the output as 0 or 1.
Output: The final output of the perceptron is determined by the activation function’s result. For example, in binary classification problems, the output might represent a predicted class (0 or 1).
Bias: A bias term is often included in the perceptron model. The bias allows the model to
make adjustments that are independent of the input. It is an additional parameter that is
learned during training.
Learning Algorithm (Weight Update Rule): During training, the perceptron learns by
adjusting its weights and bias based on a learning algorithm. A common approach is the
perceptron learning algorithm, which updates weights based on the difference between the
predicted output and the true output.
These components work together to enable a perceptron to learn and make predictions. While a single
perceptron can perform binary classification, more complex tasks require the use of multiple
perceptrons organized into layers, forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the significance of that input to the
output. The perceptron’s output is a weighted sum of the inputs that have been run through an
activation function to decide whether or not the perceptron will fire. It computes the weighted sum of its inputs as:

z = w1x1 + w2x2 + ... + wnxn = X^T W
The activation function that perceptrons use most frequently is the step function, which compares the weighted sum to a threshold and outputs 1 if the input is larger than the threshold value and 0 otherwise. The most common step function used in perceptrons is the Heaviside step function. A perceptron has a single layer of threshold logic units, with each TLU connected to all inputs.
A perceptron is trained with the perceptron learning rule, which updates each connection weight as:

wi,j = wi,j + η (yj - ŷj) xi

where yj and ŷj are the jth actual and predicted values, and η is the learning rate.
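Putting these pieces together, here is a minimal perceptron trained with the learning rule above on a toy, linearly separable dataset (an OR gate); the data, learning rate, and epoch count are made up for illustration:

```python
import numpy as np

def heaviside(z):
    # Heaviside step function: 1 if the net input reaches 0, else 0
    return (z >= 0).astype(float)

# Toy linearly separable data: OR gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

w = np.zeros(2)
b = 0.0
eta = 0.1  # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        z = xi @ w + b        # weighted sum of inputs
        y_hat = heaviside(z)  # step activation
        # Perceptron learning rule: adjust by the prediction error
        w += eta * (target - y_hat) * xi
        b += eta * (target - y_hat)

print(heaviside(X @ w + b))  # [0. 1. 1. 1.] once the data is separated
```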
Backpropagation in Neural Network
Machine learning models learn from data and make predictions. One of the fundamental concepts behind training these models is backpropagation.
A neural network is a network of computing units (neurons) that together compute a function. The neurons are connected with the help of edges, each neuron has an assigned activation function, and the network contains adjustable parameters. These adjustable parameters help the neural network determine the function that needs to be computed by the network. In terms of the activation function in neural networks, the higher the activation value, the greater the activation.
What is backpropagation?
In machine learning, backpropagation is an effective algorithm used to train artificial neural networks, especially feed-forward neural networks.
Backpropagation is an iterative algorithm that helps to minimize the cost function by determining which weights and biases should be adjusted. During every epoch, the model learns by adapting the weights and biases to minimize the loss, moving down along the gradient of the error. It is therefore paired with one of the popular optimization algorithms, such as gradient descent or stochastic gradient descent.
Computing the gradient in the backpropagation algorithm helps to minimize the cost function, and it is implemented using the chain rule from calculus to navigate through the layers of the neural network.
Δwij = η δj Oi
δj = Oj (1 - Oj)(tj - Oj)      (if j is an output unit)
δj = Oj (1 - Oj) Σk δk wkj     (if j is a hidden unit)
where
η is the constant that is considered the learning rate,
tj is the correct output for unit j,
δj is the error measure for unit j, and
Oi is the output of unit i feeding into unit j.
Step 3: To calculate the backpropagation, we need to start from the output unit. (The forward pass of the earlier steps, whose diagram is not shown, produced the hidden outputs y3 = 0.56 and y4 = 0.59, the network output y5 = 0.67, and the target output ytarget = 0.5.)
To compute δ5, we need to use the output of the forward pass:
δ5 = y5 (1 - y5)(ytarget - y5)
   = 0.67 (1 - 0.67)(-0.17)
   = -0.0376
For the hidden units, we will take the value of δ5:
δ3 = y3 (1 - y3)(w1,3 × δ5)
   = 0.56 (1 - 0.56)(0.3 × -0.0376)
   = -0.0027
δ4 = y4 (1 - y4)(w2,3 × δ5)
   = 0.59 (1 - 0.59)(0.9 × -0.0376)
   = -0.0082
Step 4: We need to update the weights, from the output unit to the hidden units:
Δwi,j = η δj Oi
Note: here our learning rate η is 1.
Δw2,3 = η δ5 O4
      = 1 × (-0.0376) × 0.59
      = -0.0222
We will be updating the weights based on the old weights of the network:
Δw1,1 = η δ3 x1    (taking x1 = 0.35, the input feeding unit 3 in this example)
      = 1 × (-0.0027) × 0.35
      = -0.000945
Similarly, we need to calculate the new weight values using the old ones. Once the above process is done, we again perform the forward pass to find whether we obtain the actual output of 0.5.
While performing the forward pass again, we obtain the following values:
y3 = 0.57
y4 = 0.56
y5 = 0.61
We can clearly see that our y5 value is 0.61, which is not the expected actual output, so again we need to find the error and backpropagate through the network, updating the weights until the actual output is obtained.
Error = ytarget - y5
      = 0.5 - 0.61
      = -0.11
This is how backpropagation works: it performs the forward pass first to see whether we obtain the expected output; if not, it finds the error and then backpropagates through the layers of the network, adjusting the weights according to the error. This process continues until the expected output is obtained by the neural network. A complete runnable sketch of this loop follows.
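Since the network diagram for the worked example is not shown, here is a self-contained NumPy sketch of the same procedure (forward pass, delta computation, and weight updates) on an assumed 2-2-1 sigmoid network. The inputs and initial weights are assumptions, so the numbers will not exactly match the example above; the mechanics are the point:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed setup: 2 inputs -> 2 hidden units (y3, y4) -> 1 output unit (y5)
x = np.array([0.35, 0.7])    # assumed input values
W1 = np.array([[0.2, 0.2],   # assumed input-to-hidden weights
               [0.3, 0.3]])
W2 = np.array([0.3, 0.9])    # hidden-to-output weights, as in the example
y_target = 0.5
eta = 1.0                    # learning rate, as in the example

for step in range(50):
    # Forward pass
    h = sigmoid(x @ W1)      # hidden outputs (y3, y4)
    y5 = sigmoid(h @ W2)     # network output

    # Backward pass: deltas from the formulas above
    delta5 = y5 * (1 - y5) * (y_target - y5)  # output unit
    delta_h = h * (1 - h) * (W2 * delta5)     # hidden units

    # Weight updates: delta_w(i,j) = eta * delta_j * O_i
    W2 += eta * delta5 * h
    W1 += eta * np.outer(x, delta_h)

print(float(sigmoid(sigmoid(x @ W1) @ W2)))  # moves toward the target 0.5
```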
Types Of Learning Rules in ANN
A learning rule enhances the Artificial Neural Network’s performance by being applied over the network. The learning rule updates the weights and bias levels of a network when certain conditions are met in the training process. It is a crucial part of the development of a Neural Network.
Out Star Learning Rule: It is implemented when nodes in a network are arranged in a layer. Here the weights linked to a particular node should be equal to the targeted outputs of the nodes connected through those same weights. The weight change is thus calculated as:

Δw = α (t - y)

where α is the learning rate, y is the actual output, and t is the desired output for the n layer nodes.
Competitive Learning Rule
It is also known as the winner-takes-all rule and is unsupervised in nature. Here all the output nodes compete with each other to represent the input pattern; the winner is declared as the node with the highest output and is given the output 1, while the rest are given 0.
There is a set of neurons with arbitrarily distributed weights, and the activation function is applied to a subset of neurons. Only one neuron is active at a time. Only the winner has its weights updated; the rest remain unchanged. A sketch of one such step follows.
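Here is that sketch: one winner-takes-all step in NumPy, with made-up weights and input. Only the winning neuron’s weights move toward the input pattern:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((3, 4))              # 3 output neurons, 4 input features
x = np.array([0.9, 0.1, 0.8, 0.2])  # input pattern
alpha = 0.5                         # learning rate

# All neurons "compete": the one whose weights best match the input wins
winner = np.argmax(W @ x)

outputs = np.zeros(3)
outputs[winner] = 1.0               # winner outputs 1, the rest 0
print(outputs)

# Only the winner's weights are updated, moving them toward the input
W[winner] += alpha * (x - W[winner])
```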
Activation functions in Neural Networks
In the process of building a neural network, one of the choices you get to make is what Activation
Function to use in the hidden layer as well as at the output layer of the network.
What is an Activation Function?
An activation function in the context of neural networks is a mathematical function applied to the
output of a neuron. The purpose of an activation function is to introduce non-linearity into the model,
allowing the network to learn and represent complex patterns in the data. Without non-linearity, a
neural network would essentially behave like a linear regression model, regardless of the number of
layers it has.
The activation function decides whether a neuron should be activated or not by operating on the weighted sum of the neuron’s inputs plus the bias, thereby introducing non-linearity into the neuron’s output.
Explanation: We know the neural network has neurons that work in correspondence with weights, biases, and their respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as backpropagation. Activation functions make backpropagation possible since the gradients are supplied along with the error to update the weights and biases.
Elements of a Neural Network
Input Layer: This layer accepts input features. It provides information from the outside world to the network; no computation is performed at this layer, and the nodes here just pass on the information (features) to the hidden layer.
Hidden Layer: Nodes of this layer are not exposed to the outer world, they are part of the abstraction
provided by any neural network. The hidden layer performs all sorts of computation on the features
entered through the input layer and transfers the result to the output layer.
Output Layer: This layer brings the information learned by the network out to the outer world.
Why do we need Non-linear activation function?
A neural network without an activation function is essentially just a linear regression model. The activation function performs the non-linear transformation of the input, making the network capable of learning and performing more complex tasks.
Mathematical proof
Suppose we have a neural net whose layers apply no non-linearity, so each layer computes only a weighted sum of its inputs. Then the composition of layers is itself linear: if the first layer computes W1·x and the second computes W2·(W1·x), this equals (W2·W1)·x, which a single linear layer could compute directly. Hence, without a non-linear activation function, stacking layers adds no representational power.
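A minimal numeric check of this collapse, assuming NumPy; the weights are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))  # first "layer" of weights
W2 = rng.normal(size=(4, 2))  # second "layer" of weights
x = rng.normal(size=3)        # an input vector

two_layers = (x @ W1) @ W2    # two stacked linear layers...
one_layer = x @ (W1 @ W2)     # ...collapse into a single linear layer
print(np.allclose(two_layers, one_layer))  # True
```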
The activation that works almost always better than the sigmoid function is the Tanh function, also known as the Tangent Hyperbolic function. It is actually a mathematically shifted version of the sigmoid function. Both are similar and can be derived from each other.
Equation:
f(x) = tanh(x) = 2/(1 + e^(-2x)) - 1
OR
tanh(x) = 2 × sigmoid(2x) - 1
Value Range: -1 to +1
Nature: non-linear
Uses: Usually used in hidden layers of a neural network, as its values lie between -1 and 1; hence the mean for the hidden layer comes out to be 0 or very close to it. This helps in centering the data by bringing the mean close to 0, which makes learning for the next layer much easier.
ReLU Function
It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of a neural network.
Equation: A(x) = max(0, x). It gives an output x if x is positive and 0 otherwise.
Value Range: [0, inf)
Nature: non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons being activated by the ReLU function.
Uses: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At a time, only a few neurons are activated, making the network sparse and hence efficient and easy for computation.
In simple words, ReLU learns much faster than the sigmoid and Tanh functions.
Softmax Function
The softmax function is also a type of sigmoid function but is handy when we are trying to handle multi-class classification problems.
Nature: non-linear
Uses: Usually used when trying to handle multiple classes. The softmax function is commonly found in the output layer of image classification problems. The softmax function squeezes the output for each class between 0 and 1 and divides by the sum of the outputs.
Output: The softmax function is ideally used in the output layer of the classifier, where we are actually trying to attain the probabilities that define the class of each input.
The basic rule of thumb is: if you really don’t know what activation function to use, then simply use ReLU, as it is a general activation function for hidden layers and is used in most cases these days.
If your output is for binary classification, then the sigmoid function is a very natural choice for the output layer.
If your output is for multi-class classification, then softmax is very useful for predicting the probabilities of each class.
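The activation functions discussed above, transcribed into NumPy. This is a straightforward reading of the formulas, not any particular library’s API; the sample input is made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # range (0, 1)

def tanh(z):
    return 2.0 * sigmoid(2.0 * z) - 1.0   # range (-1, 1), shifted sigmoid

def relu(z):
    return np.maximum(0.0, z)             # max(0, x)

def softmax(z):
    e = np.exp(z - np.max(z))             # subtract max for numerical stability
    return e / e.sum()                    # outputs sum to 1, like probabilities

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")
```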
Understanding Loss Function in Deep Learning
Machine learning allows for prediction, classification, and decisions derived from data. In research, machine learning is part of artificial intelligence, and the process of developing a computational model with capabilities mimicking human intelligence. Machine learning and related methods involve developing algorithms that recognize patterns in the available information and perform prediction or classification; the loss function is what measures how well they do so.
What Are Loss Functions in Machine Learning?
The loss function helps determine how effectively your algorithm models the dataset. Loss is the measure of how far your model’s predictions are from the expected results. Losses generally fall into two broad categories relating to real-world problems: classification and regression. In classification, we must predict a probability for each class the problem is concerned with. In regression, however, we have the task of forecasting a continuous value for a specific group of independent features.
What is Loss Function in Deep Learning?
In mathematical optimization and decision theory, a loss or cost function (sometimes also called an
error function) is a function that maps an event or values of one or more variables onto a real number
intuitively representing some “cost” associated with the event.
In simple terms, the Loss function is a method of evaluating how well your algorithm is modeling
your dataset. It is a mathematical function of the parameters of the machine learning algorithm.
In simple linear regression, the prediction is calculated using the slope (m) and intercept (b). The loss function for this is (Yi - Ŷi)^2, i.e., the loss function is a function of the slope and intercept.
Regression loss functions like the MSE loss function are commonly used in evaluating the
performance of regression models. Additionally, objective functions play a crucial role in optimizing
machine learning models by minimizing the loss or cost. Other commonly used loss functions include
the Huber loss function, which combines the characteristics of the MSE and MAE loss functions,
providing robustness to outliers in the data.
1. Mean Squared Error / L2 Loss Function
The Mean Squared Error (MSE) calculates the average of the squared differences between the actual values and the model’s predictions.
Advantage
Easy Interpretation: The MSE is straightforward to understand.
Always Differentiable: Due to the squaring, it is always differentiable.
Single Local Minimum: It has only one local minimum.
Disadvantage
Error Unit in Squared Form: The error is measured in squared units, which might not be
intuitively interpretable.
Not Robust to Outliers: MSE is sensitive to outliers.
Note: In regression tasks, at the last neuron, it’s common to use a linear activation function.
2. Mean Absolute Error / L1 Loss Function
The Mean Absolute Error (MAE) is another simple loss function. It calculates the average absolute
difference between the actual value and the model prediction across the dataset.
Advantage
Intuitive and Easy: MAE is easy to grasp.
Error Unit Matches Output Column: The error unit is the same as the output column.
Robust to Outliers: MAE is less affected by outliers.
Disadvantage
Not Differentiable Everywhere: The MAE graph is not differentiable at zero, so gradient descent cannot be applied directly; subgradient calculation is an alternative.
Note: In regression tasks, at the last neuron, a linear activation function is commonly used.
3. Huber Loss
The Huber loss is used in robust regression and is less sensitive to outliers compared to squared error loss. It behaves like MSE for small errors and like MAE for large ones:

L(yi, ŷi) = 0.5 (yi - ŷi)^2          if |yi - ŷi| <= δ
L(yi, ŷi) = δ |yi - ŷi| - 0.5 δ^2    otherwise

where
yi – actual values
ŷi – Neural Network prediction

Classification Loss Functions
1. Binary Cross Entropy
Binary cross entropy is used for binary classification; it compares the predicted probability with the actual class label (0 or 1).
Advantage
The cost function is differentiable.
Disadvantage
Multiple local minima.
Not intuitive.
Note: In classification tasks, at the last neuron, use the sigmoid activation function.
2. Categorical Cross Entropy
Categorical cross entropy is used for multiclass classification and softmax regression. The loss function is:

Loss = - Σj yj log(ŷj),  summed over the k classes

where
k is the number of classes,
yj is the actual value, and
ŷj is the Neural Network prediction.

Note: In multi-class classification, at the last neuron, use the softmax activation function.
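To close the section, here are the loss functions above transcribed into NumPy; y denotes actual values and y_hat the predictions, and the sample arrays are made up:

```python
import numpy as np

def mse(y, y_hat):
    # Mean Squared Error: average of squared differences
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    # Mean Absolute Error: average of absolute differences
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    # Quadratic for small errors, linear for large ones
    err = np.abs(y - y_hat)
    return np.mean(np.where(err <= delta,
                            0.5 * err ** 2,
                            delta * err - 0.5 * delta ** 2))

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    # y is one-hot over k classes: Loss = -sum_j y_j * log(y_hat_j)
    return -np.sum(y * np.log(np.clip(y_hat, eps, 1.0)))

y_reg = np.array([3.0, 5.0])
y_reg_hat = np.array([2.5, 5.5])
print(mse(y_reg, y_reg_hat), mae(y_reg, y_reg_hat), huber(y_reg, y_reg_hat))

y_cls = np.array([0.0, 0.0, 1.0])
y_cls_hat = np.array([0.2, 0.1, 0.7])
print(categorical_cross_entropy(y_cls, y_cls_hat))
```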