Chap 1 DL

CHAP 1

DEEP LEARNING
Artificial Neural Networks and Their Applications
As you read this article, which organ in your body is thinking about it? It’s the brain of course! But do
you know how the brain works? Well, it has neurons or nerve cells that are the primary units of both
the brain and the nervous system. These neurons receive sensory input from the outside world which
they process and then provide the output which might act as the input to the next neuron.
Each of these neurons is connected to other neurons in complex arrangements at synapses. Now, are
you wondering how this is related to Artificial Neural Networks? Artificial Neural Networks are
modeled after the neurons in the human brain. Let’s check out what they are in detail and how they
learn information.
Artificial Neural Networks
Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged
in a series of layers that together constitute the whole Artificial Neural Network. A layer can have
anywhere from a dozen units to millions of units, depending on how complex the network needs to be
to learn the hidden patterns in the dataset. Commonly, an Artificial Neural Network has an input
layer, an output layer, and one or more hidden layers. The input layer receives data from the outside
world that the neural network needs to analyze or learn about. This data then passes through one or
more hidden layers that transform the input into a representation that is useful for the output layer.
Finally, the output layer provides the network’s response to the input data.
In the majority of neural networks, units are interconnected from one layer to another. Each of these
connections has a weight that determines the influence of one unit on another. As the data passes
from one layer to the next, each layer transforms it further, which eventually results in an output
from the output layer.

Neural Networks Architecture


The structures and operations of human neurons serve as the basis for artificial neural networks,
which are also known as neural networks or neural nets. The input layer of an artificial neural
network is the first layer; it receives input from external sources and passes it to the hidden layer,
which is the second layer. In the hidden layer, each neuron receives input from the neurons in the
previous layer, computes the weighted sum, and sends it to the neurons in the next layer. The
connections are weighted, meaning that the effect of each input from the previous layer is scaled up
or down by its own weight. These weights are adjusted during the training process to improve model
performance.
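The layer-by-layer flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network shape (3 inputs, 2 hidden units, 1 output), the weight values, the bias terms, and the choice of a sigmoid activation are all made up for the example, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1); an assumed activation choice.
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron takes the weighted sum of the
    previous layer's outputs, adds its bias, and applies the activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs

# Hypothetical network: 3 inputs -> 2 hidden units -> 1 output unit.
x = [0.5, -1.0, 2.0]
hidden_weights = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]  # one row per hidden unit
hidden_biases = [0.0, 0.1]
output_weights = [[0.7, -0.6]]                          # one row per output unit
output_biases = [0.05]

hidden = layer_forward(x, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
print("hidden layer:", hidden)
print("output layer:", output)
```

Each row of a weight list belongs to one neuron, so changing a single number changes how strongly one unit influences another, which is exactly what training adjusts.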
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from biological neurons found in animal brains, so
the two share many similarities in structure and function.
• Structure: The structure of artificial neural networks is inspired by biological neurons. A
biological neuron has a cell body or soma to process the impulses, dendrites to receive them,
and an axon that transfers them to other neurons. The input nodes of artificial neural
networks receive input signals, the hidden layer nodes compute these input signals, and the
output layer nodes compute the final output by processing the hidden layer’s results using
activation functions.

Biological Neuron | Artificial Neuron
Dendrite | Inputs
Cell nucleus or soma | Nodes
Synapses | Weights
Axon | Output

• Synapses: Synapses are the links between biological neurons that enable the transmission of
impulses from one neuron to the next. In artificial neural networks, their counterpart is the
weights that join the nodes of one layer to the nodes of the next layer. The strength of each link
is determined by the weight value.
• Learning: In biological neurons, impulses are processed in the cell body or soma, which
contains the nucleus. An action potential is produced and travels through the axon if the
impulses are powerful enough to reach the threshold. Learning is made possible by synaptic
plasticity, which is the ability of synapses to become stronger or weaker over time in reaction
to changes in their activity. In artificial neural networks, backpropagation is the technique
used for learning; it adjusts the weights between nodes according to the error, that is, the
difference between predicted and actual outcomes.

Biological Neuron | Artificial Neuron
Synaptic plasticity | Backpropagation

• Activation: In biological neurons, activation is the firing of the neuron, which happens
when the impulses are strong enough to reach the threshold. In artificial neural networks, a
mathematical function known as an activation function maps the input to the output and
performs the activation.
How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example, suppose you want to teach an
ANN to recognize a cat. It is shown thousands of different images of cats so that the network
can learn to identify a cat. Once the neural network has been trained enough using images of cats,
you need to check whether it can identify cat images correctly. This is done by making the ANN
classify the images it is given, deciding whether each one is a cat image or not. The output of
the ANN is compared against a human-provided label stating whether the image is a cat image or
not. If the ANN identifies an image incorrectly, backpropagation is used to adjust what it has
learned during training. Backpropagation works by fine-tuning the weights of the connections between
ANN units based on the error obtained. This process continues until the artificial neural network can
recognize a cat in an image with the lowest possible error rate.
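The predict, compare, and adjust cycle described above can be sketched with a toy example. Everything specific here is an assumption for illustration: the two made-up features, the single sigmoid neuron, the learning rate, and the function names. With only one neuron there are no hidden layers, so the weight update below is the single-layer special case of what backpropagation does in a deeper network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, learning_rate=0.5, epochs=2000):
    """Repeatedly predict, measure the error against the human-provided
    label, and nudge each weight in the direction that reduces the error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            prediction = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = prediction - target  # positive if the guess was too high
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, x):
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0

# Hypothetical features per image: [has_pointy_ears, has_whiskers].
# In this toy data an image is labeled "cat" (1) only when both are present.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 0, 1]

weights, bias = train(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

A real image classifier would learn from pixel data with many layers, but the loop has the same shape: the size of each correction is proportional to the error.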
What are the types of Artificial Neural Networks?
• Feedforward Neural Network: The feedforward neural network is one of the most basic
artificial neural networks. In this ANN, the data or the input provided travels in a single
direction. It enters the ANN through the input layer and exits through the output layer,
while hidden layers may or may not exist. So signals in a feedforward neural network only
propagate forward, with no feedback loops. Such networks are nevertheless commonly trained
with backpropagation, which passes the error backward only during training.
• Convolutional Neural Network: A convolutional neural network has some similarities to
the feedforward neural network, in that the connections between units have weights that
determine the influence of one unit on another unit. But a CNN has one or more
convolutional layers that apply a convolution operation to the input and then pass the result
as output to the next layer. CNNs have applications in speech and image
processing and are particularly useful in computer vision.
• Modular Neural Network: A Modular Neural Network contains a collection of different
neural networks that work independently towards obtaining the output with no interaction
between them. Each of the different neural networks performs a different sub-task by
obtaining unique inputs compared to other networks. The advantage of this modular neural
network is that it breaks down a large and complex computational process into smaller
components, thus decreasing its complexity while still obtaining the required output.
• Radial Basis Function Neural Network: Radial basis functions are functions that
depend on the distance of a point from a center. RBF networks have two layers. In the
first layer, the input is mapped onto the radial basis functions in the hidden layer, and then
the output layer computes the output in the next step. Radial basis function nets are normally
used to model data that represents an underlying trend or function.
• Recurrent Neural Network: The Recurrent Neural Network saves the output of a layer and
feeds this output back to the input to better predict the outcome of the layer. The first layer in
the RNN is quite similar to the feedforward neural network, and the recurrence
starts once the output of the first layer is computed. After this layer, each unit remembers
some information from the previous step so that it can act as a memory cell in performing
computations.
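As an illustration of the convolution operation mentioned for CNNs above, here is a minimal one-dimensional sketch. The signal, the kernel values, and the function name are made up for the example. Like most deep learning libraries, it slides the kernel without flipping it (strictly speaking a cross-correlation), and in a real CNN the kernel values would be learned rather than fixed by hand.

```python
def convolve1d(signal, kernel):
    """Slide the kernel over the signal and take a weighted sum at each
    position (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [-1, 1]  # responds where the signal rises or falls

print(convolve1d(signal, edge_kernel))  # [0, 1, 0, 0, -1, 0]
```

The same small set of weights is reused at every position, which is why a convolutional layer can detect a pattern wherever it appears in the input.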
Applications of Artificial Neural Networks
1. Social Media: Artificial Neural Networks are used heavily in social media. For example,
consider the ‘People you may know’ feature on Facebook, which suggests people that you
might know in real life so that you can send them friend requests. This effect is
achieved by using Artificial Neural Networks that analyze your profile, your interests, your
current friends, their friends, and various other factors to estimate which people you
might know. Another common application of machine learning in social media
is facial recognition. This is done by finding around 100 reference points on the person’s
face and then matching them with those already available in the database using convolutional
neural networks.
2. Marketing and Sales: When you log onto e-commerce sites like Amazon and Flipkart, they
recommend products for you to buy based on your previous browsing history. Similarly,
if you love pasta, then Zomato, Swiggy, etc. will show you restaurant
recommendations based on your tastes and previous order history. This is true across all
new-age marketing segments like book sites, movie services, hospitality sites, etc., and it is
done by implementing personalized marketing. This uses Artificial Neural Networks to identify
the customer’s likes, dislikes, previous shopping history, etc., and then tailor the marketing
campaigns accordingly.
3. Healthcare: Artificial Neural Networks are used in oncology to train algorithms that can
identify cancerous tissue at the microscopic level with accuracy comparable to that of trained
physicians. Various rare diseases may manifest in physical characteristics and can be identified
in their early stages by using facial analysis on patient photos. The full-scale
implementation of Artificial Neural Networks in the healthcare environment can therefore enhance
the diagnostic abilities of medical experts and ultimately improve
the quality of medical care all over the world.
4. Personal Assistants: You have probably heard of Siri, Alexa, Cortana, etc., and used
one of them on your phone. These are personal assistants and an example of speech
recognition that uses Natural Language Processing to interact with users and formulate a
response accordingly. Natural Language Processing uses artificial neural networks that
handle many tasks of these personal assistants, such as managing the language syntax,
semantics, correct speech, and the ongoing conversation.
Deep Learning
In the fast-evolving era of artificial intelligence, Deep Learning stands as a cornerstone technology,
changing how machines understand, learn from, and interact with complex data. At its essence, Deep
Learning is loosely modeled on the neural networks of the human brain, enabling computers to
discover patterns and make decisions from vast amounts of unstructured data. This
field has driven breakthroughs across various domains, from computer vision and
natural language processing to healthcare diagnostics and autonomous driving.
As we dive into this introductory exploration of Deep Learning, we uncover its foundational
principles, applications, and the underlying mechanisms that empower machines to achieve human-
like cognitive abilities. This article serves as a gateway into understanding how Deep Learning is
reshaping industries, pushing the boundaries of what’s possible in AI, and paving the way for a future
where intelligent systems can perceive, comprehend, and innovate autonomously.
What is Deep Learning?
Deep learning is the branch of machine learning that is based on artificial
neural network architecture. An artificial neural network or ANN uses layers of interconnected nodes
called neurons that work together to process and learn from the input data.
In a fully connected deep neural network, there is an input layer followed by one or more hidden
layers connected one after the other, and finally an output layer. Each neuron receives input from
the neurons of the previous layer or from the input layer. The output of one neuron becomes the
input to other neurons in the next layer of the network, and this process continues until the final
layer produces the output of the network. The layers of the neural network transform the input data
through a series of nonlinear transformations, allowing the network to learn complex representations
of the input data.
Today deep learning has become one of the most popular and visible areas of machine learning,
due to its success in a variety of applications, such as computer vision, natural language processing,
and reinforcement learning.
Deep learning can be used for supervised, unsupervised, and reinforcement machine learning,
and it processes data differently in each setting.
• Supervised Machine Learning: Supervised machine learning is the machine
learning technique in which the neural network learns to make predictions or classify data
based on labeled datasets. Here we provide both the input features and the target
variables. The neural network learns to make predictions based on the cost or error that comes
from the difference between the predicted and the actual target; propagating this error
backward through the network to adjust the weights is known as backpropagation. Deep learning
algorithms like convolutional neural networks and recurrent neural networks are used for many
supervised tasks like image classification and recognition, sentiment analysis, language
translation, etc.
• Unsupervised Machine Learning: Unsupervised machine learning is the machine
learning technique in which the neural network learns to discover patterns or to cluster the
dataset based on unlabeled data. Here there are no target variables, and the machine has
to determine the hidden patterns or relationships within the dataset by itself. Deep learning
algorithms like autoencoders and generative models are used for unsupervised tasks like
clustering, dimensionality reduction, and anomaly detection.
• Reinforcement Machine Learning: Reinforcement machine learning is the machine
learning technique in which an agent learns to make decisions in an environment to maximize
a reward signal. The agent interacts with the environment by taking actions and observing the
resulting rewards. Deep learning can be used to learn policies, that is, rules for choosing
actions, that maximize the cumulative reward over time. Deep reinforcement learning algorithms
like Deep Q-Networks and Deep Deterministic Policy Gradient (DDPG) are used for tasks
like robotics and game playing.
Artificial neural networks
Artificial neural networks are built on the principles of the structure and operation of human neurons.
They are also known as neural networks or neural nets. An artificial neural network’s input layer,
which is the first layer, receives input from external sources and passes it on to the hidden layer,
which is the second layer. Each neuron in the hidden layer gets information from the neurons in the
previous layer, computes the weighted sum, and then transfers it to the neurons in the next layer.
These connections are weighted, which means that the impact of each input from the preceding layer
is scaled up or down by its own weight. These weights are then adjusted during the training
process to enhance the performance of the model.

Fully Connected Artificial Neural Network


Artificial neurons, also known as units, are found in artificial neural networks. The whole Artificial
Neural Network is composed of these artificial neurons, which are arranged in a series of layers.
Whether a layer has a dozen units or millions of units depends on the complexity of the underlying
patterns in the dataset. Commonly, an Artificial Neural Network has an input layer, an output layer,
and one or more hidden layers. The input layer receives data from the outside world that the neural
network needs to analyze or learn about.
In a fully connected artificial neural network, there is an input layer followed by one or more
hidden layers connected one after the other. Each neuron receives input from the neurons of the
previous layer or from the input layer. The output of one neuron becomes the input to other neurons
in the next layer of the network, and this process continues until the final layer produces the
output of the network. After passing through one or more hidden layers, the data has been
transformed into a form that is useful for the output layer. Finally, the output layer provides the
artificial neural network’s response to the incoming data.
In most neural networks, units are linked to one another from one layer to the next. Each of these
links has a weight that controls how much one unit influences another. The data is transformed
further as it moves from one layer to the next, ultimately producing an output from the
output layer.
Difference between Machine Learning and Deep Learning:
Machine learning and deep learning are both subsets of artificial intelligence, but there are many
similarities and differences between them.

Machine Learning | Deep Learning
Applies statistical algorithms to learn the hidden patterns and relationships in the dataset. | Uses artificial neural network architecture to learn the hidden patterns and relationships in the dataset.
Can work on a smaller amount of data. | Requires a larger volume of data compared to machine learning.
Better for low-label tasks. | Better for complex tasks like image processing, natural language processing, etc.
Takes less time to train the model. | Takes more time to train the model.
A model is created from relevant features that are manually extracted from images to detect an object in the image. | Relevant features are automatically extracted from images. It is an end-to-end learning process.
Less complex, and the result is easy to interpret. | More complex; it works like a black box, and the result is not easy to interpret.
Can work on a CPU and requires less computing power compared to deep learning. | Requires a high-performance computer with a GPU.

Types of neural networks


Deep Learning models are able to automatically learn features from the data, which makes them well-
suited for tasks such as image recognition, speech recognition, and natural language processing. The
most widely used architectures in deep learning are feedforward neural networks, convolutional
neural networks (CNNs), and recurrent neural networks (RNNs).
1. Feedforward neural networks (FNNs) are the simplest type of ANN, with a linear flow of
information through the network. FNNs have been widely used for tasks such as image
classification, speech recognition, and natural language processing.
2. Convolutional Neural Networks (CNNs) are designed specifically for image and video
recognition tasks. CNNs are able to automatically learn features from the images, which makes
them well-suited for tasks such as image classification, object detection, and image segmentation.
3. Recurrent Neural Networks (RNNs) are a type of neural network that is able to process
sequential data, such as time series and natural language. RNNs are able to maintain an
internal state that captures information about the previous inputs, which makes them well-
suited for tasks such as speech recognition, natural language processing, and language
translation.
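The internal state that an RNN carries from one input to the next can be illustrated with a deliberately tiny sketch. The assumptions here are loud ones: a single scalar hidden unit, hand-picked weights, a tanh activation, and hypothetical function names. Real RNNs use weight matrices and learn their values during training.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One time step: the new state mixes the current input with the
    previous state, so earlier inputs keep influencing later outputs."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0  # the state starts empty
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay above zero:
# the network "remembers" that it saw something at the first step.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

A feedforward network given the same zeros at steps two and three would produce the same output as if the first input had never occurred, which is the difference the text above describes.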
Deep Learning Applications:
The main applications of deep learning can be divided into computer vision, natural language
processing (NLP), and reinforcement learning.
1. Computer vision
The first application of deep learning is computer vision, where deep learning
models enable machines to identify and understand visual data. Some of the main applications of
deep learning in computer vision include:
• Object detection and recognition: Deep learning models can be used to identify and locate
objects within images and videos, which supports applications such as
self-driving cars, surveillance, and robotics.
• Image classification: Deep learning models can be used to classify images into categories
such as animals, plants, and buildings. This is used in applications such as medical imaging,
quality control, and image retrieval.
• Image segmentation: Deep learning models can be used to segment images into
different regions, making it possible to identify specific features within images.
2. Natural language processing (NLP):
The second application of deep learning is NLP, where deep learning models enable
machines to understand and generate human language. Some of the main applications of deep learning
in NLP include:
• Automatic text generation: Deep learning models can learn from a corpus of text, and new text
such as summaries and essays can be generated automatically using these trained models.
• Language translation: Deep learning models can translate text from one language to
another, making it possible to communicate with people from different linguistic
backgrounds.
• Sentiment analysis: Deep learning models can analyze the sentiment of a piece of text,
making it possible to determine whether the text is positive, negative, or neutral. This is used
in applications such as customer service, social media monitoring, and political analysis.
• Speech recognition: Deep learning models can recognize and transcribe spoken words,
making it possible to perform tasks such as speech-to-text conversion, voice search, and
voice-controlled devices.
3. Reinforcement learning:
In reinforcement learning, deep learning is used to train agents to take actions in an environment
so as to maximize a reward. Some of the main applications of deep learning in reinforcement learning
include:
• Game playing: Deep reinforcement learning models have been able to beat human experts at
games such as Go, Chess, and Atari.
• Robotics: Deep reinforcement learning models can be used to train robots to perform
complex tasks such as grasping objects, navigation, and manipulation.
• Control systems: Deep reinforcement learning models can be used to control complex
systems such as power grids, traffic management, and supply chain optimization.
Challenges in Deep Learning
Deep learning has made significant advancements in various fields, but there are still some challenges
that need to be addressed. Here are some of the main challenges in deep learning:
1. Data availability: Deep learning requires large amounts of data to learn from, so gathering
enough data for training is a major concern.
2. Computational resources: Training a deep learning model is computationally
expensive because it requires specialized hardware like GPUs and TPUs.
3. Time-consuming: Training, particularly on sequential data, can take a very long time,
from days to months, depending on the available computational resources.
4. Interpretability: Deep learning models are complex and work like a black box, so it is very
difficult to interpret the result.
5. Overfitting: When the model is trained again and again, it becomes too specialized for the
training data, leading to overfitting and poor performance on new data.
Advantages of Deep Learning:
1. High accuracy: Deep Learning algorithms can achieve state-of-the-art performance in
various tasks, such as image recognition and natural language processing.
2. Automated feature engineering: Deep Learning algorithms can automatically discover and
learn relevant features from data without the need for manual feature engineering.
3. Scalability: Deep Learning models can scale to handle large and complex datasets, and can
learn from massive amounts of data.
4. Flexibility: Deep Learning models can be applied to a wide range of tasks and can handle
various types of data, such as images, text, and speech.
5. Continual improvement: Deep Learning models can continually improve their performance
as more data becomes available.
Disadvantages of Deep Learning:
1. High computational requirements: Deep Learning models require large amounts of data
and computational resources to train and optimize.
2. Requires large amounts of labeled data: Deep Learning models often require a large
amount of labeled data for training, which can be expensive and time-consuming to acquire.
3. Interpretability: Deep Learning models can be challenging to interpret, making it difficult to
understand how they make decisions.
4. Overfitting: Deep Learning models can sometimes overfit to the training data, resulting in
poor performance on new and unseen data.
5. Black-box nature: Deep Learning models are often treated as black boxes, making it difficult
to understand how they work and how they arrived at their predictions.
Conclusion
In conclusion, the field of Deep Learning represents a major advance in artificial intelligence.
Loosely inspired by the human brain’s neural networks, Deep Learning algorithms have transformed
industries ranging from healthcare to finance, from autonomous vehicles to natural language
processing. As computational power and dataset sizes continue to grow, the range of
potential applications of Deep Learning keeps widening. However, challenges such as interpretability
and ethical considerations remain significant. Yet, with ongoing research and innovation, Deep
Learning promises to reshape our future, ushering in a new era where machines can learn, adapt, and
solve complex problems at a scale and speed previously unimaginable.
Artificial Neural Network Terminologies
The ANN (Artificial Neural Network) is based on the BNN (Biological Neural Network), as its
original goal is to imitate the human brain and its functions. Just as the brain has neurons
interlinked with each other, the ANN has neurons, also known as nodes, that are linked to each other
across the various layers of the network.
The ANN learns through various learning algorithms that are described as supervised or unsupervised
learning.
• In supervised learning algorithms, the target values are labeled. The goal is to reduce the
error between the desired output (target) and the actual output. Here, a
supervisor is present.
• In unsupervised learning algorithms, the target values are not labeled, and the network learns
by itself by identifying patterns through repeated trials and experiments.
ANN Terminology:
• Weights: Each neuron is linked to other neurons through connection links that carry
a weight. The weight holds information about the input signal. The output depends
solely on the weights and the input signal. The weights can be presented in a matrix form that is
known as the connection matrix.

• If there are “n” nodes with each node having “m” weights, then the connection matrix W is
an n × m matrix in which entry wij is the j-th weight of node i:
W = [ w11 w12 ... w1m ]
    [ w21 w22 ... w2m ]
    [ ...             ]
    [ wn1 wn2 ... wnm ]
• Bias: Bias is a constant that is added to the product of inputs and weights to calculate the
net input. It is used to shift the result to the positive or negative side. The net input is
increased by a positive bias and decreased by a negative bias.

Here, {1, x1, …, xn} are the inputs, where the constant input 1 carries the bias b. The output
neuron Y is computed from the function g(x), which sums the weighted inputs and adds the bias:
g(x) = ∑ wi·xi + b, where i = 1 to n
     = w1·x1 + ........ + wn·xn + b
The role of the activation is to provide the output depending on the result of the summation
function:
Y = 1 if g(x) >= 0
Y = 0 otherwise
• Threshold: A threshold value is a constant value that is compared to the net input to get the
output. The activation function is defined based on the threshold value to calculate the output.
For example:
Y = 1 if net input >= threshold
Y = 0 otherwise
• Learning Rate: The learning rate is denoted α. It ranges from 0 to 1. It controls how strongly
the weights are adjusted at each step during the learning of the ANN.
• Target value: Target values are the correct values of the output variable and are also known as
just targets.
• Error: It is the inaccuracy of the predicted output values compared to the target values.
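The terms above (weights, bias, net input, threshold) fit together in a single threshold unit, sketched below. The function names and the specific weight and bias values are illustrative assumptions. They are chosen so that the unit behaves like a logical AND or OR gate, which makes the effect of the bias easy to see.

```python
def net_input(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias.
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def step_output(inputs, weights, bias, threshold=0.0):
    # Y = 1 if the net input reaches the threshold, otherwise Y = 0.
    return 1 if net_input(inputs, weights, bias) >= threshold else 0

patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
weights = [1.0, 1.0]

# A strongly negative bias means both inputs must be on to fire (AND).
and_outputs = [step_output(p, weights, bias=-1.5) for p in patterns]

# A less negative bias lets a single active input fire the unit (OR).
or_outputs = [step_output(p, weights, bias=-0.5) for p in patterns]

# A bias of -1.5 with threshold 0 is equivalent to bias 0 with threshold 1.5.
and_by_threshold = [step_output(p, weights, bias=0.0, threshold=1.5) for p in patterns]

print(and_outputs)       # [0, 0, 0, 1]
print(or_outputs)        # [0, 1, 1, 1]
print(and_by_threshold)  # [0, 0, 0, 1]
```

The last case shows why bias and threshold are two views of the same thing: moving the constant from one side of the inequality to the other leaves the outputs unchanged.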
Supervised Learning Algorithms:
• Delta Learning: It was introduced by Bernard Widrow and Marcian Hoff and is also known
as the Least Mean Square method. It reduces the error over the entire learning and training
process. To minimize the error, it follows the gradient descent method, in which the
activation function must be continuous and differentiable.
• Outstar Learning: It was first proposed by Grossberg in 1976. It assumes that the
neural network is arranged in layers, and that the weights fanning out from a particular node
should equal the desired outputs of the neurons connected through those
weights.
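A minimal sketch of the delta (least mean square) rule described above follows. It assumes a single linear unit, so the activation is the identity function, which is continuous and differentiable. The training data, the learning rate, and the function name are made up for illustration: the targets were generated from t = 2·x1 − 1·x2 + 0.5, so the learned weights can be checked against known values.

```python
def delta_rule_train(samples, targets, learning_rate=0.1, epochs=1000):
    """Widrow-Hoff update: change each weight in proportion to the
    error (target - output) times the input on that connection."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            output = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - output
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [0.5, -0.5, 2.5, 1.5]  # generated from t = 2*x1 - 1*x2 + 0.5

weights, bias = delta_rule_train(samples, targets)
print([round(w, 3) for w in weights], round(bias, 3))
```

With this data the weights approach 2 and −1 and the bias approaches 0.5, because each update is a small gradient descent step on the squared error.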
Unsupervised Learning Algorithms:
• Hebbian Learning: It was proposed by Hebb in 1949 to improve the weights of nodes in a
network. The change in weight is based on the input, the output, and the learning rate. The
transpose of the output is needed for the weight adjustment.
• Competitive Learning: It is a winner-takes-all strategy. When an input pattern is sent to
the network, all the neurons in the layer compete with each other to represent the input
pattern. The winner gets an output of 1 and all the others get 0, and only the winning neuron
has its weights adjusted.
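Both unsupervised rules above can be sketched in a few lines. The details are assumptions chosen for illustration rather than the only possible formulation: the Hebbian update is taken as learning rate × input × output, the competitive winner is the neuron whose weight vector is closest to the input (squared Euclidean distance), and the winner's weights move part of the way toward the input. All function names are hypothetical.

```python
def hebbian_update(weights, x, y, learning_rate=0.1):
    """Hebb rule: each weight changes by learning_rate * input * output."""
    return [w + learning_rate * xi * y for w, xi in zip(weights, x)]

def competitive_update(weight_rows, x, learning_rate=0.5):
    """Winner-takes-all: only the neuron whose weights are closest to the
    input fires (output 1) and has its weights moved toward the input."""
    distances = [
        sum((w - xi) ** 2 for w, xi in zip(row, x)) for row in weight_rows
    ]
    winner = distances.index(min(distances))
    outputs = [1 if i == winner else 0 for i in range(len(weight_rows))]
    new_rows = [list(row) for row in weight_rows]  # losers stay unchanged
    new_rows[winner] = [
        w + learning_rate * (xi - w) for w, xi in zip(weight_rows[winner], x)
    ]
    return new_rows, outputs

# Hebbian: input and output active together strengthen the connection.
print(hebbian_update([0.0, 0.0], x=[1.0, -1.0], y=1.0))

# Competitive: the second neuron is closer to the input, so it wins.
rows, outputs = competitive_update([[0.0, 0.0], [1.0, 1.0]], x=[0.9, 0.8])
print(outputs, rows)
```

Neither function uses a target value, which is what makes these rules unsupervised: the weight change depends only on the input and on the network's own output.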
Understanding Neurons in Deep Learning
Neurons are a critical component of any deep learning model.
In fact, one could argue that you can't fully understand deep learning without having a deep
knowledge of how neurons work.
This article will introduce you to the concept of neurons in deep learning. We'll talk about the origin
of deep learning neurons, how they were inspired by the biology of the human brain, and why neurons
are so important in deep learning models today.
What is a Neuron in Biology?
Neurons in deep learning were inspired by neurons in the human brain. Here is a diagram of the
anatomy of a brain neuron:

As you can see, neurons have quite an interesting structure. Groups of neurons work together inside
the human brain to perform the functionality that we require in our day-to-day lives.
The question that Geoffrey Hinton asked during his seminal research in neural networks was whether
we could build computer algorithms that behave similarly to neurons in the brain. The hope was that
by mimicking the brain's structure, we might capture some of its capability.
To do this, researchers studied the way that neurons behaved in the brain. One important observation
was that a neuron by itself is useless. Instead, you require networks of neurons to generate any
meaningful functionality.
This is because neurons function by receiving and sending signals. More specifically, the
neuron's dendrites receive signals and pass along those signals through the axon.
The dendrites of one neuron are connected to the axon of another neuron. These connections are
called synapses - which is a concept that has been generalized to the field of deep learning.
What is a Neuron in Deep Learning?
Neurons in deep learning models are nodes through which data and computations flow.
Neurons work like this:
• They receive one or more input signals. These input signals can come from either the raw data
set or from neurons positioned at a previous layer of the neural net.
• They perform some calculations.
• They send some output signals to neurons deeper in the neural net through a synapse.
Here is a diagram of the functionality of a neuron in a deep learning neural net:

Let's walk through this diagram step-by-step.


As you can see, neurons in a deep learning model are capable of having synapses that connect to more
than one neuron in the preceding layer. Each synapse has an associated weight, which impacts the
preceding neuron's importance in the overall neural network.
Weights are a very important topic in the field of deep learning because adjusting a model's weights is
the primary way through which deep learning models are trained. You'll see this in practice later on
when we build our first neural networks from scratch.
Once a neuron receives its inputs from the neurons in the preceding layer of the model, it adds up
each signal multiplied by its corresponding weight and passes them on to an activation function, like
this:

The activation function calculates the output value for the neuron. This output value is then passed on
to the next layer of the neural network through another synapse.
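The weighted sum and activation step described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the signal values, the weights, and the choice of a sigmoid activation are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, activation=sigmoid):
    """Multiply each incoming signal by its weight, add them up, apply the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

# Hypothetical signals arriving from three neurons in the preceding layer,
# and hypothetical weights on the three synapses.
signals = [0.5, -1.0, 2.0]
synapse_weights = [0.4, 0.3, 0.1]

output = neuron_output(signals, synapse_weights)
print(output)
```

The value returned by `neuron_output` is what would be passed on to the neurons of the next layer.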
This serves as a broad overview of deep learning neurons. Do not worry if it was a lot to take in -
we'll learn much more about neurons in deep learning throughout this course. For now, it's sufficient
for you to have a high-level understanding of how they are structured in a deep learning model.
Final Thoughts
In this tutorial, you had your first introduction to neurons in deep learning.
Here is a brief summary of what you learned:
 A quick overview of how neurons work in the human brain
 How neurons work in a deep learning model
 The different layers of neurons in a deep learning model
 The functionality of deep learning neurons
 How weights are applied to input signals within a neuron
 That activation functions are applied to the weighted sum of input signals to calculate a
neuron's output value
What is Perceptron | The Simplest Artificial neural network
The single-layer feedforward neural network was introduced in the late 1950s by Frank Rosenblatt. It
marked the starting phase of deep learning and artificial neural networks. At that time, prediction
relied on statistical machine learning or traditional hand-coded programming. The perceptron is one
of the first and most straightforward models of artificial neural networks. Despite being a
straightforward model, the perceptron has proven successful in solving certain
classification problems.
What is Perceptron?
The perceptron is one of the simplest artificial neural network architectures. It was introduced by Frank
Rosenblatt in 1957. It is the simplest type of feedforward neural network, consisting of a single layer
of input nodes that are fully connected to a layer of output nodes. It can learn linearly separable
patterns. It uses a slightly different type of artificial neuron known as the threshold logic unit (TLU), which
was first introduced by Warren McCulloch and Walter Pitts in the 1940s.
Types of Perceptron
 Single-Layer Perceptron: This type of perceptron is limited to learning linearly separable
patterns. It is effective for tasks where the data can be divided into distinct categories by a
straight line.
 Multilayer Perceptron: Multilayer perceptrons possess enhanced processing capabilities as
they consist of two or more layers, adept at handling more complex patterns and relationships
within the data.
Basic Components of Perceptron
A perceptron, the basic unit of a neural network, comprises essential components that collaborate in
information processing.
 Input Features: The perceptron takes multiple input features. Each input feature represents a
characteristic or attribute of the input data.
 Weights: Each input feature is associated with a weight, determining the significance of each
input feature in influencing the perceptron’s output. During training, these weights are
adjusted to learn the optimal values.
 Summation Function: The perceptron calculates the weighted sum of its inputs using the
summation function. The summation function combines the inputs with their respective
weights to produce a weighted sum.
 Activation Function: The weighted sum is then passed through an activation function.
The perceptron uses the Heaviside step function, which compares the weighted sum with a
threshold and outputs 0 or 1.
 Output: The final output of the perceptron is determined by the activation function’s result.
For example, in binary classification problems, the output might represent a predicted class (0
or 1).
 Bias: A bias term is often included in the perceptron model. The bias allows the model to
make adjustments that are independent of the input. It is an additional parameter that is
learned during training.
 Learning Algorithm (Weight Update Rule): During training, the perceptron learns by
adjusting its weights and bias based on a learning algorithm. A common approach is the
perceptron learning algorithm, which updates weights based on the difference between the
predicted output and the true output.
These components work together to enable a perceptron to learn and make predictions. While a single
perceptron can perform binary classification, more complex tasks require the use of multiple
perceptrons organized into layers, forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the significance of that input to the
output. The perceptron’s output is the weighted sum of the inputs passed through an
activation function that decides whether or not the perceptron will fire. It computes the weighted sum of
its inputs as:
z = w_1 x_1 + w_2 x_2 + ... + w_n x_n = X^T W
The activation function that perceptrons use most frequently is the step function, which compares this
weighted sum to a threshold and outputs 1 if the sum reaches the threshold and 0 otherwise.
The most common step function used in the perceptron is the Heaviside step function:
h(z) = 0 if z < threshold, and h(z) = 1 if z >= threshold
A perceptron has a single layer of threshold logic units, with each TLU connected to all inputs.

Threshold Logic units


When all the neurons in a layer are connected to every neuron of the previous layer, it is known as a
fully connected layer or dense layer.
The output of the fully connected layer can be computed as:
f_{W,b}(X) = h(XW + b)
where X is the input, W holds the weights for each input neuron, b is the bias, and h is the step
function.
During training, the perceptron’s weights are adjusted to minimize the difference between the
predicted output and the actual output. Usually, supervised learning algorithms like the delta rule or
the perceptron learning rule are used for this:
w_{i,j} = w_{i,j} + η (y_j - \hat{y}_j) x_i
Here w_{i,j} is the weight between the ith input and jth output neuron, x_i is the ith input value,
y_j is the jth actual value, \hat{y}_j is the jth predicted value, and η is the learning rate.
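As a rough illustration of the perceptron learning rule, the sketch below trains a single threshold logic unit on the logical AND function, which is linearly separable. The function names, the zero threshold, the zero initial weights, the learning rate of 1.0 and the epoch count are all assumptions chosen for the example, not values taken from the text.

```python
def step(z, threshold=0.0):
    """Heaviside step function: 1 when z reaches the threshold, otherwise 0."""
    return 1 if z >= threshold else 0

def predict(x, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the step function."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return step(z)

def train_perceptron(samples, targets, eta=1.0, epochs=10):
    """Perceptron learning rule: w <- w + eta * (actual - predicted) * x."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            error = y - predict(x, weights, bias)
            weights = [w + eta * error * xi for w, xi in zip(weights, x)]
            bias += eta * error
    return weights, bias

# Truth table of logical AND
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias = train_perceptron(and_inputs, and_targets)
predictions = [predict(x, weights, bias) for x in and_inputs]
print(weights, bias, predictions)
```

The bias is updated with the same rule as the weights, as if it were a weight attached to a constant input of 1. Running the same loop on XOR would not settle on a correct answer, because XOR is not linearly separable.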
Backpropagation in Neural Network
Machine learning models learn from data and make predictions. One of the fundamental concepts
behind training these models is backpropagation.
A neural network is a network structure whose computing units (neurons) give it
the ability to compute a function. The neurons are connected by
edges, and each neuron has an assigned activation function and adjustable
parameters. These adjustable parameters determine the function that
the network computes. In terms of the activation function in neural networks, the higher the
activation value is, the greater the activation is.
What is backpropagation?
 In machine learning, backpropagation is an effective algorithm used to train artificial neural
networks, especially in feed-forward neural networks.
 Backpropagation is an iterative algorithm that helps to minimize the cost function by
determining how the weights and biases should be adjusted. During every epoch, the model
learns by adapting the weights and biases to reduce the loss, moving down the
gradient of the error. It is therefore used together with an optimization algorithm such
as gradient descent or stochastic gradient descent.
 Computing the gradient in the backpropagation algorithm helps to minimize the cost
function and it can be implemented by using the mathematical rule called chain rule from
calculus to navigate through complex layers of the neural network.

fig(a) A simple illustration of how backpropagation works by adjusting the weights
Advantages of Using the Backpropagation Algorithm in Neural Networks
Backpropagation, a fundamental algorithm in training neural networks, offers several advantages that
make it a preferred choice for many machine learning tasks. Here, we discuss some key advantages of
using the backpropagation algorithm:
1. Ease of Implementation: Backpropagation does not require prior knowledge of neural
networks, making it accessible to beginners. Its straightforward nature simplifies the
programming process, as it primarily involves adjusting weights based on error derivatives.
2. Simplicity and Flexibility: The algorithm’s simplicity allows it to be applied to a wide range
of problems and network architectures. Its flexibility makes it suitable for various scenarios,
from simple feedforward networks to complex recurrent or convolutional neural networks.
3. Efficiency: Backpropagation accelerates the learning process by directly updating weights
based on the calculated error derivatives. This efficiency is particularly advantageous in
training deep neural networks, where learning features of a function can be time-consuming.
4. Generalization: Backpropagation enables neural networks to generalize well to unseen data
by iteratively adjusting weights during training. This generalization ability is crucial for
developing models that can make accurate predictions on new, unseen examples.
5. Scalability: Backpropagation scales well with the size of the dataset and the complexity of
the network. This scalability makes it suitable for large-scale machine learning tasks, where
training data and network size are significant factors.
In conclusion, the backpropagation algorithm offers several advantages that contribute to its
widespread use in training neural networks. Its ease of implementation, simplicity, efficiency,
generalization ability, and scalability make it a valuable tool for developing and training neural
network models for various machine learning applications.
Working of Backpropagation Algorithm
The backpropagation algorithm works in two passes:
 Forward pass
 Backward pass
How does Forward pass work?
 In the forward pass, the input is first fed into the input layer. These inputs are the raw data
used for training the neural network.
 The inputs and their corresponding weights are passed to the hidden layer. The hidden layer
performs the computation on the data it receives. If there are two hidden layers in the neural
network, as in the illustration fig(a) where h1 and h2 are the two hidden layers,
the output of h1 is used as the input of h2. Before the activation function is applied,
the bias is added.
 The activation function is applied to the weighted sum of inputs at each neuron of the hidden
layer. One commonly used activation function is ReLU,
which returns the input if it is positive and zero otherwise. This
introduces non-linearity into the model, which enables the network to learn
complex relationships in the data. Finally, the weighted outputs from the last hidden layer
are fed into the output layer to compute the final prediction. This layer can use the
softmax activation function, which converts the weighted outputs
into probabilities for each class.
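The forward pass described in these steps can be sketched as follows. This is only an illustration under assumed values: the layer sizes (two inputs, three hidden neurons, two output classes), the weights and the biases are hypothetical, and `dense` and `forward` are helper names invented for the example.

```python
import math

def relu(z):
    """Returns the input if it is positive, otherwise zero."""
    return max(0.0, z)

def softmax(scores):
    """Converts raw output scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift by the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """weights[j] holds the incoming weights of neuron j; the bias is added to the weighted sum."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = [relu(z) for z in dense(x, hidden_w, hidden_b)]  # hidden layer with ReLU
    return softmax(dense(hidden, out_w, out_b))               # output layer with softmax

# Hypothetical parameters
x = [1.0, 2.0]
hidden_w = [[0.5, -0.25], [-1.0, 0.5], [0.25, 0.25]]
hidden_b = [0.0, 0.0, 0.5]
out_w = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
out_b = [0.0, 0.0]

probs = forward(x, hidden_w, hidden_b, out_w, out_b)
print(probs)
```

The two entries of `probs` are the predicted class probabilities; the larger one marks the class the network would choose.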

The forward pass using weights and biases


How does backward pass work?
 In the backward pass, the error is transmitted back through the network, which helps
the network improve its performance by learning and adjusting its internal weights.
 To find the error produced by the forward pass, we can use one of the most
commonly used measures, the mean squared error, which averages the squared difference between
the predicted output and the desired output. For a single output it reduces to: Mean squared
error = (predicted output – actual output)^2
 Once we have done the calculation at the output layer, we then propagate the error backward
through the network, layer by layer.
 The key calculation during the backward pass is determining the gradients for each weight
and bias in the network. This gradient is responsible for telling us how much each weight/bias
should be adjusted to minimize the error in the next forward pass. The chain rule is used
iteratively to calculate this gradient efficiently.
 In addition to the gradient calculation, the activation function also plays a crucial role in
backpropagation: the gradients are calculated with the help of the derivative of the
activation function.
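To make the role of the chain rule and of the activation derivative concrete, here is a small sketch for a single sigmoid neuron with a squared-error loss. It computes the gradient analytically and compares it with a finite-difference estimate. The parameter values are hypothetical, and the single-neuron setup is a simplification assumed for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, target):
    """Squared error of a single sigmoid neuron."""
    return (sigmoid(w * x + b) - target) ** 2

def gradients(w, b, x, target):
    """Chain rule: dL/dw = dL/dy * dy/dz * dz/dw, and likewise for the bias."""
    y = sigmoid(w * x + b)
    dloss_dy = 2.0 * (y - target)
    dy_dz = y * (1.0 - y)  # derivative of the sigmoid activation
    return dloss_dy * dy_dz * x, dloss_dy * dy_dz * 1.0

# Hypothetical weight, bias, input and target
w, b, x, target = 0.8, -0.2, 1.5, 0.0
grad_w, grad_b = gradients(w, b, x, target)

# Finite-difference check of the analytic gradients
eps = 1e-6
numeric_w = (loss(w + eps, b, x, target) - loss(w - eps, b, x, target)) / (2 * eps)
numeric_b = (loss(w, b + eps, x, target) - loss(w, b - eps, x, target)) / (2 * eps)
print(grad_w, numeric_w)
print(grad_b, numeric_b)
```

The analytic and numeric values should agree to several decimal places, which is a common sanity check when implementing the backward pass by hand.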
Example of Backpropagation in Machine Learning
Let us now take an example to explain backpropagation in Machine Learning,
Assume that the neurons use the sigmoid activation function for the forward and
backward passes on the network. Also assume that the target (actual) output y is 0.5 and the
learning rate is 1. Now perform backpropagation using the backpropagation algorithm.

Example (1) of backpropagation sum


Implementing forward propagation:
Step 1: Before proceeding to calculate forward propagation, we need to know two formulae:
a_j = \sum_i (w_{i,j} * x_i)
Where,
 a_j is the weighted sum of all the inputs and weights at node j,
 w_{i,j} represents the weight associated with the ith input to the jth neuron,
 x_i represents the value of the ith input,
y_j = F(a_j) = \frac 1 {1+e^{-a_j}}, where y_j is the output value and F denotes the activation function (the sigmoid
activation function is used here), which transforms the weighted sum into the output value.
Step 2: To compute the forward pass, we need to compute the output for y3 , y4 , and y5.

To find the outputs of y3, y4 and y5


We start by calculating the weighted sum of the inputs using the formula:
a_j = \sum_i (w_{i,j} * x_i)
To find y3, we need to consider its incoming edges along with their weights and
inputs. Here the incoming edges are from X1 and X2.
At h1 node,
\begin{aligned} a_1 &= (w_{1,1} x_1) + (w_{2,1} x_2) \\ &= (0.2 * 0.35) + (0.2 * 0.7) \\ &= 0.21 \end{aligned}
Once we have calculated the a1 value, we can proceed to find the y3 value:
y_j = F(a_j) = \frac 1 {1+e^{-a_j}}
y_3 = F(0.21) = \frac 1 {1+e^{-0.21}}
y_3 = 0.5523
Similarly, find the values of y4 at h2 and y5 at O3:
a_2 = (w_{1,2} * x_1) + (w_{2,2} * x_2) = (0.3*0.35) + (0.3*0.7) = 0.315
y_4 = F(0.315) = \frac 1 {1+e^{-0.315}} = 0.5781
a_3 = (w_{1,3} * y_3) + (w_{2,3} * y_4) = (0.3*0.5523) + (0.9*0.5781) = 0.6860
y_5 = F(0.6860) = \frac 1 {1+e^{-0.6860}} = 0.6651

Values of y3, y4 and y5


Note that our target output is 0.5 but we obtained 0.6651. To calculate the error, we can use the
formula below:
Error = y_{target} – y_5
Error = 0.5 – 0.6651
= -0.1651
Using this error value, we will backpropagate. In the calculations below, intermediate values are
carried at full precision and rounded only for display.
Implementing Backward Propagation
Each weight in the network is changed by,

Δw_{i,j} = η δ_j O_i
δ_j = O_j (1 - O_j)(t_j - O_j) (if j is an output unit)
δ_j = O_j (1 - O_j) ∑_k δ_k w_{j,k} (if j is a hidden unit)
where,
η is the constant which is considered as the learning rate,
O_i is the output of unit i, the unit that feeds the weight (for an input node, the input value x_i),
t_j is the correct output for unit j,
δ_j is the error measure for unit j,
w_{j,k} is the weight on the edge from unit j to unit k
Step 3: To calculate the backpropagation, we need to start from the output unit:
To compute δ5, we use the output of the forward pass,
δ5 = y5 (1 - y5)(ytarget - y5)
= 0.6651 (1 - 0.6651)(-0.1651)
= -0.03677
For the hidden units,
To compute the error measure of each hidden unit, we use the value of δ5 and the weight that
connects the unit to the output:
δ3 = y3 (1 - y3)(w1,3 * δ5)
= 0.5523 (1 - 0.5523)(0.3 * -0.03677)
= -0.002728
δ4 = y4 (1 - y4)(w2,3 * δ5)
= 0.5781 (1 - 0.5781)(0.9 * -0.03677)
= -0.008071
Step 4: We need to update the weights, from output unit to hidden unit,

Δw_{i,j} = η δ_j O_i
Note- Here our learning rate is 1

Δw2,3 = η δ5 y4
= 1 * (-0.03677) * 0.5781
= -0.021257
We update the weight based on the old weight of the network,

w2,3(new) = Δw2,3 + w2,3(old)


= -0.021257 + 0.9
= 0.878743
From hidden unit to input unit,
For a weight between an input node and a hidden node, we do the calculation as follows:

Δw1,1 = η δ3 x1
= 1 * (-0.002728) * 0.35
= -0.000955
Similarly, we calculate the new weight value using the old one:

w1,1(new) = Δw1,1 + w1,1(old)


= -0.000955 + 0.2
= 0.199045
Similarly, we update the other weights. The new weights are listed below:
w1,2 (new) = 0.297175
w1,3 (new) = 0.279691
w2,1 (new) = 0.198091
w2,2 (new) = 0.294350
The updated weights are illustrated below,

Through backward pass the weights are updated

Once the above process is done, we perform the forward pass again to see whether we obtain the target
output of 0.5.
While performing the forward pass again, we obtain the following values:
y3 = 0.5519
y4 = 0.5769
y5 = 0.6596
We can see that y5 is 0.6596, which is still not the target output, so again we need
to find the error and backpropagate through the network, updating the weights until the output is
close enough to the target.
Error = y_{target} – y_5
= 0.5 – 0.6596
= -0.1596
This is how backpropagation works: it performs the forward pass first to see whether we obtain
the target output; if not, it computes the error and propagates it backward through
the layers of the network, adjusting the weights according to the error. This process
continues until the network's output is acceptably close to the target output.
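One way to check the hand arithmetic of this example is to run a single forward pass, one backward pass and a second forward pass in code. The sketch below assumes the same small network (two inputs, two sigmoid hidden units, one sigmoid output unit, no biases), the same initial weights, a target of 0.5 and a learning rate of 1. The function names and the list layout of the weights are choices made for this illustration; printed values may differ from hand-rounded figures in the last digits.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """w_hidden[j] holds the weights from (x1, x2) into hidden unit j;
    w_out holds the weights from the hidden outputs (y3, y4) into y5."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, eta=1.0):
    hidden, output = forward(x, w_hidden, w_out)
    # Error measure of the output unit: O(1 - O)(t - O)
    delta_out = output * (1.0 - output) * (target - output)
    # Error measure of each hidden unit: O(1 - O) * (weight to output) * delta_out
    delta_hidden = [h * (1.0 - h) * w * delta_out for h, w in zip(hidden, w_out)]
    # Weight change: eta * delta of the receiving unit * output of the sending unit
    new_w_out = [w + eta * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w + eta * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out

x = [0.35, 0.7]
w_hidden = [[0.2, 0.2], [0.3, 0.3]]  # [w1,1, w2,1] into h1 and [w1,2, w2,2] into h2
w_out = [0.3, 0.9]                   # [w1,3, w2,3] into the output unit
target = 0.5

_, y5_before = forward(x, w_hidden, w_out)
w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
_, y5_after = forward(x, w_hidden, w_out)
print(round(y5_before, 4), round(y5_after, 4))
print(w_hidden, w_out)
```

Calling `backprop_step` repeatedly in a loop moves the output closer to the target with each iteration.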
Types Of Learning Rules in ANN
A learning rule enhances the Artificial Neural Network’s performance when it is applied over the
network. A learning rule updates the weights and bias levels of a network when certain conditions
are met in the training process. It is a crucial part of the development of the neural network.

1. Hebbian Learning Rule


Donald Hebb developed it in 1949 as an unsupervised learning algorithm in the neural network. We
can use it to improve the weights of nodes of a network. It works as follows:
 If two neighbor neurons are operating in the same phase at the same period of time, then the
weight between these neurons should increase.
 For neurons operating in the opposite phase, the weight between them should decrease.
 If there is no signal correlation, the weight does not change. The sign of the weight between
two nodes depends on the sign of the inputs of those nodes.
 When inputs of both the nodes are either positive or negative, it results in a strong positive
weight.
 If the input of one node is positive and negative for the other, a strong negative weight is
present.
Mathematical Formulation:
δw = α x_i y
where δw is the change in weight, α is the learning rate, x_i is the input vector, and y is the output.
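A minimal sketch of the Hebbian update follows. The bipolar input values, the learning rate of 0.1 and the function name are assumptions for illustration only.

```python
def hebbian_update(weights, inputs, output, alpha=0.1):
    """Hebbian rule: each weight changes by alpha * input * output."""
    return [w + alpha * x * output for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]

# Both inputs and the output are positive: both weights increase.
same_phase = hebbian_update(weights, inputs=[1, 1], output=1)

# The second input has the opposite sign to the output: its weight decreases.
opposite_phase = hebbian_update(weights, inputs=[1, -1], output=1)

# A zero input means no correlation is observed: its weight does not change.
no_signal = hebbian_update(weights, inputs=[1, 0], output=1)

print(same_phase, opposite_phase, no_signal)
```

Note that no target value appears anywhere in the update, which is what makes the rule unsupervised.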
2. Perceptron Learning Rule
It was introduced by Rosenblatt. It is an error-correcting rule for a single-layer feedforward network. It
is supervised in nature: it calculates the error between the desired and actual output, and
the weights are adjusted only if an error is present.
Computed as follows:
Assume (x1, x2, x3, ..., xn) –> set of input vectors
and (w1, w2, w3, ..., wn) –> set of weights
y = actual output
wo = initial weight
wnew = new weight
δw = change in weight
α = learning rate
actual output (y) = f(∑ wi xi)
learning signal (ej) = ti - y (difference between desired and actual output)
δw = α xi ej
wnew = wo + δw
Now, the output can be calculated on the basis of the net input (the weighted sum ∑ wi xi) and the
activation function applied over it, and can be expressed as:
y = 1, if net input >= θ
y = 0, if net input < θ

3. Delta Learning Rule


It was developed by Bernard Widrow and Marcian Hoff. It depends on supervised learning and
has a continuous activation function. It is also known as the Least Mean Square method, and it
minimizes error over all the training patterns.
It is based on a gradient descent approach, which keeps adjusting the weights for as long as an error
remains. It states that the modification in
the weight of a node is equal to the product of the error and the input, where the error is the difference
between desired and actual output.
Computed as follows:
Assume (x1, x2, x3, ..., xn) –> set of input vectors
and (w1, w2, w3, ..., wn) –> set of weights
y = actual output
wo = initial weight
wnew = new weight
δw = change in weight
Error = ti - y
Learning signal (ej) = (ti - y) y’, where y’ is the derivative of the activation function
y = f(net input) = f(∑ wi xi)
δw = α xi ej = α xi (ti - y) y’
wnew = wo + δw
The updating of weights can only be done if there is a difference between the target and actual
output(i.e., error) present:
case I: when t=y
then there is no change in weight
case II: else
wnew=wo+δw
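The delta rule can be sketched for a single neuron with a sigmoid activation, whose derivative is y(1 - y). The sigmoid choice, the starting weights, the training pattern and the learning rate are all hypothetical values picked for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def delta_rule_step(weights, x, target, alpha=0.5):
    """One delta-rule update: dw_i = alpha * x_i * (t - y) * y'."""
    net = sum(w * xi for w, xi in zip(weights, x))
    y = sigmoid(net)
    learning_signal = (target - y) * y * (1.0 - y)  # (t - y) * y' for the sigmoid
    return [w + alpha * xi * learning_signal for w, xi in zip(weights, x)]

weights = [0.1, -0.2]          # hypothetical initial weights
x, target = [1.0, 0.5], 1.0    # one hypothetical training pattern

errors = []
for _ in range(200):
    y = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    errors.append(abs(target - y))
    weights = delta_rule_step(weights, x, target)

print(errors[0], errors[-1])
```

The recorded error shrinks from the first iteration to the last, and the size of each update shrinks with it, matching the case distinction above: when the target equals the output, the weight change is zero.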
4. Correlation Learning Rule
The correlation learning rule follows the same principle as the Hebbian learning rule, i.e., if
two neighbor neurons are operating in the same phase at the same period of time, then the weight
between these neurons should be more positive. For neurons operating in the opposite phase, the
weight between them should be more negative. Unlike the Hebbian rule, however, the correlation rule is
supervised in nature: the targeted response is used for the calculation of the change in weight.
In mathematical form:
δw = α x_i t_j
where δw is the change in weight, α is the learning rate, x_i is the input vector, and t_j is the target value.

5. Out Star Learning Rule


It was introduced by Grossberg and is a supervised training procedure.

The Out Star Learning Rule is implemented when nodes in a network are arranged in a layer. Here the
weights linked to a particular node should be equal to the targeted outputs for the nodes connected
through those same weights. The weight change is thus calculated as δw = α(t - y),
where α is the learning rate, y is the actual output, and t is the desired output for the n layer nodes.
6. Competitive Learning Rule

It is also known as the Winner-takes-All rule and is unsupervised in nature. Here all the output nodes
compete with each other to represent the input pattern. The winner is
the node with the highest output; it is given the output 1 while the rest are given 0.
There is a set of neurons with arbitrarily distributed weights, and the activation function is applied to
a subset of neurons. Only one neuron is active at a time. Only the winner's weights are updated; the rest
remain unchanged.
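The winner-takes-all idea can be sketched as below. The text does not give an update formula for the winner, so this example assumes a commonly used form in which the winner's weights move toward the input, δw = α(x - w). That formula, the weight values and the function name are assumptions of this sketch.

```python
def competitive_step(weight_rows, x, alpha=0.5):
    """weight_rows[j] holds the weights of output node j.
    The node with the highest output wins and is the only one updated."""
    outputs = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    winner = outputs.index(max(outputs))
    updated = [list(row) for row in weight_rows]
    # Assumed update rule: move the winner's weights toward the input pattern.
    updated[winner] = [w + alpha * (xi - w) for w, xi in zip(weight_rows[winner], x)]
    one_hot = [1 if j == winner else 0 for j in range(len(weight_rows))]
    return updated, one_hot

weight_rows = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical weights of two output nodes
x = [1.0, 0.0]                          # hypothetical input pattern

new_rows, one_hot = competitive_step(weight_rows, x)
print(one_hot, new_rows)
```

Here the first node wins, receives the output 1, and has its weights pulled toward the input, while the second node keeps its weights and outputs 0.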
Activation functions in Neural Networks
In the process of building a neural network, one of the choices you get to make is what Activation
Function to use in the hidden layer as well as at the output layer of the network.
What is an Activation Function?
An activation function in the context of neural networks is a mathematical function applied to the
output of a neuron. The purpose of an activation function is to introduce non-linearity into the model,
allowing the network to learn and represent complex patterns in the data. Without non-linearity, a
neural network would essentially behave like a linear regression model, regardless of the number of
layers it has.
The activation function decides whether a neuron should be activated or not by calculating the
weighted sum and further adding bias to it. The purpose of the activation function is to introduce non-
linearity into the output of a neuron.
Explanation: We know, the neural network has neurons that work in correspondence with weight,
bias, and their respective activation function. In a neural network, we would update the weights and
biases of the neurons on the basis of the error at the output. This process is known as back-
propagation. Activation functions make the back-propagation possible since the gradients are
supplied along with the error to update the weights and biases.
Elements of a Neural Network
Input Layer: This layer accepts input features. It provides information from the outside world to the
network, no computation is performed at this layer, nodes here just pass on the information(features)
to the hidden layer.
Hidden Layer: Nodes of this layer are not exposed to the outer world, they are part of the abstraction
provided by any neural network. The hidden layer performs all sorts of computation on the features
entered through the input layer and transfers the result to the output layer.
Output Layer: This layer brings the information learned by the network to the outer world.
Why do we need Non-linear activation function?
A neural network without an activation function is essentially just a linear regression model. The
activation function does the non-linear transformation to the input making it capable to learn and
perform more complex tasks.
Mathematical proof
Suppose we have a Neural net like this :-

Elements of the diagram are as follows:


Hidden layer i.e. layer 1:
z(1) = W(1)X + b(1)
a(1) = z(1)
Here,
 z(1) is the vectorized output of layer 1
 W(1) is the vectorized weights assigned to neurons of the hidden layer, i.e. w1, w2, w3 and w4
 X is the vectorized input features, i.e. i1 and i2
 b(1) is the vectorized bias assigned to neurons in the hidden layer, i.e. b1 and b2
 a(1) is the vectorized form of any linear function.
(Note: We are not considering activation function here)

Layer 2 i.e. output layer :-


Note : Input for layer 2 is output from layer 1
z(2) = W(2)a(1) + b(2)
a(2) = z(2)
Calculation at Output layer
z(2) = (W(2) * [W(1)X + b(1)]) + b(2)
z(2) = [W(2) * W(1)] * X + [W(2)*b(1) + b(2)]
Let,
[W(2) * W(1)] = W
[W(2)*b(1) + b(2)] = b
Final output : z(2) = W*X + b
which is again a linear function
The result is again a linear function even after adding a hidden layer. Hence we can
conclude that no matter how many hidden layers we attach to a neural net, all layers will behave
the same way, because the composition of two linear functions is itself a linear function. A neuron cannot
learn with just a linear function attached to it. A non-linear activation function will let it learn as per
the difference w.r.t. the error. Hence we need an activation function.
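The collapse of two linear layers into one can be checked numerically. The sketch below uses small hypothetical matrices and plain Python lists; the helper names `matvec`, `matmul` and `add` are invented for the example.

```python
def matvec(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Matrix-matrix product."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]

def add(u, v):
    return [p + q for p, q in zip(u, v)]

# Hypothetical parameters: 2 inputs, 2 hidden neurons, 1 output neuron
W1 = [[1.0, 2.0], [3.0, 4.0]]
b1 = [0.5, -0.5]
W2 = [[2.0, -1.0]]
b2 = [1.0]
X = [1.0, -2.0]

# Two layers without an activation function: z(2) = W(2)(W(1)X + b(1)) + b(2)
two_layer = add(matvec(W2, add(matvec(W1, X), b1)), b2)

# Equivalent single layer: W = W(2)W(1), b = W(2)b(1) + b(2)
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, X), b)

print(two_layer, one_layer)
```

Both expressions give the same output for any input X, so the hidden layer adds no expressive power unless a non-linear activation is placed between the two layers.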
Variants of Activation Function
Linear Function
 Equation : A linear function has an equation similar to that of a straight line, i.e. y = x
 No matter how many layers we have, if all are linear in nature, the final activation of the
last layer is nothing but a linear function of the input of the first layer.
 Range : -inf to +inf
 Uses : The linear activation function is used in just one place, i.e. the output layer.
 Issues : If we differentiate a linear function, the result no longer
depends on the input “x” and the function becomes constant, so it won’t introduce any ground-breaking
behavior to our algorithm.
For example : Calculating the price of a house is a regression problem. The house price may have any
big/small value, so we can apply linear activation at the output layer. Even in this case the neural net must
have a non-linear function at the hidden layers.
Sigmoid Function

 It is a function which is plotted as ‘S’ shaped graph.


 Equation : A = 1/(1 + e^{-x})
 Nature : Non-linear. Notice that when X lies between -2 and 2, the curve is very steep. This
means small changes in x bring about large changes in the value of Y.
 Value Range : 0 to 1
 Uses : Usually used in the output layer of a binary classifier, where the result is either 0 or 1. As
the value of the sigmoid function lies between 0 and 1 only, the result can be predicted easily to
be 1 if the value is greater than 0.5 and 0 otherwise.
Tanh Function

 The activation that almost always works better than the sigmoid function is the Tanh function, also
known as the hyperbolic tangent function. It is actually a mathematically scaled and shifted version of the
sigmoid function. Both are similar and can be derived from each other.
 Equation :-
f(x) = tanh(x) = 2/(1 + e^{-2x}) – 1
OR
tanh(x) = 2 * sigmoid(2x) – 1
 Value Range :- -1 to +1
 Nature :- non-linear
 Uses :- Usually used in hidden layers of a neural network. As its values lie between -1 and
1, the mean for the hidden layer comes out to be 0 or very close to it, which helps
in centering the data by bringing the mean close to 0. This makes learning for the next layer
much easier.
RELU Function
 It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly
implemented in the hidden layers of a neural network.
 Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0 otherwise.
 Value Range :- [0, inf)
 Nature :- non-linear, which means we can easily backpropagate the errors and have multiple
layers of neurons being activated by the ReLU function.
 Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves
simpler mathematical operations. At any time only a few neurons are activated, which makes the
network sparse and therefore efficient and easy to compute.
In simple words, a network using RELU typically learns much faster than one using the sigmoid or Tanh function.
Softmax Function

The softmax function is a generalization of the sigmoid function that is handy when we are trying to handle
multi-class classification problems.
 Nature :- non-linear
 Uses :- Usually used when trying to handle multiple classes. The softmax function is
commonly found in the output layer of image classification problems. The softmax function
squeezes the outputs for each class between 0 and 1 by exponentiating each output and dividing
by the sum of the exponentiated outputs.
 Output:- The softmax function is ideally used in the output layer of the classifier where we
are actually trying to attain the probabilities to define the class of each input.
 The basic rule of thumb is that if you really don’t know what activation function to use, then
simply use RELU, as it is a general activation function for hidden layers and is used in most
cases these days.
 If your output is for binary classification, then the sigmoid function is a very natural choice for
the output layer.
 If your output is for multi-class classification, then Softmax is very useful to predict the
probabilities of each class.
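The activation functions covered in this section can be written out directly from their equations. The sketch below uses only the Python standard library; the function name `tanh_from_formula` and the sample inputs are choices made for this illustration.

```python
import math

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_formula(x):
    """tanh written as 2 / (1 + e^(-2x)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def relu(x):
    return max(0.0, x)

def softmax(scores):
    """Exponentiate each score and divide by the sum of the exponentials."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift by the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

x = 0.7
print(sigmoid(x), tanh_from_formula(x), relu(x), relu(-x))

# tanh(x) = 2 * sigmoid(2x) - 1, compared against the library implementation
print(2.0 * sigmoid(2.0 * x) - 1.0, math.tanh(x))

class_probabilities = softmax([1.0, 2.0, 3.0])
print(class_probabilities, sum(class_probabilities))
```

The softmax outputs are all between 0 and 1 and add up to 1, which is why they can be read as class probabilities.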
Understanding Loss Function in Deep Learning
Machine learning allows for prediction, classification and decisions derived from data. In research,
machine learning is part of artificial intelligence, and the process of developing a computational
model with capabilities mimicking human intelligence. Machine learning and related methods involve
developing algorithms that recognize patterns in the information that is available and perform
prediction or classification. The loss function measures how well they do so.
What Are Loss Functions in Machine Learning?
The loss function helps determine how effectively your algorithm models the featured dataset.
In other words, the loss measures how far your model's predictions are from the expected results. Losses
generally fall into two broad categories relating to real-world problems: classification and regression.
In classification, we must predict a probability for each class that the problem is concerned with. In regression, however,
we have the task of forecasting a continuous value for a specific group of independent features.
What is Loss Function in Deep Learning?
In mathematical optimization and decision theory, a loss or cost function (sometimes also called an
error function) is a function that maps an event or values of one or more variables onto a real number
intuitively representing some “cost” associated with the event.
In simple terms, the Loss function is a method of evaluating how well your algorithm is modeling
your dataset. It is a mathematical function of the parameters of the machine learning algorithm.
In simple linear regression, prediction is calculated using slope (m) and intercept (b). The loss
function for this is the (Yi – Yihat)^2 i.e., loss function is the function of slope and intercept.
Regression loss functions like the MSE loss function are commonly used in evaluating the
performance of regression models. Additionally, objective functions play a crucial role in optimizing
machine learning models by minimizing the loss or cost. Other commonly used loss functions include
the Huber loss function, which combines the characteristics of the MSE and MAE loss functions,
providing robustness to outliers in the data.
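To make the idea of a loss as a function of slope and intercept concrete, here is a minimal sketch in Python with NumPy. The function name squared_error_loss and the toy data are illustrative assumptions, not part of the original text.

```python
import numpy as np

def squared_error_loss(m, b, x, y):
    """Mean of (Yi - Yihat)^2, where Yihat = m * x + b."""
    y_hat = m * x + b
    return float(np.mean((y - y_hat) ** 2))

# Hypothetical data generated exactly by y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# The same data gives a different loss for each choice of (m, b).
loss_at_truth = squared_error_loss(2.0, 1.0, x, y)   # slope and intercept match the data
loss_elsewhere = squared_error_loss(1.0, 0.0, x, y)  # a worse choice of m and b
print(loss_at_truth, loss_elsewhere)
```

The loss is zero only when m and b reproduce the data, and it grows as they move away from those values. Training amounts to searching for the (m, b) pair with the smallest loss.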

Why is the Loss Function Important in Deep Learning?
The loss function is the quantity that training minimizes. It reduces the model's errors on the dataset
to a single number, and the gradients of that number with respect to the parameters tell the optimizer
how to update each weight. The choice of loss function therefore determines which kinds of error the
model is penalized for most heavily.
Cost Functions in Machine Learning
Cost functions are vital in machine learning: they measure the disparity between predicted and actual
outcomes. By common convention, the loss refers to the error on a single example, while the cost is
the average loss over the whole training set, although the two terms are often used interchangeably.
Cost functions guide the training process by quantifying errors and driving parameter updates.
Common ones include Mean Squared Error (MSE) for regression and cross-entropy for classification.
They are the objective that optimization techniques like gradient descent minimize, leading to better
predictions.
Role of Loss Functions in Machine Learning Algorithms
Loss functions play a pivotal role in machine learning algorithms, acting as objective measures of the
disparity between predicted and actual values. They serve as the basis for model training, guiding
algorithms to adjust model parameters in a direction that minimizes the loss and improves predictive
accuracy. Here, we explore the significance of loss functions in the context of machine learning
algorithms.
In machine learning, loss functions quantify the extent of error between predicted and actual
outcomes. They provide a means to evaluate the performance of a model on a given dataset and are
instrumental in optimizing model parameters during the training process.
Fundamental Tasks
One of the fundamental tasks of machine learning algorithms is regression, where the goal is to
predict continuous variables. Loss functions such as Mean Squared Error (MSE) and Mean Absolute
Error (MAE) are commonly employed in regression tasks. Because it squares each error, MSE penalizes
larger errors more heavily than MAE. This makes MSE suitable when large errors are especially costly,
but it also makes MSE more sensitive to outliers. MAE is the better choice when outliers should not
dominate the fit.
For classification problems, where inputs are categorized into discrete classes, cross-entropy loss
functions are widely used. Binary cross-entropy loss is employed in binary classification tasks, while
categorical cross-entropy loss is utilized for multi-class classification. These functions measure the
disparity between predicted probability distributions and the actual distribution of classes, guiding the
model towards more accurate predictions.
The choice of a loss function depends on various factors, including the nature of the problem, the
distribution of the data, and the desired characteristics of the model. Different loss functions
emphasize different aspects of model performance and may be more suitable for specific applications.
During the training process, machine learning algorithms employ optimization techniques such as
gradient descent to minimize the loss function. By iteratively adjusting model parameters based on the
gradients of the loss function, the algorithm aims to converge to the optimal solution, resulting in a
model that accurately captures the underlying patterns in the data.
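The iterative adjustment described above can be sketched for the simplest case, a straight line fitted by gradient descent on the MSE. The function name fit_line_gd, the learning rate, the step count, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def fit_line_gd(x, y, lr=0.05, steps=2000):
    """Fit y ~ m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        error = (m * x + b) - y                  # prediction minus target
        grad_m = (2.0 / n) * np.sum(error * x)   # dMSE/dm
        grad_b = (2.0 / n) * np.sum(error)       # dMSE/db
        m -= lr * grad_m                         # step against the gradient
        b -= lr * grad_b
    return m, b

# Hypothetical data lying exactly on the line y = 3x - 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x - 2.0
m, b = fit_line_gd(x, y)
print(m, b)
```

Each step moves the parameters a small distance in the direction that reduces the loss. Deep learning frameworks apply the same principle, computing the gradients automatically for every weight in the network.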
Overall, loss functions play a crucial role in machine learning algorithms, serving as objective
measures of model performance and guiding the learning process. Understanding the role of loss
functions is essential for effectively training and optimizing machine learning models for various
tasks and applications.
Loss Functions in Deep Learning
Regression Loss Functions
1. Mean Squared Error / Squared Loss / L2 Loss
The Mean Squared Error (MSE) is a straightforward and widely used loss function. To calculate the
MSE, you take the difference between the actual value and the model prediction, square it, and then
average it across the entire dataset.
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
Advantage
• Easy Interpretation: The MSE is straightforward to understand.
• Always Differentiable: Due to the squaring, it is differentiable everywhere.
• Single Minimum for Linear Models: For a linear model, MSE is convex in the parameters and has only
one minimum. For deep networks, the loss surface is generally non-convex.
Disadvantage
• Error Unit in Squared Form: The error is measured in squared units, which might not be
intuitively interpretable.
• Not Robust to Outliers: MSE is sensitive to outliers, because squaring magnifies large errors.
Note: In regression tasks, the last neuron commonly uses a linear activation function.
2. Mean Absolute Error / L1 Loss
The Mean Absolute Error (MAE) is another simple loss function. It calculates the average absolute
difference between the actual value and the model prediction across the dataset.
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
Advantage
• Intuitive and Easy: MAE is easy to grasp.
• Error Unit Matches Output Column: The error unit is the same as the output column.
• Robust to Outliers: MAE is less affected by outliers.
Disadvantage
• Not Differentiable at Zero: The absolute value has a kink where the error is zero, so the gradient is
undefined at that point. In practice, a subgradient is used there.
Note: In regression tasks, at the last neuron, a linear activation function is commonly used.
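As a rough illustration of the subgradient idea, the sketch below computes the MAE and one valid subgradient with respect to the predictions. The function names and the sample values are assumptions made for this example.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mae_subgradient(y_true, y_pred):
    """One valid subgradient of MAE with respect to y_pred.

    np.sign returns 0 where the error is exactly zero, which is an
    admissible choice at the kink of the absolute value.
    """
    return np.sign(y_pred - y_true) / len(y_true)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 6.0])
mae_value = mae(y_true, y_pred)
subgrad = mae_subgradient(y_true, y_pred)
print(mae_value, subgrad)
```

Note that the subgradient has the same magnitude for a small error as for a large one. This is why MAE is less affected by outliers than MSE.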
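3. Huber Loss
The Huber loss is used in robust regression and is less sensitive to outliers compared to squared error
loss.
Huber = \frac{1}{n} \sum_{i=1}^{n} L_\delta(y_i, \hat{y}_i)
L_\delta(y, \hat{y}) = \begin{cases} \frac{1}{2} (y - \hat{y})^2 & \text{if } |y - \hat{y}| \le \delta \\ \delta \left( |y - \hat{y}| - \frac{1}{2} \delta \right) & \text{otherwise} \end{cases}
• n: The number of data points.
• y: The actual value (true value) of the data point.
• ŷ: The predicted value returned by the model.
• δ: Defines the point where the Huber loss transitions from quadratic to linear.
Advantage
• Robust to Outliers: Huber loss is more robust to outliers than MSE.
• Balances MAE and MSE: It behaves like MSE for small errors and like MAE for large errors.
Disadvantage
• Complexity: The threshold δ is an extra hyperparameter that must be tuned, which adds to the
training effort.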
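The following sketch compares the three regression losses on a small set of predictions that contains one very large error. The huber function follows the standard piecewise definition. The data and the choice of delta = 1.0 are assumptions for illustration.

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = np.abs(y_true - y_pred)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 3.5, 14.0])   # the last prediction misses by 10

mse = float(np.mean((y_true - y_pred) ** 2))
mae = float(np.mean(np.abs(y_true - y_pred)))
hub = huber(y_true, y_pred, delta=1.0)
print(mse, mae, hub)
```

The single large error dominates the MSE, while the MAE and the Huber loss grow only linearly with it.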
Classification Loss
1. Binary Cross-Entropy / Log Loss
It is used in binary classification problems, that is, problems with two classes. Examples include
predicting whether a person has COVID or not, or whether an article becomes popular or not.
Binary cross-entropy compares each of the predicted probabilities to the actual class output, which can
be either 0 or 1. It then calculates a score that penalizes each probability based on its distance from
the actual value: the further the prediction is from the true label, the larger the penalty.
BCE = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
• y_i: actual value (0 or 1)
• \hat{y}_i: neural network prediction (a probability between 0 and 1)
Advantage
• The cost function is differentiable.
Disadvantage
• Multiple local minima when it is used with a neural network
• Not intuitive
Note: In binary classification, use the sigmoid activation function at the last neuron.
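A minimal sketch of binary cross-entropy with a sigmoid output is shown below. The clipping constant eps and the example logits are assumptions added to keep the logarithm finite and the example checkable.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1.0 - y_true) * np.log(1.0 - y_prob)))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
logits = np.array([2.0, -2.0, 2.0, -2.0])   # raw scores from the last neuron

good_loss = binary_cross_entropy(y_true, sigmoid(logits))    # predictions agree with labels
bad_loss = binary_cross_entropy(y_true, sigmoid(-logits))    # predictions contradict labels
print(good_loss, bad_loss)
```

Predictions that are confidently wrong receive a much larger loss than predictions that are confidently right, which is the penalizing behaviour described above.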
2. Categorical Cross Entropy
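Categorical cross-entropy is used for multi-class classification and softmax regression.
For a single example, the loss function is
L = -\sum_{j=1}^{k} y_j \log(\hat{y}_j)
and the cost function over the whole dataset is
J = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} y_{ij} \log(\hat{y}_{ij})
where
• k is the number of classes,
• n is the number of data points,
• y is the actual value,
• \hat{y} is the neural network prediction.
Note: In multi-class classification, use the softmax activation function at the last layer.
If the problem statement has 3 classes, the softmax activation for the first class is
f(z_1) = \frac{e^{z_1}}{e^{z_1} + e^{z_2} + e^{z_3}}
and similarly for the other two classes.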
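The softmax activation and the categorical cross-entropy cost can be sketched together as follows. Subtracting the row maximum before exponentiating is a common numerical-stability step that is an addition here, as are the example logits and targets.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the max does not change the result."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def categorical_cross_entropy(y_onehot, y_prob, eps=1e-12):
    """Average over examples of -sum_j y_j * log(yhat_j)."""
    log_prob = np.log(np.clip(y_prob, eps, 1.0))
    return float(-np.mean(np.sum(y_onehot * log_prob, axis=1)))

# Two hypothetical examples with 3 classes each.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
y_onehot = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])

probs = softmax(logits)
cce = categorical_cross_entropy(y_onehot, probs)
print(probs, cce)
```

Each row of probs sums to 1, and only the probability assigned to the true class contributes to the loss.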
When to use categorical cross-entropy and sparse categorical cross-entropy?
If the target column is one-hot encoded, for example 0 0 1, 0 1 0, 1 0 0, use categorical
cross-entropy. If the target column holds integer class labels, for example 1, 2, 3, ..., n, use sparse
categorical cross-entropy. (Libraries such as Keras expect these integer labels to start at 0.)
Which is Faster?
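Sparse categorical cross-entropy is usually faster and uses less memory than categorical cross-entropy,
because each label is a single integer rather than a full one-hot vector. Both compute the same loss value.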
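The relationship between the two variants can be checked with a short sketch. The function names and the probabilities are assumptions, and the integer labels are 0-based as in most libraries.

```python
import numpy as np

def categorical_ce(y_onehot, y_prob):
    """Cross-entropy with one-hot targets."""
    return float(-np.mean(np.sum(y_onehot * np.log(y_prob), axis=1)))

def sparse_categorical_ce(labels, y_prob):
    """Cross-entropy with integer targets: index the true-class probability directly."""
    picked = y_prob[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])       # integer encoding
y_onehot = np.eye(3)[labels]       # the same targets, one-hot encoded

dense_loss = categorical_ce(y_onehot, y_prob)
sparse_loss = sparse_categorical_ce(labels, y_prob)
print(dense_loss, sparse_loss)
```

The sparse version skips the multiplication by zeros and never builds the one-hot matrix, which is where the savings come from when the number of classes is large.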
Conclusion
The significance of loss functions in deep learning cannot be overstated. They serve as vital metrics
for evaluating model performance, guiding parameter adjustments, and optimizing algorithms during
training. Whether it’s quantifying disparities in regression tasks through MSE or MAE, penalizing
deviations in binary classification with binary cross-entropy, or ensuring robustness to outliers with
the Huber loss function, selecting the appropriate loss function is crucial. Understanding the
distinction between loss and cost functions, as well as their role in objective functions, provides
valuable insights into model optimization. Ultimately, the choice of loss function profoundly impacts
model training and performance, underscoring its pivotal role in the deep learning landscape.
Function Approximation with Deep Learning: A Practical Guide with Code Examples
Function approximation is a fundamental problem in machine learning, where the goal is to
approximate an unknown function from a set of input-output data pairs. Deep learning, with its ability
to learn complex non-linear representations, has proven to be a powerful tool for function
approximation problems. In this article, we will explore the basics of function approximation in deep
learning, and provide code examples using Python and the popular deep learning library, TensorFlow.
The Basics of Function Approximation:
Function approximation involves approximating an unknown function f(x) using a set of input-output
data pairs (x, y). The goal is to learn a function g(x) that is as close to the unknown function f(x) as
possible, based on the available data.
Deep learning models are well-suited for function approximation problems, as they have the ability to
learn complex non-linear relationships between the input and output data. The basic idea is to train a
neural network model on the input-output data pairs, and use the learned model to predict the output
for new input values.
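The training idea above can be sketched with a small neural network. To keep the example self-contained it uses plain NumPy rather than TensorFlow, and everything specific in it is an assumption: the target function sin(x), the 16 hidden units, the learning rate, and the number of steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input-output pairs (x, y) sampled from the "unknown" function f(x) = sin(x).
x = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
y = np.sin(x)

hidden = 16
W1 = rng.normal(0.0, 1.0, (1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, 1))
b2 = np.zeros(1)

def g(inputs):
    """The learned approximation g(x): one tanh hidden layer, linear output."""
    return np.tanh(inputs @ W1 + b1) @ W2 + b2

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

initial_loss = mse(g(x), y)
lr = 0.01
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)                 # forward pass, hidden layer
    pred = h @ W2 + b2                       # forward pass, output
    d_pred = 2.0 * (pred - y) / len(x)       # gradient of MSE w.r.t. pred
    d_h = (d_pred @ W2.T) * (1.0 - h ** 2)   # backpropagate through tanh
    W2 -= lr * (h.T @ d_pred)
    b2 -= lr * d_pred.sum(axis=0)
    W1 -= lr * (x.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)
final_loss = mse(g(x), y)
print(initial_loss, final_loss)
```

After training, g can be evaluated at inputs that were not in the training set, for example g(np.array([[0.3]])), to predict the output for a new input value.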
Function Approximation in Reinforcement Learning
Function approximation is a critical concept in reinforcement learning (RL), enabling algorithms to
generalize from limited experience to a broader set of states and actions. This capability is essential
when dealing with complex environments where the state and action spaces are vast or continuous.
Significance of Function Approximation
In reinforcement learning, the agent’s goal is to learn a policy that maximizes cumulative reward over
time. This involves estimating value functions, which predict future rewards, or directly
approximating the policy, which maps states to actions. In many practical problems, the state or action
spaces are too large to allow for an exact representation of value functions or policies. Function
approximation addresses this issue by enabling the use of parameterized functions to represent these
components compactly.
Here are some key points highlighting its significance:
1. Handling Complexity: In many real-world problems, the state and action spaces are too vast
to enumerate or store explicitly. Function approximation allows RL algorithms to represent
value functions or policies compactly using parameterized functions.
2. Generalization: Function approximation enables RL agents to generalize from limited
experience to unseen states and actions. This is crucial for robust performance in
environments where exhaustive exploration is impractical.
3. Efficiency: By approximating value functions or policies, RL algorithms can operate
efficiently even in high-dimensional spaces. This efficiency is essential for scaling RL to
complex tasks such as robotic control or game playing.
Types of Function Approximation in Reinforcement learning:
1. Linear Function Approximation:
Linear function approximators use a linear combination of features to represent value functions or
policies. If ϕ(s) represents the feature vector for state s, the value function V(s) can be approximated
as:
V(s) \approx \theta^T \phi(s)
where θ is a vector of weights. Linear approximators are simple and efficient but may lack the
capacity to represent complex functions.
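As a minimal sketch of a linear approximator in use, the example below learns θ with semi-gradient TD(0). The environment is an assumption made for illustration: a hypothetical chain of 4 states in which the agent always moves right, receives a reward of 1 per step, and then terminates. The features are a bias term and the state index.

```python
import numpy as np

def phi(s):
    """Feature vector for state s: a bias term and the state index."""
    return np.array([1.0, float(s)])

def td0_linear(n_states=4, alpha=0.05, gamma=1.0, episodes=2000):
    """Semi-gradient TD(0) for V(s) ~ theta^T phi(s) on a deterministic chain."""
    theta = np.zeros(2)
    for _ in range(episodes):
        for s in range(n_states):              # walk right: 0 -> 1 -> ... -> terminal
            reward = 1.0
            terminal = (s + 1 == n_states)
            v_next = 0.0 if terminal else theta @ phi(s + 1)
            td_error = reward + gamma * v_next - theta @ phi(s)
            theta = theta + alpha * td_error * phi(s)   # semi-gradient step
    return theta

theta = td0_linear()
values = [float(theta @ phi(s)) for s in range(4)]
print(theta, values)
```

In this chain the true value of state s is the number of remaining steps, 4 - s, which is linear in the features, so two weights are enough to represent the value of every state.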
2. Non-linear Function Approximation
Non-linear methods, particularly those based on neural networks, have gained prominence due to their
ability to capture complex patterns. Deep Q-Networks (DQNs) and Actor-Critic methods are prime
examples. A neural network with weights θ approximates the value function or policy as:
V(s) \approx f(s;\theta)
These methods, while powerful, require careful tuning and substantial computational resources.
3. Basis Function Methods
Basis functions transform the state space into a higher-dimensional space where the function
approximation becomes easier. Examples include Radial Basis Functions (RBF) and tile coding.
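A radial basis feature map can be sketched as follows. The one-dimensional state, the five evenly spaced centers, the width, and the weight vector are all hypothetical choices for illustration.

```python
import numpy as np

def rbf_features(s, centers, width):
    """Gaussian radial basis features for a scalar state s."""
    return np.exp(-((s - centers) ** 2) / (2.0 * width ** 2))

centers = np.linspace(0.0, 1.0, 5)     # hypothetical centers spread over the state range
features = rbf_features(0.5, centers, width=0.25)

# A value estimate is then a linear function of the new features.
theta = np.array([0.0, 1.0, 2.0, 1.0, 0.0])   # hypothetical weights
value = float(theta @ features)
print(features, value)
```

Each feature is largest when the state is near its center, so a linear combination of these features can represent functions that are non-linear in the original state.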
4. Kernel Methods
Kernel-based methods, such as Gaussian processes, provide a non-parametric approach to function
approximation, offering flexibility and robustness at the cost of computational scalability.
Key Concepts in Function Approximation for Reinforcement Learning
1. Features: These are characteristics extracted from the agent’s state that represent relevant
information for making decisions. Choosing informative features is crucial for accurate value
estimation.
2. Learning Algorithm: This algorithm updates the parameters of the chosen function to
minimize the difference between the estimated value and a target derived from the agent's
experience, as in temporal-difference learning. Common algorithms include linear regression,
gradient descent variants, or policy gradient methods, depending on the function class.
3. Function Class: This refers to the type of function used for approximation. Common choices
include linear functions, neural networks, decision trees, or a combination of these. The
complexity of the function class should be balanced with the available data and computational
resources.
Applications of Function Approximation in Reinforcement Learning
1. Robotics Control: Imagine a robot arm learning to manipulate objects. The state space could
include the arm's joint positions, the object's location and orientation, and sensor readings such
as gripper force. These quantities are continuous, so the value function or policy has to be
approximated rather than stored in a table.
2. Playing Atari Games: In complex environments such as Atari games, the state space is vast.
Function approximation using deep neural networks becomes essential to capture the intricate
relationships between the visual inputs and the optimal actions.
3. Stock Market Trading: An RL agent learns to buy and sell stocks to maximize profit. The
state space could involve various financial indicators like stock prices, moving averages, and
market sentiment.
Benefits of Function Approximation
• Generalization: Agents can make good decisions even in unseen states based on what they
have learned from similar states.
• Scalability: Function approximation allows agents to handle problems with large or
continuous state spaces.
• Sample Efficiency: By learning patterns from a smaller set of experiences, agents can make
better decisions with less data.
Challenges in Function Approximation
1. Bias-Variance Trade-off: Choosing the right complexity for the function approximator is
crucial. Too simple a model introduces high bias, while too complex a model leads to high
variance. Balancing this trade-off is essential for stable and efficient learning.
2. Exploration vs. Exploitation: Function approximators must generalize well from limited
exploration data. Ensuring sufficient exploration to prevent overfitting to the initial
experiences is a major challenge.
3. Stability and Convergence: Particularly with non-linear approximators like neural networks,
ensuring stability and convergence during training is difficult. Techniques like experience
replay and target networks in DQNs have been developed to mitigate these issues.
4. Sample Efficiency: Function approximation methods need to be sample efficient, especially
in environments where obtaining samples is costly or time-consuming. Methods like transfer
learning and meta-learning are being explored to enhance sample efficiency.
Conclusion
Function approximation remains a cornerstone of modern reinforcement learning, enabling agents to
operate in complex and high-dimensional spaces. Despite the challenges, continued advancements in
this field hold the promise of more intelligent, efficient, and capable RL systems. As research
progresses, the integration of novel approximation techniques, improved stability methods, and
enhanced sample efficiency strategies will further unlock the potential of RL in diverse applications.
