Introduction to Deep Learning

The document provides an overview of deep learning, a subset of machine learning that utilizes artificial neural networks to process data. It explains various types of machine learning, including supervised, unsupervised, semi-supervised, and reinforcement learning, along with their limitations. Additionally, it discusses the architecture of deep learning networks, key components like weights and biases, and the importance of activation functions in enabling complex computations.

Uploaded by

jessilsj139
Copyright
© All Rights Reserved

Introduction to Deep Learning

Learning is “a process that leads to change, which occurs as a result of experience and increases the potential for
improved performance and future learning”.

Intelligence has been defined in many ways: the capacity for abstraction, logic, understanding, self-
awareness, learning, emotional knowledge, reasoning, planning, creativity, critical thinking, and problem-
solving. More generally, it can be described as the ability to perceive or infer information, and to retain it
as knowledge to be applied towards adaptive behaviours within an environment or context.

Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to
think like humans and mimic their actions. The term may also be applied to any machine that exhibits traits
associated with a human mind such as learning and problem-solving.

The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best
chance of achieving a specific goal.

A subset of artificial intelligence is machine learning (ML), which refers to the concept that computer programs
can automatically learn from and adapt to new data without being assisted by humans.

Deep learning techniques enable this automatic learning through the absorption of huge amounts of unstructured
data such as text, images, or video.

A machine learning system learns from historical data, builds a prediction model, and predicts the output whenever
it receives new data. The accuracy of the predicted output depends in part on the amount of data: a larger volume
of representative data generally helps to build a model that predicts the output more accurately.

Suppose we have a complex problem where we need to make predictions. Instead of writing task-specific code, we
feed the data to a generic algorithm, which builds the logic from the data and predicts the output. Machine
learning has changed the way we think about such problems. The block diagram below explains the working of a
machine learning algorithm:

Features of Machine Learning:

1. Machine learning uses data to detect various patterns in a given dataset.

2. It can learn from past data and improve automatically.

3. It is a data-driven technology.

4. Machine learning is similar to data mining in that it also deals with large amounts of data.

These ML algorithms help to solve different business problems like Regression, Classification, Forecasting,
Clustering, and Associations, etc.

Based on the methods and way of learning, machine learning is mainly divided into four types:

1. Supervised Machine Learning


2. Unsupervised Machine Learning

3. Semi-Supervised Machine Learning

4. Reinforcement Learning

Supervised machine learning is based on supervision. In the supervised learning technique, we train the machine
using a "labelled" dataset, and based on that training, the machine predicts the output. Here, labelled data means
that the inputs are already mapped to the correct outputs. More precisely, we first train the machine with the
inputs and corresponding outputs, and then we ask the machine to predict the output using the test dataset.
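
The train-then-predict workflow described above can be sketched in a few lines. This is only a minimal illustration, assuming a 1-nearest-neighbour rule as the learning algorithm; the toy data, the labels "small" and "large", and the function name are hypothetical and not part of the original notes.

```python
import math

def nearest_neighbour_predict(train_inputs, train_labels, query):
    """Return the label of the training input that lies closest to `query`."""
    distances = [math.dist(x, query) for x in train_inputs]
    best_index = distances.index(min(distances))
    return train_labels[best_index]

# Labelled training data: every input is already mapped to an output.
train_inputs = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
train_labels = ["small", "small", "large", "large"]

# Test inputs that were not seen during training.
test_inputs = [(2.0, 1.5), (8.5, 9.0)]
predictions = [nearest_neighbour_predict(train_inputs, train_labels, x)
               for x in test_inputs]
print(predictions)
```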

Unsupervised learning is different from the Supervised learning technique; as its name suggests, there is no need
for supervision. It means, in unsupervised machine learning, the machine is trained using the unlabeled dataset,
and the machine predicts the output without any supervision. In unsupervised learning, the models are trained
with the data that is neither classified nor labelled, and the model acts on that data without any supervision.
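
As a rough sketch of learning without labels, the example below groups one-dimensional values into two clusters. It assumes k-means as the algorithm, which the notes do not name; the data and the starting centres are hypothetical.

```python
def kmeans_1d(values, centres, iterations=10):
    """Cluster 1-D values by repeatedly assigning each value to its nearest centre."""
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

# Unlabelled data: no output is attached to any value.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centres, clusters = kmeans_1d(values, centres=[0.0, 5.0])
print(centres, clusters)
```

The algorithm is never told which group a value belongs to; the grouping emerges from the data alone.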

Semi-Supervised learning is a type of Machine Learning algorithm that lies between Supervised and
Unsupervised machine learning. It represents the intermediate ground between Supervised (With Labelled
training data) and Unsupervised learning (with no labelled training data) algorithms and uses the combination of
labelled and unlabeled datasets during the training period.

Although semi-supervised learning is the middle ground between supervised and unsupervised learning, the data it
operates on is mostly unlabeled and contains only a few labels. Labels are costly to obtain, so in practice an
organization may have only a small number of them. Semi-supervised learning therefore differs from both
supervised and unsupervised learning, which are defined by the presence and absence of labels respectively.

Reinforcement learning works on a feedback-based process, in which an AI agent (a software component)
automatically explores its surroundings by trial and error: taking actions, learning from experience, and
improving its performance. The agent is rewarded for each good action and punished for each bad action;
hence the goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data like supervised learning, and agents learn from their
experiences only.
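
The reward-driven loop can be illustrated with a very small agent. This is a sketch under stated assumptions: the environment is a hypothetical two-action problem in which action 0 is always punished (-1) and action 1 is always rewarded (+1), and the agent uses an epsilon-greedy rule, which the notes do not prescribe.

```python
import random

def run_agent(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn the value of each action purely from the rewards received."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)   # current estimate of each action's value
    counts = [0] * len(rewards)        # how often each action has been tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))                            # explore
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, total_reward

estimates, total_reward = run_agent(rewards=[-1.0, 1.0])
print(estimates, total_reward)
```

No labelled data is used: the agent only sees the reward that follows each action it takes.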

Limitations of Machine Learning

The limitations of machine learning models depend on the particular model, the problem being solved, and the data
set used to train the model. Generally speaking, machine learning models can be limited by their accuracy, by the
types of problems they can solve, and by the quality of the data used to train them.

Limitations of supervised learning


Supervised learning is a method of machine learning where an algorithm is “trained” using a set of labeled
data. After being trained, the algorithm can then be used to predict the labels of other data sets.

There are many supervised learning algorithms, but all of them have limitations. One of the biggest limitations is
that the algorithms can only learn so much from the data that is provided. In addition, the algorithms are also very
reliant on the data being correctly labeled. If the data is not correctly labeled, the algorithms will not produce
accurate results.

Limitations of unsupervised learning

Unsupervised learning is a type of machine learning where the algorithm is not provided with a set of known
inputs and outputs, and must learn from the data itself. The main limitation of unsupervised learning is that it is
more difficult for the algorithm to learn from the data, and often produces poorer results.

Limitations of semi-supervised learning


Semi-supervised learning is a type of machine learning that uses a combination of supervised learning and
unsupervised learning. The goal of semi-supervised learning is to improve the performance of a learning algorithm
using a small amount of labeled data together with a large amount of unlabeled data.

The limitations of semi-supervised learning are:

1. The quality of the results depends on the quality of the training data. If the training data is poor, the
results will also be poor.
2. Semi-supervised learning is less accurate than supervised learning.
3. It is more difficult to use semi-supervised learning than supervised learning.

Limitations of reinforcement learning

Reinforcement learning is a machine learning technique that allows agents to learn how to achieve a goal or satisfy
a condition by interacting with an environment.

Reinforcement learning has a number of limitations:

1. It can be difficult to determine the appropriate reinforcement learning algorithm to use for a given
problem.
2. It can be difficult to find a good learning rate and other optimization parameters for a reinforcement
learning algorithm.
3. Reinforcement learning can be slow to learn, especially in complex environments.
4. Reinforcement learning can be susceptible to “catastrophic forgetting,” where previously learned
knowledge is lost as the agent learns from new experience.
5. Reinforcement learning can be sensitive to changes in the environment, which can lead to unstable
or unpredictable behaviour.

Limitations of machine learning models – An overview

Taking a wider view, one limitation is that machine learning cannot always accurately predict outcomes for certain
situations. For example, a model may be able to predict that a customer is likely to purchase a product, but may
not be able to accurately predict which product the customer will purchase.

Limitations of machine learning models include:

1. Machine learning models are often opaque, making it difficult to understand why a particular
prediction was made.
2. Machine learning models are often unstable, meaning that they can produce different results when
trained on different data sets.
3. Machine learning models are often biased, meaning that they can produce inaccurate results when
applied to data sets that don’t match the data set on which the model was trained.
4. Machine learning models are often difficult to customize, meaning that it can be hard to change their
parameters or to adapt them to new data sets.
5. Machine learning models are often expensive to train, meaning that it can take a lot of time and
computational resources to build a model that is accurate.
6. Machine learning models are often vulnerable to learning from noise in the data, which can lead to
inaccurate predictions.
7. Machine learning models are often sensitive to the order in which the data is presented to them,
meaning that they can produce different results if the data is rearranged.
8. Machine learning models are often sensitive to the scale of the data, meaning that they can produce
different results if the data is aggregated or disaggregated.
9. Machine learning models are often sensitive to the distribution of the data, meaning that they can
produce different results if the data is sorted in a different way.
10. Machine learning models are often sensitive to the selection of training data, meaning that the results
of the model can be biased if the training data is not representative of the data set that will be used to
make predictions.

Machine learning has revolutionized big data and its potential applications, and its use is growing day by day. It
has the ability to learn from past experience and make predictions about future events. Despite these impressive
capabilities, machine learning has limitations. One key limitation of classic machine learning methods is that they
handle unstructured data poorly unless features are engineered by hand. Additionally, machine learning is only as
good as the data it is trained on. If the data is inaccurate or biased, the machine learning algorithm will produce
inaccurate results. Lastly, human beings can still outperform machine learning on some tasks that involve
understanding natural language and recognizing patterns.

Introduction to Deep learning


Deep learning, which is based on conventional artificial neural networks, is often regarded as
a subset of machine learning.
A node in an artificial neural network is referred to as an artificial neuron. Similar to the
biological neural network of the human brain, artificial neural networks have a layered
architecture, and each network node (connection point) is capable of processing input and
transmitting output to other nodes in the network.
Input layer — initial data for the neural network.
Hidden layers — intermediate layer between input and output layer and place where all the
computation is done.
Output layer — produce the result for given inputs.
Input, output, and hidden layers are commonly present in conventional artificial neural
networks. They typically have only one hidden layer because, as the complexity and number of
hidden layers rise, calculations become unstable and infeasible.
Deep learning neural networks include input and output layers and several hidden layers.
They can have more than one hidden layer thanks to improved algorithms and higher
computing power.
Deep learning neural networks can be generally divided into two types:
1. Convolutional Neural Networks
2. Recurrent Neural Networks
Convolutional Neural Networks are mainly for images, while Recurrent Neural Networks are
mainly for sequence data, such as text and time series.
Deep learning can be dated back to 1989, when Yann LeCun and his colleagues proposed a
structure for a convolutional neural network, called LeNet. LeNet was successfully used in
handwritten digit recognition.
ImageNet, introduced in 2009 by Professor Fei-Fei Li of Stanford University and her colleagues,
is a large image database that contains more than 14 million images of everyday objects, divided
into more than 20,000 categories, such as cats, dogs, cars, tables, chairs, and so on. From 2010,
ImageNet ran an annual challenge on image classification, called the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC). The ILSVRC challenges use only 1,000 categories. The
competitions ran from 2010 to 2017, and moved to Kaggle after 2017.
The breakthrough was made possible by Alex Krizhevsky, Ilya Sutskever, and Geoffrey
Hinton's creation of AlexNet in 2012. With an excellent top-five accuracy of 85%, AlexNet
took first place in the contest. The next-best outcome, at 74 percent, was well behind.
Top-one accuracy in image classification is the fraction of images for which the model's single
highest-scoring prediction matches the true label. Top-five accuracy is the fraction of images
for which the true label appears among the model's five highest-scoring predictions.
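
A short sketch of how top-k accuracy can be computed. The scores and labels are made-up toy values; because the toy problem has only three classes, the example uses k = 2 where ILSVRC uses k = 5.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for class_scores, label in zip(scores, true_labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(class_scores)),
                        key=lambda c: class_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)

# One row of class scores per image (hypothetical values).
scores = [
    [0.10, 0.60, 0.30],
    [0.50, 0.20, 0.30],
    [0.20, 0.70, 0.10],
    [0.05, 0.15, 0.80],
]
true_labels = [1, 2, 2, 2]

top_1 = top_k_accuracy(scores, true_labels, k=1)
top_2 = top_k_accuracy(scores, true_labels, k=2)
print(top_1, top_2)
```

Top-k accuracy can never be lower than top-one accuracy, since the best prediction is always among the top k.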
The original AlexNet publication's primary result was that the depth of the model was essential
to its high performance. Although this depth made training computationally expensive, using
graphics processing units (GPUs) during training made it feasible.
For their work on deep learning, Geoffrey Hinton, Yann LeCun, and Canadian computer
scientist Yoshua Bengio are frequently referred to as the Godfathers of AI or the Godfathers of
Deep Learning.
In 2014, Christian Szegedy and colleagues at Google achieved top results in image classification
and object detection with the GoogLeNet model, which made use of the Inception module and
architecture. The approach was described in the 2014 publication titled "Going Deeper with
Convolutions." GoogLeNet (later known as Inception) has a top-five error rate of 6.66 percent.
With their VGG model, Karen Simonyan and Andrew Zisserman of the Oxford Visual Geometry
Group (VGG) obtained outstanding results for image classification and localisation. With a
top-five error rate of 7.3%, VGG came second in classification. The human-level top-five
classification error rate is estimated at about 5.1 percent.
Using their residual network (ResNet), Kaiming He and colleagues from Microsoft Research
excelled at the object detection and object detection with localization tasks in 2015. ResNet
surpassed the estimated human-level error rate at image classification, with a top-five error
rate of just 3.57%.
In 2019, EfficientNet reported a top-five classification accuracy of 97.1%.

Classic machine learning methods are efficient and successful when there is little data.
However, as the volume of data reaches the millions, their performance approaches a plateau
and stays at the same level even as the data volume grows. With a bigger data set, the
performance of conventional neural networks improves, but eventually it also reaches a
plateau. Deep learning neural networks typically continue to perform better as the size of the
data grows, which is why they are receiving a lot of research interest.

Weights control the signal (or the strength of the connection) between two neurons. In other words, a weight
decides how much influence the input will have on the output. A bias is an additional term added to the weighted
sum; it can be viewed as the weight on an extra input that always has the value of 1. A bias can be positive or
negative, increasing or decreasing a neuron's output.

In the context of neural networks and deep learning, weights and biases are fundamental components that play a
crucial role in the functioning and training of neurons.

Neuron in a Neural Network:

A neuron in a neural network is a mathematical function that takes multiple inputs, applies weights to these inputs,
sums them up, adds a bias, and then applies an activation function to produce an output. This output is then passed
on to the next layer of neurons in the network.
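
The description above maps directly onto a few lines of code. This is a minimal sketch; the input values, weights, and bias are hypothetical, and the sigmoid is used only as an example activation function.

```python
import math

def sigmoid(z):
    """Example activation function: squashes any real value into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Weight each input, sum the results, add the bias, then apply the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

inputs = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 0.1]
bias = -0.3

# The weighted sum plus bias is 0.5 - 0.5 + 0.3 - 0.3 = 0, so the output is about 0.5.
output = neuron_output(inputs, weights, bias, sigmoid)
print(output)
```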

Weights:

Weights are the parameters that the neural network learns during the training process. Each input to a neuron is
associated with a weight. The weight represents the strength of the connection between the input and the neuron.
A higher weight means the input has more influence on the neuron's output, and a lower weight means less
influence.

During training, these weights are adjusted using optimization algorithms (e.g., gradient descent) to minimize a
defined loss function, effectively tuning the network to make accurate predictions.

Bias:

The bias is an additional parameter associated with each neuron. It allows the activation function to shift left or
right, providing the model with more flexibility. The bias helps the neuron to activate even when the weighted
sum of inputs is zero.

In summary, weights determine the strength of connections between neurons and are adjusted during training to
optimize performance. The bias helps in adjusting the activation function and provides the neuron with the ability
to activate even for small input values. Together, weights and biases enable the neural network to learn and
generalize from input data to produce meaningful output predictions.

The neuron calculates a weighted total of its inputs and adds the bias to it; the activation function is then
applied to this value to decide whether, and how strongly, the neuron should be activated. The activation
function's goal is to introduce non-linearity into a neuron's output.

A neural network without an activation function is basically a linear regression model, however many layers
it has. Activation functions perform non-linear computations on the input of a neural network, enabling it to
learn and do more complex tasks.
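
To see why a network without activation functions stays linear, the sketch below stacks two single-neuron layers and compares the result with one equivalent linear function. The weights and biases are arbitrary illustrative values, and ReLU stands in for "some non-linear activation".

```python
def linear_layer(x, w, b):
    """A single neuron with no activation function."""
    return w * x + b

def relu(z):
    return max(0.0, z)

w1, b1 = 2.0, -1.0   # first layer (hypothetical values)
w2, b2 = -3.0, 0.5   # second layer (hypothetical values)

def two_linear_layers(x):
    return linear_layer(linear_layer(x, w1, b1), w2, b2)

def single_equivalent_layer(x):
    # w2 * (w1 * x + b1) + b2 rearranges to (w2 * w1) * x + (w2 * b1 + b2).
    return (w2 * w1) * x + (w2 * b1 + b2)

def two_layers_with_relu(x):
    return linear_layer(relu(linear_layer(x, w1, b1)), w2, b2)

for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, two_linear_layers(x), single_equivalent_layer(x), two_layers_with_relu(x))
```

The first two columns of output always agree, so the extra layer adds nothing; only the version with a non-linearity between the layers behaves differently.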

Activation Functions

An activation function in a neural network defines how the weighted sum of the input is transformed into an output
from a node or nodes in a layer of the network.
Sometimes the activation function is called a “transfer function.” If the output range of the activation function is
limited, then it may be called a “squashing function.” Many activation functions are nonlinear and may be referred
to as the “nonlinearity” in the layer or the network design.
The choice of activation function has a large impact on the capability and performance of the neural network, and
different activation functions may be used in different parts of the model.
Technically, the activation function is used within or after the internal processing of each node in the network,
although networks are designed to use the same activation function for all nodes in a layer.

A network may have three types of layers: input layers that take raw input from the domain, hidden layers that
take input from another layer and pass output to another layer, and output layers that make a prediction.
All hidden layers typically use the same activation function. The output layer will typically use a different
activation function from the hidden layers and is dependent upon the type of prediction required by the model.

Activation functions are also typically differentiable, meaning the first-order derivative can be calculated for a
given input value. This is required given that neural networks are typically trained using the backpropagation of
error algorithm that requires the derivative of prediction error in order to update the weights of the model.

The gradient descent algorithm iteratively calculates the next point using the gradient at the current position:
it scales the gradient (by a learning rate) and subtracts the obtained value from the current position (makes a
step). It subtracts the value because we want to minimise the function (to maximise it, we would add instead).
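
The update rule can be written as a short loop. The function being minimised, f(x) = (x - 3)^2, and the learning rate are illustrative choices only.

```python
def gradient_descent(gradient, start, learning_rate, steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    position = start
    for _ in range(steps):
        position = position - learning_rate * gradient(position)
    return position

# Example function f(x) = (x - 3)^2, whose gradient is 2 * (x - 3) and whose minimum is at x = 3.
def example_gradient(x):
    return 2.0 * (x - 3.0)

minimum = gradient_descent(example_gradient, start=0.0, learning_rate=0.1, steps=100)
print(minimum)
```

With a learning rate that is too large the steps can overshoot and diverge, which is why the scaling factor matters.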

Backpropagation, short for "backward propagation of errors," is an algorithm for supervised learning of
artificial neural networks using gradient descent. Given an artificial neural network and an error function, the
method calculates the gradient of the error function with respect to the neural network's weights.
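
A minimal sketch of the idea, assuming a tiny network with one sigmoid hidden neuron, one linear output neuron, and a squared-error function (none of which are specified in the notes). The gradients are computed backwards with the chain rule and then checked against a numerical estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Hidden neuron with sigmoid activation, followed by a linear output neuron."""
    w1, b1, w2, b2 = params
    hidden = sigmoid(w1 * x + b1)
    output = w2 * hidden + b2
    return hidden, output

def error(params, x, target):
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2

def backpropagate(params, x, target):
    """Gradient of the error with respect to [w1, b1, w2, b2], from output back to input."""
    w1, b1, w2, b2 = params
    hidden, output = forward(params, x)
    d_output = output - target                   # dE/d(output)
    grad_w2 = d_output * hidden
    grad_b2 = d_output
    d_hidden = d_output * w2                     # error passed back to the hidden neuron
    d_z1 = d_hidden * hidden * (1.0 - hidden)    # through the sigmoid derivative
    grad_w1 = d_z1 * x
    grad_b1 = d_z1
    return [grad_w1, grad_b1, grad_w2, grad_b2]

def numerical_gradient(params, x, target, eps=1e-6):
    """Central-difference estimate, used only to check the analytic gradients."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((error(up, x, target) - error(down, x, target)) / (2 * eps))
    return grads

params = [0.4, -0.2, 1.5, 0.1]   # hypothetical starting weights and biases
analytic = backpropagate(params, x=2.0, target=1.0)
numeric = numerical_gradient(params, x=2.0, target=1.0)
print(analytic)
print(numeric)
```

These gradients are exactly what gradient descent needs in order to update each weight and bias.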

There are many different types of activation functions used in neural networks, although perhaps only a small
number of functions are used in practice for hidden and output layers.

Activation for Hidden Layers

A hidden layer in a neural network is a layer that receives input from another layer (such as another hidden layer
or an input layer) and provides output to another layer (such as another hidden layer or an output layer).

A hidden layer does not directly contact input data or produce outputs for a model, at least in general.

A neural network may have zero or more hidden layers.

Typically, a differentiable nonlinear activation function is used in the hidden layers of a neural network. This
allows the model to learn more complex functions than a network trained using a linear activation function.

In order to get access to a much richer hypothesis space that would benefit from deep representations, you need
a non-linearity, or activation function.

There are perhaps three activation functions you may want to consider for use in hidden layers; they are:

• Rectified Linear Activation (ReLU)
• Logistic (Sigmoid)
• Hyperbolic Tangent (Tanh)

This is not an exhaustive list of activation functions used for hidden layers, but they are the most commonly used.

ReLU Hidden Layer Activation Function

The rectified linear activation function, or ReLU activation function, is perhaps the most common function used
for hidden layers.

It is common because it is both simple to implement and effective at overcoming the limitations of other previously
popular activation functions, such as Sigmoid and Tanh. Specifically, it is less susceptible to vanishing
gradients that prevent deep models from being trained, although it can suffer from other problems like saturated
or “dead” units.
The ReLU function is calculated as follows:

• max(0.0, x)
This means that if the input value (x) is negative, then a value 0.0 is returned; otherwise, the input value is returned unchanged.
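
In code, the ReLU calculation is a one-line function; the sample inputs below are arbitrary.

```python
def relu(x):
    """Rectified linear activation: 0.0 for negative inputs, the input itself otherwise."""
    return max(0.0, x)

relu_outputs = [relu(x) for x in [-2.5, -0.1, 0.0, 0.1, 3.0]]
print(relu_outputs)
```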

When using the ReLU function for hidden layers, it is a good practice to use a “He Normal” or “He Uniform”
weight initialization and scale input data to the range 0-1 (normalize) prior to training.
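
The two preparation steps can be sketched as follows. This assumes the common formulation of He Normal initialization, in which weights are drawn from a zero-mean Gaussian with standard deviation sqrt(2 / fan_in); deep learning libraries may differ in detail (for example by truncating the distribution). The function names and layer sizes are hypothetical.

```python
import math
import random

def he_normal(fan_in, fan_out, seed=0):
    """Draw a fan_in x fan_out weight matrix from N(0, sqrt(2 / fan_in))."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

def min_max_scale(values):
    """Normalize values to the range 0-1 (assumes the values are not all equal)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

weights = he_normal(fan_in=200, fan_out=50)
scaled = min_max_scale([10.0, 20.0, 30.0])
print(len(weights), len(weights[0]), scaled)
```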

Sigmoid Hidden Layer Activation Function

The sigmoid activation function is also called the logistic function.

It is the same function used in the logistic regression classification algorithm.

The function takes any real value as input and outputs values in the range 0 to 1. The larger the input (more
positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the
output will be to 0.0.

The sigmoid activation function is calculated as follows:

• 1.0 / (1.0 + e^-x)


Where e is a mathematical constant, which is the base of the natural logarithm.
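
The formula translates directly into code. The sketch below adds one detail not mentioned in the notes: for negative inputs it uses the algebraically equivalent form e^x / (1 + e^x), so that very negative inputs do not overflow.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form for negative x; math.exp(-x) would overflow when x is very negative.
    exp_x = math.exp(x)
    return exp_x / (1.0 + exp_x)

sigmoid_outputs = [sigmoid(x) for x in [-6.0, 0.0, 6.0]]
print(sigmoid_outputs)
```

The outputs approach 0.0 for large negative inputs, equal 0.5 at zero, and approach 1.0 for large positive inputs.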
[Figure: Plot of Inputs vs. Outputs for the Sigmoid Activation Function, showing the familiar S-shape.]

When using the Sigmoid function for hidden layers, it is a good practice to use a “Xavier Normal” or “Xavier
Uniform” weight initialization (also referred to as Glorot initialization, named for Xavier Glorot) and scale input
data to the range 0-1 (i.e. the range of the activation function) prior to training.

Tanh Hidden Layer Activation Function

The hyperbolic tangent activation function is also referred to simply as the Tanh (also “tanh” and “TanH”)
function.

It is very similar to the sigmoid activation function and even has the same S-shape.

The function takes any real value as input and outputs values in the range -1 to 1. The larger the input (more
positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the
output will be to -1.0.

The Tanh activation function is calculated as follows:

• (e^x - e^-x) / (e^x + e^-x)


Where e is a mathematical constant that is the base of the natural logarithm.
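
As a quick check of the formula, the sketch below implements it literally and compares it with Python's built-in `math.tanh`. The literal form is fine for moderate inputs; for very large inputs `math.exp` would overflow, so the built-in function is the safer choice in practice.

```python
import math

def tanh_from_definition(x):
    """Hyperbolic tangent computed literally as (e^x - e^-x) / (e^x + e^-x)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for x in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(x, tanh_from_definition(x), math.tanh(x))
```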
[Figure: Plot of Inputs vs. Outputs for the Tanh Activation Function, showing the familiar S-shape.]

When using the Tanh function for hidden layers, it is a good practice to use a “Xavier Normal” or “Xavier
Uniform” weight initialization (also referred to as Glorot initialization, named for Xavier Glorot) and scale input
data to the range -1 to 1 (i.e. the range of the activation function) prior to training.

How to Choose a Hidden Layer Activation Function

A neural network will almost always have the same activation function in all hidden layers.

It is most unusual to vary the activation function through a network model.

Traditionally, the sigmoid activation function was the default activation function in the 1990s. From perhaps
the mid-to-late 1990s through the 2010s, the Tanh function was the default activation function for hidden layers.

Both the sigmoid and Tanh functions can make the model more susceptible to problems during training, via the
so-called vanishing gradients problem.

The activation function used in hidden layers is typically chosen based on the type of neural network architecture.

Modern neural network models with common architectures, such as MLP and CNN, will make use of the ReLU
activation function, or extensions.
Recurrent networks still commonly use Tanh or sigmoid activation functions, or even both. For example, the
LSTM commonly uses the sigmoid activation for its gates (the recurrent activation) and the Tanh activation for
the candidate cell state and the output.

• Multilayer Perceptron (MLP): ReLU activation function.


• Convolutional Neural Network (CNN): ReLU activation function.
• Recurrent Neural Network: Tanh and/or Sigmoid activation function.

[Figure: How to Choose a Hidden Layer Activation Function.]

Activation for Output Layers

The output layer is the layer in a neural network model that directly outputs a prediction.

All feed-forward neural network models have an output layer.

Feed-forward neural networks are artificial neural networks in which the connections between nodes do not form
loops. Information is only passed forward: input nodes receive data, which travels through the hidden layers and
exits through the output nodes. A feed-forward network with one or more hidden layers is also known as a
multi-layer neural network.

There are perhaps three activation functions you may want to consider for use in the output layer; they are:

• Linear
• Logistic (Sigmoid)
• Softmax
This is not an exhaustive list of activation functions used for output layers, but they are the most commonly used.

Linear Output Activation Function

The linear activation function is also called “identity” (multiplied by 1.0) or “no activation.”
This is because the linear activation function does not change the weighted sum of the input in any way and instead
returns the value directly.

Sigmoid Output Activation Function


The sigmoid or logistic activation function was described above.

Softmax Output Activation Function

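As a reminder of the difference between max and argmax, take the vector A = [2, 3], where A[0] = 2 and A[1] = 3:

• max(A) = 3 (the largest value)
• argmax(A) = 1 (the index of the largest value), or [0, 1] when written as a vector with 0 for all options and 1
for the chosen option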

The softmax function outputs a vector of values that sum to 1.0 that can be interpreted as probabilities of class
membership.
It is related to the argmax function that outputs a 0 for all options and 1 for the chosen option. Softmax is a “softer”
version of argmax that allows a probability-like output of a winner-take-all function.
As such, the input to the function is a vector of real values and the output is a vector of the same length with values
that sum to 1.0 like probabilities.

The softmax function is calculated as follows:

• e^x / sum(e^x)
Where x is a vector of outputs and e is a mathematical constant that is the base of the natural logarithm.
Target labels used to train a model with the softmax activation function in the output layer will be vectors with 1
for the target class and 0 for all other classes.
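
A small sketch of the softmax calculation and of a one-hot target vector. Subtracting the largest input before exponentiating is a common numerical-stability trick that is not part of the formula above; it leaves the result unchanged. The helper names are hypothetical.

```python
import math

def softmax(x):
    """Return e^x / sum(e^x) for a list of real values."""
    shift = max(x)                            # stability trick; does not change the result
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(target_class, num_classes):
    """Target vector with 1 for the target class and 0 for all other classes."""
    return [1 if i == target_class else 0 for i in range(num_classes)]

probabilities = softmax([2.0, 3.0])
target = one_hot(1, 2)
print(probabilities, sum(probabilities), target)
```

For the input [2, 3], argmax would give the hard vector [0, 1], while softmax gives roughly [0.27, 0.73]: the same winner, expressed as probabilities.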

How to Choose an Output Activation Function

You must choose the activation function for your output layer based on the type of prediction problem that you
are solving.

Specifically, the type of variable that is being predicted.

For example, you may divide prediction problems into two main groups, predicting a categorical variable
(classification) and predicting a numerical variable (regression).
If your problem is a regression problem, you should use a linear activation function.

• Regression: One node, linear activation.


If your problem is a classification problem, then there are three main types of classification problems and each
may use a different activation function.

Predicting a probability is not a regression problem; it is classification. In all cases of classification, your model
will predict the probability of class membership (e.g. probability that an example belongs to each class) that you
can convert to a crisp class label by rounding (for sigmoid) or argmax (for softmax).
If there are two mutually exclusive classes (binary classification), then your output layer will have one node and
a sigmoid activation function should be used. If there are more than two mutually exclusive classes (multiclass
classification), then your output layer will have one node per class and a softmax activation should be used. If
there are two or more mutually inclusive classes (multilabel classification), then your output layer will have one
node for each class and a sigmoid activation function is used.

• Binary Classification: One node, sigmoid activation.


• Multiclass Classification: One node per class, softmax activation.
• Multilabel Classification: One node per class, sigmoid activation.
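
The conversion from predicted probabilities to crisp class labels can be sketched for the three cases. The 0.5 threshold is an assumption standing in for "rounding", and the function names and probabilities are illustrative only.

```python
def binary_label(probability, threshold=0.5):
    """One sigmoid output node: label 1 if the probability reaches the threshold."""
    return 1 if probability >= threshold else 0

def multiclass_label(probabilities):
    """Softmax output, one node per class: argmax picks the single most likely class."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

def multilabel_labels(probabilities, threshold=0.5):
    """Sigmoid output, one node per class: each class is decided independently."""
    return [1 if p >= threshold else 0 for p in probabilities]

binary_result = binary_label(0.83)
multiclass_result = multiclass_label([0.1, 0.7, 0.2])
multilabel_result = multilabel_labels([0.9, 0.2, 0.6])
print(binary_result, multiclass_result, multilabel_result)
```

In the multilabel case the probabilities need not sum to 1.0, because several classes can be present at once.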
The figure below summarizes how to choose an activation function for the output layer of your neural network
model.

Biological Neurons

Typical biological neurons are individual cells, each composed of the main body of the cell along with many tendrils
that extend from that body. The body, or soma, houses the machinery for maintaining basic cell functions and
energy processing (e.g., the DNA-containing nucleus, and organelles for building proteins and processing sugar
and oxygen). There are two types of tendrils: dendrites, which receive information from other neurons and bring it
to the cell body, and axons, which send information from the cell body to other neurons.

Information transmission from a transmitting neuron to a receiving neuron is roughly composed of three stages.
First, the transmitting neuron generates a spatially- and temporally-confined electrical burst, or spike, that travels
along the neuron’s axon (and axonal branches) from the cell body to the terminal ends of the axon. An axon terminal
of the transmitting neuron is “connected” to a dendrite of a receiving neuron by a synapse. The spike causes the
transmitting neuron’s synapse to release chemicals, or neurotransmitters, that travel the short distance between the
two neurons via diffusion.


Specialized receptors on the receiving neuron recognize (bind with) specific neurotransmitters, and initiate a

number of cellular events (most of which are ignored in this post) when neurotransmitter molecules bind to the

receptors. One of those events is the opening of cellular channels which initiate another electrical wave, this time

propagating through the receiving neuron’s dendrite toward its cell body (this may be in the form of a spike, but

typically the wave is more spatially diffuse than spike-based transmission along axons — think of water being

pushed through a pipe).

Thus, information from one neuron can be transmitted to another. When a neuron receives

multiple excitatory spikes from multiple transmitting neurons, that electrical energy is accumulated at the neuron’s

cell body, and if enough energy is accumulated in a short period of time, the neuron will generate outgoing spikes

of its own, and relay them to other neurons.

There are three remaining aspects to discuss in order to understand the modeling that takes us from biological

neurons to deep learning neurons.

• Rate-coding

• Synaptic strength

• Excitatory and inhibitory transmission

Rate-coding

A neuron that receives only a small number of excitatory spikes will produce and send few spikes of its own, if

any. If that same neuron receives many excitatory spikes it will (typically) send many spikes of its own. Although

spikes in biological neurons have a distinctly temporal characteristic, the temporal resolution is “blurred” in deep

learning neurons. For a given unit of time, the spiking activity of the deep learning neuron is represented as a

number of spikes (an integer) or more typically, an average spiking rate (a floating-point number).
In this contrived example, three neurons in the visual system receive indirect input from one of three groups of
color-sensitive cone cells in the eye. Each neuron is therefore maximally responsive to a particular wavelength
of light, and spiking activity is reported as the average spike rate (normalized to [0,1]). Thus, the input wavelength
is “encoded” by the collective spike rates of the three neurons.

Note, however, that in biological neurons, information is encoded in the relative timing of spikes in individual or

multiple neurons, not just in the individual neuron spiking rates. Thus, this type of information coding and

transmission is absent in deep learning neurons. The impact of this will be discussed further below.

Synaptic strength

Not all spikes are equal. When a propagating spike reaches an axonal terminal, the amount of electrical energy that

ultimately arises in the dendrite of the receiving neuron depends on the strength of the intervening synapse. This

strength is reflective of a number of underlying physiological factors including the amount of neurotransmitter

available for release in the transmitting neuron and the number of neurotransmitter receptors on the receiving

neuron.
Regardless, in deep learning neurons, synaptic strength is represented by a single floating-point number, and is

more commonly referred to as the weight of the synapse.

Excitatory and inhibitory neurotransmitters

Up until now, we have only considered excitatory neurotransmission. In that case, spikes received from a

transmitting neuron increase the likelihood that a receiving neuron will also spike. This is due to the particular

properties of the activated receptors on the receiving neuron. Although an oversimplification, one can group

neurotransmitters and their receptors into an excitatory class and an inhibitory class. When an inhibitory

neurotransmitter binds to an inhibitory receptor, the electrical energy at the dendrite in the receiving neuron is

reduced rather than increased. In general, neurons have receptors for both excitatory and inhibitory

neurotransmitters, but can release (transmit) only one class or the other. In the mammalian cortex, there are many

more excitatory neurons (which release the neurotransmitter glutamate with each spike) than inhibitory neurons

(which release the neurotransmitter GABA with each spike). Nonetheless, these inhibitory neurons are important

for increasing information selectivity in receiving neurons, gating neurons off and thus contributing to information

routing, and preventing epileptic activity (chaotic firing of many neurons in the network).
In deep learning networks, no distinction is made between excitatory and inhibitory neurons (those having only an

excitatory or inhibitory neurotransmitter, respectively). All neurons have output activity that is greater than zero,

and it is the synapses that model inhibition. The weights of the synapses are allowed to be negative, in which case

inputs from transmitting neurons cause the output of the receiving neuron to be reduced.
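The three modeling choices described above (rate-coded inputs, a single number per synapse, and negative weights for inhibition) can be combined into one small sketch. The function name, the example rates and weights, and the clipping at zero to keep the output activity non-negative are all assumptions made for illustration.

```python
def neuron_output(rates, weights, bias=0.0):
    """Weighted sum of rate-coded inputs, clipped so the output rate is never negative."""
    total = sum(rate * weight for rate, weight in zip(rates, weights)) + bias
    return max(0.0, total)

# Average spike rates of three transmitting neurons, normalized to [0, 1].
rates = [0.9, 0.2, 0.7]

# Two excitatory synapses (positive weights) and one inhibitory synapse (negative weight).
weights = [0.8, 0.5, -1.0]

with_inhibition = neuron_output(rates, weights)
without_inhibition = neuron_output(rates, [0.8, 0.5, 0.0])
```

Here the inhibitory synapse lowers the receiving neuron's output from about 0.82 to about 0.12, which is the effect the negative weight is meant to model.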

Differences between ANN and BNN:

| S.No. | ANN | BNN |
|---|---|---|
| 1. | It is short for Artificial Neural Network. | It is short for Biological Neural Network. |
| 2. | Processing speed is fast as compared to Biological Neural Network. | They are slow in processing information. |
| 3. | Allocation for storage to a new process is strictly irreplaceable as the old location is saved for the previous process. | Allocation for storage to a new process is easy as it is added just by adjusting the interconnection strengths. |
| 4. | Processes operate in sequential mode. | The process can operate in massive parallel operations. |
| 5. | If any information gets corrupted in the memory it cannot be retrieved. | Information is distributed throughout the network into sub-nodes, so even if it gets corrupted it can be retrieved. |
| 6. | The activities are continuously monitored by a control unit. | There is no control unit to monitor the information being processed in the network. |

Definitions
1. Biological Neuron:
A cell in the nervous system that transmits information using electrical and chemical signals. It consists
of dendrites (input), a soma (cell body), and an axon (output).
2. Linear Perceptron:
An artificial neuron model that performs binary classification by computing a weighted sum of inputs
and applying a step activation function to produce either 0 or 1 as output.
3. Expressing Linear Perceptrons as Neurons:
Linear perceptrons can be modeled as artificial neurons with inputs, weights, bias, and an activation
function that mimics the input-output behavior of biological neurons.
4. Perceptron Learning Algorithm:
A method for training a perceptron by iteratively updating its weights based on errors in predictions,
ensuring the perceptron can classify linearly separable data.
5. Sigmoid Neurons:
Neurons that use the sigmoid activation function, mapping inputs to a range between 0 and 1, often
used in binary classification tasks.
6. Tanh Neurons:
Neurons that use the hyperbolic tangent (tanh) activation function, mapping inputs to a range between
-1 and 1, helping in faster optimization during training.
7. ReLU Neurons:
Neurons that use the Rectified Linear Unit (ReLU) activation function, which outputs the input directly
if it is positive, otherwise outputs zero, widely used in deep learning.
1. Explain the structure and components of a biological neuron. How do artificial neurons model them?
Biological Neuron: A biological neuron consists of four key components:

• Dendrites: Branch-like structures that receive signals from other neurons.

• Cell Body (Soma): The central part of the neuron that processes incoming signals and integrates
information.

• Axon: A long, slender projection that transmits electrical impulses to other neurons, muscles, or glands.

• Synapse: The junctions between neurons through which signals are passed from one neuron to another.
Artificial Neuron: An artificial neuron mimics the behavior of a biological neuron. It consists of:

• Inputs: Represent signals received by the dendrites.

• Weights: Analogous to synaptic weights, these control the strength of the input signals.

• Bias: A parameter that shifts the activation function, helping adjust the neuron's output.

• Activation Function: Similar to the soma's decision-making process, it determines the neuron's output
based on the weighted sum of inputs.

• Output: Represents the signal that is passed on to other neurons in the network.

2. What is a perceptron, and how does it work? Explain its limitations.


Perceptron: A perceptron is a simple algorithm for binary classification. It takes multiple inputs, processes them
by applying weights and biases, and passes the result through an activation function to produce an output (usually
0 or 1).
How it Works:

• The perceptron sums the weighted inputs, adds a bias, and then applies an activation function (typically
a step function) to produce an output. If the weighted sum exceeds the threshold, the input is classified
as 1; otherwise, it is classified as 0.
Limitations:

• The perceptron can only solve linearly separable problems. It cannot solve problems where the data
cannot be separated by a straight line (like the XOR problem).

• It is unable to handle more complex decision boundaries, which is why more advanced models, such as
multi-layer neural networks, are necessary.
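As a minimal sketch of the computation described above, the following perceptron uses a step function with a threshold of zero. The weights and bias are hand-picked (an assumption for illustration, not learned values) so that the perceptron computes the logical AND of two binary inputs, which is a linearly separable problem.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, followed by a step activation with threshold 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

# Hand-picked parameters that implement logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

and_outputs = [
    perceptron([a, b], AND_WEIGHTS, AND_BIAS)
    for a in (0, 1)
    for b in (0, 1)
]
# Inputs are visited in the order (0,0), (0,1), (1,0), (1,1).
```

Only the input (1, 1) pushes the weighted sum above zero, so the outputs are 0, 0, 0, 1.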

3. Why is the XOR problem significant in neural network history?


The XOR problem is significant because it exposed the limitations of the single-layer perceptron. A perceptron
could not solve the XOR problem because it is not linearly separable. This issue demonstrated the need for more
advanced neural networks (i.e., multi-layer networks) to handle non-linear classification tasks, which led to the
development of deep learning and multi-layer perceptrons (MLPs).
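The point can be illustrated with a small experiment. The sketch below searches a coarse grid of weights and biases for a single perceptron that reproduces XOR and finds none, then shows a two-layer network with hand-picked weights (OR and NAND hidden units feeding an AND output) that does. The grid and all weight values are assumptions chosen for illustration; the grid search is a demonstration, not a proof.

```python
from itertools import product

def perceptron(x, weights, bias):
    """Step-activated perceptron with threshold 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Try every (w1, w2, bias) combination on a grid from -2.0 to 2.0 in steps of 0.5.
grid = [i * 0.5 for i in range(-4, 5)]
single_layer_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(perceptron(x, [w1, w2], b) == y for x, y in XOR.items())
]

def two_layer_xor(x):
    """Two hidden perceptrons (OR, NAND) combined by an AND output perceptron."""
    h_or = perceptron(x, [1.0, 1.0], -0.5)
    h_nand = perceptron(x, [-1.0, -1.0], 1.5)
    return perceptron([h_or, h_nand], [1.0, 1.0], -1.5)

two_layer_outputs = [two_layer_xor(x) for x in XOR]
```

The single-layer search comes back empty, while the two-layer network produces 0, 1, 1, 0 for the four inputs.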

4. Derive the perceptron learning algorithm and explain its weight update rule.
The perceptron learning algorithm is used to adjust the weights based on the error between the predicted output
and the actual label. It works by updating the weights and bias whenever there is a misclassification. If the
perceptron makes an incorrect prediction, the weights are adjusted according to the formula:

• Weight Update Rule:

o If the prediction is wrong, each weight is updated by adding or subtracting the corresponding input
value, scaled by a learning rate η: w_i ← w_i + η (y − ŷ) x_i, where y is the actual label and ŷ is the
predicted output.
• Bias Update Rule:
o If the prediction is incorrect, the bias is updated in the same way, as if it were a weight on a constant
input of 1: b ← b + η (y − ŷ).
The learning process continues until the perceptron converges, meaning the weights lead to correct classifications
on the training set. Convergence is guaranteed only when the training data is linearly separable.
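The update rule can be sketched as a short training loop. This is a minimal illustration under stated assumptions: labels are 0 or 1, the step function uses a threshold of zero, the bias is treated as a weight on a constant input of 1, and the learning rate, epoch limit, and AND dataset are made up for the example.

```python
def predict(weights, bias, x):
    """Step-activated perceptron output (0 or 1)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """Update weights and bias only when an example is misclassified."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                # Add or subtract the inputs, scaled by the learning rate.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # converged: every training example is classified correctly
            break
    return weights, bias

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_data)
predictions = [predict(weights, bias, x) for x, _ in and_data]
```

Because AND is linearly separable, the loop stops after a handful of epochs. On a dataset that is not linearly separable, it would simply run until `max_epochs` without ever reaching zero mistakes.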

5. What are activation functions, and why are they essential in neural networks?
Activation functions are mathematical functions applied to the weighted sum of inputs to a neuron. They introduce
non-linearity into the network, allowing it to learn complex patterns. Without activation functions, a neural
network would simply behave like a linear regression model, regardless of how many layers it has.
Why essential?

• They enable neural networks to approximate non-linear functions.

• They allow deep networks to learn complex patterns and make more accurate predictions.

• Without non-linear activation functions, neural networks would be limited in their ability to model
complex relationships.
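The claim that a network without activation functions behaves like a linear model can be checked directly. As a sketch, assume two layers with weight matrices W_1, W_2 and bias vectors b_1, b_2 (symbols introduced here for illustration) and no activation between them:

```latex
\begin{aligned}
h &= W_1 x + b_1 \\
y &= W_2 h + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2)
\end{aligned}
```

The result again has the form W x + b, so the two layers are equivalent to a single linear layer. The same argument applies to any number of stacked layers.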

6. Compare and contrast sigmoid, tanh, and ReLU activation functions in terms of their mathematical
properties and practical use cases.

• Sigmoid:
o Output range: (0, 1)
o Smooth gradient and differentiable, making it useful in probabilistic models.
o Issues: Can cause vanishing gradients for large positive or negative inputs, which leads to slow
convergence in deep networks.

• Tanh (Hyperbolic Tangent):


o Output range: (-1, 1)
o Similar in shape to sigmoid but zero-centered with a wider output range, which often makes optimization easier.
o Issues: Also suffers from vanishing gradients like the sigmoid, but less severe.

• ReLU (Rectified Linear Unit):


o Output range: [0, ∞)
o Simple and computationally efficient.
o Provides sparse activation (only positive values are passed), which can make the network more
efficient.
o Issues: Can cause the "dying ReLU" problem, where neurons can get stuck during training and
never activate.
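The properties compared above can be made concrete with a short sketch of the three functions and their derivatives, using only the standard library. Treating the ReLU derivative at exactly zero as 0 is a convention assumed here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2  # peaks at 1.0 when z = 0

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # derivative at z = 0 taken as 0 by convention

# Gradients at a large input show saturation for sigmoid and tanh but not for ReLU.
saturated = {
    "sigmoid": sigmoid_grad(10.0),
    "tanh": tanh_grad(10.0),
    "relu": relu_grad(10.0),
}
```

The zero gradient of ReLU for negative inputs is also what makes the "dying ReLU" problem possible: a neuron whose input stays negative receives no gradient and stops updating.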

7. What is the "vanishing gradient problem," and how do activation functions like ReLU address it?
Vanishing Gradient Problem: The vanishing gradient problem occurs when gradients become very small during
backpropagation in deep networks. This happens especially with activation functions like sigmoid or tanh, which
saturate for inputs of large magnitude, so their derivatives there are close to zero. Because backpropagation
multiplies these derivatives layer by layer, the gradients approach zero as they are propagated back. This leads
to slower learning or the network failing to learn altogether.
How ReLU addresses it: For positive inputs, ReLU simply outputs its input, so its derivative is exactly 1 and the
gradient passes through without shrinking. This helps prevent gradients from vanishing and typically gives
faster convergence in deep networks.
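A rough numerical sketch shows how quickly the effect compounds. It deliberately ignores the weights and looks only at the product of activation derivatives along a chain of layers, which is a simplifying assumption; the depth of 10 is arbitrary.

```python
import math

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def gradient_factor(local_grad, depth):
    """Product of identical per-layer activation derivatives across `depth` layers."""
    return local_grad ** depth

depth = 10

# Best case for sigmoid: every pre-activation is 0, where the derivative peaks at 0.25.
sigmoid_factor = gradient_factor(sigmoid_grad(0.0), depth)

# ReLU with positive pre-activations: the derivative is 1 at every layer.
relu_factor = gradient_factor(1.0, depth)
```

Even in the sigmoid's best case the factor after ten layers is 0.25 to the power 10, which is below one millionth, while the ReLU factor stays at 1.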
8. Describe how a perceptron can be geometrically interpreted in terms of decision boundaries.
A perceptron can be geometrically interpreted as a linear classifier. In a 2D space, the perceptron draws a straight
line (or hyperplane in higher dimensions) that separates the two classes. This line is the decision boundary, where
one side corresponds to one class, and the other side corresponds to the other. The perceptron adjusts the weights
to move this boundary to correctly classify the data points.

9. Why is non-linearity critical in building deep neural networks?


Non-linearity is essential because it enables neural networks to model complex patterns. Without non-linear
activation functions, neural networks would essentially be a series of linear transformations, no matter how many
layers they have. Non-linear activation functions allow the network to learn non-linear decision boundaries, which
are necessary for solving more complex problems, such as image recognition or natural language processing.

10. Discuss scenarios where sigmoid or tanh activation functions are preferred over ReLU.

• Sigmoid and Tanh:


o These functions are typically preferred in output layers for problems requiring probabilistic
outputs or when the output range needs to be bounded (e.g., binary classification with sigmoid).
o Tanh might be preferred over sigmoid when it's important to have both positive and negative
activations, as it produces a symmetric output centered around zero.
o Both functions are also preferred in shallow networks where gradient issues (like vanishing
gradients) are less pronounced.

• ReLU is generally used in hidden layers, where the ability to learn sparse representations and avoid
vanishing gradients outweighs the preference for bounded outputs.

What is a feed forward neural network?

Feed forward neural networks are artificial neural networks in which the connections between nodes
do not form loops. When such a network has one or more hidden layers it is also called a
multi-layer neural network, and information is only ever passed forward.

During data flow, input nodes receive data, which travels through the hidden layers and exits
through the output nodes. The network has no links that send information back from the output
nodes.

A feed forward neural network approximates functions in the following way:

• A classifier can be described by a target function y = f*(x).
• This function assigns input x to category y.
• The feed forward model defines a mapping y = f(x; θ) and learns the values of the parameters
θ that give the closest approximation of f*.

Feed forward neural networks serve as the basis for object detection in photos, as shown in the
Google Photos app.

What is the working principle of a feed forward neural network?


In its simplest form, a feed forward neural network is a single-layer perceptron.

This model multiplies inputs by weights as they enter the layer, then adds the weighted input
values together. If the sum rises above a threshold, usually set at zero, the output value is
typically 1; if it falls below the threshold, the output is typically -1.

As a feed forward neural network model, the single-layer perceptron is often used for
classification, and it can be trained. During training, the network compares its outputs with
the intended values and adjusts its weights using the delta rule.

This training procedure is a form of gradient descent. Multi-layer perceptrons update their
weights in a similar way, through a process known as back-propagation, in which the weights
of the hidden layers are adjusted according to the error in the output values produced by the
final layer.

Layers of feed forward neural network

• Input layer:
The neurons of this layer receive input and pass it on to the other layers of the network. The
number of neurons in the input layer must match the number of features (attributes) in the dataset.

• Output layer:

Depending on the type of model being built, this layer produces the predicted value or class.

• Hidden layer:

Hidden layers separate the input and output layers. Depending on the type of model, there
may be several hidden layers.

Each hidden layer contains several neurons that transform the input before passing it to the
next layer. The weights of the network are updated continually during training to improve its
predictions.

• Neuron weights:

Neurons are connected by weights, which measure the strength or magnitude of each connection.
Input weights can be compared in the same way as linear regression coefficients.

Weights are real-valued and may be negative; they are often initialized to small values.

• Neurons:

Feed forward networks are built from artificial neurons, which are adapted from biological
neurons. A neural network consists of artificial neurons.

Neurons function in two steps: first, they compute a weighted sum of their inputs, and second,
they apply an activation function to that sum to normalize it.

Activation functions can be either linear or nonlinear. Each neuron has a weight for each of its
inputs. During the learning phase, the network learns these weights.

• Activation Function:

This is where a neuron's decision is made.

The activation function determines whether the neuron makes a linear or nonlinear decision.
Because signals pass through many layers, a bounded activation function also keeps neuron
outputs from growing as they cascade through the network.

The three most common activation functions are sigmoid, Tanh, and Rectified Linear Unit
(ReLU).

• Sigmoid:

Input values get mapped to output values between 0 and 1.

• Tanh:

Input values get mapped to output values between -1 and 1.


• Rectified linear Unit:

Only positive values are allowed to flow through this function. Negative values get mapped to
0.

Function in feed forward neural network

Cost function

In a feed forward neural network, the cost function plays an important role. Minor adjustments
to weights and biases usually have little effect on how the data points are classified, so
classification accuracy alone gives a poor signal for improvement.

A smooth cost function is therefore used to determine how to adjust the weights and biases to
improve performance.

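Following is a definition of the mean square error cost function, in the common quadratic form
consistent with the symbols listed below:

$$ C(w, b) = \frac{1}{2n} \sum_{x} \| y(x) - a \|^2 $$

Where,

w = the weights gathered in the network

b = biases

n = number of inputs for training

a = output vectors
x = input

y(x) = the desired output for input x

‖v‖ = vector v's normal length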
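As a small numerical sketch of this cost, assume the common quadratic form in which the squared length of the difference between the desired output and the network output a is summed over all training inputs and divided by 2n. The factor 1/2 is a convention, and the target and output vectors below are made-up values.

```python
def quadratic_cost(targets, outputs):
    """Sum of squared vector differences over all examples, divided by 2n."""
    n = len(targets)
    total = 0.0
    for desired, actual in zip(targets, outputs):
        # Squared length of the difference vector for one training example.
        total += sum((d - a) ** 2 for d, a in zip(desired, actual))
    return total / (2 * n)

targets = [[1.0, 0.0], [0.0, 1.0]]   # desired outputs
outputs = [[0.8, 0.2], [0.4, 0.6]]   # network outputs a
cost = quadratic_cost(targets, outputs)
perfect = quadratic_cost(targets, targets)
```

The cost is zero only when every output matches its target, and it changes smoothly as the outputs move, which is the property the cost function section relies on.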

Loss function

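The loss function of a neural network is used to determine how much adjustment is needed
during the learning process.

The number of neurons in the output layer equals the number of classes, and the loss measures
the difference between the predicted and actual probability distributions. The cross-entropy
loss for binary classification, with true label y ∈ {0, 1} and predicted probability p, is:

$$ L = -\left[ y \log p + (1 - y) \log (1 - p) \right] $$

For multiclass classification over C classes, with one-hot label y_c and predicted probability
p_c for class c, the cross-entropy loss is:

$$ L = -\sum_{c=1}^{C} y_c \log p_c $$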
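Both losses can be sketched in a few lines. The code assumes the standard definitions: for binary classification, the negative log of the probability assigned to the true label; for multiclass classification, the negative log of the probability assigned to the true class of a one-hot label. The example probabilities are made up.

```python
import math

def binary_cross_entropy(y, p):
    """y is the true label (0 or 1); p is the predicted probability of class 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_cross_entropy(one_hot, probs):
    """one_hot marks the true class; probs is the predicted distribution."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

bce_confident = binary_cross_entropy(1, 0.9)   # correct and confident: small loss
bce_wrong = binary_cross_entropy(1, 0.1)       # confident but wrong: large loss
cce = categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
```

The loss grows sharply as the probability assigned to the true class approaches zero, so confident mistakes are penalized much more heavily than hesitant ones.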

Gradient learning algorithm

In the gradient descent algorithm, the next point is calculated by scaling the gradient at the
current position by a learning rate and subtracting the result from the current position.

The scaled gradient is subtracted because the goal is to decrease the function (to increase it,
the value would be added). The procedure can be written as:

$$ p_{n+1} = p_n - \eta \nabla f(p_n) $$

The parameter η scales the gradient and therefore determines the step size. The choice of
learning rate significantly affects performance.
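The procedure can be sketched for a function of one variable. The example function f(x) = (x - 3)^2, its gradient 2(x - 3), the starting point, the learning rate, and the step count are all assumptions chosen so that the behavior is easy to check.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    position = start
    for _ in range(steps):
        position = position - learning_rate * grad(position)
    return position

def grad_f(x):
    """Gradient of f(x) = (x - 3)**2, whose minimum is at x = 3."""
    return 2.0 * (x - 3.0)

minimum = gradient_descent(grad_f, start=0.0, learning_rate=0.1, steps=100)
one_step = gradient_descent(grad_f, start=0.0, learning_rate=0.1, steps=1)
```

With a learning rate of 0.1 each step closes 20% of the remaining distance to the minimum. For this particular function a learning rate above 1.0 would overshoot by more than it corrects and diverge, which illustrates why the learning rate matters so much.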

Output units
In the output layer, output units are those units that provide the desired output or prediction,
thereby fulfilling the task that the neural network needs to complete.

There is a close relationship between the choice of output units and the cost function. Any unit
that can serve as a hidden unit can also serve as an output unit in a neural network.

Advantages of feed forward Neural Networks

• The simplified architecture of feed forward neural networks makes them a convenient
building block for machine learning.
• Multiple networks within a feed forward system can operate independently, with a
moderating intermediary combining their results.
• Complex tasks require many neurons in the network.
• Neural networks handle nonlinear data more easily than single perceptrons or sigmoid
neurons, which struggle with it.
• A neural network can learn the complicated decision boundaries that such data requires.
• Depending on the data, the neural network architecture can vary. For example,
convolutional neural networks (CNNs) perform exceptionally well in image processing,
whereas recurrent neural networks (RNNs) perform well in text and voice processing.
• Neural networks typically need graphics processing units (GPUs) to train on large
datasets because of the heavy computation involved. GPU access is widely available
through hosted environments such as Kaggle Notebooks and Google Colab notebooks.

Applications of feed forward neural networks

Q1. What is feed-forward vs deep feed-forward?


A. Feed-forward refers to a neural network architecture where information flows in one

direction, from input to output, with no feedback loops. Deep feed-forward, commonly known

as a deep neural network, consists of multiple hidden layers between input and output layers,

enabling the network to learn complex hierarchical features and patterns, enhancing its ability

to model intricate relationships in data.

Q2. What is feed-forward vs feedback neural network?

A. Feed-forward neural networks transmit data in one direction—from input to output—

without feedback loops, making them suitable for tasks like pattern recognition and

classification. Feedback neural networks, on the other hand, incorporate feedback connections,

allowing output to affect subsequent processing. Recurrent Neural Networks (RNNs) are a

common type of feedback network, useful for sequential data tasks like language modeling,

where context matters.

The following example works through a simple feedforward neural network problem and
illustrates how the backpropagation algorithm updates the weights. Assume a basic
feedforward network for a binary classification problem with two features in the input, one
hidden layer with two neurons, and one output neuron. No bias terms are used.

**Problem**:
Suppose you have a feedforward neural network with the following architecture:

- Input Layer: 2 neurons


- Hidden Layer: 2 neurons
- Output Layer: 1 neuron

The network is trained to perform binary classification. Given a set of training data, we'll
calculate the weights' updates for one training example using the backpropagation algorithm.

**Training Data**:
Let's consider one training example with the following values:

- Input features: x1 = 0.6, x2 = 0.9


- Target output: y_target = 1

**Initial Weights**:
We'll start with some initial weights for the connections:

- Weights between the input and hidden layer:


- w1 = 0.1, w2 = -0.2, w3 = 0.3, w4 = 0.4
- Weights between the hidden and output layer:
- w5 = -0.5, w6 = 0.6

**Forward Pass**:
1. Calculate the weighted sum and apply the activation function for the hidden layer:

Hidden Neuron 1:
z1 = (0.6 * 0.1) + (0.9 * (-0.2)) = 0.06 - 0.18 = -0.12
a1 = sigmoid(z1) = sigmoid(-0.12) ≈ 0.4700

Hidden Neuron 2:
z2 = (0.6 * 0.3) + (0.9 * 0.4) = 0.18 + 0.36 = 0.54
a2 = sigmoid(z2) = sigmoid(0.54) ≈ 0.6318

2. Calculate the weighted sum and apply the activation function for the output layer:

Output Neuron:
z3 = (a1 * (-0.5)) + (a2 * 0.6) ≈ (0.4700 * (-0.5)) + (0.6318 * 0.6) ≈ -0.2350 + 0.3791 = 0.1441
a3 = sigmoid(z3) = sigmoid(0.1441) ≈ 0.5360

**Backpropagation**:

1. Calculate the error (loss) at the output layer:


error_output = y_target - a3
2. Calculate the derivative of the sigmoid activation function for the output layer:
sigmoid_derivative_output = a3 * (1 - a3)

3. Calculate the delta at the output layer:


delta_output = error_output * sigmoid_derivative_output

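4. Calculate the updates for the weights between the hidden and output layer using the backpropagation formula
(each update is added to its weight, w ← w + Δw, after the hidden-layer deltas below have been computed from the original weights):
Δw5 = learning_rate * delta_output * a1
Δw6 = learning_rate * delta_output * a2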

5. Calculate the error (loss) for the hidden layer:


error_hidden1 = w5 * delta_output
error_hidden2 = w6 * delta_output

6. Calculate the derivative of the sigmoid activation function for the hidden layer:
sigmoid_derivative_hidden1 = a1 * (1 - a1)
sigmoid_derivative_hidden2 = a2 * (1 - a2)

7. Calculate the delta for the hidden layer:


delta_hidden1 = error_hidden1 * sigmoid_derivative_hidden1
delta_hidden2 = error_hidden2 * sigmoid_derivative_hidden2

8. Update the weights between the input and hidden layer:


Δw1 = learning_rate * delta_hidden1 * x1
Δw2 = learning_rate * delta_hidden1 * x2
Δw3 = learning_rate * delta_hidden2 * x1
Δw4 = learning_rate * delta_hidden2 * x2

You can repeat these steps for each training example in your dataset and update the weights
iteratively. This is the basic idea of backpropagation in a feedforward neural network. The
learning rate is a hyperparameter that controls the size of weight updates and should be tuned
during training.
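The whole worked example can be checked with a short script that follows the same steps and variable names. The learning rate of 0.5 is an assumption, since the example leaves it unspecified, and all updates are applied only after every delta has been computed from the original weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, w):
    """Forward pass for the 2-2-1 network (no biases); returns a1, a2, a3."""
    a1 = sigmoid(x1 * w["w1"] + x2 * w["w2"])
    a2 = sigmoid(x1 * w["w3"] + x2 * w["w4"])
    a3 = sigmoid(a1 * w["w5"] + a2 * w["w6"])
    return a1, a2, a3

x1, x2, y_target = 0.6, 0.9, 1.0
weights = {"w1": 0.1, "w2": -0.2, "w3": 0.3, "w4": 0.4, "w5": -0.5, "w6": 0.6}
learning_rate = 0.5  # assumed value; not given in the example

a1, a2, a3 = forward(x1, x2, weights)

# Output layer: error times the sigmoid derivative.
delta_output = (y_target - a3) * a3 * (1 - a3)

# Hidden layer: propagate the output delta back through the original w5 and w6.
delta_hidden1 = weights["w5"] * delta_output * a1 * (1 - a1)
delta_hidden2 = weights["w6"] * delta_output * a2 * (1 - a2)

# Apply all updates (w <- w + delta_w) after every delta has been computed.
updated = {
    "w1": weights["w1"] + learning_rate * delta_hidden1 * x1,
    "w2": weights["w2"] + learning_rate * delta_hidden1 * x2,
    "w3": weights["w3"] + learning_rate * delta_hidden2 * x1,
    "w4": weights["w4"] + learning_rate * delta_hidden2 * x2,
    "w5": weights["w5"] + learning_rate * delta_output * a1,
    "w6": weights["w6"] + learning_rate * delta_output * a2,
}

_, _, new_a3 = forward(x1, x2, updated)
```

After this single update the network output moves from about 0.536 toward the target of 1, which is a quick sanity check that the signs in the update rules are consistent.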
