
NEURAL NETWORKS (NNs)

AGENDA
• Definition – Motivation – History – Math used –Why NNs?
• What is Deep learning, its purpose and DNN?
• Architecture of NN?
• Types of NN?
• Neurons – Activation functions?
• Problem in NN
• Loss and Cost functions?
• Optimization and Regularization?
• Feed forward NN & Back Propagation?
• Gradient Descent
• Applications, Advantages and Disadvantages of NN
CREDITS: This PPT also contains slides from DeepLearning.AI and Dr Andrew Ng's classes.
Gist of Neural Networks
- Definition, motivation, Architecture, working and applications
• Neural networks, also called artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are the backbone of deep learning algorithms.
• They are called "neural" because they mimic how neurons in the brain signal one another.
• Neural networks are made up of node layers – an input layer, one or more hidden layers, and an output layer.
• Each node is an artificial neuron that connects to the next, and each has a weight and threshold value.
• When one node's output is above the threshold value, that node is activated and sends its data to the network's next layer. If it's below the threshold, no data passes along.
• Training data teach neural networks and help improve their accuracy over time. Once the learning algorithms are fine-tuned, they become powerful computer science and AI tools because they allow us to very quickly classify and cluster data.
• Using neural networks, speech and image recognition tasks can happen in minutes instead of the hours they take when done manually. Google's search algorithm is a well-known example of a neural network.
MOTIVATION IN DETAIL
The neural-network view of deep learning is generally motivated by two main ideas:
• It is assumed that the human brain proves that intelligent behavior is possible, and that by reverse engineering it, it is possible to build an intelligent system.
• Another perspective is that the way to understand the workings of the human brain, and the principles that underlie its intelligence, is to build a mathematical model that could shed light on these fundamental scientific questions.
• In essence, neural networks enable us to learn the structure of the data or information and help us to understand it by performing tasks such as clustering, classification, regression, or sample generation.
BASIC NN ARCHITECTURE AND
INNER WORKING OF NN
History of NN
• 1943: Warren S. McCulloch and Walter Pitts published "A logical calculus of the ideas immanent in nervous activity". This research sought to understand how the human brain could produce complex patterns through connected brain cells, or neurons. One of the main ideas that came out of this work was the comparison of neurons with a binary threshold to Boolean logic (i.e., 0/1 or true/false statements).
• 1958: Frank Rosenblatt is credited with the development of the perceptron, documented in his research, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain". He took McCulloch and Pitts's work a step further by introducing weights to the equation. Leveraging an IBM 704, Rosenblatt was able to get a computer to learn how to distinguish cards marked on the left vs. cards marked on the right.
PERCEPTRON
A Perceptron is an artificial neuron, and
thus a neural network unit. It performs
computations to detect features or
patterns in the input data. It is an
algorithm for supervised learning of
binary classifiers. It is this algorithm that
allows artificial neurons to learn and
process features in a data set.

Perceptrons can be viewed as building blocks in a single layer in a neural network, made up of four different
parts:
1. Input Values or One Input Layer
2. Weights and Bias
3. Net sum
4. Activation function

A neural network, which is made up of perceptrons, can be perceived as a complex logical statement (neural
network) made up of very simple logical statements (perceptrons) of “AND” and “OR” statements. A
statement can only be true or false, but never both at the same time. The goal of a perceptron is to determine
from the input whether the feature it is recognizing is true, in other words whether the output is going to be a
0 or 1. A complex statement is still a statement, and its output can only be either a 0 or 1.
HOW A PERCEPTRON FUNCTIONS
• Summing the weighted inputs (each input from the previous layer multiplied by its weight) and adding a bias produces the net sum. The inputs can come either from the input layer or from perceptrons in a previous layer. The net sum is then passed through an activation function, which standardizes the value, producing an output of 0 or 1. This decision made by the perceptron is then passed on to the next layer for the next perceptron to use in its decision.

• Together, these pieces make up a single perceptron in a layer of a neural network. These perceptrons work together to classify or predict inputs, each passing on whether the feature it detects is present (1) or absent (0). The perceptrons are essentially messengers, signalling what fraction of a class's features are present in the input. For example, an input that exhibits 90% of a class's features is far more likely to belong to that class than another input that exhibits only 20% of them.

• It’s just as Helen Keller once said, “Alone we can do so little; together we can do so
much.” and this is very true for perceptrons all around.
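To make this concrete, here is a minimal NumPy sketch of a single perceptron with a step activation; the inputs, weights, and bias are illustrative values, not from the slides.

```python
import numpy as np

def step(z):
    """Step activation: fire (1) if the net sum is above 0, else stay silent (0)."""
    return 1 if z > 0 else 0

def perceptron(x, w, b):
    """Weighted net sum of the inputs plus a bias, passed through the activation."""
    z = np.dot(w, x) + b   # net sum
    return step(z)

# Illustrative example: two inputs with hand-picked weights and bias.
x = np.array([1.0, 0.0])
w = np.array([0.6, 0.4])
b = -0.5
print(perceptron(x, w, b))  # 1, since 0.6*1 + 0.4*0 - 0.5 = 0.1 > 0
```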
BASIC NN ARCHITECTURE AND
INNER WORKING OF NN
The number of layers makes a difference
• A perceptron has no hidden layers: it takes an input, calculates the weighted input for each input node, and passes this weighted input through an activation function to generate the output.
• A shallow neural network has only one hidden layer between the input and output.
• A neural network of more than three layers, including the input and the output, can be considered a deep-learning algorithm. This is where deep neural networks (DNNs) appear (total number of layers in a NN > 3).
Deep Learning (DL)
• Deep learning is a specific subfield of machine learning:
a new take on learning representations from data that
puts an emphasis on learning successive layers of
increasingly meaningful representations.
• The ‘deep’ in deep learning stands for this idea of
successive layers of representations.
• How many layers contribute to a model of the data is
called the depth of the model.
• Modern deep learning often involves tens or even
hundreds of successive layers of representations— and
they’re all learned automatically from exposure to
training data.
• Meanwhile, other approaches to machine learning tend
to focus on learning only one or two layers of
representations of the data; hence, they’re sometimes
called shallow learning.
• In deep learning, these layered representations are
(almost always) learned via models called neural
networks, structured in literal layers stacked on top of
each other.
Why NNs? NN vs DL?
Neural networks excel at handling high-dimensional data and can automate feature extraction, reducing the need for manual feature engineering. This makes them crucial for solving tasks that involve intricate patterns that traditional algorithms might struggle to capture effectively.
Thanks to their fewer layers and connections, simple (shallow) neural networks can be trained more quickly. However, their simplicity also limits the extent to which you can teach them: they cannot perform complex analysis. Deep learning systems have a much greater capacity to learn complex patterns and skills.
Purpose of Deep Learning?
DECISIONS TO MAKE
• There are two key architecture decisions to be made about a stack of Dense NN layers:
1. How many layers to use
2. How many hidden units to choose for each layer
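As an illustration of these two decisions, here is a minimal sketch using Keras (an assumption; the slides do not name a framework, but the Chollet reference uses it). The layer count, the 16 units per hidden layer, and the input shape are all illustrative choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Decision 1: how many layers (here, two hidden layers).
# Decision 2: how many hidden units per layer (here, 16 each).
model = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(20,)),  # hidden layer 1
    layers.Dense(16, activation="relu"),                     # hidden layer 2
    layers.Dense(1, activation="sigmoid"),                   # output layer
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
```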
INNER WORKINGS OF A NN with an example
NN TYPES
FEEDFORWARD NN
• A feedforward network consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the input into the neural network, and each input has a weight attached to it.
• The weights associated with each input are numerical values. These weights are an indicator of the importance of the input in predicting the final output. For example, an input associated with a large weight will have a greater influence on the output than an input associated with a small weight.
• When a neural network is first trained, it is fed with input. Since the neural network isn't trained yet, we don't know which weights to use for each input, and so each input is randomly assigned a weight. Since the weights are randomly assigned, the neural network will likely make wrong predictions: it will give out the incorrect output.
• When the neural network gives out the incorrect output, this leads to an output error.
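A minimal NumPy sketch of this idea, assuming a single sigmoid output unit and illustrative shapes: with randomly initialized weights, the first prediction is essentially an arbitrary guess.

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained network: weights start out random, so the first
# predictions are essentially guesses.
n_inputs, n_outputs = 4, 1
W = rng.normal(size=(n_outputs, n_inputs))  # randomly assigned weights
b = np.zeros(n_outputs)

def forward(x):
    return 1 / (1 + np.exp(-(W @ x + b)))   # sigmoid of the weighted sum

x = np.array([0.5, -1.2, 3.0, 0.7])
print(forward(x))  # likely far from the true target before training
```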
FEEDFORWARD NN
• This error is the difference between the actual and predicted outputs. A cost function measures this error.
• The cost function (J) indicates how accurately the model performs. It tells us how far off our predicted output values are from our actual values. It is also known as the error. Because the cost function quantifies the error, we aim to minimize it.
BACKPROPAGATION
• What we want is to reduce the output error. Since the weights affect the error, we will need to readjust them. We have to adjust the weights such that we have a combination of weights that minimizes the cost function.
• This is where backpropagation comes in. Backpropagation allows us to re-adjust our weights to reduce the output error. During backpropagation, the error is propagated backward from the output to the input layer. This error is then used to calculate the gradient of the cost function with respect to each weight.
• Backpropagation aims to calculate the negative gradient of the cost function. This negative gradient is what helps in adjusting the weights: it gives us an idea of how we need to change the weights so that we can reduce the cost function.
• Backpropagation uses the chain rule to calculate the gradient of the cost function. The chain rule involves taking derivatives: the partial derivative of the cost with respect to each parameter, computed by differentiating one weight at a time while treating the others as constants. The result is a gradient, and once we have calculated the gradients, we are able to adjust the weights.
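To make the chain rule concrete, here is a minimal sketch for a single sigmoid neuron with a squared-error loss; all values are illustrative, and a real network repeats this computation layer by layer.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x, y_true = 2.0, 1.0          # one input, one target (illustrative)
w, b = 0.1, 0.0               # current parameters

# Forward pass.
z = w * x + b
y_pred = sigmoid(z)
loss = 0.5 * (y_pred - y_true) ** 2

# Backward pass: chain rule, one factor per step of the forward pass.
dL_dy = y_pred - y_true            # d(loss)/d(y_pred)
dy_dz = y_pred * (1 - y_pred)      # d(sigmoid)/dz
dz_dw, dz_db = x, 1.0              # d(z)/dw and d(z)/db
dL_dw = dL_dy * dy_dz * dz_dw      # gradient w.r.t. the weight
dL_db = dL_dy * dy_dz * dz_db      # gradient w.r.t. the bias
print(dL_dw, dL_db)
```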
ACTIVATION FUNCTIONS
• The value of a NN's net input can be anything from -inf to +inf. The neuron doesn't really know how to bound that value and thus is not able to decide the firing pattern.
• The activation function is therefore an important part of an artificial neural network. It basically decides whether a neuron should be activated or not, and it bounds the value of the net input.
• The activation function is a non-linear transformation that we apply to the input before sending it to the next layer of neurons or finalizing it as output.
• Types of activation functions: several different types of activation functions are used in deep learning. Some of them are:
1. Step activation function
2. Sigmoid function
3. ReLU
4. Leaky ReLU, etc.
TYPES OF ACTIVATION FUNCTIONS
• The step function is one of the simplest kinds of activation functions. We choose a threshold value, and if the value of the net input y is greater than the threshold, the neuron is activated.
• The sigmoid function is a widely used activation function. It is a smooth function and is continuously differentiable. The biggest advantage it has over the step and linear functions is that it is non-linear.
• The ReLU function is the Rectified Linear Unit. It is the most widely used activation function. If the input is negative, it converts it to zero and the neuron does not get activated.
• The Leaky ReLU function is an improved version of the ReLU function. Instead of defining the ReLU function as 0 for x less than 0, we define it as a small linear component of x (see the sketch below).
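Minimal NumPy versions of these four functions; the threshold and the leak coefficient alpha are illustrative defaults.

```python
import numpy as np

def step(z, threshold=0.0):
    """Step: activate (1) only when the net input exceeds the threshold."""
    return np.where(z > threshold, 1.0, 0.0)

def sigmoid(z):
    """Sigmoid: smooth, continuously differentiable, squashes to (0, 1)."""
    return 1 / (1 + np.exp(-z))

def relu(z):
    """ReLU: negative inputs become zero, so the neuron is not activated."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: a small linear component alpha*x for x < 0 instead of 0."""
    return np.where(z > 0, z, alpha * z)
```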
LOSS AND COST FUNCTIONS
• The terms cost function and loss function are often used interchangeably; strictly speaking:
• Loss function: refers to the error for a single training example.
• Cost function: refers to the average of the loss function over an entire training dataset.
• As we tune our model to improve the predictions, the cost function acts as an indicator of how the model has improved. This is essentially an optimization problem: the optimization strategies always aim at minimizing the cost function.
• There are many cost functions in machine learning, and each has its use cases depending on whether it is a regression problem or a classification problem (see the sketches below):
1. Regression cost functions, like MSE (L2 loss) and MAE (L1 loss)
2. Binary classification cost functions, like binary cross-entropy
3. Multi-class classification cost functions, like categorical cross-entropy
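Minimal NumPy sketches of the first two families named above, each averaging a per-example loss over the dataset; the eps clip in the cross-entropy is an illustrative guard against log(0).

```python
import numpy as np

def mse(y_true, y_pred):
    """Regression: mean squared error (L2 loss), averaged over all examples."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Regression: mean absolute error (L1 loss)."""
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary classification: average cross-entropy over all examples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```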
OPTIMIZATION TECHNIQUES
• Optimizers are algorithms or methods used to change the attributes of your neural network, such as weights and learning rate, in order to reduce the losses.
• How you should change the weights or learning rates of your neural network to reduce the losses is defined by the optimizer you use.
• There are different types of optimization algorithms available, such as:
• Gradient Descent
• Stochastic Gradient Descent (SGD)
• Mini-Batch Stochastic Gradient Descent (MB-SGD)
• SGD with momentum
• Nesterov Accelerated Gradient (NAG)
• Adaptive Gradient (AdaGrad)
• AdaDelta
• RMSprop
• Adam
GRADIENT DESCENT
• The gradient descent algorithm is the optimization algorithm that tries to find the minimum value of the cost or loss function by taking steps in the opposite direction of the gradients. So if the gradient is positive, the model will decrease the weights; where the gradient is negative, the model will increase the weights, in order to decrease the total loss.
• After we find our gradient, we update our weights in the direction opposite to the gradient, so that the loss function moves toward its minimum value. So,
• W := W - learning rate * dW
• b := b - learning rate * db
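A runnable sketch of this update on the illustrative one-parameter cost J(w) = (w - 3)^2, whose gradient is dJ/dw = 2(w - 3); the learning rate and starting point are illustrative.

```python
learning_rate = 0.1   # illustrative value
w = 0.0               # illustrative initial weight

for _ in range(50):
    dw = 2 * (w - 3)              # gradient of J(w) = (w - 3)**2
    w = w - learning_rate * dw    # step opposite to the gradient

print(w)  # close to 3, the minimizer of J
```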
GRADIENT DESCENT – LEARNING RATE
GRADIENT DESCENT – FORMULA
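The formula on this slide did not survive the export; written out, the standard gradient-descent update it corresponds to (consistent with the W and b rules above, with learning rate α) is:

```latex
\theta := \theta - \alpha \, \nabla_{\theta} J(\theta)
\quad\text{i.e.}\quad
W := W - \alpha \frac{\partial J}{\partial W}, \qquad
b := b - \alpha \frac{\partial J}{\partial b}
```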
DIFFERENT TYPES OF GD
• Stochastic Gradient Descent: the SGD algorithm is an extension of gradient descent that overcomes some of the disadvantages of the GD algorithm. Gradient descent has the disadvantage that it requires a lot of memory to load the entire dataset of n points at a time to compute the derivative of the loss function. In the SGD algorithm, the derivative is computed taking one point at a time: SGD performs a parameter update for each training example and its label.
• Batch Gradient Descent: batch gradient descent considers all the training examples before updating the weights. It takes the average of the gradients of all the training examples and then uses that mean gradient to update the parameters.
• Mini-batch SGD: the MB-SGD algorithm is an extension of the SGD algorithm that overcomes the problem of the large time complexity of SGD. The MB-SGD algorithm takes a batch (subset) of points from the dataset to compute the derivative. It is observed that, after some number of iterations, the derivative of the loss function for MB-SGD is almost the same as the derivative of the loss function for GD. But the number of iterations needed to achieve the minimum is larger for MB-SGD than for GD, and the cost of computation is also larger (see the sketch below).
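A sketch of the mini-batch loop, assuming NumPy arrays X and y and a helper grad_fn that returns the gradient of the loss on a batch (grad_fn is an assumed placeholder, not from the slides). Setting batch_size=1 recovers SGD, and batch_size=len(X) recovers batch gradient descent.

```python
import numpy as np

def minibatch_sgd(X, y, grad_fn, w, lr=0.01, batch_size=32, epochs=10):
    """Illustrative MB-SGD loop: one parameter update per batch of points."""
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)          # shuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w = w - lr * grad_fn(X[batch], y[batch], w)  # update per batch
    return w
```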
PROBLEM IN NN & BACKPROPAGATION – NOTATIONS, FORMULAE AND SOLUTION
NN PROBLEM TO SOLVE
ANSWER IS IN THE VIDEO:
https://www.youtube.com/watch?v=x7gupF_o69w&t=4s
NN PROBLEM TO SOLVE
ANSWER IS IN THE VIDEO:
https://www.youtube.com/watch?v=8Rz-Nwop2ME&t=1s
REGULARIZATION
• OVERFITTING: the model performs exceptionally well on the training data but is not able to predict the test data.
• Regularization is a technique used in machine learning and deep learning to prevent overfitting and improve the generalization performance of a model.
• It involves adding a penalty term to the loss function during training. This penalty discourages the model from becoming too complex or having large parameter values, which helps in controlling the model's ability to fit noise in the training data.
• Regularization methods include L1 and L2 regularization, dropout, early stopping, and more. By applying regularization, models become more robust and better at making accurate predictions on unseen data.
• L2 & L1 regularization: L1 and L2 are the most common types of regularization. These update the general cost function by adding another term known as the regularization term:
• Cost function = Loss (say, binary cross-entropy) + Regularization term
• Due to the addition of this regularization term, the values of the weight matrices decrease, because it assumes that a neural network with smaller weight matrices leads to simpler models. Therefore, it will also reduce overfitting to quite an extent.
• Dropout: at every iteration, dropout randomly selects some nodes and removes them along with all of their incoming and outgoing connections. So each iteration has a different set of nodes, and this results in a different set of outputs; this randomness in learning leads to better generalization (see the sketch below).
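Sketches of the two ideas just described: an L2 penalty added to the cost, and a dropout mask applied to a layer's activations. The hyperparameters lam and rate are illustrative, and the 1/(1 - rate) scaling is the common "inverted dropout" variant, an assumption beyond the slides.

```python
import numpy as np

def l2_cost(loss, weights, lam=0.01):
    """Regularized cost = loss + lambda * sum of squared weights (L2 term)."""
    return loss + lam * sum(np.sum(W ** 2) for W in weights)

def dropout(activations, rate=0.5):
    """Training-time dropout: randomly zero out nodes, rescale the rest."""
    mask = np.random.rand(*activations.shape) > rate
    return activations * mask / (1.0 - rate)
```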
APPLICATIONS OF NNs
• Convolutional Neural Networks (CNN) are used for facial recognition and
image processing.
• To make successful stock predictions in real time, a Multilayer Perceptron (MLP, a class of feedforward artificial neural network) is employed.
• Recurrent Neural Network (RNN) is also being employed for the
development of voice recognition systems.
• Researchers are also employing generative neural networks for drug discovery. Matching different categories of drugs is a hefty task, but generative neural networks have broken it down: they can be used to combine different elements, which forms the basis of drug discovery.
• Combination models (MLP+CNN, CNN+RNN) usually work better in the case of weather forecasting.
• Neural networks can adapt to changing input data, enabling the network to generate the best possible result without needing to redefine the output criteria. This is a technique that is gaining popularity in data science problems involving fraud prevention, healthcare, credit scoring and trading.
APPLICATIONS FOR NNs
(examples more inclined to data analytics)
• Predictive Analytics: Neural networks are used for predictive
modeling. They can analyze historical data to make predictions about
future events, such as sales forecasting, stock price prediction, or
customer churn prediction.
• Embedding Visualization: neural networks such as autoencoders, along with techniques like t-SNE (t-Distributed Stochastic Neighbor Embedding), are used to create lower-dimensional embeddings of data. These embeddings can be visualized in 2D or 3D space to gain insights into the structure of the data. t-SNE, for instance, is often used for visualizing high-dimensional data clusters.
• Market Forecasting: Neural networks analyze market data for trading
strategies and financial market predictions.
• Healthcare Analytics: Neural networks analyze medical data for tasks like
disease diagnosis, medical image analysis (MRI, CT scans), and drug
discovery.
• Social Media Analysis: Neural networks analyze social media data for
sentiment analysis, trend detection, and understanding user behaviour.
ADVANTAGES OF NN
1. Neural networks are flexible and can be used for both regression and classification problems. Any data which can be made numeric can be used in the model, as a neural network is a mathematical model with approximation functions.
2. Neural networks are good for modeling nonlinear data with a large number of inputs, for example, images. They are reliable for tasks involving many features, working by splitting the problem of classification into a layered network of simpler elements.
3. Once trained, the predictions are pretty fast.
4. Neural networks can be trained with any number of inputs and layers.
5. Neural networks work best with more data points.
LIMITATIONS OF NN
1. Having more hidden units (a higher-dimensional representation space)
allows your network to learn more-complex representations, but it
makes the network more computationally expensive and may lead to
learning unwanted patterns (patterns that will improve performance on
the training data but not on the test data).
2. Neural networks are black boxes, meaning we cannot know how much
each independent variable is influencing the dependent variables.
3. It is computationally very expensive and time consuming to train with
traditional CPUs.
4. Neural networks depend a lot on training data. This leads to the problem of over-fitting and poor generalization: the model relies heavily on the training data and may be over-tuned to it.
References
1. https://www.sas.com/en_in/insights/analytics/neural-networks.html
2. Chollet, Francois. 2017. Deep Learning with Python. New York, NY: Manning Publications. (Book)
3. https://wikidocs.net/165313
4. https://towardsdatascience.com/what-is-a-perceptron-210a50190c3b
5. https://www.analyticsvidhya.com/blog/2023/01/gradient-descent-vs-backpropagation-whats-the-difference/
6. https://www.datacamp.com/tutorial/tutorial-gradient-descent
7. https://iq.opengenus.org/backpropagation-vs-gradient-descent/
8. https://www.kdnuggets.com/2020/12/optimization-algorithms-neural-networks.html
9. https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/
10. https://www.analyticssteps.com/blogs/8-applications-neural-networks
11. https://subscription.packtpub.com/book/data/9781788397872/1/ch01lvl1sec27/pros-and-cons-of-neural-networks
12. https://www.analyticsvidhya.com/blog/2021/02/cost-function-is-no-rocket-science/
13. https://www.geeksforgeeks.org/activation-functions/
14. https://www.analyticsvidhya.com/blog/2022/01/activation-functions-for-neural-networks-and-their-implementation-in-python/
15. https://www.youtube.com/watch?v=8Rz-Nwop2ME&t=1s
16. https://www.youtube.com/watch?v=tUoUdOdTkRw
