DL Unit 3 Notes PPT

The backpropagation algorithm allows neural networks to be trained by propagating errors backwards from the output to the input layers and adjusting weights to minimize loss. It consists of forward and backward propagation, where activations are calculated layer by layer during forward propagation and errors are calculated and weights updated during backward propagation using an optimization algorithm like gradient descent.

Back propagation algorithm
• Backpropagation is a fundamental algorithm used in training
artificial neural networks, especially in deep learning.
• It allows the network to learn from its mistakes by adjusting the
weights of the neurons to minimize the difference between
predicted outputs and actual outputs.
• The backpropagation algorithm consists of two main phases:
• forward propagation and
• backward propagation.
Back propagation algorithm
• Forward Propagation:
• During forward propagation, the input data is fed into the
neural network, and the activations of each neuron are
calculated layer by layer, moving from the input layer to the
output layer. The steps involved in forward propagation are as
follows:
• a. Input Data: The input data is provided to the neural network, and it propagates through the network one layer at a time.
• By iteratively updating the weights using backpropagation, the neural network can gradually learn to make better predictions, leading to improved performance on the given task. Deep learning architectures, which have many hidden layers, benefit significantly from this training algorithm, as it allows them to learn complex representations from data.
Back propagation algorithm
• b. Weighted Sum: Each neuron's input is the weighted sum of its inputs (input data or activations from the previous layer) and their corresponding weights. This calculation is performed for every neuron in the hidden layers.
• c. Activation Function: After the weighted sum is computed, it is passed through an activation function, which introduces non-linearity to the model. Common activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.
• d. Output: The output of the activation function becomes the activations of the neurons in the next layer and is used as input for that layer's neurons.
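To make steps a to d concrete, here is a minimal NumPy sketch of forward propagation through one hidden layer and one output layer. The layer sizes (3 inputs, 4 hidden neurons, 1 output), the sigmoid activation, and the random data are assumptions chosen purely for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random((1, 3))                      # a. input data: one example with 3 features
W1, b1 = rng.random((3, 4)), np.zeros(4)    # hidden-layer weights and biases
W2, b2 = rng.random((4, 1)), np.zeros(1)    # output-layer weights and biases

z1 = x @ W1 + b1        # b. weighted sum for the hidden layer
a1 = sigmoid(z1)        # c. activation of the hidden layer
z2 = a1 @ W2 + b2       # d. hidden activations feed the output layer's weighted sum
y_hat = sigmoid(z2)     # network output (prediction)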
Back propagation algorithm
• Backward Propagation: During backward propagation (also known as backpropagation), the errors between the predicted outputs and the actual outputs are used to adjust the network's weights. The algorithm aims to minimize the error between the predicted outputs and the ground truth labels. The steps involved in backpropagation are as follows:
• a. Loss Function: A loss function is used to measure the difference between the predicted outputs and the actual outputs. Common loss functions include Mean Squared Error (MSE) for regression problems and Cross-Entropy Loss for classification problems.
Back propagation algorithm
• b. Gradient Calculation: The gradient of the loss function with respect to each weight in the network is calculated. The gradient indicates the direction and magnitude of the weight adjustment required to reduce the error.
• c. Weight Update: The weights are adjusted using an optimization algorithm (e.g., Stochastic Gradient Descent, Adam) to move in the direction that minimizes the loss function. The learning rate determines the step size of the weight update.
• d. Propagation: The gradients are propagated backward
through the network, and the process is repeated iteratively for
a defined number of epochs or until the loss converges to a
satisfactory level.
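As a minimal sketch of steps a to d, the loop below trains a single linear neuron with plain gradient descent on a Mean Squared Error loss. The synthetic data (y = 3x + 0.5), the learning rate, and the number of epochs are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)
x = rng.random((100, 1))                     # inputs
y = 3.0 * x + 0.5                            # targets following an assumed linear rule
w, b, lr = 0.0, 0.0, 0.1                     # weight, bias, learning rate

for epoch in range(200):
    y_hat = w * x + b                        # forward pass
    loss = np.mean((y_hat - y) ** 2)         # a. loss function (MSE)
    grad_w = np.mean(2 * (y_hat - y) * x)    # b. gradient of the loss w.r.t. the weight
    grad_b = np.mean(2 * (y_hat - y))        # b. gradient of the loss w.r.t. the bias
    w -= lr * grad_w                         # c. weight update (gradient descent step)
    b -= lr * grad_b                         # d. repeated for a fixed number of epochs

print("learned weight:", w, "learned bias:", b)   # should approach 3.0 and 0.5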
Back propagation algorithm-training and
convergence
• Backpropagation is the core algorithm for training neural
networks, and convergence refers to the point at which the
training process reaches a satisfactory level of performance.
Let's discuss how backpropagation is used for training and how
convergence is achieved.
• The backpropagation algorithm consists of two main phases:
• forward propagation and
• backward propagation (discussed previously).
Back propagation algorithm-training and
convergence
1. Backpropagation for Training: The backpropagation algorithm follows these steps for training a neural network:
• a. Forward Propagation
• b. Loss Calculation
• c. Backward Propagation
• d. Weight Update
• e. Iterative Process: Steps a to d are repeated iteratively for a fixed number of epochs (complete passes through the entire dataset) or until the loss converges to a satisfactory level, as shown in the sketch below.
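One practical way to express "until the loss converges" in Keras is an early-stopping callback, sketched below on a toy regression problem; the data, the tolerance (min_delta), and the patience value are illustrative assumptions, not recommended settings.

import numpy as np
import tensorflow as tf

x_train = np.random.rand(200, 1)             # toy inputs (assumed)
y_train = 2 * x_train + 1                    # toy targets (assumed)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer='adam', loss='mean_squared_error')

# Stop training once the loss has not improved by at least 1e-4 for 5 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='loss', min_delta=1e-4, patience=5)
model.fit(x_train, y_train, epochs=500, batch_size=32, callbacks=[early_stop], verbose=0)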
Back propagation algorithm-training and
convergence
2. Convergence: Convergence in the context of training neural networks means that the model has reached a state where further training does not significantly improve its performance on the training data. Several factors can affect convergence:
• a. Learning Rate: The learning rate determines the step size of weight updates during optimization. A large learning rate may lead to overshooting, preventing convergence, while a very small learning rate may cause slow convergence or leave the optimization stuck in local minima.
• b. Model Architecture: The complexity of the neural network
architecture can impact convergence. A model that is too simple may
not have enough capacity to learn from the data, while an
excessively complex model may overfit the training data and not
generalize well.
Back propagation algorithm-training and
convergence
• c. Data Quality and Quantity: The size and quality of the training dataset influence convergence. Larger and more diverse datasets often lead to better generalization and quicker convergence.
• d. Regularization: Techniques like L1 and L2 regularization can
help prevent overfitting, promoting faster convergence and
better generalization.
• e. Batch Size: The size of the mini-batch used during training
can also impact convergence. Smaller batch sizes can lead to
more frequent weight updates but may introduce more noise,
while larger batch sizes may provide a smoother learning
process but can be computationally expensive.
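As a hedged Keras sketch of two of these knobs, the snippet below sets the learning rate (factor a) explicitly on the optimizer and the mini-batch size (factor e) in the call to fit; the values 0.05 and 16 and the toy data are arbitrary choices for illustration.

import numpy as np
import tensorflow as tf

x_train = np.random.rand(200, 1)
y_train = 2 * x_train + 1 + np.random.randn(200, 1) * 0.1

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])

# a. Learning rate: the step size of every weight update
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
              loss='mean_squared_error')

# e. Batch size: number of examples averaged into each weight update
model.fit(x_train, y_train, epochs=50, batch_size=16, verbose=0)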
Practical and design issues
• The backpropagation algorithm, being the core of training deep neural
networks, has its own set of practical and design considerations. Let's delve
into some of the key issues related to backpropagation:
1. Vanishing and Exploding Gradients: In deep neural networks with many layers, the gradients can sometimes become too small (vanishing gradients) or too large (exploding gradients) as they propagate backward through the network during backpropagation. Vanishing gradients can lead to slow convergence or prevent the network from learning effectively, while exploding gradients can cause instability during training. Techniques like using appropriate activation functions, weight initialization methods, and gradient clipping can mitigate these issues.
2. Learning Rate and Learning Rate Scheduling: Selecting an appropriate learning rate is crucial for successful training. If the learning rate is too large, it can cause the optimization process to overshoot the optimal weights, preventing convergence. On the other hand, if the learning rate is too small, training can be excessively slow. Learning rate scheduling, such as decreasing the learning rate over time, can improve convergence and fine-tune the model.
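Gradient clipping (point 1) and learning rate scheduling (point 2) can be combined in Keras roughly as follows; the decay schedule values and the clipping norm of 1.0 are illustrative assumptions, not recommended settings.

import tensorflow as tf

# Learning rate scheduling: start at 0.01 and multiply by 0.9 every 1000 optimizer steps
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.9)

# Gradient clipping: rescale each gradient tensor whose L2 norm exceeds 1.0
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule, clipnorm=1.0)

# The optimizer is then passed to model.compile(optimizer=optimizer, ...) as usual.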
Practical and design issues
3. Choice of Activation Functions: The choice of activation functions affects the learning process. Traditional activation functions like sigmoid can suffer from the vanishing gradient problem. The use of ReLU (Rectified Linear Unit) or its variants (e.g., Leaky ReLU, Parametric ReLU) is common due to their ability to mitigate the vanishing gradient issue and improve convergence.
4. Initialization of Weights: Proper initialization of weights is important for avoiding training instability and ensuring that the model converges efficiently. Techniques like Xavier/Glorot initialization or He initialization are commonly used to set initial weights based on the size of the layers.
5. Batch Size and Computational Efficiency: The choice of batch size affects the training process. Larger batch sizes can lead to more stable weight updates but require more memory, while smaller batch sizes can introduce noise but may lead to faster convergence. Additionally, for larger networks, training can be computationally expensive, and strategies like mini-batch processing and distributed training may be necessary.
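A short Keras sketch of points 3 and 4: ReLU hidden layers paired with He initialization, and a tanh layer paired with Xavier/Glorot initialization. The layer sizes and the 10-feature input shape are assumptions for illustration.

import tensorflow as tf
from tensorflow.keras.layers import Dense

model = tf.keras.Sequential([
    # ReLU activation with He initialization (well suited to ReLU-family activations)
    Dense(64, activation='relu', kernel_initializer='he_normal', input_shape=(10,)),
    # Tanh/sigmoid layers are often paired with Xavier/Glorot initialization
    Dense(32, activation='tanh', kernel_initializer='glorot_uniform'),
    Dense(1)
])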
Practical and design issues
6. Overfitting and Regularization: The backpropagation algorithm can lead to overfitting, where the model performs well on the training data but poorly on unseen data. Regularization techniques like L1 and L2 regularization, dropout, and batch normalization can help prevent overfitting and improve generalization.
7. Local Minima and Optimization Algorithms: Neural networks' loss landscapes can have many local minima, making the optimization process challenging. Various optimization algorithms like Stochastic Gradient Descent (SGD), Adam, RMSprop, etc., have different characteristics that affect convergence and the potential of getting stuck in suboptimal solutions.
8. Numerical Stability and Precision: Backpropagation involves numerous calculations, and numerical stability can become an issue, particularly when dealing with very large or very small numbers. Ensuring the use of appropriate data types and numerical precision is important to avoid numerical errors during training.
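A minimal Keras sketch of point 6, combining L2 weight regularization and dropout in a small binary classifier; the layer sizes, the regularization strength (1e-4), the dropout rate (0.5), and the 20-feature input are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    # L2 regularization penalizes large weights to discourage overfitting
    layers.Dense(64, activation='relu', input_shape=(20,),
                 kernel_regularizer=regularizers.l2(1e-4)),
    # Dropout randomly zeroes 50% of the activations during training
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy')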
Practical and design issues
9. Batch Normalization and Layer Normalization: Batch normalization and layer normalization are techniques that help improve training stability and convergence by normalizing the activations within a layer. These normalization techniques can accelerate training and mitigate some of the issues related to vanishing and exploding gradients.
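A minimal Keras sketch of this idea: a BatchNormalization layer placed between a dense layer and its activation; the layer sizes and input shape are assumptions for illustration.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(64, input_shape=(20,)),
    layers.BatchNormalization(),     # normalize the layer's activations across each batch
    layers.Activation('relu'),
    layers.Dense(1)
])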
• Addressing these practical and design issues related to backpropagation
is crucial for training deep neural networks effectively. Experimentation,
hyperparameter tuning, and understanding the characteristics of the
specific problem and architecture are key to achieving optimal results.
Linear and logistic regression using MLP
• Multilayer Perceptrons (MLPs) can be used both for classification tasks (as in Logistic Regression) and for regression tasks (as in Linear Regression). MLPs can be considered a generalization of linear models, where multiple layers of neurons are stacked together to learn complex patterns and non-linear relationships in the data.
• Let's implement Linear Regression and Logistic Regression as MLPs in Python with the TensorFlow library.
1. Linear Regression using MLP: Linear Regression aims to find a linear relationship between the input features and the target variable. In an MLP, it can be achieved using a single output neuron and Mean Squared Error (MSE) as the loss function.
Linear and logistic regression using MLP
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Sample data for linear regression
x_train = np.random.rand(100, 1)  # Input features (random values)
y_train = 2 * x_train + 1 + np.random.randn(100, 1) * 0.1  # Target variable (linear relationship)

# Create and compile the MLP model for Linear Regression
model_linear = Sequential([
    Dense(units=1, input_shape=(1,))
])
Linear and logistic regression using MLP
model_linear.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model_linear.fit(x_train, y_train, epochs=100, batch_size=10)

# Get the learned parameters
weights, biases = model_linear.get_weights()
print("Learned Weight:", weights[0][0])
print("Learned Bias:", biases[0])
Output: after training, the learned weight should be close to 2 and the learned bias close to 1, matching the synthetic data-generating relationship.
Logistic regression
• Logistic Regression is a statistical method used for binary
classification, which involves predicting one of two
possible outcomes based on input features.

• Despite its name, logistic regression is actually a linear model that models the probability of the dependent variable belonging to a particular class.
Logistic regression
• In logistic regression, the output is transformed using the
logistic function (also known as the sigmoid function) to
ensure that the predicted values are between 0 and 1.
This transformation allows the output to be interpreted
as a probability.
• The logistic function (sigmoid) is defined as:
sigmoid(x) = 1 / (1 + exp(-x))
Logistic regression
• The equation for logistic regression can be represented as follows:
• P(y = 1 | X) = sigmoid(w0 + w1*x1 + w2*x2 + ... + wn*xn)
• Where:
• P(y = 1 | X) is the probability of the positive class given the input features X.
• sigmoid is the logistic (sigmoid) function.
• w0 is the bias (intercept) term, and w1, w2, ..., wn are the weights assigned to the input features x1, x2, ..., xn.
Logistic regression
• Logistic Regression is widely used for various applications such as
• spam detection,
• medical diagnosis,
• credit scoring, and more.
• While it is a linear model, it can still be effective when the relationships between features and the target variable are approximately linear or when interpretability is important. However, for more complex patterns, non-linear relationships, and tasks, more advanced models like neural networks (such as Multi-Layer Perceptrons) are often used.
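The notes show code only for the Linear Regression case; the sketch below is an analogous (assumed, not taken from the original slides) Logistic Regression implementation as an MLP with a single sigmoid output neuron and binary cross-entropy loss. The toy dataset and its labeling rule (class 1 when x1 + x2 > 1) are invented purely for illustration.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy binary classification data (assumed): label is 1 when x1 + x2 > 1
x_train = np.random.rand(200, 2)
y_train = (x_train.sum(axis=1, keepdims=True) > 1.0).astype(np.float32)

# A single sigmoid output neuron is exactly the logistic regression model
model_logistic = Sequential([
    Dense(units=1, activation='sigmoid', input_shape=(2,))
])
model_logistic.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model_logistic.fit(x_train, y_train, epochs=100, batch_size=10, verbose=0)

# Predicted probabilities of the positive class for two new points
print(model_logistic.predict(np.array([[0.2, 0.3], [0.8, 0.9]])))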
Introduction to Convolutional Neural Networks
(CNNs)
Convolutional Neural Networks (CNNs):
CNNs are a class of deep neural networks that are particularly
effective in handling tasks related to computer vision, such as
image classification,
object detection,
image segmentation, and more.
CNNs are designed to automatically learn and extract features
from input data, making them well-suited for tasks where the
spatial relationships among data points are important, such as
images.
Key Concepts of (CNNs)
1. Convolutional Layer: This is the core building block of CNNs. It applies a set of filters (also known as kernels) to the input data, performing convolution operations. These filters help in detecting different features in the input data, such as edges, textures, and patterns.
2. Pooling Layer: Pooling layers downsample the spatial dimensions of the input, reducing the computational complexity and the number of parameters in the network. Max pooling and average pooling are commonly used methods to achieve this downsampling while retaining essential information.
3. Activation Functions: Common activation functions used in CNNs include ReLU (Rectified Linear Activation), which introduces non-linearity to the model.
4. Fully Connected Layers: These layers are typically used at the end of the CNN architecture to make final predictions. They take the high-level features extracted by earlier layers and produce class probabilities or regression values.
5. Backpropagation: CNNs, like other neural networks, learn through backpropagation. During training, the network adjusts its weights based on the error between its predictions and the actual target values.
What is Convolution Operation
• Convolution is a key operation that enables the network to learn and extract features from input data, particularly for tasks like image analysis.
• In CNNs, convolution involves sliding a filter (also called a
kernel) over an input image or feature map to compute a
new representation of the data.
• The filter contains learnable weights that are adjusted
during the training process to capture specific features,
• such as edges,
• textures, or
• patterns, from the input data.
What is Convolution Operation
The convolution operation can be defined as follows:
• Place the filter on a patch of the input data.
• Perform element-wise multiplication between the filter and the corresponding elements of the input patch.
• Sum the products of the multiplications to obtain a single value in the output feature map.
• Slide the filter over the input data with a defined stride and repeat the process to fill the entire output feature map.
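The four steps above can be sketched directly in NumPy; the helper name conv2d, the 5x5 toy image, and the simple vertical-edge filter are assumptions for illustration.

import numpy as np

def conv2d(image, kernel, stride=1):
    # "Valid" (no padding) convolution as used in CNNs: slide, multiply element-wise, sum
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)     # simple vertical-edge filter
print(conv2d(image, edge_filter))                  # 3x3 output feature map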
What is Convolution Operation
In mathematical terms, the convolution operation (denoted as ∗) between a filter F and an input matrix I at position (i, j) can be written as:

(I ∗ F)(i, j) = Σ_m Σ_n I(i + m, j + n) · F(m, n)

where the sums run over the rows m and columns n of the filter. (As implemented in most deep learning libraries, the filter is applied without flipping, which is technically cross-correlation.)
Pooling layers
• Pooling layers downsample the spatial dimensions of the input, reducing the computational complexity and the number of parameters in the network.
• Max pooling and average pooling are commonly used methods to achieve this downsampling while retaining essential information.
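A minimal NumPy sketch of max pooling with a 2x2 window and stride 2; the helper name max_pool2d and the toy 4x4 feature map are assumptions for illustration.

import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    # Keep only the largest value in each (size x size) window
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [3, 4, 1, 8]], dtype=float)
print(max_pool2d(fmap))   # [[6. 4.] [7. 9.]] -- spatial size halved, strongest responses kept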
Flatten Layer
• A Flatten Layer is a type of layer used in CNN architectures to reshape the multi-dimensional
output of the previous layers into a one-dimensional vector. This flattened vector is then
passed to fully connected layers for making final predictions. The Flatten Layer essentially
converts the spatial dimensions of the data into a single-dimensional format, allowing it to be
used as input to traditional dense (fully connected) layers.

• Here's a breakdown of how the Flatten Layer works:

• Convolution and Pooling Layers: In the earlier layers of a CNN, convolutional and pooling
operations are applied to the input data, resulting in feature maps that capture various
patterns and features in the input.

• Flatten Layer: After passing through several convolutional and pooling layers, the output
feature maps have a 3D shape (height, width, depth), where depth corresponds to the number
of filters or channels. The Flatten Layer takes this 3D output and "flattens" it into a 1D vector
by concatenating all the values together.

• Fully Connected Layers: The flattened vector is then passed to fully connected layers (also
known as dense layers) that perform classification, regression, or other tasks. These layers
can handle the flattened data as they are similar to the layers found in traditional neural
networks.
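Putting the pieces together, here is a hedged Keras sketch of a small CNN in which a Flatten layer bridges the convolution/pooling stages and the dense layers; the filter count, the 28x28x1 input shape, and the 10-class output are assumptions for illustration.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # convolution
    layers.MaxPooling2D((2, 2)),                                            # pooling
    layers.Flatten(),                          # 3D feature maps -> 1D vector
    layers.Dense(64, activation='relu'),       # fully connected layer
    layers.Dense(10, activation='softmax')     # class probabilities
])
model.summary()   # shows the flattened dimension that feeds the dense layers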
