DL Unit-2 Notes PPT

The document discusses unconstrained optimization techniques used in deep learning, including stochastic gradient descent, momentum, adaptive learning rate methods like AdaGrad and RMSProp, and second-order optimization methods like Newton's method and L-BFGS.

Uploaded by

Bitra Venugopal
INTRODUCTION TO SINGLE-LAYER PERCEPTRON

• A single-layer perceptron is a fundamental building block of artificial neural
networks and one of the simplest forms of neural network. It is a type of
feedforward neural network that consists of a single layer of artificial neurons,
also known as perceptrons. The perceptron was developed in the 1950s and laid
the foundation for modern neural networks.
• Mathematically, the output of a single-layer perceptron can be represented as
follows:
• output = activation_function(weighted_sum(inputs))
• The basic idea behind a single-layer perceptron is to take a set of input
values, multiply them by corresponding weights, and then pass the
weighted sum through an activation function to produce an output. The
activation function introduces non-linearity into the model and helps in
making decisions or predictions based on the input.
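The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming a step activation and a bias term; the function names and the example weights are hypothetical and not taken from the notes.

```python
import numpy as np

def step(z):
    """Step activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0

def perceptron_output(inputs, weights, bias):
    """output = activation_function(weighted_sum(inputs))."""
    weighted_sum = np.dot(weights, inputs) + bias
    return step(weighted_sum)

# Hypothetical values chosen only for illustration.
weights = np.array([0.5, -0.4])
bias = -0.1
print(perceptron_output(np.array([1.0, 0.0]), weights, bias))  # 0.5 - 0.1 = 0.4  -> 1
print(perceptron_output(np.array([0.0, 1.0]), weights, bias))  # -0.4 - 0.1 = -0.5 -> 0
```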
The appropriate weights are applied to the inputs, and the
resulting weighted sum is passed to a function that produces
the output o.
[Figure: a perceptron updating its linear decision boundary as more training
examples are added.]
What is gradient descent?
• The weights associated with each input determine the influence
of that input on the overall output.
• During the learning process, the weights are adjusted based on
the training data to optimize the network's performance.
• This adjustment is typically done using a learning algorithm,
such as the perceptron learning rule or gradient descent.
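As a rough sketch of how one such adjustment works, the snippet below applies a single step of the perceptron learning rule to one training example. The helper name `perceptron_update`, the learning rate, and the example values are assumptions made for illustration.

```python
import numpy as np

def perceptron_update(weights, bias, x, target, learning_rate=0.1):
    """Apply one perceptron-learning-rule step for a single example."""
    prediction = 1 if np.dot(weights, x) + bias >= 0 else 0
    error = target - prediction              # -1, 0, or +1
    new_weights = weights + learning_rate * error * x
    new_bias = bias + learning_rate * error
    return new_weights, new_bias

w, b = np.zeros(2), 0.0
x, target = np.array([2.0, 1.0]), 0          # hypothetical example labeled class 0
w, b = perceptron_update(w, b, x, target)
print(w, b)  # the example was predicted as 1, so the weights move away from x
```

When the prediction is already correct the error is 0 and nothing changes, which is why the rule only learns from misclassified examples.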
The activation function
• The activation function is a crucial component of the single-
layer perceptron.
• It transforms the weighted sum into the output. In a single-layer
perceptron the decision boundary remains linear; non-linear activations
enable complex, non-linear decisions once several layers are stacked.
• Common activation functions used in single-layer perceptrons
include the step function, sigmoid function, and ReLU (Rectified
Linear Unit) function.
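The three activation functions named above can be written directly with NumPy. This is a minimal sketch; the threshold convention for the step function (output 1 at exactly zero) is an assumption.

```python
import numpy as np

def step(z):
    """1 where z is non-negative, 0 elsewhere."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Keeps positive values and clips negative values to 0."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(step(z))     # [0. 1. 1.]
print(sigmoid(z))  # approximately [0.119 0.5 0.881]
print(relu(z))     # [0. 0. 2.]
```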
Perceptron as a classifier
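The perceptron is a simple algorithm that can be used to classify input
data into two classes, typically referred to as positive and
negative classes or class 1 and class 0.
• The perceptron algorithm works as follows:
• Initialization: set the weights and the bias to zero.
• Training: for each training example, compute a prediction and, if it is
wrong, adjust the weights and the bias in proportion to the error.
• Prediction: compute the weighted sum of the inputs plus the bias and apply
the activation function to obtain the class.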
• One example of using a perceptron as a classifier is deciding
whether an email is spam or not based on a set of features
extracted from the email.
• Say we have a dataset of emails, where each email is
represented by a feature vector that contains
characteristics such as the number of words, the presence of
certain keywords, punctuation usage, etc. Additionally,
each email is labeled as either spam (class 1) or not spam
(class 0).
• The perceptron can be trained on this dataset using the
following steps:
A simple implementation of the perceptron algorithm for
email spam classification using Python:

import numpy as np
import matplotlib.pyplot as plt


class Perceptron:
    def __init__(self, num_features, learning_rate=0.1, max_epochs=100):
        self.weights = np.zeros(num_features)
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.max_epochs = max_epochs

    def train(self, X, y):
        for epoch in range(self.max_epochs):
            misclassified = 0
            for i in range(len(X)):
                prediction = self.predict(X[i])
                error = y[i] - prediction
                if error != 0:
                    misclassified += 1
                    self.weights += self.learning_rate * error * X[i]
                    self.bias += self.learning_rate * error
            # Stop early once every training example is classified correctly.
            if misclassified == 0:
                break

    def predict(self, x):
        activation = np.dot(self.weights, x) + self.bias
        # Step activation returning 0 or 1 so predictions match the 0/1 labels.
        # (np.sign returns -1/0/+1, which makes the update move the wrong way.)
        return 1 if activation >= 0 else 0

    def plot_decision_boundary(self, X, y):
        fig, ax = plt.subplots()
        ax.scatter(X[:, 0], X[:, 1], c=y)
        ax.set_xlabel('Feature 1')
        ax.set_ylabel('Feature 2')
        ax.set_title('Perceptron Decision Boundary')

        # Evaluate the classifier on a grid covering the data.
        x_min, x_max = min(X[:, 0]) - 1, max(X[:, 0]) + 1
        y_min, y_max = min(X[:, 1]) - 1, max(X[:, 1]) + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                             np.arange(y_min, y_max, 0.1))
        Z = np.array([self.predict([x1, x2])
                      for x1, x2 in zip(xx.flatten(), yy.flatten())])
        Z = Z.reshape(xx.shape)
        ax.contourf(xx, yy, Z, alpha=0.3)

        plt.show()


# Example usage:
X = np.array([[1, 1], [2, 3], [4, 5], [5, 4]])  # Feature vectors of emails
y = np.array([0, 0, 1, 1])  # Class labels (0 for not spam, 1 for spam)

perceptron = Perceptron(num_features=X.shape[1])
perceptron.train(X, y)

# Plotting decision boundary:
perceptron.plot_decision_boundary(X, y)
Adaptive filtering problem
• The adaptive filtering problem refers to the task of
designing neural network models that can dynamically
adjust their parameters to improve their performance on
changing input data.
• In deep learning, models are typically trained on a fixed
dataset, and the goal is to learn optimal parameters that
generalize well to new, unseen data.
• However, in certain scenarios where the input data
distribution changes over time, the model needs to adapt
to these changes to maintain its performance.
• The adaptive filtering problem in deep learning can be
approached in several ways:
• Online Learning: Instead of training the model on a fixed
dataset offline, the model is incrementally updated as new
data arrives. This allows the model to adapt its parameters in
real-time and continuously improve its performance.
• Transfer Learning: The model is initially trained on a large
dataset and then fine-tuned on a smaller dataset specific to
the target task. This allows the model to leverage its prior
knowledge while adapting to the new data.
• Recurrent Neural Networks (RNNs): RNNs, such as LSTM or GRU
networks, are designed to handle sequential or time-series data.
They have the ability to retain information from past inputs and
adapt their internal state based on the current input, making them
suitable for adaptive filtering tasks.
• Online Batch Learning: The model is updated periodically in mini-
batches, where new data is combined with previously seen data.
This allows the model to adapt to recent changes while considering
the historical context.
Least Mean Square (LMS) algorithm
• The LMS algorithm is primarily used in adaptive filtering
and system identification tasks.
• It is an iterative algorithm that updates the weights of a
linear filter to minimize the mean squared error between
the filter output and the desired output.
• It is not specifically designed for training deep neural
networks.
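A minimal sketch of the LMS update, assuming a system-identification setup in which the filter has to recover an unknown 3-tap FIR filter from its input and output. The function name `lms_filter`, the step size `mu`, and the synthetic signals are illustrative choices, not part of the original notes. Because the weights are adjusted one sample at a time, this is also a simple case of online learning.

```python
import numpy as np

def lms_filter(x, d, num_taps, mu):
    """Adapt FIR filter weights so the filter output tracks d (LMS rule)."""
    w = np.zeros(num_taps)
    errors = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        # Most recent num_taps input samples, newest first.
        window = x[n - num_taps + 1:n + 1][::-1]
        y = np.dot(w, window)        # filter output
        e = d[n] - y                 # error against the desired output
        w = w + mu * e * window      # LMS weight update
        errors[n] = e
    return w, errors

# Hypothetical system-identification setup: recover an unknown 3-tap filter.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -0.3, 0.2])
x = rng.standard_normal(5000)
d = np.convolve(x, true_w)[:len(x)]  # desired signal = unknown filter output

w_est, errors = lms_filter(x, d, num_taps=3, mu=0.05)
print(np.round(w_est, 3))            # should be close to true_w
```

The step size `mu` trades off speed against stability: larger values adapt faster but can diverge if they are too large relative to the input power.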
[Example code and output not recoverable from the source: a deep neural network
(DNN) for LMS-based speech enhancement in voice communication systems.]
Unconstrained optimization techniques in deep learning
• Unconstrained optimization, in the context of deep learning,
refers to the task of finding the optimal values for the
parameters of a deep neural network without imposing any
constraints on those parameters. In other words, the
optimization problem does not include explicit constraints on the
values that the parameters can take.
• In deep learning, unconstrained optimization techniques are
commonly used to train neural networks by iteratively updating
the parameters based on the gradients of a chosen objective or
loss function. The objective is typically to minimize the loss
function, which measures the discrepancy between the
network's predictions and the ground truth.
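The iterative update can be illustrated on a toy problem. The sketch below assumes a hypothetical two-parameter quadratic loss in place of a real network loss, so the gradient can be written by hand; the learning rate and iteration count are arbitrary.

```python
import numpy as np

def loss(theta):
    """Toy objective with its minimum at theta = [3, -1]."""
    return (theta[0] - 3.0) ** 2 + 2.0 * (theta[1] + 1.0) ** 2

def grad(theta):
    """Gradient of the toy objective with respect to theta."""
    return np.array([2.0 * (theta[0] - 3.0), 4.0 * (theta[1] + 1.0)])

theta = np.zeros(2)          # no constraints: theta may take any real values
learning_rate = 0.1
for _ in range(200):
    theta = theta - learning_rate * grad(theta)   # step against the gradient

print(np.round(theta, 4))    # approaches the minimizer [3, -1]
print(loss(theta))           # close to 0
```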
• Several popular unconstrained optimization techniques are
utilized in deep learning. Here are some notable ones:
1.Stochastic Gradient Descent (SGD): SGD is a widely used
optimization algorithm in deep learning. It updates the
parameters based on the gradients of the loss function
computed on a subset (or a single example) of the training data
at each iteration. SGD often employs a learning rate that
determines the step size for parameter updates.
2.Momentum: Momentum is an extension of SGD that introduces
a momentum term to accelerate the convergence. It accumulates
a fraction of the previous gradients to guide the parameter
updates. By taking into account the historical gradients,
momentum helps smooth out the noise in the gradient estimates
and enables faster convergence.
3.Adaptive Learning Rate Methods: These methods dynamically
adjust the learning rate during training based on the behavior of
the optimization process. Examples include AdaGrad, RMSProp,
and Adam. These techniques use information from past gradients
to adaptively scale the learning rate for each parameter, which
can improve convergence and handle different learning rates for
different parameters.
4.Second-Order Optimization Methods: Second-order methods,
such as Newton's method and Quasi-Newton methods (e.g., L-
BFGS), incorporate second-order derivatives (Hessian) or
approximations to the Hessian into the optimization process.
These methods can converge faster than first-order methods like
SGD, but they require more computational resources due to the
computation and storage of the Hessian or its approximation.
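The four families above can be compared side by side on a small problem. The sketch below assumes a noiseless linear least-squares task with hypothetical target parameters. AdaGrad stands in for the adaptive learning rate methods and a single exact Newton step stands in for the second-order methods (L-BFGS is not implemented here); all hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_w = np.array([1.0, -2.0, 0.5])      # hypothetical target parameters
y = X @ true_w                           # noiseless targets for simplicity

def minibatch_grad(w, batch_size=20):
    """Gradient of the mean squared error on a random mini-batch."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / batch_size

def sgd(steps=2000, lr=0.02):
    w = np.zeros(3)
    for _ in range(steps):
        w = w - lr * minibatch_grad(w)
    return w

def sgd_momentum(steps=2000, lr=0.02, beta=0.9):
    w, velocity = np.zeros(3), np.zeros(3)
    for _ in range(steps):
        velocity = beta * velocity + minibatch_grad(w)   # accumulate past gradients
        w = w - lr * velocity
    return w

def adagrad(steps=2000, lr=0.5, eps=1e-8):
    w, grad_sq_sum = np.zeros(3), np.zeros(3)
    for _ in range(steps):
        g = minibatch_grad(w)
        grad_sq_sum += g ** 2                            # per-parameter gradient history
        w = w - lr * g / (np.sqrt(grad_sq_sum) + eps)    # per-parameter step size
    return w

def newton_step(w):
    """One full-batch Newton step; exact for a quadratic loss."""
    gradient = 2.0 * X.T @ (X @ w - y) / len(X)
    hessian = 2.0 * X.T @ X / len(X)
    return w - np.linalg.solve(hessian, gradient)

for name, w in [("SGD", sgd()), ("Momentum", sgd_momentum()),
                ("AdaGrad", adagrad()), ("Newton", newton_step(np.zeros(3)))]:
    print(name, np.round(w, 3))          # each should be close to true_w
```

The Newton step reaches the minimizer in one update only because this loss is exactly quadratic; for a deep network the Hessian is far larger and the loss is not quadratic, which is where the extra cost noted above comes from.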
Proof of Convergence.
• In the context of deep learning, "proof of convergence" refers to
demonstrating that an optimization algorithm used to train a deep
neural network will converge to an optimal solution under certain
conditions. Convergence, in this context, means that the algorithm
will eventually reach a point where further iterations do not
significantly improve the performance of the network on a given
task, such as minimizing the loss function.
• Proving convergence in deep learning is a challenging task due to
the complex and non-convex nature of the optimization problem.
Most deep learning models are trained using iterative optimization
algorithms, such as stochastic gradient descent (SGD) or its
variants, which update the network parameters based on the
gradients of the loss function with respect to those parameters.
• While there is no general proof of convergence for all deep learning
models, there have been theoretical and empirical results that provide
insights into the convergence behavior of specific algorithms and
architectures under certain conditions. Here are a few notable examples:

• Convex Loss Functions: If the loss function being optimized is convex,
which means every local minimum is also a global minimum, then various
optimization algorithms, including SGD, have been proven to converge to the
global optimum given appropriate learning rate schedules.
• Shallow Networks: For some shallow models there are theoretical guarantees
of convergence. For example, a single-layer neural network with a linear
activation function and mean squared error loss is a convex least-squares
problem, so gradient descent with a sufficiently small learning rate
converges to the optimal solution. The perceptron learning rule converges in
a finite number of updates when the two classes are linearly separable.
• Convolutional Neural Networks (CNNs): While convergence proofs for
CNNs are challenging due to their depth and non-convexity, empirical
evidence suggests that well-tuned SGD with appropriate learning rate
schedules can effectively train CNNs to achieve high performance on
various tasks, such as image classification.
• Regularization Techniques: Regularization techniques, such as weight decay
(L2 regularization) or dropout, mainly reduce overfitting and improve the
generalization of the model. Weight decay can also make the optimization
problem better conditioned, which may help convergence.
• It's important to note that proving convergence in deep learning is an
active area of research, and the theoretical understanding is still evolving.
Empirical observations and practical heuristics often guide the training
process in deep learning, and researchers and practitioners continue to
explore new algorithms, architectures, and optimization strategies to
improve convergence properties and performance in practice.
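The convex case can be checked numerically. The sketch below assumes a small synthetic linear regression problem with mean squared error loss and a step size of 1/L, where L is the largest eigenvalue of the Hessian. It is an empirical illustration of the guarantee, not a proof, and the data and iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.standard_normal(100)   # noisy targets

def mse(w):
    return np.mean((X @ w - y) ** 2)

hessian = 2.0 * X.T @ X / len(X)
step_size = 1.0 / np.linalg.eigvalsh(hessian).max()   # 1/L, L = largest curvature

w = np.zeros(2)
losses = [mse(w)]
for _ in range(500):
    gradient = 2.0 * X.T @ (X @ w - y) / len(X)
    w = w - step_size * gradient
    losses.append(mse(w))

# Closed-form least-squares solution for comparison.
w_closed_form, *_ = np.linalg.lstsq(X, y, rcond=None)
reaches_optimum = np.allclose(w, w_closed_form, atol=1e-6)
never_increases = all(b <= a + 1e-12 for a, b in zip(losses, losses[1:]))
print(reaches_optimum, never_increases)
```

For non-convex deep networks neither check is guaranteed to hold, which is why the results listed above are stated only under specific conditions.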
Introduction to Multilayer Perceptron
• A multi-layer perceptron (MLP) is a type of feedforward artificial neural network
that consists of multiple layers of nodes, also known as artificial neurons or
units. It is a foundational architecture in deep learning and serves as the basis
for many modern neural network models.
• In an MLP, information flows in one direction, from the input layer through one or
more hidden layers to the output layer. Each layer is composed of multiple
nodes, and each node is connected to every node in the subsequent layer.
These connections, often represented by weighted edges, transmit the output
activations from one layer to the next.
• The nodes in an MLP typically employ an activation function, such as the
sigmoid, ReLU, or tanh function, to introduce non-linearity into the network. This
non-linearity allows the MLP to model complex relationships and makes it
capable of learning nonlinear mappings between input and output.
• The number of hidden layers and the number of nodes in each layer are
hyperparameters that can be adjusted to suit the specific problem and data at
hand. Deep MLPs refer to those with multiple hidden layers, and they are
capable of learning more intricate representations of the input data.
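The one-directional flow described above can be sketched as a loop over fully connected layers. The layer sizes, the random weights, and the choice of ReLU for hidden layers with a sigmoid output are assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Pass x through (weights, bias) pairs: ReLU on hidden layers, sigmoid on the output."""
    activation = x
    for weights, bias in layers[:-1]:
        activation = relu(weights @ activation + bias)
    weights, bias = layers[-1]
    return sigmoid(weights @ activation + bias)

rng = np.random.default_rng(0)
layer_sizes = [3, 4, 4, 1]   # hypothetical: 3 inputs, two hidden layers of 4 nodes, 1 output
layers = [(rng.standard_normal((n_out, n_in)) * 0.5, np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

output = mlp_forward(np.array([0.2, -1.0, 0.5]), layers)
print(output.shape)          # (1,)
```

Changing `layer_sizes` is all it takes to vary the number of hidden layers or nodes, which is what makes them convenient hyperparameters.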
• MLPs are trained using supervised learning, where labeled examples are
provided for the network to learn from. The training process typically
involves an optimization algorithm, such as stochastic gradient descent
(SGD), to update the weights of the connections between nodes based on
the gradients of a chosen loss function. The objective is to minimize the
discrepancy between the predicted output of the MLP and the true output
labels.
• MLPs have been successfully applied to a wide range of tasks, including
classification, regression, and pattern recognition problems. They are
known for their ability to model complex relationships and perform well on
a variety of datasets.
• It's important to note that while MLPs were among the early neural
network architectures, the term "deep learning" typically refers to networks
with many more layers than traditional MLPs. However, the basic
principles of an MLP, such as layered architecture, feedforward
connections, and nonlinear activations, form the foundation for the
development of deeper and more complex neural network architectures.
Preliminaries of MLP
• To understand the multi-layer perceptron (MLP) better, let's discuss its
preliminary components and concepts:
1.Neuron (Node): Neurons are the fundamental units of computation in an MLP.
Each neuron takes input signals, performs a weighted sum of the inputs, applies
an activation function to the sum, and produces an output. The output is then
passed to the neurons in the subsequent layer.
2.Input Layer: The input layer of an MLP receives the input data, which could be
features or raw data representations. Each neuron in the input layer
corresponds to an input feature.
3.Hidden Layers: Hidden layers are the layers between the input and output
layers in an MLP. They are called "hidden" because they are not directly
connected to the external environment and do not receive any explicit
supervision signal. Hidden layers enable the network to learn complex
representations of the input data.
4.Output Layer: The output layer of an MLP produces the final outputs of the
network. The number of neurons in the output layer depends on the specific task
at hand. For example, in binary classification, a single neuron with a sigmoid
activation function may be used, while multi-class classification tasks may
require multiple neurons with softmax activations.
5.Connections and Weights: Neurons in adjacent layers are fully connected
in an MLP, meaning each neuron in one layer is connected to every
neuron in the next layer. Each connection between neurons is associated
with a weight, which represents the strength or importance of the
connection. During training, these weights are adjusted to optimize the
performance of the network.
6.Activation Function: The activation function of a neuron introduces non-
linearity into the network. It transforms the weighted sum of inputs into an
output activation value. Commonly used activation functions include
sigmoid, ReLU (Rectified Linear Unit), tanh (hyperbolic tangent), and
softmax.
7.Forward Propagation: Forward propagation refers to the process of
computing the output of the MLP based on the input data and the current
values of the weights and biases. It involves passing the input through the
layers of neurons, applying the activation functions, and producing the
final output.
8.Backpropagation: Backpropagation is the key algorithm used to train an
MLP. It calculates the gradients of the loss function with respect to the
weights and biases of the network by propagating the error backward from
the output layer to the input layer. These gradients are then used to update
the weights, iteratively improving the network's performance.
9.Loss Function: The loss function measures the discrepancy between the
predicted outputs of the MLP and the true outputs (labels). It quantifies the
error of the network's predictions. Common loss functions include mean
squared error (MSE), cross-entropy, and softmax loss.
10.Training: Training an MLP involves iteratively presenting training examples
to the network, computing the forward and backward passes, and updating
the weights and biases based on the calculated gradients. This process
aims to minimize the loss function and improve the network's ability to
make accurate predictions on unseen data.
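The forward pass, loss, backpropagation, and weight update can be combined into one short training loop. The sketch below assumes a tiny network with one tanh hidden layer and a sigmoid output, trained on XOR with binary cross-entropy and full-batch gradient descent. The hidden size, learning rate, and iteration count are hypothetical, and the result depends on the random initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: a small dataset that a single-layer perceptron cannot separate.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.standard_normal((4, 1)), np.zeros(1)   # hidden -> output
learning_rate = 0.5
losses = []

for step in range(5000):
    # Forward propagation.
    hidden = np.tanh(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Loss function: binary cross-entropy (clipped to avoid log(0)).
    clipped = np.clip(output, 1e-12, 1 - 1e-12)
    losses.append(-np.mean(y * np.log(clipped) + (1 - y) * np.log(1 - clipped)))

    # Backpropagation: push the error from the output layer back to the input layer.
    d_output = (output - y) / len(X)                 # gradient at the output pre-activation
    dW2, db2 = hidden.T @ d_output, d_output.sum(axis=0)
    d_hidden = (d_output @ W2.T) * (1 - hidden ** 2) # tanh derivative
    dW1, db1 = X.T @ d_hidden, d_hidden.sum(axis=0)

    # Gradient descent update of weights and biases.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(losses[0], losses[-1])           # the loss should decrease over training
print(np.round(output.ravel(), 2))     # predictions for the four XOR inputs
```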
