
Week – 5 (Deep Learning)

Q. 1) Explain the architecture of Feed Forward Neural Network or Multilayer Perceptron. (12 marks)

Ans: - Feed Forward Neural Networks, also known as Deep Feed Forward Networks or Multilayer Perceptrons, form the foundation of most deep learning models. For example, Convolutional and Recurrent Neural Networks (which are used extensively in computer vision applications) are based on these networks. Search engines, machine translation, and mobile applications all rely on deep learning technologies. Deep learning works by simulating the human brain's ability to identify and create patterns from various types of input. A feed forward neural network is a key component of this technology since it gives software developers the means to perform pattern recognition and classification, non-linear regression, and function approximation.

A feed forward neural network is a type of artificial neural network in which the connections between nodes do not form a loop. Often referred to as a multilayered network of neurons, feed forward neural networks are so named because all information flows in a forward direction only. The data enters at the input nodes, travels through the hidden layers, and eventually exits at the output nodes. The network has no links that would allow the information leaving the output nodes to be fed back into the network. The purpose of feed forward neural networks is to approximate functions.

Here’s how it works:

There is a classifier using the formula y = f*(x), which assigns an input x to a category y.

The feed forward network learns a mapping y = f(x; θ). It then learns the value of the parameters θ that gives the closest approximation of the function.

Fig: - Feed Forward Neural Network


A Feed Forward Neural Network’s Layers:

The following are the components of a feed forward neural network:

Input Layer:

It contains the neurons that receive the input. The data is subsequently passed on to the next layer. The total number of neurons in the input layer is equal to the number of variables (features) in the dataset.

Hidden Layer:

This is the intermediate layer, which is concealed between the input and output layers. This layer has a large number of neurons that apply transformations to the inputs. They then pass the result on to the output layer.

Output Layer:

It is the final layer, and its structure depends on how the model is constructed. The output layer represents the predicted feature, since the desired outcome is known during training.

Neuron weights:

Weights describe the strength of a connection between neurons. They are typically initialized to small random values (often between 0 and 1) and are adjusted as the network trains.
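A minimal sketch of a forward pass through such a network is shown below. The layer sizes, the use of NumPy, and the choice of a sigmoid activation are illustrative assumptions, not part of the description above.

```python
import numpy as np

def sigmoid(z):
    # Squashes each value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative layer sizes: 3 input features, 4 hidden neurons, 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros(1)

def forward(x):
    # Information flows strictly forward: input -> hidden -> output, no loops
    h = sigmoid(x @ W1 + b1)   # hidden layer activations
    y = sigmoid(h @ W2 + b2)   # output layer prediction
    return y

x = np.array([0.5, -1.2, 3.0])  # one example with 3 input variables
print(forward(x))
```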
Q. 2) What is Backpropagation and how does the Backpropagation algorithm work? (6 marks)

Ans: - Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights allows you to reduce error rates and make the model reliable by improving its generalization.

Backpropagation is short for “backward propagation of errors”. It is a standard method of training artificial neural networks. This method calculates the gradient of a loss function with respect to all the weights in the network.

The backpropagation algorithm computes the gradient of the loss function with respect to a single weight by the chain rule. It efficiently computes the gradients one layer at a time, unlike a naive direct computation. It computes the gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.

Consider the following backpropagation neural network example diagram to understand how the algorithm works:

Fig: - Working of Backpropagation Algorithm

1. Inputs X arrive through the preconnected path.
2. The input is modelled using real weights W. The weights are usually selected randomly.
3. Calculate the output for every neuron from the input layer, through the hidden layers, to the output layer.
4. Calculate the error in the outputs:

   Error = Actual Output – Desired Output

5. Travel back from the output layer to the hidden layers to adjust the weights so that the error is decreased.
6. Keep repeating the process until the desired output is achieved.
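A minimal sketch of these steps for a single hidden-layer network with sigmoid activations and squared error is given below. The network shape, learning rate, and single training example are assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # step 2: random weights
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
lr = 0.5

x = np.array([[0.1, 0.7]])      # step 1: inputs X
target = np.array([[1.0]])      # desired output

for epoch in range(1000):
    # Step 3: forward pass, layer by layer
    h = sigmoid(x @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Step 4: error at the output
    error = out - target

    # Step 5: propagate the error backwards and adjust the weights
    d_out = error * out * (1 - out)          # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)       # gradient at the hidden layer
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * x.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out)  # step 6: after enough iterations, close to the desired output
```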


Q.3) What is Loss Function? Explain types of loss function. (6 marks)

Ans: - At its core, a loss function is incredibly simple: it’s a method of evaluating how well your algorithm models your dataset. If your predictions are totally off, your loss function will output a higher number. If they’re pretty good, it’ll output a lower number. As you change pieces of your algorithm to try to improve your model, your loss function will tell you whether you’re getting anywhere.

Types of loss functions: -

A few of the most popular loss functions currently being used, from simple to more
complex are: -

1. Mean square error:

Mean squared error (MSE) is the workhorse of basic loss functions; it’s easy to
understand and implement and generally works pretty well. To calculate MSE, you
take the difference between your predictions and the ground truth, square it, and
average it out across the whole dataset.
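A minimal NumPy sketch of this calculation is shown below; the prediction and target values are made up for illustration.

```python
import numpy as np

def mse(predictions, targets):
    # Average of the squared differences between predictions and ground truth
    return np.mean((predictions - targets) ** 2)

preds = np.array([2.5, 0.0, 2.1, 7.8])
truth = np.array([3.0, -0.5, 2.0, 7.5])
print(mse(preds, truth))  # 0.15
```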

2. Likelihood loss:

The likelihood function is also relatively simple, and is commonly used in


classification problems. The function takes the predicted probability for each input
example and multiplies them. And although the output isn’t exactly human-
interpretable, it’s useful for comparing models.

For example, consider a model that outputs probabilities of [0.4, 0.6, 0.9, 0.1] for the
ground truth labels of [0, 1, 1, 0]. The likelihood loss would be computed as

(0.6) * (0.6) * (0.9) * (0.9) = 0.2916.

Since the model outputs probabilities for TRUE (or 1) only, when the ground truth
label is 0 we take (1-p) as the probability. In other words, we multiply the model’s
outputted probabilities together for the actual outcomes.
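A small sketch reproducing the worked example above:

```python
import numpy as np

def likelihood(probs, labels):
    # Use p when the true label is 1 and (1 - p) when it is 0, then multiply
    per_example = np.where(labels == 1, probs, 1 - probs)
    return np.prod(per_example)

probs = np.array([0.4, 0.6, 0.9, 0.1])
labels = np.array([0, 1, 1, 0])
print(likelihood(probs, labels))  # 0.6 * 0.6 * 0.9 * 0.9 = 0.2916
```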

3. Log loss (Cross Entropy Loss):

Log loss is a loss function also used frequently in classification problems, and is one of the most popular measures for Kaggle competitions. It is a straightforward modification of the likelihood function with logarithms.

For a single example with true label y and predicted probability p, the binary log loss is −[y · log(p) + (1 − y) · log(1 − p)]. This is essentially the likelihood function with logarithms added in. You can see that when the actual class is 1, the second half of the expression disappears, and when the actual class is 0, the first half drops. That way, we end up with the log of the predicted probability for the ground truth class.

The cool thing about the log loss function is that it has a kick: it penalizes heavily for being very confident and very wrong. When the true label is 1, the loss skyrockets as the predicted probability for that label approaches 0.
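A minimal sketch of binary log loss, using the same probabilities as the likelihood example above:

```python
import numpy as np

def log_loss(probs, labels, eps=1e-15):
    # Clip to avoid log(0); averages -[y*log(p) + (1-y)*log(1-p)] over examples
    p = np.clip(probs, eps, 1 - eps)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

probs = np.array([0.4, 0.6, 0.9, 0.1])
labels = np.array([0, 1, 1, 0])
print(log_loss(probs, labels))                     # moderate loss for decent predictions
print(log_loss(np.array([0.99]), np.array([0])))   # ~4.6: very confident and very wrong
```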

Q. 4) What is Gradient descent? Explain the types of Gradient descent. (3 marks)

Ans: - Gradient descent is an optimization algorithm commonly used to train machine learning models and neural networks. Training data helps these models learn over time, and the cost function within gradient descent acts as a barometer, gauging the model's accuracy with each iteration of parameter updates. Until the cost function is close to or equal to zero, the model will continue to adjust its parameters to yield the smallest possible error.

Types of Gradient Descent: -

1. Batch gradient descent:

Batch gradient descent sums the error for each point in the training set and updates the model only after all training examples have been evaluated. This process is referred to as a training epoch. While this batching provides computational efficiency, it can still have a long processing time for large training datasets, as it needs to keep all of the data in memory. Batch gradient descent also usually produces a stable error gradient and convergence, but sometimes that convergence point isn’t ideal, finding a local minimum rather than the global one.
2. Stochastic gradient descent:

Stochastic gradient descent (SGD) runs a training epoch for each example within the dataset, updating the parameters one training example at a time. Since only one training example needs to be held at a time, it is easier to store in memory. While these frequent updates can offer more detail and speed, they can reduce computational efficiency compared to batch gradient descent. The frequent updates also result in noisy gradients, but this noise can be helpful in escaping a local minimum and finding the global one.

3. Mini-batch gradient descent:

Mini-batch gradient descent combines concepts from both batch gradient descent and stochastic gradient descent. It splits the training dataset into small batches and performs an update on each of those batches. This approach strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent.
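The sketch below contrasts the three variants for a simple one-parameter linear regression problem; the data, learning rate, and batch size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # true slope is 3
w, lr = 0.0, 0.1

def gradient(w, X_batch, y_batch):
    # Gradient of mean squared error for a one-parameter linear model
    preds = w * X_batch[:, 0]
    return np.mean(2 * (preds - y_batch) * X_batch[:, 0])

for epoch in range(50):
    # Batch: one update per epoch using every example
    # w -= lr * gradient(w, X, y)

    # Stochastic: one update per individual example
    # for i in range(len(X)):
    #     w -= lr * gradient(w, X[i:i+1], y[i:i+1])

    # Mini-batch: update on small slices, e.g. batches of 10 examples
    for start in range(0, len(X), 10):
        w -= lr * gradient(w, X[start:start+10], y[start:start+10])

print(w)  # converges towards the true slope of 3
```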

Q. 5) Why is the Sigmoid function important in neural networks? (3 marks)

Ans: - If we use a linear activation function in a neural network, the model can only learn linearly separable problems. However, with the addition of just one hidden layer and a sigmoid activation function in the hidden layer, the neural network can easily learn a non-linearly separable problem. Using a non-linear activation produces non-linear decision boundaries, and hence the sigmoid function can be used in neural networks for learning complex decision functions. Traditionally, an activation function is also expected to be monotonically increasing, which is why functions such as sin(x) or cos(x) are generally not used as activation functions. In addition, the activation function should be defined everywhere, continuous everywhere in the space of real numbers, and differentiable over the entire space of real numbers.

Typically, the backpropagation algorithm uses gradient descent to learn the weights of a neural network, and deriving this algorithm requires the derivative of the activation function. The fact that the sigmoid function is monotonic, continuous, and differentiable everywhere, coupled with the property that its derivative can be expressed in terms of itself, σ'(x) = σ(x)(1 − σ(x)), makes it easy to derive the update equations for learning the weights of a neural network with the backpropagation algorithm.
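A short sketch showing the sigmoid function and the property that its derivative can be written in terms of the function itself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # The derivative is expressed in terms of the sigmoid itself: s * (1 - s)
    s = sigmoid(x)
    return s * (1 - s)

xs = np.linspace(-5, 5, 5)
print(sigmoid(xs))             # smooth, monotonically increasing values in (0, 1)
print(sigmoid_derivative(xs))  # largest near x = 0, vanishes for large |x|
```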
