Week - 5 (Deep Learning) Q. 1) Explain The Architecture of Feed Forward Neural Network or Multilayer Perceptron. (12 Marks)
Ans: - Feed forward neural networks, also known as deep feed forward networks or
multilayer perceptrons, are the foundation of many deep learning models. For example,
convolutional and recurrent neural networks (which are used extensively in computer
vision applications) are based on these networks. Search engines, machine translation,
and mobile applications all rely on deep learning technologies. Deep learning works by
simulating the human brain in the way it identifies and creates patterns from various
types of input. The feed forward neural network is a key component of this technology,
since it helps software developers with pattern recognition and classification,
non-linear regression, and function approximation.
A feed forward neural network is a type of artificial neural network in which the
connections between nodes do not form a cycle. Often referred to as a multilayer
network of neurons, feed forward neural networks are so named because all information
flows in the forward direction only. The data enters at the input nodes, travels through
the hidden layers, and eventually exits at the output nodes. The network has no links
that would allow the information exiting the output nodes to be fed back into the
network. The purpose of a feed forward neural network is to approximate some function.
The feed forward network maps y = f(x; θ) and learns the value of the parameters θ that
gives the closest approximation to the function.
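For example, a network with a single hidden layer (the notation here is an illustrative
assumption, not from the original notes) computes
f(x; θ) = W2 · g(W1 · x + b1) + b2,
where g is the hidden layer's activation function and θ = {W1, b1, W2, b2} are the
weights and biases that the network learns during training.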
Input Layer:
It contains the neurons that receive the input. The data is then passed on to the next
layer. The total number of neurons in the input layer is equal to the number of features
(variables) in the dataset.
Hidden Layer:
This is the intermediate layer, which sits hidden between the input and output layers.
This layer has a number of neurons that apply transformations to the inputs and then
pass the results on to the output layer.
Output Layer:
It is the last layer, and its size depends on how the model is constructed. The output
layer represents the predicted value, i.e. the desired outcome the model is trained to produce.
Neuron weights:
Weights describe the strength of the connection between neurons. They are typically
initialized to small values, often in the range 0 to 1, and are adjusted as the network learns.
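Putting the layers together, here is a minimal NumPy sketch of the forward-only flow
described above (the layer sizes, random weights, and input values are illustrative
assumptions):

import numpy as np

def sigmoid(z):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# illustrative sizes: 3 input features, 4 hidden neurons, 1 output neuron
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden layer -> output layer

def forward(x):
    # information flows strictly forward: input -> hidden -> output
    h = sigmoid(W1 @ x + b1)    # hidden layer transforms the inputs
    y = sigmoid(W2 @ h + b2)    # output layer produces the prediction
    return y

print(forward(np.array([0.5, -1.2, 3.0])))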
Q. 2) What is Backpropagation and how does the Backpropagation algorithm work?
(6 marks)
Ans: - The backpropagation algorithm in a neural network computes the gradient of the
loss function with respect to each weight by the chain rule. It efficiently computes the
gradients one layer at a time, unlike a naive direct computation. Backpropagation only
computes the gradient; it does not define how the gradient is used. It generalizes the
computation in the delta rule.
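As a hedged sketch of this layer-by-layer chain-rule computation, consider a tiny
one-hidden-layer network trained on a squared-error loss (all sizes, values, and the
learning rate are illustrative assumptions, not from the notes):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([1.0])        # one training example and its target
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))

# forward pass, keeping intermediate values for the backward pass
h = sigmoid(W1 @ x)                               # hidden activations
y = sigmoid(W2 @ h)                               # network output
loss = 0.5 * np.sum((y - t) ** 2)                 # squared-error loss

# backward pass: apply the chain rule one layer at a time
delta_out = (y - t) * y * (1 - y)                 # dLoss/d(output pre-activation)
grad_W2 = np.outer(delta_out, h)                  # dLoss/dW2
delta_hid = (W2.T @ delta_out) * h * (1 - h)      # error propagated back to the hidden layer
grad_W1 = np.outer(delta_hid, x)                  # dLoss/dW1

# gradient descent update on the weights
W2 -= 0.1 * grad_W2
W1 -= 0.1 * grad_W1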
At its core, a loss function is incredibly simple: it is a method of evaluating how
well your algorithm models your dataset. If your predictions are totally off, your loss
function will output a higher number; if they are pretty good, it will output a lower
number. As you change pieces of your algorithm to try to improve your model, your loss
function will tell you whether you are getting anywhere.
A few of the most popular loss functions currently being used, from simple to more
complex, are: -
1. Mean squared error (MSE):
MSE is the workhorse of basic loss functions; it's easy to understand and implement and
generally works pretty well. To calculate MSE, you take the difference between your
predictions and the ground truth, square it, and average it out across the whole dataset.
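In symbols, MSE = (1/N) * Σ (y_i - ŷ_i)². A tiny sketch with made-up values:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # ground truth (illustrative values)
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # model predictions

mse = np.mean((y_true - y_pred) ** 2)      # square the differences, then average
print(mse)                                 # 0.375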
2. Likelihood loss:
For example, consider a model that outputs probabilities of [0.4, 0.6, 0.9, 0.1] for the
ground truth labels of [0, 1, 1, 0]. The likelihood loss would be computed as
(0.6) * (0.6) * (0.9) * (0.9) = 0.2916. Since the model outputs probabilities for TRUE
(or 1) only, when the ground truth label is 0 we take (1 - p) as the probability. In
other words, we multiply the model's outputted probabilities together for the actual
outcomes.
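The same computation, written as a short sketch using the example values above:

probs  = [0.4, 0.6, 0.9, 0.1]              # model's predicted probability of class 1
labels = [0, 1, 1, 0]                      # ground truth labels

likelihood = 1.0
for p, y in zip(probs, labels):
    likelihood *= p if y == 1 else (1 - p) # use 1 - p when the true label is 0
print(likelihood)                          # 0.6 * 0.6 * 0.9 * 0.9 = 0.2916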
3. Log loss:
Log loss is a loss function also used frequently in classification problems, and it is
one of the most popular measures for Kaggle competitions. It is just a straightforward
modification of the likelihood function with logarithms: for a single example with true
label y and predicted probability p,
Log loss = -(y * log(p) + (1 - y) * log(1 - p)).
This is exactly the same as the regular likelihood function, but with logarithms added
in. You can see that when the actual class is 1, the second half of the expression
disappears, and when the actual class is 0, the first half drops out. That way, we just
end up with the log of the predicted probability for the ground truth class.
The cool thing about the log loss function is that it has a kick: it penalizes heavily
for being very confident and very wrong. For example, when the true label is 1, the loss
skyrockets as the predicted probability for label = 0 approaches 1.
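A short sketch of this behaviour (the probability values are illustrative assumptions):

import numpy as np

def log_loss(y_true, p):
    # binary cross-entropy; only one of the two terms is active per example
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 1.0])
print(log_loss(y_true, np.array([0.9, 0.8])))    # confident and right  -> small loss
print(log_loss(y_true, np.array([0.01, 0.02])))  # confident and wrong  -> very large loss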
Q. 3) Why is the sigmoid function used as an activation function in neural networks?
(3 marks)
Ans: - If we use a linear activation function in a neural network, the model can only
learn linearly separable problems. However, with the addition of just one hidden layer
and a sigmoid activation function in that hidden layer, the neural network can easily
learn a non-linearly separable problem. Using a non-linear function produces non-linear
decision boundaries, and hence the sigmoid function can be used in neural networks for
learning complex decision functions. A suitable non-linear activation function is also
required to be monotonically increasing, so functions such as sin(x) or cos(x) cannot be
used as activation functions. In addition, the activation function should be defined and
continuous everywhere on the real line, and it is required to be differentiable over the
entire space of real numbers.
Typically, the backpropagation algorithm uses gradient descent to learn the weights of a
neural network. To derive this algorithm, the derivative of the activation function is
required. The fact that the sigmoid function is monotonic, continuous and differentiable
everywhere, coupled with the property that its derivative can be expressed in terms of
the function itself, makes it easy to derive the update equations for learning the
weights of a neural network with the backpropagation algorithm.
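For instance, the sigmoid's derivative can be written in terms of the function itself,
σ'(z) = σ(z)(1 - σ(z)), which is the identity the weight-update equations rely on. A
quick numerical check (a sketch, not part of the original notes):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7
s = sigmoid(z)
analytic  = s * (1 - s)                                      # derivative via the identity
numerical = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6   # central finite difference
print(analytic, numerical)                                   # the two values agree closely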