Unit 3 Endsem PYQs

Unit 3 covers the fundamentals of Neural Networks and their architecture, focusing on their application in big data. It explains the structure of Artificial Neural Networks, including input, hidden, and output layers, as well as the importance of activation functions and training processes. Additionally, it discusses perceptrons, the differences between linear and nonlinear neural networks, and the components of feedforward neural networks.

Uploaded by

Yuvraj kottalagi

Unit III: Neural Networks for Big Data

Topic 1: Fundamentals of Neural Networks and Artificial Neural Networks

Questions Covered:

May-Jun 2023 Q1 a) Explain fundamental of neural networks in big data. [6]

May-Jun 2024 Q2 b) Explain fundamental of neural networks in big data. [6]
May-Jun 2023 Q2 a) Explain architecture of artificial neural networks. [6]

Answer:

Fundamental of Neural Networks and Artificial Neural Networks in Big Data (May-Jun
2023 Q1 a & May-Jun 2024 Q2 b)

Artificial Neural Networks (ANNs), often simply called Neural Networks (NNs), are computing
systems inspired by the structure and function of the human brain. They are designed to
recognize patterns, learn from data, and make predictions or decisions.

Fundamentals:

1. Neurons (Nodes): The basic building blocks of an ANN are artificial neurons, analogous
to biological neurons. Each neuron receives inputs, processes them, and produces an
output.
2. Connections (Synapses) and Weights: Neurons are interconnected, and each
connection has an associated 'weight'. These weights represent the strength or
importance of the connection. During training, the network learns by adjusting these
weights.
3. Activation Function: After summing the weighted inputs, a neuron applies an activation
function (also known as a transfer function). This function introduces non-linearity,
allowing the network to learn complex patterns. Common activation functions include
Sigmoid, ReLU, Tanh, etc.
4. Layers: Neurons are typically organized into layers:
Input Layer: Receives the raw input data.
Hidden Layers: One or more layers between the input and output layers, where the
majority of the computational processing and feature extraction occurs.
Output Layer: Produces the final output of the network, which could be a prediction,
classification, or another form of decision.
5. Learning (Training): ANNs learn from data through a process called training. They are
fed with input data and corresponding desired outputs (labels). The network adjusts its
internal weights and biases to minimize the difference (error or loss) between its
predicted output and the true output. This process typically involves optimization
algorithms like Gradient Descent and Backpropagation.

Role in Big Data:

Neural Networks are particularly well-suited for handling big data due to their ability to:

Extract Complex Patterns: Big data often contains intricate, non-linear relationships
that traditional statistical methods might miss. Deep Neural Networks (with many hidden
layers) can automatically learn hierarchical features from vast amounts of raw data.
Scalability: Modern NN architectures and training algorithms (e.g., mini-batch gradient
descent) combined with distributed computing frameworks (such as Apache Spark or
TensorFlow's distributed training) and specialized hardware (GPUs, TPUs) allow NNs to
scale to massive datasets.
Feature Learning: Instead of requiring manual feature engineering (which is challenging
with big data's high dimensionality and variety), NNs can learn relevant features directly
from the data, which is crucial when dealing with unstructured big data (images, text,
audio).
Handling Variety: NNs, especially specialized architectures like Convolutional Neural
Networks (CNNs) for images and Recurrent Neural Networks (RNNs) for text/sequences,
can process diverse types of big data.
Predictive Power: With enough data and computational resources, NNs can achieve
state-of-the-art accuracy in tasks like predictive analytics, anomaly detection, and
recommendation systems on big data platforms.

Architecture of Artificial Neural Networks (May-Jun 2023 Q2 a)

The architecture of an Artificial Neural Network defines how its neurons are organized into
layers and how these layers are connected. The most common and fundamental architecture
is the Feedforward Neural Network (FNN), particularly the Multi-Layer Perceptron (MLP).

Key Components and Architecture:

1. Input Layer:
Consists of neurons that receive the initial data. Each neuron in the input layer
corresponds to a feature in the dataset.
No processing or activation functions are applied here; they simply pass the input
values to the next layer.
Example: For an image classification task, if the image is 28x28 pixels, the input
layer might have 784 neurons, each representing a pixel's intensity.
2. Hidden Layers:
One or more layers situated between the input and output layers. These layers are
where the network performs complex computations and learns intricate
representations of the input data.
Each neuron in a hidden layer receives inputs from all neurons in the preceding
layer, computes a weighted sum, adds a bias, and then applies an activation
function.
The number of hidden layers and neurons within them are hyperparameters that
need to be tuned. Deeper networks (with more hidden layers) are called "Deep
Neural Networks."
Example: In a sentiment analysis task, a hidden layer might learn to identify
combinations of words that signify positive or negative sentiment.
3. Output Layer:
Contains neurons that produce the final output of the network.
The number of neurons in the output layer depends on the type of problem:
Regression: Typically one neuron (e.g., predicting house price).
Binary Classification: One neuron (e.g., predicting spam/not spam) with a
sigmoid activation.
Multi-class Classification: One neuron per class (e.g., classifying digits 0-9
would have 10 output neurons) with a softmax activation.
The activation function used here depends on the problem type (e.g., Sigmoid for
binary, Softmax for multi-class classification, linear for regression).
4. Connections (Synapses):
Each neuron in one layer is typically connected to every neuron in the subsequent
layer (fully connected or dense layers).
Each connection has an associated numerical weight ($w_{ij}$), which represents the
strength of the connection from neuron i in the previous layer to neuron j in the
current layer.
5. Biases ($b_j$):
Each neuron (except input neurons) typically has an associated bias term ($b_j$).
The bias allows the activation function to be shifted, effectively providing an
additional degree of freedom for the model to fit the data. It helps the neuron activate
even when all inputs are zero.
6. Activation Functions ($f$):
Mathematical functions applied to the weighted sum of inputs plus bias
($net_j = \sum_i (w_{ij} \cdot x_i) + b_j$) within each neuron.
They introduce non-linearity, enabling the network to learn complex, non-linear
relationships in the data. Without non-linear activation functions, a multi-layer
network would behave like a single-layer linear model.
Common examples: ReLU ($\max(0, x)$), Sigmoid ($\frac{1}{1 + e^{-x}}$), Tanh ($\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$).
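
The three activation functions named above can be written directly from their formulas. The following is a minimal sketch using only the Python standard library; the function names are illustrative choices, not part of the original notes.

```python
import math

def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Sigmoid: 1 / (1 + e^(-x)); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    """Tanh: (e^x - e^(-x)) / (e^x + e^(-x)); output lies in (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# Evaluate each function at a negative, zero, and positive input.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  sigmoid={sigmoid(x):.4f}  tanh={tanh(x):.4f}")
```

Note how ReLU clips negative inputs to zero, while Sigmoid and Tanh squash every input into a bounded range.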

General Architecture Diagram:

Input Layer      Hidden Layer 1     Hidden Layer 2     Output Layer
(Features)       (Learned Reps)     (More Abstract)    (Predictions)

    O                  O                  O                  O
    O  ----W---->      O  ----W---->      O  ----W---->      O
    O                  O                  O                  O
    O                  O                  O                  O

    ^                  ^                  ^                  ^
    |                  |                  |                  |
 Raw Data        Feature Extr.      Complex Mapping    Decision/Output

Flow of Information: In feedforward networks, information flows strictly from the input
layer, through hidden layers, to the output layer, without any loops or cycles.
Learning: The network learns by adjusting the weights (W ) and biases (b) based on the
error between its predictions and the actual target values, typically using optimization
algorithms like Gradient Descent and Backpropagation.
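
To make the layer computation concrete, here is a small sketch of one fully connected layer and a forward pass through a tiny network. The helper name `dense`, the 2-3-1 layer sizes, and all weight values are assumptions chosen for illustration only.

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer.

    weights[i][j] is the weight from input neuron i to neuron j.
    For every neuron j: net_j = sum_i(w_ij * x_i) + b_j, output_j = f(net_j).
    """
    outputs = []
    for j in range(len(biases)):
        net_j = sum(weights[i][j] * inputs[i] for i in range(len(inputs))) + biases[j]
        outputs.append(activation(net_j))
    return outputs

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 2-3-1 network with hand-picked parameters.
x = [1.0, 2.0]
W1 = [[0.5, -1.0, 0.0],
      [0.25, 0.5, -0.5]]       # 2 inputs -> 3 hidden neurons
b1 = [0.0, 0.5, 1.0]
W2 = [[1.0], [-1.0], [0.5]]    # 3 hidden neurons -> 1 output neuron
b2 = [0.0]

hidden = dense(x, W1, b1, relu)         # hidden layer with ReLU
y_hat = dense(hidden, W2, b2, sigmoid)  # output layer with Sigmoid
print(hidden, y_hat)
```

Information only moves forward here: the output of one call becomes the input of the next, with no loops.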

Topic 2: Perceptron, Linear & Nonlinear Models

Questions Covered:

May-Jun 2023 Q1 b) What is perceptron? Explain Types of Perceptron. [6]

May-Jun 2023 Q2 b) Difference between linear and nonlinear neural networks? [6]
May-Jun 2024 Q1 b) Difference between linear and nonlinear neural networks? [6]

Answer:

What is Perceptron? Explain Types of Perceptron. (May-Jun 2023 Q1 b)

A Perceptron is the simplest form of an artificial neural network, proposed by Frank
Rosenblatt in 1957. It is a fundamental building block of ANNs and is a binary linear
classifier, meaning it can classify inputs into one of two categories based on a linear decision
boundary.

How a Perceptron Works:

1. Input: A perceptron receives multiple input signals ($x_1, x_2, \dots, x_n$).
2. Weights: Each input $x_i$ is multiplied by an associated weight $w_i$.
3. Weighted Sum: The weighted inputs are summed up, along with a bias term ($b$). This is
called the net input:

$$net = \sum_{i=1}^{n} (x_i \cdot w_i) + b$$

4. Activation (Step) Function: The net input is then passed through a step (or threshold)
activation function. For a simple perceptron, this is typically a Heaviside step function:

$$output = f(net) = \begin{cases} 1 & \text{if } net \ge \text{threshold (or 0)} \\ 0 & \text{if } net < \text{threshold (or 0)} \end{cases}$$

This function outputs either 1 or 0 (or +1 or -1), classifying the input into one of two
classes.
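
The four steps above fit in a few lines of code. This sketch assumes a threshold of 0 and uses hand-picked weights that happen to implement a logical AND gate; the weights are an illustrative choice, not learned values.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, followed by a Heaviside step (threshold 0)."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

# Assumed weights for AND: the net input is non-negative only when both inputs are 1.
weights, bias = [1.0, 1.0], -1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron([x1, x2], weights, bias))
```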

Perceptron Diagram:

x1 ---(w1)---\
x2 ---(w2)----+---> SUM + Bias ----> Activation Function ----> Output
...          /
xn ---(wn)---/

Types of Perceptron:

1. Single-Layer Perceptron (SLP):

Architecture: Consists of only an input layer and an output layer (a single layer of
processing units). There are no hidden layers.
Capability: SLPs can only solve linearly separable problems. This means they can
draw a single straight line (or hyperplane in higher dimensions) to separate the data
points belonging to different classes.
Example: It can learn to classify logical AND or OR gates because their outputs are
linearly separable. However, it cannot solve the XOR problem, as XOR's output
pattern is not linearly separable.
Training: Uses the Perceptron Learning Rule, which iteratively adjusts weights
based on misclassified examples.

Input Layer          Output Layer
(Features)           (Binary Class)

O ---------\
O ----------+------> O (Single Neuron)
O ---------/

2. Multi-Layer Perceptron (MLP):

Architecture: Consists of one or more hidden layers between the input and output
layers. Each layer is typically fully connected to the next.
Capability: By introducing hidden layers and, crucially, non-linear activation
functions (like sigmoid, ReLU, tanh) in the hidden layers, MLPs can learn and
approximate non-linearly separable problems. This is because the non-linear
activations allow the network to learn complex, curved decision boundaries.
Example: MLPs can effectively solve the XOR problem and are the foundation for
most modern deep learning architectures.
Training: Typically trained with gradient descent, using the Backpropagation algorithm
to compute the gradients of the loss across all layers.
Note: While technically a network of perceptrons, the term "Multi-Layer Perceptron"
often refers to feedforward neural networks with at least one hidden layer and non-
linear activations, rather than simple step function perceptrons in each neuron.

Input Layer        Hidden Layer(s)         Output Layer
(Features)         (Non-linear mapping)    (Classification/Regression)

O------------------O-----------------------O
O------------------O-----------------------O
O------------------O-----------------------O
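
The contrast between the two types can be checked in code. The sketch below trains a single-layer perceptron with the Perceptron Learning Rule on AND and on XOR, then evaluates a tiny MLP on XOR. The learning rate, epoch limit, and the MLP's hand-set ReLU weights are assumptions for illustration; the MLP is not trained here.

```python
def step(net):
    return 1 if net >= 0 else 0

def train_perceptron(samples, lr=1.0, max_epochs=100):
    """Perceptron Learning Rule: on each mistake, w += lr * (target - output) * x.

    Returns (weights, bias, converged), where converged means a full pass
    over the samples produced no misclassification.
    """
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            out = step(w[0] * x[0] + w[1] * x[1] + b)
            err = target - out
            if err != 0:
                mistakes += 1
                w[0] += lr * err * x[0]
                w[1] += lr * err * x[1]
                b += lr * err
        if mistakes == 0:
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_ok = train_perceptron(AND)   # linearly separable
_, _, xor_ok = train_perceptron(XOR)   # not linearly separable
print("AND learned:", and_ok, "| XOR learned:", xor_ok)

def relu(z):
    return max(0.0, z)

def xor_mlp(x1, x2):
    """2-2-1 MLP with hand-set weights: ReLU hidden layer, linear output."""
    h1 = relu(x1 + x2)          # hidden neuron 1
    h2 = relu(x1 + x2 - 1.0)    # hidden neuron 2
    return h1 - 2.0 * h2        # output neuron

print([xor_mlp(a, b) for (a, b), _ in XOR])
```

The single-layer model finds a separating line for AND but never reaches an error-free pass on XOR, while one hidden layer with a non-linear activation is enough to represent it.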

Difference between Linear and Nonlinear Neural Networks? (May-Jun 2023 Q2 b &
May-Jun 2024 Q1 b)

The core distinction between linear and non-linear neural networks lies in their activation
functions and, consequently, their ability to model complex relationships in data.
| Feature | Linear Neural Network | Nonlinear Neural Network |
|---|---|---|
| Activation Function | Uses a linear activation function (e.g., f(x) = x) or no activation function at all. The output is a direct sum of weighted inputs. | Uses non-linear activation functions (e.g., Sigmoid, ReLU, Tanh, Softmax). |
| Output Type | The output is a linear combination of its inputs. | The output is a non-linear transformation of its inputs. |
| Decision Boundary | Can only learn linear decision boundaries (straight lines, planes, or hyperplanes). | Can learn complex, non-linear decision boundaries (curves, complex shapes). |
| Problem Solving | Suitable only for linearly separable problems. If the data cannot be separated by a straight line, it will fail. | Capable of solving non-linearly separable problems, which are common in real-world data. |
| Complexity | Simpler, less powerful. Even multiple layers of linear activations would still result in an overall linear model (a composition of linear functions is linear). | More complex, powerful, and capable of learning intricate patterns. Deeper networks (Deep Learning) rely heavily on non-linearity. |
| Examples | Single-Layer Perceptron (when using a simple threshold), Linear Regression models. | Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs). |
| Practical Use | Limited to very simple problems. Rarely used alone for complex tasks. | Foundation of most modern AI and machine learning applications due to their ability to model real-world complexities. |

Illustration of Decision Boundaries:

Linear Separation:

  +   +
    +
- - - - - - - -
  -   -
    -
(A single straight line can separate '+' from '-')

Nonlinear Separation (e.g., XOR Problem):

+ -
X
- +
(No single straight line can separate '+' from '-')

In essence, the non-linear activation functions are what give neural networks their power to
learn complex, non-trivial relationships in data. Without them, a multi-layer neural network
would simply be equivalent to a single-layer linear model, no matter how many layers it has.
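
The claim that stacked linear layers collapse into one can be verified numerically. In this sketch the matrices are hand-picked and each row holds the weights of one output neuron (a layout chosen for convenience); the helper names are illustrative.

```python
def matvec(W, x):
    """Matrix-vector product; each row of W holds one output neuron's weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Plain matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Two layers with identity (linear) activation: 2 inputs -> 3 hidden -> 1 output.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b1 = [1.0, 0.0, -2.0]
W2 = [[2.0, 1.0, 0.0]]
b2 = [4.0]

def two_linear_layers(x):
    h = add(matvec(W1, x), b1)
    return add(matvec(W2, h), b2)

# Equivalent single layer: W = W2 * W1 and b = W2 * b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def one_linear_layer(x):
    return add(matvec(W, x), b)

for x in ([1.0, 1.0], [0.0, 0.0], [-2.0, 3.0]):
    print(x, two_linear_layers(x), one_linear_layer(x))
```

Both functions return the same value for every input, so the extra layer adds no expressive power until a non-linear activation is placed between the layers.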

Topic 3: Feedforward Neural Networks

Questions Covered:

May-Jun 2023 Q1 c) What is feed forward neural network explain with example. [8]
May-Jun 2024 Q1 a) What are the components of a feed forward neural network?
Explain with the help of neat sketch. [6]

Answer:

What is Feedforward Neural Network? Explain with example. (May-Jun 2023 Q1 c)

Components of a Feedforward Neural Network with a Neat Sketch (May-Jun 2024 Q1
a)

A Feedforward Neural Network (FNN) is the most basic type of artificial neural network
where the connections between nodes do not form a cycle. Information flows in only one
direction: from the input layer, through any hidden layers, and to the output layer. It's called
"feedforward" because the information is propagated forward through the network. The Multi-
Layer Perceptron (MLP) is a common example of an FNN.

Components of a Feedforward Neural Network:

1. Input Layer:
Purpose: Receives the raw input features of the dataset.
Structure: Consists of neurons (nodes), where each neuron corresponds to an input
feature.
Operation: Simply passes the input values to the first hidden layer. No computations
or activation functions are applied here.
2. Hidden Layers:
Purpose: Perform complex computations, extract features, and transform the input
data into more abstract representations. They are responsible for learning the
intricate patterns within the data.
Structure: One or more layers located between the input and output layers. Each
neuron in a hidden layer is connected to all neurons in the previous layer.
Operation: For each neuron j in a hidden layer:
It computes a weighted sum of inputs from the previous layer:
$net_j = \sum_i (w_{ij} \cdot x_i) + b_j$
Then, it applies a non-linear activation function ($f$) to this sum:
$output_j = f(net_j)$. Common functions include ReLU, Sigmoid, Tanh.


3. Output Layer:
Purpose: Produces the final result of the network, which can be a prediction (for
regression) or a classification (for classification tasks).
Structure: Contains neurons that correspond to the desired output. The number of
neurons depends on the task (e.g., 1 for binary classification/regression, N for N-
class classification).
Operation: Similar to hidden layers, it computes a weighted sum and applies an
activation function, which is chosen based on the problem type (e.g., Sigmoid for
binary classification, Softmax for multi-class classification, linear for regression).
4. Weights (W ):
Purpose: Numerical values associated with each connection between neurons. They
represent the strength or importance of the connection.
Role: These are the parameters that the network learns during training. A higher
weight means the input from that connection has a stronger influence on the
receiving neuron.
5. Biases (b):
Purpose: An additional parameter associated with each neuron (except input
neurons). It allows the activation function to be shifted, providing more flexibility for
the model to fit the data.
Role: Acts like an intercept term in linear regression, allowing the neuron to activate
even if all inputs are zero, or to suppress activation if inputs are high but the bias is
very negative.
6. Activation Functions (f ):
Purpose: Introduce non-linearity into the network, enabling it to learn and model
complex, non-linear relationships in data. Without non-linear activations, any multi-
layer FNN would behave like a single-layer linear model.
Placement: Applied after the weighted sum in hidden and output layers.

Neat Sketch of a Feedforward Neural Network:

Input Layer              Hidden Layer 1            Hidden Layer 2            Output Layer
(Features: x1, x2, x3)   (Neurons: h11, h12)       (Neurons: h21, h22)       (Output: y_hat)

x1 ----(w11)----\
                 >---> O (h11) ----(w_out1)----> O (h21) ----(w_final_1)----\
x2 ----(w21)----X                      X                                     >---> O (y_hat)
                 >---> O (h12) ----(w_out2)----> O (h22) ----(w_final_2)----/
x3 -------------/
                       ^                         ^                                 ^
                       |                         |                                 |
                  Bias (b_h1)               Bias (b_h2)                       Bias (b_out)

(Every neuron in one layer connects to every neuron in the next; only some weights are labeled.)

Explanation of Sketch:

Circles represent neurons.

Arrows represent connections, each with a weight (W) associated with it.
x1, x2, x3 are input features.
h11, h12 are neurons in the first hidden layer.
h21, h22 are neurons in the second hidden layer.
y_hat is the predicted output.
Each neuron (except input neurons) has an implicit bias term (e.g., b_h1 , b_h2 ,
b_out ) that is added to its weighted sum.
After the weighted sum and bias, an activation function is applied at each neuron.

Example: Image Classification (e.g., MNIST digit recognition)

Let's consider classifying handwritten digits (0-9) from grayscale images (e.g., 28x28 pixels).

1. Input Layer:
An image of 28x28 pixels can be flattened into a vector of 28 × 28 = 784 pixel values.
The input layer would have 784 neurons, each representing the intensity of one
pixel.
2. Hidden Layers:
We might have one or two hidden layers, say, the first with 128 neurons and the
second with 64 neurons.
Each neuron in the first hidden layer receives input from all 784 input neurons. It
calculates a weighted sum of these pixel values, adds its bias, and applies a ReLU
activation function.
The outputs of the first hidden layer serve as inputs to the second hidden layer, and
so on. These layers learn to extract increasingly complex features, like edges,
curves, and parts of digits.
3. Output Layer:
Since we are classifying 10 digits (0-9), the output layer would have 10 neurons.
Each output neuron corresponds to one digit class.
A Softmax activation function is typically applied to the output layer. Softmax converts
the raw outputs into probabilities, where the sum of probabilities for all 10 classes
equals 1.
The neuron with the highest probability indicates the predicted digit.

Training Process (Simplified):

1. Forward Pass: An image (e.g., a '5') is fed into the input layer. The pixel values
propagate forward through the hidden layers, undergoing weighted sums and activation
functions, until a set of 10 probability scores is produced by the output layer (e.g., [0.1,
0.05, ..., 0.8, ..., 0.01] where 0.8 might be for class '5').
2. Loss Calculation: The network's predicted probabilities are compared to the actual label
(e.g., a "one-hot" encoded vector [0,0,0,0,0,1,0,0,0,0] for '5') using a loss function (e.g.,
Cross-Entropy Loss). The loss quantifies how "wrong" the prediction was.
3. Backward Pass (Backpropagation): The calculated loss is then propagated backward
through the network. This process determines how much each weight and bias
contributed to the error.
4. Weight Update (Gradient Descent): Using the calculated gradients, an optimizer (like
Gradient Descent) adjusts the weights and biases slightly to reduce the loss.
5. Iteration: Steps 1-4 are repeated for many images, usually over several full passes
through the training set (epochs), until the network learns to accurately classify digits.
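
A short sketch can tie together the layer sizes, Softmax, and Cross-Entropy Loss from this example. The layer sizes 784-128-64-10 come from the text above; the raw output scores (`logits`) are made-up values standing in for a real network's output.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross-entropy against a one-hot label: -log(probability of the true class)."""
    return -math.log(probs[true_class])

# Trainable parameters: one weight per connection plus one bias per neuron.
layer_sizes = [784, 128, 64, 10]
n_params = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print("parameters:", n_params)

# Hypothetical raw outputs for an image of a '5': class 5 has the largest score.
logits = [0.0] * 10
logits[5] = 3.0
probs = softmax(logits)
predicted = probs.index(max(probs))
loss = cross_entropy(probs, true_class=5)
print("predicted digit:", predicted, "| loss:", round(loss, 4))
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1.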

Topic 4: Gradient Descent & Backpropagation

Questions Covered:

May-Jun 2024 Q2 a) Explain Gradient descent algorithm that is used to train the
neural networks.[6]
May-Jun 2024 Q2 c) How the backpropagation algorithm works? Explain with
suitable example.[8]
Answer:

Explain Gradient Descent Algorithm that is used to train the Neural Networks. (May-
Jun 2024 Q2 a)

Gradient Descent is a widely used iterative optimization algorithm for training neural
networks. Its primary goal is to find the set of weights and biases for the network that
minimize a given loss function (or cost function). The loss function quantifies the error
between the network's predictions and the actual target values.

Core Concept:

Imagine the loss function as a landscape with hills and valleys, where the lowest point (a
valley) represents the minimum loss. Gradient Descent works by iteratively "descending" this
landscape in the steepest possible direction until it reaches a local (or global) minimum. The
"steepest direction" is given by the negative of the gradient of the loss function.

Steps of Gradient Descent:

1. Initialization: Start by initializing the network's weights and biases randomly (or with
small values).
2. Calculate Loss: For a given set of input data, perform a forward pass through the
network to obtain predictions. Then, calculate the value of the loss function based on
these predictions and the actual target values.
3. Calculate Gradients: Compute the gradient of the loss function with respect to each
weight and bias in the network. The gradient indicates the direction of the steepest
ascent (maximum increase) of the loss function. We want to move in the opposite
direction.
Mathematically, for a weight $w_j$ and a loss function $J(\theta)$, where $\theta$ represents all
parameters (weights and biases), we calculate $\frac{\partial J(\theta)}{\partial w_j}$.
4. Update Parameters: Adjust each weight and bias by moving a small step in the direction
opposite to its gradient. The size of this step is controlled by a parameter called the
learning rate (α).
The update rule for a parameter $\theta_j$ is:

$$\theta_j^{new} = \theta_j^{old} - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta_j}$$


A small learning rate leads to slow but potentially more stable convergence. A large
learning rate can cause oscillations or divergence.
5. Iteration: Repeat steps 2-4 for a specified number of iterations (often counted in
epochs, i.e., full passes over the training data) or until the change in the loss function
becomes negligible (convergence).
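
The steps above can be sketched on a one-parameter problem. The loss $J(\theta) = (\theta - 3)^2$, the starting point, and the learning rates are assumptions chosen so the behavior is easy to follow; a real network has many parameters and obtains its gradients from backpropagation.

```python
def loss(theta):
    """Toy loss with its minimum at theta = 3."""
    return (theta - 3.0) ** 2

def gradient(theta):
    """Derivative of the toy loss: dJ/dtheta = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)

def gradient_descent(theta, learning_rate, steps):
    for _ in range(steps):
        # Update rule: move against the gradient, scaled by the learning rate.
        theta = theta - learning_rate * gradient(theta)
    return theta

converged = gradient_descent(theta=0.0, learning_rate=0.1, steps=100)
diverged = gradient_descent(theta=0.0, learning_rate=1.1, steps=20)

print("alpha=0.1:", converged, "loss:", loss(converged))
print("alpha=1.1:", diverged, "loss:", loss(diverged))
```

With the small learning rate the parameter settles near 3; with the large one every step overshoots the minimum by more than the previous error, so the iterate moves further away.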

Analogy:
Think of yourself blindfolded on a mountainous terrain, trying to reach the lowest point (the
valley). You can't see the whole landscape. What you can do is feel the slope under your
feet. To go downhill fastest, you'd take a step in the direction where the slope is steepest
downwards. That's exactly what gradient descent does: it calculates the direction of steepest
ascent (gradient) and takes a step in the opposite direction.

Types of Gradient Descent:

While the core concept is the same, how much data is used to calculate the gradient in each
step leads to different variants:

Batch Gradient Descent (BGD): Calculates the gradient using the entire training
dataset in each iteration. This provides a very accurate gradient but can be
computationally very expensive and slow for big data.
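Stochastic Gradient Descent (SGD): Calculates the gradient and updates parameters
for one training example at a time. Each update is much cheaper than a full-batch
update, and the noise in the updates can help escape shallow local minima, but the
loss fluctuates from step to step because single-example gradients have high variance.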
Mini-Batch Gradient Descent: The most common approach. It calculates the gradient
and updates parameters using a small "mini-batch" of training examples (e.g., 32, 64,
128 samples) at a time. This offers a good balance between the computational efficiency
of SGD and the gradient stability of BGD.
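
The three variants differ only in how many examples feed each update, which the following sketch exposes as a single `batch_size` argument. The task (fitting y = 2x + 1 with a one-weight linear model), the noiseless synthetic data, and the hyperparameters are all assumptions made for illustration.

```python
import random

def train(data, batch_size, lr=0.5, epochs=500, seed=0):
    """Fit y = w*x + b by minimizing 0.5 * mean squared error.

    batch_size == len(data) -> Batch Gradient Descent
    batch_size == 1         -> Stochastic Gradient Descent
    anything in between     -> Mini-Batch Gradient Descent
    """
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)                      # new example order every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the loss averaged over the current batch only.
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Synthetic, noiseless data on the line y = 2x + 1.
dataset = [(i / 20, 2.0 * (i / 20) + 1.0) for i in range(20)]

for name, size in (("batch", 20), ("stochastic", 1), ("mini-batch", 4)):
    w, b = train(dataset, batch_size=size)
    print(f"{name:>10}: w={w:.4f}, b={b:.4f}")
```

On this toy problem all three settings recover roughly the same line; the difference is the number of updates per epoch (1, 20, and 5 respectively) and how noisy each update is.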

How the Backpropagation Algorithm Works? Explain with suitable example. (May-Jun
2024 Q2 c)

Backpropagation is the foundational algorithm for efficiently training multi-layer neural
networks. It's an algorithm for computing the gradient of the loss function with respect to the
weights and biases of a network. This gradient information is then used by optimization
algorithms like Gradient Descent to update the parameters.

Core Idea: The Chain Rule in Reverse

Backpropagation essentially applies the chain rule of calculus to compute gradients layer
by layer, starting from the output layer and moving backward towards the input layer. It
determines how much each weight and bias contributed to the overall error (loss) of the
network.

Steps of Backpropagation:

Let's consider a simple FNN with one hidden layer for illustration:

Input Layer (x) -> Hidden Layer (h) -> Output Layer (y_hat)
1. Forward Pass:
Input data (x) is fed into the network.
Activations are computed layer by layer, from input to output.
For the hidden layer: $net_h = x \cdot W_h + b_h$, then $h = f_h(net_h)$
For the output layer: $net_y = h \cdot W_y + b_y$, then $y_{hat} = f_y(net_y)$
Finally, the loss ($L$) is calculated by comparing $y_{hat}$ with the true target $y$ (e.g., using
Mean Squared Error or Cross-Entropy).
2. Backward Pass (Error Propagation & Gradient Calculation):
Calculate Output Layer Error ($\delta_y$):
First, compute the error derivative with respect to the output layer's activation
and the derivative of the output activation function.
For example, if using MSE loss ($L = \frac{1}{2}(y - y_{hat})^2$) and linear output activation
($y_{hat} = net_y$), then $\frac{\partial L}{\partial y_{hat}} = -(y - y_{hat})$.
The "error signal" for the output layer is typically: $\delta_y = (y_{hat} - y) \cdot f_y'(net_y)$
(for MSE and a general activation function, $f_y'$ is the derivative of the activation
function).
Calculate Gradients for Output Layer Weights ($W_y$) and Biases ($b_y$):
The gradient of the loss with respect to a weight connecting a hidden neuron $h_k$
to an output neuron $y_j$ is:

$$\frac{\partial L}{\partial W_{y,kj}} = \delta_{y_j} \cdot h_k$$

The gradient for the bias of an output neuron $y_j$ is:

$$\frac{\partial L}{\partial b_{y_j}} = \delta_{y_j}$$

Propagate Error to Hidden Layer ($\delta_h$):
The error from the output layer is propagated backward to the hidden layer.
Each hidden neuron receives an error signal that is a weighted sum of the error
signals from the output neurons it connects to.

$$\delta_{h_k} = \left( \sum_j \delta_{y_j} \cdot W_{y,kj} \right) \cdot f_h'(net_{h_k})$$

(where $f_h'$ is the derivative of the hidden layer's activation function).

Calculate Gradients for Hidden Layer Weights ($W_h$) and Biases ($b_h$):
Similar to the output layer, using the propagated error signal $\delta_h$, where
$W_{h,ik}$ is the weight from input $x_i$ to hidden neuron $h_k$:

$$\frac{\partial L}{\partial W_{h,ik}} = \delta_{h_k} \cdot x_i$$

$$\frac{\partial L}{\partial b_{h_k}} = \delta_{h_k}$$


3. Update Weights and Biases:

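Once all gradients ($\frac{\partial L}{\partial W}$ and $\frac{\partial L}{\partial b}$) are computed, the optimizer (e.g., Gradient
Descent) updates the parameters:

$$W^{new} = W^{old} - \alpha \cdot \frac{\partial L}{\partial W}$$

$$b^{new} = b^{old} - \alpha \cdot \frac{\partial L}{\partial b}$$
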

This process is repeated over many passes through the training data (epochs) until the
network's performance is satisfactory.
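
The forward pass, backward pass, and update can be followed end to end on a very small network. This sketch assumes 2 inputs, 2 tanh hidden neurons, 1 linear output, and the MSE loss $\frac{1}{2}(y - y_{hat})^2$; all numeric values are made up. A finite-difference check is included as a sanity test of one analytic gradient.

```python
import math

def forward(x, Wh, bh, Wy, by):
    """Wh[i][k]: weight from input i to hidden k. Wy[k]: hidden k to the output."""
    net_h = [sum(x[i] * Wh[i][k] for i in range(len(x))) + bh[k] for k in range(len(bh))]
    h = [math.tanh(n) for n in net_h]
    y_hat = sum(h[k] * Wy[k] for k in range(len(h))) + by   # linear output
    return h, y_hat

def loss(y, y_hat):
    return 0.5 * (y - y_hat) ** 2

def backward(x, y, Wh, bh, Wy, by):
    h, y_hat = forward(x, Wh, bh, Wy, by)
    delta_y = (y_hat - y) * 1.0                  # linear output: derivative is 1
    dWy = [delta_y * h[k] for k in range(len(h))]
    dby = delta_y
    # Propagate the error backward; tanh'(net) = 1 - tanh(net)^2 = 1 - h^2.
    delta_h = [delta_y * Wy[k] * (1.0 - h[k] ** 2) for k in range(len(h))]
    dWh = [[delta_h[k] * x[i] for k in range(len(h))] for i in range(len(x))]
    dbh = delta_h
    return dWh, dbh, dWy, dby

# Hypothetical sample and parameters.
x, y = [0.5, -1.0], 1.0
Wh = [[0.1, -0.2], [0.4, 0.3]]
bh = [0.0, 0.1]
Wy = [0.5, -0.3]
by = 0.0

dWh, dbh, dWy, dby = backward(x, y, Wh, bh, Wy, by)

def numeric_grad_Wh(i, k, eps=1e-6):
    """Central finite difference for dL/dWh[i][k]."""
    Wp = [row[:] for row in Wh]
    Wm = [row[:] for row in Wh]
    Wp[i][k] += eps
    Wm[i][k] -= eps
    return (loss(y, forward(x, Wp, bh, Wy, by)[1])
            - loss(y, forward(x, Wm, bh, Wy, by)[1])) / (2 * eps)

print("analytic:", dWh[0][1], "numeric:", numeric_grad_Wh(0, 1))

# One gradient descent update with learning rate alpha = 0.1.
lr = 0.1
Wh_new = [[Wh[i][k] - lr * dWh[i][k] for k in range(2)] for i in range(2)]
bh_new = [bh[k] - lr * dbh[k] for k in range(2)]
Wy_new = [Wy[k] - lr * dWy[k] for k in range(2)]
by_new = by - lr * dby

loss_before = loss(y, forward(x, Wh, bh, Wy, by)[1])
loss_after = loss(y, forward(x, Wh_new, bh_new, Wy_new, by_new)[1])
print("loss before:", loss_before, "after:", loss_after)
```

The analytic and numeric gradients agree closely, and a single update already lowers the loss on this sample.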

Suitable Example (Conceptual):

Imagine a simple network trying to classify if an image contains a cat (output 1) or a dog
(output 0).

1. Forward Pass: You feed a cat image. The network, after calculating weighted sums and
activations through its layers, predicts "0.8" (meaning 80% confident it's a cat). The true
label is 1.
2. Loss Calculation: The loss function (e.g., binary cross-entropy) calculates a value
indicating the difference between 0.8 and 1.0. This error is now the target to minimize.
3. Backward Pass:
Output Layer: The backpropagation algorithm starts at the output layer. It asks:
"How much did this output neuron's weight/bias contribute to the error of 0.8 vs 1.0?"
It calculates the derivatives related to the output neuron's contribution.
Hidden Layer: It then propagates this "error signal" backward to the hidden layer.
For each hidden neuron, it asks: "Based on the error from the output layer, how much
did my weights and biases contribute to that error?" It uses the chain rule to
determine this. For instance, if a hidden neuron was crucial for detecting "ears" and
the cat image was misclassified, the error signal will be strong for the weights
connected to that "ear-detecting" neuron.
This "blame assignment" continues backward until the first hidden layer.
4. Weight Update: Gradient Descent uses these calculated blame signals (gradients) to
slightly adjust all the weights and biases in the network. For example, if the "ear-
detecting" neuron's weights were contributing to a wrong classification, they would be
adjusted to better recognize cat ears in the future.

This iterative process of forward pass (prediction), loss calculation, backward pass (gradient
calculation), and parameter update allows the neural network to learn from its mistakes and
improve its accuracy over time.
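
The numbers in the cat/dog example can be plugged into binary cross-entropy directly. The sketch below assumes the output neuron uses a Sigmoid activation (the example above does not name one), in which case the gradient of the loss with respect to the output neuron's net input simplifies to prediction minus label.

```python
import math

def binary_cross_entropy(label, prediction):
    """Loss for one example: -(y * log(p) + (1 - y) * log(1 - p))."""
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

label, prediction = 1, 0.8          # true class "cat", network output 0.8
loss_value = binary_cross_entropy(label, prediction)

# Assumed Sigmoid output: dL/d(net) = prediction - label.
output_error_signal = prediction - label

print("loss:", round(loss_value, 4))
print("error signal at output:", round(output_error_signal, 4))

# A more confident correct prediction gives a smaller loss.
print("loss at 0.99:", round(binary_cross_entropy(1, 0.99), 4))
```

The negative error signal says the net input of the output neuron should increase, which is the direction the weight update pushes it.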

Topic 5: Recurrent Neural Networks

Questions Covered:

May-Jun 2023 Q2 c) Explain recurrent neural networks with example. [8]

May-Jun 2024 Q1 c) What is Recurrent neural network? Explain in detail. [8]

Answer:

What is Recurrent Neural Network? Explain in detail / with example. (May-Jun 2023 Q2
c & May-Jun 2024 Q1 c)

Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to
process sequential data, unlike traditional Feedforward Neural Networks (FNNs) that
assume inputs are independent of each other. The "recurrent" aspect comes from the fact
that information from previous time steps is fed back into the network, allowing it to exhibit
temporal dynamic behavior and remember past information. This internal memory makes
RNNs particularly well-suited for tasks involving sequences, such as natural language
processing, speech recognition, and time series prediction.

Core Concept:

An RNN has a "memory" in the form of a hidden state that is updated at each time step. The
hidden state ($h_t$) at time t depends not only on the current input ($x_t$) but also on the hidden
state from the previous time step ($h_{t-1}$). This allows the network to maintain context from
previous elements in the sequence.

Architecture and Operation:

The basic RNN unit has two main components:

1. Current Input ($x_t$): The input at the current time step.
2. Previous Hidden State ($h_{t-1}$): The output of the hidden layer from the previous time
step. This serves as the network's "memory."

These two components are combined, typically multiplied by their respective weights,
summed, and then passed through an activation function (like Tanh or ReLU) to produce the
current hidden state ($h_t$). The hidden state $h_t$ can then be used to calculate the output ($y_t$)
for the current time step.

The key feature is that the weights associated with the recurrent connections (connecting
$h_{t-1}$ to $h_t$) are shared across all time steps. This means the same set of parameters is
applied to different parts of the sequence, enabling the network to generalize across different
positions in the sequence.

Mathematical Representation of a Simple RNN Cell:

Hidden state at time t:

h t = f h (W hh h t−1 + W xh x t + b h )

Output at time t:

y t = f y (W hy h t + b y )

Where:
x_t: Input at time t
h_t: Hidden state at time t
y_t: Output at time t
W_{hh}: Weight matrix for the recurrent connection (hidden to hidden)
W_{xh}: Weight matrix for input to hidden
W_{hy}: Weight matrix for hidden to output
b_h, b_y: Bias vectors
f_h, f_y: Activation functions (e.g., Tanh for f_h, Softmax for f_y for classification)
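The two equations above can be traced by hand on a tiny example. The following is a minimal sketch, assuming Tanh for f_h and Softmax for f_y; the helper names (matvec, softmax, rnn_step) and the toy weight values are illustrative choices and do not come from the notes. It uses only the Python standard library so it can be run as-is.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One time step of a simple RNN cell; returns (h_t, y_t)."""
    recurrent = matvec(W_hh, h_prev)   # W_hh * h_{t-1}
    from_input = matvec(W_xh, x_t)     # W_xh * x_t
    h_t = [math.tanh(r + i + b) for r, i, b in zip(recurrent, from_input, b_h)]
    y_t = softmax([s + b for s, b in zip(matvec(W_hy, h_t), b_y)])
    return h_t, y_t

# Toy sizes (hypothetical): 2 input features, 3 hidden units, 2 output classes.
W_xh = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
W_hh = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, -0.2, 0.1]]
W_hy = [[0.3, -0.1, 0.2], [-0.2, 0.4, 0.1]]
b_h = [0.0, 0.0, 0.0]
b_y = [0.0, 0.0]

# First step: the previous hidden state is all zeros.
h_0, y_0 = rnn_step([1.0, 0.5], [0.0, 0.0, 0.0], W_xh, W_hh, W_hy, b_h, b_y)
print(h_0, y_0)
```

Because the previous hidden state is zero here, the first hidden unit reduces to tanh(0.5 * 1.0 - 0.2 * 0.5), which makes the role of W_xh easy to check by hand.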

Unrolled RNN Diagram:

To understand the flow over time, an RNN is often "unrolled" across time steps:

Input:             x_0           x_1           x_2           x_3
                    |             |             |             |
                    v             v             v             v
Hidden: h_(-1) -> [RNN Cell] -> [RNN Cell] -> [RNN Cell] -> [RNN Cell]
        (initial)   |  h_0        |  h_1        |  h_2        |  h_3
                    v             v             v             v
Output:            y_0           y_1           y_2           y_3

Each [RNN Cell] box represents the same set of weights (W_{hh}, W_{xh}, W_{hy}, b_h, b_y) being
applied at different time steps.
h_(-1) is the initial hidden state (often initialized to zeros).
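The unrolled view corresponds to a plain loop that calls the same cell once per time step. The sketch below assumes a Tanh hidden activation and a zero initial state; the function names (rnn_cell, unroll) and the weight values are hypothetical. It also shows that one set of weights can process sequences of different lengths.

```python
import math

def rnn_cell(x_t, h_prev, W_xh, W_hh, b_h):
    """Compute h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h) for one time step."""
    h_t = []
    for row_x, row_h, b in zip(W_xh, W_hh, b_h):
        net = b
        net += sum(w * x for w, x in zip(row_x, x_t))
        net += sum(w * h for w, h in zip(row_h, h_prev))
        h_t.append(math.tanh(net))
    return h_t

def unroll(inputs, W_xh, W_hh, b_h):
    """Apply the same cell (same weights) at every step; return all hidden states."""
    h = [0.0] * len(b_h)          # h_(-1): initial hidden state of zeros
    states = []
    for x_t in inputs:            # one loop iteration per [RNN Cell] box
        h = rnn_cell(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Toy parameters (hypothetical): 1 input feature, 2 hidden units.
W_xh = [[0.6], [-0.4]]
W_hh = [[0.2, 0.1], [-0.1, 0.3]]
b_h = [0.0, 0.1]

short_states = unroll([[1.0], [0.5]], W_xh, W_hh, b_h)
long_states = unroll([[1.0], [0.5], [-0.3], [0.8], [0.0]], W_xh, W_hh, b_h)
print(len(short_states), len(long_states))
```

The two sequences share their first two inputs, so their first two hidden states are identical: the state at step t depends only on the inputs seen up to step t.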

Key Features and Advantages:

1. Memory: Can process sequences of arbitrary length by maintaining a hidden state that
implicitly captures information about prior elements in the sequence.
2. Weight Sharing: Uses the same weights across different time steps, which reduces the
number of parameters and makes the model more efficient for sequence data.
3. Variable Length Inputs/Outputs: Can handle input sequences of varying lengths and
produce output sequences of varying lengths (e.g., many-to-one, one-to-many, many-to-
many).
Challenges with Vanilla RNNs:

Vanishing Gradient Problem: During backpropagation through time (BPTT), gradients
can become extremely small, making it difficult for the network to learn long-term
dependencies (i.e., information from early parts of a long sequence might not influence
later parts).
Exploding Gradient Problem: Conversely, gradients can become excessively large,
leading to unstable training.
Short-Term Memory: Due to vanishing gradients, simple RNNs struggle to remember
information for more than a few time steps.
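Both gradient problems can be seen in a one-unit RNN, where the gradient of the final hidden state with respect to the initial one is a product of one factor per time step. The sketch below is an illustration under stated assumptions: a scalar recurrence h_t = f(w * h_{t-1} + x), a hypothetical function name gradient_through_time, and an identity activation for the exploding case so that each factor is exactly w.

```python
import math

def gradient_through_time(w, steps, x=0.0, h0=0.5, activation="tanh"):
    """Return d h_T / d h_0 for the scalar recurrence h_t = f(w * h_{t-1} + x)."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        net = w * h + x
        if activation == "tanh":
            h = math.tanh(net)
            local = w * (1.0 - h * h)   # chain rule: tanh'(net) * w
        else:
            h = net                     # identity activation (linear recurrence)
            local = w
        grad *= local                   # BPTT multiplies one factor per step
    return grad

# With |w| < 1 and tanh, every factor is below 1, so the product shrinks fast.
vanishing = gradient_through_time(w=0.5, steps=50)

# With |w| > 1 and no squashing, every factor is above 1, so the product blows up.
exploding = gradient_through_time(w=1.5, steps=50, activation="linear")

print(vanishing, exploding)
```

In a full RNN the scalar w is replaced by the matrix W_{hh}, but the same repeated multiplication is what makes early time steps hard to learn from.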

Advanced RNN Architectures (to mitigate challenges):

Long Short-Term Memory (LSTM) Networks: Introduce "gates" (input, forget, output
gates) and a "cell state" to control the flow of information, enabling them to learn long-
term dependencies much more effectively.
Gated Recurrent Units (GRUs): A simpler variant of LSTMs with fewer gates, offering a
good balance between performance and computational efficiency.
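To make the role of the gates and the cell state concrete, here is a minimal sketch of one LSTM step. It assumes the commonly used LSTM formulation (sigmoid gates, a Tanh candidate, and an additive cell-state update), which the notes above do not spell out; the parameter names (W_f, W_i, W_o, W_g and their biases) and the toy values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(W, b, h_prev, x_t, squash):
    """Apply squash(W [h_prev; x_t] + b) row by row."""
    concat = h_prev + x_t   # list concatenation of previous state and input
    return [squash(sum(w * v for w, v in zip(row, concat)) + b_k)
            for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    f = gate(p["W_f"], p["b_f"], h_prev, x_t, sigmoid)    # forget gate
    i = gate(p["W_i"], p["b_i"], h_prev, x_t, sigmoid)    # input gate
    o = gate(p["W_o"], p["b_o"], h_prev, x_t, sigmoid)    # output gate
    g = gate(p["W_g"], p["b_g"], h_prev, x_t, math.tanh)  # candidate values
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

# Toy setup (hypothetical): 1 hidden unit, 1 input feature. The biases push the
# forget gate towards 1 and the input gate towards 0, so the cell keeps its value.
params = {
    "W_f": [[0.1, 0.1]], "b_f": [20.0],
    "W_i": [[0.1, 0.1]], "b_i": [-20.0],
    "W_o": [[0.1, 0.1]], "b_o": [0.0],
    "W_g": [[0.1, 0.1]], "b_g": [0.0],
}

h, c = [0.0], [0.8]
for x in [0.3, -0.7, 0.5, 0.9, -0.2]:
    h, c = lstm_step([x], h, c, params)
print(c)
```

With the forget gate held open and the input gate held shut, the cell state stays close to its starting value across all five steps, which is the mechanism that lets an LSTM carry information over long spans.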

Example: Sentiment Analysis of a Sentence

Problem: Classify a sentence as positive, negative, or neutral sentiment.

How an RNN (or LSTM/GRU) works:

1. Input Sequence: The sentence is tokenized into a sequence of words (or word
embeddings). For example: "This movie was absolutely amazing!"
x_0: "This"
x_1: "movie"
x_2: "was"
x_3: "absolutely"
x_4: "amazing!"
2. Processing (Many-to-One):
At each time step t, the RNN cell takes the current word embedding (x_t) and the
hidden state from the previous word (h_{t-1}).
It updates its internal hidden state (h_t).
The hidden state for "This" (h_0) captures initial context.
h_1 (for "movie") combines "movie" and h_0.
This continues until the end of the sentence. The final hidden state (h_4 for
"amazing!") will ideally encode the overall sentiment of the entire sentence,
remembering the cumulative impact of words like "absolutely" and "amazing!".
3. Output:
After processing the entire sequence, the final hidden state (h_4) is fed into a
classification layer (e.g., a softmax layer) that predicts the sentiment: "Positive",
"Negative", or "Neutral".
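The many-to-one flow described in the three steps above can be sketched as follows. Everything numeric here is made up for illustration: the two-dimensional embeddings, the weights, and the helper names (affine, classify) are assumptions, and the weights are untrained, so the predicted label carries no meaning. The point is only the data flow: one hidden-state update per word, then a single softmax over the final hidden state.

```python
import math

LABELS = ["Positive", "Negative", "Neutral"]

# Hypothetical 2-dimensional word embeddings for the example sentence.
EMBEDDINGS = {
    "This": [0.1, 0.0],
    "movie": [0.0, 0.2],
    "was": [0.0, 0.0],
    "absolutely": [0.5, 0.1],
    "amazing!": [0.9, 0.3],
}

def affine(W, v, b):
    """Return W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def classify(tokens, W_xh, W_hh, b_h, W_hy, b_y):
    """Run the RNN over all tokens, then classify from the final hidden state."""
    h = [0.0] * len(b_h)
    no_bias = [0.0] * len(b_h)
    for token in tokens:
        x_t = EMBEDDINGS[token]
        from_input = affine(W_xh, x_t, b_h)
        recurrent = affine(W_hh, h, no_bias)
        h = [math.tanh(a + r) for a, r in zip(from_input, recurrent)]
    scores = affine(W_hy, h, b_y)          # only the last hidden state is used
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs

# Untrained toy weights (hypothetical): 2 hidden units, 3 output classes.
W_xh = [[0.8, -0.3], [0.2, 0.5]]
W_hh = [[0.4, 0.1], [-0.2, 0.3]]
b_h = [0.0, 0.0]
W_hy = [[1.0, 0.5], [-1.0, 0.2], [0.1, -0.4]]
b_y = [0.0, 0.0, 0.0]

tokens = ["This", "movie", "was", "absolutely", "amazing!"]
label, probs = classify(tokens, W_xh, W_hh, b_h, W_hy, b_y)
print(label, probs)
```

In practice the embeddings and all weight matrices would be learned from labeled sentences with backpropagation through time.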

[Figure: RNN for Sentiment Analysis (Many-to-One), with word embeddings as inputs]

Other examples include:

Speech Recognition: Input is an audio sequence, output is a sequence of words.

Machine Translation: Input is a sentence in one language, output is a sentence in
another (e.g., Encoder-Decoder RNNs).
Time Series Prediction: Input is a sequence of past stock prices, output is a predicted
future stock price.

