21CS743 Model Question Paper Solution
Note: 01. Answer any FIVE full questions, choosing at least ONE question from each MODULE.
Module-1
Q.01 a) Explain the historical trends in deep learning. [CO1, 10 Marks]
1940s–1950s: Foundations
1. McCulloch-Pitts Model (1943): Simplified artificial neuron model.
2. Hebbian Learning (1949): Proposed learning mechanism based on neuron
co-activation.
1960s–1970s: Perceptrons and AI Winter
1. Perceptron (1958): Early neural network for linear classification.
2. Limitations: Minsky and Papert (1969) proved single-layer perceptrons
couldn't solve XOR problems.
3. AI Winter: Funding and interest declined due to perceived limitations.
Machine Learning
Machine Learning is a branch of artificial intelligence that allows systems to learn
and improve from data without explicit programming. It focuses on creating
algorithms to identify patterns and make predictions or decisions.
OR
Q.02 a) Explain in detail the supervised learning approach with a suitable example. [CO1, 10 Marks]
Supervised Learning
• Trains a model using labeled data where each input corresponds to a known
output.
• Aims to learn the relationship between inputs and outputs to make
predictions for new data.
• Consists of two phases: training (model learns patterns) and testing (model
evaluates its performance).
• Predicts outcomes by minimizing the difference between predicted and
actual outputs using a loss function.
• Divided into regression (predicts continuous outputs) and classification
(predicts categorical outputs).
Common Algorithms
• Linear Regression for continuous predictions.
• Logistic Regression for binary classification.
• Decision Trees for data splitting based on features.
• Support Vector Machines for separating classes with a hyperplane.
• Neural Networks for handling complex, non-linear relationships.
Advantages
• Produces accurate results when trained on quality data.
• Easy to understand and implement for straightforward tasks.
• Widely applicable in areas like fraud detection, spam filtering, and predictive
maintenance.
Limitations
• Requires a large and accurately labeled dataset.
• May overfit, leading to poor performance on unseen data.
• Labeling data can be time-consuming and resource-intensive.
Applications
• Used in healthcare for disease prediction and treatment planning.
• Finance applications include credit scoring and fraud detection.
• Marketing uses include customer segmentation and churn prediction.
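The two-phase workflow described above (training, then testing) can be illustrated with a short scikit-learn sketch; the synthetic dataset and the choice of logistic regression are assumptions for demonstration, not part of the original answer.

```python
# Illustrative supervised learning workflow: labeled data, a training phase,
# and a testing phase on unseen data. Dataset and model are assumed examples.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Labeled data: each input X[i] corresponds to a known output y[i]
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training phase: the model learns the input-output relationship
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Testing phase: performance is evaluated on data the model has not seen
y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```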
Regularization
• Regularization is a technique used in machine learning to prevent overfitting
by adding a penalty to the loss function.
• The penalty discourages the model from fitting the noise in the training data,
encouraging simpler models with better generalization.
Advantages of Regularization
• Reduces overfitting while retaining model accuracy.
• Encourages sparsity in features (L1) and prevents large weights (L2).
• Enhances model robustness and generalization capabilities.
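A minimal sketch contrasting L1 and L2 regularization on a linear model, assuming scikit-learn's Lasso and Ridge estimators and illustrative penalty strengths:

```python
# L1 (Lasso) encourages sparsity; L2 (Ridge) prevents large weights.
# The data and alpha values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
# Only the first three features carry signal; the rest are noise.
y = X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2] + 0.1 * rng.normal(size=100)

l1 = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: drives many weights to exactly zero
l2 = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks weights toward zero

print("L1 nonzero weights:", np.sum(l1.coef_ != 0))
print("L2 largest |weight|:", np.max(np.abs(l2.coef_)))
```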
OR
Backpropagation
1. Purpose
o Backpropagation is an algorithm used to train neural networks by
updating weights and biases to minimize the loss function.
2. Process Overview
o Involves two main steps: forward propagation and backward
propagation.
o Forward propagation computes the output of the network and
calculates the loss.
o Backward propagation adjusts weights and biases using the gradient
of the loss function.
3. Steps in Backpropagation
o Forward Propagation:
▪ Input data passes through the network layer by layer.
▪ Weighted sums and activation functions are applied to
compute the output.
▪ Loss is calculated using a predefined loss function.
o Backward Propagation:
▪ Gradients of the loss with respect to the output layer
parameters are computed.
▪ Gradients are propagated backward through the network
using the chain rule to compute gradients for each layer.
▪ These gradients indicate how weights and biases in each layer
should be updated.
4. Weight and Bias Updates
o Parameters are updated using gradient descent or its variants:
▪ New Weight = Old Weight - Learning Rate × Gradient.
o The learning rate determines the step size during updates.
5. Key Components
o Loss Function: Measures the difference between predicted and
actual outputs (e.g., Mean Squared Error or Cross-Entropy Loss).
o Activation Function: Introduces non-linearity, enabling the network
to learn complex patterns. Examples include ReLU, Sigmoid, and
Tanh.
6. Training Iterations
o The algorithm repeats forward and backward propagation for
multiple epochs until the loss converges or reaches a predefined
threshold.
7. Advantages
o Efficiently trains deep networks by distributing the error signal to all
layers.
o Can handle large-scale data with the help of optimization techniques.
8. Limitations
o Computationally expensive for large networks.
o Sensitive to vanishing or exploding gradients, especially in very deep
networks.
9. Applications
o Widely used in training neural networks for tasks like image
recognition, natural language processing, and predictive modeling.
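A minimal NumPy sketch of one training step (forward propagation, backward propagation via the chain rule, and a gradient-descent update) for a single-hidden-layer network with sigmoid activations and mean squared error; the layer sizes, data, and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))          # 8 samples, 4 input features
y = rng.integers(0, 2, size=(8, 1))  # binary targets

W1, b1 = rng.normal(size=(4, 5)), np.zeros((1, 5))   # input -> hidden
W2, b2 = rng.normal(size=(5, 1)), np.zeros((1, 1))   # hidden -> output
lr = 0.1                                             # learning rate (step size)

# Forward propagation: weighted sums and activations, then the loss
h = sigmoid(X @ W1 + b1)
y_hat = sigmoid(h @ W2 + b2)
loss = np.mean((y_hat - y) ** 2)

# Backward propagation: chain rule from the output layer back to the input
d_yhat = 2 * (y_hat - y) / len(X)              # dLoss/dy_hat
d_z2 = d_yhat * y_hat * (1 - y_hat)            # through the output sigmoid
dW2, db2 = h.T @ d_z2, d_z2.sum(axis=0, keepdims=True)
d_h = d_z2 @ W2.T
d_z1 = d_h * h * (1 - h)                       # through the hidden sigmoid
dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0, keepdims=True)

# Weight and bias updates: new = old - learning rate x gradient
W1, b1 = W1 - lr * dW1, b1 - lr * db1
W2, b2 = W2 - lr * dW2, b2 - lr * db2
print("loss after this step's forward pass:", loss)
```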
Module-3
• The goal of Empirical Risk Minimization (ERM) is to find the hypothesis (model) that minimizes the empirical risk, which is an approximation of the true risk (the expected loss over the entire data distribution).
• A loss function measures how well the model's predictions match the true
values. Common examples include Mean Squared Error (MSE) for
regression and Cross-Entropy Loss for classification problems.
• While ERM minimizes the error on the training data, it does not directly
ensure good performance on unseen data, which could lead to overfitting.
• ERM focuses only on the training data, which can lead to overfitting if the
model is too complex or underfitting if the model is too simple.
• Regularization methods like L1 or L2 regularization can be used alongside
ERM to prevent overfitting by adding penalties for overly complex models.
• The true risk (expected risk) is the average loss over the entire distribution
of data, while ERM approximates it with training data, but may not always
align with it.
• ERM is widely used in supervised learning models, including linear
regression, decision trees, and neural networks.
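As an illustration of the definition above, the empirical risk is the average loss of a hypothesis over the training sample; the following NumPy sketch assumes a squared-error loss and a hypothetical linear hypothesis:

```python
import numpy as np

def empirical_risk(hypothesis, X_train, y_train):
    """Average loss over the n training examples (an approximation of the true risk)."""
    predictions = hypothesis(X_train)
    return np.mean((predictions - y_train) ** 2)

# Hypothetical linear hypothesis h(x) = x @ w, used only for demonstration.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
w = np.array([0.9, -1.8, 0.4])

print("Empirical risk:", empirical_risk(lambda X: X @ w, X_train, y_train))
```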
OR
5. AdaGrad helps adjust the learning rates for different features, making it
especially useful for high-dimensional or sparse datasets, such as in natural
language processing or image recognition tasks.
6. One of the advantages of AdaGrad is that it eliminates the need for manual
learning rate decay since the algorithm adapts the learning rate based on the
parameters' updates over time.
7. However, a disadvantage of AdaGrad is that the learning rate tends to
decrease rapidly as the algorithm progresses, which can slow down
convergence in the later stages of training.
8. AdaGrad is particularly effective in domains with sparse data where certain
features appear less frequently than others, allowing the model to adjust the
learning rates accordingly for better optimization.
9. Compared to Stochastic Gradient Descent (SGD), AdaGrad adjusts the
learning rate for each parameter, allowing it to perform better when dealing
with datasets where feature frequencies vary widely.
10. While AdaGrad is a useful algorithm for sparse data, its rapid learning rate
decay can limit its efficiency in more complex, dense data scenarios.
Alternative algorithms like RMSprop and Adam are often preferred to
address this limitation.
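A minimal NumPy sketch of the AdaGrad update on a toy quadratic objective, showing the per-parameter accumulation of squared gradients and the resulting adaptive step sizes; the learning rate and epsilon values are illustrative assumptions:

```python
import numpy as np

theta = np.array([5.0, -3.0])        # parameters to optimize
accum = np.zeros_like(theta)         # running sum of squared gradients, per parameter
lr, eps = 0.5, 1e-8                  # assumed base learning rate and stability constant

for step in range(100):
    grad = 2 * theta                 # gradient of f(theta) = theta_0^2 + theta_1^2
    accum += grad ** 2               # accumulate squared gradients
    # Each parameter's effective learning rate shrinks as its accumulator grows
    theta -= lr * grad / (np.sqrt(accum) + eps)

print("theta after AdaGrad:", theta)  # approaches the minimum at [0, 0]
```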
Module-4
1. Input Layer:
o The input layer receives the image or data in the form of a multi-
dimensional array (e.g., height, width, and depth for color images).
The input data passes through the CNN for further processing.
2. Convolutional Layer:
o This layer performs the core operation of a CNN. It applies a set of
filters (kernels) to the input image, performing a convolution
operation. The filters slide over the image, computing dot products
between the filter and the region of the image it covers, extracting
features such as edges, textures, or patterns.
3. Activation Function:
o After the convolution operation, an activation function is applied,
typically the Rectified Linear Unit (ReLU). This function introduces
non-linearity, enabling the network to learn more complex patterns
and representations.
4. Pooling Layer:
o The pooling layer reduces the spatial dimensions (height and width)
of the feature maps while retaining important information. Common
types of pooling include Max Pooling (selects the maximum value in
the region) and Average Pooling (computes the average value).
Pooling helps reduce the computational complexity and prevent
overfitting.
5. Fully Connected Layer (Dense Layer):
o This layer connects every neuron in the previous layer to every
neuron in the current layer. It’s used for classification or regression
tasks. The fully connected layer outputs a final prediction or
classification, such as determining the class of the object in an image.
6. Normalization Layer:
o Normalization layers, like Batch Normalization, help to stabilize the
learning process by reducing internal covariate shift. They normalize
the input to each layer to have zero mean and unit variance, speeding
up training and improving performance.
7. Dropout Layer:
o Dropout is a regularization technique where random neurons are
"dropped" (set to zero) during training. This prevents overfitting by
ensuring that the network doesn’t rely too heavily on any single
neuron and helps it generalize better to new data.
8. Flatten Layer:
o The flatten layer converts the multi-dimensional output from the
convolutional and pooling layers into a 1D vector. This step is
necessary before passing the data into the fully connected layers, as
they require a 1D input.
9. Output Layer:
o The output layer generates the final prediction or classification result.
In classification tasks, it often uses a softmax activation function for
multi-class problems or a sigmoid function for binary classification.
The output layer size corresponds to the number of classes or
categories in the problem.
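A minimal Keras sketch wiring these layers together, assuming a 32x32 RGB input and 10 output classes; the filter counts and dropout rate are illustrative choices, not prescribed by the answer:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),               # input layer: height x width x depth
    layers.Conv2D(32, (3, 3), padding="same"),    # convolutional layer (32 filters/kernels)
    layers.BatchNormalization(),                  # normalization layer
    layers.Activation("relu"),                    # ReLU activation (non-linearity)
    layers.MaxPooling2D((2, 2)),                  # pooling layer (max pooling)
    layers.Flatten(),                             # flatten layer: multi-dim maps to 1D vector
    layers.Dense(64, activation="relu"),          # fully connected (dense) layer
    layers.Dropout(0.5),                          # dropout layer (regularization)
    layers.Dense(10, activation="softmax"),       # output layer: one unit per class
])
model.summary()
```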
OR
1. LeNet (LeNet-5):
o One of the first CNN architectures, designed for digit recognition
(e.g., MNIST).
o Composed of two convolutional layers, followed by pooling layers
and fully connected layers.
o Simple architecture suitable for small image datasets.
2. AlexNet:
o Introduced in 2012 and won the ImageNet challenge.
o Consists of five convolutional layers and three fully connected
layers.
o Uses ReLU activation, dropout, and data augmentation to improve
training efficiency.
3. VGGNet (VGG16/VGG19):
o Known for its deep architecture with 16 or 19 layers.
o Uses 3x3 convolution filters stacked on top of each other.
o Although deep, its uniform design makes it easy to understand and apply.
4. GoogLeNet (Inception):
o Uses inception modules, where multiple convolution filters of
different sizes are applied at each layer.
o Combines different levels of feature extraction, making the network
efficient.
o Popularized the "network in network" idea through its use of 1x1 convolutions.
5. ResNet (Residual Networks):
o Introduced residual learning to avoid vanishing gradient problems.
o Uses skip connections to pass output from one layer to a deeper
layer.
o ResNet allows for much deeper networks (e.g., ResNet-50, ResNet-101); a residual-block sketch follows this list.
6. DenseNet (Densely Connected Convolutional Networks):
o Each layer connects to every previous layer, enhancing feature
reuse.
o Improves gradient flow and reduces the vanishing gradient
problem.
o Requires fewer parameters compared to traditional CNNs.
7. MobileNet:
o Designed for mobile and embedded systems with limited
computational power.
o Uses depthwise separable convolutions to reduce computational
cost.
o Efficient for real-time mobile vision applications.
8. SqueezeNet:
o A compact CNN model designed for efficiency with fewer
parameters.
o Utilizes fire modules that combine 1x1 convolutions and 3x3
convolutions.
o Achieves competitive accuracy with a significantly smaller model
size.
9. U-Net:
o Primarily used for image segmentation tasks, especially in medical
image analysis.
o Features an encoder-decoder architecture that reduces and restores
spatial dimensions.
o Performs pixel-wise predictions to segment images.
10. EfficientNet:
o A family of models that balances depth, width, and resolution to improve accuracy while reducing parameters.
o Uses a compound scaling method to scale the network efficiently.
o Achieves high performance with fewer parameters compared to other models.
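To make the skip connection from the ResNet item concrete, here is a minimal Keras sketch of a residual block; the input shape and filter count are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                   # skip connection: keep the block's input
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)
    x = layers.Add()([x, shortcut])                # add the input back to the block's output
    return layers.Activation("relu")(x)

inputs = keras.Input(shape=(32, 32, 64))           # 64 channels assumed so shapes match
outputs = residual_block(inputs, filters=64)
model = keras.Model(inputs, outputs)
model.summary()
```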
Module-5
Q.09 a) Explain how the recurrent neural network (RNN) processes data sequences. [CO5, 10 Marks]
1. Recurrent Neural Networks (RNNs) are designed to process sequences of
data, enabling them to maintain information from previous time steps.
2. At each time step, an RNN processes the current input and the hidden state
from the previous time step.
3. The hidden state acts as the network's memory, capturing information from
earlier steps in the sequence.
4. The output of the previous time step is fed back into the network, allowing
it to influence future outputs.
5. RNNs are typically visualized by "unrolling" them across time steps, where
each time step corresponds to a layer of the network.
6. Training an RNN involves using Backpropagation Through Time
(BPTT), which computes gradients for each time step and updates the
weights through the entire sequence.
7. Some challenges faced by RNNs include the vanishing gradient problem,
where gradients become too small to learn long-term dependencies, and the
exploding gradient problem, where gradients grow too large and cause
instability.
8. To overcome these challenges, more advanced architectures like Long
Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) are
used, which introduce mechanisms to preserve important information and
avoid vanishing or exploding gradients.
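A minimal NumPy sketch of the recurrence described above: at each time step the current input and the previous hidden state are combined into a new hidden state, which serves as the network's memory; the sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5

W_x = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_size)

sequence = rng.normal(size=(seq_len, input_size))  # one sequence of 5 input vectors
h = np.zeros(hidden_size)                          # initial hidden state (empty memory)

for t, x_t in enumerate(sequence):
    # h_t = tanh(W_x . x_t + W_h . h_{t-1} + b): current input plus previous memory
    h = np.tanh(W_x @ x_t + W_h @ h + b)
    print(f"hidden state after step {t}:", np.round(h, 3))
```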
OR
4. Advantages of LSTMs:
o LSTMs are capable of learning long-range dependencies in
sequential data by maintaining and updating cell states over time.
o They effectively mitigate the vanishing gradient problem, making
them suitable for tasks with long sequences, like language translation
and speech recognition.
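A minimal PyTorch sketch showing an LSTM carrying a hidden state and a cell state across a sequence; the tensor shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Batch of 4 sequences, 20 time steps each, 8 features per step (assumed sizes)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 20, 8)

outputs, (h_n, c_n) = lstm(x)        # the cell state c_n carries long-range information
print(outputs.shape)                 # torch.Size([4, 20, 16]): output at every time step
print(h_n.shape, c_n.shape)          # final hidden and cell states: [1, 4, 16] each
```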