21CS743 Model Question Paper Solution
Note: 01. Answer any FIVE full questions, choosing at least ONE question from each MODULE.
Module-1
Q.01 a) Explain the historical trends in deep learning. [CO1, 10 Marks]
1940s–1950s: Foundations
1. McCulloch-Pitts Model (1943): Simplified artificial neuron model.
2. Hebbian Learning (1949): Proposed learning mechanism based on neuron
co-activation.
1960s–1970s: Perceptrons and AI Winter
1. Perceptron (1958): Early neural network for linear classification.
2. Limitations: Minsky and Papert (1969) proved single-layer perceptrons
couldn't solve XOR problems.
3. AI Winter: Funding and interest declined due to perceived limitations.
Machine Learning
Machine Learning is a branch of artificial intelligence that allows systems to learn
and improve from data without explicit programming. It focuses on creating
algorithms to identify patterns and make predictions or decisions.
OR
Q.02 a) Explain in detail the supervised learning approach with a suitable example. [CO1, 10 Marks]
Supervised Learning
• Trains a model using labeled data where each input corresponds to a known
output.
• Aims to learn the relationship between inputs and outputs to make
predictions for new data.
• Consists of two phases: training (model learns patterns) and testing (model
evaluates its performance).
• Predicts outcomes by minimizing the difference between predicted and
actual outputs using a loss function.
• Divided into regression (predicts continuous outputs) and classification
(predicts categorical outputs).
Common Algorithms
• Linear Regression for continuous predictions.
• Logistic Regression for binary classification.
• Decision Trees for data splitting based on features.
• Support Vector Machines for separating classes with a hyperplane.
• Neural Networks for handling complex, non-linear relationships.
Advantages
• Produces accurate results when trained on quality data.
• Easy to understand and implement for straightforward tasks.
• Widely applicable in areas like fraud detection, spam filtering, and predictive
maintenance.
Limitations
• Requires a large and accurately labeled dataset.
• May overfit, leading to poor performance on unseen data.
• Labeling data can be time-consuming and resource-intensive.
Applications
• Used in healthcare for disease prediction and treatment planning.
• Finance applications include credit scoring and fraud detection.
• Marketing uses include customer segmentation and churn prediction.
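The two-phase workflow described above (training, then testing) can be illustrated with a short scikit-learn sketch; the synthetic dataset and the choice of logistic regression are assumptions for demonstration, not part of the original answer.

```python
# Illustrative supervised learning workflow: labeled data, a training phase,
# and a testing phase on unseen data. Dataset and model are assumed examples.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Labeled data: each input X[i] corresponds to a known output y[i]
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training phase: the model learns the input-output relationship
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Testing phase: performance is evaluated on data the model has not seen
y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```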
Regularization
• Regularization is a technique used in machine learning to prevent overfitting
by adding a penalty to the loss function.
• The penalty discourages the model from fitting the noise in the training data,
encouraging simpler models with better generalization.
Advantages of Regularization
• Reduces overfitting while retaining model accuracy.
• Encourages sparsity in features (L1) and prevents large weights (L2).
• Enhances model robustness and generalization capabilities.
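A minimal sketch contrasting L1 and L2 regularization on a linear model, assuming scikit-learn's Lasso and Ridge estimators and illustrative penalty strengths:

```python
# L1 (Lasso) encourages sparsity; L2 (Ridge) prevents large weights.
# The data and alpha values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
# Only the first three features carry signal; the rest are noise.
y = X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2] + 0.1 * rng.normal(size=100)

l1 = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: drives many weights to exactly zero
l2 = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks weights toward zero

print("L1 nonzero weights:", np.sum(l1.coef_ != 0))
print("L2 largest |weight|:", np.max(np.abs(l2.coef_)))
```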
OR
Backpropagation
1. Purpose
o Backpropagation is an algorithm used to train neural networks by
updating weights and biases to minimize the loss function.
2. Process Overview
o Involves two main steps: forward propagation and backward
propagation.
o Forward propagation computes the output of the network and
calculates the loss.
o Backward propagation adjusts weights and biases using the gradient
of the loss function.
3. Steps in Backpropagation
o Forward Propagation:
▪ Input data passes through the network layer by layer.
▪ Weighted sums and activation functions are applied to
compute the output.
▪ Loss is calculated using a predefined loss function.
o Backward Propagation:
▪ Gradients of the loss with respect to the output layer
parameters are computed.
▪ Gradients are propagated backward through the network
using the chain rule to compute gradients for each layer.
▪ These gradients indicate how weights and biases in each layer
should be updated.
4. Weight and Bias Updates
o Parameters are updated using gradient descent or its variants:
▪ New Weight = Old Weight - Learning Rate × Gradient.
o The learning rate determines the step size during updates.
5. Key Components
o Loss Function: Measures the difference between predicted and
actual outputs (e.g., Mean Squared Error or Cross-Entropy Loss).
o Activation Function: Introduces non-linearity, enabling the network
to learn complex patterns. Examples include ReLU, Sigmoid, and
Tanh.
6. Training Iterations
o The algorithm repeats forward and backward propagation for
multiple epochs until the loss converges or reaches a predefined
threshold.
7. Advantages
o Efficiently trains deep networks by distributing the error signal to all
layers.
o Can handle large-scale data with the help of optimization techniques.
8. Limitations
o Computationally expensive for large networks.
o Sensitive to vanishing or exploding gradients, especially in very deep
networks.
9. Applications
o Widely used in training neural networks for tasks like image
recognition, natural language processing, and predictive modeling.
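A minimal NumPy sketch of one training step (forward propagation, backward propagation via the chain rule, and a gradient-descent update) for a single-hidden-layer network with sigmoid activations and mean squared error; the layer sizes, data, and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))          # 8 samples, 4 input features
y = rng.integers(0, 2, size=(8, 1))  # binary targets

W1, b1 = rng.normal(size=(4, 5)), np.zeros((1, 5))   # input -> hidden
W2, b2 = rng.normal(size=(5, 1)), np.zeros((1, 1))   # hidden -> output
lr = 0.1                                             # learning rate (step size)

# Forward propagation: weighted sums and activations, then the loss
h = sigmoid(X @ W1 + b1)
y_hat = sigmoid(h @ W2 + b2)
loss = np.mean((y_hat - y) ** 2)

# Backward propagation: chain rule from the output layer back to the input
d_yhat = 2 * (y_hat - y) / len(X)              # dLoss/dy_hat
d_z2 = d_yhat * y_hat * (1 - y_hat)            # through the output sigmoid
dW2, db2 = h.T @ d_z2, d_z2.sum(axis=0, keepdims=True)
d_h = d_z2 @ W2.T
d_z1 = d_h * h * (1 - h)                       # through the hidden sigmoid
dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0, keepdims=True)

# Weight and bias updates: new = old - learning rate x gradient
W1, b1 = W1 - lr * dW1, b1 - lr * db1
W2, b2 = W2 - lr * dW2, b2 - lr * db2
print("loss after this step's forward pass:", loss)
```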
Module-3
• The goal of Empirical Risk Minimization (ERM) is to find the hypothesis (model) that minimizes the empirical risk, which is an approximation of the true risk (the expected loss over the entire data distribution).
• A loss function measures how well the model's predictions match the true
values. Common examples include Mean Squared Error (MSE) for
regression and Cross-Entropy Loss for classification problems.
• While ERM minimizes the error on the training data, it does not directly
ensure good performance on unseen data, which could lead to overfitting.
• ERM focuses only on the training data, which can lead to overfitting if the
model is too complex or underfitting if the model is too simple.
• Regularization methods like L1 or L2 regularization can be used alongside
ERM to prevent overfitting by adding penalties for overly complex models.
• The true risk (expected risk) is the average loss over the entire distribution
of data, while ERM approximates it with training data, but may not always
align with it.
• ERM is widely used in supervised learning models, including linear
regression, decision trees, and neural networks.
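As an illustration of the definition above, the empirical risk is the average loss of a hypothesis over the training sample; the following NumPy sketch assumes a squared-error loss and a hypothetical linear hypothesis:

```python
import numpy as np

def empirical_risk(hypothesis, X_train, y_train):
    """Average loss over the n training examples (an approximation of the true risk)."""
    predictions = hypothesis(X_train)
    return np.mean((predictions - y_train) ** 2)

# Hypothetical linear hypothesis h(x) = x @ w, used only for demonstration.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
w = np.array([0.9, -1.8, 0.4])

print("Empirical risk:", empirical_risk(lambda X: X @ w, X_train, y_train))
```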
OR
5. AdaGrad helps adjust the learning rates for different features, making it
especially useful for high-dimensional or sparse datasets, such as in natural
language processing or image recognition tasks.
6. One of the advantages of AdaGrad is that it eliminates the need for manual
learning rate decay since the algorithm adapts the learning rate based on the
parameters' updates over time.
7. However, a disadvantage of AdaGrad is that the learning rate tends to
decrease rapidly as the algorithm progresses, which can slow down
convergence in the later stages of training.
8. AdaGrad is particularly effective in domains with sparse data where certain
features appear less frequently than others, allowing the model to adjust the
learning rates accordingly for better optimization.
9. Compared to Stochastic Gradient Descent (SGD), AdaGrad adjusts the
learning rate for each parameter, allowing it to perform better when dealing
with datasets where feature frequencies vary widely.
10. While AdaGrad is a useful algorithm for sparse data, its rapid learning rate
decay can limit its efficiency in more complex, dense data scenarios.
Alternative algorithms like RMSprop and Adam are often preferred to
address this limitation.
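A minimal NumPy sketch of the AdaGrad update on a toy quadratic objective, showing the per-parameter accumulation of squared gradients and the resulting adaptive step sizes; the learning rate and epsilon values are illustrative assumptions:

```python
import numpy as np

theta = np.array([5.0, -3.0])        # parameters to optimize
accum = np.zeros_like(theta)         # running sum of squared gradients, per parameter
lr, eps = 0.5, 1e-8                  # assumed base learning rate and stability constant

for step in range(100):
    grad = 2 * theta                 # gradient of f(theta) = theta_0^2 + theta_1^2
    accum += grad ** 2               # accumulate squared gradients
    # Each parameter's effective learning rate shrinks as its accumulator grows
    theta -= lr * grad / (np.sqrt(accum) + eps)

print("theta after AdaGrad:", theta)  # approaches the minimum at [0, 0]
```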
Module-4
1. Input Layer:
o The input layer receives the image or data in the form of a multi-
dimensional array (e.g., height, width, and depth for color images).
The input data passes through the CNN for further processing.
2. Convolutional Layer:
o This layer performs the core operation of a CNN. It applies a set of
filters (kernels) to the input image, performing a convolution
operation. The filters slide over the image, computing dot products
between the filter and the region of the image it covers, extracting
features such as edges, textures, or patterns.
3. Activation Function:
o After the convolution operation, an activation function is applied,
typically the Rectified Linear Unit (ReLU). This function introduces
non-linearity, enabling the network to learn more complex patterns
and representations.
4. Pooling Layer:
o The pooling layer reduces the spatial dimensions (height and width)
of the feature maps while retaining important information. Common
types of pooling include Max Pooling (selects the maximum value in
the region) and Average Pooling (computes the average value).
Pooling helps reduce the computational complexity and prevent
overfitting.
5. Fully Connected Layer (Dense Layer):
o This layer connects every neuron in the previous layer to every
neuron in the current layer. It’s used for classification or regression
tasks. The fully connected layer outputs a final prediction or
classification, such as determining the class of the object in an image.
6. Normalization Layer:
o Normalization layers, like Batch Normalization, help to stabilize the
learning process by reducing internal covariate shift. They normalize
the input to each layer to have zero mean and unit variance, speeding
up training and improving performance.
7. Dropout Layer:
o Dropout is a regularization technique where random neurons are
"dropped" (set to zero) during training. This prevents overfitting by
ensuring that the network doesn’t rely too heavily on any single
neuron and helps it generalize better to new data.
8. Flatten Layer:
o The flatten layer converts the multi-dimensional output from the
convolutional and pooling layers into a 1D vector. This step is
necessary before passing the data into the fully connected layers, as
they require a 1D input.
9. Output Layer:
o The output layer generates the final prediction or classification result.
In classification tasks, it often uses a softmax activation function for
multi-class problems or a sigmoid function for binary classification.
The output layer size corresponds to the number of classes or
categories in the problem.
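A minimal Keras sketch wiring these layers together, assuming a 32x32 RGB input and 10 output classes; the filter counts and dropout rate are illustrative choices, not prescribed by the answer:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),               # input layer: height x width x depth
    layers.Conv2D(32, (3, 3), padding="same"),    # convolutional layer (32 filters/kernels)
    layers.BatchNormalization(),                  # normalization layer
    layers.Activation("relu"),                    # ReLU activation (non-linearity)
    layers.MaxPooling2D((2, 2)),                  # pooling layer (max pooling)
    layers.Flatten(),                             # flatten layer: multi-dim maps to 1D vector
    layers.Dense(64, activation="relu"),          # fully connected (dense) layer
    layers.Dropout(0.5),                          # dropout layer (regularization)
    layers.Dense(10, activation="softmax"),       # output layer: one unit per class
])
model.summary()
```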
OR
1. LeNet (LeNet-5):
o One of the first CNN architectures, designed for digit recognition
(e.g., MNIST).
o Composed of two convolutional layers, followed by pooling layers
and fully connected layers.
o Simple architecture suitable for small image datasets.
2. AlexNet:
o Introduced in 2012 and won the ImageNet challenge.
o Consists of five convolutional layers and three fully connected
layers.
o Uses ReLU activation, dropout, and data augmentation to improve
training efficiency.
3. VGGNet (VGG16/VGG19):
o Known for its deep architecture with 16 or 19 layers.
o Uses 3x3 convolution filters stacked on top of each other.
o Although deep, its uniform design makes it easy to understand and apply.
4. GoogLeNet (Inception):
o Uses inception modules, where multiple convolution filters of
different sizes are applied at each layer.
o Combines different levels of feature extraction, making the network
efficient.
o Popularized the "network in network" idea through its use of 1x1 convolutions.
5. ResNet (Residual Networks):
o Introduced residual learning to avoid vanishing gradient problems.
o Uses skip connections to pass output from one layer to a deeper
layer.
o ResNet allows for much deeper networks (e.g., ResNet-50, ResNet-101); a residual-block sketch follows this list.
6. DenseNet (Densely Connected Convolutional Networks):
o Each layer connects to every previous layer, enhancing feature
reuse.
o Improves gradient flow and reduces the vanishing gradient
problem.
o Requires fewer parameters compared to traditional CNNs.
7. MobileNet:
o Designed for mobile and embedded systems with limited
computational power.
o Uses depthwise separable convolutions to reduce computational
cost.
o Efficient for real-time mobile vision applications.
8. SqueezeNet:
o A compact CNN model designed for efficiency with fewer
parameters.
o Utilizes fire modules that combine 1x1 convolutions and 3x3
convolutions.
o Achieves competitive accuracy with a significantly smaller model
size.
9. U-Net:
o Primarily used for image segmentation tasks, especially in medical
image analysis.
o Features an encoder-decoder architecture that reduces and restores
spatial dimensions.
o Performs pixel-wise predictions to segment images.
10. EfficientNet:
o A family of models that balances depth, width, and resolution to improve accuracy while reducing parameters.
o Uses a compound scaling method to scale the network efficiently.
o Achieves high performance with fewer parameters compared to other models.
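To make the skip connection from the ResNet item concrete, here is a minimal Keras sketch of a residual block; the input shape and filter count are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                   # skip connection: keep the block's input
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)
    x = layers.Add()([x, shortcut])                # add the input back to the block's output
    return layers.Activation("relu")(x)

inputs = keras.Input(shape=(32, 32, 64))           # 64 channels assumed so shapes match
outputs = residual_block(inputs, filters=64)
model = keras.Model(inputs, outputs)
model.summary()
```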
Module-5
Q.09 a) Explain how the recurrent neural network (RNN) processes data sequences. [CO5, 10 Marks]
1. Recurrent Neural Networks (RNNs) are designed to process sequences of
data, enabling them to maintain information from previous time steps.
2. At each time step, an RNN processes the current input and the hidden state
from the previous time step.
3. The hidden state acts as the network's memory, capturing information from
earlier steps in the sequence.
4. The output of the previous time step is fed back into the network, allowing
it to influence future outputs.
5. RNNs are typically visualized by "unrolling" them across time steps, where
each time step corresponds to a layer of the network.
6. Training an RNN involves using Backpropagation Through Time
(BPTT), which computes gradients for each time step and updates the
weights through the entire sequence.
7. Some challenges faced by RNNs include the vanishing gradient problem,
where gradients become too small to learn long-term dependencies, and the
exploding gradient problem, where gradients grow too large and cause
instability.
8. To overcome these challenges, more advanced architectures like Long
Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) are
used, which introduce mechanisms to preserve important information and
avoid vanishing or exploding gradients.
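A minimal NumPy sketch of the recurrence described above: at each time step the current input and the previous hidden state are combined into a new hidden state, which serves as the network's memory; the sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5

W_x = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_size)

sequence = rng.normal(size=(seq_len, input_size))  # one sequence of 5 input vectors
h = np.zeros(hidden_size)                          # initial hidden state (empty memory)

for t, x_t in enumerate(sequence):
    # h_t = tanh(W_x . x_t + W_h . h_{t-1} + b): current input plus previous memory
    h = np.tanh(W_x @ x_t + W_h @ h + b)
    print(f"hidden state after step {t}:", np.round(h, 3))
```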
OR
4. Advantages of LSTMs:
o LSTMs are capable of learning long-range dependencies in
sequential data by maintaining and updating cell states over time.
o They effectively mitigate the vanishing gradient problem, making
them suitable for tasks with long sequences, like language translation
and speech recognition.
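A minimal PyTorch sketch showing an LSTM carrying a hidden state and a cell state across a sequence; the tensor shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Batch of 4 sequences, 20 time steps each, 8 features per step (assumed sizes)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 20, 8)

outputs, (h_n, c_n) = lstm(x)        # the cell state c_n carries long-range information
print(outputs.shape)                 # torch.Size([4, 20, 16]): output at every time step
print(h_n.shape, c_n.shape)          # final hidden and cell states: [1, 4, 16] each
```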