
DL Miid1 Mansi

The document provides an overview of key concepts in deep learning, including the definitions and functionalities of perceptrons and multilayer perceptrons, types of errors in deep learning, and explanations of convolution and pooling layers. It also covers backpropagation and gradient descent algorithms, including momentum-based gradient descent, as well as the importance of hyperparameters in model training. The content is structured into short and long answer questions, making it a comprehensive study guide for deep learning topics.

Uploaded by

tiarakhanna707
Copyright
© © All Rights Reserved

hiii bbgs

(answers from ma’am's PDFs + ChatGPT Plus)

Deep Learning Important Questions for CIE-1


Short Answer Questions:(2m each)

1. What is Perceptron? Define Multilayer Perceptron.

Perceptron

1. Introduction:
○ Introduced by Frank Rosenblatt in 1958.
○ Fundamental building block in neural networks.
2. Architecture:
○ Consists of a single layer of neurons.
○ Primarily used for binary classification tasks.
3. Functionality:
○ Takes a set of inputs.
○ Applies weights to the inputs.
○ Computes a weighted sum of the inputs.
○ Passes the weighted sum through an activation function (commonly a
step function).
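4. Mathematical Representation:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Where:

● $x_i$: Inputs.
● $w_i$: Weights.
● b: Bias.
● f: Activation function (commonly a step function).
● y: Output.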
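
The weighted-sum-plus-step computation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the hand-picked AND weights are assumptions, not part of the original notes.

```python
def step(z):
    """Step activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias):
    """Compute y = step(sum_i w_i * x_i + b)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(weighted_sum)


# Hypothetical hand-picked weights that make the perceptron act like logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron([x1, x2], and_weights, and_bias))
```

Only the input (1, 1) pushes the weighted sum above zero, so the output is 1 for that input and 0 for the other three.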

Multilayer Perceptron (MLP)

a. Introduction:
i. Extension of the perceptron with multiple layers of neurons.
ii. Suitable for more complex machine learning tasks.
b. Architecture:
i. Includes an input layer, at least one hidden layer, and an output layer.
ii. Hidden layers enable learning complex relationships.
c. Activation Functions:
i. Uses nonlinear activation functions such as:
1. ReLU (Rectified Linear Unit).
2. Sigmoid.
3. Tanh.
ii. Nonlinearity allows it to classify non-linearly separable data.
d. Capability:
i. Models complex and nonlinear patterns in data.
ii. Suitable for tasks like image recognition, natural language processing,
and more.
e. Key Differences from Perceptron:
i. Can handle non-linearly separable data.
ii. Employs multiple layers instead of a single layer.
iii. Uses nonlinear activation functions, whereas a perceptron uses a
step function.
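
To see why a hidden layer with a nonlinear activation matters, here is a minimal sketch of a two-layer network that computes XOR, a function a single perceptron cannot represent because it is not linearly separable. The weights are set by hand for illustration (an assumption of this sketch); in practice they would be learned.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def mlp_xor(x1, x2):
    """Tiny MLP: 2 inputs -> 2 hidden ReLU neurons -> 1 linear output."""
    # Hidden layer (hand-picked weights and biases, for illustration only).
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: linear combination of the hidden activations.
    return 1.0 * h1 - 2.0 * h2


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mlp_xor(x1, x2))
```

The second hidden neuron only activates when both inputs are 1, which lets the output layer cancel the first neuron's response for that case.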

2. List out different types of errors in deep learning?


a. Training Errors:
i. Underfitting: The model fails to capture the underlying patterns in the
training data.
ii. Overfitting: The model captures noise in the training data, performing
poorly on new data.
b. Optimization Errors:
i. Gradient Vanishing: Gradients become too small for effective weight
updates, common in deep networks with sigmoid or tanh activations.
ii. Gradient Exploding: Gradients become excessively large, causing
instability in training.
c. Data Errors:
i. Noisy Data: Data with random errors or irrelevant information.
ii. Imbalanced Data: Uneven distribution of classes in the dataset.
d. Model Errors:
i. Incorrect architecture design.
ii. Poor selection of hyperparameters like learning rate, batch size, etc.
e. Implementation Errors:
i. Bugs in the code.
ii. Misuse of libraries or frameworks.
f. Evaluation Errors:
i. Choosing the wrong evaluation metrics.
ii. Data leakage between training and validation datasets.
g. Inference Errors:
i. Domain shift: Changes in input data distribution between training and
deployment.
ii. Adversarial attacks: Malicious inputs crafted to fool the model.
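
The vanishing-gradient item above can be made concrete with a rough sketch. The derivative of the sigmoid is at most 0.25, so a gradient passed backward through many sigmoid layers is multiplied by many small factors. The helper below ignores the weights entirely, which is a simplifying assumption made only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)); its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_factor(num_layers, z=0.0):
    """Product of sigmoid derivatives across layers (weights ignored for simplicity)."""
    return sigmoid_derivative(z) ** num_layers


for depth in (1, 5, 10, 20):
    print(depth, gradient_factor(depth))
```

Even in the best case (z = 0 at every layer), the factor shrinks geometrically with depth.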

3. Define Convolution and pooling layers.

Convolution Layer

1. Definition:
○ A core component of Convolutional Neural Networks (CNNs).
○ Extracts spatial features from input data, such as edges, textures, and
shapes.
2. Operation:
○ Applies filters (kernels) to the input matrix.
○ Slides the kernel over the input matrix (convolution operation).
○ Computes element-wise multiplications and sums the results.
3. Output:
○ Produces feature maps that highlight specific features detected by
the filters.
4. Purpose:
○ Captures spatial hierarchies and patterns in data.

Pooling Layer

a. Definition:
i. A layer that reduces the spatial dimensions (height and width) of
feature maps.
b. Types of Pooling:
i. Max-Pooling: Selects the maximum value within a region.
ii. Average-Pooling: Calculates the average value within a region.
c. Operation:
i. Divides the feature map into regions (e.g., 2x2 or 3x3).
ii. Applies the pooling function (max or average) to each region.
d. Benefits:
i. Reduces computational complexity by downsampling.
ii. Helps prevent overfitting.
iii. Retains important features while discarding redundant information.
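
As a rough sketch of the two operations defined above, the code below slides a kernel over a small image and then applies 2x2 max-pooling. The image and the vertical-edge kernel are made-up examples. Note that, like most deep learning libraries, this computes cross-correlation (the kernel is not flipped), with no padding and a stride of 1.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map, size=2):
    """Non-overlapping max-pooling over size x size regions."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            region = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(region))
        pooled.append(row)
    return pooled


# 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical vertical-edge kernel: responds where values rise left to right.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)  # 4x4, each row is [0, 2, 0, 0]
pooled = max_pool2d(feature_map)     # 2x2, [[2, 0], [2, 0]]
print(feature_map)
print(pooled)
```

The feature map is non-zero only where the edge sits, and pooling halves each spatial dimension while keeping that response.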
4. Define McCulloch Pitts Neuron.
● Developed in 1943 by Warren McCulloch and Walter Pitts as one of the
earliest models of artificial neurons.
● Accepts binary inputs (0 or 1), where 1 represents an active signal,
and 0 represents no signal.
● Each input is assigned a weight, and the neuron computes the weighted
sum of inputs.
● The weighted sum is compared to a threshold:
1. Outputs 1 if the sum is greater than or equal to the threshold.
2. Outputs 0 if the sum is below the threshold.
● Can perform basic logical operations like AND, OR, and NOT, forming the
foundation for modern neural networks.
● Threshold Mechanism: The weighted sum S is compared to a predefined
threshold value (T):
■ If S≥T, the neuron outputs 1 (activation).
■ If S<T, the neuron outputs 0 (no activation).
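
A minimal sketch of the threshold rule above, following the weighted formulation used in these notes. The specific weights and thresholds chosen for AND, OR, and NOT are one possible choice, picked for illustration.

```python
def mcp_neuron(inputs, weights, threshold):
    """Output 1 if the weighted sum S reaches the threshold T, else 0."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0


def AND(a, b):
    return mcp_neuron([a, b], [1, 1], threshold=2)


def OR(a, b):
    return mcp_neuron([a, b], [1, 1], threshold=1)


def NOT(a):
    # A negative (inhibitory) weight with threshold 0 inverts the input.
    return mcp_neuron([a], [-1], threshold=0)


print([AND(a, b) for a in (0, 1) for b in (0, 1)])
print([OR(a, b) for a in (0, 1) for b in (0, 1)])
print([NOT(a) for a in (0, 1)])
```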
5. Describe the terms Bias and Variance.

Bias: Refers to the error caused by using a simplified model that does not capture the
complexity of the real-world function.

1. Impact:
○ High Bias results in underfitting, where the model fails to learn
patterns in the training data effectively.
○ The model performs poorly on both training and unseen data.
2. Example:
A linear regression model applied to non-linear data may result in high bias.

Variance: Refers to the error introduced due to the model’s excessive sensitivity to
small variations in the training data.

1. Impact:
○ High Variance leads to overfitting, where the model captures noise
in the training data and performs poorly on unseen data.
2. Example:
A highly complex model, like a deep decision tree, may memorize the training
data but fail to generalize.

Bias-Variance Tradeoff:

a. Balancing bias and variance is crucial for achieving optimal model
performance.
b. Reducing bias usually increases variance and vice versa, so the goal is to
minimize their combined error for better generalization.
c. Use cross-validation to evaluate model performance.
d. Regularization techniques or tuning model complexity help achieve the right
balance.
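
The two extremes can be illustrated with a small sketch on synthetic data. Everything here is an assumption made for the example: the quadratic target, the noise level, and the two toy models (a constant predictor standing in for a high-bias model, and a 1-nearest-neighbour memorizer standing in for a high-variance one).

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def true_function(x):
    return x * x


# Noisy samples of the true function (synthetic data for illustration).
train_x = [i / 10 for i in range(20)]
train_y = [true_function(x) + random.gauss(0, 0.3) for x in train_x]
test_x = [x + 0.05 for x in train_x]
test_y = [true_function(x) + random.gauss(0, 0.3) for x in test_x]


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    """High bias: ignores x and always predicts the training mean."""
    return mean_y


def nearest_neighbor_model(x):
    """High variance: returns the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in [("constant", constant_model),
                    ("1-NN", nearest_neighbor_model)]:
    print(name,
          "train MSE:", mse(model, train_x, train_y),
          "test MSE:", mse(model, test_x, test_y))
```

The memorizer reaches zero training error because it reproduces every noisy label, while the constant model has a large error even on the training set. Comparing the train and test columns for each model is the kind of check cross-validation automates.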
6. What is Cross Validation? List out the methods used for Cross Validation.
● Cross-validation is a statistical method used to evaluate the performance and
generalizability of a model.
● It divides the dataset into training and validation subsets multiple times,
ensuring that every data point gets a chance to be in the validation set.
● Methods:
1. K-Fold Cross-Validation: Divides data into k subsets; each subset
is used as a validation set once.
2. Stratified K-Fold Cross-Validation: Ensures class proportions are
maintained in each fold.
3. Leave-One-Out Cross-Validation (LOOCV): Uses all data except
one point for training; repeats for all data points.
4. Time Series Cross-Validation: Respects temporal order by splitting
data sequentially.
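
A minimal sketch of how K-fold splitting can be done by hand, without shuffling or stratification. The function name `k_fold_indices` is hypothetical. Setting k equal to the number of samples gives Leave-One-Out Cross-Validation.

```python
def k_fold_indices(num_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(num_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)

# Leave-one-out is the special case k = num_samples.
loocv_folds = list(k_fold_indices(4, 4))
```

Every index lands in exactly one validation fold, which is the property the definition above asks for.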

Long Answer Questions:

1. Explain Back Propagation with its algorithm.(7m)


● Backpropagation is a supervised learning algorithm widely used for training
neural networks.
● It efficiently calculates the gradients of the loss function with respect to the
network's weights and biases, enabling systematic and efficient weight
updates.
● It leverages the chain rule of calculus to propagate errors backward through
the network.
● Steps in Backpropagation
1. Feedforward Pass:
● Data flows forward through the network layer by layer.
● Outputs (activations) are computed for each neuron in each
layer using the current weights and biases.
● The final output of the network is produced, which will be
compared against the true labels during the loss calculation.
2. Loss Calculation:
● A loss function (e.g., Mean Squared Error, Cross-Entropy) is
used to compute the error between the predicted output and
the true label.
● This loss measures how far the model's predictions are from
the actual targets and serves as the quantity to minimize
during training.
3. Backward Pass:
● Errors are propagated backward through the network to
compute gradients of the loss with respect to the weights and
biases:
○ Start at the output layer, compute the derivative of the
loss with respect to the activations and weights.
○ Move layer by layer backward, using the chain rule to
compute gradients for the preceding layers.
● Gradients indicate the direction and magnitude of change
needed to minimize the loss.
4. Weight Update:
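● Using Gradient Descent, weights are adjusted to reduce the
loss:

$$w \leftarrow w - \alpha \frac{\partial L}{\partial w}$$

where α is the learning rate and L is the loss; biases are updated in the same way.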

Backpropagation Algorithm

1. Initialization: Randomly initialize all weights and biases in the network.

2. Training Loop: For each training example (or batch, in mini-batch gradient
descent):
a. Forward Propagation: Input the data into the network and compute
the outputs (activations) for all layers.
b. Loss Computation: Compare the predicted output to the true label
and calculate the loss using a loss function.
c. Backward Propagation:
● Start at the output layer:
○ Compute the gradient of the loss with respect to the activations
and weights.
○ Use the chain rule to propagate these gradients backward
through all hidden layers.
● For each neuron in each layer, calculate gradients for both weights
and biases.
d. Weight and Bias Updates: Adjust weights and biases using the
computed gradients and a learning rate.

3. Repeat for Multiple Epochs:

● Perform the forward and backward passes for the entire training
dataset for several iterations (epochs).
● Monitor the loss at the end of each epoch to check for convergence
(when the loss stabilizes or reaches an acceptable level).
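
The loop above can be sketched on the smallest possible network: one input, one sigmoid hidden neuron, and one linear output, trained on a single example with a squared-error loss. The network shape, the fixed starting values (used instead of random initialization so the run is repeatable), and the training example are all assumptions of this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass: returns the hidden activation and the prediction."""
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)  # hidden layer
    y_hat = w2 * a1 + b2       # linear output layer
    return a1, y_hat


def loss_fn(params, x, y):
    """Squared-error loss L = 0.5 * (y_hat - y)^2."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backward(params, x, y):
    """Backward pass: chain rule from the output layer to the first layer."""
    w1, b1, w2, b2 = params
    a1, y_hat = forward(params, x)
    d_yhat = y_hat - y                # dL/dy_hat
    d_w2 = d_yhat * a1                # dL/dw2
    d_b2 = d_yhat                     # dL/db2
    d_a1 = d_yhat * w2                # propagate to the hidden activation
    d_z1 = d_a1 * a1 * (1.0 - a1)     # sigmoid'(z1) = a1 * (1 - a1)
    d_w1 = d_z1 * x                   # dL/dw1
    d_b1 = d_z1                       # dL/db1
    return [d_w1, d_b1, d_w2, d_b2]


def train(params, x, y, learning_rate=0.1, epochs=100):
    """Repeat forward pass, backward pass, and update; record the loss."""
    losses = []
    for _ in range(epochs):
        grads = backward(params, x, y)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
        losses.append(loss_fn(params, x, y))
    return params, losses


initial = [0.5, -0.3, 0.8, 0.1]  # w1, b1, w2, b2 (fixed for repeatability)
trained, losses = train(initial, x=1.5, y=1.0)
print("initial loss:", loss_fn(initial, 1.5, 1.0))
print("final loss:", losses[-1])
```

A useful sanity check for any hand-written backward pass is to compare each analytic gradient with a finite-difference estimate of the loss.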

Key Concepts in Backpropagation

1. Gradient Descent:
○ Backpropagation uses gradient descent to minimize the loss function.
○ Variants of gradient descent include batch gradient descent,
stochastic gradient descent (SGD), and mini-batch gradient
descent.
2. Learning Rate (α):
○ Controls the step size during weight updates.
○ A small learning rate slows down training, while a large one may lead
to overshooting and instability.
3. Chain Rule of Calculus:
○ Backpropagation relies on the chain rule to compute how a change in
weights and biases affects the loss.
○ This allows efficient computation of gradients for networks with many
layers.
4. Convergence:
○ The training process is repeated until the loss converges, indicating
that the model is optimized to its best capacity on the training data.

Advantages of Backpropagation

1. Efficiency:
○ Enables efficient training of deep networks by computing the gradients for
all parameters in a single backward pass.
2. General Applicability:
○ Works with various architectures and loss functions.
3. Foundation of Deep Learning:
○ Essential for training modern deep neural networks.

Limitations of Backpropagation

1. Vanishing and Exploding Gradients:
○ Gradients can become extremely small (vanishing) or large
(exploding) in very deep networks, making training difficult.
2. Overfitting:
○ Backpropagation may lead to overfitting if the model is too complex or
the training data is insufficient.
3. Dependence on Hyperparameters:
○ Requires careful tuning of learning rates, initialization, and other
hyperparameters.

Real-World Applications

● Image recognition (e.g., Convolutional Neural Networks for object detection).
● Natural Language Processing (e.g., Recurrent Neural Networks for text
generation).
● Recommendation systems and autonomous systems.

By iteratively reducing the error, backpropagation ensures that neural networks learn
effectively from data, forming the backbone of modern machine learning and AI.
2. What is Gradient Descent? Explain Momentum based Gradient Descent in
detail.(7m)
● Gradient Descent is a fundamental optimization algorithm used in machine
learning and deep learning to minimize the loss function by iteratively
updating the model's parameters.
● The updates guide the parameters (weights and biases) toward the values
that result in the smallest possible error, ensuring the model performs
optimally on the data.
● Steps in Gradient Descent
i. Initialize Parameters: Begin with random values for weights (w) and
biases (b).
ii. Calculate Gradients: Compute the gradient of the loss function (J(θ))
with respect to each parameter (θ), indicating the direction of steepest
ascent.
iii. Update Parameters: Update each parameter (θ) by taking a step in
the direction of the negative gradient (steepest descent):

$$\theta = \theta - \alpha \nabla_\theta J(\theta)$$

where α is the learning rate.

iv. Iterate: Repeat the process for multiple iterations or until the loss
converges (stabilizes to a minimum value).
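
The four steps can be sketched on a one-parameter toy loss, J(θ) = (θ − 3)², whose minimum is at θ = 3. The loss function, starting point, and learning rate are arbitrary choices for illustration.

```python
def loss(theta):
    """Toy loss J(theta) = (theta - 3)^2, minimized at theta = 3."""
    return (theta - 3.0) ** 2


def gradient(theta):
    """dJ/dtheta = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta=0.0, learning_rate=0.1, iterations=100):
    for _ in range(iterations):
        # Step against the gradient, scaled by the learning rate.
        theta = theta - learning_rate * gradient(theta)
    return theta


theta_final = gradient_descent()
print("theta:", theta_final, "loss:", loss(theta_final))
```

With these settings each update shrinks the distance to the minimum by a factor of 0.8. For this particular loss, any learning rate above 1.0 would make the distance grow instead, which is the overshooting behaviour that a too-large learning rate causes.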

Key Factors in Gradient Descent

1. Learning Rate (α):
1. Determines how large a step is taken along the negative gradient.
2. A small α may result in slow convergence.
3. A large α can overshoot the minimum or cause instability.
2. Convergence: The algorithm stops when the loss function reaches a
minimum value or a predefined number of iterations is completed.
3. Challenges with Gradient Descent
● Ravines or Narrow Regions: Gradient descent can oscillate in
regions where the loss landscape is steep in one direction and flat in
another.
● Local Minima: The algorithm may converge to a local minimum rather
than the global minimum.
● Slow Convergence: In flat regions or regions with small gradients,
convergence can be slow.

Momentum-Based Gradient Descent: Momentum-based gradient descent is an extension
of gradient descent designed to address its limitations, especially slow convergence and
oscillations.
It adds a velocity term to the parameter update, which accumulates information about past
gradients. This approach helps the algorithm move more smoothly and consistently toward
the minimum.

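● Steps in Momentum-Based Gradient Descent

1. Compute Velocity (v_t): A common form, consistent with the update in step 2, is

$$v_t = \beta v_{t-1} + (1 - \beta) \nabla_\theta J(\theta)$$

Where:
○ β: Momentum coefficient (often around 0.9), which controls how much of the past velocity is retained.
○ v_{t-1}: Previous velocity (initialized to 0).
Some formulations omit the (1 − β) factor.

2. Update Parameters (θ):

● Use the velocity to update the parameters:

$$\theta = \theta - \alpha v_t$$

Where:
○ α: Learning rate.
○ v_t: Current velocity.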
3. Advantages of Momentum-Based Gradient Descent
1. Speeds Up Convergence:
○ Particularly useful in regions with narrow valleys or
ravines in the loss landscape.
○ Helps the model escape flat or slow-moving regions
quickly.
2. Reduces Oscillations:
○ Damps oscillations caused by sharp gradients in one
direction, stabilizing learning.
3. Smooth Learning Path:
○ Maintains a consistent motion in a particular direction,
reducing the impact of small fluctuations in the
gradient.
4. Real-World Applications
1. Deep Learning:
○ Training deep neural networks where the loss
landscape has many local minima and flat
regions.
2. Image Recognition:
○ Helps models like Convolutional Neural
Networks (CNNs) train faster and more
efficiently.
3. Natural Language Processing (NLP):
○ Used in models like Recurrent Neural Networks
(RNNs) to stabilize learning in sequence data.
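
A minimal sketch of the momentum update on a toy loss J(θ) = (θ − 3)². It assumes the exponentially weighted form of the velocity, v_t = β·v_{t−1} + (1 − β)·∇J(θ), followed by θ = θ − α·v_t. Other formulations drop the (1 − β) factor, so treat this as one common variant rather than the only definition.

```python
def gradient(theta):
    """Gradient of the toy loss J(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


def momentum_descent(theta=0.0, learning_rate=0.1, beta=0.9, steps=500):
    velocity = 0.0  # v_0 = 0
    for _ in range(steps):
        # Accumulate an exponentially weighted average of past gradients.
        velocity = beta * velocity + (1.0 - beta) * gradient(theta)
        # Move the parameter along the accumulated velocity.
        theta = theta - learning_rate * velocity
    return theta


theta_final = momentum_descent()
print("theta:", theta_final)
```

Because the velocity averages recent gradients, components that flip sign from step to step tend to cancel, while components that keep pointing the same way build up.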
3.
a. Briefly describe the concept of Hyperparameters.(4m)
Hyperparameters are adjustable settings in machine learning that define how a model is
trained. Unlike parameters (e.g., weights and biases), which are learned during training,
hyperparameters are set before training begins and directly impact the training process and
model performance.

Common Hyperparameters

1. Learning Rate (α):
○ Controls the step size for updating model weights.
○ A high learning rate speeds up training but risks overshooting the
optimal solution.
○ A low learning rate ensures stability but slows convergence.
2. Batch Size:
○ Determines the number of samples processed before updating the
weights.
○ Smaller batches result in more frequent, noisier updates, which often
improves generalization but can increase training time.
○ Larger batches are faster but require more memory and may reduce
generalization.
3. Number of Layers and Neurons:
○ Impacts the complexity and learning capacity of the model.
○ More layers and neurons can learn complex patterns but may overfit.
○ Fewer layers risk underfitting by failing to capture intricate patterns.
4. Epochs:
○ Defines how many times the entire training dataset is passed through
the model.
○ Too few epochs may lead to underfitting, while too many can cause
overfitting.

Importance of Hyperparameters: Hyperparameters influence:

1. Model Performance: The right settings balance underfitting and overfitting.
2. Training Efficiency: Adjusting hyperparameters can reduce training time and
computational resources.
3. Generalization: Proper tuning ensures the model performs well on unseen
data.

Tuning Techniques

1. Grid Search: Tests every combination of values from a predefined grid of
hyperparameter settings.
2. Random Search: Samples random combinations within a defined range.
3. Automated Tools: Libraries like Optuna and Hyperopt optimize
hyperparameters efficiently.

Selecting and tuning hyperparameters is crucial for building an effective and robust model.
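
Grid search and random search can be sketched as follows. The `validation_loss` function is a hypothetical stand-in for "train a model with these settings and return its validation loss"; its formula and the candidate values are made up so the example runs instantly.

```python
import itertools
import math
import random


def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and measuring validation loss."""
    return (math.log10(learning_rate) + 2.0) ** 2 + (batch_size - 32) ** 2 / 1000.0


learning_rates = [0.1, 0.01, 0.001]
batch_sizes = [16, 32, 64]


def grid_search():
    """Evaluate every combination on the grid and keep the best one."""
    candidates = itertools.product(learning_rates, batch_sizes)
    return min(candidates, key=lambda c: validation_loss(*c))


def random_search(num_trials=20, seed=0):
    """Sample random combinations and keep the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        lr = 10 ** rng.uniform(-3, -1)  # log-uniform between 0.001 and 0.1
        bs = rng.choice(batch_sizes)
        trials.append((lr, bs))
    return min(trials, key=lambda c: validation_loss(*c))


best_grid = grid_search()
best_random = random_search()
print("grid search best:", best_grid)
print("random search best:", best_random)
```

Grid search cost grows multiplicatively with each added hyperparameter, which is why random search or a dedicated tuning library is often preferred when there are many settings.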
b. What are the applications of CNN? (3m)

Convolutional Neural Networks (CNNs) are specialized deep learning architectures designed
to process and analyze data with spatial or temporal dependencies, such as images, videos,
and time series data. By leveraging convolutional layers, CNNs excel in feature extraction,
making them ideal for pattern recognition and tasks requiring a spatial understanding of the
data.
Applications of CNNs

1. Image Classification

● Definition: Assigning labels to images by recognizing objects, patterns, or
textures within them.
● Examples:
○ Identifying animals (e.g., cats vs. dogs).
○ Classifying handwritten digits in datasets like MNIST.
○ Recognizing faces, objects, or scenes in photos.
● Real-World Use Cases:
○ Social media platforms use CNNs for automatic photo tagging.
○ E-commerce companies utilize CNNs for product classification and
recommendation.

2. Object Detection

● Definition: Locating and identifying multiple objects within an image along
with their bounding boxes.
● Examples:
○ Detecting cars and pedestrians in autonomous driving.
○ Identifying tumors in medical imaging.
● Techniques:
○ Region-based CNNs (R-CNN, Fast R-CNN, Faster R-CNN).
○ YOLO (You Only Look Once) and SSD (Single Shot MultiBox
Detector).
● Real-World Use Cases:
○ Surveillance systems for detecting suspicious activities.
○ Retail analytics for tracking customer behavior.

3. Medical Image Analysis

● Definition: Applying CNNs to interpret medical images like X-rays, MRIs, and
CT scans for diagnostic purposes.
● Examples:
○ Identifying fractures in X-ray images.
○ Detecting cancerous cells in histopathological images.
○ Classifying retinal diseases from fundus photography.
● Real-World Use Cases:
○ Early disease detection in healthcare.
○ Assisting radiologists by highlighting areas of concern.
4. Video Processing

● Definition: Analyzing video streams to recognize activities, events, or objects
over time.
● Examples:
○ Action recognition (e.g., identifying if someone is walking, running, or
jumping).
○ Tracking vehicles in traffic monitoring systems.
● Techniques:
○ Combining CNNs with Recurrent Neural Networks (RNNs) for
temporal analysis.
○ 3D CNNs for capturing spatial and temporal features simultaneously.
● Real-World Use Cases:
○ Sports analytics for identifying player movements and strategies.
○ Video surveillance for detecting abnormal behavior.

5. Text and Natural Language Processing (NLP)

● Definition: Applying CNNs to text-based tasks by representing text as a
matrix of embeddings (one row per word or character) that convolution filters can slide over.
● Examples:
○ Sentiment analysis: Classifying text as positive, negative, or neutral.
○ Text classification: Categorizing news articles or customer reviews.
● Techniques:
○ Treating word embeddings (e.g., Word2Vec, GloVe) as input to a
CNN.
○ Using character-level embeddings for tasks like spam detection.
● Real-World Use Cases:
○ Social media sentiment analysis.
○ Email filtering (e.g., spam vs. non-spam classification).

Additional Applications of CNNs

1. Style Transfer: CNNs can apply artistic styles to images or videos (e.g.,
transforming a photo to mimic the style of a painting).
2. Facial Recognition: Widely used in security systems and smartphone
authentication.
3. Self-Driving Cars: CNNs process camera feeds to detect road signs, lane
boundaries, and pedestrians.
4. Speech Recognition: Used for spectrogram analysis in audio and speech
processing tasks.
5. Satellite Image Analysis: Detecting geographical changes, classifying land
use, and monitoring environmental conditions.
6. Generative Applications: CNNs serve as the generator and discriminator in many
Generative Adversarial Networks (GANs) that create realistic images, videos, and animations.

Key Advantages of CNNs in Applications


1. Feature Extraction: CNNs automatically detect important features like
edges, textures, and shapes without manual engineering.
2. Parameter Efficiency: Convolutional layers share weights across spatial
dimensions, making CNNs computationally efficient.
3. Scalability: CNNs can be extended for large-scale datasets and tasks with
high accuracy.
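
The parameter-efficiency point can be checked with simple arithmetic. The sketch below compares a fully connected layer with a single 3x3 convolution filter on a 28x28 grayscale image (the MNIST image size); the layer sizes are illustrative assumptions.

```python
def dense_params(num_inputs, num_outputs):
    """Fully connected layer: one weight per input-output pair, plus biases."""
    return num_inputs * num_outputs + num_outputs


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolution layer: weights are shared across all spatial positions."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


pixels = 28 * 28                            # 784 input values
dense_count = dense_params(pixels, pixels)  # 784 * 784 + 784 = 615,440
conv_count = conv_params(3, 3, 1, 1)        # 3 * 3 * 1 * 1 + 1 = 10

print("dense layer parameters:", dense_count)
print("3x3 conv layer parameters:", conv_count)
```

The convolution count does not depend on the image size at all, because the same filter is reused at every position.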

Conclusion

CNNs have transformed how we approach spatial and sequential data analysis. From
medical diagnostics to autonomous vehicles, their versatility and effectiveness have made
them indispensable in modern AI-driven applications. As research continues, CNNs are
likely to play an even more critical role in solving complex real-world problems.

4. Distinguish between Biological Neural Network and Artificial Neural Network.(7m)


5. Describe Batch Normalization with an example.(7m)
● Batch Normalization (BatchNorm) was introduced by Sergey Ioffe and
Christian Szegedy in 2015 to address the issue of internal covariate shift,
where the distribution of inputs to each layer changes during training.
● It normalizes the activations of neurons within a mini-batch, stabilizing and
accelerating the training process. BatchNorm is widely used in modern deep
learning architectures due to its effectiveness in improving convergence and
reducing sensitivity to initialization.
● Steps in Batch Normalization:
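i. Compute Batch Statistics: For each mini-batch during training,
compute:
1. Mean (μ): Average value of the activations in the
mini-batch.
2. Variance (σ²): Measure of the spread of
activations in the mini-batch.
ii. Normalize Activations: Subtract the mean and divide by the standard
deviation, with a small constant ε added for numerical stability:

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$$

iii. Scale and Shift: Transform the normalized activations using learnable
parameters γ (scale) and β (shift):

$$y_i = \gamma \hat{x}_i + \beta$$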

● Benefits of Batch Normalization:
○ Improved Stability: Reduces internal covariate shift, ensuring
consistent distributions of inputs across layers.
○ Accelerated Training: Allows for faster convergence by smoothing
the loss landscape.
○ Higher Learning Rates: Enables the use of larger learning rates
without causing instability.
○ Reduced Dependency on Initialization: Decreases sensitivity to
weight initialization, simplifying model tuning.
○ Regularization: Acts as a regularizer by adding noise during training,
reducing overfitting.
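
As a small worked example, the sketch below applies the training-time steps to one mini-batch of four activations for a single neuron: compute the batch mean and variance, normalize as (x − μ) / sqrt(σ² + ε), then scale by γ and shift by β. The batch values, γ = 1, β = 0, and ε = 1e-5 are assumptions chosen for illustration. Running averages for inference are left out.

```python
import math


def batch_norm(activations, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one mini-batch of scalar activations, then scale and shift."""
    n = len(activations)
    mean = sum(activations) / n
    variance = sum((x - mean) ** 2 for x in activations) / n
    # eps keeps the division stable when the variance is close to zero.
    normalized = [(x - mean) / math.sqrt(variance + eps) for x in activations]
    return [gamma * x_hat + beta for x_hat in normalized]


batch = [2.0, 4.0, 6.0, 8.0]  # mean = 5, variance = 5
output = batch_norm(batch)
print(output)
```

For this batch the mean is 5 and the variance is 5, so the outputs are roughly −1.34, −0.45, 0.45, and 1.34: centred on zero with a variance close to 1. With learned γ and β, the network can rescale or shift this distribution if that helps training.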
6.
a. Explain different Activation Functions.(4m)

Activation functions introduce non-linearity into a neural network, enabling it to learn
complex patterns and relationships in data. They operate on the outputs of neurons and
determine how the network activates for a given input.
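
For reference, three activation functions named earlier in these notes (Sigmoid, Tanh, and ReLU) can be written directly from their standard definitions. This is a minimal sketch for single scalar inputs.

```python
import math


def sigmoid(z):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes any real input into the range (-1, 1), centred on zero."""
    return math.tanh(z)


def relu(z):
    """Passes positive inputs through unchanged and maps negatives to 0."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```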
b. Discuss the concept of Overfitting and Underfitting.(3m)
When training machine learning models, achieving the right balance between learning the
training data and generalizing to unseen data is crucial. Two major challenges in this context
are overfitting and underfitting.
