Deep Learning Final
MCQ:
1. What is the purpose of an activation function in a neural network?
a) It introduces non-linearity into the network
b) It determines the output of a neuron
c) It helps in back propagation
d) All of these
e) None of these
3. Which deep learning technique is used for learning from delayed rewards?
a) Reinforcement learning
b) Supervised learning
c) Unsupervised learning
d) Transfer learning
e) None of these
4. Computers are best at learning
a) Facts
b) Accuracy
c) Procedure
d) All of these
e) None of these
9. What is the primary limitation of using deep learning in cases with limited labeled data?
a) The inability to use transfer learning
b) The need for a larger network
c) The risk of overfitting
d) The requirement for more computational power
e) None of these
11. Which neural network has only one hidden layer between the input and output?
A. Shallow neural network
B. Deep neural network
C. Feed-forward neural networks
D. Recurrent neural networks
13. Deep learning algorithms are _______ more accurate than machine learning algorithms in
image classification.
A. 33%
B. 37%
C. 40%
D. 41%
14. In which of the following applications can we use deep learning to solve the problem?
A. Protein structure prediction
B. Prediction of chemical reactions
C. Detection of exotic particles
D. All of the above
15. Which of the following statements is true when you use 1×1 convolutions in a CNN?
A. It can help in dimensionality reduction
B. It can be used for feature pooling
C. It suffers less overfitting due to small kernel size
D. All of the above
16. The number of nodes in the input layer is 10 and the hidden layer is 5. The maximum
number of connections from the input layer to the hidden layer is
A. 50
B. less than 50
C. more than 50
D. It is an arbitrary value
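As a quick check (worked reasoning, assuming a fully connected layer): every one of the 10 input nodes can connect to each of the 5 hidden nodes, so the maximum number of connections is 10 × 5 = 50.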
17. The input image has been converted into a matrix of size 28 X 28 and a kernel/filter of size
7 X 7 with a stride of 1 is used. What will be the size of the convoluted matrix?
A. 20x20
B. 21x21
C. 22x22
D. 25x25
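As a quick check, assuming no padding, the standard output-size formula gives: output = (input − kernel)/stride + 1 = (28 − 7)/1 + 1 = 22, i.e. a 22 X 22 convolved matrix.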
Marks: 2
1. Explain multilayer perceptron.
A Multilayer Perceptron (MLP) is a type of artificial neural network consisting of an
input layer, one or more hidden layers, and an output layer, where each neuron is fully
connected to the next layer. It uses weights, biases, and activation functions to learn
patterns in data for tasks like classification and regression.
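A minimal NumPy sketch of an MLP forward pass, assuming one hidden layer with ReLU and a sigmoid output; the layer sizes and random weights here are illustrative, not part of the answer above:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 4 input features, 8 hidden units, 1 output neuron.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input  -> hidden (weights, biases)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output

def mlp_forward(x):
    h = relu(x @ W1 + b1)         # hidden layer: weighted sum + bias + activation
    return sigmoid(h @ W2 + b2)   # output layer, e.g. for binary classification

print(mlp_forward(rng.normal(size=(1, 4))))
```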
3. What are hyperparameters?
Hyperparameters are settings in a machine learning model that are defined before
training and not learned from data. Examples include the learning rate, number of
layers, batch size, and number of epochs. They influence the training process and
model performance.
7. Define ELU.
ELU (Exponential Linear Unit) is an activation function defined as:
f(x) =
\begin{cases}
x & \text{if } x > 0 \\
\alpha (e^x - 1) & \text{if } x \leq 0
\end{cases}
where \alpha > 0 (commonly \alpha = 1). It helps reduce vanishing gradients and improves
learning by allowing small negative values.
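A small NumPy sketch of ELU, assuming \alpha = 1 (illustrative only):

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for positive inputs; alpha * (exp(x) - 1) gives small negative outputs otherwise.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-2.0, -0.5, 0.0, 1.5])))
```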
8. Describe the theory of the autonomous form of deep learning in a few words.
The autonomous form of deep learning refers to models that independently learn
patterns, extract features, and make decisions without manual feature engineering or
human intervention.
Marks: 5
1. Compare and contrast the single-layer model and the multilayer perceptron model.
Here's a comparison of single layer models and multilayered perceptron (MLP) models:
Comparison Points
1. Complexity: MLP models are more complex than single layer models, with more
layers and connections.
2. Capacity: MLP models have a greater capacity to learn complex patterns in data.
3. Training: MLP models require more data and computational resources to train.
4. Applications: MLP models are suitable for a wider range of applications.
Contrasting Points
1. Interpretability: Single layer models are more interpretable than MLP models, as the
relationships between inputs and outputs are more transparent.
2. Training Time: Single layer models train faster than MLP models.
3. Overfitting: MLP models are more prone to overfitting than single layer models, due
to their greater capacity.
In a feedforward neural network in deep learning, data flows through multiple layers in
a single direction, from input to output, without looping back. Here’s a concise
explanation:
1. Input Layer: The input layer receives raw data, with each neuron representing one
feature of the data (like pixels in an image or words in a sentence).
2. Hidden Layers: The input is passed through one or more hidden layers. Each neuron
in these layers computes a weighted sum of its inputs, adds a bias term, and applies
an activation function (e.g., ReLU, Sigmoid) to introduce non-linearity, allowing the
network to learn complex patterns.
3. Output Layer: The final layer produces the output, which might represent
probabilities (for classification) or values (for regression).
4. Feedforward Process: Data flows forward through the network without looping
back, making this structure straightforward.
5. Training: During training, weights and biases are adjusted to minimize the error
between predicted and actual values using algorithms like backpropagation and
gradient descent.
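A minimal sketch of the training idea in point 5 (forward pass, error, gradient descent update) for a single linear layer in NumPy; the toy data and learning rate are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 samples, 3 features (toy data)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                             # targets generated by a known rule

W = np.zeros(3)                            # weights to be learned
lr = 0.1
for epoch in range(200):
    pred = X @ W                           # feedforward pass
    grad = 2 * X.T @ (pred - y) / len(y)   # gradient of the MSE loss w.r.t. W
    W -= lr * grad                         # gradient descent update

print(W)   # approaches [1.0, -2.0, 0.5]
```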
Transfer Learning: Pre-trained CNN models are commonly used in transfer learning,
where the model is fine-tuned on a new dataset to adapt the learned features for
specific tasks, such as image classification or object detection.
Using pre-trained CNNs helps reduce the time and computational resources required
to train a model, especially when data is limited or when starting from scratch is
computationally expensive.
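A brief transfer-learning sketch using PyTorch/torchvision (an assumed tooling choice, not specified in the answer above): load a pre-trained CNN, freeze its features, and replace the classifier head for a new two-class task.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a CNN pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for a new 2-class task (e.g. cats vs. dogs).
model.fc = nn.Linear(model.fc.in_features, 2)

# Fine-tune only the new head.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```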
2. Underfitting
• Definition: Underfitting occurs when a model is too simple to capture the underlying
patterns in the data, leading to poor performance on both the training and test data.
• Example: Using a linear model to predict house prices when the relationship
between features and prices is non-linear would be an example of underfitting. The
model would fail to capture the complexity of the data, resulting in high errors on
both training and test sets.
• Solution: To address underfitting, we can increase the model complexity (e.g., use a
more complex model or add more layers in a neural network) or use a model that
better suits the data’s complexity, such as a decision tree or neural network for non-
linear relationships.
In overfitting, the model learns too much detail from the training data, while in
underfitting, it learns too little, failing to capture the underlying structure.
3. Trade-off:
• The Bias-Variance trade-off involves balancing model complexity. A model with
low bias and low variance is ideal, but typically, reducing bias increases variance
and vice versa.
• Total Error = Bias² + Variance + Irreducible Error, where the goal is to minimize
both bias and variance to reduce overall error.
4. Achieving Balance:
• Regularization techniques, such as L2 regularization or dropout, help control
variance.
• Cross-validation aids in selecting a model that generalizes well without
overfitting or underfitting.
The Bias-Variance trade-off aims to balance simplicity and complexity in a model
to achieve optimal predictive performance on unseen data.
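A short PyTorch sketch of the two regularization ideas mentioned above, dropout and L2 (via weight decay); the layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Dropout randomly zeroes activations during training, which helps control variance.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 2),
)

# weight_decay adds an L2 penalty on the weights during optimization.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```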
4. Use Cases:
• RNN: Suitable for tasks where short-term dependencies are sufficient, such as
basic text generation or simple time series.
• LSTM: Preferred for tasks with long dependencies, like language translation,
speech recognition, and complex time series.
5. Computational Complexity:
• RNN: Simpler and faster to train but may underperform on complex sequences.
• LSTM: More computationally intensive but more effective for long sequences.
In summary, LSTMs extend RNNs with a more sophisticated memory
mechanism, making them better suited for tasks requiring long-term memory.
2. Hidden State: The RNN maintains a hidden state, which is used to capture the
context of the input sequence. Let's denote the hidden state at time step t as h_t.
3. Weighted Sum: At each time step, the input is multiplied by the input weights (W_x),
and the hidden state is multiplied by the recurrent weights (W_h). The results are
summed to produce a weighted sum.
5. Hidden State Update: The output of the RNN is used to update the hidden state for
the next time step. The updated hidden state is computed as: h_t = σ(W_x * x_t + W_h *
h_(t-1))
6. Output: The final output of the RNN is typically the output at the last time step.
Mathematical Representation
The forward propagation process in an RNN can be mathematically represented as:
h_t = σ(W_x * x_t + W_h * h_(t-1))
o_t = σ(W_o * h_t)
where:
- h_t is the hidden state at time step t
- x_t is the input at time step t
- W_x, W_h, and W_o are the input, recurrent, and output weights, respectively
- σ is the activation function
Key Takeaways
- Forward propagation in an RNN involves computing the output of the network given
an input sequence.
- The RNN maintains a hidden state that captures the context of the input sequence.
- The output of the RNN is computed using the hidden state and the input at each
time step.
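A compact NumPy sketch of the forward propagation equations above, h_t = σ(W_x * x_t + W_h * h_(t-1)) and o_t = σ(W_o * h_t); the dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 5, 2
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input weights
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # recurrent weights
W_o = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # output weights

def rnn_forward(sequence):
    h = np.zeros(hidden_dim)                # initial hidden state
    for x_t in sequence:
        h = sigmoid(W_x @ x_t + W_h @ h)    # hidden state update at each time step
    return sigmoid(W_o @ h)                 # output at the last time step

print(rnn_forward(rng.normal(size=(4, input_dim))))   # a sequence of 4 time steps
```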
Marks: 10
1. Discuss various performance metrics used to evaluate a deep learning model, with
examples.
Evaluating the performance of a deep learning model is crucial to determine its
effectiveness and identify areas for improvement. Here are some common
performance metrics used to evaluate deep learning models, along with examples:
Regression Metrics
1. Mean Squared Error (MSE): Measures the average squared difference between
predicted and actual values.
Example: A model achieves an MSE of 0.05 on a test dataset, indicating a small
average difference between predicted and actual values.
2. Mean Absolute Error (MAE): Measures the average absolute difference between
predicted and actual values.
Example: A model achieves an MAE of 0.03 on a test dataset, indicating a small
average absolute difference between predicted and actual values.
3. Coefficient of Determination (R-squared): Measures the proportion of variance in the
dependent variable that is predictable from the independent variable(s).
Example: A model achieves an R-squared of 0.8 on a test dataset, indicating that 80%
of the variance in the dependent variable is predictable from the independent
variable(s).
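A short sketch computing the three regression metrics on toy values (the numbers are illustrative, not the ones quoted above):

```python
import numpy as np

y_true = np.array([3.0, 2.5, 4.0, 5.1])
y_pred = np.array([2.8, 2.7, 3.9, 5.0])

mse = np.mean((y_true - y_pred) ** 2)        # Mean Squared Error
mae = np.mean(np.abs(y_true - y_pred))       # Mean Absolute Error
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                     # Coefficient of Determination

print(f"MSE={mse:.4f}  MAE={mae:.4f}  R^2={r2:.4f}")
```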
Other Metrics
1. Loss: Measures the difference between the model's predictions and the actual
values.
Example: A model achieves a loss of 0.2 on a test dataset, indicating a moderate
difference between predicted and actual values.
2. Perplexity: Measures the uncertainty of the model's predictions.
Example: A model achieves a low perplexity on a test dataset, indicating low
uncertainty in its predictions (lower perplexity means more confident predictions).
Suppose we are building a deep learning model to classify images as either "cats" or
"dogs". We use a convolutional neural network (CNN) and train it on a dataset of
labeled images. After training, we evaluate the model's performance on a test dataset
using the following metrics:
- Accuracy: 0.9
- Precision: 0.85
- Recall: 0.95
- F1-score: 0.9
- ROC-AUC: 0.95
These metrics indicate that the model is performing well, with high accuracy,
precision, recall, and F1-score. The ROC-AUC score also indicates excellent
performance in distinguishing between the two classes.
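A sketch of the classification metrics listed above using scikit-learn (an assumed tooling choice); the labels and scores are toy values, not results from an actual model:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]    # 1 = dog, 0 = cat (toy labels)
y_pred  = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]    # hard class predictions
y_score = [0.9, 0.2, 0.8, 0.7, 0.3, 0.95, 0.6, 0.1, 0.85, 0.75]  # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_score))
```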
2. List and explain the various activation functions used in modelling an artificial
neuron. Also explain their suitability with respect to applications.
Here are some common activation functions used in artificial neural networks, along
with their explanations and suitability for different applications:
_Suitability_ (Sigmoid): Binary classification problems, such as spam vs. not-spam emails.
_Suitability_ (Softmax): Multiclass classification problems, such as image classification, natural
language processing.
_Suitability_ (ReLU): Deep neural networks, image classification, natural language processing.
In summary, the choice of activation function depends on the specific problem you are
trying to solve, as well as the architecture of your neural network.
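A small NumPy sketch of the three activation functions referred to above (Sigmoid, Softmax, ReLU), assuming those are the intended functions:

```python
import numpy as np

def sigmoid(z):                       # suits binary classification outputs
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):                       # suits multiclass classification outputs
    e = np.exp(z - np.max(z))         # shift for numerical stability
    return e / e.sum()

def relu(z):                          # suits hidden layers of deep networks
    return np.maximum(0, z)

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z), softmax(z), relu(z), sep="\n")
```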
Choosing the right loss function depends on the specific problem you're trying to
solve. For example, MSE is commonly used for regression problems, while cross-entropy
loss (CEL) is commonly used for classification problems.
1. Neurons: The model consists of binary neurons, meaning each neuron can be in one
of two states: "firing" (1) or "not firing" (0).
2. Inputs and Weights: Each neuron receives multiple inputs, each associated with a
weight. These weights determine the strength of each input. The neuron sums the
weighted inputs.
3. Threshold: A threshold is applied to the sum of the weighted inputs. If the sum
exceeds the threshold, the neuron "fires" (output = 1); otherwise, it does not fire
(output = 0).
4. Activation Function: The activation function in the McCulloch-Pitts model is a step
function:
•If the sum of inputs is greater than or equal to a threshold, the output is 1.
•If the sum of inputs is less than the threshold, the output is 0.
Mathematical Representation:
Given a set of inputs \( x_1, x_2, \dots, x_n \) with corresponding weights \( w_1, w_2,
\dots, w_n \), the output y of the McCulloch-Pitts neuron is determined by:
y=
\begin{cases}
1 & \text{if } \sum_{i=1}^{n} w_i \cdot x_i \geq \theta \\
0 & \text{if } \sum_{i=1}^{n} w_i \cdot x_i < \theta
\end{cases}
Where:
• \theta is the threshold value.
• w_i is the weight of the i^{th} input.
Example:
Consider a simple McCulloch-Pitts neuron with two inputs, x_1 and x_2, having
weights w_1 = 0.5 and w_2 = 0.6, and a threshold \theta = 1.0.
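Continuing that example, a tiny sketch of the step rule with the given weights and threshold (the input values are chosen for illustration):

```python
def mp_neuron(x1, x2, w1=0.5, w2=0.6, theta=1.0):
    # McCulloch-Pitts step rule: fire (1) if the weighted sum reaches the threshold.
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

print(mp_neuron(1, 1))  # 0.5 + 0.6 = 1.1 >= 1.0 -> 1 (fires)
print(mp_neuron(1, 0))  # 0.5 < 1.0             -> 0 (does not fire)
print(mp_neuron(0, 1))  # 0.6 < 1.0             -> 0 (does not fire)
```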
# Preprocessing
1. Data preparation: Collect and preprocess the data.
2. Feature engineering: Select relevant features and transform them.
Hyperparameters to Optimize
1. Learning rate (alpha): 0.01, 0.1, 1.0
2. Number of hidden layers (hidden_layers): 1, 2, 3
3. Regularization strength (reg_strength): 0.0, 0.5, 1.0
Grid Search
| alpha | hidden_layers | reg_strength | Accuracy |
| --- | --- | --- | --- |
| 0.01 | 1 | 0.0 | 0.85 |
| 0.01 | 1 | 0.5 | 0.88 |
| 0.01 | 1 | 1.0 | 0.82 |
| ... | ... | ... | ... |
| 1.0 | 3 | 0.5 | 0.92 |
| 1.0 | 3 | 1.0 | 0.89 |
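A minimal grid-search sketch over the hyperparameters listed above; train_and_evaluate is a hypothetical stand-in for training the model and returning validation accuracy:

```python
from itertools import product

def train_and_evaluate(alpha, hidden_layers, reg_strength):
    # Hypothetical stand-in: in practice this would train and validate the model.
    return 0.8 + 0.01 * hidden_layers - 0.02 * abs(reg_strength - 0.5)

learning_rates = [0.01, 0.1, 1.0]
layer_options  = [1, 2, 3]
reg_strengths  = [0.0, 0.5, 1.0]

best_score, best_params = -1.0, None
for alpha, layers, reg in product(learning_rates, layer_options, reg_strengths):
    score = train_and_evaluate(alpha, layers, reg)
    if score > best_score:
        best_score, best_params = score, (alpha, layers, reg)

print("Best accuracy:", round(best_score, 3), "with (alpha, layers, reg) =", best_params)
```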
# Advantages
1. Easy to implement: Grid search is straightforward to implement.
2. Interpretable results: Grid search provides interpretable results.
# Disadvantages
1. Computationally expensive: Grid search can be computationally expensive.
2. Overfitting: Grid search can lead to overfitting if the grid is too large.
# Alternatives
1. Random search: Random search can be more efficient than grid search.
2. Bayesian optimization: Bayesian optimization can be more efficient and effective
than grid search.
# Preparation
1. Define the problem: Clearly define the problem you're trying to solve.
2. Choose a model: Select a suitable machine learning model for your problem.
3. Prepare data: Split your data into training, validation, and testing sets.
# Hyperparameter Selection
1. Identify hyperparameters: Determine the hyperparameters to tune, such as learning
rate, regularization strength, or number of hidden layers.
2. Define hyperparameter ranges: Specify the range of values for each hyperparameter.
3. Choose a tuning method: Select a hyperparameter tuning method, such as grid
search, random search, or Bayesian optimization.
# Hyperparameter Tuning
1. Grid search: Evaluate the model with each combination of hyperparameters in the
grid.
2. Random search: Randomly sample hyperparameters from the defined ranges (see the sketch after this list).
3. Bayesian optimization: Use a probabilistic approach to optimize hyperparameters.
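A minimal random-search sketch over the same kind of ranges; train_and_evaluate is again a hypothetical stand-in for training plus validation:

```python
import random

def train_and_evaluate(alpha, hidden_layers, reg_strength):
    # Hypothetical stand-in returning a validation score.
    return 0.8 + 0.01 * hidden_layers - 0.02 * abs(reg_strength - 0.5)

random.seed(0)
best_score, best_params = -1.0, None
for _ in range(10):                                   # 10 random trials
    alpha = 10 ** random.uniform(-2, 0)               # learning rate in [0.01, 1.0]
    layers = random.randint(1, 3)                     # number of hidden layers
    reg = random.uniform(0.0, 1.0)                    # regularization strength
    score = train_and_evaluate(alpha, layers, reg)
    if score > best_score:
        best_score, best_params = score, (alpha, layers, reg)

print("Best:", round(best_score, 3), best_params)
```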
# Evaluation
1. Measure performance: Evaluate the model's performance using a metric (e.g.,
accuracy, precision, recall).
2. Compare results: Compare the performance of different hyperparameter
combinations.
# Best Practices
1. Start with a small grid: Begin with a small grid and gradually increase the size.
2. Use cross-validation: Use cross-validation to evaluate model performance.
3. Monitor overfitting: Monitor for overfitting and adjust hyperparameters accordingly.
4. Document results: Document the hyperparameter tuning process and results.
By following these steps and best practices, you can effectively perform
hyperparameter tuning and improve your machine learning model's performance.