Deep Learning 15
1. What is a single layer perceptron? Show how a single layer perceptron can be used to represent the Boolean functions AND, OR, NOR, and NAND.
Single Layer Perceptron: A simple neural network model with a single layer of
output neurons connected to input features. It applies a linear combination of
inputs and weights, followed by an activation function.
o AND: The perceptron fires only when both inputs are 1. Both weights are positive, and the bias is negative enough that a single active input cannot cross the threshold (e.g., w1 = w2 = 1, bias = -1.5).
o OR: It fires if either input is 1. Both weights are positive, and the bias is only slightly negative (e.g., w1 = w2 = 1, bias = -0.5).
o NAND: The negation of AND. Both weights are negative and the bias is positive, so the perceptron fires unless both inputs are 1 (e.g., w1 = w2 = -1, bias = 1.5).
o NOR: The negation of OR. Both weights are negative and the bias is small and positive, so it fires only when both inputs are 0 (e.g., w1 = w2 = -1, bias = 0.5). See the sketch below.
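Below is a minimal Python sketch (not from the original notes) that checks example weight choices for each gate against its truth table; the specific weight and bias values are illustrative assumptions, and many other values work.

# Minimal single-layer perceptron: fires (outputs 1) when w.x + b > 0.
def perceptron(x1, x2, w1, w2, bias):
    return 1 if (w1 * x1 + w2 * x2 + bias) > 0 else 0

# Illustrative weight/bias choices for each Boolean gate.
gates = {
    "AND":  (1, 1, -1.5),
    "OR":   (1, 1, -0.5),
    "NAND": (-1, -1, 1.5),
    "NOR":  (-1, -1, 0.5),
}

for name, (w1, w2, b) in gates.items():
    table = [(x1, x2, perceptron(x1, x2, w1, w2, b))
             for x1 in (0, 1) for x2 in (0, 1)]
    print(name, table)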
2. What is the significance of the activation function used in a deep neural network? Explain different types of activation functions used in deep neural networks.
Significance:
Activation functions introduce non-linearity into the network; without them, a stack of layers would collapse into a single linear transformation and could not model complex patterns.
Types of Activation Functions:
1. Sigmoid: Squashes inputs to the range (0, 1); commonly used for binary classification outputs, but saturates for large inputs.
2. Tanh: Squashes inputs to (-1, 1); zero-centered, but also saturates at the extremes.
3. ReLU: Outputs max(0, x); cheap to compute and mitigates vanishing gradients, but units can "die" for negative inputs.
4. Leaky ReLU: Like ReLU but allows a small negative slope (e.g., 0.01x) for x < 0, keeping gradients alive.
5. Softmax: Converts a vector of scores into a probability distribution; used in the output layer for multi-class classification. A code sketch of these functions follows the list.
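The following NumPy sketch (an illustrative addition, not from the notes) implements the activation functions listed above; the sample input vector is arbitrary.

import numpy as np

# Common activation functions (element-wise), written with NumPy.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), softmax(x), sep="\n")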
3. What do you mean by the vanishing gradient problem that generally occurs in deep neural networks? How is it distinguished from the gradient descent problem?
o Vanishing Gradient Problem: During backpropagation the gradients are multiplied layer by layer; with saturating activations such as sigmoid or tanh, these repeated multiplications by values smaller than 1 make the gradients of the early layers shrink toward zero, so those layers learn very slowly or not at all.
o Gradient Descent is the optimization algorithm itself, which can face issues like slow convergence or saddle points, but it is not specifically about the gradients disappearing; the vanishing gradient problem is a failure mode that can occur while running gradient descent on deep networks.
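A small numeric illustration (added here, not from the original notes): the sigmoid derivative is at most 0.25, so multiplying it across many layers drives the gradient toward zero even in the best case.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

grad = 1.0
for layer in range(1, 21):          # pretend the signal passes through 20 layers
    grad *= sigmoid_grad(0.0)       # 0.25 at x = 0, the sigmoid's maximum slope
    if layer in (1, 5, 10, 20):
        print(f"gradient after {layer} layers: {grad:.2e}")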
4. Explain the feedforward and backpropagation algorithms.
Feedforward Algorithm:
o The process of passing input data through the network layer by layer to get
the final output.
Backpropagation Algorithm:
o Step 1: Compute loss by comparing the predicted output with the actual
output using a loss function.
o Step 2: Calculate the gradient of the loss with respect to the weights using the chain rule (i.e., partial derivatives), propagating the error backwards layer by layer.
o Step 3: Update the weights in the direction that reduces the loss (e.g., gradient descent: w = w - learning_rate * gradient).
Example:
In a network with 2 layers, backpropagation computes the gradients starting from the output layer and moving backwards, then updates the weights of both layers to reduce the error. A minimal worked sketch follows.
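The following is a minimal worked sketch in NumPy, not the notes' original example; the layer sizes, synthetic data, and learning rate are arbitrary assumptions chosen only to show the feedforward pass, the backward chain-rule pass, and the weight update.

import numpy as np

# Minimal 2-layer network trained with manual backpropagation.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                  # 8 samples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0

W1, b1 = rng.normal(size=(3, 4)) * 0.5, np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)) * 0.5, np.zeros((1, 1))
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(500):
    # Feedforward: propagate inputs layer by layer.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    loss = np.mean((y_hat - y) ** 2)         # Step 1: mean squared error

    # Backpropagation: chain rule from the output back to the input layer.
    d_yhat = 2 * (y_hat - y) / len(y)        # dL/dy_hat
    d_z2 = d_yhat * y_hat * (1 - y_hat)      # through the output sigmoid
    dW2, db2 = h.T @ d_z2, d_z2.sum(axis=0, keepdims=True)
    d_h = d_z2 @ W2.T
    d_z1 = d_h * h * (1 - h)                 # through the hidden sigmoid
    dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0, keepdims=True)

    # Step 3: gradient descent update.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final loss:", loss)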
5. Why is batch normalization required while training a deep neural network model?
Problems it helps address:
1. Overfitting: The noise introduced by normalizing over mini-batches has a mild regularizing effect.
2. Covariate Shift: The distribution of each layer's inputs changes as earlier layers are updated (internal covariate shift); normalizing keeps the inputs to each layer in a stable range.
3. Vanishing Gradient: Keeping activations near zero mean and unit variance prevents them from drifting into the saturated regions of activations such as sigmoid/tanh.
4. Exploding Gradient: Normalization also keeps activations (and therefore gradients) from growing uncontrollably, allowing higher learning rates. A code sketch of the batch normalization computation is given under Q14 below.
7. With the help of suitable diagrams, explain the basic building blocks of a convolutional neural network and their operation.
1. Convolutional Layer: Slides a set of learnable filters (kernels) over the input and computes dot products, producing feature maps that detect local patterns such as edges and textures.
2. ReLU Activation: Applies max(0, x) element-wise to the feature maps, introducing non-linearity.
3. Pooling Layer: Downsamples each feature map (e.g., max pooling takes the maximum in each window), reducing spatial size and making the representation more robust to small translations. A minimal sketch of these three operations is shown below.
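A minimal NumPy sketch of the three building blocks (an illustrative addition; the image, filter, and sizes are arbitrary assumptions):

import numpy as np

def conv2d(image, kernel, stride=1):
    # Convolutional layer: slide the filter over the image and take dot products.
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    # ReLU activation: keep positive responses, zero out the rest.
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    # Pooling layer: take the maximum in each size x size window.
    oh, ow = x.shape[0] // size, x.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.random.rand(8, 8)
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge detector
features = max_pool(relu(conv2d(image, edge_filter)))
print(features.shape)   # (3, 3)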
8. Differentiate between overfitting and underfitting.
Overfitting:
o The model learns the noise and details of the training data too well,
resulting in poor generalization to new data.
Underfitting:
o The model fails to learn the underlying patterns in the data, resulting in
poor performance on both training and test data.
9. Explain different types of loss functions used in the context of deep neural networks.
1. Mean Squared Error (MSE): The average of the squared differences between predicted and actual values; the standard choice for regression tasks.
2. Cross-Entropy Loss: Measures the difference between the predicted probability distribution and the true labels; the standard choice for classification tasks.
3. Hinge Loss: Penalizes predictions that are on the wrong side of (or too close to) the decision margin; used for maximum-margin classifiers.
4. Huber Loss: A combination of MSE and MAE, robust to outliers, used for regression tasks. A short code sketch of these losses follows.
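Illustrative NumPy versions of these losses (an added sketch; the sample predictions and labels are arbitrary):

import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)        # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def hinge(y_true_pm1, scores):
    # y_true_pm1 uses labels in {-1, +1}
    return np.mean(np.maximum(0.0, 1.0 - y_true_pm1 * scores))

def huber(y_true, y_pred, delta=1.0):
    err = y_true - y_pred
    quad = np.minimum(np.abs(err), delta)    # quadratic part near zero
    lin = np.abs(err) - quad                 # linear part for large errors
    return np.mean(0.5 * quad ** 2 + delta * lin)

y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.6])
print(mse(y, p), binary_cross_entropy(y, p), huber(y, p))
print(hinge(np.array([1, -1, 1]), np.array([0.8, -0.3, 1.2])))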
10. What do you mean by parameter sharing in the context of deep neural networks?
Parameter Sharing:
o In CNNs: Filters are shared across the input image, meaning the same filter is applied to multiple parts of the image to detect similar features like edges, textures, etc. This drastically reduces the number of trainable parameters compared to a fully connected layer, as illustrated below.
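A back-of-the-envelope comparison (the layer sizes are illustrative assumptions) of the parameter counts with and without sharing:

# Parameters needed to map a 32x32x3 image to 32x32 feature maps.
image_h, image_w, channels = 32, 32, 3
num_filters, kernel = 16, 3

# Convolution: each 3x3x3 filter is reused at every spatial position.
conv_params = num_filters * (kernel * kernel * channels + 1)   # +1 for bias

# Fully connected: every output unit gets its own weight for every input pixel.
fc_params = (image_h * image_w * channels) * (image_h * image_w * num_filters)

print("conv layer parameters:", conv_params)        # 448
print("fully connected parameters:", fc_params)     # 50,331,648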
11. Explain different types of regularization techniques used in deep neural networks.
Regularization Techniques:
1. L2 Regularization (Ridge): Adds the sum of squared weights (scaled by λ) to the loss, pushing weights toward small values.
2. L1 Regularization (Lasso): Adds the sum of absolute weight values to the loss, encouraging sparse weights.
3. Dropout: Randomly zeroes a fraction of units during training so the network cannot rely on any single unit.
4. Early Stopping: Stops training when the validation loss stops improving, before the model starts to overfit.
5. Data Augmentation: Artificially enlarges the training set with label-preserving transformations (flips, crops, noise, etc.). A small code sketch of L2 regularization and dropout is given below.
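A small sketch (added here, with arbitrary example values) of how an L2 penalty and inverted dropout are typically applied:

import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4))
hidden = rng.normal(size=(2, 4))

# L2 regularization: total loss = data loss + lambda * sum of squared weights.
base_loss = 0.37                      # placeholder value for the data loss
lam = 0.01
total_loss = base_loss + lam * np.sum(weights ** 2)

# (Inverted) dropout: zero out units with probability p and rescale the rest.
p = 0.5
mask = (rng.random(hidden.shape) > p) / (1.0 - p)
hidden_dropped = hidden * mask        # at test time, dropout is simply skipped

print(total_loss)
print(hidden_dropped)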
12. Explain the components and training process of a Generative Adversarial Network (GAN).
Components of GANs
1. Generator (G):
o The generator network creates fake data samples from random noise
(latent space).
o The goal of the generator is to "fool" the discriminator into thinking the
generated data is real.
2. Discriminator (D):
o The discriminator network receives both real training samples and the generator's fake samples and classifies each as real or fake.
o The goal of the discriminator is to correctly identify real vs. fake samples.
Training Process
1. Train the Discriminator:
o A batch of real data is fed into the discriminator to compute the loss (how well it classifies real data as real), and a batch of the generator's fake data is used to compute how well it classifies fake data as fake.
2. Train the Generator:
o The fake data is sent to the discriminator, and the generator's loss is computed based on how well the discriminator classifies it as real; the generator's weights are then updated while the discriminator is held fixed.
3. Adversarial Training:
o These two steps alternate: the generator aims to improve its ability to create realistic data, while the discriminator aims to become better at distinguishing between real and fake data. A training-loop sketch is given after the diagram below.
Diagram
        Random Noise (z)                 Real Data
              |                              |
              v                              |
      +-----------------+                    |
      |  Generator (G)  |                    |
      +-----------------+                    |
              |  Fake Data                   |
              v                              v
            +--------------------------------+
            |       Discriminator (D)        |
            +--------------------------------+
                            |
                            v
                      Real or Fake?
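The following toy NumPy sketch (an added illustration, not the notes' example) mirrors the alternating training steps above. The 1-D Gaussian "real" data and the tiny linear generator/discriminator are deliberate simplifications, so here the generator only learns to match the mean of the real data; a real GAN uses deep networks for both players.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

a, b = 1.0, 0.0          # generator parameters: fake = a * z + b
w, c = 0.1, 0.0          # discriminator parameters: D(x) = sigmoid(w * x + c)
lr, batch = 0.05, 64

for step in range(3000):
    real = rng.normal(4.0, 1.25, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # 1. Train the discriminator: push D(real) toward 1 and D(fake) toward 0.
    ds_real = sigmoid(w * real + c) - 1.0     # d(-log D(real)) / d(score)
    ds_fake = sigmoid(w * fake + c)           # d(-log(1 - D(fake))) / d(score)
    w -= lr * np.mean(ds_real * real + ds_fake * fake)
    c -= lr * np.mean(ds_real + ds_fake)

    # 2. Train the generator (discriminator frozen): push D(fake) toward 1.
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b
    d_fake = (sigmoid(w * fake + c) - 1.0) * w   # d(-log D(fake)) / d(fake)
    a -= lr * np.mean(d_fake * z)
    b -= lr * np.mean(d_fake)

print("generator mean %.2f vs. real-data mean 4.00" % b)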
Conclusion
The interaction between the generator and the discriminator forms a minimax game,
where the generator tries to minimize the likelihood of detection by the discriminator,
while the discriminator tries to maximize its accuracy. Over time, as both networks
improve, the generator produces high-quality samples that closely resemble the training
data, achieving the goal of generating realistic outputs.
13. Determine the shape of the output matrix of an image of size 19×19 that uses a
padding of size = 2, stride size = 2, and filter size = 5×5.
To determine the shape of the output matrix after applying a convolutional layer with a specified image size, padding, stride, and filter size, we can use the following formula:
Output size = floor((N - F + 2P) / S) + 1
where N is the input size (19), F the filter size (5), P the padding (2), and S the stride (2).
Output size = floor((19 - 5 + 2*2) / 2) + 1 = floor(18 / 2) + 1 = 10
So the output matrix has shape 10 × 10 (per filter).
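As a quick check, the same arithmetic can be expressed as a small helper function (an added sketch; the function name is arbitrary):

def conv_output_size(n, f, p, s):
    # Output side length for input size n, filter size f, padding p, stride s.
    return (n - f + 2 * p) // s + 1

print(conv_output_size(19, 5, 2, 2))   # -> 10, i.e. a 10 x 10 output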
14. What are the hyperparameters that can be optimized for the batch normalization technique?
Batch normalization (BN) is a technique used to stabilize and accelerate the training of
deep neural networks by normalizing the inputs of each layer. When using batch
normalization, several hyperparameters can be optimized to improve the model's
performance. Here are the key hyperparameters that can be optimized for batch
normalization:
1. Momentum (β): Controls how quickly the running estimates of mean and variance (used at inference time) are updated from each mini-batch.
2. Epsilon (ε): A small constant added to the batch variance before taking the square root, to avoid division by zero and improve numerical stability.
3. Learning Rate:
Description: The rate at which the model's weights are updated during training.
Impact: The learning rate can affect how well batch normalization works. Sometimes, adjusting the learning rate can lead to better convergence when using batch normalization.
4. Batch Size:
Description: The number of samples used to compute the batch mean and variance at each training step.
Impact: The size of the batch can influence the estimation of the mean and variance. Larger batches can provide more stable estimates, while smaller batches might lead to noisier estimates.
5. Scale (γ) and Shift (β) Parameters:
Description: In batch normalization, scale (γ) and shift (β) parameters can be learned during training.
Impact: These parameters allow the network to learn how to adjust the normalized values, which can improve model performance. Optimizing their initial values and whether to use them at all can be critical. See the sketch below this list.
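The sketch below (an added illustration, not from the notes) shows where these hyperparameters appear in a simple batch-normalization layer; the class name and default values are assumptions.

import numpy as np

class BatchNorm1D:
    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)    # learnable scale
        self.beta = np.zeros(num_features)    # learnable shift
        self.momentum, self.eps = momentum, eps
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training=True):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Running statistics (controlled by momentum) are kept for inference.
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)   # eps avoids division by zero
        return self.gamma * x_hat + self.beta

bn = BatchNorm1D(num_features=4)
batch = np.random.randn(32, 4) * 10 + 5       # the batch size also affects the statistics
out = bn.forward(batch)
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # ~0 and ~1 per feature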
Conclusion