2 Marks Gen AI
1. Biological vs. Artificial Neurons: Biological neurons transmit electrical impulses across
synapses to communicate with other neurons. In contrast, artificial neurons in neural
networks simulate this by processing inputs with weighted connections, and an activation
function determines output. While biological neurons are much more complex, artificial
neurons aim to mimic their behavior for tasks like pattern recognition and learning.
3. Perceptron & XOR: The perceptron, a simple linear classifier, fails to solve the XOR problem
because XOR is not linearly separable. The perceptron can only draw a straight line (or
hyperplane) to separate classes, but XOR requires a non-linear boundary. This limitation
highlights the need for more complex models, like multi-layer neural networks, which can
capture non-linear relationships.
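For illustration, a minimal NumPy sketch (not part of the original notes) of the perceptron learning rule applied to XOR; because no straight line separates the classes, the rule keeps misclassifying at least one point in every epoch:

```python
import numpy as np

# XOR truth table: inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(100):
    errors = 0
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)   # step activation
        update = lr * (target - pred)       # perceptron learning rule
        w += update * xi
        b += update
        errors += int(update != 0)

# errors never reaches 0: XOR is not linearly separable
print("misclassifications in the last epoch:", errors)
```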
5. Weights & Biases: Weights in a neural network control the importance of each input feature,
while biases help shift the activation function, allowing for better fitting of data. During
training, weights and biases are adjusted through backpropagation to minimize the error
between the predicted and actual outputs. Proper tuning ensures the model learns the
underlying patterns of the data.
6. Vanishing Gradient Problem: The vanishing gradient problem occurs in deep networks when
gradients of the loss function become very small during backpropagation, preventing the
model from learning effectively. This happens when activation functions like sigmoid squash
values into a narrow range, causing gradients to diminish as they propagate back through
layers. This issue can hinder deep network training, especially in early layers.
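As a rough illustration (a simplified chain that ignores the weight factors real backpropagation would also multiply in), repeatedly multiplying sigmoid derivatives layer by layer drives the gradient toward zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)      # never exceeds 0.25 (its value at z = 0)

grad = 1.0
for layer in range(20):          # a 20-layer chain of sigmoid units
    z = np.random.randn()        # pre-activation at this layer
    grad *= sigmoid_grad(z)      # chain rule multiplies the local derivatives

print(f"gradient reaching the first layer: {grad:.2e}")  # typically ~1e-13 or smaller
```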
7. Batch vs. Stochastic Gradient Descent: Batch gradient descent calculates the gradient using
the entire dataset to update the model weights, making it computationally expensive.
Stochastic gradient descent (SGD) updates weights after each training sample, which is faster
and often leads to quicker convergence. However, SGD introduces noise, making it less stable
than batch gradient descent but beneficial in large datasets and online learning.
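A small NumPy comparison of the two update schemes on synthetic linear-regression data (the dataset, learning rate, and epoch count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)
lr = 0.05

# Batch gradient descent: one update per pass over the whole dataset
w_batch = np.zeros(3)
for epoch in range(50):
    grad = 2 * X.T @ (X @ w_batch - y) / len(y)    # gradient of the mean squared error
    w_batch -= lr * grad

# Stochastic gradient descent: one (noisier) update per training sample
w_sgd = np.zeros(3)
for epoch in range(50):
    for i in rng.permutation(len(y)):
        grad_i = 2 * X[i] * (X[i] @ w_sgd - y[i])  # gradient on a single sample
        w_sgd -= lr * grad_i

print("batch:", w_batch.round(2), "sgd:", w_sgd.round(2))  # both approach true_w
```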
8. Purpose of Padding in CNNs: Padding is used in convolutional neural networks (CNNs) to add
extra pixels around the border of the input so that the output feature map keeps the same (or a controlled) spatial size. It helps preserve spatial information at the borders of the image and prevents
the reduction of feature map size during convolution operations. Padding also helps in
detecting edge features more effectively.
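The standard output-size formula makes the effect concrete; the small helper below is an illustrative sketch:

```python
def conv_output_size(n, k, p=0, s=1):
    """Output width of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n, k = 32, 3
print(conv_output_size(n, k, p=0))        # 30: no padding shrinks the feature map
print(conv_output_size(n, k, p=1))        # 32: padding of 1 preserves size for a 3x3 kernel
print(conv_output_size(n, k, p=1, s=2))   # 16: a stride of 2 roughly halves it
```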
9. Weight Initialization in Deep Networks: Proper weight initialization in deep networks is
crucial for preventing the vanishing or exploding gradient problems during backpropagation.
By starting with small, randomly chosen weights, the network avoids symmetry issues and
ensures effective learning. Techniques like Xavier and He initialization are used to adjust
weights according to the activation function, speeding up convergence and improving
performance.
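A brief NumPy sketch of both schemes (the layer sizes are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 512, 256

# Xavier/Glorot initialization: suited to sigmoid/tanh layers
w_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

# He initialization: larger variance to compensate for ReLU zeroing half its inputs
w_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

print(w_xavier.std().round(3), w_he.std().round(3))   # ~0.051 and ~0.062
```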
11. Transfer Learning: Transfer learning leverages a pre-trained model on a similar, large dataset
and adapts it to a new task with less data. Instead of training a model from scratch,
knowledge learned from previous tasks is reused, saving time and computational resources.
Transfer learning is particularly useful when there is limited data for the new task but
sufficient data for a related task.
12. Advantages of ReLU: The ReLU (Rectified Linear Unit) activation function is preferred over sigmoid and tanh because it does not suffer from the vanishing gradient problem for positive
values. ReLU provides a fast and simple way to introduce non-linearity, speeds up
convergence during training, and ensures efficient learning. It has a straightforward
derivative and helps deep networks train more effectively.
13. Kernel in CNN: In convolutional neural networks (CNNs), a kernel (or filter) is a small matrix
applied to the input data, usually an image, to extract features like edges or textures. The
kernel slides over the input image and performs element-wise multiplication, creating a
feature map that highlights important patterns. Multiple filters are used in CNNs to capture
diverse features at different levels of abstraction.
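A minimal sketch of the sliding-window operation on a toy image; the 3x3 Sobel-style filter used here is just one illustrative kernel:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid cross-correlation: slide the kernel and sum element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                       # left half dark, right half bright
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])         # responds to vertical edges
print(convolve2d(image, sobel_x))        # large values along the edge column
```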
14. Max Pooling vs. Average Pooling: Max pooling and average pooling are techniques used in
CNNs for downsampling the feature map. Max pooling selects the maximum value from a
defined region, preserving the most important feature. Average pooling calculates the
average of values in the region, which helps reduce the complexity of the feature map and
smooths out the extracted features, retaining less specific information.
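A short NumPy sketch contrasting the two operations on a toy 4x4 feature map (non-overlapping 2x2 windows assumed):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 8, 6]], dtype=float)
print(pool2d(fmap, mode="max"))   # [[6. 2.] [2. 8.]]  keeps the strongest responses
print(pool2d(fmap, mode="avg"))   # [[3.5 1. ] [1.  6.5]]  smooths each region
```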
16. Learning Rate: The learning rate is a crucial hyperparameter that controls how much the
model weights are updated during training. A high learning rate may lead to instability and
overshooting the optimal solution, while a low learning rate can result in slow convergence,
requiring more epochs to reach the minimum. Finding the optimal learning rate is key to
efficient and effective training.
17. Dropout in Deep Networks: Dropout is a regularization technique that randomly disables a
percentage of neurons during training to prevent overfitting. It forces the model to learn
redundant representations, making it more robust and preventing dependence on any single
neuron. Dropout improves generalization, ensuring that the network doesn’t become too
specialized to the training data, leading to better performance on unseen data.
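A sketch of inverted dropout, the variant most frameworks use (the rate and shapes are illustrative):

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero a fraction p of units and rescale the survivors."""
    if not training:
        return activations                      # no-op at inference time
    mask = (np.random.rand(*activations.shape) > p).astype(activations.dtype)
    return activations * mask / (1.0 - p)       # rescale so the expected value is unchanged

h = np.ones((2, 8))
print(dropout(h, p=0.5))   # roughly half the entries are 0, the rest are 2.0
```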
18. ResNet Architecture: ResNet (Residual Networks) utilizes residual connections, allowing
gradients to flow directly through layers without degradation. This helps train very deep
networks by mitigating the vanishing gradient problem. By introducing shortcut connections
that bypass one or more layers, ResNet enables networks to have hundreds or even
thousands of layers while still being trainable, leading to improved accuracy in complex tasks.
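A minimal PyTorch sketch of one basic residual block; real ResNets add strided and projection variants, so treat this as a simplified illustration:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """y = ReLU(F(x) + x): the identity shortcut lets gradients bypass the conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # add the shortcut, then apply the non-linearity

block = BasicResidualBlock(64)
x = torch.randn(1, 64, 32, 32)
print(block(x).shape)   # torch.Size([1, 64, 32, 32])
```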
19. Object Detection vs. Image Classification: Object detection not only identifies the objects
within an image but also provides their locations through bounding boxes. In contrast, image
classification assigns a single label to an image without specifying the position of individual
objects. Object detection is more complex, requiring both classification and localization,
whereas image classification is simpler, focusing solely on identifying the object or scene as a
whole.
20. Optimizer in Deep Learning: Optimizers adjust the weights of a neural network during
training by using the gradients calculated through backpropagation. Popular optimizers, like
stochastic gradient descent (SGD) and Adam, help minimize the loss function, allowing the
model to improve its accuracy. These algorithms choose different strategies for updating
weights, balancing between computational efficiency and convergence speed during the
learning process.
21. Bounding Box in Object Detection: A bounding box is a rectangular box that defines the
location of an object in an image, typically represented by the coordinates of the top-left and
bottom-right corners. It is used in object detection tasks to localize and identify objects
within the image. Bounding boxes are essential for evaluating the precision of object
detection models and ensuring accurate localization of objects.
22. Inception Module in GoogLeNet: The Inception module in GoogLeNet allows the network to
learn multi-scale features by applying multiple convolution filters of different sizes in parallel.
This architectural innovation improves efficiency by extracting diverse features without
significantly increasing the computational cost. It helps the model capture a wide range of
spatial information, making it more robust for image classification and recognition tasks.
23. Stride in CNNs: Stride refers to the step size with which the convolutional filter moves across
the input image. A stride of 1 means the filter moves pixel by pixel, preserving spatial
resolution, while larger strides reduce the output size by skipping pixels. Adjusting the stride
affects the spatial dimensions of the output feature map, impacting the amount of
information retained during convolution.
25. Landmark Detection in Computer Vision: Landmark detection involves identifying specific
key points within an image, such as facial features (eyes, nose) or body joints (elbows,
knees). This technique is essential for tasks like facial recognition, emotion detection, and
human pose estimation. By detecting landmarks, computer vision models can accurately
interpret and analyze visual data, enabling applications in areas like augmented reality and
biometrics.
26. UNet vs. Traditional CNN: UNet is designed for semantic segmentation, where pixel-level
accuracy is required. Unlike traditional CNNs, UNet has an encoder-decoder structure with
skip connections between layers, preserving spatial resolution. This architecture allows UNet
to recover fine-grained details that are crucial in segmentation tasks, especially in medical
imaging, where precise object boundaries are important for accurate analysis.
27. Motivation for YOLO in Object Detection: YOLO (You Only Look Once) performs real-time
object detection by analyzing the entire image in a single forward pass. This approach makes
YOLO faster and more efficient than traditional methods, which require multiple passes over
an image. YOLO's ability to predict multiple objects in one go, with both localization and
classification, makes it ideal for real-time applications like video surveillance and
autonomous vehicles.
28. Deeper Networks Perform Better: Deeper networks can learn more complex and abstract
features from data, making them capable of modeling intricate patterns. While shallow
networks may struggle to capture high-level abstractions, deeper networks can progressively
extract more meaningful representations from raw data, leading to better performance on
tasks such as image recognition, natural language processing, and more.
29. Non-linearity in Neural Networks: Non-linearity enables neural networks to model complex,
real-world relationships that linear models cannot capture. By introducing non-linear
activation functions like ReLU or sigmoid, neural networks can learn intricate patterns and
make more accurate predictions. Non-linearity allows the network to approximate any
function, enabling the deep learning models to solve complex tasks beyond simple linear
regression.
30. Weight Decay in Neural Networks: Weight decay is a regularization technique that adds a
penalty to the loss function based on the magnitude of weights. This encourages the model
to learn smaller weights, reducing overfitting and improving generalization. Weight decay is essentially L2 regularization expressed through the update rule (the two coincide for plain SGD) and is often combined with other techniques, such as dropout, to keep the network from becoming too complex and overfitting the training data.
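A tiny sketch of the decayed SGD update (learning rate and decay values are illustrative):

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=0.01, wd=1e-4):
    """w <- w - lr * (grad + wd * w): same as adding (wd/2) * ||w||^2 to the loss."""
    return w - lr * (grad + wd * w)

w = np.array([0.8, -1.2, 0.3])
grad = np.array([0.05, -0.02, 0.10])
print(sgd_step_with_weight_decay(w, grad))   # every weight is nudged slightly toward zero
```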
MODULE-3
1. Sequence Data and Examples: Sequence data refers to data where the order of
elements matters. Each element depends on its predecessor. Examples include time-
series data, like stock prices over time, and text data, such as sentences where word
order is crucial to meaning. Sequence data captures temporal or contextual
relationships between elements in the data.
2. RNNs vs. Feedforward Networks: Recurrent Neural Networks (RNNs) differ from
traditional feedforward networks by having connections that loop back, allowing
them to maintain memory of previous inputs. This enables RNNs to model sequential
data and temporal dependencies. In contrast, feedforward networks process inputs
independently, lacking memory, which limits their ability to capture time-dependent
patterns in data.
3. Key Limitation of Standard RNNs: The key limitation of standard RNNs is their
difficulty in capturing long-term dependencies due to the vanishing gradient
problem. As gradients are backpropagated through time, they can diminish, making it
difficult for the model to learn relationships in long sequences. This limits the
performance of RNNs on tasks requiring long-range memory.
4. Temporal Dependencies in Sequence Modeling: Temporal dependencies in
sequence modeling refer to the relationships between elements in a sequence where
current values depend on previous ones. This is crucial in time-series data and
language modeling, where past information influences future predictions. Properly
modeling these dependencies allows for better predictions in tasks like speech
recognition, machine translation, and forecasting.
5. Significance of the Hidden State in RNNs: The hidden state in RNNs stores
information about the previous time steps and captures the temporal dependencies
in the data. It allows the network to maintain memory, which is updated at each step
as new data is processed. The hidden state is crucial for RNNs to understand
sequential patterns and generate context-aware outputs.
6. Vanishing Gradient Problem in RNNs: The vanishing gradient problem occurs when
gradients become very small during backpropagation through time, causing weights
to update minimally. This problem is particularly severe in long sequences,
preventing RNNs from learning long-term dependencies. It arises due to repeated
multiplication of small gradients, making it difficult for the model to learn effectively
over many time steps.
7. Exploding Gradient in RNN Training: Exploding gradients occur when gradients grow
exponentially during backpropagation, leading to excessively large weight updates.
This causes numerical instability and results in the model's weights becoming too
large, which can cause training to fail. Exploding gradients are particularly
problematic in deep networks and require techniques like gradient clipping to
mitigate their effects during training.
8. Activation Function in RNNs: The commonly used activation function in RNNs is the
hyperbolic tangent (tanh) function or the sigmoid function. These functions are used
because they squash values into a bounded range, keeping the hidden-state activations stable across time steps. Tanh is usually preferred because its (-1, 1) output is zero-centered and wider than sigmoid's, giving somewhat stronger gradients, although it does not fully prevent them from vanishing.
9. Backpropagation Through Time (BPTT) in RNNs: Backpropagation Through Time
(BPTT) is an extension of backpropagation used to train RNNs. It involves unrolling
the RNN over time steps, calculating the gradient at each step, and updating the
weights accordingly. BPTT allows the RNN to learn from sequential data by
propagating errors back through the time steps, adjusting weights to minimize the
loss function.
10. LSTMs vs. Vanilla RNNs: Long Short-Term Memory (LSTM) networks are different
from vanilla RNNs due to their ability to remember long-term dependencies through
specialized components like the forget, input, and output gates. LSTMs mitigate the vanishing gradient problem by controlling the flow of information, allowing them to
retain important information over longer sequences compared to vanilla RNNs,
which suffer from memory limitations.
11. Forget Gate in an LSTM: The forget gate in an LSTM controls what information from
the cell state should be discarded or forgotten. It takes the previous hidden state and
the current input to produce a value between 0 and 1, determining how much of the
previous cell state should be retained or erased. This helps prevent the model from
retaining irrelevant information.
12. Purpose of the Input Gate in an LSTM: The input gate in an LSTM controls what new
information should be added to the cell state. It combines the current input and the
previous hidden state to generate a value between 0 and 1, which determines how
much of the new input should influence the cell state. This gate ensures relevant
information is updated.
13. Role of Cell State in an LSTM: The cell state in an LSTM is responsible for carrying
information across time steps. It acts as a memory, holding long-term dependencies
that are updated by the forget and input gates. The cell state allows LSTMs to
maintain relevant information over long sequences, helping the model capture long-
range dependencies without suffering from vanishing gradients.
14. Output Gate in an LSTM: The output gate in an LSTM determines what information
from the cell state should be passed to the next layer or output. It combines the
current input and the previous hidden state, creating a value between 0 and 1 that
controls how much of the cell state is exposed as the output of the LSTM unit at each
time step.
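The following NumPy sketch ties items 10 to 14 together by running one LSTM time step; the stacked weight layout and gate ordering are just one common convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step. W, U, b hold the stacked parameters of the four gate blocks."""
    z = W @ x + U @ h_prev + b          # shape (4 * hidden,)
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                      # forget gate: what to erase from the cell state
    i = sigmoid(i)                      # input gate: how much new content to write
    o = sigmoid(o)                      # output gate: how much of the cell to expose
    g = np.tanh(g)                      # candidate cell content
    c = f * c_prev + i * g              # updated cell state (long-term memory)
    h = o * np.tanh(c)                  # new hidden state
    return h, c

hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, inputs))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, U, b)
print(h.shape, c.shape)   # (4,) (4,)
```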
15. GRUs vs. LSTMs in Computational Efficiency: Gated Recurrent Units (GRUs) are more
computationally efficient than LSTMs because they have fewer gates (reset and
update gates) and simpler mechanisms. Unlike LSTMs, which use separate forget, input, and output gates plus a cell state, GRUs merge the forget and input gates into a single update gate and fold the cell state into the hidden state, making them faster to train and less computationally expensive while achieving similar performance on many tasks.
16. Reset Gate in a GRU: The reset gate in a Gated Recurrent Unit (GRU) decides how
much of the previous hidden state should be ignored when computing the current
state. It allows the GRU to reset the memory of the network selectively, enabling it to
focus on the most relevant information from the previous time steps, especially in
tasks with varying temporal dependencies.
17. Update Gate in a GRU: The update gate in a Gated Recurrent Unit (GRU) controls the
balance between using the current hidden state and the previous state. It helps
decide how much information from the previous hidden state should be retained and
how much should be updated with new information. The update gate ensures the
GRU can adapt to varying time dependencies efficiently.
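A matching NumPy sketch of one GRU time step (biases omitted; the z / (1 - z) interpolation shown is one common convention):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Wr, Wh, Uz, Ur, Uh):
    """One GRU time step with reset gate r and update gate z."""
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate: new content vs. old state
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate: how much past state to use
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate hidden state
    return (1 - z) * h_prev + z * h_tilde           # interpolate old and candidate states

rng = np.random.default_rng(0)
hid, dim = 4, 3
mats = [rng.normal(size=(hid, dim)) for _ in range(3)] + \
       [rng.normal(size=(hid, hid)) for _ in range(3)]
h = gru_step(rng.normal(size=dim), np.zeros(hid), *mats)
print(h.shape)   # (4,)
```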
18. Bi-Directional RNNs vs. Unidirectional RNNs: A bi-directional RNN improves
performance by processing the input sequence in both forward and backward
directions, allowing it to capture context from both past and future elements. This is
especially useful in tasks like speech recognition and machine translation, where the
meaning of a word depends on both its previous and subsequent context in the
sequence.
19. Computational Cost of Bi-Directional RNNs: Bi-directional RNNs require twice the
computational cost of unidirectional RNNs because they process the input sequence
in two directions—forward and backward. This means that each time step is
computed twice, leading to increased memory usage and longer training times.
Despite the higher cost, bi-directional RNNs often provide better performance in
tasks requiring full context understanding.
20. Tasks Benefiting from Bi-Directional Models: Bi-directional models are especially
beneficial for tasks like named entity recognition, machine translation, speech
recognition, and sentiment analysis. These tasks require context from both the past
and the future in a sequence to make more accurate predictions. Bi-directional
models capture these dependencies effectively, enhancing performance by
understanding the full context around each word or phrase.
21. GRUs vs. LSTMs in Computational Efficiency: GRUs are generally more
computationally efficient than LSTMs because they have fewer parameters and
simpler structures. LSTMs have separate input, forget, and output gates along with a cell state, while GRUs use only reset and update gates and merge the cell state into the hidden state. This simplicity in GRUs results in
faster training and reduced computational costs, making them preferable for tasks
where efficiency is a concern.
22. LSTMs and Long-Term Dependencies: LSTMs are better suited for handling long-term
dependencies than vanilla RNNs because they utilize memory cells and gating
mechanisms (input, output, and forget gates). These gates allow LSTMs to selectively
retain or forget information, mitigating the vanishing gradient problem and enabling
the network to remember relevant information over longer time spans, making them
effective for tasks with long-range dependencies.
23. Sequence-to-Sequence Model: A sequence-to-sequence model is a type of neural
network architecture designed to transform one sequence into another. It typically
consists of an encoder, which processes the input sequence, and a decoder, which
generates the output sequence. Sequence-to-sequence models are widely used in
tasks like machine translation, text summarization, and speech recognition, where
input and output are sequential.
24. Teacher Forcing in RNN Training: Teacher forcing is a technique used in training RNN-
based models, where the true output from the previous time step is fed as input to
the next time step, instead of using the model's own prediction. This accelerates
training by providing the model with correct data at each time step, helping it learn
more effectively, especially for sequence generation tasks.
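A hedged PyTorch fragment of a decoder loop with teacher forcing; the GRUCell decoder, vocabulary size, and start-token id are illustrative placeholders for a full sequence-to-sequence model:

```python
import torch
import torch.nn as nn

vocab, emb_dim, hid = 20, 16, 32
embed = nn.Embedding(vocab, emb_dim)
decoder_cell = nn.GRUCell(emb_dim, hid)       # stands in for a full decoder
project = nn.Linear(hid, vocab)
loss_fn = nn.CrossEntropyLoss()

target = torch.randint(0, vocab, (5,))        # ground-truth output sequence (length 5)
h = torch.zeros(1, hid)                       # pretend this is the encoder's context
prev_token = torch.tensor([0])                # assumed <start> token id

loss = 0.0
for t in range(len(target)):
    h = decoder_cell(embed(prev_token), h)
    logits = project(h)
    loss = loss + loss_fn(logits, target[t:t + 1])
    prev_token = target[t:t + 1]              # teacher forcing: feed the true token,
                                              # not logits.argmax(), into the next step
print(loss / len(target))
```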
25. Significance of Gating Mechanism in GRUs and LSTMs: The gating mechanism in
GRUs and LSTMs controls the flow of information across time steps, enabling the
model to retain or forget relevant data. This mechanism helps mitigate issues like
vanishing gradients and allows the model to capture long-term dependencies. It
improves the network's ability to learn from sequential data by selectively updating
or resetting memory states.
26. Memory Cells in LSTMs and Vanishing Gradients: Memory cells in LSTMs help
mitigate vanishing gradients by storing and carrying forward relevant information
across long sequences. The gating mechanisms (forget, input, and output gates)
regulate the flow of information in and out of the cell, allowing LSTMs to maintain
long-term dependencies without the gradients shrinking to zero during
backpropagation.
27. LSTMs Over RNNs for NLP Tasks: LSTMs are preferred over standard RNNs for NLP
tasks because they can capture long-range dependencies in text, which is essential
for understanding context. LSTMs prevent the vanishing gradient problem by using
memory cells and gating mechanisms, allowing them to handle complex linguistic
structures and longer sequences more effectively than vanilla RNNs, leading to better
performance.
28. BPTT and Training Instability in Deep RNNs: BPTT can lead to training instability in
deep RNNs due to the accumulation of errors over time. As gradients are propagated
back through multiple time steps, they can either vanish or explode, causing unstable
weight updates. This instability is particularly problematic in deep RNN architectures
and can make it difficult for the model to converge to an optimal solution.
29. Context Vector in Sequence Models: A context vector in sequence models
represents a compressed summary of the input sequence, capturing the relevant
information needed for the output. In sequence-to-sequence models, the encoder
produces the context vector, which is passed to the decoder to generate the output
sequence. The context vector plays a crucial role in guiding the decoder's predictions at every step.
MODULE-4
1. Generative Modeling vs. Discriminative Modeling: Generative modeling focuses on
modeling the joint distribution P(x, y) to generate data similar to the original dataset, while discriminative modeling focuses on learning the boundary between classes by modeling the conditional distribution P(y|x). Generative models
generate new data, whereas discriminative models only classify data into existing
categories.
2. Probabilistic vs. Non-Probabilistic Generative Models: Probabilistic generative
models, like Gaussian Mixture Models, explicitly model data distributions using
probability theory, providing a probabilistic approach to data generation. Non-
probabilistic models, like certain autoencoders, do not directly model probability
distributions but instead focus on deterministic transformations. Probabilistic models
offer uncertainty estimates, while non-probabilistic ones focus on deterministic
mappings.
3. Latent Variables in Generative Models: Latent variables represent hidden factors
that explain the observed data in generative models. These variables are used to
capture underlying structures in the data, enabling the model to generate realistic
samples. By conditioning on these latent variables, generative models, such as VAEs
and GANs, can generate new data points that resemble the training data distribution.
4. Applications of Generative Models in Image Processing: Generative models are
widely used in image processing, such as generating realistic images (via GANs),
enhancing low-resolution images (super-resolution), or filling in missing parts of
images (image inpainting). They learn the data distribution and generate new images
that closely match the real data, enabling tasks like data augmentation, restoration,
and creative generation.
5. Adversarial Process in GANs: GANs consist of two networks, a generator and a
discriminator, which are trained in an adversarial setting. The generator creates fake
data, and the discriminator tries to distinguish between real and fake data. The
generator learns to improve its output by receiving feedback from the discriminator,
leading to the creation of more realistic data over time.
6. Why VAEs are Probabilistic: VAEs are considered probabilistic because they model
data generation as a probabilistic process. They learn the distribution of data using a
latent variable space, where data points are drawn from a distribution (typically
Gaussian). The encoder outputs parameters for this distribution, and the decoder
samples from it to generate new data, capturing uncertainty in the generation
process.
7. Reparameterization Trick in VAEs: The reparameterization trick is a method used in
Variational Autoencoders (VAEs) to allow for backpropagation through the stochastic
sampling process. Instead of sampling directly from the latent distribution, the trick
expresses the latent variable as a deterministic function of a noise variable, allowing
gradients to flow through the stochastic part of the model during training.
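A minimal PyTorch sketch of the trick; the names mu and log_var are assumptions about how the encoder's outputs are parameterized:

```python
import torch

def sample_latent(mu, log_var):
    """z = mu + sigma * eps: randomness is isolated in eps, so gradients reach mu and log_var."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)        # noise drawn outside the learned parameters
    return mu + std * eps

mu = torch.zeros(4, requires_grad=True)
log_var = torch.zeros(4, requires_grad=True)
z = sample_latent(mu, log_var)
z.sum().backward()                     # backpropagation works despite the sampling step
print(mu.grad, log_var.grad)
```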
8. Motivation Behind Transformer Model: The Transformer model was introduced to
overcome limitations of recurrent neural networks (RNNs), particularly in handling
long-range dependencies. It uses self-attention to process inputs in parallel, enabling
faster training and better performance on tasks like machine translation and natural
language processing. This architecture allows models to capture complex
relationships within sequences efficiently.
9. Role of Self-Attention in Transformers: Self-attention in Transformer models enables
each token in the input sequence to attend to all other tokens, capturing contextual
relationships at different positions. This mechanism allows the model to focus on
relevant parts of the sequence when making predictions, handling long-range
dependencies better than RNNs and providing more parallelizable computation for
faster training.
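A single-head NumPy sketch of scaled dot-product self-attention (the projection matrices are random placeholders; real Transformers use multiple heads plus masking and positional information):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
T, d_model, d_k = 5, 8, 8
X = rng.normal(size=(T, d_model))
out = self_attention(X, *(rng.normal(size=(d_model, d_k)) for _ in range(3)))
print(out.shape)   # (5, 8): one context-mixed vector per token
```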
10. Diffusion Model for High-Quality Data Generation: Diffusion models generate high-
quality data by gradually transforming noise into structured data through a series of
denoising steps. They begin with random noise and iteratively refine it into data
samples by learning the reverse of a diffusion process. This step-by-step refinement
leads to high-quality samples, often outperforming GANs in generating images with
high fidelity.
11. Significance of Multi-Modal Models in Generative AI: Multi-modal models combine
information from different modalities, such as text, images, and audio, to generate
rich, contextually relevant data. These models enhance generative AI by enabling the
understanding and generation of complex data across multiple formats. They are
used in applications like text-to-image synthesis, video generation, and cross-modal
retrieval, improving overall model versatility and performance.
12. Overfitting vs. Mode Collapse in GANs: Overfitting in GANs occurs when the model
memorizes the training data, leading to poor generalization to new data. Mode
collapse happens when the generator produces limited varieties of outputs, failing to
capture the full diversity of the data distribution. Both issues hinder GANs' ability to
generate diverse and realistic samples from the true data distribution.
13. KL-Divergence in VAEs: KL-divergence in VAEs measures the difference between the
learned latent distribution and a prior distribution (typically Gaussian). By minimizing
this term, the model ensures that the learned latent space is close to the prior,
allowing smooth sampling and regularization of the latent space. This helps the
decoder generate diverse, realistic samples from the latent space.
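The commonly used closed form for a diagonal Gaussian posterior against a standard-normal prior, shown as a short PyTorch sketch (variable names are assumptions):

```python
import torch

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)."""
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

mu = torch.tensor([0.0, 0.5])
log_var = torch.tensor([0.0, -1.0])
print(kl_to_standard_normal(mu, log_var))   # 0 for a unit Gaussian component, > 0 otherwise
```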
14. Normalizing Flow for Density Estimation: Normalizing flow modeling improves
density estimation by learning invertible transformations that map simple
distributions to complex ones. It enables exact likelihood computation, making it
possible to model complex data distributions more accurately. The use of invertible
transformations allows for precise and efficient density estimation, improving
generative models' ability to generate high-quality samples with tractable likelihoods.
15. Computational Challenges of Training GANs: Training GANs is computationally
challenging due to instability during the adversarial process. The generator and
discriminator must be balanced, as one might overpower the other, leading to poor
performance. Moreover, training requires tuning hyperparameters like learning rate
and network architecture, making it computationally intensive and time-consuming.
Mode collapse and vanishing gradients are also common issues.
16. Diffusion Models vs. GANs in Training: Diffusion models and GANs differ in their
training approaches. GANs use a competitive adversarial process between a
generator and discriminator, while diffusion models train through a denoising
process, progressively refining noise into data. Diffusion models generally require
more computational resources but can generate more stable and high-quality
samples compared to GANs, which may suffer from mode collapse.
17. Why Large-Scale Transformers Require Massive Data: Large-scale Transformer
models require vast amounts of training data to effectively capture complex patterns
and relationships in data. The self-attention mechanism in Transformers has a high
computational cost and benefits from large datasets that allow the model to learn
rich, diverse representations. Without sufficient data, Transformer models may
overfit or fail to generalize well to new data.
18. Impact of Energy-Based Generative Models on Optimization: Energy-based
generative models (EBMs) aim to assign low energy to data points and high energy to
non-data points. These models learn to generate data by optimizing the energy
function, typically using gradient-based methods. EBMs are useful for optimizing
complex tasks like image generation, where energy minimization allows them to
generate realistic data by learning the energy landscape of the data distribution.
19. Wasserstein Distance in GANs: Wasserstein distance, or Earth Mover’s Distance, is
used in GANs to improve training stability. It measures the difference between the
real and generated data distributions in a continuous manner. Unlike the Jensen-
Shannon divergence used in traditional GANs, Wasserstein distance provides
smoother gradients during training, preventing issues like mode collapse and
improving the quality of generated samples in Wasserstein GANs (WGANs).
20. Learning Data Distributions in Generative Models: Learning data distributions is
fundamental in generative models, as it allows them to generate new data points
similar to the training data. By modeling the distribution of observed data, generative
models like VAEs and GANs learn to generate realistic samples. This ability is crucial
for tasks like image generation, anomaly detection, and data augmentation, where
new, similar data is required.
MODULE-5,6
1. Language Model (LM) and its Function: A Language Model (LM) is a probabilistic
model that predicts the likelihood of a sequence of words in a language. Its basic
function is to estimate the probability distribution of word sequences, enabling tasks
such as text generation, machine translation, and speech recognition. It helps in
predicting the next word in a sentence.
2. LLMs vs. Traditional Language Models: Large Language Models (LLMs) differ from
traditional language models in terms of scale, training data, and architecture. LLMs
are trained on massive datasets, allowing them to handle more complex tasks and
generate coherent, contextually relevant responses. Traditional models often use
simpler architectures, and their scope is limited to smaller datasets and basic
language processing.
3. Advantage of Fine-Tuning LLMs: Fine-tuning an LLM is more efficient than training
from scratch as it allows the model to leverage pre-existing knowledge learned from
large datasets. Fine-tuning adjusts the model on a specific task with less data,
speeding up training and improving performance. It also reduces computational
resources and training time, making it practical for specialized tasks.
4. Hallucination in LLMs: Hallucination in LLMs refers to the phenomenon where the
model generates information that is factually incorrect, irrelevant, or nonsensical,
despite appearing plausible. It occurs when LLMs provide answers or generate
content that isn't grounded in the input data. This issue arises due to the
probabilistic nature of LLMs, which can sometimes "invent" details beyond their
training data.
5. Zero-Shot Learning in NLP: Zero-shot learning in NLP refers to the ability of a model
to perform a task without having been explicitly trained on task-specific data. The
model generalizes its understanding to new tasks by leveraging prior knowledge
learned from different tasks. For example, a language model can classify sentiment
without direct examples from sentiment-labeled data.
6. Importance of Few-Shot Learning in LLMs: Few-shot learning allows LLMs to perform
tasks with very few examples, making them adaptable to a wide range of problems
with minimal data. This is crucial for practical applications where labeled data is
scarce. It enables the model to generalize effectively from a small set of examples,
improving its flexibility and efficiency.
7. Extractive vs. Abstractive Summarization: Extractive summarization involves
selecting and extracting key sentences or phrases directly from the source text to
form a summary. In contrast, abstractive summarization generates a summary by
interpreting and paraphrasing the original text, often rewording and simplifying the
content. Abstractive methods tend to provide more coherent and fluent summaries
compared to extractive ones.
8. LLM Sentiment Analysis: LLMs perform sentiment analysis by analyzing the text and
determining its emotional tone (positive, negative, neutral). They use their
understanding of language and context to classify sentiments in sentences or
documents. By identifying patterns and relationships between words, LLMs can
discern sentiment even in complex or nuanced expressions, making them valuable
for market and opinion analysis.
9. Named Entity Recognition (NER) and LLMs: Named Entity Recognition (NER)
identifies and classifies entities in text, such as people, organizations, and locations.
LLMs improve NER by utilizing large amounts of contextual data to recognize and
disambiguate entities, even in complex or unstructured text. They enhance accuracy
by understanding contextual relationships, enabling more reliable entity extraction in
diverse domains.
10. Cross-Lingual NLP and Example: Cross-lingual NLP involves processing and
understanding text in multiple languages using a shared model. It enables tasks like
translation, information retrieval, and sentiment analysis across languages. An
example is using a multilingual LLM to translate text between languages that the
model hasn’t been explicitly trained on, leveraging shared representations for various
languages.
11. Prompt in LLMs: A prompt in the context of LLMs is an input or instruction given to
the model to guide its response or behavior. It provides context or specifies a task for
the model, such as generating text, answering questions, or summarizing
information. The quality and clarity of the prompt significantly influence the model's
output.
12. Prompt Engineering for LLMs: Prompt engineering involves designing and refining
input prompts to improve the performance of LLMs for specific tasks. By carefully
structuring the prompt, one can guide the model to generate more accurate,
relevant, and contextually appropriate responses. This technique maximizes LLMs'
capabilities in tasks like text generation, question-answering, and creative writing.
13. Retrieval-Augmented Generation (RAG) in NLP: Retrieval-Augmented Generation
(RAG) combines the power of large-scale retrieval systems with generative models. It
first retrieves relevant information from a large corpus based on the input query and
then uses a generative model to synthesize this information into a coherent
response. RAG improves accuracy and relevance, especially for tasks requiring
specific knowledge not contained in the model's training data.
14. LangChain and Generative NLP: LangChain is a framework designed for building
applications that leverage LLMs for various tasks, such as document retrieval,
question-answering, and chatbots. It integrates tools like APIs, databases, and other
systems with LLMs to enhance their generative capabilities. LangChain improves
generative NLP by providing an easy-to-use interface and expanding the potential
applications of LLMs.
15. Challenges in Machine Translation with LLMs: Machine translation using LLMs faces
challenges like handling idiomatic expressions, maintaining grammatical accuracy,
and translating context-dependent nuances. Additionally, LLMs may struggle with
low-resource languages due to limited training data. Ensuring consistency and
fluency across languages, especially when dealing with dialects or slang, remains an
ongoing challenge in translation tasks.
16. Supervised Fine-Tuning vs. Reinforcement Learning in LLMs: Supervised fine-tuning
involves training an LLM on labeled data to adjust it for a specific task. Reinforcement
learning (RL) in LLMs, such as RLHF (Reinforcement Learning from Human Feedback),
involves optimizing the model based on feedback from interactions, improving task
performance by reinforcing desirable outcomes. RL helps LLMs improve in dynamic, interactive settings.
17. Real-Time NLP and its Importance: Real-time NLP refers to the ability to process and
generate language responses instantly or with minimal delay. It’s crucial for
applications like live chatbots, real-time translation, and voice assistants, where
timely responses are essential for user experience. Real-time NLP ensures that
systems can handle dynamic inputs and generate relevant, coherent outputs
efficiently.
18. Why LLMs Struggle with Factual Accuracy: LLMs struggle with factual accuracy
because they generate responses based on patterns learned from large, unstructured
datasets, without direct verification of facts. This can lead to the generation of
outdated or incorrect information, especially when the model relies on statistical
correlations rather than grounded knowledge. Ensuring accuracy requires additional
verification or specialized training.
19. Human-in-the-Loop in Generative NLP: Human-in-the-loop (HITL) refers to the
integration of human oversight in the training or output generation process of NLP
models. It helps improve the quality of generative NLP by allowing humans to correct
errors, provide feedback, and guide the model’s learning. This hybrid approach
enhances the reliability and effectiveness of generative systems.
20. Hugging Face API and LLMs: The Hugging Face API provides easy access to a wide
range of pre-trained LLMs, enabling developers to integrate them into applications
without needing to train models from scratch. It supports tasks like text generation,
classification, and translation, making LLMs more accessible. Hugging Face also offers
model fine-tuning and deployment options, streamlining NLP development.
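A minimal usage sketch of the transformers pipeline API; the model choice (gpt2) and the exact output shown in the comments are illustrative and should be checked against the current Hugging Face documentation:

```python
from transformers import pipeline

# Sentiment classification with a default pre-trained model (weights download on first use)
classifier = pipeline("sentiment-analysis")
print(classifier("The new release fixed every bug I reported."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# Text generation with a small model such as gpt2 (an illustrative choice)
generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models are", max_new_tokens=20)[0]["generated_text"])
```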