LLM Questions
Top 40 LLM Interview Questions & Answers (Updated)
www.masteringllm.com
01
Which technique helps mitigate
bias in prompt-based learning?
Fine-tuning
Data augmentation
Prompt calibration
Gradient clipping
Explanation
Prompt calibration involves
adjusting prompts to minimize bias
in the generated outputs. Fine-
tuning modifies the model itself,
while data augmentation expands
the training data. Gradient clipping
prevents exploding gradients
during training.
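To make this concrete, here is a minimal sketch of one calibration approach (contextual calibration): estimate the bias a prompt introduces by querying the model with a content-free input, then divide that bias out. All probability values below are made-up placeholders, not outputs of a real model.

```python
import numpy as np

labels = ["positive", "negative", "neutral"]

# Probabilities the model assigns when the prompt contains a content-free input
# (e.g. "N/A"). A strong skew here reflects bias introduced by the prompt itself.
p_content_free = np.array([0.70, 0.20, 0.10])

# Probabilities for an actual example we want to classify (placeholder values).
p_example = np.array([0.55, 0.35, 0.10])

# Calibrate: divide out the content-free bias and renormalize.
calibrated = p_example / p_content_free
calibrated /= calibrated.sum()

print(dict(zip(labels, calibrated.round(3))))
# The calibrated distribution down-weights labels the prompt favours a priori.
```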
03
Do you need a vector store for all your text-based LLM use cases?
Yes
No
Explanation
A vector store is used to store the vector representation of a word or
sentence. These vector representations capture the semantic meaning
of the words or sentences and are used in various NLP tasks.
However, not all text-based LLM use cases require a vector store.
Some tasks, such as summarization, sentiment analysis, and
translation, do not need context augmentation.
Here is why:
Summarization: This task involves condensing a larger body of
text into a short summary. It does not require the context of other
documents or sentences beyond the text being summarized.
Sentiment Analysis: This task involves determining the sentiment
(positive, negative, neutral) expressed in a piece of text. It is
typically done based on the text itself without needing additional
context.
Translation: This task involves translating text from one language
to another. The context is usually provided by the sentence itself
and the broader document it is part of, rather than a separate
vector store.
04
Which of the following is NOT a technique specifically used for aligning Large Language Models (LLMs) with human values and preferences?
RLHF
Direct Preference Optimization (DPO)
Data Augmentation
Explanation
Data Augmentation is a general machine learning technique
that involves expanding the training data with variations or
modifications of existing data. While it can indirectly impact LLM
alignment by influencing the model's learning patterns, it's not
specifically designed for human value alignment.
Incorrect Options:
A) Reinforcement Learning from Human Feedback (RLHF) is
a technique where human feedback is used to refine the LLM's
reward function, guiding it towards generating outputs that align
with human preferences.
B) Direct Preference Optimization (DPO) is another technique
that directly compares different LLM outputs based on human
preferences to guide the learning process.
05
In Reinforcement Learning from Human Feedback (RLHF), what describes "reward hacking"?
Explanation
Reward hacking refers to a situation in RLHF where
the agent discovers unintended loopholes or
biases in the reward function to achieve high
rewards without actually following the desired
behavior. The agent essentially "games the
system" to maximize its reward metric.
06
When fine-tuning a GenAI model for a task (e.g., creative writing), which factor most significantly impacts the model's ability to adapt to the target task?
Explanation
The architecture of the pre-trained model acts as the
foundation for fine-tuning. A complex and versatile
architecture like those used in large models (e.g., GPT-3)
allows for greater adaptation to diverse tasks. The size of
the fine-tuning dataset plays a role, but it's secondary. A
well-architected pre-trained model can learn from a
relatively small dataset and generalize effectively to the
target task.
Why A is Incorrect:
While the size of the fine-tuning dataset can enhance
performance, it's not the most crucial factor. Even a massive
dataset cannot compensate for limitations in the pre-trained
model's architecture. A well-designed pre-trained model can
extract relevant patterns from a smaller dataset and
outperform a less sophisticated model with a larger dataset.
07
What does the self-attention mechanism in the transformer architecture allow the model to do?
Weigh word importance
Automatic summarization
Explanation
The self-attention mechanism in transformers acts as a spotlight,
illuminating the relative importance of words within a sentence.
Incorrect Options:
Predict next word: While transformers can be used for language
modeling (including next-word prediction), this isn't the primary
function of self-attention.
Automatic summarization: While self-attention is a core
component of summarization models, it's not solely responsible
for generating summaries.
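A minimal NumPy sketch of scaled dot-product self-attention, showing how the attention weights express the relative importance of each token; the embeddings and projection matrices are random placeholders.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention weights say how much each token attends to (i.e. weighs the
    importance of) every other token in the sequence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq, seq) raw similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V, weights                       # context vectors + weights

seq_len, d_model = 4, 8                  # e.g. a 4-token sentence
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))  # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

context, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(attn.round(2))  # row i = how much token i attends to each other token
```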
08
What is one advantage of using subword algorithms like BPE or WordPiece in Large Language Models (LLMs)?
Explanation
LLMs deal with massive amounts of text, leading to a very large
vocabulary if you consider every single word. Subword algorithms
like Byte Pair Encoding (BPE) and WordPiece break down words
into smaller meaningful units (subwords) which are then used as
the vocabulary. This significantly reduces the vocabulary size
while still capturing the meaning of most words, making the model
more efficient to train and use.
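A toy sketch of the core BPE idea (repeatedly merge the most frequent adjacent symbol pair); real BPE/WordPiece implementations are more involved, but this shows how a small set of subword units can cover many words.

```python
from collections import Counter

corpus = ["low", "lower", "lowest", "newer", "wider"]
words = [list(w) + ["</w>"] for w in corpus]   # start from characters

def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_word(w, pair, merged):
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out

for _ in range(6):                              # learn a handful of merges
    pair = most_frequent_pair(words)
    if pair is None:
        break
    words = [merge_word(w, pair, "".join(pair)) for w in words]

print(words)  # each word is now a sequence of learned subword units
```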
09
Which technique makes computing output probabilities over a very large vocabulary more efficient in LLMs?
Sparse word reps
Pre-trained embedding
Adaptive Softmax
Explanation
Standard Softmax struggles with vast vocabularies, requiring
expensive calculations for every word. Imagine a large language
model predicting the next word in a sentence. Softmax multiplies
massive matrices for each word in the vocabulary, leading to
billions of operations! Adaptive Softmax leverages Zipf's law
(common words are frequent, rare words are infrequent) to
group words by frequency. Frequent words get precise
calculations in smaller groups, while rare words are grouped
together for more efficient computations. This significantly
reduces the cost of training large language models.
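As a sketch, PyTorch ships an adaptive softmax module; the vocabulary size and frequency cutoffs below are illustrative assumptions.

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size = 256, 50_000
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=[2_000, 10_000],   # head for frequent ids, two larger tail clusters
)

hidden_states = torch.randn(32, hidden_dim)      # e.g. 32 positions from the model
targets = torch.randint(0, vocab_size, (32,))    # next-token ids (placeholders)
out = adaptive_softmax(hidden_states, targets)
print(out.loss)   # training loss computed without a full 50k-way softmax
```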
10
Which configuration parameter for inference can be adjusted to either increase or decrease randomness within the model output layer?
Top-k sampling
Temperature
Explanation
During text generation, large language models (LLMs) rely on a
softmax layer to assign probabilities to potential next words.
Temperature acts as a key parameter influencing the randomness
of these probability distributions.
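A minimal sketch of temperature scaling applied to a made-up set of logits; lower temperatures sharpen the distribution (less random), higher temperatures flatten it (more random).

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, -1.0]   # placeholder scores over four candidate tokens
for t in (0.2, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t).round(3))
```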
11
What transformer model uses masking and bidirectional context for masked-token prediction?
Autoencoder
Autoregressive
Sequence-to-sequence
Explanation
Autoencoder models are pre-trained using
masked language modeling. They use
randomly masked tokens in the input sequence
and the pretraining objective is to predict the
masked tokens to reconstruct the original
sentence.
12
What technique allows you to scale model training across GPUs when the model doesn't fit in the memory of a single chip?
DDP
FSDP
Explanation
FSDP (Fully Sharded Data Parallel) is the
technique that allows scaling model training
across GPUs when the model is too big to fit in
the memory of a single chip. FSDP distributes or
shards the model parameters, gradients, and
optimizer states across GPUs, enabling efficient
training.
Incorrect Answer:
DDP (Distributed Data Parallel) replicates the entire model on every GPU, so it does not help when a single copy of the model cannot fit in one chip's memory.
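A minimal FSDP sketch, assuming a multi-GPU host and a launcher such as torchrun; the small sequential model is a stand-in for an LLM that would not fit on one device.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))
    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # so each GPU holds only a slice of the model at any one time.
    model = FSDP(model.cuda())
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 4096, device="cuda")   # placeholder batch
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```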
13
What is the primary purpose of quantization in large language models?
Explanation
Quantization helps reduce the memory required
to store model weights by reducing their
precision.
Incorrect options:
b) Improve model accuracy: While quantization
can have some impact on model accuracy, its
primary purpose is to reduce memory usage.
c) Enhance model interpretability: Quantization
does not directly enhance model interpretability.
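A minimal sketch of symmetric int8 weight quantization showing the memory saving and the (small) approximation error; the weight matrix is a random placeholder.

```python
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127.0                     # map largest weight to 127
q_weights = np.round(weights / scale).astype(np.int8)     # 1 byte per weight
dequantized = q_weights.astype(np.float32) * scale        # approximate original

print("max abs error:", np.abs(weights - dequantized).max())
print("memory: %d bytes -> %d bytes (+ one scale)" % (weights.nbytes, q_weights.nbytes))
```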
14
How can scaling laws be used to design compute-optimal models?
Explanation
Scaling laws provide valuable insights into the relationship
between model size (number of parameters), dataset size, and
the model's performance (often measured as loss). This
relationship can be expressed mathematically through power
laws.
Here's how scaling laws help design compute-optimal models:
Understanding cost trade-offs: By analyzing scaling laws,
you can estimate the impact of increasing model size or
dataset size on performance and computational resources
(training time, memory usage). This allows you to find a
balance between model complexity and training cost.
Targeted optimization: You can use scaling laws to predict
the performance gain from increasing model size or data
size. This helps you focus optimization efforts on the factors
that will have the most significant impact on performance
within your computational budget.
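A small sketch of how a power-law fit can be used to compare configurations under a fixed compute budget; all coefficients and the budget below are illustrative assumptions, not fitted values from any particular paper.

```python
# Assumed power-law form: Loss(N, D) = E + A / N^alpha + B / D^beta
E, A, B = 1.7, 400.0, 400.0       # irreducible loss and coefficients (assumed)
alpha, beta = 0.34, 0.28          # parameter / data exponents (assumed)

def predicted_loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

budget = 1e21                     # fixed training budget in FLOPs (assumed)
for n_params in (1e8, 1e9, 1e10):
    n_tokens = budget / (6 * n_params)   # common rule of thumb: FLOPs ~ 6 * N * D
    print(f"N={n_params:.0e}  D={n_tokens:.0e}  loss={predicted_loss(n_params, n_tokens):.3f}")
# Comparing the predicted losses shows which size/data split is compute-optimal
# for this particular budget under the assumed fit.
```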
15
What’s catastrophic forgetting in fine-tuning?
Other tasks perform worse
Pre-trained weights enhance
Explanation
Catastrophic forgetting refers to the degradation
of performance on tasks other than the one
being fine-tuned, as the weights of the original
model are modified.
Incorrect option:
"Pre-trained weights enhance" is incorrect: fine-tuning modifies the original pre-trained weights, which is exactly what causes performance on other tasks to degrade.
16
Parameter-Efficient Fine-Tuning (PEFT) updates only a small subset of parameters, and this helps prevent catastrophic forgetting.
True
False
Explanation
Parameter Efficient Fine-Tuning (PEFT) is a method that updates
only a small subset of parameters during the fine-tuning process.
This approach is designed to be more memory efficient and to
prevent catastrophic forgetting. Catastrophic forgetting is a
phenomenon where a neural network forgets its previously
learned information upon learning new information. By updating
only a small subset of parameters, PEFT mitigates this issue,
allowing the model to retain its previously learned knowledge
while adapting to new tasks.
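A minimal LoRA-style adapter (one common PEFT method) written directly in PyTorch as a sketch; libraries such as peft implement this more completely, but the idea of freezing the base weights and training only a small number of new parameters is the same.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        self.base.weight.requires_grad_(False)      # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # frozen path + small trainable low-rank update
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")   # only a small fraction updates
```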
17
You can only use Proximal Policy Optimization (PPO) to update the model weights during RLHF.
True
False
Explanation
PPO is not the only option; for instance, you can also use an algorithm called Q-Learning. PPO is the most popular choice for RLHF because it balances complexity and performance, but RLHF is an ongoing field of research and this preference may change in the future as new techniques are developed.
18
In a Transformer model with group attention, how does the mechanism differ from standard self-attention when processing a sentence?
Replaces self-attention
Explanation
Standard self-attention in a Transformer
considers the relationships between individual
words within a sentence. Group attention, on the
other hand, introduces a new layer of attention.
This layer focuses on groups of words pre-
defined based on specific criteria, such as
syntactic or semantic groupings (e.g., noun
phrases, verb phrases).
19
During LLM training, which step is NOT
directly involved in the process?
Feature engineering
Pre-training
Fine-tuning
RLHF
Explanation
LLMs primarily rely on the raw text data itself for
training. Feature engineering, which involves
manually extracting specific features from the
data, is not a typical step in LLM training. Options
(b), (c), and (d) are all common stages in the LLM
training pipeline.
20
Pre-training is a crucial step in LLM training.
What is the main objective of pre-training?
General language
understanding
Explanation
Pre-training aims to equip the LLM with a
foundational understanding of language by
exposing it to a vast amount of text data. This
allows the model to learn general
representations of words, their relationships, and
overall language structure
21
Which of the following sequences represents the most likely order of LLM training stages?
A. Pre-training
B. RLHF
C. Instruction Fine-tuning
A -> C -> B
B -> A -> C
C -> A -> B
Explanation
LLM training follows a specific order: pre-training (A) first builds general language understanding, instruction fine-tuning (C) then adapts the model to follow instructions, and RLHF (B) finally aligns it with human preferences, i.e., A -> C -> B.
22
A technique that utilizes a smaller model to learn from a larger pre-trained model, improving efficiency, is called:
Gradient Clipping
Backpropagation
Knowledge Distillation
Batch Normalization
Explanation
Knowledge distillation is a technique that allows a smaller model
(student) to learn from a larger, pre-trained model (teacher). It
improves training efficiency by leveraging the knowledge already
encoded in the teacher model.
Here's how knowledge distillation works:
1. Train a large, powerful model (teacher) on a massive dataset.
2. During training of the smaller model (student), instead of relying
solely on the original loss function (difference between
predicted and actual labels), the student also learns from the
teacher's outputs or internal representations.
3. This "distilled knowledge" guides the student model towards
learning similar patterns and achieving good performance, but
with less data and computational resources compared to
training from scratch.
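A minimal sketch of a distillation loss that mixes a softened teacher-matching term with the usual hard-label loss; the logits and labels are random placeholders, and the temperature and weighting values are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL term between softened distributions (scaled by T^2, a common convention)
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)      # standard hard-label loss
    return alpha * kd + (1 - alpha) * ce

student_logits = torch.randn(4, 10, requires_grad=True)   # placeholder outputs
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```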
23
Which method places ## at the start of tokens?
BPE
WordPiece
Explanation
WordPiece tokenization method places ## at the
beginning of tokens. This is a characteristic
feature of WordPiece.
24
Which technique uses gating functions to decide which model to use based on the input?
Ensemble Techniques
Mixture of Experts (MoE)
Explanation
Mixture of Experts (MoE) is a machine learning
technique that uses multiple models, called
“experts”, and a gating function that decides
which expert to use based on the input. This
allows MoE to model more complex patterns
and adapt to different regions of the input space,
making it more flexible than traditional ensemble
techniques, which typically combine predictions
from multiple models without such a gating
function. Therefore, option B is correct. The
other option is incorrect because ensemble
techniques do not use gating functions. They
use multiple models and combine their
predictions.
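A minimal Mixture-of-Experts sketch in PyTorch: a gating network scores the experts and each input is routed to its top-scoring expert. The sizes and the top-1 routing rule are illustrative simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=16, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)   # gating function over experts

    def forward(self, x):
        gate_probs = F.softmax(self.gate(x), dim=-1)   # (batch, num_experts)
        expert_idx = gate_probs.argmax(dim=-1)         # pick one expert per input
        out = torch.stack([self.experts[int(i)](xi) for xi, i in zip(x, expert_idx)])
        return out, expert_idx

moe = TinyMoE()
x = torch.randn(5, 16)                 # placeholder inputs
out, chosen = moe(x)
print(chosen)   # which expert the gate routed each of the 5 inputs to
```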
25
What does ‘Prompt leaking’ signify in the context of Large Language Models (LLMs)?
Explanation
‘Prompt leaking’ in the context of Large Language Models (LLMs) refers to the act of extracting sensitive
or confidential information from the model’s response.
This could potentially be exploited by adversaries to
gain unauthorized insights into the LLM’s behavior or
compromise its security.
26
Which type of database is designed to store multi-dimensional vectors and perform ANN (Approximate Nearest Neighbor) search?
Vector Database
Traditional Database
Explanation
Traditional databases, like relational databases, are designed to
store data in tables with rows and columns. This structure is not
efficient for storing and searching multi-dimensional vectors.
27
Which of the following vector indexing techniques relies on grouping similar vectors in a cluster for efficient retrieval?
Flat Indexing
Explanation
(a) Flat Indexing: This stores vectors without any specific
organization. While it can be used for similarity searches, it's not
efficient for large datasets.
(b) Inverted File Index: This technique is commonly used for text
retrieval in document databases. It indexes words and keeps
track of which documents contain those words. While it can be
adapted for vector similarity search with specific techniques, it
doesn't inherently group similar vectors in clusters.
28
For a small review dataset, if you want a 100% recall rate, which vector index would you use? Speed is not a consideration here.
Flat Index
HNSW
Random Projection
Explanation
In a small dataset, a flat index allows for an exhaustive
search, comparing each review vector to every other vector.
This maximizes the chance of finding the most similar
reviews with perfect accuracy.
Incorrect Options:
B. HNSW:
While HNSW can be accurate, it might not guarantee finding
the absolute closest neighbors in every case due to the
focus on efficient search within clusters.
C. Random Projection:
Similar to HNSW, the potential loss of information due to
dimensionality reduction might compromise the goal of
perfect accuracy
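A short sketch of exhaustive search with a flat index, here using the FAISS library as an assumed dependency; the review embeddings are random placeholders.

```python
import faiss                     # assumes the faiss-cpu package is installed
import numpy as np

dim = 64
review_vectors = np.random.rand(500, dim).astype("float32")   # small dataset
query = np.random.rand(1, dim).astype("float32")

index = faiss.IndexFlatL2(dim)   # no clustering, no approximation
index.add(review_vectors)

# Every stored vector is compared against the query, so recall is 100%
# by construction (at the cost of speed on large collections).
distances, ids = index.search(query, 5)
print(ids)   # indices of the 5 most similar reviews
```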
29
In an Inverted File Index (IVF), which parameter would you tune to increase the number of clusters?
nprobe
nlist
Explanation
nlist: This parameter sets the number of inverted lists (clusters) created during the initial clustering stage. Increasing nlist creates more inverted lists, which essentially means more clusters.
Incorrect Answer:
nprobe: This parameter determines the number of inverted lists (clusters) probed during retrieval. It doesn't change the number of clusters; it only influences how many of them are searched for each query.
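A short sketch of an IVF index, again assuming FAISS, showing where nlist and nprobe enter; the data and parameter values are illustrative.

```python
import faiss                     # assumes the faiss-cpu package is installed
import numpy as np

dim, n_vectors = 64, 10_000
data = np.random.rand(n_vectors, dim).astype("float32")   # placeholder vectors

nlist = 100                                   # number of clusters (inverted lists)
quantizer = faiss.IndexFlatL2(dim)            # used to assign vectors to clusters
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

index.train(data)                             # k-means clustering into nlist cells
index.add(data)

index.nprobe = 10                             # clusters probed at query time
distances, ids = index.search(data[:1], 5)
print(ids)
# Increasing nlist creates more (smaller) clusters; increasing nprobe searches
# more of them per query, trading speed for recall.
```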
30
Which metric is NOT typically used for evaluating the quality of factual language summaries generated by an LLM?
ROUGE Score
BLEU Score
Perplexity
Explanation
Perplexity is a measure of how well the model
predicts the next word in a sequence. While it
might be a relevant metric for some LLM tasks,
it's not commonly used for evaluating factual
language summaries. Options (a) and (b) are common metrics for assessing the quality and
factual accuracy of summaries generated by
LLMs.
31
Which of the following indices represents a method that involves multiplying by another matrix to reduce the size of the original vector?
Flat Index
Random Projection Index
Explanation
The Random Projection Index is a technique
used in dimensionality reduction. It works by
projecting the original high-dimensional data into
a lower-dimensional space using a random
matrix. This process involves multiplication by another matrix (the random projection matrix), effectively
reducing the size of the original vector. On the
other hand, a Flat Index does not involve such a
process.
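A minimal sketch of random projection: multiplying by a random matrix reduces the dimensionality while approximately preserving distances. The data is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(42)
original_dim, reduced_dim = 768, 64

vectors = rng.normal(size=(1000, original_dim))
projection = rng.normal(size=(original_dim, reduced_dim)) / np.sqrt(reduced_dim)

reduced = vectors @ projection        # the "multiplication by another matrix"
print(vectors.shape, "->", reduced.shape)
```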
32
What’s the right pre-filtering order in a vector database?
Meta-data filtering --> Top-K
Explanation
In the context of a vector database, pre-filtering
typically involves two steps: performing meta-
data filtering and then retrieving the top K results
from the vector index. The correct sequence is
to first perform meta-data filtering, which
reduces the overall search space, and then
execute the vector query on these filtered
vectors to generate the top-k results.
33
What’s the right post-filtering order in a vector database?
Top-K --> Meta-data filtering
Explanation
In the context of a vector database, post-filtering
typically involves two steps: retrieving the top K
results from the vector index and then
performing meta-data filtering. The correct sequence is to first retrieve the top-k results from the vector index to reduce the search space, and then apply meta-data filtering to generate the final top-k results.
33
Which type of attention provides the best overall accuracy & speed?
Single-headed Attention
Multi-query Attention
Grouped-query Attention (GQA)
Explanation
GQA allows the attention heads to pay attention
to different parts of the sequence, while still
allowing for better speed and compute
efficiencies compared to multi-headed attention.
33
What is the key difference between Global and Local Attention mechanisms in LLMs?
Explanation
Global attention involves utilizing all hidden
states to generate the context vector, which can
be more computationally intensive. In contrast,
local attention only considers a subset of hidden
states, making it more efficient.
33
What is the purpose of Positional Encoding in a Transformer model?
To reduce overfitting
To add information about token order
Explanation
Since Transformers do not inherently consider
token order, Positional Encoding adds
information about the position of tokens in the
sequence.
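A minimal sketch of the sinusoidal positional encoding from the original Transformer paper; the sequence length and model dimension are illustrative.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]                        # (max_len, 1)
    i = np.arange(d_model)[None, :]                          # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                     # even dims: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])                     # odd dims: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # (50, 16): one unique encoding vector per position
# token_embeddings + pe[:seq_len] is what the first Transformer layer sees.
```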
33
Which process is used to convert text into high-dimensional numerical representations in LLMs?
Tokenization and Embeddings
Backpropagation
Data Augmentation
Explanation
LLMs use tokenization to break text into tokens,
which are then converted into embeddings that
capture semantic meaning.
33
Which prompt engineering technique is used to ask for step-by-step reasoning?
Zero-shot prompting
Few-shot prompting
Chain-of-thought prompting
Explanation
Chain of thought prompting (B) is a technique
where the prompt is crafted to encourage a
model to generate a sequence of intermediate
steps or reasoning that leads to the final answer.
This is particularly useful in complex problem-
solving tasks where the direct generation of an
answer might be difficult without breaking down
the problem into more manageable parts. By
explicitly asking the model to describe its
reasoning step by step, the prompt helps
simulate a thought process that can improve the
model's accuracy and the transparency of its
conclusions.
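An illustrative chain-of-thought prompt; the exact wording is an example rather than a fixed template.

```python
# The instruction to reason step by step is what elicits the intermediate
# reasoning before the final answer.
prompt = (
    "Q: A cafe sold 23 coffees in the morning and 41 in the afternoon. "
    "Each coffee costs $3. How much revenue did the cafe make?\n"
    "Let's think step by step."
)
print(prompt)
# A typical completion would first compute 23 + 41 = 64 coffees, then
# 64 * 3 = $192, and only then state the final answer.
```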
33
What type of questions does GraphRAG excel at answering compared to baseline RAG?
Factual questions
Aggregation of information
Explanation
GraphRAG excels at answering questions that
require aggregation of information across the
dataset, as it organizes data into meaningful
semantic clusters, enabling it to summarize
themes effectively.
33
How does AgenticRAG improve on basic RAG?
Use of LLMs
Autonomous information seeking
Faster processing
Explanation
AgenticRAG improves on RAG by enabling
autonomous information seeking. It can decide
when to retrieve more info, ask follow-up
questions, or clarify ambiguities. This makes it
more dynamic and adaptive than traditional
RAG.
Incorrect Explanations:
A) Larger models are not the main improvement.
C) Faster processing is important but not the key
difference.
34
How does a planner agent enhance the capabilities of AgenticRAG?
Responding in real-time
Explanation
A planner agent enhances the capabilities of
AgenticRAG by breaking down complex queries
into manageable sub-queries. This allows the
planner agent to execute these sub-queries
across various RAG pipelines linked to different
data sources, ultimately merging the results to
form a coherent final response. This approach
facilitates more efficient processing of intricate
queries.
35
What is the primary function of a routing agent in AgenticRAG?
Explanation
The primary function of a routing agent in
AgenticRAG is to select the appropriate
downstream RAG pipeline based on the input
query. It uses an LLM to analyze the query and
determine which RAG pipeline (such as
summarization or question-answering) is best
suited to handle the request. This process is
known as agentic reasoning and is essential for
optimizing the use of available resources.
36
What distinguishes AgenticRAG from traditional RAG models?
RL during generation
Explanation
AgenticRAG distinguishes itself from traditional
RAG models by utilizing user feedback to
dynamically adjust the retrieval process. This
allows the model to better align with user
preferences and improve the relevance of the
retrieved information, enhancing the overall
quality of the generated output.
37
What are the key steps involved in the Retrieval Augmented Generation (RAG) pipeline?
Retrieval, Generation, Ranking
Ranking, Generation, Retrieval
Retrieval, Ranking, Generation
Explanation
The RAG pipeline involves three key steps:
1. Retrieval: The query is encoded and
compared to pre-computed embeddings of
text chunks from a corpus, retrieving
relevant documents.
2. Ranking: Retrieved documents are ranked
based on relevance to the query, refining
the set for the final response.
3. Generation: The language model generates
a response based on the query and the
most relevant documents.
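A sketch of these three steps as code; every helper used here (embed, vector_store, rerank, llm_generate) is a hypothetical placeholder for whatever embedding model, vector database, reranker, and LLM a real system would plug in.

```python
def rag_answer(query, vector_store, llm_generate, embed, rerank, k=20, top_n=4):
    # 1. Retrieval: embed the query and fetch candidate chunks by similarity.
    query_vector = embed(query)
    candidates = vector_store.search(query_vector, k=k)

    # 2. Ranking: re-order the candidates by relevance and keep the best few.
    ranked = rerank(query, candidates)[:top_n]

    # 3. Generation: let the LLM answer using the retrieved context.
    context = "\n\n".join(doc.text for doc in ranked)
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return llm_generate(prompt)
```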
38
What is the term for the process of reducing the number of parameters in a model to improve its computational efficiency?
Model Pruning
Knowledge Distillation
Model Compression
Explanation
Model pruning removes parameters (individual weights or entire neurons) that contribute little to the model's outputs, reducing model size and computation while aiming to preserve accuracy. Knowledge distillation trains a separate smaller model rather than removing parameters, and model compression is a broader umbrella term.
39
What is the order of execution when you set both top-k and top-p together?
Top K acts before Top P
Explanation
When both parameters are set, top-k filtering is applied first to keep only the k most probable tokens, and top-p (nucleus) filtering is then applied within that reduced set before sampling.
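A small sketch of that order on a made-up distribution: top-k first, then top-p within the survivors.

```python
import numpy as np

probs = np.array([0.30, 0.22, 0.18, 0.12, 0.08, 0.06, 0.03, 0.01])  # placeholder
top_k, top_p = 5, 0.8

# Step 1: top-k, keep only the k most probable tokens.
order = np.argsort(probs)[::-1]
kept = order[:top_k]

# Step 2: top-p, within the surviving tokens keep the smallest set whose
# cumulative probability reaches p.
cumulative = np.cumsum(probs[kept])
kept = kept[: np.searchsorted(cumulative, top_p) + 1]

# Renormalize and sample from what is left.
final = probs[kept] / probs[kept].sum()
print("token ids kept:", kept, "probs:", final.round(3))
```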
40
True or False: The more fine-tuning data we can provide to fine-tune an LLM, the better the model performance.
True
False
Explanation
False. Data quality and diversity matter more than sheer quantity: beyond a point, additional fine-tuning data gives diminishing returns and can even hurt by over-fitting the model to a narrow distribution.
41
In LLM decoding, which strategy considers several possible continuations of a sequence at the same time?
Greedy decoding
Beam search
Explanation
Beam search is a decoding strategy in LLMs that addresses a limitation of greedy decoding. It works by keeping the top "beam width" most probable partial sequences at each step and extending each of them in parallel, instead of committing to a single best token at every step.
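A toy beam search sketch; the next-token distribution is a made-up stand-in for a real model's output.

```python
import math

def next_token_probs(sequence):
    # Hypothetical distribution over a 3-token vocabulary, independent of input.
    return {"a": 0.5, "b": 0.3, "<eos>": 0.2}

def beam_search(beam_width=2, max_steps=3):
    beams = [([], 0.0)]                       # (tokens, log-probability)
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == "<eos>":
                candidates.append((tokens, score))     # finished beam carries over
                continue
            for tok, p in next_token_probs(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the best `beam_width` partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(beam_search())   # several high-probability continuations tracked in parallel
```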
LLM Interview
Course
Want to prepare yourself for an LLM Interview?
100+ Questions spanning 14 categories
with Real Case Studies
Curated 100+ assessments for each
category
Well-researched real-world interview
questions based on FAANG & Fortune
500 companies
Focus on Visual learning
Certification
AgenticRAG with
LlamaIndex
Want to learn why AgenticRAG is the future of RAG?
Master RAG fundamentals through practical
case studies