
Decoding Algorithms in Large Language Models


Apoorv Saxena | Research Scientist

24 Feb 2025, Indian Institute of Science, Bangalore


Agenda
§ Introduction to decoding and theoretical foundations
§ Basic decoding strategies
§ Temperature, top-k, top-p, greedy…

§ Min-p sampling

§ Advanced decoding strategies


§ Beam search

§ Speculative decoding (and variants)

We will also be looking at the code of some basic decoding strategies – and please feel free to interrupt and
ask questions!



What is decoding?
• Definition: Generating text by selecting tokens based on model probabilities
• Context: Autoregressive models (e.g., GPT)
• Modeling assumption: Models have been trained on the next-token prediction task



Decoding in Autoregressive LLMs
§ We have a trained model that performs next-token
prediction.
§ Task: Use it to generate text iteratively.
§ Key Idea: Autoregression – Use previously generated tokens
as input to predict the next token.
§ Process:
§ Compute next token probabilities

§ Select next token

§ Append to sequence and repeat
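
A minimal sketch of this loop, assuming a Hugging Face-style causal LM (`model`, `tokenizer`, and the `select_fn` strategy hook are illustrative names, not from the slides; later slides plug different strategies into `select_fn`):

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, select_fn, max_new_tokens=50):
    """Autoregressive loop: compute next-token logits, select a token
    with the pluggable strategy `select_fn`, append, repeat."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]   # logits for the next position
        next_id = select_fn(logits)            # shape (1, 1): the chosen token
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```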



Theoretical Foundation – Language Modeling Equation
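
In standard form, with $f$ the model's logit function and $x_{<t}$ the tokens generated so far:

$$P(x_t \mid x_{<t}) = \mathrm{softmax}\big(f(x_{<t})\big)_{x_t}$$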

Notes:
§ f represents model logits
§ Transformation to probabilities via softmax



Theoretical Foundation – Sequence Probability
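
In standard form, the sequence probability factorizes autoregressively via the chain rule, consistent with the language modeling equation on the previous slide:

$$P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_{<t})$$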

§ Implication: maximizing the overall sequence likelihood reduces to maximizing this product of per-token probabilities

§ Note: This is a theoretical construct – searching over all possible sequences is intractable in practice



Why Does Decoding Matter?
• Connections:
• Training (next-token prediction) vs. Inference (text generation)

• Impact on Output:
• Coherence and correctness

• Creativity, diversity

• Speed of generation

• Still an underexplored area – even recently, simple innovations have led to major gains (e.g., prompt lookup decoding, PLD)!



Quick Example
• Prompt: “The cat sat on the …”
• Token Probabilities:
• "mat" – 0.4

• "chair" – 0.3

• "floor" – 0.2

• "roof" – 0.1

• How do we choose the next token?



Greedy Decoding
• Definition: Always select the token with the highest probability.
• Pros:
• Simple and fast

• Cons:
• Often leads to repetitive or suboptimal outputs
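
As a selection function for the generation loop sketched earlier, greedy decoding is one line (a minimal illustration):

```python
import torch

def greedy(logits: torch.Tensor) -> torch.Tensor:
    """Always pick the single most probable token."""
    return logits.argmax(dim=-1, keepdim=True)

# e.g. generate(model, tokenizer, "The cat sat on the", select_fn=greedy)
# would pick "mat" (p=0.4) in the earlier example.
```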



Sampling: The Basics
• Definition: Randomly select tokens based on the probability distribution.
• Key Parameter: Temperature (T)
• Low T: More focused, conservative outputs.

• High T: Increased diversity and randomness.

(Figure: next-token distribution without temperature T vs. with temperature T applied)
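
A minimal sketch of temperature sampling, pluggable into the earlier loop:

```python
import torch

def temperature_sample(logits: torch.Tensor, T: float = 0.8) -> torch.Tensor:
    """Divide logits by T before the softmax, then sample.
    T < 1 sharpens the distribution (more conservative); T > 1 flattens it."""
    probs = torch.softmax(logits / T, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```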



Sampling: top-k and top-p

Top-k
• Concept: Restrict selection to the top-k most probable tokens.
• Example: k=50
• Effect: Filters out long-tail, low-probability tokens, reducing noise.

Top-p (nucleus sampling*)
• Concept: Choose tokens from the smallest set whose cumulative probability exceeds p.
• Example: p=0.9

*Holtzman et al., 2020, "The Curious Case of Neural Text Degeneration"
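
Minimal sketches of both filters, in the same style as the samplers above (names are illustrative):

```python
import torch

def top_k_sample(logits: torch.Tensor, k: int = 50) -> torch.Tensor:
    """Zero out everything outside the k most probable tokens, then sample."""
    topk_vals, topk_idx = torch.topk(logits, k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx.gather(-1, choice)

def top_p_sample(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Sample from the smallest set of tokens whose cumulative probability
    exceeds p (the top token is always kept)."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    probs = torch.softmax(sorted_logits, dim=-1)
    cumulative = torch.cumsum(probs, dim=-1)
    # drop tokens whose preceding cumulative mass already reaches p
    probs = probs.masked_fill(cumulative - probs >= p, 0.0)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(probs, num_samples=1)
    return sorted_idx.gather(-1, choice)
```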



Sampling – min-p

Nguyen et al., Oct 2024 – "Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs"
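Per the cited paper, min-p keeps only tokens whose probability is at least min_p times that of the single most likely token, so the cutoff adapts to the model's confidence. A minimal sketch in the same style as the samplers above:

```python
import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    """Keep only tokens with probability >= min_p * (top token's probability),
    renormalize, then sample. High-confidence steps prune aggressively;
    flat distributions keep many candidates."""
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    probs = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)
```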


Advanced decoding methods - Introduction
§ Decoding so far
§ Single sequence under consideration

§ Different strategies for selecting next token, given probability vector

§ Next sections
§ Beam search: Keeping multiple sequences under consideration simultaneously

§ Speculative decoding: Speed up LLM decoding without affecting output quality



Why multiple sequences?
§ Let's revisit the sequence probability formulation

• Q: Is there a way to maximize sequence probability while generating?


• Naïve answer: score all possible sequences and take the maximum – but that is intractable
• Practical answer, with a limited time/compute budget: beam search



What is Beam Search?
• Core Idea:
• Rather than choosing just the highest probability token (as in greedy decoding), beam search keeps the top-k sequences at each step.

• Terminology:
• Beam Width (k): Number of candidate sequences retained.

• Motivation:
• Avoids early commitment to a single sequence that may lead to suboptimal outputs.



How Beam Search Works – Step-by-Step
1. Initialization: Start with the initial token or prompt.
2. Expansion: For each sequence in the beam, generate possible next tokens.
3. Scoring: Compute scores (cumulative log probabilities) for each candidate.
4. Pruning: Keep the top-k highest-scoring sequences.
5. Iteration: Repeat until an end-of-sequence token is generated or a maximum length is reached.

(image credits: https://d2l.ai/)
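
A minimal sketch of these steps, again assuming a Hugging Face-style causal LM (names are illustrative):

```python
import torch

@torch.no_grad()
def beam_search(model, tokenizer, prompt, beam_width=4, max_new_tokens=50):
    """Keep the beam_width highest-scoring sequences, scored by
    cumulative log-probability."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    beams = [(ids, 0.0)]                        # (sequence, cumulative log-prob)
    for _ in range(max_new_tokens):
        candidates = []
        for seq, score in beams:
            if seq[0, -1].item() == tokenizer.eos_token_id:
                candidates.append((seq, score))  # finished beam carries over
                continue
            # expansion: top beam_width continuations of this sequence
            log_probs = torch.log_softmax(model(seq).logits[:, -1, :], dim=-1)
            top_lp, top_ids = torch.topk(log_probs, beam_width, dim=-1)
            for lp, tok in zip(top_lp[0], top_ids[0]):
                new_seq = torch.cat([seq, tok.view(1, 1)], dim=-1)
                candidates.append((new_seq, score + lp.item()))
        # pruning: keep only the beam_width best candidates
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
        if all(b[0][0, -1].item() == tokenizer.eos_token_id for b in beams):
            break
    return tokenizer.decode(beams[0][0][0], skip_special_tokens=True)
```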



Scoring in Beam Search
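
In standard form, the beam score is the cumulative log-probability of the sequence:

$$\mathrm{score}(x_1, \dots, x_T) = \sum_{t=1}^{T} \log P(x_t \mid x_{<t})$$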

• Why Log?
• Logs turn a product of many small probabilities into a sum, which is numerically stable and easy to accumulate.

• Interpretation:
• The sequence with the highest cumulative score is considered the best candidate.



Pros & Cons of Beam Search
• Advantages:
• Improved Coherence*: Explores multiple paths, often leading to more fluent text.

• Better Global Quality*: Reduces the risk of getting stuck in locally optimal (but globally suboptimal) decisions.

• Disadvantages:
• Computational Cost: More sequences to evaluate compared to greedy decoding.

• Reduced Diversity: Can still converge to similar outputs if beam width is narrow.

• Complexity: Requires careful tuning of the beam width parameter.

• Objective doesn’t align with training: While it maximizes the language modeling objective, the underlying models were only trained to predict
next token, not the full sequence!



Speculative Decoding - Motivation & Background
• Autoregressive Generation:
• Generates text one token at a time

• Can be slow due to sequential dependency – even easy-to-predict tokens take the same amount of time!

• Need for Speed:


• Real-time applications require faster inference

• Speculative decoding offers a way to reduce latency

• Key Idea:
• Use a fast, approximate model to “speculate” future tokens

• Validate these tokens with a more accurate model



Speculative Decoding
§ What makes it possible?
§ The "Attention Is All You Need" architecture!

§ Key Insight: We have access to next-token probabilities for all tokens in the sequence – not just the last token!
§ If we could make educated guesses for next k tokens, how do we leverage it?

Great blog post on spec-dec: https://huggingface.co/blog/assisted-generation
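
A minimal sketch of one draft-and-verify step (greedy variant for clarity; the full algorithm verifies sampled tokens with rejection sampling so the output distribution is unaffected, as the slides note). `target_model`, `draft_model`, and `k` are illustrative names:

```python
import torch

@torch.no_grad()
def speculative_step(target_model, draft_model, ids, k=5):
    """One greedy speculative decoding step: the small draft model proposes
    k tokens; the large target model scores the whole extended sequence in
    ONE forward pass, and we keep the longest agreeing prefix."""
    # 1. Draft: cheap sequential proposals from the small model
    draft = ids
    for _ in range(k):
        logits = draft_model(draft).logits[:, -1, :]
        draft = torch.cat([draft, logits.argmax(-1, keepdim=True)], dim=-1)

    # 2. Verify: one parallel pass of the target model over prompt + draft
    #    (attention gives next-token logits at every position at once)
    target_logits = target_model(draft).logits
    n = ids.shape[1]
    accepted = ids
    for i in range(k):
        target_tok = target_logits[:, n + i - 1, :].argmax(-1, keepdim=True)
        accepted = torch.cat([accepted, target_tok], dim=-1)
        if target_tok.item() != draft[0, n + i].item():
            break   # first disagreement: keep the target's token and stop
    return accepted
```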




Prompt Lookup Decoding
§ Speculative decoding: Requires an assistant model
§ Additional VRAM requirements

§ Need to consider speed/quality tradeoff with smaller models

§ Let's consider some limited use cases – document summarization, doc QA, code editing
§ Is there a way to get good draft tokens without using an additional model?



Prompt Lookup Decoding
§ Use the prompt itself!
§ Steps:
§ Take the last few generated tokens (so far)

§ Search for these in the prompt (e.g., the document, or earlier code in the prompt)

§ If a match is found – continuation of these tokens is the draft!

§ Use model as verifier, repeat
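
A minimal sketch of the lookup step (function and parameter names are illustrative, not the reference implementation). The returned draft is then verified by the main model exactly as in speculative decoding:

```python
def prompt_lookup_draft(tokens, ngram_size=3, num_draft=10):
    """Prompt lookup decoding sketch: match the last few generated tokens
    against earlier context and propose the continuation as the draft.
    `tokens` is the full list of token ids (prompt + generated so far)."""
    pattern = tokens[-ngram_size:]
    # scan right-to-left so the most recent earlier match wins;
    # start before the trailing occurrence of the pattern itself
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == pattern:
            draft = tokens[start + ngram_size : start + ngram_size + num_draft]
            if draft:
                return draft     # match found: its continuation is the draft
    return []                    # no match: fall back to ordinary decoding
```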



Prompt Lookup Decoding

(Figure: string matching over the prompt serves as the "Draft Model")

Part of major LLM inference libraries, including transformers and vLLM


Prompt Lookup Decoding

https://github.com/apoorvumang/prompt-lookup-decoding
Somasundaram et al., 2024, "PLD+: Accelerating LLM inference by leveraging Language Model Artifacts"
Recap & Key Takeaways
§ Decoding basics, theoretical underpinnings
§ Overview of deterministic and stochastic methods
§ Greedy, sampling, sampling parameters (temperature, top-k, top-p, min-p)

§ Advanced methods for efficiency and quality


§ Beam search

§ Speculative decoding

§ Prompt lookup decoding

§ Relatively underexplored field – even DeepSeek-R1 uses plain temp=0.7 sampling!


§ Possible to make progress on SoTA even with modest GPU resources!

