Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Liao, Baohao; Xu, Yuhui; Dong, Hanze; Li, Junnan; Monz, Christof; Savarese, Silvio; Sahoo, Doyen; Xiong, Caiming

arXiv:2501.19324 (cs)

[Submitted on 31 Jan 2025 (v1), last revised 26 Jun 2025 (this version, v3)]

Abstract: We introduce Reward-Guided Speculative Decoding (RSD), a novel framework for improving the inference efficiency of large language models (LLMs). RSD combines a lightweight draft model with a more powerful target model and deliberately introduces a controlled bias toward high-reward outputs, in contrast to existing speculative decoding methods, which enforce strict unbiasedness. RSD uses a process reward model to evaluate intermediate decoding steps and dynamically decides whether to invoke the target model, optimizing the trade-off between computational cost and output quality. We show theoretically that a threshold-based mixture strategy achieves an optimal balance between resource utilization and performance. Extensive evaluations on challenging reasoning benchmarks, including Olympiad-level tasks, show that RSD delivers significant efficiency gains over decoding with the target model alone (up to 4.4x fewer FLOPs), while achieving significantly better accuracy than a parallel decoding method on average (up to +3.5). These results highlight RSD as a robust and cost-effective approach for deploying LLMs in resource-intensive scenarios. The code is available at this https URL.
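The step-level decision described in the abstract (score each draft step with a process reward model and fall back to the target model only when the score is too low) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the draft model, target model, and process reward model are exposed as plain Python callables over a list of reasoning steps, and that a single fixed reward threshold decides acceptance. The names `reward_guided_decode`, `StepGenerator`, `RewardModel`, `DecodeStats`, `threshold`, and `is_final` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper's codebase):
# a step generator maps the current prefix (prompt + steps so far) to the next step.
StepGenerator = Callable[[List[str]], str]
# a process reward model scores a candidate step given the prefix.
RewardModel = Callable[[List[str], str], float]


@dataclass
class DecodeStats:
    draft_accepted: int = 0  # steps kept from the cheap draft model
    target_calls: int = 0    # steps regenerated by the expensive target model


def reward_guided_decode(
    prompt: str,
    draft: StepGenerator,
    target: StepGenerator,
    prm: RewardModel,
    threshold: float,
    max_steps: int,
    is_final: Callable[[str], bool],
) -> Tuple[List[str], DecodeStats]:
    """Sketch of threshold-based step selection between a draft and a target model."""
    prefix: List[str] = [prompt]
    stats = DecodeStats()
    for _ in range(max_steps):
        candidate = draft(prefix)
        if prm(prefix, candidate) >= threshold:
            # Reward is high enough: keep the draft step, no target call needed.
            step = candidate
            stats.draft_accepted += 1
        else:
            # Reward too low: invoke the target model for this step instead.
            step = target(prefix)
            stats.target_calls += 1
        prefix.append(step)
        if is_final(step):
            break
    return prefix[1:], stats  # drop the prompt, return only generated steps


# Toy usage with stand-in callables: the "PRM" rewards every other draft step.
steps, stats = reward_guided_decode(
    "Q",
    draft=lambda p: f"d{len(p)}",
    target=lambda p: f"T{len(p)}",
    prm=lambda p, c: 0.9 if len(p) % 2 == 1 else 0.2,
    threshold=0.5,
    max_steps=4,
    is_final=lambda s: False,
)
print(steps, stats)
```

In this sketch, lowering `threshold` accepts more draft steps and saves target-model calls at some risk to quality, while raising it shifts more of the work back to the target model.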

Comments:	17 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.19324 [cs.CL]
	(or arXiv:2501.19324v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.19324

Submission history

From: Yuhui Xu
[v1] Fri, 31 Jan 2025 17:19:57 UTC (639 KB)
[v2] Fri, 14 Feb 2025 07:30:00 UTC (639 KB)
[v3] Thu, 26 Jun 2025 03:14:46 UTC (866 KB)