Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
FOS: Computer and information sciences
Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2402.13720
Publication Date: 2024-02-21
AUTHORS (6)
ABSTRACT
Drafting-then-verifying decoding methods such as speculative decoding are widely adopted training-free methods to accelerate the inference of large language models (LLMs). Instead of employing an autoregressive process to decode tokens sequentially, speculative decoding initially creates drafts with an efficient small model. Then the LLM is required to conduct verification and correction in a non-autoregressive fashion to minimize time overhead. Generating longer drafts can lead to even more significant speedups once verified, but also incurs substantial trial-and-error costs if verification fails. Suffering from this high failure probability, existing methods cannot draft too much content for verification at one time, achieving sub-optimal acceleration. In this paper, we introduce Ouroboros, which constructs a phrase candidate pool from the verification process of the LLM to provide candidates for the draft generation of the small model. Thereby, Ouroboros can further improve the efficiency and effectiveness of the initial drafts. The experimental results on typical text generation tasks show that Ouroboros achieves speedups of up to 1.9x and 2.8x compared to lookahead decoding and speculative decoding, respectively. The source code is available at https://github.com/thunlp/Ouroboros.
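To make the scheme concrete, below is a minimal, dependency-free Python sketch of one draft-then-verify step with a phrase candidate pool. Everything in it is illustrative rather than the paper's implementation: large_model_next and small_model_next are pure-function stand-ins for real model forward passes, the pool lookup and acceptance policies are simplified assumptions, and the verification loop runs sequentially where the actual method uses a single non-autoregressive pass. See the repository above for the real code.

def large_model_next(prefix):
    # Stand-in for one greedy decoding step of the target LLM
    # (a pure function so the sketch runs without ML dependencies).
    return (sum(prefix) * 31 + 7) % 100

def small_model_next(prefix):
    # Cheap draft model: mostly agrees with the target, sometimes not.
    t = large_model_next(prefix)
    return t if t % 5 else (t + 1) % 100

def speculative_step(prefix, pool, draft_len=6):
    # 1) Drafting: reuse a pool phrase whose first token matches the
    #    small model's guess (a deliberately naive lookup policy),
    #    then extend token by token with the small model.
    first = small_model_next(prefix)
    match = next((p for p in pool if p[0] == first), None)
    draft = list(match) if match else []
    cur = list(prefix) + draft
    while len(draft) < draft_len:
        t = small_model_next(cur)
        draft.append(t)
        cur.append(t)
    # 2) Verification: the target model's token at every draft position.
    #    (Sequential here for clarity; the real method obtains all of
    #    these from a single non-autoregressive forward pass.)
    cur, verified = list(prefix), []
    for t in draft:
        verified.append(large_model_next(cur))
        cur.append(t)
    # 3) Accept the longest agreeing prefix, then one corrected token.
    n = 0
    while n < len(draft) and draft[n] == verified[n]:
        n += 1
    out = list(prefix) + draft[:n]
    if n < len(draft):
        out.append(verified[n])  # correction token from the verifier
        # Tokens past the mismatch were computed during verification
        # anyway (under the now-rejected draft); keep them as an
        # approximate phrase candidate for future drafts -- the pool idea.
        if verified[n + 1:]:
            pool.append(tuple(verified[n + 1:]))
    else:
        out.append(large_model_next(out))  # bonus token on full accept
    return out, pool

seq, pool = [1, 2, 3], []
for _ in range(8):
    seq, pool = speculative_step(seq, pool)
print("generated:", seq)
print("phrase pool:", pool)

The point of the pool in this toy is that the verifier's extra tokens come for free: even when a draft is rejected mid-way, the tokens the LLM computed past the mismatch can seed future drafts, which is how, per the abstract, Ouroboros lengthens and improves drafts without raising the trial-and-error cost of drafting from scratch.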