[Roadmap] - Speculative Decoding Support

@NorthmanPKU

Feature Description

A key advantage of MPK is extremely low latency for LLM serving. Speculative decoding is a lossless way to reduce LLM latency further, which will push MPK's decoding speedup to the next level.
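To make "lossless" concrete, here is a minimal sketch of the acceptance step under greedy decoding. It is not MPK code: the function name `verify_greedy` and its argument layout are assumptions made for illustration. The idea is that the target model scores all draft positions in one pass, the longest draft prefix that matches the target's own greedy choices is kept, and the target's token is appended at the first mismatch, so the committed tokens are the same ones plain greedy decoding would have produced.

```python
from typing import List, Sequence


def verify_greedy(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Return the tokens to commit after one verification pass (hypothetical helper).

    draft:  tokens proposed by the speculation method.
    target: the target model's greedy token at each draft position, where
            target[i] was computed with draft[:i] as context. It has
            len(draft) + 1 entries; the last one follows the full draft.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly len(draft) + 1 entries")

    committed: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            # First mismatch: discard the rest of the draft and keep the
            # target model's own token instead.
            committed.append(t)
            return committed
        committed.append(d)

    # Every draft token matched, so the extra target token comes for free.
    committed.append(target[-1])
    return committed
```

Each pass commits at least one token, so the worst case costs the same number of target-model passes as ordinary decoding. With sampling instead of greedy decoding, the acceptance rule is a rejection-sampling test rather than an equality check; that case is not covered by this sketch.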

We will first implement model-free methods such as prompt lookup decoding and lookahead decoding. Other methods will be scheduled later.

Comments on the roadmap are welcome, and so is picking up any task you're interested in.

General TODO (to be extended)

Python interface parameters for enabling or disabling speculative decoding, selecting the method, and passing method-specific parameters
Parallel decoding support in the different kernels
Mask support for topology-aware verification (e.g. SpecInfer)
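For the mask item, a rough sketch of what a topology-aware mask could look like when draft tokens form a tree. This is only an illustration under stated assumptions: the `parents` array encoding and the function name `tree_attention_mask` are hypothetical, not an MPK or SpecInfer interface. Each draft token may attend to itself and to its ancestors in the tree, so several candidate branches can be verified in one pass without seeing each other.

```python
from typing import List, Sequence


def tree_attention_mask(parents: Sequence[int]) -> List[List[bool]]:
    """Build a mask over draft tokens arranged as a tree (hypothetical helper).

    parents[i] is the index of node i's parent, or -1 for a root.
    Parents must appear before their children.
    mask[i][j] is True when draft token i may attend to draft token j.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i, p in enumerate(parents):
        if p >= i or p < -1:
            raise ValueError("each parent must be -1 or an earlier index")
        mask[i][i] = True  # a token always attends to itself
        if p >= 0:
            # Inherit everything the parent can see (the parent and its ancestors).
            for j in range(n):
                if mask[p][j]:
                    mask[i][j] = True
    return mask
```

A linear chain such as `parents = [-1, 0, 1]` reduces to the usual lower-triangular causal mask, which is the case the model-free methods need. Attention to the already committed prefix is assumed to be handled separately and is left out here.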

Method-specific TODO

See the issue page for each method.

Methods to support

Model-free methods

Prompt Lookup Decoding @NorthmanPKU
Lookahead Decoding
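As a reference point for the first method, here is a minimal sketch of how prompt lookup decoding proposes draft tokens. The function name `prompt_lookup_draft` and the parameters `max_ngram` and `num_draft` are assumptions for illustration, not the planned MPK interface. The idea is to take the last few tokens of the sequence, find an earlier occurrence of the same n-gram, and propose the tokens that followed it as the draft, with no draft model involved.

```python
from typing import List, Sequence


def prompt_lookup_draft(
    tokens: Sequence[int], max_ngram: int = 3, num_draft: int = 5
) -> List[int]:
    """Propose draft tokens by n-gram matching against earlier context (sketch).

    tokens:    prompt plus tokens generated so far.
    max_ngram: longest trailing n-gram to try (hypothetical parameter).
    num_draft: maximum number of draft tokens to return (hypothetical parameter).
    """
    n_tokens = len(tokens)
    # Try longer n-grams first because they give more specific matches.
    for n in range(min(max_ngram, n_tokens - 1), 0, -1):
        suffix = list(tokens[n_tokens - n:])
        # Scan backwards so the most recent earlier occurrence wins.
        for start in range(n_tokens - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                follow = list(tokens[start + n:start + n + num_draft])
                if follow:
                    return follow
    return []  # no match: fall back to ordinary decoding for this step
```

For example, `prompt_lookup_draft([1, 2, 3, 4, 5, 1, 2, 3], max_ngram=3, num_draft=2)` returns `[4, 5]`, since the trailing `[1, 2, 3]` also appears at the start of the sequence. The draft still has to be checked by the target model before any token is committed.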
