[Roadmap] - Speculative Decoding Support · Issue #365 · mirage-project/mirage · GitHub
[Roadmap] - Speculative Decoding Support #365
Open
1 of 3 issues completed
@NorthmanPKU

Description

Feature Description

A key advantage of MPK is extremely low latency for LLM serving. Speculative decoding is a lossless way to reduce LLM latency further, which would push MPK's decoding speedup to the next level.

We will first implement model-free methods such as prompt-lookup decoding and lookahead decoding. Other methods will be scheduled later.
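For readers unfamiliar with prompt-lookup decoding, here is a minimal sketch of the draft-proposal step in plain Python. It is not MPK code: the function name `prompt_lookup_draft` and the parameters `ngram_size` and `num_draft` are hypothetical, chosen only for illustration. The idea is to match the trailing n-gram of the context against earlier tokens and propose the tokens that followed the most recent match as the draft; the target model would then verify them.

```python
from typing import List, Sequence


def prompt_lookup_draft(
    tokens: Sequence[int], ngram_size: int = 3, num_draft: int = 5
) -> List[int]:
    """Propose draft tokens by finding the trailing n-gram earlier in the context.

    Hypothetical helper for illustration only; not part of MPK.
    """
    # Try the longest n-gram first, then fall back to shorter ones.
    for n in range(min(ngram_size, len(tokens) - 1), 0, -1):
        suffix = list(tokens[-n:])
        # Scan right to left so the most recent earlier occurrence wins.
        # The last valid start excludes the trailing suffix itself.
        for start in range(len(tokens) - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                # Copy the tokens that followed the match as the draft.
                return list(tokens[start + n:start + n + num_draft])
    # No earlier occurrence found: propose nothing.
    return []


draft = prompt_lookup_draft([1, 2, 3, 4, 5, 1, 2, 3], ngram_size=3, num_draft=2)
```

In this example the trailing 3-gram `[1, 2, 3]` also appears at the start of the context, so the draft is the two tokens that followed it there. An empty draft means that step falls back to ordinary one-token decoding.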

Comments on the roadmap are welcome, as is picking up any task you're interested in.

General TODO (to be extended)

  • Python interface parameters that control whether speculative decoding is enabled, which method is used, and the parameters each method needs
  • Parallel decoding support across the different kernels
  • Mask support for topology-aware verification (e.g., SpecInfer)
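To make the mask item concrete, the following is a rough sketch of how a tree-shaped attention mask for topology-aware verification could be built. It is an illustration under stated assumptions, not MPK's or SpecInfer's implementation: the function name `tree_attention_mask` is hypothetical, and the draft token tree is assumed to be given as a parent-index array in which each node's parent appears before it and the root has parent `-1`. Each draft token may attend only to itself and its ancestors, so sibling branches do not see each other during verification.

```python
from typing import List


def tree_attention_mask(parents: List[int]) -> List[List[bool]]:
    """Build a boolean mask where mask[i][j] is True iff node j is node i or an ancestor of i.

    Assumes parents[i] < i for every non-root node and parents[root] == -1.
    Hypothetical helper for illustration only.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        # Walk from node i up to the root, marking every node on the path.
        while j != -1:
            mask[i][j] = True
            j = parents[j]
    return mask


# Token tree: node 0 is the root, nodes 1 and 2 are its children, node 3 is a child of node 1.
mask = tree_attention_mask([-1, 0, 0, 1])
```

Here node 3 attends to nodes 0, 1, and 3 but not to node 2, which lies on a different branch. A real kernel would also let every draft token attend to the already-verified prefix in the KV cache; that part is omitted here.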

Method-specific TODO

Please see the issue page for each method.

Methods to support

Model-free methods
