[V1][Spec Decode] Do not generate draft tokens beyond max_model_len #16087

WoosukKwon · 2025-04-05T05:11:04Z

Implements item 4 ("Handle the edge cases like when the draft model generates beyond max_pos_embeddings") in #15901.

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
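For readers unfamiliar with the edge case, here is a minimal sketch of the idea in the title: cap the number of draft tokens so that a request never grows past `max_model_len`. The function name `cap_num_draft_tokens` and its parameters are hypothetical and are not taken from this PR's diff; the actual change may account for details this sketch ignores, such as the extra token produced during verification.

```python
def cap_num_draft_tokens(num_tokens: int, num_spec_tokens: int,
                         max_model_len: int) -> int:
    """Return how many draft tokens may be proposed for one request.

    Hypothetical helper (an assumption, not vLLM's API):
      num_tokens      -- tokens already in the request (prompt + generated)
      num_spec_tokens -- draft tokens the proposer would normally emit
      max_model_len   -- maximum sequence length the model supports
    """
    # Room left before the request reaches the length limit.
    remaining = max_model_len - num_tokens
    # Never propose more than fits, and never a negative count.
    return max(0, min(num_spec_tokens, remaining))


if __name__ == "__main__":
    # A request with plenty of room keeps all of its draft tokens.
    print(cap_num_draft_tokens(10, 3, 16))
    # A request near the limit gets fewer draft tokens.
    print(cap_num_draft_tokens(14, 3, 16))
```

A request already at or past the limit gets zero draft tokens under this sketch, so it falls back to ordinary decoding for that step.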

github-actions · 2025-04-05T05:11:14Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

comaniac

Overall LGTM. Approving first to unblock the PR. Meanwhile, it would be good to have a unit test for it. Also, do we know the overhead of the introduced ops (e.g., torch.where)?
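As a rough illustration of the kind of check a unit test could cover, the sketch below masks out-of-range positions with an elementwise select, which is what an op like `torch.where` does on tensors. It uses plain Python lists so it runs without PyTorch. The function name `clamp_positions`, the 0-indexed position convention, and the choice of 0 as the replacement value are assumptions for illustration, not details confirmed by this PR.

```python
def clamp_positions(positions: list[int],
                    max_model_len: int) -> tuple[list[int], list[bool]]:
    """Replace out-of-range positions with 0 and report which were replaced.

    Hypothetical helper (an assumption, not code from the PR). Positions
    are treated as 0-indexed, so valid values are 0 .. max_model_len - 1.
    """
    # Mask of positions that fall outside the supported range.
    exceeds = [p >= max_model_len for p in positions]
    # Elementwise select: keep the position if valid, else use 0.
    clamped = [0 if e else p for p, e in zip(positions, exceeds)]
    return clamped, exceeds


def test_clamp_positions() -> None:
    clamped, exceeds = clamp_positions([5, 15, 16, 17], max_model_len=16)
    assert clamped == [5, 15, 0, 0]
    assert exceeds == [False, False, True, True]


if __name__ == "__main__":
    test_clamp_positions()
```

The select itself is a single elementwise pass over the batch; whether its cost is noticeable in practice would need to be measured on the real tensors.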

[Spec Decode] Do not generate draft tokens beyond max_model_len

0931329

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

WoosukKwon requested review from robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners April 5, 2025 05:11

WoosukKwon changed the title ~~[Spec Decode] Do not generate draft tokens beyond max_model_len~~ [V1][Spec Decode] Do not generate draft tokens beyond max_model_len Apr 5, 2025

mergify bot added the v1 label Apr 5, 2025

comaniac approved these changes Apr 5, 2025

comaniac added the needs-tests Tests needed for this PR label Apr 5, 2025

WoosukKwon added 2 commits April 5, 2025 10:02

Fix ngram

55b6e1d

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Update comments

f5c3af6

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

WoosukKwon mentioned this pull request Apr 6, 2025

[SpecDecode] Support EAGLE in V1 #15901

Open

7 tasks
