PT-2026-38288 · Vllm · Vllm
Yunzez
·
Published
2026-05-06
·
Updated
2026-05-12
·
CVE-2026-44223
CVSS v3.1
6.5
Medium
Vector: AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
Name of the Vulnerable Software and Affected Versions
vLLM versions 0.18.0 through 0.19.1
Description
The extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, which raises a RuntimeError that crashes the EngineCore process. The crash occurs when any request in a batch sets a sampling penalty parameter: repetition_penalty, frequency_penalty, or presence_penalty. The issue stems from a refactor of the propose() function: removing a .unsqueeze(-1) call causes a broadcast shape mismatch during penalty application, because the rejection sampler produces a shape of (batch_size, 2) instead of the expected (batch_size, 1) after the first decode step. A single request containing a penalty parameter is enough to crash the server deterministically and immediately, resulting in a complete loss of service availability.
Recommendations
Update to version 0.20.0.
Avoid using extract_hidden_states as the speculative decoding method.
Strip or reject the repetition_penalty, frequency_penalty, and presence_penalty parameters from incoming requests at the API gateway.
Exploit
Fix
Incorrect Type Conversion or Cast
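The gateway-level mitigation listed under Recommendations could be sketched roughly as follows. This is a minimal illustration, not part of vLLM: it assumes the gateway has already decoded the request body into a Python dict, and the helper names strip_penalties and reject_penalties are hypothetical. Only the three penalty field names come from the advisory.

```python
from typing import Any

# Sampling penalty fields named in the advisory.
PENALTY_FIELDS = frozenset(
    {"repetition_penalty", "frequency_penalty", "presence_penalty"}
)


def strip_penalties(body: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return a copy of the request body without penalty fields.

    Hypothetical helper: also returns the sorted names of the removed
    fields so the gateway can log what it dropped.
    """
    cleaned = {k: v for k, v in body.items() if k not in PENALTY_FIELDS}
    removed = sorted(k for k in body if k in PENALTY_FIELDS)
    return cleaned, removed


def reject_penalties(body: dict[str, Any]) -> None:
    """Raise ValueError if the request body carries any penalty field.

    Hypothetical helper for gateways that prefer to refuse the request
    (for example with an HTTP 400) instead of silently rewriting it.
    """
    present = sorted(k for k in body if k in PENALTY_FIELDS)
    if present:
        raise ValueError("penalty parameters not allowed: " + ", ".join(present))
```

Stripping keeps existing clients working at the cost of silently changing their sampling behaviour, while rejecting makes the restriction visible to callers. Either approach is a stopgap until the upgrade to 0.20.0 is in place.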
Related Identifiers
Affected Products
Vllm