vLLM · vLLM · CVE-2026-44223
**Name of the Vulnerable Software and Affected Versions**
vLLM versions 0.18.0 through 0.19.1
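As a first check of exposure, the affected range above can be tested in code. The sketch below is a minimal, stdlib-only Python version comparison. The function names are hypothetical. It approximates version semantics by ignoring pre-release and local suffixes, so `0.20.0rc1` is treated like `0.20.0`, even though this advisory does not say whether pre-releases contain the fix.

```python
import re

# Affected range as stated in this advisory: 0.18.0 through 0.19.1, inclusive.
AFFECTED_MIN = (0, 18, 0)
AFFECTED_MAX = (0, 19, 1)


def parse_release(version: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) from a string such as '0.19.1'.

    Any suffix after the numeric part (e.g. 'rc1', '.dev3', '+cu124') is
    ignored. This is an approximation, not full PEP 440 ordering.
    """
    match = re.match(r"^\s*v?(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected_version(version: str) -> bool:
    """True if the version falls inside the advisory's affected range."""
    return AFFECTED_MIN <= parse_release(version) <= AFFECTED_MAX


print(is_affected_version("0.19.1"))  # True
print(is_affected_version("0.20.0"))  # False
```

A version in this range is necessary but not sufficient for the crash. Per the description, the deployment must also use the affected speculative decoding proposer and receive a request that sets a penalty parameter.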
**Description**
The `extract_hidden_states` speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, raising a `RuntimeError` that crashes the EngineCore process. The crash occurs whenever any request in the batch uses one of the sampling penalty parameters: `repetition_penalty`, `frequency_penalty`, or `presence_penalty`.
The root cause is a refactor of the `propose()` function that removed a `.unsqueeze(-1)` call. Without it, penalty application hits a broadcast shape mismatch: after the first decode step, the rejection sampler produces a tensor of shape `(batch_size, 2)` instead of the expected `(batch_size, 1)`.
A single request containing any penalty parameter is enough to trigger the crash, which is deterministic and immediate and results in a complete loss of service availability.
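The failure is an ordinary tensor broadcasting error. Under NumPy/PyTorch broadcasting, shapes are compared from the trailing dimension, and each aligned pair of sizes must be equal or one of them must be 1. A per-row value shaped `(batch_size, 1)` therefore combines with a `(batch_size, vocab_size)` tensor, while `(batch_size, 2)` does not for any realistic vocabulary size. The pure-Python sketch below models only that shape rule. The logits operand and its sizes are hypothetical, and this is not vLLM's penalty code, which the advisory does not show. It also illustrates why a dropped `.unsqueeze(-1)` can matter in general: a 1-D `(batch_size,)` tensor is aligned against the vocabulary dimension instead of the batch dimension.

```python
def broadcast_shape(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Result shape of an elementwise op under NumPy/PyTorch broadcasting.

    Shapes are aligned from the trailing dimension; missing leading
    dimensions count as size 1. Each aligned pair must be equal or contain
    a 1, otherwise a RuntimeError is raised (PyTorch also raises
    RuntimeError here, though with a different message).
    """
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and 1 not in (da, db):
            raise RuntimeError(f"size {da} vs size {db} at dimension -{i}")
        # A size-1 dimension stretches to match the other operand.
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


def broadcasts(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """True if shapes a and b are broadcast-compatible."""
    try:
        broadcast_shape(a, b)
    except RuntimeError:
        return False
    return True


# Hypothetical sizes, for illustration only.
BATCH_SIZE, VOCAB_SIZE = 4, 32000
LOGITS = (BATCH_SIZE, VOCAB_SIZE)

print(broadcasts(LOGITS, (BATCH_SIZE, 1)))  # True: expected per-row shape
print(broadcasts(LOGITS, (BATCH_SIZE, 2)))  # False: the (batch_size, 2) shape
print(broadcasts(LOGITS, (BATCH_SIZE,)))    # False: 1-D, no unsqueeze(-1)
```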
**Recommendations**
Upgrade to vLLM version 0.20.0 or later.
Avoid using `extract_hidden_states` as the speculative decoding method.
Strip or reject the `repetition_penalty`, `frequency_penalty`, and `presence_penalty` parameters from incoming requests at the API gateway.
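At the gateway, the request-filtering mitigation amounts to inspecting each JSON request body before it reaches vLLM. The following is a minimal Python sketch, assuming the penalties arrive as top-level JSON fields, as they do in vLLM's OpenAI-compatible endpoints. `sanitize_request`, `sanitize_raw`, and `PenaltyRejected` are hypothetical names, and wiring them into a particular gateway is out of scope.

```python
import json
from typing import Any

# Fields named in this advisory. frequency_penalty and presence_penalty are
# standard OpenAI-style request fields; repetition_penalty is accepted by
# vLLM's OpenAI-compatible server as an extra sampling parameter.
PENALTY_FIELDS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


class PenaltyRejected(ValueError):
    """Hypothetical error a gateway could map to an HTTP 400 response."""


def sanitize_request(body: dict[str, Any], mode: str = "strip") -> dict[str, Any]:
    """Return a copy of a request body with the penalty fields handled.

    mode="strip" drops the fields; mode="reject" raises PenaltyRejected.
    Fields are handled regardless of value, since the advisory does not
    say whether neutral values (e.g. repetition_penalty=1.0) avoid the crash.
    """
    if mode not in ("strip", "reject"):
        raise ValueError(f"unknown mode: {mode!r}")
    present = [name for name in PENALTY_FIELDS if name in body]
    if present and mode == "reject":
        raise PenaltyRejected("blocked sampling parameters: " + ", ".join(present))
    return {key: value for key, value in body.items() if key not in PENALTY_FIELDS}


def sanitize_raw(raw: bytes, mode: str = "strip") -> bytes:
    """Parse a raw JSON body, sanitize it, and re-serialize it."""
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object as the request body")
    return json.dumps(sanitize_request(parsed, mode)).encode("utf-8")


example = {"model": "my-model", "prompt": "Hello", "presence_penalty": 0.5}
print(sanitize_request(example))  # {'model': 'my-model', 'prompt': 'Hello'}
```

Stripping keeps existing clients working but silently changes their sampling behavior; rejecting makes the policy visible to callers at the cost of failed requests. Either way, the filter only protects traffic that passes through the gateway.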