Yunzez

Total CVSS: 6.5

vLLM · vLLM · CVE-2026-44223

**Name of the Vulnerable Software and Affected Versions** vLLM versions 0.18.0 through 0.19.1.

**Description** The `extract_hidden_states` speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, which raises a `RuntimeError` that crashes the EngineCore process. The crash occurs whenever any request in a batch uses one of the sampling penalty parameters `repetition_penalty`, `frequency_penalty`, or `presence_penalty`. The root cause is a refactor of the `propose()` function that removed a `.unsqueeze(-1)` call. After the first decode step, the rejection sampler produces a tensor of shape `(batch_size, 2)` instead of the expected `(batch_size, 1)`, and the resulting broadcast shape mismatch during penalty application raises the error. A single request containing a penalty parameter is enough to crash the server deterministically and immediately, causing a complete loss of service availability.

**Recommendations**

- Update to version 0.20.0.
- Avoid using `extract_hidden_states` as the speculative decoding method.
- Strip or reject the `repetition_penalty`, `frequency_penalty`, and `presence_penalty` parameters from incoming requests at the API gateway.
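To check whether a deployment falls inside the affected range (0.18.0 through 0.19.1), a version comparison is enough. The sketch below is illustrative only: `release_tuple` and `is_affected` are hypothetical helper names, and it assumes that suffixes such as `rc1`, `.dev0`, or `+cu121` can be ignored, so pre-releases and local builds are treated like their base release. That simplification may misclassify release candidates of 0.20.0.

```python
import re

# Affected range as stated in the advisory: 0.18.0 through 0.19.1 (inclusive).
AFFECTED_MIN = (0, 18, 0)
AFFECTED_MAX = (0, 19, 1)


def release_tuple(version: str) -> tuple[int, int, int]:
    """Extract the leading MAJOR.MINOR[.PATCH] numbers from a version string.

    Assumption: anything after the numeric release segment (e.g. 'rc1',
    '.dev0', '+cu121') is ignored. A missing PATCH component counts as 0.
    """
    match = re.match(r"^\s*v?(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """Return True if `version` lies inside the advisory's affected range."""
    return AFFECTED_MIN <= release_tuple(version) <= AFFECTED_MAX
```

For an installed package, `importlib.metadata.version("vllm")` from the standard library returns the version string to pass to `is_affected`.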
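The crash described above comes down to standard tensor broadcasting rules, under which PyTorch raises a `RuntimeError` when two shapes cannot be aligned. The pure-Python sketch below reimplements those rules (align from the trailing dimension; each pair of sizes must match or one must be 1) to show why a per-request tensor of shape `(batch_size, 1)` combines cleanly with a `(batch_size, vocab_size)` tensor while `(batch_size, 2)` does not. It is a simplified illustration, not vLLM's actual penalty code path; the function name `broadcast_shape` and the sizes used are hypothetical.

```python
def broadcast_shape(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Return the shape produced by broadcasting shapes `a` and `b`.

    Follows the NumPy/PyTorch rule: pad the shorter shape with leading 1s,
    then each dimension pair must be equal or contain a 1.
    """
    ndim = max(len(a), len(b))
    a_padded = (1,) * (ndim - len(a)) + tuple(a)
    b_padded = (1,) * (ndim - len(b)) + tuple(b)
    result = []
    for x, y in zip(a_padded, b_padded):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            # PyTorch reports this situation as a RuntimeError.
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(result)


if __name__ == "__main__":
    BATCH_SIZE, VOCAB_SIZE = 4, 32000  # hypothetical sizes for illustration
    logits_shape = (BATCH_SIZE, VOCAB_SIZE)

    # Expected shape: one value per request -> broadcasts across the vocabulary.
    print(broadcast_shape((BATCH_SIZE, 1), logits_shape))

    # Shape reported in the advisory: trailing sizes 2 and VOCAB_SIZE conflict.
    try:
        broadcast_shape((BATCH_SIZE, 2), logits_shape)
    except ValueError as err:
        print(err)
```

In PyTorch, `x.unsqueeze(-1)` appends a size-1 trailing dimension (shape `(batch_size,)` becomes `(batch_size, 1)`), which is the kind of shape adjustment whose removal the advisory identifies as the root cause.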
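The gateway mitigation (stripping or rejecting the three penalty parameters) can be sketched as a filter over the parsed JSON request body. This is a minimal, gateway-agnostic sketch: `filter_penalty_params` and `PenaltyParamRejected` are hypothetical names, and it assumes the penalties arrive as top-level keys of a flat JSON object. Nested structures are not inspected, and wiring the function into a specific proxy is left to the deployment.

```python
from typing import Any

# Sampling parameters named in the advisory as crash triggers.
PENALTY_PARAMS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


class PenaltyParamRejected(ValueError):
    """Raised in reject mode when a request carries a blocked parameter."""


def filter_penalty_params(body: dict[str, Any], reject: bool = False) -> dict[str, Any]:
    """Strip (default) or reject penalty parameters in a request body.

    Assumption: the penalties are top-level keys of the JSON body.
    Returns a new dict; the caller's `body` is not modified.
    """
    present = [name for name in PENALTY_PARAMS if name in body]
    if present and reject:
        raise PenaltyParamRejected(
            f"unsupported sampling parameters: {', '.join(present)}"
        )
    return {key: value for key, value in body.items() if key not in PENALTY_PARAMS}
```

Stripping silently changes sampling behavior for clients that set these fields, whereas reject mode (for example, mapping `PenaltyParamRejected` to an HTTP 400 response) makes the restriction visible to callers.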