PT-2026-38288 · Vllm · Vllm

Yunzez · Published 2026-05-06 · Updated 2026-05-12 · CVE-2026-44223

CVSS v3.1: 6.5 (Medium)
Vector: AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
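
The 6.5 score can be cross-checked against the vector. The sketch below is a minimal Python implementation of the CVSS v3.1 base-score equations. It uses the metric weights and the Roundup function from the FIRST CVSS v3.1 specification and covers base metrics only, with no temporal or environmental metrics. The function names are illustrative, not part of any tool.

```python
import math

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required depends on Scope: Unchanged (U) or Changed (C).
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
      "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value (spec Appendix A)."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    """Parse 'AV:N/AC:L/...' (optionally prefixed with 'CVSS:3.1/') into a dict."""
    parts = vector.split("/")
    if parts[0].startswith("CVSS:"):
        parts = parts[1:]
    return dict(part.split(":", 1) for part in parts)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    scope = m["S"]
    # Impact Sub-Score from the confidentiality, integrity and availability impacts.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[scope][m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


def severity(score: float) -> str:
    """Qualitative severity rating scale from the v3.1 specification."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


score = base_score("AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H")
print(score, severity(score))  # → 6.5 Medium
```

For this vector, exploitability is about 2.835 and impact is 6.42 × 0.56 = 3.5952. Their sum of about 6.430 rounds up to 6.5.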
Name of the Vulnerable Software and Affected Versions
vLLM versions 0.18.0 through 0.19.1
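
A small sketch like the following may help check whether an installed vLLM falls in the affected range. It assumes the package is installed under the distribution name vllm. It compares only the leading numeric release, so pre-release, dev, or local builds (for example 0.19.1+cu128) are treated as their base release. That simplification is an assumption; the advisory does not specify how such builds are affected.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Affected range stated in this advisory (inclusive on both ends).
AFFECTED_MIN = (0, 18, 0)
AFFECTED_MAX = (0, 19, 1)


def release_tuple(ver: str) -> tuple:
    """Leading numeric release, e.g. '0.19.1+cu128' -> (0, 19, 1); a missing patch counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_affected(ver: str) -> bool:
    # Tuple comparison orders versions component by component.
    return AFFECTED_MIN <= release_tuple(ver) <= AFFECTED_MAX


if __name__ == "__main__":
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed")
    else:
        status = "affected" if is_affected(installed) else "outside the affected range"
        print(f"vllm {installed}: {status}")
```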

Description
The "extract hidden states" speculative decoding proposer returns a tensor with an incorrect shape after the first decode step. This triggers a RuntimeError that crashes the EngineCore process. The crash occurs whenever any request in a batch uses a sampling penalty parameter: repetition_penalty, frequency_penalty, or presence_penalty. The root cause is a refactor of the propose() function that removed a .unsqueeze(-1) call. As a result, after the first decode step the rejection sampler produces a tensor of shape (batch_size, 2) instead of the expected (batch_size, 1), and penalty application fails with a broadcast shape mismatch. A single request containing a penalty parameter is enough to crash the server deterministically and immediately, causing a complete loss of service availability.
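
The class of shape failure can be illustrated without vLLM. The toy model below works on shapes (tuples) rather than real tensors and applies the NumPy/PyTorch broadcasting rule to a hypothetical (batch_size, vocab_size) logits operand. A per-request column of shape (batch_size, 1) broadcasts cleanly, but a bare (batch_size,) vector or a (batch_size, 2) tensor does not. The sizes, the logits operand, and the helper names are assumptions for illustration. The advisory does not describe the exact operands inside vLLM's penalty code.

```python
# Toy model of PyTorch/NumPy broadcasting, working on shapes (tuples) only.

def broadcast_shape(a: tuple, b: tuple) -> tuple:
    """Align trailing dimensions; each pair must be equal or contain a 1."""
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1   # missing leading dims count as 1
        db = b[-i] if i <= len(b) else 1
        if da != db and 1 not in (da, db):
            # PyTorch raises RuntimeError for this case, so the sketch does too.
            raise RuntimeError(
                f"shapes {a} and {b} are not broadcastable at dim {-i}: {da} vs {db}"
            )
        result.append(da if db == 1 else db)
    return tuple(reversed(result))


def unsqueeze_last(shape: tuple) -> tuple:
    """Shape-level equivalent of tensor.unsqueeze(-1)."""
    return shape + (1,)


batch_size, vocab_size = 4, 32_000   # illustrative sizes only
logits = (batch_size, vocab_size)    # hypothetical operand for penalty application

# One value per request, reshaped to a column: broadcasts across the vocabulary.
print(broadcast_shape(logits, unsqueeze_last((batch_size,))))  # → (4, 32000)

# Without the unsqueeze(-1), or with an extra column, the shapes no longer line up.
for bad in [(batch_size,), (batch_size, 2)]:
    try:
        broadcast_shape(logits, bad)
    except RuntimeError as err:
        print("RuntimeError:", err)
```

Because the mismatch depends only on shapes, it fails the same way on every call. This is consistent with the advisory's description of a deterministic crash from a single request.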

Recommendations
Update to version 0.20.0.
Avoid using "extract hidden states" as the speculative decoding method.
Strip or reject the repetition_penalty, frequency_penalty, and presence_penalty parameters from incoming requests at the API gateway.
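
The gateway-side mitigation can be sketched as a request filter. This hypothetical helper assumes JSON request bodies for vLLM's OpenAI-compatible API, with the penalty parameters appearing as top-level fields. It either strips them or rejects the request, whatever value is supplied, because the advisory does not say whether neutral values also trigger the crash. RejectedRequest and sanitize_request are illustrative names, not part of vLLM.

```python
import json

# Penalty fields named in this advisory. frequency_penalty and presence_penalty are
# standard OpenAI-style fields; repetition_penalty is a vLLM extension (assumed top-level).
PENALTY_FIELDS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


class RejectedRequest(Exception):
    """Hypothetical error that the gateway would map to an HTTP 400 response."""


def sanitize_request(body: dict, mode: str = "strip") -> dict:
    """Strip penalty fields from a request body, or reject the request outright."""
    if mode not in ("strip", "reject"):
        raise ValueError(f"unknown mode: {mode!r}")
    present = [field for field in PENALTY_FIELDS if field in body]
    if present and mode == "reject":
        raise RejectedRequest("unsupported sampling parameters: " + ", ".join(present))
    # Drop the fields whatever their value; the advisory does not say whether
    # neutral values (e.g. 0.0 or 1.0) also trigger the crash.
    return {key: value for key, value in body.items() if key not in PENALTY_FIELDS}


incoming = json.loads('{"model": "m", "prompt": "hi", "presence_penalty": 0.5}')
print(sanitize_request(incoming))  # → {'model': 'm', 'prompt': 'hi'}
```

Rejecting is more transparent to clients. Stripping keeps requests flowing but silently changes the sampling behavior the client asked for.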


Weakness Enumeration

Incorrect Type Conversion or Cast

Related Identifiers

CVE-2026-44223
GHSA-83VM-P52W-F9PW
PYSEC-2026-145

Affected Products

Vllm