PT-2025-19362 · Pypi · Vllm

Published

2025-04-15

·

Updated

2025-04-15

CVSS v3.1

6.5

Medium

VectorAV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Impact

This report is to highlight a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3
The xgrammar library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). Xgrammar provides a required, built-in cache for its compiled grammars stored in RAM. xgrammar is available by default through the OpenAI compatible API server with both the V0 and V1 engines.
A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service by consuming all of the system's RAM.
Note that even if vLLM was configured to use a different backend by default, it is still possible to choose xgrammar on a per-request basis using the guided decoding backend key of the extra body field of the request with the V0 engine. This per-request choice is not available when using the V1 engine.

Patches

Workarounds

There is no way to workaround this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI compatible API server.

References

Fix

Allocation of Resources Without Limits

Found an issue in the description? Have something to add? Feel free to write us 👾

Weakness Enumeration

Related Identifiers

GHSA-HF3C-WXG2-49Q9

Affected Products

Vllm