PT-2025-41232 · PyPI · vLLM

Published 2025-10-07 · Updated 2025-10-07 · CVE-2025-61620

CVSS v3.1: 6.5 (Medium)
AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Summary

A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the chat_template and chat_template_kwargs parameters. An attacker who can supply these parameters to the API can cause a service outage by exhausting CPU and/or memory resources.

Details

When an LLM is used as a chat model, the conversation history must be rendered into a single text input for the model. In the Hugging Face transformers library, this rendering is performed with a Jinja template. The OpenAI-Compatible Server launched by vllm serve exposes a chat_template request parameter that lets clients supply that template, and a chat_template_kwargs parameter that passes extra keyword arguments to the rendering function.
Because Jinja templates support programming-language constructs (loops, nested iteration, and so on), a crafted template can consume extremely large amounts of CPU and memory and thereby trigger a denial-of-service condition.
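To illustrate the risk, the following sketch (assuming the third-party jinja2 package is available; the loop bounds are kept deliberately tiny) shows how render cost scales with the product of attacker-chosen bounds:

```python
# Why user-controlled Jinja templates are dangerous: nested loops do
# work proportional to the product of their bounds. The bounds below
# are tiny for demonstration; an attacker would choose values large
# enough to pin a CPU core or allocate gigabytes of output.
from jinja2 import Environment

malicious = (
    "{% for i in range(100) %}"
    "{% for j in range(100) %}x{% endfor %}"
    "{% endfor %}"
)

rendered = Environment().from_string(malicious).render()
print(len(rendered))  # 10000: a template under 100 bytes emitted 10,000 chars
```

Raising the bounds turns the same small request into minutes of CPU time and gigabytes of output, with nothing anomalous about the request's size.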
Importantly, simply forbidding the chat_template parameter does not fully mitigate the issue. The implementation first builds a dictionary of keyword arguments for apply_hf_chat_template and then merges the user-supplied chat_template_kwargs into it via dict.update. Since dict.update overwrites existing keys, an attacker can place a chat_template key inside chat_template_kwargs to replace the template that apply_hf_chat_template will use.
# vllm/entrypoints/openai/serving_engine.py#L794-L816
_chat_template_kwargs: dict[str, Any] = dict(
    chat_template=chat_template,
    add_generation_prompt=add_generation_prompt,
    continue_final_message=continue_final_message,
    tools=tool_dicts,
    documents=documents,
)
_chat_template_kwargs.update(chat_template_kwargs or {})

request_prompt: Union[str, list[int]]
if isinstance(tokenizer, MistralTokenizer):
    ...
else:
    request_prompt = apply_hf_chat_template(
        tokenizer=tokenizer,
        conversation=conversation,
        model_config=model_config,
        **_chat_template_kwargs,
    )

Impact

If an OpenAI-Compatible Server exposes endpoints that accept chat_template or chat_template_kwargs from untrusted clients, an attacker can submit a malicious Jinja template (directly, or by overriding chat_template inside chat_template_kwargs) that consumes excessive CPU and/or memory. The result is a resource-exhaustion denial of service that leaves the server unresponsive to legitimate requests.
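One defensive pattern, shown here as a sketch under assumed names (RESERVED_KEYS and safe_merge are hypothetical, not vLLM APIs, and this is not the upstream patch), is to reject user-supplied keys that would shadow server-controlled arguments before merging:

```python
# Sketch of a defensive merge (hypothetical helper, not vLLM's fix):
# user kwargs are merged only after rejecting keys the server must own.
from typing import Any, Optional

RESERVED_KEYS = {"chat_template"}  # assumption: keys the operator controls

def safe_merge(server_kwargs: dict[str, Any],
               user_kwargs: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Merge user kwargs into server kwargs, refusing reserved overrides."""
    user_kwargs = user_kwargs or {}
    clash = RESERVED_KEYS & user_kwargs.keys()
    if clash:
        raise ValueError(f"disallowed chat_template_kwargs keys: {sorted(clash)}")
    merged = dict(server_kwargs)
    merged.update(user_kwargs)
    return merged
```

A request carrying a chat_template key inside chat_template_kwargs is then rejected outright instead of silently replacing the server's template.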

Fixes

Fix

Weakness Enumeration

Allocation of Resources Without Limits
Resource Exhaustion

Related Identifiers

CVE-2025-61620
GHSA-6fvq-23cw-5628

Affected Products

vLLM