PT-2026-30199 · Pypi · Vllm

Published: 2026-04-03 · Updated: 2026-04-03

CVE-2026-34756

CVSS v3.1: 6.5 (Medium), AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Summary

A Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Because the n parameter (the number of completions to generate for a prompt) has no upper-bound validation in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an extremely large n value. That one request completely blocks the Python asyncio event loop and quickly triggers an out-of-memory crash, because millions of copies of the request object are allocated on the heap before the request ever reaches the scheduling queue.

Details

The root cause is that no layer the request passes through, from request parsing to asynchronous scheduling, enforces an upper bound on n:
  1. Protocol Layer: In vllm/entrypoints/openai/chat_completion/protocol.py, the n field is declared as a plain optional integer, with no pydantic.Field constraint (such as le=...) imposing an upper bound.
    class ChatCompletionRequest(OpenAIBaseModel):
        # Ordered by official OpenAI API documentation
        # https://platform.openai.com/docs/api-reference/chat/create
        messages: list[ChatCompletionMessageParam]
        model: str | None = None
        frequency_penalty: float | None = 0.0
        logit_bias: dict[str, float] | None = None
        logprobs: bool | None = False
        top_logprobs: int | None = 0
        max_tokens: int | None = Field(
            default=None,
            deprecated="max_tokens is deprecated in favor of "
            "the max_completion_tokens field",
        )
        max_completion_tokens: int | None = None
        n: int | None = 1  # no upper bound: any positive integer is accepted
        presence_penalty: float | None = 0.0
        # ... (remaining fields omitted)
  2. SamplingParams Layer (Incomplete Validation): When the API request is converted to the internal SamplingParams object in vllm/sampling_params.py, the _verify_args method checks only the lower bound (self.n < 1) and omits an upper-bound check entirely.
    def _verify_args(self) -> None:
        if not isinstance(self.n, int):
            raise ValueError(f"n must be an int, but is of type {type(self.n)}")
        if self.n < 1:
            raise ValueError(f"n must be at least 1, got {self.n}.")
        # No corresponding check for an upper bound on self.n.
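  3. Engine Layer (The OOM Trigger): When the malicious request reaches the core engine (vllm/v1/engine/async_llm.py), the engine fans the request out into n child requests, one per independent output sequence, inside a single loop. Each iteration awaits self._add_request(), but awaiting a coroutine only hands control back to the event loop if that coroutine actually suspends; when it does not, the loop runs all n iterations back to back.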
    # Fan out child requests (for n>1).
    parent_request = ParentRequest(request)
    for idx in range(parent_params.n):
        request_id, child_params = parent_request.get_child_info(idx)
        # Every child except the last is a shallow copy: n - 1 copies in total.
        child_request = request if idx == parent_params.n - 1 else copy(request)
        child_request.request_id = request_id
        child_request.sampling_params = child_params
        await self._add_request(
            child_request, prompt_text, parent_request, idx, queue
        )
    return queue
Because Python's asyncio runs every coroutine on a single thread and event loop, this monolithic for-loop monopolizes that thread for all n iterations. The server stops responding to all other connections, including liveness probes. At the same time, the memory allocator is overwhelmed by the millions of request objects cloned via copy(request), driving the host's Resident Set Size (RSS) up by gigabytes per second until the OS OOM killer terminates the vLLM process.
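The blocking behaviour can be reproduced in isolation with plain asyncio. The sketch below is not vLLM code: Request, add_request, fan_out, heartbeat, and main are hypothetical stand-ins, and it assumes that the per-child add_request coroutine does not suspend unless yield_each is set. A heartbeat task plays the role of a liveness probe; while the fan-out loop runs, it gets no turn at all.

```python
import asyncio
from copy import copy


class Request:
    """Hypothetical stand-in for a vLLM request object (not the real class)."""

    def __init__(self, request_id: str, payload: bytes) -> None:
        self.request_id = request_id
        self.payload = payload


async def add_request(store: list, request: Request, yield_each: bool) -> None:
    # Hypothetical stand-in for self._add_request(). Unless yield_each is set,
    # nothing here suspends, so awaiting it never returns control to the loop.
    store.append(request)
    if yield_each:
        await asyncio.sleep(0)


async def fan_out(request: Request, n: int, store: list, yield_each: bool) -> None:
    # Same shape as the fan-out loop: n - 1 shallow copies plus the original.
    for idx in range(n):
        child = request if idx == n - 1 else copy(request)
        child.request_id = f"{idx}_{request.request_id}"
        await add_request(store, child, yield_each)


async def heartbeat(ticks: list) -> None:
    # Plays the role of a liveness-probe handler that wants to run constantly.
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def main(n: int, yield_each: bool = False) -> tuple[int, int]:
    ticks: list = []
    store: list = []
    probe = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # give the heartbeat its first turn
    before = len(ticks)
    await fan_out(Request("req-0", b"x" * 64), n, store, yield_each)
    during = len(ticks) - before  # heartbeats that ran while fanning out
    probe.cancel()
    return during, len(store)


if __name__ == "__main__":
    print(asyncio.run(main(10_000)))  # (0, 10000)
    print(asyncio.run(main(10_000, yield_each=True)))  # heartbeats interleave
```

With the default settings the heartbeat count during the loop is zero, while the store holds one object per child. Passing yield_each=True lets the heartbeat run between children, but every copy is still created, so yielding alone would not prevent the memory exhaustion; bounding n is what removes both effects.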

Impact

Vulnerability Type: Resource Exhaustion / Denial of Service
Impacted Parties:
  • Any individual or organization hosting a public-facing vLLM API server (vllm.entrypoints.openai.api_server), the primary entrypoint for OpenAI-compatible deployments.
  • SaaS and AI-as-a-Service platforms that act as reverse proxies in front of vLLM without strict validation of HTTP request bodies or rate limiting.
Because this vulnerability exploits the control plane rather than the data plane, an unauthenticated remote attacker can reliably take down a production inference host with a single small HTTP request. Hardware capacity planning and bandwidth-based throttling offer no protection, since the attack needs neither heavy traffic nor a large payload.

Fix
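For illustration, the missing check can be sketched as a two-sided bound on n that is enforced before any fan-out happens. This is a minimal standard-library sketch, not the upstream vLLM patch: BoundedSamplingParams and MAX_N are hypothetical names, and the limit of 128 is an arbitrary illustrative value. At the protocol layer, the same bound could be declared with pydantic constraints, for example Field(default=1, ge=1, le=MAX_N), so that oversized requests are rejected during parsing.

```python
from dataclasses import dataclass

# Hypothetical ceiling on completions per request; illustrative value only.
MAX_N = 128


@dataclass
class BoundedSamplingParams:
    """Hypothetical stand-in for SamplingParams with an upper bound on n."""

    n: int = 1

    def __post_init__(self) -> None:
        # Validate at construction time, before any child requests exist.
        self._verify_args()

    def _verify_args(self) -> None:
        if not isinstance(self.n, int):
            raise ValueError(f"n must be an int, but is of type {type(self.n)}")
        if self.n < 1:
            raise ValueError(f"n must be at least 1, got {self.n}.")
        # The upper-bound check that the vulnerable code omits.
        if self.n > MAX_N:
            raise ValueError(f"n must be at most {MAX_N}, got {self.n}.")


if __name__ == "__main__":
    print(BoundedSamplingParams(n=4))  # BoundedSamplingParams(n=4)
    try:
        BoundedSamplingParams(n=10**9)
    except ValueError as exc:
        print(exc)  # n must be at most 128, got 1000000000.
```

Rejecting the value at construction time means an oversized n fails fast on the single request, before the fan-out loop copies anything.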

Weakness Enumeration

Allocation of Resources Without Limits

Related Identifiers

CVE-2026-34756
GHSA-3MWP-WVH9-7528

Affected Products

Vllm