PT-2026-28775: Uncontrolled Recursion in PyPI justhtml

PT-2026-28775 · Pypi · Justhtml

Published: 2026-03-17
Updated: 2026-03-17
CVSS v4.0: 7.1 (High)
Vector: AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N

Summary

justhtml through 1.9.1 allows denial of service via deeply nested HTML. During parsing, JustHTML.__init__() always reaches TreeBuilder.finish(), which unconditionally calls populate_selectedcontent(). That function recursively traverses the DOM via _find_elements() / _find_element() without a depth bound, allowing attacker-controlled deeply nested input to trigger an unhandled RecursionError on CPython. Depending on the host application's exception handling, this can abort parsing, fail requests, or terminate a worker/process.

Details

TreeBuilder.finish() (treebuilder.py#L476) unconditionally calls populate_selectedcontent(self.document) at line 494. populate_selectedcontent() (treebuilder.py#L1243) calls _find_elements() (treebuilder.py#L1280) to recursively search the DOM tree for <select> elements:

python

def _find_elements(self, node: Any, name: str, result: list[Any]) -> None:
    """Recursively find all elements with given name."""
    if node.name == name:
        result.append(node)
    if node.has_child_nodes():
        for child in node.children:
            self._find_elements(child, name, result)  # recursive call

When the DOM tree depth exceeds CPython's default recursion limit (1000), this raises an unhandled RecursionError. The full call path is:

JustHTML(html) → tokenizer.run() → tree_builder.finish() → populate_selectedcontent(document) → _find_elements(root, "select", selects) (recursive)

Deeply nested DOM trees can be produced by nesting <div> tags ~1000 levels deep. On CPython with the default recursion limit, approximately 11 KB of <div> nesting is sufficient to trigger the error. The exact depth threshold is environment-dependent (CPython version, recursion limit setting, call stack depth at invocation).
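The failure mode can be reproduced without justhtml. The sketch below is illustrative only: `ToyNode`, `build_chain`, and `overflows` are hypothetical names, not part of the library. The free function has the same shape as the recursive traversal quoted above, so it consumes one Python frame per tree level and fails once the chain is deeper than the interpreter's recursion limit.

```python
import sys


class ToyNode:
    """Hypothetical stand-in for a DOM node; not justhtml's node class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: list["ToyNode"] = []


def build_chain(depth: int) -> ToyNode:
    """Build a root with `depth` nested <div> descendants, without recursion."""
    root = ToyNode("#document")
    current = root
    for _ in range(depth):
        child = ToyNode("div")
        current.children.append(child)
        current = child
    return root


def find_elements_recursive(node: ToyNode, name: str, result: list) -> None:
    """Same shape as the vulnerable traversal: one Python frame per level."""
    if node.name == name:
        result.append(node)
    for child in node.children:
        find_elements_recursive(child, name, result)


def overflows(depth: int) -> bool:
    """Return True if searching a chain of this depth raises RecursionError."""
    try:
        find_elements_recursive(build_chain(depth), "select", [])
    except RecursionError:
        return True
    return False


print(overflows(50))
print(overflows(sys.getrecursionlimit() + 100))
```

A shallow chain is searched normally, while a chain deeper than `sys.getrecursionlimit()` overflows. The exact threshold inside justhtml is lower than the limit itself because the parser's own call stack is already several frames deep when the traversal starts.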

Additional recursive functions are affected on already-parsed deep trees:

Node.clone_node(deep=True) (node.py#L523): called during sanitization
node_to_html() (serialize.py#L580): used by to_html(pretty=True)
to_markdown_walk() (node.py#L817): used by to_markdown()

Note: the library already uses iterative traversal in several comparable functions (e.g., node_to_html_compact at serialize.py#L197, to_text_collect at node.py#L161, is_blocky_element at serialize.py#L405, apply_to_children at transforms.py#L1642), demonstrating the correct pattern.

PoC

python

from justhtml import JustHTML

html = "<div>" * 1000 + "x" + "</div>" * 1000
doc = JustHTML(html) # raises RecursionError

Test environment: CPython 3.14.3, macOS ARM64 (Apple Silicon), justhtml 1.9.1, default recursion limit (1000)

Input	Size	Result
`<div>` × 500	5,501 bytes	OK
`<div>` × 800	8,801 bytes	OK
`<div>` × 1000	11,001 bytes	RecursionError

The error occurs with both sanitize=True (default) and sanitize=False.
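The sizes in the table follow directly from the PoC string: each nesting level adds 5 bytes for `<div>` and 6 bytes for `</div>`, plus 1 byte for the text node, giving 11n + 1 bytes at depth n. A small check, using a hypothetical helper name `nested_div_payload`:

```python
def nested_div_payload(depth: int) -> str:
    """Build the PoC input: `depth` nested <div> elements around one character."""
    return "<div>" * depth + "x" + "</div>" * depth


for depth in (500, 800, 1000):
    size = len(nested_div_payload(depth).encode("utf-8"))
    print(f"<div> x {depth}: {size:,} bytes")
```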

Impact

An attacker who can supply HTML for parsing can trigger an unhandled RecursionError during JustHTML() construction. Because the error is raised during construction, justhtml configuration alone does not avoid it; mitigating it requires host-application exception handling or input constraints. Depending on that exception handling, the error can abort parsing, fail requests, or terminate a worker/process.
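Until a fixed release is available, a host application could combine both mitigations named above. The following is a minimal sketch, not a vetted defense: `nesting_exceeds`, `parse_untrusted`, and the default limit of 200 are assumptions made for illustration. The depth check uses the standard library's `html.parser` and only approximates how a full HTML5 tree builder nests elements, which is why the `RecursionError` handler remains as a backstop.

```python
from html.parser import HTMLParser
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Elements that never have an end tag, so they do not add nesting.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class _DepthGuard(HTMLParser):
    """Tracks unclosed start tags and flags input nested deeper than `limit`."""

    def __init__(self, limit: int) -> None:
        super().__init__(convert_charrefs=True)
        self.limit = limit
        self.open_tags: list[str] = []
        self.exceeded = False

    def handle_starttag(self, tag, attrs):
        if self.exceeded or tag in VOID_ELEMENTS:
            return
        self.open_tags.append(tag)
        if len(self.open_tags) > self.limit:
            self.exceeded = True

    def handle_startendtag(self, tag, attrs):
        # HTML5 ignores the trailing slash on non-void elements,
        # so "<div/>" is counted as an open tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if self.exceeded:
            return
        # Close back to the nearest matching open tag; ignore stray end tags.
        for index in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[index] == tag:
                del self.open_tags[index:]
                return


def nesting_exceeds(html: str, limit: int) -> bool:
    """Heuristic: True if the markup appears to nest deeper than `limit`."""
    guard = _DepthGuard(limit)
    guard.feed(html)
    guard.close()
    return guard.exceeded


def parse_untrusted(
    html: str, parse: Callable[[str], T], max_depth: int = 200
) -> Optional[T]:
    """Return parse(html), or None if the input is rejected or overflows."""
    if nesting_exceeds(html, max_depth):
        return None
    try:
        return parse(html)
    except RecursionError:
        return None
```

With justhtml installed, the call would look like `parse_untrusted(user_html, JustHTML)`, since the PoC shows the constructor taking the HTML string directly. Returning `None` is a placeholder; a real application would more likely map the rejection to an error response.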

Suggested Fix

Convert the recursive tree traversal functions to iterative implementations using an explicit stack. Example for _find_elements:

python

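def _find_elements(self, node: Any, name: str, result: list[Any]) -> None:
    """Iteratively find all elements with given name, in document order."""
    stack = [node]
    while stack:
        current = stack.pop()
        if current.name == name:
            result.append(current)
        if current.has_child_nodes():
            # Push children reversed so the first child is popped first.
            stack.extend(reversed(current.children))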

The same conversion should be applied to _find_element, clone_node(deep=True), node_to_html(), and to_markdown_walk().
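Deep cloning is slightly less mechanical to convert than a search, because each copied child has to be attached to its copied parent. One way to do it, sketched here on a hypothetical `ToyNode` class rather than justhtml's real node type (which carries more state, such as attributes and parent links), is to keep pairs of (original, copy) on the stack:

```python
class ToyNode:
    """Hypothetical minimal node; justhtml's real node class has more state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: list["ToyNode"] = []


def clone_deep_iterative(root: ToyNode) -> ToyNode:
    """Deep-copy a tree using an explicit stack instead of recursion."""
    root_copy = ToyNode(root.name)
    # Each entry pairs an original node with its already-created copy.
    stack = [(root, root_copy)]
    while stack:
        original, copy = stack.pop()
        for child in original.children:
            child_copy = ToyNode(child.name)
            copy.children.append(child_copy)  # keeps document order
            stack.append((child, child_copy))
    return root_copy


def depth_of_chain(node: ToyNode) -> int:
    """Count levels below `node`, following first children (iterative)."""
    depth = 0
    while node.children:
        node = node.children[0]
        depth += 1
    return depth


# Build a chain far deeper than CPython's default recursion limit.
root = ToyNode("#document")
node = root
for _ in range(5000):
    child = ToyNode("div")
    node.children.append(child)
    node = child

copy = clone_deep_iterative(root)
print(depth_of_chain(copy))
```

Because children are appended to the copy while iterating the original in order, sibling order is preserved regardless of the order in which the stack is drained.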

Weakness Enumeration

CWE-674 (Uncontrolled Recursion)

Related Identifiers

GHSA-V7CF-C9RM-WM3J

Affected Products

Justhtml
