PT-2026-28775 · Pypi · Justhtml

Published 2026-03-17 · Updated 2026-03-17

CVSS v4.0: 7.1 (High)

Vector: AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N

Summary

justhtml through 1.9.1 allows denial of service via deeply nested HTML. During parsing, JustHTML.__init__() always reaches TreeBuilder.finish(), which unconditionally calls _populate_selectedcontent(). That function recursively traverses the DOM via _find_elements() / _find_element() without a depth bound, so attacker-controlled, deeply nested input can trigger an unhandled RecursionError on CPython. Depending on the host application's exception handling, this can abort parsing, fail requests, or terminate a worker/process.

Details

TreeBuilder.finish() (treebuilder.py#L476) unconditionally calls _populate_selectedcontent(self.document) at line 494. _populate_selectedcontent() (treebuilder.py#L1243) calls _find_elements() (treebuilder.py#L1280) to search the DOM tree recursively for <select> elements:
```python
from typing import Any

# TreeBuilder method (excerpt from treebuilder.py)
def _find_elements(self, node: Any, name: str, result: list[Any]) -> None:
    """Recursively find all elements with given name."""
    if node.name == name:
        result.append(node)
    if node.has_child_nodes():
        for child in node.children:
            self._find_elements(child, name, result)  # recursive call: one frame per tree level
```
When the DOM tree depth exceeds CPython's default recursion limit (1000), this raises an unhandled RecursionError. The full call path is:
JustHTML(html) → tokenizer.run() → tree_builder.finish() → _populate_selectedcontent(document) → _find_elements(root, "select", selects) (recursive)
Deeply nested DOM trees can be produced by nesting <div> tags ~1000 levels deep. On CPython with the default recursion limit, approximately 11 KB of <div> nesting is sufficient to trigger the error. The exact depth threshold is environment-dependent (CPython version, recursion limit setting, call stack depth at invocation).
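The failure depends only on tree depth, not on justhtml specifically. The following minimal sketch reproduces the same recursive pattern outside the library. Node, find_elements_recursive, and build_chain are hypothetical stand-ins, not justhtml's classes. The tree is built with a loop, so only the traversal recurses.

```python
import sys


class Node:
    """Hypothetical minimal DOM node (not justhtml's class)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children if children is not None else []

    def has_child_nodes(self):
        return bool(self.children)


def find_elements_recursive(node, name, result):
    # Same shape as the recursive helper quoted above: one Python frame per tree level.
    if node.name == name:
        result.append(node)
    if node.has_child_nodes():
        for child in node.children:
            find_elements_recursive(child, name, result)


def build_chain(depth, tag="div"):
    # Build the nested tree bottom-up with a loop so construction itself never recurses.
    node = Node("#text")
    for _ in range(depth):
        node = Node(tag, [node])
    return node


found = []
find_elements_recursive(build_chain(100), "select", found)
print("depth 100:", len(found), "matches")

try:
    # Deeper than the current recursion limit, so the traversal must fail.
    find_elements_recursive(build_chain(sys.getrecursionlimit() + 100), "select", [])
except RecursionError as exc:
    print("deep tree:", type(exc).__name__)
```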
Additional recursive functions are also affected when they operate on already-parsed deep trees; they are listed under Suggested Fix.
Note: the library already uses iterative traversal in several comparable functions (e.g., node_to_html_compact at serialize.py#L197, to_text_collect at node.py#L161, is_blocky_element at serialize.py#L405, apply_to_children at transforms.py#L1642), demonstrating the correct pattern.
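Serializers are the hardest case to make iterative, because an end tag must be emitted after all of an element's children. A common technique pushes an explicit "close" marker onto the stack before the children. The sketch below illustrates that technique under stated assumptions. It uses a hypothetical Node class and a hypothetical to_html_iterative function, handles no attributes or void elements, and is not justhtml's serializer.

```python
from html import escape


class Node:
    """Hypothetical minimal node: an element with children, or a "#text" node."""

    def __init__(self, name, children=None, text=""):
        self.name = name
        self.children = children if children is not None else []
        self.text = text


def to_html_iterative(root):
    parts = []
    # Each stack entry is (node, closing); closing=True means "emit the end tag now".
    stack = [(root, False)]
    while stack:
        node, closing = stack.pop()
        if closing:
            parts.append(f"</{node.name}>")
        elif node.name == "#text":
            parts.append(escape(node.text, quote=False))
        else:
            parts.append(f"<{node.name}>")
            # Push the end-tag marker first so it is popped after all children.
            stack.append((node, True))
            # Push children in reverse so the first child is popped (emitted) first.
            for child in reversed(node.children):
                stack.append((child, False))
    return "".join(parts)


# Usage: a tree far deeper than the default recursion limit serializes without error.
node = Node("#text", text="x")
for _ in range(5000):
    node = Node("div", [node])
print(len(to_html_iterative(node)))  # 55001
```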

PoC

```python
from justhtml import JustHTML

html = "<div>" * 1000 + "x" + "</div>" * 1000
doc = JustHTML(html)  # raises RecursionError
```
Test environment: CPython 3.14.3, macOS ARM64 (Apple Silicon), justhtml 1.9.1, default recursion limit (1000)
| Input | Size | Result |
|---|---|---|
| <div> × 500 | 5,501 bytes | OK |
| <div> × 800 | 8,801 bytes | OK |
| <div> × 1000 | 11,001 bytes | RecursionError |
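The sizes in the table follow directly from the PoC payload. Each nesting level adds len("<div>") + len("</div>") = 11 bytes, and the single "x" adds 1 byte, giving 11 × depth + 1. The helper names payload and payload_size below are illustrative only.

```python
def payload(depth: int) -> str:
    # Same construction as the PoC: depth opening tags, one text character, depth closing tags.
    return "<div>" * depth + "x" + "</div>" * depth


def payload_size(depth: int) -> int:
    # Each level adds 5 + 6 = 11 bytes; the "x" adds 1 byte.
    return 11 * depth + 1


for depth in (500, 800, 1000):
    size = len(payload(depth).encode("utf-8"))
    assert size == payload_size(depth)
    print(f"<div> × {depth}: {size:,} bytes")
```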
The error occurs with both sanitize=True (default) and sanitize=False.

Impact

An attacker who can supply HTML for parsing can trigger an unhandled RecursionError during JustHTML() construction. No justhtml configuration avoids it (it occurs with both sanitize=True and sanitize=False); mitigation requires exception handling or input constraints in the host application. Depending on how the host application handles exceptions, this can abort parsing, fail requests, or terminate a worker or process.
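Until a fixed version is available, a host application might combine both mitigations. A byte-size limit alone does not help, since about 11 KB is enough to trigger the error. The minimal sketch below therefore assumes a rough nesting-depth pre-check built on the standard library's html.parser, plus a RecursionError backstop.

The following names are all hypothetical: safe_parse, estimate_depth, UntrustedHTMLError, and the 256-level limit. html.parser does not implement HTML5 tree construction, so the depth estimate is approximate, and the exception handler remains necessary.

```python
from html.parser import HTMLParser
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical limit chosen for illustration; well below CPython's default recursion limit.
MAX_NESTING_DEPTH = 256

# HTML void elements never have an end tag, so they do not add nesting.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class UntrustedHTMLError(ValueError):
    """Hypothetical error raised when untrusted HTML is rejected."""


class _DepthCounter(HTMLParser):
    # Rough tag-depth counter; it does not implement HTML5 tree construction rules.
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_startendtag(self, tag, attrs):
        # HTML5 ignores "/>" on non-void HTML elements, so count it as a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Ignore stray end tags instead of going negative.
        self.depth = max(0, self.depth - 1)


def estimate_depth(html: str) -> int:
    counter = _DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


def safe_parse(parse: Callable[[str], T], html: str,
               max_depth: int = MAX_NESTING_DEPTH) -> T:
    # Input constraint: reject obviously deep documents before the real parse.
    if estimate_depth(html) > max_depth:
        raise UntrustedHTMLError("HTML nested too deeply")
    try:
        return parse(html)
    except RecursionError as exc:
        # Backstop: turn the crash into an ordinary, per-request error.
        raise UntrustedHTMLError("HTML nested too deeply") from exc
```

With justhtml, the parser callable would simply be the class itself, as in safe_parse(JustHTML, html). The pre-check parses the input a second time, which is the cost of rejecting deep input cheaply and predictably.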

Suggested Fix

Convert the recursive tree traversal functions to iterative implementations using an explicit stack. Example for _find_elements():
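```python
from typing import Any

# TreeBuilder method (suggested replacement)
def _find_elements(self, node: Any, name: str, result: list[Any]) -> None:
    """Find all elements with the given name using an explicit stack."""
    stack = [node]
    while stack:
        current = stack.pop()
        if current.name == name:
            result.append(current)
        if current.has_child_nodes():
            # Push children in reverse so they are popped in document order.
            stack.extend(reversed(current.children))
```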
The same conversion should be applied to _find_element(), clone_node(deep=True), node_to_html(), and to_markdown_walk().
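Two of those conversions can be sketched the same way. A first-match search returns as soon as a pre-order pop matches. A deep clone pushes (original, copy) pairs, so each popped pair copies that node's direct children.

This is a sketch under assumptions. Node, find_element, and clone_deep are hypothetical and do not model justhtml's real node API, such as namespaces or template contents.

```python
class Node:
    """Hypothetical minimal node with a parent pointer (not justhtml's class)."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = []
        self.parent = None

    def append_child(self, child):
        child.parent = self
        self.children.append(child)

    def has_child_nodes(self):
        return bool(self.children)


def find_element(node, name):
    # Pre-order search with an explicit stack; returns the first match in document order.
    stack = [node]
    while stack:
        current = stack.pop()
        if current.name == name:
            return current
        if current.has_child_nodes():
            stack.extend(reversed(current.children))
    return None


def clone_deep(root):
    # Each popped (original, dup) pair copies all direct children of that node,
    # so child order is preserved regardless of the order pairs leave the stack.
    root_dup = Node(root.name, root.attrs)
    stack = [(root, root_dup)]
    while stack:
        original, dup = stack.pop()
        for child in original.children:
            child_dup = Node(child.name, child.attrs)
            dup.append_child(child_dup)
            stack.append((child, child_dup))
    return root_dup
```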

Weakness Enumeration

Uncontrolled Recursion

Related Identifiers

GHSA-V7CF-C9RM-WM3J

Affected Products

Justhtml