PT-2026-28763 · PyPI · Justhtml

Published 2026-03-18 · Updated 2026-03-18

CVSS v4.0: 5.3 (Medium)

Vector: AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N

Summary

to_markdown() does not sufficiently escape text content that looks like HTML. As a result, untrusted input that is rendered safely by to_html() can become raw HTML in Markdown output.
This is not specific to tokenizer raw-text states like <title>, <noscript>, or <plaintext>, although those states can trigger the behavior. The root cause is broader: Markdown text serialization leaves angle brackets unescaped in text nodes.

Details

When converting a parsed document to Markdown, text nodes are escaped for a small set of Markdown metacharacters, but HTML-significant characters such as < and > are preserved. That means content parsed as text, including entity-decoded text or text produced by RCDATA/RAWTEXT-style parsing, can be emitted into Markdown as raw HTML.
Examples of affected input include:
  • Text produced from entity-decoded input such as &lt;script&gt;...&lt;/script&gt;
  • Text inside elements like <title>, <textarea>, <noscript> (when parsed as raw text), and <plaintext>
This is distinct from actual <script> or <style> elements in the DOM. Those are already dropped by default in to_markdown() unless html_passthrough=True.
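
The escaping gap described above can be illustrated with a minimal sketch of a text-node escaper that also neutralizes HTML-significant characters. This is not the library's actual fix: escape_markdown_text is a hypothetical helper, the set of Markdown metacharacters is illustrative only, and using HTML entities (rather than backslash escapes such as \<) is just one possible choice.

```python
import re

# Hypothetical helper, NOT part of justhtml: shows one way a Markdown
# serializer could escape text nodes so they cannot become raw HTML.
_MD_META = re.compile(r"([\\`*_\[\]])")  # illustrative metacharacter set


def escape_markdown_text(text: str) -> str:
    # Escape "&" first so the entities added next are not double-escaped.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # Backslash-escape the Markdown metacharacters matched above.
    return _MD_META.sub(r"\\\1", text)


# Text node content as it would look after entity decoding.
decoded = "<img src=x onerror=alert(1)>"
print(escape_markdown_text(decoded))  # &lt;img src=x onerror=alert(1)&gt;
```

With escaping of this kind, a Markdown renderer treats the angle brackets as literal characters instead of the start of an HTML tag. Callers that cannot change the serializer could alternatively sanitize the HTML produced by their Markdown renderer.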

Proof of Concept

General case

python
from justhtml import JustHTML

doc = JustHTML("<p>&lt;img src=x onerror=alert(1)&gt;</p>", fragment=True)

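# HTML output: the angle brackets stay entity-escaped.
print(doc.to_html())
print()
# Markdown output: the decoded text is emitted with raw angle brackets.
print(doc.to_markdown())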

Fix

XSS

Weakness Enumeration

Related Identifiers

GHSA-3RCM-VJRC-P45J

Affected Products

Justhtml