PT-2026-28763 · PyPI · Justhtml
Published 2026-03-18 · Updated 2026-03-18
CVSS v4.0: 5.3 (Medium)
Vector: AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N
Summary
`to_markdown()` does not sufficiently escape text content that looks like HTML. As a result, untrusted input that `to_html()` serializes safely as escaped text can become raw HTML in Markdown output.

This is not specific to tokenizer raw-text states such as `<title>`, `<noscript>`, or `<plaintext>`, although those states can trigger the behavior. The root cause is broader: Markdown text serialization leaves angle brackets unescaped in text nodes.

Details
When a parsed document is converted to Markdown, text nodes are escaped only for a small set of Markdown metacharacters; HTML-significant characters such as `<` and `>` are left as-is. Any content that ends up in a text node, including entity-decoded text and text produced by RCDATA/RAWTEXT-style parsing, can therefore be emitted into the Markdown as raw HTML, which a Markdown renderer that allows inline HTML will pass through to its output.

Examples of affected input include:
- Text produced by decoding entity-encoded input such as `&lt;script&gt;...&lt;/script&gt;`
- Text inside elements such as `<title>`, `<textarea>`, `<noscript>` (when parsed as raw text), and `<plaintext>`
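The flaw can be illustrated with a simplified text escaper. This is a minimal, hypothetical sketch: the function names and the exact set of Markdown metacharacters are assumptions, not justhtml's code. It contrasts an escaper that only handles Markdown syntax with one that also backslash-escapes `<`, `>`, and `&`, which CommonMark permits for any ASCII punctuation character.

```python
# Hypothetical sketch; not justhtml's actual implementation.
# Assumed "small set" of Markdown metacharacters; the real set may differ.
MARKDOWN_METACHARS = "\\`*_[]#"
# Characters that can start raw HTML or an entity reference in Markdown.
HTML_SIGNIFICANT = "<>&"


def escape_text_naive(text: str) -> str:
    """Escape only Markdown metacharacters (mirrors the flaw described above)."""
    return "".join("\\" + ch if ch in MARKDOWN_METACHARS else ch for ch in text)


def escape_text_hardened(text: str) -> str:
    """Also backslash-escape HTML-significant characters.

    In CommonMark a backslash before "<" makes it a literal character,
    so it can no longer open a raw HTML tag or an HTML block.
    """
    specials = MARKDOWN_METACHARS + HTML_SIGNIFICANT
    return "".join("\\" + ch if ch in specials else ch for ch in text)


# Text node content after the parser decoded "&lt;img src=x onerror=alert(1)&gt;".
payload = "<img src=x onerror=alert(1)>"
print(escape_text_naive(payload))     # <img src=x onerror=alert(1)>
print(escape_text_hardened(payload))  # \<img src=x onerror=alert(1)\>
```

Backslash escapes are used here instead of `&lt;` so that the escaped form stays readable in the Markdown source; escaping `&` as well keeps text that literally contains an entity reference from being decoded by the renderer.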
This is distinct from actual `<script>` or `<style>` elements in the DOM, which `to_markdown()` already drops by default unless `html_passthrough=True` is set.

Proof of Concept
General case
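```python
from justhtml import JustHTML

# The payload is entity-encoded, so the parser produces a text node, not an <img> element.
doc = JustHTML("<p>&lt;img src=x onerror=alert(1)&gt;</p>", fragment=True)
print(doc.to_html())
print()
print(doc.to_markdown())
```

Fix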
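Independently of the library fix, applications that render `to_markdown()` output to HTML (with `html_passthrough` left off) could reject Markdown that appears to contain raw HTML before rendering it. The sketch below is a hypothetical, conservative check, not part of justhtml: it flags a `<` followed by a tag-like character unless that `<` is backslash-escaped, and it can report false positives for code spans and autolinks such as `<https://example.com>`. Where the Markdown renderer supports it, disabling raw HTML in the renderer is a more robust option.

```python
import re

# A "<" followed by "!", "/", "?" or a letter, preceded by an even number of
# backslashes (zero included), i.e. a "<" that is not itself escaped.
_RAW_HTML_START = re.compile(r"(?:^|[^\\])(?:\\\\)*<[A-Za-z!/?]")


def contains_raw_html(markdown: str) -> bool:
    """Conservatively report whether Markdown may contain raw HTML (hypothetical helper)."""
    return _RAW_HTML_START.search(markdown) is not None


if __name__ == "__main__":
    print(contains_raw_html("<img src=x onerror=alert(1)>"))     # True
    print(contains_raw_html(r"\<img src=x onerror=alert(1)\>"))  # False
```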
XSS
Weakness Enumeration
Related Identifiers
Affected Products
Justhtml