Langchain Ai · Langchain · CVE-2024-0243
**Name of the Vulnerable Software and Affected Versions**
langchain versions prior to the version that includes the fix from https://github.com/langchain-ai/langchain/pull/15559
**Description**
The issue arises when an attacker controls the contents of a website being crawled, such as `https://example.com`, and places a malicious HTML file there with links to external sites, like `https://example.completely.different/my_file.html`. Even with `prevent_outside=True` set in the crawler configuration, the `RecursiveUrlLoader` would still download the file from the external site. This is due to how the loader handles links it encounters in the HTML content. Note that the external URL in this example begins with the same characters as the base URL, `https://example.com`, even though it points to a different host.
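The following is a minimal sketch of why a link such as the one above can slip past an "inside the base URL" test. It assumes, for illustration only, that the check is a plain string-prefix comparison; the function names `naive_is_inside` and `strict_is_inside` are hypothetical and are not part of langchain.

```python
from urllib.parse import urlparse

BASE_URL = "https://example.com"
EXTERNAL_LINK = "https://example.completely.different/my_file.html"


def naive_is_inside(link: str, base_url: str) -> bool:
    # Hypothetical prefix check (an assumption, not langchain's actual code):
    # treats any link that starts with the base URL string as "inside".
    return link.startswith(base_url)


def strict_is_inside(link: str, base_url: str) -> bool:
    # Compare the parsed scheme and host instead of raw string prefixes.
    link_parts = urlparse(link)
    base_parts = urlparse(base_url)
    return (
        link_parts.scheme == base_parts.scheme
        and link_parts.netloc == base_parts.netloc
    )


if __name__ == "__main__":
    print(naive_is_inside(EXTERNAL_LINK, BASE_URL))   # prefix match succeeds
    print(strict_is_inside(EXTERNAL_LINK, BASE_URL))  # host comparison fails
```

Comparing parsed components rather than raw strings is one way to keep look-alike hosts out of scope; whether the referenced fix takes exactly this approach should be confirmed against the pull request itself.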
**Recommendations**
For versions prior to the fix in https://github.com/langchain-ai/langchain/pull/15559, update to a version that includes this fix to resolve the issue. If upgrading is not immediately possible, consider as a temporary workaround restricting the `url` parameter of the `RecursiveUrlLoader` to trusted domains only. Additionally, be cautious when using the `extractor` parameter with lambda functions that parse HTML content, as this could potentially lead to unintended downloads.
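As a rough sketch of the temporary workaround, the start URL can be validated against an allowlist of hosts before it is handed to the loader. The names `TRUSTED_HOSTS`, `is_trusted`, and `require_trusted` are hypothetical helpers introduced here for illustration; they are not part of langchain.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts you actually trust.
TRUSTED_HOSTS = frozenset({"example.com", "docs.example.com"})


def is_trusted(url: str, trusted_hosts=TRUSTED_HOSTS) -> bool:
    # Match on the exact parsed hostname, never on a string prefix.
    parts = urlparse(url)
    return parts.scheme == "https" and parts.hostname in trusted_hosts


def require_trusted(url: str) -> str:
    # Raise early so an untrusted URL never reaches the crawler.
    if not is_trusted(url):
        raise ValueError(f"URL is not on the trusted-host allowlist: {url}")
    return url


if __name__ == "__main__":
    start_url = require_trusted("https://example.com")
    print(start_url)
```

The validated `start_url` would then be passed as the `url` argument of `RecursiveUrlLoader`. This guard covers only the starting URL; it does not stop an unpatched loader from following links found in crawled pages, so upgrading remains the actual fix.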