Nltk · Nltk · CVE-2021-43854
Name of the Vulnerable Software and Affected Versions:
NLTK versions prior to 3.6.5
Description:
NLTK is vulnerable to regular expression denial of service (ReDoS): a specifically crafted long input passed to a vulnerable function can cause very long execution times. The vulnerability is present in the `PunktSentenceTokenizer` class and in the `sent_tokenize` and `word_tokenize` functions. Any user of this class or these two functions is exposed to the ReDoS attack. If a program relies on any of them to tokenize unpredictable user input, upgrading to a version of NLTK without the vulnerability is strongly recommended.
Recommendations:
For versions prior to 3.6.5, upgrade to NLTK 3.6.6 or later to resolve the issue.
As a temporary workaround for users unable to upgrade, limit the maximum length of an input to any of the vulnerable functions to bound the execution time.
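The length-limit workaround can be sketched as a small wrapper that checks the input size before calling the tokenizer. This is a minimal illustration, not part of NLTK: the names `bounded_tokenize`, `InputTooLongError`, and `MAX_INPUT_CHARS` are hypothetical, and the limit of 10,000 characters is an assumed value that should be tuned to the application's real inputs.

```python
from typing import Callable, List

# Hypothetical cap (an assumption); choose a value that suits real inputs.
MAX_INPUT_CHARS = 10_000


class InputTooLongError(ValueError):
    """Raised when text exceeds the configured length limit."""


def bounded_tokenize(
    text: str,
    tokenizer: Callable[[str], List[str]],
    max_chars: int = MAX_INPUT_CHARS,
) -> List[str]:
    """Run `tokenizer` on `text` only if the text is within `max_chars`."""
    if len(text) > max_chars:
        # Reject rather than truncate so callers notice oversized input.
        raise InputTooLongError(
            f"input has {len(text)} characters; limit is {max_chars}"
        )
    return tokenizer(text)
```

With NLTK installed, the wrapper could be called as `bounded_tokenize(user_text, nltk.tokenize.sent_tokenize)` or with `nltk.tokenize.word_tokenize`. Rejecting oversized input, rather than silently truncating it, is a design choice made here for illustration; it bounds execution time but does not remove the underlying vulnerability.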