PT-2024-35110 · Unknown+2 · Scikit-Learn+2
Published 2024-06-06 · Updated 2024-10-24 · CVE-2024-5206
CVSS v3.1: 4.7 (Medium)
| Vector | AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N |
Name of the Vulnerable Software and Affected Versions
scikit-learn versions up to and including 1.4.1.post1
Description
A sensitive data leakage issue was identified in scikit-learn's TfidfVectorizer. The vulnerability arises from the unexpected storage of all tokens present in the training data within the stop_words_ attribute, rather than only the subset of tokens required for the TF-IDF technique to function. This can leak sensitive information, because stop_words_ may contain tokens that were meant to be discarded and not stored, such as passwords or keys. The impact of this issue varies with the nature of the data processed by the vectorizer.
Recommendations
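Before acting on this advisory, it may help to check whether a fitted vectorizer in your environment retains discarded tokens. The following is a minimal sketch, not an official reproduction: the documents and the token "s3cr3tkey" are made up for illustration, and the stop_words_ attribute is read with getattr() on the assumption that it may be absent on versions outside the affected range.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training documents; "s3cr3tkey" stands in for a secret
# that appears in only one document.
docs = [
    "reset password for alice",
    "reset password for bob",
    "reset password token s3cr3tkey",
]

# min_df=2 drops every token that occurs in fewer than two documents.
vectorizer = TfidfVectorizer(min_df=2)
vectorizer.fit(docs)

# The secret is not part of the learned vocabulary...
in_vocabulary = "s3cr3tkey" in vectorizer.vocabulary_

# ...but on affected versions it may still be stored on the fitted object.
# getattr() is used because the attribute may not exist on other versions.
retained = getattr(vectorizer, "stop_words_", None)
secret_retained = retained is not None and "s3cr3tkey" in retained

print("in vocabulary_:", in_vocabulary)
print("retained in stop_words_:", secret_retained)
```

If the second line reports True, any pickled or shared copy of that vectorizer carries the discarded token with it.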
For scikit-learn versions up to and including 1.4.1.post1, update to version 1.5.0 to resolve the issue. As a temporary workaround, consider restricting access to the stop_words_ attribute of fitted vectorizers to minimize the risk of sensitive information leakage. Avoid using TfidfVectorizer with sensitive data until the update has been applied.
Fix
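The recommendations above can be sketched in code as a version check plus a scrub step for vectorizers that were already fitted. This is an illustrative sketch under stated assumptions: the helper names (release_tuple, is_affected, scrub_stop_words) are hypothetical, the affected range is taken from this advisory, and the version parser only reads the leading numeric release segment, ignoring suffixes such as ".post1" or "rc1". Deleting stop_words_ relies on the attribute being used for introspection only.

```python
import re

# Last affected release according to this advisory (1.4.1.post1 -> 1.4.1).
LAST_AFFECTED = (1, 4, 1)


def release_tuple(version: str) -> tuple:
    """Parse the leading numeric release, e.g. '1.4.1.post1' -> (1, 4, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version falls in the range named by the advisory."""
    return release_tuple(version) <= LAST_AFFECTED


def scrub_stop_words(vectorizer) -> bool:
    """Delete stop_words_ from a fitted vectorizer, if present.

    Returns True when an attribute was removed, False otherwise.
    """
    if hasattr(vectorizer, "stop_words_"):
        del vectorizer.stop_words_
        return True
    return False
```

A typical use would be to call is_affected(sklearn.__version__) at load time, and to call scrub_stop_words on each fitted vectorizer before it is pickled or shared. Scrubbing does not replace the upgrade; it only removes the retained tokens from objects that already exist.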
Insecure Storage of Sensitive Information
Related Identifiers
Affected Products
Debian
Suse
Scikit-Learn