PT-2024-35110 · Unknown+2 · Scikit-Learn+2

Published 2024-06-06 · Updated 2024-10-24 · CVE-2024-5206

CVSS v3.1: 4.7 (Medium)

Vector: AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N

Name of the Vulnerable Software and Affected Versions

scikit-learn versions up to and including 1.4.1.post1

Description

A sensitive data leakage issue was identified in scikit-learn's TfidfVectorizer. The vectorizer unexpectedly stores all tokens present in the training data in its stop_words_ attribute, rather than storing only the subset of tokens that the TF-IDF technique needs in order to function. As a result, a fitted vectorizer can leak sensitive information: stop_words_ may contain tokens that were meant to be discarded and never stored, such as passwords or keys. The impact depends on the nature of the data processed by the vectorizer.
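
The behavior described above can be observed with a small experiment. The following is a minimal sketch, not a reference reproduction: the corpus, the token "hunter2secret", and the helper name fit_and_inspect are hypothetical, and min_df=2 is chosen only so that rare tokens are discarded from the vocabulary. On an affected version the discarded tokens are expected to show up in stop_words_; on a fixed version the attribute should be missing, which the sketch handles with getattr.

```python
# Hypothetical training corpus; "hunter2secret" stands in for a credential
# that appears only once and is therefore dropped from the vocabulary.
DOCS = [
    "reset the password for the account",
    "the account password is hunter2secret",
    "password reset for the account failed",
]


def fit_and_inspect(docs, min_df=2):
    """Fit a TfidfVectorizer and return (kept_tokens, discarded_tokens).

    discarded_tokens is None when the fitted object has no stop_words_
    attribute, which is what a fixed scikit-learn release should produce.
    """
    # Imported lazily so the helper can be defined without scikit-learn installed.
    from sklearn.feature_extraction.text import TfidfVectorizer

    # min_df=2 drops every token that occurs in fewer than two documents.
    vectorizer = TfidfVectorizer(min_df=min_df)
    vectorizer.fit(docs)

    kept = sorted(vectorizer.vocabulary_)
    discarded = getattr(vectorizer, "stop_words_", None)
    return kept, discarded
```

Calling fit_and_inspect(DOCS) and printing both return values shows which tokens the model needs and which ones, if any, were retained on the fitted object even though they were filtered out.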

Recommendations

Update scikit-learn from any version up to and including 1.4.1.post1 to version 1.5.0, which resolves the issue. As a temporary workaround, restrict access to the stop_words_ attribute of fitted vectorizers to reduce the risk of sensitive information leakage. Until the update is applied, avoid using TfidfVectorizer with sensitive data.
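
One way to apply these recommendations to already-fitted models is sketched below. It is an illustration under stated assumptions rather than an official remediation procedure: the helper names is_affected and clear_stop_words are hypothetical, the version check treats only releases up to 1.4.1 (including post-releases) as affected because that is the range this advisory lists, and clearing relies on stop_words_ being an introspection-only attribute that can be set to None without changing transform results.

```python
import re


def release_tuple(version):
    """Parse the leading 'X.Y[.Z]' part of a version string into a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version):
    """Return True for versions in the advisory's range (<= 1.4.1.post1).

    Assumption: suffixes such as '.post1' are ignored, so every 1.4.1
    post-release counts as affected. Versions between 1.4.1 and 1.5.0 are
    not listed by the advisory and are reported as not affected here.
    """
    return release_tuple(version) <= (1, 4, 1)


def clear_stop_words(estimator):
    """Clear stop_words_ on a fitted vectorizer, if it holds any tokens.

    Returns the number of tokens that were dropped so callers can log it.
    """
    stored = getattr(estimator, "stop_words_", None)
    if not stored:
        return 0
    dropped = len(stored)
    estimator.stop_words_ = None  # remove discarded tokens before pickling
    return dropped
```

A typical use would be to call is_affected(sklearn.__version__) at startup and, when it returns True, run clear_stop_words on each fitted vectorizer before the model is pickled or shared. Vectorizers nested inside pipelines would need to be located and cleared individually.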

Fix

Weakness Enumeration

Insecure Storage of Sensitive Information

Related Identifiers

CVE-2024-5206
ECHO-BF4B-FB83-A5A9
GHSA-JW8X-6495-233V
MGASA-2024-0228
OESA-2024-1745
OPENSUSE-SU-2024:14043-1
OPENSUSE-SU-2024_2029-1
PYSEC-2024-110
SUSE-SU-2024:2029-1

Affected Products

Debian
Suse
Scikit-Learn