PT-2024-35110 · Unknown+2 · Scikit-Learn+2

CVE-2024-5206

Published

2024-06-06

Updated

2024-10-24

CVSS v3.1

4.7

Medium

Vector

AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N

Name of the Vulnerable Software and Affected Versions

scikit-learn versions up to and including 1.4.1.post1
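
One way to check an environment is to compare the installed release against 1.5.0, the first fixed release. The sketch below is a minimal example built around a hypothetical helper, needs_upgrade, which is not part of scikit-learn. It compares only the major and minor components. As a result, it conservatively flags every release below 1.5.0, including any 1.4.x release after 1.4.1.post1, and it does not treat pre-releases specially.

```python
import re
from importlib import metadata

FIXED_RELEASE = (1, 5)  # 1.5.0 is the first release containing the fix


def needs_upgrade(version: str) -> bool:
    """Hypothetical helper: True if `version` is older than 1.5.0.

    Only the leading major.minor numbers are compared, so suffixes such
    as ".post1" or "rc1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < FIXED_RELEASE


if __name__ == "__main__":
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        print("scikit-learn is not installed")
    else:
        status = "upgrade to 1.5.0" if needs_upgrade(installed) else "includes the fix"
        print(f"scikit-learn {installed}: {status}")
```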

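Description

A sensitive data leakage issue was identified in scikit-learn's TfidfVectorizer. A fitted vectorizer stores the training-data tokens it discarded from its vocabulary in the stop_words_ attribute. These are terms that appeared in more documents than max_df allows, in fewer documents than min_df requires, or that were cut off by max_features. Together with the vocabulary it actually uses, this means effectively every token seen in training is retained, rather than only the subset of tokens the TF-IDF technique needs in order to function. Rare tokens are exactly the ones min_df filters out, so the attribute can hold values that were meant to be discarded and never stored, such as passwords or keys. Anyone who can read the fitted object, for example a pickled model, can read those values. The attribute is inherited from CountVectorizer, which is affected in the same way. The impact of this issue depends on the nature of the data being processed by the vectorizer.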
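
The following sketch illustrates how the discarded set arises. It is a simplified, pure-Python model of document-frequency filtering, not scikit-learn's code, and fit_vocabulary is a hypothetical name. In this model, min_df is treated only as an absolute document count and max_df only as a fraction of documents, and max_features is not modelled. The sketch does reuse scikit-learn's default token pattern and lowercasing.

```python
import re
from collections import Counter

# scikit-learn's default token pattern: runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def fit_vocabulary(docs, min_df=1, max_df=1.0):
    """Split training tokens into a kept vocabulary and a discarded set.

    min_df: minimum number of documents a token must appear in.
    max_df: maximum fraction of documents a token may appear in.
    In affected releases, a fitted vectorizer retained the equivalent of
    `discarded` in its stop_words_ attribute.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each token at most once per document (document frequency).
        doc_freq.update(set(TOKEN_PATTERN.findall(doc.lower())))

    max_count = max_df * len(docs)
    vocabulary = {tok for tok, n in doc_freq.items() if min_df <= n <= max_count}
    discarded = set(doc_freq) - vocabulary
    return vocabulary, discarded


docs = ["the cat sat", "the dog sat", "my password is hunter2"]
vocabulary, discarded = fit_vocabulary(docs, min_df=2)
print(sorted(vocabulary))  # ['sat', 'the']
print(sorted(discarded))   # ['cat', 'dog', 'hunter2', 'is', 'my', 'password']
```

The secret "hunter2" appears in only one document, so min_df=2 keeps it out of the vocabulary. In affected releases, this kind of rare token is precisely what the fitted object kept.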

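Recommendations

For scikit-learn versions up to and including 1.4.1.post1, update to version 1.5.0, in which fitted vectorizers no longer store discarded tokens in stop_words_. Models fitted with an affected version keep the attribute even after the library is upgraded, so either refit them or clear the attribute. As a temporary workaround, delete the stop_words_ attribute from fitted vectorizers before persisting or sharing them. The attribute exists only for model inspection and does not affect how the vectorizer transforms data. Avoid using TfidfVectorizer with sensitive data until the issue is resolved.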
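
The workaround can be sketched as a small helper that removes stop_words_ from a fitted vectorizer, and from the steps of a Pipeline, before the model is pickled or shared. The names scrub_stop_words and dump_model are hypothetical. The sketch assumes the attribute is an ordinary instance attribute, as on fitted CountVectorizer and TfidfVectorizer objects. It only walks the steps list of Pipeline-like objects, so other composites such as ColumnTransformer or FeatureUnion would need the same treatment.

```python
import pickle


def scrub_stop_words(estimator) -> int:
    """Delete stop_words_ from `estimator` and any nested Pipeline steps.

    Returns the number of attributes removed. The attribute is used only
    for inspection, so transform() behaves the same afterwards.
    """
    removed = 0
    if hasattr(estimator, "stop_words_"):
        delattr(estimator, "stop_words_")
        removed += 1
    # A Pipeline stores its components as a list of (name, estimator) pairs.
    for _, step in getattr(estimator, "steps", []):
        removed += scrub_stop_words(step)
    return removed


def dump_model(estimator, path: str) -> None:
    """Scrub discarded tokens in place, then persist the model with pickle."""
    scrub_stop_words(estimator)
    with open(path, "wb") as fh:
        pickle.dump(estimator, fh)
```

For example, calling scrub_stop_words on a fitted Pipeline that contains a TfidfVectorizer removes the attribute in place. The same call can be applied to a model fitted with an affected release and then loaded under 1.5.0.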

Insecure Storage of Sensitive Information

Weakness Enumeration

CWE-921

CWE-922

Related Identifiers

CVE-2024-5206

ECHO-BF4B-FB83-A5A9

GHSA-JW8X-6495-233V

MGASA-2024-0228

OESA-2024-1745

OPENSUSE-SU-2024:14043-1

OPENSUSE-SU-2024_2029-1

PYSEC-2024-110

SUSE-SU-2024:2029-1

Affected Products

Debian

Suse

Scikit-Learn
