NVIDIA releases Nemotron‑3 Ultra — an LLM for optimizing long‑running agents

Technologies | 2026-06-17, 11:38

NVIDIA has introduced Nemotron‑3 Ultra, a large language model (LLM) designed to improve the performance of long‑running agents.

Long‑running agents are AI systems that must retain context and make decisions over extended sessions.

Key features:
⏺ Nemotron‑3 Ultra delivers five times higher throughput than comparable models and cuts agent task execution costs by up to 30%. The model also uses Multi‑token Prediction (MTP), which accelerates text generation by predicting several future tokens at once.
⏺ The model was trained with Multi‑Teacher On‑Policy Distillation (MOPD), a method in which it learns from more than ten expert teacher models. This allows Nemotron‑3 Ultra to keep improving its capabilities and to specialize across different domains.
⏺ It supports context lengths of up to 1 million tokens thanks to its Hybrid Mamba‑Transformer architecture.
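To give a feel for why predicting several tokens at once can speed up generation, here is a minimal toy sketch in Python. It is not NVIDIA's implementation: the functions `next_token`, `draft_tokens`, and `generate` are hypothetical, and the draft-then-verify scheme shown is one common way multi-token heads are used, assumed here purely for illustration. A draft head proposes `k` tokens per step, and the main model keeps them up to the first wrong guess.

```python
from typing import List, Tuple


def next_token(context: List[int]) -> int:
    """Hypothetical 'main model': the true next token is (last + 1) mod 10."""
    return (context[-1] + 1) % 10


def draft_tokens(context: List[int], k: int) -> List[int]:
    """Hypothetical multi-token head: guesses k future tokens in one shot.

    It deliberately ignores the wrap-around at 10, so some guesses are wrong.
    """
    last = context[-1]
    return [last + i for i in range(1, k + 1)]


def generate(prompt: List[int], n_new: int, k: int) -> Tuple[List[int], int]:
    """Generate n_new tokens, drafting k tokens per decoding step.

    Returns the full token list and the number of decoding steps used.
    """
    tokens = list(prompt)
    steps = 0
    while len(tokens) - len(prompt) < n_new:
        steps += 1
        ctx = list(tokens)
        # A real system would verify all k drafts in one batched forward pass;
        # this toy emulates that by checking each position in turn.
        for guess in draft_tokens(tokens, k):
            truth = next_token(ctx)
            ctx.append(truth)  # the verified token is always kept
            if guess != truth:
                break  # discard the remaining drafts after the first miss
        tokens = ctx
    return tokens[: len(prompt) + n_new], steps


baseline, baseline_steps = generate([0], n_new=20, k=1)
mtp, mtp_steps = generate([0], n_new=20, k=4)
print(baseline == mtp, baseline_steps, mtp_steps)
```

Under these toy assumptions the output sequence is identical in both runs, but drafting four tokens per step needs far fewer decoding steps than drafting one. Real speedups depend on how often the drafted tokens are accepted and on the cost of verification.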

NVIDIA also released two other new Nemotron models:
🛑 Nemotron 3.5 Content Safety: an open, efficient 4B (4‑billion‑parameter) model designed to detect harmful or restricted content. It supports 12 languages and 23 safety categories.
🛑 Nemotron 3.5 ASR: a model for streaming Automatic Speech Recognition (ASR). It supports more than 40 languages and delivers latency under 100 milliseconds.

NVIDIA has released not only the model but also all related assets — weights, training datasets, and training

Vendors

NVIDIA

Products

Nemotron 3.5 ASR

Nemotron 3.5 Content Safety

Nemotron‑3 Ultra