Nemotron-3-Ultra-550B-A55B
NVIDIA's strongest open-weights reasoning model — positioned near GPT-5.4 Mini (xhigh) and ahead of DeepSeek V4-Flash and Qwen3.5-397B-A17B.
| Spec | Value | Detail |
|---|---|---|
| Total Parameters | 550B | 55B Active |
| Context Window | 262K | Tokens |
| Released | Jun 2026 | Open Weights |
| Architecture | MoE | Reasoning |
NVIDIA's Flagship Open Reasoning Model
NVIDIA Nemotron 3 Ultra is the strongest open-weights model in the Nemotron 3 family — 550B total parameters with 55B active, purpose-built for high-stakes reasoning, agentic workflows, tool use, and multilingual instruction following. It lands near GPT-5.4 Mini (xhigh) on the Artificial Analysis Intelligence Index and ahead of DeepSeek V4-Flash and Qwen3.5-397B-A17B, while staying open and self-hostable.
MoE — 55B Active of 550B
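To make the "55B active of 550B" figure concrete, here is a small back-of-the-envelope sketch. The parameter counts come from this page; the bytes-per-parameter values are assumptions for illustration only, since the page does not state which precision the weights ship in.

```python
# Parameter counts as quoted on this page.
TOTAL_PARAMS = 550_000_000_000
ACTIVE_PARAMS = 55_000_000_000


def active_fraction() -> float:
    """Share of parameters used per token in the MoE forward pass."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def weight_memory_gb(bytes_per_param: float) -> float:
    """Approximate memory for ALL weights (every expert must be loaded).

    bytes_per_param is an assumption: 2 for 16-bit weights, 1 for 8-bit.
    Ignores KV cache, activations, and runtime overhead.
    """
    return TOTAL_PARAMS * bytes_per_param / 1e9


print(f"active fraction: {active_fraction():.0%}")           # 10%
print(f"weights at 16-bit: {weight_memory_gb(2):,.0f} GB")   # 1,100 GB
print(f"weights at 8-bit:  {weight_memory_gb(1):,.0f} GB")   # 550 GB
```

The point of the sketch: per-token compute scales with the 55B active parameters, while self-hosting memory scales with the full 550B.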
Built for high-stakes reasoning
Agentic Workflows
Long-horizon planning, self-correction, and autonomous decision-making for multi-step agent stacks.
Tool Use
Reliable native function calling and tool orchestration for production agentic pipelines.
High-Stakes RAG
Grounded answers over large knowledge bases — 262K context for whole-corpus reasoning.
Complex Instruction Following
Robust adherence to nested, multi-constraint instructions across multilingual prompts.
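As a rough illustration of the tool-use capability listed above, the sketch below wires one local function into a chat request using the OpenAI chat-completions `tools` format. It assumes the Doubleword endpoint accepts the standard `tools` parameter, which this page implies but does not document. The `lookup_benchmark` tool, the `dispatch` helper, and `run_tool_turn` are hypothetical names invented for this example.

```python
import json

# Hypothetical local tool: scores copied from the benchmark table on this page.
BENCHMARKS = {"GPQA Diamond": 87, "IFBench": 81, "SciCode": 40}


def lookup_benchmark(name: str) -> dict:
    """Return the published score for a benchmark, or an error payload."""
    if name not in BENCHMARKS:
        return {"error": f"unknown benchmark: {name}"}
    return {"benchmark": name, "score_percent": BENCHMARKS[name]}


# Tool schema in the OpenAI chat-completions "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "lookup_benchmark",
        "description": "Look up a published benchmark score for Nemotron 3 Ultra.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}]


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run the requested local tool and return its result as a JSON string."""
    if tool_name != "lookup_benchmark":
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    args = json.loads(arguments_json)
    return json.dumps(lookup_benchmark(args.get("name", "")))


def run_tool_turn(client, question: str) -> str:
    """One request -> tool call -> final answer round trip (needs network + API key)."""
    model = "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B"
    messages = [{"role": "user", "content": question}]
    first = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content  # the model answered without calling a tool
    messages.append(reply)
    for call in reply.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": dispatch(call.function.name, call.function.arguments),
        })
    final = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    return final.choices[0].message.content
```

To try it, construct an `OpenAI` client pointed at the Doubleword base URL and call `run_tool_turn(client, "What is the IFBench score?")`. Keeping `dispatch` separate from the network call makes the tool logic testable offline.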
Frontier-class open intelligence
Artificial Analysis Intelligence Index v4.0 — Nemotron 3 Ultra vs comparable open & closed models.
[Charts: Intelligence Index, GPQA Diamond, and IFBench. Source: AA Intelligence Index v4.0]
| Category | Benchmark | Score |
|---|---|---|
| Agentic | GDPval-AA | 44% |
| Agentic | τ²-Bench Telecom | 83% |
| Coding | Terminal-Bench Hard | 36% |
| Coding | SciCode | 40% |
| Reasoning | AA-LCR | 67% |
| Reasoning | GPQA Diamond | 87% |
| Reasoning | Humanity's Last Exam | 27% |
| Instruction | IFBench | 81% |
| Knowledge | AA-Omniscience Accuracy | 22% |
| Knowledge | AA-Omniscience Non-Hallucination | 71% |
Metrics sourced from Artificial Analysis Intelligence Index v4.0. Evaluated in regular (highest-effort) reasoning mode.
Pick your delivery window
Same model, three speeds. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Batch (save 50%) | $0.25 | $1.25 |
| Async (save 26%) | $0.37 | $1.87 |
| Realtime | $0.50 | $2.50 |
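A quick way to compare tiers for a given workload is to plug token counts into the per-1M prices from the table above. The sketch below does only that arithmetic; the tier keys and the `estimate_cost` helper are names made up for this example, not part of any Doubleword API.

```python
# Per-1M-token prices in USD, copied from the pricing table: (input, output).
PRICES = {
    "batch": (0.25, 1.25),
    "async": (0.37, 1.87),
    "realtime": (0.50, 2.50),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a workload on the given tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Example workload: 2M input tokens and 500K output tokens.
for tier in PRICES:
    print(f"{tier:>8}: ${estimate_cost(tier, 2_000_000, 500_000):.3f}")
# batch comes to $1.125 and realtime to $2.250 for this workload.
```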
Context window natively supported up to 262K tokens.
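For retrieval workloads it can help to check whether a set of documents is likely to fit in the window before sending a request. The following is a minimal sketch under two loud assumptions: the limit is taken as 262,000 tokens (the page only says "262K"), and token counts are approximated at 4 characters per token, a rough heuristic rather than the model's real tokenizer. `pack_documents` is a hypothetical helper.

```python
CONTEXT_WINDOW = 262_000  # "262K" as quoted on this page; the exact limit may differ
CHARS_PER_TOKEN = 4       # rough heuristic (assumption); use the real tokenizer for accuracy


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_documents(docs: list[str], reserve_for_output: int = 8_000) -> tuple[list[str], int]:
    """Keep documents in order until the estimated budget would be exceeded.

    Returns the kept documents and the estimated tokens they use.
    """
    budget = CONTEXT_WINDOW - reserve_for_output
    kept, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        kept.append(doc)
        used += cost
    return kept, used


# Three documents of ~100K estimated tokens each: only the first two fit.
docs = ["a" * 400_000, "b" * 400_000, "c" * 400_000]
kept, used = pack_documents(docs)
print(len(kept), used)  # 2 200000
```

Reserving part of the window for the model's output is a design choice here; reasoning models can emit long responses, so the reserve may need to be much larger in practice.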
Start Building in Minutes
Nemotron-3-Ultra-550B is accessible via OpenAI-compatible endpoints on Doubleword.ai.
from openai import OpenAI

# Point the standard OpenAI client at the Doubleword endpoint.
client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Send a single-turn chat request to Nemotron 3 Ultra.
response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B",
    messages=[
        {"role": "user", "content": "Plan a 3-step research workflow for evaluating an open-weights LLM."}
    ],
)

print(response.choices[0].message.content)