Nemotron-3-Super-120B
Agentic reasoning at scale — 120B total parameters, 12B active, built for coding, planning, tool use, and long-context tasks.
| Spec | Value |
|---|---|
| Total Parameters | 120B (12B active) |
| Context Window | 256K tokens |
| Quantization | NVFP4 optimized |
| Architecture | Hybrid Mamba-Transformer |
Agentic Reasoning at Scale
NVIDIA Nemotron 3 Super 120B A12B NVFP4 is an open hybrid Mamba-Transformer LatentMoE model with 120 billion total parameters and 12 billion active parameters, built for agentic reasoning workloads such as coding, planning, tool use, and long-context tasks. It sits in the same capability tier as Qwen3.5-122B (non-reasoning) and ahead of GPT-OSS-120B, while also delivering higher throughput.
Hybrid Mamba-Transformer — 12B Active
Built for agentic workloads
- **Agentic Reasoning:** Multi-step reasoning workflows with planning, self-correction, and autonomous decision-making for complex agentic tasks.
- **Coding & Tool Use:** Advanced code generation, debugging, and tool orchestration with native function calling support for engineering workflows.
- **Long-Context Tasks:** Process and reason over massive documents, codebases, and knowledge bases with the 256K-token context window.
- **Planning & Orchestration:** Decompose complex goals into executable plans, coordinate multi-agent systems, and orchestrate sophisticated processing pipelines.
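The tool-use workflow above follows the OpenAI-style function-calling pattern: you describe tools as JSON schemas, the model emits tool calls, and your code dispatches them. A minimal sketch of the tool-definition and dispatch half of that loop is below; the `get_weather` tool, its schema, and the lookup logic are illustrative placeholders, not part of the model's API.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
# The tool name and parameters here are placeholders for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch_tool_call(name: str, arguments: str) -> str:
    """Route a model-issued tool call (name + JSON argument string)
    to a local implementation and return its result as a string."""
    registry = {"get_weather": lambda city: f"Sunny in {city}"}
    args = json.loads(arguments)
    return registry[name](**args)

# In a full agent loop you would pass tools=[get_weather_tool] to
# client.chat.completions.create(...), read tool_calls from the response,
# dispatch each one, and append the results as role="tool" messages.
```

The dispatcher is the piece the model never sees: it only receives the schema and returns structured tool calls, while your application decides how each call is actually executed.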
Agentic Intelligence
Artificial Analysis Intelligence Index v4.0 scores for the 120B weight class.
- **Overall Intelligence:** better than 72% of models
- **Coding Capability:** better than 68% of models
- **Agentic Capability:** better than 74% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 71.2% |
| Reasoning | τ²-Bench Telecom | 62.8% |
| Reasoning | IFBench | 68.4% |
| Reasoning | AA-LCR | 34.1% |
| Reasoning | GDPval-AA | ELO 1027 |
| Reasoning | HLE | 11.2% |
| Coding | SciCode | 38.7% |
| Coding | Terminal-Bench Hard | 29.0% |
| Knowledge | AA-Omniscience | 18.3% |
Metrics sourced from Artificial Analysis. Evaluated on BF16 weights in regular (highest-effort) reasoning mode.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.15 | $0.38 |
| Async | $0.23 | $0.56 |
| Realtime | $0.30 | $0.75 |
Context window natively supported up to 256K tokens.
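Since prices are quoted per 1M tokens, per-request cost is simple arithmetic: tokens divided by one million, times the tier rate, summed across input and output. A small helper sketch using the rates from the table above (the function name and tier keys are illustrative, not part of any SDK):

```python
# Rates in USD per 1M tokens, (input, output), mirroring the pricing table.
PRICING = {
    "standard": (0.15, 0.38),
    "async": (0.23, 0.56),
    "realtime": (0.30, 0.75),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one request at the given tier."""
    in_rate, out_rate = PRICING[tier]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 10K-in / 2K-out call on the Standard tier costs about $0.00226:
# 10_000/1e6 * 0.15 + 2_000/1e6 * 0.38 = 0.0015 + 0.00076
```

At these rates, output tokens dominate cost for generation-heavy workloads, so the cheaper Standard tier matters most when responses are long.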
Start Building in Minutes
Nemotron-3-Super-120B is accessible via OpenAI-compatible endpoints. The example below walks through a batch workflow using the standard OpenAI Python SDK pointed at Doubleword.ai.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch",
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
```
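The `batch_requests.jsonl` file uploaded in Step 1 follows the OpenAI Batch API input format: one JSON request object per line, each with a `custom_id`, `method`, `url`, and request `body`. A sketch of generating such a file, assuming a placeholder model identifier (check your provider's model listing for the exact name):

```python
import json

# Placeholder prompts and model name for illustration.
prompts = [
    "Summarize the attached design doc in one paragraph.",
    "List three edge cases for a rate limiter.",
]

with open("batch_requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",          # your key to match results back
            "method": "POST",
            "url": "/v1/chat/completions",        # must match the batch endpoint
            "body": {
                "model": "nemotron-3-super-120b",  # placeholder identifier
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```

Results come back keyed by `custom_id`, so choose IDs that let you join outputs to your original records.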