Nemotron-3-Super-120B
Agentic reasoning at scale — 120B total parameters, 12B active, built for coding, planning, tool use, and long-context tasks.
| Spec | Value |
|---|---|
| Total Parameters | 120B (12B active) |
| Context Window | 256K tokens |
| Quantization | NVFP4 optimized |
| Architecture | Hybrid Mamba-Transformer |
Agentic Reasoning at Scale
NVIDIA Nemotron 3 Super 120B A12B NVFP4 is an open hybrid Mamba-Transformer LatentMoE model with 120 billion total parameters and 12 billion active parameters, built for agentic reasoning workloads such as coding, planning, tool use, and long-context tasks. It sits in the same capability tier as Qwen3.5-122B (non-reasoning) and ahead of GPT-OSS-120B, while also delivering higher throughput.
Hybrid Mamba-Transformer — 12B Active
Built for agentic workloads
- **Agentic Reasoning:** Multi-step reasoning workflows with planning, self-correction, and autonomous decision-making for complex agentic tasks.
- **Coding & Tool Use:** Advanced code generation, debugging, and tool orchestration with native function calling support for engineering workflows.
- **Long-Context Tasks:** Process and reason over massive documents, codebases, and knowledge bases with the 256K-token context window.
- **Planning & Orchestration:** Decompose complex goals into executable plans, coordinate multi-agent systems, and orchestrate sophisticated processing pipelines.
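The tool-use workflow above follows the OpenAI-style function-calling pattern: you describe tools as JSON schemas, the model emits tool calls, and your code dispatches them. A minimal sketch of the tool-definition and dispatch half of that loop is below; the `get_weather` tool, its schema, and the lookup logic are illustrative placeholders, not part of the model's API.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
# The tool name and parameters here are placeholders for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch_tool_call(name: str, arguments: str) -> str:
    """Route a model-issued tool call (name + JSON argument string)
    to a local implementation and return its result as a string."""
    registry = {"get_weather": lambda city: f"Sunny in {city}"}
    args = json.loads(arguments)
    return registry[name](**args)

# In a full agent loop you would pass tools=[get_weather_tool] to
# client.chat.completions.create(...), read tool_calls from the response,
# dispatch each one, and append the results as role="tool" messages.
```

The dispatcher is the piece the model never sees: it only receives the schema and returns structured tool calls, while your application decides how each call is actually executed.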
Agentic Intelligence
Artificial Analysis Intelligence Index v4.0 scores for the 120B weight class.
- **Overall Intelligence:** better than 72% of models
- **Coding Capability:** better than 68% of models
- **Agentic Capability:** better than 74% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 71.2% |
| Reasoning | τ²-Bench Telecom | 62.8% |
| Reasoning | IFBench | 68.4% |
| Reasoning | AA-LCR | 34.1% |
| Reasoning | GDPval-AA | ELO 1027 |
| Reasoning | HLE | 11.2% |
| Coding | SciCode | 38.7% |
| Coding | Terminal-Bench Hard | 29.0% |
| Knowledge | AA-Omniscience | 18.3% |
Metrics sourced from Artificial Analysis. Evaluated on BF16 weights in regular (highest-effort) reasoning mode.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.15 | $0.38 |
| Async | $0.23 | $0.56 |
| Realtime | $0.30 | $0.75 |
Context window natively supported up to 256K tokens.
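Since prices are quoted per 1M tokens, per-request cost is simple arithmetic: tokens divided by one million, times the tier rate, summed across input and output. A small helper sketch using the rates from the table above (the function name and tier keys are illustrative, not part of any SDK):

```python
# Rates in USD per 1M tokens, (input, output), mirroring the pricing table.
PRICING = {
    "standard": (0.15, 0.38),
    "async": (0.23, 0.56),
    "realtime": (0.30, 0.75),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one request at the given tier."""
    in_rate, out_rate = PRICING[tier]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 10K-in / 2K-out call on the Standard tier costs about $0.00226:
# 10_000/1e6 * 0.15 + 2_000/1e6 * 0.38 = 0.0015 + 0.00076
```

At these rates, output tokens dominate cost for generation-heavy workloads, so the cheaper Standard tier matters most when responses are long.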
Start Building in Minutes
Nemotron-3-Super-120B is accessible via OpenAI-compatible endpoints. The example below walks through a batch workflow using the standard OpenAI Python SDK pointed at Doubleword.ai.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch",
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
```
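The `batch_requests.jsonl` file uploaded in Step 1 follows the OpenAI Batch API input format: one JSON request object per line, each with a `custom_id`, `method`, `url`, and request `body`. A sketch of generating such a file, assuming a placeholder model identifier (check your provider's model listing for the exact name):

```python
import json

# Placeholder prompts and model name for illustration.
prompts = [
    "Summarize the attached design doc in one paragraph.",
    "List three edge cases for a rate limiter.",
]

with open("batch_requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",          # your key to match results back
            "method": "POST",
            "url": "/v1/chat/completions",        # must match the batch endpoint
            "body": {
                "model": "nemotron-3-super-120b",  # placeholder identifier
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```

Results come back keyed by `custom_id`, so choose IDs that let you join outputs to your original records.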