Doubleword
    Hybrid Mamba-Transformer
    LatentMoE
    Open Weights
    NVIDIA

    Nemotron-3-Super-120B

    Agentic reasoning at scale — 120B total parameters, 12B active, built for coding, planning, tool use, and long-context tasks.

    Total Parameters: 120B (12B active)

    Context Window: 256K tokens

    Quantization: NVFP4 optimized

    Architecture: Hybrid Mamba-Transformer

    About

    Agentic Reasoning at Scale

    NVIDIA Nemotron 3 Super 120B A12B NVFP4 is an open hybrid Mamba-Transformer LatentMoE model with 120 billion total parameters and 12 billion active parameters, built for agentic reasoning workloads such as coding, planning, tool use, and long-context tasks. It sits in the same capability tier as Qwen3.5-122B (non-reasoning) and ahead of GPT-OSS-120B, while delivering higher throughput.


    Hybrid Mamba-Transformer — 12B Active

    Use Cases

    Built for agentic workloads

    Agentic Reasoning

    Multi-step reasoning workflows with planning, self-correction, and autonomous decision-making for complex agentic tasks.

    Coding & Tool Use

    Advanced code generation, debugging, and tool orchestration with native function calling support for engineering workflows.

    Long-Context Tasks

    Process and reason over massive documents, codebases, and knowledge bases with 256K token context window support.

    Planning & Orchestration

    Decompose complex goals into executable plans, coordinate multi-agent systems, and orchestrate sophisticated processing pipelines.
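
    The function-calling support mentioned above follows the OpenAI tools convention exposed by the compatible endpoint. The sketch below shows a minimal tool schema and a local dispatcher for model-issued calls; the tool name and stub implementation are illustrative assumptions, not part of the API.

    ```python
    import json

    # Hypothetical tool definition in the OpenAI function-calling format.
    # Pass this via the tools= parameter of chat.completions.create().
    TOOLS = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    def dispatch_tool_call(name: str, arguments: str) -> str:
        """Route a model-issued tool call to a local implementation.

        `arguments` arrives as a JSON string on each entry of
        message.tool_calls; the return value goes back to the model
        as a role="tool" message.
        """
        args = json.loads(arguments)
        if name == "get_weather":
            # Stub implementation for illustration only.
            return json.dumps({"city": args["city"], "temp_c": 21})
        raise ValueError(f"Unknown tool: {name}")
    ```

    In a full loop you would send `tools=TOOLS` with the request, check `response.choices[0].message.tool_calls`, run each call through the dispatcher, and append the results as `role="tool"` messages before asking the model to continue.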

    Benchmarks

    Agentic Intelligence

    Artificial Analysis Intelligence Index v4.0 scores for the 120B weight class.

    36

    Overall Intelligence

    Better than 72% of models

    30

    Coding Capability

    Better than 68% of models

    38

    Agentic Capability

    Better than 74% of models

    Category  | Benchmark           | Score
    Reasoning | GPQA Diamond        | 71.2%
    Reasoning | τ²-Bench Telecom    | 62.8%
    Reasoning | IFBench             | 68.4%
    Reasoning | AA-LCR              | 34.1%
    Reasoning | GDPval-AA           | ELO 1027
    Reasoning | HLE                 | 11.2%
    Coding    | SciCode             | 38.7%
    Coding    | Terminal-Bench Hard | 29.0%
    Knowledge | AA-Omniscience      | 18.3%

    Metrics sourced from Artificial Analysis. Evaluated on BF16 weights in regular (highest-effort) reasoning mode.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier     | Input / 1M tokens | Output / 1M tokens
    Standard | $0.15             | $0.38
    Async    | $0.23             | $0.56
    Realtime | $0.30             | $0.75

    Context window natively supported up to 256K tokens.
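
    Since prices are quoted per 1M tokens, estimating a request's cost is a simple rate calculation. A minimal sketch, using the rates from the table above:

    ```python
    # Per-1M-token prices (USD) from the pricing table above.
    PRICING = {
        "standard": (0.15, 0.38),
        "async":    (0.23, 0.56),
        "realtime": (0.30, 0.75),
    }

    def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
        """Estimate cost in USD: tokens times the per-1M rate for each direction."""
        in_rate, out_rate = PRICING[tier]
        return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    ```

    For example, a Standard-tier request with a full 256K-token input and a 4K-token output costs roughly `estimate_cost("standard", 256_000, 4_000)`, about four cents.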

    Quickstart

    Start Building in Minutes

    Nemotron-3-Super-120B is accessible via OpenAI-compatible endpoints. The example below submits a batch job through the standard OpenAI Python SDK pointed at the Doubleword API.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")

    Ready to deploy Nemotron-3-Super-120B?