Doubleword
    MoE Architecture
    Apache 2.0
    Single-GPU Ready

    GPT-OSS-20B

    Powerful chain-of-thought reasoning in an efficient, highly scalable 20B parameter model.

    Total Parameters

    20B

    Context Window

    128K

    Tokens

    Infrastructure

    Single

    B200 GPU

    License

    Apache

    2.0

    About

    Scalable Open Reasoning

    GPT-OSS-20B delivers powerful chain-of-thought reasoning in an efficient 20B-parameter model. Designed for single-GPU deployment without sacrificing sophisticated reasoning, this Apache 2.0-licensed model balances performance with resource efficiency. It uses a compact Mixture-of-Experts (MoE) design with SwiGLU activations, native FP4 quantization for fast inference, and an adjustable reasoning-effort level for task-specific tuning.


    Compact MoE — Single-GPU Ready

    Use Cases

    Built for efficient reasoning

    Development Applications

    Accelerate engineering with rapid prototyping, robust code generation, automated API design, and system integration testing.

    Business Solutions

    Streamline enterprise operations through customer support automation, advanced content generation, and sophisticated market research analysis.

    Educational Use Cases

    Deploy interactive tutoring systems, support curriculum development, and assist with academic writing and research methodology guidance.

    Edge & Scalable Deployment

    Leverage cost-effective single-GPU operations for reduced infrastructure requirements, making it ideal for edge computing and distributed processing.

    Benchmarks

    Efficient Intelligence

    Proven performance across reasoning, coding, and agentic workflows for the 20B weight class.

    24.5

    Overall Intelligence

    Better than 58% of models

    18.5

    Coding Capability

    Better than 56% of models

    27.6

    Agentic Capability

    Better than 61% of models

    Category     Benchmark              Score
    Reasoning    GPQA Diamond           68.8%
    Reasoning    τ²-Bench Telecom       60.2%
    Reasoning    IFBench                65.1%
    Reasoning    AA-LCR                 30.7%
    Reasoning    GDPval-AA               9.2%
    Reasoning    HLE                     9.8%
    Reasoning    CritPt                  1.4%
    Coding       SciCode                34.4%
    Coding       Terminal-Bench Hard    10.6%
    Knowledge    AA-Omniscience         15.5%

    Metrics sourced from Artificial Analysis.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier         Input / 1M tokens    Output / 1M tokens
    Standard     $0.02                $0.15
    Async        $0.03                $0.20
    Realtime     $0.04                $0.30

    The context window natively supports up to 128K tokens.
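As a rough illustration of the pricing table above, per-request cost can be estimated from token counts. This is a minimal sketch with the page's listed prices hardcoded; verify against current pricing before relying on it.

```python
# Per-1M-token prices (USD) from the pricing table on this page.
# These are hardcoded for illustration; check current pricing before use.
PRICES = {
    "standard": {"input": 0.02, "output": 0.15},
    "async":    {"input": 0.03, "output": 0.20},
    "realtime": {"input": 0.04, "output": 0.30},
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for one request at the given pricing tier."""
    p = PRICES[tier]
    return (input_tokens / 1_000_000) * p["input"] \
         + (output_tokens / 1_000_000) * p["output"]

# Example: 500K input tokens and 100K output tokens on the Standard tier.
cost = estimate_cost("standard", 500_000, 100_000)
print(f"${cost:.4f}")  # 0.5 * $0.02 + 0.1 * $0.15 = $0.0250
```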

    Quickstart

    Start Building in Minutes

    GPT-OSS-20B is accessible via OpenAI-compatible endpoints. The example below uses the standard OpenAI Python SDK to submit a batch job through the Doubleword API.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")
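The script above assumes a `batch_requests.jsonl` file already exists. A minimal sketch of how such a file could be built, following the OpenAI Batch API line format (one JSON request per line); the model identifier `gpt-oss-20b` is an assumption here, so confirm the exact name against Doubleword's model listing.

```python
import json

# Each line of the batch input file is one self-contained request:
# a custom_id for matching results, the HTTP method, the target
# endpoint, and the chat-completion request body.
prompts = [
    "Explain MoE routing in two sentences.",
    "Summarize chain-of-thought prompting.",
]

requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-oss-20b",  # assumed model ID; verify the exact name
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 512,
        },
    }
    for i, prompt in enumerate(prompts)
]

with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```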