Doubleword
    MoE Architecture
    Vision Encoder
    Open Weights

    Qwen3.5 35B A3B

    The hyper-efficient 3B-active multimodal model for rapid reasoning and agentic workflows.

    Total Parameters

    35B

    3B Activated

    Context Window

    262K

    Extensible to 1M

    Modalities

    Text, Image & Video

    Architecture

    256 Experts

    8 Routed + 1 Shared

    About

    High-Speed Multimodal Intelligence

    Qwen3.5 35B A3B is an advanced causal language model featuring a native vision encoder. Built on a highly optimized Mixture-of-Experts (MoE) framework, it contains 35 billion total parameters but activates just 3 billion per token during inference. Leveraging a hybrid Gated DeltaNet and Gated Attention architecture, it delivers exceptional speed and cost efficiency without sacrificing intelligence. It natively supports a 262K context window—extensible up to 1 million tokens—making it perfect for high-throughput, long-context applications.
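The routing described above can be sketched in a few lines: score all experts, keep the top-k, and renormalize their gate weights. This is a minimal illustration using the figures from the spec card (256 experts, 8 routed per token); the function name and dimensions are hypothetical, not the model's actual implementation.

```python
import math
import random

def moe_route(scores, num_routed=8):
    """Sketch of top-k MoE routing: pick the highest-scoring experts
    and softmax over just those (a shared expert would always be on)."""
    topk = sorted(range(len(scores)), key=scores.__getitem__)[-num_routed:]
    m = max(scores[i] for i in topk)                 # for numerical stability
    exps = [math.exp(scores[i] - m) for i in topk]
    total = sum(exps)
    return topk, [e / total for e in exps]           # expert ids, gate weights

random.seed(0)
router_scores = [random.gauss(0, 1) for _ in range(256)]  # 256 experts, per the card
experts, gates = moe_route(router_scores)
print(len(experts), round(sum(gates), 6))
```

Because only the 8 routed experts (plus shared components) run for each token, roughly 3B of the 35B parameters are touched per forward pass, which is where the speed and cost advantage comes from.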


    Mixture of Experts — 3B active / 35B total

    Use Cases

    Built for speed and efficiency

    Native Multimodal Workflows

    Process text, high-resolution images, and videos seamlessly with the integrated vision encoder.
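For a sense of what a multimodal call might look like, here is a sketch of a request body in the OpenAI-compatible "content parts" format, which mixes text and image inputs in one message. The model identifier and image URL are illustrative placeholders, not confirmed values.

```python
# Hypothetical multimodal request body; "qwen3.5-35b-a3b" and the URL
# are placeholders for illustration only.
request = {
    "model": "qwen3.5-35b-a3b",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
}

# With a configured OpenAI-compatible client this would be sent as:
# client.chat.completions.create(**request)
```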

    High-Speed Agentic Workflows

    Incredibly fast inference for conversational AI agents, instruction following, and dual-control scenarios.

    Cost-Effective Coding

    Strong baseline performance in agentic coding and terminal usage at a fraction of the compute cost of massive models.

    Long-Context Data Processing

    Analyze massive documents natively with the 262K context window, easily extensible up to 1,000,000 tokens for comprehensive data extraction.

    Benchmarks

    Efficient Intelligence

    Proven performance across reasoning, coding, and agentic workflows for a 3B-active model.

    37.1

    Overall Intelligence

    Better than 82% of models

    30.3

    Coding Capability

    Better than 79% of models

    44.1

    Agentic Capability

    Better than 82% of models

    Category    Benchmark              Score
    Reasoning   GPQA Diamond           84.5%
    Reasoning   τ²-Bench Telecom       89.2%
    Reasoning   IFBench                72.5%
    Reasoning   AA-LCR                 62.7%
    Reasoning   GDPval-AA              21.4%
    Reasoning   HLE                    19.7%
    Reasoning   CritPt                 0.9%
    Coding      SciCode                37.7%
    Coding      Terminal-Bench Hard    26.5%
    Knowledge   AA-Omniscience         20.4%

    Metrics sourced from Artificial Analysis.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier        Input / 1M tokens    Output / 1M tokens
    Standard    $0.05                $0.20
    Async       $0.07                $0.30
    Realtime    $0.25                $2.00

    A 262K-token context window is natively supported (extensible to 1M).
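Since prices are quoted per 1M tokens, estimating the cost of a workload is simple arithmetic. The helper below applies the rates from the table above; the tier names and example token counts are just for illustration.

```python
# Per-1M-token rates (input, output) in USD, from the pricing table above.
RATES = {
    "standard": (0.05, 0.20),
    "async":    (0.07, 0.30),
    "realtime": (0.25, 2.00),
}

def cost(tier, input_tokens, output_tokens):
    """Total USD cost for one request at the given tier's per-1M-token rates."""
    rate_in, rate_out = RATES[tier]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# e.g. summarizing a 200K-token document into 2K tokens on the Standard tier:
print(f"${cost('standard', 200_000, 2_000):.4f}")  # → $0.0104
```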

    Quickstart

    Start Building in Minutes

    Qwen3.5 35B A3B is accessible via OpenAI-compatible endpoints. The example below uses the standard OpenAI Python SDK to submit an asynchronous batch job through the Doubleword API.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")
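The quickstart uploads `batch_requests.jsonl` but doesn't show what goes inside it. Each line is one self-contained request in the OpenAI Batch API line format (`custom_id`, `method`, `url`, `body`); a sketch of generating such a file follows, with a placeholder model name and example prompts.

```python
import json

# Each line of batch_requests.jsonl is one request in the OpenAI Batch API
# line format. The model name below is a placeholder for illustration.
requests = [
    {
        "custom_id": f"request-{i}",          # used to match results to inputs
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen3.5-35b-a3b",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize MoE routing.",
                                "Explain KV caching."])
]

with open("batch_requests.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests) + "\n")
```

Once the batch completes, results arrive in a matching JSONL file keyed by `custom_id`, which is why every request needs a unique one.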

    Ready to deploy Qwen3.5 35B A3B?