Doubleword
    MoE · 3B Active
    Thinking Mode
    Open Weights
    Alibaba
    Apache 2.0

    Qwen3.6-35B-A3B

    A community-tuned refresh of Qwen3.5-35B-A3B with the same MoE architecture, prioritizing stability and real-world utility. In Qwen's published benchmarks it outperforms GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5.

    Total Parameters: 35B (MoE · 3B Active)
    Context Window: 256K tokens
    Intelligence: 43 (AA Index v4.0)
    License: Apache 2.0 (Open Weights)

    About

    Stability-Focused, Real-World Tuned

    Qwen3.6-35B-A3B is an updated version of the Qwen3.5-35B-A3B model, prioritizing stability and real-world utility following community feedback. It is a mid-sized MoE model (35B total parameters, 3B active) that offers a strong price/performance point for async workloads.

    In Qwen's published benchmarks, this model outperformed GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5. Thinking mode is enabled by default: the model reasons step by step before responding. To disable it, pass {"chat_template_kwargs": {"enable_thinking": false}} in the request body (via extra_body in the OpenAI Python SDK).
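
A minimal sketch of where the thinking switch sits in a raw request body, for clients that do not use the OpenAI SDK. The helper name `build_chat_body` is hypothetical, and placing `chat_template_kwargs` at the top level of the JSON body is an assumption based on how the SDK's `extra_body` merges extra fields into the request.

```python
import json

MODEL = "Qwen/Qwen3.6-35B-A3B-FP8"  # model ID as used in the Quickstart


def build_chat_body(prompt: str, thinking: bool = True) -> dict:
    """Build a chat completion request body (hypothetical helper)."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        # Assumption: the field is read from the top level of the JSON body.
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return body


if __name__ == "__main__":
    print(json.dumps(build_chat_body("Summarize this ticket.", thinking=False), indent=2))
```

Leaving the field out entirely keeps the default behavior, so the helper only adds it when thinking is switched off.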

    Capabilities: MoE (35B Total / 3B Active) · 256K context · JSON · Tools · Thinking

    Use Cases

    Built for high-volume async workloads

    Async Agents

    Hits the sweet spot of price and intelligence for high-volume agentic pipelines, with thinking-mode reasoning enabled by default.

    Step-by-Step Reasoning

    Native chain-of-thought reasoning produces stable, well-structured answers tuned via real-world community feedback.

    Long-Context Processing

    256K context window supports full-repository analysis, long legal documents, and multi-document synthesis pipelines.

    Coding & Tool Use

    Outperforms GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5 on Qwen's published benchmarks for coding and tool-augmented tasks.
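
For the long-context use case, a rough pre-flight check can flag inputs that are unlikely to fit before a request is sent. This is only a sketch: the 4-characters-per-token ratio is a crude heuristic for English text, the exact token limit behind "256K" is assumed to be 256,000, and the function names are hypothetical. A real tokenizer gives more reliable counts.

```python
CONTEXT_WINDOW_TOKENS = 256_000  # "256K" from this page; exact limit is an assumption
CHARS_PER_TOKEN = 4              # rough heuristic for English text, not a measured value


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_output_tokens: int = 8_000) -> bool:
    """Check whether the documents plus an output budget fit the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    docs = ["contract text " * 1_000, "appendix " * 500]
    print(fits_in_context(docs))
```

Reserving part of the window for output matters more with thinking mode on, since the reasoning is generated before the final answer.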

    Benchmarks

    Frontier Intelligence at a Fraction of the Price

    Artificial Analysis Intelligence Index v4.0 scores for the 35B-A3B class.

    Intelligence Index: 43 (better than 84% of models)
    GPQA Diamond: 84 (better than 88% of models)
    τ²-Bench Telecom: 95 (better than 94% of models)

    Category    Benchmark                           Score
    Reasoning   GPQA Diamond                        84%
    Reasoning   Humanity's Last Exam                20%
    Reasoning   τ²-Bench Telecom                    95%
    Reasoning   AA-LCR                              64%
    Reasoning   IFBench                             64%
    Reasoning   GDPval-AA                           43.5%
    Coding      SciCode                             36%
    Coding      Terminal-Bench Hard                 35%
    Knowledge   AA-Omniscience Accuracy             19%
    Knowledge   AA-Omniscience Non-Hallucination    50%

    Metrics sourced from Artificial Analysis. Evaluated in reasoning (thinking) mode.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier        Input / 1M tokens    Output / 1M tokens
    Standard    $0.05                $0.20
    Async       $0.07                $0.30
    Realtime    $0.25                $2.00

    Context window natively supported up to 256K tokens.
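
To compare tiers for a given workload, the per-1M-token prices above can be turned into a simple estimate. This is an illustrative sketch: the tier keys and the `estimate_cost` helper are hypothetical names, and the figures are copied from the table on this page.

```python
from decimal import Decimal

# (input price, output price) in USD per 1M tokens, copied from the pricing table
PRICES_PER_MILLION = {
    "standard": (Decimal("0.05"), Decimal("0.20")),
    "async": (Decimal("0.07"), Decimal("0.30")),
    "realtime": (Decimal("0.25"), Decimal("2.00")),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a workload on the given tier."""
    input_price, output_price = PRICES_PER_MILLION[tier]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # Example workload: 10M input tokens and 2M output tokens
    for tier in PRICES_PER_MILLION:
        print(f"{tier}: ${estimate_cost(tier, 10_000_000, 2_000_000):.2f}")
```

`Decimal` is used instead of floats so that small per-token prices do not pick up rounding noise when summed across large volumes.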

    Quickstart

    Start Building in Minutes

    Qwen3.6-35B-A3B is accessible via OpenAI-compatible endpoints. Here's how to call it with the OpenAI Python SDK.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Standard chat completion (thinking enabled by default)
    response = client.chat.completions.create(
        model="Qwen/Qwen3.6-35B-A3B-FP8",
        messages=[
            {"role": "user", "content": "Explain the MoE architecture in 3 bullets."}
        ],
        # To disable step-by-step reasoning:
        # extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    
    print(response.choices[0].message.content)
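
For high-volume async workloads, requests are often prepared in bulk rather than sent one at a time. The sketch below serializes prompts into JSONL lines in the style of the OpenAI Batch API input format. Whether this endpoint accepts that format is an assumption this page does not confirm, and `to_batch_lines` and the `request-N` IDs are hypothetical.

```python
import json

MODEL = "Qwen/Qwen3.6-35B-A3B-FP8"  # model ID as used in the Quickstart


def to_batch_lines(prompts: list[str]) -> list[str]:
    """Serialize prompts as one JSON request per line (OpenAI Batch style)."""
    lines = []
    for index, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{index}",  # used to match results to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": MODEL,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines


if __name__ == "__main__":
    batch = to_batch_lines(["Classify ticket A.", "Classify ticket B."])
    print("\n".join(batch))
```

Each line is an independent request, so a failed item can be retried by its `custom_id` without resubmitting the whole batch.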

    💡 Pro Tip

    Thinking mode is on by default. For latency-sensitive workloads or simple tasks where chain-of-thought is unnecessary, pass "chat_template_kwargs": {"enable_thinking": false} in the request body (extra_body in the Python SDK). Note: reasoning_effort is not supported on this model.
