MoE · 3B Active

Thinking Mode

Open Weights

Alibaba

Apache 2.0

Qwen3.6-35B-A3B

A refresh of Qwen3.5-35B-A3B shaped by community feedback: same MoE architecture, tuned for stability and real-world utility. In Qwen's published benchmarks it outperforms GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5.


Total Parameters

35B

MoE · 3B Active

Context Window

256K

Tokens

Intelligence

AA Index v4.0

License

Apache 2.0

Open Weights

About

Stability-Focused, Real-World Tuned

Qwen3.6-35B-A3B is an updated version of Qwen3.5-35B-A3B that prioritizes stability and real-world utility in response to community feedback. It is a mid-sized MoE model (35B total parameters, 3B active) positioned as a strong price/performance option for async workloads.

In Qwen's published benchmarks, this model outperformed GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5. Thinking mode is enabled by default, so the model reasons step by step before responding. To disable it, pass {"chat_template_kwargs": {"enable_thinking": false}} in the request body.

Mixture-of-Experts: 35B Total / 3B Active

MoE · 3B/A · 256K · JSON · Tools · Think

Use Cases

Built for high-volume async workloads

Async Agents

Hits the sweet spot of price and intelligence for high-volume agentic pipelines, with thinking-mode reasoning enabled by default.

Step-by-Step Reasoning

Native chain-of-thought reasoning, refined with real-world community feedback, produces stable, well-structured answers.

Long-Context Processing

256K context window supports full-repository analysis, long legal documents, and multi-document synthesis pipelines.

Coding & Tool Use

Outperforms GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5 on Qwen's published benchmarks for coding and tool-augmented tasks.
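As a rough illustration of the tool-augmented use case, the sketch below defines one tool in the OpenAI-style `tools` schema and dispatches a tool call locally. It assumes the OpenAI-compatible endpoint accepts the standard `tools` parameter; the `get_weather` tool, the `REGISTRY` mapping, and the `dispatch_tool_call` helper are hypothetical names invented for this example. Only the local dispatch step is shown, so the block runs without a network call.

```python
import json

# Hypothetical tool; a real one would call an external service.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}

# OpenAI-style tool schema, intended to be passed as `tools=TOOLS` to
# client.chat.completions.create (assumed to be accepted by this endpoint).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a short forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments: str) -> str:
    """Run the named tool with JSON-encoded arguments; return a JSON string."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments)
        result = REGISTRY[name](**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the tool signature.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

In a live loop with the OpenAI SDK, each entry in `response.choices[0].message.tool_calls` carries `function.name` and `function.arguments`, and the dispatcher's return value would be sent back as the `content` of a `"role": "tool"` message. Returning errors as JSON instead of raising lets the model see the failure and retry.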

Benchmarks

Frontier Intelligence at a Fraction of the Price

Artificial Analysis Intelligence Index v4.0 scores for the 35B-A3B class.

Intelligence Index

Better than 84% of models

GPQA Diamond

Better than 88% of models

τ²-Bench Telecom

Better than 94% of models

Category	Benchmark	Score	Description
Reasoning	GPQA Diamond	84%	Graduate-level scientific reasoning
Reasoning	Humanity's Last Exam	20%	Expert-level questions across academic subjects
Reasoning	τ²-Bench Telecom	95%	AI agents in dual-control scenarios
Reasoning	AA-LCR	64%	Long context reasoning evaluation
Reasoning	IFBench	64%	Instruction-following accuracy
Reasoning	GDPval-AA	43.5%	Agentic performance on real-world work tasks
Coding	SciCode	36%	Python for scientific computing
Coding	Terminal-Bench Hard	35%	Agentic coding & terminal use
Knowledge	AA-Omniscience Accuracy	19%	Proportion of correctly answered questions
Knowledge	AA-Omniscience Non-Hallucination	50%	Confidently answered questions that are correct

Metrics sourced from Artificial Analysis. Evaluated in reasoning (thinking) mode.

Pricing

Flexible Pricing Tiers

Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

Tier	Input / 1M tokens	Output / 1M tokens
Standard	$0.05	$0.20
Async	$0.07	$0.30
Realtime	$0.25	$2.00
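
To make the per-1M-token prices concrete, here is a minimal cost estimator using the three tiers from the table above. The `PRICES` mapping and `estimate_cost` helper are hypothetical names for this sketch, not part of any SDK. `Decimal` is used so that small dollar amounts do not pick up floating-point noise.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "standard": (Decimal("0.05"), Decimal("0.20")),
    "async": (Decimal("0.07"), Decimal("0.30")),
    "realtime": (Decimal("0.25"), Decimal("2.00")),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost for the given token counts."""
    input_price, output_price = PRICES[tier]
    million = Decimal(1_000_000)
    return (input_tokens * input_price + output_tokens * output_price) / million

# 2M input tokens and 500K output tokens on the Async tier:
# 2 * $0.07 + 0.5 * $0.30 = $0.29
print(estimate_cost("async", 2_000_000, 500_000))
```

If reasoning tokens produced in thinking mode are billed as output tokens (an assumption this page does not confirm), they should be included in `output_tokens`.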

Context window natively supported up to 256K tokens.
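
A quick pre-flight check can help decide whether a set of documents is likely to fit in the 256K window before a request is sent. The sketch below rests on two assumptions that do not come from this page: "256K" is read as 256 × 1024 tokens, and about 4 characters per token is used as a rough heuristic for English text. The helper names are hypothetical; the model's own tokenizer would give exact counts.

```python
# Assumptions (not from this page): 256K = 256 * 1024 tokens, and roughly
# 4 characters per token for English text.
CONTEXT_WINDOW = 256 * 1024
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Rough token estimate; use the model's tokenizer for exact counts."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def fits_in_context(documents: list[str], reserved_for_output: int = 8_192) -> bool:
    """Check whether the documents plus an output budget fit in the window."""
    prompt_tokens = sum(estimate_tokens(doc) for doc in documents)
    return prompt_tokens + reserved_for_output <= CONTEXT_WINDOW
```

Reserving an output budget matters more with thinking mode on, since the reasoning trace also consumes tokens from the same window.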

Quickstart

Start Building in Minutes

Qwen3.6-35B-A3B is accessible via OpenAI-compatible endpoints. The example below uses the OpenAI Python SDK.

Python

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1"
)

# Standard chat completion (thinking enabled by default)
response = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B-FP8",
    messages=[
        {"role": "user", "content": "Explain the MoE architecture in 3 bullets."}
    ],
    # To disable step-by-step reasoning:
    # extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

print(response.choices[0].message.content)

💡 Pro Tip

Thinking mode is on by default. For latency-sensitive workloads or simple tasks that do not need chain-of-thought, disable it by sending "chat_template_kwargs": {"enable_thinking": false} in the request body (via extra_body in the Python SDK). Note: reasoning_effort is not supported on this model.
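
One way to make the thinking toggle explicit per request is a small wrapper that builds the call arguments. `build_request` is a hypothetical helper written for this sketch; the model name and the `chat_template_kwargs` payload are taken from the quickstart above.

```python
def build_request(prompt: str, thinking: bool = True) -> dict:
    """Build keyword arguments for client.chat.completions.create."""
    kwargs = {
        "model": "Qwen/Qwen3.6-35B-A3B-FP8",
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        # extra_body forwards non-standard fields in the request body.
        kwargs["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
    return kwargs

# Usage: client.chat.completions.create(**build_request("Ping", thinking=False))
```

Leaving `extra_body` out entirely when thinking is wanted keeps the request identical to the default quickstart call.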
