Qwen3.6-35B-A3B
A refresh of Qwen3.5-35B-A3B shaped by community feedback: same MoE architecture, with a focus on stability and real-world utility. Outperforms GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5 in Qwen's published benchmarks.
Total Parameters: 35B (MoE · 3B Active)
Context Window: 256K Tokens
Intelligence: 43 (AA Index v4.0)
License: Apache 2.0 (Open Weights)
Stability-Focused, Real-World Tuned
Qwen3.6-35B-A3B is an updated version of the Qwen3.5-35B-A3B model that prioritizes stability and real-world utility in response to community feedback. It is a mid-sized MoE (35B total parameters, 3B active) that hits a compelling price/performance point for async workloads.
In Qwen's published benchmarks, this model outperformed GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5. Thinking mode is enabled by default, so the model reasons step by step before responding. To disable it, pass {"chat_template_kwargs": {"enable_thinking": false}} in the request body (via extra_body in the OpenAI Python SDK).
Mixture-of-Experts — 35B Total / 3B Active
Built for high-volume async workloads
Async Agents
Hits the sweet spot of price and intelligence for high-volume agentic pipelines, with thinking-mode reasoning enabled by default.
Step-by-Step Reasoning
Native chain-of-thought reasoning, tuned on real-world community feedback, produces stable, well-structured answers.
Long-Context Processing
256K context window supports full-repository analysis, long legal documents, and multi-document synthesis pipelines.
Coding & Tool Use
Outperforms GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5 on Qwen's published benchmarks for coding and tool-augmented tasks.
Frontier Intelligence at a Fraction of the Price
Artificial Analysis Intelligence Index v4.0 scores for the 35B-A3B class.
Intelligence Index: better than 84% of models
GPQA Diamond: better than 88% of models
τ²-Bench Telecom: better than 94% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 84% |
| Reasoning | Humanity's Last Exam | 20% |
| Reasoning | τ²-Bench Telecom | 95% |
| Reasoning | AA-LCR | 64% |
| Reasoning | IFBench | 64% |
| Reasoning | GDPval-AA | 43.5% |
| Coding | SciCode | 36% |
| Coding | Terminal-Bench Hard | 35% |
| Knowledge | AA-Omniscience Accuracy | 19% |
| Knowledge | AA-Omniscience Non-Hallucination | 50% |
Metrics sourced from Artificial Analysis. Evaluated in reasoning (thinking) mode.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.05 | $0.20 |
| Async | $0.07 | $0.30 |
| Realtime | $0.25 | $2.00 |
Context window natively supported up to 256K tokens.
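As a rough illustration of how the per-1M-token prices translate into a bill, the sketch below estimates the cost of a job from its token counts. The prices are copied from the table above; the function name `estimate_cost` and the lowercase tier keys are hypothetical, and the sketch assumes billing is a simple linear function of input and output tokens, which this page does not confirm.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES_PER_MILLION = {
    "standard": (Decimal("0.05"), Decimal("0.20")),
    "async": (Decimal("0.07"), Decimal("0.30")),
    "realtime": (Decimal("0.25"), Decimal("2.00")),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a job (hypothetical helper, linear billing assumed)."""
    input_price, output_price = PRICES_PER_MILLION[tier]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


# Example: 2M input tokens and 500K output tokens on the Async tier.
print(estimate_cost("async", 2_000_000, 500_000))
```

Decimal is used instead of float so that small per-token prices do not pick up rounding noise when summed over large batches.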
Start Building in Minutes
Qwen3.6-35B-A3B is accessible via OpenAI-compatible endpoints. Here's how to integrate it using the standard OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Standard chat completion (thinking enabled by default)
response = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B-FP8",
    messages=[
        {"role": "user", "content": "Explain the MoE architecture in 3 bullets."}
    ],
    # To disable step-by-step reasoning:
    # extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)

💡 Pro Tip
Thinking mode is on by default. For latency-sensitive workloads or simple tasks where chain-of-thought is unnecessary, set "chat_template_kwargs": {"enable_thinking": false}. Note: reasoning_effort is not supported on this model.
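One way to make the thinking toggle explicit per request is to build the call arguments in a small helper. This is a minimal sketch, not part of any SDK: `build_chat_request` is a hypothetical name, and it only assembles the same arguments shown in the snippet above, using the `extra_body` pass-through of the OpenAI Python SDK.

```python
def build_chat_request(prompt: str, thinking: bool = True,
                       model: str = "Qwen/Qwen3.6-35B-A3B-FP8") -> dict:
    """Assemble keyword arguments for client.chat.completions.create (hypothetical helper)."""
    request = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        # Thinking is on by default, so the override is only sent when disabling it.
        request["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
    return request


# Latency-sensitive request with step-by-step reasoning turned off.
fast_request = build_chat_request("Classify this ticket as bug or feature.", thinking=False)
print(fast_request["extra_body"])
```

The resulting dictionary can be unpacked into the client call, for example `client.chat.completions.create(**fast_request)`.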
