Qwen3.5 35B A3B
The hyper-efficient 3B-active multimodal model for rapid reasoning and agentic workflows.
- Total Parameters: 35B (3B activated)
- Context Window: 262K tokens (extensible to 1M)
- Modalities: Text, Image & Video
- Architecture: 256 Experts (8 Routed + 1 Shared)
High-Speed Multimodal Intelligence
Qwen3.5 35B A3B is an advanced causal language model featuring a native vision encoder. Built on a highly optimized Mixture-of-Experts (MoE) framework, it contains 35 billion total parameters but activates just 3 billion per token during inference. Leveraging a hybrid Gated DeltaNet and Gated Attention architecture, it delivers exceptional speed and cost efficiency without sacrificing intelligence. It natively supports a 262K context window—extensible up to 1 million tokens—making it perfect for high-throughput, long-context applications.
Mixture of Experts — 3B active / 35B total
Built for speed and efficiency
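To illustrate why only 3B of the 35B parameters run per token, here is a minimal sketch of top-k expert routing: a router scores all 256 experts, activates the 8 highest-scoring ones plus the 1 always-on shared expert, and renormalizes their gate weights. This is a toy illustration of the MoE pattern described above, not Qwen3.5's actual routing code.

```python
import math

NUM_EXPERTS = 256  # matches the spec above
TOP_K = 8          # routed experts per token; 1 shared expert is always on

def route(logits):
    """Softmax the router logits and keep the top-k experts."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    topk = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    # Renormalize gate weights over the selected experts only.
    norm = sum(probs[i] for i in topk)
    return {i: probs[i] / norm for i in topk}

# Arbitrary toy router scores for one token.
logits = [(i * 37 % 101) / 10.0 for i in range(NUM_EXPERTS)]
gates = route(logits)
active = len(gates) + 1  # routed experts + 1 shared expert
print(f"{active} of {NUM_EXPERTS + 1} experts active per token")
```

Because each token touches only these few experts, inference cost scales with the 3B active parameters rather than the 35B total.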
Native Multimodal Workflows
Process text, high-resolution images, and videos seamlessly with the integrated vision encoder.
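A minimal sketch of sending an image alongside text, assuming the endpoint accepts the standard OpenAI chat-completions multimodal message format (the `build_vision_message` helper and the example URL are illustrative, not part of any official SDK):

```python
# Build an OpenAI-style multimodal user message: one text part plus one
# image_url part. Assumes the provider supports this standard format.
def build_vision_message(prompt: str, image_url: str) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_vision_message(
    "Describe this chart.",
    "https://example.com/chart.png",
)

# The message can then be passed to the usual chat-completions call, e.g.:
# client.chat.completions.create(model="...", messages=[message])
```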
High-Speed Agentic Workflows
Incredibly fast inference for conversational AI agents, instruction following, and dual-control scenarios.
Cost-Effective Coding
Strong baseline performance in agentic coding and terminal usage at a fraction of the compute cost of massive models.
Long-Context Data Processing
Analyze massive documents natively with the 262K context window, easily extensible up to 1,000,000 tokens for comprehensive data extraction.
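Before sending a large document, it can help to estimate whether it fits the native window. The sketch below uses a rough 4-characters-per-token heuristic (an assumption; the model's real tokenizer will differ) and takes 262K to mean 262,144 tokens:

```python
NATIVE_CONTEXT = 262_144  # native window; extensible to 1_000_000 tokens
CHARS_PER_TOKEN = 4       # rough heuristic, not the model's actual tokenizer

def fits_native_window(text: str, reserved_output: int = 4_096) -> bool:
    """Estimate whether a document plus reserved output fits in the native window."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reserved_output <= NATIVE_CONTEXT

doc = "x" * 800_000  # roughly 200K estimated tokens
print(fits_native_window(doc))
```

Documents that exceed the estimate can either be chunked or run against a deployment configured for the extended 1M-token window.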
Efficient Intelligence
Proven performance across reasoning, coding, and agentic workflows for a 3B-active model.
- Overall Intelligence: better than 82% of models
- Coding Capability: better than 79% of models
- Agentic Capability: better than 82% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 84.5% |
| Reasoning | τ²-Bench Telecom | 89.2% |
| Reasoning | IFBench | 72.5% |
| Reasoning | AA-LCR | 62.7% |
| Reasoning | GDPval-AA | 21.4% |
| Reasoning | HLE | 19.7% |
| Reasoning | CritPt | 0.9% |
| Coding | SciCode | 37.7% |
| Coding | Terminal-Bench Hard | 26.5% |
| Knowledge | AA-Omniscience | 20.4% |
Metrics sourced from Artificial Analysis.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.05 | $0.20 |
| Async | $0.07 | $0.30 |
| Realtime | $0.25 | $2.00 |
Context window natively supported up to 262K tokens (extensible to 1M).
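The per-tier prices translate directly into a simple cost estimate. A minimal sketch using the table above (the `estimate_cost` helper is illustrative, not a provided billing API):

```python
# (input_price, output_price) in USD per 1M tokens, from the tier table above.
PRICING = {
    "standard": (0.05, 0.20),
    "async":    (0.07, 0.30),
    "realtime": (0.25, 2.00),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a request under a given pricing tier."""
    in_price, out_price = PRICING[tier]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# 2M input tokens and 500K output tokens on the Standard tier.
cost = estimate_cost("standard", 2_000_000, 500_000)
print(f"${cost:.2f}")  # → $0.20
```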
Start Building in Minutes
Qwen3.5 35B A3B is accessible via OpenAI-compatible endpoints. The example below submits an asynchronous batch job using the standard OpenAI Python SDK pointed at Doubleword.ai.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Step 1: Upload a batch input file.
# Each line of batch_requests.jsonl is a JSON request object that
# specifies the model and messages for one chat completion.
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch",
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job referencing the uploaded file.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(f"Batch ID: {batch.id}")

# Step 3: Check the batch status (poll until it reaches "completed").
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
```
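The input file for Step 1 can be generated programmatically. A minimal sketch following the OpenAI Batch API line format, where each line carries a `custom_id`, the target endpoint, and the request body; the model identifier `"qwen3.5-35b-a3b"` is a hypothetical placeholder (check your provider for the exact name):

```python
import json

# Build batch_requests.jsonl: one JSON request object per line.
# "qwen3.5-35b-a3b" is a hypothetical model id, not a confirmed name.
prompts = ["Summarize MoE routing.", "Explain context windows."]

with open("batch_requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        line = {
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "qwen3.5-35b-a3b",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(line) + "\n")

# Sanity-check the file before uploading it.
with open("batch_requests.jsonl") as f:
    lines = [json.loads(l) for l in f]
print(f"Wrote {len(lines)} requests")
```

Once the batch completes, results are retrieved from the batch's `output_file_id`, keyed back to each request by its `custom_id`.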