Doubleword
    MoE Architecture
    Vision Encoder
    Open Weights

    Qwen3.5 35B A3B

    The hyper-efficient 3B-active multimodal model for rapid reasoning and agentic workflows.

    Total Parameters

    35B

    3B Activated

    Context Window

    262K

    Extensible to 1M

    Modalities

    Text, Image & Video

    Architecture

    256 Experts

    8 Routed + 1 Shared

    About

    High-Speed Multimodal Intelligence

    Qwen3.5 35B A3B is an advanced causal language model featuring a native vision encoder. Built on a highly optimized Mixture-of-Experts (MoE) framework, it contains 35 billion total parameters but activates just 3 billion per token during inference. Leveraging a hybrid Gated DeltaNet and Gated Attention architecture, it delivers exceptional speed and cost efficiency without sacrificing intelligence. It natively supports a 262K context window—extensible up to 1 million tokens—making it perfect for high-throughput, long-context applications.
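The routing described above can be sketched in a few lines: score all experts, keep the top-k, and renormalize their gate weights. This is a minimal illustration using the figures from the spec card (256 experts, 8 routed per token); the function name and dimensions are hypothetical, not the model's actual implementation.

```python
import math
import random

def moe_route(scores, num_routed=8):
    """Sketch of top-k MoE routing: pick the highest-scoring experts
    and softmax over just those (a shared expert would always be on)."""
    topk = sorted(range(len(scores)), key=scores.__getitem__)[-num_routed:]
    m = max(scores[i] for i in topk)                 # for numerical stability
    exps = [math.exp(scores[i] - m) for i in topk]
    total = sum(exps)
    return topk, [e / total for e in exps]           # expert ids, gate weights

random.seed(0)
router_scores = [random.gauss(0, 1) for _ in range(256)]  # 256 experts, per the card
experts, gates = moe_route(router_scores)
print(len(experts), round(sum(gates), 6))
```

Because only the 8 routed experts (plus shared components) run for each token, roughly 3B of the 35B parameters are touched per forward pass, which is where the speed and cost advantage comes from.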


    Mixture of Experts — 3B active / 35B total

    Use Cases

    Built for speed and efficiency

    Native Multimodal Workflows

    Process text, high-resolution images, and videos seamlessly with the integrated vision encoder.
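For a sense of what a multimodal call might look like, here is a sketch of a request body in the OpenAI-compatible "content parts" format, which mixes text and image inputs in one message. The model identifier and image URL are illustrative placeholders, not confirmed values.

```python
# Hypothetical multimodal request body; "qwen3.5-35b-a3b" and the URL
# are placeholders for illustration only.
request = {
    "model": "qwen3.5-35b-a3b",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
}

# With a configured OpenAI-compatible client this would be sent as:
# client.chat.completions.create(**request)
```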

    High-Speed Agentic Workflows

    Incredibly fast inference for conversational AI agents, instruction following, and dual-control scenarios.

    Cost-Effective Coding

    Strong baseline performance in agentic coding and terminal usage at a fraction of the compute cost of massive models.

    Long-Context Data Processing

    Analyze massive documents natively with the 262K context window, easily extensible up to 1,000,000 tokens for comprehensive data extraction.

    Benchmarks

    Efficient Intelligence

    Proven performance across reasoning, coding, and agentic workflows for a 3B-active model.

    37.1

    Overall Intelligence

    Better than 82% of models

    30.3

    Coding Capability

    Better than 79% of models

    44.1

    Agentic Capability

    Better than 82% of models

    Category    Benchmark              Score
    Reasoning   GPQA Diamond           84.5%
    Reasoning   τ²-Bench Telecom       89.2%
    Reasoning   IFBench                72.5%
    Reasoning   AA-LCR                 62.7%
    Reasoning   GDPval-AA              21.4%
    Reasoning   HLE                    19.7%
    Reasoning   CritPt                 0.9%
    Coding      SciCode                37.7%
    Coding      Terminal-Bench Hard    26.5%
    Knowledge   AA-Omniscience         20.4%

    Metrics sourced from Artificial Analysis.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier        Input / 1M tokens    Output / 1M tokens
    Standard    $0.05                $0.20
    Async       $0.07                $0.30
    Realtime    $0.25                $2.00

    A 262K-token context window is natively supported (extensible to 1M).
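Since prices are quoted per 1M tokens, estimating the cost of a workload is simple arithmetic. The helper below applies the rates from the table above; the tier names and example token counts are just for illustration.

```python
# Per-1M-token rates (input, output) in USD, from the pricing table above.
RATES = {
    "standard": (0.05, 0.20),
    "async":    (0.07, 0.30),
    "realtime": (0.25, 2.00),
}

def cost(tier, input_tokens, output_tokens):
    """Total USD cost for one request at the given tier's per-1M-token rates."""
    rate_in, rate_out = RATES[tier]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# e.g. summarizing a 200K-token document into 2K tokens on the Standard tier:
print(f"${cost('standard', 200_000, 2_000):.4f}")  # → $0.0104
```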

    Quickstart

    Start Building in Minutes

    Qwen3.5 35B A3B is accessible via OpenAI-compatible endpoints. The example below uses the standard OpenAI Python SDK to submit an asynchronous batch job through the Doubleword API.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")
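The quickstart uploads `batch_requests.jsonl` but doesn't show what goes inside it. Each line is one self-contained request in the OpenAI Batch API line format (`custom_id`, `method`, `url`, `body`); a sketch of generating such a file follows, with a placeholder model name and example prompts.

```python
import json

# Each line of batch_requests.jsonl is one request in the OpenAI Batch API
# line format. The model name below is a placeholder for illustration.
requests = [
    {
        "custom_id": f"request-{i}",          # used to match results to inputs
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen3.5-35b-a3b",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize MoE routing.",
                                "Explain KV caching."])
]

with open("batch_requests.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests) + "\n")
```

Once the batch completes, results arrive in a matching JSONL file keyed by `custom_id`, which is why every request needs a unique one.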

    Ready to deploy Qwen3.5 35B A3B?