Qwen3.5 4B
A compact 4B parameter reasoning model with 262K context — comparable to GPT-OSS-20B at a fraction of the cost.
| Total Parameters | Context Window | Intelligence | Provider |
|---|---|---|---|
| 4B | 262K (native) | 27 (AA Index v4.0) | Alibaba Cloud |
Big Reasoning in a Small Package
Qwen3.5-4B is a compact 4 billion parameter reasoning model from Alibaba Cloud's Qwen family, featuring a native 262K token context length. Despite its small size, it delivers remarkably strong performance on complex reasoning tasks — Qwen's benchmarks show it is comparable to the much larger GPT-OSS-20B model. This makes it an ideal choice for cost-sensitive workloads that still require robust intelligence.
💡 Developer Tip
At just $0.04/$0.06 per 1M tokens on Standard tier, Qwen3.5-4B is one of the most cost-efficient reasoning models available — perfect for high-volume batch workloads.
Compact Reasoning — 4B parameters
Built for efficient reasoning at scale
Compact Reasoning Powerhouse
Despite its small 4B parameter size, Qwen3.5-4B delivers reasoning performance comparable to GPT-OSS-20B on complex tasks — at a fraction of the cost.
Cost-Efficient Agent Workflows
Ideal for high-volume agentic pipelines where per-token cost matters. Run thousands of concurrent reasoning tasks without breaking the bank.
Edge & Lightweight Deployment
Small enough for resource-constrained environments while maintaining strong reasoning capabilities. Perfect for on-device or low-latency scenarios.
Long-Context Processing
Native 262K token context window enables processing of large documents, codebases, and multi-turn conversations without chunking overhead.
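As a rough sketch of chunk-free processing, you can estimate up front whether a document fits in the window. This assumes 262K means 262,144 tokens and uses the common ~4 characters-per-token heuristic for English text; actual tokenization varies, so treat it as a sanity check, not a guarantee:

```python
CONTEXT_WINDOW = 262_144  # assuming 262K = 262,144 tokens
CHARS_PER_TOKEN = 4       # rough heuristic; real tokenizer counts will differ

def fits_in_context(text: str, reserved_output: int = 8_192) -> bool:
    """Rough check: does this document fit without chunking,
    leaving headroom for the model's response?"""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reserved_output <= CONTEXT_WINDOW

# ~500K characters -> roughly 125K tokens -> fits comfortably
print(fits_in_context("word " * 100_000))
```

For anything borderline, count tokens with the provider's actual tokenizer before sending.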
Punches Above Its Weight
Comparable to GPT-OSS-20B on complex reasoning tasks, with 5x fewer parameters.
Overall Intelligence
Better than 71% of models
Coding Capability
Better than 63% of models
Agentic Capability
Better than 66% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 72.2% |
| Reasoning | τ²-Bench Telecom | 68.4% |
| Reasoning | IFBench | 62.3% |
| Reasoning | AA-LCR | 38.5% |
| Reasoning | GDPval-AA | 9.8% |
| Reasoning | HLE | 10.1% |
| Reasoning | CritPt | 0.8% |
| Coding | SciCode | 28.6% |
| Coding | Terminal-Bench Hard | 11.5% |
| Knowledge | AA-Omniscience | 14.2% |
Metrics sourced from Artificial Analysis.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.04 | $0.06 |
| Async | $0.05 | $0.08 |
Context window natively supported up to 262K tokens.
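Given these rates, a back-of-the-envelope cost estimate is easy to sketch in Python (Standard-tier prices taken from the table above):

```python
# Standard-tier prices per 1M tokens (from the pricing table above)
INPUT_PRICE = 0.04
OUTPUT_PRICE = 0.06

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate Standard-tier cost in USD."""
    return (input_tokens / 1_000_000) * INPUT_PRICE \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE

# Example: 10,000 requests of ~2K input / ~500 output tokens each
total = estimate_cost(10_000 * 2_000, 10_000 * 500)
print(f"${total:.2f}")  # $1.10 for the whole batch
```

At these prices, even a ten-thousand-request batch stays comfortably under a couple of dollars.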
Start Building in Minutes
Qwen3.5 4B is accessible via OpenAI-compatible endpoints. Here is how to submit a batch job with the standard OpenAI Python SDK via Doubleword.ai.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1"
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch"
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
```
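Each line of `batch_requests.jsonl` is one self-contained request. A minimal sketch of building a line follows; note that `qwen3.5-4b` is a placeholder model id here, so check your provider's model list for the exact identifier:

```python
import json

# NOTE: "qwen3.5-4b" is a placeholder model id; confirm the exact
# identifier in your provider's model list.
request_line = {
    "custom_id": "request-1",   # your own id, echoed back in the results
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "qwen3.5-4b",
        "messages": [{"role": "user", "content": "Summarize this document."}],
        "max_tokens": 512,
    },
}

with open("batch_requests.jsonl", "w") as f:
    f.write(json.dumps(request_line) + "\n")
```

Once the batch status reaches `completed`, the results file referenced by `batch_status.output_file_id` can be downloaded via the SDK's `client.files.content(...)` method and matched back to your requests by `custom_id`.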