Doubleword
    Compact Model
    Reasoning
    Open Weights
    New

    Qwen3.5 4B

    A compact 4B parameter reasoning model with 262K context — comparable to GPT-OSS-20B at a fraction of the cost.

    Total Parameters

    4B

    Context Window

    262K

    Native

    Intelligence

    27

    AA Index v4.0

    Provider

    Alibaba Cloud

    About

    Big Reasoning in a Small Package

    Qwen3.5-4B is a compact 4 billion parameter reasoning model from Alibaba Cloud's Qwen family, featuring a native 262K token context length. Despite its small size, it delivers remarkably strong performance on complex reasoning tasks — Qwen's benchmarks show it is comparable to the much larger GPT-OSS-20B model. This makes it an ideal choice for cost-sensitive workloads that still require robust intelligence.

    💡 Developer Tip

    At just $0.04/$0.06 per 1M tokens on Standard tier, Qwen3.5-4B is one of the most cost-efficient reasoning models available — perfect for high-volume batch workloads.
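    To put those rates in concrete terms, here is a small sketch estimating the cost of a run at Standard-tier prices (the prices come from the pricing table on this page; the token volumes are hypothetical):

```python
# Standard-tier prices per 1M tokens, from the pricing table on this page.
INPUT_PRICE = 0.04
OUTPUT_PRICE = 0.06

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a given token volume."""
    return (input_tokens / 1_000_000) * INPUT_PRICE + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE

# Example: a batch job with 500M input tokens and 100M output tokens
print(f"${estimate_cost(500_000_000, 100_000_000):.2f}")  # → $26.00
```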

    Compact Reasoning — 4B parameters

    Use Cases

    Built for efficient reasoning at scale

    Compact Reasoning Powerhouse

    Despite its small 4B parameter size, Qwen3.5-4B delivers reasoning performance comparable to GPT-OSS-20B on complex tasks — at a fraction of the cost.

    Cost-Efficient Agent Workflows

    Ideal for high-volume agentic pipelines where per-token cost matters. Run thousands of concurrent reasoning tasks without breaking the bank.

    Edge & Lightweight Deployment

    Small enough for resource-constrained environments while maintaining strong reasoning capabilities. Perfect for on-device or low-latency scenarios.

    Long-Context Processing

    Native 262K token context window enables processing of large documents, codebases, and multi-turn conversations without chunking overhead.
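    Before sending a large document, a rough fit check can be done with the common ~4 characters-per-token heuristic (an approximation only — actual tokenizer counts vary; 262,144 as the exact window size is also an assumption based on the advertised 262K figure):

```python
CONTEXT_WINDOW = 262_144  # assumed exact value of the advertised 262K window

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """Rough check using the ~4 characters-per-token heuristic."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

doc = "lorem ipsum " * 50_000   # ~600K characters, roughly 150K tokens
print(fits_in_context(doc))     # → True
```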

    Benchmarks

    Punches Above Its Weight

    Comparable to GPT-OSS-20B on complex reasoning tasks, with one-fifth the parameters.

    27

    Overall Intelligence

    Better than 71% of models

    20.1

    Coding Capability

    Better than 63% of models

    30.2

    Agentic Capability

    Better than 66% of models

    Category     Benchmark              Score
    Reasoning    GPQA Diamond           72.2%
    Reasoning    τ²-Bench Telecom       68.4%
    Reasoning    IFBench                62.3%
    Reasoning    AA-LCR                 38.5%
    Reasoning    GDPval-AA              9.8%
    Reasoning    HLE                    10.1%
    Reasoning    CritPt                 0.8%
    Coding       SciCode                28.6%
    Coding       Terminal-Bench Hard    11.5%
    Knowledge    AA-Omniscience         14.2%

    Metrics sourced from Artificial Analysis.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier        Input / 1M tokens    Output / 1M tokens
    Standard    $0.04                $0.06
    Async       $0.05                $0.08

    Context window natively supported up to 262K tokens.

    Quickstart

    Start Building in Minutes

    Qwen3.5 4B is accessible via OpenAI-compatible endpoints. Here is how to submit a batch job against the Doubleword.ai endpoint using the standard OpenAI Python SDK.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")
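    The workflow above assumes a batch_requests.jsonl file already exists. A minimal sketch of creating one follows; each line is a self-contained request in the OpenAI batch format, and the model id "qwen3.5-4b" is an assumption — check your Doubleword dashboard for the exact identifier.

```python
import json

# Each line of the JSONL file is one chat-completion request.
# custom_id lets you match results back to inputs when the batch finishes.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen3.5-4b",  # assumed model id; verify in your dashboard
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 512,
        },
    }
    for i, prompt in enumerate(
        ["What is 17 * 24?", "Summarize the CAP theorem in one sentence."]
    )
]

with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

    Once the file is written, it can be passed to client.files.create as shown in Step 1 above.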