Qwen3.5 9B
The highly capable 9B-parameter multimodal model optimized for efficient agentic workflows and native tool calling.
| Spec | Value |
|---|---|
| Total Parameters | 9B |
| Context Window | 262K (extensible to 1M+) |
| Modalities | Text, Image & Video |
| Function Calling (BFCL-V4) | 66.1% |
Efficient Multimodal Intelligence at the 9B Scale
Qwen3.5 9B is a powerful foundation model that utilizes a hybrid Gated DeltaNet and Gated Attention architecture for highly efficient inference with reduced latency. Featuring 9 billion parameters, it delivers robust multimodal understanding across text, images, and video. Built with a native 262,144-token context window and explicit "Thinking Mode" capabilities, Qwen3.5 9B is engineered for production-grade reliability in autonomous agent workflows, advanced OCR, and complex global applications.
Hybrid Attention — 9B parameters
Built for agentic intelligence
Unified Multimodal Reasoning
Process text, video, and high-resolution images together. Excels at visual question answering, OCR document processing, and spatial reasoning.
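Because the model is served over OpenAI-compatible endpoints, a multimodal request can be sketched by mixing text and image parts inside a single chat message. The model ID `qwen3.5-9b` and the image URL below are placeholder assumptions; check your provider's catalog for the exact identifier.

```python
import json

# Placeholder model ID -- verify against your provider's model list.
MODEL = "qwen3.5-9b"

# OpenAI-style chat message combining a text part with an image part.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this invoice total to?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/invoice.png"},
            },
        ],
    }
]

# The request body that would be POSTed to /v1/chat/completions.
request_body = {"model": MODEL, "messages": messages}
print(json.dumps(request_body, indent=2))
```

Video input typically follows the same content-part pattern, though the exact part type is deployment-specific.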
Native Tool Calling & Agents
Production-ready function calling for multi-step agent orchestration, autonomous task planning, and reliable code generation.
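Tool calling follows the standard OpenAI function-calling schema: you declare available tools in the request, and the model responds with a `tool_calls` entry when it decides to invoke one. A minimal sketch of the request body (the tool definition and model ID are illustrative assumptions):

```python
import json

# Hypothetical tool declared in the OpenAI function-calling schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body an agent loop would send to /v1/chat/completions.
request_body = {
    "model": "qwen3.5-9b",  # placeholder model ID
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
}
print(json.dumps(request_body, indent=2))
```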
Deep Reasoning ("Thinking Mode")
Built-in "Thinking Mode" generates explicit step-by-step reasoning traces before answering, dramatically increasing accuracy on complex tasks.
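Qwen3-series models emit the reasoning trace inside `<think>...</think>` tags ahead of the final answer; assuming Qwen3.5 keeps this format (verify against your deployment's raw output), the trace can be separated from the answer with a small helper:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> reasoning trace from the final answer.

    Assumes the Qwen3-style tag format; returns ("", answer) when no
    trace is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = "<think>2 * 21 = 42.</think>The answer is 42."
reasoning, answer = split_thinking(sample)
print(reasoning)  # 2 * 21 = 42.
print(answer)     # The answer is 42.
```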
Global & Long-Context Processing
Analyze massive documents natively with the 262K context window (extensible to 1M+) while offering nuanced support across 201 languages.
Capable & Efficient
Proven performance across reasoning, coding, and agentic workflows for the 9B weight class.
| Capability | Relative Standing |
|---|---|
| Overall Intelligence | Better than 75% of models |
| Coding Capability | Better than 71% of models |
| Agentic Capability | Better than 72% of models |
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 80.6% |
| Reasoning | τ²-Bench Telecom | 86.8% |
| Reasoning | IFBench | 66.7% |
| Reasoning | AA-LCR | 59.0% |
| Reasoning | GDPval-AA | 12.1% |
| Reasoning | HLE | 13.3% |
| Reasoning | CritPt | 0.3% |
| Coding | SciCode | 27.5% |
| Coding | Terminal-Bench Hard | 24.2% |
| Knowledge | AA-Omniscience | 15.9% |
Metrics sourced from Artificial Analysis.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.03 | $0.29 |
| Async | $0.04 | $0.35 |
| Realtime | $0.08 | $0.70 |
Context window natively supported up to 262K tokens (extensible to 1M+).
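Per-request cost follows directly from the table; a small budgeting sketch with the tier prices copied from above:

```python
# Prices per 1M tokens, copied from the pricing table above (USD).
TIERS = {
    "standard": {"input": 0.03, "output": 0.29},
    "async":    {"input": 0.04, "output": 0.35},
    "realtime": {"input": 0.08, "output": 0.70},
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single request at the given tier."""
    prices = TIERS[tier]
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

# e.g. a long-context job: 200K input tokens, 4K output tokens on Standard.
cost = estimate_cost("standard", 200_000, 4_000)
print(f"${cost:.4f}")  # $0.0072
```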
Start Building in Minutes
Qwen3.5 9B is accessible via OpenAI-compatible endpoints. The example below uses the official OpenAI Python SDK against Doubleword.ai's endpoint to submit and monitor an asynchronous batch job.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1"
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch"
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
```
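The batch job above expects a `batch_requests.jsonl` file in the OpenAI batch input format, one request object per line. A minimal sketch of generating it (the model ID `qwen3.5-9b` is a placeholder; use your provider's exact identifier):

```python
import json

prompts = [
    "Summarize the attached report.",
    "Translate 'hello' into French.",
]

with open("batch_requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",  # used to match results back later
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "qwen3.5-9b",  # placeholder model ID
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```

Once the batch completes, results are retrieved via the job's output file and matched to inputs by `custom_id`.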