Qwen3.5 9B
The highly capable 9B-parameter multimodal model optimized for efficient agentic workflows and native tool calling.
| Spec | Value |
|---|---|
| Total Parameters | 9B |
| Context Window | 262K (extensible to 1M+) |
| Modalities | Text, Image & Video |
| Function Calling (BFCL-V4) | 66.1% |
Efficient Multimodal Intelligence at the 9B Scale
Qwen3.5 9B is a powerful foundation model that utilizes a hybrid Gated DeltaNet and Gated Attention architecture for highly efficient inference with reduced latency. Featuring 9 billion parameters, it delivers robust multimodal understanding across text, images, and video. Built with a native 262,144-token context window and explicit "Thinking Mode" capabilities, Qwen3.5 9B is engineered for production-grade reliability in autonomous agent workflows, advanced OCR, and complex global applications.
Hybrid Attention — 9B parameters
Built for agentic intelligence
Unified Multimodal Reasoning
Process text, video, and high-resolution images together. Excels at visual question answering, OCR document processing, and spatial reasoning.
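Because the model is served over OpenAI-compatible endpoints, a multimodal request can be sketched by mixing text and image parts inside a single chat message. The model ID `qwen3.5-9b` and the image URL below are placeholder assumptions; check your provider's catalog for the exact identifier.

```python
import json

# Placeholder model ID -- verify against your provider's model list.
MODEL = "qwen3.5-9b"

# OpenAI-style chat message combining a text part with an image part.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this invoice total to?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/invoice.png"},
            },
        ],
    }
]

# The request body that would be POSTed to /v1/chat/completions.
request_body = {"model": MODEL, "messages": messages}
print(json.dumps(request_body, indent=2))
```

Video input typically follows the same content-part pattern, though the exact part type is deployment-specific.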
Native Tool Calling & Agents
Production-ready function calling for multi-step agent orchestration, autonomous task planning, and reliable code generation.
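Tool calling follows the standard OpenAI function-calling schema: you declare available tools in the request, and the model responds with a `tool_calls` entry when it decides to invoke one. A minimal sketch of the request body (the tool definition and model ID are illustrative assumptions):

```python
import json

# Hypothetical tool declared in the OpenAI function-calling schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body an agent loop would send to /v1/chat/completions.
request_body = {
    "model": "qwen3.5-9b",  # placeholder model ID
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
}
print(json.dumps(request_body, indent=2))
```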
Deep Reasoning ("Thinking Mode")
Built-in "Thinking Mode" generates explicit step-by-step reasoning traces before answering, dramatically increasing accuracy on complex tasks.
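Qwen3-series models emit the reasoning trace inside `<think>...</think>` tags ahead of the final answer; assuming Qwen3.5 keeps this format (verify against your deployment's raw output), the trace can be separated from the answer with a small helper:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> reasoning trace from the final answer.

    Assumes the Qwen3-style tag format; returns ("", answer) when no
    trace is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = "<think>2 * 21 = 42.</think>The answer is 42."
reasoning, answer = split_thinking(sample)
print(reasoning)  # 2 * 21 = 42.
print(answer)     # The answer is 42.
```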
Global & Long-Context Processing
Analyze massive documents natively with the 262K context window (extensible to 1M+) while offering nuanced support across 201 languages.
Capable & Efficient
Proven performance across reasoning, coding, and agentic workflows for the 9B weight class.
| Capability | Relative Standing |
|---|---|
| Overall Intelligence | Better than 75% of models |
| Coding Capability | Better than 71% of models |
| Agentic Capability | Better than 72% of models |
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 80.6% |
| Reasoning | τ²-Bench Telecom | 86.8% |
| Reasoning | IFBench | 66.7% |
| Reasoning | AA-LCR | 59.0% |
| Reasoning | GDPval-AA | 12.1% |
| Reasoning | HLE | 13.3% |
| Reasoning | CritPt | 0.3% |
| Coding | SciCode | 27.5% |
| Coding | Terminal-Bench Hard | 24.2% |
| Knowledge | AA-Omniscience | 15.9% |
Metrics sourced from Artificial Analysis.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.03 | $0.29 |
| Async | $0.04 | $0.35 |
| Realtime | $0.08 | $0.70 |
Context window natively supported up to 262K tokens (extensible to 1M+).
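Per-request cost follows directly from the table; a small budgeting sketch with the tier prices copied from above:

```python
# Prices per 1M tokens, copied from the pricing table above (USD).
TIERS = {
    "standard": {"input": 0.03, "output": 0.29},
    "async":    {"input": 0.04, "output": 0.35},
    "realtime": {"input": 0.08, "output": 0.70},
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single request at the given tier."""
    prices = TIERS[tier]
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

# e.g. a long-context job: 200K input tokens, 4K output tokens on Standard.
cost = estimate_cost("standard", 200_000, 4_000)
print(f"${cost:.4f}")  # $0.0072
```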
Start Building in Minutes
Qwen3.5 9B is accessible via OpenAI-compatible endpoints. The example below uses the official OpenAI Python SDK against Doubleword.ai's endpoint to submit and monitor an asynchronous batch job.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1"
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch"
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
```
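The batch job above expects a `batch_requests.jsonl` file in the OpenAI batch input format, one request object per line. A minimal sketch of generating it (the model ID `qwen3.5-9b` is a placeholder; use your provider's exact identifier):

```python
import json

prompts = [
    "Summarize the attached report.",
    "Translate 'hello' into French.",
]

with open("batch_requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",  # used to match results back later
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "qwen3.5-9b",  # placeholder model ID
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```

Once the batch completes, results are retrieved via the job's output file and matched to inputs by `custom_id`.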