Doubleword
    Hybrid Architecture
    Native Multimodal
    Open Weights

    Qwen3.5 9B

    The highly capable 9B-parameter multimodal model optimized for efficient agentic workflows and native tool calling.

    Total Parameters

    9B

    Context Window

    262K

    Extensible to 1M+

    Modalities

    Text, Image

    & Video

    Function Calling

    66.1%

    BFCL-V4

    About

    Efficient Multimodal Intelligence at the 9B Scale

    Qwen3.5 9B is a powerful foundation model that utilizes a hybrid Gated DeltaNet and Gated Attention architecture for highly efficient inference with reduced latency. Featuring 9 billion parameters, it delivers robust multimodal understanding across text, images, and video. Built with a native 262,144-token context window and explicit "Thinking Mode" capabilities, Qwen3.5 9B is engineered for production-grade reliability in autonomous agent workflows, advanced OCR, and complex global applications.

    N
    N
    N
    N
    N

    Hybrid Attention — 9B parameters

    Use Cases

    Built for agentic intelligence

    Unified Multimodal Reasoning

    Process text, video, and high-resolution images together. Excels at visual question answering, OCR document processing, and spatial reasoning.

    Native Tool Calling & Agents

    Production-ready function calling for multi-step agent orchestration, autonomous task planning, and reliable code generation.

    Deep Reasoning ("Thinking Mode")

    Built-in "Thinking Mode" generates explicit step-by-step reasoning traces before answering, dramatically increasing accuracy on complex tasks.

    Global & Long-Context Processing

    Analyze massive documents natively with the 262K context window (extensible to 1M+) while offering nuanced support across 201 languages.

    Benchmarks

    Capable & Efficient

    Proven performance across reasoning, coding, and agentic workflows for the 9B weight class.

    32.4

    Overall Intelligence

    Better than 75% of models

    25.3

    Coding Capability

    Better than 71% of models

    37.4

    Agentic Capability

    Better than 72% of models

    CategoryBenchmarkScore
    ReasoningGPQA Diamond80.6%
    Reasoningτ²-Bench Telecom86.8%
    ReasoningIFBench66.7%
    ReasoningAA-LCR59.0%
    ReasoningGDPval-AA12.1%
    ReasoningHLE13.3%
    ReasoningCritPt0.3%
    CodingSciCode27.5%
    CodingTerminal-Bench Hard24.2%
    KnowledgeAA-Omniscience15.9%

    Metrics sourced from Artificial Analysis.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    TierInput / 1M tokensOutput / 1M tokens
    Standard$0.03$0.29
    Async$0.04$0.35
    Realtime$0.08$0.70

    Context window natively supported up to 262k tokens (extensible to 1M+).

    Quickstart

    Start Building in Minutes

    Qwen3.5 9B is accessible via OpenAI-compatible endpoints. Here is how to integrate it using the standard Python SDK via Doubleword.ai.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")