Doubleword
    Compact Model
    Reasoning
    Open Weights
    New

    Qwen3.5 4B

    A compact 4B parameter reasoning model with 262K context — comparable to GPT-OSS-20B at a fraction of the cost.

    Total Parameters

    4B

    Context Window

    262K

    Native

    Intelligence

    27

    AA Index v4.0

    Provider

    Alibaba Cloud

    About

    Big Reasoning in a Small Package

    Qwen3.5-4B is a compact 4 billion parameter reasoning model from Alibaba Cloud's Qwen family, featuring a native 262K token context length. Despite its small size, it delivers remarkably strong performance on complex reasoning tasks — Qwen's benchmarks show it is comparable to the much larger GPT-OSS-20B model. This makes it an ideal choice for cost-sensitive workloads that still require robust intelligence.

    💡 Developer Tip

    At just $0.04/$0.06 per 1M tokens on Standard tier, Qwen3.5-4B is one of the most cost-efficient reasoning models available — perfect for high-volume batch workloads.
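    To put those rates in concrete terms, here is a small sketch estimating the cost of a run at Standard-tier prices (the prices come from the pricing table on this page; the token volumes are hypothetical):

```python
# Standard-tier prices per 1M tokens, from the pricing table on this page.
INPUT_PRICE = 0.04
OUTPUT_PRICE = 0.06

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a given token volume."""
    return (input_tokens / 1_000_000) * INPUT_PRICE + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE

# Example: a batch job with 500M input tokens and 100M output tokens
print(f"${estimate_cost(500_000_000, 100_000_000):.2f}")  # → $26.00
```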

    Compact Reasoning — 4B parameters

    Use Cases

    Built for efficient reasoning at scale

    Compact Reasoning Powerhouse

    Despite its small 4B parameter size, Qwen3.5-4B delivers reasoning performance comparable to GPT-OSS-20B on complex tasks — at a fraction of the cost.

    Cost-Efficient Agent Workflows

    Ideal for high-volume agentic pipelines where per-token cost matters. Run thousands of concurrent reasoning tasks without breaking the bank.

    Edge & Lightweight Deployment

    Small enough for resource-constrained environments while maintaining strong reasoning capabilities. Perfect for on-device or low-latency scenarios.

    Long-Context Processing

    Native 262K token context window enables processing of large documents, codebases, and multi-turn conversations without chunking overhead.
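    Before sending a large document, a rough fit check can be done with the common ~4 characters-per-token heuristic (an approximation only — actual tokenizer counts vary; 262,144 as the exact window size is also an assumption based on the advertised 262K figure):

```python
CONTEXT_WINDOW = 262_144  # assumed exact value of the advertised 262K window

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """Rough check using the ~4 characters-per-token heuristic."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

doc = "lorem ipsum " * 50_000   # ~600K characters, roughly 150K tokens
print(fits_in_context(doc))     # → True
```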

    Benchmarks

    Punches Above Its Weight

    Comparable to GPT-OSS-20B on complex reasoning tasks, with one-fifth the parameters.

    27

    Overall Intelligence

    Better than 71% of models

    20.1

    Coding Capability

    Better than 63% of models

    30.2

    Agentic Capability

    Better than 66% of models

    Category     Benchmark              Score
    Reasoning    GPQA Diamond           72.2%
    Reasoning    τ²-Bench Telecom       68.4%
    Reasoning    IFBench                62.3%
    Reasoning    AA-LCR                 38.5%
    Reasoning    GDPval-AA              9.8%
    Reasoning    HLE                    10.1%
    Reasoning    CritPt                 0.8%
    Coding       SciCode                28.6%
    Coding       Terminal-Bench Hard    11.5%
    Knowledge    AA-Omniscience         14.2%

    Metrics sourced from Artificial Analysis.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier        Input / 1M tokens    Output / 1M tokens
    Standard    $0.04                $0.06
    Async       $0.05                $0.08

    Context window natively supported up to 262K tokens.

    Quickstart

    Start Building in Minutes

    Qwen3.5 4B is accessible via OpenAI-compatible endpoints. Here is how to submit a batch job against the Doubleword.ai endpoint using the standard OpenAI Python SDK.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")
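    The workflow above assumes a batch_requests.jsonl file already exists. A minimal sketch of creating one follows; each line is a self-contained request in the OpenAI batch format, and the model id "qwen3.5-4b" is an assumption — check your Doubleword dashboard for the exact identifier.

```python
import json

# Each line of the JSONL file is one chat-completion request.
# custom_id lets you match results back to inputs when the batch finishes.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen3.5-4b",  # assumed model id; verify in your dashboard
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 512,
        },
    }
    for i, prompt in enumerate(
        ["What is 17 * 24?", "Summarize the CAP theorem in one sentence."]
    )
]

with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

    Once the file is written, it can be passed to client.files.create as shown in Step 1 above.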