Doubleword
    Vision-Language
    MoE Architecture
    Fine-Tune Ready

    Qwen3-VL-30B-A3B

    The highly capable mid-size multimodal model for production workloads, reasoning, and visual coding.

    Total Parameters

    30B

    3B Activated

    Context Window

    128K

    Tokens

    Modalities

    Text, Image

    & Video

    Performance Class

    GPT-4.1-mini

    / Sonnet 4

    About

    High-Performance Multimodal Efficiency

    Meet Qwen3-VL-30B, the highly capable mid-size model of the Qwen3-VL family. It unifies strong text generation with visual understanding for images and videos, delivering performance similar to GPT-4.1-mini and Claude Sonnet 4. Built on an efficient 48-layer Mixture-of-Experts architecture, it is suited for production workloads that are cost-constrained or require high token volumes. From advanced 2D/3D spatial grounding and document AI to converting UI sketches into debugged code, Qwen3-VL-30B offers robust reasoning capabilities without frontier model costs.


    Vision-Language MoE — 3B active / 30B total

    Use Cases

    Built for production multimodal workloads

    Production Multimodal AI

    Excels at general vision-language tasks including VQA, robust OCR, document AI, and long-form visual comprehension across real-world and synthetic categories.

    Agentic GUI Automation

    Handles multi-image, multi-turn instructions, aligns text to video timelines for precise temporal queries, and natively navigates GUIs for automation tasks.

    Visual Coding & STEM

    Transforms whiteboard sketches and mockups directly into functional code. Actively assists with UI debugging, scientific computing, and complex reasoning.

    Enterprise Customization

    Trained on 36 trillion tokens across 119 languages, the model's MoE design supports expert-specific adaptation, making it a strong foundation for supervised fine-tuning and domain-specific AI.

    Benchmarks

    Efficient Multimodal Intelligence

    Proven performance across reasoning, coding, and agentic workflows for the 30B weight class.

    16.1

    Overall Intelligence

    Better than 34% of models

    14.3

    Coding Capability

    Better than 43% of models

    9.5

    Agentic Capability

    Better than 29% of models

    Category  | Benchmark           | Score
    Reasoning | GPQA Diamond        | 69.5%
    Reasoning | τ²-Bench Telecom    | 19.0%
    Reasoning | IFBench             | 33.1%
    Reasoning | AA-LCR              | 23.7%
    Reasoning | GDPval-AA           | 1.3%
    Reasoning | HLE                 | 6.4%
    Reasoning | CritPt              | 0.0%
    Coding    | SciCode             | 30.8%
    Coding    | Terminal-Bench Hard | 6.1%
    Knowledge | AA-Omniscience      | 15.5%

    Metrics sourced from Artificial Analysis.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier     | Input / 1M tokens | Output / 1M tokens
    Standard | $0.05             | $0.20
    Async    | $0.07             | $0.30
    Realtime | $0.16             | $0.80

    Context windows of up to 128K tokens are natively supported.
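    Since prices are quoted per 1M tokens, estimating spend is simple arithmetic. A minimal sketch using the table above (workload sizes are illustrative):

```python
# Prices in USD per 1M tokens, taken from the pricing table above.
PRICES = {  # tier: (input price, output price)
    "standard": (0.05, 0.20),
    "async": (0.07, 0.30),
    "realtime": (0.16, 0.80),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a token volume on a given tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: 10M input tokens + 2M output tokens on the Standard tier.
cost = estimate_cost("standard", 10_000_000, 2_000_000)
print(f"${cost:.2f}")  # $0.90
```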

    Quickstart

    Start Building in Minutes

    Qwen3-VL-30B-A3B is accessible via OpenAI-compatible endpoints. The example below uses the standard OpenAI Python SDK to submit a batch job through Doubleword.ai.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")
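    The workflow above reads `batch_requests.jsonl`, where each line is one self-contained chat-completions request. A minimal sketch of building that file for a vision request (the model identifier and image URL are placeholder assumptions; confirm the exact model name in your Doubleword dashboard):

```python
import json

# One request per line. The "body" follows the OpenAI chat-completions
# schema; the model name and image URL below are illustrative placeholders.
requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "Qwen3-VL-30B-A3B",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image."},
                        {
                            "type": "image_url",
                            "image_url": {"url": "https://example.com/chart.png"},
                        },
                    ],
                }
            ],
        },
    }
]

with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

    Each `custom_id` lets you match results back to inputs when you download the completed batch's output file.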

    Ready to deploy Qwen3-VL-30B-A3B?