Doubleword
    Vision-Language
    MoE Architecture
    Fine-Tune Ready

    Qwen3-VL-30B-A3B

    The highly capable mid-size multimodal model for production workloads, reasoning, and visual coding.

    Total Parameters

    30B

    3B Activated

    Context Window

    128K

    Tokens

    Modalities

    Text, Image

    & Video

    Performance Class

    GPT-4.1-mini

    / Sonnet 4

    About

    High-Performance Multimodal Efficiency

    Meet Qwen3-VL-30B, the highly capable mid-size model of the Qwen3-VL family. It unifies strong text generation with visual understanding for images and videos, delivering performance similar to GPT-4.1-mini and Claude Sonnet 4. Built on an efficient 48-layer Mixture-of-Experts architecture, it is suited for production workloads that are cost-constrained or require high token volumes. From advanced 2D/3D spatial grounding and document AI to converting UI sketches into debugged code, Qwen3-VL-30B offers robust reasoning capabilities without frontier model costs.


    Vision-Language MoE — 3B active / 30B total

    Use Cases

    Built for production multimodal workloads

    Production Multimodal AI

    Excels at general vision-language tasks including VQA, robust OCR, document AI, and long-form visual comprehension across real-world and synthetic categories.

    Agentic GUI Automation

    Handles multi-image, multi-turn instructions, aligns text to video timelines for precise temporal queries, and natively navigates GUIs for automation tasks.

    Visual Coding & STEM

    Transforms whiteboard sketches and mockups directly into functional code. Actively assists with UI debugging, scientific computing, and complex reasoning.

    Enterprise Customization

    Trained on 36 trillion tokens across 119 languages, the model's MoE design supports expert-specific adaptation, making it a strong foundation for supervised fine-tuning and domain-specific AI.

    Benchmarks

    Efficient Multimodal Intelligence

    Proven performance across reasoning, coding, and agentic workflows for the 30B weight class.

    16.1

    Overall Intelligence

    Better than 34% of models

    14.3

    Coding Capability

    Better than 43% of models

    9.5

    Agentic Capability

    Better than 29% of models

    Category  | Benchmark           | Score
    Reasoning | GPQA Diamond        | 69.5%
    Reasoning | τ²-Bench Telecom    | 19.0%
    Reasoning | IFBench             | 33.1%
    Reasoning | AA-LCR              | 23.7%
    Reasoning | GDPval-AA           | 1.3%
    Reasoning | HLE                 | 6.4%
    Reasoning | CritPt              | 0.0%
    Coding    | SciCode             | 30.8%
    Coding    | Terminal-Bench Hard | 6.1%
    Knowledge | AA-Omniscience      | 15.5%

    Metrics sourced from Artificial Analysis.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier     | Input / 1M tokens | Output / 1M tokens
    Standard | $0.05             | $0.20
    Async    | $0.07             | $0.30
    Realtime | $0.16             | $0.80

    Context windows of up to 128K tokens are natively supported.
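    Since prices are quoted per 1M tokens, estimating spend is simple arithmetic. A minimal sketch using the table above (workload sizes are illustrative):

```python
# Prices in USD per 1M tokens, taken from the pricing table above.
PRICES = {  # tier: (input price, output price)
    "standard": (0.05, 0.20),
    "async": (0.07, 0.30),
    "realtime": (0.16, 0.80),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a token volume on a given tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: 10M input tokens + 2M output tokens on the Standard tier.
cost = estimate_cost("standard", 10_000_000, 2_000_000)
print(f"${cost:.2f}")  # $0.90
```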

    Quickstart

    Start Building in Minutes

    Qwen3-VL-30B-A3B is accessible via OpenAI-compatible endpoints. The example below uses the standard OpenAI Python SDK to submit a batch job through Doubleword.ai.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")
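    The workflow above reads `batch_requests.jsonl`, where each line is one self-contained chat-completions request. A minimal sketch of building that file for a vision request (the model identifier and image URL are placeholder assumptions; confirm the exact model name in your Doubleword dashboard):

```python
import json

# One request per line. The "body" follows the OpenAI chat-completions
# schema; the model name and image URL below are illustrative placeholders.
requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "Qwen3-VL-30B-A3B",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image."},
                        {
                            "type": "image_url",
                            "image_url": {"url": "https://example.com/chart.png"},
                        },
                    ],
                }
            ],
        },
    }
]

with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

    Each `custom_id` lets you match results back to inputs when you download the completed batch's output file.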

    Ready to deploy Qwen3-VL-30B-A3B?