Qwen3-14B-FP8
A highly efficient 14.8B parameter dense language model optimized for high-volume text tasks and dual-mode reasoning.
Total Parameters: 14.8B
Context Window: 131K tokens
Modalities: Text only
Max Output: 16,384 tokens
Efficient Text Generation with Dual-Mode Reasoning
Meet Qwen3-14B, a dense 14.8B parameter causal language model from the Qwen3 release. Designed for both complex reasoning and efficient dialogue, it supports seamless switching between a "thinking" mode for rigorous logic, math, and programming tasks, and a "non-thinking" mode for general-purpose conversation. Trained on 36 trillion multilingual tokens across 100+ languages, it serves as an excellent foundation for high-volume workloads like classification, extraction, and summarization where maximum frontier performance is not strictly required.
Dense 14.8B — Dual-Mode Reasoning
Built for efficient text intelligence
High-Volume Processing
Well suited for tasks that do not demand frontier-scale intelligence, such as document classification, data extraction, and large-scale summarization.
Dual-Mode Execution
Toggle between a specialized "thinking" mode for robust logical inference and creative coding, and a "non-thinking" mode for rapid, general-purpose conversation.
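In practice, the mode toggle is applied per request. A minimal sketch, assuming an OpenAI-compatible backend (such as vLLM or SGLang) that forwards Qwen3's `enable_thinking` chat-template flag via `extra_body` — verify the exact mechanism with your serving provider:

```python
def build_request(prompt: str, thinking: bool) -> dict:
    """Assemble keyword arguments for client.chat.completions.create().

    `enable_thinking` is Qwen3's chat-template switch; routing it through
    extra_body/chat_template_kwargs assumes a vLLM-style server.
    """
    return {
        "model": "Qwen3-14B-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }

# Rigorous reasoning: thinking mode on.
reasoning_call = build_request("Prove that 17 is prime.", thinking=True)
# Fast general-purpose chat: thinking mode off.
chat_call = build_request("Say hi in French.", thinking=False)
```

Qwen3 also honors `/think` and `/no_think` soft switches placed directly in the prompt, which can be handy when you cannot pass template arguments.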
Business Customization
A strong baseline for medium-scale enterprise custom model training, enabling efficient adaptation to domain-specific professional workloads.
Academic & Research
Provides a budget-friendly foundation for natural language processing research, educational AI, and fine-tuning experiments.
Capable & Cost-Effective
Proven baseline performance across reasoning, coding, and agentic workflows for the 14B weight class.
Overall Intelligence
Better than 34% of models
Coding Capability
Better than 35% of models
Agentic Capability
Better than 39% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 60.4% |
| Reasoning | τ²-Bench Telecom | 34.5% |
| Reasoning | IFBench | 40.5% |
| Reasoning | AA-LCR | 0.0% |
| Reasoning | GDPval-AA | 0.4% |
| Reasoning | HLE | 4.3% |
| Reasoning | CritPt | 0.0% |
| Coding | SciCode | 31.6% |
| Coding | Terminal-Bench Hard | 3.8% |
| Knowledge | AA-Omniscience | 14.9% |
Metrics sourced from Artificial Analysis. Hallucination Rate: 24.5%
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.02 | $0.20 |
| Async | $0.03 | $0.30 |
| Realtime | $0.05 | $0.60 |
Context window natively supported up to 131k tokens via YaRN-based scaling.
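Qwen3's native window is 32,768 tokens; the 131K figure relies on YaRN rope scaling, which self-hosters must enable explicitly. A hedged sketch using vLLM's flag (the factor-4.0 values follow the Qwen3 usage guide; other serving stacks expose this differently):

```shell
# Example only: serving Qwen3-14B-FP8 with YaRN scaling under vLLM.
# factor=4.0 extends the native 32,768-token window to ~131,072 tokens.
vllm serve Qwen/Qwen3-14B-FP8 \
  --max-model-len 131072 \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}'
```

Note that static YaRN scaling applies even to short prompts, so the Qwen team advises enabling it only when long-context processing is actually required.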
Start Building in Minutes
Qwen3-14B-FP8 is accessible via OpenAI-compatible endpoints. Here is how to integrate it using the standard Python SDK via Doubleword.ai.
Developer Tip: Recommended Sampling Parameters
For optimal performance and to reduce endless repetitions, the Qwen team recommends: Temperature=0.7, TopP=0.8, TopK=20, MinP=0, and a presence_penalty=1.5 (adjust up to 2.0 if language mixing or repetition persists).
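These recommendations can be packaged once and reused across calls. A minimal sketch — note that `top_k` and `min_p` are not part of the OpenAI schema, so they travel via `extra_body`, a common convention for OpenAI-compatible servers but not guaranteed by every host:

```python
# Qwen-recommended sampling defaults, ready to splat into
# client.chat.completions.create(model=..., messages=..., **qwen_sampling).
qwen_sampling = {
    "temperature": 0.7,
    "top_p": 0.8,
    "presence_penalty": 1.5,  # raise toward 2.0 if repetition or language mixing persists
    # top_k / min_p are outside the OpenAI schema; extra_body forwards them
    # to vLLM-style backends. Confirm support with your provider.
    "extra_body": {"top_k": 20, "min_p": 0},
}
```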
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch",
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
```
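The upload step expects `batch_requests.jsonl` to already exist. A sketch of building it, assuming the line schema mirrors the OpenAI Batch API (one JSON object per line with `custom_id`, `method`, `url`, and `body`) — verify the exact schema against the Doubleword.ai docs:

```python
import json

# Hypothetical inputs for a high-volume summarization batch.
documents = ["First document text...", "Second document text..."]

requests = [
    {
        "custom_id": f"doc-{i}",              # your key for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "Qwen3-14B-FP8",
            "messages": [{"role": "user", "content": f"Summarize: {text}"}],
            "max_tokens": 512,
        },
    }
    for i, text in enumerate(documents)
]

# One JSON object per line — the JSONL format the batch endpoint consumes.
with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

Results come back keyed by `custom_id`, so choose IDs you can join back to your source records.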