Qwen3-14B-FP8
A highly efficient 14.8B parameter dense language model optimized for high-volume text tasks and dual-mode reasoning.
Total Parameters: 14.8B
Context Window: 131K tokens
Modalities: Text only
Max Output: 16,384 tokens
Efficient Text Generation with Dual-Mode Reasoning
Meet Qwen3-14B, a dense 14.8B parameter causal language model from the Qwen3 release. Designed for both complex reasoning and efficient dialogue, it supports seamless switching between a "thinking" mode for rigorous logic, math, and programming tasks, and a "non-thinking" mode for general-purpose conversation. Trained on 36 trillion multilingual tokens across 100+ languages, it serves as an excellent foundation for high-volume workloads like classification, extraction, and summarization where maximum frontier performance is not strictly required.
Dense 14.8B — Dual-Mode Reasoning
Built for efficient text intelligence
High-Volume Processing
Well suited for tasks that do not demand frontier-scale intelligence, such as document classification, data extraction, and large-scale summarization.
Dual-Mode Execution
Toggle between a specialized "thinking" mode for robust logical inference and creative coding, and a "non-thinking" mode for rapid, general-purpose conversation.
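In practice, the mode toggle is applied per request. A minimal sketch, assuming an OpenAI-compatible backend (such as vLLM or SGLang) that forwards Qwen3's `enable_thinking` chat-template flag via `extra_body` — verify the exact mechanism with your serving provider:

```python
def build_request(prompt: str, thinking: bool) -> dict:
    """Assemble keyword arguments for client.chat.completions.create().

    `enable_thinking` is Qwen3's chat-template switch; routing it through
    extra_body/chat_template_kwargs assumes a vLLM-style server.
    """
    return {
        "model": "Qwen3-14B-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }

# Rigorous reasoning: thinking mode on.
reasoning_call = build_request("Prove that 17 is prime.", thinking=True)
# Fast general-purpose chat: thinking mode off.
chat_call = build_request("Say hi in French.", thinking=False)
```

Qwen3 also honors `/think` and `/no_think` soft switches placed directly in the prompt, which can be handy when you cannot pass template arguments.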
Business Customization
A strong baseline for medium-scale enterprise custom model training, enabling efficient adaptation to domain-specific professional workloads.
Academic & Research
Provides a budget-friendly foundation for natural language processing research, educational AI, and fine-tuning experiments.
Capable & Cost-Effective
Proven baseline performance across reasoning, coding, and agentic workflows for the 14B weight class.
Overall Intelligence
Better than 34% of models
Coding Capability
Better than 35% of models
Agentic Capability
Better than 39% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 60.4% |
| Reasoning | τ²-Bench Telecom | 34.5% |
| Reasoning | IFBench | 40.5% |
| Reasoning | AA-LCR | 0.0% |
| Reasoning | GDPval-AA | 0.4% |
| Reasoning | HLE | 4.3% |
| Reasoning | CritPt | 0.0% |
| Coding | SciCode | 31.6% |
| Coding | Terminal-Bench Hard | 3.8% |
| Knowledge | AA-Omniscience | 14.9% |
Metrics sourced from Artificial Analysis. Hallucination Rate: 24.5%
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.02 | $0.20 |
| Async | $0.03 | $0.30 |
| Realtime | $0.05 | $0.60 |
Context window natively supported up to 131k tokens via YaRN-based scaling.
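Qwen3's native window is 32,768 tokens; the 131K figure relies on YaRN rope scaling, which self-hosters must enable explicitly. A hedged sketch using vLLM's flag (the factor-4.0 values follow the Qwen3 usage guide; other serving stacks expose this differently):

```shell
# Example only: serving Qwen3-14B-FP8 with YaRN scaling under vLLM.
# factor=4.0 extends the native 32,768-token window to ~131,072 tokens.
vllm serve Qwen/Qwen3-14B-FP8 \
  --max-model-len 131072 \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}'
```

Note that static YaRN scaling applies even to short prompts, so the Qwen team advises enabling it only when long-context processing is actually required.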
Start Building in Minutes
Qwen3-14B-FP8 is accessible via OpenAI-compatible endpoints. Here is how to integrate it using the standard Python SDK via Doubleword.ai.
Developer Tip: Recommended Sampling Parameters
For optimal performance and to reduce endless repetitions, the Qwen team recommends: Temperature=0.7, TopP=0.8, TopK=20, MinP=0, and a presence_penalty=1.5 (adjust up to 2.0 if language mixing or repetition persists).
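These recommendations can be packaged once and reused across calls. A minimal sketch — note that `top_k` and `min_p` are not part of the OpenAI schema, so they travel via `extra_body`, a common convention for OpenAI-compatible servers but not guaranteed by every host:

```python
# Qwen-recommended sampling defaults, ready to splat into
# client.chat.completions.create(model=..., messages=..., **qwen_sampling).
qwen_sampling = {
    "temperature": 0.7,
    "top_p": 0.8,
    "presence_penalty": 1.5,  # raise toward 2.0 if repetition or language mixing persists
    # top_k / min_p are outside the OpenAI schema; extra_body forwards them
    # to vLLM-style backends. Confirm support with your provider.
    "extra_body": {"top_k": 20, "min_p": 0},
}
```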
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch",
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
```
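The upload step expects `batch_requests.jsonl` to already exist. A sketch of building it, assuming the line schema mirrors the OpenAI Batch API (one JSON object per line with `custom_id`, `method`, `url`, and `body`) — verify the exact schema against the Doubleword.ai docs:

```python
import json

# Hypothetical inputs for a high-volume summarization batch.
documents = ["First document text...", "Second document text..."]

requests = [
    {
        "custom_id": f"doc-{i}",              # your key for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "Qwen3-14B-FP8",
            "messages": [{"role": "user", "content": f"Summarize: {text}"}],
            "max_tokens": 512,
        },
    }
    for i, text in enumerate(documents)
]

# One JSON object per line — the JSONL format the batch endpoint consumes.
with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

Results come back keyed by `custom_id`, so choose IDs you can join back to your source records.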