DeepSeek V4-Flash
A general-purpose open MoE model for reasoning, tool use, and long-context work, with 284B total parameters, 13B active, and a 1M-token context window. The compact flagship of the V4 family, built for everyday agentic tasks.
| Architecture | Context Window | Intelligence | License |
|---|---|---|---|
| MoE, 284B total / 13B active | 1M tokens | 47 (AA Index v4.0) | Open weights |
Compact V4 with 1M-token Context
DeepSeek V4-Flash is a general-purpose open MoE model built for reasoning, tool use, and long-context work. With 284B total parameters and only 13B active per token, it packages the core strengths of the V4 family into a smaller, faster, and cheaper model while keeping the full 1M-token context window.
It’s a strong fit for chat, structured generation, document-scale analysis, and agentic workflows that need broad capability across everyday tasks.
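To put "13B active" in perspective, the sketch below computes the share of weights used per token and applies a common rule of thumb of roughly 2 FLOPs per active parameter per generated token. That heuristic is an assumption: it ignores attention and expert-routing overhead (which grows with context length), so the figures are back-of-envelope estimates, not measured numbers.

```python
# Back-of-envelope MoE arithmetic for DeepSeek V4-Flash.
# Assumption: ~2 FLOPs per active parameter per token for a forward pass
# (a common rule of thumb); attention and routing costs are ignored.

TOTAL_PARAMS = 284e9   # total parameters, from the spec
ACTIVE_PARAMS = 13e9   # parameters active per token, from the spec


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights used for any single token."""
    return active / total


def approx_flops_per_token(params: float) -> float:
    """Rough forward-pass FLOPs per token under the 2 * params heuristic."""
    return 2 * params


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    moe = approx_flops_per_token(ACTIVE_PARAMS)
    dense = approx_flops_per_token(TOTAL_PARAMS)
    print(f"Active fraction: {frac:.1%}")
    print(f"~{moe / 1e9:.0f} GFLOPs/token vs ~{dense / 1e9:.0f} if all 284B were active")
```

Run as a script, this reports an active fraction of about 4.6% and roughly 26 versus 568 GFLOPs per token, which is the intuition behind MoE models being cheaper to serve than a dense model of the same total size.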
Compact V4 MoE
Built for everyday agentic & long-context work
General Reasoning
Strong everyday reasoning across chat, Q&A, and structured generation tasks.
Document-Scale Analysis
1M-token context window handles long documents, codebases, and research corpora in a single call.
Agentic Workflows
Reliable tool use and instruction following for agents that need broad capability across diverse tasks.
Structured Generation
Strong format adherence for JSON, code, and structured outputs in production pipelines.
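Even with strong format adherence, production pipelines usually validate structured output before using it. The sketch below is one hedged way to do that: it strips an optional Markdown code fence (a habit many chat models have), parses the JSON, and checks for required keys. The function name and the required-key argument are hypothetical; adapt them to your own schema.

```python
import json
import re
from typing import Any

# Matches a reply wrapped in a Markdown code fence, optionally tagged "json".
# `{3} stands in for three literal backticks so this snippet nests cleanly in docs.
_FENCE_RE = re.compile(r"^\s*`{3}(?:json)?\s*\n(.*?)\n\s*`{3}\s*$", re.DOTALL)


def parse_json_reply(text: str, required_keys: tuple[str, ...] = ()) -> dict[str, Any]:
    """Parse a model reply as a JSON object and check that required keys exist.

    Raises ValueError if the reply is not a JSON object or a key is missing.
    """
    match = _FENCE_RE.match(text)
    payload = match.group(1) if match else text.strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data
```

For example, `parse_json_reply(response.choices[0].message.content, ("title", "risks"))` either returns a dict you can pass downstream or raises `ValueError`, which makes a natural retry point in a pipeline.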
Strong reasoning at a fraction of the cost
Scores from the Artificial Analysis Intelligence Index v4.0. V4-Flash condenses the V4 family's strengths into a MoE with 13B active parameters while keeping the 1M-token context window.
Intelligence Index: better than 88% of models
GPQA Diamond: better than 90% of models
τ²-Bench Telecom: better than 95% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 87% |
| Reasoning | Humanity's Last Exam | 28% |
| Reasoning | τ²-Bench Telecom | 96% |
| Reasoning | AA-LCR | 63% |
| Reasoning | IFBench | 73% |
| Reasoning | GDPval-AA | 46% |
| Coding | SciCode | 42% |
| Coding | Terminal-Bench Hard | 39% |
| Knowledge | AA-Omniscience Accuracy | 36% |
Metrics sourced from Artificial Analysis and DeepSeek's published evaluations.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Overnight (24H) | $0.07 | $0.14 |
| Async | $0.10 | $0.20 |
| Realtime | $0.14 | $0.28 |
The context window natively supports up to 1M tokens.
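To compare tiers for a concrete workload, multiply token counts by the per-1M rates in the pricing table. The sketch below does that with `Decimal` to avoid floating-point rounding; the rates are copied from the table and the `estimate_cost` helper is hypothetical, so check current pricing before relying on the output.

```python
from decimal import Decimal

# Per-1M-token prices in USD, copied from the pricing table: (input, output).
PRICES_PER_1M = {
    "overnight": (Decimal("0.07"), Decimal("0.14")),
    "async": (Decimal("0.10"), Decimal("0.20")),
    "realtime": (Decimal("0.14"), Decimal("0.28")),
}
ONE_MILLION = Decimal(1_000_000)


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of a workload on the given tier (hypothetical helper)."""
    input_rate, output_rate = PRICES_PER_1M[tier]
    return (input_tokens * input_rate + output_tokens * output_rate) / ONE_MILLION


if __name__ == "__main__":
    # Example workload: 1,000 documents of 200k input and 2k output tokens each.
    docs, tokens_in, tokens_out = 1_000, 200_000, 2_000
    for tier in PRICES_PER_1M:
        cost = estimate_cost(tier, docs * tokens_in, docs * tokens_out)
        print(f"{tier:>9}: ${cost:.2f}")
```

For this example workload the script prints $14.28 for Overnight, $20.40 for Async, and $28.56 for Realtime; because every Overnight rate is half the Realtime rate, the same job costs exactly half as much when it can wait.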
Start Building in Minutes
DeepSeek V4-Flash is accessible via OpenAI-compatible endpoints.
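```python
from openai import OpenAI

# Point the standard OpenAI client at Doubleword's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Long-context reasoning with DeepSeek V4-Flash (up to 1M tokens of context).
# Replace this placeholder with the full text you want analyzed.
document = "..."

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[
        {
            "role": "user",
            "content": f"Analyze this document and extract key insights.\n\n{document}",
        },
    ],
)

print(response.choices[0].message.content)
```

💡 Pro Tip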
V4-Flash's 1M-token context window suits document-scale workloads: you can feed entire codebases, contracts, or research corpora in a single call. For bulk pipelines, the Async or Overnight tier cuts costs further.
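Before sending a whole codebase or corpus in one call, it helps to check that it plausibly fits. The sketch below assumes roughly 4 characters per token, a rough heuristic for English text and code; DeepSeek's actual tokenizer will produce different counts, so leave headroom or count with the real tokenizer when you are near the limit. The 1M-token limit comes from this page; the helper names and the 32k-token output reserve are hypothetical.

```python
from pathlib import Path

CONTEXT_LIMIT = 1_000_000  # tokens, per the V4-Flash spec
CHARS_PER_TOKEN = 4        # rough heuristic, NOT the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserve_for_output: int = 32_000) -> bool:
    """True if the combined texts still leave `reserve_for_output` tokens free."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= CONTEXT_LIMIT


def load_codebase(root: str, suffix: str = ".py") -> list[str]:
    """Read every file with the given suffix under `root`, prefixed by its path."""
    return [
        f"# File: {path}\n{path.read_text(encoding='utf-8', errors='replace')}"
        for path in sorted(Path(root).rglob(f"*{suffix}"))
        if path.is_file()
    ]
```

A typical flow is `files = load_codebase("my_project")`, then, if `fits_in_context(files)` is true, join them with `"\n\n".join(files)` and send the result as the document in a single request; otherwise split the corpus or trim it first.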
