MoE · 1M Context · Reasoning · Agentic · DeepSeek · Open Weights

DeepSeek V4-Flash

A general-purpose open MoE model built for reasoning, tool use, and long-context work: 284B total parameters, 13B active per token, and a 1M-token context window. The compact flagship of the V4 family for everyday agentic tasks.

Spec	Value
Architecture	MoE, 284B total / 13B active
Context Window	1M tokens
Intelligence	46.5 (AA Index v4.0)
License	Open weights

About

Compact V4 with 1M-token Context

DeepSeek V4-Flash is a general-purpose open MoE model built for reasoning, tool use, and long-context work. With 284B total parameters and only 13B active per token, it brings the core strengths of the V4 family into a more compact, faster, and cheaper package — without giving up the 1M-token context window.

It’s a strong fit for chat, structured generation, document-scale analysis, and agentic workflows that need broad capability across everyday tasks.

Use Cases

Built for everyday agentic & long-context work

General Reasoning

Strong everyday reasoning across chat, Q&A, and structured generation tasks.

Document-Scale Analysis

1M-token context window handles long documents, codebases, and research corpora in a single call.

Agentic Workflows

Reliable tool use and instruction following for agents that need broad capability across diverse tasks.
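As a rough illustration of what tool use looks like on the client side, the sketch below defines one tool in the OpenAI-style function-calling format and a small dispatcher that executes a tool call locally. It assumes the endpoint honors the `tools` parameter of the chat completions API, which this page does not confirm. The `get_order_status` tool, its data, and the `run_tool_call` helper are hypothetical names made up for the example.

```python
import json


def get_order_status(order_id: str) -> dict:
    """Hypothetical tool: look up an order in a stand-in data store."""
    orders = {"A100": "shipped", "A101": "processing"}
    return {"order_id": order_id, "status": orders.get(order_id, "unknown")}


# Tool description in the OpenAI-style function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_order_status": get_order_status}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return its result as a JSON string.

    Errors are returned as JSON too, so the agent loop can pass them
    back to the model instead of crashing.
    """
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(REGISTRY[name](**args))
    except (json.JSONDecodeError, TypeError) as exc:
        return json.dumps({"error": str(exc)})
```

In an agent loop, you would pass `tools=TOOLS` to `client.chat.completions.create`, call `run_tool_call(call.function.name, call.function.arguments)` for each entry in `response.choices[0].message.tool_calls`, and send each result back as a `"tool"` role message carrying the matching `tool_call_id`.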

Structured Generation

Strong format adherence for JSON, code, and structured outputs in production pipelines.
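Even with strong format adherence, production pipelines usually validate what comes back before using it. The sketch below is one minimal way to do that, assuming you asked the model for a JSON object. The required keys and the `parse_structured_reply` helper are hypothetical and stand in for whatever schema your pipeline expects.

```python
import json

# Hypothetical schema for illustration: keys every reply must contain.
REQUIRED_KEYS = {"title", "summary", "tags"}


def parse_structured_reply(text: str) -> dict:
    """Extract the JSON object from a model reply and check its keys.

    Raises ValueError if no object is found, the JSON is malformed,
    or a required key is missing.
    """
    # Tolerate prose around the object by slicing from the first "{"
    # to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")

    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(text[start : end + 1])

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data
```

You would call it on `response.choices[0].message.content` and retry or log the request when it raises.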

Benchmarks

Strong reasoning at a fraction of the cost

Artificial Analysis Intelligence Index v4.0 scores. V4-Flash compresses the V4 family's strengths into a 13B-active MoE while retaining the 1M-token context window.

Metric	Score	Percentile
Intelligence Index	46.5	Better than 88% of models
GPQA Diamond	87%	Better than 90% of models
τ²-Bench Telecom	96%	Better than 95% of models

Category	Benchmark	Score	Description
Reasoning	GPQA Diamond	87%	Graduate-level scientific reasoning
Reasoning	Humanity's Last Exam	28%	Expert-level questions across many academic subjects
Reasoning	τ²-Bench Telecom	96%	AI agents in dual-control scenarios
Reasoning	AA-LCR	63%	Long context reasoning evaluation
Reasoning	IFBench	73%	Instruction-following accuracy
Reasoning	GDPval-AA	46%	Agentic performance on real-world work tasks
Coding	SciCode	42%	Python for scientific computing
Coding	Terminal-Bench Hard	39%	Agentic coding & terminal use
Knowledge	AA-Omniscience Accuracy	36%	Proportion of correctly answered questions

Metrics sourced from Artificial Analysis and DeepSeek's published evaluations.

Pricing

Flexible Pricing Tiers

Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

Tier	Input / 1M tokens	Output / 1M tokens
Overnight (24H)	$0.07	$0.14
Async	$0.10	$0.20
Realtime	$0.14	$0.28

Context window natively supported up to 1M tokens.
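To make the tiers concrete, here is a small cost estimate based only on the prices in the table above. The tier keys (`"overnight"`, `"async"`, `"realtime"`) and the `estimate_cost` helper are illustrative labels for this sketch, not API parameters.

```python
# Prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "overnight": {"input": 0.07, "output": 0.14},
    "async": {"input": 0.10, "output": 0.20},
    "realtime": {"input": 0.14, "output": 0.28},
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request on the given tier."""
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example: an 800K-token document with a 4K-token answer.
    for tier in PRICES:
        cost = estimate_cost(tier, input_tokens=800_000, output_tokens=4_000)
        print(f"{tier:>9}: ${cost:.4f}")
```

For that example request the estimate is about $0.113 on Realtime, $0.081 on Async, and $0.057 on Overnight, so the Overnight tier costs half as much as Realtime.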

Quickstart

Start Building in Minutes

DeepSeek V4-Flash is accessible via OpenAI-compatible endpoints.

Python

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1"
)

# Send a chat request. Append the document text to the prompt; inputs up to 1M tokens are supported.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[
        {"role": "user", "content": "Analyze this document and extract key insights."}
    ],
)

print(response.choices[0].message.content)

💡 Pro Tip

V4-Flash's 1M-token context shines on document-scale workloads — feed entire codebases, contracts, or research corpora in a single call. Use the Async or Overnight tier to slash costs further on bulk pipelines.
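When feeding many files in a single call, it helps to check that they fit before sending. The sketch below packs documents into one prompt under a token budget. It assumes a rough heuristic of 4 characters per token, which is not the model's real tokenizer, so treat the counts as estimates and leave headroom. The `pack_documents` helper and the default output reserve are hypothetical choices for this example.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, as stated on this page
CHARS_PER_TOKEN = 4         # rough heuristic, NOT the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its length."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_documents(
    docs: dict[str, str],
    reserve_for_output: int = 8_000,
    budget: int = CONTEXT_WINDOW,
) -> tuple[str, list[str]]:
    """Concatenate documents into one prompt without exceeding the budget.

    Returns the packed prompt and the names of documents that did not fit.
    """
    remaining = budget - reserve_for_output
    parts: list[str] = []
    skipped: list[str] = []
    for name, text in docs.items():
        # Label each document so the model can refer to it by name.
        block = f"=== {name} ===\n{text}\n"
        cost = estimate_tokens(block)
        if cost <= remaining:
            parts.append(block)
            remaining -= cost
        else:
            skipped.append(name)
    return "".join(parts), skipped
```

The returned prompt can go straight into the `content` field of a user message, and anything in the skipped list can be sent in a follow-up request.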
