Doubleword
    MoE
    1M Context
    Reasoning
    Agentic
    DeepSeek
    Open Weights

    DeepSeek V4-Flash

    A general-purpose open MoE model built for reasoning, tool use, and long-context work — 284B total parameters, 13B active, with a 1M-token context window. The compact V4 family flagship for everyday agentic tasks.

    Architecture: MoE (284B total / 13B active)
    Context Window: 1M tokens
    Intelligence: 47 (AA Index v4.0)
    License: Open (open weights)

    About

    Compact V4 with 1M-token Context

    DeepSeek V4-Flash is a general-purpose open MoE model built for reasoning, tool use, and long-context work. With 284B total parameters and only 13B active per token, it brings the core strengths of the V4 family into a more compact, faster, and cheaper package — without giving up the 1M-token context window.

    It’s a strong fit for chat, structured generation, document-scale analysis, and agentic workflows that need broad capability across everyday tasks.

    Use Cases

    Built for everyday agentic & long-context work

    General Reasoning

    Strong everyday reasoning across chat, Q&A, and structured generation tasks.

    Document-Scale Analysis

    1M-token context window handles long documents, codebases, and research corpora in a single call.

    Agentic Workflows

    Reliable tool use and instruction following for agents that need broad capability across diverse tasks.

    Structured Generation

    Strong format adherence for JSON, code, and structured outputs in production pipelines.
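
For the structured-generation use case, a pipeline would typically validate the model's reply before passing it downstream. The following is a minimal sketch, assuming the reply is expected to contain a single JSON object; the helper name `parse_structured_reply` and the schema keys `title` and `tags` are hypothetical and are not part of any Doubleword or DeepSeek API.

```python
import json

# Hypothetical schema for illustration: required key -> expected Python type.
REQUIRED_KEYS = {"title": str, "tags": list}


def parse_structured_reply(text: str) -> dict:
    """Extract the JSON object from a model reply and check its required keys."""
    # Tolerate leading or trailing prose by slicing from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")

    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type.
    data = json.loads(text[start:end + 1])

    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"missing or mistyped key: {key}")
    return data
```

Raising `ValueError` on any malformed reply gives a pipeline a single failure path on which it could retry or log the request.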

    Benchmarks

    Strong reasoning at a fraction of the cost

    Artificial Analysis Intelligence Index v4.0 scores. V4-Flash compresses the V4 family's strengths into a 13B-active MoE while retaining the 1M-token context window.

    Intelligence Index: 46.5 (better than 88% of models)
    GPQA Diamond: 87 (better than 90% of models)
    τ²-Bench Telecom: 96 (better than 95% of models)

    Category     Benchmark                   Score
    Reasoning    GPQA Diamond                87%
    Reasoning    Humanity's Last Exam        28%
    Reasoning    τ²-Bench Telecom            96%
    Reasoning    AA-LCR                      63%
    Reasoning    IFBench                     73%
    Reasoning    GDPval-AA                   46%
    Coding       SciCode                     42%
    Coding       Terminal-Bench Hard         39%
    Knowledge    AA-Omniscience Accuracy     36%

    Metrics sourced from Artificial Analysis and DeepSeek's published evaluations.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier               Input / 1M tokens    Output / 1M tokens
    Overnight (24H)    $0.07                $0.14
    Async              $0.10                $0.20
    Realtime           $0.14                $0.28

    Context window natively supported up to 1M tokens.
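
To make the per-1M-token prices concrete, the sketch below estimates the cost of a single request on each tier. The prices are the ones listed above; the dictionary keys (`overnight`, `async`, `realtime`) are labels chosen for this example and are not API parameters, and the token counts are made up for illustration.

```python
# (input price, output price) in USD per 1M tokens, copied from the pricing table.
PRICES_PER_MILLION = {
    "overnight": (0.07, 0.14),
    "async": (0.10, 0.20),
    "realtime": (0.14, 0.28),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one request on the given tier."""
    input_price, output_price = PRICES_PER_MILLION[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example: a long-document request with 800k input tokens and 4k output tokens.
for tier in PRICES_PER_MILLION:
    print(f"{tier}: ${estimate_cost(tier, 800_000, 4_000):.4f}")
```

Because the Overnight prices are half the Realtime prices for both input and output, the same request costs half as much there.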

    Quickstart

    Start Building in Minutes

    DeepSeek V4-Flash is accessible via OpenAI-compatible endpoints.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Long-context reasoning with DeepSeek V4-Flash (1M tokens)
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V4-Flash",
        messages=[
            {"role": "user", "content": "Analyze this document and extract key insights."}
        ],
    )
    
    print(response.choices[0].message.content)

    💡 Pro Tip

    V4-Flash's 1M-token context shines on document-scale workloads — feed entire codebases, contracts, or research corpora in a single call. Use the Async or Overnight tier to slash costs further on bulk pipelines.
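
Before sending a whole corpus in one call, it can help to check that it plausibly fits in the window. The sketch below packs documents greedily under a token budget. It assumes roughly 4 characters per token, which is only a rule of thumb for English text and will differ from the model's real tokenizer; the helper names and the 8,000-token output reserve are hypothetical choices for this example.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context window stated for V4-Flash
CHARS_PER_TOKEN = 4                # rough assumption, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_documents(documents: dict, reserve_for_output: int = 8_000):
    """Greedily pack named documents into one prompt within the token budget.

    Returns (prompt, skipped_names, estimated_tokens_used).
    """
    budget = CONTEXT_WINDOW_TOKENS - reserve_for_output
    packed, skipped, used = [], [], 0
    for name, text in documents.items():
        cost = estimate_tokens(text)
        if used + cost <= budget:
            packed.append(f"### {name}\n{text}")
            used += cost
        else:
            # Too large for the remaining budget; leave it for a separate call.
            skipped.append(name)
    return "\n\n".join(packed), skipped, used
```

The returned prompt could then be passed as the `content` of the user message in the quickstart request, with any skipped documents handled in a follow-up call.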
