Doubleword
    Tags: Multimodal, Agentic, Thinking Mode, MoE, Moonshot, Open Weights

    Kimi K2.6

    Moonshot's open-source native multimodal agentic model. Built on the same MoE multimodal architecture as K2.5 with a 256K context window — combining strong reasoning, visual understanding, and agentic tool use across instant and thinking modes.

    Architecture: MoE (Multimodal)
    Context Window: 256K tokens
    Intelligence: 54 (AA Index v4.0)
    License: Open (Open Weights)

    About

    Native Multimodal MoE for Agentic Workflows

    Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration. Built on the same MoE multimodal architecture as K2.5 with a 256K context window, K2.6 unifies reasoning, visual understanding, and agentic tool use across instant and thinking modes.

    Thinking mode is enabled by default; disable with {"chat_template_kwargs": {"enable_thinking": false}}. K2.6 does not support graduated thinking levels — reasoning_effort is not supported.

    Multimodal Agentic Flagship (capability graphic): Code, Vision, Term, Tools, Plan, Swarm

    Use Cases

    Built for autonomous, multimodal agents

    Long-Horizon Coding

    End-to-end coding performance across Rust, Go, Python, front-end, DevOps, and performance optimization workflows.

    Coding-Driven Design

    Turns prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows with structured layouts and visual polish.

    Elevated Agent Swarm

    Decomposes complex tasks into parallel, domain-specialized subtasks — scaling to large coordinated agent runs for end-to-end outputs.

    Proactive Orchestration

    Built for autonomous execution. Persistent background agents that manage schedules, execute code, and coordinate cross-platform operations with minimal oversight.

    Benchmarks

    Frontier Coding & Agentic Performance

    The scores below are from the Artificial Analysis Intelligence Index v4.0. In Moonshot's published benchmarks, K2.6 outperformed GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5.

    Intelligence Index: 54 (better than 94% of models)
    GPQA Diamond: 91 (better than 96% of models)
    τ²-Bench Telecom: 96 (better than 95% of models)

    Category   Benchmark                         Score
    Reasoning  GPQA Diamond                      91%
    Reasoning  Humanity's Last Exam              36%
    Reasoning  τ²-Bench Telecom                  96%
    Reasoning  AA-LCR                            70%
    Reasoning  IFBench                           76%
    Reasoning  GDPval-AA                         49%
    Coding     SciCode                           53%
    Coding     Terminal-Bench Hard               44%
    Knowledge  AA-Omniscience Accuracy           33%
    Knowledge  AA-Omniscience Non-Hallucination  61%

    Metrics sourced from Artificial Analysis and Moonshot's published evaluations. Reasoning (thinking) mode enabled.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier              Input / 1M tokens   Output / 1M tokens
    Overnight (24H)   $0.45               $2.00
    Async             $0.70               $3.00
    Realtime          $0.95               $4.00

    Context window natively supported up to 256K tokens.
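
    To compare tiers for a given workload, the per-1M-token prices above can be turned into a quick estimate. The sketch below is illustrative only: the tier keys and the estimate_cost helper are made-up names, and the calculation assumes billing is a simple linear function of input and output tokens with no minimums or discounts.

```python
# Per-1M-token prices (USD) copied from the pricing table: (input, output).
# The dictionary keys are hypothetical labels, not API identifiers.
PRICES = {
    "overnight": (0.45, 2.00),
    "async": (0.70, 3.00),
    "realtime": (0.95, 4.00),
}


def estimate_cost(tier, input_tokens, output_tokens):
    """Estimate the USD cost of one request, assuming purely linear pricing."""
    input_price, output_price = PRICES[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: a 200K-token prompt producing 8K output tokens.
    for tier in PRICES:
        print(f"{tier}: ${estimate_cost(tier, 200_000, 8_000):.4f}")
```

    For that example workload the estimate comes to roughly $0.106 on Overnight, $0.164 on Async, and $0.222 on Realtime.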

    Quickstart

    Start Building in Minutes

    Kimi K2.6 is accessible via OpenAI-compatible endpoints.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Long-horizon multimodal agentic task (thinking enabled by default)
    response = client.chat.completions.create(
        model="moonshotai/Kimi-K2.6",
        messages=[
            {"role": "user", "content": "Plan and execute a 3-step refactor of this codebase."}
        ],
        # To disable step-by-step reasoning:
        # extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    
    print(response.choices[0].message.content)
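
    Since K2.6 is described as natively multimodal, a request can also carry an image. The sketch below builds such a message using the OpenAI chat format's image_url content parts with a base64 data URL. It assumes this endpoint accepts that format for K2.6, which this page does not confirm; image_message and the file name mockup.png are hypothetical.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message pairing a text prompt with one inline image.

    Assumption: the endpoint accepts OpenAI-style image_url content parts.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Usage with the client from the quickstart (not executed here):
# with open("mockup.png", "rb") as f:
#     msg = image_message("Turn this mockup into an HTML page.", f.read())
# response = client.chat.completions.create(
#     model="moonshotai/Kimi-K2.6", messages=[msg]
# )
```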

    💡 Pro Tip

    K2.6 shines on long-horizon agentic work — let it sustain reasoning across planning, tool use, and iterative debugging. K2.6 does not support graduated thinking levels (reasoning_effort has no effect). For latency-sensitive endpoints, disable thinking with "chat_template_kwargs": {"enable_thinking": false}.
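
    One way to apply this tip is to keep the thinking toggle in a single place and choose it per call. The helper below is a minimal sketch: build_chat_kwargs is a hypothetical name, and it only assembles keyword arguments for the client.chat.completions.create call shown in the quickstart, using the client's extra_body parameter to pass chat_template_kwargs.

```python
MODEL = "moonshotai/Kimi-K2.6"


def build_chat_kwargs(prompt, thinking=True):
    """Return keyword arguments for client.chat.completions.create.

    Thinking is on by default, so the override is only added when it is
    being disabled. reasoning_effort is deliberately never set, because
    K2.6 does not support graduated thinking levels.
    """
    kwargs = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        kwargs["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
    return kwargs


# Usage with the client from the quickstart (not executed here):
# fast = client.chat.completions.create(
#     **build_chat_kwargs("Summarize this changelog.", thinking=False)
# )
# deep = client.chat.completions.create(
#     **build_chat_kwargs("Plan and execute a 3-step refactor of this codebase.")
# )
```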