Doubleword
    Agentic Engineering
    Thinking Mode
    FP8
    Z.ai
    Open Weights

    GLM-5.1

    Z.ai's next-generation flagship for agentic engineering. State-of-the-art on SWE-Bench Pro, with significantly stronger coding than GLM-5 — built for repository generation, terminal tasks, and long-horizon agentic workflows.

    Total Parameters

    FP8

    Quantized

    Context Window

    198K

    Tokens

    Intelligence

    51

    AA Index v4.0

    License

    Open

    Open Weights

    About

    Built for Long-Horizon Agentic Work

    GLM-5.1-FP8 is Z.ai's next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than GLM-5. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo and Terminal-Bench 2.0 — making it especially strong for real-world coding, repository generation, terminal tasks, and long-horizon agentic workflows.

    GLM-5.1 is designed to stay productive over extended sessions: breaking down ambiguous problems, running experiments, reading results, identifying blockers, and improving through repeated iteration. Thinking mode is enabled by default; to disable it, pass {"chat_template_kwargs": {"enable_thinking": false}} in the request body.

    Flagship Agentic Engineering Model

    Use Cases

    Best for sustained engineering agents

    Agentic Engineering

    State-of-the-art on SWE-Bench Pro. Built for repository generation, multi-file edits, and long-horizon coding agents.

    Terminal & Tool Use

    Leads GLM-5 by a wide margin on Terminal-Bench 2.0 — strong at sustained tool calls, shell workflows, and iterative debugging.

    Long-Horizon Reasoning

    Stays productive over extended sessions: breaks down ambiguous problems, runs experiments, reads results, identifies blockers, and improves through iteration.

    NL2Repo Generation

    Top-tier performance on natural-language-to-repository benchmarks — turn specs and prompts into full working codebases.

    Benchmarks

    Frontier Coding & Agentic Performance

    Artificial Analysis Intelligence Index v4.0 scores. SOTA on SWE-Bench Pro.

    51

    Intelligence Index

    Better than 92% of models

    84

    GPQA Diamond

    Better than 92% of models

    98

    τ²-Bench Telecom

    Better than 95% of models

    Category   Benchmark                         Score
    Reasoning  GPQA Diamond                      84%
    Reasoning  Humanity's Last Exam              28%
    Reasoning  τ²-Bench Telecom                  98%
    Reasoning  AA-LCR                            62%
    Reasoning  IFBench                           76%
    Reasoning  GDPval-AA                         52%
    Coding     SciCode                           44%
    Coding     Terminal-Bench Hard               43%
    Knowledge  AA-Omniscience Accuracy           26%
    Knowledge  AA-Omniscience Non-Hallucination  71%

    Metrics sourced from Artificial Analysis and Z.ai's published evaluations. Reasoning (thinking) mode enabled.

    Pricing

    Flexible Pricing Tiers

    Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

    Tier      Input / 1M tokens  Output / 1M tokens
    Standard  $0.70              $2.20
    Async     $1.05              $3.30
    Realtime  $1.40              $4.40

    The context window is natively supported up to 198K tokens (202,752 tokens maximum).
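
As a rough worked example of the per-1M-token pricing above, the sketch below estimates the cost of a single request at each tier. The prices and the 202,752-token limit are copied from this page; the `estimate_cost` helper and the token counts in the usage line are hypothetical, and the check assumes that prompt and completion tokens share the same context window.

```python
# Per-1M-token prices in USD, copied from the pricing table above.
PRICES = {
    "standard": {"input": 0.70, "output": 2.20},
    "async": {"input": 1.05, "output": 3.30},
    "realtime": {"input": 1.40, "output": 4.40},
}

# Maximum context length in tokens, as stated above (198K).
MAX_CONTEXT_TOKENS = 202_752


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    # Assumption: prompt and completion tokens count against the same window.
    if input_tokens + output_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("request exceeds the context window")
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Illustrative request: 100,000 input tokens and 10,000 output tokens.
for tier in PRICES:
    print(f"{tier}: ${estimate_cost(tier, 100_000, 10_000):.3f}")
```

With these example counts the Standard tier works out to (100,000 × $0.70 + 10,000 × $2.20) / 1,000,000 = $0.092, and the Realtime tier to exactly twice that.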

    Quickstart

    Start Building in Minutes

    GLM-5.1 is accessible via OpenAI-compatible endpoints. Default sampling: temperature=1.0, top_p=0.95.

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Long-horizon agentic coding task (thinking enabled by default)
    response = client.chat.completions.create(
        model="zai-org/GLM-5.1-FP8",
        messages=[
            {"role": "user", "content": "Refactor this repo to use async I/O end-to-end."}
        ],
        temperature=1.0,
        top_p=0.95,
        # To disable step-by-step reasoning:
        # extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    
    print(response.choices[0].message.content)
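
The thinking toggle mentioned in the quickstart comments can be wrapped in a small helper so that one call site serves both modes. This is a minimal sketch: `build_request` is a hypothetical helper, not part of the OpenAI client or the Doubleword API, and it assumes the endpoint honors `chat_template_kwargs` when it is sent through the client's `extra_body` argument, as the quickstart suggests.

```python
from typing import Any


def build_request(prompt: str, thinking: bool = True) -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    kwargs: dict[str, Any] = {
        "model": "zai-org/GLM-5.1-FP8",
        "messages": [{"role": "user", "content": prompt}],
        # Default sampling settings from the quickstart.
        "temperature": 1.0,
        "top_p": 0.95,
    }
    if not thinking:
        # Thinking is on by default, so the override is only sent when disabling it.
        kwargs["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
    return kwargs


# Usage with the client from the quickstart (not executed here):
# response = client.chat.completions.create(
#     **build_request("Summarize this diff.", thinking=False)
# )
```

Leaving `extra_body` out entirely when thinking is enabled keeps the default request identical to the quickstart call.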

    💡 Pro Tip

    GLM-5.1 is built for long-horizon agentic work: let it sustain reasoning across planning, tool use, experiments, and iterative debugging. Keep thinking mode on for ambiguous tasks. For latency-sensitive endpoints, disable it with {"chat_template_kwargs": {"enable_thinking": false}}.