GLM-5.1

GLM-5.1 is Z.ai's next-generation flagship for agentic engineering. It is state-of-the-art on SWE-Bench Pro and codes significantly better than GLM-5, and it is built for repository generation, terminal tasks, and long-horizon agentic workflows.

Quantization: FP8

Context Window: 198K tokens

Intelligence benchmark: AA Index v4.0

License: Open weights

About

Built for Long-Horizon Agentic Work

GLM-5.1-FP8 is the FP8-quantized release of GLM-5.1, Z.ai's next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than GLM-5. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo and Terminal-Bench 2.0, which makes it especially strong for real-world coding, repository generation, terminal tasks, and long-horizon agentic workflows.

GLM-5.1 is designed to stay productive over extended sessions: breaking down ambiguous problems, running experiments, reading results, identifying blockers, and improving through repeated iteration. Thinking mode is enabled by default; to disable it, include {"chat_template_kwargs": {"enable_thinking": false}} in the request body.

Use Cases

Best for sustained engineering agents

Agentic Engineering

State-of-the-art on SWE-Bench Pro. Built for repository generation, multi-file edits, and long-horizon coding agents.

Terminal & Tool Use

Leads GLM-5 by a wide margin on Terminal-Bench 2.0, with strong performance on sustained tool calls, shell workflows, and iterative debugging.

Long-Horizon Reasoning

Stays productive over extended sessions: breaks down ambiguous problems, runs experiments, reads results, identifies blockers, and improves through iteration.

NL2Repo Generation

Top-tier performance on natural-language-to-repository benchmarks: turn specs and prompts into full working codebases.

Benchmarks

Frontier Coding & Agentic Performance

Artificial Analysis Intelligence Index v4.0 scores. SOTA on SWE-Bench Pro.

Intelligence Index: better than 92% of models

GPQA Diamond: better than 92% of models

τ²-Bench Telecom: better than 95% of models

Category	Benchmark	Score	Description
Reasoning	GPQA Diamond	84%	Graduate-level scientific reasoning
Reasoning	Humanity's Last Exam	28%	Expert-level questions across many academic subjects
Reasoning	τ²-Bench Telecom	98%	AI agents in dual-control scenarios
Reasoning	AA-LCR	62%	Long context reasoning evaluation
Reasoning	IFBench	76%	Instruction-following accuracy
Reasoning	GDPval-AA	52%	Agentic performance on real-world work tasks
Coding	SciCode	44%	Python for scientific computing
Coding	Terminal-Bench Hard	43%	Agentic coding & terminal use
Knowledge	AA-Omniscience Accuracy	26%	Proportion of correctly answered questions
Knowledge	AA-Omniscience Non-Hallucination	71%	Confidently answered questions that are correct

Metrics sourced from Artificial Analysis and Z.ai's published evaluations. Reasoning (thinking) mode enabled.

Pricing

Flexible Pricing Tiers

Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.

Tier	Input / 1M tokens	Output / 1M tokens
Standard	$0.70	$2.20
Async	$1.05	$3.30
Realtime	$1.40	$4.40
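
To make the per-1M-token pricing concrete, here is a minimal sketch of a cost estimator. The prices are copied from the table above; the function name `estimate_cost`, the lowercase tier keys, and the sample token counts are illustrative assumptions, not part of any official SDK.

```python
from decimal import Decimal

# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES_PER_MILLION = {
    "standard": (Decimal("0.70"), Decimal("2.20")),
    "async": (Decimal("1.05"), Decimal("3.30")),
    "realtime": (Decimal("1.40"), Decimal("4.40")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request on the given tier.

    Hypothetical helper: it only applies the published per-token prices and
    ignores any discounts, minimums, or rounding a real invoice might apply.
    """
    input_price, output_price = PRICES_PER_MILLION[tier.lower()]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Hypothetical workload: 150,000 input tokens and 20,000 output tokens.
    for tier in PRICES_PER_MILLION:
        print(f"{tier}: ${estimate_cost(tier, 150_000, 20_000):.4f}")
```

Decimal arithmetic is used here so that small per-request amounts do not pick up floating-point noise when summed over many calls.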

The context window natively supports up to 198K tokens (202,752 tokens maximum).
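
As a rough illustration of working within that limit, the sketch below clamps a requested completion length to what still fits. It assumes the prompt and the completion share the same 202,752-token window, which is typical for chat endpoints but is not stated on this page; the helper name `max_completion_tokens` is hypothetical, and the prompt token count must come from your own tokenizer.

```python
MAX_CONTEXT_TOKENS = 202_752  # maximum stated above (198 * 1024)


def max_completion_tokens(prompt_tokens: int, requested: int) -> int:
    """Clamp a requested completion length to the space left in the window.

    Assumption: prompt tokens and completion tokens count against the same
    context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    available = MAX_CONTEXT_TOKENS - prompt_tokens
    if available <= 0:
        raise ValueError("prompt leaves no room for a completion")
    return min(requested, available)


if __name__ == "__main__":
    # Hypothetical request: a 200,000-token prompt asking for 8,192 tokens.
    print(max_completion_tokens(200_000, 8_192))
```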

Quickstart

Start Building in Minutes

GLM-5.1 is accessible via OpenAI-compatible endpoints. Default sampling: temperature=1.0, top_p=0.95.

Python

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1"
)

# Long-horizon agentic coding task (thinking enabled by default)
response = client.chat.completions.create(
    model="zai-org/GLM-5.1-FP8",
    messages=[
        {"role": "user", "content": "Refactor this repo to use async I/O end-to-end."}
    ],
    temperature=1.0,
    top_p=0.95,
    # To disable step-by-step reasoning:
    # extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

print(response.choices[0].message.content)

💡 Pro Tip

GLM-5.1 is built for long-horizon agentic work, so let it sustain reasoning across planning, tool use, experiments, and iterative debugging. Keep thinking mode on for ambiguous tasks. For latency-sensitive endpoints, disable it with {"chat_template_kwargs": {"enable_thinking": false}}.
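
One way to switch thinking mode per request is to build the call arguments in a small helper. This is a sketch, not an official pattern: `build_chat_kwargs` is a hypothetical name, while the model ID, sampling defaults, and `chat_template_kwargs` payload are taken from this page. The result is meant to be passed to the OpenAI Python client, whose `extra_body` argument adds extra fields to the request body.

```python
def build_chat_kwargs(prompt: str, thinking: bool = True) -> dict:
    """Build keyword arguments for client.chat.completions.create().

    Hypothetical helper: thinking mode is on by default, so the extra
    request-body field is only added when thinking is turned off.
    """
    kwargs = {
        "model": "zai-org/GLM-5.1-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,  # default sampling values from this page
        "top_p": 0.95,
    }
    if not thinking:
        kwargs["extra_body"] = {
            "chat_template_kwargs": {"enable_thinking": False}
        }
    return kwargs


# Usage, assuming `client` is the OpenAI client from the quickstart:
#   fast = client.chat.completions.create(
#       **build_chat_kwargs("Summarize this stack trace.", thinking=False)
#   )
```

Keeping the toggle in one place makes it easy to leave thinking on for ambiguous agentic tasks and turn it off only on latency-sensitive paths.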
