GLM-5.1
Z.ai's next-generation flagship for agentic engineering. State-of-the-art on SWE-Bench Pro, with significantly stronger coding than GLM-5 — built for repository generation, terminal tasks, and long-horizon agentic workflows.
Total Parameters
FP8
Quantized
Context Window
198K
Tokens
Intelligence
51
AA Index v4.0
License
Open
Open Weights
Built for Long-Horizon Agentic Work
GLM-5.1-FP8 is the FP8-quantized release of GLM-5.1, Z.ai's next-generation flagship model for agentic engineering. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo and Terminal-Bench 2.0, which makes it especially strong for real-world coding, repository generation, terminal tasks, and long-horizon agentic workflows.
GLM-5.1 is designed to stay productive over extended sessions: breaking down ambiguous problems, running experiments, reading results, identifying blockers, and improving through repeated iteration. Thinking mode is enabled by default. To disable it, include {"chat_template_kwargs": {"enable_thinking": false}} in the request body.
Flagship Agentic Engineering Model
Best for sustained engineering agents
Agentic Engineering
State-of-the-art on SWE-Bench Pro. Built for repository generation, multi-file edits, and long-horizon coding agents.
Terminal & Tool Use
Leads GLM-5 by a wide margin on Terminal-Bench 2.0 — strong at sustained tool calls, shell workflows, and iterative debugging.
Long-Horizon Reasoning
Stays productive over extended sessions: breaks down ambiguous problems, runs experiments, reads results, identifies blockers, and improves through iteration.
NL2Repo Generation
Top-tier performance on natural-language-to-repository benchmarks — turn specs and prompts into full working codebases.
Frontier Coding & Agentic Performance
Artificial Analysis Intelligence Index v4.0 scores. SOTA on SWE-Bench Pro.
Intelligence Index
Better than 92% of models
GPQA Diamond
Better than 92% of models
τ²-Bench Telecom
Better than 95% of models
| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 84% |
| Reasoning | Humanity's Last Exam | 28% |
| Reasoning | τ²-Bench Telecom | 98% |
| Reasoning | AA-LCR | 62% |
| Reasoning | IFBench | 76% |
| Reasoning | GDPval-AA | 52% |
| Coding | SciCode | 44% |
| Coding | Terminal-Bench Hard | 43% |
| Knowledge | AA-Omniscience Accuracy | 26% |
| Knowledge | AA-Omniscience Non-Hallucination | 71% |
Metrics sourced from Artificial Analysis and Z.ai's published evaluations. Reasoning (thinking) mode enabled.
Flexible Pricing Tiers
Choose the optimal balance of speed and cost for your workflow. Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.70 | $2.20 |
| Async | $1.05 | $3.30 |
| Realtime | $1.40 | $4.40 |
The context window is natively supported up to 198K tokens (202,752 tokens maximum).
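As a rough illustration of how the per-1M-token prices and the context limit translate into a per-request budget, the sketch below estimates cost for each tier and checks a request against the 202,752-token maximum. The helper names `estimate_cost` and `fits_context` are hypothetical and not part of any SDK, and treating prompt and output tokens as sharing one context budget is an assumption rather than something this page states.

```python
# Prices per 1M tokens in USD, copied from the pricing table above.
# Each value is (input price, output price).
PRICES_PER_MILLION = {
    "Standard": (0.70, 2.20),
    "Async": (1.05, 3.30),
    "Realtime": (1.40, 4.40),
}

# Maximum context length quoted above (198 * 1024 tokens).
MAX_CONTEXT_TOKENS = 202_752


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request on the given tier."""
    input_price, output_price = PRICES_PER_MILLION[tier]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


def fits_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Check a request against the context limit.

    Assumption: prompt and generated tokens count against the same budget.
    """
    return prompt_tokens + max_output_tokens <= MAX_CONTEXT_TOKENS


if __name__ == "__main__":
    # Hypothetical request: a large repository prompt with a modest reply.
    for tier in PRICES_PER_MILLION:
        cost = estimate_cost(tier, input_tokens=150_000, output_tokens=8_000)
        print(f"{tier}: ${cost:.4f}")
    print("Fits in context:", fits_context(150_000, 8_000))
```

Under these assumptions, a 150,000-token prompt with 8,000 output tokens costs roughly $0.12 on the Standard tier and about twice that on Realtime, since Realtime prices are double the Standard ones.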
Start Building in Minutes
GLM-5.1 is accessible via OpenAI-compatible endpoints. Default sampling: temperature=1.0, top_p=0.95.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1",
)

# Long-horizon agentic coding task (thinking enabled by default)
response = client.chat.completions.create(
    model="zai-org/GLM-5.1-FP8",
    messages=[
        {"role": "user", "content": "Refactor this repo to use async I/O end-to-end."}
    ],
    temperature=1.0,
    top_p=0.95,
    # To disable step-by-step reasoning:
    # extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```

💡 Pro Tip
GLM-5.1 is built for long-horizon agentic work, so let it sustain reasoning across planning, tool use, experiments, and iterative debugging. Keep thinking mode on for ambiguous tasks. For latency-sensitive endpoints, disable it with {"chat_template_kwargs": {"enable_thinking": false}}.
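One way to apply this tip is to decide per request whether thinking mode stays on. The sketch below builds the keyword arguments for `client.chat.completions.create` and adds the `chat_template_kwargs` override only when thinking is switched off. The `build_request` helper is hypothetical; the model ID, sampling defaults, and override payload are taken from this page.

```python
def build_request(prompt: str, thinking: bool = True) -> dict:
    """Build keyword arguments for client.chat.completions.create.

    Hypothetical helper: thinking mode is on by default, so the override
    is sent only when the caller turns it off.
    """
    kwargs = {
        "model": "zai-org/GLM-5.1-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
    }
    if not thinking:
        # extra_body is merged into the JSON request body by the OpenAI client.
        kwargs["extra_body"] = {
            "chat_template_kwargs": {"enable_thinking": False}
        }
    return kwargs


if __name__ == "__main__":
    # Ambiguous, multi-step task: keep thinking on (the default).
    print(build_request("Find and fix the flaky test in this repo."))
    # Latency-sensitive call: turn thinking off.
    print(build_request("Summarize this diff in one line.", thinking=False))
```

With a configured client, the result can be passed straight through as `client.chat.completions.create(**build_request(prompt, thinking=False))`.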
