Scale Deep Research Agents Without Burning Your API Budget
The Challenge: Scale vs. Cost
Agentic workflows require many rounds of inference. A single autonomous researcher might call tools 10–20 times; a recursive tree of sub-agents easily makes hundreds of LLM calls.
The Doubleword Unlock
Doubleword provides a high-throughput async inference engine built specifically for agentic workloads.
The Result: Process massive, parallel workloads smoothly and predictably, while costing 95% less than real-time inference.
The Economics of Async Agents
Deep Research Workload: "How has the popularity of daffodils evolved over recent centuries?"
A sophisticated system spawns parallel sub-agents to explore distinct angles (cultural history, agricultural data, literary references).
- 50 parallel agents
- 20+ async rounds
- 1.95M input tokens
- 1423 LMArena score (Qwen 235B)
| Provider | Price / 1M tokens (input / output) | Workload total |
|---|---|---|
| Doubleword | $0.15 / $0.55 | $0.34 |
| OpenAI | $2.50 / $10.00 | $5.81 |
| Anthropic | $3.00 / $15.00 | $7.25 |
The Result: You get comparable intelligence and deep reasoning at roughly 17–21× lower cost. A background workload that would cost $5.81–$7.25 on real-time providers costs $0.34 on Doubleword, turning a highly expensive AI feature into a wildly profitable one.
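The totals above follow directly from the per-token prices. A quick sanity check, assuming the 1.95M input tokens listed and roughly 90K output tokens (the output count is not stated; ~90K is what the published totals imply):

```python
# Reproduce each provider's workload total from per-token prices.
# Prices are USD per 1M tokens. Token counts are assumptions:
# 1.95M input tokens (from the stats above) and ~0.09M output
# tokens, inferred from the published totals.
INPUT_M = 1.95   # millions of input tokens
OUTPUT_M = 0.09  # millions of output tokens (assumed)

prices = {
    "Doubleword": (0.15, 0.55),
    "OpenAI":     (2.50, 10.00),
    "Anthropic":  (3.00, 15.00),
}

totals = {
    provider: INPUT_M * p_in + OUTPUT_M * p_out
    for provider, (p_in, p_out) in prices.items()
}

for provider, total in totals.items():
    print(f"{provider:10s} ${total:.2f}")
```

Input tokens dominate the bill here, which is typical for research agents that read far more than they write.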
Wide Trees Beat Deep Trees
When you shift from real-time to async infrastructure, the engineering paradigm changes. Because compute is cheap but every round waits in a background queue, the goal is to make your agent tree as wide as possible so the maximum amount of work happens in parallel.
Our recommended architecture uses a Search-First, Delegated-Thinking approach:
Pre-Loaded Context
When a root agent spawns sub-agents, it executes a web search immediately and injects the results into the sub-agent's creation prompt.
Instant Parallelism
Instead of an agent wasting its first async round deciding to search, waiting, and then reading, it starts reading and analyzing immediately upon creation.
Breadth over Depth
Spawning 5 sub-agents that each complete in 2 async rounds is vastly faster than 1 agent doing 10 sequential search-and-read cycles.
All sub-agents across all branches of the tree are enqueued together. Doubleword's high-throughput backend processes them concurrently, allowing massive map-reduce workloads to resolve simultaneously.
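The search-first pattern above can be sketched in a few lines. This is a minimal illustration, not Doubleword's API: `web_search` and `enqueue_completion` are hypothetical helpers (stubbed here) standing in for a real search API and the async enqueue call.

```python
import concurrent.futures

def web_search(query: str) -> str:
    # Hypothetical search helper, stubbed for illustration. In production
    # this would call a real search API and return concatenated snippets.
    return f"[snippets for: {query}]"

def enqueue_completion(prompt: str) -> str:
    # Hypothetical wrapper around an async inference API, stubbed here.
    # In production it would POST the request and return the queued job id.
    return f"job-{hash(prompt) & 0xFFFF:04x}"

def spawn_subagents(sub_queries: list[str]) -> list[str]:
    # Search-first: the root agent runs every web search up front, in
    # parallel, so no sub-agent wastes its first async round deciding
    # to search and waiting for results.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        results = list(pool.map(web_search, sub_queries))

    # Pre-loaded context: inject the search results directly into each
    # sub-agent's creation prompt, then enqueue all of them together.
    job_ids = []
    for query, snippets in zip(sub_queries, results):
        prompt = (
            f"Research task: {query}\n\n"
            f"Search results (pre-loaded by the root agent):\n{snippets}\n\n"
            "Analyze these sources and report your findings."
        )
        job_ids.append(enqueue_completion(prompt))
    return job_ids
```

Because every sub-agent starts with its reading material already in context, each branch typically finishes in two async rounds (analyze, then report) instead of a long search-wait-read chain.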
How the Async Loop Works
Instead of keeping an HTTP connection open, your orchestrator simply manages state and listens for Doubleword's completion webhooks:
Dispatch
The root agent breaks the topic into sub-queries and enqueues payload requests for 5 new sub-agents to the Doubleword API.
Decouple
Your application server parks that agent's state and moves on. No connections are held open.
Process
Doubleword executes the massive parallel workload in the background via our high-throughput queues.
Resolve
Upon completion, Doubleword hits your webhook. The parent agent ingests findings from all 5 sub-agents simultaneously, synthesizes the final report, and completes the workload.
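The four steps above amount to a small state machine. Here is a minimal in-memory sketch, with assumed names throughout (`dispatch`, `handle_webhook`, the `job-N` ids): in production `handle_webhook` would be an HTTP route that Doubleword calls, and the state would live in a database rather than instance attributes.

```python
class Orchestrator:
    """Sketch of the dispatch / decouple / process / resolve loop.
    All names here are illustrative, not Doubleword's actual API."""

    def __init__(self):
        self.pending = {}   # job_id -> sub-query still in flight
        self.findings = {}  # job_id -> completed sub-agent result
        self.report = None  # final synthesized report

    def dispatch(self, sub_queries: list[str]) -> list[str]:
        # Dispatch: enqueue one async request per sub-agent, then return
        # immediately. Decouple: nothing blocks, no connection stays open.
        for i, query in enumerate(sub_queries):
            job_id = f"job-{i}"  # in reality, returned by the enqueue API
            self.pending[job_id] = query
        return list(self.pending)

    def handle_webhook(self, job_id: str, result: str):
        # Resolve: called once per job when the backend finishes processing.
        self.findings[job_id] = result
        del self.pending[job_id]
        if not self.pending:
            # Every sub-agent has reported: the parent ingests all findings
            # at once and synthesizes the final report.
            self.report = "\n".join(self.findings.values())
        return self.report
```

The key property is that the orchestrator does no work between dispatch and the final webhook; all the heavy lifting happens in the background queue.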
