Doubleword
    Use Case: Async Agents

    Scale Deep Research Agents Without Burning Your API Budget

    The Challenge: Scale vs. Cost

    Agentic workflows require enormous volumes of inference. A single autonomous researcher might call tools 10–20 times; a recursive tree of sub-agents easily hits hundreds of LLM calls.

    Real-time APIs: Destroy your unit economics at scale.
    Standard Async: Cheap, but impossibly slow. A deep agent tree might take weeks to resolve.

    The Doubleword Unlock

    Doubleword provides a high-throughput async inference engine built specifically for agentic workloads.

    The Result: Process massive, parallel workloads smoothly and predictably, while costing 95% less than real-time inference.

    📊 Case Study

    The Economics of Async Agents

    Deep Research Workload: "How has the popularity of daffodils evolved over recent centuries?"

    A sophisticated system spawns parallel sub-agents to explore distinct angles (cultural history, agricultural data, literary references).

    50

    Parallel Agents

    20+

    Async Rounds

    1.95M

    Input Tokens

    1423

    LMArena Score (Qwen 235B)

    Provider      Cost / 1M tokens (In / Out)    Total
    Doubleword    $0.15 / $0.55                  $0.34
    OpenAI        $2.50 / $10.00                 $5.81
    Anthropic     $3.00 / $15.00                 $7.25

    The Result: You get comparable intelligence and deep reasoning at 17–21× lower cost. A background workload that would normally cost over $7 now costs $0.34, turning a highly expensive AI feature into a wildly profitable one.
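The table's totals can be sanity-checked with simple per-token arithmetic. The input token count (1.95M) comes from the case study above; the output token count is not stated in the article, so the ~90k figure below is an assumption chosen because it roughly reproduces the listed totals.

```python
# Back-of-envelope check of the pricing table.
INPUT_TOKENS = 1_950_000   # from the case study
OUTPUT_TOKENS = 90_000     # assumed: not stated in the article

RATES = {  # $ per 1M tokens (input, output), from the table
    "Doubleword": (0.15, 0.55),
    "OpenAI": (2.50, 10.00),
    "Anthropic": (3.00, 15.00),
}

def total_cost(rate_in: float, rate_out: float) -> float:
    """Total $ cost for the workload at the given per-1M-token rates."""
    return INPUT_TOKENS / 1e6 * rate_in + OUTPUT_TOKENS / 1e6 * rate_out

for provider, (rin, rout) in RATES.items():
    print(f"{provider:10s} ${total_cost(rin, rout):.2f}")
```

The small residual differences from the table are within rounding of the assumed output token count.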

    Wide Trees Beat Deep Trees

    When you shift from real-time to async infrastructure, the engineering paradigm changes. Because compute is cheap but each async round takes wall-clock time, the goal is to make your agent tree as wide as possible so the maximum amount of work happens in parallel.

    Our recommended architecture utilizes a Search-First, Delegated-Thinking approach:

    01

    Pre-Loaded Context

    When a root agent spawns sub-agents, it executes a web search immediately and injects the results into the sub-agent's creation prompt.

    02

    Instant Parallelism

    Instead of an agent wasting its first async round deciding to search, waiting, and then reading, it starts reading and analyzing immediately upon creation.

    03

    Breadth over Depth

    Spawning 5 sub-agents that each complete in 2 async rounds is vastly faster than 1 agent doing 10 sequential search-and-read cycles.

    All sub-agents across all branches of the tree are enqueued together. Doubleword's high-throughput backend processes them concurrently, allowing massive map-reduce workloads to resolve simultaneously.
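The Search-First pattern above can be sketched in a few lines. Note that `web_search` and `enqueue_agent` are hypothetical placeholder helpers standing in for your search tool and the Doubleword async API client; they are not actual Doubleword SDK calls.

```python
def web_search(query: str) -> str:
    """Placeholder search tool; a real system would call a search API."""
    return f"[top results for: {query}]"

def enqueue_agent(prompt: str) -> str:
    """Placeholder for a non-blocking async enqueue; returns a job id."""
    return f"job-{abs(hash(prompt)) % 10_000}"

def spawn_sub_agents(root_topic: str, angles: list[str]) -> list[str]:
    """Fan out one sub-agent per angle, each pre-loaded with search results."""
    job_ids = []
    for angle in angles:
        # The root agent searches *before* spawning, so each sub-agent's
        # first async round is spent reading and analyzing, not searching.
        results = web_search(f"{root_topic} {angle}")
        prompt = (
            f"Research '{root_topic}' from the angle of {angle}.\n"
            f"Initial search results:\n{results}"
        )
        job_ids.append(enqueue_agent(prompt))  # enqueue and move on
    return job_ids
```

Because every enqueue returns immediately, all sub-agents across every angle land in the queue together and resolve in parallel.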

    How the Async Loop Works

    Instead of keeping an HTTP connection open, your orchestrator simply manages state and listens for Doubleword's completion webhooks:

    01

    Dispatch

    The root agent breaks the topic into sub-queries and enqueues requests for 5 new sub-agents to the Doubleword API.

    02

    Decouple

    Your application server pauses that specific thread. No connections are held open.

    03

    Process

    Doubleword executes the massive parallel workload in the background via our high-throughput queues.

    04

    Resolve

    Upon completion, Doubleword hits your webhook. The parent agent ingests findings from all 5 sub-agents simultaneously, synthesizes the final report, and completes the workload.
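The four steps above amount to a small state machine. The sketch below is a minimal illustration, not Doubleword's actual SDK: `enqueue_agent` is a hypothetical stand-in for the async enqueue call, and `handle_webhook` is the function you would wire to your webhook route.

```python
from itertools import count

_job_ids = count(1)

def enqueue_agent(query: str) -> str:
    """Placeholder for the Doubleword async enqueue call (hypothetical)."""
    return f"job-{next(_job_ids)}"

pending: dict[str, dict] = {}    # job_id -> {"parent": parent_id}
findings: dict[str, list] = {}   # parent_id -> collected sub-agent results

def dispatch(parent_id: str, sub_queries: list[str]) -> None:
    """Step 1-2: enqueue one sub-agent per sub-query, then return at once."""
    findings[parent_id] = []
    for query in sub_queries:
        job_id = enqueue_agent(query)  # non-blocking
        pending[job_id] = {"parent": parent_id}
    # No connection is held open: the thread simply exits here.

def handle_webhook(job_id: str, result: str) -> None:
    """Step 4: called once per finished job by the completion webhook."""
    parent_id = pending.pop(job_id)["parent"]
    findings[parent_id].append(result)
    remaining = sum(1 for j in pending.values() if j["parent"] == parent_id)
    if remaining == 0:  # all siblings done: parent ingests everything at once
        synthesize(parent_id, findings.pop(parent_id))

def synthesize(parent_id: str, results: list[str]) -> None:
    """Parent agent merges all findings into the final report."""
    print(f"{parent_id}: synthesizing {len(results)} sub-agent reports")
```

The key design point is that the orchestrator holds only a small amount of bookkeeping state between rounds; all heavy inference runs in Doubleword's background queues.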

    Ready to build your own Async Agents?

    Don't let real-time API costs restrict the depth of your agentic workflows. Shift your heavy map-reduce tasks to the background.