Question 1

Why is it cheaper?

Accepted Answer

We're not running at a loss or burning VC cash. Doubleword has built an inference stack optimized for high throughput and low cost from the ground up. By optimizing at every layer—hardware, runtime, and orchestration—we achieve significantly better unit economics than providers who bolt batch features onto real-time infrastructure.

You can learn more about our inference optimization work at blog.doubleword.ai.

Question 2

What happens if you can't meet the SLA?

Accepted Answer

We guarantee delivery. Unlike other providers who may expire your batch, we commit to your chosen SLA. In the unlikely event that we fail to meet it, we won't expire your request, and you won't be charged for any requests that are late.

Question 3

Is it secure?

Accepted Answer

Yes. Message us for our detailed security policies, and you can read our publicly available data usage policy. You have full control of your data and can delete it as soon as your batch is done. Customers can also choose to enter into a DPA with us. We're in the process of finalising our SOC 2 and ISO 27001 certifications, and we already adhere to these standards. We take security very seriously, even if we're sometimes unserious elsewhere.

Question 4

What if a run fails?

Accepted Answer

We retry and recover automatically. If something truly breaks, you'll know immediately, and you won't lose your work. Partial results are saved, and you can resume from where things left off.

Question 5

Which models are supported?

Accepted Answer

We currently support popular open-source LLMs and embedding models of various sizes. We also support deployment of open-source models. If we don't have the model you're looking for, let us know which models matter most to you!

Question 6

Can I use this to build multi-step agents?

Accepted Answer

Yes. The Doubleword Async API is built for long-running agents. It is interactive enough to support multi-turn agents, it supports tool calling and structured generation, and it is optimized for efficiency so your agent can reason for longer for less.

Question 7

Will I be rate limited?

Accepted Answer

Because we optimize our inference stack for throughput, we can offer much higher rate limits than other providers, especially on our async inference tier.

Question 8

How long do you store my data?

Accepted Answer

We keep it simple and put you in control.
1) On-demand deletion: You can delete your outputs whenever you want using the CLI, the API, or directly in the app.doubleword.ai dashboard.
2) Convenient retention: We retain your results just long enough for you to fetch and refetch them as needed, saving you from having to rerun the same requests.
3) Zero training: We don't use your data to train models.

Making tokens too cheap to meter

The largest volume of tokens comes from asynchronous AI workloads.

Background Agents

Batch Processing

Evaluations

Data Enrichment

Synthetic Data

Offline Jobs

Doubleword's APIs are the most efficient for every SLA

Realtime

Async

Batch

Same Intelligence. Fraction of the price.

Built for your highest volume use cases

Async Agents

Classification

Data Processing

Data Enrichment

Embeddings

Image Processing

Model Evals

Structured Generation

Synthetic Data

Seen in the wild

Questions, answered honestly

Stop overpaying for inference.
