Doubleword
    Customer Stories

    Real teams. Real savings.

    See how teams use Doubleword to run workloads that would be prohibitively expensive at real-time API rates.

    Doubleword × OpenMed × Hugging Face

    119,000 Medical Images Annotated for $452.
    Claude Sonnet 4.6 Would Have Cost $7,487.

    How OpenMed used Doubleword to make frontier-model knowledge distillation viable at dataset scale — and what it means for anyone building with synthetic data.

    Saved vs Anthropic

    94%

    119K medical images annotated with two frontier VLMs, cross-validated at 93% agreement, producing 110K training records — for $452.58 total.

    119K

    Images annotated

    93%

    Cross-validation agreement

    110K

    Training records

    +15%

    Exact match improvement

    The Challenge

    Medical VQA datasets are small (VQA-RAD has just 314 training samples), narrow in coverage, and often restrictively licensed. Frontier VLMs can produce clinical analyses but cost $10–$50 per 1,000 images at real-time rates. Small 2–3B models are deployable but lack medical knowledge. Knowledge distillation at 119K images would be prohibitively expensive.

    The Solution

    By routing the entire annotation pipeline through Doubleword's async inference API, OpenMed ran two full annotation passes and two cross-validation passes over 119,137 images using Qwen3.5-397B and Kimi-K2.5 (1T parameters). The OpenAI-compatible API required no pipeline changes; only the endpoint changed.
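Because the API is OpenAI-compatible, switching providers amounts to pointing the same request at a different base URL. The sketch below shows what one annotation request might look like; the base URL, model identifier, and prompt are illustrative assumptions, not Doubleword's or OpenMed's actual values.

```python
import json

# Only this line changes when moving between OpenAI-compatible providers.
# (Hypothetical URL for illustration.)
BASE_URL = "https://api.example-provider.com/v1"

def build_annotation_request(model: str, image_url: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload asking a VLM to
    annotate a single image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_annotation_request(
    model="qwen3.5-397b",  # hypothetical model identifier
    image_url="https://example.org/scan_0001.png",
    prompt="Describe the clinically relevant findings in this image.",
)
print(json.dumps(payload, indent=2))
```

The payload format is the standard chat-completions shape, which is why an existing pipeline built against the OpenAI API can be redirected without code changes beyond the endpoint.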

    "The Doubleword team worked with us on batch annotation at scale. Their API made it economically viable to run two full annotation passes plus two cross-validation passes over 119K images with frontier reasoning models."

    — Maziyar Panahi, Founder of OpenMed

    Cost Breakdown

    Model                       Provider                      Total Cost   vs Doubleword
    Qwen3.5-397B + Kimi-K2.5    Doubleword                    $452.58      baseline
    Qwen3.5-397B + Kimi-K2.5    Alibaba Cloud + Moonshot AI   $1,393.39    3.1× more
    Gemini 3 Flash              Google                        $1,486.00    3.3× more
    GPT-5                       OpenAI                        $4,909.00    10.8× more
    Claude Sonnet 4.6           Anthropic                     $7,487.00    16.5× more
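The multiples in the last column and the headline 94% saving both follow directly from the dollar figures above. A quick check of the arithmetic:

```python
# Recompute the "vs Doubleword" multiples and the headline saving from the
# table's dollar figures (all amounts in USD).
doubleword = 452.58
providers = {
    "Alibaba Cloud + Moonshot AI": 1393.39,
    "Google (Gemini 3 Flash)": 1486.00,
    "OpenAI (GPT-5)": 4909.00,
    "Anthropic (Claude Sonnet 4.6)": 7487.00,
}

# Cost relative to Doubleword, rounded to one decimal place.
multiples = {name: round(cost / doubleword, 1) for name, cost in providers.items()}

# Saving versus the most expensive option (Claude Sonnet 4.6).
saving_vs_anthropic = 1 - doubleword / providers["Anthropic (Claude Sonnet 4.6)"]

print(multiples)                     # 3.1, 3.3, 10.8, 16.5
print(f"{saving_vs_anthropic:.0%}")  # 94%
```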

    The Result

    110,741 validated medical VQA records, open-sourced in full: datasets, model adapters, and code. Fine-tuning three small model families (2–3B parameters) on the synthetic dataset improved benchmarks across every model and every task. Best result: +15.0% average exact match improvement on Qwen3.5-2B.
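The 93% cross-validation agreement comes from comparing two independent annotation passes and keeping only the records where they agree. A minimal sketch of that filter; the normalization rule and data shapes are illustrative assumptions, not OpenMed's actual pipeline:

```python
def normalize(answer: str) -> str:
    """Illustrative normalization: lowercase and collapse whitespace."""
    return " ".join(answer.lower().split())

def cross_validate(pass_a: list[str], pass_b: list[str]) -> tuple[list[int], float]:
    """Return the indices where both annotation passes agree (after
    normalization) and the overall agreement rate."""
    agreed = [
        i for i, (a, b) in enumerate(zip(pass_a, pass_b))
        if normalize(a) == normalize(b)
    ]
    return agreed, len(agreed) / len(pass_a)

# Toy example: two passes over four images.
pass_a = ["Pneumonia", "no acute findings", "Cardiomegaly", "pleural effusion"]
pass_b = ["pneumonia", "No acute findings", "normal", "Pleural  effusion"]

kept, rate = cross_validate(pass_a, pass_b)
print(kept, rate)  # [0, 1, 3] 0.75
```

Applying this kind of agreement filter to the 119K annotated images is how a dataset of 110K validated training records emerges from the raw annotation passes.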

    Why It Matters

    Synthetic data generation and large-scale annotation are among the most cost-sensitive workloads in AI. They are high-volume, non-time-sensitive, and directly constrained by inference budget. At real-time API prices, only well-funded labs can annotate at the scale needed to produce useful training data. Async inference removes that constraint entirely — and makes the entire pipeline reproducible by anyone.