About

What becomes possible when inference is 100x cheaper?

That's the question we're building toward, and the engineering problem we're solving every day.

Who we are

Doubleword is a London-based team of researchers and systems engineers obsessed with inference efficiency. We've built our own inference engines, published research, optimized kernels in CUDA and Triton, and deployed production infrastructure inside regulated enterprises. We're interested in what becomes possible when inference is 100x cheaper, and we put all of our effort into making that a reality.

We're hiring

We're looking for engineers who care deeply about performance and want to work on problems at the intersection of systems engineering and inference infrastructure.

See open roles

Ready to run inference at scale?

Run a sample job
Read the Docs