LightOnOCR-2-1B
Efficient 1B-parameter end-to-end OCR — state-of-the-art on OlmOCR-Bench while being ~9× smaller than competing approaches.
Provider: LightOn
Parameters: 1B (compact)
Context Window: 16K tokens
Released: Jan 2026
Compact, State-of-the-Art OCR
LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization.
Merged bbox variant: This model merges a checkpoint tuned with OCR-improving RLVR (reinforcement learning with verifiable rewards) signals with one tuned with bounding-box-focused RLVR updates, preserving OCR quality while also providing image localization.
Built for document intelligence
Document Conversion
Convert PDFs, scans, and images into clean, naturally ordered text without brittle OCR pipelines — end-to-end in a single model.
High-Speed Processing
At just 1B parameters, LightOnOCR-2 is ~9× smaller and significantly faster than competing approaches while achieving state-of-the-art OlmOCR-Bench scores.
Multilingual & LaTeX
Strong French, arXiv, and scan coverage with improved LaTeX handling and cleaner text normalization across multiple languages.
Flexible Pricing Tiers
Prices are per 1M tokens.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $0.05 | $0.05 |
| Async | $0.08 | $0.08 |
The context window natively supports up to 16K tokens.
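As a rough illustration of how the per-1M-token prices translate into a bill, the sketch below multiplies token counts by the rates in the table. The function name `estimate_cost` and the token counts in the usage comment are hypothetical; actual token usage per page depends on the document and is not stated here.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES_PER_MILLION = {
    "standard": {"input": 0.05, "output": 0.05},
    "async": {"input": 0.08, "output": 0.08},
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one tier (hypothetical helper)."""
    prices = PRICES_PER_MILLION[tier.lower()]
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Example with made-up token counts for a batch of pages:
# estimate_cost("standard", input_tokens=2_000_000, output_tokens=1_000_000)
```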
Getting the Best Results
No System Prompt Needed
Do not include a system prompt or any text prompt in the user message: the model tends to repeat prompts in its answer. Send only the image.
payload = {
    "model": "lightonai/LightOnOCR-2-1B-bbox-soup",
    "messages": [{
        "role": "user",
        # Image only: no system message and no text part.
        # image_base64 is the base64-encoded PNG of a single page.
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"}
        }]
    }],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
}
Rendering & Preprocessing
Render PDFs to PNG or JPEG at a target longest dimension of 1540px. Maintain aspect ratio to preserve text geometry. Use one image per page.
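The sizing rule above can be sketched as a small calculation. The helper names `target_size` and `render_zoom` are hypothetical, and the code only computes dimensions; the actual rendering is left to whichever PDF or image library is in use.

```python
TARGET_LONGEST_PX = 1540  # target longest dimension from the guidance above


def target_size(width: int, height: int, longest: int = TARGET_LONGEST_PX) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `longest`, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest_side = max(width, height)
    new_width = max(1, round(width * longest / longest_side))
    new_height = max(1, round(height * longest / longest_side))
    return new_width, new_height


def render_zoom(page_width_pt: float, page_height_pt: float,
                longest: int = TARGET_LONGEST_PX) -> float:
    """Zoom factor that maps a PDF page measured in points to the target pixel size."""
    if page_width_pt <= 0 or page_height_pt <= 0:
        raise ValueError("page dimensions must be positive")
    return longest / max(page_width_pt, page_height_pt)
```

For a US Letter page (612 × 792 pt), `target_size(612, 792)` gives a 1190 × 1540 pixel image. The resulting size could be passed to an image library's resize call (for example Pillow's `Image.resize`), and the zoom factor to a PDF renderer that accepts a scale.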
Start Building in Minutes
LightOnOCR-2 is accessible via OpenAI-compatible endpoints.
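The batch example that follows uploads a file named `batch_requests.jsonl`, whose contents are not shown. A minimal sketch of generating it is given here, assuming the endpoint accepts the OpenAI Batch input format (one JSON object per line with `custom_id`, `method`, `url`, and `body`) and reusing the image-only payload from "No System Prompt Needed". The helper names `build_batch_line` and `write_batch_file` are hypothetical.

```python
import base64
import json

# Model name and sampling settings taken from the payload example above.
MODEL = "lightonai/LightOnOCR-2-1B-bbox-soup"


def build_batch_line(custom_id: str, png_bytes: bytes) -> str:
    """Return one JSONL line holding an image-only chat completion request."""
    image_base64 = base64.b64encode(png_bytes).decode("ascii")
    request = {
        "custom_id": custom_id,  # caller-chosen ID used to match results to pages
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": MODEL,
            "messages": [{
                "role": "user",
                # No system prompt and no text part, only the page image.
                "content": [{
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"},
                }],
            }],
            "max_tokens": 4096,
            "temperature": 0.2,
            "top_p": 0.9,
        },
    }
    return json.dumps(request)


def write_batch_file(pages: dict[str, bytes], path: str) -> int:
    """Write one request per page image to `path`; return the number of lines written."""
    with open(path, "w", encoding="utf-8") as f:
        for custom_id, png_bytes in pages.items():
            f.write(build_batch_line(custom_id, png_bytes) + "\n")
    return len(pages)
```

Using one request per page keeps each `custom_id` tied to a single image, which matches the one-image-per-page guidance.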
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1"
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch"
    )
print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
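The status check above runs once. A batch typically takes a while to finish, so a polling loop is a natural next step. The sketch below assumes the endpoint reports the same status strings as the OpenAI Batch API (`completed`, `failed`, `expired`, `cancelled` as terminal states); the helper name `wait_for_batch` and its parameters are hypothetical.

```python
import time

# Terminal statuses as defined by the OpenAI Batch API (assumed to apply here).
TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id: str, poll_seconds: float = 30.0,
                   timeout_seconds: float = 24 * 3600, sleep=time.sleep):
    """Poll client.batches.retrieve until the batch reaches a terminal status.

    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    waited = 0.0
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATES:
            return batch
        if waited >= timeout_seconds:
            raise TimeoutError(f"batch {batch_id} still {batch.status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the client and batch from the example above:
# finished = wait_for_batch(client, batch.id)
# if finished.status == "completed":
#     # Assumes results are exposed through the files API, as in the OpenAI SDK.
#     results_jsonl = client.files.content(finished.output_file_id).text
```

Each line of the results file would then carry the `custom_id` of the request it answers, which is how pages can be matched back to their OCR output.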