olmOCR-2-7B

Fine-tuned from Qwen2.5-VL-7B and further trained with GRPO reinforcement learning to improve OCR of math equations, tables, and other difficult document content.

Provider

Ai2

Allen Institute for AI

Context Window

16K

Tokens

Type

Generation

OCR

Released

Oct 2025

About

RL-Enhanced Document OCR

olmOCR-2-7B is a release of the olmOCR model, fine-tuned from Qwen2.5-VL-7B-Instruct on the olmOCR-mix-1025 dataset. It was then further trained with GRPO reinforcement learning to improve its performance on math equations, tables, and other difficult OCR cases. The model outputs naturally reading plain text, with LaTeX for equations and HTML for tables.

Use Cases

Built for document intelligence

Natural Document Reading

Returns plain text as if reading the document naturally — ideal for search, summarization, and content extraction pipelines.

Math & Table Extraction

Fine-tuned with GRPO RL training to excel at math equations, complex tables, and other tricky OCR edge cases.

Figure & Chart Detection

Automatically labels figures and charts with descriptive alt text and bounding coordinates for downstream processing.
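
For downstream processing, the figure labels can be pulled out of the model output with a regular expression. The sketch below assumes labels follow the `![Alt text](page_startx_starty_width_height.png)` pattern from the default prompt and that the four values are non-negative integers; the function name `extract_figures` is hypothetical.

```python
import re

# Matches ![alt](page_<startx>_<starty>_<width>_<height>.png); integer coordinates are an assumption.
FIGURE_PATTERN = re.compile(
    r"!\[(?P<alt>[^\]]*)\]\(page_(?P<x>\d+)_(?P<y>\d+)_(?P<w>\d+)_(?P<h>\d+)\.png\)"
)

def extract_figures(markdown_text: str) -> list[dict]:
    """Return one dict per figure label found in the model output."""
    figures = []
    for match in FIGURE_PATTERN.finditer(markdown_text):
        figures.append({
            "alt": match.group("alt"),
            "x": int(match.group("x")),
            "y": int(match.group("y")),
            "width": int(match.group("w")),
            "height": int(match.group("h")),
        })
    return figures
```

Labels that do not match the assumed pattern are skipped rather than raising an error.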

Pricing

Flexible Pricing Tiers

Prices are per 1M tokens.

Tier	Input / 1M tokens	Output / 1M tokens
Standard	$0.10	$0.10
Async	$0.15	$0.15

Context window natively supported up to 16k tokens.
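
As a rough illustration of the table above, the cost of a workload is its token count divided by one million, multiplied by the per-million rate, summed over input and output. The sketch below uses the listed rates; the helper name `estimate_cost` and the token counts in the usage note are hypothetical.

```python
# Per-1M-token rates in USD, copied from the pricing table above.
RATES = {
    "standard": {"input": 0.10, "output": 0.10},
    "async": {"input": 0.15, "output": 0.15},
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a workload on the given tier (hypothetical helper)."""
    rate = RATES[tier]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000
```

For example, `estimate_cost("standard", 2_000_000, 1_000_000)` prices two million input tokens and one million output tokens at the Standard rates.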

Usage Tips

Getting the Best Results

Default Prompt

This model expects a prompt alongside the image. The default prompt asks the model to convert equations to LaTeX and tables to HTML, to label figures with markdown image syntax, and to begin its output with a front matter section.

prompt = """Attached is one page of a document that you must process.
Just return the plain text representation of this document as if
you were reading it naturally. Convert equations to LateX and
tables to HTML.
If there are any figures or charts, label them with the following
markdown syntax ![Alt text](page_startx_starty_width_height.png)
Return your output as markdown, with a front matter section on top
specifying values for the primary_language, is_rotation_valid,
rotation_correction, is_table, and is_diagram parameters"""

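import base64

# "page.png" is a placeholder path for the rendered page image.
with open("page.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

messages = [{"role": "user", "content": [
    {"type": "text", "text": prompt},
    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}}
]}]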
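
Because the default prompt requests a front matter section at the top of the output, it can be useful to separate those fields from the page text. The following is a minimal sketch, assuming the front matter is a block of `key: value` lines between two `---` lines; that delimiter and the helper name `split_front_matter` are assumptions, not something this page specifies.

```python
def split_front_matter(output: str) -> tuple[dict[str, str], str]:
    """Split model output into (front matter fields, markdown body).

    Assumes `key: value` lines between two `---` lines at the top of the
    output. Values are kept as strings; no type conversion is attempted.
    """
    lines = output.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, output  # no front matter found
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return fields, body
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, output  # opening delimiter without a closing one
```

If the output does not start with the assumed delimiter, the function returns an empty dict and the text unchanged, so callers can fall back to treating the whole output as page content.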

Image Processing

This model expects a single document image as input, rendered such that the longest dimension is 1288 pixels. Maintain aspect ratio for best results.
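
The target size for a page can be computed before rendering or resizing. The sketch below scales a page so its longest side is 1288 pixels while keeping the aspect ratio; the helper name `fit_longest_side` is hypothetical, and rounding to the nearest pixel is an assumption.

```python
def fit_longest_side(width: int, height: int, target: int = 1288) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals `target`, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / max(width, height)
    # Round to whole pixels and never let the short side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting tuple can be passed to whatever image library renders the page, for instance as the size argument to Pillow's `Image.resize`.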

Quickstart

Start Building in Minutes

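olmOCR-2-7B is accessible via OpenAI-compatible endpoints. The example below uses the OpenAI Python client to upload a batch input file, create a batch job, and check its status.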

Python

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.doubleword.ai/v1"
)

# Step 1: Upload a batch input file
with open("batch_requests.jsonl", "rb") as file:
    batch_file = client.files.create(
        file=file,
        purpose="batch"
    )

print(f"File ID: {batch_file.id}")

# Step 2: Create a batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch ID: {batch.id}")

# Step 3: Check batch status
batch_status = client.batches.retrieve(batch.id)
print(f"Status: {batch_status.status}")
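
The quickstart uploads `batch_requests.jsonl` without showing how it is built. The sketch below writes one request per line, assuming the endpoint accepts the OpenAI Batch API input format (`custom_id`, `method`, `url`, `body`). The helper names and the `model` argument are hypothetical; check the provider documentation for the exact model identifier.

```python
import json

def build_batch_line(custom_id: str, model: str, prompt: str, image_base64: str) -> str:
    """Return one JSONL line describing a chat completion request for a page image."""
    request = {
        "custom_id": custom_id,  # caller-chosen ID used to match results to pages
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,  # assumption: the provider's identifier for olmOCR-2-7B
            "messages": [{"role": "user", "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
            ]}],
        },
    }
    return json.dumps(request)

def write_batch_file(path: str, lines: list[str]) -> None:
    """Write the prepared request lines to a JSONL file, one request per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

Giving each page its own `custom_id` makes it possible to match batch results back to source pages once the job completes.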
