Doubleword
    OCR
    Vision-Language
    GRPO RL
    Open Weights
    New

    olmOCR-2-7B

    Fine-tuned from Qwen2.5-VL-7B with GRPO RL training to improve OCR of math equations, tables, and other difficult document content.

    Provider

    Ai2

    Allen Institute for AI

    Context Window

    16K

    Tokens

    Type

    Generation

    OCR

    Released

    Oct 2025

    About

    RL-Enhanced Document OCR

    olmOCR-2-7B is a release of the olmOCR model fine-tuned from Qwen2.5-VL-7B-Instruct on the olmOCR-mix-1025 dataset. It has been further fine-tuned with GRPO RL training to improve its performance on math equations, tables, and other tricky OCR cases. The model outputs naturally reading plain text, with LaTeX for equations and HTML for tables.

    Use Cases

    Built for document intelligence

    Natural Document Reading

    Returns plain text as if reading the document naturally, which suits search, summarization, and content extraction pipelines.

    Math & Table Extraction

    Fine-tuned with GRPO RL training to excel at math equations, complex tables, and other tricky OCR edge cases.

    Figure & Chart Detection

    Automatically labels figures and charts with descriptive alt text and bounding coordinates for downstream processing.
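
    Figure labels can be pulled out of the model's output for downstream processing. The sketch below assumes the labels follow the ![Alt text](page_startx_starty_width_height.png) syntax requested by the default prompt under Usage Tips, and that the four fields are plain integers. The function name find_figures is hypothetical.

```python
import re

# Matches ![alt](page_<startx>_<starty>_<width>_<height>.png).
# Assumption: all four numeric fields are non-negative integers.
FIGURE_PATTERN = re.compile(r"!\[([^\]]*)\]\(page_(\d+)_(\d+)_(\d+)_(\d+)\.png\)")


def find_figures(markdown):
    """Return one dict per figure label found in the model output."""
    figures = []
    for match in FIGURE_PATTERN.finditer(markdown):
        alt, x, y, width, height = match.groups()
        figures.append({
            "alt": alt,
            "x": int(x),
            "y": int(y),
            "width": int(width),
            "height": int(height),
        })
    return figures


sample = "Intro text\n![Bar chart of revenue](page_100_200_300_150.png)\nMore text"
print(find_figures(sample))
```

    The units of the coordinates are not stated on this page, so treat them as relative to the rendered page image until verified.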

    Pricing

    Flexible Pricing Tiers

    Prices are per 1M tokens.

    Tier        Input / 1M tokens    Output / 1M tokens
    Standard    $0.10                $0.10
    Async       $0.15                $0.15

    The model natively supports a context window of up to 16K tokens.
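
    To budget a workload, multiply token counts by the per-1M prices listed above. The following is a minimal sketch of that arithmetic; the function name estimate_cost and the example token counts are illustrative, not part of any Doubleword API.

```python
# Per-1M-token prices in USD, copied from the pricing tiers on this page.
PRICES_PER_MILLION = {
    "Standard": {"input": 0.10, "output": 0.10},
    "Async": {"input": 0.15, "output": 0.15},
}


def estimate_cost(tier, input_tokens, output_tokens):
    """Return the estimated cost in USD for the given token counts."""
    price = PRICES_PER_MILLION[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
print(f"${estimate_cost('Standard', 2_000_000, 500_000):.2f}")
```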

    Usage Tips

    Getting the Best Results

    Default Prompt

    This model expects a prompt alongside the image. The default prompt asks for equations in LaTeX, tables in HTML, figures labeled with markdown image syntax, and a front matter section with page metadata.

    import base64

    prompt = """Attached is one page of a document that you must process.
    Just return the plain text representation of this document as if
    you were reading it naturally. Convert equations to LateX and
    tables to HTML.
    If there are any figures or charts, label them with the following
    markdown syntax ![Alt text](page_startx_starty_width_height.png)
    Return your output as markdown, with a front matter section on top
    specifying values for the primary_language, is_rotation_valid,
    rotation_correction, is_table, and is_diagram parameters"""

    # Base64-encode the rendered page image ("page.png" is a placeholder path).
    with open("page.png", "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode("ascii")

    # One user message carrying the prompt text and the image as a data URL.
    messages = [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}}
    ]}]
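
    Because the default prompt asks for a front matter section on top of the markdown, it can help to split that metadata from the page text before further processing. The sketch below assumes the front matter is delimited by lines containing only --- and holds simple key: value pairs; that layout is an assumption, not something this page specifies. The function name split_front_matter is hypothetical.

```python
def split_front_matter(markdown):
    """Split '---'-delimited front matter from the body.

    Returns (metadata, body). Values are kept as strings; if no complete
    front matter block is found, metadata is empty and the input is returned.
    """
    lines = markdown.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    # Find the closing delimiter after the opening one.
    end = None
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            end = index
            break
    if end is None:
        return {}, markdown
    metadata = {}
    for line in lines[1:end]:
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


sample = "---\nprimary_language: en\nis_table: False\n---\nHello world"
metadata, body = split_front_matter(sample)
print(metadata, body)
```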

    Image Processing

    This model expects a single document image as input, rendered such that the longest dimension is 1288 pixels. Maintain aspect ratio for best results.
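
    The target render size follows directly from that rule. A minimal sketch of the calculation, using only the 1288-pixel figure stated above; the function name target_size is hypothetical, and the actual resizing would be done with whatever image library you already use.

```python
def target_size(width, height, longest=1288):
    """Scale (width, height) so the longest side equals `longest`,
    preserving the aspect ratio. Returns integer pixel dimensions."""
    if width >= height:
        return longest, max(1, round(height * longest / width))
    return max(1, round(width * longest / height)), longest


# Example: a US Letter page rendered at 300 DPI (2550 x 3300 pixels).
print(target_size(2550, 3300))
```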

    Quickstart

    Start Building in Minutes

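    olmOCR-2-7B is accessible via OpenAI-compatible endpoints. The example below uses the batch workflow: upload a JSONL file of requests, create a batch job, and check its status.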

    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.doubleword.ai/v1"
    )
    
    # Step 1: Upload a batch input file
    with open("batch_requests.jsonl", "rb") as file:
        batch_file = client.files.create(
            file=file,
            purpose="batch"
        )
    
    print(f"File ID: {batch_file.id}")
    
    # Step 2: Create a batch job
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    print(f"Batch ID: {batch.id}")
    
    # Step 3: Check batch status
    batch_status = client.batches.retrieve(batch.id)
    print(f"Status: {batch_status.status}")
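
    The quickstart uploads batch_requests.jsonl without showing how to produce it. The sketch below builds one request per page, assuming the endpoint accepts the same JSONL layout as OpenAI's Batch API (custom_id, method, url, body). The model identifier "olmOCR-2-7B" and the helper names are assumptions; check your account for the exact model name.

```python
import base64
import json

MODEL_ID = "olmOCR-2-7B"  # assumption: replace with the model name your account exposes


def build_batch_line(custom_id, png_bytes, prompt, model=MODEL_ID):
    """Build one batch request dict for a single page image."""
    image_base64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "custom_id": custom_id,  # your own identifier for matching results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
            ]}],
        },
    }


def write_batch_file(path, requests):
    """Write one JSON object per line, as the batch upload expects."""
    with open(path, "w", encoding="utf-8") as output:
        for request in requests:
            output.write(json.dumps(request) + "\n")
```

    A typical call would be write_batch_file("batch_requests.jsonl", [build_batch_line("page-1", png_bytes, prompt)]), where png_bytes holds the rendered page and prompt is the default prompt from Usage Tips.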