
Chandra OCR

Next-generation OCR model delivering comprehensive document intelligence. Converts images/PDFs to structured Markdown, HTML, or JSON while preserving layout and hierarchy.

Performance: 83.1% accuracy on the olmOCR benchmark, outperforming DeepSeek OCR, Mistral OCR, and GPT-4o

Key Capabilities

  • Multi-format output (Markdown, HTML, JSON)
  • Handwriting recognition
  • Form reconstruction (including checkboxes)
  • Complex layouts (tables, math equations)
  • Visual element extraction (images, diagrams, captions)
  • 40+ languages
Inference Modes:
  • Local: HuggingFace transformers (privacy-sensitive, edge)
  • Remote: vLLM server (scalable production, high throughput)

Benchmarks: 83.1% overall accuracy (olmOCR)

  • Headers/Footers: 90.8%
  • Long Tiny Text: 92.3%
  • Tables: 88.0%
  • ArXiv: 82.2%

vs Competitors: +13.2pp vs GPT-4o | +19.3pp vs Gemini Flash 2 | +4pp vs dots.ocr

Deployment Tiers

| Tier | GPU | Performance | Use Case |
| --- | --- | --- | --- |
| Dev/Test | CPU | 0.1-0.3 img/s | PoC, batch processing |
| Cost-Optimized | RTX 3060/4060 Ti (4-bit) | 0.4-0.8 img/s | Moderate volumes |
| High-Performance | RTX 3090/4090, L40S (BF16/FP16) | 1.5-3.0 img/s | High daily volumes |
| Enterprise | A100/H100 (FlashAttention2) | 3.0-5.0 img/s | Mission-critical pipelines |
| Distributed | 2x A100/H100 (tensor-parallel) | 5.0-8.0 img/s | Real-time OCR services |

Model: datalab-to/chandra on HuggingFace

Deploy on Spheron

  1. Sign up at app.spheron.ai
  2. Add credits (card/crypto)
  3. Deploy → Select GPU (see tiers above) → Region → Ubuntu 22.04 → SSH key → Deploy
  4. SSH into the VM:
ssh -i <private-key-path> root@<your-vm-ip>

New to Spheron? Getting Started | SSH Setup

Installation

Update System

sudo apt update && sudo apt install -y software-properties-common curl ca-certificates

Add Python PPA

sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

Install Python 3.11

sudo apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-dev

Setup pip

python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel

Create Virtual Environment

python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate

Install PyTorch (CUDA 12.1)

pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
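
Before installing the model stack, it is worth confirming that the CUDA build of PyTorch actually sees the GPU. This check uses only standard PyTorch APIs:

# Run inside the activated virtual environment, e.g. python check_torch.py
import torch

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))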

Install Chandra OCR

pip install chandra-ocr vllm transformers accelerate pillow bitsandbytes
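
A quick import check from inside the virtual environment confirms the packages resolved cleanly; the import path mirrors the usage examples later in this guide. If this fails, see Installation Issues under Troubleshooting.

# Smoke test: should print versions without raising ImportError
from chandra_ocr import ChandraOCR
import transformers, torch
print("transformers", transformers.__version__, "| torch", torch.__version__)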

Usage

Launch Web Interface

chandra_app

Access at: http://localhost:8501
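
The app serves on the VM itself; if you deployed on Spheron, forward the port over SSH from your local machine first (for example, ssh -L 8501:localhost:8501 -i <private-key-path> root@<your-vm-ip>) and then open http://localhost:8501 in your local browser.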

Features:
  • Upload PDFs or images
  • Visualize OCR results
  • Export as Markdown, HTML, or JSON

Programmatic Usage

from chandra_ocr import ChandraOCR
 
# Initialize the model
ocr = ChandraOCR()
 
# Process a document
result = ocr.process("path/to/document.pdf", output_format="markdown")
 
# Print the result
print(result)
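
The same call accepts the other documented output formats. This sketch assumes the format names mirror the lowercase "markdown" used above; the output file name is illustrative:

# Request HTML or JSON instead of Markdown
html_result = ocr.process("path/to/document.pdf", output_format="html")
json_result = ocr.process("path/to/document.pdf", output_format="json")
 
# Save the browser-ready HTML version
with open("document.html", "w") as f:
    f.write(html_result)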

Batch Processing

import os
from chandra_ocr import ChandraOCR
 
ocr = ChandraOCR()
input_dir = "path/to/documents"
output_dir = "path/to/output"
os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists
 
for filename in os.listdir(input_dir):
    if filename.endswith((".pdf", ".png", ".jpg")):
        input_path = os.path.join(input_dir, filename)
        result = ocr.process(input_path, output_format="markdown")
 
        # One Markdown file per input document, e.g. report.pdf -> report.md
        base_name = os.path.splitext(filename)[0]
        output_path = os.path.join(output_dir, f"{base_name}.md")
        with open(output_path, "w") as f:
            f.write(result)

Advanced Configuration

vLLM Server (High Throughput)

# Install vLLM if not already installed
pip install vllm
 
# Start the vLLM server
python -m vllm.entrypoints.openai.api_server \
    --model datalab-to/chandra \
    --dtype bfloat16 \
    --max-model-len 4096
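
With the server running, requests go through vLLM's OpenAI-compatible endpoint (port 8000 by default). Below is a minimal sketch using the openai Python client; it assumes the served model accepts OpenAI-style image inputs, and the prompt text and file name are illustrative rather than part of Chandra's documented API:

import base64
from openai import OpenAI
 
# vLLM exposes an OpenAI-compatible API on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
 
# Encode a page image as a data URL (file name is illustrative)
with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
 
response = client.chat.completions.create(
    model="datalab-to/chandra",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Convert this page to Markdown."},
        ],
    }],
    max_tokens=2048,
)
print(response.choices[0].message.content)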

Custom Parameters

from chandra_ocr import ChandraOCR
 
ocr = ChandraOCR(
    max_tokens=2048,
    temperature=0.7,
    batch_size=4,
    use_flash_attention=True
)

Performance Optimization

Memory:
  • Use 4-bit/8-bit quantization
  • Reduce batch size for OOM
  • Gradient checkpointing for large docs
Speed:
  • FlashAttention2 (A100/H100)
  • vLLM for concurrent processing
  • Distributed inference for high volumes
Accuracy:
  • BF16/FP16 precision
  • High resolution (2560px+)
  • Multi-pass for critical docs

Troubleshooting

OOM Errors

# Solution 1: Reduce batch size
ocr = ChandraOCR(batch_size=1)
 
# Solution 2: Use 4-bit quantization (requires bitsandbytes, installed earlier)
ocr = ChandraOCR(quantization="4bit")
 
# Solution 3: Lower the processing resolution
result = ocr.process("document.pdf", max_resolution=1920)

Slow Processing

# Ensure CUDA is properly configured
python -c "import torch; print(torch.cuda.is_available())"
 
# Check GPU utilization
nvidia-smi
 
# Enable vLLM for better throughput
# See Advanced Configuration section above

Installation Issues

# If pip install fails, try:
pip install --no-cache-dir chandra-ocr
 
# Or install dependencies separately:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate
pip install chandra-ocr

Supported Formats

Input: PNG, JPEG, JPG, TIFF, BMP, WebP, PDF, scanned docs, screenshots

Specialized: Academic papers, forms, tables, equations, diagrams, handwritten notes

Output:
  • Markdown - Preserves structure, hierarchy, formatting
  • HTML - Browser-ready with semantic markup
  • JSON - Text, layout, bounding boxes, confidence, metadata

Best Practices

Document Quality:
  • Good lighting, high resolution (300+ DPI)
  • Avoid skew/rotation, remove noise
Deployment:
  • Start with a mid-range tier (see Deployment Tiers), scale as needed
  • Monitor GPU usage, adjust batch sizes
  • Add error handling and retry logic (see the sketch below)
Production:
  • Async processing for web apps
  • Queue systems for high volumes
  • Cache frequent documents
  • Logging and monitoring
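
A minimal sketch of the retry pattern mentioned above, reusing only the ChandraOCR calls shown earlier in this guide; the retry count and backoff values are illustrative:

import time
from chandra_ocr import ChandraOCR
 
ocr = ChandraOCR()
 
def process_with_retry(path, retries=3, backoff=5.0):
    """Retry transient failures with a simple linear backoff."""
    for attempt in range(1, retries + 1):
        try:
            return ocr.process(path, output_format="markdown")
        except Exception as exc:  # log and retry; re-raise on the final attempt
            if attempt == retries:
                raise
            print(f"Attempt {attempt} failed for {path}: {exc}; retrying in {backoff * attempt}s")
            time.sleep(backoff * attempt)
 
result = process_with_retry("path/to/document.pdf")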

Use Cases

Enterprise: Legacy archives, invoice automation, contract analysis, compliance
Academic: Research papers, databases, publications, historical docs
Legal/Financial: Contracts, statements, filings, due diligence
Healthcare: Medical records, prescriptions, forms, clinical trials

Additional Resources