Chandra OCR

Next-generation OCR model delivering comprehensive document intelligence. Converts images/PDFs to structured Markdown, HTML, or JSON while preserving layout and hierarchy.

Performance: 83.1% accuracy (olmOCR benchmark) - outperforms DeepSeek OCR, Mistral OCR, GPT-4o

Key Capabilities

Multi-format output (Markdown, HTML, JSON)
Handwriting recognition
Form reconstruction (including checkboxes)
Complex layouts (tables, math equations)
Visual element extraction (images, diagrams, captions)
40+ languages

Inference Modes:

Local: HuggingFace transformers (privacy-sensitive, edge)
Remote: vLLM server (scalable production, high throughput)

Benchmarks: 83.1% overall accuracy (olmOCR)

Headers/Footers: 90.8%
Long Tiny Text: 92.3%
Tables: 88.0%
ArXiv: 82.2%

vs Competitors: +13.2pp vs GPT-4o | +19.3pp vs Gemini Flash 2 | +4pp vs dots.ocr

Deployment Tiers

Tier	GPU	Performance	Use Case
Dev/Test	CPU	0.1-0.3 img/s	PoC, batch processing
Cost-Optimized	RTX 3060/4060 Ti (4-bit)	0.4-0.8 img/s	Moderate volumes
High-Performance	RTX 3090/4090, L40S (BF16/FP16)	1.5-3.0 img/s	High daily volumes
Enterprise	A100/H100 (FlashAttention2)	3.0-5.0 img/s	Mission-critical pipelines
Distributed	2x A100/H100 (tensor-parallel)	5.0-8.0 img/s	Real-time OCR services

Model: HuggingFace

Deploy on Spheron

Sign up at app.spheron.ai
Add credits (card/crypto)
Deploy → Select GPU (see tiers above) → Region → Ubuntu 22.04 → SSH key → Deploy

ssh -i <private-key-path> root@<your-vm-ip>

New to Spheron? Getting Started | SSH Setup

Installation

Update System

sudo apt update && sudo apt install -y software-properties-common curl ca-certificates

Add Python PPA

sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

Install Python 3.11

sudo apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-dev

Setup pip

python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel

Create Virtual Environment

python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate

Install PyTorch (CUDA 12.1)

pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

Install Chandra OCR

pip install chandra-ocr vllm transformers accelerate pillow bitsandbytes

Usage

Launch Web Interface

chandra_app

Access at: http://localhost:8501

Features:

Upload PDFs or images
Visualize OCR results
Export as Markdown, HTML, or JSON

Programmatic Usage

from chandra_ocr import ChandraOCR
 
# Initialize the model
ocr = ChandraOCR()
 
# Process a document
result = ocr.process("path/to/document.pdf", output_format="markdown")
 
# Print the result
print(result)

Batch Processing

import os
from chandra_ocr import ChandraOCR
 
ocr = ChandraOCR()
input_dir = "path/to/documents"
output_dir = "path/to/output"
 
for filename in os.listdir(input_dir):
    if filename.endswith((".pdf", ".png", ".jpg")):
        input_path = os.path.join(input_dir, filename)
        result = ocr.process(input_path, output_format="markdown")
        
        output_path = os.path.join(output_dir, f"{filename}.md")
        with open(output_path, "w") as f:
            f.write(result)

Advanced Configuration

vLLM Server (High Throughput)

# Install vLLM if not already installed
pip install vllm
 
# Start the vLLM server
python -m vllm.entrypoints.openai.api_server \
    --model datalab-to/chandra \
    --dtype bfloat16 \
    --max-model-len 4096

Custom Parameters

from chandra_ocr import ChandraOCR
 
ocr = ChandraOCR(
    max_tokens=2048,
    temperature=0.7,
    batch_size=4,
    use_flash_attention=True
)

Performance Optimization

Memory:

Use 4-bit/8-bit quantization
Reduce batch size for OOM
Gradient checkpointing for large docs

Speed:

FlashAttention2 (A100/H100)
vLLM for concurrent processing
Distributed inference for high volumes

Accuracy:

BF16/FP16 precision
High resolution (2560px+)
Multi-pass for critical docs

Troubleshooting

OOM Errors

# Solution 1: Reduce batch size
ocr = ChandraOCR(batch_size=1)
 
# Solution 2: Use quantization
pip install bitsandbytes
ocr = ChandraOCR(quantization="4bit")
 
# Solution 3: Lower resolution
ocr.process("document.pdf", max_resolution=1920)

Slow Processing

# Ensure CUDA is properly configured
python -c "import torch; print(torch.cuda.is_available())"
 
# Check GPU utilization
nvidia-smi
 
# Enable vLLM for better throughput
# See Advanced Configuration section above

Installation Issues

# If pip install fails, try:
pip install --no-cache-dir chandra-ocr
 
# Or install dependencies separately:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate
pip install chandra-ocr

Supported Formats

Input: PNG, JPEG, JPG, TIFF, BMP, WebP, PDF, scanned docs, screenshots

Specialized: Academic papers, forms, tables, equations, diagrams, handwritten notes

Output:

Markdown - Preserves structure, hierarchy, formatting
HTML - Browser-ready with semantic markup
JSON - Text, layout, bounding boxes, confidence, metadata

Best Practices

Document Quality:

Good lighting, high resolution (300+ DPI)
Avoid skew/rotation, remove noise

Deployment:

Start with Balanced tier, scale as needed
Monitor GPU usage, adjust batch sizes
Error handling and retry logic

Production:

Async processing for web apps
Queue systems for high volumes
Cache frequent documents
Logging and monitoring

Use Cases

Enterprise: Legacy archives, invoice automation, contract analysis, compliance
Academic: Research papers, databases, publications, historical docs
Legal/Financial: Contracts, statements, filings, due diligence
Healthcare: Medical records, prescriptions, forms, clinical trials

Additional Resources

Chandra on HuggingFace
vLLM Documentation
Getting Started - Spheron deployment
API Reference - Programmatic access