Chandra OCR
Next-generation OCR model delivering comprehensive document intelligence. Converts images/PDFs to structured Markdown, HTML, or JSON while preserving layout and hierarchy.
Performance: 83.1% accuracy (olmOCR benchmark) - outperforms DeepSeek OCR, Mistral OCR, GPT-4o
Key Capabilities
- Multi-format output (Markdown, HTML, JSON)
- Handwriting recognition
- Form reconstruction (including checkboxes)
- Complex layouts (tables, math equations)
- Visual element extraction (images, diagrams, captions)
- 40+ languages
- Local: HuggingFace transformers (privacy-sensitive, edge)
- Remote: vLLM server (scalable production, high throughput)
Benchmarks: 83.1% overall accuracy (olmOCR)
- Headers/Footers: 90.8%
- Long Tiny Text: 92.3%
- Tables: 88.0%
- ArXiv: 82.2%
vs Competitors: +13.2pp vs GPT-4o | +19.3pp vs Gemini Flash 2 | +4pp vs dots.ocr
Deployment Tiers
| Tier | GPU | Performance | Use Case |
|---|---|---|---|
| Dev/Test | CPU | 0.1-0.3 img/s | PoC, batch processing |
| Cost-Optimized | RTX 3060/4060 Ti (4-bit) | 0.4-0.8 img/s | Moderate volumes |
| High-Performance | RTX 3090/4090, L40S (BF16/FP16) | 1.5-3.0 img/s | High daily volumes |
| Enterprise | A100/H100 (FlashAttention2) | 3.0-5.0 img/s | Mission-critical pipelines |
| Distributed | 2x A100/H100 (tensor-parallel) | 5.0-8.0 img/s | Real-time OCR services |
Model: HuggingFace
Deploy on Spheron
- Sign up at app.spheron.ai
- Add credits (card/crypto)
- Deploy → Select GPU (see tiers above) → Region → Ubuntu 22.04 → SSH key → Deploy
ssh -i <private-key-path> root@<your-vm-ip>New to Spheron? Getting Started | SSH Setup
Installation
Update System
sudo apt update && sudo apt install -y software-properties-common curl ca-certificatesAdd Python PPA
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt updateInstall Python 3.11
sudo apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-devSetup pip
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheelCreate Virtual Environment
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activateInstall PyTorch (CUDA 12.1)
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudioInstall Chandra OCR
pip install chandra-ocr vllm transformers accelerate pillow bitsandbytesUsage
Launch Web Interface
chandra_appAccess at: http://localhost:8501
Features:- Upload PDFs or images
- Visualize OCR results
- Export as Markdown, HTML, or JSON
Programmatic Usage
from chandra_ocr import ChandraOCR
# Initialize the model
ocr = ChandraOCR()
# Process a document
result = ocr.process("path/to/document.pdf", output_format="markdown")
# Print the result
print(result)Batch Processing
import os
from chandra_ocr import ChandraOCR
ocr = ChandraOCR()
input_dir = "path/to/documents"
output_dir = "path/to/output"
for filename in os.listdir(input_dir):
if filename.endswith((".pdf", ".png", ".jpg")):
input_path = os.path.join(input_dir, filename)
result = ocr.process(input_path, output_format="markdown")
output_path = os.path.join(output_dir, f"{filename}.md")
with open(output_path, "w") as f:
f.write(result)Advanced Configuration
vLLM Server (High Throughput)
# Install vLLM if not already installed
pip install vllm
# Start the vLLM server
python -m vllm.entrypoints.openai.api_server \
--model datalab-to/chandra \
--dtype bfloat16 \
--max-model-len 4096Custom Parameters
from chandra_ocr import ChandraOCR
ocr = ChandraOCR(
max_tokens=2048,
temperature=0.7,
batch_size=4,
use_flash_attention=True
)Performance Optimization
Memory:- Use 4-bit/8-bit quantization
- Reduce batch size for OOM
- Gradient checkpointing for large docs
- FlashAttention2 (A100/H100)
- vLLM for concurrent processing
- Distributed inference for high volumes
- BF16/FP16 precision
- High resolution (2560px+)
- Multi-pass for critical docs
Troubleshooting
OOM Errors
# Solution 1: Reduce batch size
ocr = ChandraOCR(batch_size=1)
# Solution 2: Use quantization
pip install bitsandbytes
ocr = ChandraOCR(quantization="4bit")
# Solution 3: Lower resolution
ocr.process("document.pdf", max_resolution=1920)Slow Processing
# Ensure CUDA is properly configured
python -c "import torch; print(torch.cuda.is_available())"
# Check GPU utilization
nvidia-smi
# Enable vLLM for better throughput
# See Advanced Configuration section aboveInstallation Issues
# If pip install fails, try:
pip install --no-cache-dir chandra-ocr
# Or install dependencies separately:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate
pip install chandra-ocrSupported Formats
Input: PNG, JPEG, JPG, TIFF, BMP, WebP, PDF, scanned docs, screenshots
Specialized: Academic papers, forms, tables, equations, diagrams, handwritten notes
Output:- Markdown - Preserves structure, hierarchy, formatting
- HTML - Browser-ready with semantic markup
- JSON - Text, layout, bounding boxes, confidence, metadata
Best Practices
Document Quality:- Good lighting, high resolution (300+ DPI)
- Avoid skew/rotation, remove noise
- Start with Balanced tier, scale as needed
- Monitor GPU usage, adjust batch sizes
- Error handling and retry logic
- Async processing for web apps
- Queue systems for high volumes
- Cache frequent documents
- Logging and monitoring
Use Cases
Enterprise: Legacy archives, invoice automation, contract analysis, compliance
Academic: Research papers, databases, publications, historical docs
Legal/Financial: Contracts, statements, filings, due diligence
Healthcare: Medical records, prescriptions, forms, clinical trials
Additional Resources
- Chandra on HuggingFace
- vLLM Documentation
- Getting Started - Spheron deployment
- API Reference - Programmatic access