Guides to Deploy on Spheron
Where to Start
- Training a model? → Start with Distributed Training for multi-GPU, or pick any RTX 4090 Spot instance for single-GPU fine-tuning
- Running inference? → vLLM Server for an OpenAI-compatible API; Ollama for interactive local usage
- Running an AI node? → See the AI Nodes section below
Training
Model training guides, from single-GPU fine-tuning to large-scale distributed runs.
Distributed Training (PyTorch DDP)
Multi-GPU PyTorch DDP and DeepSpeed ZeRO-3 on a Voltage Park bare-metal H100 NVLink cluster (up to 8× H100). Covers torchrun, gradient checkpointing, BF16 precision, checkpoint persistence, and GPU monitoring.
Hardware: Voltage Park Cluster (H100 NVLink, up to 8 GPUs)
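A launch along the lines the guide covers can be sketched as follows. This is a minimal illustration, not the guide's exact script: `train.py` and its flags are hypothetical placeholders for your own DDP training script, and `/mnt/checkpoints` stands in for whatever persistent mount your instance exposes.

```shell
# One process per GPU on a single 8x H100 node (--standalone = no external
# rendezvous backend needed for single-node runs).
torchrun \
  --standalone \
  --nproc_per_node=8 \
  train.py \
    --bf16 \
    --gradient_checkpointing \
    --output_dir /mnt/checkpoints
```

While the run is active, `nvidia-smi dmon` or `watch -n1 nvidia-smi` gives a quick view of per-GPU utilization and memory.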
LLM Inference
Deploy and serve large language models on Spheron GPU instances.
vLLM Inference Server
OpenAI-compatible inference server using vLLM on H100 or A100. Includes a systemd service for persistence, SSH tunnel access, and performance tuning flags.
Hardware: H100 80GB (7B–13B models) · 2× A100 80GB (30B+)
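The serving flow can be sketched as below. Model name, port, and tuning flags are illustrative assumptions, not the guide's exact configuration; `vllm serve`, `--dtype`, `--gpu-memory-utilization`, and `--max-model-len` are standard vLLM CLI options.

```shell
# On the instance: start an OpenAI-compatible server (listens on :8000 by default).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192

# From your machine: forward the port over SSH, then query the API.
#   ssh -L 8000:localhost:8000 user@<instance-ip>
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the API is OpenAI-compatible, any OpenAI SDK pointed at `http://localhost:8000/v1` works unchanged.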
Ollama + Open WebUI
Browser-based chat interface backed by Ollama on an RTX 4090. Docker Compose setup with GPU passthrough; pull any model with one command.
Hardware: RTX 4090 (24GB VRAM)
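A minimal Compose setup with GPU passthrough might look like the sketch below (service names, ports, and volume paths are illustrative, not the guide's exact file):

```shell
# Write a minimal two-service stack: Ollama with the GPU reserved,
# Open WebUI pointed at it.
cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes: ["ollama:/root/.ollama"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3000:8080"]
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on: [ollama]
volumes:
  ollama:
EOF

docker compose up -d

# Pull any model with one command, e.g.:
docker compose exec ollama ollama pull llama3.1:8b
```

The chat UI is then reachable on port 3000; models pulled into the named `ollama` volume survive container restarts.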
Qwen3-Omni-30B-A3B
Multimodal mixture-of-experts model with 30B total parameters (roughly 3B active per token) supporting text, audio, and vision inputs.
Qwen3-VL 4B & 8B
Vision-language models in 4B- and 8B-parameter variants for image understanding.
Chandra OCR
Specialized OCR model for document processing and text extraction.
Soulx Podcast-1.7B
Compact 1.7B-parameter model optimized for podcast and audio content generation.
Janus CoderV-8B
Code generation and understanding model with 8B parameters.
Baidu Ernie-4.5-VL-28B-A3B
Advanced vision-language mixture-of-experts model from Baidu with 28B total parameters (roughly 3B active per token).
AI Nodes
Deploy and run specialized AI network nodes.
Gonka AI Node
Deploy a Gonka node to participate in its AI compute network.
Pluralis Node 0
Set up and run Pluralis Node 0 for distributed AI network participation.
Additional Resources
- Instance Types: Choose the right GPU for your workload
- Cost Optimization: Reduce training and inference costs
- Templates & Images: Copy-ready startup scripts
- API Reference: Automate deployments programmatically