Guides to Deploy on Spheron
Where to Start
- Training a model? → Start with Distributed Training for multi-GPU, or pick any RTX 4090 Spot instance for single-GPU fine-tuning
- Running inference? → vLLM Server for an OpenAI-compatible API; Ollama for interactive local usage
- Running an AI node? → See the AI Nodes section below
Training
Model training guides, from single-GPU fine-tuning to large-scale distributed runs.
Distributed Training (PyTorch DDP)
Multi-GPU PyTorch DDP and DeepSpeed ZeRO-3 on a Voltage Park bare-metal H100 NVLink cluster (up to 8× H100). Covers torchrun, gradient checkpointing, BF16 precision, checkpoint persistence, and GPU monitoring.
Hardware: Voltage Park Cluster (H100 NVLink, up to 8 GPUs)
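A launch along the lines the guide covers can be sketched as follows. This is a minimal illustration, not the guide's exact script: `train.py` and its flags are hypothetical placeholders for your own DDP training script, and `/mnt/checkpoints` stands in for whatever persistent mount your instance exposes.

```shell
# One process per GPU on a single 8x H100 node (--standalone = no external
# rendezvous backend needed for single-node runs).
torchrun \
  --standalone \
  --nproc_per_node=8 \
  train.py \
    --bf16 \
    --gradient_checkpointing \
    --output_dir /mnt/checkpoints
```

While the run is active, `nvidia-smi dmon` or `watch -n1 nvidia-smi` gives a quick view of per-GPU utilization and memory.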
LLM Inference
Deploy and serve large language models on Spheron GPU instances.
vLLM Inference Server
OpenAI-compatible inference server using vLLM on H100 or A100. Includes a systemd service for persistence, SSH tunnel access, and performance tuning flags.
Hardware: H100 80GB (7B–13B models) · 2× A100 80GB (30B+)
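The serving flow can be sketched as below. Model name, port, and tuning flags are illustrative assumptions, not the guide's exact configuration; `vllm serve`, `--dtype`, `--gpu-memory-utilization`, and `--max-model-len` are standard vLLM CLI options.

```shell
# On the instance: start an OpenAI-compatible server (listens on :8000 by default).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192

# From your machine: forward the port over SSH, then query the API.
#   ssh -L 8000:localhost:8000 user@<instance-ip>
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the API is OpenAI-compatible, any OpenAI SDK pointed at `http://localhost:8000/v1` works unchanged.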
Ollama + Open WebUI
Browser-based chat interface backed by Ollama on an RTX 4090. Docker Compose setup with GPU passthrough; pull any model with one command.
Hardware: RTX 4090 (24GB VRAM)
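A minimal Compose setup with GPU passthrough might look like the sketch below (service names, ports, and volume paths are illustrative, not the guide's exact file):

```shell
# Write a minimal two-service stack: Ollama with the GPU reserved,
# Open WebUI pointed at it.
cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes: ["ollama:/root/.ollama"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3000:8080"]
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on: [ollama]
volumes:
  ollama:
EOF

docker compose up -d

# Pull any model with one command, e.g.:
docker compose exec ollama ollama pull llama3.1:8b
```

The chat UI is then reachable on port 3000; models pulled into the named `ollama` volume survive container restarts.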
Qwen3-Omni-30B-A3B
Multimodal mixture-of-experts model with 30B total parameters (roughly 3B active per token) supporting text, audio, and vision inputs.
Qwen3-VL 4B & 8B
Vision-language models in 4B- and 8B-parameter variants for image understanding.
Chandra OCR
Specialized OCR model for document processing and text extraction.
Soulx Podcast-1.7B
Compact 1.7B-parameter model optimized for podcast and audio content generation.
Janus CoderV-8B
Code generation and understanding model with 8B parameters.
Baidu Ernie-4.5-VL-28B-A3B
Advanced vision-language mixture-of-experts model from Baidu with 28B total parameters (roughly 3B active per token).
AI Nodes
Deploy and run specialized AI network nodes.
Gonka AI Node
Deploy a Gonka node to participate in its AI compute network.
Pluralis Node 0
Set up and run Pluralis Node 0 for distributed AI network participation.
Additional Resources
- Instance Types: Choose the right GPU for your workload
- Cost Optimization: Reduce training and inference costs
- Templates & Images: Copy-ready startup scripts
- API Reference: Automate deployments programmatically