# Cost Optimization

## GPU Tier Selection Matrix
Choose the right GPU tier based on how much VRAM your workload needs:
| VRAM Needed | GPU | Type | Approx. $/hr | Best For |
|---|---|---|---|---|
| <16GB | RTX 4090 (24GB) | Dedicated/Spot | ~$0.25–0.55 | Dev, inference, fine-tune |
| 40GB | A100 40GB | Dedicated/Spot | Variable | Mid-scale training |
| 80GB | A100 80GB / H100 | Dedicated | Variable | Large model training |
| 640GB+ | 8× H100 NVLink | Cluster | ~$15+/hr | Distributed training, K8s |
Check current prices in the dashboard; prices vary by provider and availability.
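Back-of-envelope budgeting is just rate × hours. A quick sketch using the illustrative rates from the table above (live dashboard prices will differ):

```python
# Illustrative hourly rates (USD); check the dashboard for live prices
RATES = {
    'rtx4090_spot': 0.25,       # low end of the ~$0.25-0.55 range above
    'rtx4090_dedicated': 0.55,  # high end of that range
    'h100_cluster': 15.0,       # 8x H100 NVLink cluster
}

def run_cost(gpu: str, hours: float) -> float:
    """Estimated cost of a run at a flat hourly rate."""
    return RATES[gpu] * hours

run_cost('rtx4090_spot', 10)  # a 10-hour fine-tune -> $2.50
run_cost('h100_cluster', 10)  # the same 10 hours on a cluster -> $150.00
```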
## Instance Type Strategy

### Spot (Cheapest)
Spot instances are 30–60% cheaper than Dedicated. The trade-off: they can be reclaimed by the provider at any time.
Use Spot for:

- Experiments and hyperparameter search
- Batch training jobs with checkpoint saving enabled
- Any workload under 4 hours that can tolerate interruption
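To weigh Spot savings against interruption risk, note that with checkpoints every `c` hours, an interruption costs at most `c` hours of redone work. A back-of-envelope sketch (the job length, interruption count, and rates are assumptions, not guarantees):

```python
def worst_case_hours(job_hours: float, ckpt_interval_h: float,
                     interruptions: int) -> float:
    """Upper bound on billed GPU-hours: each reclaim redoes at most
    one checkpoint interval of work."""
    return job_hours + interruptions * ckpt_interval_h

# Hypothetical 4-hour job, checkpoints every 30 min, two reclaims:
spot_cost = 0.25 * worst_case_hours(4, 0.5, 2)  # $1.25 even with reruns
dedicated_cost = 0.55 * 4                       # $2.20 with no reclaim risk
```

Even in this worst case, Spot comes out well ahead, which is why checkpointed jobs under a few hours are good Spot candidates.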
**Handling interruption:** Save checkpoints to a persistent volume every N steps. If the instance is reclaimed, resume from the latest checkpoint on a new instance without losing progress.
```python
# Save a checkpoint every 100 steps
if step % 100 == 0:
    torch.save(state, '/checkpoints/checkpoint_latest.pt')

# On a new instance, resume from the latest checkpoint
state = torch.load('/checkpoints/checkpoint_latest.pt')
```

### Dedicated (Guaranteed)
Dedicated instances cannot be reclaimed. Use them when interruption would be costly:
- Production inference servers
- Multi-day training runs
- Interactive workloads and demos
### Cluster (Largest Scale)
Full physical servers with NVLink interconnects. Use for:
- Multi-GPU distributed training (PyTorch DDP, DeepSpeed)
- Kubernetes cluster workloads
- Workloads requiring maximum GPU-to-GPU bandwidth
## Reserved GPUs for Long-Term Work
For multi-week or multi-month projects, Reserved GPUs offer significant savings:
- Submit requests via dashboard → Reserved GPU
- Multiple providers compete to offer the lowest price
- Typical savings: 30–50% vs on-demand hourly rates for 3–12 month commitments
- Select "Any Location" to maximize provider competition
See Reserved GPUs for the request form.
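The savings math for a reservation is straightforward. A sketch using a 40% discount (mid-range of the 30–50% above) and an assumed $2/hr on-demand rate:

```python
ON_DEMAND_RATE = 2.0  # assumed $/hr; substitute your GPU's dashboard price

def reserved_cost(hours: float, discount: float = 0.4) -> float:
    """Total cost at a reserved rate (0.4 = mid-range 40% discount)."""
    return ON_DEMAND_RATE * hours * (1 - discount)

hours = 24 * 90                           # a 3-month, always-on reservation
on_demand_total = ON_DEMAND_RATE * hours  # $4320 at hourly rates
reserved_total = reserved_cost(hours)     # $2592 -- 40% saved
```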
## Team Discount Program
Teams with an active discount automatically see reduced prices on the dashboard. The discounted price is applied at deployment time; no additional steps are needed.
- Discounts are either volume-based or admin-assigned; the higher of the two is applied automatically
To enquire about discount eligibility for high-volume usage, contact support via Discord or email.
## Monitoring Burn Rate

### Check remaining balance
View your current credit balance on the Billing page in the dashboard. The balance updates in real time as instances run.
### Track per-instance spend
Open the instance details drawer from the Instances page to see the hourly rate and total cost accumulated for a running deployment.
### Terminate when done
Terminate instances from the dashboard as soon as your workload finishes to stop charges immediately. Navigate to Instances, select the instance, and click Terminate.
Set up balance alerts in User Settings to receive a notification before credits run out.
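Burn rate itself is easy to sanity-check: divide the remaining balance by the combined hourly rate of everything running. A minimal sketch (the balance and rates are illustrative; real figures come from the Billing and Instances pages):

```python
def hours_remaining(balance: float, hourly_rates: list[float]) -> float:
    """Hours the current fleet can keep running on the remaining balance."""
    burn = sum(hourly_rates)  # combined $/hr across running instances
    return balance / burn if burn else float('inf')

# $100 left, one RTX 4090 Dedicated plus one mid-tier GPU (rates assumed):
hours_remaining(100.0, [0.55, 1.80])  # roughly 42.6 hours of runway
```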
## Practical Tips
**Use persistent volumes for datasets and model weights.** Avoid re-downloading multi-GB datasets on every deployment; mount a volume with the data pre-loaded. This saves both time and egress costs.

**Prefer Spot for short jobs.** Any job under 4 hours that can be checkpointed is a good Spot candidate. Switch to Dedicated for multi-day runs that need uninterrupted time.

**Batch your GPU use.** Don't leave instances running idle: terminate immediately when your job finishes, and re-deploy from a checkpoint when you resume work.

**Use the RTX 4090 for development.** It is the most cost-effective GPU for code iteration, small-model experiments, and low-traffic inference serving. Graduate to an A100/H100 only when VRAM or compute requirements demand it.
## Additional Resources
- Instance Types: Detailed Spot/Dedicated/Cluster comparison
- Regions & Providers: Provider capabilities and GPU tiers
- Reserved GPUs: Long-term GPU reservation form
- Billing: Credit management, auto top-up, and team discounts
- Volume Mounting: Persistent storage for datasets and checkpoints