
Cost Optimization

GPU Tier Selection Matrix

Choose the right GPU tier based on how much VRAM your workload needs:

| VRAM Needed | GPU | Type | Approx. $/hr | Best For |
|---|---|---|---|---|
| <16GB | RTX 4090 (24GB) | Dedicated/Spot | ~$0.25–0.55 | Dev, inference, fine-tuning |
| 40GB | A100 40GB | Dedicated/Spot | Variable | Mid-scale training |
| 80GB | A100 80GB / H100 | Dedicated | Variable | Large model training |
| 640GB+ | 8× H100 NVLink | Cluster | ~$15+/hr | Distributed training, K8s |

Check current prices in the dashboard; prices vary by provider and availability.
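To pick a row from the table, you need a rough VRAM estimate for your model. A common rule of thumb (an approximation, not a guarantee): weights take params × bytes-per-param, inference adds roughly 20% for activations and KV cache, and training with Adam needs roughly 4× the weight memory for gradients and optimizer states. The multipliers below are illustrative heuristics:

```python
def estimate_vram_gb(params_billion, bytes_per_param=2, training=False):
    """Rough VRAM estimate in GB. bytes_per_param: 2 for fp16/bf16, 4 for fp32.

    Heuristic multipliers (assumptions, not exact figures):
    inference ~1.2x weights for activations/KV cache;
    training ~4x weights for gradients plus Adam optimizer states.
    """
    weights_gb = params_billion * bytes_per_param
    multiplier = 4.0 if training else 1.2
    return weights_gb * multiplier

# A 7B model in bf16: fits a 24GB card for inference,
# but needs the 80GB tier (or sharding) to train
print(round(estimate_vram_gb(7), 1))                 # 16.8
print(round(estimate_vram_gb(7, training=True), 1))  # 56.0
```

If the training estimate lands between tiers, size up: fragmentation and batch-size headroom consume the margin quickly.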

Instance Type Strategy

Spot (Cheapest)

Spot instances are 30–60% cheaper than Dedicated. The trade-off: they can be reclaimed by the provider at any time.

Use Spot for:
  • Experiments and hyperparameter search
  • Batch training jobs with checkpoint saving enabled
  • Any workload under 4 hours that can tolerate interruption
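The trade-off above can be made concrete with a quick cost comparison. The discount and rework figures below are illustrative assumptions (midpoint of the 30–60% range, plus some compute lost to interruptions between checkpoints), not platform pricing:

```python
def spot_vs_dedicated(hours, dedicated_rate, spot_discount=0.45,
                      expected_rework_hours=0.5):
    """Compare total job cost on Spot vs Dedicated.

    spot_discount: assumed midpoint of the 30-60% range quoted above.
    expected_rework_hours: average compute redone after interruptions
    (hypothetical figure; depends on checkpoint frequency).
    """
    spot_rate = dedicated_rate * (1 - spot_discount)
    dedicated_cost = hours * dedicated_rate
    spot_cost = (hours + expected_rework_hours) * spot_rate
    return dedicated_cost, spot_cost

# A 3-hour job at an illustrative $0.50/hr Dedicated rate
ded, spot = spot_vs_dedicated(3, 0.50)
print(f"dedicated ${ded:.2f} vs spot ${spot:.2f}")
```

Even with rework factored in, Spot usually wins for short checkpointed jobs; the math flips for long runs where an interruption costs hours, which is why multi-day runs belong on Dedicated.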

Handling interruption: Save checkpoints to a persistent volume every N steps. If the instance is reclaimed, resume from the latest checkpoint on a new instance without losing progress.

# Save a checkpoint to the persistent volume every 100 steps
if step % 100 == 0:
    state = {'step': step, 'model': model.state_dict(),
             'optimizer': optimizer.state_dict()}
    torch.save(state, '/checkpoints/checkpoint_latest.pt')
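The resume half of the pattern can be sketched framework-agnostically; with PyTorch you would use `torch.load` the same way. The path and file names here are illustrative, and the atomic-rename detail is a general safeguard, not a platform requirement:

```python
import os
import pickle

CKPT = '/tmp/checkpoints/checkpoint_latest.pt'  # illustrative path

def save_checkpoint(state, path=CKPT):
    # Write to a temp file, then rename: a reclaimed instance
    # can never leave a half-written checkpoint behind
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + '.tmp'
    with open(tmp, 'wb') as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path=CKPT):
    # On a fresh instance, start from step 0 if no checkpoint exists yet
    if not os.path.exists(path):
        return {'step': 0}
    with open(path, 'rb') as f:
        return pickle.load(f)

state = load_checkpoint()  # {'step': 0} on the first run
for step in range(state['step'], 300):
    # ... training work ...
    if step % 100 == 0:
        save_checkpoint({'step': step})
```

The key design choice is writing via `os.replace`, which is atomic on POSIX filesystems, so the "latest" checkpoint on the volume is always complete and loadable.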

Dedicated (Guaranteed)

Dedicated instances cannot be reclaimed. Use them when interruption would be costly:

  • Production inference servers
  • Multi-day training runs
  • Interactive workloads and demos

Cluster (Largest Scale)

Full physical servers with NVLink interconnects. Use for:

  • Multi-GPU distributed training (PyTorch DDP, DeepSpeed)
  • Kubernetes cluster workloads
  • Workloads requiring maximum GPU-to-GPU bandwidth

Reserved GPUs for Long-Term Work

For multi-week or multi-month projects, Reserved GPUs offer significant savings:

  • Submit requests via dashboard → Reserved GPU
  • Multiple providers compete to offer the lowest price
  • Typical savings: 30–50% vs on-demand hourly rates for 3–12 month commitments
  • Select "Any Location" to maximize provider competition
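To gauge whether a commitment is worth it, compare the on-demand total against a reserved rate. The 40% discount below is an assumed midpoint of the 30–50% range quoted above; real pricing comes from provider bids on your request:

```python
def reserved_savings(hourly_rate, hours_per_month, months, discount=0.40):
    """Estimate savings from a reserved commitment vs on-demand hourly.

    discount: assumed midpoint of the 30-50% range quoted above;
    actual pricing is set by competing provider bids.
    """
    on_demand = hourly_rate * hours_per_month * months
    reserved = on_demand * (1 - discount)
    return on_demand - reserved

# One GPU running ~400 h/month for 6 months at an illustrative $1.50/hr
print(f"${reserved_savings(1.50, 400, 6):,.0f} saved")
```

The savings only materialize if the GPU actually runs near the assumed hours; for bursty usage, on-demand Spot may still come out cheaper.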

See Reserved GPUs for the request form.

Team Discount Program

Teams with active discounts automatically see reduced prices on the dashboard. The discounted price is applied when you deploy a GPU; no additional steps are required.

  • Discounts are either volume-based or admin-assigned; the higher of the two is applied automatically

To enquire about discount eligibility for high-volume usage, contact support via Discord or email.

Monitoring Burn Rate

Check remaining balance

View your current credit balance on the Billing page in the dashboard. The balance updates in real time as instances run.

Track per-instance spend

Open the instance details drawer from the Instances page to see the hourly rate and total cost accumulated for a running deployment.

Terminate when done

Terminate instances from the dashboard as soon as your workload finishes to stop charges immediately. Navigate to Instances, select the instance, and click Terminate.

Set up balance alerts in User Settings to receive a notification before credits run out.
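A useful sanity check before stepping away: divide your balance by the total hourly rate of everything still running. The rates below are illustrative; read the real figures from the Billing page and the instance details drawer:

```python
def hours_remaining(balance, hourly_rates):
    """Hours until credits run out at the current burn rate.

    hourly_rates: list of $/hr for currently running deployments
    (illustrative inputs; use the real rates from the dashboard).
    """
    burn_rate = sum(hourly_rates)
    if burn_rate == 0:
        return float('inf')  # nothing running, nothing burning
    return balance / burn_rate

# $120 balance, one GPU at $0.40/hr and one at $1.60/hr
print(round(hours_remaining(120, [0.40, 1.60]), 1))  # 60.0
```

If the result is shorter than your planned run, top up or set the alert threshold accordingly before launching.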

Practical Tips

Use persistent volumes for datasets and model weights. Avoid re-downloading multi-GB datasets on every deployment; mount a volume with data pre-loaded. This saves both time and egress costs.

Prefer Spot for short jobs. Any job under 4 hours that can be checkpointed is a good Spot candidate. Switch to Dedicated for multi-day runs requiring uninterrupted time.

Batch your GPU use. Avoid leaving instances running idle. Terminate immediately when your job finishes, and re-deploy from a checkpoint when you resume work.

Use RTX 4090 for development. The RTX 4090 is the most cost-effective GPU for code iteration, small model experiments, and inference serving at low traffic. Graduate to A100/H100 only when VRAM or compute requirements demand it.

Additional Resources