PyTorch Environment

What's Included

Core Frameworks:
  • PyTorch 2.x with CUDA support
  • torchvision and torchaudio
  • Hugging Face Transformers and Accelerate
Development Tools:
  • Jupyter Notebook for interactive development
  • TensorBoard for training visualization
  • NumPy, Pandas, Matplotlib
  • datasets and scikit-learn
System:
  • Ubuntu 22.04 or 24.04 LTS
  • NVIDIA drivers (550 or 570) pre-installed
  • CUDA toolkit (12.x)
  • Python 3.10–3.11

CUDA and NVIDIA Drivers

PyTorch requires a compatible CUDA version and NVIDIA driver. Spheron GPU images come with NVIDIA drivers and CUDA pre-installed.

Verify your driver and CUDA versions after connecting:
# Check NVIDIA driver version
nvidia-smi
 
# Check CUDA compiler version
nvcc --version
 
# Check which CUDA versions are installed
ls /usr/local/ | grep cuda
PyTorch ↔ CUDA compatibility:
PyTorch Version   CUDA 11.8   CUDA 12.1   CUDA 12.4
2.0.x             ✓           N/A         N/A
2.1.x             ✓           ✓           N/A
2.2.x             ✓           ✓           N/A
2.3.x             ✓           ✓           N/A
2.4.x+            ✓           ✓           ✓

Always match the --index-url in your pip install command to your CUDA version (see PyTorch install page).
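The mapping from CUDA version to wheel index URL is mechanical: drop the dot and prefix `cu`. A small illustrative helper (the function name is ours, not part of pip or PyTorch):

```python
def cuda_index_url(cuda_version: str) -> str:
    """Map a CUDA version string like '12.1' to the matching
    PyTorch wheel index URL (tag 'cu121')."""
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"

print(cuda_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
print(cuda_index_url("12.4"))  # https://download.pytorch.org/whl/cu124
```

Pass the resulting URL as `--index-url` to `pip install torch torchvision torchaudio`.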

Deploying a PyTorch Environment

Using a Pre-configured OS Image

  1. Go to app.spheron.ai → Deploy
  2. Choose your GPU
  3. Select OS: Ubuntu 24.04 LTS ML PyTorch or Ubuntu 24.04 LTS ML Everything
  4. Deploy; instance ready in 30–60 seconds

Using a Startup Script

Use the PyTorch + CUDA 12.1 startup template to install PyTorch on a base Ubuntu image:

#cloud-config
runcmd:
  - apt-get update -y
  - apt-get install -y python3.11 python3.11-venv
  - python3.11 -m ensurepip --upgrade
  - python3.11 -m pip install --upgrade pip
  - python3.11 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  - python3.11 -m pip install transformers accelerate bitsandbytes datasets

Verify Installation

After connecting via SSH:

# Check PyTorch version and CUDA availability
python3 -c "import torch; print('PyTorch:', torch.__version__); print('CUDA available:', torch.cuda.is_available()); print('CUDA version:', torch.version.cuda)"
 
# Check GPU count and names
python3 -c "import torch; print('GPU count:', torch.cuda.device_count()); [print(f'  GPU {i}:', torch.cuda.get_device_name(i)) for i in range(torch.cuda.device_count())]"
 
# Run nvidia-smi to see GPU utilization
nvidia-smi

Expected output on an H100 instance:

PyTorch: 2.3.0+cu121
CUDA available: True
CUDA version: 12.1
GPU count: 8
  GPU 0: NVIDIA H100 80GB HBM3
  ...

Quick Start

Basic GPU Computation

import torch
 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
 
# Move tensor to GPU
x = torch.randn(1000, 1000).to(device)
y = torch.matmul(x, x.T)
print("Matrix multiply done, result shape:", y.shape)
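CUDA kernels launch asynchronously, so wrapping a matmul in naive wall-clock timing measures only the launch, not the computation. A sketch of correct GPU timing with explicit synchronization (falls back to CPU when no GPU is present; sizes are arbitrary):

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(1024, 1024, device=device)

# Synchronize before and after timing so the clock brackets the actual kernel
if device.type == "cuda":
    torch.cuda.synchronize()
start = time.perf_counter()
y = x @ x.T
if device.type == "cuda":
    torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"{device}: matmul took {elapsed * 1000:.2f} ms, shape {tuple(y.shape)}")
```

Without the `synchronize()` calls, `elapsed` on a GPU can appear near zero because control returns to Python before the kernel finishes.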

Load a Hugging Face Model on GPU

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
 
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # replace with your model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",          # spreads across all available GPUs
)
 
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Common Packages

# Large language models
pip install transformers accelerate bitsandbytes peft trl
 
# Computer vision
pip install torchvision timm opencv-python pillow
 
# Distributed training
pip install deepspeed
 
# Experiment tracking
pip install wandb tensorboard
 
# Data
pip install datasets huggingface-hub

Troubleshooting

CUDA not available:
# Confirm NVIDIA driver is loaded
nvidia-smi
 
# Reinstall PyTorch matching your CUDA version
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Out of GPU memory:
  • Reduce batch size
  • Enable gradient checkpointing: model.gradient_checkpointing_enable()
  • Use BF16: model = model.to(torch.bfloat16)
  • Monitor memory: nvidia-smi -l 1 or torch.cuda.memory_summary()
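The BF16 and memory-monitoring suggestions above, sketched on a toy model (layer sizes and batch size are arbitrary):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model; casting to bfloat16 halves weight memory versus float32
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
model = model.to(device).to(torch.bfloat16)

x = torch.randn(8, 256, dtype=torch.bfloat16, device=device)  # small batch
out = model(x)
print("output:", tuple(out.shape), out.dtype)

if device.type == "cuda":
    # Allocator statistics for the current device
    print(torch.cuda.memory_summary(abbreviated=True))
```

For Hugging Face models, `model.gradient_checkpointing_enable()` trades extra forward-pass compute for a large reduction in stored activations.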
Slow training: check per-GPU and NVLink utilization:
# Per-GPU utilization
nvidia-smi dmon -s u
 
# NVLink status (on NVLink clusters)
nvidia-smi nvlink --status

Additional Resources