PyTorch Environment

What's Included

Core Frameworks:
  • PyTorch 2.x with CUDA support
  • torchvision and torchaudio
  • Hugging Face Transformers and Accelerate
Development Tools:
  • Jupyter Notebook for interactive development
  • TensorBoard for training visualization
  • NumPy, Pandas, Matplotlib
  • datasets and scikit-learn
System:
  • Ubuntu 22.04 or 24.04 LTS
  • NVIDIA drivers (550 or 570) pre-installed
  • CUDA toolkit (12.x)
  • Python 3.10–3.11

CUDA and NVIDIA Drivers

PyTorch requires a compatible CUDA version and NVIDIA driver. Spheron GPU images come with NVIDIA drivers and CUDA pre-installed.

Verify your driver and CUDA versions after connecting:
# Check NVIDIA driver version
nvidia-smi
 
# Check CUDA compiler version
nvcc --version
 
# Check which CUDA versions are installed
ls /usr/local/ | grep cuda
PyTorch ↔ CUDA compatibility:
PyTorch Version   CUDA 11.8   CUDA 12.1   CUDA 12.4
2.0.x             ✓           N/A         N/A
2.1.x             ✓           ✓           N/A
2.2.x             ✓           ✓           N/A
2.3.x             ✓           ✓           N/A
2.4.x+            ✓           ✓           ✓

Always match the --index-url in your pip install command to your CUDA version (see PyTorch install page).
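The mapping from CUDA version to wheel index URL is mechanical: drop the dot and prefix `cu`. A small illustrative helper (the function name is ours, not part of pip or PyTorch):

```python
def cuda_index_url(cuda_version: str) -> str:
    """Map a CUDA version string like '12.1' to the matching
    PyTorch wheel index URL (tag 'cu121')."""
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"

print(cuda_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
print(cuda_index_url("12.4"))  # https://download.pytorch.org/whl/cu124
```

Pass the resulting URL as `--index-url` to `pip install torch torchvision torchaudio`.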

Deploying a PyTorch Environment

Using a Pre-configured OS Image

  1. Go to app.spheron.ai → Deploy
  2. Choose your GPU
  3. Select OS: Ubuntu 24.04 LTS ML PyTorch or Ubuntu 24.04 LTS ML Everything
  4. Deploy; instance ready in 30–60 seconds

Using a Startup Script

Use the PyTorch + CUDA 12.1 startup template to install PyTorch on a base Ubuntu image:

#cloud-config
runcmd:
  - apt-get update -y
  - apt-get install -y python3.11 python3.11-venv
  - python3.11 -m ensurepip --upgrade
  - python3.11 -m pip install --upgrade pip
  - python3.11 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  - python3.11 -m pip install transformers accelerate bitsandbytes datasets

Verify Installation

After connecting via SSH:

# Check PyTorch version and CUDA availability
python3 -c "import torch; print('PyTorch:', torch.__version__); print('CUDA available:', torch.cuda.is_available()); print('CUDA version:', torch.version.cuda)"
 
# Check GPU count and names
python3 -c "import torch; print('GPU count:', torch.cuda.device_count()); [print(f'  GPU {i}:', torch.cuda.get_device_name(i)) for i in range(torch.cuda.device_count())]"
 
# Run nvidia-smi to see GPU utilization
nvidia-smi

Expected output on an H100 instance:

PyTorch: 2.3.0+cu121
CUDA available: True
CUDA version: 12.1
GPU count: 8
  GPU 0: NVIDIA H100 80GB HBM3
  ...

Quick Start

Basic GPU Computation

import torch
 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
 
# Move tensor to GPU
x = torch.randn(1000, 1000).to(device)
y = torch.matmul(x, x.T)
print("Matrix multiply done, result shape:", y.shape)
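CUDA kernels launch asynchronously, so wrapping a matmul in naive wall-clock timing measures only the launch, not the computation. A sketch of correct GPU timing with explicit synchronization (falls back to CPU when no GPU is present; sizes are arbitrary):

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(1024, 1024, device=device)

# Synchronize before and after timing so the clock brackets the actual kernel
if device.type == "cuda":
    torch.cuda.synchronize()
start = time.perf_counter()
y = x @ x.T
if device.type == "cuda":
    torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"{device}: matmul took {elapsed * 1000:.2f} ms, shape {tuple(y.shape)}")
```

Without the `synchronize()` calls, `elapsed` on a GPU can appear near zero because control returns to Python before the kernel finishes.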

Load a Hugging Face Model on GPU

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
 
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # replace with your model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",          # spreads across all available GPUs
)
 
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Common Packages

# Large language models
pip install transformers accelerate bitsandbytes peft trl
 
# Computer vision
pip install torchvision timm opencv-python pillow
 
# Distributed training
pip install deepspeed
 
# Experiment tracking
pip install wandb tensorboard
 
# Data
pip install datasets huggingface-hub

Troubleshooting

CUDA not available:
# Confirm NVIDIA driver is loaded
nvidia-smi
 
# Reinstall PyTorch matching your CUDA version
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Out of GPU memory:
  • Reduce batch size
  • Enable gradient checkpointing: model.gradient_checkpointing_enable()
  • Use BF16: model = model.to(torch.bfloat16)
  • Monitor memory: nvidia-smi -l 1 or torch.cuda.memory_summary()
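The BF16 and memory-monitoring suggestions above, sketched on a toy model (layer sizes and batch size are arbitrary):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model; casting to bfloat16 halves weight memory versus float32
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
model = model.to(device).to(torch.bfloat16)

x = torch.randn(8, 256, dtype=torch.bfloat16, device=device)  # small batch
out = model(x)
print("output:", tuple(out.shape), out.dtype)

if device.type == "cuda":
    # Allocator statistics for the current device
    print(torch.cuda.memory_summary(abbreviated=True))
```

For Hugging Face models, `model.gradient_checkpointing_enable()` trades extra forward-pass compute for a large reduction in stored activations.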
Slow training: check per-GPU and NVLink utilization:
# Per-GPU utilization
nvidia-smi dmon -s u
 
# NVLink status (on NVLink clusters)
nvidia-smi nvlink --status

Additional Resources