
CUDA & NVIDIA Drivers

What Are NVIDIA Drivers?

NVIDIA drivers are software components that let the operating system communicate with the GPU hardware. Without a compatible driver, the GPU cannot be used for any compute workload.

On Spheron instances, NVIDIA drivers come pre-installed on all GPU images. You don't need to install them manually.

Key points:
  • Drivers are specific to the GPU architecture (e.g., Hopper for H100, Ampere for A100)
  • Each driver version exposes a maximum supported CUDA version
  • Driver version ≠ CUDA version; they are separate but must be compatible

Check the installed driver after connecting:

nvidia-smi

Example output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08   CUDA Version: 12.4      |
+-----------------------------------------------------------------------------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================================================================|
|   0  NVIDIA H100 80GB HBM3           Off |   00000000:00:00.0 Off |                    0 |
| N/A   34C    P0             72W / 700W |      0MiB / 81920MiB |      0%      Default |
+-----------------------------------------------------------------------------------------+

The Driver Version and CUDA Version fields tell you exactly what is installed.
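
If you want to read those fields from a script, the banner line can be parsed with a regular expression. A minimal sketch, using a sample string that mirrors the example output above:

```python
import re

# Sample banner line as printed by nvidia-smi (mirrors the example output above)
banner = "| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08   CUDA Version: 12.4      |"

# Extract the driver version and the driver's maximum supported CUDA version
driver = re.search(r"Driver Version:\s*([\d.]+)", banner).group(1)
max_cuda = re.search(r"CUDA Version:\s*([\d.]+)", banner).group(1)

print(driver)    # 550.127.08
print(max_cuda)  # 12.4
```

In practice, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the driver version directly, with no parsing needed.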

What Is CUDA?

CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform that lets software run computations on the GPU. Nearly all AI/ML frameworks depend on it.

CUDA has two components:
Component    | What it is                         | How to check
CUDA Runtime | Libraries used by your application | nvidia-smi → shows max supported CUDA
CUDA Toolkit | Compiler (nvcc) + dev tools        | nvcc --version

# Compiler version (CUDA toolkit)
nvcc --version
 
# List all CUDA installations
ls /usr/local/ | grep cuda

How CUDA and Drivers Affect Development

The relationship between drivers, CUDA, and your frameworks determines what works:

GPU Hardware
    └── NVIDIA Driver  (minimum requirement)
            └── CUDA Runtime  (must be ≤ driver's max CUDA)
                    └── Framework (PyTorch, TensorFlow, JAX…)
                              └── Your Code
What this means in practice:
  • A newer driver supports a higher maximum CUDA version, but is backward-compatible with older CUDA runtimes
  • If your framework requires CUDA 12.4 but the instance only has CUDA 12.0, builds or training runs will fail
  • Mismatched versions are the most common source of "CUDA not available" errors
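
The "runtime must be ≤ driver's max CUDA" rule above is just a major/minor version comparison. A small illustrative helper (the function name and sample versions are assumptions, not a real API):

```python
def cuda_compatible(runtime_version: str, driver_max: str) -> bool:
    """True if the CUDA runtime version is at or below the driver's max supported CUDA."""
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(runtime_version) <= to_tuple(driver_max)

print(cuda_compatible("12.1", "12.4"))  # True: older runtime on a newer driver is fine
print(cuda_compatible("12.4", "12.0"))  # False: framework needs more than the driver supports
```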
Framework ↔ CUDA compatibility quick reference:
Framework        | Minimum CUDA | Recommended CUDA
PyTorch 2.3+     | 11.8         | 12.1 – 12.4
TensorFlow 2.16+ | 12.3         | 12.3 – 12.4
JAX (latest)     | 12.0         | 12.4+
vLLM 0.4+        | 12.1         | 12.4

Always check the framework's official docs for the exact compatibility matrix before selecting a CUDA version.

Available CUDA Versions on Spheron

CUDA Version | NVIDIA Driver      | Notes
12.0         | 525+               | Maximum compatibility with older frameworks
12.4         | 550+               | Stable, broadly compatible; good default
12.6         | 560+               | Optimized for RTX 5090, H100, newer GPUs
12.8 Open    | 570+ (open-source) | Open-source kernel module, community use
13.0 Open    | 575+ (open-source) | Latest features; early adoption and research use

Open-source drivers (12.8 Open, 13.0 Open) use NVIDIA's open-source kernel module instead of the proprietary one. They are functionally equivalent for most AI/ML workloads and are preferred in community and research environments.

Choosing a Driver Version at Deployment

When deploying an instance on Spheron, the CUDA version and driver are selected via the OS image dropdown; they are bundled together.

Step-by-step

  1. Go to app.spheron.ai → Deploy
  2. Select your GPU
  3. Open the OS / Environment dropdown
  4. Choose an image that includes your desired CUDA version:
Goal                                | Recommended Image
Stable AI/ML work                   | Ubuntu 22.04 + CUDA 12.4 or Ubuntu 24.04 ML PyTorch
Latest GPU support (H100, RTX 5090) | Ubuntu 24.04 + CUDA 12.6
Open-source driver preference       | Ubuntu 22.04 + CUDA 12.8 Open
Research and early adoption         | Ubuntu 24.04 + CUDA 13.0 Open
Legacy framework compatibility      | Ubuntu 20.04 + CUDA 12.0
  5. Deploy; the instance is ready in 30–60 seconds with the driver already loaded

You cannot change the CUDA version after deployment. If you need a different version, deploy a new instance with the correct image.

Verify After Deployment

Once connected via SSH, confirm the environment is set up correctly:

# Driver version and max supported CUDA
nvidia-smi
 
# CUDA toolkit version (compiler)
nvcc --version
 
# Installed CUDA directories
ls /usr/local/ | grep cuda
 
# Quick Python check (PyTorch)
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('CUDA version:', torch.version.cuda)"
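
The manual checks above can also be wrapped in a short script that tolerates missing tools, which is handy when you are not yet sure whether an instance has a GPU at all. A sketch (the function name is illustrative):

```python
import shutil
import subprocess

def gpu_environment_report() -> dict:
    """Run each verification tool if present; record None for tools that are missing."""
    report = {}
    for tool, args in (("nvidia-smi", []), ("nvcc", ["--version"])):
        if shutil.which(tool) is None:
            report[tool] = None  # not installed (e.g. a CPU-only node, or toolkit absent)
        else:
            out = subprocess.run([tool, *args], capture_output=True, text=True)
            report[tool] = out.stdout.strip()
    return report

print(gpu_environment_report())
```

A None value for nvidia-smi points at the first troubleshooting case below; a None for nvcc alone points at the missing-toolkit case.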

Troubleshooting

nvidia-smi: command not found
The instance may have launched on a CPU-only node or the driver failed to load. Redeploy with a GPU image.

CUDA not available in PyTorch/TensorFlow
The framework's CUDA build doesn't match the installed runtime. Reinstall the framework with the correct CUDA wheel:

# PyTorch example - match cu124 to your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu124

nvcc: command not found but nvidia-smi works
The CUDA toolkit (compiler) isn't installed; only the driver and runtime are present. Install it (this assumes NVIDIA's CUDA apt repository is configured on the image):

apt-get update && apt-get install -y cuda-toolkit-12-4

If nvcc still isn't found afterward, add the toolkit's bin directory (typically /usr/local/cuda-12.4/bin) to your PATH.

Version mismatch between nvidia-smi and nvcc
This is expected behavior: nvidia-smi shows the driver's maximum supported CUDA, while nvcc shows the installed toolkit version. Both are valid as long as the toolkit version is ≤ the driver's max CUDA.
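
To confirm that constraint from a script, extract the toolkit version from the nvcc output and compare it to the driver's max CUDA. A sketch using sample strings in the shape of typical tool output (the exact strings are assumptions, not output from a specific install):

```python
import re

nvcc_output = "Cuda compilation tools, release 12.4, V12.4.131"  # sample nvcc --version line
driver_max_cuda = "12.4"  # from the nvidia-smi banner

# Toolkit version is the number after "release" in the nvcc output
toolkit = re.search(r"release\s+([\d.]+)", nvcc_output).group(1)

as_tuple = lambda v: tuple(int(x) for x in v.split("."))
print(as_tuple(toolkit) <= as_tuple(driver_max_cuda))  # True: this pairing is valid
```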

Additional Resources