CUDA & NVIDIA Drivers
What Are NVIDIA Drivers?
NVIDIA drivers are software components that let the operating system communicate with the GPU hardware. Without a compatible driver, the GPU cannot be used for any compute workload.
On Spheron instances, NVIDIA drivers come pre-installed on all GPU images. You don't need to install them manually.
Key points:
- Drivers are specific to the GPU architecture (e.g., Hopper for H100, Ampere for A100)
- Each driver version exposes a maximum supported CUDA version
- Driver version ≠ CUDA version; they are separate but must be compatible
Check the installed driver after connecting:
```bash
nvidia-smi
```

Example output:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.4     |
+-----------------------------------------------------------------------------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================================================================|
|   0  NVIDIA H100 80GB HBM3          Off | 00000000:00:00.0  Off  |                    0 |
| N/A   34C    P0             72W / 700W  |      0MiB / 81920MiB   |      0%      Default |
+-----------------------------------------------------------------------------------------+
```

The Driver Version and CUDA Version fields tell you exactly what is installed.
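For scripting, the same two fields can be read programmatically. Here is a minimal sketch that pulls them out of the header line, using the sample output above as a hard-coded string; on a live instance, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is the more robust way to get the driver version.

```python
import re

# Header line as printed by nvidia-smi (sample copied from the output above)
header = "| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |"

# Extract the driver version and the maximum supported CUDA version
driver_version = re.search(r"Driver Version:\s*([\d.]+)", header).group(1)
cuda_version = re.search(r"CUDA Version:\s*([\d.]+)", header).group(1)

print(driver_version)  # → 550.127.08
print(cuda_version)    # → 12.4
```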
What Is CUDA?
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform that lets software run general-purpose computations on the GPU. Nearly all AI/ML frameworks depend on it for NVIDIA GPU acceleration.
CUDA has two components:

| Component | What it is | How to check |
|---|---|---|
| CUDA Runtime | Libraries used by your application | nvidia-smi → shows max supported CUDA |
| CUDA Toolkit | Compiler (nvcc) + dev tools | nvcc --version |
```bash
# Compiler version (CUDA toolkit)
nvcc --version

# List all CUDA installations
ls /usr/local/ | grep cuda
```

How CUDA and Drivers Affect Development
The relationship between drivers, CUDA, and your frameworks determines what works:
```
GPU Hardware
└── NVIDIA Driver (minimum requirement)
    └── CUDA Runtime (must be ≤ driver's max CUDA)
        └── Framework (PyTorch, TensorFlow, JAX…)
            └── Your Code
```

- A newer driver supports a higher maximum CUDA version and remains backward-compatible with older CUDA runtimes
- If your framework requires CUDA 12.4 but the instance only has CUDA 12.0, builds or training runs will fail
- Mismatched versions are the most common source of "CUDA not available" errors
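The compatibility rule above amounts to a simple version comparison. A minimal sketch, with illustrative version strings rather than live queries:

```python
def cuda_tuple(version: str) -> tuple:
    """Turn a 'major.minor' CUDA version string into a comparable tuple."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))

driver_max_cuda = "12.4"  # the "CUDA Version" field reported by nvidia-smi
framework_cuda = "12.1"   # e.g. the value of torch.version.cuda

# Backward compatibility: the runtime must not exceed the driver's maximum
compatible = cuda_tuple(framework_cuda) <= cuda_tuple(driver_max_cuda)
print(compatible)  # → True: a driver with max CUDA 12.4 can run a 12.1 runtime
```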
| Framework | Minimum CUDA | Recommended CUDA |
|---|---|---|
| PyTorch 2.3+ | 11.8 | 12.1 – 12.4 |
| TensorFlow 2.16+ | 12.3 | 12.3 – 12.4 |
| JAX (latest) | 12.0 | 12.4+ |
| vLLM 0.4+ | 12.1 | 12.4 |
Always check the framework's official docs for the exact compatibility matrix before selecting a CUDA version.
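As a sketch, the matrix above can be encoded as a pre-flight check before deployment. The minimums are copied from the table; the dictionary keys and function name are illustrative, not an established API:

```python
# Minimum CUDA version per framework, copied from the table above
MIN_CUDA = {
    "pytorch-2.3": (11, 8),
    "tensorflow-2.16": (12, 3),
    "jax": (12, 0),
    "vllm-0.4": (12, 1),
}

def meets_minimum(framework: str, instance_cuda: tuple) -> bool:
    """True if the instance's CUDA version satisfies the framework's minimum."""
    return instance_cuda >= MIN_CUDA[framework]

print(meets_minimum("tensorflow-2.16", (12, 0)))  # → False: TF 2.16 needs 12.3+
print(meets_minimum("pytorch-2.3", (12, 4)))      # → True
```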
Available CUDA Versions on Spheron
| CUDA Version | NVIDIA Driver | Notes |
|---|---|---|
| 12.0 | 525+ | Maximum compatibility with older frameworks |
| 12.4 | 550+ | Stable, broadly compatible; good default |
| 12.6 | 560+ | Optimized for RTX 5090, H100, newer GPUs |
| 12.8 Open | 570+ (open-source) | Open-source kernel module, community use |
| 13.0 Open | 575+ (open-source) | Latest features; early adoption and research use |
Open-source drivers (12.8 Open, 13.0 Open) use NVIDIA's open-source kernel module instead of the proprietary one. They are functionally equivalent for most AI/ML workloads and are the preferred choice in community and research environments.
Choosing a Driver Version at Deployment
When deploying an instance on Spheron, the CUDA version and driver are selected via the OS image dropdown; they are bundled together.
Step-by-step
- Go to app.spheron.ai → Deploy
- Select your GPU
- Open the OS / Environment dropdown
- Choose an image that includes your desired CUDA version:
| Goal | Recommended Image |
|---|---|
| Stable AI/ML work | Ubuntu 22.04 + CUDA 12.4 or Ubuntu 24.04 ML PyTorch |
| Latest GPU support (H100, RTX 5090) | Ubuntu 24.04 + CUDA 12.6 |
| Open-source driver preference | Ubuntu 22.04 + CUDA 12.8 Open |
| Research and early adoption | Ubuntu 24.04 + CUDA 13.0 Open |
| Legacy framework compatibility | Ubuntu 20.04 + CUDA 12.0 |
- Deploy; the instance is ready in 30–60 seconds with the driver already loaded
You cannot change the CUDA version after deployment. If you need a different version, deploy a new instance with the correct image.
Verify After Deployment
Once connected via SSH, confirm the environment is set up correctly:
```bash
# Driver version and max supported CUDA
nvidia-smi

# CUDA toolkit version (compiler)
nvcc --version

# Installed CUDA directories
ls /usr/local/ | grep cuda

# Quick Python check (PyTorch)
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('CUDA version:', torch.version.cuda)"
```

Troubleshooting
nvidia-smi: command not found
The instance may have launched on a CPU-only node or the driver failed to load. Redeploy with a GPU image.
CUDA not available in PyTorch/TensorFlow
The framework's CUDA build doesn't match the installed runtime. Reinstall the framework with the correct CUDA wheel:
```bash
# PyTorch example - match cu124 to your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu124
```

nvcc: command not found but nvidia-smi works
The CUDA toolkit (compiler) isn't installed, only the runtime. Install it:
```bash
apt-get install -y cuda-toolkit-12-4
```

Version mismatch between nvidia-smi and nvcc
This is expected behavior: nvidia-smi shows the driver's maximum supported CUDA, while nvcc shows the toolkit version. Both are valid as long as the toolkit version ≤ the driver's max CUDA.
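That rule can be checked mechanically by parsing both outputs. A sketch using sample strings rather than live queries (the nvcc line follows its usual `release X.Y` format):

```python
import re

nvcc_out = "Cuda compilation tools, release 12.4, V12.4.131"  # sample nvcc --version line
smi_max = "12.4"  # sample "CUDA Version" field from nvidia-smi

# Toolkit version from nvcc, driver's max CUDA from nvidia-smi
m = re.search(r"release (\d+)\.(\d+)", nvcc_out)
toolkit = (int(m.group(1)), int(m.group(2)))
driver_max = tuple(int(x) for x in smi_max.split("."))

# Valid as long as the toolkit does not exceed the driver's max CUDA
print("ok" if toolkit <= driver_max else "toolkit newer than driver supports")  # → ok
```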
Additional Resources
- Ubuntu Environments: Full list of OS images and configurations
- PyTorch Environment: PyTorch + CUDA setup
- TensorFlow Environment: TensorFlow + CUDA setup
- Templates & Images: Pre-built startup scripts