Kubernetes Addon

Deploy a managed Kubernetes cluster on a Voltage Park Cluster instance using the Spheron K8s addon.

Prerequisites

Select a compatible offer. Browse compatible Voltage Park Cluster offers on the dashboard using the Cluster filter.
Select a Kubernetes version. Available Kubernetes versions are listed in the deployment form when configuring the instance.

Enabling the Addon

Enable the Kubernetes addon from the deployment form on the dashboard and select a Kubernetes version when configuring the instance. The cluster is provisioned automatically during deployment.

Wait for the deployment status to change to running before retrieving the kubeconfig.

Extracting kubeconfig

Once the deployment status is running, retrieve the kubeconfig from the instance details drawer in the dashboard. Save it locally:

mkdir -p ~/.kube
# Paste the kubeconfig from the instance details into ~/.kube/config
# (this overwrites any existing config, so back it up first)
chmod 600 ~/.kube/config

# Verify cluster connectivity
kubectl get nodes

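Expected output (the node name, age, and version depend on your deployment and the Kubernetes version you selected):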

NAME              STATUS   ROLES           AGE   VERSION
spheron-gpu-0     Ready    control-plane   5m    v1.35.0
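If you prefer a repeatable step over pasting by hand, the sketch below installs a kubeconfig that you first saved to a local file from the instance details. It backs up any existing config instead of overwriting it and restricts the new file to owner read/write (600). The function name install_kubeconfig and the source file name in the usage note are hypothetical; they are not part of the Spheron tooling.

```shell
#!/usr/bin/env bash
# Sketch: install a kubeconfig that was saved from the dashboard to a local file.
# install_kubeconfig is a hypothetical helper, not part of the Spheron tooling.
install_kubeconfig() {
  local src="$1"
  local dest="${2:-$HOME/.kube/config}"

  # Refuse to install a missing or empty file.
  if [ ! -s "$src" ]; then
    echo "error: '$src' is missing or empty" >&2
    return 1
  fi

  mkdir -p "$(dirname "$dest")"

  # Back up an existing config rather than silently overwriting it.
  if [ -e "$dest" ]; then
    cp "$dest" "$dest.bak.$(date +%s)"
  fi

  # Copy the file, then restrict it to owner read/write.
  cp "$src" "$dest"
  chmod 600 "$dest"
}
```

Usage, assuming the kubeconfig was saved as ./spheron-kubeconfig.yaml: run install_kubeconfig ./spheron-kubeconfig.yaml, then kubectl wait --for=condition=Ready nodes --all --timeout=300s to block until every node reports Ready.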

Deploying a GPU Workload

Example Pod spec that requests all 8 H100 GPUs on the node. Save it as gpu-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3
      command: ["python3", "-c", "import torch; print(torch.cuda.device_count())"]
      resources:
        limits:
          nvidia.com/gpu: 8
  restartPolicy: Never

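Apply it, wait for the pod to finish (the first pull of the PyTorch image can take several minutes), and then read its logs. With all 8 GPUs allocated, the script should print 8:

kubectl apply -f gpu-pod.yaml
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-training --timeout=600s
kubectl logs gpu-training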
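If the pod stays Pending, kubectl describe pod gpu-training shows the scheduler's reason in its Events; a message about insufficient nvidia.com/gpu means no node advertises enough free GPUs. The sketch below lists each node's allocatable nvidia.com/gpu count through kubectl's custom-columns output and totals it. The helper names list_node_gpus and sum_gpus are hypothetical. Because a single pod runs on one node, compare your request against the per-node value rather than the total.

```shell
#!/usr/bin/env bash
# Sketch: report allocatable GPUs per node and in total.
# list_node_gpus wraps a standard kubectl query; sum_gpus is a hypothetical helper.

# Print "NODE GPUS" for every node; nodes without GPUs show "<none>".
list_node_gpus() {
  kubectl get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}

# Read "NODE GPUS" lines on stdin and print per-node counts plus a total.
# Non-numeric values such as "<none>" count as 0.
sum_gpus() {
  awk '{
    n = ($2 ~ /^[0-9]+$/) ? $2 : 0
    printf "%s %d\n", $1, n
    total += n
  } END { printf "total %d\n", total }'
}
```

Usage: list_node_gpus | sum_gpus. On the single-node cluster shown above, the spheron-gpu-0 line should report the GPUs the node exposes to the scheduler.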

For smaller allocations, change nvidia.com/gpu: 8 to the number of GPUs your workload needs.
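Rather than editing the file by hand for each size, you can render the spec with the GPU count as a parameter. This is a minimal sketch: render_gpu_pod is a hypothetical helper, and its 1 to 8 range assumes a single node with 8 H100 GPUs, as in the example above.

```shell
#!/usr/bin/env bash
# Sketch: print the example Pod spec with a different GPU count.
# render_gpu_pod is a hypothetical helper. The 1-8 range assumes a single
# node with 8 H100 GPUs, as in the example above; one pod cannot span nodes.
render_gpu_pod() {
  local gpus="$1"
  local name="${2:-gpu-training}"

  # Only accept a single digit from 1 to 8.
  case "$gpus" in
    [1-8]) ;;
    *)
      echo "error: GPU count must be an integer from 1 to 8" >&2
      return 1
      ;;
  esac

  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3
      command: ["python3", "-c", "import torch; print(torch.cuda.device_count())"]
      resources:
        limits:
          nvidia.com/gpu: ${gpus}
  restartPolicy: Never
EOF
}
```

Usage: render_gpu_pod 2 gpu-training-2 | kubectl apply -f -. Giving each variant its own pod name avoids colliding with an existing gpu-training pod, since Kubernetes generally does not allow changing a running Pod's GPU limit in place.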

Cluster Health Check

Monitor the health of a running cluster with the pre-provisioned Grafana dashboard.
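For a quick check from the terminal, the Kubernetes API server exposes a readiness endpoint, /readyz, whose verbose form lists each internal check as [+] (passed) or [-] (failed). The sketch below queries it with kubectl get --raw and, on failure, prints the names of the failing checks. It assumes your kubeconfig credentials can read /readyz; the helper names are hypothetical, and this complements rather than replaces the Grafana metrics.

```shell
#!/usr/bin/env bash
# Sketch: a terminal-side readiness check to complement the Grafana dashboard.
# /readyz?verbose is the Kubernetes API server's readiness endpoint;
# failed_checks and check_api_ready are hypothetical helper names.

# Read /readyz?verbose output on stdin and print the name of every check
# marked "[-]" (failed). Exit status is 1 if any failed check was found.
failed_checks() {
  awk 'substr($0, 1, 3) == "[-]" {
         name = substr($0, 4)
         sub(/ .*/, "", name)   # keep only the check name
         print name
         bad = 1
       }
       END { exit bad }'
}

# Query the API server. kubectl exits non-zero when the endpoint reports an
# error status; in that case, list the failed checks (or the raw error text).
check_api_ready() {
  local body
  if body="$(kubectl get --raw='/readyz?verbose' 2>&1)"; then
    echo "API server ready"
    return 0
  fi
  printf '%s\n' "$body" | failed_checks || return 1
  printf '%s\n' "$body" >&2
  return 1
}
```

Usage: check_api_ready. A non-zero exit status means at least one readiness check failed or the API server could not be reached.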

Grafana Dashboard

The pre-provisioned Grafana dashboard shows cluster metrics such as GPU utilization, memory, and network traffic. Its URL is listed in the instance details on the Spheron dashboard; open it directly in a browser. No additional setup is required.

Additional Resources

Instance Types: Cluster requirements for the K8s addon
Regions & Providers: Voltage Park capabilities
Distributed Training: Multi-GPU training without K8s
Volume Mounting (Voltage Park): Persistent storage for K8s workloads