# Ollama + Open WebUI
Run Ollama with Open WebUI on an RTX 4090 Spheron instance. Open WebUI provides a browser-based chat interface backed by any model Ollama can load into VRAM.
## Recommended Hardware
- GPU: RTX 4090 (24GB VRAM)
- Instance Type: Dedicated or Spot
- OS: Ubuntu 22.04 LTS
The RTX 4090 supports:
- Models up to ~13B parameters at Q4 quantization
- Models up to ~7B parameters in full precision (FP16)
For larger models (30B+), use an A100 or H100 instead.
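As a rule of thumb, a model's weight footprint is roughly parameters × bits-per-weight ÷ 8, plus headroom for the KV cache and activations. A minimal sketch of that arithmetic (the 15% overhead factor and the 4.5-effective-bit figure for Q4 are assumptions, not measured values):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.15) -> float:
    """Rough VRAM estimate in GB: weights plus ~15% for KV cache/activations.

    Q4 quantization is treated as ~4.5 effective bits per weight, since
    group scales and zero-points add a little on top of the 4-bit values.
    """
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb * overhead

print(round(estimate_vram_gb(7, 16, overhead=1.0), 1))  # 7B FP16, weights alone: 14.0 GB
print(round(estimate_vram_gb(13, 4.5), 1))              # 13B Q4 with headroom: 8.4 GB
```

Actual usage also grows with context length, so treat these numbers as lower bounds when planning against 24GB.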
## Cloud-Init Startup Script
Paste this into the Startup Script field when deploying. It installs Docker with the NVIDIA Container Toolkit, writes a docker-compose.yml, and starts both services.
```yaml
#cloud-config
runcmd:
  - apt-get update -y
  - apt-get install -y ca-certificates curl gnupg
  - install -m 0755 -d /etc/apt/keyrings
  - curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
  - chmod a+r /etc/apt/keyrings/docker.gpg
  - echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" > /etc/apt/sources.list.d/docker.list
  - apt-get update -y
  - apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
  - curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
  - curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' > /etc/apt/sources.list.d/nvidia-container-toolkit.list
  - apt-get update -y
  - apt-get install -y nvidia-container-toolkit
  - nvidia-ctk runtime configure --runtime=docker
  - systemctl restart docker
  - mkdir -p /opt/ollama
  - |
    cat > /opt/ollama/docker-compose.yml << 'EOF'
    services:
      ollama:
        image: ollama/ollama:latest
        container_name: ollama
        runtime: nvidia
        environment:
          - NVIDIA_VISIBLE_DEVICES=all
        volumes:
          - ollama_data:/root/.ollama
        ports:
          - "11434:11434"
        restart: unless-stopped
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        container_name: open-webui
        ports:
          - "3000:8080"
        environment:
          - OLLAMA_BASE_URL=http://ollama:11434
        volumes:
          - webui_data:/app/backend/data
        depends_on:
          - ollama
        restart: unless-stopped
    volumes:
      ollama_data:
      webui_data:
    EOF
  - chmod 644 /opt/ollama/docker-compose.yml
  - docker compose -f /opt/ollama/docker-compose.yml up -d
```

## Access the Web UI
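Once the instance boots (cloud-init can take a few minutes on first run), you can SSH in and confirm both containers are up and that Ollama can see the GPU. These are standard Docker commands; the container names match the compose file above:

```shell
# Both containers should report an "Up" status
docker ps --format '{{.Names}}: {{.Status}}'

# nvidia-smi run inside the container confirms GPU passthrough is working
docker exec ollama nvidia-smi
```

If `nvidia-smi` fails inside the container but works on the host, the NVIDIA Container Toolkit step likely did not complete; re-run `nvidia-ctk runtime configure --runtime=docker` and restart Docker.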
### SSH Tunnel (Recommended)
```shell
ssh -L 3000:localhost:3000 <user>@<ipAddress>
```

Replace `<user>` with the username shown in the instance details panel in the dashboard (e.g., `ubuntu` for Spheron AI instances). Then open http://localhost:3000 in your browser. On first launch, create an admin account.
## Pull and Use Models
### Pull a model
```shell
docker exec -it ollama ollama pull llama3.2
```

Popular models for the RTX 4090:

```shell
docker exec -it ollama ollama pull llama3.2      # 3B, fast
docker exec -it ollama ollama pull llama3.1:8b   # 8B, good balance
docker exec -it ollama ollama pull mistral:7b    # 7B, good quality
docker exec -it ollama ollama pull codellama:13b # 13B coding model
```

### CLI usage
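To check which models are on disk, and which are currently loaded into VRAM, Ollama's own subcommands can be run the same way:

```shell
docker exec -it ollama ollama list  # models downloaded to the volume
docker exec -it ollama ollama ps    # models currently loaded, with memory use
```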
```shell
docker exec -it ollama ollama run llama3.2
```

### API usage
Ollama serves an HTTP API on port 11434: its native endpoints under `/api`, plus an OpenAI-compatible API under `/v1`. The port is published to the host, so you can tunnel it the same way:
```shell
ssh -L 11434:localhost:11434 <user>@<ipAddress>
```

Then query it:
```shell
curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```

## Memory Guidelines for RTX 4090 (24GB VRAM)
| Model | Quantization | VRAM | Fits on 4090? |
|---|---|---|---|
| 7B | FP16 | ~14GB | Yes |
| 7B | Q4 | ~4GB | Yes |
| 13B | Q4 | ~8GB | Yes |
| 30B | Q4 | ~20GB | Yes (tight) |
| 70B | Q4 | ~40GB | No; use A100/H100 |
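The generate endpoint shown earlier can also be called programmatically through the same 11434 tunnel. A minimal Python sketch using only the standard library (it assumes the tunnel is up and `llama3.2` has been pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumes the SSH tunnel from above


def build_payload(prompt: str, model: str = "llama3.2") -> bytes:
    """Encode a non-streaming /api/generate request body."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()


def generate(prompt: str, model: str = "llama3.2") -> str:
    """POST to /api/generate and return the model's text response."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call it with e.g. `generate("Why is the sky blue?")`. With `"stream": false`, the full answer arrives in one JSON object whose `response` field holds the text; for streaming output, drop that flag and read the response line by line.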
## Additional Resources
- vLLM Inference Server: OpenAI-compatible API for production workloads
- Templates & Images: Additional startup script templates
- Networking: SSH tunneling and port access
- Instance Types: Choosing the right GPU for your model