Deploying Ollama Server
Learn how to deploy an Ollama server on Spheron with the ability to load any model from the Ollama registry.
Prerequisites
- Spheron account
- Basic knowledge of OpenAI API & SDK
- An idea of which Ollama models you want to use; if not, you can search for models on ollama.com
Deployment Steps
Access Spheron Console
- Navigate to console.spheron.network
- Log in to your account
- If you are new to Spheron, you should already have a $20 free credit balance. If not, please reach out to us on Discord to get free credits.
Select a GPU
- Go to Marketplace tab
- You have 2 options to choose from:
- Secure: For deploying on secure, data-center-grade providers. Highly reliable, but costs more.
- Community: For deploying on community fizz nodes running on someone's home machine. These may be less reliable.
- Now select the GPU you want to deploy on. You can also search by GPU name to find the exact GPU you need.
Configure the Deployment
- Select the template Ollama Server
- Enter the name of any model you want to preload on the Ollama Server in the OLLAMA_MODEL field (see the example after this list). If you don't know which model to use, you can search for models on ollama.com
- If you want, you can increase the GPU count to access multiple GPUs at once.
- You can select the duration of the deployment.
- Click the Confirm button to start the deployment
- Deployment typically completes in under 60 seconds
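For example, if you wanted to preload Llama 3 from the Ollama registry, the OLLAMA_MODEL field would be set like this (the model tag here is illustrative, not required):

OLLAMA_MODEL=llama3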
Using Your Ollama Server
- Once deployed, go to the Overview tab.
- Click the ollama-test service to open the Ollama Server service.
- Use the connection URL to access the Ollama Server and its API. For example, to list the models currently available on the server:
curl http://your-deployment-url/api/tags
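You can also load additional models from the Ollama registry onto the running server by calling the pull endpoint. This is a minimal sketch assuming the standard Ollama /api/pull endpoint and an illustrative model name (llama3):

curl -X POST http://your-deployment-url/api/pull -d '{"model": "llama3"}'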
Model Management on Ollama Server
- List available models:
curl http://your-deployment-url/api/tags
- Run inference:
curl -X POST http://your-deployment-url/api/generate -d '{"model": "llama2", "prompt": "Hello!"}'
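Note that /api/generate streams its response line by line by default; include "stream": false in the request body if you want a single JSON response. If you prefer the OpenAI API & SDK mentioned in the prerequisites, Ollama also exposes an OpenAI-compatible endpoint; the following is a sketch assuming the default /v1 path and the illustrative llama2 model:

curl http://your-deployment-url/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "llama2", "messages": [{"role": "user", "content": "Hello!"}]}'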
Best Practices
- Start with smaller models before moving to larger ones
- Consider GPU instances for better performance and faster inference