Ollama on Lawrencium
Ollama is provided as a module on Lawrencium and can be used to run a variety of open-weight Large Language Models (LLMs) on Lawrencium hardware.
Ollama should only be run on compute nodes. While some smaller models can be run on CPU partitions, running Ollama on GPU partitions (es0 or es1) gives much better performance. On this page, we discuss how to run Ollama in an interactive slurm allocation. We also provide an Open OnDemand app that lets you run Ollama on Open OnDemand and access it through Jupyter notebooks. You can also run Ollama through a slurm batch submission script.
General suggestions
For general use and experimentation, we recommend the gpt-oss-high:20b model provided by the ai/ollama/0.12.6 module. The model performs well with 2 GPUs on the es0 partition.
Example: Getting an interactive slurm allocation
Please substitute your actual account name and adjust the other parameters as needed.
```
srun -p es0 -A account_name -q es_normal -N 1 -t 1:00:00 --gres=gpu:2 --cpus-per-task=2 --pty bash
```
For the ai/ollama/0.12.6 module:

- Load the ai/ollama/0.12.6 module

  ```
  module load ai/ollama/0.12.6
  ```

- Launch the ollama server in the background

  ```
  ollama serve > /dev/null 2>&1 &
  ```

- List the available models

  ```
  ollama list
  NAME                     ID              SIZE      MODIFIED
  gpt-oss-high:20b         5b4abdd0eea4    13 GB     3 days ago
  gpt-oss-safeguard:20b    f2e795d0099c    13 GB     5 days ago
  gemma3n:e2b              719372f8c7de    5.6 GB    5 days ago
  gemma3n:e4b              15cb39fd9394    7.5 GB    5 days ago
  granite4:tiny-h          566b725534ea    4.2 GB    5 days ago
  gpt-oss:20b              17052f91a42e    13 GB     9 days ago
  llama3.2-vision:11b      6f2f9757ae97    7.8 GB    9 days ago
  ```

- Run an available model and obtain a terminal chat interface

  ```
  ollama run gpt-oss:20b
  ```

- Run a particular model with a prompt and save the response to a file

  ```
  ollama run gpt-oss:20b "What model are you?" > response.txt
  ```
For the ai/ollama/0.6.8 module:

- Load the ai/ollama/0.6.8 module

  ```
  module load ai/ollama/0.6.8
  ```

- Launch the ollama server in the background

  ```
  ollama serve > /dev/null 2>&1 &
  ```

- List the available models

  ```
  ollama list
  NAME                     ID              SIZE      MODIFIED
  devstral:24b             c4b2fa0c33d7    14 GB     4 months ago
  codegemma:2b             926331004170    1.6 GB    4 months ago
  nomic-embed-text:v1.5    0a109f422b47    274 MB    5 months ago
  deepseek-coder:6.7b      ce298d984115    3.8 GB    5 months ago
  deepseek-coder:1.3b      3ddd2d3fc8d2    776 MB    5 months ago
  llama3.2:1b              baf6a787fdff    1.3 GB    5 months ago
  qwen3:1.7b               458ce03a2187    1.4 GB    5 months ago
  qwen3:30b-a3b            2ee832bc15b5    18 GB     5 months ago
  qwen3:8b                 e4b5fd7f8af0    5.2 GB    5 months ago
  deepseek-r1:8b           28f8fd6cdc67    4.9 GB    5 months ago
  deepseek-r1:7b           0a8c26691023    4.7 GB    5 months ago
  deepseek-r1:1.5b         a42b25d8c10a    1.1 GB    5 months ago
  gemma3:4b                a2af6cc3eb7f    3.3 GB    5 months ago
  gemma3:12b               f4031aab637d    8.1 GB    5 months ago
  gemma3:12b-it-qat        5d4fa005e7bb    8.9 GB    5 months ago
  gemma3:1b                8648f39daa8f    815 MB    5 months ago
  ```

- Run an available model and obtain a terminal chat interface

  ```
  ollama run gemma3:4b
  ```

- Run a particular model with a prompt and save the response to a file

  ```
  ollama run gemma3:4b "What model are you?" > response.txt
  ```
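Accessing the server programmatically

Besides the terminal chat interface, a running ollama server also exposes an HTTP API, which is how Jupyter notebooks and other clients can talk to it. The sketch below is a minimal example, assuming the server was launched as in the steps above and is listening on Ollama's default port 11434; the model name is only an example, so use one that appears in your `ollama list` output.

```
# Query the local ollama server's REST API (default port 11434).
# "stream": false returns the whole response as a single JSON object,
# with the generated text in its "response" field.
curl http://localhost:11434/api/generate \
  -d '{"model": "gpt-oss:20b", "prompt": "What model are you?", "stream": false}'
```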
Batch submission tips
In a slurm batch submission script, add a sleep command (e.g. sleep 10) after the ollama serve command to give the ollama server time to start before you use it.
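For reference, here is a minimal batch script sketch that puts these pieces together. It reuses the placeholder account_name and the partition, QoS, and GPU settings from the interactive example above; adjust them for your own allocation and model choice.

```
#!/bin/bash
#SBATCH --job-name=ollama-example
#SBATCH --partition=es0
#SBATCH --account=account_name
#SBATCH --qos=es_normal
#SBATCH --nodes=1
#SBATCH --time=1:00:00
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=2

module load ai/ollama/0.12.6

# Start the ollama server in the background and give it time to start up
ollama serve > /dev/null 2>&1 &
sleep 10

# Run a prompt non-interactively and save the response to a file
ollama run gpt-oss:20b "What model are you?" > response.txt
```

Submit the script with sbatch, e.g. sbatch ollama_example.sh (the filename is just an example).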