With the rise of hosted LLM providers like OpenAI and Anthropic, it’s fair to ask: why go through the effort of self-hosting?
The answer comes down to three key factors:
- 💰 Cost – Self-hosting gives you full control over the resources consumed by the model and how much you spend. Optimise costs by using smaller, cheaper models for simpler tasks while reserving heavier models for complex work.
- ⚙️ Control and flexibility – Self-hosting ensures complete control over the models you use. You decide how they’re customised and fine-tuned, without worrying about unexpected changes or drifts from third-party providers.
- 🔒 Security – Self-hosting is critical for processing sensitive or confidential data that can’t be shared externally. Your data stays where you want it: under your control.
Step 1: Build a Generic Ollama Container
We’ll containerise Ollama to provide a consistent, portable execution environment for managing models.
- 1️⃣ Run the Ollama container
- 2️⃣ Verify the container is running
- 3️⃣ Pull the Llama3.3 model
Use the following commands to launch the Ollama container with persistent storage and port mapping:
mkdir ./ollama-models
mkdir ./ollama-model-files
docker run -d \
-v ./ollama-models:/root/.ollama \
-v ./ollama-model-files:/ollama-model-files \
-p 11434:11434 \
--name ollama \
ollama/ollama
Check the status of the container:
docker ps
It should show a running container with the name ollama.
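You can also check that the Ollama API is reachable through the mapped port. This assumes curl is available on the host:
curl http://localhost:11434/
If the server is up, this returns a short status message (typically "Ollama is running").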
Use the following command to run ollama pull inside the running ollama container, downloading the Llama3.3 model into the mounted persistent storage folder (ollama-models):
docker exec -it ollama ollama pull llama3.3
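Once the pull completes, you can confirm the model is available by listing the models known to the container:
docker exec -it ollama ollama list
The output should include an entry for llama3.3.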
Step 2: Customising the Llama3.3 Model
Now that we have a generic Ollama container running, we can create a customised model tailored to specific tasks.
- 1️⃣ Create a model template
- 2️⃣ Create the custom model
- 3️⃣ Test the custom model
Create a file named Modelfile with the following content and save it in the ./ollama-model-files directory (which is mounted into the container):
FROM llama3.3
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
SYSTEM """
You turn text posts into HTML pages.
Use the following html snippet, starting after the word BEGIN, and ending before the word END, as a template.
BEGIN
<!DOCTYPE html>
<html>
<head>
<title><!--TITLE--></title>
</head>
<body>
<h1><!--TITLE--></h1>
<!--POST-->
</body>
</html>
END
The text post will be provided to you as a prompt.
Use the first line of the text post as a title, and replace all occurrences of <!--TITLE--> in the template with the title.
Use the rest of the text post as the post, and convert it to html.
Use paragraph tags to separate paragraphs, and use unordered lists to create lists with bullets.
Replace the <!--POST--> string in the template with the generated html.
"""
For full Modelfile documentation, see the Ollama Modelfile docs.
Now use the Ollama CLI to create a new model from the Modelfile above:
docker exec -it ollama \
ollama create post2html -f /ollama-model-files/Modelfile
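To double-check that the custom model was registered, you can ask Ollama to print the Modelfile it stored for it (the show command with the --modelfile flag is part of the standard Ollama CLI, though output details vary by version):
docker exec -it ollama ollama show post2html --modelfile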
Run the model to verify it behaves as expected:
docker exec -it ollama ollama run post2html $'How to bake a cake\n\nWhen you bake a cake, you need the following ingredients:\n* 2 eggs\n* 1 cup of flour\n* 1 cup of milk\n* 1 tablespoon of butter\n\nMix everything together.'
If all went well, this should output the following html (or something equivalent):
<!DOCTYPE html>
<html>
<head>
<title>How to bake a cake</title>
</head>
<body>
<h1>How to bake a cake</h1>
<p>When you bake a cake, you need the following ingredients:</p>
<ul>
<li>2 eggs</li>
<li>1 cup of flour</li>
<li>1 cup of milk</li>
<li>1 tablespoon of butter</li>
</ul>
<p>Mix everything together.</p>
</body>
</html>
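Because port 11434 is mapped to the host, the same custom model can also be called over Ollama's HTTP API, which is how other services will typically consume it. A minimal example with curl, using a shortened prompt for readability:
curl http://localhost:11434/api/generate -d '{
  "model": "post2html",
  "prompt": "How to bake a cake\n\nMix two eggs, a cup of flour and a cup of milk together.",
  "stream": false
}'
The generated HTML is returned in the response field of the JSON reply.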
Why This Approach Works
By separating the Ollama container from the models and customisations:
- ✅ Portability: The same container can be reused, with different models attached at runtime.
- ✅ Simplified versioning: Models and customisations become static artifacts, easy to version, store, and share.
- ✅ Environment consistency: The Ollama container guarantees a predictable runtime for all models.
- ✅ Flexibility: New models or updates can be added without rebuilding the container; simply attach updated storage, as sketched below.
This clean separation of execution (the container) and data (the models) forms the ideal foundation for scaling AI workflows to production.
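As a concrete illustration of that separation, the model data and Modelfiles can be packaged as an artifact and attached to a fresh container on another machine. The archive name below is just an example:
# On the source machine: package the model directories as a static artifact.
tar -czf ollama-artifacts.tar.gz ./ollama-models ./ollama-model-files
# On the target machine: unpack and attach the same directories
# to a new container running the unchanged ollama/ollama image.
tar -xzf ollama-artifacts.tar.gz
docker run -d \
  -v ./ollama-models:/root/.ollama \
  -v ./ollama-model-files:/ollama-model-files \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama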
In the next post, we’ll explore how to take this local setup to the cloud—deploying customised models in AWS and orchestrating workflows for real-world production use.