With the rise of hosted LLM providers like OpenAI and Anthropic, it’s fair to ask: why go through the effort of self-hosting?

The answer comes down to three key factors:

  • 💰 Cost – Self-hosting gives you full control over the resources consumed by the model and how much you spend. Optimise costs by using smaller, cheaper models for simpler tasks while reserving heavier models for complex work.
  • ⚙️ Control and flexibility – Self-hosting ensures complete control over the models you use. You decide how they’re customised and fine-tuned, without worrying about unexpected changes or drifts from third-party providers.
  • 🔒 Security – Self-hosting is critical for processing sensitive or confidential data that can’t be shared externally. Your data stays where you want it: under your control.


Step 1: Build a Generic Ollama Container

We’ll containerise Ollama to provide a consistent, portable execution environment for managing models.

  • 1️⃣ Run the Ollama container
  • Use the following commands to launch the Ollama container with persistent storage and port mapping:

    # Create host folders for the model store and the Modelfiles
    mkdir ./ollama-models
    mkdir ./ollama-model-files

    # Start Ollama with both folders mounted and the API published on port 11434
    docker run -d \
      -v ./ollama-models:/root/.ollama \
      -v ./ollama-model-files:/ollama-model-files \
      -p 11434:11434 \
      --name ollama \
      ollama/ollama
  • 2️⃣ Verify the container is running
  • Check the status of the container:

    docker ps

    It should show a running container with the name ollama.

  • 3️⃣ Pull the Llama3.3 model
  • Use the following command to run ollama pull inside the running ollama container, downloading the Llama3.3 model into the persistent storage folder (ollama-models):

    docker exec -it ollama ollama pull llama3.3
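
    Optionally, confirm that the download worked and that the API is reachable from the host. The checks below are a small sketch that only assumes the port mapping from the docker run command above: ollama list prints the locally available models, and Ollama's /api/tags endpoint returns the same information as JSON.

    # List the models stored in the mounted ollama-models folder
    docker exec -it ollama ollama list

    # Query the Ollama HTTP API from the host
    curl http://localhost:11434/api/tags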


Step 2: Customising the Llama3.3 Model

Now that we have a generic Ollama container running, we can create a customised model tailored to specific tasks.

  • 1️⃣ Create a model template
  • Create a file named Modelfile in the ./ollama-model-files folder from Step 1 (mounted inside the container at /ollama-model-files) with the following content:

    FROM llama3.3
    PARAMETER temperature 0.7
    PARAMETER top_p 0.9
    PARAMETER repeat_penalty 1.1
    SYSTEM """
    You turn text posts into HTML pages.
    Use the following html snippet, starting after the word BEGIN, and ending before the word END, as a template.
    BEGIN
    <!DOCTYPE html>
    <html>
      <head>
        <title><!--TITLE--></title>
      </head>
      <body>
        <h1><!--TITLE--></h1>
        <!--POST-->
      </body>
    </html>
    END
    The text post will be provided to you as a prompt.
    Use the first line of the text post as the title, and replace all occurrences of <!--TITLE--> in the template with the title.
    Use the rest of the text post as the post, and convert it to html.
    Use paragraph tags to separate paragraphs, and use unordered lists to create lists with bullets.
    Replace the string <!--POST--> in the template with the generated html.
    """

    For full Modelfile documentation, see the Ollama Modelfile docs.

  • 2️⃣ Create the custom model
  • Use the Ollama CLI to create the custom model from the Modelfile above:

    docker exec -it ollama \
      ollama create post2html -f /ollama-model-files/Modelfile
  • 3️⃣ Test the custom model
  • Run the model to verify it behaves as expected:

    docker exec -it ollama ollama run post2html "How to bake a cake\n\nWhen you bake a cake, you need the following ingredients:\n* 2 eggs\n* 1 cup of flour\n* 1 cup of milk\n* 1 tablespoon of butter\n\nMix everything together."

    If all went well, this should output the following html (or something equivalent):

    <!DOCTYPE html>
    <html>
      <head>
        <title>How to bake a cake</title>
      </head>
      <body>
        <h1>How to bake a cake</h1>
        <p>When you bake a cake, you need the following ingredients:</p>
        <ul>
          <li>2 eggs</li>
          <li>1 cup of flour</li>
          <li>1 cup of milk</li>
          <li>1 tablespoon of butter</li>
        </ul>
        <p>Mix everything together.</p>
      </body>
    </html>
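
    Because the container publishes port 11434, the same custom model can also be invoked over Ollama's HTTP API instead of the CLI. The sketch below assumes the container from Step 1 is still running on localhost; here the \n escapes are part of a JSON string, so they reach the model as real newlines:

    curl http://localhost:11434/api/generate -d '{
      "model": "post2html",
      "prompt": "How to bake a cake\n\nWhen you bake a cake, you need the following ingredients:\n* 2 eggs\n* 1 cup of flour\n* 1 cup of milk\n* 1 tablespoon of butter\n\nMix everything together.",
      "stream": false
    }'

    With "stream": false, the API returns a single JSON object whose response field contains the generated HTML.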


Why This Approach Works

By separating the Ollama container from the models and customisations:

  • Portability: The same container can be reused, with different models attached at runtime.
  • Simplified versioning: Models and customisations become static artifacts, easy to version, store, and share.
  • Environment consistency: The Ollama container guarantees a predictable runtime for all models.
  • Flexibility: New models or updates can be added without rebuilding the container—simply attach updated storage.

This clean separation of execution (the container) and data (the models) forms the ideal foundation for scaling AI workflows to production.
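
For example, because the pulled base model and the post2html model both live in the mounted ollama-models folder (and the Modelfile in ollama-model-files), the container itself is disposable. A minimal sketch: if the container is removed or the Ollama image is updated, a new container can reattach the same folders and immediately serve the same models, with no rebuild and no re-pull:

    # Remove the old container; the mounted folders on the host are untouched
    docker rm -f ollama

    # Start a fresh container against the same persistent storage
    docker run -d \
      -v ./ollama-models:/root/.ollama \
      -v ./ollama-model-files:/ollama-model-files \
      -p 11434:11434 \
      --name ollama \
      ollama/ollama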


In the next post, we’ll explore how to take this local setup to the cloud—deploying customised models in AWS and orchestrating workflows for real-world production use.
