OpenWebUI & Ollama: Experience AI on Your Terms with Local Hosting

In an era where most AI tools sit behind subscriptions and cloud-hosted services, OpenWebUI and Ollama provide a powerful alternative that prioritizes privacy, security, and cost efficiency. These open-source tools let organizations and individuals use AI capabilities while keeping full control of the models they run and the data they process.

Why use local LLMs? #1 Uncensored Models

One significant advantage of local deployment through Ollama is the ability to run any model you choose, including unrestricted LLMs. Cloud-based AI services often apply limitations and filters to their models to maintain content control and reduce liability; locally hosted models can be used without these restrictions. This provides several benefits:

  • Complete control over model behavior and outputs
  • Ability to fine-tune models for specific use cases without limitations
  • Access to open-source models with different training approaches
  • Freedom to experiment with model parameters and configurations
  • No artificial constraints on content generation or topic exploration

This flexibility is particularly valuable for research, creative applications, and specialized industry use cases where standard content filters might interfere with legitimate work.

For more background, see Eric Hartford's article: Uncensored Models

Why use local LLMs? #2 Privacy

When running AI models locally through Ollama and OpenWebUI, all data processing occurs on your own infrastructure. This means:

  • Sensitive data never leaves your network perimeter
  • No third-party access to your queries or responses
  • Complete control over data retention and deletion policies
  • Compliance with data sovereignty requirements
  • Protection from cloud provider data breaches

Implementation

Requirements:

  • Docker
  • NVIDIA Container Toolkit (Optional but Recommended)
  • GPU + NVIDIA CUDA installation (Optional but Recommended)

Step 1: Install Ollama

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
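Once the container is up, a quick check confirms that the API answers and that a model is available. The sketch below assumes the port mapping from the command above (localhost:11434). The OLLAMA_URL variable, the helper function names, and the llama3 model name used in the usage note are illustrative choices, not part of the original setup.

```shell
#!/bin/sh
# Assumption: the Ollama API is published on localhost:11434, as in Step 1.
# OLLAMA_URL and the helper names below are illustrative, not Ollama settings.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the full URL for an API path such as "api/tags".
ollama_url() {
  printf '%s/%s\n' "${OLLAMA_URL%/}" "${1#/}"
}

# Download a model into the "ollama" volume via the container's own CLI.
pull_model() {
  docker exec ollama ollama pull "$1"
}

# Print the models the server currently has, as JSON.
list_models() {
  curl -fsS "$(ollama_url api/tags)"
}
```

For example, running pull_model llama3 and then list_models should show the model in the returned JSON. Substitute any model name from the Ollama library.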

Step 2: Launch Open WebUI

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
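With this port mapping the interface is served at http://localhost:3000. Open WebUI reads the address of the Ollama server from the OLLAMA_BASE_URL environment variable, so if Ollama runs somewhere other than the Docker host you may need to set it explicitly. The following is a minimal sketch that prints a variant of the command above with that variable added; the function name and the default URL are assumptions for illustration.

```shell
#!/bin/sh
# Print the Open WebUI docker run command with an explicit Ollama address.
# Assumption: the default below matches an Ollama container published on the
# Docker host at port 11434, as in Step 1.
open_webui_cmd() {
  ollama_base_url="${1:-http://host.docker.internal:11434}"
  # printf repeats the format for each argument, joining them with spaces.
  printf '%s ' docker run -d -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -e "OLLAMA_BASE_URL=${ollama_base_url}" \
    -v open-webui:/app/backend/data \
    --name open-webui --restart always \
    ghcr.io/open-webui/open-webui:main
  printf '\n'
}
```

Calling open_webui_cmd with no argument prints the command for a local Ollama; passing a URL such as http://192.168.1.50:11434 (a made-up address) prints the command for a remote one, which you can then review and run.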

Need help setting up Docker and the NVIDIA Container Toolkit?

OpenWebUI and Ollama

OpenWebUI provides a sophisticated interface for interacting with locally hosted models while maintaining all the security benefits of local deployment. Key features include:

  • Intuitive chat interface similar to popular cloud-based AI services
  • Support for multiple concurrent model instances
  • Built-in prompt templates and history management
  • Customizable UI themes and layouts
  • API integration capabilities for internal applications

Ollama simplifies the process of running AI models locally while providing robust security features:

  • Easy model installation and version management
  • Efficient resource utilization through optimized inference
  • Support for custom model configurations
  • Built-in model verification and integrity checking
  • Container-friendly architecture for isolated deployments
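Internal applications can talk to the Ollama server directly over HTTP. As a rough illustration, the sketch below sends one non-streaming prompt to the /api/generate endpoint on the port published earlier. The function names are hypothetical, the model name in the usage note is only an example, and the payload builder does no JSON escaping, so it assumes the model name and prompt contain no double quotes or backslashes.

```shell
#!/bin/sh
# Build the JSON body for Ollama's /api/generate endpoint.
# Assumption: neither argument contains double quotes or backslashes,
# because this sketch does not escape them.
generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$1" "$2"
}

# Send a single non-streaming prompt to a local Ollama server.
ask_ollama() {
  curl -fsS http://localhost:11434/api/generate \
    -H 'Content-Type: application/json' \
    -d "$(generate_payload "$1" "$2")"
}
```

A call such as ask_ollama llama3 'Why is the sky blue?' returns a JSON document whose response field holds the generated text, provided the model has already been pulled.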


Ollama on Docker on NVIDIA

Free AI inference with local containers that leverage your NVIDIA GPU

First, let’s find out our GPU information from the OS perspective with the following command:

sudo lshw -C display

NVIDIA Drivers

Check that your drivers are up to date so you get the latest features and security patches. We are using Ubuntu, so we first check the installed driver version with:

nvidia-smi
sudo modinfo nvidia | grep version

Then compare that version with what is available in the apt repository to see whether you have the latest:

apt-cache search nvidia | grep nvidia-driver-5
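The grep pattern above hard-codes the 5xx driver series. A small sketch like the following derives the branch from the installed driver instead, so the comparison keeps working on other series. The function names are hypothetical, and the version number mentioned in the comments is only an example.

```shell
#!/bin/sh
# Extract the major branch (e.g. 550) from a full version (e.g. 550.54.14).
driver_major() {
  printf '%s\n' "${1%%.*}"
}

# Report the installed driver version as seen by nvidia-smi (first GPU only).
installed_driver() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# List the driver packages apt offers for the same major branch.
apt_candidates() {
  apt-cache search --names-only "^nvidia-driver-$(driver_major "$1")"
}
```

Running apt_candidates "$(installed_driver)" then lists the packages in the same branch as the driver that is currently loaded.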

NVIDIA-SMI and Drivers in Ubuntu

If this is your first time installing drivers please see:

Configure the NVIDIA Toolkit Runtime for Docker

nvidia-ctk is a command-line tool that is installed with the NVIDIA Container Toolkit. It is used to configure and manage the container runtime (Docker or containerd) to enable GPU support within containers. To configure Docker, run the following:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Here are some of its primary functions:

  • Configuring runtime: Modifies the configuration files of Docker or containerd to include the NVIDIA Container Runtime.
  • Generating CDI specifications: Creates configuration files for the Container Device Interface (CDI), which allows containers to access GPU devices.
  • Listing CDI devices: Lists the available GPU devices that can be used by containers.

In essence, nvidia-ctk acts as a bridge between the container runtime and the NVIDIA GPU, ensuring that containers can effectively leverage GPU acceleration.
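To confirm that the runtime configure step took effect, you can check that Docker's daemon configuration now registers an nvidia runtime and then run nvidia-smi inside a throwaway container. This is a minimal sketch: the function names are hypothetical, /etc/docker/daemon.json is assumed to be the file nvidia-ctk edited (the usual location on Ubuntu), and the grep is a plain text match rather than a real JSON parse.

```shell
#!/bin/sh
# Succeed if the given Docker daemon config mentions an "nvidia" runtime.
# Assumption: defaults to /etc/docker/daemon.json when no path is given.
has_nvidia_runtime() {
  grep -q '"nvidia"' "${1:-/etc/docker/daemon.json}"
}

# End-to-end check: run nvidia-smi inside a temporary container.
gpu_smoke_test() {
  docker run --rm --gpus all ubuntu nvidia-smi
}
```

If has_nvidia_runtime succeeds and gpu_smoke_test prints the same GPU table you see on the host, containers can reach the GPU.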

Tip: CDI specifications describe the GPUs that exist on the host. nvidia-ctk has no option to cut a single GPU into memory-limited slices, so a GPU with 6GB of RAM cannot be split into 2GB and 4GB devices this way. To generate the CDI specification and list the device names it defines, run:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list

To limit a container to one specific GPU instead of all of them, pass the device index to Docker like this:

docker run --rm --gpus device=0 ubuntu nvidia-smi

Run Containers with GPUs

After configuring the driver and the NVIDIA Container Toolkit, you are ready to run GPU-powered containers. One of our favorites is the Ollama container, which lets you run AI inference endpoints.

docker run -it --rm --gpus=all -v /home/ollama:/root/.ollama:z -p 11434:11434 --name ollama ollama/ollama

Notice that --gpus=all gives the container access to every GPU on the host.
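If the same script has to run on machines with and without a GPU, one option is to add the GPU flag only when the NVIDIA tooling is present. The sketch below is one way to do that; the function names are hypothetical, it treats the presence of nvidia-smi on the host as a good-enough signal that the driver and toolkit are installed, and it uses the named volume from Step 1 rather than the /home/ollama bind mount.

```shell
#!/bin/sh
# Print "--gpus=all" when the probe command exists, otherwise print nothing.
# Assumption: nvidia-smi being on the PATH means GPU containers will work.
gpu_flag() {
  if command -v "${1:-nvidia-smi}" >/dev/null 2>&1; then
    printf '%s\n' '--gpus=all'
  fi
}

# Start Ollama with GPU access when available, CPU-only otherwise.
start_ollama() {
  # $(gpu_flag) is deliberately unquoted so an empty result adds no argument.
  docker run -d $(gpu_flag) -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama
}
```

Calling start_ollama then launches the container either way, falling back to CPU inference when no GPU tooling is detected.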
