OpenWebUI & Ollama: Experience AI on Your Terms with Local Hosting
In an era where AI capabilities are increasingly locked behind subscriptions and cloud-based services, OpenWebUI and Ollama provide a powerful alternative that prioritizes privacy, security, and cost efficiency. These open-source tools are revolutionizing how organizations and individuals can harness AI capabilities while maintaining complete control over the models and data they use.
Why use local LLMs? #1 Uncensored Models
One significant advantage of local deployment through Ollama is the ability to run a model of your choosing, including unrestricted LLMs. While cloud-based AI services often place limitations and filters on their models to maintain content control and reduce liability, locally hosted models can be used without these restrictions. This provides several benefits:
- Complete control over model behavior and outputs
- Ability to fine-tune models for specific use cases without limitations
- Access to open-source models with different training approaches
- Freedom to experiment with model parameters and configurations
- No artificial constraints on content generation or topic exploration
This flexibility is particularly valuable for research, creative applications, and specialized industry use cases where standard content filters might interfere with legitimate work.
Here’s an amazing article from Eric Hartford on the topic: Uncensored Models
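For example, once Ollama is running (see the implementation steps below), pulling one of these community models is a single command. The model shown here, dolphin-mistral (one of Eric Hartford’s Dolphin builds published in the Ollama library), is only an illustration; substitute whatever model fits your use case:
docker exec -it ollama ollama pull dolphin-mistral
docker exec -it ollama ollama run dolphin-mistral "Summarize the trade-offs of hosting LLMs locally."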

Why use local LLMs? #2 Privacy
When running AI models locally through Ollama and OpenWebUI, all data processing occurs on your own infrastructure. This means:
- Sensitive data never leaves your network perimeter
- No third-party access to your queries or responses
- Complete control over data retention and deletion policies
- Compliance with data sovereignty requirements
- Protection from cloud provider data breaches
Implementation
Requirements:
- Docker
- NVIDIA Container Toolkit (Optional but Recommended)
- GPU + NVIDIA CUDA Installation (Optional but Recommended)
Step 1: Install Ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
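Once the container is running, you can confirm the Ollama API is listening on port 11434 with a quick request to its version endpoint:
curl http://localhost:11434/api/version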
Step 2: Launch Open WebUI and connect it to Ollama
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
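Both containers should now be up. You can confirm with docker ps, then browse to http://localhost:3000 to reach the Open WebUI interface (host port 3000 maps to the container’s port 8080, as set above):
docker ps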
Need help setting up Docker and the NVIDIA Container Toolkit? See the GPU section below.
OpenWebUI and Ollama
OpenWebUI provides a sophisticated interface for interacting with locally hosted models while maintaining all the security benefits of local deployment. Key features include:
- Intuitive chat interface similar to popular cloud-based AI services
- Support for multiple concurrent model instances
- Built-in prompt templates and history management
- Customizable UI themes and layouts
- API integration capabilities for internal applications
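As a quick sketch of that last point: because Open WebUI fronts a standard Ollama instance, internal applications can also call the Ollama HTTP API directly. The example below assumes a model named llama3.2 has already been pulled; adjust the model name to whatever you are running:
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Explain data sovereignty in one paragraph.", "stream": false}'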
Ollama simplifies the process of running AI models locally while providing robust security features:
- Easy model installation and version management
- Efficient resource utilization through optimized inference
- Support for custom model configurations
- Built-in model verification and integrity checking
- Container-friendly architecture for isolated deployments
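To put the model installation and version management point into practice, the day-to-day model lifecycle comes down to a few commands run against the container (the model tag here is just an example):
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama list
docker exec -it ollama ollama rm llama3.2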
Free AI Inference with local Containers that leverage your NVIDIA GPU
First, let’s get our GPU information from the OS perspective with the following command:
sudo lshw -C display
NVIDIA Drivers
Check that your drivers are up to date so you get the latest features and security patches. We are using Ubuntu, so we’ll check by first running:
nvidia-smi
sudo modinfo nvidia | grep version
Then compare against what’s available in the apt repository to see whether you have the latest version:
apt-cache search nvidia | grep nvidia-driver-5
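If the repository offers a newer driver branch than the one currently loaded, you can install it through apt and reboot. The version number below is a placeholder; use whichever nvidia-driver package the search above returns for your GPU:
sudo apt install nvidia-driver-550
sudo reboot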

If this is your first time installing drivers, please see:
Configure the NVIDIA Container Toolkit Runtime for Docker
nvidia-ctk is a command-line tool that ships with the NVIDIA Container Toolkit. It’s used to configure and manage the container runtime (Docker or containerd) to enable GPU support within containers. To configure Docker, simply run the following:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Here are some of its primary functions:
- Configuring the runtime: Modifies the configuration files of Docker or containerd to include the NVIDIA Container Runtime.
- Generating CDI specifications: Creates configuration files for the Container Device Interface (CDI), which allows containers to access GPU devices.
- Listing CDI devices: Lists the available GPU devices that can be used by containers.
In essence, nvidia-ctk acts as a bridge between the container runtime and the NVIDIA GPU, ensuring that containers can effectively leverage GPU acceleration.
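A quick way to verify the runtime is wired up correctly is to run nvidia-smi inside a throwaway container; the toolkit mounts the driver utilities into the container for you:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi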
Tip: If you want to share GPUs between containers, nvidia-ctk can generate Container Device Interface (CDI) specifications that describe each GPU (and each MIG partition, if your data-center GPU supports MIG) as a named device:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list
You can then assign specific devices to individual containers instead of exposing every GPU, for example:
docker run --gpus device=0
Note that enforcing a hard GPU memory limit per container requires MIG partitioning on supported GPUs; CDI device selection by itself does not restrict memory.
Run Containers with GPUs
After configuring the driver and the NVIDIA Container Toolkit, you are ready to run GPU-powered containers. One of our favorites is the Ollama container, which gives you a local AI inference endpoint.
docker run -it --rm --gpus=all -v /home/ollama:/root/.ollama:z -p 11434:11434 --name ollama ollama/ollama
Notice we are using all GPUs in this instance.
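If you would rather pin the container to a single GPU (handy when other workloads share the machine), replace --gpus=all with a specific device index, then pull and chat with a model from another shell; the model name is only an example:
docker run -it --rm --gpus device=0 -v /home/ollama:/root/.ollama:z -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3.2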