First, let’s get our GPU information from the OS perspective with the following command:

sudo lshw -C display
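If lshw is not installed on your system, lspci gives a quicker (though less detailed) view of the same hardware:

lspci | grep -i nvidia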

NVIDIA Drivers

Check that your drivers are up to date so you get the latest features and security patches. We are using Ubuntu, so we will first check the installed driver with:

nvidia-smi
sudo modinfo nvidia | grep version
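If you only need the driver version, for scripting for example, nvidia-smi can print it directly:

nvidia-smi --query-gpu=driver_version --format=csv,noheader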

Then compare against what’s available in the apt repo to see if you have the latest:

apt-cache search nvidia | grep nvidia-driver-5
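If the repo carries a newer driver branch than the one modinfo reported, install it and reboot. The branch number below is only an example; use whichever version the previous command returned:

sudo apt install nvidia-driver-550
sudo reboot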

NVIDIA-SMI and Drivers in Ubuntu

If this is your first time installing drivers, please see:

Configure the NVIDIA Container Toolkit Runtime for Docker

nvidia-ctk is a command-line tool you get when you install the NVIDIA Container Toolkit. It’s used to configure and manage the container runtime (Docker or containerd) to enable GPU support within containers. To configure Docker, simply run the following:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
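You can confirm the change took effect by inspecting Docker’s daemon configuration and running a throwaway CUDA container. The image tag below is just an example; pick one that matches your installed CUDA version:

cat /etc/docker/daemon.json
docker run --rm --gpus=all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi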

Here are some of its primary functions:

  • Configuring the runtime: Modifies the configuration files of Docker or containerd to include the NVIDIA Container Runtime.
  • Generating CDI specifications: Creates configuration files for the Container Device Interface (CDI), which allows containers to access GPU devices.
  • Listing CDI devices: Lists the available GPU devices that can be used by containers.

In essence, nvidia-ctk acts as a bridge between the container runtime and the NVIDIA GPU, ensuring that containers can effectively leverage GPU acceleration.
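For example, to generate a CDI specification for the GPUs on this host and then list the device names it produced:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list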

Tip: A CDI device is a named reference to a GPU, not a virtual slice of one; nvidia-ctk cannot carve a single GPU into memory-capped devices. If you need to split one physical card, that is a hardware feature: on data-center GPUs that support MIG (A100, H100, and similar) you can partition the card with nvidia-smi and then regenerate the CDI spec so each MIG instance appears as its own named device. A sketch of that flow, where 1g.5gb is an example profile (available profiles vary by card):

sudo nvidia-smi -i 0 -mig 1
sudo nvidia-smi mig -cgi 1g.5gb,1g.5gb -C
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

Now each instance shows up in nvidia-ctk cdi list and can be assigned to a container by its CDI device name, which limits that container to its slice of the GPU. With a CDI-aware runtime (Podman, or Docker 25+ with the cdi feature enabled) that looks like:

docker run --rm --device nvidia.com/gpu=0:0 ubuntu nvidia-smi

Run Containers with GPUs

After configuring the driver and the NVIDIA Container Toolkit, you are ready to run GPU-powered containers. One of our favorites is the Ollama container, which lets you run AI inference endpoints.

docker run -it --rm --gpus=all -v /home/ollama:/root/.ollama:z -p 11434:11434 --name ollama ollama/ollama

Notice we are handing all GPUs to the container in this instance with --gpus=all.
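Once the container is up, you can pull and run a model inside it, then query the API it exposes on port 11434. The model name here is just an example:

docker exec -it ollama ollama run llama3

curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'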
