Docker + NVIDIA Setup for Hosting

A walkthrough of installing NVIDIA drivers, Docker CE, and the NVIDIA Container Toolkit on Ubuntu 22.04 or 24.04 — the base stack every Vast.ai or RunPod host needs before anything else runs.

Vast.ai and RunPod both deliver work to your machine as Docker containers with GPU access mapped through. That means three pieces have to be in place before anything else: an NVIDIA driver that recognizes your GPU, a working Docker installation, and the NVIDIA Container Toolkit that lets Docker see the GPU. Get one of them wrong and your host agent will either refuse to come online or happily come online and fail every rental.

This guide covers Ubuntu 22.04 and 24.04, which are the supported targets for most hosting platforms. Commands are copy-paste safe for a fresh install. If you are layering this onto an existing system, adapt accordingly.

Always verify against upstream docs. Package names and repo URLs for Docker and NVIDIA tooling occasionally change. Before pasting, cross-check with the Docker Engine for Ubuntu docs and the NVIDIA Container Toolkit install guide.

Prerequisites

Before you begin, update the system:

sudo apt update
sudo apt upgrade -y
sudo reboot

A reboot clears any pending kernel updates before you install the driver, which avoids a category of "driver built against the wrong kernel" errors later.

Step 1: install the NVIDIA driver

The simplest path on Ubuntu is ubuntu-drivers, which picks a recommended driver version for your detected GPU:

sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot

After reboot, verify the driver is running and the GPU is visible:

nvidia-smi

You should see a table listing your GPU, driver version, CUDA version, and current utilization. If nvidia-smi prints "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver", the driver did not install cleanly — see the troubleshooting section below.
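If you script host checks, nvidia-smi can also emit the driver version in machine-readable form via --query-gpu. A minimal parsing sketch for a POSIX shell; the version string shown is just an example value, not a requirement:

```shell
# driver_major: extract the major component of an NVIDIA driver version
# string such as "535.183.01" (pure string parsing, no GPU required)
driver_major() {
  printf '%s\n' "${1%%.*}"
}

# On a real host, feed it live output from nvidia-smi:
#   driver_major "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
driver_major "535.183.01"
```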

Alternative: install a specific driver version

If ubuntu-drivers autoinstall picks a version you don't want, you can install a specific one. List what's available:

ubuntu-drivers devices

Then install by name, for example:

sudo apt install -y nvidia-driver-535
sudo reboot

Match the driver version to what your hosting platform requires — Vast.ai publishes a minimum driver version in their host docs.

Step 2: install Docker CE

Use Docker's official apt repository rather than Ubuntu's older docker.io package — the upstream package is kept current and works cleanly with the NVIDIA Container Toolkit.

Remove any older Docker packages that might interfere:

sudo apt remove -y docker docker-engine docker.io containerd runc
sudo apt autoremove -y

Install prerequisites, add Docker's GPG key, and add the repo:

sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Verify Docker is running:

sudo docker run --rm hello-world

You should see the "Hello from Docker!" output. Optionally, add your user to the docker group to skip sudo on subsequent commands (log out and back in for group membership to take effect):

sudo usermod -aG docker $USER
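To confirm the group change has actually taken effect in your current session, check the groups the shell sees. A quick sketch; id and grep behave identically on both Ubuntu releases:

```shell
# Check whether the running session already has the docker group.
# Group changes only apply to sessions started after usermod ran.
if id -nG | grep -qw docker; then
  echo "docker group active -- sudo not needed for docker commands"
else
  echo "not yet active -- log out and back in, or start a subshell with: newgrp docker"
fi
```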

Step 3: install the NVIDIA Container Toolkit

This is the bridge that lets Docker containers see and use your GPU.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit

Configure Docker to use the NVIDIA runtime, then restart Docker:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Step 4: verify the full stack

This is the single most important test. If this works, your host is ready for a hosting agent to install on top:

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

You should see the same nvidia-smi output you got from the host, but this time running inside a container. If the container can see the GPU, Docker can hand it to Vast.ai or RunPod workloads.

Tip. The CUDA image tag in that command (12.0.0-base-ubuntu22.04) is a widely available reference image. Newer CUDA versions work the same way — swap in whatever is current.
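If you want this smoke test in a script (for example, a cron job that flags a host whose GPU passthrough has broken), the docker invocation can be wrapped so its exit code drives the result. A sketch, not a definitive implementation; the runner command is parameterized only so the logic can be exercised without a GPU, and the image tag is the same reference image used above:

```shell
# run_gpu_check: run nvidia-smi inside a CUDA container and report pass/fail.
# $1 lets a test substitute a stand-in for the docker binary; defaults to docker.
run_gpu_check() {
  runner="${1:-docker}"
  image="${2:-nvidia/cuda:12.0.0-base-ubuntu22.04}"
  if "$runner" run --rm --gpus all "$image" nvidia-smi > /dev/null 2>&1; then
    echo "PASS: container can see the GPU"
  else
    echo "FAIL: container cannot see the GPU"
    return 1
  fi
}

# On a real host, simply call: run_gpu_check
```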

Common errors and fixes

"Failed to initialize NVML: Driver/library version mismatch"

This usually means you updated the driver package but didn't reboot, or you have mixed driver versions installed. Fix: sudo reboot. If the error persists after reboot, purge NVIDIA packages and reinstall:

sudo apt purge 'nvidia-*' 'libnvidia-*'
sudo apt autoremove -y
sudo apt install -y nvidia-driver-535
sudo reboot

"could not select device driver with capabilities: [[gpu]]"

Docker can't find the NVIDIA runtime. Either the Container Toolkit isn't installed, or nvidia-ctk runtime configure wasn't run. Re-run Step 3.

"docker: Error response from daemon: unknown or invalid runtime name: nvidia"

The /etc/docker/daemon.json file either doesn't exist or is missing the NVIDIA runtime entry. Running sudo nvidia-ctk runtime configure --runtime=docker writes a correct config. After that, restart Docker.
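For reference, a daemon.json with the NVIDIA runtime registered looks roughly like this. The exact output of nvidia-ctk varies slightly by toolkit version, so treat this as a shape check rather than something to paste:

```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```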

"cannot open shared object file: libcuda.so"

libcuda.so is provided by the host's NVIDIA driver and mounted into the container by the Container Toolkit, so this error almost always means the driver on the host is missing or outdated. Re-run nvidia-smi on the host; if that fails too, the driver is the problem.

cgroup v2 quirks on Ubuntu 24.04

Ubuntu 24.04 uses cgroup v2 by default. This is generally fine with modern Docker and NVIDIA Container Toolkit versions, but if you see permission errors when containers try to access the GPU, make sure you are on a recent version of the toolkit (apt list --installed | grep nvidia-container-toolkit). Older versions pre-date cgroup v2 support.
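When checking that version, a dotted-version comparison in shell saves eyeballing. A small helper sketch using sort -V, which is available in coreutils on both Ubuntu releases; the 1.14.0 minimum below is illustrative, not an official cutoff:

```shell
# version_ge A B: succeed if dotted version A >= B, using GNU sort -V
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: compare the installed toolkit version against a chosen minimum.
# On a real host you would query dpkg for the live version:
#   tkver="$(dpkg-query -W -f='${Version}' nvidia-container-toolkit)"
version_ge "1.14.3" "1.14.0" && echo "toolkit new enough"
```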

Where to go from here

With the Docker + NVIDIA stack verified, you are ready to install the Vast.ai or RunPod host agent following their respective onboarding docs. Before you do, it's worth running your rig through our compatibility checker — it covers the non-software prerequisites (VRAM threshold, internet speed, storage capacity) that matter just as much.

Verify your rig meets the minimums

Software is one piece. The RigHost checker covers the rest — GPU class, VRAM, bandwidth, storage, OS. Run it in a browser or pipe the CLI version into your server.

Run the Compatibility Checker →