The error message you're seeing, Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]], typically occurs when you try to run a Docker container that requires GPU support but Docker cannot recognize or access the NVIDIA GPU on your system. This can happen for several reasons, including missing or improperly configured NVIDIA drivers, Docker not being set up to use the NVIDIA runtime, or issues with the NVIDIA Container Toolkit. Here's how you can troubleshoot and resolve the issue:

1. Install NVIDIA Drivers

Ensure that the NVIDIA drivers are installed on your system. The drivers are necessary for Docker to communicate with your GPU.
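A quick way to confirm the driver is working is to run nvidia-smi on the host, outside of any container. If the command is missing or reports an error, install a driver first; the package name below is only an example for Ubuntu, and the driver version you need depends on your GPU:

nvidia-smi
# If nvidia-smi is not found or fails, install a driver first. On Ubuntu,
# "ubuntu-drivers devices" lists recommended driver packages; the version
# below (535) is an example and may differ on your system:
sudo apt-get install -y nvidia-driver-535
sudo reboot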

2. Install the NVIDIA Container Toolkit

The NVIDIA Container Toolkit (formerly nvidia-docker) is what allows Docker containers to use the GPU. Follow these steps to install it:

  • Remove Older Versions (if installed):

    sudo apt-get remove nvidia-docker
  • Set Up the NVIDIA Docker repository:

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  • Install the nvidia-docker2 Package:

    sudo apt-get update
    sudo apt-get install -y nvidia-docker2
  • Restart the Docker Daemon:

    sudo systemctl restart docker
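
As a quick sanity check after this step, confirm that the package installed and that the toolkit's container CLI can see your driver (the exact output depends on your driver and GPU):

dpkg -l | grep nvidia-docker2     # confirm the package is installed
nvidia-container-cli info         # should report the driver version and detected GPUs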

3. Configure Docker to Use the NVIDIA Runtime

After installing the NVIDIA Container Toolkit, make sure Docker registers the NVIDIA runtime; optionally, you can also make it the default runtime for all containers. You can do this by editing the Docker daemon configuration file, typically located at /etc/docker/daemon.json. Add or update the following lines:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Then, restart Docker:

sudo systemctl restart docker
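
To confirm that Docker picked up the new configuration, check the runtimes reported by the daemon (the exact list varies by Docker version; this is just a quick sanity check):

docker info | grep -i runtime
# The output should list "nvidia" among the available runtimes, for example:
#   Runtimes: io.containerd.runc.v2 nvidia runc
#   Default Runtime: nvidia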

4. Verify Installation

After setting up, verify that Docker can now access the GPU. Run a test container with:

docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

This command runs a temporary container using a CUDA base image and executes nvidia-smi, which should display information about the GPU, indicating that Docker can successfully access the GPU.
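
Note that older CUDA image tags are removed from Docker Hub over time; if nvidia/cuda:11.0-base no longer pulls, substitute a current base tag from the nvidia/cuda repository. On multi-GPU hosts you can also restrict a container to specific devices with the --gpus flag, for example:

docker run --rm --gpus '"device=0"' nvidia/cuda:11.0-base nvidia-smi
# '"device=0"' exposes only GPU index 0 to the container; use "all" (as above)
# to expose every GPU.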

If you follow these steps and still encounter issues, ensure your NVIDIA driver is compatible with your GPU and the version of the CUDA toolkit you're using. Additionally, consult the official NVIDIA documentation for the most up-to-date information on configuring Docker to work with NVIDIA GPUs.
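
One way to check compatibility on the host is to look at the driver version and the CUDA version it supports, both of which nvidia-smi reports (these are standard nvidia-smi options):

nvidia-smi --query-gpu=name,driver_version --format=csv
# The banner of a plain "nvidia-smi" run also shows "CUDA Version: X.Y", the
# newest CUDA runtime the installed driver supports; container images built
# for a newer CUDA version will typically fail when they try to initialize CUDA.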
