Toolbox: Nvidia proprietary driver

Created on 16 Apr 2019  ·  16 Comments  ·  Source: containers/toolbox

First, great project!

If I'm using the Nvidia proprietary driver, OpenGL software (like Blender) doesn't work inside the toolbox container. I tried installing the proprietary driver inside the container; it installs, but the OpenGL software still doesn't work. Is it necessary to install more things, or set some environment variable?

Thanks!

Labels: Bug, Help Wanted

All 16 comments

Toolbox is a container; you would have to map your graphics card inside it, or do things the way nvidia-docker does.

@Findarato You mean add something like --volume /dev/nvidia0:/dev/nvidia0 and other /dev files?

So to have the NVIDIA stuff working inside the Toolbox I had to do this (inspired by https://github.com/thewtex/docker-opengl-nvidia):

1) You have to patch Toolbox to bind mount /dev/nvidia0 and /dev/nvidiactl into the container and set up the X11 bits - see https://github.com/tpopela/toolbox/commit/40231e8591d70065199c0df9b6811c2f9e9d7269 (a rough sketch of the flags involved follows after step 3)

2) Download the NVIDIA proprietary drivers on the host:

#!/bin/sh

# Get your current host nvidia driver version, e.g. 340.24
nvidia_version=$(head -n 1 /proc/driver/nvidia/version | awk '{ print $8 }')

# We must use the same driver version in the image as on the host
if test ! -f ~/nvidia-driver.run; then
  nvidia_driver_uri=http://us.download.nvidia.com/XFree86/Linux-x86_64/${nvidia_version}/NVIDIA-Linux-x86_64-${nvidia_version}.run
  wget -O ~/nvidia-driver.run "$nvidia_driver_uri"
fi

3) Install the drivers from inside the Toolbox:

#!/bin/sh

sudo dnf install -y glx-utils kmod libglvnd-devel || exit 1
sudo sh ~/nvidia-driver.run -a -N --ui=none --no-kernel-module || exit 1
glxinfo | grep "OpenGL version"
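
For reference, the patch in step 1 essentially amounts to extra flags on the podman invocation. A minimal sketch of the idea, assuming you were creating a container by hand rather than through the patched toolbox script (the image name and device paths are illustrative):

#!/bin/sh

# Rough sketch only: expose the NVIDIA device nodes and the display to a
# manually created container, approximating what the linked patch does
# inside the toolbox script itself.
podman run --rm -it --net host \
  --volume /dev/nvidia0:/dev/nvidia0 \
  --volume /dev/nvidiactl:/dev/nvidiactl \
  --env DISPLAY="$DISPLAY" \
  registry.fedoraproject.org/fedora:30 /bin/sh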

@tpopela it worked. Thanks!

I'm glad it worked! But there was a mistake that could lead to malfunctions after the host is restarted - you will need to apply https://github.com/tpopela/toolbox/commit/3db450a8e5762399fd81c848f311da950437dd04 on top of the previous patch.

@tpopela We might be able to get away without bind mounting /tmp/.X11-unix. These days the X.org server listens both on an abstract UNIX socket and on a UNIX socket on the file system. The former doesn't work if you have a network namespace, but the Toolbox doesn't have one (because of podman create --net host), and that's why X applications work. The latter is located in /tmp/.X11-unix and is used by Flatpak containers, because those do have network namespaces.
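
A quick way to see both sockets from the host, as a sketch (the X0 name depends on your display number):

#!/bin/sh

# File system socket for display :0
ls -l /tmp/.X11-unix/X0

# The abstract socket shows up with a leading '@' in the ss output
ss -xl | grep X11-unix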

Ah, OK @debarshiray! Thank you for the clarification. I can confirm that not bind mounting /tmp/.X11-unix doesn't change anything and the integration still works (I tried running Blender here).

There is maybe one small change now that we are bind mounting the whole /dev: Blender looks for nvcc (the CUDA compiler) in PATH and can't find it.

With the merge of https://github.com/debarshiray/toolbox/pull/119 this issue may be closed, since the Nvidia proprietary driver is working now. It's just necessary to install the Nvidia driver once inside the toolbox container, and @tpopela's script helps with the driver installation. @tpopela, you also have to install the CUDA Toolkit; to make it install I passed the parameters --override and --toolkit. After installing the CUDA Toolkit, Blender shows me the option to render using CUDA. But unfortunately CUDA doesn't work with GCC 9 :(
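
For anyone following along, the CUDA Toolkit runfile installer can be driven non-interactively with those flags. A sketch, assuming a CUDA 10.x era runfile (the file name is a placeholder for whatever you downloaded from Nvidia):

#!/bin/sh

# Hypothetical example: install only the toolkit (no bundled driver) inside
# the toolbox, skipping the compiler compatibility check. The runfile name
# below is a placeholder.
sudo sh ~/cuda_10.1.243_418.87.00_linux.run --silent --toolkit --override

# nvcc lands under /usr/local/cuda/bin by default, which may need adding to PATH
/usr/local/cuda/bin/nvcc --version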

Actually I would leave this open (but I will leave that decision to Rishi), as we were thinking with @debarshiray about leaking the NVIDIA host drivers to the container, so there will be no need to manually install the drivers in the container. We have a working WIP solution for it.

That would be great!

we were thinking with @debarshiray about leaking the NVIDIA host drivers to the
container, so there will be no need to manually install the drivers in the container.

Yes, I agree that this will be the right thing to do. OpenGL drivers have a kernel module and some user-space components (e.g., shared libraries) that talk to each other. In NVIDIA's case the interface between these two components isn't stable, and hence the user-space bits inside the container must match the kernel module on the host. These two can go out of sync if your host is lagging behind the container or vice versa.
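
A quick sanity check for that version match, as a sketch (it assumes the user-space driver is already installed in the container as described earlier in this thread):

#!/bin/sh

# Kernel module version, read from the host's /proc (visible inside the toolbox)
awk '{ print $8; exit }' /proc/driver/nvidia/version

# User-space driver version inside the container, as reported by glxinfo
glxinfo | grep "OpenGL version"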

The problem with leaking the files into the container is maintaining a list of those files somewhere because they vary from version to version. This would be vastly simpler if there was a well known nvidia directory somewhere on the host that could be bind mounted because then we wouldn't have to worry about the names and locations of the individual files themselves. Unfortunately that's not the case.

Looking around, I found Flatpak's solution to be a reasonable compromise. In short, it invents and enforces this well known nvidia directory. It expects distributors of the host OS to put all the user-space files in /var/lib/flatpak/extension/org.freedesktop.Platform.GL.host/x86_64/1.4 and that's implemented by modifying the package shipping the NVIDIA driver.

With that done, we'd need to figure out where to place these files inside the container and how to point the container's runtime environment at them.
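
As a very rough sketch of that idea (the mount point and the LD_LIBRARY_PATH handling below are assumptions for illustration, not an implemented toolbox feature):

#!/bin/sh

# Hypothetical: bind mount the well-known host directory into the container
# and point the dynamic linker at it.
gl_host=/var/lib/flatpak/extension/org.freedesktop.Platform.GL.host/x86_64/1.4
podman run --rm -it --net host \
  --volume "$gl_host":/usr/lib/nvidia:ro \
  --env LD_LIBRARY_PATH=/usr/lib/nvidia \
  registry.fedoraproject.org/fedora:30 /bin/sh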

Nvidia has its own solution for this, nvidia-container-runtime-hook, which works very well with podman when triggered by an OCI prestart hook. I just run into an issue at the moment when using --uidmap, resulting in losing the permissions to run ldconfig:

could not start /sbin/ldconfig: mount operation failed: /proc: operation not permitted

It may be better for toolbox to try to integrate with this existing tool rather than maintaining another implementation.

Issue relating to the uidmap permission problem:

https://github.com/NVIDIA/libnvidia-container/issues/49
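
For reference, podman picks such hooks up from its OCI hooks directory. A sketch of trying it, assuming the nvidia-container-toolkit package has installed its prestart hook JSON in the default location:

#!/bin/sh

# Hypothetical: let the NVIDIA prestart hook inject the driver bits and
# check that the GPU is visible inside the container.
podman run --rm --hooks-dir=/usr/share/containers/oci/hooks.d \
  registry.fedoraproject.org/fedora:30 nvidia-smi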

I was trying to run Steam in the toolbox (bug #343). I didn't patch the toolbox; Steam runs and OpenGL works, but Vulkan doesn't seem to work. I tried vkmark and Rise of Tomb Raider on Steam.

Any ideas how to get it to work?

I saw that Singularity containers fix this problem without libnvidia-container. They use a list of needed files.
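
That list-of-files approach could, in principle, be approximated from the host's linker cache. A rough sketch (the pattern is an assumption and would miss non-library files such as nvidia-smi):

#!/bin/sh

# Hypothetical: enumerate the NVIDIA user-space libraries known to the host's
# dynamic linker, as candidates for bind mounting into a container.
ldconfig -p | awk '/libnvidia|libcuda/ { print $NF }'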

So what is the status of using the Nvidia GPU drivers in a container in 2021?
I can see that /dev/nvidia0 and /dev/nvidiactl are mounted.
However, I cannot install the Nvidia drivers successfully. The install proceeds normally, but checking with modinfo -F version nvidia gives an error:
modinfo: ERROR: Module alias nvidia not found.
And the Nvidia Container Toolkit is not officially supported on Fedora, so it doesn't seem like a good idea to use it with Fedora Silverblue.
