Thanks Ulrich!
Yes, indeed, the system was using NVIDIA's libGL. I took these steps to recover the system:
yum reinstall 'libglvnd*' 'mesa*'
That fixed most of the libGL symlinks in /usr/lib64 so they point to the system's libGL again.
Except for this one:

# ll /usr/lib64/libGLX_indirect.so.0
lrwxrwxrwx. 1 root root 26 Jul 1 16:03 /usr/lib64/libGLX_indirect.so.0 -> libGLX_nvidia.so.470.42.01
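To check whether any other links in /usr/lib64 still point at NVIDIA libraries after the reinstall, something along these lines might help. This is only a minimal sketch: it assumes GNU find (for -ilname and -printf), and list_nvidia_links is a hypothetical helper name, not part of any package.

```shell
# Hypothetical helper (not from any package): list symlinks directly in
# DIR (default /usr/lib64) whose target contains "nvidia", ignoring case.
# Assumes GNU find, which provides -ilname and -printf.
list_nvidia_links() {
    dir="${1:-/usr/lib64}"
    # -type l matches the link itself (find does not follow links by default);
    # %f is the link's base name, %l is the text the link points to.
    find "$dir" -maxdepth 1 -type l -ilname '*nvidia*' -printf '%f -> %l\n'
}
```

For example, running list_nvidia_links /usr/lib64 before the manual fix would be expected to report libGLX_indirect.so.0 -> libGLX_nvidia.so.470.42.01. It may also list the NVIDIA driver's own versioned links, which are expected; only generic names such as libGLX_indirect.so.0 that point at NVIDIA files are suspect.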
So I did this:
# cd /usr/lib64
# rm -f libGLX_indirect.so.0
# ln -s libGLX_mesa.so.0.0.0 libGLX_indirect.so.0
# ll /usr/lib64/libGLX_indirect.so.0
lrwxrwxrwx. 1 root root 20 Jul 14 11:43 /usr/lib64/libGLX_indirect.so.0 -> libGLX_mesa.so.0.0.0
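The rm/ln steps above could also be wrapped in a single guarded call that refuses to run if the new target is missing, so a typo cannot leave a dangling link. This is a minimal sketch: repoint_link is a hypothetical helper, and it assumes TARGET is a file name relative to DIR, matching the relative link created by hand.

```shell
# Hypothetical helper: repoint DIR/LINK at TARGET, where TARGET is a
# file name relative to DIR (as in the manual fix). Does nothing and
# returns 1 if DIR/TARGET does not exist.
repoint_link() {
    dir=$1 link=$2 target=$3
    if [ ! -e "$dir/$target" ]; then
        echo "repoint_link: $dir/$target not found" >&2
        return 1
    fi
    # -f replaces an existing LINK (same effect as rm -f followed by ln -s);
    # -n treats LINK as a plain name even if it points at a directory.
    ln -sfn "$target" "$dir/$link"
}
```

Usage, as root: repoint_link /usr/lib64 libGLX_indirect.so.0 libGLX_mesa.so.0.0.0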
Now glxinfo (and firefox) don't crash nxagent!
I'm not sure of the full implications of manually modifying that symlink.
On Tue, 13 Jul 2021 at 20:07, Ulrich Sibiller <uli42@gmx.de> wrote:
I think the problem is that the system now uses NVIDIA's libGL instead of the libGL that came with the system, so try reinstalling the library. See https://forums.developer.nvidia.com/t/multiple-glx-client-libraries-in-the-n... for some details regarding the libGL handling in the NVIDIA driver.
Hope that helps!
Uli
On Tue, Jul 13, 2021 at 9:05 AM Norman Gaywood <ngaywood@une.edu.au> wrote:
We have some CentOS 7 systems with GPUs that users access with x2go to run their machine learning tasks.
After an update to CUDA on the CentOS 7 systems, nxagent now segfaults when I run glxinfo or firefox.
nxagent-3.5.99.26-1.el7.x86_64
x2goserver-4.1.0.3-9.el7.x86_64
cuda-11.4.0-1.x86_64
kmod-nvidia-latest-dkms-470.42.01-1.el7.x86_64
What's also interesting is that if I x2go into a host that does not have CUDA installed and then run

ssh -Y cudahost glxinfo

the nxagent on the non-CUDA host segfaults.
This happens with glxinfo and when trying to start firefox. google-chrome works fine.
Everything worked fine until I updated CUDA and the GPU driver. On CentOS 7.9.2009 this happens both on a host with a K80 GPU and on another host with a V100 GPU.
I also have cuda-11.4 on a Fedora 34 host with a V100 GPU. If I x2go (or ssh -Y) to the Fedora 34 host, glxinfo (and firefox) run fine.
nxagent-3.5.99.26-1.fc34.x86_64
x2goserver-4.1.0.3-10.fc34.x86_64
cuda-11.4.0-1.x86_64
kmod-nvidia-latest-dkms-3:465.19.01-1.fc33.x86_64
Any suggestions on how I might provide some debugging information to the developers?
abrt-cli list --since 1626070479
id 35db2c461122be7229abdbe219ddfd92d0613da8
reason: nxagent killed by SIGSEGV
time: Mon 05 Jul 2021 09:37:02 AEST
cmdline: x2goagent -nolisten tcp -nolisten tcp -dpi 97 -D -auth /home/ngaywood/.Xauthority -geometry 2560x1440 -name X2GO-ngaywood-50-1625441817_stDMATE_dp24 :50
package: nxagent-3.5.99.26-1.el7
uid: 5125 (ngaywood)
count: 21
Directory: /var/spool/abrt/ccpp-2021-07-05-09:37:02-2523
-- 
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia

ngaywood@une.edu.au http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412 Mobile: +61 (0)4 7862 0062

Please avoid sending me Word or Power Point attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
x2go-user mailing list
x2go-user@lists.x2go.org
https://lists.x2go.org/listinfo/x2go-user