Thanks Ulrich!

Yes, indeed: the system was using nvidia's libGL. I took these steps to recover the system:

yum reinstall libglvnd* mesa*

That restored most of the libGL symlinks in /usr/lib64 so that they point to the system's libGL again.

Except for this one:
# ll /usr/lib64/libGLX_indirect.so.0
lrwxrwxrwx. 1 root root 26 Jul  1 16:03 /usr/lib64/libGLX_indirect.so.0 -> libGLX_nvidia.so.470.42.01
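
To check for any other leftovers, something like the following might help. It is only a sketch: list_nvidia_links is a made-up helper name, and it assumes the relevant links are named libGL* and sit directly in the given directory (defaulting to /usr/lib64). It only looks at the immediate link target, not the fully resolved path.

```shell
#!/bin/sh
# list_nvidia_links DIR (hypothetical helper, not an existing tool):
# print "link -> target" for every libGL* symlink directly inside DIR
# whose immediate target mentions "nvidia".
list_nvidia_links() {
    dir=${1:-/usr/lib64}
    for link in "$dir"/libGL*; do
        # Skip regular files and an unmatched glob; only symlinks matter here.
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        case $target in
            *nvidia*) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}
```

Running list_nvidia_links with no argument scans /usr/lib64; no output means no libGL* link there points at an nvidia library.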

So I did this:

# cd /usr/lib64
# rm -f libGLX_indirect.so.0
# ln -s libGLX_mesa.so.0.0.0 libGLX_indirect.so.0
# ll /usr/lib64/libGLX_indirect.so.0
lrwxrwxrwx. 1 root root 20 Jul 14 11:43 /usr/lib64/libGLX_indirect.so.0 -> libGLX_mesa.so.0.0.0
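
The same change could be wrapped in a small function that refuses to create a dangling link and prints the old target first, so it can be undone by hand. This is a sketch, not a tested procedure for production hosts: repoint_link is a hypothetical name, and it relies on `ln -sfn` as provided by GNU coreutils.

```shell
#!/bin/sh
# repoint_link DIR LINK TARGET (hypothetical helper):
# make DIR/LINK a relative symlink to TARGET, which must exist in DIR.
repoint_link() {
    dir=$1 link=$2 target=$3
    # Refuse to create a dangling link.
    [ -e "$dir/$target" ] || { echo "missing: $dir/$target" >&2; return 1; }
    # Print the old target so the change can be reverted manually.
    if [ -L "$dir/$link" ]; then
        echo "old target: $(readlink "$dir/$link")"
    fi
    # -s: symbolic, -f: replace an existing link, -n: do not follow it.
    ln -sfn "$target" "$dir/$link"
}
```

For the case above the call would be: repoint_link /usr/lib64 libGLX_indirect.so.0 libGLX_mesa.so.0.0.0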

Now glxinfo (and firefox) don't crash nxagent!

Not sure of the full implications of manually modifying that symlink.
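
One way to narrow that down might be to ask rpm whether any package owns the link. If one does, reinstalling or updating that package (for example the nvidia driver) could put the old target back. A minimal sketch, with link_owner as a hypothetical helper name; I have not checked which package, if any, owns this particular file:

```shell
#!/bin/sh
# link_owner PATH (hypothetical helper): report which RPM package owns PATH.
link_owner() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not available"
        return 0
    fi
    # rpm -qf exits non-zero when no installed package owns the file.
    if owner=$(rpm -qf "$1" 2>/dev/null); then
        echo "$owner"
    else
        echo "not owned by any package"
    fi
}
```

Usage: link_owner /usr/lib64/libGLX_indirect.so.0. If a package name comes back, `rpm -V <package>` should show whether the link now differs from what that package installed.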


On Tue, 13 Jul 2021 at 20:07, Ulrich Sibiller <uli42@gmx.de> wrote:
I think the problem is that the system now uses the libGL of nvidia
instead of the libGL that came with the system. So try to reinstall
the lib. See https://forums.developer.nvidia.com/t/multiple-glx-client-libraries-in-the-nvidia-linux-driver-installer-package/41308
for some details regarding the libGL handling in the nvidia driver.

Hope that helps!

Uli

On Tue, Jul 13, 2021 at 9:05 AM Norman Gaywood <ngaywood@une.edu.au> wrote:
>
> We have some CentOS 7 systems with GPUs that users access with x2go to run their machine learning tasks.
>
> After an update to cuda on the CentOS 7 systems, nxagent now segfaults when I run glxinfo or firefox.
>
> nxagent-3.5.99.26-1.el7.x86_64
> x2goserver-4.1.0.3-9.el7.x86_64
> cuda-11.4.0-1.x86_64
> kmod-nvidia-latest-dkms-470.42.01-1.el7.x86_64
>
> What's also interesting is that if I x2go into a host that does not have cuda installed, and then:
>    ssh -Y cudahost glxinfo
> then the nxagent on the non-cuda host segfaults.
>
> This happens with glxinfo and when trying to start firefox. google-chrome works fine.
>
> This was all working fine on CentOS 7.9.2009 until I updated cuda and the GPU driver.
> This happens on a host with a K80 GPU and another host that has a V100 GPU.
>
> I also have cuda-11.4 on a Fedora 34 host with a V100 GPU.
> If I x2go (or ssh -Y) to the Fedora 34 host, glxinfo (and firefox) run fine.
>
> nxagent-3.5.99.26-1.fc34.x86_64
> x2goserver-4.1.0.3-10.fc34.x86_64
> cuda-11.4.0-1.x86_64
> kmod-nvidia-latest-dkms-3:465.19.01-1.fc33.x86_64
>
> Any suggestions on how I might provide some debugging information to the developers?
>
>
>  abrt-cli list --since 1626070479
> id 35db2c461122be7229abdbe219ddfd92d0613da8
> reason:         nxagent killed by SIGSEGV
> time:           Mon 05 Jul 2021 09:37:02 AEST
> cmdline:        x2goagent -nolisten tcp -nolisten tcp -dpi 97 -D -auth /home/ngaywood/.Xauthority -geometry 2560x1440 -name X2GO-ngaywood-50-1625441817_stDMATE_dp24 :50
> package:        nxagent-3.5.99.26-1.el7
> uid:            5125 (ngaywood)
> count:          21
> Directory:      /var/spool/abrt/ccpp-2021-07-05-09:37:02-2523
>
>
>
> --
> Norman Gaywood, Computer Systems Officer
> School of Science and Technology
> University of New England
> Armidale NSW 2351, Australia
>
> ngaywood@une.edu.au  http://turing.une.edu.au/~ngaywood
> Phone: +61 (0)2 6773 2412  Mobile: +61 (0)4 7862 0062
>
> Please avoid sending me Word or Power Point attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html

