[X2go-Dev] 3.99 Testing - Multiple X2goclient instances fail
John A. Sullivan III
jsullivan at opensourcedevel.com
Mon Aug 1 20:38:27 CEST 2011
On Mon, 2011-08-01 at 13:30 -0400, John A. Sullivan III wrote:
> On Fri, 2011-07-29 at 10:21 +0200, Mike Gabriel wrote:
> > Hi Daniel,
> >
> > On Fr 29 Jul 2011 07:24:40 CEST Daniel Lindgren wrote:
> >
> > > [...]
> > >
> > > I've been trying to think of some way to make x2go automatically only
> > > use free ports, but since there are possibly several clients and
> > > servers involved in multiple incoming and outgoing sessions, the only
> > > safe way I've come up with would be to register used ports (or reserve
> > > port ranges) on a shared resource somewhere, which would make all x2go
> > > clients and servers dependent on that one resource.
> >
> > What you have been describing above is the old, well known
> > NXServer/NXClient problem.
> >
> > In my experience it does not exist with X2go (at least not with
> > pyhoca-gui ;-) ).
> >
> > The connecting stuff is a bit tricky...
> >
> > - x2goclient connects via SSH and launches either x2gostartagent or
> > x2goresume-session
> > - x2gostartagent will try to detect a free server-side port >30000.
> > This port is put into the session database table. (Make sure you do
> > not have other server-side stuff running that claims ports between
> > 30000 and 30000++.)
> > - x2goresume-session will pick the formerly detected port from the session
> > database table (i.e. SQLite or postgres) and presume it is still unused
> > (as it should be, unless something other than X2go has claimed it:
> > x2gostartagent itself avoids claiming ports that are marked in the
> > database with S (suspended) or R (running) state)
> > - say we detect a free server-side port: 30015: x2gostartagent will
> > launch the XNest-like x2goagent XServer, listening for incoming
> > connections on port localhost:30015.
> >
> > - x2goclient now tries to set up a forwarding tunnel from
> > client to server (-L 30000:localhost:30015)
> > - if the local port 30000 (the first one in the above expression) is
> > already in use, it simply selects another port. Each selected port is
> > probed before usage. That is done by x2goclient.
> > - say we end up with -L 30028:localhost:30015
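
The client-side probing described in the list above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not x2goclient's actual code: the function name `first_free_local_port` and its parameters are hypothetical, and "probing" is assumed here to mean "try to bind the port".

```python
import socket


def first_free_local_port(start=30000, limit=100, host="127.0.0.1"):
    """Return the first port in [start, start + limit) that can be bound.

    Hypothetical helper for illustration only; x2goclient may probe
    ports in a different way.
    """
    last = min(start + limit, 65536)  # never go past the valid port range
    for port in range(start, last):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is already in use, try the next one
            return port
    raise RuntimeError("no free local port found in the probed range")
```

With 30000 to 30027 already taken, a call such as `first_free_local_port(30000)` would yield 30028, matching the `-L 30028:localhost:30015` case. A port found free this way can still be grabbed by another process before the tunnel is set up, so a probe of this kind narrows the race window rather than closing it.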
> >
> > The problem you observe: "the remote proxy closed the connection while
> > negotiating …" from my point of view stems from uncleanly closed SSH
> > connections. Try to restart the SSH daemon when this error occurs and
> > you should be able to connect again. This error can also be caused by
> > the sound/sshfs tunnels. Here the same cause applies: unclean shutdown
> > of SSH connections.
> >
> > Reasons may be:
> >
> > o unclean x2goclient/pyhoca-gui code
> > o line failures (so the clients do not get the chance to cancel the port
> > forwarding requests...)
> >
> > You may do the following to debug:
> >
> > o check whether the problem occurs in the same way when you use pyhoca-gui/-cli
> > o pyhoca-gui (i.e. python-x2go) tries to recover from uncleanly closed SSH
> > port forwardings... You may try to resume a session twice; the second
> > time you should get a session window
> >
> > The addressed issue was a big problem for Python X2go before version
> > 0.1.0.0. It took me quite some sleepless nights to get rid of it by
> > 90%+.
> <snip>
> I continue to have a devil of a time with this. It seems like
>
> - x2goclient now tries to set up a forwarding tunnel from
> client to server (-L 30000:localhost:30015)
> - if the local port 30000 (the first one in the above expression) is
> already in use, it simply selects another port. Each selected port is
> probed before usage. That is done by x2goclient.
>
> isn't working. In the past, using older clients, we would generate
> error messages about a problem connecting on port 30001 but the sessions
> would all still work.
>
> We are no longer seeing those errors. Instead, the sessions are rudely
> dropped. I've restarted all the X2Go servers and clients after deleting
> all the old session directories on both. Still no success. Here are
> the details.
>
> In our environment, we have a single PostgreSQL database for all the
> X2Go servers. There is one X2Go server for each user (x2go client).
> I connect to the first server and I start a valid session. I am using
> ports 30001 - 30003 on the server side and 30004 on the client side:
> tcp 0 0 0.0.0.0:30004 0.0.0.0:* LISTEN
>
> Server   User     Status Last Connection   Run Time Display Client        GP    SP    FP    Context Load RSS     VSZ
> jasiii   jasiii   S      01.08.11 12:45:12 0d 0h 2m 51      74.75.231.235 30001 30002 30003 40016   0.04 434.2Mb 8.4Gb
>
> I start a second x2goclient to connect to the second server. The client
> is listening on port 30005:
> tcp 0 0 0.0.0.0:30004 0.0.0.0:* LISTEN
> tcp 0 0 0.0.0.0:30005 0.0.0.0:* LISTEN
>
> The server side is using the same ports as the other x2go server:
> Server   User     Status Last Connection   Run Time Display Client        GP    SP    FP    Context Load RSS     VSZ
> ----------------------------------------------------------------------------------------------------------------------
> cuser-a6 cuser-a6 R      01.08.11 12:45:06 0d 0h 1m 53      74.75.231.235 30001 30002 30003 40046   0.01 315.3Mb 5.9Gb
> jasiii   jasiii   S      01.08.11 12:45:12 0d 0h 2m 51      74.75.231.235 30001 30002 30003 40016   0.04 434.2Mb 8.4Gb
>
> This is where the original session dies. If I understand the logic,
> shouldn't there be a mapping from 30004 to localhost:30001 for the first
> session and another from 30005 to localhost:30001 for the second
> session? Why is the first session dropped as if there was no network
> connectivity?
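
To make that expectation concrete, here is a small Python sketch of the two forwardings as described above. The tuple layout and the helper name `local_port_collisions` are assumptions made for illustration; the port numbers are the ones from the listings above. The point is that only the client-side ports must differ, because each `localhost:30001` is resolved on a different server.

```python
from collections import Counter


def local_port_collisions(forwards):
    """Return the client-side ports that are used by more than one forward.

    `forwards` is an iterable of (server, local_port, remote_port) tuples.
    Hypothetical helper, for illustration only.
    """
    counts = Counter(local for _server, local, _remote in forwards)
    return sorted(port for port, seen in counts.items() if seen > 1)


# Expected state with both sessions up: distinct local ports, and the
# same remote port on two different servers.
expected = [
    ("jasiii", 30004, 30001),
    ("cuser-a6", 30005, 30001),
]

for server, local, remote in expected:
    print(f"ssh to {server}: -L {local}:localhost:{remote}")

print("colliding local ports:", local_port_collisions(expected))
```

Under this model the two tunnels are independent, so the shared server-side port numbers alone would not explain the first session being dropped.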
>
> This is what the server side session.log says:
> Error: Failure reading from the peer proxy.
> Error: Connection with remote peer broken.
> Error: Please check the state of your network and retry.
> Session: Display failure detected at 'Mon Aug 1 12:45:07 2011'.
>
> Sometimes, it can get really ugly:
> Error: Failure reading from the peer proxy.
> Error: Connection with remote peer broken.
> Error: Please check the state of your network and retry.
> nxagentReconnectFailedFonts: WARNING! Font server tunneling not retrieved.
> *** glibc detected *** /usr/lib/x2go/x2goagent: corrupted double-linked list: 0x0000000001131d10 ***
> ======= Backtrace: =========
> /lib/libc.so.6(+0x71ad6)[0x7f05278fbad6]
> /lib/libc.so.6(+0x71f4d)[0x7f05278fbf4d]
> /lib/libc.so.6(+0x73418)[0x7f05278fd418]
> /lib/libc.so.6(cfree+0x6c)[0x7f052790084c]
> /usr/lib/x2go/libX11.so.6(XFreeFontPath+0x1d)[0x7f05295ff8dd]
> /usr/lib/x2go/x2goagent[0x485af8]
> /usr/lib/x2go/x2goagent[0x4ab421]
> /usr/lib/x2go/x2goagent[0x4ab840]
> /usr/lib/x2go/x2goagent[0x4ab95b]
> /usr/lib/x2go/x2goagent[0x48cb75]
> /usr/lib/x2go/x2goagent[0x450bbb]
> /usr/lib/x2go/x2goagent[0x45c65e]
> /usr/lib/x2go/x2goagent[0x429595]
> /usr/lib/x2go/x2goagent[0x455863]
> /lib/libc.so.6(__libc_start_main+0xfd)[0x7f05278a8c4d]
> /usr/lib/x2go/x2goagent[0x40aa69]
>
> The client side says this:
> Loop: WARNING! Connected to remote version 3.4.0 with local version 3.5.0.
> Loop: WARNING! Unrecognized session type 'unix-kde-depth_24'. Assuming agent session.
> Proxy: PANIC! Failure reading from the peer proxy on FD#5.
> Loop: PANIC! No shutdown of proxy link performed by remote proxy.
>
> So what changed? Is this an issue with libssh? Thanks - John
<snip>
I'm not sure where it is coming from but sometimes, not all the time, we
receive a message such as:

channel_open_session failed - Received SSH_MSG_DISCONNECT: Received ieof for nonexistent channel 0

Lots of Internet research didn't produce a lot of helpful information on
the error.

On the server side ssh logs I see:

Aug 1 12:45:07 jasiii sshd[22977]: error: connect_to localhost port 30001: failed.
Aug 1 12:45:07 jasiii sshd[22977]: channel_by_id: 0: bad id: channel free

I do not see anything of interest in the client ssh logs.

Out of curiosity, I tried reproducing this manually. Both systems where
I can reliably reproduce this problem have a local smtp daemon. I did a
"ssh -p <some hidden port> -L 40015:localhost:25 user1@machine1" and a
"ssh -p <some hidden port> -L 40016:localhost:25 user2@machine2". I was
able to telnet to ports 40015 and 40016 without any issue and without
one connection disconnecting the other. Could there be a problem in the
way libssh is handling multiple channels? - John
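
The manual check above could also be scripted so that both forwards are exercised at the same moment rather than one after the other. A minimal Python sketch, assuming the two ssh forwards on local ports 40015 and 40016 are already up and that the remote end sends a greeting line (as an smtp daemon does); the helper names `read_banner` and `probe_concurrently` are made up for this example.

```python
import socket
import threading


def read_banner(port, host="127.0.0.1", timeout=5.0):
    """Connect to a locally forwarded port and return the first data received."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(1024).decode("ascii", errors="replace").strip()


def probe_concurrently(ports):
    """Read the banner from every port at the same time, one thread per port."""
    results = {}

    def worker(port):
        try:
            results[port] = read_banner(port)
        except OSError as exc:  # refused, reset, or timed out
            results[port] = f"ERROR: {exc}"

    threads = [threading.Thread(target=worker, args=(p,)) for p in ports]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # 40015 and 40016 are the local ends of the two "ssh -L" forwards.
    for port, banner in sorted(probe_concurrently([40015, 40016]).items()):
        print(port, banner)
```

If both ports return a greeting when probed this way, plain OpenSSH forwarding copes with the two simultaneous tunnels, which would point the search back towards the client's own SSH handling.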