[X2go-Dev] 3.99 Testing - Multiple X2goclient instances fail
John A. Sullivan III
jsullivan at opensourcedevel.com
Mon Aug 1 20:43:23 CEST 2011
On Mon, 2011-08-01 at 14:38 -0400, John A. Sullivan III wrote:
> On Mon, 2011-08-01 at 13:30 -0400, John A. Sullivan III wrote:
> > On Fri, 2011-07-29 at 10:21 +0200, Mike Gabriel wrote:
> > > Hi Daniel,
> > >
> > > On Fr 29 Jul 2011 07:24:40 CEST Daniel Lindgren wrote:
> > >
> > > > [...]
> > > >
> > > > I've been trying to think of some way to make x2go automatically only
> > > > use free ports, but since there are possibly several clients and
> > > > servers involved in multiple incoming and outgoing sessions, the only
> > > > safe way I've come up with would be to register used ports (or reserve
> > > > port ranges) on a shared resource somewhere, which would make all x2go
> > > > clients and servers dependent on that one resource.
> > >
> > > What you have been describing above is the old, well known
> > > NXServer/NXClient problem.
> > >
> > > In my experience it does not exist with X2go (at least not with
> > > pyhoca-gui ;-) ).
> > >
> > > The connecting stuff is a bit tricky...
> > >
> > > - x2goclient connects via SSH and launches either x2gostartagent or
> > > x2goresume-session
> > > - x2gostartagent will try to detect a free server-side port >30000.
> > > This port is recorded in the session database table. (Make sure you do
> > > not have other server-side software running that claims ports from
> > > 30000 upwards.)
> > > - x2goresume-session will pick the previously detected port from the session
> > > database table (i.e. SQLite or PostgreSQL) and presume it is still unused.
> > > It should be, at least as far as X2go is concerned: x2gostartagent avoids
> > > claiming ports that are marked in the database with state S (suspended)
> > > or R (running).
> > > - say we detect a free server-side port, 30015: x2gostartagent will
> > > launch the XNest-like x2goagent XServer, listening for incoming
> > > connections on localhost:30015.
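
The server-side selection described above could be sketched roughly as follows. This is only an illustration in Python, not the actual x2gostartagent code (which this thread does not show). The names `port_is_free` and `pick_server_port`, the `reserved` set standing in for ports that the session database lists with state S or R, and the upper bound of 30100 are all assumptions.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Best-effort check: return True if we can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def pick_server_port(reserved, start=30001, stop=30100, is_free=port_is_free):
    """Pick the first port in [start, stop) that is neither reserved nor in use.

    `reserved` stands in for ports recorded in the session database with
    state S (suspended) or R (running); it is hypothetical, for illustration.
    """
    for port in range(start, stop):
        if port in reserved:
            continue  # still claimed by a suspended or running session
        if is_free(port):
            return port
    raise RuntimeError("no free port in range")
```

A check of this kind is inherently racy: another process can claim the port between the probe and the moment the agent binds to it.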
> > >
> > > - x2goclient now tries to set up a forwarding tunnel from
> > > client to server (-L 30000:localhost:30015)
> > > - if the local port 30000 (the first one in the above expression) is already
> > > in use, it simply selects another port. Each selected port is probed
> > > before use. That is done by x2goclient.
> > > - say we end up with -L 30028:localhost:30015
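
The client-side probing described above might look something like the following sketch. It is not x2goclient's real implementation. The names `local_port_free` and `forward_spec`, the loopback bind used as the probe, and the limit of 100 attempts are assumptions made for illustration.

```python
import socket


def local_port_free(port):
    """Probe a local port by trying to bind to it on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("127.0.0.1", port))
        except OSError:
            return False
    return True


def forward_spec(remote_port, first_local=30000, tries=100, probe=local_port_free):
    """Return an ssh-style '-L local:localhost:remote' string.

    Walks upward from `first_local` until `probe` reports a usable port.
    """
    for local in range(first_local, first_local + tries):
        if probe(local):
            return "-L %d:localhost:%d" % (local, remote_port)
    raise RuntimeError("no usable local port found")
```

With a probe that only accepts 30028, `forward_spec(30015, probe=lambda p: p == 30028)` yields the `-L 30028:localhost:30015` expression from the example above.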
> > >
> > > The problem you observe, "the remote proxy closed the connection while
> > > negotiating …", in my view stems from uncleanly closed SSH
> > > connections. Try to restart the SSH daemon when this error occurs and
> > > you should be able to connect again. This error can also be caused by
> > > the sound/sshfs tunnels. Here the same cause applies: unclean shutdown
> > > of SSH connections.
> > >
> > > Reasons may be:
> > >
> > > o unclean x2goclient/pyhoca-gui code
> > > o line failures (so the clients do not get the chance to cancel the port
> > > forwarding requests...)
> > >
> > > You may do the following to debug:
> > >
> > > o check whether the problem occurs in the same way when you use pyhoca-gui/-cli
> > > o pyhoca-gui (i.e. python-x2go) tries to recover from uncleanly closed SSH
> > > port forwardings... You may try to resume a session twice; the second
> > > time you should get a session window
> > >
> > > The addressed issue was a big problem for Python X2go before version
> > > 0.1.0.0. It took me quite a few sleepless nights to get rid of it by
> > > 90%+.
> > <snip>
> > I continue to have a devil of a time with this. It seems like
> >
> > - x2goclient now tries to set up a forwarding tunnel from
> > client to server (-L 30000:localhost:30015)
> > - if the local port 30000 (the first one in the above expression) is
> > already in use, it simply selects another port. Each selected port is
> > probed before usage. That is done by x2goclient.
> >
> > isn't working. In the past, using older clients, we would generate
> > error messages about a problem connecting on port 30001 but the sessions
> > would all still work.
> >
> > We are no longer seeing those errors. Instead, the sessions are rudely
> > dropped. I've restarted all the X2Go servers and clients after deleting
> > all the old session directories on both. Still no success. Here are
> > the details.
> >
> > In our environment, we have a single PostgreSQL database for all the
> > X2Go servers. There is one X2Go server for each user (x2go client).
> > I connect to the first server and I start a valid session. I am using
> > ports 30001 - 30003 on the server side and 30004 on the client side:
> > tcp 0 0 0.0.0.0:30004 0.0.0.0:* LISTEN
> >
> > Server    User      Status  Last Connection    Run Time  Display  Client         GP     SP     FP     Context  Load  RSS      VSZ
> > jasiii    jasiii    S       01.08.11 12:45:12  0d 0h 2m  51       74.75.231.235  30001  30002  30003  40016    0.04  434.2Mb  8.4Gb
> >
> > I start a second x2goclient to connect to the second server. The client
> > is listening on port 30005:
> > tcp 0 0 0.0.0.0:30004 0.0.0.0:* LISTEN
> > tcp 0 0 0.0.0.0:30005 0.0.0.0:* LISTEN
> >
> > The server side is using the same ports as the other x2go server:
> > Server    User      Status  Last Connection    Run Time  Display  Client         GP     SP     FP     Context  Load  RSS      VSZ
> > -----------------------------------------------------------------------------------------------------------------------------------
> > cuser-a6  cuser-a6  R       01.08.11 12:45:06  0d 0h 1m  53       74.75.231.235  30001  30002  30003  40046    0.01  315.3Mb  5.9Gb
> > jasiii    jasiii    S       01.08.11 12:45:12  0d 0h 2m  51       74.75.231.235  30001  30002  30003  40016    0.04  434.2Mb  8.4Gb
> >
> > This is where the original session dies. If I understand the logic,
> > shouldn't there be a mapping from 30004 to localhost:30001 for the first
> > session and another from 30005 to localhost:30001 for the second
> > session? Why is the first session dropped as if there was no network
> > connectivity?
> >
> > This is what the server side session.log says:
> > Error: Failure reading from the peer proxy.
> > Error: Connection with remote peer broken.
> > Error: Please check the state of your network and retry.
> > Session: Display failure detected at 'Mon Aug 1 12:45:07 2011'.
> >
> > Sometimes, it can get really ugly:
> > Error: Failure reading from the peer proxy.
> > Error: Connection with remote peer broken.
> > Error: Please check the state of your network and retry.
> > nxagentReconnectFailedFonts: WARNING! Font server tunneling not
> > retrieved.
> > *** glibc detected *** /usr/lib/x2go/x2goagent: corrupted double-linked
> > list: 0x0000000001131d10 ***
> > ======= Backtrace: =========
> > /lib/libc.so.6(+0x71ad6)[0x7f05278fbad6]
> > /lib/libc.so.6(+0x71f4d)[0x7f05278fbf4d]
> > /lib/libc.so.6(+0x73418)[0x7f05278fd418]
> > /lib/libc.so.6(cfree+0x6c)[0x7f052790084c]
> > /usr/lib/x2go/libX11.so.6(XFreeFontPath+0x1d)[0x7f05295ff8dd]
> > /usr/lib/x2go/x2goagent[0x485af8]
> > /usr/lib/x2go/x2goagent[0x4ab421]
> > /usr/lib/x2go/x2goagent[0x4ab840]
> > /usr/lib/x2go/x2goagent[0x4ab95b]
> > /usr/lib/x2go/x2goagent[0x48cb75]
> > /usr/lib/x2go/x2goagent[0x450bbb]
> > /usr/lib/x2go/x2goagent[0x45c65e]
> > /usr/lib/x2go/x2goagent[0x429595]
> > /usr/lib/x2go/x2goagent[0x455863]
> > /lib/libc.so.6(__libc_start_main+0xfd)[0x7f05278a8c4d]
> > /usr/lib/x2go/x2goagent[0x40aa69]
> >
> > The client side says this:
> > Loop: WARNING! Connected to remote version 3.4.0 with local version
> > 3.5.0.
> > Loop: WARNING! Unrecognized session type 'unix-kde-depth_24'. Assuming
> > agent session.
> > Proxy: PANIC! Failure reading from the peer proxy on FD#5.
> > Loop: PANIC! No shutdown of proxy link performed by remote proxy.
> >
> > So what changed? Is this an issue with libssh? Thanks - John
> <snip>
> I'm not sure where it is coming from but sometimes, not all the time, we
> receive a message such as:
>
> channel_open_session failed - Received SSH_MSG_DISCONNECT: Received ieof
> for nonexistent channel 0
>
> Lots of Internet research didn't produce a lot of helpful information on
> the error.
>
> On the server side ssh logs I see:
> Aug 1 12:45:07 jasiii sshd[22977]: error: connect_to localhost port
> 30001: failed.
> Aug 1 12:45:07 jasiii sshd[22977]: channel_by_id: 0: bad id: channel
> free
>
> I do not see anything of interest in the client ssh logs.
>
> Out of curiosity, I tried reproducing this manually. Both systems where
> I can reliably reproduce this problem have a local smtp daemon. I did a
> "ssh -p <some hidden port> -L 40015:localhost:25 user1@machine1" and a
> "ssh -p <some hidden port> -L 40016:localhost:25 user2@machine2". I was
> able to telnet to each on port 40015 and 40016 without an issue or one
> disconnecting the other. Could there be a problem in the way libssh is
> handling multiple channels? - John
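
A manual check like the telnet test above could be scripted along these lines. This is a rough sketch, and `read_banner` and `check_tunnels` are hypothetical helper names. It reads the greeting instead of merely connecting, because the local end of an ssh -L forward can accept a connection even when the far side cannot be reached.

```python
import socket


def read_banner(port, host="127.0.0.1", timeout=5.0):
    """Connect to host:port and return the first bytes the peer sends, or None."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            data = s.recv(512)
    except OSError:
        # Refused, timed out, or reset: treat all of these as "no banner".
        return None
    return data or None  # empty read means the peer closed without greeting


def check_tunnels(ports, reader=read_banner):
    """Map each local tunnel port to the banner seen through it (or None)."""
    return {port: reader(port) for port in ports}
```

For the two tunnels in the test above this would be called as `check_tunnels([40015, 40016])`; a working tunnel to an SMTP daemon should return that daemon's greeting for each port.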
>
> <snip>
Correction - we are getting error messages about the connection on
30001:
Aug 1 12:32:21 jasiii sshd[11630]: error: connect_to localhost port
30001: failed.
Aug 1 12:32:21 jasiii sshd[11630]: channel_by_id: 0: bad id: channel
free
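
To pull these failures out of a longer sshd log, something like the following sketch might help. It assumes each log entry sits on a single line in the format quoted above (the entries are wrapped in this mail), and the names `CONNECT_FAILED` and `failed_forwards` are made up for the example.

```python
import re

# Matches the sshd error quoted in this thread, e.g.
# "... sshd[11630]: error: connect_to localhost port 30001: failed."
CONNECT_FAILED = re.compile(
    r"sshd\[(?P<pid>\d+)\]: error: connect_to (?P<host>\S+) "
    r"port (?P<port>\d+): failed"
)


def failed_forwards(lines):
    """Yield (pid, host, port) for each failed port-forward target in `lines`."""
    for line in lines:
        m = CONNECT_FAILED.search(line)
        if m:
            yield int(m.group("pid")), m.group("host"), int(m.group("port"))
```

Grouping the results by port would show whether the failures always hit 30001 or also the other session ports.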