[X2Go-Dev] Bug#1313: Bug#1313: Bug#1313: there is still a problem in getting a correct value for loadavgXX with loadchecker
Mike Gabriel
mike.gabriel at das-netzwerkteam.de
Fri Dec 14 16:02:52 CET 2018
Control: close -1
On Do 13 Sep 2018 15:19:19 CEST, Mike Gabriel wrote:
> Hi Walid,
>
> On Mo 13 Aug 2018 12:34:00 CEST, Walid MOGHRABI wrote:
>
>> package: x2gobroker-agent
>> version: 0.0.4.0-0~1038~ubuntu16.04.1
>> priority: bug
>>
>> I don't have a "0" value anymore since latest fixes so the
>> loadchecker process don't crash anymore but still, there is
>> something strange.
>> Here is a fragment of my loadchecker logs from this morning.
>> Just to give you the context, I have 22 servers which are all
>> automaticaly started at 6 AM (wake on lan) and they are absolutely
>> the same (blade servers with same CPU, memory amount, bios version,
>> ...).
>> I checked our monitoring to see if users were correctly distributed
>> over the farm and at 7:30AM, I had about 7 or 8 users connected but
>> 4 of them were on tce-server-21 where I should have had 1 user on 8
>> servers.
>
> Have you seen this issues more often? Does it hop from one server to
> another or occur on more than one server at a time?
>
>> Here is the loadchecker log fragment :
>>
>> root at tce-manager-01 [~] # grep -B 1 'loadavgXX:1;'
>> /var/log/x2gobroker/loadchecker.log
>> ...
>> 2018-07-24 07:15:01,200 - loadchecker - INFO - Executing agent
>> command on remote host tce-server-21 (10.50.0.221): sh -c
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:15:01,622 - loadchecker - INFO - Broker agent
>> answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23810;
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:17:50,354 - loadchecker - INFO - Executing agent
>> command on remote host tce-server-21 (10.50.0.221): sh -c
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:17:50,779 - loadchecker - INFO - Broker agent
>> answered: OK; loadavgXX:1; memAvail:23686; myMemAvail:23812;
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:20:32,550 - loadchecker - INFO - Executing agent
>> command on remote host tce-server-21 (10.50.0.221): sh -c
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:20:32,964 - loadchecker - INFO - Broker agent
>> answered: OK; loadavgXX:1; memAvail:23683; myMemAvail:23809;
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:23:21,610 - loadchecker - INFO - Executing agent
>> command on remote host tce-server-21 (10.50.0.221): sh -c
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:23:22,034 - loadchecker - INFO - Broker agent
>> answered: OK; loadavgXX:1; memAvail:23685; myMemAvail:23811;
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:26:03,872 - loadchecker - INFO - Executing agent
>> command on remote host tce-server-21 (10.50.0.221): sh -c
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:26:04,286 - loadchecker - INFO - Broker agent
>> answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23809;
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:28:52,917 - loadchecker - INFO - Executing agent
>> command on remote host tce-server-21 (10.50.0.221): sh -c
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:28:53,338 - loadchecker - INFO - Broker agent
>> answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23809;
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:31:35,252 - loadchecker - INFO - Executing agent
>> command on remote host tce-server-21 (10.50.0.221): sh -c
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:31:35,670 - loadchecker - INFO - Broker agent
>> answered: OK; loadavgXX:1; memAvail:23685; myMemAvail:23811;
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:34:24,424 - loadchecker - INFO - Executing agent
>> command on remote host tce-server-21 (10.50.0.221): sh -c
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:34:24,842 - loadchecker - INFO - Broker agent
>> answered: OK; loadavgXX:1; memAvail:23683; myMemAvail:23809;
>> numCPU:16; typeCPU:2400;
>
> The log message "Broker agent answered:" comes directly from X2Go
> Broker Agent. It is basically its raw output.
>
> This means, that the flaw must be in x2gobroker-agent.pl on the
> remote X2Go Server. Or that the loadchecker stops querying the
> broker agent and re-uses old data.
>
> Looking at x2gobroker-agent.pl: If we focus on the loadavgXX for
> now, we come to the conclusion, that the load was really "0" or it
> was negative (both gives us a loadavgXX value of "1". The value
> should normally be greater (system load of 1.0 brings a loadavgXX of
> 100).
>
> Looking at x2gobroker.agent.py: As the values always change
> slightly, we can't say that Python provides us the same return
> result string all the time. The query to the broker agent must have
> happened.
>
> We need to do more debugging if this issue reoccurs:
>
> * run '/usr/lib/x2go/x2gobroker-agent foo checkload' on the
> affected X2Go Server
> and see if the reported values match with what the load checker sees.
>
> * check if it is reoccuring on the same X2Go Server
>
> * if /usr/lib/x2go/x2gobroker-agent returns a load of zero,
> look at /proc/loadavg
>
> * and /proc/sys/vm/min_free_kbytes,
> /proc/meminfo
> /proc/cpuinfo
>
> ... and report all back here...
>
>> As you can see, there is only 1 server with a loadavgXX = 1 (which
>> means that in fact, we got a zero value from the broker agent).
>> This is not normal, at 7:34, there were 4 users already connected
>> to this server and most of my other servers were empty.
>
>> Restarting x2gobroker-loadchecker service fixed the issue.
>
> Considering the above analysis that the issue must come from
> x2gobroker-agent.pl, a restart of the loadchecker can in theory not
> solve such an issue.
>
> Can you see the x2gobroker-agent.pl process appear and disappear in
> the process list on the remote X2Go Server? Or does it stay open,
> even zombied?
>
>> I think there is a problem in retrieving this informations ... even
>> memAvail seem strange on this server to me ... with 4 connected
>> users, it should have been lower than that.
>
> Hmmm... Ok... Maybe the wrong server got tested? Two identical IPs
> on the subnet?
>
>> I also think the number of connected users should be taken into
>> account when calculating the load factor (maybe this is already the
>> case, not sure about that).
>
> Yes, we take the number of sessions into account. But that is not
> provided by the broker agent, but is available in the X2Go Server
> database and queried from there.
>
> Mike
Request from Walid on IRC. Not an issue anymore.
Thus, closing...
Mike
--
DAS-NETZWERKTEAM
mike gabriel, herweg 7, 24357 fleckeby
mobile: +49 (1520) 1976 148
landline: +49 (4354) 8390 139
GnuPG Fingerprint: 9BFB AEE8 6C0A A5FF BF22 0782 9AF4 6B30 2577 1B31
mail: mike.gabriel at das-netzwerkteam.de, http://das-netzwerkteam.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 851 bytes
Desc: Digitale PGP-Signatur
URL: <http://lists.x2go.org/pipermail/x2go-dev/attachments/20181214/6d4f6930/attachment-0001.sig>
More information about the x2go-dev
mailing list