[X2Go-Dev] Bug#1315: loadchecker bad auto sleep time leads to faulty service

Walid MOGHRABI w.moghrabi at servicemagic.eu
Mon Aug 13 12:43:45 CEST 2018


package: x2gobroker-loadchecker
version: 0.0.4.0-0~1038~ubuntu16.04.1 
priority: bug 


I had a problem with broker loadbalancing not working anymore even though the loadchecker service seemed to work well.
Looking at the logs, I found an unusual "sleep time" for the loadchecker service then, after that, it seemed to never re-caclulate load on remote servers.

---
2018-08-08 08:43:40,379 - loadchecker - INFO - Executing agent command on remote host tce-server-14 (10.50.0.214): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-08-08 08:43:40,812 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:49; memAvail:21904; myMemAvail:22030; numCPU:16; typeCPU:2400; 
2018-08-08 08:43:40,812 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-14, new load factor is: 1716558
2018-08-08 08:43:40,813 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server
2018-08-08 08:43:48,051 - loadchecker - INFO - Executing agent command on remote host tce-server-13 (10.50.0.213): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-08-08 08:43:48,497 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:126; memAvail:21117; myMemAvail:21242; numCPU:16; typeCPU:2400; 
2018-08-08 08:43:48,497 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-13, new load factor is: 643565
2018-08-08 08:43:48,498 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server
2018-08-08 08:43:55,734 - loadchecker - INFO - Executing agent command on remote host tce-server-12 (10.50.0.212): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-08-08 08:43:56,162 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:52; memAvail:22260; myMemAvail:22386; numCPU:16; typeCPU:2400; 
2018-08-08 08:43:56,162 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-12, new load factor is: 1643815
2018-08-08 08:43:56,163 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server
2018-08-08 08:44:03,404 - loadchecker - INFO - Executing agent command on remote host tce-server-11 (10.50.0.211): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-08-08 08:44:03,850 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:123; memAvail:21398; myMemAvail:21524; numCPU:16; typeCPU:2400; 
2018-08-08 08:44:03,851 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-11, new load factor is: 668035
2018-08-08 08:44:03,851 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server
2018-08-08 08:44:11,093 - loadchecker - INFO - Executing agent command on remote host tce-server-10 (10.50.0.210): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-08-08 08:44:11,532 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:84; memAvail:22133; myMemAvail:22259; numCPU:16; typeCPU:2400; 
2018-08-08 08:44:11,532 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-10, new load factor is: 1011794
2018-08-08 08:44:11,533 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server
2018-08-08 08:44:18,774 - loadchecker - INFO - Executing agent command on remote host tce-server-09 (10.50.0.209): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-08-08 08:44:19,206 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:93; memAvail:21758; myMemAvail:21883; numCPU:16; typeCPU:2400; 
2018-08-08 08:44:19,206 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-09, new load factor is: 898394
2018-08-08 08:44:19,206 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server
2018-08-08 08:44:26,189 - loadchecker - DEBUG - LoadChecker.loadchecker(): performed 44 queries (failures: 0), sleeping for -6.976744186046517secs before starting next query cycle
2018-08-08 08:46:47,377 - loadchecker - DEBUG - LoadCheckServiceHandler.handle_read(): received load check query: backend=inifile, profile_id=TRAVAUX, hostname=
2018-08-08 08:46:47,377 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-11: 763244
2018-08-08 08:46:47,377 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-23: 1742811
2018-08-08 08:46:47,377 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-26: 507295
2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-18: 899328
2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-19: 932160
2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-09: 879966
2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-28: 834700
2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-17: 721955
2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-25: 715089
2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-29: 1195074
2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-20: 1431168
2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-27: 1196586
2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-31: 1972650
2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-13: 735827
2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-24: 780717
2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-10: 932487
2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-12: 1237871
2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-30: 1371001
2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-21: 587846
2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-14: 1616787
2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-15: 741288
2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-22: 924074
2018-08-08 08:46:54,245 - loadchecker - DEBUG - LoadCheckServiceHandler.handle_read(): received load check query: backend=inifile, profile_id=TRAVAUX, hostname=
2018-08-08 08:46:54,245 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-11: 763244
2018-08-08 08:46:54,245 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-23: 1742811
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-26: 507295
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-18: 899328
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-19: 932160
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-09: 879966
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-28: 834700
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-17: 721955
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-25: 715089
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-29: 1195074
2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-20: 1431168
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-27: 1196586
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-31: 1972650
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-13: 735827
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-24: 780717
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-10: 932487
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-12: 1237871
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-30: 1371001
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-21: 587846
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-14: 1616787
2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-15: 741288
2018-08-08 08:46:54,248 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-22: 924074
2018-08-08 08:47:21,453 - loadchecker - DEBUG - LoadCheckServiceHandler.handle_read(): received load check query: backend=inifile, profile_id=TRAVAUX, hostname=
---

As you can see, there is a strange line with a '-6.976744186046517secs' value for sleep time which seems to be interpreted as an infinite sleep time by the process and then, load is never recalculated :

---
2018-08-08 08:44:26,189 - loadchecker - DEBUG - LoadChecker.loadchecker(): performed 44 queries (failures: 0), sleeping for -6.976744186046517secs before starting next query cycle
---

I don't see any reason why this happens and I was unable to reproduce it manually, it is a rare case but it already happened more than once.
---
DISCLAIMER: This e-mail is private and confidential and may contain proprietary or legally privileged information. It is for the intended recipient only. If you have received this email in error, please notify the author by replying to it and then destroy it. If you are not the intended recipient you must not use, disclose, distribute, copy, print or rely on this e-mail or any attachment. Thank you


More information about the x2go-dev mailing list