Mender-connect stop working on short internet outtage

Hi,

We have problems that mender-connect stops working when the internet connection (in our case dsl reconnect) is lost for a short time. Network and link is still there.
This issue doesn’t happen on each dsl reconnect. Currently we don’t see any other parameter how to trigger the problem.

We use mender-connect builded by yocto on dunfell release.

Step-by-Step description

  • mender-connect service starts, mender shell is working fine
  • router does dsl reconnect (or something similar which causes a short internet outtage)
    LOGS
Apr 18 14:35:28 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:35:28Z" level=debug msg="PingHandler called"
Apr 18 14:35:28 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:35:28Z" level=debug msg="PongHandler called"
Apr 18 14:36:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:36:22Z" level=debug msg="ping message"
Apr 18 14:36:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:36:22Z" level=debug msg="PingHandler called"
Apr 18 14:36:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:36:22Z" level=debug msg="PongHandler called"
Apr 18 14:37:17 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:37:17Z" level=debug msg="ping message"
Apr 18 14:37:17 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:37:17Z" level=debug msg="PingHandler called"
Apr 18 14:37:17 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:37:17Z" level=debug msg="PongHandler called"
Apr 18 14:38:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:13Z" level=debug msg="ping message"
Apr 18 14:38:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:21Z" level=debug msg="PingHandler called"
Apr 18 14:38:25 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:25Z" level=debug msg="webSock.ReadMessage()=<nil>,websocket: close 1006 (abnormal closure): unexpected EOF"
Apr 18 14:38:25 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:25Z" level=debug msg="messageLoop: called readMessage: <nil>,websocket: close 1006 (abnormal closure): unexpected EOF"
Apr 18 14:38:25 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:25Z" level=error msg="messageLoop: error on readMessage: websocket: close 1006 (abnormal closure): unexpected EOF; disconnecting, waiting for reconnect."
Apr 18 14:38:25 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:25Z" level=debug msg="messageLoop: posting event: reconnect-req; waiting for response"
Apr 18 14:38:33 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:33Z" level=debug msg="needsReconnect: got event: reconnect-req"
Apr 18 14:38:33 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:33Z" level=debug msg="dbusEventLoop: daemon needs to reconnect"
Apr 18 14:39:35 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:39:12Z" level=debug msg="(dbusEventLoop) posting Event: reconnect"
Apr 18 14:40:45 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:39:45Z" level=debug msg="eventLoop: got event: reconnect"
Apr 18 14:43:36 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:40:17Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Apr 18 14:48:37 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:45:17Z" level=error msg="connection manager failed to connect to /api/devices/v1/deviceconnect/connect: malformed ws or wss URL; reconnecting in 60s (try 1/0); len(token)=0"
Apr 18 14:54:11 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:52:27Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Apr 18 14:59:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:57:50Z" level=error msg="connection manager failed to connect to /api/devices/v1/deviceconnect/connect: malformed ws or wss URL; reconnecting in 60s (try 2/0); len(token)=0"
Apr 18 15:01:37 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T15:01:37Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Apr 18 15:01:37 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T15:01:37Z" level=error msg="connection manager failed to connect to /api/devices/v1/deviceconnect/connect: malformed ws or wss URL; reconnecting in 60s (try 3/0); len(token)=0"
Apr 18 15:02:37 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T15:02:37Z" level=warning msg="The server certificate cannot be loaded: no file provided"
  • After that, mender-connect needs to be restarted to work again

I did some research in the code, for me the dbus events seem to work fine.

Regarding to the log entry above the server URL seems to be gone.
I guess the issue is caused here: mender-connect/daemon.go at master · mendersoftware/mender-connect · GitHub

But from here on I am lost. I don’t know where the jwt is created. Can somebody help me with this?
Also I am currently not able to build mender-connect locally on my Ubuntu 64bit machine to add some more traces. Is there any documentation how to build mender-connect from amd64 dev machine for arm32 architecture?

@kacf, @peter is this something either of you can help with?

Hello @ruben

thanks for the report. Could you please verify what happens if you use ServerURL in the mender-connect.conf file, and point it to the server?

best regards,
peter

Hi,

ServerURL is set in our configuration, see

{
    "ReconnectIntervalSeconds": 60,
"ServerURL": "https://OUR.SERVER.URL/",
    "ShellCommand": "/bin/bash",
    "User": "xxx",
    "Sessions": {
      "StopExpired": true,
      "ExpireAfter": 0,
      "ExpireAfterIdle": 30,
      "MaxPerUser": 10
    }
}

best regards
Ruben

Hi @ruben ,

did you solve this problem?

best regards
Mark

Hello,
I am experiencing the same problems with hosted.mender.io and mender-connect and raspberry Pi 3b devices. As the devices connect via mqtt to our cloud system, I can see that after short disconnections from the mqtt broker. The only way to get the mender-connect back working, is to issue a reset command via mqtt and reboot the raspberry pi.

Are there any news on this topic ?

What else can I do to prevent rebooting the pi ?

kernel
Linux version 4.19.93 (oe-user@oe-host) (gcc version 9.2.0 (GCC)) #1 SMP Thu Feb 18 13:45:21 UTC 2021

mender_bootloader_integration
uboot

mender_client_version
3.1.0

network_interfaces
eth0

os
Poky (Yocto Project Reference Distro) 3.0.4 (zeus)

rootfs_type
ext4

update_modules
deb,dir-overlay,directory,docker,rpm,script,single-file

Best regards,
Dirk