Mender-connect stop working on short internet outtage

Hi,

We have problems that mender-connect stops working when the internet connection (in our case dsl reconnect) is lost for a short time. Network and link is still there.
This issue doesn’t happen on each dsl reconnect. Currently we don’t see any other parameter how to trigger the problem.

We use mender-connect builded by yocto on dunfell release.

Step-by-Step description

  • mender-connect service starts, mender shell is working fine
  • router does dsl reconnect (or something similar which causes a short internet outtage)
    LOGS
Apr 18 14:35:28 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:35:28Z" level=debug msg="PingHandler called"
Apr 18 14:35:28 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:35:28Z" level=debug msg="PongHandler called"
Apr 18 14:36:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:36:22Z" level=debug msg="ping message"
Apr 18 14:36:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:36:22Z" level=debug msg="PingHandler called"
Apr 18 14:36:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:36:22Z" level=debug msg="PongHandler called"
Apr 18 14:37:17 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:37:17Z" level=debug msg="ping message"
Apr 18 14:37:17 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:37:17Z" level=debug msg="PingHandler called"
Apr 18 14:37:17 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:37:17Z" level=debug msg="PongHandler called"
Apr 18 14:38:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:13Z" level=debug msg="ping message"
Apr 18 14:38:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:21Z" level=debug msg="PingHandler called"
Apr 18 14:38:25 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:25Z" level=debug msg="webSock.ReadMessage()=<nil>,websocket: close 1006 (abnormal closure): unexpected EOF"
Apr 18 14:38:25 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:25Z" level=debug msg="messageLoop: called readMessage: <nil>,websocket: close 1006 (abnormal closure): unexpected EOF"
Apr 18 14:38:25 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:25Z" level=error msg="messageLoop: error on readMessage: websocket: close 1006 (abnormal closure): unexpected EOF; disconnecting, waiting for reconnect."
Apr 18 14:38:25 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:25Z" level=debug msg="messageLoop: posting event: reconnect-req; waiting for response"
Apr 18 14:38:33 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:33Z" level=debug msg="needsReconnect: got event: reconnect-req"
Apr 18 14:38:33 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:38:33Z" level=debug msg="dbusEventLoop: daemon needs to reconnect"
Apr 18 14:39:35 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:39:12Z" level=debug msg="(dbusEventLoop) posting Event: reconnect"
Apr 18 14:40:45 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:39:45Z" level=debug msg="eventLoop: got event: reconnect"
Apr 18 14:43:36 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:40:17Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Apr 18 14:48:37 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:45:17Z" level=error msg="connection manager failed to connect to /api/devices/v1/deviceconnect/connect: malformed ws or wss URL; reconnecting in 60s (try 1/0); len(token)=0"
Apr 18 14:54:11 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:52:27Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Apr 18 14:59:22 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T14:57:50Z" level=error msg="connection manager failed to connect to /api/devices/v1/deviceconnect/connect: malformed ws or wss URL; reconnecting in 60s (try 2/0); len(token)=0"
Apr 18 15:01:37 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T15:01:37Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Apr 18 15:01:37 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T15:01:37Z" level=error msg="connection manager failed to connect to /api/devices/v1/deviceconnect/connect: malformed ws or wss URL; reconnecting in 60s (try 3/0); len(token)=0"
Apr 18 15:02:37 SOCP2001-00000101 mender-connect[7229]: time="2021-04-18T15:02:37Z" level=warning msg="The server certificate cannot be loaded: no file provided"
  • After that, mender-connect needs to be restarted to work again

I did some research in the code, for me the dbus events seem to work fine.

Regarding to the log entry above the server URL seems to be gone.
I guess the issue is caused here: mender-connect/daemon.go at master · mendersoftware/mender-connect · GitHub

But from here on I am lost. I don’t know where the jwt is created. Can somebody help me with this?
Also I am currently not able to build mender-connect locally on my Ubuntu 64bit machine to add some more traces. Is there any documentation how to build mender-connect from amd64 dev machine for arm32 architecture?

@kacf, @peter is this something either of you can help with?

Hello @ruben

thanks for the report. Could you please verify what happens if you use ServerURL in the mender-connect.conf file, and point it to the server?

best regards,
peter

Hi,

ServerURL is set in our configuration, see

{
    "ReconnectIntervalSeconds": 60,
"ServerURL": "https://OUR.SERVER.URL/",
    "ShellCommand": "/bin/bash",
    "User": "xxx",
    "Sessions": {
      "StopExpired": true,
      "ExpireAfter": 0,
      "ExpireAfterIdle": 30,
      "MaxPerUser": 10
    }
}

best regards
Ruben