Problem with mender connect remote terminal

When the device first came online, its clock was wrong and it couldn’t talk to Mender. That part is expected and fine.

After a while it connected, but then it got this error and couldn’t recover.

Jan 27 17:18:31 ifu-273 mender-connect[525]: time="2021-01-27T17:18:31Z" level=error msg="messageLoop: error on readMessage: read tcp 192.168.1.170:43264->3.219.180.200:443: read: connection reset by peer; disconnecting, waiting for reconnect."

It then got stuck here for 10 minutes until I got fed up and restarted the service, at which point it worked immediately.

Sep 20 10:44:30 ifu-273 mender-connect[525]: time="2020-09-20T10:44:30Z" level=error msg="connection manager failed to connect to https://hosted.mender.io/api/devices/v1/deviceconnect/connect: x509: certificate has expired or is not yet valid: current time 2020-09-20T10:44:30Z is before 2021-01-21T08:53:09Z; reconnecting in 5s (try 5/0); len(token)=884"
Jan 27 16:50:59 ifu-273 mender-connect[525]: time="2021-01-27T16:50:59Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Jan 27 17:18:31 ifu-273 mender-connect[525]: time="2021-01-27T17:18:31Z" level=error msg="messageLoop: error on readMessage: read tcp 192.168.1.170:43264->3.219.180.200:443: read: connection reset by peer; disconnecting, waiting for reconnect."
Jan 27 17:18:41 ifu-273 mender-connect[525]: time="2021-01-27T17:18:41Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Jan 27 17:27:53 ifu-273 systemd[1]: Stopping Mender Connect service...
Jan 27 17:27:53 ifu-273 systemd[1]: mender-connect.service: Succeeded.
Jan 27 17:27:53 ifu-273 systemd[1]: Stopped Mender Connect service.
Jan 27 17:27:53 ifu-273 systemd[1]: Started Mender Connect service.
Jan 27 17:27:53 ifu-273 mender-connect[899]: time="2021-01-27T17:27:53Z" level=info msg="Loaded configuration file: /etc/mender/mender-connect.conf"
Jan 27 17:27:53 ifu-273 mender-connect[899]: time="2021-01-27T17:27:53Z" level=warning msg="ShellCommand is empty, defaulting to /bin/sh"
Jan 27 17:27:54 ifu-273 mender-connect[899]: time="2021-01-27T17:27:54Z" level=warning msg="The server certificate cannot be loaded: no file provided"

Any ideas?

We have tried rttys and shellhub in the past to do the same thing (remote ssh) and they have both been plagued by this sort of thing (not reliable when you most need them).

@lluiscampos any thoughts on this one?

I will defer this to @merlin

@jakew009 what is really the issue?

The printed warnings, while confusing, could be completely fine depending on your setup. Which image are you using (yocto based, mender-convert)? Which server are you connecting to?

Hello

We are using Yocto (Dunfell) and hosted mender.

The issue is that the device was “offline” (remote terminal not available) in Mender. Restarting Mender brings it back online.

This line seems to get echoed out when it goes offline:

Jan 27 17:18:31 ifu-273 mender-connect[525]: time="2021-01-27T17:18:31Z" level=error msg="messageLoop: error on readMessage: read tcp 192.168.1.170:43264->3.219.180.200:443: read: connection reset by peer; disconnecting, waiting for reconnect."

@jakew009 Right, I see. I got confused with your original message.

I thought it was a one-time error and that you were then worried about the “certificate cannot be loaded” warning, so I jumped into the conversation.

The error itself seems to be a lost connection between mender-connect and the backend. @merlin should be the right one to ping then :)

@lluiscampos yep that’s correct. The connection between mender-connect and the mender server is lost, and it does not recover automatically. But if you restart the mender-connect service, it reconnects instantly.

This is the major problem we’ve had with every other solution we’ve tried before that implements similar functionality. The only thing we’ve had almost 100% success with is a VPN daemon, as these seem to have been designed from the get-go to be persistent, and naturally they have built-in watchdogs / ack packets.

Maybe it is possible to add some sort of watchdog to mender-connect, so it can figure out when it has lost its connection and reset itself?
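Until something like that exists inside mender-connect, a stopgap could run outside it. Below is a minimal sketch in Python, assuming the log format shown in the journal output above. The names `needs_restart` and `check_once`, the five-minute grace period, and the use of the "Loaded configuration file" line as the marker of a fresh start are all assumptions for illustration, not anything mender-connect provides. The thread does not show what a successful reconnect logs, so this heuristic cannot tell a silent recovery from a stuck client and may restart a healthy service.

```python
import re
import subprocess
from datetime import datetime, timedelta, timezone

# Substrings taken from the journal output quoted in this thread.
DISCONNECT = "disconnecting, waiting for reconnect"
STARTED = "Loaded configuration file"

TIME_RE = re.compile(r'time="([^"]+)"')


def parse_time(line):
    """Return the UTC timestamp from the time="..." field, or None."""
    match = TIME_RE.search(line)
    if not match:
        return None
    try:
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None
    return stamp.replace(tzinfo=timezone.utc)


def needs_restart(lines, now, grace=timedelta(minutes=5)):
    """Crude heuristic (an assumption, not Mender's logic): restart if the
    last disconnect is older than `grace` and the service has not been
    started again since."""
    last_disconnect = None
    last_start = None
    for line in lines:
        stamp = parse_time(line)
        if stamp is None:
            continue
        if DISCONNECT in line:
            last_disconnect = stamp
        elif STARTED in line:
            last_start = stamp
    if last_disconnect is None:
        return False
    if last_start is not None and last_start >= last_disconnect:
        return False
    return now - last_disconnect >= grace


def check_once():
    """Read the last hour of mender-connect logs and restart if it looks stuck."""
    result = subprocess.run(
        ["journalctl", "-u", "mender-connect", "--since", "1 hour ago",
         "-o", "cat", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    if needs_restart(result.stdout.splitlines(), datetime.now(timezone.utc)):
        subprocess.run(["systemctl", "restart", "mender-connect"], check=True)
```

Calling `check_once()` every few minutes, for example from a systemd timer or cron job running as root, would automate the manual restart described above. It works around the symptom only and does not explain why the reconnect fails.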

Hello @jakew009

Thank you for using Mender.
mender-connect is designed to handle exactly the scenario you are describing, i.e. it waits until it can reconnect and then resumes normal operation.
Could I ask you to enable debug output by changing the following line in the mender-connect.service file:

ExecStart=/usr/bin/mender-connect daemon

into

ExecStart=/usr/bin/mender-connect --debug daemon

and send the log?
Thank you.
Also, the mender-client process is running at all times, right? And you restart only mender-connect? Could you share the log of mender-client as well?

Best regards,
Peter

I have the same issue too. I had to restart mender-client to see the launch remote terminal button. @peter, is this issue solved?

For some devices, I tried rebooting the device without restarting mender-client, but the button didn’t appear on the Mender server.

Once you have restarted mender-client manually, you’ll see the launch button. After that, I had no issues launching the terminal, even after a reboot.

Restarting mender-client manually at runtime is the only solution. Weird.

We absolutely still have the same problem as well on random devices. Restarting mender-connect always fixes it immediately. Luckily we have a VPN, and mender-connect is just a backup in case we break the VPN config somehow.

I haven’t had a chance to get logs, and because it happens randomly, it’s hard to reproduce or to choose which devices to log.

Hello,

The issue has been fixed in the new release. Please give it a go.

Best regards,
Peter

@peter can you elaborate on where, and from which version on, exactly this was fixed? Mender v2.5.x? mender-connect v1.1 and up?

We’ve had the same issue and I’d like to make sure it’s solved in our next release.

It was fixed in Mender client 2.5.2, 2.6.1 and later versions.

I appear to be having a very similar problem, and I am on version 2.6.1. It is a Yocto build on Zeus. Some devices lose their connection to Mender with the error messages “The server certificate cannot be loaded: no file provided” and “messageLoop: error on readMessage: websocket: close 1011 (internal server error): read tcp 172.23.102.249:8080->172.23.121.10:35918: i/o timeout; disconnecting, waiting for reconnect.”. They will not reconnect until the Mender client has been restarted.

Has anyone found a solution yet? @rlaybourn have you been able to resolve this issue?