When the device first came online, the clock was wrong and it couldn’t talk to Mender. All fine.
After a while it connected, but then it got this error and couldn’t recover.
Jan 27 17:18:31 ifu-273 mender-connect[525]: time="2021-01-27T17:18:31Z" level=error msg="messageLoop: error on readMessage: read tcp 192.168.1.170:43264->3.219.180.200:443: read: connection reset by peer; disconnecting, waiting for reconnect."
It then got stuck here for 10 minutes until I got fed up and restarted the service, at which point it worked immediately.
Sep 20 10:44:30 ifu-273 mender-connect[525]: time="2020-09-20T10:44:30Z" level=error msg="connection manager failed to connect to https://hosted.mender.io/api/devices/v1/deviceconnect/connect: x509: certificate has expired or is not yet valid: current time 2020-09-20T10:44:30Z is before 2021-01-21T08:53:09Z; reconnecting in 5s (try 5/0); len(token)=884"
Jan 27 16:50:59 ifu-273 mender-connect[525]: time="2021-01-27T16:50:59Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Jan 27 17:18:31 ifu-273 mender-connect[525]: time="2021-01-27T17:18:31Z" level=error msg="messageLoop: error on readMessage: read tcp 192.168.1.170:43264->3.219.180.200:443: read: connection reset by peer; disconnecting, waiting for reconnect."
Jan 27 17:18:41 ifu-273 mender-connect[525]: time="2021-01-27T17:18:41Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Jan 27 17:27:53 ifu-273 systemd[1]: Stopping Mender Connect service...
Jan 27 17:27:53 ifu-273 systemd[1]: mender-connect.service: Succeeded.
Jan 27 17:27:53 ifu-273 systemd[1]: Stopped Mender Connect service.
Jan 27 17:27:53 ifu-273 systemd[1]: Started Mender Connect service.
Jan 27 17:27:53 ifu-273 mender-connect[899]: time="2021-01-27T17:27:53Z" level=info msg="Loaded configuration file: /etc/mender/mender-connect.conf"
Jan 27 17:27:53 ifu-273 mender-connect[899]: time="2021-01-27T17:27:53Z" level=warning msg="ShellCommand is empty, defaulting to /bin/sh"
Jan 27 17:27:54 ifu-273 mender-connect[899]: time="2021-01-27T17:27:54Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Any ideas?
We have tried rttys and shellhub in the past to do the same thing (remote ssh) and they have both been plagued by this sort of thing (not reliable when you most need them).
The printed warnings, while confusing, could be completely fine depending on your setup. Which image are you using (yocto based, mender-convert)? Which server are you connecting to?
@lluiscampos yep that’s correct. The connection between mender-connect and the mender server is lost, and it does not recover automatically. But if you restart the mender-connect service, it reconnects instantly.
This is the major problem we’ve had with every other solution we’ve tried before that implements similar functionality. The only thing we’ve had almost 100% success with is a VPN daemon, as these seem to have been designed from the get-go to be persistent, and naturally they have built-in watchdogs / ack packets.
Maybe it is possible to add some sort of watchdog to mender-connect, so it can figure out when it has lost its connection and reset itself?
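To make the watchdog idea concrete, here is a minimal sketch of the kind of check I mean. It is not how mender-connect works internally; `ConnectionWatchdog`, `feed` and `expired` are hypothetical names, and the 30 second timeout is just an assumed value. The idea is that every received message or keepalive reply feeds the watchdog, and if it expires the process tears the connection down and reconnects.

```python
import time


class ConnectionWatchdog:
    """Tracks the last sign of life from a connection (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        # timeout_s is an assumed tuning value, not a mender-connect setting.
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def feed(self):
        """Call whenever a message or keepalive reply arrives."""
        self._last_seen = self._clock()

    def expired(self):
        """Return True if nothing has been heard for longer than timeout_s."""
        return self._clock() - self._last_seen > self.timeout_s


if __name__ == "__main__":
    watchdog = ConnectionWatchdog(timeout_s=30)
    watchdog.feed()
    print("stale connection:", watchdog.expired())
```

A monotonic clock is used on purpose, so that a wall-clock jump (like the one when the device first syncs its time) does not trigger or mask a timeout.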
Thank you for using Mender.
mender-connect is designed to handle exactly the scenario you are describing, i.e. it waits until it can reconnect, and then resumes normal operation.
Could I ask you to enable debug output by changing the following line in the mender-connect.service file:
ExecStart=/usr/bin/mender-connect daemon
into
ExecStart=/usr/bin/mender-connect --debug daemon
and send the log?
Thank you.
Also, the mender-client process is running at all times, right? And you restart only mender-connect? Could you share the log of mender-client as well?
We absolutely still have the same problem as well on random devices. Restarting mender-connect always fixes it immediately. Luckily we have a vpn and mender-connect is just a backup in case we break the vpn config somehow.
I haven’t had a chance to get logs, and because it happens randomly, it’s hard to reproduce or to choose which devices to log.
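One way to narrow down which devices to look at would be to scan the journal output for the disconnect error. This is a rough sketch only: it assumes the `time="..." level=... msg="..."` line format seen in the logs pasted earlier in this thread, and `parse_line` and `last_disconnect` are hypothetical helper names, not part of any Mender tool.

```python
import re

# Matches the key=value part of a mender-connect journal line, as seen in this thread.
LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"\s*$'
)


def parse_line(line):
    """Return a dict with time, level and msg, or None for non-matching lines."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


def last_disconnect(lines):
    """Return the timestamp of the last 'waiting for reconnect' error, or None."""
    timestamp = None
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "error" and "waiting for reconnect" in entry["msg"]:
            timestamp = entry["time"]
    return timestamp
```

Feeding it the output of `journalctl -u mender-connect` line by line gives the time of the most recent disconnect on that device; whether the device actually recovered afterwards still has to be checked by hand.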
I appear to be having a very similar problem and I am on version 2.6.1. It is a Yocto build on Zeus. Some devices lose their connection to Mender with the error messages “The server certificate cannot be loaded: no file provided” and “messageLoop: error on readMessage: websocket: close 1011 (internal server error): read tcp 172.23.102.249:8080->172.23.121.10:35918: i/o timeout; disconnecting, waiting for reconnect.”. They will not reconnect until the mender client has been restarted.