Mender-connect randomly doesn't connect to server

Hi,

We integrated mender-connect into our product, and it is a really great tool!

Occasionally, devices fail to connect to the server after a reboot. The mender-client connection works fine.

Log output from journalctl -u mender-connect:

Mar 15 10:57:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:57:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:57:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:57:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:38223->[::1]:53: read: connection refused; reconnecting in 60s (try 29/0); len(token)=656"
Mar 15 10:58:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:58:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:58:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:58:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:37531->[::1]:53: read: connection refused; reconnecting in 60s (try 30/0); len(token)=656"
Mar 15 10:59:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:59:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:59:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:59:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:36167->[::1]:53: read: connection refused; reconnecting in 60s (try 31/0); len(token)=656"
Mar 15 11:00:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T11:00:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"

The problem is gone after another reboot of the device.
Any idea what’s wrong here? Perhaps an issue with dependencies at startup?

Best regards
Ruben

Interesting.

Are you running a versioned mender-connect or have you built this from source?

What does your /etc/mender/mender-connect.conf file look like?

Hi

We are building our system with Yocto on the dunfell branch.
mender-connect.conf:

{
    "ReconnectIntervalSeconds": 60,
    "ServerURL": "https://xxxxxxxxx/",
    "Shell": "/bin/bash",
    "ShellCommand": "/bin/bash",
    "User": "tester"
}

In the meantime, however, we saw that the issue goes away after waiting some time (~10 min). Still, it is a bit strange that it sometimes takes this long until mender-connect is able to connect to the server.

BTW: the Mender server and the mender-connect server are the same machine, with the same URL.

Hi @ruben,

Time sync must complete before mender-connect can connect successfully, because certificate verification depends on the current time. Do you have an RTC on your board, or is there a delay at boot while systemd-timesyncd establishes the current time?
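
To check whether the clock was already synchronized when the failure happens, the `NTPSynchronized` property reported by `timedatectl show` can be read. A minimal Python sketch, assuming a systemd-based image where `timedatectl` is installed; the helper names are hypothetical and not part of Mender:

```python
import subprocess


def parse_ntp_synchronized(show_output: str) -> bool:
    """Return True if `timedatectl show` output reports NTPSynchronized=yes."""
    for line in show_output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "NTPSynchronized":
            return value.strip() == "yes"
    return False  # property missing: treat the clock as not synchronized


def clock_is_synchronized() -> bool:
    """Ask systemd whether the clock is synchronized (assumes timedatectl exists)."""
    result = subprocess.run(
        ["timedatectl", "show", "--property=NTPSynchronized"],
        capture_output=True, text=True, check=True,
    )
    return parse_ntp_synchronized(result.stdout)
```

Calling `clock_is_synchronized()` right after boot, while mender-connect is still failing, would show whether time sync is a plausible cause on a given device.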

Drew

I see a similar issue intermittently on my builds as well, but with mender-client. If you look closely, mender-connect (or in my case, mender-client) is sending its DNS lookup to [::1]:53, which is on the “lo” (loopback) interface, so unless a resolver is listening there, the name cannot be resolved and the connection fails. After an unknown event occurs or some time has passed, the situation fixes itself.

Sample ifconfig output that shows that “::1” is the IPv6 address for “lo”:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500  metric 1
        inet 192.168.1.110  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::2e0:70ff:fe9d:97ea  prefixlen 64  scopeid 0x20<link>
        ether 00:e0:70:9d:97:ea  txqueuelen 1000  (Ethernet)
        RX packets 8060  bytes 664136 (648.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2113  bytes 175811 (171.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536  metric 1
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 27440  bytes 2101120 (2.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 27440  bytes 2101120 (2.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
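
The lookup part of the error message can be pulled apart to see where the DNS query was sent. A rough Python sketch, assuming the log line has the `lookup <host> on <address>:<port>` form shown in the journal output earlier in this thread; the function names are made up for illustration:

```python
import ipaddress
import re

# Matches the "lookup <host> on <resolver>:<port>" part of the error text.
LOOKUP_RE = re.compile(r"lookup (\S+) on (\[[^\]]+\]|[^:\s]+):(\d+)")


def resolver_from_log(line: str):
    """Return (hostname, resolver_ip, port) from a log line, or None if absent."""
    match = LOOKUP_RE.search(line)
    if match is None:
        return None
    host, resolver, port = match.groups()
    return host, resolver.strip("[]"), int(port)


def resolver_is_loopback(line: str) -> bool:
    """True when the DNS query in the log line went to a loopback address."""
    parsed = resolver_from_log(line)
    return parsed is not None and ipaddress.ip_address(parsed[1]).is_loopback


sample = (
    "dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: "
    "read udp [::1]:38223->[::1]:53: read: connection refused"
)
print(resolver_from_log(sample))
print(resolver_is_loopback(sample))
```

If the resolver address is a loopback one, it is worth checking what /etc/resolv.conf contains at that moment and whether a local DNS service is expected to be running.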

Timesync should be OK; this was also our first thought, but mender-client is working fine at the same time.

@hacpa this is really interesting! We are using NetworkManager. I will double-check when the routes are created and when NetworkManager is started.

It is the same on our side, though: the situation resolves itself after some time.
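
To measure how long after boot name resolution starts working, a small retry loop can help. This is only a sketch in Python, assuming Python is available on the device; `wait_for_dns` and its parameters are hypothetical, and the hostname is whatever the ServerURL in mender-connect.conf points to:

```python
import socket
import time


def wait_for_dns(hostname, attempts=30, delay=2.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Retry name resolution until it succeeds; return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            resolve(hostname, 443)
            return True
        except OSError:
            # socket.gaierror is a subclass of OSError
            if attempt < attempts:
                sleep(delay)
    return False
```

The `resolve` and `sleep` arguments exist only so the loop can be exercised without a network; on a device the defaults are enough.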

Hello @ruben

Thank you for using mender-connect.
Could you verify the following scenario:

  • stop both mender-client and mender-connect
  • start mender-client only
  • check that mender-client is running and the device is authorized (please be sure of that)
  • start mender-connect by hand with debug:
mender-connect --debug daemon

Can you reproduce the behaviour with the above?
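
The steps above could be scripted roughly as follows. This is a Python sketch under assumptions: the systemd unit names `mender-client` and `mender-connect` are guesses that may differ on a given image, and the authorization check still has to be done by hand on the server side:

```python
import subprocess

# Assumed systemd unit names; adjust them to match the image.
CLIENT_UNIT = "mender-client"
CONNECT_UNIT = "mender-connect"


def reproduction_plan():
    """Return the commands for the manual test, in order."""
    return [
        ["systemctl", "stop", CLIENT_UNIT, CONNECT_UNIT],
        ["systemctl", "start", CLIENT_UNIT],
        ["systemctl", "is-active", CLIENT_UNIT],
        ["mender-connect", "--debug", "daemon"],
    ]


def run_plan(dry_run=True):
    """Print each command; also execute it unless dry_run is set."""
    for command in reproduction_plan():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

With `dry_run=True` the commands are only printed. With `dry_run=False` the last command runs mender-connect in the foreground, so the debug output stays on the terminal until it is interrupted.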

best regards,
peter