Mender-connect randomly doesn't connect to server

Hi,

We integrated mender-connect into our product, and it is a really great tool!

Every now and then, some devices fail to connect to the server after a reboot. The mender-client connection works fine at the same time.

Log output from journalctl -u mender-connect:

Mar 15 10:57:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:57:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:57:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:57:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:38223->[::1]:53: read: connection refused; reconnecting in 60s (try 29/0); len(token)=656"
Mar 15 10:58:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:58:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:58:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:58:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:37531->[::1]:53: read: connection refused; reconnecting in 60s (try 30/0); len(token)=656"
Mar 15 10:59:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:59:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:59:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:59:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:36167->[::1]:53: read: connection refused; reconnecting in 60s (try 31/0); len(token)=656"
Mar 15 11:00:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T11:00:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
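
Each error line in the log above has the form "lookup HOST on RESOLVER", so the journal itself says which DNS resolver was asked. A minimal sketch for pulling that address out of the journal, assuming the message format stays as shown here (the function name failed_resolvers is made up for this example):

```shell
# Print the distinct resolver addresses that failed lookups were sent to.
# Reads journal lines on stdin and matches the "lookup HOST on ADDR:" text
# seen in the log above; lines without a lookup error are ignored.
failed_resolvers() {
    sed -n 's/.*lookup [^ ]* on \([^ ]*\): .*/\1/p' | sort -u
}

# Usage on the device (not executed here):
#   journalctl -u mender-connect --no-pager | failed_resolvers
```

For the lines above this would print only [::1]:53, i.e. every failed lookup went to a resolver on the loopback address.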

The problem goes away after another reboot of the device.
Any idea what’s wrong here? Could it be an issue with dependencies at startup?

Best regards
Ruben

Interesting.

Are you running a versioned mender-connect or have you built this from source?

What does your /etc/mender/mender-connect.conf file look like?

Hi

We are building our system with Yocto on the dunfell branch.
mender-connect.conf:

{
    "ReconnectIntervalSeconds": 60,
    "ServerURL": "https://xxxxxxxxx/",
    "Shell": "/bin/bash",
    "ShellCommand": "/bin/bash",
    "User": "tester"
}
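
To check whether the host in ServerURL resolves at a given moment, the host name can be read straight from this file. This is only a sketch: it uses sed rather than a real JSON parser, and the function name server_host is made up for this example.

```shell
# Print the host part of "ServerURL" from a mender-connect.conf-style file.
# sed-based sketch, not a JSON parser: it expects the key and value on one line.
server_host() {
    sed -n 's|.*"ServerURL"[[:space:]]*:[[:space:]]*"[a-z]*://\([^/":]*\).*|\1|p' "$1"
}

# Usage on the device (not executed here):
#   host=$(server_host /etc/mender/mender-connect.conf)
#   getent hosts "$host" || echo "DNS lookup for $host failed"
```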

In the meantime, we saw that the issue goes away after waiting a while (~10 min). Still, it’s a bit strange that mender-connect sometimes needs that long before it can connect to the server.

BTW: the Mender server and the mender-connect server are the same machine, with the same URL.

Hi @ruben,

Time sync must complete before mender-connect can connect successfully, because certificate verification depends on the system clock. Do you have an RTC on your board, or is there a delay at boot while systemd-timesyncd establishes the current time?
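
A rough way to tell whether the clock has been set yet is to compare it against a timestamp that is known to be in the past, for example the image build time. The sketch below assumes such a reference value is available; MIN_EPOCH and the function name clock_is_plausible are made up for this example.

```shell
# Succeed if the clock is at or after MIN_EPOCH (seconds since 1970-01-01 UTC).
# MIN_EPOCH is a hypothetical reference such as the image build time.
# The optional second argument overrides "now", which is handy for testing.
clock_is_plausible() {
    min_epoch=$1
    now=${2:-$(date +%s)}
    [ "$now" -ge "$min_epoch" ]
}

# Usage on the device (not executed here):
#   clock_is_plausible 1615766400 || echo "clock not set yet, TLS may fail"
```

On systemd-based images, timedatectl reports the synchronization state more directly; the comparison above is just a dependency-free approximation.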

Drew

I notice a similar issue intermittently on my builds as well, but with mender-client. If you look closely, mender-connect (or in my case, mender-client) is sending its DNS lookup to [::1]:53, and [::1] belongs to the “lo” (loopback) interface, so unless a resolver is listening there, the lookup is not going to reach anything. After some unknown event occurs or some time has passed, the situation fixes itself.

Sample ifconfig output that shows that “::1” is the IPv6 address for “lo”:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500  metric 1
        inet 192.168.1.110  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::2e0:70ff:fe9d:97ea  prefixlen 64  scopeid 0x20<link>
        ether 00:e0:70:9d:97:ea  txqueuelen 1000  (Ethernet)
        RX packets 8060  bytes 664136 (648.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2113  bytes 175811 (171.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536  metric 1
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 27440  bytes 2101120 (2.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 27440  bytes 2101120 (2.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
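
Since the lookups go to the loopback address, it may be worth checking which nameservers are configured while the problem is happening. As far as I know, Go's built-in resolver falls back to 127.0.0.1:53 and [::1]:53 when /etc/resolv.conf is missing or lists no nameserver; treat that as an assumption to verify on your system. A small sketch (the function name classify_nameservers is made up for this example):

```shell
# Summarise the nameserver entries of a resolv.conf-style file.
# Prints "none" when no nameserver is configured, otherwise one line per
# entry, tagged "loopback" or "remote". Defaults to /etc/resolv.conf.
classify_nameservers() {
    awk '
        $1 == "nameserver" {
            n++
            kind = ($2 ~ /^127\./ || $2 == "::1") ? "loopback" : "remote"
            print kind, $2
        }
        END { if (n == 0) print "none" }
    ' "${1:-/etc/resolv.conf}"
}

# Usage on the device (not executed here):
#   classify_nameservers
```

If this prints "none" or only loopback entries right after boot, and remote entries a few minutes later, the network manager is probably writing the resolver configuration late.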

Time sync should be OK; that was our first thought as well. But mender-client is working fine at the same time.

@hacpa this is really interesting! We are using NetworkManager. I will double-check when the routes are created and when NetworkManager is started.

It is the same on our side, though: the situation fixes itself after some time.

Hello @ruben

Thank you for using mender-connect.
Could you verify the following scenario:

  • stop both mender-client and mender-connect
  • start mender-client only
  • check that mender-client is running and the device is authorized (please make sure of that)
  • start mender-connect by hand with debug:
mender-connect --debug daemon

Can you reproduce the behaviour with the above?
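
The steps above can be sketched as a small script. It assumes both services run as systemd units named mender-client and mender-connect (the mender-connect unit name matches the journalctl call earlier in the thread; the mender-client one is an assumption). The helper names run and isolate_mender_connect are made up for this example, and the authorization check still has to be done by hand.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop both services, start only the client, then run mender-connect in debug.
isolate_mender_connect() {
    run systemctl stop mender-connect mender-client
    run systemctl start mender-client
    run systemctl is-active mender-client || return 1
    # Confirm by hand that the device is authorized before continuing.
    run mender-connect --debug daemon
}

# Usage on the device, as root (not executed here):
#   DRY_RUN=1; isolate_mender_connect    # print the commands only
#   DRY_RUN=0; isolate_mender_connect    # actually run them
```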

best regards,
peter

Hi again,

In the meantime, we have seen this problem with other components as well, e.g. when pulling Docker containers. It is always related to port 53 (DNS).
So it does not seem to be directly related to mender-connect. We will keep investigating to find out what the problem is.
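
To measure how long after boot name resolution starts working, a generic retry helper is enough. This is a sketch, not a fix: the function name retry is made up for this example, and getent is assumed to be available on the image.

```shell
# retry TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
# CMD's own output is discarded; only its exit status is used.
retry() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage on the device (not executed here): poll DNS every 2 s for up to 60 s.
#   retry 30 2 getent hosts MY_MENDER_SERVER_URL && echo "DNS is up"
```

Running this early in boot and logging the time at which it succeeds should show whether the resolver simply comes up late.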