ruben
March 16, 2021, 9:45am
Hi,
We integrated mender-connect into our product, and it is a really great tool!
Occasionally, devices fail to connect to the server after a reboot, while the mender-client connection works fine.
Log output from journalctl -u mender-connect:
Mar 15 10:57:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:57:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:57:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:57:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:38223->[::1]:53: read: connection refused; reconnecting in 60s (try 29/0); len(token)=656"
Mar 15 10:58:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:58:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:58:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:58:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:37531->[::1]:53: read: connection refused; reconnecting in 60s (try 30/0); len(token)=656"
Mar 15 10:59:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:59:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
Mar 15 10:59:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T10:59:09Z" level=error msg="connection manager failed to connect to https://MY_MENDER_SERVER_URL/api/devices/v1/deviceconnect/connect: dial tcp: lookup MY_MENDER_SERVER_URL on [::1]:53: read udp [::1]:36167->[::1]:53: read: connection refused; reconnecting in 60s (try 31/0); len(token)=656"
Mar 15 11:00:09 SOCP2001-20011115 mender-connect[355]: time="2021-03-15T11:00:09Z" level=warning msg="The server certificate cannot be loaded: no file provided"
The problem goes away after another reboot of the device.
Any idea what's wrong here? Perhaps an issue with service dependencies at startup?
Best regards
Ruben
Interesting.
Are you running a released version of mender-connect, or have you built it from source?
What does your /etc/mender/mender-connect.conf file look like?
ruben
March 22, 2021, 7:52am
Hi
We are building our system with Yocto on the dunfell branch.
mender-connect.conf:
{
"ReconnectIntervalSeconds": 60,
"ServerURL": "https://xxxxxxxxx/",
"Shell": "/bin/bash",
"ShellCommand": "/bin/bash",
"User": "tester"
}
But in the meantime we saw that after waiting some time (~10 min) the issue goes away. Still, it's a little strange that it sometimes takes a while until mender-connect is able to connect to the server.
BTW: the Mender server and the mender-connect server are the same machine, with the same URL.
Hi @ruben ,
Time sync must be completed before mender-connect can connect successfully, because of certificate verification. Do you have an RTC on your board, or is there a delay at boot while systemd-timesyncd establishes the current time?
Drew
hacpa
March 23, 2021, 12:38am
I see a similar issue intermittently on my builds as well, but with mender-client. If you look closely, mender-connect (or in my case, mender-client) is sending its DNS lookup to [::1]:53, and [::1] is the "lo" (loopback) interface, so unless a DNS server is listening there, it is not going to reach anything. After some unknown event, or after some time has passed, the situation fixes itself.
Sample ifconfig output showing that "::1" is the IPv6 address of "lo":
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 metric 1
inet 192.168.1.110 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::2e0:70ff:fe9d:97ea prefixlen 64 scopeid 0x20<link>
ether 00:e0:70:9d:97:ea txqueuelen 1000 (Ethernet)
RX packets 8060 bytes 664136 (648.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2113 bytes 175811 (171.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 metric 1
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 27440 bytes 2101120 (2.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 27440 bytes 2101120 (2.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
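A plausible explanation for the `[::1]:53` address, offered as an assumption rather than something confirmed in this thread: Go's built-in resolver (mender-connect is written in Go, though whether a given build uses the pure-Go or cgo resolver is not known here) falls back to `127.0.0.1:53` and `[::1]:53` when `/etc/resolv.conf` is missing or lists no `nameserver` entry, for example before the network manager has written it. The simplified sketch below mimics that fallback; `nameservers` is a hypothetical helper that ignores options, search domains, the three-server limit, and zoned IPv6 addresses.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// defaultNS mirrors the fallback list Go's resolver uses on Unix when
// resolv.conf yields no usable nameserver.
var defaultNS = []string{"127.0.0.1:53", "[::1]:53"}

// nameservers (hypothetical, simplified) returns the host:port pairs that
// would be queried for the given resolv.conf contents.
func nameservers(resolvConf string) []string {
	var servers []string
	sc := bufio.NewScanner(strings.NewReader(resolvConf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines, comments, and non-"nameserver" directives.
		if len(fields) < 2 || fields[0] != "nameserver" {
			continue
		}
		if ip := net.ParseIP(fields[1]); ip != nil {
			// JoinHostPort adds brackets for IPv6, e.g. "[::1]:53".
			servers = append(servers, net.JoinHostPort(ip.String(), "53"))
		}
	}
	if len(servers) == 0 {
		return defaultNS
	}
	return servers
}

func main() {
	// Empty resolv.conf, e.g. before the network manager has written it.
	fmt.Println(nameservers("")) // [127.0.0.1:53 [::1]:53]
	fmt.Println(nameservers("# generated\nnameserver 192.168.1.1\n")) // [192.168.1.1:53]
}
```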
ruben
March 23, 2021, 10:40am
Time sync should be OK; that was also our first thought. But mender-client works fine at the same time.
ruben
March 23, 2021, 10:42am
@hacpa this is really interesting! We are using NetworkManager. I will double-check when the routes are created and when NetworkManager is started.
But it is the same on our side: the situation resolves itself after some time.
peter
March 23, 2021, 2:15pm
Hello @ruben
Thank you for using mender-connect.
Could you verify the following scenario:
1. Stop both mender-client and mender-connect.
2. Start mender-client only.
3. Check that mender-client is running and that the device is authorized (please make sure of that).
4. Start mender-connect by hand with debug logging:
   mender-connect --debug daemon
Can you reproduce the behaviour with the above?
best regards,
peter
ruben
April 30, 2021, 7:42am
Hi again,
In the meantime we have seen this problem with other components as well, e.g. when downloading Docker containers. But it is always related to port 53 (DNS).
It seems not to be directly related to mender-connect; we will investigate the cause anyway.