System:
- Raspberry Pi CM3+, Raspbian, kernel 4.19.75-v7l+
- Mender client: 2.5.0 runtime: go1.14.7
- NetworkManager
Summary:
- Non-working time synchronization may cause trouble when deploying an artifact
- Wrong time may cause DNSSEC lookup errors
- DNSSEC lookup errors make Mender unable to “phone home” to hosts specified by host name (hosted.mender.io), because the DNS lookup fails
This might be a problem on some distributions when the date/time is way off on first boot. Ref. "Can't sync time when time is incorrect due to dnssec", Issue #5873 in systemd/systemd on GitHub, closed as late as 2020.06.23.
I’ve done several deployments on devices running Ubuntu without any issues (not caused by myself…). This is the first Raspbian deployment I’ve done, and currently there are some issues I haven't seen before.
I have a Mender Artifact that fails because the DNS lookup fails when the update status is reported. The deployment log looks like this:
2021-03-05 11:22:08 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
2021-03-05 11:22:08 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
2021-03-05 11:22:08 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
2021-03-05 11:22:09 +0000 UTC error: Failed to report status: Put "https://hosted.mender.io/api/devices/v1/deployments/device/deployments/xxxx/status": dial tcp: lookup hosted.mender.io: no such host
2021-03-05 11:22:09 +0000 UTC error: error reporting update status: reporting status failed: Put "https://hosted.mender.io/api/devices/v1/deployments/device/deployments/xxxx/status": dial tcp: lookup hosted.mender.io: no such host
2021-03-05 11:22:09 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put "https://hosted.mender.io/api/devices/v1/deployments/device/deployments/xxxx/status": dial tcp: lookup hosted.mender.io: no such host
The artifact was correctly downloaded, and I was able to manually boot the new root by setting mender_boot_part and mender_boot_part_hex. The problem was apparently the DNS lookup, and I was not able to ping any known hosts. This led me to look at the systemd-resolved log using journalctl. It contained a lot of DNSSEC failures:
Mar 07 03:01:48 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question mender.io IN DS: no-signature
Mar 07 03:01:48 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question hosted.mender.io IN DS: no-signature
Mar 07 03:01:48 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question hosted.mender.io IN SOA: no-signature
Mar 07 03:01:48 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question hosted.mender.io IN A: no-signature
Mar 07 03:01:51 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question ntp.org IN DS: no-signature
Mar 07 03:01:51 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question pool.ntp.org IN DS: no-signature
Mar 07 03:01:51 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question 2.debian.pool.ntp.org IN SOA: no-signature
Mar 07 03:01:51 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question 2.debian.pool.ntp.org IN A: no-signature
Mar 07 03:01:51 somecontroller systemd-resolved[22188]: DNSSEC validation failed for question 2.debian.pool.ntp.org IN DS: no-signature
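To see at a glance which names are affected, journal lines like the ones above can be reduced to a list of unique host names. This is only a minimal sketch: the function name dnssec_failed_names is made up for illustration, and it assumes the systemd-resolved log line format shown above.

```shell
# Read systemd-resolved journal lines on stdin and print each host name
# that failed DNSSEC validation, once per name.
# (dnssec_failed_names is a hypothetical helper, not part of systemd or Mender.)
dnssec_failed_names() {
    grep 'DNSSEC validation failed for question' \
        | sed -E 's/.*for question ([^ ]+) IN .*/\1/' \
        | LC_ALL=C sort -u
}

# Usage on the device:
#   journalctl -u systemd-resolved --no-pager | dnssec_failed_names
```

If the NTP pool names show up in that list next to hosted.mender.io, the device cannot resolve its time servers either, so it has no way to correct its clock on its own.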
At this point, I realized that the controller time was way off (several hours). Setting the time/date manually fixed the DNS lookups that had been failing because of DNSSEC.
To reproduce the error, set the time “wrong” and flush the systemd-resolved cache:
$ sudo date -s 01:31
$ sudo systemd-resolve --flush-caches
$ ping vg.no
ping: vg.no: Name or service not known
In my case, this should not be possible, and the problem was actually a conflict between ntpd and systemd-timesyncd. There was no need for ntpd, so I removed it with sudo apt purge ntp. As long as time sync is up and running, DNSSEC should also work correctly.
Edit:
I still don’t have a good solution to this problem with systemd-timesyncd. It seems that whenever a new Mender artifact is downloaded, the time is set to the artifact-creation (or image-creation) date. Currently, I see the following workarounds:
- Disable DNSSEC
- Add an NTP address in /etc/hosts
- Add some sort of script to set the time at boot, based on known IP addresses, an RTC, or something similar
- Add fallback NTP servers (by IP address) in /etc/systemd/timesyncd.conf
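As a rough sketch of the first and last workarounds, both settings can be written as systemd drop-in files instead of editing the main config files. The helper names, the drop-in file names, and the 192.0.2.x addresses (from the documentation-only range) are placeholders I made up; substitute NTP servers you actually control or trust. Note that FallbackNTP is only used when no other NTP servers are configured or received from the network.

```shell
# Write a systemd-timesyncd drop-in listing fallback NTP servers by IP
# address, so the first time sync does not depend on DNS.
# $1: drop-in directory, remaining arguments: server IP addresses.
# (Hypothetical helper and file name, for illustration only.)
write_fallback_ntp() {
    dir=$1
    shift
    mkdir -p "$dir"
    printf '[Time]\nFallbackNTP=%s\n' "$*" > "$dir/10-fallback-ntp.conf"
}

# Write a systemd-resolved drop-in that sets the DNSSEC mode.
# $1: drop-in directory, $2: mode ("no", "allow-downgrade" or "yes").
write_dnssec_mode() {
    mkdir -p "$1"
    printf '[Resolve]\nDNSSEC=%s\n' "$2" > "$1/10-dnssec.conf"
}

# On the device, as root (addresses are placeholders):
#   write_fallback_ntp /etc/systemd/timesyncd.conf.d 192.0.2.10 192.0.2.11
#   systemctl restart systemd-timesyncd
#   write_dnssec_mode /etc/systemd/resolved.conf.d no
#   systemctl restart systemd-resolved
```

Pinning NTP servers by IP address keeps DNSSEC validation enabled, at the cost of having to maintain the addresses. Disabling DNSSEC is simpler but removes the validation for all lookups on the device.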