Failure updating with Mender

Hello,
I have recently been experiencing failures during OTA updates.
I am using Yocto on a Jetson Nano/NX; the client version is 2.6.1 and the server is currently self-hosted.
The log on the device side is:
2021-12-01 14:38:44 +0000 UTC info: Running Mender client version: 2.6.1
2021-12-01 14:38:45 +0000 UTC info: State transition: update-fetch [Download_Enter] → update-store [Download_Enter]
2021-12-01 14:38:45 +0000 UTC info: No public key was provided for authenticating the artifact
2021-12-01 14:38:45 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2021-12-01 14:38:51 +0000 UTC info: Opening device "/dev/mmcblk0p18" for writing
2021-12-01 14:38:51 +0000 UTC info: Native sector size of block device /dev/mmcblk0p18 is 512 bytes. Mender will write in chunks of 1048576 bytes
2021-12-01 14:45:48 +0000 UTC info: Daemon terminated with SIGTERM
2021-12-01 14:46:27 +0000 UTC info: Running Mender client version: 2.6.1
2021-12-01 14:46:27 +0000 UTC error: Mender shut down in state: update-store
2021-12-01 14:46:27 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2021-12-01 14:46:27 +0000 UTC info: State transition: init [none] → cleanup [Error]
2021-12-01 14:46:27 +0000 UTC info: State transition: cleanup [Error] → update-status-report [none]
2021-12-01 14:46:27 +0000 UTC error: Failed to report status: Put "https://ota.nvidia.local.com/api/devices/v1/deployments/device/deployments/e54cb889-0f58-484d-aacc-7eb3e3320ec6/status": dial tcp: lookup ota.nvidia.local.com on [::1]:53: server misbehaving
2021-12-01 14:46:27 +0000 UTC error: error reporting update status: reporting status failed: Put "https://ota.nvidia.local.com/api/devices/v1/deployments/device/deployments/e54cb889-0f58-484d-aacc-7eb3e3320ec6/status": dial tcp: lookup ota.nvidia.local.com on [::1]:53: server misbehaving
2021-12-01 14:46:27 +0000 UTC error: Failed to send status to server: transient error: reporting status failed: Put "https://ota.nvidia.local.com/api/devices/v1/deployments/device/deployments/e54cb889-0f58-484d-aacc-7eb3e3320ec6/status": dial tcp: lookup ota.nvidia.local.com on [::1]:53: server misbehaving
2021-12-01 14:46:27 +0000 UTC info: State transition: update-status-report [none] → update-retry-report [none]
2021-12-01 14:51:26 +0000 UTC info: State transition: update-retry-report [none] → update-status-report [none]

My main question is about the state transitions performed by Mender.
I see update-fetch transitions to update-store. Does this mean the download has finished?
Seven minutes after that I see the daemon was killed; is this expected?
Then I see the daemon was started again and moved on to reporting.

Can you please explain what the process was here?

Thanks

The killing and starting again happened because your device rebooted. Afterwards, Mender attempts to report from the new partition. From the looks of it, the network settings may be broken here: port 53 indicates something is wrong with DNS.

If you have the opportunity, try to log into the device after the reboot and take a look at the network settings.
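
Since the client itself is written in Go, a rough sketch like the one below (purely illustrative, not anything shipped with Mender, and it assumes you can drop a small cross-compiled Go binary onto the device) would show which interfaces are up and which nameserver the device is actually configured to use:

```go
// Illustrative check only (hypothetical helper, not part of Mender):
// print the interfaces and their addresses, then the resolver config.
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		fmt.Println("listing interfaces failed:", err)
		return
	}
	for _, ifc := range ifaces {
		addrs, _ := ifc.Addrs()
		fmt.Printf("%-10s up=%-5v addrs=%v\n", ifc.Name, ifc.Flags&net.FlagUp != 0, addrs)
	}

	// The "[::1]:53" in the error suggests the resolver configuration
	// points at localhost, so this file is worth printing.
	conf, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Println("could not read /etc/resolv.conf:", err)
		return
	}
	fmt.Printf("/etc/resolv.conf:\n%s", conf)
}
```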

Hi @kacf,
I was able to capture a pcap log while this happened, and it doesn't seem like a DNS error.
From what I see, the device was able to establish a TLS connection with the server; I see data sent and received (encrypted) and then the connection closed.
I also incorporated a DNS check in the reboot leave script, which shows success just before Mender reports the failure above.
The only thing I can think of is the server replying badly; I probably need to check the server log (version 2.6).

Thanks

For the self-hosted server, are you running with docker-compose or Kubernetes?

In golang:

dial tcp: lookup ota.nvidia.local.com on [::1]:53: server misbehaving

seems to indicate it cannot resolve the domain name.
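
To illustrate, here is a minimal standalone sketch (not part of Mender; the hostname is simply taken from the log above, and port 443 is assumed for the HTTPS endpoint) that exercises the same lookup-then-dial path the client goes through when reporting status:

```go
// Minimal sketch: resolve the server name the way the Go runtime does,
// then attempt the TCP dial that would follow.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := net.DefaultResolver.LookupHost(ctx, "ota.nvidia.local.com")
	if err != nil {
		// This is where "... on [::1]:53: server misbehaving" would show up.
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved to:", addrs)

	conn, err := net.DialTimeout("tcp", net.JoinHostPort(addrs[0], "443"), 5*time.Second)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	conn.Close()
	fmt.Println("TCP connect succeeded")
}
```

If the lookup step already fails with the same "server misbehaving" message, the failure is happening in the resolver the device is configured to use, before any traffic reaches the Mender server.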

This is a self-hosted server, docker-compose based.
I understand this looks like a DNS issue, but it's definitely not…
I can clearly see from the pcap log that the device has reached the server. I also know the device waits for DNS resolution during the reboot leave, and the log shows the name has been resolved successfully.
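
If it helps narrow it down, I can also compare what the resolver on [::1]:53 returns against another nameserver. Roughly something like this (a quick sketch only; the second nameserver address is just a placeholder for whatever the reboot leave check uses):

```go
// Quick comparison sketch: query two specific nameservers for the same
// name. The second address is a placeholder, not a real recommendation.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookupVia resolves host by querying one specific nameserver.
func lookupVia(server, host string) ([]string, error) {
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return r.LookupHost(ctx, host)
}

func main() {
	host := "ota.nvidia.local.com" // hostname from the log above
	for _, srv := range []string{"[::1]:53", "192.168.1.1:53"} {
		addrs, err := lookupVia(srv, host)
		fmt.Printf("via %-15s addrs=%v err=%v\n", srv, addrs, err)
	}
}
```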