Failure updating with Mender

Hello,
I have recently been experiencing failures during OTA updates.
I am using Yocto on a Jetson Nano/NX; the client version is 2.6.1 and the server is currently self-hosted.
The log on the device side is:
2021-12-01 14:38:44 +0000 UTC info: Running Mender client version: 2.6.1
2021-12-01 14:38:45 +0000 UTC info: State transition: update-fetch [Download_Enter] → update-store [Download_Enter]
2021-12-01 14:38:45 +0000 UTC info: No public key was provided for authenticating the artifact
2021-12-01 14:38:45 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2021-12-01 14:38:51 +0000 UTC info: Opening device "/dev/mmcblk0p18" for writing
2021-12-01 14:38:51 +0000 UTC info: Native sector size of block device /dev/mmcblk0p18 is 512 bytes. Mender will write in chunks of 1048576 bytes
2021-12-01 14:45:48 +0000 UTC info: Daemon terminated with SIGTERM
2021-12-01 14:46:27 +0000 UTC info: Running Mender client version: 2.6.1
2021-12-01 14:46:27 +0000 UTC error: Mender shut down in state: update-store
2021-12-01 14:46:27 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2021-12-01 14:46:27 +0000 UTC info: State transition: init [none] → cleanup [Error]
2021-12-01 14:46:27 +0000 UTC info: State transition: cleanup [Error] → update-status-report [none]
2021-12-01 14:46:27 +0000 UTC error: Failed to report status: Put "https://ota.nvidia.local.com/api/devices/v1/deployments/device/deployments/e54cb889-0f58-484d-aacc-7eb3e3320ec6/status": dial tcp: lookup ota.nvidia.local.com on [::1]:53: server misbehaving
2021-12-01 14:46:27 +0000 UTC error: error reporting update status: reporting status failed: Put "https://ota.nvidia.local.com/api/devices/v1/deployments/device/deployments/e54cb889-0f58-484d-aacc-7eb3e3320ec6/status": dial tcp: lookup ota.nvidia.local.com on [::1]:53: server misbehaving
2021-12-01 14:46:27 +0000 UTC error: Failed to send status to server: transient error: reporting status failed: Put "https://ota.nvidia.local.com/api/devices/v1/deployments/device/deployments/e54cb889-0f58-484d-aacc-7eb3e3320ec6/status": dial tcp: lookup ota.nvidia.local.com on [::1]:53: server misbehaving
2021-12-01 14:46:27 +0000 UTC info: State transition: update-status-report [none] → update-retry-report [none]
2021-12-01 14:51:26 +0000 UTC info: State transition: update-retry-report [none] → update-status-report [none]

My main question is about the state transitions performed by Mender.
I see update-fetch transitions to update-store. Does this mean the download has finished?
Seven minutes after that I see the daemon was killed; is this expected?
Then I see the daemon was started again and moved on to reporting.

Can you please explain what the process was here?

Thanks

The killing and starting again happened because your device rebooted. Afterwards, Mender attempts to report from the new partition. From the looks of it, the network settings may be broken here: port 53 indicates something is wrong with DNS.

If you have the opportunity, try to log into the device after the reboot and take a look at the network settings.
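
Since the client itself is written in Go, a rough sketch like the one below (purely illustrative, not anything shipped with Mender, and it assumes you can drop a small cross-compiled Go binary onto the device) would show which interfaces are up and which nameserver the device is actually configured to use:

```go
// Illustrative check only (hypothetical helper, not part of Mender):
// print the interfaces and their addresses, then the resolver config.
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		fmt.Println("listing interfaces failed:", err)
		return
	}
	for _, ifc := range ifaces {
		addrs, _ := ifc.Addrs()
		fmt.Printf("%-10s up=%-5v addrs=%v\n", ifc.Name, ifc.Flags&net.FlagUp != 0, addrs)
	}

	// The "[::1]:53" in the error suggests the resolver configuration
	// points at localhost, so this file is worth printing.
	conf, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Println("could not read /etc/resolv.conf:", err)
		return
	}
	fmt.Printf("/etc/resolv.conf:\n%s", conf)
}
```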

Hi @kacf,
I was able to capture a pcap log while this happened, and it doesn't seem like a DNS error.
From what I see, the device was able to establish a TLS connection with the server; I see data sent and received (encrypted) and then the connection closed.
I also incorporated a DNS check in the reboot leave script, which shows success just before Mender reports the failure above.
The only thing I can think of is the server replying badly; I probably need to check the server log (version 2.6).

Thanks

For the self-hosted server, are you running with docker-compose or Kubernetes?

In golang:

dial tcp: lookup ota.nvidia.local.com on [::1]:53: server misbehaving

seems to indicate it cannot resolve the domain name.
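
To illustrate, here is a minimal standalone sketch (not part of Mender; the hostname is simply taken from the log above, and port 443 is assumed for the HTTPS endpoint) that exercises the same lookup-then-dial path the client goes through when reporting status:

```go
// Minimal sketch: resolve the server name the way the Go runtime does,
// then attempt the TCP dial that would follow.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := net.DefaultResolver.LookupHost(ctx, "ota.nvidia.local.com")
	if err != nil {
		// This is where "... on [::1]:53: server misbehaving" would show up.
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved to:", addrs)

	conn, err := net.DialTimeout("tcp", net.JoinHostPort(addrs[0], "443"), 5*time.Second)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	conn.Close()
	fmt.Println("TCP connect succeeded")
}
```

If the lookup step already fails with the same "server misbehaving" message, the failure is happening in the resolver the device is configured to use, before any traffic reaches the Mender server.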

This is a self-hosted server, docker-compose based.
I understand this looks like a DNS issue, but it's definitely not…
I can clearly see from the pcap log that the device has reached the server. I also know the device waits for DNS resolution during the reboot leave, and the log shows the name has been resolved successfully.
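
If it helps narrow it down, I can also compare what the resolver on [::1]:53 returns against another nameserver. Roughly something like this (a quick sketch only; the second nameserver address is just a placeholder for whatever the reboot leave check uses):

```go
// Quick comparison sketch: query two specific nameservers for the same
// name. The second address is a placeholder, not a real recommendation.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookupVia resolves host by querying one specific nameserver.
func lookupVia(server, host string) ([]string, error) {
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return r.LookupHost(ctx, host)
}

func main() {
	host := "ota.nvidia.local.com" // hostname from the log above
	for _, srv := range []string{"[::1]:53", "192.168.1.1:53"} {
		addrs, err := lookupVia(srv, host)
		fmt.Printf("via %-15s addrs=%v err=%v\n", srv, addrs, err)
	}
}
```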