Hi, I’m trying to find a list of the integrity checks made before the ArtifactCommit phase.
In particular, I’d like to know whether the device checks in with the server, and whether the update would fail if the server is not reachable in this phase.
I know it’s a weird situation, but it could happen if, for example, the device has no access to a DNS server and has the server’s IP explicitly set in /etc/hosts, but that entry is not present in the update, so after the reboot the device is unable to resolve the server’s address.
The scenario you are describing could happen. But if you are using A/B partitioning, the system should be able to roll back to the previous state, where the connection was successful.
Usually the Mender client verifies the checksum of the downloaded update, and if, after rebooting, the system is running from the expected partition, it considers the update successful. The checks can differ depending on which update module you use or whether you are performing a full system update.
I recommend taking a look at the State scripts section of the Mender documentation. For example, I would say that connecting to (or pinging) a particular server could be a custom task that a state script runs as part of the ArtifactCommit state.
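To make that idea a bit more concrete, here is a minimal sketch of such a check, assuming Python is available on the device and that the script would be installed as a state script (the file name ArtifactCommit_Enter_10_check-server, the host name and the port are hypothetical placeholders, not values from your setup). It only tries to open a TCP connection and maps the result to an exit code, on the assumption that a non-zero exit code makes the state fail.

```python
#!/usr/bin/env python3
"""Sketch of a connectivity check for a state script.

Hypothetical file name: ArtifactCommit_Enter_10_check-server
"""
import socket
import sys

# Assumptions for illustration only; replace with your own server details.
SERVER_HOST = "mender.example.com"
SERVER_PORT = 443
TIMEOUT_SECONDS = 10.0


def server_reachable(host, port, timeout):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections and timeouts.
        return False


def main(host=SERVER_HOST, port=SERVER_PORT, timeout=TIMEOUT_SECONDS,
         check=server_reachable):
    """Return 0 if the server answers, 1 otherwise."""
    if check(host, port, timeout):
        return 0
    print(f"cannot reach {host}:{port}", file=sys.stderr)
    return 1

# When installed as a state script, finish the file with:
#     sys.exit(main())
```

Because the check resolves the host name first, it would also catch the missing /etc/hosts entry described above, since the lookup fails before any connection is attempted.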
Please let me know if anything is not clear enough or you need more information.
I think I failed to explain my goal clearly.
I would like the update (with A/B partitioning) to succeed even without connectivity to the server after the reboot.
A connection failure to the Mender server is considered a rollback event. This is because we want to guarantee that the device can always connect to the server to receive another update, which avoids bricking devices. If your system may have a gap in time where connectivity is unavailable, then you will need to use a Sync_Enter state script to delay the connection attempt until the network is online. The server will eventually time out, though, and mark the deployment as failed, so you will need to arrange for the device to eventually connect. @kacf do you know what the timeout is in this case?
I think you actually meant an ArtifactReboot_Leave script, no? In any case, if we assume that the kernel returns failure immediately (because DNS is down, for example), then I think the timeout is roughly 3 * RetryPollIntervalSeconds, plus a little bit extra since it also tries lower intervals first.
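As a rough illustration of both points, here is a sketch assuming Python is available on the device. The first helper only reproduces the estimate above (it ignores the "little bit extra" from the shorter initial intervals, so treat it as a lower bound), and the second is a generic wait loop that an ArtifactReboot_Leave script could use to hold off until the network is up. All names, the host and the maximum wait are hypothetical; the script's own maximum wait is separate from the client's retry window.

```python
#!/usr/bin/env python3
"""Sketch: wait for the network before letting the update continue."""
import socket
import time


def rough_retry_window_seconds(retry_poll_interval_seconds):
    """Estimate from the thread: about 3 * RetryPollIntervalSeconds."""
    return 3 * retry_poll_interval_seconds


def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(probe, max_wait_seconds, probe_interval_seconds=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or max_wait_seconds has passed."""
    deadline = clock() + max_wait_seconds
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(probe_interval_seconds)

# Hypothetical use at the end of an ArtifactReboot_Leave script:
#     ok = wait_until(lambda: tcp_probe("mender.example.com", 443), 600)
#     sys.exit(0 if ok else 1)   # needs "import sys"
```

For example, with RetryPollIntervalSeconds set to 300, the estimate works out to about 900 seconds, plus the extra time spent on the shorter first attempts.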