Read-only file system / issues with Overlayfs

Hello,
We are currently testing Mender’s capabilities. Our goal is to manage Ubuntu server-based images and deploy system, application, and container updates.
For this, we also aim to use a read-only file system.

I discovered that one of the easiest approaches is to use an overlay. Using overlayroot in tmpfs mode allows any changes to the file system to be discarded upon the next boot.

However, this method seems to be incompatible with how Mender operates. Deploying an OS update results in the following failure:

2024-09-11 13:16:21.578 +0000 UTC info: Running Mender client 4.0.2
2024-09-11 13:16:21.578 +0000 UTC info: Deployment with ID <ID> started.
2024-09-11 13:16:21.579 +0000 UTC info: Sending status update to server
2024-09-11 13:16:21.967 +0000 UTC info: Installing artifact...
2024-09-11 13:16:22.135 +0000 UTC info: Update Module output (stderr): Mounted root does not match boot loader environment (/dev/nvme0n1p2)!
2024-09-11 13:16:22.135 +0000 UTC error: Process returned non-zero exit status: Download: Update Module returned non-zero status: Process exited with status 1
2024-09-11 13:16:22.146 +0000 UTC info: Sending status update to server

This is due to check_device_matches_root() found in:

Since there is an overlay, / is not mounted directly from the expected rootfsa or rootfsb partition, and appears with a different ID.
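To illustrate the mismatch, here is a minimal Python sketch (not the actual Update Module, which is a shell script) that resolves the block device behind / by looking through an overlay mount. The function names are hypothetical, and the sample mount table assumes overlayroot exposes the original partition at /media/root-ro, which may differ on your system:

```python
def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (device, mountpoint, fstype, options) tuples."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[0], fields[1], fields[2], fields[3].split(",")))
    return entries


def backing_root_device(mounts_text):
    """Return the device behind /, following an overlay's lowerdir if there is one."""
    entries = parse_mounts(mounts_text)
    roots = [e for e in entries if e[1] == "/"]
    if not roots:
        raise LookupError("no mount entry for /")
    # Later mounts shadow earlier ones, so the last entry for / is the visible one.
    device, _, fstype, options = roots[-1]
    if fstype != "overlay":
        return device
    lower = [o.split("=", 1)[1] for o in options if o.startswith("lowerdir=")]
    if not lower:
        raise LookupError("overlay root has no lowerdir option")
    # Assumption: a single lower layer, as in the sample below.
    lowerdir = lower[0].split(":")[0]
    for dev, mountpoint, _, _ in entries:
        if mountpoint == lowerdir:
            return dev
    raise LookupError("no mount entry for lowerdir " + lowerdir)


# Sample mount table (assumed layout, for illustration only).
SAMPLE_OVERLAY = """\
/dev/nvme0n1p2 /media/root-ro ext4 ro,relatime 0 0
tmpfs-root /media/root-rw tmpfs rw,relatime 0 0
overlayroot / overlay rw,relatime,lowerdir=/media/root-ro,upperdir=/media/root-rw/overlay,workdir=/media/root-rw/overlay-workdir/_ 0 0
"""
```

On a real device the input would be the contents of /proc/mounts. A check that compares this resolved device, rather than whatever is reported for / itself, against the boot loader environment would be one way to handle the overlay case.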

Here are some potential solutions I can think of:

  1. Disabling overlayroot when performing updates, but I’m unsure how to implement this.
  2. Creating a rootfs-overlay that replaces the existing rootfs-image script with a version whose checks can handle the overlayfs case.

Is there any better option?
What are Mender’s recommendations for read-only file systems? This seems to be a requirement for delta updates, but I couldn’t find additional documentation on this.

Thank you.

Hi @Remicorp,

Thanks for reaching out! Yes, having a read-only root filesystem is a hard requirement for having delta updates on it. The best practices here vary a bit, as it depends heavily on the use case.

My personal take would be to take a step back and think about what problem you want to solve with the overlayfs. Which writes or modifications is it supposed to catch? What processes cause those? Are those really necessary for the operation? If you want them to be discarded upon reboot anyway, my (possibly wrong) guess is that the overlayfs is more of a band-aid in this situation, used to avoid doing this probably tedious work.

On how to deal with this, both of your approaches are perfectly valid, but they are just variations of the same theme: “adjust the rootfs-image Update Module to your use case”. You could disable and re-enable the overlay in it, or you can adjust the partition check, whatever works better for you.

Greets
Josef

Hi @TheYoctoJester,
Thank you for your reply.

Well, the goal is to have a very robust system that doesn’t gradually change over time, so that we don’t end up with a fleet of devices in diverging states after making some container updates and only a few system ones along the way. There will be no physical access, so if anything goes wrong, a reboot should restore the original state and bring the device back to a working condition.

  • Which writes or modifications is it supposed to catch? As many as possible.
  • What processes cause those? It’s hard to predict. The system needs to be able to recover from a buggy dependency, a corruption caused by a power outage, the disk getting full due to too many logs, …

We have not considered custom Yocto builds so far due to our very limited knowledge and time constraints. We could use something like NixOS or Fedora IoT, but they are not officially supported and require manual work. Another constraint is that our hardware might vary from device to device (same architecture, but drivers and external devices could change).
This is why we were testing some Debian-based solutions. However, it is hard to find out which solutions exist and which are best suited to achieve such a read-only system.

Now, what exactly do you mean by saying “having a read-only root filesystem is a hard requirement”? Does it imply that basically 100% of the rootfs must match the original artifact, allowing delta updates to create a diff from the new artifact and update only the changed files?
Or is making /bin, /lib and /usr RO enough, while keeping /etc, /home and /var RW, which appears to be a Debian requirement?