Change in mender rootfs creation breaks sdimg for systemd boot

A change went into meta-mender in warrior-v2019.10 that altered how rootfs partitions are created (commit 30a27a9f5). It switched the wic source plugin from rootfs to rawcopy when the rootfs is placed into the sdimg file.

When using rootfs, wic will expand the filesystem to fill the entire partition. When using rawcopy, it does not.

The rootfs ext4 image has little to no free space if it is not expanded to fill the partition. Booting it fails because systemd creates enough files on first boot to fill all the space, at which point various critical units start hitting unrecoverable failures and the boot aborts.

One can get more free space by making the rootfs ext4 image itself larger, perhaps as large as the partition. But there are two problems with this. One is that the rootfs partition sizes are calculated automatically, while an exact rootfs size must be specified directly, so the two need to be kept in sync manually. The other, more serious problem is that this creates a huge ext4 image if the partition is much larger than the initial contents of the rootfs, and this huge image is much slower to flash during an update. It compresses well, so while it’s some extra data to transmit, it’s not a huge amount.

The way you describe it makes me think that some of your size configuration variables are set incorrectly. The size of the rootfs partitions and the filesystem are always the same in meta-mender, so even though rawcopy may not fill the space automatically, it will still do so in practice because meta-mender sets the size.

What size variables do you have set?

At a high level, the variables are set the same way as they were in mender 1.5.0, and the behavior has changed.

But here’s what’s relevant.

MENDER_BOOT_PART_SIZE_MB="32"
MENDER_CALC_ROOTFS_SIZE="1662976"
MENDER_DATA_PART_SIZE_MB="4096"
MENDER_STORAGE_TOTAL_SIZE_MB="7393"
IMAGE_ROOTFS_SIZE="1024"

The goal is to have an ext4 image which is only large enough to hold the files in it, which is much smaller than 1.6 GB. Only this image needs to be downloaded to devices and then flashed into a partition to update it. After it has been flashed, resize2fs can expand it to fill the partition, which is far faster than flashing a full-size image.

The sdimg also has the rootfs expanded to fill the partition.

If you set neither MENDER_CALC_ROOTFS_SIZE nor IMAGE_ROOTFS_SIZE, then both the filesystem and the partition will be the same size, and will fill whatever space is left in MENDER_STORAGE_TOTAL_SIZE_MB when the other partitions are subtracted.
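As a rough check on the numbers posted above, the arithmetic can be sketched as follows. This assumes Mender's usual layout of one boot partition, two equal rootfs partitions (A/B), and one data partition, and it ignores partition alignment and partition-table overhead, so it only gives an upper bound. The function name is hypothetical, and this is not meta-mender's actual calculation.

```python
def rootfs_upper_bound_kib(total_mb: int, boot_mb: int, data_mb: int) -> int:
    """Upper bound on the size of each rootfs partition, in KiB.

    Assumption: the space left after the boot and data partitions is split
    evenly between two rootfs partitions. Alignment and partition-table
    overhead are ignored, so the real value will be somewhat smaller.
    """
    remaining_mb = total_mb - boot_mb - data_mb
    return remaining_mb * 1024 // 2


# Values quoted earlier in the thread.
bound = rootfs_upper_bound_kib(total_mb=7393, boot_mb=32, data_mb=4096)
reported = 1662976  # MENDER_CALC_ROOTFS_SIZE, assumed to be in KiB

print(bound)             # 1671680
print(bound - reported)  # 8704 (KiB, presumably alignment and overhead)
```

The reported value sits just under the bound, which is consistent with the partition size being derived from MENDER_STORAGE_TOTAL_SIZE_MB rather than from the contents of the rootfs.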

It is not possible to also get a small ext4 file at the same time. The reason is that the filesystems must have the same checksum to work with binary delta updates. You could also argue that a filesystem of a smaller size is a different image altogether. If you want this, then I would suggest that you treat the artifact and sdimg/uefiimg builds as two different configurations and build them separately.

Since the filesystem is not read-only, how would binary delta updates work at all, even if the filesystem wasn’t resized? So far, the change in filesystem checksum has not been an issue.

I typically do something like this to achieve what you are describing:

# Explicitly set this to zero as it might sneak in depending on what we include.
IMAGE_ROOTFS_EXTRA_SPACE = "0"
#
# If we set this to zero we rely completely on IMAGE_OVERHEAD_FACTOR.
IMAGE_ROOTFS_SIZE = "0"
#
# This will make sure that we get a relative amount of extra free space on rootfs.
# 1.5 means that we will have 50 % extra free space added.
IMAGE_OVERHEAD_FACTOR = "1.5"

This way I will get “smaller” ext4 images but the partition size will still be based on what is calculated from MENDER_STORAGE_TOTAL_SIZE_MB.
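To make the interaction of these three variables concrete, here is a minimal sketch of the sizing rule described in the Yocto Project reference manual for IMAGE_ROOTFS_SIZE. The function name and the 400000 KiB example are hypothetical, rounding for IMAGE_ROOTFS_ALIGNMENT is left out, and whatever meta-mender layers on top of this is not modelled.

```python
def rootfs_image_size_kib(du_kib: int,
                          rootfs_size_kib: int = 0,
                          overhead_factor: float = 1.3,
                          extra_space_kib: int = 0) -> int:
    """Approximate ext4 image size in KiB, following the Yocto sizing rule.

    du_kib          -- disk usage of the populated rootfs directory
    rootfs_size_kib -- IMAGE_ROOTFS_SIZE (acts as a minimum)
    overhead_factor -- IMAGE_OVERHEAD_FACTOR (Yocto's default is 1.3)
    extra_space_kib -- IMAGE_ROOTFS_EXTRA_SPACE (always added on top)
    """
    base = max(int(du_kib * overhead_factor), rootfs_size_kib)
    return base + extra_space_kib


# Hypothetical rootfs with 400000 KiB of files, using the settings above.
size = rootfs_image_size_kib(du_kib=400000,
                             rootfs_size_kib=0,
                             overhead_factor=1.5,
                             extra_space_kib=0)
print(size)  # 600000
```

With IMAGE_ROOTFS_SIZE and IMAGE_ROOTFS_EXTRA_SPACE both at zero, the image size depends only on the contents and the factor, which is why the image stays small regardless of the partition size.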

This should work with read-only / delta updates as well, given that you do not try to resize the image on first boot, which there is no point in doing if it is read-only.

Unfortunately, the system was designed without a read-only root and also to install a large amount of data in the rootfs when in use.

So IMAGE_OVERHEAD_FACTOR would be something like “10.0”.

And that means flashing the rootfs needs a file that is 10x larger when uncompressed and takes 10x as long to flash, which is quite a significant regression compared to how it worked with Mender 1.5.

But I see that the new system Mender uses to remove files from the persistent data partition no longer works with the wic rootfs source. So I’ve decided to go back to rawcopy and created a systemd unit that resizes the rootfs on first boot. It’s possible to run it early in the boot sequence.

But it does not need to be 10x at build time. The intention is to set IMAGE_OVERHEAD_FACTOR to enough free space for the device to boot, which will speed up downloading the Artifact. You can still resize it on first boot.

That’s what I’ve done. The key bit is adding the “resize on first boot”. There’s nothing that does that and it’s not an option in meta-mender or poky. I had to write a custom systemd unit to make it happen, and make it happen at the right time, which is the more complex part.
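For readers who want a starting point, the resize step itself can be sketched as below. This is not the unit described in this thread, only an illustration under several assumptions: the rootfs is ext4 and can be grown while mounted, findmnt (util-linux) and resize2fs (e2fsprogs) are installed on the device, findmnt reports a real device node for /, and Python is available on the target. The same two commands could equally be called from a shell script. The function names are hypothetical.

```python
import subprocess
from typing import Callable, List


def root_device(run: Callable = subprocess.run) -> str:
    """Return the block device mounted at /, as reported by findmnt."""
    result = run(["findmnt", "-n", "-o", "SOURCE", "/"],
                 check=True, capture_output=True, text=True)
    return result.stdout.strip()


def grow_rootfs(run: Callable = subprocess.run) -> List[str]:
    """Grow the root filesystem to fill its partition.

    resize2fs without a size argument expands the filesystem to the size
    of the underlying device. The command that was run is returned so the
    caller can log it.
    """
    command = ["resize2fs", root_device(run)]
    run(command, check=True)
    return command
```

On a device this would be called as `grow_rootfs()` from a oneshot systemd unit. Running it on every boot, rather than guarding it with a marker file, also covers the first boot after each update, when the inactive partition has just been flashed with a small image. The ordering of that unit relative to the rest of the boot is the hard part mentioned above and is not shown here.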

I used a constant amount of extra space, just enough to allow the system boot to progress to the earliest point at which it’s possible to resize the rootfs. Using a factor for extra space seems less correct, since the space systemd needs to make a few symlinks and empty directories and so on isn’t based on how big the rootfs is. With a factor of 1.5, if someone adds a 1 GB dataset to the rootfs for some reason, the image doesn’t suddenly need 512 MB of extra space because of that.
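The difference between the two policies can be shown with a few lines of arithmetic. The 200 MiB base rootfs and the 64 MiB constant are made-up numbers for illustration, and the function names are hypothetical.

```python
def free_space_with_factor(contents_mib: int, factor: float) -> int:
    """Free space (MiB) when the image is sized as contents * factor."""
    return int(contents_mib * (factor - 1.0))


def free_space_constant(contents_mib: int, constant_mib: int) -> int:
    """Free space (MiB) when a fixed amount is added, whatever the contents."""
    return constant_mib


base = 200                  # hypothetical rootfs contents, MiB
with_dataset = base + 1024  # the same rootfs plus a 1 GiB dataset

for contents in (base, with_dataset):
    print(contents,
          free_space_with_factor(contents, 1.5),
          free_space_constant(contents, 64))
# 200 100 64
# 1224 612 64
```

With the factor, adding the dataset raises the free space from 100 MiB to 612 MiB, which is 512 MiB more to flash for no benefit at boot. With the constant, the free space stays at 64 MiB in both cases.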