Recovery when data partition is corrupted


This is just speculation, and I would like to get some feedback if anybody has experience in this regard. Our Yocto image has boot, rootA, rootB and data partitions; the data partition is used as storage for configuration and also some Mender data. Does anybody have experience with what happens if the data partition gets corrupted? In our build the rootfs is read-only, so writes can only happen on the data partition. In my opinion, since the data partition is mounted by systemd (from fstab), if it cannot be mounted because it is corrupted, the device boots into emergency mode and stays there forever. We were thinking about taking some action in case this happens. I’m about to try to simulate it, but maybe I can get some insights first ;). Thanks
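For anyone who wants to check their own image for this, here is a rough Python sketch, not a tested tool. It looks up the fstab entry for a mount point and reports whether a failed mount would stop the boot. It relies on the systemd behaviour that an fstab mount without `nofail` (or `noauto`) is required by local-fs.target, so a failure ends in emergency mode. The device name and the `/data` mount point in the sample are assumptions for illustration only.

```python
def mount_options(fstab_text, mount_point):
    """Return the option list for mount_point, or None if there is no entry."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


def blocks_boot_on_failure(fstab_text, mount_point="/data"):
    """True if a failed mount of mount_point would fail local-fs.target."""
    options = mount_options(fstab_text, mount_point)
    if options is None:
        return False  # no fstab entry, so systemd does not wait for it
    return "nofail" not in options and "noauto" not in options


# Hypothetical fstab lines; the device name is made up for the example.
STRICT = "/dev/mmcblk0p4 /data ext4 defaults 0 2\n"
LENIENT = "/dev/mmcblk0p4 /data ext4 defaults,nofail 0 2\n"
```

On a device you would pass in the contents of `/etc/fstab`. Note that `nofail` only lets the boot continue without the mount; something still has to repair or recreate the filesystem afterwards.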

I have not had this happen to me, but if you want to minimize the risk from a corrupted data partition, you could make two of them: put Mender’s data on its own partition and all other app data on the other. Then at least Mender should keep functioning if an app breaks its partition, and you can use state scripts to fix the broken data partition.

If Mender’s data partition becomes corrupted, well then you are probably out of luck. But the same can be said about the Mender binary itself: The state data and the binary are the heart and brain of Mender, and they need to function.

OK, thanks for the hint. I’ll probably go this way: move the Mender-related stuff to a separate partition and keep the data partition for application purposes only.

I got to thinking about this when I got a test image to drop into emergency mode; I didn’t know whether it was better to start a new thread or revive this one instead…

What about keeping a template of /data inside the system partition? As for the device private key, I suppose a backup could be stored as a bootloader variable.

Additionally, the data partition could be dropped entirely from the sdimg, as you now have the logic and files to seed a data partition baked into the system partition already.
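To make the template idea a bit more concrete, here is a minimal Python sketch of the seeding step, assuming the data partition has already been recreated and mounted. The function name and the paths are hypothetical, and this is not how Mender itself does anything. It copies whatever is missing from the template into the data directory and never overwrites existing files, so it is safe to run on every boot.

```python
import shutil
from pathlib import Path


def seed_data_dir(template_dir, data_dir):
    """Copy files from template_dir into data_dir without overwriting anything.

    Returns the relative paths of the files that were copied.
    """
    template_dir = Path(template_dir)
    data_dir = Path(data_dir)
    copied = []
    for src in sorted(template_dir.rglob("*")):
        dst = data_dir / src.relative_to(template_dir)
        if src.is_dir():
            # Recreate the directory layout of the template.
            dst.mkdir(parents=True, exist_ok=True)
        elif not dst.exists():
            # Only seed files that are missing; existing data is left alone.
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(dst.relative_to(data_dir).as_posix())
    return copied


# Hypothetical usage from an early-boot service:
# seed_data_dir("/usr/share/data-template", "/data")
```

Secrets such as the device private key would not belong in a template like this, since the template is the same on every device.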

So, what do you all think?

That would probably work for recovery, but Mender would forget everything about what was installed. It would look like a device that has never deployed an artifact.