So at least the baked-in root checksums match.
I’m having trouble executing the dd command you proposed, because the bare tegra-demo-image uses busybox, whose dd doesn’t have the iflag option. I guess running the following command is equivalent:
Hmmm. OK. It seems even a freshly flashed system shows different hash values for the two root partitions:
```
root@jetson-tx2-devkit:~# dd if=/dev/mmcblk0p1 bs=1M count=12343259136 iflag=count_bytes | sha256sum -
11771+1 records in
11771+1 records out
12343259136 bytes (12 GB, 11 GiB) copied, 94.952 s, 130 MB/s
6864964e08e50e1021d24f845fc50f84a8fc06b87a1977a1f973adf765236dea -
root@jetson-tx2-devkit:~# dd if=/dev/mmcblk0p33 bs=1M count=12343259136 iflag=count_bytes | sha256sum -
11771+1 records in
11771+1 records out
12343259136 bytes (12 GB, 11 GiB) copied, 96.8051 s, 128 MB/s
cc8017f288d6cdb17f1f61d6e0f1b368f6e223159fac8b437e0b69e24a2c1114 -
```
(This was run from boot slot 0 right after the flashing procedure, with no Mender artifacts installed.)
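Since busybox dd may lack iflag=count_bytes, a minimal sketch of a busybox-friendly alternative is to let `head -c` limit the byte count before hashing. It assumes the busybox build includes `head` (with `-c`), `sha256sum` and `cut`; the helper name `hash_prefix` is hypothetical. The byte count would be the rootfs image size (12343259136 in the runs above).

```shell
#!/bin/sh
# hash_prefix FILE BYTES
# Hypothetical helper: print the SHA-256 of the first BYTES bytes of FILE
# (a block device such as /dev/mmcblk0p1, or a regular image file).
# `head -c` stands in for GNU dd's iflag=count_bytes, which busybox dd may lack.
hash_prefix() {
    file=$1
    bytes=$2
    # sha256sum prints "<hash>  -" for stdin; keep only the hash field.
    head -c "$bytes" "$file" | sha256sum | cut -d ' ' -f 1
}
```

For example, `hash_prefix /dev/mmcblk0p1 12343259136` reads exactly the same bytes as the dd pipeline above, so it should print the same digest.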
I also checked whether the above checksums stay the same when I switch root partitions (which, I guess, rules out the overlay mounts etc.). The checksums are indeed identical regardless of the active boot slot, though still different between the two root partitions.
Note that a difference between the two partitions after a fresh flash is not necessarily a problem. This commit was added relatively recently to make the second partition completely empty. However, the checksum of the running partition should be identical to the one reported by mender-artifact, so this is still a mystery.
I’m not sure I fully understand “completely empty” (or at which stage of the process that applies), because I’m able to switch back and forth between the boot slots (i.e. partitions /dev/mmcblk0p{1,33}) using tegra-boot-control right after the initial flashing procedure.
Is anyone aware of an open-source or example Yocto distro, preferably one that includes the meta-tegra layer, that is known to work with Mender delta updates? Or even just another bare-bones example project?
Now that we’re again at a loss as to where the checksum error originates, we’d like to get a general sense of where the issue may be.
I also had this issue; it turned out that U-Boot was mounting the partitions read-write before remounting them read-only. Can you check your boot args with fw_printenv?
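To check this on a running device, a minimal sketch could look at both what the kernel received and how / is actually mounted right now. It assumes a standard /proc layout; the helper names `root_is_readonly` and `cmdline_has_ro` are hypothetical.

```shell
#!/bin/sh
# root_is_readonly [MOUNTS_FILE]
# Exit 0 if the filesystem currently mounted at / has the "ro" option.
# /proc/mounts fields: device mountpoint fstype options dump pass.
# The last "/" entry wins, since later mounts shadow earlier ones.
root_is_readonly() {
    mounts=${1:-/proc/mounts}
    opts=$(awk '$2 == "/" { o = $4 } END { print o }' "$mounts")
    case ",$opts," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}

# cmdline_has_ro [CMDLINE_FILE]
# Exit 0 if the kernel command line contains a standalone "ro" argument,
# i.e. the kernel initially mounts the root filesystem read-only.
cmdline_has_ro() {
    cmdline=${1:-/proc/cmdline}
    # Split the command line into one argument per line, then match exactly.
    tr -s ' \t' '\n\n' < "$cmdline" | grep -qx 'ro'
}
```

`fw_printenv bootargs` shows what U-Boot is configured to pass, while /proc/cmdline shows what the kernel actually received. If `cmdline_has_ro` succeeds but `root_is_readonly` fails, something (an init script or fstab entry, for instance) remounted / read-write later in boot.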
Something else I’ve found in the meantime is (at least one) reason for the checksum difference: the file /etc/machine-id differs between the two root partitions. This is a systemd configuration file (see here). I’ve looked around for Yocto layers that use or modify this file, and a couple of Mender layers are involved. I’m still working out which layers/recipes are active in my build, and in what order they affect my build output. But it seems as though the machine ID file is regenerated for every build, eventually making the delta updates fail.
Yes, I can confirm that on first boot right after installation, systemd changes the contents of /etc/machine-id from empty to a generated ID. This means the partition is apparently mounted read-write at first. And of course, once this file has changed, the root filesystem checksum will forever differ from that of the other partition (which goes through the same randomization process).
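A quick way to tell a pristine partition from one that systemd has already written to is to classify the contents of its machine-id file. The sketch below assumes the format described in systemd’s machine-id(5) man page: a committed ID is 32 lowercase hex characters, while an unbooted image has an empty file or, on newer systemd versions, the literal string `uninitialized`. The helper name `machine_id_state` is hypothetical.

```shell
#!/bin/sh
# machine_id_state [FILE]
# Hypothetical helper: print missing, empty, uninitialized, id or invalid
# depending on the contents of a machine-id file.
machine_id_state() {
    f=${1:-/etc/machine-id}
    if [ ! -e "$f" ]; then
        echo missing
        return
    fi
    content=$(cat "$f")   # command substitution strips the trailing newline
    case $content in
        '') echo empty ;;
        uninitialized) echo uninitialized ;;
        *[!0-9a-f]*) echo invalid ;;   # any non-hex character (incl. newlines)
        *) if [ "${#content}" -eq 32 ]; then echo id; else echo invalid; fi ;;
    esac
}
```

Run against each root partition mounted read-only (for ext4, `mount -o ro,noload` also skips journal replay, so the device is not written), e.g. `machine_id_state /mnt/rootfs-b/etc/machine-id`: a slot that has never been booted read-write would be expected to report `empty`, and one that has would report `id`.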
I’m very happy to announce that we’ve managed to fix the issue.
As I already alluded to in my previous message, the difference was in /etc/machine-id. It turns out the kernel didn’t mount / read-only by default, so we had to add that. This ensures the machine ID writer does not modify the root filesystem.
However, that’s not the whole story. There’s more to /etc/machine-id than meets the eye, especially when it is used in conjunction with read-only filesystems and overlay mounts for /etc. The order in which things happen is very important; see this PR for more info. The short summary: it’s kinda broken in systemd, but it’s not going to be fixed.
Hi Krisvanrens,
It seems we have the same issue.
Could you please explain or describe your fix? Our root fs is already read-only, so I don’t understand exactly what you meant by ‘we had to add that’.
Best regards
The kernel was initially mounting the root filesystem RW instead of RO. This allowed systemd to write a unique ID into the root filesystem during first boot, which is what caused the wrong hash value.
After making the kernel mount root read-only, systemd now generates the ID on every boot in a tmpfs instead.
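To confirm the fixed behaviour at runtime, one can check whether something is mounted on top of /etc/machine-id. As we understand systemd’s machine-id(5) documentation, when the file is empty and cannot be written at boot, systemd writes a transient ID under /run and mounts it over /etc/machine-id. The helper name `machine_id_is_mounted` is hypothetical, and the check assumes a standard /proc/mounts.

```shell
#!/bin/sh
# machine_id_is_mounted [MOUNTS_FILE]
# Exit 0 if a mount entry exists whose mountpoint is /etc/machine-id,
# i.e. the ID in use is a transient one rather than a file on the rootfs.
machine_id_is_mounted() {
    mounts=${1:-/proc/mounts}
    awk '$2 == "/etc/machine-id" { found = 1 } END { exit !found }' "$mounts"
}
```

On a correctly read-only root this should succeed on every boot, while the file on the partition itself stays empty, which keeps the partition checksum stable.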
We found the issue by performing a file-level hash comparison between the two root filesystems (e.g. computing an md5sum for every file and comparing the results).
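The file-level comparison described above can be sketched as follows, using sha256sum instead of md5sum (either works for spotting differences). It assumes both root filesystems are mounted somewhere, read-only, and that `find` supports `-exec ... {} +`; the helper names `hash_tree` and `compare_trees` are hypothetical.

```shell
#!/bin/sh
# hash_tree ROOT
# Print "sha256  ./relative/path" for every regular file under ROOT,
# sorted by path so that two trees line up for diff.
hash_tree() {
    (cd "$1" && find . -xdev -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# compare_trees ROOT_A ROOT_B
# Print a unified diff of the per-file hashes; exit 0 only if they match.
# The body runs in a subshell so its temporary variables don't leak.
compare_trees() (
    a=$(mktemp)
    b=$(mktemp)
    hash_tree "$1" > "$a"
    hash_tree "$2" > "$b"
    if diff -u "$a" "$b"; then rc=0; else rc=1; fi
    rm -f "$a" "$b"
    exit "$rc"
)
```

For example, with the two partitions mounted at /mnt/a and /mnt/b, `compare_trees /mnt/a /mnt/b` would list ./etc/machine-id if that file differs. Note that this only compares regular file contents: ownership, permissions, timestamps and filesystem metadata are ignored, so the block-level checksums can still differ when this comparison comes out clean.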
Thank you, Krisvanrens, for your reply. We are already at your second step, so our issue must be different. Once the device starts, we find one modified byte in the partition. BUT if we start the device with, for example, a live distro and run sha256sum on the unmounted rootfs, the checksum is correct. If we reboot the device into the rootfs, we again find the same modified byte (and a wrong checksum). Now, a big mystery…
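To pin down where such a single modified byte sits, `cmp -l` lists every differing byte between two inputs, for instance the partition and a reference copy of the rootfs image (which reference to use is an assumption here). A minimal sketch with a hypothetical helper name, `diff_offsets`:

```shell
#!/bin/sh
# diff_offsets FILE_A FILE_B
# Print the 0-based offset of every byte that differs between two files or
# block devices. `cmp -l` prints "OFFSET OCTAL_A OCTAL_B" with a 1-based
# offset; if one input is shorter (e.g. an image file vs. a larger partition),
# cmp stops at its end and reports EOF on stderr.
diff_offsets() {
    # printf "%.0f" avoids exponent notation for offsets above 2^31 in some awks
    cmp -l "$1" "$2" | awk '{ printf "%.0f\n", $1 - 1 }'
}
```

Dividing an offset by the filesystem block size (shown by `dumpe2fs -h`) gives a block number; `debugfs -R "icheck BLOCK" IMAGE` then reports the inode using that block, if any, and `debugfs -R "ncheck INODE" IMAGE` its path. An offset between 1024 and 2047 falls inside the ext4 superblock rather than in any file.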
Best regards