Delta update update module checksum error

Here you go:

$ mender-artifact read A2B.mender
Reading Artifact...
.............................................................. - 100 %
Mender artifact:
  Name: demo-image-base_3.1.15-9b9df01-dirty-kris
  Format: mender
  Version: 3
  Signature: no signature
  Compatible devices: '[jetson-tx2-devkit jetson-tx2]'
  Provides group: 
  Depends on one of artifact(s): []
  Depends on one of group(s): []
  State scripts:
    ArtifactInstall_Leave_80_bl-update

Updates:
    0:
    Type:   mender-binary-delta
    Provides:
        rootfs-image.checksum: 693cd521c0aa1ecd46c1f81f8f91df9d44d9b5ed2ebc014dab3f8cd539cb20a4
        rootfs-image.version: demo-image-base_3.1.15-9b9df01-dirty-kris
    Depends:
        rootfs-image.checksum: 38e485758ba550532ca19fa9ec4bd61d0282e05924e5a24f304a90316199b1b0
    Clears Provides: ["artifact_group", "rootfs_image_checksum", "rootfs-image.*"]
    Metadata:
        {
          "delta_algorithm": "xdelta3",
          "rootfs_file_size": 12343259136
        }
    Files:
      name:     demo-image-base-jetson-tx2-devkit.ext4.delta
      size:     727497
      modified: 2022-03-30 16:44:49 +0200 CEST
      checksum: d283c647696e31e34403174228e1c43fdac64e8a2d094aebfbf6a281a11bdccb

On the device (with A.mender installed), it shows:

$ mender show-provides
artifact_name=demo-image-base_3.1.15-9b9df01-dirty-kris
rootfs-image.checksum=38e485758ba550532ca19fa9ec4bd61d0282e05924e5a24f304a90316199b1b0
rootfs-image.version=demo-image-base_3.1.15-9b9df01-dirty-kris

So at least the baked-in root filesystem checksums correspond.

I’m having trouble executing the dd command you proposed, because the bare tegra-demo-image uses BusyBox, whose dd doesn’t support the iflag option. I guess running the following command is equivalent:

$ dd if=/dev/mmcblk0p1 bs=1 count=12343259136 | sha256sum -

But of course reading a root partition byte-for-byte takes ages.
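A possibly faster alternative on a BusyBox-only system is to let `head` truncate the stream to the image size instead of reading with `bs=1`. This is only a sketch: it assumes the `head` applet supports `-c` (it commonly does), and the function name `hash_first_bytes` is made up for illustration. The size is the `rootfs_file_size` from the artifact metadata above.

```shell
# Hypothetical helper: SHA-256 of the first N bytes of a file or block device.
# Assumes `head -c` and `sha256sum` are available (BusyBox or coreutils).
hash_first_bytes() {
    dev="$1"    # file or block device to read
    size="$2"   # number of bytes to hash, counted from the start
    head -c "$size" "$dev" | sha256sum | cut -d' ' -f1
}

# Example (run as root on the device):
# hash_first_bytes /dev/mmcblk0p1 12343259136
```

Since 12343259136 = 3013491 × 4096, something like `dd if=/dev/mmcblk0p1 bs=4096 count=3013491 | sha256sum -` should cover the same byte range without needing `iflag`.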

I will try to install the ‘real’ coreutils, and get back with results later.

All right, I rebuilt the Yocto image with coreutils included, and repeated the whole A/B/A2B installation and update process.

The mender-artifact output:

$ mender-artifact read A2B.mender
Reading Artifact...
.............................................................. - 100 %
Mender artifact:
  Name: demo-image-base_3.1.15-9b9df01-dirty-kris
  Format: mender
  Version: 3
  Signature: no signature
  Compatible devices: '[jetson-tx2-devkit jetson-tx2]'
  Provides group: 
  Depends on one of artifact(s): []
  Depends on one of group(s): []
  State scripts:
    ArtifactInstall_Leave_80_bl-update

Updates:
    0:
    Type:   mender-binary-delta
    Provides:
        rootfs-image.checksum: 88e7c0241dc48df3a887ef87c61c119e271a56cad36836d10f823332a56c818c
        rootfs-image.version: demo-image-base_3.1.15-9b9df01-dirty-kris
    Depends:
        rootfs-image.checksum: 84b5a7a89ea1b5fc947d8df6bc20a27102e9483caf18f1ea4977b00e2d1007db
    Clears Provides: ["artifact_group", "rootfs_image_checksum", "rootfs-image.*"]
    Metadata:
        {
          "delta_algorithm": "xdelta3",
          "rootfs_file_size": 12343259136
        }
    Files:
      name:     demo-image-base-jetson-tx2-devkit.ext4.delta
      size:     729948
      modified: 2022-03-31 10:19:34 +0200 CEST
      checksum: 56002eb6c0a58572003b2f17807e8aefdece378cdbbc33946604ff0c8c8b19ec

The dd output on the device:

root@jetson-tx2-devkit:~# dd if=/dev/mmcblk0p1 bs=1M count=12343259136 iflag=count_bytes | sha256sum -
11771+1 records in
11771+1 records out
12343259136 bytes (12 GB, 11 GiB) copied, 96.1862 s, 128 MB/s
9b6d570b8bbafa5ddf3335926164382b664ce8e6e0c745feed941300fc8a6ecb  -

So apparently there is indeed a difference: the hash of the partition on the device does not match the Depends checksum in the artifact.

What’s interesting, by the way: I ran the dd command on both read-only root partitions and got different results:

root@jetson-tx2-devkit:~# dd if=/dev/mmcblk0p1 bs=1M count=12343259136 iflag=count_bytes | sha256sum -
11771+1 records in
11771+1 records out
12343259136 bytes (12 GB, 11 GiB) copied, 96.1862 s, 128 MB/s
9b6d570b8bbafa5ddf3335926164382b664ce8e6e0c745feed941300fc8a6ecb  -
root@jetson-tx2-devkit:~# dd if=/dev/mmcblk0p33 bs=1M count=12343259136 iflag=count_bytes | sha256sum -
11771+1 records in
11771+1 records out
12343259136 bytes (12 GB, 11 GiB) copied, 97.3028 s, 127 MB/s
06df80590d916f30d29f14020f7be28c4fe13d38b673f4c4c0e10b3054b3df7f  -

This is even though I had installed and committed A.mender on both partitions right before, during the preparation process.

I will verify now if this is also the case for a clean system flash (without installing any Mender updates).

Hmmm. OK. It seems even a freshly flashed system shows different hash values for either root partition:

root@jetson-tx2-devkit:~# dd if=/dev/mmcblk0p1 bs=1M count=12343259136 iflag=count_bytes | sha256sum -
11771+1 records in
11771+1 records out
12343259136 bytes (12 GB, 11 GiB) copied, 94.952 s, 130 MB/s
6864964e08e50e1021d24f845fc50f84a8fc06b87a1977a1f973adf765236dea  -
root@jetson-tx2-devkit:~# dd if=/dev/mmcblk0p33 bs=1M count=12343259136 iflag=count_bytes | sha256sum -
11771+1 records in
11771+1 records out
12343259136 bytes (12 GB, 11 GiB) copied, 96.8051 s, 128 MB/s
cc8017f288d6cdb17f1f61d6e0f1b368f6e223159fac8b437e0b69e24a2c1114  -

(This was run from boot slot 0, right after the flashing procedure, with no Mender artifacts installed.)

I also checked whether the above checksums stay the same when I switch root partitions (which, I guess, rules out the overlay mounts etc.). The checksums are in fact the same regardless of the active boot slot (though still different between the two root partitions).

I also asked in the OE4T project if they can think of any reason why this is happening.

Note that a difference between the two partitions after a fresh flash is not necessarily a problem. This commit was added relatively recently to make the second partition completely empty. However the checksum of the running partition should be identical to the one reported by mender-artifact, so this is still a mystery.

OK, thanks for the information.

I’m not sure I fully understand “completely empty” (or at which stage in the process that is), because I’m able to switch back and forth between the boot slots (i.e. partitions /dev/mmcblk0p{1,33}) using tegra-boot-control right after the initial flashing procedure.

Is anyone aware of an open-source/example Yocto distro, preferably with the meta-tegra layer included, that is known to work with Mender delta updates? Or even just another bare-bones example project?

Now that we’re at a loss again as to where the checksum error originates, we’d like to get a general sense of where the issue may be.

Again, thanks in advance.

I also had this issue. It turned out that U-Boot was mounting the partitions as RW before remounting them as RO. Can you check fw_printenv to see your boot args?
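Besides the U-Boot environment, the kernel command line that was actually used for the running system can be inspected through /proc/cmdline. A minimal sketch, assuming that the last `ro` or `rw` token on the command line is the one that takes effect; the function name `root_mount_flag` is made up for illustration:

```shell
# Hypothetical helper: report whether a kernel command line asks for a
# read-only ("ro") or read-write ("rw") root mount, or "unset" if neither
# token is present. Assumption: the last occurrence is the effective one.
root_mount_flag() {
    printf '%s\n' "$1" | awk '
        { for (i = 1; i <= NF; i++) if ($i == "ro" || $i == "rw") f = $i }
        END { print (f == "" ? "unset" : f) }'
}

# Example (on the device):
# root_mount_flag "$(cat /proc/cmdline)"
```

Note that this only shows what the kernel was asked to do; an initramfs or init system may still remount the root filesystem later.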

Hi! Thanks for the response.

This is the output of fw_printenv:

mender_boot_part=1

Something else I’ve found in the meantime is the reason for the checksum difference (or at least one of the reasons): the file /etc/machine-id differs between the two root partitions. This is a systemd configuration file (see here). I’ve looked through the Yocto layers that use or modify this file, and there are a couple of Mender layers involved. I’m still working out which layers/recipes are active for my build, and in what order they affect my build output. But it seems as though the machine ID file is regenerated for every build, eventually making the delta updates fail.

Yes, I can confirm that at first boot right after installation, systemd changes the contents of /etc/machine-id from empty to some ID. This means the partition is apparently mounted as RW at first. And of course, once this file has changed, the root filesystem checksum will forever be different from that of the other partition (which goes through the same randomization process).

I’m very happy to announce that we’ve managed to fix the issue.

As I already alluded to in my previous message, the difference was in /etc/machine-id. It turns out the kernel didn’t mount / as read-only by default, so we had to add that. This will make sure the machine ID writer will not change the root filesystem.

However, that’s not the whole story. There’s more to /etc/machine-id than meets the eye. Especially when it is used in conjunction with read-only file systems and overlay mounts for /etc. The order in which things happen is very important, see this PR for more info. The short summary: it’s kinda broken in systemd but not going to be fixed.

Thanks everyone for helping out! :)

Hi Krisvanrens,
it seems we have the same issue :(
Could you please explain or describe your fix? Our root fs is already read-only, so I don’t understand exactly what you meant when you wrote ‘we had to add that’.
Best regards

Hi!

Our problem was two things:

  1. The kernel initially mounted the root filesystem as RW instead of RO. This allowed systemd to initialize the root filesystem with a unique machine ID during first boot. This was the issue that caused the wrong hash value.
  2. After making the kernel mount root as read-only, systemd would instead write a new ID into a tmpfs on every boot.

We found the issue by performing a file-level hash comparison between the two root filesystems (e.g. create an md5sum for every file and compare them).
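A file-level comparison along those lines could look roughly like the sketch below. It assumes both root filesystems are mounted (ideally read-only) somewhere; the mount points and the function names `hash_tree` and `compare_trees` are made up for illustration and are not what we literally ran.

```shell
# Hypothetical helper: md5sum of every regular file below a directory,
# with paths relative to that directory, sorted by path.
hash_tree() {
    (cd "$1" && find . -xdev -type f -exec md5sum {} + | sort -k 2)
}

# Hypothetical helper: show the files whose hashes differ between two trees.
# Returns diff's exit status (0 = identical, 1 = differences found).
compare_trees() {
    a=$(mktemp)
    b=$(mktemp)
    hash_tree "$1" > "$a"
    hash_tree "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Example (mount points are assumptions):
# compare_trees /mnt/rootfs_a /mnt/rootfs_b
```

This only covers regular file contents. Differences in metadata, directories, or unallocated blocks still change the partition checksum without showing up here.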

Best regards,

Kris

Thank you Krisvanrens for your reply. We are already at your second step, so our issue must be a different one. Once the device has booted, we find one modified byte in the partition, BUT if we boot the device with, for example, a live distro and run sha256sum on the unmounted rootfs, the checksum is correct. If we reboot the device from its rootfs, we find the same modified byte again (and a wrong checksum). For now, a big mystery…
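For anyone trying to pin down such a byte: with the original image file at hand, `cmp -l` can report the offset and values of every differing byte. A minimal sketch, assuming `head -c` and `cmp -l` are available and that `cmp` accepts `-` for standard input; the function name `diff_bytes` and the example paths are made up for illustration.

```shell
# Hypothetical helper: compare a reference image with the first bytes of a
# device (as many bytes as the image is long). For each differing byte,
# cmp -l prints the 1-based offset and both byte values in octal
# (image first, device second).
diff_bytes() {
    image="$1"
    dev="$2"
    size=$(( $(wc -c < "$image") ))
    head -c "$size" "$dev" | cmp -l "$image" -
}

# Example (paths are assumptions):
# diff_bytes rootfs.ext4 /dev/mmcblk0p1
```

The offset may help to tell which filesystem structure the byte belongs to, for example a field in the ext4 superblock that changes on mount.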
Best regards