After getting the mender-convert created image to work on my Gigabyte Brix (see my previous post on this) I’ve run into a new issue I’m not sure how to resolve.
In order to test the updating of the second rootfs partition on sda3 through the demo server, I’ve imaged the first rootfs on sda2 using the “mender dump” command on the device. I then used the mender-artifact tool to create a deployable mender artifact from this dump and uploaded that to the mender server. I then created a deployment for my Brix using the tools available in the mender web interface.
My Brix contacts the server and, after I accept it as a valid device, it starts to download the deployment (which is about 2.2 gigabytes in size). This seems to go smoothly until it hits about 69 percent, after which the Brix reboots. During the reboot phase I see a message on the monitor connected to the Brix stating that it will attempt a rollback. It reboots once more after this, and when I log into Ubuntu and check which rootfs is currently in use, I see that it is still the one on sda2.
I’ve noticed that a log file is created for each attempt I make and I’ve looked at its contents, but it isn’t clear to me why the update fails and rolls itself back.
Can anybody who is more knowledgeable in the internals of Mender have a look at it and explain to me what is going wrong?
Note: I had to change the extension from .log to .yaml in order to get it to attach to this message. Hopefully that doesn’t mess up the contents!
{"level":"info","message":"Running Mender version a0ffa83","timestamp":"2020-03-16T15:32:23+01:00"}
{"level":"info","message":"State transition: init [none] -\u003e after-reboot [ArtifactReboot_Leave]","timestamp":"2020-03-16T15:32:23+01:00"}
{"level":"debug","message":"Have U-Boot variable: mender_check_saveenv_canary=","timestamp":"2020-03-16T15:32:23+01:00"}
{"level":"debug","message":"List of U-Boot variables:map[mender_check_saveenv_canary:]","timestamp":"2020-03-16T15:32:23+01:00"}
{"level":"debug","message":"Have U-Boot variable: upgrade_available=0","timestamp":"2020-03-16T15:32:23+01:00"}
{"level":"debug","message":"List of U-Boot variables:map[upgrade_available:0]","timestamp":"2020-03-16T15:32:23+01:00"}
{"level":"error","message":"transient error: Reboot to new update failed. Expected \"upgrade_available\" flag to be true but it was false","timestamp":"2020-03-16T15:32:23+01:00"}
{"level":"info","message":"State transition: after-reboot [ArtifactReboot_Leave] -\u003e rollback [ArtifactRollback]","timestamp":"2020-03-16T15:32:23+01:00"}
That error is what triggers the rollback, but it could also mean that the rollback was already performed in GRUB, e.g. because it was not able to boot the “new” Linux kernel.
If you are able to capture serial output on the device, it might provide valuable insights, though this is difficult on x86 devices and GRUB is really bad at printing useful output.
My bet is that it fails to load the Linux kernel when it tries to boot the updated image after the update, and I would suggest digging deeper there. Maybe halt GRUB after the update reboot and try running the commands manually to find the one that fails.
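For longer deployment logs, it can help to pull out just the errors and state transitions before reading the rest. The following is a minimal sketch, assuming the log holds one JSON object per line with `level` and `message` fields, as in the excerpt above. `summarize_log` is a made-up helper name and not part of Mender.

```python
import json


def summarize_log(lines):
    """Split JSON-lines log entries into error messages and state transitions.

    Assumes each line is a JSON object with "level" and "message" keys,
    like the deployment log excerpt in this thread.
    """
    errors, transitions = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        message = entry.get("message", "")
        if entry.get("level") == "error":
            errors.append(message)
        elif message.startswith("State transition:"):
            transitions.append(message)
    return errors, transitions
```

Calling it with the lines of an open log file, e.g. `summarize_log(open(path))` where `path` is wherever you saved the log, returns two lists that can be printed one message per line.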
I found the issue causing the problem. I had gzipped the dump file to save space (16 GB => ~2 GB) and then used the gzipped version directly with mender-artifact, because for some weird reason I thought that would work. Which it doesn’t!
After I used an uncompressed copy as the input file, mender-artifact created a usable artifact that successfully updated the Brix.
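For anyone hitting the same thing, a quick check before building the artifact can catch a compressed dump. This is a rough sketch, assuming the only compression in play is gzip, which is recognizable by its two leading magic bytes (`0x1f 0x8b`). The helper names `is_gzipped` and `ensure_uncompressed` are made up for illustration.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at path starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def ensure_uncompressed(src, dst):
    """Return the path of a raw image to use as artifact input.

    If src is gzipped, decompress it to dst and return dst;
    otherwise return src unchanged and leave dst untouched.
    """
    if not is_gzipped(src):
        return src
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so a 16 GB image is fine
    return dst
```

The returned path is what should be given to mender-artifact as the input file. Keep in mind that the decompressed copy needs the full image size in free disk space.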