We have been using Mender for a few months but are still at the start of our journey. We have about 20 production devices at our customers’ locations and introduced Mender so that we can update the devices without physical access in case anything happens.
Now something has broken and we need to update the devices. We implemented the fix in our custom layer (only some changes in the Python code) and built the image with the same configuration as before. When we tried to create a deployment, we received this log:
2024-12-04 09:43:41 +0000 UTC info: Running Mender client version: 3.5.3
2024-12-04 09:43:41 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2024-12-04 09:43:41 +0000 UTC info: No public key was provided for authenticating the artifact
2024-12-04 09:43:42 +0000 UTC info: Output (stderr) from command "fw_printenv": Cannot read environment, using default
2024-12-04 09:43:42 +0000 UTC info: Output (stderr) from command "fw_printenv": Cannot read default environment from file
2024-12-04 09:43:42 +0000 UTC info: Output (stderr) from command "fw_printenv": Cannot read environment, using default
2024-12-04 09:43:42 +0000 UTC info: Output (stderr) from command "fw_printenv": Cannot read default environment from file
2024-12-04 09:43:42 +0000 UTC error: Artifact install failed: Payload: can not install Payload: core-image-minimal-raspberrypi3-64.ext4: No match between boot and root partitions.: exit status 243
2024-12-04 09:43:42 +0000 UTC info: State transition: update-store [Download_Enter] -> cleanup [Error]
2024-12-04 09:43:42 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]
We also get the error “No match between …” on the devices when we run mender show-artifact.
We tried to solve this by reading some very old discussions here, but we weren’t able to, as we don’t really understand what the error is telling us.
Can someone please assist with that?
Details of our integration
We integrated Mender into an existing Yocto project that consists of the following layers:
poky/meta
poky/meta-poky
poky/meta-yocto-bsp
meta-raspberrypi
meta-rust
meta-openembedded/meta-oe
meta-openembedded/meta-filesystems
meta-openembedded/meta-networking
meta-openembedded/meta-python
meta-virtualization
meta-mender/meta-mender-core
meta-our-custom (our own layer; it only contains some services that run Python scripts)
fw_printenv can’t read the U-Boot environment and falls back to the default. The default probably does not contain a proper root partition setting, hence the Mender client can’t determine the current state and bails out.
As for the reasons why your U-Boot environment is defective or non-readable, I cannot comment. Do you have a local reproducer? If so, what is the output of fw_printenv, and what are its configuration settings? Additionally, you might want to hexdump the environment memory addresses to see what’s actually there, and/or inspect the U-Boot boot log via the serial console.
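For example, something along these lines on the device (the device node and offset below are only illustrative; they have to match whatever your build writes into fw_env.config):

cat /etc/fw_env.config                        # where fw_printenv expects the environment: device, offset, size
fw_printenv mender_boot_part bootcount        # a few of the variables the Mender client relies on
hexdump -C -s 0x400000 -n 64 /dev/mmcblk0     # raw dump at the configured offset; should not be all 0x00 or 0xFF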
Your explanation sounds reasonable, but we’re not sure why this occurs, given that the image works correctly when we flash it to a storage medium manually, without trying to do an update.
fw_printenv outputs:
Configuration file wrong or corrupted
After building, we have the following fw_env.config and fw_env.config.default in the build directory:
The latter one (the default) also reflects the contents of /data/u-boot/fw_env.config and /etc/fw_env.config on the actual device running the prior version which we want to update. So it seems like the device is falling back to that default.
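For comparison, on a working Mender integration for the Raspberry Pi, fw_env.config typically points at two redundant environment copies on the raw SD/eMMC device, roughly like this (device node, offsets and sizes are illustrative and depend on the build configuration):

# Device name   Offset     Env. size
/dev/mmcblk0    0x400000   0x4000
/dev/mmcblk0    0x800000   0x4000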
Additional information:
Besides that, we’re creating the Mender Artifact with:
We also tried to create an artifact with the .ext4 instead of the .sdimg output of the BitBake process, and also with the Mender Artifact created by BitBake itself in the course of the build. All three variants produced the same error.
Let us know when you need anything else.
Thank you very much so far.
Reason: you’re putting a full SD card image into an Artifact. The client doesn’t care; to it, that’s just a binary blob, and it will try to write it into the partition (which would probably fail due to its size), but you don’t even get that far, as the partition preconditions are checked before the download starts.
If you want to create the Artifact manually, then you need to use the .ext4 (or whatever filesystem type you configured) image as the payload. I would also suggest inspecting the actual command that is run when the Artifact is created through BitBake, for comparison.
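For reference, a manually created rootfs Artifact is written roughly like this (the artifact name, device type and file names are placeholders and must match your build):

mender-artifact write rootfs-image \
    --artifact-name release-2 \
    --device-type raspberrypi3-64 \
    --file core-image-minimal-raspberrypi3-64.ext4 \
    --output-path release-2.mender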
After looking at your layers, I also think I’ve found the problem: you don’t use meta-mender-raspberrypi, which was part of meta-mender up to and including kirkstone, and lives in meta-mender-community since scarthgap. That layer brings the U-Boot configuration and patches required for the Raspberry Pi.
So if you really did not use it in the build, that’s the problem. And I’m pretty sure that such a build would never have passed a test update.
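Adding it is just a matter of listing the layer in conf/bblayers.conf next to the others, for example (the path is only illustrative and depends on where meta-mender or meta-mender-community is checked out):

BBLAYERS += " \
    /path/to/meta-mender-community/meta-mender-raspberrypi \
"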
If not mentioning it was just an oversight, I have no immediate idea. So here are a couple of pointers to dig into:
if you take a fresh SD card, flash it with the .sdimg, boot it and then run fw_printenv, what do you get?
in such a freshly built image, are there also the two diverging fw_env.config and fw_env.config.default?
has the update process worked at some point in the past, with exactly that build?
Our usual way was to create the artifact with the .ext4 image, or we used the one provided by BitBake. Regarding the .sdimg, it makes sense that it’s the wrong way to create an artifact anyway; the error was the same in all three variants we tested, and that one was more or less just a failed attempt from trial and error.
We indeed didn’t include this layer. We’re trying to see what happens when we integrate it now.
But since the existing installations seem to lack the correct configuration anyway, I guess that still won’t help when trying to update the OS that way.
Do you see any way to patch this problem remotely, so that we’re able to roll out an OS update? My understanding of the issue probably isn’t deep enough, but I would assume that if we manage to patch fw_env.config remotely (via an application update, maybe), we should be able to make the update work, since it seems that only that file is missing. Or do you think we’re stuck and just need to roll out new SD cards?
Regarding your other questions:
With a fresh SD card we still get the same output from fw_printenv.
The two files are also still diverging. We’re going to see what changes when meta-mender-raspberrypi is integrated.
We thought it did, but that was an application update. Unfortunately, we never did a successful OS update before.
It’s an RPi3 B+. The + changed some chipsets compared to the non-plus version, but it’s officially supported and running it didn’t cause any further problems.
If the layer is not there, then the other questions are no longer relevant, because that fully explains your problem.
Unfortunately it’s not just fw_env.config; at the very least, you also need to update the U-Boot binary itself on the boot partition. I’m not sure whether other things are affected too, but at first glance that might be enough. Please note that changing it is definitely a single point of failure and might brick the devices. If the devices are easily accessible, rolling out fresh SD cards might be the more time-efficient solution, as you don’t need to construct and test the application update artifacts.
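Very roughly, and only to illustrate the scope (the file names, paths and variables below are assumptions, not a tested procedure), such a remote fix would have to do something along these lines:

# replace the bootloader on the boot partition with one from a build that includes meta-mender-raspberrypi
mount /dev/mmcblk0p1 /mnt
cp u-boot.bin /mnt/                   # the U-Boot binary the RPi firmware loads via config.txt
cp boot.scr config.txt /mnt/          # plus whatever boot configuration that build expects
umount /mnt
# install the matching fw_env.config and seed an initial environment
cp fw_env.config /etc/fw_env.config
fw_setenv mender_boot_part 2          # example variable only; the real set comes from the Mender U-Boot patches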
Thank you very much, we’ll trace that path then.
If we find a solution to make it work remotely, so that the Mender client consumes the updates, I’ll post an update here. Since application updates work as they should, we have the option to inject files into the file system, which might do the trick if we’re able to update U-Boot and the minimal configuration required.
We could validate that the missing meta-mender-raspberrypi layer was indeed the source of the problem.
After adding it, issuing OS updates is no problem anymore; they work without any trouble. Also, when hexdumping the respective address range, you can now clearly see the correct configuration, whereas it was missing completely before the fix. We’re still wondering why the device even booted in the first place; it seems a fallback bootloader was in place that handled it.
Regarding fixing already deployed devices remotely, we don’t have a solution yet.
Theoretically it’s possible to update the bootloader remotely, but in practice it’s hard to reproduce the exact configuration needed to end up with a usable update package that would allow us to fix the broken devices.
Update:
We skipped the latter part, since supplying new storage media once was ultimately faster.
In theory it should be possible to patch the bootloader remotely via a .deb or .rpm package and make all the required modifications, but for that you’d need a pretty exact configuration. The process was too tedious, which is why we skipped it and just did a fresh, clean hardware deployment.