Raspberry Pi Firmware

Greetings,

I was looking at this issue on the Pi GitHub page, and it got me wondering what additional changes would be needed to eliminate the need for U-Boot entirely. It looks like the firmware would already handle almost everything for A/B booting, but I suspect a couple of additions would be required: something like a special file to select between the two (rather than parsing config.txt) and a way to detect failed boots (bootcount in U-Boot, I think?). I don’t know how willing they would be to add extra features, but it seems worth a try. Does anyone have any suggestions or advice? I’m happy to bring it up with the Pi devs if I can get some help on the actual requirements. Let me know what you think!

Thanks,
Trevor

Hi @stiltr. Thank you for bringing this up.

I think you covered the important parts.

We rely heavily on the bootcount feature in U-Boot to trigger rollbacks. It is also important to be able to trigger a rollback from user space: if, e.g., the Mender client is not able to connect to the Mender server, it will just reboot the device without running mender -commit, and the U-Boot bootcount will trigger a rollback.

The Mender client must also be able to parse and modify the configuration in user-space.

It would be preferable if there were built-in redundancy (similar to the U-Boot environment), so that updates to the configuration file can be performed atomically and in a fail-safe way.

Hi @mirzak,

Thanks for the reply! Sorry mine was so delayed.

I just posted a comment on the GitHub issue asking about the possibility of adding these features to the Pi firmware. I’ll keep you posted on what I hear back. I’m worried that there might not be write support built into the firmware, which would make the bootcount difficult to accomplish, but I’m totally guessing there.

Thanks again!

Thanks for initiating the discussion @stiltr.

No problem!

So it looks like a bootcount may be possible (limited to two bits). I would think this would be sufficient, since the Mender default is a bootlimit of 1.

They’re asking for an overview of how this would all work and what would need to be added. I’ve outlined it to the best of my ability below and would appreciate any feedback you could give. I tried to make the flag names as generic as possible (ideally, most of this could also be used by people who just want a recovery partition to boot after bootlimit failed boots).

The root partition, kernel, etc. would be selected via the recently added os_prefix flag. Two folders would hold the necessary files for each root fs (the root partition being specified in the respective cmdline.txt files).

A 2-bit bootcount would be added wherever the firmware can store it, and a method to read and reset it from user space would be created.
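As a minimal sketch of what a 2-bit bootcount could look like, the C below packs the counter into a larger state word and increments it with saturation. The 32-bit state word, the bit position and the helper names are hypothetical; nothing here describes how the Pi firmware actually stores its state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-1 of a small state word that also holds
   other firmware flags. */
#define BOOTCOUNT_SHIFT 0u
#define BOOTCOUNT_MASK  (0x3u << BOOTCOUNT_SHIFT)
#define BOOTCOUNT_MAX   3u

static unsigned bootcount_get(uint32_t state) {
    return (state & BOOTCOUNT_MASK) >> BOOTCOUNT_SHIFT;
}

static uint32_t bootcount_set(uint32_t state, unsigned count) {
    assert(count <= BOOTCOUNT_MAX);
    /* Clear only our two bits, leaving the other flags untouched. */
    return (state & ~BOOTCOUNT_MASK) | ((uint32_t)count << BOOTCOUNT_SHIFT);
}

/* Firmware side: count each boot attempt, saturating at 3. */
static uint32_t bootcount_increment(uint32_t state) {
    unsigned c = bootcount_get(state);
    return c < BOOTCOUNT_MAX ? bootcount_set(state, c + 1) : state;
}

/* User-space side: reset after a successful boot. */
static uint32_t bootcount_reset(uint32_t state) {
    return bootcount_set(state, 0);
}
```

Saturating rather than wrapping matters here: a wrap from 3 back to 0 would make a device stuck in a boot loop look as if it had never failed.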

Three new optional flags would be added to config.txt: upgrade_available, bootlimit and recovery_os_prefix. (This is for the simplest and most generic setup.)

The os_prefix and recovery_os_prefix flags in config.txt would be managed from user space by the Mender client. bootlimit would be set to 1.

During boot, the firmware would check bootcount, bootlimit and upgrade_available to select the proper os_prefix. (Please forgive the horrible pseudocode and formatting…)

```
if (bootcount < bootlimit && upgrade_available == 0) {
    // Normal boot
    boot();
} else if (bootcount < bootlimit && upgrade_available == 1) {
    // Upgrade is pending, boot to it
    os_prefix = recovery_os_prefix;
    boot();
} else if (bootcount >= bootlimit && upgrade_available == 1) {
    // Upgrade failed, boot normal OS
    boot();
} else if (bootcount >= bootlimit && upgrade_available == 0) {
    // Normal boot failed, fall back to recovery
    os_prefix = recovery_os_prefix;
    boot();
}
```
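To check the logic, here is the same four-case decision as a small self-contained C sketch (not firmware code). It assumes the firmware evaluates the conditions before incrementing bootcount, that the counter saturates at 3 like a 2-bit field, and it uses made-up directory names (a/, b/) as prefixes; boot_state, select_prefix and firmware_boot are hypothetical names.

```c
#include <assert.h>

/* Hypothetical boot state; field names mirror the proposed config.txt flags. */
struct boot_state {
    unsigned bootcount;              /* 2-bit counter, 0..3 */
    unsigned bootlimit;              /* 1 in the Mender case */
    int upgrade_available;           /* 1 while an update is pending */
    const char *os_prefix;
    const char *recovery_os_prefix;
};

/* The four cases from the pseudocode above. */
static const char *select_prefix(const struct boot_state *s) {
    assert(s->bootlimit <= 3);       /* a 2-bit counter cannot count higher */
    int exceeded = s->bootcount >= s->bootlimit;
    if (!exceeded && s->upgrade_available == 0)
        return s->os_prefix;             /* normal boot */
    if (!exceeded && s->upgrade_available == 1)
        return s->recovery_os_prefix;    /* upgrade is pending, boot to it */
    if (exceeded && s->upgrade_available == 1)
        return s->os_prefix;             /* upgrade failed, boot normal OS */
    return s->recovery_os_prefix;        /* normal boot failed, fall back to recovery */
}

/* One boot attempt: pick the prefix first, then count the attempt.
   User space resets bootcount to 0 after a good boot. */
static const char *firmware_boot(struct boot_state *s) {
    const char *prefix = select_prefix(s);
    if (s->bootcount < 3)
        s->bootcount++;                  /* saturate instead of wrapping */
    return prefix;
}
```

With bootlimit=1, the first boot after an update goes to recovery_os_prefix; if user space never resets bootcount, the next boot returns to os_prefix.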

Once the OS has booted, Mender would check bootcount and upgrade_available and act accordingly. If the boot was a success, bootcount is set to 0. If this is the first successful boot after an update, clear upgrade_available and swap os_prefix and recovery_os_prefix. If the boot is a failure, don’t change anything and reboot.
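The user-space half could look roughly like this C sketch. boot_ok and booted_update are hypothetical inputs: the first is whatever health check the client runs, the second says whether the running OS came from recovery_os_prefix (how to detect that is left open). One assumption goes beyond the description above: if an update was pending but the firmware fell back to the old OS, the sketch clears upgrade_available without swapping, so the failed update is not retried on the next boot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the boot state that user space can read and write. */
struct boot_state {
    unsigned bootcount;
    int upgrade_available;
    const char *os_prefix;
    const char *recovery_os_prefix;
};

/* Post-boot handling. Returns true if the device should reboot. */
static bool after_boot(struct boot_state *s, bool boot_ok, bool booted_update) {
    if (!boot_ok)
        return true;                     /* change nothing; reboot and let the firmware count it */
    s->bootcount = 0;                    /* boot succeeded */
    if (s->upgrade_available) {
        if (booted_update) {             /* first good boot of the update: make it primary */
            const char *tmp = s->os_prefix;
            s->os_prefix = s->recovery_os_prefix;
            s->recovery_os_prefix = tmp;
        }
        s->upgrade_available = 0;        /* update either committed or abandoned */
    }
    return false;
}
```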

I think that about covers it, but it’s been a long day and I’m sure I missed something along the way. If you can sum this more elegantly, please feel free. Thanks for taking the time to look this over!


Hi @mirzak,

I forgot to tag you in the last reply, so I’m not sure if you’ve seen it. Do you have any input?

Thanks!

Apologies for the delay. I did see your write-up but have been struggling to find time to look it over.

Overall, I think you have covered the important bits.

This,

```
if (bootcount < bootlimit && upgrade_available == 0) {
    // Normal boot
    boot();
}
```

can probably be:

```
if (upgrade_available == 0) {
    // Normal boot
    boot();
}
```

as the Mender logic will only do a rollback/alternative boot when upgrade_available=1. The same would apply to this statement:

```
else if (bootcount >= bootlimit && upgrade_available == 0) {
    // Normal boot failed, fall back to recovery
    os_prefix = recovery_os_prefix;
    boot();
}
```

When using an A/B update strategy, there is no guarantee that the “recovery os” is a functional image. So it would simply be:

```
if (upgrade_available == 0) {
    // Normal boot
    boot();
} else if (bootcount < bootlimit && upgrade_available == 1) {
    // Upgrade is pending, boot to it
    os_prefix = recovery_os_prefix;
    boot();
} else if (bootcount >= bootlimit && upgrade_available == 1) {
    // Upgrade failed, boot normal OS
    upgrade_available = 0;
    boot();
}
```
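A runnable C sketch of this revised decision, stepping through an update that is never committed. It assumes the firmware checks the conditions before incrementing bootcount and that it can write upgrade_available back, which is exactly the part that needs persistent storage. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical state, named after the flags discussed in this thread. */
struct boot_state {
    unsigned bootcount;
    unsigned bootlimit;
    int upgrade_available;
    const char *os_prefix;           /* known-good OS */
    const char *recovery_os_prefix;  /* holds the pending update */
};

/* Revised selection: bootcount only matters while an upgrade is pending. */
static const char *select_prefix(struct boot_state *s) {
    const char *prefix;
    if (s->upgrade_available == 0) {
        prefix = s->os_prefix;                  /* normal boot */
    } else if (s->bootcount < s->bootlimit) {
        prefix = s->recovery_os_prefix;         /* upgrade pending: boot it */
    } else {
        s->upgrade_available = 0;               /* upgrade failed: stop trying it */
        prefix = s->os_prefix;
    }
    s->bootcount++;                             /* count the attempt; user space resets it */
    return prefix;
}
```

Because the failure branch clears upgrade_available, every later boot takes the first branch and stays on the known-good OS until a new update is installed.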

So it looks like a bootcount may be possible (limited to two bits)

Do you know if this is persistent across power cycles? I’m trying to figure out what happens when you lose power.

In the U-Boot case, you store the U-Boot environment on persistent storage using an A/B strategy as well. This is required for it to be atomic and thus able to handle a power loss while updating variables.

This means that one needs to have two copies of config.txt and cmdline.txt (normal OS and recovery OS). Maybe that was your intention as well, but it was not clear to me from the text.

Thank you so much for following up on this!

No worries, I’m sure you’re busy. I know I certainly have been lately.

I see what you’re saying about Mender not guaranteeing that the “recovery os” is a valid image. That makes sense, but is there any downside to trying to boot it anyway if the primary OS fails? It seems like maybe working is better than definitely not working, but maybe I’m missing something there. Also, part of my goal with this would be to allow this code to perform two functions. Obviously supporting Mender would be the first, but also to allow people not using Mender, or an A/B setup at all, to boot into a recovery OS if they exceed bootlimit. I think this will further our case with the RPF for getting the code added. This way, a “normal” user could ignore upgrade_available (which would have to default to 0), and the same code would handle their recovery needs as well. Does this sound like a reasonable approach? Obviously duplicating this small snippet of code to handle both scenarios separately isn’t going to grow the binary significantly, but I assume memory is at a premium here.

I don’t know for sure how they would be implementing the bootcount, but it sounded to me like it would persist across power cycles. If I had to guess, I’d say it’s probably located in some small bit of on-board EEPROM, but I’m totally guessing. I’ll make sure I ask this in my next GitHub comment.

My understanding is that there would only be one copy of config.txt at a time, but you could perform atomic updates to it using the rename trick (write to a temp file, sync, rename). For cmdline.txt, there would be two different versions located alongside the kernel and other assorted boot files in their respective folders.
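The rename trick can be sketched in C with plain POSIX calls (open, write, fsync, close, rename); the function name and temp-file naming are illustrative. A caveat worth hedging: POSIX guarantees that other processes see either the old or the new file, but whether that also holds across a power loss depends on the filesystem. The Pi boot partition is FAT, which has no journal, so on that partition treat this as best-effort rather than guaranteed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace `path` with `contents` via a temp file on the same filesystem.
   Returns 0 on success, -1 on error (the temp file is removed on failure). */
static int atomic_write(const char *path, const char *tmp_path, const char *contents) {
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(contents);
    const char *p = contents;
    while (len > 0) {                       /* handle short writes */
        ssize_t n = write(fd, p, len);
        if (n < 0) { close(fd); unlink(tmp_path); return -1; }
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) != 0) { close(fd); unlink(tmp_path); return -1; }  /* data on disk first */
    if (close(fd) != 0) { unlink(tmp_path); return -1; }
    if (rename(tmp_path, path) != 0) { unlink(tmp_path); return -1; } /* swap in the new file */
    return 0;
}
```

Some guides also fsync() the containing directory after the rename so that the rename itself is durable; that step is left out here to keep the sketch short.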

Thanks again for your help with this. Let me know if there’s anything I missed.

From a security perspective, if you roll back automatically, someone can force the device to load something “old” by breaking the “new”; that is the only drawback I can think of. Preferably you would want to be able to mark “old” as insecure and to be able to disable the automatic rollback.

Though you could probably just erase the “old” image if you know it is insecure, so there are ways to work around this.

Obviously supporting Mender would be the first, but also to allow people not using Mender, or an A/B setup at all, to boot into a recovery OS if they exceed bootlimit. I think this will further our case with the RPF for getting the code added

Makes sense, and agree that it is better to keep it generic.

My understanding is that there would only be one copy of config.txt at a time, but you could perform atomic updates to it using the rename trick (write to a temp file, sync, rename). For cmdline.txt, there would be two different versions located alongside the kernel and other assorted boot files in their respective folders.

Then we are covered 🙂

Sorry for the slow reply!

Preferably you would want to be able to mark “old” as insecure and to be able to disable the automatic rollback.

That makes sense. You probably wouldn’t even have to erase the image; just change/delete recovery_os_prefix or rename the kernel or something like that. I guess it depends on how thorough you want to be. It would likely require some extra changes in the Mender client either way, though. I’ll mention that it would be nice to make the fallback boot selectable and see what they say.

It sounds like we’re on the same page! I’ll write up a comment on GitHub detailing what we’ve discussed. Hopefully they’ll be agreeable. I really appreciate all your help in sorting through this. Have a great day!


It looks like we’re out of luck here. From the GitHub issue:

My original answer was based on the idea of using another few bits of the very scarce reset-proof state, but that doesn’t survive a power cycle so could only be used with a user-accessible reset signal (removing the need to pull the power cable in the event of a boot failure).

I can’t think of a way around that at the moment, and at this point they aren’t interested in adding support for writing to persistent storage to the FW. I thought there might be a possibility of some sort of hardware bootcounter that would select the proper config via a GPIO, but I’m not sure if that would work, and I haven’t been able to think of a simple way to do that. If anyone has any ideas on how we might be able to make this work without having the FW write to persistent storage, please share them!

Thank you for looking into this!