Gigabyte Brix issues with Mender in combination with Ubuntu Server 19.04

I’m currently working on a project that uses Gigabyte Brix devices (the GB-BLCE-4000C to be exact, see https://www.gigabyte.com/Mini-PcBarebone/GB-BLCE-4000C-rev-10) in combination with Ubuntu Server 19.04. We’ve equipped these systems with 4 GB of RAM, a 120 GB SSD and a USB 4G dongle for connectivity. The processor is an Intel Celeron N4000.

We have these systems working for our application by setting up Ubuntu using the normal procedure. Now we are looking into ways of making them OTA updatable. After some research Mender seemed to be the most mature solution, so I’m currently investigating how to set up the Brix for that use.

I had some success manually converting our base image for use with Mender (I got the demo server running in a VM and managed to get the Brix to register itself with the server). But of course this was only part of what we needed.

Next I tried to use mender-convert to convert the base image of our Ubuntu install into something that would conform to the 4-partition layout that Mender wants to use.

After a number of startup problems with mender-convert (my manual changes to install Mender were causing problems and I had to go back to the base Ubuntu image that we use to configure the Brix for our use) I finally managed to go through the entire processing chain of mender-convert and I was left with an SDIMG that could be placed on the Brix’s SSD using dd. It contained a 512 MB boot partition, two 16 GB rootfs partitions and an 87.5 GB data partition.

After placing the image file on the Brix SSD and rebooting I was confronted with a Grub 2.02 command line prompt, but not our Linux OS. Further investigation using the Grub prompt showed that there were indeed 4 partitions on the SSD, (hd0, msdos1) to (hd0, msdos4). Using the available commands I could see that msdos1 contained the boot partition (with base dir EFI), msdos2 and msdos3 contained our Ubuntu rootfs and msdos4 contained the data partition (with mender directory).

Using some grub commands I even managed to boot into our Ubuntu rootfs on msdos2. Looking at the contents of the EFI base directory on msdos1 I saw that there were two subdirectories there, “BOOT” and “Ubuntu”. The Mender specific stuff was located in “BOOT”, while “Ubuntu” seemed to contain the original boot files from our install (verified by comparing them to the ones on the original boot partition).

Using the grub.cfg that was present on the EFI/BOOT directory with the command configfile would result in a system shutdown.

Looking at the configuration options available in the “mender_convert_config” and “mender_grub_config” files used by the mender-convert tool and trying several variations of what looked like relevant settings did not change the behavior of the resulting SDIMG file: we always ended up in the Grub 2.02 prompt!

Which was strange because the latest version of mender-convert had switched to Grub 2.04!

As a final test I booted the Brix in its UEFI shell to see if that would help us out. This gave me a fs0: drive that mapped to the boot partition. Going to the EFI/BOOT directory and running the BOOTX64.EFI executable led to a successful boot of our Ubuntu rootfs on partition 2! I even saw some feedback being printed in the form of a Grub message line and two “lock” lines before the Linux boot sequence took over.

Unfortunately I had removed our USB 4G dongle (because of lack of USB ports for keyboard and other USB devices), so I couldn’t connect to the internet.

So I removed some unused USB device and plugged the 4G dongle back in and rebooted the Brix back to the UEFI prompt to repeat my actions to boot into the Ubuntu rootfs. Only this time I was greeted with a Grub 2.04 prompt. Turns out that the 4G Dongle also has some form of mass storage inside of it and Grub had assigned it to hd0, moving our SSD down to hd1. So the boot no longer worked because the rootfs was now on a different (incorrect) drive!

So I now have two problems:

  1. why doesn’t the mender-convert generated image boot through to the active rootfs by itself, without me having to do the trick via the UEFI shell?
  2. if that is fixed, how do I prevent or circumvent the boot failing because the 4G dongle occupies the hd0 device slot in Grub? Is this something that can be done with settings for Mender config files? Strangely enough this was not an issue for the original bootloader, but that used the UUID of the partition to select the rootfs to boot (which won’t work because mender uses an identical UUID for both rootfs partitions)

Any help will be greatly appreciated! If you need more information I’ll gladly supply it.

It might be a setting in the UEFI firmware, that the Ubuntu directory is prioritized before the BOOT folder. Does it help to delete the former from the 1st partition?

This is a known problem, and the key issue is exactly that GRUB can use UUIDs of filesystems, but not of partitions. And Mender would need the latter, as filesystem UUIDs are neither stable nor unique in Mender context. This is something that could probably quite easily be implemented in GRUB, but no one has taken on the task yet. There is a task for it in our bug tracker.

There is no out-of-the-box workaround, but it should not be too hard to tweak the grub scripts to look for some key file on the available hard drives, and then assign the right one.
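To make the key-file idea a bit more concrete, here is a minimal sketch in plain shell rather than GRUB script, so it can be tried on any Linux machine. It loops over candidate locations and picks the first one that holds a marker file. The marker name mender_marker and the directory names are hypothetical, and each directory only stands in for one drive. GRUB itself has a search --file command with a --set option that does a comparable lookup, but I have not tried that in the Mender boot script.

```shell
#!/bin/sh
# Print the first directory that contains the marker file.
# Each directory stands in for one drive; all names are hypothetical.
find_marked_dir() {
    marker=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$marker" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Demo: three fake "drives", only the last one carries the marker.
tmp=$(mktemp -d)
mkdir "$tmp/hd0" "$tmp/hd1" "$tmp/hd2"
touch "$tmp/hd2/mender_marker"
found=$(find_marked_dir mender_marker "$tmp/hd0" "$tmp/hd1" "$tmp/hd2")
echo "marker found on: ${found##*/}"
rm -rf "$tmp"
```

The point of the design is that the drive is identified by its content, not by the index GRUB happens to assign to it.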

Ok, removing the Ubuntu subdirectory has resolved the boot to Grub issue. The image now boots through to the correct partition if I leave the 4G dongle unplugged. Maybe a suggestion for the mender-convert tool to remove the old boot subdirectory for Ubuntu during conversion?

There might be a possibility to disable the built-in “driver CD” of the 4G dongle, searching the internet has shown possibilities for this for similar Huawei 4G dongles. We are also looking at different 4G dongles that don’t have this “extra” functionality.

If the task that is linked in your response is live/resolved soon, I’m willing to try it out to see if it fixed our problem.

I’m not at home in the capabilities of Grub, so going that route would require some serious investigation on my part. Not sure if I’m going to be able to invest the required time for that.

Thanks at least for the advice to solve the first issue!

I’ve been doing some experiments with Grub and booting with the 4G USB stick inserted. I see that the internal SSD has been shifted to (hd2) because the stick introduced both an (hd0) and a (hd1) drive in the mix, with (hd0) being presented as a drive with ISO9660 filesystem (basically it imitates a CD) and (hd1) being an invalid drive (ls (hd1) returns an invalid sector size error, most probably because it tries to access the integrated SDCard reader which has no card inserted).

Looking at the environment variables that are set I see that all Mender related ones seem to be aware that their files are stored on (hd2,1), which makes me wonder if it is possible to change the grub script in such a way that it also uses the boot drive reference (called prefix in the grub script) in a way that would allow it to modify the partition part from 1 to the relevant rootfs partition (so 2 or 3)?

That way there is no need for specifying UUIDs of filesystems or partitions and the grub config would still be robust against drive letter shifts.

Since I’m not an expert on what can and can’t be done in Grub2 scripts (total noob here), maybe someone with more advanced knowledge of Grub2 can tell me if something like that is possible?

– edit –

I’ve looked around on the internet and it seems that there might be a way to do this by using the regexp command. According to a post here:

it should be possible to extract the first part of the string of either prefix or root (that contains only the drive/partition pair, so should be less error prone to process that variable) using a regular expression.

Testing some regexes online (hopefully using the same syntax as the Grub2 regexp command uses), I found one that will capture the entire string up to the partition number. It is:

.\w*.\D+

The result of this pattern for the “(hd0, msdos1)” string is everything up to the 1 (so “(hd0, msdos” is the result for this case). I’ve also tried this out for other variations like “(hd0, 1)”, “(hd0,1)”, “(hd0,msdos1)”, “(hda, msdos1)” and all those worked. I don’t know if there are other variants that Grub2 will use for other device types, maybe someone with more knowledge of Grub2 can check if the pattern would cover all those cases.

If it is possible to use the “regexp -s varname pattern subject” command in the grub.cfg file to capture the first part of the boot drive, it should be possible with the string composing functionality to create the correct partition “id” to use for booting to the active partition as set by mender.
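As a rough cross-check of the idea outside Grub, the capture and the string composition can be sketched in plain shell. This is only a sketch under assumptions: the function names are made up for illustration, and because I don’t know which escapes (such as \D) the Grub2 regexp command accepts, the pattern below is rewritten with POSIX bracket expressions instead of \w and \D.

```shell
#!/bin/sh
# Strip the trailing partition number from a GRUB-style device string.
# Uses POSIX bracket expressions only (no \w or \D).
device_prefix() {
    printf '%s\n' "$1" | sed -E 's/^(\(?[[:alnum:]]+, ?[^0-9]*)[0-9]+\)?$/\1/'
}

# Build the device string for another partition on the same drive,
# closing the parenthesis again if the input had one.
with_partition() {
    case $1 in
        "("*) printf '%s%s)\n' "$(device_prefix "$1")" "$2" ;;
        *)    printf '%s%s\n'  "$(device_prefix "$1")" "$2" ;;
    esac
}

# Try the variants mentioned above, switching each one to partition 2.
for dev in '(hd0, msdos1)' '(hd0, 1)' '(hd0,1)' '(hd0,msdos1)' '(hda, msdos1)'; do
    printf '%-15s -> %s\n' "$dev" "$(with_partition "$dev" 2)"
done
```

Whether the same bracket-expression pattern behaves identically inside Grub2 is something that would still need to be tested on the device.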

I haven’t been able to test the above solution because when I get dropped in the Grub 2.04 shell it seems that a number of commands (including regexp) are not available. I guess that someone has decided to reduce the size of the Grub2 interpreter for use with Mender.

Hi @PJK I don’t think we deliberately remove features from the build-time configuration but rather just use the defaults. I’ve been working on a more robust fix for this where grub actually provides a variable specifically for the device grub itself was launched from. We should then be able to build up the rest of the paths we need from that. Unfortunately I have not had time to get back to this recently and it will likely be several weeks before I have the time.

If you work out a solution with regex please post it here. We would love to see it.

Drew

Hi @drewmoseley. I’ve added the regexp command to the grub.cfg script used by Mender and I get the message “unknown command” when it tries to execute it. I’ve noticed that other important Grub2 commands (like help) also give this message when I try them out on the command line. The list of known commands I see when pressing tab in the command shell is also much shorter than the one I saw when the boot into the Grub 2.02 shell happened (when the Ubuntu directory was still a part of /EFI). So it seems that the regexp command is not built into the Grub 2.04 shell, or something is missing that added that command to the Grub 2.02 shell. Maybe in order for these commands to be built into Grub2 a change from default to a different level is needed?

I’ll try and see if I can find out how mender-convert builds/uses grub2.

This is probably the appropriate location,

And specifically which modules are built-in is listed here,

So should be fairly straight forward to add regexp command if necessary.

@mirzak: Thanks for the links. From the looks of it it seems that the Grub2 used by mender is somewhat slimmed down. I’ll see if I can create a less slim version and integrate that in our current boot image.

@mirzak
I’ve tried to change the script to include the required regexp module in the list of GRUB_MODULES, but the resulting BOOTX64.EFI file that was created led to a halted Grub2 on startup (it just shows a line cursor and nothing happens). Are there any dependent modules that need to be included besides regexp for this to work correctly? Is there a guide out there somewhere that can help me determine this?

Ok, I’ve built the default Grub2 and that one also halts with a bar cursor, so something is wrong in the build scripts (or something extra needs to be done). I’ve also built an x86_64 Grub2 from the sources available at the GNU website, following instructions available on the internet. After installing that version of the BOOTX64.EFI file I got the same results.

Could it be that the 2.02 version of the file that is built does not conform to the other .EFI files that are placed in /EFI/BOOT?

I recently built and used GRUB successfully on ARM, using these build commands:

./bootstrap
./configure --with-platform=efi --host=arm-linux-gnueabihf
make
grub-mkimage -p /efi/boot -d grub-core -o bootarm.efi -O arm-efi <MODULES>

As mentioned, this is ARM, but you can just change or remove the “arm” parts where appropriate.

Notice in particular the grub-mkimage -p argument. This is important, and indeed the bootloader will stop at the prompt without it.

Also note that I used GRUB 2.04, and I know that on 2.02, you need to replace bootstrap with autogen.sh.

<MODULES> is the list of modules, which you are modifying. The original list I used is:

boot linux ext2 fat serial part_msdos part_gpt normal iso9660 configfile search loadenv test cat echo gcry_sha256 halt hashsum loadenv sleep reboot test
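Since this list is the part being modified, a small helper can keep it tidy: the list as posted contains loadenv and test twice. The sketch below appends regexp and drops repeats. The function name dedupe_words is made up for illustration, and I have not checked whether grub-mkimage actually minds duplicate module names, so treat the cleanup as cosmetic.

```shell
#!/bin/sh
# Module list as posted above, duplicates included.
MODULES="boot linux ext2 fat serial part_msdos part_gpt normal iso9660 configfile search loadenv test cat echo gcry_sha256 halt hashsum loadenv sleep reboot test"

# Print each word once, keeping first-seen order, space-separated.
# $1 is left unquoted on purpose so the shell splits it into words.
dedupe_words() {
    printf '%s\n' $1 | awk '!seen[$0]++' | paste -sd ' ' -
}

# Append an extra module (regexp here) and drop repeats.
MODULE_LIST=$(dedupe_words "$MODULES regexp")
echo "$MODULE_LIST"

# The list would then be passed unquoted so it splits into arguments, e.g.:
#   grub-mkimage -p /efi/boot -d grub-core -o bootarm.efi -O arm-efi $MODULE_LIST
```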

@kacf I’ve followed your steps (modified for the x86_64 architectures) and had no success with the resulting .EFI file. The only difference between our two build steps was that I didn’t use the --host option for ./configure and that the -p option for the grub-mkimage used /foobar (came from an example I found on the internet). But after copying the file to the boot partition of my “mender ready” Brix machine, it still ends up with a screen with the bar cursor in the upper left corner (so no command line or boot-through).

Is there a special way you place your generated boot file on the boot partition? Because up to now I’ve just been booting a USB stick based Ubuntu and copying the file from another USB stick to the boot partition (having copied the original BOOTX64.EFI file so that I can get back).

I’ve done some testing with copying back the original BOOTX64.EFI copy to see if that leads to a booting system, which it does, so the issue seems to be with the generated BOOTX64.EFI.

My build steps (on a Ubuntu 18.04 LTS VM in virtualbox) are:

./autogen.sh
./configure --target=x86_64 --with-platform=efi CPPFLAGS=-Wno-error=unused-value --host=amd64-linux-gnu
make clean
make
./grub-mkimage -d ./grub-core -o bootx64.efi -O x86_64-efi -p /efi/boot linux serial part_msdos part_gpt fat ntfs normal exfat iso9660 configfile search ls echo cat cpuid loadenv sleep reboot test hashsum loopback regexp read bsd probe ext2 gcry_sha256

Any obvious errors?

That looks correct to me. But the problem could also be with the boot scripts. If you first mount your boot partition locally (through loopback or from the memory card you are using), could you post the output from the following two commands:

find /mnt/boot
cat /mnt/boot/efi/boot/grub.cfg

I’m assuming that /mnt/boot is your mountpoint in this case.

I found the blocking issue. The golden image used for Brix was created with a mender-convert version that used the Grub 2.04 build and not the Grub 2.02 build that you kept referring to. After building my enhanced Grub 2.04 BOOTX64.EFI and copying that one to the Brix boot partition I now have a working Grub again and one with RegExp as extra command.

Now I need to figure out how to use regexp for my boot drive extraction bit!

Just added my solution for getting the boot drive from the root variable using regexp and it works like a charm.

I’ve added the following line just below the “# load environment” section:

regexp --set "mender_grub_storage_device" "(\w+)" "${root}"

Now it doesn’t matter what other mass storage devices are plugged into the USB ports; it will boot to the currently active rootfs partition.
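For anyone who wants to see what that capture does without rebooting a device, here is a rough shell stand-in. It is only an approximation under assumptions: as far as I understand, regexp --set without an index stores the first capture group, and “(\w+)” therefore picks up the first run of word characters in root. The function name first_word and the two sample values are made up for illustration.

```shell
#!/bin/sh
# Rough shell stand-in for: regexp --set var "(\w+)" "${root}"
# Prints the first run of word characters (letters, digits, underscore).
first_word() {
    printf '%s\n' "$1" | sed -E 's/^[^[:alnum:]_]*([[:alnum:]_]+).*$/\1/'
}

# The SSD as Grub might name it without and with the dongle plugged in.
for root in 'hd0,msdos1' 'hd2,msdos1'; do
    printf 'root=%s -> mender_grub_storage_device=%s\n' "$root" "$(first_word "$root")"
done
```

Because the drive name is taken from root at boot time instead of being hard-coded, the partition number that Mender selects is always applied to whichever drive Grub was actually started from.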

The only strange thing I do see is that I get the message “error: variable prefix not set” during the Grub section (between the Grub greeting and the two lock messages). I’ve added an echo statement for prefix just before the “# load environment” section to print the actual value, and that didn’t show anything out of the ordinary (it shows the correct prefix). Looking online for this error I see posts saying that this can happen but isn’t something to worry about.

If anybody has tips or tricks to avoid/remove this error I’ll be grateful as it does give you the impression that something is wrong.

Hi, just a small check before I start the final configuration of our Mender client devices with Mender 2.3 (took some time because we ran into some issues getting our final server infrastructure in place after having tested the preliminary client install against both the initial demo Mender server and then a trial pro Mender server).

Do I still need the Grub workaround using regexp with Mender 2.3? Or has the fix for the boot device determination that @drewmoseley was working on made it into that version (or if you’ve integrated my method instead, that is also fine)?

That would save me some work with the whole setup process tomorrow.

We do not always release the tooling (mender-convert/Yocto) together with the top level releases.

The regexp fix was integrated here, https://github.com/mendersoftware/grub-mender-grubenv/commit/9f1621b2f791a7f6a2b0faf37e932d0f3c5507ed. This component is not versioned anymore, meaning that we do not do releases of it and just reference it by git revision.

But the above should be included in the upcoming release of mender-convert, currently in beta, https://github.com/mendersoftware/mender-convert/releases/tag/2.1.0b1, you can see the revision here.

But if I clone the current mender-convert version from git, does the fix come with it?

If you do:

git clone https://github.com/mendersoftware/mender-convert.git -b 2.1.0b1

then yes. Or you can also use the master branch which should have the same fix.

Thanks for the info, that saves me so much manual work!