Multiple problems with mender-growfs-data, GPT partition and mender-grow-data.service

Hi!

I’m seeing a few errors with our latest Mender 2.2 build.

dmesg shows me this:

GPT:Primary header thinks Alt. header is not at the end of the disk.
GPT:61046783 != 61071359
GPT:Alternate GPT header not at the end of the disk.
GPT:61046783 != 61071359
GPT: Use GNU Parted to correct GPT errors.
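In each GPT line, the first number is the LBA where the primary header expects the backup (alternate) header, and the second is the last LBA the kernel actually sees on the card. The shell calculation below uses only the values from the dmesg output above, plus the 512-byte sector size reported by fdisk -l further down, to show how far apart the two are:

```shell
#!/bin/sh
# Numbers copied from the dmesg lines above ("GPT:61046783 != 61071359").
alt_header_lba=61046783   # where the primary GPT header expects its backup
last_lba=61071359         # last sector the kernel sees on /dev/mmcblk0
sector_size=512           # bytes per sector, as reported by fdisk -l

# Distance between the recorded backup location and the real end of the card.
gap_sectors=$((last_lba - alt_header_lba))
gap_mib=$((gap_sectors * sector_size / 1048576))
echo "backup header is $gap_sectors sectors ($gap_mib MiB) short of the end"
```

This prints that the backup header sits 24576 sectors (12 MiB) before the end of the device. The thread itself does not establish why the image was laid out that way.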

systemctl shows me this:

● mender-grow-data.service - Mender service to grow data partition size
   Loaded: loaded (/lib/systemd/system/mender-grow-data.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2020-06-18 12:05:36 UTC; 54min ago
  Process: 212 ExecStart=/usr/bin/mender-client-resize-data-part (code=exited, status=1/FAILURE)
 Main PID: 212 (code=exited, status=1/FAILURE)

Jun 18 12:05:35 machine mender-client-resize-data-part[212]: and could in certain setups cause problems with:
Jun 18 12:05:35 machine mender-client-resize-data-part[212]: 1) software that runs at boot time (e.g., old versions of LILO)
Jun 18 12:05:35 machine mender-client-resize-data-part[212]: 2) booting and partitioning software from other OSs
Jun 18 12:05:35 machine mender-client-resize-data-part[212]:    (e.g., DOS FDISK, OS/2 FDISK)
Jun 18 12:05:36 machine mender-client-resize-data-part[212]: Command (m for help): The partition table has been altered.
Jun 18 12:05:36 machine mender-client-resize-data-part[212]: Calling ioctl() to re-read partition table
Jun 18 12:05:36 machine mender-client-resize-data-part[212]: fdisk: WARNING: rereading partition table failed, kernel still uses old table: Device or resource busy
Jun 18 12:05:36 machine systemd[1]: mender-grow-data.service: Main process exited, code=exited, status=1/FAILURE
Jun 18 12:05:36 machine systemd[1]: mender-grow-data.service: Failed with result 'exit-code'.
Jun 18 12:05:36 machine systemd[1]: Failed to start Mender service to grow data partition size.

Full log:

-- Logs begin at Thu 2020-06-18 12:05:33 UTC, end at Thu 2020-06-18 12:59:57 UTC. --
Jun 18 12:05:33 machine mender-client-resize-data-part[134]: The number of cylinders for this disk is set to 3786.
Jun 18 12:05:33 machine mender-client-resize-data-part[134]: There is nothing wrong with that, but this is larger than 1024,
Jun 18 12:05:33 machine mender-client-resize-data-part[134]: and could in certain setups cause problems with:
Jun 18 12:05:33 machine mender-client-resize-data-part[134]: 1) software that runs at boot time (e.g., old versions of LILO)
Jun 18 12:05:33 machine mender-client-resize-data-part[134]: 2) booting and partitioning software from other OSs
Jun 18 12:05:33 machine mender-client-resize-data-part[134]:    (e.g., DOS FDISK, OS/2 FDISK)
Jun 18 12:05:34 machine mender-client-resize-data-part[134]: Command (m for help): The partition table has been altered.
Jun 18 12:05:34 machine mender-client-resize-data-part[134]: Calling ioctl() to re-read partition table
Jun 18 12:05:34 machine mender-client-resize-data-part[134]: fdisk: WARNING: rereading partition table failed, kernel still uses old table: Device or resource busy
Jun 18 12:05:34 machine systemd[1]: mender-grow-data.service: Main process exited, code=exited, status=1/FAILURE
Jun 18 12:05:34 machine systemd[1]: mender-grow-data.service: Failed with result 'exit-code'.
Jun 18 12:05:34 machine systemd[1]: Failed to start Mender service to grow data partition size.
Jun 18 12:05:35 machine systemd[1]: Starting Mender service to grow data partition size...
Jun 18 12:05:35 machine mender-client-resize-data-part[212]: The number of cylinders for this disk is set to 3786.
Jun 18 12:05:35 machine mender-client-resize-data-part[212]: There is nothing wrong with that, but this is larger than 1024,
Jun 18 12:05:35 machine mender-client-resize-data-part[212]: and could in certain setups cause problems with:
Jun 18 12:05:35 machine mender-client-resize-data-part[212]: 1) software that runs at boot time (e.g., old versions of LILO)
Jun 18 12:05:35 machine mender-client-resize-data-part[212]: 2) booting and partitioning software from other OSs
Jun 18 12:05:35 machine mender-client-resize-data-part[212]:    (e.g., DOS FDISK, OS/2 FDISK)
Jun 18 12:05:36 machine mender-client-resize-data-part[212]: Command (m for help): The partition table has been altered.
Jun 18 12:05:36 machine mender-client-resize-data-part[212]: Calling ioctl() to re-read partition table
Jun 18 12:05:36 machine mender-client-resize-data-part[212]: fdisk: WARNING: rereading partition table failed, kernel still uses old table: Device or resource busy
Jun 18 12:05:36 machine systemd[1]: mender-grow-data.service: Main process exited, code=exited, status=1/FAILURE
Jun 18 12:05:36 machine systemd[1]: mender-grow-data.service: Failed with result 'exit-code'.
Jun 18 12:05:36 machine systemd[1]: Failed to start Mender service to grow data partition size.

The disk looks like this:

machine:/home/user# fdisk -l
Disk /dev/mmcblk0: 29 GB, 31268536320 bytes, 61071360 sectors
3786 cylinders, 256 heads, 63 sectors/track
Units: sectors of 1 * 512 = 512 bytes

Device       Boot StartCHS    EndCHS        StartLBA     EndLBA    Sectors  Size Id Type
/dev/mmcblk0p1    0,0,2       1023,255,63          1   61046783   61046783 29.1G ee EFI GPT

The system is set up with these variables:

MENDER_STORAGE_TOTAL_SIZE_MB_DEFAULT = "29820"
MENDER_STORAGE_DEVICE = "/dev/mmcblk0"
MENDER_STORAGE_TOTAL_SIZE_MB = "29820"
MENDER_BOOT_PART_SIZE_MB = "128"
MENDER_DATA_PART_SIZE_MB = "1024"

Apparently we could disable the service via

MENDER_FEATURES_DISABLE_append = " mender-growfs-data"

but is this recommended? Any ideas what can be done to fix these errors?

Thanks in advance!


It would be interesting to see if disabling mender-growfs-data resolves the issue. All that feature does is configure the systemd-growfs feature to grow the data partition; we are not directly doing anything with the partition so it’s possibly a systemd issue.

If you know the full size of your storage media, you can simply set the variables you mention so that they fill the entire disk, and then you don’t need the growfs feature.
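To follow this suggestion, the total size can be derived from the card’s byte count. A minimal sketch, assuming (as the byte counts in this thread indicate) that the MENDER_*_MB variables count MiB; the bytes_to_mib helper is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert a device size in bytes into whole MiB,
# assumed here to be the unit of the MENDER_*_MB variables.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# On the device itself, the byte count could come from:
#   blockdev --getsize64 /dev/mmcblk0
size_bytes=31268536320   # value printed by "fdisk -l" in the first post
echo "MENDER_STORAGE_TOTAL_SIZE_MB = \"$(bytes_to_mib "$size_bytes")\""
```

For the card in the first post this yields 29820, which is exactly the MENDER_STORAGE_TOTAL_SIZE_MB already configured there. Cards of the same nominal capacity can differ slightly in size, so for a fleet you would presumably size for the smallest card you expect to ship.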

Drew

There is another similar problem reported here.

The suggestion there is to suppress the error from echo "w" | fdisk ${MENDER_STORAGE_DEVICE}, as it does not seem to affect the end result. I don’t see any problems with this suggestion. Thoughts?
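A minimal sketch of that suggestion as a hypothetical shell wrapper (not the actual mender-client-resize-data-part script): it writes the table with fdisk and treats a non-zero exit as a warning. The function name and the FDISK override are assumptions added for illustration and testing.

```shell
#!/bin/sh
# Hypothetical sketch of "suppress the fdisk error"; NOT the actual
# mender-client-resize-data-part script.
# FDISK may be overridden (e.g. for testing); it defaults to fdisk on PATH.
: "${FDISK:=fdisk}"

write_table_ignoring_reread() {
    dev="$1"
    # Feed a single "w" so fdisk writes its table and exits.
    if ! echo "w" | "$FDISK" "$dev"; then
        # In the logs in this thread, fdisk exits non-zero after "The
        # partition table has been altered", when re-reading the table
        # fails with "Device or resource busy". This branch hides every
        # other failure too, which is the price of the workaround.
        echo "warning: $FDISK exited non-zero for $dev; continuing" >&2
    fi
    return 0
}
```

Usage would be, for example, write_table_ignoring_reread /dev/mmcblk0. Because a genuine write failure is swallowed as well, it may be worth checking the result afterwards, for example with gdisk.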

It would be interesting to know whether sgdisk -e ${MENDER_STORAGE_DEVICE} works on systems where echo "w" | fdisk ${MENDER_STORAGE_DEVICE} fails.

When running sgdisk, I get the following output:

sgdisk -e /dev/mmcblk3

Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.

Non-GPT disk; not saving changes. Use -g to override.

Running sgdisk with the -g option works and converts the disk to GPT. After the conversion, fdisk reports only one partition, but gdisk lists the partitions correctly.

sgdisk -g -e /dev/mmcblk3

Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.

Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.

gdisk /dev/mmcblk3
GPT fdisk (gdisk) version 1.0.4

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): q

What are the ramifications for Mender updates of switching to GPT? Does Mender prefer GPT?

Mender supports either type; there is no difference from Mender’s perspective. Compatibility is sometimes better with MBR, especially on simpler hardware, but GPT is more flexible and is the modern alternative.

Thank you for the clarification.
I ran a few more tests to get some additional data points. In all my cases I’m using U-Boot as the bootloader. I’m on an i.MX6, which is partition-agnostic: its boot ROM does not require a FAT boot partition.
Initially I had Mender generate a “mender-image-sd” image, which resulted in the issues described above.
As a test, I switched to generating a “mender-image-uefi” image, which does create a GPT, but expanding the “data” partition with “fdisk” still hits the same error (similar to the issue reported by deffo). However, running “sgdisk -e /dev/mmcblk3” succeeds without needing “-g”. BTW, to get sgdisk on the device, the “gptfdisk” package needs to be installed.
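Since sgdisk -e succeeded where fdisk failed, a resize step could prefer it whenever it is installed. A hypothetical helper along those lines, assuming sgdisk is available only when the gptfdisk package is installed, and falling back to the echo "w" | fdisk command discussed above otherwise:

```shell
#!/bin/sh
# Hypothetical helper, not part of Mender: move the backup GPT header to the
# end of a device, preferring sgdisk (gptfdisk package) when it is installed.

# Print which tool would be used, so the choice can be checked without
# touching a disk.
pick_gpt_tool() {
    if command -v sgdisk >/dev/null 2>&1; then
        echo "sgdisk"
    else
        echo "fdisk"
    fi
}

relocate_backup_gpt() {
    dev="$1"
    case "$(pick_gpt_tool)" in
        sgdisk)
            # -e (--move-second-header) moves the backup GPT structures to
            # the end of the disk. No -g: that would convert an MBR disk.
            sgdisk -e "$dev"
            ;;
        *)
            # Fall back to writing the table with fdisk, as discussed above.
            echo "w" | fdisk "$dev"
            ;;
    esac
}
```

The helper deliberately omits -g. On an MBR image (mender-image-sd), sgdisk refuses to save without it, and with it the disk is converted to GPT, which is a much bigger change than relocating a header.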

I migrated from Yocto warrior to zeus, and now I see a similar error.
Sometimes the mender-grow-data service runs successfully, but sometimes I get a message similar to the one reported above (I’ll add it to this post when it appears again; right now it seems to be working :confused:).

If I run “fdisk” without any arguments, I get this:

fdisk: bad usage
Try 'fdisk --help' for more information.

Likewise, running echo “w” | fdisk (without a device) returns:

fdisk: bad usage
Try 'fdisk --help' for more information.

fdisk -V returns:

fdisk from util-linux 2.34

fdisk -l returns:

Device         Boot   Start      End  Sectors  Size Id Type
/dev/mmcblk1p1 *      49152   180223   131072   64M  c W95 FAT32 (LBA)
/dev/mmcblk1p2       180224  4046847  3866624  1.9G 83 Linux
/dev/mmcblk1p3      4046848  7913471  3866624  1.9G 83 Linux
/dev/mmcblk1p4      7913472 31116287 23202816 11.1G 83 Linux

My config is:

MENDER_STORAGE_TOTAL_SIZE_MB ?= "4000"
MENDER_DATA_PART_SIZE_MB ?= "128"
MENDER_BOOT_PART_SIZE_MB = "64"
MENDER_IMAGE_BOOTLOADER_BOOTSECTOR_OFFSET = "2"
MENDER_IMAGE_BOOTLOADER_FILE = "u-boot.imx"
MENDER_UBOOT_STORAGE_INTERFACE = "mmc"
MENDER_UBOOT_STORAGE_DEVICE = "1"
MENDER_STORAGE_DEVICE = "/dev/mmcblk1"

With Yocto warrior I never saw this error!

I don’t have sgdisk installed on my device at the moment, but I can try to add it, if required and available for my system.

Also, at every reboot a very long task starts:

*[ ] (1 of 2) A start job is running for…on /dev/mmcblk1p4 (16s / no limit)
data: fsck 20.2% complete…

It takes about a minute. mmcblk1p4 is the data partition.

But I don’t understand why that task runs at every reboot. It slows down the boot of my device; 60 seconds is really too much.

My /etc/fstab is:

cat /etc/fstab
# stock fstab - you probably want to override this with a machine specific one

/dev/root            /                    auto       defaults              1  1
proc                 /proc                proc       defaults              0  0
devpts               /dev/pts             devpts     mode=0620,gid=5       0  0
tmpfs                /run                 tmpfs      mode=0755,nodev,nosuid,strictatime 0  0
tmpfs                /var/volatile        tmpfs      defaults              0  0

# uncomment this if your device has a SD/MMC/Transflash slot
#/dev/mmcblk0p1       /media/card          auto       defaults,sync,noauto  0  0

# Where the U-Boot environment resides; for devices with SD card support ONLY!
/dev/mmcblk1p1       /uboot               auto       defaults,sync         0  2
/dev/mmcblk1p4       /data                auto       defaults              0  2
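One likely reason for the boot-time check: per fstab(5), a non-zero sixth field (fs_passno) requests a filesystem check at boot, and systemd’s fstab generator sets up an fsck unit for such entries. Both /uboot and /data above use 2. A small helper (hypothetical name) that lists the affected entries:

```shell
#!/bin/sh
# Hypothetical helper: print fstab entries whose sixth field (fs_passno) is
# non-zero, i.e. filesystems that are checked at boot.
# Comment lines and lines with fewer than six fields are skipped.
fsck_at_boot() {
    awk '!/^[ \t]*#/ && NF >= 6 && $6 != 0 { print $1, $2, "passno=" $6 }' \
        "${1:-/etc/fstab}"
}
```

On the fstab above it would list /dev/root, /dev/mmcblk1p1 and /dev/mmcblk1p4. Setting the sixth field of the /data line to 0 would skip the boot-time check, at the cost of losing the automatic fsck for that partition.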

Do you have suggestions?

Regarding the suggestion above:

If you know the full size of your storage media you can just set the variables you mention to fill the entire disk and then not need the growfs feature.

Do you think I can disable the growfs feature?

Thanks.

OK, the error appeared again.

During boot, it prints:

[FAILED] Failed to start Mender ser…e to grow data partition size.
See 'systemctl status mender-grow-data.service' for details.

And systemctl status mender-grow-data returns:

* mender-grow-data.service - Mender service to grow data partition size
   Loaded: loaded (/lib/systemd/system/mender-grow-data.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2020-07-22 15:31:04 UTC; 1 weeks 4 days ago
  Process: 116 ExecStart=/usr/bin/mender-client-resize-data-part (code=exited, status=1/FAILURE)
 Main PID: 116 (code=exited, status=1/FAILURE)

Jul 22 15:31:03 mydevice mender-client-resize-data-part[116]: and could in certain setups cause problems with:
Jul 22 15:31:03 mydevice mender-client-resize-data-part[116]: 1) software that runs at boot time (e.g., old versions of LILO)
Jul 22 15:31:03 mydevice mender-client-resize-data-part[116]: 2) booting and partitioning software from other OSs
Jul 22 15:31:03 mydevice mender-client-resize-data-part[116]:    (e.g., DOS FDISK, OS/2 FDISK)
Jul 22 15:31:04 mydevice mender-client-resize-data-part[116]: Command (m for help): The partition table has been altered.
Jul 22 15:31:04 mydevice mender-client-resize-data-part[116]: Calling ioctl() to re-read partition table
Jul 22 15:31:04 mydevice mender-client-resize-data-part[116]: fdisk: WARNING: rereading partition table failed, kernel still uses old table: Device or resource busy
Jul 22 15:31:04 mydevice systemd[1]: mender-grow-data.service: Main process exited, code=exited, status=1/FAILURE
Jul 22 15:31:04 mydevice systemd[1]: mender-grow-data.service: Failed with result 'exit-code'.
Jul 22 15:31:04 mydevice systemd[1]: Failed to start Mender service to grow data partition size.

I think I’ve found the problem.
If I run my device with the development build (as in the previous message), everything is OK, except for the very long fsck task during boot.
With the production build, however, I get the error shown above.
If I run fdisk in a terminal on the production build, I get this:

BusyBox v1.31.0 (2020-07-13 08:59:47 UTC) multi-call binary.

Usage: fdisk [-ul] [-C CYLINDERS] [-H HEADS] [-S SECTORS] [-b SSZ] DISK

In other words, it’s a different fdisk utility: on the production build it comes from BusyBox, whereas on the development build I was using the “original” fdisk from util-linux.
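To check which fdisk an image ships, a hypothetical helper can resolve the fdisk path and look for BusyBox. It assumes that BusyBox applets are installed as symlinks to a binary whose name contains “busybox”. This is common on Yocto images but is an assumption here:

```shell
#!/bin/sh
# Hypothetical check: is the given fdisk (default: the one on PATH) the
# BusyBox applet or a standalone fdisk such as util-linux's?
# Assumes BusyBox applets are symlinks to a binary whose name contains
# "busybox" (common on Yocto images, but not guaranteed).
fdisk_flavor() {
    path="${1:-$(command -v fdisk)}"
    if [ -z "$path" ]; then
        echo "missing"
        return 1
    fi
    # readlink -f follows every symlink to the real file.
    case "$(readlink -f "$path")" in
        *busybox*) echo "busybox" ;;
        *)         echo "standalone" ;;
    esac
}
```

Run without an argument, it inspects the fdisk on PATH. On the production build described above it would be expected to print busybox, and on the development build standalone.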