Yocto image won't boot from SSD

Hi Guys!

I created a core-image-full-cmdline image for the intel-corei7-64 machine and am having big trouble booting it via GRUB (and everything else). The image boots without any trouble from a USB drive, but when I dd it to the SSD it always ends in a kernel panic saying:

VFS: Cannot open root device "sda2" or unknown-block(0,0): error -6
Please append a correct "root=" boot option; here are the available partitions:

0100 4096 ram0
(driver?)
0101 4096 ram1
(driver?)
.
.
.
.
010f 4096 ram15
(driver?)

To me this looks like my SSD is not detected. Even with other GRUB configurations, like root=/dev/mmcblk0p2, I get the same error.
I have tried a lot to solve this, but every time I find a thread about this topic, the error is never actually answered.

I hope somebody can help me with this problem.

Kind regards

Alex

Which Yocto version are you running?

Also, can you share the full version of the list below?

0100 4096 ram0
(driver?)
0101 4096 ram1
(driver?)
.
.
.
.
010f 4096 ram15
(driver?)

It would be interesting to see what device names you have.

Hi mirzak!

Thanks for your quick reply! I am using the thud branch.

The full list looks like this:

0100 4096 ram0
(driver?)
0101 4096 ram1
(driver?)
0103 4096 ram3
(driver?)
0104 4096 ram4
(driver?)
0105 4096 ram5
(driver?)
0106 4096 ram6
(driver?)
0107 4096 ram7
(driver?)
0108 4096 ram8
(driver?)
hub 1-1:1.0: USB hub found
0109 4096 ram9
(driver?)
010a 4096 ram10
(driver?)
010b 4096 ram11
(driver?)
010c 4096 ram12
(driver?)
010d 4096 ram13
(driver?)
010e 4096 ram14
(driver?)
010f 4096 ram15
(driver?)

This looks like the symptoms of this fix,

But this should have been included in thud already. Can you please verify that your bootargs contain the rootwait argument?
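
It should end up in the generated grub.cfg roughly like this (illustrative snippet, the exact file layout may differ):

# rootwait is folded into the kernel arguments via the rootargs variable
set rootargs="rootwait"
set bootargs="${bootargs} ${console_bootargs} ${rootargs}"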

I had to add this manually to the grub.cfg, and now it boots further! The new kernel panic is:

Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.

The next step I would try is adding the microcode.cpio to sda2 and sda3 and adding

init /boot/microcode.cpio

to grub.cfg. Is this the right way to go?

You should not need to add this manually, and I suspect that something is going wrong during the build. That is why it is cascading into further errors.

Can you please share the content of the generated grub.cfg from your device?

Of course I can do that:

# Start of ---------- 00_mender_grubenv_defines_grub.cfg ----------
mender_rootfsa_part=2
mender_rootfsb_part=3
mender_kernel_root_base=/dev/sda
mender_grub_storage_device=hd0
kernel_imagetype=bzImage
# End of ---------- 00_mender_grubenv_defines_grub.cfg ----------
# Start of ---------- 01_console_bootargs_grub.cfg ----------
set console_bootargs="console=tty0,115200n8 console=ttyS0,115200n8 console=ttyO0,115200n8 console=ttyAMA0,115200n8"
# End of ---------- 01_console_bootargs_grub.cfg ----------
set rootargs="rootwait"
# Start of ---------- 05_mender_setup_grub.cfg ----------
# See the fw_printenv script for how this works.

function maybe_pause {
    # By default we do nothing. debug-pause PACKAGECONFIG replaces this so we
    # can pause at strategic places.
    echo
}

# Load environment.

MENDER_ENV1=${prefix}/mender_grubenv1/env
MENDER_ENVPREFIX1=${prefix}/mender_grubenv1/
MENDER_LOCK1=${prefix}/mender_grubenv1/lock
MENDER_LOCKSUM1=${prefix}/mender_grubenv1/lock.sha256sum
MENDER_ENV2=${prefix}/mender_grubenv2/env
MENDER_ENVPREFIX2=${prefix}/mender_grubenv2/
MENDER_LOCK2=${prefix}/mender_grubenv2/lock
MENDER_LOCKSUM2=${prefix}/mender_grubenv2/lock.sha256sum

function mender_check_and_restore_env {
    if ! hashsum --hash sha256 --prefix ${MENDER_ENVPREFIX2} --check ${MENDER_LOCKSUM2}; then
        load_env --skip-sig --file ${MENDER_ENV1} bootcount mender_boot_part upgrade_available
        save_env --file ${MENDER_ENV2} bootcount mender_boot_part upgrade_available
        editing=0
        save_env --file ${MENDER_LOCK2} editing
        if ! hashsum --hash sha256 --prefix ${MENDER_ENVPREFIX2} --check ${MENDER_LOCKSUM2}; then
            echo "Environment 2 still corrupt after attempted restore. Halting."
            halt
        fi
    elif ! hashsum --hash sha256 --prefix ${MENDER_ENVPREFIX1} --check ${MENDER_LOCKSUM1}; then
        load_env --skip-sig --file ${MENDER_ENV2} bootcount mender_boot_part upgrade_available
        save_env --file ${MENDER_ENV1} bootcount mender_boot_part upgrade_available
        editing=0
        save_env --file ${MENDER_LOCK1} editing
        if ! hashsum --hash sha256 --prefix ${MENDER_ENVPREFIX1} --check ${MENDER_LOCKSUM1}; then
            echo "Environment 1 still corrupt after attempted restore. Halting."
            halt
        fi
    fi
}

function mender_save_env {
    # Save redundant environment.
    editing=1
    save_env --file ${MENDER_LOCK2} editing
    save_env --file ${MENDER_ENV2} bootcount mender_boot_part upgrade_available
    editing=0
    save_env --file ${MENDER_LOCK2} editing

    editing=1
    save_env --file ${MENDER_LOCK1} editing
    save_env --file ${MENDER_ENV1} bootcount mender_boot_part upgrade_available
    editing=0
    save_env --file ${MENDER_LOCK1} editing
}

function mender_check_grubenv_valid {
    if [ "${mender_boot_part}" != "${mender_rootfsa_part}" -a "${mender_boot_part}" != "${mender_rootfsb_part}" ]; then
        return 1
    fi

    if [ "${bootcount}" != "0" -a "${bootcount}" != "1" ]; then
        return 1
    fi

    if [ "${upgrade_available}" != "0" -a "${upgrade_available}" != "1" ]; then
        return 1
    fi

    return 0
}

mender_check_and_restore_env

# Now load environment.

# Skipping signatures?? Yes, because these values will change over time, so they
# cannot be signed. There is also no checksum facility that will work for
# changing values. Instead we check their content for validity.
load_env --skip-sig --file ${MENDER_ENV1} bootcount mender_boot_part upgrade_available

if ! mender_check_grubenv_valid; then
    if [ "${check_signatures}" == "enforce" ]; then
        echo "Unverified environment and signatures enabled. Halting."
        halt
    fi
fi

if [ "${upgrade_available}" = "1" ]; then
    if [ "${bootcount}" != "0" ]; then
        echo "Rolling back..."
        if [ "${mender_boot_part}" = "${mender_rootfsa_part}" ]; then
            mender_boot_part="${mender_rootfsb_part}"
        else
            mender_boot_part="${mender_rootfsa_part}"
        fi
        upgrade_available=0
        bootcount=0
    else
        echo "Booting new update..."
        bootcount=1
    fi

    mender_save_env
fi
# End of ---------- 05_mender_setup_grub.cfg ----------
# Start of ---------- 10_bootargs_grub.cfg ----------
set bootargs="${bootargs} ${console_bootargs} ${rootargs}"
# End of ---------- 10_bootargs_grub.cfg ----------
# Start of ---------- 90_mender_boot_grub.cfg ----------
if test -e (${mender_grub_storage_device},gpt${mender_boot_part})/; then
    ptable_type=gpt
else
    ptable_type=msdos
fi

if test -n "${kernel_devicetree}"; then
    if test -e (${mender_grub_storage_device},${ptable_type}${mender_boot_part})/boot/${kernel_devicetree}; then
        devicetree (${mender_grub_storage_device},${ptable_type}${mender_boot_part})/boot/${kernel_devicetree}
    fi
fi

if [ test -n "${mender_rootfsa_uuid}" -a test -n  "${mender_rootfsb_uuid}" ]; then
    if [ "${mender_boot_part}" = "${mender_rootfsa_part}" ]; then
        mender_root="PARTUUID=${mender_rootfsa_uuid}"
    elif [ "${mender_boot_part}" = "${mender_rootfsb_part}" ]; then
        mender_root="PARTUUID=${mender_rootfsb_uuid}"
    fi
else
    mender_root="${mender_kernel_root_base}${mender_boot_part}"
fi

if linux (${mender_grub_storage_device},${ptable_type}${mender_boot_part})/boot/${kernel_imagetype} root=${mender_root} ${bootargs}; then
    maybe_pause "Pausing before booting."
    boot
fi
maybe_pause "Pausing after failed boot."
# End of ---------- 90_mender_boot_grub.cfg ----------
# Start of ---------- 95_mender_try_to_recover_grub.cfg ----------
if [ "${upgrade_available}" == "1" ]; then
    reboot
fi
# End of ---------- 95_mender_try_to_recover_grub.cfg ----------

Hm, I do not see anything wrong with this file.

I would check the boot log again; I suspect that there are other errors hidden there, as this one,

Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.

does not really say what went wrong, just that it was not able to start the init process of the rootfs.
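
For what it is worth, if I expand the variables by hand, with mender_boot_part=2 and a GPT disk the final boot line from 90_mender_boot_grub.cfg should come out roughly as:

# rough expansion, assuming mender_boot_part=2 and the values from 00_mender_grubenv_defines_grub.cfg
linux (hd0,gpt2)/boot/bzImage root=/dev/sda2 console=tty0,115200n8 console=ttyS0,115200n8 console=ttyO0,115200n8 console=ttyAMA0,115200n8 rootwait

which matches the sda2 from your very first panic, so GRUB is at least pointing at the partition it is supposed to.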

I don't know how to get the boot log into a text file. Is it fine for you if I post a picture of it?

I would prefer text, which you can typically get out of a serial console. But if that is not possible, then a picture will have to do :slight_smile:
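
For example, with a USB serial adapter on the host, something like this usually does the trick (device name and baud rate are assumptions, adjust as needed):

# attach to the target's serial console and log the session to screenlog.0
screen -L /dev/ttyUSB0 115200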

Before it takes me hours to get it into a text file :smiley:

Yeah, that log does not give away why it is failing.

I would try to attach the SSD to a PC and make sure that I can mount all the partitions and do a sanity check of content, especially the rootfs partition since this is probably failing for some reason.
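
For example, something along these lines on a Linux PC (the device node below is a placeholder; use whatever the SSD shows up as):

# mount the rootfs partition (partition 2 in your layout) and sanity-check it
sudo mount /dev/sdX2 /mnt
# the second panic was about init, so check that an init and the kernel are actually there
ls -l /mnt/sbin/init /mnt/boot/bzImage
sudo umount /mnt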

I have been doing that for days now… What I find suspicious is that the /boot/ folder in the rootfs does not contain any initrd such as microcode.cpio or initramfs.cpio.gz. Is this normal for a default Mender build?

I have been doing that for days now… What I find suspicious is that the /boot/ folder in the rootfs does not contain any initrd such as microcode.cpio or initramfs.cpio.gz. Is this normal for a default Mender build?

Yes, this is normal in a Mender build.

I am curious: are you trying to integrate Mender into an existing Yocto setup, or is it a "bare" setup with poky/oe-core + mender?

This whole Yocto/Mender thing is an experiment to see if it works with the hardware I currently have available, so it's just a base Yocto setup with Mender. Since I am not using very standard hardware like a BeagleBone or Raspberry Pi, I am starting to believe that the problems are on the hardware side. I am using an x86 IPC with an Intel Atom chip.

One thing I've seen, especially on x86 PC-like systems, is that if there is a USB drive inserted, the device numbering can change. Can you make sure that the SSD is the only block device in the system when you try to boot?

If that turns out to be the case, then you will likely need to enable the mender-partuuid feature to use PARTUUIDs rather than device node names.
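
Roughly along these lines in local.conf (a sketch, see the Mender documentation for the details):

# switch the root device references over to PARTUUIDs
MENDER_FEATURES_ENABLE_append = " mender-partuuid"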

Thanks for your advice!

You are right, it was only booting further because it detected the USB drive as /dev/sda… I now tried it with mender-partuuid enabled, but I still get the kernel panic from my first post. Where exactly do I have to enable mender-partuuid? I put it in MENDER_FEATURES_ENABLE_append in local.conf and added

MENDER_BOOT_PART="/dev/disk/by-partuuid/#########-####-####-####-############"

MENDER_ROOTFS_PART_A="/dev/disk/by-partuuid/#########-####-####-####-############"

MENDER_ROOTFS_PART_B="/dev/disk/by-partuuid/#########-####-####-####-############"

MENDER_DATA_PART="/dev/disk/by-partuuid/#########-####-####-####-############"

with the proper PARTUUIDs, of course.
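
For reference, the PARTUUIDs can be read off the disk with, for example, lsblk or blkid:

# print the partition UUIDs of the SSD (device name as it appears on my system)
lsblk -o NAME,PARTUUID /dev/sda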

You are right, it was only booting further because it detected the USB drive as /dev/sda

Did you remove the USB drive now?

I put it in MENDER_FEATURES_ENABLE_append in local.conf

This should have been enough, as it would generate and assign the appropriate UUIDs during the image build.

Yes, I did. And this led to the kernel panic I described in the beginning. So I did a new build. But without

MENDER_BOOT_PART="/dev/disk/by-partuuid/#########-####-####-####-############"

MENDER_ROOTFS_PART_A="/dev/disk/by-partuuid/#########-####-####-####-############"

MENDER_ROOTFS_PART_B="/dev/disk/by-partuuid/#########-####-####-####-############"

MENDER_DATA_PART="/dev/disk/by-partuuid/#########-####-####-####-############"

set, it leads to a build failure: /dev/mmcblk0p2 Does not contain a valid PARTUUID path, and with MENDER_STORAGE_DEVICE = "/dev/sda" it's the same. Is PARTUUID the right way to work with SSDs in Yocto builds?

set, it leads to a build failure: /dev/mmcblk0p2 Does not contain a valid PARTUUID path, and with MENDER_STORAGE_DEVICE = "/dev/sda" it's the same

Ah, I see now. This is correct. I thought we generated the UUIDs, but you need to provide them, as can be seen here,
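
So with mender-partuuid enabled, it should roughly look like this in local.conf (placeholder UUIDs, use the real ones from your disk):

MENDER_FEATURES_ENABLE_append = " mender-partuuid"
# each partition variable must point at a /dev/disk/by-partuuid/ path
MENDER_BOOT_PART="/dev/disk/by-partuuid/<boot-partuuid>"
MENDER_ROOTFS_PART_A="/dev/disk/by-partuuid/<rootfs-a-partuuid>"
MENDER_ROOTFS_PART_B="/dev/disk/by-partuuid/<rootfs-b-partuuid>"
MENDER_DATA_PART="/dev/disk/by-partuuid/<data-partuuid>"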

Is PARTUUID the right way to work with SSDs in Yocto builds?

It should work with both, but PARTUUID has a couple of benefits, as you are no longer bound to /dev/sdaX names, etc.