How to migrate live devices to Mender?

We would like to migrate our devices to be managed by Mender, preferably without having to call them back in since we already shipped several hundred. Trust me, I really regret not thinking this through better before we shipped, but at the time we did not have the right experience or the time to do so. Anyway here I am, trying to set it right.

Currently our devices are running default Raspbian and we manage the software from a private Debian repository. I have seen the new mender-convert tool but it seems I should have used that before we started shipping?

Is it possible to have an unattended procedure that will convert a running Raspbian system to a Mender system? Has anyone ever done this before that’s willing to share his or her expertise on the matter?


Hi @erikhh,

I have seen the new mender-convert tool but it seems I should have used that before we started shipping?

Yes, this is a correct assumption.

Is it possible to have an unattended procedure that will convert a running Raspbian system to a Mender system? Has anyone ever done this before that’s willing to share his or her expertise on the matter?

I have never done this but in theory it might work if really necessary.

There is some complexity to it, and doing a “live migration” opens a window in which devices are vulnerable to power loss or interruptions, which could lead to bricked or broken devices.

Unfortunately there is not a magical script for this :), at least not that I am aware of.

But essentially there are a couple of steps that need to be performed and I would probably do them one at a time as it is not required that they are performed at the same time:

  • You need to re-partition the disk running on your devices in the field

This depends on what your current partitioning schema looks like, but if it is only “boot part” + “rootfs part” (which is the default of Raspbian) then this can be done rather risk free.

This means you need to shrink the root filesystem to free up blocks for new partitions, then add a second root filesystem partition and a data partition (persistent storage).

  • You need to install the Mender client on your running systems

Essentially everything that is done in this script in mender-convert:

  • You need to integrate U-Boot on your running system (highest risk here)

By default, the Raspbian boot process is that the “boot firmware” loads the Linux kernel and the system starts. Mender requires a bootloader to integrate with, which means that you need to change the boot process to “boot firmware” -> U-Boot -> Linux.

Essentially what is done in this script in mender-convert:

Note that the script above integrates a custom U-Boot, which already contains all the necessary Mender parts.

Hope this provides some insights to the effort.

Hi @mirzak,

Those tips are very helpful, thank you so much!

Cheers,
Erik

Hi @erikhh,

This is a pretty hard but interesting problem. :)
One tool that might help as you’ll need to repartition a live system: https://github.com/marcan/takeover.sh

Another option is to rely on application-based updates for the existing devices, and use system updates for the new ones only. The next release of Mender, due in a few months, will add support for application updates through “update modules”, and we will provide some common modules to support package managers, files and container based updates. This is not to replace system updates, but in your case it is easier to “retrofit” (just install Mender as an application, rather than make system-level changes). The development ticket is here: https://tracker.mender.io/browse/MEN-2000 (also see “Issues in Epic”).

In any case, it would be interesting for everyone to learn which approach you choose and your experience along the way!

Hi @eyestein,

I’m currently working on a script to make this happen. My current approach does involve a live re-partitioning of the system. I’m basing it mostly on this quite excellent StackExchange post, but I will definitely check out takeover.sh to see if it has more useful pieces for my puzzle. Thank you very much for that tip!

When I can do the re-partitioning of the live system my assumption is that I can pretty much do whatever I want, so that’s what I’ll do, let me elaborate a bit.

I’m making a migration script, I’ll deliver that script to the devices using our current apt based update mechanism. I’ll make our software on the device schedule the execution of it somewhere in the dark of night. That way it’s not so likely that it will get interrupted. My current plan is to let the script do the following:

  1. Shrink the root fs.
  2. Make a new data partition at the end of the disk.
  3. Re-mount the root fs and copy user data to the data partition, and put the mender /data things in place.
  4. I’ll make binary images of the boot and system partition of our first released yocto/mender image. The migration script can download these and verify their checksums.
  5. Unmount the old root.
  6. Create the boot, sys-a and sys-b partitions.
  7. dd the images to their respective partitions.
  8. Clean up and reboot.
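
Step 4 of the plan above depends on the downloaded images being intact before anything destructive happens. A minimal sketch of such a check follows. It assumes SHA-256 digests and the `sha256sum` tool from coreutils; the thread does not say which checksum was actually used, and the function name `verify_image` is made up for illustration.

```bash
#!/usr/bin/env bash

# Verify a downloaded image against an expected SHA-256 digest.
# Usage: verify_image <file> <expected-sha256-hex>
# Returns 0 on a match, 1 if the file is missing or the digest differs.
verify_image() {
  local file="$1" expected="$2" actual
  [ -f "$file" ] || { echo "ERR: $file is missing" >&2; return 1; }
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" != "$expected" ]; then
    echo "ERR: checksum mismatch for $file" >&2
    return 1
  fi
}
```

A migration script could call `verify_image` for each image and refuse to continue if either call fails, so a truncated download never reaches `dd`.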

The main problem I then have left is the devices that are not on 24/7. But we already have infrastructure in place in our product that informs users when their device is out of date. I plan to leverage that to let the users initiate the upgrade on their own terms. We can surround that with ample messages about being patient and the risk of bricking the device when the process is interrupted.

That’s the basic plan now. I’ll let you know if it actually works, but I need a couple more weeks to make it all and roll it out and such.

Kind regards,
Erik

Quick progress update:

The migration script as I laid out above is working now. It’s executed in two phases. The first moves everything into ramfs and restarts everything that needs to let go of the old root filesystem. The last thing that needs to let go is the script itself, so it kicks off the second part with systemd at the end. The second script does all the copying, partitioning, etc.

On our setup the whole migration process takes about 10 minutes. It will be shorter on smaller SD cards I guess.

We’ve decided to keep our current apt-based update mechanism in place during the rollout of the migration. We’ll use our application infrastructure to remotely trigger the migration in small batches, so we can phase the rollout, keep a close eye on things and make adjustments to the migration script if need be. Should we encounter something unexpected out there, at least we won’t have broken hundreds of devices in a single night.

Cheers,
Erik

Hi erikhh

We also did this for ~350 devices in the field running on Toradex i.MX6 DL (eMMC-based). We used the approach described below:

1.) Upload small rootfs (initramfs)
2.) Boot from initramfs and resize + repartition the disks
3.) Patch u-boot scripts to install new mender patched u-boot bootloader on device
4.) Boot from the old partition
5.) DD new mender image and manually change mender_boot_part through fw_setenv
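
The last step of that list can be sketched roughly as follows. This is only an illustration: `fw_setenv` and the `mender_boot_part` variable are named in the post, but the valid partition numbers (2 and 3 here, matching the sys-a/sys-b layout used elsewhere in this thread) and the `FW_SETENV` override for dry runs are assumptions. A real Mender U-Boot integration may need further variables to be set as well.

```bash
#!/usr/bin/env bash

# Point U-Boot at the given Mender system partition.
# Usage: set_mender_boot_part <partition-number>
# FW_SETENV can be overridden (for example FW_SETENV=echo) for a dry run.
set_mender_boot_part() {
  local part="$1"
  case "$part" in
    2|3) ;;  # assumed layout: system A on partition 2, system B on 3
    *) echo "ERR: expected partition 2 or 3, got '$part'" >&2; return 1 ;;
  esac
  "${FW_SETENV:-fw_setenv}" mender_boot_part "$part"
}
```

Running `FW_SETENV=echo set_mender_boot_part 2` only prints the arguments that would be passed, which is a cheap way to test the surrounding script on a development machine.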


That’s pretty interesting!

Would either of you be willing to share your script / process with the community, @erikhh or @jormenjanssen?

I hope to share the approach and the scripts with you soon, but I cannot give an exact time frame for this.

OK, we’re finally well on our way; most of the migration is behind us now. Let me try to share the process we used.

First of all, this is a complicated process. To catch any errors we might have missed, we needed a way to migrate our devices to Mender gradually, so that we wouldn’t run the risk of bricking all devices in a single night. We used our own application infrastructure to send a “migrate” command to our devices.

Since we really don’t want people to pull the power during the migration, and our devices are usually on 24/7 we decided to only do the actual migrations in the dead of night (2 am local time).

We’ve put binary images of the boot partition and the system partition of our new Mender build up for download. When a device receives the migrate command, it first downloads these two files and verifies their checksums. After downloading, the device waits until it’s 2 am.
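
Waiting for the 2 am window can be done in several ways (a systemd timer, `at`, or a loop in the application). As one hedged sketch, the helper below computes how many seconds remain until the next occurrence of a local wall-clock time. It assumes GNU `date` with its `-d` option; the function name `seconds_until` is invented for this example and is not part of our actual implementation.

```bash
#!/usr/bin/env bash

# Print the number of seconds until the next occurrence of a local time.
# Usage: seconds_until <HH:MM>
seconds_until() {
  local target="$1" now target_epoch
  now=$(date +%s)
  target_epoch=$(date -d "today $target" +%s)
  # If that time has already passed today, aim for tomorrow instead.
  if [ "$target_epoch" -le "$now" ]; then
    target_epoch=$(date -d "tomorrow $target" +%s)
  fi
  echo $(( target_epoch - now ))
}
```

A caller could then run `sleep "$(seconds_until 02:00)"` before starting the first migration script.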

For the actual migration we use two scripts on the device. The first script removes everything not needed for the migration and moves a minimal system onto a RAM disk. Then it makes all processes let go of the root file system. The last step is to kick off the second script, which will then run from the rootfs in RAM.

The second script first shrinks the old root fs to the minimum, which frees up enough space to make the new data partition behind it. Then it creates the data partition and copies all user data, as well as the two downloaded image files, from the old root to the new data partition. It also sets up the Mender config that’s needed on the data partition.

Then it simply deletes the old root and boot partitions and repartitions the disk with a boot, sys-a and sys-b partition. It uses dd to write the two downloaded partition images to boot and sys-a. Then it reboots, and the device is running the new Mender-managed system.

For reference, the two migration scripts are below. I removed the company-sensitive bits; you’ll need to fill in those places yourself. We start the scripts with systemd.
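
The unit files themselves are not shown in this thread. As a rough sketch of what starting the second script with systemd could look like, the helper below writes a oneshot unit named `migrate-2.service` (the name the first script starts). The script path, the description text and the function name `write_migrate_unit` are assumptions; the device argument matches the `$1` that migrate-2.sh expects.

```bash
#!/usr/bin/env bash

# Write a oneshot unit that runs the second-phase script.
# Usage: write_migrate_unit <unit-dir> <script-path> <device>
write_migrate_unit() {
  local unit_dir="$1" script="$2" device="$3"
  cat > "$unit_dir/migrate-2.service" <<EOF
[Unit]
Description=Mender migration, phase two

[Service]
Type=oneshot
ExecStart=$script $device
EOF
}
```

For example, `write_migrate_unit /etc/systemd/system /usr/local/sbin/migrate-2.sh /dev/mmcblk0` followed by `systemctl daemon-reload` would make the unit available before the first script runs. Both paths are hypothetical.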

#!/usr/bin/env bash
# migrate1.sh

set -x

# Just delete all log files to begin with.
rm -rf /var/log/*

# I simply hardcoded all the sectors based on our Yocto image.
BEGIN_SECTOR_DATA_PART=8118272

# Determine the minimal needed space for the old root partition.
MIN_FS_BLOCKS=$(resize2fs /dev/mmcblk0p2 -P | awk -F ':' '{print $2}')
OLD_ROOT_START_SECTOR=$(fdisk -l /dev/mmcblk0|grep /dev/mmcblk0p2|awk '{print $2}')
let "MIN_ROOT_SIZE_BYTE = $MIN_FS_BLOCKS * 4096"
let "MIN_ROOT_SIZE_FS_SECTOR = $MIN_ROOT_SIZE_BYTE / 512"
let "OLD_ROOT_MIN_END_SECTOR = $MIN_ROOT_SIZE_FS_SECTOR + $OLD_ROOT_START_SECTOR + 1"

# Make sure we can free up enough space to make room for the
# data partition at the end of the disk. Or we bail out.
if [ $BEGIN_SECTOR_DATA_PART -gt $OLD_ROOT_MIN_END_SECTOR ]
then
  echo "It should fit!"
else
  echo "$BEGIN_SECTOR_DATA_PART <= $OLD_ROOT_MIN_END_SECTOR"
  echo "ERR: The root partition is too big"
  exit 1 #TODO: Can I do something more intelligent?
fi

# Stop as much as we can
systemctl stop <all services you don't need to migrate>

# Delete things we don't need to complete this process.
# The whole system needs to fit in RAM, so make it as small as you can.
apt-get -y remove <any packages not needed to migrate>
apt-get -y autoremove
apt-get -y clean
rm -rf /lib/modules/4.4.48*
rm -rf /lib/modules/4.9.35+

# Make sure systemd won't interfere with / and /boot from here on.
echo "proc            /proc           proc    defaults          0       0" > /etc/fstab

# Unmount all unused filesystems
umount -a
swapoff -a

# Make a temporary root
mkdir /tmp/tmproot
mount -t tmpfs none /tmp/tmproot
mkdir /tmp/tmproot/{proc,sys,dev,run,usr,var,tmp,oldroot}
cp -ax /{bin,etc,sbin,lib} /tmp/tmproot/
cp -ax /usr/{bin,sbin,lib} /tmp/tmproot/usr/
cp -ax /var/{lib,local,lock,opt,run,spool,tmp} /tmp/tmproot/var/

# Pivot to new root
mount --make-rprivate /
pivot_root /tmp/tmproot /tmp/tmproot/oldroot
for i in dev proc sys run; do mount --move /oldroot/$i /$i; done

# Restart everything that's using /oldroot
systemctl restart <everything still left>
# Restart systemd itself.
systemctl daemon-reexec

# I couldn't get rid of all with systemd so using more force here.
kill -9 $(pidof wpa_supplicant)
kill -9 $(pidof agetty)
kill -9 $(pidof hciattach)
 
sleep 5

# Initiate phase two, this will remove the last hold on /oldroot, this script.
systemctl start migrate-2.service
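
The sector arithmetic at the top of migrate1.sh is easy to get wrong, so here is the same calculation as a small standalone sketch. It assumes a 4096-byte file system block size and 512-byte sectors, as the script does; the function names are made up for illustration. As a worked example, a file system that needs at least 500000 blocks (a hypothetical figure) and starts at sector 137216 occupies 500000 * 4096 / 512 = 4000000 sectors, so it ends at sector 4137217 including the one-sector margin, well before the data partition at 8118272.

```bash
#!/usr/bin/env bash

# Smallest possible end sector of the root partition after shrinking.
# Usage: min_end_sector <min-fs-blocks> <partition-start-sector>
min_end_sector() {
  local blocks="$1" start="$2"
  local fs_block_size=4096 sector_size=512
  # Convert blocks to sectors, offset by the start, add one sector margin.
  echo $(( blocks * fs_block_size / sector_size + start + 1 ))
}

# Succeeds when the shrunken root ends before the data partition begins.
# Usage: fits_before <root-end-sector> <data-start-sector>
fits_before() {
  local end_sector="$1" data_start="$2"
  [ "$data_start" -gt "$end_sector" ]
}
```

With these helpers the bail-out check reads as `fits_before "$(min_end_sector "$MIN_FS_BLOCKS" "$OLD_ROOT_START_SECTOR")" "$BEGIN_SECTOR_DATA_PART" || exit 1`.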
#!/usr/bin/env bash
# migrate-2.sh

set -x

if [ -z "$1" ]
  then
    echo "Target device not provided"
    exit 1
fi

DEVICE=$1
PARTITION_1=p1
PARTITION_2=p2
PARTITION_3=p3
PARTITION_4=p4

# The files that were downloaded before all this started.
BOOT_IMG="uboot.img"
SYS_IMG="sys.img"

# I just read these out by doing fdisk -l on the Yocto image.
BEGIN_SECTOR_BOOT=24576
END_SECTOR_BOOT=106495
BEGIN_SECTOR_SYS_A=106496
END_SECTOR_SYS_A=4112383
BEGIN_SECTOR_SYS_B=4112384
END_SECTOR_SYS_B=8118271
BEGIN_SECTOR_DATA_PART=8118272

# Kill everything that might still have a hold on the old root, never know.
fuser -Mk /oldroot

sleep 5
# Unmount the old root filesystem, so we can do things to it.
umount /oldroot
sleep 5

# Determine the new end sector for the old root partition.
e2fsck -f -a $DEVICE$PARTITION_2
e2fsck -f -y $DEVICE$PARTITION_2
MIN_FS_BLOCKS=$(resize2fs -P $DEVICE$PARTITION_2 | awk -F ':' '{print $2}')
OLD_ROOT_START_SECTOR=$(fdisk -l $DEVICE|grep $DEVICE$PARTITION_2|awk '{print $2}')

# Need to forcefully fix any FS errors that might be there or none of this will work.
e2fsck -f -a $DEVICE$PARTITION_2
e2fsck -f -y $DEVICE$PARTITION_2
OLD_ROOT_FS_BLOCK_SIZE=$(resize2fs -M $DEVICE$PARTITION_2|tail -n2|head -n1|awk '{print $7}')
let "OLD_ROOT_END_SECTOR = (($OLD_ROOT_FS_BLOCK_SIZE * 4096) / 512) + $OLD_ROOT_START_SECTOR + 1"
echo "End sector old root " $OLD_ROOT_END_SECTOR
echo "Start sector n data " $BEGIN_SECTOR_DATA_PART
sleep 5

# Shrink the old root partition to match the shrunken file system.
fdisk $DEVICE <<EOF1
d
2
n
p
2
137216
$OLD_ROOT_END_SECTOR
w
EOF1

sleep 5
partprobe $DEVICE
sleep 5
e2fsck -f -a $DEVICE$PARTITION_2
e2fsck -f -y $DEVICE$PARTITION_2
sleep 5

# Make a new data partition at the end of the disk
fdisk $DEVICE <<EOF2
n
p
4
$BEGIN_SECTOR_DATA_PART

w
EOF2

sleep 5
partprobe $DEVICE
sleep 5

# Put a filesystem on the data partition
mkfs.ext4 -F -q $DEVICE$PARTITION_4

# Mount everything so we can start copying user data.
mkdir -p /newroot/data
mount -o rw $DEVICE$PARTITION_4 /newroot/data/
mount -o rw $DEVICE$PARTITION_2 /oldroot/
mount -o rw $DEVICE$PARTITION_1 /oldroot/boot

# Copy user data
mkdir -p /newroot/data/{etc,mender,temp,u-boot} # Whatever directory structure you need really.

# Copy all your userdata here.

# Copy over the new image files
cp -avx /oldroot/var/cache/migrate/{$BOOT_IMG*,$SYS_IMG*} /newroot/data/temp

# You'll want to keep the network config too.
cp -avx /oldroot/etc/wpa_supplicant/wpa_supplicant.conf /newroot/data/etc/wpa_supplicant.conf

# These you need to make Mender happy. If you leave them out the migration will still succeed,
# but any update done with Mender afterwards will fail. I don't really know what they mean, I just
# copied them off the Yocto image.
echo "device_type=raspberrypi-cm3" > /newroot/data/mender/device_type
echo "/dev/mmcblk0 0x400000 0x4000" > /newroot/data/u-boot/fw_env.config
echo "/dev/mmcblk0 0x800000 0x4000" >> /newroot/data/u-boot/fw_env.config

# Definitely done with the old filesystems now.
umount /oldroot/boot
umount /oldroot
umount /newroot/data
sleep 5

# Delete boot and root partitions
fdisk $DEVICE <<EOF
d
1
d
2
w
EOF

sleep 5

# Create new boot partition, sys a and sys b
fdisk $DEVICE <<EOF
n
p
1
$BEGIN_SECTOR_BOOT
$END_SECTOR_BOOT
t
1
c
a
1
w
EOF

sleep 5

# Create new system partition A
fdisk $DEVICE << EOF
n
p
2
$BEGIN_SECTOR_SYS_A
$END_SECTOR_SYS_A
w
EOF

sleep 5

# Create new system partition B
fdisk $DEVICE << EOF
n 
p
3
$BEGIN_SECTOR_SYS_B
$END_SECTOR_SYS_B
w
EOF

partprobe $DEVICE
sleep 5

# Remount the data partition so we can read the image files.
mount -o rw $DEVICE$PARTITION_4 /newroot/data/

# Write the new partitions from the image
# Boot
zcat /newroot/data/temp/$BOOT_IMG.gz | dd of=$DEVICE$PARTITION_1 bs=1M
# System A
zcat /newroot/data/temp/$SYS_IMG.gz | dd of=$DEVICE$PARTITION_2 bs=1M

# Force checks, repairs, the lot.
fsck -f -a $DEVICE$PARTITION_1
fsck -f -y $DEVICE$PARTITION_1
fsck -f -a $DEVICE$PARTITION_2
fsck -f -y $DEVICE$PARTITION_2

# Clean up
rm -rf /newroot/data/temp
umount /newroot/data

# Fingers crossed, it should come back as new now.
reboot now
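
Because all sector numbers in migrate-2.sh are hard-coded, it can be worth sanity-checking them before rolling the script out. The sketch below is not part of the original scripts and its function names are invented. It checks that consecutive partitions are contiguous and lets you compare partition sizes, on the assumption that the two system partitions should be the same size since either one may receive a full image.

```bash
#!/usr/bin/env bash

# Check that consecutive partitions are contiguous: each one must begin
# exactly one sector after the previous one ends.
# Usage: check_contiguous <begin1> <end1> <begin2> <end2> ...
check_contiguous() {
  local prev_end="" begin end
  while [ "$#" -ge 2 ]; do
    begin="$1"; end="$2"
    shift 2
    [ "$end" -ge "$begin" ] || { echo "ERR: $begin-$end is empty" >&2; return 1; }
    if [ -n "$prev_end" ] && [ "$begin" -ne $(( prev_end + 1 )) ]; then
      echo "ERR: gap or overlap before sector $begin" >&2
      return 1
    fi
    prev_end="$end"
  done
}

# Number of sectors in a partition, both ends inclusive.
# Usage: part_sectors <begin> <end>
part_sectors() {
  echo $(( $2 - $1 + 1 ))
}
```

With the values from the script, `check_contiguous 24576 106495 106496 4112383 4112384 8118271` succeeds, and `part_sectors` reports the same size for system A and system B.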