How to migrate live devices to Mender?

We would like to migrate our devices to be managed by Mender, preferably without having to call them back in since we already shipped several hundred. Trust me, I really regret not thinking this through better before we shipped, but at the time we did not have the right experience or the time to do so. Anyway here I am, trying to set it right.

Currently our devices are running default Raspbian and we manage the software from a private Debian repository. I have seen the new mender-convert tool but it seems I should have used that before we started shipping?

Is it possible to have an unattended procedure that will convert a running Raspbian system to a Mender system? Has anyone ever done this before that’s willing to share his or her expertise on the matter?

Hi @erikhh,

I have seen the new mender-convert tool but it seems I should have used that before we started shipping?

Yes, this is a correct assumption.

Is it possible to have an unattended procedure that will convert a running Raspbian system to a Mender system? Has anyone ever done this before that’s willing to share his or her expertise on the matter?

I have never done this but in theory it might work if really necessary.

There is some complexity to it, and doing a “live migration” will open a window in which devices are vulnerable to power loss or interruptions, which could lead to bricked devices.

Unfortunately there is not a magical script for this :), at least not that I am aware of.

But essentially there are a couple of steps that need to be performed and I would probably do them one at a time as it is not required that they are performed at the same time:

  • You need to re-partition the disk running on your devices in the field

This depends on what your current partitioning scheme looks like, but if it is only a boot partition plus a rootfs partition (which is the Raspbian default), then this can be done with fairly little risk.

This means that you need to shrink the root filesystem to free up blocks for new partitions, then add a second root filesystem partition and a data partition (persistent storage).

  • You need to install the Mender client on your running systems

Essentially everything that is done in this script in mender-convert:

https://github.com/mendersoftware/mender-convert/blob/master/convert-stage-4.sh

  • You need to integrate U-Boot on your running system (highest risk here)

By default, the Raspbian boot process is that the “boot firmware” loads the Linux kernel and the system starts. Mender requires a bootloader to integrate with, which means that you need to change the boot process to “boot firmware” → U-Boot → Linux.

Essentially what is done in this script in mender-convert:

https://github.com/mendersoftware/mender-convert/blob/master/rpi-convert-stage-5.sh

Note that the script above integrates a custom U-Boot, which already has all the necessary Mender parts in it.
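Before attempting any of these steps, it may be worth confirming on each device that the card really has the stock two-partition layout. The following is a minimal read-only sketch, assuming `lsblk` is available and the card is `/dev/mmcblk0`; the helper names `count_partitions` and `has_stock_layout` are made up for illustration.

```shell
# Count partition entries in `lsblk -n -l -o NAME,TYPE` output read from stdin.
count_partitions() {
  awk '$2 == "part" { n++ } END { print n + 0 }'
}

# Succeed only if the listing on stdin shows exactly two partitions
# (stock Raspbian: boot + rootfs).
has_stock_layout() {
  [ "$(count_partitions)" -eq 2 ]
}

# Usage on a device (read-only, nothing is modified):
#   lsblk -n -l -o NAME,TYPE /dev/mmcblk0 | has_stock_layout || echo "unexpected layout"
```

Devices that fail this check probably need a closer look by hand before any repartitioning is attempted.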

Hope this provides some insights to the effort.

Hi @mirzak,

Those tips are very helpful, thank you so much!

Cheers,
Erik

Hi @erikhh,

This is a pretty hard but interesting problem. :slight_smile:
One tool that might help, as you’ll need to repartition a live system: https://github.com/marcan/takeover.sh

Another option is to rely on application-based updates for the existing devices, and use system updates for the new ones only. The next release of Mender, due in a few months, will add support for application updates through “update modules”, and we will provide some common modules to support package managers, files and container-based updates. This is not meant to replace system updates, but in your case it is easier to “retrofit” (just install Mender as an application, rather than make system-level changes). The development ticket is here: https://tracker.mender.io/browse/MEN-2000 (also see “Issues in Epic”).

In any case, it would be interesting for everyone to learn which approach you choose and your experience along the way!

Hi @eyestein,

I’m currently working on a script to make this happen. My current approach does involve a live re-partitioning of the system. I am basing it mostly on this quite excellent StackExchange post, but I will definitely check out takeover.sh to see if it has more useful pieces for my puzzle. Thank you very much for that tip!

Once I can re-partition the live system, my assumption is that I can pretty much do whatever I want, so that’s what I’ll do. Let me elaborate a bit.

I’m making a migration script that I’ll deliver to the devices using our current apt-based update mechanism. I’ll make our software on the device schedule its execution somewhere in the dark of night, so that it’s less likely to get interrupted. My current plan is to let the script do the following:

  1. Shrink the root fs.
  2. Make a new data partition at the end of the disk.
  3. Re-mount the root fs and copy user data to the data partition, and put the mender /data things in place.
  4. I’ll make binary images of the boot and system partitions of our first released Yocto/Mender image. The migration script can download these and verify their checksums.
  5. Unmount the old root.
  6. Create the boot, sys-a and sys-b partitions.
  7. dd the images to their respective partitions.
  8. Clean up and reboot.
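The download-and-verify step in the plan could look roughly like this. It is only a sketch: the thread does not say which checksum algorithm is used, so SHA-256 is an assumption here, and the function name `verify_image`, the URL and the digest are placeholders.

```shell
# verify_image FILE EXPECTED_SHA256
# Returns 0 only if FILE exists and its SHA-256 digest matches EXPECTED_SHA256.
verify_image() {
  file=$1
  expected=$2
  [ -f "$file" ] || return 1
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  [ "$actual" = "$expected" ]
}

# Usage (URL and digest are placeholders, not real values):
#   wget -c -O /var/cache/migrate/sys.img.gz "https://example.com/sys.img.gz"
#   verify_image /var/cache/migrate/sys.img.gz "<expected digest>" || exit 1
```

Doing this check before any partition is touched means that a failed download only costs a retry, not a device.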

The main problem I have then left is the devices that are not on 24/7. But we already have infrastructure in place in our product that will inform users when their device is out of date. I plan to leverage that to let the users initiate the upgrade on their own terms. We can surround that with ample messages about being patient and the risk of bricking the device when the process is interrupted.

That’s the basic plan for now. I’ll let you know if it actually works, but I need a couple more weeks to build it all and roll it out.

Kind regards,
Erik

Quick progress update:

The migration script as I laid out above is working now. It’s executed in two phases: the first moves everything into ramfs and restarts everything that needs to let go of the old root filesystem. The last thing that needs to let go is the script itself, so it kicks off the second part with systemd at the end. The second script does all the copying, partitioning, etc.

On our setup the whole migration process takes about 10 minutes. It will be shorter on smaller SD cards I guess.

We’ve decided to keep our current apt-based update mechanism in place during the rollout of the migration. We’ll use our application infrastructure to remotely trigger the migration in small batches, so we can phase the rollout, keep a close eye on things and adjust the migration script if need be. Should we encounter anything unexpected out there, at least we won’t have broken hundreds of devices in a single night.

Cheers,
Erik

Hi erikhh

We also did this for ~350 devices in the field running on Toradex i.MX6 DL (eMMC based). We used the approach described below:

1.) Upload small rootfs (initramfs)
2.) Boot from initramfs and resize + repartition the disks
3.) Patch the U-Boot scripts to install the new Mender-patched U-Boot bootloader on the device
4.) Boot from the old partition
5.) dd the new Mender image and manually change mender_boot_part through fw_setenv
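The last step, switching the boot partition by hand, might be sketched as follows. This assumes `fw_setenv` from the U-Boot tools is installed and configured on the device. Setting `mender_boot_part_hex` alongside `mender_boot_part` is an assumption about the Mender U-Boot integration in use, and `set_boot_part` and `DRY_RUN` are hypothetical names.

```shell
# set_boot_part PARTNUM
# Points the Mender U-Boot environment at root partition PARTNUM.
# With DRY_RUN=1 the commands are only printed, not executed.
set_boot_part() {
  part=$1
  hex=$(printf '%x' "$part")
  for pair in "mender_boot_part $part" "mender_boot_part_hex $hex"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "fw_setenv $pair"
    else
      # $pair is intentionally unquoted so it splits into name and value.
      fw_setenv $pair || return 1
    fi
  done
}

# Usage: print what would be done for partition 2, then do it for real.
#   DRY_RUN=1 set_boot_part 2
#   set_boot_part 2
```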

That’s pretty interesting!

Would either of you be willing to share your script / process with the community, @erikhh or @jormenjanssen?

I hope to share the approach and the scripts with you soon, but I cannot give an exact time frame for this.

OK, we’re finally well on our way; most of the migration is behind us now. Let me try to share the process we used.

First of all, this is a complicated process. To catch any errors we might have missed, we needed a way to migrate our devices to Mender gradually, so that we wouldn’t run the risk of bricking all devices in a single night. We used our own application infrastructure to send a “migrate” command to our devices.

Since we really don’t want people to pull the power during the migration, and our devices are usually on 24/7, we decided to only do the actual migrations in the dead of night (2 am local time).

We’ve put binary images of the boot partition and the system partition of our new Mender build up for download. When a device receives the migrate command, it first downloads these two files and verifies their checksums. After downloading, the device waits until it’s 2 am.
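The "wait until 2 am" part can be done with plain shell arithmetic. The thread does not show how it was implemented, so this is just one possible sketch. The function name `seconds_until_hour` is hypothetical, and daylight-saving changes are ignored.

```shell
# seconds_until_hour TARGET_HOUR NOW_H NOW_M NOW_S
# Prints the number of seconds from the given wall-clock time until the next
# TARGET_HOUR:00:00. Daylight-saving changes are not taken into account.
seconds_until_hour() {
  target=$(( $1 * 3600 ))
  # Strip one leading zero so that "08" is not parsed as an octal number.
  h=${2#0}; m=${3#0}; s=${4#0}
  now=$(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
  diff=$(( target - now ))
  # If the target time has already passed today, wait for tomorrow's.
  [ "$diff" -gt 0 ] || diff=$(( diff + 86400 ))
  echo "$diff"
}

# Usage: sleep until 02:00 local time, then start the migration.
#   sleep "$(seconds_until_hour 2 $(date '+%H %M %S'))"
```

Called at 23:00:00 this prints 10800 (three hours); called at exactly 02:00:00 it prints 86400, so the migration waits for the next night.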

For the actual migration we use two scripts on the device. The first script removes everything not needed for the migration and moves a minimal system onto a RAM disk. Then it makes all processes let go of the root file system. The last step is to kick off the second script, which then runs from the rootfs in RAM.

The second script first shrinks the old root fs to the minimum, which frees up enough space to make the new data partition behind it. Then it creates the data partition and copies all user data from the old root to the new data partition, along with the two downloaded image files. It also sets up the Mender config that’s needed on the data partition.

Then it simply deletes the old root and boot partitions and repartitions the freed space with boot, sys-a and sys-b partitions. It uses dd to write the two downloaded partition images to boot and sys-a. Then it reboots, and the device is running the new Mender-managed system.

For reference, the two migration scripts are below. I removed the company-sensitive bits; you’ll need to fill those places in yourself. We start the scripts with systemd.
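The systemd unit itself is not shown in the thread. A guess at a minimal unit for the second phase, written out by a small helper, might look like this. Everything here is an assumption: the helper name `write_migrate2_unit`, the script location `/usr/sbin/migrate-2.sh` (chosen because `/usr/sbin` and `/etc` are among the directories the first script copies into the RAM root) and the choice of `Type=simple`.

```shell
# write_migrate2_unit DIR DEVICE
# Writes a minimal migrate-2.service into DIR (normally /etc/systemd/system).
write_migrate2_unit() {
  dir=$1
  device=$2
  cat > "$dir/migrate-2.service" <<EOF
[Unit]
Description=Migration to Mender, phase two

[Service]
# Type=simple lets "systemctl start" return immediately, so the first
# script can exit and release the old root.
Type=simple
ExecStart=/usr/sbin/migrate-2.sh $device
EOF
}

# Usage, before running the first script:
#   write_migrate2_unit /etc/systemd/system /dev/mmcblk0
#   systemctl daemon-reload
```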

# migrate1.sh
#!/usr/bin/env bash

set -x

# Just delete all log files to begin with.
rm -rf /var/log/*

# I simply hardcoded all the sectors based on our Yocto image.
BEGIN_SECTOR_DATA_PART=8118272

# Determine the minimal needed space for the old root partition.
MIN_FS_BLOCKS=$(resize2fs /dev/mmcblk0p2 -P | awk -F ':' '{print $2}')
OLD_ROOT_START_SECTOR=$(fdisk -l /dev/mmcblk0|tail -n 2|head -n 1|awk '{print $2}')
let "MIN_ROOT_SIZE_BYTE = $MIN_FS_BLOCKS * 4096"
let "MIN_ROOT_SIZE_FS_SECTOR = $MIN_ROOT_SIZE_BYTE / 512"
let "OLD_ROOT_MIN_END_SECTOR = $MIN_ROOT_SIZE_FS_SECTOR + $OLD_ROOT_START_SECTOR + 1"

# Make sure we can free up enough space to make room for the
# data partition at the end of the disk. Or we bail out.
if [ $BEGIN_SECTOR_DATA_PART -gt $OLD_ROOT_MIN_END_SECTOR ]
then
  echo "It should fit!"
else
  echo "$BEGIN_SECTOR_DATA_PART <= $OLD_ROOT_MIN_END_SECTOR"
  echo "ERR: The root partition is too big"
  exit 1 #TODO: Can I do something more intelligent?
fi

# Stop as much as we can
systemctl stop <all services you don't need to migrate>

# Delete things we don't need to complete this process.
# The whole system needs to fit in RAM, so make it as small as you can.
apt-get -y remove <any packages not needed to migrate>
apt-get -y autoremove
apt-get -y clean
rm -rf /lib/modules/4.4.48*
rm -rf /lib/modules/4.9.35+

# Make sure systemd won't interfere with / and /boot from here on.
echo "proc            /proc           proc    defaults          0       0" > /etc/fstab

# Unmount all unused filesystems
umount -a
swapoff -a

# Make a temporary root
mkdir /tmp/tmproot
mount -t tmpfs none /tmp/tmproot
mkdir /tmp/tmproot/{proc,sys,dev,run,usr,var,tmp,oldroot}
cp -ax /{bin,etc,sbin,lib} /tmp/tmproot/
cp -ax /usr/{bin,sbin,lib} /tmp/tmproot/usr/
cp -ax /var/{lib,local,lock,opt,run,spool,tmp} /tmp/tmproot/var/

# Pivot to new root
mount --make-rprivate /
pivot_root /tmp/tmproot /tmp/tmproot/oldroot
for i in dev proc sys run; do mount --move /oldroot/$i /$i; done

# Restart everything that's using /oldroot
systemctl restart <everything still left>
# Restart systemd itself.
systemctl daemon-reexec

# I couldn't get rid of all with systemd so using more force here.
kill -9 $(pidof wpa_supplicant)
kill -9 $(pidof agetty)
kill -9 $(pidof hciattach)
 
sleep 5

# Initiate phase two. The last thing still holding /oldroot is this script itself.
systemctl start migrate-2.service
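The sector arithmetic in the script above hardcodes a 4096-byte filesystem block size. A small variant that takes the block size as a parameter is sketched here. The function name `min_end_sector` is made up, and reading the block size via `tune2fs -l` in the usage comment is a suggestion, not what the original script does.

```shell
# min_end_sector MIN_BLOCKS BLOCK_SIZE START_SECTOR
# Prints the first sector after the shrunken filesystem, using 512-byte
# sectors and the same "+ 1" margin as the script above.
min_end_sector() {
  echo $(( $1 * $2 / 512 + $3 + 1 ))
}

# Usage on a device:
#   blocks=$(resize2fs -P /dev/mmcblk0p2 2>/dev/null | awk -F': ' '{ print $2 }')
#   bsize=$(tune2fs -l /dev/mmcblk0p2 | awk -F': *' '/^Block size/ { print $2 }')
#   min_end_sector "$blocks" "$bsize" 137216
```

For example, 400000 blocks of 4096 bytes starting at sector 137216 give an end sector of 3337217, comfortably below the 8118272 where the data partition begins.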
# migrate-2.sh
#!/usr/bin/env bash

set -x

if [ -z "$1" ]
  then
    echo "Target device not provided"
    exit 1
fi

DEVICE=$1
PARTITION_1=p1
PARTITION_2=p2
PARTITION_3=p3
PARTITION_4=p4

# The files that were downloaded before all this started.
BOOT_IMG="uboot.img"
SYS_IMG="sys.img"

# I just read these out by doing fdisk -l on the Yocto image.
BEGIN_SECTOR_BOOT=24576
END_SECTOR_BOOT=106495
BEGIN_SECTOR_SYS_A=106496
END_SECTOR_SYS_A=4112383
BEGIN_SECTOR_SYS_B=4112384
END_SECTOR_SYS_B=8118271
BEGIN_SECTOR_DATA_PART=8118272

# Kill everything that might still have a hold on the old root, never know.
fuser -Mk /oldroot

sleep 5
# Unmount the old root filesystem, so we can do things to it.
umount /oldroot
sleep 5

# Determine the new end sector for the old root partition.
e2fsck -f -a $DEVICE$PARTITION_2
e2fsck -f -y $DEVICE$PARTITION_2
MIN_FS_BLOCKS=$(resize2fs -P $DEVICE$PARTITION_2 | awk -F ':' '{print $2}')
OLD_ROOT_START_SECTOR=$(fdisk -l $DEVICE|grep $DEVICE$PARTITION_2|awk '{print $2}')

# Need to forcefully fix any FS errors that might be there or none of this will work.
e2fsck -f -a $DEVICE$PARTITION_2
e2fsck -f -y $DEVICE$PARTITION_2
OLD_ROOT_FS_BLOCK_SIZE=$(resize2fs -M $DEVICE$PARTITION_2|tail -n2|head -n1|awk '{print $7}')
let "OLD_ROOT_END_SECTOR = (($OLD_ROOT_FS_BLOCK_SIZE * 4096) / 512) + $OLD_ROOT_START_SECTOR + 1"
echo "End sector old root " $OLD_ROOT_END_SECTOR
echo "Start sector n data " $BEGIN_SECTOR_DATA_PART
sleep 5

# Shrink the old root file system.
fdisk $DEVICE <<EOF1
d
2
n
p
2
137216
$OLD_ROOT_END_SECTOR
w
EOF1

sleep 5
partprobe $DEVICE
sleep 5
e2fsck -f -a $DEVICE$PARTITION_2
e2fsck -f -y $DEVICE$PARTITION_2
sleep 5

# Make a new data partition at the end of the disk
fdisk $DEVICE <<EOF2
n
p
4
$BEGIN_SECTOR_DATA_PART

w
EOF2

sleep 5
partprobe $DEVICE
sleep 5

# Put a filesystem on the data partition
mkfs.ext4 -F -q $DEVICE$PARTITION_4

# Mount everything so we can start copying user data.
mkdir -p /newroot/data
mount -o rw $DEVICE$PARTITION_4 /newroot/data/
mount -o rw $DEVICE$PARTITION_2 /oldroot/
mount -o rw $DEVICE$PARTITION_1 /oldroot/boot

# Copy user data
mkdir -p /newroot/data/{etc,mender,temp,u-boot} # Whatever directory structure you need really.

# Copy all your userdata here.

# Copy over the new image files
cp -avx /oldroot/var/cache/migrate/{$BOOT_IMG*,$SYS_IMG*} /newroot/data/temp

# You'll want to keep the network config too.
cp -avx /oldroot/etc/wpa_supplicant/wpa_supplicant.conf /newroot/data/etc/wpa_supplicant.conf

# These are needed to make Mender happy. Without them the migration will succeed,
# but any update done with Mender will fail. I don't really know what they mean;
# I just copied them off the Yocto image.
echo "device_type=raspberrypi-cm3" > /newroot/data/mender/device_type
echo "/dev/mmcblk0 0x400000 0x4000" > /newroot/data/u-boot/fw_env.config
echo "/dev/mmcblk0 0x800000 0x4000" >> /newroot/data/u-boot/fw_env.config

# Definitely done with the old filesystems now.
umount /oldroot/boot
umount /oldroot
umount /newroot/data
sleep 5

# Delete boot and root partitions
fdisk $DEVICE <<EOF
d
1
d
2
w
EOF

sleep 5

# Create new boot partition, sys a and sys b
fdisk $DEVICE <<EOF
n
p
1
$BEGIN_SECTOR_BOOT
$END_SECTOR_BOOT
t
1
c
a
1
w
EOF

sleep 5

# Create new system partition A
fdisk $DEVICE << EOF
n
p
2
$BEGIN_SECTOR_SYS_A
$END_SECTOR_SYS_A
w
EOF

sleep 5

# Create new system partition B
fdisk $DEVICE << EOF
n 
p
3
$BEGIN_SECTOR_SYS_B
$END_SECTOR_SYS_B
w
EOF

partprobe $DEVICE
sleep 5

# Re-mount the data partition so we can read the image files.
mount -o rw $DEVICE$PARTITION_4 /newroot/data/

# Write the new partitions from the image
# Boot
zcat /newroot/data/temp/$BOOT_IMG.gz | dd of=$DEVICE$PARTITION_1 bs=1M
# System A
zcat /newroot/data/temp/$SYS_IMG.gz | dd of=$DEVICE$PARTITION_2 bs=1M

# Force checks, repairs, the lot.
fsck -f -a $DEVICE$PARTITION_1
fsck -f -y $DEVICE$PARTITION_1
fsck -f -a $DEVICE$PARTITION_2
fsck -f -y $DEVICE$PARTITION_2

# Clean up
rm -rf /newroot/data/temp
umount /newroot/data

# Fingers crossed, it should come back as new now.
reboot now
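Since the partition boundaries in the script are hardcoded, a cheap safeguard is to check that each uncompressed image actually fits its target partition before anything is deleted. This is a sketch with hypothetical helper names (`partition_bytes`, `image_fits`), assuming 512-byte sectors and inclusive end sectors as in the script above.

```shell
# partition_bytes BEGIN_SECTOR END_SECTOR
# Size of a partition in bytes (end sector inclusive, 512-byte sectors).
partition_bytes() {
  echo $(( ($2 - $1 + 1) * 512 ))
}

# image_fits IMAGE_BYTES BEGIN_SECTOR END_SECTOR
# Succeeds if an image of IMAGE_BYTES bytes fits into the partition.
image_fits() {
  [ "$1" -le "$(partition_bytes "$2" "$3")" ]
}

# Usage with the gzipped system image and the sys-a boundaries from the script:
#   size=$(zcat /newroot/data/temp/sys.img.gz | wc -c)
#   image_fits "$size" 106496 4112383 || exit 1
```

With the values from the script, the boot partition (sectors 24576 to 106495) holds 41943040 bytes (40 MiB) and each system partition holds 2051014656 bytes (1956 MiB).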

Hello everyone,
for my project I’m in the same situation as erikhh.
I need to upgrade a lot of live raspbian-jessie Pis (Pi 3 Model B)
to Mender clients, which are spread around the country.

Could someone please explain to me why I can’t just stop most of the unnecessary processes, keep everything needed for network connectivity and dd loaded in RAM,
unmount the SD card, dd the .sdimg image from the internet straight onto the SD card and let it reboot into a fresh Mender client?

I don’t want to sound arrogant; I probably just don’t fully understand yet how the whole GNU/Linux and raspbian-jessie stack works.

thanks!

Hi,

That’s in essence what I did.

I didn’t dd directly from the internet because internet connections aren’t all that reliable, and if the dd fails you brick the device. So I chose to download and verify the image before doing a dd.

I didn’t dd the whole SD card, essentially to speed up the process and reduce the space needed on our devices. Only dd-ing sys-a and the boot partition saves about half of the writing time.

And we have user data on the devices that needed to be kept intact during all this. That’s why the script first makes a data partition, so that I could move all the user data there before destroying the Raspbian partition.

Cheers,
Erik

@erikhh, big thanks for sharing your scripts! Will probably save me days of time.

@mirzak, I think this will be a common problem for quite a lot of newcomers. If Mender invested some time in providing a more or less reliable script like this that works with a standard Raspbian setup, I’m quite sure it would be a strong additional argument to go with Mender for the hesitant :slight_smile: Or at least put a link to this post in the docs regarding migration, so this post doesn’t get lost as time passes.

For sure we want to highlight this post, as it contains valuable information. We have shared it elsewhere as well:

https://twitter.com/mender_io/status/1139224898579005440

But it is a good idea to put a link in the docs as well.

So this is what I’m doing to live-upgrade running RPis to Mender clients; you could maybe also see this as a guide.

Apparently you can just dd a complete image onto the SD card without bricking the device.
By “bricking” I mean that it freezes in the middle of the ‘wget | dd’, which it does not.
Once dd is doing its thing there is no way back, and most functionality of Raspbian, or whatever happens to be running, will be gone; only what is in memory will be left.

At this point I have done dozens of rewrites this way, and it never seems to brick when done correctly.

This is for Pis that do not hold any important data; once the dd starts, there is no going back!
First, activate magic SysRq in the kernel. This lets the kernel listen for a certain key combination and perform a certain action in response. It is needed to reboot the Pi, since ‘shutdown -r now’ won’t work anymore because the running OS is broken at that point.
Activate magic SysRq here:

echo 1 > /proc/sys/kernel/sysrq

Then we use wget, streaming the download to funzip and finally to dd, which writes it to the SD card:

wget -c -t inf -O - https://to-my-url-containing-my-.sdimg.zip | funzip | dd of=/dev/mmcblk0 bs=64k

After the command is done, we trigger a kernel reboot using:

echo b > /proc/sysrq-trigger

And that’s it. Don’t forget that your .sdimg needs to have SSH enabled! The new stock Raspbian images, for example, do not have SSH enabled.

bonus

I was also working on a check with a SHA-256 hash: once the image has been written to the SD card, you can read back the same number of blocks that were written and check that they are all correct. It’s very cumbersome, and I could use some help completing it. It’s just extra security, especially when transferring over Wi-Fi.
Again, I encountered no problems, since wget here uses HTTP(S) over TCP, which has its own checksums; this is just to be very, very sure.

The problem is that I am unable to feed sha256sum the hash, because it only accepts a hash from a file, not directly inside the command itself like this:

dd if=/dev/mmcblk0 | sha256sum --check 'random digits from hash'

Since we only have one stdin and have broken our filesystem, I am unable to give sha256sum its hash file. Even with a ramdisk I could not get this working. A possible solution would be to kexec another kernel with a ramdisk, put the hash file there and use it that way.

the command:

wget -c -t inf -O - 192.168.30.11:/working_9-7-19_rapbian-stretch-lite.sdimg.zip | funzip | dd of=/dev/mmcblk0 && dd if=/dev/mmcblk0 bs=65536 count=57600 status=progress | echo '2628a2784264a78ef8f155ee35785f14ece8aaba28df64003e9ff1aabecb5036 hash.sha256' | sha256sum --check -

The problem is getting past the echo: the data from ‘dd if=’ won’t reach sha256sum, and, as mentioned earlier, not having a file for sha256sum does not work either.

so an ideal solution would be

wget -c -t inf -O - 192.168.30.11:/working_9-7-19_rapbian-stretch-lite.sdimg.zip | funzip | dd of=/dev/mmcblk0
&& "here a variable $touch '2628a2784264a78ef8f155ee35785f14ece8aaba28df64003e9ff1aabecb5036 hash.sha256'
&& dd if=/dev/mmcblk0 bs=65536 count=57600 status=progress | sha256sum --check '$hash.sha256'

But I’m not aware whether this is possible.
Other suggestions on how to do this better are of course welcome! Cheers
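One way around the `--check` problem is to skip `--check` altogether: `sha256sum --check` expects a checksum file as its argument, not a literal digest, but without arguments `sha256sum` hashes stdin and the result can be compared as a plain string. This is a sketch only. The function name `check_stream_sha256` is made up, and it assumes `sha256sum` and `awk` can still be executed after the card has been overwritten (for example because they were copied to a tmpfs beforehand).

```shell
# check_stream_sha256 EXPECTED
# Hashes everything on stdin and compares the digest with EXPECTED.
# No checksum file is needed, only the digest string itself.
check_stream_sha256() {
  actual=$(sha256sum | awk '{ print $1 }')
  [ "$actual" = "$1" ]
}

# Usage, reading back the same number of blocks that were written:
#   dd if=/dev/mmcblk0 bs=65536 count=57600 | check_stream_sha256 "<expected digest>" \
#     && echo "image verified"
```

The expected digest has to be computed over the uncompressed image, truncated or padded to exactly the number of bytes that is read back.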

I found this a really interesting write-up!

I haven’t had the time to look more deeply into this, but just from a first read-through, I don’t understand why sha256sum cannot read from stdin.
Have you not simply forgotten the ‘-’? That is:
dd if=/dev/mmcblk0 | sha256sum --check 'random digits from hash' -

Hello @oleorhagen
Yes, I also thought that I could just do
sha256sum --check "hash" -
or
sha256sum --check "hash -"
sha256sum --check "hash filename"
but none of these worked for me

I also can’t find any example online other than echo "hash filename" | sha256sum --check -
which is the problem for my command chain.
Thanks for the interest!

Thank you for posting all this information!

We have done some more research and posted a howto for this as a blog post: https://mender.io/blog/converting-a-live-device-to-a-robust-dual-rootfs-device