Read-only filesystems with Yocto

Introduction

An embedded device that loses power mid-write should come back up, not come back up corrupted. A read-only rootfs is the cheap, structural answer: the kernel cannot dirty pages it is not allowed to write, and the filesystem cannot be left in a half-written state if nothing was ever writing to it in the first place. The awkward part is the things that legitimately have to change: a hostname, a few configuration files, a runtime cache the application needs to scribble into.

This tutorial walks through a complete example for qemuarm64: an erofs root filesystem mounted strictly read-only, an ext4 /persistent partition that holds persistent configuration, an /etc/hostname that lives on /persistent via a symlink and is seeded by a first-boot script, and a tmpfs-backed cache exposed via the Yocto populate-volatile.sh mechanism.

Dynamic remounting of /persistent between ro and rw for runtime writes is a separate problem and is deferred to a follow-up article. In this tutorial /persistent is seeded once at first boot and stays read-only afterwards.

Version notes

This tutorial targets scarthgap, the current LTS release of the Yocto Project. More information on releases is available on the Yocto Project website.

The recipes below were written and verified on scarthgap (5.0). The same pattern carries over to more recent releases, though the recipes may need small mechanical updates there (at minimum LAYERSERIES_COMPAT, and newer releases no longer accept S = "${WORKDIR}"). Older releases (kirkstone and earlier) need adjustments around erofs support in wic and populate-volatile.sh, which are out of scope here.

Step 0: prepare a meta layer

We will collect everything in a small custom layer named meta-rwdata, sitting next to poky and build:

.
├── build
├── meta-rwdata
└── poky

Throughout the tutorial we assume the shell is initialized for the build (source poky/oe-init-build-env build) and the current working directory is build.

The layer configuration is the boilerplate minimum:

meta-rwdata/conf/layer.conf:

BBPATH .= ":${LAYERDIR}"
BBFILES += "${LAYERDIR}/recipes-*/*/*.bb ${LAYERDIR}/recipes-*/*/*.bbappend"

BBFILE_COLLECTIONS += "rwdata"
BBFILE_PATTERN_rwdata = "^${LAYERDIR}/"
BBFILE_PRIORITY_rwdata = "10"

LAYERSERIES_COMPAT_rwdata = "scarthgap"
LAYERDEPENDS_rwdata = "core"

Add it to build/conf/bblayers.conf and you are ready to go.

We will also build a custom image, core-image-rwdata, that requires core-image-minimal. Putting our changes into a separate image (rather than overriding core-image-minimal from a bbappend) makes rollback trivial: if anything goes wrong, just bitbake core-image-minimal again.

Step 1: make the rootfs read-only

Let’s start with the single line that does most of the heavy lifting:

IMAGE_FEATURES += "read-only-rootfs"

This image feature triggers read_only_rootfs_hook in poky/meta/classes-recipe/rootfs-postcommands.bbclass. The hook does three things we care about:

  1. It rewrites the /dev/root line in the image’s /etc/fstab so the root filesystem is mounted with ro, and sets the fsck pass field to 0. This part is universal regardless of init system.
  2. It makes sure the scratch paths that daemons expect to find writable (/tmp, /var/volatile) end up on tmpfs at boot. Under sysvinit that’s done by flipping ROOTFS_READ_ONLY=yes in /etc/default/rcS. Under systemd /tmp is already on tmpfs via tmp.mount, /var/volatile comes from the base fstab line, and the hook drops an empty /etc/machine-id for stateless first boot.
  3. Under sysvinit, it also runs populate-volatile.sh once at rootfs build time, baking in any symlinks contributed by /etc/default/volatiles/*. We will lean on that in Step 5. The systemd equivalent is the volatile-binds recipe, covered in the same step.

A useful sanity check: after the hook runs, tmp/work/.../core-image-rwdata/.../rootfs/etc/fstab will show /dev/root / auto ro 1 0, not defaults. That one applies to both init systems.
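The sanity check can be scripted so it runs after every build. The following is a minimal sketch, not part of the layer: check_root_fstab is a hypothetical helper name, and the path you pass it is whatever rootfs/etc/fstab your build produced.

```shell
#!/bin/sh
# Sketch: verify that the /dev/root entry of a generated fstab is mounted
# read-only and has its fsck pass field set to 0.
# check_root_fstab is a hypothetical helper, not something poky ships.
check_root_fstab() {
    awk '
        $1 == "/dev/root" && $2 == "/" {
            found = 1
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
            if (!ro)     { print "root entry is not mounted ro: " $4; bad = 1 }
            if ($6 != 0) { print "fsck pass field is not 0: " $6;     bad = 1 }
        }
        END {
            if (!found) { print "no /dev/root entry found"; exit 2 }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Call it as check_root_fstab path/to/rootfs/etc/fstab; it returns 0 only when the root entry carries ro and a pass field of 0.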

A word on what “read-only” really means

Before we go further, an important caveat. What IMAGE_FEATURES += "read-only-rootfs" gives us is software-level read-only. Every layer above the storage controller (the filesystem driver, mount flags, postprocess hooks) is an instruction not to write. None of it is enforcement against the hardware actually putting electrons on the medium.

Most embedded storage interfaces have no real way to mechanically prevent writes. eMMC and SD have a write-protect mechanism on the carrier, but the host driver is the only thing that honors it; the flash controller itself does not care. NOR/CFI flash has a WP# line that is hardware-enforced, but it also gates the unlock sequence, so “force read-only forever” means “give up your ability to ever update this device”. NVMe and SATA have no enforced read-only mode at all.

Even when no software writes are issued, your storage device is rarely idle. NAND-based media run wear-levelling and bad-block remapping continuously; FTL metadata gets flushed seconds after host activity stops; eMMC SLC caches promote and demote pages at the controller’s discretion. None of that needs a host write to happen. So “the filesystem was mounted read-only” is not the same as “the storage was idle”; serious power-loss resilience pairs a read-only filesystem with PLP (power-loss-protection) on the storage device.

What software-defined read-only does give you is a complete elimination of the writes your OS and applications have any visibility into. That removes the largest source of corruption (your own writes) and forces you to find every place in your application stack that thought it was writing somewhere harmless. Read-only is one of the best diagnostic tools in the embedded toolkit. Turn it on, watch what breaks, then deal with each thing on its merits: does it really need to persist? Can it go on tmpfs? Does it belong on /persistent?

One temptation, when something breaks loudly, is to reach for overlayfs and make the rootfs “read-only-but-actually-writable”. Resist it. Overlayfs gives you a writable rootfs that looks read-only to inspection tools, which is the opposite of the property you wanted. It is, in my opinion, basically capitulation.

The rule of thumb: make as much read-only as you can, and keep it read-only as long as you can. The rest of this article is the supporting machinery for doing exactly that.

Step 2: choose the filesystem layout

For the rootfs we use erofs with lz4 compression. erofs is structurally read-only: no journal to recover, no allocator state to maintain, and the on-disk format does not change after mkfs.erofs exits. That makes the “device powered down mid-boot” question trivially answerable: nothing was being written, so nothing can be corrupt.

For /persistent we use ext4. ext4 has a journal, which we want for the brief rw window during first-boot seeding (Step 4).

The wic file lays out two partitions on a virtio-blk disk:

meta-rwdata/wic/qemuarm64-rwdata.wks:

# Disk layout for a virtio-blk disk (/dev/vda), QEMU-only.
#
#   /dev/vda1  rootfs  erofs (lz4)  populated from IMAGE_ROOTFS, mounted ro
#   /dev/vda2  persist  ext4        empty, 256 MiB, mounted ro by default
#                                    via LABEL=persistent
#
# msdos partition table is fine for runqemu / virtio-blk; a real device
# build would want GPT. No on-disk bootloader: runqemu boots the kernel
# with -kernel, so we only need a partition table here.

part / --source rootfs --ondisk vda --fstype=erofs --mkfs-extraopts="-z lz4" --align 1024
part /persistent --ondisk vda --fstype=ext4 --label persistent --use-label --fsoptions="ro,defaults" --align 1024 --fixed-size 256

bootloader --ptable msdos

Two gotchas hide in this file.

The first one cost me some nerves: wic’s --label is rejected on erofs partitions. The check is right there in the wic plugin; if you try, you will get erofs does not support LABEL. So the rootfs gets selected by partition number on the kernel cmdline (/dev/vda1), not by label. The /persistent partition is ext4 and uses --use-label happily, and wic will auto-emit an fstab entry for it. We will look at that one in Step 4.

The second one is the bootloader --ptable msdos line. wic refuses to lay out a partition table without a bootloader directive, even though we are booting the kernel directly via runqemu -kernel and there is no on-disk loader to install. Treat the line as required syntactic furniture.
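Both gotchas are easy to lint for before wic gets a chance to complain. The sketch below is a rough grep-based check, assuming one part directive per line as in the file above; lint_wks is a hypothetical name and the patterns only cover the two cases described here.

```shell
#!/bin/sh
# Sketch: flag the two wks pitfalls described in this step.
# lint_wks is a hypothetical helper; it assumes one directive per line.
lint_wks() {
    wks=$1
    status=0
    # --label on an erofs partition (note: --use-label does not match --label)
    if grep -E '^[[:space:]]*part[[:space:]]' "$wks" \
        | grep -E -- '--fstype[= ]erofs' \
        | grep -q -E -- '--label([= ]|$)'; then
        echo "$wks: --label used on an erofs partition"
        status=1
    fi
    # missing bootloader directive
    if ! grep -q -E '^[[:space:]]*bootloader([[:space:]]|$)' "$wks"; then
        echo "$wks: no bootloader directive"
        status=1
    fi
    return $status
}
```

Run it as lint_wks meta-rwdata/wic/qemuarm64-rwdata.wks; a zero exit status means neither pitfall was detected.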

Stock poky kernels do not enable erofs. If you skip this, your kernel will panic mid-boot with unable to mount root fs on /dev/vda1, which is genuinely confusing because the disk image is fine. Add a kernel config fragment:

meta-rwdata/recipes-kernel/linux/files/erofs.cfg:

CONFIG_EROFS_FS=y
CONFIG_EROFS_FS_ZIP=y

And the matching bbappend that pulls it in:

meta-rwdata/recipes-kernel/linux/linux-yocto_%.bbappend:

FILESEXTRAPATHS:prepend := "${THISDIR}/files:"

SRC_URI += "file://erofs.cfg"

CONFIG_EROFS_FS=y enables the filesystem driver, CONFIG_EROFS_FS_ZIP=y is what allows mounting an lz4-compressed image. Skip the second one and the kernel will recognise the filesystem but refuse to read it.
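To confirm the fragment actually made it into the final kernel configuration, you can grep the generated .config. A minimal sketch follows; check_kconfig is a hypothetical helper, and where the .config lives depends on your build (the kernel build directory under tmp/work, or zcat /proc/config.gz on a target whose kernel enables that interface).

```shell
#!/bin/sh
# Sketch: check that each given option is built in (=y) in a kernel .config.
# check_kconfig is a hypothetical helper name.
check_kconfig() {
    config=$1
    shift
    missing=0
    for opt in "$@"; do
        # -x matches the whole line, so CONFIG_EROFS_FS does not match
        # CONFIG_EROFS_FS_ZIP or a commented-out "is not set" line.
        if ! grep -q -x "${opt}=y" "$config"; then
            echo "missing or not built in: $opt"
            missing=1
        fi
    done
    return $missing
}
```

For this tutorial the call would be check_kconfig path/to/.config CONFIG_EROFS_FS CONFIG_EROFS_FS_ZIP.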

Step 3: tame runqemu

The image recipe so far is short:

meta-rwdata/recipes-core/images/core-image-rwdata.bb:

SUMMARY = "core-image-minimal with erofs read-only rootfs and a /persistent partition"

require recipes-core/images/core-image-minimal.bb

IMAGE_FEATURES += "read-only-rootfs"

IMAGE_FSTYPES = "wic"
WKS_FILE = "qemuarm64-rwdata.wks"

QB_DEFAULT_FSTYPE = "wic"
# Treat the wic as a plain rootfs image (kernel comes from -kernel, not from
# inside the wic). Without this, runqemu classifies wic as a "vmtype" and
# falls back to QB_DRIVE_TYPE (= /dev/sd, virtio-scsi) instead of using
# QB_ROOTFS_OPT (virtio-blk-pci -> /dev/vda).
QB_FSINFO = "wic:no-kernel-in-fs"
# runqemu appends " rw" to QB_KERNEL_ROOT unless ro/rw is present, so embed ro
# here. The "read-only-rootfs" IMAGE_FEATURES adds ro to APPEND, but APPEND is
# only consumed by on-disk bootloaders (extlinux/grub), not by runqemu.
QB_KERNEL_ROOT = "/dev/vda1 ro"

Two of the QB_* lines, QB_FSINFO and QB_KERNEL_ROOT, are the entire reason this section exists.

QB_FSINFO = "wic:no-kernel-in-fs" tells qemuboot.bbclass that the wic image does not embed a kernel. Without it, runqemu classifies the wic as a “vmtype” and silently switches the disk attachment from QB_ROOTFS_OPT (virtio-blk-pci, giving /dev/vda) to QB_DRIVE_TYPE (virtio-scsi, giving /dev/sd*). The kernel boots, mounts the wrong device, and panics.

QB_KERNEL_ROOT = "/dev/vda1 ro" is the second trap. read-only-rootfs adds ro to APPEND, which is what an on-disk bootloader like extlinux or grub would consume. runqemu does not; it builds the cmdline from QB_KERNEL_ROOT and a few defaults. Without an explicit ro here, runqemu appends rw itself and you boot read-write despite all the other configuration.

After both lines, cat /proc/cmdline from inside the booted VM should show:

$ cat /proc/cmdline
root=/dev/vda1 ro  mem=256M ip=dhcp console=ttyAMA0 console=hvc0 swiotlb=0

And to confirm the rootfs really is read-only:

$ mount | grep ' on / '
/dev/root on / type erofs (ro,relatime,user_xattr,acl,cache_strategy=readaround)
$ touch /should-fail
touch: /should-fail: Read-only file system

That Read-only file system is the whole point of the exercise.
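If you want the cmdline check in a smoke test rather than in your head, it can be sketched as a small shell function. check_cmdline is a hypothetical name, and the check is deliberately stricter than the kernel: it rejects any rw word instead of working out which of ro and rw comes last.

```shell
#!/bin/sh
# Sketch: verify that a kernel command line selects the expected root device
# and asks for a read-only mount. check_cmdline is a hypothetical helper.
check_cmdline() {
    cmdline=$1
    want_root=$2
    root=""
    has_ro=0
    has_rw=0
    # Intentionally unquoted: split the command line into words.
    for word in $cmdline; do
        case $word in
            root=*) root=${word#root=} ;;
            ro)     has_ro=1 ;;
            rw)     has_rw=1 ;;
        esac
    done
    [ "$root" = "$want_root" ] || { echo "root is '$root', expected '$want_root'"; return 1; }
    [ "$has_rw" -eq 0 ] || { echo "cmdline contains rw"; return 1; }
    [ "$has_ro" -eq 1 ] || { echo "cmdline lacks ro"; return 1; }
}
```

On the booted VM this would be called as check_cmdline "$(cat /proc/cmdline)" /dev/vda1.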

Step 4: persistent /etc via symlinks onto /persistent

Now we have a structurally read-only rootfs. So how does hostname work? Out of the box, poky reads /etc/hostname at boot via /etc/init.d/hostname.sh. We need that file to be writable per device, but /etc lives on the erofs we just made unwritable.

The pattern is to symlink the file in /etc to a path under /persistent, ship a “factory copy” of the file in the read-only rootfs, and seed /persistent from those factory copies on the first boot. Three pieces are needed: the symlink, the factory copy, and the first-boot seeder.

The symlink and the factory copy are set up in two ROOTFS_POSTPROCESS_COMMAND functions in the image recipe. Add them to core-image-rwdata.bb:

# Create the /persistent mount point. wic auto-emits the fstab entry for /persistent
# from the wks --use-label, so we only need the directory.
rwdata_create_persistent_mountpoint() {
    install -d -m 0755 ${IMAGE_ROOTFS}/persistent
}
ROOTFS_POSTPROCESS_COMMAND += "rwdata_create_persistent_mountpoint;"

# Move /etc/hostname onto /persistent via a symlink. Stage a factory copy under
# /usr/share/rwdata-factory/etc/hostname so the first-boot init script can
# seed /persistent before hostname.sh reads /etc/hostname.
rwdata_persist_etc_hostname() {
    install -d ${IMAGE_ROOTFS}/usr/share/rwdata-factory/etc
    install -m 0644 ${IMAGE_ROOTFS}/etc/hostname \
                    ${IMAGE_ROOTFS}/usr/share/rwdata-factory/etc/hostname
    rm ${IMAGE_ROOTFS}/etc/hostname
    ln -s /persistent/etc/hostname ${IMAGE_ROOTFS}/etc/hostname
}
ROOTFS_POSTPROCESS_COMMAND += "rwdata_persist_etc_hostname;"

Note that read_only_rootfs_hook itself is a ROOTFS_POSTPROCESS_COMMAND. We append our own commands with += so they run after it has rewritten fstab. Order matters here: the hook hardcodes the path ${IMAGE_ROOTFS}/etc/fstab, and we do not want to fight it.

The wic file produces an /etc/fstab line for /persistent automatically, courtesy of poky/scripts/lib/wic/plugins/imager/direct.py:update_fstab:

LABEL=persistent	/persistent	ext4	ro,defaults	0	0

Note ro,defaults: that comes straight from --fsoptions="ro,defaults" in the wks file. /persistent is mounted read-only at boot just like /.

That leaves one problem: on the very first boot of a freshly-flashed device, /persistent is empty, so /etc/hostname → /persistent/etc/hostname is a dangling symlink. hostname.sh will silently fail to read it, and any other persistent files we later move onto /persistent (machine-id, ssh host keys, application config) would suffer the same fate. We need a way to populate /persistent before any boot script reads through the symlinks.

The first-boot seeder fixes this. It runs once, copies factory defaults from /usr/share/rwdata-factory into /persistent, and drops a marker so it never runs again. The script body is identical regardless of init system; what differs is how the system invokes it.

meta-rwdata/recipes-extras/rwdata-firstboot/files/rwdata-firstboot:

#!/bin/sh
set -e

INITIALIZED=/persistent/.rwdata-initialized
FACTORY=/usr/share/rwdata-factory

[ -f "$INITIALIZED" ] && exit 0
[ -d "$FACTORY" ]     || exit 0

# Briefly remount /persistent rw so we can populate it. The window is closed
# again before any later boot script reads its contents.
mount -o remount,rw /persistent

( cd "$FACTORY" && find . -type d ) | while read -r d; do
    mkdir -p "/persistent/$d"
done
( cd "$FACTORY" && find . -type f ) | while read -r f; do
    if [ ! -e "/persistent/$f" ]; then
        cp -p "$FACTORY/$f" "/persistent/$f"
    fi
done

touch "$INITIALIZED"
sync

mount -o remount,ro /persistent

exit 0

The script is idempotent: the [ -f "$INITIALIZED" ] early exit means on the second and every subsequent boot the script does nothing. After it finishes, /persistent is read-only again; the rw window is just long enough to copy a tree of small text files.
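The marker file is one layer of idempotence; the copy loop is a second, because it never overwrites a file that already exists on /persistent. To convince yourself of that without booting anything, the same loop can be exercised against two scratch directories. This is a sketch only: seed_tree is a hypothetical name, and it leaves out the remount and marker handling of the real script.

```shell
#!/bin/sh
# Sketch: the seeder's copy logic with source and destination as parameters,
# so it can be tried on scratch directories. seed_tree is a hypothetical name.
seed_tree() {
    factory=$1
    dest=$2
    # Recreate the directory skeleton first.
    ( cd "$factory" && find . -type d ) | while read -r d; do
        mkdir -p "$dest/$d"
    done
    # Copy only files that do not exist yet; existing ones are left alone.
    ( cd "$factory" && find . -type f ) | while read -r f; do
        [ -e "$dest/$f" ] || cp -p "$factory/$f" "$dest/$f"
    done
}
```

Running seed_tree twice against the same destination, with a file edited in between, leaves the edited file untouched.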

The recipe is where the init-system choice shows up. systemd is the de facto default on most modern embedded distros, so let’s cover that path first. Poky’s stock DISTRO is still sysvinit-based, and the equivalent recipe for that path follows.

Under systemd

Let’s drop a one-shot unit alongside the script:

meta-rwdata/recipes-extras/rwdata-firstboot/files/rwdata-firstboot.service:

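[Unit]
Description=Seed /persistent with factory defaults on first boot
# Services are ordered after sysinit.target by default, which would form an
# ordering cycle with Before=sysinit.target below; opt out of that default.
DefaultDependencies=no
RequiresMountsFor=/persistent
ConditionPathExists=!/persistent/.rwdata-initialized
Before=sysinit.target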

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/libexec/rwdata-firstboot

[Install]
WantedBy=sysinit.target

There are three important lines in this unit. RequiresMountsFor=/persistent tells systemd to pull in the auto-generated mount unit for /persistent and order us after it, with no manual After= ceremony. ConditionPathExists=!/persistent/.rwdata-initialized makes the unit a true no-op on every subsequent boot. systemd skips activation entirely, so no script invocation and no log noise. Before=sysinit.target keeps us ahead of basically every other early service, so anything that resolves /etc/hostname later in the boot finds a valid target.

The matching recipe:

SUMMARY = "First-boot seeder for /persistent factory defaults"
LICENSE = "MIT"
LIC_FILES_CHKSUM = "file://${COMMON_LICENSE_DIR}/MIT;md5=0835ade698e0bcf8506ecda2f7b4f302"

SRC_URI = "file://rwdata-firstboot \
           file://rwdata-firstboot.service"
S = "${WORKDIR}"

inherit systemd

SYSTEMD_SERVICE:${PN} = "rwdata-firstboot.service"
SYSTEMD_AUTO_ENABLE = "enable"

do_install() {
    install -d ${D}${libexecdir}
    install -m 0755 ${S}/rwdata-firstboot ${D}${libexecdir}/rwdata-firstboot

    install -d ${D}${systemd_system_unitdir}
    install -m 0644 ${S}/rwdata-firstboot.service ${D}${systemd_system_unitdir}/
}

FILES:${PN} = "${libexecdir}/rwdata-firstboot ${systemd_system_unitdir}/rwdata-firstboot.service"

inherit systemd plus SYSTEMD_SERVICE:${PN} is the canonical poky pattern. The class takes care of registering the unit and enabling it via the appropriate symlinks at rootfs build time, so SYSTEMD_AUTO_ENABLE = "enable" is enough to get it running on the very first boot.

Under sysvinit (poky’s stock DISTRO)

If you have not opted into systemd (DISTRO_FEATURES += "systemd", VIRTUAL-RUNTIME_init_manager = "systemd"), poky boots with sysvinit and the same script is wired up via an init-script priority instead:

SUMMARY = "First-boot seeder for /persistent factory defaults"
LICENSE = "MIT"
LIC_FILES_CHKSUM = "file://${COMMON_LICENSE_DIR}/MIT;md5=0835ade698e0bcf8506ecda2f7b4f302"

SRC_URI = "file://rwdata-firstboot"
S = "${WORKDIR}"

inherit update-rc.d

INITSCRIPT_NAME = "rwdata-firstboot"
# Run after mountall (priority 3) so /persistent is mounted, and before
# hostname.sh (priority 39) so the /etc/hostname symlink resolves.
INITSCRIPT_PARAMS = "start 15 S ."

do_install() {
    install -d ${D}${sysconfdir}/init.d
    install -m 0755 ${S}/rwdata-firstboot ${D}${sysconfdir}/init.d/rwdata-firstboot
}

FILES:${PN} = "${sysconfdir}/init.d/rwdata-firstboot"

RDEPENDS:${PN} += "initscripts"

The start 15 priority places the script in the right spot in the S-runlevel. Lower numbers run first, and the relevant initscripts from the initscripts package land at:

  • mountall.sh: priority 3 (mounts /persistent)
  • populate-volatile.sh: priority 37 (creates type-d directories on tmpfs)
  • hostname.sh: priority 39 (reads /etc/hostname)

start 15 puts our seeder safely after mountall and well before anything that would consume /etc/hostname, which is exactly the window we need. The role of INITSCRIPT_PARAMS here is what Before=sysinit.target and RequiresMountsFor= do in the systemd path: ordering relative to other early-boot work.
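The ordering can be checked on the built rootfs (or on the running target) by looking at the SNN links in the S-runlevel directory. A minimal sketch, assuming the links live in /etc/rcS.d and follow the usual S<priority><name> naming; rcs_position and runs_before are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare the start order of two scripts in an rcS.d directory.
# rcs_position and runs_before are hypothetical helpers.

# Print the 1-based position of SCRIPT among the SNN* links in DIR.
rcs_position() {
    ls "$1" | grep '^S[0-9][0-9]' | sort \
        | grep -n "^S[0-9][0-9]$2\$" | head -n 1 | cut -d: -f1
}

# Succeed if FIRST is started before SECOND in DIR.
runs_before() {
    a=$(rcs_position "$1" "$2")
    b=$(rcs_position "$1" "$3")
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" -lt "$b" ]
}
```

For the seeder, the two conditions to check are runs_before /etc/rcS.d mountall.sh rwdata-firstboot and runs_before /etc/rcS.d rwdata-firstboot hostname.sh.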

Pull the recipe into the image with one more line in core-image-rwdata.bb:

IMAGE_INSTALL += "rwdata-firstboot"

Booting the image and logging in:

$ ls -l /etc/hostname
lrwxrwxrwx ... /etc/hostname -> /persistent/etc/hostname
$ cat /etc/hostname
qemuarm64
$ ls -l /persistent/etc/
-rw-r--r-- ... hostname
$ ls -l /persistent/.rwdata-initialized
-rw-r--r-- ... 0 ... /persistent/.rwdata-initialized
$ hostname
qemuarm64

/etc/hostname is a symlink, the target is on /persistent, the marker file shows the seeder ran, and hostname.sh resolved the value successfully.

The same pattern extends to anything else you want to make per-device persistent: /etc/machine-id, /etc/ssh/, your application config. Add the file under /usr/share/rwdata-factory/... with another ROOTFS_POSTPROCESS_COMMAND, and on first boot the seeder will copy it across.

One thing this article does not fully solve: how do you change /etc/hostname after first boot, given that /persistent is read-only? The pattern that handles it is sketched in “So what about runtime writes?” below; a complete implementation is the subject of a follow-up article.

Step 5: tmpfs-backed locations for runtime scratch

Persistent state lives on /persistent. But there is a third category we have not addressed yet: directories the application has to write to but does not need to keep across reboots. Think caches, scratch space, sockets, runtime indices. These belong on tmpfs, and both major init systems on Yocto ship a purpose-built mechanism for declaring them. As with the first-boot seeder, the systemd path is the de facto default, and poky’s stock sysvinit-based DISTRO has its own equivalent.

Under systemd

poky core ships a recipe called volatile-binds (at poky/meta/recipes-core/volatile-binds/) whose entire job is to take a list of <source-on-tmpfs> <destination-in-rootfs> pairs and generate one systemd .service unit per entry. Each unit bind-mounts the tmpfs source over the destination at boot, so the destination appears writable at runtime and is wiped on every reboot.

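The default VOLATILE_BINDS value already covers the obvious paths (/var/lib, /var/cache, /var/spool, /srv). To add /var/lib/rwdata-cache to the set, extend the variable somewhere the volatile-binds recipe itself can see it, such as your distro config or local.conf. An assignment in the image recipe is not visible to another recipe, and a plain += in a conf file would pre-empt the recipe’s ?= default, so use an override-style append scoped to the recipe:

VOLATILE_BINDS:append:pn-volatile-binds = " ${localstatedir}/volatile/rwdata-cache ${localstatedir}/lib/rwdata-cache\n"

Then pull the package into the image from core-image-rwdata.bb:

IMAGE_INSTALL += "volatile-binds"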

The \n at the end of the entry is how volatile-binds.bb separates list items. The trailing IMAGE_INSTALL line is the only addition needed; read-only-rootfs already arranges for the destination directory to exist so the bind mount has somewhere to land. The runtime result is a real bind mount visible to findmnt, not a symlink; application code does not care.

Under sysvinit (poky’s stock DISTRO)

The sysvinit-native equivalent is populate-volatile.sh, shipped in the initscripts package. The base poky fstab already mounts /var/volatile as tmpfs; all you need is to drop a fragment file under /etc/default/volatiles/ describing the directories and symlinks you want.

meta-rwdata/recipes-extras/rwdata-volatile-binds/files/99_rwdata:

# Volatile mount points contributed by our image.
#
# Format (see /etc/init.d/populate-volatile.sh):
#   <type> <owner> <group> <mode> <path> <linksource>
#     type     d=directory, f=file, l=symlink, b=bind

# /var/lib/rwdata-cache is a tmpfs-backed location: writable at runtime,
# wiped on every reboot.
d root root 0755 /var/volatile/rwdata-cache none
l root root 0755 /var/lib/rwdata-cache /var/volatile/rwdata-cache

Two records, two entirely different lifecycles.

The type-d record (the directory under /var/volatile) is created at every boot, because /var/volatile is wiped clean each time the tmpfs gets remounted. populate-volatile.sh runs as a SysV init script (priority 37 in the S runlevel) and recreates it.

The type-l record (the symlink at /var/lib/rwdata-cache) is something else entirely. Because read-only-rootfs is enabled, read_only_rootfs_hook runs populate-volatile.sh once at rootfs build time and bakes the symlink into the rootfs as a real symlink on disk. By the time the device boots, /var/lib/rwdata-cache → /var/volatile/rwdata-cache is already there, so the application sees a writable directory at the expected path even though the rootfs is otherwise frozen. Note the semantic difference with the systemd path: here the destination is a symlink resolving onto a tmpfs directory, not a bind mount.

One trap: the <owner> <group> <mode> columns cannot be none on type-l records, even though symlinks have no real ownership. The script skips such entries without failing the build; the only trace is an Undefined users: message in the build log. Use root root 0755.
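Because the failure is so quiet, it is worth linting fragment files for this one case before they reach the build. The sketch below covers only the trap described above; lint_volatiles is a hypothetical name, and it assumes the six-column format shown in the fragment header.

```shell
#!/bin/sh
# Sketch: flag type-l records whose owner, group or mode column is "none".
# lint_volatiles is a hypothetical helper; it checks nothing else.
lint_volatiles() {
    awk '
        /^[[:space:]]*#/ || NF == 0 { next }   # skip comments and blank lines
        $1 == "l" && ($2 == "none" || $3 == "none" || $4 == "none") {
            print FILENAME ":" FNR ": type-l record with none in owner/group/mode"
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Usage: lint_volatiles meta-rwdata/recipes-extras/rwdata-volatile-binds/files/99_rwdata.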

The recipe is minimal:

meta-rwdata/recipes-extras/rwdata-volatile-binds/rwdata-volatile-binds_1.0.bb:

SUMMARY = "Volatile (tmpfs-backed) mount points"
LICENSE = "MIT"
LIC_FILES_CHKSUM = "file://${COMMON_LICENSE_DIR}/MIT;md5=0835ade698e0bcf8506ecda2f7b4f302"

SRC_URI = "file://99_rwdata"
S = "${WORKDIR}"

do_install() {
    install -d ${D}${sysconfdir}/default/volatiles
    install -m 0644 ${S}/99_rwdata ${D}${sysconfdir}/default/volatiles/99_rwdata
}

FILES:${PN} = "${sysconfdir}/default/volatiles/99_rwdata"

RDEPENDS:${PN} += "initscripts"

RDEPENDS:${PN} += "initscripts" is what guarantees populate-volatile.sh is on the image; without it, the fragment file would be installed but nothing would consume it.

Pull the recipe into the image:

IMAGE_INSTALL += "rwdata-volatile-binds"

Verification

Regardless of which path you took, after booting you should see:

$ ls -l /var/lib/rwdata-cache
lrwxrwxrwx ... /var/lib/rwdata-cache -> /var/volatile/rwdata-cache
$ mount | grep '/var/volatile'
tmpfs on /var/volatile type tmpfs (rw,relatime)
$ echo hello > /var/lib/rwdata-cache/runtime-only && cat /var/lib/rwdata-cache/runtime-only
hello

(Under the systemd path, the first line will show a regular directory and a bind mount rather than a symlink; the third line behaves identically.) After a reboot the file is gone, which is the entire point.

Step 6: build and verify

Let’s pull it together. Build:

bitbake core-image-rwdata

And let’s boot it under qemu. We point runqemu at the qemuboot.conf directly rather than at the image name: on some hosts the shortcut form’s internal bitbake -e lookup misses IMAGE_LINK_NAME and aborts, so the explicit form is more reliable. The path is relative to the build directory we are already in:

runqemu tmp/deploy/images/qemuarm64/core-image-rwdata-qemuarm64.rootfs.qemuboot.conf nographic slirp

Log in as root (no password; EXTRA_IMAGE_FEATURES = "debug-tweaks" is set in local.conf). Exit qemu with Ctrl-A x.

Let’s do some checks:

$ cat /proc/cmdline
root=/dev/vda1 ro  mem=256M ip=dhcp console=ttyAMA0 console=hvc0 swiotlb=0

$ mount | grep ' on / '
/dev/root on / type erofs (ro,relatime,user_xattr,acl,cache_strategy=readaround)

$ mount | grep ' on /persistent '
/dev/vda2 on /persistent type ext4 (ro,relatime)

$ touch /should-fail 2>&1; echo exit=$?
touch: /should-fail: Read-only file system
exit=1

$ touch /persistent/should-also-fail 2>&1; echo exit=$?
touch: /persistent/should-also-fail: Read-only file system
exit=1

$ ls -l /etc/hostname
lrwxrwxrwx    1 root     root            24 Apr  5  2011 /etc/hostname -> /persistent/etc/hostname

$ cat /etc/hostname
qemuarm64

$ ls -l /persistent/.rwdata-initialized
-rw-r--r--    1 root     root             0 May  7 07:28 /persistent/.rwdata-initialized

$ ls -l /var/lib/rwdata-cache
lrwxrwxrwx    1 root     root            26 May  7 07:29 /var/lib/rwdata-cache -> /var/volatile/rwdata-cache

$ mount | grep '/var/volatile'
tmpfs on /var/volatile type tmpfs (rw,relatime)

$ echo hello > /var/lib/rwdata-cache/runtime-only && cat /var/lib/rwdata-cache/runtime-only
hello

And here we go: / is erofs and read-only, /persistent is ext4 and read-only, and writes to either fail. /etc/hostname is a symlink onto /persistent and resolves successfully because the first-boot seeder ran. /var/lib/rwdata-cache is a symlink baked into the read-only rootfs that points at a tmpfs directory recreated on every boot.
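The mount checks above are done by eye; for an automated smoke test it is more robust to parse /proc/mounts than the output of mount. A minimal sketch, with mount_is_ro as a hypothetical helper name and the mounts file as an optional second argument so the logic can be tried on a saved copy:

```shell
#!/bin/sh
# Sketch: succeed if MOUNTPOINT is mounted read-only according to a
# /proc/mounts-style file. mount_is_ro is a hypothetical helper.
# Exit status: 0 = read-only, 1 = not read-only, 2 = not mounted.
mount_is_ro() {
    mountpoint=$1
    mounts=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp { opts = $4; seen = 1 }   # the last matching entry wins
        END {
            if (!seen) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }
    ' "$mounts"
}
```

On the target, mount_is_ro / and mount_is_ro /persistent should both succeed, while mount_is_ro /var/volatile should not.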

So what about runtime writes?

The first-boot seeder in Step 4 quietly ships an important pattern: mount -o remount,rw /persistent, do the writes, mount -o remount,ro /persistent. Two lines, one for each direction. And nothing stops us from doing the same thing at runtime, every time the application genuinely needs to write.

In a tightly controlled embedded system, you usually know when persistent state actually changes: a user pushes a config update, a calibration routine finishes, a log rotation rolls over. The total time per day that the system genuinely writes to /persistent, summed up, is often a handful of seconds. So why keep /persistent mounted rw for the rest of the day’s 86,400 seconds?

The pattern is: /persistent boots read-only. The first write request triggers mount -o remount,rw /persistent and arms a short grace-period timer (ten seconds is a reasonable starting point). Every subsequent write inside the grace window re-arms the timer, coalescing bursts. When the timer elapses, mount -o remount,ro /persistent fires and we are back to read-only steady state.

Most of the day, you are protected by exactly the property this article built. A power loss during the small rw window is still bad; a power loss during the 99.99% of the day that is not the rw window is fine. That ratio is the entire point. As a bonus, once writes carry an explicit cost (a remount, a timer arm, a privileged operation), developers stop writing things speculatively. Logs no one reads, state files updated every second: they all become visible and easy to push back on.
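To make the timer logic concrete, here is a sketch of just the state machine, with no daemon, no signal handling and no privilege separation. Every name in it (request_write, expire_if_idle, MOUNT_CMD, GRACE, TARGET) is hypothetical and not taken from any existing implementation; timestamps are passed in explicitly so the logic can be exercised without waiting or running as root.

```shell
#!/bin/sh
# Sketch of the remount-with-grace-period state machine only.
# All names are hypothetical. MOUNT_CMD can be overridden to try the
# logic without touching a real mount.
MOUNT_CMD=${MOUNT_CMD:-mount}
GRACE=${GRACE:-10}              # grace period in seconds
TARGET=${TARGET:-/persistent}
state=ro
deadline=0

# Call on every write request; $1 is the current time in seconds.
request_write() {
    now=$1
    if [ "$state" = ro ]; then
        $MOUNT_CMD -o remount,rw "$TARGET" || return 1
        state=rw
    fi
    deadline=$((now + GRACE))   # (re-)arm the timer, coalescing bursts
}

# Call periodically; $1 is the current time in seconds.
# Closes the rw window once the grace period has elapsed.
expire_if_idle() {
    now=$1
    if [ "$state" = rw ] && [ "$now" -ge "$deadline" ]; then
        $MOUNT_CMD -o remount,ro "$TARGET" || return 1
        state=ro
    fi
}
```

A caller would pass "$(date +%s)" as the timestamp. A burst of write requests produces exactly one remount to rw and, once the window has been idle for GRACE seconds, one remount back to ro.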

Two architectural notes for the follow-up article. First, the application and the code that owns the remount belong in separate processes: the remount needs CAP_SYS_ADMIN and should live in a small auditable watchdog daemon. Inlining mount -o remount into the application is the kind of decision you regret the day a misbehaving process leaves /persistent rw forever. Second, the watchdog has to remount back to ro on SIGTERM, SIGINT, and on its own crash recovery path, or the design fails in the wrong direction, with /persistent left writable.

A working implementation with the writer, watchdog, signal handling and timer logic is the subject of the follow-up article, built on top of the same image we just constructed.

Conclusion

That’s it, we’re done! We have a poky image where the rootfs is structurally read-only, persistent factory configuration lives on a labelled ext4 /persistent partition reached via /etc symlinks, and runtime-writable scratch space lives on tmpfs, reached through a symlink baked in at rootfs build time (sysvinit) or a bind mount set up at boot (systemd).

Three patterns generalise far beyond this small example:

  • Anything in /etc that needs to be per-device persistent gets a symlink onto /persistent plus a factory copy seeded on first boot.
  • Anything that needs to be writable at runtime but does not need to persist across reboots goes onto tmpfs, via volatile-binds under systemd or a /etc/default/volatiles/ fragment under sysvinit.
  • The two non-obvious runqemu knobs (QB_FSINFO = "wic:no-kernel-in-fs" and an explicit ro in QB_KERNEL_ROOT) are the difference between “boots read-only as designed” and “boots read-write while you stare at the recipe wondering why”.

The grace-period timer pattern from the previous section is the logical next step on top of this image, and a complete implementation is the subject of a follow-up article. The meta-rwdata layer in this build directory contains the full unedited recipes for anyone who wants to clone and reproduce.
