I have a scheduled restart on my devices. This is done via a systemd timer calling systemctl reboot.
On each reboot I see messages from systemd showing that mender-authd dumped core.
systemd[1]: mender-authd.service: Main process exited, code=dumped, status=11/SEGV
systemd[1]: mender-authd.service: Failed with result 'core-dump'.
I’m running mender-auth 5.0.2 on a Yocto Scarthgap build. I’m seeing this on multiple devices.
I’m happy to provide more information if that helps. Not sure what to try next to solve this.
I’m also not sure whether the core dumps are being saved at all, or where.
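On Linux, where a core dump ends up is decided by the kernel's core_pattern setting: a value starting with | pipes the dump to a helper program, and anything else is treated as a file name template. A minimal sketch for checking this on a device follows; the function name describe_core_pattern is made up for illustration, and whether a helper such as systemd-coredump is present in a given Yocto image is an assumption to verify.

```shell
#!/bin/sh
# Sketch: describe where the kernel sends core dumps, based on core_pattern.
# describe_core_pattern is a hypothetical helper name, not part of any tool.
describe_core_pattern() {
    case "$1" in
        "|"*) echo "piped to handler: ${1#"|"}" ;;
        /*)   echo "written to absolute path template: $1" ;;
        *)    echo "written relative to the crashing process's working directory: $1" ;;
    esac
}

# On a Linux device, read the live setting (skipped if /proc is unavailable).
if [ -r /proc/sys/kernel/core_pattern ]; then
    describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

If the pattern is piped to systemd-coredump, coredumpctl list should show any dumps that were kept. The core size limit of the service also has to allow a dump to be written.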
Cheers!
Hi @greg-blackcurrent,
Thanks for reaching out! That sounds a bit concerning indeed. Unfortunately, core dumps are usually not persisted in such setups. Persisting them would also be quite impractical, as core dumps can be huge and connected devices in most cases don’t have a lot of spare non-volatile storage.
One small question: what do you mean by “scheduled restarts”? How are those triggered and implemented?
If you really want to go digging, then minicoredumper is a good way to approach this. However, it will require some effort, hence it would be way better if we could reproduce this. Like, does it also happen with your restart strategy in QEMU?
Greetz,
Josef
Short Answer: The scheduled restarts are triggered by a systemd timer which calls systemctl reboot.
I haven’t seen this in QEMU - I can try and reproduce there.
It’s possible that mender-authd is in a “reconnecting” state when it gets SIGINT (or SIGTERM?) from systemd. We do a modem reset immediately before rebooting, so it could be that mender-authd is in the middle of some kind of reconnect and doesn’t like being interrupted at that point. Not sure, still searching for ideas.
Long answer…
[Unit]
Description=Periodic Rebooter
[Timer]
# System is using UTC time so this is either 7am, or 8am depending on daylight savings.
# 10 seconds after the hour, so that we get some time to send the message for the sample made at 19:00.
OnCalendar=*-*-* 19:00:10
[Install]
WantedBy=timers.target
[Unit]
Description=Periodic Rebooter
[Service]
Type=oneshot
ExecStart=/usr/bin/reset-modem-and-reboot "periodic-rebooter: resetting modem and rebooting..."
#!/bin/sh
echo "reset-modem-and-reboot: resetting modem."
# On a IoT Link device, the modem power can be reset by toggling a GPIO pin.
# https://mediawiki.compulab.com/w/index.php?title=IOT-LINK:_Linux:_How-To_Guide#Modem_Reset
/usr/bin/gpioset -c 0 --toggle 0 22=1
/usr/bin/sleep 0.3
/usr/bin/gpioset -c 0 --toggle 0 22=0
echo "reset-modem-and-reboot: rebooting."
/bin/systemctl reboot --message="$1"
One other note: when I manually run systemctl reboot, or reboot, this also causes a core dump.
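To check which reboots actually produced the crash, the journal of the previous boot can be searched for the SEGV line quoted at the top of the thread. This is only a sketch: it assumes the journal is persistent across reboots, and count_authd_segv is a made-up helper name.

```shell
#!/bin/sh
# Sketch: count mender-authd SEGV exits in journal text read from stdin.
# count_authd_segv is a hypothetical helper name.
count_authd_segv() {
    # -F matches the string literally; "|| true" hides grep's non-zero
    # exit status when the count is 0.
    grep -F -c 'mender-authd.service: Main process exited, code=dumped, status=11/SEGV' || true
}

# Usage on the device (assumes a persistent journal):
#   journalctl -b -1 -u mender-authd.service --no-pager | count_authd_segv
```

A count of 0 after a plain systemctl stop and 1 after a reboot would match the behaviour described in this thread.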
Hi @greg-blackcurrent,
The client definitely should not segfault, but without understanding the chain of causality it’s really hard to narrow it down.
So a few thoughts here:
- Does it also segfault if you run systemctl stop mender-authd?
- Does it also happen if you extend the modem reset script to stop the mender-authd service before resetting?
Greetz,
Josef
It doesn’t segfault if I do systemctl stop.
It does segfault if I manually do a systemctl reboot. So I suspect the scripts I posted above are a red herring.
I’ve also turned on debug log level for mender-authd - no extra clues there. I may see if I can add some extra debug logging in the signal handling code which triggers the clean shutdown.
I don’t have a way to reproduce this in QEMU at the moment. The Compulab/NXP imx9 machine configuration doesn’t seem to support QEMU.
Perhaps it would make sense to port the build to another machine config which does support QEMU. I am looking at building for another device soonish.
I agree this is going to be difficult to track down. I don’t have a lot of information to work with at this point. I’m totally open to suggestions of other things to try.
I wonder if the shutdown process is removing some resource that mender-authd is relying on. Unmounting partitions too soon is the only thing that comes to mind.
Hi @greg-blackcurrent,
The good news is: I can reproduce it.
Dec 15 11:55:15 qemuarm64 mender-auth[629]: using interface /sys/class/net/enp0s1
Dec 15 11:55:15 qemuarm64 mender-auth[237]: record_id=12 severity=info time="2025-Dec-15 11:55:15.503061" name="Global" msg="Signing with: /var/lib/mender/mender-agent.pem"
Dec 15 11:55:15 qemuarm64 mender-auth[237]: record_id=13 severity=info time="2025-Dec-15 11:55:15.835892" name="Global" msg="Authentication error trying server 'https://hosted.mender.io': Unauthorized error: Authentication error(Unauthorized): Failed to authorize with the server.({"error":"dev auth: unauthorized","request_id":"303eaf66-c6a9-43bf-a7d2-6b44f9bf615c"})"
Dec 15 11:55:15 qemuarm64 mender-auth[237]: record_id=14 severity=error time="2025-Dec-15 11:55:15.836321" name="Global" msg="Failed to fetch new token: Authentication error: No more servers to try for authentication"
Dec 15 11:55:24 qemuarm64 systemd[1]: Stopping Mender authentication service...
Dec 15 11:55:24 qemuarm64 systemd[1]: mender-authd.service: Main process exited, code=dumped, status=11/SEGV
Dec 15 11:55:24 qemuarm64 systemd[1]: mender-authd.service: Failed with result 'core-dump'.
Dec 15 11:55:24 qemuarm64 systemd[1]: Stopped Mender authentication service.
-- Boot 96e6797d665040979831671656464d14 --
The bad news is: no idea what causes it. I’m currently discussing it with the developers, and will keep you posted if new information is available.
Greetz,
Josef
That’s really good news. Thanks for looking into this!
Hi @greg-blackcurrent,
As a first work-around, you can add a systemctl stop mender-updated; systemctl stop mender-authd to the reboot service.
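Applied to the reset-modem-and-reboot script posted earlier, the work-around could look roughly like the sketch below. Two things here are assumptions rather than part of the advice: the stops are placed before the modem reset, and the DRY_RUN switch and the run helper are hypothetical additions so the sketch can be tried without actually rebooting.

```shell
#!/bin/sh
# Sketch: reset-modem-and-reboot with the suggested work-around applied.
# DRY_RUN is a hypothetical safety switch for this example; it defaults
# to 1, so the commands are only printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_mender_and_reboot() {
    # Stop the Mender services explicitly, before system shutdown begins.
    run /bin/systemctl stop mender-updated
    run /bin/systemctl stop mender-authd

    # Modem reset, unchanged from the original script.
    run /usr/bin/gpioset -c 0 --toggle 0 22=1
    run /usr/bin/sleep 0.3
    run /usr/bin/gpioset -c 0 --toggle 0 22=0

    run /bin/systemctl reboot --message="$1"
}

# On the device, the script would end with:
#   DRY_RUN=0
#   stop_mender_and_reboot "$1"
```

Because systemctl stop on its own did not trigger the segfault, stopping both services before reboot is requested should keep them out of the shutdown path where the crash occurs.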
We have narrowed it down to a race condition in system shutdown, and are now planning the fix as part of ongoing development. Thank you for the report!
Greetz,
Josef
Thanks for the update.
Is there a Jira ticket I can follow?