Troubleshooting addon goes offline for hours

Hi

We observed that on our devices, mender-connect addon goes offline for long periods of time (could be few hours)

The effect is only troubleshooting is offline, and deployment and inventory updates still works

an example of log from an event this happend

Sep 03 17:43:36 raspberrypi mender-connect[574]: time=“2025-09-03T17:43:36+03:00” level=error msg=“connection manager failed to connect to http://127.0.0.1:32959/api/devices/v1/deviceconnect/connect: read tcp 127.0.0.1:44654->127.0.0.1:32959: i/o timeout; reconnecting in 5s (try 6/10); len(token)=647”
Sep 03 17:43:38 raspberrypi mender-connect[574]: time=“2025-09-03T17:43:38+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:43:49 raspberrypi mender-connect[574]: time=“2025-09-03T17:43:49+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:44:00 raspberrypi mender-connect[574]: time=“2025-09-03T17:44:00+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:44:11 raspberrypi mender-connect[574]: time=“2025-09-03T17:44:11+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:44:22 raspberrypi mender-connect[574]: time=“2025-09-03T17:44:22+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:44:26 raspberrypi mender-connect[574]: time=“2025-09-03T17:44:26+03:00” level=error msg=“connection manager failed to connect to http://127.0.0.1:32959/api/devices/v1/deviceconnect/connect: read tcp 127.0.0.1:54440->127.0.0.1:32959: i/o timeout; reconnecting in 5s (try 7/10); len(token)=647”
Sep 03 17:44:33 raspberrypi mender-connect[574]: time=“2025-09-03T17:44:33+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:44:44 raspberrypi mender-connect[574]: time=“2025-09-03T17:44:44+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:44:55 raspberrypi mender-connect[574]: time=“2025-09-03T17:44:55+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:45:06 raspberrypi mender-connect[574]: time=“2025-09-03T17:45:06+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:45:16 raspberrypi mender-connect[574]: time=“2025-09-03T17:45:16+03:00” level=error msg=“connection manager failed to connect to http://127.0.0.1:32959/api/devices/v1/deviceconnect/connect: read tcp 127.0.0.1:35162->127.0.0.1:32959: i/o timeout; reconnecting in 5s (try 8/10); len(token)=647”
Sep 03 17:45:17 raspberrypi mender-connect[574]: time=“2025-09-03T17:45:17+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:45:28 raspberrypi mender-connect[574]: time=“2025-09-03T17:45:28+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:45:39 raspberrypi mender-connect[574]: time=“2025-09-03T17:45:39+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:45:50 raspberrypi mender-connect[574]: time=“2025-09-03T17:45:50+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:46:01 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:01+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:46:06 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:06+03:00” level=error msg=“connection manager failed to connect to http://127.0.0.1:32959/api/devices/v1/deviceconnect/connect: read tcp 127.0.0.1:51468->127.0.0.1:32959: i/o timeout; reconnecting in 5s (try 9/10); len(token)=647”
Sep 03 17:46:12 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:12+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:46:23 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:23+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:46:34 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:34+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:46:45 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:45+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:46:56 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:56+03:00” level=error msg=“eventLoop: error reconnecting: failed to connect after max number of retries”
Sep 03 17:46:56 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:56+03:00” level=trace msg=“messageLoop: got response: {event:connected-error data: id:}”
Sep 03 17:46:56 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:56+03:00” level=trace msg=“messageLoop: posting event: reconnect-req; waiting for response”
Sep 03 17:46:56 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:56+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:46:56 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:56+03:00” level=trace msg=“needsReconnect: got event: reconnect-req”
Sep 03 17:46:56 raspberrypi mender-connect[574]: time=“2025-09-03T17:46:56+03:00” level=trace msg=“dbusEventLoop: daemon needs to reconnect”
Sep 03 17:47:06 raspberrypi mender-connect[574]: time=“2025-09-03T17:47:06+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:47:06 raspberrypi mender-connect[574]: time=“2025-09-03T17:47:06+03:00” level=trace msg=“(dbusEventLoop) posting Event: reconnect”
Sep 03 17:47:06 raspberrypi mender-connect[574]: time=“2025-09-03T17:47:06+03:00” level=trace msg=“eventLoop: got event: reconnect”
Sep 03 17:47:17 raspberrypi mender-connect[574]: time=“2025-09-03T17:47:17+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:47:28 raspberrypi mender-connect[574]: time=“2025-09-03T17:47:28+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:47:39 raspberrypi mender-connect[574]: time=“2025-09-03T17:47:39+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:47:50 raspberrypi mender-connect[574]: time=“2025-09-03T17:47:50+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:47:51 raspberrypi mender-connect[574]: time=“2025-09-03T17:47:51+03:00” level=error msg=“connection manager failed to connect to http://127.0.0.1:32959/api/devices/v1/deviceconnect/connect: read tcp 127.0.0.1:57666->127.0.0.1:32959: i/o timeout; reconnecting in 5s (try 1/10); len(token)=647”
Sep 03 17:48:01 raspberrypi mender-connect[574]: time=“2025-09-03T17:48:01+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:48:12 raspberrypi mender-connect[574]: time=“2025-09-03T17:48:12+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:48:23 raspberrypi mender-connect[574]: time=“2025-09-03T17:48:23+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:48:34 raspberrypi mender-connect[574]: time=“2025-09-03T17:48:34+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:48:41 raspberrypi mender-connect[574]: time=“2025-09-03T17:48:41+03:00” level=error msg=“connection manager failed to connect to http://127.0.0.1:32959/api/devices/v1/deviceconnect/connect: read tcp 127.0.0.1:50636->127.0.0.1:32959: i/o timeout; reconnecting in 5s (try 2/10); len(token)=647”
Sep 03 17:48:45 raspberrypi mender-connect[574]: time=“2025-09-03T17:48:45+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 03 17:48:56 raspberrypi mender-connect[574]: time=“2025-09-03T17:48:56+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”

we are using 2025-05-13-raspios-bookworm-arm64-lite.img on rpizero2w
with mender 4.0.7 and mender-connect 2.2.1 runtime: go1.21.1
on the hosted mender servers (US)

What can be done?

Edit by @TheYoctoJester: formatting of log

Hi @roee.douek,

Thanks for reaching out!

The easiest thing to try is bumping mender-connect to 2.3.1, as if I remember correctly there were some improvements concerning handling the connection.
Besides that, are there maybe any recognizable patterns? Like

  • always around a similar time or day,
  • or a specific action taking place?

Greetz,
Josef

i tried upgrading to 2.3.1 and the issue still replicate

i feel like now with the exponential backoff it would disconnect for longer?

pi@raspberrypi:~ $ sudo journalctl -u mender-updated.service
Sep 13 12:01:10 raspberrypi mender-update[627]: record_id=122 severity=info time=“2025-Sep-13 12:01:10.121374” name=“Global” msg=“No update available”
Sep 13 12:11:10 raspberrypi mender-update[627]: record_id=123 severity=info time=“2025-Sep-13 12:11:10.148140” name=“Global” msg=“No update available”
Sep 13 12:21:10 raspberrypi mender-update[627]: record_id=124 severity=info time=“2025-Sep-13 12:21:10.255266” name=“Global” msg=“No update available”
Sep 13 12:31:10 raspberrypi mender-update[627]: record_id=125 severity=info time=“2025-Sep-13 12:31:10.304287” name=“Global” msg=“No update available”
Sep 13 12:41:10 raspberrypi mender-update[627]: record_id=126 severity=info time=“2025-Sep-13 12:41:10.416596” name=“Global” msg=“No update available”
pi@raspberrypi:~ $ sudo journalctl -u mender-authd.service
Sep 13 12:11:16 raspberrypi mender-auth[615]: record_id=113 severity=error time=“2025-Sep-13 12:11:16.308219” name=“Global” msg="Error during network socket forwardin>
Sep 13 12:11:16 raspberrypi mender-auth[615]: record_id=114 severity=warning time=“2025-Sep-13 12:11:16.308663” name=“http_client” url="https://hosted.mender.io/api/d>
Sep 13 12:11:16 raspberrypi mender-auth[615]: record_id=115 severity=error time=“2025-Sep-13 12:11:16.318146” name=“Global” msg="Error during network socket forwardin>
pi@raspberrypi:~ $ sudo journalctl -u mender-connect.service
Sep 13 12:07:30 raspberrypi mender-connect[626]: time=“2025-09-13T12:07:30+03:00” level=error msg="connection manager failed to connect to http://127.0.0.1:44853/api/>
Sep 13 12:07:30 raspberrypi mender-connect[626]: time=“2025-09-13T12:07:30+03:00” level=error msg="eventLoop: error reconnecting: read tcp 127.0.0.1:42776->127.0.0.1:>
Sep 13 12:07:50 raspberrypi mender-connect[626]: time=“2025-09-13T12:07:50+03:00” level=info msg=“connectionmanager backoff: retrying in 1h11m54.74583393s”
pi@raspberrypi:~ $

and observing

as for your questions

that happens not around a specific time, we feel its related to network disconnects/hiccups, but everything else recover rather quickly

even after restaring the service (added —dd flag) im seeing

Sep 13 13:05:10 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:10+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:05:21 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:21+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:05:32 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:32+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:05:43 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:43+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:05:44 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:44+03:00” level=error msg=“connection manager failed to connect to http://127.0.0.1:44853/api/devices/v1/deviceconnect/conne>
Sep 13 13:05:44 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:44+03:00” level=error msg=“eventLoop: error reconnecting: read tcp 127.0.0.1:34688->127.0.0.1:44853: i/o timeout”
Sep 13 13:05:44 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:44+03:00” level=trace msg=“messageLoop: got response: {event:connected-error data: id:}”
Sep 13 13:05:44 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:44+03:00” level=trace msg=“messageLoop: posting event: reconnect-req; waiting for response”
Sep 13 13:05:44 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:44+03:00” level=trace msg=“needsReconnect: got event: reconnect-req”
Sep 13 13:05:44 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:44+03:00” level=trace msg=“dbusEventLoop: daemon needs to reconnect”
Sep 13 13:05:54 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:54+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:05:54 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:54+03:00” level=trace msg=”(dbusEventLoop) posting Event: reconnect"
Sep 13 13:05:54 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:54+03:00” level=trace msg=“eventLoop: got event: reconnect”
Sep 13 13:05:54 raspberrypi mender-connect[310953]: time=“2025-09-13T13:05:54+03:00” level=info msg=“connectionmanager backoff: retrying in 1m0s”
Sep 13 13:06:05 raspberrypi mender-connect[310953]: time=“2025-09-13T13:06:05+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:06:16 raspberrypi mender-connect[310953]: time=“2025-09-13T13:06:16+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:06:27 raspberrypi mender-connect[310953]: time=“2025-09-13T13:06:27+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:06:38 raspberrypi mender-connect[310953]: time=“2025-09-13T13:06:38+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:06:49 raspberrypi mender-connect[310953]: time=“2025-09-13T13:06:49+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:07:00 raspberrypi mender-connect[310953]: time=“2025-09-13T13:07:00+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”
Sep 13 13:07:11 raspberrypi mender-connect[310953]: time=“2025-09-13T13:07:11+03:00” level=trace msg=“dbusEventLoop: WaitForJwtTokenStateChange , err timeout waiting for signal JwtTokenStateChange”

pi@raspberrypi:~ $ sudo dbus-send --system --dest=io.mender.AuthenticationManager --print-reply /io/mender/AuthenticationManager io.mender.Authentication1.GetJwtToken
method return time=1757758057.251751 sender=:1.15 → destination=:1.48 serial=160 reply_serial=2
string “eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJqdGkiOiI3ZmQ1YWIyYy03NzI3LTRhODAtOThiOC1iODQ4Y2E5NTJlODkiLCJzdWIiOiJjNGExMzNmYS1lZTllLTQ5ZGQtYTczYy05M2ZhMDliYTFlYjAiLCJpc3MiOiJNZW5kZXIiLCJtZW5kZXIudGVuYW50IjoiNjgxMGRhNjExZjMzZTk4MDcwYmY0MmMxIiwiZXhwIjoxNzU4MzA3OTMwLCJpYXQiOjE3NTc3MDMxMzAsIm5iZiI6LTYyMTM1NTk2ODAwLCJtZW5kZXIuZGV2aWNlIjp0cnVlLCJtZW5kZXIucGxhbiI6Im9zIiwibWVuZGVyLnRyaWFsIjpmYWxzZSwibWVuZGVyLmFkZG9ucyI6W3sibmFtZSI6InRyb3VibGVzaG9vdCIsImVuYWJsZWQiOnRydWV9LHsibmFtZSI6ImNvbmZpZ3VyZSIsImVuYWJsZWQiOmZhbHNlfSx7Im5hbWUiOiJtb25pdG9yIiwiZW5hYmxlZCI6ZmFsc2V9XX0.PhagXWCe5VDy3e0ng6jJGv-GObRhpsuUuJ0KlC84cY4AdUdklf5OXzGFxPxrVCcLkn4sMDHo8yx_I2H4rZlRAA”
string “http://127.0.0.1:44853

Here are logs from the services, i even tried restarting them

mender-authd.service

mender-connect.service

this is still recurring, what can be done to help debug?

Hi @roee.douek,

After digging a bit, there is an improvement to the connection-restart logic in feat: enable 10 minute idle timeout by default by danielskinstad · Pull Request #158 · mendersoftware/mender-connect · GitHub. Unfortunately, that one has not made it into a mender-connect release yet.

Do you have a way of building the binary and putting it onto one of the devices in question?

Greetz,
Josef

sure, i have built master and set it up on my device

will follow up in the next few days

1 Like

@roee.douek thanks!