Mender MCU client crash on ESP32-S3 during identity callback

ESP32-S3 + Zephyr

Device description

Board: ESP32-S3 DevKitC
OS: Zephyr v4.2.0
Quick description:
A development board with dual-core Xtensa LX7, used here with Zephyr OS and integrated Mender MCU client for OTA updates.


Support level

  • Board supported: ESP32-S3 DevKitC

  • Zephyr OS integration tested by the user for basic build and OTA functionality.

  • Note: Crash occurs during Mender MCU client initialization when using identity callback.


Getting started

  • Zephyr project: mender-mcu-integration

  • Build steps (successful firmware build with Mender MCU integration):

    west build -b esp32s3_devkitc path/to/mender-mcu-integration
    west espressif flash -p /dev/ttyUSB0
    west espressif monitor -p /dev/ttyUSB0 -b 115200
    
    
  • MCUboot is used as the bootloader.

  • TLS certificates are integrated via certs.c and .cer.inc include files.

  • Mender Server: self-hosted Open Source server deployed on Ubuntu (following official Mender tutorial).

  • Mender Client Kconfig settings for self-hosted server:

    CONFIG_MENDER_SERVER_HOST="http://<server_ip>"
    CONFIG_MENDER_SERVER_TENANT_TOKEN=""   # empty for on-prem server
    CONFIG_MENDER_NET_CA_CERTIFICATE_TAG_PRIMARY=1
    CONFIG_MENDER_CLIENT_UPDATE_POLL_INTERVAL=30
    CONFIG_MENDER_CLIENT_INVENTORY_REFRESH_INTERVAL=60
    
    

References


Known issues

Problem description:
During initialization of the Mender MCU client on ESP32-S3 with Zephyr, the device crashes after the network connection and TLS setup, at the point where the identity callback is called.

Crash log snippet:

[1970-01-01T00:00:04,673000Z] <dbg> mender_app: mender_network_connect_cb: network_connect_cb
[1970-01-01T00:00:04,673000Z] <inf> mender: Checking for deployment...
[1970-01-01T00:00:04,673000Z] <dbg> mender: mender_storage_get_provides: Provides not available
[1970-01-01T00:00:04,673000Z] <dbg> mender_app: mender_get_identity_cb: get_identity_cb
ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/lib/libc/picolibc/assert.c:27
[1970-01-01T00:00:04,746000Z] <err> os:  ** FATAL EXCEPTION
[1970-01-01T00:00:04,746000Z] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[1970-01-01T00:00:04,746000Z] <err> os:  **  PC 0x40378587 VADDR 0
[1970-01-01T00:00:04,746000Z] <err> os:  **  PS 0x60e20
[1970-01-01T00:00:04,746000Z] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:14 CALLINC:2)

Identity callback used:

mender_err_t mender_get_identity_cb(const mender_identity_t **identity_ptr) {
    if (identity_ptr) {
        *identity_ptr = &identity;
        return MENDER_OK;
    }
    return MENDER_FAIL;
}

Steps attempted to resolve:

  1. Verified mender_get_identity_cb returns MENDER_OK.

  2. Integrated self-signed TLS certificates correctly.

  3. Ensured network (Wi-Fi) is up before client initialization.

  4. Tried different static identity formats.

  5. Checked memory usage; NVS storage initialized correctly.

  6. Confirmed server is reachable and accessible from other devices.

Observations:

  • Crash occurs immediately after mender_get_identity_cb is called.

  • The device successfully connects to Wi-Fi and gets an IP address.

  • TLS setup is completed and primary certificate is loaded.

  • MCUboot successfully loads the Zephyr application; “Hello World” message prints before crash.

Request for guidance:

  • Could this crash be due to environment initialization order (NVS, TLS, time)?

  • Are there known issues with ESP32-S3 + Zephyr + Mender MCU client regarding identity callback?

  • Any recommended workaround or example for safe initialization sequence on ESP32-S3 with self-hosted Mender server?

Hello @alian and thank you for your report,

This is something I would like to look deeper into. Certainly an exception like that one is something we need to look into!

Please clarify some points:

  • Mender Client Kconfig settings for self-hosted server:
CONFIG_MENDER_SERVER_HOST="http://<server_ip>"
CONFIG_MENDER_SERVER_TENANT_TOKEN=""   # empty for on-prem server
CONFIG_MENDER_NET_CA_CERTIFICATE_TAG_PRIMARY=1
CONFIG_MENDER_CLIENT_UPDATE_POLL_INTERVAL=30
CONFIG_MENDER_CLIENT_INVENTORY_REFRESH_INTERVAL=60

Setting SERVER_HOST to http://<server_ip> cannot work for two reasons: 1) Mender MCU won’t connect to a non-HTTPS server, and 2) it must be a domain name and not an IP address. If you are running on a private network, you will need to set up DNS so that the name resolves to the server’s IP address.
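To make the two constraints concrete, here is a minimal sketch of a pre-flight check on the configured value. The helper `server_host_looks_valid` is hypothetical and not part of Mender MCU; it only encodes the two rules stated above (HTTPS scheme, host is a name rather than an IPv4 literal) and does not handle IPv6 literals. The address 192.0.2.10 is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a Mender MCU API: returns true when `url` starts
 * with "https://" and the host part is not a dotted-quad IPv4 literal. */
static bool server_host_looks_valid(const char *url) {
    static const char scheme[] = "https://";
    const size_t scheme_len = sizeof(scheme) - 1;

    if (url == NULL || strncmp(url, scheme, scheme_len) != 0) {
        return false; /* plain http:// or missing scheme */
    }

    const char *host = url + scheme_len;
    const size_t host_len = strcspn(host, ":/"); /* stop at port or path */
    if (host_len == 0) {
        return false; /* empty host */
    }

    /* A host made only of digits and dots is treated as an IP address. */
    if (strspn(host, "0123456789.") >= host_len) {
        return false;
    }
    return true;
}

/* Usage example with the kinds of values discussed in this thread. */
static void example_usage(void) {
    assert(server_host_looks_valid("https://docker.mender.io"));
    assert(!server_host_looks_valid("http://192.0.2.10"));  /* wrong scheme, IP host */
    assert(!server_host_looks_valid("https://192.0.2.10")); /* HTTPS, but still an IP */
}
```

A check like this could run once at startup, before the client is initialized, so that a misconfigured host shows up as a clear log message.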

Can you tell us more about your on-prem setup?

References

These don’t exist, is this an AI hallucination? Which guide(s) did you use?

References
TLS / certificates guidance: Mender MCU TLS documentation

Are you referring to this one?

Identity callback used:

mender_err_t mender_get_identity_cb(const mender_identity_t **identity_ptr) {
   if (identity_ptr) {
       *identity_ptr = &identity;
       return MENDER_OK;
   }
   return MENDER_FAIL;
}

Can you please share where identity is defined? I assume you used our demo code, which defines it as a static variable (static mender_identity_t mender_identity), but I would like to double-check.
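For context on why the definition matters: the callback hands out a pointer, so the object it points to has to outlive the call. Below is a minimal sketch with static storage. The type definitions are stand-ins so the snippet compiles on its own (the real ones come from the Mender MCU headers and may differ), and the MAC value is a placeholder. This is not a confirmed cause of the crash, only the thing being double-checked.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; the real definitions live in the
 * Mender MCU headers and may differ. */
typedef enum { MENDER_OK = 0, MENDER_FAIL = -1 } mender_err_t;
typedef struct {
    const char *name;
    const char *value;
} mender_identity_t;

/* Static storage duration: the object exists for the whole program run, so
 * the pointer returned through the callback never dangles. A local variable
 * inside some init function would not give that guarantee. */
static mender_identity_t identity = {
    .name  = "mac",
    .value = "00:11:22:33:44:55", /* placeholder value */
};

mender_err_t mender_get_identity_cb(const mender_identity_t **identity_ptr) {
    if (identity_ptr) {
        *identity_ptr = &identity;
        return MENDER_OK;
    }
    return MENDER_FAIL;
}

/* Usage example: the caller receives a pointer to the static object. */
static void example_usage(void) {
    const mender_identity_t *id = NULL;
    assert(mender_get_identity_cb(&id) == MENDER_OK);
    assert(id == &identity);
    assert(mender_get_identity_cb(NULL) == MENDER_FAIL);
}
```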

Lluís

One more thing,

In Zephyr v4.2.0, this exact line is a comment :thinking:

Lluís

Hi @lluiscampos,

I would like to provide a follow-up on my setup and ask for your guidance regarding an authentication issue I am currently facing.

After rebuilding my firmware using the official mender-mcu integration example, the device now boots correctly, connects to Wi-Fi, and completes TLS initialization. However, authentication with my self-hosted Mender server fails.


Device Environment

  • Server: Self-hosted Mender Server (Docker Compose, likely single-tenant)

  • Board: ESP32-S3 DevKitC + MCUboot

  • Zephyr: v4.2.0

The server certificate is embedded in the firmware:

CONFIG_MENDER_SERVER_HOST="https://docker.mender.io"
CONFIG_MENDER_NET_CA_CERTIFICATE_TAG_PRIMARY=1


Initial Error

When no tenant token is configured:

<err> mender: [401] Unauthorized: tenant token missing

In the server UI, I cannot find an Organization/Tenant token, which led me to assume that a tenant token is not required for a single-tenant deployment.


Attempts

:one: No tenant token

CONFIG_MENDER_SERVER_TENANT_TOKEN=""

Result:

401 Unauthorized: tenant token missing


:two: Using tokens from Settings → My profile

I tried both:

  • Session token

  • Personal access token

When using a Personal Access Token, the error changes to:

401 Unauthorized: Unauthorized

Full log excerpt:

<err> mender: [401] Unauthorized: Unauthorized
<err> mender: Authentication failed
<err> mender: Unable to perform HTTP request

After repeated retries, I also see:

<err> mender: Unable to allocate memory

(This may be a secondary issue caused by repeated authentication failures.)


Observations

The device successfully reaches:

mender_get_identity_cb

but authentication fails immediately afterward, which suggests the request is being rejected before device provisioning.


Questions

For a self-hosted single-tenant Mender server:

:white_check_mark: Tenant Token

Should CONFIG_MENDER_SERVER_TENANT_TOKEN be:

  • completely removed, or

  • defined as an empty string?

Does the “tenant token missing” error typically indicate:

  • a server-side configuration problem, or

  • an incorrect MCU client configuration?


:white_check_mark: Token Types

Are Session Tokens or Personal Access Tokens ever valid substitutes for a tenant token?

Or should the MCU client never use those tokens?


:white_check_mark: Recommended Setup

Is there a recommended authentication configuration specifically for:

Mender MCU + on-prem / self-hosted server?

A minimal known-working configuration would be extremely helpful for narrowing this down.


At this point, connectivity and TLS appear to be functioning correctly, but authentication consistently fails. I suspect something fundamental may be missing in either the client configuration or the server setup.

Any guidance would be greatly appreciated — especially if there is a known working reference for Mender MCU with a self-hosted deployment.

Thank you again for your support!

Hi @alian,

I understand that you have good intentions when using AI to improve your reports. However, this creates a lot of extra noise and it makes it more time consuming for me to help you (and a bit annoying, to be honest).

Please ask AI to make short and concrete reports. Say something like “this report goes to a Mender expert, do not add extra assumptions nor beautify our report. Assume that the reader knows the code base and the product. Avoid emojis and bullet points”.

All right, I’ll try to help anyway. There are two issues that I can see:

ISSUE 1: CRASH

Mender MCU crashes with an IP-address based SERVER_URL → This part is no longer a blocker for you because you set it up “correctly” with a URL. Still, I will follow up on this internally as it may hide some other problem.

So let’s ignore this problem in this thread (you deleted yesterday’s message, so I assume you are not interested in following up anyway)

ISSUE 2: Unauthorized

Good news is that the client seems to work just fine now :tada: The client is able to contact the server through TLS, so we know that network is working, the certificate is fine, etc. Promising result.

The message “Unauthorized: tenant token missing” is coming from the Mender server. It indicates that the server is configured for multi-tenancy.

I suspect that you have the environment variable HAVE_MULTITENANT set.

Can you share how you are starting the server?

Hi, @lluiscampos

Thank you for the feedback. I understand your concern about the noise in my previous reports. Previously I used AI to help summarize my findings, and I am sorry for that. From now on, I will keep my reports short and concrete, without AI.

The previous crash about mender_get_identity_cb disappeared after I updated
CONFIG_MENDER_SERVER_HOST="https://docker.mender.io".

Afterwards, as previously mentioned, I encountered authentication errors:

<err> mender: [401] Unauthorized: Unauthorized
<err> mender: Authentication failed

As you suggested, this might be related to multi-tenancy.

I am using the open-source mender-server locally on my Ubuntu machine and assumed it was configured as single-tenant. Therefore, I removed CONFIG_MENDER_SERVER_TENANT_TOKEN from prj.conf, rebuilt, and reflashed the device.

After doing so, the monitor output shows DNS failures:

[1970-01-01T00:00:09,543000Z] mender: Initialization done
[1970-01-01T00:00:09,543000Z] mender_app: mender_network_connect_cb: network_connect_cb
[1970-01-01T00:00:09,543000Z] mender: Checking for deployment…
[1970-01-01T00:00:09,544000Z] mender_app: mender_get_identity_cb: get_identity_cb
[1970-01-01T00:00:10,865000Z] mender: Unable to resolve host name ‘docker.mender.io:443’: EAI_SYSTEM
[1970-01-01T00:00:10,865000Z] mender: Unable to open HTTP client connection
[1970-01-01T00:00:11,699000Z] mender: Unable to resolve host name ‘docker.mender.io:443’: EAI_SYSTEM

My Mender Server deployment is from the mendersoftware/mender-server repository, and I followed its README for the setup.

Since both my local mender-server PC and the mender-mcu device (ESP32-S3) are connected to the same Wi-Fi network, could you please advise how to resolve this DNS issue?

Thank you for your help.

Hi @alian,

Unless you took any actions with your name resolution, the hostname docker.mender.io will probably not resolve to your Mender Server instance. So there’s two things here:

Greetz,
Josef

Hi, @TheYoctoJester

Currently, with an on-premises Mender server, I am able to see the device in Pending, move it to Accepted, and create deployments without issues.

However, I am seeing extremely slow deployment speeds. For example, deploying a firmware image of approximately 750 KB takes an unusually long time (43200 seconds).

Is this deployment speed expected in a local setup? Are there any recommended configurations or checks to improve deployment performance?

For your convenience, I attached the complete monitor logs below.

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fcb5400,len:0x1e18
load:0x403ba400,len:0x7f38
load:0x403c6400,len:0x49c
entry 0x403bd34c
I (37) soc_init: MCUboot 2nd stage bootloader
I (37) soc_init: compile time Feb 13 2026 17:32:04
W (37) soc_init: Unicore bootloader
I (37) soc_init: chip revision: v0.2
I (40) flash_init: Boot SPI Speed : 80MHz
I (44) flash_init: SPI Mode : DIO
I (48) flash_init: SPI Flash Size : 8MB
I (97) boot: Image index: 0, Swap type: none
I (97) boot: Loading image 0 - slot 0 from flash, area id: 2
I (97) boot: Application start=4037e490h
I (98) boot: DRAM segment: paddr=0002f184h, vaddr=3fc93120h, size=03d74h ( 15732) load
I (107) boot: IRAM segment: paddr=00020080h, vaddr=40374000h, size=0f104h ( 61700) load
I (125) boot: IROM segment: paddr=00040000h, vaddr=42000000h, size=7A486h (500870) map
I (125) boot: DROM segment: paddr=000c0000h, vaddr=3c080000h, size=173B0h ( 95152) map
I (141) boot: libc heap size 122 kB.
I (141) spi_flash: detected chip: generic
I (141) spi_flash: flash io: dio
Hello SIEMENS! esp32s3_devkitc/esp32s3/procpu
*** Booting Zephyr OS build v4.2.0 ***
[1970-01-01T00:00:00,177000Z] mender_app: failed to bind to LED device
[1970-01-01T00:00:00,177000Z] mender_app: Using net interface wifi, index=1
[1970-01-01T00:00:00,177000Z] mender_app: Connecting to wireless network GL-MT3000-ee0…
[1970-01-01T00:00:00,224000Z] mender_app: Waiting for network up…
[1970-01-01T00:00:02,891000Z] net_dhcpv4: Received: 192.168.8.156
[1970-01-01T00:00:02,891000Z] mender_app: Address[1]: 192.168.8.156
[1970-01-01T00:00:02,891000Z] mender_app: Subnet[1]: 255.255.255.0
[1970-01-01T00:00:02,891000Z] mender_app: Router[1]: 192.168.8.1
[1970-01-01T00:00:02,891000Z] mender_app: Lease time[1]: 43200 seconds
[1970-01-01T00:00:02,891000Z] mender_app: Initializing Mender Client with:
[1970-01-01T00:00:02,891000Z] mender_app: Device type: ‘esp32s3_devkitc’
[1970-01-01T00:00:02,891000Z] mender_app: Identity: ‘{“mac”: “e8:f6:0a:8d:af:fc”}’
[1970-01-01T00:00:02,892000Z] mender: Device type: [esp32s3_devkitc]
[1970-01-01T00:00:02,903000Z] fs_nvs: 2 Sectors of 4096 bytes
[1970-01-01T00:00:02,903000Z] fs_nvs: alloc wra: 0, fd8
[1970-01-01T00:00:02,903000Z] fs_nvs: data wra: 0, 120
[1970-01-01T00:00:02,904000Z] mender_app: Mender client initialized
[1970-01-01T00:00:02,904000Z] mender_app: Update Module ‘zephyr-image’ initialized
[1970-01-01T00:00:02,904000Z] mender_app: Mender inventory callback added
[1970-01-01T00:00:02,904000Z] mender_app: Mender client activated and running!
[1970-01-01T00:00:02,905000Z] mender: Initialization done
[1970-01-01T00:00:02,905000Z] mender_app: mender_network_connect_cb: network_connect_cb
[1970-01-01T00:00:02,905000Z] mender: Checking for deployment…
[1970-01-01T00:00:02,906000Z] mender_app: mender_get_identity_cb: get_identity_cb
[1970-01-01T00:00:04,249000Z] mender: Downloading artifact with id ‘de0d93f…’, name ‘esp32s3_release-2’, uri ‘https://s3.docker.mender.io/mender/b898d8be-de7a-419e-8098-8a54bb43f325?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=mender%2F20260214%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260214T013743Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D"esp32s3_release-2.mender"&response-content-type=application%2Fvnd.mender-artifact&x-id=GetObject&X-Amz-Signature=6ffb4e2f6b1596271b313be9688c67ece5543cb10b8ec78ba1d0c718de02a101’
[1970-01-01T00:00:04,654000Z] mender_app: mender_deployment_status_cb: deployment_status_cb: downloading

I appreciate your guidance on how to troubleshoot this issue.

BR

Hi @alian,

Downloading and installing a ~700kB artifact takes 2-3 minutes. Where do you get the 43200 seconds from? That makes for exactly 12 hours, which would be an extraordinary coincidence for a download process duration. So my guess:

  • check where the timestamps actually do come from. Is there a clock re-sync somewhere?
  • 12h is also a common polling cycle for update checks. So if a deployment is started, it can of course take up to 12 hours for the device to pick it up in the next cycle.

Greetz,
Josef

Hi, @TheYoctoJester

Thanks a lot for your feedback. Currently, when I deploy a release to the ESP32-S3 DevKitC from the on-premises Mender server, the deployment consistently gets stuck in the downloading artifact phase. At first, I suspected the root cause was that the device’s system time was 0 (1970-01-01), which is before the CA certificate’s validity range (2021–2031). In that case the TLS handshake would fail, resulting in repeated connection attempts during deployment.
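The clock hypothesis can be written down as a small check. This is only a sketch: the two bounds are assumed to be 2021-01-01 and 2031-01-01 UTC for illustration, since the exact notBefore/notAfter values would have to be read from the certificate, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds for illustration: 2021-01-01 and 2031-01-01 UTC as Unix
 * time. The real values come from the certificate's notBefore/notAfter. */
#define CERT_NOT_BEFORE INT64_C(1609459200)
#define CERT_NOT_AFTER  INT64_C(1924992000)

/* Hypothetical helper: true when the device clock falls inside the window
 * in which a TLS stack would accept the certificate as time-valid. */
static bool clock_within_cert_validity(int64_t now) {
    return now >= CERT_NOT_BEFORE && now <= CERT_NOT_AFTER;
}

/* Usage example with the two clock values seen on this device. */
static void example_usage(void) {
    assert(!clock_within_cert_validity(0));                    /* 1970-01-01, unsynchronized */
    assert(clock_within_cert_validity(INT64_C(1772087227)));   /* value reported after time sync */
}
```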

So I configured SNTP to set the ESP32-S3 system time automatically. After that, the time is indeed synchronized via SNTP (see “current time: 1772087227” in the log below). However, the deployment still gets stuck in the “downloading” stage. The monitor log is below:

[1970-01-01T00:00:00,210000Z] mender_app: Waiting for network up…
[1970-01-01T00:00:05,698000Z] mender_app: Address[1]: 192.168.8.156
[1970-01-01T00:00:05,698000Z] mender_app: Subnet[1]: 255.255.255.0
[1970-01-01T00:00:05,698000Z] mender_app: Router[1]: 192.168.8.1
[1970-01-01T00:00:05,698000Z] mender_app: Lease time[1]: 43200 seconds
[1970-01-01T00:00:05,993000Z] mender_app: Time sync: 1772087227
[1970-01-01T00:00:05,993000Z] mender_app: current time: 1772087227
[1970-01-01T00:00:05,994000Z] mender_app: Initializing Mender Client with:
[1970-01-01T00:00:05,994000Z] mender_app: Device type: ‘esp32s3_devkitc’
[1970-01-01T00:00:05,994000Z] mender_app: Identity: ‘{“mac”: “e8:f6:0a:8d:af:fc”}’
[1970-01-01T00:00:05,994000Z] mender: Device type: [esp32s3_devkitc]
[1970-01-01T00:00:06,006000Z] mender_app: Mender client initialized
[1970-01-01T00:00:06,006000Z] mender_app: Update Module ‘zephyr-image’ initialized
[1970-01-01T00:00:06,006000Z] mender_app: Mender inventory callback added
[1970-01-01T00:00:06,006000Z] mender_app: Mender client activated and running!
[1970-01-01T00:00:06,006000Z] mender: Initialization done
[1970-01-01T00:00:06,006000Z] mender_app: mender_network_connect_cb: network_connect_cb
[1970-01-01T00:00:06,006000Z] mender: Checking for deployment…
[1970-01-01T00:00:06,007000Z] mender_app: mender_get_identity_cb: get_identity_cb
[1970-01-01T00:00:07,419000Z] mender: Downloading artifact with id ‘64df989…’, name ‘release-35’, uri ‘https://s3.docker.mender.io/mender/6040e44c-1e43-4d40-93ef-84512b5ab21d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=mender%2F20260226%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260226T062708Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D"release-35.mender"&response-content-type=application%2Fvnd.mender-artifact&x-id=GetObject&X-Amz-Signature=aafa4f061eb6f506044000752b7a3fe427592d9fad640d6282be446203baa5ba’
[1970-01-01T00:00:07,823000Z] mender_app: mender_deployment_status_cb: deployment_status_cb: downloading

On my side, DNS resolution works, TLS SNI is enabled, and the on-prem certificate includes both docker.mender.io and s3.docker.mender.io. I also used “curl -k -o” to verify the download URI, and it successfully fetches the file.

In addition, I configured CONFIG_HEAP_MEM_POOL_SIZE=98304, CONFIG_MBEDTLS_HEAP_SIZE=49152, CONFIG_MBEDTLS_SSL_MAX_CONTENT_LEN=8192, and a NET_BUF RX/TX count of 48.

During an artifact download from the Mender server, which server components are actually involved?
After the device receives the presigned S3 URL and performs the HTTPS GET request, what does the server validate (e.g., signature, host header, timestamp), and which service handles the request (API, MinIO, reverse proxy)? If the device gets stuck in the “downloading” state, which logs should I check first to troubleshoot the issue?
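On the timestamp part, one thing that can be checked from the log alone is the presigned URL's lifetime. As far as I understand SigV4 presigned URLs, the link is valid for X-Amz-Expires seconds starting at X-Amz-Date, and the storage service evaluates this against its own clock. The sketch below uses hypothetical helper names and the values from the URL in the log above; it suggests that the link was issued about one second after the device's reported time and is valid for 24 hours, so expiry looks unlikely to be the cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Days since 1970-01-01 for a Gregorian date (intended for years >= 1970). */
static int64_t days_from_civil(int y, int m, int d) {
    y -= (m <= 2); /* treat Jan/Feb as months 11/12 of the previous year */
    int64_t era = y / 400;
    int64_t yoe = y - era * 400;
    int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Parses an X-Amz-Date value such as "20260226T062708Z" into Unix time. */
static bool parse_amz_date(const char *s, int64_t *out) {
    int y, mo, d, h, mi, sec;
    if (sscanf(s, "%4d%2d%2dT%2d%2d%2dZ", &y, &mo, &d, &h, &mi, &sec) != 6) {
        return false;
    }
    *out = days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + sec;
    return true;
}

/* True while `now` is before X-Amz-Date + X-Amz-Expires. */
static bool presigned_url_still_valid(const char *amz_date, int64_t expires_s, int64_t now) {
    int64_t issued;
    return parse_amz_date(amz_date, &issued) && now < issued + expires_s;
}

/* Usage example with X-Amz-Date, X-Amz-Expires and the device time from the log. */
static void example_usage(void) {
    int64_t issued = 0;
    assert(parse_amz_date("20260226T062708Z", &issued));
    assert(issued == INT64_C(1772087228)); /* one second after "current time" */
    assert(presigned_url_still_valid("20260226T062708Z", 86400, INT64_C(1772087227)));
}
```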

BR