ESP32-S3 + Zephyr: Artifact download stalls at 20–40% (TLS read blocks)

Hi,

I am using the Hosted Mender server to perform OTA updates on an ESP32-S3 device. During deployment, the artifact download consistently stalls, typically between 20% and 40%. The exact percentage varies from run to run (sometimes 10%, sometimes 30%, sometimes 40%), and the download never completes.

  • Device: ESP32-S3 (esp32s3_devkitc)
  • OS: Zephyr
  • Client: Mender MCU client
  • Server: Hosted Mender (US region)
  • Artifact size: ~760 KB
  • Transport: HTTPS

So far I have tried the following: switching to a different WiFi network, using a mobile hotspot, adjusting the socket connect timeout, modifying the network connection limits, and experimenting with different mbedTLS buffer and heap sizes.

The device does not crash. It appears that the TLS/socket read operation blocks indefinitely during the download phase.
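
To illustrate the difference between an unbounded read and a bounded one, here is a minimal sketch using plain POSIX sockets. It is not the Mender MCU client's code, and the helper name `recv_with_timeout` is hypothetical. Zephyr's BSD-style socket API also offers a receive-timeout socket option, but whether the client sets it, and how it interacts with TLS sockets, is an assumption that would need checking against the actual sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the Mender client): receive up to `len`
 * bytes, but give up after `timeout_ms` milliseconds without any data.
 *
 * Returns the number of bytes read, 0 if the peer closed the connection,
 * or -1 with errno set (EAGAIN or EWOULDBLOCK when the timeout expired).
 */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL && timeout_ms > 0);

    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    /* A zero timeval would mean "block forever", hence timeout_ms > 0. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        return -1;
    }

    return recv(fd, buf, len, 0);
}
```

A download loop built on such a helper could treat `-1` with `errno == EAGAIN` as a stall, close the connection, and retry, instead of waiting forever.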

Below is the relevant part of the log:

[1970-01-01T00:00:00,177000Z] mender_app: Using net interface wifi, index=1
[1970-01-01T00:00:00,177000Z] mender_app: Connecting to wireless network GL-MT3000-ee0…
[1970-01-01T00:00:00,206000Z] mender_app: Waiting for network up…
[1970-01-01T00:00:05,329000Z] net_dhcpv4: Received: 192.168.8.156
[1970-01-01T00:00:05,330000Z] mender_app: Address[1]: 192.168.8.156
[1970-01-01T00:00:05,330000Z] mender_app: Subnet[1]: 255.255.255.0
[1970-01-01T00:00:05,330000Z] mender_app: Router[1]: 192.168.8.1
[1970-01-01T00:00:05,330000Z] mender_app: Lease time[1]: 43200 seconds
[1970-01-01T00:00:05,330000Z] mender_app: Initializing Mender Client with:
[1970-01-01T00:00:05,330000Z] mender_app: Device type: ‘esp32s3_devkitc’
[1970-01-01T00:00:05,330000Z] mender_app: Identity: ‘{“mac”: “e8:f6:0a:8d:af:fc”}’
[1970-01-01T00:00:05,330000Z] mender: Device type: [esp32s3_devkitc]
[1970-01-01T00:00:05,341000Z] fs_nvs: 2 Sectors of 4096 bytes
[1970-01-01T00:00:05,341000Z] fs_nvs: alloc wra: 0, fd8
[1970-01-01T00:00:05,341000Z] fs_nvs: data wra: 0, 120
[1970-01-01T00:00:05,342000Z] mender: mender_storage_init: Initialized Mender NVS at 0x7f1000 with 2 sectors (4096 bytes available)
[1970-01-01T00:00:05,342000Z] mender: mender_storage_init: Initialized deployment logs FCB at 0x7f3000 with 2 sectors (8192 bytes available)
[1970-01-01T00:00:05,342000Z] mender: mender_client_init: Added dormant certificate
[1970-01-01T00:00:05,342000Z] mender_app: Mender client initialized
[1970-01-01T00:00:05,342000Z] mender_app: Update Module ‘zephyr-image’ initialized
[1970-01-01T00:00:05,342000Z] mender_app: Mender inventory callback added
[1970-01-01T00:00:05,342000Z] mender: mender_os_scheduler_work_activate: Activating mender_client_main every 30 seconds
[1970-01-01T00:00:05,342000Z] mender: mender_os_scheduler_work_activate: Activating mender_inventory every 60 seconds
[1970-01-01T00:00:05,342000Z] mender_app: Mender client activated and running!
[1970-01-01T00:00:05,343000Z] mender: mender_os_scheduler_work_handler: Executing mender_client_main work
[1970-01-01T00:00:05,343000Z] mender: mender_client_work_function: Inside work function [state: MENDER_CLIENT_STATE_INITIALIZATION]
[1970-01-01T00:00:05,343000Z] mender: mender_tls_init_authentication_keys: Trying to read authentication keys from store
[1970-01-01T00:00:05,343000Z] mender: mender_storage_get_deployment_data: Deployment data not available
[1970-01-01T00:00:05,343000Z] mender: Initialization done
[1970-01-01T00:00:05,344000Z] mender: mender_client_update_work_function: Entering state MENDER_UPDATE_STATE_DOWNLOAD
[1970-01-01T00:00:05,344000Z] mender_app: mender_network_connect_cb: network_connect_cb
[1970-01-01T00:00:05,344000Z] mender: Checking for deployment…
[1970-01-01T00:00:05,344000Z] mender: mender_storage_get_provides: Provides not available
[1970-01-01T00:00:05,344000Z] mender_app: mender_get_identity_cb: get_identity_cb
[1970-01-01T00:00:07,681000Z] mender: ensure_authenticated_and_locked: Authenticated successfully
[1970-01-01T00:00:11,572000Z] mender: Downloading artifact with id ‘93bfaeb…’, name ‘release-343’, uri ‘https://c271964d41749feb10da762816c952ee.r2.cloudflarestorage.com/mender-artifacts-us/69845d2fa5cf26b77f4f7075/c2ba3487-4aac-491d-9937-0971d648ec8c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=161d877990f252c30c4602beb38f74a8%2F20260227%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260227T054909Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D"release-343.mender"&response-content-type=application%2Fvnd.mender-artifact&x-id=GetObject&X-Amz-Signature=2f78c5f315166a56d716b77b261af1bd8c82660dcc8357779e1fae45b6eeaceb
[1970-01-01T00:00:14,931000Z] mender_app: mender_deployment_status_cb: deployment_status_cb: downloading
[1970-01-01T00:00:18,839000Z] mender: artifact_read_version: Artifact has valid version
[1970-01-01T00:00:19,044000Z] mender: artifact_read_manifest: 384a8f8b3363eb2477a0d4221711eb7bf97caab329edad7f822dc11f6e3d4606 data/0000/zephyr.signed.bin
[1970-01-01T00:00:19,044000Z] mender: artifact_read_manifest: 2a5cc250ced94ac8e6de5b548d3c6ef927de7b77441e9cc7b97e944fe2ca9b8b header.tar
[1970-01-01T00:00:19,044000Z] mender: artifact_read_manifest: 96bcd965947569404798bcbdb614f103db5a004eb6e364cfc162c146890ea35b version
[1970-01-01T00:00:19,044000Z] mender: is_checksum_valid: Checking integrity for artifact file ‘version’
[1970-01-01T00:00:19,863000Z] mender: is_checksum_valid: Checking integrity for artifact file ‘header.tar’
[1970-01-01T00:00:19,878000Z] mender: Start flashing artifact ‘zephyr.signed.bin’ with size 751484
[1970-01-01T00:00:36,883000Z] mender: Downloading ‘zephyr-image’ 10%… [75264/751484]
[1970-01-01T00:00:52,842000Z] mender: Downloading ‘zephyr-image’ 20%… [150528/751484]
[1970-01-01T00:01:08,714000Z] mender: Downloading ‘zephyr-image’ 30%… [225792/751484]

My prj.conf is as below:

CONFIG_MENDER_SERVER_TENANT_TOKEN="…"
CONFIG_MENDER_MCU_CLIENT=y
CONFIG_MENDER_LOG_LEVEL_INF=y
CONFIG_MENDER_CLIENT_UPDATE_POLL_INTERVAL=30
CONFIG_MENDER_CLIENT_INVENTORY_REFRESH_INTERVAL=60
CONFIG_MENDER_RETRY_ERROR_BACKOFF=5
CONFIG_MENDER_RETRY_ERROR_MAX_BACKOFF=15
CONFIG_MENDER_SERVER_HOST_US=y
CONFIG_MENDER_STORAGE_PARTITION_STORAGE_PARTITION=y

CONFIG_MBEDTLS=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDHE_PSK_ENABLED=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDHE_RSA_ENABLED=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDHE_ECDSA_ENABLED=y
CONFIG_MBEDTLS_ECDH_C=y
CONFIG_MBEDTLS_ECDSA_C=y
CONFIG_MBEDTLS_ECP_C=y

CONFIG_MBEDTLS_ECP_DP_SECP256R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP384R1_ENABLED=y
CONFIG_MBEDTLS_ECP_NIST_OPTIM=y
CONFIG_MBEDTLS_CIPHER_CCM_ENABLED=y
CONFIG_MBEDTLS_CIPHER_GCM_ENABLED=y
CONFIG_MBEDTLS_SHA384=y
CONFIG_MBEDTLS_GENPRIME_ENABLED=y
CONFIG_MBEDTLS_PEM_CERTIFICATE_FORMAT=y
CONFIG_MBEDTLS_SERVER_NAME_INDICATION=y
CONFIG_MBEDTLS_PK_WRITE_C=y
CONFIG_MBEDTLS_SSL_MAX_CONTENT_LEN=16384
CONFIG_MBEDTLS_ENABLE_HEAP=y
CONFIG_MBEDTLS_HEAP_SIZE=40960
CONFIG_MBEDTLS_ENTROPY_POLL_ZEPHYR=y
CONFIG_MBEDTLS_ENTROPY_C=y

CONFIG_MBEDTLS_USER_CONFIG_ENABLE=y
CONFIG_MBEDTLS_USER_CONFIG_FILE="config-tls-mender.h"

CONFIG_MAIN_STACK_SIZE=2048
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_ZVFS_OPEN_MAX=5

CONFIG_LOG=y
CONFIG_LOG_BUFFER_SIZE=2048
CONFIG_LOG_MODE_DEFERRED=y
CONFIG_LOG_MODE_OVERFLOW=y
CONFIG_LOG_SPEED=y
CONFIG_POSIX_C_LANG_SUPPORT_R=y
CONFIG_LOG_OUTPUT_FORMAT_ISO8601_TIMESTAMP=y

CONFIG_NET_IPV6=n
CONFIG_NET_IPV4=y
CONFIG_NET_DHCPV4=y
CONFIG_NET_MGMT=y
CONFIG_NET_MGMT_EVENT=y
CONFIG_NET_MGMT_EVENT_STACK_SIZE=2048
CONFIG_NET_SOCKETS_SOCKOPT_TLS=y
CONFIG_NET_SOCKETS_TLS_MAX_CONTEXTS=2
CONFIG_NET_MAX_CONN=16
CONFIG_DNS_RESOLVER_ADDITIONAL_BUF_CTR=5
CONFIG_DNS_RESOLVER_ADDITIONAL_QUERIES=2
CONFIG_DNS_RESOLVER_MAX_SERVERS=2
CONFIG_DNS_NUM_CONCUR_QUERIES=5
CONFIG_NET_SOCKETS_CONNECT_TIMEOUT=5000
CONFIG_NET_LOG=y
CONFIG_COMPILER_WARNINGS_AS_ERRORS=y
CONFIG_SPIN_VALIDATE=n

Questions

  1. Is this behavior known when using Hosted Mender + ESP32/Zephyr?

  2. Are there recommended TCP/TLS configuration settings for downloading artifacts of this size (~760 KB) on resource-constrained devices?

  3. Are there specific debug logs (TCP state, TLS debug level, net_buf stats, etc.) that you would like me to capture?

Thank you for your assistance.

Best regards,

Hi @alian,

At first glance, this really shouldn’t be a problem, and an 800 kByte artifact is definitely not a huge load. Do you experience any other connectivity problems, such as needing multiple retries when uploading the artifact? Are you using the web UI or mender-cli? And one more long shot: if you try a Linux device like an RPi, does the onboarding process work there, or do you experience stalled downloads there too?

Besides that, which source code version are you starting from? Probably mendersoftware/mender-mcu-integration on GitHub, but which specific hash, and which Zephyr version?

Greetz,
Josef

Hi, @TheYoctoJester

Thanks a lot for your response.

I don’t see any obvious connectivity issues in general network usage, and uploading the artifact file to hosted.mender.io doesn’t require multiple retries. I am using the web UI to upload and deploy, not mender-cli.

I haven’t yet tested with a Linux device such as a Raspberry Pi. That is a good suggestion; I will try it to check whether the issue is specific to the ESP32-S3 or more general.

I am currently starting from the mender-mcu-integration repository on GitHub. The revision I am using is 88aeae5009e3c61b76a41727cba2f126f28d5527, and the Zephyr version is v4.2.

In one case, the download progressed to around 40% and then stalled indefinitely. It stayed in the "downloading" state overnight, with no timeout and no retries triggered. Below is the relevant log excerpt from that run:

[1970-01-01T00:01:39,100000Z] mender_app: mender_deployment_status_cb: deployment_status_cb: downloading
[1970-01-01T00:01:41,093000Z] mender: artifact_read_version: Artifact has valid version
[1970-01-01T00:01:41,097000Z] mender: artifact_read_manifest: bd5bc0c8a7a44a3e2c448448ff26e15a799c4005d7e07804ec9253e08e62e5bf data/0000/zephyr.signed.bin
[1970-01-01T00:01:41,097000Z] mender: artifact_read_manifest: 27cab4dd7b90f1fa4811b4275b407578254e31b8714939d4ce59546d420a9a21 header.tar
[1970-01-01T00:01:41,097000Z] mender: artifact_read_manifest: 96bcd965947569404798bcbdb614f103db5a004eb6e364cfc162c146890ea35b version
[1970-01-01T00:01:41,097000Z] mender: is_checksum_valid: Checking integrity for artifact file ‘version’
[1970-01-01T00:01:41,403000Z] mender: is_checksum_valid: Checking integrity for artifact file ‘header.tar’
[1970-01-01T00:01:41,406000Z] mender: Start flashing artifact ‘zephyr.signed.bin’ with size 754156
[1970-01-01T00:01:46,532000Z] mender: Downloading ‘zephyr-image’ 10%… [75776/754156]
[1970-01-01T00:01:51,435000Z] mender: Downloading ‘zephyr-image’ 20%… [151040/754156]
[1970-01-01T00:01:57,117000Z] mender: Downloading ‘zephyr-image’ 30%… [226304/754156]
[1970-01-01T00:02:02,546000Z] mender: Downloading ‘zephyr-image’ 40%… [302080/754156]
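
The progress lines also allow a rough throughput estimate, which may help when comparing runs. A minimal sketch follows; the helper name `throughput_bytes_per_sec` is hypothetical, and the inputs are simply the byte counts and timestamps printed in the 10% and 20% progress lines.

```c
#include <assert.h>

/*
 * Hypothetical helper: average throughput in bytes per second between two
 * progress log lines, each given as (bytes downloaded, timestamp in seconds).
 */
double throughput_bytes_per_sec(long bytes_a, double t_a,
                                long bytes_b, double t_b)
{
    assert(t_b > t_a && bytes_b >= bytes_a);
    return (double)(bytes_b - bytes_a) / (t_b - t_a);
}
```

With the first log (75264 bytes at 36.883 s, 150528 bytes at 52.842 s) this gives roughly 4.7 kB/s. With the log above (75776 bytes at 106.532 s, 151040 bytes at 111.435 s) it gives roughly 15 kB/s.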

Best regards

alian


Hi @alian,

We have discussed the situation internally a bit, and there are probably a few factors at play here.

  • In our experience, the time verification of the certificate is not a problem. The ESP32s we are running do not do any RTC sync at all, and the TLS handshakes work. So I would rule that out.
  • The mechanisms you introduced for setting the clock are possibly interfering with the network communication, so we suggest removing them again.
  • Given your whereabouts, we suspect national internet infrastructure to be the main problem. We can offer two approaches here: either running a demo server instance locally, just to make sure the network stack operations are fine, or using the local hosted Mender offering. If the latter is of interest to you, please let us know.

Greetz,
Josef