CONFIG_MENDER_CLIENT_INVENTORY_NETWORK_INFO=y causes a Zephyr crash on nRF9151 DK

We have observed a Zephyr FATAL ERROR on the nRF9151 DK, which uses mobile connectivity (LTE).

When CONFIG_MENDER_CLIENT_INVENTORY_NETWORK_INFO=y is enabled, the device boots, connects to LTE, initializes Mender, and completes the first deployment check. Right after that, Zephyr crashes with a secure fault when trying to collect network information for inventory.

With the option disabled, the same integration runs normally.

This is the log we get:

*** Booting nRF Connect SDK v3.2.1-d8887f6f32df ***
*** Using Zephyr OS v4.2.99-ec78104f1569 ***
[00:00:00.256,439] <inf> main: Starting LTE
[00:00:25.644,836] <inf> main: LTE connected
[00:00:25.644,897] <inf> mender: Device type: [nrf9151dk]
[00:00:25.649,200] <inf> main: Mender activated
[00:00:25.649,230] <inf> main:  V0.1
[00:00:25.649,536] <inf> mender: Initialization done
[00:00:25.649,566] <inf> mender: Checking for deployment...
[00:00:29.321,685] <inf> mender: No deployment available
[00:00:29.325,897] <err> os: ***** SECURE FAULT *****
[00:00:29.325,897] <err> os:   Address: 0x4
[00:00:29.325,927] <err> os:   Attribution unit violation
[00:00:29.325,927] <err> os: r0/a1:  0x200216b0  r1/a2:  0x00000000  r2/a3:  0x200216c0
[00:00:29.325,927] <err> os: r3/a4:  0x200216c0 r12/ip:  0x2001c294 r14/lr:  0x00022ba1
[00:00:29.325,958] <err> os:  xpsr:  0x61000000
[00:00:29.325,958] <err> os: Faulting instruction address (r15/pc): 0x00022c7e
[00:00:29.325,988] <err> os: >>> ZEPHYR FATAL ERROR 41: Unknown error on CPU 0
[00:00:29.326,019] <err> os: Current thread: 0x2000ea68 (unknown)
[00:00:29.600,402] <err> os: Halting system

To reproduce:

  1. Follow the setup and build instructions in this repository:
    README.md
  2. Before building, change this line in app/prj.conf:
    app/prj.conf#L64
  3. Change:

    CONFIG_MENDER_CLIENT_INVENTORY_NETWORK_INFO=n

    to:

    CONFIG_MENDER_CLIENT_INVENTORY_NETWORK_INFO=y

  4. Build and flash the application.
  5. Let the device boot, connect to LTE, and complete the first Mender deployment check.
  6. Observe the secure fault right after "No deployment available".

Environment:

  • Hardware: nRF9151 DK
  • nRF Connect SDK: v3.2.1-d8887f6f32df
  • Zephyr: v4.2.99-ec78104f1569

For now, our workaround is to keep:

CONFIG_MENDER_CLIENT_INVENTORY_NETWORK_INFO=n

Is this a known limitation or bug in the current mender-mcu network inventory path, or is there something specific we should change in the integration?

Hi Dexter9532, thanks for sharing this! We are not aware of the default network inventory code causing issues like this, and we haven't seen it happen on our reference boards. The code is quite simple, so it should be easy to make a few changes that help identify what exactly is going on. Note that the callback is added as a non-persistent one, so there might be an issue with ownership/allocation of some of the network info.

Hello @vpodzime,

I looked a bit more into this on nrf9151.

I found a way to stop the crash by changing the inventory code so it does not access iface->config.ip.ipv4 directly, and instead uses the Zephyr network APIs. With that change, CONFIG_MENDER_CLIENT_INVENTORY_NETWORK_INFO=y no longer crashes.

Diff:

diff --git a/src/platform/inventory/zephyr/inventory.c b/src/platform/inventory/zephyr/inventory.c
index 1a9d5d0..5e36ce4 100644
--- a/src/platform/inventory/zephyr/inventory.c
+++ b/src/platform/inventory/zephyr/inventory.c
@@ -18,6 +18,7 @@
  */

 #include <zephyr/net/net_if.h>
+#include <zephyr/net/net_ip.h>
 #include <zephyr/version.h> /* a file generated during build */

 #include "alloc.h"
@@ -45,9 +46,12 @@ build_info_callback(mender_keystore_t **inventory, uint8_t *inventory_len) {
 #ifdef CONFIG_MENDER_CLIENT_INVENTORY_NETWORK_INFO
 static mender_err_t
 network_info_callback(mender_keystore_t **inventory, uint8_t *inventory_len) {
-    mender_keystore_t *network_info = NULL;
-    struct net_if     *iface        = NULL;
-    const char        *ifname       = NULL;
+    mender_keystore_t    *network_info = NULL;
+    struct net_if        *iface        = NULL;
+    const char           *ifname       = NULL;
+    const struct in_addr *ipv4_addr    = NULL;
+    struct in_addr        netmask;
+    struct in_addr        gateway;

     if (NULL == (iface = net_if_get_default())) {
         mender_log_debug("No network interface");
@@ -65,6 +69,14 @@ network_info_callback(mender_keystore_t **inventory, uint8_t *inventory_len) {
     network_info[0].name  = mender_utils_strdup("Default network interface");
     network_info[0].value = mender_utils_strdup(ifname);

+    ipv4_addr = net_if_ipv4_get_global_addr(iface, NET_ADDR_PREFERRED);
+    if (NULL == ipv4_addr) {
+        mender_log_debug("No preferred IPv4 address on default interface");
+        *inventory     = network_info;
+        *inventory_len = 1;
+        return MENDER_OK;
+    }
+
     /* The first IP of the iface */
     if (mender_utils_asprintf(&(network_info[1].name), "IPv4[%s]", ifname) <= 0) {
         mender_log_error("Failed to construct network inventory data");
@@ -74,7 +86,7 @@ network_info_callback(mender_keystore_t **inventory, uint8_t *inventory_len) {
         mender_log_error("Unable to allocate memory");
         goto ERR;
     }
-    if (NULL == net_addr_ntop(AF_INET, &iface->config.ip.ipv4->unicast[0].ipv4.address.in_addr, network_info[1].value, NET_IPV4_ADDR_LEN)) {
+    if (NULL == net_addr_ntop(AF_INET, ipv4_addr, network_info[1].value, NET_IPV4_ADDR_LEN)) {
         mender_log_error("Failed to construct network inventory data");
         goto ERR;
     }
@@ -88,12 +100,14 @@ network_info_callback(mender_keystore_t **inventory, uint8_t *inventory_len) {
         mender_log_error("Unable to allocate memory");
         goto ERR;
     }
-    if (NULL == net_addr_ntop(AF_INET, &iface->config.ip.ipv4->unicast[0].netmask, network_info[2].value, NET_IPV4_ADDR_LEN)) {
+    netmask = net_if_ipv4_get_netmask_by_addr(iface, ipv4_addr);
+    if (NULL == net_addr_ntop(AF_INET, &netmask, network_info[2].value, NET_IPV4_ADDR_LEN)) {
         mender_log_error("Failed to construct network inventory data");
         goto ERR;
     }

     /* Gateway */
+    gateway = net_if_ipv4_get_gw(iface);
     if (mender_utils_asprintf(&(network_info[3].name), "Gateway[%s]", ifname) <= 0) {
         mender_log_error("Failed to construct network inventory data");
         goto ERR;
@@ -102,7 +116,7 @@ network_info_callback(mender_keystore_t **inventory, uint8_t *inventory_len) {
         mender_log_error("Unable to allocate memory");
         goto ERR;
     }
-    if (NULL == net_addr_ntop(AF_INET, &iface->config.ip.ipv4->gw, network_info[3].value, NET_IPV4_ADDR_LEN)) {
+    if (NULL == net_addr_ntop(AF_INET, &gateway, network_info[3].value, NET_IPV4_ADDR_LEN)) {
         mender_log_error("Failed to construct network inventory data");
         goto ERR;
     }

But I still do not get the IPv4 address from the generic Zephyr inventory path. I only get the interface name, for example:

  • Default network interface = net0

The device is connected and can talk to Mender, so it clearly has an IP, but that IP does not seem to show up through the generic inventory code on nrf9151.
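
For illustration, here is a minimal host-side sketch of the fallback in the patched callback. It assumes mock types: `mock_iface`, `mock_ipv4_cfg` and `collect_network_info` are hypothetical names and are not Zephyr or mender-mcu definitions. When no IPv4 data is available, only the interface name is reported, which matches the single inventory entry shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the network structures; NOT the real Zephyr types. */
struct mock_ipv4_cfg {
    unsigned char addr[4];
};

struct mock_iface {
    const char                 *name;
    const struct mock_ipv4_cfg *ipv4; /* NULL when no IPv4 data is available */
};

/*
 * Returns how many inventory entries can be reported:
 *   0 = no interface, 1 = interface name only, 2 = name plus IPv4 address.
 * The IPv4 string (if any) is written to ip_out; otherwise ip_out is empty.
 */
int collect_network_info(const struct mock_iface *iface, char *ip_out, size_t ip_len) {
    assert(ip_out != NULL && ip_len > 0);
    ip_out[0] = '\0';

    if (NULL == iface) {
        return 0;
    }
    if (NULL == iface->ipv4) {
        /* Guard: never dereference a missing IPv4 config, report the name only */
        return 1;
    }

    int n = snprintf(ip_out, ip_len, "%u.%u.%u.%u",
                     (unsigned)iface->ipv4->addr[0], (unsigned)iface->ipv4->addr[1],
                     (unsigned)iface->ipv4->addr[2], (unsigned)iface->ipv4->addr[3]);
    if (n < 0 || (size_t)n >= ip_len) {
        /* Output did not fit; fall back to the name-only result */
        ip_out[0] = '\0';
        return 1;
    }
    return 2;
}
```

With an interface whose `ipv4` pointer is NULL the function returns 1 and leaves the address string empty, instead of reading through the pointer.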

I then tested another way in the application, using Nordic’s modem API modem_info_string_get(MODEM_INFO_IP_ADDRESS, ...) and sending that as a custom inventory value. That worked, and I can now see for example:

  • IPv4[modem] = 2.68.345.43
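
A rough host-side sketch of that approach follows; it is not the actual application code. The `kv_item` type, the `ip_reader_t` signature and both function names are hypothetical. On the device the reader would be a thin wrapper around modem_info_string_get(MODEM_INFO_IP_ADDRESS, ...), and the sketch assumes the reader returns the string length on success and a negative value on error. The stub uses a documentation-range address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key/value pair; mender-mcu uses its own keystore type. */
struct kv_item {
    char name[24];
    char value[48];
};

/* Assumed reader contract: string length on success, negative value on error. */
typedef int (*ip_reader_t)(char *buf, size_t len);

/* Builds one custom inventory item "IPv4[modem]" from whatever the reader returns. */
int build_modem_ip_item(ip_reader_t read_ip, struct kv_item *item) {
    assert(read_ip != NULL && item != NULL);

    int len = read_ip(item->value, sizeof(item->value));
    if (len <= 0) {
        return -1; /* no address available, skip the item */
    }
    item->value[sizeof(item->value) - 1] = '\0';
    snprintf(item->name, sizeof(item->name), "IPv4[%s]", "modem");
    return 0;
}

/* Host-side stub standing in for the modem query. */
int stub_read_ip(char *buf, size_t len) {
    const char *ip = "192.0.2.10";
    if (strlen(ip) + 1 > len) {
        return -1;
    }
    strcpy(buf, ip);
    return (int)strlen(ip);
}
```

Keeping the modem query behind a function pointer means the item-building logic can be exercised on a PC, and only the small wrapper depends on the nRF91 modem library.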

Would this be the recommended way to handle it on nrf91, or do you think it would be better to keep working on the generic Zephyr inventory backend so it can also provide the IPv4 address there?

Does that sound reasonable, or would you suggest another way?

Thanks.

Hi @Dexter9532,
thanks for sharing the details! To me it looks like (a wild guess) net_if_ipv4_get_global_addr() doesn't properly handle modem-based interfaces. We should definitely include your change upstream to make things more stable (avoid crashes). But I don't think there's a great chance of this default code working everywhere. Inventory values can be overridden, so if the default network inventory callback doesn't crash and leaves room for another custom callback to add better/extra data, it's as good as it can get, IMHO. Could you please submit your fix that eliminates the crash as a PR?

Thanks!
Vratislav

Hi @vpodzime,
Sorry for the late reply. I have created a PR for the patch: "fix: avoid crash when IPv4 info is unavailable in Zephyr inventory" (Pull Request #243, mendersoftware/mender-mcu on GitHub).

Thanks! And no problem at all!