Installation hangs

Hi,

We are using Mender for OTA updates.

Since today, our device kitting process hangs while installing Mender. Our process runs `get-mender.sh --force-mender-client4` for installation.

Yesterday, kitting succeeded with version 5.0.3-1+ubuntu+jammy. Now the installer appears to install 5.1.0-1+ubuntu+jammy, tries to configure it at install time, and waits for an interactive response from the user.

root       18563  0.0  0.0  10176  3448 pts/2    S+   19:15   0:00 /bin/bash ./get-mender.sh --force-mender-client4
root       20174  0.4  0.1 111912 97368 pts/2    S+   19:15   0:02 apt-get install -y -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold mender-client4 mender-configure mender-connect
root       20303  0.0  0.0   9780  3112 pts/3    S+   19:15   0:00 /bin/bash /var/lib/dpkg/info/mender-setup.postinst configure
root       20305  0.0  0.0 1231976 9884 pts/3    Sl+  19:15   0:00 mender-setup --quiet --device-type eac-035.local --demo=false --hosted-mender --tenant-token Paste your Hosted Mender token here --update-poll 1800 --inventory-poll 28800 --retry-poll 300
 

Our kitting procedure is run automatically by Ansible, so this design change makes our system hang.

Please revert the change as soon as possible, or release a new install script that can skip the interactive configuration at install time!

Hi @piste-jp,

Thanks for reaching out. I’m in touch with the team to discuss a way forward here, but one remark: the get-mender.sh script is not meant for production, and its behaviour will change over time. If it blocks somewhere, that is a bug and we will definitely fix it, but there have been situations, and there will be more, where it changes the selection of installed tools, the semantics, or other things. This particular script is meant as an onboarding tool for people evaluating Mender on a one-off board on their desk and does not give any lifecycle guarantees.

As you already have Ansible in the loop, I would strongly encourage moving the Mender installation procedure in there, then you also have full control over the versioning.

Greetz,
Josef

Thank you for your response.

I already reported the problem to tech support and it was filed as #11816, but there has been no progress so far.

Actually, the get-mender.sh script itself is not the problem. The problem appears to be the post-install script in mender-setup (1.0.3-1+ubuntu+jammy): starting with 1.0.3-1, it runs the mender-setup command automatically at the end of installation.

That might not be a problem if the device is already fully supported. But my device is not supported, and we need to replace the mender-device-identity script before device registration.

My concern is that we would now have to replace mender-device-identity after device registration, and that this may affect something on the hosting server side.

My wish is “do not call mender-setup at installation if no setup option is passed to get-mender.sh” OR “introduce a new option, in get-mender.sh or elsewhere, to skip mender-setup at installation”.

I don’t know if this can help, but I asked ChatGPT, and it gave the following tips:

The error:

malloc(): unsorted double linked list corrupted

comes from glibc’s heap allocator (ptmalloc) and indicates heap metadata corruption—specifically, the unsorted bin (a doubly linked list of freed chunks) has been tampered with or is inconsistent.

This is not the root cause; it is a symptom detected during a malloc/free operation after memory has already been corrupted.


:magnifying_glass_tilted_left: What exactly is corrupted?

glibc maintains freed memory chunks in bins (fastbins, small bins, large bins, unsorted bin).
The unsorted bin is a temporary holding list for recently freed chunks.

Each chunk has metadata:

  • fd (forward pointer)

  • bk (backward pointer)

This error means:

The allocator detected that fd->bk != chunk or bk->fd != chunk

:backhand_index_pointing_right: In other words: the double-linked list is broken → memory corruption
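To make that check concrete, here is a toy sketch in C. It is not glibc's real code or chunk layout: `toy_chunk`, `toy_link3` and `links_consistent` are names made up for illustration, and the only thing borrowed from the description above is the idea that both neighbours must point back at the chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a freed chunk; NOT glibc's real malloc_chunk layout. */
struct toy_chunk {
    struct toy_chunk *fd; /* forward pointer: next chunk in the list */
    struct toy_chunk *bk; /* backward pointer: previous chunk in the list */
};

/* Builds a circular list: head <-> a <-> b <-> head. */
void toy_link3(struct toy_chunk *head, struct toy_chunk *a, struct toy_chunk *b)
{
    head->fd = a; a->fd = b; b->fd = head;
    head->bk = b; b->bk = a; a->bk = head;
}

/* True only if both neighbours of c still point back at c. */
bool links_consistent(const struct toy_chunk *c)
{
    assert(c != NULL && c->fd != NULL && c->bk != NULL);
    return c->fd->bk == c && c->bk->fd == c;
}
```

If some stray write changes `b.bk` after `toy_link3()`, then `links_consistent(&a)` turns false, because `a.fd->bk` no longer equals `&a`. That is the kind of mismatch the allocator notices and aborts on.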


:warning: Most common root causes

1. Buffer overflow / underflow

Writing beyond allocated memory overwrites heap metadata.

char *p = malloc(10);
strcpy(p, "this string is too long");  // overflow

:check_mark: Very common cause
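For comparison, a minimal sketch of two ways to avoid that overflow: size the allocation from the source string, or copy into a fixed buffer with an explicit bound. The helper names `copy_string` and `copy_bounded` are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of src sized to fit, or NULL if malloc fails.
 * The caller owns the result and must free() it. */
char *copy_string(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src) + 1; /* +1 for the terminating NUL */
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, src, n);
    return p;
}

/* Copies src into dst, truncating if needed so dst never overflows.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(src != NULL);
    int written = snprintf(dst, dst_size, "%s", src);
    return written >= 0 && (size_t)written < dst_size;
}
```

With a 10-byte buffer, `copy_bounded(buf, sizeof buf, "this string is too long")` keeps only the first 9 characters plus the terminator and reports the truncation through its return value instead of writing past the end.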


2. Use-after-free

Accessing memory after it has been freed.

char *p = malloc(10);
free(p);
p[0] = 'A';  // corrupts freed chunk metadata


3. Double free

Freeing the same pointer twice corrupts bin structures.

char *p = malloc(10);
free(p);
free(p);  // boom
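One common defensive habit against this (and against some use-after-free cases) is to clear the pointer right after freeing it. A small sketch, with `free_and_clear` being a hypothetical helper name rather than anything from a library:

```c
#include <stdlib.h>

/* Frees *pp and sets it to NULL, so calling this again on the same
 * variable is a harmless no-op (free(NULL) is defined to do nothing). */
void free_and_clear(char **pp)
{
    if (pp != NULL) {
        free(*pp);
        *pp = NULL;
    }
}
```

This only protects the one variable you pass in. If another copy of the same pointer exists elsewhere, that copy still dangles, so it is a safety net and not a fix for unclear ownership.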


4. Invalid pointer passed to free()

Freeing something not returned by malloc.

char *p = malloc(10);
free(p + 1);  // invalid pointer
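The usual way to avoid this is to keep the pointer returned by malloc untouched and walk the buffer with a separate cursor. A minimal sketch, where `sum_and_release` is a made-up function used only to show the pattern:

```c
#include <stdlib.h>
#include <string.h>

/* Allocates n bytes, fills them with 1, sums them with a cursor,
 * then frees through the original base pointer. Returns 0 if malloc fails. */
unsigned sum_and_release(size_t n)
{
    unsigned char *base = malloc(n);
    if (base == NULL)
        return 0;
    memset(base, 1, n);

    unsigned total = 0;
    for (const unsigned char *cur = base; cur < base + n; ++cur)
        total += *cur; /* only the cursor moves, base stays as returned */

    free(base); /* exactly the pointer malloc returned */
    return total;
}
```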


5. Heap corruption from struct overwrite

Overwriting adjacent allocations:

struct A { char buf[8]; };
struct A *a = malloc(sizeof(struct A));
struct A *b = malloc(sizeof(struct A));

strcpy(a->buf, "overflowing data...");  // overwrites b's metadata


6. Concurrency issues (less common but possible)

  • Race conditions modifying shared heap memory

  • Incorrect locking in multi-threaded code
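As a rough illustration of the locking point, here is a sketch of a shared heap buffer where every access goes through one mutex. It assumes POSIX threads are available; `shared_buf` and its functions are hypothetical names for this example only.

```c
#include <pthread.h>
#include <stdlib.h>

/* A fixed-capacity int buffer shared between threads. */
struct shared_buf {
    pthread_mutex_t lock;
    int *items;
    size_t len;
    size_t cap;
};

/* Returns 1 on success, 0 if allocation or mutex setup fails. */
int shared_buf_init(struct shared_buf *b, size_t cap)
{
    b->items = malloc(cap * sizeof *b->items);
    if (b->items == NULL)
        return 0;
    b->len = 0;
    b->cap = cap;
    if (pthread_mutex_init(&b->lock, NULL) != 0) {
        free(b->items);
        b->items = NULL;
        return 0;
    }
    return 1;
}

/* Appends value under the lock; returns 0 instead of writing past cap. */
int shared_buf_push(struct shared_buf *b, int value)
{
    int ok = 0;
    pthread_mutex_lock(&b->lock);
    if (b->len < b->cap) {
        b->items[b->len++] = value;
        ok = 1;
    }
    pthread_mutex_unlock(&b->lock);
    return ok;
}

/* Call only after all threads have stopped using the buffer. */
void shared_buf_destroy(struct shared_buf *b)
{
    pthread_mutex_destroy(&b->lock);
    free(b->items);
    b->items = NULL;
}
```

Without the lock, two threads could read the same `len`, both pass the bounds check, and one of them could end up writing past the end of `items`, which is the heap overwrite from point 1 again.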


:brain: Why it appears during malloc()

Important:

:backhand_index_pointing_right: The crash often happens later, not at the moment of corruption.

Example:

  1. You corrupt memory

  2. Program continues

  3. Later malloc() or free() checks the heap → detects inconsistency → aborts


:hammer_and_wrench: How to debug it effectively

1. Use AddressSanitizer (best first step)

gcc -fsanitize=address -g your_program.c
./a.out

:check_mark: Pinpoints exact line of corruption


2. Use Valgrind

valgrind --leak-check=full ./your_program

:check_mark: Detects:

  • invalid writes

  • use-after-free

  • double free


3. Enable glibc heap checks

export MALLOC_CHECK_=3

:check_mark: Makes the allocator stricter so it fails earlier (on glibc 2.34 and later, this also requires preloading libc_malloc_debug.so)


4. Use GDB with heap debugging

gdb ./your_program
run

Then:

bt


5. Compile with debug + no optimisations

gcc -g -O0 your_program.c


:puzzle_piece: Quick diagnostic checklist

  • Did you write beyond allocated buffer?

  • Did you free the same pointer twice?

  • Are you using memory after free?

  • Are you freeing stack memory?

  • Any pointer arithmetic before free?

  • Any struct/array overflow?


:brain: Practical insight (important)

This specific error:

unsorted double linked list corrupted

usually means:

:red_exclamation_mark: You corrupted a freed chunk, not an active one

So prioritise investigating:

  • use-after-free

  • writes after free

  • double free patterns


Best

Baya