My updates are stalling on a handful of Raspberry Pi 3B+ devices because mender is running out of memory. The system’s memory utilization normally hovers around 40%, but during the update I see it spike to 100%, and then the client crashes with an “out of memory” error. Most devices update successfully and only spike to around 80% utilization during the update. However, the devices that fail do so repeatedly across multiple attempts, even after a reboot.
I wonder if the high memory usage is indicative of another issue? Any tips for things I can do to bring down mender’s memory usage and successfully complete the update?
Are you using gzip or lzma compressed Artifacts? I suspect that this might have some impact on memory utilization while the Artifacts are decompressed.
gzip would probably use less memory, while lzma is heavier on memory but compresses much better than gzip.
That said, I would not have expected the Mender client to use 60% of your available memory, regardless of which compression method is used.
Could you share the output of cat /proc/$(pidof mender)/status | grep Vm while an update is in progress?
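If it is easier to capture this over the whole update rather than as a single snapshot, here is a rough Python sketch that polls the same file. It assumes Python 3 is available on the device and that the client process is named mender. The function names parse_vm_fields and sample_mender are made up for illustration and are not part of Mender.

```python
import re
import subprocess
import time


def parse_vm_fields(status_text):
    """Return {field_name: kB} for the Vm* lines of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(Vm\w+):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def sample_mender(interval_s=5, samples=60, process_name="mender"):
    """Print VmRSS, VmHWM and VmSize of the client at a fixed interval."""
    # pidof may print several PIDs; this sketch simply takes the first one.
    pid = subprocess.check_output(["pidof", process_name]).split()[0].decode()
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as handle:
            vm = parse_vm_fields(handle.read())
        wanted = {key: vm.get(key) for key in ("VmRSS", "VmHWM", "VmSize")}
        print(time.strftime("%H:%M:%S"), wanted)
        time.sleep(interval_s)
```

Calling sample_mender() right after the deployment starts should show whether resident memory (VmRSS) climbs steadily or jumps in one step. The loop stops with an error if the client exits, which in itself marks the moment of the crash.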
As a side note: since mender.service only restarts on abort, mender does not recover from this crash so the server gets stuck thinking the update is still downloading and has no way to recover.
The device is still running mender client 1.7, while the server is 2.3.
Would you be able to test this with a newer Mender client (2.2.0)? I know that there have been memory leaks in older versions caused by the Go standard library.
As a side note: since mender.service only restarts on abort, mender does not recover from this crash so the server gets stuck thinking the update is still downloading and has no way to recover.
This looks like something that maybe should be fixed. @kacf what do you think?
I have ssh access to some of the devices, so I can try updating mender client manually. Should I just follow the install from source steps at https://github.com/mendersoftware/mender? Will that cause any issues if the image was created with an older mender-convert?
I’m unable to recreate the issue on a device locally. It’s only happening on 5 of the 20 devices this latest update has been pushed to so far, so I need to find a way to fix it remotely.
I tried killing all user processes to free up more RAM before performing the update and noticed that mender’s memory usage appears to be fixed at around 60% somehow? Out of memory made sense when my starting utilization was 40%, but after a reboot, my system’s memory utilization spiked from 30% to 85%. After lowering my memory usage even further to 20%, the system spiked to 75% and then threw the same out of memory error. I’m not sure if that’s pointing to a different issue or just a red herring.
Can we try to rule out that you are using lzma-compressed Artifacts? Both gzip and lzma have been the default in mender-convert, so it may depend on which version you are using.
To check this, can you please unpack the Artifact you are trying to deploy?
For example:
$ tar xvf Ubuntu-Bionic-x86-64-qemux86_64-mender.mender
version
manifest
header.tar.gz
data/0000.tar.gz
The file extension of the data/0000.tar payload hints at which compression algorithm was used. In my example it is gzip.
The situation does seem to be somewhat environment-specific, as you are not able to reproduce it locally and we have not seen similar reports before. So I do not think that the Mender client is the only factor in play here.
How much actual memory do you have on the devices? So far we have been speaking in percentages, but can you also check with free -h what the total available memory is on both the remote and local devices?
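To run the same check on several Artifacts without unpacking them, something like the following Python sketch could work. The suffix mapping is my assumption: .gz for gzip as in the listing above, .xz for lzma, and a bare .tar for no compression. The helper names are made up for illustration.

```python
import tarfile

# Assumed mapping from payload file suffix to compression method.
SUFFIX_TO_COMPRESSION = {".gz": "gzip", ".xz": "lzma", ".tar": "none"}


def payload_compression(member_names):
    """Guess the compression from the first payload entry under data/."""
    for name in member_names:
        if not name.startswith("data/"):
            continue
        for suffix, method in SUFFIX_TO_COMPRESSION.items():
            if name.endswith(suffix):
                return method
    return None  # no payload entry with a known suffix


def artifact_compression(artifact_path):
    """List the members of a .mender file and classify its payload."""
    # This only reads the member names, so nothing is extracted to disk.
    with tarfile.open(artifact_path) as archive:
        return payload_compression(archive.getnames())
```

For the listing in my example, payload_compression would return "gzip" because of data/0000.tar.gz.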
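To turn the percentages from this thread into absolute numbers, here is a small Python sketch that reads MemTotal from /proc/meminfo, which is where free gets its totals. The helper names are made up for illustration, and it assumes Python 3 on the device.

```python
def parse_meminfo(meminfo_text):
    """Return {key: value} for the numeric entries of /proc/meminfo (mostly kB)."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def percent_to_mib(percent, mem_total_kb):
    """Convert a utilization percentage into MiB for a given MemTotal in kB."""
    return percent / 100 * mem_total_kb / 1024


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file on the device."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

For example, percent_to_mib(60, read_meminfo()["MemTotal"]) gives the absolute size of the 60% jump on that device, which makes the remote and local devices easier to compare.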