Mender client running out of memory

My updates are stalling on a handful of Raspberry Pi 3B+ devices because mender is running out of memory. The system’s memory utilization normally hovers around 40%, but during the update I see it spike to 100%, and then the client crashes with an “out of memory” error. Most devices update successfully and only spike to around 80% utilization during the update. However, the devices that are failing do so repeatedly across multiple attempts, even after a reboot.

Is the high memory usage indicative of another issue? Any tips for bringing down mender’s memory usage so the update can complete successfully?

Hi,

Are you using gzip- or lzma-compressed Artifacts? I suspect the compression method has some impact on memory utilization while the Artifacts are decompressed.

gzip would probably use less memory, while lzma is heavier on memory in exchange for much better compression than gzip.
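For reference, this is roughly how the compression is chosen when writing an Artifact directly with mender-artifact. This is only a sketch with made-up example values (device type, names, file paths), and the exact flag names and placement vary between mender-artifact versions, so please check mender-artifact --help for yours:

# smaller memory footprint during decompression (sketch, example values)
$ mender-artifact write rootfs-image --compression gzip -t raspberrypi3 -n release-1 -f rootfs.ext4 -o release-1.mender

# better compression ratio, heavier on memory (sketch, example values)
$ mender-artifact write rootfs-image --compression lzma -t raspberrypi3 -n release-1 -f rootfs.ext4 -o release-1.mender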

That said, I would not have expected the Mender client to utilize 60% of your available memory regardless of which compression method is used.

Could you share the output of cat /proc/$(pidof mender)/status | grep Vm while an update is in progress?
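If it is easier, something like this rough loop (just a sketch) can log those values every few seconds for the whole download, so we can see how the usage grows over time:

# sample the client's peak and resident memory every 5 seconds while it is running
while pidof mender > /dev/null; do
    date
    grep -E 'VmPeak|VmRSS' /proc/$(pidof mender)/status
    sleep 5
done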

Thanks for taking a look, Mirza!

I’m using mender-convert, which I believe defaults to gzip compression.

I just watched an update occur, and mender was definitely consuming 60% of memory according to top (see screenshot).

Here’s the output for cat /proc/$(pidof mender)/status | grep Vm:

VmPeak:	  926892 kB
VmSize:	  926892 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	  567872 kB
VmRSS:	  567872 kB
VmData:	  681644 kB
VmStk:	     132 kB
VmExe:	    5228 kB
VmLib:	    1440 kB
VmPTE:	     600 kB
VmPMD:	       0 kB
VmSwap:	       0 kB

And here’s the log when it crashes: https://gist.github.com/neilgupta/683deb58bfd0ce97006d686ac6f37dd2

As a side note: since mender.service only restarts on abort, mender does not recover from this crash, so the server gets stuck thinking the update is still downloading, with no way to recover.

The device is still running mender client 1.7, while the server is 2.3.

Thanks for sharing the information.

Would it be possible for you to test this with a newer Mender client (2.2.0)? I know that there have been memory leaks in older versions, caused by the Go standard library.

As a side note: since mender.service only restarts on abort, mender does not recover from this crash, so the server gets stuck thinking the update is still downloading, with no way to recover.

This looks like something that should probably be fixed. @kacf, what do you think?
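As a possible workaround until that is addressed, a systemd drop-in should make the unit restart after a crash instead of only on abort. This is an untested sketch, so please verify it on a test device first:

# /etc/systemd/system/mender.service.d/10-restart.conf (untested sketch)
[Service]
Restart=on-failure
RestartSec=30

# then reload systemd and restart the unit
$ systemctl daemon-reload
$ systemctl restart mender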

I have SSH access to some of the devices, so I can try updating the mender client manually. Should I just follow the install-from-source steps at https://github.com/mendersoftware/mender? Will that cause any issues if the image was created with an older mender-convert?

Oh, never mind, I just found the Debian packages at https://docs.mender.io/2.3/downloads

The second question still stands, though: would there be any conflict with an older integration?

Hard to say; if you just use the binary from the deb package, that would minimize any potential conflicts.

But I am not sure I can recommend doing this on a device that is in the field; it is best if you can test it in a controlled environment first.
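If you do get to try it in a controlled environment, something along these lines should swap in only the binary while keeping the existing configuration from the old integration. The package file name below is just a placeholder for whatever you download from the docs page:

# unpack the .deb without installing it (file name is only an example)
$ dpkg-deb -x mender-client_2.2.0-1_armhf.deb /tmp/mender-client
$ systemctl stop mender
# replace only the client binary, leave /etc/mender untouched
$ cp /tmp/mender-client/usr/bin/mender /usr/bin/mender
$ systemctl start mender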

I’m unable to recreate the issue on a device locally. It’s only happening on 5 of the 20 devices this latest update has been pushed to so far, so I need to find a way to fix it remotely.

I tried killing all user processes to free up more RAM before performing the update, and noticed that mender’s memory usage appears to be fixed at around 60% of total RAM somehow. The out-of-memory error made sense when my starting utilization was 40%, but after a reboot the system’s memory utilization spiked from 30% to 85%. After lowering my baseline usage even further, to 20%, the system spiked to 75% and then threw the same out-of-memory error. I’m not sure if that’s pointing to a different issue or is just a red herring.

Devices that succeeded saw the same memory spike, but did not crash.

Any ideas on how I can get the update to succeed on a remote device?

Can we try to rule out that you are using lzma-compressed Artifacts? Both gzip and lzma have been the default in mender-convert at some point, so it might depend on which version you are using.

To check this, can you please unpack the Artifact you are trying to deploy?

E.g.:

$ tar xvf Ubuntu-Bionic-x86-64-qemux86_64-mender.mender 
version
manifest
header.tar.gz
data/0000.tar.gz

The extension of the data/0000.tar file hints at which compression algorithm was used. In my example it is gzip.
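If the extension is ever ambiguous, the file utility can also be used to identify the actual compression of the payload, e.g.:

# file reports the real compression (e.g. "gzip compressed data") regardless of the name
$ file data/0000.tar.gz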

The situation does seem to be somewhat environment-specific, as you are not able to reproduce this locally and we have not seen similar reports before. So I do not think it is strictly only the Mender client at play here.

How much physical memory do you have on the devices? So far we have been speaking in percentages, but can you also check with free -h what the total memory is on both the remote and local devices?

I checked this on my own: since you are using Mender Client 1.7, you are using gzip, as that version does not support lzma.

It’s a Raspberry Pi 3B+, so 1 GB of total RAM.

free -h during normal usage (no mender update):

              total        used        free      shared  buff/cache   available
Mem:           976M        242M        369M         49M        364M        633M
Swap:           99M          0B         99M

Memory usage then climbs at roughly 1 MB/second as it downloads the update, until it hits its peak:

              total        used        free      shared  buff/cache   available
Mem:           976M        788M         24M         49M        163M         87M
Swap:           99M          0B         99M

and then it immediately returns to:

              total        used        free      shared  buff/cache   available
Mem:           976M        237M        645M         49M         94M        643M
Swap:           99M          0B         99M

Also confirmed that the Mender Artifact is gzip-compressed.