Mender and ulimit (number of open files) on Raspbian

Some of our devices have started throwing “too many open files” errors.
When we list open files per process, mender and mender-connect account for the most:

root@mapado-box-dee46275:/home/dummy# lsof | awk '{print $1}' | sort | uniq -c | sort -r -n | head
    731 mender
    585 mender-co
    285 python
    180 colord
    102 systemd-t
     99 systemd
     72 wpa_suppl
     66 systemd-j
     60 rngd
     59 cupsd

We are on Raspbian, and the ulimit (soft limit) for open files is set to 1024.
Is there something wrong with our installation, or does mender really use that many open files?
Thanks for your help.

I took a look at this and immediately noticed one file descriptor which is not closed, which I’ve fixed here. Perhaps you could give this a try?

It doesn’t necessarily fix your problem though, since the garbage collector should have cleaned this up eventually anyway. So there may be a deeper problem. If you’re still seeing the problem after attempting that patch (or if applying it is tricky), could you run this command on the device after some file descriptors have accumulated, and post the result?

ls -l /proc/$(pgrep '^mender$')/fd
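To see whether descriptors really accumulate over time, the count can be sampled at intervals. This is only a sketch, assuming a Linux /proc filesystem and a POSIX shell; `watch_fds` is a name made up for this example and is not part of Mender.

```shell
#!/bin/sh
# watch_fds PID COUNT INTERVAL
# Print a timestamped count of PID's open file descriptors COUNT times,
# sleeping INTERVAL seconds between samples.
# watch_fds is a hypothetical helper name used only for illustration.
watch_fds() {
    i=0
    while [ "$i" -lt "$2" ]; do
        # One entry in /proc/PID/fd per open descriptor
        printf '%s %s\n' "$(date +%H:%M:%S)" "$(ls -A "/proc/$1/fd" | wc -l)"
        i=$((i + 1))
        if [ "$i" -lt "$2" ]; then
            sleep "$3"
        fi
    done
}
```

For example, `watch_fds "$(pgrep '^mender$')" 10 60` would take ten samples one minute apart. A count that keeps climbing points to a leak; a flat count does not.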

Does this help?

ls -l /proc/$(pgrep '^mender$')/fd
total 0
lr-x------ 1 root root 64 Jan  2 16:55 0 -> /dev/null
lrwx------ 1 root root 64 Jan  2 16:55 1 -> 'socket:[16932]'
lrwx------ 1 root root 64 Jan 10 09:18 10 -> 'socket:[6958168]'
lrwx------ 1 root root 64 Jan 10 09:18 11 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Jan 10 09:18 12 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Jan 10 09:18 13 -> 'socket:[15986]'
lrwx------ 1 root root 64 Jan 10 09:18 14 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Jan  2 16:55 2 -> 'socket:[16932]'
lrwx------ 1 root root 64 Jan 10 09:18 3 -> 'socket:[16951]'
lrwx------ 1 root root 64 Jan 10 09:18 4 -> 'anon_inode:[eventpoll]'
lr-x------ 1 root root 64 Jan 10 09:18 5 -> 'pipe:[16952]'
l-wx------ 1 root root 64 Jan 10 09:18 6 -> 'pipe:[16952]'
lrwx------ 1 root root 64 Jan 10 09:18 7 -> /data/mender/mender-store-lock
lrwx------ 1 root root 64 Jan 10 09:18 8 -> /data/mender/mender-store
lrwx------ 1 root root 64 Jan 10 09:18 9 -> /data/mender/mender-store

Hmm, I would have expected a far greater number of entries. In your first post, mender had more than 700 open file descriptors. Why does it have so few in the next post? Can you try to match the two?

Here are the two commands, run at the same time:

:~# lsof | awk '{print $1}' | sort | uniq -c | sort -r -n | head
    516 mender
    507 mender-co
    285 python
    180 colord
    159 systemd
    102 systemd-t
     72 wpa_suppl
     69 systemd-j
     60 rngd
     59 (sd-pam)
:~# ls -l /proc/$(pgrep '^mender$')/fd
total 0
lr-x------ 1 root root 64 13 janv. 09:57 0 -> /dev/null
lrwx------ 1 root root 64 13 janv. 09:57 1 -> 'socket:[16017]'
lrwx------ 1 root root 64 13 janv. 10:33 10 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 13 janv. 10:33 11 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 13 janv. 10:33 12 -> 'socket:[48668]'
lrwx------ 1 root root 64 13 janv. 10:33 13 -> 'socket:[15182]'
lrwx------ 1 root root 64 13 janv. 10:33 14 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 13 janv. 09:57 2 -> 'socket:[16017]'
lrwx------ 1 root root 64 13 janv. 10:33 3 -> 'socket:[15173]'
lrwx------ 1 root root 64 13 janv. 10:33 4 -> 'anon_inode:[eventpoll]'
lr-x------ 1 root root 64 13 janv. 10:33 5 -> 'pipe:[15174]'
l-wx------ 1 root root 64 13 janv. 10:33 6 -> 'pipe:[15174]'
lrwx------ 1 root root 64 13 janv. 10:33 7 -> /data/mender/mender-store-lock
lrwx------ 1 root root 64 13 janv. 10:33 8 -> /data/mender/mender-store
lrwx------ 1 root root 64 13 janv. 10:33 9 -> /data/mender/mender-store

Looks like lsof lists a lot of extra things as well, such as memory mappings. It also relists everything for every thread which exists in the process, even though they share the open file descriptors, which artificially increases the number a lot. So I would trust the data from /proc more.
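A rough way to get the /proc-based number and set it against the limit, assuming a Linux /proc filesystem and a POSIX shell. The helper names `count_fds` and `soft_nofile` are made up for this sketch.

```shell
#!/bin/sh
# count_fds PID: print how many file descriptors PID has open.
# /proc/PID/fd holds one entry per descriptor, shared by all threads,
# so nothing is counted twice and memory mappings are not included.
# (count_fds and soft_nofile are hypothetical helper names.)
count_fds() {
    ls -A "/proc/$1/fd" | wc -l
}

# soft_nofile PID: print the soft "Max open files" limit that applies to PID.
# In /proc/PID/limits the soft limit is the fourth whitespace-separated field.
soft_nofile() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Example: inspect the current shell itself.
echo "open fds: $(count_fds $$), soft limit: $(soft_nofile $$)"
```

Run as root with `"$(pgrep '^mender$')"` in place of `$$` to check the mender process. The first number is the one that counts against the 1024 limit, not the lsof line count.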

However, I could confirm that quite a few descriptors were being held open by the /etc/mender/scripts/version file, which I fixed in the pull request mentioned in my first reply. I would retry with this patch.
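One way to spot a file that is being held open many times is to group the descriptors by what they point at. This is a sketch only, assuming Linux /proc and a POSIX shell; `fd_types` is a name invented for this example.

```shell
#!/bin/sh
# fd_types PID: count PID's open descriptors grouped by target.
# Socket and pipe inode numbers are stripped so they group together;
# regular files keep their full path, so a leaked file shows up as one
# path with a high count.
# fd_types is a hypothetical helper name used only for illustration.
fd_types() {
    for fd in /proc/"$1"/fd/*; do
        # A descriptor may be closed between the glob and readlink
        readlink "$fd" 2>/dev/null
    done | sed 's/:\[[0-9]*\]$//' | sort | uniq -c | sort -rn
}
```

Calling `fd_types "$(pgrep '^mender$')"` as root would print lines such as a count followed by `socket`, `pipe`, or a file path.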

Thanks for your reply.
To apply the patch, do I need to compile anything, or is it just a matter of replacing a file?

You need to compile.

But you could also just wait. We are working on making new releases as we speak, so it won’t be too long before they are out and available as binary packages.

Thanks for your help.