Since updating to open-source Mender 2.3, I can no longer decommission any device, neither via the UI nor via the API; I always get a 500 Internal Server Error without any explanation.
I also get a 500 reply when I try to accept or reject authorization requests; however, those actions seem to work even though 500 is returned.
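For reference, the failing API call looks roughly like this (a sketch only: the v2 devauth path is my assumption from memory of the Mender management API, and TOKEN/DEVICE_ID are placeholders):

```bash
# Hypothetical reproduction of the failing decommission call.
# TOKEN is a management JWT, DEVICE_ID the device to decommission.
curl -v -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  "https://my-mender-server/api/management/v2/devauth/devices/$DEVICE_ID"
# -> HTTP/1.1 500 Internal Server Error, with an empty body
```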
Looking at the mender-conductor logs shows that it is waiting for Elasticsearch:
mender-conductor_1 | Fri Mar 27 18:18:15 UTC 2020 waiting for ES to be ready at mender-elasticsearch:9300
mender-conductor_1 | nc: bad address 'mender-elasticsearch:9300'
mender-conductor_1 | Fri Mar 27 18:18:16 UTC 2020 waiting for ES to be ready at mender-elasticsearch:9300
mender-conductor_1 | nc: bad address 'mender-elasticsearch:9300'
mender-conductor_1 | Fri Mar 27 18:18:17 UTC 2020 waiting for ES to be ready at mender-elasticsearch:9300
I can also see that mender-elasticsearch is constantly restarting with the following errors:
Elasticsearch logs:
mender-elasticsearch_1 | OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
mender-elasticsearch_1 | [2020-03-27T16:57:43,165][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [unknown] uncaught exception in thread [main]
mender-elasticsearch_1 | org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:163) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:116) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
mender-elasticsearch_1 | at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:300) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.node.Node.<init>(Node.java:296) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.node.Node.<init>(Node.java:266) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:212) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:212) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | ... 6 more
mender-elasticsearch_1 | Caused by: java.io.IOException: failed to obtain lock on /usr/share/elasticsearch/data/nodes/0
mender-elasticsearch_1 | at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:218) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:270) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.node.Node.<init>(Node.java:296) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.node.Node.<init>(Node.java:266) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:212) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:212) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | ... 6 more
mender-elasticsearch_1 | Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes/0/node.lock
mender-elasticsearch_1 | at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]
mender-elasticsearch_1 | at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
mender-elasticsearch_1 | at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
mender-elasticsearch_1 | at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182) ~[?:?]
mender-elasticsearch_1 | at java.nio.channels.FileChannel.open(FileChannel.java:292) ~[?:?]
mender-elasticsearch_1 | at java.nio.channels.FileChannel.open(FileChannel.java:345) ~[?:?]
mender-elasticsearch_1 | at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:125) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
mender-elasticsearch_1 | at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
mender-elasticsearch_1 | at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
mender-elasticsearch_1 | at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:211) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:270) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.node.Node.<init>(Node.java:296) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.node.Node.<init>(Node.java:266) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:212) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:212) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-6.8.6.jar:6.8.6]
mender-elasticsearch_1 | ... 6 more
Googling these symptoms suggests that there is another instance of Elasticsearch running in parallel.
I tried stopping all Docker containers and restarted the server, but it didn't help.
Strange thing: I see elastic and java processes keep appearing in ps aux (outside Docker?!), with the PID incrementing several times a second.
I tried to kill these processes with kill $(ps aux | grep 'elastic' | awk '{print $2}'), but the command always returns "No such process", even though ps aux | grep 'elastic' shows there is one; it terminates very quickly in a loop and always changes its PID.
This process appears even after a server reboot, before any Docker containers are running!
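(In hindsight this is consistent with a restart loop rather than a rogue install: container processes are visible in the host's ps aux, and every restart of the container gives the java process a fresh PID. Something like the following could confirm that; the grep pattern and the expected parent are assumptions on my side:)

```bash
# Sketch: trace the short-lived process to its parent instead of killing it
# by a PID that is already stale by the time kill runs.
PID=$(pgrep -f 'org.elasticsearch.bootstrap.Elasticsearch' | head -n 1)
[ -n "$PID" ] && ps -o pid,ppid,user,cmd -p "$PID"  # parent should be a containerd shim
docker ps -a --filter name=elasticsearch            # shows the "Restarting (1) ..." status
```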
Any help will be much appreciated…
Hi @ster, thanks for digging in and troubleshooting here. I tried this in my local setup and could not immediately reproduce it, but my server is pretty much ephemeral, so I started from scratch. It could be related to how long you have maintained your server and how you handled the upgrades. Can you describe the lifecycle of your server a bit?
Hello @drewmoseley, thanks a lot for looking at this.
The server I use was set up back in the days of Mender 2.0, carefully following the Production installation section of the documentation. It was then updated to 2.1 and recently to 2.3, by merging the appropriate git branches into my installation branch as described in the Upgrading section of the docs. During every update I had merge conflicts (but I saw on the forum that I'm not the only one) and force-applied the incoming changes.
The problem appeared after the 2.1 → 2.3 upgrade, either right away or some time later (I thought I had checked that all containers were up after the upgrade, but now I'm not sure anymore; I might have missed that Elasticsearch was stuck restarting). I noticed that since 2.3, prod.yml has moved to another directory and its format has changed a bit (which, by the way, wasn't reflected in the Upgrading docs; I was only able to discover it from launch errors ;)), so I manually updated and moved prod.yml.
I will carefully compare the files on my branch with the files in the repo; maybe something went wrong while resolving the merge conflicts… Normally there shouldn't be any differences between my branch and the 2.3.0 tag (except prod.yml, keys and certificates), right?
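A diff against the release tag should make any bad conflict resolution visible; a minimal sketch, run inside the integration checkout (assuming the tag is named 2.3.0):

```bash
git fetch --tags origin   # make sure the release tags are available locally
git diff --stat 2.3.0     # list the files that differ from the 2.3.0 tag
git diff 2.3.0 -- .       # full diff; only prod.yml, keys and certificates should show up
```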
The problem seems to be fixed by removing and recreating the mender-elasticsearch-db volume; the rough sequence is sketched below.
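A minimal sketch of that sequence, assuming the default volume name from the production installation:

```bash
./run down                                            # stop and remove the containers
                                                      # (an in-use volume cannot be removed)
docker volume rm mender-elasticsearch-db              # wipe the corrupted index data
docker volume create --name=mender-elasticsearch-db   # recreate it empty, as in the
                                                      # "Production installation" docs
./run up -d                                           # bring the stack back up
```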
It would be nice though to know the following; maybe someone could confirm:
Does any data from mender-elasticsearch-db need to be backed up and restored, or is it safe to wipe it? Seeing that the migration/dump-db tool doesn't back up anything from Elasticsearch, I assumed it is safe to wipe, and indeed everything seems to work fine; however, it would be nice to be sure.
Could the fact that I skipped a minor version be the reason for this problem (I updated from 2.1 directly to 2.3)?
You can safely remove the Elasticsearch block device; it is used by Conductor as an index cache for workflow status and does not contain anything application-specific to Mender.
Mender 2.3 uses Elasticsearch 6, while previous versions used Elasticsearch 5; it is possible that something went wrong with the upgrade of the volume, but as mentioned above, you can start from an empty Elasticsearch cluster to solve the issue.
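If you want to double-check before wiping, something like this shows which image the container is running and where the old data lives (sketch; the names assume the default production setup):

```bash
docker ps -a --filter name=elasticsearch --format '{{.Image}}\t{{.Status}}'
docker volume inspect mender-elasticsearch-db   # mountpoint of the (possibly ES 5) data
```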
docker ps shows:
802e8d72c93d mendersoftware/api-gateway:2.1.0 "/entrypoint.sh" 14 minutes ago Up 28 seconds 80/tcp, 0.0.0.0:443->443/tcp menderproduction_mender-api-gateway_1
796596a83159 mendersoftware/deployments:1.9.0 "/entrypoint.sh --co…" 14 minutes ago Up 29 seconds 8080/tcp menderproduction_mender-deployments_1
884a86ab64e6 openresty/openresty:1.13.6.2-0-alpine "/usr/local/openrest…" 14 minutes ago Up 30 seconds 0.0.0.0:9000->9000/tcp menderproduction_storage-proxy_1
eb9e38cfa5a0 mendersoftware/deviceauth:2.2.0 "/usr/bin/deviceauth…" 15 minutes ago Up 55 seconds 8080/tcp menderproduction_mender-device-auth_1
29a0fc3aac3c mendersoftware/mender-conductor:1.6.0 "/srv/start_conducto…" 15 minutes ago Up 57 seconds (health: starting) 8080/tcp, 8090/tcp menderproduction_mender-conductor_1
2b5ba91b3ac7 mendersoftware/workflows:1.0.0 "/usr/bin/workflows …" 15 minutes ago Up 54 seconds 8080/tcp menderproduction_mender-workflows-server_1
c927e027857d mendersoftware/create-artifact-worker:1.0.0 "/usr/bin/workflows …" 15 minutes ago Up 55 seconds 8080/tcp menderproduction_mender-create-artifact-worker_1
41777600600e mendersoftware/useradm:1.10.0 "/usr/bin/useradm --…" 15 minutes ago Up 55 seconds 8080/tcp menderproduction_mender-useradm_1
269907a52bfe mendersoftware/inventory:1.7.0 "/usr/bin/inventory …" 15 minutes ago Up 55 seconds 8080/tcp menderproduction_mender-inventory_1
d6479fd2ad3e redis:5-alpine3.8 "/redis/entrypoint.sh" 15 minutes ago Up About a minute 6379/tcp menderproduction_mender-redis_1
886c73307019 minio/minio:RELEASE.2018-09-25T21-34-43Z "/usr/bin/docker-ent…" 15 minutes ago Up About a minute (healthy) 9000/tcp menderproduction_minio_1
aaea94c34b85 mongo:3.6 "docker-entrypoint.s…" 15 minutes ago Up About a minute 27017/tcp menderproduction_mender-mongo_1
908791b91d19 mendersoftware/gui:2.3.0 "/entrypoint.sh nginx" 15 minutes ago Up 57 seconds 80/tcp menderproduction_mender-gui_1
43b8c402429c elasticsearch:6.8.6 "/usr/local/bin/dock…" 15 minutes ago Restarting (1) 2 seconds ago menderproduction_mender-elasticsearch_1
and ./run ps returns:
Name                                               Command                         State                    Ports
----------------------------------------------------------------------------------------------------------------------------------------
menderproduction_mender-api-gateway_1 /entrypoint.sh Up 0.0.0.0:443->443/tcp, 80/tcp
menderproduction_mender-conductor_1 /srv/start_conductor.sh Up (health: starting) 8080/tcp, 8090/tcp
menderproduction_mender-create-artifact-worker_1 /usr/bin/workflows --confi ... Up 8080/tcp
menderproduction_mender-deployments_1 /entrypoint.sh --config /e ... Up 8080/tcp
menderproduction_mender-device-auth_1 /usr/bin/deviceauth --conf ... Up 8080/tcp
menderproduction_mender-elasticsearch_1 /usr/local/bin/docker-entr ... Restarting
menderproduction_mender-gui_1 /entrypoint.sh nginx Up 80/tcp
menderproduction_mender-inventory_1 /usr/bin/inventory --confi ... Up 8080/tcp
menderproduction_mender-mongo_1 docker-entrypoint.sh mongod Up 27017/tcp
menderproduction_mender-redis_1 /redis/entrypoint.sh Up 6379/tcp
menderproduction_mender-useradm_1 /usr/bin/useradm --config ... Up 8080/tcp
menderproduction_mender-workflows-server_1 /usr/bin/workflows --confi ... Up 8080/tcp
menderproduction_minio_1 /usr/bin/docker-entrypoint ... Up (healthy) 9000/tcp
menderproduction_storage-proxy_1 /usr/local/openresty/bin/o ... Up 0.0.0.0:9000->9000/tcp
No, you can just delete the "buggy" volume with a standard Docker command.
Once deleted, it will either be recreated healthy on the next docker run, or you will have to recreate it manually using a command from the "Production installation" procedure described in the docs; I don't remember exactly which, but most probably the latter.
I removed Elasticsearch with docker rm 43b8c402429c, where "43b8c402429c" is the ID of my Elasticsearch container.
Then I executed ./run up -d, but Elasticsearch is still restarting.
As pointed out by @ster, I meant that you can lose/delete your Elasticsearch volume and start with a new one. The content of the volume is not critical, and you can start with an empty Elasticsearch without any issue.
Anyway, I see you already solved the issue by removing the volume and creating it again; well done!
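For anyone hitting the same confusion: docker rm only removes the container, while the index data lives in the named volume and has to be removed separately (sketch; the volume name is the default from the production setup):

```bash
docker rm 43b8c402429c                     # removes only the container; the data survives
docker volume rm mender-elasticsearch-db   # this is what actually wipes the ES data
# (fails with "volume is in use" until the referencing
#  containers are gone, e.g. after ./run down)
```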