"Showing 0 of 1 device pending authorization": device missing in authorization interface

I’m running version 2.5.0 of the Mender server, freshly installed and configured.
One of my six devices has a problem during authorization: the ‘Pending’ tab shows the number 1, but no device is listed. In the following screenshot the “faulty” device and two others are connected to the network, but only two show up.

After the authorization of the two devices, as you can see, the page says “Showing 0 of 1 device pending authorization”. No filter is enabled.

In a previous installation of the server (v2.4.0) the “faulty” device was associated without problems.

Do you have any suggestions on how to investigate the problem further?

I don’t know if it’s related, but I’ve noticed two recurring warnings in the logs:

set 29 14:45:13 mender run[12181]: mender-api-gateway_1             | 172.17.254.26 - - [29/Sep/2020:12:45:13 +0000] "POST /api/devices/v1/authentication/auth_requests HTTP/1.1" 200 667 "-" "Go-http-client/1.1" "-" request ID "a9d5878b-925e-410f-95a3-12ae5f9ff297" 0.013
set 29 14:45:15 mender run[12181]: mender-device-auth_1             | time="2020-09-29T12:45:15Z" level=warning msg="Failed to extract identity from header: malformed authorization data" file=entry.go func="logrus.(*Entry).Warnf" line=354 request_id=1d9bfd95-c18d-4cf7-b9ed-952548474a21
set 29 14:45:15 mender run[12181]: mender-device-auth_1             | time="2020-09-29T12:45:15Z" level=info msg="Token 9df507ee-d283-4c08-99b0-8950a1798de4 assigned to device 88e9cd25-3a61-4582-8a5d-377e8ec38857" file=entry.go func="logrus.(*Entry).Infof" line=346 request_id=1d9bfd95-c18d-4cf7-b9ed-952548474a21
set 29 14:45:15 mender run[12181]: mender-device-auth_1             | time="2020-09-29T12:45:15Z" level=info msg="200 11933μs POST /api/devices/v1/authentication/auth_requests HTTP/1.1 - Go-http-client/1.1" byteswritten=672 file=entry.go func="logrus.(*Entry).Print" line=300 method=POST path=/api/devices/v1/authentication/auth_requests qs= request_id=1d9bfd95-c18d-4cf7-b9ed-952548474a21 responsetime=0.011933706 status=200 ts="2020-09-29 12:45:15.784480028 +0000 UTC" type=http
set 29 14:45:15 mender run[12181]: mender-api-gateway_1             | 172.17.254.30 - - [29/Sep/2020:12:45:15 +0000] "POST /api/devices/v1/authentication/auth_requests HTTP/1.1" 200 672 "-" "Go-http-client/1.1" "-" request ID "1d9bfd95-c18d-4cf7-b9ed-952548474a21" 0.013
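
For anyone combing through similar output: below is a minimal Python sketch (not part of Mender) that extracts the logrus-style key=value fields from each line and keeps only warnings and errors, together with their request_id. The function names parse_fields and filter_by_level are made up for this example, and the " | " separator is an assumption based on the docker-compose prefix seen in the lines above.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# or a run of non-space characters (logrus text format).
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_fields(line):
    """Return the key=value fields of one log line as a dict (hypothetical helper)."""
    # Assumption: docker-compose prefixes each line with "<container> | ".
    _, sep, payload = line.partition(" | ")
    if not sep:
        payload = line
    fields = {}
    for key, value in FIELD_RE.findall(payload):
        # Strip the surrounding quotes from properly quoted values.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


def filter_by_level(lines, levels=("warning", "error")):
    """Yield (request_id, msg) for every line whose level is in `levels`."""
    for line in lines:
        fields = parse_fields(line)
        if fields.get("level") in levels:
            yield fields.get("request_id"), fields.get("msg")
```

Feeding it the saved log output (for example `filter_by_level(open("mender.log"))`, where mender.log is a hypothetical file name) makes it easier to match a warning to the gateway request with the same request ID. Lines without key=value fields, such as the api-gateway access log entries, are simply skipped.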

Thank you!

Do you have any devices listed in the “Rejected” tab?

Drew

Nope…

I don’t know if it is related, but I have another problem (should I open another thread?): creating a deployment that targets a group fails with: “Error creating deployment. internal error [Request ID: ac3001e3]”
If I target ‘All devices’ instead, there is no error and the deployment goes as expected.

set 30 18:44:21 mender run[12181]: mender-deployments_1             | time="2020-09-30T16:44:21Z" level=error msg="tenant ID not present in the context" file=app.go func="app.(*Deployments).CreateDeployment" line
set 30 18:44:21 mender run[12181]: mender-deployments_1             | time="2020-09-30T16:44:21Z" level=error msg="Internal error" file=view.go func="view.(*RESTView).RenderInternalError" line=58 request_id=e8395
set 30 18:44:21 mender run[12181]: mender-deployments_1             | time="2020-09-30T16:44:21Z" level=info msg="500 1078μs POST /api/management/v1/deployments/deployments/group/lorac-dev HTTP/1.1 - Mozilla/5.0   

@tranchitella @merlin any thoughts here?

Ciao @spratesi,

First issue:

Can you please share with us a bit more details about your deployment? Are you running Mender using docker-compose, following the production instructions in our documentation? Can you please check the content of the deviceauth and inventory databases, devices collections (both of them) and check how many devices you have there? Are all the containers up and running, including both workflows-server and workflows-worker? Can you upload somewhere all the logs from all the containers?

Second issue:

Are you running the Open Source version of Mender? Which kind of Deployment are you creating, targeting a single device, or a group?

Ciao @tranchitella :wink:

Can you please share with us a bit more details about your deployment?

Sure.
Thank you for your assistance.

Are you running the Open Source version of Mender?

Yes

Which kind of Deployment are you creating, targeting a single device, or a group?

On the “Release” tab I click “CREATE A DEPLOYMENT WITH THIS RELEASE”, select one of my groups, then click “NEXT” and “CREATE”. At that point I get the error.
If I select “All devices” instead of a group, the error doesn’t happen.

Are you running Mender using docker-compose, following the production instructions in our documentation?

Yes, the only difference is that I’m using a systemd service to automatically handle the process:

[Unit]    
Description=Mender docker embedded image upgrade services    
    
# Upgrading:    
# https://docs.mender.io/2.x/administration/upgrading    
    
Requires=docker.service    
After=docker.service systemd-user-sessions.service    
After=network-online.target network.target    
    
[Service]    
Restart=on-failure    
    
WorkingDirectory=/opt/mender-server/production/    
ExecStartPre=/opt/mender-server/production/run pull    
ExecStart=/opt/mender-server/production/run up    
ExecStop=/opt/mender-server/production/run stop    
    
[Install]    
WantedBy=multi-user.target

And I have /var/lib/docker/volumes as a symlink to /srv/volumes, but this shouldn’t be a problem…

The machine has Debian Buster installed, docker and docker-compose from Debian repositories:

  • docker.io==18.09.1+dfsg1-7.1+deb10u2
  • docker-compose==1.21.0.3

I previously ran v2.4.0 of Mender on the same machine and never encountered this problem. Then I hit a problem upgrading to 2.5.0: in the logs I saw a MongoDB error (failed to connect to db: Error reaching mongo server: context deadline exceeded). Assuming the error was in my configuration, and having no problem redoing all device associations from scratch, I purged and reinstalled docker.io (thus erasing all volumes) and followed the guide from the start: Production installation | Mender documentation (keeping my old keys).
I noticed that in version 2.5.0 you have to create fewer volumes than in 2.2.0 (the first version I tested).

Can you please check the content of the deviceauth and inventory databases, devices collections (both of them) and check how many devices you have there?

Can you kindly tell me how to check this?

Are all the containers up and running, including both workflows-server and workflows-worker?

I’ve been checking this all along (because of the problems I had with mongo), and yes, all are up:

/o/m/production# ./run ps                                                                                                                                                     
                      Name                                    Command                  State                  Ports            
-------------------------------------------------------------------------------------------------------------------------------
menderproduction_mender-api-gateway_1              /entrypoint.sh                   Up             0.0.0.0:443->443/tcp, 80/tcp
menderproduction_mender-create-artifact-worker_1   /usr/bin/workflows --confi ...   Up             8080/tcp                    
menderproduction_mender-deployments_1              /entrypoint.sh --config /e ...   Up             8080/tcp                    
menderproduction_mender-device-auth_1              /usr/bin/deviceauth --conf ...   Up             8080/tcp                    
menderproduction_mender-gui_1                      /entrypoint.sh nginx             Up (healthy)   80/tcp                      
menderproduction_mender-inventory_1                /usr/bin/inventory --confi ...   Up             8080/tcp                    
menderproduction_mender-mongo_1                    docker-entrypoint.sh mongod      Up             27017/tcp                   
menderproduction_mender-useradm_1                  /usr/bin/useradm --config  ...   Up             8080/tcp                    
menderproduction_mender-workflows-server_1         /usr/bin/workflows --confi ...   Up             8080/tcp                    
menderproduction_mender-workflows-worker_1         /usr/bin/workflows --confi ...   Up                                         
menderproduction_minio_1                           /usr/bin/docker-entrypoint ...   Up (healthy)   9000/tcp                    
menderproduction_storage-proxy_1                   /usr/local/openresty/bin/o ...   Up             0.0.0.0:9000->9000/tcp

Can you upload somewhere all the logs from all the containers?

I’m working on it

@spratesi

The group you select, is it a static group?

Regarding the mongodb database, you can enter the container and run the mongo client with:
docker exec -it menderproduction_mender-mongo_1 mongo

From there, you can enter databases and query the collections:

> use deviceauth
> db.devices.find()
> db.auth_sets.find()
> use inventory
> db.devices.find()
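
To compare the two devices collections once the query results are in hand, here is a rough Python sketch. It assumes each collection has been exported to a list of documents (for example with mongoexport), that both collections key devices by the same `_id`, and that deviceauth documents carry a `status` field; none of this is confirmed in this thread, and the function names are made up for illustration.

```python
from collections import Counter


def count_by_status(auth_devices):
    """Count deviceauth device documents per status (field name is an assumption)."""
    return Counter(doc.get("status", "unknown") for doc in auth_devices)


def missing_from_inventory(auth_devices, inventory_devices):
    """Return the sorted IDs present in deviceauth but absent from inventory."""
    inventory_ids = {doc["_id"] for doc in inventory_devices}
    return sorted(doc["_id"] for doc in auth_devices if doc["_id"] not in inventory_ids)


if __name__ == "__main__":
    # Hypothetical sample data, standing in for the exported collections.
    auth = [
        {"_id": "dev-a", "status": "accepted"},
        {"_id": "dev-b", "status": "pending"},
    ]
    inventory = [{"_id": "dev-a"}]
    print(count_by_status(auth))
    print(missing_from_inventory(auth, inventory))  # ['dev-b']
```

A device that shows up in one collection but not the other would be worth mentioning when reporting the counts back here.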

Ok, something new happened: after a reboot of the machine, the missing pending device appeared:

(If you look at the time of the request you’ll see that it is from two days ago (today is 2020-10-01), so it seems it was somehow only a display error? Moreover, the device has been powered off for the last two days…)

So at the moment this issue is solved… but I don’t know how, or whether it will happen again in the future.

Anyway, the problem with deploying to a group is still here… (maybe we should change the topic title or split the thread in two?)

Here are the logs of the service from boot, with a failed deployment to a group and a successful deployment to “All devices”:

The group you select, is it a static group?

Yes. With the open source version, static groups are the only kind I have available.

Here is the result of the DB queries (but as said above, the association problem has disappeared):

https://pastebin.com/J1AKDzrT

hello @spratesi

There is a bug in 2.5.0 and I think you are hitting it with the deployments to static groups. It has been fixed and the fix will be part of the upcoming release. I am sorry for the inconvenience.

peter

Thank you. Will the release be advertised somewhere? So I can update as soon as possible…

Our releases are all posted on our blog. I believe they are also cross posted in the main forums here.

@spratesi as a workaround, you could change both deployments and inventory images, to use:

mendersoftware/deployments:mender-2.5.x
mendersoftware/inventory:mender-2.5.x

please take a backup of the installation first.
the above is the same as mentioned here

peter

I had the same issue as @spratesi at the top of this thread regarding a pending device that is not visible in the UI. However, my device did not appear, even after multiple reboots/resets. What did help me, though, were your mongodb commands.

After removing the device from the database I could re-register it and it appeared in the UI. These are the commands I used:

# start a mongo shell inside the MongoDB pod
kubectl exec -it mongo-$YOURID -- bash
mongo --username root --password $YOURPASS
> use deviceauth
// find the broken device id ($DEVICE_MAC is a placeholder for the device's MAC address)
> db.devices.find({"id_data_struct.mac":"$DEVICE_MAC"})
// delete it, using the _id returned by the query above
> db.devices.remove({"_id" : "$DEVICE_ID"})
// same for auth_sets
> db.auth_sets.find({"id_data_struct.mac":"$DEVICE_MAC"})
> db.auth_sets.remove({"_id" : "$AUTHSET_ID"})

PS: I’m using the kubernetes helm deployment on 2.5.0