One workflow-server, one workflow-worker, 1.3.0
It looks like some of the added devices are processed and some are not.
I added five test devices, but only two appeared in the pending list. I accepted both: one moved to the devices list, and the other is still in the pending list (although its status changed to accepted).
I’ve added one more client device using the stress test Docker container, and it didn’t appear in the Mender pending list at all.
It looks like the jobs are not executed by the worker:
Worker logs:
time="2021-04-26T06:28:45Z" level=info msg="StartWorkflow db.InsertJob returned &{60865d9d42dd26c5db22d614 update_device_inventory [{request_id c5f30f9c-93f8-4189-b363-2571c2defe2e} {tenant_id } {device_id efb659cb-99d1-4fea-adeb-6b240662fb70} {scope identity} {attributes [{\"name\":\"mac\",\"value\":\"cf:fb:c6:0b:18:9e\",\"scope\":\"identity\"}]}] 1 [] 2021-04-26 06:28:45.02920532 +0000 UTC m=+3077.017808925},<nil>" file=entry.go func="logrus.(*Entry).Infof" line=346
time="2021-04-26T06:28:45Z" level=info msg="StartWorkflow db.InsertJob returned &{60865d9d42dd26c5db22d615 update_device_status [{request_id c5f30f9c-93f8-4189-b363-2571c2defe2e} {devices [{\"id\":\"efb659cb-99d1-4fea-adeb-6b240662fb70\",\"revision\":2}]} {tenant_id } {device_status pending}] 1 [] 2021-04-26 06:28:45.04276124 +0000 UTC m=+3077.031364748},<nil>" file=entry.go func="logrus.(*Entry).Infof" line=346
It seems that the worker returns nil and the job was never executed.
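To compare these lines across jobs, it helps to pull out the job id, the workflow name, and the trailing value. A minimal Python sketch follows. It assumes the message is a Go format string that prints the inserted job followed by the error returned by db.InsertJob; that is my reading of the message shape, not something confirmed from the workflows source. Under that assumption a trailing `<nil>` would mean the insert itself succeeded. The regex and the name parse_insert_job are mine, and the sample line is abbreviated.

```python
import re

# Matches: db.InsertJob returned &{<24-hex id> <workflow name> ...},<trailing value>" file=
INSERT_JOB_RE = re.compile(
    r'db\.InsertJob returned &\{(?P<job_id>[0-9a-f]{24}) (?P<workflow>\w+) .*\},(?P<err>.*)" file='
)

# Abbreviated copy of one of the log lines above (the devices field is dropped).
LINE = (
    r'time="2021-04-26T06:28:45Z" level=info msg="StartWorkflow db.InsertJob returned '
    r'&{60865d9d42dd26c5db22d615 update_device_status '
    r'[{request_id c5f30f9c-93f8-4189-b363-2571c2defe2e} {tenant_id } {device_status pending}] '
    r'1 [] 2021-04-26 06:28:45.04276124 +0000 UTC m=+3077.031364748},<nil>" '
    r'file=entry.go func="logrus.(*Entry).Infof" line=346'
)


def parse_insert_job(line):
    """Return job_id, workflow and the trailing value, or None if the line does not match."""
    match = INSERT_JOB_RE.search(line)
    return match.groupdict() if match else None


print(parse_insert_job(LINE))
```

Running this over the whole log and comparing the extracted job ids with the ids in workflows.jobs and workflows.job_queue should show which inserted jobs were never picked up.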
I added one more device, and it appears in the pending list.
It really seems strange: some jobs are handled properly and some stay unprocessed, and the only thing I can see in the worker logs is something like:
time="2021-04-26T06:28:45Z" level=info msg="StartWorkflow db.InsertJob returned &{60865d9d42dd26c5db22d615 update_device_status [{request_id c5f30f9c-93f8-4189-b363-2571c2defe2e} {devices [{\"id\":\"efb659cb-99d1-4fea-adeb-6b240662fb70\",\"revision\":2}]} {tenant_id } {device_status pending}] 1 [] 2021-04-26 06:28:45.04276124 +0000 UTC m=+3077.031364748},<nil>" file=entry.go func="logrus.(*Entry).Infof" line=346
instead of an HTTP status code, which I would expect if the request had failed. The worker didn’t fail, hang, or restart.
I also can’t find these requests in the inventory log, which may explain it.
I’ll try to remove ${env.INVENTORY_ADDR|mender-inventory:8080} from the job definitions; maybe it is trying to send requests to the default mender-inventory for 50% of the requests…
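For reference, here is how I read that placeholder: use the INVENTORY_ADDR environment variable if it is set, otherwise fall back to mender-inventory:8080. The Python sketch below only illustrates that assumed behaviour; it is not the workflows implementation, and the function name and the URL path are hypothetical.

```python
import os
import re

# ${env.NAME|default}: NAME is an environment variable, the part after "|" is the fallback.
PLACEHOLDER_RE = re.compile(
    r"\$\{env\.(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:\|(?P<default>[^}]*))?\}"
)


def resolve_env_placeholders(text, environ=None):
    """Replace every ${env.NAME|default} in text using environ (os.environ by default)."""
    environ = os.environ if environ is None else environ

    def substitute(match):
        default = match.group("default") or ""
        return environ.get(match.group("name"), default)

    return PLACEHOLDER_RE.sub(substitute, text)


# Hypothetical URL, only to show the substitution.
url = "http://${env.INVENTORY_ADDR|mender-inventory:8080}/hypothetical/path"
print(resolve_env_placeholders(url, {}))
print(resolve_env_placeholders(url, {"INVENTORY_ADDR": "inventory.internal:8080"}))
```

If the substitution really works like this, the result is the same for every job in a given worker environment, so on its own it would not explain why only part of the requests go missing.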
The entire setup is pretty simple right now: a single container for all microservices, with service discovery (checked, no dead services):
- inventory
- device-auth
- user-adm
- gui
- create-artifact-worker
- workflows-server
- workflows-worker
- docker-api-gateway
- deployments
P.S. I tried adjusting the inventory address in the workflow files and setting WORKFLOWS_CONCURRENCY to 1, with no luck.
This is what I see in the workflows-worker logs for the failed jobs:
time="2021-04-26T07:56:54Z" level=error msg="(DuplicateKey) E11000 duplicate key error collection: workflows.job_queue index: _id_ dup key: { _id: \"60867246a63e3c8cb86eb857\" }" file=process.go func=worker.processJob line=44
It’s obvious that the worker can’t add a new document to workflows.job_queue, but I can’t figure out why. Only 20-50% of devices are added successfully using this stress client setup:
/mender-stress-test-client -backend https://<test_mender_setup_url> -count 10 -invfreq 60 -pollfreq 60 -wait 60 -current_device qemux86-64 -inventory "device_type:qemux86-64,image_id:test,client_version:test"
This is the failed job document in workflows.job_queue:
This is the same failed job document in workflows.jobs:
And here are the two documents for a job that did not fail (workflows.job_queue, workflows.jobs):
P.P.S.
I redeployed and am now adding devices one by one.
Errors occur in the update_device_inventory and update_device_status workflows, preventing the device from registering and/or moving to accepted:
time="2021-04-26T09:18:52Z" level=error msg="(DuplicateKey) E11000 duplicate key error collection: workflows.job_queue index: _id_ dup key: { _id: \"6086857c925d25a7809155e5\" }" file=process.go func=worker.processJob line=44
I double-checked that there was no job with _id=6086857c925d25a7809155e5 in job_queue before adding the device.
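One more data point: the _id values in the errors look like hex-encoded MongoDB ObjectIds, whose first four bytes are the creation time in seconds since the Unix epoch. A small Python sketch to decode them (the helper name objectid_timestamp is mine):

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Return the creation time stored in the first 4 bytes of an ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # big-endian seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


for oid in ("60867246a63e3c8cb86eb857", "6086857c925d25a7809155e5"):
    print(oid, objectid_timestamp(oid).isoformat())
# 60867246a63e3c8cb86eb857 2021-04-26T07:56:54+00:00
# 6086857c925d25a7809155e5 2021-04-26T09:18:52+00:00
```

Both ids decode to the exact second at which the corresponding DuplicateKey error was logged. That fits with no such document existing beforehand: the id is freshly generated, so the collision may come from the same new job being inserted into job_queue twice rather than from a clash with an older document. That last part is only my guess.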
Overall, this seems really strange to me :(