oldev
October 20, 2023, 1:00pm  #1
              Hi everyone,
I’m running the Kubernetes-based Mender Server (chart version mender-5.2.3, app version 3.6.2) and I have an issue with the workflows-worker pod.
The pod had been running for 41 days without a restart, but today I performed a manual kubectl delete pod mender-workflows-worker-..., because I had changed the SMTP secret with a helm upgrade and the pod was not restarted automatically.
Since the manual pod restart, the pod has been in CrashLoopBackOff with the following error:
kubectl logs -f mender-workflows-worker-859b66449-zjcl2
time="2023-10-20T11:59:42Z" level=info msg="migrating workflows" file=migrations.go func=mongo.Migrate line=38
time="2023-10-20T11:59:42Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=migrator_simple.go func="migrate.(*SimpleMigrator).Apply" line=124
time="2023-10-20T11:59:42Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=migrator_simple.go func="migrate.(*SimpleMigrator).Apply" line=139
time="2023-10-20T11:59:47Z" level=info msg="nats client closed the connection" file=client.go func=nats.NewClientWithDefaults.func1.1 line=91
2023/10/20 11:59:47 failed to apply Jetstream consumer migrations: context deadline exceeded
 
I upgraded the release with the following command, so hooks were not disabled:
helm upgrade mender mender/mender --version 5.2.3 -f mender-3.6.2.yml
 
and only the SMTP secret was changed:
helm diff revision mender 1 2
default, smtp, Secret (v1) has changed:
  # Source: mender/templates/secret-smtp.yaml
  apiVersion: v1
  kind: Secret
  metadata:
    labels:
      app.kubernetes.io/component: secrets
      app.kubernetes.io/instance: mender
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: smtp
      app.kubernetes.io/part-of: mender
      helm.sh/chart: mender
    name: smtp
    namespace: default
  data:
-   EMAIL_SENDER: '-------- # (14 bytes)'
-   SMTP_AUTH_MECHANISM: '-------- # (5 bytes)'
-   SMTP_HOST: '-------- # (12 bytes)'
-   SMTP_PASSWORD: '-------- # (4 bytes)'
-   SMTP_SSL: '-------- # (5 bytes)'
-   SMTP_USERNAME: '-------- # (4 bytes)'
+   EMAIL_SENDER: '++++++++ # (... bytes)'
+   SMTP_AUTH_MECHANISM: '++++++++ # (... bytes)'
+   SMTP_HOST: '++++++++ # (... bytes)'
+   SMTP_PASSWORD: '++++++++ # (...bytes)'
+   SMTP_SSL: '++++++++ # (... bytes)'
+   SMTP_USERNAME: '++++++++ # (... bytes)'
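For completeness, the manual restart above was a plain pod delete. As far as I understand, a rollout restart of the deployment would have been an equivalent way to pick up the new secret (the deployment name below is my assumption, derived from the pod name):
# equivalent restart of the workflows-worker after the secret change (deployment name assumed)
kubectl rollout restart deployment mender-workflows-worker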
 
I do see some activity in the mender-nats-0 pod every time the workflows-worker restarts.
[7] 2023/09/08 12:41:58.668123 [INF] 10.224.0.59:6222 - rid:14 - Router connection closed: Duplicate Route
[7] 2023/09/08 12:42:05.595351 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/09/08 12:42:09.849093 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/10/20 11:49:05.650113 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/10/20 11:49:11.168638 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/10/20 11:49:31.952829 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/10/20 11:49:59.946672 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/10/20 11:50:47.943136 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
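If it helps with debugging: I assume the consumer the worker times out on could also be inspected with the NATS CLI from a workstation, along these lines (the service name mender-nats and a locally installed nats CLI are assumptions on my side; the stream and consumer names are taken from the log lines above):
# forward the NATS client port (keep this running in a separate terminal)
kubectl port-forward svc/mender-nats 4222:4222
# then inspect the JetStream consumer the worker is trying to migrate
nats --server nats://localhost:4222 consumer info WORKFLOWS workflows-worker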
 
Any help would be greatly appreciated.
Kind Regards

oldev
October 20, 2023, 1:07pm  #2
I tried a helm rollback to revision 1, which did not fix the issue:
helm history mender
REVISION        UPDATED                         STATUS          CHART           APP VERSION     DESCRIPTION     
1               Fri Sep  8 14:39:58 2023        superseded      mender-5.2.3    3.6.2           Install complete
2               Fri Oct 20 13:41:32 2023        superseded      mender-5.2.3    3.6.2           Upgrade complete
3               Fri Oct 20 13:52:53 2023        deployed        mender-5.2.3    3.6.2           Rollback to 1 
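For reference, the rollback itself was just the standard command for this release:
helm rollback mender 1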

robgio
October 20, 2023, 1:22pm  #3
Hi @oldev,
can you please try with an updated NATS version?
In your values file you can specify:
nats:
  nats:
    image: nats:2.9.21-scratch
 
The default image is nats:2.7.4-scratch.
When you run the helm upgrade, a new migration job should run and fix the NATS JetStream. After that, can you try restarting the workflows-worker and workflows-server pods?
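In practice the steps would look roughly like this, assuming the snippet above is added to your mender-3.6.2.yml and the deployments follow the default naming (the workflows-server deployment name in particular is my assumption):
# re-run the upgrade with the NATS image override in the values file
helm upgrade mender mender/mender --version 5.2.3 -f mender-3.6.2.yml
# then restart the worker and server pods
kubectl rollout restart deployment mender-workflows-worker
kubectl rollout restart deployment mender-workflows-server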
Thanks

oldev
October 20, 2023, 1:34pm  #4
Hi,
thanks for the quick reply.
The new revision is looking good.
The NATS pods were upgraded to the new image, and the workflows server and worker are up and running after a pod restart.
Here are the new worker logs:
kubectl logs -f mender-workflows-worker-859b66449-6cgsg
time="2023-10-20T13:28:31Z" level=info msg="migrating workflows" file=migrations.go func=mongo.Migrate line=38
time="2023-10-20T13:28:31Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=migrator_simple.go func="migrate.(*SimpleMigrator).Apply" line=124
time="2023-10-20T13:28:31Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=migrator_simple.go func="migrate.(*SimpleMigrator).Apply" line=139
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: loading 9 workflows from /etc/workflows/definitions." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=187
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: loading decommission_device v10." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: error loading: decommission_device v10: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: loading deploy_device_configuration v2." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: error loading: deploy_device_configuration v2: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: loading provision_external_device v3." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: error loading: provision_external_device v3: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: loading reindex_inventory v3." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: error loading: reindex_inventory v3: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: loading reindex_reporting v3." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: error loading: reindex_reporting v3: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:31Z" level=info msg="LoadWorkflows: loading update_device_status v7." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:32Z" level=info msg="LoadWorkflows: error loading: update_device_status v7: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:32Z" level=info msg="LoadWorkflows: loading provision_device v10." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:32Z" level=info msg="LoadWorkflows: error loading: provision_device v10: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:32Z" level=info msg="LoadWorkflows: loading reindex_reporting_deployment v1." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:32Z" level=info msg="LoadWorkflows: error loading: reindex_reporting_deployment v1: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:32Z" level=info msg="LoadWorkflows: loading update_device_inventory v3." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=189
time="2023-10-20T13:28:32Z" level=info msg="LoadWorkflows: error loading: update_device_inventory v3: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=195
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=9
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=6
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=8
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=3
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=1
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=7
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=0
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=2
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=4
time="2023-10-20T13:28:32Z" level=info msg="worker starting up" file=worker.go func=worker.InitAndRun.func2 line=114 worker_id=5