Mender 3.4 k8s deployment on AWS doesn't recover after scaling down

I’ve deployed the open source version of Mender 3.4 on AWS with Kubernetes using eksctl. I didn’t want this server running continuously, so I scaled the cluster down to 0 nodes. When I scaled it back up, the “create-artifact-worker” pod fails with the following log output:

time="2022-10-12T18:22:29Z" level=info msg="migrating workflows" file=migrations.go func=mongo.Migrate line=38                                                                                                    
time="2022-10-12T18:22:29Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=migrator_simple.go func="migrate.(*SimpleMigrator).Apply" line=125                                              
time="2022-10-12T18:22:29Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=migrator_simple.go func="migrate.(*SimpleMigrator).Apply" line=140                                                    
time="2022-10-12T18:22:29Z" level=info msg="LoadWorkflows: loading 1 workflows from /etc/workflows/definitions." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=199                    
time="2022-10-12T18:22:29Z" level=info msg="LoadWorkflows: loading generate_artifact v2." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=201                                           
time="2022-10-12T18:22:29Z" level=info msg="LoadWorkflows: error loading: generate_artifact v2: Workflow already exists." file=datastore_mongo.go func="mongo.(*DataStoreMongo).LoadWorkflows" line=207           
time="2022-10-12T18:22:29Z" level=info msg="nats client closed the connection" file=client.go func=nats.NewClientWithDefaults.func1.1 line=86                                                                     
2022/10/12 18:22:29 failed to subscribe to the nats JetStream: cannot create a queue subscription for a consumer without a deliver group                                                                          
Stream closed EOF for default/create-artifact-worker-7947674c9f-cllfd (workflows)
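For reference, this is roughly how I scaled the node group down and back up (the cluster and node group names below are placeholders, not my actual values):

# scale the EKS node group to zero so no instances are running
eksctl scale nodegroup --cluster my-mender-cluster --name my-nodegroup --nodes 0 --nodes-min 0

# later, scale it back up again
eksctl scale nodegroup --cluster my-mender-cluster --name my-nodegroup --nodes 2 --nodes-min 1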

Is there a way to recover from this? I would like to be able to scale this cluster to 0 to reduce cost when it’s not in use.
