Deployment of Mender Server 3.6.2 on AKS not completely successful

Hi all,

Yesterday I deployed the Mender Helm chart mender-5.2.3 (Mender Server 3.6.2) on Azure Kubernetes Service (AKS).

After the deployment, not all containers went into the Running state. These two keep crashing for the same reason:

NAME                                                  STATUS            
mender-create-artifact-worker-6b9f44d949-7bktj        CrashLoopBackOff
mender-workflows-worker-859b66449-qq2rn               CrashLoopBackOff

Last error log:
time="2023-09-08T07:47:58Z" level=info msg="nats client closed the connection" file=client.go func=nats.NewClientWithDefaults.func1.1 line=91
2023/09/08 07:47:58 failed to apply Jetstream consumer migrations: context deadline exceeded
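A crash-looping container starts with an empty log on every restart, so the lines that explain the crash usually live in the previous instance. The helper below is a minimal sketch (the function name is hypothetical, and it assumes kubectl already points at the namespace where the chart is installed) that prints the tail of the last crashed container for each deployment:

```shell
# Hypothetical helper: show the last log lines of the previously crashed
# container for each Deployment given as an argument.
# Assumes the current kubectl context/namespace is the Mender one.
crashed_logs() {
  local deploy
  for deploy in "$@"; do
    echo "=== ${deploy} ==="
    # --previous reads the log of the last terminated container instance
    kubectl logs "deploy/${deploy}" --previous --tail=20
  done
}

# Usage:
#   crashed_logs mender-workflows-worker mender-create-artifact-worker
```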

I’m using NATS from the subchart. As a custom value I set the URL to an empty string, as in the example here:

global:
  nats:
    URL: ""
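With an empty global.nats.URL the workflows services are expected to point at the bundled NATS subchart (in this deployment the rendered value turns out to be nats://mender-nats). One way to confirm that before installing is to render the chart locally and extract the NATS URI. This is a sketch assuming the repo alias mender, chart version 5.2.3 and the values file name mender-3.6.2.yml used in this thread; the function name is hypothetical:

```shell
# Hypothetical helper: render the chart locally (nothing is applied to the
# cluster) and print every WORKFLOWS_NATS_URI value found in the manifests.
rendered_nats_uris() {
  local values_file="${1:-mender-3.6.2.yml}"
  helm template mender mender/mender --version 5.2.3 -f "${values_file}" |
    # the value is on the line right after "- name: WORKFLOWS_NATS_URI"
    awk '/name: WORKFLOWS_NATS_URI/ { getline; print $2 }'
}

# Usage:
#   rendered_nats_uris mender-3.6.2.yml
```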

Any help is appreciated and please let me know if you need more Information.

BR


More information from my side:

I assume this is a cluster-internal issue. I haven’t created an Ingress for external access yet.

And here’s the values file (without the secrets) that I’m using for the Helm installation:
mender-3.6.2.yml (741 Bytes)

Hello @oldev,
can you please check whether the NATS address has been set correctly?

kubectl get deploy -l app.kubernetes.io/component: workflows -o yaml | grep -A2 WORKFLOWS_NATS_URI

Thanks for the quick reply! Sure, but I get an error with this command:
kubectl get deploy -l app.kubernetes.io/component: workflows -o yaml
error: name cannot be provided when a selector is specified

Hope this works as well. Here’s what I get when I grep the complete deployment YAML:

kubectl get deploy -o yaml | grep -A2 WORKFLOWS_NATS_URI
          - name: WORKFLOWS_NATS_URI
            value: nats://mender-nats
          - name: WORKFLOWS_MENDER_URL
--
          - name: WORKFLOWS_NATS_URI
            value: nats://mender-nats
          envFrom:
--
          - name: WORKFLOWS_NATS_URI
            value: nats://mender-nats
          - name: WORKFLOWS_MENDER_URL

Sorry, here’s the correct command:

kubectl get deploy -l app.kubernetes.io/component=workflows -o yaml | grep -A2 WORKFLOWS_NATS_URI
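As an alternative to grep, kubectl's JSONPath output can print each workflows deployment next to its NATS URI, and a small awk filter can flag any mismatch. A sketch, assuming the same app.kubernetes.io/component=workflows label as the command above; both function names are hypothetical:

```shell
# Hypothetical helper: print "deployment<TAB>NATS URI" for every workflows
# Deployment selected by the label used above.
nats_uri_per_deploy() {
  kubectl get deploy -l app.kubernetes.io/component=workflows \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].env[?(@.name=="WORKFLOWS_NATS_URI")].value}{"\n"}{end}'
}

# Hypothetical check: read those lines on stdin and report every Deployment
# whose URI differs from the expected one; exits non-zero on any mismatch.
check_nats_uris() {
  local expected="${1:-nats://mender-nats}"
  awk -F '\t' -v want="${expected}" '
    $2 != want { printf "%s: got %s\n", $1, ($2 == "" ? "<unset>" : $2); bad = 1 }
    END { exit bad }
  '
}

# Usage:
#   nats_uri_per_deploy | check_nats_uris nats://mender-nats
```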

Anyway, you ran the right check. Now can you please check whether a service named mender-nats exists?

kubectl get svc mender-nats

Yes, it’s exactly the same output.

Yes, the service exists:

kubectl get svc mender-nats
NAME          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                                                 AGE
mender-nats   ClusterIP   None         <none>        4222/TCP,6222/TCP,8222/TCP,7777/TCP,7422/TCP,7522/TCP   19h

The endpoints as well:

kubectl get ep | grep nats
mender-nats                       10.224.0.44:7522,10.224.1.77:7522,10.224.1.90:7522 + 15 more...   19h
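The mender-nats service is headless (CLUSTER-IP None), so each NATS pod also gets its own DNS name, and those per-pod names are what the NATS route connections try to resolve. A throwaway pod can confirm that they resolve from inside the cluster. This is a sketch assuming the default namespace, three replicas (mender-nats-0..2) and the public busybox image; the function and pod names are hypothetical:

```shell
# Hypothetical helper: resolve the per-pod DNS names of the NATS StatefulSet
# from short-lived busybox pods. Assumes 3 replicas (mender-nats-0..2).
check_nats_dns() {
  local ns="${1:-default}" i
  for i in 0 1 2; do
    # --rm deletes the pod afterwards; -i attaches so nslookup output is shown
    kubectl run "nats-dns-check-${i}" --rm -i --restart=Never --image=busybox:1.36 -- \
      nslookup "mender-nats-${i}.mender-nats.${ns}.svc.cluster.local"
  done
}

# Usage:
#   check_nats_dns default
```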

Awesome, now let’s check the NATS StatefulSet status:

kubectl get statefulset
kubectl get pods

and check whether the NATS pods are healthy.

kubectl get statefulset
NAME                     READY   AGE
mender-mongodb           2/2     20h
mender-mongodb-arbiter   1/1     20h
mender-nats              3/3     20h
mender-redis-master      1/1     20h
mender-redis-replicas    3/3     20h
kubectl get pods
NAME                                               READY   STATUS             RESTARTS          AGE
mender-api-gateway-7b75c59bdd-xnth6                1/1     Running            0                 20h
mender-create-artifact-worker-6b9f44d949-7bktj     0/1     CrashLoopBackOff   212 (3m32s ago)   18h
mender-deployments-6f6764995c-s9dpx                1/1     Running            1 (20h ago)       20h
mender-deployments-storage-daemon-28236015-v5785   0/1     Completed          0                 16m
mender-device-auth-989c8bd96-s6mm6                 1/1     Running            1 (20h ago)       20h
mender-deviceconfig-67b9779b8d-rwmsn               1/1     Running            0                 20h
mender-deviceconnect-5b67d847f-wg4n6               1/1     Running            0                 18h
mender-gui-5d999776c4-9h7wp                        1/1     Running            0                 20h
mender-inventory-8899485fb-7z6dg                   1/1     Running            0                 20h
mender-iot-manager-587ff944d-h82cz                 1/1     Running            1 (20h ago)       20h
mender-mongodb-0                                   1/1     Running            0                 20h
mender-mongodb-1                                   1/1     Running            0                 20h
mender-mongodb-arbiter-0                           1/1     Running            0                 20h
mender-nats-0                                      3/3     Running            0                 20h
mender-nats-1                                      3/3     Running            0                 20h
mender-nats-2                                      3/3     Running            0                 20h
mender-nats-box-7bbc67486c-x5hdd                   1/1     Running            0                 20h
mender-redis-master-0                              1/1     Running            0                 20h
mender-redis-replicas-0                            1/1     Running            0                 20h
mender-redis-replicas-1                            1/1     Running            0                 20h
mender-redis-replicas-2                            1/1     Running            0                 20h
mender-useradm-7d9d56b6fd-sxpfw                    1/1     Running            1 (20h ago)       20h
mender-workflows-server-5cc4b8b965-hrzl7           1/1     Running            0                 18h
mender-workflows-worker-859b66449-qq2rn            0/1     CrashLoopBackOff   212 (106s ago)    18h
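In a listing this long, a pod stuck in CrashLoopBackOff is easy to miss, and filtering on the pod phase does not help because such pods usually still report phase Running. A sketch that instead compares the READY column (ready/total containers) and skips finished job pods; it only relies on the default column layout of kubectl get pods, and the function name is hypothetical:

```shell
# Hypothetical filter: read `kubectl get pods` output on stdin and print
# NAME and STATUS of pods whose containers are not all ready, ignoring
# pods of completed Jobs (e.g. the storage daemon CronJob).
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if (ready[1] != ready[2] && $3 != "Completed") print $1, $3
  }'
}

# Usage:
#   kubectl get pods | not_ready_pods
```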

Could you please also check:

  1. NATS logs:
kubectl logs mender-nats-0
kubectl logs mender-nats-1
kubectl logs mender-nats-2
  2. NATS storage:
kubectl get pv,pvc
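When collecting the three NATS logs, it can help to filter for error and warning lines first, and to list only the NATS volume claims. A sketch assuming the three replicas mender-nats-0..2 and the container name nats (the default container in these pods); the function names are hypothetical:

```shell
# Hypothetical helper: print only [ERR]/[WRN] lines from each NATS server.
nats_errors() {
  local i
  for i in 0 1 2; do
    echo "=== mender-nats-${i} ==="
    kubectl logs "mender-nats-${i}" -c nats | grep -E '\[(ERR|WRN)\]'
  done
}

# Hypothetical helper: show the NATS JetStream claims with phase and size.
nats_pvcs() {
  kubectl get pvc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.capacity.storage}{"\n"}{end}' |
    grep '^mender-nats-'
}
```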

Thanks

Sure.

kubectl get pv,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                      STORAGECLASS   REASON   AGE
persistentvolume/pvc-20cc7c2e-9729-4468-97a7-da57555fc3d5   2Gi        RWO            Delete           Bound    default/mender-nats-js-pvc-mender-nats-2   default                 20h
persistentvolume/pvc-273eb924-5bc2-4510-aeaa-2bfa9bec0945   8Gi        RWO            Delete           Bound    default/datadir-mender-mongodb-0           default                 20h
persistentvolume/pvc-367946ea-6028-4e92-b186-d4794f93afaa   2Gi        RWO            Delete           Bound    default/mender-nats-js-pvc-mender-nats-0   default                 20h
persistentvolume/pvc-3917d50c-9696-4cdf-81fb-89c13fe3d372   8Gi        RWO            Delete           Bound    default/datadir-mender-mongodb-1           default                 20h
persistentvolume/pvc-a824a563-11bc-4582-bc6e-be2a702e098d   2Gi        RWO            Delete           Bound    default/mender-nats-js-pvc-mender-nats-1   default                 20h

NAME                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/datadir-mender-mongodb-0           Bound    pvc-273eb924-5bc2-4510-aeaa-2bfa9bec0945   8Gi        RWO            default        20h
persistentvolumeclaim/datadir-mender-mongodb-1           Bound    pvc-3917d50c-9696-4cdf-81fb-89c13fe3d372   8Gi        RWO            default        20h
persistentvolumeclaim/mender-nats-js-pvc-mender-nats-0   Bound    pvc-367946ea-6028-4e92-b186-d4794f93afaa   2Gi        RWO            default        20h
persistentvolumeclaim/mender-nats-js-pvc-mender-nats-1   Bound    pvc-a824a563-11bc-4582-bc6e-be2a702e098d   2Gi        RWO            default        20h
persistentvolumeclaim/mender-nats-js-pvc-mender-nats-2   Bound    pvc-20cc7c2e-9729-4468-97a7-da57555fc3d5   2Gi        RWO            default        20h

kubectl logs mender-nats-0
Defaulted container "nats" out of: nats, reloader, metrics
[7] 2023/09/07 12:20:09.883670 [INF] Starting nats-server
[7] 2023/09/07 12:20:09.883693 [INF]   Version:  2.7.4
[7] 2023/09/07 12:20:09.883694 [INF]   Git:      [a86b84a]
[7] 2023/09/07 12:20:09.883696 [INF]   Name:     mender-nats-0
[7] 2023/09/07 12:20:09.883698 [INF]   Node:     c0E5oXoN
[7] 2023/09/07 12:20:09.883699 [INF]   ID:       NDI656WJTRG7ECKSTFZXGJC6NGD3AJPJH3X6R4QGJADZ54ILMI3TTJQY
[7] 2023/09/07 12:20:09.883711 [INF] Using configuration file: /etc/nats-config/nats.conf
[7] 2023/09/07 12:20:09.884075 [INF] Starting http monitor on 0.0.0.0:8222
[7] 2023/09/07 12:20:09.884104 [INF] Starting JetStream
[7] 2023/09/07 12:20:09.884389 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[7] 2023/09/07 12:20:09.884394 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[7] 2023/09/07 12:20:09.884395 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[7] 2023/09/07 12:20:09.884396 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[7] 2023/09/07 12:20:09.884397 [INF] 
[7] 2023/09/07 12:20:09.884398 [INF]          https://docs.nats.io/jetstream
[7] 2023/09/07 12:20:09.884399 [INF] 
[7] 2023/09/07 12:20:09.884400 [INF] ---------------- JETSTREAM ----------------
[7] 2023/09/07 12:20:09.884404 [INF]   Max Memory:      1.00 GB
[7] 2023/09/07 12:20:09.884406 [INF]   Max Storage:     2.00 GB
[7] 2023/09/07 12:20:09.884408 [INF]   Store Directory: "/data/jetstream"
[7] 2023/09/07 12:20:09.884409 [INF] -------------------------------------------
[7] 2023/09/07 12:20:09.884528 [INF] Starting JetStream cluster
[7] 2023/09/07 12:20:09.884534 [INF] Creating JetStream metadata controller
[7] 2023/09/07 12:20:09.884861 [INF] JetStream cluster bootstrapping
[7] 2023/09/07 12:20:09.885005 [INF] Listening for client connections on 0.0.0.0:4222
[7] 2023/09/07 12:20:09.885121 [INF] Server is ready
[7] 2023/09/07 12:20:09.885136 [INF] Cluster name is nats
[7] 2023/09/07 12:20:09.885151 [INF] Listening for route connections on 0.0.0.0:6222
[7] 2023/09/07 12:20:14.899897 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-1.mender-nats.default.svc.cluster.local": lookup mender-nats-1.mender-nats.default.svc.cluster.local on 10.0.0.10:53: no such host
[7] 2023/09/07 12:20:14.900081 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-2.mender-nats.default.svc.cluster.local": lookup mender-nats-2.mender-nats.default.svc.cluster.local on 10.0.0.10:53: no such host
[7] 2023/09/07 12:20:26.482055 [INF] 10.224.1.90:46540 - rid:6 - Route connection created
[7] 2023/09/07 12:20:27.204441 [INF] 10.224.1.77:42556 - rid:7 - Route connection created
[7] 2023/09/07 12:20:28.261939 [INF] JetStream cluster new metadata leader: mender-nats-1/nats
[7] 2023/09/07 12:20:37.942395 [INF] Self is new JetStream cluster metadata leader
[7] 2023/09/07 12:20:45.234996 [INF] 10.224.1.77:6222 - rid:11 - Route connection created
[7] 2023/09/07 12:20:45.236538 [INF] 10.224.1.77:6222 - rid:11 - Router connection closed: Duplicate Route
[7] 2023/09/07 12:20:46.218773 [INF] 10.224.1.90:6222 - rid:12 - Route connection created
[7] 2023/09/07 12:20:46.220122 [INF] 10.224.1.90:6222 - rid:12 - Router connection closed: Duplicate Route

There are quite a lot more repetitive lines in the mender-nats-1 log:

kubectl logs mender-nats-1
Defaulted container "nats" out of: nats, reloader, metrics
[7] 2023/09/07 12:20:16.184700 [INF] Starting nats-server
[7] 2023/09/07 12:20:16.184723 [INF]   Version:  2.7.4
[7] 2023/09/07 12:20:16.184725 [INF]   Git:      [a86b84a]
[7] 2023/09/07 12:20:16.184726 [INF]   Name:     mender-nats-1
[7] 2023/09/07 12:20:16.184729 [INF]   Node:     0asvfPsf
[7] 2023/09/07 12:20:16.184730 [INF]   ID:       NBSFD63ED3ULYUJCSOREWW6X4LVYDV6SF4HQRNT2GN63UF4RMWQKQ2H5
[7] 2023/09/07 12:20:16.184741 [INF] Using configuration file: /etc/nats-config/nats.conf
[7] 2023/09/07 12:20:16.185056 [INF] Starting http monitor on 0.0.0.0:8222
[7] 2023/09/07 12:20:16.185087 [INF] Starting JetStream
[7] 2023/09/07 12:20:16.185527 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[7] 2023/09/07 12:20:16.185533 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[7] 2023/09/07 12:20:16.185534 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[7] 2023/09/07 12:20:16.185535 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[7] 2023/09/07 12:20:16.185536 [INF] 
[7] 2023/09/07 12:20:16.185537 [INF]          https://docs.nats.io/jetstream
[7] 2023/09/07 12:20:16.185538 [INF] 
[7] 2023/09/07 12:20:16.185539 [INF] ---------------- JETSTREAM ----------------
[7] 2023/09/07 12:20:16.185543 [INF]   Max Memory:      1.00 GB
[7] 2023/09/07 12:20:16.185545 [INF]   Max Storage:     2.00 GB
[7] 2023/09/07 12:20:16.185546 [INF]   Store Directory: "/data/jetstream"
[7] 2023/09/07 12:20:16.185547 [INF] -------------------------------------------
[7] 2023/09/07 12:20:16.185657 [INF] Starting JetStream cluster
[7] 2023/09/07 12:20:16.185662 [INF] Creating JetStream metadata controller
[7] 2023/09/07 12:20:16.185959 [INF] JetStream cluster bootstrapping
[7] 2023/09/07 12:20:16.186125 [INF] Listening for client connections on 0.0.0.0:4222
[7] 2023/09/07 12:20:16.186238 [INF] Server is ready
[7] 2023/09/07 12:20:16.186252 [INF] Cluster name is nats
[7] 2023/09/07 12:20:16.186267 [INF] Listening for route connections on 0.0.0.0:6222
[7] 2023/09/07 12:20:26.200380 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-0.mender-nats.default.svc.cluster.local": lookup mender-nats-0.mender-nats.default.svc.cluster.local on 10.0.0.10:53: read udp 10.224.1.77:38809->10.0.0.10:53: i/o timeout
[7] 2023/09/07 12:20:27.204355 [INF] 10.224.0.44:6222 - rid:6 - Route connection created
[7] 2023/09/07 12:20:28.261700 [INF] Self is new JetStream cluster metadata leader
[7] 2023/09/07 12:20:31.198333 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-2.mender-nats.default.svc.cluster.local": lookup mender-nats-2.mender-nats.default.svc.cluster.local on 10.0.0.10:53: read udp 10.224.1.77:53470->10.0.0.10:53: i/o timeout
[7] 2023/09/07 12:20:33.411174 [INF] JetStream cluster new stream leader for '$G > WORKFLOWS'
[7] 2023/09/07 12:20:34.348084 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/07 12:20:35.763537 [INF] JetStream cluster no metadata leader
[7] 2023/09/07 12:20:37.943774 [INF] JetStream cluster new metadata leader: mender-nats-0/nats
[7] 2023/09/07 12:20:45.236143 [INF] 10.224.0.44:44430 - rid:12 - Route connection created
[7] 2023/09/07 12:20:45.236326 [INF] 10.224.0.44:44430 - rid:12 - Router connection closed: Duplicate Route
[7] 2023/09/07 12:20:45.327794 [INF] 10.224.1.90:6222 - rid:13 - Route connection created
[7] 2023/09/07 12:20:46.687035 [INF] 10.224.1.90:49836 - rid:14 - Route connection created
[7] 2023/09/07 12:20:46.687291 [INF] 10.224.1.90:49836 - rid:14 - Router connection closed: Duplicate Route
[7] 2023/09/07 12:20:59.159334 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/09/07 12:20:59.353410 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/07 12:21:55.447968 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/07 12:23:32.435883 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'

**the same lines keep repeating up to now**

[7] 2023/09/08 09:05:23.433716 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/09/08 09:08:59.056224 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/08 09:10:38.434511 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/09/08 09:14:17.158217 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/08 09:15:45.340557 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'

kubectl logs mender-nats-2
Defaulted container "nats" out of: nats, reloader, metrics
[7] 2023/09/07 12:20:15.460872 [INF] Starting nats-server
[7] 2023/09/07 12:20:15.460898 [INF]   Version:  2.7.4
[7] 2023/09/07 12:20:15.460900 [INF]   Git:      [a86b84a]
[7] 2023/09/07 12:20:15.460902 [INF]   Name:     mender-nats-2
[7] 2023/09/07 12:20:15.460905 [INF]   Node:     HEE3lf4A
[7] 2023/09/07 12:20:15.460907 [INF]   ID:       NCTDYSTK7HDID5USQKKX5QH7CN3RIV4CMOGDXGZ5FDY23CDKYAD3T5NC
[7] 2023/09/07 12:20:15.460920 [INF] Using configuration file: /etc/nats-config/nats.conf
[7] 2023/09/07 12:20:15.461332 [INF] Starting http monitor on 0.0.0.0:8222
[7] 2023/09/07 12:20:15.461373 [INF] Starting JetStream
[7] 2023/09/07 12:20:15.461850 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[7] 2023/09/07 12:20:15.461857 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[7] 2023/09/07 12:20:15.461858 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[7] 2023/09/07 12:20:15.461859 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[7] 2023/09/07 12:20:15.461860 [INF] 
[7] 2023/09/07 12:20:15.461861 [INF]          https://docs.nats.io/jetstream
[7] 2023/09/07 12:20:15.461862 [INF] 
[7] 2023/09/07 12:20:15.461863 [INF] ---------------- JETSTREAM ----------------
[7] 2023/09/07 12:20:15.461868 [INF]   Max Memory:      1.00 GB
[7] 2023/09/07 12:20:15.461869 [INF]   Max Storage:     2.00 GB
[7] 2023/09/07 12:20:15.461870 [INF]   Store Directory: "/data/jetstream"
[7] 2023/09/07 12:20:15.461871 [INF] -------------------------------------------
[7] 2023/09/07 12:20:15.462009 [INF] Starting JetStream cluster
[7] 2023/09/07 12:20:15.462015 [INF] Creating JetStream metadata controller
[7] 2023/09/07 12:20:15.462322 [INF] JetStream cluster bootstrapping
[7] 2023/09/07 12:20:15.462559 [INF] Listening for client connections on 0.0.0.0:4222
[7] 2023/09/07 12:20:15.462841 [INF] Server is ready
[7] 2023/09/07 12:20:15.462847 [INF] Cluster name is nats
[7] 2023/09/07 12:20:15.462860 [INF] Listening for route connections on 0.0.0.0:6222
[7] 2023/09/07 12:20:25.475486 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-0.mender-nats.default.svc.cluster.local": lookup mender-nats-0.mender-nats.default.svc.cluster.local on 10.0.0.10:53: read udp 10.224.1.90:48005->10.0.0.10:53: i/o timeout
[7] 2023/09/07 12:20:25.477996 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-1.mender-nats.default.svc.cluster.local": lookup mender-nats-1.mender-nats.default.svc.cluster.local on 10.0.0.10:53: read udp 10.224.1.90:41706->10.0.0.10:53: i/o timeout
[7] 2023/09/07 12:20:26.482236 [INF] 10.224.0.44:6222 - rid:6 - Route connection created
[7] 2023/09/07 12:20:35.535862 [INF] JetStream cluster no metadata leader
[7] 2023/09/07 12:20:37.943606 [INF] JetStream cluster new metadata leader: mender-nats-0/nats
[7] 2023/09/07 12:20:45.327818 [INF] 10.224.1.77:60598 - rid:10 - Route connection created
[7] 2023/09/07 12:20:46.219902 [INF] 10.224.0.44:40360 - rid:11 - Route connection created
[7] 2023/09/07 12:20:46.220122 [INF] 10.224.0.44:40360 - rid:11 - Router connection closed: Duplicate Route
[7] 2023/09/07 12:20:46.687023 [INF] 10.224.1.77:6222 - rid:12 - Route connection created
[7] 2023/09/07 12:20:46.687245 [INF] 10.224.1.77:6222 - rid:12 - Router connection closed: Duplicate Route
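The mender-nats-1 log shows the leaders of the create-artifact-worker and workflows-worker consumers changing again and again, which may simply follow the workers reconnecting on every restart. The stream and consumer state can be inspected with the nats CLI in the nats-box pod deployed by the subchart. A sketch, assuming the deployment is named mender-nats-box (as the pod name suggests) and that clients reach NATS on port 4222 of the mender-nats service; the function name is hypothetical, and the stream and consumer names are taken from the logs above:

```shell
# Assumed client URL of the NATS subchart (override via the environment).
NATS_SERVER="${NATS_SERVER:-nats://mender-nats:4222}"

# Hypothetical helper: query JetStream from inside the cluster via nats-box.
js_status() {
  local consumer
  kubectl exec deploy/mender-nats-box -- \
    nats --server "${NATS_SERVER}" stream info WORKFLOWS
  for consumer in workflows-worker create-artifact-worker; do
    kubectl exec deploy/mender-nats-box -- \
      nats --server "${NATS_SERVER}" consumer info WORKFLOWS "${consumer}"
  done
}
```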

It all looks good to me. I already checked in a test environment (EKS) and it works properly there. Can you please give me the output of:

kubectl get deploy -l app.kubernetes.io/component=workflows -o yaml

I’m going to compare it with my environment.
Please double-check that no secrets are exposed before sharing.
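Before sharing a deployment manifest, plain env values can be masked with a small filter. A rough sketch: it only looks at name/value pairs whose name contains SECRET, PASSWORD, TOKEN or KEY (an assumed, non-exhaustive list) and leaves values coming from secretKeyRef or envFrom untouched, so a manual check is still needed. The function name is hypothetical:

```shell
# Hypothetical filter: replace the value of suspicious env vars with "redacted".
redact_env() {
  awk '
    /- name: .*(SECRET|PASSWORD|TOKEN|KEY)/ { print; hide = 1; next }
    hide && /^[ \t]*value:/ { sub(/value:.*/, "value: redacted"); print; hide = 0; next }
    { hide = 0; print }
  '
}

# Usage:
#   kubectl get deploy -l app.kubernetes.io/component=workflows -o yaml | redact_env
```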

Hi @oldev :wave:
There is an issue with the workflows automigrate behavior. Could you try disabling workflows.automigrate and create_artifact_worker.automigrate in the Helm values?

Okay,
I have replaced all secrets with “redacted”:
workflows-deployment.yml (9.9 KB)

I see that there are a lot of env vars that use the public domain. Should I go ahead and deploy the Ingress?

Okay, I added these overrides to mender-3.6.2.yml:

workflows:
  automigrate: false

create_artifact_worker:
  automigrate: false

Should I run the helm upgrade with or without hooks?
helm upgrade mender mender/mender -f mender-3.6.2.yml

helm upgrade --no-hooks mender mender/mender -f mender-3.6.2.yml

You can safely run the upgrade with hooks. If they have already run before, the hooks are simply no-ops.
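For reference, the same overrides can also be passed on the command line. A sketch, assuming the release name mender and the repo alias mender/mender used above; the function name is hypothetical, and the --timeout value is an arbitrary choice that gives the pre-upgrade migration hook more than Helm's default five minutes:

```shell
# Hypothetical wrapper around the upgrade discussed above; --set values
# take precedence over the same keys in the -f values file.
upgrade_mender() {
  local values_file="${1:-mender-3.6.2.yml}"
  helm upgrade mender mender/mender -f "${values_file}" \
    --set workflows.automigrate=false \
    --set create_artifact_worker.automigrate=false \
    --timeout 10m
}

# Afterwards, check which user-supplied values the release actually uses:
#   helm get values mender
```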

Regarding the Ingress: yes, it’s required for exposing the Mender server externally.

Okay, it seems they have not run successfully yet. The helm upgrade command does not return, and the mender-db-data-migration-* pods are in Error state:

mender-db-data-migration-594jc                     0/8     Error              0                 43s
mender-db-data-migration-622bl                     0/8     Error              0                 103s
mender-db-data-migration-gclw8                     0/8     Error              0                 73s
mender-db-data-migration-vngxt                     0/8     Error              0                 92s

P.S.: helm upgrade returned with:

Error: UPGRADE FAILED: pre-upgrade hooks failed: 1 error occurred:
        * timed out waiting for the condition

Job logs show:

kubectl logs mender-db-data-migration-tk2pn
Defaulted container "deployments-migration" out of: deployments-migration, device-auth-migration, deviceconfig-migration, deviceconnect-migration, inventory-migration, useradm-migration, workflows-server-migration, iot-manager-migration
time="2023-09-08T09:46:41Z" level=warning msg="'presign.secret' not configured. Generating a random secret." file=config.go func=config.Setup line=238
failed to connect to db: Error reaching mongo server: connection() error occurred during connection handshake: auth error: unable to authenticate using mechanism "SCRAM-SHA-256": (AuthenticationFailed) Authentication failed.

That seems to be a different issue. To debug it further, could you please check whether your mongodb-common secret is correct?

kubectl get secret mongodb-common -o jsonpath='{.data.MONGO}' | base64 -d

The db-data-migration job actually uses a temporary secret, which is created for the migration job and deleted once the job succeeds. You can probably still check it (please don’t share it, just check that it’s right):

kubectl get secret mongodb-common-prerelease -o jsonpath='{.data.MONGO}' | base64 -d
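To look at both secrets without putting the password on screen (or into a forum post), the decoded URI can be piped through a filter that masks the password. A sketch, assuming the mongodb://user:password@host form seen later in this thread; both function names are hypothetical:

```shell
# Hypothetical filter: replace the password part of a MongoDB URI with ****.
mask_mongo_uri() {
  sed -E 's#^(mongodb(\+srv)?://[^:/@]+):[^@]*@#\1:****@#'
}

# Hypothetical helper: decode the MONGO key of a secret and print it masked.
show_mongo_secret() {
  kubectl get secret "$1" -o jsonpath='{.data.MONGO}' | base64 -d | mask_mongo_uri
  echo
}

# Usage:
#   show_mongo_secret mongodb-common
#   show_mongo_secret mongodb-common-prerelease
```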

Switching back to the NATS issue, the logs of the workflows migration container can be viewed with:

kubectl logs job/mender-db-data-migration -c workflows-server-migration
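Since the job creates a new pod for every retry, a loop over all of them collects the workflows-server migration log from each attempt. A sketch, assuming the job is named mender-db-data-migration (inferred from the pod names above) and relying on the job-name label that Kubernetes adds to a Job's pods; the function name is hypothetical:

```shell
# Hypothetical helper: print one migration container's log for every pod
# of the migration Job (one pod per retry).
migration_logs() {
  local container="${1:-workflows-server-migration}" pod
  for pod in $(kubectl get pods -l job-name=mender-db-data-migration -o name); do
    echo "=== ${pod} ==="
    kubectl logs "${pod}" -c "${container}"
  done
}

# Usage:
#   migration_logs workflows-server-migration
```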

Can you please share these logs?

Alright,

For context: I did a k3s deployment with MinIO two days ago and got the same “unable to authenticate using mechanism "SCRAM-SHA-256"” error, so it’s reproducible (at least with my configs :sweat_smile:).

I was on my lunch break, and by the time I got back both the failed pods and the secret were already gone.
Can you clarify what you mean by “a temporary secret”? Is the secret itself deleted, or is the password overwritten?

The secret is actually still present, but it contains a strange URL. This is the actual value, NOT redacted apart from the password:

kubectl get secret mongodb-common-prerelease -o jsonpath='{.data.MONGO}' | base64 -d
mongodb://root:<rootPW>@mender-mongodb-headless

I’ll trigger a new upgrade and upload the logs.

The temporary secret is just a Helm hook resource, needed for the migration job hook. Does it have the same content as the mongodb-common secret?
The strange thing is that if MongoDB authentication were broken, all your services would be in CrashLoopBackOff, but that doesn’t seem to be the case…
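That comparison can be done without printing either value: the same bytes always produce the same base64 string, so comparing the raw .data.MONGO fields is enough. A sketch using the two secret names from this thread; the function name is hypothetical:

```shell
# Hypothetical helper: compare the MONGO key of two secrets without decoding
# or printing them. Returns 0 only if both exist and are identical.
secrets_match() {
  local a b
  a=$(kubectl get secret "${1:-mongodb-common}" -o jsonpath='{.data.MONGO}')
  b=$(kubectl get secret "${2:-mongodb-common-prerelease}" -o jsonpath='{.data.MONGO}')
  if [ -n "$a" ] && [ "$a" = "$b" ]; then
    echo "MONGO values match"
  else
    echo "MONGO values differ (or one is missing)"
    return 1
  fi
}

# Usage:
#   secrets_match mongodb-common mongodb-common-prerelease
```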