oldev
September 8, 2023, 7:57am
1
Hi all,
Yesterday I deployed the Mender chart version "mender-5.2.3" (Server version 3.6.2) on Azure Kubernetes Service.
After the deployment, not all containers went into Running state. These two keep crashing for the same reason:
NAME STATUS
mender-create-artifact-worker-6b9f44d949-7bktj CrashLoopBackOff
mender-workflows-worker-859b66449-qq2rn CrashLoopBackOff
Last error log:
time="2023-09-08T07:47:58Z" level=info msg="nats client closed the connection" file=client.go func=nats.NewClientWithDefaults.func1.1 line=91
2023/09/08 07:47:58 failed to apply Jetstream consumer migrations: context deadline exceeded
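(Side note: the error above comes from the crashing pod's logs; something like the following, using the pod name from the list above, should pull the output of the previous crash:)
kubectl logs mender-create-artifact-worker-6b9f44d949-7bktj --previous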
I'm using NATS from the subchart. As a custom value I set the URL to an empty string, as in the example here:
global:
  nats:
    URL: ""
Any help is appreciated, and please let me know if you need more information.
BR
oldev
September 8, 2023, 8:06am
2
More information from my side:
I assume this is a cluster-internal issue. I haven't created an Ingress yet for external access.
And here’s the values file without the secrets, that I’m using for the helm installation:
mender-3.6.2.yml (741 Bytes)
robgio
September 8, 2023, 8:08am
3
Hello @oldev ,
can you please check if the NATS address is set correctly?
kubectl get deploy -l app.kubernetes.io/component: workflows -o yaml | grep -A2 WORKFLOWS_NATS_URI
oldev
September 8, 2023, 8:12am
4
Thanks for the quick reply! Sure, but I get an error with this command:
kubectl get deploy -l app.kubernetes.io/component: workflows -o yaml
error: name cannot be provided when a selector is specified
Hope this works as well. Here's what I get when I grep over the complete deployment YAML:
kubectl get deploy -o yaml | grep -A2 WORKFLOWS_NATS_URI
- name: WORKFLOWS_NATS_URI
  value: nats://mender-nats
- name: WORKFLOWS_MENDER_URL
--
- name: WORKFLOWS_NATS_URI
  value: nats://mender-nats
envFrom:
--
- name: WORKFLOWS_NATS_URI
  value: nats://mender-nats
- name: WORKFLOWS_MENDER_URL
robgio
September 8, 2023, 8:15am
5
Sorry, here’s the correct command:
kubectl get deploy -l app.kubernetes.io/component=workflows -o yaml | grep -A2 WORKFLOWS_NATS_URI
Anyway, you ran the right check. Now can you please check if a service named mender-nats exists?
kubectl get svc mender-nats
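If the service is there, it can also help to confirm that its DNS name resolves from inside the cluster; a throwaway pod is enough for that (the image and pod name below are just an example):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup mender-nats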
oldev
September 8, 2023, 8:18am
6
Yes, it's exactly the same output.
And yes, the service exists:
kubectl get svc mender-nats
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mender-nats ClusterIP None <none> 4222/TCP,6222/TCP,8222/TCP,7777/TCP,7422/TCP,7522/TCP 19h
The endpoints exist as well:
kubectl get ep | grep nats
mender-nats 10.224.0.44:7522,10.224.1.77:7522,10.224.1.90:7522 + 15 more... 19h
robgio
September 8, 2023, 8:30am
7
Awesome, now let's check the NATS StatefulSet status:
kubectl get statefulset
kubectl get pods
and check whether the NATS pods are healthy.
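A couple of optional extra checks, assuming the release is named mender and lives in the default namespace:
kubectl describe statefulset mender-nats                    # rollout status and recent events
kubectl get events --sort-by=.lastTimestamp | grep -i nats  # recent NATS-related cluster events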
oldev
September 8, 2023, 8:31am
8
kubectl get statefulset
NAME READY AGE
mender-mongodb 2/2 20h
mender-mongodb-arbiter 1/1 20h
mender-nats 3/3 20h
mender-redis-master 1/1 20h
mender-redis-replicas 3/3 20h
kubectl get pods
NAME READY STATUS RESTARTS AGE
mender-api-gateway-7b75c59bdd-xnth6 1/1 Running 0 20h
mender-create-artifact-worker-6b9f44d949-7bktj 0/1 CrashLoopBackOff 212 (3m32s ago) 18h
mender-deployments-6f6764995c-s9dpx 1/1 Running 1 (20h ago) 20h
mender-deployments-storage-daemon-28236015-v5785 0/1 Completed 0 16m
mender-device-auth-989c8bd96-s6mm6 1/1 Running 1 (20h ago) 20h
mender-deviceconfig-67b9779b8d-rwmsn 1/1 Running 0 20h
mender-deviceconnect-5b67d847f-wg4n6 1/1 Running 0 18h
mender-gui-5d999776c4-9h7wp 1/1 Running 0 20h
mender-inventory-8899485fb-7z6dg 1/1 Running 0 20h
mender-iot-manager-587ff944d-h82cz 1/1 Running 1 (20h ago) 20h
mender-mongodb-0 1/1 Running 0 20h
mender-mongodb-1 1/1 Running 0 20h
mender-mongodb-arbiter-0 1/1 Running 0 20h
mender-nats-0 3/3 Running 0 20h
mender-nats-1 3/3 Running 0 20h
mender-nats-2 3/3 Running 0 20h
mender-nats-box-7bbc67486c-x5hdd 1/1 Running 0 20h
mender-redis-master-0 1/1 Running 0 20h
mender-redis-replicas-0 1/1 Running 0 20h
mender-redis-replicas-1 1/1 Running 0 20h
mender-redis-replicas-2 1/1 Running 0 20h
mender-useradm-7d9d56b6fd-sxpfw 1/1 Running 1 (20h ago) 20h
mender-workflows-server-5cc4b8b965-hrzl7 1/1 Running 0 18h
mender-workflows-worker-859b66449-qq2rn 0/1 CrashLoopBackOff 212 (106s ago) 18h
robgio
September 8, 2023, 9:10am
9
Could you please also check:
NATS logs:
kubectl logs mender-nats-0
kubectl logs mender-nats-1
kubectl logs mender-nats-2
NATS storage:
kubectl get pv,pvc
Thanks
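If the pods and volumes look fine, another thing worth checking, assuming the nats CLI is available inside the bundled mender-nats-box pod, is whether the WORKFLOWS stream and its consumers actually exist in JetStream:
kubectl exec -it deploy/mender-nats-box -- nats --server nats://mender-nats:4222 stream ls
kubectl exec -it deploy/mender-nats-box -- nats --server nats://mender-nats:4222 consumer ls WORKFLOWS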
oldev
September 8, 2023, 9:21am
10
Sure.
kubectl get pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-20cc7c2e-9729-4468-97a7-da57555fc3d5 2Gi RWO Delete Bound default/mender-nats-js-pvc-mender-nats-2 default 20h
persistentvolume/pvc-273eb924-5bc2-4510-aeaa-2bfa9bec0945 8Gi RWO Delete Bound default/datadir-mender-mongodb-0 default 20h
persistentvolume/pvc-367946ea-6028-4e92-b186-d4794f93afaa 2Gi RWO Delete Bound default/mender-nats-js-pvc-mender-nats-0 default 20h
persistentvolume/pvc-3917d50c-9696-4cdf-81fb-89c13fe3d372 8Gi RWO Delete Bound default/datadir-mender-mongodb-1 default 20h
persistentvolume/pvc-a824a563-11bc-4582-bc6e-be2a702e098d 2Gi RWO Delete Bound default/mender-nats-js-pvc-mender-nats-1 default 20h
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/datadir-mender-mongodb-0 Bound pvc-273eb924-5bc2-4510-aeaa-2bfa9bec0945 8Gi RWO default 20h
persistentvolumeclaim/datadir-mender-mongodb-1 Bound pvc-3917d50c-9696-4cdf-81fb-89c13fe3d372 8Gi RWO default 20h
persistentvolumeclaim/mender-nats-js-pvc-mender-nats-0 Bound pvc-367946ea-6028-4e92-b186-d4794f93afaa 2Gi RWO default 20h
persistentvolumeclaim/mender-nats-js-pvc-mender-nats-1 Bound pvc-a824a563-11bc-4582-bc6e-be2a702e098d 2Gi RWO default 20h
persistentvolumeclaim/mender-nats-js-pvc-mender-nats-2 Bound pvc-20cc7c2e-9729-4468-97a7-da57555fc3d5 2Gi RWO default 20h
kubectl logs mender-nats-0
Defaulted container "nats" out of: nats, reloader, metrics
[7] 2023/09/07 12:20:09.883670 [INF] Starting nats-server
[7] 2023/09/07 12:20:09.883693 [INF] Version: 2.7.4
[7] 2023/09/07 12:20:09.883694 [INF] Git: [a86b84a]
[7] 2023/09/07 12:20:09.883696 [INF] Name: mender-nats-0
[7] 2023/09/07 12:20:09.883698 [INF] Node: c0E5oXoN
[7] 2023/09/07 12:20:09.883699 [INF] ID: NDI656WJTRG7ECKSTFZXGJC6NGD3AJPJH3X6R4QGJADZ54ILMI3TTJQY
[7] 2023/09/07 12:20:09.883711 [INF] Using configuration file: /etc/nats-config/nats.conf
[7] 2023/09/07 12:20:09.884075 [INF] Starting http monitor on 0.0.0.0:8222
[7] 2023/09/07 12:20:09.884104 [INF] Starting JetStream
[7] 2023/09/07 12:20:09.884389 [INF] _ ___ _____ ___ _____ ___ ___ _ __ __
[7] 2023/09/07 12:20:09.884394 [INF] _ | | __|_ _/ __|_ _| _ \ __| /_\ | \/ |
[7] 2023/09/07 12:20:09.884395 [INF] | || | _| | | \__ \ | | | / _| / _ \| |\/| |
[7] 2023/09/07 12:20:09.884396 [INF] \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_| |_|
[7] 2023/09/07 12:20:09.884397 [INF]
[7] 2023/09/07 12:20:09.884398 [INF] https://docs.nats.io/jetstream
[7] 2023/09/07 12:20:09.884399 [INF]
[7] 2023/09/07 12:20:09.884400 [INF] ---------------- JETSTREAM ----------------
[7] 2023/09/07 12:20:09.884404 [INF] Max Memory: 1.00 GB
[7] 2023/09/07 12:20:09.884406 [INF] Max Storage: 2.00 GB
[7] 2023/09/07 12:20:09.884408 [INF] Store Directory: "/data/jetstream"
[7] 2023/09/07 12:20:09.884409 [INF] -------------------------------------------
[7] 2023/09/07 12:20:09.884528 [INF] Starting JetStream cluster
[7] 2023/09/07 12:20:09.884534 [INF] Creating JetStream metadata controller
[7] 2023/09/07 12:20:09.884861 [INF] JetStream cluster bootstrapping
[7] 2023/09/07 12:20:09.885005 [INF] Listening for client connections on 0.0.0.0:4222
[7] 2023/09/07 12:20:09.885121 [INF] Server is ready
[7] 2023/09/07 12:20:09.885136 [INF] Cluster name is nats
[7] 2023/09/07 12:20:09.885151 [INF] Listening for route connections on 0.0.0.0:6222
[7] 2023/09/07 12:20:14.899897 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-1.mender-nats.default.svc.cluster.local": lookup mender-nats-1.mender-nats.default.svc.cluster.local on 10.0.0.10:53: no such host
[7] 2023/09/07 12:20:14.900081 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-2.mender-nats.default.svc.cluster.local": lookup mender-nats-2.mender-nats.default.svc.cluster.local on 10.0.0.10:53: no such host
[7] 2023/09/07 12:20:26.482055 [INF] 10.224.1.90:46540 - rid:6 - Route connection created
[7] 2023/09/07 12:20:27.204441 [INF] 10.224.1.77:42556 - rid:7 - Route connection created
[7] 2023/09/07 12:20:28.261939 [INF] JetStream cluster new metadata leader: mender-nats-1/nats
[7] 2023/09/07 12:20:37.942395 [INF] Self is new JetStream cluster metadata leader
[7] 2023/09/07 12:20:45.234996 [INF] 10.224.1.77:6222 - rid:11 - Route connection created
[7] 2023/09/07 12:20:45.236538 [INF] 10.224.1.77:6222 - rid:11 - Router connection closed: Duplicate Route
[7] 2023/09/07 12:20:46.218773 [INF] 10.224.1.90:6222 - rid:12 - Route connection created
[7] 2023/09/07 12:20:46.220122 [INF] 10.224.1.90:6222 - rid:12 - Router connection closed: Duplicate Route
There are quite a few more repetitive log lines in mender-nats-1:
kubectl logs mender-nats-1
Defaulted container "nats" out of: nats, reloader, metrics
[7] 2023/09/07 12:20:16.184700 [INF] Starting nats-server
[7] 2023/09/07 12:20:16.184723 [INF] Version: 2.7.4
[7] 2023/09/07 12:20:16.184725 [INF] Git: [a86b84a]
[7] 2023/09/07 12:20:16.184726 [INF] Name: mender-nats-1
[7] 2023/09/07 12:20:16.184729 [INF] Node: 0asvfPsf
[7] 2023/09/07 12:20:16.184730 [INF] ID: NBSFD63ED3ULYUJCSOREWW6X4LVYDV6SF4HQRNT2GN63UF4RMWQKQ2H5
[7] 2023/09/07 12:20:16.184741 [INF] Using configuration file: /etc/nats-config/nats.conf
[7] 2023/09/07 12:20:16.185056 [INF] Starting http monitor on 0.0.0.0:8222
[7] 2023/09/07 12:20:16.185087 [INF] Starting JetStream
[7] 2023/09/07 12:20:16.185527 [INF] _ ___ _____ ___ _____ ___ ___ _ __ __
[7] 2023/09/07 12:20:16.185533 [INF] _ | | __|_ _/ __|_ _| _ \ __| /_\ | \/ |
[7] 2023/09/07 12:20:16.185534 [INF] | || | _| | | \__ \ | | | / _| / _ \| |\/| |
[7] 2023/09/07 12:20:16.185535 [INF] \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_| |_|
[7] 2023/09/07 12:20:16.185536 [INF]
[7] 2023/09/07 12:20:16.185537 [INF] https://docs.nats.io/jetstream
[7] 2023/09/07 12:20:16.185538 [INF]
[7] 2023/09/07 12:20:16.185539 [INF] ---------------- JETSTREAM ----------------
[7] 2023/09/07 12:20:16.185543 [INF] Max Memory: 1.00 GB
[7] 2023/09/07 12:20:16.185545 [INF] Max Storage: 2.00 GB
[7] 2023/09/07 12:20:16.185546 [INF] Store Directory: "/data/jetstream"
[7] 2023/09/07 12:20:16.185547 [INF] -------------------------------------------
[7] 2023/09/07 12:20:16.185657 [INF] Starting JetStream cluster
[7] 2023/09/07 12:20:16.185662 [INF] Creating JetStream metadata controller
[7] 2023/09/07 12:20:16.185959 [INF] JetStream cluster bootstrapping
[7] 2023/09/07 12:20:16.186125 [INF] Listening for client connections on 0.0.0.0:4222
[7] 2023/09/07 12:20:16.186238 [INF] Server is ready
[7] 2023/09/07 12:20:16.186252 [INF] Cluster name is nats
[7] 2023/09/07 12:20:16.186267 [INF] Listening for route connections on 0.0.0.0:6222
[7] 2023/09/07 12:20:26.200380 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-0.mender-nats.default.svc.cluster.local": lookup mender-nats-0.mender-nats.default.svc.cluster.local on 10.0.0.10:53: read udp 10.224.1.77:38809->10.0.0.10:53: i/o timeout
[7] 2023/09/07 12:20:27.204355 [INF] 10.224.0.44:6222 - rid:6 - Route connection created
[7] 2023/09/07 12:20:28.261700 [INF] Self is new JetStream cluster metadata leader
[7] 2023/09/07 12:20:31.198333 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-2.mender-nats.default.svc.cluster.local": lookup mender-nats-2.mender-nats.default.svc.cluster.local on 10.0.0.10:53: read udp 10.224.1.77:53470->10.0.0.10:53: i/o timeout
[7] 2023/09/07 12:20:33.411174 [INF] JetStream cluster new stream leader for '$G > WORKFLOWS'
[7] 2023/09/07 12:20:34.348084 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/07 12:20:35.763537 [INF] JetStream cluster no metadata leader
[7] 2023/09/07 12:20:37.943774 [INF] JetStream cluster new metadata leader: mender-nats-0/nats
[7] 2023/09/07 12:20:45.236143 [INF] 10.224.0.44:44430 - rid:12 - Route connection created
[7] 2023/09/07 12:20:45.236326 [INF] 10.224.0.44:44430 - rid:12 - Router connection closed: Duplicate Route
[7] 2023/09/07 12:20:45.327794 [INF] 10.224.1.90:6222 - rid:13 - Route connection created
[7] 2023/09/07 12:20:46.687035 [INF] 10.224.1.90:49836 - rid:14 - Route connection created
[7] 2023/09/07 12:20:46.687291 [INF] 10.224.1.90:49836 - rid:14 - Router connection closed: Duplicate Route
[7] 2023/09/07 12:20:59.159334 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/09/07 12:20:59.353410 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/07 12:21:55.447968 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/07 12:23:32.435883 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
(similar log lines keep repeating up to the current time)
[7] 2023/09/08 09:05:23.433716 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/09/08 09:08:59.056224 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/08 09:10:38.434511 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[7] 2023/09/08 09:14:17.158217 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[7] 2023/09/08 09:15:45.340557 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
kubectl logs mender-nats-2
Defaulted container "nats" out of: nats, reloader, metrics
[7] 2023/09/07 12:20:15.460872 [INF] Starting nats-server
[7] 2023/09/07 12:20:15.460898 [INF] Version: 2.7.4
[7] 2023/09/07 12:20:15.460900 [INF] Git: [a86b84a]
[7] 2023/09/07 12:20:15.460902 [INF] Name: mender-nats-2
[7] 2023/09/07 12:20:15.460905 [INF] Node: HEE3lf4A
[7] 2023/09/07 12:20:15.460907 [INF] ID: NCTDYSTK7HDID5USQKKX5QH7CN3RIV4CMOGDXGZ5FDY23CDKYAD3T5NC
[7] 2023/09/07 12:20:15.460920 [INF] Using configuration file: /etc/nats-config/nats.conf
[7] 2023/09/07 12:20:15.461332 [INF] Starting http monitor on 0.0.0.0:8222
[7] 2023/09/07 12:20:15.461373 [INF] Starting JetStream
[7] 2023/09/07 12:20:15.461850 [INF] _ ___ _____ ___ _____ ___ ___ _ __ __
[7] 2023/09/07 12:20:15.461857 [INF] _ | | __|_ _/ __|_ _| _ \ __| /_\ | \/ |
[7] 2023/09/07 12:20:15.461858 [INF] | || | _| | | \__ \ | | | / _| / _ \| |\/| |
[7] 2023/09/07 12:20:15.461859 [INF] \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_| |_|
[7] 2023/09/07 12:20:15.461860 [INF]
[7] 2023/09/07 12:20:15.461861 [INF] https://docs.nats.io/jetstream
[7] 2023/09/07 12:20:15.461862 [INF]
[7] 2023/09/07 12:20:15.461863 [INF] ---------------- JETSTREAM ----------------
[7] 2023/09/07 12:20:15.461868 [INF] Max Memory: 1.00 GB
[7] 2023/09/07 12:20:15.461869 [INF] Max Storage: 2.00 GB
[7] 2023/09/07 12:20:15.461870 [INF] Store Directory: "/data/jetstream"
[7] 2023/09/07 12:20:15.461871 [INF] -------------------------------------------
[7] 2023/09/07 12:20:15.462009 [INF] Starting JetStream cluster
[7] 2023/09/07 12:20:15.462015 [INF] Creating JetStream metadata controller
[7] 2023/09/07 12:20:15.462322 [INF] JetStream cluster bootstrapping
[7] 2023/09/07 12:20:15.462559 [INF] Listening for client connections on 0.0.0.0:4222
[7] 2023/09/07 12:20:15.462841 [INF] Server is ready
[7] 2023/09/07 12:20:15.462847 [INF] Cluster name is nats
[7] 2023/09/07 12:20:15.462860 [INF] Listening for route connections on 0.0.0.0:6222
[7] 2023/09/07 12:20:25.475486 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-0.mender-nats.default.svc.cluster.local": lookup mender-nats-0.mender-nats.default.svc.cluster.local on 10.0.0.10:53: read udp 10.224.1.90:48005->10.0.0.10:53: i/o timeout
[7] 2023/09/07 12:20:25.477996 [ERR] Error trying to connect to route (attempt 1): lookup for host "mender-nats-1.mender-nats.default.svc.cluster.local": lookup mender-nats-1.mender-nats.default.svc.cluster.local on 10.0.0.10:53: read udp 10.224.1.90:41706->10.0.0.10:53: i/o timeout
[7] 2023/09/07 12:20:26.482236 [INF] 10.224.0.44:6222 - rid:6 - Route connection created
[7] 2023/09/07 12:20:35.535862 [INF] JetStream cluster no metadata leader
[7] 2023/09/07 12:20:37.943606 [INF] JetStream cluster new metadata leader: mender-nats-0/nats
[7] 2023/09/07 12:20:45.327818 [INF] 10.224.1.77:60598 - rid:10 - Route connection created
[7] 2023/09/07 12:20:46.219902 [INF] 10.224.0.44:40360 - rid:11 - Route connection created
[7] 2023/09/07 12:20:46.220122 [INF] 10.224.0.44:40360 - rid:11 - Router connection closed: Duplicate Route
[7] 2023/09/07 12:20:46.687023 [INF] 10.224.1.77:6222 - rid:12 - Route connection created
[7] 2023/09/07 12:20:46.687245 [INF] 10.224.1.77:6222 - rid:12 - Router connection closed: Duplicate Route
robgio
September 8, 2023, 9:28am
11
This all looks good to me. I already checked in a test environment (EKS) and it works properly there. Can you please give me the output of
kubectl get deploy -l app.kubernetes.io/component=workflows -o yaml
I’m going to compare it with my env.
Please double check that no secrets are exposed before sharing.
alfrunes
Hi @oldev,
There is an issue with the workflows automigrate behavior. Could you try disabling workflows.automigrate and create_artifact_worker.automigrate in the Helm values?
oldev
September 8, 2023, 9:36am
13
Okay,
I have replaced all secrets with "redacted":
workflows-deployment.yml (9.9 KB)
I see that there are a lot of env vars that reference the public domain. Should I continue with deploying the Ingress?
oldev
September 8, 2023, 9:39am
14
alfrunes: create_artifact_worker
Okay, I added these overrides to mender-3.6.2.yml:
workflows:
  automigrate: false
create_artifact_worker:
  automigrate: false
Should I run the helm upgrade with or without hooks?
helm upgrade mender mender/mender -f mender-3.6.2.yml
helm upgrade --no-hooks mender mender/mender -f mender-3.6.2.yml
You can safely run the upgrade with hooks. If they've been run before, the hooks are simply no-ops.
robgio
September 8, 2023, 9:46am
16
I see that there are a lot of env vars that reference the public domain. Should I continue with deploying the Ingress?
Yes, it's required for exposing the Mender server externally.
oldev
September 8, 2023, 9:47am
17
Okay, it seems like they did not run successfully yet. The helm upgrade command does not return, and the mender-db-data-migration-* pods are in Error state:
mender-db-data-migration-594jc 0/8 Error 0 43s
mender-db-data-migration-622bl 0/8 Error 0 103s
mender-db-data-migration-gclw8 0/8 Error 0 73s
mender-db-data-migration-vngxt 0/8 Error 0 92s
P.S.: helm upgrade returned with:
Error: UPGRADE FAILED: pre-upgrade hooks failed: 1 error occurred:
* timed out waiting for the condition
Job logs show:
kubectl logs mender-db-data-migration-tk2pn
Defaulted container "deployments-migration" out of: deployments-migration, device-auth-migration, deviceconfig-migration, deviceconnect-migration, inventory-migration, useradm-migration, workflows-server-migration, iot-manager-migration
time="2023-09-08T09:46:41Z" level=warning msg="'presign.secret' not configured. Generating a random secret." file=config.go func=config.Setup line=238
failed to connect to db: Error reaching mongo server: connection() error occurred during connection handshake: auth error: unable to authenticate using mechanism "SCRAM-SHA-256": (AuthenticationFailed) Authentication failed.
robgio
September 8, 2023, 10:07am
18
That seems to be another issue. To debug it further, could you please check whether your mongodb-common secret is correct?
kubectl get secret mongodb-common -o jsonpath='{.data.MONGO}' | base64 -d
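If the value looks right, one way to verify that the credentials actually work is to ping the database with the same connection string. This is only a rough sketch; it assumes the Bitnami MongoDB image (swap mongosh for the legacy mongo shell if your image only ships that):
MONGO_URL=$(kubectl get secret mongodb-common -o jsonpath='{.data.MONGO}' | base64 -d)
kubectl exec -it mender-mongodb-0 -- mongosh "$MONGO_URL" --eval 'db.adminCommand({ ping: 1 })'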
The db-data-migration job uses a temporary secret that is created for the migration job and destroyed once it completes successfully. You can probably still check it (please don't share it, just check whether it's right):
kubectl get secret mongodb-common-prerelease -o jsonpath='{.data.MONGO}' | base64 -d
Switching back to the NATS issue: the actual logs for the migration job container can be viewed with:
kubectl logs mender-db-data-migration-* -c workflows-server-migration
Can you please share these logs?
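As a side note, if it's easier, something like the following should dump every container of the migration pod at once (assuming the hook job is named mender-db-data-migration):
kubectl logs job/mender-db-data-migration --all-containers=true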
oldev
September 8, 2023, 10:39am
19
Alright,
For context: I did a k3s deployment with MinIO two days ago and hit the same unable to authenticate using mechanism "SCRAM-SHA-256" error, so it's reproducible (at least with my configs).
I was on my lunch break, and by the time I got back both the previous error pods and the secret were already gone.
Can you clarify what you mean by "a temporary secret"? Does the secret delete itself, or is the password overwritten?
The secret is actually still present, but it has a strange URL. This is the actual secret, NOT redacted:
kubectl get secret mongodb-common-prerelease -o jsonpath='{.data.MONGO}' | base64 -d
mongodb://root:<rootPW>@mender-mongodb-headless
I'll trigger a new upgrade and upload the logs.
robgio
September 8, 2023, 10:43am
20
The temporary secret is just a Helm hook resource, needed for the migration-job hook. Does it have the same content as the mongodb-common secret?
The strange thing is that if the MongoDB auth were broken, all your services would be in CrashLoopBackOff, but that doesn't seem to be the case…