Kubernetes installation fails

I followed the instructions at Production installation with Kubernetes | Mender documentation to install a k3s cluster with a Mender Open Source production server.

MongoDB:

Using the command
helm upgrade --install mongodb bitnami/mongodb --version 10.21.1 -f mongodb.yml
quits with an error
Error: failed to download "bitnami/mongodb" at version "10.21.1"
If I remove the version flag --version 10.21.1, the installation works.
Is a specific version required?

Mender Server:

In the last step “Create the admin user” I get the following error:
error: unable to upgrade connection: container not found ("useradm")

CrashLoopBackOff

Many pods show CrashLoopBackOff status:

root@deployment:/var/lib/rancher/k3s/server/manifests# kubectl get pod -o wide
NAME                                       READY   STATUS             RESTARTS       AGE    IP           NODE         NOMINATED NODE   READINESS GATES
cert-manager-cainjector-5bb9bfbb5c-nz6l7   1/1     Running            0              109m   10.42.0.10   deployment   <none>           <none>
cert-manager-798f8bb594-pstp5              1/1     Running            0              109m   10.42.0.9    deployment   <none>           <none>
cert-manager-webhook-bf48877d4-v2gzn       1/1     Running            0              109m   10.42.0.11   deployment   <none>           <none>
mongodb-arbiter-0                          1/1     Running            0              103m   10.42.0.12   deployment   <none>           <none>
mongodb-0                                  1/1     Running            0              103m   10.42.0.14   deployment   <none>           <none>
mongodb-1                                  1/1     Running            0              102m   10.42.0.16   deployment   <none>           <none>
nats-box-68dd458c5d-5nd6b                  1/1     Running            0              101m   10.42.0.17   deployment   <none>           <none>
nats-0                                     3/3     Running            0              101m   10.42.0.20   deployment   <none>           <none>
nats-1                                     3/3     Running            0              101m   10.42.0.21   deployment   <none>           <none>
minio-operator-fc8bbbc9b-wtk9m             1/1     Running            0              98m    10.42.0.23   deployment   <none>           <none>
minio-operator-console-7d5db9fdd4-2lxrz    1/1     Running            0              98m    10.42.0.22   deployment   <none>           <none>
minio-ss-0-1                               1/1     Running            0              94m    10.42.0.30   deployment   <none>           <none>
minio-ss-0-0                               1/1     Running            0              94m    10.42.0.31   deployment   <none>           <none>
api-gateway-7bcb79c8d4-t5lh8               1/1     Running            0              83m    10.42.0.32   deployment   <none>           <none>
gui-6b6c9dcc9-vrbft                        1/1     Running            0              83m    10.42.0.41   deployment   <none>           <none>
deployments-8f97bb89b-xrcqb                0/1     CrashLoopBackOff   21 (93s ago)   83m    10.42.0.36   deployment   <none>           <none>
deviceconnect-85854c47fb-n6bbs             0/1     CrashLoopBackOff   21 (88s ago)   83m    10.42.0.37   deployment   <none>           <none>
deviceconfig-655bc99fc-lxg94               0/1     CrashLoopBackOff   21 (80s ago)   83m    10.42.0.38   deployment   <none>           <none>
workflows-worker-864756b6d4-xlxks          0/1     CrashLoopBackOff   21 (85s ago)   83m    10.42.0.33   deployment   <none>           <none>
create-artifact-worker-5f4dd85f4d-g76q8    0/1     CrashLoopBackOff   21 (82s ago)   83m    10.42.0.35   deployment   <none>           <none>
workflows-server-5668fc8776-9blx7          0/1     CrashLoopBackOff   21 (70s ago)   83m    10.42.0.39   deployment   <none>           <none>
iot-manager-c658c9869-flltj                0/1     CrashLoopBackOff   21 (59s ago)   83m    10.42.0.42   deployment   <none>           <none>
useradm-78fc76586f-5b8x2                   0/1     CrashLoopBackOff   21 (35s ago)   83m    10.42.0.40   deployment   <none>           <none>
device-auth-59d8744b7b-txlcj               0/1     CrashLoopBackOff   21 (30s ago)   83m    10.42.0.43   deployment   <none>           <none>
inventory-5589f4df8f-zzps6                 0/1     CrashLoopBackOff   21 (23s ago)   83m    10.42.0.34   deployment   <none>           <none>

What can I do to find the mistake? Are any additional steps needed?

The chart version 10.21.1 for bitnami/mongodb no longer exists. Could you replace the --version argument with 11.2.0? Remember to update the Mender chart values accordingly. Note that the snippet deploying the Mender helm chart depends on the $MONGODB_ROOT_PASSWORD env variable from the MongoDB deployment step.
In the meantime I will make sure to update the documentation. All the services that are in CrashLoopBackOff state depend on MongoDB.
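
For completeness, the updated MongoDB step would look something like the following sketch (the secret and key names assume the Helm release is called `mongodb` and the bitnami chart defaults, as in the documentation):

```shell
# Install MongoDB with the replacement chart version
helm upgrade --install mongodb bitnami/mongodb --version 11.2.0 -f mongodb.yml

# Export the generated root password so the Mender chart deployment step
# can reference it (secret/key names follow the bitnami chart defaults)
export MONGODB_ROOT_PASSWORD=$(kubectl get secret mongodb \
  -o jsonpath="{.data.mongodb-root-password}" | base64 -d)
```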

Thank you for the fast reply.

I deleted/uninstalled my previous k3s installation and worked through all the installation steps with the new MongoDB chart version.

All installation steps succeed, but one pod shows the status CrashLoopBackOff:

root@deployment:/var/lib/rancher/k3s/server/manifests# kubectl get pod -o wide
NAME                                       READY   STATUS             RESTARTS        AGE   IP           NODE         NOMINATED NODE   READINESS GATES
cert-manager-cainjector-5bb9bfbb5c-zgk25   1/1     Running            0               29m   10.42.0.7    deployment   <none>           <none>
cert-manager-798f8bb594-cp28k              1/1     Running            0               29m   10.42.0.9    deployment   <none>           <none>
cert-manager-webhook-bf48877d4-k4hlp       1/1     Running            0               29m   10.42.0.8    deployment   <none>           <none>
mongodb-arbiter-0                          1/1     Running            0               27m   10.42.0.13   deployment   <none>           <none>
mongodb-0                                  1/1     Running            0               27m   10.42.0.14   deployment   <none>           <none>
mongodb-1                                  1/1     Running            0               27m   10.42.0.16   deployment   <none>           <none>
nats-box-5448cbc897-vnk74                  1/1     Running            0               26m   10.42.0.17   deployment   <none>           <none>
nats-0                                     3/3     Running            0               26m   10.42.0.19   deployment   <none>           <none>
nats-1                                     3/3     Running            0               26m   10.42.0.21   deployment   <none>           <none>
minio-operator-console-7d5db9fdd4-brp5z    1/1     Running            0               25m   10.42.0.23   deployment   <none>           <none>
minio-operator-fc8bbbc9b-5fxtm             1/1     Running            0               25m   10.42.0.22   deployment   <none>           <none>
minio-ss-0-0                               1/1     Running            0               24m   10.42.0.28   deployment   <none>           <none>
minio-ss-0-1                               1/1     Running            0               24m   10.42.0.29   deployment   <none>           <none>
workflows-worker-864756b6d4-5khfc          1/1     Running            0               14m   10.42.0.31   deployment   <none>           <none>
gui-6b6c9dcc9-8nv4v                        1/1     Running            0               14m   10.42.0.36   deployment   <none>           <none>
iot-manager-c658c9869-5hhkt                1/1     Running            0               14m   10.42.0.32   deployment   <none>           <none>
create-artifact-worker-5f4dd85f4d-277jv    1/1     Running            0               14m   10.42.0.35   deployment   <none>           <none>
inventory-5589f4df8f-n9shm                 1/1     Running            0               14m   10.42.0.37   deployment   <none>           <none>
useradm-78fc76586f-wwsp4                   1/1     Running            0               14m   10.42.0.41   deployment   <none>           <none>
api-gateway-7bcb79c8d4-g7qqv               1/1     Running            0               14m   10.42.0.33   deployment   <none>           <none>
workflows-server-5668fc8776-66gtw          1/1     Running            0               14m   10.42.0.40   deployment   <none>           <none>
deviceconnect-85854c47fb-d72hf             1/1     Running            0               14m   10.42.0.39   deployment   <none>           <none>
deviceconfig-655bc99fc-9s5mv               1/1     Running            0               14m   10.42.0.38   deployment   <none>           <none>
device-auth-59d8744b7b-5nvth               1/1     Running            0               14m   10.42.0.34   deployment   <none>           <none>
deployments-8f97bb89b-8ktgj                0/1     CrashLoopBackOff   7 (3m12s ago)   14m   10.42.0.30   deployment   <none>           <none>

This looks like the minio credentials in the Mender helm chart values do not match the generated access key ($MINIO_ACCESS_KEY and $MINIO_SECRET_KEY) from this step.

Tip: you don’t have to redeploy everything from scratch to update the values, you can run:

helm get values mender > values.yml
vim values.yml # Update values in section global.s3.*
helm upgrade mender mender/mender -f values.yml

If this doesn’t help, could you share the logs from the deployments-* pod?

2022-07-14T14:37:01.525790194Z stderr F RequestError: send request failed
2022-07-14T14:37:01.525797681Z stderr F caused by: Put "https://artifacts.interelectronix.com/mender-artifact-storage": x509: certificate is valid for d2bc123fc7ac1ea25c3581b93e5a12e6.cf3bcb6853f16c11ffa66b5e1d33adda.traefik.default, not artifacts.interelectronix.com

I’m facing the same issue. However, my initial deployment was with the recommended version, 11.2.0. I tried changing to the latest chart version, which is 13.1.2, but I’m still seeing the same containers in CrashLoopBackOff status.
Any suggestions?
Thanks,
stu

@walterp Sorry, I completely missed your last reply. Your issue seems to be related to your certificate setup. The ingress does not have a certificate configured for the artifacts (minio) hostname, so it is serving the fallback certificate from the IngressController. Make sure the TLS configuration for the minio ingress references the correct certificate secret.
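
A couple of commands can help confirm what is actually being served and what the secret covers (the hostname and secret name below are placeholders for your setup):

```shell
# Show the certificate the ingress presents for the artifacts hostname
openssl s_client -connect artifacts.example.com:443 \
  -servername artifacts.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -ext subjectAltName

# Show which hostnames the TLS secret referenced by the ingress covers
kubectl get secret <tls-secret-name> -o 'jsonpath={.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -ext subjectAltName
```

If the two outputs differ, the ingress is falling back to its default certificate, which matches the x509 error you posted.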

@stumarr Which pods are in CrashLoopBackOff state? Can you inspect the logs from the crashing pods and see if you find some hints on why they are failing?
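
In case it helps, the usual way to dig into a CrashLoopBackOff pod looks like this (substitute one of your failing pod names):

```shell
# Recent events and the last termination reason
kubectl describe pod <pod-name>

# Logs from the previous run of the container; for CrashLoopBackOff these
# are usually more informative than the current (restarting) run
kubectl logs <pod-name> --previous
```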

Thanks for the response. These are the pods that are failing, same as walterp initially posted:

default device-auth-7d899c7fc5-ft48m 0/1 CrashLoopBackOff 31 (5m2s ago) 139m
default iot-manager-7b98c9dd88-r55dt 0/1 CrashLoopBackOff 31 (4m59s ago) 139m
default inventory-57f899d454-mz8kc 0/1 CrashLoopBackOff 32 (4m43s ago) 139m
default deployments-6c8fd4f759-hg8m4 0/1 CrashLoopBackOff 218 (2m19s ago) 17h
default mongodb-arbiter-0 0/1 CrashLoopBackOff 18 (117s ago) 141m
default workflows-worker-78c9dd6454-4hbtq 0/1 CrashLoopBackOff 31 (46s ago) 139m
default create-artifact-worker-79d6df5997-fdqp6 0/1 CrashLoopBackOff 31 (42s ago) 139m
default deviceconnect-6d565ffbbd-88q5w 0/1 CrashLoopBackOff 32 (34s ago) 139m
default workflows-server-6f88b49d49-csdpp 0/1 CrashLoopBackOff 32 (15s ago) 139m
default useradm-7d85cb66b7-ts2bs 0/1 Running 404 (11s ago) 17h
default deviceconfig-7dfc7445bb-cfktq 0/1 CrashLoopBackOff 32 (6s ago) 139m

Also, the mongodb chart installs successfully, but when I try to connect to the db using the mongo shell I get this error:

I have no name!@mongodb-client:/$ mongosh admin --host "mongodb" --authenticationDatabase admin -u root -p mongorootpassword123
Current Mongosh Log ID: 634d413e22a6354e8748107d
Connecting to: mongodb://mongodb:27017/admin?directConnection=true&appName=mongosh+1.3.1
MongoNetworkError: getaddrinfo ENOTFOUND mongodb

The Mender pods are crashing because the mongodb instance is not reachable. I can also see that the mongodb arbiter is crashing, could you inspect why it is crashing?
The hostname in the mongodb URL is incorrect; by default the helm chart creates a headless service which is reachable at the FQDN mongodb-headless.default.svc.cluster.local. If you haven’t changed the default port name of the service (“mongodb”), you can use the DNS service record (SRV) that Kubernetes creates for headless services and the more compact connection string: mongodb+srv://root:mongorootpassword123@mongodb-headless.default.svc.cluster.local?tls=false
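
To double-check that the SRV record the `mongodb+srv://` scheme relies on actually exists, you can resolve it from a throwaway pod inside the cluster (the dnsutils image here is just one example of a debug container):

```shell
# Headless services get a DNS SRV record per named port:
#   _<port-name>._tcp.<service>.<namespace>.svc.cluster.local
kubectl run -it --rm dnscheck --restart=Never \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.7 -- \
  nslookup -type=SRV _mongodb._tcp.mongodb-headless.default.svc.cluster.local
```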

So I created a pod to test the network connectivity from the pods to mongodb. Looks like we’re able to connect.

kubectl exec -it busybox -- /bin/sh
/ # telnet 10.42.0.39 27017
Connected to 10.42.0.39

telnet mongodb-1.mongodb-headless.default.svc.cluster.local 27017
Connected to mongodb-1.mongodb-headless.default.svc.cluster.local

Are you able to connect to the MongoDB cluster? Note that if you redeployed the MongoDB helm chart following the instructions, you might have changed the root password of the cluster, which you then need to enter into the Mender helm values. To see the current root password, you can inspect the secret generated by the mongodb helm chart:

MGO_ROOT_PWD=$(kubectl get secret -l 'app.kubernetes.io/name=mongodb' -o 'jsonpath={.items[].data.mongodb-root-password}' | base64 -d)
echo "Connection string: mongodb+srv://root:${MGO_ROOT_PWD}@mongodb-headless.default.svc.cluster.local?tls=false"
# Enter the new connection string into mender Helm values at `global.mongodb.URL`
# Then run `helm upgrade mender -f <path/to/mender.yaml>`
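
Just to illustrate the decoding step in isolation, since Kubernetes stores secret data base64-encoded (the encoded value here is made up for the example):

```shell
# Decoding a secret value exactly as kubectl returns it
ENCODED="bW9uZ29yb290cGFzc3dvcmQxMjM="
printf '%s' "$ENCODED" | base64 -d   # prints: mongorootpassword123
```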

I think that’s the issue that I’m trying to highlight here. I am not able to connect to MongoDB. If I can’t connect to it manually, I’m sure the Mender components will also be unable to connect. However, it doesn’t seem to be an issue with the credentials because when I hard-code the root password it times out when I try to connect to MongoDB.

@alfrunes we were able to resolve our issue. It turns out that I made the mistake of creating my own yaml files instead of using, for example, the cat >mender-ingress.yml <<EOF method. This was causing an issue with the secrets. I appreciate the help.


One last issue: this step isn’t working for me. I enter the credentials and it just refreshes the login screen but never logs in. Is there a pod that I need to restart?

Create the Admin User
USERADM_POD=$(kubectl get pod -l 'app.kubernetes.io/name=useradm' -o name | head -1)
kubectl exec $USERADM_POD -- useradm create-user --username "demo@mender.io" --password "demodemo"
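
One way to rule out the GUI and check the credentials directly is to call the login endpoint of the Mender management API (replace the hostname with your server’s; `-k` skips TLS verification for self-signed setups):

```shell
# A JWT printed on stdout means authentication works and the problem
# is somewhere between the GUI and the api-gateway
curl -sk -X POST -u "demo@mender.io:demodemo" \
  "https://mender.example.com/api/management/v1/useradm/auth/login"
```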