Kubernetes installation fails

I followed the instructions at Production installation with Kubernetes | Mender documentation to set up a k3s cluster running a Mender Open Source production server.

MongoDB:

Running the command
helm upgrade --install mongodb bitnami/mongodb --version 10.21.1 -f mongodb.yml
fails with the error
Error: failed to download "bitnami/mongodb" at version "10.21.1"
If I remove the version flag --version 10.21.1, the installation works.
Is a specific chart version required?
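For reference, helm can list which chart versions the repository actually serves (this assumes the bitnami repo alias is configured as in the docs):

# Refresh the local chart index, then list the available MongoDB chart versions
helm repo update
helm search repo bitnami/mongodb --versions | head -n 10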

Mender Server:

In the last step "Create the admin user" I get the following error:
error: unable to upgrade connection: container not found ("useradm")

CrashLoopBackOff

Many pods show CrashLoopBackOff status:

root@deployment:/var/lib/rancher/k3s/server/manifests# kubectl get pod -o wide
NAME                                       READY   STATUS             RESTARTS       AGE    IP           NODE         NOMINATED NODE   READINESS GATES
cert-manager-cainjector-5bb9bfbb5c-nz6l7   1/1     Running            0              109m   10.42.0.10   deployment   <none>           <none>
cert-manager-798f8bb594-pstp5              1/1     Running            0              109m   10.42.0.9    deployment   <none>           <none>
cert-manager-webhook-bf48877d4-v2gzn       1/1     Running            0              109m   10.42.0.11   deployment   <none>           <none>
mongodb-arbiter-0                          1/1     Running            0              103m   10.42.0.12   deployment   <none>           <none>
mongodb-0                                  1/1     Running            0              103m   10.42.0.14   deployment   <none>           <none>
mongodb-1                                  1/1     Running            0              102m   10.42.0.16   deployment   <none>           <none>
nats-box-68dd458c5d-5nd6b                  1/1     Running            0              101m   10.42.0.17   deployment   <none>           <none>
nats-0                                     3/3     Running            0              101m   10.42.0.20   deployment   <none>           <none>
nats-1                                     3/3     Running            0              101m   10.42.0.21   deployment   <none>           <none>
minio-operator-fc8bbbc9b-wtk9m             1/1     Running            0              98m    10.42.0.23   deployment   <none>           <none>
minio-operator-console-7d5db9fdd4-2lxrz    1/1     Running            0              98m    10.42.0.22   deployment   <none>           <none>
minio-ss-0-1                               1/1     Running            0              94m    10.42.0.30   deployment   <none>           <none>
minio-ss-0-0                               1/1     Running            0              94m    10.42.0.31   deployment   <none>           <none>
api-gateway-7bcb79c8d4-t5lh8               1/1     Running            0              83m    10.42.0.32   deployment   <none>           <none>
gui-6b6c9dcc9-vrbft                        1/1     Running            0              83m    10.42.0.41   deployment   <none>           <none>
deployments-8f97bb89b-xrcqb                0/1     CrashLoopBackOff   21 (93s ago)   83m    10.42.0.36   deployment   <none>           <none>
deviceconnect-85854c47fb-n6bbs             0/1     CrashLoopBackOff   21 (88s ago)   83m    10.42.0.37   deployment   <none>           <none>
deviceconfig-655bc99fc-lxg94               0/1     CrashLoopBackOff   21 (80s ago)   83m    10.42.0.38   deployment   <none>           <none>
workflows-worker-864756b6d4-xlxks          0/1     CrashLoopBackOff   21 (85s ago)   83m    10.42.0.33   deployment   <none>           <none>
create-artifact-worker-5f4dd85f4d-g76q8    0/1     CrashLoopBackOff   21 (82s ago)   83m    10.42.0.35   deployment   <none>           <none>
workflows-server-5668fc8776-9blx7          0/1     CrashLoopBackOff   21 (70s ago)   83m    10.42.0.39   deployment   <none>           <none>
iot-manager-c658c9869-flltj                0/1     CrashLoopBackOff   21 (59s ago)   83m    10.42.0.42   deployment   <none>           <none>
useradm-78fc76586f-5b8x2                   0/1     CrashLoopBackOff   21 (35s ago)   83m    10.42.0.40   deployment   <none>           <none>
device-auth-59d8744b7b-txlcj               0/1     CrashLoopBackOff   21 (30s ago)   83m    10.42.0.43   deployment   <none>           <none>
inventory-5589f4df8f-zzps6                 0/1     CrashLoopBackOff   21 (23s ago)   83m    10.42.0.34   deployment   <none>           <none>

What can I do to find the mistake? Are any additional steps needed?

The chart version 10.21.1 for bitnami/mongodb no longer exists. Could you replace the --version argument with 11.2.0? Remember to update the Mender chart values as well: the snippet deploying the Mender helm chart depends on the $MONGODB_ROOT_PASSWORD environment variable from the MongoDB deployment step.
In the meantime I will make sure the documentation gets updated. All the services in CrashLoopBackOff state depend on MongoDB.
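Concretely, the corrected sequence could look like this (the secret name mongodb and the key mongodb-root-password are the Bitnami chart defaults; adjust them if you used a different release name):

# Install the MongoDB chart at a version that still exists in the repo
helm upgrade --install mongodb bitnami/mongodb --version 11.2.0 -f mongodb.yml

# Export the generated root password so the Mender deployment snippet can reference it
export MONGODB_ROOT_PASSWORD=$(kubectl get secret mongodb \
  -o 'jsonpath={.data.mongodb-root-password}' | base64 -d)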

Thank you for the fast reply.

I deleted/uninstalled my previous k3s installation and worked through all the installation steps again with the new MongoDB chart version.

All installation steps now succeed, but one pod shows CrashLoopBackOff status:

root@deployment:/var/lib/rancher/k3s/server/manifests# kubectl get pod -o wide
NAME                                       READY   STATUS             RESTARTS        AGE   IP           NODE         NOMINATED NODE   READINESS GATES
cert-manager-cainjector-5bb9bfbb5c-zgk25   1/1     Running            0               29m   10.42.0.7    deployment   <none>           <none>
cert-manager-798f8bb594-cp28k              1/1     Running            0               29m   10.42.0.9    deployment   <none>           <none>
cert-manager-webhook-bf48877d4-k4hlp       1/1     Running            0               29m   10.42.0.8    deployment   <none>           <none>
mongodb-arbiter-0                          1/1     Running            0               27m   10.42.0.13   deployment   <none>           <none>
mongodb-0                                  1/1     Running            0               27m   10.42.0.14   deployment   <none>           <none>
mongodb-1                                  1/1     Running            0               27m   10.42.0.16   deployment   <none>           <none>
nats-box-5448cbc897-vnk74                  1/1     Running            0               26m   10.42.0.17   deployment   <none>           <none>
nats-0                                     3/3     Running            0               26m   10.42.0.19   deployment   <none>           <none>
nats-1                                     3/3     Running            0               26m   10.42.0.21   deployment   <none>           <none>
minio-operator-console-7d5db9fdd4-brp5z    1/1     Running            0               25m   10.42.0.23   deployment   <none>           <none>
minio-operator-fc8bbbc9b-5fxtm             1/1     Running            0               25m   10.42.0.22   deployment   <none>           <none>
minio-ss-0-0                               1/1     Running            0               24m   10.42.0.28   deployment   <none>           <none>
minio-ss-0-1                               1/1     Running            0               24m   10.42.0.29   deployment   <none>           <none>
workflows-worker-864756b6d4-5khfc          1/1     Running            0               14m   10.42.0.31   deployment   <none>           <none>
gui-6b6c9dcc9-8nv4v                        1/1     Running            0               14m   10.42.0.36   deployment   <none>           <none>
iot-manager-c658c9869-5hhkt                1/1     Running            0               14m   10.42.0.32   deployment   <none>           <none>
create-artifact-worker-5f4dd85f4d-277jv    1/1     Running            0               14m   10.42.0.35   deployment   <none>           <none>
inventory-5589f4df8f-n9shm                 1/1     Running            0               14m   10.42.0.37   deployment   <none>           <none>
useradm-78fc76586f-wwsp4                   1/1     Running            0               14m   10.42.0.41   deployment   <none>           <none>
api-gateway-7bcb79c8d4-g7qqv               1/1     Running            0               14m   10.42.0.33   deployment   <none>           <none>
workflows-server-5668fc8776-66gtw          1/1     Running            0               14m   10.42.0.40   deployment   <none>           <none>
deviceconnect-85854c47fb-d72hf             1/1     Running            0               14m   10.42.0.39   deployment   <none>           <none>
deviceconfig-655bc99fc-9s5mv               1/1     Running            0               14m   10.42.0.38   deployment   <none>           <none>
device-auth-59d8744b7b-5nvth               1/1     Running            0               14m   10.42.0.34   deployment   <none>           <none>
deployments-8f97bb89b-8ktgj                0/1     CrashLoopBackOff   7 (3m12s ago)   14m   10.42.0.30   deployment   <none>           <none>

This looks like the minio credentials in the Mender helm chart values do not match the generated access keys ($MINIO_ACCESS_KEY and $MINIO_SECRET_KEY) from this step.

Tip: you don't have to redeploy everything from scratch to update the values; you can run:

helm get values mender > values.yml
vim values.yml   # update the values in the global.s3.* section
helm upgrade mender mender/mender -f values.yml   # assumes the chart repo alias "mender" from the install step
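To double-check what the release currently uses, you can also grep the rendered values (the key names under global.s3 follow the Mender chart's values layout):

# Show the s3 section of the currently deployed Mender values
helm get values mender | grep -A 5 's3:'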

If this doesn't help, could you share the logs from the deployments-* pod?

2022-07-14T14:37:01.525790194Z stderr F RequestError: send request failed
2022-07-14T14:37:01.525797681Z stderr F caused by: Put "https://artifacts.interelectronix.com/mender-artifact-storage": x509: certificate is valid for d2bc123fc7ac1ea25c3581b93e5a12e6.cf3bcb6853f16c11ffa66b5e1d33adda.traefik.default, not artifacts.interelectronix.com

I'm facing the same issue. However, my initial deployment was with the recommended version, 11.2.0. I tried changing to the latest chart version, which is 13.1.2, but I'm still seeing the same containers in CrashLoopBackOff status.
Any suggestions?
Thanks,
stu

@walterp Sorry, I completely missed your last reply. Your issue seems to be related to your certificate setup. The ingress does not have a certificate configured for the artifacts (minio) hostname, so it is serving the fallback certificate from the IngressController. Make sure the TLS configuration for the minio ingress matches the certificate secret.
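To see which names the served certificate actually covers, you can inspect it from outside the cluster (the hostname below is taken from the error message above):

# Print the subject and SANs of the certificate presented for the artifacts hostname
openssl s_client -connect artifacts.interelectronix.com:443 \
  -servername artifacts.interelectronix.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -ext subjectAltName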

@stumarr Which pods are in CrashLoopBackOff state? Can you inspect the logs from the crashing pods and see if you find some hints on why they are failing?
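For example (substitute one of your crashing pod names for the placeholder):

# Show the pod's events, then the logs from its previous (crashed) run
kubectl describe pod <crashing-pod>
kubectl logs <crashing-pod> --previous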

Thanks for the response. These are the pods that are failing, same as walterp initially posted:

NAMESPACE   NAME                                      READY   STATUS             RESTARTS          AGE
default     device-auth-7d899c7fc5-ft48m              0/1     CrashLoopBackOff   31 (5m2s ago)     139m
default     iot-manager-7b98c9dd88-r55dt              0/1     CrashLoopBackOff   31 (4m59s ago)    139m
default     inventory-57f899d454-mz8kc                0/1     CrashLoopBackOff   32 (4m43s ago)    139m
default     deployments-6c8fd4f759-hg8m4              0/1     CrashLoopBackOff   218 (2m19s ago)   17h
default     mongodb-arbiter-0                         0/1     CrashLoopBackOff   18 (117s ago)     141m
default     workflows-worker-78c9dd6454-4hbtq         0/1     CrashLoopBackOff   31 (46s ago)      139m
default     create-artifact-worker-79d6df5997-fdqp6   0/1     CrashLoopBackOff   31 (42s ago)      139m
default     deviceconnect-6d565ffbbd-88q5w            0/1     CrashLoopBackOff   32 (34s ago)      139m
default     workflows-server-6f88b49d49-csdpp         0/1     CrashLoopBackOff   32 (15s ago)      139m
default     useradm-7d85cb66b7-ts2bs                  0/1     Running            404 (11s ago)     17h
default     deviceconfig-7dfc7445bb-cfktq             0/1     CrashLoopBackOff   32 (6s ago)       139m

Also, the mongodb chart installs successfully, but when I try to connect to the db using the mongo shell I get this error:

I have no name!@mongodb-client:/$ mongosh admin --host "mongodb" --authenticationDatabase admin -u root -p mongorootpassword123
Current Mongosh Log ID: 634d413e22a6354e8748107d
Connecting to: mongodb://mongodb:27017/admin?directConnection=true&appName=mongosh+1.3.1
MongoNetworkError: getaddrinfo ENOTFOUND mongodb

The Mender pods are crashing because the mongodb instance is not reachable. I can also see that the mongodb arbiter is crashing; could you inspect why?
The hostname in the mongodb URL is incorrect: by default the helm chart creates a headless service, which is reachable via the FQDN mongodb-headless.default.svc.cluster.local. If you haven't changed the default port name for the service ("mongodb"), you can use the DNS service record (SRV) that Kubernetes creates for headless services and the more compact connection string: mongodb+srv://root:mongorootpassword123@mongodb-headless.default.svc.cluster.local?tls=false
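To confirm the DNS record and the connection from inside the cluster, something like this should work (the dnsutils image and the throwaway pod names are just examples; the SRV name follows the _port._protocol.service pattern for the default port name "mongodb"):

# Look up the SRV record kubernetes publishes for the headless service
kubectl run dnsutils --rm -it --restart=Never \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 -- \
  nslookup -type=SRV _mongodb._tcp.mongodb-headless.default.svc.cluster.local

# Then try the FQDN-based connection string with mongosh
kubectl run mongodb-client --rm -it --restart=Never --image=docker.io/bitnami/mongodb -- \
  mongosh "mongodb+srv://root:mongorootpassword123@mongodb-headless.default.svc.cluster.local/admin?tls=false"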

So I created a pod to test the network connectivity from the pods to mongodb. It looks like we're able to connect.

kubectl exec -it busybox -- /bin/sh
/ # telnet 10.42.0.39 27017
Connected to 10.42.0.39

telnet mongodb-1.mongodb-headless.default.svc.cluster.local 27017
Connected to mongodb-1.mongodb-headless.default.svc.cluster.local

Are you able to connect to the MongoDB cluster? Note that if you redeployed the MongoDB helm chart following the instructions, you might have changed the cluster's root password, which you then need to set in the Mender helm values. To see the current root password, you can inspect the secret generated by the mongodb helm chart:

MGO_ROOT_PWD=$(kubectl get secret -l 'app.kubernetes.io/name=mongodb' \
  -o 'jsonpath={.items[0].data.mongodb-root-password}' | base64 -d)
echo "Connection string: mongodb+srv://root:${MGO_ROOT_PWD}@mongodb-headless.default.svc.cluster.local?tls=false"
# Enter the new connection string into the Mender helm values at `global.mongodb.URL`
# Then run `helm upgrade mender mender/mender -f <path/to/mender.yaml>`

I think that's the issue I'm trying to highlight here: I am not able to connect to MongoDB. If I can't connect to it manually, I'm sure the Mender components will also be unable to connect. However, it doesn't seem to be a credentials issue, because even when I hard-code the root password the connection to MongoDB times out.

@alfrunes we were able to resolve our issue. It turns out that I made the mistake of creating my own YAML files instead of using, for example, the cat >mender-ingress.yml <<EOF method. This was causing an issue with the secrets. I appreciate the help.
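For context, the heredoc method matters because an unquoted heredoc expands shell variables into the generated file, whereas a hand-written file keeps the literal placeholders. A minimal illustration with a hypothetical value:

# The variable is substituted at file-creation time
export MINIO_ACCESS_KEY="AKIAEXAMPLE"
cat > example.yml <<EOF
accessKey: ${MINIO_ACCESS_KEY}
EOF
cat example.yml   # prints: accessKey: AKIAEXAMPLE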


One last issue... This step isn't working for me. I enter the credentials and it just refreshes the login screen, but it never logs in. Is there a pod that I need to restart?

Create the Admin User
USERADM_POD=$(kubectl get pod -l 'app.kubernetes.io/name=useradm' -o name | head -1)
kubectl exec $USERADM_POD -- useradm create-user --username "demo@mender.io" --password "demodemo"
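If the user exists but the GUI keeps bouncing back to the login screen, one way to take the GUI out of the picture is to call the login endpoint directly (replace the placeholder hostname with your server's; the path is the standard useradm management API):

# A successful login prints a JWT; an error response points at the gateway or the credentials
curl -sk -X POST -u "demo@mender.io:demodemo" \
  https://<your-mender-hostname>/api/management/v1/useradm/auth/login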