How to debug a Kubernetes installation of Mender?

Hello Everyone

We are trying to deploy Mender on Kubernetes, following the guidelines for 3.2: Production installation with Kubernetes | Mender documentation

Everything works except for 3 Mender containers:

  • workflows-server
  • workflows-worker
  • create-artifact-worker

These containers are stuck in CrashLoopBackOff:
❯ kubectl get pods --namespace application-peripherals
NAME                                      READY   STATUS             RESTARTS          AGE
api-gateway-756685fdc9-vj7rd              1/1     Running            0                 22h
cert-manager-54b9fc686-hbc4x              1/1     Running            0                 22h
cert-manager-cainjector-89487b959-8x9n6   1/1     Running            0                 22h
cert-manager-webhook-85f96c57dd-2nhvm     1/1     Running            0                 22h
create-artifact-worker-6676fd594-z7rqb    0/1     CrashLoopBackOff   247 (70s ago)     21h
deployments-88d4d87-66c2x                 0/1     Running            23 (20h ago)      22h
device-auth-77ffc8688c-nf7lt              0/1     Running            0                 22h
deviceconfig-7ccbfb857d-fk5pc             1/1     Running            0                 22h
deviceconnect-5468dd6c54-qw4mh            1/1     Running            0                 21h
gui-7b6988cb96-xwp9n                      1/1     Running            0                 22h
inventory-7454868b78-dmlm4                1/1     Running            0                 22h
iot-manager-5465779b4-w5zkg               1/1     Running            0                 22h
minio-operator-6c984995c9-lldss           1/1     Running            0                 22h
minio-operator-console-9d9cbbcc8-flbmf    1/1     Running            0                 22h
minio-ss-0-0                              1/1     Running            0                 22h
minio-ss-0-1                              1/1     Running            0                 22h
mongodb-0                                 1/1     Running            0                 22h
mongodb-arbiter-0                         1/1     Running            0                 22h
nats-0                                    3/3     Running            0                 22h
nats-box-67786894bd-hszrk                 1/1     Running            0                 22h
useradm-65db46c846-xjz59                  1/1     Running            0                 22h
workflows-server-db8fd468d-mb8w7          0/1     CrashLoopBackOff   254 (2m14s ago)   21h
workflows-worker-8657585498-7tcr2         0/1     CrashLoopBackOff   247 (71s ago)     21h

When we check the logs, we get the following:

create-artifact-worker-6676fd594-z7rqb

❯ kubectl logs create-artifact-worker-6676fd594-z7rqb --namespace application-peripherals
time="2022-02-02T09:39:11Z" level=info msg="migrating workflows" file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-02-02T09:39:11Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-02-02T09:39:11Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
2022/02/02 09:39:16 context deadline exceeded

workflows-server-db8fd468d-mb8w7

❯ kubectl logs workflows-server-db8fd468d-mb8w7 --namespace application-peripherals
time="2022-02-02T09:38:22Z" level=info msg="migrating workflows" file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-02-02T09:38:22Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-02-02T09:38:22Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
2022/02/02 09:38:27 context deadline exceeded

workflows-worker-8657585498-7tcr2

❯ kubectl logs workflows-worker-8657585498-7tcr2 --namespace application-peripherals
time="2022-02-02T09:39:16Z" level=info msg="migrating workflows" file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-02-02T09:39:16Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-02-02T09:39:16Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
2022/02/02 09:39:21 context deadline exceeded

The ‘context deadline exceeded’ message is probably a generic Go timeout error, which makes the logs ambiguous: they do not say which operation timed out.
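A minimal sketch of how the crashing pods could be inspected further. The migration lines are logged before the timeout, so the failing step is presumably something after the database migration, but that is an assumption. The function names `inspect_pod` and `show_env` and the `KUBECTL`/`NS` variables are made up for illustration, and `show_env` assumes the chart passes connection settings as environment variables on the first container.

```shell
#!/bin/sh
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print the commands
# instead of running them against a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-application-peripherals}"

# Show events, exit codes and probe failures for a pod, followed by
# the logs of the previous (crashed) container instance.
inspect_pod() {
  pod="$1"
  $KUBECTL describe pod "$pod" --namespace "$NS"
  $KUBECTL logs "$pod" --namespace "$NS" --previous
}

# Print the environment of a deployment's first container, to see which
# connection strings (if any) are configured there.
show_env() {
  $KUBECTL get deployment "$1" --namespace "$NS" \
    -o jsonpath='{.spec.template.spec.containers[0].env}'
}
```

For example: `inspect_pod workflows-server-db8fd468d-mb8w7` and `show_env workflows-server`.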


Our deviations from the installation documentation (Production installation with Kubernetes | Mender documentation):

  • We don’t use AWS Kubernetes, we have a bare-metal Kubernetes
    • we use ingress-nginx as a reverse proxy and TLS termination
    • we use kube-flannel for networking
  • We use MinIO for S3 (deployed as explained in the mender documentation)

Questions:

  • What can we do to debug the ‘context deadline exceeded’ errors?
  • The documentation doesn’t mention it: do we have to create the MinIO bucket ourselves, or does Mender take care of this?
  • Is the nats://nats:4222 connection string mentioned in the documentation correct?
    • Don’t we have to use the internal DNS of the nats service?
    • e.g. nats://pod-0.nats.application-peripherals.svc.cluster.local:4222
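To narrow down the NATS question, one option is to check from inside the cluster whether the service name resolves and the port accepts connections. The sketch below assumes the default cluster domain `cluster.local`, a Service called `nats` in our namespace, and that the `busybox` image can be pulled. The names `svc_fqdn`, `probe_service` and `net-debug` are made up for illustration. As far as I understand Kubernetes DNS, the short name `nats` should resolve from pods in the same namespace, and a single StatefulSet pod would be `nats-0.nats.<namespace>.svc.cluster.local` rather than `pod-0.nats...`.

```shell
#!/bin/sh
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print the command
# instead of running it against a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-application-peripherals}"

# Build the cluster-internal DNS name of a Service.
# Assumes the default cluster domain "cluster.local".
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# Start a throwaway busybox pod in the namespace, resolve the Service
# name and try a TCP connection to the given port (5 second timeout).
probe_service() {
  fqdn="$(svc_fqdn "$1" "$NS")"
  $KUBECTL run net-debug --rm -i --restart=Never --image=busybox \
    --namespace "$NS" -- \
    sh -c "nslookup $fqdn && nc -zv -w 5 $fqdn $2"
}
```

For example: `probe_service nats 4222`. If the lookup or the connection fails, that would point at DNS or the network layer rather than at Mender itself.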

Additional deviation:
The version of the Mender chart proposed in the documentation (--version 3.2.1) is not available; only 3.2.0 is available in the chart repo.
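For reference, this is roughly how the available chart versions can be listed. It is a sketch that assumes the chart repo was added under the alias `mender`; the function name `list_chart_versions` and the `HELM` variable are made up for illustration.

```shell
#!/bin/sh
# HELM can be overridden (e.g. HELM=echo) to print the commands
# instead of running them.
HELM="${HELM:-helm}"

# Refresh the local repo index, then list every published version of
# the chart. Assumes the repo alias "mender".
list_chart_versions() {
  $HELM repo update
  $HELM search repo mender/mender --versions
}
```

Running `list_chart_versions` after a `helm repo update` shows whether a newer chart version has been published in the meantime.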

Looks to me like this was done yesterday, here: