I’m having a problem setting up Mender (non-enterprise) on K8s.
I’ve followed the instructions from here:
I have also now created my own hostname for the Mender server (the other thread can be ignored) and have my own ingress that is working correctly. I created a user using useradm as specified and can log in, but I get a number of errors. In K8s I see that not all servers are starting:
I noticed that I had taken the most up-to-date MongoDB, but the version here (MongoDB | Mender documentation) was set at 10.21… So I installed that instead and redid the whole mender namespace as well… Exact same result again.
The services that are crashing all depend on NATS, which I suspect is either not deployed or not reachable at the configured URL. Can you verify that NATS is deployed and reachable at the nats hostname in your deployment?
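For example, something like the following can check both (the pod label and the service name "nats" are assumptions that may differ in your deployment):

# Check that the NATS pods are running
kubectl get pods -l app.kubernetes.io/name=nats
# Check that the "nats" hostname resolves from inside the cluster
kubectl run nats-dns-check --rm -it --restart=Never --image=busybox -- nslookup nats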
What I did this morning is downgrade Mender to 3.1.1, and it is working now. I didn’t change anything else, so if I had to hazard a guess I would say there is a bug somewhere in 3.2.1.
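For anyone who wants to do the same: pinning the chart version with Helm should achieve this, e.g. (a sketch; the release and repository names are assumed to match the documented install):

helm upgrade mender mender/mender --version 3.1.1 --reuse-values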
I think I may have found an error in our documentation. Since the latest release, these backend services depend on NATS JetStream, which needs explicit configuration to work. I think the following helm values should work for the NATS deployment:
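Something along these lines (a sketch based on the upstream nats chart’s JetStream values; the storage sizes are just examples):

cat >nats-values.yml <<EOF
nats:
  jetstream:
    enabled: true
    memStorage:
      enabled: true
      size: "1Gi"
    fileStorage:
      enabled: true
      size: "2Gi"
      # storageClassName: "default"
EOF
helm upgrade --install nats nats/nats -f nats-values.yml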
It seems like you haven’t configured a default StorageClass on your cluster.
In this case you can explicitly set the storage class by uncommenting the storageClassName line from the helm values in my previous comment, replacing “default” with a class name available on your cluster.
To list the available storage classes on your cluster you can run the following command:
kubectl get storageclasses.storage.k8s.io
For instance on my local minikube cluster, I get:
$ kubectl get storageclasses.storage.k8s.io
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
standard (default)   k8s.io/minikube-hostpath   Delete          Immediate           false                  46h
Thanks, there was a default class “local-path” in my cluster, but something was wrong with the PVC or PV. I deleted the whole cluster with all objects manually and recreated it, and NATS now starts correctly.
Everything is up except deployments.
The deployments pods are down because minio does not have a valid certificate (the Let’s Encrypt mechanism is not accessible yet, probably because of a missing redirect for ports).
workflows-worker’s logs:
Sorry for the late response - I was trying to reproduce your results using chart version 3.2.1 and master without any luck. What version of the helm chart are you using?
Could you also try running the following snippet to verify that the JetStream stream has in fact been created?
kubectl exec `kubectl get pod -l app=nats-box -o name` -- nats stream ls
Example output:
╭────────────────────────────────────────────────────────────────────────────────╮
│                                     Streams                                     │
├───────────┬─────────────┬─────────────────────┬──────────┬──────┬──────────────┤
│ Name      │ Description │ Created             │ Messages │ Size │ Last Message │
├───────────┼─────────────┼─────────────────────┼──────────┼──────┼──────────────┤
│ WORKFLOWS │             │ 2022-03-15 16:03:49 │ 0        │ 0 B  │ never        │
╰───────────┴─────────────┴─────────────────────┴──────────┴──────┴──────────────╯
As for the deployments service, you need to properly set up the certificate and a DNS record to make sure that the global.s3.AWS_URI helm value resolves to your minio instance.
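For example (the hostname here is hypothetical; use the DNS name that points at your minio instance):

helm upgrade mender mender/mender --reuse-values \
  --set global.s3.AWS_URI="https://minio.example.com"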
╭────────────────────────────────────────────────────────────────────────────────╮
│                                     Streams                                     │
├───────────┬─────────────┬─────────────────────┬──────────┬──────┬──────────────┤
│ Name      │ Description │ Created             │ Messages │ Size │ Last Message │
├───────────┼─────────────┼─────────────────────┼──────────┼──────┼──────────────┤
│ WORKFLOWS │             │ 2022-03-17 17:03:52 │ 0        │ 0 B  │ never        │
╰───────────┴─────────────┴─────────────────────┴──────────┴──────┴──────────────╯
$ helm version
version.BuildInfo{Version:"v3.8.1", GitCommit:"5cb9af4b1b271d11d7a97a71df3ac337dd94ad37", GitTreeState:"clean", GoVersion:"go1.17.5"}
$ helm list
NAME             NAMESPACE   REVISION   UPDATED                                   STATUS     CHART                  APP VERSION
cert-manager     default     1          2022-03-17 13:25:46.741016257 +0100 CET   deployed   cert-manager-v1.4.0    v1.4.0
mender           default     1          2022-03-17 20:13:49.678944217 +0100 CET   deployed   mender-3.2.1           3.2.1
minio-operator   default     1          2022-03-17 18:00:52.620454599 +0100 CET   deployed   minio-operator-4.1.7   v4.1.3
mongodb          default     1          2022-03-17 17:58:18.709092987 +0100 CET   deployed   mongodb-10.21.1        4.4.6
nats             default     1          2022-03-17 20:11:20.084756604 +0100 CET   deployed   nats-0.14.2            2.7.4
NATS is version 2.7.4 from the last attempt, but 2.3.1 gave the same results.
All charts were installed using the commands from Production installation with Kubernetes | Mender documentation.
I can upload the yml files generated in the process, if that helps.
I hope it is not caused by some leftover from my attempts before adding JetStream (I have recreated the whole cluster, though) or by the previous Docker Compose installation (it is shut down).
I tried (again) deploying everything from scratch, in a k3s environment this time. Although I had the same issue with the default storage class not working as expected, I was not able to reproduce the issue with NATS. Everything simply works out of the box.
May I ask you to try once more deploying to a fresh k3s cluster?
Before you do, could you also give me the full description of the workflows-worker pod (or at least the container image ID)?
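For example (the label selector is an assumption and may be different in your deployment):

kubectl describe pod -l app.kubernetes.io/name=workflows-worker
# Or only the container image ID:
kubectl get pod -l app.kubernetes.io/name=workflows-worker \
  -o jsonpath='{.items[*].status.containerStatuses[*].imageID}'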
Hi @alfrunes,
sorry for the late response, my server was down, so there was no way to test it.
I have tried reinstalling everything again (about the 4th time) from one big script, and now workflows-worker works right after install. I don’t know why, because all the generated files look the same as before, except for the generated secrets.
The problem is that when I restart the whole machine, so that k3s boots automatically, I get
failed to subscribe to the nats JetStream: cannot create a queue subscription for a consumer without a deliver group
for workflows-worker.
What is even stranger,
kubectl exec `kubectl get pod -l app=nats-box -o name` -- nats stream ls
Hi @alfrunes, sorry for the late reply. I stuck with my downgraded config for the short term but will attempt an install with the new instructions. Thanks for the help!
Hi everyone,
Was this issue ever resolved? I’m currently facing the same issue. Both the create-artifact-worker and workflows-worker pods are failing with CrashLoopBackOff. Logs for both have the same error message:
failed to subscribe to the nats JetStream: cannot create a queue subscription for a consumer without a deliver group
Per the documentation for the Kubernetes installation on the Mender site (NATS | Mender documentation), I am using chart version nats-0.8.2 and app version 2.3.1.
Ubuntu 22.04
k3s version v1.24.6+k3s1 (a8e0c66d)
go version go1.18.6
mender 3.4.0
At the time I was not able to reproduce the issue locally. But I have encountered it recently.
It seems to occur when the majority of the NATS nodes go down while receiving traffic. The only workaround I have found is to remove the consumer configuration from the NATS server and recreate the workflows pods. If you have included nats-box in your deployment (it is included by default), you can run the following snippet:
# Remove the stale consumers from the WORKFLOWS stream using the nats-box pod
kubectl exec `kubectl get pod -l app=nats-box -o name` -- \
  sh -c 'for consumer in workflows-worker create-artifact-worker; do nats consumer rm -f WORKFLOWS $consumer; done'
# Restart the workflows-related deployments so they recreate their consumers
for deploy in workflows-worker workflows-server create-artifact-worker; do
  kubectl rollout restart deployment/$deploy
done
I’m hoping that the new NATS consumer API included in one of the recent driver versions will resolve the issue, but it will take some time before we can migrate to this API in our software.