davel
March 11, 2022, 4:12pm
#1
I’m having a problem setting up Mender (non-enterprise) on K8s.
I’ve followed the instructions from here:
I have also now created my own hostname for the Mender server (the other thread can be ignored) and have my own ingress, which is working correctly. I created a user with useradm as specified (the command is sketched right after the pod listing below) and can log in, but I get a number of errors. In K8s I can see that not all services are starting:
mender api-gateway-5ccf655f9b-k98rw 1/1 Running 0 4h10m
mender create-artifact-worker-68db55dd4f-4xbtw 0/1 CrashLoopBackOff 11 (3m25s ago) 35m
mender deployments-66659cd55f-84287 0/1 Running 11 (3h42m ago) 4h11m
mender device-auth-ddf5fdccc-njw6f 0/1 Running 1 (4h8m ago) 4h8m
mender deviceconfig-798796457d-wsw45 1/1 Running 1 (4h8m ago) 4h9m
mender deviceconnect-5d4dc576cc-6dx6k 1/1 Running 2 (4h8m ago) 4h9m
mender gui-584bfd7ff7-gddzt 1/1 Running 0 4h10m
mender inventory-595cb5bb7b-ngdnf 1/1 Running 1 (4h8m ago) 4h9m
mender iot-manager-7b9596f696-855mf 1/1 Running 2 (4h9m ago) 4h9m
mender useradm-59c9449649-k5gwq 1/1 Running 4 (4h9m ago) 4h10m
mender workflows-server-7794f55d99-2xhj2 0/1 Error 54 (5m14s ago) 4h12m
mender workflows-worker-778f8fb6d6-4m9dv 0/1 CrashLoopBackOff 17 (3m51s ago) 67m
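For reference, the user was created with the useradm CLI roughly like this — a minimal sketch, where the label selector follows the chart's pod labels and the username/password are placeholders:
# find the useradm pod (label convention taken from the chart's pod labels)
USERADM_POD=$(kubectl -n mender get pod -l app.kubernetes.io/name=useradm -o name | head -n1)
# create the initial user (username/password are placeholders)
kubectl -n mender exec "$USERADM_POD" -- useradm create-user \
  --username "admin@example.com" --password "change-me-please"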
Most of the services are running, though, and the connection to Mongo is also working.
All three of the crashing services report a similar thing:
time="2022-03-11T16:05:51Z" level=info msg="migrating workflows" file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-11T16:05:51Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-11T16:05:51Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
2022/03/11 16:05:56 context deadline exceeded
time="2022-03-11T16:04:11Z" level=info msg="migrating workflows" file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-11T16:04:11Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-11T16:04:11Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
2022/03/11 16:04:16 context deadline exceeded
time="2022-03-11T16:05:25Z" level=info msg="migrating workflows" file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-11T16:05:25Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-11T16:05:25Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
2022/03/11 16:05:30 context deadline exceeded
Isn’t it strange that they all log the same thing?
Also, the time between the previous log line and the “context deadline exceeded” is only 5 seconds.
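For anyone reproducing this, logs and restart events like the ones above can be pulled with plain kubectl — a minimal sketch, using the mender namespace and one of the pod names from the listing:
# last log lines from the previous (crashed) run of one of the failing pods
kubectl -n mender logs workflows-server-7794f55d99-2xhj2 --previous --tail=20
# restart / back-off events for the same pod
kubectl -n mender describe pod workflows-server-7794f55d99-2xhj2 | tail -n 20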
davel
March 11, 2022, 4:59pm
#2
I recreated the namespace from scratch, with the exact same result.
davel
March 11, 2022, 6:47pm
#3
I noticed that I had taken the most up-to-date MongoDB chart, but the version here (MongoDB | Mender documentation) is pinned at 10.21… So I installed that instead and redid the whole mender namespace as well… Exact same result again.
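For reference, pinning the chart works along these lines — a sketch assuming the Bitnami MongoDB chart the documentation uses and a values file named mongodb.yml (that file name is just a placeholder):
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# pin to the chart version the documentation references
helm upgrade --install mongodb bitnami/mongodb --version 10.21.1 -f mongodb.yml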
alfrunes
Hello @davel, and thank you for trying Mender!
The services that are crashing all depend on NATS, which I suspect is either not deployed or not reachable at the configured URL. Can you verify that NATS is deployed and reachable via the nats hostname in your deployment?
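A couple of quick checks for that — a hedged sketch, assuming NATS was installed from the upstream chart (which labels its pods with app.kubernetes.io/name=nats):
# is a NATS service deployed, and in which namespace?
kubectl get svc,pods --all-namespaces -l app.kubernetes.io/name=nats
# does the `nats` hostname resolve from inside the cluster?
kubectl run -it --rm nats-dns-test --image=busybox --restart=Never -- nslookup nats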
davel
March 14, 2022, 12:19pm
#5
Hi @alfrunes, thanks for the reply. NATS is deployed, healthy and reachable:
nats: {
  URL: 'nats://nats.nats.svc.cluster.local:4222'
},
What I did this morning is downgrade Mender to 3.1.1, and it is working now. I didn’t change anything else, so if I had to hazard a guess I would say there is a bug somewhere in 3.2.1.
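For anyone wanting to do the same, a downgrade like this should just be a matter of pinning the chart version — a sketch assuming the standard Mender chart repository and an existing values file (mender-values.yml is a placeholder name):
helm repo add mender https://charts.mender.io
helm repo update
# pin the backend chart to the previous minor release
helm upgrade --install mender mender/mender --version 3.1.1 -f mender-values.yml -n mender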
alfrunes
I think I may have found an error in our documentation. Since the latest release, these backend services depend on NATS JetStream, which needs explicit configuration to work. I think the following Helm values should work for the NATS deployment:
cluster:
  enabled: true
  replicas: 2
nats:
  image: "nats:2.7.4-alpine"
  jetstream:
    enabled: true
    memStorage:
      enabled: true
      size: "2Gi"
    fileStorage:
      enabled: true
      size: "1Gi"
      storageDirectory: /data/
      #storageClassName: "default"
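Assuming these values are saved to a file (nats-values.yml below is a placeholder name), applying them with the upstream chart would look roughly like this:
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update
# nats-values.yml is the file containing the values above
helm upgrade --install nats nats/nats -f nats-values.yml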
I’ve created a ticket to make sure the documentation gets corrected ASAP.
Michal
March 17, 2022, 1:37pm
#7
Hi @alfrunes ,
I have a similar problem to davel’s.
I have tried installing NATS with your new configuration, and the result after running
cat >nats.yml <<EOF
cluster:
  enabled: true
  replicas: 2
nats:
  image: "nats:2.3.1-alpine"
  jetstream:
    enabled: true
    memStorage:
      enabled: true
      size: "1Gi"
    fileStorage:
      enabled: true
      size: "2Gi"
      storageDirectory: /data/
EOF
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update
helm install nats nats/nats --version 0.8.2 -f nats.yml
is:
NAME READY STATUS RESTARTS AGE
cert-manager-5d7f97b46d-kzr4b 1/1 Running 1 (55m ago) 68m
mongodb-arbiter-0 1/1 Running 1 (55m ago) 66m
mongodb-0 1/1 Running 1 (55m ago) 66m
cert-manager-webhook-f697cc96d-tzvv8 1/1 Running 1 (55m ago) 68m
mongodb-1 1/1 Running 1 (55m ago) 65m
cert-manager-cainjector-69d885bf55-vzgqs 1/1 Running 2 (54m ago) 68m
nats-0 0/3 Pending 0 2m20s
nats-box-56c684cb47-tr9fb 1/1 Running 0 2m20s
kubectl describe pod nats-0:
Name: nats-0
Namespace: default
Priority: 0
Node: <none>
Labels: app.kubernetes.io/instance=nats
app.kubernetes.io/name=nats
controller-revision-hash=nats-5f96889449
statefulset.kubernetes.io/pod-name=nats-0
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 7777
prometheus.io/scrape: true
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/nats
Containers:
nats:
Image: nats:2.3.1-alpine
Ports: 4222/TCP, 7422/TCP, 7522/TCP, 6222/TCP, 8222/TCP, 7777/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
nats-server
--config
/etc/nats-config/nats.conf
Liveness: http-get http://:8222/ delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:8222/ delay=10s timeout=5s period=10s #success=1 #failure=3
Environment:
POD_NAME: nats-0 (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
CLUSTER_ADVERTISE: $(POD_NAME).nats.$(POD_NAMESPACE).svc.cluster.local.
Mounts:
/data/ from nats-js-pvc (rw)
/etc/nats-config from config-volume (rw)
/var/run/nats from pid (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qpmvb (ro)
reloader:
Image: natsio/nats-server-config-reloader:0.6.1
Port: <none>
Host Port: <none>
Command:
nats-server-config-reloader
-pid
/var/run/nats/nats.pid
-config
/etc/nats-config/nats.conf
Environment: <none>
Mounts:
/etc/nats-config from config-volume (rw)
/var/run/nats from pid (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qpmvb (ro)
metrics:
Image: natsio/prometheus-nats-exporter:0.8.0
Port: 7777/TCP
Host Port: 0/TCP
Args:
-connz
-routez
-subz
-varz
-prefix=nats
-use_internal_server_id
-jsz=all
http://localhost:8222/
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qpmvb (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
nats-js-pvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nats-js-pvc-nats-0
ReadOnly: false
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: nats-config
Optional: false
pid:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-qpmvb:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 33s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Am I missing something? The previous configuration started NATS correctly, but the workflows services were in an error/crash state.
Thanks in advance,
Michal
alfrunes
Hi @Michal,
It seems like you haven’t configured a default StorageClass on your cluster.
In this case you can explicitly set the storage class by uncommenting the line in the Helm values from my previous comment, replacing “default” with a class name available on your cluster.
To list the available storage classes on your cluster you can run the following command:
kubectl get storageclasses.storage.k8s.io
For instance, on my local minikube cluster, I get:
$ kubectl get storageclasses.storage.k8s.io
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
standard (default) k8s.io/minikube-hostpath Delete Immediate false 46h
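Concretely, setting the class explicitly would then look something like this in the NATS values (the class name is only an example — use whichever class the command above reports):
nats:
  jetstream:
    fileStorage:
      enabled: true
      size: "1Gi"
      storageDirectory: /data/
      storageClassName: "standard"   # substitute a class from `kubectl get storageclasses`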
Michal
March 17, 2022, 7:16pm
#9
Hi @alfrunes ,
thanks, there was a default class “local-path” in my cluster, but something was wrong with the PVC or PV. I deleted the whole cluster with all objects manually and recreated it, and NATS now starts correctly.
Everything is up except
workflows-worker-778f8fb6d6-z9254 0/1 CrashLoopBackOff 5 (2m39s ago) 5m47s
deployments-66659cd55f-sbmzm 0/1 CrashLoopBackOff 5 (2m20s ago) 5m47s
The deployments service is down because MinIO does not have a valid certificate (the Let’s Encrypt mechanism is not reachable yet, probably because of a missing redirect for the ports).
workflows-worker’s logs:
time="2022-03-17T18:48:09Z" level=info msg="migrating workflows" file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="migration to version 1.0.0 skipped" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="DB migrated to version 1.0.0" db=workflows file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading 8 workflows from /etc/workflows/definitions." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading deploy_device_configuration v1." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: error loading: deploy_device_configuration v1: Workflow already exists." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading provision_device v8." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: error loading: provision_device v8: Workflow already exists." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading provision_external_device v2." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: error loading: provision_external_device v2: Workflow already exists." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading reindex_inventory v2." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: error loading: reindex_inventory v2: Workflow already exists." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading reindex_reporting v3." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: error loading: reindex_reporting v3: Workflow already exists." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading update_device_inventory v2." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: error loading: update_device_inventory v2: Workflow already exists." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading update_device_status v6." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: error loading: update_device_status v6: Workflow already exists." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: loading decommission_device v8." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="LoadWorkflows: error loading: decommission_device v8: Workflow already exists." file=entry.go func="logrus.(*Entry).Infof" line=351
time="2022-03-17T18:48:09Z" level=info msg="nats client closed the connection" file=client.go func=nats.NewClientWithDefaults.func1.1 line=86
2022/03/17 18:48:09 failed to subscribe to the nats JetStream: cannot create a queue subscription for a consumer without a deliver group
Looks like a NATS problem, but I’m quite lost as to where to look.
My Mender Helm chart has this config:
nats:
  URL: "nats://nats:4222"
I have tried the newer image
image: "nats:2.7.4-alpine"
for the NATS installation, as you proposed here, but it did not help.
workflows-server is running OK.
Thanks in advance!
alfrunes
Sorry for the late response - I was trying to reproduce your results using chart version 3.2.1 and master, without any luck. What version of the Helm chart are you using?
Could you also try running the following snippet to verify that the JetStream stream is in fact created?
kubectl exec `kubectl get pod -l app=nats-box -o name` -- nats stream ls
Example output:
╭────────────────────────────────────────────────────────────────────────────────╮
│ Streams │
├───────────┬─────────────┬─────────────────────┬──────────┬──────┬──────────────┤
│ Name │ Description │ Created │ Messages │ Size │ Last Message │
├───────────┼─────────────┼─────────────────────┼──────────┼──────┼──────────────┤
│ WORKFLOWS │ │ 2022-03-15 16:03:49 │ 0 │ 0 B │ never │
╰───────────┴─────────────┴─────────────────────┴──────────┴──────┴──────────────╯
As for the deployments service, you need to properly set up the certificate and a DNS record to make sure that the global.s3.AWS_URI Helm value resolves to your MinIO instance.
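In Helm-values terms that means something along these lines (the hostname is a placeholder for whatever your certificate and DNS record actually cover):
global:
  s3:
    AWS_URI: "https://artifacts.example.com"   # placeholder: must resolve to MinIO and present a valid certificate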
Michal
March 22, 2022, 6:43pm
#11
Thanks,
output is:
╭────────────────────────────────────────────────────────────────────────────────╮
│ Streams │
├───────────┬─────────────┬─────────────────────┬──────────┬──────┬──────────────┤
│ Name │ Description │ Created │ Messages │ Size │ Last Message │
├───────────┼─────────────┼─────────────────────┼──────────┼──────┼──────────────┤
│ WORKFLOWS │ │ 2022-03-17 17:03:52 │ 0 │ 0 B │ never │
╰───────────┴─────────────┴─────────────────────┴──────────┴──────┴──────────────╯
helm version
version.BuildInfo{Version:"v3.8.1", GitCommit:"5cb9af4b1b271d11d7a97a71df3ac337dd94ad37", GitTreeState:"clean", GoVersion:"go1.17.5"}
helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
cert-manager default 1 2022-03-17 13:25:46.741016257 +0100 CET deployed cert-manager-v1.4.0 v1.4.0
mender default 1 2022-03-17 20:13:49.678944217 +0100 CET deployed mender-3.2.1 3.2.1
minio-operator default 1 2022-03-17 18:00:52.620454599 +0100 CET deployed minio-operator-4.1.7 v4.1.3
mongodb default 1 2022-03-17 17:58:18.709092987 +0100 CET deployed mongodb-10.21.1 4.4.6
nats default 1 2022-03-17 20:11:20.084756604 +0100 CET deployed nats-0.14.2 2.7.4
NATS is version 2.7.4 from the last attempt, but 2.3.1 was giving the same results.
All charts were installed using the commands from Production installation with Kubernetes | Mender documentation.
I can upload the .yml files generated in the process, if it helps.
I hope it is not caused by some leftover from my attempts before adding JetStream (I have recreated the whole cluster, though), or by the previous Docker Compose installation (it is shut down).
Michal
alfrunes
I tried (again) deploying everything from scratch, in a k3s environment this time. Although I had the same issue with the default storage class not working as expected, I was not able to reproduce the issue with NATS. Everything simply works out of the box.
May I ask you to try once more deploying to a fresh k3s cluster?
Before you do, could you also give me the full description of the workflows-worker pod (or at least the container image ID)?
Michal
April 4, 2022, 7:24pm
#13
Hi @alfrunes,
sorry for the late response, my server was down, so there was no way to test it.
I have tried to reinstall everything again (for about the 4th time) from one big script, and now workflows-worker works right after the install. I don’t know why, because all the generated files look the same as before, except for the generated secrets.
The problem is that when I restart the whole machine, so that k3s boots automatically, I get
failed to subscribe to the nats JetStream: cannot create a queue subscription for a consumer without a deliver group
for workflows-worker.
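One thing that might be worth checking after a reboot is whether workflows-worker came up before NATS had finished recovering its JetStream state — that ordering is only a guess on my part, but the start times are easy to compare:
# list pods in the order they started after the reboot
kubectl get pods --sort-by=.status.startTime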
What is even stranger,
kubectl exec `kubectl get pod -l app=nats-box -o name` -- nats stream ls
returns
Streams:
WORKFLOWS
The NATS pods are in the Running state.
kubectl logs nats-0 nats
[8] 2022/04/04 19:04:11.607079 [INF] Starting nats-server
[8] 2022/04/04 19:04:11.607123 [INF] Version: 2.3.1
[8] 2022/04/04 19:04:11.607125 [INF] Git: [907fef4]
[8] 2022/04/04 19:04:11.607127 [INF] Name: nats-0
[8] 2022/04/04 19:04:11.607132 [INF] Node: S1Nunr6R
[8] 2022/04/04 19:04:11.607134 [INF] ID: NDDIDMI5N3DZPPDNI6J6EPPKKL4QBSLT3LJUVVRXXCZ6NNTCJDYMPNZN
[8] 2022/04/04 19:04:11.607137 [INF] Using configuration file: /etc/nats-config/nats.conf
[8] 2022/04/04 19:04:11.612397 [INF] Starting JetStream
[8] 2022/04/04 19:04:11.613380 [INF] _ ___ _____ ___ _____ ___ ___ _ __ __
[8] 2022/04/04 19:04:11.613386 [INF] _ | | __|_ _/ __|_ _| _ \ __| /_\ | \/ |
[8] 2022/04/04 19:04:11.613388 [INF] | || | _| | | \__ \ | | | / _| / _ \| |\/| |
[8] 2022/04/04 19:04:11.613389 [INF] \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_| |_|
[8] 2022/04/04 19:04:11.613390 [INF]
[8] 2022/04/04 19:04:11.613391 [INF] https://docs.nats.io/jetstream
[8] 2022/04/04 19:04:11.613393 [INF]
[8] 2022/04/04 19:04:11.613394 [INF] ---------------- JETSTREAM ----------------
[8] 2022/04/04 19:04:11.613400 [INF] Max Memory: 1.00 GB
[8] 2022/04/04 19:04:11.613403 [INF] Max Storage: 2.00 GB
[8] 2022/04/04 19:04:11.613404 [INF] Store Directory: "/data/jetstream"
[8] 2022/04/04 19:04:11.613405 [INF] -------------------------------------------
[8] 2022/04/04 19:04:11.630812 [INF] Restored 0 messages for stream "WORKFLOWS"
[8] 2022/04/04 19:04:11.632074 [INF] Recovering 2 consumers for stream - "WORKFLOWS"
[8] 2022/04/04 19:04:11.643995 [INF] Starting JetStream cluster
[8] 2022/04/04 19:04:11.644005 [INF] Creating JetStream metadata controller
[8] 2022/04/04 19:04:11.647296 [INF] JetStream cluster recovering state
[8] 2022/04/04 19:04:11.650342 [INF] Starting http monitor on 0.0.0.0:8222
[8] 2022/04/04 19:04:11.650510 [INF] Listening for client connections on 0.0.0.0:4222
[8] 2022/04/04 19:04:11.651063 [INF] Server is ready
[8] 2022/04/04 19:04:11.651079 [INF] Cluster name is nats
[8] 2022/04/04 19:04:11.651332 [INF] Listening for route connections on 0.0.0.0:6222
[8] 2022/04/04 19:04:11.657697 [INF] JetStream cluster new stream leader for '$G > WORKFLOWS'
[8] 2022/04/04 19:04:11.658001 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > workflows-worker'
[8] 2022/04/04 19:04:11.658074 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
[8] 2022/04/04 19:04:21.653376 [ERR] Error trying to connect to route (attempt 1): lookup for host "nats-1.nats.default.svc.cluster.local.": lookup nats-1.nats.default.svc.cluster.local. on 10.43.0.10:53: read udp 10.42.0.50:48455->10.43.0.10:53: i/o timeout
[8] 2022/04/04 19:04:41.738252 [INF] 10.42.0.65:47636 - rid:13 - Route connection created
[8] 2022/04/04 19:04:46.661185 [INF] 10.42.0.65:6222 - rid:14 - Route connection created
[8] 2022/04/04 19:04:46.661540 [INF] 10.42.0.65:6222 - rid:14 - Router connection closed: Duplicate Route
[8] 2022/04/04 19:04:48.155619 [INF] JetStream cluster new metadata leader
[8] 2022/04/04 19:04:49.606657 [INF] JetStream cluster new consumer leader for '$G > WORKFLOWS > create-artifact-worker'
kubectl logs nats-1 nats
[8] 2022/04/04 19:04:18.698872 [INF] Starting nats-server
[8] 2022/04/04 19:04:18.698927 [INF] Version: 2.3.1
[8] 2022/04/04 19:04:18.698930 [INF] Git: [907fef4]
[8] 2022/04/04 19:04:18.698932 [INF] Name: nats-1
[8] 2022/04/04 19:04:18.698936 [INF] Node: yrzKKRBu
[8] 2022/04/04 19:04:18.698938 [INF] ID: NB3WCYR7KJZ65XVYNIUYVLQL3Z2OHR5MST5E6GZMS3J3QD4AHCKTS4TH
[8] 2022/04/04 19:04:18.698941 [INF] Using configuration file: /etc/nats-config/nats.conf
[8] 2022/04/04 19:04:18.700034 [INF] Starting JetStream
[8] 2022/04/04 19:04:18.701403 [INF] _ ___ _____ ___ _____ ___ ___ _ __ __
[8] 2022/04/04 19:04:18.701414 [INF] _ | | __|_ _/ __|_ _| _ \ __| /_\ | \/ |
[8] 2022/04/04 19:04:18.701415 [INF] | || | _| | | \__ \ | | | / _| / _ \| |\/| |
[8] 2022/04/04 19:04:18.701417 [INF] \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_| |_|
[8] 2022/04/04 19:04:18.701418 [INF]
[8] 2022/04/04 19:04:18.701419 [INF] https://docs.nats.io/jetstream
[8] 2022/04/04 19:04:18.701421 [INF]
[8] 2022/04/04 19:04:18.701422 [INF] ---------------- JETSTREAM ----------------
[8] 2022/04/04 19:04:18.701433 [INF] Max Memory: 1.00 GB
[8] 2022/04/04 19:04:18.701435 [INF] Max Storage: 2.00 GB
[8] 2022/04/04 19:04:18.701437 [INF] Store Directory: "/data/jetstream"
[8] 2022/04/04 19:04:18.701438 [INF] -------------------------------------------
[8] 2022/04/04 19:04:18.702127 [INF] Starting JetStream cluster
[8] 2022/04/04 19:04:18.702133 [INF] Creating JetStream metadata controller
[8] 2022/04/04 19:04:18.708325 [INF] JetStream cluster recovering state
[8] 2022/04/04 19:04:18.715385 [INF] Starting http monitor on 0.0.0.0:8222
[8] 2022/04/04 19:04:18.715960 [INF] Listening for client connections on 0.0.0.0:4222
[8] 2022/04/04 19:04:18.716069 [INF] Server is ready
[8] 2022/04/04 19:04:18.716111 [INF] Cluster name is nats
[8] 2022/04/04 19:04:18.716128 [INF] Listening for route connections on 0.0.0.0:6222
[8] 2022/04/04 19:04:28.725672 [ERR] Error trying to connect to route (attempt 1): lookup for host "nats-0.nats.default.svc.cluster.local.": lookup nats-0.nats.default.svc.cluster.local. on 10.43.0.10:53: read udp 10.42.0.65:52598->10.43.0.10:53: i/o timeout
[8] 2022/04/04 19:04:41.738532 [INF] 10.42.0.50:6222 - rid:6 - Route connection created
[8] 2022/04/04 19:04:46.661250 [INF] 10.42.0.50:34610 - rid:7 - Route connection created
[8] 2022/04/04 19:04:46.661761 [INF] 10.42.0.50:34610 - rid:7 - Router connection closed: Duplicate Route
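Side note: since both logs mention consumers being recovered for the WORKFLOWS stream, the consumers themselves can be listed the same way as the streams — a sketch reusing the nats-box helper from earlier:
# list the durable consumers attached to the WORKFLOWS stream
kubectl exec `kubectl get pod -l app=nats-box -o name` -- nats consumer ls WORKFLOWS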
Here is workflows-worker’s describe:
Name: workflows-worker-778f8fb6d6-drbbt
Namespace: default
Priority: 0
Node: vmyoctobuilder/10.0.7.58
Start Time: Mon, 04 Apr 2022 20:53:20 +0200
Labels: app.kubernetes.io/name=workflows-worker
pod-template-hash=778f8fb6d6
Annotations: <none>
Status: Running
IP: 10.42.0.63
IPs:
IP: 10.42.0.63
Controlled By: ReplicaSet/workflows-worker-778f8fb6d6
Containers:
workflows:
Container ID: containerd://16b1c17dcc5e36412841288cd41f16cf92c99bdb147a2218d1ea34854a0ce598
Image: docker.io/mendersoftware/workflows-worker:mender-3.2.1
Image ID: docker.io/mendersoftware/workflows-worker@sha256:8eebc2d4002b84eeb08ac5d6ea0434da0011e004e6fbb5ae6a9a5290920bce0a
Port: <none>
Host Port: <none>
Args:
worker
--excluded-workflows
generate_artifact
--automigrate
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 04 Apr 2022 21:10:20 +0200
Finished: Mon, 04 Apr 2022 21:10:20 +0200
Ready: False
Restart Count: 9
Limits:
cpu: 100m
memory: 128M
Requests:
cpu: 10m
memory: 64M
Environment Variables from:
mongodb-common Secret with prefix 'WORKFLOWS_' Optional: false
Environment:
WORKFLOWS_NATS_URI: nats://nats:4222
HAVE_DEVICECONNECT: true
HAVE_DEVICECONFIG: true
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vx49d (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-vx49d:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned default/workflows-worker-778f8fb6d6-drbbt to vmyoctobuilder
Normal Pulling 17m kubelet Pulling image "docker.io/mendersoftware/workflows-worker:mender-3.2.1"
Normal Pulled 15m kubelet Successfully pulled image "docker.io/mendersoftware/workflows-worker:mender-3.2.1" in 1m51.676472634s
Warning BackOff 14m (x3 over 15m) kubelet Back-off restarting failed container
Normal Pulled 14m (x3 over 15m) kubelet Container image "docker.io/mendersoftware/workflows-worker:mender-3.2.1" already present on machine
Normal Created 14m (x4 over 15m) kubelet Created container workflows
Normal Started 14m (x4 over 15m) kubelet Started container workflows
Normal SandboxChanged 6m33s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 4m49s (x4 over 6m33s) kubelet Container image "docker.io/mendersoftware/workflows-worker:mender-3.2.1" already present on machine
Normal Created 4m49s (x4 over 6m33s) kubelet Created container workflows
Normal Started 4m48s (x4 over 6m33s) kubelet Started container workflows
Warning BackOff 83s (x24 over 6m22s) kubelet Back-off restarting failed container
davel
April 4, 2022, 10:42pm
#14
Hi @alfrunes, sorry for the late reply. I’m sticking with my downgraded config for the short term, but I will attempt an install with the new instructions. Thanks for the help!
alfrunes
Thanks for the update. I will try to reproduce this scenario later and get back to you.