Hi, everyone!
I have an issue and hope you can help me. I recently upgraded my EKS cluster on AWS to 1.29, and since then I cannot get Mender to work. The workflows-server, workflows-worker, and create-artifact-worker pods continuously restart with CrashLoopBackOff:
kubectl get pods
NAME READY STATUS RESTARTS AGE
api-gateway-865ffc4768-mqzkg 1/1 Running 0 51m
create-artifact-worker-69646cf645-464dq 0/1 CrashLoopBackOff 14 (4m48s ago) 51m
deployments-5d9b6d6bd5-lm7sl 0/1 Running 0 11m
device-auth-77bdb4868f-h2fv2 0/1 Running 0 11m
deviceconfig-586d87974-jqxxb 1/1 Running 0 11m
deviceconnect-8565cf5d45-khxhf 1/1 Running 0 11m
gui-5d76f4dc89-8gpt7 1/1 Running 0 41m
inventory-dcc7dc886-wgzq8 1/1 Running 0 11m
iot-manager-675488bd48-8zhxk 1/1 Running 0 11m
nats-0 3/3 Running 0 41m
nats-box-56d4878784-4q2j7 1/1 Running 0 51m
sonarqube-sonarqube-0 1/1 Running 0 111m
useradm-99478d77b-p8wb6 1/1 Running 0 11m
workflows-server-58495d7489-rlmvq 0/1 CrashLoopBackOff 8 (3m46s ago) 19m
workflows-worker-6bdb9d8cbc-k56n7 0/1 CrashLoopBackOff 7 (56s ago) 11m
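To dig into the failing pods I used the usual kubectl commands (pod names are the ones from the listing above), for example:
kubectl logs workflows-server-58495d7489-rlmvq
kubectl describe pod workflows-server-58495d7489-rlmvq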
These are the logs:
Workflows-server logs
time="2024-09-23T22:18:10Z" level=info msg="migrating workflows" file=migrations.go func=mongo.Migrate line=38
2024/09/23 22:18:10 nats: no responders available for request
Workflows-server describe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31m default-scheduler Successfully assigned default/workflows-server-58495d7489-rlmvq to ip-10-127-1-254.ec2.internal
Normal Pulled 29m (x5 over 31m) kubelet Container image "docker.io/mendersoftware/workflows:mender-3.4.0" already present on machine
Normal Created 29m (x5 over 31m) kubelet Created container workflows
Normal Started 29m (x5 over 31m) kubelet Started container workflows
Warning BackOff 69s (x151 over 31m) kubelet Back-off restarting failed container workflows in pod workflows-server-58495d7489-rlmvq_default(86d06ee9-850f-4ee5-814e-b00a534c7352)
Workflows-worker logs
time="2024-09-23T22:21:04Z" level=info msg="migrating workflows" file=migrations.go func=mongo.Migrate line=38
2024/09/23 22:21:04 nats: no responders available for request
Workflows-worker describe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned default/workflows-worker-6bdb9d8cbc-k56n7 to ip-10-127-2-101.ec2.internal
Normal Pulling 22m kubelet Pulling image "docker.io/mendersoftware/workflows-worker:mender-3.4.0"
Normal Pulled 22m kubelet Successfully pulled image "docker.io/mendersoftware/workflows-worker:mender-3.4.0" in 839ms (839ms including waiting)
Normal Created 20m (x5 over 22m) kubelet Created container workflows
Normal Started 20m (x5 over 22m) kubelet Started container workflows
Normal Pulled 20m (x4 over 22m) kubelet Container image "docker.io/mendersoftware/workflows-worker:mender-3.4.0" already present on machine
Warning BackOff 2m (x94 over 22m) kubelet Back-off restarting failed container workflows in pod workflows-worker-6bdb9d8cbc-k56n7_default(649c33c3-e550-4989-9397-6d3acc07a2f0)
Create-artifact-worker logs
time="2024-09-23T22:22:08Z" level=info msg="migrating workflows" file=migrations.go func=mongo.Migrate line=38
2024/09/23 22:22:08 nats: no responders available for request
NATS itself seems to be working fine.
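For reference, this is roughly how I sanity-checked NATS from the nats-box pod; the nats service name and port 4222 come from the default chart values, so adjust them if your setup differs:
kubectl exec -it deployment/nats-box -- nats pub test.ping hello -s nats://nats:4222
kubectl exec -it deployment/nats-box -- nats stream ls -s nats://nats:4222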
The deployments pod itself is running, but I get the following 503 errors, I suppose because the workflows-server is not working:
time="2024-09-23T22:32:12Z" level=error msg="Workflows service unhealthy: Get \"http://mender-workflows-server:8080/api/v1/health\": dial tcp 172.20.50.161:8080: connect: connection refused" file=response_helpers.go func=rest_utils.restErrWithLogMsg line=110 request_id=1bd8f4a1-8357-4e09-be55-f505633ca3f7
time="2024-09-23T22:32:12Z" level=info msg="503 39289μs GET /api/internal/v1/deployments/health HTTP/1.1 - kube-probe/1.29+" byteswritten=208 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=GET path=/api/internal/v1/deployments/health qs= request_id=1bd8f4a1-8357-4e09-be55-f505633ca3f7 responsetime=0.0
39289687 status=503 ts="2024-09-23 22:32:12.495906076 +0000 UTC" type=http
time="2024-09-23T22:32:16Z" level=error msg="Workflows service unhealthy: Get \"http://mender-workflows-server:8080/api/v1/health\": dial tcp 172.20.50.161:8080: connect: connection refused" file=response_helpers.go func=rest_utils.restErrWithLogMsg line=110 request_id=c7030c45-2578-4441-81fe-76b833dd8439
time="2024-09-23T22:32:16Z" level=info msg="503 23544μs GET /api/internal/v1/deployments/health HTTP/1.1 - kube-probe/1.29+" byteswritten=208 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=GET path=/api/internal/v1/deployments/health qs= request_id=c7030c45-2578-4441-81fe-76b833dd8439 responsetime=0.0
23544842 status=503 ts="2024-09-23 22:32:16.673091063 +0000 UTC" type=http
time="2024-09-23T22:32:17Z" level=info msg="204 53μs GET /api/internal/v1/deployments/alive HTTP/1.1 - kube-probe/1.29+" byteswritten=0 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=GET path=/api/internal/v1/deployments/alive qs= request_id=5181b61c-ed85-4de3-b851-cc4e9dc39752 responsetime=5.3792e-05
status=204 ts="2024-09-23 22:32:17.495558402 +0000 UTC" type=http
time="2024-09-23T22:32:22Z" level=info msg="204 53μs GET /api/internal/v1/deployments/alive HTTP/1.1 - kube-probe/1.29+" byteswritten=0 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=GET path=/api/internal/v1/deployments/alive qs= request_id=85316c95-e6a1-470e-80f1-646b98449b14 responsetime=5.3454e-05
status=204 ts="2024-09-23 22:32:22.495332492 +0000 UTC" type=http
time="2024-09-23T22:32:27Z" level=info msg="204 66μs GET /api/internal/v1/deployments/alive HTTP/1.1 - kube-probe/1.29+" byteswritten=0 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=GET path=/api/internal/v1/deployments/alive qs= request_id=28578432-71d3-4a04-b075-617073ffe78d responsetime=6.6035e-05
status=204 ts="2024-09-23 22:32:27.495704439 +0000 UTC" type=http
time="2024-09-23T22:32:27Z" level=error msg="Workflows service unhealthy: Get \"http://mender-workflows-server:8080/api/v1/health\": dial tcp 172.20.50.161:8080: connect: connection refused" file=response_helpers.go func=rest_utils.restErrWithLogMsg line=110 request_id=14aa22b5-4a4a-449f-b4b4-dbebb888a614
I tried redeploying Mender with Helm (roughly the command shown after the version list below), but it didn't help. The stack I'm using is:
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.8-eks-a737599
Mender: 3.4.0 (via Helm)
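For completeness, this is roughly the redeploy command I ran; the release name mender, the chart reference mender/mender, and the values file are from my setup, so yours may differ:
helm repo update
helm upgrade --install mender mender/mender -f mender-values.yaml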
What could be happening?
Thanks in advance.
Regards,
Víctor.