Hi @robgio
I’m having an issue with artifact generation for big files (500 MB+, tested up to 2 GB).
Although a .tmp file of the same size is created in MinIO, the artifacts worker never succeeds in creating the final artifact file, and the .tmp file gets deleted after some time.
Steps to reproduce:
- The artifact file is successfully uploaded from the server’s web UI.
- A .tmp file is created in MinIO (I can verify it from the MinIO artifacts web UI).
- The generate_artifact task is started on Kubernetes but never finishes; after some time the create-artifact pod’s logs are cleared and it looks like the pod restarted, with no errors in the create-artifact pod’s logs.
- The .tmp file from step 2 suddenly disappears from MinIO, but system disk usage keeps growing, so maybe it is stuck in /tmp inside the create-artifact container? (See the check right after this list.)
- No error is recorded in the artifacts or deployments pods; everything seems fine!
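In case it helps, a quick way to check whether the temporary file is piling up inside the worker container while a job is running would be something like this (the pod name is the one from my cluster, and this assumes the image ships df/du):

```bash
# Check temp-space usage inside the create-artifact-worker container
# while a generate_artifact job is in progress.
# Add -n <namespace> if Mender is installed in a dedicated namespace.
POD=mender-create-artifact-worker-f89796ddf-gdmp7   # pod name from my cluster
kubectl exec "$POD" -- df -h /tmp    # free space on the filesystem backing /tmp
kubectl exec "$POD" -- du -sh /tmp   # size of the files currently staged in /tmp
```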
Things I tried, unsuccessfully:
- I modified the create-artifact-worker deployment YAML in Kubernetes and increased the CPU & memory limits (roughly as in the sketch after this list). I can verify that the settings work because I can see more CPU & memory usage when the create-artifact workers start, but the same issue happens…
- I tried uploading big files (500 MB, 1 GB, 1.5 GB) directly into MinIO from the MinIO web UI and they all uploaded fine, so no issue there.
- I also tried using mender artifacts for tag mender-3.7-4; still the same issue.
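For reference, the limits change I made is roughly equivalent to the following (I edited the deployment YAML directly; the values here are just examples):

```bash
# Raise CPU/memory requests and limits on the create-artifact-worker deployment.
# Deployment name is derived from the pod name below; values are examples.
# Add -n <namespace> if Mender is installed in a dedicated namespace.
kubectl set resources deployment/mender-create-artifact-worker \
  --requests=cpu=500m,memory=1Gi \
  --limits=cpu=2,memory=4Gi
```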
Logs from [mender-create-artifact-worker-f89796ddf-gdmp7]:
time="2024-07-10T07:15:10Z" level=info msg="migrating workflows" caller="mongo.Migrate@migrations.go:39"
time="2024-07-10T07:15:10Z" level=info msg="LoadWorkflows: loading 1 workflows from /etc/workflows/definitions." caller="mongo.(*DataStoreMongo).LoadWorkflows@datastore_mongo.go:181"
time="2024-07-10T07:15:10Z" level=info msg="LoadWorkflows: loading generate_artifact v2." caller="mongo.(*DataStoreMongo).LoadWorkflows@datastore_mongo.go:183"
time="2024-07-10T07:15:10Z" level=info msg="LoadWorkflows: error loading: generate_artifact v2: Workflow already exists." caller="mongo.(*DataStoreMongo).LoadWorkflows@datastore_mongo.go:189"
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=4
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=7
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=5
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=6
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=8
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=2
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=0
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=9
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=1
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=3
time="2024-07-10T07:17:58Z" level=info msg="Worker: processing job 668e35a6884c994f38c42610 workflow generate_artifact" caller="worker.workerMain@worker.go:178" worker_id=4
time="2024-07-10T07:17:58Z" level=info msg="668e35a6884c994f38c42610: started, generate_artifact" caller="worker.processJob@process.go:61" worker_id=4
time="2024-07-10T07:17:58Z" level=info msg="668e35a6884c994f38c42610: started, generate_artifact task :Run create_artifact CLI" caller="worker.processJob@process.go:65" worker_id=4
time="2024-07-10T07:18:52Z" level=info msg="Worker: processing job 668e35a6884c994f38c42610 workflow generate_artifact" caller="worker.workerMain@worker.go:178" worker_id=7
time="2024-07-10T07:18:52Z" level=info msg="668e35a6884c994f38c42610: started, generate_artifact" caller="worker.processJob@process.go:61" worker_id=7
time="2024-07-10T07:18:52Z" level=info msg="668e35a6884c994f38c42610: started, generate_artifact task :Run create_artifact CLI" caller="worker.processJob@process.go:65" worker_id=7
I can see in the logs that "Run create_artifact CLI" is called but it never reaches "Done", unlike what happens when I upload an artifact of around 100 MB.
After some time all previous logs are cleared from the artifacts pod and I can only see new workers being spawned, for example: time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=4
I can verify there is CPU & memory utilization while the artifact workers are running, but after some time, when the logs are cleared, CPU & memory usage returns to normal.
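A quick way to confirm whether the worker pod is actually being restarted, and whether it was OOM-killed, would be something like this (pod name from my cluster):

```bash
# Check the restart count and the last termination reason of the worker pod.
# Add -n <namespace> if Mender is installed in a dedicated namespace.
POD=mender-create-artifact-worker-f89796ddf-gdmp7   # pod name from my cluster
kubectl get pod "$POD"                               # look at the RESTARTS column
kubectl get pod "$POD" -o \
  jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'  # e.g. OOMKilled
```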
I’m running Mender Server 3.7.5 on a single-node MicroK8s cluster on a QEMU VM with 8 CPU cores, 8 GB RAM & a 100 GB disk, so there should be plenty of resources available.
Any idea what else to check?
FYI, the MinIO ingress configuration in the docs needs some extra annotations to work with large files:
nginx.ingress.kubernetes.io/proxy-body-size: '0'
nginx.ingress.kubernetes.io/proxy-buffering: 'off'
nginx.ingress.kubernetes.io/proxy-read-timeout: '300'
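In case anyone else hits this, the same annotations can also be applied to an already-deployed MinIO ingress; the ingress name below is a placeholder, check kubectl get ingress for the real one:

```bash
# Add the large-upload annotations to the MinIO ingress.
# "minio" is a placeholder ingress name; check `kubectl get ingress` for yours.
kubectl annotate ingress minio --overwrite \
  nginx.ingress.kubernetes.io/proxy-body-size='0' \
  nginx.ingress.kubernetes.io/proxy-buffering='off' \
  nginx.ingress.kubernetes.io/proxy-read-timeout='300'
```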