Mender Server Create Artifacts issue with large files

Hi @robgio

I’m having an issue with artifact generation for large files (500 MB+, tested up to 2 GB).
Although a .tmp file of the same size is created in MinIO, the create-artifact worker never succeeds in creating the final artifact, and the .tmp file gets deleted after some time.

Steps to reproduce:

  1. The artifact file is successfully uploaded from the server’s web UI.
  2. A .tmp file is created in MinIO (I can verify it from the MinIO web UI in the artifacts bucket).
  3. The generate_artifact task is started on Kubernetes but never finishes. After some time the create-artifact pod’s logs are cleared and it looks like the pod restarted, with no errors in its logs.
  4. The .tmp file from step 2 suddenly disappears from MinIO, but system disk usage keeps growing, so maybe it is stuck in /tmp inside the create-artifact container? (See the commands sketched after this list.)
  5. No error is recorded in the artifacts or deployments pods; all seems fine!
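
A rough sketch of the commands I would use to check that (the mender namespace and the pod name taken from the logs below are from my setup and may differ in yours):

# Was the worker pod restarted or evicted? (assumed namespace: mender)
kubectl -n mender get pods
kubectl -n mender describe pod mender-create-artifact-worker-f89796ddf-gdmp7
# Logs of the previous (restarted) container instance
kubectl -n mender logs --previous mender-create-artifact-worker-f89796ddf-gdmp7
# Scratch space usage inside the worker container (df/du may not exist in a minimal image)
kubectl -n mender exec mender-create-artifact-worker-f89796ddf-gdmp7 -- df -h /tmp
kubectl -n mender exec mender-create-artifact-worker-f89796ddf-gdmp7 -- du -sh /tmp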

Things I tried unsuccessfully:

  1. I modified the create-artifact worker YAML in Kubernetes and increased the CPU & memory limits (see the excerpt after this list). I can verify the settings work because I see higher CPU & memory usage while the create-artifact workers run, but the same issue happens…
  2. I uploaded big files (500 MB, 1 GB, 1.5 GB) directly to MinIO from the MinIO web UI and they all uploaded fine, so no issue there.
  3. I also tried using mender artifacts at tag:mender-3.7-4; still the same issue.
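
For reference, the resource tweak from item 1 looked roughly like this (the deployment and container names are from my setup, and the values are just what I experimented with):

# Excerpt from the mender-create-artifact-worker Deployment spec
spec:
  template:
    spec:
      containers:
        - name: create-artifact-worker
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi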

Logs from [mender-create-artifact-worker-f89796ddf-gdmp7]:

time="2024-07-10T07:15:10Z" level=info msg="migrating workflows" caller="mongo.Migrate@migrations.go:39"
time="2024-07-10T07:15:10Z" level=info msg="LoadWorkflows: loading 1 workflows from /etc/workflows/definitions." caller="mongo.(*DataStoreMongo).LoadWorkflows@datastore_mongo.go:181"
time="2024-07-10T07:15:10Z" level=info msg="LoadWorkflows: loading generate_artifact v2." caller="mongo.(*DataStoreMongo).LoadWorkflows@datastore_mongo.go:183"
time="2024-07-10T07:15:10Z" level=info msg="LoadWorkflows: error loading: generate_artifact v2: Workflow already exists." caller="mongo.(*DataStoreMongo).LoadWorkflows@datastore_mongo.go:189"
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=4
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=7
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=5
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=6
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=8
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=2
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=0
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=9
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=1
time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=3
time="2024-07-10T07:17:58Z" level=info msg="Worker: processing job 668e35a6884c994f38c42610 workflow generate_artifact" caller="worker.workerMain@worker.go:178" worker_id=4
time="2024-07-10T07:17:58Z" level=info msg="668e35a6884c994f38c42610: started, generate_artifact" caller="worker.processJob@process.go:61" worker_id=4
time="2024-07-10T07:17:58Z" level=info msg="668e35a6884c994f38c42610: started, generate_artifact task :Run create_artifact CLI" caller="worker.processJob@process.go:65" worker_id=4
time="2024-07-10T07:18:52Z" level=info msg="Worker: processing job 668e35a6884c994f38c42610 workflow generate_artifact" caller="worker.workerMain@worker.go:178" worker_id=7
time="2024-07-10T07:18:52Z" level=info msg="668e35a6884c994f38c42610: started, generate_artifact" caller="worker.processJob@process.go:61" worker_id=7
time="2024-07-10T07:18:52Z" level=info msg="668e35a6884c994f38c42610: started, generate_artifact task :Run create_artifact CLI" caller="worker.processJob@process.go:65" worker_id=7

I can see in the logs that "Run create_artifact CLI" is called but is never reported as Done, unlike what happens when I upload an artifact of around 100 MB.

After some time all previous logs are cleared from the create-artifact pod and I can only see new workers being spawned, for example: time="2024-07-10T07:15:10Z" level=info msg="worker starting up" caller="worker.InitAndRun.func2@worker.go:114" worker_id=4

I can verify there is CPU & memory utilization while the create-artifact workers are running, but after some time, when the logs are cleared, CPU & memory usage returns to normal.

I’m running Mender Server 3.7.5 on a single-node MicroK8s cluster on a QEMU VM with 8 CPU cores, 8 GB RAM & a 100 GB disk, so there are plenty of resources available.

Any idea on what else to check?

FYI, the MinIO ingress configuration in the docs needs some extra settings to work with large files:

nginx.ingress.kubernetes.io/proxy-body-size: '0'
nginx.ingress.kubernetes.io/proxy-buffering: 'off'
nginx.ingress.kubernetes.io/proxy-read-timeout: '300'
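
In my setup these go under the annotations of the MinIO Ingress resource, roughly like this (the resource name is from my installation, and the rest of the spec is omitted):

# Excerpt from the MinIO Ingress manifest
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minio
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: '0'
    nginx.ingress.kubernetes.io/proxy-buffering: 'off'
    nginx.ingress.kubernetes.io/proxy-read-timeout: '300'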

Hi @sakisd,

I’ve got information from the Mender Server team that in such a situation the uploaded file exceeds the ephemeral storage of the pod, which results in the pod being killed and the process failing.
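
If you want to keep generating the artifact on the server side, one thing you could try (untested on my side) is giving the worker pod more scratch space, roughly along these lines; the container name and the /tmp mount are assumptions about your deployment:

# Rough, unverified sketch for the create-artifact worker Deployment
spec:
  template:
    spec:
      containers:
        - name: create-artifact-worker
          resources:
            requests:
              ephemeral-storage: 4Gi
            limits:
              ephemeral-storage: 8Gi
          volumeMounts:
            - name: scratch
              mountPath: /tmp
      volumes:
        - name: scratch
          emptyDir:
            sizeLimit: 8Gi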

The easiest and most straightforward solution is to use mender-artifact locally.
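
A rough sketch of that local flow (the artifact name, device type, and file path are placeholders; check mender-artifact write module-image --help for the exact flags on your version, and note that the single-file-artifact-gen helper from the Mender docs wraps the metadata the single-file Update Module expects):

# Build the artifact locally instead of letting the server-side worker do it
mender-artifact write module-image \
  -T single-file \
  -n my-large-file-v1 \
  -t my-device-type \
  -o my-large-file-v1.mender \
  -f ./my-large-file.bin

# Then upload the resulting .mender file as a ready-made Artifact in the Releases UI.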

Greets,
Josef