Artifact name not updated in mender server after successful mender update due to 401 error despite correct key being accepted in ui

sorry for long title, and need for some background:

am migrating from self-hosted 2.4 server to a paid hosted plan, a previous test I did for this part of the migration (configuring the device to use both servers, rejecting from the old server after the mender update, and accepting on the new server) was successful

mender-client is version 2.6 (I know it is a bit old, but I didn’t see any relevant bugs listed on the Mender client page of the Mender documentation; I am working on updating to something newer, but am trying to start this migration before that is ready)

If this is a bug related to the versions being old I can work around it, since after a while the device appeared as pending in hosted mender, it would just require more work for the migration since I can’t rely on checking artifact name in self-hosted mender to know what devices need to be rejected and accepted.

some relevant error messages:

 root@osname:/data/mender# mender send-inventory && journalctl -fu mender-client
...
Sep 12 06:43:23 osname mender[2253]: time="2024-09-12T06:43:23Z" level=info msg="Forced wake-up from sleep"
Sep 12 06:43:23 osname mender[2253]: time="2024-09-12T06:43:23Z" level=info msg="Forcing state machine to: inventory-update"
Sep 12 06:43:23 osname mender[2253]: time="2024-09-12T06:43:23Z" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]"
Sep 12 06:43:26 osname mender[2253]: time="2024-09-12T06:43:26Z" level=warning msg="Server \"https://mender-internal.selfhosted.com\" failed to serve request \"/api/devices/v1/inventory/device/attributes\". Attempting \"https://eu.hosted.mender.io\""
Sep 12 06:43:26 osname mender[2253]: time="2024-09-12T06:43:26Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 12 06:43:28 osname mender[2253]: time="2024-09-12T06:43:28Z" level=info msg="successfully received new authorization data"
Sep 12 06:43:28 osname mender[2253]: time="2024-09-12T06:43:28Z" level=error msg="(request_id: ): Got unexpected HTTP status when submitting to inventory 401"
Sep 12 06:43:28 osname mender[2253]: time="2024-09-12T06:43:28Z" level=warning msg="Failed to refresh inventory: failed to submit inventory data: (request_id: ): Got unexpected HTTP status when submitting to inventory 401"
Sep 12 06:43:28 osname mender[2253]: time="2024-09-12T06:43:28Z" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]"
Sep 12 06:43:28 osname mender[2253]: time="2024-09-12T06:43:28Z" level=info msg="Executing script: Idle_Enter_01_remove-deployment-flags"
Sep 12 06:43:28 osname osname-Idle-Enter-01-remove-deployment-flags[18234]: Enter idle state
Sep 12 06:43:28 osname mender[2253]: time="2024-09-12T06:43:28Z" level=info msg="Collected output (stderr) while running script /etc/mender/scripts/Idle_Enter_01_remove-deployment-flags\nEnter idle state\n\n---------- end of script output"

I wasn’t able to find any messages in the server logs about this, and I also couldn’t find any messages in the server logs when I run the same command on another device and it succeeds. Not sure if there is a specific container I should check or some setting I need to enable for more verbose logging or something.

root@arkkios:/data/mender# curl -X POST https://mender-internal.selfhosted.com/api/devices/v1/authentication/auth_requests -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'X-MEN-Signature: string' -d '{"id_data": "{\"os_device_id\":\"3f9e2b13-e836-42e6-ad22-012a39032053\"}", "pubkey": "-----BEGIN PUBLIC KEY-----\n...\n...\n...\n...\n...\n...\n...\n-----END PUBLIC KEY-----\n"}'
{"error":"signature verification failed","request_id":"8fc3f545-f6cf-49c8-a00b-a30171d5645a"}

Here I was trying to see if I could curl the API endpoint giving the 401 error to get more information, but I can’t get a token for some reason. I did verify that the public key I used is the same as in the UI and can be generated from the private key on the device.
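The "signature verification failed" response is most likely caused by the literal placeholder in `X-MEN-Signature: string`: that header is meant to carry a base64-encoded signature of the exact request body, made with the device's private key. Below is a minimal sketch of producing such a header. It assumes an RSA device key, a SHA-256 signature as produced by `openssl dgst -sha256 -sign`, and that `openssl` is available; the function names and the `key_path` argument are hypothetical and not part of any Mender tooling.

```python
import base64
import json
import subprocess


def build_auth_body(id_data: dict, pubkey_pem: str, tenant_token: str = "") -> bytes:
    """Serialize an auth request; id_data is sent as a JSON-encoded string."""
    body = {
        "id_data": json.dumps(id_data, separators=(",", ":")),
        "pubkey": pubkey_pem,
    }
    if tenant_token:
        # Only needed for hosted Mender, per the TenantToken setting.
        body["tenant_token"] = tenant_token
    return json.dumps(body).encode("utf-8")


def sign_body(body: bytes, key_path: str) -> str:
    """Sign the exact request bytes with openssl and return base64 text.

    Assumption: key_path points at the device's private key in PEM form.
    """
    signature = subprocess.run(
        ["openssl", "dgst", "-sha256", "-sign", key_path],
        input=body,
        capture_output=True,
        check=True,
    ).stdout
    return base64.b64encode(signature).decode("ascii")
```

The bytes that are signed must be the bytes that are sent, so the body should be written to a file and posted unchanged (for example with `curl --data-binary @body.json`) together with the resulting `X-MEN-Signature` value.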

Also, for confirmation that these credentials are accepted in ui:

>>> a = requests.get("https://api.selfhosted.com/mender-api/api/management/v2/devauth/devices/32eaeadd-f342-4359-85c3-ca72ff75a414", headers=headers)
>>> pprint(a.json())
{'auth_sets': [{'id': '0a0d3183-74be-4749-9d6c-9e873dfd71bc',
                'identity_data': {'os_device_id': '3f9e2b13-e836-42e6-ad22-012a39032053'},
                'pubkey': '-----BEGIN PUBLIC KEY-----\n'
                          '...\n'
                          '...\n'
                          '...\n'
                          '...\n'
                          '...\n'
                          '...\n'
                          '...\n'
                          '...\n'
                          '...\n'
                          '-----END PUBLIC KEY-----\n',
                'status': 'accepted',
                'ts': '2024-09-10T12:01:15.857Z'}],
 'created_ts': '2024-09-10T12:01:15.839Z',
 'decommissioning': False,
 'id': '32eaeadd-f342-4359-85c3-ca72ff75a414',
 'identity_data': {'os_device_id': '3f9e2b13-e836-42e6-ad22-012a39032053'},
 'status': 'accepted',
 'updated_ts': '2024-09-12T05:41:44.691Z'}

This is for a custom board with yocto based os. Didn’t have the same problem the first time I tried this migration, but it is possible i’ve misconfigured something since then, though there shouldn’t be any significant changes. I don’t see any configuration other than the old server url that would be relevant here anyway.

Hello @threesc :wave:

Could you elaborate exactly what steps you did to migrate the client to the new server?

Please note that in order to connect with hosted Mender, you’ll also need to configure the “TenantToken” in /etc/mender/mender.conf for the server to recognize the device under your account. You’ll find the token in the organization and billing tab of your hosted Mender account: https://eu.hosted.mender.io/ui/settings/organization-and-billing. The tenant token will be ignored by your self-hosted server.

I think only step at this point for the mender migration is that I have done a (full) mender update which has a new /etc/mender/mender.conf that looks like (after reformatting):

{
    "UpdatePollIntervalSeconds": 1800,
    "ArtifactVerifyKey": "/etc/mender/artifact-verify-key.pem",
    "RetryPollIntervalSeconds": 300,
    "TenantToken": "............",
    "InventoryPollIntervalSeconds": 28800,
    "Servers": [{
            "ServerURL": "https://mender-internal.selfhosted.com"
        }, {
            "ServerURL": "https://eu.hosted.mender.io"
        }
    ]
}

Also done in that update: added support for delta updates (including never remounting the ro rootfs) and a few other updates not related to mender.

Following steps for migration will be:
2) reject device in old mender server
3) accept device in hosted mender
4) mender update to new image without self hosted mender url configured
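Steps 2 and 3 can also be scripted against the device authentication management API instead of clicking through the UI. The sketch below only builds the request for changing an authentication set's status; it assumes the `PUT .../devauth/devices/{id}/auth/{aid}/status` management endpoint is available on the server version in use, and `base_url`, `token` and the function name are placeholders to be adapted (for instance, a setup may serve the API under an extra path prefix).

```python
import json
import urllib.request


def auth_status_request(base_url, device_id, auth_set_id, status, token):
    """Build (but do not send) a request that sets an auth set's status."""
    if status not in ("accepted", "rejected", "pending"):
        raise ValueError(f"unsupported status: {status}")
    url = (
        f"{base_url.rstrip('/')}/api/management/v2/devauth/devices/"
        f"{device_id}/auth/{auth_set_id}/status"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"status": status}).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending is then a matter of passing the returned object to `urllib.request.urlopen`, once for "rejected" on the old server and once for "accepted" on the new one, each with that server's own device ID, auth set ID and token.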

Your migration steps look correct.

Could you check the log when the mender client starts after booting the new deployment?
Could you also double check that the identity data and the device’s private key don’t change after the update?
For example, if the private key is removed as part of the update, the device will automatically replace it which in turn will fail the update.
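One way to make the key check less error-prone than comparing PEM text by eye is to compare fingerprints of the decoded key bytes, which ignores differences in line wrapping or trailing newlines. This is a minimal sketch; the helper names are hypothetical, and it assumes both inputs are single PEM blocks such as the `pubkey` field returned by the server and the public key exported from the device's private key.

```python
import base64
import hashlib


def pem_fingerprint(pem: str) -> str:
    """Return the SHA-256 hex digest of the DER bytes inside a PEM block."""
    lines = [line.strip() for line in pem.strip().splitlines()]
    # Drop the BEGIN/END markers and join the base64 payload.
    payload = "".join(
        line for line in lines if line and not line.startswith("-----")
    )
    return hashlib.sha256(base64.b64decode(payload)).hexdigest()


def same_key(pem_a: str, pem_b: str) -> bool:
    """True when both PEM blocks encode the same key bytes."""
    return pem_fingerprint(pem_a) == pem_fingerprint(pem_b)
```

Recording the fingerprint before the update and again after the reboot shows directly whether the client regenerated its key.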

Also, relevant containers to monitor on your local server:

  • deviceauth - all authentication requests and verification of authorization tokens are logged by this service
  • inventory - serves the inventory API
  • deployments - serves the deployments APIs relevant for device updates
  • mender-api-gateway (traefik) - Routes all requests to the appropriate service

Have already checked public key accepted on self-hosted mender matches the one I can generate from the private key on the device, so I’m pretty sure that is the same key as was there before, and identity data (this os_device_id) is the same as well on both servers.

Is there any way to filter those logs for messages related to a specific device? I’m fairly sure there isn’t any message about that /api/devices/v1/inventory/device/attributes call (a PATCH, I think), but there are too many devices writing messages there for me to otherwise filter out which messages are about this specific device.

Also some messages have the device IP included (looks to be apigateway calls for v1/authentication/auth_requests and a deployment endpoint), but I don’t find anything searching by the IP for this specific device (though I would expect this “attempting reauthorization … successfully retrieved new authorization data” to correspond to an authentication request).

logs after a reboot look like:

root@osnameos:~# journalctl -fu mender-client
-- Logs begin at Tue 2024-09-17 04:36:56 UTC. --
Sep 17 04:38:01 osnameos mender[2110]: time="2024-09-17T04:38:01Z" level=info msg="Loaded configuration file: /etc/mender/mender.conf"
Sep 17 04:38:01 osnameos mender[2110]: time="2024-09-17T04:38:01Z" level=info msg="Mender running on partition: /dev/mmcblk3p3"
Sep 17 04:38:01 osnameos mender[2110]: time="2024-09-17T04:38:01Z" level=info msg="State transition: init [none] -> init [none]"
Sep 17 04:38:01 osnameos mender[2110]: time="2024-09-17T04:38:01Z" level=info msg="State transition: init [none] -> idle [Idle]"
Sep 17 04:38:01 osnameos mender[2110]: time="2024-09-17T04:38:01Z" level=info msg="Executing script: Idle_Enter_01_remove-deployment-flags"
Sep 17 04:38:02 osnameos xxxxxx-Idle-Enter-01-remove-deployment-flags[2238]: Enter idle state
Sep 17 04:38:03 osnameos mender[2110]: time="2024-09-17T04:38:03Z" level=info msg="Collected output (stderr) while running script /etc/mender/scripts/Idle_Enter_01_remove-deployment-flags\nEnter idle state\n\n---------- end of script output"
Sep 17 04:38:03 osnameos mender[2110]: time="2024-09-17T04:38:03Z" level=info msg="State transition: idle [Idle] -> check-wait [Idle]"
Sep 17 04:38:03 osnameos mender[2110]: time="2024-09-17T04:38:03Z" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]"
Sep 17 04:38:16 osnameos mender[2110]: time="2024-09-17T04:38:16Z" level=warning msg="Server \"https://mender-internal.self-hosted.com\" failed to serve request \"/api/devices/v1/inventory/device/attributes\". Attempting \"https://eu.hosted.mender.io\""
Sep 17 04:38:18 osnameos mender[2110]: time="2024-09-17T04:38:18Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 17 04:38:21 osnameos mender[2110]: time="2024-09-17T04:38:21Z" level=info msg="successfully received new authorization data"
Sep 17 04:38:21 osnameos mender[2110]: time="2024-09-17T04:38:21Z" level=error msg="(request_id: ): Got unexpected HTTP status when submitting to inventory 401"
Sep 17 04:38:21 osnameos mender[2110]: time="2024-09-17T04:38:21Z" level=warning msg="Failed to refresh inventory: failed to submit inventory data: (request_id: ): Got unexpected HTTP status when submitting to inventory 401"
Sep 17 04:38:21 osnameos mender[2110]: time="2024-09-17T04:38:21Z" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]"
Sep 17 04:38:21 osnameos mender[2110]: time="2024-09-17T04:38:21Z" level=info msg="Executing script: Idle_Enter_01_remove-deployment-flags"
Sep 17 04:38:22 osnameos xxxxxx-Idle-Enter-01-remove-deployment-flags[2742]: Enter idle state
Sep 17 04:38:23 osnameos mender[2110]: time="2024-09-17T04:38:23Z" level=info msg="Collected output (stderr) while running script /etc/mender/scripts/Idle_Enter_01_remove-deployment-flags\nEnter idle state\n\n---------- end of script output"
Sep 17 04:38:23 osnameos mender[2110]: time="2024-09-17T04:38:23Z" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]"
Sep 17 04:38:25 osnameos mender[2110]: time="2024-09-17T04:38:25Z" level=warning msg="Server \"https://mender-internal.self-hosted.com\" failed to serve request \"/api/devices/v1/deployments/device/deployments/next\". Attempting \"https://eu.hosted.mender.io\""
Sep 17 04:38:25 osnameos mender[2110]: time="2024-09-17T04:38:25Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 17 04:38:26 osnameos mender[2110]: time="2024-09-17T04:38:26Z" level=info msg="successfully received new authorization data"
Sep 17 04:38:28 osnameos mender[2110]: time="2024-09-17T04:38:28Z" level=info msg="State transition: update-check [Sync] -> check-wait [Idle]"
Sep 17 04:38:28 osnameos mender[2110]: time="2024-09-17T04:38:28Z" level=info msg="Executing script: Idle_Enter_01_remove-deployment-flags"
Sep 17 04:38:29 osnameos xxxxxx-Idle-Enter-01-remove-deployment-flags[2869]: Enter idle state
Sep 17 04:38:29 osnameos mender[2110]: time="2024-09-17T04:38:29Z" level=info msg="Collected output (stderr) while running script /etc/mender/scripts/Idle_Enter_01_remove-deployment-flags\nEnter idle state\n\n---------- end of script output"

I’ll try later today (or maybe tomorrow if some other work goes poorly) to reflash another device and see if the issue reproduces or not, though not sure if that helps anything.

Same thing occurred on the second device I tried with (though it isn’t yet pending on hosted mender, I think it will probably get there in a bit). Seems like I must have somehow done something wrong with the configuration somewhere.

All authorized requests will log the subject of the calling client. In terms of device APIs, there is a log field device_id which contains the unique ID of the device, so you can pipe your logs through grep(1) to filter on it. Depending on whether you’re using docker compose or kubernetes you could try:

docker logs integration-mender-inventory-1 | grep 'device_id=<device ID from UI>'
kubectl logs deploy/mender-inventory | grep 'device_id=<device ID from UI>'

If it doesn’t show up in https://eu.hosted.mender.io then it is likely that either the TenantToken in /etc/mender/mender.conf is incorrect or the device changed identity. The device also won’t show up as pending in https://eu.hosted.mender.io before the device has been rejected on https://mender-internal.selfhosted.com, but make sure it’s not rejected until the device has successfully completed the deployment.

To better help me understand the context, could you also share what version of the Mender server you’re running? That is, which Docker tags are your mender containers using?

I did some digging through the client and backend code, and this looks a bit suspicious. It seems like the first call to your self-hosted instance fails with a different status code than 401 (Unauthorized).
Could you check the output of the script and verify that the key/value attributes produced does not exceed 1024 characters?
There might be a better explanation in the logs for the inventory service.
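For the length check, a small parser over the inventory script output can flag long attributes. The sketch below assumes the scripts print `key=value` lines and that the 1024-character limit applies to each attribute name or value individually; whether the server enforces it per attribute or in some other way is not confirmed here, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_inventory(output: str) -> dict:
    """Parse key=value lines; repeated keys collect into a list."""
    attrs = defaultdict(list)
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            attrs[key.strip()].append(value.strip())
    return dict(attrs)


def oversized(attrs: dict, limit: int = 1024) -> list:
    """Return names of attributes whose name or any value exceeds limit."""
    return [
        name
        for name, values in attrs.items()
        if len(name) > limit or any(len(v) > limit for v in values)
    ]
```

Feeding it the concatenated output of the inventory scripts gives a list of suspicious attribute names, which is more specific than a single total character count.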

Hi @threesc,

Sorry to jump in here in the middle, but I had a hunch when I read about this issue.

Could we be hitting this old client bug fixed in 3.2.0?

Client will no longer cache the Authorization token from the server across restarts, meaning that it is no longer possible to end up in the situation where a rootfs update with invalid authorization data succeeds, only to fail authorization later on when the token expires. (MEN-5217)

What I think is suspicious of your scenario is that the full rootfs update succeeds, to only then get 401 :thinking:

Lluis

First, couple more things I found testing:

  1. reproduced a third time, found when the self-hosted server is taken down I get the exact same error messages (which does make it seem likely I’m not getting a 401 from self-hosted server, i’m getting some other error which isn’t reported)
  2. it doesn’t appear as pending in hosted mender until the self-hosted server is taken down (I haven’t yet tried rejecting the device when it is failing to send inventory, but could test that at some point)

Actually using aws ecs, but looks like cloudwatch search doesn’t actually work, I’ll try to download and analyze it a bit later (probably tomorrow) but downloading cloudwatch stuff is always a pain, so I finish this reply first.

I think the tag we are using is this “mender-2.4.0b1”, but if that number is a nonsense answer to the question you are asking it means i’m looking wrong place, wasn’t here when we cloned the containers.
Also that is from source code, not 100% sure it matches what the server is actually running, but lower left corner in gui looks like this:
[screenshot]

Not totally sure what script you mean to run, but executing every file in /usr/share/mender/inventory and appending the results together gives a character count of 973 (for the one device I checked on), so if almost anything needs escaped it will go over 1024
Also, would make sense for that length to be a problem since I now have longer artifact name than when I did the first test. (And there is now at least a new update module which wasn’t included before.)
Did some quick tests to reduce the size (now got a character count of 625) and unfortunately didn’t seem to help anything.

Is that token cache somewhere I can check and delete or clear it?

I’ll try to download the relevant server logs now and probably check them tomorrow morning.

took a while but finally got some (maybe) relevant logs (corresponding to running mender send-inventory):

2024-09-19T06:56:24.508000+00:00 mender-device-auth/mender-device-auth/253ee71461754d19a9c49a57c90b4e72 time="2024-09-19T06:56:24Z" level=info msg="200 5988μs POST /api/internal/v1/devauth/tokens/verify HTTP/1.0 - Go-http-client/1.1" byteswritten=0 device_id=0b1b25c4-27a8-44a1-b72d-229c655beb14 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=POST path=/api/internal/v1/devauth/tokens/verify plan=enterprise qs= request_id=f17da4f2-553c-499a-81c5-b1706f056468 responsetime=0.005988105 status=200 ts="2024-09-19 06:56:24.502554158 +0000 UTC" type=http

2024-09-19T06:56:24.510000+00:00 mender-inventory/mender-inventory/79d8623129fd45dbb007ee3de1ce99dc time="2024-09-19T06:56:24Z" level=info msg="405 353μs PUT /api/0.1.0/attributes HTTP/1.0 - Go-http-client/1.1" byteswritten=59 device_id=0b1b25c4-27a8-44a1-b72d-229c655beb14 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=PUT path=/api/0.1.0/attributes plan=enterprise qs= request_id=f17da4f2-553c-499a-81c5-b1706f056468 responsetime=0.000353186 status=405 ts="2024-09-19 06:56:24.509745899 +0000 UTC" type=http

2024-09-19T06:56:26.927000+00:00 mender-device-auth/mender-device-auth/253ee71461754d19a9c49a57c90b4e72 time="2024-09-19T06:56:26Z" level=info msg="Token ...... assigned to device 0b1b25c4-27a8-44a1-b72d-229c655beb14" file=devauth.go func="devauth.(*DevAuth).SubmitAuthRequest" line=380 request_id=4f4f614d-e5ee-43ac-85e3-18ad53d5b91e sub=65d5c0039291ce78d820749a tenant_id=......

2024-09-19T06:56:26.928000+00:00 mender-device-auth/mender-device-auth/253ee71461754d19a9c49a57c90b4e72 time="2024-09-19T06:56:26Z" level=info msg="200 126652μs POST /api/devices/v1/authentication/auth_requests HTTP/1.0 - Go-http-client/1.1" byteswritten=671 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=POST path=/api/devices/v1/authentication/auth_requests qs= request_id=4f4f614d-e5ee-43ac-85e3-18ad53d5b91e responsetime=0.126652336 status=200 sub=65d5c0039291ce78d820749a tenant_id=...... ts="2024-09-19 06:56:26.801964847 +0000 UTC" type=http

(4th message obtained by checking the request_id values)
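Correlating lines by request_id, as done by hand above, can be automated once the logs are downloaded. This is a rough sketch assuming the `key=value` log format shown in these excerpts; the function names are hypothetical. It groups lines by `request_id` and keeps every group in which the device ID appears anywhere, so lines that lack a `device_id` field (such as the auth_requests entry) are still picked up through a sibling line.

```python
import shlex
from collections import defaultdict


def parse_fields(line: str) -> dict:
    """Extract key=value fields from one log line; other tokens are skipped."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as having no parsable fields.
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep and key.isidentifier():
            fields[key] = value
    return fields


def requests_for_device(lines, device_id):
    """Group lines by request_id, keeping groups that mention device_id."""
    by_request = defaultdict(list)
    for line in lines:
        request_id = parse_fields(line).get("request_id")
        if request_id:
            by_request[request_id].append(line)
    return {
        rid: group
        for rid, group in by_request.items()
        if any(device_id in entry for entry in group)
    }
```

Because the match is a plain substring test, it is meant for full UUID-style device IDs rather than short strings that could occur by coincidence.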

checked also on a device where mender send-inventory && journalctl -fu mender-client didn’t give any kind of error (with self-hosted mender server only):

2024-09-19T07:24:07.687000+00:00 mender-device-auth/mender-device-auth/253ee71461754d19a9c49a57c90b4e72 time="2024-09-19T07:24:07Z" level=info msg="200 10748μs POST /api/internal/v1/devauth/tokens/verify HTTP/1.0 - Go-http-client/1.1" byteswritten=0 device_id=94362bd3-c1fa-4708-aa57-ddac8c0d05d3 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=POST path=/api/internal/v1/devauth/tokens/verify plan=enterprise qs= request_id=0d6f9c84-56e1-4636-83cb-1deb7e1abc14 responsetime=0.010748207 status=200 ts="2024-09-19 07:24:07.67640539 +0000 UTC" type=http

2024-09-19T07:24:07.689000+00:00 mender-inventory/mender-inventory/79d8623129fd45dbb007ee3de1ce99dc time="2024-09-19T07:24:07Z" level=info msg="405 280μs PUT /api/0.1.0/attributes HTTP/1.0 - Go-http-client/1.1" byteswritten=59 device_id=94362bd3-c1fa-4708-aa57-ddac8c0d05d3 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=PUT path=/api/0.1.0/attributes plan=enterprise qs= request_id=0d6f9c84-56e1-4636-83cb-1deb7e1abc14 responsetime=0.000280426 status=405 ts="2024-09-19 07:24:07.688772442 +0000 UTC" type=http

2024-09-19T07:24:07.875000+00:00 mender-device-auth/mender-device-auth/253ee71461754d19a9c49a57c90b4e72 time="2024-09-19T07:24:07Z" level=info msg="200 6512μs POST /api/internal/v1/devauth/tokens/verify HTTP/1.0 - Go-http-client/1.1" byteswritten=0 device_id=94362bd3-c1fa-4708-aa57-ddac8c0d05d3 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=POST path=/api/internal/v1/devauth/tokens/verify plan=enterprise qs= request_id=a4e3f1cd-249b-4148-9a01-3fd81f3e67dc responsetime=0.006512659 status=200 ts="2024-09-19 07:24:07.86899438 +0000 UTC" type=http

2024-09-19T07:24:07.878000+00:00 mender-inventory/mender-inventory/79d8623129fd45dbb007ee3de1ce99dc time="2024-09-19T07:24:07Z" level=info msg="200 1471μs PATCH /api/0.1.0/attributes HTTP/1.0 - Go-http-client/1.1" byteswritten=0 device_id=94362bd3-c1fa-4708-aa57-ddac8c0d05d3 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=71 method=PATCH path=/api/0.1.0/attributes plan=enterprise qs= request_id=a4e3f1cd-249b-4148-9a01-3fd81f3e67dc responsetime=0.001471219 status=200 ts="2024-09-19 07:24:07.87684208 +0000 UTC" type=http

Those logs don’t seem very useful to me, i’m also not sure why they have this “plan=enterprise” even for the older device which shouldn’t have anything to do with mender enterprise. looks to me like the device is just asking to reauthenticate instead of sending the PATCH request, but from these logs I don’t have any more idea why.

Was testing a few more things and found if I changed my mender.conf to this:

{
    "UpdatePollIntervalSeconds": 1800,
    "ArtifactVerifyKey": "/etc/mender/artifact-verify-key.pem",
    "RetryPollIntervalSeconds": 300,
    "TenantToken": "............",
    "InventoryPollIntervalSeconds": 28800,
    "Servers": [{
            "ServerURL": "https://mender-internal.selfhosted.com"
        }
    ]
}

and it did manage to send inventory to that self-hosted server.

I also already tried an extra / at the end since that was mentioned in documentation, but it didn’t help anything.

after a bit more testing that I really thought a waste of time, I found this configuration worked for me:

{
    "UpdatePollIntervalSeconds": 1800,
    "ArtifactVerifyKey": "/etc/mender/artifact-verify-key.pem",
    "RetryPollIntervalSeconds": 300,
    "TenantToken": "............",
    "InventoryPollIntervalSeconds": 28800,
    "Servers": [{
            "ServerURL": "https://eu.hosted.mender.io"
        }, {
            "ServerURL": "https://mender-internal.selfhosted.com"
        }
    ]
}
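The only difference from the failing configuration is the order of the entries in "Servers". If the file has to be adjusted on several images, the reordering can be scripted; the sketch below is one hypothetical way to do it and assumes mender.conf is plain JSON with a "Servers" list of objects that each have a "ServerURL" key, as in the examples here.

```python
import json


def server_urls(conf_text: str) -> list:
    """Return the configured server URLs in the order the file lists them."""
    conf = json.loads(conf_text)
    return [server["ServerURL"] for server in conf.get("Servers", [])]


def move_first(conf_text: str, url: str) -> str:
    """Return the config with the server matching url moved to the front."""
    conf = json.loads(conf_text)
    servers = conf.get("Servers", [])
    # sorted() is stable, so the other servers keep their relative order.
    conf["Servers"] = sorted(servers, key=lambda s: s["ServerURL"] != url)
    return json.dumps(conf, indent=4)
```

All other keys, including "TenantToken", pass through unchanged.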

I guess it is good enough for me that it works now. I checked with that device that it also correctly connects to hosted mender with that mender.conf file and puts inventory there. On Monday I will check the device where I haven’t mangled the inventory, and then also the full update path.

Awesome! Glad you made it work. I thought that putting hosted Mender first might fail the deployment, but seems like it works just fine.
One thing to keep in mind is that you can’t accept the device before the deployment has finished successfully (otherwise it will roll back because the deployment doesn’t exist on hosted Mender). This means that you cannot preauthorize the device using this server ordering.

Inspecting the logs:

It seems like you’re running a pretty old version of the Mender server. The client (device) is first trying to use an endpoint that has not yet been implemented, and instead of falling back to the older endpoint, it tries hosted Mender first. This is why you never see the inventory attributes on your self-hosted server.
You could also try to upgrade your server installation to mender-2.6 which is where this endpoint was first introduced (don’t forget to back up the database before you upgrade). Otherwise, if this configuration works don’t bother :slight_smile:

I had some problems so took a while to test.

But now it has had that problem that it seems to only try to authenticate to eu.mender.io, the device is pending already there and the deployment is stuck downloading at 69% (which is where it stops when the devices reboot), and the log on the device looks pretty similar as before, but with self-hosted and eu.mender.io servers switched around.

Also I mention that it looks like the mender commit worked, even after 2 reboots there is still the new artifact which should be trying to connect to both servers.

At least this way I think I can workaround it relatively easy, but could anyway check if someone has some idea for a proper fix?

It still wasn’t totally clear what you wanted me to check about inventory if that is even relevant anymore, if I was able to check something about that cache lluiscampos mentioned?

For reference, I also include logs here for completion, they are actually a bit different in this case, but I can say for sure now that it is pending in eu.mender.io and shouldn’t be authorized there:

root@osos:~# mender send-inventory && journalctl -fu mender-client
-- Logs begin at Tue 2024-09-24 11:09:36 UTC. --
Sep 24 11:11:09 osos mender[2154]: time="2024-09-24T11:11:09Z" level=info msg="successfully received new authorization data"
Sep 24 11:11:09 osos mender[2154]: time="2024-09-24T11:11:09Z" level=warning msg="Server \"https://eu.hosted.mender.io\" failed to serve request \"/api/devices/v1/deployments/device/deployments/next\". Attempting \"https://mender-internal.os.com\""
Sep 24 11:11:11 osos mender[2154]: time="2024-09-24T11:11:11Z" level=info msg="State transition: update-check [Sync] -> check-wait [Idle]"
Sep 24 11:11:11 osos mender[2154]: time="2024-09-24T11:11:11Z" level=info msg="Executing script: Idle_Enter_01_remove-deployment-flags"
Sep 24 11:11:11 osos os-Idle-Enter-01-remove-deployment-flags[2969]: Enter idle state
Sep 24 11:11:12 osos mender[2154]: time="2024-09-24T11:11:12Z" level=info msg="Collected output (stderr) while running script /etc/mender/scripts/Idle_Enter_01_remove-deployment-flags\nEnter idle state\n\n---------- end of script output"
Sep 24 11:19:53 osos mender[2154]: time="2024-09-24T11:19:53Z" level=info msg="Forced wake-up from sleep"
Sep 24 11:19:53 osos mender[2154]: time="2024-09-24T11:19:53Z" level=info msg="Forcing state machine to: inventory-update"
Sep 24 11:19:53 osos mender[2154]: time="2024-09-24T11:19:53Z" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]"
Sep 24 11:19:56 osos mender[2154]: time="2024-09-24T11:19:56Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 24 11:19:57 osos mender[2154]: time="2024-09-24T11:19:57Z" level=warning msg="Failed to authorize \"https://eu.hosted.mender.io\"; attempting \"https://mender-internal.os.com\"."
Sep 24 11:19:58 osos mender[2154]: time="2024-09-24T11:19:58Z" level=info msg="successfully received new authorization data"
Sep 24 11:19:58 osos mender[2154]: time="2024-09-24T11:19:58Z" level=warning msg="Server \"https://eu.hosted.mender.io\" failed to serve request \"/api/devices/v1/inventory/device/attributes\". Attempting \"https://mender-internal.os.com\""
Sep 24 11:20:00 osos mender[2154]: time="2024-09-24T11:20:00Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 24 11:20:00 osos mender[2154]: time="2024-09-24T11:20:00Z" level=warning msg="Failed to authorize \"https://eu.hosted.mender.io\"; attempting \"https://mender-internal.os.com\"."
Sep 24 11:20:01 osos mender[2154]: time="2024-09-24T11:20:01Z" level=info msg="successfully received new authorization data"
Sep 24 11:20:01 osos mender[2154]: time="2024-09-24T11:20:01Z" level=warning msg="Server \"https://eu.hosted.mender.io\" failed to serve request \"/api/devices/v1/inventory/device/attributes\". Attempting \"https://mender-internal.os.com\""
Sep 24 11:20:01 osos mender[2154]: time="2024-09-24T11:20:01Z" level=info msg="Forcing state machine to: inventory-update"
Sep 24 11:20:01 osos mender[2154]: time="2024-09-24T11:20:01Z" level=info msg="State transition: inventory-update [Sync] -> inventory-update [Sync]"
Sep 24 11:20:02 osos mender[2154]: time="2024-09-24T11:20:02Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 24 11:20:03 osos mender[2154]: time="2024-09-24T11:20:03Z" level=warning msg="Failed to authorize \"https://eu.hosted.mender.io\"; attempting \"https://mender-internal.os.com\"."
Sep 24 11:20:03 osos mender[2154]: time="2024-09-24T11:20:03Z" level=info msg="successfully received new authorization data"
Sep 24 11:20:04 osos mender[2154]: time="2024-09-24T11:20:04Z" level=warning msg="Server \"https://eu.hosted.mender.io\" failed to serve request \"/api/devices/v1/inventory/device/attributes\". Attempting \"https://mender-internal.os.com\""
Sep 24 11:20:04 osos mender[2154]: time="2024-09-24T11:20:04Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 24 11:20:04 osos mender[2154]: time="2024-09-24T11:20:04Z" level=warning msg="Failed to authorize \"https://eu.hosted.mender.io\"; attempting \"https://mender-internal.os.com\"."
Sep 24 11:20:05 osos mender[2154]: time="2024-09-24T11:20:05Z" level=info msg="successfully received new authorization data"
Sep 24 11:20:06 osos mender[2154]: time="2024-09-24T11:20:06Z" level=warning msg="Server \"https://eu.hosted.mender.io\" failed to serve request \"/api/devices/v1/inventory/device/attributes\". Attempting \"https://mender-internal.os.com\""
Sep 24 11:20:06 osos mender[2154]: time="2024-09-24T11:20:06Z" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]"
Sep 24 11:20:06 osos mender[2154]: time="2024-09-24T11:20:06Z" level=info msg="Executing script: Idle_Enter_01_remove-deployment-flags"
Sep 24 11:20:06 osos os-Idle-Enter-01-remove-deployment-flags[13921]: Enter idle state
Sep 24 11:20:06 osos mender[2154]: time="2024-09-24T11:20:06Z" level=info msg="Collected output (stderr) while running script /etc/mender/scripts/Idle_Enter_01_remove-deployment-flags\nEnter idle state\n\n---------- end of script output"
Sep 24 11:20:06 osos mender[2154]: time="2024-09-24T11:20:06Z" level=info msg="Forced wake-up from sleep"
Sep 24 11:20:06 osos mender[2154]: time="2024-09-24T11:20:06Z" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]"
Sep 24 11:20:06 osos mender[2154]: time="2024-09-24T11:20:06Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 24 11:20:07 osos mender[2154]: time="2024-09-24T11:20:07Z" level=warning msg="Failed to authorize \"https://eu.hosted.mender.io\"; attempting \"https://mender-internal.os.com\"."
Sep 24 11:20:07 osos mender[2154]: time="2024-09-24T11:20:07Z" level=info msg="successfully received new authorization data"
Sep 24 11:20:08 osos mender[2154]: time="2024-09-24T11:20:08Z" level=warning msg="Server \"https://eu.hosted.mender.io\" failed to serve request \"/api/devices/v1/deployments/device/deployments/next\". Attempting \"https://mender-internal.os.com\""
Sep 24 11:20:08 osos mender[2154]: time="2024-09-24T11:20:08Z" level=info msg="Device unauthorized; attempting reauthorization"
Sep 24 11:20:09 osos mender[2154]: time="2024-09-24T11:20:09Z" level=warning msg="Failed to authorize \"https://eu.hosted.mender.io\"; attempting \"https://mender-internal.os.com\"."
Sep 24 11:20:09 osos mender[2154]: time="2024-09-24T11:20:09Z" level=info msg="successfully received new authorization data"
Sep 24 11:20:10 osos mender[2154]: time="2024-09-24T11:20:10Z" level=warning msg="Server \"https://eu.hosted.mender.io\" failed to serve request \"/api/devices/v1/deployments/device/deployments/next\". Attempting \"https://mender-internal.os.com\""
Sep 24 11:20:11 osos mender[2154]: time="2024-09-24T11:20:11Z" level=info msg="State transition: update-check [Sync] -> check-wait [Idle]"
Sep 24 11:20:11 osos mender[2154]: time="2024-09-24T11:20:11Z" level=info msg="Executing script: Idle_Enter_01_remove-deployment-flags"
Sep 24 11:20:11 osos os-Idle-Enter-01-remove-deployment-flags[14039]: Enter idle state
Sep 24 11:20:11 osos mender[2154]: time="2024-09-24T11:20:11Z" level=info msg="Collected output (stderr) while running script /etc/mender/scripts/Idle_Enter_01_remove-deployment-flags\nEnter idle state\n\n---------- end of script output"

Ok, I’m even more stumped now, I repeated same steps of:
reflash old image → mender update to intermediate image (with reordered servers)

and this time the update finished successfully, and inventory was updated for the device in mender, but it also appeared as pending in hosted mender server.

Anyway I think I will continue the migration with the workaround at this point. It will be easy to reject updated devices later if I need to, and with my knowledge of the mender server (and my memory of when previous coworkers tried to update it in the past) I don’t want to try updating that.