After deploying the production version of mender-server following the instructions in the docs, I ran into a problem creating deployments. First of all, it was taking far too long (about 4 hours without the deployment even starting; it was stuck in pending status). After checking the logs of the mender-deployments container, I found that it is struggling with this error:
level=error msg="error reaching artifact storage service: SerializationError: failed to decode REST XML response
status code: 200, request id:
caused by: XML syntax error on line 8: element <link> closed by </head>"
I am not sure whether the two are related, but I believe so because I was perfectly able to download the artifact and install it using the command below, so I suspect the fault isn’t on the client side. The client also checked in with no issue, and I was able to upload the artifact just fine, which should dispel doubts about the other services.
At first I thought there might be a problem with the keys (since I initially used CA-signed ones), but neither the keys produced by the keygen utility included with the Mender integration nor the keys and certificates issued by a CA seem to influence this behavior (not that they should, since the problem isn’t with the keys or certificates). None of the other containers, nor the client itself, reports any error; only mender-deployments does, and only the one I have described.
Has anyone encountered this behavior, or does anyone have an idea how to resolve it?
@CezaryKierzyk it seems the deployments service is pointing to a wrong URL when connecting to the storage layer. Are you using minio or AWS S3? Can you please post the logs of the minio container? Are you using a custom domain name for your storage layer? Can you double check you can reach minio with the DNS name you set for it?
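For anyone reading along: an S3 client expects an XML body, so an "XML syntax error ... element <link> closed by </head>" with status code 200 suggests that an HTML page, not the storage service, answered at the configured URL. A minimal Python sketch of that distinction follows. The two sample bodies and the helper name looks_like_s3_xml are made up for illustration and are not part of Mender or minio.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of what an S3-compatible service might return (well-formed XML).
XML_BODY = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Name>mender-artifact-storage</Name>"
    "</ListBucketResult>"
)

# Hypothetical sample of an HTML page, e.g. a web UI answering on the same URL.
# The <meta> and <link> tags are never closed, which is legal HTML but not XML.
HTML_BODY = """<html>
<head>
<meta charset="utf-8">
<title>Some web UI</title>
<link rel="stylesheet" href="/style.css">
</head>
<body></body>
</html>"""


def looks_like_s3_xml(body: str) -> bool:
    """Return True if the body parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("XML body parses:", looks_like_s3_xml(XML_BODY))
    print("HTML body parses:", looks_like_s3_xml(HTML_BODY))
```

If the body returned at the storage URL fails a check like this, the URL is most likely reaching a web page rather than the storage API.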
Mostly I just copy-pasted what was in the docs into a script and ran it. All Docker containers run on the same VPS, under the same IP, which the same domain name points to. I do not use AWS S3.
Minio container logs consist of this part repeating itself continuously:
I managed to get it to work using Mender 2.5.1; Mender 2.6.1 and 2.7.0 do not seem to launch storage-proxy. It seems that after 2.5.1 there is no storage-proxy configuration section in prod.yml.template, and while the Minio container is in fact launched and running healthy, it is not bound to the host’s port 9000. I will double-check that and come back with a proper answer.
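A quick way to double-check from the host whether anything is listening on port 9000 is a plain TCP connect. Here is a minimal sketch in Python. The helper name port_is_open and the 127.0.0.1 address are only illustrative, so substitute the host name you actually use.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: run on the Docker host itself; 9000 is the port discussed above.
    print("port 9000 reachable:", port_is_open("127.0.0.1", 9000))
```

A False result only tells you that nothing accepted the connection. It does not say whether the port is meant to be published in your version of the compose files.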
level=error msg="error reaching artifact storage service: SerializationError: failed to decode REST XML response
status code: 200, request id:
caused by: XML syntax error on line 8: element <link> closed by </head>"
If I understand this correctly and storage-proxy is gone, minio should somehow point to the api-gateway, too. So should the mender-api-gateway now expose port 9000? What does this diagram look like for the Mender 2.7 mender-api-gateway case? Like this?
The correct diagram is here - https://docs.mender.io/2.7/server-installation/overview; we’ll update the one from integration ASAP.
You no longer need to expose port 9000; minio uses a network alias, and traefik has a rule for this alias (look at the mender-api-gateway networks/aliases and minio/labels in the prod.yml and docker-compose.storage.minio.yml I provided).
If I take the docker-compose.storage.minio.yml into account, this means that the storage is now available either under s3.docker.mender.io, which is an internal domain, or under the website_url with the path /mender-artifact-storage appended, right? Will test that.
I checked it and the issue on my side seemed to be that my bucket wasn’t named mender-artifact-storage, but something slightly different. I adjusted the path in the docker-compose.storage.minio.yml and now it looks good. So it should be $DEPLOYMENTS_AWS_BUCKET
The problem with your prod.yml is that s3.docker.mender.io must somehow be replaced with my mender-url. The frontend runs normally because all the containers can see s3.docker.mender.io; that’s what the alias does. But this way, the devices are told to pull the artifact from s3.docker.mender.io, and they can’t resolve that URL… The other way around, using my mender-url instead of s3.docker.mender.io, I can also get it to work to the extent that the frontend forwards my requests correctly and I can download all the artifacts from the releases menu, where the artifacts are resolved via my mender-url/my-bucket. But somehow deployments can’t handle that and produces this serialization error.
2021-06-03 05:55:03 +0000 UTC error: Can not fetch update image: Get "https://s3.docker.mender.io/mender-artifact-storage/df2248bf-b75c-40d1-a48c-da275a803bd5?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=mender-deployments%2F20210603%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210603T055500Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-type=application%2Fvnd.mender-artifact&X-Amz-Signature=5ede7f6f9388d58138b6a6e577bdb5edcc9d3e01f7c88b9a10cbfa876d2338d5": dial tcp: lookup s3.docker.mender.io: no such host
2021-06-03 05:55:03 +0000 UTC error: Update fetch failed: update fetch request failed: Get "https://s3.docker.mender.io/mender-artifact-storage/df2248bf-b75c-40d1-a48c-da275a803bd5?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=mender-deployments%2F20210603%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210603T055500Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-type=application%2Fvnd.mender-artifact&X-Amz-Signature=5ede7f6f9388d58138b6a6e577bdb5edcc9d3e01f7c88b9a10cbfa876d2338d5": dial tcp: lookup s3.docker.mender.io: no such host
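To see at a glance which host a device is being asked to resolve, the download URL from the client log can be taken apart. Below is a small Python sketch. The helper name describe_presigned_url is made up, the example URL is a shortened copy of the one in the log above (most query parameters dropped), and the bucket is assumed to be the first path segment, as it is in that log.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the URL from the client log above (signature etc. omitted).
EXAMPLE_URL = (
    "https://s3.docker.mender.io/mender-artifact-storage/"
    "df2248bf-b75c-40d1-a48c-da275a803bd5"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=86400&X-Amz-SignedHeaders=host"
)


def describe_presigned_url(url: str) -> dict:
    """Split a path-style presigned URL into the parts relevant for debugging."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Assumption: path-style addressing, i.e. /<bucket>/<object key>.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    return {
        "host": parts.hostname,
        "bucket": bucket,
        "key": key,
        "signed_headers": query.get("X-Amz-SignedHeaders", [""])[0],
        "expires_seconds": int(query.get("X-Amz-Expires", ["0"])[0]),
    }


if __name__ == "__main__":
    for name, value in describe_presigned_url(EXAMPLE_URL).items():
        print(f"{name}: {value}")
```

The host field is the name the device has to resolve through DNS. Since host appears in X-Amz-SignedHeaders, the host name is likely covered by the signature, so rewriting it on the device side would probably not be enough; the server has to hand out a URL with a publicly resolvable host.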
We just moved from mender server 2.4.0 to a clean installation of 2.7.0.
But we are experiencing exactly the same problem (with the same error logs) as Rohita83 and some others.
Instead of using our public Mender server URI, the mender-client uses the server-internal https://s3.docker.mender.io/ URI. The latter is not resolvable on the public internet, which causes the error.
Hopefully someone can give us some tips to resolve the issue. Thanks a lot.
And to give some more context (as Dave took over from me): for the 2.4.0 installation we used two different URLs for Mender, a public-facing one for the API side and another for port 9000 access to the deployment server. This is because the provider of our public-facing URL does not open port 9000.
So the problem we are now facing is that with the changes in 2.7.0 we don’t know how we are supposed to change our configuration to make it work. We got as far as making the web interface work, but the clients don’t seem able to reach the deployment side of things.
This of course makes Mender unsuitable for our purposes, so we need some help to get this working under the new setup of Mender Server.
The information in this thread has not helped us so far, and we are in the same boat as Rohita83.
Any help to get our Mender Server working again as intended would be very much appreciated.
Why are you using s3.docker.mender.io? I think the docs may have a bug, as the latest version does seem to require that, but the older version defaults to $API_GATEWAY_NAME, which I think is correct. Can you try changing that and see if it makes a difference?
Apart from following Drew’s advice, could you please send me the /etc/hosts from the host and the output of docker ps? Also, could you enter the deployments container and print the environment with something like this: