Mender client connection issues

We have a Mender server deployed on an AWS instance, with the primary URL served via Akamai (company policy; it took me a month to get this cleared!). After they configured the URL for us, they informed us that only traffic on ports 80 and 443 would be forwarded to our AWS instance. This of course breaks Artifact deployment, which uses port 9000.
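Before reconfiguring anything, it can help to confirm from the client's network which ports actually get through. Below is a minimal Python sketch of a plain TCP connect test. The host name in the usage note is a hypothetical placeholder. A successful connect only shows that the port is open, not that the right service (API server or storage proxy) is behind it.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError)
        # and DNS failures (socket.gaierror also subclasses OSError).
        return False
```

For example, running port_reachable("mender.example.com", 443) and port_reachable("mender.example.com", 9000) from the device's network should report 443 as open and 9000 as unreachable if only ports 80 and 443 are forwarded.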

After talking to our server support people, they suggested setting up a tiny AWS server that forwards port 9000. I reconfigured our server and test client to use that URL for the storage-proxy alias and for DEPLOYMENTS_AWS_URI.

I also put both URLs in the “Servers” entry of our client configuration (/etc/mender/mender.conf) as the required JSON array: [{"ServerURL": "URL1"}, {"ServerURL": "URL2"}].

After starting the server and the client device, we can open the UI via URL1, and the device shows up there. Under Releases I can upload an Artifact and even deploy it, though strangely only to a group: when I try to create a Deployment for a single device, the UI tells me there are no releases.

After the Deployment is created, it goes into the pending state and stays there for as long as the server is running, even though the client should check for updates every 15 minutes.

Running “./run logs | grep error” on the server shows no notable errors.

The log on the client device (journalctl | tail -n 1000 | grep error) shows the following complaints:

  1. warning msg="Server \"URL1\" failed to serve request \"/api/devices/v1/deployments/device/deployments/next\", Attempting \"URL2\""
  2. error msg="Error receiving scheduled update data: POST update check request failed: Post URL2/api/devices/v1/deployments/device/deployments/next: dial tcp : connect: connection timed out"
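The warning suggests the client treats the “Servers” list as an ordered failover list: when a request to URL1 fails, the same API request is retried against URL2. If URL2 only forwards port 9000 storage traffic, it cannot answer an API call, so the update check ends in the timeout shown in the error. The Python snippet below is a simplified model of that ordered failover. The names are hypothetical, and it is not the Mender client's actual code.

```python
from typing import Callable, List, Sequence


class AllServersFailed(Exception):
    """Raised when every configured server failed to serve a request."""


def request_with_failover(servers: Sequence[str], path: str,
                          send: Callable[[str], str]) -> str:
    """Try each server in order and return the first successful response.

    `send` performs the actual request and raises OSError (for example
    ConnectionRefusedError or TimeoutError) when a server cannot be reached.
    """
    errors: List[str] = []
    for server in servers:
        url = server.rstrip("/") + path
        try:
            return send(url)
        except OSError as exc:
            # Remember why this server failed, then fall through to the next one.
            errors.append(f"{url}: {exc}")
    # Every server failed; report all the collected reasons.
    raise AllServersFailed("; ".join(errors))
```

With only a storage forwarder as the second entry, the fallback can never succeed for API paths, so listing it just replaces the first error with a timeout.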

After that I see state transitions from Sync to Error and then to Idle.

Previously we had a test server deployed in a VM with a single URL and didn’t run into these issues (everything was also running over a local Ethernet cable), but now that we are on the internet, we’ve hit this problem.

Does anybody have any tips or solutions for our issue?

My IT support guy noticed that there is an HTTPS server running on port 9000, and if we go to the AWS passthrough server’s URL with port 9000 appended, we see the UI there as well. Not sure if that helps with diagnosis.

BTW, if I upload a large firmware Artifact (about 1.7 GB) in Releases, the upload reaches 100% but never seems to get past that point. Not sure if that is related to the other problem I’m having.

Any help is welcome as we have already spent way more time than we would have liked getting this Server setup to work!

I’ve managed to resolve the download issue by removing URL2 from the client configuration (going back to the regular, single ServerURL entry).
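For reference, the fix amounts to dropping the “Servers” array and keeping a single top-level “ServerURL”. Below is a minimal Python sketch of that edit, assuming the existing mender.conf is valid JSON. The function name and URLs are hypothetical placeholders, and all other keys in the file are left untouched.

```python
import json


def use_single_server(conf_text: str, server_url: str) -> str:
    """Replace a multi-entry "Servers" list with a single "ServerURL".

    Every other key in the existing mender.conf content is preserved.
    """
    conf = json.loads(conf_text)
    # Drop the failover list; it should only ever hold full Mender API
    # servers, never the port 9000 storage forwarder.
    conf.pop("Servers", None)
    conf["ServerURL"] = server_url
    return json.dumps(conf, indent=2)
```

After writing the result back to /etc/mender/mender.conf, the client service needs a restart to pick up the change.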

I can now upload a script Artifact and deploy it to a device, though the script doesn’t run (most likely because I used an old version of mender-artifact to create it, as I see errors pertaining to that in the client logs).

My upload issue is still there (but that rootfs image was also created with the wrong mender-artifact version), so I’m currently building a new image with the latest mender-artifact (3.0.1-2).

Still, if anyone has run into the upload issue and managed to solve it, I’d like to know how, so I can check whether I have the same problem and fix it the same way.

Hi @PJK,
Indeed, adding the second server causes problems, since it is only the port 9000 storage gateway and not the full Mender API server, which runs on your URL1.

The storage gateway simply provides TLS access over port 9000 to an S3-compatible API. You can always use plain S3 for that instead, if that is easier to get company approval for than hosting your own forwarder.

As for the upload issue, I know we had some related issues, but I thought they had all been resolved several weeks ago. When did you last test and reproduce this issue? @merlin @tranchitella any ideas?

Drew

Hi Drew,

My mistake with the multiple server URLs came from thinking that the client also needed to know how to reach the storage server (an idea that actually grew out of my wrong guess about why large uploads to the server were failing).

In the meantime I found another forum thread specifically about problems uploading large Artifact files to the server. It turns out this was a bug in Server 2.3 that was fixed in Server 2.4.

I’ve updated our Server installation to Server 2.4 and those problems have disappeared. We’ve been able to perform multiple updates of our test devices and have now rolled out our first Menderized firmware for the devices that will be used in the field.

Here is a link to the forum thread about the large file upload issue:

Excellent. Glad to hear you were able to get unstuck.

Drew

@PJK, please be sure to increase the idle timeout on your AWS load balancer to 3600 seconds (1 hour).
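The thread doesn’t say which kind of load balancer is in use. Assuming an Application Load Balancer managed with boto3, here is a minimal Python sketch. The function names are hypothetical and the ARN is a placeholder you would supply. The default ALB idle timeout is 60 seconds, which is presumably too short for long-running requests such as the 1.7 GB Artifact upload. A Classic Load Balancer uses a different setting (ConnectionSettings/IdleTimeout via the "elb" client), and the same change can also be made in the AWS console.

```python
from typing import Dict, List


def idle_timeout_attributes(seconds: int = 3600) -> List[Dict[str, str]]:
    """Build the elbv2 attribute list that sets an ALB idle timeout."""
    # ALB idle timeouts must be between 1 and 4000 seconds.
    if not 1 <= seconds <= 4000:
        raise ValueError("ALB idle timeout must be between 1 and 4000 seconds")
    # elbv2 load balancer attribute values are passed as strings.
    return [{"Key": "idle_timeout.timeout_seconds", "Value": str(seconds)}]


def set_alb_idle_timeout(load_balancer_arn: str, seconds: int = 3600) -> None:
    """Apply the idle timeout to an ALB (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("elbv2")
    client.modify_load_balancer_attributes(
        LoadBalancerArn=load_balancer_arn,
        Attributes=idle_timeout_attributes(seconds),
    )
```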