Since we moved the Mender server into a datacenter, deployments have been really slow. I applied the tactics described in [1], which brought upgrade times down to an acceptable level (1 to 1.5 hours per update; the Mender artifact is about 80 MB).
Today, we pushed the first update to a device in the field. The update completed in 4 minutes.
Something must be wrong on our network, but we don't experience any other issues.
Can anyone suggest things to investigate?
I turned on debug logging for the Mender service, but found no clues there.
I would just try fetching an artifact with “wget” from the device. If it is still slow, Mender can be ruled out and it is more likely a “general network” issue.
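If Python 3 happens to be available on the device (an assumption; many embedded images only ship wget or curl), the same test can also report throughput directly. A minimal sketch: `timed_download` streams a URL and discards the body, and `throughput_mbit_per_s` converts the result to Mbit/s. Both names are hypothetical, and the URL is a placeholder for whatever address serves the artifact in your setup.

```python
import sys
import time
import urllib.request
from typing import Tuple


def throughput_mbit_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and elapsed time into megabits per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def timed_download(url: str, chunk_size: int = 64 * 1024) -> Tuple[int, float]:
    """Stream the body of `url`, discard it, and return (bytes read, seconds taken)."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url, timeout=60) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 timed_download.py <artifact-url>
    size, elapsed = timed_download(sys.argv[1])
    rate = throughput_mbit_per_s(size, elapsed)
    print(f"{size} bytes in {elapsed:.1f} s = {rate:.2f} Mbit/s")
```

For scale, 80 MB in 4 minutes is about 2.7 Mbit/s, while 80 MB in 1 to 1.5 hours is roughly 0.12 to 0.18 Mbit/s, a gap of about 15x to 22x.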
I ran Wireshark on the traffic (because HTTPS is used, I cannot see the contents). Nothing stood out, but I'm not very familiar with this kind of analysis.
Any suggestions for pinpointing this further?
On our network we use a pfSense-based firewall. The underlying firewall software (pf, the Packet Filter) has an option to pre-filter TCP packets, for example for bad combinations of flags. Disabling that functionality solves my speed issues…
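To look for segments in the trace that such a flag filter might object to, a small helper can classify the TCP flags byte (Wireshark exposes it as the `tcp.flags` field). This is an illustrative sketch: the combinations below are a few that are widely treated as invalid, not the exact rule set the pfSense option applies, and `looks_invalid` is a hypothetical name.

```python
# TCP control bits as they appear in the low byte of the flags field.
FIN = 0x01
SYN = 0x02
RST = 0x04
PSH = 0x08
ACK = 0x10
URG = 0x20


def looks_invalid(flags: int) -> bool:
    """Return True for a few flag combinations commonly treated as invalid.

    Illustrative subset only; not the firewall's actual normalization rules.
    """
    if flags & (FIN | SYN | RST | PSH | ACK | URG) == 0:
        return True  # "null" segment: no control bits set at all
    if flags & SYN and flags & FIN:
        return True  # SYN+FIN: opening and closing at the same time
    if flags & SYN and flags & RST:
        return True  # SYN+RST: opening and resetting at the same time
    if flags & FIN and not flags & ACK:
        return True  # FIN without ACK does not occur on an established connection
    return False
```

Running it over the flags values exported from the slow-upgrade trace would show whether any such segments appear at all; an empty result would point at some other part of the pre-filtering.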
Now, in my opinion, the big question is: what is Mender doing to trigger this?
I can provide a Wireshark trace of the traffic during a (slow) upgrade.
It is hard to say anything specific about the why, but there is one difference between downloading with wget and downloading with the Mender client: the Mender client uses HTTP range requests. One reason for this is to support resuming downloads after a network interruption.
HTTP range requests are a standard HTTP feature, and I do not really think they are causing the “red flags”, but they are a notable difference in how content moves over the network.
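To make the difference concrete, here is a minimal sketch of a range-based download with resume. It is not the Mender client's implementation (the thread does not say what chunk sizes or request pattern the client uses); `byte_ranges`, `range_header` and `fetch_range` are hypothetical names. Each chunk is a separate request carrying a `Range: bytes=first-last` header, and a server that honours it replies with `206 Partial Content`.

```python
import urllib.request
from typing import Iterator, Tuple


def byte_ranges(total_size: int, chunk_size: int, start: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) byte offsets covering [start, total_size)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = start
    while offset < total_size:
        last = min(offset + chunk_size, total_size) - 1
        yield offset, last
        offset = last + 1


def range_header(first: int, last: int) -> str:
    """Format an HTTP Range header value, e.g. 'bytes=0-1023' (both ends inclusive)."""
    return f"bytes={first}-{last}"


def fetch_range(url: str, first: int, last: int) -> bytes:
    """Request one byte range and return its body."""
    req = urllib.request.Request(url, headers={"Range": range_header(first, last)})
    with urllib.request.urlopen(req, timeout=60) as resp:
        # A server that ignores Range answers 200 with the whole body instead.
        if resp.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        return resp.read()
```

Resuming after an interruption then means calling `byte_ranges(total_size, chunk_size, start=bytes_already_written)` and fetching only the remaining ranges, which is why the traffic looks like many shorter requests rather than one long stream.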