Slow deployment

mterwoord · March 15, 2019, 5:13pm

Since we moved the Mender server into a datacenter, deployments have been really slow. I applied the tactics described in [1], which make upgrade times acceptable. (1 to 1,5 hour update time, Mender file is about 80 MB)

Today, we pushed the first update to a device in the field. The update completed in 4 minutes.
Something must be wrong on our network, but we don't experience any other issues.

Can anyone suggest what to investigate?

[1] Speed up mender upgrade

mirzak · March 18, 2019, 7:58am

I am a bit confused here by these two lines,

which make upgrade times acceptable. (1 to 1,5 hour update time

and

The update completed in 4 minutes.

First you mention hours to complete an update and then 4 minutes? Could you clarify this?

mterwoord · March 18, 2019, 9:43am

These are on different networks: on our office network, updates take hours; on a customer network, they take 4 minutes.

I’m pretty sure it’s something on our network, but I have no clue how to find out what…

I turned on debug logging of the mender service, but no clues there.

mirzak · March 18, 2019, 9:48am

I turned on debug logging of the mender service, but no clues there.

I would just try fetching an artifact with “wget” from the device. If it is still slow, then Mender can be eliminated and it could be a “general network” issue.

Could you try that?
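If wget is not available on the device, the same check can be sketched in Python. This is only a rough illustration of the idea above, not anything shipped with Mender: the function names `timed_download` and `throughput_mbit` are made up for this example, and the URL would be whatever artifact URL you can reach from the device.

```python
import time
import urllib.request


def throughput_mbit(num_bytes, seconds):
    """Convert a byte count and a duration into megabits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def timed_download(url, chunk_size=64 * 1024):
    """Read the whole body at url, discard it, and return (bytes, seconds)."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


# Hypothetical usage, with the artifact URL substituted in:
#   size, elapsed = timed_download(artifact_url)
#   print(f"{throughput_mbit(size, elapsed):.2f} Mbit/s")
```

For scale, an 80 MB file in 4 minutes works out to roughly 2.7 Mbit/s, while the same file in 1.5 hours is about 0.12 Mbit/s, so the difference between the two networks should be obvious from a single run.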

mterwoord · March 18, 2019, 10:01am

How do I get the URL to use? I guess the Mender server is secured against unauthorized downloading?

mirzak · March 18, 2019, 10:04am

You can check the Mender client log: the generated URL is printed there. It is a pre-signed URL and is usable for 24 hours.

mterwoord · March 23, 2019, 7:59am

I did some assumption-checking:

wget downloads as fast as I expected
dd-ing the passive root partition takes 4 minutes
ran Wireshark on the traffic (but because HTTPS is used, I cannot see the contents). Nothing stood out, but I’m not too familiar with reading traces

Any suggestions I could try to further pinpoint this?

mterwoord · March 23, 2019, 4:04pm

Made a big breakthrough:

On our network, we use a pfSense-based firewall. The underlying firewall software (pf, the packet filter) has an option to pre-filter TCP packets, for example for bad combinations of flags. Disabling that functionality solves my speed issues…

Now, IMO, the big question is what Mender is doing to trigger this…

I could offer a Wireshark trace of the data during a (slow) upgrade…

mirzak · March 25, 2019, 7:45am

Hard to say anything specific about the why, but there is one difference between doing a wget and downloading with the Mender client: the Mender client uses HTTP range requests. One of the reasons is to support resuming downloads after a network interruption.

HTTP range requests are a standard HTTP feature, and I do not really think they are raising the “red flags”, but it is a notable difference in how content is moved on the network.
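To make the difference from a plain wget concrete, here is a minimal sketch of a resumable download built on a `Range` header. It is an illustration of the general technique only, not the Mender client's actual code; `range_header` and `resume_download` are hypothetical names, and it assumes the server answers a ranged request with status 206.

```python
import os
import urllib.request


def range_header(offset):
    """Return the Range header value asking for everything from offset on."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return f"bytes={offset}-"


def resume_download(url, dest, chunk_size=64 * 1024):
    """Fetch the missing tail of url into dest and return the final size."""
    # Whatever is already on disk is treated as the part we have.
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", range_header(offset))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range, so append. Anything else
        # means the full body is coming, so start the file over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest)
```

Calling `resume_download` again after an interrupted transfer continues from the bytes already on disk. If the file is already complete, a server will normally reply 416, which `urlopen` raises as an `HTTPError`; the sketch leaves that case to the caller.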

mterwoord · March 25, 2019, 8:12am

It seems the scrubbing does not work well with TCP windows, which are probably being used.

See http://openbsd-archive.7691.n7.nabble.com/Scrub-reassemble-tcp-td259581.html
