When we run updates, the status in the web UI often reads "Downloading: Executing script: Download_Enter_10" and the progress bar sticks at 69% for a long time. Only when the device finally reboots does the status move to the rebooting stage and the update continue. This is particularly a problem for updates that take longer because of slow network connections.
This also causes problems for us: if a unit shuts off during the download step, we have no way of knowing until the unit is turned back on and Mender updates the status.
Is there a way to actually see the download status in the web UI to monitor if the update is still running or not?
Note: I am not sure whether this is a bug in how we have Mender configured with our state scripts or a limitation of the Mender client itself.
This is because the progress bar on the server is just an estimate. There is no actual communication between the client and server during the download, so we just estimate progress until the next state change is reported to the server.
There have been discussions of finer-grained progress reporting, but at the moment we don’t have anything.
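To illustrate what a purely time-based estimate looks like, here is a minimal Python sketch. It is not the actual Mender UI code: the function name, the 10-minute expected duration, and the 69% cap (chosen only to mirror the figure reported above) are all assumptions.

```python
def estimated_progress(elapsed_s, expected_s=600.0, cap=0.69):
    """Guess progress from elapsed time alone, with no input from the device.

    expected_s and cap are illustrative assumptions, not Mender's real values.
    """
    if elapsed_s < 0 or expected_s <= 0:
        raise ValueError("elapsed_s must be >= 0 and expected_s must be > 0")
    # Climb linearly with time, then hold at the cap until a real
    # state change is reported by the device.
    return min(elapsed_s / expected_s, 1.0) * cap
```

With these numbers, a download that has been running for 30 minutes shows the same value as one that has been running for 10, which is the kind of stall described above.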
Thanks for the info. We would be very interested to see this feature added. Even a ping every 5-10 minutes while the download is running would be incredibly helpful for us.
Thanks for the feedback. The main reason this has not been supported (yet) is that it leads to a lot of communication overhead that will rarely be used.
For example, if you have 10,000 devices, tracking the download progress of each and every one seems a bit wasteful. I do certainly understand it for smaller-scale or individual diagnostic purposes, though. The question is how to enable this when it matters without wasting resources in general… Any ideas?
I definitely agree that this would be wasteful and too much information across a large fleet.
I see a couple of ways that this could be enabled that would help us out.
One would be a way for a state script or config option to trigger this. An end user could then turn it on for diagnostics, either by creating a file that our state script checks or by adding the option to the config.
The other option would be a debug checkbox that could be selected in the Mender UI for a specific deployment. That way we could run a deployment on a large group and then turn debug on when we retry the deployment for any devices that failed.
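For the first option, the check inside a state script could be sketched roughly like this in Python. Everything named here is hypothetical: the flag path, the log path, and the helper functions are made up for illustration, and the sketch only appends to a local file because the client has no ping call to use. It also assumes that returning exit code 0 lets the update continue.

```python
import os
import time

# Hypothetical path: not a Mender convention, just a file an end user
# would create to opt in to extra diagnostics.
DIAG_FLAG = "/data/enable_update_diagnostics"


def diagnostics_enabled(flag_path=DIAG_FLAG):
    """Return True when the opt-in flag file exists."""
    return os.path.isfile(flag_path)


def record_heartbeat(log_path, state, now=None):
    """Append one UTC-timestamped line naming the current state."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp} {state}\n")


def run_state_script(state, flag_path=DIAG_FLAG,
                     log_path="/tmp/update_heartbeat.log"):
    """Do the extra diagnostic work only when the flag file is present.

    Returns the exit code the script would use (assumed: 0 = continue).
    """
    if diagnostics_enabled(flag_path):
        record_heartbeat(log_path, state)
    return 0
```

A state script only runs on state transitions, so on its own this would not report anything while the download is in progress. It just shows the opt-in check.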
Good ideas. I am trying to think how to make this a bit more automatic too, so it’s a bit easier to deal with (less to configure).
Your original problem seemed to be that you didn’t know whether the device was actually doing something, so your first proposal may be better: the Mender client should “ping” the server regularly during an update, and the last ping could be displayed in the UI deployment report.
From some of our research, a typical deployment takes less than 10-15 minutes, and that’s what the “mock” progress is modeled after. What timespans are you seeing in your case (e.g. normal vs. slow case)?
Maybe we could have a default ping every 10 minutes, or 30 minutes or so, during a deployment - so the server knows it is still active. I think it should be infrequent enough to not occur in most cases, to save resources.
A regular ping would actually work really well for this issue. Almost all of our deployments usually take between 5 and 15 minutes depending on internet bandwidth. The ones where we have issues often take upwards of 30 minutes. Having the client send up an update every 10 minutes or so would probably work well for this. A single extra ping to the server for each update would not be too much extra information for the server to handle.
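As a rough check on those numbers, here is a small Python sketch that counts the pings one device would send. It assumes a ping fires each time a full interval has elapsed since the deployment started. The function name and that timing rule are my own assumptions, not anything the client does today.

```python
def pings_during(duration_min, interval_min=10):
    """Count pings if one is sent each time a full interval has elapsed."""
    if duration_min < 0 or interval_min <= 0:
        raise ValueError("duration_min must be >= 0 and interval_min must be > 0")
    return int(duration_min // interval_min)


for duration in (5, 15, 30, 45):
    print(f"{duration} min -> {pings_during(duration)} ping(s)")
```

With a 10-minute interval, a 5-minute deployment sends no ping at all, a 15-minute one sends one, and the slow 30 and 45 minute cases send 3 and 4.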
I think this would still be quite wasteful, as any information that needs to be displayed in the UI must go via the database, so each ping would be a write to the database. And even if this were only for “smaller fleets”, the impact could be significant if enough of them have it enabled.
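To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes one database write per ping (as described above), that all the devices counted are mid-deployment at the same time, and that their pings are spread evenly over the interval. The function name is made up for illustration.

```python
def writes_per_second(active_devices, interval_min=10):
    """Average database writes per second if every device that is
    currently deploying pings once per interval (one write per ping)."""
    if active_devices < 0 or interval_min <= 0:
        raise ValueError("active_devices must be >= 0 and interval_min must be > 0")
    return active_devices / (interval_min * 60)
```

For 10,000 devices updating at once, a 10-minute interval works out to about 16.7 extra writes per second, and a 30-minute interval to about 5.6.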
Optimally we would have a socket-based communication channel and a message bus for the UI, to avoid going via the database.
ping @tranchitella, who has expertise in this area and might provide additional insights.
@mirzak I agree with you: this ping-based approach, writing to a database, causes a significant overhead that is not justified in scenarios with a large fleet of devices.
Future releases of Mender will use a persistent connection between the device and the backend, providing the right communication channel for this kind of live feedback, which does not need persistence in the storage layer (MongoDB). Until then, I believe we won’t have this kind of feature available.