Feature Request: Logs for successful deployments

Hi,

I am curious about deployment logs.
Are we the only ones with this need, or are there technical limitations that make it unfeasible?
Thanks to the whole team in advance,

Context

Hosted Mender portal, Deployments page.

Current behavior

Logs are visible only for failed deployments, not for successful ones.

Feature Request

Allow logs to be viewed for successful deployments as well.

Use case

Troubleshooting.

  1. When a deployment fails, warning signs and hidden issues may already have appeared in previous, otherwise successful deployments.

  2. Logs of successful deployments may still contain useful lines, for example output logged by user-supplied state scripts that perform upgrade migrations, which a developer may need to investigate.

Thanks in advance for any information and opinions.

Have a nice day.

Hi @fce,

Thanks a lot for reaching out and for your ideas! In fact, I thought along very similar lines a while back and came to some conclusions.

  • I agree that in some cases the logs of a successful deployment might hold interesting information.
  • A successful installation (without relevant information) is understood to be the default case in production.
  • When we’re talking about real product fleet sizes and deployment frequencies, such as 10k devices and 4 deployments/year, who is going to review 40k logs each year?

This brings us to two possible cases:

  • Either the relevant information is present on all devices, because it is intrinsic to the deployment/software combination. In that case it would also show up on a single canary unit. Having one of those running in-house and accessible, so that such information can be obtained, is what I would call good engineering practice, and I would argue for having one.
  • Or the relevant information is only present on a fraction of devices, depending on unknown conditions. This is the tricky part. What is the information? Since it ends up in the log, your logic has already decided that it is worth printing. Here I would argue for the “fail fast, fail hard” approach. Instead of writing “soft” warnings to the logs (which everybody will ignore), make every relevant piece of information an error. This is obviously not a lot of fun, but in the end it forces you to make sure every case is properly covered, and therefore helps the overall quality of the product (and as a bonus, you get the logs on the Mender server).
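
To make the “fail fast, fail hard” point a bit more concrete, here is a minimal Python sketch of the decision a state script could make. It assumes that a non-zero exit status from the script fails the deployment (which is what would bring the logs to the server), and the names `exit_code_for` and the example finding are hypothetical, not part of Mender.

```python
import sys


def exit_code_for(findings):
    """Return 0 only when nothing was flagged; any finding becomes a hard failure."""
    for finding in findings:
        # Report each finding on stderr instead of burying it as a soft warning.
        print(f"migration check failed: {finding}", file=sys.stderr)
    return 1 if findings else 0


# Hypothetical findings collected during an upgrade migration.
clean_run = exit_code_for([])
flagged_run = exit_code_for(["config schema older than expected"])

# A real script would end with: sys.exit(exit_code_for(findings))
```

The point of the design is that there is no middle ground: either the script has nothing to report and the deployment proceeds, or whatever it found is treated as a failure that someone has to look at.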

For corner cases there is always the Troubleshoot Add-On, but, as a preliminary analysis, my take is that storing the logs of successful deployments on the server would eat storage without yielding noticeable information.

Greets,
Josef

PS: combining this, a possible line of thought might be: “for devices with the Troubleshoot Add-On, all logs are collected”. This reduces the scope to a manageable size, but then, on such a device the logs could be inspected anyway…