Feature Request: Factory Reset request on Deployments

Hi,

I would like to ask for the community’s opinion on one feature.

It is mostly an architectural question. I ask because I have now encountered this feature more than once in company-made internal OTA systems, and have seen it used heavily wherever it is available.

However, to my knowledge, no publicly available OTA provider offers it.

Does this behavior fit standard OTA scenarios?

I would like to know what is done elsewhere 🙂

Context

Portal. Deployment creation page.

Feature Request

An option to request a ‘factory reset’ of the device together with the deployment.

Expected Behavior

Two options come to mind:

  1. mender-managed: Mender clears everything from the /data partition, except for what Mender needs to carry out the deployment (e.g. leaving /data/mender/).
  2. user-managed: an option --factory-reset is passed to the user-supplied state scripts. The user is responsible for clearing all data that represents state.
  3. others?
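
To make the mender-managed option concrete, here is a minimal sketch of "clear everything except what the client needs". It is not a Mender API: the function name `wipe_data_partition` and the `preserve` parameter are hypothetical, and the only path taken from the proposal is /data/mender/.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def wipe_data_partition(data_root: Path, preserve: Iterable[str]) -> List[str]:
    """Delete every top-level entry under data_root whose name is not in preserve.

    Hypothetical helper for illustration only; returns the names that were removed.
    """
    keep = set(preserve)
    removed = []
    for entry in sorted(data_root.iterdir()):
        if entry.name in keep:
            continue  # e.g. "mender", needed to finish the deployment
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real directory: remove recursively
        else:
            entry.unlink()  # regular file or symlink
        removed.append(entry.name)
    return removed
```

On a device this would be called as `wipe_data_partition(Path("/data"), {"mender"})`. In the user-managed option, a state script receiving the proposed --factory-reset flag could call a similar helper with its own preserve list.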

Use Case

I list a few that come to mind:

  • remote devices with suspect corrupt configuration
  • devices coming from very old versions, now incompatible with the cloud or carrying deprecated configurations, that can easily be reconfigured (e.g. sitting in a drawer in a lab, re-purposed)
  • devices under test, e.g. a LAVA server that wipes the device every night and installs a clean version vx.x.x for testing

Caveat

I acknowledge that the term ‘factory reset’ is ambiguous. For example:

  • A device, let’s call it ‘TestDevice32’, was sold in 2019 and shipped with version v2.4.0
  • Current products, of the same model, are now shipped with current stable version v4.1.0

Now, if the device ‘TestDevice32’ receives a ‘factory reset’ today, should it get:
a) a clean v2.4.0, as it literally came out of the factory (the literal meaning of the word ‘factory’)?
b) or a clean v4.1.0, as current products now come out of the factory?

Another term might be used; I chose ‘factory reset’ because of its widespread adoption in user-facing device HMIs (smartphones, cameras, etc.).

Thanks in advance for any opinion,

Have a great day,

Hi @fce,

Thank you for your thoughts! Indeed, resets are a common part of device lifecycle management and as such are related to Mender. As you already said, the problem is that there are usually various degrees of “resetting” and expectations involved here.

Applicable scopes

User

This is essentially what people refer to as “wiping”. All user configuration, data, and so on is removed, and upon next boot the “first boot” process or whatever provisioning process applies to the device launches (again). Such a reset is recommended when the device is sold or handed over to a new user, and mandatory to be available under the CRA.

A user reset must effectively be completely detached from any kind of remote service, so it boils down to the implementation in the device software.

Service

This one is very similar to the user reset, but might involve changing protected material, such as a parts replacement log or device internal errors. The device identity still stays the same.

Factory

The “good as new device” reset. This wipes all internal logs, key material, possibly even the installed software, so the device can be put into the default provisioning/manufacturing process again. This is the only one of the reset types that would actually be carried out by the manufacturer themselves (hence, “real factory reset”).

Affected data

Scoping the affected data is complicated. Some things are clear cut, like the user’s name or WiFi credentials. Others are less so, like internal logs not visible to the user, or runtime counters. At the extreme end again are device identity markers and hardware calibration data. In my experience, what should be involved in which reset is subject to manufacturer processes, and often to politics as well as philosophy.
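
As a rough illustration of why this scoping is a policy decision rather than something generic, the three scopes above can be written as a table that assigns each data category the smallest reset scope that wipes it. Every category name and every assignment below is an assumption made for the example, not a recommendation; a real manufacturer would fill in this table according to their own processes.

```python
from enum import Enum


class Scope(Enum):
    USER = 1
    SERVICE = 2
    FACTORY = 3


# Illustrative policy only: the smallest scope that wipes each category.
# None means "never wiped by any reset" in this example.
POLICY = {
    "user_config": Scope.USER,
    "wifi_credentials": Scope.USER,
    "service_log": Scope.SERVICE,
    "internal_error_log": Scope.SERVICE,
    "key_material": Scope.FACTORY,
    "calibration": None,
}


def categories_to_wipe(scope):
    """Return the sorted category names affected by a reset of the given scope."""
    return sorted(
        name
        for name, minimum in POLICY.items()
        if minimum is not None and minimum.value <= scope.value
    )
```

With this table, `categories_to_wipe(Scope.USER)` only touches user configuration and WiFi credentials, while `Scope.FACTORY` additionally covers the service-level logs and key material. Changing a single row changes what a "reset" means, which is the part that differs between manufacturers.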

On the software state

While it is sometimes done the “burnt-in software version” way, a reset usually means “keep the last installed state”. Reverting to the initially shipped version is not done that often, to my knowledge.

How this relates to Mender

My understanding is that the types of “resettable data”, their scopes, and their storage strategies vary far too much for a generic solution. Given that by far the largest number of expected operations are user wipes, which must be triggered without remote interaction, trying to provide something in this area is not really meaningful.
However, the “full factory reset” might be interesting to address. My understanding, though, is that it would make more sense to implement it as a separate artifact, created right alongside a software release as part of the same CI/CD/testing process. Both can then be uploaded, giving the device manufacturer the means to fully wipe a specific device upon request.

So I would conclude that the first thing to look at is understanding the data and its processes, then designing the reset processes, and only then, as the last step, wrapping them up.

Greetz,
Josef

Thank you Josef,

great answer, as usual.

Additional Scopes

Remote-only scenarios:

Regarding user vs. remote operations: I see B2B industrial scenarios where there is no user interaction and no offline, user-triggered wiping; the configuration is remote-only.

Ex: CompanyA uses Mender in its device. CompanyA installs devices in the factory of CompanyB. CompanyA manages the device entirely remotely (OTA, configuration, and all) and never goes on site again.
Wipes are always remote, as part of troubleshooting, re-purposing, or jumping between highly incompatible versions.

I don’t know, though, what share of overall Mender usage these scenarios represent.

I also think the CRA will probably require that an offline wipe button always be exposed on the device for the user to operate; I will need to check the regulation again.

On the software state

I can confirm that, from my point of view as well, “keep the last installed state” covers the vast majority of cases, and the “burnt in software version way” is the exception.

How this relates to Mender

The separate-artifact solution is absolutely viable, although I wonder, with a view to avoiding wasted storage: would it double the storage used on the Mender back-end?
Or, in the following scenario:

  • Artifact 4.2.3: read-only rootfs with hash faefraer21a365ef4a56df4
  • Artifact 4.2.3-factory: read-only rootfs with the same hash faefraer21a365ef4a56df4 + extra deployment steps to wipe

Is the Mender portal able to see that the same rootfs is used twice, and optimise by storing the rootfs only once?
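
To be clear about what I mean by "storing only once", here is a minimal sketch of content-addressed storage. It does not describe how the Mender back-end works (that is exactly my question); the class `BlobStore` and its methods are made up for the illustration, and SHA-256 is just an assumed hash.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: identical payloads are kept only once."""

    def __init__(self):
        self._blobs = {}      # digest -> payload bytes
        self._artifacts = {}  # artifact name -> (digest, extra steps)

    def add_artifact(self, name, payload, extra_steps=()):
        digest = hashlib.sha256(payload).hexdigest()
        # setdefault keeps the existing blob if this digest is already stored
        self._blobs.setdefault(digest, payload)
        self._artifacts[name] = (digest, tuple(extra_steps))
        return digest

    def stored_bytes(self):
        """Total payload bytes actually held by the store."""
        return sum(len(blob) for blob in self._blobs.values())
```

With such a scheme, uploading 4.2.3 and 4.2.3-factory with the same rootfs would yield the same digest for both, and `stored_bytes()` would stay at the size of a single rootfs; only the extra wipe steps would differ between the two artifacts.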

Otherwise, is the following solution common?
Maintaining a single ‘wipe’ artifact release, e.g. 4.2.3 → 4.x-wipe → 4.2.4, which does nothing other than wiping and transitioning between versions.

It does not feel proper, and should probably not exist, but could you confirm whether that matches your experience?

Thank you in advance,

Have a great day,