We have a use case for updating devices in a WiFi mesh network via Mender. We would like to hear your opinions about it, and whether you'd like to see official support for it.
The goal here is to exchange ideas and feedback, and to conclude whether this is a good idea at all and, if so, what the best approach would be.
We’ll probably need some server changes and/or additional device APIs for that kind of support, so it would probably be a good idea to design and implement these together.
I would like @eystein to chip in on this, but I will take the first swing.
It is an interesting use case that we should at least explore, to see what implications it might have and whether it fits into the general solution.
Just for us to get a better understanding of the use case, how do you envision the update to be distributed in such a network?
- Would there be a gateway device that is connected to a Mender server, fetches the update and then re-distributes it to a node in the mesh network and then the nodes forward it on?
- Do you already know what type of APIs would be required for this?
First, a few basics:
- any node in a mesh can become the root node; this is decided dynamically
- there can be multiple root nodes at the same time managing separate sub-meshes
- one mesh feature we should take advantage of is streaming an update to multiple or all devices at the same time, saving time and traffic
- I’m planning to run this on MCUs, where all devices, including the root nodes, are normal devices running the same software
Now, here is how I think this could work:
- the root nodes would act as proxies for their sub-meshes
- to reduce traffic, there can’t be any handshake-based encryption like SSL/TLS
- one way would be to use pre-shared keys between the clients and the Mender server to either encrypt or sign every request (instead of
- another way would be to simply trust the mesh as a unit and use plain-text, traffic-optimized communication between the mesh nodes and the root node
For both ways:
- there needs to be a new API call for updating devices in bulk, to reduce the number of requests the root node has to make
- any node in the mesh needs to be able to update the inventory, identity, and deployment data of any other node in the mesh. This access control could make use of device groups.
- the nodes would not poll for new updates; instead, the root node would poll in bulk and, if an update is available, start streaming the update file to all affected nodes
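To make the bulk idea more concrete, here is a minimal Python sketch of what the root node might send and what it might do with the answer. Nothing here is an existing Mender API: the single bulk poll request, its JSON fields (`devices`, `deployments`, `url`), and the helper names are hypothetical. They only illustrate how one request could replace one poll per node, and how the reply maps onto groups of nodes that can receive the same stream.

```python
import json
from dataclasses import dataclass


@dataclass
class NodeState:
    """What the root node knows about one mesh node (hypothetical fields)."""
    device_id: str
    artifact_name: str
    device_type: str


def build_bulk_poll(nodes):
    """Combine one 'is there an update?' query per node into a single request body.

    The payload shape is an assumption; Mender has no such bulk endpoint today.
    """
    return json.dumps(
        {
            "devices": [
                {
                    "id": n.device_id,
                    "artifact_name": n.artifact_name,
                    "device_type": n.device_type,
                }
                for n in nodes
            ]
        }
    )


def group_by_url(bulk_response):
    """Group device IDs by the artifact URL in the (hypothetical) bulk response."""
    groups = {}
    for entry in json.loads(bulk_response)["deployments"]:
        url = entry.get("url")
        if url:  # a missing or null URL means: no update for this device
            groups.setdefault(url, []).append(entry["id"])
    return groups


nodes = [NodeState("aa:01", "v1", "esp32"), NodeState("aa:02", "v1", "esp32")]
request_body = build_bulk_poll(nodes)  # one request instead of one per node
reply = json.dumps({"deployments": [
    {"id": "aa:01", "url": "https://example.com/v2.mender"},
    {"id": "aa:02", "url": None},
]})
print(group_by_url(reply))  # {'https://example.com/v2.mender': ['aa:01']}
```

Each key of the returned dictionary is one download the root node has to fetch once and stream to all listed nodes.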
For way 1:
- the bulk requests would have to contain individually signed data, which may dramatically increase the request size depending on the signature algorithm.
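The size concern for way 1 can be illustrated with a sketch that uses HMAC-SHA256 over a canonical JSON encoding as the per-device "signature". This choice is an assumption made for illustration (the thread only says "encrypt or sign"), and the keys, field names, and helpers are hypothetical. Each entry grows by a 32-byte tag (64 hex characters as encoded here). With asymmetric signatures the overhead per entry would instead be 64 bytes for Ed25519 or 256 bytes for RSA-2048.

```python
import hashlib
import hmac
import json

# Hypothetical pre-shared keys, provisioned on each device and on the server.
PSKS = {"aa:01": b"psk-of-aa01", "aa:02": b"psk-of-aa02"}


def canonical(payload):
    """Deterministic JSON encoding so both sides MAC exactly the same bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_entry(device_id, payload, key):
    """Device side: attach an HMAC-SHA256 tag to its own entry."""
    tag = hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()
    return {"id": device_id, "payload": payload, "mac": tag}


def verify_entry(entry, psks):
    """Server side: recompute the tag with that device's key, compare in constant time."""
    key = psks.get(entry["id"])
    if key is None:
        return False
    expected = hmac.new(key, canonical(entry["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["mac"])


# The root node only concatenates entries; it cannot forge other nodes' tags.
bulk = [
    sign_entry("aa:01", {"status": "installing"}, PSKS["aa:01"]),
    sign_entry("aa:02", {"status": "success"}, PSKS["aa:02"]),
]
print(all(verify_entry(e, PSKS) for e in bulk))  # True
bulk[1]["payload"]["status"] = "failure"  # simulated tampering by the root node
print(verify_entry(bulk[1], PSKS))  # False
```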
I hope this gives you an overview of what I currently think is the best way. Feel free to prove me wrong by coming up with better ways.
Looks like an interesting architecture.
Since this targets MCU clients, I assume you will not be using the currently supported Mender client, but rather the new C-based one you were working on?
We have been thinking about a “store-and-forward” architecture for network topologies like these, where a gateway node would act as an intermediary between devices and the Mender server. In this model the gateway would be your “root node”.
Regarding transport security I think pre-shared keys would lead to a pretty big key management problem. On a local network it might be OK to not encrypt the traffic as long as the Artifact itself is signed (and potentially encrypted – which is not yet supported in the format).
How the root node would authenticate as a different node is a complex question on its own. I am not sure if this should be supported by the server. Perhaps the root node needs to know all the credentials (e.g. identity attributes, private keys, or API tokens) of the other devices and use them to authenticate as that device.
Finally, how the root node would “stream” an update to multiple other nodes simultaneously is yet another problem. Is it guaranteed that they would all be running the same software? I am not sure if we should make that assumption in general.
This is a pretty complex architectural discussion to have in the forum, but hopefully it helps.
@eystein Sorry for the late answer; I had to work on other projects last month.
This will be based on our C-based Mender client, yes. (We just did final reviews and cleanups, so you can expect the release soon.)
We thought a lot about doing OTA over a WiFi mesh and how this can be done both securely and efficiently.
We also don’t want to trust the root node anymore, because the risk of it getting compromised is too high (also, any node in the mesh can become the root node; there’s no dedicated hardware).
This is our new concept draft:
We’re planning to use a DTLS-encrypted MQTT-SN connection for our cloud connection. This gives us end-to-end encryption between the mesh nodes and an MQTT-SN gateway, which decrypts all traffic and forwards it to our MQTT broker. The root node therefore only acts as a proxy and never sees plain-text traffic.
This means that the only thing that has to change in the Mender server is adding support for an MQTT device API, which would also make this change useful for more use cases.
Mender would connect to an MQTT broker and create a /mender topic with sub-topics reflecting the existing API URLs, e.g.:
Besides the normal request data, each request will additionally carry the client ID (e.g. the MAC address or a CN) and a response topic to which the result of the request has to be sent, e.g.:
Each client would subscribe to “/mender/responses/DEVICEID/#” to receive responses.
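Here is a sketch of the proposed topic scheme. It assumes the request payload is JSON and also carries the HTTP method, since otherwise a GET and a PUT to the same URL could not be told apart on a single topic. The payload field names and helpers are hypothetical; the example path is Mender's existing "next deployment" device endpoint. Note that in MQTT the multi-level wildcard is `#` (and `+` matches exactly one level); `*` has no special meaning in topic filters.

```python
import json


def request_topic(api_path):
    """Reuse the existing device API path as the MQTT topic under /mender."""
    return "/mender" + api_path


def make_request(device_id, request_id, method, api_path, body):
    """Wrap the usual request body with the client ID and a response topic."""
    payload = {
        "client_id": device_id,
        "response_topic": f"/mender/responses/{device_id}/{request_id}",
        "method": method,
        "body": body,
    }
    return request_topic(api_path), json.dumps(payload)


def topic_matches(topic_filter, topic):
    """Minimal MQTT filter match: '+' matches one level, '#' matches the rest."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(f_parts):
        if part == "#":
            return True
        if i >= len(t_parts) or (part != "+" and part != t_parts[i]):
            return False
    return len(f_parts) == len(t_parts)


topic, payload = make_request(
    "aa:01", "42", "GET",
    "/api/devices/v1/deployments/device/deployments/next",
    {"artifact_name": "v1", "device_type": "esp32"},
)
print(topic)  # /mender/api/devices/v1/deployments/device/deployments/next
reply_topic = json.loads(payload)["response_topic"]
print(topic_matches("/mender/responses/aa:01/#", reply_topic))  # True
print(topic_matches("/mender/responses/aa:02/#", reply_topic))  # False
```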
So there are two things that have to be done in the Mender server:
- there needs to be an MQTT client which connects to a broker to wait for and answer client requests. No device authentication is needed, because the broker can and should be trusted, and the devices have already authenticated themselves to the broker. Since the communication between Mender and the broker should be secured (either through TLS, or because both run on the same server and communicate locally only), this is still safe and, unlike the previous concept, doesn’t require additional device permissions
- I haven’t taken a close look at the code of Mender’s device API implementation yet, but ideally both the HTTP and the MQTT APIs would end up in the same request handlers (that’s why we chose to use the existing URL structure for our MQTT topics). If the request handlers are currently too HTTP-specific, we need more abstraction there. Also, device authentication would have to be HTTP-only: MQTT devices will never make the auth call or request a token.
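One way to get "both transports, same handlers" is a thin adapter per transport that builds a transport-neutral request object and passes it to a shared dispatcher. The sketch below is not Mender server code (the real server is written in Go); `DeviceRequest`, `dispatch`, and the adapters are hypothetical names. It assumes that on the MQTT side the device identity is filled in by the trusted broker or gateway rather than by the device itself.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DeviceRequest:
    """Transport-neutral request passed to the handlers (hypothetical type)."""
    device_id: str
    method: str
    path: str
    body: dict = field(default_factory=dict)


def next_deployment(req):
    # Placeholder logic standing in for a real request handler.
    return {"device": req.device_id, "deployment": None}


HANDLERS = {
    ("GET", "/api/devices/v1/deployments/device/deployments/next"): next_deployment,
}


def dispatch(req):
    """Shared by both transports: look up and run the handler."""
    handler = HANDLERS.get((req.method, req.path))
    if handler is None:
        return 404, {"error": "not found"}
    return 200, handler(req)


def from_http(token_device_id, method, path, raw_body):
    """HTTP adapter: the device ID comes from the verified auth token."""
    body = json.loads(raw_body) if raw_body else {}
    return dispatch(DeviceRequest(token_device_id, method, path, body))


def from_mqtt(topic, raw_payload):
    """MQTT adapter: the device ID is the client ID supplied by the broker."""
    msg = json.loads(raw_payload)
    path = topic[len("/mender"):]  # strip the topic prefix to get the API path
    status, body = dispatch(
        DeviceRequest(msg["client_id"], msg["method"], path, msg.get("body", {}))
    )
    return msg["response_topic"], json.dumps({"status": status, "body": body})


print(from_http("aa:01", "GET", "/api/devices/v1/deployments/device/deployments/next", ""))
# (200, {'device': 'aa:01', 'deployment': None})
```

Only the adapters know about transports; token handling stays in the HTTP adapter, which matches the idea that MQTT devices never call the auth endpoint.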
About the combined-update optimization, this is our current idea:
- the mesh nodes will receive an update URL via the usual API
- they tell the root node that they want to receive an update from that URL
- the root node waits for some time (e.g. 1h) to collect as many clients wanting an update as possible
- the root node creates groups of devices with the same URL and, for each group, downloads the update once and streams it to the devices
At that point, the clients can keep using the MQTT API to update their deployment status.
This approach is only safe for devices with secure boot. A root node could still stream modified updates, but the clients would never boot them because their secure-boot signatures would be invalid.
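The collection window and per-URL grouping could look roughly like the following. This is a sketch under assumptions: requests are modeled as `(device_id, url, arrival_time_in_seconds)` tuples, the window length defaults to the 1h from the example, the function names are hypothetical, and `stream_once` only models "download once, deliver to many". In the mesh, its inner loop would ideally be a single multicast or broadcast send.

```python
from collections import defaultdict


def group_requests(requests, window_start, window_seconds=3600):
    """Group (device_id, url, arrival_time) requests received during one window.

    Returns {url: [device_ids]} for the window, plus the requests that fell
    outside it, which are kept for the next round.
    """
    window_end = window_start + window_seconds
    groups = defaultdict(list)
    carried_over = []
    for device_id, url, t in requests:
        if window_start <= t < window_end:
            groups[url].append(device_id)
        else:
            carried_over.append((device_id, url, t))
    return dict(groups), carried_over


def stream_once(chunks, receivers):
    """Fetch each chunk once and hand it to every device in the group."""
    for chunk in chunks:
        for send in receivers:
            send(chunk)


reqs = [("aa:01", "u1", 10), ("aa:02", "u1", 900), ("aa:03", "u2", 1200), ("aa:04", "u1", 4000)]
groups, later = group_requests(reqs, window_start=0)
print(groups)  # {'u1': ['aa:01', 'aa:02'], 'u2': ['aa:03']}
print(later)   # [('aa:04', 'u1', 4000)]

received = {d: [] for d in groups["u1"]}
stream_once([b"part1", b"part2"], [received[d].append for d in received])
print(received)  # {'aa:01': [b'part1', b'part2'], 'aa:02': [b'part1', b'part2']}
```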
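Secure boot is the final safeguard in this design. As an additional, earlier check (not part of the proposal above), a client could hash the update while it is being streamed and compare the result with a digest it received over its end-to-end encrypted MQTT channel rather than from the root node. The sketch assumes such a SHA-256 digest is available to the client; the class name is hypothetical.

```python
import hashlib
import hmac


class StreamingVerifier:
    """Hash an update incrementally while chunks arrive from the root node."""

    def __init__(self, expected_sha256_hex):
        # Expected digest, assumed to come over the authenticated MQTT channel.
        self._expected = expected_sha256_hex
        self._hash = hashlib.sha256()

    def feed(self, chunk):
        self._hash.update(chunk)

    def ok(self):
        # Constant-time comparison of hex digests.
        return hmac.compare_digest(self._hash.hexdigest(), self._expected)


image = b"\x00firmware" * 1000
verifier = StreamingVerifier(hashlib.sha256(image).hexdigest())
for i in range(0, len(image), 512):
    verifier.feed(image[i:i + 512])
print(verifier.ok())  # True
```

A mismatch lets the client reject a tampered stream before writing or rebooting into it; secure boot still covers anything this check misses.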
This looks like it is well thought out.
From an integration perspective, I do think that devices in this case must authenticate and get their auth token, which is used in subsequent requests: https://docs.mender.io/2.0/apis/device-apis/device-authentication – the only difference is that the transport is (secured) MQTT instead of HTTPS.
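Here is a device-side sketch of that flow: authenticate once over the MQTT transport, keep the token, and attach it to subsequent requests. The `transport` callable and the envelope fields (`headers`, `body`, `status`) are assumptions, as is the class name. The path and the `id_data` field follow Mender's device authentication API, but the real request also carries the device's public key and an `X-MEN-Signature` header, which are omitted here.

```python
import json

AUTH_PATH = "/api/devices/v1/authentication/auth_requests"


class MqttDeviceClient:
    """Device-side sketch: authenticate once over MQTT, then reuse the token."""

    def __init__(self, transport, identity):
        # transport(path, payload_dict) -> response dict; a hypothetical
        # stand-in for "publish on the request topic, wait on the response topic".
        self._transport = transport
        self._identity = identity
        self._token = None

    def authenticate(self):
        # Public key and request signature omitted for brevity (see lead-in).
        resp = self._transport(AUTH_PATH, {"body": {"id_data": json.dumps(self._identity)}})
        if resp["status"] != 200:
            raise PermissionError("device not accepted yet")
        self._token = resp["body"]

    def request(self, path, body=None):
        if self._token is None:
            self.authenticate()
        headers = {"Authorization": f"Bearer {self._token}"}
        return self._transport(path, {"headers": headers, "body": body or {}})


def fake_transport(path, payload):
    """Test double: issues a token, then echoes the Authorization header."""
    if path == AUTH_PATH:
        return {"status": 200, "body": "example-token"}
    return {"status": 200, "body": payload["headers"]["Authorization"]}


client = MqttDeviceClient(fake_transport, {"mac": "aa:01"})
print(client.request("/api/devices/v1/deployments/device/deployments/next"))
# {'status': 200, 'body': 'Bearer example-token'}
```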