Some of the devices in our fleet continue to create new authorization sets even after they’ve been recently accepted (presumably with similar information).
Why does this happen? Is there any way to prevent this?
Something appears to be happening resulting in the mender-agent.pem file being regenerated. If a device with the same identity connects but the mender-agent.pem has changed, then you will see this behavior.
Based on your other post, I wonder if you have the case of devices with the same MAC address? I don’t know how BBB gets its MAC address but it may be the case that it is somehow happening.
Drew
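One way to check whether the key really is being regenerated is to record a fingerprint of mender-agent.pem and compare it over time or across reboots. The sketch below is only an illustration and is not part of Mender; the path /var/lib/mender/mender-agent.pem is an assumption about where the client keeps its key on your image, so adjust it if your layout differs.

```python
import hashlib
from pathlib import Path

# Assumed location of the client key; adjust for your image.
KEY_PATH = Path("/var/lib/mender/mender-agent.pem")


def file_fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def key_changed(path: Path, previous_fingerprint: str) -> bool:
    """True if the file no longer matches a previously recorded fingerprint."""
    return file_fingerprint(path) != previous_fingerprint


if __name__ == "__main__":
    if KEY_PATH.exists():
        print(KEY_PATH, file_fingerprint(KEY_PATH))
    else:
        print(f"{KEY_PATH} not found; is the path right for this image?")
```

If the fingerprint printed on one device stays the same while new authorization sets keep appearing for that identity, the extra requests are probably coming from somewhere else.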
I see. I was able to deploy an OTA update to the device successfully, but it still seems to be sending new auth requests multiple times per minute, even after I’ve dismissed them. Is there a way to debug or verify that something alarming isn’t going on?
Also, the mender client version on these devices is 1.7.1, is there an update that might help to resolve or affect this?
Hard to say without knowing the root cause but I doubt it as this is pretty fundamental behavior.
If you do have multiple devices using the same MAC address then this could happen. They would obviously have different certificates so every time one of them attempted to connect to the server it would likely be considered a new authentication set for the device. If you can get a shell on the device, then perhaps you can dump the Mender client logs and we can see more details.
Drew
Good to know. I didn’t think it was possible to have two devices with the same MAC address!
Are there more detailed or more useful logs than the output of journalctl -u mender? Recent output is below:
Oct 20 15:02:20 ps157 mender[214]: time="2020-10-20T15:02:20Z" level=error msg="failed to send status to server: transient error: reporting status failed: Put https://hosted.mender.io/api/devices/v1/deployments/device/deployments/358885c2-163b-4467-b134-0200bc290b90/status: dial tcp: lookup hosted.mender.io on [::1]:53: read udp [::1]:53177->[::1]:53: read: connection refused" module=state
Oct 20 15:02:20 ps157 mender[214]: time="2020-10-20T15:02:20Z" level=info msg="State transition: update-status-report [ArtifactCommit] -> update-retry-report [none]" module=mender
Oct 20 15:07:18 ps157 mender[214]: time="2020-10-20T15:07:18Z" level=info msg="State transition: update-retry-report [ArtifactCommit] -> update-status-report [ArtifactCommit]" module=mender
Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:20Z" level=info msg="State transition: update-status-report [ArtifactCommit] -> idle [Idle]" module=mender
Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:20Z" level=info msg="authorization data present and valid" module=mender
Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:j[Idle] -> check-wait [Idle]" module=mender
Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:20Z" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]" module=mender
Oct 20 15:07:21 ps157 mender[214]: time="2020-10-20T15:07:21Z" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]" module=mender
Oct 20 15:07:21 ps157 mender[214]: time="2020-10-20T15:07:21Z" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]" module=mender
Oct 20 15:07:21 ps157 mender[214]: time="2020-10-20T15:07:21Z" level=info msg="State transition: update-check [Sync] -> check-wait [Idle]" module=mender
Oct 20 15:37:20 ps157 mender[214]: time="2020-10-20T15:37:20Z" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]" module=mender
Oct 20 15:37:21 ps157 mender[214]: time="2020-10-20T15:37:21Z" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]" module=mender
Oct 20 15:37:21 ps157 mender[214]: time="2020-10-20T15:37:21Z" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]" module=mender
Oct 20 15:37:21 ps157 mender[214]: time="2020-10-20T15:37:21Z" level=info msg="State transition: update-check [Sync] -> check-wait [Idle]" module=mender
Oct 20 16:07:20 ps157 mender[214]: time="2020-10-20T16:07:20Z" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]" module=mender
Oct 20 16:07:21 ps157 mender[214]: time="2020-10-20T16:07:21Z" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]" module=mender
Oct 20 16:07:21 ps157 mender[214]: time="2020-10-20T16:07:21Z" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]" module=mender
Oct 20 16:07:21 ps157 mender[214]: time="2020-10-20T16:07:21Z" level=info msg="State transition: update-check [Sync] -> check-wait [Idle]" module=mender
Oct 20 16:37:20 ps157 mender[214]: time="2020-10-20T16:37:20Z" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]" module=mender
Oct 20 16:37:21 ps157 mender[214]: time="2020-10-20T16:37:21Z" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]" module=mender
Oct 20 16:37:21 ps157 mender[214]: time="2020-10-20T16:37:21Z" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]" module=mender
Oct 20 16:37:21 ps157 mender[214]: time="2020-10-20T16:37:21Z" level=info msg="State transition: update-check [Sync] -> check-wait [Idle]" module=mender
Oct 20 17:07:20 ps157 mender[214]: time="2020-10-20T17:07:20Z" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]" module=mender
Oct 20 17:07:21 ps157 mender[214]: time="2020-10-20T17:07:21Z" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]" module=mender
Oct 20 17:07:21 ps157 mender[214]: time="2020-10-20T17:07:21Z" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]" module=mender
Oct 20 17:07:21 ps157 mender[214]: time="2020-10-20T17:07:21Z" level=info msg="State transition: update-check [Sync] -> check-wait [Idle]" module=mender
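For a long journal it can help to filter out the routine state transitions and look only at the entries that are not level=info. The following is a rough sketch, not a Mender tool: it assumes each entry sits on a single line and carries the level=... msg="..." fields seen in the output above, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the level=... msg="..." fields seen in the journal output.
FIELD_RE = re.compile(r'level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"')

# Two entries copied from the journal output for demonstration.
SAMPLE = [
    'Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:20Z" level=info msg="State transition: update-status-report [ArtifactCommit] -> idle [Idle]" module=mender',
    'Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:20Z" level=info msg="authorization data present and valid" module=mender',
]


def parse_line(line: str):
    """Return (level, msg) for a log line, or None if the fields are missing."""
    m = FIELD_RE.search(line)
    return (m.group("level"), m.group("msg")) if m else None


def summarize(lines):
    """Count entries per level and collect the messages that are not info."""
    counts = Counter()
    problems = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # wrapped, truncated, or non-Mender line
        level, msg = parsed
        counts[level] += 1
        if level != "info":
            problems.append(msg)
    return counts, problems


if __name__ == "__main__":
    counts, problems = summarize(SAMPLE)
    print(dict(counts), problems)
```

Feeding it the full output of journalctl -u mender instead of SAMPLE would surface entries such as the DNS lookup failure at 15:02:20.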
From Mender that is the only logging of interest.
This bit indicates that it is not regenerating the certificate:
Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:20Z" level=info msg="State transition: update-status-report [ArtifactCommit] -> idle [Idle]" module=mender
Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:20Z" level=info msg="authorization data present and valid" module=mender
Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:j[Idle] -> check-wait [Idle]" module=mender
Oct 20 15:07:20 ps157 mender[214]: time="2020-10-20T15:07:20Z" level=info msg="State transition: check-wait [Idle] -> inventory-upda
Other logs are available from systemd-journald.
Devices are not supposed to share MAC addresses but there is nothing enforcing that and sometimes you explicitly want to take over a MAC address from another device. It used to be common when ISPs locked you to a MAC address and you wanted to replace your router without getting their permission.
If you have physical access to this one system and can shut it down, then you can monitor the Mender connections to see if there is another device trying to connect.
Drew
Okay, so I have powered down ps157 and dismissed all of the auth requests from earlier in the day and alas, new requests started coming in:
That points to your deduction that two devices are using the same MAC address (right?)
To resolve it, I would imagine the steps would be:
Does that sound right?
It does seem that you have two devices with the same MAC address. I’m not sure that necessarily means that one needs to be replaced; it depends on how the MAC addresses are set. There seem to be a lot of forum requests for the ability to change the MAC address, so it may be that your systems, for some reason, are not using a hardware-provided value but rather something else. Two links that looked helpful are:
Drew
Update: it seems like we actually have many more than two devices with the same MAC address simultaneously sending auth requests.
Given that I’ve now decommissioned and re-added 9+ different devices with the same MAC address, how likely is it that each of those BBBs actually has an identical eth0 HWAddress? Maybe there’s some intermediary networking device getting polled instead? According to this link:
The values read from Control Module (Base address 0x44E1_0000) MAC_ID0_LO register (Offset 0x630), MAC_ID0_HI register (Offset 0x634), MAC_ID1_LO register (Offset 0x638), and MAC_ID1_HI register (Offset 0x63C) represent unique MAC addresses assigned to each AM335x device. The values in these registers are programmed into each AM335x device by TI and can not be changed.
The Device IDs appear to be distinct, but I’m wondering if we should plan to add a custom identity file (perhaps with our own internal serialization, i.e. psXXX) to a future OTA release? Or is that something that we’ll only be able to implement on new devices that aren’t already in the field, since it’s stored in the /data directory?
If we implemented the custom client identity file, would that bypass this anyways?
have you already ruled out, when you manually run the device identity file on a dev BBB device, that it produces the expected result for that device?
i don’t believe anything related to device identity is stored in the data directory, or it wasn’t historically.
Funnily enough we decided we wanted to change our custom device identity script on our devices in the field only just a couple of weeks ago (devices only in beta testing currently) to correlate with info warehousing needed. I deployed a new artifact to all the devices with the new device identity script in it, and then once they had updated decommissioned them all, cleaned up the auth database on the server as a precaution, and then re-authorized them again as and when they all appeared in pending devices with the new identity.
is the BBB serial number not unique enough?
This is something we need to test, but it’s probably our next step after we get to the bottom of why there are so many devices with the same address. It also seems quite difficult to deploy a new artifact if all of the devices are showing up as a single device?
The serial number from the BBB EEPROM seems like it would potentially help, but that hexdump doesn’t seem to output anything on our system. We are also running our os from SD cards as opposed to using the onboard eMMC which might affect our ability to pull from that directory?
It also seems quite difficult to deploy a new artifact if all of the devices are showing up as a single device?
good point
We are also running our os from SD cards as opposed to using the onboard eMMC which might affect our ability to pull from that directory?
it shouldn’t matter what device you are booting from, as the access to EEPROM hardware is exposed by the kernel via its dynamically created sysfs pseudo file system on boot.
have you already ruled out, when you manually run the device identity file on a dev BBB device, that it produces the expected result for that device?
just in case there’s a bug in the script that’s failing in some way and creating the same MAC address for each device even though they actually have different physical mac addresses, as that script is part of the middleware between the physical hardware and mender-server
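If you can run the identity script by hand on a few devices and save what each one prints, a quick comparison shows whether they really report the same value. This is a minimal sketch under the assumption that the script prints key=value lines with a mac key; the device names other than ps157 and all MAC values below are made up for illustration.

```python
from collections import defaultdict


def parse_identity(output: str) -> dict:
    """Parse key=value lines, as printed by an identity script, into a dict."""
    identity = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            identity[key] = value
    return identity


def find_shared_macs(outputs: dict) -> dict:
    """Map each MAC reported by more than one device to those devices' names."""
    by_mac = defaultdict(list)
    for device, output in outputs.items():
        mac = parse_identity(output).get("mac")
        if mac:
            by_mac[mac.lower()].append(device)
    return {mac: sorted(devs) for mac, devs in by_mac.items() if len(devs) > 1}


if __name__ == "__main__":
    # Hypothetical captured outputs; the MAC values are invented.
    collected = {
        "ps157": "mac=de:ad:be:ef:00:01\n",
        "ps158": "mac=DE:AD:BE:EF:00:01\n",
        "ps159": "mac=de:ad:be:ef:00:02\n",
    }
    print(find_shared_macs(collected))
```

Comparing the reported value against what the interface itself shows on each device would then tell you whether the script or the hardware is the source of the duplicates.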
it shouldn’t matter what device you are booting from, as the access to EEPROM hardware is exposed by the kernel via its dynamically created sysfs pseudo file system on boot.
Okay, I wasn’t able to find the eeprom file in that directory or on our filesystem. Do you know another place I should check?
just in case there’s a bug in the script that’s failing in some way and creating the same MAC address for each device even though they actually have different physical mac addresses, as that script is part of the middleware between the physical hardware and mender-server
Ah, got it. Good idea, will try that on the devices that we have access to and post results.
it might not be at exactly the same path as in the example above; have a look for an eeprom file somewhere under /sys/bus/i2c/devices or /sys/class/i2c-dev/
if not, at a guess you may need to enable some kernel module or configure the device tree to get it to appear
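To save hunting by hand, something like the following could walk those two directories and list any file named eeprom. It is only a sketch: the depth limit of 4 is an arbitrary choice to keep the walk bounded, since sysfs is full of symlinks that point back up the tree, and the function name is made up.

```python
from pathlib import Path

# Directories suggested above; whether they exist depends on the kernel config.
SEARCH_ROOTS = [Path("/sys/bus/i2c/devices"), Path("/sys/class/i2c-dev")]


def find_eeprom_files(roots, max_depth=4):
    """Return files named 'eeprom' at most max_depth levels below each root."""
    found = []
    for root in roots:
        if not root.is_dir():
            continue
        frontier = [root]
        for _ in range(max_depth):
            next_frontier = []
            for directory in frontier:
                try:
                    entries = sorted(directory.iterdir())
                except OSError:
                    continue  # unreadable directory, skip it
                for entry in entries:
                    if entry.name == "eeprom" and entry.is_file():
                        found.append(entry)
                    elif entry.is_dir():  # follows symlinks, as sysfs needs
                        next_frontier.append(entry)
            frontier = next_frontier
    return found


if __name__ == "__main__":
    hits = find_eeprom_files(SEARCH_ROOTS)
    if hits:
        for path in hits:
            print(path)
    else:
        print("no eeprom file found under", [str(r) for r in SEARCH_ROOTS])
```

An empty result would be consistent with the EEPROM driver not being enabled or the device tree not describing the EEPROM.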