Getting "error: fw_setenv returned failure: exit status 2" after "Committing update" OTA update

Hi guys, @mirzak @drewmoseley

Mender server: v2.6 (self hosted) & storing mender artifacts files in to AWS S3 bucket
OS: Ubuntu 18.04 X86_64
mender_convert: v2.4

I have created artifacts & .img using mender_convert tool. I have uploaded artifacts into Mender server GUI & triggered the deployments. At the end of deployment, after rebooting the target device but deployment status is showing as failed at Mender server GUI whereas at client side is showing latest artifact. It means that mender client success status is not updating to Mender server GUI,

2021-05-24 06:18:03 +0000 UTC error: Error rebooting device: signal: terminated

2021-05-24 06:20:20 +0000 UTC error: fw_setenv returned failure: exit status 2
2021-05-24 06:20:20 +0000 UTC error: Failed to probe the U-Boot environment for which separator to use. Got error: exit status 2
2021-05-24 06:20:21 +0000 UTC error: transient error: update commit failed: exit status 2

Could you please assist me to resolve this issue. Below are the logs for your reference.

2021-05-24 05:55:52 +0000 UTC info: Running Mender client version: 2.6.0
2021-05-24 05:55:54 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2021-05-24 05:55:54 +0000 UTC info: No public key was provided for authenticating the artifact
2021-05-24 05:55:54 +0000 UTC info: Opening device "/dev/sda3" for writing
2021-05-24 05:55:54 +0000 UTC info: Native sector size of block device /dev/sda3 is 512 bytes. Mender will write in chunks of 1048576 bytes
2021-05-24 06:17:57 +0000 UTC info: All bytes were successfully written to the new partition
2021-05-24 06:17:57 +0000 UTC info: The optimized block-device writer wrote a total of 49729 frames, where 125 frames did need to be rewritten (i.e., skipped)
2021-05-24 06:17:58 +0000 UTC info: Wrote 52143587328/52143587328 bytes to the inactive partition
2021-05-24 06:18:00 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2021-05-24 06:18:00 +0000 UTC info: State transition: update-after-store [Download_Leave] -> update-install [ArtifactInstall]
2021-05-24 06:18:01 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 3
2021-05-24 06:18:02 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2021-05-24 06:18:03 +0000 UTC info: Rebooting device(s)
2021-05-24 06:18:03 +0000 UTC info: Mender rebooting from active partition: /dev/sda2
2021-05-24 06:18:03 +0000 UTC info: Daemon terminated with SIGTERM
2021-05-24 06:18:03 +0000 UTC error: Error rebooting device: signal: terminated
2021-05-24 06:18:04 +0000 UTC info: State transition: reboot [ArtifactReboot_Enter] -> rollback [ArtifactRollback]
2021-05-24 06:20:17 +0000 UTC info: Running Mender client version: 2.6.0
2021-05-24 06:20:18 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
2021-05-24 06:20:18 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
2021-05-24 06:20:18 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
2021-05-24 06:20:19 +0000 UTC info: Committing update
2021-05-24 06:20:20 +0000 UTC error: fw_setenv returned failure:  exit status 2
2021-05-24 06:20:20 +0000 UTC error: Failed to probe the U-Boot environment for which separator to use. Got error: exit status 2
2021-05-24 06:20:21 +0000 UTC error: transient error: update commit failed: exit status 2
2021-05-24 06:20:21 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> rollback [ArtifactRollback]
2021-05-24 06:20:21 +0000 UTC info: Performing rollback
2021-05-24 06:20:24 +0000 UTC info: Rolling back to the inactive partition (2).
2021-05-24 06:20:26 +0000 UTC error: fw_setenv returned failure:  exit status 1
2021-05-24 06:20:26 +0000 UTC error: Failed to probe the U-Boot environment for which separator to use. Got error: exit status 1
2021-05-24 06:20:26 +0000 UTC error: Rollback failed: exit status 1
2021-05-24 06:20:27 +0000 UTC error: fatal error: exit status 1
2021-05-24 06:20:27 +0000 UTC info: State transition: rollback [ArtifactRollback] -> update-error [ArtifactFailure]
2021-05-24 06:20:27 +0000 UTC info: State transition: update-error [ArtifactFailure] -> cleanup [Error]
2021-05-24 06:20:28 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]

journalctl -u mender-client output is as below

It seems something is sending a signal to the Mender daemon. Do you have any state scripts or other tasks that may be doing something like that?

Drew

@drewmoseley thanks for quick response.
Here I am not using any state scripts.

ls -lrt /etc/mender/scripts/
total 4
-rw-r--r-- 1 root root 1 May 21 00:40 version

fw_printenv output as below:

bootcount=1
mender_boot_part=2
upgrade_available=0
mender_uboot_separator=
mender_boot_part_hex=3

Note: I have also observed that “_INCONSISTENT” is appending to Artifact name at Mender server GUI as "image_20210521_test_INCONSISTENT"

It seems similar issue as
https://tracker.mender.io/browse/MEN-4362

https://tracker.mender.io/browse/MEN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel

@oleorhagen can you help here? Looks like the bugs listed are assigned to you.

$ whereis fw_printenv
fw_printenv: /usr/bin/fw_printenv
$ whereis fw_setenv
fw_setenv: /usr/bin/fw_setenv
$ whereis fw_env.config
fw_env:
$ ls -lrt /etc/fw_env.config
ls: cannot access ‘/etc/fw_env.config’: No such file or directory

It seems that fw_env.config is are missing.

Note: I have also tried it with mender client version 2.6.0 but getting same issue.

@mirzak @oleorhagen is there any suggestions to crack this above issue?

Hi @Rohita83
Interesting that you should not find the fw_env.config file.

Any chance that you could give the commands you used for converting the image, and a download link to the image itself, so I could try and reproduce this?

Since image size is huge big & I am not allowed to share it as per customer restriction. sorry for this.

I have shared mender_convert v2.4.0 tool log for reference. & mender client version v2.6.0.

convert.log.yml (28.6 KB)

I have also tried mender client version 2.5.1. first time OTA is succeed & second time it is in progress. still fw_env.config file is missing :frowning:

Before OTA fw_printenv o/p as below

bootcount=0
mender_boot_part=2
upgrade_available=0

After 1st OTA fw_printenv o/p as below

bootcount=1
mender_boot_part=3
upgrade_available=0
mender_uboot_separator=1
mender_boot_part_hex=3

But you are allowed to share the configs you used when converting?

@oleorhagen

again second time OTA has failed and appended [INCONSISTENT to artifact name.

After second OTA fw_printenv o/p is
bootcount=1
mender_boot_part=2
upgrade_available=0
mender_uboot_separator=
mender_boot_part_hex=2

2021-05-25 07:16:34 +0000 UTC info: Running Mender client version: 2.5.1
2021-05-25 07:16:35 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2021-05-25 07:16:35 +0000 UTC info: No public key was provided for authenticating the artifact
2021-05-25 07:16:35 +0000 UTC info: Opening device "/dev/sda2" for writing
2021-05-25 07:16:36 +0000 UTC info: Native sector size of block device /dev/sda2 is 512 bytes. Mender will write in chunks of 1048576 bytes
2021-05-25 07:40:47 +0000 UTC info: All bytes were successfully written to the new partition
2021-05-25 07:40:47 +0000 UTC info: The optimized block-device writer wrote a total of 49729 frames, where 22639 frames did need to be rewritten (i.e., skipped)
2021-05-25 07:40:49 +0000 UTC info: Wrote 52143587328/52143587328 bytes to the inactive partition
2021-05-25 07:40:50 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2021-05-25 07:40:50 +0000 UTC info: State transition: update-after-store [Download_Leave] -> update-install [ArtifactInstall]
2021-05-25 07:40:51 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 2
2021-05-25 07:40:52 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2021-05-25 07:40:52 +0000 UTC info: Rebooting device(s)
2021-05-25 07:40:52 +0000 UTC info: Mender rebooting from active partition: /dev/sda3
2021-05-25 07:40:53 +0000 UTC error: Error rebooting device: signal: terminated
2021-05-25 07:40:54 +0000 UTC info: State transition: reboot [ArtifactReboot_Enter] -> rollback [ArtifactRollback]
2021-05-25 07:40:53 +0000 UTC info: Daemon terminated with SIGTERM
2021-05-25 07:43:08 +0000 UTC info: Running Mender client version: 2.5.1
2021-05-25 07:43:08 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
2021-05-25 07:43:09 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
2021-05-25 07:43:09 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
2021-05-25 07:43:10 +0000 UTC info: Committing update
2021-05-25 07:43:12 +0000 UTC error: fw_setenv returned failure:  exit status 1
2021-05-25 07:43:12 +0000 UTC error: Failed to probe the U-Boot environment for which separator to use. Got error: exit status 1
2021-05-25 07:43:12 +0000 UTC error: transient error: update commit failed: exit status 1
2021-05-25 07:43:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> rollback [ArtifactRollback]
2021-05-25 07:43:12 +0000 UTC info: Performing rollback
2021-05-25 07:43:14 +0000 UTC info: Rolling back to the inactive partition (3).
2021-05-25 07:43:15 +0000 UTC error: fw_setenv returned failure:  exit status 1
2021-05-25 07:43:15 +0000 UTC error: Failed to probe the U-Boot environment for which separator to use. Got error: exit status 1
2021-05-25 07:43:15 +0000 UTC error: Rollback failed: exit status 1
2021-05-25 07:43:15 +0000 UTC error: fatal error: exit status 1
2021-05-25 07:43:16 +0000 UTC info: State transition: rollback [ArtifactRollback] -> update-error [ArtifactFailure]
2021-05-25 07:43:16 +0000 UTC info: State transition: update-error [ArtifactFailure] -> cleanup [Error]
2021-05-25 07:43:16 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]

I see, so this is using the x86 config. Which means it is using Grub, and not U-Boot, so no wonder the fw_env.config file is not found.

I’m not too familiar with the Grub setup though, so I’m afraid I’m not too much help here.

@kacf maybe you have some pointers?

@kacf @tranchitella

is there any suggestion for above issue.

surprising that above issue is observing in S3 bucket artifact storage in hosted mender server instead of EC2 minio artifact storage & also here first OTA is success but second or onward OTA is failing with the same image/artifacts.

Below is the sequence:

  • Flash the new image on the target device
    
  • small OTA push  i.e. single file OTA update (~1 mins)
    
  • Full OS mender OTA update (~30 mins) ==> success deployment
    
  • again small OTA push  i.e. single file OTA update (~1 mins)
    
  • again Full OS mender OTA update (~30 mins) ==> fail deployment as below screenshot
    

Hey @Rohita83, sorry for the delay, busy times!

There are several things here which are a bit suspicious. Let’s look at the first issue to begin with:

Error rebooting device: signal: terminated

What the client expects, is that the reboot command succeeds, and then at some point later, the client is killed. But it looks like what’s happening here is that the command fails, but the reboot still happens. I suspect the reason is that the reboot command itself gets killed as part of the reboot, and never gets to report the success. The client is not taking this scenario into account.

If you have a chance to patch the client, can you try to apply this patch:

diff --git a/system/system.go b/system/system.go
index a96d7d8..c676058 100644
--- a/system/system.go
+++ b/system/system.go
@@ -33,10 +33,7 @@ func NewSystemRebootCmd(command Commander) *SystemRebootCmd {
 }
 
 func (s *SystemRebootCmd) Reboot() error {
-       err := s.command.Command("reboot").Run()
-       if err != nil {
-               return err
-       }
+       s.command.Command("reboot").Run()
 
        // Wait up to ten minutes for reboot to kill the client, otherwise the
        // client may mistake a successful return code as "reboot is complete,

Note that it needs to be applied to the client that you upgrade from, not the one upgrade to. So you may need to install it manually first, or flash the device from scratch.

@kacf Thanks for informative response.

Since i am using mender_convert v2.4.0 tool directly to make artifact which having mender_client version v2.6.0.

To apply above patch into mender_client, i need to modify mender_convert file mender-convert-modify and create deb package for this. Please let me know if i am wrong. could you please assist me to make this deb package.

He @Rohita83, can you try this package?

It is a development build, so don’t use it in production. But let’s see if it fixes the problem.

@kacf

Since we are on mender Client 2.6, there is a change in the changelog as below screenshot that was dated back in April 2021, titled “mender-binary-delta (1.2.1)” where it mentions that there has been a fix for the incorrect status during a rollback when the bootloader is being used. Would this change be a part of the Client 2.6 release? Or would I have to upgrade the server to 2.7 to get these changes? is the above change relevant to the issue that I am experiencing?

I will also test the change that u have mentioned above and will come back with the result.

https://docs.mender.io/2.7/release-information/release-notes-changelog

The mender-binary-delta version is independent of both the client and the server, it should work with either. It depends on which version you have installed from here.

We are switching to a new changelog structure in the development version of our docs, split into component pages. This avoids everything being lumped together like it is now, which is hopefully easier to understand and follow. Check it out here.

@kacf Thanks for quick response.

As per you provided mender deb package, I have modified mender convert utility to include this mender client package to create mender artifact as below

Note: Here i have rename your client package name as deb_name=mender-client_2.6.0-LOCAL_amd64.deb

Attached mender convert log for reference.
convert.log.yml (26.3 KB)

After creation artifact (i.e. alto_p3_upgV2.6_20210601_Testserver_TEST
.mender) & upload it into server GUI successfully, I have triggered OTA update twice but unfortunately it failed, attached logs screenshots for reference below as log-1

I have also directory update OTA to change in Home partition and then again trigger OTA but failed again as below log-2 for reference.

ARTIFACT_NAME=“HOME_TEST_v1.0”
DEVICE_TYPE=“x86_64”
OUTPUT_PATH=“HOME_TEST_v1.0.mender”
DEST_DIR=“/home/alto/TEST/”
FILE_TREE=“Test_local”

./directory-artifact-gen -n ${ARTIFACT_NAME} -t ${DEVICE_TYPE} -d ${DEST_DIR} -o ${OUTPUT_PATH} ${FILE_TREE}

Note: Additionally I have seen that OTA deployment (multiple attempts) is not consistent in mender v2.6 with S3 bucket artifact storage as below observations.

Waiting for your input to crack this above issue.

@kacf
Additional information, earlier I was using mender version 2.6 with self-hosted Minio (EC2) based storage configuration that OTA deployment time was long ~2 hrs 30 mins but i was not experience this above issue. Once i have switched AWS S3 based storage configuration, the OTA deployment time has down ~30 mins only but having this above mention issue randomly (sometimes its success and sometimes fail). Here I don’t think this error related to S3 based configuration but just it is my observation.

Hey @Rohita83,

Ok, so at least we got rid of that error. I’d really like to isolate the problem on the client side. Any chance you can copy the artifact onto the device, or to a memory stick, and install it using standalone mode? This would give us a chance to figure out exactly what is going on.

Assuming you have put the artifact in /data/artifact.mender, do these steps to do a manual install:

mender install /data/artifact.mender
reboot

# At this stage, maybe try "fw_printenv" to verify that the
# environment is working correctly.

mender commit

Does that work?