Mender on EKS can't access my S3 Bucket

Hello,

I am new to Mender and trying to deploy mender-server 3.7.7 with Kubernetes on AWS EKS.

All my pods are online except mender-deployments which fails with the following error:

main: failed to setup storage client: s3: failed to check bucket preconditions: s3: insufficient permissions for accessing bucket '<my s3 bucket>'

I have confirmed that the IAM role used by the EC2 instance the pod is deployed to has permissions to access the bucket (in fact, I attached the AmazonS3FullAccess managed policy just to test).

Here is my S3 config that I used when installing Mender:

s3:
    AWS_URI: "https://s3.us-east-2.amazonaws.com"
    AWS_BUCKET: "<my s3 bucket>"
    AWS_REGION: "us-east-2"
    AWS_ACCESS_KEY_ID: ""
    AWS_SECRET_ACCESS_KEY: ""
    AWS_FORCE_PATH_STYLE: "false"

I read in the docs that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are not needed when deployed on EC2 with the correct IAM role and permissions (which, as far as I can tell, is the case here). I tried deploying with them absent, and also as empty strings, in case there is a difference.
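One way to check which credential environment variables the deployments pod actually sees (a sketch; the pod/deployment name and namespace depend on your Helm release, and empty-string variables may behave differently from unset ones in some SDK credential chains):

```shell
# List AWS-related env vars inside the running deployments pod.
# "mender" namespace and "mender-deployments" deployment name are assumptions
# based on a typical install; adjust to your release.
kubectl -n mender exec deploy/mender-deployments -- env | grep '^AWS_'
```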

Is there anything obvious that I am missing or any suggestions on how to debug the issue?

To ensure my cluster can access the bucket I created the following pod:

apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  namespace: mender
spec:
  containers:
    - name: debug-container
      image: amazon/aws-cli
      command: ["sleep", "3600"]
      tty: true
      stdin: true

Then I exec’ed into it with kubectl exec and was able to use the AWS CLI to access the bucket. This makes me think it is not a permissions issue on the EKS side.
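The commands I ran looked roughly like this (bucket name redacted; the pod name and namespace come from the manifest above):

```shell
# Open a shell in the debug pod and list the bucket with the node's
# IAM role credentials (no static keys configured in the pod).
kubectl -n mender exec -it debug-pod -- \
  aws s3 ls s3://<my s3 bucket> --region us-east-2
```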

It appears that uninstalling and then reinstalling Mender with Helm in my cluster has fixed the issue.

Hi @JamesTann,
Thanks for getting back. Could it have been a race condition, i.e. the IAM role was created after the deployments service had already been scheduled?

Interesting issue, though. Did you configure a custom service account? If you don't, the Helm chart creates one named default.
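If you do use a custom service account, on EKS you can bind it to an IAM role directly via IRSA instead of relying on the node instance profile. A minimal sketch (the role name and account ID are placeholders; the annotation key is the standard EKS one):

```yaml
# Pods using this service account receive the annotated IAM role's
# credentials rather than the EC2 instance profile's.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mender-deployments
  namespace: mender
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<mender-s3-role>
```

This also sidesteps races against instance-profile permission changes, since the role binding travels with the service account.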

Thanks!

Hi @robgio,

Yes, it was likely a race condition, although what seemed strange to me was that restarting the deployments pod (after I knew the IAM role was attached and had the correct permissions) did not fix the issue; only a full reinstall of the chart did the trick. I think this is because I was adding permissions to the role after installing the chart, once I realized which permissions were required.

For the service account, I did create my own.