Awx-operator: AWX Operator cannot create pods after upgrading from k8s 1.21.3 to 1.22.0

Created on 11 Aug 2021  ·  6 Comments  ·  Source: ansible/awx-operator

ISSUE TYPE
  • Bug Report
SUMMARY

The AWX Operator cannot create the AWX pods after upgrading Kubernetes from 1.21.3 to 1.22.0.
It worked well before the upgrade.

ENVIRONMENT
  • AWX version: 19.2.2
  • Operator version: 0.12.0
  • Kubernetes version: v1.22.0
  • AWX install method: on premise with ubuntu 20.04 and docker
STEPS TO REPRODUCE

On Kubernetes 1.21.3, install awx-operator and set up AWX. It works well.
Upgrade to 1.22.0.
Kill and recreate the AWX deployment.
The AWX postgres pod comes up.
The AWX server pod does not come up.

EXPECTED RESULTS

AWX should be up and in the Running state.

ACTUAL RESULTS

Only the AWX postgres pod is up:

NAME                                     READY   STATUS    RESTARTS      AGE
awx-itd-postgres-0                       1/1     Running   0             8m29s
awx-operator-545497f7d5-k88wr            1/1     Running   1 (34m ago)   55m
nfs-client-provisioner-5c95d8f86-9tm6k   1/1     Running   5 (34m ago)   5d4h
ADDITIONAL INFORMATION

YAML file I use to create the pods:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-itd
spec:
  service_type: LoadBalancer
  loadbalancer_protocol: http
  loadbalancer_port: 80
  loadbalancer_annotations: |
    metallb.universe.tf/address-pool: bde-172-17
  hostname: awx.bde.lab
  replicas: 2
  projects_persistence: true
  projects_storage_class: managed-nfs-storage
  postgres_storage_class: managed-nfs-storage
  #adminUser: admin
AWX-OPERATOR LOGS
PLAY RECAP *********************************************************************
localhost                  : ok=29   changed=2    unreachable=0    failed=1    skipped=27   rescued=0    ignored=0


-------------------------------------------------------------------------------
{"level":"error","ts":1628693554.202394,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"awx-controller","request":"default/awx-itd","error":"event runner on failed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90"}
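The operator writes each error as a single JSON object with the stack trace packed into one escaped string, which is hard to read in raw form. A minimal sketch for unpacking such a line is shown here; the function name `format_operator_log` and the shortened sample record are illustrative assumptions, not part of the operator or of this report.

```python
import json


def format_operator_log(line: str) -> str:
    """Render one JSON log line as a header plus one stack frame per line."""
    record = json.loads(line)
    header = "[{}] {}: {}".format(
        record.get("level", "?"),
        record.get("msg", ""),
        record.get("error", ""),
    )
    # The stacktrace field separates entries with newlines; file locations
    # are tab-indented, so splitting on "\n" keeps that indentation.
    frames = [frame for frame in record.get("stacktrace", "").split("\n") if frame]
    return "\n".join([header, *frames])


# Hypothetical, shortened record with the same fields as the log line above.
sample = (
    '{"level":"error","msg":"Reconciler error",'
    '"error":"event runner on failed",'
    '"stacktrace":"pkg.Func\\n\\tpkg/file.go:128"}'
)
print(format_operator_log(sample))
```

Piping the operator pod's log through a helper like this makes it easier to see that the Go stack trace only reports that the Ansible run failed; the failing task itself is in the Ansible output above the PLAY RECAP.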

All 6 comments

Observing the same.

After updating microk8s to 1.22 and then trying to deploy operator 0.13, the operator container was upgraded, but it seems to hit errors when it tries to upgrade the awx pod:

"job":"6175742077372812453","name":"awx","namespace":"default","error":"exit status 2","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/runner.(*runner).Run.func1\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/runner/runner.go:239"}
{"level":"error","ts":1629084951.678016,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"awx-controller","request":"default/awx","error":"event runner on failed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90"}

The AWX YAML is extremely basic:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  task_privileged: true

I've been chatting with some folks from the operator-sdk team.

This might be caused by the fact that our operator is built on version 0.19 of the SDK.

cc @Spredzy @rooftopcellist

Confirmed with the operator-sdk team that operators built on SDK 0.x are not going to work on newer versions of Kubernetes. We've prioritized bumping the SDK in the near term. Any updates will be posted here.

I can confirm this issue exists on version 1.22.1.
Going back to version 1.21.3 resolves the issue 🥳
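The reports so far put the break at the 1.21 to 1.22 boundary. A rough sketch of a version check based only on those reports follows; the function names and the threshold are assumptions drawn from this thread, not an official compatibility rule.

```python
import re


def parse_k8s_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings like 'v1.22.0-rc.0+5c2f7cd'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized Kubernetes version: {!r}".format(version))
    return tuple(int(part) for part in match.groups())


def needs_updated_operator(version: str) -> bool:
    """True if the cluster is at or past the version reported broken here."""
    # Assumption from this thread: 1.21.3 works, 1.22.0 and 1.22.1 do not.
    return parse_k8s_version(version)[:2] >= (1, 22)


for v in ("v1.21.3", "v1.22.0", "1.22.1", "v1.22.0-rc.0+5c2f7cd"):
    print(v, needs_updated_operator(v))
```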

With the addition of this PR (https://github.com/ansible/awx-operator/pull/508), I am able to deploy the awx-operator and AWX app to an OpenShift 4.9 cluster (k8s v1.22.0).

$ oc version
Client Version: 4.6.8
Server Version: 4.9.0-0.nightly-2021-08-23-224104
Kubernetes Version: v1.22.0-rc.0+5c2f7cd

All containers are running and I am able to log in and run a job from the UI.

To use this fix, you will need to build the awx-operator image from devel at the moment as a release with this fix has not yet been cut.

I was also able to deploy 0.19.3 by editing https://raw.githubusercontent.com/ansible/awx-operator/devel/deploy/awx-operator.yaml to:

...
      containers:
        - name: awx-operator
          image: 'quay.io/ansible/awx-operator:devel'
          env:
            - name: WATCH_NAMESPACE
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: awx-operator
            - name: ANSIBLE_GATHERING
              value: explicit
            - name: OPERATOR_VERSION
              value: devel
            - name: ANSIBLE_DEBUG_LOGS
              value: 'false'
...

Note the changes to the image tag and OPERATOR_VERSION (both set to devel), then apply the edited file.
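For anyone who prefers to script that edit rather than do it by hand, here is a minimal sketch that rewrites both fields in the manifest text. The function name `use_devel_operator` and the `0.13.0` tag in the sample are illustrative assumptions; the sketch also assumes the manifest keeps the `- name:` / `value:` layout shown above.

```python
import re


def use_devel_operator(manifest: str) -> str:
    """Point the operator image tag and OPERATOR_VERSION at 'devel'."""
    # Replace whatever tag follows the awx-operator image name.
    manifest = re.sub(
        r"(quay\.io/ansible/awx-operator):[\w.\-]+",
        r"\g<1>:devel",
        manifest,
    )
    # Replace the value on the line right after '- name: OPERATOR_VERSION'.
    manifest = re.sub(
        r"(- name: OPERATOR_VERSION[ \t]*\n[ \t]*value:[ \t]*).*",
        r"\g<1>devel",
        manifest,
    )
    return manifest


# Hypothetical fragment; the tag 0.13.0 is only an example value.
sample = """\
        - name: awx-operator
          image: quay.io/ansible/awx-operator:0.13.0
          env:
            - name: OPERATOR_VERSION
              value: 0.13.0
"""
print(use_devel_operator(sample))
```

Plain text substitution is used here to keep the sketch dependency-free; a YAML parser would be more robust if the manifest layout changes.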

Thanks!
