AWX Operator fails to perform installation.
Had an instance of 17.0.1 running, don't care if the data persists either.
Performed data migration following Data Migration instructions.
Performed install of AWX Operator following INSTALL.md
Follow INSTALL.md
Expected to see pods/AWX instance
minikube kubectl apply -- -f myawx.yml
After 30 minutes only the operator pod is running; tailing its logs shows a looping error.
xxx@yyy:~$ minikube kubectl get pods
NAME READY STATUS RESTARTS AGE
awx-operator-5595d6fc57-hdj9d 1/1 Running 0 29m
xxx@yyy:~$ minikube version
minikube version: v1.18.1
{
"level": "error",
"ts": 1620223136.6924627,
"logger": "logging_event_handler",
"msg": "",
"name": "custom.name.awx",
"namespace": "default",
"gvk": "awx.ansible.com/v1beta1,Kind=AWX",
"event_type": "runner_on_failed",
"job": "2601737961087659062",
"EventData.Task": "Create Database if no database is specified",
"EventData.TaskArgs": "",
"EventData.FailedTaskPath": "/opt/ansible/roles/installer/tasks/database_configuration.yml:68",
"error": "[playbook task failed]",
"stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/events.loggingEventHandler.Handle\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/events/log_events.go:87"
}
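To pull the failing task out of a long operator log, a small filter can help. This is a minimal sketch, not part of AWX or the operator: it assumes the raw log has one JSON object per line (the entry above is a pretty-printed copy), and the function name `failed_tasks` and the sample lines are made up for illustration from the fields shown in this thread.

```python
import json


def failed_tasks(lines):
    """Yield (task, path) for every runner_on_failed event found in the log lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain-text Ansible output such as TASK banners
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or otherwise non-JSON lines
        if event.get("event_type") == "runner_on_failed":
            yield event.get("EventData.Task"), event.get("EventData.FailedTaskPath")


# Illustrative sample built from the fields quoted in this thread.
sample = [
    "TASK [installer : Apply deployment resources] ***",
    '{"level":"error","logger":"logging_event_handler","event_type":"runner_on_failed",'
    '"EventData.Task":"Create Database if no database is specified",'
    '"EventData.FailedTaskPath":"/opt/ansible/roles/installer/tasks/database_configuration.yml:68"}',
    '{"level":"info","logger":"logging_event_handler","event_type":"playbook_on_task_start",'
    '"EventData.Name":"installer : Patching labels to AWX kind"}',
]

for task, path in failed_tasks(sample):
    print(f"{task} -> {path}")
```

The same function could be fed from `sys.stdin` when piping in the operator pod's `kubectl logs` output.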
Blew the entire thing away and restarted fresh. Service pods are stuck 0/4 pending. It's been an additional 45 minutes now.
This is the exact task that continuously fails over and over again with no real output/log.
TASK [installer : Apply deployment resources] **********************************
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:34
Output
{
"level":"error",
"ts":1620229749.932002,
"logger":"controller-runtime.controller",
"msg":"Reconciler error",
"controller":"awx-controller",
"request":"default/awx",
"error":"event runner on failed",
"stacktrace":
"github.com/go-logr/zapr.(*zapLogger).Error
pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90"
}
Logs
awx-web: stalls after the image downloads, with the following two lines:
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
redis: stalls after the image downloads
awx-task: nothing
awx-ee: nothing
@bandwiches we will need more information to understand what is going on.
Please send us the following:
kubectl get awx -o yaml awx
kubectl describe deployment awx
kubectl describe statefulset awx-postgres
kubectl get pods
kubectl get events
Thanks!
I'm seeing the same thing in minikube. Here is the output from mine. I saw the same error in Kubernetes on CentOS 7 as well. Fresh install with all the latest binaries.
describe_awx.txt
describe_stateful.txt
events.txt
get_awx.txt
pods.txt
For the sake of clarity, I feel I should state that I'm using minikube since it is recommended by the AWX install guide.
get_awx.txt
describe_deployment_awx.txt
describe_statefulset.txt
get_pods.txt
get_events.txt
(Edit) I see a CPU warning (insufficient CPU) for the AWX pod. For the record, this is a dedicated VM with 2 CPUs and 2 GB of RAM, and it has had no issues running AWX v15 and v17. Does the new install method introduced in v19 suddenly complain about resources? It's understandable that requirements could change from version to version, but it would be nice to know the minimal system requirements now that they are an issue.
Here is the snippet of the error I'm seeing, which I believe is exactly like @bandwiches's error.
{"level":"error","ts":1620310325.2259731,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"awx-controller","request":"default/awx","error":"event runner on failed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90"}
{"level":"info","ts":1620310327.3738635,"logger":"logging_event_handler","msg":"[playbook task]","name":"awx","namespace":"default","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"261049867304784443","EventData.Name":"installer : Patching labels to AWX kind"}
@exodusprime1337
Spot on.
I have the exact same error, but on a bare-metal kubernetes cluster:
AWX version: 19.1.0
Operator version: 0.9.0
Kubernetes version: v1.21.0 with containerd 1.4.4
AWX install method: operator
@bandwiches in your case, the issue looks related to CPU, as you mentioned:
NAME READY STATUS RESTARTS AGE
awx-5b58db49c-9gslf 0/4 Pending 0 7m3s
awx-operator-5595d6fc57-92txg 1/1 Running 0 10m
awx-postgres-0 1/1 Running 0 7m14s
LAST SEEN TYPE REASON OBJECT MESSAGE
87s Warning FailedScheduling pod/awx-5b58db49c-9gslf 0/1 nodes are available: 1 Insufficient cpu.
Looking at your deployment, we can see it's using the default resource requests:
awx-web:
Image: quay.io/ansible/awx:19.1.0
Port: 8052/TCP
Host Port: 0/TCP
Requests:
cpu: 1
memory: 2Gi
....
awx-task:
Image: quay.io/ansible/awx:19.1.0
Port: <none>
Host Port: <none>
Args:
/usr/bin/launch_awx_task.sh
Requests:
cpu: 500m
memory: 1Gi
....
Please note that the suggested values (memory and CPU) are still the same (see https://github.com/ansible/awx-operator/pull/93/files) and you can override them to fit your needs. That should do the job for you. Please let us know.
Same thing here @exodusprime1337
LAST SEEN TYPE REASON OBJECT MESSAGE
2s Warning FailedScheduling pod/awx-5b58db49c-bfwnt 0/1 nodes are available: 1 Insufficient memory.
21m Normal SuccessfulCreate replicaset/awx-5b58db49c Created pod: awx-5b58db49c-bfwnt
awx-web:
Image: quay.io/ansible/awx:19.1.0
Port: 8052/TCP
Host Port: 0/TCP
Requests:
cpu: 1
memory: 2Gi
Requests:
cpu: 500m
memory: 1Gi
If you run kubectl get nodes <NODE_NAME> -o yaml, you will see the amount of CPU and memory available on your node:
allocatable:
cpu: 7800m
ephemeral-storage: "222240964241"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 31547268Ki
pods: "250"
capacity:
cpu: "8"
ephemeral-storage: 235495Mi
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32173956Ki
pods: "250"
> kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
p70 763m 9% 12685Mi 41%
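The comparison behind the FailedScheduling events can be approximated by hand. The sketch below is an illustration under stated assumptions, not how the scheduler is implemented: it only sums the awx-web and awx-task requests quoted in this thread (the pod's other containers are not shown, so their requests are left out), it ignores requests from pods already running on the node, and the helper names `parse_cpu`, `parse_memory`, and `fits` are invented for this example.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity ('500m', '1', '7800m') to millicores."""
    quantity = str(quantity)
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


# Binary and decimal suffixes used by Kubernetes memory quantities (subset).
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "k": 10**3, "M": 10**6, "G": 10**9}


def parse_memory(quantity):
    """Convert a memory quantity ('2Gi', '31547268Ki', '128M') to bytes."""
    quantity = str(quantity)
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def fits(requests, allocatable_cpu, allocatable_memory):
    """Check whether the summed container requests fit within node allocatable."""
    cpu_needed = sum(parse_cpu(r["cpu"]) for r in requests.values())
    memory_needed = sum(parse_memory(r["memory"]) for r in requests.values())
    return {
        "cpu_ok": cpu_needed <= parse_cpu(allocatable_cpu),
        "memory_ok": memory_needed <= parse_memory(allocatable_memory),
    }


# Requests as shown in the deployment description in this thread.
requests = {
    "awx-web": {"cpu": "1", "memory": "2Gi"},
    "awx-task": {"cpu": "500m", "memory": "1Gi"},
}

# Allocatable values from the node output above.
print(fits(requests, "7800m", "31547268Ki"))

# A hypothetical small node with 2 CPUs and 2Gi allocatable (real allocatable is lower than capacity).
print(fits(requests, "2", "2Gi"))
```

Because pods that are already scheduled (the operator, postgres, system pods) also hold requests, a node can report "Insufficient cpu" even when a check like this passes, so treat the result as a lower bound.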
@tchellomello thanks for the update there. I'm following you, but I have a serious concern about the AWX install tutorial, since it gives a bare minimum config and that leads to this result. Perhaps there should be more cross-communication between the two packages to ensure that the minimal config is actually the bare minimum? These settings are never mentioned in the install doc.
Edit -
Per your link, I noticed both of these.
awx_v1beta1_molecule.yml: (cpu: 500m, memory: 128M // cpu: 500m, memory: 128M)
installer\defaults\main.yml: (cpu: 1000m, memory: 1Gi // cpu: 500m, memory: 2Gi)
One issue is that the AWX INSTALL.md doesn't mention minimal requirements, which makes the transition from v17 to v19 even harder, since what worked before may no longer work as the default. While I understand requirements may change, it would be nice to know when the minimal requirements or defaults have changed.
@bandwiches I hear you, I agree that the documentation has lots of room to improve, and please if you see any place that could use some enhancement, do not hesitate to submit a PR.
Regarding https://github.com/ansible/awx-operator/blob/devel/deploy/crds/awx_v1beta1_molecule.yaml, that file is used by the molecule tests here -> https://github.com/ansible/awx-operator/blob/devel/molecule/test-local/converge.yml#L31, so it is a totally different scenario and does not necessarily need to be consistent: for this test we don't need to allocate that amount of memory and CPU.
I would love to, except I think the awx repo is outpacing awx-operator and making the inconsistencies impossible to fix.
In regards to your response about system settings - understood and that's fair, no qualms about that.
I ran into another issue once I was able to resolve the resources issue, and I feel it's still appropriate here. The awx-service was not externally reachable by default (regardless of Ingress or NodePort). The issue was actually related to iptables not adding a rule to allow the destination port for the service. minikube service awx-service --url returns the IP:PORT, but that PORT is never allowed through iptables. Adding a rule to the DOCKER chain on the dport jumping to ACCEPT fixed this.
Second issue - minikube service IP. I don't see anywhere that this is configurable, though I'll admit that I may be overlooking it given how many different repos I've had to visit today. This actually presents 2 issues: (1) we're now required to route to the host first to reach the underlying subnet, and (2) there's no consideration for organizational overlap if that subnet is already in use. I believe the default underlying network is 192.168.49.0/24, which is huge for a bridge/transit network and increases the risk of overlap.
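The overlap concern can be checked ahead of time with Python's standard `ipaddress` module. This is only a sketch: 192.168.49.0/24 is the default quoted above with an "I believe", the list of subnets already in use is entirely hypothetical, and the helper name `conflicts` is made up for this example.

```python
import ipaddress

# Network mentioned above as the believed minikube default.
minikube_net = ipaddress.ip_network("192.168.49.0/24")

# Hypothetical subnets already allocated inside an organization (illustrative only).
in_use = ["10.0.0.0/8", "192.168.48.0/22", "172.16.0.0/12"]


def conflicts(candidate, existing):
    """Return the existing subnets that overlap with the candidate network."""
    return [net for net in existing if candidate.overlaps(ipaddress.ip_network(net))]


print(minikube_net.num_addresses)       # size of the /24
print(conflicts(minikube_net, in_use))  # subnets that would collide
```

With these sample values, only 192.168.48.0/22 collides, since it spans 192.168.48.0 through 192.168.51.255 and therefore contains the whole /24.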
Hi bandwiches,
Thanks a lot for the hint. I had the same issue deploying Ansible AWX on a k3s cluster in a VM, and I had no idea what was happening or how to troubleshoot it. Following your post, I increased the memory and CPU cores of my AWX VM host, and that fixed the problem.