awx-operator: Wrong number of AWX instances (unmanaged PostgreSQL, replicas > 1)

Created on 2021-07-09  ·  13 comments  ·  Source: ansible/awx-operator

Issue type
  • Bug report
Summary

After creating an AWX deployment with an external (unmanaged) PostgreSQL, api/v2/instances shows a count lower than the number of replicas.

Environment
  • AWX version: 19.1.0
  • Operator version: 0.12.0
  • Kubernetes version: 1.17.9
  • AWX install method: operator
Steps to reproduce

kubectl apply -f awx-deploy.yml

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  replicas: 2
  image_version: 19.1.0
  admin_user: admin
  admin_password_secret: awx-admin-password
  ingress_type: ingress
  ingress_annotations: |
   kubernetes.io/ingress.class: nginx
  hostname: awx-demo.example.com
  ingress_tls_secret: awx-ingress-tls
  web_resource_requirements:
     requests:
       cpu: 400m
       memory: 2Gi
     limits:
       cpu: 1000m
       memory: 4Gi
  task_resource_requirements:
     requests:
       cpu: 250m
       memory: 1Gi
     limits:
       cpu: 500m
       memory: 2Gi
  ee_resource_requirements:
     requests:
       cpu: 250m
       memory: 1Gi
     limits:
       cpu: 500m
       memory: 2Gi

---
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: XXXX
  port: "XXXX"
  database: XXX
  username: XXX
  password: XXX
  type: unmanaged
type: Opaque
Expected results

An AWX HA configuration with 2 instances

Actual results

api/v2/ping/

{
    "ha": false,
    "version": "19.1.0",
    "active_node": "awx-5776c59677-h9mrj",
    "install_uuid": "ba8b8bc6-1010-4e09-b5b2-08cc06901800",
    "instances": [
        {
            "node": "awx-5776c59677-h9mrj",
            "uuid": "5b18352d-24e7-47ce-a18d-e0e4cbd994d5",
            "heartbeat": "2021-07-09T09:09:26.165742Z",
            "capacity": 0,
            "version": "19.1.0"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        }
    ]
}

api/v2/instances/

{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "id": 1,
            "type": "instance",
            "url": "/api/v2/instances/1/",
            "related": {
                "jobs": "/api/v2/instances/1/jobs/",
                "instance_groups": "/api/v2/instances/1/instance_groups/"
            },
            "uuid": "5b18352d-24e7-47ce-a18d-e0e4cbd994d5",
            "hostname": "awx-5776c59677-h9mrj",
            "created": "2021-07-09T09:08:31.072893Z",
            "modified": "2021-07-09T09:09:26.165742Z",
            "capacity_adjustment": "1.00",
            "version": "19.1.0",
            "capacity": 0,
            "consumed_capacity": 0,
            "percent_capacity_remaining": 0.0,
            "jobs_running": 0,
            "jobs_total": 0,
            "cpu": 0,
            "memory": 0,
            "cpu_capacity": 0,
            "mem_capacity": 0,
            "enabled": true,
            "managed_by_policy": true
        }
    ]
}

kubectl get pods -n awx
NAME                   READY   STATUS    RESTARTS   AGE
awx-5776c59677-74964   4/4     Running   0          14m
awx-5776c59677-h9mrj   4/4     Running   0          14m
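
The mismatch in this report (two running pods, one registered instance) can be checked mechanically by comparing pod names against the hostname fields returned by api/v2/instances/. The following is a minimal Python sketch, assuming both inputs have already been fetched; the function name unregistered_pods is hypothetical, and the sample data is copied from the outputs in this report.

```python
def unregistered_pods(pod_names, instances_payload):
    """Return pod names that have no matching AWX instance hostname."""
    # Hostnames AWX knows about, taken from the "results" list of api/v2/instances/.
    registered = {item["hostname"] for item in instances_payload.get("results", [])}
    return sorted(name for name in pod_names if name not in registered)


# Sample data taken from the report (trimmed to the fields used here).
pods = ["awx-5776c59677-74964", "awx-5776c59677-h9mrj"]
instances = {"count": 1, "results": [{"id": 1, "hostname": "awx-5776c59677-h9mrj"}]}

print(unregistered_pods(pods, instances))
```

With the data from this report, the only pod without a registered instance is awx-5776c59677-74964, the same pod whose awx-task container logs the RuntimeError.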

kubectl exec pod/awx-5776c59677-74964 -n awx -c awx-web -it -- /bin/bash
bash-4.4$ awx-manage check_db
Database Version: PostgreSQL 12.7 (Ubuntu 12.7-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

kubectl logs -n awx pod/awx-5776c59677-74964 -c awx-task

...
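  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/managers.py", line 107, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
2021-07-09 09:17:27,304 INFO exited: callback-receiver (exit status 1; not expected)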
...

Additional information

No such issue was observed when using a managed PostgreSQL.

The problem is resolved after running:

kubectl rollout restart -n awx deployment/awx

kubectl get pods -n awx
NAME                   READY   STATUS    RESTARTS   AGE
awx-686dd7df69-52kgh   4/4     Running   0          4m26s
awx-686dd7df69-v8w2g   4/4     Running   0          4m23s

api/v2/ping/

{
    "ha": true,
    "version": "19.1.0",
    "active_node": "awx-686dd7df69-52kgh",
    "install_uuid": "ba8b8bc6-1010-4e09-b5b2-08cc06901800",
    "instances": [
        {
            "node": "awx-686dd7df69-52kgh",
            "uuid": "ea773db2-7007-47a8-9987-16ddc79d6ec3",
            "heartbeat": "2021-07-09T09:33:46.787935Z",
            "capacity": 0,
            "version": "19.1.0"
        },
        {
            "node": "awx-686dd7df69-v8w2g",
            "uuid": "acef28b0-3977-4dbe-8c10-e9c4f11adab8",
            "heartbeat": "2021-07-09T09:33:52.378214Z",
            "capacity": 0,
            "version": "19.1.0"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        }
    ]
}

api/v2/instances/

{
    "count": 2,
    "next": null,
    "previous": null,
    "results": [
        {
            "id": 3,
            "type": "instance",
            "url": "/api/v2/instances/3/",
            "related": {
                "jobs": "/api/v2/instances/3/jobs/",
                "instance_groups": "/api/v2/instances/3/instance_groups/"
            },
            "uuid": "ea773db2-7007-47a8-9987-16ddc79d6ec3",
            "hostname": "awx-686dd7df69-52kgh",
            "created": "2021-07-09T09:33:46.194006Z",
            "modified": "2021-07-09T09:33:46.787935Z",
            "capacity_adjustment": "1.00",
            "version": "19.1.0",
            "capacity": 0,
            "consumed_capacity": 0,
            "percent_capacity_remaining": 0.0,
            "jobs_running": 0,
            "jobs_total": 0,
            "cpu": 0,
            "memory": 0,
            "cpu_capacity": 0,
            "mem_capacity": 0,
            "enabled": true,
            "managed_by_policy": true
        },
        {
            "id": 4,
            "type": "instance",
            "url": "/api/v2/instances/4/",
            "related": {
                "jobs": "/api/v2/instances/4/jobs/",
                "instance_groups": "/api/v2/instances/4/instance_groups/"
            },
            "uuid": "acef28b0-3977-4dbe-8c10-e9c4f11adab8",
            "hostname": "awx-686dd7df69-v8w2g",
            "created": "2021-07-09T09:33:51.780698Z",
            "modified": "2021-07-09T09:33:52.378214Z",
            "capacity_adjustment": "1.00",
            "version": "19.1.0",
            "capacity": 0,
            "consumed_capacity": 0,
            "percent_capacity_remaining": 0.0,
            "jobs_running": 0,
            "jobs_total": 0,
            "cpu": 0,
            "memory": 0,
            "cpu_capacity": 0,
            "mem_capacity": 0,
            "enabled": true,
            "managed_by_policy": true
        }
    ]
}
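
The before and after states in this report differ in two observable ways in api/v2/ping/: the number of entries under "instances" and the "ha" flag. The following is a minimal Python sketch of that check. The helper name ping_matches_replicas is hypothetical, and it assumes that "ha" should be true whenever more than one replica is requested, which matches the outputs in this report but is not confirmed here as a general rule.

```python
def ping_matches_replicas(ping_payload, replicas):
    """Check that /api/v2/ping/ lists one instance per replica and a consistent ha flag."""
    count = len(ping_payload.get("instances", []))
    # Assumption: ha is expected to be True only when more than one replica is requested.
    expected_ha = replicas > 1
    return count == replicas and ping_payload.get("ha") is expected_ha


# Payloads trimmed from the report: before and after the rollout restart.
before = {"ha": False, "instances": [{"node": "awx-5776c59677-h9mrj"}]}
after = {
    "ha": True,
    "instances": [{"node": "awx-686dd7df69-52kgh"}, {"node": "awx-686dd7df69-v8w2g"}],
}

print(ping_matches_replicas(before, 2))
print(ping_matches_replicas(after, 2))
```

A check like this could be run after each change to replicas to decide whether the rollout restart workaround is needed.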

Later, when scaling up to 3 replicas, the same problem occurs, and it is again resolved by running:

kubectl rollout restart -n awx deployment/awx

AWX operator logs
bug needs_devel medium operator

All 13 comments

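logfiles.tar.gz

Operator and container log files

@tchellomello @rooftopcellist, if either of you has time next week, could you take a look?

If needed, I can reproduce the issue and give you a kubeconfig and the external IP of the API for a few hours.

It works for me with an external database, but I will dig a bit more.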

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-7pdrp
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.023s

{
    "ha": true,
    "version": "19.2.2",
    "active_node": "awx-toca-657778f5cb-7pdrp",
    "install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
    "instances": [
        {
            "node": "awx-toca-657778f5cb-7pdrp",
            "uuid": "3a34f8fe-8336-4910-8c47-7193694a9536",
            "heartbeat": "2021-07-10T19:35:26.186881Z",
            "capacity": 296,
            "version": "19.2.2"
        },
        {
            "node": "awx-toca-657778f5cb-lm776",
            "uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
            "heartbeat": "2021-07-10T19:35:52.645367Z",
            "capacity": 293,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        },
        {
            "name": "controlplane",
            "capacity": 589,
            "instances": [
                "awx-toca-657778f5cb-7pdrp",
                "awx-toca-657778f5cb-lm776"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}

kubectl get pods -w | grep awx-toca
awx-toca-657778f5cb-7pdrp                                  4/4     Running     0          2d14h
awx-toca-657778f5cb-lm776                                  0/4     Pending     0          0s
awx-toca-657778f5cb-lm776                                  0/4     Pending     0          0s
awx-toca-657778f5cb-lm776                                  0/4     Init:0/1    0          0s
awx-toca-657778f5cb-lm776                                  0/4     PodInitializing   0          1s
awx-toca-657778f5cb-lm776                                  4/4     Running           0          23s


[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO     [-] awx.main.consumers client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group. 
[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO     [-] awx.main.consumers client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group. 
[awx-toca-657778f5cb-lm776 awx-toca-web] 2021-07-10 19:36:00,757 INFO     client 'specific.d10caed53de54b76b34bc914c0ab92b6!290a7011729a4e5b8558e9519d1afd95' joined the broadcast group. 
[awx-toca-657778f5cb-lm776 awx-toca-web] RESULT 2 

The issue may be specific to image_version=19.1.0.
I tried 19.2.2, and creating 2 replicas succeeded.

But if I then set replicas: 3 (going from 2 to 3):

kubectl apply -f awx-deploy.yml

api/v2/ping/

{
    "ha": false,
    "version": "19.2.2",
    "active_node": "awx-848f64cdb4-29pcv",
    "install_uuid": "88b63b97-2942-49c5-bc5f-e5006a7b5456",
    "instances": [
        {
            "node": "awx-848f64cdb4-spt82",
            "uuid": "27494bf7-6fa2-489e-bdb3-82466edbd49c",
            "heartbeat": "2021-07-12T09:48:24.680690Z",
            "capacity": 79,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "controlplane",
            "capacity": 79,
            "instances": [
                "awx-848f64cdb4-spt82"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}

kubectl rollout restart -n awx deployment/awx

api/v2/ping/

{
    "ha": true,
    "version": "19.2.2",
    "active_node": "awx-657cd5b84-t5htk",
    "install_uuid": "88b63b97-2942-49c5-bc5f-e5006a7b5456",
    "instances": [
        {
            "node": "awx-657cd5b84-g8kx2",
            "uuid": "30e28fc4-8c88-4922-a7e1-0196fe790f2f",
            "heartbeat": "2021-07-12T10:00:33.162404Z",
            "capacity": 79,
            "version": "19.2.2"
        },
        {
            "node": "awx-657cd5b84-rg9v4",
            "uuid": "501a0ff7-9043-46f4-baae-4602de3107d2",
            "heartbeat": "2021-07-12T10:00:36.591979Z",
            "capacity": 79,
            "version": "19.2.2"
        },
        {
            "node": "awx-657cd5b84-t5htk",
            "uuid": "a3308acc-04e9-4da7-88cf-71048d666ffb",
            "heartbeat": "2021-07-12T10:00:38.958448Z",
            "capacity": 79,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "controlplane",
            "capacity": 237,
            "instances": [
                "awx-657cd5b84-g8kx2",
                "awx-657cd5b84-rg9v4",
                "awx-657cd5b84-t5htk"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}

With replicas: 2, this is the output of the AWX API:

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-lm776
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.015s

{
    "ha": true,
    "version": "19.2.2",
    "active_node": "awx-toca-657778f5cb-lm776",
    "install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
    "instances": [
        {
            "node": "awx-toca-657778f5cb-4bzps",
            "uuid": "617ccf03-2231-44ef-b512-7b97d3207feb",
            "heartbeat": "2021-07-25T03:09:29.263282Z",
            "capacity": 293,
            "version": "19.2.2"
        },
        {
            "node": "awx-toca-657778f5cb-lm776",
            "uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
            "heartbeat": "2021-07-25T03:09:49.130909Z",
            "capacity": 293,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        },
        {
            "name": "controlplane",
            "capacity": 586,
            "instances": [
                "awx-toca-657778f5cb-4bzps",
                "awx-toca-657778f5cb-lm776"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}

I then modified the AWX spec with kubectl edit awx awx-toca, set replicas: 3, and got the expected 3 pods:

kubectl get pods -w | grep awx
awx-operator-df789fd9c-rqn2k                               1/1     Running     0          32h
awx-toca-657778f5cb-4bzps                                  4/4     Running     0          32h
awx-toca-657778f5cb-lm776                                  4/4     Running     78         14d
awx-toca-657778f5cb-28fq9                                  0/4     Pending     0          0s
awx-toca-657778f5cb-28fq9                                  0/4     Pending     0          0s
awx-toca-657778f5cb-28fq9                                  0/4     Init:0/1    0          0s
awx-toca-657778f5cb-28fq9                                  0/4     PodInitializing   0          2s
awx-toca-657778f5cb-28fq9                                  4/4     Running           0          4s

Looking at the API, it works as expected:

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-toca-657778f5cb-4bzps
X-API-Product-Name: AWX
X-API-Product-Version: 19.2.2
X-API-Time: 0.014s

{
    "ha": true,
    "version": "19.2.2",
    "active_node": "awx-toca-657778f5cb-4bzps",
    "install_uuid": "e27ea7cb-c400-45fe-a595-9bb5217c71ac",
    "instances": [
        {
            "node": "awx-toca-657778f5cb-28fq9",
            "uuid": "7801777c-93de-416f-841e-0eb9a1b721d2",
            "heartbeat": "2021-07-25T03:10:55.501238Z",
            "capacity": 296,
            "version": "19.2.2"
        },
        {
            "node": "awx-toca-657778f5cb-4bzps",
            "uuid": "617ccf03-2231-44ef-b512-7b97d3207feb",
            "heartbeat": "2021-07-25T03:11:29.447748Z",
            "capacity": 293,
            "version": "19.2.2"
        },
        {
            "node": "awx-toca-657778f5cb-lm776",
            "uuid": "fab5cf31-ae1d-4ddc-8b55-459618c20845",
            "heartbeat": "2021-07-25T03:10:49.231003Z",
            "capacity": 293,
            "version": "19.2.2"
        }
    ],
    "instance_groups": [
        {
            "name": "tower",
            "capacity": 0,
            "instances": []
        },
        {
            "name": "controlplane",
            "capacity": 882,
            "instances": [
                "awx-toca-657778f5cb-28fq9",
                "awx-toca-657778f5cb-4bzps",
                "awx-toca-657778f5cb-lm776"
            ]
        },
        {
            "name": "default",
            "capacity": 0,
            "instances": []
        }
    ]
}

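Keep in mind that any manually issued kubectl scale --replicas command will be overwritten by the operator. All changes must be made directly in the AWX spec. @tklsnk, since I cannot reproduce this, can you confirm the steps you followed to scale it up?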
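
Since the replica count has to be changed in the AWX spec rather than on the Deployment, one non-interactive alternative to kubectl edit is a JSON merge patch against the AWX custom resource. The following is a minimal Python sketch that only builds the command line and does not run it. The helper name replicas_patch_command is hypothetical, and the "default" namespace is an assumption, since the namespace of awx-toca is not stated in this thread.

```python
import json
import shlex


def replicas_patch_command(name, namespace, replicas):
    """Build a kubectl command that sets spec.replicas on an AWX custom resource."""
    # A merge patch touches only spec.replicas and leaves the rest of the spec alone.
    patch = json.dumps({"spec": {"replicas": replicas}})
    return ["kubectl", "patch", "awx", name, "-n", namespace,
            "--type", "merge", "-p", patch]


# "default" is an assumed namespace; substitute the one the AWX resource lives in.
cmd = replicas_patch_command("awx-toca", "default", 3)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as is. Editing awx-deploy.yml and re-applying it, as done in this thread, changes the same field.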

After deploying with replicas=2, I edit awx-deploy.yml (setting replicas=3) and run kubectl apply -f awx-deploy.yml.

@tklsnk yes, that is what I did here, but I cannot reproduce the same issue.

OK, I will try with another k8s cluster.
Thank you.

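Any update on this, @tklsnk?

Sorry, I have not had a chance to try this yet. I hope to do it this week.

It works as expected with an alternate k8s cluster. It may be an issue with the specific k8s implementation of a particular cloud provider.

Thanks for the feedback, @tklsnk.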
