Helm: Release "prometheus-operator" failed: rpc error: code = Canceled

Created on 2019-07-31  ·  71 comments  ·  Source: helm/helm

Describe the bug
When I try to install the prometheus operator on AKS with helm install stable/prometheus-operator --name prometheus-operator -f prometheus-operator-values.yaml, I get this error:

Release "prometheus-operator" failed: rpc error: code = Canceled

I checked the history:

helm history prometheus-operator -o yaml
- chart: prometheus-operator-6.3.0
  description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
    = grpc: the client connection is closing'
  revision: 1
  status: FAILED
  updated: Tue Jul 30 12:36:52 2019

Chart
[stable/prometheus-operator]

Additional information
I'm using the following to deploy the chart:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml

In the values file, createCustomResource is set to false.
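For clarity, the relevant fragment of that values file would look roughly like this (a minimal sketch; only the key mentioned above is shown):

# prometheus-operator-values.yaml (excerpt)
prometheusOperator:
  createCustomResource: false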

helm version
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.7", GitCommit:"4683545293d792934a7a7e12f2cc47d20b2dd01b", GitTreeState:"clean", BuildDate:"2019-06-06T01:39:30Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):
AKS

Labels: question, support

Most helpful comment

I was able to fix this issue by following the "Helm fails to create CRDs" section in the readme.md. I'm not sure how they are related, but it did work.

Step 1: Create the CRDs manually

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for the CRDs to be created; it only takes a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false
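If you would rather not guess at the wait in Step 2, here is a minimal sketch that blocks until the CRDs report Established (the CRD names assume the standard prometheus-operator CRDs applied in Step 1):

for crd in alertmanagers podmonitors prometheuses prometheusrules servicemonitors; do
  kubectl wait --for=condition=Established --timeout=60s crd/${crd}.monitoring.coreos.com
done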

All 71 comments

We're seeing the same issue on minikube as well, so it doesn't appear to be specific to AWS.

We hit the same issue on a cluster deployed with kubespray.

I'm also seeing the issue on both k8s 12.x and 13.x kubespray-deployed clusters in an automated pipeline - 100% failure rate. A previous version of prometheus-operator (0.30.1) works fine.
Interestingly - if I run the command manually rather than through the CD pipeline, it works - so I'm a bit confused about what's causing the problem.

Saw there was an update to the prometheus chart today. I bumped to

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.0           0.32.0     

and I no longer see this issue.

@rnkhouse can you check whether the newer chart version @dlevene1 mentioned in https://github.com/helm/helm/issues/6130#issuecomment-526977731 works for you?

I'm having the same issue on AKS with version 6.8.1.

NAME                        CHART VERSION   APP VERSION
stable/prometheus-operator  6.8.1           0.32.0
❯ helm version 
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
 ❯ helm install -f prd.yaml --name prometheus --namespace monitoring stable/prometheus-operator 
Error: release prometheus failed: grpc: the client connection is closing
>>> elapsed time 1m56s

We hit the same issue on a cluster deployed with kubespray.

Kubernetes version: v1.4.1
Helm version:

Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.0", GitCommit:"05811b84a3f93603dd6c2fcfe57944dfa7ab7fd0", GitTreeState:"clean"}

Prometheus-operator version:

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.1           0.32.0  

I have the same problem on AKS.

Can anyone reproduce this issue in Helm 3, or does it surface as a different error? My assumption is that with Tiller removed, this should no longer be an issue.

@bacongobbler this is still an issue in Helm 3.

bash$ helm install r-prometheus-operator stable/prometheus-operator --version 6.8.2 -f prometheus-operator/helm/prometheus-operator.yaml

manifest_sorter.go:179: info: skipping unknown hook: "crd-install"
Error: apiVersion "monitoring.coreos.com/v1" in prometheus-operator/templates/exporters/kube-controller-manager/servicemonitor.yaml is not available

However, that looks like a different problem from the one the OP reported.

description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
= grpc: the client connection is closing'

Can you check and see whether you're using the latest beta release? That error appears to have been addressed by #6332, which shipped in 3.0.0-beta.3. If not, can you open a new issue?

@bacongobbler I'm using the latest Helm v3.0.0-beta.3.

I had to go back to --version 6.7.3 to get it to install correctly.

Our workaround was to keep the prometheus-operator image on v0.31.1.

helm.log
Also just hit this issue on a Docker EE kubernetes install.

After fiddling with install options, --debug, etc., I now get:

Error: release prom failed: context canceled

Edit: going to try updating my Helm version, currently v2.12.3
Edit2: updated to 2.14.3, still having the issue
grpc: the client connection is closing
Edit3: installed version 6.7.3 as suggested above to get things going again
Edit4: attached the tiller log for the failed install as helm.log

Related: https:

After some digging with @cyp3d, it looks like the issue may be caused by a helm delete timeout that is too short for some clusters. I can't reproduce the problem anywhere, so I'd appreciate it if someone who's hitting this could verify the potential fix on the linked pull request branch!

https://github.com/helm/charts/pull/17090

Same thing on several clusters created with kops on AWS.
No problems when running on K3S.

@xvzf

Could you try the potential workaround in this PR? https://github.com/helm/charts/pull/17090

I ran through the PR and it's still the same: Error: release prom failed: context canceled
tiller.log

@vsliouniaev no, that doesn't fix the issue here.

Thanks for checking, @xvzf and @pyadminn. I've made another change in the same PR. Could you take a look?

Just checked the updated PR, still seeing the following on our infrastructure: Error: release prom failed: rpc error: code = Canceled desc = grpc: the client connection is closing

FYI, we're using Kubernetes 1.14.3
Helm v2.14.3

I was able to fix this issue by following the "Helm fails to create CRDs" section in the readme.md. I'm not sure how they are related, but it did work.

Step 1: Create the CRDs manually

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for the CRDs to be created; it only takes a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

@vsliouniaev still the same problem! lethalwire's workaround does work, though.

lethalwire's workaround fixed it for me too.

So the workaround worked for 4 days and then stopped working; I had to use the CRD files from 0.32.0 instead of master.

I just hit the same issue using the current CRDs. Thanks @Typositoire for the suggestion to use a previous release instead of the current ones. Adapting the CRDs to the following worked for me:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

So pinning versions is generally a good practice.

Also had this issue; try disabling admissionWebhooks. That helped in my case.
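For anyone wanting to try that, here is a minimal sketch using the Helm 2 style CLI seen elsewhere in this thread (the release name and namespace are placeholders):

helm install stable/prometheus-operator \
  --name prometheus-operator \
  --namespace monitoring \
  --set prometheusOperator.admissionWebhooks.enabled=false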

Installing prometheus-operator chart 6.0.0 and then running helm upgrade --force --version 6.11.0 seems to work on Rancher kubernetes 1.13.10 and helm v2.14.3.

The workaround suggested by @Typositoire works fine for me on a 1.13.10 cluster created with kops.

Same problem here trying to install on Azure AKS with kubernetes 1.13.10 and helm v2.14.3 with prometheus-operator-6.18.0. Any suggestions?

Install the CRDs manually.

This command fails:
helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false

with the error

Error: release prometheus-operator failed: rpc error: code = Canceled desc = grpc: the client connection is closing

Edit: installing version 6.11.0 of the chart (and also 6.7.3) works:

helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false --version 6.11.0

Have you tried disabling the admission controller webhooks?

https://waynekhan.net/2019/10/09/prometheus-operator-release-failed.html


I'm working around the same issue; I had to manually install the crds that @JBosom specified and install with the webhooks disabled.

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

helm --tls --tiller-namespace=tiller install --namespace=monitoring --name prom-mfcloud stable/prometheus-operator --set prometheusOperator.createCustomResource=false --set prometheusOperator.admissionWebhooks.enabled=false --values values.yaml --version 6.18.0

I'm getting the same error trying to install v8.0.0 on a local K8S cluster via Docker for Desktop with helm v2.14.3. I was only able to install after first creating the CRDs, as suggested by @lethalwire.

I think we have enough cases here to determine that this is a problem specific to the prometheus-operator chart.

I'm going to close this out since there's ultimately nothing actionable for us here, but please feel free to keep the conversation going.

Apologies, but after upgrading to the latest helm v2.15.2 I no longer get this error. 👍

It seems odd that Helm can't provide any information about what's going on.

No debug logs have been posted here and none have been asked for; people are resorting to flipping switches and seeing whether it helps.

What does the error actually mean? Does it indicate a deadlock waiting on a wait? Is there any action that can be taken other than a collective shrug?

Yes. The original error appears to be a deadlock waiting for the admission webhooks to complete, as disabling the webhooks allows the chart to install without issue. Looking at Tiller's logs would confirm the problem.

Helm 3 should report the correct error to the user, as there's no gRPC layer in the mix to time out and cancel the request on timeout.

Feel free to provide a patch for Helm 2. Given that Helm 3 improves on this behaviour, I went ahead and considered it fixed in newer releases.

Hope this helps.

The original error appears to be a deadlock waiting for the admission webhooks to complete, as disabling the webhooks allows the chart to install without issue.

That conclusion seems odd, since the fix is either to disable the jobs or to disable the crd-install hooks. Both appear to resolve the problem, so it doesn't seem to be an issue specific to the jobs.

For anyone else hitting this issue - could you provide the output of kubectl describe job so we can figure out which jobs are failing? I've asked for this before, but everyone seems to indicate there are no jobs.

Tiller has the following:

[kube] 2019/11/15 14:35:46 get relation pod of object: monitoring/PrometheusRule/prometheus-operator-node-time
[kube] 2019/11/15 14:35:46 Doing get for PrometheusRule: "prometheus-operator-kubernetes-apps"
[ A lot of unrelated updates in between... ]
2019/11/15 14:36:38 Cannot patch PrometheusRule: "prometheus-operator-kubernetes-apps" (rpc error: code = Canceled desc = grpc: the client connection is closing)
2019/11/15 14:36:38 Use --force to force recreation of the resource
[kube] 2019/11/15 14:36:38 error updating the resource "prometheus-operator-kubernetes-apps":
     rpc error: code = Canceled desc = grpc: the client connection is closing
[tiller] 2019/11/15 14:36:38 warning: Upgrade "prometheus-operator" failed: rpc error: code = Canceled desc = grpc: the client connection is closing
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v94"
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v95"
[ then rollback... ]

So I had to delete this resource manually. The apiserver probably has more information (it does sound related to admission controllers).
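For reference, deleting that stuck resource by hand looks roughly like this (a sketch; the resource name and namespace are taken from the Tiller log above):

kubectl delete prometheusrule prometheus-operator-kubernetes-apps -n monitoring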

@desaintmartin this looks like it happened on an upgrade rather than an install, correct?

Since Helm 3.0 is GA now and the chart is working with it, please report whether this can still happen there and whether there are better logs.

I'm using Helm 3 and still getting this error on Azure AKS :(

I tried it on chart v8.2.4: it works if prometheusOperator.admissionWebhooks=false, and prometheus.tlsProxy.enabled=false can also be set.

Also, as vsliouniaev said, what do --debug and --dry-run say?

@truealex81 since helm 3 is designed to provide more information about this, could you post verbose logs from the installation?

I'm having the same issue deploying 8.2.4 on Azure AKS.

Helm version:
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

Helm --debug produces the following output:

install.go:148: [debug] Original chart version: ""
install.go:165: [debug] CHART PATH: /root/.cache/helm/repository/prometheus-operator-8.2.4.tgz
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 5 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 120 resource(s)
Error: context canceled

I can reproduce this reliably. If there's a way to get more detailed logs, let me know and I'll post the output here.

@pather87 thanks very much!

Here is the sequence of what happens in the chart:

  1. CRDs are provisioned
  2. There is a pre-install; pre-upgrade job, which runs a container to create a secret with certificates used for the admission hooks. On success this job and its resources are cleaned up
  3. All the resources are created
  4. There is a post-install; post-upgrade job, which runs a container to patch the created validatingwebhookconfiguration and mutatingwebhookconfiguration with the CA, based on the certificates created in step 2.

Can you check whether you have any failed jobs left over? From reading the logs it seems like you shouldn't, since they all completed successfully.

Are there any other resources left in the cluster after the Error: context canceled occurs?
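If it helps, here is a quick sketch of how to check for leftover hook jobs (the namespace and job name are assumptions based on the debug output above):

kubectl get jobs -n monitoring
kubectl describe job prometheus-operator-admission-create -n monitoring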

Same here when installing prometheus-operator:

helm install prometheus-operator stable/prometheus-operator \
  --namespace=monitoring \
  --values=values.yaml

Error: rpc error: code = Canceled desc = grpc: the client connection is closing

@vsliouniaev thanks for your answer!

  1. There are no jobs after the deployment.
  2. After the deployment, the deployments and services are present in the cluster; see the kubectl output:

kubectl get all -lrelease=prometheus-operator

NAME                                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-operator-grafana-59d489899-4b5kd          2/2     Running   0          3m56s
pod/prometheus-operator-operator-8549bcd687-4kb2x        2/2     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-4km6x   1/1     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-7dgn6   1/1     Running   0          3m56s

NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
service/prometheus-operator-alertmanager               ClusterIP   xxx   <none>        9093/TCP           3m57s
service/prometheus-operator-grafana                    ClusterIP   xxx   <none>        80/TCP             3m57s
service/prometheus-operator-operator                   ClusterIP   xxx     <none>        8080/TCP,443/TCP   3m57s
service/prometheus-operator-prometheus                 ClusterIP   xxx   <none>        9090/TCP           3m57s
service/prometheus-operator-prometheus-node-exporter   ClusterIP   xxx    <none>        9100/TCP           3m57s

NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-operator-prometheus-node-exporter   2         2         2       2            2           <none>          3m57s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator-grafana    1/1     1            1           3m57s
deployment.apps/prometheus-operator-operator   1/1     1            1           3m57s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-grafana-59d489899     1         1         1       3m57s
replicaset.apps/prometheus-operator-operator-8549bcd687   1         1         1       3m57s

NAME                                                             READY   AGE
statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     3m44s
statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     3m34s

Installing with debug:

client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD alertmanagers.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD podmonitors.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheuses.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheusrules.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD servicemonitors.monitoring.coreos.com is already present. Skipping.
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 0 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 122 resource(s)
Error: context canceled
helm.go:76: [debug] context canceled

Afterwards, running: kubectl get all -lrelease=prometheus-operator -A

NAMESPACE    NAME                                                     READY   STATUS    RESTARTS   AGE
monitoring   pod/prometheus-operator-grafana-d6676b794-r6cg9          2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-operator-6584f4b5f5-wdkrx        2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-2g4tg   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-798p5   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-pvk5t   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-r9j2r   1/1     Running   0          2m45s

NAMESPACE     NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
kube-system   service/prometheus-operator-coredns                    ClusterIP   None           <none>        9153/TCP           2m46s
kube-system   service/prometheus-operator-kube-controller-manager    ClusterIP   None           <none>        10252/TCP          2m46s
kube-system   service/prometheus-operator-kube-etcd                  ClusterIP   None           <none>        2379/TCP           2m46s
kube-system   service/prometheus-operator-kube-proxy                 ClusterIP   None           <none>        10249/TCP          2m46s
kube-system   service/prometheus-operator-kube-scheduler             ClusterIP   None           <none>        10251/TCP          2m46s
monitoring    service/prometheus-operator-alertmanager               ClusterIP   10.0.238.102   <none>        9093/TCP           2m46s
monitoring    service/prometheus-operator-grafana                    ClusterIP   10.0.16.19     <none>        80/TCP             2m46s
monitoring    service/prometheus-operator-operator                   ClusterIP   10.0.97.114    <none>        8080/TCP,443/TCP   2m45s
monitoring    service/prometheus-operator-prometheus                 ClusterIP   10.0.57.153    <none>        9090/TCP           2m46s
monitoring    service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.0.83.30     <none>        9100/TCP           2m46s

NAMESPACE    NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
monitoring   daemonset.apps/prometheus-operator-prometheus-node-exporter   4         4         4       4            4           <none>          2m46s

NAMESPACE    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
monitoring   deployment.apps/prometheus-operator-grafana    1/1     1            1           2m46s
monitoring   deployment.apps/prometheus-operator-operator   1/1     1            1           2m46s

NAMESPACE    NAME                                                      DESIRED   CURRENT   READY   AGE
monitoring   replicaset.apps/prometheus-operator-grafana-d6676b794     1         1         1       2m46s
monitoring   replicaset.apps/prometheus-operator-operator-6584f4b5f5   1         1         1       2m46s

NAMESPACE    NAME                                                             READY   AGE
monitoring   statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     2m40s
monitoring   statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     2m30s

Something else I found while trying to work around this: if I then delete the chart and the CRDs and install the chart again, the problem persists, but if I don't delete the crds, the problem does not persist.

I also tried pre-installing the crds and running helm install --skip-crds, but the problem persists. This is a bit confusing.

The next log lines I would expect are about the post-install, post-upgrade hooks, but they don't appear in your case. I'm not sure what helm is waiting on here

...
lient.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job
client.go:245: [debug] jobs.batch "prom-op-prometheus-operato-admission-patch" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prom-op-prometheus-operato-admission-patch with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:484: [debug] prom-op-prometheus-operato-admission-patch: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job

Manually creating the CRDs helps, at least on Azure.
First create the crds from this link https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
"kubectl create -f alertmanager.crd.yaml" and so on for all the files
Then
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false

Thanks @truealex81! Works on Azure.

my env:
k8s 1.11.2, helm 2.13.1, tiller 2.13.1
prometheus-operator-5.5 APP VERSION 0.29 is OK!!!

but:
prometheus-operator-8 APP VERSION 0.32 has the same problem:
"context canceled" or "grpc: the client connection is closing" !!!

I guess the latest versions of prometheus-operator are not compatible?!!!

@bierhov could you please post the resources in the namespace after the failure?

Yes!
Running "helm ls" in the shell I can see my prometheus-operator release status is "failed", but the namespace where I installed prometheus-operator has all the prometheus-operator resources.
However,
the Prometheus web UI can't get any data!

Can you still post the resources, though?

Can you still post the resources, though?

Sorry, I can't reproduce it unless I tear down my stable helm env and do it all over again!

@bierhov do you have any failed jobs left after the install?

@bierhov do you have any failed jobs left after the install?

My k8s version is 1.11.2, and my helm and tiller version is 2.13.1.
If I install prometheus-operator version 8.x,
running the shell command "helm ls" shows the status as failed,
but if I install prometheus-operator version 5.x,
running the shell command "helm ls" shows the status as deployed!!!

Not reproducible with:

Kubernetes version: v1.13.12
Kubectl version: v1.16.2
Helm version: 3.0.1
Prometheus-operator version: 8.3.3

  1. Install the CRDs manually:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/podmonitor.crd.yaml

  2. Configure the operator not to create the crds, either in values.yaml or when installing with

--set prometheusOperator.createCustomResource=false

prometheusOperator:
  createCustomResource: false

@GramozKrasniqi
What if you don't create the CRDs manually? That's one of the workarounds for this issue.

@vsliouniaev if you don't create them you will get the error.
But in the original issue, under additional information, @rnkhouse says he is creating the CRDs manually.

In short, we use prometheus-operator in our deployments. We upgraded prom-op from 6.9.3 to 8.3.3 and it always fails with "Error: context canceled".
Also, we always install the crds before installing/upgrading prometheus-operator, and we did not change or update those crd-s.

I tried refreshing the crds with the ones mentioned in github.com/helm/charts/tree/master/stable/prometheus-operator (/master/example/prometheus-operator-crd/alertmanager.crd.yaml), but those no longer exist.
After that I tried the following from here: https:
but it failed again.

I almost gave up, but with these CRDs the helm deployment succeeded! yeyyyy
https://github.com/coreos/kube-prometheus/tree/master/manifests/setup

My setup:

Kubernetes version: v1.14.3
Kubectl version: v1.14.2
Helm version: 2.14.3
Prometheus-operator version: 8.3.3

Purge the prometheus operator from k8s!

Then:

kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml   
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml 
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml 
helm upgrade -i prom-op                               \
  --version 8.3.3                                     \
  --set prometheusOperator.createCustomResource=false \
  stable/prometheus-operator

That's it!

Does this mean a fresh install is required and historical metrics data is lost?

Upgraded AKS k8s to 1.15.5, helm to 3.0.1, and the prometheus-operator chart to 8.3.3, and the problem went away.

Our workaround was to keep the prometheus-operator image on v0.31.1.

Also worked for me on AKS v1.14.8 and helm + tiller v2.16.1, changing the operator image to v0.31.1.

Manually creating the CRDs helps, at least on Azure.
First create the crds from this link https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
"kubectl create -f alertmanager.crd.yaml" and so on for all the files
Then
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false

Works on Azure kubernetes, thank you.

I was able to fix this issue by following the "Helm fails to create CRDs" section in the readme.md. I'm not sure how they are related, but it did work.

Step 1: Create the CRDs manually

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for the CRDs to be created; it only takes a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

Thanks, this worked great for an AKS cluster. Had to change the URLs for the CRDs.

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml --validate=false

helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring --set prometheusOperator.createCustomResource=false

Closing. Per the last three commenters, this appears to be resolved. Thanks!
