Helm: Release "prometheus-operator" failed: rpc error: code = Canceled

Created on 2019-07-31  ·  71 comments  ·  Source: helm/helm

Describe the bug
When I try to install the prometheus operator on AKS with helm install stable/prometheus-operator --name prometheus-operator -f prometheus-operator-values.yaml, I get this error:

Release "prometheus-operator" failed: rpc error: code = Canceled

I checked the history:

helm history prometheus-operator -o yaml
- chart: prometheus-operator-6.3.0
  description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
    = grpc: the client connection is closing'
  revision: 1
  status: FAILED
  updated: Tue Jul 30 12:36:52 2019

Chart
[stable/prometheus-operator]

Additional information
I'm using the following to deploy the chart:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml

In the values file, createCustomResource is set to false.
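For clarity, the relevant fragment of that values file would look roughly like this (a minimal sketch; only the key mentioned above is shown):

# prometheus-operator-values.yaml (excerpt)
prometheusOperator:
  createCustomResource: false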

helm version
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.7", GitCommit:"4683545293d792934a7a7e12f2cc47d20b2dd01b", GitTreeState:"clean", BuildDate:"2019-06-06T01:39:30Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):
AKS

Labels: question, support

Most helpful comment

I was able to fix this issue by following the "Helm fails to create CRDs" section in the readme.md. I'm not sure how they are related, but it did work.

Step 1: Create the CRDs manually

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for the CRDs to be created; it only takes a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false
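If you would rather not guess at the wait in Step 2, here is a minimal sketch that blocks until the CRDs report Established (the CRD names assume the standard prometheus-operator CRDs applied in Step 1):

for crd in alertmanagers podmonitors prometheuses prometheusrules servicemonitors; do
  kubectl wait --for=condition=Established --timeout=60s crd/${crd}.monitoring.coreos.com
done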

All 71 comments

We're seeing the same issue on minikube as well, so it doesn't appear to be specific to AWS.

We hit the same issue on a cluster deployed with kubespray.

I'm also seeing the issue on both k8s 12.x and 13.x kubespray-deployed clusters in an automated pipeline - 100% failure rate. A previous version of prometheus-operator (0.30.1) works fine.
Interestingly - if I run the command manually rather than through the CD pipeline, it works - so I'm a bit confused about what's causing the problem.

Saw there was an update to the prometheus chart today. I bumped to

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.0           0.32.0     

and I no longer see this issue.

@rnkhouse can you check whether the newer chart version @dlevene1 mentioned in https://github.com/helm/helm/issues/6130#issuecomment-526977731 works for you?

I'm having the same issue on AKS with version 6.8.1.

NAME                        CHART VERSION   APP VERSION
stable/prometheus-operator  6.8.1           0.32.0
❯ helm version 
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
 ❯ helm install -f prd.yaml --name prometheus --namespace monitoring stable/prometheus-operator 
Error: release prometheus failed: grpc: the client connection is closing
>>> elapsed time 1m56s

We hit the same issue on a cluster deployed with kubespray.

Kubernetes version: v1.4.1
Helm version:

Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.0", GitCommit:"05811b84a3f93603dd6c2fcfe57944dfa7ab7fd0", GitTreeState:"clean"}

Prometheus-operator version:

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.1           0.32.0  

I have the same problem on AKS.

Can anyone reproduce this issue in Helm 3, or does it surface as a different error? My assumption is that with Tiller removed, this should no longer be an issue.

@bacongobbler this is still an issue in Helm 3.

bash$ helm install r-prometheus-operator stable/prometheus-operator --version 6.8.2 -f prometheus-operator/helm/prometheus-operator.yaml

manifest_sorter.go:179: info: skipping unknown hook: "crd-install"
Error: apiVersion "monitoring.coreos.com/v1" in prometheus-operator/templates/exporters/kube-controller-manager/servicemonitor.yaml is not available

However, that looks like a different problem from the one the OP reported.

description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
= grpc: the client connection is closing'

Can you check and see whether you're using the latest beta release? That error appears to have been addressed by #6332, which shipped in 3.0.0-beta.3. If not, can you open a new issue?

@bacongobbler I'm using the latest Helm v3.0.0-beta.3.

I had to go back to --version 6.7.3 to get it to install correctly.

Our workaround was to keep the prometheus-operator image on v0.31.1.

helm.log
Also just hit this issue on a Docker EE kubernetes install.

After fiddling with install options, --debug, etc., I now get:

Error: release prom failed: context canceled

Edit: going to try updating my Helm version, currently v2.12.3
Edit2: updated to 2.14.3, still having the issue
grpc: the client connection is closing
Edit3: installed version 6.7.3 as suggested above to get things going again
Edit4: attached the tiller log for the failed install as helm.log

Related: https:

After some digging with @cyp3d, it looks like the issue may be caused by a helm delete timeout that is too short for some clusters. I can't reproduce the problem anywhere, so I'd appreciate it if someone who's hitting this could verify the potential fix on the linked pull request branch!

https://github.com/helm/charts/pull/17090

Same thing on several clusters created with kops on AWS.
No problems when running on K3S.

@xvzf

Could you try the potential workaround in this PR? https://github.com/helm/charts/pull/17090

I ran through the PR and it's still the same: Error: release prom failed: context canceled
tiller.log

@vsliouniaev no, that doesn't fix the issue here.

Thanks for checking, @xvzf and @pyadminn. I've made another change in the same PR. Could you take a look?

Just checked the updated PR, still seeing the following on our infrastructure: Error: release prom failed: rpc error: code = Canceled desc = grpc: the client connection is closing

FYI, we're using Kubernetes 1.14.3
Helm v2.14.3

I was able to fix this issue by following the "Helm fails to create CRDs" section in the readme.md. I'm not sure how they are related, but it did work.

Step 1: Create the CRDs manually

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for the CRDs to be created; it only takes a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

@vsliouniaev still the same problem! lethalwire's workaround does work, though.

lethalwire's workaround fixed it for me too.

So the workaround worked for 4 days and then stopped working; I had to use the CRD files from 0.32.0 instead of master.

I just hit the same issue using the current CRDs. Thanks @Typositoire for the suggestion to use a previous release instead of the current ones. Adapting the CRDs to the following worked for me:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

So pinning versions is generally a good practice.

Also had this issue; try disabling admissionWebhooks. That helped in my case.
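For anyone wanting to try that, here is a minimal sketch using the Helm 2 style CLI seen elsewhere in this thread (the release name and namespace are placeholders):

helm install stable/prometheus-operator \
  --name prometheus-operator \
  --namespace monitoring \
  --set prometheusOperator.admissionWebhooks.enabled=false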

Installing prometheus-operator chart 6.0.0 and then running helm upgrade --force --version 6.11.0 seems to work on Rancher kubernetes 1.13.10 and helm v2.14.3.

The workaround suggested by @Typositoire works fine for me on a 1.13.10 cluster created with kops.

Same problem here trying to install on Azure AKS with kubernetes 1.13.10 and helm v2.14.3 with prometheus-operator-6.18.0. Any suggestions?

Install the CRDs manually.

This command fails:
helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false

with the error

Error: release prometheus-operator failed: rpc error: code = Canceled desc = grpc: the client connection is closing

Edit: installing version 6.11.0 of the chart (and also 6.7.3) works:

helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false --version 6.11.0

Have you tried disabling the admission controller webhooks?

https://waynekhan.net/2019/10/09/prometheus-operator-release-failed.html


I'm working around the same issue; I had to manually install the crds that @JBosom specified and install with the webhooks disabled.

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

helm --tls --tiller-namespace=tiller install --namespace=monitoring --name prom-mfcloud stable/prometheus-operator --set prometheusOperator.createCustomResource=false --set prometheusOperator.admissionWebhooks.enabled=false --values values.yaml --version 6.18.0

I'm getting the same error trying to install v8.0.0 on a local K8S cluster via Docker for Desktop with helm v2.14.3. I was only able to install after first creating the CRDs, as suggested by @lethalwire.

I think we have enough cases here to determine that this is a problem specific to the prometheus-operator chart.

I'm going to close this out since there's ultimately nothing actionable for us here, but please feel free to keep the conversation going.

Apologies, but after upgrading to the latest helm v2.15.2 I no longer get this error. 👍

It seems odd that Helm can't provide any information about what's going on.

No debug logs have been posted here and none have been asked for; people are resorting to flipping switches and seeing whether it helps.

What does the error actually mean? Does it indicate a deadlock waiting on a wait? Is there any action that can be taken other than a collective shrug?

Yes. The original error appears to be a deadlock waiting for the admission webhooks to complete, as disabling the webhooks allows the chart to install without issue. Looking at Tiller's logs would confirm the problem.

Helm 3 should report the correct error to the user, as there's no gRPC layer in the mix to time out and cancel the request on timeout.

Feel free to provide a patch for Helm 2. Given that Helm 3 improves on this behaviour, I went ahead and considered it fixed in newer releases.

Hope this helps.

The original error appears to be a deadlock waiting for the admission webhooks to complete, as disabling the webhooks allows the chart to install without issue.

That conclusion seems odd, since the fix is either to disable the jobs or to disable the crd-install hooks. Both appear to resolve the problem, so it doesn't seem to be an issue specific to the jobs.

For anyone else hitting this issue - could you provide the output of kubectl describe job so we can figure out which jobs are failing? I've asked for this before, but everyone seems to indicate there are no jobs.

Tiller has the following:

[kube] 2019/11/15 14:35:46 get relation pod of object: monitoring/PrometheusRule/prometheus-operator-node-time
[kube] 2019/11/15 14:35:46 Doing get for PrometheusRule: "prometheus-operator-kubernetes-apps"
[ A lot of unrelated updates in between... ]
2019/11/15 14:36:38 Cannot patch PrometheusRule: "prometheus-operator-kubernetes-apps" (rpc error: code = Canceled desc = grpc: the client connection is closing)
2019/11/15 14:36:38 Use --force to force recreation of the resource
[kube] 2019/11/15 14:36:38 error updating the resource "prometheus-operator-kubernetes-apps":
     rpc error: code = Canceled desc = grpc: the client connection is closing
[tiller] 2019/11/15 14:36:38 warning: Upgrade "prometheus-operator" failed: rpc error: code = Canceled desc = grpc: the client connection is closing
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v94"
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v95"
[ then rollback... ]

So I had to delete this resource manually. The apiserver probably has more information (it does sound related to admission controllers).
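For reference, deleting that stuck resource by hand looks roughly like this (a sketch; the resource name and namespace are taken from the Tiller log above):

kubectl delete prometheusrule prometheus-operator-kubernetes-apps -n monitoring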

@desaintmartin this looks like it happened on an upgrade rather than an install, correct?

Since Helm 3.0 is GA now and the chart is working with it, please report whether this can still happen there and whether there are better logs.

I'm using Helm 3 and still getting this error on Azure AKS :(

I tried it on chart v8.2.4: it works if prometheusOperator.admissionWebhooks=false, and prometheus.tlsProxy.enabled=false can also be set.

Also, as vsliouniaev said, what do --debug and --dry-run say?

@truealex81 since helm 3 is designed to provide more information about this, could you post verbose logs from the installation?

I'm having the same issue deploying 8.2.4 on Azure AKS.

Helm version:
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

Helm --debug produces the following output:

install.go:148: [debug] Original chart version: ""
install.go:165: [debug] CHART PATH: /root/.cache/helm/repository/prometheus-operator-8.2.4.tgz
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 5 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 120 resource(s)
Error: context canceled

I can reproduce this reliably. If there's a way to get more detailed logs, let me know and I'll post the output here.

@pather87 thanks very much!

Here is the sequence of what happens in the chart:

  1. CRDs are provisioned
  2. There is a pre-install; pre-upgrade job, which runs a container to create a secret with certificates used for the admission hooks. On success this job and its resources are cleaned up
  3. All the resources are created
  4. There is a post-install; post-upgrade job, which runs a container to patch the created validatingwebhookconfiguration and mutatingwebhookconfiguration with the CA, based on the certificates created in step 2.

Can you check whether you have any failed jobs left over? From reading the logs it seems like you shouldn't, since they all completed successfully.

Are there any other resources left in the cluster after the Error: context canceled occurs?
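If it helps, here is a quick sketch of how to check for leftover hook jobs (the namespace and job name are assumptions based on the debug output above):

kubectl get jobs -n monitoring
kubectl describe job prometheus-operator-admission-create -n monitoring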

Same here when installing prometheus-operator:

helm install prometheus-operator stable/prometheus-operator \
  --namespace=monitoring \
  --values=values.yaml

Error: rpc error: code = Canceled desc = grpc: the client connection is closing

@vsliouniaev thanks for your answer!

  1. There are no jobs after the deployment.
  2. After the deployment, the deployments and services are present in the cluster; see the kubectl output:

kubectl get all -lrelease=prometheus-operator

NAME                                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-operator-grafana-59d489899-4b5kd          2/2     Running   0          3m56s
pod/prometheus-operator-operator-8549bcd687-4kb2x        2/2     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-4km6x   1/1     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-7dgn6   1/1     Running   0          3m56s

NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
service/prometheus-operator-alertmanager               ClusterIP   xxx   <none>        9093/TCP           3m57s
service/prometheus-operator-grafana                    ClusterIP   xxx   <none>        80/TCP             3m57s
service/prometheus-operator-operator                   ClusterIP   xxx     <none>        8080/TCP,443/TCP   3m57s
service/prometheus-operator-prometheus                 ClusterIP   xxx   <none>        9090/TCP           3m57s
service/prometheus-operator-prometheus-node-exporter   ClusterIP   xxx    <none>        9100/TCP           3m57s

NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-operator-prometheus-node-exporter   2         2         2       2            2           <none>          3m57s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator-grafana    1/1     1            1           3m57s
deployment.apps/prometheus-operator-operator   1/1     1            1           3m57s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-grafana-59d489899     1         1         1       3m57s
replicaset.apps/prometheus-operator-operator-8549bcd687   1         1         1       3m57s

NAME                                                             READY   AGE
statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     3m44s
statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     3m34s

Installing with debug:

client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD alertmanagers.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD podmonitors.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheuses.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheusrules.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD servicemonitors.monitoring.coreos.com is already present. Skipping.
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 0 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 122 resource(s)
Error: context canceled
helm.go:76: [debug] context canceled

Afterwards, running: kubectl get all -lrelease=prometheus-operator -A

NAMESPACE    NAME                                                     READY   STATUS    RESTARTS   AGE
monitoring   pod/prometheus-operator-grafana-d6676b794-r6cg9          2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-operator-6584f4b5f5-wdkrx        2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-2g4tg   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-798p5   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-pvk5t   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-r9j2r   1/1     Running   0          2m45s

NAMESPACE     NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
kube-system   service/prometheus-operator-coredns                    ClusterIP   None           <none>        9153/TCP           2m46s
kube-system   service/prometheus-operator-kube-controller-manager    ClusterIP   None           <none>        10252/TCP          2m46s
kube-system   service/prometheus-operator-kube-etcd                  ClusterIP   None           <none>        2379/TCP           2m46s
kube-system   service/prometheus-operator-kube-proxy                 ClusterIP   None           <none>        10249/TCP          2m46s
kube-system   service/prometheus-operator-kube-scheduler             ClusterIP   None           <none>        10251/TCP          2m46s
monitoring    service/prometheus-operator-alertmanager               ClusterIP   10.0.238.102   <none>        9093/TCP           2m46s
monitoring    service/prometheus-operator-grafana                    ClusterIP   10.0.16.19     <none>        80/TCP             2m46s
monitoring    service/prometheus-operator-operator                   ClusterIP   10.0.97.114    <none>        8080/TCP,443/TCP   2m45s
monitoring    service/prometheus-operator-prometheus                 ClusterIP   10.0.57.153    <none>        9090/TCP           2m46s
monitoring    service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.0.83.30     <none>        9100/TCP           2m46s

NAMESPACE    NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
monitoring   daemonset.apps/prometheus-operator-prometheus-node-exporter   4         4         4       4            4           <none>          2m46s

NAMESPACE    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
monitoring   deployment.apps/prometheus-operator-grafana    1/1     1            1           2m46s
monitoring   deployment.apps/prometheus-operator-operator   1/1     1            1           2m46s

NAMESPACE    NAME                                                      DESIRED   CURRENT   READY   AGE
monitoring   replicaset.apps/prometheus-operator-grafana-d6676b794     1         1         1       2m46s
monitoring   replicaset.apps/prometheus-operator-operator-6584f4b5f5   1         1         1       2m46s

NAMESPACE    NAME                                                             READY   AGE
monitoring   statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     2m40s
monitoring   statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     2m30s

Something else I found while trying to work around this: if I then delete the chart and the CRDs and install the chart again, the problem persists, but if I don't delete the crds, the problem does not persist.

I also tried pre-installing the crds and running helm install --skip-crds, but the problem persists. This is a bit confusing.

The next log lines I would expect are about the post-install, post-upgrade hooks, but they don't appear in your case. I'm not sure what helm is waiting on here

...
lient.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job
client.go:245: [debug] jobs.batch "prom-op-prometheus-operato-admission-patch" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prom-op-prometheus-operato-admission-patch with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:484: [debug] prom-op-prometheus-operato-admission-patch: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job

Manually creating the CRDs helps, at least on Azure.
First create the crds from this link https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
"kubectl create -f alertmanager.crd.yaml" and so on for all the files
Then
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false

Thanks @truealex81! Works on Azure.

my env:
k8s 1.11.2, helm 2.13.1, tiller 2.13.1
prometheus-operator-5.5 APP VERSION 0.29 is OK!!!

but:
prometheus-operator-8 APP VERSION 0.32 has the same problem:
"context canceled" or "grpc: the client connection is closing" !!!

I guess the latest versions of prometheus-operator are not compatible?!!!

@bierhov could you please post the resources in the namespace after the failure?

Yes!
Running "helm ls" in the shell I can see my prometheus-operator release status is "failed", but the namespace where I installed prometheus-operator has all the prometheus-operator resources.
However,
the Prometheus web UI can't get any data!

Can you still post the resources, though?

Can you still post the resources, though?

Sorry, I can't reproduce it unless I tear down my stable helm env and do it all over again!

@bierhov do you have any failed jobs left after the install?

@bierhov do you have any failed jobs left after the install?

My k8s version is 1.11.2, and my helm and tiller version is 2.13.1.
If I install prometheus-operator version 8.x,
running the shell command "helm ls" shows the status as failed,
but if I install prometheus-operator version 5.x,
running the shell command "helm ls" shows the status as deployed!!!

Not reproducible with:

Kubernetes version: v1.13.12
Kubectl version: v1.16.2
Helm version: 3.0.1
Prometheus-operator version: 8.3.3

  1. Install the CRDs manually:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/podmonitor.crd.yaml

  2. Configure the operator not to create the crds, either in values.yaml or when installing with

--set prometheusOperator.createCustomResource=false

prometheusOperator:
  createCustomResource: false

@GramozKrasniqi
What if you don't create the CRDs manually? That's one of the workarounds for this issue.

@vsliouniaev if you don't create them you will get the error.
But in the original issue, under additional information, @rnkhouse says he is creating the CRDs manually.

In short, we use prometheus-operator in our deployments. We upgraded prom-op from 6.9.3 to 8.3.3 and it always fails with "Error: context canceled".
Also, we always install the crds before installing/upgrading prometheus-operator, and we did not change or update those crd-s.

I tried refreshing the crds with the ones mentioned in github.com/helm/charts/tree/master/stable/prometheus-operator (/master/example/prometheus-operator-crd/alertmanager.crd.yaml), but those no longer exist.
After that I tried the following from here: https:
but it failed again.

I almost gave up, but with these CRDs the helm deployment succeeded! yeyyyy
https://github.com/coreos/kube-prometheus/tree/master/manifests/setup

My setup:

Kubernetes version: v1.14.3
Kubectl version: v1.14.2
Helm version: 2.14.3
Prometheus-operator version: 8.3.3

Purge the prometheus operator from k8s!

Then:

kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml   
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml 
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml 
helm upgrade -i prom-op                               \
  --version 8.3.3                                     \
  --set prometheusOperator.createCustomResource=false \
  stable/prometheus-operator

That's it!

Does this mean a fresh install is required and historical metrics data is lost?

Upgraded AKS k8s to 1.15.5, helm to 3.0.1, and the prometheus-operator chart to 8.3.3, and the problem went away.

Our workaround was to keep the prometheus-operator image on v0.31.1.

Also worked for me on AKS v1.14.8 and helm + tiller v2.16.1, changing the operator image to v0.31.1.

Manually creating the CRDs helps, at least on Azure.
First create the crds from this link https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
"kubectl create -f alertmanager.crd.yaml" and so on for all the files
Then
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false

Works on Azure kubernetes, thank you.

I was able to fix this issue by following the "Helm fails to create CRDs" section in the readme.md. I'm not sure how they are related, but it did work.

Step 1: Create the CRDs manually

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for the CRDs to be created; it only takes a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

Thanks, this worked great for an AKS cluster. Had to change the URLs for the CRDs.

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml --validate=false

helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring --set prometheusOperator.createCustomResource=false

Closing. Per the last three commenters, this appears to be resolved. Thanks!
