Helm: Release "prometheus-operator" failed: rpc error: code = Canceled

Created on 31 Jul 2019 · 71 Comments · Source: helm/helm

Describe the bug
When I try to install prometheus operator on AKS with helm install stable/prometheus-operator --name prometheus-operator -f prometheus-operator-values.yaml I am getting this error:

prometheus-operator" failed: rpc error: code = Canceled

I checked with history:

helm history prometheus-operator -o yaml
- chart: prometheus-operator-6.3.0
  description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
    = grpc: the client connection is closing'
  revision: 1
  status: FAILED
  updated: Tue Jul 30 12:36:52 2019

Chart
[stable/prometheus-operator]

Additional Info
I am using below configurations to deploy a chart:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml

In the values file, createCustomResource is set to false.
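For reference, a minimal sketch of the relevant part of that values file (assuming the chart's prometheusOperator.createCustomResource flag, as used later in this thread):

prometheusOperator:
  # CRDs were applied manually with kubectl above, so the chart should not create them
  createCustomResource: false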

Output of helm version:
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}

Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.7", GitCommit:"4683545293d792934a7a7e12f2cc47d20b2dd01b", GitTreeState:"clean", BuildDate:"2019-06-06T01:39:30Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):
AKS

question/support

Most helpful comment

I was able to get around this issue by following the 'Helm fails to create CRDs' section in readme.md. I'm not sure how they're related, but it worked.

Step 1: Manually create the CRDS

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for CRDs to be created, which should only take a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

All 71 comments

We have the same issue on Minikube, so it does not seem to be specific to AKS.

We have the same issue on kubespray-deployed clusters.

I'm also seeing the issue on both 1.12.x and 1.13.x kubespray-deployed clusters in our automated pipeline, with a 100% failure rate. The previous version of prometheus-operator (0.30.1) works without issues.
The funny thing is that if I run the command manually instead of via the CD pipeline it works, so I'm a little confused as to what the cause could be.

Saw there was an update to the prometheus-operator chart today. I bumped it to

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.0           0.32.0     

and I'm no longer seeing the issue.

@rnkhouse Can you check with the latest chart version as mentioned by @dlevene1 in https://github.com/helm/helm/issues/6130#issuecomment-526977731?

I have this same issue with version 6.8.1 on AKS.

NAME                        CHART VERSION   APP VERSION
stable/prometheus-operator  6.8.1           0.32.0
❯ helm version 
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
 ❯ helm install -f prd.yaml --name prometheus --namespace monitoring stable/prometheus-operator 
Error: release prometheus failed: grpc: the client connection is closing
>>> elapsed time 1m56s

We have the same issue on kubespray-deployed clusters.

Kubernetes version: v1.4.1
Helm version:

Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.0", GitCommit:"05811b84a3f93603dd6c2fcfe57944dfa7ab7fd0", GitTreeState:"clean"}

Prometheus-operator version:

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.1           0.32.0  

I have the same issue on aks.

Can anyone reproduce this issue in Helm 3, or does it propagate as a different error? My assumption is that with the removal of tiller this should no longer be an issue.

@bacongobbler This is still an issue in Helm 3.

bash$ helm install r-prometheus-operator stable/prometheus-operator --version 6.8.2 -f prometheus-operator/helm/prometheus-operator.yaml

manifest_sorter.go:179: info: skipping unknown hook: "crd-install"
Error: apiVersion "monitoring.coreos.com/v1" in prometheus-operator/templates/exporters/kube-controller-manager/servicemonitor.yaml is not available

That seems to be a different issue than the issue raised by the OP, though.

description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
= grpc: the client connection is closing'

Can you check and see if you're using the latest beta release as well? That error was seemingly addressed in #6332 which was released in 3.0.0-beta.3. If not can you open a new issue?

@bacongobbler I'm using the latest Helm v3.0.0-beta.3.

I had to go back to --version 6.7.3 to get it to install properly

Our workaround is to keep prometheus operator image on v0.31.1.

helm.log
Also just encountered this issue on a Docker EE Kubernetes install.

After some fiddling with install options, --debug and such, I am now getting:

Error: release prom failed: context canceled

Edit: May try updating my helm versions, currently at v2.12.3
Edit2: Updated to 2.14.3 and still problematic
grpc: the client connection is closing
Edit3: Installed version 6.7.3 per above suggestions to get things going again
Edit4: Attached tiller log for a failed install as helm.log

related: https://github.com/helm/charts/issues/15977

After doing some digging with @cyp3d it appears that the issue could be caused by a helm delete timeout that's too short for some clusters. I cannot reproduce the issue anywhere, so if someone who is experiencing this could validate a potential fix in the linked pull request branch I would much appreciate it!

https://github.com/helm/charts/pull/17090

Same here on several Clusters created with kops on AWS.
No issues when running on K3S though.

@xvzf

Could you try the potential fix in this PR? https://github.com/helm/charts/pull/17090

I gave the PR a run-through and still get the same Error: release prom failed: context canceled
tiller.log

@vsliouniaev Nope, does not fix the issue here

Thanks for checking @xvzf and @pyadminn. I have made another change in the same PR. Could you see if this helps?

Just checked the updated PR; still seeing the following on our infra: Error: release prom failed: rpc error: code = Canceled desc = grpc: the client connection is closing

FYI we are on Kubernetes 1.14.3
Helm version v2.14.3

I was able to get around this issue by following the 'Helm fails to create CRDs' section in readme.md. I'm not sure how they're related, but it worked.

Step 1: Manually create the CRDS

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for CRDs to be created, which should only take a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

@vsliouniaev Still same issue! Though the workaround from lethalwire works.

The lethalwire workaround resolved it for me as well.

Four days apart, the workaround worked and then stopped working; I had to use the CRD files from 0.32.0, not master.

I just now experienced the same issue with the CRDs currently on master. Thanks @Typositoire for your suggestion to use the previous version. Adapting the CRD install to the following worked for me:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

That's why pinning the version is often a good practice.

Also had this issue; try disabling admissionWebhooks. It helped in my case.
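For anyone trying that, a hedged sketch of the command line (Helm 2 syntax; the admissionWebhooks flag name is taken from a later comment in this thread):

helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false --set prometheusOperator.admissionWebhooks.enabled=false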

Install prometheus-operator chart 6.0.0 and then do a helm upgrade --force --version 6.11.0; this seems to work on Rancher Kubernetes 1.13.10 with helm v2.14.3.

The workaround suggested by @Typositoire worked fine for me on a kops-generated 1.13.10 cluster.

Same issue here trying to install on Azure AKS with kubernetes 1.13.10 and helm v2.14.3 with prometheus-operator-6.18.0. Any suggestion?

CRDs installed manually.

This command failed:
helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false

and gives the error:

Error: release prometheus-operator failed: rpc error: code = Canceled desc = grpc: the client connection is closing

EDIT: installing version 6.11.0 (as well as 6.7.3) of the chart works:

helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false --version 6.11.0

Try disabling the admission controller webhook?

https://waynekhan.net/2019/10/09/prometheus-operator-release-failed.html


I was fighting the same issue; I had to manually install the CRDs specified by @JBosom and install with the webhook disabled.

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

helm --tls --tiller-namespace=tiller install --namespace=monitoring --name prom-mfcloud stable/prometheus-operator --set prometheusOperator.createCustomResource=false --set prometheusOperator.admissionWebhooks.enabled=false --values values.yaml --version 6.18.0

I was receiving the same error trying to install v8.0.0 on a local K8s cluster from Docker Desktop with helm v2.14.3. I was able to install only after creating the CRDs first, as suggested by @lethalwire.

I think we have enough cases here to determine this is a specific issue with the prometheus-operator chart.

I'm going to close this as something for which we have no actionable response on our end, but please feel free to keep the conversation going.

I am sorry for the rant, but I am not getting this error anymore after upgrading to the latest helm v2.15.2. 👍

It seems quite strange that there's no information available from Helm about what is happening.

There are no debug logs posted here, nor have any been asked for, and folks are resorting to flipping switches and seeing if it helps.

What does the error actually mean? Is it an indicator of a deadlock with waits? Are there some other actions that can be performed other than just a collective shrug?

Yes. The original error appears to be a deadlock waiting for the admission webhook to complete, since disabling the webhook allows the chart to install without issue. Looking at Tiller's logs should confirm the issue.
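A quick sketch of how to pull those logs (assuming the default tiller-deploy deployment in kube-system; adjust if Tiller runs elsewhere):

# tail the Tiller pod's logs while the failing install runs
kubectl logs -n kube-system -l app=helm,name=tiller --tail=200 -f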

Helm 3 should report the correct error back to the user, as there is no gRPC layer in the mix cancelling the request on a timeout.

Feel free to provide patches for Helm 2. Given that this has been improved for Helm 3, I went ahead and closed this as fixed in newer releases.

Hope this helps.

The original error appears to be a deadlock waiting for the admission web hook to complete, since disabling the web hook allows the chart to install without issue.

This seems like a pretty strange conclusion, since the solution is to either disable the job or disable installing the CRD hooks. Both of these appear to solve the problem, so it doesn't appear to be an issue specifically with the job.

To anyone else running into this issue - could you please provide the output of kubectl describe job so we can find out what jobs are failing? I have asked for this before but everyone appears to indicate that there are no jobs present.
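For example, something along these lines (assuming the release was installed into the monitoring namespace; the job name is a placeholder):

kubectl get jobs -n monitoring
kubectl describe job <job-name> -n monitoring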

The Tiller log reads as follows:

[kube] 2019/11/15 14:35:46 get relation pod of object: monitoring/PrometheusRule/prometheus-operator-node-time
[kube] 2019/11/15 14:35:46 Doing get for PrometheusRule: "prometheus-operator-kubernetes-apps"
[ A lot of unrelated updates in between... ]
2019/11/15 14:36:38 Cannot patch PrometheusRule: "prometheus-operator-kubernetes-apps" (rpc error: code = Canceled desc = grpc: the client connection is closing)
2019/11/15 14:36:38 Use --force to force recreation of the resource
[kube] 2019/11/15 14:36:38 error updating the resource "prometheus-operator-kubernetes-apps":
     rpc error: code = Canceled desc = grpc: the client connection is closing
[tiller] 2019/11/15 14:36:38 warning: Upgrade "prometheus-operator" failed: rpc error: code = Canceled desc = grpc: the client connection is closing
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v94"
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v95"
[ then rollback... ]

So I had to manually delete this resource. The apiserver may have more information (it does sound like it is related to the admission controller, indeed).
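For anyone else hitting this, the manual cleanup was roughly the following (resource name and namespace taken from the log above):

kubectl delete prometheusrule prometheus-operator-kubernetes-apps -n monitoring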

@desaintmartin This looks like it is happening for you on an upgrade, rather than an install, right?

Since Helm 3.0 is GA now and the chart is working for it, please do report if you can get it to happen there and if you get any better logs

I'm on Helm3 and still get this error on Azure AKS :(

I tried this on chart v8.2.4: if prometheusOperator.admissionWebhooks=false, then prometheus.tlsProxy.enabled=false too.

Also, as vsliouniaev said, what do --debug and --dry-run say?
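i.e. something along these lines (Helm 3 syntax; release name and values file are placeholders):

helm install prometheus-operator stable/prometheus-operator --namespace monitoring -f values.yaml --debug --dry-run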

@truealex81 Since helm3 is meant to give more information about this, can you please post verbose logs from the install process?

I am receiving the same issue deploying 8.2.4 on Azure AKS.

Helm Version:
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

Helm --debug produces this output:

install.go:148: [debug] Original chart version: ""
install.go:165: [debug] CHART PATH: /root/.cache/helm/repository/prometheus-operator-8.2.4.tgz
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 5 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 120 resource(s)
Error: context canceled

I can reproduce this reliably. If there is a way to get more verbose logs, please let me know and I will post the output here.

@pather87 thanks a lot!

Here's the order of what's meant to happen in the chart:

  1. CRDs are provisioned
  2. There is a pre-install;pre-upgrade job which runs a container to create a secret with certificates for the admission hooks. This job and its resources are cleaned up on success
  3. All the resources are created
  4. There is a post-install;post-upgrade job that runs a container to patch the created validatingwebhookconfiguration and mutatingwebhookconfiguration with the CA from the certificates created in step 2. This job and its resources are cleaned up on success (a sketch of the hook annotations involved follows this list)
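A minimal sketch of the kind of Helm hook annotations that drive steps 2 and 4 (generic Helm hook syntax; the exact annotations in the chart may differ):

# on the pre-install job that creates the admission certificates (step 2)
"helm.sh/hook": pre-install,pre-upgrade
"helm.sh/hook-delete-policy": hook-succeeded
# on the post-install job that patches the webhook configurations (step 4)
"helm.sh/hook": post-install,post-upgrade
"helm.sh/hook-delete-policy": hook-succeeded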

Could you please check if you have any failed jobs still present? From the logs it reads like you shouldn't because they were all successful.

Are there any other resources present in the cluster after the Error: context canceled happens?

Same here when installing prometheus-operator:

helm install prometheus-operator stable/prometheus-operator \
  --namespace=monitoring \
  --values=values.yaml

Error: rpc error: code = Canceled desc = grpc: the client connection is closing

@vsliouniaev thanks for your answer!

  1. There are no jobs lying around after the deployment.
  2. Deployments and services are present in the Cluster after the deployment, see kubectl output:

kubectl get all -lrelease=prometheus-operator

NAME                                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-operator-grafana-59d489899-4b5kd          2/2     Running   0          3m56s
pod/prometheus-operator-operator-8549bcd687-4kb2x        2/2     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-4km6x   1/1     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-7dgn6   1/1     Running   0          3m56s

NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
service/prometheus-operator-alertmanager               ClusterIP   xxx   <none>        9093/TCP           3m57s
service/prometheus-operator-grafana                    ClusterIP   xxx   <none>        80/TCP             3m57s
service/prometheus-operator-operator                   ClusterIP   xxx     <none>        8080/TCP,443/TCP   3m57s
service/prometheus-operator-prometheus                 ClusterIP   xxx   <none>        9090/TCP           3m57s
service/prometheus-operator-prometheus-node-exporter   ClusterIP   xxx    <none>        9100/TCP           3m57s

NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-operator-prometheus-node-exporter   2         2         2       2            2           <none>          3m57s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator-grafana    1/1     1            1           3m57s
deployment.apps/prometheus-operator-operator   1/1     1            1           3m57s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-grafana-59d489899     1         1         1       3m57s
replicaset.apps/prometheus-operator-operator-8549bcd687   1         1         1       3m57s

NAME                                                             READY   AGE
statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     3m44s
statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     3m34s

Installation with debug:

client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD alertmanagers.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD podmonitors.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheuses.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheusrules.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD servicemonitors.monitoring.coreos.com is already present. Skipping.
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 0 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 122 resource(s)
Error: context canceled
helm.go:76: [debug] context canceled

Afterwards, I execute: kubectl get all -lrelease=prometheus-operator -A

NAMESPACE    NAME                                                     READY   STATUS    RESTARTS   AGE
monitoring   pod/prometheus-operator-grafana-d6676b794-r6cg9          2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-operator-6584f4b5f5-wdkrx        2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-2g4tg   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-798p5   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-pvk5t   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-r9j2r   1/1     Running   0          2m45s

NAMESPACE     NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
kube-system   service/prometheus-operator-coredns                    ClusterIP   None           <none>        9153/TCP           2m46s
kube-system   service/prometheus-operator-kube-controller-manager    ClusterIP   None           <none>        10252/TCP          2m46s
kube-system   service/prometheus-operator-kube-etcd                  ClusterIP   None           <none>        2379/TCP           2m46s
kube-system   service/prometheus-operator-kube-proxy                 ClusterIP   None           <none>        10249/TCP          2m46s
kube-system   service/prometheus-operator-kube-scheduler             ClusterIP   None           <none>        10251/TCP          2m46s
monitoring    service/prometheus-operator-alertmanager               ClusterIP   10.0.238.102   <none>        9093/TCP           2m46s
monitoring    service/prometheus-operator-grafana                    ClusterIP   10.0.16.19     <none>        80/TCP             2m46s
monitoring    service/prometheus-operator-operator                   ClusterIP   10.0.97.114    <none>        8080/TCP,443/TCP   2m45s
monitoring    service/prometheus-operator-prometheus                 ClusterIP   10.0.57.153    <none>        9090/TCP           2m46s
monitoring    service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.0.83.30     <none>        9100/TCP           2m46s

NAMESPACE    NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
monitoring   daemonset.apps/prometheus-operator-prometheus-node-exporter   4         4         4       4            4           <none>          2m46s

NAMESPACE    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
monitoring   deployment.apps/prometheus-operator-grafana    1/1     1            1           2m46s
monitoring   deployment.apps/prometheus-operator-operator   1/1     1            1           2m46s

NAMESPACE    NAME                                                      DESIRED   CURRENT   READY   AGE
monitoring   replicaset.apps/prometheus-operator-grafana-d6676b794     1         1         1       2m46s
monitoring   replicaset.apps/prometheus-operator-operator-6584f4b5f5   1         1         1       2m46s

NAMESPACE    NAME                                                             READY   AGE
monitoring   statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     2m40s
monitoring   statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     2m30s

What I've also discovered while trying to work around this: the issue persists if I delete the chart and the CRDs afterwards and then install the chart again, but it does not persist if I do not delete the CRDs.

I also tried installing the CRDs beforehand and doing a helm install --skip-crds, but the issue still persists. This is somewhat confusing.
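For reference, deleting the CRDs to reproduce this looks roughly like the following (CRD names taken from the debug log above):

kubectl delete crd alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com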

The next log line I would expect after this is about post-install,post-upgrade hooks, but it does not appear in your case. I'm not certain what helm is waiting on here.

...
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job
client.go:245: [debug] jobs.batch "prom-op-prometheus-operato-admission-patch" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prom-op-prometheus-operato-admission-patch with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:484: [debug] prom-op-prometheus-operato-admission-patch: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job

Manual CRD creation helps, at least on Azure.
First, create the CRDs from this link https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
("kubectl create -f alertmanager.crd.yaml" and so on for all files).
Then:
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false
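For convenience, a rough equivalent in one loop (assuming the release-0.34 files follow the alertmanager.crd.yaml naming pattern mentioned above):

# apply each CRD manifest from the release-0.34 branch
for crd in alertmanager podmonitor prometheus prometheusrule servicemonitor; do
  kubectl create -f "https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.34/example/prometheus-operator-crd/${crd}.crd.yaml"
done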

Thanks @truealex81 ! That works on Azure.

My env:
k8s 1.11.2, helm 2.13.1, tiller 2.13.1
prometheus-operator-5.5 (APP VERSION 0.29) is OK!

but:
prometheus-operator-8 (APP VERSION 0.32) has the same problem:
"context canceled" or "grpc: the client connection is closing"!

I guess the latest version of prometheus-operator is not compatible?

@bierhov please can you post the resources in the namespace after a failure?

Yes!
Running "helm ls" I can see my prometheus-operator release with status "failed", but the namespace where I installed prometheus-operator has all the prometheus-operator resources.
However,
the Prometheus web UI can't get any data!

Can you please post the resources though?

Can you please post the resources though?

Sorry, I can't reproduce it unless I remove my stable helm env and do it again!

@bierhov do you have any failed jobs left after the install?

@bierhov do you have any failed jobs left after the install?

My k8s version is 1.11.2; helm and tiller version is 2.13.1.
If I install prometheus-operator version 8.x,
"helm ls" shows the release status as failed,
but if I install prometheus-operator version 5.x,
"helm ls" shows the release status as deployed!

Not reproducible using:

Kubernetes version: v1.13.12
Kubectl version: v1.16.2
Helm version: 3.0.1
Prometheus-operator version: 8.3.3

  1. Install CRDs manually:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/podmonitor.crd.yaml

  2. Configure the operator not to create CRDs, either in values.yaml or when installing using

--set prometheusOperator.createCustomResource=false

prometheusOperator:
  createCustomResource: false

@GramozKrasniqi
What if you don't create CRDs manually? That's one of the workarounds for the issue

@vsliouniaev if you don't create them, you will get the error.
But in the original issue, under Additional Info, @rnkhouse stated that he was creating the CRDs manually.

We use prometheus-operator in our deployment. In a nutshell, we upgraded prom-op from 6.9.3 to 8.3.3 and it always failed with "Error: context canceled".
We also always install the CRDs before installing/upgrading prometheus-operator, and of course we didn't change or update these CRDs.

I tried to refresh the CRDs mentioned in 'github.com/helm/charts/tree/master/stable/prometheus-operator' (like kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml), but these don't exist anymore.
After that I tried the ones from here: https://github.com/helm/charts/tree/master/stable/prometheus-operator/crds
But it failed again.

I almost gave up, but with these CRDs the helm deploy succeeded! yeyyyy
https://github.com/coreos/kube-prometheus/tree/master/manifests/setup

My setup:

Kubernetes version: v1.14.3
Kubectl version: v1.14.2
Helm version: 2.14.3
Prometheus-operator version: 8.3.3

Purge prometheus-operator from k8s !

Then:

kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml   
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml 
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml 
helm upgrade -i prom-op                               \
  --version 8.3.3                                     \
  --set prometheusOperator.createCustomResource=false \
  stable/prometheus-operator

That's all !

Does this mean that it's necessary to do a clean install and lose historical metrics data?

After upgrading AKS k8s to 1.15.5, helm to 3.0.1, and the prometheus-operator chart to 8.3.3, the problem is gone.

Our workaround is to keep prometheus operator image on v0.31.1.

Worked for me as well on AKS v1.14.8 and helm+tiller v2.16.1 after changing the operator image to v0.31.1.

Manual CRDs creation helps at least on Azure.
Firstly create crds from this link https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
"kubectl create -f alertmanager.crd.yaml" and so on for all files
Then
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false

Works on Azure Kubernetes, thanks.

I was able to get around this issue by following the 'Helm fails to create CRDs' section in readme.md. I'm not sure how they're related, but it worked.

Step 1: Manually create the CRDS

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for CRDs to be created, which should only take a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

Thanks, this worked for me with an AKS cluster. I had to change the URLs for the CRDs:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml --validate=false

helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring --set prometheusOperator.createCustomResource=false

Closing. Looks like this has been since resolved, according to the last three commenters. Thanks!
