Helm: Release "prometheus-operator" failed: rpc error: code = Canceled

Created on 31 Jul 2019 · 71 Comments · Source: helm/helm

Describe the bug
When I try to install prometheus operator on AKS with helm install stable/prometheus-operator --name prometheus-operator -f prometheus-operator-values.yaml I am getting this error:

prometheus-operator" failed: rpc error: code = Canceled

I checked with history:

helm history prometheus-operator -o yaml
- chart: prometheus-operator-6.3.0
  description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
    = grpc: the client connection is closing'
  revision: 1
  status: FAILED
  updated: Tue Jul 30 12:36:52 2019

Chart
[stable/prometheus-operator]

Additional Info
I am using below configurations to deploy a chart:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml

In the values file, createCustomResource is set to false.
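For reference, a minimal sketch of the relevant part of that values file (assuming the chart's prometheusOperator.createCustomResource flag, as used later in this thread):

prometheusOperator:
  # CRDs were applied manually with kubectl above, so the chart should not create them
  createCustomResource: false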

Output of helm version:
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}

Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.7", GitCommit:"4683545293d792934a7a7e12f2cc47d20b2dd01b", GitTreeState:"clean", BuildDate:"2019-06-06T01:39:30Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):
AKS

question/support

Most helpful comment

I was able to get around this issue by following the 'Helm fails to create CRDs' section in readme.md. I'm not sure how they're related, but it worked.

Step 1: Manually create the CRDS

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for CRDs to be created, which should only take a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

All 71 comments

We have the same issue on Minikube, so it does not seem to be specific to AKS.

We have the same issue on kubespray-deployed clusters.

I'm also seeing the issue on both 1.12.x and 1.13.x kubespray-deployed clusters in our automated pipeline, with a 100% failure rate. The previous version of prometheus-operator (0.30.1) works without issues.
The funny thing is that if I run the command manually instead of via the CD pipeline it works, so I'm a little confused as to what the cause could be.

Saw there was an update to the prometheus-operator chart today. I bumped it to

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.0           0.32.0     

and I'm no longer seeing the issue.

@rnkhouse Can you check with the latest chart version as mentioned by @dlevene1 in https://github.com/helm/helm/issues/6130#issuecomment-526977731?

I have this same issue with version 6.8.1 on AKS.

NAME                        CHART VERSION   APP VERSION
stable/prometheus-operator  6.8.1           0.32.0
❯ helm version 
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
 ❯ helm install -f prd.yaml --name prometheus --namespace monitoring stable/prometheus-operator 
Error: release prometheus failed: grpc: the client connection is closing
>>> elapsed time 1m56s

We have the same issue on kubespray-deployed clusters.

Kubernetes version: v1.4.1
Helm version:

Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.0", GitCommit:"05811b84a3f93603dd6c2fcfe57944dfa7ab7fd0", GitTreeState:"clean"}

Prometheus-operator version:

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.1           0.32.0  

I have the same issue on aks.

Can anyone reproduce this issue in Helm 3, or does it propagate as a different error? My assumption is that with the removal of tiller this should no longer be an issue.

@bacongobbler This is still an issue in Helm 3.

bash$ helm install r-prometheus-operator stable/prometheus-operator --version 6.8.2 -f prometheus-operator/helm/prometheus-operator.yaml

manifest_sorter.go:179: info: skipping unknown hook: "crd-install"
Error: apiVersion "monitoring.coreos.com/v1" in prometheus-operator/templates/exporters/kube-controller-manager/servicemonitor.yaml is not available

That seems to be a different issue than the issue raised by the OP, though.

description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
= grpc: the client connection is closing'

Can you check and see if you're using the latest beta release as well? That error was seemingly addressed in #6332 which was released in 3.0.0-beta.3. If not can you open a new issue?

@bacongobbler I'm using the latest Helm v3.0.0-beta.3.

I had to go back to --version 6.7.3 to get it to install properly

Our workaround is to keep prometheus operator image on v0.31.1.

helm.log
Also just encountered this issue on a Docker EE Kubernetes install.

After some fiddling with install options, --debug and such, I am now getting:

Error: release prom failed: context canceled

Edit: May try updating my helm versions, currently at v2.12.3
Edit2: Updated to 2.14.3 and still problematic
grpc: the client connection is closing
Edit3: Installed version 6.7.3 per above suggestions to get things going again
Edit4: Attached tiller log for a failed install as helm.log

related: https://github.com/helm/charts/issues/15977

After doing some digging with @cyp3d it appears that the issue could be caused by a helm delete timeout that's too short for some clusters. I cannot reproduce the issue anywhere, so if someone who is experiencing this could validate a potential fix in the linked pull request branch I would much appreciate it!

https://github.com/helm/charts/pull/17090

Same here on several Clusters created with kops on AWS.
No issues when running on K3S though.

@xvzf

Could you try the potential fix in this PR? https://github.com/helm/charts/pull/17090

I gave the PR a run-through and still get the same Error: release prom failed: context canceled
tiller.log

@vsliouniaev Nope, does not fix the issue here

Thanks for checking @xvzf and @pyadminn. I have made another change in the same PR. Could you see if this helps?

Just checked the updated PR; still seeing the following on our infra: Error: release prom failed: rpc error: code = Canceled desc = grpc: the client connection is closing

FYI we are on Kubernetes 1.14.3
Helm version v2.14.3

I was able to get around this issue by following the 'Helm fails to create CRDs' section in readme.md. I'm not sure how they're related, but it worked.

Step 1: Manually create the CRDS

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for CRDs to be created, which should only take a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

@vsliouniaev Still same issue! Though the workaround from lethalwire works.

The lethalwire workaround resolved it for me as well.

Four days apart, the workaround worked and then stopped working; I had to use the CRD files from 0.32.0, not master.

I just now experienced the same issue with the CRDs currently on master. Thanks @Typositoire for your suggestion to use the previous version. Adapting the CRD install to the following worked for me:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

That's why pinning the version is often a good practice.

Also had this issue; try disabling admissionWebhooks. It helped in my case.
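For anyone trying that, a hedged sketch of the command line (Helm 2 syntax; the admissionWebhooks flag name is taken from a later comment in this thread):

helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false --set prometheusOperator.admissionWebhooks.enabled=false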

Install prometheus-operator chart 6.0.0 and then do a helm upgrade --force --version 6.11.0; this seems to work on Rancher Kubernetes 1.13.10 with helm v2.14.3.

The workaround suggested by @Typositoire worked fine for me on a kops-generated 1.13.10 cluster.

Same issue here trying to install on Azure AKS with kubernetes 1.13.10 and helm v2.14.3 with prometheus-operator-6.18.0. Any suggestion?

CRDs installed manually.

This command failed:
helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false

and gives the error:

Error: release prometheus-operator failed: rpc error: code = Canceled desc = grpc: the client connection is closing

EDIT: installing version 6.11.0 (as well as 6.7.3) of the chart works:

helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false --version 6.11.0

Try disabling the admission controller webhook?

https://waynekhan.net/2019/10/09/prometheus-operator-release-failed.html


I was fighting the same issue; I had to manually install the CRDs specified by @JBosom and install with the webhook disabled.

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

helm --tls --tiller-namespace=tiller install --namespace=monitoring --name prom-mfcloud stable/prometheus-operator --set prometheusOperator.createCustomResource=false --set prometheusOperator.admissionWebhooks.enabled=false --values values.yaml --version 6.18.0

I was receiving the same error trying to install v8.0.0 on a local K8s cluster from Docker Desktop with helm v2.14.3. I was able to install only after creating the CRDs first, as suggested by @lethalwire.

I think we have enough cases here to determine this is a specific issue with the prometheus-operator chart.

I'm going to close this as something for which we have no actionable response on our end, but please feel free to keep the conversation going.

I am sorry for the rant, but I am not getting this error anymore after upgrading to the latest helm v2.15.2. 👍

It seems quite strange that there's no information available from Helm about what is happening.

There are no debug logs posted here, nor have any been asked for, and folks are resorting to flipping switches and seeing if it helps.

What does the error actually mean? Is it an indicator of a deadlock with waits? Are there some other actions that can be performed other than just a collective shrug?

Yes. The original error appears to be a deadlock waiting for the admission webhook to complete, since disabling the webhook allows the chart to install without issue. Looking at Tiller's logs should confirm the issue.
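A quick sketch of how to pull those logs (assuming the default tiller-deploy deployment in kube-system; adjust if Tiller runs elsewhere):

# tail the Tiller pod's logs while the failing install runs
kubectl logs -n kube-system -l app=helm,name=tiller --tail=200 -f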

Helm 3 should report the correct error back to the user, as there is no gRPC layer in the mix cancelling the request on a timeout.

Feel free to provide patches for Helm 2. Given that this has been improved for Helm 3, I went ahead and closed this as fixed in newer releases.

Hope this helps.

The original error appears to be a deadlock waiting for the admission web hook to complete, since disabling the web hook allows the chart to install without issue.

This seems like a pretty strange conclusion, since the solution is to either disable the job or disable installing the CRD hooks. Both of these appear to solve the problem, so it doesn't appear to be an issue specifically with the job.

To anyone else running into this issue - could you please provide the output of kubectl describe job so we can find out what jobs are failing? I have asked for this before but everyone appears to indicate that there are no jobs present.
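For example, something along these lines (assuming the release was installed into the monitoring namespace; the job name is a placeholder):

kubectl get jobs -n monitoring
kubectl describe job <job-name> -n monitoring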

The Tiller log reads as follows:

[kube] 2019/11/15 14:35:46 get relation pod of object: monitoring/PrometheusRule/prometheus-operator-node-time
[kube] 2019/11/15 14:35:46 Doing get for PrometheusRule: "prometheus-operator-kubernetes-apps"
[ A lot of unrelated updates in between... ]
2019/11/15 14:36:38 Cannot patch PrometheusRule: "prometheus-operator-kubernetes-apps" (rpc error: code = Canceled desc = grpc: the client connection is closing)
2019/11/15 14:36:38 Use --force to force recreation of the resource
[kube] 2019/11/15 14:36:38 error updating the resource "prometheus-operator-kubernetes-apps":
     rpc error: code = Canceled desc = grpc: the client connection is closing
[tiller] 2019/11/15 14:36:38 warning: Upgrade "prometheus-operator" failed: rpc error: code = Canceled desc = grpc: the client connection is closing
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v94"
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v95"
[ then rollback... ]

So I had to manually delete this resource. The apiserver may have more information (it does sound like it is related to the admission controller, indeed).
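For anyone else hitting this, the manual cleanup was roughly the following (resource name and namespace taken from the log above):

kubectl delete prometheusrule prometheus-operator-kubernetes-apps -n monitoring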

@desaintmartin This looks like it is happening for you on an upgrade, rather than an install, right?

Since Helm 3.0 is GA now and the chart is working for it, please do report if you can get it to happen there and if you get any better logs

I'm on Helm3 and still get this error on Azure AKS :(

I tried this on chart v8.2.4: if prometheusOperator.admissionWebhooks=false, then prometheus.tlsProxy.enabled=false too.

Also, as vsliouniaev said, what do --debug and --dry-run say?
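i.e. something along these lines (Helm 3 syntax; release name and values file are placeholders):

helm install prometheus-operator stable/prometheus-operator --namespace monitoring -f values.yaml --debug --dry-run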

@truealex81 Since helm3 is meant to give more information about this, can you please post verbose logs from the install process?

I am receiving the same issue deploying 8.2.4 on Azure AKS.

Helm Version:
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

Helm --debug produces this output:

install.go:148: [debug] Original chart version: ""
install.go:165: [debug] CHART PATH: /root/.cache/helm/repository/prometheus-operator-8.2.4.tgz
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 5 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 120 resource(s)
Error: context canceled

I can reproduce this reliably. If there is a way to get more verbose logs, please let me know and I will post the output here.

@pather87 thanks a lot!

Here's the order of what's meant to happen in the chart:

  1. CRDs are provisioned
  2. There is a pre-install;pre-upgrade job which runs a container to create a secret with certificates for the admission hooks. This job and its resources are cleaned up on success
  3. All the resources are created
  4. There is a post-install;post-upgrade job that runs a container to patch the created validatingwebhookconfiguration and mutatingwebhookconfiguration with the CA from the certificates created in step 2. This job and its resources are cleaned up on success (a sketch of the hook annotations involved follows this list)
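A minimal sketch of the kind of Helm hook annotations that drive steps 2 and 4 (generic Helm hook syntax; the exact annotations in the chart may differ):

# on the pre-install job that creates the admission certificates (step 2)
"helm.sh/hook": pre-install,pre-upgrade
"helm.sh/hook-delete-policy": hook-succeeded
# on the post-install job that patches the webhook configurations (step 4)
"helm.sh/hook": post-install,post-upgrade
"helm.sh/hook-delete-policy": hook-succeeded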

Could you please check if you have any failed jobs still present? From the logs it reads like you shouldn't because they were all successful.

Are there any other resources present in the cluster after the Error: context canceled happens?

Same here when installing prometheus-operator:

helm install prometheus-operator stable/prometheus-operator \
  --namespace=monitoring \
  --values=values.yaml

Error: rpc error: code = Canceled desc = grpc: the client connection is closing

@vsliouniaev thanks for your answer!

  1. There are no jobs lying around after the deployment.
  2. Deployments and services are present in the Cluster after the deployment, see kubectl output:

kubectl get all -lrelease=prometheus-operator

NAME                                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-operator-grafana-59d489899-4b5kd          2/2     Running   0          3m56s
pod/prometheus-operator-operator-8549bcd687-4kb2x        2/2     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-4km6x   1/1     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-7dgn6   1/1     Running   0          3m56s

NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
service/prometheus-operator-alertmanager               ClusterIP   xxx   <none>        9093/TCP           3m57s
service/prometheus-operator-grafana                    ClusterIP   xxx   <none>        80/TCP             3m57s
service/prometheus-operator-operator                   ClusterIP   xxx     <none>        8080/TCP,443/TCP   3m57s
service/prometheus-operator-prometheus                 ClusterIP   xxx   <none>        9090/TCP           3m57s
service/prometheus-operator-prometheus-node-exporter   ClusterIP   xxx    <none>        9100/TCP           3m57s

NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-operator-prometheus-node-exporter   2         2         2       2            2           <none>          3m57s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator-grafana    1/1     1            1           3m57s
deployment.apps/prometheus-operator-operator   1/1     1            1           3m57s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-grafana-59d489899     1         1         1       3m57s
replicaset.apps/prometheus-operator-operator-8549bcd687   1         1         1       3m57s

NAME                                                             READY   AGE
statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     3m44s
statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     3m34s

Installation with debug:

client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD alertmanagers.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD podmonitors.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheuses.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheusrules.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD servicemonitors.monitoring.coreos.com is already present. Skipping.
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 0 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 122 resource(s)
Error: context canceled
helm.go:76: [debug] context canceled

Afterwards, I execute: kubectl get all -lrelease=prometheus-operator -A

NAMESPACE    NAME                                                     READY   STATUS    RESTARTS   AGE
monitoring   pod/prometheus-operator-grafana-d6676b794-r6cg9          2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-operator-6584f4b5f5-wdkrx        2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-2g4tg   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-798p5   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-pvk5t   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-r9j2r   1/1     Running   0          2m45s

NAMESPACE     NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
kube-system   service/prometheus-operator-coredns                    ClusterIP   None           <none>        9153/TCP           2m46s
kube-system   service/prometheus-operator-kube-controller-manager    ClusterIP   None           <none>        10252/TCP          2m46s
kube-system   service/prometheus-operator-kube-etcd                  ClusterIP   None           <none>        2379/TCP           2m46s
kube-system   service/prometheus-operator-kube-proxy                 ClusterIP   None           <none>        10249/TCP          2m46s
kube-system   service/prometheus-operator-kube-scheduler             ClusterIP   None           <none>        10251/TCP          2m46s
monitoring    service/prometheus-operator-alertmanager               ClusterIP   10.0.238.102   <none>        9093/TCP           2m46s
monitoring    service/prometheus-operator-grafana                    ClusterIP   10.0.16.19     <none>        80/TCP             2m46s
monitoring    service/prometheus-operator-operator                   ClusterIP   10.0.97.114    <none>        8080/TCP,443/TCP   2m45s
monitoring    service/prometheus-operator-prometheus                 ClusterIP   10.0.57.153    <none>        9090/TCP           2m46s
monitoring    service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.0.83.30     <none>        9100/TCP           2m46s

NAMESPACE    NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
monitoring   daemonset.apps/prometheus-operator-prometheus-node-exporter   4         4         4       4            4           <none>          2m46s

NAMESPACE    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
monitoring   deployment.apps/prometheus-operator-grafana    1/1     1            1           2m46s
monitoring   deployment.apps/prometheus-operator-operator   1/1     1            1           2m46s

NAMESPACE    NAME                                                      DESIRED   CURRENT   READY   AGE
monitoring   replicaset.apps/prometheus-operator-grafana-d6676b794     1         1         1       2m46s
monitoring   replicaset.apps/prometheus-operator-operator-6584f4b5f5   1         1         1       2m46s

NAMESPACE    NAME                                                             READY   AGE
monitoring   statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     2m40s
monitoring   statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     2m30s

What I've also discovered while trying to work around this: the issue persists if I delete the chart and the CRDs afterwards and then install the chart again, but it does not persist if I do not delete the CRDs.

I also tried installing the CRDs beforehand and doing a helm install --skip-crds, but the issue still persists. This is somewhat confusing.
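For reference, deleting the CRDs to reproduce this looks roughly like the following (CRD names taken from the debug log above):

kubectl delete crd alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com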

The next log line I would expect after this is about post-install,post-upgrade hooks, but it does not appear in your case. I'm not certain what helm is waiting on here.

...
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job
client.go:245: [debug] jobs.batch "prom-op-prometheus-operato-admission-patch" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prom-op-prometheus-operato-admission-patch with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:484: [debug] prom-op-prometheus-operato-admission-patch: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job

Manual CRD creation helps, at least on Azure.
First, create the CRDs from this link https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
("kubectl create -f alertmanager.crd.yaml" and so on for all files).
Then:
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false
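For convenience, a rough equivalent in one loop (assuming the release-0.34 files follow the alertmanager.crd.yaml naming pattern mentioned above):

# apply each CRD manifest from the release-0.34 branch
for crd in alertmanager podmonitor prometheus prometheusrule servicemonitor; do
  kubectl create -f "https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.34/example/prometheus-operator-crd/${crd}.crd.yaml"
done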

Thanks @truealex81 ! That works on Azure.

My env:
k8s 1.11.2, helm 2.13.1, tiller 2.13.1
prometheus-operator-5.5 (APP VERSION 0.29) is OK!

but:
prometheus-operator-8 (APP VERSION 0.32) has the same problem:
"context canceled" or "grpc: the client connection is closing"!

I guess the latest version of prometheus-operator is not compatible?

@bierhov please can you post the resources in the namespace after a failure?

Yes!
Running "helm ls" I can see my prometheus-operator release with status "failed", but the namespace where I installed prometheus-operator has all the prometheus-operator resources.
However,
the Prometheus web UI can't get any data!

Can you please post the resources though?

Can you please post the resources though?

Sorry, I can't reproduce it unless I remove my stable helm env and do it again!

@bierhov do you have any failed jobs left after the install?

@bierhov do you have any failed jobs left after the install?

My k8s version is 1.11.2; helm and tiller version is 2.13.1.
If I install prometheus-operator version 8.x,
"helm ls" shows the release status as failed,
but if I install prometheus-operator version 5.x,
"helm ls" shows the release status as deployed!

Not reproducible using:

Kubernetes version: v1.13.12
Kubectl version: v1.16.2
Helm version: 3.0.1
Prometheus-operator version: 8.3.3

  1. Install CRDs manually:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/podmonitor.crd.yaml

  2. Configure the operator not to create CRDs, either in values.yaml or when installing using

--set prometheusOperator.createCustomResource=false

prometheusOperator:
  createCustomResource: false

@GramozKrasniqi
What if you don't create CRDs manually? That's one of the workarounds for the issue

@vsliouniaev if you don't create them, you will get the error.
But in the original issue, under Additional Info, @rnkhouse stated that he was creating the CRDs manually.

We use prometheus-operator in our deployment. In a nutshell, we upgraded prom-op from 6.9.3 to 8.3.3 and it always failed with "Error: context canceled".
We also always install the CRDs before installing/upgrading prometheus-operator, and of course we didn't change or update these CRDs.

I tried to refresh the CRDs mentioned in 'github.com/helm/charts/tree/master/stable/prometheus-operator' (like kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml), but these don't exist anymore.
After that I tried the ones from here: https://github.com/helm/charts/tree/master/stable/prometheus-operator/crds
But it failed again.

I almost gave up, but with these CRDs the helm deploy succeeded! yeyyyy
https://github.com/coreos/kube-prometheus/tree/master/manifests/setup

My setup:

Kubernetes version: v1.14.3
Kubectl version: v1.14.2
Helm version: 2.14.3
Prometheus-operator version: 8.3.3

Purge prometheus-operator from k8s !

Then:

kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml   
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml 
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml 
helm upgrade -i prom-op                               \
  --version 8.3.3                                     \
  --set prometheusOperator.createCustomResource=false \
  stable/prometheus-operator

That's all !

Does this mean that it's necessary to do a clean install and lose historical metrics data?

After upgrading AKS k8s to 1.15.5, helm to 3.0.1, and the prometheus-operator chart to 8.3.3, the problem is gone.

Our workaround is to keep prometheus operator image on v0.31.1.

Worked for me as well on AKS v1.14.8 and helm+tiller v2.16.1 after changing the operator image to v0.31.1.

Manual CRDs creation helps at least on Azure.
Firstly create crds from this link https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
"kubectl create -f alertmanager.crd.yaml" and so on for all files
Then
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false

Works on Azure Kubernetes, thanks.

I was able to get around this issue by following the 'Helm fails to create CRDs' section in readme.md. I'm not sure how they're related, but it worked.

Step 1: Manually create the CRDS

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for CRDs to be created, which should only take a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

Thanks, this worked for me with an AKS cluster. I had to change the URLs for the CRDs:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml --validate=false

helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring --set prometheusOperator.createCustomResource=false

Closing. Looks like this has been since resolved, according to the last three commenters. Thanks!
