Helm: Release "prometheus-operator" failed: rpc error: code = Canceled

Created on 31 Jul 2019 · 71 comments · Source: helm/helm

Describe the bug
When I try to install the prometheus operator on AKS with helm install stable/prometheus-operator --name prometheus-operator -f prometheus-operator-values.yaml, I get the following error:

Release "prometheus-operator" failed: rpc error: code = Canceled

I checked the history:

helm history prometheus-operator -o yaml
- chart: prometheus-operator-6.3.0
  description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
    = grpc: the client connection is closing'
  revision: 1
  status: FAILED
  updated: Tue Jul 30 12:36:52 2019

Chart
[stable/prometheus-operator]

Additional info
I am using the configuration below to deploy the chart:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml

In the values file, createCustomResource is set to false.

Output of helm version:
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}

Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.7", GitCommit:"4683545293d792934a7a7e12f2cc47d20b2dd01b", GitTreeState:"clean", BuildDate:"2019-06-06T01:39:30Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider/platform (AKS, GKE, Minikube, etc.):
AKS

question/support

rnkhouse

👍8 👀3

Most helpful comment

I was able to work around this issue by following the "Helm fails to create CRDs" section in readme.md. I'm not sure how the two are related, but it worked.

Step 1: Create the CRDs manually

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for the CRDs to be created, which should take only a few seconds.

Step 3:
Install the chart, but disable CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

quantumhype on 20 Sep 2019

👍10
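
Step 2 of the workaround above can be made explicit instead of being timed by hand. The following is a minimal sketch and not part of the original comment. It assumes a kubectl client recent enough to provide the wait subcommand (the v1.10.4 client from the original report would need to be upgraded first), and it takes the CRD names from the Helm 3 debug output that appears later in this thread. The KUBECTL variable is a hypothetical override that exists only so the commands can be previewed.

```shell
# Sketch: block until each prometheus-operator CRD reports the
# Established condition, so that `helm install` is not started too early.
# KUBECTL is a hypothetical override, e.g. KUBECTL="echo kubectl" to preview.
KUBECTL="${KUBECTL:-kubectl}"

# Plural CRD names in the monitoring.coreos.com group, as listed in the
# Helm 3 debug output in this thread.
CRDS="alertmanagers prometheuses prometheusrules servicemonitors podmonitors"

wait_for_crds() {
  for name in $CRDS; do
    # `kubectl wait` exits non-zero if the condition is not met in time.
    $KUBECTL wait --for=condition=Established --timeout=60s \
      "crd/${name}.monitoring.coreos.com" || return 1
  done
}
```

Calling wait_for_crds between the kubectl apply commands of step 1 and the helm install of step 3 would replace the manual pause. The 60 second timeout is an arbitrary choice.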

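All 71 comments

I'm having the same issue on minikube, so it doesn't seem to be specific to AWS.

janvdvegt on 9 Aug 2019

I have the same issue on a kubespray-deployed cluster.

robinelfrink on 23 Aug 2019

I'm also seeing the problem on both k8s 12.x and 13.x kubespray-deployed clusters in our automated pipeline (100% failure rate). The previous version of prometheus-operator (0.30.1) works without issues.
Funnily enough, the command works if I run it manually instead of through the CD pipeline, so I'm a bit confused as to what the cause is.

dlevene1 on 2 Sep 2019

I saw that the prometheus chart was updated today. I bumped it to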

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.0           0.32.0

and I'm no longer seeing the issue.

dlevene1 on 2 Sep 2019

🎉1 👍1

@rnkhouse Can you check with the latest chart version, as mentioned by @dlevene1 in https://github.com/helm/helm/issues/6130#issuecomment-526977731?

hickeyma on 2 Sep 2019

I have the same issue with version 6.8.1 on AKS.

NAME                        CHART VERSION   APP VERSION
stable/prometheus-operator  6.8.1           0.32.0

❯ helm version 
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}

 ❯ helm install -f prd.yaml --name prometheus --namespace monitoring stable/prometheus-operator 
Error: release prometheus failed: grpc: the client connection is closing
>>> elapsed time 1m56s

PaulusTM on 2 Sep 2019

I have the same issue on a kubespray-deployed cluster.

Kubernetes version: v1.4.1
Helm version:

Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.0", GitCommit:"05811b84a3f93603dd6c2fcfe57944dfa7ab7fd0", GitTreeState:"clean"}

Prometheus-operator version:

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.1           0.32.0

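luncj on 4 Sep 2019

I have the same issue on AKS.

will-beta on 6 Sep 2019

Can anyone reproduce this issue with Helm 3, or does it propagate as a different error? My assumption is that with the removal of Tiller this should no longer be an issue.

bacongobbler on 6 Sep 2019

@bacongobbler This is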

bash$ helm install r-prometheus-operator stable/prometheus-operator --version 6.8.2 -f prometheus-operator/helm/prometheus-operator.yaml

manifest_sorter.go:179: info: skipping unknown hook: "crd-install"
Error: apiVersion "monitoring.coreos.com/v1" in prometheus-operator/templates/exporters/kube-controller-manager/servicemonitor.yaml is not available

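will-beta on 7 Sep 2019

However, that seems to be a different issue from the one raised by the OP.

description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
= grpc: the client connection is closing'

Can you check whether you are using the latest beta release as well? That error seems to have been addressed in #6332, which was released in 3.0.0-beta.3. If not, can you open a new issue?

bacongobbler on 7 Sep 2019

@bacongobbler I am using the latest Helm v3.0.0-beta.3.

will-beta on 7 Sep 2019

I had to go back to --version 6.7.3 to get it to install properly.

k8s-class on 8 Sep 2019

Our workaround is to keep the prometheus operator image at v0.31.1.

robinelfrink on 9 Sep 2019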

👍3

helm.log
I also just hit this issue on a Docker EE Kubernetes install.

After some fiddling with the install options, --debug and so on, I am now getting:

Error: release prom failed: context canceled

Edit: I am currently on v2.12.3; going to try updating my Helm version
Edit2: Updated to 2.14.3 and still have the problem:
grpc: the client connection is closing
Edit3: Installed version 6.7.3, as suggested above, to get things going again
Edit4: Attached the Tiller log for a failed install as helm.log

Related: https:

pyadminn on 10 Sep 2019

With @cyp3d, some

https://github.com/helm/charts/pull/17090

vsliouniaev on 12 Sep 2019

Same here on several clusters created with kops on AWS.
No problems when running on K3S, though.

xvzf on 13 Sep 2019

@xvzf

Could you try the potential fix in this PR? https://github.com/helm/charts/pull/17090

vsliouniaev on 13 Sep 2019

I gave the PR a run and still get the same Error: release prom failed: context canceled
tiller.log

pyadminn on 13 Sep 2019

@vsliouniaev Nope, it doesn't fix the issue here

xvzf on 13 Sep 2019

Thanks for checking, @xvzf and @pyadminn. I have made another change in the same PR. Could you see if this helps?

vsliouniaev on 14 Sep 2019

I just checked the updated PR and still see the following on our infrastructure: Error: release prom failed: rpc error: code = Canceled desc = grpc: the client connection is closing

FYI, we are on Kubernetes 1.14.3
Helm version v2.14.3

pyadminn on 16 Sep 2019

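@vsliouniaev Still the same issue! The workaround from lethalwire works, though.

xvzf on 23 Sep 2019

👍1

The lethalwire workaround resolved it for me as well.

pyadminn on 25 Sep 2019

So the workaround worked for 4 days and then stopped working; I had to use the CRDs from 0.32.0 and not master.

Typositoire on 2 Oct 2019

👍1

I just ran into the same issue with the CRDs currently on master. Thanks @Typositoire for the suggestion to use the previous version. Adapting the CRD install to the following worked for me: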

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

That is why pinning the version is a good idea.

JBosom on 3 Oct 2019

👍5 🎉2
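
The version pinning described above can be written so that the git ref is a single parameter. This is a minimal sketch, not taken from any comment in the thread: the function names crd_urls and apply_crds are hypothetical, the URL layout is copied from the commands above, and the KUBECTL variable is an assumed override for previewing. A ref only works if that branch or tag still contains the example/prometheus-operator-crd directory.

```shell
# Sketch: build the CRD manifest URLs for one pinned ref of
# coreos/prometheus-operator (for example release-0.32) and apply them.

# Print one manifest URL per line for the given branch or tag.
crd_urls() {
  ref="$1"
  base="https://raw.githubusercontent.com/coreos/prometheus-operator/${ref}/example/prometheus-operator-crd"
  for kind in alertmanager prometheus prometheusrule servicemonitor podmonitor; do
    printf '%s/%s.crd.yaml\n' "$base" "$kind"
  done
}

# Apply every manifest; stop at the first failure.
# KUBECTL is a hypothetical override, e.g. KUBECTL="echo kubectl" to preview.
apply_crds() {
  crd_urls "$1" | while read -r url; do
    ${KUBECTL:-kubectl} apply -f "$url" || exit 1
  done
}
```

For example, apply_crds release-0.32 would run the same five kubectl apply commands listed in the comment above.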

I also hit this issue; try disabling admissionWebhooks. It helped in my case.

cu12 on 3 Oct 2019

🎉1

Install prometheus-operator chart 6.0.0 and then run helm upgrade --force --version 6.11.0. This seems to work on Rancher Kubernetes 1.13.10 and Helm v2.14.3.

FreezB on 3 Oct 2019

The workaround suggested by @Typositoire worked fine for me on a kops-generated 1.13.10 cluster.

alex-hempel on 10 Oct 2019

Same issue here, trying to install on Azure AKS with Kubernetes 1.13.10 and Helm v2.14.3, using prometheus-operator-6.18.0. Any suggestions?

The CRDs were installed manually.

This command failed:
helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false

giving the error

Error: release prometheus-operator failed: rpc error: code = Canceled desc = grpc: the client connection is closing

EDIT: Installing version 6.11.0 (as well as 6.7.3) of the chart works:

helm install --name prometheus-operator stable/prometheus-operator --namespace=monitoring --set prometheusOperator.createCustomResource=false --version 6.11.0

iMacX on 15 Oct 2019

👍1

Try disabling the admission controller webhook?

https://waynekhan.net/2019/10/09/prometheus-operator-release-failed.html

waynekhan on 16 Oct 2019

👍1

I was battling the same issue. I had to manually install the CRDs specified by @JBosom and install with the webhook disabled.

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml

helm --tls --tiller-namespace=tiller install --namespace=monitoring --name prom-mfcloud stable/prometheus-operator --set prometheusOperator.createCustomResource=false --set prometheusOperator.admissionWebhooks.enabled=false --values values.yaml --version 6.18.0

poochwashere on 16 Oct 2019

👍5
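
The two --set flags used in the command above can also be kept in a small values file, which is easier to review and reuse in a pipeline. This is a minimal sketch under stated assumptions: the function name and the default file name values-workaround.yaml are hypothetical, while the two keys are the ones quoted in this thread. Whether both are needed appears to vary between the clusters reported here.

```shell
# Sketch: write a values file that combines the two workarounds reported
# in this thread (pre-created CRDs, admission webhooks disabled).
# The default file name is a hypothetical choice.
write_workaround_values() {
  out="${1:-values-workaround.yaml}"
  cat > "$out" <<'EOF'
prometheusOperator:
  # CRDs are applied manually beforehand, so the chart must not create them.
  createCustomResource: false
  admissionWebhooks:
    # Disables the admission webhook and its hook jobs.
    enabled: false
EOF
}
```

The file can then be passed as an additional --values argument after the regular values.yaml. Helm merges the files in order, so the later file takes precedence.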

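I was getting the same error when trying to install v8.0.0 with Helm v2.14.3 on a local K8S cluster provided by Docker for Desktop. I was able to install only after creating the CRDs first, as suggested by @lethalwire.

demisx on 6 Nov 2019

I think we have enough cases here to determine that this is an issue specific to the prometheus-operator chart.

I'm going to close this as something we have no actionable response to on our end, but please feel free to keep the conversation going.

bacongobbler on 6 Nov 2019

I am sorry for ranting, but I no longer get this error after upgrading to the latest Helm v2.15.2. 👍

demisx on 6 Nov 2019

It seems pretty strange that there is no information available from Helm about what is going on.

No debug logs have been posted here or asked for; people are flipping switches and seeing whether it helps.

What does the error actually mean? Is it an indicator of a deadlock with waits? Are there any actions that can be taken other than a collective shrug?

vsliouniaev on 6 Nov 2019

Yes. The original error appears to be a deadlock waiting for the admission webhook to complete, since disabling the webhooks allows the chart to install without issue. Looking at Tiller's logs should confirm this.

Helm 3 should report the correct error to the user, as there is no gRPC layer in the mix timing out and cancelling the request.

Feel free to provide patches for Helm 2. Given that this has been improved in Helm 3, I went ahead and closed this as fixed in newer releases.

Hope this helps.

bacongobbler on 6 Nov 2019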

👎1

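The original error appears to be a deadlock waiting for the admission webhook to complete, since disabling the webhooks allows the chart to install without issue.

This seems a rather strange conclusion, since the solution is either to disable the job or to disable installation of the CRD hooks. Both of these seem to resolve the problem, so it does not appear to be an issue with the job specifically.

To anyone else who has this issue: could you please provide the output of kubectl describe job so we can find out which jobs are failing? I have asked for this before, but everyone seems to indicate that no jobs are present.

vsliouniaev on 14 Nov 2019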
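
The request above for kubectl describe job output can be scripted so that every job in the release namespace is captured in one pass. This is a minimal sketch, not something posted in the thread: the function name is hypothetical, the default namespace monitoring is assumed from the install commands quoted earlier, and KUBECTL is an assumed override. Because the chart's hook jobs are cleaned up on success, an empty result is possible and is itself worth reporting.

```shell
# Sketch: describe every Job in a namespace so the output can be attached
# to the issue. KUBECTL is a hypothetical override used for previewing.
describe_hook_jobs() {
  ns="${1:-monitoring}"   # assumed default namespace
  # `-o name` prints one resource per line, e.g. job.batch/<name>.
  ${KUBECTL:-kubectl} get jobs --namespace "$ns" -o name | while read -r job; do
    ${KUBECTL:-kubectl} describe "$job" --namespace "$ns"
  done
}
```

Running describe_hook_jobs monitoring directly after a failed install, before any cleanup, gives the best chance of catching a job that is still active or has failed.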

The Tiller log reads as follows:

[kube] 2019/11/15 14:35:46 get relation pod of object: monitoring/PrometheusRule/prometheus-operator-node-time
[kube] 2019/11/15 14:35:46 Doing get for PrometheusRule: "prometheus-operator-kubernetes-apps"
[ A lot of unrelated updates in between... ]
2019/11/15 14:36:38 Cannot patch PrometheusRule: "prometheus-operator-kubernetes-apps" (rpc error: code = Canceled desc = grpc: the client connection is closing)
2019/11/15 14:36:38 Use --force to force recreation of the resource
[kube] 2019/11/15 14:36:38 error updating the resource "prometheus-operator-kubernetes-apps":
     rpc error: code = Canceled desc = grpc: the client connection is closing
[tiller] 2019/11/15 14:36:38 warning: Upgrade "prometheus-operator" failed: rpc error: code = Canceled desc = grpc: the client connection is closing
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v94"
[storage] 2019/11/15 14:36:38 updating release "prometheus-operator.v95"
[ then rollback... ]

So I had to delete this resource manually. The apiserver may have more information (it really does sound like it is related to the admission controller).

desaintmartin on 15 Nov 2019

@desaintmartin This looks like it is happening to you on an upgrade rather than an install, right?

vsliouniaev on 15 Nov 2019

Helm 3.0 is now GA and the chart works with it, so please report whether you can get this to happen there and whether you get any better logs.

vsliouniaev on 15 Nov 2019

I'm on Helm 3 and still get this error on Azure AKS :(

truealex81 on 27 Nov 2019

I tried this with chart v8.2.4: if prometheusOperator.admissionWebhooks=false, then prometheus.tlsProxy.enabled=false too.

Also, as vsliouniaev said, what do --debug and --dry-run say?

waynekhan on 28 Nov 2019

@truealex81 Since Helm 3 is meant to give more information about this, can you please post verbose logs from the install process?

vsliouniaev on 28 Nov 2019

I am having the same issue deploying 8.2.4 on Azure AKS.

Helm version:
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

Helm --debug produces this output:

install.go:148: [debug] Original chart version: ""
install.go:165: [debug] CHART PATH: /root/.cache/helm/repository/prometheus-operator-8.2.4.tgz
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 5 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 120 resource(s)
Error: context canceled

I can reproduce this reliably. If there is a way to get more verbose logs, please let me know and I will post the output here.

sschne on 29 Nov 2019
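
When comparing debug logs like the one above, the useful detail is the last step Helm logged before it gave up. The following is a minimal sketch, not part of the thread: the function name is hypothetical, and it assumes the log format shown above, where steps carry a [debug] tag and the failure starts with "Error:".

```shell
# Sketch: read a captured `helm install --debug` log on stdin and print
# the last [debug] line seen before the first "Error:" line.
last_step_before_error() {
  awk '
    /^Error:/   { print prev; exit }   # stop at the first error line
    /\[debug\]/ { prev = $0 }          # remember the most recent debug step
  '
}
```

A log can be captured with a redirect such as 2>&1 piped into tee, then fed to the function on stdin. For the log above this prints the "creating 120 resource(s)" line, which suggests the cancellation happens while the main batch of resources is being created, after the pre-install hook has finished.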

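@pather87 thanks a lot!

Here is the order of what is meant to happen in the chart:

1. The CRDs are provisioned.
2. There is a pre-install; pre-upgrade job that runs a container to create a secret with the certificates for the admission hooks. This job and its resources are cleaned up on success.
3. All the resources are created.
4. There is a post-install; post-upgrade job that runs a container to patch the created validatingwebhookconfiguration and mutatingwebhookconfiguration with the CA from the certificates created in step 2. This job and its resources are cleaned up on success.

Could you please check whether any failed jobs are still present? From the logs it reads as though there should not be any, because they all succeeded.

Are there any other resources present in the cluster after the Error: context canceled happens?

vsliouniaev on 29 Nov 2019

Same here when installing prometheus-operator: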

helm install prometheus-operator stable/prometheus-operator \
  --namespace=monitoring \
  --values=values.yaml

Error: rpc error: code = Canceled desc = grpc: the client connection is closing

willsilvano on 29 Nov 2019

@vsliouniaev thanks for your answer!

There are no jobs left over after the deployment.
Deployments and services are present in the cluster after the deployment; see the kubectl output:

kubectl get all -lrelease=prometheus-operator

NAME                                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-operator-grafana-59d489899-4b5kd          2/2     Running   0          3m56s
pod/prometheus-operator-operator-8549bcd687-4kb2x        2/2     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-4km6x   1/1     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-7dgn6   1/1     Running   0          3m56s

NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
service/prometheus-operator-alertmanager               ClusterIP   xxx   <none>        9093/TCP           3m57s
service/prometheus-operator-grafana                    ClusterIP   xxx   <none>        80/TCP             3m57s
service/prometheus-operator-operator                   ClusterIP   xxx     <none>        8080/TCP,443/TCP   3m57s
service/prometheus-operator-prometheus                 ClusterIP   xxx   <none>        9090/TCP           3m57s
service/prometheus-operator-prometheus-node-exporter   ClusterIP   xxx    <none>        9100/TCP           3m57s

NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-operator-prometheus-node-exporter   2         2         2       2            2           <none>          3m57s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator-grafana    1/1     1            1           3m57s
deployment.apps/prometheus-operator-operator   1/1     1            1           3m57s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-grafana-59d489899     1         1         1       3m57s
replicaset.apps/prometheus-operator-operator-8549bcd687   1         1         1       3m57s

NAME                                                             READY   AGE
statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     3m44s
statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     3m34s

sschne on 29 Nov 2019

Installing with debug:

client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD alertmanagers.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD podmonitors.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheuses.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheusrules.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD servicemonitors.monitoring.coreos.com is already present. Skipping.
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 0 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 122 resource(s)
Error: context canceled
helm.go:76: [debug] context canceled

After that, I run: kubectl get all -lrelease=prometheus-operator -A

NAMESPACE    NAME                                                     READY   STATUS    RESTARTS   AGE
monitoring   pod/prometheus-operator-grafana-d6676b794-r6cg9          2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-operator-6584f4b5f5-wdkrx        2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-2g4tg   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-798p5   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-pvk5t   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-r9j2r   1/1     Running   0          2m45s

NAMESPACE     NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
kube-system   service/prometheus-operator-coredns                    ClusterIP   None           <none>        9153/TCP           2m46s
kube-system   service/prometheus-operator-kube-controller-manager    ClusterIP   None           <none>        10252/TCP          2m46s
kube-system   service/prometheus-operator-kube-etcd                  ClusterIP   None           <none>        2379/TCP           2m46s
kube-system   service/prometheus-operator-kube-proxy                 ClusterIP   None           <none>        10249/TCP          2m46s
kube-system   service/prometheus-operator-kube-scheduler             ClusterIP   None           <none>        10251/TCP          2m46s
monitoring    service/prometheus-operator-alertmanager               ClusterIP   10.0.238.102   <none>        9093/TCP           2m46s
monitoring    service/prometheus-operator-grafana                    ClusterIP   10.0.16.19     <none>        80/TCP             2m46s
monitoring    service/prometheus-operator-operator                   ClusterIP   10.0.97.114    <none>        8080/TCP,443/TCP   2m45s
monitoring    service/prometheus-operator-prometheus                 ClusterIP   10.0.57.153    <none>        9090/TCP           2m46s
monitoring    service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.0.83.30     <none>        9100/TCP           2m46s

NAMESPACE    NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
monitoring   daemonset.apps/prometheus-operator-prometheus-node-exporter   4         4         4       4            4           <none>          2m46s

NAMESPACE    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
monitoring   deployment.apps/prometheus-operator-grafana    1/1     1            1           2m46s
monitoring   deployment.apps/prometheus-operator-operator   1/1     1            1           2m46s

NAMESPACE    NAME                                                      DESIRED   CURRENT   READY   AGE
monitoring   replicaset.apps/prometheus-operator-grafana-d6676b794     1         1         1       2m46s
monitoring   replicaset.apps/prometheus-operator-operator-6584f4b5f5   1         1         1       2m46s

NAMESPACE    NAME                                                             READY   AGE
monitoring   statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     2m40s
monitoring   statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     2m30s

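willsilvano on 29 Nov 2019

Something else I discovered while trying to work around this: the issue persists if I delete the chart and the CRDs afterwards and install the chart again, but it does not persist if I do not delete the CRDs.

I tried installing the CRDs beforehand and running helm install --skip-crds, but the issue still persists. This is somewhat confusing.

sschne on 29 Nov 2019

The next log lines I would expect after this are about the post-install, post-upgrade hooks, but they do not appear in your case. I'm not sure what Helm is waiting on here.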

...
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job
client.go:245: [debug] jobs.batch "prom-op-prometheus-operato-admission-patch" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prom-op-prometheus-operato-admission-patch with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:484: [debug] prom-op-prometheus-operato-admission-patch: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job

vsliouniaev on 29 Nov 2019

Manual CRD creation helps, at least on Azure.
First, create the CRDs from this link: https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
Run "kubectl create -f alertmanager.crd.yaml" and so on for all the files.
Then
helm install prometheus-operator stable/prometheus-operator --namespace monitor --version 8.2.4 --set prometheusOperator.createCustomResource=false

truealex81 2019年11月29日

❤1 👍1

ありがとう@ truealex81 ！これはAzureで機能します。

willsilvano 2019年12月02日

myenv：
k8s1.11.2ヘルム2.13.1ティラー2.13.1
prometheus-operator-5.5APPバージョン0.29はOKです!!!

だが：
prometheus-operator-8APPバージョン0.32hava同じ問題：
「コンテキストがキャンセルされました」または「grpc：クライアント接続が閉じています」!!!

prometheus-operatorの最新バージョンには互換性がないと思いますか？!!!

bierhov 2019年12月05日

👍1

@bierhov失敗した後、名前空間にリソースを投稿できますか？

vsliouniaev 2019年12月05日

Yes!
Running "helm ls" in the shell shows the prometheus-operator release with status "failed", but the namespace where I installed prometheus-operator contains all of the prometheus-operator resources.
However, the Prometheus web UI cannot get any data!

bierhov on Dec 5, 2019

Could you post the resources?

vsliouniaev on Dec 5, 2019

Could you post the resources?

Sorry, I can't reproduce it unless I remove my stable Helm environment and run the install again.

bierhov on Dec 5, 2019

@bierhov Are there any failed jobs left after the install?

vsliouniaev on Dec 5, 2019

@bierhov Are there any failed jobs left after the install?

My k8s version is 1.11.2; Helm and Tiller are both version 2.13.1.
When I install prometheus-operator version 8.x,
"helm ls" shows the status as failed.
But when I install prometheus-operator version 5.x,
"helm ls" shows the status as deployed!
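
The questions asked above (which resources exist in the namespace, and whether any failed Jobs remain) can be answered with a few read-only commands. This is a minimal sketch, assuming a release named `prometheus-operator` in a namespace named `monitoring`; `RELEASE`, `NAMESPACE`, `run`, and `diagnose` are hypothetical names introduced for illustration. `helm history` is shown in its Helm 2 form; with Helm 3 it would also need `--namespace`. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch: gather the information requested in this thread after a failed
# install. RELEASE and NAMESPACE are placeholders; set them to your values.
RELEASE="${RELEASE:-prometheus-operator}"
NAMESPACE="${NAMESPACE:-monitoring}"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

diagnose() {
  run helm ls --all                              # release status (deployed / failed)
  run helm history "$RELEASE"                    # per-revision error description
  run kubectl get all --namespace "$NAMESPACE"   # resources left in the namespace
  run kubectl get jobs --namespace "$NAMESPACE"  # leftover Jobs, if any
}

diagnose
```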

bierhov on Dec 5, 2019

I cannot reproduce this using the following:

Kubernetes version: v1.13.12
Kubectl version: v1.16.2
Helm version: 3.0.1
Prometheus-operator version: 8.3.3

Install the CRDs manually:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/podmonitor.crd.yaml

Configure the operator not to create the CRDs, either with a flag at install time or in values.yaml:

--set prometheusOperator.createCustomResource=false

prometheusOperator:
  createCustomResource: false
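
The flag and the values.yaml entry above set the same chart value. A minimal sketch of the values-file route follows, assuming a hypothetical file name `prom-op-values.yaml` and a hypothetical release name `prom-op`. The helm command uses Helm 3 syntax to match the versions listed above, and it is printed rather than executed so the sketch does not touch a cluster.

```shell
#!/bin/sh
# Sketch: put the override in a values file instead of passing --set.
# The file name and release name are illustrative assumptions.
VALUES_FILE="${VALUES_FILE:-prom-op-values.yaml}"

# Write the nested YAML form of prometheusOperator.createCustomResource=false.
cat > "$VALUES_FILE" <<'EOF'
prometheusOperator:
  createCustomResource: false
EOF

# Helm 3 syntax (release name first). Printed only, not executed.
echo "helm install prom-op stable/prometheus-operator --version 8.3.3 -f $VALUES_FILE"
```

A values file is easier to keep under version control than a growing list of `--set` flags, which may matter once more overrides are added.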

GramozKrasniqi on Dec 12, 2019

@GramozKrasniqi
What happens if you don't create the CRDs manually? That is one of the workarounds for this issue.

vsliouniaev on Dec 12, 2019

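@vsliouniaev If you don't create them, you get the error.
But in the "Additional info" section of the original issue, @rnkhouse stated that he had created the CRDs manually.

GramozKrasniqi on Dec 12, 2019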

We use prometheus-operator in our deployment. In short, I upgraded prom-op from 6.9.3 to 8.3.3 and it always failed with "Error: context canceled".
I always install the CRDs before installing or upgrading prometheus-operator, and I had not changed or updated those CRDs.

I tried to update the CRDs mentioned in 'github.com/helm/charts/tree/master/stable/prometheus-operator' (for example, kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml), but these no longer exist.
After that I tried the ones from here: https:
But it failed again.

I had almost given up, but with these CRDs the Helm deployment succeeded! Yeyyyy
https://github.com/coreos/kube-prometheus/tree/master/manifests/setup

My setup:

Kubernetes version: v1.14.3
Kubectl version: v1.14.2
Helm version: 2.14.3
Prometheus-operator version: 8.3.3

Delete prometheus-operator from k8s!

Then:

kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml   
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml 
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml

helm upgrade -i prom-op                               \
  --version 8.3.3                                     \
  --set prometheusOperator.createCustomResource=false \
  stable/prometheus-operator

That's all!

alfonzso on Dec 18, 2019

Does this mean I have to do a clean install and lose my historical metrics data?

pandvan on Dec 19, 2019

After upgrading AKS k8s to 1.15.5, Helm to 3.0.1, and the prometheus-operator chart to 8.3.3, the problem went away.

truealex81 on Dec 20, 2019

Our workaround is to keep the prometheus-operator image at v0.31.1.

Running AKS v1.14.8 with helm + tiller v2.16.1, with the operator image set to v0.31.1.
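
One way the image pin described above might be expressed is sketched below. It assumes the chart reads the operator image tag from `prometheusOperator.image.tag`, which should be checked against the values.yaml of the chart version in use, and it uses a hypothetical release name `prom-op`. `OPERATOR_TAG` and `pin_command` are names introduced here for illustration. The command is printed, not executed.

```shell
#!/bin/sh
# Sketch: keep the operator image at v0.31.1 while installing or upgrading.
# ASSUMPTION: the chart exposes the tag as prometheusOperator.image.tag.
OPERATOR_TAG="${OPERATOR_TAG:-v0.31.1}"

# Build the helm command line as a single string (hypothetical helper).
pin_command() {
  echo "helm upgrade -i prom-op stable/prometheus-operator" \
       "--set prometheusOperator.image.tag=${OPERATOR_TAG}"
}

pin_command
```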

infa-ddeore on Jan 14, 2020

Creating the CRDs manually helps, at least on Azure.
First, create the CRDs from this link: https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
Run "kubectl create -f alertmanager.crd.yaml" and so on for every file.
Then:
helm install prometheus-operator stable/prometheus-operator --namespace monitor --version 8.2.4 --set prometheusOperator.createCustomResource=false

Works on Azure Kubernetes, thanks.

cocuba on Jan 28, 2020

I was able to work around this issue by following the "Helm fails to create CRDs" section of the readme.md. I'm not sure how the two are related, but it worked.
Step 1: Manually create the CRDs
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml
Step 2:
Wait for the CRDs to be created, which should only take a few seconds.
Step 3:
Install the chart, but disable CRD provisioning by setting prometheusOperator.createCustomResource=false
$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

Thanks, this worked for me on an AKS cluster. I had to change the CRD URLs.

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml --validate=false

helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring --set prometheusOperator.createCustomResource=false
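
Step 2 of the quoted instructions ("wait for the CRDs to be created") can be made explicit rather than relying on a pause between `kubectl apply` and `helm install`. The following is a minimal sketch using `kubectl wait` on the CRD `Established` condition. The CRD names are inferred from the release-0.37 manifest file names above (`monitoring.coreos.com_<plural>.yaml`), and `crd_names`, `run`, `wait_for_crds`, and `DRY_RUN` are hypothetical names introduced here. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch: block until each CRD reports the Established condition before
# running helm install. CRD names are derived from the manifest file names
# in this thread; verify them with "kubectl get crd" on your cluster.
CRD_GROUP="monitoring.coreos.com"

# Print one fully qualified CRD name per line (hypothetical helper).
crd_names() {
  for plural in alertmanagers podmonitors prometheuses prometheusrules servicemonitors thanosrulers; do
    echo "${plural}.${CRD_GROUP}"
  done
}

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop at the first CRD that does not become Established within the timeout.
wait_for_crds() {
  for crd in $(crd_names); do
    run kubectl wait --for condition=established --timeout=60s "crd/${crd}" || return 1
  done
}

wait_for_crds
```

With `DRY_RUN=0`, a non-zero exit from `wait_for_crds` would indicate that at least one CRD was not ready, in which case the chart install should not be attempted yet.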

Superset1986 on Mar 24, 2020

Closing. According to the last three commenters, this appears to have since been resolved. Thanks!

bacongobbler on Oct 14, 2020
