Helm: "Error: Transport is closing" message when attempting to install

Created on 27 Jan 2018  ·  63 Comments  ·  Source: helm/helm

I am trying to install spinnaker from the kubeapps hub using:

helm install stable/spinnaker --name spinnaker -f values.yaml

This gave me no output, but I could see the pods being created, and then later:

Error: transport is closing

Using --wait and --timeout didn't help.

Spinnaker seems to have spun up successfully, but since Helm didn't register the installation as complete, the release is stuck in the "PENDING_INSTALL" state, which means I can't update or upgrade it later.

Any ideas what might be happening?
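
For reference, roughly how I'm checking the stuck release with the standard Helm 2 CLI (a sketch; "spinnaker" is the release name from my install above):

helm ls --all spinnaker      # shows the release with STATUS: PENDING_INSTALL
helm status spinnaker        # detailed resource status for the stuck release
# to start over, the stuck release record and its resources can be removed:
# helm delete --purge spinnaker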

question · support ¯\_(ツ)_/¯

Most helpful comment

This is fixed for me by https://github.com/kubernetes/helm/pull/3482. If you would like to try that, do helm init --force-upgrade --tiller-image powerhome/tiller:git-3b22ecd. I'd appreciate feedback, particularly from @huang-jy, @Nowaker and @cmdshepard

All 63 comments

I've been experiencing the same problem. The deployment is "visibly" successful - yet it fails due to "Error: transport is closing" after around three minutes of waiting. This is happening whether I add --wait --timeout 600 or not. Moreover, the release looks just fine:

NAME                        REVISION    UPDATED                     STATUS      CHART               NAMESPACE
review-feature-ld-s9rxem    1           Tue Jan 30 18:55:42 2018    DEPLOYED    lde-nginx-941.0.0   default

In addition, setting the ELB idle timeout to 3600 seconds still led to the install failing within 3.5 minutes:

$ time helm install stable/spinnaker --name spinnaker -f values-personal.yaml --wait --timeout 3600 --debug
[debug] Created tunnel using local port: '37517'

[debug] SERVER: "127.0.0.1:37517"

[debug] Original chart version: ""
[debug] Fetched stable/spinnaker to /home/jjyooi/.helm/cache/archive/spinnaker-0.3.12.tgz

[debug] CHART PATH: /home/user/.helm/cache/archive/spinnaker-0.3.12.tgz

Error: transport is closing

real    3m31.836s
user    0m0.432s
sys 0m0.034s

I am also seeing timeouts right at 210s (3.5 minutes), regardless of the --timeout flag, with no LBs timing out in the middle. Indeed I see a FIN being sent from the client on two open sockets to the kube-apiserver. This happens while waiting on a post-install hook to execute to completion, and doesn't require passing --wait.
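
A sketch of how one might confirm the client-side FIN with tcpdump (API_SERVER_IP is a placeholder for your cluster's API endpoint; adjust the port if your API server isn't on 443):

# capture traffic between the helm client and the kube-apiserver while the install runs
sudo tcpdump -i any -nn "host API_SERVER_IP and port 443" -w helm-install.pcap

# afterwards, look for a FIN sent by the local client at around the 210s mark
tcpdump -nn -r helm-install.pcap "tcp[tcpflags] & tcp-fin != 0"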

Managed to watch the Tiller logs as I did another install. No major errors there, so something is cutting out in between. The only thing I can think of is the cluster itself: Helm appears to respect the timeout, and the ELB is not timing out with its timeout set to 3600 secs, so could the cluster itself be cutting the connection?

[storage] 2018/02/02 19:51:45 getting release history for "spinnaker-blenderfox"
[tiller] 2018/02/02 19:51:46 uninstall: Release not loaded: spinnaker-blenderfox
[tiller] 2018/02/02 19:52:02 preparing install for spinnaker-blenderfox
[storage] 2018/02/02 19:52:02 getting release history for "spinnaker-blenderfox"
[tiller] 2018/02/02 19:52:03 rendering spinnaker chart using values
2018/02/02 19:52:07 info: manifest "spinnaker/charts/jenkins/templates/rbac.yaml" is empty. Skipping.
2018/02/02 19:52:08 info: manifest "spinnaker/charts/minio/templates/minio_statefulset.yaml" is empty. Skipping.
2018/02/02 19:52:08 info: manifest "spinnaker/templates/secrets/gcs.yaml" is empty. Skipping.
2018/02/02 19:52:08 info: manifest "spinnaker/charts/jenkins/templates/config.yaml" is empty. Skipping.
2018/02/02 19:52:08 info: manifest "spinnaker/charts/jenkins/templates/service-account.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/jenkins/templates/jenkins-master-networkpolicy.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/minio/templates/post-install-create-bucket-pod.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/jenkins/templates/jenkins-master-ingress.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/redis/templates/networkpolicy.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/charts/minio/templates/minio_networkpolicy.yaml" is empty. Skipping.
2018/02/02 19:52:11 info: manifest "spinnaker/templates/ingress/deck.yaml" is empty. Skipping.
[tiller] 2018/02/02 19:52:16 performing install for spinnaker-blenderfox
[tiller] 2018/02/02 19:52:16 executing 7 pre-install hooks for spinnaker-blenderfox
[tiller] 2018/02/02 19:52:16 hooks complete for pre-install spinnaker-blenderfox
[storage] 2018/02/02 19:52:16 getting release history for "spinnaker-blenderfox"
[storage] 2018/02/02 19:52:16 creating release "spinnaker-blenderfox.v1"
[kube] 2018/02/02 19:52:23 building resources from manifest
[kube] 2018/02/02 19:52:28 creating 37 resource(s)
[kube] 2018/02/02 19:52:39 beginning wait for 37 resources with timeout of 1h0m0s
[kube] 2018/02/02 19:52:53 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:06 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:19 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:28 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:36 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:53:47 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:02 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:15 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:30 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:36 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/02 19:54:51 Deployment is not ready: default/spinnaker-blenderfox-spi-clouddriver
[tiller] 2018/02/02 19:55:01 executing 7 post-install hooks for spinnaker-blenderfox
[kube] 2018/02/02 19:55:01 building resources from manifest
[kube] 2018/02/02 19:55:01 creating 1 resource(s)
[kube] 2018/02/02 19:55:04 Watching for changes to Job spinnaker-blenderfox-create-bucket with timeout of 1h0m0s
[kube] 2018/02/02 19:55:04 Add/Modify event for spinnaker-blenderfox-create-bucket: ADDED
[kube] 2018/02/02 19:55:04 spinnaker-blenderfox-create-bucket: Jobs active: 1, jobs failed: 0, jobs succeeded: 0

@huang-jy In my case, it looks a lot like the helm CLI is timing out, since it's sending a FIN packet.

One thing I was going to try was with an older version of helm.

I suspect my problem is happening @ https://github.com/grpc/grpc-go/blob/424e3e9894f9206fca433fb4ba66f639be56e325/stream.go#L299-L300. I have this problem with both 2.7.2 and 2.8.0. Will try next with master.

Well, master is a no-go because I'd have to upgrade Tiller in my production kube install.

Helm 2.6.1 was used in a Udemy course and that installed Spinnaker successfully (though I'm not sure which Spinnaker version it picked up), so maybe try that too (I'll do the same).

I used 2.6.0 and it didn't time out on me (Spinnaker 0.3.12 was used). It waited properly. My Spinnaker install didn't succeed (some of the containers were stuck in CrashLoopBackOff).

So I used 2.8.0 and the latest Spinnaker chart. It timed out on me halfway through, but from the Tiller logs, the install still continued.

When I used 2.6.0 and the latest Spinnaker chart, it waited for the resources, although they never seemed to become ready (possibly a Spinnaker issue rather than a Helm one).

Just curious, but has anyone tried looking at the FAQ to see if that resolves it for them? https://github.com/kubernetes/helm/blob/29358ef9cef85c8467434008a42bc07e5a0d2a85/docs/install_faq.md#getting-started

Interesting feedback, @huang-jy.

@bacongobbler I'll have a look. That being said, I don't see the first two lines from the quoted FAQ. All I see is Error: transport is closing. But maybe the "missing" lines come from different sources? What sources?

E1014 02:26:32.885226   16143 portforward.go:329] an error occurred forwarding 37008 -> 44134: error forwarding port 44134 to pod tiller-deploy-2117266891-e4lev_kube-system, uid : unable to do port forwarding: socat not found.
2016/10/14 02:26:32 transport: http2Client.notifyError got notified that the client transport was broken EOF.
Error: transport is closing

@bacongobbler @Nowaker I also don't see those first two lines, and the install works just fine if I don't include my post-install hook (and correspondingly remove the container health checks, which fail until the hook has executed). The install actually completes despite the timeout (and the app is functional), but the deployment is never marked complete, so a subsequent upgrade fails. It's not an outright functional failure, but a genuine premature timeout.

Seconded @benlangfeld. Release ends up as DEPLOYED even though Error: transport is closing hits me after a couple minutes.

@bacongobbler in my case, one of three things happened:

  1. Transport is closing happened on 2.8.0 and 2.8.1 during install, but the install continued to happen behind the scenes even after the error.
  2. Versions prior to 2.8.0 (checked with 2.7 and 2.6) didn't come up with this message and continued to wait
  3. During the install, Tiller gets killed by the cluster (not evicted)

Note that @huang-jy appears to be the only one with a problem in Tiller (eviction), and in only 1/3 of his cases. This issue normally has nothing to do with Tiller, and appears very much to be a client-side timeout. In my case, Tiller has never ceased to operate, and this is purely a client disconnection.

@benlangfeld Yes, it's a client disconnection and, I think, something within the 2.8.0 release. When I use a pre-2.8.0 version on this cluster, I don't get a transport closed error. Sure, Spinnaker doesn't install properly, but that's probably an issue with the chart and not with Helm.

I noticed there was a comment about increasing RAM. I might try sizing up the worker nodes and see if that helps.

Increasing the box to r4.large didn't help, but I noticed in the pod logs that when I thought the pod had been evicted, it in fact hadn't; it was killed by the cluster.

Logs from tiller

[tiller] 2018/02/07 09:19:21 preparing install for spinnaker-blenderfox
[storage] 2018/02/07 09:19:21 getting release history for "spinnaker-blenderfox"
[tiller] 2018/02/07 09:19:21 rendering spinnaker chart using values
2018/02/07 09:19:25 info: manifest "spinnaker/charts/minio/templates/post-install-create-bucket-pod.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/templates/secrets/gcs.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/templates/ingress/deck.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/charts/jenkins/templates/rbac.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/charts/minio/templates/minio_networkpolicy.yaml" is empty. Skipping.
2018/02/07 09:19:25 info: manifest "spinnaker/charts/jenkins/templates/config.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/redis/templates/networkpolicy.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/minio/templates/minio_statefulset.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/jenkins/templates/jenkins-master-ingress.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/jenkins/templates/service-account.yaml" is empty. Skipping.
2018/02/07 09:19:26 info: manifest "spinnaker/charts/jenkins/templates/jenkins-master-networkpolicy.yaml" is empty. Skipping.
[tiller] 2018/02/07 09:19:31 performing install for spinnaker-blenderfox
[tiller] 2018/02/07 09:19:31 executing 7 pre-install hooks for spinnaker-blenderfox
[tiller] 2018/02/07 09:19:31 hooks complete for pre-install spinnaker-blenderfox
[storage] 2018/02/07 09:19:31 getting release history for "spinnaker-blenderfox"
[storage] 2018/02/07 09:19:31 creating release "spinnaker-blenderfox.v1"
[kube] 2018/02/07 09:19:35 building resources from manifest
[kube] 2018/02/07 09:19:39 creating 37 resource(s)
[kube] 2018/02/07 09:19:44 beginning wait for 37 resources with timeout of 5m0s
[kube] 2018/02/07 09:20:06 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:20:35 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:21:06 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:21:21 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:21:40 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:22:01 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:22:31 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver
[kube] 2018/02/07 09:22:54 Deployment is not ready: spinnaker/spinnaker-blenderfox-spi-clouddriver

>>Container crashed out here, pod restarted<<

[main] 2018/02/07 09:23:08 Starting Tiller v2.8.0 (tls=false)
[main] 2018/02/07 09:23:08 GRPC listening on :44134
[main] 2018/02/07 09:23:08 Probes listening on :44135
[main] 2018/02/07 09:23:08 Storage driver is ConfigMap
[main] 2018/02/07 09:23:08 Max history per release is 0

Pod logs

  Normal   Killing                6m                kubelet, ip-10-10-20-112.eu-west-2.compute.internal  Killing container with id docker://tiller:Container failed liveness probe.. Container will be killed and recreated.
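
For anyone wanting to check the same thing, a sketch of the kubectl commands (assuming the default kube-system namespace and the standard app=helm,name=tiller labels shown later in this thread):

# watch the tiller pod for restarts while the install runs
kubectl -n kube-system get pods -l app=helm,name=tiller -w

# inspect events, Restart Count and the container's Last State
kubectl -n kube-system describe pod -l app=helm,name=tiller

# logs from the previous (killed) tiller container instance
TILLER_POD=$(kubectl -n kube-system get pods -l app=helm,name=tiller -o name | head -n1)
kubectl -n kube-system logs -p "$TILLER_POD"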

So, I reproduced this without using hooks, against minikube and with a minimal chart: https://gist.github.com/benlangfeld/005f5d934c074d67a34fe9f881c84e89

While this particular deployment would of course never succeed (because of the impossible healthchecks), I would not expect it to time out in 210s as it does, but rather continue until the timeout at 300 seconds indicated in the Tiller log, as is the primary contention in this ticket.

Having the same issue on 2.8.0. Helm's deployment status is DEPLOYED, yet the Helm client exits with the Error: Transport is closing error. It started happening after I upgraded Tiller & Helm from 2.6.0 to 2.8.0. Any ideas on how to mitigate this? It's quite annoying, especially in a CI environment.

This is the output from Tiller when the error occurs:

[kube] 2018/02/07 20:04:33 Watching for changes to Job staging-stored-value-migration-job with timeout of 10m0s
[kube] 2018/02/07 20:04:34 Add/Modify event for staging-stored-value-migration-job: ADDED
[kube] 2018/02/07 20:04:34 staging-stored-value-migration-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0 

My reproduction has the expected behaviour on v2.7.2 (both client and Tiller), timing out at 300 seconds. The same is true for a v2.7.2 client against a v2.8.0 Tiller server. So the bug is in client code somewhere here: https://github.com/kubernetes/helm/compare/v2.7.2...v2.8.0 . I'll see if I can bisect that tomorrow to identify the problem commit. The most suspicious commit is, of course, https://github.com/kubernetes/helm/commit/838d7808946b554865e0fc897e7db8e8dfd63bde
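
For the record, the rough bisect procedure (a sketch only; the build step assumes a working Helm development environment, and reproduction-chart stands in for the minimal chart from the gist above):

# mark v2.8.0 as bad and v2.7.2 as good, then let git walk the range
git bisect start v2.8.0 v2.7.2

# at each step: build the client and run it against the reproduction chart
make build
./bin/helm install ./reproduction-chart --wait --timeout 300

# mark the outcome and repeat until git reports the first bad commit
git bisect good    # or: git bisect bad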

I downgraded to 2.6.0. No longer have the issue. I have timeout set to 10 mins. Tiller is honoring the timeout but the Helm client does not.
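
For anyone wanting to try the same workaround, a rough sketch of pinning both sides to 2.6.0 (the Tiller image tag follows the gcr.io/kubernetes-helm/tiller:vX.Y.Z naming seen elsewhere in this thread; verify the client download URL against the Helm releases page):

# swap the Tiller image in the cluster using the current client
helm init --force-upgrade --tiller-image gcr.io/kubernetes-helm/tiller:v2.6.0

# replace the local client binary with the matching release
curl -LO https://storage.googleapis.com/kubernetes-helm/helm-v2.6.0-linux-amd64.tar.gz
tar -xzf helm-v2.6.0-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
helm version    # client and server should now both report v2.6.0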

@cmdshepard Tiller honoured the timeout even on 2.8, Helm did not.

@huang-jy Yes. I can confirm this behavior on 2.8.0.

The results of bisecting:

803f7c706ef4ce44aa6418c42c77dbf7e60ac66d is the first bad commit
commit 803f7c706ef4ce44aa6418c42c77dbf7e60ac66d
Author: Helgi Þormar Þorbjörnsson <[email protected]>
Date:   Wed Nov 22 08:30:25 2017 -0800

    add a keepalive of 30s to the client (#3183)

:040000 040000 3b1218a94f23f51ccbbf253676e1457f0c42d663 3d207d79bad28de7dc17f922e44d5faa1e31cf45 M  pkg

Indeed, current master with this commit reverted does not present this issue.

I suspect we have an incompatibility in configuration between helm and tiller; the relevant docs are at https://godoc.org/google.golang.org/grpc/keepalive.

This is fixed for me by https://github.com/kubernetes/helm/pull/3482. If you would like to try that, do helm init --force-upgrade --tiller-image powerhome/tiller:git-3b22ecd. I'd appreciate feedback, particularly from @huang-jy, @Nowaker and @cmdshepard

Awesome, @benlangfeld, thanks for that. I'll check it tomorrow when I'm in the office.

Confirmed, @benlangfeld, this version does not time out. I did get a timeout error, but that's because the container crashed (presumably because I didn't have enough worker nodes).

Let's keep this open until the change is merged.

@benlangfeld I can confirm as well. Helm client successfully waited for the post-install hook to complete. Thanks for your contribution!

this still happens in tiller 2.9.2

Also: this happens immediately after helm says "Release "xxx" has been upgraded. Happy Helming!". It happens without the --wait flag (with the flag it seems OK), and far before the timeout (300s timeout, fails after 100s).

@sheerun did you try with the version above?

More news: Tiller also crashes with the --wait flag. The reason seems to be too many revisions, because it crashes after the following logs. Also, for now I can only try v2.5.1 because I use an old Kubernetes version and Tiller downgrades itself automatically.

[main] 2018/05/15 21:49:53 Starting Tiller v2.5.1 (tls=false)
[main] 2018/05/15 21:49:53 GRPC listening on :44134
[main] 2018/05/15 21:49:53 Probes listening on :44135
[main] 2018/05/15 21:49:53 Storage driver is ConfigMap
[tiller] 2018/05/15 21:50:37 getting history for release xxx
[storage] 2018/05/15 21:50:37 getting release history for "xxx"
[tiller] 2018/05/15 21:50:53 preparing update for xxx
[storage] 2018/05/15 21:50:53 getting last revision of "xxx"
[storage] 2018/05/15 21:50:53 getting release history for "xxx"

I didn't find a solution for this one. I needed to migrate to a new cluster.

this still happens in tiller 2.9.2

2.9.2 doesn't exist... did you mean 2.8.2 or 2.9.1? This should be fixed in 2.8.2 as @huang-jy points out. :)

Yes, 2.9.1, but it turned out Tiller on Azure had automatically downgraded itself to 2.5.1, so I don't know whether it's still an issue on 2.9.1. The bug was most likely due to too many revisions (more than 600).

I saw the Error: transport is closing error too when I ran helm install, and after doing rm -rf ~/.helm the error went away. I guess deleting the Helm cache (rm -rf ~/.helm) may resolve the error.

@vhosakot I believe that wipes helm's state

That just wipes your local state but not tiller's state. Not sure how that fixes the issue but if it works, that's great
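
If you do go the rm -rf ~/.helm route, a minimal sketch of rebuilding the local client state afterwards without touching Tiller (the repo name and URL in the comment are just placeholders for whatever repositories you use):

# recreate ~/.helm (repositories, plugins directory, local cache) client-side only
helm init --client-only

# re-add any custom chart repositories you use, then refresh the index
# helm repo add myrepo https://example.com/charts
helm repo update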

/shrug

In our case, we've noticed that this happens when the container running Tiller goes away while a release is being installed, due to Docker Engine upgrading. Looking for the Tiller containers (e.g. docker ps -a | grep tiller), checking the logs of the container that exited, and correlating the timestamps to events in /var/log/syslog helped us figure out our problem. YMMV ¯\_(ツ)_/¯
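
A sketch of that correlation, in case it helps someone (the container ID is a placeholder, and the syslog path depends on your distribution):

# find the exited tiller container and note when and why it stopped
docker ps -a | grep tiller
docker inspect --format '{{.State.FinishedAt}} exit={{.State.ExitCode}}' <container-id>
docker logs <container-id>

# then look for docker-engine / kubelet activity around that timestamp
grep -iE 'docker|kubelet|tiller' /var/log/syslog | less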

This happens when tiller goes away during an install. In our case it was because Azure Container Service decided to downgrade tiller to a previous version... not Helm's fault.

I am still encountering this problem. My environment is on-premise k8s, helm client: 2.8.2, helm server: 2.9.1.
I get the error whenever I have a job in my chart configured as a post-install hook and I try to 'upgrade --install' or 'delete' the chart's release, after exactly 1 minute of the Helm client waiting for install/delete completion.
If I don't configure the job as a post-install hook, the problem disappears.

@uxon123 I had that originally -- it turned out Tiller was getting booted off the node, and I added an extra node to fix that. Now, I know your env is on-prem, but if you have the resources, can you try adding an extra node?

An extra Tiller node? Do you mean an extra Tiller pod in the Tiller deployment, or something else?

No, an extra Kubernetes node.

Hm, I don't know if I fully understand this. You're saying that the cause of your problem (identical to what I'm encountering) was insufficient resources on k8s?
If that were the case, running the job outside of post-install should not, in my opinion, resolve the problem (it needs the same amount of resources, right?). Tiller responds if I try to install or uninstall some other releases in the meantime.
The problem occurs only when I use a post-install job, i.e. when I force the client to wait for an answer. I get the error every time after almost exactly 1 minute of waiting (but Tiller continues doing its job on the cluster).

You can check this if you do a watch kubectl get pods and keep an eye on the tiller pod during the helm install.

If it disappears or crashes out, then it's likely tiller is crashing out, as it did in my case.

OK, I did what you said. I watched the Tiller pod while I executed upgrade --install. The Tiller pod neither disappeared nor crashed. I also checked Tiller's replica set with the describe command and no pods had failed, etc. So I think I can rule out a problem with Tiller. Thanks for your input though :)

You can also try tailing Tiller's logs during the install and see if there are any errors during that time.

I checked it again and there are no problems with Tiller in my case (logs are clean, no pod restarts, etc.).
It also seems that the 1-minute timeout is not a coincidence. I found out that all connections to my k8s cluster are routed through a load balancer which is configured with a 60-second timeout, so connections are being killed by the load balancer after 60 seconds of inactivity.
So it looks like there are no keepalives between the Helm client and Tiller. Shouldn't the client be sending keepalives every 30 seconds? (https://github.com/helm/helm/pull/3183)

Are you running on AWS? If so, you should check your API load balancer's connection timeout. If I remember correctly, it's only 60 seconds by default.
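
A sketch with the classic ELB CLI, in case it helps (my-k8s-api-elb is a placeholder; for an NLB/ALB the idle timeout attribute lives elsewhere):

# check the current idle timeout on the API load balancer
aws elb describe-load-balancer-attributes --load-balancer-name my-k8s-api-elb

# raise it well above the expected install time, e.g. to 3600 seconds
aws elb modify-load-balancer-attributes \
  --load-balancer-name my-k8s-api-elb \
  --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":3600}}'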

Add the --tls flag to the helm install command.

@makandas --tls relates to the connection between helm and tiller, IIRC.

If this error came up without anything being installed, I would agree that switch would be an option. But that is not the case here: Helm can talk to Tiller and initiate the install, but the connection is somehow closed too early. This ticket has already been closed with a solution, which has been verified by several users.

I'm still facing this issue, even on the latest available version, 2.12.3.

I can see that Tiller just crashes with a log of errors:

[tiller] 2019/01/23 16:25:50 preparing install for my-release
[storage] 2019/01/23 16:25:50 getting release history for "my-release"
[tiller] 2019/01/23 16:25:50 rendering my-release chart using values
panic: should not happen [recovered]
    panic: should not happen

goroutine 29 [running]:
k8s.io/helm/vendor/gopkg.in/yaml%2ev2.handleErr(0xc00078af78)
    /go/src/k8s.io/helm/vendor/gopkg.in/yaml.v2/yaml.go:164 +0x9a
....

I'm using GCP (GKE version: 1.11.5-gke.5).

@alanwds same here.

@alanwds from the truncated panic stack trace, I can only see that it stemmed from the yaml parser. What's the output of helm template on that chart? Do you have the full output of that stack trace somewhere?

Same question to you, @AndrewDryga :)
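
If it helps, a sketch of how to pull those two things out (the chart path and values file are placeholders; the label selector matches the Tiller pod described below):

# render the chart locally to check whether the YAML itself parses
helm template ./my-chart -f values.yaml > rendered.yaml

# grab the full stack trace from the previously crashed Tiller container
kubectl -n kube-system logs -p $(kubectl -n kube-system get pods -l app=helm,name=tiller -o name)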

@bacongobbler for us it looks like Tiller is crashing once we deploy; there is no issue with the templates, as they have not changed since the last deployment (which succeeded while Tiller was available).

Name:               tiller-deploy-58b6bf5687-2498x
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               gke-staging-default-pool-4c48bc57-n6kx/10.142.0.7
Start Time:         Mon, 11 Feb 2019 22:01:37 +0200
Labels:             app=helm
                    name=tiller
                    pod-template-hash=1462691243
Annotations:        cni.projectcalico.org/podIP: 10.16.0.51/32
Status:             Running
IP:                 10.16.0.51
Controlled By:      ReplicaSet/tiller-deploy-58b6bf5687
Containers:
  tiller:
    Container ID:   docker://c5d2f01465da5f3b309fcc8b2b39c21015de204d125a944ba79333c514649901
    Image:          gcr.io/kubernetes-helm/tiller:v2.12.3
    Image ID:       docker-pullable://gcr.io/kubernetes-helm/tiller@sha256:cab750b402d24dd7b24756858c31eae6a007cd0ee91ea802b3891e2e940d214d
    Ports:          44134/TCP, 44135/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 12 Feb 2019 20:08:52 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 12 Feb 2019 20:05:28 +0200
      Finished:     Tue, 12 Feb 2019 20:05:54 +0200
    Ready:          False
    Restart Count:  18
    Liveness:       http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
    Environment:
      TILLER_NAMESPACE:    kube-system
      TILLER_HISTORY_MAX:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from tiller-token-r74zd (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  tiller-token-r74zd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tiller-token-r74zd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                                             Message
  ----     ------     ----                  ----                                             -------
  Warning  Unhealthy  10m (x15 over 20h)    kubelet, gke-staging-default-pool-4c48bc57-n6kx  Liveness probe failed: Get http://10.16.0.51:44135/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  10m (x28 over 12h)    kubelet, gke-staging-default-pool-4c48bc57-n6kx  Readiness probe failed: Get http://10.16.0.51:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    6m51s (x14 over 22h)  kubelet, gke-staging-default-pool-4c48bc57-n6kx  Killing container with id docker://tiller:Container failed liveness probe.. Container will be killed and recreated.
  Warning  BackOff    2m4s (x104 over 22h)  kubelet, gke-staging-default-pool-4c48bc57-n6kx  Back-off restarting failed container

logs -p gives an empty return :(

I just had this error and it turned out that I had set too little resources for the Tiller deploy.
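
Roughly what bumping the Tiller resources looks like (a sketch; the request/limit values here are arbitrary examples, not recommendations):

kubectl -n kube-system set resources deployment tiller-deploy \
  --requests=cpu=100m,memory=256Mi \
  --limits=cpu=500m,memory=512Mi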

Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}

helm hangs with the --wait command

@JPWKU, could you please share how much resources you set for the Tiller deploy? Thanks
