Output of helm version:
version.BuildInfo{Version:"v3.0+unreleased", GitCommit:"180db556aaf45f34516f8ddb9ddac28d71736a3e", GitTreeState:"clean", GoVersion:"go1.13"}
Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:36:28Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3+IKS", GitCommit:"66a72e7aa8fd2dbf64af493f50f943d7f7067916", GitTreeState:"clean", BuildDate:"2019-08-23T08:07:38Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Cloud Provider/Platform (AKS, GKE, Minikube etc.):
IBM Cloud
Helm chart deployment fails with:
➜ charts git:(h2update2) helm install vdc -f ~/etc/cloud-noes.yaml vdc <<<
coalesce.go:155: warning: skipped value for image: Not a table.
Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
(The first warning comes from a Confluent chart; here I discuss the second issue.)
Looking at the error, I can see a similar problem with:
➜ charts git:(h2update2) kubectl api-resources
NAME SHORTNAMES APIGROUP NAMESPACED KIND
bindings true Binding
componentstatuses cs false ComponentStatus
configmaps cm true ConfigMap
endpoints ep true Endpoints
events ev true Event
limitranges limits true LimitRange
namespaces ns false Namespace
nodes no false Node
persistentvolumeclaims pvc true PersistentVolumeClaim
persistentvolumes pv false PersistentVolume
pods po true Pod
podtemplates true PodTemplate
replicationcontrollers rc true ReplicationController
resourcequotas quota true ResourceQuota
secrets true Secret
serviceaccounts sa true ServiceAccount
services svc true Service
mutatingwebhookconfigurations admissionregistration.k8s.io false MutatingWebhookConfiguration
validatingwebhookconfigurations admissionregistration.k8s.io false ValidatingWebhookConfiguration
customresourcedefinitions crd,crds apiextensions.k8s.io false CustomResourceDefinition
apiservices apiregistration.k8s.io false APIService
controllerrevisions apps true ControllerRevision
daemonsets ds apps true DaemonSet
deployments deploy apps true Deployment
replicasets rs apps true ReplicaSet
statefulsets sts apps true StatefulSet
meshpolicies authentication.istio.io false MeshPolicy
policies authentication.istio.io true Policy
tokenreviews authentication.k8s.io false TokenReview
localsubjectaccessreviews authorization.k8s.io true LocalSubjectAccessReview
selfsubjectaccessreviews authorization.k8s.io false SelfSubjectAccessReview
selfsubjectrulesreviews authorization.k8s.io false SelfSubjectRulesReview
subjectaccessreviews authorization.k8s.io false SubjectAccessReview
horizontalpodautoscalers hpa autoscaling true HorizontalPodAutoscaler
metrics autoscaling.internal.knative.dev true Metric
podautoscalers kpa,pa autoscaling.internal.knative.dev true PodAutoscaler
cronjobs cj batch true CronJob
jobs batch true Job
images img caching.internal.knative.dev true Image
certificatesigningrequests csr certificates.k8s.io false CertificateSigningRequest
certificates cert,certs certmanager.k8s.io true Certificate
challenges certmanager.k8s.io true Challenge
clusterissuers certmanager.k8s.io false ClusterIssuer
issuers certmanager.k8s.io true Issuer
orders certmanager.k8s.io true Order
adapters config.istio.io true adapter
attributemanifests config.istio.io true attributemanifest
handlers config.istio.io true handler
httpapispecbindings config.istio.io true HTTPAPISpecBinding
httpapispecs config.istio.io true HTTPAPISpec
instances config.istio.io true instance
quotaspecbindings config.istio.io true QuotaSpecBinding
quotaspecs config.istio.io true QuotaSpec
rules config.istio.io true rule
templates config.istio.io true template
leases coordination.k8s.io true Lease
brokers eventing.knative.dev true Broker
channels chan eventing.knative.dev true Channel
clusterchannelprovisioners ccp eventing.knative.dev false ClusterChannelProvisioner
eventtypes eventing.knative.dev true EventType
subscriptions sub eventing.knative.dev true Subscription
triggers eventing.knative.dev true Trigger
events ev events.k8s.io true Event
daemonsets ds extensions true DaemonSet
deployments deploy extensions true Deployment
ingresses ing extensions true Ingress
networkpolicies netpol extensions true NetworkPolicy
podsecuritypolicies psp extensions false PodSecurityPolicy
replicasets rs extensions true ReplicaSet
channels ch messaging.knative.dev true Channel
choices messaging.knative.dev true Choice
inmemorychannels imc messaging.knative.dev true InMemoryChannel
sequences messaging.knative.dev true Sequence
nodes metrics.k8s.io false NodeMetrics
pods metrics.k8s.io true PodMetrics
certificates kcert networking.internal.knative.dev true Certificate
clusteringresses networking.internal.knative.dev false ClusterIngress
ingresses ing networking.internal.knative.dev true Ingress
serverlessservices sks networking.internal.knative.dev true ServerlessService
destinationrules dr networking.istio.io true DestinationRule
envoyfilters networking.istio.io true EnvoyFilter
gateways gw networking.istio.io true Gateway
serviceentries se networking.istio.io true ServiceEntry
sidecars networking.istio.io true Sidecar
virtualservices vs networking.istio.io true VirtualService
ingresses ing networking.k8s.io true Ingress
networkpolicies netpol networking.k8s.io true NetworkPolicy
poddisruptionbudgets pdb policy true PodDisruptionBudget
podsecuritypolicies psp policy false PodSecurityPolicy
clusterrolebindings rbac.authorization.k8s.io false ClusterRoleBinding
clusterroles rbac.authorization.k8s.io false ClusterRole
rolebindings rbac.authorization.k8s.io true RoleBinding
roles rbac.authorization.k8s.io true Role
authorizationpolicies rbac.istio.io true AuthorizationPolicy
clusterrbacconfigs rbac.istio.io false ClusterRbacConfig
rbacconfigs rbac.istio.io true RbacConfig
servicerolebindings rbac.istio.io true ServiceRoleBinding
serviceroles rbac.istio.io true ServiceRole
priorityclasses pc scheduling.k8s.io false PriorityClass
configurations config,cfg serving.knative.dev true Configuration
revisions rev serving.knative.dev true Revision
routes rt serving.knative.dev true Route
services kservice,ksvc serving.knative.dev true Service
apiserversources sources.eventing.knative.dev true ApiServerSource
awssqssources sources.eventing.knative.dev true AwsSqsSource
containersources sources.eventing.knative.dev true ContainerSource
cronjobsources sources.eventing.knative.dev true CronJobSource
githubsources sources.eventing.knative.dev true GitHubSource
kafkasources sources.eventing.knative.dev true KafkaSource
csidrivers storage.k8s.io false CSIDriver
csinodes storage.k8s.io false CSINode
storageclasses sc storage.k8s.io false StorageClass
volumeattachments storage.k8s.io false VolumeAttachment
clustertasks tekton.dev false ClusterTask
pipelineresources tekton.dev true PipelineResource
pipelineruns pr,prs tekton.dev true PipelineRun
pipelines tekton.dev true Pipeline
taskruns tr,trs tekton.dev true TaskRun
tasks tekton.dev true Task
error: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Then, looking at action.go in the source, I can see that if this API call fails, we exit getCapabilities(). I understand why... but is this failure too 'hard'? In the case above, the error came from a minor service.
This seems to have come up recently due to some changes to the metrics service on the Kubernetes side.
I will pursue that separately... but I was after thoughts on how Helm handles this situation.
Also, a heads-up that Helm 3 may be broken on IKS - but I'm not knowledgeable enough to dig much further.
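For anyone else triaging this: you can confirm that the failing group is the aggregated API itself rather than Helm by querying it directly (a minimal check, assuming kubectl access to the same cluster; the group name is taken from the error above):
# ask the apiserver for the group Helm choked on
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
# and inspect the APIService object that fronts it
kubectl get apiservice v1beta1.custom.metrics.k8s.io
If the backing service is down, the first command returns the same "the server is currently unable to handle the request" error.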
I have the same issue on AKS, though the error message is
Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
My config:
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:23:26Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
helm version: alpine/helm:3.0.0-beta.2 (docker)
kubectl api-resources
NAME SHORTNAMES APIGROUP NAMESPACED KIND
bindings true Binding
componentstatuses cs false ComponentStatus
configmaps cm true ConfigMap
endpoints ep true Endpoints
events ev true Event
limitranges limits true LimitRange
namespaces ns false Namespace
nodes no false Node
persistentvolumeclaims pvc true PersistentVolumeClaim
persistentvolumes pv false PersistentVolume
pods po true Pod
podtemplates true PodTemplate
replicationcontrollers rc true ReplicationController
resourcequotas quota true ResourceQuota
secrets true Secret
serviceaccounts sa true ServiceAccount
services svc true Service
mutatingwebhookconfigurations admissionregistration.k8s.io false MutatingWebhookConfiguration
validatingwebhookconfigurations admissionregistration.k8s.io false ValidatingWebhookConfiguration
customresourcedefinitions crd,crds apiextensions.k8s.io false CustomResourceDefinition
apiservices apiregistration.k8s.io false APIService
controllerrevisions apps true ControllerRevision
daemonsets ds apps true DaemonSet
deployments deploy apps true Deployment
replicasets rs apps true ReplicaSet
statefulsets sts apps true StatefulSet
tokenreviews authentication.k8s.io false TokenReview
localsubjectaccessreviews authorization.k8s.io true LocalSubjectAccessReview
selfsubjectaccessreviews authorization.k8s.io false SelfSubjectAccessReview
selfsubjectrulesreviews authorization.k8s.io false SelfSubjectRulesReview
subjectaccessreviews authorization.k8s.io false SubjectAccessReview
horizontalpodautoscalers hpa autoscaling true HorizontalPodAutoscaler
cronjobs cj batch true CronJob
jobs batch true Job
certificatesigningrequests csr certificates.k8s.io false CertificateSigningRequest
leases coordination.k8s.io true Lease
events ev events.k8s.io true Event
daemonsets ds extensions true DaemonSet
deployments deploy extensions true Deployment
ingresses ing extensions true Ingress
networkpolicies netpol extensions true NetworkPolicy
podsecuritypolicies psp extensions false PodSecurityPolicy
replicasets rs extensions true ReplicaSet
ingresses ing networking.k8s.io true Ingress
networkpolicies netpol networking.k8s.io true NetworkPolicy
runtimeclasses node.k8s.io false RuntimeClass
poddisruptionbudgets pdb policy true PodDisruptionBudget
podsecuritypolicies psp policy false PodSecurityPolicy
clusterrolebindings rbac.authorization.k8s.io false ClusterRoleBinding
clusterroles rbac.authorization.k8s.io false ClusterRole
rolebindings rbac.authorization.k8s.io true RoleBinding
roles rbac.authorization.k8s.io true Role
priorityclasses pc scheduling.k8s.io false PriorityClass
csidrivers storage.k8s.io false CSIDriver
csinodes storage.k8s.io false CSINode
storageclasses sc storage.k8s.io false StorageClass
volumeattachments storage.k8s.io false VolumeAttachment
error: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I believe this issue started recently in my case... it seems to be related to having Knative installed (on IBM Cloud IKS this is a managed option). I've uninstalled Knative and am OK for now, but there could be an interop issue here.
@kalioz out of interest, are you using Knative on AKS? Actually it looks like you're not, since I can't see the Tekton objects.
I have just seen this issue myself. In my case it was cert-manager that triggered the problem. Still working on how to get it back to how it was.
@planetf1 I'm not using Knative (or at least I don't think I am), but the problem only exists on the new cluster I deployed for this test.
The differences between the working cluster and the non-working one are:
| |working|not-working|
|---|---|---|
|kube version|1.13.5|1.14.6|
|azure AD authentication|disabled|enabled|
|RBAC|disabled|enabled|
So there are some major differences.
To me the problem is that Helm 3 crashes because it lacks access to some APIs that are not even used by the chart I'm trying to deploy.
I am using it on a k8s cluster version 1.13.9; the same error comes up when deploying any stable chart.
helm version
version.BuildInfo{Version:"v3.0.0-beta.3", GitCommit:"5cb923eecbe80d1ad76399aee234717c11931d9a", GitTreeState:"clean", GoVersion:"go1.12.9"}
helm.go:81: [debug] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request.
After resolving the issue with the metrics pod (I can't remember exactly how I solved it; I think it had to do with hostNetwork, or simply restarting the associated pod), Helm 3 functions as expected.
So it might be a 'feature', as it forces you to keep the cluster in good health, but it'll require someone to go into the cluster manually each time an API breaks (and thus might prevent using Helm 3 to deploy the very pods that back those APIs).
It's really, really annoying as someone starting out with Kubernetes. I'm hand-rolling a solution for certificates using ACME, since I can't guarantee that cert-manager won't still be broken even after configuring it.
The really annoying part is that I can't just use Helm to uninstall cert-manager and get back to where I was! Anything which allows a strongly recommended service to break it, and which won't undo the change, is broken.
For anyone who hits this, it's caused by api-services that no longer have backends running...
In my case it was KEDA, but there are a number of different services that install aggregated API servers.
To fix it:
kubectl get apiservice
Look for ones where the AVAILABLE is False
If you don't need those APIs any more, delete them:
kubectl delete apiservce <service-name>
Then Helm should work properly. I think improving the Helm error message for this case may be worthwhile...
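To spot the broken ones quickly, something like this should work (a sketch; it assumes the default kubectl table output, where AVAILABLE is the third column):
# print the name of every APIService whose AVAILABLE column reports False
kubectl get apiservice | awk 'NR>1 && $3 ~ /False/ {print $1}'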
Thanks for the explanation - is there a way Helm could code around this too?
We think so, though we're still investigating. My first look suggests that this is just related to our usage of the Discovery API, which is used for the Capabilities object in template rendering. We might be able to trap this particular error and warn the user instead of failing.
Same with 2.15.0 now:
Error: Could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
This is pretty annoying. Warning instead of failing would be much better indeed.
Any updates on this so far?
EDIT: can someone confirm 2.15 is also affected? If so, I would suggest adjusting the labels of this ticket.
@sjentzsch I am also seeing the same using Helm 2.15.0 and k8s 1.16.0.
If this does also affect 2.x then everyone using "cert-manager" (possibly only pre-configuration) is going to have a bad time.
__Here we have two different cases with the same behavior on Helm's side. Both 2.15.1 and 3 beta versions are affected.__
As @technosophos mentioned, Helm uses the discovery API and fails if any API group fails to respond: https://github.com/helm/helm/blob/f1dc84773f9a34fe59a504fdb32428ce1d56a2e8/pkg/action/action.go#L105-L118
admission.certmanager.k8s.io/v1beta1 is a good example:
kubectl get apiservice | grep certmanager
v1beta1.admission.certmanager.k8s.io service/cert-manager-webhook False (ServiceNotFound) 111d
For this case you can easily fix it with kubectl delete apiservice v1beta1.admission.certmanager.k8s.io, as @brendandburns described.
The second case is metrics: currently the service is alive and running, but it happened to be down during Helm's request:
⇒ k get apiservice | grep metrics
v1beta1.metrics.k8s.io kube-system/metrics-server True 1y
I'm sure that Helm must be more robust against this type of issue:
1) maybe it's a good idea to convert the error to a warning (I don't know how the info from the API service is used during template rendering)
2) implement retries for this type of request (a client-side stopgap is sketched below)
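Until something like that lands in Helm, a crude client-side retry is possible (a sketch only, not Helm functionality; the release name and chart path are placeholders):
# retry the install a few times to ride out transient discovery failures
for i in 1 2 3 4 5; do
  helm install my-release ./my-chart && break
  echo "install failed, retrying in 10s (attempt $i/5)" >&2
  sleep 10
done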
We have a similar issue with 2.15.1 on Kubernetes 1.15.5, but NOT with Helm 2.14.3.
The issue is intermittent: some charts install OK, but then they begin to fail.
Our message is:
Error: Could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request: exit status 1
kubectl get apiservice lists metrics.k8s.io/v1beta1 as available. Maybe we have a transient issue with this service, but Helm 2.14.3 on a mostly identical cluster works reliably.
We hit this issue when trying to upgrade to Helm 2.15.2 on the charts CI cluster. So, it's not only a Helm 3 issue. Deleting the missing API service fixed it. I wonder if Helm could be more graceful here, especially since this could probably pop up again any time.
Hit a similar problem installing the stable/metrics-server chart on a kubeadm-installed cluster.
When you attempt to uninstall the chart, the uninstall fails with an api-server error (because the metrics server is fubar), and that leaves a load of dangling resources lying around that you have to clean up by hand, since Helm has removed the release from its database anyway.
$ helm version
version.BuildInfo{Version:"v3.0.0-rc.2", GitCommit:"82ea5aa774661cc6557cb57293571d06f94aff0c", GitTreeState:"clean", GoVersion:"go1.13.3"}
Started hitting this recently in freshly created GKE clusters, using 2.15.1 (might have upgraded recently via Snap). Also reported as https://github.com/kubernetes/kubernetes/issues/72051#issuecomment-521157642. I seem to be able to work around it by preceding every helm install command with:
kubectl --namespace=kube-system wait --for=condition=Available --timeout=5m apiservices/v1beta1.metrics.k8s.io
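If more than one aggregated API can lag behind, the same idea can be broadened (an untested variant; APIService objects are cluster-scoped, so the namespace flag shouldn't be needed, and kubectl wait accepts --all):
kubectl wait --for=condition=Available --timeout=5m apiservice --all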
@jglick In your case is it happening only when the cluster is first created?
The problem is deep down in the Kubernetes Go discovery client. I am experimenting with just printing a warning. However, that could have negative consequences for charts that heavily rely on the Capabilities object.
In your case is it happening only when the cluster is first created?
Yes. I have a script which creates a cluster, installs Tiller, and creates Helm releases. So it seems like a race condition in cluster initialization.
@jglick the implementation I did yesterday will very likely avoid the problem for you unless you are writing charts that directly reference the offending API group.
@technosophos thanks for that merge. I do think it will improve the resilience of helm.
Do we have a fix for 2.15/2.16?
Seeing this in 2.16 as well. GKE Master version 1.14.8-gke.12.
Error: UPGRADE FAILED: Could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
No fix has been made available for 2.16. If you feel like porting the fix over from Helm 3, that would be a welcome change.
For GKE users, Google is having issues with heapster and metrics-server. This is what is causing the Helm failures, and explains why it works sometimes and not others.
Event Start: 10/30/19
Affected Products:
Cloud Services
Description:
The issue with Google Kubernetes Engine experiencing an elevated rate of errors for heapster autoscaling is in the process of being mitigated and our Engineering Team is working to deploy new versions with a fix.
Once the fixed versions become available affected customers will be able to upgrade their clusters to receive the fix.
We will provide an update on the status of the fix by Wednesday, 2019-11-13 16:30 US/Pacific with current details. In the interim, if you have questions or are impacted, please open a case with the Support Team and we will work with you until this issue is resolved.
Steps to Reproduce:
Heapster deployment may be crashing due to inaccurate resource values and then fail to resize due to an invalid name reference in the heapster-nanny container. The logs for affected clusters will show errors like the one below in the heapster-nanny container logs:
ERROR: logging before flag.Parse: E1030 14:50:59.147245 1 nanny_lib.go:110] deployments.extensions "heapster-v1.7.X" not found
Workaround:
Manually add requests/limits to the heapster container under the heapster deployment:
kubectl -n kube-system edit deployment heapster
These values can be calculated as follows (a scripted version appears after the list):
* cpu: 80m + 0.5m * number of nodes
* memory: 140Mi + 4Mi * number of nodes
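A non-interactive version of the same workaround (a sketch; it assumes the container inside the deployment is named heapster, and it rounds the cpu value down to whole millicores):
# derive the suggested requests from the current node count
NODES=$(kubectl get nodes --no-headers | wc -l)
CPU="$((80 + NODES / 2))m"      # 80m + 0.5m per node
MEM="$((140 + 4 * NODES))Mi"    # 140Mi + 4Mi per node
kubectl -n kube-system set resources deployment heapster \
  --containers=heapster --requests=cpu=$CPU,memory=$MEM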
I just used Helm 3.0.0 stable and ran into the issue:
Error: Could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: admission.certmanager.k8s.io/v1beta1: the server is currently unable to handle the request: exit status 1
The apiservice seemed to be healthy, because the Availability was showing "true" in kubectl get apiservices | grep certmanager.
After "restarting" it with kubectl delete apiservice v1beta1.admission.certmanager.k8s.io the problem went away.
The fix was merged into the master branch, but it wasn't merged into 3.0.0. The patch will be in 3.1.
Look for ones where the AVAILABLE is False
If you don't need those APIs any more, delete them:
kubectl delete apiservce
$ kubectl get apiservice
NAME SERVICE AVAILABLE AGE
v1. Local True 2d20h
v1.apps Local True 2d20h
v1.authentication.k8s.io Local True 2d20h
v1.authorization.k8s.io Local True 2d20h
v1.autoscaling Local True 2d20h
v1.batch Local True 2d20h
v1.coordination.k8s.io Local True 2d20h
v1.networking.k8s.io Local True 2d20h
v1.rbac.authorization.k8s.io Local True 2d20h
v1.scheduling.k8s.io Local True 2d20h
v1.storage.k8s.io Local True 2d20h
v1alpha3.compose.docker.com docker/compose-api False (ServiceNotFound) 2d19h
v1beta1.admissionregistration.k8s.io Local True 2d20h
v1beta1.apiextensions.k8s.io Local True 2d20h
v1beta1.apps Local True 2d20h
v1beta1.authentication.k8s.io Local True 2d20h
v1beta1.authorization.k8s.io Local True 2d20h
v1beta1.batch Local True 2d20h
v1beta1.certificates.k8s.io Local True 2d20h
v1beta1.compose.docker.com docker/compose-api False (ServiceNotFound) 2d19h
v1beta1.coordination.k8s.io Local True 2d20h
v1beta1.events.k8s.io Local True 2d20h
v1beta1.extensions Local True 2d20h
v1beta1.networking.k8s.io Local True 2d20h
v1beta1.node.k8s.io Local True 2d20h
v1beta1.policy Local True 2d20h
v1beta1.rbac.authorization.k8s.io Local True 2d20h
v1beta1.scheduling.k8s.io Local True 2d20h
v1beta1.storage.k8s.io Local True 2d20h
v1beta2.apps Local True 2d20h
v1beta2.compose.docker.com docker/compose-api False (ServiceNotFound) 2d19h
v2beta1.autoscaling Local True 2d20h
v2beta2.autoscaling Local True 2d20h
$ kubectl delete apiservce v1beta2.compose.docker.com
error: the server doesn't have a resource type "apiservce"
Windows 10, Docker for Windows.
I'm guessing there was a typo in the instructions. It should probably be kubectl delete apiservice (missing 'i' in 'service').
We are also hit by inconsistent behaviour on delete. For example, the only part of the uninstallation that completed was removing the release secret.
The PR that fixed this was here: https://github.com/helm/helm/pull/6908 See the additional discussion around whether there still remains one additional case.
@here can we backport this fix to v2?
If anyone is available to test the Helm 2 fix, it's here: #7196
@bacongobbler as per your comment here https://github.com/helm/helm/issues/6361#issuecomment-554480815 do you know when v3.1 will be available? I just installed 3.0.1 and am still hitting the issue - I was surprised this fix didn't make it into v3.0.1 as it seems to be a pretty pervasive issue. Any chance of it making it into a v3.0.x release if that will be before v3.1?
Same question as @mcginne. I've been using the master branch for a bit now, waiting for this fix to get into a release, but I'd like to get back to being on a release. This bug makes writing automation with helm pretty difficult (unless you just want to try your luck and put sleeps and waiters everywhere).
Even just a 3.1-alpha or something would be nice :)
Closing this issue, as it is resolved on master.
One more case:
Error: failed to fetch api groups from kubernetes: unable to retrieve the complete list of server APIs: tap.linkerd.io/v1alpha1: the server is currently unable to handle the request
It was related to https://github.com/linkerd/linkerd2/issues/3497, when the Linkerd service had some internal problems and couldn't respond to the API service requests. It was fixed by restarting its pods (see the command below).
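For anyone hitting the same thing, on kubectl 1.15+ restarting everything in the affected namespace is one command (assuming the components live in the linkerd namespace; adjust to match your install):
kubectl -n linkerd rollout restart deploy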
@kivagant-ba would you mind opening a new issue for that one? It's a slightly different case, and we'll have to decide what the "correct" behavior on Helm's side should be. I think the current fix will still consider the above a fatal error.
For anyone who hits this, it's caused by api-services that no longer have backends running...
In my case it was KEDA, but there are a number of different services that install aggregated API servers.
To fix it:
kubectl get apiservice
Look for ones where the AVAILABLE is False
If you don't need those APIs any more, delete them:
kubectl delete apiservice <service-name>
Then Helm should work properly. I think improving the Helm error message for this case may be worthwhile...
Just a small correction to the spelling of "service" in the quoted instructions; I've corrected it above.
would you mind opening a new issue for that one?
It is not an issue for people who are using a newer version of Linkerd. I left my comment here for those who search for the error phrase, because it looks similar but the root cause is different.
Oh! Okay. Thank you!
@technosophos what's the fix for this? Should we grep kubectl get apiservice and then block until all services are in a Ready state? Is there something else we could do instead?
We're working on an OSS tool which installs a number of helm charts to bootstrap a system and this problem appears to be causing the whole process to fail intermittently.
I've just faced this issue doing helm delete. It caused a very bad effect: the Helm release got removed, but all the K8s objects were left running in the cluster, so we had to remove everything by hand. And as it was an operator, this required significant effort.
@andrewnazarov Please provide more information on what you attempted to delete and what happened. Error messages would be helpful, as would Helm version, Kube version, etc.
@alexellis What, exactly, is causing a problem? Are you installing a Helm chart that installs an API Service and are wondering how to wait until it is available? The short answer is that you will definitely need to devise a strategy to wait, or possibly break it into two charts. Kubernetes doesn't give us much tooling to be able to deal with errors on the discovery API, but if a service description isn't backed by a service, a discovery call will definitely fail _and not return the service_ in the map it returns.
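For anyone scripting around this in the meantime, one crude way to wait is to poll until nothing reports False (a sketch only; it assumes the default kubectl table output):
# block until every APIService reports Available
while kubectl get apiservice | awk 'NR>1 && $3 ~ /False/' | grep -q .; do
  echo "waiting for aggregated APIs to become Available..." >&2
  sleep 5
done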
Please provide more information on what you attempted to delete and what happened. Error messages would be helpful, as would Helm version, Kube version, etc.
Sure.
Helm: 3.0.0
K8s: 1.14.8
helm delete prom -n monitoring
ended with the following error
Error: uninstallation completed with 1 error(s): could not get apiVersions from Kubernetes: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: admission.stash.appscode.com/v1alpha1: the server is currently unable to handle the request, admission.stash.appscode.com/v1beta1: the server is currently unable to handle the request, repositories.stash.appscode.com/v1alpha1: the server is currently unable to handle the request
After that, the release disappeared from the list of Helm releases and all objects related to that Prometheus operator became orphaned.
OK, I see, it might be a version issue. I will upgrade Helm to the most recent version, 3.0.2, ASAP.
Yes, this is most definitely a version mismatch issue. This patch was made available in 3.0.2. In the future, please make sure to test with the latest patch release (or, better yet, on master). Thanks!
If you are experiencing further issues, please open a new ticket.
kubectl get apiservice
If one of the services shows AVAILABLE=false, you can try deleting the related pods to restart them.
This solved my problem with the kube-system metrics service.
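For the metrics server specifically, that restart can look like this (assuming the upstream label k8s-app=metrics-server; adjust the selector to match your install):
kubectl -n kube-system delete pod -l k8s-app=metrics-server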
Hi @technosophos. Maybe I'm missing something, but I don't see PR https://github.com/helm/helm/pull/6908/files being ported to 2.16.3, although the issue happens with Helm 2 as well. Are you planning to port this workaround to Helm 2?
It was merged into dev-v2 a few months ago. You can build from that branch and test it out if you want.
Would be great to see this incorporated into Helm 2 and the Terraform provider. I'm able to repro this error every time a cluster is created.
Have you tested the dev-v2 branch? We currently don't have any confirmation (other than our own tests) that the solution there works, though it is in essence the same solution.
I have not; I can give it a try this week. Since I am using this with Terraform, can I build/run the dev-v2 branch and set the repository variable of the helm_release resource to "local" to simulate?
@bacongobbler We faced the same issue with prometheus-adapter, which exposes a custom apiservice. If a release with the custom apiservice failed and any entry in kubectl get apiservice showed AVAILABLE=false, Helm was no longer able to make any new release, even one unrelated to the custom apiservice:
err="Could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request"
Helm 2 with the Terraform provider is broken because of this issue at the moment. I hope you can also provide a fix for it; it looks like this is a common use case.
Can confirm I'm having this issue as well. Hoping for a fix.
Solution:
The steps I followed are:
1) kubectl get apiservices: if the metrics-server service is down with the error CrashLoopBackOff, try to follow step 2; otherwise just try to restart it using kubectl delete apiservice/"service_name". For me it was v1beta1.metrics.k8s.io.
2) kubectl get pods -n kube-system: I found out that pods like metrics-server and kubernetes-dashboard were down because the main coreDNS pod was down.
For me it was:
NAME READY STATUS RESTARTS AGE
pod/coredns-85577b65b-zj2x2 0/1 CrashLoopBackOff 7 13m
3) Use kubectl describe pod/"pod_name" to check the error in the coreDNS pod. If it is down because of /etc/coredns/Corefile:10 - Error during parsing: Unknown directive proxy, then we need to use forward instead of proxy in the yaml file holding the coreDNS config, because CoreDNS version 1.5.x used by the image no longer supports the proxy keyword (a scripted version of this edit follows).
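The edit in step 3 can also be done non-interactively (a sketch; it rewrites the proxy directive in the coredns ConfigMap and restarts the pods, assuming the usual k8s-app=kube-dns label):
# replace the deprecated proxy directive with forward and reapply
kubectl -n kube-system get configmap coredns -o yaml \
  | sed 's/^\([[:space:]]*\)proxy /\1forward /' \
  | kubectl apply -f -
# restart coredns so it picks up the new Corefile
kubectl -n kube-system delete pod -l k8s-app=kube-dns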