I am using v1.8.4 and I am seeing a problem where a deleted namespace stays in the "Terminating" state forever. I already ran "kubectl delete namespace XXXX".
/sig api-machinery
@shean-guangchang Do you have some way to reproduce this?
And out of curiosity, are you using any CRDs? We faced this problem with TPRs previously.
/kind bug
I seem to be experiencing this issue with a rook deployment:
➜ tmp git:(master) ✗ kubectl delete namespace rook
Error from server (Conflict): Operation cannot be fulfilled on namespaces "rook": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system.
➜ tmp git:(master) ✗
I think it does have something to do with their CRD, I see this in the API server logs:
E0314 07:28:18.284942 1 crd_finalizer.go:275] clusters.rook.io failed with: timed out waiting for the condition
E0314 07:28:18.287629 1 crd_finalizer.go:275] clusters.rook.io failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "clusters.rook.io": the object has been modified; please apply your changes to the latest version and try again
I've deployed rook to a different namespace now, but I'm not able to create the cluster CRD:
➜ tmp git:(master) ✗ cat rook/cluster.yaml
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
name: rook
namespace: rook-cluster
spec:
dataDirHostPath: /var/lib/rook-cluster-store
➜ tmp git:(master) ✗ kubectl create -f rook/
Error from server (MethodNotAllowed): error when creating "rook/cluster.yaml": the server does not allow this method on the requested resource (post clusters.rook.io)
Seems like the CRD was never cleaned up:
➜ tmp git:(master) ✗ kubectl get customresourcedefinitions clusters.rook.io -o yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
creationTimestamp: 2018-02-28T06:27:45Z
deletionGracePeriodSeconds: 0
deletionTimestamp: 2018-03-14T07:36:10Z
finalizers:
- customresourcecleanup.apiextensions.k8s.io
generation: 1
name: clusters.rook.io
resourceVersion: "9581429"
selfLink: /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusters.rook.io
uid: 7cd16376-1c50-11e8-b33e-aeba0276a0ce
spec:
group: rook.io
names:
kind: Cluster
listKind: ClusterList
plural: clusters
singular: cluster
scope: Namespaced
version: v1alpha1
status:
acceptedNames:
kind: Cluster
listKind: ClusterList
plural: clusters
singular: cluster
conditions:
- lastTransitionTime: 2018-02-28T06:27:45Z
message: no conflicts found
reason: NoConflicts
status: "True"
type: NamesAccepted
- lastTransitionTime: 2018-02-28T06:27:45Z
message: the initial names have been accepted
reason: InitialNamesAccepted
status: "True"
type: Established
- lastTransitionTime: 2018-03-14T07:18:18Z
message: CustomResource deletion is in progress
reason: InstanceDeletionInProgress
status: "True"
type: Terminating
➜ tmp git:(master) ✗
I have a fission namespace in a similar state:
➜ tmp git:(master) ✗ kubectl delete namespace fission
Error from server (Conflict): Operation cannot be fulfilled on namespaces "fission": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system.
➜ tmp git:(master) ✗ kubectl get pods -n fission
NAME READY STATUS RESTARTS AGE
storagesvc-7c5f67d6bd-72jcf 0/1 Terminating 0 8d
➜ tmp git:(master) ✗ kubectl delete pod/storagesvc-7c5f67d6bd-72jcf --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (NotFound): pods "storagesvc-7c5f67d6bd-72jcf" not found
➜ tmp git:(master) ✗ kubectl describe pod -n fission storagesvc-7c5f67d6bd-72jcf
Name: storagesvc-7c5f67d6bd-72jcf
Namespace: fission
Node: 10.13.37.5/10.13.37.5
Start Time: Tue, 06 Mar 2018 07:03:06 +0000
Labels: pod-template-hash=3719238268
svc=storagesvc
Annotations: <none>
Status: Terminating (expires Wed, 14 Mar 2018 06:41:32 +0000)
Termination Grace Period: 30s
IP: 10.244.2.240
Controlled By: ReplicaSet/storagesvc-7c5f67d6bd
Containers:
storagesvc:
Container ID: docker://3a1350f6e4871b1ced5c0e890e37087fc72ed2bc8410d60f9e9c26d06a40c457
Image: fission/fission-bundle:0.4.1
Image ID: docker-pullable://fission/fission-bundle@sha256:235cbcf2a98627cac9b0d0aae6e4ea4aac7b6e6a59d3d77aaaf812eacf9ef253
Port: <none>
Command:
/fission-bundle
Args:
--storageServicePort
8000
--filePath
/fission
State: Terminated
Exit Code: 0
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/fission from fission-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from fission-svc-token-zmsxx (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
fission-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: fission-storage-pvc
ReadOnly: false
fission-svc-token-zmsxx:
Type: Secret (a volume populated by a Secret)
SecretName: fission-svc-token-zmsxx
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
➜ tmp git:(master) ✗
Fission also uses CRDs, however, they appear to be cleaned up.
@shean-guangchang - I had the same issue. I deleted everything under the namespaces manually, deleted and purged everything from "helm", and restarted the master nodes one by one, which fixed the issue.
I imagine what I encountered has something to do with "ark", "tiller" and Kubernetes all working together (I bootstrapped using helm and backed up using ark), so this may not be a Kubernetes issue per se. On the other hand, it was pretty much impossible to troubleshoot because there are no relevant logs.
if it is the rook one, take a look at this: https://github.com/rook/rook/issues/1488#issuecomment-365058080
I guess that makes sense, but it seems buggy that it's possible to get a namespace into an undeletable state.
I have a similar environment (Ark & Helm) with @barakAtSoluto and have the same issue. Purging and restarting the masters didn't fix it for me though. Still stuck at terminating.
I had that too when trying to recreate the problem. I eventually had to create a new cluster....
Excluding default, kube-system, kube-public and all ark-related namespaces from backup and restore prevents this from happening...
I'm seeing this too, on a cluster upgraded from 1.8.4 to 1.9.6. I don't even know which logs to look at.
The same issue on 1.10.1 :(
Same issue on 1.9.6
Edit: The namespace couldn't be deleted because of some pods hanging. I did a delete with --grace-period=0 --force on them all and after a couple of minutes the namespace was deleted as well.
Hey,
I've run into this over and over again, and most of the time it's some trouble with finalizers.
If a namespace is stuck, try kubectl get namespace XXX -o yaml
and check whether there is a finalizer on it. If so, edit the namespace and remove the finalizer (by passing an empty array), and the namespace then gets deleted.
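A minimal sketch of that check (the namespace name stuck-ns is a placeholder; adjust to yours):

```shell
# Print just the finalizers of the stuck namespace.
kubectl get namespace stuck-ns -o jsonpath='{.spec.finalizers}'

# Open the namespace in an editor; delete the entries under
# spec.finalizers (leaving an empty array) and save.
kubectl edit namespace stuck-ns
```

Note that later comments in this thread explain why the kubernetes finalizer may be there for a reason, so it is worth checking for broken aggregated APIs before removing it.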
@xetys is it safe? In my case there is only one finalizer, named "kubernetes".
That's strange, I've never seen such a finalizer. I can only speak from my own experience: I've done this several times in a production cluster and it's still alive.
Same issue on 1.10.5. I tried all the advice in this issue without result. I was able to get rid of the pods, but the namespace was still hanging.
Actually, the ns got deleted too after a while.
It would be good to understand what causes this behavior; the only finalizer I had was kubernetes. I also have dynamic webhooks, can these be related?
@xetys Well, finally I used your trick on the replicas inside that namespace. They had some custom finalizer that probably no longer existed, so I couldn't delete them. When I removed the references to that finalizer, they disappeared and so did the namespace. Thanks! :)
Same issue on an EKS 1.10.3 cluster:
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-28T20:13:43Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Having the same problem on a bare metal cluster:
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
My namespace looks like so:
apiVersion: v1
kind: Namespace
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"name":"creneaux-app","namespace":""}}
name: creneaux-app
spec:
finalizers:
- kubernetes
It's actually the second namespace I've had have this problem.
Try this to get the actual list of all things in your namespace: https://github.com/kubernetes/kubectl/issues/151#issuecomment-402003022
Then for each object do kubectl delete or kubectl edit to remove finalizers.
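The command from the linked comment is roughly the following (the namespace name stuck-ns is a placeholder):

```shell
# Enumerate every namespaced resource type the API server knows about
# and list instances of each in the stuck namespace.
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n stuck-ns
```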
removing the initializer did the trick for me...
When I do kubectl edit namespace annoying-namespace-to-delete and remove the finalizers, they get re-added when I check with kubectl get -o yaml.
Also, when trying what you suggested @adampl I get no output (removing --ignore-not-found confirms no resources are found in the namespace, of any type).
@ManifoldFR, I had the same issue as yours and I managed to make it work by making an API call with a json file.
kubectl get namespace annoying-namespace-to-delete -o json > tmp.json
then edit tmp.json and remove "kubernetes"
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://kubernetes-cluster-ip/api/v1/namespaces/annoying-namespace-to-delete/finalize
and it should delete your namespace,
@slassh It worked! Should've thought about making an API call: thanks a lot! We shall sing your praises forever
Issue exists in v1.11.1. I had a stuck rancher/helm deployment of dokuwiki. I first had to force delete the pods as suggested by @siXor and then followed @slassh advice. All good now.
@slassh how do I find the kubernetes-cluster-ip? I used the IP of one of the nodes where the cluster is deployed, and it reports 404.
hi @jiuchongxiao, by kubernetes-cluster-ip I meant the IP of one of your master nodes.
Sorry if it's confusing!
If you start 'kubectl proxy' first you can direct the curl to http://127.0.0.1:8001/api/v1/namespaces/annoying-namespace-to-delete/finalize. I couldn't get authentication to work until I did it that way.
good tips @2stacks. Just need to replace https with http.
I still see this issue in 1.11.2.
To give more context for reproducing: I saw this only with CRDs. By deleting the CRD object, I got into a weird state where the objects owned by it were not deleted. I didn't notice, so I issued a delete for the namespace. Then I deleted all the objects in the namespace with kubectl delete all --all -n my-namespace. At that point the namespace deletion got stuck. I hope this helps in some way.
By looking at the logs I just found out that this particular problem was related to the controller manager being unhealthy. Most likely not a bug in my case. When the controller manager came up again, everything was cleaned up correctly.
@slassh Perfect solution! thank you very much!!!!
I also see this issue with 1.10.x. I find @slassh's comment a workaround that only hides the real issue. Why are the namespaces stuck at Terminating?
We discovered the reason for namespace deletion being stuck in our case (@palmerabollo).
When a namespace has the finalizer kubernetes, it means there is an internal problem with the API server.
Run kubectl api-resources; if it returns an error like the following, it means a custom API isn't reachable:
error: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Run kubectl get apiservices v1beta1.metrics.k8s.io -o yaml to check its status conditions.
status:
conditions:
- lastTransitionTime: 2018-10-15T08:14:15Z
message: endpoints for service/metrics-server in "kube-system" have no addresses
reason: MissingEndpoints
status: "False"
type: Available
The above error is likely caused by a CrashLoopBackOff affecting metrics-server. It would be similar for other custom APIs registered in Kubernetes.
Check the health of your services in kube-system to restore cluster operations like deleting namespaces.
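One quick way to spot such a broken aggregated API (a sketch, assuming a standard kubectl):

```shell
# Any APIService whose AVAILABLE column is False can block namespace
# deletion cluster-wide; the MESSAGE column hints at the cause.
# grep -v True keeps the header row plus any unavailable services.
kubectl get apiservices | grep -v True
```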
I'm facing this issue on v1.11.3. As for finalizers, only kubernetes is present on the problematic namespace.
spec:
finalizers:
- kubernetes
@slassh Thanks a million, your solution works well!
I have the same problem in my cluster with ark, tiller and kubed. I suspect the issue might be the kubed API giving an error, although I'm not sure why that impacts the deletion of another namespace.
@javierprovecho I was merely playing around with the metrics server, and since it wasn't working I tried to delete the service and whatnot, but my namespace still won't delete. The error is:
status:
conditions:
- lastTransitionTime: 2018-08-24T08:59:36Z
message: service/metrics-server in "kube-system" is not present
reason: ServiceNotFound
status: "False"
type: Available
Do you know how to recover from this state?
edit: I found out... I had to delete _everything_ even remotely related to metrics/HPA and then restart the entire control plane (had to take down all my replicas of it before booting them back up). This included deleting the apiservice v1beta1.metrics.k8s.io itself.
@2rs2ts
$ kubectl delete apiservice v1beta1.metrics.k8s.io
By getting rid of the non-functioning metrics API service, the controller-manager will be able to delete the stale namespace(s). Restarting the control plane is not necessary.
@antoineco no, it was necessary; I deleted the apiservice and waited quite a while but the namespace would not be deleted until I restarted the control plane.
Hey, the finalizer kubernetes is there for a reason. For us it was a wrongly configured metrics API service name. Maybe for you it's something else, which you can discover by looking at your control plane logs. Without confirmation of a bug, removing the finalizer may produce undesired consequences, like leaving behind objects that can no longer be accessed for deletion.
as this issue is still open:
within my minikube cluster running with the "none" driver, this happened after the host woke up from hibernation.
my assumption: in my case the hibernation triggered the same problems that an enabled swap would.
which yields the question: might swap be enabled in the affected clusters?
but this is just conjecture. the important thing for me, and anyone landing in this bug with my local setup: hibernation is bad for Kubernetes.
first, take a small coffee and relax, now go to your k8s master nodes

- kubectl cluster-info

Kubernetes master is running at https://localhost:6443
KubeDNS is running at https://localhost:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

- now run the kube-proxy

kubectl proxy &
Starting to serve on 127.0.0.1:8001

save the PID to kill it later on :)

- find your namespace that decided not to be deleted :) for us it will be cattle-system

kubectl get ns
cattle-system Terminating 1d

put it in a file

- kubectl get namespace cattle-system -o json > tmp.json
- edit the file and remove the finalizers

},
"spec": {
"finalizers": [
"kubernetes"
]
},

after editing it should look like this 👍

},
"spec": {
"finalizers": [
]
},

we're almost there 👍

curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/${NAMESPACE}/finalize

and it's gone 👍
Great!!
Works
I run into this issue periodically if we change our gcloud instances (e.g. upgrading nodes). This replaces the old node from gcloud instances list with a new one but leaves the pods in the k8s namespace hanging:
Reason: NodeLost
Message: Node <old node> which was running pod <pod> is unresponsive
This then leaves the pods in an unknown state:
$ kubectl get po
NAME READY STATUS RESTARTS AGE
<pod> 2/2 Unknown 0 39d
Due to this, the namespace will never finish terminating. Not sure if this means we should change our finalizers, or if there's an actual bug related to terminating that should handle pods in an UNKNOWN state (or if there should be a way of force-terminating a namespace for cases like this).
Cool, that's not the same issue. You need to put the nodes in maintenance mode first; once a node is in maintenance mode all its pods are evacuated, and then you can delete/upgrade it.
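A sketch of that maintenance flow with kubectl (the node name is a placeholder; flags as of the kubectl versions discussed in this thread):

```shell
# Mark the node unschedulable, then evict its pods before replacing it.
kubectl cordon old-node-1
kubectl drain old-node-1 --ignore-daemonsets --delete-local-data
# ...delete/upgrade the instance and join the replacement...
kubectl uncordon old-node-1
```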
Take a look at https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/. Edit the resource and delete metadata.finalizers, and delete any unused CRDs; then you can force-delete it.
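For a single stuck object, clearing metadata.finalizers can be done with a patch instead of an interactive edit; a sketch using the clusters.rook.io CRD from earlier in the thread:

```shell
# Clears all finalizers on the CRD so its deletion can complete.
# Only do this once you're sure the controller that owns the
# finalizer is gone for good; otherwise cleanup may be skipped.
kubectl patch crd clusters.rook.io \
  --type=merge -p '{"metadata":{"finalizers":[]}}'
```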
But what does the kubernetes finalizer do exactly? Is there any risk that resources are not correctly cleaned up with this hack?
For rook stuck terminating, this helped: https://github.com/rook/rook/blob/master/Documentation/ceph-teardown.md
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://kubernetes-cluster-ip/api/v1/namespaces/annoying-namespace-to-delete/finalize
Error from server (NotFound): namespaces "annoying-namespace-to-delete" not found
I followed the walkthrough above, but the edit fails validation:
Invalid value: "The edited file failed validation": ValidationError(Namespace.spec): invalid type for io.k8s.api.core.v1.NamespaceSpec: got "string", expected "map"
If you have many namespaces stuck in Terminating, you can automate this:
kubectl get ns | grep Terminating | awk '{print $1}' | gxargs -n1 -- bash -c 'kubectl get ns "$0" -o json | jq "del(.spec.finalizers[0])" > "$0.json"; curl -k -H "Content-Type: application/json" -X PUT --data-binary @"$0.json" "http://127.0.0.1:8001/api/v1/namespaces/$0/finalize" '
Make sure that all namespaces you want the finalizer removed from are indeed Terminating.
You need kubectl proxy running and jq for the above to work.
In our case, the metrics API service is down, and I can see this error in the verbose logging of kubectl delete ns <namespace-name> -v=7
.......
I0115 11:03:25.548299 12445 round_trippers.go:383] GET https://<api-server-url>/apis/metrics.k8s.io/v1beta1?timeout=32s
I0115 11:03:25.548317 12445 round_trippers.go:390] Request Headers:
I0115 11:03:25.548323 12445 round_trippers.go:393] Accept: application/json, */*
I0115 11:03:25.548329 12445 round_trippers.go:393] User-Agent: kubectl/v1.11.3 (darwin/amd64) kubernetes/a452946
I0115 11:03:25.580116 12445 round_trippers.go:408] Response Status: 503 Service Unavailable in 31 milliseconds
After fixing the metrics apiservice, the terminating namespaces completed.
I'm not really sure why deletion depends on the metrics apiservice; I'm also interested to know how it works if the metrics apiservice is not installed on the cluster.
Not really sure why deletion depends on metrics apiservice,
@manojbadam because metrics is registered in the api server, when performing a namespace deletion, it must query that external api for (namespaced) resources to be deleted (if exist) associated with that namespace. If the extension server isn't available, Kubernetes can't guarantee that all objects have been removed, and it doesn't have a persistent mechanism (in memory or disk) to reconcile later because the root object would have been removed. That happens with any registered api extension service.
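You can see the set of APIs the namespace controller has to enumerate; if any group in this list is served by an unreachable extension server, deletion stalls:

```shell
# All namespaced resource types, including those served by aggregated
# APIs such as metrics.k8s.io; each group must respond for namespace
# deletion to finish.
kubectl api-resources --namespaced=true -o name
```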
As I was constantly running into this, I automated this with a small shell script:
https://github.com/ctron/kill-kube-ns/blob/master/kill-kube-ns
It fetches the project, fixes the JSON, starts and properly stops "kubectl proxy", …
Thanks to everyone pointing me into the right direction!
my hero! <3
I ran into this problem too. I'm on Google Kubernetes Engine, using Terraform to spin up Kubernetes clusters and to create namespaces and pods inside the cluster. The problem started a while after running terraform destroy.
In my case, this turned out to be an issue with the order in which Terraform executes the destroy. Terraform deletes the node pool first, and then deletes the namespaces and pods. But deleting the (only) node pool broke the Kubernetes cluster, and that's what caused the namespace deletion to get stuck at "terminating" forever.
@FooBarWidget same problem for me :(
[root@k8s-master ~]# curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://172.*****:6443/api/v1/namespaces/rook-ceph/finalize
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "namespaces "rook-ceph" is forbidden: User "system:anonymous" cannot update namespaces/finalize in the namespace "rook-ceph"",
"reason": "Forbidden",
"details": {
"name": "rook-ceph",
"kind": "namespaces"
},
"code": 403
I got a return code 403, what should I do :(
Thank god, the terminating namespace is finally gone. The following method did the trick for me.
NAMESPACE=rook-ceph
kubectl proxy &
kubectl get namespace $NAMESPACE -o json |jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
I have the same issue, but I don't see any metrics-service.
I'm playing around with k8s from DigitalOcean and GitLab Auto DevOps. My assumption is it's some DigitalOcean blob storage, but I'm lost on how to analyse or fix it.
@mingxingshi thx. Editing the namespace didn't do the trick; your script did.
Wow, finally got rid of it. Thanks for the commands @mingxingshi !
The solution for me was:
kubectl delete apiservice v1beta1.metrics.k8s.io
Just figured I should leave my experience of this here: I was doing terraform apply with the following resource:

resource "helm_release" "servicer" {
  name = "servicer-api"
  // n.b.: this is a local chart just for this terraform project
  chart = "charts/servicer-api"
  namespace = "servicer"
  ...
}

But I am a helm newb and had a chart that contained a template creating a namespace called servicer. This got terraform and k8s into a bad state where terraform would fail, and k8s would leave the servicer namespace permanently in the Terminating state. Doing what @mingxingshi suggests above made the namespace terminate, as it had no resources attached to it.

This issue stopped happening for me once I removed the template that created the namespace and left it to helm to create it.
The problem is completely repeatable for me. First, clone the prometheus-operator. Then:
cd prometheus-operator/contrib/kube-prometheus
kubectl create -f manifests/ --validate=false
... wait ...
kubectl delete namespace monitoring
Hangs. If, however, I use kubectl delete -f manifests/, then cleanup is successful.
Yeah, I had the same hang with prometheus-operator. You need kubectl delete -f manifests/ to get unstuck.
I think there are some finalizers in the prometheus CRDs that are misbehaving; in this particular scenario it's hardly Kubernetes' fault. However, Kubernetes should make it easier to find the culprit, because the length of this thread demonstrates that there can be many causes, and it's not easy to get to the bottom of it in each particular scenario.
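One way to hunt for the misbehaving objects is to list the instances of every CRD in the stuck namespace and inspect their finalizers ("monitoring" is the namespace from the repro above; a sketch, not an exhaustive check):

```shell
# For each CRD, list its instances in the stuck namespace; any hit
# with a non-empty metadata.finalizers is a candidate culprit.
# ${crd##*/} strips the "customresourcedefinition.../" prefix that
# "-o name" adds, leaving the plural.group resource name.
for crd in $(kubectl get crd -o name); do
  kubectl get "${crd##*/}" -n monitoring --ignore-not-found
done
```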
I'm a kubernetes noob so I can't offer much info here but I also have 2 namespaces stuck in terminating status. My kubernetes setup is using IBM Cloud Private 3.1.2 Community Edition
kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4+icp", GitCommit:"3f5277fa129f05fea532de48284b8b01e3d1ab4e", GitTreeState:"clean", BuildDate:"2019-01-17T13:41:02Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4+icp", GitCommit:"3f5277fa129f05fea532de48284b8b01e3d1ab4e", GitTreeState:"clean", BuildDate:"2019-01-17T13:41:02Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
kubectl cluster-info
Kubernetes master is running at https://ip
catalog-ui is running at https://ip/api/v1/namespaces/kube-system/services/catalog-ui:catalog-ui/proxy
Heapster is running at https://ip/api/v1/namespaces/kube-system/services/heapster/proxy
image-manager is running at https://ip/api/v1/namespaces/kube-system/services/image-manager:image-manager/proxy
CoreDNS is running at https://ip/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
metrics-server is running at https://ip/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
platform-ui is running at https://ip/api/v1/namespaces/kube-system/services/platform-ui:platform-ui/proxy
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip1 Ready,SchedulingDisabled etcd,management,master,proxy 23h v1.12.4+icp
ip2 Ready worker 23h v1.12.4+icp
ip3 Ready worker 23h v1.12.4+icp
I have two namespaces stuck in the terminating state
kubectl get ns
NAME STATUS AGE
my-apps Terminating 21h
cert-manager Active 23h
default Active 23h
istio-system Active 23h
kube-public Active 23h
kube-system Active 23h
platform Active 22h
psp-example Terminating 18h
services Active 22h
When I check the finalizers as described in this comment, I only see kubernetes.
kubectl get ns my-apps -o yaml
apiVersion: v1
kind: Namespace
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"name":"my-apps"}}
creationTimestamp: 2019-04-10T18:23:55Z
deletionTimestamp: 2019-04-11T15:24:24Z
name: my-apps
resourceVersion: "134914"
selfLink: /api/v1/namespaces/my-apps
uid: ccb0398d-5bbd-11e9-a62f-005056ad5350
spec:
finalizers:
- kubernetes
status:
phase: Terminating
Regardless, I tried removing kubernetes from the finalizers and it didn't work. I also tried the json/api approach described in this comment, and restarting all the nodes; neither worked.
I also tried doing the force delete and that doesn't work either
kubectl delete namespace my-apps --force --grace-period 0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (Conflict): Operation cannot be fulfilled on namespaces "my-apps": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system.
In my case the namespace was rook-ceph; kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge worked for me. It should work for similar cases too.
From: https://github.com/rook/rook/blob/master/Documentation/ceph-teardown.md
@slassh I have some problems while using your approach; what should I do for next-step troubleshooting?
~ curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://39.96.4.11:6443/api/v1/namespaces/istio-system/finalize
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "namespaces \"istio-system\" is forbidden: User \"system:anonymous\" cannot update resource \"namespaces/finalize\" in API group \"\" in the namespace \"istio-system\"",
"reason": "Forbidden",
"details": {
"name": "istio-system",
"kind": "namespaces"
},
"code": 403
My problem was solved by this script: https://github.com/ctron/kill-kube-ns/blob/master/kill-kube-ns
yup https://github.com/ctron/kill-kube-ns/blob/master/kill-kube-ns does the trick
```
set -eo pipefail

die() { echo "$*" 1>&2 ; exit 1; }

need() {
  which "$1" &>/dev/null || die "Binary '$1' is missing but required"
}

# checking pre-reqs
need "jq"
need "curl"
need "kubectl"

PROJECT="$1"
shift

test -n "$PROJECT" || die "Missing arguments: kill-ns <namespace>"

kubectl proxy &>/dev/null &
PROXY_PID=$!
killproxy () {
  kill $PROXY_PID
}
trap killproxy EXIT

sleep 1 # give the proxy a second

kubectl get namespace "$PROJECT" -o json | jq 'del(.spec.finalizers[] | select("kubernetes"))' | curl -s -k -H "Content-Type: application/json" -X PUT -o /dev/null --data-binary @- http://localhost:8001/api/v1/namespaces/$PROJECT/finalize && echo "Killed namespace: $PROJECT"
```
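As an aside, the jq filter in that last line removes every entry of spec.finalizers, not just "kubernetes": select("kubernetes") tests the literal string (which is always truthy) rather than comparing the array element, so del(...) matches each path. A local demo of that jq behavior, no cluster needed:

```shell
# select("kubernetes") is truthy for every element, so del() empties the array.
echo '{"spec":{"finalizers":["kubernetes","example.com/other"]}}' \
  | jq -c 'del(.spec.finalizers[] | select("kubernetes"))'
# -> {"spec":{"finalizers":[]}}
```

An element-wise comparison would be `select(. == "kubernetes")`; for this script the end result happens to be the same, since a namespace's spec.finalizers normally contains only "kubernetes".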
It seems that namespaces are actually not deleted. In my case, kubectl get ns does not show the deleted namespace, but kubectl get all -n <namespace> shows all resources safe and sound. I checked on the nodes and the Docker containers were still running...
@glouis that's because you bypassed the finalizers using the method above, so Kubernetes didn't have time to execute all those essential deletion tasks.
It's really sad to see so many people blindly advocating for this method without understanding its consequences. It's extremely ugly and can potentially leave tons of leftovers in the cluster. @javierprovecho already mentioned it above, and @liggitt also mentioned it in another GitHub issue.
You'd be better off fixing the broken v1beta1.metrics.k8s.io API service, or deleting it if you don't need it.
See also #73405
I second @antoineco's message. I tested this out on one of our sandbox environments because we were constantly getting stuck namespaces. After about a month, all the Docker daemons were freezing for no reason. It turns out we had created huge memory leaks by leaving resources behind.
After a lot of trial and error, and reading through these comments, it turned out to be a custom resource definition from the CoreOS Grafana stack for those namespaces. Listing the CRDs showed resources specific to that namespace. I was very lucky that the name of the stuck CRD contained the namespace.
It also turned out that one stuck namespace stops any further namespaces from deleting. So even if namespace A has no CRDs keeping it stuck, and namespace B has a stuck CRD, all the resources in A will stick around until B is gone. I think I must have applied the fix described above to namespace A, leaving a ton of resources around every time.
The thing that is still killing me is that I cannot for the life of me find any log mentioning that a namespace cleanup failed on deleting a CRD, or even what it is currently doing. I had to spend an hour just figuring out which CRD it was stuck on. If anyone has an idea how to get more info, so I don't have to spend a huge amount of time finding the stuck resource, that would be awesome.
@jecafarelli good hint for production clusters. But unfortunately for me, I was just not able to kill it otherwise. I also knew I would recreate the whole cluster later on.
I tried analysing the issue, but nothing in this thread helped me solve it by other means.
This official solution helped me: https://success.docker.com/article/kubernetes-namespace-stuck-in-terminating
This is not the same as kubectl edit namespace rook-ceph. I was unable to solve this problem until I sent a PUT request with the _"finalizers"_ removed.
OK, so I ran into this again with CoreOS, and I dug a bit deeper. This is most definitely because of a cluster-wide resource definition that is namespaced; furthermore, maybe it couldn't delete it because it can't query info on CoreOS. I did find errors in the apiserver logs showing failures when trying to get information about an API group. I used the issue referenced above to come up with a quick script that lists the resources that got the namespace stuck for me.
I'll probably just use this in the future if I run into it again, and keep adding any other namespaced resources I run into.
```
for ns in `kubectl get ns --field-selector status.phase=Terminating -o name | cut -d/ -f2`;
do
  echo "apiservice under namespace $ns"
  kubectl get apiservice -o json | jq --arg ns "$ns" '.items[] | select(.spec.service.namespace != null) | select(.spec.service.namespace == $ns) | .metadata.name' --raw-output
  echo "api resources under namespace $ns"
  for resource in `kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -o name -n $ns`;
  do
    echo $resource
  done;
done
```
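The apiservice half of that loop can be rehearsed offline. A minimal sketch against a hand-made, trimmed stand-in for `kubectl get apiservice -o json` (the item names and shapes below are hypothetical):

```shell
# Trimmed stand-in for `kubectl get apiservice -o json`.
cat > apiservices.json <<'EOF'
{
  "items": [
    { "metadata": { "name": "v1beta1.metrics.k8s.io" },
      "spec": { "service": { "name": "metrics-server", "namespace": "kube-system" } } },
    { "metadata": { "name": "v1.apps" },
      "spec": { "service": null } }
  ]
}
EOF

# Same filter as the script: APIServices backed by a Service living in namespace $ns.
ns=kube-system
jq --arg ns "$ns" --raw-output \
  '.items[] | select(.spec.service.namespace != null) | select(.spec.service.namespace == $ns) | .metadata.name' \
  apiservices.json
# -> v1beta1.metrics.k8s.io
```

Note that "Local" APIServices (served by the apiserver itself) have `spec.service: null`, which is why the first select is needed.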
Thanks a lot @jecafarelli, you helped me solve my issue the right way ;-)
I had installed cert-manager on an OpenShift cluster inside the cert-manager namespace, and when I tried to delete this namespace, it got stuck in the Terminating state. Executing oc delete apiservices v1beta1.admission.certmanager.k8s.io seems to have solved the problem; the namespace is gone.
Same here; running kubectl delete -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.8/deploy/manifests/00-crds.yaml helped.
Just chiming in to say I've also hit this error on version 1.13.6 with GKE. It happened after I disabled GKE's Istio addon with the goal of manually installing it for full control.
This is the longest issue thread I've ever taken the time to read through, and I'm blown away that there is no real consensus or reproduction steps for the root of this issue. It seems it can get tripped in so many different ways :(
The JSON and curl/proxy method mentioned numerous times above and documented at https://success.docker.com/article/kubernetes-namespace-stuck-in-terminating is what saved me.
The advice at https://success.docker.com/article/kubernetes-namespace-stuck-in-terminating is actively harmful, and can result in orphaned resources not getting cleaned up and resurfacing if a namespace with an identical name is later recreated.
There is work in progress to surface the specific cause of the hung delete, but the fundamental issue is that there are API types that cannot be verified to have been cleaned up, so the namespace deletion blocks until they are verified.
We also hit this with Knative which installs this namespaced apiservice.
```
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  labels:
    autoscaling.knative.dev/metric-provider: custom-metrics
    serving.knative.dev/release: v0.7.1
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: autoscaler
    namespace: knative-serving
  version: v1beta1
  versionPriority: 100
---
```
After deleting it both the knative-serving ns and a bunch of other stuck namespaces cleaned up. Thanks to @jecafarelli for the above bash script.
Here's a terrible PowerShell version:

```powershell
$api = kubectl get apiservice -o json | ConvertFrom-Json
# list the namespaced api items; can ignore kube-system
$api.items | % { $_.spec.service.namespace }
# replace knative-serving with whatever namespace you found
$api.items | ? { $_.spec.service.namespace -eq 'knative-serving' } | ConvertTo-Json
# replace v1beta1.custom.metrics.k8s.io with whatever you found
k delete apiservice v1beta1.custom.metrics.k8s.io
```
I had the same problem today and this script worked for me.
@kubernetes/sig-api-machinery-misc
This bug has existed for over a year and is still a problem... What is your plan for addressing inbound issues such as this?
This could help with at least understanding whats going on: https://github.com/kubernetes/kubernetes/pull/80962
I am hitting the same issue
```
k get ns cdnamz-k8s-builder-system -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"labels":{"control-plane":"controller-manager"},"name":"cdnamz-k8s-builder-system"}}
  creationTimestamp: "2019-08-05T18:38:21Z"
  deletionTimestamp: "2019-08-05T20:37:37Z"
  labels:
    control-plane: controller-manager
  name: cdnamz-k8s-builder-system
  resourceVersion: "5980028"
  selfLink: /api/v1/namespaces/cdnamz-k8s-builder-system
  uid: 3xxxxxxx
spec:
  finalizers:
  - kubernetes
status:
  phase: Terminating
```

```
k get ns
NAME                        STATUS        AGE
cdnamz-k8s-builder-system   Terminating   4h20m
```
Namespace controller should report conditions to the namespace status and clients should report that. Needs a KEP, but should be pretty straightforward if someone can take and validate it.
@timothysc there is (or was) a PR in flight (somewhere) doing exactly what @smarterclayton says.
I am pretty sure there is another github issue about this, too?
Yeah the PR is here: https://github.com/kubernetes/kubernetes/pull/73405
The issue I consider canonical is here: https://github.com/kubernetes/kubernetes/issues/70916
Here's a resource that helped me: https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.1.1/troubleshoot/ns_terminating.html
It's similar to the solution proposed by @slassh, but it uses kubectl proxy to create a local proxy and make the target IP of the curl command predictable.
--
Edit: as stated several times below this answer, this solution is a dirty hack and will possibly leave some dependent resources in the cluster. Use at your own risk, and possibly only use it as a quick way out in a development cluster (don't use it in a production cluster).
Removing the finalizer directly as described in the doc above can have consequences: the resources that were pending deletion will still be defined in the cluster even after the namespace has been released. That is the purpose of the finalizer: to ensure that all dependents are removed before allowing the deletion of the namespace.
Found a workaround in similar questions:

```
NAMESPACE=<namespace-name>
kubectl proxy &
kubectl get namespace $NAMESPACE -o json | jq '.spec = {"finalizers":[]}' > temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
```
Thank you!
It works well.
I made a simple app using this workaround: https://github.com/jenoOvchi/k8sdelns.
I use it for fast deletion and hope it will be helpful for someone.
On Kubernetes 1.12.2, namespaces are stuck in the Terminating state. Sometimes the finalizers can be removed by modifying the namespace's YAML, but the namespace cannot be deleted through the API. Why is that, and has this been tracked specifically (given that there are no resources left in the namespace)? I hope I can get some pointers, thank you!
Again, please do not remove the finalizer, it is there for a reason. Try instead to find out which resources in the NS are pending deletion by:
- Checking if any apiservice is unavailable and hence doesn't serve its resources: kubectl get apiservice | grep False
- Finding all resources that still exist via kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -n $your-ns-to-delete
(Kudos to Jordan for that one.) The solution to this problem is not to short-circuit the cleanup mechanism, it's to find out what prevents cleanup from succeeding.
How right you are.
In my case, the pod behind an Operator Framework apiservice had been deleted, which blocked the terminating process.
Removing the unused apiservice (kubectl delete apiservice <service-name>) solved the problem.
Hi all, code freeze is coming up in just a few days (Thursday, end of day, PST), so we need to make sure that this issue will be solved for v1.16 or moved to v1.17. Can you comment on it's status?
Will this be backported into a current GKE release? I have a cluster that has a handful of namespaces that are still "Terminating".
@squarelover even after doing this? https://github.com/kubernetes/kubernetes/issues/60807#issuecomment-524772920
@josiahbjorgaard I just approved the PR, which is all we will be doing on this for 1.16.
https://github.com/kubernetes/kubernetes/pull/73405 is the aforementioned PR
It's merged. I think there may be more we can do, but please take future comments to #70916.
In many of these cases you might have metrics-server installed. When the pods you deploy in a specific namespace are targets for metrics gathering, the namespace ends up tied to metrics-server. So even after you delete all the resources in that namespace, metrics-server is somehow linked to it, which will prevent you from deleting the namespace.
The post above helps you identify why you cannot delete the namespace, so it's the right way.
Try this to get the actual list of all things in your namespace: kubernetes/kubectl#151 (comment)
Then for each object, do kubectl delete or kubectl edit to remove the finalizers.
This solution was useful for me, thanks.
Hi guys,
I made a script to make it easier to delete namespaces stuck in Terminating status: https://github.com/thyarles/knsk.
Thanks.
We met the same issue: when deleting a namespace, it gets stuck in the 'Terminating' state. I followed the steps above to remove 'kubernetes' from the finalizers in the YAML file, and it worked.
However, we don't know why we need these extra steps. kubectl delete ns foonamespace should just delete it. Can anyone give me a reason? Thank you!
Hello @xzhang007,
If you discover why namespace deletion gets stuck in the Terminating state, please let me know. I looked for a good answer for a while, but found nothing, so I made a script to make my life easier until the cause is discovered and fixed.
Thank you.
@thyarles it seems I have not found an answer up to now.
In our case we discovered that one of the webhooks and finalizers we were using was reaching out to a pod living in the namespace that got deleted. Once the pod was deleted, the termination got stuck.
@xzhang007 have you looked at the answer @alvaroaleman provided? For us that was enough to find out what the cause was.
also, when this issue was closed, there was a new ticket referenced to discuss how to make it clear why the namespace is stuck in Terminating. I suggest you take the conversation there instead of this closed issue.
@jeff-knurek That should be the right way. Thank you.
In our case it was a botched upgrade of cert-manager which broke the finalizer: https://github.com/jetstack/cert-manager/issues/1582
```
$ kube get apiservice
NAME                                   SERVICE                                                     AVAILABLE                 AGE
v1.                                    Local                                                       True                      43d
v1.apps                                Local                                                       True                      43d
v1.authentication.k8s.io               Local                                                       True                      43d
v1.authorization.k8s.io                Local                                                       True                      43d
v1.autoscaling                         Local                                                       True                      43d
v1.batch                               Local                                                       True                      43d
v1.coordination.k8s.io                 Local                                                       True                      43d
v1.networking.k8s.io                   Local                                                       True                      43d
v1.rbac.authorization.k8s.io           Local                                                       True                      43d
v1.scheduling.k8s.io                   Local                                                       True                      43d
v1.storage.k8s.io                      Local                                                       True                      43d
v1alpha1.certmanager.k8s.io            Local                                                       True                      3d22h
v1alpha1.crd.k8s.amazonaws.com         Local                                                       True                      43d
v1beta1.admission.certmanager.k8s.io   cert-manager/cointainers-cointainers-cert-manager-webhook   False (MissingEndpoints)  60m
v1beta1.admissionregistration.k8s.io   Local                                                       True                      43d
v1beta1.apiextensions.k8s.io           Local                                                       True                      43d
v1beta1.apps                           Local                                                       True                      43d
v1beta1.authentication.k8s.io          Local                                                       True                      43d
v1beta1.authorization.k8s.io           Local                                                       True                      43d
v1beta1.batch                          Local                                                       True                      43d
v1beta1.certificates.k8s.io            Local                                                       True                      43d
v1beta1.coordination.k8s.io            Local                                                       True                      43d
v1beta1.events.k8s.io                  Local                                                       True                      43d
v1beta1.extensions                     Local                                                       True                      43d
v1beta1.networking.k8s.io              Local                                                       True                      43d
v1beta1.node.k8s.io                    Local                                                       True                      43d
v1beta1.policy                         Local                                                       True                      43d
v1beta1.rbac.authorization.k8s.io      Local                                                       True                      43d
v1beta1.scheduling.k8s.io              Local                                                       True                      43d
v1beta1.storage.k8s.io                 Local                                                       True                      43d
v1beta1.webhook.cert-manager.io        cert-manager/cointainers-cointainers-cert-manager-webhook   False (MissingEndpoints)  3d22h
v1beta2.apps                           Local                                                       True                      43d
v2beta1.autoscaling                    Local                                                       True                      43d
v2beta2.autoscaling                    Local                                                       True                      43d
```
Hi.
In my case the namespace got stuck in Terminating under the conditions described in https://github.com/rancher/rancher/issues/21546#issuecomment-553635629.
Maybe it will help.
https://medium.com/@newtondev/how-to-fix-kubernetes-namespace-deleting-stuck-in-terminating-state-5ed75792647e
This worked like a champ for me
I also faced the same issue, and now it is working fine for me. Please refer to the following document to solve your issue.
@zerkms well, sometimes it's legitimate advice, isn't it? Often the finalizers being waited on used to be served by objects that were deleted as part of the namespace deletion. In that case, since there is no point in waiting any longer (there is nothing that could perform the finalization any more), patching the objects the way the article describes _is the only option_.
Note that the article is applicable only if the issue _was not resolved_ by applying the steps listed in the Known Issues page, linked at the top of the article, which is basically the advice that the comment you linked repeats.
@zerkms well, sometimes it's legitimate advice, isn't it? Often the finalizers being waited on used to be served by objects that were deleted as part of the namespace deletion
I've never seen that be true for a spec.finalizer on a namespace. Every instance I've seen has involved the namespace cleanup controller, and has either been caused by a persistent object in the namespace (which that advice would strand in etcd), or an unresponsive aggregated API (which removing the namespace spec.finalizer would skip waiting for, also stranding any persisted resources from that API)
The article does not warn that bypassing the namespace finalization risks leaving namespaced resources stranded in storage, and is not recommended.
I've never seen that be true for a spec.finalizer on a namespace
Yep, that's right: this finalizer is implemented by Kubernetes itself. But there can be other finalizers on objects inside that namespace, which may be implemented by objects in that same namespace. One example I encountered recently is https://appscode.com/products/stash/. It puts finalizers on some of its CRDs which are serviced by the stash-operator deployment. But with stash-operator already deleted, there is nothing left that can remove the finalizer mark from those CRDs, and the namespace deletion gets stuck. In this case, patching out those finalizers (not on the namespace itself, but on those objects) is the only sensible thing to do.
Hope it makes sense.
In this case patching out those finalizers (not on the namespace itself, but on those objects) is the only sensible thing to do.
Correct. I would not object to that in a "delete all resources" cleanup scenario, but that is not what the linked article walks through... it describes how to remove a spec.finalizer from the namespace.
First, take a small coffee and relax. Now go to your k8s master node:

```
kubectl cluster-info
Kubernetes master is running at https://localhost:6443
KubeDNS is running at https://localhost:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
```

Now run the kube proxy:

```
kubectl proxy &
Starting to serve on 127.0.0.1:8001
```

Save the PID so you can kill it later on :)

Dump the namespace to a file, then edit the file and remove the finalizers:

```
},
"spec": {
    "finalizers": [
        "kubernetes"
    ]
},
```

After editing it should look like this 👍

```
},
"spec": {
    "finalizers": [
    ]
},
```

We're almost there 👍

```
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/${NAMESPACE}/finalize
```

and it's gone 👍
Hey guys! I followed the tips provided by @alvaroaleman and made a script that inspects a stuck namespace and tries a clean deletion before resorting to a hard deletion.
What the script https://github.com/thyarles/knsk does:
Hope it helps.
@thyarles Thank you so much. I used your approach to solve the problem.
$ kubectl get apiservices to check which services are unavailable, then delete those whose AVAILABLE is False with $ kubectl delete apiservice [service-name]; after that there were no issues deleting the namespace.
For our team, there were 3 unavailable apiservices: v1beta1.admission.certmanager.k8s.io, v1beta1.metrics.k8s.io, and v1beta1.webhook.certmanager.k8s.io.
Note that your cluster is somewhat broken if the metrics apiserver isn't running, just removing the APIService doesn't actually fix the root cause.
@lavalamp the metrics apiservice is the unavailable one.
Yes, which means the metrics apiserver is not running, which means HPA doesn't work on your cluster, and probably other things, too.
Yes. HPA doesn't work now. I should not delete metrics and find a way to fix it.
@xzhang007 glad to hear! Now you should check why your v1beta1.metrics.k8s.io became broken. Here is how it should look:

```
$ kubectl -n kube-system get all | grep metrics
pod/metrics-server-64f74f8d47-r5vcq               2/2     Running   9     119d
service/metrics-server                            ClusterIP xx.xx.xx.xx
deployment.apps/metrics-server                    1/1     1         1     201d
replicaset.apps/metrics-server-55c7f68d94         0       0         0     165d
replicaset.apps/metrics-server-5c696bb6d7         0       0         0     201d
replicaset.apps/metrics-server-5cdb8bb4fb         0       0         0     201d
replicaset.apps/metrics-server-64f74f8d47         1       1         1     119d
replicaset.apps/metrics-server-686789bb4b         0       0         0     145d
```
```
$ kubectl -n kube-system get all | grep metrics
pod/metrics-server-5dcfd4dd9f-m2v9k               1/1     Running   0     2d20h
service/metrics-server                            ClusterIP xx.xx.xx.xx
deployment.apps/metrics-server                    1/1     1         1     27d
replicaset.apps/metrics-server-5dcfd4dd9f         1       1         1     27d
replicaset.apps/metrics-server-7fcf9cc98b         0       0         0     27d
```
@xzhang007 in fact it wasn't working before you noticed the problem... you just noticed because it left your deleted namespaces stuck. Just use the Helm package manager to re-deploy your metrics-server, or call the command below to fix it (check the deployment file before applying):

```
$ curl https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/metrics-server-deployment.yaml | kubectl apply -f -
```
@slassh's solution worked for me on Kubernetes 1.15.
Delete the v1beta1.metrics.k8s.io APIService:

```
kubectl get ns ns-to-delete -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2020-01-08T05:36:52Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently
      unable to handle the request'
...
```

```
kubectl get apiservice
...
v1beta1.metrics.k8s.io   kube-system/metrics-server   False (ServiceNotFound)
...
```

```
kubectl delete apiservice v1beta1.metrics.k8s.io
```
The cert-manager apiservice was unavailable, maybe because it was set up incorrectly, for example with the wrong annotation syntax in the ingress controller. For our system it was
"certmanager.k8s.io/cluster-issuer": "letsencrypt-prod"
and changing it to
"cert-manager.io/cluster-issuer": "letsencrypt-prod"
made it available.
As mentioned before in this issue, there is another way to terminate a namespace using an API not exposed by kubectl, by using a modern version of kubectl where kubectl replace --raw is available (not sure from which version). This way you will not have to spawn a kubectl proxy process, and you avoid a dependency on curl (which in some environments like busybox is not available). In the hope that this will help someone else, I left this here:

```
kubectl get namespace "stucked-namespace" -o json \
  | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
  | kubectl replace --raw /api/v1/namespaces/stucked-namespace/finalize -f -
```
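The sed substitution in that pipeline can be sanity-checked locally on a one-line JSON string (a trimmed stand-in for the real namespace object; note that the `\+` quantifier above is a GNU sed extension, while `*` below is portable):

```shell
# Blank out the "finalizers" array in single-line JSON, as the pipeline above does.
json='{"spec": {"finalizers": ["kubernetes"]}}'
echo "$json" | sed 's/"finalizers": \[[^]]*\]/"finalizers": []/'
# -> {"spec": {"finalizers": []}}
```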
Has it been established whether this is a fixable issue? There seem to be a lot of hacky solutions here, but nothing addressing the underlying issue, which is that none of us can delete our namespaces....
I have this on an EKS v1.14 cluster.
The fundamental issue is that an aggregated API group in your cluster is unavailable. It is intentional that the namespace cleanup controller blocks until all APIs are available, so that it can verify all resources from all API groups are cleaned up for that namespace.
For ppl trying to curl the API:

```
# Check all possible clusters, as your .KUBECONFIG may have multiple contexts:
kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'

# Select name of cluster you want to interact with from above output:
export CLUSTER_NAME="some_server_name"

# Point to the API server referring the cluster name
APISERVER=$(kubectl config view -o jsonpath="{.clusters[?(@.name==\"$CLUSTER_NAME\")].cluster.server}")

# Gets the token value
TOKEN=$(kubectl get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}"|base64 --decode)

# Explore the API with TOKEN
curl -X GET $APISERVER/api --header "Authorization: Bearer $TOKEN" --insecure
```

https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#without-kubectl-proxy
Here's a script to do this automatically. Needs jq:

```
#!/bin/bash

if [ -z "${1}" ] ; then
  echo -e "\nUsage: ${0} <name_of_the_namespace_to_remove_the_finalizer_from>\n"
  echo "Valid cluster names, based on your kube config:"
  kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'
  exit 1
fi

kubectl proxy --port=8901 &
PID=$!
sleep 1

echo -n 'Current context : '
kubectl config current-context
read -p "Are you sure you want to remove the finalizer from namespace ${1}? Press Ctrl+C to abort."

kubectl get namespace "${1}" -o json \
  | jq '.spec.finalizers = [ ]' \
  | curl -k \
    -H "Content-Type: application/json" \
    -X PUT --data-binary @- "http://localhost:8901/api/v1/namespaces/${1}/finalize"

kill -15 $PID
```
Everyone: scripts to automate the finalizer removal do more harm than good. They may leave time-bombs in the aggregated apiserver(s) that aren't available; if someone recreates the namespace, suddenly a bunch of old objects may re-appear.
The real solution is:
$ kubectl get api-services
# something in the list is unavailable. Figure out what it is and fix it.
# ... the namespace lifecycle controller will finish deleting the namespace.
https://github.com/thyarles/knsk
This script does all the checks and tries to do a clean deletion, including looking for orphaned resources. If the user wants to take a risk, the script offers a --force option to perform a non-recommended way of deletion.
typo, should be apiservices
This command shows the APIs that are not available:

```
kubectl get apiservices --template='{{range $api := .items}}{{range $api.status.conditions}}{{if eq .type "Available"}}{{$api.metadata.name}} {{.status}}{{"\n"}}{{end}}{{end}}{{end}}' | grep -i false
```
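The same Available=False check can be rehearsed offline with jq against a hand-made stand-in for `kubectl get apiservices -o json` (the names below are hypothetical):

```shell
# Trimmed stand-in for `kubectl get apiservices -o json`.
cat > apiservice-list.json <<'EOF'
{
  "items": [
    { "metadata": { "name": "v1.apps" },
      "status": { "conditions": [ { "type": "Available", "status": "True" } ] } },
    { "metadata": { "name": "v1beta1.metrics.k8s.io" },
      "status": { "conditions": [ { "type": "Available", "status": "False" } ] } }
  ]
}
EOF

# Print only the APIServices whose Available condition is False.
jq -r '.items[]
       | . as $api
       | .status.conditions[]
       | select(.type == "Available" and .status == "False")
       | $api.metadata.name' \
  apiservice-list.json
# -> v1beta1.metrics.k8s.io
```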
This article will surely be useful to you: https://access.redhat.com/solutions/5038511
What actually exists is a conflict in the apiservices. You can check the health status of the APIs in OpenShift with:

```
oc get apiservices -o=custom-columns="name:.metadata.name,status:.status.conditions[0].status"
```

The failing API needs a restart: restart the pod or the deployment that backs that API, then try to delete the namespace:

```
$ oc delete namespace
```

And ready, business fixed!!
Pretty disrespectful to use your own language in a place where everyone agrees to speak English. 👎
Where does everyone agree to speak English?
Done. Excuse me, that was due to my haste; it's been fixed.
We have a multi lingual user base, it's bad enough that none of our tools are internationalized, we can at least be nice here on github, please.
@teoincontatto
As mentioned before in this issue, there is another way to terminate a namespace, using an API not exposed by kubectl, with a modern version of kubectl where kubectl replace --raw
is available (not sure from which version). This way you will not have to spawn a kubectl proxy
process, and you avoid a dependency on curl
(which in some environments, like busybox, is not available). In the hope that this will help someone else, I left this here:
kubectl get namespace "stuck-namespace" -o json \
  | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
  | kubectl replace --raw /api/v1/namespaces/stuck-namespace/finalize -f -
This worked perfectly!
We have a multi lingual user base, it's bad enough that none of our tools are internationalized, we can at least be nice here on github, please.
Still trying to understand. Forgive me. I may have clicked thumbs down by mistake.
Yes, indeed, the tools haven't been done to perfection.
Giving a thumbs down without an explanation doesn't make sense, though.
Almost every time I experience this issue, it's due to CRDs. Delete the CRDs if they are used only in that namespace, and then you can proceed with deleting the finalizer and the namespace.
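A minimal sketch of that CRD cleanup (the `rook.io` group is just an example), wrapped in a function so nothing touches a cluster until you call it:

```shell
# List the CRDs belonging to a given API group (e.g. the operator whose
# namespace is stuck), so you can delete them before they block finalization.
find_crds() {
  group="$1"                           # e.g. rook.io
  kubectl get crd -o name | grep "\.${group}\$"
}

# Usage against a real cluster:
#   find_crds rook.io
#   kubectl delete crd clusters.rook.io
```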
As mentioned before in this issue, there is another way to terminate a namespace using kubectl replace --raw (the full command is quoted earlier in the thread).
@teoincontatto Thank you! This finally worked!
Sometimes just editing the resource manifest in place (I mean removing the finalizers field and saving) does not work very well.
So, I got a new way from others:
kubectl get namespace linkerd -o json > linkerd.json
# where the raw path is /api/v1/namespaces/<your_namespace_here>/finalize
kubectl replace --raw "/api/v1/namespaces/linkerd/finalize" -f ./linkerd.json
After running those commands, the namespace should now be absent from your namespace list. It worked for me.
It supports not only namespaces but other resources as well.
I fixed the problem by removing the finalizers lines using: kubectl edit ns annoying-ns
Hmm ... I have this problem right now :)
Today I did an update of my eks cluster from 1.15 to 1.16.
Everything looks fine so far.
But my development ns "configcluster" was kind of damaged.
So I decided to clean it up.
k delete ns configcluster
....
now this hangs (3h +) :/
$ kubectl get namespace configcluster -o yaml
apiVersion: v1
kind: Namespace
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"name":"configcluster"}}
creationTimestamp: "2020-06-19T06:40:15Z"
deletionTimestamp: "2020-06-19T09:19:16Z"
name: configcluster
resourceVersion: "22598109"
selfLink: /api/v1/namespaces/configcluster
uid: e50f0b53-b21e-4e6e-8946-c0a0803f031b
spec:
finalizers:
- kubernetes
status:
conditions:
- lastTransitionTime: "2020-06-19T09:19:21Z"
message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently
unable to handle the request'
reason: DiscoveryFailed
status: "True"
type: NamespaceDeletionDiscoveryFailure
- lastTransitionTime: "2020-06-19T09:19:22Z"
message: All legacy kube types successfully parsed
reason: ParsedGroupVersions
status: "False"
type: NamespaceDeletionGroupVersionParsingFailure
- lastTransitionTime: "2020-06-19T09:19:22Z"
message: All content successfully deleted
reason: ContentDeleted
status: "False"
type: NamespaceDeletionContentFailure
phase: Terminating
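The DiscoveryFailed condition above names the broken API group. A small local sketch of pulling it out of the message (the message text is copied from the status above); note that the matching APIService object is named version.group, here v1beta1.metrics.k8s.io:

```shell
# Message copied from the namespace status above:
msg='Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request'

# Extract the group/version that failed discovery:
group=$(printf '%s' "$msg" | sed -n 's/.*server APIs: \([^:]*\):.*/\1/p')
echo "$group"    # metrics.k8s.io/v1beta1

# On the cluster, that corresponds to the APIService v1beta1.metrics.k8s.io;
# inspect or fix it, e.g.:
#   kubectl get apiservice v1beta1.metrics.k8s.io
```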
How do we get more exposure to this thorn in the foot issue?
@bobhenkel well, this issue is closed, so effectively this means that there is no issue (as far as any actionable items are concerned). If you need practical help with dealing with a similar situation, please read the thread above; there are some good pieces of advice there (and also some bad ones).
In my case, I had to manually delete my ingress load balancer from the GCP Network Services console. I had manually created the load balancer frontend directly in the console. Once I deleted the load balancer, the namespace was automatically deleted.
I suspect Kubernetes didn't delete it because the state of the load balancer differed from the state in the manifest.
I will try to automate the ingress frontend creation using annotations next, to see if I can avoid this issue.
Sometimes just editing the manifest to remove the finalizers field does not work; use kubectl replace --raw against /api/v1/namespaces/<your_namespace_here>/finalize (the full commands are quoted earlier in the thread).
You are a star, it worked!
Sometimes just editing the manifest to remove the finalizers field does not work; use kubectl replace --raw against /api/v1/namespaces/<your_namespace_here>/finalize (the full commands are quoted earlier in the thread).
Tried a lot of solutions but this is the one that worked for me. Thank you!
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://kubernetes-cluster-ip/api/v1/namespaces/annoying-namespace-to-delete/finalize
This should really be the "accepted" answer - it completely resolved the root of this issue!
Taken from the link above:
This is not the right way, especially in a production environment.
Today I got into the same problem. By removing the finalizer you’ll end up with leftovers in various states. You should actually find what is keeping the deletion from completing.
See https://github.com/kubernetes/kubernetes/issues/60807#issuecomment-524772920
(also, unfortunately, ‘kubectl get all’ does not report all things, you need to use similar commands like in the link)
My case — deleting ‘cert-manager’ namespace. In the output of ‘kubectl get apiservice -o yaml’ I found APIService ‘v1beta1.admission.certmanager.k8s.io’ with status=False . This apiservice was part of cert-manager, which I just deleted. So, in 10 seconds after I ‘kubectl delete apiservice v1beta1.admission.certmanager.k8s.io’ , the namespace disappeared.
Hope that helps.
With that being said, I wrote a little microservice to run as a CronJob every hour that automatically deletes Terminating namespaces.
You can find it here: https://github.com/oze4/service.remove-terminating-namespaces
Yet another oneliner:
for ns in $(kubectl get ns --field-selector status.phase=Terminating -o jsonpath='{.items[*].metadata.name}'); do kubectl get ns $ns -ojson | jq '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/$ns/finalize" -f -; done
But deleting stuck namespaces is not a good solution. The right way is to find out why it's stuck. A very common reason is an unavailable API service, which prevents the cluster from finalizing namespaces.
For example here I haven't deleted Knative properly:
$ kubectl get apiservice|grep False
NAME SERVICE AVAILABLE AGE
v1beta1.custom.metrics.k8s.io knative-serving/autoscaler False (ServiceNotFound) 278d
Deleting it solved the problem
k delete apiservice v1beta1.custom.metrics.k8s.io
apiservice.apiregistration.k8s.io "v1beta1.custom.metrics.k8s.io" deleted
$ k create ns test2
namespace/test2 created
$ k delete ns test2
namespace "test2" deleted
$ kgns test2
Error from server (NotFound): namespaces "test2" not found
I wrote a little microservice to run as a CronJob every hour that automatically deletes Terminating namespaces.
You can find it here: https://github.com/oze4/service.remove-terminating-namespaces
good job.
I had a similar issue on 1.18 in a lab k8s cluster, and I'm adding a note to maybe help others. I had been working with the metrics API, and with custom metrics in particular. After deleting those k8s objects in order to recreate them, the namespace deletion stalled with an error that the metrics API endpoint could not be found. Once I put that back in place (in another namespace), everything cleared up immediately.
This was in the namespace under status.conditions.message:
Discovery failed for some groups, 4 failing: unable to retrieve the
complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently
unable to handle the request, custom.metrics.k8s.io/v1beta2: the server is currently
unable to handle the request, external.metrics.k8s.io/v1beta1: the server is
currently unable to handle the request, metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Yet another oneliner (the full loop over Terminating namespaces and the Knative example are quoted earlier in the thread). But deleting stuck namespaces is not a good solution; the right way is to find out why it's stuck.
Definitely the cleanest one liner! It's important to note that none of these "solutions" actually solve the root issue.
That is the message we should be spreading :smile: not "yet another one liner".
Definitely the cleanest one liner! It's important to note that none of these "solutions" actually solve the root issue.
This solution addresses one of many possible root causes. To look for all possible root causes and fix them, I use this script: https://github.com/thyarles/knsk
@thyarles very nice!
Please do not remove the finalizers to delete a namespace; that will cause errors.
Please find out the cause of the namespace staying in Terminating instead. Currently known troubleshooting directions:
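One concrete troubleshooting step (referenced earlier in the thread) is to list every namespaced resource still present, since `kubectl get all` only covers a subset of resource types. A hedged sketch, wrapped in a function so the kubectl calls only run when you invoke it against a real cluster:

```shell
# List everything that still exists in a namespace, across all namespaced
# resource types that support "list".
list_remaining() {
  ns="$1"
  kubectl api-resources --verbs=list --namespaced -o name \
    | xargs -n 1 kubectl get --show-kind --ignore-not-found -n "$ns"
}

# Usage: list_remaining <stuck-namespace>
```

Whatever this prints is what the namespace controller is waiting on; fix or delete those objects rather than stripping the namespace finalizer.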
I encounter the same problem:
# sudo kubectl get ns
NAME STATUS AGE
cattle-global-data Terminating 8d
cattle-global-nt Terminating 8d
cattle-system Terminating 8d
cert-manager Active 8d
default Active 10d
ingress-nginx Terminating 9d
kube-node-lease Active 10d
kube-public Active 10d
kube-system Active 10d
kubernetes-dashboard Terminating 4d6h
local Active 8d
p-2sfgk Active 8d
p-5kdx9 Active 8d
# sudo kubectl get all -n kubernetes-dashboard
No resources found in kubernetes-dashboard namespace.
# sudo kubectl get namespace kubernetes-dashboard -o json
{
"apiVersion": "v1",
"kind": "Namespace",
"metadata": {
"annotations": {
"cattle.io/status": "{\"Conditions\":[{\"Type\":\"ResourceQuotaInit\",\"Status\":\"True\",\"Message\":\"\",\"LastUpdateTime\":\"2020-09-29T01:15:46Z\"},{\"Type\":\"InitialRolesPopulated\",\"Status\":\"True\",\"Message\":\"\",\"LastUpdateTime\":\"2020-09-29T01:15:46Z\"}]}",
"kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"kubernetes-dashboard\"}}\n",
"lifecycle.cattle.io/create.namespace-auth": "true"
},
"creationTimestamp": "2020-09-29T01:15:45Z",
"deletionGracePeriodSeconds": 0,
"deletionTimestamp": "2020-10-02T07:59:52Z",
"finalizers": [
"controller.cattle.io/namespace-auth"
],
"managedFields": [
{
"apiVersion": "v1",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:metadata": {
"f:annotations": {
"f:cattle.io/status": {},
"f:lifecycle.cattle.io/create.namespace-auth": {}
},
"f:finalizers": {
".": {},
"v:\"controller.cattle.io/namespace-auth\"": {}
}
}
},
"manager": "Go-http-client",
"operation": "Update",
"time": "2020-09-29T01:15:45Z"
},
{
"apiVersion": "v1",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:metadata": {
"f:annotations": {
".": {},
"f:kubectl.kubernetes.io/last-applied-configuration": {}
}
}
},
"manager": "kubectl-client-side-apply",
"operation": "Update",
"time": "2020-09-29T01:15:45Z"
},
{
"apiVersion": "v1",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:status": {
"f:phase": {}
}
},
"manager": "kube-controller-manager",
"operation": "Update",
"time": "2020-10-02T08:13:49Z"
}
],
"name": "kubernetes-dashboard",
"resourceVersion": "3662184",
"selfLink": "/api/v1/namespaces/kubernetes-dashboard",
"uid": "f1944b81-038b-48c2-869d-5cae30864eaa"
},
"spec": {},
"status": {
"conditions": [
{
"lastTransitionTime": "2020-10-02T08:13:49Z",
"message": "All resources successfully discovered",
"reason": "ResourcesDiscovered",
"status": "False",
"type": "NamespaceDeletionDiscoveryFailure"
},
{
"lastTransitionTime": "2020-10-02T08:11:49Z",
"message": "All legacy kube types successfully parsed",
"reason": "ParsedGroupVersions",
"status": "False",
"type": "NamespaceDeletionGroupVersionParsingFailure"
},
{
"lastTransitionTime": "2020-10-02T08:11:49Z",
"message": "All content successfully deleted, may be waiting on finalization",
"reason": "ContentDeleted",
"status": "False",
"type": "NamespaceDeletionContentFailure"
},
{
"lastTransitionTime": "2020-10-02T08:11:49Z",
"message": "All content successfully removed",
"reason": "ContentRemoved",
"status": "False",
"type": "NamespaceContentRemaining"
},
{
"lastTransitionTime": "2020-10-02T08:11:49Z",
"message": "All content-preserving finalizers finished",
"reason": "ContentHasNoFinalizers",
"status": "False",
"type": "NamespaceFinalizersRemaining"
}
],
"phase": "Terminating"
}
# sudo kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:32:58Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
You can use etcdctl to find undeleted resources:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
get /registry --prefix | grep <namespace>
Just copy and paste in your terminal
for NS in $(kubectl get ns 2>/dev/null | grep Terminating | cut -f1 -d ' '); do
kubectl get ns $NS -o json > /tmp/$NS.json
sed -i '' "s/\"kubernetes\"//g" /tmp/$NS.json
kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f /tmp/$NS.json
done
This worked for me, and I ran it after verifying there were no dangling k8s objects in the ns. Thanks!
I used this to remove a namespace stuck at Terminating.
example :
kubectl get namespace openebs -o json | jq -j '.spec.finalizers=null' > tmp.json
kubectl replace --raw "/api/v1/namespaces/openebs/finalize" -f ./tmp.json
For all the googlers who bumped into stuck namespaces at Terminating on Rancher specific namespaces (e.g cattle-system), the following modified command (grebois's original) worked for me:
for NS in $(kubectl get ns 2>/dev/null | grep Terminating | cut -f1 -d ' '); do
kubectl get ns $NS -o json > /tmp/$NS.json
sed -i "s/\"controller.cattle.io\/namespace-auth\"//g" /tmp/$NS.json
kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f /tmp/$NS.json
done
Folks, just FYI, when the video for this kubecon talk is out I plan to link to it and some of the helpful comments above, and lock this issue.
I recorded a 10 minute explanation of what is going on and presented it at this SIG Deep Dive session.
Here's a correct comment with 65 upvotes
Mentioned several times above, this medium post is an example of doing things the right way. Find and fix the broken api service.
All the one liners that just remove the finalizers on the namespace do not address the root cause and leave your cluster subtly broken, which will bite you later. So please don't do that. The root cause fix is usually easier anyway. It seems that people like to post variations on this theme even though there are numerous correct answers in the thread already, so I'm going to lock the issue now, to ensure that this comment stays at the bottom.
Most helpful comment
@ManifoldFR, I had the same issue as yours and I managed to make it work by making an API call with a json file.
kubectl get namespace annoying-namespace-to-delete -o json > tmp.json
then edit tmp.json and remove
"kubernetes"
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://kubernetes-cluster-ip/api/v1/namespaces/annoying-namespace-to-delete/finalize
and it should delete your namespace.