Kubernetes: deleting namespace stuck at "Terminating" state

Created on 5 Mar 2018  ·  180 Comments  ·  Source: kubernetes/kubernetes

I am using v1.8.4 and I am having the problem that a deleted namespace stays in the "Terminating" state forever. I already ran "kubectl delete namespace XXXX".

kind/bug priority/important-soon sig/api-machinery

Most helpful comment

@ManifoldFR , I had the same issue as yours and I managed to make it work by making an API call with a json file.
kubectl get namespace annoying-namespace-to-delete -o json > tmp.json
then edit tmp.json and remove "kubernetes" from the finalizers

curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://kubernetes-cluster-ip/api/v1/namespaces/annoying-namespace-to-delete/finalize

and it should delete your namespace.

All 180 comments

/sig api-machinery

@shean-guangchang Do you have some way to reproduce this?

And out of curiosity, are you using any CRDs? We faced this problem with TPRs previously.

/kind bug

I seem to be experiencing this issue with a rook deployment:

➜  tmp git:(master) ✗ kubectl delete namespace rook
Error from server (Conflict): Operation cannot be fulfilled on namespaces "rook": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.
➜  tmp git:(master) ✗ 

I think it does have something to do with their CRD, I see this in the API server logs:

E0314 07:28:18.284942       1 crd_finalizer.go:275] clusters.rook.io failed with: timed out waiting for the condition
E0314 07:28:18.287629       1 crd_finalizer.go:275] clusters.rook.io failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "clusters.rook.io": the object has been modified; please apply your changes to the latest version and try again

I've deployed rook to a different namespace now, but I'm not able to create the cluster CRD:

➜  tmp git:(master) ✗ cat rook/cluster.yaml 
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook-cluster
spec:
  dataDirHostPath: /var/lib/rook-cluster-store
➜  tmp git:(master) ✗ kubectl create -f rook/
Error from server (MethodNotAllowed): error when creating "rook/cluster.yaml": the server does not allow this method on the requested resource (post clusters.rook.io)

Seems like the CRD was never cleaned up:

➜  tmp git:(master) ✗ kubectl get customresourcedefinitions clusters.rook.io -o yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: 2018-02-28T06:27:45Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-03-14T07:36:10Z
  finalizers:
  - customresourcecleanup.apiextensions.k8s.io
  generation: 1
  name: clusters.rook.io
  resourceVersion: "9581429"
  selfLink: /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusters.rook.io
  uid: 7cd16376-1c50-11e8-b33e-aeba0276a0ce
spec:
  group: rook.io
  names:
    kind: Cluster
    listKind: ClusterList
    plural: clusters
    singular: cluster
  scope: Namespaced
  version: v1alpha1
status:
  acceptedNames:
    kind: Cluster
    listKind: ClusterList
    plural: clusters
    singular: cluster
  conditions:
  - lastTransitionTime: 2018-02-28T06:27:45Z
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: 2018-02-28T06:27:45Z
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  - lastTransitionTime: 2018-03-14T07:18:18Z
    message: CustomResource deletion is in progress
    reason: InstanceDeletionInProgress
    status: "True"
    type: Terminating
➜  tmp git:(master) ✗ 

I have a fission namespace in a similar state:

➜  tmp git:(master) ✗ kubectl delete namespace fission
Error from server (Conflict): Operation cannot be fulfilled on namespaces "fission": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.
➜  tmp git:(master) ✗ kubectl get pods -n fission     
NAME                          READY     STATUS        RESTARTS   AGE
storagesvc-7c5f67d6bd-72jcf   0/1       Terminating   0          8d
➜  tmp git:(master) ✗ kubectl delete pod/storagesvc-7c5f67d6bd-72jcf --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (NotFound): pods "storagesvc-7c5f67d6bd-72jcf" not found
➜  tmp git:(master) ✗ kubectl describe pod -n fission storagesvc-7c5f67d6bd-72jcf
Name:                      storagesvc-7c5f67d6bd-72jcf
Namespace:                 fission
Node:                      10.13.37.5/10.13.37.5
Start Time:                Tue, 06 Mar 2018 07:03:06 +0000
Labels:                    pod-template-hash=3719238268
                           svc=storagesvc
Annotations:               <none>
Status:                    Terminating (expires Wed, 14 Mar 2018 06:41:32 +0000)
Termination Grace Period:  30s
IP:                        10.244.2.240
Controlled By:             ReplicaSet/storagesvc-7c5f67d6bd
Containers:
  storagesvc:
    Container ID:  docker://3a1350f6e4871b1ced5c0e890e37087fc72ed2bc8410d60f9e9c26d06a40c457
    Image:         fission/fission-bundle:0.4.1
    Image ID:      docker-pullable://fission/fission-bundle@sha256:235cbcf2a98627cac9b0d0aae6e4ea4aac7b6e6a59d3d77aaaf812eacf9ef253
    Port:          <none>
    Command:
      /fission-bundle
    Args:
      --storageServicePort
      8000
      --filePath
      /fission
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /fission from fission-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from fission-svc-token-zmsxx (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  fission-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  fission-storage-pvc
    ReadOnly:   false
  fission-svc-token-zmsxx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  fission-svc-token-zmsxx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
➜  tmp git:(master) ✗ 

Fission also uses CRDs, however, they appear to be cleaned up.

@shean-guangchang - I had the same issue. I've deleted everything under the namespaces manually, deleted and purged everything from "helm" and restarted the master nodes one by one and it fixed the issue.

I imagine what I've encountered has something to do with "ark", "tiller" and Kubernetes all working together (I bootstrapped using helm and backed up using ark), so this may not be a Kubernetes issue per se. On the other hand, it was pretty much impossible to troubleshoot because there are no relevant logs.

if it is the rook one, take a look at this: https://github.com/rook/rook/issues/1488#issuecomment-365058080
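If the rook cluster CRD is what's stuck (as in the output above), one workaround that is often suggested (and matches the rook teardown doc quoted further down in this thread) is clearing the finalizer on the stuck object once the operator that would normally handle it is gone. A minimal sketch; verify the CRD name in your own cluster first:

kubectl patch crd clusters.rook.io -p '{"metadata":{"finalizers":[]}}' --type=merge

Only do this for objects whose controller is already gone; otherwise let the operator clean up its own resources.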

I guess that makes sense, but it seems buggy that it's possible to get a namespace into an undeletable state.

I have a similar environment (Ark & Helm) with @barakAtSoluto and have the same issue. Purging and restarting the masters didn't fix it for me though. Still stuck at terminating.

I had that too when trying to recreate the problem. I eventually had to create a new cluster....
Exclude default, kube-system, kube-public and all ark-related namespaces from backup and restore to prevent this from happening...

I'm also seeing this, on a cluster upgraded from 1.8.4 to 1.9.6. I don't even know which logs to look at.

The same issue on 1.10.1 :(

Same issue on 1.9.6

Edit: The namespace couldn't be deleted because of some pods hanging. I did a delete with --grace-period=0 --force on them all and after a couple of minutes the namespace was deleted as well.

Hey,

I've run into this over and over again, and most of the time it's some trouble with finalizers.

If a namespace is stuck, try kubectl get namespace XXX -o yaml and check if there is a finalizer on it. If so, edit the namespace and remove the finalizer (by passing an empty array), and then the namespace gets deleted.
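For example, to inspect the finalizers before editing (a sketch; XXX is a placeholder namespace name):

kubectl get namespace XXX -o jsonpath='{.spec.finalizers}'
kubectl edit namespace XXX   # then clear the spec.finalizers entry so it is an empty array

Note that, as reported further down in the thread, some setups only accept this change through the namespace /finalize subresource rather than a plain edit.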

@xetys Is it safe? In my case there is only one finalizer, named "kubernetes".

That's strange, I've never seen such a finalizer. I can only speak from my own experience: I did that several times in a production cluster and it's still alive.

Same issue on 1.10.5. I tried all advice in this issue without result. I was able to get rid of the pods, but the namespace is still hanging.

Actually, the namespace got deleted too after a while.

It would be good to understand what causes this behavior; the only finalizer I had is kubernetes. I also have dynamic webhooks; can these be related?

@xetys Well, finally I used your trick on the replicas inside that namespace. They had some custom finalizer that probably no longer existed, so I couldn't delete them. When I removed the references to that finalizer, they disappeared and so did the namespace. Thanks! :)

Same issue on an EKS 1.10.3 cluster:

Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-28T20:13:43Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Having the same problem on a bare metal cluster:

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

My namespace looks like so:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"name":"creneaux-app","namespace":""}}
  name: creneaux-app
spec:
  finalizers:
  - kubernetes

It's actually the second namespace I've had with this problem.

Try this to get the actual list of all things in your namespace: https://github.com/kubernetes/kubectl/issues/151#issuecomment-402003022

Then for each object do kubectl delete or kubectl edit to remove finalizers.
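The linked comment amounts to roughly this one-liner (a sketch): it enumerates every namespaced resource type and then lists whatever is left of each type in the namespace.

kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n <namespace>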

removing the initializer did the trick for me...

When I do kubectl edit namespace annoying-namespace-to-delete and remove the finalizers, they get re-added when I check with a kubectl get -o yaml.

Also, when trying what you suggested @adampl I get no output (removing --ignore-not-found confirms no resources are found in the namespace, of any type).

@ManifoldFR , I had the same issue as yours and I managed to make it work by making an API call with a json file.
kubectl get namespace annoying-namespace-to-delete -o json > tmp.json
then edit tmp.json and remove "kubernetes" from the finalizers

curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://kubernetes-cluster-ip/api/v1/namespaces/annoying-namespace-to-delete/finalize

and it should delete your namespace.
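For reference, a minimal sketch of those same steps using kubectl proxy and jq instead of hand-editing the file (the namespace name is a placeholder):

kubectl proxy &
kubectl get namespace annoying-namespace-to-delete -o json | jq '.spec.finalizers = []' > tmp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/annoying-namespace-to-delete/finalize

As later comments point out, this bypasses the namespace cleanup and can strand resources, so prefer fixing the broken apiservice or finalizer first.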

@slassh It worked! Should've thought about making an API call: thanks a lot! We shall sing your praises forever

Issue exists in v1.11.1. I had a stuck rancher/helm deployment of dokuwiki. I first had to force delete the pods as suggested by @siXor and then followed @slassh's advice. All good now.

@slassh how do I find the kubernetes-cluster-ip? I used the IP of one of the nodes the cluster is deployed on instead, and it reports a 404.

hi @jiuchongxiao, by kubernetes-cluster-ip I meant one of your master nodes' IPs.
Sorry if it's confusing!

If you start 'kubectl proxy' first you can direct the curl to http://127.0.0.1:8001/api/v1/namespaces/annoying-namespace-to-delete/finalize. I couldn't get authentication to work until I did it that way.

Good tip @2stacks. You just need to replace https with http.

I still see this issue in 1.11.2.

To give more context for reproducing, I saw this only with CRDs. By deleting the CRD object, I got into a weird state where the objects owned by it were not deleted. I didn't notice, so I issued a delete for the namespace. Then I deleted all the objects in the namespace with kubectl delete all --all -n my-namespace. At that point the namespace deletion got stuck. I hope this helps in some way.

By looking at the logs I just found out that this particular problem was related to the controller manager being unhealthy. In my case it was most likely not a bug. When the controller manager came up again everything was cleaned up correctly.

@slassh Perfect solution! thank you very much!!!!

I also see this issue with 1.10.x. I find @slassh's comment a workaround that only hides the real issue. Why are the namespaces stuck at Terminating?

We discovered the reason why deleting a namespace got stuck in our case (@palmerabollo)

When a namespace keeps the finalizer kubernetes, it means there is an internal problem with the API Server.

Run kubectl api-resources; if it returns an error like the following, it means that a custom API isn't reachable.

error: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Run kubectl get apiservices v1beta1.metrics.k8s.io -o yaml to check its status conditions.

status:
  conditions:
  - lastTransitionTime: 2018-10-15T08:14:15Z
    message: endpoints for service/metrics-server in "kube-system" have no addresses
    reason: MissingEndpoints
    status: "False"
    type: Available

The above error is probably caused by a CrashLoopBackOff affecting metrics-server. It would be similar for other custom APIs registered in Kubernetes.

Check your services health in kube-system for restoring cluster runtime operations like deleting namespaces.
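A quick way to spot any such broken aggregated API at a glance (a sketch):

kubectl get apiservice | grep False   # anything unavailable here can block namespace cleanup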

I'm facing this issue on v1.11.3. As for the finalizers, only kubernetes is present on the problematic namespace.

spec:
  finalizers:
  - kubernetes

@slassh Thanks a million, your solution works well!
I have the same problem in my cluster with ark, tiller and kubed. I suspect the issue might be the kubed API giving an error, although I'm not sure why it impacts the deletion of another namespace.

@javierprovecho I was merely playing around with the metrics server, and since it wasn't working I tried to delete the service and whatnot, but my namespace still won't delete; the error is

status:
  conditions:
  - lastTransitionTime: 2018-08-24T08:59:36Z
    message: service/metrics-server in "kube-system" is not present
    reason: ServiceNotFound
    status: "False"
    type: Available

Do you know how to recover from this state?

Edit: I found out... I had to delete _everything_ even remotely related to metrics/HPA and then restart the entire control plane (had to take down all my replicas of it before booting them back up). This included deleting the apiservice v1beta1.metrics.k8s.io itself.

@2rs2ts

$ kubectl delete apiservice v1beta1.metrics.k8s.io

By getting rid of the non-functioning metrics API service the controller-manager will be able to delete the stale namespace(s).

Restarting the control plane is not necessary.

@antoineco no, it was necessary; I deleted the apiservice and waited quite a while but the namespace would not be deleted until I restarted the control plane.

first, take a small coffee and relax, now go to your k8s master nodes

  1. kubectl cluster-info
     Kubernetes master is running at https://localhost:6443
     KubeDNS is running at https://localhost:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

     To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

  2. now run the kube-proxy
     kubectl proxy &
     Starting to serve on 127.0.0.1:8001

     save the PID so you can kill it later on :)

  3. find your namespace that decided not to be deleted :) for us it will be cattle-system
     kubectl get ns
     cattle-system Terminating 1d

  4. put it in a file
     kubectl get namespace cattle-system -o json > tmp.json

  5. edit the file and remove the finalizers
     },
     "spec": {
       "finalizers": [
         "kubernetes"
       ]
     },
     after editing it should look like this 👍
     },
     "spec": {
       "finalizers": [
       ]
     },

we are almost there 👍

curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/${NAMESPACE}/finalize

and it's gone 👍

Hey, the finalizer kubernetes is there for a reason. For us it was a wrongly configured metrics API service name. Maybe for you it is something else, which you can discover by looking at your control plane logs. Without confirmation of a bug, removing the finalizer may produce undesired consequences, like leaving behind objects that can no longer be accessed for deletion.
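On a kubeadm-style cluster, one way to look at those control plane logs is something like this (a sketch; labels and pod names vary by distribution):

kubectl -n kube-system logs -l component=kube-controller-manager --tail=100
kubectl -n kube-system logs -l component=kube-apiserver --tail=100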

As this issue is still open:
within my minikube cluster running with the "none" driver, this happened after the host woke up from hibernation.

My assumption:
in my case hibernation triggered the same problems that an enabled swap would.

Which yields the question:
might swap be enabled in the affected clusters?

But this is just conjecture. The important thing for me, and anyone landing on this bug with my local setup: hibernation is bad for Kubernetes.

first, take a small coffee and relax, now go to your k8s master nodes [...]

Great!!
Works

I run into this issue periodically if we change our gcloud instances (e.g. upgrading nodes). This replaces the old node from gcloud instances list with a new one but leaves the pods in the k8s namespace hanging:

Reason:                    NodeLost
Message:                   Node <old node> which was running pod <pod> is unresponsive

This then leaves the pods in an unknown state:

$ kubectl get po
NAME                               READY     STATUS    RESTARTS   AGE
<pod>                              2/2       Unknown   0          39d

Due to this, the namespace will never finish terminating. Not sure if this means we should change our finalizers or if there's an actual bug related to terminating that should be handling pods in an UNKNOWN state (or if there should be a way of force terminating a namespace for cases like this).

I run into this issue periodically if we change our gcloud instances (e.g. upgrading nodes). [...]

Cool, it's not the same issue.
You need to put the nodes in maintenance mode first; once a node is in maintenance mode all pods will be evacuated, and then you can delete/upgrade it.
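In kubectl terms, "maintenance mode" is roughly cordon plus drain (a sketch; exact flags differ between versions):

kubectl cordon <node>
kubectl drain <node> --ignore-daemonsets --delete-local-data
# upgrade or replace the instance, then
kubectl uncordon <node>   # or kubectl delete node <node> if the instance was replaced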

Look at https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/:
edit the resource and delete metadata.finalizers, and delete any unused CRDs; then you can force-delete it.

But what does the kubernetes finalizer do exactly? Is there any risk that resources are not being correctly cleaned up with this hack?

curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://kubernetes-cluster-ip/api/v1/namespaces/annoying-namespace-to-delete/finalize

Error from server (NotFound): namespaces "annoying-namespace-to-delete" not found

first, take a small coffee and relax, now go to your k8s master nodes [...]

Invalid value: "The edited file failed validation": ValidationError(Namespace.spec): invalid type for io.k8s.api.core.v1.NamespaceSpec: got "string", expected "map"

If you have many namespaces stuck in Terminating, you can automate this:

kubectl get ns | grep Terminating | awk '{print $1}' | gxargs  -n1 -- bash -c 'kubectl get ns "$0" -o json | jq "del(.spec.finalizers[0])" > "$0.json"; curl -k -H "Content-Type: application/json" -X PUT --data-binary @"$0.json" "http://127.0.0.1:8001/api/v1/namespaces/$0/finalize" '

make sure that all namespaces from which you want the finalizer removed are indeed Terminating.

You need the kubectl proxy running and jq for the above to work.

In our case, the metrics API service was down and I could see this error with verbose logging:

kubectl delete ns <namespace-name> -v=7
.......
I0115 11:03:25.548299   12445 round_trippers.go:383] GET https://<api-server-url>/apis/metrics.k8s.io/v1beta1?timeout=32s
I0115 11:03:25.548317   12445 round_trippers.go:390] Request Headers:
I0115 11:03:25.548323   12445 round_trippers.go:393]     Accept: application/json, */*
I0115 11:03:25.548329   12445 round_trippers.go:393]     User-Agent: kubectl/v1.11.3 (darwin/amd64) kubernetes/a452946
I0115 11:03:25.580116   12445 round_trippers.go:408] Response Status: 503 Service Unavailable in 31 milliseconds

After fixing the metrics apiservice, the terminating namespaces completed.
I'm not really sure why deletion depends on the metrics apiservice; I'm also interested to know how it works if the metrics apiservice is not installed on the cluster.

Not really sure why deletion depends on metrics apiservice,

@manojbadam because metrics is registered with the api server, when performing a namespace deletion it must query that external api for (namespaced) resources to be deleted (if any exist) associated with that namespace. If the extension server isn't available, Kubernetes can't guarantee that all objects have been removed, and it doesn't have a persistent mechanism (in memory or on disk) to reconcile later, because the root object would have been removed. That happens with any registered api extension service.
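A quick way to check whether such an extension API is reachable at all (a sketch, using the metrics API as the example):

kubectl get --raw /apis/metrics.k8s.io/v1beta1          # an error or 503 here means the aggregated API is down
kubectl -n kube-system get endpoints metrics-server     # should list at least one address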

As I was constantly running into this, I automated this with a small shell script:

https://github.com/ctron/kill-kube-ns/blob/master/kill-kube-ns

It fetches the project, fixes the JSON, starts and properly stops "kubectl proxy", …

Thanks to everyone pointing me into the right direction!

As I was constantly running into this, I automated this with a small shell script: https://github.com/ctron/kill-kube-ns/blob/master/kill-kube-ns [...]

my hero! <3

I ran into this problem too. I'm on Google Kubernetes Engine and using Terraform to spin up Kubernetes clusters and to create namespaces and pods inside the cluster. The problem started a while after running a terraform destroy.

In my case, this turns out to be an issue with order in which Terraform executes the destroy. Terraform deletes the node pool first, and then deletes the namespaces and pods. But due to deleting the (only) node pool, the Kubernetes cluster broke, and that's what caused the namespace deletion to get stuck at "terminating" forever.

@FooBarWidget same problem for me :(

As I was constantly running into this, I automated this with a small shell script: https://github.com/ctron/kill-kube-ns/blob/master/kill-kube-ns [...]

[root@k8s-master ~]# curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://172.*****:6443/api/v1/namespaces/rook-ceph/finalize
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {

},
"status": "Failure",
"message": "namespaces "rook-ceph" is forbidden: User "system:anonymous" cannot update namespaces/finalize in the namespace "rook-ceph"",
"reason": "Forbidden",
"details": {
"name": "rook-ceph",
"kind": "namespaces"
},
"code": 403

I got a return code 403, what should I do :(


Thank god, the terminating namespace is finally gone. The following method did the trick for me.

NAMESPACE=rook-ceph
kubectl proxy &
kubectl get namespace $NAMESPACE -o json |jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize

I have the same issue but I don't see any metrics service.

I'm playing around with k8s from DigitalOcean and GitLab Auto DevOps. My assumption is it's some DigitalOcean blob storage, but I'm lost on how to analyse or fix it.

@mingxingshi Thanks. Doing an edit namespace didn't do the trick; your script did it.

Wow, finally got rid of it. Thanks for the commands @mingxingshi !

The solution for me was:

kubectl delete apiservice v1beta1.metrics.k8s.io

Just figured I should leave my experience of this here:

i was doing terraform apply with the following resource:

resource "helm_release" "servicer" {
  name      = "servicer-api"
  // n.b.: this is a local chart just for this terraform project
  chart     = "charts/servicer-api"
  namespace = "servicer"
  ...
}

but I am a helm newb and had a chart with a template in it that created a namespace called servicer. This caused terraform and k8s to get into a bad state where terraform would fail, and then k8s would leave the servicer namespace permanently in the Terminating state. Doing what @mingxingshi suggests above made the namespace terminate, as it had no resources attached to it.

This issue stopped happening for me when I removed the template that made the namespace and left it to helm to create it.

The problem is completely repeatable for me. First, clone the prometheus-operator. Then:

cd prometheus-operator/contrib/kube-prometheus
kubectl create -f manifests/ --validate=false
 ... wait ...
kubectl delete namespace monitoring

Hangs. If, however, I use kubectl delete -f manifests/, then cleanup is successful.

Yeah, had the same hang with prometheus-operator. I needed kubectl delete -f manifests/ to get unstuck.
I think there are some finalizers in the prometheus CRDs that are misbehaving; in this particular scenario it's hardly Kubernetes' fault. However, Kubernetes should make it easier to find the culprit, because the length of this thread demonstrates that there can be many causes and it's not easy to get to the bottom of it in each particular scenario.
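One way to spot CRDs whose own deletion is hanging on a finalizer (a sketch, requires jq):

kubectl get crd -o json | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | .metadata.name'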

I'm a kubernetes noob so I can't offer much info here but I also have 2 namespaces stuck in terminating status. My kubernetes setup is using IBM Cloud Private 3.1.2 Community Edition

kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4+icp", GitCommit:"3f5277fa129f05fea532de48284b8b01e3d1ab4e", GitTreeState:"clean", BuildDate:"2019-01-17T13:41:02Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4+icp", GitCommit:"3f5277fa129f05fea532de48284b8b01e3d1ab4e", GitTreeState:"clean", BuildDate:"2019-01-17T13:41:02Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

kubectl cluster-info
Kubernetes master is running at https://ip
catalog-ui is running at https://ip/api/v1/namespaces/kube-system/services/catalog-ui:catalog-ui/proxy
Heapster is running at https://ip/api/v1/namespaces/kube-system/services/heapster/proxy
image-manager is running at https://ip/api/v1/namespaces/kube-system/services/image-manager:image-manager/proxy
CoreDNS is running at https://ip/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
metrics-server is running at https://ip/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
platform-ui is running at https://ip/api/v1/namespaces/kube-system/services/platform-ui:platform-ui/proxy

kubectl get nodes
NAME          STATUS                     ROLES                          AGE   VERSION
ip1    Ready,SchedulingDisabled   etcd,management,master,proxy   23h   v1.12.4+icp
ip2   Ready                      worker                         23h   v1.12.4+icp
ip3   Ready                      worker                         23h   v1.12.4+icp

I have two namespaces stuck in the terminating state

kubectl get ns
NAME           STATUS        AGE
my-apps       Terminating   21h
cert-manager   Active        23h
default        Active        23h
istio-system   Active        23h
kube-public    Active        23h
kube-system    Active        23h
platform       Active        22h
psp-example    Terminating   18h
services       Active        22h

When I check the finalizers as described in this comment I only see kubernetes.

kubectl get ns my-apps -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"name":"my-apps"}}
  creationTimestamp: 2019-04-10T18:23:55Z
  deletionTimestamp: 2019-04-11T15:24:24Z
  name: my-apps
  resourceVersion: "134914"
  selfLink: /api/v1/namespaces/my-apps
  uid: ccb0398d-5bbd-11e9-a62f-005056ad5350
spec:
  finalizers:
  - kubernetes
status:
  phase: Terminating

Regardless, I tried removing kubernetes from the finalizers and it didn't work. I also tried using the json/api approach described in this comment. That also didn't work. I tried restarting all the nodes and that didn't work either.

I also tried doing the force delete, and that doesn't work either:

kubectl delete namespace my-apps --force --grace-period 0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (Conflict): Operation cannot be fulfilled on namespaces "my-apps": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.

In my case the namespace is rook-ceph, kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge works for me. For other cases it should work too.

From: https://github.com/rook/rook/blob/master/Documentation/ceph-teardown.md

@ManifoldFR , I had the same issue as yours and I managed to make it work by making an API call with a json file.
kubectl get namespace annoying-namespace-to-delete -o json > tmp.json
then edit tmp.json and remove "kubernetes" from the finalizers

curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://kubernetes-cluster-ip/api/v1/namespaces/annoying-namespace-to-delete/finalize

and it should delete your namespace.

I have some problems using your approach; what should I do next to troubleshoot?

~ curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json https://39.96.4.11:6443/api/v1/namespaces/istio-system/finalize
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "namespaces \"istio-system\" is forbidden: User \"system:anonymous\" cannot update resource \"namespaces/finalize\" in API group \"\" in the namespace \"istio-system\"",
  "reason": "Forbidden",
  "details": {
    "name": "istio-system",
    "kind": "namespaces"
  },
  "code": 403


My problem was solved by this script: https://github.com/ctron/kill-kube-ns/blob/master/kill-kube-ns.

yup https://github.com/ctron/kill-kube-ns/blob/master/kill-kube-ns does the trick

set -eo pipefail

die() { echo "$*" 1>&2 ; exit 1; }

need() {
    which "$1" &>/dev/null || die "Binary '$1' is missing but required"
}

# checking pre-reqs

need "jq"
need "curl"
need "kubectl"

PROJECT="$1"
shift

test -n "$PROJECT" || die "Missing arguments: kill-ns <namespace>"

kubectl proxy &>/dev/null &
PROXY_PID=$!
killproxy () {
    kill $PROXY_PID
}
trap killproxy EXIT

sleep 1 # give the proxy a second

kubectl get namespace "$PROJECT" -o json | jq 'del(.spec.finalizers[] | select("kubernetes"))' | curl -s -k -H "Content-Type: application/json" -X PUT -o /dev/null --data-binary @- http://localhost:8001/api/v1/namespaces/$PROJECT/finalize && echo "Killed namespace: $PROJECT"

It seems that the namespaces are actually not deleted.
In my case, kubectl get ns does not show the deleted namespace, but kubectl get all -n <namespace> shows all its resources safe and sound.
I checked on the nodes and the docker containers were still running...

@glouis that's because you bypassed the finalizers using the method above, so Kubernetes didn't have time to execute all those essential deletion tasks.

It's really sad to see so many people blindly advocating for this method without understanding its consequences. It's extremely ugly and can potentially leave tons of leftovers in the cluster. @javierprovecho already mentioned it above, and @liggitt also mentioned it in another GitHub issue.

You'd be better off fixing the broken v1beta1.metrics.k8s.io API service, or deleting it if you don't need it.

See also #73405

I second @antoineco's message. I tested this out on one of our sandbox environments because we were constantly getting stuck namespaces. After about a month all the docker daemons were freezing for no reason. It turns out we had created huge memory leaks by leaving resources behind.

After a lot of trial and error, and reading through these comments, it turned out to be a custom resource definition from the CoreOS Grafana stack for those namespaces. Listing the CRDs showed specific resources for the namespace. I was very lucky that the name of the CRD had the stuck namespace in it.

It also turned out that having one stuck namespace stops any further namespaces from being deleted. So even if you have a namespace A that has no CRDs getting it stuck, and there is a namespace B with a stuck CRD, all the resources in A will stick around until B is gone. I think I must have done the fix described above on namespace A, leaving a ton of resources around every time.

The thing that is still killing me is that I cannot for the life of me find any log mentioning a namespace cleanup failing on deleting a CRD, or even what it is currently doing. I had to spend an hour just figuring out which CRD it was stuck on. If anyone has an idea on how to get more info so I don't have to spend a huge amount of time figuring out the stuck resource, that would be awesome.

@jecafarelli good hint for production clusters. But unfortunately for me, I was just not able to kill it otherwise. I also knew I would recreate the whole cluster later on.

I tried analysing the issue, but nothing in this thread helped me to solve it by other means.

This official solution helped me: https://success.docker.com/article/kubernetes-namespace-stuck-in-terminating
This is not the same as kubectl edit namespace rook-ceph. I was unable to solve this problem until I sent a PUT request with the _"finalizers"_ removed.

OK, so I ran into this again with coreos, and I dug a bit deeper. This is most definitely because of a cluster-wide resource definition that is namespaced, and furthermore maybe it couldn't delete it because it can't query info on coreos. I did find errors in the apiserver logs that showed failures when trying to get information about an api group. I used the referenced issue above to come up with a quick script that lists the resources that got the ns stuck for me.

I'll probably just use this in the future if I run into it again, and keep adding any other namespaced resources I run into.

for ns in `kubectl get ns --field-selector status.phase=Terminating -o name | cut -d/ -f2`; 
do
  echo "apiservice under namespace $ns"
  kubectl get apiservice -o json |jq --arg ns "$ns" '.items[] |select(.spec.service.namespace != null) | select(.spec.service.namespace == $ns) | .metadata.name ' --raw-output
  echo "api resources under namespace $ns"
  for resource in `kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -o name -n $ns`; 
  do 
    echo $resource
  done;
done

Thanks a lot @jecafarelli, you helped me solve my issue the right way ;-)

I had installed cert-manager on an OpenShift cluster inside the cert-manager namespace and when I tried to delete this namespace, it got stuck in terminating state. Executing oc delete apiservices v1beta1.admission.certmanager.k8s.io seems to have solved the problem, the namespace is gone.


Same here, running kubectl delete -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.8/deploy/manifests/00-crds.yaml helped

Just chiming in to say I've also hit this error on version 1.13.6 with GKE. It happened after I disabled GKE's Istio addon with the goal of manually installing it for full control.
This is the longest issue thread I've ever taken the time to read through, and I'm blown away that there is no real consensus or set of reproduction steps for the root of this issue. It seems it can get tripped in so many different ways :(

The JSON and curl/proxy method mentioned numerous times above and documented at https://success.docker.com/article/kubernetes-namespace-stuck-in-terminating is what saved me.

The advice at https://success.docker.com/article/kubernetes-namespace-stuck-in-terminating is actively harmful, and can result in orphaned resources not getting cleaned up and resurfacing if a namespace with an identical name is later recreated.

There is work in progress to surface the specific cause of the hung delete, but the fundamental issue is that there are API types that cannot be verified to have been cleaned up, so the namespace deletion blocks until they are verified.

We also hit this with Knative which installs this namespaced apiservice.

---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  labels:
    autoscaling.knative.dev/metric-provider: custom-metrics
    serving.knative.dev/release: v0.7.1
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: autoscaler
    namespace: knative-serving
  version: v1beta1
  versionPriority: 100
---

After deleting it both the knative-serving ns and a bunch of other stuck namespaces cleaned up. Thanks to @jecafarelli for the above bash script.
Here's a terrible powershell version.

$api = kubectl get apiservice -o json  | convertfrom-json
#list out the namespaced api items; you can ignore kube-system
$api.items | % { $_.spec.service.namespace }
#replace knative-serving with whatever namespace you found
$api.items | ? { $_.spec.service.namespace -eq 'knative-serving'  } | ConvertTo-Json
#replace v1beta1.custom.metrics.k8s.io with whatever you found. 
k delete apiservice v1beta1.custom.metrics.k8s.io

I had the same problem today and this script worked for me.

@kubernetes/sig-api-machinery-misc

This bug has existed for > a year and is still a problem... What is your plan to address inbound issues such as this?

This could help with at least understanding what's going on: https://github.com/kubernetes/kubernetes/pull/80962

I am hitting the same issue

k get ns cdnamz-k8s-builder-system  -o yaml 
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"labels":{"control-plane":"controller-manager"},"name":"cdnamz-k8s-builder-system"}}
  creationTimestamp: "2019-08-05T18:38:21Z"
  deletionTimestamp: "2019-08-05T20:37:37Z"
  labels:
    control-plane: controller-manager
  name: cdnamz-k8s-builder-system
  resourceVersion: "5980028"
  selfLink: /api/v1/namespaces/cdnamz-k8s-builder-system
  uid: 3xxxxxxx
spec:
  finalizers:
  - kubernetes
status:
  phase: Terminating
 k get ns 
NAME                        STATUS        AGE
cdnamz-k8s-builder-system   Terminating   4h20m

The namespace controller should report conditions on the namespace status, and clients should surface them. Needs a KEP, but should be pretty straightforward if someone can take and validate it.
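Once such conditions are surfaced on the namespace status, they could be read with something like this (a sketch):

kubectl get namespace <stuck-namespace> -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.message}{"\n"}{end}'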

@timothysc there is (or was) a PR in flight (somewhere) doing exactly what @smarterclayton says.

I am pretty sure there is another github issue about this, too?

Here's a resource that helped me: https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.1.1/troubleshoot/ns_terminating.html

It's similar to the solution proposed by @slassh, but it uses kubectl proxy to create a local proxy and make the target IP of the curl command predictable.

--

Edit: as stated several times below this answer, this solution is a dirty hack and will possibly leave some dependent resources in the cluster. Use at your own risk, and possibly only use it as a quick way out in a development cluster (don't use it in a production cluster).

Removing the finalizer directly as described in the doc above can have consequences: the resources that were pending deletion will still be defined in the cluster even after the namespace has been released. This is the purpose of the finalizer: to ensure that all dependents are removed before allowing the deletion of the namespace.

Found workaround in similar questions:

NAMESPACE=<namespace-name>
kubectl proxy &
kubectl get namespace $NAMESPACE -o json |jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize


Thank you!
It works well.
I created a simple app using this workaround: https://github.com/jenoOvchi/k8sdelns
I use it for fast deletion and hope it will be helpful for someone.

On Kubernetes 1.12.2 namespaces are stuck in the Terminating state. Sometimes the finalizers can be removed by modifying the namespace yaml, but the namespace cannot be deleted through the api. Can it be deleted? What is going on? Has this been specifically tracked (prerequisite: there are no resources left in this ns)? I hope I can get some pointers, thank you!

Again, please do not remove the finalizer, it is there for a reason. Try to instead find out which resources in the NS are pending deletion by:

  • Checking if any apiservice is unavailable and hence doesn't serve its resources: kubectl get apiservice|grep False
  • Finding all resources that still exist via kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -n $your-ns-to-delete (Kudos to Jordan for that one)

The solution to this problem is not to short-circuit the cleanup mechanism, it's to find out what prevents cleanup from succeeding.


How right you are.
In my case the pod backing an Operator Framework apiservice had been deleted, which blocked the terminating process.
Removing the unused apiservice (kubectl delete apiservice ) solved the problem.

Hi all, code freeze is coming up in just a few days (Thursday, end of day, PST), so we need to make sure that this issue will be solved for v1.16 or moved to v1.17. Can you comment on its status?

Will this be backported into a current GKE release? I have a cluster that has a handful of namespaces that are still "Terminating".

@squarelover even after doing this? https://github.com/kubernetes/kubernetes/issues/60807#issuecomment-524772920

@josiahbjorgaard I just approved the PR, which is all we will be doing on this for 1.16.

It's merged. I think there may be more we can do, but please take future comments to #70916.


In many of these cases you might have metrics-server installed. When pods you deploy in a specific namespace are hooked into metric gathering, they hang on the metrics-server. So even after you delete all the resources in that namespace, metrics-server is somehow linked to that namespace, which will prevent you from deleting the namespace.
This post helps you identify the reason why you cannot delete the namespace, the right way.

Try this to get the actual list of all things in your namespace: kubernetes/kubectl#151 (comment)

Then for each object do kubectl delete or kubectl edit to remove finalizers.

This solution was useful for me, thanks.

Hi guys,

I made a script to make it easier to delete namespaces stuck in Terminating status: https://github.com/thyarles/knsk.

Thanks.

We ran into the same issue: when deleting a namespace, it gets stuck in the 'Terminating' state. I followed the steps above to remove 'kubernetes' from the finalizers in the yaml file. It works.

However, we don't know why we need to do these extra steps. kubectl delete ns foonamespace should just delete it. Can anyone give me a reason? Thank you!

Hello @xzhang007,

If you discover why the namespace deletion gets stuck in the Terminating state, please let me know. I searched for a good answer for a while, but found nothing. So I made a script to make my life easier until I discover and fix the cause.

Thank you.

@thyarles it seems I have not found an answer up to now.

In our case we discovered that one of the webhooks and finalizers we were
using was reaching out to a pod which lived in the namespace that got
deleted.
Once the pod got deleted, the termination was stuck.


@xzhang007 have you looked at the answer @alvaroaleman provided? For us that was enough to find out what the cause was.

Again, please do not remove the finalizer, it is there for a reason. [...]


Also, when this issue was closed, a new ticket was referenced to discuss how to make it clear why a namespace is stuck in Terminating. I suggest you take the conversation there instead of this closed issue.

It's merged. I think there may be more we can do, but please take future comments to #70916.

@jeff-knurek That should be the right way. Thank you.

In our case it was a botched upgrade of cert-manager which broke the finalizer. https://github.com/jetstack/cert-manager/issues/1582

$ kube get apiservice

NAME                                   SERVICE                                                     AVAILABLE                  AGE
v1.                                    Local                                                       True                       43d
v1.apps                                Local                                                       True                       43d
v1.authentication.k8s.io               Local                                                       True                       43d
v1.authorization.k8s.io                Local                                                       True                       43d
v1.autoscaling                         Local                                                       True                       43d
v1.batch                               Local                                                       True                       43d
v1.coordination.k8s.io                 Local                                                       True                       43d
v1.networking.k8s.io                   Local                                                       True                       43d
v1.rbac.authorization.k8s.io           Local                                                       True                       43d
v1.scheduling.k8s.io                   Local                                                       True                       43d
v1.storage.k8s.io                      Local                                                       True                       43d
v1alpha1.certmanager.k8s.io            Local                                                       True                       3d22h
v1alpha1.crd.k8s.amazonaws.com         Local                                                       True                       43d
v1beta1.admission.certmanager.k8s.io   cert-manager/cointainers-cointainers-cert-manager-webhook   False (MissingEndpoints)   60m
v1beta1.admissionregistration.k8s.io   Local                                                       True                       43d
v1beta1.apiextensions.k8s.io           Local                                                       True                       43d
v1beta1.apps                           Local                                                       True                       43d
v1beta1.authentication.k8s.io          Local                                                       True                       43d
v1beta1.authorization.k8s.io           Local                                                       True                       43d
v1beta1.batch                          Local                                                       True                       43d
v1beta1.certificates.k8s.io            Local                                                       True                       43d
v1beta1.coordination.k8s.io            Local                                                       True                       43d
v1beta1.events.k8s.io                  Local                                                       True                       43d
v1beta1.extensions                     Local                                                       True                       43d
v1beta1.networking.k8s.io              Local                                                       True                       43d
v1beta1.node.k8s.io                    Local                                                       True                       43d
v1beta1.policy                         Local                                                       True                       43d
v1beta1.rbac.authorization.k8s.io      Local                                                       True                       43d
v1beta1.scheduling.k8s.io              Local                                                       True                       43d
v1beta1.storage.k8s.io                 Local                                                       True                       43d
v1beta1.webhook.cert-manager.io        cert-manager/cointainers-cointainers-cert-manager-webhook   False (MissingEndpoints)   3d22h
v1beta2.apps                           Local                                                       True                       43d
v2beta1.autoscaling                    Local                                                       True                       43d
v2beta2.autoscaling                    Local                                                       True                       43d
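In a case like this, the two apiservices reporting False (MissingEndpoints) are what block the cleanup. As suggested earlier in the thread, deleting them (if they are no longer needed) unblocks it, e.g.:

kubectl delete apiservice v1beta1.admission.certmanager.k8s.io v1beta1.webhook.cert-manager.io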

Hi.
In my case the namespace gets stuck in Terminating as in https://github.com/rancher/rancher/issues/21546#issuecomment-553635629

Maybe it will help.

https://medium.com/@newtondev/how-to-fix-kubernetes-namespace-deleting-stuck-in-terminating-state-5ed75792647e

This worked like a champ for me

I also faced the same issue; now it is working fine for me. Please refer to the following document to solve your issue.

@zerkms well, sometimes it's legitimate advice, isn't it? Often the finalizers being waited on used to be served by objects that were deleted as part of the namespace deletion. In that case, since there is no point in waiting any longer - there is nothing left that could do the finalization - patching the objects the way the article describes _is the only option_.

Note that the article is applicable only if the issue _was not resolved_ by applying the steps listed in the Known Issues page linked at the top of the article, which is basically the advice that the comment you linked repeats.

@zerkms well, sometimes it's legitimate advice, isn't it? Often the finalizers being waited on used to be served by objects that were deleted as part of the namespace deletion

I've never seen that be true for a spec.finalizer on a namespace. Every instance I've seen has involved the namespace cleanup controller, and has either been caused by a persistent object in the namespace (which that advice would strand in etcd), or an unresponsive aggregated API (which removing the namespace spec.finalizer would skip waiting for, also stranding any persisted resources from that API)

The article does not warn that bypassing namespace finalization risks leaving namespaced resources stranded in storage; doing so is not recommended.

I've never seen that be true for a spec.finalizer on a namespace

Yep, that's right, this is because this finalizer is implemented by Kubernetes itself, but there can be other finalizers on objects inside that namespace, which may be implemented by components in that same namespace. One example that I encountered recently was https://appscode.com/products/stash/.

It puts finalizers on some of its CRDs which are to be serviced by the stash-operator deployment. But with stash-operator already deleted, there is nothing that can remove the finalizer mark from those CRDs and the namespace deletion gets stuck. In this case patching out those finalizers (not on the namespace itself, but on those objects) is the only sensible thing to do.
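For that situation, a minimal sketch of what "patching out those finalizers" can look like; `backups.example.com`, `my-backup` and `my-stuck-ns` are placeholders for whatever CRD object and namespace are actually stuck:

```
# Clear the finalizers on a stranded object inside the namespace (NOT on the namespace itself)
kubectl patch backups.example.com my-backup -n my-stuck-ns \
  --type merge -p '{"metadata":{"finalizers":[]}}'
```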

Hope it makes sense.

In this case patching out those finalizers (not on the namespace itself, but on those objects) is the only sensible thing to do.

Correct. I would not object to that in a "delete all resources" cleanup scenario, but that is not what the linked article walks through... it describes how to remove a spec.finalizer from the namespace.

first, take a small coffee and relax, now go to your k8s master nodes

kubectl cluster-info
Kubernetes master is running at https://localhost:6443
KubeDNS is running at https://localhost:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

now run kubectl proxy
kubectl proxy &
Starting to serve on 127.0.0.1:8001
save the PID so you can kill it later on :)

  1. find your namespace that decided not to be deleted :) for us it will be cattle-system
    kubectl get ns
    cattle-system Terminating 1d

put it in a file

  1. kubectl get namespace cattle-system -o json > tmp.json

edit the file and remove the finalizers
},
"spec": {
"finalizers": [
"kubernetes"
]
},
after editing it should look like this 👍
},
"spec": {
"finalizers": [
]
},
we're almost there 👍
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/${NAMESPACE}/finalize

and it's gone 👍

Again, please do not remove the finalizer, it is there for a reason. Instead, try to find out which resources in the NS are pending deletion by:

  • Checking if any apiservice is unavailable and hence doesn't serve its resources: kubectl get apiservice|grep False
  • Finding all resources that still exist via kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -n $your-ns-to-delete (Kudos to Jordan for that one)

The solution to this problem is not to short-circuit the cleanup mechanism, it's to find out what prevents cleanup from succeeding.
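Put together, the two checks above as a copy-pasteable sketch (`my-stuck-ns` is a placeholder for the namespace you are trying to delete):

```
# 1. Is any aggregated API unavailable?
kubectl get apiservice | grep False

# 2. Which namespaced resources still exist in the namespace?
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n my-stuck-ns
```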

Hey, guys! I followed the tips provided by @alvaroaleman and made a script that inspects and tries a clean deletion before doing a hard deletion of the stuck namespace.

What the script https://github.com/thyarles/knsk do:

  1. Check for unavailable apiresources and ask to delete them
  2. Check for pending resources in the namespace and ask to delete them
  3. Wait about 5 minutes to see if Kubernetes does a clean deletion after the script deletes anything
  4. Force deletion of the stuck namespace

Hope it helps.

@thyarles Thank you so much. I used your way to solve the problem.

Use $ kubectl get apiservices to check which services are unavailable, delete the ones whose AVAILABLE column is False with $ kubectl delete apiservice [service-name], and after that there are no more issues with deleting a namespace.

For our team, there were 3 unavailable apiservices: v1beta1.admission.certmanager.k8s.io, v1beta1.metrics.k8s.io, and v1beta1.webhook.certmanager.k8s.io.
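For reference, the two steps described above as commands (the apiservice name is one of the ones from this comment; substitute your own):

```
# Show only apiservices whose AVAILABLE column is False
kubectl get apiservices | grep False

# Delete an unavailable apiservice by name
kubectl delete apiservice v1beta1.admission.certmanager.k8s.io
```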

Note that your cluster is somewhat broken if the metrics apiserver isn't running, just removing the APIService doesn't actually fix the root cause.

@lavalamp the metrics apiservice is unavailable.

Yes, which means the metrics apiserver is not running, which means HPA doesn't work on your cluster, and probably other things, too.

Yes. HPA doesn't work now. I should not have deleted the metrics apiservice; I need to find a way to fix it instead.

@thyarles Thank you so much. I used your way to solve the problem.

Use $ kubectl get apiservices to check which services are unavailable, delete the ones whose AVAILABLE column is False with $ kubectl delete apiservice [service-name], and after that there are no more issues with deleting a namespace.

For our team, there were 3 unavailable apiservices: v1beta1.admission.certmanager.k8s.io, v1beta1.metrics.k8s.io, and v1beta1.webhook.certmanager.k8s.io.

@xzhang007 glad to hear! Now you must check why your v1beta1.metrics.k8s.io became broken. Check what it should look like:

```
$ kubectl -n kube-system get all | grep metrics

pod/metrics-server-64f74f8d47-r5vcq 2/2 Running 9 119d
service/metrics-server ClusterIP xx.xx.xx.xx 443/TCP 201d
deployment.apps/metrics-server 1/1 1 1 201d
replicaset.apps/metrics-server-55c7f68d94 0 0 0 165d
replicaset.apps/metrics-server-5c696bb6d7 0 0 0 201d
replicaset.apps/metrics-server-5cdb8bb4fb 0 0 0 201d
replicaset.apps/metrics-server-64f74f8d47 1 1 1 119d
replicaset.apps/metrics-server-686789bb4b 0 0 0 145d
```

$ kubectl -n kube-system get all | grep metrics
pod/metrics-server-5dcfd4dd9f-m2v9k 1/1 Running 0 2d20h
service/metrics-server ClusterIP xx.xx.xx.xx 443/TCP 27d
deployment.apps/metrics-server 1/1 1 1 27d
replicaset.apps/metrics-server-5dcfd4dd9f 1 1 1 27d
replicaset.apps/metrics-server-7fcf9cc98b 0 0 0 27d

Yes. HPA doesn't work now. I should not have deleted the metrics apiservice; I need to find a way to fix it instead.

@xzhang007 in fact it wasn't working before you noticed the problem... you just noticed because it left your deleted namespaces stuck. Just use a helm package manager to re-deploy your metrics-server, or call the command below to fix it (check the deployment file before applying):

$ curl https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/metrics-server-deployment.yaml | kubectl apply -f -

@slassh solution worked for me on Kubernetes 1.15

Delete v1beta1.metrics.k8s.io APIService

kubectl get ns ns-to-delete -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2020-01-08T05:36:52Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently
      unable to handle the request'
...
kubectl get APIService
...
v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (ServiceNotFound)
kubectl delete apiservice v1beta1.metrics.k8s.io

The cert-manager apiservice was unavailable, maybe because it was set up incorrectly, for example using the wrong annotation syntax for the ingress. For our system, it was

"certmanager.k8s.io/cluster-issuer": "letsencrypt-prod"

and it was changed to

"cert-manager.io/cluster-issuer": "letsencrypt-prod"

which made it available again.
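If the Ingress object already exists, the annotation can also be swapped in place; a sketch, where `my-ingress` and `my-namespace` are placeholder names:

```
# Add the correctly spelled annotation
kubectl annotate ingress my-ingress -n my-namespace \
  "cert-manager.io/cluster-issuer=letsencrypt-prod" --overwrite

# Remove the old, misspelled one (a trailing dash removes an annotation)
kubectl annotate ingress my-ingress -n my-namespace \
  "certmanager.k8s.io/cluster-issuer-"
```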

As mentioned before in this issue, there is another way to terminate a namespace using an API that is not exposed by kubectl, by using a modern version of kubectl where kubectl replace --raw is available (not sure from which version). This way you do not have to spawn a kubectl proxy process, and you avoid a dependency on curl (which is not available in some environments, like busybox). In the hope that this will help someone else, I left this here:

kubectl get namespace "stucked-namespace" -o json \
            | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
            | kubectl replace --raw /api/v1/namespaces/stucked-namespace/finalize -f -

Has it been established whether this is a fixable issue? There seem to be a lot of hacky solutions here but nothing addressing the underlying issue, which is that none of us can delete our namespaces....
I have this on an EKS v1.14 cluster

Has it been established whether this is a fixable issue? There seem to be a lot of hacky solutions here but nothing addressing the underlying issue, which is that none of us can delete our namespaces

The fundamental issue is that an aggregated API group in your cluster is unavailable. It is intentional that the namespace cleanup controller blocks until all APIs are available, so that it can verify all resources from all API groups are cleaned up for that namespace.
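One way to see which API group is blocking a particular namespace is to read the conditions the namespace controller writes onto the namespace itself; a sketch, with `my-stuck-ns` as a placeholder:

```
# A NamespaceDeletionDiscoveryFailure / DiscoveryFailed condition names the API group that is unavailable
kubectl get namespace my-stuck-ns \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
```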

for ppl trying to curl the API:

# Check all possible clusters, as your .KUBECONFIG may have multiple contexts:
kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'

# Select name of cluster you want to interact with from above output:
export CLUSTER_NAME="some_server_name"

# Point to the API server referring the cluster name
APISERVER=$(kubectl config view -o jsonpath="{.clusters[?(@.name==\"$CLUSTER_NAME\")].cluster.server}")

# Gets the token value
TOKEN=$(kubectl get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}"|base64 --decode)

# Explore the API with TOKEN
curl -X GET $APISERVER/api --header "Authorization: Bearer $TOKEN" --insecure

https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#without-kubectl-proxy

Here's a script to do this automatically. Needs jq:


#!/bin/bash

if [ -z "${1}" ] ; then
  echo -e "\nUsage: ${0} <name_of_the_namespace_to_remove_the_finalizer_from>\n"
  echo "Valid cluster names, based on your kube config:"
  kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'
  exit 1
fi

kubectl proxy --port=8901 &
PID=$!
sleep 1

echo -n 'Current context : '
kubectl config current-context 
read -p "Are you sure you want to remove the finalizer from namespace ${1}? Press Ctrl+C to abort."

kubectl get namespace "${1}" -o json \
            | jq '.spec.finalizers = [ ]' \
            | curl -k \
            -H "Content-Type: application/json" \
            -X PUT --data-binary @- "http://localhost:8901/api/v1/namespaces/${1}/finalize"

kill -15 $PID

Everyone: scripts to automate the finalizer removal do more harm than good. They may leave time-bombs in the aggregated apiserver(s) that aren't available; if someone recreates the namespace, suddenly a bunch of old objects may re-appear.

The real solution is:

$ kubectl get api-services

# something in the list is unavailable. Figure out what it is and fix it.

# ... the namespace lifecycle controller will finish deleting the namespace.

Everyone: scripts to automate the finalizer removal do more harm than good. They may leave time-bombs in the aggregated apiserver(s) that aren't available; if someone recreates the namespace, suddenly a bunch of old objects may re-appear.

The real solution is:

$ kubectl get api-services

# something in the list is unavailable. Figure out what it is and fix it.

# ... the namespace lifecycle controller will finish deleting the namespace.

https://github.com/thyarles/knsk

This script does all the checks and tries to do a clean deletion, including looking for orphaned resources. If the user wants to take a risk, the script offers a --force option to perform a non-recommended way of deletion.

typo, should be apiservices

This command shows the APIs that are not available:

kubectl get apiservices --template='{{range $api := .items}}{{range $api.status.conditions}}{{if eq .type "Available"}}{{$api.metadata.name}} {{.status}}{{"\n"}}{{end}}{{end}}{{end}}' | grep -i false

This article will surely be useful to you:

https://access.redhat.com/solutions/5038511

Actually what exists is a conflict in the apiservices; you can validate the health status of the APIs in OpenShift with:

oc get apiservices -o=custom-columns="name:.metadata.name,status:.status.conditions[0].status"

The API that fails will need to be restarted: restart the pod or the deployment that belongs to that API, and after that try to delete the namespace.

$ oc delete namespace

and done, problem fixed !!

Pretty disrespectful to use your own language in a place where everyone agrees to speak English. 👎

Where does everyone agree to speak English?




Done, excuse me, that was due to my haste; it is fixed now.

We have a multilingual user base; it's bad enough that none of our tools are internationalized, we can at least be nice here on GitHub, please.

@teoincontatto

As mentioned before in this issue, there is another way to terminate a namespace using an API that is not exposed by kubectl, by using a modern version of kubectl where kubectl replace --raw is available (not sure from which version). This way you do not have to spawn a kubectl proxy process, and you avoid a dependency on curl (which is not available in some environments, like busybox). In the hope that this will help someone else, I left this here:

kubectl get namespace "stucked-namespace" -o json \
            | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
            | kubectl replace --raw /api/v1/namespaces/stucked-namespace/finalize -f -

This worked perfectly!

We have a multilingual user base; it's bad enough that none of our tools are internationalized, we can at least be nice here on GitHub, please.

Still trying to understand. Forgive me. I may have clicked thumbs down by mistake.
Yes, indeed, the tools haven't been made to perfection.
Giving a thumbs down without an explanation doesn't make sense.

Almost every time I experience this issue, it's due to CRDs. Delete the CRDs if they are used only in that namespace, and then you can proceed with deleting the finalizer and the namespace.
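A hedged sketch of that check, where `clusters.example.com` stands in for whichever CRD is involved:

```
# Does the CRD still have instances outside the namespace being deleted?
kubectl get clusters.example.com --all-namespaces

# If it is only used by the namespace you are deleting, removing the CRD
# also removes all of its instances (and their finalizers) with it
kubectl delete crd clusters.example.com
```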

As mentioned before in this issue there is another way to terminate a namespace using API not exposed by kubectl by using a modern version of kubectl where kubectl replace --raw is available (not sure from which version). This way you will not have to spawn a kubectl proxy process and avoid dependency with curl (that in some environment like busybox is not available). In the hope that this will help someone else I left this here:

kubectl get namespace "stucked-namespace" -o json \
            | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
            | kubectl replace --raw /api/v1/namespaces/stucked-namespace/finalize -f -

@teoincontatto Thank you! This finally worked!

Sometimes just editing the resource manifest in place (I mean removing the finalizers field and saving) does not work very well.
So I got a new way from others.

kubectl get namespace linkerd -o json > linkerd.json

# Where:/api/v1/namespaces/<your_namespace_here>/finalize
kubectl replace --raw "/api/v1/namespaces/linkerd/finalize" -f ./linkerd.json

After running that command, the namespace should now be absent from your namespaces list. And it works for me.

This works not only for namespaces but also for other resources.

I fixed the problem by removing the finalizers lines using: kubectl edit ns annoying-ns

Hmm ... I have this problem right now :)
Today I did an update of my eks cluster from 1.15 to 1.16.
Everything looks fine so far.
But my development ns "configcluster" was kind of "damaged".
So I decided to clean it up.

k delete ns configcluster
....
now this hangs (3h +) :/

$ kubectl get namespace configcluster -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"name":"configcluster"}}
  creationTimestamp: "2020-06-19T06:40:15Z"
  deletionTimestamp: "2020-06-19T09:19:16Z"
  name: configcluster
  resourceVersion: "22598109"
  selfLink: /api/v1/namespaces/configcluster
  uid: e50f0b53-b21e-4e6e-8946-c0a0803f031b
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2020-06-19T09:19:21Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently
      unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2020-06-19T09:19:22Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2020-06-19T09:19:22Z"
    message: All content successfully deleted
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  phase: Terminating

How do we get more exposure to this thorn in the foot issue?


@bobhenkel well this issue is closed, so effectively this means that there is no issue (as far as any actionable items are concerned). If you need practical help with dealing with a similar situation, please read the thread above, there are some good pieces of advice there (and also some bad ones).

In my case, I had to manually delete my ingress load balancer from the GCP Network Service console. I had manually created the load balancer frontend directly in the console. Once I deleted the load balancer the namespace was automatically deleted.

I suspect that Kubernetes didn't want to delete it since the state of the load balancer was different from the state in the manifest.

I will try to automate the ingress frontend creation using annotations next to see if I can resolve this issue.
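A quick way to spot such leftover cloud-backed objects before deleting the namespace (a sketch; `my-stuck-ns` is a placeholder):

```
# Ingresses and Services of type LoadBalancer are the usual suspects for cloud resources
kubectl get ingress,services -n my-stuck-ns
```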

Sometimes just editing the resource manifest in place (I mean removing the finalizers field and saving) does not work very well.
So I got a new way from others.

kubectl get namespace linkerd -o json > linkerd.json

# Where:/api/v1/namespaces/<your_namespace_here>/finalize
kubectl replace --raw "/api/v1/namespaces/linkerd/finalize" -f ./linkerd.json

After running that command, the namespace should now be absent from your namespaces list. And it works for me.

This works not only for namespaces but also for other resources.

You are a star, it worked!

Sometimes just editing the resource manifest in place (I mean removing the finalizers field and saving) does not work very well.
So I got a new way from others.

kubectl get namespace linkerd -o json > linkerd.json

# Where:/api/v1/namespaces/<your_namespace_here>/finalize
kubectl replace --raw "/api/v1/namespaces/linkerd/finalize" -f ./linkerd.json

After running that command, the namespace should now be absent from your namespaces list. And it works for me.

This works not only for namespaces but also for other resources.

Tried a lot of solutions but this is the one that worked for me. Thank you!

This should really be the "accepted" answer - it completely resolved the root of this issue!

Taken from the link above:

This is not the right way, especially in a production environment.

Today I ran into the same problem. By removing the finalizer you’ll end up with leftovers in various states. You should actually find what is keeping the deletion from completing.

See https://github.com/kubernetes/kubernetes/issues/60807#issuecomment-524772920

(also, unfortunately, ‘kubectl get all’ does not report all things, you need to use commands similar to those in the link)

My case: deleting the ‘cert-manager’ namespace. In the output of ‘kubectl get apiservice -o yaml’ I found the APIService ‘v1beta1.admission.certmanager.k8s.io’ with status=False. This apiservice was part of cert-manager, which I had just deleted. So, within 10 seconds of my running ‘kubectl delete apiservice v1beta1.admission.certmanager.k8s.io’, the namespace disappeared.

Hope that helps.


With that being said, I wrote a little microservice to run as a CronJob every hour that automatically deletes Terminating namespaces.

You can find it here: https://github.com/oze4/service.remove-terminating-namespaces

Yet another oneliner:

for ns in $(kubectl get ns --field-selector status.phase=Terminating -o jsonpath='{.items[*].metadata.name}'); do  kubectl get ns $ns -ojson | jq '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/$ns/finalize" -f -; done

But deleting stuck namespaces is not a good solution. The right way is to find out why it is stuck. A very common reason is an unavailable API service, which prevents the cluster from finalizing namespaces.
For example, here I hadn't deleted Knative properly:

$ kubectl get apiservice|grep False
NAME                                   SERVICE                             AVAILABLE   AGE
v1beta1.custom.metrics.k8s.io          knative-serving/autoscaler          False (ServiceNotFound)   278d

Deleting it solved the problem

k delete apiservice v1beta1.custom.metrics.k8s.io
apiservice.apiregistration.k8s.io "v1beta1.custom.metrics.k8s.io" deleted
$  k create ns test2
namespace/test2 created
$ k delete ns test2
namespace "test2" deleted
$ kgns test2
Error from server (NotFound): namespaces "test2" not found  

I wrote a little microservice to run as a CronJob every hour that automatically deletes Terminating namespaces.

You can find it here: https://github.com/oze4/service.remove-terminating-namespaces

good job.

I had a similar issue on 1.18 in a lab k8s cluster and am adding a note that may help others. I had been working with the metrics API, and with custom metrics in particular. After deleting those k8s objects in order to recreate them, deleting the namespace stalled with an error that the metrics API endpoint could not be found. After putting that back in, in another namespace, everything cleared up immediately.

This was in the namespace under status.conditions.message:

Discovery failed for some groups, 4 failing: unable to retrieve the
complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently
unable to handle the request, custom.metrics.k8s.io/v1beta2: the server is currently
unable to handle the request, external.metrics.k8s.io/v1beta1: the server is
currently unable to handle the request, metrics.k8s.io/v1beta1: the server is

Yet another oneliner:

for ns in $(kubectl get ns --field-selector status.phase=Terminating -o jsonpath='{.items[*].metadata.name}'); do  kubectl get ns $ns -ojson | jq '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/$ns/finalize" -f -; done

But deleting stuck namespaces is not a good solution. The right way is to find out why it is stuck. A very common reason is an unavailable API service, which prevents the cluster from finalizing namespaces.
For example, here I hadn't deleted Knative properly:

$ kubectl get apiservice|grep False
NAME                                   SERVICE                             AVAILABLE   AGE
v1beta1.custom.metrics.k8s.io          knative-serving/autoscaler          False (ServiceNotFound)   278d

Deleting it solved the problem

k delete apiservice v1beta1.custom.metrics.k8s.io
apiservice.apiregistration.k8s.io "v1beta1.custom.metrics.k8s.io" deleted
$  k create ns test2
namespace/test2 created
$ k delete ns test2
namespace "test2" deleted
$ kgns test2
Error from server (NotFound): namespaces "test2" not found  

Definitely the cleanest one liner! It's important to note that none of these "solutions" actually solve the root issue.

See here for the correct solution

That is the message we should be spreading :smile: not "yet another one liner".

Definitely the cleanest one liner! It's important to note that none of these "solutions" actually solve the root issue.

This solution addresses only one of all the possible causes. To look for all possible root causes and fix them, I use this script: https://github.com/thyarles/knsk

@thyarles very nice!

Please do not just modify finalize to delete a namespace. That will cause an error:

image

Please find out the cause of the namespace terminating. Currently known troubleshooting directions (a sketch follows the list):

  • a pod stuck terminating
  • the cert-manager webhook blocking a secret
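A sketch for the first of those directions, with `my-stuck-ns` and `my-stuck-pod` as placeholder names:

```
# Are any pods in the namespace stuck terminating?
kubectl get pods -n my-stuck-ns

# Last resort for a pod that never finishes terminating (use with care)
kubectl delete pod my-stuck-pod -n my-stuck-ns --grace-period=0 --force
```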

I encountered the same problem:

# sudo kubectl get ns
NAME                   STATUS        AGE
cattle-global-data     Terminating   8d
cattle-global-nt       Terminating   8d
cattle-system          Terminating   8d
cert-manager           Active        8d
default                Active        10d
ingress-nginx          Terminating   9d
kube-node-lease        Active        10d
kube-public            Active        10d
kube-system            Active        10d
kubernetes-dashboard   Terminating   4d6h
local                  Active        8d
p-2sfgk                Active        8d
p-5kdx9                Active        8d
# sudo kubectl get all -n kubernetes-dashboard
No resources found in kubernetes-dashboard namespace.
# sudo kubectl get namespace kubernetes-dashboard  -o json 
{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "annotations": {
            "cattle.io/status": "{\"Conditions\":[{\"Type\":\"ResourceQuotaInit\",\"Status\":\"True\",\"Message\":\"\",\"LastUpdateTime\":\"2020-09-29T01:15:46Z\"},{\"Type\":\"InitialRolesPopulated\",\"Status\":\"True\",\"Message\":\"\",\"LastUpdateTime\":\"2020-09-29T01:15:46Z\"}]}",
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"kubernetes-dashboard\"}}\n",
            "lifecycle.cattle.io/create.namespace-auth": "true"
        },
        "creationTimestamp": "2020-09-29T01:15:45Z",
        "deletionGracePeriodSeconds": 0,
        "deletionTimestamp": "2020-10-02T07:59:52Z",
        "finalizers": [
            "controller.cattle.io/namespace-auth"
        ],
        "managedFields": [
            {
                "apiVersion": "v1",
                "fieldsType": "FieldsV1",
                "fieldsV1": {
                    "f:metadata": {
                        "f:annotations": {
                            "f:cattle.io/status": {},
                            "f:lifecycle.cattle.io/create.namespace-auth": {}
                        },
                        "f:finalizers": {
                            ".": {},
                            "v:\"controller.cattle.io/namespace-auth\"": {}
                        }
                    }
                },
                "manager": "Go-http-client",
                "operation": "Update",
                "time": "2020-09-29T01:15:45Z"
            },
            {
                "apiVersion": "v1",
                "fieldsType": "FieldsV1",
                "fieldsV1": {
                    "f:metadata": {
                        "f:annotations": {
                            ".": {},
                            "f:kubectl.kubernetes.io/last-applied-configuration": {}
                        }
                    }
                },
                "manager": "kubectl-client-side-apply",
                "operation": "Update",
                "time": "2020-09-29T01:15:45Z"
            },
            {
                "apiVersion": "v1",
                "fieldsType": "FieldsV1",
                "fieldsV1": {
                    "f:status": {
                        "f:phase": {}
                    }
                },
                "manager": "kube-controller-manager",
                "operation": "Update",
                "time": "2020-10-02T08:13:49Z"
            }
        ],
        "name": "kubernetes-dashboard",
        "resourceVersion": "3662184",
        "selfLink": "/api/v1/namespaces/kubernetes-dashboard",
        "uid": "f1944b81-038b-48c2-869d-5cae30864eaa"
    },
    "spec": {},
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2020-10-02T08:13:49Z",
                "message": "All resources successfully discovered",
                "reason": "ResourcesDiscovered",
                "status": "False",
                "type": "NamespaceDeletionDiscoveryFailure"
            },
            {
                "lastTransitionTime": "2020-10-02T08:11:49Z",
                "message": "All legacy kube types successfully parsed",
                "reason": "ParsedGroupVersions",
                "status": "False",
                "type": "NamespaceDeletionGroupVersionParsingFailure"
            },
            {
                "lastTransitionTime": "2020-10-02T08:11:49Z",
                "message": "All content successfully deleted, may be waiting on finalization",
                "reason": "ContentDeleted",
                "status": "False",
                "type": "NamespaceDeletionContentFailure"
            },
            {
                "lastTransitionTime": "2020-10-02T08:11:49Z",
                "message": "All content successfully removed",
                "reason": "ContentRemoved",
                "status": "False",
                "type": "NamespaceContentRemaining"
            },
            {
                "lastTransitionTime": "2020-10-02T08:11:49Z",
                "message": "All content-preserving finalizers finished",
                "reason": "ContentHasNoFinalizers",
                "status": "False",
                "type": "NamespaceFinalizersRemaining"
            }
        ],
        "phase": "Terminating"
    }
}

#  sudo kubectl version

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:32:58Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}

You can use etcdctl to find undeleted resources

ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
get /registry --prefix | grep <namespace>

Just copy and paste this into your terminal:

for NS in $(kubectl get ns 2>/dev/null | grep Terminating | cut -f1 -d ' '); do
  kubectl get ns $NS -o json > /tmp/$NS.json
  sed -i '' "s/\"kubernetes\"//g" /tmp/$NS.json
  kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f /tmp/$NS.json
done

this worked for me, and I ran after verifying there were no dangling k8s objects in the ns. Thanks!

I used this to remove a namespace stuck at Terminating

example :

kubectl get namespace openebs -o json | jq -j '.spec.finalizers=null' > tmp.json 
kubectl replace --raw "/api/v1/namespaces/openebs/finalize" -f ./tmp.json

For all the googlers who bumped into namespaces stuck at Terminating in Rancher-specific namespaces (e.g. cattle-system), the following modified command (grebois's original) worked for me:

for NS in $(kubectl get ns 2>/dev/null | grep Terminating | cut -f1 -d ' '); do
  kubectl get ns $NS -o json > /tmp/$NS.json
  sed -i "s/\"controller.cattle.io\/namespace-auth\"//g" /tmp/$NS.json
  kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f /tmp/$NS.json
done

Folks, just FYI, when the video for this kubecon talk is out I plan to link to it and some of the helpful comments above, and lock this issue.

I recorded a 10 minute explanation of what is going on and presented it at this SIG Deep Dive session.

Here's a correct comment with 65 upvotes

Mentioned several times above, this medium post is an example of doing things the right way. Find and fix the broken api service.

All the one liners that just remove the finalizers on the namespace do not address the root cause and leave your cluster subtly broken, which will bite you later. So please don't do that. The root cause fix is usually easier anyway. It seems that people like to post variations on this theme even though there's numerous correct answers in the thread already, so I'm going to lock the issue now, to ensure that this comment stays at the bottom.
