Describe the bug
We cannot detach volumes attached to deleted nodes in Trident 21.10.1. In Trident v21.07.2, these volumes would be automatically detached after a certain period. If I understand correctly, this force detachment is done by AttachDetachController after ReconcilerMaxWaitForUnmountDuration.
This behavior appears to have been introduced in this commit, which makes Trident's ControllerUnpublishVolume check that the node exists. If the node does not exist, ControllerUnpublishVolume now returns a NotFound error, so volume detachment always fails once the node has been deleted.
When a server fails, volume detachment might fail and we have no choice but to delete the node, so it is desirable for volumes attached to deleted nodes to be detached automatically.
Environment
silenceAutosupport: true
(Trident Operator)
To Reproduce
kubectl delete node
In the VolumeAttachment, the following error can be found.
rpc error: code = NotFound desc = node <NODE_NAME> was not found
Expected behavior
Trident automatically detaches volumes attached to deleted nodes.
We run a 100+ node Kubernetes cluster on AWS that relies heavily on spot nodes. Spot nodes are terminated with only a few minutes' warning on AWS, which is expected to happen quite often. Even though we run the Node Termination Handler in SQS mode and react to spot termination notifications with automatic node draining, we usually end up in a situation where the detach process doesn't finish before the node is deleted.
In this scenario we often encounter the exact same issue as described by @tksm. This is a severe problem, as workloads get stuck in a crash-looping state because the PVC fails to attach after the pod is moved to a new node. I hope the problem can be hotfixed.
Any ETA on a fix?
@paalkr, the team is currently working on a fix. We will update this issue with a link to the commit once it merges.
Excellent, thank you very much.