Zombie Pods and Orphaned Resources
Over time, finished tasks and failed processes can leave behind "zombie" resources: objects that no longer do any work but still exist in the cluster. They take up space in etcd and API server memory, and they make monitoring dashboards difficult to read.
Common Symptoms
- Dashboard Clutter: Hundreds of Pods in Completed, Error, or Evicted states.
- Namespace Stuck: A namespace is deleted but remains in Terminating state for hours.
- Job Buildup: Old CronJobs leaving behind successful pods that are no longer needed.
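To gauge how much clutter a namespace actually holds before cleaning anything up, a small read-only helper along these lines may be useful. This is a minimal sketch, not part of the original runbook: the function name count_pods_by_status is hypothetical, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (an illustration, not from the runbook):
# count pods in a namespace, grouped by the STATUS column.
# Usage: count_pods_by_status <namespace>
count_pods_by_status() {
  # With --no-headers the columns are NAME READY STATUS RESTARTS AGE,
  # so STATUS is the third whitespace-separated field.
  oc get pods -n "$1" --no-headers \
    | awk '{ counts[$3]++ } END { for (s in counts) print counts[s], s }' \
    | sort -rn
}
```

Calling `count_pods_by_status <namespace>` prints one line per status with the largest count first, which makes it easy to see whether Completed, Error, or Evicted pods dominate.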
Resolution Steps
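- Clean Up Completed/Failed Pods: Remove pods in a namespace that have finished running. Pods shown as Completed are generally in the Succeeded phase, and Evicted pods are in the Failed phase, as are most pods shown as Error.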
# Delete all pods in 'Succeeded' or 'Failed' state in a namespace
oc delete pods --field-selector=status.phase==Succeeded -n <namespace>
oc delete pods --field-selector=status.phase==Failed -n <namespace>
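- Handle Terminating Namespaces: Identify which resources are blocking a namespace from being deleted.
# List every namespaced resource still present in a 'Terminating' namespace
# (--show-kind labels each object; --ignore-not-found hides empty resource types)
oc api-resources --verbs=list --namespaced -o name | xargs -n 1 oc get --show-kind --ignore-not-found -n <namespace>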
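- Clean Up Orphaned Jobs: Remove Jobs that have finished. Deleting a Job also deletes the pods it created.
# Delete jobs that have completed their execution
# (status.successful is the count of succeeded pods, so this matches jobs with exactly one)
oc delete jobs --field-selector=status.successful==1 -n <namespace>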
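Because the delete commands in the steps above act on everything the field selector matches, it can help to preview the result first. The sketch below wraps the same three commands with `--dry-run=client`, which reports what would be deleted without removing anything. The function name preview_cleanup is hypothetical and not part of the original runbook.

```shell
# Hypothetical helper (an illustration, not from the runbook):
# show which pods and jobs the cleanup commands would delete.
# Usage: preview_cleanup <namespace>
preview_cleanup() {
  # Same selectors as the runbook commands, but nothing is actually deleted.
  for phase in Succeeded Failed; do
    oc delete pods --field-selector="status.phase==${phase}" -n "$1" --dry-run=client
  done
  oc delete jobs --field-selector=status.successful==1 -n "$1" --dry-run=client
}
```

If the previewed list looks right, the original commands can then be run without the dry-run flag.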
Escalation Criteria:
1. If an etcd member is Unhealthy or the operator is Degraded, escalate to L2 Cluster Admins immediately. etcd issues can lead to total cluster failure.
2. If a namespace is stuck in Terminating and clearing resources doesn't work, do not attempt to "force delete" via the API; escalate to L2.
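Before escalating, it may help to capture the relevant status so L2 has something concrete to work from. The following is a read-only sketch under stated assumptions: the function names are hypothetical, the operator check assumes the cluster operator is named etcd (as on OpenShift 4), and it covers only the Degraded condition, not individual member health.

```shell
# Hypothetical helpers (an illustration, not from the runbook).
# They only read status and never modify or delete anything.

# Print the status (True/False/Unknown) of the etcd operator's Degraded condition.
etcd_operator_degraded() {
  oc get clusteroperator etcd \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}'
}

# Succeed (exit 0) only when the etcd operator reports Degraded=True.
needs_etcd_escalation() {
  [ "$(etcd_operator_degraded)" = "True" ]
}

# Print one "type=status: message" line per condition on a namespace.
# For a Terminating namespace these conditions usually name what remains.
# Usage: namespace_blockers <namespace>
namespace_blockers() {
  oc get namespace "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'
}
```

For example, `needs_etcd_escalation && echo "Escalate to L2"` flags the first criterion, and the output of `namespace_blockers <namespace>` can be attached to the escalation ticket for the second.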