Troubleshooting Project Quotas and Resource Limits
This guide helps L1 analysts diagnose issues where applications cannot scale or new resources cannot be created due to Namespace-level restrictions.
1. Resource Quota Exhaustion
Resource Quotas limit the total amount of CPU, Memory, and other resources a project can consume.
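Common Symptoms
- Forbidden Errors: "Error from server (Forbidden): pods 'xyz' is forbidden: exceeded quota."
- Scaling Failures: A deployment is scaled up, but the new pods never appear (not even in Pending state). The quota check rejects each Pod at creation time, so the failure is recorded as an event on the ReplicaSet rather than on a Pod.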
Resolution Steps
- Check Quota Status: Identify which specific resource (CPU, Memory, Pods, Services) has reached its limit.
oc get quota -n <namespace>
- Describe Quota Details: Compare the "Used" and "Hard" values. A resource whose "Used" value is at or near its "Hard" value is the one blocking new Pods.
oc describe quota <quota_name> -n <namespace>
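To judge how close a CPU quota is to exhaustion, the "Used" and "Hard" values can be expressed as a percentage. The following is a minimal sketch, not part of the oc tooling: the helper names cpu_to_millicores and cpu_quota_pct are hypothetical, the sample values are made up, and the inputs are assumed to be copied by hand from the oc describe quota output. It handles only whole cores ("2") and millicores ("500m").

```shell
#!/bin/sh
# Convert a CPU quantity ("500m" or whole cores such as "2") to millicores.
# Fractional core values such as "1.5" are not handled in this sketch.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Print the percentage of a CPU quota that is consumed (integer, rounded down).
# Arguments: USED HARD, as shown by "oc describe quota". HARD must be non-zero.
cpu_quota_pct() {
  used=$(cpu_to_millicores "$1")
  hard=$(cpu_to_millicores "$2")
  echo $(( used * 100 / hard ))
}

# Made-up example: 1500m used against a hard limit of 2 cores.
cpu_quota_pct 1500m 2   # prints 75
```

A result of 100 means the quota is fully consumed and any further CPU request in the namespace will be rejected.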
2. LimitRange Enforcement
LimitRanges set the minimum and maximum resource requests/limits for individual containers within a specific namespace. They also define default values for containers that do not specify their own resource requirements.
Common Symptoms
- Validation Errors: Attempts to create or scale a Pod fail with messages like:
"Pod is forbidden: maximum memory usage per Container is 512Mi, but limit is 1Gi."
- Missing Requests: Pod creation is blocked because the namespace requires explicit resource definitions:
"Pod is forbidden: CPU request for container is required."
- Unexpected Defaults: Pods run with different resource values than expected because the LimitRange applied its default values to containers that did not specify their own.
Resolution Steps
- View LimitRange Rules: Identify the constraints (min, max, and default) enforced within the namespace.
# List all LimitRanges in the namespace
oc get limitrange -n <namespace>
# View the specific constraints of a LimitRange
oc describe limitrange <limitrange_name> -n <namespace>
Action: Advise the application owner or developer to adjust the resources.requests and resources.limits sections in their Deployment/Pod YAML to comply with the boundaries identified above.
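Before handing the case to the application owner, it can help to confirm which value violates the LimitRange. The sketch below compares a container memory limit against the LimitRange maximum, using the values from the sample error message above. The helper names mem_to_mi and check_mem_limit are hypothetical, and only Mi and Gi suffixes are supported.

```shell
#!/bin/sh
# Convert a memory quantity with a Mi or Gi suffix to mebibytes.
# Other suffixes (Ki, M, G, plain bytes) are rejected in this sketch.
mem_to_mi() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# Report whether a container memory limit fits under the LimitRange maximum.
# Arguments: CONTAINER_LIMIT LIMITRANGE_MAX
check_mem_limit() {
  limit=$(mem_to_mi "$1") || return 1
  max=$(mem_to_mi "$2") || return 1
  if [ "$limit" -le "$max" ]; then
    echo "ok"
  else
    echo "exceeds max by $(( limit - max ))Mi"
  fi
}

check_mem_limit 1Gi 512Mi    # prints: exceeds max by 512Mi
check_mem_limit 256Mi 512Mi  # prints: ok
```

Once the offending value is known, the owner can correct it in the YAML, or, as one possible alternative, with oc set resources deployment/<deployment_name> -n <namespace> --limits=memory=512Mi, where 512Mi is the illustrative maximum from the error message.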
3. Application Rollbacks (Safe Recovery)
When a new deployment version fails due to misconfiguration, image issues, or application crashes, L1 support can perform a safe rollback to the last known working state to minimize downtime.
Resolution Steps
- View Deployment History: Check the list of previous revisions to identify the stable version of the workload.
# List all previous revisions for a deployment
oc rollout history deployment/<deployment_name> -n <namespace>
# View details of a specific revision (e.g., revision 2)
oc rollout history deployment/<deployment_name> -n <namespace> --revision=2
- Perform Rollback: Revert the deployment to the previous version to restore service.
# Roll back to the immediately preceding revision
oc rollout undo deployment/<deployment_name> -n <namespace>
# Roll back to a specific stable revision
oc rollout undo deployment/<deployment_name> -n <namespace> --to-revision=2
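A rollback is only complete once the reverted Pods are running. One way to confirm this is to follow the undo with oc rollout status, which waits until the rollout finishes or fails. The wrapper below is a minimal sketch under stated assumptions: the function name rollback_and_verify and the OC override variable are hypothetical conveniences, not part of the oc CLI.

```shell
#!/bin/sh
# OC can be overridden for a dry run, e.g. OC="echo oc"
# (a hypothetical convention used only in this sketch).
OC="${OC:-oc}"

# Undo the latest rollout, then wait until the deployment reports
# that the rollout has finished. Arguments: DEPLOYMENT NAMESPACE
rollback_and_verify() {
  deployment="$1"
  namespace="$2"
  $OC rollout undo "deployment/$deployment" -n "$namespace" || return 1
  $OC rollout status "deployment/$deployment" -n "$namespace"
}
```

Calling rollback_and_verify <deployment_name> <namespace> returns a non-zero exit code if either step fails, which is a reasonable signal to escalate instead of retrying.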
4. Project Request Failures
This scenario occurs when a user is unable to create a new Project (Namespace) via the OpenShift Web Console or the CLI, often resulting in "Access Denied" or timeout errors.
Resolution Steps
- Check Self-Provisioner Status: Verify if the user (or their associated group) has the necessary self-provisioner role to request new projects.
oc adm policy who-can create projectrequests
- Check Cluster Load and Templates: If project creation is significantly delayed or failing with internal errors, verify the status and existence of the global project-request template.
oc get template -n openshift-config
Escalation Criteria: If a project requires a permanent increase in Quota limits to accommodate new workloads, gather the current usage statistics (oc describe quota -n