Troubleshooting Networking and Route Connectivity

This guide outlines the diagnostic steps for resolving issues related to external access (Routes), internal communication (Services), and DNS resolution within OpenShift.

Initial Network Triage

When a user reports that an application is "unreachable," use these commands to locate the point of failure:

# Check if the Route is admitted and has a valid host
oc get route -n <namespace>

# Verify if the Service has active endpoints (Pods)
oc get endpoints <service_name> -n <namespace>

# Check the status of the Ingress Controller pods
oc get pods -n openshift-ingress

1. Route Connectivity Issues

The Route is the entry point for external traffic into the cluster.

Common Symptoms

404 Not Found: The request reached the router, but the router doesn't know where to send it.
503 Service Unavailable: The router knows the destination, but there are no healthy Pods to handle the request.
Connection Timeout: The traffic is likely being dropped by an external firewall or Load Balancer.
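
The symptom list above can be turned into a small lookup helper. The sketch below is illustrative only: the function name `classify_route_status` and the message wording are assumptions, not part of any OpenShift tooling, and the mapping simply mirrors the three symptoms listed here.

```shell
# Map an HTTP status code observed through a Route to the likely failure point.
# "000" is what curl's %{http_code} reports when no HTTP response was received
# (for example, when the connection timed out).
classify_route_status() {
  case "$1" in
    404)     echo "router reached, no matching route" ;;
    503)     echo "route matched, no healthy pods" ;;
    000)     echo "no response, check firewall or load balancer" ;;
    2??|3??) echo "route is serving traffic" ;;
    *)       echo "unclassified status: $1" ;;
  esac
}

# Example usage (route_host is a placeholder you supply):
#   code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "http://${route_host}/")
#   classify_route_status "$code"
```

Treat the result as a starting point for triage: an application can also return 404 or 503 on its own, so confirm against the Route and Endpoints before drawing conclusions.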

Resolution Steps

Verify Route Admission: Ensure the Route has been admitted by the router, that is, its Admitted condition is True.

oc describe route <route_name> -n <namespace>
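
For scripting, the admission state can also be read directly from the Route status instead of scanning the describe output. This is a minimal sketch: the function name `route_admitted_status` is hypothetical, and it assumes the Route reports a condition of type Admitted under `.status.ingress`.

```shell
# Print the status of the Admitted condition for each ingress entry of a Route.
# Arguments: route name, namespace.
route_admitted_status() {
  oc get route "$1" -n "$2" \
    -o jsonpath='{.status.ingress[*].conditions[?(@.type=="Admitted")].status}'
}

# Example usage:
#   route_admitted_status my-route my-namespace
```

An empty result suggests that no router has processed the Route yet, which is worth checking against the Ingress Controller pods.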

2. Service and Endpoint Failures

Services act as the internal load balancer for Pods.

Common Symptoms

Internal Communication Failure: Application components cannot reach each other using the Service DNS name.
No Endpoints: The Service exists, but the Endpoints list is <none>, meaning no ready Pods match the Service selector.

Resolution Steps

Check Selector Match: Verify that the labels defined in the Service selector match the labels applied to the Pods.

# View the Service selector
oc get service <service_name> -n <namespace> -o jsonpath='{.spec.selector}'

# View Pod labels to ensure they match the selector above
oc get pods --show-labels -n <namespace>
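
Rather than comparing labels by eye, the selector can be fed back to `oc get pods -l` so that only matching Pods are listed. The sketch below rests on assumptions: the helper names `selector_to_query` and `pods_for_service` are hypothetical, and it expects the selector to be printed as flat JSON such as {"app":"web"} (some older clients print a map[...] form instead, which this does not handle).

```shell
# Convert a selector printed as JSON, e.g. {"app":"web","tier":"frontend"},
# into a label query such as app=web,tier=frontend.
# Label keys and values cannot contain quotes, braces, colons, or spaces,
# so stripping those characters is safe for valid selectors.
selector_to_query() {
  printf '%s' "$1" | tr -d '{}" ' | tr ':' '='
}

# List the Pods that a Service's selector actually matches.
# Arguments: service name, namespace.
pods_for_service() {
  selector=$(oc get service "$1" -n "$2" -o jsonpath='{.spec.selector}')
  query=$(selector_to_query "$selector")
  if [ -z "$query" ]; then
    echo "service $1 has no selector" >&2
    return 1
  fi
  oc get pods -n "$2" -l "$query" --show-labels
}
```

If `pods_for_service` returns no Pods while the application Pods are running, the labels and the selector most likely disagree.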

3. DNS Resolution Issues

Inside the cluster, a Service resolves at a name of the form <service>.<namespace>.svc.cluster.local

Resolution Steps

Test CoreDNS: Verify the cluster-wide DNS operator is healthy and available.

oc get clusteroperator dns

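Internal Lookup Test: Run a debug pod to test resolution from the network stack of a specific node. A node's own resolver may not forward cluster.local queries to the cluster DNS service, so treat a failure here as a hint and confirm it from an application Pod.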

oc debug node/<node_name>

# Once inside the debug shell:
chroot /host
dig <service_name>.<namespace>.svc.cluster.local
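
The same lookup can be repeated from inside an application Pod, which exercises the resolver configuration the workload actually uses. This is a sketch under stated assumptions: the helper names `svc_fqdn` and `lookup_from_pod` are hypothetical, and the container image is assumed to provide `getent` (many minimal images do not).

```shell
# Build the in-cluster DNS name for a Service.
# Arguments: service name, namespace.
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# Resolve a Service name from inside an existing Pod.
# Arguments: pod name, pod namespace, service name, service namespace.
lookup_from_pod() {
  oc exec "$1" -n "$2" -- getent hosts "$(svc_fqdn "$3" "$4")"
}

# Example usage:
#   lookup_from_pod my-pod my-namespace my-service my-namespace
```

A lookup that succeeds from the Pod but fails from the node points at node-level resolver configuration rather than at cluster DNS.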

4. Network Policies (Egress/Ingress)

OpenShift uses NetworkPolicy objects to restrict traffic to and from Pods, both within a namespace and between namespaces.

Common Symptoms

Environment Inconsistency: Connectivity works in Development but fails in Production where security policies are typically stricter.
External Access Blocked: Specific pods cannot reach external databases, APIs, or services outside the cluster.

Resolution Steps

List Policies: Check if there are any Deny-All or highly restrictive policies currently active in the namespace.

oc get networkpolicy -n <namespace>
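
The policy list alone does not show which policies deny traffic by default. As a rough heuristic, a policy that selects every Pod, applies to ingress, and defines no ingress rules blocks all inbound traffic. The sketch below flags such policies; the function names are hypothetical, and the filter assumes an empty selector is printed as {} (or map[] by older clients).

```shell
# Read tab-separated lines of: name, policyTypes, podSelector, ingress rules.
# Print the names of policies that select every Pod, apply to ingress,
# and define no ingress rules (a default deny for inbound traffic).
flag_default_deny_ingress() {
  awk -F'\t' '
    ($3 == "{}" || $3 == "map[]") && index($2, "Ingress") && ($4 == "" || $4 == "[]") { print $1 }
  '
}

# Produce those lines for a namespace and run them through the filter.
# Argument: namespace.
list_default_deny_ingress() {
  oc get networkpolicy -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.policyTypes}{"\t"}{.spec.podSelector}{"\t"}{.spec.ingress}{"\n"}{end}' \
    | flag_default_deny_ingress
}
```

This only covers the ingress direction and ignores policies with narrower selectors, so inspect any flagged policy, and the allow rules layered on top of it, before changing anything.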

Audit Egress: If the pod needs to reach an external IP, ensure that neither an EgressNetworkPolicy (or an EgressFirewall on clusters that use OVN-Kubernetes) nor an external corporate firewall is blocking the traffic.

Escalation Criteria: If the Route and Service are correctly configured but the application remains unreachable via the corporate F5 or Load Balancer, escalate the incident to the L2 Networking/Infrastructure Team.