Troubleshooting OpenShift Container Platform 4.x: Authentication¶
Prerequisite - retrieve kubeconfig file to communicate with the API server of the cluster¶
Connect to the cluster as system:admin using the kubeconfig file generated at installation time, as described in the solution "OpenShift 4 system:admin kubeconfig file".
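A minimal sketch of this step, assuming the installation directory still contains the auth/kubeconfig file written by the installer. The function name use_install_kubeconfig is hypothetical; it only exports KUBECONFIG and confirms with oc whoami that the session runs as system:admin.

```shell
# Hypothetical helper: point oc at the installer-generated kubeconfig
# and confirm that the session is system:admin.
use_install_kubeconfig() {
  kubeconfig="$1/auth/kubeconfig"
  if [ ! -f "$kubeconfig" ]; then
    echo "no kubeconfig found at $kubeconfig" >&2
    return 1
  fi
  export KUBECONFIG="$kubeconfig"
  user=$(oc whoami)
  if [ "$user" != "system:admin" ]; then
    echo "expected system:admin, got '$user'" >&2
    return 1
  fi
  echo "connected as $user"
}
```

For example, run use_install_kubeconfig <installation_directory> in the shell used for the following commands. This kubeconfig authenticates with a client certificate rather than through the OAuth server, which is why it keeps working while oc login fails.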
Unable to log in due to EOF¶
The EOF error is a symptom of underlying components not working as expected, which in turn affects the behaviour of the authentication cluster operator. The most common causes are certificate signing requests (CSRs) left in Pending status or a cluster network (CNI) that is not healthy. The CLI returns an error similar to the following:
```text
$ oc login https://api.<clustername>.<domain>:6443 --insecure-skip-tls-verify=true -v=10
...
F1106 10:40:33.771570 6911 helpers.go:116] error: EOF
```
Pending CSR¶
Follow the prerequisite step to interact with the cluster.
Then, check whether any CSR is in Pending status:
```text
$ oc get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-244x8   13h   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
```
If any CSRs are in Pending status, approve each of them with the following command:
```text
$ oc adm certificate approve <CSR name>
```
NOTE: After approving, check again for new CSRs: once the client CSR of a node is approved, the node usually submits a serving CSR that must also be approved.
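When several nodes are affected, approving CSRs one by one is tedious. The sketch below, a hypothetical helper named approve_pending_csrs, selects the CSRs that have no status yet (the go-template filter the OpenShift documentation uses for pending requests), approves them, waits, and checks again as the note recommends. The three rounds and the 30-second default wait (CSR_WAIT_SECONDS) are arbitrary assumptions.

```shell
# Hypothetical helper: approve every Pending CSR, then check again,
# because approved kubelet client CSRs are usually followed by serving CSRs.
approve_pending_csrs() {
  for attempt in 1 2 3; do
    # CSRs without a .status have not been approved or denied yet.
    pending=$(oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}')
    if [ -z "$pending" ]; then
      echo "no pending CSRs (check $attempt)"
      return 0
    fi
    for csr in $pending; do
      oc adm certificate approve "$csr"
    done
    # Give the kubelets time to submit follow-up CSRs before the next check.
    sleep "${CSR_WAIT_SECONDS:-30}"
  done
  echo "CSRs still pending after 3 rounds; review them manually" >&2
  return 1
}
```

The helper approves whatever is pending, so look at the REQUESTOR column first if unexpected CSRs might be present.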
Networking issues - router pods are not running¶
Follow the prerequisite step to interact with the cluster. If no CSRs are waiting for approval, the issue can be caused by the network components. Verify the status of the router pods: they must be in Running status with no restarts, as in the following example:
```text
$ oc get pods -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-75d55f4fb4-c8dzg   1/1     Running   0          30h
router-default-75d55f4fb4-fd7gz   1/1     Running   0          30h
```
If their status is not Running, or if they show many restarts, describe the pods to look for explicit errors:
```text
$ oc describe pods <router pod name> -n openshift-ingress
```
Or look at the pod logs:
```text
$ oc logs <router pod name> -n openshift-ingress
```
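To spot problem pods without reading the whole table, the output can be filtered. A sketch with the hypothetical function unhealthy_pods, assuming the default oc get pods column layout (NAME READY STATUS RESTARTS AGE); it prints pods that are not Running, not fully ready, or have restarted at least once.

```shell
# Hypothetical helper: list pods in a namespace that are not Running,
# not fully ready (READY x/y with x != y), or have a non-zero restart count.
unhealthy_pods() {
  oc get pods -n "$1" --no-headers | awk '{
    split($2, ready, "/")
    # Field 4 is the restart count; some clients append "(5m ago)" after it.
    if ($3 != "Running" || ready[1] != ready[2] || $4 != "0")
      print $1, $2, $3, $4
  }'
}
```

unhealthy_pods openshift-ingress prints nothing when both routers are healthy; any pod it lists is a candidate for the describe and logs commands.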
Networking issues - SDN not healthy¶
Follow the prerequisite step to interact with the cluster. Then, verify the status of the SDN pods: they must be in Running status with no restarts, as in the following example:
```text
$ oc get pods -n openshift-sdn
NAME                   READY   STATUS    RESTARTS   AGE
ovs-4sbkl              1/1     Running   0          30h
ovs-5t5pn              1/1     Running   0          30h
ovs-dzrcc              1/1     Running   0          30h
ovs-fbqbs              1/1     Running   0          30h
ovs-gc2gh              1/1     Running   0          30h
ovs-mqbqp              1/1     Running   0          30h
sdn-88p7h              2/2     Running   0          30h
sdn-cjz4p              2/2     Running   0          30h
sdn-controller-5nkxv   1/1     Running   0          30h
sdn-controller-c92cn   1/1     Running   0          30h
sdn-controller-f6hbt   1/1     Running   0          30h
sdn-gskgx              2/2     Running   0          30h
sdn-mzpnx              2/2     Running   0          30h
sdn-q5pmh              2/2     Running   0          30h
sdn-qw6pt              2/2     Running   0          30h
```
If one or more of them are in CrashLoopBackOff status, first check whether the solution "OVS and SDN Pods in CrashLoopbackOff after upgrade to OpenShift 4.6" applies. Otherwise, follow the solution "Troubleshooting OpenShift Container Platform 4.x: openshift-sdn" to address the issue.
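The openshift-sdn pods only exist when the cluster network type is OpenShiftSDN; clusters using OVN-Kubernetes run their network pods in a different namespace. A minimal sketch, with the hypothetical name sdn_crashloop_pods, that first reads the network type from the cluster network configuration and then lists the SDN pods with a container waiting in CrashLoopBackOff, together with the node they run on. Seeing whether failures cluster on one node or affect all of them helps choose between the two solutions.

```shell
# Hypothetical helper: list openshift-sdn pods in CrashLoopBackOff and their nodes.
sdn_crashloop_pods() {
  nettype=$(oc get network.config/cluster -o jsonpath='{.status.networkType}')
  if [ "$nettype" != "OpenShiftSDN" ]; then
    echo "network type is ${nettype:-unknown}; the openshift-sdn checks do not apply"
    return 0
  fi
  # REASON may list one waiting reason per container (sdn pods run two containers).
  oc get pods -n openshift-sdn --no-headers \
    -o 'custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,REASON:.status.containerStatuses[*].state.waiting.reason' |
    awk '$3 ~ /CrashLoopBackOff/ { print $1, $2 }'
}
```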
Authentication Cluster Operator¶
Follow the prerequisite step to interact with the cluster. If the previous steps did not fix the issue, verify the status of the authentication pods: they must be in Running status with no restarts:
```text
$ oc get pods -n openshift-authentication
NAME                              READY   STATUS    RESTARTS   AGE
oauth-openshift-bf6f48656-4r6mz   1/1     Running   0          9m4s
oauth-openshift-bf6f48656-gztld   1/1     Running   0          8m58s
```
If their status is not Running, or if they show many restarts, describe the pods to look for explicit errors:
```text
$ oc describe pods <oauth pod name> -n openshift-authentication
```
Or look at the pod logs:
```text
$ oc logs <oauth pod name> -n openshift-authentication
```
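The oauth-openshift deployment normally runs more than one replica, and the failing request may have been served by any of them. A short sketch, with the hypothetical name oauth_logs, that prints the last lines of every authentication pod in turn; TAIL_LINES (default 50) is an arbitrary choice.

```shell
# Hypothetical helper: show the recent log lines of every pod in openshift-authentication.
oauth_logs() {
  # "-o name" returns entries such as pod/oauth-openshift-..., accepted by oc logs.
  for pod in $(oc get pods -n openshift-authentication -o name); do
    echo "=== $pod ==="
    oc logs -n openshift-authentication "$pod" --tail="${TAIL_LINES:-50}"
  done
}
```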
Also verify the authentication route; it must be present, as shown in this example:
```text
$ oc get route -n openshift-authentication
NAME              HOST/PORT                                      PATH   SERVICES          PORT   TERMINATION            WILDCARD
oauth-openshift   oauth-openshift.apps.<clustername>.<domain>          oauth-openshift   6443   passthrough/Redirect   None
```
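Beyond checking that the route exists, it can help to confirm that the OAuth server is reachable through it. A sketch with the hypothetical name oauth_route_check: it reads the host from the oauth-openshift route and requests /healthz with curl, assuming the OAuth server answers on that path. Certificate verification is skipped on purpose so that an expired or self-signed ingress certificate does not hide a connectivity problem.

```shell
# Hypothetical helper: print the OAuth route host and the HTTP status of /healthz.
oauth_route_check() {
  host=$(oc get route oauth-openshift -n openshift-authentication \
    -o jsonpath='{.spec.host}' 2>/dev/null) || host=""
  if [ -z "$host" ]; then
    echo "route oauth-openshift is missing"
    return 1
  fi
  echo "route host: $host"
  # -k: do not verify the certificate; --max-time: give up on a hanging connection.
  curl -k -s -o /dev/null --max-time 10 \
    -w 'healthz HTTP status: %{http_code}\n' "https://$host/healthz"
}
```

A status of 200 means the request went through the router to the OAuth server; 000 means no HTTP response was received at all, which points back to the networking checks.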
If the route is missing and the steps in the Pending CSR and Networking issues sections have already been carried out, check the authentication cluster operator for explicit errors:
```text
$ oc describe co authentication
$ oc logs -n openshift-authentication-operator $(oc get po -o name -n openshift-authentication-operator)
```
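oc describe co authentication prints every condition, including the healthy ones. A sketch, with the hypothetical name auth_operator_problems, that keeps only the conditions usually worth reading first (Degraded=True, Progressing=True and Available=False), each followed by its message.

```shell
# Hypothetical helper: print only the abnormal conditions of the authentication operator.
auth_operator_problems() {
  oc get co authentication \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}' |
    awk -F '\t' '
      ($1 == "Degraded" || $1 == "Progressing") && $2 == "True" { print $1 ": " $3; next }
      $1 == "Available" && $2 == "False" { print $1 ": " $3 }
    '
}
```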
Unable to log in due to an expired certificate¶
If the ingress certificate has expired, the cluster can no longer be accessed through the OpenShift Container Platform web console or the OpenShift CLI (oc), and an error similar to the following is returned:
```text
$ oc login -u kubeadmin https://api.cluster.example.com:6443
error: x509: certificate has expired or is not yet valid: current time 2021-10-21T08:33:38+01:00 is after 2021-09-20T19:48:38Z
```
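Before changing anything, it can be useful to confirm which certificate has expired. A minimal sketch, with the hypothetical name cert_status, that reads a PEM certificate on standard input and prints its notAfter date plus a valid/EXPIRED verdict based on openssl x509 -checkend 0.

```shell
# Hypothetical helper: report the expiry date of a PEM certificate read from stdin.
cert_status() {
  pem=$(cat)
  printf '%s\n' "$pem" | openssl x509 -noout -enddate
  # -checkend 0 exits non-zero if the certificate is already expired.
  if printf '%s\n' "$pem" | openssl x509 -noout -checkend 0 >/dev/null; then
    echo "status: valid"
  else
    echo "status: EXPIRED"
  fi
}
```

For example, pipe the output of echo | openssl s_client -connect oauth-openshift.apps.<clustername>.<domain>:443 -servername oauth-openshift.apps.<clustername>.<domain> into cert_status to inspect the certificate the ingress actually serves for the OAuth route.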
To fix this issue, first follow the prerequisite step (retrieve the kubeconfig file to communicate with the API server of the cluster) to gain cluster administrator access.
Then, determine how the ingress controller is configured. If the following command returns empty output:
```text
$ oc get ingresscontroller -n openshift-ingress-operator -o jsonpath='{.items[].spec.defaultCertificate}{"\n"}'
```
This means that the default ingress certificate is in use; in that case, follow the steps in Default ingress certificate.
Otherwise, the output contains the name of a secret, as in this example:
```text
$ oc get ingresscontroller -n openshift-ingress-operator -o jsonpath='{.items[].spec.defaultCertificate}{"\n"}'
{"name":"custom-cert"}
```
This means that a custom ingress certificate is in use; in that case, follow the steps in Custom ingress certificate.
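The two cases can also be checked in one step. A sketch, with the hypothetical name ingress_cert_enddate, that queries the default ingress controller directly instead of all items, falls back to router-certs-default (the secret holding the operator-generated default certificate) when spec.defaultCertificate is empty, and prints the expiry date of the certificate stored in that secret in the openshift-ingress namespace. It assumes base64 -d and openssl are available locally.

```shell
# Hypothetical helper: find the secret holding the default ingress certificate
# and print its expiry date.
ingress_cert_enddate() {
  secret=$(oc get ingresscontroller default -n openshift-ingress-operator \
    -o jsonpath='{.spec.defaultCertificate.name}')
  # Empty means no custom certificate is configured: the operator-generated one is used.
  secret="${secret:-router-certs-default}"
  echo "secret: openshift-ingress/$secret"
  oc get secret "$secret" -n openshift-ingress -o jsonpath='{.data.tls\.crt}' |
    base64 -d | openssl x509 -noout -enddate
}
```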
Default ingress certificate¶
The default ingress certificate is not rotated automatically, because it is expected to be replaced after cluster installation, as stated in the product documentation. To rotate it manually, follow the solution "How to redeploy the default ingress certificate in OCP 4".
Custom ingress certificate¶
For a custom certificate, replace it with a new one by following the documentation "Replacing the default ingress certificate".
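If the custom secret only needs new content, one common pattern is to regenerate the secret manifest on the client and replace the existing object, so spec.defaultCertificate keeps pointing at the same name. This is a sketch with the hypothetical name replace_custom_ingress_cert, assuming PEM files for the new certificate and key and an oc client that accepts --dry-run=client; the documentation above remains the reference procedure.

```shell
# Hypothetical helper: overwrite an existing TLS secret in openshift-ingress
# with a new certificate/key pair, keeping the secret name unchanged.
replace_custom_ingress_cert() {
  secret="$1" cert="$2" key="$3"
  # Render the secret locally, then replace the object already in the cluster.
  oc create secret tls "$secret" --cert="$cert" --key="$key" \
    -n openshift-ingress --dry-run=client -o yaml |
    oc replace -f -
}
```

For example: replace_custom_ingress_cert custom-cert /path/to/new.crt /path/to/new.key. If the new certificate is stored under a different secret name instead, the ingress controller must be pointed at it, as the documentation shows with oc patch ingresscontroller.operator default --type=merge -p '{"spec":{"defaultCertificate": {"name": "<secret>"}}}' -n openshift-ingress-operator.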