
Kubernetes troubleshooting

Author: ElevenLord

Kubernetes is a powerful tool for managing containerized applications. However, as with any complex system, errors can occur when using it. When problems arise, it's important to have effective troubleshooting techniques and tools.

In this article, we'll take the following steps to get you started with event collection:

  • Retrieve the most recent events
  • Use pods to simulate problems
  • Store events in a pod located on a PV

Retrieve the most recent events

The first step in troubleshooting a Kubernetes cluster is to retrieve the latest events. Events in Kubernetes are generated by various components and objects in the cluster, such as pods, nodes, and services. They provide information about the status of the cluster and any issues that may occur.

To retrieve the latest events, you can use the kubectl get events command. This will display a list of all events in the cluster.

kubectl get events


LAST SEEN   TYPE      REASON                    OBJECT                                 MESSAGE
78s         Warning   BackOff                   pod/bbb                                Back-off restarting failed container
72s         Warning   BackOff                   pod/bbb2                               Back-off restarting failed container
12m         Normal    Pulling                   pod/bbb3                               Pulling image "busybox"
12m         Normal    Created                   pod/bbb3                               Created container bbb3
46m         Normal    Started                   pod/bbb3                               Started container bbb3           

As shown above, this displays a list of all events in the cluster. You can also add the -w flag to watch as new events arrive.

This will show the real-time status of the events that have occurred in the cluster. By observing events, you can quickly identify any issues that may have occurred.
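For example, to stream events from every namespace as they occur (this assumes kubectl is already configured against your cluster):

```shell
# -A covers all namespaces; -w keeps the connection open and
# prints new events as they arrive (Ctrl+C to stop)
kubectl get events -A -w
```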

While the kubectl get events command is helpful for retrieving events, its default output order can make it hard to spot the most recent problem. To make issues easier to identify, you can sort events by metadata.creationTimestamp.

kubectl get events --sort-by=.metadata.creationTimestamp
LAST SEEN   TYPE      REASON                    OBJECT                                 MESSAGE
104s        Normal    Pulling                   pod/busybox13                          Pulling image "busybox"
88s         Warning   FailedScheduling          pod/mysqldeployment-6f8b755598-phgzr   0/2 nodes are available: 2 Insufficient cpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
104s        Warning   BackOff                   pod/busybox6                           Back-off restarting failed container
82s         Warning   ProvisioningFailed        persistentvolumeclaim/pv-volume        storageclass.storage.k8s.io "csi-hostpath-sc" not found
82s         Warning   ProvisioningFailed        persistentvolumeclaim/pv-volume-2      storageclass.storage.k8s.io "csi-hostpath-sc" not found           

As shown above, all events in the cluster are listed ordered by metadata.creationTimestamp. Sequencing events this way lets you quickly identify recent events and any issues they reveal.
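On a busy cluster, filtering helps as much as sorting. kubectl supports field selectors on events; for example, to list only warnings:

```shell
# Show only Warning events, oldest first
kubectl get events --field-selector type=Warning --sort-by=.metadata.creationTimestamp
```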

Use pods to simulate problems

A simple way to simulate an issue related to networking or service discovery is to terminate a kube-proxy pod. The kube-proxy pods handle service networking in the cluster, so deleting one lets you observe the events Kubernetes generates as it detects and repairs the disruption.

To terminate a kube-proxy pod, use the kubectl delete pod command. You'll need the pod's name, which you can find with the kubectl get pods command.

kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS      AGE
coredns-57575c5f89-66z2h          1/1     Running   1 (45h ago)   36d
coredns-57575c5f89-bcjdn          1/1     Running   1 (45h ago)   36d
etcd-k81                          1/1     Running   1 (45h ago)   36d
fluentd-elasticsearch-5fdvc       1/1     Running   2 (45h ago)   60d
fluentd-elasticsearch-wx6x9       1/1     Running   1 (45h ago)   60d
kube-apiserver-k81                1/1     Running   1 (45h ago)   36d
kube-controller-manager-k81       1/1     Running   2 (45h ago)   36d
kube-proxy-bqpb5                  1/1     Running   1 (45h ago)   36d
kube-proxy-q94sk                  1/1     Running   1 (45h ago)   36d
kube-scheduler-k81                1/1     Running   2 (45h ago)   36d
metrics-server-5c59ff65b6-s4kms   1/1     Running   2 (45h ago)   58d
weave-net-56pl2                   2/2     Running   3 (45h ago)   61d
weave-net-rml96                   2/2     Running   5 (45h ago)   62d           

As above, a list of all pods in the kube-system namespace is displayed, including the kube-proxy pods. Once you have the name of a kube-proxy pod, you can delete it using the kubectl delete pod command.

kubectl delete pod -n kube-system kube-proxy-q94sk           

This removes the kube-proxy pod from the kube-system namespace. Because kube-proxy runs as a DaemonSet, Kubernetes automatically creates a new pod to replace it.
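You can watch the replacement appear (this assumes the standard k8s-app=kube-proxy label that kubeadm applies to these pods):

```shell
# The DaemonSet controller schedules a new pod within seconds
kubectl get pods -n kube-system -l k8s-app=kube-proxy -w
```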

You can check for events with the following command:

kubectl get events -n=kube-system --sort-by=.metadata.creationTimestamp


LAST SEEN   TYPE     REASON             OBJECT                 MESSAGE
4m59s       Normal   Killing            pod/kube-proxy-bqpb5   Stopping container kube-proxy
4m58s       Normal   Scheduled          pod/kube-proxy-cbkx6   Successfully assigned kube-system/kube-proxy-cbkx6 to k82
4m58s       Normal   SuccessfulCreate   daemonset/kube-proxy   Created pod: kube-proxy-cbkx6
4m57s       Normal   Pulled             pod/kube-proxy-cbkx6   Container image "registry.k8s.io/kube-proxy:v1.24.11" already present on machine           

Store events in a pod located on a PV

Storing events on a persistent volume (PV) from a dedicated pod is an effective way to keep a durable record of what happens in your Kubernetes cluster. Here's a step-by-step explanation of how to do it:

1. Add permissions to the pod

To connect to the Kubernetes API from a pod, the pod's service account needs the appropriate permissions. Here's an example of a YAML file that binds the cluster-admin role to the default service account (far broader than necessary; for production, bind a ClusterRole that only allows listing and watching events):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-logger
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: default           
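A quick way to sanity-check the binding is to impersonate the service account with kubectl auth can-i:

```shell
# Should print "yes" once the ClusterRoleBinding is applied
kubectl auth can-i list events --as=system:serviceaccount:default:default
```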

2. Create a Persistent Volume (PV) and a Persistent Volume Claim (PVC)

Now that we've set up the ClusterRoleBinding, we can create a persistent volume to store our events. Here's an example of YAML that uses hostPath to create a PV, along with a PVC that binds to it:

# pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data


---


# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  volumeName: my-pv           
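After applying both manifests, the claim should bind immediately, since the PVC names the PV explicitly via volumeName:

```shell
kubectl apply -f pv.yaml -f pvc.yaml
# STATUS should show "Bound"
kubectl get pvc my-pvc
```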

3. Create a pod to collect events

Now that we've set up our PV and PVC, we can create a pod to collect events. Here's an example of a YAML file that creates a pod that connects to the Kubernetes API and appends all events to the file events.log.

apiVersion: v1
kind: Pod
metadata:
  name: event-logger
spec:
  containers:
  - name: event-logger
    image: alpine
    command: ["/bin/sh", "-c"]
    args:
    - |
      apk add --no-cache curl jq && while true; do
        EVENTS=$(curl -s -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://${KUBERNETES_SERVICE_HOST}/api/v1/events | jq -r '.items[]')
        if [ -n "$EVENTS" ]; then
          echo "$EVENTS" >> /pv/events.log
        fi
        sleep 10
      done
    volumeMounts:
    - name: event-log
      mountPath: /pv
    - name: sa-token
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      readOnly: true
  volumes:
  - name: event-log
    persistentVolumeClaim:
      claimName: my-pvc
  - name: sa-token
    projected:
      sources:
      - serviceAccountToken:
          path: token
          expirationSeconds: 7200
      - configMap:
          name: kube-root-ca.crt           

The pod runs a simple shell script: it installs curl and jq, authenticates to the Kubernetes API with its mounted service-account token (authorized by the event-logger ClusterRoleBinding), and appends all events to /pv/events.log every 10 seconds. Note that each iteration appends the full current event list, so the log will accumulate duplicates over time.
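Because the jq -r '.items[]' filter writes a stream of concatenated JSON objects rather than a JSON array, tools that expect an array need to re-slurp the stream. A small sketch for analyzing the log with jq (the count_warnings helper is our own naming; it assumes jq is installed wherever you run it):

```shell
# count_warnings FILE — count Warning events in a log that is a
# stream of JSON objects, as produced by `jq -r '.items[]'` above
count_warnings() {
  # -s slurps the whole stream into one array before filtering
  jq -s '[ .[] | select(.type == "Warning") ] | length' "$1"
}
```

For example, `count_warnings /pv/events.log` prints the number of Warning events collected so far.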

You can run the following command to check for events:

kubectl exec event-logger -- cat /pv/events.log           

By using these troubleshooting techniques and tools, you can keep your Kubernetes cluster healthy and running smoothly. Retrieving the latest events, simulating issues, and storing events on a PV are essential steps for effectively maintaining a cluster. As you become more experienced with Kubernetes, you can explore more advanced tools like Kibana, Prometheus, or Grafana for analyzing events, as well as centralized logging solutions like Elasticsearch or Fluentd.