Real-life Kubernetes Troubleshooting


Kubernetes is a powerful container orchestration system that can automate the deployment, scaling, and management of containerized applications. However, like any complex system, things can sometimes go wrong. In this blog post, we will discuss 20 real-life Kubernetes troubleshooting tips that can help you diagnose and fix common issues. We will also provide Bash code snippets to help you implement these tips in practice.

1. Check Kubernetes Components

The first step in troubleshooting Kubernetes is to check the health of the control plane components. You can use the following command (note that componentstatuses is deprecated as of Kubernetes 1.19, though many clusters still serve it):

kubectl get componentstatuses
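On clusters where componentstatuses is no longer served, the API server's health endpoints are a handy alternative:

kubectl get --raw='/readyz?verbose'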

2. Check Pod Status

If a pod is not running, you can use the following Bash code to check its status:

kubectl get pods
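To surface only problem pods across all namespaces, a field selector helps; kubectl describe then shows the events explaining why a pod is stuck:

kubectl get pods -A --field-selector=status.phase!=Running
kubectl describe pod <pod-name>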

3. Check Pod Logs

If a pod is running but you are experiencing issues, you can use the following Bash code to check its logs:

kubectl logs <pod-name>
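A few variants worth knowing: -c selects a container in multi-container pods, --previous shows logs from the last crashed instance, and -f streams the log live:

kubectl logs <pod-name> -c <container-name>
kubectl logs <pod-name> --previous
kubectl logs <pod-name> -f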

4. Check Node Status

If a node is not responding, you can use the following Bash code to check its status:

kubectl get nodes
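For an unhealthy node, describe it to see its conditions (Ready, MemoryPressure, DiskPressure, PIDPressure) and recent events:

kubectl describe node <node-name>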

5. Check Node Logs

If a node is having issues, you can use the following Bash code to check its logs:

journalctl -u kubelet
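To follow the kubelet log live, or narrow it to a recent window:

journalctl -u kubelet -f
journalctl -u kubelet --since "1 hour ago"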

6. Check API Server Logs

If the Kubernetes API server is having issues, you can use the following Bash code to check its logs:

journalctl -u kube-apiserver
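This works when the API server runs as a systemd service. On kubeadm-based clusters it runs as a static pod instead, so (assuming the standard kubeadm labels) fetch its logs via kubectl:

kubectl logs -n kube-system -l component=kube-apiserver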

7. Check etcd Logs

If the etcd datastore is having issues, you can use the following Bash code to check its logs:

journalctl -u etcd
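As with the API server, kubeadm clusters run etcd as a static pod, so the same pattern applies:

kubectl logs -n kube-system -l component=etcd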

8. Check the Service Status

If a service is not responding, you can use the following Bash code to check its status:

kubectl get services
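If a service exists but traffic goes nowhere, check that it actually selects healthy pods; an empty endpoints list usually means a selector/label mismatch:

kubectl describe service <service-name>
kubectl get endpoints <service-name>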

9. Check Network Policies

If you are experiencing network issues, you can use the following Bash code to check your network policies:

kubectl get networkpolicies
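To see exactly which pods and traffic a given policy selects:

kubectl describe networkpolicy <policy-name>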

10. Check Resource Limits

If you are experiencing performance issues, you can use the following Bash code to check the resource limits of your pods:

kubectl describe pod <pod-name>
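To pull out just the requests and limits, a jsonpath query is handy:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'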

11. Check Pod IP Addresses

If you are experiencing network issues, you can use the following Bash code to check the IP addresses of your pods:

kubectl get pods -o wide

12. Check Node IP Addresses

If you are experiencing network issues, you can use the following Bash code to check the IP addresses of your nodes:

kubectl get nodes -o wide

13. Check Pod DNS

If you are experiencing DNS issues, you can use the following Bash code to check the DNS configuration of your pods:

kubectl exec <pod-name> -- nslookup <service-name>
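If lookups fail, check that the cluster DNS pods (CoreDNS in most clusters) are healthy, using the conventional k8s-app=kube-dns label:

kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns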

14. Check Node DNS

If you are experiencing DNS issues, you can inspect the resolver configuration that the kubelet writes into your pods, which is derived from the node's and the cluster's DNS settings:

kubectl exec <pod-name> -- cat /etc/resolv.conf

15. Check Node Ports

If you are experiencing network issues, you can use the following Bash code to check the ports on your nodes:

nmap -p 22,80,443 <node-ip>
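The ports above are generic examples; for Kubernetes itself, the API server (6443), kubelet (10250), and the default NodePort range are usually more relevant (adjust if your cluster uses non-default ports):

nmap -p 6443,10250,30000-32767 <node-ip>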

16. Check Pod Ports

If you are experiencing network issues, you can test connectivity to a specific pod port by forwarding it to your local machine:

kubectl port-forward <pod-name> <local-port>:<pod-port>
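For example, to probe a pod that (for the sake of illustration) serves HTTP on port 80:

kubectl port-forward <pod-name> 8080:80 &
curl http://localhost:8080/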

17. Check Pod Environment Variables

If you are experiencing issues related to environment variables, you can use the following Bash code to check the environment variables of your pods:

kubectl exec <pod-name> -- env

18. Check Pod Configurations

If you are experiencing issues related to configurations, you can use the following Bash code to check the configurations of your pods:

kubectl describe pod <pod-name>

19. Check Pod Volumes

If you are experiencing issues related to volumes, you can use the following Bash code to check the volumes of your pods:

kubectl describe pod <pod-name>
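To list just the volume definitions, a jsonpath query works here too:

kubectl get pod <pod-name> -o jsonpath='{.spec.volumes}'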

20. Check Pod Resource Usage

If you are experiencing performance issues, you can use the following Bash code to check the resource usage of your pods:

kubectl top pod <pod-name>
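This requires metrics-server to be installed. Per-container and node-level views are also available:

kubectl top pod <pod-name> --containers
kubectl top nodes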

Resource metrics pipeline

Beyond kubectl top, you can query the metrics.k8s.io API (served by metrics-server) directly. The examples below use a minikube cluster; the curl variant assumes kubectl proxy is listening on port 8080:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes/minikube" | jq '.'
curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes/minikube
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-scheduler-minikube" | jq '.'
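If needed, start the proxy on the matching port first (kubectl proxy defaults to 8001):

kubectl proxy --port=8080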

Monitor Node Health

The node-problem-detector DaemonSet watches node logs and reports problems as node conditions and events. Save the following manifest as node-problem-detector.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-problem-detector-v0.1
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: node-problem-detector
      version: v0.1
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.1
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: true
      containers:
      - name: node-problem-detector
        image: registry.k8s.io/node-problem-detector:v0.1
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/

kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml
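Once applied, verify that a detector pod is running on each node (the label below matches the manifest above):

kubectl get pods -n kube-system -l k8s-app=node-problem-detector -o wide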

To customize what the detector monitors, change node-problem-detector.yaml to mount its configuration from a ConfigMap:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-problem-detector-v0.1
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: node-problem-detector
      version: v0.1
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.1
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: true
      containers:
      - name: node-problem-detector
        image: registry.k8s.io/node-problem-detector:v0.1
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
        - name: config # Overwrite the config/ directory with ConfigMap volume
          mountPath: /config
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/
      - name: config # Define ConfigMap volume
        configMap:
          name: node-problem-detector-config

kubectl apply -f https://k8s.io/examples/debug/node-problem-detector-configmap.yaml

Debugging Kubernetes Nodes with crictl

crictl talks directly to the container runtime over its CRI socket, which is useful when the kubelet or the API server is unhealthy. Run it on the affected node:

crictl pods
crictl pods --name <pod-name>
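A few more crictl commands that are often useful on a sick node (some setups require sudo or an explicit --runtime-endpoint flag):

crictl ps -a
crictl logs <container-id>
crictl inspect <container-id>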

Auditing

Kubernetes auditing produces a chronological record of requests to the API server, controlled by a policy file. An example policy:

apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # Log pod changes at RequestResponse level
  - level: RequestResponse
    resources:
    - group: ""
      # Resource "pods" doesn't match requests to any subresource of pods,
      # which is consistent with the RBAC policy.
      resources: ["pods"]
  # Log "pods/log", "pods/status" at Metadata level
  - level: Metadata
    resources:
    - group: ""
      resources: ["pods/log", "pods/status"]

  # Don't log requests to a configmap called "controller-leader"
  - level: None
    resources:
    - group: ""
      resources: ["configmaps"]
      resourceNames: ["controller-leader"]

  # Don't log watch requests by the "system:kube-proxy" on endpoints or services
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
    - group: "" # core API group
      resources: ["endpoints", "services"]

  # Don't log authenticated requests to certain non-resource URL paths.
  - level: None
    userGroups: ["system:authenticated"]
    nonResourceURLs:
    - "/api*" # Wildcard matching.
    - "/version"

  # Log the request body of configmap changes in kube-system.
  - level: Request
    resources:
    - group: "" # core API group
      resources: ["configmaps"]
    # This rule only applies to resources in the "kube-system" namespace.
    # The empty string "" can be used to select non-namespaced resources.
    namespaces: ["kube-system"]

  # Log configmap and secret changes in all other namespaces at the Metadata level.
  - level: Metadata
    resources:
    - group: "" # core API group
      resources: ["secrets", "configmaps"]

  # Log all other resources in core and extensions at the Request level.
  - level: Request
    resources:
    - group: "" # core API group
    - group: "extensions" # Version of group should NOT be included.

  # A catch-all rule to log all other requests at the Metadata level.
  - level: Metadata
    # Long-running requests like watches that fall under this rule will not
    # generate an audit event in RequestReceived.
    omitStages:
      - "RequestReceived"

Debugging Kubernetes Nodes with kubectl

kubectl debug creates an interactive debugging pod on the target node; the second command below lists the generated debugger pod:

kubectl debug node/mynode -it --image=ubuntu
kubectl get pods
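Inside the debug shell, the node's root filesystem is mounted at /host, so you can inspect the node as if logged in directly:

chroot /host

When finished, clean up the generated debugger pod:

kubectl delete pod <node-debugger-pod-name>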

Developing and Debugging Services Locally Using Telepresence

Telepresence proxies traffic between your workstation and the cluster, so you can develop a service locally against its remote dependencies.

Connecting your local machine to a remote Kubernetes cluster:

telepresence connect
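Once connected, cluster DNS names resolve from your machine. With Telepresence v2 you can also list interceptable workloads and route a service's traffic to a local process (the port mapping here is illustrative):

telepresence list
telepresence intercept <service-name> --port 8080:80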

Conclusion

Kubernetes is a complex system that can sometimes be challenging to troubleshoot. In this blog post, we discussed 20 real-life Kubernetes troubleshooting tips with Bash code snippets that can help you diagnose and fix common issues. By using these tips, you can quickly identify and resolve issues in your Kubernetes environment.

