Real-life Kubernetes Troubleshooting


Kubernetes is a powerful container orchestration system that can automate the deployment, scaling, and management of containerized applications. However, like any complex system, things can sometimes go wrong. In this blog post, we will discuss 20 real-life Kubernetes troubleshooting tips that can help you diagnose and fix common issues. We will also provide Bash code snippets to help you implement these tips in practice.

1. Check Kubernetes Components

The first step in troubleshooting Kubernetes is to check the health of the control plane components. You can use the following command (note that componentstatuses is deprecated as of Kubernetes 1.19, though many clusters still serve it):

kubectl get componentstatuses
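On clusters where componentstatuses is no longer served, the API server's health endpoints are a handy alternative:

kubectl get --raw='/readyz?verbose'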

2. Check Pod Status

If a pod is not running, you can use the following Bash code to check its status:

kubectl get pods
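To surface only problem pods across all namespaces, a field selector helps; kubectl describe then shows the events explaining why a pod is stuck:

kubectl get pods -A --field-selector=status.phase!=Running
kubectl describe pod <pod-name>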

3. Check Pod Logs

If a pod is running but you are experiencing issues, you can use the following Bash code to check its logs:

kubectl logs <pod-name>
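A few variants worth knowing: -c selects a container in multi-container pods, --previous shows logs from the last crashed instance, and -f streams the log live:

kubectl logs <pod-name> -c <container-name>
kubectl logs <pod-name> --previous
kubectl logs <pod-name> -f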

4. Check Node Status

If a node is not responding, you can use the following Bash code to check its status:

kubectl get nodes
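For an unhealthy node, describe it to see its conditions (Ready, MemoryPressure, DiskPressure, PIDPressure) and recent events:

kubectl describe node <node-name>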

5. Check Node Logs

If a node is having issues, you can use the following Bash code to check its logs:

journalctl -u kubelet
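To follow the kubelet log live, or narrow it to a recent window:

journalctl -u kubelet -f
journalctl -u kubelet --since "1 hour ago"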

6. Check API Server Logs

If the Kubernetes API server is having issues, you can use the following Bash code to check its logs:

journalctl -u kube-apiserver
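This works when the API server runs as a systemd service. On kubeadm-based clusters it runs as a static pod instead, so (assuming the standard kubeadm labels) fetch its logs via kubectl:

kubectl logs -n kube-system -l component=kube-apiserver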

7. Check etcd Logs

If the etcd datastore is having issues, you can use the following Bash code to check its logs:

journalctl -u etcd
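As with the API server, kubeadm clusters run etcd as a static pod, so the same pattern applies:

kubectl logs -n kube-system -l component=etcd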

8. Check the Service Status

If a service is not responding, you can use the following Bash code to check its status:

kubectl get services
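If a service exists but traffic goes nowhere, check that it actually selects healthy pods; an empty endpoints list usually means a selector/label mismatch:

kubectl describe service <service-name>
kubectl get endpoints <service-name>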

9. Check Network Policies

If you are experiencing network issues, you can use the following Bash code to check your network policies:

kubectl get networkpolicies
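To see exactly which pods and traffic a given policy selects:

kubectl describe networkpolicy <policy-name>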

10. Check Resource Limits

If you are experiencing performance issues, you can use the following Bash code to check the resource limits of your pods:

kubectl describe pod <pod-name>
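To pull out just the requests and limits, a jsonpath query is handy:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'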

11. Check Pod IP Addresses

If you are experiencing network issues, you can use the following Bash code to check the IP addresses of your pods:

kubectl get pods -o wide

12. Check Node IP Addresses

If you are experiencing network issues, you can use the following Bash code to check the IP addresses of your nodes:

kubectl get nodes -o wide

13. Check Pod DNS

If you are experiencing DNS issues, you can use the following Bash code to check the DNS configuration of your pods:

kubectl exec <pod-name> -- nslookup <service-name>
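If lookups fail, check that the cluster DNS pods (CoreDNS in most clusters) are healthy, using the conventional k8s-app=kube-dns label:

kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns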

14. Check Node DNS

If you are experiencing DNS issues, you can inspect the resolver configuration that the kubelet writes into your pods, which is derived from the node's and the cluster's DNS settings:

kubectl exec <pod-name> -- cat /etc/resolv.conf

15. Check Node Ports

If you are experiencing network issues, you can use the following Bash code to check the ports on your nodes:

nmap -p 22,80,443 <node-ip>
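The ports above are generic examples; for Kubernetes itself, the API server (6443), kubelet (10250), and the default NodePort range are usually more relevant (adjust if your cluster uses non-default ports):

nmap -p 6443,10250,30000-32767 <node-ip>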

16. Check Pod Ports

If you are experiencing network issues, you can test connectivity to a specific pod port by forwarding it to your local machine:

kubectl port-forward <pod-name> <local-port>:<pod-port>
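For example, to probe a pod that (for the sake of illustration) serves HTTP on port 80:

kubectl port-forward <pod-name> 8080:80 &
curl http://localhost:8080/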

17. Check Pod Environment Variables

If you are experiencing issues related to environment variables, you can use the following Bash code to check the environment variables of your pods:

kubectl exec <pod-name> -- env

18. Check Pod Configurations

If you are experiencing issues related to configurations, you can use the following Bash code to check the configurations of your pods:

kubectl describe pod <pod-name>

19. Check Pod Volumes

If you are experiencing issues related to volumes, you can use the following Bash code to check the volumes of your pods:

kubectl describe pod <pod-name>
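To list just the volume definitions, a jsonpath query works here too:

kubectl get pod <pod-name> -o jsonpath='{.spec.volumes}'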

20. Check Pod Resource Usage

If you are experiencing performance issues, you can use the following Bash code to check the resource usage of your pods:

kubectl top pod <pod-name>
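This requires metrics-server to be installed. Per-container and node-level views are also available:

kubectl top pod <pod-name> --containers
kubectl top nodes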

Resource metrics pipeline

Beyond kubectl top, you can query the metrics.k8s.io API (served by metrics-server) directly. The examples below use a minikube cluster; the curl variant assumes kubectl proxy is listening on port 8080:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes/minikube" | jq '.'
curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes/minikube
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-scheduler-minikube" | jq '.'
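If needed, start the proxy on the matching port first (kubectl proxy defaults to 8001):

kubectl proxy --port=8080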

Monitor Node Health

The node-problem-detector DaemonSet watches node logs and reports problems as node conditions and events. Save the following manifest as node-problem-detector.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-problem-detector-v0.1
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: node-problem-detector
      version: v0.1
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.1
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: true
      containers:
      - name: node-problem-detector
        image: registry.k8s.io/node-problem-detector:v0.1
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/

kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml
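Once applied, verify that a detector pod is running on each node (the label below matches the manifest above):

kubectl get pods -n kube-system -l k8s-app=node-problem-detector -o wide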

To customize what the detector monitors, change node-problem-detector.yaml to mount its configuration from a ConfigMap:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-problem-detector-v0.1
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: node-problem-detector
      version: v0.1
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.1
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: true
      containers:
      - name: node-problem-detector
        image: registry.k8s.io/node-problem-detector:v0.1
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
        - name: config # Overwrite the config/ directory with ConfigMap volume
          mountPath: /config
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/
      - name: config # Define ConfigMap volume
        configMap:
          name: node-problem-detector-config

kubectl apply -f https://k8s.io/examples/debug/node-problem-detector-configmap.yaml

Debugging Kubernetes Nodes with crictl

crictl talks directly to the container runtime over its CRI socket, which is useful when the kubelet or the API server is unhealthy. Run it on the affected node:

crictl pods
crictl pods --name <pod-name>
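A few more crictl commands that are often useful on a sick node (some setups require sudo or an explicit --runtime-endpoint flag):

crictl ps -a
crictl logs <container-id>
crictl inspect <container-id>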

Auditing

Kubernetes auditing produces a chronological record of requests to the API server, controlled by a policy file. An example policy:

apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # Log pod changes at RequestResponse level
  - level: RequestResponse
    resources:
    - group: ""
      # Resource "pods" doesn't match requests to any subresource of pods,
      # which is consistent with the RBAC policy.
      resources: ["pods"]
  # Log "pods/log", "pods/status" at Metadata level
  - level: Metadata
    resources:
    - group: ""
      resources: ["pods/log", "pods/status"]

  # Don't log requests to a configmap called "controller-leader"
  - level: None
    resources:
    - group: ""
      resources: ["configmaps"]
      resourceNames: ["controller-leader"]

  # Don't log watch requests by the "system:kube-proxy" on endpoints or services
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
    - group: "" # core API group
      resources: ["endpoints", "services"]

  # Don't log authenticated requests to certain non-resource URL paths.
  - level: None
    userGroups: ["system:authenticated"]
    nonResourceURLs:
    - "/api*" # Wildcard matching.
    - "/version"

  # Log the request body of configmap changes in kube-system.
  - level: Request
    resources:
    - group: "" # core API group
      resources: ["configmaps"]
    # This rule only applies to resources in the "kube-system" namespace.
    # The empty string "" can be used to select non-namespaced resources.
    namespaces: ["kube-system"]

  # Log configmap and secret changes in all other namespaces at the Metadata level.
  - level: Metadata
    resources:
    - group: "" # core API group
      resources: ["secrets", "configmaps"]

  # Log all other resources in core and extensions at the Request level.
  - level: Request
    resources:
    - group: "" # core API group
    - group: "extensions" # Version of group should NOT be included.

  # A catch-all rule to log all other requests at the Metadata level.
  - level: Metadata
    # Long-running requests like watches that fall under this rule will not
    # generate an audit event in RequestReceived.
    omitStages:
      - "RequestReceived"

Debugging Kubernetes Nodes with kubectl

kubectl debug creates an interactive debugging pod on the target node; the second command below lists the generated debugger pod:

kubectl debug node/mynode -it --image=ubuntu
kubectl get pods
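Inside the debug shell, the node's root filesystem is mounted at /host, so you can inspect the node as if logged in directly:

chroot /host

When finished, clean up the generated debugger pod:

kubectl delete pod <node-debugger-pod-name>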

Developing and Debugging Services Locally Using Telepresence

Telepresence proxies traffic between your workstation and the cluster, so you can develop a service locally against its remote dependencies.

Connecting your local machine to a remote Kubernetes cluster:

telepresence connect
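Once connected, cluster DNS names resolve from your machine. With Telepresence v2 you can also list interceptable workloads and route a service's traffic to a local process (the port mapping here is illustrative):

telepresence list
telepresence intercept <service-name> --port 8080:80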

Conclusion

Kubernetes is a complex system that can sometimes be challenging to troubleshoot. In this blog post, we discussed 20 real-life Kubernetes troubleshooting tips with Bash code snippets that can help you diagnose and fix common issues. By using these tips, you can quickly identify and resolve issues in your Kubernetes environment.

