Table of contents
- 1. Check Kubernetes Components
- 2. Check Pod Status
- 3. Check Pod Logs
- 4. Check Node Status
- 5. Check Node Logs
- 6. Check API Server Logs
- 7. Check etcd Logs
- 8. Check the Service Status
- 9. Check Network Policies
- 10. Check Resource Limits
- 11. Check Pod IP Addresses
- 12. Check Node IP Addresses
- 13. Check Pod DNS
- 14. Check Node DNS
- 15. Check Node Ports
- 16. Check Pod Ports
- 17. Check Pod Environment Variables
- 18. Check Pod Configurations
- 19. Check Pod Volumes
- 20. Check Pod Resource Usage
- Resource metrics pipeline
- Monitor Node Health
- Debugging Kubernetes nodes with crictl
- Auditing
- Debugging Kubernetes Nodes With Kubectl
- Developing and debugging services locally using Telepresence
Kubernetes is a powerful container orchestration system that can automate the deployment, scaling, and management of containerized applications. However, like any complex system, things can sometimes go wrong. In this blog post, we will discuss 20 real-life Kubernetes troubleshooting tips that can help you diagnose and fix common issues. We will also provide Bash code snippets to help you implement these tips in practice.
1. Check Kubernetes Components
The first step in troubleshooting Kubernetes is to check the health of the Kubernetes components. You can use the following Bash code to check the status of the Kubernetes components:
kubectl get componentstatuses
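Note that the componentstatuses API has been deprecated since Kubernetes v1.19. On newer clusters you can instead query the API server's health endpoints directly, for example:
kubectl get --raw='/readyz?verbose'   # per-check readiness details
kubectl get --raw='/livez?verbose'    # per-check liveness details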
2. Check Pod Status
If a pod is not running, you can use the following Bash code to check its status:
kubectl get pods
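The STATUS column (for example Pending, CrashLoopBackOff, or ImagePullBackOff) usually hints at the root cause. To dig into the events behind that status, you can also run (the pod name is a placeholder):
kubectl get pods --all-namespaces -o wide   # include node placement and pod IPs
kubectl describe pod <pod-name>             # events show scheduling, image pull, and probe failures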
3. Check Pod Logs
If a pod is running but you are experiencing issues, you can use the following Bash code to check its logs:
kubectl logs <pod-name>
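For pods with several containers, or when a container has already restarted, kubectl logs accepts a container name and a --previous flag:
kubectl logs <pod-name> -c <container-name>   # logs from a specific container
kubectl logs <pod-name> --previous            # logs from the previous, crashed container instance
kubectl logs -f <pod-name>                    # stream new log lines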
4. Check Node Status
If a node is not responding, you can use the following Bash code to check its status:
kubectl get nodes
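If a node shows NotReady, describing it reveals its conditions (MemoryPressure, DiskPressure, PIDPressure, Ready) and recent events:
kubectl describe node <node-name>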
5. Check Node Logs
If a node is having issues, you can use the following Bash code to check its logs:
journalctl -u kubelet
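On systemd-based nodes you can narrow the kubelet logs to a recent time window or follow them live:
journalctl -u kubelet --since "1 hour ago"   # only recent entries
journalctl -u kubelet -f                     # follow new entries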
6. Check API Server Logs
If the Kubernetes API server is having issues, you can use the following Bash code to check its logs:
journalctl -u kube-apiserver
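This assumes the API server runs as a systemd service. On kubeadm-based clusters the control plane runs as static pods instead, so the logs are available through kubectl (the node name suffix is a placeholder):
kubectl logs -n kube-system kube-apiserver-<control-plane-node-name>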
7. Check etcd Logs
If the etcd datastore is having issues, you can use the following Bash code to check its logs:
journalctl -u etcd
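Likewise, on kubeadm-based clusters etcd runs as a static pod, so its logs can be read with kubectl:
kubectl logs -n kube-system etcd-<control-plane-node-name>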
8. Check the Service Status
If a service is not responding, you can use the following Bash code to check its status:
kubectl get services
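A service with no endpoints will not route any traffic, which is a common cause of connection errors. Check that the service's selector actually matches ready pods:
kubectl describe service <service-name>   # shows selector, ports, and endpoints
kubectl get endpoints <service-name>      # an empty ENDPOINTS column means no matching ready pods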
9. Check Network Policies
If you are experiencing network issues, you can use the following Bash code to check your network policies:
kubectl get networkpolicies
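To see which pods a policy selects and which ingress and egress rules it enforces, describe it (the namespace is a placeholder):
kubectl describe networkpolicy <policy-name> -n <namespace>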
10. Check Resource Limits
If you are experiencing performance issues, you can use the following Bash code to check the resource limits of your pods:
kubectl describe pod <pod-name>
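If you only want the requests and limits, a JSONPath query keeps the output short; the expression below is one possible formulation:
kubectl get pod <pod-name> -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'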
11. Check Pod IP Addresses
If you are experiencing network issues, you can use the following Bash code to check the IP addresses of your pods:
kubectl get pods -o wide
12. Check Node IP Addresses
If you are experiencing network issues, you can use the following Bash code to check the IP addresses of your nodes:
kubectl get nodes -o wide
13. Check Pod DNS
If you are experiencing DNS issues, you can use the following Bash code to check the DNS configuration of your pods:
kubectl exec <pod-name> -- nslookup <service-name>
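Some minimal images do not ship nslookup, so this command may fail even when DNS is healthy. If lookups do fail, also check that the cluster DNS (CoreDNS) pods are running:
kubectl get pods -n kube-system -l k8s-app=kube-dns   # CoreDNS pods carry the kube-dns label
kubectl logs -n kube-system -l k8s-app=kube-dns       # look for SERVFAIL or upstream errors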
14. Check Node DNS
If you are experiencing DNS issues, you can inspect the resolver configuration a pod received, which is derived from the node's and the kubelet's DNS settings:
kubectl exec <pod-name> -- cat /etc/resolv.conf
15. Check Node Ports
If you are experiencing network issues, you can use the following Bash code to check the ports on your nodes:
nmap -p 22,80,443 <node-ip>
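nmap must be installed on the machine you scan from, and that machine needs network access to the node. If nmap is not available, netcat can probe a single port (the port is a placeholder):
nc -zv <node-ip> <node-port>   # -z: only scan, -v: verbose output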
16. Check Pod Ports
If you are experiencing network issues, you can forward a local port to a pod port to verify that the application inside the pod is actually listening:
kubectl port-forward <pod-name> <local-port>:<pod-port>
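With the port-forward running in one terminal, you can test the application from another one, assuming it speaks HTTP on that port:
curl -v http://localhost:<local-port>/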
17. Check Pod Environment Variables
If you are experiencing issues related to environment variables, you can use the following Bash code to check the environment variables of your pods:
kubectl exec <pod-name> -- env
18. Check Pod Configurations
If you are experiencing issues related to configurations, you can use the following Bash code to check the configurations of your pods:
kubectl describe pod <pod-name>
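To see the full pod specification exactly as the API server stores it, including volumes, probes, and references to ConfigMaps and Secrets, dump it as YAML:
kubectl get pod <pod-name> -o yaml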
19. Check Pod Volumes
If you are experiencing issues related to volumes, you can use the following Bash code to check the volumes of your pods:
kubectl describe pod <pod-name>
20. Check Pod Resource Usage
If you are experiencing performance issues, you can use the following Bash code to check the resource usage of your pods:
kubectl top pod <pod-name>
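kubectl top requires the metrics-server add-on to be installed in the cluster. You can also compare usage across nodes and sort pods by consumption:
kubectl top nodes                                   # CPU and memory usage per node
kubectl top pods --all-namespaces --sort-by=memory  # heaviest pods first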
Resource metrics pipeline
Beyond kubectl top, you can query the Metrics API directly (metrics-server must be installed; the node and pod names below come from a minikube cluster):
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes/minikube" | jq '.'
# Or via curl, with "kubectl proxy --port=8080" running in another terminal:
curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes/minikube
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-scheduler-minikube" | jq '.'
Monitor Node Health
Node Problem Detector is a daemon that runs on each node, watches for node problems such as kernel issues, and reports them to the API server. The following node-problem-detector.yaml deploys it as a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-problem-detector-v0.1
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: node-problem-detector
      version: v0.1
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.1
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: true
      containers:
      - name: node-problem-detector
        image: registry.k8s.io/node-problem-detector:v0.1
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/
kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml
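Once applied, verify that the DaemonSet scheduled a detector pod on every node (the label matches the manifest above):
kubectl get pods -n kube-system -l k8s-app=node-problem-detector -o wide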
To override the detector's default configuration, change node-problem-detector.yaml to mount a ConfigMap into the container:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-problem-detector-v0.1
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: node-problem-detector
      version: v0.1
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.1
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: true
      containers:
      - name: node-problem-detector
        image: registry.k8s.io/node-problem-detector:v0.1
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
        - name: config # Overwrite the config/ directory with ConfigMap volume
          mountPath: /config
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/
      - name: config # Define ConfigMap volume
        configMap:
          name: node-problem-detector-config
kubectl apply -f https://k8s.io/examples/debug/node-problem-detector-configmap.yaml
Debugging Kubernetes nodes with crictl
crictl is a command-line client for CRI-compatible container runtimes such as containerd and CRI-O. Run it directly on a node to list the pod sandboxes the runtime knows about, optionally filtered by name:
crictl pods
crictl pods --name <name of your pod>
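Because crictl talks to the container runtime directly, it keeps working even when the kubelet or the API server is down. A few other commonly used queries:
crictl ps -a                    # all containers, including exited ones
crictl logs <container-id>      # container logs straight from the runtime
crictl inspect <container-id>   # low-level container details
crictl images                   # images present on the node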
Auditing
Kubernetes auditing records the requests made to the API server. Which events are recorded, and at what level (None, Metadata, Request, RequestResponse), is controlled by an audit policy file such as the following:
apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # Log pod changes at RequestResponse level
  - level: RequestResponse
    resources:
    - group: ""
      # Resource "pods" doesn't match requests to any subresource of pods,
      # which is consistent with the RBAC policy.
      resources: ["pods"]
  # Log "pods/log", "pods/status" at Metadata level
  - level: Metadata
    resources:
    - group: ""
      resources: ["pods/log", "pods/status"]

  # Don't log requests to a configmap called "controller-leader"
  - level: None
    resources:
    - group: ""
      resources: ["configmaps"]
      resourceNames: ["controller-leader"]

  # Don't log watch requests by the "system:kube-proxy" on endpoints or services
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
    - group: "" # core API group
      resources: ["endpoints", "services"]

  # Don't log authenticated requests to certain non-resource URL paths.
  - level: None
    userGroups: ["system:authenticated"]
    nonResourceURLs:
    - "/api*" # Wildcard matching.
    - "/version"

  # Log the request body of configmap changes in kube-system.
  - level: Request
    resources:
    - group: "" # core API group
      resources: ["configmaps"]
    # This rule only applies to resources in the "kube-system" namespace.
    # The empty string "" can be used to select non-namespaced resources.
    namespaces: ["kube-system"]

  # Log configmap and secret changes in all other namespaces at the Metadata level.
  - level: Metadata
    resources:
    - group: "" # core API group
      resources: ["secrets", "configmaps"]

  # Log all other resources in core and extensions at the Request level.
  - level: Request
    resources:
    - group: "" # core API group
    - group: "extensions" # Version of group should NOT be included.

  # A catch-all rule to log all other requests at the Metadata level.
  - level: Metadata
    # Long-running requests like watches that fall under this rule will not
    # generate an audit event in RequestReceived.
    omitStages:
      - "RequestReceived"
Debugging Kubernetes Nodes With Kubectl
kubectl debug can start an interactive debugging pod on a specific node; the second command shows the debug pod that gets created:
kubectl debug node/mynode -it --image=ubuntu
kubectl get pods
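The debug pod runs on the target node with the node's root filesystem mounted at /host, so you can inspect host files and processes from inside it (assuming the chosen image provides chroot). Delete the pod when you are done; its generated name starts with node-debugger-:
chroot /host                                  # inside the debug pod: switch into the node's root filesystem
kubectl delete pod <node-debugger-pod-name>   # afterwards, clean up the debug pod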
Developing and debugging services locally using Telepresence
Telepresence lets you develop and debug a service locally while it behaves as if it were running inside the remote cluster. Start by connecting your local machine to the cluster:
telepresence connect
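Once connected, your machine can resolve and reach in-cluster services by their cluster DNS names. Telepresence can also intercept traffic aimed at a service and route it to a process on your laptop; the service name and ports below are placeholders:
telepresence list                                            # workloads that can be intercepted
telepresence intercept <service-name> --port <local-port>:<service-port>
telepresence leave <service-name>                            # stop the intercept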
Conclusion
Kubernetes is a complex system that can sometimes be challenging to troubleshoot. In this blog post, we discussed 20 real-life Kubernetes troubleshooting tips with Bash code snippets that can help you diagnose and fix common issues. By using these tips, you can quickly identify and resolve issues in your Kubernetes environment.