🏷️ Troubleshoot cluster and nodes#

Tools#

k9s#

Warning

Reminder: do not rely on k9s during CKA preparation; practice with plain kubectl instead.

Installation.

ᐅ curl -sLO https://github.com/derailed/k9s/releases/download/v0.50.13/k9s_Linux_amd64.tar.gz
ᐅ tar tvzf k9s_Linux_amd64.tar.gz
ᐅ tar xvzf k9s_Linux_amd64.tar.gz k9s
ᐅ sudo mv k9s /usr/local/bin

Info.

ᐅ k9s info
 ____  __ ________
|    |/  /   __   \______
|       /\____    /  ___/
|    \   \  /    /\___  \
|____|\__ \/____//____  /
         \/           \/

Version:           v0.50.13
Config:            /home/guisam/.config/k9s/config.yaml
Custom Views:      /home/guisam/.config/k9s/views.yaml
Plugins:           /home/guisam/.config/k9s/plugins.yaml
Hotkeys:           /home/guisam/.config/k9s/hotkeys.yaml
Aliases:           /home/guisam/.config/k9s/aliases.yaml
Skins:             /home/guisam/.config/k9s/skins
Context Configs:   /home/guisam/.local/share/k9s/clusters
Logs:              /home/guisam/.local/state/k9s/k9s.log
Benchmarks:        /home/guisam/.local/state/k9s/benchmarks
ScreenDumps:       /home/guisam/.local/state/k9s/screen-dumps

Static pods#

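Static Pods are created and managed directly by the kubelet on a specific node, without involvement from the Kubernetes API server or scheduler. The kubelet watches a configured directory (the staticPodPath setting, /etc/kubernetes/manifests/ on kubeadm clusters) for Pod manifest files, and starts, updates, or stops the matching Pods as files are added, changed, or removed.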

ᐅ sudo tree /etc/kubernetes/manifests --noreport
/etc/kubernetes/manifests
├── etcd.yaml
├── kube-apiserver.yaml
├── kube-controller-manager.yaml
└── kube-scheduler.yaml
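
To make the mechanism concrete, here is a minimal sketch that writes a Pod manifest into a manifests directory. The helper name `write_static_pod`, the Pod name `static-web`, and the image are illustrative assumptions, and the example targets a scratch directory rather than the real /etc/kubernetes/manifests.

```shell
# Write a minimal static Pod manifest into a manifests directory.
# write_static_pod, the Pod name and the image are illustrative assumptions.
write_static_pod() {
  dir=$1 name=$2 image=$3
  cat > "$dir/$name.yaml" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $name
spec:
  containers:
  - name: $name
    image: $image
EOF
}

# Example against a scratch directory; on a kubeadm node the kubelet
# watches /etc/kubernetes/manifests (writing there needs root).
manifests=$(mktemp -d)
write_static_pod "$manifests" static-web nginx:alpine
cat "$manifests/static-web.yaml"
```

On a real node the kubelet would pick the file up on its own and publish a mirror Pod named after the Pod and the node (here static-web-cp-01 on cp-01); removing the file stops the Pod.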

Check cluster components#

# k is assumed to be a shell alias for kubectl
ᐅ k -n kube-system get po -l component=kube-apiserver
ᐅ k -n kube-system describe po -l component=kube-apiserver
ᐅ k -n kube-system logs -l component=kube-apiserver
ᐅ k -n kube-system get po -l component=kube-apiserver -o jsonpath='{range .items[*]}{.metadata.name} {.spec.nodeName}{"\n"}{end}'
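
The same label selector works for the other control plane Pods. The sketch below loops over the four components listed in the manifests directory; the helper name `check_components` and the `KUBECTL` override variable are assumptions introduced for this example, and the `component=` labels are the ones kubeadm sets on its static Pods.

```shell
# kubectl binary to use; override it (for example with "echo") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# List the Pod of each control plane component, with the node it runs on.
check_components() {
  for c in etcd kube-apiserver kube-controller-manager kube-scheduler; do
    $KUBECTL get pods -n kube-system -l "component=$c" -o wide
  done
}
```

Running `KUBECTL=echo; check_components` prints the four commands without contacting a cluster, which is handy for checking the selectors first.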

Check container component#

# check that the container exists and get its ID
ᐅ sudo crictl ps | awk '/apiserver/{print $1}'
adfba1df325a5
# use the container ID to inspect its command line and state
ᐅ sudo crictl inspect adfba1df325a5 | jq -r '.info.config.command'
[
  "kube-apiserver",
  "--advertise-address=192.168.94.73",
  "--allow-privileged=true",
  "--authorization-mode=Node,RBAC",
  "--client-ca-file=/etc/kubernetes/pki/ca.crt",
  "--enable-admission-plugins=NodeRestriction",
  "--enable-bootstrap-token-auth=true",
  "--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt",
  "--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt",
  "--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key",
  "--etcd-servers=https://127.0.0.1:2379",
  "--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt",
  "--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key",
  "--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname",
  "--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt",
  "--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key",
  "--requestheader-allowed-names=front-proxy-client",
  "--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt",
  "--requestheader-extra-headers-prefix=X-Remote-Extra-",
  "--requestheader-group-headers=X-Remote-Group",
  "--requestheader-username-headers=X-Remote-User",
  "--secure-port=6443",
  "--service-account-issuer=https://kubernetes.default.svc.cluster.local",
  "--service-account-key-file=/etc/kubernetes/pki/sa.pub",
  "--service-account-signing-key-file=/etc/kubernetes/pki/sa.key",
  "--service-cluster-ip-range=10.96.0.0/16",
  "--tls-cert-file=/etc/kubernetes/pki/apiserver.crt",
  "--tls-private-key-file=/etc/kubernetes/pki/apiserver.key"
]
ᐅ sudo crictl inspect adfba1df325a5 | jq -r '.status.state'
CONTAINER_RUNNING
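
When only one flag matters, the command array can be filtered instead of read in full. This is a sketch under the assumption that the JSON has the `.info.config.command` layout shown above; the helper name `container_flag` is hypothetical.

```shell
# Print the value of one flag from `crictl inspect` JSON read on stdin.
# $1 is the flag name without leading dashes, e.g. etcd-servers.
container_flag() {
  jq -r --arg p "--$1=" \
    '.info.config.command[] | select(startswith($p)) | ltrimstr($p)'
}

# Usage on a node (container ID taken from `crictl ps`):
#   sudo crictl inspect adfba1df325a5 | container_flag etcd-servers
```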

Check container logs#

# check logs of a running container
ᐅ sudo crictl logs --tail 1 adfba1df325a5
I1209 02:34:23.791071       1 cidrallocator.go:277] updated ClusterIP allocator for Service CIDR 10.96.0.0/16
# if the container has crashed, read its logs from the node filesystem
ᐅ sudo tail -1 /var/log/pods/kube-system_kube-apiserver-cp-01_576ee8980d4161197f2b05b77950d238/kube-apiserver/0.log
2025-12-09T03:34:23.791298835+01:00 stderr F I1209 02:34:23.791071       1 cidrallocator.go:277] updated ClusterIP allocator for Service CIDR 10.96.0.0/16
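
Each line in the files under /var/log/pods follows the CRI logging format: a timestamp, the stream, a tag (F for a full line, P for a partial one), then the message. As a sketch, the hypothetical helper `cri_klog_errors` below keeps only klog error and fatal lines and strips the CRI prefix; it assumes the component logs in klog text format, as kube-apiserver does above.

```shell
# Read CRI-formatted log lines on stdin and print only klog error (E) and
# fatal (F) messages, without the "timestamp stream tag" prefix.
cri_klog_errors() {
  awk '$2 ~ /^std(out|err)$/ && $4 ~ /^[EF][0-9][0-9][0-9][0-9]$/ {
    sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
    print
  }'
}

# Usage on a node:
#   sudo cat /var/log/pods/kube-system_kube-apiserver-cp-01_576ee8980d4161197f2b05b77950d238/kube-apiserver/0.log | cri_klog_errors
```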

kubelet#

  • The kubelet is the primary node agent that runs on each node within a Kubernetes cluster, serving as the key liaison between the Kubernetes control plane and the individual nodes.

  • The kubelet monitors the health of containers, manages their lifecycle (starting, stopping, and restarting), and reports the node's status back to the control plane, which helps keep the cluster in its desired state.

  • It drives the container runtime through the Container Runtime Interface (CRI), for example containerd, CRI-O, or Docker Engine via cri-dockerd, to run and manage containers.

# check service status
ᐅ sudo systemctl status kubelet.service --no-pager
# get service configuration
ᐅ sudo systemctl cat kubelet.service --no-pager
# check server certificate
ᐅ sudo openssl x509 -noout -text -in /var/lib/kubelet/pki/kubelet.crt | grep 'Issuer'
        Issuer: CN = cp-01-ca@1765239260
ᐅ sudo openssl x509 -noout -text -in /var/lib/kubelet/pki/kubelet.crt | grep -A1 'Extended Key Usage'
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
ᐅ sudo openssl x509 -noout -dates -in /var/lib/kubelet/pki/kubelet.crt
notBefore=Dec  8 23:14:20 2025 GMT
notAfter=Dec  8 23:14:20 2026 GMT
# check client certificate
ᐅ sudo openssl x509 -noout -text -in /var/lib/kubelet/pki/kubelet-client-current.pem | grep 'Issuer'
        Issuer: CN = kubernetes
ᐅ sudo openssl x509 -noout -text -in /var/lib/kubelet/pki/kubelet-client-current.pem | grep -A1 'Extended Key Usage'
            X509v3 Extended Key Usage:
                TLS Web Client Authentication
# get systemd service logs
ᐅ sudo journalctl -u kubelet --since '4 hours ago' --no-pager
# get kubelet error lines from syslog (assumes the program name is field 3, as with RFC 3339 timestamps)
ᐅ sudo awk '$3~/^kubelet/&&$4~/^E/' /var/log/syslog
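
Reading the dates by eye works for one certificate; for a quick pass/fail check, `openssl x509 -checkend` compares the expiry against a number of seconds from now. The helper name `cert_valid_for_days` below is a hypothetical wrapper, offered as a sketch rather than a standard tool.

```shell
# Succeeds when the certificate in $1 is still valid $2 days from now.
cert_valid_for_days() {
  openssl x509 -noout -checkend "$(( $2 * 86400 ))" -in "$1" >/dev/null
}

# Usage in a root shell on a node:
#   cert_valid_for_days /var/lib/kubelet/pki/kubelet.crt 30 \
#     || echo "kubelet serving certificate expires within 30 days"
```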

kube-proxy#

# check kube-proxy resources
ᐅ k -n kube-system get ds,po,cm -l k8s-app=kube-proxy
NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/kube-proxy   2         2         2       2            2           kubernetes.io/os=linux   3h57m

NAME                   READY   STATUS    RESTARTS   AGE
pod/kube-proxy-5sw9x   1/1     Running   0          3h57m
pod/kube-proxy-zqgnz   1/1     Running   0          3h52m
# get kube-proxy config
ᐅ k -n kube-system get cm kube-proxy -o jsonpath='{.data.config\.conf}'
# get kube-proxy container id
ᐅ kpid=$(sudo crictl ps | awk '/kube-proxy/{print $1}')
# check iptables usage
ᐅ sudo crictl logs "$kpid" 2>&1 | grep "Using iptables"
I1209 00:14:33.835327       1 server_linux.go:53] "Using iptables proxy"
I1209 00:14:34.055829       1 server_linux.go:132] "Using iptables Proxier"
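
The proxy mode can also be read from the ConfigMap instead of the logs. The sketch below extracts the `mode` field from config.conf text on stdin and treats an empty or missing value as iptables, which is the Linux default; the helper name `proxy_mode` is an assumption for this example.

```shell
# Print the kube-proxy mode from a KubeProxyConfiguration read on stdin.
# An empty or missing "mode" means the default, iptables on Linux.
proxy_mode() {
  mode=$(awk '$1 == "mode:" { gsub(/"/, "", $2); print $2; exit }')
  printf '%s\n' "${mode:-iptables}"
}

# Usage:
#   kubectl -n kube-system get cm kube-proxy -o jsonpath='{.data.config\.conf}' | proxy_mode
```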

Check kube-proxy#

# create pod and service manifests
ᐅ k run test --image=nginx:alpine --port=80 --expose --dry-run=client -oyaml > test.yaml
# keep a copy, then edit the manifest so the pod runs on the chosen node
ᐅ cp test.yaml test.yaml.before
ᐅ vi test.yaml
# review the changes
ᐅ diff test.yaml.before test.yaml
30a31,35
>   nodeName: cp-01
>   tolerations:
>   - key: node-role.kubernetes.io/control-plane
>     operator: Exists
>     effect: NoSchedule
# deploy
ᐅ k apply -f test.yaml
# check iptables rules
ᐅ sudo iptables-save | grep test
-A KUBE-SEP-2DZQJPMH6S7CA75J -s 10.0.0.60/32 -m comment --comment "default/test" -j KUBE-MARK-MASQ
-A KUBE-SEP-2DZQJPMH6S7CA75J -p tcp -m comment --comment "default/test" -m tcp -j DNAT --to-destination 10.0.0.60:80
-A KUBE-SERVICES -d 10.96.223.95/32 -p tcp -m comment --comment "default/test cluster IP" -m tcp --dport 80 -j KUBE-SVC-HQOVHX4BQRA7XPPR
-A KUBE-SVC-HQOVHX4BQRA7XPPR ! -s 172.16.0.0/16 -d 10.96.223.95/32 -p tcp -m comment --comment "default/test cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SVC-HQOVHX4BQRA7XPPR -m comment --comment "default/test -> 10.0.0.60:80" -j KUBE-SEP-2DZQJPMH6S7CA75J
# delete service
ᐅ k delete svc test
service "test" deleted from default namespace
# the rules are gone
ᐅ sudo iptables-save | grep test
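
In iptables mode each Service endpoint gets its own KUBE-SEP chain whose DNAT rule carries the Pod address, as the rules above show. As a sketch, the hypothetical helper `svc_endpoints` pulls those addresses out of `iptables-save` output, assuming the quoted `--comment "namespace/name"` form seen in this dump.

```shell
# Read iptables-save output on stdin and print the DNAT targets
# (Pod IP:port) of one Service. $1 is namespace/name, e.g. default/test.
svc_endpoints() {
  grep -F -- "--comment \"$1\"" |
    sed -n 's/.*--to-destination \([^ ]*\).*/\1/p'
}

# Usage on a node:
#   sudo iptables-save | svc_endpoints default/test
```

An empty result for a Service that exists usually means it has no ready endpoints, which is worth checking with `kubectl get endpointslices` before suspecting kube-proxy.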

Note

On non-systemd systems, look for component logs in these files and directories:

/var/log/kube-apiserver.log
/var/log/kube-scheduler.log
/var/log/kube-controller-manager.log

/var/log/containers
/var/log/pods/

/var/log/kubelet.log
/var/log/kube-proxy.log

kubectl debug#

# start a debug pod on the node; the node's root filesystem is mounted at /host
ᐅ k debug node/guisam-test-control-plane -it --image=alpine
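
To work with the node's own tools rather than the image's, the debug container can chroot into /host. The wrapper below is a sketch: the helper name `node_shell`, the `KUBECTL` override variable, and the busybox image are assumptions, and `chroot /host` may fail on nodes with a restrictive security setup.

```shell
# kubectl binary to use; override it (for example with "echo") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Open a shell on a node through a debug Pod, chrooted into the host filesystem.
node_shell() {
  $KUBECTL debug "node/$1" -it --image=busybox -- chroot /host sh
}

# Usage:
#   node_shell cp-01
```

The debug Pod is not removed when the shell exits; its name starts with node-debugger- followed by the node name, so it can be found and deleted with kubectl afterwards.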