This page explains how to debug Pods running (or crashing) on a Node.
## Using kubectl describe pod to fetch details about pods

For this example we'll use a Deployment to create two Pods, similar to the earlier example.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 80
```
Create the Deployment by running the following command:
kubectl apply -f https://k8s.io/examples/application/nginx-with-request.yaml
deployment.apps/nginx-deployment created
Check the Pod status with the following command:
kubectl get pods
```
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-67d4bdd6f5-cx2nz   1/1     Running   0          13s
nginx-deployment-67d4bdd6f5-w6kd7   1/1     Running   0          13s
```
We can retrieve a lot more information about each of these Pods using kubectl describe pod. For example:
kubectl describe pod nginx-deployment-67d4bdd6f5-w6kd7
```
Name:         nginx-deployment-67d4bdd6f5-w6kd7
Namespace:    default
Priority:     0
Node:         kube-worker-1/192.168.0.113
Start Time:   Thu, 17 Feb 2022 16:51:01 -0500
Labels:       app=nginx
              pod-template-hash=67d4bdd6f5
Annotations:  <none>
Status:       Running
IP:           10.88.0.3
IPs:
  IP:  10.88.0.3
  IP:  2001:db8::1
Controlled By:  ReplicaSet/nginx-deployment-67d4bdd6f5
Containers:
  nginx:
    Container ID:   containerd://5403af59a2b46ee5a23fb0ae4b1e077f7ca5c5fb7af16e1ab21c00e0e616462a
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:2834dc507516af02784808c5f48b7cbe38b8ed5d0f4837f16e78d00deb7e7767
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 17 Feb 2022 16:51:05 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     500m
      memory:  128Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bgsgp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-bgsgp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  34s   default-scheduler  Successfully assigned default/nginx-deployment-67d4bdd6f5-w6kd7 to kube-worker-1
  Normal  Pulling    31s   kubelet            Pulling image "nginx"
  Normal  Pulled     30s   kubelet            Successfully pulled image "nginx" in 1.146417389s
  Normal  Created    30s   kubelet            Created container nginx
  Normal  Started    30s   kubelet            Started container nginx
```
Here you can see configuration information about the container(s) and Pod (labels, resource requirements, etc.), as well as status information about the container(s) and Pod (state, readiness, restart count, events, etc.).
The container state is one of Waiting, Running, or Terminated. Depending on the state, additional information will be provided -- here you can see that for a container in Running state, the system tells you when the container started.
Ready tells you whether the container passed its last readiness probe. (In this case, the container does not have a readiness probe; a container without a readiness probe is assumed to be ready.)
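For reference, a readiness probe could be declared on the nginx container like this (a minimal sketch; the HTTP path and timing values are illustrative assumptions, not part of the example Deployment):

```yaml
readinessProbe:
  httpGet:
    path: /        # hypothetical health endpoint
    port: 80
  initialDelaySeconds: 5   # wait before the first probe
  periodSeconds: 10        # probe interval
```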
Restart Count tells you how many times the container has been restarted; this information can be useful for detecting crash loops in containers that are configured with a restart policy of 'always.'
Among the Pod conditions shown above (PodScheduled, Initialized, ContainersReady, and Ready), the Ready condition indicates that the Pod is able to serve requests and should be added to the load balancing pools of all matching Services.
Lastly, you see a log of recent events related to your Pod. "From" indicates the component that is logging the event. "Reason" and "Message" tell you what happened.
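If you're only interested in the events for a single Pod, you can also filter the event list with a field selector, for example (using the Pod name from the output above):

```shell
kubectl get events --field-selector involvedObject.name=nginx-deployment-67d4bdd6f5-w6kd7
```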
## Example: debugging Pending Pods

A common scenario that you can detect using events is when you've created a Pod that won't fit on any node. For example, the Pod might request more resources than are free on any node, or it might specify a label selector that doesn't match any nodes. Let's say we created the previous Deployment with 5 replicas (instead of 2) and requesting 600 millicores instead of 500, on a four-node cluster where each (virtual) machine has 1 CPU. In that case one of the Pods will not be able to schedule. (Note that because of the cluster addon pods such as fluentd, skydns, etc., that run on each node, if we requested 1000 millicores then none of the Pods would be able to schedule.)
kubectl get pods
```
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1     Running   0          7m
nginx-deployment-1006230814-fmgu3   1/1     Running   0          7m
nginx-deployment-1370807587-6ekbw   1/1     Running   0          1m
nginx-deployment-1370807587-fg172   0/1     Pending   0          1m
nginx-deployment-1370807587-fz9sd   0/1     Pending   0          1m
```
To find out why the nginx-deployment-1370807587-fz9sd Pod is not running, we can use kubectl describe pod on the pending Pod and look at its events:
kubectl describe pod nginx-deployment-1370807587-fz9sd
```
Name:           nginx-deployment-1370807587-fz9sd
Namespace:      default
Node:           /
Labels:         app=nginx,pod-template-hash=1370807587
Status:         Pending
IP:
Controllers:    ReplicaSet/nginx-deployment-1370807587
Containers:
  nginx:
    Image:      nginx
    Port:       80/TCP
    QoS Tier:
      memory:   Guaranteed
      cpu:      Guaranteed
    Limits:
      cpu:      1
      memory:   128Mi
    Requests:
      cpu:      1
      memory:   128Mi
    Environment Variables:
Volumes:
  default-token-4bcbi:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-4bcbi
Events:
  FirstSeen  LastSeen  Count  From                 SubobjectPath  Type     Reason            Message
  ---------  --------  -----  ----                 -------------  ----     ------            -------
  1m         48s       7      {default-scheduler }                Warning  FailedScheduling  pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
  fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
  fit failure on node (kubernetes-node-wul5): Node didn't have enough resource: CPU, requested: 1000, used: 1100, capacity: 2000
```
Here you can see the event generated by the scheduler saying that the Pod failed to schedule for reason FailedScheduling (and possibly others). The message tells us that there were not enough resources for the Pod on any of the nodes.
To correct this situation, you can use kubectl scale to update your Deployment to specify four or fewer replicas. (Or you could leave the one Pod pending, which is harmless.)
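For example, assuming the Deployment from this scenario is still named nginx-deployment:

```shell
kubectl scale deployment nginx-deployment --replicas=4
```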
Events such as the ones you saw at the end of kubectl describe pod are persisted in etcd and provide high-level information on what is happening in the cluster. To list all events you can use kubectl get events, but you have to remember that events are namespaced. This means that if you're interested in events for some namespaced object (e.g. what happened with Pods in namespace my-namespace) you need to explicitly provide a namespace to the command:
kubectl get events --namespace=my-namespace
To see events from all namespaces, you can use the --all-namespaces argument.
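For example, to list recent events across the whole cluster:

```shell
kubectl get events --all-namespaces
```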
In addition to kubectl describe pod, another way to get extra information about a Pod (beyond what is provided by kubectl get pod) is to pass the -o yaml output format flag to kubectl get pod. This will give you, in YAML format, even more information than kubectl describe pod -- essentially all of the information the system has about the Pod. Here you will see things like annotations (key-value metadata without the label restrictions, used internally by Kubernetes system components), restart policy, ports, and volumes.
kubectl get pod nginx-deployment-1006230814-6winp -o yaml
```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-02-17T21:51:01Z"
  generateName: nginx-deployment-67d4bdd6f5-
  labels:
    app: nginx
    pod-template-hash: 67d4bdd6f5
  name: nginx-deployment-67d4bdd6f5-w6kd7
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-deployment-67d4bdd6f5
    uid: 7d41dfd4-84c0-4be4-88ab-cedbe626ad82
  resourceVersion: "1364"
  uid: a6501da1-0447-4262-98eb-c03d4002222e
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 500m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-bgsgp
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kube-worker-1
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-bgsgp
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-02-17T21:51:01Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-02-17T21:51:06Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-02-17T21:51:06Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-02-17T21:51:01Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://5403af59a2b46ee5a23fb0ae4b1e077f7ca5c5fb7af16e1ab21c00e0e616462a
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:2834dc507516af02784808c5f48b7cbe38b8ed5d0f4837f16e78d00deb7e7767
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-02-17T21:51:05Z"
  hostIP: 192.168.0.113
  phase: Running
  podIP: 10.88.0.3
  podIPs:
  - ip: 10.88.0.3
  - ip: 2001:db8::1
  qosClass: Guaranteed
  startTime: "2022-02-17T21:51:01Z"
```
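If you only need one field from that output, you can combine kubectl get pod with the -o jsonpath output format (used again later on this page); for example, to print just the Pod's IP address:

```shell
kubectl get pod nginx-deployment-67d4bdd6f5-w6kd7 -o jsonpath='{.status.podIP}'
```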
## Examining pod logs

First, look at the logs of the affected container:
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
If your container has previously crashed, you can access the previous container's crash log with:
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
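You can also stream logs as they are written with -f, or limit how much history is printed with --tail; for example, using one of the Pods from earlier:

```shell
kubectl logs -f --tail=20 nginx-deployment-67d4bdd6f5-w6kd7
```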
## Debugging with container exec

If the container image includes debugging utilities, as is the case with images built from Linux and Windows OS base images, you can run commands inside a specific container with kubectl exec:
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
Note: -c ${CONTAINER_NAME} is optional. You can omit it for Pods that only contain a single container.

As an example, to look at the logs from a running Cassandra pod, you might run:
kubectl exec cassandra -- cat /var/log/cassandra/system.log
You can run a shell that's connected to your terminal using the -i and -t arguments to kubectl exec, for example:
kubectl exec -it cassandra -- sh
For more details, see Get a Shell to a Running Container.
## Debugging with an ephemeral debug container

FEATURE STATE: Kubernetes v1.25 [stable]
Ephemeral containers are useful for interactive troubleshooting when kubectl exec is insufficient because a container has crashed or a container image doesn't include debugging utilities, such as with distroless images.
You can use the kubectl debug command to add ephemeral containers to a running Pod. First, create a Pod for the example:
kubectl run ephemeral-demo --image=registry.k8s.io/pause:3.1 --restart=Never
Note: The examples in this section use the pause container image because it does not contain debugging utilities, but this method works with all container images.
If you attempt to use kubectl exec to create a shell, you will see an error because there is no shell in this container image.
kubectl exec -it ephemeral-demo -- sh
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
You can instead add a debugging container using kubectl debug. If you specify the -i/--interactive argument, kubectl will automatically attach to the console of the ephemeral container.
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
```
Defaulting debug container name to debugger-8xzrl.
If you don't see a command prompt, try pressing enter.
/ #
```
This command adds a new busybox container and attaches to it. The --target parameter targets the process namespace of another container. It's necessary here because kubectl run does not enable process namespace sharing in the pod it creates.
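To confirm that the ephemeral container can see the target container's processes, you can list processes from inside the debug shell; with process namespace sharing working, you should see the pause process of the target container (the exact output varies by container runtime):

```
/ # ps ax
PID   USER     TIME  COMMAND
    1 root      0:00 /pause
  ...
```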
Note: The --target parameter must be supported by the container runtime. When not supported, the ephemeral container may not be started, or it may be started with an isolated process namespace so that ps does not reveal processes in other containers.

You can view the state of the newly created ephemeral container using kubectl describe:
kubectl describe pod ephemeral-demo
```
...
Ephemeral Containers:
  debugger-8xzrl:
    Container ID:   docker://b888f9adfd15bd5739fefaa39e1df4dd3c617b9902082b1cfdc29c4028ffb2eb
    Image:          busybox
    Image ID:       docker-pullable://busybox@sha256:1828edd60c5efd34b2bf5dd3282ec0cc04d47b2ff9caa0b6d4f07a21d1c08084
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 12 Feb 2020 14:25:42 +0100
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
...
```
Use kubectl delete to remove the Pod when you're finished:
kubectl delete pod ephemeral-demo
## Debugging using a copy of the Pod

Sometimes Pod configuration options make it difficult to troubleshoot in certain situations. For example, you can't run kubectl exec to troubleshoot your container if your container image does not include a shell or if your application crashes on startup. In these situations you can use kubectl debug to create a copy of the Pod with configuration values changed to aid debugging.
### Copying a Pod while adding a new container

Adding a new container can be useful when your application is running but not behaving as you expect and you'd like to add additional troubleshooting utilities to the Pod.
For example, maybe your application's container images are built on busybox but you need debugging utilities not included in busybox. You can simulate this scenario using kubectl run:
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
Run this command to create a copy of myapp named myapp-debug that adds a new Ubuntu container for debugging:
kubectl debug myapp -it --image=ubuntu --share-processes --copy-to=myapp-debug
```
Defaulting debug container name to debugger-w7xmf.
If you don't see a command prompt, try pressing enter.
root@myapp-debug:/#
```
Note:
- kubectl debug automatically generates a container name if you don't choose one using the --container flag.
- The -i flag causes kubectl debug to attach to the new container by default. You can prevent this by specifying --attach=false. If your session becomes disconnected you can reattach using kubectl attach.
- --share-processes allows the containers in this Pod to see processes from the other containers in the Pod. For more information about how this works, see Share Process Namespace between Containers in a Pod.

Don't forget to clean up the debugging Pod when you're finished with it:
kubectl delete pod myapp myapp-debug
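If you're building a Pod yourself rather than copying one, the --share-processes behavior corresponds to the shareProcessNamespace field in the Pod spec; a minimal sketch (the Pod name here is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-shared   # hypothetical name for illustration
spec:
  shareProcessNamespace: true   # the setting that --share-processes enables on the copy
  containers:
  - name: myapp
    image: busybox:1.28
    command: ["sleep", "1d"]
```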
### Copying a Pod while changing its command

Sometimes it's useful to change the command for a container, for example to add a debugging flag or because the application is crashing.
To simulate a crashing application, use kubectl run to create a container that immediately exits:
kubectl run --image=busybox:1.28 myapp -- false
You can see using kubectl describe pod myapp that this container is crashing:
```
Containers:
  myapp:
    Image:         busybox
    ...
    Args:
      false
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error
      Exit Code:   1
```
You can use kubectl debug to create a copy of this Pod with the command changed to an interactive shell:
kubectl debug myapp -it --copy-to=myapp-debug --container=myapp -- sh
```
If you don't see a command prompt, try pressing enter.
/ #
```
Now you have an interactive shell that you can use to perform tasks like checking filesystem paths or running the container command manually.
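For example, you can reproduce the crash by running the container's original command by hand and checking its exit code:

```
/ # false
/ # echo $?
1
```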
Note:
- To change the command of a specific container you must specify its name using --container, or kubectl debug will instead create a new container to run the command you specified.
- The -i flag causes kubectl debug to attach to the container by default. You can prevent this by specifying --attach=false. If your session becomes disconnected you can reattach using kubectl attach.

Don't forget to clean up the debugging Pod when you're finished with it:
kubectl delete pod myapp myapp-debug
### Copying a Pod while changing container images

In some situations you may want to change a misbehaving Pod from its normal production container images to an image containing a debugging build or additional utilities.
As an example, create a Pod using kubectl run:
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
Now use kubectl debug to make a copy and change its container image to ubuntu:
kubectl debug myapp --copy-to=myapp-debug --set-image=*=ubuntu
The syntax of --set-image uses the same container_name=image syntax as kubectl set image. *=ubuntu means change the image of all containers to ubuntu.
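For example, to change the image of only the myapp container while leaving any others unchanged, you could run:

```shell
kubectl debug myapp --copy-to=myapp-debug --set-image=myapp=ubuntu
```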
Don't forget to clean up the debugging Pod when you're finished with it:
kubectl delete pod myapp myapp-debug
## Debugging via a shell on the node

If none of these approaches work, you can find the Node on which the Pod is running and create a Pod running on the Node. To create an interactive shell on a Node using kubectl debug, run:
kubectl debug node/mynode -it --image=ubuntu
```
Creating debugging pod node-debugger-mynode-pdx84 with container debugger on node mynode.
If you don't see a command prompt, try pressing enter.
root@ek8s:/#
```
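From this shell you can inspect the node. As noted below, the node's root filesystem is mounted at /host, so for example you can look at the host's log directory (the exact contents vary by distribution):

```shell
root@ek8s:/# ls /host/var/log
```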
When creating a debugging session on a node, keep in mind that:
- kubectl debug automatically generates the name of the new Pod based on the name of the Node.
- The root filesystem of the Node will be mounted at /host.
- The container runs in the host IPC, Network, and PID namespaces, although the pod isn't privileged, so reading some process information may fail, and chroot /host may fail.
- If you need a privileged pod, create it manually or use the --profile=sysadmin flag.

Don't forget to clean up the debugging Pod when you're finished with it:
kubectl delete pod node-debugger-mynode-pdx84
## Debugging a Pod or Node while applying a profile

When using kubectl debug to debug a node via a debugging Pod, a Pod via an ephemeral container, or a copied Pod, you can apply a profile to them. By applying a profile, specific properties such as securityContext are set, allowing for adaptation to various scenarios. There are two kinds of profiles, static profiles and custom profiles.
### Static profile

A static profile is a set of predefined properties, and you can apply them using the --profile flag. The available profiles are as follows:
| Profile | Description |
| ------- | ----------- |
| legacy | A set of properties providing backwards compatibility with 1.22 behavior |
| general | A reasonable set of generic properties for each debugging journey |
| baseline | A set of properties compatible with the PodSecurityStandard baseline policy |
| restricted | A set of properties compatible with the PodSecurityStandard restricted policy |
| netadmin | A set of properties including Network Administrator privileges |
| sysadmin | A set of properties including System Administrator (root) privileges |
Note: If you don't specify --profile, the legacy profile is used by default, but it is planned to be deprecated in the near future, so it is recommended to use other profiles such as general.

Assume that you create a Pod and debug it. First, create a Pod named myapp as an example:
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
Then, debug the Pod using an ephemeral container. If the ephemeral container needs to have privilege, you can use the sysadmin profile:
kubectl debug -it myapp --image=busybox:1.28 --target=myapp --profile=sysadmin
```
Targeting container "myapp". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-6kg4x.
If you don't see a command prompt, try pressing enter.
/ #
```
Check the capabilities of the ephemeral container process by running the following command inside the container:
/ # grep Cap /proc/$$/status
```
...
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
...
```
This means the container process is granted full capabilities as a privileged container by applying the sysadmin profile. See more details about capabilities.
You can also check that the ephemeral container was created as a privileged container:
kubectl get pod myapp -o jsonpath='{.spec.ephemeralContainers[0].securityContext}'
{"privileged":true}
Clean up the Pod when you're finished with it:
kubectl delete pod myapp
### Custom profile

FEATURE STATE: Kubernetes v1.32 [stable]
You can define a partial container spec for debugging as a custom profile in either YAML or JSON format, and apply it using the --custom flag.
Note: Modifications to the name, image, command, lifecycle and volumeDevices fields of the container spec are not allowed. It does not support the modification of the Pod spec.

Create a Pod named myapp as an example:
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
Create a custom profile in YAML or JSON format. Here, create a YAML format file named custom-profile.yaml:
```yaml
env:
- name: ENV_VAR_1
  value: value_1
- name: ENV_VAR_2
  value: value_2
securityContext:
  capabilities:
    add:
    - NET_ADMIN
    - SYS_TIME
```
Run this command to debug the Pod using an ephemeral container with the custom profile:
kubectl debug -it myapp --image=busybox:1.28 --target=myapp --profile=general --custom=custom-profile.yaml
You can check that the ephemeral container has been added to the target Pod with the custom profile applied:
kubectl get pod myapp -o jsonpath='{.spec.ephemeralContainers[0].env}'
[{"name":"ENV_VAR_1","value":"value_1"},{"name":"ENV_VAR_2","value":"value_2"}]
kubectl get pod myapp -o jsonpath='{.spec.ephemeralContainers[0].securityContext}'
{"capabilities":{"add":["NET_ADMIN","SYS_TIME"]}}
Clean up the Pod when you're finished with it:
kubectl delete pod myapp