
I'm running Kubernetes Jobs in which I set both limits and requests to the same number of CPUs. In some of these jobs I'm occasionally seeing OutOfcpu errors.

When I run kubectl describe pod PODNAME, I see the following message:

Pod Node didn't have enough resource: cpu, requested: 8000, used: 11453, capacity: 16000 

That clearly indicates why the OutOfcpu occurred: the values are in millicores, so the node has 16 CPUs, 11.453 are already in use, and my pod requested another 8, which exceeds capacity.

But my Limits.cpu == Requests.cpu == 8.

Limits:
  cpu:                8
  ephemeral-storage:  500Gi
  memory:             10Gi
Requests:
  cpu:                8
  ephemeral-storage:  300Gi
  memory:             2Gi
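For reference, a minimal Job manifest that produces this resource configuration might look like the sketch below (the Job name, container name, and image are hypothetical placeholders):

```yaml
# Hypothetical Job manifest matching the resources shown above.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example-image:latest  # placeholder
          resources:
            requests:
              cpu: "8"
              ephemeral-storage: 300Gi
              memory: 2Gi
            limits:
              cpu: "8"
              ephemeral-storage: 500Gi
              memory: 10Gi
```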

So, as I understand it, the pod should have been throttled at 8 CPUs, and the scheduler should never have placed it on a node that didn't have 8 CPUs free.

I've only noticed this recently; our Kubernetes version is 1.22.5 as of a fairly recent upgrade.

1 Answer


    There is an open issue with a long thread about this bug.

It was introduced in Kubernetes v1.22 and appears to be a race condition that can occur when a pod is scheduled onto a node where another pod is terminating. The terminating pod is no longer seen by the scheduler, but it still consumes the node's resources (CPU, memory).

    https://github.com/kubernetes/kubernetes/issues/106884
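The accounting gap behind this race can be illustrated with a small sketch. The capacity, usage, and request numbers come from the error message above; the size of the terminating pod is a hypothetical value for illustration, and this is not Kubernetes code, just the arithmetic of the mismatch:

```python
NODE_CAPACITY_M = 16000  # node capacity in millicores, from the error message

# kubelet's view: the terminating pod's CPU is still counted as in use
used_on_node_m = 11453

# scheduler's view: the terminating pod has already been deducted, so the
# node appears to have room (assuming a hypothetical 4-CPU pod mid-termination)
scheduler_view_used_m = used_on_node_m - 4000

request_m = 8000  # this Job's request: 8 CPUs

# The scheduler admits the pod, because by its accounting the pod fits...
assert scheduler_view_used_m + request_m <= NODE_CAPACITY_M

# ...but the kubelet rejects it with OutOfcpu, because real usage is higher.
assert used_on_node_m + request_m > NODE_CAPACITY_M
```

Once the terminating pod actually releases its resources, the same request would fit, which is why the error is intermittent.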
