[JM] Tried to summarize the above links into a couple of sentences
Requests are used by the k8s scheduler to find a node that still has these resources available (i.e. not already requested by other Containers/Pods)
CPU Requests (unlike memory Requests) also inform the cgroup cpu shares system, so that the available CPU of a node is split properly between Containers under contention
Containers are generally allowed to exceed their Requests as long as resources are available (e.g. a container using more memory than requested is likely to get OOM killed if another container asks for more memory that is still within its own Requests)
Limits are hard boundaries, which only works "well" (in the sense of being easy to reason about) for memory: a container crossing its memory Limit will likely be OOM killed, so the Limit defines the maximum amount of memory it can use.
Limits on CPU are harder to reason about. The number of "CPUs" given in the Kubernetes YAML translates to a quota in the CPU bandwidth control system (CFS quota). The quota is granted per period (usually 100ms), i.e. processes may only use X ms of CPU time within each 100ms window and then have to wait for the next period to begin.
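A minimal sketch of how the two settings look in a container spec and what they roughly translate to on the node; the container name, image and all numbers are made up for illustration:

# Hypothetical container spec for illustration only; names and numbers are assumptions.
containers:
  - name: example-app
    image: example/app:latest
    resources:
      requests:
        cpu: "500m"       # scheduler: only place the Pod on a node with >= 0.5 CPU unrequested;
                          # cgroup: sets cpu shares/weight, i.e. the guaranteed slice under contention
        memory: "512Mi"   # scheduler placement only; not enforced by the kernel
      limits:
        cpu: "1"          # CFS quota: ~100ms of CPU time per 100ms period, then throttling
        memory: "1Gi"     # hard cap: exceeding it makes the container a likely OOM-kill target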
[CG] For latency-sensitive applications the interwebs suggest not to use limits; we saw that with php-fpm (mw-api) even at a very low number of requests, as well as with envoy, which by default spawns as many worker threads as there are physical cores (quickly eating through the CFS quota budget); see the sketch below
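One mitigation (also mentioned in the action items) is to cap the number of envoy worker threads instead of relying on a CPU limit; envoy supports this via its --concurrency flag. The sketch below assumes a sidecar-style container definition, the image version and numbers are placeholders and not what we actually run:

# Hedged sketch: pin envoy worker threads instead of letting it default to one per physical core.
containers:
  - name: envoy
    image: envoyproxy/envoy:v1.23.0   # example version, assumption
    args:
      - "--concurrency"
      - "2"                           # e.g. roughly match the CPU Request instead of the node's core count
      - "-c"
      - "/etc/envoy/envoy.yaml"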
[LT] when we don’t set limits (but requests) and use multiple threads, will it eat all the cpu?
It will (get "the rest" of the available CPU), but the Requests of other Containers will still be fulfilled (as they have cpu shares allocated in the cgroup)
[JM] The CPU quota is shared between all physical CPUs of a node; if one of those slices is exhausted, there is no mechanism for using the shares of "unused CPUs"
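A worked example (all numbers assumed) of why multi-threaded workloads burn through the quota so quickly:

# Assumed numbers for illustration only.
resources:
  limits:
    cpu: "2"   # -> cfs_quota_us = 200000 per cfs_period_us = 100000 (200ms of CPU time per 100ms period)
# With 16 busy worker threads (e.g. envoy on a 16-core node), that shared 200ms budget
# can be consumed in roughly 200ms / 16 = 12.5ms of wall-clock time; the container is then
# throttled for the rest of the period, which shows up as latency even if node CPUs are idle.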
[LT] Nice to have would be an explanation of how to think about Limits and how to choose a value, e.g. "What happens if I raise my CPU limit from 5 to 10?"
[CG] There is probably no generic answer to this question as it depends on the type of workload (thread count for example) as well as the actual CPUs
Action Items:
[JM] Document what limits are not, and what Requests are 🙂
[JM] Document why we decided to remove/change them for particular workloads, and what we did to mitigate the problem (like limiting the number of worker threads in envoy)
Allow monitoring/alerting on throttled workloads (T266216)
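For the throttling alerting item, a rough sketch of what a rule could look like, assuming the usual cAdvisor metrics are available and a prometheus-operator style PrometheusRule; this is not the rule tracked in T266216, and the threshold, names and labels are placeholders:

# Hypothetical alerting rule sketch; adapt to the actual alerting setup.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-throttling-example
spec:
  groups:
    - name: cpu-throttling
      rules:
        - alert: ContainerCPUThrottled
          # fraction of CFS periods in which the container was throttled, over 15 minutes
          expr: |
            rate(container_cpu_cfs_throttled_periods_total[15m])
              / rate(container_cpu_cfs_periods_total[15m]) > 0.25
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} is CPU throttled in more than 25% of periods"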