Resource Requests and Limits

Resource settings are among the most important workload controls in Kubernetes.

They determine scheduling quality, runtime stability, and autoscaling behavior.

Core model

  • requests: the CPU and memory the scheduler reserves for a container when it places the pod on a node
  • limits: the maximum CPU and memory a container may use at runtime

If requests are too low, pods get packed too tightly and become unstable under load. If they are too high, cluster capacity is wasted.

CPU and memory behavior

CPU and memory limits fail differently:

  • CPU over limit: the container is throttled, which usually shows up as increased latency
  • Memory over limit: the container is OOM-killed and then restarted according to the pod's restart policy

That is why memory sizing errors are typically more disruptive.

Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: ghcr.io/example/api:v5.2.1
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi

QoS classes

Pod QoS class is derived from resource configuration and determines eviction priority under node pressure:

BestEffort (no requests or limits, evicted first), then Burstable (requests set, not equal to limits, evicted second), then Guaranteed (requests == limits for all containers, evicted last).

  • Guaranteed: every container sets both CPU and memory requests and limits, and each request equals its limit. Maximum scheduling predictability. Recommended for latency-sensitive services.
  • Burstable: at least one container sets a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria. Can use slack capacity when available.
  • BestEffort: no container sets any request or limit. First to be evicted under node pressure. Avoid for production workloads.

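Under node memory pressure, the kubelet first evicts pods whose usage exceeds their requests. That always includes BestEffort pods, since any usage exceeds a request of zero, followed by Burstable pods running above their requests. Guaranteed pods, and Burstable pods that stay within their requests, are evicted only as a last resort.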
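
To make the Guaranteed case concrete, here is a minimal sketch of a pod whose single container sets requests equal to limits for both CPU and memory. The pod name and the resource values are assumptions chosen for illustration; only the image is taken from the example above.

```yaml
# Hypothetical pod, used only to illustrate the Guaranteed QoS class.
apiVersion: v1
kind: Pod
metadata:
  name: api-guaranteed            # assumed name
spec:
  containers:
    - name: api
      image: ghcr.io/example/api:v5.2.1
      resources:
        requests:
          cpu: 500m               # assumed value
          memory: 1Gi             # assumed value
        limits:
          cpu: 500m               # equal to the CPU request
          memory: 1Gi             # equal to the memory request
```

Once such a pod is running, kubectl get pod api-guaranteed -o jsonpath='{.status.qosClass}' should report Guaranteed. Changing either limit so that it differs from its request would make the pod Burstable instead.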

Sizing guidance

  • Start from observed p50 and p95 usage, not guesses
  • Keep requests close to realistic baseline load
  • Set memory limits with enough headroom for peak behavior
  • Avoid setting very low CPU limits on latency-sensitive services
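
As a rough sketch of how this guidance might translate into a resources block, assume a container that has been observed using about 200m CPU and 400Mi memory at p50, and about 600m CPU and 700Mi memory at p95. These usage figures are invented for illustration and are not measurements from any real workload.

```yaml
# Illustrative only: the usage figures quoted in the comments are assumed.
resources:
  requests:
    cpu: 250m        # slightly above the assumed p50 of 200m
    memory: 512Mi    # above the assumed p50 of 400Mi
  limits:
    cpu: "1"         # well above the assumed p95 of 600m, so peaks are not throttled
    memory: 1Gi      # headroom over the assumed p95 of 700Mi
```

The requests track the baseline so the scheduler reserves what the container normally needs, while the limits leave room for peaks. How much headroom is appropriate depends on the workload and should be revisited as new usage data comes in.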

Relationship to autoscaling

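HPA resource utilization targets are expressed as a percentage of the containers' requests, so an inaccurate request skews every scaling decision. A request set too high makes pods look idle and delays scale-out; a request set too low makes them look overloaded and triggers it too early.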

Always tune requests before tuning autoscaler thresholds.
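
The dependency on requests is easier to see with a concrete autoscaler. The following is a minimal sketch of an autoscaling/v2 HorizontalPodAutoscaler for the api Deployment from the example above. The 70% target and the replica bounds are assumptions for illustration, not recommendations.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3                   # matches the replicas in the example Deployment
  maxReplicas: 10                  # assumed ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request (assumed target)
```

With the 250m CPU request from the example, a 70% target means the HPA adds replicas once average usage across pods rises above roughly 175m per pod. If the request were 1000m for the same workload, scale-out would not begin until about 700m per pod, even though the application is no busier.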

Operational checks

kubectl top pods -A
kubectl describe pod <pod-name>
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState}'

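Look for containers whose last termination reason is OOMKilled, and for sustained CPU throttling in metrics, for example a rising cAdvisor container_cpu_cfs_throttled_periods_total relative to container_cpu_cfs_periods_total.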

Summary

Requests drive placement. Limits enforce runtime caps. Correct values improve stability, cost efficiency, and autoscaling quality.