Misconfigured resource requests and limits are responsible for a surprising number of production Kubernetes incidents — OOMKilled pods, throttled CPUs, and evictions during traffic spikes. Getting them right is one of the highest-leverage things you can do for cluster stability.
Requests vs limits — what actually happens
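Requests are what the scheduler uses to decide where to place a pod. A pod lands on a node only if its containers' requests, added to the requests of the pods already there, fit within the node's allocatable CPU and memory. The scheduler looks at requests, not actual usage, so a request is capacity reserved for the container. A container can use more than it requested when the node has headroom. However, if its memory use goes above the request, the pod becomes an early candidate for eviction when the node runs short of memory.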
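To make the scheduling rule concrete, here is a simplified Python model of the resource-fit check. It is a sketch, not kube-scheduler's code. The helpers cpu_millicores, mem_bytes and fits are hypothetical, each pod is reduced to a single pair of pod-level requests, the parser handles only integer memory values with the common suffixes, and the node figures are invented. Nothing in the model looks at actual usage.

```python
# Simplified model of the scheduler's resource-fit check (not kube-scheduler code).
# Only requests are compared; the node's real usage is never consulted.

MEM_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "K": 10**3, "M": 10**6, "G": 10**9}


def cpu_millicores(q: str) -> int:
    """Parse a CPU quantity such as "250m" or "1.5" into millicores."""
    return int(q[:-1]) if q.endswith("m") else round(float(q) * 1000)


def mem_bytes(q: str) -> int:
    """Parse an integer memory quantity such as "128Mi" or "1G" into bytes."""
    for suffix in ("Ki", "Mi", "Gi", "K", "M", "G"):
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * MEM_UNITS[suffix]
    return int(q)


def fits(allocatable: dict, scheduled: list, candidate: dict) -> bool:
    """True if the candidate pod's requests fit beside the requests already placed.

    Each argument holds pod-level requests, e.g. {"cpu": "100m", "memory": "128Mi"}.
    """
    used_cpu = sum(cpu_millicores(p["cpu"]) for p in scheduled)
    used_mem = sum(mem_bytes(p["memory"]) for p in scheduled)
    return (
        used_cpu + cpu_millicores(candidate["cpu"]) <= cpu_millicores(allocatable["cpu"])
        and used_mem + mem_bytes(candidate["memory"]) <= mem_bytes(allocatable["memory"])
    )


# Hypothetical node: 2 CPUs and 4Gi allocatable, already running 19 pods
# that each request 100m CPU and 128Mi memory.
node = {"cpu": "2", "memory": "4Gi"}
existing = [{"cpu": "100m", "memory": "128Mi"}] * 19
pod = {"cpu": "100m", "memory": "128Mi"}

print(fits(node, existing, pod))          # True: CPU requests reach exactly 2000m
print(fits(node, existing + [pod], pod))  # False: CPU requests would be 2100m
```

In this model, a nearly idle node can still refuse a pod if its capacity has already been promised to other requests. This is why oversized requests waste cluster capacity even when nothing is actually busy.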
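Limits are the hard ceiling. For memory, a container that tries to go past its limit is killed by the kernel's OOM killer (reported as OOMKilled) and restarted according to the pod's restart policy. For CPU, the container is never killed. Instead, once it has used up its quota for the current CFS period (100 ms by default), it is throttled until the next period starts. This shows up as silent latency spikes rather than errors.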
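The kubelet converts a CPU limit into a CFS quota: CPU time per period equals the limit multiplied by the period, so 500m becomes 50 ms of CPU time in every 100 ms period. The hypothetical wall_time_ms function below is an idealized model that shows how a burst of work stretches once the quota runs out. It assumes the work is split evenly across threads, each thread has a core to itself, there is no other contention, and the burst starts at the beginning of a period.

```python
import math

CFS_PERIOD_MS = 100.0  # default CFS period used by the kubelet


def cpu_millicores(q: str) -> int:
    """Parse a CPU quantity such as "500m" or "2" into millicores."""
    return int(q[:-1]) if q.endswith("m") else round(float(q) * 1000)


def quota_ms(cpu_limit: str) -> float:
    """CPU time the container may use per period: limit (in cores) x period."""
    return cpu_millicores(cpu_limit) / 1000 * CFS_PERIOD_MS


def wall_time_ms(cpu_work_ms: float, cpu_limit: str, threads: int = 1) -> float:
    """Idealized wall-clock time to finish cpu_work_ms of CPU work under the limit."""
    q = quota_ms(cpu_limit)
    if q >= threads * CFS_PERIOD_MS:
        # The threads can never consume the whole quota in one period: no throttling.
        return cpu_work_ms / threads
    periods = math.ceil(cpu_work_ms / q)
    last_chunk = cpu_work_ms - (periods - 1) * q
    # Every earlier period costs a full 100 ms: run until the quota is gone,
    # then stay throttled until the period ends.
    return (periods - 1) * CFS_PERIOD_MS + last_chunk / threads


print(wall_time_ms(120, "500m"))             # 220.0 (120 ms without a limit)
print(wall_time_ms(120, "500m", threads=4))  # 205.0 (30 ms without a limit)
print(wall_time_ms(120, "2"))                # 120.0: the limit is never reached
```

The multi-threaded case is the surprising one. Four busy threads use up a 50 ms quota in 12.5 ms of wall time and then sit throttled for the remaining 87.5 ms of the period. This is how throttling can appear even when a dashboard, averaging over a longer window, shows CPU well below the limit.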
A sensible starting template
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
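Requests and limits also determine a pod's QoS class, which matters for the evictions mentioned at the top. In the template above, requests are lower than limits, so the pod is Burstable. The sketch below encodes the documented rules: a pod is Guaranteed when every container has CPU and memory limits equal to its requests, BestEffort when no container sets any CPU or memory values, and Burstable otherwise. The helpers qos_class and to_number are hypothetical, and the quantity parsing is simplified.

```python
MEM_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "K": 10**3, "M": 10**6, "G": 10**9}


def to_number(resource: str, q: str) -> float:
    """Normalize a quantity so "1" and "1000m", or "1Gi" and "1024Mi", compare equal."""
    if resource == "cpu":
        return float(q[:-1]) / 1000 if q.endswith("m") else float(q)
    for suffix in ("Ki", "Mi", "Gi", "K", "M", "G"):
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * MEM_UNITS[suffix]
    return float(q)


def qos_class(containers: list) -> str:
    """Return the QoS class for a list of container specs (as plain dicts)."""
    anything_set = False
    guaranteed = True
    for c in containers:
        res = c.get("resources", {})
        limits = res.get("limits", {})
        # A limit without a request makes the API server set request = limit.
        requests = {**limits, **res.get("requests", {})}
        for r in ("cpu", "memory"):
            if r in limits or r in requests:
                anything_set = True
            if r not in limits or to_number(r, requests[r]) != to_number(r, limits[r]):
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


template = {"requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "500m", "memory": "256Mi"}}
pinned = {"requests": {"cpu": "500m", "memory": "256Mi"},
          "limits": {"cpu": "0.5", "memory": "256Mi"}}

print(qos_class([{"name": "app", "resources": template}]))  # Burstable
print(qos_class([{"name": "app", "resources": pinned}]))    # Guaranteed
print(qos_class([{"name": "app"}]))                         # BestEffort
```

All else being equal, Burstable pods whose memory use exceeds their requests are evicted before Guaranteed pods when a node runs short of memory. Setting requests equal to limits is one way to give a critical workload extra protection, at the cost of reserving its peak capacity.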
Using Vertical Pod Autoscaler in recommendation mode
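Running the Vertical Pod Autoscaler (VPA) with updateMode "Off" is the safest way to get data-driven recommendations. The recommender still computes values and writes them to the object's status, but VPA never evicts or modifies your pods. VPA is not part of core Kubernetes. It ships with the Kubernetes autoscaler project and must be installed in the cluster (or enabled by your provider) before this object has any effect.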
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
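The recommendations end up in the object's status.recommendation.containerRecommendations list. There is one entry per container, each with lowerBound, target and upperBound values. Below is a minimal sketch for extracting them, assuming you have saved the output of kubectl get vpa my-app-vpa -o json. The sample document is made up, including the container name my-app and every number in it, and vpa_summary is a hypothetical helper.

```python
import json


def vpa_summary(vpa: dict) -> dict:
    """Map each container name to its VPA lowerBound / target / upperBound.

    Returns {} while the recommender has not produced a recommendation yet.
    """
    recommendation = vpa.get("status", {}).get("recommendation", {})
    summary = {}
    for rec in recommendation.get("containerRecommendations", []):
        summary[rec["containerName"]] = {
            key: rec.get(key, {}) for key in ("lowerBound", "target", "upperBound")
        }
    return summary


# Hypothetical, hand-written VPA object; real values come from the cluster.
sample = json.loads("""
{
  "kind": "VerticalPodAutoscaler",
  "metadata": {"name": "my-app-vpa"},
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "my-app",
          "lowerBound": {"cpu": "50m", "memory": "100Mi"},
          "target": {"cpu": "120m", "memory": "180Mi"},
          "upperBound": {"cpu": "400m", "memory": "350Mi"}
        }
      ]
    }
  }
}
""")

for name, rec in vpa_summary(sample).items():
    print(name, "target:", rec["target"], "upper:", rec["upperBound"])
```

The target is the request VPA would apply if it were allowed to act, so it is a natural value to compare against the starting template. Expect an empty result until the recommender has observed the workload for a while.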
LimitRange as a safety net
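Add a LimitRange to the namespace. When a pod is created, the LimitRange fills in a limit (from default) and a request (from defaultRequest) for any container that omits them. It acts only at admission time, so pods that already exist are not changed: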
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
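The defaulting order contains one trap. The API server first copies a limit into a missing request, and only afterwards does the LimitRange fill in whatever is still absent. As a result, a container that sets only a memory limit gets a matching memory request rather than the defaultRequest. The hypothetical admit function below sketches that order using the values from the LimitRange above. It skips validation: in a real cluster, a request that ends up larger than its defaulted limit causes the pod to be rejected.

```python
# Defaults copied from the LimitRange above.
DEFAULT_LIMITS = {"cpu": "500m", "memory": "256Mi"}
DEFAULT_REQUESTS = {"cpu": "100m", "memory": "128Mi"}


def admit(resources: dict) -> dict:
    """Return a container's resources as they would look after admission (sketch)."""
    limits = dict(resources.get("limits", {}))
    requests = dict(resources.get("requests", {}))
    # 1. API-server defaulting runs first: a limit with no request implies request = limit.
    for name, value in limits.items():
        requests.setdefault(name, value)
    # 2. The LimitRange then fills in only what is still missing.
    for name, value in DEFAULT_LIMITS.items():
        limits.setdefault(name, value)
    for name, value in DEFAULT_REQUESTS.items():
        requests.setdefault(name, value)
    return {"requests": requests, "limits": limits}


print(admit({}))
# {'requests': {'cpu': '100m', 'memory': '128Mi'}, 'limits': {'cpu': '500m', 'memory': '256Mi'}}

# Only a memory limit: the memory request becomes 512Mi, not the 128Mi default.
print(admit({"limits": {"memory": "512Mi"}}))
```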
Key takeaways
- Always set both requests and limits. No exceptions in production.
- Profile first. Never guess — use VPA recommendations or historical metrics.
- CPU throttling is silent and causes latency; OOM is loud and causes restarts. Both hurt.
- Use LimitRange to enforce defaults at the namespace level as a backstop.
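The first takeaway is easy to check mechanically, for example in CI before manifests reach the cluster. Here is a minimal sketch, assuming manifests have already been parsed into Python dicts (for instance with PyYAML's yaml.safe_load). The missing_resources helper and the sample Deployment, including its app and sidecar containers, are hypothetical.

```python
# Every (kind, resource) pair a production container is expected to declare.
REQUIRED = [(kind, res) for kind in ("requests", "limits") for res in ("cpu", "memory")]


def missing_resources(manifest: dict) -> list:
    """List each container in a Deployment-style manifest that lacks a request or limit."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    problems = []
    for c in containers:
        resources = c.get("resources", {})
        for kind, res in REQUIRED:
            if res not in resources.get(kind, {}):
                problems.append(f"{c['name']}: missing {kind}.{res}")
    return problems


deployment = {
    "kind": "Deployment",
    "metadata": {"name": "my-app"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "resources": {"requests": {"cpu": "100m", "memory": "128Mi"},
                                      "limits": {"memory": "256Mi"}}},
        {"name": "sidecar"},
    ]}}},
}

for problem in missing_resources(deployment):
    print(problem)
# app: missing limits.cpu
# sidecar: missing requests.cpu
# sidecar: missing requests.memory
# sidecar: missing limits.cpu
# sidecar: missing limits.memory
```

A LimitRange only fills gaps at admission time. A check like this catches the omission earlier, so the namespace defaults remain a backstop rather than the main source of values.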