The Linux OOM (Out of Memory) Killer is invoked by the kernel when a container exceeds its memory limit. Kubernetes enforces container memory limits via cgroups: when a container hits its limit, the kernel kills the offending process with SIGKILL, and the container reports exit code 137 (128 + signal 9).
This is not a crash. It's an intentional termination. The container is killed, Kubernetes restarts it, and if the root cause isn't fixed, you get a CrashLoopBackOff.
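The exit code 137 above follows the usual shell convention that a code above 128 means "terminated by signal (code - 128)". As a minimal sketch of that arithmetic (the function names `exit_code_to_signal` and `describe_exit_code` are illustrative, not part of kubectl or Kubernetes):

```shell
# Decode a container exit code into the signal number that ended it.
# Codes above 128 conventionally mean "killed by signal (code - 128)".
exit_code_to_signal() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo $((code - 128))
  else
    echo 0   # 0 here means "not terminated by a signal"
  fi
}

# Give a short human-readable hint for a few common cases.
describe_exit_code() {
  sig=$(exit_code_to_signal "$1")
  case "$sig" in
    9)  echo "SIGKILL (an OOM kill if Reason is OOMKilled)" ;;
    15) echo "SIGTERM (graceful shutdown request)" ;;
    0)  echo "exited on its own with status $1" ;;
    *)  echo "signal $sig" ;;
  esac
}

describe_exit_code 137
```

Exit code 137 alone only says SIGKILL was delivered; the `Reason: OOMKilled` field is what ties it to the memory limit.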
# Check pod status and last termination reason
kubectl describe pod <pod-name> -n <namespace>
# Look for this in the output:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
# Quick check across all pods in namespace (prints the names of pods with an OOMKilled container)
kubectl get pods -n <namespace> -o json | \
jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled")) | .metadata.name'
# See current limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'
# See live memory usage
kubectl top pod <pod-name> --containers
# Watch memory in real-time (if it's happening now)
watch -n 2 kubectl top pod <pod-name>
# Get the node the pod ran on
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}'
# Check kernel OOM logs on that node
kubectl debug node/<node-name> -it --image=ubuntu -- dmesg | grep -i oom
# Or via node shell
ssh <node-ip> "dmesg | grep -i 'oom_kill\|Out of memory'"
The container's memory limit was set conservatively during initial deployment and never updated as the application's actual memory footprint grew.
# Your limit vs actual usage
kubectl get pod <pod-name> -o yaml | grep -A3 resources:
# If usage regularly hits 80%+ of limit — limit is too low
# Fix: increase memory limit
kubectl set resources deployment/<name> \
--limits=memory=1Gi \
--requests=memory=512Mi
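To apply the 80% rule of thumb above, it can help to turn the `kubectl top` reading and the configured limit into a single percentage. The following is a rough sketch; the helper names `to_bytes` and `usage_pct` are made up for illustration, and only plain byte counts and the `Ki`/`Mi`/`Gi` suffixes are handled.

```shell
# Convert a Kubernetes memory quantity with a Ki/Mi/Gi suffix to bytes.
# Other suffixes (K, M, G, Ti, ...) are not handled in this sketch.
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;   # assume a plain byte count
  esac
}

# Print usage as an integer percentage of the limit (rounded down).
usage_pct() {
  used=$(to_bytes "$1")
  limit=$(to_bytes "$2")
  echo $(( used * 100 / limit ))
}

# Example: 511Mi in use against a 512Mi limit
usage_pct 511Mi 512Mi
```

A value that sits at or above 80 under normal load suggests the limit leaves too little headroom.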
Application allocates memory and never releases it. Memory grows over time until the container is killed. Pattern: memory climbs steadily, pod restarts, climbs again.
# Check memory trend over time in Prometheus/Grafana
# (working set excludes inactive page cache and matches what kubectl top reports)
container_memory_working_set_bytes{pod=~"<pod-name>.*"}
# If memory grows linearly over hours → leak
# Take heap dump while running (Java example)
kubectl exec <pod-name> -- \
jmap -dump:format=b,file=/tmp/heap.hprof <java-pid>
kubectl cp <pod-name>:/tmp/heap.hprof ./heap.hprof
# Analyze with Eclipse MAT or VisualVM
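The "grows linearly over hours" check can be made less subjective by fitting a line to exported samples. This is a sketch under assumptions: it expects `unix_seconds bytes` pairs on stdin (for example, copied out of a Prometheus query result), and the function name `growth_mib_per_hour` is hypothetical.

```shell
# Read "unix_seconds bytes" pairs on stdin and print the least-squares
# growth rate in MiB per hour.
growth_mib_per_hour() {
  awk '
    NR == 1 { t0 = $1 }   # shift timestamps so the sums stay small and precise
    {
      x = $1 - t0
      n++; sx += x; sy += $2; sxx += x * x; sxy += x * $2
    }
    END {
      denom = n * sxx - sx * sx
      if (n < 2 || denom == 0) {
        print "need at least two distinct timestamps" > "/dev/stderr"
        exit 1
      }
      slope = (n * sxy - sx * sy) / denom        # bytes per second
      printf "%.1f\n", slope * 3600 / 1048576    # MiB per hour
    }'
}

# Example: three hourly samples growing by 100 MiB each
printf '%s\n' \
  "1700000000 104857600" \
  "1700003600 209715200" \
  "1700007200 314572800" | growth_mib_per_hour
```

A steady positive rate that persists while traffic is flat points toward a leak. A rate that rises and falls with request volume points toward load.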
Memory usage is normal at baseline, but a traffic surge created more concurrent requests than the container's memory could handle. This usually correlates with a traffic spike visible in APM.
# Check if traffic spiked before the kill
# Datadog query:
sum:nginx.net.connections{*} by {pod}
# Fix: HPA to scale out before memory is exhausted
# (this command scales on CPU, used here as a proxy for load)
kubectl autoscale deployment <name> \
--min=2 --max=10 \
--cpu-percent=70
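The `kubectl autoscale` command above targets CPU. If memory use tracks load more closely than CPU does, an `autoscaling/v2` HorizontalPodAutoscaler can target memory utilization instead. The sketch below only prints a manifest for review; the function name `render_memory_hpa` and the 80% target are assumptions for illustration. Utilization is measured against the container's memory request, and the cluster needs a metrics source such as metrics-server.

```shell
# Print an autoscaling/v2 HPA manifest that scales a Deployment on memory.
# Arguments: deployment name, min replicas, max replicas, target utilization (%).
render_memory_hpa() {
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: $1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $1
  minReplicas: $2
  maxReplicas: $3
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: $4
EOF
}

# Review the output first, then pipe it to: kubectl apply -f -
render_memory_hpa auth-service 2 10 80
```

Scaling out on memory only helps when per-pod memory falls as replicas are added. It does not help with a leak, where every replica grows regardless of load.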
The node itself is running low on memory. The kubelet evicts pods to free node memory even if the pod hasn't hit its own limit. Evicted pods typically show Reason: Evicted rather than OOMKilled.
# Check node conditions
kubectl describe node <node-name> | grep -A5 Conditions
# Look for:
MemoryPressure True
# Check node memory
kubectl top nodes
Input:
service: auth-service
error: CrashLoopBackOff — pod restarting every 30s
logs: OOMKilled exit code 137, memory limit 512Mi, usage 511Mi
recent changes: no recent deploys, increased traffic 3x this morning
Root cause identified:
OOMKilled: container memory limit too low for current traffic load. The 512Mi limit was set for baseline traffic, and the 3x traffic surge pushed usage (511Mi) up against it.
Ranked actions:
Investigate first: Memory consumption pattern since traffic spike — is it linear (memory leak) or proportional to traffic (limit too low)?
Rejected hypotheses:
# 1. Increase memory limit (immediate — no downtime)
kubectl patch deployment <name> -p \
'{"spec":{"template":{"spec":{"containers":[{"name":"<container>","resources":{"limits":{"memory":"2Gi"},"requests":{"memory":"1Gi"}}}]}}}}'
# 2. Verify pod comes up cleanly
kubectl rollout status deployment/<name>
kubectl get pods -w
# 3. Monitor memory after fix
watch kubectl top pod -l app=<name>
# 4. Set VPA for automatic management (long-term)
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: <name>-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <name>
  updatePolicy:
    updateMode: "Auto"
EOF