Kubernetes Probes FAQ & Answers
5 expert Kubernetes Probes answers researched from official documentation. Every answer cites authoritative sources you can verify.
5 questions
Liveness Probe detects whether the container is running correctly and restarts it if the probe fails (recovering from deadlocks, hung processes, and unrecoverable errors). Question answered: Is the container still alive? Failure action: The kubelet kills the container and respects restartPolicy (Always/OnFailure restart it, Never terminates). Use cases: Application deadlocks (web server hung), infinite loops (stuck in a bad state), memory leaks causing a freeze. Configuration: httpGet.path: /healthz (most common), tcpSocket (databases), or an exec command. Best practice: Conservative settings to avoid flapping - initialDelaySeconds: 60 (wait after start), periodSeconds: 10 (check interval), failureThreshold: 3 (consecutive failures before restart). Anti-pattern: Don't check dependencies (database down → restart loop). Check only app health (is the process responsive?).
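A minimal sketch of these settings as a Pod manifest; the pod name, image, port, and /healthz path are illustrative assumptions, not from the source:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app                      # hypothetical name
spec:
  containers:
  - name: web
    image: example.com/web-app:1.0   # placeholder image
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz         # lightweight: checks only that the process responds,
        port: 8080             # never dependencies (avoids the restart-loop anti-pattern)
      initialDelaySeconds: 60  # wait after container start before the first probe
      periodSeconds: 10        # check every 10s
      failureThreshold: 3      # restart only after 3 consecutive failures
```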
Four probe methods: (1) httpGet: HTTP GET to container IP + port + path. Success: 200 ≤ status < 400. Config: httpGet.path: /healthz, httpGet.port: 8080, httpGet.scheme: HTTP/HTTPS. Most common for web apps/APIs. (2) tcpSocket: TCP connection to a port. Success: connection established. Config: tcpSocket.port: 3306. Useful for databases/caches (MySQL, Redis, PostgreSQL). Limitation: Only checks that the port is listening, not app health. (3) exec: Execute a command inside the container. Success: exit code 0. Config: exec.command: ["pg_isready", "-U", "postgres"]. Use for legacy apps without HTTP endpoints or for custom validation. Performance: Slower (spawns a process inside the container); avoid complex commands. (4) grpc (stable in Kubernetes 1.27+): gRPC health check via grpc.health.v1.Health/Check. Config: grpc.port: 50051. Native support for gRPC microservices (Envoy, Istio). Efficient binary protocol.
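The four mechanisms as YAML stanzas; each would sit under a container's livenessProbe, readinessProbe, or startupProbe field (ports and the pg_isready command mirror the answer above; the stanzas are separated here as standalone YAML documents since only one applies at a time):

```yaml
# (1) httpGet - success if 200 <= status < 400
httpGet:
  path: /healthz
  port: 8080
  scheme: HTTP
---
# (2) tcpSocket - success if a TCP connection can be established
tcpSocket:
  port: 3306
---
# (3) exec - success if the command exits with code 0
exec:
  command: ["pg_isready", "-U", "postgres"]
---
# (4) grpc - success if grpc.health.v1.Health/Check reports SERVING (Kubernetes 1.27+)
grpc:
  port: 50051
```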
Readiness Probe detects whether the container is ready to serve traffic and removes the pod from Service endpoints if the probe fails (preventing requests from being routed to unhealthy pods). Question answered: Can this container handle traffic right now? Failure action: Pod marked NotReady and removed from Service load balancing (no container restart). Use cases: Application warming up (loading caches, warming the JVM), temporary overload (high CPU, cannot handle more requests), dependency unavailable (database connection lost, waiting for reconnection), graceful shutdown (draining connections). Example: An e-commerce app depends on Redis. Redis restarts → readiness probe fails (GET /ready returns 503) → pod removed from the Service → requests route to healthy pods → Redis recovers → readiness succeeds → pod added back. Critical difference from liveness: Readiness failure is treated as temporary (the pod stays running and can recover); liveness failure is treated as unrecoverable (the container is restarted). Best practice: Check dependencies, and probe frequently (periodSeconds: 5) for quick recovery.
Startup Probe (stable since Kubernetes 1.20) detects whether the application has finished starting, and disables liveness/readiness probes until its first success (preventing premature liveness kills during slow startup). Question answered: Has the container finished initializing? Failure action: After failureThreshold * periodSeconds of total time, the kubelet kills the container (startup timeout exceeded). Once the probe succeeds, it is permanently disabled (liveness/readiness take over). Use cases: Slow-starting applications (Java apps with 60s+ startup, ML models loading large datasets, database migrations before the app starts), legacy apps with unpredictable startup. Example: A Spring Boot app takes 120s to start. Without a startup probe: liveness probe fails at 60s (app not ready) → restart → infinite CrashLoopBackOff. With a startup probe: failureThreshold: 30, periodSeconds: 5 (150s allowed) → app starts at 120s → startup probe succeeds → liveness/readiness enabled. Configuration: A high failureThreshold (30-60) plus a short periodSeconds (5-10) allows a long startup window while still probing frequently, so success is detected soon after the app comes up.
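A sketch combining all three probes for the slow-starting Spring Boot scenario above; this is the containers section of a Pod or Deployment spec, and the image, port, and the /healthz and /ready endpoint paths are assumptions for illustration:

```yaml
containers:
- name: spring-app
  image: example.com/spring-app:1.0   # placeholder image
  ports:
  - containerPort: 8080
  startupProbe:              # gates liveness/readiness until its first success
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30     # 30 * 5s = 150s allowed for startup
    periodSeconds: 5
  livenessProbe:             # enabled only after the startup probe succeeds
    httpGet:
      path: /healthz         # lightweight: process health only
      port: 8080
    periodSeconds: 10
    failureThreshold: 3
  readinessProbe:
    httpGet:
      path: /ready           # comprehensive: returns 503 if a dependency like Redis is down
      port: 8080
    periodSeconds: 5         # frequent probing for quick removal and re-add
    failureThreshold: 3
```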
Common mistakes: (1) Liveness checks dependencies: App depends on a database → liveness queries the database → database down → liveness fails → all pods restart → infinite restart loop. Fix: Liveness checks only app health (is the process alive?); readiness checks dependencies. (2) No startup probe for slow apps: A 60s-startup app with liveness initialDelaySeconds: 30 → probing begins at 30s while the app is still starting → consecutive failures → container killed → CrashLoopBackOff. Fix: Add a startup probe with a sufficient failureThreshold. (3) Readiness == liveness: Same endpoint for both → temporary overload triggers a liveness restart (it should only affect readiness). Fix: Separate endpoints - liveness lightweight, readiness comprehensive. (4) Probe timeout too short: Health check queries a database taking 2s → timeoutSeconds: 1 → fails even when healthy. Fix: Set timeout > expected response time + network latency (3-5s for database checks). (5) Too aggressive thresholds: failureThreshold: 1 → a single network blip causes a restart. Fix: failureThreshold: 3 tolerates transient failures.
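For mistakes (4) and (5), a hedged sketch of a readiness stanza sized for the ~2s database check described above (path and port are assumed):

```yaml
readinessProbe:
  httpGet:
    path: /ready        # may query the database, so give it room to respond
    port: 8080
  timeoutSeconds: 5     # > expected 2s response time plus network latency
  periodSeconds: 5
  failureThreshold: 3   # tolerate transient blips; don't mark NotReady after one failure
```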