Kubernetes HPA: 5 Q&As

Kubernetes HPA FAQ & Answers

5 expert Kubernetes HPA answers researched from official documentation. Every answer cites authoritative sources you can verify.


5 questions
A

HPA automatically scales pod replicas based on observed metrics, which is essential for handling variable load in production. Mechanism: the HPA controller queries the metrics API every 15s by default (configurable via the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period), calculates desired replicas with the formula desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue)), and updates the scale of the target Deployment/StatefulSet. Example: 3 current pods, CPU at 90%, target 70% → desired = ceil(3 * (90/70)) = ceil(3.86) = 4 pods. The autoscaling/v2 API (stable since Kubernetes 1.23) supports multiple metric sources. Built-in resource metrics: CPU utilization via kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10 scales when average CPU exceeds 70%; memory-based scaling also relies on metrics-server and is declared through an autoscaling/v2 manifest with a memory resource metric. By default HPA scales up immediately but waits through a 5-minute stabilization window before scaling down.
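
A minimal autoscaling/v2 manifest equivalent to the kubectl autoscale command above (a sketch; the my-app name comes from the example command, the rest are standard HPA fields):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # average CPU across pods; above 70% triggers scale-up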

99% confidence
A

Advanced custom metrics require external adapters: (1) Prometheus Adapter (most popular) - exposes Prometheus metrics as Kubernetes custom metrics, (2) Datadog Cluster Agent - integrates Datadog metrics, (3) KEDA (Kubernetes Event-Driven Autoscaling) - supports 50+ scalers (RabbitMQ, Kafka, Azure Queue, AWS SQS). Production metric strategies: (1) HTTP request rate - scale API pods when requests/sec exceeds a threshold (typically 100-500 req/sec per pod), (2) Queue depth - scale workers based on RabbitMQ queue length or Kafka consumer lag (scaling at 1000+ pending messages keeps processing timely), (3) Business metrics - active WebSocket connections, concurrent database queries, order processing rate. Example multi-metric HPA: combine a 70% CPU utilization target with a custom http_requests metric targeting 1000 req/sec on average; HPA evaluates each metric and the one requiring the most pods wins (highest replica count). Use custom metrics when CPU/memory don't correlate with actual load (API gateways, queue workers). A sketch of such a multi-metric HPA follows.
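
A hedged sketch of the multi-metric HPA described above, assuming a Prometheus Adapter (or similar) already exposes a per-pod custom metric named http_requests; the api-gateway name and replica bounds are illustrative assumptions, not from the source:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: Resource               # built-in CPU metric from metrics-server
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods                   # custom per-pod metric served by the adapter
      pods:
        metric:
          name: http_requests      # assumed metric name; must match what the adapter exposes
        target:
          type: AverageValue
          averageValue: "1000"     # ~1000 req/sec per pod on average

HPA computes a desired replica count for each metric independently and applies the largest one.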

99% confidence
A

HPA best practices (2025): (1) Set minReplicas to 2+ for high availability (survives node failure, avoids cold starts during traffic spikes), (2) Target CPU at 70-80% to leave headroom for traffic spikes (avoid 90%+ - no room for bursts), (3) Set maxReplicas based on cluster capacity and cost limits (prevents runaway scaling), (4) Use VPA (Vertical Pod Autoscaler) to right-size resource requests (adjusts CPU/memory requests) and HPA for horizontal scaling (adjusts replica count), (5) Configure the behavior field in the v2 API to prevent thrashing: behavior: {scaleDown: {stabilizationWindowSeconds: 300, policies: [{type: Percent, value: 50, periodSeconds: 60}]}} limits scale-down to 50% per minute with a 5-minute stabilization window (written out in manifest form below). Performance: HPA adds ~10ms scheduling latency, and metrics-server consumes ~100MB memory per 1000 pods. Use Cluster Autoscaler to scale nodes when HPA cannot schedule pods (insufficient cluster capacity).
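
The behavior stanza from item (5), written out in manifest form (scaleTargetRef, replica bounds, and metrics are omitted here because they match the earlier examples):

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of consistently lower metrics before shrinking
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of current replicas
          periodSeconds: 60             # per 60-second window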

99% confidence
A

Common HPA pitfalls: (1) Slow-starting applications (JVM warm-up, ML model loading) without proper readiness probes cause HPA thrashing - a pod starts, receives traffic before it is ready, fails health checks, and HPA scales up further thinking capacity is insufficient; set initialDelaySeconds and periodSeconds appropriately in the readiness probe (sketched below). (2) Missing cooldown periods lead to rapid scale up/down cycles (flapping) - setting scaleDown stabilizationWindowSeconds: 300 (5 minutes) prevents immediate scale-down after a scale-up. (3) Using only CPU metrics misses application-specific bottlenecks (database connections exhausted at 50% CPU, thread pool saturated) - combine CPU with custom metrics (queue depth, request latency). (4) Insufficient cluster capacity blocks scale-up - HPA creates pods but the scheduler cannot place them (Pending state); use Cluster Autoscaler to add nodes automatically. (5) Resource requests set too low or too high - underestimated requests lead to overcommitted nodes and OOMKills, overestimated requests waste resources; use VPA to right-size requests.
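
A readiness probe sketch for the slow-start case in item (1); the /healthz path, port, and timing values are illustrative assumptions to be tuned per application:

containers:
  - name: app                 # placeholder container name
    image: my-app:latest      # placeholder image
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30   # allow JVM warm-up / model loading before serving traffic
      periodSeconds: 10         # re-check readiness every 10 seconds
      failureThreshold: 3       # mark unready after 3 consecutive failures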

99% confidence
A

Real-world e-commerce HPA example: scales from 3 pods (night, low traffic) to 20 pods (peak shopping hours) based on combined metrics - request rate plus CPU utilization. Configuration: minReplicas: 3, maxReplicas: 20, target CPU: 70%, target request rate: 500 req/sec per pod (a full manifest sketch follows). Traffic pattern: 2AM (100 req/sec) → 3 pods sufficient, 2PM (8000 req/sec) → scales to 16 pods (8000/500 = 16), 6PM Black Friday (15000 req/sec) → scales to 20 pods (maxReplicas cap). Cost savings: 60% less infrastructure cost than a static 20-pod deployment (20 pods × 24 hours = 480 pod-hours/day, while HPA averages 8 pods × 24 hours = 192 pod-hours/day, saving 288 pod-hours = 60%). Behavior config prevents thrashing: scaleUp is aggressive (double pods within 1 minute during a spike), scaleDown is conservative (reduce 25% per 5 minutes after traffic subsides, 300s stabilization window). Result: maintains <200ms p95 latency during peaks and saves $3000/month in compute costs.
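
A manifest sketch matching the configuration described above; the storefront name and the http_requests per-pod metric (served by a metrics adapter) are assumptions:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: storefront
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: storefront
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests       # assumed per-pod request-rate metric
        target:
          type: AverageValue
          averageValue: "500"       # 500 req/sec per pod
  behavior:
    scaleUp:
      policies:
        - type: Percent
          value: 100                # allow doubling the replica count
          periodSeconds: 60         # within any 1-minute window
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before shrinking
      policies:
        - type: Percent
          value: 25                 # reduce at most 25% of replicas
          periodSeconds: 300        # per 5-minute window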

99% confidence