Docker Kubernetes FAQ & Answers
19 expert Docker Kubernetes answers researched from official documentation. Every answer cites authoritative sources you can verify.
Docker security in production requires multi-layered defense. Top practices: (1) Non-root users: create a dedicated user in the Dockerfile: RUN useradd -m appuser && chown -R appuser /app; USER appuser. Prevents privilege escalation attacks. (2) Read-only filesystem: docker run --read-only --tmpfs /tmp myapp. Forces immutable infrastructure. (3) Resource limits: docker run --memory=512m --cpus=1 myapp prevents container bombs and DoS attacks. (4) Security scanning: integrate Trivy or Snyk in CI/CD, fail builds on HIGH/CRITICAL vulnerabilities. (5) Minimal base images: prefer distroless or Alpine (10-50x smaller attack surface than a full OS). (6) No secrets in images: use Docker secrets, environment variables, or secret managers (Vault, AWS Secrets Manager). (7) Network policies: default deny, explicit allow. (8) Regular updates: automate base image updates, rebuild monthly at minimum. According to 2025 surveys: 67% update Kubernetes regularly, 53% block exposed ports, 52% enable RBAC. Multi-stage builds: separate build and runtime stages to exclude build tools from the final image. Best practice: run security audits with Docker Bench for Security (docker-bench-security) and implement the least-privilege principle.
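A minimal Dockerfile sketch pulling the image-level practices together; the image name, user name, and entrypoint are placeholders, not from an official example:

```dockerfile
FROM node:20-alpine

# Practice (1): create a dedicated non-root user (Alpine/busybox syntax)
RUN addgroup -S appuser && adduser -S -G appuser appuser

WORKDIR /app
COPY --chown=appuser:appuser . .

# Drop to the non-root user for the rest of the image lifecycle
USER appuser

CMD ["node", "server.js"]

# Practices (2) and (3) at run time: read-only root filesystem plus resource limits
#   docker run --read-only --tmpfs /tmp --memory=512m --cpus=1 myapp
```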
Multi-stage builds separate build and runtime environments, reducing final image size by 60-90% and eliminating build tools from production. Basic pattern: FROM node:20 AS builder; WORKDIR /app; COPY package*.json ./; RUN npm ci; COPY . .; RUN npm run build; FROM node:20-alpine; COPY --from=builder /app/dist ./dist; CMD ["node", "dist/main.js"]. Builder stage includes dev dependencies, final stage only runtime. Advanced patterns: (1) Multiple builders: separate stages for different compilation steps (TypeScript, CSS, assets), (2) Parallel builds: COPY --from=builder1 and COPY --from=builder2 in final stage, (3) Build cache optimization: RUN --mount=type=cache,target=/root/.npm npm ci uses BuildKit cache mounts (5-10x faster rebuilds), (4) Secret mounting: RUN --mount=type=secret,id=npmrc npm ci passes secrets without storing in layers. Benefits: reduced attack surface (no compilers, dev tools), faster deployments (smaller images), layer caching optimizes rebuilds. Example: Next.js app goes from 1.2GB (single stage) to 180MB (multi-stage). Security: build stage can use privileged operations, runtime stage stays minimal. Best practice: use official slim/alpine variants for final stage, order COPY commands by change frequency (package files before source). BuildKit is default in Docker Engine 23.0+ (2023).
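Expanded into a fenced Dockerfile for readability, this mirrors the inline pattern above; names and paths are illustrative, and a real Node.js app would typically also copy production node_modules into the runtime stage:

```dockerfile
# syntax=docker/dockerfile:1

# Stage 1: build (dev dependencies and compilers live only here)
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
# BuildKit cache mount keeps the npm cache between builds
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
RUN npm run build

# Stage 2: runtime (only the compiled output is copied in)
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/main.js"]
```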
Deployments manage stateless applications with interchangeable pods, StatefulSets manage stateful applications requiring stable identity and storage. Key differences: (1) Pod identity: Deployments use random names (app-xyz), StatefulSets use ordered names (app-0, app-1, app-2) that persist across restarts, (2) Storage: Deployments share storage or use ephemeral volumes, StatefulSets create PersistentVolumeClaim per pod with stable binding, (3) Scaling: Deployments scale randomly, StatefulSets scale sequentially (0→1→2, terminate 2→1→0), (4) Updates: Deployments do rolling updates in parallel, StatefulSets update one pod at a time maintaining order. Use Deployments for: web servers, APIs, stateless microservices, workers processing from queues - anything where pods are identical and replaceable. Use StatefulSets for: databases (PostgreSQL, MySQL, MongoDB), message queues (Kafka, RabbitMQ), distributed systems requiring member coordination (Elasticsearch, ZooKeeper, etcd) - anything needing stable network identity or persistent state. Example: Deployment manifest: replicas: 3; strategy: RollingUpdate; maxSurge: 1. StatefulSet manifest: replicas: 3; serviceName: my-db; volumeClaimTemplates: [...]. Performance: StatefulSets have slower startup/scaling due to sequential operations. Best practice: default to Deployments, use StatefulSets only when truly needed (adds complexity).
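A minimal StatefulSet sketch showing the two distinguishing features, serviceName and volumeClaimTemplates; image, labels, and storage size are placeholders:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  replicas: 3
  serviceName: my-db          # headless Service giving each pod stable DNS (db-0, db-1, db-2)
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:        # one PersistentVolumeClaim created and bound per pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```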
HPA automatically scales pod replicas based on observed metrics, essential for handling variable load in production environments. Mechanism: HPA controller queries metrics-server every 15s (default, configurable via --horizontal-pod-autoscaler-sync-period), calculates desired replicas using the formula desired = ceil(current * (current_metric / target_metric)), then updates Deployment/StatefulSet spec. The autoscaling/v2 API (stable since Kubernetes 1.23) supports multiple metric sources. Built-in resource metrics: CPU utilization via kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10 scales when average CPU exceeds 70%, memory scaling requires metrics-server v0.6.0+ with memory metrics enabled. Advanced custom metrics require external adapters: Prometheus Adapter (most popular), Datadog Cluster Agent, or KEDA (Kubernetes Event-Driven Autoscaling). Production metric strategies: (1) HTTP request rate - scale API pods when requests/sec exceeds threshold (typical: 100-500 req/sec per pod), (2) Queue depth - scale workers based on RabbitMQ queue length or Kafka consumer lag (scale at 1000+ pending messages), (3) Business metrics - active WebSocket connections, concurrent database queries, order processing rate. Example multi-metric HPA: combine CPU utilization 70% target + custom http_requests metric 1000 req/sec average, HPA chooses metric requiring most pods. Best practices: (1) Set minReplicas at 2+ for high availability (survives node failure), (2) CPU target 70-80% leaves headroom for traffic spikes (avoid 90%+), (3) Set maxReplicas based on cluster capacity and cost limits, (4) Use VPA (Vertical Pod Autoscaler) for right-sizing resource requests, HPA for horizontal scaling, (5) Configure behavior field in v2 API: behavior: {scaleDown: {stabilizationWindowSeconds: 300, policies: [{type: Percent, value: 50, periodSeconds: 60}]}} prevents thrashing by limiting scale-down to 50% per minute with 5-minute stabilization. Performance impact: HPA adds ~10ms scheduling latency, metrics-server consumes ~100MB memory per 1000 pods. Common pitfalls: (1) Slow-starting applications (JVM, ML models) without proper readiness probes cause HPA thrashing - set initialDelaySeconds appropriately, (2) Missing cooldown periods lead to rapid scale up/down cycles (set scaleDown stabilizationWindowSeconds: 300), (3) Using only CPU metrics misses application-specific bottlenecks (combine with custom metrics), (4) Insufficient cluster capacity blocks scale-up (use Cluster Autoscaler). Real-world example: e-commerce site scales from 3 pods (night) to 20 pods (peak shopping hours) based on request rate + CPU, saving 60% infrastructure cost vs static 20-pod deployment.
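A minimal autoscaling/v2 manifest sketch combining the CPU target and scale-down behavior described above; the Deployment name and thresholds are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # 5-minute window prevents thrashing
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of pods per minute
          periodSeconds: 60
```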
Resource requests/limits control pod resource allocation and Quality of Service (QoS) class. Requests: guaranteed minimum resources, used for scheduling decisions - node must have available resources ≥ requests. Limits: maximum resources pod can consume, enforced by kernel cgroups. Configuration: resources: {requests: {cpu: '250m', memory: '256Mi'}, limits: {cpu: '500m', memory: '512Mi'}}. QoS classes (determines eviction priority): (1) Guaranteed: requests == limits for all containers, highest priority, last to be evicted, (2) Burstable: requests < limits or only requests set, medium priority, (3) BestEffort: no requests/limits, lowest priority, first evicted. Scheduling: kube-scheduler sums all pod requests per node, schedules on nodes with available capacity. Over-commitment: node can run pods with total limits > node capacity (relies on pods not hitting limits simultaneously). CPU throttling: pod hitting CPU limit gets throttled (performance degradation), memory limit causes OOMKill (pod restart). Best practices: (1) Set requests based on average usage, limits at 2x requests for burst headroom, (2) Monitor actual usage with metrics-server or Prometheus, adjust over time, (3) Use LimitRanges to enforce namespace defaults, prevent unbounded pods, (4) For critical pods: requests == limits (Guaranteed QoS). Common mistakes: no requests (can't schedule properly), limits too low (OOMKills), no limits (one pod can starve others). Use Vertical Pod Autoscaler to recommend optimal values.
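A minimal pod spec sketch using the request/limit values quoted above (Burstable QoS, since requests < limits); the image name is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: myorg/api:1.0        # placeholder image
      resources:
        requests:
          cpu: 250m               # used by the scheduler to place the pod
          memory: 256Mi
        limits:
          cpu: 500m               # exceeding this throttles the container
          memory: 512Mi           # exceeding this triggers an OOMKill
```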
Service mesh provides observability, security, and traffic management for microservices without modifying application code. Istio 1.24+ (GA in 2024) uses ambient mode as alternative to sidecars, reducing resource overhead by 40-50%. Traditional sidecar mode: Istio injects Envoy proxy (1.29+) into each pod via mutating webhook, proxies intercept all inbound/outbound traffic via iptables rules. Ambient mode: Layer 4 ztunnel (zero-trust tunnel) runs per-node instead of per-pod, opt-in Layer 7 waypoint proxies for advanced features. Core capabilities: (1) Traffic management - intelligent routing for A/B testing (10% traffic to v2), canary deployments (gradual 10%→50%→100%), circuit breaking (max connections, pending requests), retries (exponential backoff), timeouts - all configured via VirtualService/DestinationRule CRDs. (2) Security - automatic mutual TLS encryption between all services (STRICT/PERMISSIVE modes), certificate rotation every 24h (default), L7 authorization policies based on JWT claims, source identity, HTTP methods. (3) Observability - distributed tracing integration (Jaeger, Zipkin) with 1% default sampling rate (configurable), RED metrics (Rate, Errors, Duration) exported to Prometheus, detailed access logs with request/response metadata. Implementation: install control plane via istioctl install --set profile=ambient for ambient mode or --set profile=default for sidecar, enable injection with namespace label istio-injection=enabled, deploy applications (sidecars auto-injected). Traffic routing example for canary: VirtualService routes 90% to stable subset, 10% to canary subset based on weights, DestinationRule defines subsets by pod labels. Production benefits: (1) Zero-touch mTLS across 100+ microservices (impossible manually), (2) Unified observability (service graph, latency percentiles, error rates), (3) Traffic shifting without redeploying apps (faster iteration), (4) Standardized resilience patterns (retries, timeouts, circuit breakers). Trade-offs carefully measured: sidecar mode adds 50-100MB memory per pod + 0.05-0.1 vCPU overhead, ambient mode reduces to 10-20MB per pod via shared node proxy, P50 latency impact 0.5-1ms, P99 impact 1-2ms (acceptable for most services). Resource costs: 100-pod cluster with sidecars requires +5-10GB memory, ambient mode requires +1-2GB. Alternatives comparison: Linkerd (Rust-based, 20MB memory per proxy, simpler but fewer features), Consul Connect (HashiCorp ecosystem integration), AWS App Mesh (managed, AWS-only). Best practices: (1) Start with ambient mode for new deployments (lower resource cost), (2) Enable strict mTLS after PERMISSIVE mode validation period, (3) Set conservative retry/timeout defaults (max 3 retries, 15s timeout), (4) Use PeerAuthentication CRD to enforce mTLS policy. Common pitfalls: (1) Debugging connection failures requires understanding Envoy config (use istioctl proxy-config commands), (2) Misconfigured DestinationRule subsets cause 503 errors, (3) VirtualService match order matters (first match wins), (4) Resource limits too low cause Envoy OOMKills. Use cases: Istio essential for >20 microservices needing unified security/observability, overkill for <10 services (use simpler ingress controller). 2025 adoption data: 35% of large enterprises use service mesh (up from 30% in 2024), ambient mode driving renewed interest due to lower cost. Financial services and healthcare lead adoption due to compliance requirements (audit trails, mTLS). 
Real-world impact: company with 80 microservices implemented Istio ambient mode, gained complete service graph visibility + mTLS encryption with 2GB memory overhead vs 8GB sidecar approach, reduced security incidents 60% via authorization policies.
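A sketch of the canary routing example using DestinationRule and VirtualService; host names, subset labels, and the 90/10 split are illustrative, and the API version assumes a recent Istio release:

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: v1
          weight: 90            # stable version keeps 90% of traffic
        - destination:
            host: myapp
            subset: v2
          weight: 10            # canary receives 10%
```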
GitOps uses Git as single source of truth for declarative infrastructure and applications, enabling automated deployments via pull-based reconciliation. Principle: Git repo contains all Kubernetes manifests, operators (ArgoCD, Flux) continuously sync cluster state to match repo. Workflow: (1) Developers push changes to Git, (2) ArgoCD/Flux detects changes, (3) Applies manifests to cluster, (4) Reconciles differences (self-healing). Implementation with ArgoCD: install ArgoCD: kubectl apply -n argocd -f install.yaml, create Application: apiVersion: argoproj.io/v1alpha1; kind: Application; spec: {source: {repoURL: 'github.com/org/app', path: 'k8s', targetRevision: HEAD}, destination: {server: 'https://kubernetes.default.svc', namespace: default}, syncPolicy: {automated: {prune: true, selfHeal: true}}}. Benefits: (1) Audit trail: all changes in Git history, (2) Rollback: git revert reverts cluster state, (3) Disaster recovery: restore cluster from Git, (4) Consistency: prevents kubectl drift, (5) Multi-cluster: manage 100+ clusters from single repo. Patterns: (1) Environment branches: dev/staging/prod branches, (2) App-of-apps: ArgoCD app that creates other apps (manages entire platform), (3) Helm/Kustomize integration: ArgoCD renders templates before applying. Security: restrict cluster access, all changes via Git (PR review), role-based access to repos. Observability: ArgoCD UI shows sync status, health, history. Trade-offs: learning curve, pull-based means ~30s-3min sync delay. Best practice: separate app code repo from GitOps config repo, use automated image updaters to trigger config updates on new images. 2025 adoption: over 65% of enterprise organizations implement GitOps practices.
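A minimal ArgoCD Application sketch matching the inline example; the repository URL, path, and namespaces are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/app
    path: k8s
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```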
Kubernetes probes detect unhealthy pods and control traffic routing, critical for zero-downtime deployments and self-healing. Three probe types: (1) Liveness: detects if pod is alive, kubelet kills and restarts on failure - use to recover from deadlocks/hangs, (2) Readiness: detects if pod can serve traffic, removes from Service endpoints on failure - use during startup or temporary unavailability, (3) Startup: allows slow-starting pods extended time before liveness kicks in - prevents premature kills. Configuration: livenessProbe: {httpGet: {path: /healthz, port: 8080}, initialDelaySeconds: 30, periodSeconds: 10, failureThreshold: 3}. Probe methods: httpGet (HTTP 200-399 = success), tcpSocket (TCP connection succeeds), exec (command exit 0). Best practices: (1) Liveness checks lightweight: avoid checking dependencies (DB, external APIs) or expensive operations - only check if process responds, (2) Readiness checks dependencies: DB connection, downstream services - determines if ready to serve, (3) Different endpoints: /livez for liveness (always passes unless deadlock), /readyz for readiness (checks dependencies), (4) Tune thresholds: initialDelaySeconds covers startup time, failureThreshold allows transient failures (3-5 retries over 30-50s). Common mistakes: (1) Same endpoint for liveness/readiness causes cascading failures (DB down → liveness fails → all pods restart), (2) No startup probe for slow apps (liveness kills during startup), (3) Too aggressive timeouts (false positives), (4) Expensive checks (slow probe execution). Example: Java app startup probe: initialDelaySeconds: 0, periodSeconds: 5, failureThreshold: 30 gives 150s startup window, then liveness takes over. Critical for: rolling updates (readiness ensures old pods drain before termination), autoscaling (only scales healthy pods), service mesh (observability).
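A container-level probe sketch applying these practices: separate /livez and /readyz endpoints plus a startup probe for a slow-booting app; paths, port, and thresholds are illustrative:

```yaml
livenessProbe:
  httpGet:
    path: /livez          # cheap check: is the process responsive?
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz         # checks dependencies (DB, downstream services)
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
startupProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 5
  failureThreshold: 30    # 30 x 5s = 150s startup window before liveness takes over
```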
NetworkPolicies provide Layer 3/4 firewall rules for pod-to-pod communication, critical for implementing zero-trust security in Kubernetes clusters. Default Kubernetes behavior: all pods can communicate with all pods across all namespaces (flat network), NetworkPolicies enable explicit allow-list model. Requires CNI plugin support: Calico 3.27+ (most popular, 40% market share), Cilium 1.15+ (eBPF-based, fastest performance), Weave Net, or Antrea - note that default kubenet and Flannel CNI plugins do NOT enforce NetworkPolicies. Basic deny-all baseline policy: create NetworkPolicy with empty podSelector (matches all pods in namespace) and empty ingress/egress arrays, effectively blocking all traffic. Production policy structure: specify podSelector (which pods this policy applies to), policyTypes array ([Ingress, Egress] or subset), ingress rules (allowed inbound traffic sources + ports), egress rules (allowed outbound destinations + ports). Example three-tier architecture: frontend policy allows ingress from nginx-ingress namespace on port 3000, egress to backend on port 8080; backend policy allows ingress from frontend on port 8080, egress to database namespace on port 5432; database policy allows ingress only from backend on port 5432, egress to nothing (deny all outbound). Zero-trust implementation steps: (1) Audit mode - deploy policies in Cilium with audit mode enabled or use Calico's policy recommendation tool to observe actual traffic patterns for 7-14 days without enforcement, (2) Default deny - apply deny-all NetworkPolicy to each namespace as baseline, (3) Explicit allows - create granular NetworkPolicy per service allowing only documented traffic flows, (4) Namespace isolation - use namespaceSelector with labels (e.g., env: production, team: payments) to restrict cross-namespace traffic, (5) External egress control - use ipBlock rules to allow specific external IPs (APIs, databases) while blocking general internet or internal RFC1918 ranges. Advanced patterns: (1) DNS egress - allow pods to reach kube-dns/CoreDNS on port 53 UDP (required for service discovery), (2) Metrics scraping - allow Prometheus to scrape metrics endpoints across namespaces via podSelector + namespaceSelector combination, (3) Service mesh integration - NetworkPolicies work alongside Istio/Linkerd providing defense-in-depth (NetworkPolicy enforces L3/4, service mesh enforces L7), (4) FQDN-based policies - Cilium supports DNS-aware policies allowing rules like toFQDNs: [{matchName: api.stripe.com}] instead of IP ranges. Production best practices: (1) Start with monitoring/audit mode (Cilium), analyze actual traffic before enforcing, (2) Apply default-deny incrementally namespace-by-namespace (not cluster-wide), (3) Label pods consistently (app, version, tier labels) for maintainable selectors, (4) Document allowed flows in Git alongside policy YAML, (5) Use namespace labels for environment isolation (dev/staging/prod), (6) Test policy changes in staging with identical traffic patterns. Performance metrics: Cilium eBPF achieves <5µs latency overhead per packet, Calico iptables-based adds ~10-20µs, both negligible for application performance. Scale limits: tested up to 10,000 pods with 1000+ NetworkPolicies without degradation. 
Common pitfalls: (1) Forgetting DNS egress rules breaks service discovery (pods can't resolve service names), (2) Overly broad selectors (matchLabels: {}) accidentally allow too much traffic, (3) Default-deny ingress can interfere with kubelet liveness/readiness probes on some CNI implementations, (4) Testing in dev with fewer pods misses production edge cases, (5) Applying default-deny without explicit allows breaks existing services. Debugging tools: kubectl describe networkpolicy shows policy details, kubectl exec -it pod -- curl -v target-service tests connectivity, Cilium Hubble provides flow visualization, Calico calicoctl supports packet capture. Compliance benefits: NetworkPolicies satisfy PCI-DSS requirement 1.2 (network segmentation), HIPAA 164.312 (access controls), SOC 2 CC6.6 (logical access), provide audit trail of allowed/denied flows for compliance reporting. Real-world impact: financial services company implemented zero-trust NetworkPolicies across 200-pod production cluster, blocked 15 security incidents in first 6 months (lateral movement attempts), reduced blast radius of compromised pods from cluster-wide to single service. 2025 adoption: 65% of enterprises use NetworkPolicies in production (up from 50% in 2024), driven by compliance requirements and high-profile breach prevention.
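A sketch of the default-deny baseline plus an explicit allow for the backend tier from the three-tier example, including the DNS egress rule the pitfalls above call out; labels, namespaces, and ports are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}              # matches every pod in the namespace
  policyTypes: [Ingress, Egress]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend   # only the frontend tier may call the backend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: database
      ports:
        - protocol: TCP
          port: 5432
    - to:                      # DNS egress so service discovery keeps working
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
```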
Blue-green and canary enable zero-downtime deployments with quick rollback capability. Blue-green: run two identical environments (blue=current, green=new), switch traffic atomically. Implementation: (1) Two Deployments: blue-deployment (replicas: 3, version: v1) and green-deployment (replicas: 3, version: v2), (2) Service selector switches: kubectl patch service myapp -p '{"spec":{"selector":{"version":"v2"}}}' switches from blue to green instantly, (3) Rollback: patch selector back to v1. Benefits: instant cutover, easy rollback, safe testing (green runs alongside blue). Drawbacks: double resource cost during deployment, all-or-nothing switch. Canary: gradually shift traffic to new version while monitoring metrics. Implementation methods: (1) Manual replica adjustment: start with 1 new pod (10% traffic), gradually increase to 10 pods (100%), (2) Ingress-based: nginx-ingress canary annotations: nginx.ingress.kubernetes.io/canary: 'true'; nginx.ingress.kubernetes.io/canary-weight: '10' sends 10% to canary service, (3) Service mesh (Istio): VirtualService with traffic split: route: [{destination: {host: myapp, subset: v1}, weight: 90}, {destination: {host: myapp, subset: v2}, weight: 10}]. Progressive delivery: automate canary analysis (monitor error rates, latency), auto-promote or rollback based on metrics (Flagger tool). Best practices: (1) Start with 5-10% canary traffic, (2) Monitor key metrics (error rate, latency P95, success rate), (3) Gradual increase: 10% → 25% → 50% → 100% over 30-60 minutes, (4) Automated rollback on metric threshold breach. Use blue-green for: low-traffic apps, database migrations (switch atomically). Use canary for: high-traffic apps, gradual validation, automated analysis. 2025 trend: canary with automated analysis (Flagger, Argo Rollouts) becoming standard for production deployments.
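A sketch of the ingress-nginx canary approach as a standalone Ingress; hostname, Service name, and the 10% weight are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% of traffic to the canary
spec:
  ingressClassName: nginx
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary    # Service selecting the v2 pods
                port:
                  number: 80
```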
HPA automatically scales pod replicas based on observed metrics. Mechanism: Controller queries metrics-server every 15s, calculates desired replicas: ceil(current * (current_metric / target_metric)), updates Deployment/StatefulSet spec. The autoscaling/v2 API (stable since Kubernetes 1.23) supports multiple metric sources. Built-in: CPU/memory via kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10. Custom metrics (via Prometheus Adapter or KEDA): HTTP request rate (scale at 100-500 req/sec per pod), queue depth (RabbitMQ, Kafka consumer lag at 1000+ messages), business metrics (active connections, concurrent queries). Multi-metric HPA chooses metric requiring most pods. Best practices: (1) minReplicas 2+ (HA), (2) CPU target 70-80% (headroom for spikes), (3) behavior field limits scale-down: stabilizationWindowSeconds: 300 prevents thrashing, (4) combine with VPA for right-sizing. Common pitfalls: Slow-starting apps without readiness probes, no cooldown causing rapid cycles, insufficient cluster capacity blocks scale-up. Use Cluster Autoscaler for node-level scaling.
Requests/limits control pod resource allocation and Quality of Service (QoS) class. Requests: Guaranteed minimum, used for scheduling - node must have available ≥ requests. Limits: Maximum consumption, enforced by kernel cgroups. Config: resources: {requests: {cpu: '250m', memory: '256Mi'}, limits: {cpu: '500m', memory: '512Mi'}}. QoS classes (eviction priority): (1) Guaranteed - requests == limits, highest priority, last evicted. (2) Burstable - requests < limits, medium priority. (3) BestEffort - no requests/limits, first evicted. Scheduling: kube-scheduler sums requests per node, schedules on available capacity. Over-commitment: Total limits can exceed node capacity (relies on pods not hitting limits simultaneously). CPU throttling: Pod hitting limit gets throttled. Memory limit: OOMKill (pod restart). Best practices: Set requests from average usage, limits at 2x requests for burst. Monitor with metrics-server/Prometheus, use LimitRanges for namespace defaults, Guaranteed QoS for critical pods. Common mistakes: No requests (can't schedule), limits too low (OOMKills), no limits (starvation). Use Vertical Pod Autoscaler for optimal values.
Service mesh provides observability, security, traffic management without modifying app code. Istio 1.24+ (2024 GA) offers ambient mode (40-50% lower resource overhead vs sidecars). Ambient mode: Layer 4 ztunnel per-node instead of per-pod, opt-in Layer 7 waypoint proxies. Sidecar mode: Envoy proxy injected per pod via mutating webhook. Core capabilities: (1) Traffic management - A/B testing, canary deployments (10%→50%→100%), circuit breaking, retries via VirtualService/DestinationRule CRDs. (2) Security - Automatic mTLS between services (STRICT/PERMISSIVE), certificate rotation every 24h, L7 authorization policies (JWT claims, HTTP methods). (3) Observability - Distributed tracing (Jaeger/Zipkin, 1% sampling), RED metrics (Rate/Errors/Duration) to Prometheus. Install: istioctl install --set profile=ambient (or --set profile=default for sidecar), enable namespace: istio-injection=enabled. Trade-offs: Sidecar adds 50-100MB memory + 0.05-0.1 vCPU per pod, ambient reduces to 10-20MB. P50 latency +0.5-1ms. Use cases: Essential for >20 microservices needing unified security/observability. 2025 adoption: 35% of large enterprises, with ambient mode driving renewed interest.
GitOps uses Git as single source of truth for declarative infrastructure, enabling automated deployments via pull-based reconciliation. Principle: Git repo contains all Kubernetes manifests, operators (ArgoCD, Flux) continuously sync cluster state to match repo. Workflow: Developers push changes → ArgoCD/Flux detects → applies manifests → reconciles differences (self-healing). ArgoCD implementation: Install: kubectl apply -n argocd -f install.yaml. Create Application: kind: Application; spec: {source: {repoURL, path: 'k8s', targetRevision: HEAD}, destination: {server, namespace}, syncPolicy: {automated: {prune: true, selfHeal: true}}}. Benefits: (1) Audit trail in Git history, (2) Rollback via git revert, (3) Disaster recovery from Git, (4) Multi-cluster management (100+ clusters from single repo). Patterns: Environment branches (dev/staging/prod), app-of-apps (ArgoCD app creates other apps), Helm/Kustomize integration. Security: All changes via Git (PR review), RBAC on repos. Trade-offs: Learning curve, 30s-3min sync delay. Best practice: Separate app code from GitOps config repo, use automated image updaters. 2025 adoption: 65% of enterprises implement GitOps.
Kubernetes probes detect unhealthy pods and control traffic routing, critical for zero-downtime deployments. Three probe types: (1) Liveness - Detects if pod alive, kubelet restarts on failure. Use to recover from deadlocks/hangs. (2) Readiness - Detects if pod can serve traffic, removes from Service endpoints on failure. Use during startup or temporary unavailability. (3) Startup - Allows slow-starting pods extended time before liveness. Prevents premature kills. Config: livenessProbe: {httpGet: {path: /healthz, port: 8080}, initialDelaySeconds: 30, periodSeconds: 10, failureThreshold: 3}. Methods: httpGet (200-399 success), tcpSocket, exec (exit 0). Best practices: (1) Liveness lightweight - avoid dependencies/expensive ops, only check if process responds. (2) Readiness checks dependencies - DB, downstream services. (3) Different endpoints: /livez (liveness), /readyz (readiness). (4) Tune thresholds: failureThreshold 3-5 allows transient failures. Common mistakes: Same endpoint for both (DB down → liveness fails → all pods restart), no startup probe for slow apps, too aggressive timeouts. Example: Java app startup probe with failureThreshold: 30, periodSeconds: 5 gives 150s startup window.
NetworkPolicies provide Layer 3/4 firewall rules for pod-to-pod communication, enabling zero-trust security. Default: All pods communicate freely. NetworkPolicies enable explicit allow-list model. Requires CNI plugin: Calico 3.27+ (40% market share), Cilium 1.15+ (eBPF, fastest), Weave, Antrea. kubenet/Flannel DON'T enforce policies. Zero-trust steps: (1) Audit mode - Deploy Cilium audit mode or Calico recommendation tool, observe traffic 7-14 days. (2) Default deny - Apply deny-all NetworkPolicy per namespace (empty podSelector + empty ingress/egress). (3) Explicit allows - Create granular policies per service. (4) Namespace isolation - Use namespaceSelector labels (env: production, team: payments). (5) External egress - ipBlock rules for specific external IPs, block general internet. Example three-tier: Frontend → backend:8080, Backend → database:5432, Database → deny all egress. Advanced: DNS egress (kube-dns port 53), Prometheus scraping, FQDN-based (Cilium: toFQDNs matchName: api.stripe.com). Best practices: Start audit mode, apply default-deny incrementally, label pods consistently, test in staging. Performance: Cilium <5µs latency, Calico ~10-20µs. Scale: 10K pods with 1K+ policies. 2025 adoption: 65% enterprises (up from 50% in 2024).
GitOps uses Git as single source of truth for declarative infrastructure and applications. Core principles: (1) Declarative - all Kubernetes manifests stored in Git (YAML/JSON), (2) Versioned - Git commits provide audit trail, rollback capability (git revert reverts cluster state), (3) Pull-based - operators (ArgoCD, Flux) continuously sync cluster state to match Git repo, (4) Automated reconciliation - self-healing, detects drift and auto-corrects. Workflow: Developers push changes to Git → Operator detects changes → Applies manifests to cluster → Reconciles differences. Benefits: audit trail (all changes in Git history), disaster recovery (restore cluster from Git), consistency (prevents kubectl drift), multi-cluster management (manage 100+ clusters from single repo). 2025 adoption: 65%+ enterprise organizations use GitOps.
ArgoCD implementation: (1) Install ArgoCD: kubectl apply -n argocd -f install.yaml. (2) Create Application resource: apiVersion: argoproj.io/v1alpha1, kind: Application, spec: {source: {repoURL: 'github.com/org/app', path: 'k8s', targetRevision: HEAD}, destination: {server: 'https://kubernetes.default.svc', namespace: default}, syncPolicy: {automated: {prune: true, selfHeal: true}}}. Prune deletes resources not in Git, selfHeal auto-corrects drift. (3) ArgoCD continuously polls Git, applies changes automatically. Patterns: (1) Environment branches (dev/staging/prod), (2) App-of-apps (ArgoCD app creates other apps, manages platform), (3) Helm/Kustomize integration (renders templates before applying). UI shows sync status, health, history. Security: restrict cluster access, all changes via Git (PR review required). Sync delay: ~30s-3min (pull-based polling).
Flux implementation: (1) Install Flux: flux bootstrap github --owner=org --repository=fleet --path=clusters/production --personal. Creates Git repo structure, installs Flux controllers. (2) Flux uses GitRepository CRD to watch repo, Kustomization CRD to apply manifests. (3) Continuous reconciliation: Flux polls Git every 1 minute (configurable), applies changes automatically. (4) Structure: clusters/production/ contains Kustomization files, apps/ contains application manifests. Flux features: (1) Multi-tenancy - namespace isolation per team, (2) Notification system - alerts to Slack/Teams on deployments, (3) Image automation - updates manifests when new container images pushed. Patterns: separate app code repo from GitOps config repo, use flux image automation to trigger config updates. Flux reconciles faster than ArgoCD (1 min default vs 3 min), but less UI visibility (CLI-focused).
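A minimal Flux sketch of the GitRepository source and a Kustomization applying the clusters/production path; the repository URL, names, and intervals are illustrative:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: fleet
  namespace: flux-system
spec:
  interval: 1m                # poll the repo every minute
  url: https://github.com/org/fleet
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: production
  namespace: flux-system
spec:
  interval: 1m
  path: ./clusters/production
  prune: true                 # remove resources deleted from Git
  sourceRef:
    kind: GitRepository
    name: fleet
```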