system_design 10 Q&As

System Design FAQ & Answers

10 expert System Design answers researched from official documentation. Every answer cites authoritative sources you can verify.


10 questions
A

CAP theorem states a distributed data store can guarantee at most two of three properties: Consistency (all nodes see the same data), Availability (every request to a non-failing node receives a non-error response), Partition tolerance (system continues despite network failures). In practice, partition tolerance is mandatory (networks fail), so during a partition you choose between consistency and availability: CP or AP. CP databases (MongoDB, HBase, Redis): sacrifice availability during partitions, ensure consistency - nodes that cannot reach a quorum reject reads/writes until the partition resolves. Use for: financial transactions, inventory systems, anything requiring strong consistency. AP databases (Cassandra, CouchDB, DynamoDB): sacrifice consistency during partitions, ensure availability - allow stale reads, eventual consistency. Use for: social media feeds, analytics, product catalogs, anything tolerating temporary inconsistency. Microservices pattern: most choose AP + eventual consistency - users tolerate slight delays over unavailability. Strategies for consistency: (1) Event sourcing with CQRS, (2) Saga pattern for distributed transactions, (3) Conflict resolution (last-write-wins, version vectors). Example: e-commerce - order service (CP, needs consistency), product catalog (AP, eventual consistency OK), user sessions (AP, availability critical). Modern nuance: CAP is a spectrum, not binary - many stores let you tune consistency per operation (Cassandra: ONE, QUORUM, ALL). Best practice: analyze consistency requirements per service, default to AP + eventual consistency, use CP only when truly needed.
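A minimal sketch of the "spectrum, not binary" point: with N replicas, choosing read quorum R and write quorum W so that R + W > N forces read and write sets to overlap (CP-leaning), while smaller quorums trade staleness for availability and latency (AP-leaning). The function below is illustrative and not tied to any particular database.

```python
# Quorum overlap check: with N replicas, R + W > N guarantees a read quorum
# intersects the last successful write quorum, so reads see the latest
# acknowledged write. R + W <= N permits stale reads but tolerates more
# unreachable replicas (higher availability).
def quorum_is_strongly_consistent(n_replicas: int, read_quorum: int, write_quorum: int) -> bool:
    return read_quorum + write_quorum > n_replicas

# Cassandra-style settings, expressed numerically (illustrative):
print(quorum_is_strongly_consistent(3, 2, 2))  # QUORUM reads + QUORUM writes -> True
print(quorum_is_strongly_consistent(3, 1, 1))  # ONE/ONE -> False, stale reads possible
```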

99% confidence
A

Horizontal scaling adds more machines vs vertical scaling (bigger machines). Key patterns: (1) Load balancing: distribute traffic evenly across servers using round-robin, least-connections, or consistent hashing. Implementations: NGINX, HAProxy, cloud load balancers (ALB, NLB). Benefits: prevent overload, enable rolling deployments, geographic distribution. (2) Sharding: split large datasets into smaller chunks for parallel access. Strategies: range-based (user_id 0-1000 → shard 1), hash-based (hash(user_id) % num_shards), geography-based (US/EU shards). Use for: multi-tenant databases, time-series data. (3) Caching: store frequently accessed data in memory (Redis, Memcached) or CDN (Cloudflare, CloudFront). Layers: application cache, database query cache, full-page cache. Cache invalidation: TTL, write-through, cache-aside. (4) Stateless services: store session in external storage (Redis, DB) vs server memory, enables any server to handle any request. (5) Queue-based communication: decouple services with message queues (RabbitMQ, Kafka, SQS), enables async processing, backpressure handling. (6) Database replication: read replicas distribute query load; writes still go to the primary, so scaling writes requires sharding or multi-primary setups. Patterns: primary-replica, multi-primary, cluster. Anti-patterns: shared mutable state across services, synchronous inter-service calls creating tight coupling, single points of failure. Metrics: measure latency P95/P99, throughput (requests/sec), error rates. Start simple, scale pain points using profiling data.
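A hedged sketch of hash-based shard routing from pattern (2): a stable hash of the shard key picks the shard. Names and shard count are illustrative; note that changing num_shards remaps most keys, which is the usual argument for consistent hashing when shards are added or removed frequently.

```python
import hashlib

# Route a key to a shard using a stable hash (md5 here so every process agrees,
# unlike Python's builtin hash(), which is randomized per process).
def shard_for(user_id: str, num_shards: int) -> int:
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

print(shard_for("user-42", 8))   # the same key always lands on the same shard
```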

99% confidence
A

Saga pattern manages distributed transactions across microservices without locking resources. Unlike two-phase commit (2PC), which uses a coordinator to hold locks on all participants until commit/rollback (blocking: a coordinator failure or partition can stall every participant), Sagas use a sequence of local transactions with compensating actions for rollback. Two Saga types: (1) Choreography: event-driven coordination - each service publishes domain events (OrderPlaced, PaymentProcessed, InventoryReserved), others subscribe and react. Compensation triggers on failure events (PaymentFailed → ReleaseInventory, RefundPayment). Implementation: use message brokers (Kafka, RabbitMQ, AWS EventBridge) with event schemas in Avro/Protobuf. Benefits: loose coupling, no single point of failure, scales horizontally. Drawbacks: complex debugging (distributed tracing required with OpenTelemetry), cyclic dependency risk, a short eventual-consistency window while events propagate (typically sub-second). (2) Orchestration: central coordinator (workflow engine) manages sequence using state machines. Pseudocode: coordinator.startSaga() → callService('payment', data) → if success: callService('inventory') → if fail: compensate('payment.refund'). Modern frameworks: Temporal (Go/TypeScript, durable execution), Camunda (BPMN workflows, Java/.NET), Axon Framework (event sourcing + CQRS). Benefits: centralized monitoring, clear flow visualization, easier timeout handling. Drawbacks: orchestrator scaling bottleneck (mitigate with stateless workers), couples services to orchestrator API. Compensation design: must be idempotent (use idempotency keys), semantic compensation when impossible to fully revert (e.g., email sent → send apology email, shipment dispatched → issue return label). State persistence: store Saga state in event store or dedicated table with transaction log, enables replay on coordinator failure. In practice, Sagas typically complete faster than 2PC because they avoid cross-service locking, trading that for an eventual-consistency window while retries and compensations run. Use Sagas for: multi-service workflows (order checkout, hotel booking, payment processing), long-running transactions (>5 sec), high availability requirements. Avoid for: single-service operations, strong consistency requirements (banking ledger - use distributed DB transactions). Current trend: orchestration is more commonly chosen than choreography for complex workflows, with teams citing easier debugging despite the orchestrator overhead.
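A minimal orchestration sketch of the core Saga loop described above: run each local step, remember its compensation, and on failure run the recorded compensations in reverse. Step names are hypothetical; real engines such as Temporal or Camunda add durable state, retries, and timeouts on top of this idea.

```python
# Orchestrated Saga sketch: steps is a list of (do, undo) callables.
def run_saga(steps):
    completed = []                       # compensations for steps that already succeeded
    for do, undo in steps:
        try:
            do()
            completed.append(undo)
        except Exception:
            for compensate in reversed(completed):
                compensate()             # compensations should be idempotent
            raise

run_saga([
    (lambda: print("charge payment"),    lambda: print("refund payment")),
    (lambda: print("reserve inventory"), lambda: print("release inventory")),
    (lambda: print("create shipment"),   lambda: print("cancel shipment")),
])
```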

99% confidence
A

Circuit Breaker prevents cascading failures by failing fast when downstream service is unhealthy, similar to electrical circuit breakers. Three states: (1) Closed: requests pass through normally, monitor failure rate, (2) Open: requests fail immediately without calling downstream service (returns cached response or error), prevents wasting resources on doomed calls, (3) Half-Open: after timeout, allows limited requests to test if service recovered, transitions to Closed if succeed or back to Open if fail. Configuration: failureThreshold: 5 (open after 5 failures in window), timeout: 60s (stay open for 60s), halfOpenRequests: 3 (test with 3 requests in half-open). Benefits: (1) Fail fast - don't wait for timeouts stacking up, (2) Give failing service time to recover, (3) Prevent resource exhaustion (thread pools, connections), (4) Graceful degradation - return cached/default data. Implementation: libraries like Resilience4j (Java), Polly (.NET), Hystrix (Netflix, deprecated but influential), Opossum (Node.js). Combine with: (1) Timeouts: set request timeouts < circuit breaker threshold, (2) Bulkheads: isolate resources per service (separate thread pools), (3) Retries: retry with exponential backoff before circuit opens. Monitoring: track circuit breaker state changes, open duration, failure rates. Use cases: external API calls, database queries, inter-service communication. Real-world: Netflix pioneered pattern, handled AWS outages gracefully by degrading non-critical features. Best practice: tune thresholds based on SLAs, test with chaos engineering (randomly fail services).
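A compact sketch of the three-state machine described above, with illustrative thresholds. Production libraries (Resilience4j, Polly, Opossum) add sliding failure windows, half-open request budgets, metrics, and thread safety that this toy version omits.

```python
import time

class CircuitBreaker:
    """Closed -> Open after N consecutive failures; Open fails fast; after a
    cooldown, one trial call is let through (half-open) to probe recovery."""

    def __init__(self, failure_threshold: int = 5, open_seconds: float = 60.0):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None and time.monotonic() - self.opened_at < self.open_seconds:
            raise RuntimeError("circuit open: failing fast")     # skip the doomed call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = time.monotonic()                # open, or re-open after a failed probe
            raise
        self.failures = 0
        self.opened_at = None                                    # success closes the circuit
        return result
```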

99% confidence
A

Rate limiting controls request rates to prevent abuse and ensure fair resource usage. Algorithms: (1) Token Bucket: bucket fills with tokens at fixed rate (refill_rate), each request consumes a token, reject if bucket empty. Allows bursts up to bucket capacity. Example: refill 100 tokens/sec, 1000 max tokens (allows bursts of up to 1000 requests). Implementation: tokens = min(max_tokens, tokens + (now - last_refill) * refill_rate); last_refill = now; if (tokens >= 1) { tokens--; allow; } else { reject; }. (2) Leaky Bucket: requests added to queue (bucket), processed at fixed rate (leak_rate), overflow rejected. Smooths bursts. Example: 100 req/sec processing, 1000 queue size. (3) Fixed Window: count requests in time window (e.g., 0-60s), reset counter at window end. Simple but allows double burst at window boundary. (4) Sliding Window Log: track timestamps of requests in rolling window, count requests in last N seconds. Accurate but memory-intensive. (5) Sliding Window Counter: hybrid of fixed window + sliding, maintains counters per window, weighs current + previous window. Distributed implementation: use Redis with a Lua script for atomicity (fixed window; set the expiry only when the counter is first created so the window doesn't keep extending): local current = redis.call('INCR', KEYS[1]); if current == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end; if current > tonumber(ARGV[2]) then return 0 else return 1 end. Challenges: (1) Clock skew across servers, (2) Race conditions (solved by Lua scripts), (3) Redis single point of failure (use Redis Cluster). Advanced: per-user limits, tiered limits (free vs paid), dynamic limits based on load. Response: return 429 Too Many Requests with Retry-After header. Best practices: (1) Rate limit at multiple layers (API gateway, application, database), (2) Different limits for read vs write, (3) Whitelist critical services, (4) Monitor rate limit hits (potential abuse or legitimate load spike).
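For the token-bucket variant, here is a single-process sketch that mirrors the update formula quoted above; a distributed limiter would hold the same state in Redis behind a Lua script. The rates are illustrative.

```python
import time

class TokenBucket:
    def __init__(self, refill_rate: float, capacity: float):
        self.refill_rate = refill_rate          # tokens added per second
        self.capacity = capacity                # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                    # spend one token per request
            return True
        return False                            # bucket empty: caller should return 429

bucket = TokenBucket(refill_rate=100, capacity=1000)   # 100 req/s sustained, bursts up to 1000
print(bucket.allow())
```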

99% confidence
A

Event sourcing stores all changes as immutable event sequence instead of current state. Events are append-only facts (UserRegistered, OrderPlaced, PaymentProcessed) with metadata (timestamp, user_id, correlation_id). Current state derived by replaying events through aggregate reducers: state = events.reduce(applyEvent, initialState). Benefits: (1) Complete audit trail (compliance, debugging), (2) Time travel (replay to any point for investigation), (3) Natural event-driven architecture, (4) Reproduce bugs by replaying production events locally. CQRS (Command Query Responsibility Segregation) separates read and write models - commands modify state via write model (domain logic, validations), queries read from optimized read models (denormalized views, different databases). Combination pattern: (1) Commands validate and produce events → EventStore (EventStoreDB 23.x, Kafka 3.x, Axon Server), (2) Event handlers consume events → update projections (materialized views optimized per query type), (3) Queries read from projections (eventual consistency, typically 10-100ms lag). Real-world example: e-commerce orders - Events: OrderPlaced, PaymentAuthorized, InventoryReserved, OrderShipped. Projections: (a) OrderSummary in PostgreSQL (admin dashboard queries with JOINs), (b) UserOrderHistory in MongoDB (fast customer lookups by user_id), (c) InventoryLevels in Redis (real-time stock counts), (d) OrderSearchIndex in Elasticsearch (full-text search). Each projection subscribes to relevant events only. Technology stack (2025): EventStoreDB (native event store, projections built-in), Kafka (high-throughput event streaming, 1M+ events/sec), PostgreSQL (event log table with partitioning for budget option), Marten (event sourcing library for .NET + PostgreSQL). Projection frameworks: Axon Framework (Java), Eventuous (.NET), Commanded (Elixir). Benefits together: (1) Independent scaling (writes to EventStore, reads from Elasticsearch), (2) Multiple views without duplicating write logic, (3) Add new features by replaying existing events into new projections (zero downtime). Challenges: (1) Operational complexity (multiple databases, event version management), (2) Eventual consistency (read-after-write may not see update for 50-200ms typical), (3) Event schema evolution (use upcasters/transformers for backward compatibility), (4) Snapshot optimization (store periodic state snapshots every 100-1000 events to avoid replaying millions). Use when: audit requirements (financial, healthcare), complex domain with multiple views of same data, need temporal queries (state at specific time), event-driven microservices. Avoid for: simple CRUD apps, tight consistency requirements (real-time trading), small teams without event sourcing expertise. Best practice: start with CQRS (separate read/write models using traditional DB), add event sourcing later when audit/temporal needs emerge. Monitor projection lag (<100ms production SLA typical).
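A small sketch of the replay/reducer idea (state = events.reduce(applyEvent, initialState)). Event names and fields are illustrative; a real system would stream the events for one aggregate from the event store and snapshot periodically.

```python
def apply_event(state: dict, event: dict) -> dict:
    kind = event["type"]
    if kind == "OrderPlaced":
        return {**state, "status": "placed", "items": event["items"]}
    if kind == "PaymentAuthorized":
        return {**state, "status": "paid"}
    if kind == "OrderShipped":
        return {**state, "status": "shipped"}
    return state                                  # ignore unknown events (schema evolution)

events = [
    {"type": "OrderPlaced", "items": ["sku-1"]},
    {"type": "PaymentAuthorized"},
    {"type": "OrderShipped"},
]

state: dict = {}
for event in events:                              # replaying the log rebuilds current state
    state = apply_event(state, event)
print(state)                                      # {'status': 'shipped', 'items': ['sku-1']}
```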

99% confidence
A

Distributed locks coordinate access to shared resources across services/instances, preventing race conditions in distributed systems. Production implementations (2025): (1) Redis Redlock: acquire locks on majority of Redis nodes (3-5 instances). Algorithm: generate unique lock_id (UUID), attempt SET lock_key lock_id NX PX 30000 on each node; the lock is acquired if a majority succeed and the total acquisition time is less than the lock TTL minus a clock-drift allowance (roughly lock_ttl * 0.01 plus a few milliseconds), with the remainder as the lock's validity time. Release with Lua script: if redis.call('GET',KEYS[1]) == ARGV[1] then return redis.call('DEL',KEYS[1]) else return 0 end (prevents releasing someone else's lock after expiry). Libraries: node-redlock (Node.js), redlock-py (Python), Redisson (Java). Pros: low latency (2-5ms), high availability. Cons: controversial correctness during network partitions (see Martin Kleppmann's critique). (2) etcd: lease-based locking with compare-and-swap. Create lease (TTL 30s), acquire lock with txn: if key not exists → set key with lease, else fail. Keep-alive extends lease. Automatic release on client disconnect. Native library support in Go, Python, Java. Pros: strong consistency (Raft consensus), predictable behavior during partitions. Cons: higher latency (10-20ms), requires etcd cluster. (3) ZooKeeper: ephemeral sequential nodes pattern. Create /locks/resource-0000001, get children of /locks, if yours is lowest sequence → lock acquired, else watch next-lowest node. Automatic cleanup when session expires. Apache Curator library provides recipes. Pros: battle-tested (Kafka, HBase use), automatic cleanup. Cons: complex setup, Java-centric ecosystem. (4) PostgreSQL advisory locks: SELECT pg_try_advisory_lock(12345) returns true if acquired. Session-level (auto-release on disconnect) or transaction-level locks. Pros: simple if already using PostgreSQL, transactional guarantees. Cons: database becomes coordination bottleneck, lock table contention. Challenges and solutions: (1) Lock expiry (process holding lock crashes or delays): use heartbeat to extend TTL (etcd/ZooKeeper keep-alive), set TTL > worst-case processing time + buffer. (2) Split-brain (network partition causes multiple lock holders): use fencing tokens (monotonically increasing counter, ZooKeeper zxid or etcd revision), resource checks token before accepting operations. (3) Deadlocks: use lock timeouts (try_lock with 5-10 sec timeout), ordered locking (alphabetical resource names), deadlock detection. (4) Performance: locks serialize operations (10K req/sec → 100 req/sec with global lock). Mitigation: minimize lock scope (lock per user_id, not global), use optimistic locking (version fields, retry on conflict) for low contention, queue-based coordination instead of locks. Production patterns: (1) Leader election: acquire lock on /leader key, holder is leader (run cron jobs, stream processing). Libraries: etcd election, ZK LeaderLatch. (2) Distributed cron: acquire lock before job execution, release after. Ensures one instance runs task across cluster. (3) Resource allocation: lock before assigning limited resources (IP addresses, license seats). Production configuration: lock TTL 10-30 seconds (balance safety vs recovery time), retry with exponential backoff (100ms, 200ms, 400ms), timeout total attempts at 5-10 seconds. Monitoring: track lock acquisition latency P95 (<50ms healthy), hold duration (alert if >TTL * 0.8), contention rate. Use cases: leader election in clustered apps, scheduled job coordination (cron), inventory allocation (prevent overselling).
Avoid when: can use database transactions, queue-based coordination (SQS, Kafka), CRDTs for eventual consistency. 2025 recommendation: etcd for strong consistency requirements (financial, inventory), Redis Redlock for low-latency best-effort locks (cache invalidation, non-critical coordination), PostgreSQL advisory for simplicity if database already present.
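Below is a sketch of the single-node Redis primitive this answer builds on (SET NX PX to acquire, check-and-delete Lua script to release). It is not full Redlock, which repeats the acquire step on a majority of independent nodes and subtracts elapsed time and drift from the TTL. Assumes the redis-py client; key names and TTLs are illustrative.

```python
import uuid
import redis

RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
else
  return 0
end
"""

r = redis.Redis()                                   # assumes Redis on localhost:6379

def acquire(lock_key: str, ttl_ms: int = 30000):
    token = str(uuid.uuid4())                       # unique owner id, stored as the lock value
    if r.set(lock_key, token, nx=True, px=ttl_ms):  # only succeeds if the key does not exist
        return token
    return None

def release(lock_key: str, token: str) -> bool:
    # Atomic check-and-delete so an expired holder can't remove someone else's lock.
    return bool(r.eval(RELEASE_SCRIPT, 1, lock_key, token))

token = acquire("locks:inventory:sku-1")
if token:
    try:
        ...                                         # critical section
    finally:
        release("locks:inventory:sku-1", token)
```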

99% confidence
A

Bulkhead pattern isolates resources (threads, connections, memory) to prevent cascading failures, inspired by ship compartments that contain flooding. Without bulkheads: one slow dependency (payment API timeout 30s) consumes all threads → entire application blocked. With bulkheads: each dependency gets dedicated resource pool → payment slowness isolated, other features continue. Implementation types (2025): (1) Thread pool bulkheads: assign separate ExecutorService per dependency. Java example: paymentPool = Executors.newFixedThreadPool(10), inventoryPool = newFixedThreadPool(15). If payment service degrades, only 10 threads blocked, inventory operations use separate 15 threads. Node.js: worker thread pools per task type. (2) Semaphore bulkheads: limit concurrent calls using semaphores (lightweight vs thread pools). Pattern: paymentSemaphore = new Semaphore(20), acquire before call, release in finally. Rejects excess requests immediately (fail-fast). Pros: lower memory overhead (<1MB vs 1MB per thread), faster context switching. Cons: uses caller thread (can still block if not async). (3) Connection pool bulkheads: separate database connection pools per service/tenant. Config: serviceA_pool max=20, serviceB_pool max=15. Prevents one service exhausting all connections. Multi-tenancy: isolate tenant resources (tenant_A max 50 connections, tenant_B max 30). (4) Container resource limits (Kubernetes): set CPU/memory limits per pod - limits: cpu: 500m, memory: 512Mi. OS-level isolation prevents resource starvation across services. Framework implementations: Resilience4j (Java) - bulkhead configuration: maxConcurrentCalls: 10, maxWaitDuration: 500ms (queue timeout). Polly (.NET) - bulkhead with queue: maxParallelization: 12, maxQueuingActions: 8. Hystrix (Netflix, archived but influential) - thread pool per command group. Pool sizing formula: pool_size = (peak_requests_per_sec * P99_latency_sec) + buffer. Example: 100 req/sec, 200ms P99 latency → (100 * 0.2) + 5 = 25 threads. Add 20-30% buffer for variance. Over-provisioning wastes memory, under-provisioning causes rejections. Benefits: (1) Fault isolation - payment API down, search/checkout still work (partial degradation), (2) Resource guarantees - critical services get reserved capacity, (3) Prevents thread pool exhaustion (thread starvation deadlock), (4) Blast radius containment - failure affects only bulkhead, not entire system. Monitoring metrics: thread pool utilization (70-80% healthy), queue depth (alert if >50% capacity), rejection rate (BulkheadFullException count), wait time P95. Production patterns: (1) Combine with circuit breakers (fail-fast when bulkhead + service unhealthy), (2) Tiered bulkheads (critical APIs get larger pools), (3) Dynamic sizing (scale pools based on load, requires careful tuning). Use cases: multi-tenant SaaS (isolate customer resources), microservices calling multiple dependencies (payment, inventory, shipping), external API integrations (third-party APIs with variable latency). Real-world example: e-commerce checkout - separate bulkheads for payment (10 threads), inventory (15), shipping (8), email (5). Payment service timeout doesn't prevent inventory checks. Kubernetes production: set resource requests (guaranteed) and limits (maximum) - requests: cpu: 200m, limits: cpu: 500m. Use LimitRange for namespace defaults. Avoid: bulkheads for internal fast operations (adds overhead), excessive bulkheads (operational complexity), uniform sizing (critical services need more resources). 
2025 best practices: start with semaphore bulkheads (simpler), use thread pools only when caller thread blocking unacceptable, combine with timeouts (prevent slow calls holding resources), test with chaos engineering (overload specific bulkheads), monitor rejection rates (tune pool sizes).
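A semaphore-bulkhead sketch matching the fail-fast behavior described above: cap concurrent calls to one dependency and reject the overflow instead of queueing it. The pool size is illustrative; Resilience4j and Polly provide the production-grade equivalents with wait timeouts and metrics.

```python
import threading

class Bulkhead:
    def __init__(self, max_concurrent: int):
        self._sem = threading.Semaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        if not self._sem.acquire(blocking=False):   # bulkhead full: reject immediately
            raise RuntimeError("bulkhead full: rejecting call")
        try:
            return fn(*args, **kwargs)
        finally:
            self._sem.release()                     # always free the slot

payment_bulkhead = Bulkhead(max_concurrent=10)      # isolated from inventory/shipping bulkheads
# payment_bulkhead.call(charge_card, order)         # hypothetical downstream call
```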

99% confidence
A

Idempotent operations produce same result when called multiple times with same input, critical for handling retries, network failures, and eventual consistency in distributed systems. Why critical: network failures cause duplicate requests (client timeout → retry, but first request succeeded), message queues deliver at-least-once (same message processed multiple times), distributed transactions require retries. Patterns: (1) Natural idempotency: GET, PUT, DELETE are naturally idempotent - multiple PUTs with same data yield same state. (2) Idempotency keys: client generates unique key (UUID) per request, server tracks processed keys. Implementation: POST /orders {idempotency_key: 'abc-123', ...}. Server: if (processed_keys.has(key)) return cached_response; else { process(); save(key, response); return response; }. Store keys in Redis with TTL (24-72 hours). (3) Conditional requests: use ETags/If-Match headers - server processes only if resource unchanged. (4) Unique constraints: database enforces uniqueness (email, order_id), duplicate requests fail gracefully. (5) Versioning: include version in request, reject if version mismatch. Example: payment processing - charge_id ensures same charge not applied twice even if API called multiple times. Stripe: idempotency key guarantees exactly-once charge. Implementation considerations: (1) Store keys with response (return same response on retry), (2) TTL on keys (24h typical, balance storage vs retry window), (3) Concurrent requests with same key (use locks or compare-and-swap), (4) Failed requests (don't cache errors or cache with shorter TTL). Testing: duplicate requests in tests, use chaos engineering tools. Best practices: (1) Make all non-GET endpoints idempotent, (2) Document idempotency guarantees in API, (3) Client-generated keys better than server-generated (client controls retries), (4) Log idempotency key hits (detect unintentional duplicates). Critical for: payment processing, order creation, inventory updates, any state-changing operation.
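A server-side sketch of the idempotency-key flow described above: look up the key, replay the stored response on a retry, otherwise process and store. The in-memory dict stands in for Redis-with-TTL, and the handler and payload names are hypothetical.

```python
_processed: dict[str, dict] = {}          # idempotency_key -> cached response (Redis + TTL in production)

def create_order(idempotency_key: str, payload: dict) -> dict:
    if idempotency_key in _processed:
        return _processed[idempotency_key]          # duplicate retry: return the original response
    response = {"order_id": f"order-for-{payload['cart_id']}", "status": "created"}
    _processed[idempotency_key] = response          # record the key before acknowledging
    return response

first = create_order("abc-123", {"cart_id": "c-9"})
retry = create_order("abc-123", {"cart_id": "c-9"})
assert first == retry                               # same key, same result: no double order
```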

99% confidence
A

Sidecar pattern deploys auxiliary functionality as separate process/container alongside main application, extending capabilities without modifying app code. Analogy: motorcycle sidecar - separate vehicle sharing journey. Kubernetes implementation: sidecar container runs in same Pod as main container, shares network namespace (communicate via localhost), process namespace, and volumes. Pod manifest example: containers: [app-container (business logic), envoy-sidecar (traffic proxy), fluentd-sidecar (log shipping)]. Common use cases (2025): (1) Service mesh data plane: Envoy/Linkerd proxy sidecar intercepts all network traffic (iptables rules redirect to localhost:15001), handles mTLS (automatic certificate rotation), load balancing (client-side with health checks), retries (exponential backoff), circuit breaking, observability (traces, metrics). (2) Logging/monitoring: Fluentd/Fluent Bit sidecar tails application logs from shared volume, enriches with metadata (pod name, namespace), forwards to aggregator (Elasticsearch, Loki). Decouples app from logging infrastructure. (3) Secret management: Vault Agent sidecar authenticates with HashiCorp Vault using Kubernetes ServiceAccount, fetches secrets, writes to shared volume, auto-rotates before expiry. App reads secrets from file, never calls Vault API. (4) Configuration management: Consul Template sidecar watches Consul/etcd for config changes, renders templates, signals app to reload (SIGHUP). Enables dynamic configuration without app restart. (5) Authentication proxy: OAuth2 Proxy sidecar handles OAuth flow (redirects, token validation), forwards authenticated requests to app with headers (X-Auth-User, X-Auth-Email). App doesn't implement OAuth. Service mesh relationship: service mesh (Istio, Linkerd, Consul Connect) uses sidecar pattern for data plane. Control plane (istiod, linkerd-controller) configures sidecars via xDS protocol. Sidecar proxy (Envoy in Istio/Consul, linkerd2-proxy) handles all service-to-service communication - app sends request to localhost:outbound_port → sidecar intercepts → applies routing rules, retries, mTLS → forwards to destination sidecar → destination sidecar forwards to app. Benefits: (1) Polyglot infrastructure - sidecars work with any language (Go, Java, Python apps use same Envoy sidecar), (2) Separation of concerns - app owns business logic, sidecar owns infrastructure (observability, security, networking), (3) Independent lifecycle - update sidecar image without redeploying app, (4) Reusability - standardized sidecars across organization (golden images), (5) Security boundary - sidecar enforces policies (mTLS, authorization) without app awareness. Drawbacks: (1) Resource overhead - each sidecar consumes 50-200MB memory + 0.05-0.2 CPU cores (10K pods = 500GB-2TB total overhead), (2) Latency - extra hop through sidecar adds 1-5ms P99 latency (localhost network stack + proxy processing), (3) Complexity - more containers to manage (debugging, updates, security patches), (4) Blast radius - sidecar bug affects all pods (Envoy CVE impacts entire mesh). Production patterns: (1) Sidecar injection - Kubernetes mutating webhook automatically injects sidecar (Istio istio-injection=enabled label), (2) Init containers - run before sidecars to setup (iptables rules for traffic interception), (3) Lifecycle management - Kubernetes native sidecar containers (restartPolicy: Always, stable in v1.29) ensures sidecars start before app, stop after app. 
Alternatives: (1) Library pattern - include functionality as library in app (tighter coupling, language-specific, no operational overhead), (2) Node agent - DaemonSet runs one agent per node serving all pods (shared fate - agent down affects all pods, lower resource overhead), (3) Ambient mesh (Istio 1.18+) - moves proxy to node-level ztunnel (zero-trust tunnel), eliminates sidecar overhead for non-HTTP workloads. Use cases by pattern: Sidecar for per-pod concerns (mTLS, app-specific logging), DaemonSet for node-level concerns (log shipping, monitoring agents), Ambient for cost-sensitive workloads (batch jobs, ML training). 2025 best practices: use sidecars for service mesh and secret management (high value), avoid for simple logging (DaemonSet cheaper), standardize sidecar images (security patching), monitor sidecar resource usage (right-size CPU/memory), test sidecar updates in staging (proxy bugs cause outages), use native sidecar containers (proper lifecycle in K8s 1.29+). Resource sizing: Envoy sidecar typical: 128Mi memory, 100m CPU (handles 1K req/sec). Fluentd sidecar: 256Mi memory, 200m CPU (processes 10K log lines/sec). Vault Agent: 64Mi memory, 50m CPU (fetches secrets every 5 min).

99% confidence