Rate limiting (2025 production patterns): Controls request rates per client, preventing API abuse (DDoS attacks, scraping bots), ensures fair usage (prevents single user consuming all resources), protects backend services (database, external APIs from overload). Production implementations: (1) express-rate-limit (simple in-memory) - Basic setup: const rateLimit = require('express-rate-limit'); const limiter = rateLimit({ windowMs: 15 * 60 * 1000 (15 minutes), max: 100 (100 requests per window per client), standardHeaders: true (RateLimit-* headers in response), legacyHeaders: false (disable X-RateLimit-* deprecated headers), message: 'Too many requests, please try again later' (429 response body, ignored when a custom handler is supplied), handler: (req, res) => res.status(429).json({ error: 'Rate limit exceeded', retryAfter: req.rateLimit.resetTime }) }); app.use(limiter) applies globally. Use case: Single-server apps (<10K req/sec), development/staging environments, quick prototyping. Limitation: In-memory storage lost on restart, not shared across instances (each server tracks limits independently, users can bypass by hitting different servers). (2) Redis-based distributed rate limiting - Production setup: const RedisStore = require('rate-limit-redis'); const redis = require('redis').createClient({ host: 'redis.example.com', port: 6379 }); const limiter = rateLimit({ store: new RedisStore({ client: redis, prefix: 'rl:' (Redis key prefix for namespacing), expiry: 900 (TTL in seconds, auto-cleanup old keys) }), windowMs: 60000 (1 minute window), max: 100 (100 requests per minute) }); app.use(limiter) (a consolidated runnable sketch of (1) and (2) appears below). Benefits: Shared state across all application instances (horizontally scalable, consistent limits regardless of which server handles request), persistent across restarts (Redis retains counts), high performance (Redis INCR operation <1ms). Use case: Multi-instance deployments (Kubernetes, load-balanced servers), production APIs (10K-1M+ req/sec). Architecture: All app servers connect to shared Redis cluster, each request increments counter for client key (IP or user ID), Redis TTL automatically expires old windows. (3) Token bucket algorithm (burst-friendly) - Implementation: const { RateLimiter } = require('limiter'); const limiter = new RateLimiter({ tokensPerInterval: 10 (refill rate: 10 tokens/interval), interval: 'second' (refill every second), fireImmediately: true (when the bucket is empty, calls resolve immediately with a negative remaining count instead of waiting) }); await limiter.removeTokens(1) consumes a token (with fireImmediately disabled it waits until a token is available). Behavior: Allows bursts (client accumulates tokens during idle periods, can consume all 10 instantly), smooths over time (sustained rate limited to 10/sec). Advanced: Cost-based tokens - expensive endpoints cost more (search costs 5 tokens, read costs 1 token), prevents abuse of heavy operations. Use case: API clients with bursty traffic patterns (analytics dashboards, periodic syncs), microservices communication (allow occasional spikes). (4) Sliding window algorithm (most accurate) - Prevents boundary gaming: Fixed window problem - client sends 100 requests at 1:59pm (allowed), another 100 at 2:00pm (new window, allowed), total 200 requests in 2 minutes (defeats 100/min limit). Sliding window solution - tracks request timestamps, counts requests in rolling 60-second window from current time, prevents boundary exploitation.
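Pulling together the inline snippets from implementations (1) and (2), a minimal runnable sketch follows. It assumes express-rate-limit v7 and rate-limit-redis v4 with node-redis v4; option names and import style differ across major versions, so adjust to the installed versions. Hostnames and limits are illustrative.

```javascript
// Hedged sketch consolidating implementations (1) and (2) above.
const express = require('express');
const { rateLimit } = require('express-rate-limit');
const { RedisStore } = require('rate-limit-redis'); // older releases expose this as the default export
const { createClient } = require('redis');

const app = express();

// (1) In-memory limiter: fine for a single instance; counters reset on restart.
const memoryLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15-minute window
  max: 100,                 // 100 requests per window per client
  standardHeaders: true,    // emit RateLimit-* headers
  legacyHeaders: false,     // drop deprecated X-RateLimit-* headers
  handler: (req, res) =>
    res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: req.rateLimit.resetTime, // Date when the current window resets
    }),
});

// (2) Redis-backed store: counters shared by every app instance.
const redisClient = createClient({ url: 'redis://redis.example.com:6379' });

const distributedLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({
    prefix: 'rl:', // Redis key namespace
    sendCommand: (...args) => redisClient.sendCommand(args),
  }),
});

async function main() {
  await redisClient.connect();
  app.use(distributedLimiter); // swap in memoryLimiter for a single-server setup
  app.get('/api/read', (req, res) => res.json({ ok: true }));
  app.listen(3000);
}

main();
```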
Implementation (Redis Lua script): local key = KEYS[1]; local now = tonumber(ARGV[1]); local window = tonumber(ARGV[2]); local limit = tonumber(ARGV[3]); redis.call('ZREMRANGEBYSCORE', key, 0, now - window) (remove expired timestamps); local current = redis.call('ZCARD', key) (count requests in window); if current < limit then redis.call('ZADD', key, now, now) (add new request); redis.call('EXPIRE', key, window) (set TTL); return 0 (allowed); else return 1 (rate limited); end (Node-side middleware wrapping this script is sketched later in this answer). Use case: Strict rate enforcement (billing APIs, security-sensitive endpoints), regulatory compliance (GDPR request limits). Rate limiting strategies (2025 production): (1) Global application-wide limit - app.use(globalLimiter) applies to all routes, prevents total server overload (10K req/sec max capacity). Use 80% of max (8K req/sec limit) for safety margin. (2) Per-endpoint limits - Different limits per route criticality: app.use('/api/search', createRateLimiter({ max: 10, windowMs: 60000 })) for expensive search (10/min), app.use('/api/read', createRateLimiter({ max: 1000, windowMs: 60000 })) for cheap reads (1000/min). Protects backend resources (search hits database hard, reads cached). (3) Per-user limits (authenticated) - Key by user ID instead of IP: const limiter = rateLimit({ keyGenerator: (req) => req.user.id (extract from JWT/session), max: (req) => req.user.tier === 'paid' ? 10000 : 100 (tiered limits) }). Prevents single user monopolizing resources, enables monetization (paid tiers get higher limits). Requires authentication middleware before rate limiter. (4) Tiered pricing limits - Free tier: 100 requests/hour, Basic ($10/month): 10K requests/hour, Pro ($50/month): 100K requests/hour, Enterprise (custom): unlimited with reserved capacity. Implementation: store tier in user object, limiter reads req.user.tier for max calculation. Revenue optimization: track usage, send upgrade prompts when approaching limit (You've used 95/100 free requests, upgrade to Basic for 10K/hour). (5) IP + user hybrid - Unauthenticated requests: limit by IP (100/hour prevents scraping), authenticated requests: limit by user ID (10K/hour for logged-in users). Implementation: keyGenerator: (req) => req.user ? req.user.id : req.ip switches strategy. Prevents abuse while supporting legitimate high-volume users. Response headers (IETF draft RateLimit header fields; Retry-After per RFC 9110): RateLimit-Limit: 100 (max requests per window), RateLimit-Remaining: 73 (requests left in current window), RateLimit-Reset: 57 (seconds until the current window resets; legacy X-RateLimit-Reset headers often carry a Unix timestamp instead), Retry-After: 57 (seconds until client can retry, included in 429 response). Clients parse headers to implement backoff (wait until reset before retrying). Error response format: { error: 'Rate limit exceeded', message: 'Too many requests. Limit: 100 requests per 15 minutes', retryAfter: 57 (seconds), resetTime: '2025-03-15T14:30:00Z' (ISO 8601) }. HTTP status: 429 Too Many Requests (RFC 6585). Advanced patterns (enterprise 2025): (1) Distributed rate limiting with Redis Cluster - Sharded across multiple Redis nodes (horizontal scaling to millions of clients), consistent hashing routes client keys to same node (prevents split-brain counts), cluster-aware client handles failover automatically. Performance: 100K rate limit checks/second per Redis node, 10-node cluster handles 1M checks/second. (2) Cost-based limiting - Assign weights to operations: POST /api/heavy costs 10 tokens (expensive database write), GET /api/light costs 1 token (cheap cache read).
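The Lua script needs Node-side glue; a hedged sketch as Express middleware follows, assuming node-redis v4's client.eval and millisecond timestamps (hence PEXPIRE for the TTL). The sorted-set member gets a random suffix so two requests in the same millisecond don't overwrite each other; key prefix, window, and limit are illustrative.

```javascript
// Hedged sketch: the sliding-window Lua script wrapped as Express middleware.
const SLIDING_WINDOW_LUA = `
  local key    = KEYS[1]
  local now    = tonumber(ARGV[1])
  local window = tonumber(ARGV[2])
  local limit  = tonumber(ARGV[3])
  local member = ARGV[4]
  redis.call('ZREMRANGEBYSCORE', key, 0, now - window)  -- drop expired timestamps
  if redis.call('ZCARD', key) < limit then              -- count requests still in the window
    redis.call('ZADD', key, now, member)                -- record this request
    redis.call('PEXPIRE', key, window)                  -- TTL cleans up idle keys (window is in ms)
    return 0                                            -- allowed
  end
  return 1                                              -- rate limited
`;

function slidingWindowLimiter(client, { windowMs = 60_000, limit = 100 } = {}) {
  return async (req, res, next) => {
    const key = `rl:sw:${req.ip}`;
    const now = Date.now();
    const denied = await client.eval(SLIDING_WINDOW_LUA, {
      keys: [key],
      arguments: [String(now), String(windowMs), String(limit), `${now}-${Math.random()}`],
    });
    if (denied) {
      // Conservative hint: the window is guaranteed to have rolled over by then.
      res.set('Retry-After', String(Math.ceil(windowMs / 1000)));
      return res.status(429).json({
        error: 'Rate limit exceeded',
        message: `Too many requests. Limit: ${limit} per ${windowMs / 1000}s`,
      });
    }
    next();
  };
}

// Usage (client is a connected node-redis v4 client):
// app.use(slidingWindowLimiter(redisClient, { windowMs: 60_000, limit: 100 }));
```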
Limiter tracks token consumption: limiter.consume(req.user.id, operationCost) deducts appropriate amount. Prevents users bypassing limits with cheap requests while monopolizing heavy operations. (3) Exponential backoff penalties - First violation: 1-hour ban (temporary lockout), second violation (within 24 hours): 6-hour ban, third violation: 24-hour ban, fourth violation: permanent ban (requires manual review). Implementation: Redis tracks violation count with TTL, middleware checks ban status before processing request. Deters persistent abusers. (4) Geographic distribution - Deploy rate limiters per region (US-East, EU-West, APAC), each region enforces independent limits (prevents cross-region contention), aggregate quotas tracked in central database (nightly rollup for billing). Use case: Global CDN architecture, compliance with regional regulations (GDPR, data residency). (5) Graceful degradation - When rate limited, serve stale cached data instead of 429 error (better UX than hard rejection), include Cache-Control: max-age=0, stale-while-revalidate=86400 header (indicates cached response). Example: Search API returns last successful results with note: Using cached results due to rate limit. Monitoring and observability (production 2025): Track metrics: rate_limit_hits_total (counter: total 429 responses), rate_limit_remaining_gauge (current available requests per client), rate_limit_window_resets_total (counter: windows expired). Alerts: Spike in rate limit hits (>10% of traffic) indicates bot attack or legitimate traffic growth, geographic anomaly (single country causing 80% of rate limits) suggests coordinated abuse, per-user anomaly (single user hitting limit repeatedly) flags account for review. Dashboard: Real-time graph of rate limit hits by endpoint, client, geographic region. Best practices (2025 production): (1) Layer defense - CDN rate limiting (Cloudflare: 10K req/sec), load balancer rate limiting (nginx: 1K req/sec per backend), application rate limiting (Express: 100 req/min per user). Multiple checkpoints catch different attack vectors (DDoS at CDN, scraping at app). (2) Differentiated limits - Anonymous users: strict limits (100/hour) prevents abuse, authenticated users: generous limits (10K/hour) supports legitimate use, internal services: no limits (trusted traffic), premium users: highest limits (100K/hour) monetization. (3) Whitelist trusted clients - IP allowlist for internal services (monitoring tools, cron jobs), API key bypass for partners (verified integrations), user role bypass (admin accounts for support). Implementation: middleware checks whitelist before rate limiter, returns next() without counting. (4) Implement jitter in retries - When sending Retry-After header, add random 0-30 second jitter (prevents thundering herd when many clients retry simultaneously). Example: Retry-After: 60 + Math.floor(Math.random() * 30) spreads retries over 60-90 second window. (5) Monitor and adjust dynamically - Analyze traffic patterns weekly (peak hours, seasonal trends), adjust limits to 120% of p95 usage (accommodates growth, prevents false positives), A/B test limit changes (gradual rollout, measure impact on conversion rates). Performance characteristics (2025 benchmarks): In-memory rate limiting: <0.1ms overhead per request (negligible impact). Redis rate limiting: 1-2ms overhead (Redis INCR + network RTT), 500 req/sec per connection (connection pooling essential for 10K+ req/sec). 
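Best practices (3) and (4) above translate into a small amount of express-rate-limit configuration; a hedged sketch follows, with the allowlist contents and limits purely illustrative.

```javascript
// Hedged sketch of practices (3) and (4): trusted clients bypass the limiter via
// `skip` (never counted), and the 429 handler adds 0-30 seconds of jitter to
// Retry-After so throttled clients don't all retry at the same instant.
const { rateLimit } = require('express-rate-limit');

const ALLOWLISTED_IPS = new Set(['10.0.0.5', '10.0.0.6']);  // internal monitoring, cron jobs
const PARTNER_API_KEYS = new Set(['partner-key-example']);   // verified integrations

const apiLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  // Trusted traffic skips the limiter entirely and is not counted.
  skip: (req) =>
    ALLOWLISTED_IPS.has(req.ip) ||
    PARTNER_API_KEYS.has(req.get('x-api-key') || ''),
  // Jittered Retry-After spreads retries over a 60-90 second span.
  handler: (req, res) => {
    const retryAfter = 60 + Math.floor(Math.random() * 30);
    res.set('Retry-After', String(retryAfter));
    res.status(429).json({ error: 'Rate limit exceeded', retryAfter });
  },
});

module.exports = apiLimiter;
```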
Sliding window (Lua script): 2-3ms overhead (Lua execution + ZSET operations), more accurate but slower than fixed window. Token bucket: <0.5ms overhead (local calculation, no I/O). Common pitfalls and solutions (2025): (1) IP-based limiting behind proxies - Problem: All requests appear from load balancer IP (single IP limit applies to all users, legitimate traffic blocked). Solution: Use X-Forwarded-For header with trust proxy: app.set('trust proxy', 1) in Express, limiter uses req.ip (rightmost untrusted IP). Security: Validate X-Forwarded-For to prevent spoofing (only trust if request from known load balancer). (2) Memory leaks with in-memory stores - Problem: Client keys accumulate indefinitely (memory grows unbounded, OOM crash). Solution: Bound the in-memory store with LRU eviction (for example, a custom store backed by lru-cache capped at roughly 10K clients, oldest evicted first). Better: Use Redis (built-in TTL cleanup). (3) Race conditions in distributed systems - Problem: Multiple servers increment Redis counter simultaneously (count exceeds limit briefly). Solution: Use Redis Lua scripts (atomic execution), implement optimistic locking (Redis WATCH + MULTI/EXEC check-and-set), or accept small overages (<1% over limit acceptable for most use cases). (4) Cascading failures - Problem: Rate limiting dependency service (Redis down) blocks all requests (100% error rate). Solution: Fail-open strategy: catch(err => next()) allows requests if rate limiter errors (temporary bypass during outage), log errors for investigation, alert on-call engineer. Circuit breaker pattern: After 10 consecutive Redis failures, open circuit (bypass rate limiting for 60 seconds), periodically test (half-open state) until Redis recovers (see the wrapper sketch below). Essential for production APIs (protects against abuse while enabling 10K-1M+ req/sec throughput with <2ms overhead).
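A hedged sketch of the fail-open and circuit-breaker behavior from pitfall (4): a wrapper around any Express rate-limit middleware that lets traffic through when the limiter itself errors, and bypasses it entirely for a cooldown period after repeated failures. It assumes the wrapped middleware reports store failures via next(err) or a synchronous throw; thresholds and timings are illustrative, not tuned values.

```javascript
// Hedged sketch: fail-open wrapper with a crude circuit breaker.
function failOpenLimiter(innerLimiter, { failureThreshold = 10, cooldownMs = 60_000 } = {}) {
  let consecutiveFailures = 0;
  let circuitOpenUntil = 0; // while open, rate limiting is skipped entirely

  return (req, res, next) => {
    if (Date.now() < circuitOpenUntil) return next(); // circuit open: bypass limiter

    const failOpen = (err) => {
      consecutiveFailures += 1;
      console.error('rate limiter error, failing open', err);
      if (consecutiveFailures >= failureThreshold) {
        circuitOpenUntil = Date.now() + cooldownMs;   // open the circuit for the cooldown period
        consecutiveFailures = 0;
      }
      next();                                         // allow the request anyway
    };

    try {
      innerLimiter(req, res, (err) => {
        if (err) return failOpen(err);
        consecutiveFailures = 0;                      // healthy path: reset the failure counter
        next();
      });
    } catch (err) {
      failOpen(err);
    }
  };
}

// Usage: app.use(failOpenLimiter(distributedLimiter));
```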
Persistent Volume (PV) Access Modes (2025): Define how volumes can be mounted by nodes and pods, critical for multi-pod storage access patterns and data safety. Four access modes: (1) ReadWriteOnce (RWO): Volume mounted read-write by single node only. Multiple pods on SAME node can mount volume simultaneously (node-level restriction, not pod-level). Most common mode for stateful workloads. Use cases: Single-instance databases (PostgreSQL, MySQL primary replica), file-based locks, stateful apps requiring exclusive write access per instance. Supported by: AWS EBS (gp3, io2), GCP Persistent Disk (pd-ssd, pd-balanced), Azure Disk (Premium SSD, Standard SSD), Ceph RBD, iSCSI, local volumes. Behavior: PVC with RWO bound to PV, first pod on node-A mounts successfully, second pod on node-A also mounts (shares node), third pod scheduled to node-B fails with Multi-Attach error (volume already attached to node-A). StatefulSet pattern: each pod gets separate PVC/PV (web-0 → pvc-0 → pv-0, web-1 → pvc-1 → pv-1), ensures no conflicts. (2) ReadOnlyMany (ROX): Volume mounted read-only by multiple nodes simultaneously. All pods across cluster can read, no pod can write. Use cases: Shared configuration files (application configs, ML models, static assets), read replicas consuming immutable datasets, content delivery (serving static website assets). Supported by: NFS, GCP Persistent Disk (read-only mode), Azure File, CephFS, GlusterFS. Pattern: Single PVC with ROX access mode, multiple pods across nodes mount for reading. Perform writes externally or via separate RWO/RWX volume, then expose via ROX for safe multi-reader access. Example: ML inference pods reading shared model file (model.pkl) from NFS volume, training job writes model via RWX, inference pods read via ROX. (3) ReadWriteMany (RWX): Volume mounted read-write by multiple nodes simultaneously. All pods across cluster can read AND write concurrently. Use cases: Shared storage for distributed applications (shared log aggregation, multi-writer file systems, collaborative editing), content management systems (WordPress multi-instance with shared uploads directory), distributed caches. Supported by: NFS (most common), Azure File (SMB-based), CephFS (POSIX-compliant distributed FS), GlusterFS, Portworx, AWS EFS (via EFS CSI driver), GCP Filestore (via Cloud Filestore CSI). NOT supported by block storage (AWS EBS, GCP Persistent Disk, Azure Disk - block devices cannot multi-attach for writes). Performance consideration: RWX volumes typically network-based file systems with higher latency than local/block storage (NFS: 1-5ms latency vs EBS: 0.5-1ms). Consistency: Concurrent writes require application-level coordination (file locking, atomic operations) - filesystem doesn't guarantee write ordering across nodes. Example: WordPress Deployment with 3 replicas, PVC with RWX (storageClassName: nfs-client), all pods mount shared /var/www/html for uploaded media files. (4) ReadWriteOncePod (RWOP, Kubernetes 1.29 stable): Volume mounted read-write by single pod only across entire cluster (strictest isolation). Ensures exclusive access at pod level, not just node level (stronger than RWO). Use cases: Databases requiring absolute exclusivity (prevents split-brain during failover), license-restricted software (single-instance constraint), compliance requirements (audit logs with single-writer guarantee).
Supported by: CSI drivers implementing SINGLE_NODE_SINGLE_WRITER capability (AWS EBS CSI 1.13+, GCP PD CSI 1.8+, Azure Disk CSI 1.23+, Longhorn, Portworx). Behavior: PVC with RWOP bound to PV, first pod mounts successfully, second pod anywhere in cluster (even same node) fails with volume already in use error. Sharing a single RWOP PVC across pods (e.g. a Deployment with replicas > 1) leaves the extra pods Pending; StatefulSets with volumeClaimTemplates avoid this because each replica gets its own PVC. Migration from RWO: accessModes is immutable on an existing PVC, so changing RWO to RWOP requires recreating the PVC (not an in-place update). Example: PostgreSQL primary with RWOP PVC ensures no accidental multi-master scenario during pod rescheduling. Access mode matrix by volume type (2025): Block storage (AWS EBS, GCP PD, Azure Disk, Ceph RBD): RWO and RWOP for writes, with ROX additionally available on some (GCP PD, Ceph RBD); block devices cannot safely multi-attach for writes without a cluster-aware filesystem. File storage (NFS, Azure File, CephFS, GlusterFS): RWO, ROX, RWX, RWOP all supported (network file systems allow multi-mount). Local storage (local volumes, hostPath): RWO only (tied to single node, cannot migrate). Cloud file services: AWS EFS (RWX via EFS CSI), GCP Filestore (RWX via Filestore CSI), Azure Files (RWX via SMB). Specifying access modes in PVC: accessModes field is a list; a PV binds only if it supports every requested mode, and a volume is mounted using one mode at a time even if it supports several. Example PVC: accessModes: [ReadWriteOnce], resources.requests.storage: 10Gi, storageClassName: gp3-encrypted (a full manifest sketch appears at the end of this answer). Kubernetes matches PVC to PV with compatible access mode + sufficient capacity + matching storage class. Common pitfalls (2025): (1) RWO doesn't mean single pod: Multiple pods on same node can mount RWO volume (common confusion). For single-pod guarantee, use RWOP. (2) RWX not available on EBS/Azure Disk: Attempting RWX PVC with gp3 storage class fails with ProvisioningFailed (block storage limitation). Solution: Switch to NFS/EFS storage class or redesign for pod-local storage. (3) Performance degradation with RWX: Network file systems slower than block storage. Benchmark: EBS gp3 (16,000 IOPS, 0.5ms latency) vs NFS (1,000-5,000 IOPS, 2-5ms latency). Use RWX only when multi-writer truly needed. (4) Data corruption with RWX: Concurrent writes without locking cause file corruption. Example: Two pods writing to same log file simultaneously without coordination → interleaved corrupted log entries. Solution: Application-level file locking (flock), database-backed storage, or message queue pattern. (5) Zone affinity with RWO: PV created in zone us-east-1a, pod scheduled to zone us-east-1b fails to mount (volume in different zone). Solution: Use volumeBindingMode: WaitForFirstConsumer in StorageClass (delays PV provisioning until pod scheduled, ensures same zone). Production best practices: (1) Use RWOP for single-writer databases: PostgreSQL, MySQL, MariaDB primary replicas should use RWOP to prevent accidental dual-mount during failover (eliminates split-brain risk). (2) Avoid RWX unless required: Prefer pod-local storage with external object storage (S3, GCS) for shared data. Pattern: Pods write to local disk, background sync to S3, other pods read from S3 (avoids RWX network overhead). (3) Test storage class access modes: Before production, verify PVC with accessModes: [ReadWriteMany] actually provisions RWX-capable volume (some storage classes silently fall back to RWO). (4) Monitor PV binding failures: kubectl describe pvc shows ProvisioningFailed events with reason: volume plugin does not support ReadWriteMany.
(5) Document mode selection rationale: In Helm charts/Terraform, comment why RWX chosen (example: Required for WordPress multi-replica shared uploads, NFS storage class mandatory). Access mode transitions: RWO → RWOP: Requires PVC recreation (delete PVC, recreate with RWOP, data lost unless backed up to separate volume). RWX → RWO: Not recommended (existing multi-mount pods will fail). Solution: Scale down to 1 replica before changing mode. CSI driver capabilities: Modern CSI drivers expose capabilities (SINGLE_NODE_READER_ONLY, SINGLE_NODE_WRITER, MULTI_NODE_READER_ONLY, MULTI_NODE_MULTI_WRITER, SINGLE_NODE_SINGLE_WRITER for RWOP). Check driver docs: kubectl get csidriver, kubectl describe csidriver ebs.csi.aws.com. Troubleshooting: PVC stuck Pending: kubectl describe pvc shows no persistent volumes available (no PV with matching access mode + capacity + storage class). Pod stuck ContainerCreating: kubectl describe pod shows Multi-Attach error (RWO volume already attached to different node, requires pod eviction from original node first). Volume mount fails: Check node logs (dmesg | grep -i mount), CSI driver logs (kubectl logs -n kube-system deploy/ebs-csi-controller).
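Tying the pieces of this answer together (the PVC fields, the WaitForFirstConsumer fix for the zone-affinity pitfall, and the documented-rationale practice above), a hedged manifest sketch follows; resource names, the provisioner, and its parameters are illustrative and should be adapted to the cluster's CSI driver.

```yaml
# Hedged manifest sketch: encrypted gp3 StorageClass with topology-aware binding,
# plus an RWOP claim for a single-writer database, with the rationale recorded
# as comments next to the access mode.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer  # provision the PV only after the pod is scheduled (same zone)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  # Rationale: single-writer PostgreSQL primary; RWOP guarantees pod-level
  # exclusivity and prevents dual-mount during failover. Requires a CSI driver
  # with SINGLE_NODE_SINGLE_WRITER support; fall back to ReadWriteOnce otherwise.
  accessModes:
    - ReadWriteOncePod
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-encrypted
```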