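ZooKeeper: ephemeral sequential nodes pattern. Create an ephemeral sequential node under /locks (e.g. /locks/resource-0000000001; ZooKeeper appends the 10-digit sequence number), get the children of /locks, and if yours has the lowest sequence number the lock is acquired; otherwise watch the node immediately before yours in sequence order and re-check when it is deleted. Automatic cleanup when the session expires. The Apache Curator library provides ready-made recipes. Pros: battle-tested (used by Kafka and HBase), automatic cleanup. Cons: complex setup, Java-centric ecosystem. Use for: leader election (acquire the lock on a /leader node; the holder is the leader and runs cron jobs or stream processing).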
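The decision step of this recipe can be sketched without a ZooKeeper client. This is a minimal illustration, not Curator's implementation: lock_decision and sequence_of are hypothetical helper names, and the child names assume ZooKeeper's zero-padded numeric suffix after the last hyphen.

```python
from typing import List, Optional, Tuple


def sequence_of(node_name: str) -> int:
    """Return the numeric suffix of a sequential node name."""
    return int(node_name.rsplit("-", 1)[1])


def lock_decision(children: List[str], my_node: str) -> Tuple[bool, Optional[str]]:
    """Decide whether my_node holds the lock.

    Returns (True, None) if my_node has the lowest sequence number,
    otherwise (False, predecessor), where predecessor is the node to watch.
    """
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(my_node)
    if index == 0:
        return True, None
    # Watch only the node directly ahead of us, not the current lock holder.
    return False, ordered[index - 1]


# Hypothetical children of /locks, in the arbitrary order getChildren may return.
children = ["resource-0000000003", "resource-0000000001", "resource-0000000002"]
print(lock_decision(children, "resource-0000000001"))  # (True, None)
print(lock_decision(children, "resource-0000000003"))  # (False, 'resource-0000000002')
```

Watching only the immediate predecessor means that a release wakes one waiter rather than every client queued on the lock.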
Distributed Locks Microservices FAQ & Answers
7 expert Distributed Locks Microservices answers researched from official documentation. Every answer cites authoritative sources you can verify.
Redis Redlock: acquire the lock on a majority of independent Redis nodes (3-5 instances). Algorithm: generate a unique lock_id (UUID), attempt SET lock_key lock_id NX PX 30000 on all nodes, and treat the lock as acquired only if a majority succeed and the time spent acquiring is less than the TTL minus a clock-drift allowance (lock_ttl * 0.01, i.e. 300ms for a 30s TTL). If acquisition fails, release the key on every node. Release with a Lua script: if redis.call('GET',KEYS[1]) == ARGV[1] then return redis.call('DEL',KEYS[1]) else return 0 end (prevents releasing someone else's lock). Libraries: node-redlock, redlock-py, Redisson. Pros: low latency (2-5ms), high availability. Cons: correctness is disputed when clocks drift, processes pause, or the network partitions (Martin Kleppmann's criticism).
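The acquisition check can be sketched as arithmetic, independent of any Redis client. The function below is a hypothetical helper, not part of a Redlock library. It assumes the rule from the Redis documentation that the lock counts only if a majority of nodes accepted the SET and some validity time remains after subtracting elapsed time and the drift allowance. The extra 2 ms is an assumption borrowed from common implementations.

```python
CLOCK_DRIFT_FACTOR = 0.01  # drift allowance is lock_ttl * 0.01


def redlock_validity_ms(ttl_ms: int, elapsed_ms: int, acquired: int, total_nodes: int) -> int:
    """Return the remaining validity in ms if the lock counts as acquired, else 0."""
    quorum = total_nodes // 2 + 1
    # Assumption: +2 ms covers Redis expiry granularity, as in common implementations.
    drift_ms = round(ttl_ms * CLOCK_DRIFT_FACTOR) + 2
    validity_ms = ttl_ms - elapsed_ms - drift_ms
    if acquired >= quorum and validity_ms > 0:
        return validity_ms
    return 0  # caller should release the key on every node


# 30s TTL, 40 ms spent talking to 5 nodes, 3 of them accepted the SET.
print(redlock_validity_ms(30000, 40, 3, 5))  # 29658
# Only 2 of 5 accepted: no quorum, so the lock is not held.
print(redlock_validity_ms(30000, 40, 2, 5))  # 0
```

The returned validity, not the original TTL, is the time budget the client can assume for its critical section.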
Challenges: (1) Lock expiry (the holder crashes or stalls while holding the lock) - use a heartbeat to extend the TTL (etcd/ZooKeeper keep-alive) and set TTL > worst-case processing time + buffer. (2) Split-brain (a network partition or long pause leaves multiple processes believing they hold the lock) - use fencing tokens (a monotonically increasing counter such as the ZooKeeper zxid or etcd revision); the protected resource checks the token and rejects operations carrying an older one. (3) Deadlocks - use lock timeouts (try_lock with a 5-10 sec timeout) and ordered locking (always acquire in a fixed order, e.g. alphabetical resource names). (4) Performance - locks serialize operations, so throughput can fall by orders of magnitude (e.g. 10K req/sec → 100 req/sec). Mitigations: minimize lock scope (lock per user_id, not globally), use optimistic locking (version fields, retry on conflict), or use queue-based coordination.
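The fencing-token check in (2) can be sketched as follows. FencedResource is a hypothetical in-memory stand-in for the protected resource; in practice the token would come from the lock service (for example an etcd revision) and the check would live in the storage layer.

```python
from typing import Optional


class FencedResource:
    """Stand-in for a resource that rejects writes carrying a stale fencing token."""

    def __init__(self) -> None:
        self.highest_token = -1
        self.value: Optional[str] = None

    def write(self, token: int, value: str) -> bool:
        # A token lower than one already seen belongs to an older lock holder.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


resource = FencedResource()
print(resource.write(33, "from client A"))         # True
print(resource.write(34, "from client B"))         # True: B acquired the lock later
print(resource.write(33, "late write from A"))     # False: A's lock had expired
print(resource.value)                              # from client B
```

The resource, not the lock holder, makes the final decision, so a client that wrongly believes it still holds the lock cannot overwrite newer data.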
2025 recommendation: etcd for strong consistency requirements (financial, inventory), Redis Redlock for low-latency best-effort locks (cache invalidation, non-critical coordination), PostgreSQL advisory locks for simplicity if a database is already present. Monitoring: track lock acquisition latency at P95 (<50ms is healthy), hold duration (alert if > TTL * 0.8), and contention rate. Production patterns: leader election (etcd election, Curator LeaderLatch on ZooKeeper), distributed cron (acquire the lock before job execution), resource allocation (lock before assigning limited resources such as IP addresses or license seats).
Distributed locks coordinate access to shared resources across services and instances, preventing race conditions in distributed systems. Use cases: leader election in clustered apps, scheduled job coordination (cron), inventory allocation (prevent overselling). Implementations: Redis Redlock, etcd lease-based locking, ZooKeeper ephemeral nodes, PostgreSQL advisory locks. Avoid when a simpler mechanism suffices: database transactions, queue-based coordination (SQS, Kafka), or CRDTs for eventual consistency.
PostgreSQL advisory locks: SELECT pg_try_advisory_lock(12345) returns true if the lock was acquired and false otherwise, without blocking. Locks are session-level (held until explicitly unlocked or the session ends, so they auto-release on disconnect) or transaction-level (pg_try_advisory_xact_lock, released at commit or rollback). Pros: simple if already using PostgreSQL, transactional guarantees. Cons: the database becomes the coordination bottleneck, lock table contention. Use for: simplicity when a database is already present and throughput is modest. Avoid for: high-throughput distributed systems where database contention would become the bottleneck.
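Advisory locks are keyed by integers, so services that think in resource names need a stable mapping from name to key. The sketch below shows one way to do this. advisory_key is a hypothetical helper, hashing with SHA-256 is an arbitrary choice, and the %s placeholders assume a psycopg-style driver.

```python
import hashlib
import struct


def advisory_key(resource: str) -> int:
    """Map a resource name to a signed 64-bit key usable with pg_try_advisory_lock."""
    digest = hashlib.sha256(resource.encode("utf-8")).digest()
    # ">q" unpacks the first 8 bytes as a big-endian signed 64-bit integer,
    # which fits PostgreSQL's bigint range.
    return struct.unpack(">q", digest[:8])[0]


# Statements a client would run with the derived key as the parameter.
TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(%s)"
UNLOCK_SQL = "SELECT pg_advisory_unlock(%s)"

key = advisory_key("inventory:sku-42")  # hypothetical resource name
print(TRY_LOCK_SQL, (key,))
```

Every service instance must derive the key the same way; two different names hashing to the same key would share a lock, which costs concurrency but not safety.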
etcd: lease-based locking with compare-and-swap. Create a lease (TTL 30s), then acquire the lock with a transaction: if the key does not exist, set the key attached to the lease, else fail. Keep-alive extends the lease. The key is deleted automatically when the lease expires or is revoked, so a crashed or disconnected client loses the lock once its keep-alives stop (within one TTL). Client libraries exist for Go, Python, and Java. Pros: strong consistency (Raft consensus), 99.9% reliability. Cons: higher latency (10-20ms), requires an etcd cluster. Recommended for strong consistency requirements (financial, inventory). Production config: lock TTL 10-30 seconds, retry with exponential backoff (100ms, 200ms, 400ms), overall acquisition timeout 5-10 seconds.
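The retry policy in the production config can be sketched independently of any etcd client. acquire_with_backoff is a hypothetical helper, and try_acquire stands in for whatever single-attempt call the client library exposes (for etcd, the create-if-absent transaction). The sleep and clock parameters are injectable only to make the function easy to test.

```python
import time
from typing import Callable


def acquire_with_backoff(
    try_acquire: Callable[[], bool],
    timeout_s: float = 5.0,
    initial_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Retry try_acquire with doubling delays until it succeeds or timeout_s passes."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if try_acquire():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay *= 2  # 100ms, 200ms, 400ms, ...


# Usage with a stand-in that succeeds on the fourth attempt.
attempts = iter([False, False, False, True])
print(acquire_with_backoff(lambda: next(attempts)))  # True
```

In a deployment with many contenders, adding random jitter to each delay would help avoid synchronized retries; that is left out here to keep the sketch short.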