
Distributed Consistency FAQ & Answers

15 expert Distributed Consistency answers researched from official documentation. Every answer cites authoritative sources you can verify.

kafka-transactions

3 questions
Q

What happens to messages a consumer has read but not yet committed when the consumer group rebalances?

A

Messages that were read but whose offsets were not committed are re-delivered after the rebalance. The rebalance process: (1) all consumers pause processing, (2) consumers commit their current offsets, (3) partitions are revoked and reassigned, (4) consumption resumes from the last committed offset. Result: at-least-once delivery with potential duplicate processing. With eager rebalancing (the only protocol before 2.4), ALL consumers stop ('stop the world'). With cooperative rebalancing (Kafka 2.4+), only affected partitions stop while others continue processing. For exactly-once, use a transactional producer that commits consumer offsets atomically with the produced messages.
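
A minimal sketch of this transactional read-process-write loop using the confluent_kafka Python client (broker address, topic, and group names are placeholders):

```python
# Consume-transform-produce with a transactional producer:
# consumed offsets commit atomically with produced messages.
from confluent_kafka import Consumer, Producer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    "isolation.level": "read_committed",   # skip messages from aborted txns
    "enable.auto.commit": False,           # offsets are committed in the txn
})
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "example-txn-id",  # stable id enables zombie fencing
})

consumer.subscribe(["input-topic"])
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    producer.produce("output-topic", msg.value())  # the "process" step
    # Commit the consumed offset atomically with the produced message.
    producer.send_offsets_to_transaction(
        [TopicPartition(msg.topic(), msg.partition(), msg.offset() + 1)],
        consumer.consumer_group_metadata(),
    )
    producer.commit_transaction()
```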

99% confidence
Q

What are the main limitations of Kafka's transactions and exactly-once semantics?

A

Critical limitations: (1) SCOPE: Only works within the same Kafka cluster. External systems (databases, APIs) require separate coordination. (2) PATTERN: Only the read-process-write (consume-transform-produce) pattern is covered end to end; the read and process steps remain at-least-once internally. (3) PRODUCER RESTART: Idempotence uses a Producer ID (PID) plus sequence numbers; a new PID on restart resets the idempotence guarantees (a stable transactional.id lets the coordinator restore identity and fence zombies across restarts). (4) SINGLE PARTITION: Idempotence holds only within a single producer session and partition. (5) TRANSACTION TIMEOUT: Default 60 seconds; exceeding it aborts the transaction. (6) PERFORMANCE: Roughly 2-3x latency overhead, and one active transaction per producer limits throughput. (7) CONSUMER REWIND: Manually rewinding offsets redelivers all messages.
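
For reference, a sketch of the producer settings these limitations revolve around, using the confluent_kafka Python client (values shown are the documented defaults or illustrative placeholders):

```python
# The knobs behind idempotence and transactions (illustrative values).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,         # PID + sequence numbers, per partition/session
    "transactional.id": "order-svc-1",  # must be stable across restarts for fencing
    "transaction.timeout.ms": 60000,    # default; coordinator aborts past this
})
```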

99% confidence
Q

What is Kafka's default transaction timeout, and what happens when a transaction exceeds it?

A

Default: 60 seconds (transaction.timeout.ms). If exceeded, the transaction coordinator proactively aborts the transaction, and the producer receives a ProducerFencedException or InvalidProducerEpochException on its next operation. Critical implications: (1) long-running processing must complete within this window, (2) after an abort, the producer must start a new transaction, (3) any messages produced in the aborted transaction are discarded, (4) consumers with isolation.level=read_committed will never see the aborted messages. The broker caps the requestable value via transaction.max.timeout.ms (default 15 minutes, 900000ms); producer initialization fails if it asks for more.
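
A hedged sketch of the recovery path with the confluent_kafka Python client (assumes `producer` is an already-initialized transactional producer):

```python
# Recovering when a transaction is aborted by the coordinator or the
# producer is fenced by a newer instance with the same transactional.id.
from confluent_kafka import KafkaException

def commit_or_recover(producer):
    try:
        producer.commit_transaction()
    except KafkaException as e:
        err = e.args[0]                  # the underlying KafkaError
        if err.txn_requires_abort():
            # e.g., transaction.timeout.ms exceeded: abort locally,
            # then redo the work in a NEW transaction.
            producer.abort_transaction()
        elif err.fatal():
            # e.g., fenced by another producer; this instance is unusable
            # and must be recreated.
            raise
```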

99% confidence

cockroachdb-isolation

3 questions
Q

What is CockroachDB's default isolation level, and can it be changed?

A

The default is SERIALIZABLE. Yes, it can be changed to READ COMMITTED as of version 23.2; before that, CockroachDB supported only SERIALIZABLE. To enable READ COMMITTED, set the cluster setting sql.txn.read_committed_isolation.enabled=true (default true in 24.1+). Use cases for READ COMMITTED: migrating apps from other databases that expect RC behavior, and reducing transaction retry errors in high-contention workloads. Cockroach Labs recommends staying on SERIALIZABLE unless you have specific compatibility or performance needs. SERIALIZABLE prevents all isolation anomalies; READ COMMITTED allows non-repeatable reads and phantom reads.
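
A minimal sketch of opting a session into READ COMMITTED, assuming a psycopg2 connection to CockroachDB 23.2+ (connection string is a placeholder):

```python
# Switch the session's default isolation level on CockroachDB 23.2+.
import psycopg2

conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb")
with conn.cursor() as cur:
    # Requires sql.txn.read_committed_isolation.enabled=true (default in 24.1+).
    cur.execute("SET default_transaction_isolation = 'read committed'")
    cur.execute("SHOW transaction_isolation")
    print(cur.fetchone())  # ('read committed',)
conn.commit()
```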

99% confidence
Q

How large is the performance difference between SERIALIZABLE and READ COMMITTED in CockroachDB?

A

No official percentage benchmark published by CockroachDB. Qualitative differences: SERIALIZABLE causes more transaction retry errors (40001) requiring client-side retry logic. READ COMMITTED minimizes blocking - readers never block writers, writers never block readers, only write-write conflicts cause contention. The main overhead is retry handling, not inherent latency. YugabyteDB (similar architecture) shows minimal difference: ~45K writes/sec with 4.5-6ms latency for both serializable and snapshot isolation. Key insight: the performance difference is primarily in conflict handling and retry overhead, not base operation latency.

85% confidence
Q

How should clients handle CockroachDB's serialization failures (SQLSTATE 40001)?

A

CockroachDB returns SQLSTATE 40001 (serialization_failure) when it detects a conflict that would violate serializability. The client MUST implement retry logic for explicit transactions - CockroachDB does NOT auto-retry them. Retry pattern: (1) catch the 40001 error, (2) re-execute the entire transaction from BEGIN, (3) use exponential backoff between attempts. CockroachDB retries automatically only for implicit transactions (single statements whose results it can still buffer). For explicit transactions, the SAVEPOINT cockroach_restart protocol provides a structured client-side retry loop. With READ COMMITTED (23.2+), most conflicts that would surface as 40001 under SERIALIZABLE are handled transparently - no client retry needed.
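
A minimal client-side retry sketch with psycopg2 (the transaction body and backoff values are placeholders, not official recommendations):

```python
# Retry an entire transaction on SQLSTATE 40001 with exponential backoff.
import time
import psycopg2
from psycopg2.errors import SerializationFailure  # SQLSTATE 40001

def run_transaction(conn, txn_body, max_retries=5):
    for attempt in range(max_retries):
        try:
            with conn:                      # BEGIN ... COMMIT (rollback on error)
                with conn.cursor() as cur:
                    return txn_body(cur)    # re-executed from scratch each try
        except SerializationFailure:
            time.sleep(0.1 * 2 ** attempt)  # back off, then retry
    raise RuntimeError("transaction failed after retries")

# Usage: run_transaction(conn, lambda cur: cur.execute(
#     "UPDATE accounts SET balance = balance - 100 WHERE id = 1"))
```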

99% confidence

postgresql-durability

2 questions
Q

What does synchronous_commit=on actually guarantee in PostgreSQL?

A

With synchronous_commit=on (default), commits wait until synchronous standby(s) confirm they received AND flushed the commit record to durable storage. This ensures the transaction won't be lost unless BOTH primary and all synchronous standbys suffer storage corruption. IMPORTANT: If synchronous_standby_names is empty, 'on' only guarantees local flush - no remote guarantees. The five levels are: off (no wait, risk losing last 3x wal_writer_delay of commits), local (local flush only), remote_write (remote OS write buffer, not disk), on (remote flush to disk), remote_apply (remote flush + visible to queries on standby).

99% confidence
Q

What are PostgreSQL's synchronous_commit levels, and what does each guarantee?

A

Five levels with exact guarantees: (1) off - no waiting, asynchronous commit; may lose up to 3x wal_writer_delay of recent commits on crash, but with no risk of database inconsistency. (2) local - waits for the local WAL flush to disk only. (3) remote_write - local flush + remote write to the standby's OS buffers (NOT flushed to disk); data can be lost if the standby's operating system crashes before the write reaches storage. (4) on (default) - local flush + remote flush to durable storage; the transaction is safe unless both nodes lose their storage. (5) remote_apply - strongest: local flush + remote flush + WAL replayed on the standby, so queries on the standby see the committed data immediately; highest latency. Without synchronous standbys configured, remote_write/on/remote_apply all behave like 'local'.
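
Because synchronous_commit can be changed per transaction, durability can be traded per write rather than cluster-wide. A sketch with psycopg2 (table names are hypothetical):

```python
# Pick a durability level per transaction with SET LOCAL.
import psycopg2

conn = psycopg2.connect("dbname=app")

with conn, conn.cursor() as cur:
    # Cheap, loss-tolerant write: don't wait for the WAL flush.
    cur.execute("SET LOCAL synchronous_commit = 'off'")
    cur.execute("INSERT INTO audit_log(msg) VALUES ('page view')")

with conn, conn.cursor() as cur:
    # Critical write: wait until the standby has replayed the WAL,
    # so reads on the standby see it immediately after commit.
    cur.execute("SET LOCAL synchronous_commit = 'remote_apply'")
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
```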

99% confidence

spanner-consistency

2 questions
Q

How tightly synchronized are clocks under Spanner's TrueTime?

A

7 milliseconds. Google keeps every clock in Spanner within ±3.5ms of true time, so any two clocks in the system are within 7ms of each other. This is achieved with GPS receivers and atomic clocks across datacenters and frequent synchronization. Under normal conditions the actual uncertainty oscillates between roughly 1ms and 7ms between synchronizations. Spanner's commit-wait mechanism waits out the current uncertainty (up to ~7ms) before reporting a transaction as committed, ensuring external consistency without cross-node coordination.

99% confidence
Q

How does Spanner use TrueTime to guarantee external consistency?

A

Spanner uses 'commit-wait': before a node reports a transaction as committed, it waits until the maximum clock uncertainty has passed, so the commit timestamp is guaranteed to be in the past on every clock in the system. This ensures that if transaction T1 commits before T2 starts (in real time), T1's commit timestamp is strictly less than T2's. The TrueTime API returns an interval [earliest, latest] rather than a single point in time, making the uncertainty explicit. By waiting out that interval, Spanner guarantees external consistency (linearizability) without per-transaction coordination between nodes. The cost is added commit latency, bounded by the ~7ms uncertainty.
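
A toy simulation of the idea (illustrative only - real TrueTime is a Google-internal API backed by GPS and atomic clocks; the 7ms bound is taken from the answer above):

```python
# Toy TrueTime-style commit-wait: don't report a commit until its
# timestamp is definitely in the past on every clock in the system.
import time

EPSILON = 0.007  # assumed worst-case clock uncertainty, ~7ms

def tt_now():
    """Return an interval [earliest, latest] around the local clock."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_wait(timestamp_s):
    # Wait until TT.after(timestamp_s): even the slowest clock's
    # earliest-possible reading has passed the commit timestamp.
    while tt_now()[0] <= timestamp_s:
        time.sleep(0.001)
    print("commit reported; external consistency preserved")

commit_ts = tt_now()[1]  # choose s >= TT.now().latest
commit_wait(commit_ts)
```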

99% confidence

distributed-fundamentals

2 questions
Q

What does the CAP theorem actually say, and is partition tolerance optional?

A

CAP: A distributed system can provide at most 2 of 3 guarantees: Consistency (every read receives most recent write), Availability (every request receives a response), Partition tolerance (system continues operating despite network partitions). CRITICAL: Partition tolerance is NOT optional. Network failures are inevitable in distributed systems. The real choice is: DURING a partition, do you sacrifice Consistency (return stale data, AP system) or Availability (reject requests, CP system)? Examples: CP (MongoDB, HBase) - refuse writes during partition. AP (Cassandra, DynamoDB) - accept writes, reconcile later. Eric Brewer (2012): 'two out of three' is misleading - you only choose between C and A when partition occurs.

99% confidence
Q

What is the difference between serializability and linearizability?

A

Different guarantees from different communities: SERIALIZABILITY (databases): Transactions appear to execute in SOME serial order. Multiple operations, multiple objects. Does NOT respect real-time ordering - T2 can appear before T1 even if T1 finished before T2 started. The 'I' in ACID. LINEARIZABILITY (distributed systems): Single operations appear to execute atomically at some point between invocation and response. Respects real-time order - if op A completes before op B starts, A precedes B in system history. Single object. The 'C' in CAP. STRICT SERIALIZABILITY combines both: transactions execute in some serial order that respects real-time. Linearizability is composable (per-object linearizability = system linearizability). Serializability is NOT composable.

99% confidence

postgresql-isolation

1 question
Q

Is PostgreSQL's REPEATABLE READ truly serializable?

A

PostgreSQL's REPEATABLE READ is actually Snapshot Isolation (SI), NOT true serializability. Critical difference: REPEATABLE READ allows the write skew anomaly. Write skew example: two doctors each check that at least 2 doctors are on call, each sees 2, and each removes themselves - result: 0 doctors on call. Both transactions saw consistent snapshots, yet the application invariant is violated. SI's 'first-committer-wins' rule only catches write-write conflicts on the same rows, which write skew never triggers (the two transactions update different rows). SERIALIZABLE (added in PostgreSQL 9.1 via SSI - Serializable Snapshot Isolation) detects the dangerous read-write dependency: one transaction commits, the other aborts with serialization_failure. Workaround at REPEATABLE READ: use SELECT ... FOR UPDATE to lock the rows being read.
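
A sketch of the on-call scenario with psycopg2, assuming a hypothetical oncall(doctor text, on_call bool) table that starts with two doctors on call:

```python
# Write skew under REPEATABLE READ: the invariant check uses a snapshot.
import psycopg2

def go_off_call(doctor):
    conn = psycopg2.connect("dbname=hospital")
    with conn, conn.cursor() as cur:
        # First statement of the transaction sets its isolation level.
        cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        cur.execute("SELECT count(*) FROM oncall WHERE on_call")
        (n,) = cur.fetchone()
        if n >= 2:  # invariant check against this transaction's snapshot
            cur.execute(
                "UPDATE oncall SET on_call = false WHERE doctor = %s",
                (doctor,),
            )
    conn.close()

# If Alice and Bob run this concurrently, each snapshot shows n == 2 and
# both UPDATEs touch different rows, so both commit: 0 doctors remain on
# call. Under SERIALIZABLE one transaction aborts with 40001. At
# REPEATABLE READ, reading with SELECT ... FOR UPDATE closes the gap.
```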

99% confidence

kafka-limits

1 question
Q

What is Kafka's maximum message size, and what happens if a message exceeds it?

A

Default: roughly 1MB (1048576 bytes), controlled by message.max.bytes on the broker and max.request.size on the producer. If exceeded: the producer throws RecordTooLargeException and the message is NOT sent; should an oversized message reach the broker anyway, it is rejected with a MESSAGE_TOO_LARGE error. To raise the limit you must change BOTH the broker's message.max.bytes AND the producer's max.request.size, and also check replica.fetch.max.bytes (replication) and fetch.max.bytes (consumer). Practical ceiling: ~10MB before performance degrades. For larger payloads: chunk the data across multiple messages, store it externally with Kafka holding references, or compress it (often a significant reduction).
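
A sketch of raising the client-side cap and catching the oversized-message error with the confluent_kafka Python client (note that librdkafka's client-side property is message.max.bytes; max.request.size is the Java client's name for the same cap):

```python
# Raise the client-side size cap; the broker's message.max.bytes must
# be raised to match, or the broker will reject the batch.
from confluent_kafka import Producer, KafkaException

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "message.max.bytes": 5 * 1024 * 1024,  # 5MB, up from the ~1MB default
    "compression.type": "zstd",            # often shrinks payloads below the cap
})

try:
    producer.produce("big-topic", b"x" * (10 * 1024 * 1024))  # 10MB: too large
except KafkaException as e:
    # librdkafka rejects oversized messages before they leave the client.
    print("rejected client-side:", e.args[0].str())
producer.flush()
```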

99% confidence

postgresql-timeouts

1 question
Q

What are the defaults for statement_timeout and lock_timeout in PostgreSQL, and what do they imply?

A

Both default to 0 (disabled - wait forever). statement_timeout=0: queries can run indefinitely with no automatic termination. lock_timeout=0: queries wait indefinitely for locks (deadlock detection still kicks in after deadlock_timeout, default 1s, but it only resolves true deadlocks, not ordinary lock waits). Critical implications: (1) long-running queries can block vacuum and cause table bloat, (2) lock waits can cascade into connection pool exhaustion, (3) production systems should set reasonable values (e.g., statement_timeout=30s, lock_timeout=10s). Set them per session with SET or per role with ALTER ROLE. idle_in_transaction_session_timeout (default 0) is equally important for connection pool health.
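
A sketch of setting the timeouts per session and handling the resulting errors with psycopg2 (values and the accounts table are illustrative):

```python
# Bound query runtime and lock waits, and handle the errors they raise.
import psycopg2
from psycopg2.errors import QueryCanceled, LockNotAvailable

conn = psycopg2.connect("dbname=app")
with conn.cursor() as cur:
    cur.execute("SET statement_timeout = '30s'")
    cur.execute("SET lock_timeout = '10s'")
    cur.execute("SET idle_in_transaction_session_timeout = '60s'")
    try:
        cur.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
        conn.commit()
    except QueryCanceled:      # SQLSTATE 57014: statement_timeout exceeded
        conn.rollback()
    except LockNotAvailable:   # SQLSTATE 55P03: lock_timeout exceeded
        conn.rollback()
```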

99% confidence