
Distributed Consistency FAQ & Answers

15 expert Distributed Consistency answers researched from official documentation. Every answer cites authoritative sources you can verify.

kafka-transactions

3 questions
Q

What happens to messages a consumer has read but not yet committed when the consumer group rebalances?

A

Messages that were read but whose offsets were not committed are re-delivered after the rebalance. The rebalance process: (1) all consumers pause processing, (2) consumers commit their current offsets, (3) partitions are revoked and reassigned, (4) consumption resumes from the last committed offset. Result: at-least-once delivery with potential duplicate processing. With eager rebalancing (the only protocol before 2.4), ALL consumers stop ('stop the world'). With cooperative rebalancing (Kafka 2.4+), only affected partitions stop while others continue processing. For exactly-once, use a transactional producer that commits consumer offsets atomically with the produced messages.
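
A minimal sketch of this transactional read-process-write loop using the confluent_kafka Python client (broker address, topic, and group names are placeholders):

```python
# Consume-transform-produce with a transactional producer:
# consumed offsets commit atomically with produced messages.
from confluent_kafka import Consumer, Producer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    "isolation.level": "read_committed",   # skip messages from aborted txns
    "enable.auto.commit": False,           # offsets are committed in the txn
})
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "example-txn-id",  # stable id enables zombie fencing
})

consumer.subscribe(["input-topic"])
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    producer.produce("output-topic", msg.value())  # the "process" step
    # Commit the consumed offset atomically with the produced message.
    producer.send_offsets_to_transaction(
        [TopicPartition(msg.topic(), msg.partition(), msg.offset() + 1)],
        consumer.consumer_group_metadata(),
    )
    producer.commit_transaction()
```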

99% confidence
Q

What are the main limitations of Kafka's transactions and exactly-once semantics?

A

Critical limitations: (1) SCOPE: Only works within the same Kafka cluster. External systems (databases, APIs) require separate coordination. (2) PATTERN: Only the read-process-write (consume-transform-produce) pattern is covered end to end; the read and process steps remain at-least-once internally. (3) PRODUCER RESTART: Idempotence uses a Producer ID (PID) plus sequence numbers; a new PID on restart resets the idempotence guarantees (a stable transactional.id lets the coordinator restore identity and fence zombies across restarts). (4) SINGLE PARTITION: Idempotence holds only within a single producer session and partition. (5) TRANSACTION TIMEOUT: Default 60 seconds; exceeding it aborts the transaction. (6) PERFORMANCE: Roughly 2-3x latency overhead, and one active transaction per producer limits throughput. (7) CONSUMER REWIND: Manually rewinding offsets redelivers all messages.
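
For reference, a sketch of the producer settings these limitations revolve around, using the confluent_kafka Python client (values shown are the documented defaults or illustrative placeholders):

```python
# The knobs behind idempotence and transactions (illustrative values).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,         # PID + sequence numbers, per partition/session
    "transactional.id": "order-svc-1",  # must be stable across restarts for fencing
    "transaction.timeout.ms": 60000,    # default; coordinator aborts past this
})
```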

99% confidence
Q

What is Kafka's default transaction timeout, and what happens when a transaction exceeds it?

A

Default: 60 seconds (transaction.timeout.ms). If exceeded, the transaction coordinator proactively aborts the transaction, and the producer receives a ProducerFencedException or InvalidProducerEpochException on its next operation. Critical implications: (1) long-running processing must complete within this window, (2) after an abort, the producer must start a new transaction, (3) any messages produced in the aborted transaction are discarded, (4) consumers with isolation.level=read_committed will never see the aborted messages. The broker caps the requestable value via transaction.max.timeout.ms (default 15 minutes, 900000ms); producer initialization fails if it asks for more.
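
A hedged sketch of the recovery path with the confluent_kafka Python client (assumes `producer` is an already-initialized transactional producer):

```python
# Recovering when a transaction is aborted by the coordinator or the
# producer is fenced by a newer instance with the same transactional.id.
from confluent_kafka import KafkaException

def commit_or_recover(producer):
    try:
        producer.commit_transaction()
    except KafkaException as e:
        err = e.args[0]                  # the underlying KafkaError
        if err.txn_requires_abort():
            # e.g., transaction.timeout.ms exceeded: abort locally,
            # then redo the work in a NEW transaction.
            producer.abort_transaction()
        elif err.fatal():
            # e.g., fenced by another producer; this instance is unusable
            # and must be recreated.
            raise
```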

99% confidence

cockroachdb-isolation

3 questions
Q

What is CockroachDB's default isolation level, and can it be changed?

A

The default is SERIALIZABLE. Yes, it can be changed to READ COMMITTED as of version 23.2; before that, CockroachDB supported only SERIALIZABLE. To enable READ COMMITTED, set the cluster setting sql.txn.read_committed_isolation.enabled=true (default true in 24.1+). Use cases for READ COMMITTED: migrating apps from other databases that expect RC behavior, and reducing transaction retry errors in high-contention workloads. Cockroach Labs recommends staying on SERIALIZABLE unless you have specific compatibility or performance needs. SERIALIZABLE prevents all isolation anomalies; READ COMMITTED allows non-repeatable reads and phantom reads.
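
A minimal sketch of opting a session into READ COMMITTED, assuming a psycopg2 connection to CockroachDB 23.2+ (connection string is a placeholder):

```python
# Switch the session's default isolation level on CockroachDB 23.2+.
import psycopg2

conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb")
with conn.cursor() as cur:
    # Requires sql.txn.read_committed_isolation.enabled=true (default in 24.1+).
    cur.execute("SET default_transaction_isolation = 'read committed'")
    cur.execute("SHOW transaction_isolation")
    print(cur.fetchone())  # ('read committed',)
conn.commit()
```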

99% confidence
Q

How large is the performance difference between SERIALIZABLE and READ COMMITTED in CockroachDB?

A

No official percentage benchmark published by CockroachDB. Qualitative differences: SERIALIZABLE causes more transaction retry errors (40001) requiring client-side retry logic. READ COMMITTED minimizes blocking - readers never block writers, writers never block readers, only write-write conflicts cause contention. The main overhead is retry handling, not inherent latency. YugabyteDB (similar architecture) shows minimal difference: ~45K writes/sec with 4.5-6ms latency for both serializable and snapshot isolation. Key insight: the performance difference is primarily in conflict handling and retry overhead, not base operation latency.

85% confidence
Q

How should clients handle CockroachDB's serialization failures (SQLSTATE 40001)?

A

CockroachDB returns SQLSTATE 40001 (serialization_failure) when it detects a conflict that would violate serializability. The client MUST implement retry logic for explicit transactions - CockroachDB does NOT auto-retry them. Retry pattern: (1) catch the 40001 error, (2) re-execute the entire transaction from BEGIN, (3) use exponential backoff between attempts. CockroachDB retries automatically only for implicit transactions (single statements whose results it can still buffer). For explicit transactions, the SAVEPOINT cockroach_restart protocol provides a structured client-side retry loop. With READ COMMITTED (23.2+), most conflicts that would surface as 40001 under SERIALIZABLE are handled transparently - no client retry needed.
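
A minimal client-side retry sketch with psycopg2 (the transaction body and backoff values are placeholders, not official recommendations):

```python
# Retry an entire transaction on SQLSTATE 40001 with exponential backoff.
import time
import psycopg2
from psycopg2.errors import SerializationFailure  # SQLSTATE 40001

def run_transaction(conn, txn_body, max_retries=5):
    for attempt in range(max_retries):
        try:
            with conn:                      # BEGIN ... COMMIT (rollback on error)
                with conn.cursor() as cur:
                    return txn_body(cur)    # re-executed from scratch each try
        except SerializationFailure:
            time.sleep(0.1 * 2 ** attempt)  # back off, then retry
    raise RuntimeError("transaction failed after retries")

# Usage: run_transaction(conn, lambda cur: cur.execute(
#     "UPDATE accounts SET balance = balance - 100 WHERE id = 1"))
```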

99% confidence

postgresql-durability

2 questions
Q

What does synchronous_commit=on actually guarantee in PostgreSQL?

A

With synchronous_commit=on (default), commits wait until synchronous standby(s) confirm they received AND flushed the commit record to durable storage. This ensures the transaction won't be lost unless BOTH primary and all synchronous standbys suffer storage corruption. IMPORTANT: If synchronous_standby_names is empty, 'on' only guarantees local flush - no remote guarantees. The five levels are: off (no wait, risk losing last 3x wal_writer_delay of commits), local (local flush only), remote_write (remote OS write buffer, not disk), on (remote flush to disk), remote_apply (remote flush + visible to queries on standby).

99% confidence
Q

What are PostgreSQL's synchronous_commit levels, and what does each guarantee?

A

Five levels with exact guarantees: (1) off - no waiting, asynchronous commit; may lose up to 3x wal_writer_delay of recent commits on crash, but with no risk of database inconsistency. (2) local - waits for the local WAL flush to disk only. (3) remote_write - local flush + remote write to the standby's OS buffers (NOT flushed to disk); data can be lost if the standby's operating system crashes before the write reaches storage. (4) on (default) - local flush + remote flush to durable storage; the transaction is safe unless both nodes lose their storage. (5) remote_apply - strongest: local flush + remote flush + WAL replayed on the standby, so queries on the standby see the committed data immediately; highest latency. Without synchronous standbys configured, remote_write/on/remote_apply all behave like 'local'.
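
Because synchronous_commit can be changed per transaction, durability can be traded per write rather than cluster-wide. A sketch with psycopg2 (table names are hypothetical):

```python
# Pick a durability level per transaction with SET LOCAL.
import psycopg2

conn = psycopg2.connect("dbname=app")

with conn, conn.cursor() as cur:
    # Cheap, loss-tolerant write: don't wait for the WAL flush.
    cur.execute("SET LOCAL synchronous_commit = 'off'")
    cur.execute("INSERT INTO audit_log(msg) VALUES ('page view')")

with conn, conn.cursor() as cur:
    # Critical write: wait until the standby has replayed the WAL,
    # so reads on the standby see it immediately after commit.
    cur.execute("SET LOCAL synchronous_commit = 'remote_apply'")
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
```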

99% confidence

spanner-consistency

2 questions
Q

How tightly synchronized are clocks under Spanner's TrueTime?

A

7 milliseconds. Google keeps every clock in Spanner within ±3.5ms of true time, so any two clocks in the system are within 7ms of each other. This is achieved with GPS receivers and atomic clocks across datacenters and frequent synchronization. Under normal conditions the actual uncertainty oscillates between roughly 1ms and 7ms between synchronizations. Spanner's commit-wait mechanism waits out the current uncertainty (up to ~7ms) before reporting a transaction as committed, ensuring external consistency without cross-node coordination.

99% confidence
Q

How does Spanner use TrueTime to guarantee external consistency?

A

Spanner uses 'commit-wait': before a node reports a transaction as committed, it waits until the maximum clock uncertainty has passed, so the commit timestamp is guaranteed to be in the past on every clock in the system. This ensures that if transaction T1 commits before T2 starts (in real time), T1's commit timestamp is strictly less than T2's. The TrueTime API returns an interval [earliest, latest] rather than a single point in time, making the uncertainty explicit. By waiting out that interval, Spanner guarantees external consistency (linearizability) without per-transaction coordination between nodes. The cost is added commit latency, bounded by the ~7ms uncertainty.
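
A toy simulation of the idea (illustrative only - real TrueTime is a Google-internal API backed by GPS and atomic clocks; the 7ms bound is taken from the answer above):

```python
# Toy TrueTime-style commit-wait: don't report a commit until its
# timestamp is definitely in the past on every clock in the system.
import time

EPSILON = 0.007  # assumed worst-case clock uncertainty, ~7ms

def tt_now():
    """Return an interval [earliest, latest] around the local clock."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_wait(timestamp_s):
    # Wait until TT.after(timestamp_s): even the slowest clock's
    # earliest-possible reading has passed the commit timestamp.
    while tt_now()[0] <= timestamp_s:
        time.sleep(0.001)
    print("commit reported; external consistency preserved")

commit_ts = tt_now()[1]  # choose s >= TT.now().latest
commit_wait(commit_ts)
```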

99% confidence

distributed-fundamentals

2 questions
Q

What does the CAP theorem actually say, and is partition tolerance optional?

A

CAP: A distributed system can provide at most 2 of 3 guarantees: Consistency (every read receives most recent write), Availability (every request receives a response), Partition tolerance (system continues operating despite network partitions). CRITICAL: Partition tolerance is NOT optional. Network failures are inevitable in distributed systems. The real choice is: DURING a partition, do you sacrifice Consistency (return stale data, AP system) or Availability (reject requests, CP system)? Examples: CP (MongoDB, HBase) - refuse writes during partition. AP (Cassandra, DynamoDB) - accept writes, reconcile later. Eric Brewer (2012): 'two out of three' is misleading - you only choose between C and A when partition occurs.

99% confidence
Q

What is the difference between serializability and linearizability?

A

Different guarantees from different communities: SERIALIZABILITY (databases): Transactions appear to execute in SOME serial order. Multiple operations, multiple objects. Does NOT respect real-time ordering - T2 can appear before T1 even if T1 finished before T2 started. The 'I' in ACID. LINEARIZABILITY (distributed systems): Single operations appear to execute atomically at some point between invocation and response. Respects real-time order - if op A completes before op B starts, A precedes B in system history. Single object. The 'C' in CAP. STRICT SERIALIZABILITY combines both: transactions execute in some serial order that respects real-time. Linearizability is composable (per-object linearizability = system linearizability). Serializability is NOT composable.

99% confidence

postgresql-isolation

1 question
Q

Is PostgreSQL's REPEATABLE READ truly serializable?

A

PostgreSQL's REPEATABLE READ is actually Snapshot Isolation (SI), NOT true serializability. Critical difference: REPEATABLE READ allows the write skew anomaly. Write skew example: two doctors each check that at least 2 doctors are on call, each sees 2, and each removes themselves - result: 0 doctors on call. Both transactions saw consistent snapshots, yet the application invariant is violated. SI's 'first-committer-wins' rule only catches write-write conflicts on the same rows, which write skew never triggers (the two transactions update different rows). SERIALIZABLE (added in PostgreSQL 9.1 via SSI - Serializable Snapshot Isolation) detects the dangerous read-write dependency: one transaction commits, the other aborts with serialization_failure. Workaround at REPEATABLE READ: use SELECT ... FOR UPDATE to lock the rows being read.
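
A sketch of the on-call scenario with psycopg2, assuming a hypothetical oncall(doctor text, on_call bool) table that starts with two doctors on call:

```python
# Write skew under REPEATABLE READ: the invariant check uses a snapshot.
import psycopg2

def go_off_call(doctor):
    conn = psycopg2.connect("dbname=hospital")
    with conn, conn.cursor() as cur:
        # First statement of the transaction sets its isolation level.
        cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        cur.execute("SELECT count(*) FROM oncall WHERE on_call")
        (n,) = cur.fetchone()
        if n >= 2:  # invariant check against this transaction's snapshot
            cur.execute(
                "UPDATE oncall SET on_call = false WHERE doctor = %s",
                (doctor,),
            )
    conn.close()

# If Alice and Bob run this concurrently, each snapshot shows n == 2 and
# both UPDATEs touch different rows, so both commit: 0 doctors remain on
# call. Under SERIALIZABLE one transaction aborts with 40001. At
# REPEATABLE READ, reading with SELECT ... FOR UPDATE closes the gap.
```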

99% confidence

kafka-limits

1 question
Q

What is Kafka's maximum message size, and what happens if a message exceeds it?

A

Default: roughly 1MB (1048576 bytes), controlled by message.max.bytes on the broker and max.request.size on the producer. If exceeded: the producer throws RecordTooLargeException and the message is NOT sent; should an oversized message reach the broker anyway, it is rejected with a MESSAGE_TOO_LARGE error. To raise the limit you must change BOTH the broker's message.max.bytes AND the producer's max.request.size, and also check replica.fetch.max.bytes (replication) and fetch.max.bytes (consumer). Practical ceiling: ~10MB before performance degrades. For larger payloads: chunk the data across multiple messages, store it externally with Kafka holding references, or compress it (often a significant reduction).
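
A sketch of raising the client-side cap and catching the oversized-message error with the confluent_kafka Python client (note that librdkafka's client-side property is message.max.bytes; max.request.size is the Java client's name for the same cap):

```python
# Raise the client-side size cap; the broker's message.max.bytes must
# be raised to match, or the broker will reject the batch.
from confluent_kafka import Producer, KafkaException

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "message.max.bytes": 5 * 1024 * 1024,  # 5MB, up from the ~1MB default
    "compression.type": "zstd",            # often shrinks payloads below the cap
})

try:
    producer.produce("big-topic", b"x" * (10 * 1024 * 1024))  # 10MB: too large
except KafkaException as e:
    # librdkafka rejects oversized messages before they leave the client.
    print("rejected client-side:", e.args[0].str())
producer.flush()
```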

99% confidence

postgresql-timeouts

1 question
Q

What are the defaults for statement_timeout and lock_timeout in PostgreSQL, and what do they imply?

A

Both default to 0 (disabled - wait forever). statement_timeout=0: queries can run indefinitely with no automatic termination. lock_timeout=0: queries wait indefinitely for locks (deadlock detection still kicks in after deadlock_timeout, default 1s, but it only resolves true deadlocks, not ordinary lock waits). Critical implications: (1) long-running queries can block vacuum and cause table bloat, (2) lock waits can cascade into connection pool exhaustion, (3) production systems should set reasonable values (e.g., statement_timeout=30s, lock_timeout=10s). Set them per session with SET or per role with ALTER ROLE. idle_in_transaction_session_timeout (default 0) is equally important for connection pool health.
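
A sketch of setting the timeouts per session and handling the resulting errors with psycopg2 (values and the accounts table are illustrative):

```python
# Bound query runtime and lock waits, and handle the errors they raise.
import psycopg2
from psycopg2.errors import QueryCanceled, LockNotAvailable

conn = psycopg2.connect("dbname=app")
with conn.cursor() as cur:
    cur.execute("SET statement_timeout = '30s'")
    cur.execute("SET lock_timeout = '10s'")
    cur.execute("SET idle_in_transaction_session_timeout = '60s'")
    try:
        cur.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
        conn.commit()
    except QueryCanceled:      # SQLSTATE 57014: statement_timeout exceeded
        conn.rollback()
    except LockNotAvailable:   # SQLSTATE 55P03: lock_timeout exceeded
        conn.rollback()
```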

99% confidence