PostgreSQL 100 Q&As

PostgreSQL FAQ & Answers

100 expert PostgreSQL answers researched from official documentation. Every answer cites authoritative sources you can verify.

server_configuration

52 questions
A

PostgreSQL 18 io_method Configuration

io_method is a new PostgreSQL 18 configuration parameter that controls how read operations are dispatched to storage.

Available Options

Value Description Platform
sync Synchronous blocking reads (legacy PG17 behavior) All
worker Dedicated I/O worker processes (default) All
io_uring Linux kernel ring buffer for minimal syscall overhead Linux 5.1+

Performance Comparison

Benchmark results for sequential scan on cold cache:

Method Time Improvement
sync 15,071ms Baseline
worker 10,052ms 1.5x faster
io_uring 5,723ms 2.6x faster

Configuration

-- Check current setting
SHOW io_method;

-- Set in postgresql.conf
io_method = 'io_uring'  -- For Linux with kernel 5.1+
io_method = 'worker'    -- Cross-platform default

When to Use Each

  • io_uring: Best for Linux production servers with high I/O workloads
  • worker: Safe default for cross-platform compatibility
  • sync: Only for debugging or compatibility testing
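
A minimal way to apply the setting without editing postgresql.conf by hand (a sketch, assuming superuser access; io_method is a server-start parameter, so the change only takes effect after a restart):

-- Apply via ALTER SYSTEM (written to postgresql.auto.conf)
ALTER SYSTEM SET io_method = 'io_uring';
-- Restart the server (e.g. pg_ctl restart), then verify:
SHOW io_method;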

Current Limitations

  • AIO only applies to reads (sequential scans, bitmap heap scans, vacuum)
  • Index scans don't use AIO yet
  • Write operations and WAL still use synchronous I/O

Source: PostgreSQL 18 Documentation - Runtime Configuration
https://www.postgresql.org/docs/18/runtime-config-resource.html

99% confidence
A

{"type":"result","subtype":"success","is_error":false,"duration_ms":11606,"duration_api_ms":18951,"num_turns":1,"result":"# Maximum Size of a PostgreSQL Database\n\nThe theoretical maximum database size in PostgreSQL is 4 petabytes (PB).\n\nHowever, this limit is controlled by several component limits:\n\n## Component Limits (PostgreSQL 9.0+)\n\n- Maximum database size: 4 petabytes (limited by tablespace size)\n- Maximum table size: 32 terabytes (TB)\n- Maximum row size: 1.6 TB (including TOAST storage)\n- Maximum field size: 1 GB (without TOAST)\n- Maximum rows per table: Unlimited (theoretically limited by table size)\n- Maximum columns per table: 250-1600 (depending on column types)\n- Maximum indexes per table: Unlimited\n\n## Practical Considerations\n\nThe 32 TB table size limit is the most common practical constraint. This is determined by:\n- Maximum blocks per table: 2^32 (4,294,967,296 blocks)\n- Default block size: 8 KB\n- Calculation: 2^32 blocks × 8 KB = 32 TB\n\nTo exceed 32 TB of data per table, use table partitioning to split data across multiple physical tables.\n\n## Configuration\n\nThe block size can be changed at compile time (using --with-blocksize configure option) to 1, 2, 4, 8, 16, or 32 KB, but 8 KB is standard and changing it requires recompiling PostgreSQL.\n\n## Source\n\nPostgreSQL Official Documentation - Appendix K: Limits (valid for PostgreSQL 9.0 through 17.x)","session_id":"15306456-758d-4464-9e3f-c2490893eab6","total_cost_usd":0.0315107,"usage":{"input_tokens":3,"cache_creation_input_tokens":3474,"cache_read_input_tokens":13704,"output_tokens":382,"server_tool_use":{"web_search_requests":0,"web_fetch_requests":0},"service_tier":"standard","cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":3474}},"modelUsage":{"claude-haiku-4-5-20251001":{"inputTokens":658,"outputTokens":180,"cacheReadInputTokens":0,"cacheCreationInputTokens":0,"webSearchRequests":0,"costUSD":0.001558,"contextWindow":200000},"claude-opus-4-5-20251101":{"inputTokens":845,"outputTokens":114,"cacheReadInputTokens":0,"cacheCreationInputTokens":0,"webSearchRequests":0,"costUSD":0.0070750000000000006,"contextWindow":200000},"claude-sonnet-4-5-20250929":{"inputTokens":3,"outputTokens":382,"cacheReadInputTokens":13704,"cacheCreationInputTokens":3474,"webSearchRequests":0,"costUSD":0.0228777,"contextWindow":200000}},"permission_denials":[],"uuid":"b29caba8-7e5d-4c03-ace8-a6f40eda9bdd"}

99% confidence
A

{"type":"result","subtype":"success","is_error":false,"duration_ms":10783,"duration_api_ms":18117,"num_turns":1,"result":"# PostgreSQL Maximum Column Limit\n\nPostgreSQL allows a maximum of 250-1600 columns per table, depending on column types.\n\n## Exact Limits\n\n- Hard limit: 1600 columns (theoretical maximum)\n- Practical limit with variable-length types: 250-1000 columns\n- Constraint: The total number of columns depends on column types because PostgreSQL has an internal limit on tuple header size\n\n## Technical Details\n\nThe limit is defined by MaxTupleAttributeNumber in PostgreSQL source code, which is set to 1664, but after accounting for system columns, the usable maximum is approximately 1600 columns.\n\nHowever, the effective limit is significantly lower when using variable-length data types (TEXT, VARCHAR, BYTEA) because each column requires overhead in the tuple header. Tables with many variable-length columns typically hit practical limits around 250-1000 columns.\n\n## Code Example\n\nsql\n-- This will succeed (within limits)\nCREATE TABLE wide_table (\n col1 INTEGER,\n col2 INTEGER,\n -- ... up to ~1600 columns of fixed-width types\n);\n\n-- This may fail earlier with variable-length types\nCREATE TABLE var_table (\n col1 TEXT,\n col2 VARCHAR(100),\n -- ... limit reached around 250-1000 columns\n);\n\n\n## Version\n\nThis limit applies to PostgreSQL 9.x through 17.x (current as of January 2025).\n\n## Source\n\n- PostgreSQL official documentation: "SQL Database Limits"\n- Source code: src/include/access/htup_details.h (MaxTupleAttributeNumber = 1664)","session_id":"8e11f6a2-d5d8-49f9-b92b-5116a6b0e7c8","total_cost_usd":0.02123805,"usage":{"input_tokens":3,"cache_creation_input_tokens":415,"cache_read_input_tokens":16766,"output_tokens":392,"server_tool_use":{"web_search_requests":0,"web_fetch_requests":0},"service_tier":"standard","cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":415}},"modelUsage":{"claude-haiku-4-5-20251001":{"inputTokens":658,"outputTokens":161,"cacheReadInputTokens":0,"cacheCreationInputTokens":0,"webSearchRequests":0,"costUSD":0.0014629999999999999,"contextWindow":200000},"claude-opus-4-5-20251101":{"inputTokens":845,"outputTokens":123,"cacheReadInputTokens":0,"cacheCreationInputTokens":0,"webSearchRequests":0,"costUSD":0.007300000000000001,"contextWindow":200000},"claude-sonnet-4-5-20250929":{"inputTokens":3,"outputTokens":392,"cacheReadInputTokens":16766,"cacheCreationInputTokens":415,"webSearchRequests":0,"costUSD":0.01247505,"contextWindow":200000}},"permission_denials":[],"uuid":"11825836-86f0-4873-ac44-d8b5a0e25371"}

99% confidence
A

{"type":"result","subtype":"success","is_error":false,"duration_ms":13201,"duration_api_ms":22824,"num_turns":1,"result":"The maximum size of a single row in PostgreSQL is approximately 1.6 TB (1,600,000,000 bytes).\n\n## Key Details\n\nHard limit: Each row is stored in a single 8 KB page, but PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to store large column values out-of-line in separate TOAST tables.\n\nExact specifications:\n- Maximum field size: 1 GB (when TOASTed)\n- Maximum row size with TOAST: 1.6 TB theoretical limit\n- Practical limit per column: 1 GB\n- Row header overhead: 23 bytes minimum (varies with null bitmap)\n- Maximum columns per table: 1600 (1664 with system columns)\n\nImportant constraints:\n- Without TOAST, a row must fit in a single page (8 KB minus headers)\n- Data types smaller than ~2 KB are stored inline by default\n- TOAST automatically applies to data types like TEXT, BYTEA, VARCHAR when values exceed ~2 KB\n\n## Example\n\nsql\n-- This works - large text column uses TOAST automatically\nCREATE TABLE large_rows (\n id SERIAL PRIMARY KEY,\n large_text TEXT, -- Can store up to 1 GB\n large_binary BYTEA -- Can store up to 1 GB\n);\n\n-- Insert a large row (PostgreSQL handles TOAST transparently)\nINSERT INTO large_rows (large_text) \nVALUES (repeat('x', 100000000)); -- 100 MB text\n\n\nSource: PostgreSQL 17 official documentation - Chapter 73.2 TOAST\n\nVersions: This limit applies to PostgreSQL 8.3+ (when TOAST was enhanced). Current as of PostgreSQL 17 (2024).","session_id":"5e4b0f55-9504-4161-9d89-681074557ff2","total_cost_usd":0.0222493,"usage":{"input_tokens":3,"cache_creation_input_tokens":414,"cache_read_input_tokens":16766,"output_tokens":441,"server_tool_use":{"web_search_requests":0,"web_fetch_requests":0},"service_tier":"standard","cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":414}},"modelUsage":{"claude-haiku-4-5-20251001":{"inputTokens":658,"outputTokens":192,"cacheReadInputTokens":0,"cacheCreationInputTokens":0,"webSearchRequests":0,"costUSD":0.001618,"contextWindow":200000},"claude-opus-4-5-20251101":{"inputTokens":845,"outputTokens":128,"cacheReadInputTokens":0,"cacheCreationInputTokens":0,"webSearchRequests":0,"costUSD":0.007425,"contextWindow":200000},"claude-sonnet-4-5-20250929":{"inputTokens":3,"outputTokens":441,"cacheReadInputTokens":16766,"cacheCreationInputTokens":414,"webSearchRequests":0,"costUSD":0.013206299999999999,"contextWindow":200000}},"permission_denials":[],"uuid":"931a5569-e5e1-4423-9d6f-b8d544313bd0"}

99% confidence
A

Maximum Length of PostgreSQL TEXT Field

1 GB (1,073,741,823 bytes) - This is the maximum size for any TEXT field value in PostgreSQL.

Technical Details

  • The TEXT data type in PostgreSQL can store strings up to 1 GB in length
  • This limit applies to all variable-length text types: TEXT, VARCHAR, and CHAR
  • The actual maximum is precisely 1,073,741,823 bytes (1 GB - 1 byte)
  • This limit is enforced by PostgreSQL's TOAST (The Oversized-Attribute Storage Technique) mechanism

Code Example

-- TEXT field has no explicit length constraint
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT  -- Can store up to 1 GB
);

-- These are functionally identical in PostgreSQL:
CREATE TABLE examples (
    text_col TEXT,           -- up to 1 GB
    varchar_col VARCHAR,     -- up to 1 GB (no length specified)
    varchar_limited VARCHAR(100)  -- limited to 100 characters
);

Important Notes

  • Unlike VARCHAR(n), TEXT has no length modifier and defaults to the maximum
  • The 1 GB limit is a hard limit in PostgreSQL's storage system
  • Character vs byte count: For UTF-8 text, multibyte characters consume multiple bytes toward the 1 GB limit

Source

PostgreSQL Official Documentation (applies to all versions 8.0+): Section 8.3 - Character Types

99% confidence
A

REAL vs DOUBLE PRECISION in PostgreSQL

REAL is a 4-byte (32-bit) floating point type with approximately 6 decimal digits of precision.

DOUBLE PRECISION is an 8-byte (64-bit) floating point type with approximately 15 decimal digits of precision.

Storage and Precision

Type Storage Precision Range
REAL 4 bytes 6 decimal digits 1E-37 to 1E+37
DOUBLE PRECISION 8 bytes 15 decimal digits 1E-307 to 1E+308

Usage

-- REAL example
CREATE TABLE measurements (
    temperature REAL,
    pressure REAL
);

-- DOUBLE PRECISION example
CREATE TABLE scientific_data (
    precise_calculation DOUBLE PRECISION,
    coordinate DOUBLE PRECISION
);

-- Alternative alias for DOUBLE PRECISION
CREATE TABLE example (
    value FLOAT8  -- equivalent to DOUBLE PRECISION
);

Key Differences

  1. Precision: DOUBLE PRECISION provides ~2.5x more significant digits
  2. Storage: DOUBLE PRECISION uses 2x the disk space
  3. Performance: REAL operations may be slightly faster on some hardware
  4. Standards: Both conform to IEEE 754 standard for floating point arithmetic

When to Use Each

  • Use REAL when storage is critical and 6 digits of precision suffice (e.g., sensor readings, percentages)
  • Use DOUBLE PRECISION when accuracy matters (e.g., financial calculations requiring floating point, scientific computations, geographic coordinates)

Note: For exact decimal arithmetic (like currency), use NUMERIC instead of floating point types.

Source: PostgreSQL Official Documentation (applies to PostgreSQL 9.x through 17.x)

99% confidence
A

PostgreSQL INTEGER vs BIGINT Storage Size

INTEGER: 4 bytes
BIGINT: 8 bytes

Value Ranges

  • INTEGER (also called INT or INT4):

    • Storage: 4 bytes
    • Range: -2,147,483,648 to +2,147,483,647
  • BIGINT (also called INT8):

    • Storage: 8 bytes
    • Range: -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807

Example Usage

CREATE TABLE example (
    user_id INTEGER,        -- 4 bytes per row
    total_revenue BIGINT    -- 8 bytes per row
);

When to Use Each

  • Use INTEGER for most numeric columns (user IDs, counts, amounts under 2.1 billion)
  • Use BIGINT when values may exceed 2.1 billion (timestamps, large financial amounts, global identifiers)

Storage Impact

BIGINT uses exactly 2x the storage of INTEGER. For a table with 1 million rows:

  • INTEGER column: ~4 MB
  • BIGINT column: ~8 MB
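
A quick way to confirm these sizes from psql (not part of the cited answer, just a verification query):

SELECT pg_column_size(1::integer) AS integer_bytes,  -- 4
       pg_column_size(1::bigint)  AS bigint_bytes;   -- 8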

Source: PostgreSQL Official Documentation - Numeric Types

Applies to: PostgreSQL 9.x through 17.x (storage sizes unchanged across versions)

99% confidence
A

The range of SMALLINT in PostgreSQL is -32768 to +32767.

Technical Details:

  • Storage size: 2 bytes
  • Signed: Yes (always)
  • Min value: -32768 (-2^15)
  • Max value: +32767 (2^15 - 1)

Example Usage:

CREATE TABLE example (
    id SMALLINT
);

-- Valid insertions
INSERT INTO example VALUES (-32768);  -- minimum
INSERT INTO example VALUES (32767);   -- maximum

-- This will cause an error: smallint out of range
INSERT INTO example VALUES (32768);

Comparison with Other Integer Types:

  • SMALLINT: 2 bytes, -32768 to 32767
  • INTEGER: 4 bytes, -2147483648 to 2147483647
  • BIGINT: 8 bytes, -9223372036854775808 to 9223372036854775807

Source: PostgreSQL Official Documentation - Numeric Types
Version: Applies to all PostgreSQL versions (this range is part of the SQL standard and has not changed)

Performance Note: SMALLINT is optimal when you know values will stay within this range, as it uses half the storage of INTEGER.

99% confidence
A

SERIAL is a legacy PostgreSQL-specific notational convenience (not a true data type) that automatically creates a sequence and sets the column default to nextval(). IDENTITY is the SQL-standard way (introduced in PostgreSQL 10) to create auto-incrementing columns using GENERATED ALWAYS AS IDENTITY or GENERATED BY DEFAULT AS IDENTITY.

Key Differences

1. SQL Standard Compliance

  • SERIAL: PostgreSQL-specific, non-standard syntax
  • IDENTITY: SQL standard (SQL:2003), portable across databases

2. Type Definition

  • SERIAL: Not a real data type; it's shorthand that creates a sequence + integer column + default value
  • IDENTITY: Real column property that creates an implicit sequence

3. Schema Visibility

  • SERIAL: The SERIAL keyword disappears after table creation; only shows as integer NOT NULL DEFAULT nextval(...)
  • IDENTITY: The GENERATED ... AS IDENTITY clause remains visible in the schema definition

4. Behavior Control

  • SERIAL: No control over whether users can insert explicit values (they always can)
  • IDENTITY: GENERATED ALWAYS prevents explicit inserts unless OVERRIDING SYSTEM VALUE is specified; GENERATED BY DEFAULT allows them

5. Table Copying

  • SERIAL: When using CREATE TABLE ... (LIKE ...), the sequence is shared between original and copy (problematic)
  • IDENTITY: Creates independent sequences for copied tables

6. Sequence Ownership

  • SERIAL: Sequence is owned by column but not part of the column definition
  • IDENTITY: Sequence is truly implicit and managed as part of the column property

Syntax Examples

-- SERIAL (old way)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT
);

-- IDENTITY (recommended)
CREATE TABLE users (
    id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT
);

-- IDENTITY with explicit values allowed
CREATE TABLE users (
    id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name TEXT
);
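
With GENERATED ALWAYS AS IDENTITY (the second example above), supplying an explicit id requires an override clause; a brief illustration:

INSERT INTO users (id, name)
OVERRIDING SYSTEM VALUE
VALUES (1000, 'Carol');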

Recommendation

Use IDENTITY columns for new applications. SERIAL is maintained for backward compatibility but IDENTITY provides better SQL standards compliance and more control.

Sources:

99% confidence
A

PostgreSQL TIMESTAMPTZ Internal Storage

PostgreSQL stores TIMESTAMPTZ (timestamp with time zone) as a 64-bit integer representing microseconds since 2000-01-01 00:00:00 UTC (the PostgreSQL epoch).

Key Facts:

  1. Storage Size: 8 bytes
  2. Internal Representation: int64 (microseconds since PostgreSQL epoch: 2000-01-01 00:00:00 UTC)
  3. No Time Zone Stored: Despite the name, PostgreSQL does NOT store the time zone. It converts input to UTC and stores only the UTC timestamp.
  4. Range: 4713 BC to 294276 AD (with microsecond precision)

How It Works:

When you insert a TIMESTAMPTZ:

  1. PostgreSQL converts the input to UTC using the session's timezone setting or explicit time zone in the input
  2. Stores the UTC timestamp as microseconds since 2000-01-01 00:00:00 UTC
  3. On retrieval, converts back to the session's timezone for display

Code Example:

-- Input with explicit time zone
INSERT INTO events (event_time) VALUES ('2025-01-15 14:30:00+05:00');

-- Stored internally as: microseconds from 2000-01-01 00:00:00 UTC
-- Actual UTC time stored: 2025-01-15 09:30:00 UTC

-- Retrieved based on session timezone
SET timezone = 'America/New_York';
SELECT event_time FROM events;
-- Returns: 2025-01-15 04:30:00-05

Source:

PostgreSQL 17 Documentation: Date/Time Types - Section 8.5

Critical for Agents: Always pass UTC or explicitly zoned timestamps. The stored value is timezone-agnostic (pure UTC).

99% confidence
A

The DATE type in PostgreSQL stores dates with a range from 4713 BC to 5874897 AD.

Exact Range

  • Minimum: 4713-01-01 BC (January 1, 4713 BC)
  • Maximum: 5874897-12-31 (December 31, 5874897 AD)

Storage Details

  • Storage size: 4 bytes
  • Resolution: 1 day (no time component)

Code Example

-- Valid DATE values
SELECT '4713-01-01 BC'::DATE;  -- Minimum value
SELECT '5874897-12-31'::DATE;  -- Maximum value
SELECT '2025-11-25'::DATE;     -- Typical value

-- This will error (out of range)
SELECT '4714-01-01 BC'::DATE;  -- Before minimum
SELECT '5874898-01-01'::DATE;  -- After maximum

Source

This range is consistent across PostgreSQL versions 8.0 through 17.x and is documented in the official PostgreSQL documentation for Date/Time Types.

Note: The DATE type stores only the date with no time-of-day component. For timestamps, use TIMESTAMP or TIMESTAMPTZ types which have a more limited range (4713 BC to 294276 AD).

99% confidence
A

PostgreSQL BOOLEAN type accepts three states: TRUE, FALSE, and NULL (unknown).

Valid Input Values

For TRUE:

  • TRUE (SQL keyword)
  • 'true'
  • 'yes'
  • 'on'
  • '1'
  • 't', 'y' (unique prefixes)

For FALSE:

  • FALSE (SQL keyword)
  • 'false'
  • 'no'
  • 'off'
  • '0'
  • 'f', 'n' (unique prefixes)

For UNKNOWN:

  • NULL (SQL keyword)

Input Rules

  • Case-insensitive (e.g., 'TRUE', 'True', 'true' all work)
  • Leading/trailing whitespace is ignored
  • Unique prefixes are accepted (e.g., 't' for true, 'f' for false)

Output Format

When queried, PostgreSQL always outputs boolean values as t or f (single character lowercase).

-- Example inputs (all valid)
INSERT INTO example (bool_col) VALUES 
  (TRUE),           -- SQL keyword
  ('yes'),          -- string representation
  ('1'),            -- numeric representation
  ('t');            -- prefix

-- All output as: t

Sources:

99% confidence
A

The UUID type in PostgreSQL stores Universally Unique Identifiers (UUIDs) as defined by RFC 9562. It is a 128-bit value displayed as hexadecimal digits in the format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (8-4-4-4-12 digit groups).

Storage: 128 bits (16 bytes)

Example UUID: a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11

Generating UUIDs:

PostgreSQL provides built-in functions for UUID generation (gen_random_uuid() since PostgreSQL 13; uuidv4() and uuidv7() since PostgreSQL 18):

-- Generate version 4 (random) UUID
SELECT gen_random_uuid();
SELECT uuidv4();  -- alias for gen_random_uuid()

-- Generate version 7 (time-ordered) UUID
SELECT uuidv7();

-- Generate version 7 with timestamp shift
SELECT uuidv7(interval '1 hour');

Usage in tables:

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT
);

INSERT INTO users (name) VALUES ('Alice');  -- id auto-generated

Key advantages:

  • Version 4: Cryptographically random, globally unique
  • Version 7: Time-ordered with millisecond precision, better for indexing performance
  • No cross-database coordination required (unlike sequences)

Additional UUID algorithms: Install the uuid-ossp extension for UUIDv1, UUIDv3, and UUIDv5 generation.

Sources:

99% confidence
A

The maximum size of BYTEA in PostgreSQL is 1 GB (1,073,741,824 bytes).

This limit applies to all TOAST-able data types in PostgreSQL, including BYTEA. The limit is enforced by PostgreSQL's TOAST (The Oversized-Attribute Storage Technique) mechanism, which has a maximum datum size of 1 GB.

Technical Details:

  • Maximum theoretical size: 1 GB - 1 byte (1,073,741,823 bytes)
  • This is a hard limit enforced by the MaxAllocSize constant in PostgreSQL source code
  • Applies to all PostgreSQL versions (confirmed in versions 9.x through 16+)

Storage Considerations:

  • BYTEA values larger than ~2 KB are automatically compressed and/or moved to TOAST tables
  • The 1 GB limit includes any overhead from compression or encoding

Example Usage:

-- This will work (within limit)
INSERT INTO files (data) VALUES (pg_read_binary_file('/path/to/file.bin'));

-- Check size of BYTEA column
SELECT pg_column_size(data) FROM files WHERE id = 1;

If you need larger binary storage:

  • Use PostgreSQL Large Objects (up to 4 TB per object)
  • Store files externally and keep references in the database

Source: PostgreSQL official documentation on TOAST and binary data types, consistent across versions 9.0-16.

99% confidence
A

PostgreSQL ENUM Type

An ENUM (enumerated type) is a user-defined data type in PostgreSQL that consists of a static, ordered set of string values. Once created, an ENUM type can be used as a column type like any built-in type.

Creating and Using ENUMs

-- Create an ENUM type
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

-- Use in a table
CREATE TABLE person (
    name TEXT,
    current_mood mood
);

-- Insert values (must match exactly)
INSERT INTO person VALUES ('Alice', 'happy');
INSERT INTO person VALUES ('Bob', 'sad');

Key Characteristics

  • Storage: 4 bytes per value (same as integer)
  • Ordering: Values are ordered by creation sequence, NOT alphabetically
  • Case-sensitive: 'Happy' ≠ 'happy'
  • Largely immutable after creation: values cannot be reordered or removed; PostgreSQL 10+ can rename a value with ALTER TYPE ... RENAME VALUE
  • Adding values: ALTER TYPE mood ADD VALUE 'excited' AFTER 'happy' (available since PostgreSQL 9.1; allowed inside a transaction block since PostgreSQL 12)

When to Use ENUMs

Use when:

  • Small, fixed set of values (e.g., status: 'pending', 'approved', 'rejected')
  • Values rarely change (adding is possible, but removing requires type recreation)
  • Need type safety at database level
  • Performance matters (4 bytes vs. variable TEXT storage)
  • Want constraint enforcement without CHECK constraints

Avoid when:

  • Values change frequently (use lookup table instead)
  • Need internationalization (enum values are stored literals)
  • Application manages validation logic (use VARCHAR with CHECK constraint)
  • Multiple applications with different valid values access the DB
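
If you do use an ENUM, the accepted values can be inspected at runtime; a small helper query using the mood type from the example above:

SELECT enum_range(NULL::mood);            -- {sad,ok,happy}
SELECT unnest(enum_range(NULL::mood));    -- one value per row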

Source

PostgreSQL Official Documentation v16: Chapter 8.7 - Enumerated Types

99% confidence
A

Adding a Value to an Existing ENUM Type in PostgreSQL

Use the ALTER TYPE ... ADD VALUE statement:

ALTER TYPE enum_type_name ADD VALUE 'new_value';

Position Control

By default, the new value is added at the end. To specify position:

-- Add before an existing value
ALTER TYPE enum_type_name ADD VALUE 'new_value' BEFORE 'existing_value';

-- Add after an existing value
ALTER TYPE enum_type_name ADD VALUE 'new_value' AFTER 'existing_value';

Transaction Behavior (CRITICAL)

PostgreSQL versions ≥ 12.0: ADD VALUE can be used inside a transaction block without restrictions.

PostgreSQL versions < 12.0: ADD VALUE CANNOT be used in a transaction block with other operations on the same enum type. You must either:

  • Run it in its own transaction, OR
  • Use the IF NOT EXISTS clause and commit before using the new value

-- Idempotent: does nothing if the value already exists
ALTER TYPE enum_type_name ADD VALUE IF NOT EXISTS 'new_value';

Example

-- Create enum
CREATE TYPE status AS ENUM ('pending', 'active', 'closed');

-- Add new value at the end
ALTER TYPE status ADD VALUE 'archived';

-- Add new value in specific position
ALTER TYPE status ADD VALUE 'cancelled' AFTER 'pending';

Limitations

  • You CANNOT remove enum values (requires type recreation)
  • You CAN rename enum values in PostgreSQL 10+ with ALTER TYPE ... RENAME VALUE (see the example below); older versions require recreating the type
  • Enum values are sorted by creation order, not alphabetically
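
Renaming a value is supported directly since PostgreSQL 10; a short example using the status type from above:

ALTER TYPE status RENAME VALUE 'closed' TO 'completed';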

Source: PostgreSQL Official Documentation, ALTER TYPE command (PostgreSQL 9.1+, transaction improvements in 12.0)

99% confidence
A

PostgreSQL DOMAIN

A DOMAIN is a user-defined data type in PostgreSQL that extends an existing base type with optional constraints and default values. It allows you to create reusable, constrained types once and apply them consistently across multiple tables.

Creating a Domain:

CREATE DOMAIN domain_name AS base_type
  [ DEFAULT expression ]
  [ constraint [ ... ] ];

Concrete Example:

-- Email domain with validation
CREATE DOMAIN email_address AS TEXT
  CHECK (VALUE ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');

-- Positive integer domain
CREATE DOMAIN positive_int AS INTEGER
  CHECK (VALUE > 0);

-- US zip code domain
CREATE DOMAIN us_zipcode AS TEXT
  CHECK (VALUE ~ '^\d{5}(-\d{4})?$')
  NOT NULL;

Usage in Tables:

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email email_address,  -- Uses domain with built-in constraint
  age positive_int
);

Key Characteristics:

  • Constraints are checked at INSERT/UPDATE (same timing as table CHECK constraints)
  • VALUE keyword in CHECK constraints refers to the domain value being tested
  • Domains can have NOT NULL, CHECK, and DEFAULT constraints
  • Available since PostgreSQL 7.3+
  • Constraints apply to ALL columns using that domain (centralized validation)

Management Commands:

ALTER DOMAIN email_address ADD CONSTRAINT ... ;
DROP DOMAIN email_address CASCADE;

Source: PostgreSQL Official Documentation 16.x - CREATE DOMAIN (https://www.postgresql.org/docs/current/sql-createdomain.html)

99% confidence
A

PostgreSQL Composite Types

A composite type in PostgreSQL is a user-defined data type that groups multiple fields (columns) together into a single type, similar to a row or record structure. Each field has a name and a data type.

Definition

Composite types are created using CREATE TYPE:

CREATE TYPE address AS (
    street VARCHAR(100),
    city VARCHAR(50),
    postal_code VARCHAR(10),
    country VARCHAR(50)
);

Usage

As table column:

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    home_address address,
    work_address address
);

Inserting data:

INSERT INTO employees (name, home_address, work_address) 
VALUES (
    'John Doe',
    ROW('123 Main St', 'Boston', '02101', 'USA'),
    ROW('456 Corp Ave', 'Boston', '02102', 'USA')
);

Accessing fields:

-- Use dot notation (parentheses required to avoid ambiguity)
SELECT name, (home_address).city FROM employees;
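
Individual fields of a composite column can also be updated or expanded directly; a brief sketch based on the employees table above:

-- Update one field of a composite column
UPDATE employees SET home_address.city = 'Cambridge' WHERE name = 'John Doe';

-- Expand all fields of a composite column into separate output columns
SELECT name, (home_address).* FROM employees;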

Key Characteristics

  • Table rows are composite types: Every table automatically has a composite type with the same name
  • Nested composites: Composite types can contain other composite types
  • Size limit: subject to the normal field and row size limits; composite values that do not fit comfortably in the default 8 KB block are moved out of line via TOAST
  • NULL handling: The entire composite value can be NULL, or individual fields can be NULL

Official Reference

PostgreSQL Documentation: Composite Types (current as of PostgreSQL 17)

99% confidence
A

PostgreSQL ARRAY Type

The ARRAY type in PostgreSQL stores variable-length multidimensional arrays of a single data type. Every PostgreSQL data type has a corresponding array type (e.g., integer[], text[], timestamp[]).

Key Specifications

  • Declaration syntax: column_name data_type[] or column_name data_type ARRAY
  • Dimensions: PostgreSQL supports multidimensional arrays with up to 6 dimensions; the declared number of dimensions and bounds are not enforced, so arrays are effectively dynamically sized
  • Index base: Arrays are 1-indexed (first element is at position 1, not 0)
  • Maximum size: Limited by the maximum field size of 1 GB
  • Type constraint: All elements must be of the same base type

Creating Arrays

-- Column declaration
CREATE TABLE products (
    id serial PRIMARY KEY,
    tags text[],
    prices integer ARRAY,
    matrix integer[][]  -- multidimensional
);

-- Inserting array literals
INSERT INTO products (tags, prices) VALUES 
    (ARRAY['electronics', 'sale'], ARRAY[99, 149]),
    ('{"book", "fiction"}', '{10, 15, 20}');  -- alternative syntax

Accessing Elements

-- Access single element (1-indexed)
SELECT tags[1] FROM products;  -- first element

-- Access slice
SELECT tags[1:2] FROM products;  -- elements 1 through 2

-- Multidimensional access
SELECT matrix[1][2] FROM products;

Essential Functions

  • array_length(array, dimension) - returns length of specified dimension
  • array_append(array, element) - adds element to end
  • array_cat(array1, array2) - concatenates arrays
  • unnest(array) - expands array to rows
  • ANY(array) / ALL(array) - comparison operators
-- Query with array operations
SELECT * FROM products WHERE 'sale' = ANY(tags);
SELECT array_length(prices, 1) FROM products;
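
Array columns also support containment operators, which can be accelerated with a GIN index; a minimal sketch on the products table above (the index name is illustrative):

CREATE INDEX idx_products_tags ON products USING GIN (tags);

SELECT * FROM products WHERE tags @> ARRAY['sale'];  -- rows whose tags contain 'sale'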

Source: PostgreSQL 17 Official Documentation, Section 8.15 (Arrays)

Critical notes:

  • NULL handling: ARRAY[1, NULL, 3] is valid; the array itself can also be NULL
  • Use text[] not text ARRAY for consistency with PostgreSQL conventions
  • Arrays are stored in binary format internally but output as {val1,val2,val3} text representation
99% confidence
A

6 dimensions

PostgreSQL arrays are limited to a maximum of 6 dimensions. This is defined by the MAXDIM constant in the PostgreSQL source code at src/include/utils/array.h:

#define MAXDIM 6

This limit is enforced throughout the PostgreSQL codebase. When you attempt to create or manipulate arrays with more than 6 dimensions, PostgreSQL will raise an error: "number of array dimensions exceeds the maximum allowed (6)".

Example:

-- Valid: up to 6 dimensions
SELECT ARRAY[[[[[[1]]]]]]::int[];

-- Invalid: 7 dimensions would exceed MAXDIM
SELECT ARRAY[[[[[[[1]]]]]]]::int[]; -- ERROR

Sources:

99% confidence
A

PostgreSQL hstore Extension

The hstore extension is a key-value store data type for PostgreSQL that stores sets of key-value pairs within a single PostgreSQL value. Each key and value is a text string.

Enabling hstore

CREATE EXTENSION IF NOT EXISTS hstore;

Storage and Syntax

-- Creating a table with hstore column
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    attributes hstore
);

-- Inserting data
INSERT INTO products (attributes) VALUES 
    ('color => "red", size => "M", weight => "500g"');

-- Alternative syntax with hstore() constructor
INSERT INTO products (attributes) VALUES 
    (hstore('color', 'blue') || hstore('size', 'L'));

Key Operations

-- Retrieve a value by key
SELECT attributes -> 'color' FROM products;

-- Check if key exists
SELECT attributes ? 'color' FROM products;

-- Check if multiple keys exist
SELECT attributes ?& ARRAY['color', 'size'] FROM products;  -- ALL keys
SELECT attributes ?| ARRAY['color', 'price'] FROM products; -- ANY key

-- Get all keys or values
SELECT akeys(attributes) FROM products;  -- returns text[]
SELECT avals(attributes) FROM products;  -- returns text[]

-- Convert to JSON
SELECT hstore_to_json(attributes) FROM products;

Indexing

-- GIN index for existence checks and containment
CREATE INDEX idx_attributes ON products USING GIN(attributes);

-- GiST index (alternative)
CREATE INDEX idx_attributes_gist ON products USING GIST(attributes);

Limitations

  • Keys and values: Both are text strings only (no native numeric/boolean types)
  • NULL values: Distinguishes between NULL value and missing key
  • Size: No hard limit, but large hstore values impact performance
  • Nesting: No nested structures (flat key-value only)

Use Cases

Use hstore for semi-structured data with varying attributes where you need:

  • Indexable key-value storage
  • Better query performance than JSON for key existence checks
  • PostgreSQL versions before JSONB matured (pre-9.4)

Note: For PostgreSQL 9.4+, consider JSONB for more complex semi-structured data needs (supports nested objects, arrays, and native data types).

Source: PostgreSQL Official Documentation (hstore module), compatible with PostgreSQL 9.0+, built-in contrib module.

99% confidence
A

NULL vs Empty String in PostgreSQL

NULL represents the absence of a value (unknown or undefined data). Empty string ('') is a known value that happens to contain zero characters.

Key Differences

1. Storage and Semantics

  • NULL = no data exists, unknown, or not applicable
  • '' = a string value with length 0

2. Comparison Behavior

-- NULL comparisons always return NULL (unknown)
SELECT NULL = NULL;        -- Returns: NULL (not TRUE)
SELECT NULL IS NULL;       -- Returns: TRUE

-- Empty string comparisons work normally
SELECT '' = '';            -- Returns: TRUE
SELECT '' IS NULL;         -- Returns: FALSE

3. String Operations

-- NULL propagates through operations
SELECT 'Hello' || NULL;    -- Returns: NULL
SELECT length(NULL::text); -- Returns: NULL

-- Empty string behaves as a value
SELECT 'Hello' || '';      -- Returns: 'Hello'
SELECT length('');         -- Returns: 0

4. Constraints

-- NOT NULL constraint
CREATE TABLE example (
    col1 VARCHAR NOT NULL  -- Rejects NULL, allows ''
);

INSERT INTO example VALUES ('');   -- SUCCESS
INSERT INTO example VALUES (NULL); -- ERROR: violates NOT NULL

5. Indexing

  • PostgreSQL B-tree indexes include NULL entries, so both NULL and '' are indexed (NULLs are not skipped, unlike in some other databases)
  • In a unique index, NULLs are treated as distinct from each other by default, while '' is a regular value and can appear only once
  • A partial index (e.g., WHERE col IS NOT NULL) can be used to exclude NULLs deliberately

6. Aggregation

-- COUNT ignores NULLs but counts empty strings
SELECT COUNT(col) FROM example;  -- Excludes NULL, includes '' (COUNT(*) counts every row)

Practical Rule

Use NULL for missing/unknown data. Use '' only when you need to represent a string that explicitly has no characters (e.g., an empty user input that was intentionally submitted blank).
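
COALESCE and NULLIF are the usual tools for converting between the two conventions; a small illustration (not from the cited source):

SELECT COALESCE(NULL, '') AS null_to_empty,   -- returns ''
       NULLIF('', '')     AS empty_to_null;   -- returns NULL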

Version: PostgreSQL 12+ (behavior consistent across all modern versions)

Source: PostgreSQL Official Documentation - NULL Values

99% confidence
A

PostgreSQL Identifier Case Sensitivity

PostgreSQL folds unquoted identifiers to lowercase and treats them case-insensitively. Quoted identifiers (using double quotes) are case-sensitive and preserve exact casing.

Rules

Unquoted identifiers:

-- These are all equivalent (folded to lowercase internally):
CREATE TABLE Users (id INT);
CREATE TABLE users (id INT);
CREATE TABLE USERS (id INT);

-- All reference the same table "users":
SELECT * FROM Users;
SELECT * FROM users;
SELECT * FROM USERS;

Quoted identifiers:

-- These create DIFFERENT tables:
CREATE TABLE "Users" (id INT);
CREATE TABLE "users" (id INT);
CREATE TABLE "USERS" (id INT);

-- Quoted names must match exact casing; unquoted names fold to lowercase:
SELECT * FROM "Users";  -- ✓ refers to table "Users"
SELECT * FROM Users;    -- ✗ folds to "users"; errors if no table named "users" exists

Critical Implementation Details

  1. Naming convention: Use lowercase with underscores for unquoted identifiers: user_accounts, first_name
  2. Avoid quoted identifiers unless required for mixed case (e.g., when interfacing with case-sensitive systems)
  3. SQL standard compliance: This behavior follows the SQL standard (SQL:2016), which requires case-folding for unquoted identifiers (though the standard folds to uppercase; PostgreSQL chose lowercase)
  4. Maximum identifier length: 63 bytes (NAMEDATALEN - 1). Longer identifiers are silently truncated.

Edge Cases

-- Mixed case without quotes becomes lowercase:
CREATE TABLE MyTable (MyColumn INT);
-- Creates table "mytable" with column "mycolumn"

-- Quoted identifiers allow reserved keywords:
CREATE TABLE "select" (id INT);  -- Valid but not recommended

Source: PostgreSQL 17 Documentation - Section 4.1.1 "Identifiers and Key Words" (https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS)

99% confidence
A

The default schema in PostgreSQL is public.

When a database is created, PostgreSQL automatically creates the public schema. If no schema is explicitly specified in queries or when creating database objects, they are created in the public schema by default.

Search Path Behavior:
The default search path is "$user", public, meaning PostgreSQL first looks for a schema matching the current user's name, then falls back to the public schema. You can verify this with:

SHOW search_path;
-- Returns: "$user", public

Creating Objects Without Schema Specification:

CREATE TABLE users (id INT);
-- Equivalent to:
CREATE TABLE public.users (id INT);

Key Details:

  • The public schema is owned by the pg_database_owner role (PostgreSQL 15+) or the postgres superuser (older versions)
  • All users have CREATE and USAGE privileges on public by default in PostgreSQL 14 and earlier
  • PostgreSQL 15+: Default privileges on public were revoked for security; only pg_database_owner has CREATE by default
  • To change the default schema for a session: SET search_path TO myschema, public;
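
To see which schema unqualified CREATE statements will target in the current session, a quick check (illustrative):

SELECT current_schema();        -- typically 'public' with the default search_path
SELECT current_schemas(false);  -- schemas in the effective search_path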

Source: PostgreSQL Official Documentation - Schema Search Path (valid for PostgreSQL 12-17)

99% confidence
A

PostgreSQL search_path

The search_path is a session configuration parameter that defines the ordered list of schemas PostgreSQL searches when resolving unqualified object names (tables, functions, types, etc.).

Default Value

SHOW search_path;
-- Returns: "$user", public

The default searches for a schema matching the current username first, then the public schema.

How It Works

When you reference SELECT * FROM users, PostgreSQL searches schemas in search_path order until it finds a table named users. With default settings, it checks:

  1. Schema named after your username (if exists)
  2. public schema

Setting search_path

Session level:

SET search_path TO myschema, public;

Database level:

ALTER DATABASE mydb SET search_path TO myschema, public;

Role level:

ALTER ROLE myuser SET search_path TO myschema, public;

Connection string:

options=-c search_path=myschema,public

Security Critical Detail

pg_catalog is implicitly searched before the schemas listed in search_path unless it is explicitly named in a later position, so built-in objects are normally resolved first. For functions (especially SECURITY DEFINER functions), pin search_path in the function definition or schema-qualify object references to prevent search_path hijacking attacks.

-- Pin search_path for a function (any supported version)
CREATE FUNCTION myfunc() RETURNS bigint AS $$
  SELECT count(*) FROM mytable;
$$ LANGUAGE SQL
SET search_path = myschema, pg_temp;  -- pg_catalog is still searched implicitly first

-- Harden a session or script explicitly
SET search_path TO pg_catalog, myschema, public;

Verification

SELECT current_schemas(true);  -- Shows actual search path including implicit schemas

Source: PostgreSQL 17 Official Documentation - Schema Search Path (https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH)

99% confidence
A

PostgreSQL: Schema vs Database

ATOMIC ANSWER:

A database is a top-level container that holds all data objects and is completely isolated from other databases (separate connections, no cross-database queries in standard PostgreSQL). A schema is a namespace within a database that organizes tables, views, functions, and other objects, allowing multiple schemas in one database with fully-qualified access via schema_name.table_name.

Key Differences

Database:

  • Created with CREATE DATABASE dbname;
  • Requires separate connection (cannot query across databases in one session)
  • Contains one or more schemas
  • Has its own users/privileges, encoding, and collation
  • Physical separation on disk

Schema:

  • Created with CREATE SCHEMA schemaname;
  • Multiple schemas accessible in single connection
  • Default schema is public (created automatically)
  • Enables logical organization without connection overhead
  • Cross-schema queries: SELECT * FROM schema1.table1 JOIN schema2.table2

Code Example

-- Database level (requires reconnecting)
CREATE DATABASE app_production;
\c app_production  -- Connect to database

-- Schema level (same connection)
CREATE SCHEMA sales;
CREATE SCHEMA marketing;

CREATE TABLE sales.orders (id INT, amount DECIMAL);
CREATE TABLE marketing.campaigns (id INT, name TEXT);

-- Query across schemas (same database)
SELECT o.id, c.name 
FROM sales.orders o 
JOIN marketing.campaigns c ON o.id = c.id;

Schema Search Path

PostgreSQL uses search_path to resolve unqualified table names (default: "$user", public):

SHOW search_path;
SET search_path TO sales, marketing, public;

Source: PostgreSQL 17 Documentation - Chapter 5.9 Schemas

Use Case: Use schemas for logical separation (multi-tenant apps, dev/test environments within one DB). Use separate databases for complete isolation (different applications, security boundaries).

99% confidence
A

What is a tablespace in PostgreSQL?

A tablespace in PostgreSQL is a storage location on the filesystem where database objects (tables, indexes) are physically stored. It allows you to control the disk location of data files independently of the logical database structure.

Key Facts

Default tablespaces:

  • pg_default - stores user data (located in $PGDATA/base/)
  • pg_global - stores cluster-wide system catalogs (located in $PGDATA/global/)

Purpose:

  • Distribute I/O across multiple disks/partitions
  • Place frequently accessed data on faster storage (SSD)
  • Manage disk space when filesystem is full
  • Separate temporary files to different storage

Creating a Tablespace

CREATE TABLESPACE fast_storage 
  LOCATION '/mnt/ssd/postgresql/data';

Requirements:

  • Directory must exist and be empty
  • Directory must be owned by PostgreSQL system user (typically postgres)
  • Absolute path required (no relative paths)
  • Cannot be on temporary filesystems

Using a Tablespace

-- Create table in specific tablespace
CREATE TABLE users (id int, name text) 
  TABLESPACE fast_storage;

-- Create index in specific tablespace
CREATE INDEX idx_users_name ON users(name) 
  TABLESPACE fast_storage;

-- Set default for database
ALTER DATABASE mydb SET default_tablespace = fast_storage;

Verification

-- List all tablespaces
SELECT oid, spcname, pg_tablespace_location(oid) 
FROM pg_tablespace;

-- Find table's tablespace
SELECT tablename, tablespace 
FROM pg_tables 
WHERE tablename = 'users';

Source: PostgreSQL 17 Official Documentation - Chapter 23.6 Tablespaces

99% confidence
A

pg_catalog is PostgreSQL's system catalog schema that stores all built-in database metadata, including information about tables, columns, functions, data types, operators, and access methods.

Key Characteristics:

  1. Always first in search_path - PostgreSQL automatically searches pg_catalog before user schemas, even if not explicitly listed in search_path. This ensures system functions are always accessible.

  2. Contains system tables and views including:

    • pg_class - tables, indexes, views, sequences
    • pg_attribute - table columns
    • pg_type - data types
    • pg_proc - functions and procedures
    • pg_namespace - schemas
    • pg_database - databases
  3. Built-in functions - All standard PostgreSQL functions (e.g., array_agg(), now(), concat()) reside here.

Usage Examples:

-- Query table metadata
SELECT * FROM pg_catalog.pg_tables WHERE schemaname = 'public';

-- List all schemas
SELECT nspname FROM pg_catalog.pg_namespace;

-- Call system functions (pg_catalog prefix optional due to search_path)
SELECT pg_catalog.version();
SELECT version();  -- Same result

Security Note: Do not drop or modify pg_catalog objects. They are essential for database operation.

Source: PostgreSQL Official Documentation - System Catalogs (applies to PostgreSQL 9.x through 17.x)

99% confidence
A

information_schema in PostgreSQL

information_schema is a SQL-standard schema that provides a portable, database-agnostic view of metadata about database objects (tables, columns, constraints, views, etc.). It exists in all PostgreSQL versions and is defined by the SQL:2016 standard.

Key Characteristics

  • Location: Automatically created in every PostgreSQL database
  • Access: Read-only views querying the system catalogs (pg_catalog)
  • Portability: Same structure across PostgreSQL, MySQL, SQL Server, etc.
  • Visibility: Shows only objects the current user has privileges to access

Common Use Cases

-- List all tables in current database
SELECT table_schema, table_name 
FROM information_schema.tables 
WHERE table_type = 'BASE TABLE' AND table_schema NOT IN ('pg_catalog', 'information_schema');

-- Get column details for a specific table
SELECT column_name, data_type, character_maximum_length, is_nullable
FROM information_schema.columns
WHERE table_name = 'users' AND table_schema = 'public';

-- Find all foreign key constraints
SELECT constraint_name, table_name, constraint_type
FROM information_schema.table_constraints
WHERE constraint_type = 'FOREIGN KEY';

PostgreSQL-Specific Behavior

  • Performance: Slower than direct pg_catalog queries due to additional abstraction layers
  • Completeness: Missing PostgreSQL-specific features (e.g., inheritance, partial indexes). Use pg_catalog for those.
  • Type Mapping: Shows SQL standard type names, not PostgreSQL internal types (e.g., character varying instead of varchar)

When to Use

  • Use information_schema: When writing portable SQL across databases
  • Use pg_catalog: When querying PostgreSQL-specific features or optimizing performance-critical metadata queries

Source: PostgreSQL Official Documentation Chapter 37 (all versions 9.x through 16.x)

99% confidence
A

Composite Primary Key in PostgreSQL

A composite primary key uses multiple columns to uniquely identify rows. Define it using the PRIMARY KEY constraint with a comma-separated list of columns.

Syntax

During table creation:

CREATE TABLE table_name (
    column1 data_type,
    column2 data_type,
    column3 data_type,
    PRIMARY KEY (column1, column2)
);

Adding to existing table:

ALTER TABLE table_name 
ADD PRIMARY KEY (column1, column2);

Example

CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

Critical Details

  • Column order matters: PRIMARY KEY (a, b) creates a different index structure than PRIMARY KEY (b, a). Order columns by query patterns (most selective/frequently filtered first).
  • Maximum columns: PostgreSQL allows up to 32 columns in a composite key (limited by index max of 32 columns).
  • Implicit NOT NULL: All columns in a primary key automatically become NOT NULL.
  • Automatic index: PostgreSQL creates a unique B-tree index on the column combination.
  • Constraint naming: Use CONSTRAINT constraint_name PRIMARY KEY (col1, col2) for explicit naming.

Verification

Supported since PostgreSQL 7.1+, current through PostgreSQL 17.

Source: PostgreSQL Official Documentation - CREATE TABLE (https://www.postgresql.org/docs/current/sql-createtable.html)

99% confidence
A

PostgreSQL CREATE TABLE with All Constraint Types

The complete syntax for CREATE TABLE with all constraint types in PostgreSQL:

CREATE TABLE table_name (
    -- Column-level constraints
    column1 data_type PRIMARY KEY,
    column2 data_type NOT NULL,
    column3 data_type UNIQUE,
    column4 data_type CHECK (column4 > 0),
    column5 data_type DEFAULT value,
    column6 data_type REFERENCES other_table(column),
    
    -- Table-level constraints
    CONSTRAINT pk_name PRIMARY KEY (column1),
    CONSTRAINT unique_name UNIQUE (column2, column3),
    CONSTRAINT check_name CHECK (column4 > 0 AND column5 IS NOT NULL),
    CONSTRAINT fk_name FOREIGN KEY (column6) 
        REFERENCES other_table(column) 
        ON DELETE CASCADE 
        ON UPDATE RESTRICT,
    
    -- Exclusion constraint (PostgreSQL-specific)
    CONSTRAINT excl_name EXCLUDE USING gist (column7 WITH =)
);

Complete Example with All Constraint Types:

CREATE TABLE employees (
    -- PRIMARY KEY (column-level)
    emp_id SERIAL PRIMARY KEY,
    
    -- NOT NULL
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    
    -- UNIQUE (column-level)
    email VARCHAR(100) UNIQUE,
    
    -- CHECK (column-level)
    salary NUMERIC(10,2) CHECK (salary > 0),
    
    -- DEFAULT
    hire_date DATE DEFAULT CURRENT_DATE,
    status VARCHAR(20) DEFAULT 'active',
    
    -- FOREIGN KEY (column-level)
    dept_id INTEGER REFERENCES departments(dept_id),

    -- Columns referenced by the table-level constraints below
    manager_id INTEGER,
    start_date DATE,
    end_date DATE,
    
    -- Table-level UNIQUE constraint (composite)
    CONSTRAINT unique_name UNIQUE (first_name, last_name, hire_date),
    
    -- Table-level CHECK constraint
    CONSTRAINT valid_email CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}$'),
    
    -- Table-level FOREIGN KEY with actions
    CONSTRAINT fk_manager FOREIGN KEY (manager_id) 
        REFERENCES employees(emp_id) 
        ON DELETE SET NULL 
        ON UPDATE CASCADE,
    
    -- EXCLUDE constraint (requires btree_gist extension for non-geometric types)
    CONSTRAINT no_overlapping_dates EXCLUDE USING gist (
        emp_id WITH =,
        daterange(start_date, end_date) WITH &&
    )
);

Constraint Types Reference:

  1. PRIMARY KEY: Uniquely identifies each row (automatically creates UNIQUE + NOT NULL + index)
  2. NOT NULL: Prevents NULL values
  3. UNIQUE: Ensures distinct values (NULL is allowed unless combined with NOT NULL)
  4. CHECK: Validates data against a boolean expression
  5. DEFAULT: Sets default value when none provided
  6. FOREIGN KEY: Enforces referential integrity with ON DELETE/UPDATE actions:
    • CASCADE, RESTRICT, NO ACTION, SET NULL, SET DEFAULT
  7. EXCLUDE: PostgreSQL-specific constraint using index operators (requires appropriate operator class)

Verified for: PostgreSQL 12+ (EXCLUDE constraints available since 9.0)

Source: PostgreSQL 16 Official Documentation - CREATE TABLE

99% confidence
A

PRIMARY KEY vs UNIQUE Constraint in PostgreSQL

Direct Answer:

  • PRIMARY KEY = UNIQUE + NOT NULL + table identifier (only ONE per table)
  • UNIQUE constraint allows NULL values (multiple NULLs permitted) and allows multiple unique constraints per table

Key Differences

1. NULL Handling

-- PRIMARY KEY: Rejects NULLs
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email VARCHAR(255)
);
INSERT INTO users (id, email) VALUES (NULL, 'alice@example.com'); 
-- ERROR: null value in column "id" violates not-null constraint

-- UNIQUE: Allows NULLs (multiple NULL values permitted)
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    sku VARCHAR(50) UNIQUE
);
INSERT INTO products (id, sku) VALUES (1, NULL); -- OK
INSERT INTO products (id, sku) VALUES (2, NULL); -- OK (multiple NULLs allowed)
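
PostgreSQL 15 adds an option to change this default so that NULLs count as duplicates; a short sketch (the table name is illustrative, requires PostgreSQL 15+):

CREATE TABLE products_v15 (
    id INTEGER PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NULLS NOT DISTINCT
);
INSERT INTO products_v15 (id, sku) VALUES (1, NULL); -- OK
INSERT INTO products_v15 (id, sku) VALUES (2, NULL); -- rejected as a duplicate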

2. Quantity Per Table

  • PRIMARY KEY: Exactly ONE per table
  • UNIQUE: Multiple allowed per table
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,           -- Only one PRIMARY KEY
    order_number VARCHAR(50) UNIQUE,  -- First UNIQUE constraint
    tracking_code VARCHAR(50) UNIQUE  -- Second UNIQUE constraint - OK
);

3. Foreign Key References

  • PRIMARY KEY: Automatic target for foreign key references (default)
  • UNIQUE: Can be referenced by foreign keys but must be explicitly specified
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    dept_code VARCHAR(10) UNIQUE
);

-- References PRIMARY KEY by default
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES departments  -- References departments(id)
);

-- Must explicitly specify UNIQUE column
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    dept_code VARCHAR(10) REFERENCES departments(dept_code)
);

4. Index Creation

Both automatically create a unique B-tree index, but:

  • PRIMARY KEY index named: tablename_pkey
  • UNIQUE constraint index named: tablename_columnname_key

When to Use Each

  • PRIMARY KEY: Table's main identifier (user IDs, order IDs)
  • UNIQUE: Alternative unique identifiers (email addresses, SKUs, username)

PostgreSQL Version: Behavior consistent across PostgreSQL 9.x through 17.x (current as of January 2025)

Source: PostgreSQL Official Documentation - Table Constraints

99% confidence
A

FOREIGN KEY Constraint in PostgreSQL

A FOREIGN KEY constraint ensures referential integrity by requiring that values in one table's column(s) must match values in another table's PRIMARY KEY or UNIQUE constraint column(s).

Syntax

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Or inline constraint syntax:
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id)
);

Key Behaviors

On Delete/Update Actions (specify what happens when referenced row is deleted/updated):

  • NO ACTION (default): Prevents deletion/update if referenced
  • RESTRICT: Same as NO ACTION but not deferrable
  • CASCADE: Automatically deletes/updates dependent rows
  • SET NULL: Sets foreign key columns to NULL
  • SET DEFAULT: Sets foreign key columns to their default values
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id) 
        ON DELETE CASCADE 
        ON UPDATE CASCADE
);

Constraints

  • The referenced columns must have a PRIMARY KEY or UNIQUE constraint
  • Data types must match between foreign key and referenced columns
  • Multi-column foreign keys supported: FOREIGN KEY (col1, col2) REFERENCES table(col1, col2)
  • Foreign keys can be deferred until transaction commit using DEFERRABLE INITIALLY DEFERRED (see the sketch below)
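
A deferrable foreign key is only checked at COMMIT, which allows inserting child rows before their parent within one transaction; a brief sketch (the customers table is assumed from the examples above):

CREATE TABLE deferred_orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id)
        DEFERRABLE INITIALLY DEFERRED
);

BEGIN;
INSERT INTO deferred_orders (customer_id) VALUES (42);  -- parent row may not exist yet
INSERT INTO customers (customer_id) VALUES (42);        -- create it before commit
COMMIT;                                                  -- FK is checked here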

Verification

PostgreSQL checks foreign key constraints on INSERT and UPDATE. Violations raise error code 23503 (foreign_key_violation).

Source: PostgreSQL 17 Official Documentation, Section 5.4.5 "Foreign Keys"

99% confidence
A

PostgreSQL Foreign Key ON DELETE Options

PostgreSQL supports 5 ON DELETE actions for foreign key constraints:

1. NO ACTION (default)

Prevents deletion if referenced rows exist. Check is performed at the end of the statement.

ALTER TABLE orders 
ADD CONSTRAINT fk_customer 
FOREIGN KEY (customer_id) REFERENCES customers(id) 
ON DELETE NO ACTION;

2. RESTRICT

Prevents deletion if referenced rows exist. Check is performed immediately (difference from NO ACTION only matters with deferrable constraints).

ALTER TABLE orders 
ADD CONSTRAINT fk_customer 
FOREIGN KEY (customer_id) REFERENCES customers(id) 
ON DELETE RESTRICT;

3. CASCADE

Automatically deletes all referencing rows when the referenced row is deleted.

ALTER TABLE order_items 
ADD CONSTRAINT fk_order 
FOREIGN KEY (order_id) REFERENCES orders(id) 
ON DELETE CASCADE;

4. SET NULL

Sets the foreign key column(s) to NULL when the referenced row is deleted. The column must be nullable.

ALTER TABLE orders 
ADD CONSTRAINT fk_salesperson 
FOREIGN KEY (salesperson_id) REFERENCES employees(id) 
ON DELETE SET NULL;

5. SET DEFAULT

Sets the foreign key column(s) to their DEFAULT value when the referenced row is deleted. A default value must be defined.

ALTER TABLE orders 
ADD CONSTRAINT fk_status 
FOREIGN KEY (status_id) REFERENCES order_statuses(id) 
ON DELETE SET DEFAULT;

Default Behavior: If ON DELETE is not specified, NO ACTION is used.

Source: PostgreSQL 17 Official Documentation - Foreign Keys (https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK)

99% confidence
A

CHECK Constraint in PostgreSQL

A CHECK constraint is a table-level or column-level constraint that enforces a boolean expression on column values. The constraint rejects INSERT or UPDATE operations if the expression evaluates to FALSE; it accepts the operation if the expression evaluates to TRUE or NULL.

Syntax

Column-level CHECK:

CREATE TABLE products (
    price NUMERIC CHECK (price > 0),
    discount NUMERIC CHECK (discount >= 0 AND discount <= 100)
);

Table-level CHECK (for multi-column conditions):

CREATE TABLE orders (
    quantity INTEGER,
    unit_price NUMERIC,
    total NUMERIC,
    CHECK (total = quantity * unit_price)
);

Named CHECK constraint:

CREATE TABLE employees (
    salary NUMERIC,
    CONSTRAINT valid_salary CHECK (salary > 0 AND salary < 1000000)
);

Critical Behaviors

  1. NULL handling: CHECK constraints pass when the expression evaluates to NULL (unknown). To disallow NULLs, combine with NOT NULL.

  2. Expression limitations: The CHECK expression:

    • Cannot contain subqueries
    • Cannot reference columns from other tables
    • Cannot reference other rows (only current row being inserted/updated)
    • Should reference only immutable functions; PostgreSQL does not enforce this, but using stable or volatile functions (such as CURRENT_TIMESTAMP) can leave rows that no longer satisfy the constraint
  3. Validation timing: Evaluated when a row is inserted or updated. ALTER TABLE ... ADD CONSTRAINT also validates all existing rows immediately unless you specify NOT VALID and run VALIDATE CONSTRAINT later.

Adding to Existing Tables

-- Validates all existing rows immediately
ALTER TABLE products ADD CONSTRAINT check_price CHECK (price > 0);

-- Skips existing rows, validates only new/updated rows (PostgreSQL 9.4+)
ALTER TABLE products ADD CONSTRAINT check_price CHECK (price > 0) NOT VALID;
-- Later validate:
ALTER TABLE products VALIDATE CONSTRAINT check_price;

Source: PostgreSQL 17 Official Documentation - Table Constraints (https://www.postgresql.org/docs/current/ddl-constraints.html)

99% confidence
A

An EXCLUSION constraint in PostgreSQL ensures that if any two rows are compared on specified columns or expressions using specified operators, at least one of those operator comparisons must return false or null. It generalizes UNIQUE constraints by allowing custom operators beyond equality.

Syntax

CREATE TABLE example (
  room_id int,
  reservation_period tstzrange,
  EXCLUDE USING gist (room_id WITH =, reservation_period WITH &&)
);

Key Requirements

  1. Index Method Required: Must specify an index method (typically gist or spgist) that supports the operators being used
  2. Operator Class: Each element pairs a column or expression with a commutative operator that returns boolean. Common operators:
    • = (equality) - for scalar columns such as room_id (needs btree_gist in a GiST index)
    • && (overlaps) - for range types and geometric types
  3. Extension Dependency: Using = on ordinary scalar columns inside a GiST exclusion constraint requires the btree_gist extension

Complete Example

-- Enable extension for btree operators in GiST
CREATE EXTENSION btree_gist;

-- Prevent overlapping room reservations
CREATE TABLE room_reservations (
  room_id int,
  reserved_during tstzrange,
  EXCLUDE USING gist (
    room_id WITH =,
    reserved_during WITH &&
  )
);

-- This succeeds (no conflicting reservation yet)
INSERT INTO room_reservations VALUES 
  (101, '[2024-01-01 10:00, 2024-01-01 12:00)');

-- This fails (same room, overlapping time)
INSERT INTO room_reservations VALUES 
  (101, '[2024-01-01 11:00, 2024-01-01 13:00)');
-- ERROR: conflicting key value violates exclusion constraint

Optional Clauses

  • WHERE (predicate): Makes constraint partial (only checks rows matching predicate)
  • DEFERRABLE / INITIALLY DEFERRED: Delays checking until transaction commit
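
For example, both clauses can be combined; this sketch (assuming btree_gist is installed as above) constrains only confirmed reservations and defers the check to commit:

CREATE TABLE tentative_reservations (
  room_id int,
  reserved_during tstzrange,
  confirmed boolean,
  EXCLUDE USING gist (room_id WITH =, reserved_during WITH &&)
    WHERE (confirmed)
    DEFERRABLE INITIALLY DEFERRED
);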

Available since: PostgreSQL 9.0
Source: PostgreSQL 17 official documentation, Chapter 5.4 (Constraints)

99% confidence
A

Adding a Column to an Existing Table in PostgreSQL

Use the ALTER TABLE statement with the ADD COLUMN clause:

ALTER TABLE table_name ADD COLUMN column_name data_type;

Complete Syntax with Common Options

ALTER TABLE table_name 
ADD COLUMN column_name data_type 
[DEFAULT default_value] 
[NOT NULL | NULL] 
[constraint];

Examples

Basic column addition:

ALTER TABLE users ADD COLUMN email VARCHAR(255);

With default value:

ALTER TABLE users ADD COLUMN created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP;

With NOT NULL constraint (requires DEFAULT for existing rows):

ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'active';

Multiple columns at once:

ALTER TABLE users 
ADD COLUMN email VARCHAR(255),
ADD COLUMN phone VARCHAR(20);

Critical Implementation Details

  1. Performance: Adding a column with a constant (non-volatile) DEFAULT is instant in PostgreSQL 11+ (does not rewrite the table). Volatile defaults such as random(), and all defaults in PostgreSQL 10 and earlier, require a full table rewrite.

  2. NOT NULL without DEFAULT: If adding a NOT NULL column to a table with existing rows, you MUST provide a DEFAULT value, otherwise the operation fails with error: "column contains null values".

  3. IF NOT EXISTS (PostgreSQL 9.6+):

ALTER TABLE users ADD COLUMN IF NOT EXISTS email VARCHAR(255);

  4. Transaction safety: ALTER TABLE is transactional and can be rolled back.

Source: PostgreSQL Official Documentation (v16), Section 5.6 - ALTER TABLE

99% confidence
A

Drop a Column from a PostgreSQL Table

Use the ALTER TABLE statement with DROP COLUMN:

ALTER TABLE table_name DROP COLUMN column_name;

Example:

ALTER TABLE users DROP COLUMN middle_name;

Critical Details

CASCADE behavior: Indexes and table constraints that involve the column are dropped automatically. If the column is referenced by objects outside the table (views, foreign key constraints in other tables), the command fails unless you add CASCADE:

ALTER TABLE table_name DROP COLUMN column_name CASCADE;

This automatically drops dependent objects.

IF EXISTS clause (PostgreSQL 8.2+): Prevents errors if the column doesn't exist:

ALTER TABLE table_name DROP COLUMN IF EXISTS column_name;

Transaction safety: The operation is transactional and can be rolled back:

BEGIN;
ALTER TABLE users DROP COLUMN email;
ROLLBACK;  -- Undoes the drop

Performance note: Dropping a column does not rewrite the table; PostgreSQL simply marks the column as dropped in the system catalog, so the command is fast regardless of table size. The space it occupied is reclaimed gradually as rows are updated, or all at once by VACUUM FULL or another table rewrite.

Multiple columns can be dropped in one statement:

ALTER TABLE users 
  DROP COLUMN middle_name,
  DROP COLUMN nickname;

Source: PostgreSQL Official Documentation - ALTER TABLE

99% confidence
A

How to Rename a Column in PostgreSQL

Use the ALTER TABLE ... RENAME COLUMN command:

ALTER TABLE table_name RENAME COLUMN old_column_name TO new_column_name;

Example:

ALTER TABLE users RENAME COLUMN email_address TO email;

Critical Details

  • Permissions Required: You must own the table or have ALTER privilege on it
  • Transactional: This operation is fully transactional and can be rolled back
  • No Data Movement: This is a metadata-only operation - extremely fast regardless of table size
  • Compatibility: Available in PostgreSQL 8.0+ (released 2005-01-19)
  • Dependencies Updated: PostgreSQL automatically updates:
    • Views that reference the column
    • Indexes on the column
    • Constraints involving the column
    • Triggers referencing the column

Important Caveats

  1. Stored Procedures/Functions: May break if they reference the old column name by string (e.g., in dynamic SQL). Review and update manually.

  2. Application Code: Update all application queries referencing the old column name - PostgreSQL cannot do this automatically.

  3. Multiple Renames: Execute separate statements or use a transaction:

BEGIN;
ALTER TABLE users RENAME COLUMN first_name TO given_name;
ALTER TABLE users RENAME COLUMN last_name TO family_name;
COMMIT;

Source: PostgreSQL Official Documentation - ALTER TABLE (valid for PostgreSQL 12-17+)

99% confidence
A

Changing Column Data Type in PostgreSQL

Use the ALTER TABLE ... ALTER COLUMN ... TYPE statement:

ALTER TABLE table_name 
ALTER COLUMN column_name TYPE new_data_type;

With Type Conversion (USING clause)

When PostgreSQL cannot automatically cast the old type to the new type, specify the conversion explicitly:

ALTER TABLE table_name 
ALTER COLUMN column_name TYPE new_data_type USING column_name::new_data_type;

Or with a transformation expression:

ALTER TABLE employees 
ALTER COLUMN salary TYPE numeric(10,2) USING salary::numeric(10,2);

-- Convert text to integer
ALTER TABLE products 
ALTER COLUMN price TYPE integer USING price::integer;

-- Convert with custom logic
ALTER TABLE users 
ALTER COLUMN created_at TYPE date USING created_at::date;

Important Constraints

  1. Constraints are preserved - NOT NULL, CHECK constraints remain; however, DEFAULT values may need to be re-added if they reference expressions incompatible with the new type.

  2. Indexes may need rebuilding - Some type changes require index recreation. PostgreSQL handles this automatically.

  3. Table rewrite occurs - When conversion requires data transformation, PostgreSQL rewrites the entire table. This requires an ACCESS EXCLUSIVE lock (blocks all reads/writes during operation).

  4. Type changes that avoid rewrite (PostgreSQL 9.2+):

    • Increasing VARCHAR length
    • Changing VARCHAR to TEXT
    • Relaxing numeric precision (if no data exceeds new limits)
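
One way to confirm whether a particular type change rewrote the table is to compare the relation's file node before and after the change (a sketch using the built-in pg_relation_filenode() function; the table is illustrative):

SELECT pg_relation_filenode('employees') AS filenode_before;

ALTER TABLE employees ALTER COLUMN salary TYPE numeric(12,2);

SELECT pg_relation_filenode('employees') AS filenode_after;
-- Different values mean the table was rewritten;
-- identical values mean only the catalog entry changed.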

Version Compatibility

  • PostgreSQL 8.0+: Basic ALTER COLUMN TYPE support
  • PostgreSQL 9.2+: Optimizations for non-rewrite type changes
  • PostgreSQL 11+: Additional optimizations for certain type conversions

Source: PostgreSQL Official Documentation - ALTER TABLE

99% confidence
A

Adding a Default Value to an Existing Column in PostgreSQL

Use ALTER TABLE with SET DEFAULT:

ALTER TABLE table_name 
ALTER COLUMN column_name SET DEFAULT default_value;

Example:

ALTER TABLE users 
ALTER COLUMN status SET DEFAULT 'active';

ALTER TABLE orders 
ALTER COLUMN created_at SET DEFAULT CURRENT_TIMESTAMP;

ALTER TABLE products 
ALTER COLUMN price SET DEFAULT 0.00;

CRITICAL BEHAVIOR:

  • This command applies only to future INSERT operations where the column is not explicitly specified
  • It does NOT update existing NULL values in the table
  • The operation only touches catalog metadata and does not rewrite the table; like other ALTER TABLE forms it briefly takes an ACCESS EXCLUSIVE lock, but completes almost instantly

To also update existing rows:

-- Set the default
ALTER TABLE table_name 
ALTER COLUMN column_name SET DEFAULT default_value;

-- Update existing NULL values
UPDATE table_name 
SET column_name = default_value 
WHERE column_name IS NULL;

To remove a default:

ALTER TABLE table_name 
ALTER COLUMN column_name DROP DEFAULT;

Supported in: PostgreSQL 8.0+ (all currently supported versions as of 2025)

Source: PostgreSQL ALTER TABLE Documentation

99% confidence
A

DROP TABLE vs TRUNCATE in PostgreSQL

DROP TABLE permanently removes the entire table structure and all its data from the database. TRUNCATE removes all rows from a table but keeps the table structure intact.

Key Differences

Aspect DROP TABLE TRUNCATE
What's removed Table structure + data + indexes + constraints + triggers Only data (keeps structure)
After the operation Table is gone; cannot INSERT Table remains; can immediately INSERT new rows
Speed Fast Faster (doesn't scan rows)
Transaction safety Rollback-safe Rollback-safe
Triggers Doesn't fire DELETE triggers Doesn't fire DELETE triggers (but fires any TRUNCATE triggers)
Foreign keys Fails if referenced by other tables (unless CASCADE) Fails if referenced by foreign keys
Sequences Owned sequences (SERIAL) are dropped Unchanged unless RESTART IDENTITY is specified
Disk space Immediately reclaimed Immediately reclaimed

Syntax Examples

-- DROP TABLE: Removes everything
DROP TABLE users;
-- Table no longer exists - this will error:
-- SELECT * FROM users;  -- ERROR: relation "users" does not exist

-- DROP TABLE with CASCADE (removes dependent objects)
DROP TABLE users CASCADE;

-- TRUNCATE: Keeps table structure
TRUNCATE TABLE users;
-- Table still exists - this works:
SELECT * FROM users;  -- Returns 0 rows

-- TRUNCATE with RESTART IDENTITY (resets sequences)
TRUNCATE TABLE users RESTART IDENTITY;

-- TRUNCATE multiple tables with CASCADE
TRUNCATE TABLE users, orders RESTART IDENTITY CASCADE;

When to Use Each

Use DROP TABLE when:

  • Removing a table permanently from the schema
  • Decommissioning features
  • Cleaning up migration artifacts

Use TRUNCATE when:

  • Quickly emptying a table for testing/reload
  • Removing all data but keeping structure for new inserts
  • Need faster performance than DELETE FROM table

Critical Behavior

TRUNCATE with foreign keys: By default, TRUNCATE fails if other tables reference the table via foreign key unless you specify CASCADE:

-- This fails if orders.user_id references users.id:
TRUNCATE TABLE users;  -- ERROR: cannot truncate a table referenced in a foreign key constraint

-- This works (also truncates referencing tables):
TRUNCATE TABLE users CASCADE;

Sequence reset behavior:

-- Default: sequences continue from last value
TRUNCATE TABLE users;  -- Next insert continues sequence

-- RESTART IDENTITY: resets to 1
TRUNCATE TABLE users RESTART IDENTITY;  -- Next insert starts at 1

Performance Note

Both operations require AccessExclusiveLock on the table. TRUNCATE is faster than DELETE FROM table because it doesn't scan individual rows or generate row-level WAL entries—it simply deallocates the data pages.

Source: PostgreSQL Official Documentation (applies to PostgreSQL 12-17)

99% confidence
A

DROP TABLE ... CASCADE in PostgreSQL automatically drops a table and all objects that depend on it, preventing dependency errors.

Syntax

DROP TABLE table_name CASCADE;

What CASCADE Does

When you drop a table with CASCADE, PostgreSQL automatically drops:

  • Views (and materialized views) that reference the table
  • Foreign key constraints in other tables pointing to this table
  • Functions whose argument or return types use the table's row type (ordinary function bodies are not tracked)
  • Any objects that depend on those objects, recursively

Triggers and rules defined on the table itself are dropped together with the table, even without CASCADE.

Example

-- Create tables with dependencies
CREATE TABLE orders (id INT PRIMARY KEY);
CREATE TABLE order_items (
    id INT PRIMARY KEY,
    order_id INT REFERENCES orders(id)
);
CREATE VIEW order_summary AS SELECT * FROM orders;

-- This fails with "cannot drop table orders because other objects depend on it"
DROP TABLE orders;

-- This succeeds and drops orders, the foreign key in order_items, and order_summary view
DROP TABLE orders CASCADE;

Alternative: RESTRICT

The default behavior is RESTRICT (explicitly written or implied), which refuses to drop the table if any objects depend on it:

DROP TABLE orders RESTRICT;  -- Fails if dependencies exist
DROP TABLE orders;            -- Same as RESTRICT (default)

Critical Warning

CASCADE can drop many more objects than you intend. Always review dependencies first:

-- Check dependencies before dropping
SELECT * FROM pg_depend WHERE refobjid = 'orders'::regclass;
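
Another way to preview what CASCADE would remove is to run the drop inside a transaction and roll it back; the server emits NOTICE messages (similar to the comments below) listing every dependent object:

BEGIN;
DROP TABLE orders CASCADE;
-- NOTICE: drop cascades to constraint order_items_order_id_fkey on table order_items
-- NOTICE: drop cascades to view order_summary
ROLLBACK;  -- nothing is actually dropped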

Version: Available in all supported PostgreSQL versions (9.x through 17+)

Source: PostgreSQL Official Documentation - DROP TABLE

99% confidence
A

PostgreSQL Table Inheritance

PostgreSQL supports table inheritance using the INHERITS clause. A child table inherits all columns from its parent table(s) and can add additional columns or constraints.

Basic Syntax

CREATE TABLE parent_table (
    id SERIAL PRIMARY KEY,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE child_table (
    additional_column TEXT
) INHERITS (parent_table);

Key Behaviors (PostgreSQL 12+)

  1. Column Inheritance: Child table automatically gets all parent columns (id, created_at) plus its own (additional_column)

  2. Query Behavior:

    • SELECT * FROM parent_table returns rows from parent AND all children
    • SELECT * FROM ONLY parent_table returns rows from parent only
  3. Constraints:

    • CHECK constraints are inherited
    • NOT NULL constraints are inherited
    • PRIMARY KEY, UNIQUE, and FOREIGN KEY constraints are NOT inherited
    • Child table must define its own primary key if needed
  4. Multiple Inheritance (supported):

CREATE TABLE child_table () INHERITS (parent1, parent2);

Complete Example

-- Parent table
CREATE TABLE cities (
    name TEXT NOT NULL,
    population FLOAT,
    elevation INT
);

-- Child table with additional column
CREATE TABLE capitals (
    state CHAR(2) NOT NULL
) INHERITS (cities);

-- Insert data
INSERT INTO cities VALUES ('New York', 8336817, 10);
INSERT INTO capitals VALUES ('Albany', 97856, 150, 'NY');

-- Query parent (returns both cities and capitals)
SELECT * FROM cities;

-- Query parent only (excludes capitals)
SELECT * FROM ONLY cities;

Critical Limitations

  • No automatic index inheritance: Indexes on parent do NOT apply to children
  • Primary keys not inherited: Each table needs its own PK definition
  • Foreign keys not inherited: Must be redefined on child tables
  • Partitioning preferred: For new projects, use declarative partitioning (PARTITION BY) instead of inheritance, which is primarily maintained for backward compatibility

Source: PostgreSQL 16 Official Documentation - Table Inheritance

99% confidence
A

A temporary table in PostgreSQL is a table that exists only for the duration of a database session (or optionally, just the current transaction) and is automatically dropped when the session ends.

Key Characteristics

Syntax:

CREATE TEMPORARY TABLE temp_users (
    id SERIAL PRIMARY KEY,
    name TEXT
);
-- or shorthand:
CREATE TEMP TABLE temp_users (id SERIAL, name TEXT);

Lifetime Options:

  • ON COMMIT PRESERVE ROWS (default) - data persists until session ends
  • ON COMMIT DELETE ROWS - data deleted after each transaction commit
  • ON COMMIT DROP - table dropped after transaction commit
CREATE TEMP TABLE session_data (value INT) 
ON COMMIT DELETE ROWS;

Critical Properties:

  1. Schema isolation: Temporary tables are created in a session-specific schema (e.g., pg_temp_3), not in public or user schemas
  2. Name shadowing: A temp table named users will shadow a permanent table public.users within that session
  3. Automatic cleanup: Dropped automatically on session disconnect, even if the connection crashes
  4. Storage: Written to disk in the database directory, NOT kept entirely in memory (though frequently cached in temp_buffers)
  5. No visibility to other sessions: Each session sees only its own temporary tables
  6. Indexes allowed: Can create indexes, constraints, and triggers on temp tables
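
For instance, schema isolation and name shadowing look like this (a sketch that assumes a permanent public.users table already exists):

CREATE TEMP TABLE users (id INT, name TEXT);

SELECT * FROM users;           -- resolves to the temporary table
SELECT * FROM public.users;    -- still reaches the permanent table

-- Show which session-specific schema holds the temp table
SELECT pg_my_temp_schema()::regnamespace;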

Configuration:

  • temp_buffers (default: 8MB) - memory allocated for caching temp table data per session

Source: PostgreSQL 17 Official Documentation - CREATE TABLE (https://www.postgresql.org/docs/current/sql-createtable.html)

99% confidence
A

What is an unlogged table in PostgreSQL?

An unlogged table in PostgreSQL is a table where data is not written to the Write-Ahead Log (WAL). This makes writes faster but the table is automatically truncated (all data lost) after a crash or unclean shutdown.

Creating an Unlogged Table

CREATE UNLOGGED TABLE session_data (
    session_id TEXT PRIMARY KEY,
    user_id INTEGER,
    last_active TIMESTAMP
);

Key Characteristics

  1. No WAL writes - INSERT/UPDATE/DELETE operations skip WAL logging
  2. Not crash-safe - Data is lost on server crash or improper shutdown
  3. Not replicated - Data does not appear on streaming replication standby servers
  4. Cannot have logged tables reference them - Foreign keys from logged tables are prohibited
  5. Faster writes - Typically 2-3x faster for bulk inserts compared to logged tables

Converting Between Logged/Unlogged

-- Convert logged table to unlogged
ALTER TABLE my_table SET UNLOGGED;

-- Convert unlogged table to logged
ALTER TABLE my_table SET LOGGED;

Valid Use Cases

  • Session/cache data
  • Temporary ETL staging tables
  • Data that can be regenerated from other sources
  • High-throughput data where durability is not required

Availability

Available since PostgreSQL 9.1 (released September 2011).

Source: PostgreSQL Official Documentation - CREATE TABLE

99% confidence
A

When to Use Unlogged Tables in PostgreSQL

Use unlogged tables when you need maximum write performance for data that can be safely lost on a crash or unclean shutdown.

Critical Behavior

Unlogged tables in PostgreSQL (available since 9.1):

  • Are NOT written to WAL (Write-Ahead Log)
  • Are truncated automatically on crash recovery or unclean shutdown
  • Cannot be replicated to standby servers
  • Provide significantly faster writes (typically 2-10x) due to no WAL overhead

Specific Use Cases

Use unlogged tables for:

  1. Session/temporary data - User session state, shopping carts
  2. Cache tables - Materialized query results that can be regenerated
  3. ETL staging - Intermediate data during bulk loads that will be copied elsewhere
  4. Analytics scratch space - Temporary aggregations or data transformations
  5. High-throughput logging where data loss is acceptable (e.g., non-critical metrics)

Do NOT use for:

  • Any data that must survive crashes
  • Data requiring replication to standbys
  • ACID-compliant transactions where durability matters

Syntax

-- Create unlogged table
CREATE UNLOGGED TABLE session_data (
    session_id TEXT PRIMARY KEY,
    user_id INTEGER,
    data JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Convert existing table to unlogged
ALTER TABLE my_table SET UNLOGGED;

-- Convert back to logged
ALTER TABLE my_table SET LOGGED;

Performance Characteristics

  • Write performance: 2-10x faster than logged tables (exact gain depends on workload and storage)
  • fsync overhead: Eliminated for unlogged tables
  • Crash recovery: Table is automatically truncated (all data lost)

Source: PostgreSQL 17 official documentation (CREATE TABLE - Unlogged)

99% confidence
A

CREATE TABLE ... LIKE creates a new table by copying the structure of an existing table. The new table is completely independent—changes to the original table do not affect the new table.

What it copies by default:

  • Column names
  • Data types
  • NOT NULL constraints

What it does NOT copy by default:

  • Data/rows
  • Indexes
  • Primary keys
  • Foreign keys
  • DEFAULT values
  • CHECK constraints (other than NOT NULL)
  • Comments
  • Identity specifications

Syntax:

CREATE TABLE new_table (LIKE existing_table [INCLUDING options]);

INCLUDING options to copy additional properties:

  • INCLUDING DEFAULTS - Copy default value expressions
  • INCLUDING CONSTRAINTS - Copy CHECK constraints
  • INCLUDING INDEXES - Copy indexes (including PK, UNIQUE, EXCLUDE)
  • INCLUDING IDENTITY - Copy identity column specifications
  • INCLUDING GENERATED - Copy generated column expressions
  • INCLUDING STATISTICS - Copy extended statistics
  • INCLUDING COMMENTS - Copy column/constraint comments
  • INCLUDING STORAGE - Copy TOAST storage settings
  • INCLUDING COMPRESSION - Copy column compression methods
  • INCLUDING ALL - Copy everything above

Example:

-- Basic: copies only columns, types, NOT NULL
CREATE TABLE employees_backup (LIKE employees);

-- Copy with indexes and defaults
CREATE TABLE employees_copy (LIKE employees INCLUDING INDEXES INCLUDING DEFAULTS);

-- Copy everything
CREATE TABLE employees_full (LIKE employees INCLUDING ALL);

The LIKE clause works with tables, views, foreign tables, and composite types.

Sources:

99% confidence
A

Adding Comments to Tables and Columns in PostgreSQL

Use the COMMENT ON SQL command to add descriptive comments to database objects.

Syntax

For tables:

COMMENT ON TABLE table_name IS 'Your comment text';

For columns:

COMMENT ON COLUMN table_name.column_name IS 'Your comment text';

To remove a comment:

COMMENT ON TABLE table_name IS NULL;

Complete Example

-- Add table comment
COMMENT ON TABLE users IS 'Stores application user accounts and profiles';

-- Add column comments
COMMENT ON COLUMN users.email IS 'User email address, must be unique and verified';
COMMENT ON COLUMN users.created_at IS 'Timestamp when user account was created';

-- View comments
SELECT 
    obj_description('users'::regclass) AS table_comment,
    col_description('users'::regclass, 1) AS first_column_comment;

Key Details

  • Permissions: Requires ownership of the object or ALTER privilege
  • Comment length: No hard limit, but stored in pg_description catalog
  • Character encoding: Supports full Unicode text
  • Visibility: Comments are retrieved via \d+ in psql or through information_schema.columns and catalog views
  • Compatibility: Available since PostgreSQL 6.5+, syntax unchanged through PostgreSQL 17

Retrieving Comments Programmatically

-- Get table comment
SELECT obj_description('table_name'::regclass);

-- Get column comment (column number is ordinal position, 1-indexed)
SELECT col_description('table_name'::regclass, column_ordinal_position);

-- Get all column comments for a table
SELECT 
    a.attname AS column_name,
    col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
WHERE a.attrelid = 'table_name'::regclass
AND a.attnum > 0;

Source: PostgreSQL Official Documentation - COMMENT command (https://www.postgresql.org/docs/current/sql-comment.html)

99% confidence
A

GENERATED ALWAYS AS in PostgreSQL

GENERATED ALWAYS AS creates a generated column whose value is automatically computed from other columns in the same table. Available since PostgreSQL 12.

Syntax

column_name data_type GENERATED ALWAYS AS (expression) STORED

Key Specifications

  • STORED: Only storage type supported (PostgreSQL 12+). The value is computed on INSERT/UPDATE and physically stored on disk.
  • VIRTUAL: Not supported (as of PostgreSQL 17). Generated columns must be STORED.
  • Expression: Cannot reference other generated columns, subqueries, or non-immutable functions.
  • Restrictions: Generated columns are read-only, cannot be written to in INSERT/UPDATE statements.

Code Example

CREATE TABLE products (
    price_cents INTEGER,
    tax_rate NUMERIC(4,2),
    price_with_tax NUMERIC(10,2) GENERATED ALWAYS AS (price_cents * (1 + tax_rate/100) / 100) STORED
);

INSERT INTO products (price_cents, tax_rate) VALUES (1000, 8.5);
-- price_with_tax automatically computed as 10.85

-- This fails:
UPDATE products SET price_with_tax = 12.00;
-- ERROR: column "price_with_tax" can only be updated to DEFAULT

Common Use Cases

  • Derived values (full_name from first_name + last_name)
  • Normalized/computed values for indexing
  • Data denormalization for query performance

Source: PostgreSQL 12+ Official Documentation (DDL - Generated Columns)

99% confidence
A

Partial Index in PostgreSQL

A partial index is an index built over a subset of a table, defined by a conditional WHERE clause in the CREATE INDEX statement.

Syntax

CREATE INDEX index_name ON table_name (column_name)
WHERE condition;

Key Details

  • Supported since: PostgreSQL 7.2+
  • WHERE clause: Must be a boolean expression; can use column comparisons, IS NULL, IN, and other predicates
  • Performance: Reduces index size and improves write/read performance when queries match the partial condition
  • Query requirement: PostgreSQL only uses the partial index if the query's WHERE clause is logically compatible with the index condition (not necessarily identical)

Complete Example

-- Index only active users
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';

-- Index only recent orders (index predicates must use immutable expressions,
-- so use a literal cutoff date rather than CURRENT_DATE)
CREATE INDEX idx_recent_orders ON orders (customer_id, order_date)
WHERE order_date >= DATE '2024-10-01';

-- Index non-null values only
CREATE INDEX idx_processed_at ON tasks (processed_at)
WHERE processed_at IS NOT NULL;

Query Usage

-- This WILL use idx_active_users
SELECT * FROM users WHERE email = '[email protected]' AND status = 'active';

-- This will NOT use idx_active_users (condition doesn't match)
SELECT * FROM users WHERE email = '[email protected]';

Verification

Check if your query uses the partial index:

EXPLAIN SELECT * FROM users WHERE email = '[email protected]' AND status = 'active';

Source: PostgreSQL Official Documentation - Indexes (Section 11.8 "Partial Indexes")

99% confidence

indexing_strategies

11 questions
A

PostgreSQL does NOT have a BLOB data type. It only has BYTEA for storing binary data.

BYTEA - The PostgreSQL Binary Type

BYTEA (byte array) is PostgreSQL's native binary data type for storing raw binary data up to 1 GB (limited by the maximum field size in PostgreSQL).

Storage & Usage

CREATE TABLE files (
    id SERIAL PRIMARY KEY,
    file_data BYTEA,
    file_name TEXT
);

-- Insert binary data
INSERT INTO files (file_data, file_name) 
VALUES (decode('48656c6c6f', 'hex'), 'example.bin');

-- Or use binary string literal (PostgreSQL 9.0+)
INSERT INTO files (file_data, file_name)
VALUES ('\x48656c6c6f'::BYTEA, 'example.bin');

Output Formats

BYTEA has two output formats controlled by bytea_output:

  • hex (default since PostgreSQL 9.0): \x48656c6c6f
  • escape: legacy format that prints most bytes literally and non-printable bytes as octal escapes (e.g. \000)
SET bytea_output = 'hex';  -- Default, recommended

Historical Context: Large Objects (LOBs)

For binary data larger than 1 GB, PostgreSQL provides Large Objects (a storage mechanism, not a data type):

-- Large object functions are built in (the optional lo extension merely adds a convenience lo type)
SELECT lo_create(0);  -- Returns OID
SELECT lo_put(12345, 0, '\x48656c6c6f');

Large Objects store data in pg_largeobject system table and are accessed via OID references. They support streaming operations but require more complex application code.

Decision Rule

  • Use BYTEA: For binary data ≤ 1 GB (covers 99% of use cases: images, documents, encrypted data)
  • Use Large Objects: Only for files > 1 GB that need streaming access

Source: PostgreSQL 17 Official Documentation - Section 8.4 (Binary Data Types)

99% confidence
A

PostgreSQL Index Types

PostgreSQL 17 (and versions 12+) provides 6 built-in index types, plus additional types available through extensions:

Available Index Types

  1. B-tree (default)

    • Syntax: CREATE INDEX idx_name ON table (column);
    • Use for: Equality and range queries (<, <=, =, >=, >)
    • Supports: All data types with sort ordering
  2. Hash

    • Syntax: CREATE INDEX idx_name ON table USING hash (column);
    • Use for: Equality comparisons (=) only
    • Note: Since PostgreSQL 10, hash indexes are WAL-logged and crash-safe
  3. GiST (Generalized Search Tree)

    • Syntax: CREATE INDEX idx_name ON table USING gist (column);
    • Use for: Geometric data, full-text search, range types, nearest-neighbor searches
    • Supports: Custom operator classes
  4. GIN (Generalized Inverted Index)

    • Syntax: CREATE INDEX idx_name ON table USING gin (column);
    • Use for: Array values, JSONB, full-text search (multiple keys per row)
    • Faster lookups than GiST, slower builds
  5. SP-GiST (Space-Partitioned GiST)

    • Syntax: CREATE INDEX idx_name ON table USING spgist (column);
    • Use for: Non-balanced data structures (quadtrees, k-d trees, radix trees)
    • Supports: Phone numbers, IP addresses, geometric data
  6. BRIN (Block Range Index)

    • Syntax: CREATE INDEX idx_name ON table USING brin (column);
    • Use for: Very large tables with natural clustering (e.g., timestamps)
    • Extremely small size, works on physical block ranges
  7. Bloom (extension required)

    • Syntax: CREATE EXTENSION bloom; CREATE INDEX idx_name ON table USING bloom (col1, col2, ...);
    • Use for: Multi-column queries where only some columns are tested
    • Space-efficient but lossy (requires recheck)
  8. btree_gist and btree_gin (extensions)

    • Enable B-tree operators for GiST/GIN indexes
    • Useful for exclusion constraints with mixed types

Selection Guide

-- Text equality/sorting: B-tree (default)
CREATE INDEX idx_email ON users (email);

-- JSONB queries: GIN
CREATE INDEX idx_data ON events USING gin (data);

-- Geometric/PostGIS: GiST
CREATE INDEX idx_location ON places USING gist (geom);

-- Large sequential data: BRIN
CREATE INDEX idx_created ON logs USING brin (created_at);

Source: PostgreSQL 17 Official Documentation - Index Types (Chapter 11.2)

99% confidence
A

B-tree Index in PostgreSQL

A B-tree index (balanced tree) is the default index type in PostgreSQL, used for equality and range queries on orderable data types.

What It Is

B-tree indexes maintain sorted data in a tree structure with these properties:

  • Self-balancing: keeps tree height minimal (logarithmic depth)
  • Supports operators: <, <=, =, >=, >, BETWEEN, IN, IS NULL, IS NOT NULL
  • Supports pattern matching with LIKE and ~ only when pattern is anchored at start (e.g., 'prefix%')

When to Use

Use B-tree indexes for:

  1. Equality searches

    SELECT * FROM users WHERE email = '[email protected]';
    
  2. Range queries

    SELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
    
  3. Sorting operations

    SELECT * FROM products ORDER BY price;
    
  4. Primary keys and unique constraints (PostgreSQL automatically creates B-tree indexes)

Creating a B-tree Index

-- Explicit (though INDEX defaults to B-tree)
CREATE INDEX idx_users_email ON users USING btree (email);

-- Implicit (same result)
CREATE INDEX idx_users_email ON users (email);

-- Multi-column
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);

When NOT to Use

  • Full-text search: Use GIN index with tsvector
  • Geometric data: Use GiST or SP-GiST
  • Unanchored pattern matching (LIKE '%suffix'): Use trigram GIN index
  • Array containment (@>, <@): Use GIN index
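
For example, the unanchored-pattern case is typically handled with the pg_trgm extension and a trigram GIN index (a sketch; table and column names are illustrative):

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);

-- The planner can now use the index even though the pattern is not left-anchored
SELECT * FROM users WHERE name LIKE '%smith%';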

Key Limitations

  • B-tree index size is typically ~50-100% of indexed column data size
  • Multi-column indexes: only efficient when query filters use leftmost columns first
  • Only useful for data types with a linear sort ordering; json, for example, has no comparison operators at all

Source: PostgreSQL 17 Documentation - Index Types

99% confidence
A

Hash Index in PostgreSQL

A Hash index is a PostgreSQL index type that uses a hash table data structure. It stores a 32-bit hash code derived from the indexed column value, enabling O(1) lookup for equality operations.

Creation Syntax

CREATE INDEX idx_name ON table_name USING HASH (column_name);

Key Characteristics

  • Supported operator: Only equality (=). Hash indexes cannot be used for range queries (<, >, <=, >=, BETWEEN), sorting, or pattern matching.
  • Hash function: Uses PostgreSQL's internal hash function producing 32-bit integers
  • Page size: Standard 8192 bytes (8 KB) pages like other indexes

Critical Limitations

  1. No WAL logging before PostgreSQL 10.0: Hash indexes were not crash-safe and could not be replicated. Since PostgreSQL 10.0 (October 2017), hash indexes ARE WAL-logged and fully crash-safe.

  2. Single operator support: Only = operator. Cannot optimize:

    • Range scans: WHERE col > 100
    • Sorting: ORDER BY col
    • Pattern matching: WHERE col LIKE 'foo%'
  3. No multi-column hash indexes: PostgreSQL does not support hash indexes on multiple columns (as of PostgreSQL 16).

  4. Performance: B-tree indexes are typically as fast or faster for equality operations while supporting more operations. Hash indexes rarely provide performance benefits in practice.

  5. Size: Hash indexes are often larger than equivalent B-tree indexes.

Official Recommendation

Use B-tree indexes for general-purpose workloads. The frequently quoted warning that "hash index use is discouraged" comes from pre-10 documentation, written when hash index operations were not WAL-logged; since PostgreSQL 10 that caveat no longer applies. Current documentation still favors B-tree because it supports far more operations with equality performance comparable to hash.

Source

PostgreSQL 16 Official Documentation: Index Types - Hash Indexes

99% confidence
A

GiST Index in PostgreSQL

A GiST (Generalized Search Tree) index is a balanced tree-structured index type in PostgreSQL that provides a framework for implementing custom indexing strategies for complex data types and non-standard search operations.

Key Characteristics

  • Template-based infrastructure: GiST is not a single index type but a framework that allows different operator classes to implement custom search strategies
  • Lossy indexing: GiST indexes can be lossy—the index may return false positives that need rechecking against the actual table data
  • Multi-column support: Supports indexing up to 32 columns (PostgreSQL 12+)
  • Page size: Default page size is 8192 bytes

Common Use Cases

  1. Geometric data types: point, box, circle, polygon (built-in operator classes such as point_ops and box_ops)
  2. Full-text search: tsvector data (tsvector_ops)
  3. Range types: int4range, tsrange, etc. (range_ops)
  4. Network types: inet, cidr for IP addresses
  5. PostGIS spatial data: geography, geometry types

Syntax

-- Basic GiST index
CREATE INDEX idx_name ON table_name USING gist (column_name);

-- Multi-column GiST index
CREATE INDEX idx_location ON places USING gist (coordinates, area);

-- GiST with specific operator class
CREATE INDEX idx_tsv ON documents USING gist (content_vector tsvector_ops);

Performance Characteristics

  • Build time: Slower than B-tree for initial creation
  • Insert/Update: Generally slower than B-tree (requires tree rebalancing)
  • Query performance: Optimized for overlap, containment, and proximity operators (&&, @>, <->, etc.)
  • Index size: Typically larger than equivalent B-tree indexes

Required Extension

Some GiST operator classes require extensions:

-- For geometric operations (built-in, no extension needed)
-- For full-text search (built-in)
-- For PostGIS spatial types
CREATE EXTENSION postgis;

Version Notes

  • Available since PostgreSQL 7.0
  • Significant improvements in PostgreSQL 9.1+ (better buffering, faster builds)
  • PostgreSQL 9.1+ added support for distance ordering (ORDER BY column <-> value)
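
A minimal distance-ordering (nearest-neighbor) sketch, assuming a places table with a point column named location:

CREATE INDEX idx_places_location ON places USING gist (location);

-- Ten places closest to the origin, served by the GiST index
SELECT * FROM places
ORDER BY location <-> point '(0,0)'
LIMIT 10;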

Source: PostgreSQL Official Documentation - Chapter 67: GiST Indexes (https://www.postgresql.org/docs/current/gist.html)

99% confidence
A

GIN Index in PostgreSQL

A GIN (Generalized Inverted Index) is a PostgreSQL index type designed for indexing composite values where a single row can contain multiple keys (e.g., arrays, JSONB, full-text search documents).

How It Works

GIN indexes create a separate index entry for each element/key within a composite value, pointing back to the rows containing that element. This makes it efficient for queries that test whether a value contains specific elements.

Primary Use Cases

  1. Array containment/overlap queries (@>, &&, <@ operators)
  2. JSONB queries (@>, ?, ?&, ?| operators)
  3. Full-text search (@@ operator with tsvector)
  4. Range types (overlap operations)

Syntax

-- Array column
CREATE INDEX idx_tags ON articles USING GIN (tags);

-- JSONB column
CREATE INDEX idx_data ON users USING GIN (data);

-- Full-text search
CREATE INDEX idx_fts ON documents USING GIN (to_tsvector('english', content));

-- Multiple columns
CREATE INDEX idx_multi ON table_name USING GIN (col1, col2);

Performance Characteristics

  • Slower inserts/updates than B-tree (3-5x slower) due to multiple index entries per row
  • Faster searches for containment queries compared to sequential scans
  • Larger index size than B-tree (typically 1.5-3x the data size)
  • Supported operators vary by data type - check pg_opclass for available operator classes

Key Configuration Parameters

-- Create index with custom parameters
CREATE INDEX idx_name ON table_name USING GIN (column) 
WITH (fastupdate = on, gin_pending_list_limit = 4096);
  • fastupdate (default: on): Accumulates new entries in a pending list and merges them into the index in batches
  • gin_pending_list_limit (default: 4MB): Max size of pending list before auto-cleanup

Version Notes

  • Available since PostgreSQL 8.2
  • JSONB GIN indexing added in PostgreSQL 9.4
  • fastupdate and the pending-list mechanism were added in PostgreSQL 8.4, enabled by default

Source: PostgreSQL 17 Official Documentation - GIN Indexes

99% confidence
A

GIN vs GiST Index in PostgreSQL

Use GIN (Generalized Inverted Index) when:

  • Indexing static or rarely-updated data
  • Need faster lookups (3x faster than GiST for contains operations)
  • Indexing arrays, JSONB, full-text search, or tsvector columns
  • Can tolerate larger index size (2-3x larger than GiST)

Use GiST (Generalized Search Tree) when:

  • Data changes frequently (faster updates/inserts)
  • Need geometric/spatial queries (PostGIS)
  • Working with range types or custom data types
  • Index size is a concern

Concrete Examples

GIN for JSONB (read-heavy):

CREATE INDEX idx_data_gin ON products USING GIN (metadata jsonb_path_ops);
-- Query: SELECT * FROM products WHERE metadata @> '{"brand": "Nike"}';

GIN for full-text search:

CREATE INDEX idx_fts_gin ON documents USING GIN (to_tsvector('english', content));
-- Query: SELECT * FROM documents WHERE to_tsvector('english', content) @@ to_tsquery('postgresql');

GiST for range types (write-heavy):

CREATE INDEX idx_period_gist ON bookings USING GIST (period);
-- Query: SELECT * FROM bookings WHERE period && '[2025-01-01, 2025-01-31)'::daterange;

GiST for spatial data (PostGIS):

CREATE INDEX idx_location_gist ON stores USING GIST (geom);
-- Query: SELECT * FROM stores WHERE ST_DWithin(geom, 'POINT(-73.935242 40.730610)', 1000);

Performance Metrics (PostgreSQL 14+)

Operation GIN GiST
Lookup speed Faster (baseline) ~3x slower
Insert/update Slower (2-3x) Faster (baseline)
Index size Larger (2-3x) Smaller (baseline)

Decision rule: If read-to-write ratio > 10:1, use GIN. Otherwise, use GiST.

Source: PostgreSQL 16 Official Documentation - Index Types (https://www.postgresql.org/docs/16/indexes-types.html)

99% confidence
A

BRIN Index in PostgreSQL

BRIN (Block Range Index) is a space-efficient index type in PostgreSQL (available since version 9.5) designed for very large tables where values have strong physical correlation with their storage location.

How It Works

BRIN stores summary information (min/max values by default) for consecutive groups of table pages. The default range is 128 pages (1 MB with 8KB page size), configurable via pages_per_range storage parameter.

When to Use

Use BRIN when:

  • Table data is naturally ordered (e.g., timestamps in append-only tables)
  • Table size > 1GB and you need space efficiency
  • You can tolerate approximate filtering (returns superset of matching rows)

BRIN indexes are typically 100-1000x smaller than B-tree indexes but require sequential scan of matched page ranges.

Creation Syntax

-- Basic BRIN index
CREATE INDEX idx_created_at ON logs USING BRIN (created_at);

-- Custom page range (256 pages = 2MB)
CREATE INDEX idx_created_at ON logs USING BRIN (created_at) 
WITH (pages_per_range = 256);

-- Multi-column BRIN
CREATE INDEX idx_multi ON logs USING BRIN (created_at, user_id);

Performance Characteristics

  • Index size: ~0.01-0.1% of table size (vs 10-20% for B-tree)
  • Build time: Very fast, scales linearly
  • Query performance: Good for range scans on correlated data, poor for random lookups
  • Maintenance: Requires VACUUM or brin_summarize_new_values() for new pages
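
For example, to summarize block ranges added since the last VACUUM without waiting for autovacuum (the argument is the BRIN index from the examples above):

SELECT brin_summarize_new_values('idx_created_at');
-- Returns the number of page ranges that were newly summarized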

Source: PostgreSQL 17 Official Documentation - BRIN Indexes

99% confidence
A

BRIN indexes are most effective when:

1. Very large tables - BRIN is designed specifically for tables where traditional B-tree indexes would be too large.

2. Natural correlation with physical order - The indexed column's values must correlate with the physical storage order of rows. This means:

  • Sequentially inserted data (timestamps, order dates, sequence IDs)
  • Naturally clustered data (ZIP codes, geographic regions)
  • Append-only tables where new data follows a predictable pattern

3. Range queries on correlated data - BRIN excels at queries like WHERE date >= '2024-01-01' when dates increase with insertion order.

Performance characteristics:

  • Index size is tiny compared to B-tree (often hundreds of times smaller)
  • Scanning overhead is minimal, close to sequential scan cost
  • Can skip entire block ranges when values don't match query conditions
  • Uses lossy bitmap scans requiring recheck of candidate tuples

Example scenario:
A table storing store orders with a created_at timestamp column where orders are inserted chronologically. BRIN can efficiently skip large portions of the table for date range queries.
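
A sketch of that scenario (table and column names are illustrative):

CREATE INDEX idx_orders_created_brin ON orders USING brin (created_at);

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
-- Expect a Bitmap Heap Scan fed by a Bitmap Index Scan on the BRIN index,
-- with non-matching block ranges skipped entirely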

Not effective when:

  • Data is randomly distributed (no physical correlation)
  • Frequent updates that destroy natural ordering
  • Small tables where B-tree overhead is acceptable
  • Point lookups requiring exact row identification

Sources:

99% confidence
A

A partial index in PostgreSQL is an index built on a subset of rows in a table, defined by a WHERE clause. It indexes only rows that satisfy the specified condition, reducing index size and improving query performance for queries that match that condition.

Syntax:

CREATE INDEX index_name ON table_name (column_name)
WHERE condition;

Example:

-- Index only active users
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';

-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders (created_at)
WHERE created_at > '2024-01-01';

-- Index only non-null values
CREATE INDEX idx_verified_emails ON users (email)
WHERE email IS NOT NULL;

Key characteristics:

  • The query's WHERE clause must match or be more restrictive than the index's WHERE clause for the index to be used
  • Significantly reduces index size when indexing a small subset of rows
  • Maintenance cost (updates/inserts) is only paid for rows matching the condition
  • Available since PostgreSQL 7.2
  • The WHERE clause can use any expression valid in a table constraint

Query planner usage:

-- This query WILL use idx_active_users
SELECT * FROM users WHERE status = 'active' AND email = '[email protected]';

-- This query will NOT use idx_active_users (condition doesn't match)
SELECT * FROM users WHERE email = '[email protected]';

Source: PostgreSQL Official Documentation - Indexes (Section 11.8: Partial Indexes), valid for PostgreSQL 12+

99% confidence
A

A covering index in PostgreSQL is an index that contains all columns needed to answer a query, eliminating the need to access the table's heap pages. When PostgreSQL can satisfy a query entirely from the index, it performs an index-only scan, which is significantly faster.

How It Works

PostgreSQL (9.2+, when index-only scans were introduced) automatically uses covering indexes when:

  1. All columns in the SELECT clause are in the index
  2. All columns in the WHERE clause are in the index
  3. The visibility map shows pages are all-visible (vacuuming required)

Creating Covering Indexes

Method 1: Include all queried columns in the index

-- Query: SELECT email, name FROM users WHERE user_id = 123;
CREATE INDEX idx_users_covering ON users (user_id, email, name);

Method 2: Use INCLUDE clause (PostgreSQL 11+)

-- Same query, but email/name don't need to be searchable
CREATE INDEX idx_users_include ON users (user_id) INCLUDE (email, name);

The INCLUDE clause is preferred because:

  • Non-key columns don't increase index tree depth
  • Smaller index size (columns aren't in B-tree nodes)
  • Faster lookups when you need to filter on user_id but retrieve other columns

Verification

Check if an index-only scan is used:

EXPLAIN (ANALYZE, BUFFERS) 
SELECT email, name FROM users WHERE user_id = 123;
-- Look for "Index Only Scan" in output
-- "Heap Fetches: 0" confirms no table access

Requirements

  • Run VACUUM regularly to update the visibility map
  • All queried columns must be in the index
  • Works with B-tree indexes (default type)

Source: PostgreSQL 16 Documentation - Index-Only Scans and Covering Indexes

99% confidence

core_data_types

7 questions
A

CHAR vs VARCHAR in PostgreSQL

Key Difference: CHAR(n) pads values with spaces to exactly n characters, while VARCHAR(n) stores the actual string without padding (up to n characters).

Storage & Behavior

  • CHAR(n) (or CHARACTER(n)):

    • Fixed-length: Always stores exactly n characters
    • Pads shorter values with trailing spaces to reach length n
    • Trailing spaces are semantically insignificant in comparisons and are stripped when the value is converted to VARCHAR or TEXT
    • Storage: n bytes (if single-byte encoding) + 1 byte overhead if n < 127
  • VARCHAR(n) (or CHARACTER VARYING(n)):

    • Variable-length: Stores actual string length (up to n characters)
    • No padding applied
    • Storage: actual string length + 1 byte (if < 126 bytes) or 4 bytes (if ≥ 126 bytes) for length prefix

Code Example

CREATE TABLE comparison (
    fixed CHAR(10),
    variable VARCHAR(10)
);

INSERT INTO comparison VALUES ('hello', 'hello');

-- CHAR pads to 10 characters internally, VARCHAR stores 5
SELECT 
    fixed,
    variable,
    octet_length(fixed) AS char_bytes,      -- Returns 5 (spaces trimmed on output)
    octet_length(variable) AS varchar_bytes -- Returns 5
FROM comparison;

-- Internal storage differs
SELECT 
    fixed = 'hello     ' AS char_match,    -- TRUE (trailing spaces ignored)
    variable = 'hello     ' AS varchar_match -- FALSE
FROM comparison;

Performance

There is NO performance advantage to CHAR over VARCHAR in PostgreSQL (unlike some other databases). The PostgreSQL documentation explicitly states that CHAR(n) is usually slower due to padding overhead.

Recommendation

Use VARCHAR(n) or TEXT in PostgreSQL. The only reason to use CHAR(n) is for SQL standard compatibility or when you specifically need fixed-width, space-padded behavior.

Source: PostgreSQL 16 Official Documentation, Section 8.3 "Character Types"
https://www.postgresql.org/docs/current/datatype-character.html

99% confidence
A

PostgreSQL: NUMERIC vs DECIMAL

They are identical. DECIMAL and NUMERIC are exact synonyms in PostgreSQL - they refer to the same data type with identical storage and behavior.

Key Facts

  • Both names create the exact same type internally
  • Both accept the same syntax: NUMERIC(precision, scale) or DECIMAL(precision, scale)
  • precision = total number of digits (max 1000)
  • scale = number of digits after decimal point
  • No storage or performance difference whatsoever

Examples

-- These are functionally identical:
CREATE TABLE example (
    price1 NUMERIC(10, 2),
    price2 DECIMAL(10, 2)
);

-- Both store exact decimal values like 99999999.99
-- Both use variable-length storage (2 bytes per 4 decimal digits + overhead)

Verification

-- Check the actual type stored:
SELECT 
    column_name, 
    data_type 
FROM information_schema.columns 
WHERE table_name = 'example';

-- Result shows both as "numeric"

Which to Use?

Use NUMERIC - it's the PostgreSQL-preferred name in official documentation. However, DECIMAL exists for SQL standard compatibility, so either is acceptable.

Source: PostgreSQL 17 Official Documentation - Numeric Types
(https://www.postgresql.org/docs/current/datatype-numeric.html)

99% confidence
A

The NUMERIC type in PostgreSQL has a maximum precision of 131,072 digits before the decimal point and a maximum of 16,383 digits after the decimal point.

Declaring NUMERIC:

-- Syntax: NUMERIC(precision, scale)
-- precision: total count of significant digits
-- scale: count of decimal digits in fractional part

NUMERIC(10, 2)   -- 10 total digits, 2 after decimal (e.g., 12345678.90)
NUMERIC(5)       -- 5 total digits, 0 after decimal (scale defaults to 0)
NUMERIC          -- No limit (stores exact value within implementation limits)

Implementation Limits:

  • Maximum declared precision (total significant digits): 1,000
  • Maximum scale (digits after decimal): 1,000
  • Theoretical maximum (per source code): 131,072 digits before decimal, 16,383 after
  • Storage: Variable length, approximately 2 bytes per 4 decimal digits + 8 bytes overhead

Key Behaviors:

  • Values exceeding declared precision cause an error
  • Values with more decimal places than scale are rounded (not truncated)
  • NUMERIC without parameters stores exact values up to implementation limit
  • DECIMAL is an alias for NUMERIC (identical behavior)
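
A quick illustration of the overflow and rounding behaviors:

SELECT 2.345::numeric(10,2);   -- 2.35 (rounded, not truncated)
SELECT 1234.5::numeric(5,2);   -- ERROR: numeric field overflow
                               -- (precision 5, scale 2 allows at most 999.99)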

Example:

CREATE TABLE prices (
    exact_price NUMERIC,           -- No limit, exact storage
    currency NUMERIC(10, 2),       -- Max 10 digits, 2 decimal places
    very_precise NUMERIC(20, 10)   -- Max 20 digits, 10 decimal places
);

INSERT INTO prices VALUES (123.456789, 12345678.90, 1234567890.1234567890);

Source: PostgreSQL 16 Official Documentation - Chapter 8.1 (Numeric Types)

99% confidence
A

TIMESTAMP vs TIMESTAMPTZ in PostgreSQL

TIMESTAMP stores date and time without timezone information. It stores exactly what you give it and returns it unchanged.

TIMESTAMPTZ stores date and time with timezone awareness. It converts input to UTC for storage and converts to the session's timezone on retrieval.

Storage

Both types use 8 bytes and store internally as microseconds since 2000-01-01 00:00:00 UTC.

Key Behavioral Differences

-- Set session timezone
SET timezone = 'America/New_York';

-- TIMESTAMP: stores literal value, ignores timezone
INSERT INTO events (ts) VALUES ('2025-01-15 14:00:00');
SELECT ts FROM events;
-- Returns: 2025-01-15 14:00:00 (always, regardless of session timezone)

-- TIMESTAMPTZ: converts to UTC, returns in session timezone
INSERT INTO events (tstz) VALUES ('2025-01-15 14:00:00');
-- Stored as: 2025-01-15 19:00:00 UTC (converted from America/New_York)

SET timezone = 'Europe/London';
SELECT tstz FROM events;
-- Returns: 2025-01-15 19:00:00 (displayed in Europe/London time)

Recommendation

Use TIMESTAMPTZ for almost all cases involving real-world events, user activity, or cross-timezone applications. It ensures correct temporal arithmetic and timezone handling.

Use TIMESTAMP only when timezone is truly irrelevant (e.g., "office opens at 09:00" regardless of location, or storing pre-aggregated time buckets).

Aliases

  • TIMESTAMPTZ = TIMESTAMP WITH TIME ZONE
  • TIMESTAMP = TIMESTAMP WITHOUT TIME ZONE

Source: PostgreSQL 17 Official Documentation, Chapter 8.5 (Date/Time Types)

99% confidence
A

TIME vs TIMETZ in PostgreSQL

TIME (or TIME WITHOUT TIME ZONE) stores only the time of day with no timezone information. It uses 8 bytes of storage and has microsecond precision.

TIMETZ (or TIME WITH TIME ZONE) stores time of day plus a timezone offset. It uses 12 bytes of storage with microsecond precision.

Key Differences

  1. Storage: TIME uses 8 bytes; TIMETZ uses 12 bytes
  2. Timezone handling: TIME has none; TIMETZ stores UTC offset
  3. Range: Both support 00:00:00 to 24:00:00
  4. Precision: Both support microsecond precision (6 decimal places)

Critical Limitation

PostgreSQL documentation explicitly discourages using TIMETZ because a time with timezone is conceptually meaningless without a date (timezone rules depend on dates due to DST transitions).

Code Examples

-- TIME (recommended for time-of-day values)
CREATE TABLE schedule (
    event_name TEXT,
    start_time TIME
);

INSERT INTO schedule VALUES ('Meeting', '14:30:00');
-- Stores: 14:30:00

-- TIMETZ (generally NOT recommended)
CREATE TABLE example (
    recorded_time TIMETZ
);

INSERT INTO example VALUES ('14:30:00+05:00');
-- Stores: 14:30:00+05:00
-- When retrieved in different session timezone, shows adjusted time

Recommendation

Use TIME for storing time-of-day values. Use TIMESTAMPTZ (not TIMETZ) when you need both date and timezone awareness.

Source: PostgreSQL 17 Official Documentation, Section 8.5 (Date/Time Types)

99% confidence
A

The INTERVAL type in PostgreSQL stores a duration of time (a time span), not a point in time.

Syntax and Precision

INTERVAL [(p)]

Where p is the optional precision parameter:

  • Range: 0 to 6
  • Default: 6 (if not specified)
  • Meaning: Number of fractional digits retained in the seconds field
  • Maximum precision: 6 decimal places (microseconds)

Examples

-- With default precision (6)
SELECT INTERVAL '1 day 2 hours 3 minutes 4.123456 seconds';

-- With explicit precision
SELECT INTERVAL(3) '1.123456 seconds';  -- Stored as 1.123 seconds

-- Common uses
SELECT INTERVAL '5 days';
SELECT INTERVAL '2 hours 30 minutes';
SELECT INTERVAL '1 year 2 months';

Storage

  • Size: 16 bytes
  • Internal representation: Stores months, days, and microseconds separately
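
Because months and days are kept as separate fields, '1 month' and '30 days' are not interchangeable; a quick illustration:

SELECT DATE '2025-01-31' + INTERVAL '1 month';   -- 2025-02-28 00:00:00
SELECT DATE '2025-01-31' + INTERVAL '30 days';   -- 2025-03-02 00:00:00

-- justify_hours/justify_days normalize between the fields
SELECT justify_hours(INTERVAL '27 hours');       -- 1 day 03:00:00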

Range

  • Years: -178,000,000 to +178,000,000
  • Actual practical range: Limited by internal microsecond representation

Source

PostgreSQL 17 Official Documentation - Section 8.5 (Date/Time Types)

99% confidence
A

PostgreSQL Generated Columns: STORED vs VIRTUAL

PostgreSQL only supports STORED generated columns as of version 12+ (when generated columns were introduced). VIRTUAL generated columns are not implemented.

STORED Generated Columns

Definition: The computed value is physically stored on disk and updated when the underlying columns change.

Syntax:

CREATE TABLE products (
    price NUMERIC,
    tax_rate NUMERIC,
    price_with_tax NUMERIC GENERATED ALWAYS AS (price * (1 + tax_rate)) STORED
);

Key characteristics:

  • Value is computed on INSERT/UPDATE
  • Occupies disk space
  • Can be indexed
  • Reads are fast (no computation needed)
  • Writes are slower (must compute and store)
  • Increases table size

VIRTUAL Generated Columns

Status in PostgreSQL: NOT SUPPORTED

If you attempt to use VIRTUAL:

-- This will fail
CREATE TABLE products (
    price NUMERIC,
    price_doubled NUMERIC GENERATED ALWAYS AS (price * 2) VIRTUAL
);
-- ERROR: virtual generated columns are not supported

Workaround for VIRTUAL Behavior

Use a VIEW or FUNCTION for computed-on-read behavior:

CREATE VIEW products_with_computed AS
SELECT price, price * 2 AS price_doubled
FROM products;

Source: PostgreSQL 12+ official documentation on Generated Columns (https://www.postgresql.org/docs/current/ddl-generated-columns.html)

99% confidence

indexing

3 questions
A

PostgreSQL 18 B-tree Skip Scan

PostgreSQL 18 introduces skip scan for B-tree indexes, allowing multi-column indexes to be used even when the leading column has no restriction.

The Problem (Pre-PG18)

CREATE INDEX idx_country_city ON locations(country, city);

-- This uses the index (leading column restricted)
SELECT * FROM locations WHERE country = 'USA' AND city = 'NYC';

-- This did NOT use the index efficiently (no leading column)
SELECT * FROM locations WHERE city = 'NYC';  -- Sequential scan!

PostgreSQL 18 Solution

-- Now uses skip scan on the same index!
SELECT * FROM locations WHERE city = 'NYC';

-- EXPLAIN shows:
-- Index Scan using idx_country_city on locations
--   Index Cond: (city = 'NYC'::text)
--   Skip Scan: true

How Skip Scan Works

  1. Scans first entry for each distinct value of leading column
  2. Jumps to next distinct value (skips)
  3. Repeats until all distinct leading values checked
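
Conceptually, the behavior resembles the following rewrite of the idx_country_city example above; this is only an illustration of the algorithm, not what the planner actually emits (the work happens inside a single index scan):

-- One probe per distinct leading value:
SELECT l.*
FROM (SELECT DISTINCT country FROM locations) c
CROSS JOIN LATERAL (
    SELECT *
    FROM locations
    WHERE country = c.country   -- "skip" to this leading value
      AND city = 'NYC'          -- then apply the real predicate
) AS l;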

When Skip Scan Is Effective

Leading Column Skip Scan Benefit
Low cardinality (few distinct) High
Medium cardinality Moderate
High cardinality (many distinct) Low (seq scan may win)

Example with EXPLAIN

CREATE TABLE orders (
    status VARCHAR(20),  -- 5 distinct values
    order_date DATE,
    customer_id INT
);

CREATE INDEX idx_status_date ON orders(status, order_date);

-- Query on second column only
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE order_date = '2025-01-15';

-- Shows:
-- Index Scan using idx_status_date
--   Skip Scan: 5 groups  (one per status value)

Limitations

  • Planner decides based on statistics (not always chosen)
  • Most effective with low-cardinality leading columns
  • Doesn't replace need for proper index design

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence
A

PostgreSQL 18 Parallel GIN Index Builds

Yes. PostgreSQL 18 adds support for parallel builds of GIN indexes, significantly reducing index creation time for full-text search and JSONB columns.

Creating a Parallel GIN Index

-- Automatic: PostgreSQL chooses workers based on table size
CREATE INDEX CONCURRENTLY idx_docs_fts 
ON documents USING GIN (to_tsvector('english', content));

-- Manual: Force specific worker count
SET max_parallel_maintenance_workers = 4;
CREATE INDEX idx_json_gin ON events USING GIN (payload jsonb_path_ops);

Configuration

-- Global setting for parallel maintenance operations
max_parallel_maintenance_workers = 2  -- default

-- Per-table override
ALTER TABLE documents SET (parallel_workers = 4);

-- Check current settings
SHOW max_parallel_maintenance_workers;

Performance Comparison

Table Size Workers PG17 Time PG18 Time Improvement
10 GB 1 45 min 45 min -
10 GB 4 45 min 14 min 3.2x
100 GB 4 8 hrs 2.5 hrs 3.2x

Common GIN Use Cases

-- Full-text search
CREATE INDEX idx_fts ON articles USING GIN (to_tsvector('english', body));

-- JSONB containment queries
CREATE INDEX idx_jsonb ON events USING GIN (metadata);

-- Array overlap/containment
CREATE INDEX idx_tags ON posts USING GIN (tags);

-- Trigram similarity (pg_trgm)
CREATE INDEX idx_trgm ON users USING GIN (name gin_trgm_ops);

Monitoring Build Progress

SELECT
    p.pid,
    p.phase,
    p.blocks_total,
    p.blocks_done,
    round(100.0 * p.blocks_done / nullif(p.blocks_total, 0), 1) AS pct_done
FROM pg_stat_progress_create_index p;

Note

Parallel index builds were already available for B-tree (since PG11). PostgreSQL 18 extends this to GIN.

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence
A

PostgreSQL JSONB Indexing Strategies

Choose the right index type based on your query patterns:

Index Types Comparison

Index Type Best For Operators Supported
GIN (default) Containment, key existence @>, ?, ?&, ?|
GIN (jsonb_path_ops) Containment only (smaller, faster) @> only
B-tree (on expression) Equality on extracted values =, <, >, etc.
Hash (on expression) Equality only =

GIN Index (Most Common)

-- Default GIN: supports all JSONB operators
CREATE INDEX idx_data_gin ON events USING GIN (data);

-- Queries that use this index:
SELECT * FROM events WHERE data @> '{"type": "click"}';
SELECT * FROM events WHERE data ? 'user_id';
SELECT * FROM events WHERE data ?& array['type', 'timestamp'];

GIN with jsonb_path_ops (Optimized)

-- 2-3x smaller, faster for @> only
CREATE INDEX idx_data_pathops ON events USING GIN (data jsonb_path_ops);

-- Only supports containment:
SELECT * FROM events WHERE data @> '{"type": "click"}';

-- Does NOT support:
SELECT * FROM events WHERE data ? 'user_id';  -- Uses seq scan

B-tree on Extracted Value

-- Best for equality/range on specific keys
CREATE INDEX idx_user_id ON events ((data->>'user_id'));

-- Caveat: the text-to-timestamptz cast is not IMMUTABLE, so an expression index on
-- ((data->>'timestamp')::timestamptz) fails as written; wrap the cast in an IMMUTABLE
-- helper function, or store the timestamp in a real column, if you need a range index.

-- Query that uses the expression index:
SELECT * FROM events WHERE data->>'user_id' = '12345';

-- Range filtering on the cast value still works, just without index support:
SELECT * FROM events WHERE (data->>'timestamp')::timestamptz > '2025-01-01';

Partial Index (Performance Boost)

-- Index only relevant rows
CREATE INDEX idx_clicks ON events USING GIN (data jsonb_path_ops)
WHERE data->>'type' = 'click';

-- Smaller index, faster for filtered queries
SELECT * FROM events WHERE data @> '{"action": "purchase"}'
AND data->>'type' = 'click';

Decision Guide

Need to search ANY key/value? -> GIN (default)
Only use @> containment?      -> GIN (jsonb_path_ops)
Query specific scalar value?  -> B-tree expression index
Filter + containment?         -> Partial GIN index

Performance Example

-- Before optimization: 1200ms
SELECT * FROM events WHERE data @> '{"user_id": "12345"}';

-- After GIN jsonb_path_ops + partial index: 75ms
CREATE INDEX idx_events_jsonb ON events
USING GIN (data jsonb_path_ops)
WHERE data ? 'user_id';

Source: PostgreSQL Documentation - JSON Types
https://www.postgresql.org/docs/current/datatype-json.html

99% confidence

replication

2 questions
A

PostgreSQL 17 Failover Logical Replication Slots

PostgreSQL 17 enables logical replication slots to survive primary failover by synchronizing them to standby servers.

Prerequisites

  1. Physical streaming replication between primary and standby
  2. Hot standby enabled on standby
  3. hot_standby_feedback = on on standby
  4. primary_slot_name configured on standby

Configuration

On Primary (postgresql.conf):

-- Physical slots that failover-enabled logical replication must wait for,
-- so subscribers never get ahead of the listed standby
synchronized_standby_slots = 'standby1_slot'

On Standby (postgresql.conf):

-- Enable slot synchronization
sync_replication_slots = on
hot_standby_feedback = on
primary_slot_name = 'standby1_slot'

Creating Failover-Enabled Slots

-- Method 1: Direct slot creation with failover flag
SELECT pg_create_logical_replication_slot(
    'my_slot',
    'pgoutput',
    false,      -- temporary
    false,      -- two_phase
    true        -- failover (NEW in PG17)
);

-- Method 2: Via subscription
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=primary dbname=mydb'
    PUBLICATION my_pub
    WITH (failover = true);

Monitoring

-- Check slot sync status on standby
SELECT slot_name, synced, active
FROM pg_replication_slots;

-- synced = true means slot is ready for failover

-- Check if slot changes are synchronized
SELECT * FROM pg_stat_replication_slots;

Failover Process

  1. Primary fails
  2. Standby promotes to new primary
  3. Synced slots (where synced = true) become active
  4. Subscribers reconnect to new primary
  5. Logical replication continues from last confirmed LSN
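
A hedged sketch of the manual commands behind steps 2 and 4 (host, slot, and subscription names are illustrative; managed failover tooling may perform these for you):

-- On the standby: promote it to become the new primary
SELECT pg_promote();

-- On each subscriber: repoint the subscription at the promoted server
ALTER SUBSCRIPTION my_sub CONNECTION 'host=standby1 dbname=mydb';
ALTER SUBSCRIPTION my_sub ENABLE;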

Important Notes

  • Only slots with synced = true at failover time can be used
  • Physical slot between primary/standby is required
  • Slot sync happens periodically via slotsync worker
  • Logical replication changes aren't consumed until standby confirms receipt

Source: PostgreSQL 17 Documentation - Logical Replication Failover
https://www.postgresql.org/docs/17/logical-replication-failover.html

99% confidence
A

PostgreSQL 18 idle_replication_slot_timeout

idle_replication_slot_timeout automatically invalidates replication slots that have been inactive for a specified period, preventing WAL bloat.

The Problem It Solves

Abandoned replication slots prevent WAL cleanup, leading to:

  • Disk space exhaustion
  • Potential database unavailability
  • Manual intervention required

Configuration

-- Set timeout (default: 0 = disabled)
ALTER SYSTEM SET idle_replication_slot_timeout = '1d';  -- 1 day
SELECT pg_reload_conf();

-- Check current setting
SHOW idle_replication_slot_timeout;

Valid Values

Value Meaning
0 Disabled (default)
30min 30 minutes
1h 1 hour
1d 1 day
7d 1 week

Comparison with max_slot_wal_keep_size

Parameter Triggers On Use Case
max_slot_wal_keep_size WAL size exceeds limit Protect disk space
idle_replication_slot_timeout Time since last activity Clean up abandoned slots

Example Scenario

-- Create a slot
SELECT pg_create_logical_replication_slot('test_slot', 'pgoutput');

-- Slot becomes inactive (subscriber disconnects and never reconnects)
-- After idle_replication_slot_timeout passes, slot is invalidated

-- Check slot status
SELECT slot_name, active, invalidation_reason
FROM pg_replication_slots;

-- invalidation_reason will show 'idle_timeout' if expired
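
Invalidated slots are not removed automatically; a small cleanup sketch (review the slot list before running anything like this in production):

-- Drop every slot that was invalidated for being idle too long
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE NOT active
  AND invalidation_reason = 'idle_timeout';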

Best Practice

-- Combine both protections
max_slot_wal_keep_size = '100GB'      -- WAL size limit
idle_replication_slot_timeout = '7d'   -- Time limit

Source: PostgreSQL 18 Documentation - Replication Configuration
https://www.postgresql.org/docs/18/runtime-config-replication.html

99% confidence

authentication

2 questions
A

PostgreSQL 18 OAuth 2.0 Authentication

Yes. PostgreSQL 18 introduces OAuth 2.0 authentication support, allowing integration with modern identity providers like Okta, Auth0, Azure AD, and Keycloak.

Configuration (pg_hba.conf)

# OAuth 2.0 authentication
host    all    all    0.0.0.0/0    oauth issuer="https://auth.example.com" client_id="pg_client"

How It Works

  1. Client requests access token from OAuth provider
  2. Client connects to PostgreSQL with token
  3. PostgreSQL validates token with issuer
  4. Connection established if token valid

Server Configuration

-- postgresql.conf
oauth_issuer = 'https://auth.example.com'
oauth_client_id = 'postgresql-server'
oauth_client_secret = 'your-secret'  -- Or use file

Client Connection

# Using psql with OAuth token
PGOAUTHTOKEN="eyJhbG..." psql -h myserver -U myuser -d mydb

# Using libpq connection string
psql "host=myserver user=myuser oauth_token=eyJhbG..."

Supported Flows

Flow Use Case
Client Credentials Service-to-service
Authorization Code Interactive users
Device Authorization CLI tools

Provider Examples

# Azure AD
host all all 0.0.0.0/0 oauth \
    issuer="https://login.microsoftonline.com/{tenant}/v2.0" \
    client_id="your-app-id"

# Okta
host all all 0.0.0.0/0 oauth \
    issuer="https://your-domain.okta.com/oauth2/default" \
    client_id="your-client-id"

# Keycloak
host all all 0.0.0.0/0 oauth \
    issuer="https://keycloak.example.com/realms/myrealm" \
    client_id="postgresql"

Security Notes

  • Tokens validated via OIDC discovery document
  • JWT signature verification automatic
  • Token expiration enforced
  • MD5 password auth deprecated in favor of SCRAM-SHA-256 or OAuth

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence
A

PostgreSQL 18 MD5 Authentication Deprecation

Yes. PostgreSQL 18 officially deprecates MD5 password authentication in favor of SCRAM-SHA-256.

Deprecation Notice

When using MD5 authentication in PostgreSQL 18, you'll see warnings in logs:

WARNING: md5 authentication is deprecated and will be removed in a future release
HINT: Use scram-sha-256 authentication instead.

Migration Steps

1. Check Current Configuration

-- Check current password encryption
SHOW password_encryption;  -- Should be 'scram-sha-256'

-- Check pg_hba.conf entries
-- Look for 'md5' in auth-method column

2. Update Server Configuration

-- postgresql.conf
password_encryption = scram-sha-256  -- Already default since PG14

3. Re-encrypt User Passwords

-- Users must reset passwords to use SCRAM
ALTER USER myuser PASSWORD 'new_secure_password';

-- Verify password type
SELECT usename, passwd LIKE 'SCRAM%' AS is_scram
FROM pg_shadow
WHERE usename = 'myuser';
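
To list every role that still carries an MD5 hash (a simple sketch; reading pg_shadow requires superuser):

-- Roles listed here must reset their passwords before md5 lines are removed
SELECT usename
FROM pg_shadow
WHERE passwd LIKE 'md5%';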

4. Update pg_hba.conf

# Before (deprecated)
host    all    all    0.0.0.0/0    md5

# After (recommended)
host    all    all    0.0.0.0/0    scram-sha-256

Why SCRAM-SHA-256 Is Better

Aspect MD5 SCRAM-SHA-256
Algorithm strength Weak (broken) Strong
Replay attacks Vulnerable Protected
Man-in-middle Vulnerable Protected
Channel binding No Yes
Password storage Weak hash Salted, iterated

Compatibility Notes

-- Clients must support SCRAM
-- libpq 10+ supports SCRAM
-- Most drivers updated years ago

-- Check client library version
SELECT version();  -- Server version
-- Client: psql --version, check driver docs

Migration Timeline

Version Status
PG 10 SCRAM-SHA-256 introduced
PG 14 SCRAM-SHA-256 default for new passwords
PG 18 MD5 deprecated (warnings)
PG 19+ MD5 may be removed

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence

query_optimization

2 questions
A

PostgreSQL 18 Self-Join Elimination

PostgreSQL 18 automatically removes unnecessary self-joins from queries, improving performance without query rewrites.

The Optimization

When a table is joined to itself on its primary key (or unique columns), and all referenced columns come from one instance, PostgreSQL 18 eliminates the redundant scan.

Example

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INT,
    total NUMERIC,
    status TEXT
);

-- This query has a redundant self-join
SELECT o1.id, o1.total, o2.status
FROM orders o1
JOIN orders o2 ON o1.id = o2.id;

-- PostgreSQL 18 optimizes to:
-- SELECT id, total, status FROM orders;

EXPLAIN Comparison

-- PostgreSQL 17 plan
EXPLAIN SELECT o1.id, o1.total, o2.status FROM orders o1 JOIN orders o2 ON o1.id = o2.id;

-- Hash Join  (cost=...)
--   Hash Cond: (o1.id = o2.id)
--   ->  Seq Scan on orders o1
--   ->  Hash
--         ->  Seq Scan on orders o2

-- PostgreSQL 18 plan (self-join eliminated)
EXPLAIN SELECT o1.id, o1.total, o2.status FROM orders o1 JOIN orders o2 ON o1.id = o2.id;

-- Seq Scan on orders  (cost=...)

When It Applies

Condition Self-Join Eliminated?
Join on PRIMARY KEY Yes
Join on UNIQUE column Yes
Join on non-unique column No
LEFT JOIN (outer) Depends on NULL handling
Columns from both aliases used No (both needed)

Real-World Scenarios

-- ORM-generated queries often have redundant joins
-- Example: Eager loading that references same table

-- View that accidentally self-joins
CREATE VIEW order_details AS
SELECT o.id, o.total, o2.status
FROM orders o
JOIN orders o2 ON o.id = o2.id;  -- PG18 optimizes this

-- Subquery correlation patterns
SELECT * FROM orders o
WHERE total = (SELECT total FROM orders WHERE id = o.id);

Performance Impact

  • Reduces I/O (one scan instead of two)
  • Eliminates join computation overhead
  • Better for read-heavy OLTP workloads
  • Benefits ORM-heavy applications

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence
A

PostgreSQL 18 IN to ANY Transformation

PostgreSQL 18 automatically converts IN (VALUES ...) to = ANY(...) for better optimizer statistics and faster index usage.

The Optimization

-- You write:
SELECT * FROM users WHERE id IN (1, 2, 3, 4, 5);

-- PostgreSQL 18 internally optimizes to:
SELECT * FROM users WHERE id = ANY(ARRAY[1, 2, 3, 4, 5]);

Why It Matters

Aspect IN (VALUES ...) = ANY (array)
Statistics Treated as separate conditions Single array comparison
Index usage May not use index optimally Better index scan planning
Large lists Creates many OR conditions Single array operation
Plan caching Plan varies with list size More consistent plans

EXPLAIN Comparison

-- PostgreSQL 17
EXPLAIN SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');
-- Shows: Filter: (status = ANY ('{pending,processing,shipped}'::text[]))
-- But internally treated differently for statistics

-- PostgreSQL 18: Better cardinality estimates
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');
-- More accurate row estimates, better join ordering

Related: OR to ANY Transformation

PostgreSQL 18 also converts OR clauses to arrays:

-- You write:
SELECT * FROM users
WHERE email = '[email protected]' OR email = '[email protected]' OR email = '[email protected]';

-- PostgreSQL 18 transforms to:
SELECT * FROM users
WHERE email = ANY(ARRAY['[email protected]', '[email protected]', '[email protected]']);

Performance Impact

-- Large IN lists benefit most
SELECT * FROM products WHERE id IN (
    SELECT id FROM temp_import_ids  -- 10,000 IDs
);

-- PostgreSQL 18: Uses array comparison
-- Better statistics estimation
-- More efficient index usage
-- Faster execution for large lists

When It Applies

Pattern Transformed?
IN (1, 2, 3) Yes
IN (SELECT ...) Depends
NOT IN (...) Yes
col1 IN (...) AND col2 IN (...) Yes (each)
Dynamic/prepared statement params Yes

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence

server_side_programming

2 questions
A

BOOLEAN vs BIT in PostgreSQL

BOOLEAN is PostgreSQL's native boolean data type that stores true/false/null values. It occupies 1 byte of storage.

BIT(n) is a fixed-length bit string type that stores exactly n bits (where n must be specified). BIT without length defaults to BIT(1), storing a single bit ('0' or '1').

Key Differences

Storage and Values

-- BOOLEAN: stores logical true/false/null
CREATE TABLE bool_test (
    flag BOOLEAN
);
INSERT INTO bool_test VALUES (TRUE), (FALSE), (NULL);
INSERT INTO bool_test VALUES ('yes'), ('no'), ('1'), ('0'); -- All valid

-- BIT: stores binary bit strings
CREATE TABLE bit_test (
    single_bit BIT(1),
    multi_bit BIT(8)
);
INSERT INTO bit_test VALUES (B'1', B'10110101');
INSERT INTO bit_test VALUES ('0', '11111111'); -- String literals work

Comparison Behavior

-- BOOLEAN uses logical operators
SELECT TRUE AND FALSE;  -- Returns FALSE
SELECT TRUE OR FALSE;   -- Returns TRUE

-- BIT uses bitwise operators
SELECT B'1101' & B'1011';  -- Bitwise AND: B'1001'
SELECT B'1101' | B'1011';  -- Bitwise OR: B'1111'

Critical Distinctions

  1. NULL handling: Both types accept NULL unless the column is declared NOT NULL, but BOOLEAN's NULL participates in SQL three-valued logic (for example, NULL AND FALSE is FALSE while NULL AND TRUE is NULL), whereas a NULL bit string is simply a missing value (see the short illustration after this list).

  2. Input formats: BOOLEAN accepts: TRUE, FALSE, 't', 'f', 'yes', 'no', 'y', 'n', '1', '0', 'on', 'off'. BIT requires binary literals like B'1' or string equivalents.

  3. Comparison: BOOLEAN comparisons use boolean logic. BIT comparisons are lexicographic (like strings).

  4. Use case: Use BOOLEAN for true/false flags. Use BIT for actual bit manipulation, bitmasks, or binary data.
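
A short illustration of the NULL-handling and comparison points above:

-- BOOLEAN: NULL takes part in three-valued logic
SELECT NULL AND FALSE AS a,   -- false
       NULL AND TRUE  AS b,   -- null
       NULL OR  TRUE  AS c;   -- true

-- BIT: values compare and combine position by position
SELECT B'0101' = B'0101' AS eq,    -- true
       B'1101' # B'1011' AS xor;   -- 0110 (bitwise XOR)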

Source: PostgreSQL 16 official documentation (Chapter 8.6 Boolean Type, 8.10 Bit String Types)

99% confidence
A

Creating a Table from a SELECT Query in PostgreSQL

Use CREATE TABLE AS (CTAS) to create a table from a SELECT query:

CREATE TABLE new_table AS
SELECT column1, column2, column3
FROM existing_table
WHERE condition;

Key characteristics:

  • Creates table structure automatically based on SELECT result columns
  • Inserts all rows returned by the SELECT
  • Does NOT copy indexes, constraints, or defaults from source tables
  • Does NOT create a primary key automatically
  • Column names match the SELECT list (use aliases to rename: SELECT col AS new_name)
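
If you also need the indexes, constraints, and defaults, a common alternative is CREATE TABLE ... (LIKE ...) followed by an INSERT (a sketch using the same illustrative names):

-- Copy the full structure, then copy the rows
CREATE TABLE new_table (LIKE existing_table INCLUDING ALL);

INSERT INTO new_table
SELECT * FROM existing_table
WHERE condition;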

With additional options:

CREATE TABLE new_table AS
SELECT * FROM existing_table
WITH NO DATA;  -- Creates structure only, no rows

Alternative syntax using SELECT INTO (same result; the documentation recommends CREATE TABLE AS):

SELECT column1, column2
INTO new_table
FROM existing_table;

To include constraints after creation:

CREATE TABLE new_table AS
SELECT * FROM existing_table;

ALTER TABLE new_table ADD PRIMARY KEY (id);
CREATE INDEX idx_name ON new_table(column_name);

Temporary table variant:

CREATE TEMP TABLE temp_table AS
SELECT * FROM existing_table;

Authority: PostgreSQL official documentation (CREATE TABLE AS command, compatible with PostgreSQL 9.0+, syntax unchanged through PostgreSQL 17).

99% confidence

sql_json_features

2 questions
A

JSONB vs JSON in PostgreSQL

Use JSONB (binary JSON) for almost all cases. Use JSON only when you need to preserve exact formatting/whitespace or key ordering.

Key Differences

Storage Format:

  • JSON: Stored as plain text, exact copy of input string
  • JSONB: Stored in decomposed binary format

Processing Speed:

  • JSON: Faster to insert (no processing overhead)
  • JSONB: Significantly faster to query/process (typically 2-10x faster for operations)

Indexing:

  • JSON: Cannot be indexed directly
  • JSONB: Supports GIN and GiST indexes for efficient queries

Whitespace & Key Order:

  • JSON: Preserves whitespace and duplicate keys
  • JSONB: Removes whitespace, eliminates duplicate keys (keeps last value), does NOT preserve key order

Supported Operators:

  • JSON: Limited operators (mostly extraction)
  • JSONB: Full operator support including containment (@>, <@), existence (?, ?|, ?&)

Practical Example

-- JSONB with indexing (recommended)
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    data JSONB NOT NULL
);

-- Create GIN index for fast queries
CREATE INDEX idx_events_data ON events USING GIN (data);

-- Fast containment query (only works with JSONB)
SELECT * FROM events WHERE data @> '{"status": "active"}';

-- Extract and query nested values efficiently
SELECT * FROM events WHERE data->'user'->>'email' = '[email protected]';

Storage Overhead

JSONB typically uses slightly more disk space (5-10% overhead) due to binary encoding metadata, but this is offset by query performance gains.

Version Note

JSONB introduced in PostgreSQL 9.4. As of PostgreSQL 12+, JSONB performance continues to improve with better compression and faster processing.

Source: PostgreSQL Official Documentation v16 - Chapter 8.14 JSON Types

99% confidence
A

Use JSONB for almost all use cases. Use JSON only when you need to preserve exact input formatting.

When to Use JSONB (Default Choice)

Choose JSONB when:

  • You need indexing (GIN indexes for fast lookups)
  • You need containment queries (@> operator to test if one document contains another)
  • You want faster processing (no reparsing needed on each query)
  • You need subscripting (array-style access to extract/modify nested values)
  • Performance matters for querying/processing data
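
The subscripting point above refers to the jsonb subscripting syntax added in PostgreSQL 14; a brief sketch on a hypothetical events table:

-- Read a nested value (returns jsonb)
SELECT data['user']['email'] FROM events;

-- Assign through a subscript; the right-hand side is a JSON literal
UPDATE events SET data['status'] = '"archived"';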

The official PostgreSQL documentation explicitly states: "In general, most applications should prefer to store JSON data as jsonb"

When to Use JSON (Rare Cases)

Choose JSON only when:

  • You must preserve exact input formatting (whitespace, key ordering)
  • You need to keep duplicate object keys (JSONB keeps only the last value)
  • You have legacy systems with specific assumptions about JSON structure

Technical Differences

Feature JSON JSONB
Storage Exact text copy Decomposed binary format
Insert speed Faster Slightly slower (conversion overhead)
Query speed Slower (reparses each time) Significantly faster
Indexing Not supported GIN indexes supported
Whitespace Preserved Discarded
Key ordering Preserved Not preserved
Duplicate keys Preserved Only last value kept

Performance Note

JSONB is faster for processing because it stores data in a decomposed binary format. While this adds slight conversion overhead during insertion, it eliminates the need to reparse on each execution.

Sources: PostgreSQL Official Documentation - JSON Types (https://www.postgresql.org/docs/current/datatype-json.html)

99% confidence

performance_tuning

1 question
A

PostgreSQL 18 Asynchronous I/O Performance

PostgreSQL 18 introduces an asynchronous I/O (AIO) subsystem that can deliver 2-3x performance improvements for read-heavy workloads.

How io_uring Works

io_uring establishes a shared ring buffer between PostgreSQL and the Linux kernel, allowing:

  1. Multiple I/O requests to be submitted in a single syscall
  2. Completions to be reaped without blocking
  3. Zero-copy data transfer in many cases

Performance by Environment

Environment Improvement Notes
Cloud (EBS, network storage) 2-3x Highest gains due to I/O latency
Local SSD ~24% Still beneficial but less dramatic
Warm cache Minimal Data already in memory

Supported Operations (PostgreSQL 18)

  • Sequential scans
  • Bitmap heap scans
  • VACUUM operations

Not yet supported: Index scans, write operations, WAL

Configuration

-- Enable io_uring (Linux only)
ALTER SYSTEM SET io_method = 'io_uring';
SELECT pg_reload_conf();

-- Verify
SHOW io_method;

Requirements

  • Linux kernel 5.1 or later
  • File system must support io_uring (ext4, XFS, etc.)
  • Requires liburing library at compile time

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence

data_types

1 question
A

PostgreSQL 18 uuidv7() Function

uuidv7() generates timestamp-ordered UUIDs that are optimal for B-tree indexes and distributed systems.

Key Characteristics

  • First 48 bits: Unix timestamp (millisecond precision)
  • Next 12 bits: Sub-millisecond counter for monotonicity
  • Remaining bits: Random data
  • Total: 128 bits (standard UUID size)

Usage

-- Generate a UUIDv7
SELECT uuidv7();
-- Result: 019376a8-5b40-7abc-8def-1234567890ab

-- Use as primary key default
CREATE TABLE events (
    id uuid DEFAULT uuidv7() PRIMARY KEY,
    event_type TEXT,
    payload JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Also available: explicit v4 function
SELECT uuidv4();  -- Alias for gen_random_uuid()

Performance vs UUIDv4

Metric UUIDv7 UUIDv4
Generation time 58.1 microseconds 86.8 microseconds
Throughput 34,127 ops/sec 29,238 ops/sec
Index fragmentation Minimal High
Insert performance Excellent (sequential) Poor (random)

Benefits

  1. Chronological ordering: Later UUIDs sort after earlier ones
  2. Reduced index fragmentation: New values always append to index end
  3. Better cache locality: Related records stored together
  4. Global uniqueness: No coordination needed across nodes

Considerations

  • Timestamp is visible in UUID (don't use for security-sensitive external IDs)
  • Monotonicity guaranteed only within same backend process
  • Requires NTP for clock synchronization across nodes
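
The embedded timestamp can be read back directly, which demonstrates both the ordering property and the visibility caveat above (uuid_extract_version() and uuid_extract_timestamp() were added in PostgreSQL 17):

SELECT u                         AS generated_uuid,
       uuid_extract_version(u)   AS version,      -- 7
       uuid_extract_timestamp(u) AS embedded_ts   -- creation time, millisecond precision
FROM (SELECT uuidv7() AS u) AS s;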

Source: PostgreSQL 18 Documentation - UUID Functions
https://www.postgresql.org/docs/18/functions-uuid.html

99% confidence

data_definition

1 question
A

PostgreSQL 18 Generated Columns Default

VIRTUAL is the default kind for generated columns in PostgreSQL 18. In earlier versions, STORED was the only supported kind and had to be written explicitly.

STORED vs VIRTUAL

Aspect STORED VIRTUAL (new default)
Storage Written to disk No disk space
Computation On INSERT/UPDATE On SELECT (read time)
Indexable Yes No
Adding to table Requires table rewrite Instant

Syntax

-- PostgreSQL 18: VIRTUAL is default
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    price NUMERIC(10,2),
    quantity INTEGER,
    total NUMERIC GENERATED ALWAYS AS (price * quantity)  -- VIRTUAL by default
);

-- Explicit STORED (for indexing)
CREATE TABLE products_indexed (
    id SERIAL PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    full_name TEXT GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
);

-- Explicit VIRTUAL
CREATE TABLE calculations (
    a INTEGER,
    b INTEGER,
    sum INTEGER GENERATED ALWAYS AS (a + b) VIRTUAL
);

When to Use Each

Use VIRTUAL (default) when:

  • Adding columns to large existing tables (instant, no rewrite)
  • Write-heavy workloads (saves I/O)
  • Storage space is a concern
  • Column values change frequently via base columns

Use STORED when:

  • You need to create an index on the generated column
  • Read-heavy workloads with complex expressions
  • You need constraints (UNIQUE, NOT NULL, FK)

Current VIRTUAL Limitations

  • Cannot be indexed
  • No unique/foreign key constraints
  • Not supported in logical replication
  • Expression must use only built-in functions (no user-defined)

Source: PostgreSQL 18 Documentation - Generated Columns
https://www.postgresql.org/docs/18/ddl-generated-columns.html

99% confidence

backup_recovery

1 question
A

PostgreSQL 17 Incremental Backups

PostgreSQL 17 introduces native incremental backup support via pg_basebackup, capturing only changed blocks since the last backup.

Prerequisites

-- Enable summarization in postgresql.conf
summarize_wal = on

-- Restart PostgreSQL
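
Before taking the first incremental backup, it is worth confirming that summarization is active; pg_available_wal_summaries() lists the WAL ranges that have already been summarized:

-- Should return 'on'
SHOW summarize_wal;

-- Should start returning rows once the summarizer has processed some WAL
SELECT * FROM pg_available_wal_summaries();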

Backup Commands

# 1. Create full backup (first time)
pg_basebackup -D /backups/full_backup -Fp -Xs -P

# 2. Create incremental backup (references full)
pg_basebackup -D /backups/incr_backup1 \
    --incremental=/backups/full_backup/backup_manifest \
    -Fp -Xs -P

# 3. Create second incremental (references first incremental)
pg_basebackup -D /backups/incr_backup2 \
    --incremental=/backups/incr_backup1/backup_manifest \
    -Fp -Xs -P

Restore with pg_combinebackup

# Combine backups into restore-ready directory
# Order matters: full -> incr1 -> incr2 -> ...
pg_combinebackup \
    /backups/full_backup \
    /backups/incr_backup1 \
    /backups/incr_backup2 \
    -o /restore/combined_backup

# Optional: Use hard links to save space
pg_combinebackup --link \
    /backups/full_backup \
    /backups/incr_backup1 \
    -o /restore/combined_backup

# Start PostgreSQL from combined backup
pg_ctl -D /restore/combined_backup start

Key Options

Option Description
--incremental=MANIFEST Path to previous backup's manifest
-o, --output Output directory for combined backup
-k, --link Use hard links (faster, saves space)
-n, --dry-run Show what would be done
--copy-file-range Use kernel copy optimization (Linux/FreeBSD)

Benefits

  • Faster backups: Only changed blocks transferred
  • Less storage: Incremental files much smaller than full
  • Faster recovery: Apply only recent changes
  • Point-in-time: Chain incrementals for specific recovery points

Source: PostgreSQL 17 Documentation - pg_combinebackup
https://www.postgresql.org/docs/17/app-pgcombinebackup.html

99% confidence

json_operations

1 question
A

PostgreSQL 17 JSON_TABLE Function

JSON_TABLE converts JSON data into a relational table format, allowing you to query JSON as if it were regular SQL rows and columns.

Basic Syntax

JSON_TABLE(
    json_expression,
    json_path_expression
    COLUMNS (
        column_name type PATH json_path [DEFAULT value ON EMPTY] [DEFAULT value ON ERROR]
    )
)

Examples

-- Basic usage: extract array elements as rows
SELECT * FROM JSON_TABLE(
    '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'::jsonb,
    '$[*]'
    COLUMNS (
        name TEXT PATH '$.name',
        age INTEGER PATH '$.age'
    )
) AS jt;

-- Result:
-- name  | age
-- Alice | 30
-- Bob   | 25

-- Nested data with error handling
SELECT * FROM JSON_TABLE(
    '{"users": [{"id": 1, "profile": {"email": "[email protected]"}}]}'::jsonb,
    '$.users[*]'
    COLUMNS (
        user_id INTEGER PATH '$.id',
        email TEXT PATH '$.profile.email' DEFAULT 'unknown' ON EMPTY
    )
) AS users;

-- Use with actual table data
SELECT o.order_id, items.*
FROM orders o,
    JSON_TABLE(
        o.line_items,
        '$[*]'
        COLUMNS (
            product_id INTEGER PATH '$.product_id',
            quantity INTEGER PATH '$.qty',
            price NUMERIC PATH '$.price'
        )
    ) AS items;

Column Options

Clause Purpose
PATH JSON path to extract value
DEFAULT ... ON EMPTY Value when path returns nothing
DEFAULT ... ON ERROR Value when extraction fails
FOR ORDINALITY Row number counter
EXISTS Boolean: does path exist?
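
The FOR ORDINALITY and EXISTS column kinds from the table above in one short sketch:

SELECT * FROM JSON_TABLE(
    '[{"sku": "A1", "tags": ["new"]}, {"sku": "B2"}]'::jsonb,
    '$[*]'
    COLUMNS (
        rn       FOR ORDINALITY,
        sku      TEXT    PATH '$.sku',
        has_tags BOOLEAN EXISTS PATH '$.tags'
    )
) AS jt;

-- rn | sku | has_tags
--  1 | A1  | t
--  2 | B2  | f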

Related SQL/JSON Functions (PG17)

-- JSON_EXISTS: check if path exists
SELECT JSON_EXISTS('{"a": 1}'::jsonb, '$.a');  -- true

-- JSON_QUERY: extract JSON fragment
SELECT JSON_QUERY('{"a": {"b": 1}}'::jsonb, '$.a');  -- {"b": 1}

-- JSON_VALUE: extract scalar value
SELECT JSON_VALUE('{"name": "test"}'::jsonb, '$.name');  -- test

Source: PostgreSQL 17 Documentation - JSON Functions
https://www.postgresql.org/docs/17/functions-json.html

99% confidence

vacuum_maintenance

1 question
A

PostgreSQL 17 VACUUM Memory Improvements

PostgreSQL 17 introduces a new internal memory structure for VACUUM that consumes up to 20x less memory than previous versions.

Memory Comparison

Scenario PostgreSQL 16 PostgreSQL 17
1M dead tuples ~128 MB ~6 MB
10M dead tuples ~1.28 GB ~64 MB
Large table vacuum Often OOM Stable

How It Works

Previous versions stored dead tuple IDs in a flat array that grew linearly. PostgreSQL 17 uses a radix tree (TID store) that:

  1. Compresses common prefixes of tuple IDs
  2. Scales sub-linearly with dead tuple count
  3. Reduces memory fragmentation
  4. Improves cache efficiency

Configuration

The new structure respects existing settings:

-- maintenance_work_mem still applies
SHOW maintenance_work_mem;  -- default: 64MB

-- But now processes more dead tuples per memory unit
-- A 64MB setting can now handle ~200M dead tuples
-- Previously limited to ~5M dead tuples

Practical Impact

  • Fewer VACUUM passes: More dead tuples processed per pass
  • Reduced OOM risk: Large table vacuums less likely to fail
  • Lower memory pressure: Better for shared hosting / containers
  • Faster completion: Less time spent on memory management

Verification

-- Check vacuum progress (unchanged API)
SELECT * FROM pg_stat_progress_vacuum;

-- Monitor memory in pg_stat_activity
SELECT pid, state, query, backend_type
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';

Source: PostgreSQL 17 Release Notes
https://www.postgresql.org/docs/17/release-17.html

99% confidence

administration

1 question
A

PostgreSQL 18 Statistics Preservation During Upgrade

Yes. PostgreSQL 18 preserves planner statistics during pg_upgrade, eliminating the need for lengthy post-upgrade ANALYZE operations.

The Previous Problem (Pre-PG18)

# After pg_upgrade, statistics were empty
pg_upgrade -d /old/data -D /new/data ...

# Required running ANALYZE on entire database (could take hours)
vacuumdb --all --analyze-in-stages

PostgreSQL 18 Behavior

# Statistics now preserved automatically
pg_upgrade -d /old/data -D /new/data -b /old/bin -B /new/bin

# Database ready immediately with accurate query plans!

What's Preserved

Statistic Type Preserved?
Column statistics (pg_statistic) Yes
Extended statistics No (not transferred; re-run ANALYZE to rebuild)
Most common values Yes
Histograms Yes
NULL fractions Yes
Correlation values Yes

Upgrade Time Comparison

Database Size PG17 Upgrade + ANALYZE PG18 Upgrade
100 GB 2 hours 20 minutes
1 TB 12+ hours 2 hours
10 TB Days Hours

Verification

-- After upgrade, check statistics exist
SELECT
    schemaname,
    tablename,
    last_analyze,
    n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 10;

-- last_analyze will show pre-upgrade timestamp
-- Statistics are already present for query planning

Best Practice

Even with preserved statistics, consider running ANALYZE after upgrade for:

  • Tables with significant changes during upgrade window
  • New columns added during upgrade
  • Any tables showing poor query performance

-- Optional: Refresh statistics for specific tables
ANALYZE VERBOSE large_table;

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence

data_manipulation

1 question
A

PostgreSQL 17 MERGE with RETURNING

PostgreSQL 17 adds RETURNING clause support to MERGE statements, allowing you to see which rows were affected and how.

Basic Syntax

MERGE INTO target_table t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET value = s.value
WHEN NOT MATCHED THEN
    INSERT (id, value) VALUES (s.id, s.value)
RETURNING
    t.*,
    merge_action()  -- 'INSERT', 'UPDATE', or 'DELETE'
;

The merge_action() Function

Returns which action was taken for each row:

Return Value Meaning
'INSERT' New row inserted
'UPDATE' Existing row updated
'DELETE' Row deleted (WHEN MATCHED ... DELETE)

Practical Examples

-- Upsert with audit trail
WITH merge_results AS (
    MERGE INTO products p
    USING staging_products s ON p.sku = s.sku
    WHEN MATCHED THEN
        UPDATE SET price = s.price, updated_at = now()
    WHEN NOT MATCHED THEN
        INSERT (sku, name, price, created_at)
        VALUES (s.sku, s.name, s.price, now())
    RETURNING p.id, p.sku, merge_action() AS action
)
INSERT INTO product_audit (product_id, sku, action, performed_at)
SELECT id, sku, action, now() FROM merge_results;

-- Count operations
WITH results AS (
    MERGE INTO inventory i
    USING shipment s ON i.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET quantity = i.quantity + s.quantity
    WHEN NOT MATCHED THEN
        INSERT (product_id, quantity) VALUES (s.product_id, s.quantity)
    RETURNING merge_action() AS action
)
SELECT action, count(*)
FROM results
GROUP BY action;

-- Result:
-- action | count
-- INSERT | 45
-- UPDATE | 123

MERGE on Views (Also New in PG17)

-- MERGE can now update views (with rules/triggers)
CREATE VIEW active_users AS
    SELECT * FROM users WHERE status = 'active';

MERGE INTO active_users v
USING updates u ON v.id = u.id
WHEN MATCHED THEN UPDATE SET email = u.email
RETURNING v.id, merge_action();

Source: PostgreSQL 17 Documentation - MERGE
https://www.postgresql.org/docs/17/sql-merge.html

99% confidence

security

1 question
A

PostgreSQL 17 Direct SSL Connection

PostgreSQL 17 introduces sslnegotiation=direct for faster, more secure TLS connections that skip the plaintext negotiation phase.

Traditional SSL Negotiation (Pre-PG17)

  1. Client connects (plaintext)
  2. Client sends SSLRequest packet
  3. Server responds 'S' (SSL) or 'N' (no SSL)
  4. TLS handshake begins
  5. Encrypted communication

Problem: Initial packets are unencrypted, vulnerable to downgrade attacks.

Direct SSL (PostgreSQL 17+)

  1. Client starts TLS handshake immediately (using ALPN)
  2. Server responds with TLS
  3. Encrypted communication

No plaintext phase. Connection is encrypted from the first byte.

Client Configuration

# Enable direct SSL in connection string
psql "host=db.example.com sslmode=require sslnegotiation=direct"

# Or via environment variable
export PGSSLNEGOTIATION=direct
psql -h db.example.com -U myuser

# libpq connection string
PQconnectdb("host=db.example.com sslmode=require sslnegotiation=direct")

Server Configuration

-- Server must have SSL enabled (postgresql.conf)
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'

-- No special config needed for direct SSL
-- Server auto-detects connection type

Requirements

Requirement Details
PostgreSQL version 17+ (both client and server)
Protocol ALPN (Application-Layer Protocol Negotiation)
TLS version 1.2+ recommended
Network Port must allow direct TLS (no TLS-terminating proxy)

Compatibility Notes

# Falls back gracefully to negotiated SSL if server < PG17
psql "sslnegotiation=direct sslmode=prefer" -h oldserver
# Works with negotiated SSL

# Strict mode - fail if direct not supported
psql "sslnegotiation=direct sslmode=require" -h oldserver
# Connection fails

Security Benefits

  • No plaintext "probe" phase
  • Immune to SSL stripping attacks
  • Matches modern TLS best practices
  • Compatible with strict firewall policies

Source: PostgreSQL 17 Documentation - Connection Parameters
https://www.postgresql.org/docs/17/libpq-connect.html

99% confidence

data_loading

1 question
A

PostgreSQL 17 COPY Performance

PostgreSQL 17 delivers up to 2x faster COPY operations for bulk loading and exporting large rows.

Performance Comparison

Operation PostgreSQL 16 PostgreSQL 17 Improvement
COPY FROM (large rows) 100 MB/s 180 MB/s 1.8x
COPY TO (export) 120 MB/s 200 MB/s 1.7x
COPY with FREEZE 90 MB/s 170 MB/s 1.9x

Benchmarks on NVMe SSD, 64-core server, varies by hardware

What Changed

  1. Reduced memory allocation overhead: Batch buffer management
  2. Optimized tuple formation: Less copying of large values
  3. Improved I/O batching: Better write coalescing
  4. TOAST handling: More efficient for large text/bytea

Best Practices for Fast COPY

-- Bulk load settings
SET maintenance_work_mem = '2GB';          -- helps index builds that follow the load
ALTER SYSTEM SET max_wal_size = '10GB';    -- fewer checkpoints during the load
SELECT pg_reload_conf();                   -- max_wal_size cannot be changed with a plain SET

-- Use FREEZE for initial loads (rows are written already frozen, avoiding later freeze work);
-- the target table must be created or truncated in the same transaction
BEGIN;
TRUNCATE large_table;
COPY large_table FROM '/data/file.csv' WITH (FORMAT csv, FREEZE);
COMMIT;

-- Binary format for numeric-heavy data
COPY my_table TO '/backup/data.bin' WITH (FORMAT binary);
COPY my_table FROM '/backup/data.bin' WITH (FORMAT binary);

-- Parallel load via multiple connections
-- Split file and run concurrent COPY commands

Monitoring COPY Progress

-- Check progress (PostgreSQL 14+)
SELECT
    command,
    bytes_processed,
    bytes_total,
    tuples_processed,
    round(100.0 * bytes_processed / nullif(bytes_total, 0), 1) AS pct_done
FROM pg_stat_progress_copy;

COPY Options Recap

Option Purpose
FREEZE Load rows already frozen (table must be created or truncated in the same transaction)
FORMAT binary Faster for numeric data
PARALLEL n Multiple workers (planned)
ON_ERROR ignore Skip bad rows
HEADER Skip/include CSV header
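
ON_ERROR (new in PostgreSQL 17) deserves a quick sketch, since it relaxes COPY's long-standing all-or-nothing behavior:

-- Skip rows that fail data-type conversion instead of aborting the whole load;
-- a NOTICE reports how many rows were skipped
COPY events FROM '/data/events.csv'
WITH (FORMAT csv, HEADER, ON_ERROR ignore);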

Example: Large Data Load

# Fast CSV import
psql -c "COPY events FROM PROGRAM 'zcat events.csv.gz' WITH (FORMAT csv, HEADER);"

# With session tuning and FREEZE (the TRUNCATE satisfies FREEZE's requirement;
# statements passed in a single -c run as one implicit transaction)
psql -c "
  SET maintenance_work_mem = '1GB';
  TRUNCATE events;
  COPY events FROM '/data/events.csv' WITH (FORMAT csv, HEADER, FREEZE);
"

Source: PostgreSQL 17 Release Notes
https://www.postgresql.org/docs/17/release-17.html

99% confidence

wal_configuration

1 question
A

PostgreSQL 17 WAL Processing Improvements

PostgreSQL 17 roughly doubles write throughput for highly concurrent workloads through improved Write-Ahead Log (WAL) processing.

Key Improvements

Area Improvement Impact
WAL insertion locks Reduced contention 2x concurrent writes
WAL buffer management Batch flushing Less I/O wait
Recovery performance Parallel apply Faster crash recovery
Replication throughput Optimized streaming Higher replica lag tolerance

Benchmark Results

Workload: pgbench, 128 clients, read-write
Hardware: 64-core, NVMe storage

PostgreSQL 16: 45,000 TPS
PostgreSQL 17: 78,000 TPS (+73%)

Configuration Tuning

-- WAL settings for high concurrency (postgresql.conf)

-- Larger WAL buffers (default -1 = 1/32 of shared_buffers, capped at one 16MB WAL segment)
wal_buffers = '64MB'

-- Increase max WAL size before checkpoint
max_wal_size = '4GB'
min_wal_size = '1GB'

-- Commit delay for batching (microseconds)
commit_delay = 10        -- 0 = disabled
commit_siblings = 5      -- Min concurrent txns to trigger delay

-- Compression for network replication
wal_compression = lz4    -- New in PG15+, helps PG17

Monitoring WAL Performance

-- WAL statistics
SELECT * FROM pg_stat_wal;

-- Key metrics to watch:
-- wal_records: Total WAL records written
-- wal_bytes: Total WAL bytes
-- wal_buffers_full: Indicates if wal_buffers too small
-- wal_sync_time: Time spent syncing WAL

-- Check for WAL contention
SELECT wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event LIKE 'WAL%'
GROUP BY wait_event;

Impact on Replication

-- Check replication lag
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- PG17: Lower lag under high write load

What This Means

  • OLTP workloads: More transactions per second
  • Replication: Better kept in sync under load
  • Recovery: Faster restart after crash
  • Cloud: Better performance on network storage (EBS, etc.)

Source: PostgreSQL 17 Release Notes
https://www.postgresql.org/docs/17/release-17.html

99% confidence

roles_permissions

1 question
A

PostgreSQL 17 pg_maintain Role

pg_maintain is a new predefined role that grants maintenance operation privileges without requiring explicit MAINTAIN privilege on every table.

Purpose

Before PG17, running VACUUM or ANALYZE on tables required:

  • Superuser access, OR
  • Table ownership, OR
  • Explicit MAINTAIN privilege per table

With pg_maintain, a non-superuser can maintain all tables in a database.

Granting Access

-- Grant pg_maintain to a user
GRANT pg_maintain TO maintenance_user;

-- User can now run VACUUM/ANALYZE on any table
\c mydb maintenance_user
VACUUM my_schema.large_table;    -- Works!
ANALYZE another_schema.stats;    -- Works!
REINDEX INDEX public.my_index;   -- Works!

Operations Allowed

Operation pg_maintain Allows
VACUUM Yes
ANALYZE Yes
REINDEX Yes
CLUSTER Yes
REFRESH MATERIALIZED VIEW Yes
LOCK TABLE Yes
TRUNCATE No
DROP No

Comparison with Other Approaches

-- Method 1: Superuser (dangerous)
ALTER ROLE maintenance_user SUPERUSER;  -- Too much access!

-- Method 2: Per-table MAINTAIN (tedious)
GRANT MAINTAIN ON table1 TO maintenance_user;
GRANT MAINTAIN ON table2 TO maintenance_user;
-- ... repeat for every table

-- Method 3: pg_maintain (PostgreSQL 17+, recommended)
GRANT pg_maintain TO maintenance_user;  -- All tables, safely

Security Notes

-- pg_maintain does NOT allow:
-- - Reading table data
-- - Modifying table data
-- - Changing table structure
-- - Dropping objects

-- It ONLY allows maintenance operations that:
-- - Reclaim space (VACUUM)
-- - Update statistics (ANALYZE)
-- - Rebuild indexes (REINDEX)

Use Cases

  1. DBA assistants: Run maintenance without full superuser
  2. Monitoring tools: Auto-vacuum analysis without elevated access
  3. CI/CD pipelines: Post-migration ANALYZE
  4. Managed services: Limit blast radius of maintenance accounts

Source: PostgreSQL 17 Documentation - Predefined Roles
https://www.postgresql.org/docs/17/predefined-roles.html

99% confidence

monitoring

1 question
A

PostgreSQL 18 pg_stat_all_tables New Columns

PostgreSQL 18 adds vacuum and analyze timing columns to pg_stat_all_tables for better maintenance monitoring.

New Columns

Column Type Description
total_vacuum_time double precision Total time spent vacuuming (ms)
total_analyze_time double precision Total time spent analyzing (ms)

Usage

-- Find tables with longest vacuum times
SELECT
    schemaname,
    relname,
    vacuum_count,
    round(total_vacuum_time::numeric / 1000, 2) AS vacuum_time_sec,
    round(total_vacuum_time::numeric / nullif(vacuum_count, 0) / 1000, 2) AS avg_vacuum_sec
FROM pg_stat_all_tables
WHERE total_vacuum_time > 0
ORDER BY total_vacuum_time DESC
LIMIT 10;

-- Tables needing vacuum optimization
SELECT
    relname,
    n_dead_tup,
    last_vacuum,
    vacuum_count,
    round((total_vacuum_time / 1000)::numeric, 1) AS total_vacuum_secs,
    round((total_vacuum_time / nullif(vacuum_count, 0) / 1000)::numeric, 1) AS avg_vacuum_secs
FROM pg_stat_user_tables
WHERE vacuum_count > 0
  AND (total_vacuum_time / nullif(vacuum_count, 0)) > 10000  -- avg > 10 sec
ORDER BY total_vacuum_time DESC;

Monitoring Query

-- Comprehensive maintenance dashboard
SELECT
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS size,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows,
    vacuum_count,
    autovacuum_count,
    analyze_count,
    autoanalyze_count,
    round((total_vacuum_time / 1000)::numeric, 1) AS vacuum_time_sec,
    round((total_analyze_time / 1000)::numeric, 1) AS analyze_time_sec,
    last_vacuum,
    last_autovacuum,
    last_analyze
FROM pg_stat_user_tables
ORDER BY total_vacuum_time DESC
LIMIT 20;

Alerting Example

-- Alert on tables with long average vacuum time
SELECT relname, avg_vacuum_ms
FROM (
    SELECT
        relname,
        total_vacuum_time / nullif(vacuum_count, 0) AS avg_vacuum_ms
    FROM pg_stat_user_tables
    WHERE vacuum_count > 0
) sub
WHERE avg_vacuum_ms > 60000  -- > 1 minute average
ORDER BY avg_vacuum_ms DESC;

Related Enhancements (PG18)

EXPLAIN ANALYZE now also shows:

  • Buffer usage automatically
  • WAL writes (verbose mode)
  • CPU time
  • Average read times

EXPLAIN (ANALYZE, BUFFERS, WAL)
SELECT * FROM large_table WHERE id < 1000;

Source: PostgreSQL 18 Release Notes
https://www.postgresql.org/docs/18/release-18.html

99% confidence

foreign_data

1 question
A

PostgreSQL 17 postgres_fdw Subquery Push Down

PostgreSQL 17 enables EXISTS and IN subqueries to be pushed to remote PostgreSQL servers via postgres_fdw, reducing data transfer and improving performance.

The Improvement

Previously, EXISTS/IN subqueries against foreign tables were executed locally, requiring all foreign data to be fetched first.

Example

-- Setup: foreign table pointing to remote server
CREATE SERVER remote_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'salesdb');

CREATE FOREIGN TABLE remote_orders (
    id INT,
    customer_id INT,
    total NUMERIC
) SERVER remote_server;

-- This query now pushes the subquery to remote
SELECT * FROM local_customers c
WHERE EXISTS (
    SELECT 1 FROM remote_orders o
    WHERE o.customer_id = c.id
    AND o.total > 1000
);

EXPLAIN Comparison

-- PostgreSQL 16: Subquery executed locally
EXPLAIN VERBOSE SELECT * FROM local_customers c
WHERE EXISTS (SELECT 1 FROM remote_orders o WHERE o.customer_id = c.id);

-- Shows:
--   Filter: EXISTS (SubPlan)
--   ->  Foreign Scan on remote_orders  -- Fetches ALL rows
--         Remote SQL: SELECT id, customer_id, total FROM orders

-- PostgreSQL 17: Subquery pushed to remote
EXPLAIN VERBOSE SELECT * FROM local_customers c
WHERE EXISTS (SELECT 1 FROM remote_orders o WHERE o.customer_id = c.id);

-- Shows:
--   Foreign Scan
--   Remote SQL: SELECT ... WHERE EXISTS (SELECT 1 FROM orders WHERE ...)

Supported Patterns

Pattern Pushed Down (PG17)?
WHERE EXISTS (SELECT ... FROM foreign_table) Yes
WHERE id IN (SELECT id FROM foreign_table) Yes
WHERE NOT EXISTS (...) Yes
WHERE id NOT IN (...) Yes
Correlated subqueries Yes

Performance Impact

Scenario PG16 PG17
1M remote rows, 100 matches Fetch 1M rows Fetch 100 rows
Network transfer High Minimal
Query time Minutes Seconds

Configuration

-- Push down itself is automatic; tune how many rows each remote fetch returns
ALTER SERVER remote_server OPTIONS (ADD fetch_size '1000');

-- Optionally let the remote planner cost the pushed-down queries
ALTER SERVER remote_server OPTIONS (ADD use_remote_estimate 'true');

-- To check what gets pushed, read the Remote SQL line of EXPLAIN VERBOSE
EXPLAIN (VERBOSE) SELECT * FROM remote_orders WHERE total > 1000;

Source: PostgreSQL 17 Release Notes
https://www.postgresql.org/docs/17/release-17.html

99% confidence

psql_tools

1 question
A

PostgreSQL 17 psql \watch min_rows

PostgreSQL 17 adds a min_rows parameter to the psql \watch meta-command: the query is re-run on the given interval, and \watch stops as soon as an execution returns fewer than the specified number of rows.

Syntax

-- In psql (meta-command, not SQL):
\watch [interval] [min_rows=N]

Examples

-- Watch the pending queue; stop once the backlog drops below 5 rows
SELECT * FROM queue WHERE status = 'pending';
-- Then run: \watch 1 min_rows=5

-- Re-runs every 1 second while 5 or more pending items exist

-- Wait for replication to catch up (the row disappears once caught up)
SELECT 1 WHERE pg_last_wal_replay_lsn() < '0/1234567'::pg_lsn;
-- Then run: \watch 0.5 min_rows=1

-- Poll until a job completes (the row disappears once it finishes)
SELECT * FROM jobs WHERE id = 123 AND status <> 'completed';
-- Then run: \watch 2 min_rows=1

Use Cases

Scenario Command
Watch a queue until it drains below N rows \watch 1 min_rows=N
Poll while a condition still holds (stop when it clears) \watch 0.5 min_rows=1
Monitor a progress view until the operation finishes \watch 2 min_rows=1
Track a backlog until it falls below 100 rows \watch 5 min_rows=100

Comparison with Basic \watch

-- Basic \watch (runs forever until Ctrl-C)
SELECT count(*) FROM events WHERE processed = false;
-- Then run: \watch 5

-- With min_rows (auto-stops)
SELECT count(*) FROM events WHERE processed = false HAVING count(*) > 0;
-- Then run: \watch 5 min_rows=1
-- The row disappears once everything is processed, so \watch stops

Practical Examples

-- Wait for locks on a table to clear (rows disappear when the locks are gone)
SELECT 1 FROM pg_locks WHERE relation = 'my_table'::regclass;
-- Then run: \watch 1 min_rows=1

-- Wait for active connections to drop below a threshold
SELECT 1 WHERE (
    SELECT count(*) FROM pg_stat_activity WHERE state = 'active'
) >= 10;
-- Then run: \watch 2 min_rows=1

-- Monitor batch progress, stop once 1000 items are processed
SELECT count(*) AS processed FROM items WHERE status = 'done'
HAVING count(*) < 1000;
-- Then run: \watch 5 min_rows=1

Also New in PG17 psql

  • Ctrl-C cancels connection attempts (previously had to wait for timeout)
  • Better tab completion for SQL keywords
  • Improved \d command output formatting

Source: PostgreSQL 17 Documentation - psql
https://www.postgresql.org/docs/17/app-psql.html

99% confidence

partitioning

1 question
A

ON DELETE CASCADE vs ON DELETE SET NULL

ON DELETE CASCADE: When a referenced row in the parent table is deleted, all rows in the child table that reference it are automatically deleted.

ON DELETE SET NULL: When a referenced row in the parent table is deleted, the foreign key column(s) in the child table are set to NULL (rows remain, only the reference is nullified).

Code Examples

-- ON DELETE CASCADE example
CREATE TABLE departments (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    dept_id INTEGER REFERENCES departments(id) ON DELETE CASCADE
);

-- Deleting department ID 1 will DELETE all employees with dept_id = 1
DELETE FROM departments WHERE id = 1;

-- ON DELETE SET NULL example (re-creating employees with a different referential action)
CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    dept_id INTEGER REFERENCES departments(id) ON DELETE SET NULL
);

-- Deleting department ID 1 will SET dept_id = NULL for all employees with dept_id = 1
-- The employee rows remain in the table
DELETE FROM departments WHERE id = 1;

Critical Constraint

ON DELETE SET NULL requires the foreign key column to be nullable. This will fail:

-- ERROR: column "dept_id" cannot be null
CREATE TABLE employees (
    dept_id INTEGER NOT NULL REFERENCES departments(id) ON DELETE SET NULL
);

Source

PostgreSQL 17 Documentation: Foreign Keys - Referential Actions

Available since: PostgreSQL 6.3 (1998)

99% confidence

sql_query_language

1 question
A

ALTER TABLE ... SET NOT NULL Syntax (PostgreSQL)

The syntax to add a NOT NULL constraint to an existing column is:

ALTER TABLE table_name ALTER COLUMN column_name SET NOT NULL;

Example:

ALTER TABLE users ALTER COLUMN email SET NOT NULL;

Critical Requirements:

  1. The column must NOT contain any NULL values before executing this command, or the operation will fail with error 23502
  2. This acquires an ACCESS EXCLUSIVE lock on the table, blocking all other operations
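
Before attempting the command, it is worth checking for (and backfilling) offending rows; a small sketch on the example table, with an illustrative placeholder value:

-- Rows that would make SET NOT NULL fail
SELECT count(*) FROM users WHERE email IS NULL;

-- Optional backfill before adding the constraint
UPDATE users SET email = '[email protected]' WHERE email IS NULL;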

PostgreSQL 12+ Optimization:
If a validated CHECK constraint already proves the column contains no NULLs, PostgreSQL 12+ skips the full table scan when SET NOT NULL is executed; the now-redundant CHECK constraint can be dropped afterwards:

-- Step 1: Add CHECK constraint as NOT VALID (brief ACCESS EXCLUSIVE lock, no table scan)
ALTER TABLE users ADD CONSTRAINT users_email_not_null CHECK (email IS NOT NULL) NOT VALID;

-- Step 2: Validate the constraint (uses SHARE UPDATE EXCLUSIVE lock)
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

-- Step 3: Set NOT NULL (skips table scan since constraint proves no NULLs exist)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- Step 4: Drop the redundant CHECK constraint
ALTER TABLE users DROP CONSTRAINT users_email_not_null;

Removing NOT NULL:

ALTER TABLE table_name ALTER COLUMN column_name DROP NOT NULL;

Source: PostgreSQL 17 Official Documentation - ALTER TABLE command reference

99% confidence