pgvector FAQ & Answers
50 expert pgvector answers researched from official documentation. Every answer cites authoritative sources you can verify.
CREATE INDEX CONCURRENTLY my_hnsw_idx ON my_table USING hnsw(embedding vector_cosine_ops) WITH (m = 32, ef_construction = 128); Use vector_cosine_ops for cosine distance (<=>), vector_l2_ops for L2 distance (<->), or vector_ip_ops for inner product (<#>). CONCURRENTLY builds the index without blocking writes. Parameters: m=32 creates 32 connections per node (2x the default of 16); ef_construction=128 improves build quality (2x the default of 64). For 1536 dimensions, large tables need roughly 8GB+ of maintenance_work_mem. Increase it with SET maintenance_work_mem = '8GB'; before index creation. Build time for 1M+ rows: 1-2 hours on modern hardware with default parameters. Monitor with pg_stat_progress_create_index. HNSW has been available since pgvector 0.5.0; the latest release is 0.8.0.
For an HNSW index with L2 distance: CREATE INDEX ON items USING hnsw(embedding vector_l2_ops); For IVFFlat: CREATE INDEX ON items USING ivfflat(embedding vector_l2_ops) WITH (lists = 1000); Use the vector_l2_ops operator class for L2/Euclidean distance, measured with the <-> operator. Query example: SELECT * FROM items ORDER BY embedding <-> '[0.1, 0.2, ...]' LIMIT 5; Lists parameter for IVFFlat: rows/1000 for up to 1M rows, sqrt(rows) beyond 1M. HNSW generally outperforms IVFFlat for L2 distance with default parameters. Ensure the vector column is defined as vector(1536) to match the embedding dimension exactly, or insertion fails. Set maintenance_work_mem >= 2GB before creating the index on large tables.
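A minimal end-to-end sketch of the L2 setup, using a hypothetical items table with 3-dimensional vectors so the literals stay readable (swap in your real dimension, e.g. 1536):

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,1,1]'), ('[2,2,2]'), ('[1,1,2]');
-- HNSW index for L2 distance; the defaults (m=16, ef_construction=64) written out explicitly.
CREATE INDEX items_embedding_l2_idx ON items USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);
-- Nearest neighbors by L2 distance; ORDER BY ... LIMIT is what lets the index kick in.
SELECT id, embedding <-> '[1,1,1.5]' AS l2_distance
FROM items
ORDER BY embedding <-> '[1,1,1.5]'
LIMIT 2;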
Sequential scans occur when: (1) Missing ORDER BY ... LIMIT clause - HNSW is only used for queries of the form ORDER BY embedding <=> $1 LIMIT k (ascending, the default sort order). (2) Wrong distance operator - index created with vector_cosine_ops but the query uses <-> (L2) instead of <=> (cosine). (3) Index not cached in shared_buffers - the index still works, but every probe hits disk, and the planner may judge a sequential scan cheaper. (4) LIMIT too large - the planner may choose a sequential scan when LIMIT covers a large fraction of the table. (5) Filtered queries with low selectivity. Check with EXPLAIN ANALYZE: look for 'Seq Scan' vs 'Index Scan using hnsw_idx'. Fix: ensure ORDER BY embedding <=> $1 LIMIT k with a matching operator, and size shared_buffers to fit the index (estimate: rows * dimensions * 4 bytes * 1.5 overhead).
Use ORDER BY with LIMIT in ascending order: SELECT * FROM items ORDER BY embedding <=> '[...]' LIMIT 10; Ensure the operator matches the index (vector_cosine_ops requires <=>, vector_l2_ops requires <->, vector_ip_ops requires <#>). Disable sequential scans temporarily: BEGIN; SET LOCAL enable_seqscan = off; SELECT ...; COMMIT; For production, fix the root cause: (1) Increase shared_buffers so the index fits in memory, (2) Set ivfflat.probes appropriately (default 1; try 10-20), (3) Verify ANALYZE has been run on the table, (4) Check LIMIT isn't too large (the planner may choose a sequential scan for very large LIMITs). Use EXPLAIN ANALYZE to verify 'Index Scan using ivfflat_idx'. IVFFlat requires the lists parameter at creation: rows/1000 for up to 1M rows, sqrt(rows) beyond 1M. See the diagnostic sketch below.
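A diagnostic sketch, reusing the hypothetical items table and index from above:

-- Confirm the planner picked the vector index rather than a sequential scan.
EXPLAIN ANALYZE SELECT id FROM items ORDER BY embedding <-> '[1,1,1.5]' LIMIT 2;
-- Expected plan node: "Index Scan using items_embedding_l2_idx"; a "Seq Scan" means the index was skipped.
-- To test whether the index *can* be used at all, disable seq scans for one transaction:
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE SELECT id FROM items ORDER BY embedding <-> '[1,1,1.5]' LIMIT 2;
COMMIT;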
Set at the session level before the query: SET ivfflat.probes = 10; Then run the query: SELECT * FROM items ORDER BY embedding <=> '[...]' LIMIT 10; The probes parameter controls how many IVFFlat lists are searched - higher values mean better recall but slower queries. Default: 1 (fast, lower recall). Recommended starting point: sqrt(lists). Example: an index created with lists=1000 should start around probes=32 for balanced recall/speed. Range: 1 to lists (probes=lists performs an exact search). Typical production values: 10-20 for most workloads. Illustrative trade-off (numbers vary by dataset): probes=1 (~85% recall, 20ms), probes=10 (~95% recall, 50ms), probes=50 (~99% recall, 200ms). Measure recall vs speed on your own data and adjust. Set globally with ALTER DATABASE or per connection.
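A short tuning sketch, assuming a hypothetical IVFFlat index built WITH (lists = 100) on the items table:

-- sqrt(100) = 10 probes is a reasonable starting point.
SET ivfflat.probes = 10;
SELECT id FROM items ORDER BY embedding <-> '[1,1,1.5]' LIMIT 2;
-- Ground truth for recall measurement: probing every list is an exact search.
SET ivfflat.probes = 100;
SELECT id FROM items ORDER BY embedding <-> '[1,1,1.5]' LIMIT 2;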
Combine vector similarity with a JSONB filter in the WHERE clause: SELECT id, content, embedding <#> '[0.1, 0.2, ...]' AS distance FROM documents WHERE metadata @> '{"category": "tech"}' ORDER BY distance LIMIT 10; The <#> operator computes negative inner product (best for normalized embeddings like OpenAI's). For JSONB performance, create a GIN index: CREATE INDEX ON documents USING gin(metadata jsonb_path_ops); The @> operator checks containment. Alternative operators: <=> for cosine distance, <-> for L2 distance. The planner applies the metadata filter before the vector search when the filter is selective. For complex filters: metadata->>'status' = 'active' AND metadata->>'priority' = 'high'. The pgvectorscale extension (2024+) adds optimized label-based filtering. Ensure both the vector and metadata indexes exist for best performance, as in the sketch below.
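A sketch of the combined setup, assuming a hypothetical documents table with a jsonb metadata column and a 3-dimensional embedding for readability:

CREATE TABLE documents (id bigserial PRIMARY KEY, content text, metadata jsonb, embedding vector(3));
-- The GIN index serves the @> containment filter; the HNSW index serves the vector ordering.
CREATE INDEX documents_metadata_idx ON documents USING gin (metadata jsonb_path_ops);
CREATE INDEX documents_embedding_idx ON documents USING hnsw (embedding vector_ip_ops);
SELECT id, content, embedding <#> '[0.1, 0.2, 0.3]' AS distance
FROM documents
WHERE metadata @> '{"category": "tech"}'
ORDER BY embedding <#> '[0.1, 0.2, 0.3]'
LIMIT 10;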
Use a connection pooler like PgBouncer with pool size = (CPU cores * 2) + storage spindles. For 100 concurrent clients: configure PgBouncer with max_client_conn=100 and default_pool_size=20-40 (based on CPU cores). PostgreSQL's one-process-per-connection model doesn't scale to 100+ direct connections - use pooling. pgvector-specific: the single biggest factor is keeping the HNSW index in shared_buffers (memory). Configure shared_buffers >= index size (estimate: rows * dimensions * 4 bytes * 2 overhead). Example 8-core server: set max_connections=50 and use PgBouncer with pool_mode=transaction and default_pool_size=20. Monitor pg_stat_activity for connection saturation. For 200-connection pools: allocate ~2GB of memory overhead. Sizing rule of thumb (Little's law): pool size ≈ queries per second * average query time in seconds, plus ~20% headroom. Critical: HNSW performance degrades if the index is evicted from memory due to per-connection memory overhead. PgBouncer 1.24.1+ recommended (fixes CVE-2025-2291).
On pgvector 0.7.0+, use the built-in l2_normalize function: SELECT id, l2_normalize(embedding) AS normalized_embedding FROM documents; or persist it: UPDATE documents SET embedding = l2_normalize(embedding); Background: embedding <#> embedding returns the negative inner product of a vector with itself; multiplying by -1 gives the squared magnitude, sqrt gives the magnitude, and dividing every component by it normalizes the vector to length 1 - which is exactly what l2_normalize does (the vector type has no built-in vector-by-scalar division operator, so normalize with l2_normalize or in application code before insert). Note: OpenAI embeddings are pre-normalized - use <#> (inner product) directly for best performance instead of <=> (cosine). Verify normalization: SELECT vector_norm(embedding) FROM documents; should return ~1.0. For already-normalized vectors, inner product <#> is faster than cosine <=> and produces identical rankings.
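A quick sketch against the hypothetical items table (requires pgvector 0.7.0+ for l2_normalize):

-- Normalize on the fly and verify the result has unit length.
SELECT id,
       l2_normalize(embedding) AS unit_embedding,
       vector_norm(l2_normalize(embedding)) AS should_be_one
FROM items;
-- Persist normalized vectors so <#> can be used directly at query time.
UPDATE items SET embedding = l2_normalize(embedding);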
pgvector returns cosine distance (lower = more similar). Convert to similarity: SELECT id, content, 1 - (embedding <=> '[...]') AS cosine_similarity FROM documents ORDER BY embedding <=> '[...]' LIMIT 10; Cosine distance range: 0 (identical) to 2 (opposite). Cosine similarity range: -1 (opposite) to 1 (identical). Recommended relevance thresholds: 0.5 (broad, loosely related), 0.7-0.8 (highly relevant), 0.9+ (near-exact match). Production example: WHERE 1 - (embedding <=> query_vector) >= 0.7 filters for similarity >= 0.7. For inner product <#> on normalized vectors: returned values are negated inner products, so multiply by -1 for similarity. Tune the threshold on a validation set: it is a precision/recall trade-off. Typical production values: 0.75 for RAG retrieval, 0.85 for deduplication, 0.95 for near-duplicates. Monitor the distribution of scores to adjust the threshold per use case.
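A threshold sketch on the hypothetical items table (distance < 0.3 is the same condition as similarity > 0.7):

SELECT id, 1 - (embedding <=> '[1,1,1]') AS cosine_similarity
FROM items
WHERE (embedding <=> '[1,1,1]') < 0.3   -- keep only rows with similarity > 0.7
ORDER BY embedding <=> '[1,1,1]'
LIMIT 10;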
Install the extension: CREATE EXTENSION IF NOT EXISTS vector; Verify installation: SELECT * FROM pg_extension WHERE extname = 'vector'; should return one row. Test functionality: CREATE TABLE test_vectors (id serial PRIMARY KEY, embedding vector(3)); INSERT INTO test_vectors (embedding) VALUES ('[1,2,3]'), ('[4,5,6]'); SELECT embedding FROM test_vectors; If you get 'type vector does not exist', the extension is not installed correctly. Installation options: package manager (apt install postgresql-15-pgvector on Ubuntu), compile from source (git clone https://github.com/pgvector/pgvector.git && make && make install), or a Docker image with pgvector pre-installed. Requires PostgreSQL 11+ (newer pgvector releases require newer PostgreSQL majors). Check the version: SELECT extversion FROM pg_extension WHERE extname = 'vector'; Latest stable: 0.8.0. The extension must be created per database. For RDS/managed PostgreSQL: use CREATE EXTENSION (no manual installation needed).
Use HNSW when: (1) Query speed is critical - HNSW provides 2-10x faster queries than IVFFlat. (2) Dataset >100K vectors - HNSW scales better. (3) High recall required (>95%) - HNSW maintains recall without tuning probes. (4) Willing to use more memory - HNSW uses ~1.5-2x memory of IVFFlat. Use IVFFlat when: (1) Fast index builds needed - IVFFlat builds 5-10x faster. (2) Memory constrained - IVFFlat more compact. (3) Dataset <100K vectors - performance difference minimal. (4) Acceptable to tune probes parameter for recall. Production recommendation: HNSW for most use cases (v0.5.0+). Benchmark: HNSW recall@10=0.98 at 5ms, IVFFlat recall@10=0.95 at 15ms (1M vectors, probes=10). Both support all distance operators (<=>, <->, <#>). HNSW available since pgvector 0.5.0 (2023).
Set at the session level: SET hnsw.ef_search = 100; before querying. Default: 40 (balances speed and recall). Higher values mean better recall but slower queries. Range: 1-1000; ef_search also caps the number of rows an index scan can return, so keep it >= your LIMIT. Recommended values: 40 (default, ~95% recall), 100 (high recall ~98%, 2x slower), 200 (very high recall ~99%, 4x slower), 400 (near-exact, 8x slower) - illustrative figures that vary by dataset. Example workflow: SET hnsw.ef_search = 100; SELECT * FROM items ORDER BY embedding <=> '[...]' LIMIT 10; Trade-off: ef_search=40 (5ms, recall 0.95), ef_search=100 (10ms, recall 0.98), ef_search=200 (20ms, recall 0.99). Set globally: ALTER DATABASE mydb SET hnsw.ef_search = 100; or per role: ALTER ROLE myuser SET hnsw.ef_search = 100; For production: start with 40 and increase if recall is insufficient. Monitor with EXPLAIN ANALYZE and measure recall on a validation set.
Fastest approach: (1) Insert all vectors first WITHOUT an index. (2) Create the index after all data is inserted. Code: COPY my_table(id, embedding) FROM '/path/to/vectors.csv' WITH (FORMAT csv); SET maintenance_work_mem = '8GB'; CREATE INDEX my_hnsw_idx ON my_table USING hnsw(embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64); Use COPY for bulk loading (100x faster than row-by-row INSERT). During an initial load a plain CREATE INDEX is fine; note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so don't wrap it in BEGIN/COMMIT. For HNSW: build time ~1-2 hours per 1M vectors (m=16, ef_construction=64). For IVFFlat: ~10-20 minutes per 1M vectors. Avoid inserting with an existing index (roughly 10x slower). Alternative: use an UNLOGGED table during import and switch to LOGGED after: ALTER TABLE my_table SET LOGGED; (see the sketch below). Monitor progress: SELECT * FROM pg_stat_progress_create_index; Expected throughput: roughly 50,000-100,000 vectors/sec with COPY (no index) vs 500-2,000 vectors/sec for indexed inserts.
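A sketch of the UNLOGGED variant (the CSV path is the placeholder from the example above):

-- Stage data without WAL overhead, then make the table durable before indexing.
CREATE UNLOGGED TABLE my_table (id bigint, embedding vector(1536));
COPY my_table (id, embedding) FROM '/path/to/vectors.csv' WITH (FORMAT csv);
ALTER TABLE my_table SET LOGGED;        -- WAL-logs the table once, restoring crash safety
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 7;
CREATE INDEX my_hnsw_idx ON my_table USING hnsw (embedding vector_cosine_ops);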
Use cosine distance (<=>): for text embeddings (OpenAI, Sentence Transformers, Cohere), document similarity, semantic search. Cosine measures the angle and ignores magnitude - best for normalized or variable-length embeddings. Use L2 distance (<->): for spatial data, computer vision, unnormalized vectors where magnitude matters. L2 measures Euclidean distance - best for coordinate-based data. Use inner product (<#>): for pre-normalized embeddings (OpenAI embeddings are normalized). Inner product is the cheapest operator on normalized vectors since it skips the norm computations, and it produces identical rankings to cosine there. Production best practice: if embeddings are normalized (OpenAI, most modern models), use <#> for maximum performance. If unsure or normalization is mixed, use <=> (cosine). The operator must match the index: vector_cosine_ops with <=>, vector_l2_ops with <->, vector_ip_ops with <#>. Verify with EXPLAIN ANALYZE, as in the sketch below.
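A small demonstration of the ranking equivalence, assuming the hypothetical items table holds normalized 3-dimensional embeddings ('[0.6,0.8,0]' is unit length):

-- For unit vectors, cosine distance = 1 + (negative inner product), so both orderings agree.
SELECT id,
       embedding <=> '[0.6,0.8,0]' AS cosine_distance,
       embedding <#> '[0.6,0.8,0]' AS neg_inner_product
FROM items
ORDER BY embedding <#> '[0.6,0.8,0]'
LIMIT 5;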
No simple formula exists - use benchmarks as reference. Production benchmarks (pgvector 0.5.1, 64 vCPU, 512GB RAM): 1M vectors at 1536-dim = 49-82 minutes, 1M at 128-dim = 12-25 minutes, 60K at 784-dim = 0.87-1.45 minutes. pgvector 0.6.0+ (2024) has parallel builds providing up to 30x speedup. Factors affecting build time: (1) Dataset size - scales linearly with rows. (2) Dimensions - higher dimensions = longer builds. (3) m parameter - higher values increase build time. (4) ef_construction - higher values significantly increase build time. (5) maintenance_work_mem - graph must fit in memory or builds take dramatically longer. (6) max_parallel_maintenance_workers - more workers = faster (set to 7+ for best results). Monitor progress: SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS percent FROM pg_stat_progress_create_index; Before building: SET maintenance_work_mem = '8GB'; SET max_parallel_maintenance_workers = 7; Estimate by testing on representative sample (10% of data) and extrapolating. Use default parameters (m=16, ef_construction=64) for fastest builds, then tune if recall insufficient.
Calculate the HNSW index size: rows * dimensions * 4 bytes * 2 (overhead) = 5M * 1536 * 4 * 2 ≈ 61 GB. Configure: (1) shared_buffers = 70 GB (headroom above the 61 GB estimate so the index fits in memory). (2) maintenance_work_mem = 8 GB (for index builds; higher = faster). (3) effective_cache_size = 256 GB (if available; helps the query planner). (4) work_mem = 256 MB (per connection, for sorting). In postgresql.conf: shared_buffers = 70GB, maintenance_work_mem = 8GB, effective_cache_size = 256GB, work_mem = 256MB. Restart PostgreSQL after changing shared_buffers. If the index doesn't fit in shared_buffers (plus OS cache), query performance can degrade 10-100x because index pages must be read from disk. For AWS RDS: use r6g.8xlarge (256GB RAM) or larger. Monitor: SELECT * FROM pg_stat_database; and check blks_hit vs blks_read (>95% hit ratio indicates the index is in memory); see the sketch below. Alternative: use IVFFlat for a smaller memory footprint (~30 GB for the same dataset).
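The arithmetic as a query you can adapt (the 2x overhead multiplier is the rule of thumb above, not an exact figure):

-- Rough HNSW memory estimate: rows * dims * 4 bytes * 2 overhead.
SELECT pg_size_pretty(5000000::bigint * 1536 * 4 * 2) AS estimated_index_size;
-- Compare against the real size once the index is built:
SELECT pg_size_pretty(pg_relation_size('my_hnsw_idx'));
-- Cache hit ratio across the cluster; > 95% suggests the working set fits in memory.
SELECT round(100.0 * sum(blks_hit) / nullif(sum(blks_hit) + sum(blks_read), 0), 2) AS hit_pct
FROM pg_stat_database;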
To rebuild with the SAME parameters, use REINDEX CONCURRENTLY (PostgreSQL 12+): REINDEX INDEX CONCURRENTLY my_hnsw_idx; To CHANGE parameters (REINDEX cannot alter them), create a new index under a different name, then swap: CREATE INDEX CONCURRENTLY my_hnsw_idx_new ON my_table USING hnsw(embedding vector_cosine_ops) WITH (m = 32, ef_construction = 128); BEGIN; DROP INDEX my_hnsw_idx; ALTER INDEX my_hnsw_idx_new RENAME TO my_hnsw_idx; COMMIT; CONCURRENTLY allows reads and writes during the rebuild. Build time: same as the original index creation (~1-2 hours per 1M vectors). Monitor: SELECT * FROM pg_stat_progress_create_index; If a brief window without the index is acceptable, you can instead drop and recreate: DROP INDEX CONCURRENTLY my_hnsw_idx; CREATE INDEX CONCURRENTLY my_hnsw_idx ON my_table USING hnsw(embedding vector_cosine_ops) WITH (m = 32, ef_construction = 128); For zero downtime: keep two indexes temporarily (doubled memory usage) until the new one is ready, as in the swap above.
Create a partial index with a WHERE clause: CREATE INDEX tenant_123_hnsw_idx ON documents USING hnsw(embedding vector_cosine_ops) WHERE tenant_id = 123; The query must include the same predicate to use the index: SELECT * FROM documents WHERE tenant_id = 123 ORDER BY embedding <=> '[...]' LIMIT 10; Benefits: (1) Smaller index per tenant (faster queries). (2) Better memory utilization (only active tenants' indexes stay hot). (3) Tenant isolation. Create one partial index per active tenant; for 100 tenants, that's 100 partial indexes. Index size: tenant_rows * dimensions * 4 * 2 per tenant. Example: 50K vectors per tenant at 1536 dims ≈ 600 MB per tenant index. Use for: SaaS applications with tenant-specific vector search. Alternative: table partitioning by tenant_id with an index per partition (see the sketch below): CREATE TABLE documents_tenant_123 PARTITION OF documents FOR VALUES IN (123); CREATE INDEX ON documents_tenant_123 USING hnsw(embedding vector_cosine_ops); PostgreSQL's partition pruning routes queries to the correct partition.
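A fuller partitioning sketch (table and tenant IDs are placeholders; the parent must be declared partitioned before partitions can be attached):

CREATE TABLE documents (
    tenant_id bigint NOT NULL,
    id bigserial,
    content text,
    embedding vector(1536),
    PRIMARY KEY (tenant_id, id)
) PARTITION BY LIST (tenant_id);
CREATE TABLE documents_tenant_123 PARTITION OF documents FOR VALUES IN (123);
-- Each partition carries its own HNSW index, kept small and tenant-local.
CREATE INDEX ON documents_tenant_123 USING hnsw (embedding vector_cosine_ops);
-- Partition pruning confines this query to documents_tenant_123.
-- (replace '[...]' with a real 1536-dimensional vector literal)
SELECT id FROM documents WHERE tenant_id = 123 ORDER BY embedding <=> '[...]' LIMIT 10;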
The error occurs when inserting a vector whose dimension differs from the column definition, e.g. the column is declared embedding vector(1536) but a 768-dimensional vector is inserted. Fix: (1) Use the embedding model matching the column (text-embedding-3-small: 1536 dims, ada-002: 1536 dims, all-MiniLM-L6-v2: 384 dims). (2) Recreate the column with the correct dimensions: ALTER TABLE my_table ALTER COLUMN embedding TYPE vector(768); WARNING: existing data must be convertible and dependent indexes must be recreated. (3) If the embedding model changed: re-generate all embeddings with the new model. Verify the model's output dimension before table creation. Common dimensions: OpenAI ada-002: 1536, OpenAI text-embedding-3-small: 1536 (configurable 512-1536), text-embedding-3-large: 3072 (configurable 256-3072), Sentence Transformers all-MiniLM-L6-v2: 384, Cohere embed-english-v3.0: 1024. Check a stored vector: SELECT vector_dims(embedding) FROM my_table LIMIT 1; A dimension mismatch prevents insertion - there is no automatic padding or truncation.
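A quick sketch for checking declared vs actual dimensions (my_table and embedding are the placeholder names from above):

-- Declared column type, e.g. "vector(1536)".
SELECT format_type(atttypid, atttypmod) AS declared_type
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass AND attname = 'embedding';
-- Actual dimension of a stored vector.
SELECT vector_dims(embedding) FROM my_table LIMIT 1;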
Use standard UPDATE: UPDATE my_table SET embedding = '[0.1, 0.2, ...]'::vector WHERE id = 123; Index automatically updates incrementally (no full rebuild needed). For HNSW: update inserts new vector in index and marks old deleted (lazy cleanup). HNSW has no training step - incremental insertion works by traversing graph to find right place and updating connections. Update performance: ~2-10ms per vector (similar to insert). For bulk updates (>10% of table): consider REINDEX after to clean up deleted entries and improve query performance. Monitor index bloat: SELECT pg_size_pretty(pg_relation_size('my_hnsw_idx')); VACUUM table to reclaim space from deleted vectors. For frequently updated vectors: consider separate table for mutable data + join with vector table. High update rate (>1000/sec): HNSW can accumulate deleted entries - schedule periodic REINDEX CONCURRENTLY during low-traffic hours. Alternative: if >50% of vectors change, faster to DROP INDEX, UPDATE all rows, CREATE INDEX.
Enable extension: CREATE EXTENSION pg_stat_statements; Configure in postgresql.conf: shared_preload_libraries = 'pg_stat_statements', pg_stat_statements.track = all. Restart PostgreSQL. Query slow vector searches: SELECT query, calls, mean_exec_time, max_exec_time FROM pg_stat_statements WHERE query LIKE '%<=>%' OR query LIKE '%<->%' OR query LIKE '%<#>%' ORDER BY mean_exec_time DESC LIMIT 10; Key metrics: (1) mean_exec_time: average query latency. (2) calls: query frequency. (3) max_exec_time: worst-case latency. (4) shared_blks_hit vs shared_blks_read: index in memory vs disk. Target: mean_exec_time <50ms for HNSW, <100ms for IVFFlat. If mean_exec_time >100ms: check shared_buffers fits index, verify index used (EXPLAIN ANALYZE), tune ef_search/probes. Reset stats: SELECT pg_stat_statements_reset(); For production monitoring: export to Prometheus/Grafana or use pganalyze.com.
Use Reciprocal Rank Fusion (RRF) to combine full-text results (PostgreSQL's ts_rank; not true BM25) with vector results. Two pitfalls in the naive version: each CTE needs a query-level ORDER BY (a window function's ORDER BY does not order the output rows that LIMIT sees), and the FULL OUTER JOIN produces NULL ranks that must be COALESCEd or the score becomes NULL:

WITH vector_search AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> '[...]') AS rank
    FROM documents
    ORDER BY embedding <=> '[...]'
    LIMIT 20
), fts_search AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, query) DESC) AS rank
    FROM documents, plainto_tsquery('search text') query
    WHERE tsv @@ query
    ORDER BY ts_rank(tsv, query) DESC
    LIMIT 20
)
SELECT COALESCE(v.id, f.id) AS id,
       COALESCE(1.0 / (60 + v.rank), 0) + COALESCE(1.0 / (60 + f.rank), 0) AS score
FROM vector_search v
FULL OUTER JOIN fts_search f ON v.id = f.id
ORDER BY score DESC
LIMIT 10;

RRF formula: 1/(k+rank) with k=60 (the usual smoothing constant; smaller k weights top results more). RRF is rank-based rather than score-based, which makes it more robust than adding raw scores from incomparable scales. Requires: (1) a GIN index for full-text: CREATE INDEX ON documents USING gin(tsv); (2) an HNSW index for vectors. Use cases: RAG with keyword boosting, e-commerce search (text+image), hybrid question answering. Benchmark: hybrid improves accuracy 8-15% over pure methods.
A precedence note: in PostgreSQL, user-defined operators such as <=> bind more tightly than comparison operators, so WHERE embedding <=> '[...]' < 0.5 already parses as (embedding <=> '[...]') < 0.5; the parentheses are for readability, not correctness. Recommended form: SELECT * FROM items WHERE (embedding <=> '[0.1, 0.2, ...]'::vector) < 0.5 ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector LIMIT 10; The same applies to all distance operators: <=>, <->, <#>. The actual error people hit is referencing a column alias in WHERE: SELECT *, embedding <=> '[...]' AS distance FROM items WHERE distance < 0.5 ORDER BY distance; fails because WHERE cannot reference SELECT aliases (WHERE is evaluated first). Solution: use a subquery or CTE: WITH ranked AS (SELECT *, embedding <=> '[...]' AS distance FROM items) SELECT * FROM ranked WHERE distance < 0.5 ORDER BY distance LIMIT 10; Parenthesizing distance expressions in WHERE remains good practice for clarity.
Use pg_dump with the custom format: pg_dump -Fc -f mydb.dump mydb. Restore: pg_restore -d newdb mydb.dump. IMPORTANT: the pgvector extension must be installed in the target database first: CREATE EXTENSION vector; pg_dump includes table schemas, vector data, and index definitions. Indexes are NOT stored as data - they are rebuilt during restore. For 1M+ vectors, restore can take hours due to the index rebuild. Speed up: (1) Restore schema and data first, deferring indexes: pg_restore --section=pre-data --section=data -d newdb mydb.dump, then run the post-data section (or CREATE INDEX manually) after the data is loaded. (2) Increase maintenance_work_mem: SET maintenance_work_mem = '8GB'; before index creation. For large databases: use pg_basebackup (physical backup), which includes pre-built indexes: pg_basebackup -D /backup/dir -Fp -Xs -P. Physical backups restore without any index rebuild. Alternative: dump data only (pg_dump -a), then CREATE INDEX CONCURRENTLY after loading. For cloud: use provider snapshots (RDS, Cloud SQL) for the fastest restore.
pgvectorscale is a Timescale extension (2024+) that complements pgvector with StreamingDiskANN indexes and optimized filtering. Use pgvectorscale when: (1) Dataset >5M vectors (pgvector HNSW hits memory limits). (2) You need filtered vector search with <5% selectivity (pgvectorscale's filtered-DiskANN approach is around 10x faster). (3) You require disk-friendly indexes (pgvector works best with the whole index in memory). (4) You are cost constrained (Statistical Binary Quantization compression shrinks the index, allowing cheaper instances). Installation: CREATE EXTENSION vectorscale CASCADE; (automatically loads pgvector). Create a DiskANN index: CREATE INDEX ON items USING diskann(embedding vector_cosine_ops); Features: (1) Statistical Binary Quantization compression. (2) Label-based filtering (faster than plain WHERE). (3) Automatic index maintenance. Written in Rust. Compatible with the pgvector vector data type - a drop-in replacement. Benchmark (Timescale's): 28x lower p95 latency and 16x higher throughput vs Pinecone on 50M Cohere embeddings (768 dims) at 99% recall. Use pgvector for <1M vectors and simple use cases; use pgvectorscale for large-scale production and filtered search.