Pinecone FAQ & Answers
50 expert Pinecone answers researched from official documentation. Every answer cites authoritative sources you can verify.
Use the fetch() method for direct ID-based retrieval without similarity computation: index.fetch(ids=['vec1', 'vec2', 'vec3']). This returns vectors and metadata by ID only, avoiding the computational overhead of a similarity search. To retrieve the IDs in a namespace first, list them (Python: index.list_paginated(prefix='doc1#'); Node.js: results = await index.listPaginated({ prefix: 'doc1#' })), then fetch those IDs. Fetch is ideal when you know the exact vector IDs and only need their metadata/values. Example: result = index.fetch(ids=['0', '1'], namespace='documents') returns a dictionary of vectors and metadata. Fetch supports multiple IDs in a single call (recommended batch size: 100-1000 IDs). No similarity ranking is performed - it is a pure key-value lookup. Use fetch for retrieval by known ID, use query() for semantic search.
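A minimal Python sketch of ID-based retrieval; the API key, index name, namespace, and IDs are placeholders, and the response shape follows the Python SDK's FetchResponse (adjust if your SDK version differs):

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")          # use an environment variable in practice
    index = pc.Index("documents-index")            # hypothetical index name

    # Pure key-value lookup: no similarity search is performed.
    response = index.fetch(ids=["doc1#chunk0", "doc1#chunk1"], namespace="documents")

    for vec_id, record in response.vectors.items():
        print(vec_id, record.metadata)             # record.values holds the embedding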
Use the 2025-04 API delete method: index.delete(delete_all=True, namespace='example-namespace'). This deletes the entire namespace and all of its records irreversibly. For selective deletion of many records in serverless indexes: reads and writes don't share compute resources, so large batch deletes are safe. Python: from pinecone.grpc import PineconeGRPC as Pinecone; pc = Pinecone(api_key='KEY'); index = pc.Index(host='HOST'); index.delete(delete_all=True, namespace='ns'). cURL: curl -X DELETE "https://$INDEX_HOST/namespaces/$NAMESPACE" -H "Api-Key: $KEY" -H "X-Pinecone-API-Version: 2025-04". For pod-based indexes with many vectors: batch deletes slowly to avoid affecting query latency. Serverless indexes handle large deletions better due to isolated compute. Namespace deletion is permanent - verify the namespace before executing.
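A short Python sketch of a namespace wipe with a guardrail; the key, index name, and namespace are placeholders, and the stats object shape follows the Python SDK's describe_index_stats response:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("example-index")                    # hypothetical index name

    TARGET_NAMESPACE = "example-namespace"

    # Guardrail: confirm the namespace exists and see how many records will be removed.
    stats = index.describe_index_stats()
    ns_stats = stats.namespaces.get(TARGET_NAMESPACE)
    if ns_stats is None:
        raise SystemExit(f"Namespace {TARGET_NAMESPACE!r} not found; nothing to delete.")
    print(f"Deleting {ns_stats.vector_count} records from {TARGET_NAMESPACE!r}...")

    # Irreversible: removes every record in the namespace.
    index.delete(delete_all=True, namespace=TARGET_NAMESPACE)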
Use metadata filters with $eq and $gte/$lte operators: index.query(vector=query_vector, filter={'user_id': {'$eq': 'user123'}, 'date': {'$gte': 20250101, '$lte': 20251231}}, top_k=10, include_metadata=True). Store dates as integers in YYYYMMDD format - string dates don't work with $gte/$lte (expects numbers). Supported operators: $eq (equals), $ne (not equals), $gt/$gte (greater than/equals), $lt/$lte (less than/equals), $in (in list). Combine multiple filters in same query - Pinecone applies all conditions. Full example: from pinecone import Pinecone; pc = Pinecone(api_key='KEY'); index = pc.Index('index-name'); results = index.query(...). Metadata filtering has same performance as namespace filtering. Include include_metadata=True to return metadata in results.
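A sketch of the date-range query in Python, assuming records were upserted with an integer date field in YYYYMMDD form; the index name and query vector are placeholders:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("index-name")                       # hypothetical index name

    query_vector = [0.0] * 1536                          # placeholder; use a real embedding

    results = index.query(
        vector=query_vector,
        filter={
            "user_id": {"$eq": "user123"},
            "date": {"$gte": 20250101, "$lte": 20251231},  # integers, not strings
        },
        top_k=10,
        include_metadata=True,
    )
    for match in results.matches:
        print(match.id, match.score, match.metadata.get("date"))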
Use Pinecone sparse-dense vectors with BM25Encoder: from pinecone_text.sparse import BM25Encoder; bm25 = BM25Encoder().fit(corpus); sparse_vec = bm25.encode_queries('query text'); dense_vec = model.encode('query text'); index.query(vector=dense_vec, sparse_vector={'indices': sparse_vec['indices'], 'values': sparse_vec['values']}, top_k=10, namespace='hybrid'). Index must use dotproduct metric (only metric supporting sparse vectors). Upserts require sparse_values parameter for each vector. BM25Encoder: fit tf-idf values to your corpus (default values not recommended). Use multi-qa-MiniLM-L6-cos-v1 or similar for dense vectors. Hybrid search combines keyword relevance (BM25) with semantic understanding (embeddings). LangChain PineconeHybridSearchRetriever automates this pattern. Create index with metric='dotproduct' for sparse support.
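A sketch of the hybrid pattern end to end, assuming pinecone-text and sentence-transformers are installed and the index was created with metric='dotproduct'; the corpus, index name, and namespace are illustrative:

    from pinecone import Pinecone
    from pinecone_text.sparse import BM25Encoder
    from sentence_transformers import SentenceTransformer

    corpus = ["first document text", "second document text"]   # fit BM25 on your own corpus

    bm25 = BM25Encoder().fit(corpus)
    dense_model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

    query = "query text"
    sparse_vec = bm25.encode_queries(query)           # {'indices': [...], 'values': [...]}
    dense_vec = dense_model.encode(query).tolist()

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("hybrid-index")                  # must use the dotproduct metric

    results = index.query(
        vector=dense_vec,
        sparse_vector={"indices": sparse_vec["indices"], "values": sparse_vec["values"]},
        top_k=10,
        namespace="hybrid",
        include_metadata=True,
    )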
IMPORTANT: Customers signing up for Standard/Enterprise on or after August 18, 2025 CANNOT create pod-based indexes - use serverless instead. For existing pods: p2 pods support 200 QPS per replica for vectors <128 dimensions with topK<50, returning queries in <10ms. For 100 concurrent requests: configure 2-3 replicas (200 QPS × 2 = 400 QPS capacity). Create with pod_type='p2.x1' or 'p2.x2'. A single p2.x8 pod supports >1000 QPS for 10M vectors (256-dim). Keep topK<50 for optimal performance. Increase replicas with pc.configure_index('index-name', replicas=3) (configure_index is a method on the Pinecone client, not the Index object). For 100K documents: p2.x1 with 2 replicas handles 100 concurrent queries at <10ms latency. Performance varies by: dimensionality, topK, filters, cloud provider. Scale replicas for throughput, scale pod size for larger datasets. Use the gRPC client for best performance. Migration: pods are deprecated; migrate to serverless for auto-scaling and 50x lower cost.
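For accounts that can still use pods, a sketch of creating a p2 index with extra replicas and scaling replicas later; the environment, index name, and dimension are illustrative assumptions:

    from pinecone import Pinecone, PodSpec

    pc = Pinecone(api_key="YOUR_API_KEY")

    # Two p2.x1 replicas to target roughly 400 QPS for low-dimensional vectors.
    pc.create_index(
        name="qa-index",                                   # hypothetical index name
        dimension=128,
        metric="cosine",
        spec=PodSpec(environment="us-east-1-aws", pod_type="p2.x1", replicas=2),
    )

    # Scale read throughput later by adding replicas.
    pc.configure_index("qa-index", replicas=3)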
Ensure vector dimensions match index exactly: if index created with dimension=1536, all upserted vectors must have exactly 1536 values. Error 'Vector dimension X does not match the dimension of the index Y' indicates mismatch. Correct format: index.upsert(vectors=[{'id': 'vec1', 'values': [0.1, 0.2, ...], 'metadata': {'tags': ['a', 'b'], 'score': 0.95, 'text': 'content'}}], namespace='ns'). Metadata supports arrays, primitives, nested objects - structure doesn't affect dimension error. Common causes: (1) Wrong embedding model (ada-002=1536, text-embedding-3-large=3072), (2) Truncated/padded vectors, (3) Empty vectors (dimension 0). Verify: len(vector_values) == index_dimension before upsert. For 2025-01 API with integrated embeddings: index converts text to vectors automatically. Check embedding model output matches index dimension. Use try-except to catch dimension errors early.
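A small sketch of a pre-upsert dimension check, assuming a 1536-dimension index (for example ada-002 embeddings); the index name, helper, and sample vector are illustrative:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("docs-index")                         # hypothetical index name

    index_dimension = pc.describe_index("docs-index").dimension

    def safe_upsert(vectors, namespace="ns"):
        # Fail fast on dimension mismatch instead of waiting for the API error.
        for v in vectors:
            if len(v["values"]) != index_dimension:
                raise ValueError(
                    f"Vector {v['id']} has {len(v['values'])} dims, index expects {index_dimension}"
                )
        index.upsert(vectors=vectors, namespace=namespace)

    safe_upsert([{"id": "vec1", "values": [0.1] * 1536, "metadata": {"text": "content"}}])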
Error occurs when batch exceeds 2MB limit: 'Request size 3MB exceeds the maximum supported size of 2MB'. Pinecone caps payload at 2MB per request. Each vector stores up to 40KB metadata. Solution: reduce batch size based on total bytes (vectors + metadata), not just count. Recommended: upsert batches up to 1000 records without exceeding 2MB. Calculate batch size: num_vectors * (dimensions * 4 bytes + metadata_size) < 2MB. For 1536-dim vectors: ~300 vectors with minimal metadata, fewer with large metadata. Use gRPC client for better performance: from pinecone.grpc import PineconeGRPC. Implement dynamic batching: check byte size before sending. Example: batch 100 vectors at a time for high-dimensional embeddings with metadata. Monitor total payload size, not just vector count. Serverless indexes handle batching more efficiently than pods.
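A sketch of size-aware batching: it estimates 4 bytes per float plus serialized metadata and flushes a batch before the 2MB ceiling from the answer above; the helper names and limits are illustrative:

    import json

    MAX_REQUEST_BYTES = 2 * 1024 * 1024   # Pinecone's per-request cap (~2MB)

    def estimate_size(vector):
        # Rough estimate: 4 bytes per float plus the serialized metadata and the ID.
        values_bytes = len(vector["values"]) * 4
        metadata_bytes = len(json.dumps(vector.get("metadata", {})))
        return values_bytes + metadata_bytes + len(vector["id"])

    def batch_by_size(vectors, max_bytes=MAX_REQUEST_BYTES, max_count=1000):
        batch, batch_bytes = [], 0
        for vec in vectors:
            size = estimate_size(vec)
            if batch and (batch_bytes + size > max_bytes or len(batch) >= max_count):
                yield batch
                batch, batch_bytes = [], 0
            batch.append(vec)
            batch_bytes += size
        if batch:
            yield batch

    # Usage: for chunk in batch_by_size(all_vectors): index.upsert(vectors=chunk)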
Combine namespace parameter with filter in query: index.query(vector=query_vector, namespace='documents', filter={'status': {'$eq': 'processed'}}, top_k=10, include_metadata=True). For multiple namespaces use query_namespaces() utility: from pinecone import Pinecone; pc = Pinecone(api_key='KEY'); index = pc.Index('index'); combined = index.query_namespaces(vector=query_vec, namespaces=['ns1', 'ns2', 'ns3'], top_k=10, filter={'genre': {'$eq': 'comedy'}}). This runs query in parallel across namespaces and merges results into single ranked set. Single query limited to one namespace - use query_namespaces for multiple. Filter supports operators: $eq, $ne, $gt, $gte, $lt, $lte, $in. Performance identical for namespaces vs metadata filtering. Best practice: use namespaces for user/tenant isolation, metadata for attribute filtering within namespace.
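A sketch of a multi-namespace query with the Python SDK's query_namespaces utility; note the exact signature varies by SDK version (newer releases also expect the index's metric), so treat the arguments as assumptions to check against your installed client:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("index")                            # hypothetical index name

    query_vec = [0.0] * 1536                             # placeholder embedding

    combined = index.query_namespaces(
        vector=query_vec,
        namespaces=["ns1", "ns2", "ns3"],
        metric="cosine",          # required by some SDK versions; match your index metric
        top_k=10,
        filter={"genre": {"$eq": "comedy"}},
        include_metadata=True,
    )
    for match in combined.matches:
        print(match.id, match.score)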
Common causes: (1) Missing/incorrect API key in Vercel environment variables - verify API key is correct (check x-pinecone-auth-rejected-reason header for 'Wrong API key'), (2) Outdated Pinecone SDK requiring legacy environment parameter - update to latest client, (3) API key not accessible in serverless context. Fix: Set environment variables in Vercel dashboard (Project Settings > Environment Variables): PINECONE_API_KEY, PINECONE_CLOUD, PINECONE_REGION, PINECONE_INDEX. Redeploy after adding variables. Modern initialization (no environment parameter): const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY }). For serverless indexes: must use updated Pinecone client (older clients raise connection errors). Test locally with same environment variables before deploying. Check response headers for x-pinecone-auth-rejected-reason to diagnose auth issues. Use Pinecone's official Vercel starter template as reference for correct setup.
Use serverless for 95% of use cases: auto-scaling, 50x lower cost, 47% lower latency vs pods, no resource management. Serverless recommended for: variable workloads, cost-sensitive deployments, new projects, <10M vectors. Pod-based (legacy, being phased out): customers who sign up for Standard/Enterprise after August 18, 2025 CANNOT create pod-based indexes. Use pods only for: existing deployments migrating to serverless, specialized performance requirements with dedicated read nodes. Serverless features: live updates, metadata filtering, hybrid search, namespaces (all pod features). Cost: serverless usage-based (pay for reads/writes), pods fixed (pay for peak capacity even when idle). Migration: Pinecone recommends serverless for all new workloads. Create serverless: pc.create_index(name='index', dimension=1536, metric='cosine', spec=ServerlessSpec(cloud='aws', region='us-east-1')). Production: serverless is default choice in 2025.
Initialize Pinecone client with pool_threads for parallel requests: from pinecone import Pinecone; pc = Pinecone(api_key='key', pool_threads=8); index = pc.Index('index-name'). pool_threads controls concurrent API requests - more threads = higher throughput. Recommended: pool_threads=8-16 for production, 4+ minimum. Batch size: up to 1000 vectors per batch (max 2MB payload). For LangChain + OpenAI embeddings: use pool_threads>4, embedding_chunk_size>=1000, batch_size=64 for 5x speedup. Parallel batching: send multiple batches concurrently - pool_threads enables this. Example: upsert 100K vectors in batches of 1000 with 10 parallel threads = 10x faster than sequential. Trade-off: too many threads (>20) risks rate limits. Monitor: API rate limits (varies by plan), memory usage. Production: combine with async/await for maximum throughput. Serverless indexes handle parallel writes better than pods.
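A sketch of parallel batched upserts using the pool_threads/async_req pattern, here passing pool_threads when opening the index; the batch size, thread count, data, and index name are illustrative:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")

    def chunked(seq, size=200):
        for i in range(0, len(seq), size):
            yield seq[i:i + size]

    vectors = [{"id": f"vec{i}", "values": [0.1] * 1536} for i in range(10_000)]  # sample data

    # pool_threads controls how many requests are in flight at once.
    with pc.Index("index-name", pool_threads=10) as index:
        # async_req=True returns a future per batch instead of blocking on each call.
        async_results = [
            index.upsert(vectors=chunk, namespace="ns", async_req=True)
            for chunk in chunked(vectors)
        ]
        # Wait for every batch to finish and surface any errors.
        [result.get() for result in async_results]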
Create backup for serverless index: pc.create_backup(index_name='my-index', backup_name='backup-2025-11-15'). Backups are static copies, stored in same region as source index. Limitations: only for serverless indexes (not pods), max 2,000 namespaces, 50 backups per project quota, only includes vectors inserted 15+ minutes prior (recent vectors excluded). Restore from backup: pc.create_index(name='restored-index', backup_source='backup-2025-11-15', spec=ServerlessSpec(cloud='aws', region='us-east-1')). Restored index same region as backup. Use cases: protect against accidental deletes, system failures, rollback to known-good state. Pod-based alternative: create collection (static copy) instead of backup. Third-party: HYCU offers automated backup for Pinecone with namespace-level recovery. Production: schedule daily backups, test restore process, document backup retention policy. Backups != replication (backups static, replication live).
Two-stage retrieval: (1) vector search retrieves 100 candidates, (2) reranker scores top-10. Use Pinecone Inference API: from pinecone import Pinecone; pc = Pinecone(api_key='key'); results = pc.inference.rerank(model='bge-reranker-v2-m3', query='search query', documents=[{'id': '1', 'text': 'doc1'}, {'id': '2', 'text': 'doc2'}], top_n=10, return_documents=True). Models available: bge-reranker-v2-m3 (default), pinecone-rerank-v0, cohere-rerank-v3.5. Reranking processes query-document pairs, outputs similarity scores. Workflow: vector_results = index.query(vector=vec, top_k=100); docs = [{'id': r.id, 'text': r.metadata['text']} for r in vector_results]; reranked = pc.inference.rerank(model='bge-reranker-v2-m3', query=query, documents=docs, top_n=10). Cost: $0.002 per request (bge-reranker-v2-m3). Benefits: 20-30% accuracy improvement over vector-only. Production: rerank top-50 to top-100 candidates, cache rerank results for repeated queries. LangChain integration: from langchain_pinecone.rerank import PineconeRerank.
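A two-stage retrieval sketch with the Inference API's rerank endpoint, assuming each match stores its source text under a 'text' metadata field; the index name, query, and response field names follow recent SDK versions and should be checked against yours:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("docs-index")                        # hypothetical index name

    query = "how do I rotate API keys?"
    query_vec = [0.0] * 1536                              # placeholder; embed the query for real use

    # Stage 1: broad vector recall.
    candidates = index.query(vector=query_vec, top_k=100, include_metadata=True)
    docs = [{"id": m.id, "text": m.metadata["text"]} for m in candidates.matches]

    # Stage 2: rerank the candidates and keep the top 10.
    reranked = pc.inference.rerank(
        model="bge-reranker-v2-m3",
        query=query,
        documents=docs,
        top_n=10,
        return_documents=True,
    )
    for row in reranked.data:
        print(row.index, row.score)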
Alpha (α) controls weight between dense (semantic) and sparse (keyword) search. Range: 0.0 to 1.0. α=1.0: pure dense/semantic search (ignores sparse), α=0.0: pure sparse/keyword search (ignores dense), α=0.5: balanced (50% each). Formula: alpha * dense_vec + (1-alpha) * sparse_vec. Pinecone does NOT expose alpha parameter directly in API - you must scale vector values yourself before upserting/querying. Implementation: scale sparse values by (1-alpha) and dense values by alpha before combining. Tuning: start α=0.5, test on validation set, optimize for F1 score. Use cases: α=0.7-0.9 for semantic-heavy (QA, chatbots), α=0.3-0.5 for keyword-heavy (exact match, codes). Dynamic alpha: adjust per query type (questions → higher alpha, keywords → lower alpha). Production: A/B test different alphas, monitor click-through rate. IMPORTANT: Only indexes with dotproduct metric support sparse-dense vectors (hybrid search in public preview as of 2025).
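A sketch of the client-side alpha scaling described above (the API itself takes no alpha parameter); the function name is illustrative and the inputs are assumed to come from your embedding model and BM25 encoder:

    def weight_by_alpha(dense, sparse, alpha):
        """Scale dense values by alpha and sparse values by (1 - alpha); alpha in [0, 1]."""
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must be between 0 and 1")
        scaled_dense = [v * alpha for v in dense]
        scaled_sparse = {
            "indices": sparse["indices"],
            "values": [v * (1 - alpha) for v in sparse["values"]],
        }
        return scaled_dense, scaled_sparse

    # Usage (placeholders): dense_vec from your embedding model, sparse_vec from BM25Encoder.
    # hdense, hsparse = weight_by_alpha(dense_vec, sparse_vec, alpha=0.7)
    # index.query(vector=hdense, sparse_vector=hsparse, top_k=10)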
Disk-based metadata filtering (2025 feature) stores metadata on disk instead of RAM - reduces memory footprint while maintaining query performance. Enabled automatically for new serverless indexes - no configuration needed. Technical implementation: uses bitmap indices (similar to data warehouses), low-cardinality bitmaps cached in memory (within budget), high-cardinality bitmaps streamed from disk and intersected with vector index. Architecture: immutable vector slabs in LSM-tree structure with metadata index per slab. Benefits: high-cardinality filters (millions of unique values), improved recall vs in-memory, lower cost. Use cases: user_id with millions of users, product_id with large catalogs, timestamp with granular precision. Performance: disk-based filtering as fast as in-memory for most queries (metadata stored on SSDs). Supports all filter operators: $eq, $ne, $gt, $gte, $lt, $lte, $in. Example: filter={'product_id': {'$in': list_of_1M_product_ids}} works efficiently. Production: use metadata filtering for high-cardinality partitioning (millions of partitions).
Performance identical - choose based on use case. Use namespaces: (1) Strict data isolation (tenant A cannot see tenant B data even with bugs), (2) <10K tenants, (3) Different vector sets per tenant, (4) Deletion by tenant (delete_all in namespace). Use metadata filtering: (1) >10K tenants (Pinecone supports millions of namespaces but metadata scales better), (2) Cross-tenant queries needed, (3) High-cardinality filtering (millions of users), (4) Flexible multi-dimensional filtering (tenant + date + category). Query namespaces: index.query(vector=vec, namespace='tenant123'). Query metadata: index.query(vector=vec, filter={'tenant_id': {'$eq': 'tenant123'}}). Combine both: namespace for primary partition (region/env), metadata for secondary (user/date). Future: delete by metadata (metadata filtering enables bulk deletes like namespaces). Production decision: <1K tenants → namespaces, >10K tenants → metadata, hybrid needs → both. Disk-based metadata filtering (2025) makes metadata approach scalable to millions.
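A small sketch combining both isolation levels: a namespace for the coarse partition and a metadata filter for attributes inside it; all names, the namespace, and the filter fields are illustrative:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("multitenant-index")                 # hypothetical index name

    query_vec = [0.0] * 1536                              # placeholder embedding

    # Namespace isolates the region/environment; metadata narrows to one tenant and date range.
    results = index.query(
        vector=query_vec,
        namespace="eu-prod",
        filter={"tenant_id": {"$eq": "tenant123"}, "date": {"$gte": 20250101}},
        top_k=10,
        include_metadata=True,
    )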
Provisioned read capacity (early access, 2025) provides dedicated storage and compute resources for predictable performance with millions to billions of records and moderate-to-high QPS (1000+ queries/sec). Configuration uses API version 2025-10: set mode='Dedicated' in the spec.serverless.read_capacity object, choose a node type (b1 or t1), and configure replicas based on throughput needs. Each shard provides 250GB of storage. Request access: contact Pinecone sales for early access. Benefits: (1) Dedicated storage cached in memory+disk for low-latency queries, (2) No rate limits on read operations (query, list, fetch), (3) Predictable cost for reserved capacity, (4) Isolated resources (no noisy neighbors). When to use: (1) >1,000 QPS sustained, (2) <10ms p99 latency SLA required, (3) Production-critical workload, (4) Budget for reserved capacity. Default serverless: auto-scaling handles most traffic (variable workloads, burst traffic, cost optimization). Production: start with auto-scaling, upgrade to provisioned capacity when auto-scaling is insufficient.
Pinecone Console provides usage monitoring: navigate to Usage dashboard for read units, write units, storage (GB), costs by index. Available to organization owners on Standard/Enterprise plans. Metrics breakdown: total requests, p50/p95/p99 latency, error rate, throttled requests. API-level tracking: query/fetch/list requests return usage parameter with read unit consumption; hosted embedding model requests return usage parameter with total tokens. Export: download CSV for billing analysis. Cost structure (serverless): pay per read unit (1 query = 1 RU), write unit (1 upsert = 1 WU), storage (GB-hour). Optimization: (1) Use serverless (50x cheaper than pods), (2) Reduce dimensionality (768 vs 1536 = 50% storage savings), (3) Clean unused indexes/namespaces, (4) Batch upserts (fewer WUs), (5) Cache query results (reduce RUs). Third-party: Datadog integration available for tracking requests, latency, usage trends. Production: tag indexes by environment (dev/staging/prod), review monthly usage, set budget alerts.
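A sketch of reading per-request read-unit usage from a serverless query response; the usage field and its read_units attribute are assumed from the serverless response format, and the index name and vector are placeholders:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("prod-index")                        # hypothetical index name

    results = index.query(vector=[0.0] * 1536, top_k=10, include_metadata=True)

    # Serverless responses report how many read units the request consumed.
    if results.usage is not None:
        print("read units:", results.usage.read_units)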
API key security: (1) Never commit keys to git (use .env files, gitignore), (2) Store in secrets manager (AWS Secrets Manager, Vercel env vars), (3) Rotate keys periodically (generate new key, migrate, delete old), (4) Use environment-specific keys (dev/staging/prod separate). Access control: Pinecone API keys have index-level permissions. Create restricted keys: Pinecone Console → API Keys → Create key → select specific indexes. Key types: read-write (full access), read-only (query only, no upserts/deletes). Client initialization: pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY')). Production best practices: (1) TLS/HTTPS only (Pinecone enforces), (2) Network isolation (VPC peering for Enterprise), (3) Audit logging (track API key usage), (4) Principle of least privilege (read-only keys for apps only querying). Compliance: SOC 2 Type II, GDPR compliant. Enterprise features: SSO, RBAC, private endpoints. Monitor: failed auth attempts (403 errors), unusual usage patterns.
Deploy indexes in multiple regions for global coverage. Pinecone serverless available in: AWS (us-west-2, us-east-1, eu-west-1), GCP (us-central1, europe-west4), Azure (eastus, westeurope). Create separate index per region: pc.create_index(name='index-us', dimension=1536, metric='cosine', spec=ServerlessSpec(cloud='aws', region='us-east-1')); pc.create_index(name='index-eu', dimension=1536, metric='cosine', spec=ServerlessSpec(cloud='aws', region='eu-west-1')). Global control plane: requests to api.pinecone.io auto-route to nearest API server via Google Cloud global load balancer backed by Cloud Spanner (globally replicated). Client-side routing: use geo-DNS (Route 53, Cloudflare) or application logic to detect user region (IP geolocation), query regional index. Sync strategy: write to all regions (eventual consistency) or write to primary + replicate. Latency: p50 <10ms, p99 <50ms globally. Benefits: <50ms latency worldwide, regulatory compliance (data residency). Production: deploy app in same region as index for optimal performance.
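A sketch of simple application-side routing to the nearest regional index, assuming the user's region is already resolved (for example from IP geolocation); the region keys and index names are illustrative:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")

    # Map coarse user regions to the regional index serving them.
    REGIONAL_INDEXES = {
        "us": "index-us",          # e.g. aws us-east-1
        "eu": "index-eu",          # e.g. aws eu-west-1
    }

    def query_nearest(user_region, query_vec, top_k=10):
        index_name = REGIONAL_INDEXES.get(user_region, "index-us")   # fallback region
        index = pc.Index(index_name)
        return index.query(vector=query_vec, top_k=top_k, include_metadata=True)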
Migration steps: (1) Create collection from pod index: pc.create_collection(name='migration-backup', source='pod-index-name'), (2) Create serverless index from collection: pc.create_index(name='serverless-index', dimension=1536, metric='cosine', spec=ServerlessSpec(cloud='aws', region='us-east-1'), collection_source='migration-backup'), (3) Validate: query both indexes, compare results (should match), (4) Cutover: update application to use serverless index, (5) Clean up: delete pod index after 7-day safety period. Limitations: same metric (cosine/euclidean/dotproduct) required, same dimension, collection max 2,000 namespaces. Test migration: use non-production index first. Dual-write approach (zero downtime): write to both pod + serverless during transition, switch reads gradually. Benefits after migration: 50x cost reduction, auto-scaling, lower latency. Gotchas: API changes (collection vs backup terminology), region compatibility (ensure serverless region matches pod region). Production: schedule migration during low-traffic window, monitor query performance post-migration.
Keep top_k small for best performance: top_k=10-50 recommended; top_k>100 is noticeably slower. Query: index.query(vector=vec, top_k=10, include_metadata=True, include_values=False). include_metadata=True returns metadata (default), include_metadata=False omits metadata (faster, smaller response). include_values=True returns vector values (large payload), include_values=False omits vectors (recommended unless needed). Performance impact: top_k=100 can be roughly 2x slower than top_k=10, and large metadata (>10KB per vector) slows queries further. Optimization: (1) Request only needed metadata fields (future feature), (2) Use top_k=10-20 for production, (3) Fetch additional results with pagination rather than a large top_k, (4) Disable include_values unless required, (5) Filter before retrieval (reduces candidates), (6) Use sparse indexes for metadata-only queries. Production: monitor p95 latency, tune top_k per use case (chatbots: top_k=5, analytics: top_k=50). Serverless auto-optimizes query performance.
Rate limits vary by plan: Free (5 API calls/sec), Starter (100/sec), Standard (1000/sec), Enterprise (custom). Status code 429 means the rate limit was exceeded. Retry with exponential backoff: catch the SDK exception, check for status 429, and wait 2**attempt seconds before retrying (see the sketch below for a runnable version). Production retry library: tenacity: @retry(wait=wait_exponential(min=1, max=60), stop=stop_after_attempt(5), retry=retry_if_exception_type(PineconeException)). Prevent rate limits: (1) Use pool_threads for parallel requests (distributes load), (2) Batch operations (upsert 1000 vectors per call instead of 1000 individual upserts), (3) Monitor usage (stay below the limit), (4) Upgrade plan for higher limits. Serverless advantage: reads and writes are isolated - write batches don't affect query latency. Production: implement a circuit breaker, monitor 429 errors, alert on threshold.
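A runnable version of the backoff loop above, assuming the SDK's PineconeApiException exposes the HTTP status; the index name, vector, and wait times are illustrative:

    import random
    import time

    from pinecone import Pinecone
    from pinecone.exceptions import PineconeApiException

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("index-name")                        # hypothetical index name

    def query_with_backoff(vector, top_k=10, max_retries=5):
        for attempt in range(max_retries):
            try:
                return index.query(vector=vector, top_k=top_k, include_metadata=True)
            except PineconeApiException as exc:
                if exc.status != 429 or attempt == max_retries - 1:
                    raise
                # Exponential backoff with jitter: 1s, 2s, 4s, 8s ... plus noise.
                time.sleep(2 ** attempt + random.random())

    result = query_with_backoff([0.0] * 1536)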
Versioning strategies: (1) Named indexes: create index-v1, index-v2 - switch application between versions (blue-green deployment). (2) Namespaces: use namespace as version (v1, v2, v3 namespaces in same index). (3) Metadata versioning: add version field to metadata, filter by version. Recommended: named indexes for major changes (embedding model upgrades), namespaces for minor versions. Implementation: pc.create_index(name='embeddings-v2', dimension=1536, spec=ServerlessSpec(...)). Dual-index approach: write to both v1 + v2 during transition, switch reads after validation, delete v1 after safety period. Rollback: revert application to use index-v1, fix issues, redeploy to v2. Namespace versioning: index.upsert(vectors, namespace='v2'); index.query(vector=vec, namespace='v2'). Metadata versioning: upsert with version field, query with filter={'version': 'v2'}. Production: version embeddings when model changes (text-embedding-3-small → 3-large), use backups for point-in-time recovery (serverless only), document version changelog.
Pinecone's list API supports pagination with a pagination_token for serverless indexes only. Use listPaginated() to retrieve vector IDs: results = await index.listPaginated({namespace: 'docs', limit: 100, prefix: 'doc1#'}). Returns up to 100 IDs by default (configurable with the limit parameter). The response includes a pagination_token when more IDs exist; pass the token to get the next batch. When no pagination_token is in the response, all IDs have been retrieved. Python SDK: auto-paginates with list(), or paginate manually with list_paginated(). Other SDKs (Node.js, Java, Go, .NET) + REST API: manual pagination required. For query results there is no built-in pagination - workarounds: (1) Retrieve a larger result set once (e.g. top_k=100) and paginate client-side, (2) ID-based exclusion: store the ID in metadata and filter={'id': {'$nin': seen_ids}} (filters only see metadata fields, not the vector ID itself), (3) Metadata cursor: filter={'created_at': {'$gt': last_seen_timestamp}}, order client-side. Best practice: cache large result sets (Redis), limit top_k to a reasonable value (100-500), use metadata filtering to narrow scope before pagination.
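A Python sketch of manual ID pagination with list_paginated on a serverless index; the prefix, namespace, and index name are placeholders, and the response shape (vectors list plus pagination.next token) follows the Python SDK and may differ slightly across versions:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("docs-index")                        # hypothetical index name

    all_ids = []
    token = None
    while True:
        page = index.list_paginated(
            prefix="doc1#", namespace="docs", limit=100, pagination_token=token
        )
        all_ids.extend(v.id for v in page.vectors)
        token = page.pagination.next if page.pagination else None
        if token is None:
            break          # no pagination token means every ID has been retrieved

    print(f"Found {len(all_ids)} IDs")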