OpenTelemetry Performance FAQ & Answers
12 expert OpenTelemetry Performance answers researched from official documentation. Every answer cites authoritative sources you can verify.
Full tracing with 100% sampling creates a massive span volume that overwhelms CPU and memory. Every instrumented HTTP request generates 10-50 spans, and at high request rates (1000s/sec) that multiplies into: event loop blocking (span export is synchronous under SimpleSpanProcessor), memory pressure (spans queue in memory awaiting export), network saturation (sending telemetry), and a collector bottleneck. Solution: aggressive sampling (1% or less), BatchSpanProcessor instead of SimpleSpanProcessor, and disabling noisy instrumentations. With 1% sampling plus batching, degradation drops from 80% to <5%.
1% sampling (0.01) is recommended for high-traffic production systems. Pattern: new TraceIdRatioBasedSampler(0.01). This samples 1 out of 100 traces, reducing overhead from 80% to 5%. For low traffic (<100 req/sec), use 10-20% sampling. For critical services, always sample errors (conditional sampler) and keep 1% for successes. For high traffic (>1000 req/sec), use 0.1% (1 in 1000). Benchmark your specific app and target <5% performance impact. Monitor span volume and aim for <1000 spans/sec exported.
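A minimal sketch of that pattern with the Node.js SDK, assuming the @opentelemetry/sdk-node and @opentelemetry/sdk-trace-base packages (export locations can vary slightly by SDK version); the service name is a placeholder and the ratio should match your traffic tier:

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import {
  ParentBasedSampler,
  TraceIdRatioBasedSampler,
} from '@opentelemetry/sdk-trace-base';

// Sample 1% of new (root) traces; ParentBasedSampler makes child spans
// follow the parent's decision so sampled traces stay complete.
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.01), // 0.10-0.20 for low traffic, 0.001 for >1000 req/sec
});

const sdk = new NodeSDK({
  serviceName: 'my-service', // placeholder
  sampler,
});

sdk.start();
```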
Always use BatchSpanProcessor in production. SimpleSpanProcessor exports spans synchronously, one at a time, blocking the event loop. BatchSpanProcessor batches spans and exports them asynchronously. Pattern: new BatchSpanProcessor(exporter, { maxQueueSize: 2048, maxExportBatchSize: 512, scheduledDelayMillis: 5000 }). This reduces network calls (batched vs. individual exports), prevents event loop blocking (asynchronous export), and buffers spans efficiently (bounded queue). SimpleSpanProcessor is only for debugging/testing. Performance: BatchSpanProcessor is 10-20x faster at high request rates.
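A sketch of that setup, assuming an OTLP/HTTP exporter pointed at a local Collector; depending on SDK version the NodeSDK option is spanProcessors (newer) or spanProcessor (older):

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// Assumed local Collector endpoint.
const exporter = new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' });

// Spans are buffered and exported in batches off the request path, rather
// than one blocking export per span as with SimpleSpanProcessor.
const batchProcessor = new BatchSpanProcessor(exporter, {
  maxQueueSize: 2048,         // spans beyond this buffer are dropped
  maxExportBatchSize: 512,    // spans per export request
  scheduledDelayMillis: 5000, // flush interval
});

const sdk = new NodeSDK({ spanProcessors: [batchProcessor] });
sdk.start();
```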
Disable specific instrumentations that produce too many spans or aren't needed. Node.js: getNodeAutoInstrumentations({ '@opentelemetry/instrumentation-fs': { enabled: false }, '@opentelemetry/instrumentation-dns': { enabled: false } }). Java: -Dotel.instrumentation.jdbc.enabled=false -Dotel.instrumentation.logging.enabled=false. Common noisy ones: filesystem (fs), DNS queries, logging frameworks, and database connection pools. Keep: HTTP, Express, database queries, and external API calls. Monitor span volume per instrumentation to identify the noisy ones, and disable any instrumentation producing >50% of total spans if its spans aren't valuable.
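For Node.js, a sketch of disabling the two noisiest auto-instrumentations named above, assuming @opentelemetry/auto-instrumentations-node:

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  instrumentations: [
    getNodeAutoInstrumentations({
      // Filesystem and DNS spans are high-volume and rarely worth keeping.
      '@opentelemetry/instrumentation-fs': { enabled: false },
      '@opentelemetry/instrumentation-dns': { enabled: false },
    }),
  ],
});

sdk.start();
```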
Always sample traces containing errors and only 1% of successful traces. Pattern: shouldSample(context, traceId, spanName, spanKind, attributes) { if (attributes['http.status_code'] >= 400) return RECORD_AND_SAMPLED; return Math.random() < 0.01 ? RECORD_AND_SAMPLED : NOT_RECORD; }. This captures errors for debugging while limiting overhead from successful requests. Benefits: error visibility with <5% performance impact. Requires a custom sampler implementation. Caveat: the sampling decision is made when a span starts, but http.status_code is usually set only when it ends, so head-based sampling like this only catches errors whose attributes are known up front; for guaranteed 100% error capture, use tail-based sampling (e.g. the Collector's tail_sampling processor), which decides after the trace completes. Alternative: use ParentBasedSampler with a conditional root sampler.
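A head-based sketch of that idea implementing the JS SDK's Sampler interface; ErrorAwareSampler is a hypothetical name, and because status codes usually arrive only at span end, it only upgrades spans whose error attributes are known at creation time:

```typescript
import { Attributes, Context, Link, SpanKind } from '@opentelemetry/api';
import {
  Sampler,
  SamplingDecision,
  SamplingResult,
  TraceIdRatioBasedSampler,
} from '@opentelemetry/sdk-trace-base';

// Keep every span whose creation-time attributes already indicate an error;
// otherwise fall back to 1% ratio sampling. For guaranteed error capture,
// prefer tail-based sampling in the Collector.
class ErrorAwareSampler implements Sampler {
  private fallback = new TraceIdRatioBasedSampler(0.01);

  shouldSample(
    context: Context,
    traceId: string,
    spanName: string,
    spanKind: SpanKind,
    attributes: Attributes,
    links: Link[]
  ): SamplingResult {
    const status = attributes['http.status_code'];
    if (typeof status === 'number' && status >= 400) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }
    return this.fallback.shouldSample(context, traceId, spanName, spanKind, attributes, links);
  }

  toString(): string {
    return 'ErrorAwareSampler';
  }
}
```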
In the Collector, use the batch processor with appropriate buffer sizes: batch: { timeout: 10s, send_batch_size: 1024 }. Add a probabilistic sampler at the Collector as well: probabilistic_sampler: { sampling_percentage: 1 }. This provides defense in depth if application-level sampling fails. Set resource limits (memory_limiter) in the Collector config to prevent OOM. Use multiple Collector instances with load balancing for high volume. Export to a local Collector on the same host (fast), which aggregates and forwards. Pattern: App → Local Collector (high throughput) → Central Collector (aggregation) → Backend. Monitor Collector queue length and drop rate.
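A sketch of a local Collector pipeline combining these pieces (probabilistic_sampler ships in the contrib distribution); the memory limit and the central-collector endpoint are placeholders:

```yaml
# Local Collector: receive OTLP, cap memory, apply 1% backstop sampling,
# batch, and forward to a central Collector or backend.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512            # placeholder; size to the host
  probabilistic_sampler:
    sampling_percentage: 1
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  otlphttp:
    endpoint: http://central-collector:4318   # placeholder endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlphttp]
```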