MCP Discovery Observability FAQ & Answers

6 expert MCP Discovery Observability answers researched from official documentation. Every answer cites authoritative sources you can verify.

Q: How do I discover available MCP servers?

A:

MCP server discovery evolved significantly in 2025. There are three discovery methods:

(1) MCP Registry (launched September 2025): the official catalog at modelcontextprotocol.io/registry with API v0.1. The GitHub MCP Registry auto-publishes from the OSS Community Registry; query via the API, install via the CLI.

(2) Desktop Extensions directory: Claude Desktop Settings > Extensions > Browse extensions lists Anthropic-reviewed .mcpb bundles with one-click install.

(3) Manual configuration: claude_desktop_config.json with {mcpServers: {'name': {command: 'node', args: ['server.js']}}}.

Enterprise: MDM deploys standard configs, and Primary Owners can upload custom .mcpb bundles for their team. Discovery continues to evolve: servers advertise via .well-known URLs, and the Docker MCP Catalog offers 60+ servers. The registry validates namespace ownership before publishing. Legacy: manual GitHub searches still work but are deprecated.
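A minimal claude_desktop_config.json entry matching the manual-configuration shape above (the server name and script path are placeholders):

```json
{
  "mcpServers": {
    "my-server": {
      "command": "node",
      "args": ["/path/to/server.js"]
    }
  }
}
```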

99% confidence
Q: How should an MCP server expose its identity and capabilities?

A:

Expose server info and capabilities in the initialize response. Required shape: {protocolVersion: '2025-03-26', capabilities: {tools: {...}, resources: {...}, prompts: {...}}, serverInfo: {name: 'MyMCPServer', version: '1.0.0'}}. The capabilities object lists supported features: tools (can provide tools), resources (can provide resources), prompts (can provide prompts). Roots (access boundaries) are a capability the client declares, not the server. serverInfo provides the name for display in the client UI and the version for compatibility checking. Optional metadata: vendor, homepage, documentation URL. Pattern: the initialize response tells the client everything the server can do, and the client adapts its UI based on the declared capabilities. The version in serverInfo enables client warnings for outdated servers, automatic update prompts, and compatibility checks.
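A sketch of building that initialize result in JavaScript, using the fields shown above; the capability sub-flags (listChanged, subscribe) are illustrative, not an exhaustive list:

```javascript
// Build the MCP initialize response described above.
function buildInitializeResult({ name, version }) {
  return {
    protocolVersion: "2025-03-26",
    capabilities: {
      tools: { listChanged: true },   // server can provide tools
      resources: { subscribe: true }, // server can provide resources
      prompts: {},                    // server can provide prompts
    },
    // name is shown in the client UI; version enables compatibility checks
    serverInfo: { name, version },
  };
}

console.log(buildInitializeResult({ name: "MyMCPServer", version: "1.0.0" }));
```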

99% confidence
Q: How do I implement health checks for an MCP server?

A:

Implement GET /health for basic liveness and GET /health/ready for readiness.

Liveness (/health): returns 200 if the server process is running. Response: {status: 'ok', timestamp: ISO8601}. Checks only that the process is alive.

Readiness (/health/ready): returns 200 if the server can handle requests. Response: {status: 'ready', checks: {database: 'connected', cache: 'connected'}}. Checks the database connection, external dependencies, and disk space.

Use these for Kubernetes liveness/readiness probes, load balancer health checks, and monitoring alerts. Kubernetes pattern: livenessProbe: {httpGet: {path: '/health', port: 3000}, initialDelaySeconds: 10, periodSeconds: 30}, readinessProbe: {httpGet: {path: '/health/ready', port: 3000}}. Return 503 when not ready so Kubernetes and the load balancer stop routing new traffic to the instance until it recovers.

99% confidence
Q: What metrics should an MCP server collect?

A:

Collect six key metric types:

(1) Request rate: tools_called_total counter, labeled by tool name.
(2) Latency: tool_duration_seconds histogram with p50/p95/p99.
(3) Error rate: tool_errors_total counter, labeled by error code.
(4) Session metrics: active_sessions gauge, session_duration_seconds histogram.
(5) Initialization success: initialize_attempts_total, initialize_failures_total.
(6) Resource usage: process_cpu_percent, process_memory_bytes, event_loop_lag_ms.

Use the Prometheus exposition format, e.g. the tool_duration_seconds{tool='search',status='success'} histogram. Instrument with the prom-client npm package and expose a GET /metrics endpoint. In a Grafana dashboard, track error rate >1%, p95 latency >500ms, and session-count trends. Alert on error-rate spikes, high latency, and memory leaks.
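In practice prom-client generates this output for you; a dependency-free sketch of the counter bookkeeping and the Prometheus text format it produces:

```javascript
// Labeled counters keyed by "name{label=\"value\",...}".
const counters = new Map();

function incCounter(name, labels) {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(",");
  const key = `${name}{${labelStr}}`;
  counters.set(key, (counters.get(key) || 0) + 1);
}

// Render one line per labeled series, Prometheus text format:
//   tools_called_total{tool="search"} 2
function renderMetrics() {
  return [...counters.entries()].map(([k, v]) => `${k} ${v}`).join("\n");
}

incCounter("tools_called_total", { tool: "search" });
incCounter("tools_called_total", { tool: "search" });
incCounter("tool_errors_total", { code: "-32602" });
console.log(renderMetrics());
```

A real exporter would serve renderMetrics() from the GET /metrics endpoint with content type text/plain.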

99% confidence
Q: How do I monitor and debug MCP server errors?

A:

Track error rates by category and implement structured logging. Error categories: (1) client errors (4xx): invalid params, method not found; (2) server errors (5xx): database failures, timeouts; (3) JSON-RPC errors: parse errors, protocol violations. Metric: error_rate = errors_total / requests_total; alert when it stays above 1% for 5 minutes. Logging pattern: logger.error({error_code: -32602, tool: 'search', params: {...}, error_message: 'Missing required field', stack_trace, session_id}). Use ELK or Datadog for log aggregation. Debug workflow: (1) check /metrics for the error spike, (2) query logs by error_code and timestamp, (3) reproduce with MCP Inspector, (4) fix and deploy. Monitor the top 10 errors by frequency and the error rate per tool.
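A sketch of the structured log entry and the alert-threshold arithmetic described above; the field names follow the logging pattern in the answer, and logToolError is an illustrative helper, not a specific library's API:

```javascript
// Emit one JSON object per error so ELK/Datadog can index the fields.
function logToolError({ code, tool, message, sessionId }) {
  const entry = {
    level: "error",
    error_code: code,       // JSON-RPC code, e.g. -32602 = invalid params
    tool,
    error_message: message,
    session_id: sessionId,
    timestamp: new Date().toISOString(),
  };
  console.error(JSON.stringify(entry)); // ship via stdout to the aggregator
  return entry;
}

// error_rate = errors_total / requests_total, per the alerting rule above.
function errorRate(errorsTotal, requestsTotal) {
  return requestsTotal === 0 ? 0 : errorsTotal / requestsTotal;
}

logToolError({ code: -32602, tool: "search", message: "Missing required field", sessionId: "abc123" });
console.log(errorRate(12, 1000) > 0.01); // true: 1.2% breaches the 1% threshold
```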

99% confidence
Q: Should MCP servers use distributed tracing?

A:

Yes, for production MCP servers. Use OpenTelemetry with W3C Trace Context propagation and scale-appropriate sampling. 2025 tooling: (1) Sentry's one-line instrumentation for JS SDK MCP servers, (2) SigNoz for self-hosted observability with native MCP support, (3) Grafana Tempo 2.9+ with MCP server TraceQL queries. Sampling strategies: small scale, 1% of normal traffic plus 100% of errors; medium scale, tail-based with a 10% baseline plus 100% of errors; large scale, head-based 0.1% with tail-based sampling for anomalies. Configure a BatchSpanProcessor with GZIP compression (60-80% bandwidth reduction). Instrument MCP initialize, tool invocations, database queries, and external API calls. Benefits: debug >500ms requests, identify bottlenecks, trace cross-service errors. Cost: <5% overhead with proper sampling. Don't sample 100% in production (80% overhead) or log PII in spans.
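The small-scale policy above (keep a small random fraction of normal traces, but always keep errors and slow requests) can be sketched as a single sampling decision. In real deployments this is done with OpenTelemetry samplers or tail-based sampling in the collector; shouldKeepTrace and its parameters are illustrative:

```javascript
// Decide whether to keep a finished trace: 100% of errors, 100% of
// slow requests, and a head-based baseline fraction of everything else.
function shouldKeepTrace({
  isError,
  durationMs,
  baseRate = 0.01,        // 1% baseline for normal traffic
  slowThresholdMs = 500,  // keep anything slower than 500ms
  rand = Math.random,     // injectable for deterministic tests
}) {
  if (isError) return true;                      // always keep errors
  if (durationMs > slowThresholdMs) return true; // keep slow outliers
  return rand() < baseRate;                      // sample the rest
}

console.log(shouldKeepTrace({ isError: true, durationMs: 20 }));   // true
console.log(shouldKeepTrace({ isError: false, durationMs: 900 })); // true
```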

99% confidence