Claude is Anthropic's family of AI models emphasizing safety, helpfulness, and harmlessness. Current models (2025): Claude Sonnet 4.5 (September 2025, $3/$15 per million tokens, best coding model), Claude Opus 4.1 (August 2025, $15/$75, highest reasoning), Claude Haiku 4.5 (October 2025, $1/$5, fastest and cheapest). All models: 200K context window, vision (up to 100 images via API), extended thinking (1K-128K budget tokens), tool use with parallel calling, multilingual, training cutoff March 2025. Use cases: coding, research, analysis, content generation, customer service, agents. Available via Anthropic API, AWS Bedrock, Google Cloud Vertex AI. Constitutional AI principles ensure safety and harmlessness.
Anthropic Claude FAQ & Answers
28 expert Anthropic Claude answers researched from official documentation. Every answer cites authoritative sources you can verify.
Messages API is primary interface for Claude. Request: POST /v1/messages with model, max_tokens (required, no default), messages array. Messages: role (user/assistant) and content (text/image/document). System prompt separate (system parameter, higher priority). Response: content array with text/thinking blocks. Streaming: set stream: true for SSE events (message_start, content_block_delta, message_stop). 2025 updates: consecutive messages auto-merged, first message can be assistant, plain text/PDF documents supported, tool use with fine-grained streaming (beta), extended thinking with visible reasoning. SDKs: Python (anthropic-sdk-python) and TypeScript (anthropic-sdk-typescript) with automatic retries and type safety. Essential for conversational AI, agents, and multimodal applications.
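A minimal sketch of the request body, assembled as a plain dict. The helper name and the default model id are illustrative; you would POST this JSON to /v1/messages (with your API key and anthropic-version headers) or pass the same fields to an SDK client.

```python
def build_messages_request(user_text, system=None,
                           model="claude-sonnet-4-5", max_tokens=1024):
    # model, max_tokens, and messages are required; max_tokens has no default.
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }
    if system is not None:
        # The system prompt is a separate top-level parameter,
        # not a message, and outranks user instructions.
        body["system"] = system
    return body

req = build_messages_request("Summarize this report.",
                             system="You are a concise analyst.")
```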
All Claude models (Sonnet 4.5, Opus 4.1, Haiku 4.5) support 200K token context window (~150K words, ~500 pages). Long context (>200K tokens) available via context-1m-2025-08-07 beta header with Claude Sonnet 4.x (separate rate limits apply). Use for: long document analysis, large codebases (entire repositories), multi-document synthesis, comprehensive research papers. Best practices: place long content early in prompt, use XML tags for structure (e.g., wrap each source in <document> tags), and put your question or instructions after the documents.
Tool use enables Claude to interact with external systems via function calling. Define tools with name, description, input_schema (JSON Schema). Claude autonomously decides when to use tools, returns tool_use block with id, name, input. Execute tool, return result with tool_result block, continue conversation. 2025 features: parallel tool calling (multiple tool_use blocks in single response, ~100% success with Claude 4), computer use (control computer via beta API), fine-grained streaming for tool parameters (beta), context management (automatic tool result clearing via context-management-2025-06-27 beta header as approaching token limits). Return all parallel tool results in single user message. Use for: API calls, calculations, database queries, web browsing, code execution, file operations. Claude 4 excels at tool selection, parameter filling, and parallel execution decisions.
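A sketch of the tool-use round trip, assuming a hypothetical get_weather tool with a stubbed local implementation. The key point: parallel calls arrive as multiple tool_use blocks in one assistant response, and every corresponding tool_result goes back in a single user message.

```python
# Tool definition: name, description, and a JSON Schema for the input.
weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def run_tool(name, tool_input):
    # Execute whatever tool Claude requested (stubbed for illustration).
    if name == "get_weather":
        return f"22C and sunny in {tool_input['city']}"
    raise ValueError(f"unknown tool: {name}")

def tool_results_message(tool_use_blocks):
    # Pack ALL parallel tool results into one user message, each
    # tool_result keyed back to its tool_use block by tool_use_id.
    results = [
        {"type": "tool_result",
         "tool_use_id": block["id"],
         "content": run_tool(block["name"], block["input"])}
        for block in tool_use_blocks
    ]
    return {"role": "user", "content": results}

# Example: an assistant response containing two parallel tool_use blocks.
blocks = [
    {"type": "tool_use", "id": "tu_1", "name": "get_weather", "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "tu_2", "name": "get_weather", "input": {"city": "Oslo"}},
]
followup = tool_results_message(blocks)
```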
Extended thinking reveals Claude's step-by-step reasoning process before final answers. Enable with thinking: {type: 'enabled', budget_tokens: N}. Response includes thinking blocks (internal reasoning) and text blocks (final answer). Budget tokens: 1,024 minimum to 128,000 maximum. Performance improves logarithmically with budget (Claude 3.7 Sonnet achieved 84.8% on GPQA Diamond with 64K budget). 2025 features: interleaved thinking with tools (thinking between tool calls, can exceed budget to full 200K context), hybrid reasoning (sequential reasoning steps), streaming support (see reasoning in real-time). Use for: complex math, logic puzzles, code debugging, research analysis, graduate-level questions. Charged at lower rate than regular tokens. Claude may not use entire budget. Unique feature for transparent, interpretable AI.
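Two small illustrative helpers: one adds the thinking parameter to a request body (enforcing the 1,024-128,000 budget range; note max_tokens must be larger than the thinking budget), the other separates thinking blocks from the final text in a response's content array.

```python
def with_thinking(body, budget_tokens=4096):
    # Enable extended thinking on an existing request body.
    assert 1024 <= budget_tokens <= 128_000, "budget out of range"
    body = dict(body)
    body["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return body

def split_blocks(content):
    # Separate internal reasoning from the final answer.
    reasoning = [b["thinking"] for b in content if b["type"] == "thinking"]
    answer = "".join(b["text"] for b in content if b["type"] == "text")
    return reasoning, answer
```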
Claude 3 and 4 families support multimodal vision for image understanding and analysis. Content types: text, image (base64 or URL). Formats: JPEG, PNG, GIF, WebP. Multiple images supported: up to 100 images per API request, 20 per claude.ai request. Capabilities: text extraction (OCR), spatial reasoning (layout/positioning), diagram analysis, visual Q&A, multi-modal analysis (combine visual and textual). Use for: document processing, technical diagrams, screenshots, charts, handwriting recognition. Best practices: high resolution for text extraction, describe focus areas, combine with extended thinking for complex visual analysis. Limitations: cannot identify/name specific individuals (privacy), struggles with low-quality/blurry images. Pricing: image tokens based on size. No image generation (analysis only). Essential for document understanding and visual applications.
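A sketch of building a base64 image content block (the helper name is illustrative); the resulting dict goes into a user message's content list alongside text blocks.

```python
import base64

def image_block(image_bytes, media_type="image/png"):
    # base64-encode raw image bytes (JPEG, PNG, GIF, or WebP) into the
    # content-block shape the Messages API expects.
    data = base64.standard_b64encode(image_bytes).decode("ascii")
    return {"type": "image",
            "source": {"type": "base64",
                       "media_type": media_type,
                       "data": data}}

# Usage: image_block(open("chart.png", "rb").read()) placed before a
# text block asking a question about the chart.
```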
Prompt caching stores frequently-used context (system prompts, knowledge bases, long documents) for reuse, reducing costs by 90% and latency by 85%. Mark cacheable content with cache_control: {type: 'ephemeral'}. Pricing: cache reads cost 10% of base input price (90% savings), cache writes cost 25% more than base. Cache lifetime: 5 minutes, auto-refreshes on use. Requirements: minimum 1,024 tokens per checkpoint, max 4 checkpoints, up to 32K cached tokens. Best practices 2025: place static content first (system prompts, instructions, examples), dynamic content last (user input), organize for maximum cache hits. Use for: RAG systems, conversational agents, coding assistants, large document processing, agentic tool use. Combine with Batch API for 95% total savings (90% caching + 50% batch discount). Claude 3.7 Sonnet supports automatic cache management on AWS Bedrock. Essential for production cost optimization.
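A sketch of the recommended ordering: static content first with a cache checkpoint, dynamic content last and uncached (the checkpoint only takes effect once the cached prefix reaches the 1,024-token minimum).

```python
def cached_system(static_text, dynamic_text=""):
    # Static instructions/examples first, marked with cache_control so
    # subsequent requests hit the cache; dynamic content stays last
    # and uncached to keep the cached prefix stable.
    blocks = [{"type": "text",
               "text": static_text,
               "cache_control": {"type": "ephemeral"}}]
    if dynamic_text:
        blocks.append({"type": "text", "text": dynamic_text})
    return blocks
```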
Best practices 2025: clear, direct instructions; use XML tags for structure (e.g., <instructions>, <example>, <document>); provide few-shot examples; assign a role via the system prompt; allow step-by-step reasoning (chain of thought or extended thinking); prefill the assistant message to control output format; iterate on prompts in the Workbench.
Message Batches API (GA) processes up to 100,000 requests or 256 MB per batch asynchronously at 50% cost reduction. Create batch via POST /v1/messages/batches with requests array (each with custom_id), poll status via GET, retrieve results from results_url. Processing: most finish <1 hour, 24 hours max. Results available 29 days. Use when: batch size >100 requests, can tolerate <24h latency, cost matters more than speed. Perfect for: dataset labeling, bulk translations, content moderation, synthetic data generation. API version: 2023-06-01. Combine with prompt caching for up to 95% total savings.
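A sketch of assembling the requests array (helper name and model id are illustrative). custom_id is how you match results back to inputs, since batch results are not guaranteed to come back in submission order.

```python
def build_batch(prompts, model="claude-haiku-4-5", max_tokens=256):
    # One entry per request; params carries the same fields as a normal
    # /v1/messages request body.
    return {
        "requests": [
            {"custom_id": f"req-{i}",
             "params": {"model": model,
                        "max_tokens": max_tokens,
                        "messages": [{"role": "user", "content": p}]}}
            for i, p in enumerate(prompts)
        ]
    }

batch = build_batch(["Label: 'great product'", "Label: 'never again'"])
```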
Citations API (January 2025) enables Claude to provide detailed source references when answering questions about documents. Claude automatically cites exact sentences/passages it uses to generate responses. Benefits: cited_text doesn't count toward output tokens (cost savings), significantly higher citation accuracy than prompt-based techniques (up to 15% better recall), prevents invalid source citations. Supported models: Sonnet 4.5, Sonnet 3.7, Sonnet 3.5v2, Haiku 3.5, Opus 4 (not Haiku 3). Real-world: Endex reduced hallucinations from 10% to 0%, 20% more references per response. Essential for RAG systems requiring verifiable outputs.
Set citations.enabled=true on each document block in your messages. Must be enabled on ALL documents in a request (uniform requirement). Structure: {"type": "document", "source": {"type": "text", "media_type": "text/plain", "data": "content"}, "title": "Optional", "citations": {"enabled": true}}. Supported document types: plain text (text/plain), PDF (application/pdf with base64), custom content blocks. Plain text auto-chunks into sentences, PDFs extract and chunk text. Claude Sonnet 3.7 may require explicit instruction: "Use citations to back up your answer." Available via Anthropic API and Google Cloud Vertex AI.
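A small builder for the document block shape described above (helper name is illustrative); remember that every document in the same request must enable citations uniformly.

```python
def citable_document(text, title=None):
    # Plain-text document block with citations enabled; the API
    # auto-chunks plain text into sentences for citation spans.
    doc = {"type": "document",
           "source": {"type": "text",
                      "media_type": "text/plain",
                      "data": text},
           "citations": {"enabled": True}}
    if title:
        doc["title"] = title
    return doc

doc = citable_document("Paris is the capital of France.", title="Facts")
```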
Citations appear in response content blocks with type-specific formats. Plain text: char_location type with cited_text, document_index (0-indexed), start_char_index/end_char_index (0-indexed, exclusive end). PDF: page_location type with cited_text, document_title, start_page_number/end_page_number (1-indexed). Custom content: content_block_location type with block indices (0-indexed). All include document_index and cited_text. Streaming: citations_delta events append to citation arrays. Context field: non-citable metadata you can add to documents. Citations reference exact source material enabling verification.
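Given the char_location fields, the cited passage can be recovered (and so verified against cited_text) by slicing, since start indices are 0-indexed and the end index is exclusive. A sketch:

```python
def cited_span(documents, citation):
    # documents: list of plain-text document strings, in request order.
    # citation: a char_location citation object from the response.
    doc_text = documents[citation["document_index"]]
    return doc_text[citation["start_char_index"]:citation["end_char_index"]]

docs = ["Paris is the capital of France."]
citation = {"type": "char_location", "document_index": 0,
            "start_char_index": 0, "end_char_index": 5,
            "cited_text": "Paris"}
# cited_span(docs, citation) should equal citation["cited_text"].
```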
Use Citations API when: building RAG systems requiring verifiable outputs, processing long documents (research papers, legal docs, technical specs), need cost optimization (cited text is free), want higher accuracy (15% better recall than prompts), preventing hallucinations is critical. API advantages: superior citation quality, prevents invalid sources, reduces output tokens. Prompt-based techniques: lower model support, inconsistent formats, higher hallucination risk, cited text counts as tokens. Citations API essential for production applications where source verification matters (legal, medical, financial, compliance). Trade-off: Citations requires document preprocessing (chunking).
Constitutional AI (CAI) principles ensure safety and harmlessness through self-supervised training. Claude trained to refuse harmful requests, avoid bias, admit uncertainty, respect privacy. No separate moderation API needed (built-in safety). Core principles: harmlessness, helpfulness, honesty. Claude declines: illegal activities, dangerous instructions (weapons, drugs), hate speech, personal impersonation, private data extraction, deceptive content. Jailbreak-resistant through adversarial training. 2025 capabilities: cannot identify individuals in images (privacy), responsible AI practices, transparent reasoning via extended thinking. Anthropic's Acceptable Use Policy must be followed. Safety balanced with usefulness for legitimate applications (research, education, business). More robust refusals than competitors while maintaining practical helpfulness. Training includes human feedback and self-critique.
Rate limits measured in RPM (requests per minute), ITPM (input tokens per minute), OTPM (output tokens per minute) per model class. 2025 tiers: Tier 1 (50 RPM, 20K-50K ITPM, 4K-10K OTPM), Tier 2 (1,000 RPM, 40K-100K ITPM, 8K-20K OTPM, requires $40 balance for 7 days), up to Tier 4 with automatic advancement. Long context (>200K tokens) has separate limits with context-1m-2025-08-07 beta header. Response headers: anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining. Error: 429 status with retry-after header (exact seconds to wait). Handling: exponential backoff with jitter (delay *= 2), respect retry-after header, max 3-5 retries, queue requests, circuit breaker pattern. Monitor in Claude Console. Batch API for non-urgent requests (50% cost reduction). View tier/limits in console, contact sales for Priority Tier. Production: implement retry logic with capped exponential backoff.
Agent Skills are directories with SKILL.md files (YAML frontmatter + Markdown instructions) plus optional scripts/resources that give Claude specialized capabilities. Enable via beta headers: betas=['code-execution-2025-08-25', 'skills-2025-10-02']. Progressive disclosure: Claude loads metadata first (name, description), then full content only when needed. Use in API: container={'skills': [{'type': 'anthropic', 'skill_id': 'xlsx', 'version': 'latest'}]}. Max 8 skills per request. Anthropic provides pre-built skills (Excel, PowerPoint, Word, PDF). Create custom skills via POST /v1/skills endpoint. Available across Claude.ai, Claude Code, API. Requires code execution tool.
Prefilling (response prefilling) starts the assistant message with initial content to guide Claude's response format or style. Specify assistant role message with partial content, Claude continues from there. Use cases: forcing JSON output (start with {"result":), enforcing specific format/structure, controlling tone, preventing apologies/disclaimers, guiding XML structure. Example: {"role": "assistant", "content": "{\"result\":"} forces valid JSON response. Works with any format: JSON, XML, CSV, code blocks, specific phrasing. More reliable than prompt instructions alone. Reduces unnecessary refusals for edge cases. Combine with system prompt for role and prefilling for exact format. Essential technique for structured outputs in production applications. Supported across all Claude models (Sonnet, Opus, Haiku). More effective than temperature/top_p adjustments for format control.
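A sketch of the message shape and the reassembly step (helper names are illustrative). One easy-to-miss detail: the response continues from the prefix but does not repeat it, so you prepend the prefix yourself before parsing.

```python
def prefill_json_messages(user_text, prefix='{"result":'):
    # The final message in the array uses the assistant role with
    # partial content; Claude continues from exactly that prefix.
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": prefix},
    ]

def assemble(prefix, completion):
    # Rejoin the prefix with the model's continuation.
    return prefix + completion

msgs = prefill_json_messages("Classify the sentiment of: 'love it'")
```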
Claude Opus 4.1 (August 2025): highest reasoning, complex tasks requiring deep analysis (research, large codebases, sustained focus), $15/$75 per million tokens, slower responses, best for nuanced work. Claude Sonnet 4.5 (September 2025): best coding model, balanced performance/speed, most popular, $3/$15 per million tokens, good for most production tasks (APIs, agents, analysis). Claude Haiku 4.5 (October 2025): fastest, cheapest, Sonnet-level coding performance at 1/3 cost and 2x+ speed, $1/$5 per million tokens, instant responses, high-volume tasks (chatbots, simple classification). All models: 200K context window, vision (up to 100 images API), extended thinking, tool use with parallel calling, multilingual, training cutoff March 2025. Choose based on: complexity (Opus), balanced production needs (Sonnet), speed/cost optimization (Haiku). Start with Sonnet, upgrade to Opus for complex reasoning, downgrade to Haiku for cost savings. Combine with prompt caching (90% savings) and Batch API (50% savings) for maximum cost optimization.
Include previous messages in messages array in chronological order. Pattern: [old user, old assistant, new user]. 2025 updates: consecutive messages same role auto-merged, first message can be assistant (flexible turn order), plain text/PDF documents supported. Management strategies: sliding window (keep recent N messages, drop oldest), summarization (compress older context into system prompt), full history (all messages within 200K context limit), selective inclusion (relevant messages only). Calculate tokens to stay within limits (use token counting). System prompt persistent across all turns (higher priority than messages). Best practices: cache static context (system prompt, knowledge base) for 90% cost reduction, clear context when switching topics/users, use extended thinking for complex multi-turn reasoning, implement tool use for stateful operations. Thread IDs for organization (user-managed). Balance: context relevance vs token costs. For long conversations, summarize history periodically.
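A minimal sketch of the sliding-window strategy. The trim back to a user turn is a simplifying choice (assistant-first is allowed in 2025, but starting on a user turn keeps tool_use/tool_result pairs from being split).

```python
def sliding_window(messages, keep=10):
    # Keep the most recent `keep` messages, then drop leading
    # non-user turns so the window opens on a user message.
    window = messages[-keep:]
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window
```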
Pay per token: input tokens (prompt/messages/system/images) and output tokens (response). 2025 pricing: Claude Haiku 4.5 ($1/$5 per million input/output), Claude Sonnet 4.5 ($3/$15), Claude Opus 4.1 ($15/$75). Pricing optimizations: prompt caching (cache reads 90% discount at 10% base price, cache writes 25% premium), Batch API (50% discount for async processing <24h), Citations API (cited text free, doesn't count toward output tokens), extended thinking (lower rate than regular tokens). Image pricing: based on token count from image size. Monitor: usage dashboard in Claude Console with cost breakdown by model. No free tier (new accounts get credits). Optimization strategies: choose right model (Haiku for speed/cost, Sonnet for balance, Opus for complexity), enable prompt caching for repeated content (90% savings), use Batch API for non-urgent tasks (50% savings), set max_tokens to prevent runaway costs, compress prompts (remove unnecessary whitespace), combine caching + batching for 95% total savings. Rate limits tier-based, upgrade with spending.
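Back-of-envelope cost math using the per-million prices above. The model ids are placeholders, and the sketch deliberately omits the 25% cache-write premium and image tokens.

```python
# USD per million tokens: (input, output), from the pricing listed above.
PRICES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-5": (3.00, 15.00),
    "claude-opus-4-1": (15.00, 75.00),
}

def estimate_cost(model, input_tokens, output_tokens,
                  cached_input_tokens=0, batch=False):
    inp, out = PRICES[model]
    cost = (input_tokens * inp
            + cached_input_tokens * inp * 0.10  # cache reads: 10% of base
            + output_tokens * out) / 1_000_000
    return cost * 0.5 if batch else cost        # Batch API: 50% discount
```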
Set stream: true in request for Server-Sent Events (SSE) streaming. Event sequence: message_start (metadata), content_block_start (block begins), content_block_delta (incremental content chunks), content_block_stop (block ends), message_delta (usage/stop_reason), message_stop (completion). Parse delta events, accumulate content progressively. 2025 features: fine-grained streaming for tool parameters (beta), extended thinking streaming (see reasoning in real-time), citations_delta events (citation references), streaming with multiple content blocks. Benefits: progressive display, lower perceived latency, better UX, real-time feedback. Implementation: use official SDKs (Python anthropic-sdk-python, TypeScript anthropic-sdk-typescript) for automatic handling, or parse SSE manually. Handle: reconnection logic, error events, partial content assembly. Use for: chat interfaces, long-form content generation, real-time applications, agent reasoning display. SDKs strongly recommended for streaming mode. Different event structure than OpenAI (content_block_delta vs choices.delta).
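A sketch of the accumulation step, assuming the SSE events have already been parsed into dicts (the SDKs do the wire-level parsing for you): collect text_delta chunks from content_block_delta events and ignore the rest.

```python
def accumulate_text(events):
    # Fold a stream of parsed events into the final response text.
    parts = []
    for event in events:
        if event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "text_delta":
                parts.append(delta["text"])
            # Other delta types (e.g. citations_delta) handled elsewhere.
    return "".join(parts)

fake_stream = [
    {"type": "message_start"},
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop"},
    {"type": "message_stop"},
]
```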
System prompts set Claude's role, behavior, and guidelines using the top-level 'system' parameter (separate from 'messages' array) in Messages API. Applied before all messages with higher priority than user messages, influencing every response in conversation. Use for: role definition ("You are an expert Python developer"), behavior guidelines (tone, style, constraints), expertise context (domain knowledge), output format requirements (JSON, XML, Markdown), safety constraints, agent instructions. Best practices: clear and specific instructions, comprehensive role definition, consistent persona across turns, include examples and guidelines, use XML tags for structure (e.g., <instructions>, <examples>), and cache long static system prompts with prompt caching for 90% cost savings.
System prompts use top-level 'system' parameter vs messages in 'messages' array. Structure: system (single string or text blocks) vs messages (array of role/content objects with user/assistant roles). Processing: system applied first before all messages, influences every response vs messages processed sequentially in conversation flow. Privilege: system has higher priority than user messages (Claude follows system instructions over conflicting user requests). Persistence: system remains constant across all conversation turns vs messages accumulate chronologically. API structure: system parameter separate and independent from messages array. Caching: system prompts ideal for caching (90% cost savings with cache_control, static content) vs messages typically dynamic and uncached. Best practice: use system for role definition, behavior guidelines, output format, persistent context; use messages for specific user queries, conversation flow, dynamic content. Combine both: system sets the stage, messages drive conversation. Example: system="You are a Python expert", messages=[{role: "user", content: "Debug this code"}].
Error responses: 400 (invalid request, malformed JSON), 401 (authentication failed), 403 (forbidden, permissions), 404 (not found), 429 (rate limit exceeded, RPM/ITPM/OTPM), 500 (server error), 529 (overloaded). Parse error object with type, message, error details. Retry strategy: retry on transient errors (429, 500, 529), don't retry client errors (400, 401, 403, 404). Implement exponential backoff with jitter: delay *= 2 with random jitter to prevent thundering herd. Respect retry-after header (429 errors specify exact wait time in seconds). Max retries: 3-5 attempts. 2025 best practices: circuit breaker pattern (timeout 30s, error threshold 50%, reset timeout 30s), connection pooling, request caching, optimized payload sizes. Configuration example: retryEnabled: true, maxRetries: 3, retryDelay: 1000ms, exponential backoff enabled. For quick command-line testing, curl's --retry flag covers simple cases. Log all errors for monitoring/debugging. Official SDKs (Python, TypeScript) include automatic retry logic with configurable settings. Production requirements: implement retry wrapper, monitor error rates, set up alerting, fallback strategies. Essential for production reliability.
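A minimal sketch of the retry wrapper described above, with exponential backoff, full jitter, and retry-after support. TransientError is a stand-in for however your HTTP layer surfaces 429/500/529 responses; the official SDKs implement this for you.

```python
import random
import time

class TransientError(Exception):
    # Stand-in for a 429/500/529 response; carries retry-after seconds
    # when the server provided one.
    def __init__(self, retry_after=None):
        super().__init__("transient API error")
        self.retry_after = retry_after

def call_with_retries(fn, max_retries=4, base_delay=1.0):
    # Retry only transient failures; client errors (400/401/403/404)
    # should raise a different exception type and propagate immediately.
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError as err:
            if attempt == max_retries:
                raise
            if err.retry_after is not None:
                delay = err.retry_after          # server told us exactly
            else:
                # Exponential backoff with full jitter.
                delay = base_delay * (2 ** attempt) * random.random()
            time.sleep(delay)
```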
Official SDKs (2025): Python (anthropic-sdk-python, requires 3.8+), TypeScript/JavaScript (anthropic-sdk-typescript), both with async support and type hints. Claude Agent SDK: Python (claude-agent-sdk, requires 3.10+), TypeScript (@anthropic-ai/claude-agent-sdk), enables agents to interact with local computing (file operations, terminal commands, web search). SDK features: streaming vs single-call modes, automatic retries with exponential backoff, timeout handling, error management, SSE parsing, type safety, authentication handling. Anthropic tools: Workbench (test/iterate prompts interactively), Console (manage API keys, monitor usage/costs, view tier limits), Evaluation API (test prompt variants with metrics), Prompt Generator (create optimized prompts), Citations builder (enable verifiable answers). Community integrations: LangChain, LlamaIndex, semantic kernel frameworks. SDK usage strongly recommended over direct HTTP (abstracts complexity, handles edge cases, maintains compatibility). Open source on GitHub with comprehensive documentation and examples. Essential for production development.
Claude supports 100+ languages with varying proficiency levels through multilingual training. Strongest languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese (Simplified/Traditional), Russian, Arabic, Hindi. Capabilities: translation between languages, answer questions in specific language, mixed-language conversations (code-switching), multilingual document analysis, cross-language reasoning. Best practices: specify target language in system prompt ("Respond in Spanish") or user message, provide examples in target language for better quality, use extended thinking for complex multilingual tasks. Quality factors: higher for well-represented languages (Western European, East Asian), varies for low-resource languages, maintains context across language switches. Use cases: international customer support, multilingual content generation, translation services, global applications, multi-market documentation. Context window (200K tokens) supports all Unicode including non-Latin scripts (Arabic, Chinese, Japanese, Cyrillic, etc.). No specialized language-specific models needed. Training cutoff March 2025 includes recent language usage patterns.
Production best practices 2025: Model selection: start with Sonnet 4.5 (balanced), upgrade to Opus 4.1 (complex reasoning), downgrade to Haiku 4.5 (cost/speed). Error handling: implement exponential backoff with jitter, retry on 429/500/529, max 3-5 retries, respect retry-after header, circuit breaker pattern, monitor error rates. Cost optimization: enable prompt caching for static content (90% savings), use Batch API for non-urgent tasks (50% savings), set max_tokens limits to prevent runaway costs, choose right model per task, compress prompts (remove whitespace), combine caching + batching (95% total savings). Security: protect API keys (environment variables, secrets management), validate all user inputs, sanitize outputs, implement rate limiting, content filtering for user-generated content, follow Acceptable Use Policy. Monitoring: track latency (p50, p95, p99), error rates by type, costs per request/model, usage by tier limits, cache hit rates. Architecture: use official SDKs (Python/TypeScript), connection pooling, request queuing, fallback strategies (model downgrade, cached responses), load balancing. Testing: test with Haiku first (cheaper), use Workbench for prompt iteration, implement A/B testing with Evaluation API, gradual rollout. Logging: log all requests/responses for debugging, include request IDs, monitor token usage. Production checklist: latest SDK versions, retry logic implemented, caching enabled, monitoring/alerting configured, security hardened, costs budgeted/tracked.
Claude (2025): 200K context window standard (vs GPT-4o 128K), extended thinking with visible reasoning process (transparency), Constitutional AI for safety (no separate moderation API needed), prompt caching (90% cost reduction on cached reads), Citations API (verifiable sources, cited text free), Agent Skills (specialized capabilities), prefilling support (force response format), strong at long document analysis and code, training cutoff March 2025. Models: Sonnet 4.5 ($3/$15), Opus 4.1 ($15/$75), Haiku 4.5 ($1/$5). OpenAI GPT: larger ecosystem and integrations, multimodal generation (DALL-E images, TTS audio), Whisper (speech-to-text), fine-tuning support (GPT-4o, GPT-3.5), Assistants API with threads/files, GPT Store (custom GPTs), Realtime API (voice), broader community adoption. Models: GPT-4o, o1 (reasoning), o3-mini. Claude advantages: longer context, transparent thinking, better safety without separate moderation, cost optimization features, superior long-form analysis. GPT advantages: image/audio generation, fine-tuning, established ecosystem, voice capabilities. Choose Claude for: long documents (200K context), verifiable answers (Citations), cost optimization (caching), safety-critical applications, coding/analysis. Choose GPT for: multimodal generation, fine-tuning needs, voice applications, broader ecosystem integration. Both: vision, function calling, streaming, production-ready APIs.