S3 cost optimization is critical for data-intensive apps. Storage class pricing (2025, per GB/month): Standard $0.023, Standard-Infrequent Access (IA) $0.0125, Glacier Instant Retrieval $0.004, Glacier Flexible Retrieval $0.0036, Glacier Deep Archive $0.00099. Intelligent-Tiering: automatic cost optimization that monitors access patterns and moves objects between tiers (Frequent → Infrequent → Archive Instant, plus the opt-in Archive Access → Deep Archive Access tiers), with NO retrieval fees. Overhead: $0.0025 per 1,000 objects per month for monitoring. Enable the opt-in archive tiers: aws s3api put-bucket-intelligent-tiering-configuration --bucket my-bucket --id rule1 --intelligent-tiering-configuration '{"Id":"rule1","Status":"Enabled","Tierings":[{"Days":90,"AccessTier":"ARCHIVE_ACCESS"},{"Days":180,"AccessTier":"DEEP_ARCHIVE_ACCESS"}]}'. Best for: unpredictable access, objects >128KB, long-lived data. Lifecycle policies: rule-based transitions. Example: logs → IA after 30 days → Glacier Instant Retrieval after 90 days → delete after 365 days. Policy: {"Rules":[{"ID":"logs","Filter":{"Prefix":"logs/"},"Status":"Enabled","Transitions":[{"Days":30,"StorageClass":"STANDARD_IA"},{"Days":90,"StorageClass":"GLACIER_IR"}],"Expiration":{"Days":365}}]}. Best for: predictable patterns, compliance retention. Cost example: 100TB (100,000 GB), 1M objects. Standard: $2,300/month. Optimized (50% Frequent, 30% IA, 20% Archive Instant, including the monitoring fee): $1,607/month (30% savings). Gotchas: IA has a 30-day minimum storage duration and a 128KB minimum billable object size, so early deletion and small files cost more than expected.
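The cost example above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the per-GB prices quoted in the answer and treats 1 TB as 1,000 GB; the function and variable names are illustrative, and real bills also include request, retrieval and transfer charges that are not modeled here.

```python
# Per-GB monthly prices quoted above (USD); check current AWS pricing before relying on them.
PRICE_PER_GB = {"standard": 0.023, "ia": 0.0125, "archive_instant": 0.004}
MONITORING_PER_1000_OBJECTS = 0.0025  # Intelligent-Tiering monitoring fee


def monthly_cost(total_gb, mix, objects=0):
    """Return the monthly storage cost for a tier mix (fractions must sum to 1)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("tier fractions must sum to 1")
    storage = sum(total_gb * share * PRICE_PER_GB[tier] for tier, share in mix.items())
    monitoring = objects / 1000 * MONITORING_PER_1000_OBJECTS
    return storage + monitoring


total_gb = 100_000  # 100 TB, using 1 TB = 1,000 GB as in the example
baseline = monthly_cost(total_gb, {"standard": 1.0})
optimized = monthly_cost(
    total_gb,
    {"standard": 0.5, "ia": 0.3, "archive_instant": 0.2},
    objects=1_000_000,
)
savings = (baseline - optimized) / baseline
print(f"baseline ${baseline:,.2f}, optimized ${optimized:,.2f}, savings {savings:.1%}")
```

Changing the mix or the object count shows how quickly the monitoring fee matters for buckets with many small objects.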
AWS Serverless FAQ & Answers
API Gateway provides production capabilities without Lambda code changes. (1) Request validation: Define JSON Schema models, attach to methods, Gateway validates before Lambda invocation, returns 400 on invalid input automatically. Model: {"type":"object","properties":{"orderId":{"type":"string","pattern":"^[0-9]+$"},"amount":{"type":"number","minimum":0}},"required":["orderId","amount"]}. Attach: aws apigateway update-method --rest-api-id abc --resource-id xyz --http-method POST --patch-operations op=replace,path=/requestValidatorId,value=validator123. Benefit: reduce Lambda invocations, save cost. (2) Throttling: Protect backend from overload. Levels: Account (10,000 rps by default), Stage (custom per environment), Method (per-endpoint), Usage Plans (per API key). Configure stage-wide defaults: aws apigateway update-stage --rest-api-id abc --stage-name prod --patch-operations op=replace,path=/*/*/throttling/rateLimit,value=1000 op=replace,path=/*/*/throttling/burstLimit,value=500. Response: 429 Too Many Requests; clients should retry with backoff. (3) Caching: Cache GET responses, TTL 0-3600s, cache key = resource path plus any query params and headers you mark as cache keys. Enable: aws apigateway update-stage --rest-api-id abc --stage-name prod --patch-operations op=replace,path=/cacheClusterEnabled,value=true op=replace,path=/cacheClusterSize,value=0.5 (sizes range from 0.5GB to 237GB). Cost: from $0.02/hour for the 0.5GB size, rising with cache size. Invalidation: client adds Cache-Control: max-age=0 header (requires auth), Lambda sets Cache-Control: no-cache in response. Monitoring: CloudWatch metrics (CacheHitCount, CacheMissCount, 4XXError, Latency). Aim for a cache hit ratio >50%.
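The rate and burst limits above follow the token bucket model that API Gateway documents for throttling. The sketch below is a simplified, single-process illustration of that model, not API Gateway's implementation; the class name and the explicit `now` parameter are assumptions made so the example is deterministic.

```python
class TokenBucket:
    """Token bucket: `rate` tokens are added per second, up to `burst` tokens."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the last refill, in seconds

    def allow(self, now):
        """Return True if a request at time `now` is admitted, False for a 429."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2, burst=3)
# Five requests arrive at t=0: the first three use the burst, the rest are throttled.
first_wave = [bucket.allow(0.0) for _ in range(5)]
# One second later, two tokens have been refilled.
second_wave = [bucket.allow(1.0) for _ in range(3)]
print(first_wave, second_wave)
```

The burst limit caps how many requests can be admitted at once, while the rate limit sets how fast capacity recovers afterwards.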
Lambda cold starts occur when AWS creates a new execution environment for your function. Causes: (1) First invocation after deployment, (2) Scaling up - new concurrent request exceeds warm instances, (3) Inactivity - idle environments are recycled after a period without invocations (commonly observed at roughly 15-45 minutes, though AWS does not document a fixed value), (4) Code/configuration changes trigger new environments. Cold start phases: Init duration (container initialization + runtime loading + function initialization code outside handler). Typical 2025 cold start times: Node.js 200-400ms, Python 300-500ms, Go 150-250ms, Java 1-3 seconds (without SnapStart), .NET 800ms-1.5s (without SnapStart). Impact: P99 latency spike, poor user experience for latency-sensitive APIs. Measurement: CloudWatch Logs show an Init Duration field in the REPORT line of cold invocations only. Use AWS X-Ray to visualize cold start percentage. Monitor with CloudWatch Logs Insights: filter @type = "REPORT" | stats count(@initDuration) / count(*) * 100 as coldStartPercentage. Warm starts: 5-50ms (handler execution only, no init). VPC functions (2019+ with Hyperplane ENIs): cold start penalty now <100ms vs 10+ seconds before 2019.
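The same cold start percentage can be computed offline from exported log lines, since only cold invocations carry an Init Duration field in their REPORT line. This is a minimal sketch; the sample lines are made up for illustration and the function name is hypothetical.

```python
import re

INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def cold_start_stats(log_lines):
    """Return (cold_start_percentage, init_durations_ms) for Lambda REPORT lines."""
    reports = [line for line in log_lines if line.startswith("REPORT ")]
    inits = [float(m.group(1)) for line in reports if (m := INIT_RE.search(line))]
    pct = 100.0 * len(inits) / len(reports) if reports else 0.0
    return pct, inits


# Illustrative sample: two cold invocations out of four.
sample = [
    "START RequestId: a1 Version: 3",
    "REPORT RequestId: a1\tDuration: 12.50 ms\tBilled Duration: 13 ms\tMemory Size: 512 MB\tMax Memory Used: 80 MB\tInit Duration: 312.40 ms",
    "REPORT RequestId: a2\tDuration: 8.10 ms\tBilled Duration: 9 ms\tMemory Size: 512 MB\tMax Memory Used: 80 MB",
    "REPORT RequestId: a3\tDuration: 7.90 ms\tBilled Duration: 8 ms\tMemory Size: 512 MB\tMax Memory Used: 81 MB",
    "REPORT RequestId: a4\tDuration: 15.00 ms\tBilled Duration: 15 ms\tMemory Size: 512 MB\tMax Memory Used: 82 MB\tInit Duration: 298.10 ms",
]
pct, inits = cold_start_stats(sample)
print(f"{pct:.0f}% cold starts, init durations: {inits}")
```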
Express workflows orchestrate high-volume, short-duration event processing. Characteristics: (1) Duration: Max 5 minutes per execution. (2) Execution model: At-least-once for asynchronous Express workflows (synchronous Express workflows are at-most-once) - executions/tasks may run multiple times, so they must be idempotent. (3) Throughput: up to 100,000 executions/second per account (50× higher than Standard). (4) Execution history: Sent to CloudWatch Logs when logging is enabled, not stored by Step Functions itself. (5) Service integrations: All integrations supported EXCEPT .sync (job-run) and .waitForTaskToken (callback). (6) Parallel processing: Map state fans out over items; inline Map runs roughly 40 concurrent iterations, and higher concurrency requires Distributed Map. Pricing: $1.00 per 1 million executions + $0.00001667 per GB-second of duration. Example: 2-second execution, 512MB = $0.000001 (request) + 2 × 0.5 × $0.00001667 = $0.0000167 (duration), about $0.0000177 per execution or $17.67 per million. Best for: (1) IoT data ingestion/processing (high-volume sensors). (2) Streaming data transformations (Kinesis, DynamoDB Streams). (3) Mobile backend APIs (user actions). (4) High-volume event-driven workloads (>1,000 exec/s). (5) Cost optimization - $1 per million executions plus duration, vs $25 per million state transitions for Standard. Requirements: All tasks must be idempotent (use DynamoDB conditional writes, idempotency tokens). Trade-off: 50× throughput and much lower cost for short multi-state workflows, but you lose built-in execution history and the exactly-once guarantee.
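Working the Express pricing through in code makes the per-execution and per-million figures easy to check. This is a rough sketch that uses only the two prices quoted above and ignores any billing rounding or volume tiers; the function name is illustrative.

```python
REQUEST_PRICE_PER_MILLION = 1.00            # USD per million executions, as quoted above
DURATION_PRICE_PER_GB_SECOND = 0.00001667   # USD per GB-second, as quoted above


def express_cost(executions, seconds, memory_mb):
    """Estimate Express workflow cost: request charge plus duration charge."""
    requests = executions / 1_000_000 * REQUEST_PRICE_PER_MILLION
    duration = executions * seconds * (memory_mb / 1024) * DURATION_PRICE_PER_GB_SECOND
    return requests + duration


per_execution = express_cost(1, seconds=2, memory_mb=512)
per_million = express_cost(1_000_000, seconds=2, memory_mb=512)
print(f"per execution: ${per_execution:.7f}, per million: ${per_million:.2f}")
```

For this workload the duration charge dominates the request charge, so shortening executions saves more than reducing their count.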
Lambda Powertools (official AWS library for Python, TypeScript, Java, .NET) provides production-ready utilities for observability and best practices. Zero external dependencies. Core features: (1) Structured logging - JSON logs with correlation IDs, automatic context (function name, request ID, cold start), log sampling (log 10% of success, 100% of errors). Example: logger.info('Processing', {orderId}) → {"timestamp":"2025-01-15T10:30:00Z","level":"INFO","message":"Processing","function_name":"orderProcessor","cold_start":true,"orderId":"456"}. (2) Distributed tracing - auto-instrumentation with X-Ray, capture cold starts, custom annotations, trace external calls (HTTP, DynamoDB, S3). (3) Metrics - CloudWatch Embedded Metric Format (EMF), zero-latency custom metrics, automatic flushing. Example: metrics.addMetric('OrderProcessed', MetricUnits.Count, 1). (4) Idempotency - built-in duplicate prevention with DynamoDB, decorator-based, configurable TTL. (5) Validation - input/output schema validation (JSON Schema, Pydantic), automatic error responses. (6) Batch processing - SQS/Kinesis utilities, automatic error handling, partial batch failures. Benefits: Reduces boilerplate 10-20 lines → 2-3 decorators, <5ms overhead, FREE (AWS-native). Alternative: Datadog/New Relic ($100+/month).
CodeDeploy automates gradual traffic shifting with automatic rollback. Deployment configurations (all prefixed with CodeDeployDefault.): (1) LambdaCanary10Percent5Minutes - shift 10% immediately, wait 5 minutes, shift remaining 90%. (2) LambdaCanary10Percent15Minutes - 10% shift, 15 min wait, 90% shift. (3) LambdaLinear10PercentEvery3Minutes - shift 10% every 3 minutes until 100% (about 30 minutes in total). (4) LambdaAllAtOnce - immediate 100% shift (for low-risk changes). Setup: (1) Create CodeDeploy application: aws deploy create-application --application-name myApp --compute-platform Lambda. (2) Create deployment group with rollback alarms: aws deploy create-deployment-group --application-name myApp --deployment-group-name prod --service-role-arn <role> --deployment-config-name CodeDeployDefault.LambdaCanary10Percent5Minutes --alarm-configuration 'enabled=true,alarms=[{name=myFunc-Errors}]' --auto-rollback-configuration 'enabled=true,events=DEPLOYMENT_FAILURE,DEPLOYMENT_STOP_ON_ALARM'. (3) Deploy, passing the AppSpec as an escaped JSON string: aws deploy create-deployment --application-name myApp --deployment-group-name prod --revision '{"revisionType":"AppSpecContent","appSpecContent":{"content":"{\"version\":0.0,\"Resources\":[{\"myFunc\":{\"Type\":\"AWS::Lambda::Function\",\"Properties\":{\"Name\":\"myFunc\",\"Alias\":\"prod\",\"CurrentVersion\":\"1\",\"TargetVersion\":\"2\"}}}]}"}}'. Automatic rollback triggers: (1) CloudWatch Alarm breaches (for example error rate >1%, duration >1000ms). (2) Deployment failure. Monitoring: CodeDeploy console shows live traffic shift progress, CloudWatch metrics per version, X-Ray traces for comparison. Cost: CodeDeploy for Lambda is FREE.
Single-table design stores multiple entity types in one table to minimize round-trips and cost. Pattern: Composite keys (PK/SK) with overloaded attributes. Example schema: PK='USER#123', SK='PROFILE' → user profile. PK='USER#123', SK='ORDER#456' → user's order. PK='ORDER#456', SK='ITEM#789' → order item. PK='PRODUCT#abc', SK='METADATA' → product details. Query patterns (pseudocode): (1) Get user + all orders: query({PK: 'USER#123'}) returns PROFILE + all ORDER#* items. (2) Get order + all items: query({PK: 'ORDER#456'}) returns all items stored under the order's partition. (3) Get specific order: get({PK: 'USER#123', SK: 'ORDER#456'}). GSI for alternate access: Need to query by email? Add a GSI with GSI_PK='EMAIL#<email address>', GSI_SK='USER#123'. Benefits: (1) Atomic transactions - write user + order in single transactWrite (max 100 items, 4MB). (2) Cost reduction - 1 table = 1 set WCU/RCU vs N tables. (3) Single query performance - fetch related items together, no multiple round-trips. (4) Simplified operations - one table to backup, monitor, scale. Item collections: Group related items with same PK, query with SK conditions: query({PK: 'USER#123', SK: {begins_with: 'ORDER#'}}) gets all orders.
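The access patterns above can be illustrated without DynamoDB by modeling the table as a list of items and a query as a partition-key match with an optional sort-key prefix. This is an in-memory sketch only: the `query` helper and the non-key attributes are hypothetical, and it mimics the Query semantics (one partition, results ordered by sort key) rather than calling any AWS API.

```python
TABLE = [
    {"PK": "USER#123", "SK": "PROFILE", "name": "Ada"},
    {"PK": "USER#123", "SK": "ORDER#456", "total": 40},
    {"PK": "USER#123", "SK": "ORDER#457", "total": 15},
    {"PK": "ORDER#456", "SK": "ITEM#789", "qty": 2},
    {"PK": "PRODUCT#abc", "SK": "METADATA", "title": "Widget"},
]


def query(pk, sk_begins_with=""):
    """Mimic a DynamoDB Query: one partition key, optional sort-key prefix, sorted by SK."""
    hits = [i for i in TABLE if i["PK"] == pk and i["SK"].startswith(sk_begins_with)]
    return sorted(hits, key=lambda i: i["SK"])


everything = query("USER#123")          # profile plus all orders in one request
orders = query("USER#123", "ORDER#")    # orders only, via the sort-key prefix
print([i["SK"] for i in everything])
print([i["SK"] for i in orders])
```

Because results come back ordered by sort key, the choice of SK prefixes also decides how related entities are grouped in a single response.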
Production best practices: (1) Event versioning - include version in detail-type: OrderPlaced.v2, support multiple versions during migration: Rule 1 matches OrderPlaced.v1 → LegacyLambda, Rule 2 matches OrderPlaced.v2 → NewLambda. Run both until v1 deprecated. (2) Correlation IDs - add correlationId to all events for distributed tracing: {correlationId: 'req-abc123', causationId: 'evt-456', ...}. Track request flow across services via X-Ray. (3) Idempotency - targets must handle duplicates (EventBridge guarantees at-least-once delivery). Use idempotency keys in DynamoDB or AWS Powertools. (4) Schema validation - use Schema Registry to catch breaking changes pre-deployment. CI/CD checks: aws schemas describe-schema + diff against previous version. (5) Monitoring - CloudWatch metrics: FailedInvocations (alert if >0), ThrottledRules (indicates rate limiting), Invocations (track volume). X-Ray for end-to-end traces. (6) Content-based filtering - reduce Lambda invocations via rule filters: only process amount > 1000 instead of filtering in Lambda. Saves cost + improves performance. (7) Error handling - configure DLQ per target, alert on DLQ depth >0, manual investigation required. Limitations: Rules match syntax (no regex, limited operators), 5 targets max (use Step Functions to fan-out to >5).
Avoid single-table when: (1) Different access patterns - if entities rarely queried together (users vs analytics logs), separate tables simpler. Multi-table: query users separately from logs. Single-table: complicates schema with no benefit. (2) Different scaling needs - hot partition if one entity dominates traffic. Example: USER entities get 1000 req/s, LOG entities get 10 req/s - single table creates hot partition on USER keys. Separate tables enable independent scaling (provision more capacity for Users table only). (3) Team boundaries - different teams own different entities, separate tables for clear ownership, permissions (IAM policies per table), independent deployments. (4) Compliance requirements - data residency or retention differ. Example: user data kept 7 years (GDPR), logs kept 90 days. Separate tables with different TTL/backup policies. (5) Simple CRUD applications - no complex relationships or access patterns. Multi-table easier to understand, no overloaded PK/SK complexity. (6) Large items - if individual items approach 400KB limit, single-table amplifies issue. Separate tables or store large data in S3. Complexity trade-off: Single-table: harder to reason about schema, application code does joins. Multi-table: multiple round-trips, higher cost, simpler schema. Best practice: Start with single-table if <5 entity types and clear access patterns, migrate to multi-table if complexity grows or different scaling/retention needed.
SnapStart (launched for Java in late 2022, expanded to Python and .NET in 2024) reduces cold starts by caching initialized function state. Mechanism: (1) During version publishing, AWS executes initialization code once (imports, client connections, warm-up logic). (2) Takes snapshot of memory + disk state after initialization. (3) Encrypts and stores snapshot. (4) On invocation, restores from snapshot instead of reinitializing. Performance: Java: 3-10s cold start → <200ms (10-15× faster). Python 3.12+: 500ms → 150-300ms (2-3× faster). .NET 8+: 1.5s → 300-500ms (3-5× faster). **Supported runtimes:** Java 11/17/21, Python 3.12+, .NET 8+. **Enable:** aws lambda update-function-configuration --function-name myFunc --snap-start ApplyOn=PublishedVersions. **Cost:** no additional charge for Java; Python and .NET incur snapshot caching and restore charges. **Limitations:** Not supported with: ephemeral storage >512MB, EFS, Provisioned Concurrency, container images. Important considerations: (1) Generate unique values AFTER restore (UUIDs, timestamps, random numbers). (2) Reconnect network connections in handler (cached connections stale). (3) /tmp storage cleared on restore. (4) Only works with published versions (not $LATEST). Regional availability varies by runtime, so check the Lambda documentation for your region.
**Choose Standard for:** (1) Long-running (>5 minutes to 1 year) - data processing pipelines, orchestration workflows. (2) Exactly-once requirements - payments, financial transactions, inventory updates (non-idempotent operations). (3) Human approval - wait for manual intervention via .waitForTaskToken. (4) Audit/compliance - full execution history required, queryable via console/API. (5) Low-moderate volume (<2,000 exec/s). (6) **Debugging needs** - visual execution history essential. **Choose Express for:** (1) **High-throughput** (>1,000 exec/s, up to 100,000/s) - IoT ingestion, streaming transformations. (2) Short-duration (<5 minutes) - API orchestration, event processing. (3) **Idempotent operations** - at-least-once acceptable, tasks handle retries gracefully. (4) **Cost optimization** at scale - $1 per million executions plus duration, vs $25 per million state transitions for Standard. (5) **Real-time processing** - mobile backends, user-triggered workflows. **Cost comparison:** 10-state workflow → Standard: 10 transitions × $0.000025 = $0.00025 per execution ($250 per million executions). Express: $1.00 per million executions + duration; at 2 seconds and 512MB that is about $17.67 per million, roughly 14× cheaper. **Best practice:** Start with Standard for predictable exactly-once behavior. Migrate to Express after optimizing for idempotency when throughput >1K/s. Cannot switch workflow type after creation - must recreate state machine.
**Choose SnapStart for:** (1) Variable traffic - unpredictable spikes, <100 req/min baseline. (2) **Cost-sensitive workloads** - no extra charge for Java; Python and .NET incur snapshot caching and restore charges. (3) **Acceptable P99 latency** - 150-300ms cold start OK. (4) **Supported runtimes** - Java, Python 3.12+, .NET 8+. (5) **Simple deployment** - one setting plus published versions, no capacity planning. **Choose Provisioned Concurrency for:** (1) **Strict latency SLAs** - require sub-50ms P99 latency. (2) **Consistent high traffic** - >100 req/s sustained, predictable patterns. (3) Unsupported runtimes - Node.js, Go, Ruby (no SnapStart). (4) Predictable traffic - can optimize costs via auto-scaling schedules. Cost comparison: SnapStart: $0 for Java. Provisioned: roughly $5-10/month per always-on instance (512MB-1GB) at $0.000004 per GB-second, plus invocation charges. Performance: SnapStart: 150-300ms cold start. Provisioned: 10-50ms (no cold start). Best practice: (1) Start with SnapStart for Java/Python/.NET - covers most use cases at little or no cost. (2) Monitor P99 latency - upgrade to Provisioned only if SnapStart insufficient. (3) The two are NOT compatible - you cannot enable both on the same function version, so pick one per function. When neither needed: Async workloads (SQS, EventBridge), batch processing, internal APIs with relaxed latency requirements.
Exactly-once processing prevents duplicate operations (payments, inventory). Architecture: SQS FIFO queue → Lambda → DynamoDB with idempotency table + transactional writes. Implementation: (1) SQS FIFO - ContentBasedDeduplication: true, deduplication window 5 minutes, prevents duplicate messages at source. (2) Idempotency table - DynamoDB table with messageId as partition key: {messageId: string, processedAt: number, ttl: number}, where ttl is a Unix timestamp in seconds. (3) Lambda handler: Check if processed: const existing = await dynamo.get({TableName: 'Idempotency', Key: {messageId}}). If exists, skip (already processed). If not exists, process + store atomically via transaction, with a condition on the Put so that a concurrent duplicate fails instead of overwriting: await dynamo.transactWrite({TransactItems: [{Put: {TableName: 'Idempotency', Item: {messageId, processedAt: Date.now(), ttl: Math.floor(Date.now() / 1000) + 86400}, ConditionExpression: 'attribute_not_exists(messageId)'}}, {Update: {TableName: 'Orders', Key: {orderId}, UpdateExpression: 'SET #status = :val', ExpressionAttributeNames: {'#status': 'status'}, ExpressionAttributeValues: {':val': newStatus}}}]}). The #status alias is needed because status is a DynamoDB reserved word; newStatus stands for whatever value your handler writes. (4) SQS visibility timeout - set to at least 6× Lambda timeout (e.g., 30s Lambda → 180s visibility) to prevent reprocessing during retries. (5) Lambda reserved concurrency - limit to 10-50 to prevent overwhelming downstream. (6) Dead Letter Queue - maxReceiveCount: 3, alerts on DLQ depth >0. Cost: DynamoDB transactions = 2× write cost but essential for correctness. AWS Powertools alternative: Built-in idempotency decorator handles table management automatically.
EventBridge enables loosely-coupled pub-sub systems. Architecture: Publishers → Event Bus → Rules (pattern matching) → Targets (Lambda, SQS, Step Functions). Core concepts: (1) Event buses - default bus (AWS services like S3, EC2), custom buses (your applications), partner buses (SaaS integrations). Isolate via separate buses per domain/environment. (2) Events - JSON payloads: {source, detail-type, detail}, max 256KB, immutable. (3) Rules - pattern match on content: {"source": ["order.service"], "detail-type": ["OrderPlaced"], "detail": {"amount": [{"numeric": [">", 1000]}]}}, route to 1-5 targets per rule. (4) Schema registry - versioned JSONSchema definitions, auto-discovery, code generation for type safety. Design patterns: (1) Domain events - OrderPlaced, PaymentProcessed, ShipmentDispatched - granular over aggregated, include correlationId for tracing. (2) Fan-out - single event → multiple targets (email, SMS, database), parallel processing, eventual consistency. (3) Event replay - archive events (retention configurable up to indefinite, $0.023/GB/month storage), replay for disaster recovery or new consumer onboarding. (4) Dead-letter queues - configured per target; by default EventBridge retries up to 185 times over 24 hours before sending to the DLQ. Limits: 300 rules per bus by default (soft limit, can be raised), 5 targets per rule. Cost: $1 per million custom events.
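To make the rule semantics concrete, here is a deliberately small matcher for the pattern shown above. It is an illustrative sketch that covers only exact-value lists, nested objects and the numeric operator; it is not the EventBridge matching engine, which supports more operators (prefix, anything-but, exists and others), and the function names are hypothetical.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le, "=": operator.eq}


def _matches_value(value, candidates):
    """True if `value` satisfies any entry in a pattern's candidate list."""
    for cand in candidates:
        if isinstance(cand, dict) and "numeric" in cand:
            spec = cand["numeric"]  # e.g. [">", 1000] or [">", 0, "<=", 5]
            pairs = zip(spec[0::2], spec[1::2])
            if isinstance(value, (int, float)) and all(OPS[op](value, bound) for op, bound in pairs):
                return True
        elif value == cand:
            return True
    return False


def matches(pattern, event):
    """Every key in the pattern must be present in the event and match."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif not _matches_value(event[key], expected):
            return False
    return True


pattern = {
    "source": ["order.service"],
    "detail-type": ["OrderPlaced"],
    "detail": {"amount": [{"numeric": [">", 1000]}]},
}
big = {"source": "order.service", "detail-type": "OrderPlaced", "detail": {"amount": 1500}}
small = {"source": "order.service", "detail-type": "OrderPlaced", "detail": {"amount": 500}}
print(matches(pattern, big), matches(pattern, small))
```

Fields that the pattern does not mention are ignored, which is why publishers can add new attributes to an event without breaking existing rules.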
Idempotency pattern ensures operations execute exactly once despite retries. (1) Race condition handling - DynamoDB conditional writes prevent concurrent executions: two Lambda instances process same message → both check idempotency → first writes successfully → second fails on condition, skips processing. (2) TTL for cleanup - set ttl attribute (Unix timestamp) to auto-delete old records after 24-72 hours, prevents table bloat, AWS deletes expired items within 48 hours. (3) Transaction atomicity - transactWrite ensures all-or-nothing: either both idempotency record AND business logic succeed, or both fail and retry. No partial state. (4) Network partition resilience - if Lambda times out mid-transaction, message returns to SQS (visibility timeout expires), next invocation checks idempotency table, sees record exists (if transaction completed), skips reprocessing. (5) Monitoring - track: ApproximateAgeOfOldestMessage (alert if >300s indicates backlog), ApproximateNumberOfMessagesVisible (queue depth), X-Ray traces show duplicate detections. (6) Cost optimization - DynamoDB transactions cost 2× standard writes but prevent duplicate charges/inventory errors. 1M requests = $2.50 transactions vs $1.25 standard ($1.25 premium for correctness). Edge cases covered: Concurrent executions (DynamoDB prevents), Lambda timeout (message requeues, idempotency prevents duplicate), Network failures (transaction rollback, safe to retry).
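The race-condition handling in point (1) can be sketched with an in-memory stand-in for the idempotency table. Everything here is hypothetical scaffolding (the store class, the exception and the handler are not AWS APIs); `put_if_absent` plays the role of a DynamoDB Put with the condition attribute_not_exists(messageId), and the business write is reduced to appending to a list. In the real design both writes belong to one transaction.

```python
class ConditionalCheckFailed(Exception):
    """Raised when a record for the message already exists."""


class IdempotencyStore:
    """In-memory stand-in for a DynamoDB table keyed by messageId."""

    def __init__(self):
        self._items = {}

    def put_if_absent(self, message_id, now, ttl_seconds=86400):
        # Same intent as a conditional Put: fail if the key is already present.
        if message_id in self._items:
            raise ConditionalCheckFailed(message_id)
        self._items[message_id] = {"processedAt": now, "ttl": now + ttl_seconds}


def handle(store, message, side_effects, now):
    """Process a message at most once; duplicates are detected and skipped."""
    try:
        store.put_if_absent(message["messageId"], now)
    except ConditionalCheckFailed:
        return "skipped"
    side_effects.append(message["orderId"])  # stands in for the business write
    return "processed"


store = IdempotencyStore()
effects = []
msg = {"messageId": "m-1", "orderId": "456"}
first = handle(store, msg, effects, now=1_700_000_000)
second = handle(store, msg, effects, now=1_700_000_005)  # redelivery of the same message
print(first, second, effects)
```

The ttl value is stored in epoch seconds, matching what DynamoDB TTL expects for automatic cleanup.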
Optimization strategies ranked by effectiveness: (1) SnapStart (Java 11/17/21, Python 3.12+, .NET 8+) - reduces cold starts from seconds to a few hundred milliseconds or less by restoring cached initialized snapshots, no extra charge for Java (Python and .NET incur caching and restore charges), enable via function configuration. Best for: unpredictable traffic, cost-sensitive. (2) Provisioned Concurrency - eliminates cold starts entirely (10-50ms response), pre-warms environments, costs $0.000004 per GB-second. Best for: user-facing APIs requiring <100ms P99, predictable traffic. (3) ARM64 (Graviton2) - 20% faster cold starts + 20% cost reduction vs x86_64, change architecture in function config. (4) Package size optimization - use Lambda Layers for dependencies, exclude dev dependencies, minify with esbuild/webpack, target <10MB. Each 1MB adds roughly 30-50ms. (5) Lazy loading - import heavy libraries inside handler functions only when needed: if (needsLibrary) { const lib = require('heavy-lib'); }. (6) Keep warm pings - EventBridge scheduled rule pings every 5 minutes, effectively free within the Lambda free tier (1M requests/month), simple but wastes invocations and only keeps one environment warm. Cost comparison: SnapStart on Java ($0) ≈ Keep-warm (~$0) < Provisioned (roughly $5-10/month per always-on 512MB-1GB instance). Performance: Provisioned (10-50ms) < SnapStart (150-300ms) < Optimized cold (200-500ms).
Provisioned Concurrency pre-initializes Lambda execution environments to eliminate cold starts. AWS keeps specified number of warm environments ready 24/7, ensuring consistent <50ms response times. **How it works:** (1) AWS pre-creates execution environments, (2) Runs all initialization code (loads runtime, imports, creates connections), (3) Keeps environments warm and ready, (4) Routes requests to pre-warmed instances (zero init duration). **Configuration:** aws lambda put-provisioned-concurrency-config --function-name myFunc --provisioned-concurrent-executions 10 --qualifier prod. Must target alias or version (not $LATEST). **Dynamic scaling:** Use Application Auto Scaling for schedule-based (e.g., 50 instances 9AM-5PM, 5 instances nights) or metric-based adjustments. **Pricing:** $0.000004 per GB-second + standard invocation charges. Example: 512MB function with 10 provisioned instances = 10 × 0.5GB × 2,592,000 seconds × $0.000004 ≈ $52/month baseline. Compute Savings Plans offer up to 17% discount. **Monitoring:** CloudWatch metric ProvisionedConcurrencyUtilization - alert if >70% (spillover to on-demand causes cold starts). Performance: Consistent 10-50ms latency, zero cold starts. Best for: User-facing APIs requiring <100ms P99 latency, predictable high-volume traffic (>100 req/s). Supports both x86_64 and ARM64 architectures.
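The baseline charge and the effect of scheduled scaling are straightforward to estimate. This is a rough sketch that uses the $0.000004 per GB-second figure quoted above and a 30-day month; it leaves out invocation and duration charges, and the function name is illustrative.

```python
PRICE_PER_GB_SECOND = 0.000004  # USD, as quoted above; varies by region and architecture


def provisioned_monthly_cost(instances, memory_mb, hours_per_day=24, days=30):
    """Estimate the provisioned concurrency charge for a fixed daily schedule."""
    gb_seconds = instances * (memory_mb / 1024) * hours_per_day * 3600 * days
    return gb_seconds * PRICE_PER_GB_SECOND


always_on = provisioned_monthly_cost(10, 512)
# Schedule from the text: 50 instances for 8 business hours, 5 instances otherwise.
scheduled = provisioned_monthly_cost(50, 512, hours_per_day=8) + provisioned_monthly_cost(5, 512, hours_per_day=16)
peak_all_day = provisioned_monthly_cost(50, 512)
print(f"10 always on: ${always_on:.2f}, scheduled: ${scheduled:.2f}, 50 always on: ${peak_all_day:.2f}")
```

Comparing the scheduled figure with the 50-instance always-on figure shows why schedule-based scaling is worth setting up for traffic with clear business hours.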
Use single-table when: (1) Entities frequently queried together - fetch user + orders + items in single query for performance. (2) Related access patterns - user profile + order history share partition key (PK='USER#123'). (3) Atomic transactions needed - write user + order in single transactWrite (max 100 items, 4MB). (4) Cost optimization - 1 table = 1 set WCU/RCU vs N tables with separate capacity. (5) <5 entity types with clear relationships - reduces complexity. Pattern: composite keys (PK/SK) with overloaded attributes. Example: PK='USER#123', SK='PROFILE' (user), SK='ORDER#456' (order). Benefits: single query performance, reduced cost, simplified operations (one backup/monitor), atomic multi-entity updates. Best for: e-commerce, SaaS apps, user-centric data models.
Blue-green deployments enable zero-downtime updates with instant rollback. Architecture: API Gateway → Lambda Alias → Weighted routing (v1: 90%, v2: 10%). Key components: (1) Lambda versions - immutable snapshots on publish, incrementing version numbers (1, 2, 3), referenced by qualified ARN; $LATEST is the mutable, unpublished copy. (2) Lambda aliases - pointers to versions with weighted routing config, e.g., prod alias: v1 (90%) + v2 (10%). (3) API Gateway integration - point stage to alias (not version): /prod → Lambda:myFunc:prod. Deployment steps: (1) Publish new version: aws lambda publish-version --function-name myFunc --description 'v2'. (2) Update alias with 10% traffic: aws lambda update-alias --function-name myFunc --name prod --routing-config 'AdditionalVersionWeights={"2"=0.1}'. (3) Monitor CloudWatch metrics (Errors, Duration by version). (4) Gradual shift: 10% → 25% → 50% → 100%. (5) Full cutover: aws lambda update-alias --function-name myFunc --name prod --function-version 2 --routing-config 'AdditionalVersionWeights={}'. (6) Instant rollback: aws lambda update-alias --function-name myFunc --name prod --function-version 1 --routing-config 'AdditionalVersionWeights={}' (takes effect within seconds). Cost: Free (versions and aliases have no charge), Provisioned Concurrency costs apply per version if used. Limitations: Weighted routing supports 2 versions max, API Gateway has a 29-second default integration timeout.
Beyond tiering: (1) Compression - gzip/brotli before upload, 70-90% reduction for text/JSON/logs. Set Content-Encoding on the object so HTTP clients decompress transparently (CloudFront does not decompress objects for clients). 100GB logs → 10-30GB compressed. (2) Delete incomplete multipart uploads - lifecycle rule AbortIncompleteMultipartUpload after 7 days. Hidden cost: abandoned uploads still charged. (3) S3 Select - query data in-place with SQL, filter before transfer, up to 80% cost reduction for large objects. Example: query 1GB Parquet file, return 100MB result → you transfer 100MB instead of 1GB (S3 Select scan and return fees still apply). Note that AWS closed S3 Select to new customers in 2024; existing users can keep using it. (4) Batch operations - bulk actions (copy, tag, delete) via S3 Batch, simpler to run at scale than per-object API calls. (5) S3 Express One Zone (2023) - up to 10× faster (single-digit ms latency), 50% cheaper requests, single AZ. Use for: temporary data, ML training, high-performance workloads. Standard: $0.023/GB/month, Express: $0.16/GB/month at launch (about 7× the storage cost but 50% cheaper requests); check current pricing. (6) Storage Lens - analytics dashboard identifies: buckets with >10% IA candidates, Glacier data never accessed, incomplete uploads. Free tier: 14 days, 28 metrics. Advanced: 15 months, 60+ metrics. Monitoring: Cost Explorer with S3 storage type dimension, budget alerts, CloudWatch BucketSizeBytes metric. Best practice: Start Intelligent-Tiering for unknowns, refine with Lifecycle after analyzing Storage Lens data.
Standard workflows orchestrate long-running, durable distributed processes with exactly-once execution. Characteristics: (1) Duration: Up to 1 year per execution. (2) Execution model: Exactly-once - tasks execute once unless explicit Retry configured. Critical for non-idempotent operations. (3) Throughput: >2,000 executions/second per account. (4) Execution history: Full history stored, queryable via API/console, visual debugging in Step Functions console. (5) Service integrations: All patterns supported - .sync (wait for completion), .waitForTaskToken (callback pattern for human approval). (6) Redrive: Failed executions redriven from point of failure (added 2023). (7) State machine versioning: Immutable versions for rollback. Pricing: $0.025 per 1,000 state transitions. 10-state workflow = $0.00025 per execution. Best for: (1) Long-running workflows (>5 minutes to 1 year). (2) Human approval steps (wait for callback). (3) ETL pipelines with complex error handling. (4) Audit/compliance requiring full history. (5) Critical operations requiring exactly-once semantics (payments, inventory). Cannot change workflow type after creation - must recreate state machine to switch to Express.