OpenAI Function Calling FAQ & Answers
50 expert OpenAI Function Calling answers researched from official documentation. Every answer cites authoritative sources you can verify.
50 questions

Use strict: true with additionalProperties: false and all fields required: tools = [{'type': 'function', 'function': {'name': 'analyze_data', 'strict': True, 'parameters': {'type': 'object', 'properties': {'results': {'type': 'array', 'items': {'type': 'object', 'properties': {'question': {'type': 'string'}, 'answer': {'type': 'string'}}, 'required': ['question', 'answer'], 'additionalProperties': False}}}, 'required': ['results'], 'additionalProperties': False}}}]. Strict mode requires: all properties listed in the required array, additionalProperties: false at every object level, and no nullable keyword (use type: ['string', 'null'] instead). Models: gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18. The first request is slower because the API preprocesses the schema into a context-free grammar. Achieves 100% schema adherence vs ~80% without strict mode. Optional fields: add null to the type array.
from pydantic import BaseModel, Field; from openai import OpenAI; class NestedData(BaseModel): name: str = Field(description='Item name'); value: int; class RequestData(BaseModel): items: list[NestedData]; total: int; client = OpenAI(); completion = client.chat.completions.create(model='gpt-4o-2024-08-06', messages=[...], tools=[{'type': 'function', 'function': {'name': 'process_data', 'parameters': RequestData.model_json_schema(), 'strict': True}}]). Pydantic generates a JSON schema with title fields, which is compatible with strict mode. For nested models: all Pydantic fields are auto-required unless wrapped in Optional[]. Use Field(description='...') to improve the model's understanding. Warning: Pydantic may not emit additionalProperties: false for $ref-referenced types when inlined (known issue, Jan 2025). Verify the output of RequestData.model_json_schema() before deployment. LangChain alternative: from langchain.tools import StructuredTool.
Strict mode requires additionalProperties: false for all object types in schema. Error occurs when: (1) Missing additionalProperties: false at root or nested objects, (2) Using Pydantic/Zod schema generators that omit this field, (3) Empty required array (strict mode requires all properties listed). Fix: Add 'additionalProperties': False to every object definition recursively. Example correct schema: {'type': 'object', 'properties': {'name': {'type': 'string'}}, 'required': ['name'], 'additionalProperties': False}. For nested objects: nested['additionalProperties'] = False for each. Zod issue: toJSONSchema emits nullable: true (invalid for OpenAI - use type: ['string', 'null']). Python: use strict=True in OpenAI >= 1.0. Models: gpt-4o-2024-08-06+. Alternative: set strict: false for flexible schema but lose guaranteed adherence. Validate schema with API before production.
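Since Pydantic and Zod generators often omit these fields, a small recursive patcher can be run over a generated schema before sending it. This is an illustrative sketch (enforce_strict is not a library function, and it does not handle $ref or anyOf branches):

```python
def enforce_strict(schema: dict) -> dict:
    """Recursively add additionalProperties: False and a complete
    required list to every object in a JSON schema (mutates in place)."""
    if schema.get("type") == "object":
        schema["additionalProperties"] = False
        props = schema.get("properties", {})
        schema["required"] = list(props)  # strict mode: every property required
        for sub in props.values():
            enforce_strict(sub)
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        enforce_strict(schema["items"])
    return schema

schema = {"type": "object", "properties": {
    "name": {"type": "string"},
    "tags": {"type": "array", "items": {
        "type": "object", "properties": {"label": {"type": "string"}}}}}}
enforce_strict(schema)
```

Run this immediately after model_json_schema() or zodToJsonSchema() and before building the tools array.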
Set tool_choice to force specific function: client.chat.completions.create(model='gpt-4o', messages=[{'role': 'user', 'content': 'What is weather in SF?'}], tools=[{'type': 'function', 'function': {'name': 'get_weather', 'parameters': {...}}}], tool_choice={'type': 'function', 'function': {'name': 'get_weather'}}). This guarantees model calls get_weather (no other function, no text response). Other tool_choice options: 'auto' (default, model decides), 'required' (forces any function call), 'none' (prevents all function calls). Use for deterministic workflows where specific tool must be called. Available in gpt-4o, gpt-4-turbo, gpt-3.5-turbo, o1, o3-mini (2025). Warning: prevents parallel function calls when set to specific function. For multiple tools: use 'required' to force any tool, let model choose which.
response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools); tool_calls = response.choices[0].message.tool_calls; for tool_call in tool_calls: function_name = tool_call.function.name; function_args = json.loads(tool_call.function.arguments); result = execute_function(function_name, function_args); messages.append({'role': 'tool', 'content': json.dumps(result), 'tool_call_id': tool_call.id}). Each tool_call has unique id - use for response matching. Execute all functions in parallel (asyncio.gather for async). Send all results back in single API call. Disable parallel calls with parallel_tool_calls: false (forces sequential). Store toolCallStates[] array for chat history. Models supporting parallel calls: gpt-4-turbo, gpt-4o, gpt-3.5-turbo-1106+. Warning: Assistant API may return multi_tool_use.parallel as function name (known bug).
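The loop above can be sketched end to end with a local function registry and mocked tool_call dicts (real tool calls are SDK objects with attributes like .function.name; plain dicts are used here so the sketch runs standalone):

```python
import json

# Illustrative local registry mapping function names to implementations.
FUNCTIONS = {
    "get_weather": lambda city: {"city": city, "temp_c": 18},
    "get_time": lambda tz: {"tz": tz, "time": "12:00"},
}

def dispatch_tool_calls(tool_calls, messages):
    """Execute every tool call and append one role=tool message per
    call, matched to its tool_call_id as the API requires."""
    for call in tool_calls:
        fn = FUNCTIONS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        messages.append({
            "role": "tool",
            "content": json.dumps(fn(**args)),
            "tool_call_id": call["id"],
        })
    return messages

calls = [
    {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "SF"}'}},
    {"id": "call_2", "function": {"name": "get_time", "arguments": '{"tz": "UTC"}'}},
]
history = dispatch_tool_calls(calls, [])
```

All tool results land in the history before the follow-up API call, which is what the parallel pattern requires.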
Common causes: (1) Trailing commas in JSON arrays returned by model, (2) Literal function call string instead of JSON, (3) Missing closing brackets with high token usage. Fix with try-except: import json; try: args = json.loads(tool_call.function.arguments); except json.JSONDecodeError: args = repair_json(tool_call.function.arguments). Use strict: true to reduce invalid JSON (100% reliability on gpt-4o-2024-08-06). Alternative: from best_effort_json_parser import parse_json; args = parse_json(tool_call.function.arguments). Repair strategies: remove trailing commas, add missing brackets, parse partial JSON. For streaming: buffer incomplete JSON (see streaming Q&A). Monitor: ~3/20 calls may have missing closing bracket without strict mode. Production: always use strict: true on supported models. Test schema with various inputs before deployment. Report persistent issues to OpenAI (known bugs in Assistant API).
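A minimal repair_json along the lines described (trailing commas, missing closers) might look like this. It is a sketch only: the comma regex can touch string contents, and dedicated repair libraries are more robust:

```python
import json
import re

def repair_json(text: str) -> dict:
    """Best-effort repair for common glitches in model-emitted JSON:
    trailing commas and missing closing brackets."""
    # Drop trailing commas before } or ] (naive: may touch string contents).
    cleaned = re.sub(r",\s*([}\]])", r"\1", text)
    # Append closers for any brackets still open outside of strings.
    stack, in_string, prev = [], False, ""
    for ch in cleaned:
        if ch == '"' and prev != "\\":
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
        prev = ch
    return json.loads(cleaned + "".join(reversed(stack)))
```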
from best_effort_json_parser import parse_json; import json; buffer = ''; async for chunk in stream: if chunk.choices[0].delta.tool_calls: delta = chunk.choices[0].delta.tool_calls[0].function.arguments; buffer += delta; try: args = json.loads(buffer); process_complete_args(args); buffer = ''; except json.JSONDecodeError: partial = parse_json(buffer); update_ui(partial). Problem: O(n²) complexity - 12KB in 5-char chunks = 15M chars parsed vs 12K needed. Solution: maintain parsing state between chunks, close quotes/brackets for partial JSON. Libraries: best-effort-json-parser auto-closes unclosed structures. For production: buffer until complete, show partial UI updates with best-effort parsing. Recent issues (2025): vLLM missing punctuation in streamed tool_calls, Vercel AI empty arguments. Alternative: disable streaming for function calls (stream: false), use streaming only for text responses. Test thoroughly - streaming + function calling has higher error rates.
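The buffering pattern can be shown without an external library by closing open strings and brackets before each optimistic parse. close_partial here is an illustrative stand-in for what best-effort parsers do internally:

```python
import json

def close_partial(buf: str) -> str:
    """Close any open string and brackets so a partial JSON buffer
    can be tried with json.loads for optimistic UI updates."""
    stack, in_string, prev = [], False, ""
    for ch in buf:
        if ch == '"' and prev != "\\":
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
        prev = ch
    return buf + ('"' if in_string else "") + "".join(reversed(stack))

# Simulated argument deltas as they might arrive from a stream.
chunks = ['{"city": "San ', 'Francisco", "uni', 'ts": "celsius"}']
buffer, partials = "", []
for delta in chunks:
    buffer += delta
    try:
        partials.append(json.loads(close_partial(buffer)))
    except json.JSONDecodeError:
        partials.append(None)  # e.g. a dangling key with no value yet
```

Note the middle chunk still fails even after closing (a key without a value), which is why the try/except stays necessary.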
Two methods: (1) Function calling: tools=[{'type': 'function', 'function': {'name': 'extract_data', 'strict': True, 'parameters': schema}}], (2) Response format: response_format={'type': 'json_schema', 'json_schema': {'name': 'response', 'strict': True, 'schema': schema}}. Required models: gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18 (Aug 2024+). Schema requirements: all fields in required array, additionalProperties: false everywhere, no nullable (use ['string', 'null']). Achieves 100% schema adherence (vs ~80% JSON mode). Based on constrained sampling - limits tokens to valid schema only. First request slow (preprocesses schema to CFG). Use for: data extraction, structured API responses, deterministic parsing. Example: client.beta.chat.completions.parse(model='gpt-4o-2024-08-06', messages=messages, response_format=ResponseModel). API version: 2024-08-01-preview+. Alternative: JSON mode (response_format={'type': 'json_object'}) for flexible JSON without schema.
Use strict: true with enum in schema: {'type': 'function', 'function': {'name': 'categorize', 'strict': True, 'parameters': {'type': 'object', 'properties': {'category': {'type': 'string', 'enum': ['tech', 'sports', 'politics']}}, 'required': ['category'], 'additionalProperties': False}}}. Without strict mode: model may return values outside enum (~20% error rate). With strict mode (gpt-4o-2024-08-06): 100% adherence to enum values. Enum supports: strings, integers, floats, booleans (JSON Schema spec). For large enums: add descriptions to help model choose correctly. Example: 'enum': ['meals', 'days'], 'description': 'Unit of measurement: meals for recipe portions, days for duration'. Known issues pre-strict mode: gpt-3.5-turbo-0613 occasionally ignores enum. Production: always use strict: true for enum validation. Alternative: post-process with validation and retry if value not in enum. Supported models: gpt-4o-2024-08-06+, gpt-4o-mini-2024-07-18+.
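When strict mode is unavailable, the post-processing fallback mentioned above can be as simple as normalizing and checking membership (coerce_category is an illustrative helper; a None result signals that a retry is needed):

```python
ALLOWED_CATEGORIES = {"tech", "sports", "politics"}

def coerce_category(value: str):
    """Normalize a model-returned category and reject anything
    outside the enum."""
    v = value.strip().lower()
    return v if v in ALLOWED_CATEGORIES else None
```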
Use function calling (tools parameter) when: (1) Need to execute external actions (API calls, database queries, calculations). (2) Model should decide which of multiple functions to call. (3) Require parallel function calls. (4) Building agentic workflows with tool selection. Use response_format structured outputs when: (1) Need structured data extraction without external actions. (2) Single output schema (no tool selection). (3) Simpler implementation (no function execution loop). (4) Better performance (single API call). Example function calling: tools=[{'type': 'function', 'function': {'name': 'get_weather', ...}}]. Example structured output: response_format={'type': 'json_schema', 'json_schema': {...}}. Both support strict mode (100% schema adherence). Function calling returns tool_calls array, structured outputs return parsed content. Structured outputs cheaper (no tool execution overhead). For data extraction: use structured outputs. For actions: use function calling. Both require gpt-4o-2024-08-06+ for strict mode.
Execute function, catch errors, send error message back to model: try: result = execute_function(function_name, function_args); except Exception as e: result = {'error': str(e)}; messages.append({'role': 'tool', 'content': json.dumps(result), 'tool_call_id': tool_call.id}); response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools). Model sees error and can retry with corrected arguments or choose different function. Example error: {'error': 'API returned 404: City not found'}. Model response: calls function again with corrected city name or asks user for clarification. Best practice: include error type and recovery hint: {'error': 'Invalid date format. Use YYYY-MM-DD format'}. Maximum retries: limit to 3 to prevent infinite loops. Alternative: set tool_choice='none' after error to force text response. For production: log failed function calls, monitor error rates, add timeout handling. Use structured error format: {'success': false, 'error': 'message', 'retry_suggestion': 'hint'}.
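The structured error format suggested above can be wrapped into a small runner so failures become data the model can reason about instead of exceptions that crash the loop (run_tool and this get_weather are illustrative):

```python
def run_tool(fn, args: dict) -> dict:
    """Execute a tool, converting exceptions into the structured
    error format the model receives as the tool result."""
    try:
        return {"success": True, "data": fn(**args)}
    except Exception as exc:
        return {"success": False, "error": str(exc),
                "retry_suggestion": "Check argument formats and retry."}

def get_weather(city):
    if city == "Nowhere":
        raise ValueError("API returned 404: City not found")
    return {"city": city, "temp_c": 18}

ok = run_tool(get_weather, {"city": "SF"})
bad = run_tool(get_weather, {"city": "Nowhere"})
```

Either result can be json.dumps-ed straight into the role=tool message.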
Maximum 128 functions in tools array (OpenAI API limit, January 2025). Each function schema counts toward context window. Large function sets (>20) slow down latency due to increased prompt processing. Best practice: limit to 5-10 functions per request for optimal performance. For >128 functions: use semantic search to select relevant subset, prompt-based filtering ('only include functions related to weather'), or multi-stage function calling (category selection → specific function). Token usage: 128 functions with detailed schemas can consume 2,000-5,000 tokens. Monitor with: completion.usage.prompt_tokens. Function names affect selection quality: use descriptive names (get_current_weather vs weather1). Include descriptions: {'name': 'get_weather', 'description': 'Get current weather for a location', 'parameters': {...}}. Models better at selecting from smaller sets. Alternative: use LangChain/LlamaIndex tool routing for 100+ functions. Production: dynamically filter tools based on user intent before API call.
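A crude version of "select a relevant subset before the API call" can be done with keyword overlap; real systems typically use embedding search, but the shape of the filter is the same (filter_tools is an illustrative helper):

```python
def filter_tools(tools, user_message, max_tools=10):
    """Rank tools by word overlap between the user message and the
    tool's name/description, then keep the top max_tools."""
    words = set(user_message.lower().split())

    def score(tool):
        fn = tool["function"]
        text = (fn["name"].replace("_", " ") + " "
                + fn.get("description", "")).lower()
        return len(words & set(text.split()))

    return sorted(tools, key=score, reverse=True)[:max_tools]

tools = [
    {"function": {"name": "get_weather", "description": "Get current weather for a city"}},
    {"function": {"name": "get_stock_price", "description": "Get latest stock price"}},
]
selected = filter_tools(tools, "What is the weather in Paris?", max_tools=1)
```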
Function calling adds tokens to prompt_tokens (input pricing). Token overhead: (1) Function schemas: 50-200 tokens per function depending on complexity. (2) System prompt: +100 tokens for function calling instructions. (3) Tool call responses: charged as prompt tokens when added to messages. Example: 5 functions with detailed schemas = ~500 prompt tokens overhead. Pricing (Jan 2025): gpt-4o input $2.50/1M tokens, output $10/1M tokens. Function call workflow cost: initial request (schema tokens) + completion tokens (assistant response) + function execution + follow-up request (schema + history + function result). Reduce costs: (1) Minimize function schemas (remove unnecessary parameters/descriptions). (2) Use shorter function names. (3) Cache function schemas (prompt caching: 50% discount on repeated prompts). (4) Batch requests when possible. (5) Use gpt-4o-mini ($0.15/1M input vs $2.50/1M for gpt-4o). Monitor: completion.usage shows prompt_tokens, completion_tokens. Calculate cost: (prompt_tokens * input_price + completion_tokens * output_price) / 1M.
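The cost formula at the end translates directly into code. The prices below are the Jan 2025 figures quoted above and will drift over time, so treat them as placeholders:

```python
# USD per 1M tokens as (input, output) -- Jan 2025 figures, subject to change.
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one request: token counts times per-million prices."""
    inp, out = PRICES[model]
    return (prompt_tokens * inp + completion_tokens * out) / 1_000_000

cost = request_cost("gpt-4o", prompt_tokens=1_500, completion_tokens=300)
```

Feed it the prompt_tokens and completion_tokens from completion.usage.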
Use clear, concise descriptions with when-to-use guidance: {'name': 'get_weather', 'description': 'Get current weather conditions for a specific location. Use when user asks about temperature, conditions, or forecasts. Requires city name and optional country code.', 'parameters': {...}}. Best practices: (1) Start with what the function does. (2) Include when to use vs alternatives. (3) Mention key parameters. (4) Use consistent format across all functions. (5) Avoid jargon - use natural language. Bad: 'Weather API endpoint'. Good: 'Get current weather for a city, including temperature and conditions'. Parameter descriptions: {'city': {'type': 'string', 'description': 'City name in English (e.g., London, New York)'}}. Include examples in descriptions for complex types. For enums: explain each value: {'unit': {'enum': ['celsius', 'fahrenheit'], 'description': 'Temperature unit: celsius for metric, fahrenheit for imperial'}}. Model uses descriptions for function selection and argument generation. Length: 10-50 words per function, 5-20 words per parameter. Test with ambiguous queries to verify correct function selected.
Omit optional parameters from required array: {'type': 'object', 'properties': {'city': {'type': 'string', 'description': 'City name'}, 'country': {'type': 'string', 'description': 'Optional country code'}, 'units': {'type': 'string', 'enum': ['celsius', 'fahrenheit'], 'description': 'Optional temperature unit'}}, 'required': ['city'], 'additionalProperties': False}. Only city is required, country and units are optional. Model may or may not include optional parameters based on context. For strict mode: omit from required makes parameter truly optional (model won't hallucinate). Provide defaults in function implementation: def get_weather(city, country='US', units='celsius'). Alternative: use null union type for optional with explicit null: {'type': ['string', 'null']}. Best practice: optional parameters should have sensible defaults. Document defaults in description: 'units': {'type': 'string', 'description': 'Temperature unit (default: celsius)'}. Test: verify function works when optional parameters absent. For backward compatibility: add new parameters as optional initially.
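On the implementation side, defaults cover whatever the model omits: the schema above marks only city as required, and Python fills in the rest.

```python
import json

def get_weather(city, country="US", units="celsius"):
    """Defaults supply values for the optional parameters the
    model may legitimately leave out of its arguments."""
    return {"city": city, "country": country, "units": units}

# Arguments as the model might return them, omitting optional fields.
args = json.loads('{"city": "London"}')
result = get_weather(**args)
```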
Use minItems and maxItems in array schema: {'items': {'type': 'array', 'items': {'type': 'string'}, 'minItems': 1, 'maxItems': 5, 'description': 'List of 1-5 search keywords'}}. Strict mode (gpt-4o-2024-08-06+) enforces constraints - model returns array with length in range. Without strict mode: model may violate constraints. Example use cases: minItems=1 ensures non-empty arrays, maxItems=10 prevents excessive API calls. For flexible arrays: omit constraints. Combine with other constraints: {'items': {'type': 'string', 'minLength': 1, 'maxLength': 50}, 'minItems': 1, 'maxItems': 5}. Validation: if not strict mode, validate in code: if not (1 <= len(items) <= 5): return error. For nested arrays: apply constraints at each level. Alternative: use description to guide model: 'Provide 3-5 keywords (optimal: 3)'. Test with prompts requesting different array sizes. Known limitation: pre-strict models may ignore maxItems for large values. Production: always use strict=True for guaranteed constraint enforcement.
Use minimum and maximum in number schema: {'temperature': {'type': 'number', 'minimum': -50, 'maximum': 50, 'description': 'Temperature in Celsius (-50 to 50)'}}. Strict mode enforces constraints - model returns value in range. For integers: use 'type': 'integer' with same constraints. Exclusive bounds: use exclusiveMinimum and exclusiveMaximum: {'type': 'number', 'exclusiveMinimum': 0, 'maximum': 100} (range: 0 < x <= 100). Multiples: use multipleOf for specific increments: {'type': 'number', 'minimum': 0, 'maximum': 100, 'multipleOf': 5} (values: 0, 5, 10, ..., 100). Without strict mode: validate in code: if not (-50 <= temp <= 50): return error. Combine with descriptions: 'Age in years (1-120, must be positive integer)'. For prices: {'type': 'number', 'minimum': 0, 'multipleOf': 0.01, 'description': 'Price in USD with 2 decimal precision'}. Test boundary values: verify model respects minimum, maximum, edge cases. Production: use strict=True for automatic validation. Alternative: include validation logic in function with helpful error messages.
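The code-side validation mentioned for non-strict mode needs care with multipleOf: 0.01, because floating point makes a naive modulo check unreliable. Comparing in integer cents is one workaround (validate_price is an illustrative helper):

```python
def validate_price(value: float) -> bool:
    """Check the price schema above without strict mode: non-negative
    and a multiple of 0.01, compared in cents to dodge float error."""
    cents = round(value * 100)
    return value >= 0 and abs(value - cents / 100) < 1e-9
```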
Use string constraints in schema: {'email': {'type': 'string', 'pattern': '^[a-zA-Z0-9.%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', 'description': 'Valid email address'}, 'username': {'type': 'string', 'minLength': 3, 'maxLength': 20, 'pattern': '^[a-zA-Z0-9]+$', 'description': 'Alphanumeric username (3-20 chars)'}}. Strict mode enforces all constraints. Pattern uses regex (escape special characters: \d for digits, \. for a literal dot). Common patterns: phone '^\+?[1-9]\d{1,14}$', URL '^https?://.+', ISO date '^\d{4}-\d{2}-\d{2}$', hex color '^#[0-9A-Fa-f]{6}$'. Length constraints: minLength=1 prevents empty strings, maxLength limits input size. Combine constraints: {'type': 'string', 'minLength': 8, 'maxLength': 64, 'pattern': '^(?=.*[A-Z])(?=.*[0-9]).+$', 'description': 'Password with uppercase and number'}. Without strict mode: model may violate pattern/length. Validate in code if not using strict mode. Test with invalid inputs to verify rejection. For enums: use enum instead of pattern when there is a fixed set of values. Production: use strict=True on gpt-4o-2024-08-06+ for automatic enforcement.
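Without strict mode, the same constraints can be re-checked in code with Python's re module; the compiled patterns mirror the email and username schema above (validate_user is an illustrative helper):

```python
import re

# Mirrors of the schema constraints, for fallback validation in code.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9.%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
USERNAME_RE = re.compile(r"^[a-zA-Z0-9]+$")

def validate_user(args: dict) -> list:
    """Return a list of constraint violations (empty when valid)."""
    errors = []
    if not EMAIL_RE.fullmatch(args.get("email", "")):
        errors.append("invalid email")
    username = args.get("username", "")
    if not (3 <= len(username) <= 20) or not USERNAME_RE.fullmatch(username):
        errors.append("invalid username")
    return errors
```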
Maximum nesting depth: 5 levels recommended (OpenAI guidance, Jan 2025). Deeper nesting increases latency and reduces reliability. Example recursive schema: {'type': 'object', 'properties': {'name': {'type': 'string'}, 'children': {'type': 'array', 'items': {'$ref': '#'}}}, 'required': ['name'], 'additionalProperties': False}. Use $ref: '#' for self-reference (recursive tree/graph structures). Strict mode limitation: recursive schemas not fully supported in strict=True (Jan 2025) - avoid or use non-strict. For deep structures: flatten when possible, use IDs and separate queries, limit recursion with maxItems. Example: file system (directory → subdirectories max 3 levels), org chart (manager → reports max 5 levels). Without depth limit: model may generate extremely nested structures consuming excess tokens. Alternative: iterative approach - query one level at a time. Test: verify model stops at reasonable depth. For complex nested data: consider structured outputs with predefined depth or multiple API calls. Production: document expected depth, validate output depth before processing.
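Validating output depth before processing, as recommended above, only needs a short recursive measure over the parsed data (data_depth is an illustrative helper):

```python
def data_depth(value) -> int:
    """Nesting depth of parsed model output: each dict or list
    level adds one; scalars contribute zero."""
    if isinstance(value, dict):
        return 1 + max((data_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((data_depth(v) for v in value), default=0)
    return 0

tree = {"name": "root", "children": [{"name": "a", "children": []}]}
```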
Maintain full message history including tool calls and results: messages = [{'role': 'user', 'content': 'What is weather in SF?'}]; response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools); messages.append(response.choices[0].message); for tool_call in response.choices[0].message.tool_calls: result = execute_function(tool_call); messages.append({'role': 'tool', 'content': json.dumps(result), 'tool_call_id': tool_call.id}); response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools). Pattern: user → assistant (with tool_calls) → tool results → assistant → user → repeat. Include all previous messages for context. For multi-turn: messages array grows with each interaction. Context window limits: gpt-4o (128K tokens), gpt-4o-mini (128K). Manage long conversations: summarize old messages, truncate early messages, use sliding window (keep last N messages). Store conversation state: persist messages array in database/session. For parallel tool calls: append all tool results before next API call. Cost optimization: remove old tool messages after summarization. Test: verify model references earlier tool results correctly.
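The sliding-window strategy can be sketched as below. Caveat: a real implementation must also keep assistant messages containing tool_calls paired with their role=tool results, which this naive version does not check (truncate_history is illustrative):

```python
def truncate_history(messages, keep_last=6):
    """Keep the system message plus the most recent keep_last
    messages -- a simple sliding window over the conversation."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(10)
]
window = truncate_history(history, keep_last=3)
```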
Use zodToJsonSchema library: import { zodToJsonSchema } from 'zod-to-json-schema'; import { z } from 'zod'; const WeatherSchema = z.object({city: z.string(), units: z.enum(['celsius', 'fahrenheit']).optional()}); const jsonSchema = zodToJsonSchema(WeatherSchema, {$refStrategy: 'none'}); const tool = {type: 'function', function: {name: 'get_weather', parameters: jsonSchema, strict: true}}. IMPORTANT: Zod emits nullable: true (invalid for OpenAI) - use type: ['string', 'null'] instead. Fix: manually replace nullable or use transform. Set $refStrategy: 'none' to inline definitions (OpenAI doesn't support external $ref). For strict mode: ensure additionalProperties: false - Zod may omit this. Alternative: zod-openai library handles OpenAI-specific conversions. Validation: console.log(JSON.stringify(jsonSchema)) to verify schema. Common issues: Zod .optional() → required array handling, .default() → enum defaults, .refine() custom validation (not supported, handle in code). Production: test generated schema with OpenAI API before deployment. For complex schemas: use Zod .describe() for parameter descriptions.
Multi-stage testing approach: (1) Schema validation: Use JSON Schema validator (jsonschema Python library, ajv for JavaScript) to verify schema is valid. (2) Dry run: Call OpenAI API with test prompts and verify tool_calls structure: response = client.chat.completions.create(model='gpt-4o', messages=[{'role': 'user', 'content': 'test prompt'}], tools=tools); assert response.choices[0].message.tool_calls is not None. (3) Edge cases: Test with ambiguous prompts, missing information, invalid values, boundary conditions. (4) Argument validation: Parse function arguments and validate: args = json.loads(tool_call.function.arguments); validate_args(args). (5) Integration testing: Execute actual functions with returned arguments. (6) Load testing: Verify performance with concurrent requests. Use strict=True for schema enforcement. Create test suite with expected function calls: assert tool_call.function.name == 'get_weather'. Monitor: completion.usage tokens, latency, error rates. For production: canary deployment (5% traffic), monitoring with Datadog/Sentry, fallback to non-function-calling mode on errors.
Use exponential backoff with jitter: import time; import random; from openai import OpenAI, RateLimitError, APIError; client = OpenAI(); max_retries = 3; for attempt in range(max_retries): try: response = client.chat.completions.create(...); break; except RateLimitError: if attempt == max_retries - 1: raise; wait = (2 ** attempt) + random.uniform(0, 1); time.sleep(wait). Retry conditions: (1) RateLimitError (429): always retry. (2) APIError (500, 502, 503): retry transient failures. (3) Timeout: retry with longer timeout. (4) Invalid tool arguments: don't retry (fix schema). Rate limits (Jan 2025): gpt-4o Tier 1: 500 RPM, 30K TPM, Tier 5: 10K RPM, 30M TPM. Handle Retry-After header: wait = int(error.headers.get('Retry-After', 1)). Use tenacity library: @retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(3), retry=retry_if_exception_type(RateLimitError)). For production: implement circuit breaker, monitor retry rates, use batch API for non-urgent requests, upgrade tier if hitting limits frequently.
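The delay schedule from the inline loop above (2 ** attempt plus up to one second of jitter) can be factored out and unit-tested by injecting a deterministic random source:

```python
import random

def backoff_delays(max_retries=3, base=2.0, jitter=1.0, rng=random.random):
    """Exponential backoff with jitter, matching the inline retry
    loop: base ** attempt plus up to `jitter` seconds of noise."""
    return [base ** attempt + rng() * jitter for attempt in range(max_retries)]

delays = backoff_delays(rng=lambda: 0.5)  # deterministic for testing
```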
Add system message with function usage guidelines: messages = [{'role': 'system', 'content': 'You are a helpful assistant. Only call functions when explicitly requested by user. Always verify parameters before calling. If unsure about a parameter, ask user for clarification instead of guessing.'}, {'role': 'user', 'content': 'What is weather?'}]; response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools). System prompt patterns: (1) Function selection: 'Prefer function A over B when X condition'. (2) Parameter validation: 'Never use placeholder values like TODO or UNKNOWN'. (3) Confirmation: 'Ask user confirmation before calling functions with side effects'. (4) Error handling: 'If function fails, explain error and suggest alternatives'. (5) Response format: 'After function call, provide concise summary'. Combine with tool_choice: tool_choice='required' forces function call regardless of system prompt. For strict workflows: 'You must call exactly one function per user request'. Test: verify system prompt affects function calling behavior. Production: include safety constraints ('Never call delete functions without explicit confirmation'), rate limiting guidance ('Maximum 5 API calls per request'). Monitor: check if model follows guidelines, adjust prompt based on behavior.
Combine vision and function calling: messages = [{'role': 'user', 'content': [{'type': 'text', 'text': 'Extract receipt details'}, {'type': 'image_url', 'image_url': {'url': 'data:image/jpeg;base64,...'}}]}]; tools = [{'type': 'function', 'function': {'name': 'extract_receipt', 'strict': True, 'parameters': {'type': 'object', 'properties': {'merchant': {'type': 'string'}, 'total': {'type': 'number'}, 'items': {'type': 'array', 'items': {'type': 'object', 'properties': {'name': {'type': 'string'}, 'price': {'type': 'number'}}, 'required': ['name', 'price'], 'additionalProperties': False}}}, 'required': ['merchant', 'total', 'items'], 'additionalProperties': False}}}]; response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools). Model analyzes image and returns structured data via function call. Use cases: receipt OCR, document extraction, image metadata, diagram analysis. Supports: gpt-4o, gpt-4o-mini (vision + function calling). Image formats: JPEG, PNG, WebP. Max image size: 20MB. URL or base64. For multiple images: add multiple image_url entries in content array. Combine with strict mode for guaranteed schema adherence. Production: validate extracted data, handle OCR errors gracefully.
Use strict: true with additionalProperties: false and all fields required: tools = [{'type': 'function', 'function': {'name': 'analyze_data', 'strict': True, 'parameters': {'type': 'object', 'properties': {'results': {'type': 'array', 'items': {'type': 'object', 'properties': {'question': {'type': 'string'}, 'answer': {'type': 'string'}}, 'required': ['question', 'answer'], 'additionalProperties': False}}}, 'required': ['results'], 'additionalProperties': False}}}]. Strict mode requires: all properties in required array, additionalProperties: false at all object levels, no nullable (use type: ['string', 'null'] instead). Model: gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18. First request slow (~preprocesses schema to CFG). Achieves 100% schema adherence vs ~80% without strict mode. Optional fields: add null to type array.
from pydantic import BaseModel, Field; from openai import OpenAI; class NestedData(BaseModel): name: str = Field(description='Item name'); value: int; class RequestData(BaseModel): items: list[NestedData]; total: int; client = OpenAI(); completion = client.chat.completions.create(model='gpt-4o-2024-08-06', messages=[...], tools=[{'type': 'function', 'function': {'name': 'process_data', 'parameters': RequestData.model_json_schema(), 'strict': True}}]). Pydantic generates JSON schema with title fields - compatible with strict mode. For nested models: all Pydantic fields auto-required unless Optional[]. Use Field(description='...') for better model understanding. Warning: Pydantic Field metadata may not set additionalProperties: false for $ref-referenced types when inlined (Jan 2025 issue). Verify schema with RequestData.model_json_schema() before deployment. LangChain alternative: from langchain.tools import StructuredTool.
Strict mode requires additionalProperties: false for all object types in schema. Error occurs when: (1) Missing additionalProperties: false at root or nested objects, (2) Using Pydantic/Zod schema generators that omit this field, (3) Empty required array (strict mode requires all properties listed). Fix: Add 'additionalProperties': False to every object definition recursively. Example correct schema: {'type': 'object', 'properties': {'name': {'type': 'string'}}, 'required': ['name'], 'additionalProperties': False}. For nested objects: nested['additionalProperties'] = False for each. Zod issue: toJSONSchema emits nullable: true (invalid for OpenAI - use type: ['string', 'null']). Python: use strict=True in OpenAI >= 1.0. Models: gpt-4o-2024-08-06+. Alternative: set strict: false for flexible schema but lose guaranteed adherence. Validate schema with API before production.
Set tool_choice to force specific function: client.chat.completions.create(model='gpt-4o', messages=[{'role': 'user', 'content': 'What is weather in SF?'}], tools=[{'type': 'function', 'function': {'name': 'get_weather', 'parameters': {...}}}], tool_choice={'type': 'function', 'function': {'name': 'get_weather'}}). This guarantees model calls get_weather (no other function, no text response). Other tool_choice options: 'auto' (default, model decides), 'required' (forces any function call), 'none' (prevents all function calls). Use for deterministic workflows where specific tool must be called. Available in gpt-4o, gpt-4-turbo, gpt-3.5-turbo, o1, o3-mini (2025). Warning: prevents parallel function calls when set to specific function. For multiple tools: use 'required' to force any tool, let model choose which.
response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools); tool_calls = response.choices[0].message.tool_calls; for tool_call in tool_calls: function_name = tool_call.function.name; function_args = json.loads(tool_call.function.arguments); result = execute_function(function_name, function_args); messages.append({'role': 'tool', 'content': json.dumps(result), 'tool_call_id': tool_call.id}). Each tool_call has unique id - use for response matching. Execute all functions in parallel (asyncio.gather for async). Send all results back in single API call. Disable parallel calls with parallel_tool_calls: false (forces sequential). Store toolCallStates[] array for chat history. Models supporting parallel calls: gpt-4-turbo, gpt-4o, gpt-3.5-turbo-1106+. Warning: Assistant API may return multi_tool_use.parallel as function name (known bug).
Common causes: (1) Trailing commas in JSON arrays returned by model, (2) Literal function call string instead of JSON, (3) Missing closing brackets with high token usage. Fix with try-except: import json; try: args = json.loads(tool_call.function.arguments); except json.JSONDecodeError: args = repair_json(tool_call.function.arguments). Use strict: true to reduce invalid JSON (100% reliability on gpt-4o-2024-08-06). Alternative: from best_effort_json_parser import parse_json; args = parse_json(tool_call.function.arguments). Repair strategies: remove trailing commas, add missing brackets, parse partial JSON. For streaming: buffer incomplete JSON (see streaming Q&A). Monitor: ~3/20 calls may have missing closing bracket without strict mode. Production: always use strict: true on supported models. Test schema with various inputs before deployment. Report persistent issues to OpenAI (known bugs in Assistant API).
from best_effort_json_parser import parse_json; import json; buffer = ''; async for chunk in stream: if chunk.choices[0].delta.tool_calls: delta = chunk.choices[0].delta.tool_calls[0].function.arguments; buffer += delta; try: args = json.loads(buffer); process_complete_args(args); buffer = ''; except json.JSONDecodeError: partial = parse_json(buffer); update_ui(partial). Problem: O(n²) complexity - 12KB in 5-char chunks = 15M chars parsed vs 12K needed. Solution: maintain parsing state between chunks, close quotes/brackets for partial JSON. Libraries: best-effort-json-parser auto-closes unclosed structures. For production: buffer until complete, show partial UI updates with best-effort parsing. Recent issues (2025): vLLM missing punctuation in streamed tool_calls, Vercel AI empty arguments. Alternative: disable streaming for function calls (stream: false), use streaming only for text responses. Test thoroughly - streaming + function calling has higher error rates.
Two methods: (1) Function calling: tools=[{'type': 'function', 'function': {'name': 'extract_data', 'strict': True, 'parameters': schema}}], (2) Response format: response_format={'type': 'json_schema', 'json_schema': {'name': 'response', 'strict': True, 'schema': schema}}. Required models: gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18 (Aug 2024+). Schema requirements: all fields in required array, additionalProperties: false everywhere, no nullable (use ['string', 'null']). Achieves 100% schema adherence (vs ~80% JSON mode). Based on constrained sampling - limits tokens to valid schema only. First request slow (preprocesses schema to CFG). Use for: data extraction, structured API responses, deterministic parsing. Example: client.beta.chat.completions.parse(model='gpt-4o-2024-08-06', messages=messages, response_format=ResponseModel). API version: 2024-08-01-preview+. Alternative: JSON mode (response_format={'type': 'json_object'}) for flexible JSON without schema.
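The strict-mode schema requirements can be verified locally before the first API call. A minimal recursive checker (a sketch: handles plain object/array types only, not anyOf or $ref):

```python
def check_strict_schema(schema: dict, path: str = "$") -> list:
    """Collect strict-mode violations: every object must set
    additionalProperties: false and list all properties as required."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: missing additionalProperties: false")
        if set(schema.get("required", [])) != set(props):
            problems.append(f"{path}: required must list every property")
        for name, sub in props.items():
            problems += check_strict_schema(sub, f"{path}.{name}")
    elif schema.get("type") == "array":
        problems += check_strict_schema(schema.get("items", {}), f"{path}[]")
    return problems

good = {"type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"], "additionalProperties": False}
bad = {"type": "object",
       "properties": {"name": {"type": "string"}},
       "required": []}
```

Run it in CI so schema regressions fail before deployment rather than at request time.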
Use strict: true with enum in schema: {'type': 'function', 'function': {'name': 'categorize', 'strict': True, 'parameters': {'type': 'object', 'properties': {'category': {'type': 'string', 'enum': ['tech', 'sports', 'politics']}}, 'required': ['category'], 'additionalProperties': False}}}. Without strict mode: model may return values outside enum (~20% error rate). With strict mode (gpt-4o-2024-08-06): 100% adherence to enum values. Enum supports: strings, integers, floats, booleans (JSON Schema spec). For large enums: add descriptions to help model choose correctly. Example: 'enum': ['meals', 'days'], 'description': 'Unit of measurement: meals for recipe portions, days for duration'. Known issues pre-strict mode: gpt-3.5-turbo-0613 occasionally ignores enum. Production: always use strict: true for enum validation. Alternative: post-process with validation and retry if value not in enum. Supported models: gpt-4o-2024-08-06+, gpt-4o-mini-2024-07-18+.
Use function calling (tools parameter) when: (1) Need to execute external actions (API calls, database queries, calculations). (2) Model should decide which of multiple functions to call. (3) Require parallel function calls. (4) Building agentic workflows with tool selection. Use response_format structured outputs when: (1) Need structured data extraction without external actions. (2) Single output schema (no tool selection). (3) Simpler implementation (no function execution loop). (4) Better performance (single API call). Example function calling: tools=[{'type': 'function', 'function': {'name': 'get_weather', ...}}]. Example structured output: response_format={'type': 'json_schema', 'json_schema': {...}}. Both support strict mode (100% schema adherence). Function calling returns tool_calls array, structured outputs return parsed content. Structured outputs cheaper (no tool execution overhead). For data extraction: use structured outputs. For actions: use function calling. Both require gpt-4o-2024-08-06+ for strict mode.
Execute function, catch errors, send error message back to model: try: result = execute_function(function_name, function_args); except Exception as e: result = {'error': str(e)}; messages.append({'role': 'tool', 'content': json.dumps(result), 'tool_call_id': tool_call.id}); response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools). Model sees error and can retry with corrected arguments or choose different function. Example error: {'error': 'API returned 404: City not found'}. Model response: calls function again with corrected city name or asks user for clarification. Best practice: include error type and recovery hint: {'error': 'Invalid date format. Use YYYY-MM-DD format'}. Maximum retries: limit to 3 to prevent infinite loops. Alternative: set tool_choice='none' after error to force text response. For production: log failed function calls, monitor error rates, add timeout handling. Use structured error format: {'success': false, 'error': 'message', 'retry_suggestion': 'hint'}.
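The structured error format above can be wrapped into a small dispatcher. A sketch (registry, get_weather, and the error shape are illustrative, not an SDK API):

```python
def run_tool(function_name, function_args, registry):
    """Execute a registered function, returning a structured result the model
    can act on; errors carry a retry hint instead of raising."""
    try:
        value = registry[function_name](**function_args)
        return {"success": True, "result": value}
    except KeyError:
        return {"success": False,
                "error": f"Unknown function: {function_name}",
                "retry_suggestion": "Choose one of: " + ", ".join(registry)}
    except Exception as e:
        return {"success": False, "error": str(e),
                "retry_suggestion": "Correct the arguments and try again"}

def get_weather(city):
    if city == "Atlantis":
        raise ValueError("City not found")
    return {"city": city, "temp_c": 18}

registry = {"get_weather": get_weather}
ok = run_tool("get_weather", {"city": "London"}, registry)
err = run_tool("get_weather", {"city": "Atlantis"}, registry)
```

json.dumps the returned dict into the tool message; the model reads the retry_suggestion and self-corrects.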
Maximum 128 functions in tools array (OpenAI API limit, January 2025). Each function schema counts toward context window. Large function sets (>20) slow down latency due to increased prompt processing. Best practice: limit to 5-10 functions per request for optimal performance. For >128 functions: use semantic search to select relevant subset, prompt-based filtering ('only include functions related to weather'), or multi-stage function calling (category selection → specific function). Token usage: 128 functions with detailed schemas can consume 2,000-5,000 tokens. Monitor with: completion.usage.prompt_tokens. Function names affect selection quality: use descriptive names (get_current_weather vs weather1). Include descriptions: {'name': 'get_weather', 'description': 'Get current weather for a location', 'parameters': {...}}. Models better at selecting from smaller sets. Alternative: use LangChain/LlamaIndex tool routing for 100+ functions. Production: dynamically filter tools based on user intent before API call.
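The "filter tools by user intent" step might look like this crude keyword-overlap ranker - a sketch only; production systems would use embedding-based semantic search instead:

```python
def filter_tools(tools, user_message, max_tools=10):
    """Rank tools by keyword overlap between the user message and each
    tool's name/description; return the top matches."""
    words = set(user_message.lower().split())

    def score(tool):
        fn = tool["function"]
        text = (fn["name"].replace("_", " ") + " "
                + fn.get("description", "")).lower()
        return sum(1 for w in words if w in text)

    ranked = sorted(tools, key=score, reverse=True)
    matched = [t for t in ranked if score(t) > 0]
    return (matched or ranked)[:max_tools]

tools = [
    {"type": "function", "function": {"name": "get_weather",
        "description": "Get current weather for a location"}},
    {"type": "function", "function": {"name": "send_email",
        "description": "Send an email to a recipient"}},
]
selected = filter_tools(tools, "what is the weather in Paris")
```

Falling back to the top-ranked tools when nothing matches keeps the request valid even for off-topic queries.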
Function calling adds tokens to prompt_tokens (input pricing). Token overhead: (1) Function schemas: 50-200 tokens per function depending on complexity. (2) System prompt: +100 tokens for function calling instructions. (3) Tool call responses: charged as prompt tokens when added to messages. Example: 5 functions with detailed schemas = ~500 prompt tokens overhead. Pricing (Jan 2025): gpt-4o input $2.50/1M tokens, output $10/1M tokens. Function call workflow cost: initial request (schema tokens) + completion tokens (assistant response) + function execution + follow-up request (schema + history + function result). Reduce costs: (1) Minimize function schemas (remove unnecessary parameters/descriptions). (2) Use shorter function names. (3) Cache function schemas (prompt caching: 50% discount on repeated prompts). (4) Batch requests when possible. (5) Use gpt-4o-mini ($0.15/1M input vs $2.50/1M for gpt-4o). Monitor: completion.usage shows prompt_tokens, completion_tokens. Calculate cost: (prompt_tokens * input_price + completion_tokens * output_price) / 1M.
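The cost formula above as a helper, with the gpt-4o rates quoted (Jan 2025) as defaults - substitute current prices from the pricing page:

```python
def request_cost(prompt_tokens, completion_tokens,
                 input_price_per_m=2.50, output_price_per_m=10.00):
    """USD cost of one chat completion from completion.usage counts.
    Defaults are the Jan 2025 gpt-4o rates cited above."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# e.g. 5 function schemas plus history (~1500 prompt tokens) and a short reply
cost = request_cost(1500, 300)
```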
Use clear, concise descriptions with when-to-use guidance: {'name': 'get_weather', 'description': 'Get current weather conditions for a specific location. Use when user asks about temperature, conditions, or forecasts. Requires city name and optional country code.', 'parameters': {...}}. Best practices: (1) Start with what the function does. (2) Include when to use vs alternatives. (3) Mention key parameters. (4) Use consistent format across all functions. (5) Avoid jargon - use natural language. Bad: 'Weather API endpoint'. Good: 'Get current weather for a city, including temperature and conditions'. Parameter descriptions: {'city': {'type': 'string', 'description': 'City name in English (e.g., London, New York)'}}. Include examples in descriptions for complex types. For enums: explain each value: {'unit': {'enum': ['celsius', 'fahrenheit'], 'description': 'Temperature unit: celsius for metric, fahrenheit for imperial'}}. Model uses descriptions for function selection and argument generation. Length: 10-50 words per function, 5-20 words per parameter. Test with ambiguous queries to verify correct function selected.
Without strict mode: omit optional parameters from the required array: {'type': 'object', 'properties': {'city': {'type': 'string', 'description': 'City name'}, 'country': {'type': 'string', 'description': 'Optional country code'}, 'units': {'type': 'string', 'enum': ['celsius', 'fahrenheit'], 'description': 'Optional temperature unit'}}, 'required': ['city'], 'additionalProperties': False}. Only city is required; the model may or may not include country and units based on context. With strict mode this schema is rejected - strict requires every property to appear in required - so mark optional fields with a null union instead: {'type': ['string', 'null']}, and treat null as absent in your code. Provide defaults in the function implementation: def get_weather(city, country='US', units='celsius'). Best practice: optional parameters should have sensible defaults, documented in the description: 'units': {'type': 'string', 'description': 'Temperature unit (default: celsius)'}. Test: verify the function works when optional parameters are absent or null. For backward compatibility: add new parameters as optional initially.
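The "defaults in the implementation" pattern, assuming the model returned only the required field:

```python
import json

def get_weather(city, country="US", units="celsius"):
    """Optional parameters carry sensible defaults; the model may omit them
    (non-strict) or send null for them (strict mode null unions)."""
    return {"city": city, "country": country, "units": units}

# Arguments as the model might return them, with only the required field
args = json.loads('{"city": "London"}')
# In strict mode, drop null values before unpacking so defaults apply
args = {k: v for k, v in args.items() if v is not None}
result = get_weather(**args)
```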
Use minItems and maxItems in array schema: {'items': {'type': 'array', 'items': {'type': 'string'}, 'minItems': 1, 'maxItems': 5, 'description': 'List of 1-5 search keywords'}}. Strict mode (gpt-4o-2024-08-06+) enforces constraints - model returns array with length in range. Without strict mode: model may violate constraints. Example use cases: minItems=1 ensures non-empty arrays, maxItems=10 prevents excessive API calls. For flexible arrays: omit constraints. Combine with other constraints: {'items': {'type': 'string', 'minLength': 1, 'maxLength': 50}, 'minItems': 1, 'maxItems': 5}. Validation: if not strict mode, validate in code: if not (1 <= len(items) <= 5): return error. For nested arrays: apply constraints at each level. Alternative: use description to guide model: 'Provide 3-5 keywords (optimal: 3)'. Test with prompts requesting different array sizes. Known limitation: pre-strict models may ignore maxItems for large values. Production: always use strict=True for guaranteed constraint enforcement.
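The "validate in code" fallback for non-strict models, as a small helper (returns an error string suitable for a structured tool error, None if valid):

```python
def validate_array(items, min_items=1, max_items=5):
    """Code-side check mirroring minItems/maxItems for models that
    may violate schema constraints."""
    if not isinstance(items, list):
        return "expected an array"
    if not (min_items <= len(items) <= max_items):
        return f"array length {len(items)} outside [{min_items}, {max_items}]"
    return None  # valid
```

Feed a non-None result back as the tool error so the model can retry with a compliant array.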
Use minimum and maximum in number schema: {'temperature': {'type': 'number', 'minimum': -50, 'maximum': 50, 'description': 'Temperature in Celsius (-50 to 50)'}}. Strict mode enforces constraints - model returns value in range. For integers: use 'type': 'integer' with same constraints. Exclusive bounds: use exclusiveMinimum and exclusiveMaximum: {'type': 'number', 'exclusiveMinimum': 0, 'maximum': 100} (range: 0 < x <= 100). Multiples: use multipleOf for specific increments: {'type': 'number', 'minimum': 0, 'maximum': 100, 'multipleOf': 5} (values: 0, 5, 10, ..., 100). Without strict mode: validate in code: if not (-50 <= temp <= 50): return error. Combine with descriptions: 'Age in years (1-120, must be positive integer)'. For prices: {'type': 'number', 'minimum': 0, 'multipleOf': 0.01, 'description': 'Price in USD with 2 decimal precision'}. Test boundary values: verify model respects minimum, maximum, edge cases. Production: use strict=True for automatic validation. Alternative: include validation logic in function with helpful error messages.
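The same code-side validation for numeric constraints, including a float-safe multipleOf check (naive modulo fails on values like 19.99 % 0.01 due to floating-point representation):

```python
import math

def validate_number(value, minimum=None, maximum=None, multiple_of=None):
    """Code-side mirror of minimum/maximum/multipleOf for non-strict models.
    Returns an error string, or None if valid."""
    if minimum is not None and value < minimum:
        return f"{value} below minimum {minimum}"
    if maximum is not None and value > maximum:
        return f"{value} above maximum {maximum}"
    if multiple_of is not None and not math.isclose(
            value / multiple_of, round(value / multiple_of)):
        return f"{value} is not a multiple of {multiple_of}"
    return None
```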
Use string constraints in schema: {'email': {'type': 'string', 'pattern': '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$', 'description': 'Valid email address'}, 'username': {'type': 'string', 'minLength': 3, 'maxLength': 20, 'pattern': '^[a-zA-Z0-9]+$', 'description': 'Alphanumeric username (3-20 chars)'}}. Strict mode enforces all constraints. Pattern uses regex; in JSON strings escape backslashes (write '\\d' for the regex \d). Common patterns: phone '^\\+?[1-9]\\d{1,14}$', URL '^https?://.+', ISO date '^\\d{4}-\\d{2}-\\d{2}$', hex color '^#[0-9A-Fa-f]{6}$'. Length constraints: minLength=1 prevents empty strings, maxLength limits input size. Combine constraints: {'type': 'string', 'minLength': 8, 'maxLength': 64, 'pattern': '^(?=.*[A-Z])(?=.*[0-9]).+$', 'description': 'Password with uppercase and number'}. Without strict mode: model may violate pattern/length. Validate in code if not using strict mode. Test with invalid inputs to verify rejection. For enums: use enum instead of pattern when fixed set of values. Production: use strict=True on gpt-4o-2024-08-06+ for automatic enforcement.
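Code-side validation of the patterns above for non-strict models (the PATTERNS table is illustrative; reuse the exact regexes from your schema so both layers agree):

```python
import re

PATTERNS = {
    "email": r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$",
    "iso_date": r"^\d{4}-\d{2}-\d{2}$",
    "hex_color": r"^#[0-9A-Fa-f]{6}$",
}

def validate_string(value, pattern_name, min_length=1, max_length=256):
    """Code-side mirror of pattern/minLength/maxLength.
    Returns an error string, or None if valid."""
    if not (min_length <= len(value) <= max_length):
        return "length out of range"
    if not re.fullmatch(PATTERNS[pattern_name], value):
        return f"value does not match {pattern_name} pattern"
    return None
```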
Strict mode limits (Jan 2025): up to 5 levels of nesting and 100 total object properties per schema; deeper nesting also increases latency and reduces reliability. Example recursive schema: {'type': 'object', 'properties': {'name': {'type': 'string'}, 'children': {'type': 'array', 'items': {'$ref': '#'}}}, 'required': ['name', 'children'], 'additionalProperties': False} (strict mode requires children in the required array; allow empty arrays to terminate). Use $ref: '#' for root self-reference (recursive tree/graph structures); #/$defs references are also supported, but external $refs are not. For deep structures: flatten when possible, use IDs and separate queries, limit recursion with maxItems. Example: file system (directory → subdirectories max 3 levels), org chart (manager → reports max 5 levels). Without a depth limit: model may generate extremely nested structures consuming excess tokens. Alternative: iterative approach - query one level at a time. Test: verify model stops at reasonable depth. For complex nested data: consider structured outputs with predefined depth or multiple API calls. Production: document expected depth, validate output depth before processing.
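The "validate output depth before processing" step, as a helper over the decoded JSON (counting each dict or list level as one):

```python
def json_depth(value):
    """Nesting depth of decoded JSON; scalars are depth 0."""
    if isinstance(value, dict):
        return 1 + max((json_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((json_depth(v) for v in value), default=0)
    return 0

tree = {"name": "root", "children": [{"name": "a", "children": []}]}
depth = json_depth(tree)  # root dict > children list > child dict > empty list
```

Reject or re-query when json_depth(output) exceeds your documented budget.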
Maintain full message history including tool calls and results: messages = [{'role': 'user', 'content': 'What is weather in SF?'}]; response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools); messages.append(response.choices[0].message); for tool_call in response.choices[0].message.tool_calls: result = execute_function(tool_call); messages.append({'role': 'tool', 'content': json.dumps(result), 'tool_call_id': tool_call.id}); response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools). Pattern: user → assistant (with tool_calls) → tool results → assistant → user → repeat. Include all previous messages for context. For multi-turn: messages array grows with each interaction. Context window limits: gpt-4o (128K tokens), gpt-4o-mini (128K). Manage long conversations: summarize old messages, truncate early messages, use sliding window (keep last N messages). Store conversation state: persist messages array in database/session. For parallel tool calls: append all tool results before next API call. Cost optimization: remove old tool messages after summarization. Test: verify model references earlier tool results correctly.
Use zodToJsonSchema library: import { zodToJsonSchema } from 'zod-to-json-schema'; import { z } from 'zod'; const WeatherSchema = z.object({city: z.string(), units: z.enum(['celsius', 'fahrenheit']).optional()}); const jsonSchema = zodToJsonSchema(WeatherSchema, {$refStrategy: 'none'}); const tool = {type: 'function', function: {name: 'get_weather', parameters: jsonSchema, strict: true}}. IMPORTANT: Zod emits nullable: true (invalid for OpenAI) - use type: ['string', 'null'] instead. Fix: manually replace nullable or use transform. Set $refStrategy: 'none' to inline definitions (OpenAI doesn't support external $ref). For strict mode: ensure additionalProperties: false - Zod may omit this. Alternative: zod-openai library handles OpenAI-specific conversions. Validation: console.log(JSON.stringify(jsonSchema)) to verify schema. Common issues: Zod .optional() → required array handling, .default() → enum defaults, .refine() custom validation (not supported, handle in code). Production: test generated schema with OpenAI API before deployment. For complex schemas: use Zod .describe() for parameter descriptions.
Multi-stage testing approach: (1) Schema validation: Use JSON Schema validator (jsonschema Python library, ajv for JavaScript) to verify schema is valid. (2) Dry run: Call OpenAI API with test prompts and verify tool_calls structure: response = client.chat.completions.create(model='gpt-4o', messages=[{'role': 'user', 'content': 'test prompt'}], tools=tools); assert response.choices[0].message.tool_calls is not None. (3) Edge cases: Test with ambiguous prompts, missing information, invalid values, boundary conditions. (4) Argument validation: Parse function arguments and validate: args = json.loads(tool_call.function.arguments); validate_args(args). (5) Integration testing: Execute actual functions with returned arguments. (6) Load testing: Verify performance with concurrent requests. Use strict=True for schema enforcement. Create test suite with expected function calls: assert tool_call.function.name == 'get_weather'. Monitor: completion.usage tokens, latency, error rates. For production: canary deployment (5% traffic), monitoring with Datadog/Sentry, fallback to non-function-calling mode on errors.
Use exponential backoff with jitter: import time; import random; from openai import OpenAI, RateLimitError, APIError; client = OpenAI(); max_retries = 3; for attempt in range(max_retries): try: response = client.chat.completions.create(...); break; except RateLimitError: if attempt == max_retries - 1: raise; wait = (2 ** attempt) + random.uniform(0, 1); time.sleep(wait). Retry conditions: (1) RateLimitError (429): always retry. (2) APIError (500, 502, 503): retry transient failures. (3) Timeout: retry with longer timeout. (4) Invalid tool arguments: don't retry (fix schema). Rate limits (Jan 2025): gpt-4o Tier 1: 500 RPM, 30K TPM, Tier 5: 10K RPM, 30M TPM. Handle Retry-After header: wait = int(error.headers.get('Retry-After', 1)). Use tenacity library: @retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(3), retry=retry_if_exception_type(RateLimitError)). For production: implement circuit breaker, monitor retry rates, use batch API for non-urgent requests, upgrade tier if hitting limits frequently.
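The wait schedule from the snippet above (2^attempt plus uniform jitter), factored out so it can be inspected and unit-tested without hitting the API:

```python
import random

def backoff_delays(max_retries=3, base=2.0, max_jitter=1.0, seed=None):
    """Precompute exponential-backoff waits: base**attempt + U(0, max_jitter).
    Pass a seed for reproducible schedules in tests."""
    rng = random.Random(seed)
    return [(base ** attempt) + rng.uniform(0, max_jitter)
            for attempt in range(max_retries)]

delays = backoff_delays(seed=42)  # e.g. time.sleep(delays[attempt]) per retry
```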
Add system message with function usage guidelines: messages = [{'role': 'system', 'content': 'You are a helpful assistant. Only call functions when explicitly requested by user. Always verify parameters before calling. If unsure about a parameter, ask user for clarification instead of guessing.'}, {'role': 'user', 'content': 'What is weather?'}]; response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools). System prompt patterns: (1) Function selection: 'Prefer function A over B when X condition'. (2) Parameter validation: 'Never use placeholder values like TODO or UNKNOWN'. (3) Confirmation: 'Ask user confirmation before calling functions with side effects'. (4) Error handling: 'If function fails, explain error and suggest alternatives'. (5) Response format: 'After function call, provide concise summary'. Combine with tool_choice: tool_choice='required' forces function call regardless of system prompt. For strict workflows: 'You must call exactly one function per user request'. Test: verify system prompt affects function calling behavior. Production: include safety constraints ('Never call delete functions without explicit confirmation'), rate limiting guidance ('Maximum 5 API calls per request'). Monitor: check if model follows guidelines, adjust prompt based on behavior.
Combine vision and function calling: messages = [{'role': 'user', 'content': [{'type': 'text', 'text': 'Extract receipt details'}, {'type': 'image_url', 'image_url': {'url': 'data:image/jpeg;base64,...'}}]}]; tools = [{'type': 'function', 'function': {'name': 'extract_receipt', 'strict': True, 'parameters': {'type': 'object', 'properties': {'merchant': {'type': 'string'}, 'total': {'type': 'number'}, 'items': {'type': 'array', 'items': {'type': 'object', 'properties': {'name': {'type': 'string'}, 'price': {'type': 'number'}}, 'required': ['name', 'price'], 'additionalProperties': False}}}, 'required': ['merchant', 'total', 'items'], 'additionalProperties': False}}}]; response = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools). Model analyzes image and returns structured data via function call. Use cases: receipt OCR, document extraction, image metadata, diagram analysis. Supports: gpt-4o, gpt-4o-mini (vision + function calling). Image formats: JPEG, PNG, WebP. Max image size: 20MB. URL or base64. For multiple images: add multiple image_url entries in content array. Combine with strict mode for guaranteed schema adherence. Production: validate extracted data, handle OCR errors gracefully.