OpenAI Gateway Infrastructure
Overview
This document describes the model registry system implemented for the OpenAI gateway. The registry provides centralized management of OpenAI model capabilities, parameter requirements, and API endpoint support, enabling automatic parameter adaptation and compatibility handling across OpenAI's diverse model catalog.
Problem Statement
OpenAI models vary significantly in their parameter requirements and capabilities:
- Reasoning models (o1, o3, o4, gpt-5 series): Use max_completion_tokens instead of max_tokens
- Chat models (GPT-4, GPT-3.5 series): Use max_tokens
- Legacy models (babbage-002, davinci-002, gpt-3.5-turbo-instruct): Use completions endpoint instead of chat
- Specialized models: Some support only specific API endpoints or temperature values
Users were getting errors like:
openai.BadRequestError: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead."}}
Solution Architecture
1. Model Registry System (openai_model_registry.py)
Core Components:
- ModelType enum: Classifies models (REASONING, CHAT, EMBEDDING, MODERATION)
- ModelCapabilities dataclass: Defines model-specific capabilities and parameter requirements
- OpenAIModelRegistry class: Manages model configurations and provides lookup functionality
Features:
- ✅ Pre-populated with 100+ known OpenAI models
- ✅ Pattern matching for unknown models
- ✅ Runtime registration of new models and patterns
- ✅ Automatic parameter name resolution (get_token_limit_param())
- ✅ Temperature validation and API endpoint awareness
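As a rough illustration of how these components might relate, here is a minimal sketch. The enum members and the get_token_limit_param() name come from this document. The enum values, the field defaults, and placing the method on the dataclass are assumptions made for the example, not a copy of openai_model_registry.py.

```python
from dataclasses import dataclass
from enum import Enum


class ModelType(Enum):
    # Member names follow the document; the string values are assumed.
    REASONING = "reasoning"
    CHAT = "chat"
    EMBEDDING = "embedding"
    MODERATION = "moderation"


@dataclass
class ModelCapabilities:
    # Only a subset of the documented fields; defaults are assumed.
    model_type: ModelType
    supports_tools: bool = True
    supports_streaming: bool = True

    def get_token_limit_param(self) -> str:
        """Return the token-limit parameter name this model type expects."""
        if self.model_type is ModelType.REASONING:
            return "max_completion_tokens"
        return "max_tokens"


o1 = ModelCapabilities(model_type=ModelType.REASONING)
gpt4o = ModelCapabilities(model_type=ModelType.CHAT)
print(o1.get_token_limit_param())
print(gpt4o.get_token_limit_param())
```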
2. Enhanced OpenAI Gateway (openai.py)
Key Improvements:
- ✅ Registry-based model detection (replaces hardcoded patterns)
- ✅ Automatic parameter adaptation (max_tokens ↔ max_completion_tokens)
- ✅ Enhanced logging for debugging parameter issues
- ✅ Parameter validation with helpful error messages
- ✅ Better error handling for API parameter mismatches
New Methods:
- _validate_model_parameters(): Pre-flight parameter validation
- _adapt_parameters_for_model(): Registry-based parameter adaptation
- _is_reasoning_model(): Simplified using registry
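The adaptation step can be pictured with the following standalone sketch. The function names mirror the methods listed above without the leading underscore, but the bodies are assumptions. In particular, the prefix check is a simplification of the registry lookup and would misclassify exceptions such as gpt-5-chat-latest.

```python
from typing import Any, Dict

# Prefixes listed for reasoning models in this document. The real gateway
# consults the registry instead of a hardcoded tuple.
REASONING_PREFIXES = ("o1", "o3", "o4", "gpt-5")


def is_reasoning_model(model: str) -> bool:
    """Simplified stand-in for the registry-based check."""
    return model.startswith(REASONING_PREFIXES)


def adapt_parameters_for_model(model: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with the token-limit key renamed if needed."""
    adapted = dict(params)  # never mutate the caller's dict
    if is_reasoning_model(model) and "max_tokens" in adapted:
        value = adapted.pop("max_tokens")
        # Keep an explicitly supplied max_completion_tokens if one exists.
        adapted.setdefault("max_completion_tokens", value)
    return adapted


print(adapt_parameters_for_model("o1", {"max_tokens": 1000}))
print(adapt_parameters_for_model("gpt-4o", {"max_tokens": 1000}))
```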
3. Comprehensive Testing
Test Coverage:
- ✅ Model registry functionality (openai_model_registry_spec.py)
- ✅ Parameter adaptation logic (updated openai_gateway_spec.py)
- ✅ Unknown model handling
- ✅ Registry extensibility
ModelCapabilities Fields
The ModelCapabilities dataclass defines all model-specific properties:
Type and Core Support:
- model_type: ModelType enum (REASONING, CHAT, EMBEDDING, MODERATION)
- supports_tools: bool — Whether the model supports function/tool calling
- supports_streaming: bool — Whether the model supports streaming responses
- supports_vision: bool — Whether the model supports image inputs
Token Limits:
- max_context_tokens: Optional[int] — Maximum input context window
- max_output_tokens: Optional[int] — Maximum output tokens
Temperature Support:
- supported_temperatures: Optional[List[float]] — Temperature restrictions
  - None = All temperature values supported (default)
  - [] = Temperature parameter not allowed
  - [1.0] = Only temperature=1.0 supported
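The three cases above reduce to a small membership check. The sketch below is a free function written for illustration. The document exposes this as caps.supports_temperature(), whose actual implementation is not shown here.

```python
from typing import List, Optional


def supports_temperature(supported: Optional[List[float]], temperature: float) -> bool:
    """Apply the None / [] / [values] convention described above."""
    if supported is None:
        # None means no restriction at all.
        return True
    # An empty list rejects every value; otherwise only listed values pass.
    return temperature in supported


print(supports_temperature(None, 0.7))
print(supports_temperature([], 1.0))
print(supports_temperature([1.0], 1.0))
print(supports_temperature([1.0], 0.7))
```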
API Endpoint Support (v1.1.0):
- supports_chat_api: bool — Supports /v1/chat/completions endpoint (default: True)
- supports_completions_api: bool — Supports /v1/completions endpoint (default: False)
- supports_responses_api: bool — Supports /v1/responses endpoint (default: False)
API Endpoint Support
OpenAI provides three main API endpoints with different model compatibility:
Chat API (/v1/chat/completions)
Most common endpoint — Supports conversational message format with roles (user, assistant, system, tool).
- Models: Most GPT-4, GPT-4.1, GPT-4o, GPT-3.5-turbo, o1, o3, o4, gpt-5 models
- Features: Tool calling, streaming, multi-turn conversations
- Example: gpt-4o, o1, gpt-5
Completions API (/v1/completions)
Legacy endpoint — Simple prompt-completion interface without message structure.
- Models: babbage-002, davinci-002, gpt-3.5-turbo-instruct, some newer dual-endpoint models
- Features: Direct text completion, no tool calling
- Example: gpt-3.5-turbo-instruct, babbage-002
Responses API (/v1/responses)
Newer endpoint — Specialized for advanced reasoning and research tasks.
- Models: o1-pro, o3-deep-research, o4-mini-deep-research, gpt-5-pro, codex-mini-latest
- Features: Extended reasoning, longer output, research-oriented
- Example: gpt-5-pro, o3-deep-research
Dual-Endpoint Models
Some models support multiple endpoints:
- gpt-4o-mini: Chat + Completions
- gpt-4.1-nano: Chat + Completions
- gpt-5.1: Chat + Completions
Note: The Mojentic gateway currently only uses the Chat API. The endpoint support flags are informational and used for future compatibility planning.
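Since the flags are informational today, callers who want to act on them have to do so themselves. The sketch below shows one way that could look. EndpointFlags and pick_endpoint are hypothetical names, and the preference order (chat, then responses, then completions) is an assumption rather than documented gateway behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointFlags:
    # Field names and defaults follow the ModelCapabilities fields above.
    supports_chat_api: bool = True
    supports_completions_api: bool = False
    supports_responses_api: bool = False


def pick_endpoint(flags: EndpointFlags) -> Optional[str]:
    """Return the first supported endpoint path, or None if there is none."""
    if flags.supports_chat_api:
        return "/v1/chat/completions"
    if flags.supports_responses_api:
        return "/v1/responses"
    if flags.supports_completions_api:
        return "/v1/completions"
    return None


responses_only = EndpointFlags(supports_chat_api=False, supports_responses_api=True)
print(pick_endpoint(EndpointFlags()))
print(pick_endpoint(responses_only))
```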
Model Classification
Reasoning Models
Pattern: o1, o3, o4, gpt-5
Parameter: max_completion_tokens
Tools/Streaming: Most now support both (as of 2026-02-04 audit)
Exceptions:
- gpt-5-pro: Responses API only, no tools/streaming
- o3-deep-research, o4-mini-deep-research: Responses API, support tools/streaming
- Audio/search variants: Limited tool/streaming support
Examples:
# o1, o3, o4 series
o1, o1-2024-12-17, o1-pro
o3, o3-mini, o3-pro, o3-deep-research
o4-mini, o4-mini-deep-research
# GPT-5 series
gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro
gpt-5.1, gpt-5.1-2025-11-13, gpt-5.1-chat-latest
gpt-5.2, gpt-5.2-2025-12-11, gpt-5.2-chat-latest
gpt-5-codex, codex-mini-latest
Chat Models
Pattern: gpt-4, gpt-4.1, gpt-3.5
Parameter: max_tokens
Tools/Streaming: Generally supported
Exceptions:
- Audio models (gpt-4o-audio-preview): No tools/streaming
- Search models (gpt-4o-search-preview): No tools, no temperature parameter
- chatgpt-4o-latest: No tools
- gpt-4.1-nano: No tools
- Instruct models (gpt-3.5-turbo-instruct): Completions API only
Examples:
# GPT-4 series
gpt-4, gpt-4-turbo, gpt-4o, gpt-4o-mini
chatgpt-4o-latest
# GPT-4.1 series
gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
# GPT-3.5 series
gpt-3.5-turbo, gpt-3.5-turbo-0125
gpt-3.5-turbo-instruct # Completions API
# Special variants
gpt-4o-audio-preview
gpt-4o-search-preview
gpt-5-chat-latest # Chat model, not reasoning
gpt-5-search-api
Embedding Models
Pattern: text-embedding
Endpoint: Embeddings API (/v1/embeddings)
Examples: text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002
Legacy Models
Completions API only — No chat, tools, or streaming support.
Examples: babbage-002, davinci-002
Unknown Models
Fallback: Pattern matching infers type based on name
Logging: Pattern matching attempts logged for debugging
Default: Chat model capabilities if no pattern matches
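A minimal sketch of this fallback, assuming simple prefix matching over the patterns listed in the classification sections above. The PREFIX_PATTERNS table, its ordering, and the infer_model_type name are made up for the example. The registry's real matching rules may differ.

```python
import logging
from enum import Enum
from typing import List, Tuple

logger = logging.getLogger(__name__)


class ModelType(Enum):
    REASONING = "reasoning"
    CHAT = "chat"
    EMBEDDING = "embedding"
    MODERATION = "moderation"


# Prefixes taken from the "Pattern" lines above; the table itself is assumed.
PREFIX_PATTERNS: List[Tuple[str, ModelType]] = [
    ("text-embedding", ModelType.EMBEDDING),
    ("o1", ModelType.REASONING),
    ("o3", ModelType.REASONING),
    ("o4", ModelType.REASONING),
    ("gpt-5", ModelType.REASONING),
    ("gpt-4", ModelType.CHAT),
    ("gpt-3.5", ModelType.CHAT),
]


def infer_model_type(model_name: str) -> ModelType:
    """Infer a type for an unregistered model, defaulting to CHAT."""
    for prefix, model_type in PREFIX_PATTERNS:
        matched = model_name.startswith(prefix)
        logger.debug("pattern %r against %r: %s", prefix, model_name, matched)
        if matched:
            return model_type
    logger.debug("no pattern matched %r; defaulting to CHAT", model_name)
    return ModelType.CHAT


print(infer_model_type("o3-future-variant"))
print(infer_model_type("some-unknown-model"))
```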
Temperature Restrictions
Different models have varying temperature support:
All temperatures supported (supported_temperatures=None):
- Most chat models (gpt-4, gpt-4o, gpt-3.5-turbo)
- gpt-5.1 base models (gpt-5.1, gpt-5.1-2025-11-13)
- gpt-5.2 base models (gpt-5.2, gpt-5.2-2025-12-11)
Only temperature=1.0 (supported_temperatures=[1.0]):
- Most reasoning models (o1, o3, o4)
- Most gpt-5 models
- Chat-latest variants (gpt-5.1-chat-latest, gpt-5.2-chat-latest)
- Codex models
No temperature parameter (supported_temperatures=[]):
- Search models (gpt-4o-search-preview, gpt-5-search-api)
Check temperature support programmatically:
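from mojentic.llm.gateways.openai_model_registry import get_model_registry
registry = get_model_registry()
caps = registry.get_model_capabilities("gpt-5")
if caps.supports_temperature(0.7):
    temperature = 0.7  # Requested value is accepted
else:
    temperature = None  # Omit the parameter and use the model default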
Key Benefits
1. Automatic Parameter Handling
# Before: Manual parameter management required
# After: Automatic adaptation based on model type
gateway.complete(
    model="o1",  # Reasoning model
    messages=messages,
    max_tokens=1000  # Automatically converted to max_completion_tokens
)
2. Enhanced Debugging
INFO: Converted token limit parameter for model
model=o1
from_param=max_tokens
to_param=max_completion_tokens
value=1000
3. Endpoint Awareness
from mojentic.llm.gateways.openai_model_registry import get_model_registry
registry = get_model_registry()
# Check endpoint support
caps = registry.get_model_capabilities("gpt-4o-mini")
print(caps.supports_chat_api) # True
print(caps.supports_completions_api) # True (dual-endpoint model)
# Check a responses-only model
caps = registry.get_model_capabilities("gpt-5-pro")
print(caps.supports_chat_api) # False
print(caps.supports_responses_api) # True
4. Extensible Architecture
# Easy to add new models
from mojentic.llm.gateways.openai_model_registry import (
    get_model_registry, ModelCapabilities, ModelType
)
registry = get_model_registry()
registry.register_model("custom-model-v1", ModelCapabilities(
    model_type=ModelType.REASONING,
    supports_tools=True,
    supports_streaming=True,
    max_output_tokens=50000
))
# Easy to add new patterns
registry.register_pattern("custom", ModelType.CHAT)
5. Better Error Messages
ValueError: Reasoning model 'o1' requires 'max_completion_tokens' instead of 'max_tokens'. This should be handled automatically by parameter adaptation.
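A pre-flight check that produces a message of this shape might look like the sketch below. The error text is the one quoted above. The function signature, and passing is_reasoning in as a flag instead of looking it up in the registry, are assumptions for the example.

```python
from typing import Any, Dict


def validate_model_parameters(model: str, is_reasoning: bool, params: Dict[str, Any]) -> None:
    """Raise ValueError if a reasoning model is still given max_tokens."""
    if is_reasoning and "max_tokens" in params:
        raise ValueError(
            f"Reasoning model '{model}' requires 'max_completion_tokens' "
            "instead of 'max_tokens'. This should be handled automatically "
            "by parameter adaptation."
        )


# Passes silently: the parameter has already been adapted.
validate_model_parameters("o1", True, {"max_completion_tokens": 1000})

try:
    validate_model_parameters("o1", True, {"max_tokens": 1000})
except ValueError as exc:
    print(exc)
```

Running validation after adaptation means the error only fires when adaptation was skipped or failed, which is what the wording of the message suggests.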
Usage Examples
Basic Usage (Automatic)
from mojentic.llm.gateways.openai import OpenAIGateway
gateway = OpenAIGateway(api_key="your-key")
# Works automatically for both model types
response = gateway.complete(
    model="o1",  # or "gpt-4o"
    messages=messages,
    max_tokens=1000  # Automatically adapted as needed
)
Querying Model Capabilities
from mojentic.llm.gateways.openai_model_registry import get_model_registry
registry = get_model_registry()
# Get capabilities for a specific model
caps = registry.get_model_capabilities("gpt-4o")
print(f"Supports tools: {caps.supports_tools}")
print(f"Supports streaming: {caps.supports_streaming}")
print(f"Supports vision: {caps.supports_vision}")
print(f"Max context: {caps.max_context_tokens}")
print(f"Max output: {caps.max_output_tokens}")
# Check temperature support
if caps.supports_temperature(0.7):
    print("Temperature 0.7 is supported")
# Check endpoint support
print(f"Chat API: {caps.supports_chat_api}")
print(f"Completions API: {caps.supports_completions_api}")
print(f"Responses API: {caps.supports_responses_api}")
Registry Extension
from mojentic.llm.gateways.openai_model_registry import (
    get_model_registry, ModelCapabilities, ModelType
)
registry = get_model_registry()
# Add new model with full capabilities
registry.register_model("gpt-6-preview", ModelCapabilities(
    model_type=ModelType.REASONING,
    supports_tools=True,
    supports_streaming=True,
    supports_vision=True,
    max_context_tokens=500000,
    max_output_tokens=100000,
    supported_temperatures=None,  # All temps supported
    supports_chat_api=True,
    supports_completions_api=False,
    supports_responses_api=True
))
# Add new pattern for unknown models
registry.register_pattern("gpt-6", ModelType.REASONING)
Checking Model Lists
from mojentic.llm.gateways.openai_model_registry import get_model_registry
registry = get_model_registry()
# Get all registered models
models = registry.get_registered_models()
print(f"Total models: {len(models)}")
# Check if a specific model is reasoning type
if registry.is_reasoning_model("gpt-5"):
    print("gpt-5 is a reasoning model")
Migration Guide
For Existing Code
No changes required! The infrastructure is backward compatible and handles parameter adaptation automatically.
For New Models
- Add the model to the registry in openai_model_registry.py, OR
- Rely on pattern matching, which automatically classifies models whose names start with known prefixes
For Debugging Issues
- Check logs for parameter adaptation messages
- Use registry.get_model_capabilities(model) to inspect model classification
- Enable debug logging to see detailed parameter handling
Testing
Run the comprehensive test suite:
# Model registry tests
pytest src/mojentic/llm/gateways/openai_model_registry_spec.py
# Integration tests
pytest integration_checks/openai_gateway_spec.py
# Import validation
python -c "import mojentic.llm.gateways.openai_model_registry"
Conclusion
The OpenAI gateway infrastructure provides:
- ✅ Automatic parameter handling: No more max_tokens vs max_completion_tokens errors
- ✅ Endpoint awareness: Know which API endpoints each model supports
- ✅ Temperature validation: Prevent invalid temperature values
- ✅ Extensible architecture: Easy to add new models and capabilities
- ✅ Enhanced debugging: Detailed logging for troubleshooting
- ✅ Comprehensive testing: Robust test coverage for reliability
- ✅ Backward compatibility: Existing code continues to work
This infrastructure ensures that users can work with any OpenAI model without worrying about parameter compatibility issues, while providing the flexibility to easily extend support for new models as they are released.