api.openai.com works by changing only base_url and model.
Spec version we track
The chat-completions request validator at our gateway is authored againstopenai-python 1.109.1 — specifically the request shape openai.types.chat.completion_create_params.CompletionCreateParamsBase. That same version is pinned in our gateway image, so the SDK you call against and the shape we validate can’t drift apart silently.
We validate a focused subset of the SDK’s surface — the fields where validation gaps were causing real problems (off-spec inputs returning 200 with garbage, or 500 with a stack trace). Unknown or unsupported fields are forwarded to the engine and accepted or rejected there. The matrix below is the exact subset.
Bumps are deliberate: when we upgrade the pin, a snapshot test fails until we audit the SDK diff, and we update this page in the same PR.
Endpoints
| Endpoint | Supported | Notes |
|---|---|---|
POST /v1/chat/completions | Yes | Streaming, tools, vision (on multimodal models), seed, response_format. |
POST /v1/completions | Yes | Legacy; prefer chat completions for new integrations. |
GET /v1/models | Yes | Lists every live model across all modalities — text, code, reasoning, multimodal, embeddings, image, audio, video — in OpenAI’s Model shape, each annotated with a category field so you can pick the matching endpoint. Only models currently serving appear. See model discovery for the category→endpoint mapping, or the catalog for the full roster with pricing. |
POST /v1/embeddings | Yes | bge-m3 (multilingual, 1024-dim, 8K context). encoding_format is optional and defaults to "float"; "base64" is also accepted. (Pass "float" or omit the field — don’t send an explicit null.) See the model catalog. |
POST /v1/fine-tuning | No | Use FlexAI Fine-Tuning instead. |
POST /v1/assistants, /v1/threads, /v1/runs | No | Not planned. |
POST /v1/batches | No | Not supported — use client-side fan-out instead. See Batching. |
GET /v1/files, /v1/files/* | No | Use inline request payloads. |
Request fields
Field on POST /v1/chat/completions | Supported | Notes |
|---|---|---|
model, messages, temperature, top_p, max_tokens, stop, seed, user | Yes | — |
stream, stream_options.include_usage | Yes | Set include_usage: true when streaming for correct billing. |
tools, tool_choice | Yes | On models labeled tool_use in the catalog. |
response_format: { type: "json_object" } | Yes | On most text models. |
response_format: { type: "json_schema", ... } | Partial | Supported where the underlying model supports it; falls back to json_object otherwise. |
logprobs, top_logprobs | No | Not at launch. |
n (multiple completions per request) | No | Gateway returns a single choice; sending n > 1 returns a 400 with param: "n". |
presence_penalty, frequency_penalty | Yes | — |
Response fields
| Field | Supported | Notes |
|---|---|---|
id, created, model, choices, usage | Yes | — |
usage.cache_read_input_tokens, usage.cache_creation_input_tokens | Yes | Zero on models without prompt caching; populated for cache-capable models. |
system_fingerprint | No | Not exposed. |
Authentication & headers
- Bearer tokens only (
Authorization: Bearer sk-…). No OAuth, no organization header. - Our rate limit response includes
Retry-Afterandx-ratelimit-*headers — see errors. - CORS is restricted; the API is intended for server-to-server calls, not browsers.
Error shape
Errors use OpenAI’s envelope —{ "error": { "message", "type", … } } — in two flavors.
Request-validation failures (400) come back verbatim in OpenAI’s shape, with param set to the offending field path (or null when it isn’t field-specific) and code set to null — we reserve non-null code strings the way OpenAI does, for a narrow documented set like context_length_exceeded:
doc_url extension so your first failure points you at the dashboard: