Clients written against api.openai.com work by changing only `base_url` and `model`.
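As a sketch of that drop-in swap, the request below is built with the standard library only; the base URL, API key, and model name are placeholders, not real values:

```python
import json
import urllib.request

# Placeholder values (assumptions) — substitute your gateway URL, your
# API key, and a model name from the catalog.
BASE_URL = "https://api.example.com/v1"
API_KEY = "sk-your-key"

def chat_request(model, messages, **params):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("example-model", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it; omitted in this sketch.
```

Official OpenAI SDKs accept the same override via their `base_url` constructor argument.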
## Endpoints
| Endpoint | Supported | Notes |
|---|---|---|
| `POST /v1/chat/completions` | Yes | Streaming, tools, vision (on multimodal models), `seed`, `response_format`. |
| `POST /v1/completions` | Yes | Legacy; prefer chat completions for new integrations. |
| `GET /v1/models` | Yes | Text/chat models only. Media models are in the catalog. |
| `POST /v1/images/generations` | Yes | Returns `b64_json`; the `url` response format is not supported. |
| `POST /v1/videos/generations` | Yes | Async; not in OpenAI’s API today but follows the same auth/shape. |
| `POST /v1/audio/speech` | Yes | TTS; voice names differ per model — check the model page. |
| `POST /v1/audio/transcriptions` | Yes | Whisper-compatible `multipart/form-data`. |
| `POST /v1/embeddings` | No | No embedding model is deployed; the route is reachable through the gateway but returns `400 invalid_request_error` for any `model` value. |
| `POST /v1/audio/translations` | No | Use transcriptions plus a chat call for translation. |
| `POST /v1/fine-tuning` | No | Use FlexAI Fine-Tuning instead. |
| `POST /v1/assistants`, `/v1/threads`, `/v1/runs` | No | Not planned. |
| `POST /v1/images/edits`, `/v1/images/variations` | No | FLUX.1-Kontext-dev is reserved for future image-editing support. |
| `POST /v1/batches` | No | Not planned at launch. |
| `GET /v1/files`, `/v1/files/*` | No | Use inline request payloads. |
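Because image generation returns `b64_json` rather than a URL, the client decodes the payload itself. A minimal sketch, using a stand-in response body (the `data[i].b64_json` shape follows OpenAI's images API):

```python
import base64

# Stand-in for the JSON body returned by POST /v1/images/generations;
# a real response carries actual image bytes in data[i].b64_json.
response = {"data": [{"b64_json": base64.b64encode(b"\x89PNG\r\n...").decode()}]}

# Decode the base64 payload back into raw image bytes.
image_bytes = base64.b64decode(response["data"][0]["b64_json"])
with open("out.png", "wb") as f:
    f.write(image_bytes)
```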
## Request fields
| Field on `POST /v1/chat/completions` | Supported | Notes |
|---|---|---|
| `model`, `messages`, `temperature`, `top_p`, `max_tokens`, `stop`, `seed`, `user` | Yes | — |
| `stream`, `stream_options.include_usage` | Yes | Set `include_usage: true` when streaming for correct billing. |
| `tools`, `tool_choice` | Yes | On models labeled `tool_use` in the catalog. |
| `response_format: { type: "json_object" }` | Yes | On most text models. |
| `response_format: { type: "json_schema", ... }` | Partial | Supported where the underlying model supports it; falls back to `json_object` otherwise. |
| `logprobs`, `top_logprobs` | No | Not at launch. |
| `n` (multiple completions per request) | No | Returns a single choice. |
| `presence_penalty`, `frequency_penalty` | Yes | — |
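When streaming with `stream_options.include_usage: true`, the usage object arrives in a final chunk before the `[DONE]` sentinel. A sketch of parsing the server-sent-event lines, with hand-written sample chunks in the chat-completions streaming shape:

```python
import json

def iter_chunks(sse_lines):
    """Yield parsed JSON chunks from 'data: ...' SSE lines,
    stopping at the [DONE] sentinel."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments and keep-alives
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Example stream: one content delta, then the usage chunk that
# include_usage: true appends before [DONE].
lines = [
    'data: {"choices":[{"delta":{"content":"Hi"}}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":5,"completion_tokens":1,"total_tokens":6}}',
    "data: [DONE]",
]
chunks = list(iter_chunks(lines))
# The usage chunk is the last one with a non-null usage field.
usage = next(c["usage"] for c in reversed(chunks) if c.get("usage"))
```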
## Response fields
| Field | Supported | Notes |
|---|---|---|
| `id`, `created`, `model`, `choices`, `usage` | Yes | — |
| `usage.cache_read_input_tokens`, `usage.cache_creation_input_tokens` | Yes | Zero on models without prompt caching; populated for cache-capable models. |
| `system_fingerprint` | No | Not exposed. |
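Since the cache counters are zero on non-caching models, client code can read them unconditionally. An illustrative helper (the field names come from the table above; the helper itself is not part of the API):

```python
def cache_hit_ratio(usage):
    """Fraction of prompt tokens served from the prompt cache.
    cache_read_input_tokens is zero on models without caching,
    so this safely returns 0.0 for them."""
    prompt = usage.get("prompt_tokens", 0)
    if prompt == 0:
        return 0.0
    return usage.get("cache_read_input_tokens", 0) / prompt

# Example usage object from a cache-capable model.
usage = {"prompt_tokens": 1000, "completion_tokens": 50,
         "cache_read_input_tokens": 800, "cache_creation_input_tokens": 0}
```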
## Authentication & headers
- Bearer tokens only (`Authorization: Bearer sk-…`). No OAuth, no organization header.
- Our rate-limit response includes `Retry-After` and `x-ratelimit-*` headers — see errors.
- CORS is restricted; the API is intended for server-to-server calls, not browsers.
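Those rate-limit headers suggest a simple retry policy: honor `Retry-After` when the server sends it, otherwise back off exponentially. An illustrative helper (the header name comes from this page; the backoff constants are arbitrary choices):

```python
def retry_delay(headers, attempt, base=1.0, cap=30.0):
    """Seconds to wait before retrying a rate-limited (429) response.
    Prefers the server's Retry-After header; falls back to capped
    exponential backoff keyed on the attempt number."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(cap, float(retry_after))
        except ValueError:
            pass  # non-numeric value; fall through to backoff
    return min(cap, base * (2 ** attempt))
```

For example, `retry_delay({"Retry-After": "2"}, 0)` waits the server-requested 2 seconds, while `retry_delay({}, 3)` backs off for 8 seconds.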