OpenAI Compatibility - FlexAI Docs

The Inference API implements a subset of OpenAI’s v1 surface. For everything below marked Supported, code written against api.openai.com works by changing only base_url and model.

Spec version we track

The chat-completions request validator at our gateway is authored against openai-python 1.109.1 — specifically the request shape openai.types.chat.completion_create_params.CompletionCreateParamsBase. That same version is pinned in our gateway image, so the SDK you call against and the shape we validate can’t drift apart silently. We validate a focused subset of the SDK’s surface — the fields where validation gaps were causing real problems (off-spec inputs returning 200 with garbage, or 500 with a stack trace). Unknown or unsupported fields are forwarded to the engine and accepted or rejected there. The matrix below is the exact subset. Bumps are deliberate: when we upgrade the pin, a snapshot test fails until we audit the SDK diff, and we update this page in the same PR.

Endpoints

Endpoint	Supported	Notes
`POST /v1/chat/completions`	Yes	Streaming, tools, vision (on multimodal models), seed, response_format.
`POST /v1/completions`	Yes	Legacy; prefer chat completions for new integrations.
`GET /v1/models`	Yes	Lists every live model across all modalities — text, code, reasoning, multimodal, embeddings, image, audio, video — in OpenAI’s `Model` shape, each annotated with a `category` field so you can pick the matching endpoint. Only models currently serving appear. See model discovery for the category→endpoint mapping, or the catalog for the full roster with pricing.
`POST /v1/embeddings`	Yes	`bge-m3` (multilingual, 1024-dim, 8K context). `encoding_format` is optional and defaults to `"float"`; `"base64"` is also accepted. (Pass `"float"` or omit the field — don’t send an explicit `null`.) See the model catalog.
`POST /v1/fine-tuning`	No	Use FlexAI Fine-Tuning instead.
`POST /v1/assistants`, `/v1/threads`, `/v1/runs`	No	Not planned.
`POST /v1/batches`	No	Not supported — use client-side fan-out instead. See Batching.
`GET /v1/files`, `/v1/files/*`	No	Use inline request payloads.

Request fields

Field on `POST /v1/chat/completions`	Supported	Notes
`model`, `messages`, `temperature`, `top_p`, `max_tokens`, `stop`, `seed`, `user`	Yes	—
`stream`, `stream_options.include_usage`	Yes	Set `include_usage: true` when streaming for correct billing.
`tools`, `tool_choice`	Yes	On models labeled `tool_use` in the catalog.
`response_format: { type: "json_object" }`	Yes	On most text models.
`response_format: { type: "json_schema", ... }`	Partial	Supported where the underlying model supports it; falls back to `json_object` otherwise.
`logprobs`, `top_logprobs`	No	Not at launch.
`n` (multiple completions per request)	No	Gateway returns a single choice; sending `n > 1` returns a `400` with `param: "n"`.
`presence_penalty`, `frequency_penalty`	Yes	—

Response fields

Field	Supported	Notes
`id`, `created`, `model`, `choices`, `usage`	Yes	—
`usage.cache_read_input_tokens`, `usage.cache_creation_input_tokens`	Yes	Zero on models without prompt caching; populated for cache-capable models.
`system_fingerprint`	No	Not exposed.

Authentication & headers

Bearer tokens only (Authorization: Bearer sk-…). No OAuth, no organization header.
Our rate limit response includes Retry-After and x-ratelimit-* headers — see errors.
CORS is restricted; the API is intended for server-to-server calls, not browsers.

Error shape

Errors use OpenAI’s envelope — { "error": { "message", "type", … } } — in two flavors. Request-validation failures (400) come back verbatim in OpenAI’s shape, with param set to the offending field path (or null when it isn’t field-specific) and code set to null — we reserve non-null code strings the way OpenAI does, for a narrow documented set like context_length_exceeded:

{ "error": { "message": "…", "type": "invalid_request_error", "param": "messages[0].role", "code": null } }

Auth and quota failures add a FlexAI-only doc_url extension so your first failure points you at the dashboard:

{
  "error": {
    "message": "Invalid API key. Create one from the dashboard.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "doc_url": "https://tokens.flex.ai/dashboard/keys"
  }
}

See the error reference for the full list of codes.

​Spec version we track

​Endpoints

​Request fields

​Response fields

​Authentication & headers

​Error shape