> ## Documentation Index
> Fetch the complete documentation index at: https://docs.flex.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# OpenAI Compatibility

> What is and isn't supported at launch when you point an OpenAI SDK at tokens.flex.ai.

Point an OpenAI SDK at `tokens.flex.ai` and, for everything marked Supported below, code written against `api.openai.com` works by changing only `base_url` and `model`. This page is the exact compatibility matrix — the endpoints and request fields the gateway implements at launch.

## Spec version we track

The chat-completions request validator at our gateway is authored against **`openai-python` 1.109.1** — specifically the request shape `openai.types.chat.completion_create_params.CompletionCreateParamsBase`. That same version is pinned in our gateway image, so the SDK you call against and the shape we validate can't drift apart silently.

We validate a focused subset of the SDK's surface — the fields where validation gaps were causing real problems (off-spec inputs returning `200` with garbage, or `500` with a stack trace). Unknown or unsupported fields are forwarded to the engine and accepted or rejected there. The matrix below is the exact subset.

Bumps are deliberate: when we upgrade the pin, a snapshot test fails until we audit the SDK diff, and we update this page in the same PR.

## Endpoints

| Endpoint                                         | Supported | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| ------------------------------------------------ | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `POST /v1/chat/completions`                      | Yes       | Streaming, tools, vision (on multimodal models), seed, response\_format.                                                                                                                                                                                                                                                                                                                                                                          |
| `POST /v1/completions`                           | Yes       | Legacy; prefer chat completions for new integrations.                                                                                                                                                                                                                                                                                                                                                                                             |
| `GET /v1/models`                                 | Yes       | Lists every **live** model across all modalities — text, code, reasoning, multimodal, embeddings, image, audio, video — in OpenAI's `Model` shape, each annotated with a `category` field so you can pick the matching endpoint. Only models currently serving appear. See [model discovery](/inference-api/guides/model-discovery) for the category→endpoint mapping, or the [catalog](https://flex.ai/models) for the full roster with pricing. |
| `POST /v1/embeddings`                            | Yes       | `bge-m3` (multilingual, 1024-dim, 8K context). `encoding_format` is optional and defaults to `"float"`; `"base64"` is also accepted. (Pass `"float"` or omit the field — don't send an explicit `null`.) See the [model catalog](https://flex.ai/models).                                                                                                                                                                                         |
| `POST /v1/fine-tuning`                           | No        | Use [FlexAI Fine-Tuning](/core-services/fine-tuning) instead.                                                                                                                                                                                                                                                                                                                                                                                     |
| `POST /v1/assistants`, `/v1/threads`, `/v1/runs` | No        | Not planned.                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `POST /v1/batches`                               | No        | Coming soon — [reach out](mailto:support@flex.ai) if you need it. See [Batching](/inference-api/guides/batch).                                                                                                                                                                                                                                                                                                                                    |
| `GET /v1/files`, `/v1/files/*`                   | No        | Use inline request payloads.                                                                                                                                                                                                                                                                                                                                                                                                                      |

## Request fields

| Field on `POST /v1/chat/completions`                                | Supported | Notes                                                                                                                                                                                            |
| ------------------------------------------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model`, `messages`, `temperature`, `top_p`, `stop`, `seed`, `user` | Yes       | —                                                                                                                                                                                                |
| `max_tokens`                                                        | Yes       | Current limitation: non-streaming responses are capped at 2048 output tokens regardless of this value (`finish_reason: "length"`); [stream](/inference-api/guides/streaming) for longer outputs. |
| `stream`, `stream_options.include_usage`                            | Yes       | Set `include_usage: true` when streaming for correct billing.                                                                                                                                    |
| `tools`, `tool_choice`                                              | Yes       | On models labeled `tool_use` in the catalog.                                                                                                                                                     |
| `response_format: { type: "json_object" }`                          | Yes       | On most text models.                                                                                                                                                                             |
| `response_format: { type: "json_schema", ... }`                     | Partial   | Supported where the underlying model supports it; falls back to `json_object` otherwise.                                                                                                         |
| `logprobs`, `top_logprobs`                                          | No        | Not at launch.                                                                                                                                                                                   |
| `n` (multiple completions per request)                              | No        | Gateway returns a single choice; sending `n > 1` returns a `400` with `param: "n"`.                                                                                                              |
| `presence_penalty`, `frequency_penalty`                             | Yes       | —                                                                                                                                                                                                |

## Response fields

| Field                                                                | Supported | Notes                                                                      |
| -------------------------------------------------------------------- | --------- | -------------------------------------------------------------------------- |
| `id`, `created`, `model`, `choices`, `usage`                         | Yes       | —                                                                          |
| `usage.cache_read_input_tokens`, `usage.cache_creation_input_tokens` | Yes       | Zero on models without prompt caching; populated for cache-capable models. |
| `system_fingerprint`                                                 | No        | Not exposed.                                                               |

## Authentication & headers

* Bearer tokens only (`Authorization: Bearer sk-…`). No OAuth, no organization header.
* Our rate limit response includes `Retry-After` and `x-ratelimit-*` headers — see [errors](/inference-api/reference/errors).
* CORS is restricted; the API is intended for server-to-server calls, not browsers.

## Error shape

Errors use OpenAI's envelope — `{ "error": { "message", "type", … } }` — in two flavors.

**Request-validation failures** (`400`) come back verbatim in OpenAI's shape, with `param` set to the offending field path (or `null` when it isn't field-specific) and `code` set to `null` — we reserve non-null `code` strings the way OpenAI does, for a narrow documented set like `context_length_exceeded`:

```json theme={null}
{ "error": { "message": "…", "type": "invalid_request_error", "param": "messages[0].role", "code": null } }
```

**Auth and quota failures** add a FlexAI-only `doc_url` extension so your first failure points you at the dashboard:

```json theme={null}
{
  "error": {
    "message": "Invalid API key. Create one from the dashboard.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "doc_url": "https://tokens.flex.ai/dashboard/keys"
  }
}
```

See [the error reference](/inference-api/reference/errors) for the full list of codes.