Documentation Index
Fetch the complete documentation index at: https://docs.flex.ai/llms.txt
Use this file to discover all available pages before exploring further.
The Inference API implements a subset of OpenAI’s v1 surface. For everything below marked Supported, code written against api.openai.com works by changing only base_url and model.
Endpoints
| Endpoint | Supported | Notes |
|---|
POST /v1/chat/completions | Yes | Streaming, tools, vision (on multimodal models), seed, response_format. |
POST /v1/completions | Yes | Legacy; prefer chat completions for new integrations. |
GET /v1/models | Yes | Returns text, chat, and embedding models routed via LiteLLM. Media models (images, video, audio) are routed through portal-api and are not listed here — see the catalog for the complete set. |
POST /v1/images/generations | Yes | Returns b64_json. url response format not supported. |
POST /v1/videos/generations | Yes | Async; not in OpenAI’s API today but follows the same auth/shape. |
POST /v1/audio/speech | Yes | TTS; voice names differ per model — check the model page. |
POST /v1/audio/transcriptions | Yes | Whisper-compatible multipart/form-data. |
POST /v1/embeddings | Partial | BAAI/bge-m3 at launch (multilingual, 1024-dim, 8K context). Deviation from OpenAI: our backend currently requires encoding_format in the request body, while OpenAI treats it as optional (defaulting to "float"). Set it explicitly for now — SDKs that omit it will get a 400. We plan to default it server-side so this caveat goes away. See the model catalog. |
POST /v1/audio/translations | No | Use transcriptions + a chat call for translation. |
POST /v1/fine-tuning | No | Use FlexAI Fine-Tuning instead. |
POST /v1/assistants, /v1/threads, /v1/runs | No | Not planned. |
POST /v1/images/edits, /v1/images/variations | No | FLUX.1-Kontext-dev reserved for future image editing support. |
POST /v1/batches | No | Not planned at launch. |
GET /v1/files, /v1/files/* | No | Use inline request payloads. |
Request fields
Field on POST /v1/chat/completions | Supported | Notes |
|---|
model, messages, temperature, top_p, max_tokens, stop, seed, user | Yes | — |
stream, stream_options.include_usage | Yes | Set include_usage: true when streaming for correct billing. |
tools, tool_choice | Yes | On models labeled tool_use in the catalog. |
response_format: { type: "json_object" } | Yes | On most text models. |
response_format: { type: "json_schema", ... } | Partial | Supported where the underlying model supports it; falls back to json_object otherwise. |
logprobs, top_logprobs | No | Not at launch. |
n (multiple completions per request) | No | Returns a single choice. |
presence_penalty, frequency_penalty | Yes | — |
Response fields
| Field | Supported | Notes |
|---|
id, created, model, choices, usage | Yes | — |
usage.cache_read_input_tokens, usage.cache_creation_input_tokens | Yes | Zero on models without prompt caching; populated for cache-capable models. |
system_fingerprint | No | Not exposed. |
- Bearer tokens only (
Authorization: Bearer sk-…). No OAuth, no organization header.
- Our rate limit response includes
Retry-After and x-ratelimit-* headers — see errors.
- CORS is restricted; the API is intended for server-to-server calls, not browsers.
Error shape
We return OpenAI’s error envelope verbatim:
{ "error": { "message": "…", "type": "invalid_request_error", "code": null, "param": null } }
See the error reference for the full list of codes.