Skip to main content
The Inference API implements a subset of OpenAI’s v1 surface. For everything below marked Supported, code written against api.openai.com works by changing only base_url and model.

Spec version we track

The chat-completions request validator at our gateway is authored against openai-python 1.109.1 — specifically the request shape openai.types.chat.completion_create_params.CompletionCreateParamsBase. That same version is pinned in our gateway image, so the SDK you call against and the shape we validate can’t drift apart silently. We validate a focused subset of the SDK’s surface — the fields where validation gaps were causing real problems (off-spec inputs returning 200 with garbage, or 500 with a stack trace). Unknown or unsupported fields are forwarded to the engine and accepted or rejected there. The matrix below is the exact subset. Bumps are deliberate: when we upgrade the pin, a snapshot test fails until we audit the SDK diff, and we update this page in the same PR.

Endpoints

EndpointSupportedNotes
POST /v1/chat/completionsYesStreaming, tools, vision (on multimodal models), seed, response_format.
POST /v1/completionsYesLegacy; prefer chat completions for new integrations.
GET /v1/modelsYesLists every live model across all modalities — text, code, reasoning, multimodal, embeddings, image, audio, video — in OpenAI’s Model shape, each annotated with a category field so you can pick the matching endpoint. Only models currently serving appear. See model discovery for the category→endpoint mapping, or the catalog for the full roster with pricing.
POST /v1/embeddingsYesbge-m3 (multilingual, 1024-dim, 8K context). encoding_format is optional and defaults to "float"; "base64" is also accepted. (Pass "float" or omit the field — don’t send an explicit null.) See the model catalog.
POST /v1/fine-tuningNoUse FlexAI Fine-Tuning instead.
POST /v1/assistants, /v1/threads, /v1/runsNoNot planned.
POST /v1/batchesNoNot supported — use client-side fan-out instead. See Batching.
GET /v1/files, /v1/files/*NoUse inline request payloads.

Request fields

Field on POST /v1/chat/completionsSupportedNotes
model, messages, temperature, top_p, max_tokens, stop, seed, userYes
stream, stream_options.include_usageYesSet include_usage: true when streaming for correct billing.
tools, tool_choiceYesOn models labeled tool_use in the catalog.
response_format: { type: "json_object" }YesOn most text models.
response_format: { type: "json_schema", ... }PartialSupported where the underlying model supports it; falls back to json_object otherwise.
logprobs, top_logprobsNoNot at launch.
n (multiple completions per request)NoGateway returns a single choice; sending n > 1 returns a 400 with param: "n".
presence_penalty, frequency_penaltyYes

Response fields

FieldSupportedNotes
id, created, model, choices, usageYes
usage.cache_read_input_tokens, usage.cache_creation_input_tokensYesZero on models without prompt caching; populated for cache-capable models.
system_fingerprintNoNot exposed.

Authentication & headers

  • Bearer tokens only (Authorization: Bearer sk-…). No OAuth, no organization header.
  • Our rate limit response includes Retry-After and x-ratelimit-* headers — see errors.
  • CORS is restricted; the API is intended for server-to-server calls, not browsers.

Error shape

Errors use OpenAI’s envelope — { "error": { "message", "type", … } } — in two flavors. Request-validation failures (400) come back verbatim in OpenAI’s shape, with param set to the offending field path (or null when it isn’t field-specific) and code set to null — we reserve non-null code strings the way OpenAI does, for a narrow documented set like context_length_exceeded:
{ "error": { "message": "…", "type": "invalid_request_error", "param": "messages[0].role", "code": null } }
Auth and quota failures add a FlexAI-only doc_url extension so your first failure points you at the dashboard:
{
  "error": {
    "message": "Invalid API key. Create one from the dashboard.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "doc_url": "https://tokens.flex.ai/dashboard/keys"
  }
}
See the error reference for the full list of codes.