Skip to main content
The FlexAI Inference API is an OpenAI-compatible HTTP gateway for text, code, reasoning, vision, and embedding models. Point any OpenAI SDK at https://tokens.flex.ai/v1 and your existing code works unchanged.

Quickstart

Get a key and make your first request in under two minutes.

Model catalog

Browse every model we host, with context windows and pricing.

Model discovery

Filter the live catalog from code and find the right endpoint per modality.

Streaming

Stream tokens token-by-token with usage tracking.

Vision

Send images to multimodal models in the OpenAI image_url shape.

Tool use

Call functions from model responses.

What you get

  • One API key, every model. Chat, completions, vision, and embeddings all authenticate with the same sk-… bearer token.
  • Account-level budgets and rate limits. Every account has its own spend cap and requests-per-minute limit, shared across all your keys; per-key quotas are on the roadmap. Exceed either and you get an unambiguous 402 or 429 with headers explaining why.
  • Dollar-denominated credits. Add a billing address and credit card, then spend down at per-model rates. No token packs, no subscriptions.

OpenAI compatibility

The surface is intentionally identical to OpenAI’s for the endpoints we support. If your code runs against api.openai.com, swap the base_url, api key and the model id — that’s the whole migration. See the compatibility matrix for what’s in and out at launch.

Where things live

SurfaceURL
API base URLhttps://tokens.flex.ai
Dashboard (keys, billing, usage)https://tokens.flex.ai/dashboard
Statushttps://status.flex.ai
Docs (this site)https://docs.flex.ai/inference-api