Skip to main content
The FlexAI Inference API is an OpenAI-compatible HTTP gateway for text, code, reasoning, vision, and embedding models. Point any OpenAI SDK at https://tokens.flex.ai/v1 and your existing code works unchanged.

Quickstart

Get a key and make your first request in under two minutes.

Model catalog

Browse every model we host, with context windows and pricing.

Model discovery

Filter the live catalog from code and find the right endpoint per modality.

Streaming

Stream tokens token-by-token with usage tracking.

Vision

Send images to multimodal models in the OpenAI image_url shape.

Tool use

Call functions from model responses.

What you get

  • One API key, every model. Chat, completions, vision, and embeddings all authenticate with the same sk-… bearer token.
  • Per-key budgets and rate limits. Every key has its own spend cap and requests-per-minute limit. Exceed either and you get an unambiguous 402 or 429 with headers explaining why.
  • Dollar-denominated credits. Top up with a credit card; spend down at per-model rates. No token packs, no subscriptions.
  • $10 of free credit on signup so you can evaluate the API before adding a card.

OpenAI compatibility

The surface is intentionally identical to OpenAI’s for the endpoints we support. If your code runs against api.openai.com, swap the base_url and the model id — that’s the whole migration. See the compatibility matrix for what’s in and out at launch.

Where things live

SurfaceURL
API base URLhttps://tokens.flex.ai
Dashboard (keys, billing, usage)https://tokens.flex.ai/dashboard
Statushttps://status.flex.ai
Docs (this site)https://docs.flex.ai/inference-api