The FlexAI Inference API is an OpenAI-compatible HTTP gateway for text, code, reasoning, vision, image, video, and audio models. Point any OpenAI SDK at https://tokens.flex.ai/v1 and your existing code works unchanged.
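A minimal first request, sketched with only the Python standard library so it is self-contained; any OpenAI SDK works the same way by setting `base_url="https://tokens.flex.ai/v1"`. The model id below is a placeholder — pick a real one from the model catalog.

```python
# Minimal sketch of a chat request against the FlexAI gateway.
# Assumptions: FLEXAI_API_KEY holds your sk-... key, and the model id
# is a placeholder to be replaced from the model catalog.
import json
import os
import urllib.request

BASE_URL = "https://tokens.flex.ai/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat.completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(model: str, prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['FLEXAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("placeholder-model-id", "Say hello in one word."))
```

Because the request and response bodies are the standard OpenAI shapes, swapping this for an SDK call is mechanical.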

Quickstart

Get a key and make your first request in under two minutes.

Model catalog

Browse every model we host, with context windows and pricing.

Streaming

Stream responses token by token with usage tracking.
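Streamed responses arrive as OpenAI-style server-sent events: lines prefixed `data: `, closed by `data: [DONE]`, with an optional final usage-only chunk when the request asks for it (OpenAI's `stream_options: {"include_usage": true}` — verify against the streaming page that FlexAI mirrors this parameter). A sketch of the client-side parsing:

```python
# Hedged sketch of consuming an OpenAI-style SSE stream.
import json

def parse_sse_line(line: str):
    """Decode one SSE line into a chunk dict; None for blanks and [DONE]."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    return json.loads(payload)

def extract_text(chunk: dict) -> str:
    """Pull the incremental token text out of a streaming chunk."""
    choices = chunk.get("choices") or []
    if not choices:
        return ""  # a final usage-only chunk has an empty choices list
    return choices[0].get("delta", {}).get("content") or ""
```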

Vision

Send images to multimodal models in the OpenAI image_url shape.
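The `image_url` shape accepts either a hosted URL or a base64 `data:` URL inside a multi-part user message. A sketch of building both variants (the surrounding request is the same chat.completions call as for text):

```python
# Sketch of the OpenAI image_url message shape.
import base64

def image_part_from_url(url: str) -> dict:
    """Reference a hosted image by URL."""
    return {"type": "image_url", "image_url": {"url": url}}

def image_part_from_bytes(data: bytes, mime: str = "image/png") -> dict:
    """Inline raw image bytes as a base64 data: URL."""
    b64 = base64.b64encode(data).decode()
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

def vision_message(question: str, image_part: dict) -> dict:
    """A user message pairing a text part with an image part."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": question}, image_part],
    }
```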

Tool use

Call functions from model responses.
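Tool calling follows the OpenAI round trip: declare function schemas in `tools`, then execute whatever `tool_calls` the model returns and feed the results back as `role: "tool"` messages. A sketch, where `get_weather` is a made-up local function standing in for your own:

```python
# Hedged sketch of the OpenAI tool-calling round trip.
import json

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real lookup

def run_tool_calls(tool_calls: list) -> list:
    """Turn the model's tool_calls into role="tool" follow-up messages."""
    messages = []
    for call in tool_calls:
        args = json.loads(call["function"]["arguments"])
        result = get_weather(**args)  # single-tool dispatch for brevity
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    return messages
```

The follow-up messages go back into the next chat.completions request alongside the conversation so far.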

What you get

  • One API key, every model. Chat, completions, images, video, and audio all authenticate with the same sk-… bearer token.
  • Per-key budgets and rate limits. Every key has its own spend cap and requests-per-minute limit. Exceed either and you get an unambiguous 402 or 429 with headers explaining why.
  • Dollar-denominated credits. Top up with a credit card; spend down at per-model rates. No token packs, no subscriptions.
  • $10 of free credit on signup so you can evaluate the API before adding a card.
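The 402/429 split above suggests a simple client-side policy: treat 402 (spend cap) as fatal and 429 (rate limit) as retryable. A sketch, assuming the standard `Retry-After` header; the exact headers FlexAI sets should be checked against the error responses themselves:

```python
# Sketch of retry logic for the documented 402/429 responses.
import time
import urllib.error
import urllib.request

def retry_delay(attempt, retry_after=None):
    """Seconds to wait: honor Retry-After if present, else exponential backoff."""
    return float(retry_after) if retry_after is not None else float(2 ** attempt)

def request_with_retry(req, max_retries=3):
    """Send req; retry 429s with backoff, fail fast on 402."""
    for attempt in range(max_retries + 1):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code == 402:
                raise RuntimeError("Spend cap reached - top up credits") from err
            if err.code == 429 and attempt < max_retries:
                time.sleep(retry_delay(attempt, err.headers.get("Retry-After")))
                continue
            raise
```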

OpenAI compatibility

The surface is intentionally identical to OpenAI’s for the endpoints we support. If your code runs against api.openai.com, swap the base_url and the model id — that’s the whole migration. See the compatibility matrix for what’s in and out at launch.

Where things live

Surface                              URL
Production API                       https://tokens.flex.ai
Staging API                          https://tokens.flexsystems.ai
Dashboard (keys, billing, usage)     https://tokens.flex.ai/dashboard
Status                               https://status.flex.ai
Docs (this site)                     https://docs.flex.ai/inference-api