https://tokens.flex.ai/v1 and your existing code works unchanged.
Quickstart
Get a key and make your first request in under two minutes.
Model catalog
Browse every model we host, with context windows and pricing.
Model discovery
Filter the live catalog from code and find the right endpoint per modality.
Streaming
Stream tokens token-by-token with usage tracking.
Vision
Send images to multimodal models in the OpenAI
image_url shape.Tool use
Call functions from model responses.
What you get
- One API key, every model. Chat, completions, vision, and embeddings all authenticate with the same
sk-…bearer token. - Per-key budgets and rate limits. Every key has its own spend cap and requests-per-minute limit. Exceed either and you get an unambiguous 402 or 429 with headers explaining why.
- Dollar-denominated credits. Top up with a credit card; spend down at per-model rates. No token packs, no subscriptions.
- $10 of free credit on signup so you can evaluate the API before adding a card.
OpenAI compatibility
The surface is intentionally identical to OpenAI’s for the endpoints we support. If your code runs againstapi.openai.com, swap the base_url and the model id — that’s the whole migration. See the compatibility matrix for what’s in and out at launch.
Where things live
| Surface | URL |
|---|---|
| API base URL | https://tokens.flex.ai |
| Dashboard (keys, billing, usage) | https://tokens.flex.ai/dashboard |
| Status | https://status.flex.ai |
| Docs (this site) | https://docs.flex.ai/inference-api |