> ## Documentation Index
> Fetch the complete documentation index at: https://docs.flex.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Overview

> OpenAI-compatible inference API for text, code, reasoning, vision, and embedding models. One API key, billed per token.

The FlexAI Inference API is an OpenAI-compatible HTTP gateway for text, code, reasoning, vision, and embedding models. Point any OpenAI SDK at `https://tokens.flex.ai/v1` and your existing code works unchanged.

<CardGroup cols={2}>
  <Card title="Quickstart" icon="rocket" href="/inference-api/quickstart">
    Get a key and make your first request in under two minutes.
  </Card>

  <Card title="Model catalog" icon="list" href="https://flex.ai/models">
    Browse every model we host, with context windows and pricing.
  </Card>

  <Card title="Model discovery" icon="magnifying-glass" href="/inference-api/guides/model-discovery">
    Filter the live catalog from code and find the right endpoint per modality.
  </Card>

  <Card title="Streaming" icon="bolt" href="/inference-api/guides/streaming">
    Stream tokens token-by-token with usage tracking.
  </Card>

  <Card title="Vision" icon="image" href="/inference-api/guides/vision">
    Send images to multimodal models in the OpenAI `image_url` shape.
  </Card>

  <Card title="Tool use" icon="wrench" href="/inference-api/guides/tool-use">
    Call functions from model responses.
  </Card>
</CardGroup>

## What you get

* **One API key, every model.** Chat, completions, vision, and embeddings all authenticate with the same `sk-…` bearer token.
* **Per-key budgets and rate limits.** Every key has its own spend cap and requests-per-minute limit. Exceed either and you get an unambiguous 402 or 429 with headers explaining why.
* **Dollar-denominated credits.** Top up with a credit card; spend down at per-model rates. No token packs, no subscriptions.
* **\$10 of free credit** on signup so you can evaluate the API before adding a card.

## OpenAI compatibility

The surface is intentionally identical to OpenAI's for the endpoints we support. If your code runs against `api.openai.com`, swap the `base_url` and the model id — that's the whole migration. See the [compatibility matrix](/inference-api/reference/openai-compatibility) for what's in and out at launch.

## Where things live

| Surface                          | URL                                  |
| -------------------------------- | ------------------------------------ |
| API base URL                     | `https://tokens.flex.ai`             |
| Dashboard (keys, billing, usage) | `https://tokens.flex.ai/dashboard`   |
| Status                           | `https://status.flex.ai`             |
| Docs (this site)                 | `https://docs.flex.ai/inference-api` |
