> ## Documentation Index
> Fetch the complete documentation index at: https://docs.flex.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Model discovery

> How to list the model catalog programmatically, find the right model for your task, and call the matching endpoint.

There are two ways to enumerate the catalog at runtime, and they cover different audiences.

| Endpoint          | Auth          | Returns                                                                                                                                                                                                              | Use when                                                                                |
| ----------------- | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| `GET /api/models` | None (public) | Every model in the catalog, across all modalities, whether or not it's currently serving. Includes per-token `pricing`, capability tags, and per-model metadata (context window, provider, playground availability). | You want one public call that surfaces the whole catalog with capabilities and pricing. |
| `GET /v1/models`  | Bearer key    | Every **live** model across all modalities in OpenAI's `Model` shape, each with a `category` field. Only models currently serving appear.                                                                            | Your client speaks OpenAI's `/v1/models`, or you want just what's callable right now.   |

`/v1/models` is the OpenAI-shaped list of what's live right now; each entry's `category` tells you which endpoint to call (see [Category → endpoint](#category--endpoint)). `/api/models` is the full catalog with capability tags, per-model metadata, and per-token pricing.

## Calling `/api/models`

No auth, no headers, no params. Returns a JSON array of model objects.

<CodeGroup>
  ```bash cURL theme={null}
  curl https://tokens.flex.ai/api/models | jq '.[0]'
  ```

  ```python Python theme={null}
  import requests

  models = requests.get("https://tokens.flex.ai/api/models").json()
  print(models[0])
  ```

  ```typescript TypeScript theme={null}
  const models = await fetch("https://tokens.flex.ai/api/models").then(r => r.json());
  console.log(models[0]);
  ```
</CodeGroup>

Each entry follows this shape:

```json theme={null}
{
  "slug": "meta-llama-llama-3.3-70b-instruct",
  "model_name": "Llama-3.3-70B-Instruct-FP8",
  "display_name": "Llama 3.3 70B Instruct",
  "description": "Meta Llama 3.3 70B Instruct, served as an FP8 quantization.",
  "category": "text",
  "context_window": 131072,
  "max_output": 8192,
  "input_per_mtok": 0.1,
  "output_per_mtok": 0.32,
  "cached_input_per_mtok": null,
  "supports": ["chat", "completion", "streaming", "tool_use", "structured_outputs"],
  "tool_use_caveat": false,
  "added_at": "2024-12-06T00:00:00+00:00",
  "provider": "Meta",
  "provider_logo": "/provider-logos/meta.webp",
  "playground_enabled": true,
  "pricing": []
}
```

The fields you'll filter on most often:

* **`category`** — the model's modality/family: `text`, `code`, `reasoning`, `multimodal`, `vision`, `embedding`, `image`, `audio`, `video`. Determines which endpoint to call (see [Category → endpoint](#category--endpoint)).
* **`supports[]`** — capability tags like `chat`, `tool_use`, `streaming`, `vision`, `reasoning`, `embeddings`. This is what to filter on when you care about a specific capability.
* **`model_name`** — the value to pass as `model` in subsequent API calls. (`slug` is the dashboard URL form; `model_name` is the API form.)
* **`input_per_mtok` / `output_per_mtok`** — per-token pricing (USD per million tokens).
* **`playground_enabled`** — whether the model is callable from the dashboard playground.

A few filter examples:

```bash theme={null}
# All tool-calling models
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("tool_use"))]'

# All vision-capable models (accept image input on /v1/chat/completions)
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("vision"))]'

# Just the names and prices of embedding models
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("embeddings")) | {model_name, input_per_mtok}]'
```

## Category → endpoint

A model's `category` (on both `/api/models` and `/v1/models`) tells you which endpoint it's called through:

| `category`                                          | Call with                       |
| --------------------------------------------------- | ------------------------------- |
| `text`, `code`, `reasoning`, `multimodal`, `vision` | `POST /v1/chat/completions`     |
| `embedding`                                         | `POST /v1/embeddings`           |
| `image`                                             | `POST /v1/images/generations`   |
| `audio` (text-to-speech)                            | `POST /v1/audio/speech`         |
| `audio` (speech-to-text)                            | `POST /v1/audio/transcriptions` |
| `video`                                             | `POST /v1/videos/generations`   |

For `audio`, the `supports[]` tags disambiguate direction: a model carrying `audio_transcription` is speech-to-text; otherwise it's text-to-speech.

## Per-capability discovery flow

Each subsection: how to find the models, then the endpoint to call once you have a `model_name`.

### Chat & text

```bash theme={null}
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("chat")) | .model_name]'
```

Call with `POST /v1/chat/completions`. See [streaming](/inference-api/guides/streaming) and [tool use](/inference-api/guides/tool-use) for the common patterns.

### Vision (image input on chat)

```bash theme={null}
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("vision")) | .model_name]'
```

Call with `POST /v1/chat/completions` and pass image parts in the `content` array. The full pattern lives in the [vision guide](/inference-api/guides/vision).

### Embeddings

```bash theme={null}
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("embeddings")) | .model_name]'
```

Call with `POST /v1/embeddings`. See the [embeddings guide](/inference-api/guides/embeddings).

## See also

* [Model catalog](https://flex.ai/models) — human-readable table of every hosted model, with capabilities and pricing.
* [OpenAI compatibility](/inference-api/reference/openai-compatibility) — what's in and out of `/v1/*`, including the scope of `/v1/models`.
* [Billing](/inference-api/reference/billing) — how per-token pricing maps to charges.