Model discovery - FlexAI Docs

There are two ways to enumerate the catalog at runtime, and they cover different audiences.

Endpoint	Auth	Returns	Use when
`GET /api/models`	None (public)	Every model in the catalog, across all modalities, whether or not it’s currently serving. Includes per-token `pricing`, capability tags, and per-model metadata (context window, provider, playground availability).	You want one public call that surfaces the whole catalog with capabilities and pricing.
`GET /v1/models`	Bearer key	Every live model across all modalities in OpenAI’s `Model` shape, each with a `category` field. Only models currently serving appear.	Your client speaks OpenAI’s `/v1/models`, or you want just what’s callable right now.

/v1/models is the OpenAI-shaped list of what’s live right now; each entry’s category tells you which endpoint to call (see Category → endpoint). /api/models is the full catalog with capability tags, per-model metadata, and per-token pricing.

Calling `/api/models`

No auth, no headers, no params. Returns a JSON array of model objects.

curl https://tokens.flex.ai/api/models | jq '.[0]'

import requests

models = requests.get("https://tokens.flex.ai/api/models").json()
print(models[0])

const models = await fetch("https://tokens.flex.ai/api/models").then(r => r.json());
console.log(models[0]);

Each entry follows this shape:

{
  "slug": "meta-llama-llama-3.3-70b-instruct",
  "model_name": "Llama-3.3-70B-Instruct-FP8",
  "display_name": "Llama 3.3 70B Instruct",
  "description": "Meta Llama 3.3 70B Instruct, served as an FP8 quantization.",
  "category": "text",
  "context_window": 131072,
  "max_output": 8192,
  "input_per_mtok": 0.1,
  "output_per_mtok": 0.32,
  "cached_input_per_mtok": null,
  "supports": ["chat", "completion", "streaming", "tool_use", "structured_outputs"],
  "tool_use_caveat": false,
  "added_at": "2024-12-06T00:00:00+00:00",
  "provider": "Meta",
  "provider_logo": "/provider-logos/meta.webp",
  "playground_enabled": true,
  "pricing": []
}

The fields you’ll filter on most often:

category — the model’s modality/family: text, code, reasoning, multimodal, vision, embedding, image, audio, video. Determines which endpoint to call (see Category → endpoint).
supports[] — capability tags like chat, tool_use, streaming, vision, reasoning, embeddings. This is what to filter on when you care about a specific capability.
model_name — the value to pass as model in subsequent API calls. (slug is the dashboard URL form; model_name is the API form.)
input_per_mtok / output_per_mtok — per-token pricing (USD per million tokens).
playground_enabled — whether the model is callable from the dashboard playground.

A few filter examples:

# All tool-calling models
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("tool_use"))]'

# All vision-capable models (accept image input on /v1/chat/completions)
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("vision"))]'

# Just the names and prices of embedding models
curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("embeddings")) | {model_name, input_per_mtok}]'

Category → endpoint

A model’s category (on both /api/models and /v1/models) tells you which endpoint it’s called through:

`category`	Call with
`text`, `code`, `reasoning`, `multimodal`, `vision`	`POST /v1/chat/completions`
`embedding`	`POST /v1/embeddings`
`image`	`POST /v1/images/generations`
`audio` (text-to-speech)	`POST /v1/audio/speech`
`audio` (speech-to-text)	`POST /v1/audio/transcriptions`
`video`	`POST /v1/videos/generations`

For audio, the supports[] tags disambiguate direction: a model carrying audio_transcription is speech-to-text; otherwise it’s text-to-speech.

Per-capability discovery flow

Each subsection: how to find the models, then the endpoint to call once you have a model_name.

Chat & text

curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("chat")) | .model_name]'

Call with POST /v1/chat/completions. See streaming and tool use for the common patterns.

Vision (image input on chat)

curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("vision")) | .model_name]'

Call with POST /v1/chat/completions and pass image parts in the content array. The full pattern lives in the vision guide.

Embeddings

curl -s https://tokens.flex.ai/api/models \
  | jq '[.[] | select(.supports | index("embeddings")) | .model_name]'

Call with POST /v1/embeddings. See the embeddings guide.

​Calling /api/models

​Category → endpoint

​Per-capability discovery flow

​Chat & text

​Vision (image input on chat)

​Embeddings

​See also