Model Catalog

This page is generated from docs/models.yaml in the token-service repo. To change the catalog, edit that file; a CI workflow opens a PR against this repo whenever it changes.

Text & Chat

Model ID	Display Name	Context	Pricing	Capabilities	Status
`Qwen/Qwen2.5-32B-Instruct`	Qwen 2.5 32B Instruct	32K	TBD	`chat`, `streaming`, `tool_use`	active
`Qwen/Qwen3-8B`	Qwen 3 8B	32K	TBD	`chat`, `streaming`, `tool_use`	active
`meta-llama/Llama-3.1-8B-Instruct`	Llama 3.1 8B Instruct	128K	TBD	`chat`, `streaming`, `tool_use`	active
`mistralai/Mistral-Nemo-Instruct-2407`	Mistral Nemo Instruct	128K	TBD	`chat`, `streaming`, `tool_use`	active
`zai-org/GLM-4.5-Air-FP8`	GLM 4.5 Air	128K	TBD	`chat`, `streaming`, `tool_use`	active
`openai/gpt-oss-20b`	GPT-OSS 20B	128K	TBD	`chat`, `streaming`, `tool_use`	active
`openai/gpt-oss-120b`	GPT-OSS 120B	128K	TBD	`chat`, `streaming`, `tool_use`	active
`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8`	Nemotron Nano 30B	128K	TBD	`chat`, `streaming`, `tool_use`	active
`Qwen/Qwen3.5-35B-A3B-FP8`	Qwen 3.5 35B	32K	TBD	`chat`, `streaming`, `tool_use`	active
`meta-llama/Llama-3.1-70B-Instruct`	Llama 3.1 70B Instruct	128K	TBD	`chat`, `streaming`	active

Code

Model ID	Display Name	Context	Pricing	Capabilities	Status
`Qwen/Qwen3-Coder-30B-A3B-Instruct`	Qwen 3 Coder 30B	256K	TBD	`chat`, `streaming`, `tool_use`	active

Reasoning

Model ID	Display Name	Context	Pricing	Capabilities	Status
`Qwen/Qwen3-30B-A3B-Thinking-2507`	Qwen 3 30B Thinking	128K	TBD	`chat`, `streaming`, `reasoning`	active
`deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`	DeepSeek R1 Distill 32B	32K	TBD	`chat`, `streaming`, `reasoning`	active
`google/gemma-4-31b-it`	Gemma 4 31B	128K	TBD	`chat`, `streaming`, `reasoning`	active

Multimodal

Model ID	Display Name	Context	Pricing	Capabilities	Status
`microsoft/phi-4-multimodal-instruct`	Phi-4 Multimodal	128K	TBD	`chat`, `streaming`, `vision`, `audio_in`	active

Embeddings

Model ID	Display Name	Context	Pricing	Capabilities	Status
`BAAI/bge-m3`	BGE-M3	8K	TBD	`embed`	active

Image

Model ID	Display Name	Context	Pricing	Capabilities	Status
`black-forest-labs/FLUX.1-schnell`	FLUX.1 schnell	—	$0.003 / image	`image_out`	active
`black-forest-labs/FLUX.1-Kontext-dev`	FLUX.1 Kontext (dev)	—	$0.003 / image	`image_out`, `image_in`	inactive

Video

Model ID	Display Name	Context	Pricing	Capabilities	Status
`Wan-AI/Wan2.2-T2V-A14B-Diffusers`	Wan 2.2 Text-to-Video	—	$0.08 / second	`video_out`	inactive

Text-to-Speech

Model ID	Display Name	Context	Pricing	Capabilities	Status
`voxtral-4b-tts`	Voxtral 4B TTS	—	$0.00018 / char	`audio_out`	active
`hexgrad/Kokoro-82M`	Kokoro TTS (82M)	—	TBD	`audio_out`	inactive

Speech-to-Text

Model ID	Display Name	Context	Pricing	Capabilities	Status
`whisper-large-v3-turbo`	Whisper Large v3 Turbo	—	$0.006 / minute	`audio_in`	active
`nvidia/parakeet-tdt-0.6b-v3`	Parakeet TDT 0.6B	—	TBD	`audio_in`	inactive

Adding a model

When FlexAI engineering deploys a new model, they add it to docs/models.yaml in flexaihq/token-service. The sync workflow then opens a PR against this docs repo to refresh the tables above.
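
To give a rough feel for the generation step, here is a minimal sketch of how one catalog entry could be turned into a table row in the layout used on this page. It is illustrative only: the real schema of docs/models.yaml and the actual sync workflow are not shown here, so the `CatalogEntry` class, its field names, and the `render_row` and `render_table` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Column order copied from the tables on this page.
COLUMNS = ["Model ID", "Display Name", "Context", "Pricing", "Capabilities", "Status"]


@dataclass
class CatalogEntry:
    """Hypothetical shape of one entry; not the real models.yaml schema."""
    model_id: str
    display_name: str
    context: str
    pricing: str
    capabilities: List[str]
    status: str


def render_row(entry: CatalogEntry) -> str:
    """Render one entry as a tab-separated row with code-formatted IDs."""
    caps = ", ".join(f"`{cap}`" for cap in entry.capabilities)
    cells = [
        f"`{entry.model_id}`",
        entry.display_name,
        entry.context,
        entry.pricing,
        caps,
        entry.status,
    ]
    return "\t".join(cells)


def render_table(entries: List[CatalogEntry]) -> str:
    """Render a header line followed by one row per entry."""
    lines = ["\t".join(COLUMNS)]
    lines.extend(render_row(entry) for entry in entries)
    return "\n".join(lines)


# Values taken from the Embeddings table above.
bge = CatalogEntry(
    model_id="BAAI/bge-m3",
    display_name="BGE-M3",
    context="8K",
    pricing="TBD",
    capabilities=["embed"],
    status="active",
)
print(render_table([bge]))
```

Keeping the rendering in one function like this means a new column would only need to change in one place, though the actual generator may be organized quite differently.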
