This page is generated from docs/models.yaml in the token-service repo. To change the catalog, edit that file; a CI workflow opens a PR here whenever it changes.
Text & Chat
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| Qwen/Qwen2.5-32B-Instruct | Qwen 2.5 32B Instruct | 32K | TBD | chat, streaming, tool_use | active |
| Qwen/Qwen3-8B | Qwen 3 8B | 32K | TBD | chat, streaming, tool_use | active |
| meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | 128K | TBD | chat, streaming, tool_use | active |
| mistralai/Mistral-Nemo-Instruct-2407 | Mistral Nemo Instruct | 128K | TBD | chat, streaming, tool_use | active |
| zai-org/GLM-4.5-Air-FP8 | GLM 4.5 Air | 128K | TBD | chat, streaming, tool_use | active |
| openai/gpt-oss-20b | GPT-OSS 20B | 128K | TBD | chat, streaming, tool_use | active |
| openai/gpt-oss-120b | GPT-OSS 120B | 128K | TBD | chat, streaming, tool_use | active |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 | Nemotron Nano 30B | 128K | TBD | chat, streaming, tool_use | active |
| Qwen/Qwen3.5-35B-A3B-FP8 | Qwen 3.5 35B | 32K | TBD | chat, streaming, tool_use | active |
| meta-llama/Llama-3.1-70B-Instruct | Llama 3.1 70B Instruct | 128K | TBD | chat, streaming | active |
Code
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Qwen 3 Coder 30B | 256K | TBD | chat, streaming, tool_use | active |
Reasoning
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Qwen 3 30B Thinking | 128K | TBD | chat, streaming, reasoning | active |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | DeepSeek R1 Distill 32B | 32K | TBD | chat, streaming, reasoning | active |
| google/gemma-4-31b-it | Gemma 4 31B | 128K | TBD | chat, streaming, reasoning | active |
Multimodal
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| microsoft/phi-4-multimodal-instruct | Phi-4 Multimodal | 128K | TBD | chat, streaming, vision, audio_in | active |
Embeddings
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| BAAI/bge-m3 | BGE-M3 | 8K | TBD | embed | active |
Image
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| black-forest-labs/FLUX.1-schnell | FLUX.1 schnell | — | $0.003 / image | image_out | active |
| black-forest-labs/FLUX.1-Kontext-dev | FLUX.1 Kontext (dev) | — | $0.003 / image | image_out, image_in | inactive |
Video
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| Wan-AI/Wan2.2-T2V-A14B-Diffusers | Wan 2.2 Text-to-Video | — | $0.08 / second | video_out | inactive |
Text-to-Speech
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| voxtral-4b-tts | Voxtral 4B TTS | — | $0.00018 / char | audio_out | active |
| hexgrad/Kokoro-82M | Kokoro TTS (82M) | — | TBD | audio_out | inactive |
Speech-to-Text
| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| whisper-large-v3-turbo | Whisper Large v3 Turbo | — | $0.006 / minute | audio_in | active |
| nvidia/parakeet-tdt-0.6b-v3 | Parakeet TDT 0.6B | — | TBD | audio_in | inactive |
Adding a model
When FlexAI engineering deploys a new model, they add it to docs/models.yaml
in flexaihq/token-service. The sync workflow then opens a PR against this
docs repo to refresh the tables above.
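
As a rough illustration of the generation step, the sketch below renders catalog entries into rows of the six-column tables on this page. The actual schema of docs/models.yaml and the real sync workflow are not shown here, so the field names (`id`, `display_name`, `context`, `pricing`, `capabilities`, `status`) and the functions `render_row` and `render_table` are hypothetical; entries are plain Python dicts standing in for parsed YAML.

```python
# Column order matches the tables on this page.
COLUMNS = ["Model ID", "Display Name", "Context", "Pricing", "Capabilities", "Status"]


def render_row(entry):
    """Render one catalog entry (hypothetical schema) as a markdown table row."""
    cells = [
        entry["id"],
        entry["display_name"],
        entry["context"],
        entry.get("pricing", "TBD"),      # assumption: missing pricing shows as TBD
        ", ".join(entry["capabilities"]),
        entry["status"],
    ]
    return "| " + " | ".join(cells) + " |"


def render_table(entries):
    """Render a header, a separator row, and one row per entry."""
    header = "| " + " | ".join(COLUMNS) + " |"
    separator = "|" + "---|" * len(COLUMNS)
    return "\n".join([header, separator] + [render_row(e) for e in entries])


# Example entry mirroring the Qwen/Qwen3-8B row in the Text & Chat table.
example = {
    "id": "Qwen/Qwen3-8B",
    "display_name": "Qwen 3 8B",
    "context": "32K",
    "capabilities": ["chat", "streaming", "tool_use"],
    "status": "active",
}

if __name__ == "__main__":
    print(render_table([example]))
```

Keeping the column list in one place means the header, the separator, and every row stay in step if a column is ever added.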