This page is generated from docs/models.yaml in the flexaihq/token-service repo. To change the catalog, edit that file. When it changes, a CI workflow opens a PR against this repo to update the page.
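The generation step can be illustrated with the sketch below. It turns catalog entries into the tables on this page, one table per category. The real schema of docs/models.yaml is not documented here. The field names (`id`, `display_name`, `context`, `pricing`, `capabilities`, `status`, `category`) and the helpers `render_row` and `render_catalog` are therefore hypothetical, and entries are passed in as already-parsed Python dicts rather than read from YAML.

```python
# Hypothetical sketch: render catalog entries (as they might look after
# loading docs/models.yaml) into the pipe tables shown on this page.
# The field names used here are assumptions, not the real schema.
from collections import defaultdict

HEADER = "| Model ID | Display Name | Context | Pricing | Capabilities | Status |"
DIVIDER = "|---|---|---|---|---|---|"


def render_row(entry: dict) -> str:
    """Format one entry as a table row; missing fields become empty cells."""
    cells = [
        entry["id"],
        entry.get("display_name", ""),
        entry.get("context", ""),
        entry.get("pricing", "TBD"),  # assumed default when no price is set
        ", ".join(entry.get("capabilities", [])),
        entry.get("status", ""),
    ]
    return "| " + " | ".join(cells) + " |"


def render_catalog(entries: list[dict]) -> str:
    """Group entries by category (first-seen order) and emit one table per group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for entry in entries:
        groups[entry["category"]].append(entry)
    sections = []
    for category, rows in groups.items():
        lines = [category, "", HEADER, DIVIDER]
        lines.extend(render_row(row) for row in rows)
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

Grouping with a `defaultdict` keeps categories in the order they first appear. The page order would therefore follow the order of entries in the source file.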

Text & Chat

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| Qwen/Qwen2.5-32B-Instruct | Qwen 2.5 32B Instruct | 32K | TBD | chat, streaming, tool_use | active |
| Qwen/Qwen3-8B | Qwen 3 8B | 32K | TBD | chat, streaming, tool_use | active |
| meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | 128K | TBD | chat, streaming, tool_use | active |
| mistralai/Mistral-Nemo-Instruct-2407 | Mistral Nemo Instruct | 128K | TBD | chat, streaming, tool_use | active |
| zai-org/GLM-4.5-Air-FP8 | GLM 4.5 Air | 128K | TBD | chat, streaming, tool_use | active |
| openai/gpt-oss-20b | GPT-OSS 20B | 128K | TBD | chat, streaming, tool_use | active |
| openai/gpt-oss-120b | GPT-OSS 120B | 128K | TBD | chat, streaming, tool_use | active |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 | Nemotron Nano 30B | 128K | TBD | chat, streaming, tool_use | active |
| Qwen/Qwen3.5-35B-A3B-FP8 | Qwen 3.5 35B | 32K | TBD | chat, streaming, tool_use | active |
| meta-llama/Llama-3.1-70B-Instruct | Llama 3.1 70B Instruct | 128K | TBD | chat, streaming | active |
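A client would check the Capabilities and Status columns before routing a request. For example, Llama 3.1 70B Instruct is the only Text & Chat model above without `tool_use`. The sketch below filters a few rows copied from that table. The `Model` tuple and the `supporting` helper are hypothetical and are not part of any published client.

```python
# Hypothetical sketch: select active models that advertise a capability.
# Rows are copied from the Text & Chat table; the tuple layout is assumed.
from typing import NamedTuple


class Model(NamedTuple):
    model_id: str
    capabilities: frozenset[str]
    status: str


TEXT_AND_CHAT = [
    Model("Qwen/Qwen3-8B", frozenset({"chat", "streaming", "tool_use"}), "active"),
    Model("meta-llama/Llama-3.1-8B-Instruct", frozenset({"chat", "streaming", "tool_use"}), "active"),
    Model("meta-llama/Llama-3.1-70B-Instruct", frozenset({"chat", "streaming"}), "active"),
]


def supporting(models: list[Model], capability: str) -> list[str]:
    """Return IDs of active models whose capability set includes `capability`."""
    return [m.model_id for m in models if m.status == "active" and capability in m.capabilities]
```

For instance, `supporting(TEXT_AND_CHAT, "tool_use")` returns the two 8B models but not Llama 3.1 70B Instruct. That result matches the Capabilities column above.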

Code

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Qwen 3 Coder 30B | 256K | TBD | chat, streaming, tool_use | active |

Reasoning

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Qwen 3 30B Thinking | 128K | TBD | chat, streaming, reasoning | active |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | DeepSeek R1 Distill 32B | 32K | TBD | chat, streaming, reasoning | active |
| google/gemma-4-31b-it | Gemma 4 31B | 128K | TBD | chat, streaming, reasoning | active |

Multimodal

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| microsoft/phi-4-multimodal-instruct | Phi-4 Multimodal | 128K | TBD | chat, streaming, vision, audio_in | active |

Embeddings

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| BAAI/bge-m3 | BGE-M3 | 8K | TBD | embed | active |

Image

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| black-forest-labs/FLUX.1-schnell | FLUX.1 schnell |  | $0.003 / image | image_out | active |
| black-forest-labs/FLUX.1-Kontext-dev | FLUX.1 Kontext (dev) |  | $0.003 / image | image_out, image_in | inactive |

Video

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| Wan-AI/Wan2.2-T2V-A14B-Diffusers | Wan 2.2 Text-to-Video |  | $0.08 / second | video_out | inactive |

Text-to-Speech

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| voxtral-4b-tts | Voxtral 4B TTS |  | $0.00018 / char | audio_out | active |
| hexgrad/Kokoro-82M | Kokoro TTS (82M) |  | TBD | audio_out | inactive |

Speech-to-Text

| Model ID | Display Name | Context | Pricing | Capabilities | Status |
|---|---|---|---|---|---|
| whisper-large-v3-turbo | Whisper Large v3 Turbo |  | $0.006 / minute | audio_in | active |
| nvidia/parakeet-tdt-0.6b-v3 | Parakeet TDT 0.6B |  | TBD | audio_in | inactive |
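For models with a listed price, a rough cost estimate is the price in the Pricing column multiplied by the number of units consumed. For example, synthesizing 1,000 characters with Voxtral 4B TTS comes to 1,000 × $0.00018 = $0.18. Transcribing a 90-minute recording with Whisper Large v3 Turbo comes to 90 × $0.006 = $0.54. The sketch below assumes simple linear metering, because this page does not say how partial seconds or minutes are rounded. The `estimate_cost` helper is hypothetical. It uses `Decimal` to avoid binary floating-point error on sub-cent prices.

```python
# Hypothetical sketch: estimate cost from the unit prices listed on this page.
# Linear metering (no rounding of partial units) is an assumption.
from decimal import Decimal

UNIT_PRICES_USD = {
    "black-forest-labs/FLUX.1-schnell": (Decimal("0.003"), "image"),
    "voxtral-4b-tts": (Decimal("0.00018"), "char"),
    "whisper-large-v3-turbo": (Decimal("0.006"), "minute"),
}


def estimate_cost(model_id: str, quantity: Decimal) -> Decimal:
    """Multiply the listed per-unit price by the number of units consumed."""
    price, _unit = UNIT_PRICES_USD[model_id]
    return price * quantity
```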

Adding a model

When FlexAI engineering deploys a new model, it is added to docs/models.yaml in flexaihq/token-service. The sync workflow then opens a PR against this docs repo to refresh the tables above.
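The sync step only needs to open a PR when the catalog has actually changed. One way to decide is to compare two versions of the catalog entry by entry. This sketch assumes each version of docs/models.yaml has already been loaded into a dict keyed by model ID. `catalog_diff` and `needs_pr` are hypothetical names. The real workflow in flexaihq/token-service may take a different approach, such as diffing the rendered page.

```python
# Hypothetical sketch: decide whether a catalog change warrants a docs PR.
# Entries are assumed to be dicts keyed by model ID; not the real schema.


def catalog_diff(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    """Return sorted model IDs that were added, removed, or changed."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


def needs_pr(diff: dict[str, list[str]]) -> bool:
    """A PR is needed only if at least one entry was added, removed, or changed."""
    return any(diff.values())
```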