> ## Documentation Index
> Fetch the complete documentation index at: https://docs.flex.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# FlexAI Token Factory

> An OpenAI-compatible inference API for open models: one key, every model, billed per token.

Every model you and your agent need — one OpenAI-compatible key.

FlexAI Token Factory is an OpenAI-compatible inference API for open text, code, reasoning, vision, and embedding models. Point any OpenAI SDK at tokens.flex.ai and your existing code works unchanged.

Add a billing address and card to create your API key, then pay per token — no packs, no subscriptions. See billing.


## Drop-in OpenAI compatibility

Already have code that calls OpenAI? Change the base URL, the API key, and the model id. The rest of your code stays the same.

```bash cURL
curl https://tokens.flex.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Meta-Llama-3.1-8B-Instruct-FP8",
    "messages": [{"role": "user", "content": "Say hello in one line."}]
  }'
```

```python Python
# pip install openai
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://tokens.flex.ai/v1",
    api_key=os.environ["FLEXAI_API_KEY"],
)

resp = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct-FP8",
    messages=[{"role": "user", "content": "Say hello in one line."}],
)
print(resp.choices[0].message.content)
```

```typescript TypeScript
// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://tokens.flex.ai/v1",
  apiKey: process.env.FLEXAI_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "Meta-Llama-3.1-8B-Instruct-FP8",
  messages: [{ role: "user", content: "Say hello in one line." }],
});
console.log(resp.choices[0].message.content);
```

Browse the live catalog at [flex.ai/models](https://flex.ai/models), then follow the full [Quickstart](/inference-api/quickstart) to get a key and make your first request.

***

## Why Token Factory

- A drop-in for the OpenAI SDK. Swap the base URL and model id, and keep your code, prompts, and tools.
- Tool calls, streaming, structured output, and vision are all native. See the [streaming](/inference-api/guides/streaming) and [vision](/inference-api/guides/vision) guides.
- Text, code, reasoning, vision, and embedding models behind one key. Filter the live catalog from code with [model discovery](/inference-api/guides/model-discovery).
- Dollar-denominated, per-token billing with no token packs and no subscriptions. Account-level budgets and rate limits.
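
As a rough sketch of the streaming and model-discovery features mentioned above, the Python helpers below reuse the OpenAI SDK client from the drop-in example. They assume that Token Factory serves the OpenAI-style model listing (`client.models.list()`) and the standard `stream=True` chunk format. This page does not spell out either detail, so treat the linked guides as the authority. The helper names `make_client`, `list_model_ids`, and `stream_text` are illustrative and not part of any SDK.

```python
import os
from typing import Iterator, List


def make_client():
    """Build an OpenAI SDK client pointed at Token Factory (same settings as the drop-in example)."""
    from openai import OpenAI  # pip install openai

    return OpenAI(
        base_url="https://tokens.flex.ai/v1",
        api_key=os.environ["FLEXAI_API_KEY"],
    )


def list_model_ids(client, contains: str = "") -> List[str]:
    """Return sorted model ids whose name contains `contains` (case-insensitive).

    Assumption: the API exposes the OpenAI-style model listing.
    """
    needle = contains.lower()
    return sorted(m.id for m in client.models.list() if needle in m.id.lower())


def stream_text(client, model: str, prompt: str) -> Iterator[str]:
    """Yield text fragments from a streamed chat completion as they arrive."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one final message
    )
    for chunk in stream:
        # Some chunks carry no choices or no text (for example role-only chunks); skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

With a real key in `FLEXAI_API_KEY`, looping over `stream_text(make_client(), "Meta-Llama-3.1-8B-Instruct-FP8", "Say hello in one line.")` and printing each fragment with `print(piece, end="", flush=True)` should show the reply as it is generated. Both helpers take the client as an argument, so they can be exercised against a stub without network access.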

***

## Start building

- What the API is, what you get, and where everything lives.
- Get a key and make your first request in under two minutes.
- Streaming, tool use, vision, embeddings, model discovery, batching, and concurrency.
- Endpoints, the compatibility matrix, errors, and billing.

***

## Scale beyond serverless

The serverless API is the fastest way to start. When you need more, the FlexAI platform also offers dedicated inference endpoints, fine-tuning, training, and private deployments.

- Dedicated endpoints, fine-tuning, training, and platform services.
- Pricing, the full product story, and scaling to private AI cloud.

***

Ready to build?

Get an API key and make your first request in minutes.


Need help? Email [support@flex.ai](mailto:support@flex.ai), join our [Slack community](https://join.slack.com/t/flexaicommunity/shared_invite/zt-3fqfcq9hj-Bv_Ehtyip0Y6fjS7gG5hHg), or check [status.flex.ai](https://status.flex.ai).