# FlexAI Token Factory

> An OpenAI-compatible inference API for open models — one key, every model, billed per token.

<div className="tf-hero">
  <p className="tf-tagline">Every model you and your agent need — one OpenAI-compatible key.</p>
  <p className="tf-sub">FlexAI Token Factory is an OpenAI-compatible inference API for open text, code, reasoning, vision, and embedding models. Point any OpenAI SDK at <code>https://tokens.flex.ai/v1</code>, use a Token Factory model id, and the rest of your existing code works unchanged.</p>
</div>

<div className="blueprint-cta tf-cta-center">
  <p>Add a billing address and card to create your API key, then pay per token — no packs, no subscriptions. See <a href="/inference-api/reference/billing">billing</a>.</p>
  <a className="cta-primary" href="https://tokens.flex.ai/signup">Get an API key</a>
  <a className="cta-secondary" href="/inference-api/quickstart">Read the quickstart</a>
</div>

## Drop-in OpenAI compatibility

Already have code that calls OpenAI? Change the base URL, the API key, and the model id; nothing else.

<CodeGroup>
  ```bash cURL theme={null}
  curl https://tokens.flex.ai/v1/chat/completions \
    -H "Authorization: Bearer $FLEXAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "Meta-Llama-3.1-8B-Instruct-FP8",
      "messages": [{"role": "user", "content": "Say hello in one line."}]
    }'
  ```

  ```python Python theme={null}
  # pip install openai
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://tokens.flex.ai/v1",
      api_key=os.environ["FLEXAI_API_KEY"],
  )

  resp = client.chat.completions.create(
      model="Meta-Llama-3.1-8B-Instruct-FP8",
      messages=[{"role": "user", "content": "Say hello in one line."}],
  )
  print(resp.choices[0].message.content)
  ```

  ```typescript TypeScript theme={null}
  // npm install openai
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://tokens.flex.ai/v1",
    apiKey: process.env.FLEXAI_API_KEY,
  });

  const resp = await client.chat.completions.create({
    model: "Meta-Llama-3.1-8B-Instruct-FP8",
    messages: [{ role: "user", content: "Say hello in one line." }],
  });
  console.log(resp.choices[0].message.content);
  ```
</CodeGroup>

Browse the live catalog at [flex.ai/models](https://flex.ai/models), then follow the full [Quickstart](/inference-api/quickstart) to get a key and make your first request.
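To filter the catalog from code instead, here is a minimal sketch. It assumes Token Factory exposes the standard OpenAI `GET /v1/models` endpoint, so the SDK's `client.models.list()` works against it; `pick_models` and `list_model_ids` are hypothetical helper names, not part of any SDK. The [model discovery](/inference-api/guides/model-discovery) guide documents what the catalog actually supports.

```python
import os
from typing import Iterable, List


def pick_models(model_ids: Iterable[str], keyword: str) -> List[str]:
    """Return the model ids that contain `keyword` (case-insensitive), sorted."""
    needle = keyword.lower()
    return sorted(m for m in model_ids if needle in m.lower())


def list_model_ids() -> List[str]:
    """Fetch every model id (assumes the OpenAI-style GET /v1/models endpoint)."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://tokens.flex.ai/v1",
        api_key=os.environ["FLEXAI_API_KEY"],
    )
    # Iterating the returned page yields Model objects with an `id` attribute.
    return [model.id for model in client.models.list()]
```

For example, `pick_models(list_model_ids(), "llama")` narrows the live list to Llama-family ids. Keeping the filter separate from the network call makes it easy to swap in richer criteria later.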

***

## Why Token Factory

<CardGroup cols={2}>
  <Card title="OpenAI-compatible" icon="plug" href="/inference-api/reference/openai-compatibility">
    A drop-in for the OpenAI SDK. Swap the base URL and model id — keep your code, prompts, and tools.
  </Card>

  <Card title="Agent-ready" icon="robot" href="/inference-api/guides/tool-use">
    Tool calls, streaming, structured output, and vision — all native. See the [streaming](/inference-api/guides/streaming) and [vision](/inference-api/guides/vision) guides.
  </Card>

  <Card title="Every model, discoverable" icon="layer-group" href="https://flex.ai/models">
    Text, code, reasoning, vision, and embedding models behind one key. Filter the live catalog from code with [model discovery](/inference-api/guides/model-discovery).
  </Card>

  <Card title="Transparent per-token pricing" icon="credit-card" href="/inference-api/reference/billing">
    Dollar-denominated, per-token billing — no token packs, no subscriptions. Account-level budgets and rate limits.
  </Card>
</CardGroup>
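Because billing is per token, you can estimate what a request costs from the `usage` object that OpenAI-compatible responses carry (`prompt_tokens`, `completion_tokens`). The sketch below assumes input and output tokens are priced separately in US dollars per million tokens; the `Price` class and the rates in `example_price` are placeholders for illustration, not FlexAI's actual prices, which are listed on the [billing](/inference-api/reference/billing) page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical per-model rates in USD per 1M tokens (placeholders, not real prices)."""
    input_per_million: float
    output_per_million: float


def request_cost(prompt_tokens: int, completion_tokens: int, price: Price) -> float:
    """Dollar cost of one request, assuming input and output are billed separately."""
    return (
        prompt_tokens * price.input_per_million
        + completion_tokens * price.output_per_million
    ) / 1_000_000


# Placeholder rates for illustration only.
example_price = Price(input_per_million=0.10, output_per_million=0.20)

# With a real response object you would pass:
#   request_cost(resp.usage.prompt_tokens, resp.usage.completion_tokens, example_price)
print(f"${request_cost(1_200, 300, example_price):.6f}")
```

Summing these per-request estimates is a simple way to track spend against an account-level budget before the invoice arrives.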

***

## Start building

<CardGroup cols={2}>
  <Card title="Overview" icon="book-open" href="/inference-api/overview">
    What the API is, what you get, and where everything lives.
  </Card>

  <Card title="Quickstart" icon="rocket" href="/inference-api/quickstart">
    Get a key and make your first request in under two minutes.
  </Card>

  <Card title="Guides" icon="book" href="/inference-api/guides/streaming">
    Streaming, tool use, vision, embeddings, model discovery, batching, and concurrency.
  </Card>

  <Card title="API reference" icon="code" href="/inference-api/reference/openai-compatibility">
    Endpoints, the compatibility matrix, errors, and billing.
  </Card>
</CardGroup>
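As a taste of the [streaming](/inference-api/guides/streaming) guide, here is a minimal sketch that prints text as it arrives and returns the full reply. It assumes Token Factory honors the standard OpenAI `stream=True` flag and yields chat-completion chunks whose `choices[0].delta.content` holds the next text fragment (which may be `None`, for example on the final chunk). `collect_stream` and `stream_reply` are hypothetical helper names.

```python
import os
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Join the text deltas of a chat-completion stream, optionally echoing them."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip None or empty deltas
            parts.append(piece)
            if echo:
                print(piece, end="", flush=True)
    if echo:
        print()  # finish the streamed line
    return "".join(parts)


def stream_reply(prompt: str) -> str:
    """Send one streaming chat request and return the assembled reply."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://tokens.flex.ai/v1",
        api_key=os.environ["FLEXAI_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct-FP8",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)
```

Calling `stream_reply("Say hello in one line.")` prints the answer token by token. Separating the chunk handling from the request keeps `collect_stream` reusable for any OpenAI-style stream.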

***

## Scale beyond serverless

The serverless API is the fastest way to start. When you need more, the FlexAI platform also offers dedicated inference endpoints, fine-tuning, training, and private deployments.

<CardGroup cols={2}>
  <Card title="Explore the platform" icon="layer-group" href="/getting-started">
    Dedicated endpoints, fine-tuning, training, and platform services.
  </Card>

  <Card title="flex.ai" icon="up-right-from-square" href="https://flex.ai">
    Pricing, the full product story, and scaling to private AI cloud.
  </Card>
</CardGroup>

***

<div className="blueprint-cta">
  <h3>Ready to build?</h3>
  <p>Get an API key and make your first request in minutes.</p>
  <a className="cta-primary" href="https://tokens.flex.ai/signup">Get an API key</a>
  <a className="cta-secondary" href="https://flex.ai/contact">Talk to us</a>
</div>

Need help? Email [support@flex.ai](mailto:support@flex.ai), join our [Slack community](https://join.slack.com/t/flexaicommunity/shared_invite/zt-3fqfcq9hj-Bv_Ehtyip0Y6fjS7gG5hHg), or check [status.flex.ai](https://status.flex.ai).
