
# Embeddings

> Generate dense vector representations of text via `POST /v1/embeddings`.

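Embeddings turn text into fixed-length float vectors that you can use for similarity search, clustering, and retrieval. The endpoint follows the OpenAI embeddings API shape, so an OpenAI SDK works against it once you set the base URL to `https://tokens.flex.ai/v1` and supply your FlexAI API key.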

The current embedding model is `bge-m3`: multilingual, 1024-dimensional output, 8K-token context.

<Note>
  `encoding_format` is optional and defaults to `"float"`, matching OpenAI. Pass `"base64"` if you want the compact wire format. The one thing to avoid is sending an explicit `null` — pass a string or omit the field entirely.
</Note>
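
One way to follow that rule in client code is to build the request body so the key is dropped whenever no format was chosen. This is a minimal sketch: the helper name `build_embedding_request` is hypothetical and not part of any SDK.

```python
import json
from typing import Optional, Sequence, Union


def build_embedding_request(
    model: str,
    text: Union[str, Sequence[str]],
    encoding_format: Optional[str] = None,
) -> dict:
    """Return a request body that never contains "encoding_format": null.

    Hypothetical helper for illustration only.
    """
    body = {"model": model, "input": text}
    if encoding_format is not None:
        # Include the key only when the caller picked a format explicitly.
        body["encoding_format"] = encoding_format
    return body


# With no format given, the key is absent rather than null.
print(json.dumps(build_embedding_request("bge-m3", "the quick brown fox")))
```

The resulting dict can be sent as the JSON body of `POST /v1/embeddings`, or unpacked into an SDK call.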

## Example

<CodeGroup>
  ```bash cURL theme={null}
  curl https://tokens.flex.ai/v1/embeddings \
    -H "Authorization: Bearer $FLEXAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "bge-m3",
      "input": "the quick brown fox",
      "encoding_format": "float"
    }'
  ```

  ```python Python theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(base_url="https://tokens.flex.ai/v1", api_key=os.environ["FLEXAI_API_KEY"])

  resp = client.embeddings.create(
      model="bge-m3",
      input="the quick brown fox",
      encoding_format="float",  # optional; "float" is the default
  )
  vector = resp.data[0].embedding   # list[float], len 1024
  ```

  ```typescript TypeScript theme={null}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://tokens.flex.ai/v1",
    apiKey: process.env.FLEXAI_API_KEY,
  });

  const resp = await client.embeddings.create({
    model: "bge-m3",
    input: "the quick brown fox",
    encoding_format: "float",   // optional; "float" is the default
  });
  const vector = resp.data[0].embedding;   // number[], len 1024
  ```
</CodeGroup>

## Batch inputs

Pass an array of strings to embed several at once. The response `data[]` order matches the input order.

```python Python theme={null}
resp = client.embeddings.create(
    model="bge-m3",
    input=["the quick brown fox", "jumps over the lazy dog"],
    encoding_format="float",
)
vectors = [d.embedding for d in resp.data]
```
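
Once you have several vectors, a common next step is to compare them. The sketch below computes cosine similarity in plain Python. The function name and the toy 3-dimensional vectors are illustrative stand-ins for real 1024-dimensional embeddings, not API output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for embedding vectors (not real model output).
toy_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
print(cosine_similarity(toy_vectors[0], toy_vectors[2]))  # parallel vectors
print(cosine_similarity(toy_vectors[0], toy_vectors[1]))  # orthogonal vectors
```

For large collections, a vectorized library or a vector database is usually a better fit than a pure-Python loop.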

## Response

```json theme={null}
{
  "object": "list",
  "model": "bge-m3",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0123, -0.456, ...] }
  ],
  "usage": { "prompt_tokens": 5, "total_tokens": 5 }
}
```

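With `encoding_format: "base64"`, each `embedding` field is a base64-encoded byte string of little-endian float32 values instead of a JSON array of numbers. To decode it, base64-decode the string and read the resulting bytes as float32 values with your language's struct or buffer helpers. For `bge-m3`, each vector decodes to 1024 × 4 = 4096 bytes.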
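
A minimal Python sketch of that decoding step, using only the standard library. The function name `decode_embedding` is hypothetical, and the sample string is packed locally for the round trip rather than taken from a real response.

```python
import base64
import struct
from typing import List


def decode_embedding(b64: str) -> List[float]:
    """Decode a base64 string of little-endian float32 values."""
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4 (float32 size)")
    count = len(raw) // 4
    # "<" selects little-endian byte order, "f" is a 4-byte float.
    return list(struct.unpack(f"<{count}f", raw))


# Round trip with locally packed sample values (not real API output).
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
print(decode_embedding(sample))
```

If NumPy is available, `numpy.frombuffer(raw, dtype="<f4")` reads the same bytes without an intermediate Python list.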

## Billing

Embeddings are billed per input token only; `output_per_mtok` is `0`. See [billing](/inference-api/reference/billing) for the active rate.
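
As a rough illustration of input-only billing, you can estimate the cost of a request from `usage.prompt_tokens`. This sketch makes two assumptions: the parameter name `input_per_mtok` is chosen by analogy with `output_per_mtok`, and the rate is a placeholder, not a real price.

```python
def estimate_embedding_cost(prompt_tokens: int, input_per_mtok: float) -> float:
    """Estimate cost as input tokens times the per-million-token rate.

    `input_per_mtok` is an assumed name for the input rate.
    """
    if prompt_tokens < 0 or input_per_mtok < 0:
        raise ValueError("token count and rate must be non-negative")
    return prompt_tokens / 1_000_000 * input_per_mtok


# Placeholder rate for illustration only; look up the active rate on the billing page.
PLACEHOLDER_RATE_PER_MTOK = 0.10

usage = {"prompt_tokens": 5, "total_tokens": 5}  # shape taken from the response above
print(estimate_embedding_cost(usage["prompt_tokens"], PLACEHOLDER_RATE_PER_MTOK))
```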

## See also

* [Model discovery](/inference-api/guides/model-discovery) — finding embedding models programmatically (`supports` contains `embeddings`).
* [OpenAI compatibility](/inference-api/reference/openai-compatibility) — the full list of supported endpoints and deviations.
