

Embeddings turn text into fixed-length float vectors that you can use for similarity search, clustering, and retrieval. The endpoint is OpenAI-shaped: point an OpenAI SDK at it and it works unchanged. The current embedding model is BAAI/bge-m3 (multilingual, 1024-dimensional, 8K-token context).

encoding_format is optional and defaults to "float", matching OpenAI. Pass "base64" if you want the compact wire format. Do not send an explicit null: pass a string or omit the field entirely.
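
One way to keep an explicit null out of the request is to build the body from a dict and add encoding_format only when it has a value. This is a minimal sketch; `build_embedding_payload` is a hypothetical helper written for illustration, not part of any SDK.

```python
import json
from typing import Optional, Union


def build_embedding_payload(
    text: Union[str, list[str]],
    model: str = "BAAI/bge-m3",
    encoding_format: Optional[str] = None,
) -> dict:
    """Build a request body, leaving encoding_format out when it is unset."""
    payload = {"model": model, "input": text}
    # Only add the key when the caller supplied a string, so the
    # serialized JSON never contains "encoding_format": null.
    if encoding_format is not None:
        payload["encoding_format"] = encoding_format
    return payload


body = json.dumps(build_embedding_payload("the quick brown fox"))
```

With the field omitted, the server-side default of "float" applies.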

Example

curl https://tokens.flex.ai/v1/embeddings \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-m3",
    "input": "the quick brown fox",
    "encoding_format": "float"
  }'

Batch inputs

Pass an array of strings to embed several at once. The response data[] order matches the input order.
Python
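import os
from openai import OpenAI

# Base URL and key variable are taken from the curl example above.
client = OpenAI(base_url="https://tokens.flex.ai/v1", api_key=os.environ["FLEXAI_API_KEY"])

resp = client.embeddings.create(
    model="BAAI/bge-m3",
    input=["the quick brown fox", "jumps over the lazy dog"],
    encoding_format="float",
)
vectors = [d.embedding for d in resp.data]  # one vector per input, in input order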
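
Once you have a `vectors` list, similarity search comes down to comparing pairs of vectors. The sketch below computes cosine similarity with the standard library only. The toy 3-dimensional vectors stand in for real 1024-dimensional embeddings, and `cosine_similarity` is an illustrative helper rather than an API provided by the service.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy data in place of API output.
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
score = cosine_similarity(vectors[0], vectors[2])
```

For larger collections, a vectorized library or a vector index is usually a better fit than a pure-Python loop.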

Response

{
  "object": "list",
  "model": "BAAI/bge-m3",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0123, -0.456, ...] }
  ],
  "usage": { "prompt_tokens": 5, "total_tokens": 5 }
}

With encoding_format: "base64", each embedding field is a base64-encoded string of little-endian float32 bytes instead of a JSON array of numbers. Decode it with your language’s base64 and struct/buffer helpers.
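
As an illustration of the decoding step, here is a minimal Python sketch using the standard `base64` and `struct` modules. `decode_embedding` is a hypothetical helper name, and the sample string is packed locally for a round-trip check; it is not real API output.

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 string of little-endian float32 values."""
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4")
    count = len(raw) // 4
    # "<" selects little-endian, "f" is a 4-byte float32.
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check with locally packed values.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
decoded = decode_embedding(sample)
```

For BAAI/bge-m3 the decoded list should contain 1024 values, which corresponds to 4096 raw bytes.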

Billing

Embeddings are billed per input token only; output_per_mtok is 0. See billing for the active rate.
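
Because only input tokens are billed, a rough cost estimate follows directly from usage.prompt_tokens in the response. The sketch below assumes a per-million-token input rate; the parameter name `input_per_mtok` is chosen by analogy with output_per_mtok, and the rate used is a placeholder, not the actual price.

```python
def embedding_cost(prompt_tokens: int, input_per_mtok: float) -> float:
    """Estimated cost of a request billed per input token.

    input_per_mtok is the price per one million input tokens,
    in whatever currency the billing page quotes.
    """
    if prompt_tokens < 0 or input_per_mtok < 0:
        raise ValueError("token count and rate must be non-negative")
    return prompt_tokens / 1_000_000 * input_per_mtok


# 0.02 is a placeholder rate for illustration only.
cost = embedding_cost(prompt_tokens=5, input_per_mtok=0.02)
```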
