Embeddings - FlexAI Docs

Embeddings turn text into fixed-length float vectors you can use for similarity search, clustering, and retrieval. The endpoint is OpenAI-shaped — point an OpenAI SDK at it and it works unchanged. The current embedding model is bge-m3 — multilingual, 1024-dim, 8K context.

encoding_format is optional and defaults to "float", matching OpenAI. Pass "base64" if you want the compact wire format. The one thing to avoid is sending an explicit null — pass a string or omit the field entirely.

Example

curl https://tokens.flex.ai/v1/embeddings \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": "the quick brown fox",
    "encoding_format": "float"
  }'

import os
from openai import OpenAI

client = OpenAI(base_url="https://tokens.flex.ai/v1", api_key=os.environ["FLEXAI_API_KEY"])

resp = client.embeddings.create(
    model="bge-m3",
    input="the quick brown fox",
    encoding_format="float",  # optional; "float" is the default
)
vector = resp.data[0].embedding   # list[float], len 1024

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://tokens.flex.ai/v1",
  apiKey: process.env.FLEXAI_API_KEY,
});

const resp = await client.embeddings.create({
  model: "bge-m3",
  input: "the quick brown fox",
  encoding_format: "float",   // optional; "float" is the default
});
const vector = resp.data[0].embedding;   // number[], len 1024

Batch inputs

Pass an array of strings to embed several at once. The response data[] order matches the input order.

Python

resp = client.embeddings.create(
    model="bge-m3",
    input=["the quick brown fox", "jumps over the lazy dog"],
    encoding_format="float",
)
vectors = [d.embedding for d in resp.data]

Response

{
  "object": "list",
  "model": "bge-m3",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0123, -0.456, ...] }
  ],
  "usage": { "prompt_tokens": 5, "total_tokens": 5 }
}

With encoding_format: "base64", each embedding field is a base64-encoded byte string of little-endian float32 values instead of a JSON array of numbers. Decode with your language’s base64 + struct/buffer helpers.

Billing

Embeddings bill per input token only — output_per_mtok is 0. See billing for the active rate.

​Example

​Batch inputs

​Response

​Billing

​See also

Example

Batch inputs

Response

Billing

See also