There is no /v1/batches endpoint. The OpenAI Batch API surface is not supported and is not on the launch roadmap. If you are migrating Batch-API code, replace it with the client-side fan-out below.
Why no batch endpoint
The Batch API trades latency for cost: jobs run within 24 hours at a discount. We don’t gate capacity that way. Every model is served from a live pool, and concurrent online requests on a single key already run independently up to your tier’s per-minute limit. See Concurrency. If you need bulk throughput, fan out concurrent requests. Cost is the same as one-by-one because we bill per token, not per call.
Recommended pattern: bounded fan-out
Bound concurrency to your tier’s RPM divided by 60 (or whatever your job’s per-request latency budget supports), retry on 429, and write results back keyed by your own correlation id, because completion order is not preserved.
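A minimal sketch of this pattern in Python with `asyncio`, under stated assumptions. The names `fan_out`, `concurrency_for`, and `RateLimited` are hypothetical, and `send` stands in for whatever async client call you already use. The sketch assumes `send` raises `RateLimited` when the server answers 429; adapt that to the exception your client actually raises.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical: raised by `send` when the server answers HTTP 429."""


def concurrency_for(rpm: int) -> int:
    # RPM / 60 is the per-second request budget; never drop below one worker.
    return max(1, rpm // 60)


async def fan_out(rows, send, max_concurrency, max_retries=5, base_delay=0.5):
    """Call send(payload) for every (correlation_id, payload) pair in rows.

    Returns a dict of correlation_id -> result. Completion order is not
    preserved, so callers should look results up by id, not by position.
    """
    gate = asyncio.Semaphore(max_concurrency)
    results = {}

    async def worker(cid, payload):
        # The semaphore caps how many requests are in flight at once.
        async with gate:
            for attempt in range(max_retries + 1):
                try:
                    results[cid] = await send(payload)
                    return
                except RateLimited:
                    if attempt == max_retries:
                        raise  # give up; the caller decides what to do next
                    # Exponential backoff with jitter. The slot stays held
                    # while sleeping, which also slows the overall rate.
                    delay = base_delay * (2 ** attempt) * (0.5 + random.random())
                    await asyncio.sleep(delay)

    await asyncio.gather(*(worker(cid, payload) for cid, payload in rows))
    return results
```

A call might look like `asyncio.run(fan_out(rows, send, concurrency_for(rpm)))`, where `rows` is a list of `(correlation_id, payload)` pairs and `rpm` is your tier’s limit. In this sketch a row that exhausts its retries raises out of `fan_out`; collecting failures per id instead is an equally reasonable design.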
What you give up vs. a true batch API
- No 24-hour discount. Bulk jobs cost the same per token as online traffic.
- No server-side job state. If your client crashes mid-run, you re-issue the missing rows yourself. Persist progress (correlation id → result) as you go.
- Per-key RPM still applies. The fan-out pattern doesn’t bypass the rate limit — it just keeps you under it. See Concurrency.
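Because there is no server-side job state, one way to persist progress is an append-only JSON Lines file keyed by correlation id. The sketch below is illustrative only: the file layout and the helper names `load_done`, `record`, and `pending` are assumptions, not part of any API.

```python
import json
import os


def load_done(path):
    """Return {correlation_id: result} for rows already written to path."""
    done = {}
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                # A crash can leave a half-written line; skip it so the
                # affected row is simply re-issued on the next run.
                continue
            done[rec["id"]] = rec["result"]
    return done


def record(path, cid, result):
    """Append one finished row and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"id": cid, "result": result}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def pending(rows, done):
    """Keep only the (correlation_id, payload) pairs not yet completed."""
    return [(cid, payload) for cid, payload in rows if cid not in done]
```

On restart, call `load_done` first, pass the full input through `pending`, and issue requests only for what remains, calling `record` as each result arrives.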