Multiple in-flight requests on the same API key run independently and in parallel. There is no head-of-line blocking, no per-key concurrency cap, and no implicit queuing: each request is dispatched to a backend as soon as it arrives. The only bound on a single key is the per-minute request ceiling described below. Within that ceiling, fan out as wide as your workload needs.
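A minimal sketch of client-side fan-out, assuming an async client. `complete` is a hypothetical stand-in for one API call on a shared key (its `asyncio.sleep` only simulates latency), so replace it with your own HTTP request. Because the service dispatches each request on a key independently, the wall time for a batch should be close to the slowest single call rather than the sum of all of them.

```python
import asyncio
import time
from typing import List


async def complete(prompt: str) -> str:
    """Hypothetical stand-in for one API call; replace with a real async request."""
    await asyncio.sleep(0.2)  # simulated network + generation latency
    return f"echo: {prompt}"


async def fan_out(prompts: List[str]) -> List[str]:
    """Start every request at once and collect results in input order.

    Keep len(prompts) within your tier's per-minute ceiling, or add the
    backpressure described under "Backpressure pattern".
    """
    return await asyncio.gather(*(complete(p) for p in prompts))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fan_out([f"prompt {i}" for i in range(5)]))
    elapsed = time.perf_counter() - start
    print(len(results), f"results in {elapsed:.2f}s")  # about one call's latency, not five
```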
Per-key rate limit
Each key carries a requests-per-minute (RPM) tier. When you exceed it, the API returns `429 Too Many Requests` with a `Retry-After` header; see Rate limits for the full set of headers.
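This page doesn't say which form the `Retry-After` value takes. HTTP (RFC 9110) allows either a number of seconds or an HTTP-date, so this sketch accepts both and falls back to a default when the header is missing or unparseable. The function name `retry_after_seconds` and the one-second default are illustrative assumptions, not part of the API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 1.0,
                        now: Optional[datetime] = None) -> float:
    """Turn a Retry-After header value into a non-negative delay in seconds.

    Hypothetical helper: accepts delta-seconds ("7") or an HTTP-date
    ("Mon, 01 Jan 2024 00:00:30 GMT"); anything else yields `default`.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():                      # delta-seconds form
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):          # older Pythons raise TypeError
        return default
    if when.tzinfo is None:                  # treat naive results as UTC
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```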
These numbers are generated from `backend/api_rate_limiter.py` and `backend/lago_client.py` in the token-service repo. They update automatically when those files change.

| Tier | Requests per minute |
|---|---|
| Free (default on signup) | 10 |
| Elevated (approved users) | 60 |
| Paid | 100 |
We may introduce account-level concurrency or aggregate RPM caps as the platform scales. We will notify customers before tightening any existing per-key limit; we will not silently lower it.
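To avoid hitting the ceiling in the first place, a client can track its own send times. This is a minimal sketch assuming a 60-second sliding window. The server's exact windowing algorithm isn't described here, but a sliding window never admits more than the tier's RPM in any 60-second span, so it also stays under a fixed per-minute window. `TIER_RPM` copies the values from the table above; `SlidingWindowLimiter` and its methods are hypothetical, and the class is not thread-safe.

```python
import time
from collections import deque
from typing import Callable, Deque

# Per-tier ceilings copied from the table above; the dict itself is illustrative.
TIER_RPM = {"free": 10, "elevated": 60, "paid": 100}
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """Hypothetical client-side guard: at most `rpm` sends in any 60-second span."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of admitted sends, oldest first

    def wait_time(self) -> float:
        """Seconds until another request fits in the window (0.0 if it fits now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= WINDOW_SECONDS:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        return WINDOW_SECONDS - (now - self.sent[0])

    def try_acquire(self) -> bool:
        """Record a send and return True if it fits; otherwise leave state unchanged."""
        if self.wait_time() > 0.0:
            return False
        self.sent.append(self.clock())
        return True
```

In a single-threaded client you would `time.sleep(limiter.wait_time())` and then call `try_acquire()` before each request. A `429` can still occur if other clients share the key, so keep retry handling as well.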
Backpressure pattern
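To stay within a steady-state window, bound your in-flight count to roughly `RPM / 60` and retry on `429` with exponential backoff that respects `Retry-After`. `RPM / 60` is your allowance per second, so with that many requests in flight and each taking a second or more, you stay under the per-minute ceiling; faster responses can still trigger a `429`, which the retry absorbs.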
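A minimal sketch of that pattern with `asyncio`, assuming a hypothetical `send` coroutine that returns `(status, headers, body)` and a plain `headers` dict keyed exactly `Retry-After` (real HTTP clients usually expose case-insensitive headers). It caps in-flight requests at `RPM / 60` rounded up, never below one. On `429` it sleeps for the larger of a jittered exponential backoff and the `Retry-After` value, read here as whole seconds. `call_with_backoff`, `run_all`, `RateLimited` and the default retry parameters are illustrative, not part of the API.

```python
import asyncio
import math
import random
from typing import Any, Awaitable, Callable, List, Mapping, Tuple

# Hypothetical transport: returns (status_code, headers, body). Replace with your HTTP client.
Send = Callable[[Any], Awaitable[Tuple[int, Mapping[str, str], Any]]]


class RateLimited(Exception):
    """Raised when a request still gets 429 after all retries."""


async def call_with_backoff(send: Send, payload: Any, max_retries: int = 5,
                            base: float = 0.5, cap: float = 30.0) -> Any:
    """Send once, retrying on 429 with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        status, headers, body = await send(payload)
        if status != 429:
            return body  # any non-429 response is passed through unchanged
        if attempt == max_retries:
            break
        # Full jitter: a random wait up to base * 2**attempt, capped at `cap`.
        backoff = random.uniform(0.0, min(cap, base * 2 ** attempt))
        # Treat Retry-After as a floor; assumes the delta-seconds form.
        hint = headers.get("Retry-After", "").strip()
        server_wait = float(hint) if hint.isdigit() else 0.0
        await asyncio.sleep(max(backoff, server_wait))
    raise RateLimited(f"still rate-limited after {max_retries} retries")


async def run_all(send: Send, payloads: List[Any], rpm: int,
                  base: float = 0.5) -> List[Any]:
    """Fan out over payloads with at most ceil(rpm / 60) requests in flight."""
    slots = asyncio.Semaphore(max(1, math.ceil(rpm / 60)))

    async def one(payload: Any) -> Any:
        async with slots:  # held during backoff too, so retries don't stampede
            return await call_with_backoff(send, payload, base=base)

    return await asyncio.gather(*(one(p) for p in payloads))
```

Sleeping inside the semaphore is deliberate: a task waiting out a `429` keeps its slot, so the other tasks don't surge while the key is throttled.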
Ordering
Concurrent requests on one key are independent, so completion order is not guaranteed to match submission order. If you need to correlate responses back to inputs, carry your own correlation id in the prompt or response metadata; don’t rely on arrival order. The `id` we return (`chatcmpl-…`) is unique per request and safe to use as a join key in your logs.
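A minimal sketch of joining results by your own correlation id, assuming an async client. `complete` is a hypothetical stand-in whose `delay` argument simulates variable latency; in the demo, later submissions are given shorter delays so they finish first. Results are keyed by the caller's id, so arrival order doesn't matter, and logging the returned `id` next to that key gives you the per-request join key described above.

```python
import asyncio
from typing import Dict, List, Tuple


async def complete(prompt: str, delay: float) -> str:
    """Hypothetical API call; `delay` stands in for variable response latency."""
    await asyncio.sleep(delay)
    return prompt.upper()


async def tagged(corr_id: str, prompt: str, delay: float) -> Tuple[str, str]:
    # Keep our own correlation id next to the call instead of relying on arrival order.
    return corr_id, await complete(prompt, delay)


async def run(jobs: List[Tuple[str, str, float]]) -> Tuple[List[str], Dict[str, str]]:
    """Submit all jobs concurrently; return (arrival order, results keyed by corr id)."""
    tasks = [asyncio.create_task(tagged(cid, prompt, delay)) for cid, prompt, delay in jobs]
    arrival: List[str] = []
    results: Dict[str, str] = {}
    for next_done in asyncio.as_completed(tasks):
        cid, text = await next_done
        arrival.append(cid)
        results[cid] = text
    return arrival, results


if __name__ == "__main__":
    jobs = [("a", "first", 0.06), ("b", "second", 0.04), ("c", "third", 0.02)]
    order, by_id = asyncio.run(run(jobs))
    print(order)        # completion order, not submission order
    print(by_id["a"])   # still joined to the right input
```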
Related
- Batching: there is no `/v1/batches` endpoint today; client-side fan-out is the recommended pattern.
- Billing & quotas: tier limits and rate-limit response headers.
- Errors: the full 429 body.