Fair use
Normal application traffic is always welcome: interactive chat, batch jobs, evaluation runs, production workloads. You do not need to ask permission to ramp up legitimate usage. We ask only that your traffic reflect real work rather than patterns that exist solely to consume capacity.
Each API key carries explicit rate limits (requests per minute and tokens per minute) tied to your tier. Those limits are the hard ceiling and are enforced inline: requests over the limit receive 429 Too Many Requests. The policy described below applies in addition to those limits. It covers sustained patterns that may stay under the per-key rate limits but still indicate abuse or a runaway client.
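Because requests over the limit are rejected with 429 Too Many Requests, a client usually wants to slow down rather than resend immediately. The following is a minimal sketch of exponential backoff with jitter, not a prescribed client. The function name call_with_backoff, the send callable (assumed to return a status code and a body), and the delay values are all illustrative assumptions; the retry schedule is not something this policy specifies.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry with backoff for as long as it returns 429.

    `send` is a hypothetical callable that performs one API request and
    returns (status_code, body). All delay values are assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            # Any non-429 response is handed back to the caller unchanged.
            return status, body
        if attempt < max_attempts - 1:
            # Double the ceiling on each attempt, cap it, then pick a random
            # wait below the ceiling so many clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))

    # Still rate limited after the final attempt: return the last 429.
    return status, body
```

Capping the number of attempts matters here: a client that retries forever on 429 is itself a sustained high-rate pattern.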
What we monitor
We continuously review per-key activity for a small set of patterns. On day one these are observed and reviewed, not automatically blocked: a flagged key triggers a human review, not an instant cutoff. The patterns are:
- Sustained excessive request rate. A single key sending requests far above what its tier and normal application behavior would produce, held over a sustained window.
- Sustained excessive token consumption. A single key generating tokens at a rate well beyond normal usage, sustained over time — the clearest signal of one key burning a disproportionate share of GPU budget.
- Runaway or hung requests. A key accumulating many very long-running requests. This usually points at a misconfigured client (for example, an unbounded max_tokens on a non-streaming call that runs until it is cut off) rather than deliberate abuse, but the effect on shared capacity is the same.
To avoid runaway requests, set a bounded max_tokens, or stream the response. Streaming keeps the connection alive token by token and avoids the timeouts that long non-streamed generations hit.
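As a rough illustration of the advice above, a client can refuse to build a request that has no bounded max_tokens. This is a sketch under stated assumptions: only max_tokens is named in this document, while the prompt and stream field names, the build_request helper, and the MAX_TOKENS_CEILING value are hypothetical and may not match the real request schema.

```python
from typing import Any, Dict

# Hypothetical client-side cap chosen for illustration; it is not a
# documented service limit.
MAX_TOKENS_CEILING = 4096


def build_request(prompt: str, max_tokens: int, stream: bool = True) -> Dict[str, Any]:
    """Build a request payload that always carries a bounded max_tokens.

    The "prompt" and "stream" keys are assumed field names.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
        raise ValueError("max_tokens must be an integer")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if max_tokens > MAX_TOKENS_CEILING:
        raise ValueError(f"max_tokens must not exceed {MAX_TOKENS_CEILING}")

    return {"prompt": prompt, "max_tokens": max_tokens, "stream": stream}
```

Defaulting stream to True in this sketch applies both suggestions at once, so a long generation neither runs unbounded nor waits on a single long-lived non-streamed response.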
What happens when a key is flagged
When a key trips one of these patterns, the typical sequence is:
- Review. We look at the account and the traffic to distinguish a legitimate ramp (a batch job, a load test, real growth) from abuse or a runaway client. A legitimate ramp needs no action from you; if anything, it is a signal to talk about a higher tier.
- Contact. If the pattern looks like abuse or a misconfigured client that is degrading the service for others, we reach out.
- Throttle or disable. If a key is actively degrading availability for other customers and the situation cannot wait, we may temporarily disable it. A disabled key receives 402 Payment Required on its next call; in-flight work is allowed to finish, but no new requests are accepted until the key is re-enabled. We re-enable as soon as the issue is resolved.
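On the client side, the two status codes mentioned on this page call for different reactions: 429 means slow down and try again later, while 402 on a disabled key means new requests will keep failing until the key is re-enabled, so retrying only adds load. A minimal sketch of that distinction follows. The Action enum, action_for_status, and the KeyCircuit class are hypothetical names introduced for illustration, not part of any client library.

```python
from enum import Enum


class Action(Enum):
    RETRY_LATER = "retry_later"          # rate limited: back off, then retry
    HALT = "halt"                        # key disabled: stop sending
    HANDLE_NORMALLY = "handle_normally"  # everything else: caller's usual path


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to a client reaction (illustrative only)."""
    if status == 429:
        return Action.RETRY_LATER
    if status == 402:
        return Action.HALT
    return Action.HANDLE_NORMALLY


class KeyCircuit:
    """Remembers that a key was reported disabled and blocks new sends."""

    def __init__(self) -> None:
        self.disabled = False

    def record(self, status: int) -> Action:
        action = action_for_status(status)
        if action is Action.HALT:
            # Latch the disabled state so later calls are not even attempted.
            self.disabled = True
        return action

    def may_send(self) -> bool:
        return not self.disabled

    def reset(self) -> None:
        # Call this once the key has been re-enabled.
        self.disabled = False
```

A worker loop would check may_send() before each request and surface the halt to an operator, rather than looping on 402 responses.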