Rate limits
The gateway enforces three independent limits on your traffic. All three return 429 rate_limit_exceeded when tripped, with a Retry-After header telling you how long to wait.
| Limit | Scope | Default | Configurable |
|---|---|---|---|
| Per-key rate limit | One API key | 100 requests per minute | Yes, per key, in the dashboard |
| Per-account rate limit | All keys on your account | 1,000 requests per minute | Yes, in account settings |
| Daily spend cap | One key, or your account | Unlimited (off by default) | Yes, per key and per account |
Per-key rate limit
Each API key has its own request-rate budget. The default is 100 requests per minute — a 60-second sliding window. When the window fills, the next request returns 429.
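To make the 60-second sliding window concrete, here is a minimal client-side model of it. This is a sketch only: the gateway's actual implementation is not documented here, and the class name `SlidingWindowLimiter` and its methods are hypothetical. It assumes a request leaves the window exactly 60 seconds after it was accepted.

```python
from collections import deque


class SlidingWindowLimiter:
    """Illustrative model of a sliding-window limit (not the gateway's code)."""

    def __init__(self, limit=100, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()

    def allow(self, now):
        """Return True and record the request if the window has room."""
        self._evict(now)
        if len(self._stamps) >= self.limit:
            return False  # the gateway would answer 429 here
        self._stamps.append(now)
        return True

    def retry_after(self, now):
        """Seconds until the oldest request leaves the window (0 if there is room)."""
        self._evict(now)
        if len(self._stamps) < self.limit:
            return 0.0
        return self._stamps[0] + self.window - now
```

With `limit=3`, requests at t=0, 1 and 2 are accepted, a fourth at t=3 is rejected with 57 seconds left to wait, and a request at t=60 is accepted again because the t=0 entry has aged out.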
You configure the rate limit when you create the key, or by editing the key in the dashboard:
- Lower for development keys or keys exposed to less-trusted environments.
- Higher for production keys driving real traffic.
The rate limit is enforced on the proxy in front of every vendor call, with negligible added latency.
Per-account rate limit
Across all your keys combined, your account has a rate ceiling. Default: 1,000 requests per minute. This prevents a single account from accidentally (or otherwise) saturating the platform regardless of how many keys it shards across.
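As a quick worked example of how the two ceilings combine, the sketch below computes the most traffic an account can push per minute. The function name `effective_ceiling` is hypothetical, and it assumes every key has the same per-key limit.

```python
def effective_ceiling(num_keys, per_key_limit=100, per_account_limit=1000):
    """Upper bound on requests per minute across all keys on one account.

    Assumes every key shares the same per-key limit; the defaults are the
    documented defaults (100 per key, 1,000 per account).
    """
    return min(num_keys * per_key_limit, per_account_limit)


# With the defaults, the account ceiling becomes the binding limit at ten keys.
for keys in (3, 10, 25):
    print(keys, "keys ->", effective_ceiling(keys), "requests per minute")
```

Under the defaults, three keys allow 300 requests per minute, while ten keys and twenty-five keys both top out at 1,000.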
If you have a legitimate workload that needs a higher account-level ceiling, contact support — we lift it on request for accounts with the volume to back it.
Daily spend caps
Independent of rate, you can cap how many dollars a key (or your whole account) can spend in a single day. The cap is off by default at V1. When you set one:
- The day window is your local day (per the timezone Stripe has for your account).
- The cap is checked before a request is forwarded to the vendor.
- If you've exceeded the cap, the request returns 429 with Retry-After set to the seconds remaining until your local midnight.
- Worst-case overage: one request's actual cost. The check uses the cost already recorded for previous requests, so a single in-flight request can land you fractionally above the cap.
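The two behaviors in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not the gateway's implementation: the function names are hypothetical, amounts are in integer cents, and the check is assumed to block once recorded spend has reached or passed the cap.

```python
import math
from datetime import datetime, time, timedelta, timezone, tzinfo


def seconds_until_local_midnight(now: datetime, tz: tzinfo) -> int:
    """Seconds from `now` (timezone-aware) to the next midnight in `tz`."""
    local_now = now.astimezone(tz)
    next_day = local_now.date() + timedelta(days=1)
    local_midnight = datetime.combine(next_day, time.min, tzinfo=tz)
    # Subtract in UTC so a DST change is counted as real elapsed time.
    delta = local_midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return math.ceil(delta.total_seconds())


def simulate_day(cap_cents: int, request_costs_cents: list) -> tuple:
    """Return (total spent, requests forwarded) for a pre-forward cap check."""
    spent = 0
    forwarded = 0
    for cost in request_costs_cents:
        if spent >= cap_cents:  # assumption: block once the cap is reached
            break               # the gateway would answer 429 from here on
        spent += cost           # cost is only known after the vendor call
        forwarded += 1
    return spent, forwarded
```

For example, with a cap of 100 cents and requests costing 40 cents each, `simulate_day(100, [40] * 5)` forwards three requests and ends at 120 cents: over the cap, but by less than one request's cost. For a real account timezone, `tz` could be a `zoneinfo.ZoneInfo` instance.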
Daily caps are the cleanest defense against a runaway client or a leaked key. Setting a small cap on every key, even just $25/day on a dev key, turns "I left a loop running over the weekend" from a horror story into a Monday-morning Slack message.
The Retry-After header
Every 429 response includes a Retry-After header with the number of seconds to wait before retrying:
    HTTP/1.1 429 Too Many Requests
    Retry-After: 12
    Content-Type: application/json

    {
      "error": {
        "type": "rate_limit_exceeded",
        "message": "Rate limit exceeded on key 'unnma-sk-abc1'. Retry after 12 seconds.",
        "request_id": "req_01abc..."
      }
    }

The value reflects the specific limit you tripped:
- For a per-minute rate limit: seconds until your sliding window has room.
- For a daily spend cap: seconds until your local midnight.
Always honor it. Retrying earlier just produces another 429.
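A client has to turn the header into a number before it can honor it. The helper below is a defensive sketch: the name `retry_after_seconds` and the 1-second fallback are assumptions, and it only handles the seconds form this page describes.

```python
import math
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Parse Retry-After as a number of seconds, falling back to `default`.

    Note: a plain dict lookup is case-sensitive. Header objects from HTTP
    libraries such as httpx are case-insensitive, so this matters mostly
    when passing an ordinary dict.
    """
    raw = headers.get("Retry-After")
    if raw is None:
        return default
    try:
        seconds = float(raw.strip())
    except ValueError:
        return default  # not a number, e.g. an HTTP-date or garbage
    if not math.isfinite(seconds) or seconds < 0:
        return default
    return seconds
```

Falling back to a short default rather than raising keeps the retry loop alive if a proxy strips or mangles the header.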
Handling 429 in client code
A simple, effective pattern: wait for the Retry-After interval plus a little jitter, and give up after a bounded number of attempts.
Python
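    import random
    import time

    import httpx


    def call_with_backoff(client: httpx.Client, payload: dict, max_attempts: int = 5) -> dict:
        """POST to /v1/messages, waiting out 429 responses as Retry-After instructs."""
        for attempt in range(max_attempts):
            response = client.post("/v1/messages", json=payload)
            if response.status_code != 429:
                # Any other status: raise on 4xx/5xx, otherwise return the body.
                response.raise_for_status()
                return response.json()
            if attempt == max_attempts - 1:
                break  # out of attempts, so there is no point sleeping again
            # 429: respect Retry-After (seconds); fall back to 1s if it is missing.
            retry_after = float(response.headers.get("Retry-After", "1"))
            # Add up to 1s of jitter, and cap the wait at 60s.
            wait = min(retry_after + random.uniform(0, 1), 60)
            time.sleep(wait)
        raise RuntimeError("Rate limit exceeded after retries")

TypeScript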
    async function callWithBackoff<T>(
      fn: () => Promise<Response>,
      maxAttempts = 5,
    ): Promise<T> {
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const response = await fn();
        if (response.status !== 429) {
          // Any other status: throw on failure, otherwise return the parsed body.
          if (!response.ok) throw new Error(await response.text());
          return (await response.json()) as T;
        }
        // 429: respect Retry-After (seconds); fall back to 1s if missing or non-numeric.
        const parsed = Number(response.headers.get("Retry-After") ?? "1");
        const retryAfter = Number.isFinite(parsed) ? parsed : 1;
        // Add up to 1s of jitter, and cap the wait at 60s.
        const wait = Math.min(retryAfter * 1000 + Math.random() * 1000, 60_000);
        await new Promise((resolve) => setTimeout(resolve, wait));
      }
      throw new Error("Rate limit exceeded after retries");
    }

Don't retry on a daily spend cap
A 429 from a daily spend cap will not resolve until your local midnight. Your client should distinguish: if the Retry-After is more than, say, five minutes, treat it as a hard stop rather than retrying. Either raise the cap in the dashboard or wait out the day.
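One way to express that distinction in client code is a small guard that runs before any sleep. This is a sketch: the exception name `SpendCapLikelyReached`, the function `wait_or_stop`, and the constant are hypothetical, and the five-minute threshold is simply the example figure from the paragraph above.

```python
HARD_STOP_THRESHOLD_SECONDS = 300  # five minutes, per the guidance above


class SpendCapLikelyReached(Exception):
    """Raised when Retry-After is too long to be a per-minute rate limit."""


def wait_or_stop(retry_after: float) -> float:
    """Return the seconds to sleep, or raise if retrying is pointless.

    A per-minute window never needs more than about a minute to clear, so
    a much longer Retry-After is treated as a daily spend cap.
    """
    if retry_after > HARD_STOP_THRESHOLD_SECONDS:
        raise SpendCapLikelyReached(
            f"Retry-After is {retry_after:.0f}s; raise the cap or wait out the day."
        )
    return retry_after
```

Calling `wait_or_stop` on the parsed header value inside a retry loop lets short waits proceed as usual while surfacing a spend-cap stop to the caller immediately.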
Trial credit and rate limits
If you still have unspent $5 trial credit:
- All the same rate limits apply.
- The trial credit is spendable on /v1/messages calls only; /v1/chat/completions calls draw from your paid balance directly.
- Once the trial credit is exhausted, your traffic continues against your paid balance with no change in rate-limit behavior.
See Pricing for the trial credit mechanics.
Watching your limits in the dashboard
The Activity screen at app.unnma.ai/activity shows your request volume in real time, including how close you are to your per-key and per-account rate ceilings over the last 24 hours. The Logs screen breaks every 429 out with its trigger (rate limit vs. spend cap) so you can debug noisy clients.
If you're consistently bumping a rate ceiling, that's a signal to either:
- Shard traffic across more keys.
- Raise the ceiling on the key(s) you have.
- Request a higher account-level ceiling from support.
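For the first option, sharding can be as simple as rotating through your keys in order. The sketch below is one possible approach, not a prescribed one: the class name `KeyRotator` and the key strings are hypothetical, and round-robin is just the simplest distribution policy. The per-account ceiling still applies to the combined traffic.

```python
import itertools


class KeyRotator:
    """Hands out API keys in round-robin order to spread requests evenly."""

    def __init__(self, api_keys):
        keys = list(api_keys)
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)

    def next_key(self) -> str:
        """Return the key to use for the next request."""
        return next(self._cycle)


# Placeholder key names for illustration only.
rotator = KeyRotator(["key-a", "key-b", "key-c"])
first_five = [rotator.next_key() for _ in range(5)]
# first_five is ["key-a", "key-b", "key-c", "key-a", "key-b"]
```

Round-robin keeps each key's share of traffic roughly equal, which makes it easy to reason about how close each key is to its own ceiling.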
Next
- Error codes — the full error reference.
- Pricing — how spend caps interact with auto-topup.
- Authentication — sharding by key.