Rate limits

The gateway enforces three independent limits on your traffic. All three return 429 rate_limit_exceeded when tripped, with a Retry-After header telling you how long to wait.

| Limit | Scope | Default | Configurable |
|---|---|---|---|
| Per-key rate limit | One API key | 100 requests per minute | Yes (per key, in the dashboard) |
| Per-account rate limit | All keys on your account | 1,000 requests per minute | Yes (in account settings) |
| Daily spend cap | One key, or your account | Unlimited (off by default) | Yes (per key and per account) |

Per-key rate limit

Each API key has its own request-rate budget. The default is 100 requests per minute — a 60-second sliding window. When the window fills, the next request returns 429.
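To make the sliding-window behavior concrete, here is a minimal client-side sketch of the same idea. It is not the gateway's implementation: the class name, the `clock` parameter, and the return convention are all assumptions made for illustration. It can be useful for throttling a client so that it rarely trips the server-side limit in the first place.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit=100, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the logic can be tested
        self.sent = deque()     # timestamps of accepted requests, oldest first

    def try_acquire(self):
        """Return 0.0 if a request may go now, else seconds until a slot frees."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return 0.0
        # Window is full: the oldest request leaves it at sent[0] + window.
        return self.sent[0] + self.window - now
```

A caller would sleep for the returned number of seconds and then try again, which mirrors what `Retry-After` asks of you when the gateway's own window is full.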

You configure the rate limit when you create the key, or by editing the key in the dashboard:

Lower for development keys or keys exposed to less-trusted environments.
Higher for production keys driving real traffic.

The rate limit is enforced on the proxy in front of every vendor call, with negligible added latency.

Per-account rate limit

Across all your keys combined, your account has a rate ceiling. Default: 1,000 requests per minute. This prevents a single account from accidentally (or otherwise) saturating the platform regardless of how many keys it shards across.
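As a rough way to reason about how the two rate limits combine, the sketch below assumes the limits are enforced independently, so the sustainable rate is whichever ceiling binds first. The function name is hypothetical, and the numbers are the defaults quoted above.

```python
def effective_ceiling(key_limits, account_limit=1000):
    """Requests per minute an account can sustain across all of its keys.

    Assumes each key is driven at its own limit and that the per-key and
    per-account limits are checked independently.
    """
    return min(sum(key_limits), account_limit)


five_keys = effective_ceiling([100] * 5)     # per-key limits bind
twelve_keys = effective_ceiling([100] * 12)  # account ceiling binds
```

With default settings, adding keys beyond the tenth does not raise total throughput, because the account ceiling becomes the binding limit.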

If you have a legitimate workload that needs a higher account-level ceiling, contact support — we lift it on request for accounts with the volume to back it.

Daily spend caps

Independent of rate, you can cap how many dollars a key (or your whole account) can spend in a single day. The cap is off by default at V1. When you set one:

The day window is your local day (per the timezone Stripe has for your account).
The cap is checked before a request is forwarded to the vendor.
If you've exceeded the cap, the request returns 429 with Retry-After set to the seconds remaining until your local midnight.
Worst-case overage is one request's actual cost. The check uses the spend recorded through the previous request, not the cost of the request being checked, so a single in-flight request can land you fractionally above the cap.
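The check described above can be sketched as follows. This is an illustration only, not the gateway's code: the function names are hypothetical, the caller supplies a timezone-aware `now`, and treating "spend equal to the cap" as exceeded is an assumption.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_local_midnight(now):
    """Seconds from `now` (a timezone-aware datetime) to the next local midnight."""
    next_day = (now + timedelta(days=1)).date()
    midnight = datetime(next_day.year, next_day.month, next_day.day, tzinfo=now.tzinfo)
    # Compare in UTC so that a DST change during the night is counted correctly.
    return (midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)).total_seconds()


def check_spend_cap(spent_today, cap, now):
    """Return None if the request may be forwarded, else the Retry-After seconds.

    `spent_today` is the spend recorded from earlier requests; the cost of the
    request being checked is not known yet, which is where the overage comes from.
    """
    if cap is None or spent_today < cap:   # assumption: reaching the cap blocks
        return None
    return int(seconds_until_local_midnight(now))
```

For example, with a $25.00 cap and $24.90 already spent, a request costing $0.30 is still forwarded and brings the day's total to $25.20. The following request is the first one to be rejected.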

Daily caps are the cleanest defense against a runaway client or a leaked key. Setting a small cap on every key, even just $25/day on a dev key, turns "I left a loop running over the weekend" from a horror story into a Monday-morning Slack message.

The `Retry-After` header

Every 429 response includes a Retry-After header with the number of seconds to wait before retrying:

```plaintext
HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit exceeded on key 'unnma-sk-abc1'. Retry after 12 seconds.",
    "request_id": "req_01abc..."
  }
}
```

The value reflects the specific limit you tripped:

For a per-minute rate limit: seconds until your sliding window has room.
For a daily spend cap: seconds until your local midnight.

Always honor it. Retrying earlier just produces another 429.

Handling `429` in client code

A simple, effective pattern: on each 429, wait for the Retry-After interval plus a little random jitter, and give up after a fixed number of attempts.

Python

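```python
import random
import time

import httpx


def call_with_backoff(client: httpx.Client, payload: dict, max_attempts: int = 5) -> dict:
    """POST to /v1/messages, retrying on 429 as directed by Retry-After."""
    for attempt in range(max_attempts):
        response = client.post("/v1/messages", json=payload)

        if response.status_code != 429:
            response.raise_for_status()
            return response.json()

        # Out of attempts: don't sleep before giving up.
        if attempt == max_attempts - 1:
            break

        # 429: respect Retry-After (seconds); fall back to 1s if it is missing.
        retry_after = float(response.headers.get("Retry-After", "1"))
        # Add up to 1s of jitter, and cap the total wait at 60s.
        wait = min(retry_after + random.uniform(0, 1), 60)
        time.sleep(wait)

    raise RuntimeError("Rate limit exceeded after retries")
```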

TypeScript

```typescript
async function callWithBackoff<T>(
  fn: () => Promise<Response>,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fn();

    if (response.status !== 429) {
      if (!response.ok) throw new Error(await response.text());
      return (await response.json()) as T;
    }

    // Out of attempts: don't sleep before giving up.
    if (attempt === maxAttempts - 1) break;

    // 429: respect Retry-After (seconds); fall back to 1s if it is missing.
    const retryAfter = Number(response.headers.get("Retry-After") ?? "1");
    // Add up to 1s of jitter, and cap the total wait at 60s.
    const wait = Math.min(retryAfter * 1000 + Math.random() * 1000, 60_000);
    await new Promise((resolve) => setTimeout(resolve, wait));
  }

  throw new Error("Rate limit exceeded after retries");
}
```

Don't retry on a daily spend cap

A 429 from a daily spend cap will not resolve until your local midnight, so retrying in a loop accomplishes nothing. Your client should distinguish the two cases: if the Retry-After is more than, say, five minutes, treat it as a hard stop rather than retrying. Either raise the cap in the dashboard or wait out the day.
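One way to apply this rule is a small helper that turns the header value into either a sleep duration or an exception. This is a sketch: the names `SpendCapReached` and `wait_or_stop` are hypothetical, and the 300-second threshold is just the five-minute rule of thumb from the paragraph above.

```python
HARD_STOP_THRESHOLD = 300  # seconds; the "five minutes" rule of thumb


class SpendCapReached(Exception):
    """Raised when a 429 looks like a daily spend cap rather than a rate limit."""


def wait_or_stop(retry_after_header, threshold=HARD_STOP_THRESHOLD):
    """Return seconds to sleep for a short wait; raise for a long one."""
    # Fall back to 1s if the header is missing.
    seconds = float(retry_after_header) if retry_after_header is not None else 1.0
    if seconds > threshold:
        raise SpendCapReached(f"Retry-After is {seconds:.0f}s; not retrying")
    return seconds
```

A retry loop would call `wait_or_stop(response.headers.get("Retry-After"))` in place of reading the header directly, and let `SpendCapReached` propagate to whatever alerts a human.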

Trial credit and rate limits

If you still have unspent $5 trial credit:

All the same rate limits apply.
The trial credit is spendable on /v1/messages calls only — /v1/chat/completions calls draw from your paid balance directly.
Once the trial credit is exhausted, your traffic continues against your paid balance with no change in rate-limit behavior.

See Pricing for the trial credit mechanics.

Watching your limits in the dashboard

The Activity screen at app.unnma.ai/activity shows your request volume in real time, including how close you are to your per-key and per-account rate ceilings over the last 24 hours. The Logs screen breaks every 429 out with its trigger (rate limit vs. spend cap) so you can debug noisy clients.

If you're consistently bumping a rate ceiling, that's a signal to either:

Shard traffic across more keys.
Raise the ceiling on the key(s) you have.
Request a higher account-level ceiling from support.
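If you take the first option, the simplest client-side scheme is round-robin rotation across keys. The sketch below uses placeholder key names and a hypothetical helper. Note that it spreads load evenly across per-key limits but does nothing about the per-account ceiling, which still applies to the combined traffic.

```python
import itertools


def key_rotation(api_keys):
    """Return an iterator that cycles through the given API keys forever."""
    return itertools.cycle(api_keys)


# Placeholder key names, for illustration only.
keys = key_rotation(["key-a", "key-b", "key-c"])
first_four = [next(keys) for _ in range(4)]
```

Each outgoing request would take `next(keys)` as its credential, so consecutive requests land on different keys.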

Error codes — the full error reference.
Pricing — how spend caps interact with auto-topup.
Authentication — sharding by key.