Welcome to COP Gateway

COP Gateway is a single API that routes your inference to nine AI vendors and applies a proprietary optimization layer called COP — Cognitive Orientation Prompting — on the way through. You write code against the Anthropic or OpenAI SDK you already use; you change one line (the base URL); your calls land at api.unnma.ai; we forward them to the underlying provider with COP applied. Coding workloads see roughly 24% fewer output tokens on average. Across mixed categories, the average is closer to 12%. Your bill shrinks accordingly.

How it works

python

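from anthropic import Anthropic

# Point the stock Anthropic SDK at the gateway instead of api.anthropic.com.
client = Anthropic(
    api_key="unnma-sk-...",  # your COP Gateway key
    base_url="https://api.unnma.ai",  # host root; the Anthropic SDK appends /v1/messages itself
)

# The vendor prefix on the model name selects the upstream provider.
response = client.messages.create(
    model="anthropic/claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world."}],
)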

Three things are happening:

You changed your base URL. Everything else in your code is unchanged — same SDK, same request shape, same response shape.
You used a prefix-qualified model name. anthropic/claude-opus-4-7 tells us which vendor to route to. The OpenAI SDK works the same way against /v1/chat/completions.
COP is applied to your request before forwarding. The result is a measurably lower output token count, and output tokens are where the cost lives: at most vendors they are priced at roughly 5× the input-token rate.
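
The same pattern with the OpenAI SDK can be sketched as follows. This is a minimal illustration under stated assumptions, not the gateway's reference example: the model name openai/gpt-4o is a placeholder (these docs only show the openai/... prefix), and the helper names build_chat_request and send are hypothetical. The OpenAI SDK appends /chat/completions to its base URL, so the base URL here ends in /v1.

```python
GATEWAY_BASE_URL = "https://api.unnma.ai/v1"  # the OpenAI SDK appends /chat/completions


def build_chat_request(prompt: str, model: str = "openai/gpt-4o") -> dict:
    """Build keyword arguments for chat.completions.create.

    The default model name is a placeholder (an assumption); substitute a
    model the gateway actually lists under the openai/ prefix.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }


def send(prompt: str, api_key: str) -> str:
    """Send one chat completion through the gateway and return its text."""
    # Imported here so the request builder above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key, base_url=GATEWAY_BASE_URL)
    response = client.chat.completions.create(**build_chat_request(prompt))
    return response.choices[0].message.content
```

Calling send("Hello, world.", api_key="unnma-sk-...") with a real key would make the request; nothing is sent until then.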

You get the savings as a lower charge on the request. Your dashboard shows the per-call delta — what you paid through us versus what the same call would have cost going direct.
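
To see how a per-call delta could come out, here is a worked sketch of the arithmetic. Only the 24% output reduction and the 9% markup come from these docs. Everything else is an assumption: the per-token prices are illustrative, the markup is assumed to apply to the whole upstream cost, and COP is assumed to add no input tokens. The function names are hypothetical, and the dashboard's actual calculation may differ.

```python
def direct_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost of a call sent straight to the vendor (prices are per token)."""
    return input_tokens * in_price + output_tokens * out_price


def gateway_cost(input_tokens, output_tokens, in_price, out_price,
                 output_reduction=0.24, markup=0.09):
    """Cost of the same call via the gateway, under the assumptions above."""
    reduced_output = output_tokens * (1 - output_reduction)
    upstream = direct_cost(input_tokens, reduced_output, in_price, out_price)
    return upstream * (1 + markup)


# Illustrative prices (assumed): $3 per million input tokens, $15 per million output tokens.
IN_PRICE = 3 / 1_000_000
OUT_PRICE = 15 / 1_000_000

direct = direct_cost(1_000, 500, IN_PRICE, OUT_PRICE)
via_gateway = gateway_cost(1_000, 500, IN_PRICE, OUT_PRICE)
delta = direct - via_gateway

print(f"direct:  ${direct:.6f}")
print(f"gateway: ${via_gateway:.6f}")
print(f"saved:   ${delta:.6f} ({delta / direct:.1%})")
```

Under these assumptions the saving depends on the mix of tokens: the larger the share of output tokens in a call, the larger the delta, and it shrinks as input tokens take up more of the cost.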

Why this exists

Two problems sit next to each other in production AI work:

Bills grow faster than usage. Output tokens dominate the spend, and the easiest way to cut them is to write better system prompts. Most teams don't have time.
Vendor lock-in is real. Switching from Claude to GPT to Gemini means rewriting your client code and re-testing every prompt. Most teams don't switch even when the economics favor it.

COP Gateway addresses both at once. Vendor routing happens at the model-name prefix: anthropic/..., openai/..., groq/..., gemini/..., plus four more direct integrations and OpenRouter as a long-tail fallback. COP runs automatically on every request. You don't change your SDK; you don't change your prompts; you just save money on every call.
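
The prefix convention can be illustrated with a small client-side helper that splits a qualified model name into vendor and model. This is a sketch for checking names before sending, not the gateway's routing code: split_model_name is a hypothetical name, and the prefix set holds only the four prefixes named above.

```python
KNOWN_PREFIXES = {"anthropic", "openai", "groq", "gemini"}  # only those named in these docs


def split_model_name(qualified: str) -> tuple[str, str]:
    """Split 'vendor/model' into (vendor, model), validating the shape."""
    # partition splits at the first slash, so later slashes stay in the model part.
    vendor, sep, model = qualified.partition("/")
    if not sep or not vendor or not model:
        raise ValueError(f"expected 'vendor/model', got {qualified!r}")
    return vendor, model


vendor, model = split_model_name("anthropic/claude-opus-4-7")
print(vendor, model, vendor in KNOWN_PREFIXES)
```

Switching vendors then amounts to changing the string passed as the model name, which is the point of the prefix scheme.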

What's in these docs

Quickstart — get a key, make your first call. Five minutes.
Authentication — how API keys work, how to create and revoke them.
Pricing — the 9% markup on inference, the service fee on top-ups, the trial credit.
/v1/messages reference — the Anthropic-shape endpoint. Use this if you were using the Anthropic SDK.
/v1/chat/completions reference — the OpenAI-shape endpoint. Use this if you were using the OpenAI SDK.
Error codes — every status code, every error type, what to do when you see it.
Rate limits — per-key and per-account limits, daily spend caps, the Retry-After pattern.

Start at the Quickstart. It is the fastest path to a working call.
