Welcome to COP Gateway
COP Gateway is a single API that routes your inference to nine AI vendors and applies a proprietary optimization layer called COP — Cognitive Orientation Prompting — on the way through. You write code against the Anthropic or OpenAI SDK you already use; you change one line (the base URL); your calls land at api.unnma.ai; we forward them to the underlying provider with COP applied. Coding workloads see roughly 24% fewer output tokens on average. Across mixed categories, the average is closer to 12%. Your bill shrinks accordingly.
How it works
```python
from anthropic import Anthropic

client = Anthropic(
    api_key="unnma-sk-...",
    base_url="https://api.unnma.ai/v1",
)
response = client.messages.create(
    model="anthropic/claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world."}],
)
```
Three things are happening:
- You changed your base URL. Everything else in your code is unchanged — same SDK, same request shape, same response shape.
- You used a prefix-qualified model name. anthropic/claude-opus-4-7 tells us which vendor to route to. The OpenAI SDK works the same way against /v1/chat/completions.
- COP is applied to your request before forwarding. The result is a measurably lower output token count, which is where the cost lives (output tokens cost roughly 5× more than input tokens at most vendors).
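For the OpenAI-shape endpoint, a minimal sketch using only the Python standard library might look like the following. It builds the HTTP request without sending it, so you can inspect the URL and body. The model name openai/gpt-4o is a hypothetical placeholder (only the openai/ prefix appears in these docs), and the Authorization: Bearer header is an assumption based on what the OpenAI SDK sends by default.

```python
import json
import urllib.request

BASE_URL = "https://api.unnma.ai/v1"  # same base URL as the Anthropic example


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-shape chat completions request."""
    body = {
        "model": model,  # prefix-qualified: "<vendor>/<model>"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed: the same bearer header the OpenAI SDK sends.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "openai/gpt-4o" is a hypothetical model name used for illustration only.
request = build_chat_request("unnma-sk-...", "openai/gpt-4o", "Hello, world.")
print(request.full_url)
```

In practice you would keep using the OpenAI SDK and only change its base URL; the sketch just makes the request shape visible.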
You get the savings as a lower charge on the request. Your dashboard shows the per-call delta — what you paid through us versus what the same call would have cost going direct.
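To make the per-call delta concrete, here is a back-of-the-envelope sketch. The token counts and per-million-token prices are hypothetical, chosen only to reflect the roughly 5× output-to-input price ratio. The 24% reduction is the coding average quoted above, and the 9% figure is the inference markup listed under Pricing. Applying the markup to the whole provider cost of the call is an assumption about how the charge is computed, not a statement of the actual billing formula.

```python
def call_cost(input_tokens, output_tokens, input_price, output_price, markup=0.0):
    """Return the cost of one call in dollars.

    Prices are in dollars per million tokens. `markup` is a fraction
    (0.09 means 9%) applied to the whole provider cost (an assumption).
    """
    base = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return base * (1 + markup)


# Hypothetical prices with a 5x output/input ratio.
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

# Going direct: 2,000 input tokens and 1,000 output tokens (hypothetical).
direct = call_cost(2_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)

# Through the gateway: 24% fewer output tokens, plus the 9% markup.
gateway = call_cost(2_000, 760, INPUT_PRICE, OUTPUT_PRICE, markup=0.09)

delta = direct - gateway
print(f"direct={direct:.6f} gateway={gateway:.6f} delta={delta:.6f}")
```

The size of the delta depends on how much of a call's cost comes from output tokens, so calls with long prompts and short answers will show a smaller difference.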
Why this exists
Two problems sit next to each other in production AI work:
- Bills grow faster than usage. Output tokens dominate the spend, and the easiest way to cut them is to write better system prompts. Most teams don't have time.
- Vendor lock-in is real. Switching from Claude to GPT to Gemini means rewriting your client code and re-testing every prompt. Most teams don't switch even when the economics would say to.
COP Gateway addresses both at once. Vendor routing happens at the model-name prefix: anthropic/..., openai/..., groq/..., gemini/..., plus four more direct integrations and OpenRouter as a long-tail fallback. COP runs automatically on every request. You don't change your SDK or your prompts; you just save money on every call.
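Because routing is driven by the model-name prefix, switching vendors on the client side comes down to changing one string. The sketch below illustrates that idea. The split_model helper and the groq/example-model name are hypothetical, and splitting at the first slash is an assumption about the naming scheme, not a description of how the gateway parses names internally.

```python
def split_model(name: str) -> tuple[str, str]:
    """Split "<vendor>/<model>" into (vendor, model) at the first slash."""
    vendor, sep, model = name.partition("/")
    if not sep or not vendor or not model:
        raise ValueError(f"expected '<vendor>/<model>', got {name!r}")
    return vendor, model


def make_payload(model: str, prompt: str) -> dict:
    """Build a request body; only the model string differs between vendors."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }


# "groq/example-model" is a hypothetical name used for illustration only.
payloads = {}
for model in ("anthropic/claude-opus-4-7", "groq/example-model"):
    vendor, _ = split_model(model)
    payloads[vendor] = make_payload(model, "Hello, world.")

print(sorted(payloads))
```

Keeping the model name in configuration rather than in code makes this kind of switch a one-line change.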
What's in these docs
- Quickstart — get a key, make your first call. Five minutes.
- Authentication — how API keys work, how to create and revoke them.
- Pricing — the 9% markup on inference, the service fee on top-ups, the trial credit.
- /v1/messages reference — the Anthropic-shape endpoint. Use this if you were using the Anthropic SDK.
- /v1/chat/completions reference — the OpenAI-shape endpoint. Use this if you were using the OpenAI SDK.
- Error codes — every status code, every error type, what to do when you see it.
- Rate limits — per-key and per-account limits, daily spend caps, the Retry-After pattern.
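As a rough illustration of the Retry-After pattern mentioned under Rate limits, the sketch below retries a call when the response looks rate-limited and waits for the number of seconds the server asks for. The call_with_retry helper is hypothetical, and treating HTTP 429 as the rate-limit status is an assumption based on common HTTP practice. The Error codes and Rate limits pages are the authority on the actual behavior.

```python
import time


def call_with_retry(send, max_attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call send() and retry on 429, honoring the Retry-After header.

    send() must return (status, headers, body). Header lookup here is
    case-sensitive because a plain dict is assumed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        # Return on success, on a non-rate-limit error, or on the last attempt.
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        # Retry-After may be missing or an HTTP date; fall back to a default.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(max(wait, 0.0))


# Simulated responses: one rate-limited reply, then a success.
responses = iter([(429, {"Retry-After": "2"}, ""), (200, {}, "ok")])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
print(result[0], waits)
```

Passing sleep as a parameter keeps the helper easy to test; in real code you would leave the default and wrap your SDK call in send.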
Next
Start at the Quickstart. It is the fastest path to a working call.