Welcome to COP Gateway

COP Gateway is a single API that routes your inference requests to nine AI vendors and applies a proprietary optimization layer, COP (Cognitive Orientation Prompting), on the way through. You keep writing code against the Anthropic or OpenAI SDK you already use and change one line: the base URL. Your calls land at api.unnma.ai, and we forward them to the underlying provider with COP applied. Coding workloads see roughly 24% fewer output tokens on average; across mixed categories, the average is closer to 12%. Your bill shrinks accordingly.

How it works

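python
from anthropic import Anthropic

# Point the stock Anthropic SDK at the gateway instead of the vendor.
client = Anthropic(
    api_key="unnma-sk-...",               # your COP Gateway key
    base_url="https://api.unnma.ai/v1",   # the one line that changes
)

# Same request shape as a direct Anthropic call; only the model name
# carries a vendor prefix.
response = client.messages.create(
    model="anthropic/claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world."}],
)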

Three things are happening:

  1. You changed your base URL. Everything else in your code is unchanged — same SDK, same request shape, same response shape.
  2. You used a prefix-qualified model name. anthropic/claude-opus-4-7 tells us which vendor to route to. The OpenAI SDK works the same way against /v1/chat/completions.
  3. COP is applied to your request before forwarding. The result is a measurably lower output token count, and output is where most of the cost lives: output tokens cost roughly 5× as much as input tokens at most vendors.
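
For the OpenAI SDK, the same pattern might look like the sketch below. It is an illustration, not a reference implementation: the helper names make_client and ask are hypothetical, the base URL is assumed to match the one in the Anthropic example, and the model id in the usage line is a placeholder, since only the openai/ prefix is given here.

```python
GATEWAY_BASE_URL = "https://api.unnma.ai/v1"  # assumed: same base URL as the Anthropic example


def make_client(api_key):
    """Build an OpenAI SDK client that talks to the gateway (hypothetical helper)."""
    from openai import OpenAI  # third-party package: pip install openai

    return OpenAI(api_key=api_key, base_url=GATEWAY_BASE_URL)


def ask(client, model, prompt, max_tokens=1024):
    """Send one user message via chat completions and return the reply text."""
    response = client.chat.completions.create(
        model=model,  # prefix-qualified, e.g. "openai/<model-id>"
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

A call would then read ask(make_client("unnma-sk-..."), "openai/gpt-4o", "Hello, world."), where "openai/gpt-4o" stands in for whichever model id the gateway actually lists.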

You get the savings as a lower charge on the request. Your dashboard shows the per-call delta — what you paid through us versus what the same call would have cost going direct.
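
To make the per-call delta concrete, here is a rough arithmetic sketch. It rests on several assumptions that are not confirmed here: that COP leaves the input token count unchanged, that the 9% inference markup described under Pricing applies to the whole vendor charge, and that prices are expressed per token. The function names are hypothetical and the dashboard may compute its figure differently.

```python
def call_cost(input_tokens, output_tokens, input_price, output_price):
    """Vendor charge for one call, given per-token prices."""
    return input_tokens * input_price + output_tokens * output_price


def per_call_delta(input_tokens, output_tokens, input_price, output_price,
                   output_reduction, markup=0.09):
    """Direct cost minus gateway cost for the same call (positive means savings).

    output_reduction is the fraction of output tokens removed (0.24 for 24%).
    markup is the assumed gateway markup on the vendor charge.
    """
    direct = call_cost(input_tokens, output_tokens, input_price, output_price)
    reduced_output = output_tokens * (1 - output_reduction)
    via_gateway = call_cost(
        input_tokens, reduced_output, input_price, output_price
    ) * (1 + markup)
    return direct - via_gateway
```

With 1,000 input tokens, 1,000 output tokens, an output price 5× the input price, and a 24% output reduction, per_call_delta(1000, 1000, 1.0, 5.0, 0.24) comes to about 768 price units, or roughly 12.8% of the 6,000-unit direct cost under these assumptions.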

Why this exists

Two problems sit next to each other in production AI work:

  • Bills grow faster than usage. Output tokens dominate the spend, and the easiest way to cut them is to write better system prompts. Most teams don't have time.
  • Vendor lock-in is real. Switching from Claude to GPT to Gemini means rewriting your client code and re-testing every prompt. Most teams don't switch even when the economics would say to.

COP Gateway addresses both at once. Vendor routing is driven by the model-name prefix: anthropic/..., openai/..., groq/..., gemini/..., plus four more direct integrations and OpenRouter as a long-tail fallback. COP runs automatically on every request. You don't change your SDK and you don't change your prompts; you save money on every call.
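
As an illustration of prefix-based routing, the sketch below splits a prefix-qualified model name into a vendor and a model id. The gateway's actual parsing rules are not documented here, so the function name split_model_name and the choice to split on the first slash are assumptions.

```python
def split_model_name(qualified_name):
    """Split 'vendor/model' into (vendor, model).

    Splits on the first slash only, so a model id that itself contains
    slashes stays intact.
    """
    vendor, sep, model = qualified_name.partition("/")
    if not sep or not vendor or not model:
        raise ValueError(f"expected 'vendor/model', got {qualified_name!r}")
    return vendor, model
```

Under this scheme, switching vendors is a one-string change: split_model_name("anthropic/claude-opus-4-7") yields the vendor "anthropic" and the model "claude-opus-4-7", and replacing the prefix and model id reroutes the call without touching the rest of the request.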

What's in these docs

  • Quickstart — get a key, make your first call. Five minutes.
  • Authentication — how API keys work, how to create and revoke them.
  • Pricing — the 9% markup on inference, the service fee on top-ups, the trial credit.
  • /v1/messages reference — the Anthropic-shape endpoint. Use this if you were using the Anthropic SDK.
  • /v1/chat/completions reference — the OpenAI-shape endpoint. Use this if you were using the OpenAI SDK.
  • Error codes — every status code, every error type, what to do when you see it.
  • Rate limits — per-key and per-account limits, daily spend caps, the Retry-After pattern.

Next

Start at the Quickstart. It is the fastest path to a working call.