`/v1/messages` reference

The Anthropic-shape endpoint. Drop-in compatible with the Anthropic SDK and the Anthropic HTTP API. Use this endpoint if you were previously calling Anthropic directly, or if you prefer the Anthropic request/response shape regardless of underlying vendor.

plaintext

POST https://api.unnma.ai/v1/messages

Authentication

plaintext

Authorization: Bearer unnma-sk-...

See Authentication for key management.
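As a minimal sketch of the endpoint and header above, the following builds (but does not send) an authenticated request using only the Python standard library. The `build_request` helper, the placeholder key, and the `Content-Type` header are assumptions made for illustration; only the URL and the `Authorization: Bearer` scheme come from this page.

```python
import json
import urllib.request

API_URL = "https://api.unnma.ai/v1/messages"  # endpoint from this page


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/messages."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: a JSON content type is expected for the JSON body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "unnma-sk-EXAMPLE",  # placeholder, not a real key
    {
        "model": "anthropic/claude-opus-4-7",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

If you use the Anthropic SDK instead, point its base URL at the gateway and supply your Unnma key as the bearer token.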

Request body

The request body is the same as the official Anthropic /v1/messages request. We pass through every field unchanged, with one exception: the model field must be prefix-qualified (see below).

Required fields

| Field | Type | Notes |
|---|---|---|
| model | string | Must include a vendor prefix. See Model field. |
| messages | array | Anthropic message format. |
| max_tokens | integer | Per Anthropic. |

Common optional fields

| Field | Type | Notes |
|---|---|---|
| system | string or array | System prompt. Passed through to the vendor. |
| temperature | number | Per vendor. |
| top_p, top_k | number / integer | Per vendor. |
| stream | boolean | Stream the response as SSE. Default false. |
| stop_sequences | array<string> | Per vendor. |
| metadata | object | Per vendor. |
| tools, tool_choice | array / object | Per vendor; passed through unchanged. |

For full field documentation, refer to the Anthropic API reference. Anything not Unnma-specific behaves exactly as Anthropic specifies.
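For illustration, a request body combining the required fields with a few of the optional ones might look like the sketch below. The `missing_required_fields` helper is hypothetical, a client-side pre-check rather than anything the gateway provides, and the prompt text is made up.

```python
# Required fields, per the table on this page.
REQUIRED_FIELDS = ("model", "messages", "max_tokens")


def missing_required_fields(body: dict) -> list[str]:
    """Return the required fields that are absent from a request body."""
    return [name for name in REQUIRED_FIELDS if name not in body]


body = {
    "model": "anthropic/claude-opus-4-7",  # prefix-qualified
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Summarize SSE in one sentence."}],
    # Optional fields:
    "system": "You are a concise assistant.",
    "temperature": 0.2,
    "stream": False,
}
```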

The `model` field

You must use a prefix-qualified model name. The prefix tells the gateway which vendor to route to.

plaintext

"model": "anthropic/claude-opus-4-7"

Prefixes for the nine vendors supported at V1:

| Prefix | Vendor |
|---|---|
| anthropic/ | Anthropic direct |
| openai/ | OpenAI direct |
| gemini/ | Google Gemini direct |
| together/ | Together AI |
| fireworks/ | Fireworks AI |
| groq/ | Groq |
| xai/ | xAI |
| deepseek/ | DeepSeek |
| openrouter/ | OpenRouter long-tail fallback |

The model identifier after the slash is whatever the vendor expects; see each vendor's documentation for its full model list. For example: openai/gpt-4o, groq/llama-3.1-70b-versatile, together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo.

A bare model name without a prefix is rejected with 400 invalid_request:

json

{
  "error": {
    "type": "invalid_request",
    "message": "Model field must include a vendor prefix. Did you mean 'anthropic/claude-opus-4-7'?",
    "request_id": "req_01abc..."
  }
}

For the OpenRouter fallback, the prefix is doubled — openrouter/<provider>/<model>. Example: openrouter/cohere/command-r-plus. This is an explicit opt-in path for models we don't have direct integrations for; pricing and behavior come from OpenRouter pass-through.
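A client-side pre-check can catch a missing prefix before the request is sent. The sketch below is one way to do it, assuming that only the first slash separates the vendor prefix from the vendor's own identifier (which may itself contain slashes, as in the Together and OpenRouter examples). The `split_model` function is hypothetical, and the gateway's actual parsing may differ.

```python
# Vendor prefixes listed on this page for V1.
KNOWN_PREFIXES = {
    "anthropic", "openai", "gemini", "together", "fireworks",
    "groq", "xai", "deepseek", "openrouter",
}


def split_model(model: str) -> tuple[str, str]:
    """Split 'vendor/model-id' into (vendor, model_id).

    Only the first slash is treated as the prefix separator; everything
    after it is passed along as the vendor's identifier.
    """
    vendor, sep, model_id = model.partition("/")
    if not sep or not model_id:
        raise ValueError(f"model must include a vendor prefix: {model!r}")
    if vendor not in KNOWN_PREFIXES:
        raise ValueError(f"unknown vendor prefix: {vendor!r}")
    return vendor, model_id
```

With this split, `openrouter/cohere/command-r-plus` yields the vendor `openrouter` and the identifier `cohere/command-r-plus`, while a bare `claude-opus-4-7` raises `ValueError`.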

Response

The response body always uses the Anthropic shape, whichever vendor served the request. When the underlying vendor is Anthropic, the body is the vendor's response byte-for-byte. When the underlying vendor is another one, such as OpenAI or Gemini, we map the request and response shapes at the route boundary so your client code stays Anthropic-shaped.

Non-streaming response

json

{
  "id": "msg_01abc...",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-7",
  "content": [
    {
      "type": "text",
      "text": "..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 145,
    "output_tokens": 287,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}
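If you handle the JSON yourself rather than through the SDK, reading the reply text and token counts from a response of this shape could look like the following sketch. The helper names and the sample text are illustrative assumptions; the field names come from the response above.

```python
def response_text(message: dict) -> str:
    """Concatenate the text of every 'text' content block, in order."""
    return "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )


def token_counts(message: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from the usage object."""
    usage = message.get("usage", {})
    return usage.get("input_tokens", 0), usage.get("output_tokens", 0)


# A trimmed response in the shape shown above, with made-up text.
sample = {
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Hello"},
        {"type": "text", "text": ", world"},
    ],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 145, "output_tokens": 287},
}

text = response_text(sample)
counts = token_counts(sample)
```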

Streaming response

When stream: true, the response is a Server-Sent Events stream in Anthropic's standard event format:

plaintext

event: message_start
data: {"type": "message_start", "message": {...}}
 
event: content_block_start
data: {"type": "content_block_start", "index": 0, ...}
 
event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "..."}}
 
...
 
event: message_stop
data: {"type": "message_stop"}

Use the Anthropic SDK's stream helpers; they work without modification.
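For clients that cannot use the SDK, the event format above can be read by hand. The following is a simplified sketch, not a full SSE parser: it assumes each event carries an `event:` line and one or more `data:` lines terminated by a blank line, and it ignores comments, `id:`, and `retry:` fields. Both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from the lines of an SSE stream."""
    event, data = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif not line.strip() and data:
            # A blank line ends the current event.
            yield event or "message", json.loads("\n".join(data))
            event, data = None, []
    if data:  # stream ended without a trailing blank line
        yield event or "message", json.loads("\n".join(data))


def collect_text(events: Iterable[tuple[str, dict]]) -> str:
    """Join the text of all text_delta events into the final reply."""
    return "".join(
        payload["delta"]["text"]
        for name, payload in events
        if name == "content_block_delta"
        and payload.get("delta", {}).get("type") == "text_delta"
    )


# Made-up stream in the event format shown above.
sample_lines = [
    'event: message_start',
    'data: {"type": "message_start", "message": {}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
    '',
]

events = list(iter_sse_events(sample_lines))
streamed_text = collect_text(events)
```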

Response headers

Standard HTTP headers, plus:

| Header | Meaning |
|---|---|
| x-unnma-request-id | Our internal request ID. Include this when contacting support. |
| x-unnma-vendor | The underlying vendor we routed to (e.g. anthropic). |
| x-unnma-customer-charge-cents | What we charged your balance for this request, in cents. |
| Retry-After | Present on 429 responses. Seconds to wait before retrying. |
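A client might read these headers along the lines of the sketch below. It assumes the header values arrive as plain numeric strings, which this page does not spell out (in particular, whether the charge is whole or fractional cents), so the charge is parsed as a `Decimal`. The helper names and sample values are hypothetical.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional


def _lowered(headers: Mapping[str, str]) -> dict:
    """HTTP header names are case-insensitive, so normalize them."""
    return {name.lower(): value for name, value in headers.items()}


def retry_delay_seconds(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return the Retry-After delay for a 429, or None if there is no usable hint."""
    if status != 429:
        return None
    try:
        return max(0.0, float(_lowered(headers).get("retry-after")))
    except (TypeError, ValueError):
        return None


def charge_cents(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Return the charge for this request in cents, or None if absent or malformed."""
    try:
        return Decimal(_lowered(headers).get("x-unnma-customer-charge-cents"))
    except (TypeError, InvalidOperation):
        return None


# Made-up header values for illustration.
sample_headers = {
    "X-Unnma-Vendor": "anthropic",
    "x-unnma-customer-charge-cents": "0.52",
    "Retry-After": "2",
}
```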

Errors

All errors follow the standard envelope:

json

{
  "error": {
    "type": "<error_type>",
    "message": "<human-readable>",
    "request_id": "req_01abc..."
  }
}

The full list of type values, their HTTP status codes, and recommended client behavior is in Error codes.
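One way to surface the envelope in client code is to turn it into an exception that keeps the `request_id` for support requests. This is a sketch only: the `GatewayError` class and `error_from_response` function are hypothetical, and the sample `request_id` is a placeholder.

```python
import json
from typing import Optional


class GatewayError(Exception):
    """An error envelope returned by the gateway."""

    def __init__(self, status: int, error_type: str, message: str,
                 request_id: Optional[str]):
        super().__init__(f"{status} {error_type}: {message} (request_id={request_id})")
        self.status = status
        self.error_type = error_type
        self.message = message
        self.request_id = request_id


def error_from_response(status: int, body: str) -> GatewayError:
    """Parse the standard error envelope from a non-2xx response body."""
    envelope = json.loads(body).get("error", {})
    return GatewayError(
        status,
        envelope.get("type", "unknown"),
        envelope.get("message", ""),
        envelope.get("request_id"),
    )


# Envelope in the shape shown above; the request_id is a placeholder.
sample_body = json.dumps({
    "error": {
        "type": "invalid_request",
        "message": "Model field must include a vendor prefix.",
        "request_id": "req_EXAMPLE",
    }
})
err = error_from_response(400, sample_body)
```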

Behavior to know about

COP optimization

Every request runs through COP — our proprietary optimization layer — before reaching the vendor. The effect is a measurable reduction in output token count on most workloads, which translates directly into a lower bill. The optimization is transparent: your system content and messages reach the model intact, and you do not pay for any tokens COP itself contributes. See Pricing for the per-workload savings profile.

Prompt-extraction guard

Every request runs through a fast prompt-extraction guard in parallel with the vendor call. If the guard flags the request (extraction attempt, jailbreak, prompt injection), the request is blocked with 403 request_blocked and the response includes an incident_id. Legitimate traffic is not affected; the guard is calibrated to favor low false-positive rates. If you believe a request was blocked in error, contact support with the incident_id and we will review.

Caching

If the underlying vendor supports prompt caching (Anthropic, OpenAI, DeepSeek, Gemini at V1), our adapter participates in the vendor's cache mechanism where possible. The savings on cache hits — DeepSeek's 98% discount in particular — are passed through to you on the vendor cost line.

Usage and billing

On every successful request, we record:

The token counts the vendor returned.
The vendor's cost (computed from the rate card for the model).
Our charge to you (vendor cost × 1.09).
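As a worked example of the three items above, the sketch below computes a vendor cost from token counts and applies the 1.09 multiplier. The per-million-token rates are made up for illustration and are not taken from any rate card; cache tokens and rounding are left out because this page does not specify how they are handled.

```python
from decimal import Decimal

MARKUP = Decimal("1.09")  # from this page: charge = vendor cost × 1.09


def vendor_cost_usd(input_tokens: int, output_tokens: int,
                    input_rate_per_mtok: Decimal,
                    output_rate_per_mtok: Decimal) -> Decimal:
    """Vendor cost in USD, given per-million-token rates."""
    million = Decimal(1_000_000)
    return (Decimal(input_tokens) * input_rate_per_mtok
            + Decimal(output_tokens) * output_rate_per_mtok) / million


def customer_charge_usd(vendor_cost: Decimal) -> Decimal:
    """Charge to the customer: vendor cost times the markup."""
    return vendor_cost * MARKUP


# Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
cost = vendor_cost_usd(145, 287, Decimal("3"), Decimal("15"))
charge = customer_charge_usd(cost)
```

With these made-up rates, 145 input tokens and 287 output tokens cost (145 × 3 + 287 × 15) / 1,000,000 = $0.00474 at the vendor, and the charge is $0.00474 × 1.09 = $0.0051666.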

These appear in the dashboard at app.unnma.ai/logs and app.unnma.ai/activity. See Pricing for the cost model.

See also

/v1/chat/completions reference: same backend, OpenAI shape.
Error codes.
Rate limits.