`/v1/chat/completions` reference

The OpenAI-shape endpoint. Drop-in compatible with the OpenAI SDK and the OpenAI HTTP API. Use this endpoint if you were previously calling OpenAI directly, or if you prefer the OpenAI request/response shape regardless of underlying vendor.

```plaintext
POST https://api.unnma.ai/v1/chat/completions
```

Authentication

```plaintext
Authorization: Bearer unnma-sk-...
```

See Authentication for key management.

Request body

The request body is the same as the official OpenAI Chat Completions request. We pass through every field unchanged, with one exception: the model field must be prefix-qualified (see below).

Required fields

| Field | Type | Notes |
|---|---|---|
| model | string | Must include a vendor prefix. See Model field. |
| messages | array | OpenAI message format. |

Common optional fields

| Field | Type | Notes |
|---|---|---|
| max_tokens | integer | Per OpenAI. |
| temperature | number | Per vendor. |
| top_p | number | Per vendor. |
| stream | boolean | Stream as SSE. Default false. |
| stop | string or array | Per vendor. |
| presence_penalty, frequency_penalty | number | Per vendor (where supported). |
| tools, tool_choice | array / string or object | Per vendor; passed through unchanged. |
| response_format | object | Per vendor (JSON mode, structured outputs). |
| user | string | Per OpenAI. |

For full field documentation, refer to the OpenAI Chat Completions API reference. Anything not Unnma-specific behaves exactly as OpenAI specifies.
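As an illustration of the request shape described above, here is a minimal sketch that builds (but does not send) a request using only the Python standard library. The helper name `build_request` is hypothetical, and the `Content-Type: application/json` header is an assumption carried over from the OpenAI HTTP API rather than something this page states.

```python
import json
import urllib.request

API_URL = "https://api.unnma.ai/v1/chat/completions"


def build_request(api_key: str, model: str, messages: list, **optional) -> urllib.request.Request:
    """Build a POST request in the OpenAI Chat Completions shape.

    `optional` carries fields such as max_tokens or temperature, which are
    placed in the body as-is. Nothing is sent over the network here.
    """
    body = {"model": model, "messages": messages, **optional}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: JSON content type, as with the OpenAI HTTP API.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` and read the JSON body from the response. The model name must already carry its vendor prefix, for example `openai/gpt-4o`.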

The `model` field

Same prefix discipline as /v1/messages. You must use a prefix-qualified model name; the prefix tells the gateway which vendor to route to.

```plaintext
"model": "openai/gpt-4o"
```

You can route to any of the nine V1 vendors from this endpoint — the shape of the request is OpenAI; the routing is still by prefix:

| Prefix | Vendor |
|---|---|
| openai/ | OpenAI direct |
| anthropic/ | Anthropic direct (mapped to OpenAI shape) |
| gemini/ | Google Gemini direct (mapped to OpenAI shape) |
| together/ | Together AI |
| fireworks/ | Fireworks AI |
| groq/ | Groq |
| xai/ | xAI |
| deepseek/ | DeepSeek |
| openrouter/ | OpenRouter long-tail fallback |

The model identifier after the slash is whatever the vendor expects; each vendor's own documentation has its full model list.

A bare model name without a prefix is rejected with 400 invalid_request:

```json
{
  "error": {
    "type": "invalid_request",
    "message": "Model field must include a vendor prefix. Did you mean 'openai/gpt-4o'?",
    "request_id": "req_01abc..."
  }
}
```
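To catch a missing prefix before the request leaves your process, a client-side pre-check can be sketched as follows. This is only an illustration: `split_model` and `KNOWN_PREFIXES` are hypothetical names, the prefix set is copied from the table above, and the gateway's own validation may differ in detail.

```python
# Prefixes copied from the vendor table; this local copy is an assumption
# and will go stale if the gateway adds or removes vendors.
KNOWN_PREFIXES = frozenset({
    "openai", "anthropic", "gemini", "together", "fireworks",
    "groq", "xai", "deepseek", "openrouter",
})


def split_model(model: str) -> tuple:
    """Split 'vendor/model-id' into (vendor, model-id), or raise ValueError."""
    # Split on the first slash only, so vendor model ids that themselves
    # contain slashes are kept whole.
    prefix, sep, name = model.partition("/")
    if not sep or not name:
        raise ValueError(f"model {model!r} has no vendor prefix")
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unknown vendor prefix {prefix!r}")
    return prefix, name
```

Failing locally saves a round trip, but the 400 response from the gateway remains the authoritative check.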

Response

The response body is in the OpenAI Chat Completions shape, regardless of underlying vendor. When the underlying vendor is Anthropic or Gemini, we map the request and response shapes at the route boundary so your client code stays OpenAI-shaped.

Non-streaming response

```json
{
  "id": "chatcmpl_01abc...",
  "object": "chat.completion",
  "created": 1715900000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 145,
    "completion_tokens": 287,
    "total_tokens": 432
  }
}
```
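For clients that do not use the OpenAI SDK, reading the fields from a non-streaming response like the one above might look like this. It is a minimal sketch: the `Completion` dataclass and `parse_completion` are hypothetical names, and only the first choice is read.

```python
import json
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_completion(raw: str) -> Completion:
    """Extract the first choice and the usage counters from a response body."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data["usage"]
    return Completion(
        # In the OpenAI shape, content can be null (for example on tool
        # calls), so fall back to an empty string.
        text=choice["message"].get("content") or "",
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )
```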

Streaming response

When `stream` is `true`, the response is a Server-Sent Events stream in OpenAI's standard chunk format:

```plaintext
data: {"id":"chatcmpl_01abc...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl_01abc...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

...

data: {"id":"chatcmpl_01abc...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```

Use the OpenAI SDK's stream helpers; they work without modification.
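If you are not using the SDK, the chunk format can be consumed by hand. The sketch below assumes you already have the response as an iterable of decoded text lines; `iter_chunks` and `collect_text` are hypothetical helper names, and only `choices[0]` is accumulated.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each parsed chunk object until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta content of choice 0 across all chunks."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```

In a real client you would usually print or forward each piece as it arrives rather than joining at the end; joining here just keeps the example easy to check.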

Response headers

Standard HTTP headers, plus:

| Header | Meaning |
|---|---|
| x-unnma-request-id | Our internal request ID. Include this when contacting support. |
| x-unnma-vendor | The underlying vendor we routed to (e.g. openai). |
| x-unnma-customer-charge-cents | What we charged your balance for this request, in cents. |
| Retry-After | Present on 429 responses. Seconds to wait before retrying. |
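One way a client might use these headers is to compute a wait time on 429 responses. This is a sketch under stated assumptions: `get_header` and `retry_delay_seconds` are hypothetical helpers, the one-second fallback is an arbitrary choice for illustration, and `Retry-After` is parsed as a plain number of seconds as described in the table.

```python
from typing import Mapping, Optional


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup over a plain mapping."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def retry_delay_seconds(status: int, headers: Mapping[str, str],
                        default: float = 1.0) -> Optional[float]:
    """Return how long to wait before retrying, or None if not a 429."""
    if status != 429:
        return None
    raw = get_header(headers, "Retry-After")
    if raw is None:
        return default  # assumption: fall back when the header is absent
    try:
        return max(0.0, float(raw))
    except ValueError:
        return default  # unparseable value; use the fallback
```

The same `get_header` lookup works for `x-unnma-request-id` when you need to quote a request ID to support.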

Errors

Errors follow the standard envelope shared with /v1/messages. Full list of types, status codes, and client guidance: Error codes.
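As a rough sketch of handling that envelope on the client, the following turns an error response into an exception object. It assumes the envelope fields shown in the 400 example earlier on this page (`type`, `message`, `request_id`); the class `UnnmaAPIError` and the function `parse_error` are hypothetical names, not part of any SDK.

```python
import json
from typing import Optional


class UnnmaAPIError(Exception):
    """Carries the fields of the error envelope plus the HTTP status."""

    def __init__(self, status: int, error_type: str, message: str,
                 request_id: Optional[str]):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message
        self.request_id = request_id


def parse_error(status: int, body: str) -> Optional[UnnmaAPIError]:
    """Return an UnnmaAPIError for 4xx/5xx responses, else None."""
    if status < 400:
        return None
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {}
    return UnnmaAPIError(
        status,
        err.get("type", "unknown"),
        err.get("message", body),
        err.get("request_id"),
    )
```

A caller can raise the returned object directly and log `request_id` alongside it, which is the value support will ask for.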

Behavior to know about

The platform behaviors — COP optimization, the prompt-extraction guard, prompt caching where supported by the underlying vendor, the usage and billing pipeline — all apply identically to /v1/chat/completions and /v1/messages. The only difference between the two endpoints is the request and response shape the client interacts with. The same backend serves both.

If you want a deeper read on those behaviors, see /v1/messages reference — the "Behavior to know about" section is endpoint-agnostic.

/v1/messages reference — same backend, Anthropic shape.
Error codes.
Rate limits.