API Documentation

Build with the Stansa API — OpenAI-compatible chat completions at competitive pricing.

Introduction

The Stansa API provides OpenAI-compatible endpoints for chat completions. You can use any OpenAI SDK or HTTP client to interact with the API by changing the base URL.

Base URL

https://stansa.ai/v1

All API requests should be made to this base URL. The API follows the OpenAI chat completions format, so existing code that uses the OpenAI SDK can be pointed at Stansa by changing only the base URL and the API key.

Getting Started

To get started, visit the API dashboard and log in to your stansa.ai account.

Authentication

Authenticate requests using an API key passed in the Authorization header as a Bearer token. You can create and manage API keys from your Developer Dashboard.

Authorization: Bearer sk-stansa-xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keep your API key secret. Do not expose it in client-side code or public repositories.
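
One way to keep the key out of source code is to read it from the environment when building the header. The following is a minimal Python sketch; the environment variable name STANSA_API_KEY and the helper auth_headers are illustrative assumptions, not part of the Stansa API.

```python
import os


def auth_headers(api_key=None):
    """Return request headers that carry the API key as a Bearer token."""
    # STANSA_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("STANSA_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set STANSA_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Passing the key explicitly is useful in tests; in deployed code the environment lookup keeps the key out of the repository.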

Models

Use GET /v1/models to list all available models programmatically. Below are the currently active models with their pricing per 1M tokens.

Model	Category	Input / 1M tokens	Output / 1M tokens
gpt-4.1-unfiltered	premium	$2.44	$9.75
stansa-4.0	premium	$3.75	$17.00
stansa-g4	premium	$1.75	$6.50
gpt-4.1-mini-unfiltered	standard	$0.50	$2.00
stansa-d1-mini	standard	$0.40	$1.50
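
The per-token arithmetic implied by the table can be sketched in Python as follows. The prices are copied from the table; the helper name estimate_cost_usd is an assumption for illustration, and the result is an estimate only, since actual charges are computed by the API.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "gpt-4.1-unfiltered": (Decimal("2.44"), Decimal("9.75")),
    "stansa-4.0": (Decimal("3.75"), Decimal("17.00")),
    "stansa-g4": (Decimal("1.75"), Decimal("6.50")),
    "gpt-4.1-mini-unfiltered": (Decimal("0.50"), Decimal("2.00")),
    "stansa-d1-mini": (Decimal("0.40"), Decimal("1.50")),
}

MILLION = Decimal(1_000_000)


def estimate_cost_usd(model, prompt_tokens, completion_tokens):
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES[model]
    total = prompt_tokens * input_price + completion_tokens * output_price
    return total / MILLION
```

Decimal is used instead of float so that small per-request amounts do not pick up binary rounding noise.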

Chat Completions

Create a chat completion by sending a POST request to /v1/chat/completions.

Request

POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-stansa-...

{
  "model": "stansa-g3",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Parameters

Parameter	Type	Required	Description
model	string	Yes	Model ID (e.g. "stansa-g4"); GET /v1/models lists the valid IDs
messages	array	Yes	Array of message objects with role and content
temperature	number	No	Sampling temperature (0-2)
max_tokens	integer	No	Maximum tokens to generate
top_p	number	No	Nucleus sampling (0-1)
stream	boolean	No	Enable streaming (default: false)
stop	string | array	No	Stop sequences (max 4)
presence_penalty	number	No	Presence penalty (-2 to 2)
frequency_penalty	number	No	Frequency penalty (-2 to 2)
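
The request format above can be assembled with the Python standard library alone. This is a sketch under stated assumptions: the helper names build_chat_request and send are hypothetical, optional parameters are passed through unvalidated, and the model ID "stansa-g4" in the usage note is taken from the Models table.

```python
import json
import urllib.request

BASE_URL = "https://stansa.ai/v1"


def build_chat_request(api_key, model, messages, **options):
    """Build a POST request for /chat/completions without sending it."""
    # Optional parameters (temperature, max_tokens, top_p, ...) go in as given.
    payload = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call might look like send(build_chat_request(key, "stansa-g4", [{"role": "user", "content": "Hello!"}], temperature=0.7)). Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so error statuses surface as exceptions rather than return values.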

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708000000,
  "model": "stansa-g3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33
  }
}
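
Reading the reply out of this structure takes only a few lookups. The sketch below assumes a single choice, as in the sample response; the helper name extract_reply is illustrative.

```python
def extract_reply(body):
    """Return (text, finish_reason, total_tokens) from a chat completion body."""
    choice = body["choices"][0]  # assumes at least one choice is present
    text = choice["message"]["content"]
    return text, choice["finish_reason"], body["usage"]["total_tokens"]
```

The usage object is worth logging alongside the text, because credits are deducted from these token counts.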

Streaming

Set stream: true to receive responses as Server-Sent Events (SSE). Each event contains a JSON chunk with incremental content.

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708000000,"model":"stansa-g3","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708000000,"model":"stansa-g3","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708000000,"model":"stansa-g3","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]

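The stream ends with a data: [DONE] message. To build the full response, read choices[0].delta.content from each chunk and concatenate the values in order. The first chunk carries only the role, so it has no content field.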
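
The concatenation step can be sketched in Python as follows. The function name collect_stream is an assumption; it accepts any iterable of SSE lines (text or bytes), for example the lines of an open HTTP response, and reads only the first choice of each chunk.

```python
import json


def collect_stream(lines):
    """Concatenate delta.content values from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        choices = json.loads(data).get("choices") or [{}]
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)
```

For interactive output, the same loop can print each content fragment as it arrives instead of collecting the fragments in a list.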

Credits & Billing

Stansa uses a credit-based billing system. 1 credit = $0.001 (1,000 credits per $1). Credits are deducted based on actual token usage after each request completes.

Purchase credits and view your balance, usage history, and billing details from your Developer Dashboard. If your balance is insufficient, API requests return a 402 status code.
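
The conversion between credits and dollars is a fixed factor of 1,000, sketched below. The helper names are illustrative, and how the API rounds fractional credits is not documented here, so the functions return exact Decimal values without rounding.

```python
from decimal import Decimal

CREDITS_PER_USD = Decimal(1000)  # 1 credit = $0.001


def usd_to_credits(usd):
    """Convert a USD amount (str, int, or Decimal) to credits."""
    return Decimal(str(usd)) * CREDITS_PER_USD


def credits_to_usd(credits):
    """Convert a credit amount (str, int, or Decimal) to USD."""
    return Decimal(str(credits)) / CREDITS_PER_USD
```

For example, a request that costs $20.75 in tokens corresponds to 20,750 credits.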

Rate Limits

API requests are rate-limited to ensure fair usage across all users.

Limit	Value
Requests per minute (RPM)	60
Requests per day (RPD)	10,000

Rate limit status is included in response headers:

Header	Description
X-RateLimit-Limit-Requests	RPM limit
X-RateLimit-Remaining-Requests	Remaining requests this minute
X-RateLimit-Reset-Requests	Minute window reset (Unix timestamp)
X-RateLimit-Limit-Requests-Day	RPD limit
X-RateLimit-Remaining-Requests-Day	Remaining requests today
X-RateLimit-Reset-Requests-Day	Day window reset (Unix timestamp)

When a limit is exceeded, the API returns 429 Too Many Requests with a Retry-After header giving the number of seconds to wait before retrying.
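
A client can use these headers to decide how long to pause. The sketch below assumes headers is a mapping keyed by the exact header names shown above (a case-insensitive mapping such as the one returned by Python's HTTP libraries also works); the function names and the one-second fallback are assumptions.

```python
def retry_delay(status, headers, default=1.0):
    """Return seconds to wait before retrying, or None if not rate-limited."""
    if status != 429:
        return None
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default  # header missing or not a number


def remaining_requests(headers):
    """Return the requests left in the current minute, or None if unknown."""
    value = headers.get("X-RateLimit-Remaining-Requests")
    return int(value) if value is not None else None
```

Checking remaining_requests before sending a burst lets a client slow down before it ever receives a 429.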

Idempotency

For non-streaming requests, you can pass an Idempotency-Key header to safely retry requests without duplicate processing or credit charges.

POST /v1/chat/completions
Authorization: Bearer sk-stansa-...
Idempotency-Key: my-unique-key-123
Content-Type: application/json

{ ... }

If a request arrives with an idempotency key and request body that match an earlier request, the cached response is returned without additional credit charges. Idempotency keys are valid for 24 hours.

Reusing an idempotency key with a different request body returns 409 Conflict.
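
In practice this means generating one key per logical request and reusing it, with the same body, on every retry. The sketch below uses a random UUID as the key; the required key format is not specified here, so the UUID choice and the helper names are assumptions.

```python
import uuid


def new_idempotency_key():
    """Generate a fresh key for one logical request."""
    return str(uuid.uuid4())


def idempotent_headers(api_key, idempotency_key):
    """Headers for a non-streaming request that may be retried safely."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key,
    }
```

Create the key once, before the first attempt, and pass the same value to idempotent_headers on each retry. Generating a new key per attempt would defeat the purpose, and changing the body under the same key returns 409 Conflict.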

Error Handling

Errors follow the OpenAI error format:

{
  "error": {
    "message": "Descriptive error message",
    "type": "error_type",
    "code": "error_code"
  }
}

Status Codes

Code	Type	Description
400	invalid_request_error	Missing or invalid request parameters
401	authentication_error	Invalid or missing API key
402	insufficient_quota	Insufficient credits
404	not_found	Resource not found
409	idempotency_error	Idempotency key reused with different body
429	rate_limit_error	Rate limit exceeded
500	api_error	Internal server error
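
Client code can turn these responses into a single exception type. The sketch below is illustrative: StansaAPIError, parse_error, and is_retryable are hypothetical names, and treating only 429 and 500 as retryable is an assumption rather than documented behavior.

```python
import json


class StansaAPIError(Exception):
    """Hypothetical exception carrying the fields of the error object."""

    def __init__(self, status, message, error_type=None, code=None):
        super().__init__(f"{status} {error_type or 'error'}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type
        self.code = code


def parse_error(status, body_text):
    """Build a StansaAPIError from an HTTP status and response body text."""
    try:
        error = json.loads(body_text)["error"]
    except (ValueError, KeyError, TypeError):
        error = {}  # body was not JSON or had no error object
    if not isinstance(error, dict):
        error = {}
    return StansaAPIError(
        status,
        error.get("message", "Unknown error"),
        error.get("type"),
        error.get("code"),
    )


def is_retryable(status):
    # Assumption: only rate limits (429) and server errors (500) merit a retry.
    return status in (429, 500)
```

Errors such as 400, 401, and 402 indicate a problem with the request or the account, so retrying them unchanged is unlikely to help.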

SDK Examples

Since the API is OpenAI-compatible, you can use the official OpenAI SDKs by changing the base URL.

curl https://stansa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-stansa-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stansa-g3",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
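
The same request through the official OpenAI Python SDK might look like the sketch below. It assumes the openai package (version 1.0 or later) is installed; the helper names client_settings and ask and the environment variable STANSA_API_KEY are illustrative, and the model ID "stansa-g4" is taken from the Models table.

```python
import os

STANSA_BASE_URL = "https://stansa.ai/v1"


def client_settings(api_key=None):
    """Keyword arguments that point an OpenAI client at the Stansa API."""
    # STANSA_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("STANSA_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set STANSA_API_KEY")
    return {"base_url": STANSA_BASE_URL, "api_key": key}


def ask(prompt, model="stansa-g4", api_key=None):
    """Send one user message and return the assistant's reply text."""
    from openai import OpenAI  # third-party package: pip install openai

    client = OpenAI(**client_settings(api_key))
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Only the base_url and api_key arguments differ from a stock OpenAI setup; the rest of the call is unchanged.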

Support

Need help? Reach out to us:

Email: contact@stansa.ai
FAQ: stansa.ai/faq