API Documentation

Build with the Stansa API — OpenAI-compatible chat completions at competitive pricing.

Introduction

The Stansa API provides OpenAI-compatible endpoints for chat completions. You can use any OpenAI SDK or HTTP client to interact with the API by changing the base URL.

Base URL

https://stansa.ai/v1

All API requests should be made to this base URL. The API follows the OpenAI chat completions format, so existing code using the OpenAI SDK can be pointed to Stansa with minimal changes.

Getting Started

To get started, log in to your stansa.ai account and open the Developer Dashboard, where you can create an API key.

Authentication

Authenticate requests using an API key passed in the Authorization header as a Bearer token. You can create and manage API keys from your Developer Dashboard.

Authorization: Bearer sk-stansa-xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keep your API key secret. Do not expose it in client-side code or public repositories.
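As an illustration of the header format above, here is a minimal Python sketch that builds the request headers. The environment variable name STANSA_API_KEY and the helper name auth_headers are assumptions made for this example and are not part of the Stansa API. Reading the key from the environment is one common way to keep it out of source code.

```python
import os


def auth_headers(api_key=None):
    """Return HTTP headers for an authenticated JSON request."""
    # STANSA_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("STANSA_API_KEY")
    if not key:
        raise ValueError("no API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```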

Models

Use GET /v1/models to list all available models programmatically. Below are the currently active models with their pricing per 1M tokens.

Model                   | Category | Input / 1M tokens | Output / 1M tokens
gpt-4.1-unfiltered      | premium  | $2.44             | $9.75
stansa-4.0              | premium  | $3.75             | $17.00
stansa-g4               | premium  | $1.75             | $6.50
gpt-4.1-mini-unfiltered | standard | $0.50             | $2.00
stansa-d1-mini          | standard | $0.40             | $1.50
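To show how the per-1M-token prices translate into the cost of a single request, here is a rough Python sketch. The prices are copied from the table above. The function name estimate_cost_usd is hypothetical, and any rounding that Stansa applies when billing is not documented here, so treat the result as an estimate.

```python
from decimal import Decimal

# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "gpt-4.1-unfiltered": (Decimal("2.44"), Decimal("9.75")),
    "stansa-4.0": (Decimal("3.75"), Decimal("17.00")),
    "stansa-g4": (Decimal("1.75"), Decimal("6.50")),
    "gpt-4.1-mini-unfiltered": (Decimal("0.50"), Decimal("2.00")),
    "stansa-d1-mini": (Decimal("0.40"), Decimal("1.50")),
}

MILLION = Decimal(1_000_000)


def estimate_cost_usd(model, prompt_tokens, completion_tokens):
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES[model]
    total = prompt_tokens * input_price + completion_tokens * output_price
    return total / MILLION
```

For example, a request with 24 prompt tokens and 9 completion tokens on stansa-g4 comes to (24 × 1.75 + 9 × 6.50) / 1,000,000 = $0.0001005.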

Chat Completions

Create a chat completion by sending a POST request to /v1/chat/completions.

Request

POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-stansa-...

{
  "model": "stansa-g3",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Parameters

Parameter         | Type           | Required | Description
model             | string         | Yes      | Model ID (e.g. "stansa-g3")
messages          | array          | Yes      | Array of message objects with role and content
temperature       | number         | No       | Sampling temperature (0-2)
max_tokens        | integer        | No       | Maximum tokens to generate
top_p             | number         | No       | Nucleus sampling (0-1)
stream            | boolean        | No       | Enable streaming (default: false)
stop              | string | array | No       | Stop sequences (max 4)
presence_penalty  | number         | No       | Presence penalty (-2 to 2)
frequency_penalty | number         | No       | Frequency penalty (-2 to 2)
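The request format and parameter ranges above can be sketched in Python using only the standard library. This is an illustrative helper, not an official client: the name build_chat_request is hypothetical, and the client-side range checks simply mirror the table (the server may validate differently). The function builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://stansa.ai/v1"


def build_chat_request(api_key, model, messages, **options):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    # Client-side checks that mirror the documented parameter ranges.
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    top_p = options.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are allowed")

    payload = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send the request. Any HTTP client works equally well, since the body is plain JSON.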

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708000000,
  "model": "stansa-g3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33
  }
}
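Reading the fields you usually need from this response takes only a few lines. The sketch below assumes the body has exactly the shape shown above; the helper name extract_reply is hypothetical.

```python
import json


def extract_reply(body):
    """Return (content, finish_reason, total_tokens) from a response body."""
    data = json.loads(body)
    choice = data["choices"][0]  # first (and here, only) completion
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        data["usage"]["total_tokens"],
    )
```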

Streaming

Set stream: true to receive responses as Server-Sent Events (SSE). Each event contains a JSON chunk with incremental content.

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708000000,"model":"stansa-g3","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708000000,"model":"stansa-g3","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708000000,"model":"stansa-g3","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]

The stream ends with a data: [DONE] message. To build the full response, concatenate the delta.content field of each chunk in the order received.
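The concatenation step can be sketched as follows. This Python example takes any iterable of SSE lines (for instance, the lines of an HTTP response body) and assumes each data: payload is a single JSON chunk shaped like the ones above. The function name collect_stream_text is hypothetical.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

The first chunk carries only the role, so it contributes no text. The remaining chunks are joined in arrival order.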

Credits & Billing

Stansa uses a credit-based billing system. 1 credit = $0.001 (1,000 credits per $1). Credits are deducted based on actual token usage after each request completes.

Purchase credits and view your balance, usage history, and billing details from your Developer Dashboard. If your balance is insufficient, API requests return a 402 status code.
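The conversion between dollars and credits is simple arithmetic. Here is a small Python sketch of it; the function names are hypothetical, and how Stansa rounds fractional credits is not specified in this document, so no rounding is applied.

```python
from decimal import Decimal

CREDITS_PER_USD = Decimal(1000)  # 1 credit = $0.001


def usd_to_credits(usd):
    """Convert a dollar amount to credits (no rounding applied)."""
    return Decimal(str(usd)) * CREDITS_PER_USD


def credits_to_usd(credits):
    """Convert a credit amount back to dollars."""
    return Decimal(str(credits)) / CREDITS_PER_USD
```

For example, $0.40 (the listed price of 1M input tokens on stansa-d1-mini) corresponds to 400 credits.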

Rate Limits

API requests are rate-limited to ensure fair usage across all users.

Limit                     | Value
Requests per minute (RPM) | 60
Requests per day (RPD)    | 10,000

Rate limit status is included in response headers:

Header                             | Description
X-RateLimit-Limit-Requests         | RPM limit
X-RateLimit-Remaining-Requests     | Remaining requests this minute
X-RateLimit-Reset-Requests         | Minute window reset (Unix timestamp)
X-RateLimit-Limit-Requests-Day     | RPD limit
X-RateLimit-Remaining-Requests-Day | Remaining requests today
X-RateLimit-Reset-Requests-Day     | Day window reset (Unix timestamp)

When rate-limited, the API returns 429 Too Many Requests with a Retry-After header indicating seconds to wait.
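One way a client might decide how long to wait after a 429 is sketched below in Python. It prefers Retry-After and, as an assumption of this sketch, falls back to the minute-window reset timestamp if that header is absent. The function name seconds_until_retry and the 1-second final fallback are arbitrary choices, not documented behavior.

```python
import time


def seconds_until_retry(status, headers, now=None):
    """Return how many seconds to wait before retrying a response."""
    if status != 429:
        return 0.0
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in normalized:
        return float(normalized["retry-after"])
    reset = normalized.get("x-ratelimit-reset-requests")
    if reset is not None:
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # arbitrary fallback chosen for this sketch
```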

Idempotency

For non-streaming requests, you can pass an Idempotency-Key header to safely retry requests without duplicate processing or credit charges.

POST /v1/chat/completions
Authorization: Bearer sk-stansa-...
Idempotency-Key: my-unique-key-123
Content-Type: application/json

{ ... }

If a second request arrives with the same idempotency key and the same request body, the cached response is returned without additional credit charges. Idempotency keys are valid for 24 hours.

Reusing an idempotency key with a different request body returns 409 Conflict.
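A retry loop that takes advantage of this might look like the Python sketch below. The key point is that the key is generated once and reused unchanged on every attempt, with the same body. The send callable, the function name send_with_retries, and the choice to retry only on ConnectionError are assumptions made for illustration; using a UUID as the key is likewise a convention, not a requirement stated here.

```python
import uuid


def send_with_retries(send, headers, body, attempts=3):
    """Call send(headers, body) up to `attempts` times with one shared key."""
    # Generate the key once so every retry is recognized as the same request.
    key = str(uuid.uuid4())
    request_headers = {**headers, "Idempotency-Key": key}
    last_error = None
    for _ in range(attempts):
        try:
            return send(request_headers, body)
        except ConnectionError as exc:  # e.g. the connection dropped mid-request
            last_error = exc
    raise last_error
```

Because the body must also be identical across attempts, serialize the JSON payload once and pass the same bytes each time.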

Error Handling

Errors follow the OpenAI error format:

{
  "error": {
    "message": "Descriptive error message",
    "type": "error_type",
    "code": "error_code"
  }
}

Status Codes

Code | Type                  | Description
400  | invalid_request_error | Missing or invalid request parameters
401  | authentication_error  | Invalid or missing API key
402  | insufficient_quota    | Insufficient credits
404  | not_found             | Resource not found
409  | idempotency_error     | Idempotency key reused with different body
429  | rate_limit_error      | Rate limit exceeded
500  | api_error             | Internal server error
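On the client side, the error body and status code can be turned into an exception along the following lines. This is a sketch: the class name StansaAPIError and the function parse_error are hypothetical, and treating 429 and 500 as retryable is an assumption of this example rather than guidance from the API.

```python
import json

# Assumption for this sketch: only rate-limit and server errors are retried.
RETRYABLE_STATUSES = {429, 500}


class StansaAPIError(Exception):
    """Hypothetical exception type carrying the documented error fields."""

    def __init__(self, status, error_type, code, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.code = code
        self.retryable = status in RETRYABLE_STATUSES


def parse_error(status, body):
    """Build a StansaAPIError from an HTTP status and a response body."""
    try:
        parsed = json.loads(body)
        err = parsed.get("error", {}) if isinstance(parsed, dict) else {}
    except ValueError:
        err = {}  # body was not valid JSON
    if not isinstance(err, dict):
        err = {}
    return StansaAPIError(
        status, err.get("type"), err.get("code"), err.get("message", "")
    )
```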

SDK Examples

Since the API is OpenAI-compatible, you can use the official OpenAI SDKs by changing the base URL.

curl https://stansa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-stansa-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stansa-g3",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Support

Need help? Reach out to us:
