Models & Providers
The Kilo AI Gateway provides access to hundreds of AI models from multiple providers through a single unified API. You can switch between models by changing the model ID string; no code changes are required.
Specifying a model
Models are identified using the format `provider/model-name`. Pass this as the `model` parameter in your request:
```typescript
const result = streamText({
  model: kilo.chat("anthropic/claude-sonnet-4.6"),
  prompt: "Hello!",
});
```
Or in a raw API request:
```json
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```
Available models
You can browse the full list of available models via the models endpoint:
GET https://api.kilo.ai/api/gateway/models
This returns model information including pricing, context window, and supported features. No authentication is required.
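Since the endpoint returns an OpenAI-compatible list, you can filter it client-side. The sketch below picks out the free-tier entries by their `:free` ID suffix; the `{ data: [...] }` response shape follows the OpenAI convention, and any fields beyond `id` are assumptions.

```typescript
// Illustrative: filter the gateway's model list for free-tier entries.
interface ModelEntry {
  id: string;
  context_length?: number; // assumed metadata field
}

function freeModelIds(models: ModelEntry[]): string[] {
  // Free variants carry a `:free` suffix on the model ID.
  return models.filter((m) => m.id.endsWith(":free")).map((m) => m.id);
}

// Usage (no auth header needed for this endpoint):
// const res = await fetch("https://api.kilo.ai/api/gateway/models");
// const { data } = await res.json();
// console.log(freeModelIds(data));
```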
Popular models
| Model ID | Provider | Description |
|---|---|---|
| anthropic/claude-opus-4.6 | Anthropic | Most capable Claude model for complex reasoning |
| anthropic/claude-sonnet-4.6 | Anthropic | Balanced performance and cost |
| anthropic/claude-haiku-4.5 | Anthropic | Fast and cost-effective |
| openai/gpt-5.2 | OpenAI | Latest GPT model |
| google/gemini-3-pro-preview | Google | Advanced reasoning with 1M context |
| google/gemini-3-flash-preview | Google | Fast and efficient |
| x-ai/grok-code-fast-1 | xAI | Optimized for code tasks |
| moonshotai/kimi-k2.5 | Moonshot | Strong multilingual model |
Free models
Several models are available at no cost, subject to rate limits:
| Model ID | Description |
|---|---|
| minimax/minimax-m2.1:free | MiniMax M2.1 |
| z-ai/glm-5:free | Z.AI GLM-5 |
| giga-potato | Community model |
| corethink:free | CoreThink reasoning model |
| arcee-ai/trinity-large-preview:free | Arcee Trinity |
Free models are available to both authenticated and anonymous users. Anonymous users are rate-limited to 200 requests per hour per IP address.
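For anonymous clients it can help to track request volume locally so you back off before hitting the server-side limit. This is a minimal sliding-window counter sketch; the gateway enforces the real 200/hour limit per IP on its side, and nothing here is part of any Kilo SDK.

```typescript
// Illustrative client-side sliding-window counter for the anonymous
// 200 requests/hour limit. The server remains the source of truth.
class RequestWindow {
  private timestamps: number[] = [];

  constructor(
    private limit = 200,
    private windowMs = 60 * 60 * 1000, // one hour
  ) {}

  // Returns true if another request may be sent now, and records it.
  tryAcquire(now = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```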
Auto models
Kilo Auto virtual models automatically select the best underlying model based on the task type. The selection is controlled by the `x-kilocode-mode` request header.
kilo-auto/frontier
Routes to the most capable paid models, balancing cost, performance, and capabilities.
| Mode | Resolved Model |
|---|---|
| plan, general, architect, orchestrator, ask, debug | anthropic/claude-opus-4.6 |
| build, explore, code | anthropic/claude-sonnet-4.6 |
| Default (no mode specified) | anthropic/claude-sonnet-4.6 |
kilo-auto/balanced
Follows the same mode-based routing as Frontier but uses more cost-effective models.
| Mode | Resolved Model |
|---|---|
| plan, general, architect, orchestrator, ask, debug | moonshotai/kimi-k2.5 |
| build, explore, code | minimax/minimax-m2.7 |
| Default (no mode specified) | minimax/minimax-m2.7 |
kilo-auto/free
The best available free model for each mode.
| Mode | Resolved Model |
|---|---|
| All modes | minimax/minimax-m2.5:free |
| Default (no mode specified) | minimax/minimax-m2.5:free |
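The mode-based routing in the tables above can be sketched as a lookup function. The mapping below simply mirrors this page; the actual resolution happens server-side, and the function name is illustrative.

```typescript
// Illustrative: resolve a kilo-auto virtual model to its underlying model,
// mirroring the mode tables documented above.
type AutoModel = "kilo-auto/frontier" | "kilo-auto/balanced" | "kilo-auto/free";

const PLANNING_MODES = new Set([
  "plan", "general", "architect", "orchestrator", "ask", "debug",
]);

function resolveAutoModel(model: AutoModel, mode?: string): string {
  const planning = mode !== undefined && PLANNING_MODES.has(mode);
  switch (model) {
    case "kilo-auto/frontier":
      // build/explore/code and the no-mode default both resolve to Sonnet.
      return planning ? "anthropic/claude-opus-4.6" : "anthropic/claude-sonnet-4.6";
    case "kilo-auto/balanced":
      return planning ? "moonshotai/kimi-k2.5" : "minimax/minimax-m2.7";
    case "kilo-auto/free":
      return "minimax/minimax-m2.5:free";
  }
}
```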
Example usage
```json
{
  "model": "kilo-auto/frontier",
  "messages": [{ "role": "user", "content": "Help me design a database schema" }]
}
```
With the mode header:
```shell
curl -X POST "https://api.kilo.ai/api/gateway/chat/completions" \
  -H "Authorization: Bearer $KILO_API_KEY" \
  -H "x-kilocode-mode: plan" \
  -H "Content-Type: application/json" \
  -d '{"model": "kilo-auto/balanced", "messages": [{"role": "user", "content": "Design a database schema"}]}'
```
Providers
The gateway routes requests to the appropriate provider based on the model and your configuration:
| Provider | Slug | Description |
|---|---|---|
| OpenRouter | openrouter | Primary gateway for most models |
| Vercel AI Gateway | vercel | BYOK routing and select A/B testing |
| Mistral | mistral | FIM completions (Codestral) |
| xAI | x-ai | Grok models (direct) |
| MiniMax | minimax | MiniMax models (direct) |
| CoreThink | corethink | CoreThink reasoning model |
| Inception | inception | InceptionLabs models |
| Martian | martian | Optimized xAI models |
| StreamLake | streamlake | KAT-Coder models |
Provider routing
The gateway uses the following priority for routing requests:
1. BYOK check: If you have a BYOK key for the model's provider, the request is routed through Vercel AI Gateway using your key.
2. Free model routing: If the model is a Kilo-hosted free model, it is routed to its designated provider.
3. Default routing: All other requests go through OpenRouter.
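That three-step priority can be sketched as a small function. The `byokProviders` set and `freeModelProviders` map below are illustrative stand-ins for the gateway's internal configuration, not a real API.

```typescript
// Illustrative: the gateway's routing priority, in order.
function routeRequest(
  model: string,
  byokProviders: Set<string>,           // providers the user has BYOK keys for
  freeModelProviders: Map<string, string>, // free model ID -> designated provider
): string {
  const provider = model.split("/")[0];
  // 1. BYOK check: route through Vercel AI Gateway using the user's key.
  if (byokProviders.has(provider)) return "vercel";
  // 2. Free model routing: Kilo-hosted free models go to their designated provider.
  const designated = freeModelProviders.get(model);
  if (designated !== undefined) return designated;
  // 3. Default routing: everything else goes through OpenRouter.
  return "openrouter";
}
```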
Preferred inference providers
For models available through multiple providers, the gateway may use a preferred provider for better performance:
| Model Family | Preferred Provider |
|---|---|
| Anthropic models | Amazon Bedrock |
| MiniMax models | MiniMax (direct) |
| Mistral models | Mistral (direct) |
| Moonshot models | Moonshot AI (direct) |
These preferences are sent as hints to OpenRouter, which may override them based on availability and load.
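A preference hint of this kind can be attached client-side too. OpenRouter accepts a `provider.order` preference in the request body; the `PREFERRED` map below simply mirrors the table above, and the helper name is illustrative.

```typescript
// Illustrative: attach a preferred-provider hint based on the model family.
const PREFERRED: Record<string, string> = {
  anthropic: "Amazon Bedrock",
  minimax: "MiniMax",
  mistral: "Mistral",
  moonshotai: "Moonshot AI",
};

function withProviderHint(body: { model: string; [k: string]: unknown }) {
  const family = body.model.split("/")[0];
  const preferred = PREFERRED[family];
  // A hint only: the upstream router may still pick another provider
  // based on availability and load.
  return preferred ? { ...body, provider: { order: [preferred] } } : body;
}
```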
Listing models
Models endpoint
GET https://api.kilo.ai/api/gateway/models
Returns an OpenAI-compatible list of all available models with metadata including pricing, context window, and capabilities.
Providers endpoint
GET https://api.kilo.ai/api/gateway/providers
Returns a list of all available inference providers.
Models by provider
GET https://api.kilo.ai/api/gateway/models-by-provider
Returns models grouped by their provider, useful for building model selection interfaces.
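If you only have the flat `/models` list, the same grouping can be rebuilt client-side from the `provider/model-name` ID convention. This is a sketch, not the endpoint's exact response shape.

```typescript
// Illustrative: group a flat list of model IDs by their provider prefix.
function groupByProvider(ids: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const id of ids) {
    // IDs without a slash (e.g. some community models) get a fallback bucket.
    const provider = id.includes("/") ? id.split("/")[0] : "unknown";
    const bucket = groups.get(provider) ?? [];
    bucket.push(id);
    groups.set(provider, bucket);
  }
  return groups;
}
```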