Models & Providers

The Kilo AI Gateway provides access to hundreds of AI models through a single unified API. To switch models, change the model ID string; no other code changes are required.

Specifying a model

Models are identified using the format provider/model-name. Pass this as the model parameter in your request:

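import { streamText } from "ai"

// `kilo` is assumed to be a Kilo Gateway provider instance you have already created
const result = streamText({
  model: kilo.chat("anthropic/claude-sonnet-4.6"),
  prompt: "Hello!",
})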

Or in a raw API request:

{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
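The same request can be sent from TypeScript with the standard `fetch` API. The following is a minimal sketch. It assumes the OpenAI-compatible `/chat/completions` endpoint from the curl example at the end of this page, and it takes the API key as an argument. `isValidModelId`, `buildChatRequest` and `chat` are hypothetical helper names, not part of any Kilo SDK. Switching models only changes the `model` string.

```typescript
// Endpoint taken from the curl example on this page.
const GATEWAY_URL = "https://api.kilo.ai/api/gateway/chat/completions";

interface ChatRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Model IDs use the provider/model-name format, e.g. "anthropic/claude-sonnet-4.6".
// This only checks that both parts around the first "/" are non-empty.
function isValidModelId(model: string): boolean {
  const slash = model.indexOf("/");
  return slash > 0 && slash < model.length - 1;
}

// Builds fetch options; switching models means passing a different `model` string.
function buildChatRequest(model: string, prompt: string, apiKey: string): ChatRequest {
  if (!isValidModelId(model)) {
    throw new Error(`Expected provider/model-name, got "${model}"`);
  }
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

// Sends the request and returns the parsed JSON (the response shape is not modeled here).
async function chat(model: string, prompt: string, apiKey: string): Promise<unknown> {
  const res = await fetch(GATEWAY_URL, buildChatRequest(model, prompt, apiKey));
  if (!res.ok) {
    throw new Error(`Gateway returned ${res.status}: ${await res.text()}`);
  }
  return res.json();
}
```

For example, `chat("openai/gpt-5.4-mini", "Hello!", apiKey)` targets a different model using the same code path.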

Available models

You can browse the full list of available models via the models endpoint:

GET https://api.kilo.ai/api/gateway/models

This returns model information including pricing, context window, and supported features. No authentication is required.

| Model ID | Provider | Description |
|---|---|---|
| anthropic/claude-opus-4.7 | Anthropic | Most capable Claude model for complex reasoning |
| anthropic/claude-sonnet-4.6 | Anthropic | Balanced performance and cost |
| anthropic/claude-haiku-4.5 | Anthropic | Fast and cost-effective |
| openai/gpt-5.4 | OpenAI | Latest GPT model |
| openai/gpt-5.4-mini | OpenAI | Fast and efficient |
| google/gemini-3.1-pro-preview | Google | Advanced reasoning |
| google/gemini-2.5-flash | Google | Fast and efficient |
| x-ai/grok-4 | xAI | Most capable Grok model |
| x-ai/grok-code-fast-1 | xAI | Optimized for code tasks |
| deepseek/deepseek-v3.2 | DeepSeek | Strong coding and reasoning model |
| moonshotai/kimi-k2.5 | Moonshot | Strong coding and multilingual model |
| minimax/minimax-m2.7 | MiniMax | High-performance MoE model |
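The following sketch queries the models endpoint from TypeScript. This page does not document the response schema, so the code assumes an OpenAI-style `{ data: [{ id, ... }] }` envelope. Check a real response before relying on the field names. `extractModelIds`, `filterByProvider` and `listModelIds` are hypothetical helpers.

```typescript
const MODELS_URL = "https://api.kilo.ai/api/gateway/models";

// ASSUMPTION: the endpoint returns { data: [{ id: string, ... }] }.
// Entries without a string `id` are skipped rather than failing the whole list.
function extractModelIds(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) return [];
  const data = (payload as { data?: unknown }).data;
  if (!Array.isArray(data)) return [];
  const ids: string[] = [];
  for (const entry of data) {
    const id = (entry as { id?: unknown } | null)?.id;
    if (typeof id === "string") ids.push(id);
  }
  return ids;
}

// Keeps only models from one provider, using the provider/ prefix of the ID.
function filterByProvider(ids: string[], provider: string): string[] {
  return ids.filter((id) => id.startsWith(`${provider}/`));
}

// No Authorization header: this page states the endpoint needs no authentication.
async function listModelIds(): Promise<string[]> {
  const res = await fetch(MODELS_URL);
  if (!res.ok) throw new Error(`Models request failed: ${res.status}`);
  return extractModelIds(await res.json());
}
```

Filtering on the `provider/` prefix works because every model ID starts with its provider, for example `filterByProvider(ids, "x-ai")` for the Grok models.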

Free models

Several models are available at no cost, subject to rate limits:

| Model ID | Description |
|---|---|
| stepfun/step-3.7-flash:free | StepFun Step 3.7 Flash |
| poolside/laguna-m.1:free | Poolside Laguna M.1 |
| nvidia/nemotron-3-ultra-550b-a55b:free | NVIDIA Nemotron 3 Ultra |
| openrouter/free | Best available free model |

Free models are available to both authenticated and anonymous users. Anonymous users are rate-limited to 200 requests per hour per IP address.
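If you call free models anonymously, a client-side limiter can keep you under the 200 requests per hour per IP limit so you are not relying on the gateway to reject requests. The following is a minimal sliding-window sketch. The class name and the injectable clock are illustrative. The gateway's own counting window may differ from this one, and the limit applies per IP rather than per process, so treat this as a best-effort guard.

```typescript
// Sliding-window limiter: allows at most `limit` calls in any `windowMs` span.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the call if it fits in the current window.
  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have aged out of the window.
    while (this.timestamps.length > 0 && t - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(t);
    return true;
  }

  // Milliseconds until the next call would be allowed (0 if allowed now).
  msUntilNextSlot(): number {
    if (this.timestamps.length < this.limit) return 0;
    return Math.max(0, this.timestamps[0] + this.windowMs - this.now());
  }
}

// Anonymous free-model limit from this page: 200 requests per hour per IP.
const anonymousLimiter = new SlidingWindowLimiter(200, 60 * 60 * 1000);
```

Call `anonymousLimiter.tryAcquire()` before each anonymous request, and wait `msUntilNextSlot()` milliseconds when it returns false.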

⚠️ NVIDIA free endpoints

For NVIDIA free endpoints (Super, Ultra, etc.): Trial use only. Do not submit personal or confidential data. Your use is logged for security purposes and to improve NVIDIA products and services. The logged session data for improvement purposes is not linked to your identity or any persistent identifier. For more information about our data processing practices, see our Privacy Policy. By interacting with this endpoint, you consent to our collection, recording, and use of such information and the NVIDIA API Trial Terms of Service.

Auto models

Auto virtual models (kilo-auto/*) select an underlying model using tier-specific routing. Frontier routes on the x-kilocode-mode request header. Balanced routes on the API interface the client uses. Free uses deterministic per-session affinity across the available free candidates. Small routes on the account's balance.

ℹ️ Underlying models can change

The mappings below reflect the current routing. The underlying models behind each kilo-auto/* tier are updated server-side as better options become available or as providers change pricing and availability. The tier IDs themselves remain stable.

kilo-auto/frontier

Highest performance and capability for any task. Frontier requests are sent with medium reasoning effort and medium verbosity.

| Mode | Resolved model |
|---|---|
| plan, general, architect, orchestrator, ask, debug | anthropic/claude-opus-4.7 |
| build, explore, code | anthropic/claude-sonnet-4.6 |
| Default (no or unknown mode) | anthropic/claude-sonnet-4.6 |
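To choose a Frontier route from code, set the `x-kilocode-mode` header on each request. The following sketch builds those headers. It also mirrors the kilo-auto/frontier mode table in a lookup, which is useful for logging which model a mode is expected to reach. The lookup is a snapshot of this page, not something the gateway exposes, and it will drift as routing changes server-side. It assumes mode names are matched exactly as written. `frontierHeaders` and `expectedFrontierModel` are hypothetical names.

```typescript
const OPUS = "anthropic/claude-opus-4.7";
const SONNET = "anthropic/claude-sonnet-4.6";
const FRONTIER_DEFAULT = SONNET;

// Snapshot of the documented kilo-auto/frontier routing; the gateway may change it.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const FRONTIER_ROUTES = new Map<string, string>([
  ["plan", OPUS], ["general", OPUS], ["architect", OPUS],
  ["orchestrator", OPUS], ["ask", OPUS], ["debug", OPUS],
  ["build", SONNET], ["explore", SONNET], ["code", SONNET],
]);

// No mode or an unrecognized mode falls back to the default route.
function expectedFrontierModel(mode?: string): string {
  if (mode === undefined) return FRONTIER_DEFAULT;
  return FRONTIER_ROUTES.get(mode) ?? FRONTIER_DEFAULT;
}

// Request headers for kilo-auto/frontier; the mode header is omitted for default routing.
function frontierHeaders(apiKey: string, mode?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  if (mode !== undefined) headers["x-kilocode-mode"] = mode;
  return headers;
}
```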

kilo-auto/balanced

Great balance of price and capability. The resolved model depends on the API interface used by the client.

| API interface | Resolved model | Reasoning effort |
|---|---|---|
| Completions (default) | qwen/qwen3.6-plus | enabled |
| Responses API | openai/gpt-5.5 | low |
| Messages API | anthropic/claude-sonnet-4.6 | low |
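The interface is determined by the request format the client speaks. The following sketch builds the same kilo-auto/balanced request in all three body shapes. It assumes the gateway accepts the standard OpenAI Chat Completions, OpenAI Responses and Anthropic Messages formats for these interfaces. This page does not list endpoint paths for the Responses and Messages interfaces, so only the bodies are shown. The `max_tokens` value is an arbitrary example, and `balancedRequestBody` is a hypothetical name.

```typescript
type ApiInterface = "completions" | "responses" | "messages";

// Builds a kilo-auto/balanced body in the shape each interface expects.
function balancedRequestBody(api: ApiInterface, prompt: string): Record<string, unknown> {
  const model = "kilo-auto/balanced";
  switch (api) {
    case "completions":
      // Chat Completions shape; currently resolves to qwen/qwen3.6-plus per this page.
      return { model, messages: [{ role: "user", content: prompt }] };
    case "responses":
      // Responses shape, prompt in `input`; currently resolves to openai/gpt-5.5.
      return { model, input: prompt };
    case "messages":
      // Anthropic Messages shape, which requires `max_tokens`;
      // currently resolves to anthropic/claude-sonnet-4.6.
      return { model, max_tokens: 1024, messages: [{ role: "user", content: prompt }] };
    default:
      throw new Error(`Unknown interface: ${String(api)}`);
  }
}
```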

kilo-auto/free

Free with limited capability. No credits required. The resolved model is selected dynamically per session from a curated set of available free models; the mapping updates server-side as free model availability shifts.

⚠️ Data handling for Auto Free

Auto Free may route your requests to providers that log prompts and outputs and use them to improve their services. Do not submit personal or confidential data when using Auto Free. In particular, it may route to NVIDIA's free endpoints (see the NVIDIA free endpoints notice and the NVIDIA API Trial Terms of Service above).

kilo-auto/small

Automatically routes to a small, fast model for lightweight background tasks (session titles, commit messages, summaries).

| Condition | Resolved model |
|---|---|
| Account has paid balance | google/gemma-4-31b-it |
| No balance / free account | google/gemma-4-26b-a4b-it:free |
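The small tier is meant for background work, so a typical call is short and bounded. The following sketch builds a commit-message request against kilo-auto/small. The prompt wording, the diff-size cap and the `max_tokens` value are illustrative assumptions, and `commitMessageRequest` is a hypothetical helper. Send the body to the chat completions endpoint as in the other examples on this page.

```typescript
interface SmallChatBody {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  max_tokens: number;
}

// Hypothetical helper: asks kilo-auto/small for a one-line commit message.
function commitMessageRequest(diff: string, maxDiffChars = 8000): SmallChatBody {
  // Truncate large diffs so the background request stays small and cheap.
  const clipped = diff.length > maxDiffChars ? diff.slice(0, maxDiffChars) : diff;
  return {
    model: "kilo-auto/small",
    messages: [
      { role: "system", content: "Write a one-line commit message for this diff." },
      { role: "user", content: clipped },
    ],
    max_tokens: 60,
  };
}
```

The same model ID works whether or not the account has a balance. The gateway picks the paid or free Gemma variant.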

Example usage

{
  "model": "kilo-auto/frontier",
  "messages": [{ "role": "user", "content": "Help me design a database schema" }]
}

With the mode header, which kilo-auto/frontier uses to choose its underlying model:

curl -X POST "https://api.kilo.ai/api/gateway/chat/completions" \
  -H "Authorization: Bearer $KILO_API_KEY" \
  -H "x-kilocode-mode: plan" \
  -H "Content-Type: application/json" \
  -d '{"model": "kilo-auto/frontier", "messages": [{"role": "user", "content": "Design a database schema"}]}'