Auto Model Tiers
Overview
Kilo Auto is a model routing system that automatically selects the optimal AI model based on the user's current mode (Code, Architect, Debug, etc.). It comes in multiple tiers so that every user — regardless of budget, preference, or expertise — gets a "just works" experience without needing to understand the AI model landscape.
Three tiers are user-facing, and one is internal:
| Tier ID | Description | Pricing |
|---|---|---|
| kilo-auto/frontier | Best paid models | Paid |
| kilo-auto/balanced | Strong performance, lower cost | Paid |
| kilo-auto/free | Best available free models | Free |
| kilo-auto/small | Internal: background tasks | Varies |
Problem
Users shouldn't need to be AI model experts
The AI model landscape is overwhelming. There are hundreds of models across dozens of providers, with different pricing, capabilities, context windows, and availability. Most developers just want to write code — they don't want to research which model is best for their task, budget, and workflow.
Without Auto Model, three areas are underserved:
- Free users: They see a list of free models that changes with promotional periods and shifting availability. Which one is best overall? Which one suits a particular task? They have no way to know without trial and error.
- Cost-conscious users: They want something better than free but cheaper than frontier. Open-weight models are useful and significantly cheaper, but which one? Which version? The answer changes every few weeks.
- Background tasks: Kilo uses small models for things like generating session titles and commit messages. These should be invisible and reliable, not dependent on the user's model selection or credit status.
Free model churn creates a moving target
Free models on OpenRouter appear and disappear based on promotional periods. A model that works well today may be gone next week. Users who manually selected a free model discover it's unavailable. Auto Model tiers absorb this churn — when the best free model changes, the mapping updates server-side and users keep working.
Tiers
Auto: Frontier
Who it's for: Users who want the best available models and are willing to pay for them.
What it does: Routes between the best paid models based on the task — stronger reasoning models for planning and architecture, faster models for code generation and editing. Optimizes for the best balance of capability, speed, and token efficiency.
Pricing: Paid. Uses credits.
For the current mode-to-model mappings, see the Auto Model user docs.
Auto: Balanced
Who it's for: Cost-conscious developers who want better results than free models at a fraction of frontier cost.
What it does: Follows the same mode-based routing structure as Frontier but uses cost-effective open-weight models for both reasoning and implementation tasks.
Pricing: Paid, but significantly cheaper than Frontier.
For the current mode-to-model mappings, see the Auto Model user docs.
Auto: Free
Who it's for: Users who want to try Kilo without a credit card, students, hobbyists, and anyone exploring AI-assisted coding.
What it does: Automatically maps to the best available free model(s) for each mode. As free model availability changes due to promotional periods, the mapping updates transparently. Users always get the best free option without having to track which models are currently available.
Pricing: Free. No credits required.
Constraints: Free models may not provide sufficient breadth to justify different models per mode. In that case, a single model may be used for all modes. Quality will be lower than Frontier or Balanced tiers — this is a tradeoff users accept by choosing free.
Auto: Small (internal)
Who it's for: Not user-facing. Used internally by Kilo for lightweight background tasks (session titles, commit messages, conversation summaries).
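What it does: Automatically selects the right small model for lightweight tasks. When credits are available, it uses a fast paid small model; otherwise it selects a free small model, so background tasks keep working regardless of credit status.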
Why it matters: Users never think about background tasks, and they shouldn't have to. Auto: Small ensures these tasks always work, always feel fast, and never waste credits on an expensive model when a cheap one will do.
Implementation: The getSmallModel() function in packages/opencode/src/provider/provider.ts prioritizes kilo-auto/small when the Kilo provider is active. If the user's provider doesn't have a dedicated small model, it falls back globally to kilo-auto/small when available.
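The fallback order described above can be sketched as follows. This is a minimal illustration, not the code in provider.ts: the ProviderInfo shape, its smallModel field, and the pickSmallModel name are hypothetical, and returning undefined when no small model is available anywhere is an assumption.

```typescript
// Hypothetical provider shape; the real provider.ts types are richer.
interface ProviderInfo {
  id: string;          // e.g. "kilo", "anthropic"
  models: string[];    // model IDs this provider exposes
  smallModel?: string; // hypothetical: the provider's dedicated small model, if any
}

const KILO_AUTO_SMALL = "kilo-auto/small";

// Mirrors the order described for getSmallModel(): Kilo first, then the
// active provider's own small model, then a global fallback to kilo-auto/small.
function pickSmallModel(active: ProviderInfo, all: ProviderInfo[]): string | undefined {
  // 1. Kilo provider active: prefer kilo-auto/small outright.
  if (active.id === "kilo" && active.models.includes(KILO_AUTO_SMALL)) {
    return KILO_AUTO_SMALL;
  }
  // 2. The active provider has its own dedicated small model: use it.
  if (active.smallModel) {
    return active.smallModel;
  }
  // 3. No dedicated small model: fall back to kilo-auto/small if any
  //    configured provider offers it.
  const offersAutoSmall = all.some((p) => p.models.includes(KILO_AUTO_SMALL));
  return offersAutoSmall ? KILO_AUTO_SMALL : undefined;
}
```

With this ordering, a user on a third-party provider that lacks a small model still gets kilo-auto/small for titles and commit messages, as long as the Kilo provider is configured.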
User experience
Model picker
The three user-facing tiers appear in the model selector:
| Display Name | Description shown to user |
|---|---|
| Auto: Frontier | Best paid models, automatically matched to your task |
| Auto: Balanced | Strong performance at lower cost |
| Auto: Free | Best free models, no credits required |
Auto: Small does not appear in the model picker. It is filtered out by the UI (see KILO_AUTO_SMALL_IDS in the VS Code extension).
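A minimal sketch of that filtering step, assuming the picker receives a flat list of entries. KILO_AUTO_SMALL_IDS is the name used in the VS Code extension, but its contents here (a single kilo-auto/small entry) and the PickerEntry shape are assumptions for illustration.

```typescript
// Hypothetical mirror of KILO_AUTO_SMALL_IDS; the real constant lives in the
// VS Code extension and its exact contents may differ.
const KILO_AUTO_SMALL_IDS: ReadonlySet<string> = new Set(["kilo-auto/small"]);

// Hypothetical picker entry shape.
interface PickerEntry {
  id: string;
  displayName: string;
  description: string;
}

const AUTO_TIERS: PickerEntry[] = [
  { id: "kilo-auto/frontier", displayName: "Auto: Frontier", description: "Best paid models, automatically matched to your task" },
  { id: "kilo-auto/balanced", displayName: "Auto: Balanced", description: "Strong performance at lower cost" },
  { id: "kilo-auto/free", displayName: "Auto: Free", description: "Best free models, no credits required" },
  { id: "kilo-auto/small", displayName: "Auto: Small", description: "Internal background tasks" },
];

// Drop internal-only tiers before rendering the model selector.
function visibleEntries(entries: PickerEntry[]): PickerEntry[] {
  return entries.filter((e) => !KILO_AUTO_SMALL_IDS.has(e.id));
}
```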
Defaults
- Authenticated users: Default to kilo-auto/balanced (defined in packages/kilo-gateway/src/api/constants.ts)
- Unauthenticated users: Default to kilo-auto/free
This means a brand-new user who hasn't signed in gets a working experience immediately — no model selection required.
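The default rule amounts to a single branch on authentication state. In this sketch, DEFAULT_MODEL and DEFAULT_FREE_MODEL reuse the constant names listed for constants.ts, but the value of DEFAULT_FREE_MODEL (kilo-auto/free) is inferred from this page, and defaultModelFor and its optional userChoice parameter are hypothetical.

```typescript
// Standalone mirror of the defaults described above (values assumed from this page).
const DEFAULT_MODEL = "kilo-auto/balanced";
const DEFAULT_FREE_MODEL = "kilo-auto/free";

// Hypothetical helper: an explicit user selection, if any, takes precedence;
// otherwise authentication state decides the default tier.
function defaultModelFor(session: { authenticated: boolean }, userChoice?: string): string {
  if (userChoice) {
    return userChoice;
  }
  return session.authenticated ? DEFAULT_MODEL : DEFAULT_FREE_MODEL;
}
```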
What users see
The UI shows the tier name (e.g., "Auto: Frontier"), not the underlying model. Users don't need to know or care that their planning request went to Opus and their coding request went to Sonnet. The abstraction is the product.
Implementation architecture
Auto Model uses a split client/server architecture. The actual model-to-mode mappings are not hardcoded in the client — they're served dynamically from the Kilo API, making it possible to update routing without client releases.
Server side (Kilo API)
The Kilo API at api.kilo.ai defines which underlying models each kilo-auto/* tier routes to per mode. Each auto model is returned with an opencode.variants field — a map of mode-specific provider options:
```json
{
  "opencode": {
    "variants": {
      "architect": { "model": "anthropic/claude-opus-4-6", ... },
      "code": { "model": "anthropic/claude-sonnet-4-6", ... }
    }
  }
}
```
This is fetched via packages/kilo-gateway/src/api/models.ts which parses the opencode.variants field from the API response.
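A minimal sketch of that parsing step, assuming each model object in the response is shaped like the JSON excerpt above. Only opencode.variants and the per-variant model key come from this page; the GatewayModel and VariantOptions types and the parseVariants name are hypothetical, and silently skipping entries without a string model ID is a design choice of the sketch.

```typescript
// Hypothetical types: each variant carries the underlying model ID plus other
// provider options (elided as "..." in the excerpt above).
type VariantOptions = { model: string; [option: string]: unknown };

interface GatewayModel {
  id: string;
  opencode?: { variants?: Record<string, VariantOptions> };
}

// Extract the mode -> options map from one model object, ignoring malformed
// entries that lack a string model ID.
function parseVariants(raw: unknown): Record<string, VariantOptions> {
  const variants = (raw as GatewayModel | null)?.opencode?.variants;
  if (!variants || typeof variants !== "object") {
    return {};
  }
  const out: Record<string, VariantOptions> = {};
  for (const mode of Object.keys(variants)) {
    const opts = variants[mode];
    if (opts && typeof opts.model === "string") {
      out[mode] = opts;
    }
  }
  return out;
}
```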
Client side
The client-side chain works as follows:
- Model fetching: packages/opencode/src/provider/model-cache.ts caches Kilo Gateway models with a 5-minute TTL, fetching them from the Kilo API.
- Variant passthrough: the variants() function in packages/opencode/src/provider/transform.ts passes server-defined variants for Kilo Gateway models through directly, rather than computing them locally.
- Variant storage: packages/opencode/src/provider/provider.ts stores variants on the model object when the provider is kilo.
- Agent variant resolution: Each agent (mode) specifies a variant in its config (packages/opencode/src/config/config.ts). At prompt time, packages/opencode/src/session/prompt.ts resolves the variant from the agent config and attaches it to the user message.
- LLM call merging: At call time, packages/opencode/src/session/llm.ts merges the variant's options (including the actual underlying model ID) into the provider options sent to OpenRouter.
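The last two steps of that chain, attaching the agent's variant to a message and merging its options at call time, can be sketched as follows. The AgentConfig and UserMessage shapes and the attachVariant and mergeProviderOptions names are hypothetical simplifications of prompt.ts and llm.ts, and letting variant values override base options on conflicting keys is an assumed precedence.

```typescript
// Hypothetical, simplified types; the real config, message, and option
// structures in opencode are richer.
type VariantOptions = { model: string; [option: string]: unknown };

interface AgentConfig { name: string; variant?: string }
interface UserMessage { text: string; variant?: string }

// prompt.ts step (sketch): copy the agent's configured variant onto the message.
function attachVariant(agent: AgentConfig, text: string): UserMessage {
  return { text, variant: agent.variant };
}

// llm.ts step (sketch): overlay the chosen variant's options, including the
// underlying model ID, on the base provider options. Variant values win on
// conflicting keys; with no matching variant the base options pass through.
function mergeProviderOptions(
  base: Record<string, unknown>,
  variants: Record<string, VariantOptions>,
  message: UserMessage,
): Record<string, unknown> {
  const chosen = message.variant ? variants[message.variant] : undefined;
  return chosen ? { ...base, ...chosen } : { ...base };
}
```

For example, a message from the code agent under a mapping like the JSON excerpt above would leave kilo-auto/frontier as the selected tier while the request itself carries anthropic/claude-sonnet-4-6 as its model.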
Key files
| File | Role |
|---|---|
| packages/kilo-gateway/src/api/constants.ts | Default model constants (DEFAULT_MODEL, DEFAULT_FREE_MODEL) |
| packages/kilo-gateway/src/api/models.ts | Fetches models from Kilo API, parses opencode.variants |
| packages/opencode/src/provider/model-cache.ts | Caches Kilo Gateway models with 5-min TTL |
| packages/opencode/src/provider/provider.ts | Preserves variants for kilo provider; getSmallModel() prioritizes kilo-auto/small |
| packages/opencode/src/provider/transform.ts | Passes through server-defined variants for Kilo Gateway models |
| packages/opencode/src/session/prompt.ts | Resolves variant from agent config, attaches to user messages |
| packages/opencode/src/session/llm.ts | Merges variant options into LLM call parameters |
| packages/opencode/src/config/config.ts | Agent config schema includes variant field |
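The 5-minute TTL behaviour attributed to model-cache.ts can be sketched as a small generic cache. This is an illustration, not the file's actual code: TtlCache is a hypothetical name, the loader is injected so no particular Kilo API endpoint is assumed, and the clock is injectable so expiry can be tested.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Hypothetical TTL cache: reuse the loaded value until it is ttlMs old, then reload.
class TtlCache<T> {
  private entry: { value: T; storedAt: number } | undefined;

  constructor(
    private readonly load: () => T,
    private readonly ttlMs: number = FIVE_MINUTES_MS,
    private readonly now: () => number = () => Date.now(),
  ) {}

  get(): T {
    const t = this.now();
    // Reload on first use or once the stored value has reached the TTL.
    if (this.entry === undefined || t - this.entry.storedAt >= this.ttlMs) {
      this.entry = { value: this.load(), storedAt: t };
    }
    return this.entry.value;
  }
}
```

With a real fetcher, T would be a promise of the model list, so callers within the TTL window share one in-flight request. One caveat of this sketch: a rejected promise would also stay cached until expiry, which a production version would likely want to evict.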
Requirements
- Unauthenticated users default to kilo-auto/free with no configuration required
- All tiers use mode-based routing where the underlying models support it
- When a tier routes to different model families across turns in a conversation, thinking/reasoning blocks from the previous model are stripped to prevent compatibility errors
- Auto Model requires VS Code/JetBrains extension v5.2.3+ or CLI v1.0.15+ for mode-based switching. Older versions fall back to a single model for all requests.
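The thinking-block requirement can be sketched as a filter over conversation history before it is sent to the next model. Everything here is an assumption for illustration: the Part and AssistantTurn shapes are simplified, and treating the provider prefix of the model ID (for example "anthropic") as the model family is a heuristic, not necessarily how Kilo defines a family.

```typescript
// Hypothetical, simplified message parts.
type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string };

interface AssistantTurn {
  model: string; // underlying model ID, e.g. "anthropic/claude-sonnet-4-6"
  parts: Part[];
}

// Assumption: the family is the provider prefix before the first "/".
function familyOf(modelId: string): string {
  const slash = modelId.indexOf("/");
  return slash === -1 ? modelId : modelId.slice(0, slash);
}

// Return a copy of the history in which reasoning parts produced by a
// different family than nextModel are removed; text parts are always kept.
function stripForeignReasoning(turns: AssistantTurn[], nextModel: string): AssistantTurn[] {
  const target = familyOf(nextModel);
  return turns.map((turn) =>
    familyOf(turn.model) === target
      ? turn
      : { ...turn, parts: turn.parts.filter((p) => p.type !== "reasoning") },
  );
}
```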
Risks
| Risk | User impact | Mitigation |
|---|---|---|
| Free model disappears mid-session | User's next message fails | Fallback chain: primary → secondary → tertiary free model. Graceful error only if all options exhausted. |
| Model quality variance across free/balanced tiers | Inconsistent experience compared to Frontier | Set clear expectations in UI. Curate model lists, don't just pick the cheapest. |
| Cross-family model switching breaks context | Thinking blocks from Model A incompatible with Model B | Strip thinking blocks when the underlying model family changes between turns. Frontier stays within one family so this primarily affects Free and Balanced. |
| Users don't understand the tier differences | Wrong tier selected, poor experience | Clear descriptions in the model picker. Good defaults (Balanced for paid, Free for unpaid) so most users never need to actively choose. |
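The fallback chain in the first mitigation can be sketched as trying candidates in order and raising a single error only when every option has failed. The sendWithFallback helper and its send callback are hypothetical stand-ins for a real model call, and the call is shown synchronously for brevity; an async version would await send(model) inside the same try block.

```typescript
// Hypothetical helper: try each candidate model in order (primary, secondary,
// tertiary, ...) and return the first success along with the model that served it.
function sendWithFallback<R>(
  candidates: string[],
  send: (model: string) => R,
): { model: string; result: R } {
  const failed: string[] = [];
  for (const model of candidates) {
    try {
      return { model, result: send(model) };
    } catch {
      // Model unavailable or the call failed: record it and move on.
      failed.push(model);
    }
  }
  // Every option exhausted: surface one graceful error listing what was tried.
  throw new Error(`All free models unavailable (tried: ${failed.join(", ")})`);
}
```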
Data and compliance
- Frontier: Uses Anthropic models with no training on user data.
- Balanced and Free: The underlying models may have different data handling policies depending on the provider. This should be documented per-tier so enterprise users can make informed choices.
- Small: Same concern as Balanced/Free — the model selected depends on credit status, which may route to providers with different policies.
Features for the future
- Resolved model transparency: Show the actual model being used on hover/click for users who want to know
- Per-agent tier overrides: Let users pick Frontier for their code agent but Free for the explore agent
- Auto Model changelog: A status page or in-product notification when tier mappings change
- Tier analytics: A dashboard showing which models each tier resolves to, along with latency, error rates, and quality metrics
- Enterprise open-weight preference: Organizations that require open-weight models for auditability could enforce the Balanced tier across their team