Voice Transcription

Kilo Code now includes experimental support for voice input in the chat interface. This feature allows you to dictate your messages using speech-to-text (STT) technology powered by OpenAI's Whisper API.

Prerequisites

Voice transcription requires two components to be set up:

1. FFmpeg Installation

FFmpeg is required for audio capture and processing. Install it for your platform:

macOS:

brew install ffmpeg

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ffmpeg

Windows: Download FFmpeg from ffmpeg.org/download.html and add the folder containing ffmpeg.exe to your system PATH.
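After installing, you may want to confirm that FFmpeg is actually reachable from your shell. The following is a minimal sketch for macOS and Linux; check_tool is a helper name invented for this example, not part of Kilo Code or FFmpeg.

```shell
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper defined only for this example.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

# Print the first line of the version banner only if ffmpeg was found.
if check_tool ffmpeg; then
  ffmpeg -version | head -n 1
fi
```

If the check reports that ffmpeg is not found, open a new terminal so PATH changes take effect, then run it again.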

2. OpenAI API Key

Voice transcription uses OpenAI's Whisper API for speech recognition. You need an OpenAI API configuration in Kilo Code:

Configure an OpenAI provider profile in Kilo Code settings
Add your OpenAI API key to the profile
Either OpenAI or OpenAI Native provider types will work
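To rule out a mistyped key before adding it to a profile, you can test it against the OpenAI API directly. This sketch assumes the key is exported in an environment variable named OPENAI_API_KEY, which is a convention chosen for this example; Kilo Code stores the key in its own provider profile. check_openai_key is likewise a hypothetical helper name.

```shell
# Check an OpenAI API key by listing models.
# Assumption: the key is exported as OPENAI_API_KEY for this test only.
check_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set"
    return 2
  fi
  # Capture only the HTTP status code; 200 means the key was accepted,
  # 401 means it was rejected.
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    https://api.openai.com/v1/models)
  echo "HTTP $status"
  [ "$status" = "200" ]
}
```

Run check_openai_key after exporting the key. A 200 response shows that the key is accepted; it does not show that the account has credits available.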

Enabling Voice Transcription

Voice transcription is an experimental feature that must be enabled:

Open Kilo Code settings
Navigate to Experimental Features
Enable the Speech to Text experiment

Using Voice Input

Once configured and enabled, a microphone button will appear in the chat input area:

Click the microphone button to start recording
Speak your message clearly
Click again to stop recording
Your speech will be automatically transcribed into text

While recording, the feature shows a real-time audio level visualization and uses voice activity detection to recognize when you are speaking.

Technical Details

Audio Processing: Uses FFmpeg for system audio capture
Voice Recognition: Uses the OpenAI Whisper API for transcription
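The two stages above can be reproduced by hand, which helps when you need to tell a microphone problem from an API problem. This is a sketch under stated assumptions, not Kilo Code's actual implementation: the function names, the sample.wav file name, the 5-second duration, the 16 kHz mono format, and the device names ":0" and "default" are all choices made for this example, and the key is again assumed to be in OPENAI_API_KEY.

```shell
# Pick FFmpeg input arguments for the default microphone on a given OS.
# Assumption: ":0" (macOS) and "default" (Linux ALSA) match your system.
mic_input_args() {
  case "$1" in
    Darwin) echo "-f avfoundation -i :0" ;;
    Linux)  echo "-f alsa -i default" ;;
    *)      echo "unsupported"; return 1 ;;
  esac
}

# Record 5 seconds of mono audio, then send it to the transcription endpoint.
transcribe_sample() {
  args=$(mic_input_args "$(uname -s)") || return 1
  # $args is intentionally unquoted so it splits into separate arguments.
  ffmpeg -y $args -t 5 -ac 1 -ar 16000 sample.wav || return 1
  curl -s https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F model=whisper-1 \
    -F file=@sample.wav
}
```

Calling transcribe_sample records from the microphone and prints the JSON response, whose text field holds the transcript. If the FFmpeg step fails, the problem is local audio capture; if only the curl step fails, look at the key or the network.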

Troubleshooting

Microphone button not appearing:

Ensure the Speech to Text experiment is enabled
Verify FFmpeg is installed and in your PATH
Check that you have an OpenAI provider configured with a valid API key

Transcription errors:

Verify your OpenAI API key is valid and has available credits
Check your internet connection
Try speaking more clearly or adjusting your microphone settings

Limitations

This feature is currently experimental and has the following limitations:

Requires an active internet connection
Uses OpenAI API credits based on audio duration
Transcription accuracy depends on audio quality and speech clarity
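Because usage is billed by audio duration, a rough cost estimate is simple arithmetic: minutes of audio multiplied by the per-minute rate. The sketch below uses ffprobe, which is installed alongside FFmpeg, to read a file's duration. The function names are invented for this example, and the rate of 0.006 per minute is only a placeholder; check OpenAI's current pricing for the real figure.

```shell
# Print the duration of an audio file in seconds using ffprobe.
audio_seconds() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Estimate cost from a duration in seconds and a per-minute rate.
estimate_cost() {
  awk -v s="$1" -v r="$2" 'BEGIN { printf "%.4f\n", s / 60 * r }'
}

# Example: 90 seconds at a placeholder rate of 0.006 per minute.
estimate_cost 90 0.006   # prints 0.0090
```

For a real recording, combine the two helpers: estimate_cost "$(audio_seconds recording.wav)" 0.006, where recording.wav stands in for your own file.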
