Voice Transcription
Kilo Code now includes experimental support for voice input in the chat interface. This feature allows you to dictate your messages using speech-to-text (STT) technology powered by OpenAI's Whisper API.
Prerequisites
Voice transcription requires two components to be set up:
1. FFmpeg Installation
FFmpeg is required for audio capture and processing. Install it for your platform:
macOS:
brew install ffmpeg
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install ffmpeg
Windows: Download a build from ffmpeg.org/download.html, extract it, and add the folder that contains ffmpeg.exe to your system PATH.
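After installing, it can help to confirm that FFmpeg is reachable from your shell before enabling the feature. The following is a minimal sketch for macOS and Linux (bash or POSIX sh); the function name check_ffmpeg is a hypothetical helper for illustration and is not part of Kilo Code:

```shell
# Hypothetical helper: report whether ffmpeg is reachable on PATH.
check_ffmpeg() {
  if command -v ffmpeg >/dev/null 2>&1; then
    # Print the first line of the version banner.
    echo "ffmpeg found: $(ffmpeg -version 2>/dev/null | head -n 1)"
  else
    echo "ffmpeg not found on PATH"
    return 1
  fi
}

# Run the check; "|| true" keeps the shell going if ffmpeg is missing.
check_ffmpeg || true
```

On Windows, running ffmpeg -version in a newly opened terminal serves the same purpose.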
2. OpenAI API Key
Voice transcription uses OpenAI's Whisper API for speech recognition. You need an OpenAI API configuration in Kilo Code:
- Configure an OpenAI provider profile in Kilo Code settings
- Add your OpenAI API key to the profile
- Either the OpenAI or the OpenAI Native provider type will work
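If you want to check that a key is accepted before adding it to a profile, one option is to query the OpenAI models endpoint directly. This is a sketch under stated assumptions: it expects curl to be installed and the key to be exported as the OPENAI_API_KEY environment variable, which is a convention chosen for this example only (Kilo Code reads the key from the provider profile). The function name check_openai_key is hypothetical.

```shell
# Hypothetical helper: print the HTTP status returned by the OpenAI models endpoint.
# 200 means the key was accepted; 401 means it was rejected.
# Assumes the key is exported as OPENAI_API_KEY (an assumption for this sketch).
check_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set"
    return 1
  fi
  curl -s -o /dev/null -w "%{http_code}\n" \
    https://api.openai.com/v1/models \
    -H "Authorization: Bearer $OPENAI_API_KEY"
}
```

Call check_openai_key after exporting the key. A 200 status only shows that the key is valid; it does not confirm that the account has credits available.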
Enabling Voice Transcription
Voice transcription is an experimental feature that must be enabled:
- Open Kilo Code settings
- Navigate to Experimental Features
- Enable the Speech to Text experiment
Using Voice Input
Once configured and enabled, a microphone button will appear in the chat input area:
- Click the microphone button to start recording
- Speak your message clearly
- Click again to stop recording
- Your speech will be automatically transcribed into text
The feature includes real-time audio level visualization and voice activity detection, which recognizes when you are speaking.
Technical Details
- Audio Processing: Uses FFmpeg to capture and process audio on your system
- Voice Recognition: Uses the OpenAI Whisper API for transcription
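To illustrate the two stages above, here is a rough sketch that records a short clip with FFmpeg and sends it to OpenAI's transcription endpoint. It is not Kilo Code's implementation: the exact FFmpeg arguments the extension uses are not documented here, so the input devices, sample rate, and the helper names mic_input_args, record_clip, and transcribe_clip are all assumptions. It also assumes bash or POSIX sh, curl, and a key exported as OPENAI_API_KEY.

```shell
# Sketch only: these arguments are assumptions, not Kilo Code's actual settings.

# Print ffmpeg input arguments for a default microphone on this OS.
mic_input_args() {
  case "$(uname -s)" in
    Darwin) echo '-f avfoundation -i :0' ;;  # audio device index 0, no video
    Linux)  echo '-f alsa -i default' ;;     # default ALSA capture device
    *)      echo "unsupported OS for this sketch" >&2; return 1 ;;
  esac
}

# Record $1 seconds of 16 kHz mono audio into the file named by $2.
record_clip() {
  args="$(mic_input_args)" || return 1
  # $args is left unquoted on purpose so it splits into separate arguments.
  ffmpeg -y -loglevel error $args -t "$1" -ac 1 -ar 16000 "$2"
}

# Upload the file named by $1 and request a transcription from whisper-1.
transcribe_clip() {
  curl -s https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F model=whisper-1 \
    -F file=@"$1"
}
```

For example, record_clip 5 clip.wav && transcribe_clip clip.wav records five seconds and prints a JSON response whose text field holds the transcription. On Windows, FFmpeg uses the dshow input format instead, with a device name specific to your machine.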
Troubleshooting
Microphone button not appearing:
- Ensure the Speech to Text experiment is enabled
- Verify FFmpeg is installed and in your PATH
- Check that you have an OpenAI provider configured with a valid API key
Transcription errors:
- Verify your OpenAI API key is valid and has available credits
- Check your internet connection
- Try speaking more clearly or adjusting your microphone settings
Limitations
This feature is currently experimental and has the following limitations:
- Requires an active internet connection
- Uses OpenAI API credits based on audio duration
- Transcription accuracy depends on audio quality and speech clarity
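Because usage is billed by audio duration, a quick estimate can be useful. The sketch below is simple arithmetic: duration in minutes multiplied by a per-minute rate. The default rate of 0.006 USD per minute is an assumption based on OpenAI's published whisper-1 pricing at the time of writing; check the current pricing page and pass your own rate if it differs. The function name estimate_cost is hypothetical.

```shell
# Hypothetical helper: estimate cost in USD for an audio duration in seconds.
# $1 = duration in seconds, $2 = rate in USD per minute (assumed default: 0.006).
estimate_cost() {
  awk -v s="$1" -v r="${2:-0.006}" 'BEGIN { printf "%.4f\n", s / 60 * r }'
}

estimate_cost 90        # 90 seconds at the assumed default rate: prints 0.0090
estimate_cost 600 0.01  # 10 minutes at a custom rate: prints 0.1000
```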