How it works (the 30-second model)
The whole app is one simple loop:
- Hold the hotkey (default
Ctrl+Space). A small overlay shows it is listening. - Speak your sentence or paragraph.
- Release the key. Recording stops.
- It transcribes on your device (no upload), then cleans the text per the mode you picked.
- It pastes the result at your cursor, in whatever app has focus.
That is it. There is no window to switch to, no "send" button, no account. It lives in the system tray and waits for the hotkey.
Two ways to trigger (set in Settings):
- Hold (default): record while the key is held, transcribe on release. Best for quick bursts.
- Toggle: tap once to start, tap again to stop. Best for longer, hands-free dictation.
Setup walkthrough: from zero to dictating
Step 1. Install
Run the Windows installer (VoiceFlow-Setup.exe). It bundles the speech models, so
it works with no internet and nothing else to download. VoiceFlow appears in the system tray.
Step 2. First run auto-configures
The first time it launches, VoiceFlow profiles your PC and picks the best on-device speech engine and settings for your hardware automatically. On an English-language machine with the Parakeet model bundled, it selects Parakeet for the best speed and accuracy. Nothing is asked of you.
Step 3. Check your microphone
Hold Ctrl+Space, say a test sentence, release. The text should appear wherever
your cursor is. If nothing appears, open a text field first (so there is somewhere to paste), and
confirm your speaking microphone is the Windows default input device.
Step 4. Pick a mode
Right-click the tray icon and choose a mode. Leave it on clean for general
dictation, or switch to code when you are dictating to an AI coding assistant.
Step 5. Make it yours (optional)
Open Settings from the tray menu to rebind the hotkey, switch between hold and toggle, choose
the speech engine, or turn on a local LLM for the summary and prompt
modes. Add your own dictionary terms in the config file. There is no sign-in and no configuration
required to start.
Output modes
Every capture runs through a mode that decides how the raw transcript is turned into text. Switch modes per capture from the tray menu, or set a default in Settings. Five modes ship:
| Mode | What it does | Needs an LLM? | Best for |
|---|---|---|---|
raw | Verbatim transcript, lightly punctuated. Nothing removed. | No | Quoting exactly what was said |
clean (default) | Removes filler, collapses repeats, fixes spacing and capitalisation. | No | Everyday dictation |
summary | Condenses rambling speech into a tight brief. | Yes | A spoken brain-dump into a note |
prompt | Reformats speech into a structured LLM prompt. | Yes | Crafting a prompt by voice |
code | Trims politeness, keeps code words, fixes programming terms. | No | Dictating to Claude Code / Cursor |
The non-LLM modes (raw, clean, code) are fully offline
and instant. The LLM modes (summary, prompt) are optional and only run
if you point VoiceFlow at a local model (such as Ollama) or a cloud endpoint you configure. Out
of the box, no LLM is configured, so everything is on-device.
Before and after, every mode
raw spoken: um so basically i need to add a a login function you know output: Um so basically i need to add a a login function you know. clean spoken: um so basically i need to add a a login function you know output: So i need to add a login function. code spoken: um can you please add a dunder init to the fast api app output: Add a __init__ to the FastAPI app.
The summary and prompt examples below are illustrative and need a
local or cloud LLM:
summary spoken: so I was thinking we should maybe look at the caching layer because it keeps timing out and users are complaining and it might be the redis connection pool or something output: Investigate caching timeouts; likely the Redis connection pool. prompt spoken: write me something that takes a csv and returns the average per column output: Role: Python developer Task: Write a function that reads a CSV and returns the average per column. Constraints: none Context: none
The code mode and the code-aware dictionary
This is the feature built specifically for dictating to an AI coding assistant. Two things
happen in code mode, both instant and on-device:
- Polite-prefix trimming. Spoken requests start with filler and courtesy ("um, can you please..."). Code mode strips that and leaves the imperative, which is exactly what an AI assistant wants.
- Code-aware term fixing. A built-in dictionary rewrites spoken programming terms into the symbols and names you meant. It keeps words that are meaningful in code (like "like", "right", "actually") instead of deleting them as filler.
Verified examples (literal output):
say: write a test for the postgres connection get: Write a test for the PostgreSQL connection. say: create a dot py file with a dunder init get: Create a .py file with a __init__. say: refactor this like the right way using type script get: Refactor this like the right way using TypeScript. say: i want you to add a triple equals check not equal to null get: Add a === check != to null.
Built-in terms (v1)
| You say | You get |
|---|---|
| dunder init | __init__ |
| arrow function / fat arrow | => |
| double equals / triple equals / not equal | == / === / != |
| dot py / dot ts / dot js / dot json / dot md | .py / .ts / .js / .json / .md |
| fast api | FastAPI |
| postgres | PostgreSQL |
| node js | Node.js |
| type script / java script | TypeScript / JavaScript |
| git hub | GitHub |
The code dictionary merges with your personal dictionary in the config file, and your entries
win. Add the library names and jargon your stack uses, and VoiceFlow will fix them every time. It
applies in all modes, not just code, so even a clean capture
gets your terms fixed. Code mode adds the politeness trimming on top.
Scenarios: VoiceFlow in a real day
A. Dictating to Claude Code (the headline use)
You are in Claude Code and want a new endpoint. You switch VoiceFlow to code
mode, click into the chat box, hold Ctrl+Space, and say:
"um can you please add a dunder init to the fast api app and a dot py file for the routes"
You release the key. This appears in the chat box, ready to send:
Add a __init__ to the FastAPI app and a .py file for the routes.
No filler, the symbols are right, and you never touched the keyboard. You hit enter and keep moving.
B. Writing a pull-request description (clean mode)
You finished a change and need a PR body. In clean mode you hold the key and
talk through what you did in plain speech, including the inevitable "ums" and restarts.
VoiceFlow drops the filler, fixes the punctuation, and pastes a readable paragraph into the PR
description box. You edit one word and submit.
C. Capturing a rambling idea as a tidy summary (summary mode)
A thought hits you mid-task. With a local model configured, you switch to summary
mode, hold the key, and ramble for thirty seconds about a caching problem. VoiceFlow condenses
it to a one-line note you paste into your issue tracker, instead of a wall of spoken text.
D. Hands-free notes while reading (toggle mode)
You are reading documentation and want to take notes without holding a key. You set the
hotkey to toggle mode in Settings. Now you tap once to start, talk as you read, and tap again to
stop. Your notes land in your editor in clean mode.
Configuration and customisation
VoiceFlow keeps a small config file (TOML) in your Windows app-data folder
(%APPDATA%\VoiceFlow\). Highlights:
- Output mode: the default mode for new captures (
raw,clean,summary,prompt,code). - Hotkeys: record (default
Ctrl+Space), paste-again (Ctrl+Shift+V), cancel (Esc); all rebindable in Settings. - Trigger style:
holdortoggle. - Speech engine: the on-device provider and model. Auto-configured on first run; changeable in Settings.
- LLM provider:
none(default, fully offline), a local OpenAI-compatible endpoint, or a cloud endpoint. Only used bysummaryandprompt. - Personal dictionary: your own spoken-term to text mappings, merged on top of the built-in code dictionary (your entries win).
If VoiceFlow keeps mis-hearing a library name, add it to your dictionary and it will be fixed in every capture from then on.
Install options
Quick install (recommended for users). Download and run
VoiceFlow-Setup.exe from the
latest release.
Models are bundled; it works offline immediately.
From source (for developers). VoiceFlow is proprietary and the source is not public, so there is no source install for the public release. The installer is the supported path.
At a glance (reference card)
- Platform: Windows 10/11 (64-bit).
- Speech: on-device (NVIDIA Parakeet or Whisper), CPU-friendly, no GPU required.
- Privacy: local-only by default, zero network egress.
- Modes: raw, clean, summary, prompt, code.
- Trigger: push-to-talk hold or toggle, fully rebindable.
- Pastes into: any text field in any Windows app.
- For AI coders:
codemode plus a built-in, extensible programming dictionary. - Cost: free, no account.
Continue to the docs hub for quickstart, FAQ, troubleshooting, privacy, and download verification, or download VoiceFlow.