VoiceFlow guide: setup, modes, and scenarios

How it works (the 30-second model)

The whole app is one simple loop:

Hold the hotkey (default Ctrl+Space). A small overlay shows it is listening.
Speak your sentence or paragraph.
Release the key. Recording stops.
VoiceFlow transcribes the audio on your device (no upload), then cleans the text according to the mode you picked.
It pastes the result at your cursor, in whichever app has focus.

That is it. There is no window to switch to, no "send" button, no account. It lives in the system tray and waits for the hotkey.

Two ways to trigger (set in Settings):

Hold (default): record while the key is held, transcribe on release. Best for quick bursts.
Toggle: tap once to start, tap again to stop. Best for longer, hands-free dictation.
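
If it helps to see the difference between the two trigger styles as logic, here is a minimal sketch in Python. It is not VoiceFlow's implementation: the Trigger class, its method names, and the "start"/"stop" return values are all hypothetical, and the only behaviour taken from this guide is "hold records while the key is down" and "toggle flips on each tap".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    """Hypothetical helper: turns raw key events into start/stop actions."""
    style: str = "hold"       # "hold" or "toggle"
    recording: bool = False
    pressed: bool = False     # used to ignore keyboard auto-repeat

    def key_down(self) -> Optional[str]:
        if self.pressed:
            return None       # repeated key-down while the key is still held
        self.pressed = True
        if self.style == "hold":
            self.recording = True
            return "start"
        # Toggle: each fresh tap flips the recording state.
        self.recording = not self.recording
        return "start" if self.recording else "stop"

    def key_up(self) -> Optional[str]:
        self.pressed = False
        if self.style == "hold" and self.recording:
            self.recording = False
            return "stop"     # releasing the key ends a hold capture
        return None           # in toggle style, release does nothing
```

In this sketch, a hold trigger emits "start" on press and "stop" on release, while a toggle trigger emits "start" on the first tap and "stop" on the second, and ignores releases entirely.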

Setup walkthrough: from zero to dictating

Step 1. Install

Run the Windows installer (VoiceFlow-Setup.exe). It bundles the speech models, so it works offline with nothing else to download. Once installed, VoiceFlow appears in the system tray.

Step 2. First run auto-configures

The first time it launches, VoiceFlow profiles your PC and automatically picks the on-device speech engine and settings that best suit your hardware. On an English-language machine with the Parakeet model bundled, it selects Parakeet for the best speed and accuracy. You are not asked to configure anything.

Step 3. Check your microphone

Hold Ctrl+Space, say a test sentence, and release. The text should appear wherever your cursor is. If nothing appears, click into a text field first (so there is somewhere to paste), and confirm that the microphone you are speaking into is the Windows default input device.

Step 4. Pick a mode

Right-click the tray icon and choose a mode. Leave it on clean for general dictation, or switch to code when you are dictating to an AI coding assistant.

Step 5. Make it yours (optional)

Open Settings from the tray menu to rebind the hotkey, switch between hold and toggle, choose the speech engine, or turn on a local LLM for the summary and prompt modes. Add your own dictionary terms in the config file. There is no sign-in and no configuration required to start.

Output modes

Every capture runs through a mode that decides how the raw transcript is turned into text. Switch modes per capture from the tray menu, or set a default in Settings. Five modes ship:

Mode	What it does	Needs an LLM?	Best for
`raw`	Verbatim transcript, lightly punctuated. Nothing removed.	No	Quoting exactly what was said
`clean` (default)	Removes filler, collapses repeats, fixes spacing and capitalisation.	No	Everyday dictation
`summary`	Condenses rambling speech into a tight brief.	Yes	A spoken brain-dump into a note
`prompt`	Reformats speech into a structured LLM prompt.	Yes	Crafting a prompt by voice
`code`	Trims politeness, keeps code words, fixes programming terms.	No	Dictating to Claude Code / Cursor

The non-LLM modes (raw, clean, code) are fully offline and instant. The LLM modes (summary, prompt) are optional and only run if you point VoiceFlow at a local model (such as Ollama) or a cloud endpoint you configure. Out of the box, no LLM is configured, so everything is on-device.
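
One way to picture the offline/LLM split is as a lookup from mode name to "needs an LLM". The sketch below is illustrative only: MODES mirrors the table above, but the function names are hypothetical and say nothing about how VoiceFlow is actually structured.

```python
from typing import List

# Mode name -> whether it needs an LLM (values taken from the modes table).
MODES = {
    "raw": False,
    "clean": False,
    "summary": True,
    "prompt": True,
    "code": False,
}


def can_run(mode: str, llm_configured: bool) -> bool:
    """Return True if the mode is usable with the current setup."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return llm_configured or not MODES[mode]


def available_modes(llm_configured: bool) -> List[str]:
    """List the modes that can run, in table order."""
    return [m for m in MODES if can_run(m, llm_configured)]
```

With no LLM configured, available_modes(False) yields the three offline modes; with one configured, all five are available.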

Before and after, every mode

raw   spoken: um so basically i need to add a a login function you know
      output: Um so basically i need to add a a login function you know.

clean spoken: um so basically i need to add a a login function you know
      output: So i need to add a login function.

code  spoken: um can you please add a dunder init to the fast api app
      output: Add a __init__ to the FastAPI app.

The summary and prompt examples below are illustrative and need a local or cloud LLM:

summary spoken: so I was thinking we should maybe look at the caching layer
                because it keeps timing out and users are complaining and it
                might be the redis connection pool or something
        output: Investigate caching timeouts; likely the Redis connection pool.

prompt  spoken: write me something that takes a csv and returns the average per column
        output: Role: Python developer
                Task: Write a function that reads a CSV and returns the average per column.
                Constraints: none
                Context: none

The code mode and the code-aware dictionary

This is the feature built specifically for dictating to an AI coding assistant. Two things happen in code mode, both instant and on-device:

Polite-prefix trimming. Spoken requests start with filler and courtesy ("um, can you please..."). Code mode strips that and leaves the imperative, which is exactly what an AI assistant wants.
Code-aware term fixing. A built-in dictionary rewrites spoken programming terms into the symbols and names you meant. Code mode also keeps words that can carry meaning in a coding request (such as "like", "right", and "actually") instead of deleting them as filler.

Verified examples (literal output):

say: write a test for the postgres connection
get: Write a test for the PostgreSQL connection.

say: create a dot py file with a dunder init
get: Create a .py file with a __init__.

say: refactor this like the right way using type script
get: Refactor this like the right way using TypeScript.

say: i want you to add a triple equals check not equal to null
get: Add a === check != to null.
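
The two steps can be sketched in a few lines of Python. This is a hedged illustration, not the shipped code: the POLITE_PREFIXES list is an assumption, the function names are hypothetical, and the term dictionary is simply copied from this guide's built-in terms (v1). The sketch trims courtesy only at the start of the sentence and applies the terms with a single word-boundary regular expression.

```python
import re

# Assumed courtesy and filler openers; the real list is not documented here.
POLITE_PREFIXES = ["um", "uh", "can you", "could you", "please", "i want you to"]

# Spoken term -> written form, copied from this guide's built-in terms (v1).
BUILTIN_TERMS = {
    "dunder init": "__init__",
    "arrow function": "=>", "fat arrow": "=>",
    "double equals": "==", "triple equals": "===", "not equal": "!=",
    "dot py": ".py", "dot ts": ".ts", "dot js": ".js",
    "dot json": ".json", "dot md": ".md",
    "fast api": "FastAPI", "postgres": "PostgreSQL", "node js": "Node.js",
    "type script": "TypeScript", "java script": "JavaScript", "git hub": "GitHub",
}

# One alternation, longest phrase first, matched on word boundaries.
_TERM_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(BUILTIN_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def trim_polite_prefix(text: str) -> str:
    """Strip leading openers repeatedly until none is left."""
    changed = True
    while changed:
        changed = False
        for prefix in POLITE_PREFIXES:
            if text.lower().startswith(prefix + " "):
                text = text[len(prefix) + 1:].lstrip()
                changed = True
    return text


def fix_terms(text: str) -> str:
    """Replace every spoken term with its written form."""
    return _TERM_PATTERN.sub(lambda m: BUILTIN_TERMS[m.group(0).lower()], text)


def code_mode(transcript: str) -> str:
    text = " ".join(transcript.split())        # normalise whitespace
    text = fix_terms(trim_polite_prefix(text))
    if not text:
        return ""
    text = text[0].upper() + text[1:]          # capitalise the imperative
    return text if text[-1] in ".!?" else text + "."


print(code_mode("i want you to add a triple equals check not equal to null"))
# prints: Add a === check != to null.
```

Because nothing beyond the leading openers is removed, words such as "like" and "right" in the middle of a request pass through untouched, which matches the third example above. Fillers in the middle of a sentence are out of scope for this sketch.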

Built-in terms (v1)

You say	You get
dunder init	`__init__`
arrow function / fat arrow	`=>`
double equals / triple equals / not equal	`==` / `===` / `!=`
dot py / dot ts / dot js / dot json / dot md	`.py` / `.ts` / `.js` / `.json` / `.md`
fast api	`FastAPI`
postgres	`PostgreSQL`
node js	`Node.js`
type script / java script	`TypeScript` / `JavaScript`
git hub	`GitHub`

The built-in code dictionary merges with your personal dictionary in the config file, and your entries win when both define the same term. Add the library names and jargon your stack uses, and VoiceFlow will fix them every time. The merged dictionary applies in all modes, not just code, so even a clean capture gets your terms fixed. Code mode adds the politeness trimming on top.
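
The "your entries win" rule is the usual behaviour of a dictionary merge in which the personal entries are applied last. A minimal Python sketch follows, with a hypothetical function name, a three-entry subset of the built-in terms, and made-up personal entries:

```python
from typing import Dict

# A small subset of the built-in terms from the table above.
BUILTIN_TERMS = {
    "postgres": "PostgreSQL",
    "fast api": "FastAPI",
    "git hub": "GitHub",
}


def merge_dictionaries(builtin: Dict[str, str], personal: Dict[str, str]) -> Dict[str, str]:
    """Merge two spoken-term maps; personal entries override built-in ones."""
    merged = {spoken.lower(): written for spoken, written in builtin.items()}
    # Applied second, so a personal entry replaces a built-in one with the same key.
    merged.update({spoken.lower(): written for spoken, written in personal.items()})
    return merged


# Hypothetical personal entries: one override, one new term.
personal = {"Postgres": "Postgres", "pie dantic": "Pydantic"}
merged = merge_dictionaries(BUILTIN_TERMS, personal)
print(merged["postgres"])    # prints: Postgres
print(merged["pie dantic"])  # prints: Pydantic
```

Lower-casing the spoken keys is a choice made for this sketch so that "Postgres" and "postgres" count as the same term; whether VoiceFlow treats keys case-insensitively is not stated in this guide.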

Scenarios: VoiceFlow in a real day

A. Dictating to Claude Code (the headline use)

You are in Claude Code and want a new endpoint. You switch VoiceFlow to code mode, click into the chat box, hold Ctrl+Space, and say:

"um can you please add a dunder init to the fast api app and a dot py file for the routes"

You release the key. This appears in the chat box, ready to send:

Add a __init__ to the FastAPI app and a .py file for the routes.

No filler, the symbols are right, and you never touched the keyboard. You hit enter and keep moving.

B. Writing a pull-request description (clean mode)

You finished a change and need a PR body. In clean mode you hold the key and talk through what you did in plain speech, including the inevitable "ums" and restarts. VoiceFlow drops the filler, fixes the punctuation, and pastes a readable paragraph into the PR description box. You edit one word and submit.

C. Capturing a rambling idea as a tidy summary (summary mode)

A thought hits you mid-task. With a local model configured, you switch to summary mode, hold the key, and ramble for thirty seconds about a caching problem. VoiceFlow condenses it to a one-line note you paste into your issue tracker, instead of a wall of spoken text.

D. Hands-free notes while reading (toggle trigger)

You are reading documentation and want to take notes without holding a key. You set the trigger style to toggle in Settings. Now you tap once to start, talk as you read, and tap again to stop. Your notes land in your editor, cleaned up by clean mode.

Configuration and customisation

VoiceFlow keeps a small config file (TOML) in your Windows app-data folder (%APPDATA%\VoiceFlow\). Highlights:

Output mode: the default mode for new captures (raw, clean, summary, prompt, code).
Hotkeys: record (default Ctrl+Space), paste-again (Ctrl+Shift+V), cancel (Esc); all rebindable in Settings.
Trigger style: hold or toggle.
Speech engine: the on-device provider and model. Auto-configured on first run; changeable in Settings.
LLM provider: none (default, fully offline), a local OpenAI-compatible endpoint, or a cloud endpoint. Only used by summary and prompt.
Personal dictionary: your own spoken-term to text mappings, merged on top of the built-in code dictionary (your entries win).

If VoiceFlow keeps mis-hearing a library name, add it to your dictionary and it will be fixed in every capture from then on.

Install options

Quick install (recommended for users). Download and run VoiceFlow-Setup.exe from the latest release. Models are bundled; it works offline immediately.

From source (for developers). Not available. VoiceFlow is proprietary and its source is not public, so the installer is the only supported path.

At a glance (reference card)

Platform: Windows 10/11 (64-bit).
Speech: on-device (NVIDIA Parakeet or Whisper), CPU-friendly, no GPU required.
Privacy: local-only by default, with zero network egress unless you configure a cloud LLM endpoint.
Modes: raw, clean, summary, prompt, code.
Trigger: push-to-talk hold or toggle, fully rebindable.
Pastes into: any text field in any Windows app.
For AI coders: code mode plus a built-in, extensible programming dictionary.
Cost: free, no account.
