CLASSIFICATION: PUBLIC DOC-ID: JGS-VOICEFLOW · GUIDE REV 0.2.0


The VoiceFlow guide

This guide covers how the app works, first-run setup, the output modes, code mode for AI coding assistants, real-day scenarios, and configuration.

How it works (the 30-second model)

The whole app is one simple loop:

  1. Hold the hotkey (default Ctrl+Space). A small overlay shows it is listening.
  2. Speak your sentence or paragraph.
  3. Release the key. Recording stops.
  4. It transcribes on your device (no upload), then cleans the text per the mode you picked.
  5. It pastes the result at your cursor, in whatever app has focus.

That is it. There is no window to switch to, no "send" button, no account. It lives in the system tray and waits for the hotkey.
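The loop can be pictured as three pluggable stages that run when the hotkey is released. This is a minimal sketch, not VoiceFlow's code: the stage names and signatures (`Transcriber`, `Cleaner`, `Paster`, `run_capture`) are hypothetical, and the lambdas in the usage lines stand in for the real on-device speech engine, the mode cleanup, and the paste step.

```python
from typing import Callable

# Hypothetical stage types for illustration; VoiceFlow's internals are not public.
Transcriber = Callable[[bytes], str]   # audio in, raw transcript out (on-device)
Cleaner = Callable[[str], str]         # raw transcript in, mode-cleaned text out
Paster = Callable[[str], None]         # inserts text at the focused cursor


def run_capture(audio: bytes, transcribe: Transcriber,
                clean: Cleaner, paste: Paster) -> str:
    """One pass of the loop, run when the hotkey is released."""
    raw_text = transcribe(audio)   # step 4: local speech-to-text, no upload
    text = clean(raw_text)         # step 4: clean per the selected mode
    paste(text)                    # step 5: paste into whatever app has focus
    return text


# Usage with stub stages: the "paste" target is just a list here.
pasted: list[str] = []
run_capture(b"", lambda _audio: "hello world", lambda t: t.capitalize() + ".", pasted.append)
print(pasted)  # ['Hello world.']
```

Keeping the stages separate mirrors the guide's point that the mode only changes the cleanup step; transcription and pasting stay the same.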

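Two ways to trigger (set in Settings):

  1. Hold: hold the hotkey while you speak and release it to stop, as in the loop above.
  2. Toggle: tap the hotkey once to start recording and tap it again to stop, so you do not have to keep the key held down.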

Setup walkthrough: from zero to dictating

Step 1. Install

Run the Windows installer (VoiceFlow-Setup.exe). It bundles the speech models, so it works with no internet and nothing else to download. VoiceFlow appears in the system tray.

Step 2. First run auto-configures

The first time it launches, VoiceFlow profiles your PC and picks the best on-device speech engine and settings for your hardware automatically. On an English-language machine with the Parakeet model bundled, it selects Parakeet for the best speed and accuracy. Nothing is asked of you.

Step 3. Check your microphone

Hold Ctrl+Space, say a test sentence, and release. The text should appear wherever your cursor is. If nothing appears, click into a text field first (so there is somewhere to paste), and confirm that the microphone you speak into is set as the Windows default input device.

Step 4. Pick a mode

Right-click the tray icon and choose a mode. Leave it on clean for general dictation, or switch to code when you are dictating to an AI coding assistant.

Step 5. Make it yours (optional)

Open Settings from the tray menu to rebind the hotkey, switch between hold and toggle, choose the speech engine, or turn on a local LLM for the summary and prompt modes. Add your own dictionary terms in the config file. There is no sign-in and no configuration required to start.

Output modes

Every capture runs through a mode that decides how the raw transcript is turned into text. Switch modes per capture from the tray menu, or set a default in Settings. Five modes ship:

  Mode             What it does                                                          Needs an LLM?  Best for
  raw              Verbatim transcript, lightly punctuated. Nothing removed.             No             Quoting exactly what was said
  clean (default)  Removes filler, collapses repeats, fixes spacing and capitalisation.  No             Everyday dictation
  summary          Condenses rambling speech into a tight brief.                         Yes            A spoken brain-dump into a note
  prompt           Reformats speech into a structured LLM prompt.                        Yes            Crafting a prompt by voice
  code             Trims politeness, keeps code words, fixes programming terms.          No             Dictating to Claude Code / Cursor

The non-LLM modes (raw, clean, code) are fully offline and instant. The LLM modes (summary, prompt) are optional and only run if you point VoiceFlow at a local model (such as Ollama) or a cloud endpoint you configure. Out of the box, no LLM is configured, so everything is on-device.
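The "Needs an LLM?" column amounts to a simple gate. A minimal sketch, assuming a hypothetical `llm_endpoint` setting where `None` means no LLM is configured (the out-of-the-box state); the function name and setting are illustrative, not VoiceFlow's configuration.

```python
from typing import Optional

# Which modes need an LLM, taken from the modes table.
NEEDS_LLM = {"raw": False, "clean": False, "code": False, "summary": True, "prompt": True}


def can_run(mode: str, llm_endpoint: Optional[str] = None) -> bool:
    """Return True if `mode` can run.

    `llm_endpoint` is a hypothetical setting: None means no local or
    cloud LLM has been configured.
    """
    if mode not in NEEDS_LLM:
        raise ValueError(f"unknown mode: {mode!r}")
    return not NEEDS_LLM[mode] or llm_endpoint is not None


# Out of the box no endpoint is set, so only the offline modes are available.
print([m for m in NEEDS_LLM if can_run(m)])  # ['raw', 'clean', 'code']
```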

Before and after, every mode

raw   spoken: um so basically i need to add a a login function you know
      output: Um so basically i need to add a a login function you know.

clean spoken: um so basically i need to add a a login function you know
      output: So i need to add a login function.

code  spoken: um can you please add a dunder init to the fast api app
      output: Add a __init__ to the FastAPI app.

The summary and prompt examples below are illustrative and need a local or cloud LLM:

summary spoken: so I was thinking we should maybe look at the caching layer
                because it keeps timing out and users are complaining and it
                might be the redis connection pool or something
        output: Investigate caching timeouts; likely the Redis connection pool.

prompt  spoken: write me something that takes a csv and returns the average per column
        output: Role: Python developer
                Task: Write a function that reads a CSV and returns the average per column.
                Constraints: none
                Context: none
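The clean-mode behaviour in the table (drop filler, collapse repeats, tidy spacing and capitalisation) can be approximated with a few regular expressions. This is a rough sketch under stated assumptions: the filler list is hypothetical, and VoiceFlow's real rules are not documented here. It happens to reproduce the clean example above, including the lowercase "i", because it only capitalises the first letter.

```python
import re

# Illustrative filler list (multi-word phrases first); not VoiceFlow's actual list.
FILLERS = ["you know", "basically", "um", "uh"]


def clean(text: str) -> str:
    """Sketch of a clean-style pass: drop filler, collapse repeats, tidy up."""
    # Remove filler words and phrases, matching whole words only.
    for filler in FILLERS:
        text = re.sub(rf"\b{re.escape(filler)}\b", " ", text, flags=re.IGNORECASE)
    # Collapse immediate word repeats such as "a a" -> "a".
    # (A real cleaner would need to spare legitimate repeats like "that that".)
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Normalise whitespace, capitalise the first letter, end with a period.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".?!" else text + "."


print(clean("um so basically i need to add a a login function you know"))
# So i need to add a login function.
```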

The code mode and the code-aware dictionary

This is the feature built specifically for dictating to an AI coding assistant. Two things happen in code mode, both instant and on-device:

  1. Polite-prefix trimming. Spoken requests start with filler and courtesy ("um, can you please..."). Code mode strips that and leaves the imperative, which is exactly what an AI assistant wants.
  2. Code-aware term fixing. A built-in dictionary rewrites spoken programming terms into the symbols and names you meant. It keeps words that are meaningful in code (like "like", "right", "actually") instead of deleting them as filler.
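Polite-prefix trimming can be sketched as one anchored regular expression that strips leading filler and a courtesy phrase, then capitalises what is left. The phrase list below is an assumption for illustration; the actual trimming rules are not published in this guide.

```python
import re

# Illustrative patterns only; VoiceFlow's actual trimming rules are not published.
_LEADING_FILLER = r"(?:(?:um+|uh+)[,\s]+)*"
_POLITE = (
    r"(?:(?:can|could|would) you(?: please)?"
    r"|please|i want you to|i need you to)[,\s]+"
)
# Anchored at the start, so only a prefix is ever removed.
_PREFIX_RE = re.compile(rf"^\s*{_LEADING_FILLER}(?:{_POLITE})?", re.IGNORECASE)


def trim_polite_prefix(text: str) -> str:
    """Strip leading filler and courtesy, leaving a capitalised imperative."""
    rest = _PREFIX_RE.sub("", text, count=1)
    return rest[:1].upper() + rest[1:]


print(trim_polite_prefix("um, can you please add a dunder init to the fast api app"))
# Add a dunder init to the fast api app
```

Term fixing ("dunder init" to `__init__`) would run as a separate step; this sketch covers only the trimming.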

Verified examples (literal output):

say: write a test for the postgres connection
get: Write a test for the PostgreSQL connection.

say: create a dot py file with a dunder init
get: Create a .py file with a __init__.

say: refactor this like the right way using type script
get: Refactor this like the right way using TypeScript.

say: i want you to add a triple equals check not equal to null
get: Add a === check != to null.

Built-in terms (v1)

  You say                                       You get
  dunder init                                   __init__
  arrow function / fat arrow                    =>
  double equals / triple equals / not equal     == / === / !=
  dot py / dot ts / dot js / dot json / dot md  .py / .ts / .js / .json / .md
  fast api                                      FastAPI
  postgres                                      PostgreSQL
  node js                                       Node.js
  type script / java script                     TypeScript / JavaScript
  git hub                                       GitHub
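A dictionary like this can be applied in a single left-to-right pass with one combined regular expression. A minimal sketch, assuming whole-word, case-insensitive matching; the terms come from the table, but the matching strategy is an assumption, not a description of VoiceFlow's engine. Because `re.sub` never rescans its own replacements, "PostgreSQL" is not matched again as "postgres".

```python
import re

# Built-in terms transcribed from the table (v1).
CODE_TERMS = {
    "dunder init": "__init__",
    "arrow function": "=>",
    "fat arrow": "=>",
    "double equals": "==",
    "triple equals": "===",
    "not equal": "!=",
    "dot py": ".py",
    "dot ts": ".ts",
    "dot js": ".js",
    "dot json": ".json",
    "dot md": ".md",
    "fast api": "FastAPI",
    "postgres": "PostgreSQL",
    "node js": "Node.js",
    "type script": "TypeScript",
    "java script": "JavaScript",
    "git hub": "GitHub",
}


def fix_terms(text: str, terms: dict[str, str] = CODE_TERMS) -> str:
    """Replace spoken programming terms in one left-to-right pass."""
    lookup = {spoken.lower(): written for spoken, written in terms.items()}
    if not lookup:
        return text
    # Longest phrases first, so a longer phrase is preferred when two overlap.
    alternation = "|".join(re.escape(s) for s in sorted(lookup, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(fix_terms("create a dot py file with a dunder init"))
# create a .py file with a __init__
```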

The code dictionary merges with your personal dictionary in the config file, and your entries win. Add the library names and jargon your stack uses, and VoiceFlow will fix them every time. It applies in all modes, not just code, so even a clean capture gets your terms fixed. Code mode adds the politeness trimming on top.
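The "your entries win" rule maps naturally onto a dictionary update where personal entries are applied last. A minimal sketch: the example dictionaries (including the "pie torch" entry) are hypothetical, and lowercasing the spoken keys is an assumption so that "Postgres" and "postgres" count as the same term.

```python
# Hypothetical example dictionaries; only the "user entries win" rule comes from the guide.
BUILT_IN = {"postgres": "PostgreSQL", "fast api": "FastAPI"}
PERSONAL = {"Postgres": "Postgres", "pie torch": "PyTorch"}


def merge_dictionaries(built_in: dict[str, str], personal: dict[str, str]) -> dict[str, str]:
    """Combine term dictionaries; on a clash the personal entry wins."""
    merged = {spoken.lower(): written for spoken, written in built_in.items()}
    # update() overwrites existing keys, so personal entries take precedence.
    merged.update({spoken.lower(): written for spoken, written in personal.items()})
    return merged


MERGED = merge_dictionaries(BUILT_IN, PERSONAL)
print(MERGED["postgres"])  # Postgres
```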

Scenarios: VoiceFlow in a real day

A. Dictating to Claude Code (the headline use)

You are in Claude Code and want a new endpoint. You switch VoiceFlow to code mode, click into the chat box, hold Ctrl+Space, and say:

"um can you please add a dunder init to the fast api app and a dot py file for the routes"

You release the key. This appears in the chat box, ready to send:

Add a __init__ to the FastAPI app and a .py file for the routes.

No filler, the symbols are right, and you did not type a single word. You hit enter and keep moving.

B. Writing a pull-request description (clean mode)

You finished a change and need a PR body. In clean mode you hold the key and talk through what you did in plain speech, including the inevitable "ums" and restarts. VoiceFlow drops the filler, fixes the punctuation, and pastes a readable paragraph into the PR description box. You edit one word and submit.

C. Capturing a rambling idea as a tidy summary (summary mode)

A thought hits you mid-task. With a local model configured, you switch to summary mode, hold the key, and ramble for thirty seconds about a caching problem. VoiceFlow condenses it to a one-line note you paste into your issue tracker, instead of a wall of spoken text.
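If the local model is served by Ollama, a summary request could look like the sketch below. It assumes Ollama is running on its default local address and uses Ollama's `/api/generate` endpoint with `stream` set to false; the model name `llama3.2` and the prompt wording are placeholders, and this is not how VoiceFlow itself is documented to call the model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_summary_request(transcript: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming Ollama generate request asking for a one-line summary."""
    payload = {
        "model": model,  # placeholder: any model you have pulled locally
        "prompt": "Condense this spoken note into one tight line:\n\n" + transcript,
        "stream": False,  # return a single JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(transcript: str, model: str = "llama3.2") -> str:
    """Send the request and return the model's text (needs a running Ollama)."""
    request = build_summary_request(transcript, model)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

Everything stays on the machine in this setup; pointing `OLLAMA_URL` at a cloud endpoint would send the transcript off-device.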

D. Hands-free notes while reading (toggle mode)

You are reading documentation and want to take notes without holding a key. You set the hotkey to toggle mode in Settings. Now you tap once to start, talk as you read, and tap again to stop. Your notes land in your editor in clean mode.
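Hold and toggle triggering can be modelled as a tiny state machine. A minimal sketch, assuming the app receives separate key-down and key-up events for the hotkey; the `Trigger` class is hypothetical. It also ignores the repeated key-down events an operating system typically sends while a key is held (auto-repeat), which would otherwise flip a toggle on and off.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """Hypothetical model of the two trigger styles; not VoiceFlow's code."""
    mode: str = "hold"        # "hold" or "toggle"
    recording: bool = False
    _key_held: bool = False   # used to ignore auto-repeat key-down events

    def key_down(self) -> None:
        if self._key_held:    # auto-repeat while the key stays down: ignore
            return
        self._key_held = True
        if self.mode == "hold":
            self.recording = True                 # hold: start on press
        else:
            self.recording = not self.recording   # toggle: each tap flips state

    def key_up(self) -> None:
        self._key_held = False
        if self.mode == "hold":
            self.recording = False                # hold: stop on release
        # toggle: releasing the key changes nothing


t = Trigger(mode="toggle")
t.key_down(); t.key_up()   # first tap starts recording
t.key_down(); t.key_up()   # second tap stops it
print(t.recording)  # False
```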

Configuration and customisation

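VoiceFlow keeps a small config file (TOML) in your Windows app-data folder (%APPDATA%\VoiceFlow\). Your personal dictionary lives there; its entries merge with the built-in code dictionary and take precedence over it.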

If VoiceFlow keeps mis-hearing a library name, add it to your dictionary and it will be fixed in every capture from then on.

Install options

Quick install (recommended for users). Download and run VoiceFlow-Setup.exe from the latest release. Models are bundled; it works offline immediately.

From source (for developers). VoiceFlow is proprietary and the source is not public, so there is no source install for the public release. The installer is the supported path.

At a glance (reference card)

Continue to the docs hub for quickstart, FAQ, troubleshooting, privacy, and download verification, or download VoiceFlow.