Local Speech Recognition with Whisper

June 23rd, 2025
tech
I've been a heavy user of dictation, off and on, as my wrists have gotten better and worse. I've mostly used the built-in Mac and Android recognition: Mac isn't great, Android is pretty good, neither has improved much over the past ~5y despite large improvements in what should be possible. OpenAI has an open speech recognition model, whisper, and I wanted to have a go at running it on my Mac.

It looks like for good local performance the best version is whisper.cpp, which is a plain C/C++ implementation with support for Mac's ML hardware. To get this installed I needed to install XCode (not just the command line tools, since I needed coremlc) and then run:

$ sudo xcodebuild -license
$ git clone https://github.com/ggerganov/whisper.cpp
$ cd whisper.cpp

$ python3.11 -m venv whisper_v3.11
$ source whisper_v3.11/bin/activate

$ pip install "numpy<2"
$ pip install ane_transformers
$ pip install openai-whisper
$ pip install coremltools
$ brew install sdl2

$ sh ./models/download-ggml-model.sh large-v3-turbo
$ PATH="$PATH:/Applications/Xcode.app/Contents/Developer/usr/bin" \
    ./models/generate-coreml-model.sh large-v3-turbo

$ cmake -B build -DWHISPER_COREML=1 -DWHISPER_SDL2=ON
$ cmake --build build -j --config Release

Note that both older (3.10) and newer (3.13) Python versions gave compilation errors.

While I don't know if these are the ideal arguments, I've been using:

$ ~/code/whisper.cpp/build/bin/whisper-stream \
    --capture 1 \
    --model ~/code/whisper.cpp/models/ggml-large-v3-turbo.bin \
    -t 8 --flash-attn --keep-context --keep 1000 \
    --file output.txt

By default the output is quite repetitive. For example I dictated:

It looks like for good local performance, the best version to use is whisper.cpp, which is a plain C/C++ implementation with support for Mac's machine learning hardware. To get this installed, I needed to install Xcode (not just the command line tools since I needed coremlc), and then run a whole bunch of commands.

The output was:

It looks like for good local performance the best version to use is whisper.cpp, which is a plain C++
to use is whisper.cpp which is a plain C/C++ implementation with support for maximum
machine learning hardware. To get this insight into the
machine learning hardware. To get this installed, I needed to install Xcode, not just the computer
not just the command line tools since I needed core MLC.
not just the command line tools since I needed core MLC, and then run a whole bunch of commands.

The fix for not enough LLM these days is often more LLM, so I used Claude Sonnet 4 to clean it up with the prompt "Could you clean up this Whisper transcript? The things that look like repetitions aren't, it's just that whisper does a lot of 'restating' as it updates it's view of what was said." This gave me:

It looks like for good local performance, the best version to use is whisper.cpp, which is a plain C/C++ implementation with support for maximum machine learning hardware. To get this installed, I needed to install Xcode, not just the command line tools since I needed Core ML, and then run a whole bunch of commands.

This is very good! The only two things it seems to have gotten 'wrong' are "maximum" for "Mac's" and commas where I was thinking parens. And neither of these are very wrong: "Mac's" comes out verbally as "max" and "maximum" is also plausible in context; the commas read fine, perhaps better than my parens.

I set this up a couple weeks ago, and have generally been finding this quite useful.

Comment via: facebook, lesswrong, mastodon, bluesky, substack

Recent posts on blogs I like:

Differential diagnosis of loveshyness

In my life coaching practice, I see a lot of male clients who have trouble getting dates (including fairly severe trouble, such as never having been kissed in spite of being in their thirties).

via Thing of Things February 6, 2026

2025-26 New Year review

This is an annual post reviewing the last year and setting intentions for next year. I look over different life areas (work, health, parenting, effectiveness, etc) and analyze my life tracking data. Highlights include a minimal group house, the usefulness…

via Victoria Krakovna January 19, 2026

Family Christmas

Unlike many families my family celebrates Christmas with really really a lot of our family. This past year there were about 29 people at my Grandfather's house in the week around Christmas. I know what you're thinking: how does that work? It's…

via Lily Wise's Blog Posts January 3, 2026

more     (via openring)