Local Speech Recognition with Whisper

June 23rd, 2025
tech
I've been a heavy user of dictation, off and on, as my wrists have gotten better and worse. I've mostly used the built-in Mac and Android recognition: Mac isn't great, Android is pretty good, neither has improved much over the past ~5y despite large improvements in what should be possible. OpenAI has an open speech recognition model, whisper, and I wanted to have a go at running it on my Mac.

It looks like for good local performance the best version is whisper.cpp, which is a plain C/C++ implementation with support for Mac's ML hardware. To get this installed I needed to install XCode (not just the command line tools, since I needed coremlc) and then run:

$ sudo xcodebuild -license
$ git clone https://github.com/ggerganov/whisper.cpp
$ cd whisper.cpp

$ python3.11 -m venv whisper_v3.11
$ source whisper_v3.11/bin/activate

$ pip install "numpy<2"
$ pip install ane_transformers
$ pip install openai-whisper
$ pip install coremltools
$ brew install sdl2

$ sh ./models/download-ggml-model.sh large-v3-turbo
$ PATH="$PATH:/Applications/Xcode.app/Contents/Developer/usr/bin" \
    ./models/generate-coreml-model.sh large-v3-turbo

$ cmake -B build -DWHISPER_COREML=1 -DWHISPER_SDL2=ON
$ cmake --build build -j --config Release

Note that both older (3.10) and newer (3.13) Python versions gave compilation errors.

While I don't know if these are the ideal arguments, I've been using:

$ ~/code/whisper.cpp/build/bin/whisper-stream \
    --capture 1 \
    --model ~/code/whisper.cpp/models/ggml-large-v3-turbo.bin \
    -t 8 --flash-attn --keep-context --keep 1000 \
    --file output.txt

By default the output is quite repetitive. For example I dictated:

It looks like for good local performance, the best version to use is whisper.cpp, which is a plain C/C++ implementation with support for Mac's machine learning hardware. To get this installed, I needed to install Xcode (not just the command line tools since I needed coremlc), and then run a whole bunch of commands.

The output was:

It looks like for good local performance the best version to use is whisper.cpp, which is a plain C++
to use is whisper.cpp which is a plain C/C++ implementation with support for maximum
machine learning hardware. To get this insight into the
machine learning hardware. To get this installed, I needed to install Xcode, not just the computer
not just the command line tools since I needed core MLC.
not just the command line tools since I needed core MLC, and then run a whole bunch of commands.

The fix for not enough LLM these days is often more LLM, so I used Claude Sonnet 4 to clean it up with the prompt "Could you clean up this Whisper transcript? The things that look like repetitions aren't, it's just that whisper does a lot of 'restating' as it updates it's view of what was said." This gave me:

It looks like for good local performance, the best version to use is whisper.cpp, which is a plain C/C++ implementation with support for maximum machine learning hardware. To get this installed, I needed to install Xcode, not just the command line tools since I needed Core ML, and then run a whole bunch of commands.

This is very good! The only two things it seems to have gotten 'wrong' are "maximum" for "Mac's" and commas where I was thinking parens. And neither of these are very wrong: "Mac's" comes out verbally as "max" and "maximum" is also plausible in context; the commas read fine, perhaps better than my parens.

I set this up a couple weeks ago, and have generally been finding this quite useful.

Comment via: facebook, lesswrong, mastodon, bluesky, substack

Recent posts on blogs I like:

My irrational hatred of Geoffrey Miller's Mate

I. Of every dating advice book I’ve ever read, the single one that drives me the most nuts is Geoffrey Miller’s and Tucker Max’s book Mate: Becoming The Man Women Want.

via Thing of Things January 14, 2026

Why I Don't Think My Braces Were Worth It

A couple weeks ago, I got my braces off. I kind of wish I had never had them, though. When I was younger, two of my teeth were sticking out, and they looked kind of funny. I thought that my teeth were just fine, and I didn't want to get braces. But s…

via Anna Wise's Blog Posts January 3, 2026

Family Christmas

Unlike many families my family celebrates Christmas with really really a lot of our family. This past year there were about 29 people at my Grandfather's house in the week around Christmas. I know what you're thinking: how does that work? It's…

via Lily Wise's Blog Posts January 3, 2026

more     (via openring)