Local Speech Recognition with Whisper
June 23rd, 2025
tech
It looks like for good local performance the best version is whisper.cpp, which
is a plain C/C++ implementation with support for Mac's ML hardware.
To get this installed I needed to install XCode (not just the command
line tools, since I needed coremlc
) and then run:
$ sudo xcodebuild -license
$ git clone https://github.com/ggerganov/whisper.cpp
$ cd whisper.cpp
$ python3.11 -m venv whisper_v3.11
$ source whisper_v3.11/bin/activate
$ pip install "numpy<2"
$ pip install ane_transformers
$ pip install openai-whisper
$ pip install coremltools
$ brew install sdl2
$ sh ./models/download-ggml-model.sh large-v3-turbo
$ PATH="$PATH:/Applications/Xcode.app/Contents/Developer/usr/bin" \
    ./models/generate-coreml-model.sh large-v3-turbo
$ cmake -B build -DWHISPER_COREML=1 -DWHISPER_SDL2=ON
$ cmake --build build -j --config Release
Note that both older (3.10) and newer (3.13) Python versions gave compilation errors.
While I don't know if these are the ideal arguments, I've been using:
$ ~/code/whisper.cpp/build/bin/whisper-stream \
    --capture 1 \
    --model ~/code/whisper.cpp/models/ggml-large-v3-turbo.bin \
    -t 8 --flash-attn --keep-context --keep 1000 \
    --file output.txt
By default the output is quite repetitive. For example I dictated:
It looks like for good local performance, the best version to use is whisper.cpp, which is a plain C/C++ implementation with support for Mac's machine learning hardware. To get this installed, I needed to install Xcode (not just the command line tools since I needed coremlc), and then run a whole bunch of commands.
The output was:
It looks like for good local performance the best version to use is whisper.cpp, which is a plain C++
to use is whisper.cpp which is a plain C/C++ implementation with support for maximum
machine learning hardware. To get this insight into the
machine learning hardware. To get this installed, I needed to install Xcode, not just the computer
not just the command line tools since I needed core MLC.
not just the command line tools since I needed core MLC, and then run a whole bunch of commands.
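You might hope to merge these restatements mechanically. Here's a naive sketch (a hypothetical helper, not anything whisper.cpp provides) that joins consecutive lines whenever the end of the text so far exactly matches the start of the next line:

```python
def merge_restatements(lines, min_overlap=5):
    """Merge consecutive transcript lines by exact suffix/prefix overlap.

    If the next line begins by repeating the tail of the merged text
    (at least min_overlap characters), drop the repeated prefix;
    otherwise just append the line.
    """
    merged = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if not merged:
            merged = line
            continue
        # Find the longest suffix of `merged` that is a prefix of `line`.
        best = 0
        for k in range(min(len(merged), len(line)), min_overlap - 1, -1):
            if merged.endswith(line[:k]):
                best = k
                break
        if best:
            merged += line[best:]
        else:
            merged += " " + line
    return merged
```

On the transcript above this barely helps: whisper restates with variations ("To get this insight into" vs. "To get this installed"), so exact matching misses most of the duplication.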
The fix for not enough LLM these days is often more LLM, so I used Claude Sonnet 4 to clean it up with the prompt "Could you clean up this Whisper transcript? The things that look like repetitions aren't, it's just that whisper does a lot of 'restating' as it updates its view of what was said." This gave me:
It looks like for good local performance, the best version to use is whisper.cpp, which is a plain C/C++ implementation with support for maximum machine learning hardware. To get this installed, I needed to install Xcode, not just the command line tools since I needed Core ML, and then run a whole bunch of commands.
This is very good! The only two things it seems to have gotten 'wrong' are "maximum" for "Mac's" and commas where I was thinking parens. And neither of these is very wrong: "Mac's" comes out verbally as "max" and "maximum" is also plausible in context; the commas read fine, perhaps better than my parens.
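If you want to script that cleanup step rather than pasting into a chat window, here's a minimal sketch using the Anthropic Python SDK. The model id and max_tokens are assumptions; check the current model list before relying on them.

```python
# pip install anthropic
CLEANUP_PROMPT = (
    "Could you clean up this Whisper transcript? The things that look "
    "like repetitions aren't, it's just that whisper does a lot of "
    "'restating' as it updates its view of what was said.\n\n"
)

def cleanup_transcript(raw_transcript: str) -> str:
    # Third-party SDK; the client reads ANTHROPIC_API_KEY from the env.
    import anthropic
    client = anthropic.Anthropic()
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model id
        max_tokens=2048,
        messages=[{"role": "user",
                   "content": CLEANUP_PROMPT + raw_transcript}],
    )
    return reply.content[0].text
```

Then something like `cleanup_transcript(open("output.txt").read())` would clean up whisper-stream's output file in one call.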
I set this up a couple of weeks ago, and have generally been finding it quite useful.
Comment via: facebook, lesswrong, mastodon, bluesky, substack