These are good examples because the llama.cpp and whisper.cpp benchmarks take full advantage of Apple hardware, but they also take full advantage of non-Apple hardware via GPU support, AVX support, etc.
It’s been true for a while now that the memory bandwidth of modern Apple systems, in tandem with the neural cores and GPU, has made them very competitive with Nvidia for local inference and even basic training.
I guess I'm mostly lamenting how unscientific these discussions are in general, on HN and elsewhere (outside of specific GitHub repositories). Every community is filled with anecdotal stories, or numbers that omit the settings, model, and runtime details, so there's nothing to compare them against.
But you can likely find similar threads for the llama.cpp benchmark here: https://github.com/ggerganov/llama.cpp/tree/master/examples/...