The author is saying that the output *token* is not deterministic. I don't think...

hansvm · on Jan 14, 2025

Mostly unrelated (I agree with you, and I wrote an ancestor comment you're responding to with the same line of thinking): I have built a couple of LLMs where the distribution itself is stochastic. That's not key to how they work as a black box, but much like how quicksort's performance characteristics can benefit from a randomized pivot, I did find it advantageous to introduce randomness into the model itself.

You could still easily model the next token as a conditional probability distribution though if you wanted; the computation of entropy just might be a bit spendier.
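
A minimal sketch of why the entropy gets pricier, under loudly stated assumptions: `stochastic_model` is a hypothetical toy (four-token vocabulary, Gaussian noise added to fixed logits), not any of the models described above. If the model's own output distribution is random, the next-token distribution you'd actually want is the average over that internal randomness, so estimating its entropy takes many forward passes instead of one.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stochastic_model(context, rng):
    """Hypothetical toy model: fixed base logits plus internal Gaussian noise.

    `context` is ignored here; a real model would condition on it.
    """
    base_logits = [2.0, 1.0, 0.5, -1.0]
    noisy = [b + rng.gauss(0.0, 1.0) for b in base_logits]
    return softmax(noisy)


def estimate_marginal(context, n_samples=1000, seed=0):
    """Monte Carlo estimate of the marginal next-token distribution.

    Returns (marginal_probs, mean_per_sample_entropy).
    """
    rng = random.Random(seed)
    acc = None
    entropy_sum = 0.0
    for _ in range(n_samples):
        probs = stochastic_model(context, rng)
        if acc is None:
            acc = [0.0] * len(probs)
        for i, p in enumerate(probs):
            acc[i] += p
        entropy_sum += entropy(probs)
    marginal = [a / n_samples for a in acc]
    return marginal, entropy_sum / n_samples


marginal, mean_sample_entropy = estimate_marginal("some prompt")
print("entropy of marginal:", entropy(marginal))
print("mean per-sample entropy:", mean_sample_entropy)
```

The entropy of the averaged distribution is never lower than the average of the per-sample entropies (entropy is concave), so a single forward pass tends to understate the uncertainty of the next token.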