
Does the cost scale linearly or superlinearly? What does the $300-$400 price data point tell us in relation to the parameter density?

No gotchas here. I genuinely don't know whether 8B parameters is in a zone of significantly diminishing marginal returns. It's too far outside my area of knowledge, but I'm genuinely curious.



Die size increases cost per good chip superlinearly, roughly exponentially, because a bigger die means both fewer chips per wafer and lower yield.
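To put rough numbers on that, here is a minimal sketch assuming the simple Poisson yield model (yield = exp(-area x defect density)) and a common approximation for gross dies per round wafer. The wafer cost, defect density, and die areas are made-up placeholders, not figures for any real process or for the chip under discussion. Other yield models (e.g. negative binomial) fall off more gently than this one.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate gross dies per round wafer, including an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    whole_area_term = math.pi * radius * radius / die_area_mm2
    edge_loss_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return whole_area_term - edge_loss_term


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson model: probability that a die contains zero defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def cost_per_good_die(die_area_mm2, wafer_cost,
                      defects_per_cm2=0.1, wafer_diameter_mm=300.0):
    """Wafer cost divided by the expected number of defect-free dies."""
    good_dies = (dies_per_wafer(die_area_mm2, wafer_diameter_mm)
                 * poisson_yield(die_area_mm2, defects_per_cm2))
    return wafer_cost / good_dies


if __name__ == "__main__":
    # Hypothetical wafer cost and defect density, for illustration only.
    for area in (100, 200, 400, 800):
        cost = cost_per_good_die(area, wafer_cost=10_000.0)
        print(f"{area:4d} mm^2 -> {cost:8.2f} per good die")
```

Under these assumptions, doubling the die area more than doubles the cost per good die, and the penalty gets steeper as the die grows, because the yield term shrinks exponentially with area while the edge loss also eats a larger share of the wafer.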

I expect that this kind of burned-in model is also very difficult to verify (how do you know if some of the weights are off?), and not amenable to partial disablement to increase yield. For CPUs, you just laser-disable the bad cores. You can't forgo part of a neural net.


You can ablate surprisingly large chunks of a model with next to no effect. You can try this easily: download an open-weight model and load it in torch.

Obviously it's not ideal, but you could likely have a single-digit % of all weights affected and still have a useful model (many caveats here, e.g. locality of damaged weights matters, distribution of errors matters, fail high/low matters, …)
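A minimal sketch of that experiment in torch, for anyone who wants to try it. The helper name `ablate_weights` is made up for this example, and the tiny untrained network is only a stand-in so the snippet runs without a download. It models defects as weights stuck at zero, scattered uniformly at random, which is just one of the failure patterns mentioned above. For a real test you would load an actual open-weight checkpoint and compare benchmark scores before and after, not raw output drift.

```python
import torch
from torch import nn


def ablate_weights(model: nn.Module, fraction: float, seed: int = 0) -> int:
    """Zero a random `fraction` of the entries of every parameter, in place.

    Returns the number of entries that were zeroed.
    """
    gen = torch.Generator().manual_seed(seed)
    zeroed = 0
    with torch.no_grad():
        for param in model.parameters():
            # True where the weight survives, False where it is "defective".
            keep = torch.rand(param.shape, generator=gen) >= fraction
            keep = keep.to(param.device)
            param.mul_(keep.to(param.dtype))
            zeroed += int((~keep).sum())
    return zeroed


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in model (assumption): swap in a real open-weight model here.
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    x = torch.randn(32, 64)

    before = model(x)
    n_zeroed = ablate_weights(model, fraction=0.03)
    after = model(x)

    drift = (after - before).norm() / before.norm()
    print(f"zeroed {n_zeroed} weights, relative output drift {drift.item():.3f}")
```

To probe the locality caveat, you could restrict the mask to a single layer or a contiguous block of one weight matrix and see whether that hurts more than the same number of scattered zeros.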


I mean, you probably can just turn off defective parts of the network. You'd better believe that if this becomes popular, they would salvage yields by selling "dumber" chips at a discount.


Except that if you do, you've just implemented a different model, with no way to tell which part of it is wrong.


Could you tell that the original model was "right"?



