Like, they break? Or it just becomes more profitable for the data center to repl...

manquer · 2026-04-18T17:44:01 1776534241

It will become more expensive to fix than replace. Also more energy intensive than newer generation to operate. MBTF is significant the older the fleet gets higher the failure rates .

A typical node today is 8 GPU node today , you have to keep replacing failed GPUs by cannibalizing parts from other GPUs as nobody is selling new GPUs of that model anymore at higher frequencies.

In addition to outright failure there are higher error rates in computation in graphics it tends to be flickers or screen artifacts and so on.

Azure operated K-80s and P-100s for 9 and 7 years respectively but they were running at 2 GPU nodes and of course were much simpler compared to today’s HBM behomouths on 2/5 nm processor nodes . Google operates their custom ASIC TPUs for about 8-9 years .

With custom inference ASICs like cerebras hitting production the cascading of training NVIDIA chips to inference to get the 5-6 year useful life is also not clear.