Strassen's algorithm is rarely used: its primary use, to my understanding, is in...

jedbrown · on March 7, 2024

The crossover (the matrix dimension above which Strassen beats a tuned classical multiply) can be around 500 (https://doi.org/10.1109/SC.2016.58) for 2-level Strassen. It's not used by regular BLAS because it is less numerically stable (a concern that becomes more severe for the fancier fast MM algorithms). Whether or not the matrix can be compressed (as sparse, fast transforms, or data-sparse such as the various hierarchical low-rank representations) is more a statement about the problem domain, though it's true a sizable portion of applications that produce large matrices are producing matrices that are amenable to data-sparse representations.
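To make "2-level Strassen" concrete, here is a minimal sketch in NumPy. It assumes square matrices whose size is divisible by 2 at each recursion level. The `levels` parameter is a hypothetical knob: recursion stops after that many splits and falls back to the classical product `A @ B` (whatever BLAS NumPy is linked against). This is an illustration of the textbook recursion, not the implementation from the linked paper.

```python
import numpy as np


def strassen(A: np.ndarray, B: np.ndarray, levels: int = 2) -> np.ndarray:
    """Multiply square matrices with up to `levels` levels of Strassen recursion.

    Below the last level (or if the size is odd) we fall back to the classical
    product A @ B. `levels` is an illustrative parameter, not a library API.
    """
    n = A.shape[0]
    if levels == 0 or n % 2 != 0:
        return A @ B  # classical GEMM at the leaves

    h = n // 2
    # Split both operands into 2x2 blocks (NumPy slices are views, no copies).
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven half-size products instead of the eight a blocked classical
    # multiply needs; the price is 18 extra block additions/subtractions.
    M1 = strassen(A11 + A22, B11 + B22, levels - 1)
    M2 = strassen(A21 + A22, B11, levels - 1)
    M3 = strassen(A11, B12 - B22, levels - 1)
    M4 = strassen(A22, B21 - B11, levels - 1)
    M5 = strassen(A11 + A12, B22, levels - 1)
    M6 = strassen(A21 - A11, B11 + B12, levels - 1)
    M7 = strassen(A12 - A22, B21 + B22, levels - 1)

    # Recombine the products into the four blocks of C = A B.
    C = np.empty((n, n), dtype=np.result_type(A, B))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

Each level replaces 8 block multiplies with 7, so two levels do 49/64 (about 77%) of the classical multiply work at the leaves, plus the extra additions. Those additions mix terms of different magnitudes, which is where the weaker error bounds mentioned above come from, and they are also why the crossover size is not trivially small.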

taeric · on March 7, 2024

How big are the matrices in some modern training pipelines? We always talk of absurdly large parameter spaces.

ogogmad · on March 7, 2024

SGD keeps the matrices small, I think.