Matrix multiplication explained

Finally, after a long pause, I am updating my blog again.
Several months ago I wrote a short example of cache-efficient matrix multiplication.

However, in the world of high-performance computing, such an implementation would fall short in several ways.

Those who implement linear algebra applications on supercomputers use highly optimized libraries such as LAPACK, Intel MKL and others.

The document I wrote for TUM explains the basic concepts behind the implementation of the fast matrix multiplication algorithm
in GotoBLAS, one of the most successful high-performance libraries.
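
To give a rough idea of what "cache efficient" means here, below is a minimal sketch of blocked (tiled) matrix multiplication. It is not the GotoBLAS implementation and not the code from my earlier post. The function name matmul_blocked, the block parameter and the row-major flat-list layout are all my own assumptions for illustration. The idea is simply to work on small tiles so that the data being reused stays in cache; a real library goes much further than this.

```python
def matmul_blocked(a, b, n, block=2):
    """Multiply two n x n matrices stored as row-major flat lists, tile by tile.

    Illustrative sketch only: 'block' is a hypothetical tile size that a
    real implementation would tune to the cache sizes of the machine.
    """
    c = [0.0] * (n * n)
    # Loop over tiles of the iteration space.
    for ii in range(0, n, block):
        i_end = min(ii + block, n)
        for kk in range(0, n, block):
            k_end = min(kk + block, n)
            for jj in range(0, n, block):
                j_end = min(jj + block, n)
                # Multiply one tile of A by one tile of B and
                # accumulate the result into the matching tile of C.
                for i in range(ii, i_end):
                    for k in range(kk, k_end):
                        a_ik = a[i * n + k]
                        for j in range(jj, j_end):
                            c[i * n + j] += a_ik * b[k * n + j]
    return c
```

For example, matmul_blocked([1, 2, 3, 4], [5, 6, 7, 8], 2) multiplies two 2 x 2 matrices. In pure Python the tiling brings no real speedup; the sketch only shows the loop structure that a compiled implementation would build on.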
