
15-618 Project

Proposal

Summary

Our project is to implement a parallel version of Strassen’s matrix multiplication algorithm with OpenMPI on the latedays cluster.

Background

Strassen’s matrix multiplication algorithm is a well-known divide-and-conquer algorithm. Compared to the classical algorithm, it reduces the number of sub-block multiplications at each level of recursion from 8 to 7, which lowers the asymptotic complexity from O(n^3) to roughly O(n^2.81). Parallelizing Strassen’s algorithm is a challenging and well-studied problem, and researchers have proposed various parallel algorithms to improve its performance. In our project, we limit our scope to parallelizing Strassen’s algorithm with the message passing model.
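For concreteness, here is a minimal serial sketch (our own illustration, not part of the proposed implementation) of Strassen’s seven products for a single 2x2 step with scalar entries. The real algorithm applies the same formulas recursively to matrix sub-blocks.

```cpp
#include <cstdio>

int main() {
    // One 2x2 step: A = [a11 a12; a21 a22], B = [b11 b12; b21 b22].
    double a11 = 1, a12 = 2, a21 = 3, a22 = 4;
    double b11 = 5, b12 = 6, b21 = 7, b22 = 8;

    // Seven multiplications instead of the classical eight.
    double m1 = (a11 + a22) * (b11 + b22);
    double m2 = (a21 + a22) * b11;
    double m3 = a11 * (b12 - b22);
    double m4 = a22 * (b21 - b11);
    double m5 = (a11 + a12) * b22;
    double m6 = (a21 - a11) * (b11 + b12);
    double m7 = (a12 - a22) * (b21 + b22);

    // Recombine into C = A * B.
    double c11 = m1 + m4 - m5 + m7;
    double c12 = m3 + m5;
    double c21 = m2 + m4;
    double c22 = m1 - m2 + m3 + m6;

    std::printf("C = [%g %g; %g %g]\n", c11, c12, c21, c22);  // expect [19 22; 43 50]
    return 0;
}
```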

Challenges

Keep communication cost low. In our previous experience, communication across nodes is expensive. Our implementation should eliminate unnecessary communication between cores and optimize the communication required by the algorithm, that is, increase the computation-to-communication ratio; a rough measurement sketch is given after these challenges.

Make full use of memory. To keep communication cost low, our implementation should use as much local memory as possible. However, local memory capacity is limited, so the implementation must trade off communication cost against memory usage.
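As a starting point for quantifying the first challenge, the following self-contained sketch (our own illustration; the block size and the MPI_Allreduce are placeholders, not the communication pattern of the actual algorithm) times local computation and a collective separately with MPI_Wtime and reports their ratio.

```cpp
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 256;                       // placeholder local block size
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);

    double t0 = MPI_Wtime();
    for (int i = 0; i < n; ++i)              // "computation": naive local multiply
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    double t_comp = MPI_Wtime() - t0;

    t0 = MPI_Wtime();
    std::vector<double> sum(n * n);          // "communication": stand-in collective
    MPI_Allreduce(c.data(), sum.data(), n * n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t_comm = MPI_Wtime() - t0;

    if (rank == 0)
        std::printf("comp %.3fs  comm %.3fs  ratio %.2f\n",
                    t_comp, t_comm, t_comp / t_comm);
    MPI_Finalize();
    return 0;
}
```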

Resources

Computers

The latedays cluster at CMU.

Code

We are going to start from scratch, using OpenMPI as the message passing implementation.

Paper & Algorithm

We are going to implement and experiment with the Communication-Avoiding Parallel Strassen (CAPS) algorithm presented by Ballard et al. in 2012. The algorithm is based on the Strassen-Winograd variant, which performs the same 7 multiplications as Strassen’s algorithm but only 15 additions per recursive step instead of 18.

Reference: Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, and Oded Schwartz. 2012. Communication-optimal parallel algorithm for Strassen's matrix multiplication. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12). ACM, New York, NY, USA, 193-204. DOI: https://doi.org/10.1145/2312005.2312044

Goals and Deliverables

Ballard et al. presented two variants of CAPS, which traverse the recursion tree of the Strassen-Winograd algorithm in either BFS order (all seven subproblems computed simultaneously by disjoint processor groups) or DFS order (all processors work on the subproblems one at a time). Our first goal is a correct implementation of both.
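To make the BFS/DFS distinction concrete, below is a rough, simplified sketch (our own, not code from the paper) of how an implementation might decide, level by level, which kind of step to take. The memory model is an assumption we make for illustration: only A, B, and C are counted, a BFS step multiplies the per-processor footprint by 7/4, a DFS step divides it by 4 (both factors already include the halving of the matrix dimension), and the processor count is a power of 7.

```cpp
#include <cstdio>
#include <string>

// Returns a string of 'B' (BFS) and 'D' (DFS) steps, outermost level first.
// n: matrix dimension, p: processor count (assumed a power of 7),
// mem_limit_words: modeled per-processor memory limit in words.
std::string plan_steps(long n, long p, double mem_limit_words) {
    std::string plan;
    double words = 3.0 * n * n / p;   // modeled footprint per processor (A, B, C only)
    while (p > 1) {
        if (words * 7.0 / 4.0 <= mem_limit_words) {
            plan += 'B';              // enough memory: split processors into 7 groups
            p /= 7;
            words *= 7.0 / 4.0;
        } else {
            plan += 'D';              // memory is tight: recurse on subproblems in sequence
            words /= 4.0;
        }
    }
    return plan;
}

int main() {
    // Example: 16384 x 16384 matrices on 49 processors, ~32M words of memory each.
    std::printf("%s\n", plan_steps(16384, 49, 32e6).c_str());
    return 0;
}
```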

Our second goal is to evaluate these two variants across different problem sizes and environment settings (number of nodes, memory limits). At the poster session, we plan to show speedup graphs.

Extra Goal:

If time permits, we plan to compare our implementation against other matrix multiplication algorithms that adopt the message passing model, designing benchmark tests and recording the relevant metrics.

Platform Choice

Since we want to explore the problem in the message passing model and run it on the latedays cluster, C/C++ with OpenMPI is a suitable choice.

Schedule