Donfack, Simplice and Grigori, Laura and Khabou, Amal (2012) Avoiding communication through a multilevel LU factorization. Euro-Par 2012 Parallel Processing, 7484 (2012). pp. 551-562. ISSN 0302-9743
Abstract
Massively parallel computers are evolving towards deeper levels of parallelism and memory hierarchy, and the ratio of the time required to transfer data, either through the memory hierarchy or between different compute units, to the time required to compute floating point operations is increasing exponentially. As a result, algorithms are confronted with two challenges. They need not only to exploit multiple levels of parallelism, but also to reduce communication between the compute units at each level of the hierarchy of parallelism and between the different levels of the memory hierarchy. In this paper we present an algorithm for performing the LU factorization of dense matrices that is suitable for computer systems with two levels of parallelism. This algorithm is able to minimize both the volume of communication and the number of messages transferred at every level of the two-level hierarchy of parallelism. We present its implementation for a cluster of multicore processors based on MPI and Pthreads. We show that this implementation leads to better performance than routines implementing the LU factorization in well-known numerical libraries. For tall and skinny matrices, that is, matrices with many more rows than columns, our algorithm outperforms the corresponding algorithm from ScaLAPACK by a factor of 4.5 on a cluster of 32 nodes, each node having two quad-core Intel Xeon EMT64 processors.
| Item Type: | Article |
|---|---|
| Subjects: | MSC 2010, the AMS's Mathematics Subject Classification > 65 Numerical analysis |
| Depositing User: | Amal Khabou |
| Date Deposited: | 13 Mar 2013 |
| Last Modified: | 20 Oct 2017 14:13 |
| URI: | https://eprints.maths.manchester.ac.uk/id/eprint/1956 |