Relton, Samuel D. and Valero-Lara, Pedro and Zounon, Mawussi (2016) A Comparison of Potential Interfaces for Batched BLAS Computations. [MIMS Preprint]
Abstract
One trend in modern high performance computing (HPC) is to decompose a large linear algebra problem into thousands of small problems which can be solved independently. There is a clear need for a batched BLAS standard, allowing users to perform thousands of small BLAS operations in parallel and making efficient use of their hardware. There are many possible ways in which the BLAS standard can be extended for batch operations. We discuss many of these possible designs, giving benefits and criticisms of each, along with a number of experiments designed to determine how the API may affect performance on modern HPC systems. Related issues that influence API design, such as the effect of memory layout on performance, are also discussed.
Item Type: | MIMS Preprint |
---|---|
Uncontrolled Keywords: | BLAS, batched BLAS, linear algebra, parallel computing, high-performance computing |
Subjects: | MSC 2010, the AMS's Mathematics Subject Classification > 68 Computer science |
Depositing User: | Dr Samuel Relton |
Date Deposited: | 04 Aug 2016 |
Last Modified: | 08 Nov 2017 18:18 |
URI: | https://eprints.maths.manchester.ac.uk/id/eprint/2493 |