A Comparison of Potential Interfaces for Batched BLAS Computations

Relton, Samuel D. and Valero-Lara, Pedro and Zounon, Mawussi (2016) A Comparison of Potential Interfaces for Batched BLAS Computations. [MIMS Preprint]

One trend in modern high performance computing (HPC) is to decompose a large linear algebra problem into thousands of small problems which can be solved independently. There is a clear need for a batched BLAS standard, allowing users to perform thousands of small BLAS operations in parallel and making efficient use of their hardware. There are many possible ways in which the BLAS standard can be extended for batch operations. We discuss many of these possible designs, giving benefits and criticisms of each, along with a number of experiments designed to determine how the API may affect performance on modern HPC systems. Related issues that influence API design, such as the effect of memory layout on performance, are also discussed.
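
To make the idea of a batched interface concrete, the following is a minimal sketch in C of one possible design: a fixed-size batched matrix multiply in which every problem in the batch shares the same dimensions and scalars, and the matrices are passed as arrays of pointers. The function name batched_dgemm_fixed, its argument order, and the column-major, no-transpose restriction are assumptions made for illustration only; they are not taken from the preprint or from any existing library. The loop body is a naive reference computation, not an optimized kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size batched GEMM (illustrative only):
 *   C[p] = alpha * A[p] * B[p] + beta * C[p],  p = 0 .. batch_count-1
 * Every A[p] is m-by-k, every B[p] is k-by-n, every C[p] is m-by-n.
 * All matrices are column-major and no transposes are supported. */
void batched_dgemm_fixed(int m, int n, int k,
                         double alpha,
                         const double *const *A, int lda,
                         const double *const *B, int ldb,
                         double beta,
                         double *const *C, int ldc,
                         int batch_count)
{
    /* Basic argument checks; a real interface would report errors instead. */
    assert(m >= 0 && n >= 0 && k >= 0 && batch_count >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);

    /* Each iteration of the outer loop is an independent small GEMM,
     * so this is the loop a real implementation would parallelize. */
    for (int p = 0; p < batch_count; ++p) {
        const double *a = A[p];
        const double *b = B[p];
        double *c = C[p];

        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                double acc = 0.0;
                for (int l = 0; l < k; ++l) {
                    acc += a[i + (size_t)l * lda] * b[l + (size_t)j * ldb];
                }
                c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
            }
        }
    }
}
```

The array-of-pointers layout lets each matrix live anywhere in memory. An alternative design would store the whole batch in one contiguous buffer and pass a stride between consecutive matrices, which changes the memory access pattern and is one way the choice of API can interact with performance.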

Item Type: MIMS Preprint
Uncontrolled Keywords: BLAS, batched BLAS, linear algebra, parallel computing, high-performance computing
Subjects: MSC 2010, the AMS's Mathematics Subject Classification > 68 Computer science
Depositing User: Dr Samuel Relton
Date Deposited: 04 Aug 2016
Last Modified: 08 Nov 2017 18:18
URI: https://eprints.maths.manchester.ac.uk/id/eprint/2493
