Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores

Fasi, Massimiliano and Higham, Nicholas J. and Lopez, Florent and Mary, Theo and Mikaitis, Mantas (2022) Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores. [MIMS Preprint]

There is a more recent version of this item available.

Abstract

In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower-precision matrices, and a matrix product is formed by multiplying the constituents in low precision. We investigate the use of multiword arithmetic for improving the performance-accuracy tradeoff of matrix multiplication with mixed precision block fused multiply-add (FMA) hardware, focusing especially on the tensor cores available on NVIDIA GPUs. Building on a general block FMA framework, we develop a comprehensive error analysis of multiword matrix multiplication. After confirming the theoretical error bounds experimentally by simulating low precision in software, we use the cuBLAS and CUTLASS libraries to implement a number of matrix multiplication algorithms using double-fp16 (double-binary16) arithmetic. When running the algorithms on NVIDIA V100 and A100 GPUs, we find that double-fp16 is not as accurate as fp32 (binary32) arithmetic despite satisfying the same worst-case error bound. Using probabilistic error analysis, we explain why this issue is likely to be caused by the rounding mode used by the NVIDIA tensor cores, and propose a parameterized blocked summation algorithm that alleviates the problem and significantly improves the performance-accuracy tradeoff.
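The double-fp16 idea described in the abstract can be illustrated with a small software simulation. The sketch below is not the authors' implementation: it uses only Python's `struct` module to round values to binary16 and binary32, and all function names (`to_fp16`, `split_double_fp16`, `matmul_double_fp16`, and so on) are hypothetical. It assumes that each matrix is split into a leading fp16 part and an fp16 correction, that the product of the two correction terms is small enough to drop, and that products are accumulated with rounding to fp32 after every addition. Real tensor cores may round differently, which is the issue the abstract raises, so this is only a rough model of the arithmetic.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest binary16 value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest binary32 value, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split_double_fp16(M):
    """Split M into fp16 matrices (hi, lo) so that M is close to hi + lo."""
    hi = [[to_fp16(x) for x in row] for row in M]
    lo = [[to_fp16(x - h) for x, h in zip(row, hrow)]
          for row, hrow in zip(M, hi)]
    return hi, lo


def matmul_fp32_acc(A, B):
    """Multiply fp16-valued matrices, rounding the running sum to fp32."""
    inner, cols = len(B), len(B[0])
    C = []
    for row in A:
        out = []
        for j in range(cols):
            acc = 0.0
            for p in range(inner):
                # Simulated fp32 accumulation (not bit-exact with hardware).
                acc = to_fp32(acc + row[p] * B[p][j])
            out.append(acc)
        C.append(out)
    return C


def matmul_single_fp16(A, B):
    """Baseline: round both inputs to fp16 once, then multiply."""
    A16 = [[to_fp16(x) for x in row] for row in A]
    B16 = [[to_fp16(x) for x in row] for row in B]
    return matmul_fp32_acc(A16, B16)


def matmul_double_fp16(A, B):
    """Double-fp16 product from three fp16 products (lo*lo is dropped)."""
    A_hi, A_lo = split_double_fp16(A)
    B_hi, B_lo = split_double_fp16(B)
    small1 = matmul_fp32_acc(A_lo, B_hi)
    small2 = matmul_fp32_acc(A_hi, B_lo)
    main = matmul_fp32_acc(A_hi, B_hi)
    # Add the two small correction products first, then the leading product.
    return [[to_fp32(to_fp32(s1 + s2) + m) for s1, s2, m in zip(r1, r2, rm)]
            for r1, r2, rm in zip(small1, small2, main)]


def max_abs_error(C, A, B):
    """Largest entrywise difference from a float64 reference product."""
    ref = [[sum(a * B[p][j] for p, a in enumerate(row))
            for j in range(len(B[0]))] for row in A]
    return max(abs(c - r) for crow, rrow in zip(C, ref)
               for c, r in zip(crow, rrow))


if __name__ == "__main__":
    A = [[1.0 / (i + j + 1) for j in range(3)] for i in range(3)]
    B = [[1.0 / (i + 2 * j + 3) for j in range(3)] for i in range(3)]
    print("single fp16 error:", max_abs_error(matmul_single_fp16(A, B), A, B))
    print("double-fp16 error:", max_abs_error(matmul_double_fp16(A, B), A, B))
```

Running the script prints the maximum entrywise error of each variant against a float64 reference. The three products in `matmul_double_fp16` are independent low-precision multiplications, which is what would let such a scheme map onto block FMA hardware.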

Item Type:	MIMS Preprint
Subjects:	MSC 2010, the AMS's Mathematics Subject Classification > 65 Numerical analysis
Depositing User:	Mr Mantas Mikaitis
Date Deposited:	26 Jan 2022 10:29
Last Modified:	26 Jan 2022 10:29
URI:	https://eprints.maths.manchester.ac.uk/id/eprint/2846

Available Versions of this Item

- Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores. (deposited 26 Jan 2022 10:29) [Currently Displayed]
- Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores. (deposited 16 Jul 2022 08:36)
