Haidar, Azzam and Tomov, Stanimire and Dongarra, Jack and Higham, Nicholas J. Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers. In: International Conference on Supercomputing, New York, NY, USA, 2018. (In press.)
Abstract
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we consider the general HPC problem of solving $Ax = b$, where $A$ is a large dense matrix and a double-precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16 → FP64) iterative refinement, and we generalize and extend prior advances into a framework for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show that using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to a 4× speedup. This is due both to the performance boost that the FP16-TC provide and to their improved accuracy over classical FP16 arithmetic, which results from the GEMM accumulation being performed in FP32.
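To make the idea behind the abstract concrete, here is a minimal NumPy/SciPy sketch of mixed-precision iterative refinement. It is an illustration only, not the authors' GPU Tensor-Core implementation: float32 stands in for FP16 (SciPy has no half-precision LU factorization), and the function name `mixed_precision_refine` is hypothetical.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_refine(A, b, tol=1e-12, max_iter=50):
    """Solve Ax = b to FP64 accuracy from a low-precision LU factorization.

    float32 stands in for FP16-TC here; SciPy has no half-precision LU.
    """
    # The O(n^3) factorization is done once, in low precision.
    lu, piv = lu_factor(A.astype(np.float32))
    # Initial low-precision solve, promoted to FP64.
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        # Residual computed in FP64 -- this is what restores accuracy.
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Each correction solve reuses the cheap low-precision factors (O(n^2)).
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d
    return x

# Example on a diagonally dominant (well-conditioned) system.
rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = mixed_precision_refine(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

The structure explains where the speedup comes from: the expensive $O(n^3)$ factorization runs once in fast low precision, while each FP64 refinement step costs only $O(n^2)$; the paper's FP16-TC variant applies this same pattern with Tensor-Core GEMMs in the factorization.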
| Item Type: | Conference or Workshop Item (Paper) |
| --- | --- |
| Subjects: | MSC 2010, the AMS's Mathematics Subject Classification > 15 Linear and multilinear algebra; matrix theory · MSC 2010, the AMS's Mathematics Subject Classification > 65 Numerical analysis |
| Depositing User: | Nick Higham |
| Date Deposited: | 02 Oct 2018 08:39 |
| Last Modified: | 02 Oct 2018 08:39 |
| URI: | https://eprints.maths.manchester.ac.uk/id/eprint/2664 |