Numerical Behavior of NVIDIA Tensor Cores

Fasi, Massimiliano and Higham, Nicholas J. and Mikaitis, Mantas and Pranesh, Srikara (2020) Numerical Behavior of NVIDIA Tensor Cores. [MIMS Preprint] (Submitted)

Warning: There is a more recent version of this item available.

We explore the floating-point arithmetic implemented in NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100 and Turing T4 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: 1) accurately simulate NVIDIA tensor cores on conventional hardware; 2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and 3) build hardware that computes a matrix-matrix product matching the results of the NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test the latest tensor cores available in the NVIDIA Ampere A100, once those graphics cards become easily accessible. Moreover, we identify a non-monotonicity issue that arises in floating-point multi-operand addition if the intermediate results are not normalized.
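
The non-monotonicity mentioned at the end of the abstract can be illustrated with a toy model. The sketch below is not taken from the preprint and is not a description of the actual hardware. It assumes a hypothetical multi-operand adder that aligns every summand to the largest exponent in the block, truncates toward zero to a p-bit significand grid, and adds the aligned values without renormalizing. The function names and the choice p = 4 are illustrative assumptions.

```python
import math


def truncate_to_quantum(x, quantum):
    """Round x toward zero to the nearest multiple of quantum."""
    return math.trunc(x / quantum) * quantum


def block_sum(terms, p=4):
    """Toy multi-operand adder (hypothetical model, not the hardware).

    All terms are aligned to the largest exponent in the block and
    truncated to a grid with p significand bits at that exponent.
    The aligned values are then added exactly, with no renormalization.
    """
    nonzero = [t for t in terms if t != 0]
    if not nonzero:
        return 0.0
    # math.frexp(x) returns (m, e) with 0.5 <= |m| < 1 and x == m * 2**e.
    e_max = max(math.frexp(t)[1] for t in nonzero)
    quantum = 2.0 ** (e_max - p)  # spacing of the p-bit grid at e_max
    return sum(truncate_to_quantum(t, quantum) for t in terms)


# Increasing the first summand from 15 to 16 raises the largest exponent,
# which coarsens the grid and truncates the small summands to zero.
smaller = block_sum([15.0, 1.0, 1.0, 1.0])
larger = block_sum([16.0, 1.0, 1.0, 1.0])
print(smaller, larger)
```

In this model the second sum is smaller than the first even though one summand grew and none shrank, so the computed sum is not a monotonic function of its inputs. Whether and how such an effect appears on a given tensor core generation is a question for the preprint's experiments, not for this sketch.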

Item Type: MIMS Preprint
Subjects: MSC 2010, the AMS's Mathematics Subject Classification > 65 Numerical analysis
MSC 2010, the AMS's Mathematics Subject Classification > 68 Computer science
Depositing User: Mr Mantas Mikaitis
Date Deposited: 04 Oct 2020 08:40
Last Modified: 04 Oct 2020 08:40