CPFloat: A C library for simulating low-precision arithmetic

Fasi, Massimiliano and Mikaitis, Mantas (2020) CPFloat: A C library for simulating low-precision arithmetic. [MIMS Preprint] (Unpublished)

Warning
There is a more recent version of this item available.

Abstract

One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simulating low-precision arithmetics. It offers efficient routines for rounding, performing mathematical computations, and querying properties of the simulated low-precision format. The software exploits the bit-level floating-point representation of the format in which the numbers are stored, and replaces costly library calls with low-level bit manipulations and integer arithmetic. In numerical experiments, the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic available in any language.
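The bit-level idea described in the abstract can be illustrated with a minimal sketch. This is not CPFloat's code or API: the function name round_to_precision and its parameters are hypothetical, chosen only for this example. It assumes the value is stored as an IEEE binary64 double, rounds it to p significant bits (2 <= p <= 53) with round-to-nearest, ties-to-even, and uses only integer operations on the bit pattern. It deliberately ignores the narrower exponent range of the target format, so the underflow, overflow, and subnormal handling that the abstract singles out as delicate is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not part of CPFloat.
 * Rounds x to p significant bits (2 <= p <= 53) using round-to-nearest,
 * ties-to-even, while keeping the exponent range of binary64.
 * Assumes double is IEEE 754 binary64. */
static double round_to_precision(double x, int p)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the double as an integer */

    /* Leave NaN and infinity untouched; nothing to do at full precision. */
    if (((bits >> 52) & 0x7FF) == 0x7FF || p >= 53)
        return x;

    int drop = 53 - p;                           /* trailing bits to discard */
    uint64_t mask = (UINT64_C(1) << drop) - 1;   /* selects the discarded bits */
    uint64_t half = UINT64_C(1) << (drop - 1);   /* half a unit in the last kept place */
    uint64_t lsb  = (bits >> drop) & 1;          /* last kept bit, decides ties */

    /* Adding half - 1 + lsb rounds to nearest with ties to even. A carry out
     * of the fraction field increments the exponent, which is the desired
     * behaviour when the significand rounds up to the next power of two. */
    bits += half - 1 + lsb;
    bits &= ~mask;                               /* clear the discarded bits */

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For example, round_to_precision(0.1, 11) yields 0.0999755859375, the value 0.1 takes when rounded to the 11 significant bits of binary16. The whole operation is a handful of integer instructions and involves no call to a mathematical library function, which is the kind of saving the abstract attributes to working directly on the bit-level representation.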

Item Type: MIMS Preprint
Subjects: MSC 2010, the AMS's Mathematics Subject Classification > 65 Numerical analysis
Depositing User: Mr Massimiliano Fasi
Date Deposited: 06 Mar 2022 09:32
Last Modified: 06 Mar 2022 09:32
URI: https://eprints.maths.manchester.ac.uk/id/eprint/2850
