This is aimed at new hires or anyone encountering floating-point issues. It doesn't go into much detail and is a high-level overview of floating-point concepts:
- A floating-point number = mantissa × 2^exponent, where the mantissa (sometimes called the significand) holds the significant digits and the exponent sets the scale. In fp32, the exponent takes 8 bits and the mantissa 23 bits (plus 1 sign bit).
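- Between [2^p, 2^(p+1)), floating-point numbers are equally spaced; the spacing is the smallest difference between two consecutive floats. For example, fp32 numbers between 2.0 and 4.0 have a spacing of 2^-22, and numbers between 1024.0 and 2048.0 have a spacing of 2^-13. That spacing is what we call the ULP (unit in the last place) for that range; with round-to-nearest, the rounding error of a single operation is at most half of it.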
- In fp32, the smallest difference between two consecutive floats with the same exponent p (the ULP) is 2^-23 * 2^p = 2^(p-23). Notice that the larger the number gets, the wider the spacing becomes, which means the absolute rounding error of operations gets larger (but the relative error stays roughly the same).
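The points above can be checked with a short script. This is a minimal sketch using only Python's standard `struct` module; the helper names (`float_to_bits`, `bits_to_float`, `decompose`, `ulp32`) are made up for illustration, and it assumes positive, finite, normal fp32 values.

```python
import struct

def float_to_bits(x: float) -> int:
    """Round x to fp32 and return its 32-bit pattern as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b: int) -> float:
    """Reinterpret a 32-bit pattern as an fp32 value (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def decompose(x: float):
    """Split an fp32 value into (sign bit, unbiased exponent p, 23-bit mantissa field)."""
    b = float_to_bits(x)
    sign = b >> 31
    exponent = ((b >> 23) & 0xFF) - 127  # stored exponent has a bias of 127 (normal numbers only)
    mantissa = b & 0x7FFFFF              # the 23 explicitly stored mantissa bits
    return sign, exponent, mantissa

def ulp32(x: float) -> float:
    """Gap between a positive fp32 value and the next larger fp32 value."""
    b = float_to_bits(x)
    # For positive floats, adding 1 to the bit pattern gives the next representable value.
    return bits_to_float(b + 1) - bits_to_float(b)

for x in (1.0, 2.0, 3.5, 1024.0, 1500.0):
    _, p, _ = decompose(x)
    # Compare the measured gap against the formula 2^-23 * 2^p.
    print(x, p, ulp32(x), ulp32(x) == 2.0 ** (p - 23))
```

The last column prints `True` for every value: 2.0 and 3.5 share the exponent p = 1 and therefore the same spacing, while 1024.0 and 1500.0 share p = 10 and a much wider spacing.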
- Exerci