
Floor and Ceil versus Denormals on CPU and GPU

This technical deep dive examines how the floor and ceil functions behave differently on denormal floating-point inputs across various CPUs and GPUs. It identifies a subtle but significant source of non-determinism for developers, especially in graphics programming. The story gives concrete test results and a deterministic workaround, which makes it useful for anyone dealing with cross-platform floating-point consistency.

Score: 7
Comments: 0
Highest Rank: #8
Time on Front Page: 8h
First Seen: May 30, 10:00 AM
Last Seen: May 30, 5:00 PM
Rank Over Time: (chart not reproduced)

The Lowdown

The author, a seasoned expert in floating-point mechanics, examines a specific question: what do floor() and ceil() return when applied to denormalized floating-point numbers? This seemingly obscure detail can make CPU and GPU computations disagree.
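
To make the discrepancy concrete, here is a minimal Python sketch, not taken from the article. It classifies a value as a float32 denormal from its bit pattern and compares ceil and floor on the real value with the same calls after a simulated flush to zero. The helper names (float32_bits, is_denormal32, flush_to_zero32) are hypothetical, and the flush is only a software imitation of what a flushing GPU does in hardware.

```python
import math
import struct

def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x (x is rounded to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def is_denormal32(x: float) -> bool:
    """True if x, as a float32, has a zero exponent field and a nonzero mantissa."""
    bits = float32_bits(x)
    return (bits & 0x7F800000) == 0 and (bits & 0x007FFFFF) != 0

def flush_to_zero32(x: float) -> float:
    """Imitate a flushing platform: denormals become zero with the same sign."""
    return math.copysign(0.0, x) if is_denormal32(x) else x

tiny = 2.0 ** -149  # smallest positive float32 denormal

# Platform that preserves denormals: the tiny value is still nonzero.
preserved_ceil = float(math.ceil(tiny))      # 1.0
preserved_floor = float(math.floor(-tiny))   # -1.0

# Platform that flushes denormals: the input is already zero when rounded.
flushed_ceil = float(math.ceil(flush_to_zero32(tiny)))     # 0.0
flushed_floor = float(math.floor(flush_to_zero32(-tiny)))  # 0.0

print(preserved_ceil, flushed_ceil, preserved_floor, flushed_floor)
```

The same input therefore produces results that differ by a whole unit, which is far larger than the usual last-bit rounding differences between platforms.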

  • The article first clarifies the standard rounding functions: floor (down), ceil (up), trunc (towards zero), and round (to nearest integer), emphasizing they output floating-point integral values.
  • It introduces denormal (or subnormal) numbers: extremely small floating-point values near zero, stored with a zero exponent field and no implicit leading 1 bit, which hardware does not always handle consistently.
  • The core issue lies in whether a platform 'preserves' these denormals (processing their actual small value) or 'flushes' them to zero, which changes the outcome of floor and ceil. For example, ceil of a positive denormal is 1.0 when the value is preserved but 0.0 when it is flushed first.
  • Testing revealed that 64-bit x86 CPUs consistently preserve denormals. In contrast, Nvidia GPUs (RTX 4090) flush denormals by default and keep flushing them even when compiler flags request preservation. Intel (Arc B580) and AMD (Radeon RX 6800 XT) GPUs flush by default but can be configured to preserve denormals.
  • An update notes that the DirectX specification explicitly requires GPUs to flush denormals on input and output, explaining Nvidia's behavior.
  • To combat this non-determinism, the author provides a 'deterministic solution' – custom HLSL DeterministicFloor and DeterministicCeil functions using bitwise operations to ensure consistent behavior across all platforms, regardless of their denormal handling.
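
The author's HLSL code is not reproduced in this summary. As an illustration of the general idea only, the following Python sketch computes floor and ceil of a float32 purely from its bit pattern, so the result cannot depend on whether the hardware flushes denormals. All names here are hypothetical, and the choice to follow IEEE 754 semantics (a nonzero denormal counts as a nonzero value, so ceil of a positive denormal is 1.0) is an assumption, not necessarily what DeterministicFloor and DeterministicCeil do.

```python
import struct

SIGN_MASK = 0x80000000
MANT_MASK = 0x007FFFFF

def f32_to_bits(x: float) -> int:
    """Bit pattern of x as a float32 (x must fit in float32 range)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(bits: int) -> float:
    """Float value of a 32-bit IEEE 754 binary32 pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def floor_bits(bits: int) -> int:
    """floor() on a float32 bit pattern using integer operations only."""
    sign = bits & SIGN_MASK
    exp_field = (bits >> 23) & 0xFF
    # Zero, infinity, NaN, or exponent large enough that no fraction bits exist.
    if (bits & 0x7FFFFFFF) == 0 or exp_field == 0xFF or exp_field >= 150:
        return bits
    if exp_field < 127:
        # 0 < |x| < 1, which includes every denormal (exp_field == 0).
        return 0xBF800000 if sign else 0x00000000  # -1.0 or +0.0
    # Mantissa bits that lie below the binary point for this exponent.
    frac_mask = MANT_MASK >> (exp_field - 127)
    if (bits & frac_mask) == 0:
        return bits  # already an integer
    truncated = bits & ~frac_mask  # round toward zero
    if sign:
        # Negative with a fraction: step the magnitude up by one integer.
        truncated += frac_mask + 1
    return truncated

def deterministic_floor32(x: float) -> float:
    return bits_to_f32(floor_bits(f32_to_bits(x)))

def deterministic_ceil32(x: float) -> float:
    # ceil(x) == -floor(-x); negation is done by flipping the sign bit.
    return bits_to_f32(floor_bits(f32_to_bits(x) ^ SIGN_MASK) ^ SIGN_MASK)

tiny = 2.0 ** -149  # smallest positive float32 denormal
print(deterministic_floor32(tiny), deterministic_ceil32(tiny))    # 0.0 1.0
print(deterministic_floor32(-tiny), deterministic_ceil32(-tiny))  # -1.0 -0.0
```

Because the denormal case is decided by inspecting the exponent field rather than by floating-point arithmetic, a port of this approach to a shader language would need an equivalent bit-cast intrinsic; the opposite convention (treating denormals as zero everywhere) would only change the branch for exp_field == 0.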

This exploration underscores the critical importance of understanding low-level floating-point behavior, particularly for graphics and high-precision computing, where subtle differences in hardware implementation can lead to unexpected and hard-to-debug results. The provided deterministic solution offers a practical way to achieve consistent mathematical outcomes across diverse computational environments.