Even Faster Asin() Was Staring Right at Me
This article delves into micro-optimizing the asin() function, revealing how Estrin's Scheme can exploit instruction-level parallelism for speedups. It's a classic Hacker News story, complete with rigorous benchmarking across various platforms that sparks discussions on low-level performance, compiler behavior, and mathematical approximations.
The Lowdown
Following up on a previous post, the author revisits the quest for a faster asin() implementation, finding further optimization by restructuring polynomial evaluation.
- The core improvement applies Estrin's Scheme to a minimax polynomial approximation of `asin()`. This reordering allows the compiler and CPU to evaluate parts of the polynomial independently, reducing dependency-chain length and enabling instruction-level parallelism.
- Extensive benchmarking was performed across Intel, AMD, and Apple M4 CPUs, using various operating systems (Linux, Windows, macOS) and compilers (GCC, Clang, MSVC).
- Results show significant speedups (up to 1.88x over `std::asin()`) on older Intel chips and some benefit on Apple M4 with Clang, but negligible gains on AMD platforms.
- Real-world testing in a ray tracer demonstrated a modest 3% improvement on Intel, while the Apple M4 showed no practical change, highlighting that micro-optimizations may not translate proportionally to application-level performance.
- The author emphasizes the critical importance of diligent benchmarking, dispels the myth of simple LUT-based speedups for modern CPUs, and reminds readers that these are approximations, suitable for graphics but not all applications.
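To illustrate the restructuring the article describes, here is a minimal sketch contrasting Horner's method (a serial dependency chain) with Estrin's Scheme for a degree-7 polynomial. The coefficients are placeholders, not the article's actual minimax coefficients for `asin()`:

```cpp
#include <cassert>
#include <cmath>

// Horner's method: each step depends on the previous result,
// so the evaluation is one long serial chain of multiply-adds.
double horner(double x, const double c[8]) {
    double r = c[7];
    for (int i = 6; i >= 0; --i) r = r * x + c[i];
    return r;
}

// Estrin's Scheme for the same polynomial: adjacent terms are paired,
// and the pairs are combined with precomputed powers of x. The four
// (c[i] + c[i+1]*x) groups are independent of each other, so the CPU
// can evaluate them in parallel, shortening the dependency chain.
double estrin(double x, const double c[8]) {
    double x2 = x * x;
    double x4 = x2 * x2;
    double p01 = c[0] + c[1] * x;
    double p23 = c[2] + c[3] * x;
    double p45 = c[4] + c[5] * x;
    double p67 = c[6] + c[7] * x;
    double p03 = p01 + p23 * x2;   // c0 + c1*x + c2*x^2 + c3*x^3
    double p47 = p45 + p67 * x2;   // c4 + c5*x + c6*x^2 + c7*x^3
    return p03 + p47 * x4;         // full degree-7 polynomial
}
```

Both functions compute the same polynomial; only the evaluation order (and thus the opportunity for instruction-level parallelism) differs.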
Ultimately, the article serves as a testament to the continuous pursuit of performance through deep understanding of both algorithms and underlying hardware, underscoring that collaboration and reevaluation are key to finding better solutions.
The Gossip
Contextual Callbacks
Commenters quickly established that this article was a direct follow-up to a previous popular post by the same author, appreciating the continuation of the technical deep dive.
Constexpr Clarifications
A discussion emerged around the C++ `constexpr` keyword, specifically its role with local variables. The author clarified its usage, while others elaborated on its benefits for type-safe compile-time constants and semantic intent, even if the compiler might optimize constants anyway.
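A small sketch of the point raised in that thread, using a hypothetical coefficient (not from the article): `constexpr` gives a type-safe compile-time constant, and on a local variable it documents intent and fails to compile if the initializer is not actually a constant expression, even though the compiler would likely fold a plain `const` anyway:

```cpp
#include <cassert>
#include <cmath>

// Namespace-scope constexpr: a type-safe compile-time constant,
// unlike a #define macro. Placeholder value, not from the article.
constexpr double kC3 = 1.0 / 6.0;

double scaled_term(double x) {
    // constexpr on a local variable: guaranteed to be evaluated at
    // compile time, and usable in further constant expressions.
    constexpr double kC3Squared = kC3 * kC3;
    return kC3Squared * x;
}
```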
Historical Hacks & Horticultural Harmony
The discussion branched into the historical aspects of mathematical approximations, with one commenter sharing an approximation from 650 AD by Bhaskara and its link to the development of calculus. This also sparked a side debate on whether such optimizations compromise 'elegance and truth' in code, to which the author sought clarification.
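The approximation the commenter referenced is presumably Bhaskara I's well-known rational approximation of sine, valid for inputs in [0, π]; a sketch:

```cpp
#include <cmath>

// Bhaskara I's 7th-century sine approximation:
//   sin(x) ~= 16x(pi - x) / (5*pi^2 - 4x(pi - x)),  for x in [0, pi].
// Maximum absolute error is on the order of 0.0016.
double bhaskara_sin(double x) {
    const double pi = 3.14159265358979323846;
    double t = x * (pi - x);
    return 16.0 * t / (5.0 * pi * pi - 4.0 * t);
}
```

Like the article's minimax polynomial, it trades a small, bounded error for a much cheaper evaluation, which is the same tension behind the 'elegance and truth' debate.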