Thoughts on Generating C
This post offers six pragmatic insights for compiler writers on generating C code, focusing on leveraging C features like static inline and structured types to achieve both performance and safety. It provides a deep dive into practical techniques for mitigating C's quirks while acknowledging its limitations. The detailed advice on low-level optimizations and type management resonates deeply with the Hacker News audience interested in compilers and systems programming.
The Lowdown
The author, a compiler engineer, shares six practical insights for generating C code, a common target language for compilers. These practices aim to leverage C's strengths while mitigating its pitfalls, offering a "local optimum" in compiler design for those writing programs that translate other programs. This approach helps avoid the undefined behavior often encountered in hand-written C.
- Static Inline for Abstraction: Advocates `static inline __attribute__((always_inline))` functions to eliminate the performance overhead of data abstractions, particularly when structs are passed and returned by value, allowing unused data to be optimized away. This avoids bottlenecks where struct values would otherwise be passed through memory.
- Explicit Integer Conversions: Recommends defining explicit `static inline` conversion functions (e.g., `u8_to_u32`) and enabling `-Wconversion`, instead of relying on C's error-prone implicit integer conversion rules. This improves type safety and makes every conversion in the generated code visible.
- Intentional Pointer Wrapping: Proposes wrapping raw pointers and integers in single-member structs (e.g., `struct gc_ref`, `type_0ref`) to make their intent explicit and catch type confusion at compile time. This is particularly powerful for compilers translating typed source languages, since subtyping can carry through to the C output without run-time overhead.
- Embrace `memcpy` for Unaligned Accesses: For cases like WebAssembly's linear memory, where alignment isn't guaranteed, the author suggests using `memcpy` for reads and writes and trusting the compiler to emit efficient unaligned load/store instructions. The alternative, casting to a wider pointer type and dereferencing, is undefined behavior when the address is misaligned.
- Manual Register Allocation for ABIs and Tail Calls: For functions with many arguments or return values, or with complex tail calls, manually allocating global variables for the "excess" parameters gives more control and robustness than relying on the C compiler's ABI handling. The same strategy also simplifies the implementation of multiple return values.
- Acknowledged Drawbacks: The author also candidly discusses the limitations of generating C. These include no control over the stack (as needed for garbage collection or continuations), no way to implement zero-cost exceptions without compiler support, and difficulty providing source-level debugging (DWARF debug info). The author also briefly dismisses Rust as an alternative target. The reasons given are the difficulty of satisfying its lifetime checks when the source language has no explicit lifetimes, its less mature tail-call support, and its longer compile times.
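The static-inline point from the first bullet can be sketched as follows. This assumes GCC or Clang, both of which accept `__attribute__((always_inline))`. The `slice` type, the `ALWAYS_INLINE` macro, and the `slice_*` and `sum_bytes` functions are hypothetical. They stand in for whatever abstractions a compiler might emit. Whether the struct actually ends up in registers depends on the optimizer.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical abstraction: a pointer plus a length, passed by value.
struct slice {
  const uint8_t *ptr;
  size_t len;
};

// GCC/Clang extension: force inlining so the struct need not be
// materialized in memory at call boundaries.
#define ALWAYS_INLINE static inline __attribute__((always_inline))

ALWAYS_INLINE struct slice slice_make(const uint8_t *ptr, size_t len) {
  struct slice s = { ptr, len };
  return s;
}

ALWAYS_INLINE size_t slice_len(struct slice s) { return s.len; }

ALWAYS_INLINE uint8_t slice_at(struct slice s, size_t i) { return s.ptr[i]; }

// Drop the first n elements; the caller guarantees n <= s.len.
ALWAYS_INLINE struct slice slice_drop(struct slice s, size_t n) {
  return slice_make(s.ptr + n, s.len - n);
}

// Example of generated code that goes through the abstraction. After
// inlining, the optimizer can treat s.ptr and s.len as plain locals.
uint32_t sum_bytes(const uint8_t *buf, size_t n) {
  struct slice s = slice_make(buf, n);
  uint32_t total = 0;
  while (slice_len(s) > 0) {
    total += (uint32_t)slice_at(s, 0);
    s = slice_drop(s, 1);
  }
  return total;
}
```

With optimization enabled, `sum_bytes` can typically compile as if it manipulated a pointer and a counter directly. A struct field that no caller reads can be eliminated entirely.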
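The explicit-conversion style from the second bullet might look like the sketch below. Only `u8_to_u32` comes from the summary; the other helper names are hypothetical. Compiling with `-Wconversion` (supported by GCC and Clang) then flags any implicit narrowing or sign-changing conversion that the generator failed to route through a helper.

```c
#include <stdint.h>

// Widening conversions are value-preserving, so a plain return suffices.
static inline uint32_t u8_to_u32(uint8_t x) { return x; }  // zero-extend
static inline int32_t i8_to_i32(int8_t x) { return x; }    // sign-extend

// Narrowing: keep the low 8 bits, stated explicitly.
static inline uint8_t u32_to_u8_wrap(uint32_t x) { return (uint8_t)(x & 0xFFu); }

// Signed -> unsigned is well-defined in C (reduction modulo 2^32).
static inline uint32_t i32_to_u32_bits(int32_t x) { return (uint32_t)x; }

// Unsigned -> signed: a plain cast of an out-of-range value gives an
// implementation-defined result, so the wraparound is computed by hand.
static inline int32_t u32_to_i32_bits(uint32_t x) {
  if (x <= (uint32_t)INT32_MAX) return (int32_t)x;
  return (int32_t)(x - 0x80000000u) - INT32_MAX - 1;
}

// Hypothetical generated code for a source-level "u8 + u8 -> u32" add:
// both operands are widened explicitly instead of relying on promotion.
static inline uint32_t add_u8_to_u32(uint8_t a, uint8_t b) {
  return u8_to_u32(a) + u8_to_u32(b);
}
```

Each helper compiles to at most a single instruction or two. The generated code, however, now states exactly which conversion the source language requires at each point.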
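The pointer-wrapping idea from the third bullet can be illustrated with a small sketch. It assumes a `uintptr_t` representation for `struct gc_ref`. The field names, the object layout `struct type_0_obj`, and all helper functions are hypothetical. Each wrapper is a distinct C type, so passing a `struct gc_ref` where a `struct type_0ref` is expected is a compile error. Once the helpers are inlined, the single-member structs usually cost nothing at run time.

```c
#include <stdint.h>

// A managed reference: the raw address, wrapped so it cannot be mixed up
// with an ordinary integer or an unmanaged pointer.
struct gc_ref { uintptr_t value; };

// Hypothetical reference to source-language type #0. Wrapping a gc_ref
// expresses "type_0ref is a subtype of gc_ref" in C.
struct type_0ref { struct gc_ref p; };

// Hypothetical heap layout of a type-0 object.
struct type_0_obj { int32_t field0; };

static inline struct gc_ref gc_ref_from_ptr(void *ptr) {
  struct gc_ref r = { (uintptr_t)ptr };
  return r;
}

static inline void *gc_ref_to_ptr(struct gc_ref r) { return (void *)r.value; }

// Upcast: always valid, just unwraps.
static inline struct gc_ref type_0ref_upcast(struct type_0ref r) { return r.p; }

// Downcast without a check: the generator emits this only where the source
// type system (or a preceding run-time test) guarantees the type.
static inline struct type_0ref type_0ref_from_gc_ref_unchecked(struct gc_ref r) {
  struct type_0ref t = { r };
  return t;
}

// Field access is only available on the precise type.
static inline int32_t type_0ref_field0(struct type_0ref r) {
  return ((struct type_0_obj *)gc_ref_to_ptr(r.p))->field0;
}
```

Unlike a `typedef`, which is only an alias, each struct is checked by the C compiler. The source language's type distinctions therefore survive into the generated code.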
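The `memcpy` approach from the fourth bullet, applied to a WebAssembly-style linear memory, could look like the sketch below. The names `mem_load_u32` and `mem_store_u32` are hypothetical, and bounds checks are omitted. The code uses host byte order, so it matches WebAssembly's little-endian semantics only on a little-endian host; a big-endian host would also need a byte swap.

```c
#include <stdint.h>
#include <string.h>

// Load a 32-bit value at an arbitrary, possibly misaligned, byte offset.
// Dereferencing a `const uint32_t *` cast of a misaligned address would be
// undefined behavior. memcpy has no alignment requirement, and compilers
// typically lower a fixed 4-byte memcpy to a single load on targets that
// permit unaligned access. The caller must ensure offset + 4 is in bounds.
static inline uint32_t mem_load_u32(const uint8_t *base, uint32_t offset) {
  uint32_t value;
  memcpy(&value, base + offset, sizeof value);
  return value;  // host byte order
}

// Store counterpart, with the same reasoning and the same bounds caveat.
static inline void mem_store_u32(uint8_t *base, uint32_t offset, uint32_t value) {
  memcpy(base + offset, &value, sizeof value);
}
```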
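Manual "register" allocation through globals, as described in the fifth bullet, might look like the sketch below. The global names, the choice of passing only two arguments as C parameters, and the `gen_divmod` and `gen_sum4` examples are all hypothetical. The first result travels in the ordinary C return value and the second in a global. Likewise, arguments beyond the first two are stored in globals before the call.

```c
#include <stdint.h>

// Hypothetical global "registers". A code generator would emit a fixed set
// of these and reuse them across all calls.
static int64_t ret_reg1;            // second return value
static int64_t arg_reg2, arg_reg3;  // third and fourth arguments

// Source-level function returning two values (quotient, remainder).
// Quotient comes back as the C return value, remainder in ret_reg1.
// Caller guarantees b != 0 and not (a == INT64_MIN && b == -1).
static int64_t gen_divmod(int64_t a, int64_t b) {
  ret_reg1 = a % b;
  return a / b;
}

// Source-level function of four arguments: the first two are C parameters,
// the remaining two are read from globals the caller filled in.
static int64_t gen_sum4(int64_t a0, int64_t a1) {
  return a0 + a1 + arg_reg2 + arg_reg3;
}
```

A caller must read `ret_reg1` immediately after the call, before any other call can overwrite it. Because the globals are shared, a multithreaded runtime would likely declare them `_Thread_local` (C11).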
Despite these drawbacks, the author concludes that generating C remains a highly effective strategy, often leading to working code with minimal debugging once it type-checks, offering a pragmatic balance of power and convenience for compiler development.