Why Custom Attributes in .NET Give Me Nightmares
A .NET reverse engineer dissects the frustrating design choices behind Custom Attributes in the .NET file format. The author argues their underlying storage mechanism, particularly for enums and types, is inefficient, complex, and prone to bugs. This deep dive highlights obscure technical decisions that lead to "nightmares" for those working intimately with .NET binaries.
The Lowdown
The author, a maintainer of a .NET PE parsing library, passionately details his "nightmares" stemming from the design of .NET's Custom Attributes, particularly their storage mechanism. He considers them a "source of all evil" due to their poor implementation within the .NET file format, especially when contrasted with the otherwise well-designed metadata system.
- What Custom Attributes Are: They are extra pieces of metadata attached to various code elements (classes, methods, fields, parameters) used by C# compilers, analyzers, and for meta-programming purposes, such as the `ObsoleteAttribute` or automatic serialization. They extend the normal metadata associated with types and members.
- Anatomy in .NET Binaries: Custom Attributes are stored in a `CustomAttribute` metadata table. Each entry references the member it's attached to, the attribute's constructor, and a blob stream containing serialized arguments. The critical design flaw lies in how these arguments are represented in the blob.
- The Enum Values Problem: When Custom Attributes include enum arguments, the values are serialized based on their underlying type (e.g., `int` for 4 bytes, `short` for 2 bytes). However, determining an enum's underlying type is an "incredibly expensive operation." It necessitates complex assembly resolution, type tree traversal (including handling nested types and recursive searches), and potentially multiple type forwarders across different DLLs before the enum's `value__` field can be inspected. The author suggests a simpler `CorElementType` prefix could have avoided this complexity.
- The Type Values (FQNs) Problem: Attributes referencing `Type` objects (e.g., `typeof(int)`) or boxed `object` values are stored as Fully Qualified Names (FQNs) strings. This approach is problematic because:
  - Space Inefficiency: FQNs are extremely verbose and cannot be deduplicated, leading to massive storage overhead (e.g., an 89-character string for `System.Int32` instead of 4 bytes). Generic types exacerbate this issue significantly.
  - Slow and Complex Parsing: Parsing FQNs involves five components (type name, assembly name, version, culture, public key), each with its own parsing rules and potential for different ordering, making it far more CPU-intensive than simple metadata token lookups.
  - Grammar and Escaping Rules: FQNs require complex escaping rules for reserved characters, which are often inconsistent or incomplete across implementations, creating a major source of bugs.
  - Unintuitive Resolution: The assembly specification in an FQN is optional, leading to unpredictable type resolution. For instance, `"System.IO.Stream"` resolves without an assembly specifier, but `"System.Uri"` does not, due to the nuances of modern .NET's split core libraries and type forwarders.
- Why It Persists: The author theorizes this FQN approach might be influenced by Java's `.class` file format. Despite the issues, Microsoft is unlikely to change the design due to a strong commitment to backward compatibility, especially since custom attribute behavior doesn't typically affect runtime execution. The file format does allow for versioning (a `0x0002` version could exist), but it remains unused.
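To make the enum problem concrete, here is a minimal Python sketch (not the author's code) of reading a custom attribute blob whose constructor takes a single enum argument. The blob layout follows the ECMA-335 custom attribute format: a 2-byte prolog of `0x0001`, the fixed arguments, then a 2-byte named-argument count. The function name, the `_FORMATS` table, and the sample bytes are hypothetical and exist only for illustration. The point is that the blob itself never says how wide the enum is, so the caller has to supply that from elsewhere.

```python
import struct

PROLOG = 0x0001  # the only blob version currently in use

# Hypothetical lookup: struct format for each possible enum underlying type.
_FORMATS = {
    "int8": "<b", "uint8": "<B",
    "int16": "<h", "uint16": "<H",
    "int32": "<i", "uint32": "<I",
    "int64": "<q", "uint64": "<Q",
}


def read_single_enum_arg(blob: bytes, underlying: str) -> tuple:
    """Read (enum value, named-argument count) from a blob whose
    constructor takes exactly one enum argument.

    `underlying` must come from outside the blob, which in a real parser
    means resolving the enum's definition and inspecting `value__`.
    """
    (prolog,) = struct.unpack_from("<H", blob, 0)
    if prolog != PROLOG:
        raise ValueError(f"unexpected prolog 0x{prolog:04x}")
    fmt = _FORMATS[underlying]
    (value,) = struct.unpack_from(fmt, blob, 2)
    offset = 2 + struct.calcsize(fmt)
    (num_named,) = struct.unpack_from("<H", blob, offset)
    return value, num_named


# Illustrative blob: prolog, a 2-byte enum value of 2, one named argument
# (a property named "X" of type int32 with value 7).
SAMPLE_BLOB = bytes.fromhex("0100" "0200" "0100" "54" "08" "0158" "07000000")

correct = read_single_enum_arg(SAMPLE_BLOB, "int16")  # enum really is a short
wrong = read_single_enum_arg(SAMPLE_BLOB, "int32")    # guessed the default width
```

With the right width the parser sees the value 2 and one named argument. With the wrong width it silently swallows the named-argument count into the enum value and reads garbage afterwards, with no error raised. A one-byte `CorElementType` prefix in front of the value, as the author suggests, would let a parser pick the width without resolving anything.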
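The FQN overhead can be sketched the same way. The string below is an assumption: it is the .NET Framework 4 assembly-qualified name for `System.Int32`, which happens to be exactly 89 characters long and so matches the figure quoted above, but the summary does not spell the string out. The splitter is deliberately naive and hypothetical. It ignores escaping, reordering of components, and generic arguments, which is precisely why a real parser needs a full grammar.

```python
# Assumed example: the .NET Framework 4 identity of System.Int32.
INT32_FQN = (
    "System.Int32, mscorlib, Version=4.0.0.0, "
    "Culture=neutral, PublicKeyToken=b77a5c561934e089"
)

# In the blob, strings are stored as a length prefix followed by UTF-8 bytes.
# For lengths below 128 the compressed length prefix is a single byte.
utf8 = INT32_FQN.encode("utf-8")
serialized_size = 1 + len(utf8)

TOKEN_SIZE = 4  # size of a metadata token, for comparison


def naive_split(fqn: str) -> dict:
    """Split on commas only. Hypothetical helper, NOT a correct FQN parser:
    it breaks on escaped commas and on generic type arguments."""
    parts = [p.strip() for p in fqn.split(",")]
    result = {"type": parts[0], "assembly": None, "properties": {}}
    if len(parts) > 1:
        result["assembly"] = parts[1]
    for part in parts[2:]:
        key, _, value = part.partition("=")
        result["properties"][key] = value
    return result


simple = naive_split(INT32_FQN)

# A generic type nests a second assembly-qualified name inside brackets,
# so splitting on commas cuts the type name in half.
GENERIC_FQN = "System.Collections.Generic.List`1[[System.Int32, mscorlib]], mscorlib"
broken = naive_split(GENERIC_FQN)
```

Under these assumptions the string costs 90 bytes in the blob against 4 bytes for a token, and every occurrence pays that cost again because the strings are not deduplicated. The generic example shows the parsing side of the complaint: the naive splitter returns a truncated type name instead of failing, so a real implementation has to track brackets and escapes.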
The author concludes that while .NET boasts remarkable stability and backward compatibility, the design of Custom Attributes, particularly their reliance on expensive enum resolution and verbose FQN strings, stands out as a frustrating and arguably unnecessary design flaw that continues to plague developers working at a low level with .NET binaries.