MessageFormat: Unicode standard for localizable message strings

The Unicode MessageFormat Working Group has introduced an industry standard for localizable message strings, designed to simplify complex internationalization for developers. The announcement prompted discussion of long-standing linguistic pain points in software development, such as plurals and gender. Hacker News commenters debate the standard's feasibility, its syntax complexity, and its adoption prospects against existing solutions.

On front page: 11h

First seen: Feb 16, 10:00 AM

Last seen: Feb 16, 8:00 PM

The Lowdown

The Unicode MessageFormat Working Group, a subgroup of the Unicode CLDR-TC, is developing and supporting a new industry standard for localizable message strings. Known as MessageFormat (or sometimes MessageFormat 2.0, replacing earlier ICU capabilities), it aims to provide a robust framework for handling fluent messages and locally-adapted data presentation.

Goal: To support developers, translators, and end-users with features like gender, inflections, and speech, ensuring interoperable syntax and message data models across various presentation frameworks and programming environments.
Standard Status: The Unicode MessageFormat Standard is a stable part of CLDR, approved by the CLDR Technical Committee, and recommended for implementation. The normative version is published as part of TR35.
Feedback & Participation: The MFWG actively seeks feedback on the specification, implementation difficulties, use-cases, and future requirements, encouraging participation from software developers and localization engineers.

This initiative seeks to standardize the often-complex world of software internationalization, promising a more streamlined approach to handling diverse linguistic rules.

The Gossip

Plurality's Practical Power

Many commenters laud MessageFormat for its ability to abstract away complex conditional logic, especially concerning pluralization. Developers often face messy `if/else` or `switch` blocks for different counts ('0 rows', '1 row', '{n} rows'), which become significantly more complicated in languages with multiple plural categories (e.g., Czech, Slovene, Russian). MessageFormat offers a structured way to handle these nuances, removing branching from application code and making internationalization more manageable.
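
To make the pluralization point concrete, here is a minimal Python sketch of the idea: the application passes only a locale and a count, and the choice of wording is driven by a plural category looked up in message data. This is not the MessageFormat API. The names `plural_category`, `MESSAGES`, and `format_rows` are hypothetical, and the rules are hand-simplified from the CLDR cardinal plural rules for non-negative integers only (decimals are ignored).

```python
# Simplified plural-category rules for non-negative integers only.
# Assumption: hand-transcribed from the CLDR cardinal rules; a real
# implementation would use a CLDR-backed library instead.
def plural_category(locale: str, n: int) -> str:
    if locale == "en":
        return "one" if n == 1 else "other"
    if locale == "cs":
        if n == 1:
            return "one"
        if 2 <= n <= 4:
            return "few"
        return "other"
    if locale == "ru":
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not (12 <= n % 100 <= 14):
            return "few"
        return "many"
    raise ValueError(f"no plural rules for locale {locale!r}")


# Translators own this table; application code never branches on the count.
MESSAGES = {
    "en": {"one": "{n} row", "other": "{n} rows"},
    "cs": {"one": "{n} řádek", "few": "{n} řádky", "other": "{n} řádků"},
    "ru": {
        "one": "{n} строка",
        "few": "{n} строки",
        "many": "{n} строк",
        "other": "{n} строки",
    },
}


def format_rows(locale: str, n: int) -> str:
    variants = MESSAGES[locale]
    category = plural_category(locale, n)
    # Fall back to the catch-all variant when a category has no entry.
    pattern = variants[category] if category in variants else variants["other"]
    return pattern.format(n=n)


if __name__ == "__main__":
    for locale in ("en", "cs", "ru"):
        print(locale, [format_rows(locale, n) for n in (1, 2, 5, 21)])
```

The calling code is the same for every locale; adding a language means adding rules and message data, not another `if/else` branch at each call site.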

Syntax Scrutiny & Structural Similarities

The syntax of MessageFormat draws mixed reactions. Some commenters find that the example code resembles a 'programming language' and worry that nesting, plugins, looping, and similar features could add complexity. Others compare its structure to existing systems like TeX or Lisp, pointing to a perceived 'brittle' nature or an abundance of closing braces. There's a sentiment that simple interpolation might benefit from a less verbose syntax, while more complex cases could use a richer markup.
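
For readers who have not seen the syntax under discussion, the sketch below holds two messages as Python strings: a simple interpolation and a plural selector. Both are written as I understand the published MessageFormat syntax and are illustrative assumptions rather than text taken from the thread. The helper `brace_stats` is hypothetical and only counts braces, to show where the 'closing braces' complaint comes from.

```python
# A simple message: plain text with one placeholder.
SIMPLE = "Hello, {$name}!"

# A plural selector, written as I understand the published syntax.
# Patterns inside a matcher are quoted with {{...}}, so a pattern that
# starts with a placeholder opens with three braces in a row.
PLURAL = """\
.input {$count :number}
.match $count
one {{{$count} row}}
*   {{{$count} rows}}
"""


def brace_stats(message: str) -> tuple[int, int]:
    """Return (opening, closing) brace counts for a message string."""
    return message.count("{"), message.count("}")


if __name__ == "__main__":
    print("simple:", brace_stats(SIMPLE))
    print("plural:", brace_stats(PLURAL))
```

The simple message needs a single pair of braces, while the two-variant selector needs seven pairs, which is the trade-off commenters are weighing.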

Feasibility, Familiarity & Future Forecasts

A significant portion of the discussion revolves around the feasibility and adoption of MessageFormat. Initial claims that the approach is 'infeasible' were met with counter-arguments noting that similar concepts (like `.po` files with post-processing hooks) have already found adoption. Some compare it to Mozilla's Project Fluent and question why such a standard hasn't been more widely adopted already. Optimists highlight the Unicode Consortium's credibility in managing language specs and point to the long-term use of similar formats as evidence that adoption is achievable.