# $localize - legacy message id handling
## Background
In Angular, i18n messages in component templates are translated by matching a **[message id](#Message-id)**. Message ids can be custom, provided by the developer (specified by the `@@id` syntax), or computed by a **[digest function](#Digest-function)**. Translations of i18n messages are commonly stored in files that have one of three XML-based formats:
* XLIFF 1.2
* XLIFF 2
* XMB/XTB (also used by `goog.getMsg()`)
The digest function for XLIFF 1.2 is different to the other two formats.
:::info
Note that none of these formats specifies how to compute message ids. It is up to the tools reading and writing them to agree on a digest function.
:::
### Pre-$localize
Before `$localize`, the Angular compiler translated messages during compilation of the template. At this point it has access to the translation format and so can apply the appropriate digest function to compute the message id for translation lookup.
Moreover, the Angular compiler has access to the original HTML. Some of the information used to compute the message ids is only available in that original HTML source.
### Post-$localize
With `$localize`, translation is done much later in the build pipeline after the Angular compilation has completed. At this point, the only information available is what is passed to the `$localize` function.
In other words, only the static parts of the template string, the substitution expressions and [**message metadata**](#Message-metadata) blocks are available for computing the message id.
For example, in the following tagged string:
```typescript
$localize `:greeting|Home page user greeting:Hello, ${user.name}:name:!`
```
The only information available is:
```typescript
meaning: 'greeting',
description: 'Home page user greeting',
message parts: ['Hello, ', '!'],
placeholder names: ['name']
```
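As a rough illustration of how this information can be recovered from the tagged string alone, the sketch below uses a stand-in tag function. The names `inspectMessage`, `splitLeadingBlock` and `MessageInfo` are hypothetical and not part of `$localize`, and the parsing is deliberately simplified: it ignores custom ids and any escaping of colons.

```typescript
// Hypothetical shape for the information carried by a tagged message.
interface MessageInfo {
  meaning: string;
  description: string;
  messageParts: string[];
  placeholderNames: string[];
}

// Split an optional leading `:block:` from the rest of a template part.
function splitLeadingBlock(part: string): {block: string | undefined; text: string} {
  const end = part.charAt(0) === ':' ? part.indexOf(':', 1) : -1;
  return end === -1 ? {block: undefined, text: part} :
                      {block: part.slice(1, end), text: part.slice(end + 1)};
}

// Stand-in tag function (NOT the real $localize): it only inspects its inputs.
function inspectMessage(parts: TemplateStringsArray, ..._substitutions: unknown[]): MessageInfo {
  const info: MessageInfo = {meaning: '', description: '', messageParts: [], placeholderNames: []};
  parts.forEach((part, index) => {
    const {block, text} = splitLeadingBlock(part);
    if (index > 0) {
      // After a substitution the block is a placeholder name ('' when unnamed).
      info.placeholderNames.push(block === undefined ? '' : block);
    } else if (block !== undefined) {
      // On the first part the block is `meaning|description`.
      const bar = block.indexOf('|');
      info.meaning = bar === -1 ? '' : block.slice(0, bar);
      info.description = bar === -1 ? block : block.slice(bar + 1);
    }
    info.messageParts.push(text);
  });
  return info;
}

const user = {name: 'Ada'};
console.log(inspectMessage`:greeting|Home page user greeting:Hello, ${user.name}:name:!`);
```

Note that the substitution values themselves are never consulted: only the static parts and the blocks embedded in them contribute to the result.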
## Translation problems
The original HTML and the format of the translation files (and so the digest function) are no longer available at the time of translation. This raises some problems that need to be addressed:
* [Obscure canonical message strings](#Obscure-canonical-message-strings)
* [Unknown digest function](#Unknown-digest-function)
* [Whitespace resilience](#Whitespace-resilience)
### Obscure canonical message strings
The current [canonical message string](#Canonical-message-string) is difficult (if not impossible) to compute only from the information passed to `$localize`.
For example, given the following HTML:
```htmlmixed
"<p i18n>
Press <b>cancel</b>
to stop {{job}} job
</p>"
```
The canonical message string would be:
```htmlmixed
"
Press <ph tag name="START_BOLD_TEXT">cancel</ph name="CLOSE_BOLD_TEXT">[
to stop ,<ph name="INTERPOLATION">job</ph>, job
]"
```
:::warning
Note that the sequence of static and interpolated text gets wrapped in `[... , ... ]` to look like an array.
:::
The equivalent `$localize` call would be:
```typescript
$localize `
Press ${"�#1�"}:START_BOLD_TEXT:cancel${"/�#1�"}:CLOSE_BOLD_TEXT:
to stop ${"�0�"}:INTERPOLATION: job
`;
```
or something similar to:
```typescript
$localize(
[
'Press ',
':START_BOLD_TEXT:cancel',
':CLOSE_BOLD_TEXT:\nto stop ',
':INTERPOLATION: job\n'
],
'�#1�',
'/�#1�',
'�0�'
);
```
The grouping markers (`[`, `,` and `]`) cannot be recovered without a fair amount of processing (and possibly guesswork) when parsing the `messageParts` and expressions.
:::info
Without knowledge of the original HTML it is not possible for `$localize` to compute the message id.
:::
### Unknown digest function
Currently, XLIFF 1.2 uses a different digest function from the other two formats. For example, given the message from the previous section, the computed message ids are:
| XLIFF 2 / XMB/XTB | XLIFF 1.2 |
| ------------------- | ---------------------------------------- |
| 7056919470098446707 | ec1d033f2436133c14ab038286c4f5df4697484a |
The previous implementation could cope with this because translation was done in the Angular compiler, which knew what format the translations were in and so which digest function to use when computing the message ids.
:::info
Without knowledge of the format of the translations (i.e. what digest function should be used), it is not possible for `$localize` to compute the message id.
:::
### Whitespace resilience
The current conversion of HTML to a [canonical message string](#Canonical-message-string) is resilient to some changes in the source message but not others.
* Expressions being interpolated can change
* Whitespace within ICU expressions can change
Significantly though, whitespace outside ICU expressions is always included in the canonical message string, whether or not the component whose template contains the message has `preserveWhitespaces` set to `true`.
The `$localize` calls contain message strings where whitespace has been collapsed (unless `preserveWhitespaces: true`).
:::info
Without knowledge of the original HTML it is not possible for `$localize` to compute the message id in cases where whitespace has been collapsed.
:::
## Proposed Solution
To avoid these problems, the ivy compiler should use a [common digest function](#Common-digest-function) for all translation formats that can be computed only using the information available to `$localize`.
Translation would be achieved by computing the message id from the `$localize` call and matching against a set of translations keyed off the message id.
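A minimal sketch of such a lookup follows. The `ParsedTranslation` shape, the `translate` function and the sample id `greeting-id` are hypothetical; a real implementation would derive the id from the `$localize` call using the common digest function.

```typescript
// Hypothetical parsed form of one translated message.
interface ParsedTranslation {
  messageParts: string[];      // static text of the translation
  placeholderNames: string[];  // name of the placeholder that follows each part
}

// Look up a translation by message id and fill in the substitutions by name,
// so that a translation is free to reorder its placeholders.
function translate(
    translations: Record<string, ParsedTranslation>, messageId: string,
    substitutions: Record<string, unknown>): string {
  const translation = translations[messageId];
  if (translation === undefined) {
    throw new Error('No translation found for message id "' + messageId + '"');
  }
  let result = translation.messageParts[0];
  translation.placeholderNames.forEach((placeholder, i) => {
    result += String(substitutions[placeholder]) + translation.messageParts[i + 1];
  });
  return result;
}

// 'greeting-id' stands in for a computed message id.
const sampleTranslations: Record<string, ParsedTranslation> = {
  'greeting-id': {messageParts: ['Bonjour, ', ' !'], placeholderNames: ['name']},
};
console.log(translate(sampleTranslations, 'greeting-id', {name: 'Ada'}));
```

Keying substitutions by placeholder name rather than by position is a design choice made for this sketch; it lets a translator move placeholders around without the calling code changing.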
Extraction of messages (message ids and source messages) may be achieved directly from bundled code (containing calls to `$localize`) without any dependence on the Angular compiler.
Since this would be a breaking change for current applications, whose translation files might contain message ids computed using legacy digest functions, we should implement:
* A [compiler legacy mode](#Legacy-mode), which passes through old message ids as custom ids.
* A [translation file migration tool](#Migration-tool), which converts message ids to the new format.
### Common digest function
PR: https://github.com/angular/angular/pull/32867
Both XLIFF 2 and XMB/XTB use the same digest function. The new common digest function should use the same hashing function as these but compute the canonical message string in a way that is resilient to whitespace changes (if appropriate) and can be computed from the information provided to `$localize` alone.
It will be possible to compute the message id in the `$localize` function. Therefore there will be no need to pass around message ids, unless they are custom ids provided by the developer.
The digest function will work as follows:
* Generate a canonical message string by joining the tagged string message parts together with generated placeholders of the form `{$...}`.
* Compute a hash using the current `computeMsgId(message, meaning)` function.
Some examples:
```typescript
$localize `abc${1}def${2}`
-> 'abc{$PH}def{$PH_1}'
-> '6223022895014632549'
$localize `abc${1}:custom:def${2}:custom2:`
-> 'abc{$custom}def{$custom2}'
-> '8479809234660862889'
$localize `:meaning|description:abc`
-> 'abc'
-> '1071947593002928768'
$localize `:@@custom-id:abc`
-> ...
-> 'custom-id'
```
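The first step (building the canonical message string) can be sketched as below. This is an illustration only: the function name `canonicalMessageString` is hypothetical, and the naming of unnamed placeholders (`PH`, then `PH_1`, `PH_2`, ... by position) is an assumption inferred from the examples above. The hashing step is left to the existing `computeMsgId(message, meaning)` function and is not reproduced here.

```typescript
// Build the canonical message string from message parts that have already had
// their metadata and placeholder-name blocks stripped.
// Unnamed placeholders get a generated name based on their position
// (assumption: `PH` for the first, `PH_1` for the second, and so on).
function canonicalMessageString(messageParts: string[], placeholderNames: string[] = []): string {
  let canonical = messageParts[0];
  for (let i = 1; i < messageParts.length; i++) {
    const generatedName = i === 1 ? 'PH' : 'PH_' + (i - 1);
    const placeholderName = placeholderNames[i - 1] || generatedName;
    canonical += '{$' + placeholderName + '}' + messageParts[i];
  }
  return canonical;
}

console.log(canonicalMessageString(['abc', 'def', '']));
console.log(canonicalMessageString(['abc', 'def', ''], ['custom', 'custom2']));
console.log(canonicalMessageString(['abc']));
```

Because the substitution expressions never appear in the output, changing `${1}` to any other expression leaves the canonical string, and therefore the message id, unchanged.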
:::success
By enabling message id generation from `$localize` calls there is no need to add computed message ids to the generated template code. This keeps the size of the bundles down, especially for runtime translation, where calls to `$localize` are not inlined.
:::
If the component is not set to `preserveWhitespaces: true` then canonical message strings generated from its templates will have already had their whitespace collapsed.
:::success
Computed message ids are resilient to trivial whitespace changes, unless the component specifically preserves whitespace in its template.
:::
In order to localize strings within application code (e.g. in an Angular service) the developer would call `$localize` directly. The message ids can be computed directly from application code calls to `$localize`.
:::success
Localized messages, within application code, are supported out of the box.
:::
### Legacy mode
PR: https://github.com/angular/angular/pull/32937
For initial backward compatibility with pre-ivy translation files, we shall provide a legacy mode in the Angular compiler.
In this mode we will compute the old message id using the appropriate digest function and pass it through to the `$localize` call as a custom id. (This is basically what is happening already in the code but only for the XMB/XLIFF 2 format.)
In the pre-ivy world translations are done in the Angular compiler where a translation format must be provided (via the compiler option `i18nFormat`). We can make use of this in the legacy mode.
If this format option is provided then the Angular compiler should add the legacy message ids to the `$localize` calls as custom ids inside a metadata block.
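As a sketch of what the compiler might emit in this mode, the hypothetical helper below serializes a metadata block that carries a legacy message id as the custom id. The function name and signature are assumptions made for illustration, not the compiler's actual code.

```typescript
// Hypothetical helper: serialize a metadata block, carrying a legacy message id
// as the custom id so that old translation files still match.
function metadataBlockWithLegacyId(meaning: string, description: string, legacyId: string): string {
  const meaningPart = meaning ? meaning + '|' : '';
  return ':' + meaningPart + description + '@@' + legacyId + ':';
}

// A message with no meaning or description, using a sample XLIFF 1.2 style id.
console.log(metadataBlockWithLegacyId('', '', 'ec1d033f2436133c14ab038286c4f5df4697484a'));
// A message with both a meaning and a description.
console.log(metadataBlockWithLegacyId('greeting', 'Home page user greeting', '7056919470098446707'));
```

The resulting block is placed at the start of the tagged string, so `$localize` would treat the legacy id exactly like a developer-provided custom id.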
:::success
Old translation files can continue to be used until the developer is ready to migrate.
:::
### Migration tool
Implement a tool that converts each translation file to the new message id format. Due to the legacy mode this can be a secondary activity.
## Concepts
### Message id
A string that uniquely identifies a message to be translated. These can be custom (provided by a developer) or computed via a digest function.
### Digest function
A function that converts a message to a hash value that can be used to look up a translation.
Digest functions typically implement the following three steps:
* Convert the message to a canonical string representation
* Combine the canonical string with an optional **meaning** string
* Compute a hash value from this combined string
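The three steps can be sketched generically as follows. Everything here is illustrative: `digest`, `HashFn` and `toyHash` are hypothetical names, the `[meaning]` suffix is just one possible way to combine the two strings, and the toy hash is a stand-in that is not one of Angular's digest functions.

```typescript
type HashFn = (input: string) => string;

// Generic digest: step 1 (canonicalization) is assumed to have been done by
// the caller; steps 2 and 3 are performed here.
function digest(canonicalMessage: string, meaning: string, hash: HashFn): string {
  // Step 2: combine the canonical string with the optional meaning.
  const combined = meaning ? canonicalMessage + '[' + meaning + ']' : canonicalMessage;
  // Step 3: hash the combined string.
  return hash(combined);
}

// Toy 32-bit string hash, used only as a stand-in for a real hash function.
function toyHash(input: string): string {
  let h = 5381;
  for (let i = 0; i < input.length; i++) {
    h = ((h * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

// The same text with different meanings is hashed from different inputs.
console.log(digest('right', 'direction', toyHash));
console.log(digest('right', 'correct', toyHash));
```

Passing the hash in as a parameter mirrors the situation described in this note: the canonicalization and combination steps can be shared while the hash itself differs between translation formats.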
### Canonical message string
A string that represents the message to be translated, which is resilient to irrelevant changes, such as the original text of expressions being interpolated, or certain whitespace changes.
### Meaning
A string, associated with a message, that indicates the particular meaning of a message that might otherwise be ambiguous. For example, the English word "right" could be translated to more than one French word, e.g. "droit" or "vrai".
In Angular meanings are assigned to messages via a [message metadata](#Message-metadata) string.
### Message metadata
Additional information about a message, included in the template string literal tagged with the `$localize` function via "blocks" that are delimited by colon characters `:`.
The meaning, description and custom id block must be at the start of the string:
```typescript
$localize `:(meaning|)?(description)?(@@id)?:message string`
```
In this block the `meaning`, `description` and `id` are all optional. The `meaning` is terminated by `|` and the `id` is prefixed by `@@`.
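A minimal sketch of parsing the content of such a block is shown below. The `MessageMetadata` interface and `parseMetadataBlock` function are hypothetical names, and the sketch assumes that `|` and `@@` never appear inside the meaning or description themselves.

```typescript
interface MessageMetadata {
  meaning?: string;
  description?: string;
  id?: string;
}

// Parse the text found between the colons of a leading metadata block,
// following the pattern `(meaning|)?(description)?(@@id)?`.
function parseMetadataBlock(block: string): MessageMetadata {
  // Everything after `@@` is the custom id.
  const idStart = block.indexOf('@@');
  const id = idStart === -1 ? '' : block.slice(idStart + 2);
  const rest = idStart === -1 ? block : block.slice(0, idStart);
  // Everything before `|` is the meaning; the remainder is the description.
  const bar = rest.indexOf('|');
  const meaning = bar === -1 ? '' : rest.slice(0, bar);
  const description = bar === -1 ? rest : rest.slice(bar + 1);
  // Empty pieces are reported as missing.
  return {
    meaning: meaning || undefined,
    description: description || undefined,
    id: id || undefined,
  };
}

console.log(parseMetadataBlock('greeting|Home page user greeting'));
console.log(parseMetadataBlock('@@custom-id'));
```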
Placeholder name blocks appear directly after a substitution:
```typescript
$localize `Hello, ${person.name}:name:. Welcome to the game.`
```