Intent Defaults: The `<mrow>` canon

(Working Draft, June 2021, document subject to change)

In the new W3C Math group, as well as the community group that preceded it, we have been brainstorming a dialect of annotation markup for specifying "author intent" over presentation MathML trees.

In the simplest of examples, you could imagine an atomic attribute that grounds a single node to its mathematical meaning, as in Euler's number and Euler's constant:

<mi intent="euler-number">e</mi>
<mi intent="euler-constant">γ</mi>

which would allow assistive technology (AT) software to choose between speaking the mathematically informative concept name behind the rendered symbol and speaking the raw presentation readout directly. These values also double as anchors for:

- tying together aliases (e.g. here "napier-constant" for "euler-number" and "euler-mascheroni-constant" for "euler-constant"),
- speech hints, where compound speech rules can be defined at a precise granularity,
- other e-Learning applications, such as informative auto-coloring of nodes or "info boxes" with useful related links.

In this note, I will do some example-driven troubleshooting of the "free lunch" or "intent defaults" idea that has been discussed in the community group.

The idea of "intent defaults" is to standardize a shared, public-domain list of the most ubiquitous math notations in use in a K-14 educational setting, as initially informed by Western curricula. Each list item will contain:

- the math concept name, or technically the "value of the MathML intent attribute", such as euler-constant
- the notation known for this name, or technically a "selector for a MathML subtree", in a format yet to be determined.

Then we plan to offer this precompiled list to accessibility tools as a "default" interpretation mode: a notation recorded in the list is assumed to imply the recorded intent whenever it can be automatically spotted in a classic presentation MathML tree. This idea follows the spirit of the 80/20 rule, where we hope to remediate 80% of expressions in mass use with a small fraction of the standardization effort. Of course, we also mean to provide an escape hatch for remediating the "long tail" of formulas that won't fit neatly in this notational mini universe, and an outright "off switch" for documents that completely deviate from K-14 material.
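To make the shape of such a list a bit more concrete, here is a minimal Python sketch of what one entry might hold. The class name `IntentDefault`, its fields, the helper `canonical_intent` and the placeholder selector strings are all assumptions made for this illustration; the actual selector format is still undecided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentDefault:
    """One hypothetical entry of the defaults list."""
    intent: str          # value of the MathML intent attribute
    selector: str        # placeholder only: the selector format is undecided
    aliases: tuple = ()  # alternative concept names for the same intent


# Illustrative entries only, not an agreed-upon list.
DEFAULTS = [
    IntentDefault("euler-number", "mi[text()='e']", ("napier-constant",)),
    IntentDefault("euler-constant", "mi[text()='γ']",
                  ("euler-mascheroni-constant",)),
]


def canonical_intent(name):
    """Map a concept name or alias to its canonical intent value, else None."""
    for entry in DEFAULTS:
        if name == entry.intent or name in entry.aliases:
            return entry.intent
    return None
```

A tool could call `canonical_intent("napier-constant")` to land on the same entry as `euler-number`, and treat a `None` result as "not covered by the defaults".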

Running example

Consider the following single line TeX formula (displayed properly further down, if you're using Firefox today):

x \in (0,1) , y \in (-1, 0)

Presentation MathML

Here are some possible - and valid - presentation MathML trees for this expression, with their browser rendering.

Aligned operator tree: Additional <mrow> nodes are added, where possible, following the implied operator tree of the expression. In other words, pieces of the formula are grouped together based on the order and scope in which a human mathematician would evaluate them. As generated by latexml v0.8.5:

Source:

<math>
<mrow>
  <mi>x</mi>
  <mo>∈</mo>
  <mrow>
    <mo stretchy="false">(</mo>
    <mn>0</mn>
    <mo>,</mo>
    <mn>1</mn>
    <mo stretchy="false">)</mo>
  </mrow>
</mrow>
<mo>,</mo>
<mrow>
  <mi>y</mi>
  <mo>∈</mo>
  <mrow>
    <mo stretchy="false">(</mo>
    <mrow>
      <mo>-</mo>
      <mn>1</mn>
    </mrow>
    <mo>,</mo>
    <mn>0</mn>
    <mo stretchy="false">)</mo>
  </mrow>
</mrow>
</math>

Rendered:

$x \in (0, 1), y \in (-1, 0)$

Flat layout tree: no additional structure beyond the layout information needed for a faithful display. As generated by a MathJax v3 demo, as well as make4ht and tralics:

Source:

<math>
  <mi>x</mi>
  <mo>&#x2208;</mo>
  <mo stretchy="false">(</mo>
  <mn>0</mn>
  <mo>,</mo>
  <mn>1</mn>
  <mo stretchy="false">)</mo>
  <mo>,</mo>
  <mi>y</mi>
  <mo>&#x2208;</mo>
  <mo stretchy="false">(</mo>
  <mo>&#x2212;</mo>
  <mn>1</mn>
  <mo>,</mo>
  <mn>0</mn>
  <mo stretchy="false">)</mo>
</math>

Rendered:

$x \in (0, 1), y \in (-1, 0)$

There is a large variety of possible <mrow> wrapper nodes, at various levels of nesting. Whichever variant is chosen, it renders identically to the others, as seen above. The distinction remains invisible to the ultimate reader of the document, much like the <span> tag in HTML.

For this example, Chapter 3 of the official MathML 3 spec suggests yet another tree. It advocates an extra <mrow> that holds together the arguments of a range, although that recommendation has failed to persuade implementers. Explicitly:

<mrow>
  <mo stretchy="false">(</mo>
  <mrow>
    <mn>0</mn>
    <mo>,</mo>
    <mn>1</mn>
  </mrow>
  <mo stretchy="false">)</mo>
</mrow>

An in-between example was seen when combining AsciiMath with MathJax, which grouped together the intervals, but not the two top-level list items:

<math>
  <mstyle displaystyle="true">
    <mi>x</mi>
    <mo>&#x2208;</mo>
    <mrow>
      <mo>(</mo>
      <mn>0</mn>
      <mo>,</mo>
      <mn>1</mn>
      <mo>)</mo>
    </mrow>
    <mo>,</mo>
    <mi>y</mi>
    <mo>&#x2208;</mo>
    <mrow>
      <mo>(</mo>
      <mo>-</mo>
      <mn>1</mn>
      <mo>,</mo>
      <mn>0</mn>
      <mo>)</mo>
    </mrow>
  </mstyle>
</math>

Narration for accessible speech generation

We could imagine our running example read out as:

x is in the open interval from zero to one and y is in the open interval from negative one to zero

To automate such a reading, we would need to provide enriched annotations for the following:

- The two parentheticals are open intervals, with the corresponding speech hints for intervals ("from" and "to" connectives).
- The "elementOf" Unicode character ∈ can have the short narration "in" when indexing into a numeric range, versus the more verbose set-theoretic reading of "x is an element of …".
- The second comma delimits a list of formula statements, and can be read as "and" rather than "comma", or even as "while", "also", or another natural connective, depending on the context.
- The negative number must be recognized as a unit, even though it consists of two sibling elements, an <mo> and an <mn>. Only then can we avoid speaking the raw Unicode name of the operator (the trees above use either a hyphen or a minus sign) and instead use the mathematically accurate narration "negative".
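Once those four pieces of information are available, producing the narration is mostly template filling. The following Python sketch assumes the annotations have already been collected into a nested tuple of the form `(intent, arg, ...)`; that tuple format, the `speak` function, the `negative` and `formulae` labels and the tiny `NUMBER_WORDS` table are hypothetical and chosen only for this example.

```python
NUMBER_WORDS = {"0": "zero", "1": "one"}  # just enough for the running example


def speak(node):
    """Turn a nested (intent, *args) tuple, or a leaf string, into speech text."""
    if isinstance(node, str):
        return NUMBER_WORDS.get(node, node)
    head, *args = node
    spoken = [speak(arg) for arg in args]
    if head == "open-interval":
        return f"the open interval from {spoken[0]} to {spoken[1]}"
    if head == "element-of":
        return f"{spoken[0]} is in {spoken[1]}"
    if head == "negative":
        return f"negative {spoken[0]}"
    if head == "formulae":
        return " and ".join(spoken)
    # Unknown intent: fall back to reading the name followed by its arguments.
    return " ".join([head, *spoken])


EXAMPLE = ("formulae",
           ("element-of", "x", ("open-interval", "0", "1")),
           ("element-of", "y", ("open-interval", ("negative", "1"), "0")))

print(speak(EXAMPLE))
```

Each speech hint lives in exactly one branch, so swapping "and" for "while", or "in" for "is an element of", is a local change.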

Design challenges

Math notations are generally highly ambiguous. The construct $(x, y)$ could stand for:
- the open interval from x to y
- the point, or Cartesian coordinate, x comma y
- the ordered pair, or tuple, or vector, x comma y
- the arguments x and y, if the entire expression is e.g. $f (x, y)$ or the joint probability $P (x, y)$
- and more beyond K-14, such as the Hilbert symbol.

Question: which notations to include, under what conditions?

Expressions have varying lengths and are fully remixable.
- Only relative positional information is important for notations.
- However, context influences meaning, so selection is not self-contained.
  - In our flat variant, the commas can delimit a single four-element list, unless we consider the parentheses.
  - In other words, it is much easier to select over a fully built operator tree than over a minimal/flat layout tree.
- Question: do we pretend complex math syntax does not exist, offering defaults for the smallest notations only, or do we ask authors (and authoring tools) for help with marking up the operator structure?
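The "four-element list" problem can be reproduced in a few lines. This Python sketch splits the top-level children of `<math>` at every comma, once for a flat tree and once for a tree with operator-aligned `<mrow>` wrappers. The helper `split_on_commas` and the condensed `FLAT` and `GROUPED` strings are written for this illustration only (the `stretchy` attributes are dropped for brevity).

```python
import xml.etree.ElementTree as ET

FLAT = """<math>
  <mi>x</mi><mo>&#x2208;</mo>
  <mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo>
  <mo>,</mo>
  <mi>y</mi><mo>&#x2208;</mo>
  <mo>(</mo><mo>&#x2212;</mo><mn>1</mn><mo>,</mo><mn>0</mn><mo>)</mo>
</math>"""

GROUPED = """<math>
  <mrow><mi>x</mi><mo>&#x2208;</mo>
    <mrow><mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></mrow></mrow>
  <mo>,</mo>
  <mrow><mi>y</mi><mo>&#x2208;</mo>
    <mrow><mo>(</mo><mrow><mo>-</mo><mn>1</mn></mrow><mo>,</mo><mn>0</mn><mo>)</mo></mrow></mrow>
</math>"""


def split_on_commas(xml_source):
    """Split the direct children of <math> at every top-level comma operator."""
    items, current = [], []
    for child in ET.fromstring(xml_source):
        if child.tag == "mo" and (child.text or "").strip() == ",":
            items.append("".join(current))
            current = []
        else:
            # Collect the text of the child and its descendants, minus whitespace.
            current.append("".join("".join(child.itertext()).split()))
    items.append("".join(current))
    return items


print(split_on_commas(FLAT))     # four fragments, none of them a full statement
print(split_on_commas(GROUPED))  # two items, one per formula statement
```

In the flat tree, the commas inside the intervals are indistinguishable from the one separating the two statements, unless the selector also tracks the parentheses.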

Canonically intentional `<mrow>` and 2D wrappers

One avenue for a "free lunch" for accessibility tools is to ask authoring tools to mark up, as best as they can, a parallel operator tree over the presentation layout, via strategic <mrow> wrappers.

For our running example that would be identical to the latexml variant at the top:

<mrow>
    <mo stretchy="false">(</mo>
    <mn>0</mn>
    <mo>,</mo>
    <mn>1</mn>
    <mo stretchy="false">)</mo>
</mrow>

The same wrapping would be applied for every notation introduced by our "Intent Core" (also referred to as "Intent Level 1") list, which is tracked in a working draft spreadsheet. Capturing each horizontal notation in a dedicated <mrow> will allow us to standardize a list of unambiguous selectors, which would enrich matching subtrees with a given intent value, such as "open-interval".

This approach does not on its own answer how to resolve multiple meanings for the exact identical notation (e.g. tuple vs interval), but allows us to avoid some of the contextual ambiguity (e.g. the $f (x, y)$ example will not match the open-interval rule, as the $f$ identifier will be the first child in the <mrow>, instead of the open parenthesis).

As this will put a burden on authoring tools, it is more of a "delegated lunch" than a free lunch.

Possible selector implementations

While mathematicians will have an easy time writing down $(\cdot, \cdot)$ as the template for an interval notation, writing down a specific selector for a presentation MathML tree is a more verbose task, even if we have the preferred mrow structure. For example:

An XPath selector for open-interval, in a tree with canonical mrow structure:

//mrow[
    count(./*)=5 and
    ./*[1][name()="mo" and text()="("] and
    ./*[3][name()="mo" and text()=","] and
    ./*[5][name()="mo" and text()=")"] ]
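For readers who want to experiment without an XPath 1.0 engine, the same predicate can be mirrored in plain Python. The standard library's `xml.etree.ElementTree` only supports a subset of XPath, so this sketch walks the tree by hand instead. The function names are hypothetical, and the sketch assumes elements without a namespace prefix, as in the examples of this note.

```python
import xml.etree.ElementTree as ET


def is_open_interval(element):
    """Mirror the XPath above: an <mrow> with 5 children shaped like ( a , b )."""
    children = list(element)
    if element.tag != "mrow" or len(children) != 5:
        return False

    def is_mo(child, glyph):
        # Unlike the exact XPath comparison, surrounding whitespace is tolerated.
        return child.tag == "mo" and (child.text or "").strip() == glyph

    return (is_mo(children[0], "(")
            and is_mo(children[2], ",")
            and is_mo(children[4], ")"))


def find_open_intervals(xml_source):
    """Return every element in the tree that matches the selector."""
    root = ET.fromstring(xml_source)
    return [el for el in root.iter() if is_open_interval(el)]


INTERVAL = ("<math><mrow><mo>(</mo><mn>0</mn><mo>,</mo>"
            "<mn>1</mn><mo>)</mo></mrow></math>")
FUNCTION = ("<math><mrow><mi>f</mi><mo>(</mo><mi>x</mi><mo>,</mo>"
            "<mi>y</mi><mo>)</mo></mrow></math>")

print(len(find_open_intervals(INTERVAL)))  # the interval <mrow> is found
print(len(find_open_intervals(FUNCTION)))  # f is the first child: no match
```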

An XPath selector for open-interval, adding inferred mrows:

//*[(self::mrow or self::mtd or self::mpadded or self::mstyle or self::msqrt or
     self::math or self::merror or self::menclose or self::mphantom) and
    count(./*)=5 and
    ./*[1][name()="mo" and text()="("] and
    ./*[3][name()="mo" and text()=","] and
    ./*[5][name()="mo" and text()=")"] ]

Alternative approach: An XPath selector for open-interval using only the sibling axis.

//mo[text()="(" and
    following-sibling::*[2][
        self::mo and text()=","] and
    following-sibling::*[4][
        self::mo and text()=")"]]

Requires only well-marked notation arguments, without requirements on the parent/ancestor nodes.
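What "well-marked notation arguments" means in practice shows up when the sibling-only idea is tried on the flat tree of the running example. The Python sketch below is a hand-written equivalent of the sibling-axis selector, not an XPath evaluation; `find_interval_starts`, `text_of` and the condensed `FLAT` string are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

FLAT = (
    "<math><mi>x</mi><mo>&#x2208;</mo>"
    "<mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo><mo>,</mo>"
    "<mi>y</mi><mo>&#x2208;</mo>"
    "<mo>(</mo><mo>&#x2212;</mo><mn>1</mn><mo>,</mo><mn>0</mn><mo>)</mo></math>"
)


def text_of(element):
    """Text content of a token element, without surrounding whitespace."""
    return (element.text or "").strip()


def find_interval_starts(xml_source):
    """List (parent tag, child index) for each '(' that has a ',' two siblings
    later and a ')' four siblings later, regardless of the parent element."""
    hits = []
    for parent in ET.fromstring(xml_source).iter():
        children = list(parent)
        for i, child in enumerate(children):
            if i + 4 >= len(children):
                break  # not enough following siblings left
            comma, close = children[i + 2], children[i + 4]
            if (child.tag == "mo" and text_of(child) == "("
                    and comma.tag == "mo" and text_of(comma) == ","
                    and close.tag == "mo" and text_of(close) == ")"):
                hits.append((parent.tag, i))
    return hits


print(find_interval_starts(FLAT))
```

Only the first interval is found in the flat tree. The second one is missed because its first argument, negative one, is spread over two siblings (an `<mo>` and an `<mn>`), which shifts the comma out of position. Wrapping that argument in its own `<mrow>` makes both intervals match.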

Are there easier ways to serialize and maintain these selectors?

Other free-lunch techniques to explore

- Defaults for 2D elements are sometimes easy (msqrt) and sometimes near-impossible (msup, msub, mover, munder).
- Broader rules, including ancestor elements for context.
- Subject area restrictions, allowing meanings only under a topic keyword. If designed very carefully, this may help to decide, for example:
  - Whether $\pi^{2}$ is the number pi squared, or a twice iterated application of a projection function. Similarly, $x'$ could be the first derivative of x, or the embellished variable name "x-prime".
  - Whether $H_{2}$ is the second element of a vector $H$, or a molecule of hydrogen (two hydrogen atoms).
- Optimism. A greedy depth-first search could attempt to assign notation defaults in a fault-tolerant fashion, quitting at the first sign of unrecognized territory. This depends on battle-hardened implementations, however, and is hard to rely on from a standardization standpoint.
- Pessimism. A fallback to the raw presentation tree, together with reading out the Unicode glyphs, provides a minimally usable baseline. Any subtree where the defaults fail to apply can revert to reading out presentation.
- Realism. Some remediation from authors for complex expressions and non-standard notations should always be possible; the "intent defaults" should not become a lock-in technology.
- Testing against large volumes of material. The working group has access to over a billion openly accessible formulas, including textbooks from all levels, and can make a data-driven estimate of which approaches succeed and which fail, while collecting examples for a long-term regression testbed.
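The "pessimism" item is easy to prototype. In the Python sketch below, a single default (the open-interval rule) is applied wherever it matches, and every other subtree reverts to a glyph-by-glyph readout. The `GLYPH_NAMES` table, the function names and the spoken phrases are hypothetical choices for this example, not a proposed speech style.

```python
import xml.etree.ElementTree as ET

GLYPH_NAMES = {"(": "left parenthesis", ")": "right parenthesis",
               ",": "comma", "\u2208": "element of", "-": "minus"}


def matches_open_interval(element):
    """The single default of this sketch: an <mrow> shaped like ( a , b )."""
    children = list(element)
    texts = [(child.text or "").strip() for child in children]
    return (element.tag == "mrow" and len(children) == 5
            and all(children[i].tag == "mo" for i in (0, 2, 4))
            and (texts[0], texts[2], texts[4]) == ("(", ",", ")"))


def read_out(element):
    """Speak a subtree, using the default where it applies."""
    children = list(element)
    if not children:  # leaf token: use the glyph name if known, else raw text
        text = (element.text or "").strip()
        return GLYPH_NAMES.get(text, text)
    if matches_open_interval(element):
        return (f"the open interval from {read_out(children[1])} "
                f"to {read_out(children[3])}")
    # Fallback: no default applies, read the children in presentation order.
    return " ".join(read_out(child) for child in children)


GROUPED = ET.fromstring(
    "<math><mi>x</mi><mo>&#x2208;</mo>"
    "<mrow><mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></mrow></math>")
FLAT = ET.fromstring(
    "<math><mi>x</mi><mo>&#x2208;</mo>"
    "<mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></math>")

print(read_out(GROUPED))
print(read_out(FLAT))
```

The grouped tree gets the interval reading, while the flat tree degrades to naming each glyph, which is verbose but still usable.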

Remark on operator trees

Q: Are "operator trees" another name for Content MathML?
A: No. They are a partial prerequisite. An operator tree is roughly equivalent to determining the correct skeleton of <apply>-based subexpressions, and only that.

For example, knowing that $a * b$ is an application of an infix operator to two arguments does not tell us nearly enough to construct a full Content MathML tree, but it tells us enough to add a wrapping <mrow> in the presentation tree.

We still need to fully determine the exact content symbols and variable bindings before we can build a Content MathML tree. And we also have to resolve any leftover linguistic phenomena that are only easy to mark up in presentation MathML (such as ellipsis <mi>⋯</mi>), but are difficult to fully formalize.

Remark on manual remediation

Intent attribute

It is worth noting that the explicit syntax for intent annotations allows us to remediate any of the enumerated presentation trees, at the cost of making the markup "coarse grained". To be precise, the best possible annotation would be placed on the "lowest common ancestor (LCA)" of the participating presentation elements.

As an example, here is how the flat presentation tree may be remediated:

<math intent="formulae(element-of($1, open-interval($2, $3)),
                   element-of($4, open-interval($5($6), $7)))">
  <mi arg="1">x</mi>
  <mo>&#x2208;</mo>
  <mo stretchy="false">(</mo>
  <mn arg="2">0</mn>
  <mo>,</mo>
  <mn arg="3">1</mn>
  <mo stretchy="false">)</mo>
  <mo>,</mo>
  <mi arg="4">y</mi>
  <mo>&#x2208;</mo>
  <mo stretchy="false">(</mo>
  <mo arg="5">&#x2212;</mo>
  <mn arg="6">1</mn>
  <mo>,</mo>
  <mn arg="7">0</mn>
  <mo stretchy="false">)</mo>
</math>

Inferring that automatically, however, appears to be beyond the "defaults" of a specification, and closer to embarking on an ambitious parsing project for arbitrary math expressions.
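Consuming such an annotation, by contrast, is a small task. The Python sketch below reads the intent value from the example above, replaces each `$n` reference with the text of the element carrying `arg="n"`, and returns a nested tuple. It assumes the draft syntax exactly as written in this note. The tokenizer, the function names and the tuple output are hypothetical, and there is no error recovery for malformed input.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"\s*(\$\d+|[A-Za-z][A-Za-z0-9-]*|[(),])")


def tokenize(intent):
    """Split an intent value into names, $n references, parentheses and commas."""
    tokens, pos, intent = [], 0, intent.strip()
    while pos < len(intent):
        match = TOKEN.match(intent, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, args):
    """Parse one term, replacing each '$n' by the text of the arg='n' element."""
    head = tokens.pop(0)
    if head.startswith("$"):
        head = args[head[1:]]
    if tokens and tokens[0] == "(":
        tokens.pop(0)
        children = [parse(tokens, args)]
        while tokens[0] == ",":
            tokens.pop(0)
            children.append(parse(tokens, args))
        tokens.pop(0)  # the closing parenthesis
        return (head, *children)
    return head


def resolve_intent(xml_source):
    """Build a nested (head, *args) tuple from the root intent attribute."""
    root = ET.fromstring(xml_source)
    args = {el.get("arg"): (el.text or "").strip()
            for el in root.iter() if el.get("arg") is not None}
    return parse(tokenize(root.get("intent")), args)


REMEDIATED = """<math intent="formulae(element-of($1, open-interval($2, $3)), element-of($4, open-interval($5($6), $7)))">
  <mi arg="1">x</mi><mo>&#x2208;</mo>
  <mo>(</mo><mn arg="2">0</mn><mo>,</mo><mn arg="3">1</mn><mo>)</mo><mo>,</mo>
  <mi arg="4">y</mi><mo>&#x2208;</mo>
  <mo>(</mo><mo arg="5">&#x2212;</mo><mn arg="6">1</mn><mo>,</mo><mn arg="7">0</mn><mo>)</mo>
</math>"""

print(resolve_intent(REMEDIATED))
```

Note that `$5($6)` resolves to the minus sign applied to 1; deciding that this should be spoken as "negative one" remains a separate step.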

MathML 3 alttext, or ARIA aria-label

And here is the same remediation as is already possible with MathML 3, using the alttext attribute, or alternatively the aria-label attribute.

<math alttext="x in the open interval from zero to one
                and y in the open interval from negative one to zero">
  <mi>x</mi>
  <mo>&#x2208;</mo>
  <mo stretchy="false">(</mo>
  <mn>0</mn>
  <mo>,</mo>
  <mn>1</mn>
  <mo stretchy="false">)</mo>
  <mo>,</mo>
  <mi>y</mi>
  <mo>&#x2208;</mo>
  <mo stretchy="false">(</mo>
  <mo>&#x2212;</mo>
  <mn>1</mn>
  <mo>,</mo>
  <mn>0</mn>
  <mo stretchy="false">)</mo>
</math>

The "top-level textual description" approach has some obvious limitations.

- it does not offer any support for incremental internationalization, and
- it is also fragile to context. The alttext above
  - reads well in the sentence "Take <math>...</math>.",
  - but reads badly in "This is the case when <math>...</math>."
  - In the second case one would prefer the reading "x is in the open-interval…", with an added verb which fits better with the outer narration.
