A: When the formalization of an expression requires a complete departure from its presentation.
In other words, whenever the author uses notational shorthands that serve to compress a non-trivial amount of conceptual structure.
A content tree that is formally useful ought to disentangle and . It should also recognize that if the right-hand side (RHS) were , those assignments would be flipped.
Current tools, such as latexml, produce Content MathML that does not attempt this level of inference. Instead, they mark up an artificial "ambiguous superscript" symbol applied to "x" and "a list 1 2", while also treating "plus-minus" as a single content symbol. In essence, for cases beyond the notations explicitly handled in its MathGrammar, latexml provides a near-direct translation of the layout tree into Content syntax.
It is unclear whether both content trees are actually acceptable approaches to using Content MathML and, if they are, how many alternative approaches there might be. This ambiguity leads to vendor-specific dialects and strictly harms interoperability.
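The difference between the two trees can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple encoding, the function name expand_plus_minus, the node labels, and the operand names "a" and "b" are all hypothetical, and none of them is latexml output or a MathML API.

```python
# Hypothetical tuple-based trees; labels are illustrative, not real MathML symbols.

# Layout-near tree: one "ambiguous superscript" node and one "plus-minus" node,
# mirroring the presentation without resolving what it abbreviates.
LAYOUT_NEAR = (
    "eq",
    ("ambiguous-superscript", "x", ("list", 1, 2)),
    ("plus-minus", "a", "b"),  # "a" and "b" are assumed operand names
)


def expand_plus_minus(base, indices, left, right, flipped=False):
    """Expand the shorthand into one explicit equation per index.

    With flipped=False the first index takes "plus" and the second "minus";
    flipped=True models a minus-plus sign on the RHS and swaps the pairing.
    """
    if len(indices) != 2:
        raise ValueError("the shorthand pairs exactly two indices with two signs")
    first_op, second_op = ("minus", "plus") if flipped else ("plus", "minus")
    first_index, second_index = indices
    return [
        ("eq", ("indexed", base, first_index), (first_op, left, right)),
        ("eq", ("indexed", base, second_index), (second_op, left, right)),
    ]


# Formalized tree: two equations with the index-to-sign pairing made explicit.
FORMALIZED = expand_plus_minus("x", (1, 2), "a", "b")
```

The formalized result has two root equations where the layout-near tree has one, so no node-for-node correspondence with the presentation survives the expansion.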
Take the construction of a bidiagonal Toeplitz matrix, as seen in arXiv:15112.06076:
A formalized Content MathML tree that is reusable by a CAS ought to provide a (system of) equation(s) that determines the structure of the matrix w.r.t. the variables and .
Here again present-day tools, such as latexml, provide a near-verbatim translation of the presentation/layout tree, using a matrixrow element to hold the content of each row and depositing placeholder csymbol nodes in the places where the presentation had ellipses.
The hypothetical CAS-interchange tree would have to depart from the natural layout, and so would be impossible to use for fine-grained parallel annotations. The converse also holds: the layout-near Content tree generated by latexml would not be directly usable for CAS interchange, unless the CAS were already at a stage where it could formalize the presentation tree itself.
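To make the contrast concrete, here is a minimal Python sketch of the kind of entry rule a CAS-oriented tree would need to encode for a bidiagonal Toeplitz matrix. The function name bidiagonal_toeplitz, the parameter names d and s, and the choice of an upper (rather than lower) bidiagonal form are assumptions for illustration; they are not taken from the cited paper.

```python
def bidiagonal_toeplitz(n, d, s):
    """Build an n x n upper-bidiagonal Toeplitz matrix as nested lists.

    Assumed entry rule (0-based indices):
        T[i][j] = d  if j == i        (main diagonal)
        T[i][j] = s  if j == i + 1    (first superdiagonal)
        T[i][j] = 0  otherwise
    """
    if n < 1:
        raise ValueError("matrix size must be at least 1")
    return [
        [d if j == i else s if j == i + 1 else 0 for j in range(n)]
        for i in range(n)
    ]


# Symbolic placeholders work as well as numbers, since entries are only copied.
SMALL = bidiagonal_toeplitz(3, "a", "b")
```

The rule consists of three cases regardless of n, whereas the layout with ellipses shows a handful of rows and leaves both the size and the pattern for the reader to infer. Neither representation can be mapped node-for-node onto the other.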
Quite often we see authors intersperse fragments of inline math, each individually malformed as syntax, within a single well-formed sentence.
For example, from arXiv:2105.04026:
"For , we denote by the set For two functions , we write , if there exists a universal constant such that for all "
The Content MathML representations of most individual formula fragments here are beside the point for the communicative purpose of the sentence. The author wants to relay to the reader the new notations that will be used throughout the text, such as and .
Source: Wikipedia, contour integration
With an associated higher-order form in the same source text:
There are non-trivial choices to be made in building a Content MathML tree for these natural syntactic conveniences, depending largely on who the Content MathML consumer is.
There are various other cases where the CAS-near formalization of natural language mathematics is not near the written syntax. This list may grow to include them, in an attempt to make this distinction as clear as possible.
The main claim I want to make with these examples is that communicating a math expression to a human reader is a different task from formalizing that expression for symbolic manipulation.