# When is Content MathML not natural?
**A**: When the formalization of an expression requires a complete departure from its presentation.
In other words, whenever the author uses notational shorthands that serve to compress a non-trivial amount of conceptual structure.
## Examples
### 1. Brief introduction of variables
$$ x_{1,2} = \pm 1 $$
A content tree that is formally useful ought to disentangle $x_1 = 1$ and $x_2 = -1$, and recognize that if the right-hand side (RHS) were $\mp 1$, those assignments would be flipped.
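The disentangling step can be sketched in a few lines of Python. This is a minimal illustration, not an existing tool: the function names `expand_plus_minus` and `to_content_mathml` are hypothetical, and wrapping the indexed variable in a `ci` element that holds presentation markup is only one of several possible encodings.

```python
import xml.etree.ElementTree as ET


def expand_plus_minus(indices, magnitude, sign="pm"):
    """Expand x_{i,j} = ±m (or ∓m) into explicit (index, value) pairs."""
    if len(indices) != 2:
        raise ValueError("this sketch only handles exactly two indices")
    if sign not in ("pm", "mp"):
        raise ValueError("sign must be 'pm' or 'mp'")
    # With ± the first index takes the positive value; with ∓ the order flips.
    if sign == "pm":
        first, second = magnitude, -magnitude
    else:
        first, second = -magnitude, magnitude
    return [(indices[0], first), (indices[1], second)]


def to_content_mathml(assignments, base="x"):
    """Serialize the assignments as a conjunction of Content MathML equations."""
    conjunction = ET.Element("apply")
    ET.SubElement(conjunction, "and")
    for index, value in assignments:
        equation = ET.SubElement(conjunction, "apply")
        ET.SubElement(equation, "eq")
        # Assumption: the indexed variable is a <ci> wrapping presentation markup.
        ci = ET.SubElement(equation, "ci")
        msub = ET.SubElement(ci, "msub")
        ET.SubElement(msub, "mi").text = base
        ET.SubElement(msub, "mn").text = str(index)
        ET.SubElement(equation, "cn").text = str(value)
    return ET.tostring(conjunction, encoding="unicode")


print(to_content_mathml(expand_plus_minus((1, 2), 1)))
```

Calling `expand_plus_minus((1, 2), 1, sign="mp")` swaps the two values, which is exactly the inference a layout-near translation never has to perform.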
**Current tools**, such as latexml, produce Content MathML that does not attempt this level of inference. Instead, they mark up an artificial "ambiguous subscript" symbol applied to "x" and "a list 1 2", while also treating "plus-minus" as a single content symbol. In essence, latexml provides a near-direct translation of the layout tree into Content syntax for any case beyond the notations explicitly handled in its `MathGrammar`.
It is unclear whether both content trees are acceptable uses of Content MathML and, if they are, how many alternative approaches there might be. This ambiguity leads to vendor-specific dialects and harms interoperability.
### 2. Advanced ellipsis
Take the construction of a bidiagonal $N \times N$ Toeplitz matrix, as seen in [arXiv:1512.06076](https://arxiv.org/pdf/1512.06076.pdf):
$$ P_{\mathrm{I}}=\left(\begin{array}{cccccc}
0 & a & 0 & . . & . . & 0 \\
b & 0 & a & . . & . . & 0 \\
0 & b & 0 & . . & . . & 0 \\
. . & . . & . . & . . & . . & . . \\
0 & . . & . . & . . & 0 & a \\
0 & 0 & . . & . . & b & 0
\end{array}\right) $$
A formalized Content MathML tree that is reusable by a CAS ought to provide a (system of) equation(s) that determines the structure of the matrix w.r.t. the variables $a$ and $b$.
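As one hedged illustration, assuming the dots continue the visible pattern ($a$ on the superdiagonal, $b$ on the subdiagonal, zeros elsewhere), such a defining equation could read as follows. The Kronecker delta $\delta$ and the indices $i, j$ are notation introduced here, not taken from the source paper:

$$ (P_{\mathrm{I}})_{ij} = a\,\delta_{i+1,j} + b\,\delta_{i,j+1}, \qquad 1 \leq i, j \leq N $$

A content tree encoding this equation has no one-to-one correspondence with the rows and cells of the array above.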
Here again present-day tools, such as latexml, provide a near-verbatim translation of the presentation/layout tree, using a `matrixrow` element to hold the content of each row and depositing placeholder `csymbol` nodes in the places where the presentation had ellipses.
The hypothetical CAS-interchange tree would have to depart from the natural layout, and would therefore be impossible to use for fine-grained parallel annotations. The converse also holds: the layout-near Content tree generated by latexml would not be directly usable for CAS interchange, unless the CAS were already at a stage where it could formalize the presentation tree itself.
### 3. The math expression has a syntactic purpose
Quite often we see authors intersperse fragments of inline math that are individually malformed syntax inside a single well-formed sentence.
Say from [arXiv:2105.04026](https://arxiv.org/pdf/2105.04026.pdf):
"For $N \in \mathbb{N}$, we denote by $[N]$ the set $\{1, \ldots, N\} .$ For two functions $f, g: \mathcal{X} \rightarrow[0, \infty)$, we write $f \lesssim g$, if there exists a universal constant $c$ such that $f(x) \leq c g(x)$ for all $x \in \mathcal{X} .$ "
The Content MathML representations of most individual formula fragments here are beside the point for the communicative purpose of the sentence. The author wants to relay to their reader the new notations which will be used throughout the text, such as $[N]$ and $f \lesssim g$.
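For illustration, a sentence-level formalization of the two definitions might read as follows. This is only a sketch, and the definitional symbols $:=$ and $:\Leftrightarrow$ are assumptions, not notation from the quoted paper:

$$ [N] := \{1, \ldots, N\}, \qquad f \lesssim g \;:\Leftrightarrow\; \exists c \;\forall x \in \mathcal{X}:\; f(x) \leq c\, g(x) $$

Neither statement can be assembled from the content trees of fragments such as $[N]$ or $f, g: \mathcal{X} \rightarrow[0, \infty)$ taken in isolation; the connecting structure lives in the surrounding prose.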
### 4. Eliding arguments
Source: [Wikipedia, contour integration](https://en.wikipedia.org/wiki/Contour_integration#Example_4_–_branch_cuts)
$$ {\displaystyle \int _{C}=\int _{\varepsilon }^{R}+\int _{\Gamma }+\int _{R}^{\varepsilon }+\int _{\gamma }.} $$
With an associated higher-order form in the same source text:
$$ {\displaystyle {\begin{aligned}\left(\int _{R}+\int _{M}+\int _{N}+\int _{r}\right)f(z)\,dz&= \ldots \end{aligned}}} $$
There are non-trivial choices to be made in choosing how to build *a* Content MathML tree for these natural syntactical conveniences, largely depending on who the Content MathML **consumer** would be.
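As a sketch of one such choice, assume the elided integrand of the first formula is the $f(z)\,dz$ that appears in the second. A CAS-oriented consumer would then likely want the arguments restored:

$$ \int_{C} f(z)\,dz = \int_{\varepsilon}^{R} f(z)\,dz + \int_{\Gamma} f(z)\,dz + \int_{R}^{\varepsilon} f(z)\,dz + \int_{\gamma} f(z)\,dz $$

A layout-near consumer could instead keep each bare integral sign as a standalone operator symbol, which stays close to what the author wrote but leaves the restoration of the integrand to the reader.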
### Others
There are various other cases where the CAS-near formalization of natural language mathematics is not near the written syntax. This list may grow to include them, in an attempt to make this distinction as clear as possible.
# Conclusion
The main point I want to make with these examples is that **communicating** a math expression to a human reader is a different task from **formalizing** that expression for symbolic manipulation.