# Universal AST in the MyST Document Engine
> [!Caution]
> This has migrated to the JB team notes: https://hackmd.io/@jupyterbook/SkeK9b9wbg
---
> [!Note] TLDR
> 1. `xref`s are a core feature of MyST
> 2. MyST Sites are the source of truth for resolving `xref`s
> 3. The MyST Site build logic is not explicitly designed to avoid accidentally breaking PDF builds in the future.
> 4. We should resolve this.
At a high level, this document proposes that **we formally recognise the status quo** rather than build anything new or change how MyST works.
> [!Tip] Goals for Readers
> - To clarify unclear wording.
> - To annotate the document with questions.
## Context
The MyST Document Engine supports a variety of different export formats. These include:
- Typst `--typst`
- LaTeX `--tex`
- Structured Site Data `--site`
- Static HTML `--html`
In order to target these export types, the MyST Engine builds a project in three phases:
1. General Transforms (referencing, processing, execution, etc.)
2. Target Specialisation (preferred image processing, etc.)
3. Export Rendering (HTML output, LaTeX compilation, etc.)
All exports share the output of (1) and diverge at (2): every specialisation receives the same input but produces a different output.[^output]
Most of the transforms applied to a MyST Document are build-target-agnostic general transforms from phase (1), such as referencing and execution processing:
```mermaid
graph LR
coreA@{ shape: lean-r, label: "Project A" } --> generalA[General Transforms A] --> specialWebA[Specialisation Web A] --> exportWebA[Export Web A]--> webA@{ shape: lean-r, label: "Site A" }
generalA --> specialPDFA[Specialisation PDF A] --> exportPDFA[Export PDF A]--> pdfA@{ shape: lean-r, label: "PDF A" }
```
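The three phases above can be sketched in a few lines of Python. This is only an illustration of the shape of the pipeline: the dict-based AST and the function names `general_transforms`, `specialise`, and `render` are hypothetical and are not part of the MyST Engine's API.

```python
from copy import deepcopy


def general_transforms(ast):
    """Phase 1: target-agnostic work (here: enumerate the headings)."""
    ast = deepcopy(ast)
    count = 0
    for node in ast["children"]:
        if node["type"] == "heading":
            count += 1
            node["enumerator"] = str(count)
    return ast


def specialise(ast, target):
    """Phase 2: target-specific changes (here: just tag the tree)."""
    ast = deepcopy(ast)
    ast["target"] = target
    return ast


def render(ast):
    """Phase 3: produce the final artefact (here: a trivial string)."""
    headings = [n for n in ast["children"] if n["type"] == "heading"]
    return f"{ast['target']}: {len(headings)} heading(s)"


def build(project_ast, targets):
    shared = general_transforms(project_ast)  # runs once for all targets
    return {t: render(specialise(shared, t)) for t in targets}


project = {
    "type": "root",
    "children": [
        {"type": "heading", "value": "Intro"},
        {"type": "paragraph", "value": "Hello"},
    ],
}
outputs = build(project, ["site", "typst"])
```

Every target receives the same `shared` tree; only `specialise` and `render` vary per target.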
[^output]: Note that we're _not_ talking about _export rendering_ here — this is _before_ the actual LaTeX / Typst rendering and compilation takes place.
However, some specialisations of the AST must be performed per export target, as in (2):
1. Preferred image formats (web-friendly formats like `webp` vs static-build formats like `pdf`).
2. Alternative representation of web-only outputs (like widgets), possibly hand-authored. See [the MyST Guide](https://mystmd.org/guide/figures#use-an-image-in-place-of-a-video-for-static-exports) for an example of selecting videos vs PNGs for web vs static export.
3. Content inclusion/exclusion for different audiences, e.g. instructor vs student views.
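As a rough illustration of the first kind of specialisation, the sketch below picks an image per target from a set of candidates. The preference lists and the `pick_image` helper are assumptions made for this example; the real engine's ordering and logic may differ.

```python
# Hypothetical preference orders, most preferred first.
PREFERRED_FORMATS = {
    "site": [".webp", ".svg", ".png", ".pdf"],
    "pdf": [".pdf", ".svg", ".png"],
}


def pick_image(candidates, target):
    """Return the candidate whose extension ranks highest for this target."""
    for ext in PREFERRED_FORMATS[target]:
        for path in candidates:
            if path.lower().endswith(ext):
                return path
    return None  # no candidate is usable for this target


candidates = ["figure.pdf", "figure.webp", "figure.png"]
site_choice = pick_image(candidates, "site")
pdf_choice = pick_image(candidates, "pdf")
```

With these assumed lists, the site build selects `figure.webp` while the PDF build selects `figure.pdf`, even though both start from the same candidates.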
For single-use ASTs (that are built and consumed once), these distinctions are mostly semantic: whether we build and then specialise, or do both in one pass, is largely irrelevant. However, we must now turn to another MyST feature: cross-references (xrefs).
One of the core features of the MyST Document Engine is the xref, which facilitates embedding and previewing external content. Xrefs are built upon three concepts:
1. The URI for identifying and locating external content e.g. <https://mystmd.org/guide>.
2. The MyST AST Spec that defines a schema for the external content to conform to (`<NAME>.json`), e.g. <https://mystmd.org/guide/frontmatter.json>.
3. A well-defined schema for discovering labeled content (`myst.xref.json`), e.g. <https://mystmd.org/guide/myst.xref.json>.
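To make the relationship between these three pieces concrete, here is a minimal sketch of resolving a label to the URL of its page data. The index is an inline stand-in for a fetched `myst.xref.json`; the field names used here (`references`, `identifier`, `kind`, `data`, `url`) are my assumptions about its layout, and `find_reference` and `data_url` are hypothetical helpers rather than engine functions.

```python
def find_reference(xref_index, identifier):
    """Look up a labelled target in an xref index (assumed layout)."""
    for ref in xref_index["references"]:
        if ref["identifier"] == identifier:
            return ref
    return None


def data_url(base_url, ref):
    """Build the URL of the page's AST, assuming `data` is site-relative."""
    return base_url.rstrip("/") + ref["data"]


# Stand-in for the contents of https://mystmd.org/guide/myst.xref.json
index = {
    "references": [
        {
            "identifier": "frontmatter",
            "kind": "page",
            "data": "/frontmatter.json",
            "url": "/frontmatter",
        },
    ],
}

ref = find_reference(index, "frontmatter")
url = data_url("https://mystmd.org/guide", ref)
```

Under these assumptions, `url` is the `<NAME>.json` address from concept (2), which is where the consuming build would fetch the AST from.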
Crucially, the content published to these URIs is a MyST _site_ build. As we've seen above, however, site builds are currently thought of as one specialisation of the MyST AST; nothing in their design makes it explicit that other specialisations, such as PDF builds, are expected to consume them. Whilst this mostly just works, there are some corner cases: web-first image formats are chosen, for example.[^mime] Furthermore, in future we might wish to provide author-controlled specialisation of content, e.g. `only:typst`, that requires pruning by the export engine. If a site build prunes Typst-only content from the AST, that content will not be available to Typst builds that consume it via an xref. Presently, the `raw:typst` directive _does_ persist in site builds, but the reason for this is not explicitly documented.
[^mime]: E.g. `.pdf` images are rasterised (lossy) to `.webp`.
## Proposal
Given that xrefs are a core feature of the MyST Engine, and that their construction atop the site build infrastructure establishes site builds as a source of truth, we have two options for addressing the uncomfortable, fragile status quo:
1. Treat xrefs as a site-first feature (such that static builds may have poor behaviour when using xrefs), i.e. the status quo:
```mermaid
graph LR
coreA@{ shape: lean-r, label: "Project A" } --> generalA[General Transforms A] --> specialWebA[Specialisation Web A] --> exportWebA[Export Web A]--> webA@{ shape: lean-r, label: "Site A" }
generalA --> specialPDFA[Specialisation PDF A] --> exportPDFA[Export PDF A]--> pdfA@{ shape: lean-r, label: "PDF A" }
coreB@{ shape: lean-r, label: "Project B" } --> generalB[General Transforms B] --> specialWebB[Specialisation Web B] --> exportWebB[Export Web B]--> webB@{ shape: lean-r, label: "Site B" }
generalB --> specialPDFB[Specialisation PDF B] --> exportPDFB[Export PDF B]--> pdfB@{ shape: lean-r, label: "PDF B" }
webA -.xref.-> generalB;
linkStyle 14 stroke:red;
```
2. Re-think the web specialisation as a _generalised_ specialisation, effectively treating site builds as the source of truth:
```mermaid
graph LR;
coreA@{ shape: lean-r, label: "Project A" } --> generalA[General Transforms A] --> specialWebA[Specialisation Web A] --> exportWebA[Export Web A]--> webA@{ shape: lean-r, label: "Site A" }
specialWebA --> specialPDFA[Specialisation PDF A] --> exportPDFA[Export PDF A]--> pdfA@{ shape: lean-r, label: "PDF A" }
coreB@{ shape: lean-r, label: "Project B" } --> generalB[General Transforms B] --> specialWebB[Specialisation Web B] --> exportWebB[Export Web B]--> webB@{ shape: lean-r, label: "Site B" }
specialWebB --> specialPDFB[Specialisation PDF B] --> exportPDFB[Export PDF B]--> pdfB@{ shape: lean-r, label: "PDF B" }
webA -.xref.-> generalB;
linkStyle 14 stroke:red;
```
If we consider the "Export Web A" step to be a transform-free serialisation step, (2) becomes a linear path rather than a branching graph. Practically, this means that the site export should not drop anything that is useful to PDF builds: it should be the _universal_ source of truth. I am introducing the term "Universal AST" (uAST) to describe this concept, which we've been passing around for a while in the context of the MyST Engine.
I therefore suggest that we re-frame the site build as a uAST build (whether or not that means adopting new language) that serves as the source of truth for other builds.
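The linear path in (2) can be sketched as follows. The example assumes the site export is a plain JSON serialisation of the uAST, and uses a made-up `widget` node with a `static_fallback` field; `export_site` and `specialise_pdf` are hypothetical names, not engine functions.

```python
import json


def export_site(uast):
    """Transform-free serialisation: the published JSON is the uAST itself."""
    return json.dumps(uast)


def specialise_pdf(uast):
    """Hypothetical PDF specialisation: swap widgets for a static fallback."""
    children = [n.get("static_fallback", n) for n in uast["children"]]
    return {**uast, "children": children}


uast = {
    "type": "root",
    "children": [
        {"type": "paragraph", "value": "Hello"},
        {"type": "widget", "static_fallback": {"type": "image", "url": "plot.png"}},
    ],
}

published = export_site(uast)   # what a site build would upload
remote = json.loads(published)  # what another project would fetch via an xref

local_pdf = specialise_pdf(uast)
remote_pdf = specialise_pdf(remote)
```

Because nothing is dropped before serialisation, a PDF build that starts from the published content produces the same result as one that starts from the local build.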
<!--
### Example A — Widget Outputs
MyST Documents can include rich outputs from kernels. For each output from a cell, the engine encounters a MIME bundle — a collection of rich data associated with well-defined content type keys, e.g. `image/png`. Rendering these MIME bundles into MyST AST is a one-to-many step; for a bundle containing:
- `application/vnd.jupyter.widget-view+json`
- `image/png`
Web exports of MyST can (and prefer to) render `application/vnd.jupyter.widget-view+json` (potentially behind an interaction wrapper), whilst static exports will usually prefer `image/png`. Whilst sometimes we might want the author to control this explicitly — by using something like the `placeholder` system in MyST (or an extension thereof), we should support this out of the box.
-->
## Future Work
### Extensibility of Exporters
> [!Important]
> See https://hackmd.io/h8NTuPFBQImhCKKELOhx2g for a proposal on using the uAST mechanism to generalise and factor out exporting.
### Specialising Content for Export
With a web-published source of truth for all exports, we could add support for author-controlled markup overrides via, e.g., an `only` directive or similar mechanism:
```markdown
:::{only:typst}
I LOVE **Typst**. LaTeX _SUCKS_!
:::
:::{only:latex}
I LOVE **LaTeX**. Typst _SUCKS_!
:::
:::{only:html}
**EVERYTHING** SUCKS. `XML` WAS A MISSED OPPORTUNITY!
:::
```
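One way such a mechanism could respect the uAST idea is to keep every `only` block in the published AST and let each exporter prune the blocks meant for other targets. The sketch below assumes a hypothetical `only` node type carrying a `target` field; neither exists in the MyST AST Spec today.

```python
def prune_for_target(node, target):
    """Return a copy of `node` without `only` blocks for other targets."""
    if "children" not in node:
        return dict(node)
    kept = [
        prune_for_target(child, target)
        for child in node["children"]
        if child.get("type") != "only" or child.get("target") == target
    ]
    return {**node, "children": kept}


def text(value):
    return {"type": "text", "value": value}


# The uAST keeps every variant, so any exporter can select its own.
uast = {
    "type": "root",
    "children": [
        {"type": "only", "target": "typst", "children": [text("Typst note")]},
        {"type": "only", "target": "latex", "children": [text("LaTeX note")]},
        {"type": "only", "target": "html", "children": [text("HTML note")]},
        {"type": "paragraph", "children": [text("Shared content")]},
    ],
}

typst_ast = prune_for_target(uast, "typst")
```

Pruning happens in the exporter rather than in the site build, so a Typst build that consumes this content via an xref still sees the Typst-only block.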
## Related Issues
- https://github.com/jupyter-book/mystmd/issues/2615
- https://github.com/jupyter-book/mystmd/issues/1833
- https://github.com/jupyter-book/mystmd/pull/1744
- https://github.com/jupyter-book/mystmd/issues/1759