---
title: "unified, remark and rehype"
path: "unified remark and rehype"
---
{%hackmd @RintarouTW/About %}

# unified, remark and rehype

## unified process model

<center><i>Unified Process Model</i></center>

<center>

```graphviz
digraph {
  graph [bgcolor=transparent;fontcolor="#888888";color="#888888"];
  node [fontcolor="#888888";color="#888888"];
  edge [fontcolor="#888888";color="#888800"];
  label="process"
  rankdir=LR
  Input -> Parser
  edge [label=parse]
  Parser -> "Syntax Tree"
  edge [label=""]
  "Syntax Tree" -> Compiler
  edge [label="stringify"]
  Compiler -> Output
  edge [label="run"]
  Transformer -> "Syntax Tree"
  subgraph cluster_plugin {
    label="plugin"
    Parser
    Transformer
    Compiler
  }
}
```

</center>

### Node Model

Syntactic units in unist syntax trees are called nodes, and implement the Node interface.

Node

```
interface Node {
  type: string
  data: Data?
  position: Position?
}
```

The `type` field is a non-empty string representing the variant of a node. This field can be used to determine the type a node implements.

The `data` field represents information from the ecosystem. The value of the `data` field implements the Data interface.

The `position` field represents the location of a node in a source document. The value of the `position` field implements the Position interface. The `position` field must not be present if a node is generated.

```
interface Position {
  start: Point
  end: Point
  indent: [number >= 1]?
}
```

Position represents the location of a node in a source file.

The `start` field of Position represents the place of the first character of the parsed source region. The `end` field of Position represents the place of the first character after the parsed source region, whether it exists or not. The value of the `start` and `end` fields implement the Point interface.

The `indent` field of Position represents the start column at each index (plus start line) in the source region, for elements that span multiple lines.

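
As a rough illustration of the Node and Position shapes above, here is a minimal TypeScript sketch. The `Point` shape is inlined so the block is self-contained, `indent` is left out for brevity, and `isGenerated` and `sourceOf` are hypothetical helper names, not part of unist or unified.

```typescript
// Hypothetical TypeScript rendering of the unist interfaces described here.
interface Point {
  line: number;    // 1-indexed
  column: number;  // 1-indexed
  offset?: number; // 0-indexed, optional
}

interface Position {
  start: Point; // first character of the parsed source region
  end: Point;   // first character after the parsed source region
}

interface Node {
  type: string;
  data?: Record<string, unknown>;
  position?: Position;
}

// A generated node carries no positional information.
function isGenerated(node: Node): boolean {
  return node.position === undefined;
}

// Slice the text a node covers out of the source, when both offsets are known.
function sourceOf(node: Node, source: string): string | undefined {
  const start = node.position?.start.offset;
  const end = node.position?.end.offset;
  if (start === undefined || end === undefined) return undefined;
  return source.slice(start, end);
}

const source = "# Hello\n";

// A node that was parsed from the source: it has a position.
const heading: Node = {
  type: "heading",
  position: {
    start: { line: 1, column: 1, offset: 0 },
    end: { line: 1, column: 8, offset: 7 },
  },
};

// A node injected by a transformer: no position, so it is "generated".
const injected: Node = { type: "text" };

console.log(sourceOf(heading, source)); // "# Hello"
console.log(isGenerated(injected));     // true
```

Because `end` points at the first character after the region, `source.slice(start, end)` returns exactly the text the node covers.
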
If the syntactic unit represented by a node is not present in the source file at the time of parsing, the node is said to be generated and it must not have positional information.

```
interface Point {
  line: number >= 1
  column: number >= 1
  offset: number >= 0?
}
```

Point represents one place in a source file.

The `line` field (1-indexed integer) represents a line in a source file. The `column` field (1-indexed integer) represents a column in a source file. The `offset` field (0-indexed integer) represents a character in a source file.

The term character means a (UTF-16) code unit which is defined in the Web IDL specification.

<center>

```graphviz
digraph {
  graph [bgcolor=transparent;fontcolor="#888888";color="#888888"];
  node [fontcolor="#888888";color="#888888"];
  edge [fontcolor="#888888";color="#888800"];
  rankdir=TB
  subgraph cluster_source_code {
    label="Source Code"
    "Stream"
  }
  subgraph cluster_node {
    label="Node"
    subgraph cluster_Type {
      label="Type"
      type
    }
    subgraph cluster_Data {
      label="Data"
      value -> string, number, object, array, boolean, null
    }
    subgraph cluster_Position {
      label="Position"
      rankdir=TB
      start -> "Stream"
      end -> "Stream"
      indent
    }
  }
}
```

</center>

### Ecosystems

- `remark`: Markdown
- `rehype`: HTML
- `retext`: Natural language
- `redot`: Graphviz

### Syntax Trees

<center>

```graphviz
digraph {
  graph [bgcolor=transparent;fontcolor="#888888";color="#888888"];
  node [fontcolor="#888888";color="#888888"];
  edge [fontcolor="#888888";color="#888800"];
  label="Syntax Tree (Depth First Tree Traversal)"
  edge[label=" 1"]
  "Root Node" -> "1st Child Node"
  edge[label=" 2"]
  "1st Child Node" -> "1st Grand Child Nodes"
  edge[label=" 3"]
  "1st Child Node" -> "2nd Grand Child Nodes"
  edge[label=" 4 ... n"]
  "2nd Grand Child Nodes" -> "..."
  edge[label=" n+1"]
  "Root Node" -> "2nd Child Node"
  edge[label=" n+2"]
  "Root Node" -> "3rd Child Node"
  edge[label=" n+3 ... n+p"]
  "3rd Child Node" -> "3rd Child Node's Child Nodes..."
  edge[label=" n+p+1 ... n+p+k"]
  "Root Node" -> "... Child Nodes"
}
```

</center>

Syntax trees are representations of source code or even natural language. These trees are abstractions that make it possible to analyze, transform, and generate code.

- concrete syntax trees: structures that represent every detail (such as white-space in white-space insensitive languages)
- abstract syntax trees: structures that only represent details relating to the syntactic structure of code (such as ignoring whether a double or single quote was used in languages that support both, such as JavaScript).

Syntax tree formats:

- `unist`: Universal Syntax Tree
- `mdast`: Markdown Abstract Syntax Tree format
- `hast`: HTML Abstract Syntax Tree format
- `xast`: XML Abstract Syntax Tree format
- `nlcst`: Natural Language Concrete Syntax Tree format

Glossary: https://github.com/syntax-tree/unist#glossary

### unified plugin

- https://github.com/unifiedjs/unified#plugin
- https://unifiedjs.com/learn/guide/create-a-plugin/

## remark

Parses Markdown (`.md`) into `mdast` (Markdown abstract syntax tree).

### remark plugin

Creating plugins: https://github.com/remarkjs/remark/blob/master/doc/plugins.md#creating-plugins

Examples: remark-math, remark-image, remark-iframe, remark-emoji, remark-containers, etc.

List of plugins: https://github.com/remarkjs/remark/blob/master/doc/plugins.md#list-of-plugins

## rehype

Works on `hast` (HTML abstract syntax tree). An `mdast` tree is first converted to `hast` (for example with `remark-rehype`); rehype plugins then act as transformers on the `hast` tree.

### rehype plugin

## mdx

Markdown + JSX

### `mdxast` & `mdxhast`

Server build only, no runtime.

### mdx plugin

https://mdxjs.com/guides/writing-a-plugin

### mdx math blocks

https://mdxjs.com/guides/math-blocks

###### tags: `unified` `unist` `remark` `rehype` `mdx` `node`