JEP: official support for Markdown-based notebooks
Summary
This JEP proposes an alternative Markdown-based serialization syntax for Jupyter notebooks, with file extension `.nb.md`, to be adopted as an official standard by the Jupyter community, and describes steps to make it supported by most tools in the ecosystem.
It is meant as one of several steps towards offering flexibility in how to represent notebooks to simultaneously:
`.ipynb` files;
Motivation
The Jupyter notebook format is currently defined by a data structure, a serialization syntax (JSON), and a syntax for rich text cells (some variant of Markdown). This format has tremendously supported the community by providing a lingua franca for exchanging computational narratives. Yet over the years, the community has repeatedly expressed the need for:
Meanwhile, there is a long track record of using text-based notebooks, both outside the Jupyter ecosystem (narrative-centric: R Markdown, org-mode, and others; code-centric: MATLAB, Visual Studio Code, Spyder, PyCharm and DataSpell), and within the Jupyter ecosystem, notably with Jupytext and Jupyter Book. The wide adoption of such solutions highlights their suitability in many use cases.
Though the existing text-based formats go a long way toward supporting the needs of the community, they share a significant pain point: the inability to represent outputs and attachments.
Other pain points are:
`.ipynb` format;
Guide-level explanation
This JEP provides a standard syntax for representing a Jupyter notebook as a Markdown file. We call such a file a Markdown Jupyter notebook. Here is a minimal Markdown Jupyter notebook that could typically be authored manually:
Note that this file contains only the minimal information required to reconstruct a valid notebook. In particular, there are no cell ids, outputs, or execution counts.
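To illustrate what "reconstructing a valid notebook" from minimal information could involve, here is a small Python sketch that fills in the omitted fields of a cell. It is only an illustration: the helper name `complete_cell` is hypothetical, and the field names and defaults (`id`, `outputs`, `execution_count`, `metadata`) are assumed to follow the existing `.ipynb` cell structure rather than anything specified by this JEP.

```python
import uuid


def complete_cell(cell):
    """Return a copy of a minimal cell dict with the omitted fields filled in.

    Hypothetical helper: the defaults mirror the fields found in .ipynb cells.
    """
    full = {"metadata": {}, **cell}
    # Generate a short random id when the author did not provide one.
    full.setdefault("id", uuid.uuid4().hex[:8])
    if full["cell_type"] == "code":
        # A code cell that was never executed has no outputs and no count.
        full.setdefault("outputs", [])
        full.setdefault("execution_count", None)
    return full


minimal_code = {"cell_type": "code", "source": "print('hello')"}
minimal_text = {"cell_type": "markdown", "source": "# Title"}
code_cell = complete_cell(minimal_code)
text_cell = complete_cell(minimal_text)
```

Values that are already present, such as an explicit `id`, are left untouched, so the same helper could be applied to both hand-written and tool-generated files.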
Here is a Markdown Jupyter notebook containing a lossless representation of a full-featured Jupyter notebook with (cell) metadata, outputs, attachments, etc. As this example is long, it has been posted in this example repository along with the accompanying `.ipynb` file.
Reference-level explanation
Design goals
The proposed syntax was designed to satisfy the following requirements:
In addition, the following are good to have:
This section describes the proposed syntax for serializing Jupyter notebooks in Markdown. Then, we detail the steps needed for this syntax to be supported by most tools in the Jupyter ecosystem.
Serialization syntax description
Top-level structure
A Markdown Jupyter notebook consists of an optional metadata header followed by Markdown representing a sequence of text cells, code cells, outputs, raw cells, etc.
Metadata header
The notebook metadata is represented by a YAML 1.2.2 header at the top of the document, surrounded by `---` delimiters:
The metadata structure mirrors that of the Jupyter Notebook format.
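As a rough sketch of how a reader might separate the optional header from the rest of the document, the Python function below splits on the `---` delimiters using only the standard library. The function name `split_header` is hypothetical, and the YAML text it returns would still need to be handed to a YAML 1.2 parser, which is not shown here.

```python
def split_header(text):
    """Split a document into (header_text, body).

    header_text is None when the document has no metadata header.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the two delimiters is the raw YAML header.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # No closing delimiter: treat the whole document as body.
    return None, text


doc = "---\nkernelspec:\n  name: python3\n---\nSome text\n"
header, body = split_header(doc)
```

Because the header is optional, a document that does not start with `---` is returned unchanged as the body.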
Code cells
Markdown Jupyter notebooks use fenced code blocks with backticks to represent code cells (like Pandoc, Jupytext Markdown, MyST Markdown):
where the info string `{jupyter.code-cell}` specifies that this is a code cell.
Cell parameters `execution_count` and `id` must be encoded as such when specified:
Cell metadata, if present, can be represented by an optional YAML 1.2.2 block between `---` delimiters at the beginning of the code block (same as MyST Markdown):
Alternatively, non-nested metadata may be represented using the short-hand option syntax (same as MyST Markdown):
Finally, metadata may also be represented by a single line JSON blob in the info-string:
For compatibility with the Jupytext and MyST notebook formats, parsers may accept `{code-cell}` instead of `{jupyter.code-cell}`.
Code cell outputs
Once executed, a code cell may have zero or more outputs. When stored, the outputs of the code cell appear immediately after the code cell. The syntax resembles that of a code cell but also provides the different types of output specified in the `.ipynb` format: `stream`, `error`, `execute_result`, and `display_data`.
All types include the `output_type` field which has been included as a `command` on the first line of the directive.
`output_type: stream`
The JSON format of a `stream` output includes 2 additional fields, `name` and `text`. The value of the `text` field can potentially be long and is reproduced in the body of the directive to improve readability.
`output_type: error`
The JSON format of an `error` output includes 3 additional fields, `ename`, `evalue` and `traceback`. The value of the `traceback` field is reproduced in the body of the directive to improve readability.
`output_type: display_data` and `output_type: execute_result`
These two output types are both "MIME bundles" and share a similar structure, with the output data being stored in the `data` field. Cell outputs of type `execute_result` contain an additional `execution_count` field.
Consider for example these two cell outputs as represented in the original JSON `.ipynb` format:
These output cells are represented as such in Markdown:
Explanations:
`data` attribute are represented as individual objects, consistent with JSON lines format, each MIME type occupying a separate line and serialized without any newline formatting to improve the behavior of text-based diffs.
Organization of the MIME type data into separate objects on single lines improves readability and ensures that each line is a valid self-contained JSON object. On parsing the directive, a merge operation should be performed to construct a single `data` object containing all `mimetype` keys.
`output_type` or `execution_count` for `execute_result` cell outputs, are represented in the info-string of the directive.
Raw cells
Raw cells are represented in a similar fashion:
with the same syntax for parameters and metadata as for code cells.
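Since code cells and raw cells share the same fenced-block syntax, a single pass can recognize both. The Python sketch below is a minimal illustration under several assumptions: the names `FENCE`, `ALIASES` and `find_cells` are hypothetical, the closing fence is assumed to have the same length as the opening one, and cell parameters and metadata (the rest of the info string, YAML blocks, short-hand options) are left unparsed.

```python
import re

# Accepted alternative names, mapped to the names proposed in this JEP.
ALIASES = {"code-cell": "jupyter.code-cell", "raw-cell": "jupyter.raw-cell"}

# A backtick fence followed by a {directive} info string, a body, and a
# closing fence of the same length on its own line.
FENCE = re.compile(
    r"^(?P<fence>`{3,})[ \t]*\{(?P<name>[\w.\-]+)\}[^\n]*\n"
    r"(?P<body>.*?)"
    r"^(?P=fence)[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def find_cells(markdown):
    """Return (directive_name, body) pairs for code and raw cells."""
    cells = []
    for match in FENCE.finditer(markdown):
        name = ALIASES.get(match["name"], match["name"])
        if name in ("jupyter.code-cell", "jupyter.raw-cell"):
            cells.append((name, match["body"]))
    return cells


F = "`" * 3  # build the fence marker so this example nests safely
doc = (
    "Intro\n\n"
    + F + "{jupyter.code-cell}\nprint(1)\n" + F + "\n\n"
    + F + "{raw-cell}\nraw text\n" + F + "\n"
)
cells = find_cells(doc)
```

Fenced blocks with any other info string, such as an ordinary language name, are skipped and would remain part of the surrounding Markdown.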
For compatibility with the Jupytext and MyST notebook formats, parsers may accept `{raw-cell}` instead of `{jupyter.raw-cell}`.
Text cells
Implicitly, the chunks of Markdown around and in between code/output/raw cells are considered as Markdown cells: thus, the whole document behaves as a single flowing Markdown document, interspersed with code/output/raw cells (same as MyST Markdown).
The chunks of Markdown may be broken up into several text cells by means of a thematic break `+++` (as in MyST Markdown):
Text cell metadata can be provided by means of a YAML 1.2.2 block, shorthand notation, or a single line JSON representation:
Note that the leading thematic break does not introduce a leading empty text cell.
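The splitting rule can be sketched in a few lines of Python. This is an illustration only: the function name `split_text_cells` is hypothetical, only bare `+++` lines are handled (breaks that carry metadata are not), and the input is assumed to be a chunk of Markdown that contains no fenced blocks.

```python
def split_text_cells(markdown):
    """Split a chunk of Markdown into text cells on bare +++ lines."""
    chunks = [[]]
    for line in markdown.splitlines():
        if line.strip() == "+++":
            # A thematic break starts a new text cell.
            chunks.append([])
        else:
            chunks[-1].append(line)
    cells = ["\n".join(chunk).strip() for chunk in chunks]
    # A leading break does not introduce a leading empty text cell.
    if len(cells) > 1 and cells[0] == "":
        cells = cells[1:]
    return cells


with_leading_break = split_text_cells("+++\nFirst\n\n+++\nSecond\n")
without_leading_break = split_text_cells("First\n\n+++\nSecond\n")
```

Both inputs yield the same two cells, which is the behavior described for the leading thematic break.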
Cell attachments
Cell attachments are embedded as fenced code blocks in the Markdown of the cell:
For multiple attachments, use several fenced code blocks.
Implementation
`nbformat` specification to accept several serialization syntaxes.
`nbformat` so that:
`nbformat` chooses accordingly the appropriate serializers / deserializers.
`nbformat`, and register it in nbformat for extension `.nb.md`.
`.nb.md` files with the `application/x-ipynb+md` MIME type and document the new MIME type in the Jupyter documentation.
`.ipynb` extension). Interesting candidate: Pandoc.
Rationale and alternatives
Prior art
Some narrative-centric text based notebook formats
`org-mode` notebooks: look for a notebook in https://orgmode.org/features.html and this discussion;
In the Jupyter ecosystem, Jupytext lets users convert notebooks between different formats, including `.ipynb` and most of the aforementioned text-based formats. See the documentation, which nicely recaps the formats.
Some code-centric notebook formats
In these formats, the notebook is a code file that can be run as-is.
Existing formats that use `# %%` as a cell delimiter:
Implementations that use other delimiters:
`# +` in Python and Julia script
None of these formats describe any way to encode outputs, metadata, or text cells.
Use Cases
Teaching Scenario
Course material tends to target people who are not Jupyter experts, to be narrative-heavy, to be iteratively and collaboratively authored as part of a larger body of material, and to involve lightweight computations. Therefore, in this use case, the priority is on human readability and writability, conciseness, statelessness, and compatibility with version control, text tools and other material (typically written in Markdown). Outputs and widget states are typically best discarded, also to save space. Metadata is typically either handcrafted for dedicated tools (slides, grading tools, …) or best discarded. This is orthogonal to this JEP, but rich text support is a must.
Authoring in notebooks
Frontend serialization scenario
`foo.nb.md`.
`foo.nb.md` as a text file.
`.ipynb`. It can be shared as an `.ipynb` file. Outputs that can be expensive (e.g. GPU/HPC) or hard to reproduce (e.g. complex software stacks) are preserved. Widget states that may depend on non-reproducible user interaction are also preserved.
Rationale
Frequently Asked Questions
A successor to the current notebook format should allow current users to use the new format flawlessly.
Most of the current userbase creates their content in the notebook user interfaces, and picking one format over another in the preferences should not harm the ability to use existing extensions. If the new format does not preserve the current behavior, we will lose the confidence of our userbase.
Enabling all three syntaxes supports both use cases where metadata is small and should be readable and editable and use cases where one wants to preserve metadata while making it as unobtrusive as possible. It also enables importing files that use either convention, helping with interoperability and migration.
Serialize to JSON and validate the JSON.
The output block(s) will still be recognised provided that only whitespace characters are inserted in between.
Use thematic breaks `+++`. These allow individual Markdown cell boundaries to be identified and can include metadata, enabling a lossless round trip between the text-based and `ipynb` formats.
Large widget state is notebook metadata. The requirement on back and forth convertibility gives an indication of where this goes. Also, we cannot store it in widget output because outputs only hold views of widget state, and the same widget can be displayed multiple times.
Unresolved questions
The following parts of the design are expected to be resolved through the JEP process before it gets merged:
`{jupyter.code-cell}`, `{code-cell}`, `{jupyter:code-cell}`, `{.code}`
`python {jupyter.code-cell}`, as a hint for syntax highlighting in Markdown viewers and editors. This language name is purely advisory for Markdown editors, and carries no semantic meaning for Jupyter.
`.nb.md`.
Other open questions
Are there programming languages whose syntax conflicts with the metadata syntax for cells? For example, a programming language that has syntax like `:variable: value`?
Future possibilities
The following issues and lines of action are out of scope for this JEP and could be addressed in the future independently of the solution(s) that come out of this JEP:
Discussions (won't be in the JEP)
Should we be opinionated about blessing a single format, or should we just encourage multiple text-based formats that already are in existence?
`nbformat` objects to various filetypes, i.e. `jupytext` is one implementation of this interface.
Variants for code cell formats
With the above, code cells don't have syntax highlighting. Some Markdown highlighters (IntelliJ) highlight the code correctly if the language name is written directly after the backticks:
This is valid CommonMark syntax, but not valid MyST syntax. MyST suggests writing it like this:
However, neither IntelliJ nor VS Code (in a simple Markdown file) supports it.
Variants for cell output formats
Note: most text formats don't support storing cell outputs. If text-based formats are mainly useful for "authoring", then maybe we want out-of-band outputs, i.e. perhaps we want a JEP to specify how out-of-band data are stored.
As a short hand, we could support
that would be automatically translated to {json min/plain-text…}
Possible output block formats
Here we are making use of directive arguments to show the type of the output and reserving the YAML frontmatter block for the contents of the top-level metadata key.
Attachments
Metadata
Metadata in IPYNB format can be a nested data structure, thus a flat key-value format doesn't fit our needs.
Should metadata be YAML?
Which flavor of Markdown?
This discussion is more about cell contents. The syntax here is simple enough that we barely need to extend beyond CommonMark
How ambitious should the proposal be?
Level 0. Pick/define/refine an official alternative text-based serialization syntax to be seamlessly supported by most tools in the ecosystem (e.g. all those that use nbformat).
Level 1. Empower the community to implement, reuse, and experiment with alternative serialization syntaxes to be seamlessly supported by most tools in the ecosystem, assuming appropriate extensions are installed. Shepherd the process, pick the most promising alternatives and make them official.
Pros of level 0:
Cons of level 0:
Pros of Level 1:
Potential caveats of Level 1:
past experience with Jupytext seems to suggest that this won't be too bad: there is a natural incentive to stick to widely used / supported formats.
that's already the case for many Jupyter features