This JEP proposes an alternative Markdown-based serialization syntax for Jupyter notebooks, with file extension `.nb.md`, to be adopted as an official standard by the Jupyter community, and describes steps to make it supported by most tools in the ecosystem.
It is meant as one of several steps towards offering flexibility in how to represent notebooks to simultaneously:
- `.ipynb` files;
The Jupyter notebook format is currently defined by a data structure, a serialization syntax (JSON), and a syntax for rich text cells (some variant of Markdown). This format has tremendously supported the community in having a lingua franca to exchange computational narratives. Yet over the years, the community has recurrently expressed the need for:
Meanwhile, there is a long track record of using text-based notebooks, both outside the Jupyter ecosystem (narrative-centric: R Markdown, org-mode, and others; code-centric: MATLAB, Visual Studio Code, Spyder, PyCharm and DataSpell), and within the Jupyter ecosystem, notably with Jupytext and Jupyter Book. The wide adoption of such solutions highlights their suitability in many use cases.
Though the existing text-based formats go a long way toward supporting the needs of the community, they share a significant pain point: the inability to represent outputs and attachments.
Other pain points are:
- `.ipynb` format;
This JEP provides a standard syntax for representing a Jupyter notebook as a Markdown file. We call such a file a Markdown Jupyter notebook. Here is a minimal Markdown Jupyter notebook that could typically be authored manually:
---
metadata:
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---
# A minimal Markdown Jupyter notebook
This is a text cell
```{jupyter.code-cell}
1+1
```
This is another text cell
+++
And another one
Note that this file contains only the minimal information required to reconstruct a valid notebook. In particular, there are no cell ids, outputs, or execution counts.
Here is a Markdown Jupyter notebook containing a lossless representation of a full-featured Jupyter notebook with (cell) metadata, outputs, attachments, etc. Because this example is long, it has been posted in this example repository along with the accompanying `.ipynb` file.
The proposed syntax was designed to satisfy the following requirements:
In addition, the following are good to have:
This section describes the proposed syntax for serializing Jupyter notebooks in Markdown. Then, we detail the steps needed for this syntax to be supported by most tools in the Jupyter ecosystem.
A Markdown Jupyter notebook consists of an optional metadata header followed by Markdown representing a sequence of text cells, code cells, outputs, raw cells, etc.
The notebook metadata is represented by a YAML 1.2.2 header at the top of the document, surrounded by `---` delimiters:
---
metadata:
  kernel_info:
    name: the name of the kernel
  language_info:
    name: the programming language of the kernel
    version: the version of the language
    codemirror_mode: The name of the codemirror mode to use [optional]
nbformat: 4
nbformat_minor: 0
---
The metadata structure mirrors that of the Jupyter Notebook format.
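As a rough illustration of how a reader might locate this header, here is a minimal sketch that separates the `---`-delimited block from the rest of the document. The function name `split_header` is hypothetical and not part of this proposal or of `nbformat`; the returned header text would then be handed to a YAML parser, which is left out here to keep the sketch dependency-free.

```python
def split_header(text):
    """Split a document into (header_yaml, body).

    Returns ("", text) when the document has no leading `---` header.
    Hypothetical helper for illustration only.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            header = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return header, body
    # No closing delimiter found: treat the whole text as body.
    return "", text


example = (
    "---\n"
    "metadata:\n"
    "  kernelspec:\n"
    "    name: python3\n"
    "nbformat: 4\n"
    "nbformat_minor: 0\n"
    "---\n"
    "# Title\n"
)
header, body = split_header(example)
```

Since the header is optional, a document without it is returned unchanged as the body.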
Markdown Jupyter notebooks use fenced code blocks with backticks to represent code cells (like Pandoc, Jupytext Markdown, and MyST Markdown):
```{jupyter.code-cell}
print('hi')
```
where the info string `{jupyter.code-cell}` specifies that this is a code cell.
Cell parameters `execution_count` and `id` must be encoded as such when specified:
```{jupyter.code-cell execution_count=N id=...}
print('hi')
```
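The info string above could be parsed along the following lines. This is only a sketch under simplifying assumptions: the function name `parse_info_string` is hypothetical, only plain `key=value` parameters are handled, and the `metadata={json blob}` form is deliberately not supported.

```python
import re

# Matches "{name key=value key=value}"; nested braces are not supported.
INFO_RE = re.compile(r"^\{(?P<name>[\w.\-]+)(?P<params>[^}]*)\}$")


def parse_info_string(info):
    """Return (directive_name, parameters) for a simple info string."""
    match = INFO_RE.match(info.strip())
    if match is None:
        raise ValueError(f"not a simple directive info string: {info!r}")
    params = {}
    for token in match.group("params").split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        params[key] = value
    # execution_count is an integer in the notebook data structure.
    if "execution_count" in params:
        params["execution_count"] = int(params["execution_count"])
    return match.group("name"), params


name, params = parse_info_string(
    "{jupyter.code-cell execution_count=42 id=1234abcd}"
)
```

A production parser would also need to decide how to treat unknown parameters; this sketch simply passes them through as strings.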
Cell metadata, if present, can be represented by an optional YAML 1.2.2 block between `---` delimiters at the beginning of the code block (same as MyST Markdown):
```{jupyter.code-cell execution_count=42 id=1234abcd}
---
key:
more: true
tags: [hide-output, show-input]
---
print('hi')
```
Alternatively, non-nested metadata may be represented using the shorthand option syntax (same as MyST Markdown):
```{jupyter.code-cell}
:tags: [hide-output, show-input]
print('hi')
```
Finally, metadata may also be represented by a single line JSON blob in the info-string:
```{jupyter.code-cell metadata={json blob}}
:tags: [hide-output, show-input]
print("Hello!")
```
For compatibility with the Jupytext and MyST notebook formats, parsers may accept `{code-cell}` instead of `{jupyter.code-cell}`.
Once executed, a code cell may have zero or more outputs. When stored, the output(s) of the code cell appear(s) immediately after the code cell. The syntax resembles that of a code cell but also provides the different types of output specified in the `.ipynb` format: `stream`, `error`, `execute_result`, and `display_data`.
All types include the `output_type` field, which is given as a parameter on the first line of the directive.
output_type: stream
The JSON format of a `stream` output includes 2 additional fields, `name` and `text`. The value of the `text` field can potentially be long and is reproduced in the body of the directive to improve readability.
# .ipynb
```
{
  "output_type": "stream",
  "name": "stdout",
  "text": [
    "This is the stream content that was in the *text* field\n",
    "of the original json output\n"
  ]
}
```
# text-based .md
```{jupyter.output output_type=stream}
---
name: stdout
---
This is the stream content that was in the *text* field
of the original json output
```
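To make the mapping between the two representations concrete, here is a minimal sketch that renders a `stream` output dictionary as the directive shown above. The function name `stream_output_to_markdown` is hypothetical. The sketch assumes the text does not itself contain a triple-backtick fence, and it appends a trailing newline when one is missing, which a lossless writer would have to handle more carefully.

```python
def stream_output_to_markdown(output):
    """Render a stream output dict as a jupyter.output directive."""
    if output.get("output_type") != "stream":
        raise ValueError("expected a stream output")
    text = output["text"]
    # The notebook format allows either a list of lines or a single string.
    if isinstance(text, list):
        text = "".join(text)
    if not text.endswith("\n"):
        text += "\n"
    return (
        "```{jupyter.output output_type=stream}\n"
        "---\n"
        f"name: {output['name']}\n"
        "---\n"
        f"{text}"
        "```\n"
    )


example_output = {
    "output_type": "stream",
    "name": "stdout",
    "text": [
        "This is the stream content that was in the *text* field\n",
        "of the original json output\n",
    ],
}
rendered = stream_output_to_markdown(example_output)
```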
output_type: error
The JSON format of an `error` output includes 3 additional fields, `ename`, `evalue` and `traceback`. The value of the `traceback` field is reproduced in the body of the directive to improve readability.
# .ipynb
{
  "output_type": "error",
  "ename": "ReferenceError",
  "evalue": "x is unknown",
  "traceback": [
    "The *traceback* field rendered as content\n"
  ]
}
# text-based .md
```{jupyter.output output_type=error}
---
ename: ReferenceError
evalue: x is unknown
---
The *traceback* field rendered as content
```
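The reverse direction, from the directive back to the JSON structure, might look like the sketch below. It is illustrative only: `parse_error_output` is a hypothetical name, the header is read as flat `key: value` lines rather than with a real YAML parser, and each body line becomes one `traceback` entry ending in a newline, mirroring the example above.

```python
def parse_error_output(block):
    """Convert a fenced error directive back into an output dict."""
    lines = block.strip("\n").split("\n")
    if not lines[0].startswith("```{jupyter.output") or lines[-1] != "```":
        raise ValueError("not a fenced jupyter.output block")
    if "output_type=error" not in lines[0]:
        raise ValueError("not an error output")
    content = lines[1:-1]
    fields = {}
    if content and content[0] == "---":
        end = content.index("---", 1)  # closing delimiter of the header
        for line in content[1:end]:
            # Simplification: flat "key: value" pairs only, no real YAML.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        content = content[end + 1:]
    return {
        "output_type": "error",
        "ename": fields.get("ename", ""),
        "evalue": fields.get("evalue", ""),
        "traceback": [line + "\n" for line in content],
    }


example_block = (
    "```{jupyter.output output_type=error}\n"
    "---\n"
    "ename: ReferenceError\n"
    "evalue: x is unknown\n"
    "---\n"
    "The *traceback* field rendered as content\n"
    "```\n"
)
parsed = parse_error_output(example_block)
```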
`output_type: display_data` and `output_type: execute_result`
These two output types are both "MIME bundles" and share a similar structure, with the output data being stored in the `data` field. Cell outputs of type `execute_result` contain an additional `execute_count` field.
Consider for example these two cell outputs as represented in the original JSON `.ipynb` format:
{
  "output_type": "display_data",
  "metadata": {
    "some-metadata-key": "some-value"
  },
  "data": {
    "text/html": "<div>Some HTML Content</div>",
    "image/png": "base-64-encoded-image"
  }
},
...,
{
  "output_type": "execute_result",
  "execute_count": 2,
  "metadata": {
    "some-metadata-key": "some-value"
  },
  "data": {
    "text/html": "<div>Some HTML Content</div>",
    "image/png": "base-64-encoded-image"
  }
}
These cell outputs are represented as follows in Markdown:
```{jupyter.output output_type=display_data}
---
some-metadata-key: some-value
---
{ "text/html": "<div>Some HTML Content</div>" }
{ "image/png": "base-64-encoded-image" }
```
```{jupyter.output output_type=execute_result execute_count=42}
---
some-metadata-key: 'some-value'
---
{ "text/html": "<div>Some HTML Content</div>" }
{ "image/png": "base-64-encoded-image" }
```
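The one-MIME-type-per-line layout used in the directive bodies above can be sketched with the standard `json` module. The helper names `data_to_lines` and `lines_to_data` are hypothetical; the point is only that `json.dumps` without indentation keeps each entry on a single line, which is what makes line-based diffs behave well.

```python
import json


def data_to_lines(data):
    """Serialize a MIME bundle as one single-line JSON object per MIME type."""
    return [
        json.dumps({mimetype: value}, ensure_ascii=False)
        for mimetype, value in data.items()
    ]


def lines_to_data(lines):
    """Merge the per-line JSON objects back into a single data dict."""
    data = {}
    for line in lines:
        if line.strip():
            data.update(json.loads(line))
    return data


bundle = {
    "text/html": "<div>Some HTML Content</div>",
    "image/png": "base-64-encoded-image",
}
body_lines = data_to_lines(bundle)
restored = lines_to_data(body_lines)
```

Newlines inside a value are escaped by the JSON encoder, so a multi-line HTML or text payload still occupies exactly one line of the directive body.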
Explanations:
- `data` attribute are represented as individual objects, consistent with JSON lines format, each MIME type occupying a separate line and serialized without any newline formatting to improve the behavior of text-based diffs.
- `data` object containing all mimetype keys.
- `output_type` or `execute_count` for `execute_result` cell outputs, are represented in the info-string of the directive.
Raw cells are represented in a similar fashion:
```{jupyter.raw-cell}
---
raw_mimetype: text/html
---
<b>Bold text</b>
```
with the same syntax for parameters and metadata as for code cells.
For compatibility with the Jupytext and MyST notebook formats, parsers may accept `{raw-cell}` instead of `{jupyter.raw-cell}`.
Implicitly, the chunks of Markdown around and in between code/output/raw cells are considered as Markdown cells: thus, the whole document behaves as a single flowing Markdown document, interspersed with code/output/raw cells (same as MyST Markdown).
A text cell
```{jupyter.code-cell}
1 + 1
```
Another text cell
```{jupyter.code-cell}
1+2
```
The chunks of Markdown may be broken up into several text cells by means of a thematic break `+++` (as in MyST Markdown):
A text cell
+++
Another text cell
Text cell metadata can be provided by means of a YAML 1.2.2 block, shorthand notation, or a single-line JSON representation:
+++ { "slide": true }
A text cell
+++
---
foo: bar
---
Another text cell
+++
:foo: bar
A third text cell
Note that the leading thematic break does not introduce a leading empty text cell.
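The splitting rule described above can be sketched as follows. This is a simplified illustration: `split_text_cells` is a hypothetical name, only the single-line JSON form of metadata after `+++` is handled (not the YAML block or the shorthand notation), and the input is assumed to contain no fenced blocks in which a `+++` line could appear.

```python
import json


def _make_cell(metadata, lines):
    """Build a markdown cell dict from accumulated lines."""
    source = "\n".join(lines).strip("\n")
    return {"cell_type": "markdown", "metadata": metadata, "source": source}


def split_text_cells(markdown):
    """Split a chunk of Markdown into text cells at `+++` breaks."""
    cells = []
    metadata, lines = {}, []
    for line in markdown.splitlines():
        if line.startswith("+++"):
            # A leading break must not create an empty leading cell.
            if cells or any(item.strip() for item in lines):
                cells.append(_make_cell(metadata, lines))
            blob = line[3:].strip()
            metadata = json.loads(blob) if blob else {}
            lines = []
        else:
            lines.append(line)
    cells.append(_make_cell(metadata, lines))
    return cells


text = '+++ { "slide": true }\nA text cell\n+++\nAnother text cell\n'
cells = split_text_cells(text)
```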
Cell attachments are embedded as fenced code blocks in the Markdown of the cell:
Here is some text.
And now .
```{jupyter.attachment}
:label: image.png
{json blurb}
```
For multiple attachments, use several fenced code blocks.
- `nbformat` specification to accept several serialization syntaxes.
- `nbformat` so that:
- `nbformat` chooses accordingly the appropriate serializers / deserializers.
- `nbformat`, and register it in nbformat for extension `.nb.md`.
- `.nb.md` files with the `application/x-ipynb+md` MIME type and document the new MIME type in the Jupyter documentation.
- `.ipynb` extension). Interesting candidate: Pandoc.
- org-mode notebooks: look for a notebook in https://orgmode.org/features.html and this discussion;
In the Jupyter ecosystem, Jupytext lets users convert notebooks between different formats, including `.ipynb` and most of the aforementioned text-based formats. See the documentation, which nicely recaps the formats.
In these formats, the notebook is a code file that can be run as-is.
Existing formats that use `# %%` as a cell delimiter:
Implementations that use other delimiters:
- `# +` in Python and Julia script
None of these formats describe any way to encode outputs, metadata, or text cells.
Course material tends to target non-experts in Jupyter, be narrative-heavy, be iteratively and collaboratively authored as part of a larger body of material, and involve lightweight computations. Thus, in this use case, the priority is on human readability and writability, conciseness, statelessness, and compatibility with version control, text tools and other material (typically written in Markdown). Outputs and widget states are typically best discarded, also to save space. Metadata is typically either handcrafted for dedicated tools (slides, grading tools, …) or best discarded. This is orthogonal to this JEP, but rich text support is a must.
- `foo.nb.md`.
- `foo.nb.md` as a text file.
- `.ipynb`. It can be shared as an `.ipynb` file. Outputs that can be expensive (e.g. GPU/HPC) or hard to reproduce (e.g. complex software stacks) are preserved. Widget states that may depend on non-reproducible user interaction are also preserved.
What's the rationale for the support of lossless serialization of any notebook, when serializing large data chunks like outputs or attachments will anyway harm the readability of the file?
A successor to the current notebook format should allow current users to use the new format flawlessly.
Most of the current userbase creates their content in the notebook user interfaces, and picking one format over another in the preferences should not harm the ability to use existing extensions. If the new format does not preserve the current behavior, we will lose the confidence of our userbase.
Why support several syntaxes for metadata?
Enabling all three syntaxes supports both use cases where metadata is small and should be readable and editable and use cases where one wants to preserve metadata while making it as unobtrusive as possible. It also enables importing files that use either convention, helping with interoperability and migration.
How to validate the plain text format notebooks, especially against the emerging ideas around including JSON schemas for validation?
Serialize to JSON and validate the JSON.
What happens if people insert text (or any whitespace) between a cell's input and output block(s)?
The output block(s) will still be recognised provided that only whitespace characters are inserted in between.
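One possible reading of this rule is sketched below. It assumes the document has already been tokenized into `(kind, content)` pairs, with `kind` being `"code"`, `"output"`, or `"text"`; that tokenization, the name `attach_outputs`, and the choice to raise an error for an output that follows real text are all assumptions of this sketch, since the proposal only states what happens in the whitespace-only case.

```python
def attach_outputs(blocks):
    """Group output blocks with the code cell that precedes them."""
    cells = []
    last_code = None  # code cell still eligible to receive outputs
    for kind, content in blocks:
        if kind == "code":
            last_code = {"cell_type": "code", "source": content, "outputs": []}
            cells.append(last_code)
        elif kind == "output":
            if last_code is None:
                # Assumption: an orphaned output is treated as an error.
                raise ValueError("output block without a preceding code cell")
            last_code["outputs"].append(content)
        elif kind == "text":
            if content.strip():
                # Real text ends the run of outputs of the previous code cell.
                cells.append({"cell_type": "markdown", "source": content})
                last_code = None
            # Whitespace-only text is skipped, so outputs stay attached.
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    return cells


blocks = [
    ("code", "1+1"),
    ("text", "\n\n"),
    ("output", "2"),
    ("text", "Some prose"),
    ("code", "x"),
]
cells = attach_outputs(blocks)
```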
How do we split a large body of markdown into several markdown cells (in other words, can we have cell breaks)?
Use thematic breaks `+++`. These allow individual markdown cell boundaries to be identified and can include metadata enabling a lossless roundtrip between text-based and `ipynb` format.
How to store large widget states? With the current format, widget states will be stored in the notebook metadata, that is in the YAML header, which will soon become very large. Should widget states be moved to outputs instead?
Large widget state is notebook metadata. The requirement on back and forth convertibility gives an indication of where this goes. Also, we cannot store it in widget output because outputs only hold views of widget state, and the same widget can be displayed multiple times.
The following parts of the design are expected to be resolved through the JEP process before it gets merged:
- `{jupyter.code-cell}`, `{code-cell}`, `{jupyter:code-cell}`, `{.code}`
- `python {jupyter.code-cell}`, as a hint for syntax highlighting in markdown viewers and editors. This language name is purely advisory for markdown editors, and carries no semantic meaning for Jupyter.
- `.nb.md`.
Would there be possible programming languages that conflict with the metadata syntax for cells? For example, a programming language that has syntax like `:variable: value`?
The following issues and lines of action are out of scope for this JEP and could be addressed in the future independently of the solution(s) that come out of this JEP:
- `nbformat` objects to various filetypes, i.e. `jupytext` is one implementation of this interface.
```{code-cell} ipython3
---
id: 12344
exec_nt: 3
metadata:
  nbgrader:
    grade: true
    grade_id: cell-963f3a9626ae1519
    locked: true
    points: 1
    schema_version: 3
    solution: false
    task: false
---
assert ultime == 42
```
```{code-cell} id=12344 execution_count=3
---
nbgrader:
  grade: true
  grade_id: cell-963f3a9626ae1519
  locked: true
  points: 1
  schema_version: 3
  solution: false
  task: false
---
assert ultime == 42
```
```{code-cell} ipython3
---
id: 12344
exec_nt: 3
nbgrader:
  grade: true
  grade_id: cell-963f3a9626ae1519
  locked: true
  points: 1
  schema_version: 3
  solution: false
  task: false
---
assert ultime == 42
```
```{code-cell} ipython3
---
attributes:
  id: 12344
  exec_nt: 3
nbgrader:
  grade: true
  grade_id: cell-963f3a9626ae1519
  locked: true
  points: 1
  schema_version: 3
  solution: false
  task: false
---
assert ultime == 42
```
With the above, code cells don't have syntax highlighting. Some markdown highlighters (IntelliJ) highlight the code correctly if the language name is written directly after the backticks:
```python {something}
import sys
2 + 2
```
This is valid syntax for CommonMark, but not for MyST. MyST suggests writing it like this:
```{something} python
import sys
2 + 2
```
However, neither IntelliJ nor VS Code (in a simple markdown file) supports it.
Note: most text formats don't support storing cell outputs. If text-based formats are mainly useful for "authoring", then maybe we want out-of-band outputs? I.e. perhaps we want a JEP to specify how out-of-band data are stored.
print(3)
1+1
3
{json blurb}
As a shorthand, we could support
2
that would be automatically translated to {json min/plain-text…}
Traceback: ....
Here we are making use of directive arguments to show the type of the output and reserving the YAML frontmatter block for the contents of the top-level metadata key.
```{jupyter.output} stream
---
name: stdout
---
This is the stream content that was in the *text* field of the
original json output
```
```{jupyter.output} error
---
ename: ReferenceError
evalue: x is unknown
---
The *traceback* field rendered as content
```
```{jupyter.output} display_data
---
mdkey: value
---
{ "text/plain": "some text data" }
```
```{jupyter.output} execute_result execute_count=42
---
some_metadata_key: 'value'
---
{ "image/png": base64-image-text }
```
+++
asdasdf
as
sadf
```{jupyter.attachment}
{'foo.png': {json blurb}
'bar.png': {json blurb}
```
```{jupyter.attachment}
:label: foo.png
{json blurb}
```
```{jupyter.attachment} foo.png
{json blurb}
```
Metadata in IPYNB format can be a nested data structure, thus a flat key-value format doesn't fit our needs.
Should metadata be YAML?
This discussion is more about cell contents. The syntax here is simple enough that we barely need to extend beyond CommonMark.
Level 0. Pick/define/refine an official alternative text-based serialization syntax to be seamlessly supported by most tools in the ecosystem (e.g. all those that use nbformat).
Level 1. Empower the community to implement, reuse, and experiment with alternative serialization syntaxes to be seamlessly supported by most tools in the ecosystem, assuming appropriate extensions are installed. Shepherd the process, pick the most promising alternatives, and make them official.
Pros of level 0:
Cons of level 0:
Pros of Level 1:
Potential caveats of Level 1:
[nt] even with Level 1, deciding which format is made official is in the hands of the Jupyter committee.
[vl] Then, it must be emphasized that Jupyter must be able to open only officially accepted notebook file formats. Any notebook file with a third-party format must be considered as invalid. Should anyone be able to work with that format, they have to use their own fork of Jupyter.
I don't see why one should actively prevent loading third-party formats. If a user chooses an official format, s/he knows that this will come with guarantees. Again, it's just like any Jupyter extension. Using official extensions or widely used ones gives you guarantees. But you can still use others, at your own risk.
[vl] That makes sense. However, it would be enough to have a different filename extension in that case, without describing the contents of the file.
[nt] Definitely, each syntax should use a different extension.
Also, in that case, the proposal for additional third-party format must be accepted along with the new default official format. Either both are accepted, or none.
So your suggestion would be to have two separate JEPs?
We have to separate JEPs anyway, simply because this file is too big and contains different ideas. At least, that is what I was told.
Agreed. The only piece that makes me hesitate is: which JEP should come first?
The new official notebook format may exist with the ability to use third-party formats. However, the ability to use third-party formats may not exist without the official notebook format. Otherwise, all the negative consequences discussed in this text apply. So, I suggest making the new official notebook format first.
[nt] I see the political reason. I guess my hesitation point is whether the landscape is mature enough at this stage to set the official format in stone, or whether having an experimentation period where the community can explore would help shape the official format.
Then, the JEP should define what is the experimentation period and what must be done after that.
[nt] Open question: if we start with just two formats: is there a risk that the plugin mechanism that will be implemented will be "hardcoded" and then hard to generalize after the fact?
[nt] under Level 0, developers of a tool already have to ensure that it supports both the ipynb and the text format; once this is done, supporting more is for free. I see: you mean in the case they can't use the community provided parsers.
[nt] Not exactly. In the case of Level 0, developers know how to parse two different formats with well-defined structure and good documentation. In the case of Level 1, developers stumble upon an unknown number of unknown formats which may have no documentation.
[nt] that's part of the selection process: a format that's not documented or is not provided with good parsers won't be adopted by tools; if users care about a given format, and want tools to work, they should make sure that their format is high quality.
[nt] yes, that's true of any Jupyter extension. It's part of the selection process in the ecosystem.