
JEP: official support for Markdown-based notebooks

Summary

This JEP proposes an alternative Markdown-based serialization syntax for Jupyter notebooks, with file extension .nb.md, to be adopted as an official standard by the Jupyter community, and describes steps to make it supported by most tools in the ecosystem.

It is meant as one of several steps towards offering flexibility in how to represent notebooks to simultaneously:

  • support an extensive range of use cases with various balances of priority between conciseness and readability and lossless conversion to and from .ipynb files;
  • foster standardization by being opinionated on some of the low-level choices.

Motivation

The Jupyter notebook format is currently defined by a data structure, a serialization syntax (JSON), and a syntax for rich text cells (some variant of Markdown). This format has tremendously supported the community by providing a lingua franca to exchange computational narratives. Yet over the years, the community has recurrently expressed the following needs:

  1. The simplicity of the mental model for end users: "A notebook is just a glorified Markdown file".
  2. Human readability and editability of the plain file.
  3. Natural interoperability with version control systems and software forges (GitHub, GitLab, etc.), e.g. readable diffs and merges, quick online editing with the forges' editors.
  4. Interoperability with standard text tools to browse, edit, power-edit, author, mass search and replace, tags, macros, etc.
  5. Interoperability with existing notebook formats.
  6. Efficient handling of large data blobs like outputs.
  7. Streaming - enable progressive loading of a notebook, where a partially received file remains usable/viewable (this is not possible in JSON).
  8. Natural integration in larger bodies of contents: e.g. books built out of a combination of plain Markdown files and notebooks.
  9. Complex IDEs like PyCharm or Visual Studio Code and complex text editors like Vim or Emacs are optimized for working with text files.

Meanwhile, there is a long track record of using text-based notebooks, both outside the Jupyter ecosystem (narrative-centric: R Markdown, org-mode, and others; code-centric: MATLAB, Visual Studio Code, Spyder, PyCharm and DataSpell), and within the Jupyter ecosystem, notably with Jupytext and Jupyter Book. The wide adoption of such solutions highlights their suitability in many use cases.

Though the existing text-based formats go a long way toward supporting the needs of the community, they share a significant pain point: the inability to represent outputs and attachments.

Other pain points are:

  • Jupytext needs to hack its way in the Jupyter(Lab) content manager to support opening and saving text notebooks seamlessly; this also takes some configuration steps for the user;
  • most other tools in the ecosystem (e.g. nbconvert, nbgrader) can't read or write text notebooks, forcing the users to convert their notebooks back and forth using .ipynb format;
  • there are many text notebook formats or implementations out there, each of which brings a combination of coupled choices between distinct aspects: which syntax is used for serialization, which syntax is used for rich text, which information can be or is stored, etc. Properly decoupling these aspects to maximize flexibility, together with opinionated standardization of the basics, would help the community cover the variety of use cases;
  • the content-type discovery (Should the file be considered as a notebook? If yes, in which format?) currently relies on a combination of looking at the extension and at the metadata inside the file.

Guide-level explanation

This JEP provides a standard syntax for representing a Jupyter notebook as a Markdown file. We call such a file a Markdown Jupyter notebook. Here is a minimal Markdown Jupyter notebook that could typically be authored manually:

​​​​---
​​​​metadata:
​​​​    kernelspec:
​​​​      display_name: Python 3 (ipykernel)
​​​​      language: python
​​​​      name: python3
​​​​---
​​​​# A minimal Markdown Jupyter notebook

​​​​This is a text cell

​​​​```{jupyter.code-cell}
​​​​1+1
​​​​```

​​​​This is another text cell

​​​​+++

​​​​And another one

Note that this file contains only the minimal information required to reconstruct a valid notebook. In particular, there are no cell ids, outputs, or execution counts.

Here is a Markdown Jupyter notebook containing a lossless representation of a full-featured Jupyter notebook with (cell) metadata, outputs, attachments, etc. As this example is long form it has been posted in this example repository along with the accompanying .ipynb file.

Reference-level explanation

Design goals

The proposed syntax was designed to satisfy the following requirements:

  • The syntax should allow for lossless serialization of any Jupyter Notebook data structure. This includes:
    • notebook metadata;
    • text and code cells, with metadata and parameters (e.g. cell IDs);
    • outputs, with MIME types, metadata, etc;
    • cell attachments;
    • widget states.
  • The serialized notebook should be a valid Markdown file.
  • Should aim for reasonable human readability and editability: text, code, raw cells, and metadata should be human-readable and editable. Large chunks of data like output cells or attachments should be as non-obtrusive as possible.
  • Should aim for reasonable support by version control.
  • Should aim for reasonable rendering by typical Markdown viewers.
  • Should be similar to existing popular formats; ideally should be one of the existing popular formats.

In addition, the following are good to have:

  • Enable reading similar notebook formats to ease the transition.
  • Enable third-party extensions to implement and declare alternative serialization syntaxes so that most tools natively support them. This helps with interoperability: think of a third-party extension to treat R Markdown notebooks as native Jupyter notebooks.

This section describes the proposed syntax for serializing Jupyter notebooks in Markdown. Then, we detail the steps needed for this syntax to be supported by most tools in the Jupyter ecosystem.

Serialization syntax description

Top-level structure

A Jupyter Markdown notebook consists of an optional metadata header followed by Markdown representing a sequence of text cells, code cells, outputs, raw cells, etc.

Metadata header

The notebook metadata is represented by a YAML 1.2.2 header at the top of the document, surrounded by --- delimiters:

    ---
    metadata:
      kernel_info:
        name: the name of the kernel
      language_info:
        name: the programming language of the kernel
        version: the version of the language
        codemirror_mode: The name of the codemirror mode to use [optional]
    nbformat: 4
    nbformat_minor: 0
    ---

The metadata structure mirrors that of the Jupyter Notebook format.

Code cells

Jupyter Markdown notebooks use fenced code blocks with backticks to represent code cells (like Pandoc, Jupytext Markdown, Myst Markdown):

​​​​```{jupyter.code-cell}
​​​​print('hi')
​​​​```

where the info string {jupyter.code-cell} specifies that this is a code cell.

Cell parameters execution_count and id, when specified, must be encoded as follows:

​​​​```{jupyter.code-cell execution_count=N id=...}
​​​​print('hi')
​​​​```

Cell metadata, if present, can be represented by an optional YAML 1.2.2 block between --- delimiters at the beginning of the code block (same as Myst Markdown):

​​​​```{jupyter.code-cell execution_count=42 id=1234abcd}
​​​​---
​​​​key:
​​​​  more: true
​​​​tags: [hide-output, show-input]
​​​​---
​​​​print('hi')
​​​​```

Alternatively, non-nested metadata may be represented using the short-hand option syntax (same as Myst Markdown):

​​​​```{jupyter.code-cell}
​​​​:tags: [hide-output, show-input]

​​​​print('hi')
​​​​```

Finally, metadata may also be represented by a single-line JSON blob in the info string:

​​​​```{jupyter.code-cell metadata={json blob}}
​​​​:tags: [hide-output, show-input]

​​​​print("Hello!")
​​​​```

For compatibility with the Jupytext and Myst notebook formats, parsers may accept {code-cell} instead of {jupyter.code-cell}.
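
To make the info-string rules above concrete, here is a minimal sketch of how a parser might split an info string into a directive name and its `key=value` parameters. The function name `parse_info_string` and the regular expression are assumptions made for illustration; the JEP does not prescribe a parsing algorithm. The sketch does not handle the `metadata={json blob}` form, whose value may contain spaces and braces.

```python
import re

# Hypothetical pattern: '{name key=value key=value}' with whitespace-free values.
INFO_RE = re.compile(
    r"^\{(?P<name>[\w.\-]+)(?P<params>(?:\s+[^\s=}]+=[^\s}]+)*)\s*\}$"
)

# Directive names accepted for code cells, including the compatibility alias.
CODE_CELL_NAMES = {"jupyter.code-cell", "code-cell"}


def parse_info_string(info):
    """Split an info string into (directive name, dict of string parameters)."""
    match = INFO_RE.match(info.strip())
    if match is None:
        raise ValueError(f"not a directive info string: {info!r}")
    # Each parameter is a 'key=value' token; values are kept as strings.
    params = dict(item.split("=", 1) for item in match.group("params").split())
    return match.group("name"), params


def is_code_cell(info):
    """True if the info string names a code cell directive."""
    name, _params = parse_info_string(info)
    return name in CODE_CELL_NAMES
```

Values are returned as strings; a real implementation would presumably convert execution_count to an integer before building the notebook data structure.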

Code cell outputs

Once executed, a code cell may have zero or more outputs. When stored, the output(s) of the code cell appear(s) immediately after the code cell. The syntax resembles that of a code cell but also provides the different types of output specified in the .ipynb format: stream, error, execute_result, and display_data.

All output types include the output_type field, which is encoded as a parameter in the info string of the directive.

output_type: stream

The JSON format of a stream output includes 2 additional fields name and text. The value of the text field can potentially be long and is reproduced in the body of the directive to improve readability.

# .ipynb
```
{
    "output_type": "stream",
    "name": "stdout",
    "text": [
        "This is the stream content that was in the *text* field\n",
        "of the original json output\n"
    ]
}
```
# text-based .md
```{jupyter.output output_type=stream}
---
name: stdout
---
This is the stream content that was in the *text* field
of the original json output
```
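
As an illustration of the mapping above, the following sketch renders a stream output taken from an .ipynb file as a jupyter.output directive. The function name `stream_output_to_markdown` is hypothetical. The sketch makes two simplifying assumptions that a lossless serializer would have to revisit: it drops trailing newlines from the text, and it does not guard against the text itself containing a fence of three backticks.

```python
def stream_output_to_markdown(output):
    """Render an .ipynb stream output dict as a jupyter.output directive."""
    if output.get("output_type") != "stream":
        raise ValueError("expected a stream output")
    text = output["text"]
    # nbformat stores multi-line strings either as a list of lines or as one string.
    body = "".join(text) if isinstance(text, list) else text
    fence = "`" * 3  # built programmatically to keep this example fence-safe
    lines = [
        f"{fence}{{jupyter.output output_type=stream}}",
        "---",
        f"name: {output['name']}",
        "---",
        body.rstrip("\n"),  # simplification: trailing newlines are dropped
        fence,
    ]
    return "\n".join(lines)
```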
output_type: error

The JSON format of an error output includes 3 additional fields ename, evalue and traceback. The value of the traceback field is reproduced in the body of the directive to improve readability.

# .ipynb
{
    "output_type": "error",
    "ename": "ReferenceError",
    "evalue": "x is unknown",
    "traceback": [
        "The *traceback* field rendered as content\n"
    ]
}

# text-based .md
```{jupyter.output output_type=error}
---
ename: ReferenceError
evalue: x is unknown
---
The *traceback* field rendered as content
```
output_type: display_data and output_type: execute_result

These two output types are both "MIME bundles" and share a similar structure, with the output data being stored in the data field. Cell outputs of type execute_result contain an additional execution_count field.

Consider for example these two cell outputs as represented in the original json ipynb format:

{
    "output_type": "display_data",
    "metadata": {
        "some-metadata-key": "some-value"
    },
    "data": {
        "text/html": "<div>Some HTML Content</div>",
        "image/png": "base-64-encoded-image"
    }
},
...,
{
    "output_type": "execute_result",
    "execution_count": 2,
    "metadata": {
        "some-metadata-key": "some-value"
    },
    "data": {
        "text/html": "<div>Some HTML Content</div>",
        "image/png": "base-64-encoded-image"
    }
}

These outputs are represented as follows in Markdown:

```{jupyter.output output_type=display_data}
---
some-metadata-key: some-value
---
{ "text/html": "<div>Some HTML Content</div>" }
{ "image/png": "base-64-encoded-image" }
```

```{jupyter.output output_type=execute_result execution_count=2}
---
some-metadata-key: some-value
---
{ "text/html": "<div>Some HTML Content</div>" }
{ "image/png": "base-64-encoded-image" }
```

Explanations:

  • Cell metadata, if present, is represented as a top YAML block of the directive.
  • The MIME type keyed entries from the output's data attribute are represented as individual objects, consistent with JSON lines format, each MIME type occupying a separate line and serialized without any newline formatting to improve the behavior of text-based diffs.
    Organization of the MIME type data into separate objects on single lines improves readability and ensures that each line is a valid self-contained JSON object. On parsing the directive, a merge operation should be performed to construct a single data object containing all mimetype keys.
  • Other cell attributes, like output_type or execution_count for execute_result cell outputs, are represented in the info-string of the directive.
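
The merge operation described above can be sketched in a few lines of Python. The helper names `merge_mime_lines` and `split_mime_data` are hypothetical, and rejecting a MIME type that appears on two lines is an assumption of this sketch; the JEP does not say how duplicates should be handled.

```python
import json


def merge_mime_lines(body):
    """Merge one-JSON-object-per-line MIME entries into a single data dict."""
    data = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        if not isinstance(entry, dict):
            raise ValueError(f"expected a JSON object, got: {line!r}")
        duplicates = data.keys() & entry.keys()
        if duplicates:
            # Assumption: a MIME type repeated across lines is treated as an error.
            raise ValueError(f"duplicate MIME types: {sorted(duplicates)}")
        data.update(entry)
    return data


def split_mime_data(data):
    """Inverse operation: emit one single-line JSON object per MIME type."""
    return "\n".join(json.dumps({mime: value}) for mime, value in data.items())
```

Because `json.dumps` escapes newlines inside strings when no indentation is requested, each emitted entry stays on a single line, which is what keeps text-based diffs local to the MIME type that changed.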

Raw cells

Raw cells are represented in a similar fashion:

​​​​```{jupyter.raw-cell}
​​​​---
​​​​raw_mimetype: text/html
​​​​---
​​​​<b>Bold text</b>
​​​​```

with the same syntax for parameters and metadata as for code-cells.

For compatibility with the Jupytext and Myst notebook formats, parsers may accept {raw-cell} instead of {jupyter.raw-cell}.

Text cells

Implicitly, the chunks of Markdown around and in between code/output/raw cells are considered as Markdown cells: thus, the whole document behaves as a single flowing Markdown document, interspersed with code/output/raw cells (same as MyST Markdown).

​​​​A text cell
​​​​```{jupyter.code-cell}
​​​​1 + 1
​​​​```
​​​​
​​​​Another text cell
​​​​```{jupyter.code-cell}
​​​​1+2
​​​​```

The chunks of Markdown may be broken up into several text cells by means of a thematic break +++ (as in MyST Markdown):

​​​​A text cell
​​​​+++
​​​​Another text cell

Text cell metadata can be provided by means of a YAML 1.2.2 block, shorthand notation, or a single-line JSON representation:

​​​​+++ { "slide": true }
​​​​A text cell
​​​​+++
​​​​---
​​​​foo: bar
​​​​---
​​​​Another text cell
​​​​+++
​​​​:foo: bar
​​​​A third text cell

Note that the leading thematic break does not introduce a leading empty text cell.
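
The splitting rule can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `split_text_cells` is hypothetical, the input is assumed to be a chunk of Markdown that contains no fenced blocks, and only the single-line JSON metadata form is handled (the YAML block and shorthand forms are left out).

```python
import json


def _make_cell(metadata, lines):
    """Build a minimal markdown cell dict from collected source lines."""
    return {
        "cell_type": "markdown",
        "metadata": metadata,
        "source": "\n".join(lines).strip("\n"),
    }


def split_text_cells(markdown):
    """Split a chunk of Markdown into text cells on '+++' break lines."""
    cells = []
    metadata, lines = {}, []
    started = False  # True once non-blank content or a break has been seen
    for line in markdown.splitlines():
        if line.startswith("+++"):
            # A leading break closes nothing, so it adds no empty first cell.
            if started:
                cells.append(_make_cell(metadata, lines))
            blob = line[3:].strip()
            metadata, lines = (json.loads(blob) if blob else {}), []
            started = True
        else:
            lines.append(line)
            started = started or bool(line.strip())
    if started:
        cells.append(_make_cell(metadata, lines))
    return cells
```

In this sketch a break at the very end of the chunk yields a trailing empty cell; whether that is the desired behavior is one of the points the proposal leaves open.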

Cell attachments

Cell attachments are embedded as fenced code blocks in the Markdown of the cell:

​​​​Here is some text.

​​​​And now ![an attachment](image.png).

​​​​```{jupyter.attachment}
​​​​:label: image.png
​​​​{json blurb}
​​​​```

For multiple attachments, use several fenced code blocks.

Implementation

  1. Generalize the nbformat specification to accept several serialization syntaxes.
  2. Implement an extension mechanism in nbformat so that:
    • extensions can register a pair of serializers/deserializers attached to a file extension;
    • nbformat chooses accordingly the appropriate serializers / deserializers.
  3. Implement a serializer/deserializer for Jupyter Markdown notebooks. Make this implementation a dependency of nbformat, and register it in nbformat for extension .nb.md.
  4. As needed, design a similar plugin mechanism for JavaScript-based tools (e.g. JupyterLab front-end).
  5. Let the Jupyter server serve .nb.md files with the application/x-ipynb+md MIME type and document the new MIME type in the Jupyter documentation.
  6. Review existing tools to check whether further adaptation is needed (e.g. they may have hard coded assumptions that a notebook file has the .ipynb extension). Interesting candidate: Pandoc.
  7. Encourage Jupytext to be refactored to use the above extension mechanism(s).
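
The extension mechanism of steps 2 and 3 could look roughly like the sketch below. Every name in it (`register_format`, `find_suffix`, `read_notebook`, `write_notebook`) is hypothetical and is not part of the current nbformat API; the sketch only shows one way to dispatch on the file extension so that a compound extension such as .nb.md takes precedence over a shorter one.

```python
import json
from pathlib import PurePath

# Hypothetical registry: maps a file suffix to a (reader, writer) pair.
_FORMATS = {}


def register_format(suffix, reader, writer):
    """Attach a deserializer/serializer pair to a file suffix."""
    _FORMATS[suffix] = (reader, writer)


def find_suffix(filename):
    """Return the longest registered suffix matching the file name, so that
    'foo.nb.md' resolves to '.nb.md' even if a plain '.md' handler exists."""
    name = PurePath(filename).name
    matches = [suffix for suffix in _FORMATS if name.endswith(suffix)]
    if not matches:
        raise ValueError(f"no notebook format registered for {filename!r}")
    return max(matches, key=len)


def read_notebook(text, filename):
    """Deserialize text with the reader registered for the file's suffix."""
    reader, _writer = _FORMATS[find_suffix(filename)]
    return reader(text)


def write_notebook(notebook, filename):
    """Serialize a notebook with the writer registered for the file's suffix."""
    _reader, writer = _FORMATS[find_suffix(filename)]
    return writer(notebook)


# In this sketch the JSON syntax is just one registered format among others.
register_format(".ipynb", json.loads, json.dumps)
```

A Markdown serializer would then be registered once, for example with `register_format(".nb.md", reader, writer)`, and tools calling `read_notebook` would not need to know which syntax is on disk.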

Rationale and alternatives

Prior art

Some narrative-centric text based notebook formats

In the Jupyter ecosystem, Jupytext lets users convert notebooks between different formats, including .ipynb and most of the aforementioned text-based formats. See the documentation, which nicely recaps the formats.

Some code-centric notebook formats

In these formats, the notebook is a code file that can be run as-is.

Existing formats that use # %% as a cell delimiter:

Implementations that use other delimiters:

None of these formats describe any way to encode outputs, metadata, or text cells.

Use Cases

Teaching Scenario

Course material tends to target people who are not Jupyter experts, to be narrative-heavy, to be authored iteratively and collaboratively as part of a larger body of material, and to involve lightweight computations. In this use case, the priority is therefore on human readability and writability, conciseness, statelessness, and compatibility with version control, text tools and other material (typically written in Markdown). Outputs and widget states are typically best discarded, which also saves space. Metadata is typically either handcrafted for dedicated tools (slides, grading tools, etc.) or best discarded. Although it is orthogonal to this JEP, rich text support is a must.

Authoring in notebooks

  • Users who author manuscripts, papers, and technical documents using Jupyter tend to produce notebooks that are predominantly text but contain outputs and code relevant to the document's content or publication.
  • Authors prefer to utilize notebooks as they allow results, figures, tables, and other computational outputs to be reproduced as an integral part of the document.
  • Authors work interactively and collaboratively, moving a draft manuscript toward a final polished article in steps including drafting, peer review, commenting, and revision.
  • Authors and their collaborators may be scientists or researchers in any scientific or technical field and typically can have various technical abilities.
  • Although the iterative nature of document preparation lends itself to a versioning process, the use of a version control system for sharing and collaboration may not be the norm, and collaboration typically occurs by sharing via mechanisms like cloud storage, email, etc.
  • Code relevant to the subject of the document may be limited (e.g. to key algorithms), so most code cells will likely be hidden in the final paper, and authors may work to minimize the amount of code directly included in the notebook.
  • Authoring scientific papers and technical manuscripts requires the use of rich document features such as rich text, cross-referencing, citations, equations, and figures with captions and numbering. Where these are not directly supported, authors may contrive to replicate them in the document manually.
  • Notebooks will be used as the content for a publication. This either involves the creation of a PDF that links to a published version of the notebook via Digital Object Identifier on a service such as Zenodo or directly publishing the notebook in HTML form.

Frontend serialization scenario

  • A user in JupyterLab saves a notebook “as text”, e.g. by setting its name to foo.nb.md.
  • JupyterLab then always saves foo.nb.md as a text file.
  • Better version control and diffing.
  • This text file is completely self-contained and 100% compatible with .ipynb. It can be shared as an .ipynb file. Outputs that can be expensive (e.g. GPU/HPC) or hard to reproduce (e.g. complex software stacks) are preserved. Widget states that may depend on non-reproducible user interaction are also preserved.
  • Opportunity for better stream-based loading.

Rationale

Frequently Asked Questions

What's the rationale for the support of lossless serialization of any notebook, when serializing large data chunks like outputs or attachments will anyway harm the readability of the file?

A successor to the current notebook format should allow current users to use the new format flawlessly.

Most of the current userbase creates their content in the notebook user interfaces, and picking one format over another in the preferences should not harm the ability to use existing extensions. If the new format does not preserve the current behavior, we will lose the confidence of our userbase.

Why support several syntaxes for metadata?

  • The one-line JSON blob syntax is compact and unobtrusive.
  • The YAML block syntax is human-readable and editable.
  • The shorthand colon syntax is human-readable, editable, and very compact at the price of not supporting nested metadata.

Enabling all three syntaxes supports both use cases where metadata is small and should be readable and editable and use cases where one wants to preserve metadata while making it as unobtrusive as possible. It also enables importing files that use either convention, helping with interoperability and migration.

How to validate the plain text format notebooks, especially against the emerging ideas around including JSON schemas for validation?

Serialize to JSON and validate the JSON.

What happens if people insert text (or any whitespace) between a cell's input and output blocks(s)?

The output block(s) will still be recognised provided that only whitespace characters are inserted in between.

How do we split a large body of markdown into several markdown cells (in other words, can we have cell breaks)?

Use thematic breaks +++. These allow individual markdown cell boundaries to be identified and can include metadata, enabling a lossless roundtrip between the text-based and ipynb formats.

How to store large widget states? With the current format, widget states will be stored in the notebook metadata, that is in the YAML header, which will soon become very large. Should widget states be moved to outputs instead?

Large widget state is notebook metadata. The requirement on back and forth convertibility gives an indication of where this goes. Also, we cannot store it in widget output because outputs only hold views of widget state, and the same widget can be displayed multiple times.

Unresolved questions

The following parts of the design are expected to be resolved through the JEP process before it gets merged:

  • Final pinning of the syntax of the info-string for fenced cells: {jupyter.code-cell}, {code-cell}, {jupyter:code-cell}, {.code}
    • having a namespaced directive is a good idea;
    • one can optionally prefix the info-string with a language name, as in python {jupyter.code-cell}, as a hint for syntax highlighting in markdown viewers and editors. This language name is purely advisory for markdown editors, and carries no semantic meaning for Jupyter.
  • How far do we want to support closely related formats for interoperability and ease of transition?
  • Sylvain: how do we split a large body of markdown into several markdown cells (in other words, can we have cell breaks)? Presumably, an .ipynb composed only of markdown cells should be convertible back and forth and be split into cells in the same way.
  • The final decision about file extension .nb.md.
  • Currently, all of the examples above start a notebook with some markdown content, without an initial thematic break. This assumes that any initial content implicitly defines a markdown cell. However, what happens when the first cell in the notebook is a code cell? If there are newlines or whitespace between the notebook frontmatter block and the first code cell, is a markdown cell inserted there or not? How would someone insert an empty markdown cell as the first cell in the notebook: by inserting an explicit thematic break?

Other open questions

Would there be possible programming languages that conflict with the metadata syntax for cells? For example, a programming language that has syntax like :variable: value?

Future possibilities

The following issues and lines or actions are out of scope for this JEP and could be addressed in the future independently of the solution(s) that comes out of this JEP:

  • Markdown syntax within rich text cells: enrich the current markdown flavor? Support alternative flavors like MyST? The meaning of "Markdown" in terms of Jupyter's support for it within the format is the subject of another JEP; see this issue to track that discussion.
  • Syntax for output previews
  • Enable indirect storage of outputs and attachments: external URL, content-id url to another part of the same multipart mime bundle, reference to some attachment at the end of the notebook file. See this document discussing a potential JEP
  • Encourage tooling or configuration thereof to be opinionated about how certain pieces of information should be handled upon saving, to support various use cases. E.g.
    • specify that outputs and cell ids shall be discarded when readability and conciseness is the priority.
    • specify that outputs should be stored at the end
    • specify that metadata should be stored concisely as one-line json blobs
  • After some time and experimentation, define official notebook metadata specifying how these pieces of information should be handled upon saving.
  • Standardize ways to provide directory-wide / project-wide / … notebook metadata (typical use cases: specify that outputs shall be discarded for all notebooks in a directory; define project-wide RISE configuration, …)

Discussions (won't be in the JEP)

Should we be opinionated about blessing a single format, or should we just encourage multiple text-based formats that already are in existence?

  • One of our roles in the community is to be opinionated about a format so people can collaborate and tooling can be developed without fragmentation
  • Sometimes being opinionated is important; to serve as a canonical reference for e.g. formats, or specifications. Is the intention of text-based notebooks to be a storage medium, or an authoring workflow? If it's the latter, it seems like choice is the most important thing, just as users like choice of multiple different frontends. If it's storage, then it's clear we want a specified format. If it's to improve VCS, that seems like an orthogonal problem - nbdime offers one approach for taking existing ipynb blobs and diffing them.
  • Currently jupytext (as an example) is an implementation-defined "standard". It's important to have a format that is not dependent on a single implementation's decisions.
  • Perhaps the "standard" is a Jupyter-wide interface to read/write nbformat objects to various filetypes, i.e. jupytext is one implementation of this interface.
  • With this JEP, we are not restricting freedom of choice for users

Variants for code cell formats

​​​​```{code-cell} ipython3
​​​​---
​​​​id: 12344
​​​​exec_nt: 3
​​​​metadata:
​​​​  nbgrader:
​​​​    grade: true
​​​​    grade_id: cell-963f3a9626ae1519
​​​​    locked: true
​​​​    points: 1
​​​​    schema_version: 3
​​​​    solution: false
​​​​    task: false
​​​​---
​​​​assert ultime == 42
​​​​```

​​​​```{code-cell} id=12344 execution_count=3
​​​​---
​​​​nbgrader:
​​​​  grade: true
​​​​  grade_id: cell-963f3a9626ae1519
​​​​  locked: true
​​​​  points: 1
​​​​  schema_version: 3
​​​​  solution: false
​​​​  task: false
​​​​---
​​​​assert ultime == 42
​​​​```

​​​​```{code-cell } ipython3
​​​​---
​​​​id: 12344
​​​​exec_nt=3
​​​​nbgrader:
​​​​  grade: true
​​​​  grade_id: cell-963f3a9626ae1519
​​​​  locked: true
​​​​  points: 1
​​​​  schema_version: 3
​​​​  solution: false
​​​​  task: false
​​​​---
​​​​assert ultime == 42
​​​​```

​​​​```{code-cell} ipython3
​​​​---
​​​​attributes:
​​​​  id: 12344
​​​​  exec_nt=3
​​​​nbgrader:
​​​​  grade: true
​​​​  grade_id: cell-963f3a9626ae1519
​​​​  locked: true
​​​​  points: 1
​​​​  schema_version: 3
​​​​  solution: false
​​​​  task: false
​​​​---
​​​​assert ultime == 42
​​​​```

With the above, code cells don't have syntax highlighting. Some markdown highlighters (intellij) highlight the code correctly if the language name is written directly after backticks:

​​​​```python {something} 
​​​​import sys
​​​​2 + 2
​​​​```

This is valid syntax for CommonMark, but not for MyST. MyST suggests writing it like this:

​​​​```{something} python 
​​​​import sys
​​​​2 + 2
​​​​```

However, neither intellij nor vscode (in a simple markdown file) support it.

Variants for cell output formats

Note: most text formats don't support storing cell outputs. If text-based formats are mainly useful for "authoring", then maybe we want out-of-band outputs? I.e. perhaps we want a JEP to specify how out-of-band data are stored.

print(3)
1+1
3
{json blurb}

As a short hand, we could support

2

that would be automatically translated to {json min/plain-text}

Traceback: ....

Possible output block formats

Here we are making use of directive arguments to show the type of the output and reserving the YAML frontmatter block for the contents of the top level metadata key

```{jupyter.output} stream
---
name: stdout
---
This is the stream content that was in the *text* field of the
original json output
```

```{jupyter.output} error
---
ename: ReferenceError
evalue: x is a unknown
---
The *traceback* field rendered as content
```

```{jupyter.output} display_data
---
mdkey: value
---
{ "text/plain": "some text data" }
```

```{jupyter.output} execute_result execute_count=42
---
some_metadata_key: 'value'
---
{ "image/png": base64-image-text }
```

Attachments

+++
asdasdf
as
sadf


```{jupyter.attachment}
{'foo.png': {json blurb}
 'bar.png': {json blurb}
```

```{jupyter.attachment}
{'foo.png': {json blurb}
 'bar.png': {json blurb}
```

```{jupyter.attachment}
:label: foo.png
{json blurb}
```

```{jupyter.attachment}
:label: foo.png
{json blurb}
```

```{jupyter.attachment} foo.png
{json blurb}
```

Metadata

Metadata in IPYNB format can be a nested data structure, thus a flat key-value format doesn't fit our needs.

Should metadata be YAML?

Which flavor of Markdown?

  • CommonMark
  • Github Flavored Markdown
  • Myst

This discussion is more about cell contents. The syntax here is simple enough that we barely need to extend beyond CommonMark

How ambitious should the proposal be?

Level 0. Pick/define/refine an official alternative text-based serialization syntax to be seamlessly supported by most tools in the ecosystem (e.g. all those that use nbformat).

Level 1. Empower the community to implement, reuse, and experiment with alternative serialization syntaxes to be seamlessly supported by most tools in the ecosystem, assuming appropriate extensions are installed. Shepherd the process, pick the most promising alternatives, and make them official.

Pros of level 0:

  • A single standard authorized by the official Jupyter committee allows everyone in the world to create their own serializer and deserializer that will be compatible with everyone else's serializer/deserializer.

Cons of level 0:

  • while it is quite easy to define a text-based serialization format, choosing at this stage one that is or will be widely adopted could be tricky

Pros of Level 1:

  • The work needed is roughly similar in both cases (supporting two or more serialization syntaxes is not much different).
  • It helps interoperability with many existing notebook formats out there, text-based or not
  • Implementing a new format is relatively lightweight: writing a new serializer/deserializer boils down to splitting into cells and parsing metadata; it presumably needs to be done both in Python and JavaScript to be usable by most
  • Promotes engagement of the community and organic evolution toward whichever solution fits most needs.

Potential caveats of Level 1:

  • This could lead to a proliferation of formats
    past experience with Jupytext seems to suggest that this won't be too bad: there is a natural incentive to stick to widely used / supported formats.
  • Portability depends on installed extensions
    that's already the case for many Jupyter features
  • This JEP is responsible for defining a new format of the notebook file. The Level 1 suggestion implies delegation of that responsibility to someone else, and the Jupyter committee loses control over the format.

    [nt] even with Level 1, deciding which format is made official is in the hands of the Jupyter committee.

    [vl] Then, it must be emphasized that Jupyter must be able to open only officially accepted notebook file formats. Any notebook file with a third-party format must be considered as invalid. Should anyone be able to work with that format, they have to use their own fork of Jupyter.
    I don't see why one should actively prevent loading third-party formats. If a user chooses an official format, they know that it comes with guarantees. Again, it's just like any Jupyter extension: using official or widely used extensions gives you guarantees, but you can still use others at your own risk.
    [vl] That makes sense. However, it would be enough to have a different filename extension in that case, without describing the contents of the file.

    [nt] Definitely, each syntax should use a different extension
    Also, in that case, the proposal for additional third-party format must be accepted along with the new default official format. Either both are accepted, or none.
    So your suggestion would be to have two separate JEPs?
    We have to separate JEPs anyway, simply because this file is too big and contains different ideas. At least, that is what I was told.
    Agreed. The only piece that makes me hesitate is: which JEP should come first?
    The new official notebook format may exist with the ability to use third-party formats. However, the ability to use third-party formats may not exist without the official notebook format. Otherwise, all negative consequences around this text can be applied. So, I suggest making the new output official notebook format the first.
    [nt] I see the political reason. I guess my hesitation point is whether the landscape is mature enough at this stage to set the official format in stone, or whether an experimentation period in which the community can explore would help shape the official format.
    Then, the JEP should define what is the experimentation period and what must be done after that.
    [nt] Open question: if we start with just two formats: is there a risk that the plugin mechanism that will be implemented will be "hardcoded" and then hard to generalize after the fact?

  • It's not clear if the community wants to be empowered to create their own formats. Instead, it can turn out that almost everyone doesn't care about the notebook format. Nevertheless, even if no one in the world actually implements their own alternative notebook format, other developers can't know that and have to expect appearance of some previously unknown format in their software.

    [nt] under Level 0, developers of a tool already have to ensure that it supports both the ipynb and the text format; once this is done, supporting more is for free. I see: you mean in the case they can't use the community provided parsers.

    [nt] Not exactly. In case of Level 0 developers know how to parse two different formats with well-defined structure and good documentation. In case of Level 1 developers stumble upon unknown number of unknown formats which may have no documentation.
    [nt] that's part of the selection process: a format that's not documented or is not provided with good parsers won't be adopted by tools; if users care about a given format, and want tools to work, they should make sure that their format is high quality.

  • The opposite case is possible as well. There can appear too many new formats, and it would be impossible to choose the best one and deprecate the others.
  • While Jupyter is a well-known and trusted brand, developers of external formats may not be as well known. Many enterprises allow only white-listed software to be installed, and enterprises' security teams may prohibit usage of a new Jupyter format that requires the installation of unknown, untrusted software.

    [nt] yes, that's true of any Jupyter extension. It's part of the selection process in the ecosystem.
