Jason Grout
---
title: Official support for Markdown-based notebooks
authors: Nicolas M. Thiéry @nthiery, Andrey Lisin @avli, Steve Purves @stevejpurves
contributors: Jason Grout @jasongrout, Sylvain Corlay @sylvain.corlay, Vladimir Lagunov @vladimirlagunov
issue-number: <pre-proposal-issue-number>
pr-number: <proposal-pull-request-number>
date-started: <2023-03-01>
---

# JEP: official support for Markdown-based notebooks

## Summary

<!-- *One paragraph explanation of the proposal.* -->

This JEP proposes an alternative Markdown-based serialization syntax for Jupyter notebooks, with the file extension `.nb.md`, to be adopted as an official standard by the Jupyter community, and describes the steps needed for most tools in the ecosystem to support it.

It is meant as one of several steps towards offering flexibility in how notebooks are represented, in order to simultaneously:

- support an extensive range of use cases, with various balances of priority between conciseness, readability, and lossless conversion to and from `.ipynb` files;
- foster standardization by being opinionated on some of the low-level choices.

## Motivation

<!-- *Why are we doing this? What use cases does it support? What is the expected outcome?* -->

The [Jupyter notebook format](https://nbformat.readthedocs.io/en/latest/) is currently defined by a data structure, a serialization syntax (JSON), and a syntax for rich text cells (some variant of Markdown). This format has tremendously helped the community by providing a lingua franca for exchanging computational narratives.

Yet over the years, the community has repeatedly expressed the need for <!-- TODO: some citations -->:

1. A simple mental model for end users: "A notebook is just a glorified Markdown file".
2. Human readability and editability of the plain file.
3. Natural interoperability with version control systems and [software forges](https://en.wikipedia.org/wiki/Forge_(software)) (GitHub, GitLab, etc.), e.g.
   readable diffs and merges, and quick online editing with the forges' web editors.
4. Interoperability with standard text tools to browse, edit, power-edit, author, mass search and replace, tags, macros, etc.
5. Interoperability with existing notebook formats.
6. Efficient handling of large data blobs like outputs. <!-- Not necessarily in the scope for this specific JEP, but raises similar needs as human readability -->
7. Streaming: enable progressive loading of a notebook, where a partially received file remains usable/viewable (this is not possible with JSON).
8. Natural integration into larger bodies of content: e.g. books built out of a combination of plain Markdown files and notebooks.
9. Seamless use of complex IDEs like PyCharm or Visual Studio Code and of complex text editors like Vim or Emacs, which are optimized for working with text files.

Meanwhile, there is a long track record of using text-based notebooks, both outside the Jupyter ecosystem (narrative-centric: R Markdown, org-mode, and others; code-centric: [MATLAB](https://www.mathworks.com/products/matlab.html), [Visual Studio Code](https://code.visualstudio.com/docs/python/jupyter-support-py#_jupyter-code-cells), [Spyder](https://docs.spyder-ide.org/3/editor.html#defining-code-cells), [PyCharm and DataSpell](https://www.jetbrains.com/help/pycharm/matplotlib-support.html#console)), and within the Jupyter ecosystem, notably with [Jupytext](https://jupytext.readthedocs.io/) and [Jupyter Book](https://jupyterbook.org/). The wide adoption of such solutions highlights their suitability for many use cases.

Though the existing text-based formats go a long way toward meeting the needs of the community, they share a significant pain point: the inability to represent outputs and attachments. Other pain points are:

- Jupytext needs to hack its way into the Jupyter(Lab) contents manager to support opening and saving text notebooks seamlessly; this also requires some configuration steps from the user;
- most other tools in the ecosystem (e.g.
  nbconvert, nbgrader) can't read or write text notebooks, forcing users to convert their notebooks back and forth to the `.ipynb` format;
- there are many text notebook formats and implementations out there, each of which brings a combination of coupled choices between distinct aspects: which syntax is used for serialization, which syntax is used for rich text, which information can be or is stored, etc. Properly decoupling these aspects to maximize flexibility, together with opinionated standardization on the basics, would help the community cover the variety of use cases;
- content-type discovery (Should the file be considered as a notebook? If yes, in which format?) currently relies on a combination of looking at the file extension and at the metadata inside the file.

<!-- Note: [nt] moved the use cases to the rationale section to quickly jump to the core of the JEP -->

## Guide-level explanation

<!--
Explain the proposal as if it was already implemented and you were explaining it to another community member. That generally means:

- Introducing new named concepts.
- Adding examples for how this proposal affects people's experience.
- Explaining how others should *think* about the feature, and how it should impact the experience using Jupyter tools. It should explain the impact as concretely as possible.
- If applicable, provide sample error messages, deprecation warnings, or migration guidance.
- If applicable, describe the differences between teaching this to existing Jupyter members and new Jupyter members.

For implementation-oriented JEPs, this section should focus on how other Jupyter developers should think about the change, and give examples of its concrete impact. For policy JEPs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms.
-->

This JEP provides a standard syntax for representing a Jupyter notebook as a Markdown file. We call such a file a Markdown Jupyter notebook.

Here is a minimal Markdown Jupyter notebook that could typically be authored by hand:

````markdown
---
metadata:
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

# A minimal Markdown Jupyter notebook

This is a text cell

```{jupyter.code-cell}
1+1
```

This is another text cell

+++

And another one
````

Note that this file contains only the minimal information required to reconstruct a valid notebook. In particular, there are no cell IDs, outputs, or execution counts.

A Markdown Jupyter notebook containing a lossless representation of a full-featured Jupyter notebook, with (cell) metadata, outputs, attachments, etc., is too long to reproduce here; it has been posted [in this example repository](https://github.com/stevejpurves/jep-text-based-format-example) along with the accompanying `.ipynb` file.

## Reference-level explanation

### Design goals

The proposed syntax was designed to satisfy the following requirements:

- The syntax should allow for lossless serialization of any Jupyter notebook [data structure](https://nbformat.readthedocs.io/en/latest/format_description.html). This includes:
  - notebook metadata;
  - text and code cells, with metadata and parameters (e.g. cell IDs);
  - outputs, with MIME types, metadata, etc.;
  - cell attachments;
  - widget states.
- The serialized notebook should be a valid Markdown file.
- It should aim for reasonable human readability and editability: text, code, raw cells, and metadata should be human-readable and editable. Large chunks of data like outputs or attachments should be as unobtrusive as possible.
- It should aim for reasonable support by version control.
- It should aim for reasonable rendering by typical Markdown viewers.
- It should be similar to existing popular formats; ideally, it should be one of them.

In addition, the following are good to have:

- Enable reading similar notebook formats to ease the transition.
- Enable third-party extensions to implement and declare alternative serialization syntaxes so that most tools natively support them. This helps with interoperability: think of a third-party extension that treats R Markdown notebooks as native Jupyter notebooks.

This section describes the proposed syntax for serializing Jupyter notebooks in Markdown. Then, we detail the steps needed for this syntax to be supported by most tools in the Jupyter ecosystem.

<!--
This is the technical portion of the JEP. Explain the design in sufficient detail that:

- Its interaction with other features is clear.
- It is reasonably clear how the feature would be implemented.
- Corner cases are dissected by example.

The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.
-->

### Serialization syntax description

#### Top-level structure

A Markdown Jupyter notebook consists of an optional metadata header followed by Markdown representing a sequence of text cells, code cells, outputs, raw cells, etc.

#### Metadata header

The notebook metadata is represented by a YAML 1.2.2 header at the top of the document, surrounded by `---` delimiters:

```yaml
---
metadata:
  kernel_info:
    name: the name of the kernel
  language_info:
    name: the programming language of the kernel
    version: the version of the language
    codemirror_mode: The name of the codemirror mode to use [optional]
nbformat: 4
nbformat_minor: 0
---
```

The metadata structure mirrors that of the [Jupyter notebook format](https://nbformat.readthedocs.io/en/latest/format_description.html).

#### Code cells

Markdown Jupyter notebooks use fenced code blocks with backticks to represent code cells (as in Pandoc, Jupytext Markdown, and MyST Markdown):

```{jupyter.code-cell}
print('hi')
```

where the [info string](https://spec.commonmark.org/0.30/#info-string) `{jupyter.code-cell}` specifies that this is a code cell.
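
To make the top-level structure concrete, here is a minimal Python sketch that splits a document into its optional metadata header and a sequence of Markdown and `{jupyter.code-cell}` cells. It is illustrative only, under stated assumptions: the function name `split_notebook` is hypothetical, the header is returned as raw text (a real reader would hand it to a YAML 1.2 parser), and info-string parameters, cell metadata blocks, outputs, raw cells, and `+++` breaks are not interpreted. The fence string is built programmatically so the snippet can itself sit inside a Markdown code block.

```python
import re

# Three backticks, built programmatically so this snippet can sit inside a Markdown fence.
FENCE = "`" * 3

# Opening line of a code cell: {jupyter.code-cell ...} or the {code-cell ...} alias.
# Any parameters after the directive name are accepted but not interpreted here.
CODE_CELL_OPEN = re.compile(r"^" + FENCE + r"\{(?:jupyter\.)?code-cell\b.*\}\s*$")


def split_notebook(text):
    """Return (header, cells) for a Markdown Jupyter notebook (hypothetical helper).

    header: raw YAML text between the leading '---' delimiters, or None.
    cells: list of (cell_type, source) tuples, cell_type being "markdown" or "code".
    """
    lines = text.splitlines()
    header = None
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":  # closing delimiter of the header
                header = "\n".join(lines[1:i])
                lines = lines[i + 1:]
                break

    cells, text_buf, code_buf = [], [], None

    def flush_text():
        chunk = "\n".join(text_buf).strip()
        if chunk:  # whitespace-only chunks do not become text cells
            cells.append(("markdown", chunk))
        text_buf.clear()

    for line in lines:
        if code_buf is None and CODE_CELL_OPEN.match(line):
            flush_text()  # the text seen so far forms one text cell
            code_buf = []
        elif code_buf is not None and line.strip() == FENCE:
            cells.append(("code", "\n".join(code_buf)))
            code_buf = None
        elif code_buf is not None:
            code_buf.append(line)
        else:
            text_buf.append(line)
    flush_text()
    return header, cells


# Usage on a variant of the minimal example above (without the +++ break).
doc = "\n".join([
    "---",
    "metadata:",
    "  kernelspec:",
    "    name: python3",
    "---",
    "# A minimal Markdown Jupyter notebook",
    "",
    "This is a text cell",
    "",
    FENCE + "{jupyter.code-cell}",
    "1+1",
    FENCE,
    "",
    "This is another text cell",
])
header, cells = split_notebook(doc)
print(header)
print(cells)
```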

Cell parameters `execution_count` and `id` must be encoded as `key=value` pairs in the info string when specified:

```{jupyter.code-cell execution_count=N id=...}
print('hi')
```

Cell metadata, if present, can be represented by an optional YAML 1.2.2 block between `---` delimiters at the beginning of the code block (same as MyST Markdown):

```{jupyter.code-cell execution_count=42 id=1234abcd}
---
key:
  more: true
tags: [hide-output, show-input]
---
print('hi')
```

Alternatively, non-nested metadata may be represented using the [*shorthand option syntax*](https://myst-parser.readthedocs.io/en/latest/syntax/roles-and-directives.html#parameterizing-directives) (same as MyST Markdown):

```{jupyter.code-cell}
:tags: [hide-output, show-input]
print('hi')
```

<!-- TODO: should we instead accept the colon syntax for any metadata? It admits a nice and simple formalization "alternatively, metadata may be represented as a colon prefixed YAML block"; this is more general, simple to describe, but deviates slightly from myst -->

Finally, metadata may also be represented by a single-line JSON blob in the info string:

```{jupyter.code-cell metadata={"tags": ["hide-output", "show-input"]}}
print("Hello!")
```

<!-- TODO: proofread the above syntax for single line json blob metadata; do we all agree on it? -->
<!-- Sylvain: tags are part of the current metadata. Special-casing them outside of the metadata may end up in conflicts and complications. -->

For compatibility with the Jupytext and MyST notebook formats, parsers may accept `{code-cell}` instead of `{jupyter.code-cell}`.

<!-- Sylvain: I would like to remove compatibility with {code-cell} since there are not many myst notebooks at the moment compared to the crazy numbers of jupyter notebooks -->

#### Code cell outputs

Once executed, a code cell may have zero or more outputs. When stored, a code cell's outputs appear immediately after it.
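
Both code cells and the output blocks described next carry `key=value` parameters in their info string (e.g. `execution_count=42 id=1234abcd` or `output_type=stream`), and cells may begin with shorthand `:key: value` option lines. The following Python sketch parses both. Its assumptions are explicit: the helper names are hypothetical; typing only `execution_count` as an integer and JSON-decoding values that start with `{`, `[`, or `"` are choices made here, not rules stated by this JEP; and shorthand values stay raw strings because decoding them needs a YAML parser. A line-based option parser like this one would also misread code that genuinely starts with `:name: value`, which is the conflict raised under "Other open questions".

```python
import json
import re

# "{name params...}" as found in an info string, e.g. "{jupyter.output output_type=stream}".
DIRECTIVE = re.compile(r"^\{(?P<name>[\w.-]+)(?P<params>.*)\}$")
# Shorthand option line such as ":tags: [hide-output, show-input]".
OPTION = re.compile(r"^:([\w-]+):\s*(.*)$")
# Assumption: only execution_count is typed as an integer; other bare values stay strings.
INT_PARAMS = {"execution_count"}


def split_params(s):
    """Split on whitespace that lies outside JSON strings, objects, and arrays."""
    tokens, current, depth, in_string, escaped = [], [], 0, False, False
    for ch in s:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        elif ch.isspace() and depth == 0:
            if current:  # top-level whitespace ends the current token
                tokens.append("".join(current))
                current = []
            continue
        current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def parse_info_string(info):
    """Return (directive_name, params) for e.g. '{jupyter.code-cell id=abc}' (hypothetical)."""
    m = DIRECTIVE.match(info.strip())
    if m is None:
        raise ValueError(f"not a directive info string: {info!r}")
    params = {}
    for token in split_params(m.group("params")):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        if value.startswith(("{", "[", '"')):
            params[key] = json.loads(value)  # e.g. metadata={"tags": [...]}
        elif key in INT_PARAMS:
            params[key] = int(value)  # e.g. execution_count=42
        else:
            params[key] = value  # e.g. output_type=stream, id=1234abcd
    return m.group("name"), params


def split_shorthand_options(body_lines):
    """Peel leading ':key: value' lines off a cell body; values are left as raw strings."""
    options = {}
    i = 0
    while i < len(body_lines):
        m = OPTION.match(body_lines[i])
        if m is None:
            break
        options[m.group(1)] = m.group(2)
        i += 1
    return options, body_lines[i:]


print(parse_info_string("{jupyter.code-cell execution_count=42 id=1234abcd}"))
print(parse_info_string('{jupyter.code-cell metadata={"tags": ["hide-output", "show-input"]}}'))
print(split_shorthand_options([":tags: [hide-output, show-input]", "print('hi')"]))
```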

The syntax resembles that of a code cell, with additional provisions for the different types of output specified in the `.ipynb` format: `stream`, `error`, `execute_result`, and `display_data`. All types include the `output_type` field, which is encoded as an `output_type=...` parameter in the info string on the first line of the directive.

##### `output_type: stream`

The JSON format of a `stream` output includes two additional fields, `name` and `text`. The value of the `text` field can potentially be long, so it is reproduced in the body of the directive to improve readability.

````
# .ipynb
```
{
  "output_type": "stream",
  "name": "stdout",
  "text": [
    "This is the stream content that was in the *text* field\n",
    "of the original json output\n"
  ]
}
```
````

````
# text-based .md
```{jupyter.output output_type=stream}
---
name: stdout
---
This is the stream content that was in the *text* field
of the original json output
```
````

##### `output_type: error`

The JSON format of an `error` output includes three additional fields, `ename`, `evalue`, and `traceback`. The value of the `traceback` field is reproduced in the body of the directive to improve readability.

````
# .ipynb
```
{
  "output_type": "error",
  "ename": "ReferenceError",
  "evalue": "x is unknown",
  "traceback": [
    "The *traceback* field rendered as content\n"
  ]
}
```
````

````
# text-based .md
```{jupyter.output output_type=error}
---
ename: ReferenceError
evalue: x is unknown
---
The *traceback* field rendered as content
```
````

##### `output_type: display_data` and `output_type: execute_result`

These two output types are both "MIME bundles" and share a similar structure, with the output data stored in the `data` field. Outputs of type `execute_result` contain an additional `execution_count` field.

Consider, for example, these two outputs as represented in the original JSON `.ipynb` format:

```
{
  "output_type": "display_data",
  "metadata": {
    "some-metadata-key": "some-value"
  },
  "data": {
    "text/html": "<div>Some HTML Content</div>",
    "image/png": "base-64-encoded-image"
  }
},
...,
{
  "output_type": "execute_result",
  "execution_count": 42,
  "metadata": {
    "some-metadata-key": "some-value"
  },
  "data": {
    "text/html": "<div>Some HTML Content</div>",
    "image/png": "base-64-encoded-image"
  }
}
```

These outputs are represented as follows in Markdown:

````
```{jupyter.output output_type=display_data}
---
some-metadata-key: some-value
---
{"text/html": "<div>Some HTML Content</div>"}
{"image/png": "base-64-encoded-image"}
```

```{jupyter.output output_type=execute_result execution_count=42}
---
some-metadata-key: some-value
---
{"text/html": "<div>Some HTML Content</div>"}
{"image/png": "base-64-encoded-image"}
```
````

Explanations:

- Output metadata, if present, is represented as a YAML block at the top of the directive.
- The MIME-type-keyed entries of the output's `data` attribute are represented as individual JSON objects, following the JSON Lines format: each MIME type occupies a separate line and is serialized without any newlines, which improves the behavior of line-based diffs. Organizing the MIME type data into separate single-line objects improves readability and ensures that each line is a valid, self-contained JSON object. When parsing the directive, a **_merge_** operation should be performed to construct a single `data` object containing all MIME type keys.
- Other output attributes, like `output_type`, or `execution_count` for `execute_result` outputs, are represented in the info string of the directive.
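
The one-object-per-line layout and the merge step can be sketched with Python's standard `json` module. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the YAML metadata block and the info string are assumed to be handled elsewhere, and rejecting a MIME type that appears twice is a design choice made here, since the JEP only states that the entries are merged.

```python
import json


def dump_mime_bundle(data):
    """Write an output's `data` dict as one single-line JSON object per MIME type."""
    # json.dumps without indent produces a single line; newlines inside string
    # values are escaped, so every entry stays on its own line.
    return "\n".join(json.dumps({mime: value}) for mime, value in data.items())


def load_mime_bundle(body):
    """Merge the one-object-per-line body of a directive back into a single `data` dict."""
    data = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        if not isinstance(entry, dict):
            raise ValueError(f"expected a JSON object, got {line!r}")
        duplicates = data.keys() & entry.keys()
        if duplicates:  # assumption: a MIME type may appear only once
            raise ValueError(f"duplicate MIME types: {sorted(duplicates)}")
        data.update(entry)
    return data


bundle = {
    "text/html": "<div>Some HTML Content</div>",
    "image/png": "base-64-encoded-image",
}
body = dump_mime_bundle(bundle)
print(body)
# {"text/html": "<div>Some HTML Content</div>"}
# {"image/png": "base-64-encoded-image"}
assert load_mime_bundle(body) == bundle  # lossless round trip
```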

<!-- [sp] adding this based on discussions at the end of the workshop, -->

#### Raw cells

Raw cells are represented in a similar fashion:

```{jupyter.raw-cell}
---
raw_mimetype: text/html
---
<b>Bold text</b>
```

with the same syntax for parameters and metadata as for code cells.

For compatibility with the Jupytext and MyST notebook formats, parsers may accept `{raw-cell}` instead of `{jupyter.raw-cell}`.

#### Text cells

Implicitly, the chunks of Markdown around and in between code/output/raw cells are considered Markdown cells: thus, the whole document behaves as a single flowing Markdown document, interspersed with code/output/raw cells (same as MyST Markdown):

````markdown
A text cell

```{jupyter.code-cell}
1 + 1
```

Another text cell

```{jupyter.code-cell}
1+2
```
````

The chunks of Markdown may be broken up into several text cells by means of a `+++` [**block break**](https://myst-tools.org/docs/spec/blocks#block-breaks), as in MyST Markdown. Note that `+++` is not a CommonMark [thematic break](https://spec.commonmark.org/0.30/#thematic-breaks), so typical Markdown viewers render it as plain text:

````markdown
A text cell

+++

Another text cell
````

Text cell metadata can be provided by means of a YAML 1.2.2 block, the shorthand notation, or a single-line JSON representation:

````markdown
+++ {"slide": true}
A text cell

+++
---
foo: bar
---
Another text cell

+++
:foo: bar
A third text cell
````

Note that a leading block break does not introduce a leading empty text cell.

#### Cell attachments

Cell attachments are embedded as fenced code blocks in the Markdown of the cell:

````markdown
Here is some text. And now ![an attachment](image.png).

```{jupyter.attachment}
:label: image.png
{json blob}
```
````

For multiple attachments, use several fenced code blocks.

### Implementation

1. Generalize the `nbformat` specification to accept several serialization syntaxes.
2. Implement an extension mechanism in `nbformat` so that:
   - extensions can register a pair of serializers/deserializers attached to a file extension;
   - `nbformat` chooses the appropriate serializer/deserializer accordingly.
3. Implement a serializer/deserializer for Markdown Jupyter notebooks. Make this implementation a dependency of `nbformat`, and register it in `nbformat` for the `.nb.md` extension.
4. As needed, design a similar plugin mechanism for JavaScript-based tools (e.g. the JupyterLab frontend).
5. Let the Jupyter server serve `.nb.md` files with the `application/x-ipynb+md` MIME type, and document this new MIME type in the [Jupyter documentation](https://docs.jupyter.org/en/latest/reference/mimetype.html?#custom-mimetypes-used-in-jupyter-and-ipython-projects).
6. Review existing tools to check whether further adaptation is needed (e.g. they may have hard-coded assumptions that a notebook file has the `.ipynb` extension). Interesting candidate: Pandoc.
7. Encourage Jupytext to be refactored to use the above extension mechanism(s).

## Rationale and alternatives

<!--
- *Why is this choice the best in the space of possible designs?*
- *What other designs have been considered and what is the rationale for not choosing them?*
- *What is the impact of not doing this?*
-->

## Prior art

<!--
Discuss prior art, both the good and the bad, in relation to this proposal. A few examples of what this can include are:

- Does this feature exist in other tools or ecosystems, and what experience have their community had?
- For community proposals: Is this done by some other community and what were their experiences with it?
- For other teams: What lessons can we learn from what other communities have done here?
- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.

This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your JEP with a fuller picture. If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages.
-->

### Some narrative-centric text-based notebook formats

- [R Markdown](https://bookdown.org/yihui/rmarkdown/)
- [Quarto](https://quarto.org/)
- [MyST Markdown notebooks](https://myst-nb.readthedocs.io/en/v0.9.0/use/markdown.html). An example of a notebook converted to a MyST notebook can be seen [here](https://gist.github.com/stevejpurves/8c6d129c7bb8b0dacb8460f0e42582c2).
- `org-mode` notebooks: look for a notebook in https://orgmode.org/features.html and [this discussion](https://news.ycombinator.com/item?id=16842786);
- ReST: e.g. all the Sage documentation was written in ReST and could be converted to Sage notebooks, and later, in a lossy way, to Jupyter notebooks with [rst2ipynb](https://pypi.org/project/rst2ipynb/) (lossy because ReST and Sage notebooks allowed for cells nested anywhere in the document structure).

In the Jupyter ecosystem, [Jupytext](https://jupytext.readthedocs.io/) lets users convert notebooks between different formats, including `.ipynb` and most of the aforementioned text-based formats. See its documentation, which nicely recaps the [formats](https://jupytext.readthedocs.io/en/latest/formats.html).

### Some code-centric notebook formats

In these formats, the notebook is a code file that can be run as-is.

Existing formats that use `# %%` as a cell delimiter:

- [Jupytext percent format](https://jupytext.readthedocs.io/en/latest/formats.html#the-percent-format)
- [Visual Studio Code](https://code.visualstudio.com/docs/python/jupyter-support-py#_jupyter-code-cells)
- [Spyder](https://docs.spyder-ide.org/3/editor.html#defining-code-cells)
- [PyCharm and DataSpell](https://www.jetbrains.com/help/pycharm/matplotlib-support.html#console)

Implementations that use other delimiters:

- [Jupytext light format](https://jupytext.readthedocs.io/en/latest/formats.html#the-light-format) uses `# +` in Python and Julia scripts.

None of these formats can encode outputs; support for metadata and text cells varies between them and, where available, relies on comment-based conventions.

### Use Cases

#### Teaching Scenario

Course material tends to target non-Jupyter experts, to be narrative-heavy, to be authored iteratively and collaboratively as part of a larger body of material, and to involve lightweight computations. Therefore, in this use case, the priority is on human readability and writability, conciseness, statelessness, and compatibility with version control, text tools, and other material (typically written in Markdown). Outputs and widget states are typically best discarded, which also saves space. Metadata is typically either handcrafted for dedicated tools (slides, grading tools, ...) or best discarded. Rich text support is a must, although this is orthogonal to this JEP.

<!--
- Course material tends to be narrative-heavy and needs rich text.
- Notebooks are prepared iteratively and collaboratively, often using version control.
- As content matures, metadata is added, often manually, e.g. for tags, nbgrader, or defining slides.
- Notebooks are distributed as part of a “project” that is a directory of Markdown files and related assets (e.g. images); thereby, attachments are not needed.
- Computations tend to be lightweight. Therefore outputs can easily be regenerated by reevaluating the notebook. Reevaluation is often desirable for a clean state.
- Notebooks received by students are clean: no outputs, no machine-generated metadata
- Students complete the tasks in the notebooks, executing them.
- Upon evaluation, notebooks are reexecuted from scratch to ensure a clean state.
- Hundreds of students submitting means that no outputs keep the total data size small.
- The course material made of a combination of Markdown and Jupyter notebooks is often published as a static website using e.g. Jupyter Book.
-->

#### Authoring in notebooks

- Users who author manuscripts, papers, and technical documents using Jupyter tend to produce notebooks that are predominantly text but contain outputs and code relevant to the document's content or publication.
- Authors prefer to use notebooks because they allow results, figures, tables, and other computational outputs to be reproduced as an integral part of the document.
- Authors work interactively and collaboratively, moving a draft manuscript toward a final polished article in steps including drafting, peer review, commenting, and revision.
- Authors and their collaborators may be scientists or researchers in any scientific or technical field and can have a wide range of technical abilities.
- Although the iterative nature of document preparation lends itself to a versioning process, the use of a version control system for sharing and collaboration may not be the norm; collaboration typically occurs by sharing via mechanisms like cloud storage, email, etc.
- Code relevant to the subject of the document may be limited (e.g. to key algorithms), so most code cells will likely be hidden in the final paper, and authors may work to minimize the amount of code directly included in the notebook.
- Authoring scientific papers and technical manuscripts requires rich document features such as rich text, cross-referencing, citations, equations, and figures with captions and numbering. Where these are not directly supported, authors may contrive to replicate them in the document manually.
- Notebooks will be used as the content for a publication. This involves either creating a PDF that links to a published version of the notebook via a Digital Object Identifier (DOI) on a service such as Zenodo, or directly publishing the notebook in HTML form.

#### Frontend serialization scenario

- A user in JupyterLab saves a notebook “as text”, e.g. by setting its name to `foo.nb.md`.
- JupyterLab then always saves `foo.nb.md` as a text file.
- This enables better version control and diffing.
- This text file is completely self-contained and 100% compatible with `.ipynb`: it can be shared as an `.ipynb` file. Outputs that can be expensive (e.g. GPU/HPC) or hard to reproduce (e.g.
  complex software stacks) are preserved. Widget states that may depend on non-reproducible user interaction are also preserved.
- There is an opportunity for better stream-based loading.

### Rationale

## Frequently Asked Questions

> What's the rationale for supporting lossless serialization of any notebook, when serializing large data chunks like outputs or attachments will harm the readability of the file anyway?

A successor to the current notebook format should let current users adopt the new format seamlessly. Most of the current user base creates its content in the notebook user interfaces, and picking one format over another in the preferences should not harm the ability to use existing extensions. If the new format does not preserve the current behavior, we will lose the confidence of our user base.

> Why support several syntaxes for metadata?

- The one-line JSON blob syntax is compact and unobtrusive.
- The YAML block syntax is human-readable and editable.
- The shorthand colon syntax is human-readable, editable, and very compact, at the price of not supporting nested metadata.

Enabling all three syntaxes supports both use cases where metadata is small and should be readable and editable **and** use cases where one wants to preserve metadata while making it as unobtrusive as possible. It also enables importing files that use any of these conventions, helping with interoperability and migration.

> How do we validate plain text notebooks, especially given the emerging ideas around including JSON schemas for validation?

Serialize to JSON and validate the JSON.

> What happens if people insert text (or any whitespace) between a cell's input and output block(s)?

The output block(s) will still be recognized, provided that only whitespace characters are inserted between them.

> How do we split a large body of Markdown into several Markdown cells (in other words, can we have cell breaks)?

Use `+++` block breaks.
These allow individual Markdown cell boundaries to be identified and can include metadata, enabling a lossless round trip between the text-based and `.ipynb` formats.

> How do we store large widget states? With the current format, widget states are stored in the notebook metadata, that is, in the YAML header, which can quickly become very large. Should widget states be moved to outputs instead?

Widget state, however large, is notebook metadata. The requirement of back-and-forth convertibility with `.ipynb` indicates that it should stay there. Also, we cannot store it in outputs, because outputs only hold views of widget state, and the same widget can be displayed multiple times.

## Unresolved questions

The following parts of the design are expected to be resolved through the JEP process before it gets merged:

- Final pinning of the syntax of the info string for fenced cells: `{jupyter.code-cell}`, `{code-cell}`, `{jupyter:code-cell}`, `{.code}`:
  - having a namespaced directive is a good idea;
  - one can optionally prefix the info string with a language name, as in `python {jupyter.code-cell}`, as a hint for syntax highlighting in Markdown viewers and editors. This language name is purely advisory for Markdown editors and carries no semantic meaning for Jupyter. <!-- see discussion below -->
- How far do we want to support closely related formats for interoperability and ease of transition?
- Sylvain: how do we split a large body of Markdown into several Markdown cells (in other words, can we have cell breaks)? Presumably, an `.ipynb` file composed only of Markdown cells should be convertible back and forth and be split into cells in the same way.
- The final decision about the file extension `.nb.md`.
- Currently, all of the examples above start a notebook with some Markdown content, without an initial block break. This assumes that any initial content implicitly defines a Markdown cell. However, what happens when the first cell in the notebook is a code cell?
  And what if there are newlines or whitespace between the notebook front matter block and the first code cell? Would a markdown cell not be inserted there? How would someone insert an empty markdown cell as the first cell in the notebook? By inserting an explicit thematic break?

### Other open questions

Could there be programming languages whose syntax conflicts with the metadata syntax for cells? For example, a programming language with syntax like `:variable: value`?

## Future possibilities

The following issues and lines of action are **out of scope** for this JEP and could be addressed in the future independently of the solution(s) that come out of this JEP:

- Markdown syntax within rich text cells: enrich the current markdown flavor? Support alternative flavors like MyST? The meaning of "Markdown" in terms of Jupyter's support for it within the format is the subject of another JEP; see [this issue to track that discussion](https://github.com/jupyter/enhancement-proposals/issues/98).
- Syntax for output previews
- Enable indirect storage of outputs and attachments: external URL, [content-id url](https://www.rfc-editor.org/rfc/rfc2111) to another part of the same multipart MIME bundle, reference to some attachment at the end of the notebook file. See this [document](/jxU8UzZASwax8MfVsGlDcw) discussing a potential JEP.
- Encourage tooling, or its configuration, to be opinionated about how certain pieces of information should be handled upon saving, to support various use cases. E.g.:
  - specify that outputs and cell ids shall be discarded when readability and conciseness are the priority;
  - specify that outputs should be stored at the end;
  - specify that metadata should be stored concisely as one-line JSON blobs.
- After some time and experimentation, define official notebook metadata specifying how these pieces of information should be handled upon saving.
- Standardize ways to provide directory-wide / project-wide / … notebook metadata (typical use cases: specify that outputs shall be discarded for all notebooks in a directory; define project-wide RISE configuration, …)

<!--
Think about what the natural extension and evolution of your proposal would be and how it would affect the Jupyter community at large. Try to use this section as a tool to more fully consider all possible interactions with the project and language in your proposal. Also consider how this all fits into the roadmap for the project and of the relevant sub-team.

This is also a good place to "dump ideas", if they are out of scope for the JEP you are writing but otherwise related.

If you have tried and cannot think of any future possibilities, you may simply state that you cannot think of anything.

Note that having something written down in the future-possibilities section is not a reason to accept the current or a future JEP; such notes should be in the section on motivation or rationale in this or subsequent JEPs. The section merely provides additional information.
-->

# Discussions (won't be in the JEP)

## Should we be opinionated about blessing a single format, or should we just encourage multiple text-based formats that already are in existence?

- One of our roles in the community is to be opinionated about a format so people can collaborate and tooling can be developed without fragmentation.
- Sometimes being opinionated is important, to serve as a canonical reference for e.g. formats or specifications. Is the intention of text-based notebooks to be a storage medium, or an authoring workflow? If it's the latter, it seems like choice is the most important thing, just as users like a choice of multiple different frontends. If it's storage, then it's clear we want a specified format. If it's to improve VCS, that seems like an orthogonal problem; nbdime offers one approach for taking existing ipynb blobs and diffing them.
- Currently jupytext (as an example) is an implementation-defined "standard". It's important to have a format that is not dependent on a single implementation's decisions.
- Perhaps the "standard" is a Jupyter-wide interface to read/write `nbformat` objects to various filetypes, i.e. `jupytext` is _one_ implementation of this interface.
- With this JEP, we are not restricting freedom of choice for users.

## Variants for code cell formats

```{code-cell} ipython3
---
id: 12344
exec_nt: 3
metadata:
  nbgrader:
    grade: true
    grade_id: cell-963f3a9626ae1519
    locked: true
    points: 1
    schema_version: 3
    solution: false
    task: false
---
assert ultime == 42
```

```{code-cell} id=12344 execution_count=3
---
nbgrader:
  grade: true
  grade_id: cell-963f3a9626ae1519
  locked: true
  points: 1
  schema_version: 3
  solution: false
  task: false
---
assert ultime == 42
```

```{code-cell} ipython3
---
id: 12344
exec_nt=3
nbgrader:
  grade: true
  grade_id: cell-963f3a9626ae1519
  locked: true
  points: 1
  schema_version: 3
  solution: false
  task: false
---
assert ultime == 42
```

```{code-cell} ipython3
---
attributes:
  id: 12344
  exec_nt=3
nbgrader:
  grade: true
  grade_id: cell-963f3a9626ae1519
  locked: true
  points: 1
  schema_version: 3
  solution: false
  task: false
---
assert ultime == 42
```

With the above, code cells don't get syntax highlighting. Some markdown highlighters (IntelliJ) highlight the code correctly if the language name is written directly after the backticks:

```python {something}
import sys
2 + 2
```

This is valid CommonMark syntax, but not valid MyST syntax. MyST suggests writing it like this:

```{something} python
import sys
2 + 2
```

However, neither IntelliJ nor VS Code (in a plain markdown file) supports it.

### Variants for cell output formats

Note: most text formats don't support storing cell outputs. If text-based formats are mainly useful for "authoring", then maybe we _want_ out-of-band outputs? I.e., perhaps we want a JEP to specify how out-of-band data are stored.
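To make the round trip to `ipynb` concrete, here is a minimal Python sketch of how the body of an output block, such as a `{jupyter.cell-output} stream` block or the `plain_execute_result` shorthand proposed in this section, could be translated into an nbformat v4 output dictionary. The function name `output_from_block` is hypothetical. It assumes that stream outputs default to `stdout` and that the body of an `execute_result` or `display_data` block is a JSON MIME bundle. Error outputs and metadata front matter are deliberately left out.

```python
import json
from typing import Optional


def output_from_block(output_type: str, body: str,
                      execution_count: Optional[int] = None) -> dict:
    """Translate a text output block into an nbformat v4 output dict.

    Hypothetical helper: `output_type` is the directive argument
    (e.g. "stream") and `body` is the text between the fences.
    """
    if output_type == "stream":
        # Assumption: the stream name defaults to stdout.
        return {"output_type": "stream", "name": "stdout", "text": body}
    if output_type == "plain_execute_result":
        # Shorthand: the body holds only the text/plain representation.
        return {
            "output_type": "execute_result",
            "execution_count": execution_count,
            "data": {"text/plain": body.rstrip("\n")},
            "metadata": {},
        }
    if output_type in ("execute_result", "display_data"):
        # Assumption: the body is a JSON MIME bundle, e.g. {"text/plain": "..."}.
        output = {"output_type": output_type, "data": json.loads(body), "metadata": {}}
        if output_type == "execute_result":
            output["execution_count"] = execution_count
        return output
    raise ValueError(f"unsupported output type: {output_type!r}")


# Usage: the shorthand block whose body is "2", produced by executing `1+1`.
print(output_from_block("plain_execute_result", "2\n", execution_count=1))
```

The shorthand trades generality for readability. It can only express a plain-text result, so any richer MIME bundle has to fall back to the full JSON form.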
```{jupyter.code-cell}
print(3)
1+1
```

```{jupyter.cell-output} stream
3
```

```{jupyter.cell-output} execute_result
{json blurb}
```

As a shorthand, we could support

```{jupyter.cell-output} plain_execute_result
2
```

which would be automatically translated to {json min/plain-text...}

```{jupyter.cell-output} error
Traceback: ....
```

### Possible output block formats

Here we make use of directive arguments to show the type of the output, and reserve the YAML front matter block for the contents of the top-level metadata key:

````
```{jupyter.output} stream
---
name: stdout
---
This is the stream content that was in the *text* field of the original json output
```

```{jupyter.output} error
---
ename: ReferenceError
evalue: x is a unknown
---
The *traceback* field rendered as content
```

```{jupyter.output} display_data
---
mdkey: value
---
{
  "text/plain": "some text data"
}
```

```{jupyter.output} execute_result execute_count=42
---
some_metadata_key: 'value'
---
{
  "image/png": base64-image-text
}
```
````

## Attachments

````
+++
asdasdf as sadf

```{jupyter.attachment}
{'foo.png': {json blurb}
'bar.png': {json blurb}
```

```{jupyter.attachment}
{'foo.png': {json blurb}
'bar.png': {json blurb}
```

```{jupyter.attachment}
:label: foo.png
{json blurb}
```

```{jupyter.attachment}
:label: foo.png
{json blurb}
```

```{jupyter.attachment} foo.png
{json blurb}
```
````

## Metadata

Metadata in the IPYNB format can be a nested data structure, so a flat key-value format doesn't fit our needs.

*Should metadata be YAML?*

## Which flavor of Markdown?

- CommonMark
- GitHub Flavored Markdown
- [MyST](https://gist.github.com/stevejpurves/8c6d129c7bb8b0dacb8460f0e42582c2)

This discussion is more about cell contents. The syntax here is simple enough that we barely need to extend beyond CommonMark.

## How ambitious should the proposal be?

Level 0.
Pick/define/refine an **official** alternative text-based serialization syntax to be seamlessly supported by most tools in the ecosystem (e.g. all those that use nbformat).

Level 1. Empower the community to implement, reuse, and experiment with alternative serialization syntaxes to be seamlessly supported by most tools in the ecosystem, assuming the appropriate extensions are installed. Shepherd the process, pick the most promising alternatives, and make them official.

Pros of Level 0:
- A single standard authorized by the official Jupyter committee allows everyone in the world to create their own serializer and deserializer that will be compatible with everyone else's serializer/deserializer.

Cons of Level 0:
- Even if it's quite easy to define a text-based serialization format, choosing at this stage one that is / will be widely adopted could be tricky.

Pros of Level 1:
- The work needed is roughly similar in both cases (supporting two or more serialization syntaxes is not much different).
- It helps interoperability with the many existing notebook formats out there, text-based or not.
- Implementing a new format is relatively lightweight: writing a new serializer/deserializer boils down to splitting into cells and parsing metadata; it presumably needs to be done in both Python and JavaScript to be usable by most.
- Promotes engagement of the community and organic evolution toward whichever solution fits most needs.

Potential caveats of Level 1:
- This could lead to a proliferation of formats; past experience with Jupytext seems to suggest that this won't be too bad: there is a natural incentive to stick to widely used / supported formats.
- Portability depends on installed extensions; that's already the case for many Jupyter features.
- This JEP is responsible for defining a new format of the notebook file. The Level 1 suggestion implies delegating that responsibility to someone else, and the Jupyter committee loses control over the format.
  > [nt] Even with Level 1, deciding which format is made official is in the hands of the Jupyter committee.
  >
  > [vl] Then, it must be emphasized that Jupyter must be able to open only officially accepted notebook file formats. Any notebook file with a third-party format must be considered invalid. Should anyone want to work with that format, they would have to use their own fork of Jupyter.
  >
  > I don't see why one should actively prevent loading third-party formats. If a user chooses an official format, they know that it comes with guarantees. Again, it's just like any Jupyter extension. Using official extensions or widely used ones gives you guarantees. But you can still use others, at your own risk.
  >
  > [vl] That makes sense. However, in that case it would be enough to have a different filename extension, without describing the contents of the file.
  >
  > [nt] Definitely, each syntax should use a different extension.
  >
  > Also, in that case, the proposal for additional third-party formats must be accepted along with the new default official format. Either both are accepted, or neither.
  >
  > So your suggestion would be to have two separate JEPs?
  >
  > We have to separate the JEPs anyway, simply because this file is too big and contains different ideas. At least, that's what I was told.
  >
  > Agreed. The only thing that makes me hesitate is: which JEP should come first?
  >
  > The new official notebook format may exist with the ability to use third-party formats. However, the ability to use third-party formats may not exist without the official notebook format. Otherwise, all the negative consequences discussed in this text apply. So, I suggest making the new official notebook format the first.
  >
  > [nt] I see the political reason.
  > I guess my hesitation is whether the landscape is mature enough at this stage to set the official format in stone, or whether an experimentation period where the community can explore would help shape the official format.
  >
  > Then, the JEP should define what the experimentation period is and what must be done after it.
  >
  > [nt] Open question: if we start with just two formats, is there a risk that the plugin mechanism that gets implemented will be "hardcoded" and then hard to generalize after the fact?
- It's not clear whether the community wants to be empowered to create their own formats. It may instead turn out that almost no one cares about the notebook format. Nevertheless, even if no one in the world actually implements their own _alternative_ notebook format, other developers can't know that and have to expect some previously unknown format to appear in their software.
  > [nt] Under Level 0, developers of a tool already have to ensure that it supports both the ipynb and the text format; once this is done, supporting more comes for free. I see: you mean in the case where they can't use the community-provided parsers.
  >
  > [nt] Not exactly. In the case of Level 0, developers know how to parse two different formats with a well-defined structure and good documentation. In the case of Level 1, developers stumble upon an unknown number of unknown formats which may have no documentation.
  >
  > [nt] That's part of the selection process: a format that's not documented or is not provided with good parsers won't be adopted by tools; if users care about a given format and want tools to work, they should make sure that their format is high quality.
- The opposite case is possible as well. Too many new formats could appear, making it impossible to choose the best one and deprecate the others.
- While Jupyter is a well-known and trusted brand, developers of external formats may not be so well known.
  Many enterprises only allow installing whitelisted software, and their security teams may prohibit the use of a new Jupyter format that requires installing unknown, untrusted software.
  > [nt] Yes, that's true of any Jupyter extension. It's part of the selection process in the ecosystem.

-->
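The Level 1 argument that writing a deserializer "boils down to splitting into cells and parsing metadata" can be illustrated with a minimal Python sketch. It assumes the conventions discussed in this document:

- `+++` thematic breaks separate markdown cells and may carry a one-line JSON metadata blob;
- any leading content implicitly forms a markdown cell;
- code cells are fenced with a `{code-cell}` or `{jupyter.code-cell}` info-string.

The function `split_cells` and its exact rules are hypothetical. Front matter, YAML metadata blocks, outputs, and attachments are deliberately not handled.

```python
import json
import re

FENCE = "`" * 3  # a literal triple backtick, built here to keep the snippet fence-safe
# Hypothetical rule: an opening fence whose info-string starts with
# {code-cell} or {jupyter.code-cell}; anything after it is ignored.
CODE_CELL_OPEN = re.compile("^" + re.escape(FENCE) + r"\{(?:jupyter\.)?code-cell\}")


def split_cells(text: str) -> list:
    """Split a text notebook into nbformat-like cell dicts (sketch only)."""
    cells = []
    buf = []   # pending markdown lines
    meta = {}  # metadata from the last `+++ {...}` break

    def flush_markdown():
        source = "\n".join(buf).strip("\n")
        # Keep an empty markdown cell only if a break gave it metadata.
        if source or meta:
            cells.append({"cell_type": "markdown", "metadata": dict(meta), "source": source})
        buf.clear()
        meta.clear()

    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "+++" or line.startswith("+++ "):
            # Thematic break: close the current markdown cell; an optional
            # one-line JSON blob applies to the markdown cell that follows.
            flush_markdown()
            blob = line[3:].strip()
            if blob:
                meta.update(json.loads(blob))
            i += 1
        elif CODE_CELL_OPEN.match(line):
            flush_markdown()
            body = []
            i += 1
            while i < len(lines) and lines[i].strip() != FENCE:
                body.append(lines[i])
                i += 1
            cells.append({"cell_type": "code", "execution_count": None,
                          "metadata": {}, "outputs": [], "source": "\n".join(body)})
            i += 1  # skip the closing fence
        else:
            # Any other line belongs to the current markdown cell, so
            # leading content implicitly forms a markdown cell.
            buf.append(line)
            i += 1
    flush_markdown()
    return cells
```

For example, consider a string with some leading text, a `+++ {"tags": ["intro"]}` break, more text, and a `{code-cell}` fence. Splitting it yields two markdown cells, the second carrying the `tags` metadata, followed by one code cell.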
