Amit Kumar
# Papyri

**CLI**: `papyri/__init__.py`

## Meeting Notes: 24 Nov, 202

- Parsing is handled either by tree-sitter or by numpydoc, because tree-sitter does not parse function docstrings correctly: numpydoc has its own syntax.
- I tried to have an AST that is generic enough to represent all the documentation. When possible it doesn't carry too many specifics of how the source was written; it is as simple as possible while still able to represent the document.
- What we tried to do is something generic enough that, once a document is completely processed, we don't know whether it was rst or markdown.
- Another thing we wanted was delayed link resolution.
- We don't use the tree-sitter AST directly because it has too much information about the source that we don't want.
- Our ast still has too much information, but while we can't change the tree-sitter ast, we can change ours.
- Another problem with the tree-sitter ast is that all nodes share one generic type; to get the actual type of a node you have to look at something like `<node>.kind`.

  ```python
  >>> title_node
  <Node kind=title, start_point=(1, 0), end_point=(1, 5)>
  >>> adornment_node
  <Node kind="adornment", start_point=(2, 0), end_point=(2, 5)>
  ```

  With this you would have to do runtime checks on the types of objects; you can't have static typing, so we can't use mypy.
- It's an experimental project, so sometimes the reason something is done a particular way is simply that the author started that way.
- tree-sitter also has a lot of optional fields that are sometimes there and sometimes not. I was trying to make something more coherent, in which you always have the fields when possible; that's one of the reasons we have our own ast.
- It might also be the case with myst.
- Another thing with the myst spec: you can serialize to json, but our internal representation as python objects can be well typed, so we can actually make sure everything is rich and well typed, and not rely on runtime value checking.
- Next steps (we can do this over many pull requests, no need to do it all at once): either modify `ts.py`, or create a `ts2.py` that does not emit the ast in `take2.py` but instead emits a new ast in, let's say, `myst.py`, so that we can then replace things progressively.
- When we do the rendering in html, we can use some of the myst machinery.
- Almost everything will somehow change, because everything relies on the current ast.
- I was planning some changes to the current ast but didn't have time to make them. Technically, when you get to the rendering task, the rendering task shouldn't see any directives. Directives are really meant for addition, not for rendering, so the directive node should go away at some point.
- Discussion with myst folks: they want to do something similar, and we may have feedback on their ast. They parse only markdown and we parse rst, so we might ask "can you change this in your ast?". We can also influence the ast myst has.
- It's really exploratory. The answer could be no, we really can't use the myst ast for reasons x, y and z.

## `gen` command

Generate documentation for a given package.

The first item should be the root package to import. If subpackages need to be analyzed but are not accessible from the root, pass them as extra arguments.

```python
if api:
    if examples:
        g.collect_examples_out()
if api:
    g.collect_api_docs(target_module_name)
if narrative:
    g.collect_narrative_docs()
```

### Where it all starts

The `gen` command does the parsing using tree-sitter and returns the objects from `take2.py`.

### Code Flow:

- command: `papyri gen examples/numpy.toml`: the `gen` command takes a toml configuration file to generate documentation for a given package.
  - Saves the generated documentation in the `~/.papyri/data` directory.
- `gen_main`: Main entry point to generate docbundle files.
  - This function collects package metadata, api docs, examples, narrative docs.
- `collect_api_docs`:
  - collector (comes from `_get_collector`, which constructs a depth-first search collector that will try to find all the objects it can.)
    - E.g. for numpy, `collected` was *2667* items.
      - `('numpy', <module 'numpy' ..>)`
      - `('numpy.distutils', <module 'numpy.distutils'..>)`
  - We call `helper_1` on the fully qualified name (`qa`) and `target_item` (as in the module) for all the collected items. It returns the following three items:
    - `item_docstring`: docstring of the module
    - `arbitrary`: List of `papyri.take2.Section`; each section has the title and the items inside that section. This is basically a section in the documentation.
    - `api_object`: `papyri.gen.APIObjectInfo`, a structured object which contains all the information about the parsed documentation. In fact `api_object.parsed` is equal to `arbitrary`.
  - If there is an error, we continue to the next item in `collected`.
- `prepare_doc_for_one_object`: gets documentation information for one python object. It returns the following:
  - DocBlob: An object containing information about the documentation of an arbitrary object.
- After all this processing the docs are written to the file system.
  - For each collected item, the data is written in `~/.papyri/data/numpy_1.23.4/module/<collected_item.json>`

### Tree sitter parsed object

```python
>>> tree
<tree_sitter.Tree object at 0x1090cf3f0>
>>> tree.root_node
<Node kind=document, start_point=(0, 0), end_point=(104, 0)>
>>> tree.root_node.children[0]
<Node kind=section, start_point=(1, 0), end_point=(2, 5)>
>>> tree.root_node.children[0].text
b'NumPy\n====='
>>> tree.root_node.children[1].text
b'Provides\n 1. An array object of arbitrary homogeneous items\n 2. Fast mathematical operations over arrays\n 3. Linear Algebra, Fourier Transforms, Random Number Generation'
>>> tree.root_node.children[2].text
b'How to use the documentation\n----------------------------'
>>> tree.root_node.children[3].text
b'Documentation is available in two forms: docstrings provided\nwith the code, and a loose standing reference guide, available from\n`the NumPy homepage <https://numpy.org>`_.'
>>> tree.root_node.children[0].children
[<Node kind=title, start_point=(1, 0), end_point=(1, 5)>, <Node kind="adornment", start_point=(2, 0), end_point=(2, 5)>]
>>> tree.root_node.children[0].children[0]
<Node kind=title, start_point=(1, 0), end_point=(1, 5)>
>>> tree.root_node.children[0].children[0].children
[<Node kind="text", start_point=(1, 0), end_point=(1, 5)>]
>>> tree.root_node.children[0].children[0].children[0]
<Node kind="text", start_point=(1, 0), end_point=(1, 5)>
>>> tree.root_node.children[0].children[0].children[0].children
[]
>>> tree.root_node.children[0].children[0].children[0].text
b'NumPy'
```

### TreeSitter Parsing (`ts.py` and `take2.py`):

- We pass the tree-sitter root node to Papyri's `Node` object.
- That `Node` object is then passed to `TSVisitor`.
- Then we call the `visit_document` method of `TSVisitor`, which eventually calls the `visit` method, which visits all the children.
- All the children (tree-sitter objects) have a type (`c.type`); we call it kind. For each tree-sitter type we have defined a method in the `TSVisitor` class named `visit_{kind}`.
- For each tree-sitter type we have a node defined in `take2.py`.
- For each child we call the corresponding `visit_{kind}`, which parses the child and returns the respective object from `take2.py`.
- In the last step we `nest_sections`, i.e. put things under a `Section` node.

## `ingest` command

Example: `papyri ingest ~/.papyri/data/numpy_1.23.4`

Given paths to a docbundle folder, ingest it into the known libraries.
- This uses the library [cbor2](https://pypi.org/project/cbor2/) to create a Concise Binary Object Representation (CBOR) of the doc_blob.

  ```python
  encoder.encode(doc_blob)
  ```
- This is then saved in files inside the `~/.papyri/ingest` directory.
- At this point we also save refs/links to the database.
- The database is saved as `papyri.db` in `~/.papyri/ingest/`.

## `papyri.db`

The `papyri.db` database contains the following tables:

```
main.destinations main.documents main.links
```

This is managed by the `graphstore` module (a class abstraction over the filesystem to store documents in a graph-like structure).

### `destinations`

|id |package       |version        |category  |identifier           |
|---|--------------|---------------|----------|---------------------|
|1  |numpy         |1.23.4         |module    |numpy.ndarray        |
|2  |current-module|current-version|to-resolve|ogrid                |
|3  |builtins      |*              |module    |builtins.tuple       |
|4  |numpy         |1.23.4         |module    |numpy.indices        |
|5  |current-module|current-version|to-resolve|mgrid                |
|6  |numpy         |*              |module    |numpy.ndarray.reshape|

### `documents`

|id |package|version|category|identifier                            |
|---|-------|-------|--------|--------------------------------------|
|1  |numpy  |1.23.4 |assets  |fig-numpy.kaiser-1-ce19905e.png       |
|2  |numpy  |1.23.4 |assets  |fig-numpy.histogram2d-0-3819e7bf.png  |
|30 |numpy  |1.23.4 |module  |numpy.polynomial.hermite.hermfit      |
|31 |numpy  |1.23.4 |module  |numpy.lib.function_base._i0_dispatcher|
|32 |numpy  |1.23.4 |module  |numpy.lib.index_tricks.MGridClass     |

### `links`

|id |source|dest|metadata|
|---|------|----|--------|
|1  |29    |1   |debug   |
|2  |29    |2   |debug   |
|3  |29    |3   |debug   |
|4  |29    |4   |debug   |
|5  |29    |5   |debug   |

## `render` command

Example: `papyri render`

This does static rendering of all the given files.

- This decodes the ingested blobs (cbor2 bytes) to get the doc_blob back.
- That doc_blob is passed to a jinja template (`html.tpl.j2`) to render the html.
- Each html file for the api is written to: `~/.papyri/html/p/numpy/1.23.4/api/<qa>.html`
- The html jinja template also has the logic for ordering the various sections in the DocBlob.

DocBlob Attributes: (Understanding one of the Nodes)

```python
>>> doc_blob.content.keys()
dict_keys(['Attributes', 'Extended Summary', 'Methods', 'Notes', 'Other Parameters', 'Parameters', 'Raises', 'Receives', 'Returns', 'Summary', 'Warnings', 'Warns', 'Yields'])
>>> returns = doc_blob.content['Returns']
>>> type(returns)
<class 'papyri.take2.Section'>
>>> type(returns.children[0])
<class 'papyri.take2.Parameters'>
>>> parameters = returns.children[0]
>>> type(parameters.children[0])
<class 'papyri.take2.Param'>
>>> param = parameters.children[0]
>>> type(param.children[0])
<class 'papyri.take2.Paragraph'>
>>> paragraph = param.children[0]
>>> type(paragraph.children[0])
<class 'papyri.take2.Words'>
>>> words = paragraph.children[0]
>>> words.value
'Chebyshev coefficients ordered from low to high. If '
```

```
[Returns - Section]
         |
         V
   [Parameters]
         |
         V
      [Param]
         |
         V
   [Paragraph]
         |
         V
      [Words]
```

## `serve` command

Example: `papyri serve`

This serves the rendered html files.

## MyST

*myst-spec is in development; any structures or features present in the JSON schema may change at any time without notice.*

### Directives & Roles

- Roles and directives are two of the most powerful parts of MyST.
- They both serve a similar purpose, but roles are written in one line whereas directives span many lines.

## Questions

Q1: What's `rst.so`?

```python
pth = str(Path(__file__).parent / "rst.so")
RST = Language(pth, "rst")
parser = Parser()
parser.set_language(RST)
```

Q2: Would we need to find an equivalent in the myst spec for (almost, with some manual additions) each item in the current ast, in order to replace the current ast with myst?

## Action Items

- [X] Understand how the various commands in `papyri` work.
- [X] Understand the tree-sitter ast on a higher level.
- [X] Understand the current ast on a higher level.
- [ ] Improve/Fix json schema to python dataclasses code
- [ ] Create another `myst.py` (or `take3.py`) to return the myst ast after tree-sitter parsing.

## Using MyST AST

- Trying to replace Word/Words with Text from the MyST ast.
- The `Words` node in the current AST is different from `Text` in the MyST AST. Words is a single word and Text is a continuous block of words.
- Need to figure out a way for that single element to pass all the assertions during the construction of the tree, for example:

```python
# Ref: papyri/tree.py:366
# c is Myst Text and Node is the one defined in take2.py
# Whereas c is an instance of the Node defined in myst_ast.py
assert isinstance(c, Node), c
```

Trying it on `numpy.distutils`

```bash
papyri gen examples/numpy.toml --only numpy.distutils
```

### Current AST:

```python
[<Section:
 |children: [<Paragraph:
 | |children: [An enhanced distutils, providing support for Fortran compilers, for BLAS, LAPACK and other common libraries for numerical computing, and more.]
 | |>, <Paragraph:
 | |children: [Public submodules are: ]
 | |>, <BlockVerbatim '47'>, <Paragraph:
 | |children: [For details, please see the , *Packaging*, and , *NumPy Distutils User Guide*, sections of the NumPy Reference Guide.]
 | |>, <Paragraph:
 | |children: [For configuring the preference for and location of libraries like BLAS and LAPACK, and for setting include paths and similar build options, please see , <Verbatim ``site.cfg.example``>, in the root of the NumPy repository or sdist.]
 | |>]
 |title: None
 |level: 0
 |target: None
 |>]
```

```python
# nss - above mentioned structure
>>> nss[0].children[0].children[0]
An enhanced distutils, providing support for Fortran compilers, for BLAS, LAPACK and other common libraries for numerical computing, and more.
>>> type(nss[0].children[0].children[0])
<class 'papyri.take2.Words'>
```

### New Attempted AST

```python
>>> nss
[<Section:
 |children: [<Paragraph:
 | |children: [<MText:
 | | |value: 'An enhanced distutils, providing support for Fortran compilers, for BLAS, LAPACK and other common libraries for numerical computing, and more.'
 | | |>]
 | |>, <Paragraph:
 | |children: [<MText:
 | | |value: 'Public submodules are '
 | | |>]
 | |>, <BlockVerbatim '47'>, <Paragraph:
 | |children: [<MText:
 | | |value: 'For details, please see the '
 | | |>, *Packaging*, <MText:
 | | |value: ' and '
 | | |>, *NumPy Distutils User Guide*, <MText:
 | | |value: ' sections of the NumPy Reference Guide.'
 | | |>]
 | |>, <Paragraph:
 | |children: [<MText:
 | | |value: 'For configuring the preference for and location of libraries like BLAS and LAPACK, and for setting include paths and similar build options, please see '
 | | |>, <Verbatim ``site.cfg.example``>, <MText:
 | | |value: ' in the root of the NumPy repository or sdist.'
 | | |>]
 | |>]
 |title: None
 |level: 0
 |target: None
 |>
]
```

### Current Parsing Flow

- Node

  ```python
  >>> type(root)
  <class 'papyri.ts.Node'>
  tsv.visit_document(root)
  ```
- `Word`(s) are compressed into a `Words` object in the `visit_paragraph` function.

---

    .. plot::
       :format: png

       import matplotlib.pyplot as plt
       ...

Once parsed by tree sitter:

    Directive:
      children:  #< list of N elements.
        Options:
          format: png
        Code:
          Text:
            value "import matplotlib.pyplot as plt."

    .. plot::

       import matplotlib.pyplot as plt
       ...

Once parsed by tree sitter:

    Directive:
      children:  #< list of N elements.
        Code:
          Text:
            value "import matplotlib.pyplot as plt."

You don't know whether your first child is an option or not. What we would want instead:

    Directive:
      Options: Option or None
      Code: Words.

---

:bulb: Probably: Currently the structure we return via our ast is the same as the structure returned by tree sitter (as in the tree of nodes), and that structure is different for myst.
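The fixed-shape directive idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `RawNode`, `Directive`, `normalize_directive`, and the child kinds `"options"`, `"option"` and `"code"` are hypothetical names made up for the example, not the actual tree-sitter-rst grammar or Papyri classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawNode:
    """Hypothetical stand-in for a parsed node: a kind, some text, and children."""
    kind: str
    text: str = ""
    children: list[RawNode] = field(default_factory=list)


@dataclass
class Directive:
    """Fixed-shape directive: both fields always exist."""
    options: Optional[dict[str, str]]  # None when the source gave no options
    code: str


def normalize_directive(raw: RawNode) -> Directive:
    """Look children up by kind instead of relying on their position."""
    options: Optional[dict[str, str]] = None
    code = ""
    for child in raw.children:
        if child.kind == "options":  # assumed kind name
            options = {}
            for opt in child.children:
                key, _, value = opt.text.partition(":")
                options[key.strip()] = value.strip()
        elif child.kind == "code":  # assumed kind name
            code = child.text
    return Directive(options=options, code=code)


with_options = RawNode("directive", children=[
    RawNode("options", children=[RawNode("option", "format: png")]),
    RawNode("code", "import matplotlib.pyplot as plt"),
])
without_options = RawNode("directive", children=[
    RawNode("code", "import matplotlib.pyplot as plt"),
])
```

With this shape, `normalize_directive(with_options).options` is a dict and `normalize_directive(without_options).options` is `None`, so a consumer never has to guess what the first child is, and a type checker can see both fields.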
## Links:

- https://github.com/stsewd/tree-sitter-rst
- https://myst.tools/docs/spec/myst-schema

## Action Items

- [x] Replace all node items with the MyST ast in the `papyri gen examples/numpy.toml --only numpy.distutils` call.
- [ ] Change serialisation to have json in myst ast properly and try rendering using myst tools.

## 22 March 2023

- Replace `BlockVerbatim` with `Code`?
- Codecov Token fix

To Replace/Remove remaining nodes:

- [ ] `Verbatim`
- [ ] `Directive`
- [ ] `Link`
- [ ] `Math`
- [ ] `BlockMath`
- [ ] `SubstitutionDef`
- [ ] `SubstitutionRef`
- [ ] `Target`
- [ ] `Unimplemented`
- [ ] `Comment`
- [ ] `Fig`
- [ ] `RefInfo`
- [ ] `ListItem`
- [ ] `Signature`
- [ ] `NumpydocExample`
- [ ] `NumpydocSeeAlso`
- [ ] `NumpydocSignature`
- [ ] `Section`
- [ ] `Parameters`
- [ ] `Param`
- [ ] `Token`
- [ ] `Code3`
- [ ] `CodeLine`
- [ ] `Code2`
- [ ] `GenToken`
- [ ] `Code`
- [ ] `BlockQuote`
- [ ] `Transition`
- [ ] `Paragraph`
- [ ] `Admonition`
- [ ] `TocTree`
- [ ] `BlockDirective`
- [ ] `BlockVerbatim`
- [ ] `Options`
- [ ] `FieldList`
- [ ] `FieldListItem`
- [ ] `DefList`
- [ ] `DefListItem`
- [ ] `SeeAlsoItem`
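
For the open serialisation item above, a minimal sketch of what "well typed python objects that serialise to json" could look like. Everything here is an assumption for illustration: the `Text` and `Paragraph` classes, `to_dict`, and `merge_words` are hypothetical and are not Papyri's `myst_ast.py`. The `type`/`value`/`children` keys follow the mdast-style convention that the MyST spec builds on, and the spec itself may still change.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Text:
    """Leaf node holding a continuous run of words."""
    value: str

    def to_dict(self) -> dict:
        return {"type": "text", "value": self.value}


@dataclass
class Paragraph:
    """Container node; children are typed, so a checker like mypy can verify them."""
    children: list[Text] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"type": "paragraph", "children": [c.to_dict() for c in self.children]}


def merge_words(words: list[str]) -> Text:
    """Join single-word tokens into one Text node."""
    return Text(" ".join(words))


para = Paragraph([merge_words(["Public", "submodules", "are:"])])
serialized = json.dumps(para.to_dict())
```

The node type lives in the Python class, so static typing works on the in-memory tree, while the `"type"` string only appears at the json boundary.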
