
Odoc to generate ocaml.org-like websites

Introduction

Odoc lacks some features and conventions to define manuals. Indeed, there are multiple things we want in manuals, but that odoc does not provide:

  • A good sidebar,
  • Hierarchical documentation,
  • A notion of "root" which restricts what is shown in the sidebar and breadcrumbs,
  • The ability to make relative references between pages,
  • A way to distribute assets like images, code samples, etc.,
  • The ability to rebuild only parts of the documentation, incrementally,
  • An API for rendering source code which is friendlier to incremental build systems.

In this document, we describe potential modifications to odoc that would make it fit to generate manuals. Ideally, the new API and conventions would not need to be changed to support planned and foreseeable features.

Specification

In this section, we specify a proposal for generating an OCaml.org-like website that matches the "Expected outcome".

This is done in several parts:

  • Hierarchical documentation: a convention for building hierarchical docs, and some considerations on the breadcrumbs and sidebar
  • A modification of the allowed references
  • A specification on how resolving is done
  • A modification of the CLI

Directories

The parent and children concept is removed. "Directories" are introduced and act as the parent of what they contain. The fully-qualified identifier of each unit is specified on the command line of compile and does not need to be looked up.

Directory names might collide with module names. The driver has to be careful not to mix the manuals and the modules to prevent this clash. In case of conflicts, the behaviour is undefined.

Module search path: -I, -P

Resolving root modules is kept exactly as before. Even though it is not changed, it needs to be defined, since what is put in scope in each phase was previously not well specified. The scope consists of all modules that are accessible directly in the module search path.

The module search path is an ordered collection of paths consisting of:

  • The directories given to -I.
  • The directories computed from the paths given to -P. See later section on -P.

In case of a conflict, one of two strategies applies:

  • Modules appearing in the cmt's imports field have a digest associated. For these modules, candidate modules are loaded in search-path order until one with the matching digest is found.
  • For modules looked up without a digest (for example, from a reference), the module found in the directory specified first is used.

This should be the current behavior.
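The two strategies can be illustrated with a minimal Python sketch. It is not odoc's implementation: the search path is modelled as an ordered list of dictionaries mapping module names to digests, and the function name lookup_module is hypothetical.

```python
def lookup_module(name, search_path, digest=None):
    """Return (directory index, digest) of the selected module, or None.

    search_path is an ordered list of dicts {module name: digest}; this
    in-memory model is an assumption made for the sake of the example.
    """
    for index, directory in enumerate(search_path):
        found = directory.get(name)
        if found is None:
            continue
        # With a digest (module listed in the cmt's imports): keep looking
        # until a candidate with the matching digest is found.
        # Without a digest (e.g. from a reference): the first hit wins.
        if digest is None or found == digest:
            return index, found
    return None
```

With two directories that both provide Foo, a lookup without a digest selects the first directory, while a lookup with the second digest skips the first candidate.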

Pages are no longer looked up in the paths specified with -I and -P.

The search path to give when compiling or linking an odoc unit will be specified in the documentation for driving odoc:

  • During the compiling phase, the search path should include all OCaml dependencies.
  • During the linking phase, the search path should include all OCaml dependencies first, but also other modules for forward references. TODO: should we let odoc decide the search path, since it has the dependencies from the compile phase as well as the whole hierarchy? See Search-path-less link at the end of the document.

Hierarchical search path: -R

The search path for pages and root modules is not specified in a unified way as the lookup strategy is fundamentally different.

Pages are no longer looked up in the module search path:

  • They are hierarchical, compared to the flat namespace of root modules
  • We want to resolve them relative to where we are in the hierarchy
  • There can be several pages of the same name, differentiated only by their location in the hierarchy.

The hierarchical search path only applies to references to pages and qualified references to modules (that start with a directory or a root). It is specified with the option -R, which can be repeated. R stands for "root" of the hierarchy.

Modules appearing in the module search path must also be included in the hierarchy as they can be looked up by their fully qualified identifier (e.g. by the sidebar).

The concept of parents and children is removed in a non-compatible way. The option --pkg is kept for compatibility (see below).

-R

-R specifies the root of the hierarchical search path. -R can be repeated; the search path is then the union of all the specified paths. In case of a conflict, the path specified first is used.

-P

-P specifies directories within the hierarchy and adds them to the module search path. It is similar to -I except that the passed path is relative to the union of the paths passed to -R. Directories are separated with a . instead of a /, as it is an identifier to a directory.

For example, -R sp1 -R sp2 -P root-pkgs1.lib1 is equivalent to -I sp1/root-pkgs1/lib1 -I sp2/root-pkgs1/lib1.

The motivation for this new option is that it is lighter to use and cannot add directories that are outside of the hierarchy to the search path.
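The equivalence between -P and -I can be sketched in a few lines of Python. The function name expand_package_dirs is hypothetical, and the sketch assumes identifiers without quoted segments.

```python
import posixpath


def expand_package_dirs(roots, package_dirs):
    """Expand -P identifiers into the equivalent -I directories.

    roots: paths given to -R, in order.
    package_dirs: dotted directory identifiers given to -P.
    """
    return [
        posixpath.join(root, *package_dir.split("."))
        for package_dir in package_dirs
        for root in roots
    ]
```

Calling expand_package_dirs(["sp1", "sp2"], ["root-pkgs1.lib1"]) yields the two -I directories of the example above, in the order of the -R options.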

Unit fully-qualified identifier: --id

Each unit (page or module) is given a fully-qualified identifier at compile-time with the new --id argument.

In the following example, Foo, Bar and the page index are siblings:

odoc compile --id=root-pkgs1.lib1.Foo -R sp/ -P root-pkgs1.lib1 lib1/foo.cmti
odoc compile --id=root-pkgs1.lib1.Bar -R sp/ -P root-pkgs1.lib1 lib1/bar.cmti
odoc compile --id=root-pkgs1.lib1.index -R sp/ -P root-pkgs1.lib1 lib1/index.mld

--id and -o

The --id option specifies the fully-qualified identifier of a page, module, asset, etc.

When the hierarchical search path is used (-R is given at least once), --id is mandatory and -o is not allowed. Otherwise, --id is not allowed (see compatibility with --pkg).

With --id and -R, the output path is computed from the identifier and the output is written under the path specified with -R. If -R is given several times, the first is used when computing the output path.

-o is not allowed in combination with --id because they would both be given the same data in a different format and it's an opportunity to give inconsistent input. The first -R is used because it's also a good idea to pass the hierarchy being built first so it shadows dependencies in case of conflicts.
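As an illustration, the following Python sketch computes the output directory from an identifier and the list of -R paths. The helper names split_id and output_dir are hypothetical, the handling of double-quoted segments is inferred from the identifiers used in this document, and the file name inside the directory is deliberately left out.

```python
import posixpath


def split_id(identifier):
    """Split an identifier on dots, keeping double-quoted segments whole."""
    segments, current, quoted = [], [], False
    for ch in identifier:
        if ch == '"':
            quoted = not quoted  # quotes delimit a segment and are dropped
        elif ch == "." and not quoted:
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments


def output_dir(roots, identifier):
    """Directory in which the unit is written: the first -R path joined
    with every segment of the identifier except the leaf."""
    *directories, _leaf = split_id(identifier)
    return posixpath.join(roots[0], *directories)
```

For example, output_dir(["sp/", "deps/"], "root-pkgs1.lib1.Foo") gives sp/root-pkgs1/lib1: only the first -R is used.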

Root directories

Root directories are directories which are prefixed by root-. They have a special meaning in some cases:

  • The sidebar and the breadcrumbs won't go through a root-page while rendering the hierarchy.
  • {!root(<module>)} is a reference to the root directory of a module. The passed module or package reference is first resolved, then the resulting ID is stripped of its leaf until it is a root identifier.
  • {!root} is a reference to the root directory of the current unit

There can be nested roots. Relative references can go through roots but the convention does not suggest a layout for roots.

There can be a conflict between root directories and directories if they have the same name and are siblings. In this case, the behaviour is undefined.
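The "strip the leaf until a root identifier is reached" rule can be sketched as follows. This is a minimal Python illustration over identifiers represented as lists of segments; the function name root_of is hypothetical.

```python
def root_of(segments):
    """Return the identifier of the nearest enclosing root directory.

    segments: a fully-qualified identifier as a list of segments.
    Returns None when no segment is prefixed by "root-".
    """
    segments = list(segments)
    # Strip the leaf until the last segment is a root directory.
    while segments and not segments[-1].startswith("root-"):
        segments.pop()
    return segments or None
```

With nested roots, the innermost root is returned, which is one possible reading of the rule above.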

References to pages

Contrary to the flat namespace of modules, pages are in a hierarchy and referencing a page is always done relative to:

  • the parent directory when resolving from a page,
  • the root directory when resolving from a module. This allows the driver more freedom in where modules are placed in the hierarchy without breaking page references.

Here's an example of hierarchy:

root-pkgs1.index
root-pkgs1.tutorial.index
root-pkgs1.lib1.Module1

References to directories

It's possible to reference directories that contain an index.mld page. The reference will point to that page. References to directories that do not contain an index.mld page fail to resolve.

In the following hierarchy, the reference {!tutorial} is ambiguous:

pkgs1/tutorial.mld
pkgs1/tutorial/index.mld

Odoc will warn about the ambiguous reference and prefer the page over the directory. This preference is important when referencing index pages for directories in rendered source code.
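A minimal Python sketch of this rule, assuming a directory is modelled as a dictionary mapping entry names to None (a page) or to a nested dictionary (a subdirectory). The function name resolve_in_directory is hypothetical and the warning text is illustrative only.

```python
import warnings


def resolve_in_directory(name, directory):
    """Resolve the unqualified reference {!name} inside `directory`."""
    page = f"{name}.mld"
    has_page = page in directory
    subdir = directory.get(name)
    has_dir_index = isinstance(subdir, dict) and "index.mld" in subdir
    if has_page and has_dir_index:
        warnings.warn(f"reference {{!{name}}} is ambiguous, using the page {page}")
    if has_page:
        return page  # the page is preferred over the directory
    if has_dir_index:
        return f"{name}/index.mld"
    return None  # a directory without index.mld fails to resolve
```

In the hierarchy above, the reference resolves to tutorial.mld and a warning is emitted. Without tutorial.mld, it resolves to tutorial/index.mld.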

Relative references

{!tutorial.index} and {!lib1.Module1} are valid references from root-pkgs1.index.

{!^.index} is a valid reference to root-pkgs1.index from root-pkgs1.tutorial.index: ^ represents the parent directory.

References from pages to modules

As modules are also placed in a flat namespace, {!Module1} is a valid reference from any page.

References from modules to pages

References from modules to pages are resolved from the root directory. For example, from Module1, {!index} finds root-pkgs1.index.
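The resolution rules for page references can be summarised with a short Python sketch that reproduces the examples of this section. The hierarchy is modelled as a set of fully-qualified identifiers, and the function name resolve_page_reference is hypothetical. Quoted segments and the dir- prefix are not handled.

```python
def resolve_page_reference(reference, from_id, from_kind, hierarchy):
    """Resolve a hierarchical reference such as "tutorial.index" or "^.index".

    from_id: fully-qualified identifier of the unit containing the reference.
    from_kind: "page" or "module".
    hierarchy: set of all known fully-qualified identifiers.
    """
    base = from_id.split(".")
    if from_kind == "page":
        base = base[:-1]  # start from the parent directory of the page
    else:
        # From a module, start from the enclosing root directory.
        while base and not base[-1].startswith("root-"):
            base.pop()
    for segment in reference.split("."):
        if segment == "^":
            base = base[:-1]  # go up to the parent directory
        else:
            base = base + [segment]
    candidate = ".".join(base)
    return candidate if candidate in hierarchy else None
```

The starting point is the only difference between the two kinds of units: the parent directory for a page, the root directory for a module.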

New syntaxes in references

The following syntax is added to references:

  • {!root} represents the root page
  • {!root(Module)} represents the root page of the referenced module. The module is first resolved, then the suffix is removed until an identifier to a root is found.
  • {!dir-dirname}, {!dir:dirname} Prefix to disambiguate directory segments in a reference. For example {!a.b} could be the page b in directory a or the label b in the page a. Note that directory references won't resolve unless the directory has an index page.
  • {!^.page} Reference through the parent directory of the current directory.

To add references to source, we could add the following syntax:

  • {src! Module.foo}
  • {!src(Module.value)} Reference to the source code of a value.

TODO: decide between the two syntaxes above.

Pages named index.mld

Pages named index have a special meaning in some cases:

  • In the sidebar. Other pages not named index.mld in the same directory are indented relative to it.
  • In the breadcrumbs. The "index.mld" segment is removed from the breadcrumbs.
  • When resolving a reference to a directory. If the directory contains a page named index, it is linked to, otherwise the reference fails to resolve.

Other than that, pages named index behave like the other pages. In particular, they are siblings to the other pages in the same directory.

The breadcrumbs for a page contain the .mld to disambiguate pages and directories. For example:

  • The breadcrumbs for the page pkg1.tutorial are pkg1 > tutorial.mld
  • The breadcrumbs for the page pkg1.tutorial.index are pkg1 > tutorial
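The breadcrumb rule can be written as a small Python function. This is only a sketch of the convention described here: the function name breadcrumbs is hypothetical, and the special handling of root directories is ignored.

```python
def breadcrumbs(page_id):
    """Breadcrumbs of a page given its dotted identifier."""
    *directories, leaf = page_id.split(".")
    if leaf == "index":
        # The index segment is dropped: the page stands for its directory.
        return " > ".join(directories)
    # Other pages keep the .mld suffix to tell them apart from directories.
    return " > ".join(directories + [leaf + ".mld"])
```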

Pages named index.mld have a special meaning in the sidebar. For example, given this hierarchy:

pkgs1/
    index.mld
    tutorial.mld
    tutorial/
        index.mld
    examples/
        example1.mld

The sidebar of pkgs1/index.mld might look like this:

Title of pkgs1/index.mld
    Title of pkgs1/tutorial.mld
    Title of pkgs1/tutorial/index.mld
    _examples_
        Title of pkgs1/examples/example1.mld

Directories containing other pages but no index.mld are allowed. When the sidebar would render a directory that does not have an index.mld, it instead renders the list of all the pages in that directory. This avoids having pages that are unreachable from the sidebar. It is reasonable for a driver to generate missing index.mld pages but not mandatory.
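The following Python sketch reproduces the sidebar example above. It assumes a directory is modelled as a dictionary mapping entry names to None (a page) or to a nested dictionary (a subdirectory), listed in the order they should appear. The function name render_sidebar and the "Title of ..." placeholders are illustrative only.

```python
def render_sidebar(path, directory, depth=0):
    """Render the sidebar of `directory` as a list of indented lines."""
    indent = "    " * depth
    if "index.mld" in directory:
        lines = [f"{indent}Title of {path}/index.mld"]
    else:
        # No index page: show the directory name, then list its pages.
        lines = [f"{indent}_{path.rsplit('/', 1)[-1]}_"]
    for name, child in directory.items():
        if name == "index.mld":
            continue
        if child is None:
            # Pages other than index are indented relative to it.
            lines.append(f"{indent}    Title of {path}/{name}")
        else:
            lines.extend(render_sidebar(f"{path}/{name}", child, depth + 1))
    return lines


pkgs1 = {
    "index.mld": None,
    "tutorial.mld": None,
    "tutorial": {"index.mld": None},
    "examples": {"example1.mld": None},
}
print("\n".join(render_sidebar("pkgs1", pkgs1)))
```

The examples directory has no index.mld, so its name is shown and its pages are listed under it, which keeps example1.mld reachable.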

Source code

This source code spec diverges from https://github.com/ocaml/odoc/pull/1067, which shouldn't be taken into account.

Source trees are placed in the hierarchy, like pages.

The compilation pipeline for source code is broken down into:

  • odoc compile-impl --id=root-pkgs1.lib1.Foo --src-id='root-pkgs1.src.lib1."foo.ml"' -R sp/ -I sp/root-pkgs1/lib1 lib1/foo.cmt
    The compile-impl command compiles the implementation of a module and outputs a dedicated .odoc file. The argument given to --id is the same as the one given when compiling the interface with odoc compile. Internally, however, the identifiers differ as they are of different kinds. The unit contains the module's shape, the UID-to-ID table and typedtree information. It is also used for the occurrences feature. This command does not take the source file. Compiled units are prefixed with impl-.
  • odoc compile-src --id='root-pkgs1.src.lib1."foo.ml"' -R sp/ --impl=sp/root-pkgs1/lib1/impl-foo.odoc
    Materializes the source code of a module into the hierarchy. Source units are given an identifier and placed in the hierarchy, like pages and assets. The implementation is passed through --impl. If --impl is not passed, only syntax highlighting is applied. Syntax highlighting is supported for source files matching *.ml, *.mli, dune, dune-project, *.opam. Compiled units are prefixed with src-.
  • link: impl- and src- units go through the link phase. Linking impl- units is useful only if count-occurrences is enabled, as the linked unit is not used in the generate phase.
  • odoc html-generate --source lib1/foo.ml sp/root-pkgs1/lib1/src-foo.odocl
    The --source option gives the path to the source file. It is valid only if the unit is a src- unit.

Note: asset- and src- are very similar except that src- units are later processed while asset- units are untouched.

Sources might conflict with pages or assets, for example the source file lib1/dune generates a file html/pkgs1/lib1/dune.html, which would conflict with the page lib1/dune.mld or the asset lib1/dune.html. The behavior is undefined.

Reference

Source files can be referenced similarly to pages. For example: {!src.lib1."foo.ml"} is a valid reference from a module.

odoc compile-index

This command generates and then compiles a page that lists the content of the directory it is placed in.

odoc compile-index --id=root-pkgs1.src.lib1.index -R sp/

The command has no dependency. The directory is listed and the page is actually generated during the link phase. It is commonly used to generate index pages in source trees. The generated pages can be referenced like any other page.

Assets

Assets are placed in the hierarchy, like pages. Asset units are prefixed with asset-.

Assets are compiled, linked and generated with these new commands:

  • odoc compile-asset --id=root-pkgs1."img.png" -R sp/
    This creates a unit materializing the asset in the hierarchical search path. References to assets are resolved like pages. The unit contains the identifier of the asset.
  • odoc link -R sp/ sp/root-pkgs1/asset-img.png.odoc
    This command does nothing but is allowed and might do something in the future.
  • odoc html-generate --asset src/img.png -o html/ sp/root-pkgs1/asset-img.png.odocl
    This copies the asset into the output directory at the right location. The --asset option gives the path to the source of the asset. It is valid only if the unit is an asset- unit.

Assets that conflict with generated files are allowed due to the lack of a satisfying detection mechanism. Behavior in case of a conflict with a generated file is undefined.

Asset references

Lookup is identical to pages:

  • Assets are placed in the hierarchy.
  • Unqualified references are resolved locally.
  • Relative references are allowed.

Compatibility with older drivers

--pkg

When --pkg is passed:

  • --id, -R and -P are disallowed and result in an error if passed.
  • A flat hierarchy is assumed.
  • The given pkg name is used as the root identifier.
  • Pages are looked up in the search path specified with -I.
  • The default path for -o is in the same directory as the input.

Passing neither --pkg nor --id is disallowed.

odoc compile --pkg=pkg1 -I . foo.cmti
odoc compile --pkg=pkg1 -I . index.mld

are equivalent to:

odoc compile --id=root-pkg1.Foo -I . foo.cmti
odoc compile --id=root-pkg1.index -I . index.mld
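The mapping from --pkg to an identifier can be sketched in Python as follows. This is inferred from the two examples above only: the function name id_from_pkg is hypothetical, and the assumption is that any input other than a .mld file is a module whose name is the capitalized file stem.

```python
import os


def id_from_pkg(pkg, input_file):
    """Identifier that --pkg=<pkg> would be equivalent to for `input_file`."""
    stem, extension = os.path.splitext(os.path.basename(input_file))
    if extension == ".mld":
        leaf = stem  # pages keep their name
    else:
        leaf = stem[:1].upper() + stem[1:]  # modules are capitalized
    return f"root-{pkg}.{leaf}"
```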

--parent and --child

These options are removed. There is no equivalent; drivers must be rewritten.

Count occurrences

Count occurrences now acts on linked impl-*.odocl files.

Support commands

-deps and -targets commands are added for the new compile commands:

  • Add compile-impl-deps
  • Add compile-impl-targets
  • Add compile-src-deps
  • Add compile-src-targets
  • Add compile-asset-deps
  • Add compile-asset-targets

Existing support commands are updated:

  • link-deps supports impl-, src- and asset- units.
  • compile-deps and compile-targets do not take source files into account.

List of breaking changes

  • Dune used to rename the page named index.mld into <package name>.mld and this was used in references in the wild. This is now not part of the convention.
  • The entire source rendering CLI changes in a breaking way.
  • --parent and --child are removed.

Conventions

Conventions for installed packages

Installed packages install:

  • Pages and assets "in their hierarchy" in the doc/<package-name>/odoc-pages/ directory.
  • Source code is placed in doc/<package-name>/sources/, if possible respecting the original hierarchy.
  • The mapping between module names and the corresponding source file is written in doc/<package-name>/sources/sources.sexp.

doc/<pkg>/sources/src/lib1/foo.ml
doc/<pkg>/sources/sources.sexp    # contains the map between ml and unit name.

Convention for drivers

The driver should use the following basic structure for the documentation of a package (given as a hierarchical search path):

<...>/root-<pkgname>/
<...>/root-<pkgname>/<modules>
<...>/root-<pkgname>/<pages hierarchy>
<...>/root-<pkgname>/sources/<source hierarchy>

where:

  • <...> is unspecified and can be decided by the driver
  • <pages hierarchy> is the page hierarchy as found in doc/<package-name>/odoc-pages.
  • <source hierarchy> is the hierarchy as found in the doc/<package-name>/sources. The map between module names and source files is found in sexp format in the doc/<package-name>/sources/sources.sexp file.

Libraries

We might want in the future to decide in which directory the modules are put. This could be done by introducing a new .sexp file, which specifies the directory of each module. These intermediate directories could be thought of as libraries. The page hierarchy would merge with this new hierarchy. The page libname/index.mld would be the entry point of the library.

Ideas for later

Specifying the search path to the link command is not satisfying:

  • The number of paths is huge.
  • The driver decides what references are possible by constructing the search path. This cannot be specified by Odoc and thus cannot be improved in Odoc. The different drivers might differ.

We propose the following strategy to resolve modules during the link phase using a single -R flag:

  • The module search path is not needed, though accepted for compatibility and precedence. It is looked up first.
  • The directory of the current unit is searched.
  • The parent directories, up to the root of the current unit, are searched. The sibling directories at every level are searched recursively. The order in which directories appear in the sidebar is used to order the search.
  • The directories of all the dependencies looked up during compile are searched. That is, all the modules specified in the imports field as well as other root modules that were looked up during compile. This search is not recursive.

This gives more control to Odoc, a more defined API to users and less responsibility to drivers.

TODO:

  • Hierarchical search path is also used to lookup modules during link.
  • When the hierarchical search path is used, -I is not accepted in link.

Example

For example, the package a depends on package b. From the module Bar, the search order is A, C, B, F, D, E.

sp/
  a/
    root-a/
      lib1/
        lib1_unix/
          q/s/d/f/g/h/C
          B
        lib1_eio/
          F
        lib1_concrete/
          Bar
        A
      lib2/
        D
    root-b/
      lib1/
        E
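The search order of this example can be reproduced with the following Python sketch. It is one possible reading of the proposed strategy, not a specification. Directories are nested dictionaries in which None marks a module. The listing order stands in for the sidebar order. The modules placed directly in an ancestor directory are visited before its sibling directories. The function names are hypothetical.

```python
def modules_in(directory, recursive):
    """Yield the modules of `directory` in listing order."""
    for name, child in directory.items():
        if child is None:
            yield name
        elif recursive:
            yield from modules_in(child, recursive=True)


def search_order(root, path_to_unit, unit, dependency_dirs):
    """Order in which modules are searched from `unit`.

    root: the root directory of the current unit.
    path_to_unit: directory names leading from the root to the unit.
    dependency_dirs: directories of the compile-time dependencies.
    """
    chain = [root]
    for segment in path_to_unit:
        chain.append(chain[-1][segment])
    # 1. The directory of the current unit.
    order = list(modules_in(chain[-1], recursive=False))
    # 2. The parent directories up to the root, each followed by the
    #    sibling directories at that level, searched recursively.
    for depth in range(len(chain) - 2, -1, -1):
        parent, came_from = chain[depth], path_to_unit[depth]
        order.extend(modules_in(parent, recursive=False))
        for name, child in parent.items():
            if child is not None and name != came_from:
                order.extend(modules_in(child, recursive=True))
    # 3. The directories of the dependencies, not recursively.
    for directory in dependency_dirs:
        order.extend(modules_in(directory, recursive=False))
    return [module for module in order if module != unit]


root_a = {
    "lib1": {
        "lib1_unix": {"q": {"s": {"d": {"f": {"g": {"h": {"C": None}}}}}}, "B": None},
        "lib1_eio": {"F": None},
        "lib1_concrete": {"Bar": None},
        "A": None,
    },
    "lib2": {"D": None},
}
root_b_lib1 = {"E": None}
print(search_order(root_a, ["lib1", "lib1_concrete"], "Bar", [root_b_lib1]))
```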

TODO: Define how to define a reference to a module when there is a name clash: two modules with the same name in two different libraries in the same package.

If we change the layout to address the issue above, we can use redirects to keep old ocaml.org links valid. Replicate the hierarchy in installed packages.
