
Building package docs

Contacts: @julow @jonludlam @lortex

Overview

This document describes how we build and update the package docs on www.ocaml.org.

The documentation is produced from the built libraries using odoc, a tool designed from the ground up to understand how OCaml libraries are assembled and used. It supports links between libraries in different packages and ensures that each link ends up at exactly the right place. This is harder than it sounds, because the correct target depends on the precise versions of all the dependencies used to build the package being documented.

These documents will be placed on the ocaml.org website, combined with the information currently shown on https://opam.ocaml.org/packages/. This will be the canonical source of information about packages published in opam.

Voodoo

How it works

Building package docs with ocaml-docs-ci and Voodoo is an incremental process: as new packages are added to opam-repository, the work required is restricted to the new packages only. Occasionally we may rebuild large chunks of the website, for example when a new OCaml release is made, but updating the website for new packages should be a quick process.

Stages of the process:

  1. Wait until triggered by a change being pushed to opam-repository
  2. Decide what to build, recording decisions made. Concretely, we record the fact that we intend to build specific sets of opam packages at specific versions with a specific version of OCaml.
  3. Run the build with ocluster and extract all the information we need from each build, without making decisions about how it's presented. (voodoo-prep)
  4. Compile the obtained artifacts with odoc, using information about the final package hierarchy, and generate HTML files for each package target. (voodoo-do)
  5. Generate index pages for all packages, and the global index page, linking all HTML files together. (voodoo-indexes)
  6. Go to step 1.

This is an OCurrent pipeline, using an SSH server to store the generated artifacts.

  • voodoo-prep:
    • Input: a list of packages to prep, and the associated universe id
    • Output: prep/universes/<universe_id>/<package name>/<package version> directory
  • voodoo-do:
    • Input: the compilation result of a package's dependencies
    • Input: the prep folder of the package to do
    • Compile the package artifacts into odoc, odocl and html files.
    • Output: compile/universes/<universe_id>/<package name>/<package version>/: odoc and odocl files
    • Output: html/universes/<universe_id>/<package name>/<package version>/: html files
    • if blessed:
      • Output: compile/packages/<package name>/<package version>/
      • Output: html/packages/<package name>/<package version>/
  • voodoo-indexes:
    • Generate the index pages as .mld files
    • Compile them using odoc

Dependency universes

A package is not defined just by the tuple of package name and package version. Its documentation may also depend on the exact versions of the packages it depends upon. For example, consider a package containing an mli file such as:

module M : Set.S with type elt = int

The expansion of this will depend on which version of the standard library it was compiled against.

A particular package is therefore specified by the triple of the package name, the package version, and the 'dependency universe hash'. This hash is computed in the following way:

  1. Find all dependencies (including transitive dependencies, though not going 'through' the ocaml package) using opam.
  2. Sort and write them to a string, one package per line, in the format <package name>.<version>
  3. Compute the MD5 hash of the string.

For example:

conf-m4.1
ocaml.4.11.1
ocamlbuild.0.14.0
ocamlfind.1.8.1
topkg.1.0.3

These are the dependencies of the package astring.0.8.5 on this particular system. The hash of this list should be 92edc0c1c4ec93b2f61fdd7fc9491460.
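The three steps above can be sketched with the OCaml standard library's Digest module, which computes MD5. This is a minimal sketch: the dependency list is passed in directly rather than computed with opam, and the exact line-termination convention (here, every line ends with a newline) is an assumption, so the result is not guaranteed to reproduce the hash quoted above.

```ocaml
(* Sketch of the 'dependency universe hash': sort the "<name>.<version>"
   strings, write one per line, and take the MD5 digest of the result.
   Assumption: every line, including the last, ends with '\n'. *)
let universe_hash (deps : string list) : string =
  let sorted = List.sort_uniq String.compare deps in
  let buf = Buffer.create 256 in
  List.iter
    (fun dep ->
      Buffer.add_string buf dep;
      Buffer.add_char buf '\n')
    sorted;
  Digest.to_hex (Digest.string (Buffer.contents buf))

let () =
  (* The dependencies of astring.0.8.5 listed above. *)
  print_endline
    (universe_hash
       [ "conf-m4.1"; "ocaml.4.11.1"; "ocamlbuild.0.14.0";
         "ocamlfind.1.8.1"; "topkg.1.0.3" ])
```

Sorting first makes the hash independent of the order in which opam reports the dependencies.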

The type to uniquely identify a package is therefore given by:

type universe_id = Digest.t
type package_name = string
type package_version = string
type package = universe_id * package_name * package_version

Handling packages, sub-packages and libraries

Because odoc handles include paths in the same way that OCaml does, and because we would like references to behave in the same familiar way that normal OCaml paths do, it makes sense to keep the odoc files in a directory structure identical to that of the associated cmt, cmti and cmi files. This does not imply that the directory structure of the output HTML files (or man/LaTeX files) must mirror it. One consequence, however, is that we cannot determine sub-packages from this directory structure.

As examples of the various ways complex packages are laid out, we have the following case studies:

Case study: yaml

  • Compiled with dune.
  • Contains multiple packages, including a sub-sub-package:
yaml
yaml.bindings
yaml.bindings.types
yaml.c 
yaml.ffi 
yaml.types 
yaml.unix 
  • There is a single META file for yaml, describing every sub-package (not one META file per sub-package).
  • Each sub-package corresponds with precisely one archive
  • Each package has an isolated include directory
  • All subdirs are underneath ~/.opam/$switch/lib/yaml

Case study: oasis

  • Not compiled with dune
  • Contains multiple packages:
oasis
oasis.base
oasis.builtin-plugins
oasis.cli
oasis.dynrun
  • Two META files - one in ~/.opam/$switch/lib/plugin-loader and the other in ~/.opam/$switch/lib/oasis
  • Each sub-package corresponds with precisely one archive
  • Multiple packages share the same directory

Case study: dose3

  • Not compiled with dune
  • Contains multiple packages:
dose3
dose3.algo
dose3.common
dose3.csw
dose3.debian
dose3.doseparse
dose3.doseparseNoRpm
dose3.npm
dose3.opam
dose3.pef
dose3.rpm
dose3.versioning
  • One META file, in ~/.opam/$switch/lib/dose3
  • The dose3 package contains multiple archives - "common.cma algo.cma versioning.cma pef.cma debian.cma csw.cma opam.cma npm.cma"
  • Sub-packages also contain the same archives - e.g. dose3.algo specifies algo.cma

Case study: stdlib and associated libraries

  • Not compiled with dune
  • Contains multiple libraries, the exact list depends on the OCaml version:
bigarray
bytes
compiler-libs
dynlink
ocamldoc
raw_spacetime
stdlib
str
threads
unix
  • META files are not distributed with the package; they come with ocamlfind
  • META files are in isolated directories, but the include directories of many of these packages overlap

Sub-packages observations

  • Different sub-packages containing the same libraries is unusual.

Questions:

  • What do we do for something like dose3?

    • Can we just do nice docs for dune-based projects? Probably not, not least because of Daniel's packages, which are not built with dune.
    • How do we figure out which packages can be documented nicely? (e.g. no overlapping archives)
  • What do we do for the OCaml libraries (stdlib, seq, raw_spacetime, str, etc.)? These don't have opam packages, and their META files mostly come from the ocamlfind package.

  • What other packages will be painful? We have the 'corpus' compiled already, but it is missing files such as META, dune-package and so on.

Detecting sub-packages

We should detect sub-packages and group modules under them on package pages. This information is important, for example, to be able to use them from Dune.

  • Some packages use subdirectories (eg. yaml)
  • Some packages have one archive per sub-package (eg. logs)
  • Some packages have intersecting archives (eg. dose3)
  • Some packages are "unwrapped" (eg. base)
  • One package has two lib/* directories and two META files (oasis). Maybe:
    • treat that as two different packages, one of which opam doesn't know about?
    • find them using opam's .changes files and treat the second package as a sub-package?

Reliable ways to find them:

  • Querying ocamlfind is the only way to pair sub-package names with archives. The findlib library exposes a parser for META files.

    Currently, the CLI fails to print archives; for example, %A does not work in ocamlfind query -format "%p %d %A".

    Later, we'll also need assemble to create a nice-looking hierarchy for sub-packages, for example: .../<package>/<version>/<sub.package>/<modules>
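To illustrate pairing sub-package names with archives, and detecting the overlapping archives that make dose3 awkward, here is a sketch over a hypothetical, simplified tree standing in for a parsed META file. The meta type and both functions are illustrative assumptions, not findlib's API; a real implementation would build this tree from findlib's META parser.

```ocaml
(* Hypothetical, simplified view of a parsed META file: the archives of one
   node and its nested 'package "name" ( ... )' children. *)
type meta = {
  archives : string list;
  children : (string * meta) list;
}

(* Flatten the tree into (findlib name, archives) pairs,
   e.g. "dose3" and "dose3.algo". Requires OCaml >= 4.10. *)
let rec subpackages ~(prefix : string) (m : meta) : (string * string list) list =
  (prefix, m.archives)
  :: List.concat_map
       (fun (name, child) -> subpackages ~prefix:(prefix ^ "." ^ name) child)
       m.children

(* Archives claimed by more than one sub-package, sorted. A non-empty
   result flags a package (like dose3) whose sub-packages overlap. *)
let shared_archives (pairs : (string * string list) list) : string list =
  let counts = Hashtbl.create 16 in
  List.iter
    (fun (_, archives) ->
      List.iter
        (fun a ->
          let n = Option.value ~default:0 (Hashtbl.find_opt counts a) in
          Hashtbl.replace counts a (n + 1))
        (List.sort_uniq String.compare archives))
    pairs;
  Hashtbl.fold (fun a n acc -> if n > 1 then a :: acc else acc) counts []
  |> List.sort String.compare
```

For a dose3-like tree where both dose3 and dose3.algo list algo.cma, shared_archives returns ["algo.cma"]; for yaml-like packages with one archive per sub-package it returns [].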

Package content

  • Most packages don't have documentation pages, but they do have:
    • doc/$/README.* (.org or .md)
    • doc/$/LICENSE.*
    • doc/$/CHANGES.*
    • lib/$/META: Every package except the stdlib has one. It's the only way to know sub-packages.
    • lib/$/opam: Added by opam.
    • lib/$/dune-package: Added by dune, only in projects using dune. Contains the same information as META.
    • lib/$/**.ml?: Source files, intended to be read by Merlin, and possibly usable for "see code" links from the documentation (odoc should support that someday!).

A few packages have documentation intended to be read by Odoc:

  • doc/$/odoc-pages/index.mld: This is intended to be the entry point of the package's doc. assemble should use it as the package page, possibly modifying it to add a common header.
  • doc/$/odoc-pages/*.mld

Various other files we can find sometimes:

  • doc/$/*.ml: In dbuenzli's libraries, this is meant to be appended automatically at the end of index.mld.
  • share/$: Some packages install files here, but these are not intended to be read by users (e.g. Emacs/Vim plugins).

voodoo-prep adds some other information that may be useful:

  • List of dependencies to other packages

Ocaml-docs-ci

This is the incremental pipeline to build documentation.

Repo: https://github.com/ocurrent/ocaml-docs-ci

1. Track the opam-repository

val v : Git.Commit.t Current.t -> t list Current.t

val pkg : t -> OpamPackage.t

Given an opam repository commit, list all its packages.
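In opam-repository, each package version has its own packages/<name>/<name>.<version>/opam file. As a rough sketch of the listing step, assuming we already have the repository's file paths (for example from git ls-files), the package list can be recovered from the paths alone. packages_of_paths is a hypothetical helper; the real Track module returns OpamPackage.t values.

```ocaml
(* Recover (name, version) pairs from opam-repository file paths of the
   form "packages/<name>/<name>.<version>/opam"; other paths are ignored. *)
let packages_of_paths (paths : string list) : (string * string) list =
  List.filter_map
    (fun path ->
      match String.split_on_char '/' path with
      | [ "packages"; name; dir; "opam" ] ->
          let prefix = name ^ "." in
          let plen = String.length prefix in
          if String.length dir > plen && String.sub dir 0 plen = prefix then
            Some (name, String.sub dir plen (String.length dir - plen))
          else None
      | _ -> None)
    paths
```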

2. Solver

type t

type key

val keys : t -> key list

val get : key -> Package.t

val incremental :
  opam:Git.Commit.t Current.t ->
  Track.t list Current.t ->
  t Current.t

An incremental solver, performing opam-0install solves when new packages are added. After the solving step, we obtain a list of Package.t values, which correspond to packages and their associated universes.
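The incremental part boils down to solving only what is new. A minimal sketch, assuming packages are identified by "name.version" strings and that the set of already-solved packages is remembered between runs (the Set-based store and the step function are hypothetical, not how OCurrent actually caches results):

```ocaml
module S = Set.Make (String)

(* Returns the packages of the current commit that were never solved
   before (only these need an opam-0install solve), together with the
   updated set of solved packages. *)
let step ~(solved : S.t) (current : string list) : string list * S.t =
  let fresh = List.filter (fun pkg -> not (S.mem pkg solved)) current in
  (fresh, List.fold_left (fun acc pkg -> S.add pkg acc) solved fresh)
```

After a commit that only adds b.2, step returns just ["b.2"], so the cost of an update is proportional to the number of new packages.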

3. Jobs

type t = { install : Package.t; prep : Package.t list }

val schedule : targets:Package.t list -> t list

From the list of target packages, generate a list of prep jobs to perform. Each job consists of a single package to install and multiple packages to prep.
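Installing one package also installs its whole dependency universe, so a single job can prep many packages at once. The following greedy grouping is a hypothetical sketch of how targets could be batched, not the actual ocaml-docs-ci algorithm. Packages are simplified to string identifiers (assumed to include the universe hash, so equal identifiers mean identical builds), and prep holds identifiers rather than Package.t values.

```ocaml
(* Hypothetical simplified package: an identifier plus the identifiers of
   its transitive dependencies. *)
type package = { id : string; deps : string list }

type job = { install : package; prep : string list }

(* Greedy grouping: install the largest universes first, and let each job
   prep the target plus every dependency not already covered. *)
let schedule ~(targets : package list) : job list =
  let by_size =
    List.stable_sort
      (fun a b -> compare (List.length b.deps) (List.length a.deps))
      targets
  in
  let covered = Hashtbl.create 64 in
  let jobs =
    List.fold_left
      (fun jobs pkg ->
        if Hashtbl.mem covered pkg.id then jobs
        else begin
          let prep =
            List.filter (fun id -> not (Hashtbl.mem covered id)) (pkg.id :: pkg.deps)
          in
          List.iter (fun id -> Hashtbl.replace covered id ()) prep;
          { install = pkg; prep } :: jobs
        end)
      [] by_size
  in
  List.rev jobs
```

With targets a, b (depends on a), c (depends on a and b) and d (depends on a), this produces two jobs: install c to prep c, a and b, then install d to prep only d.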

4. Build and prepare artifacts

type t

val package : t -> Package.t

val folder : t -> Fpath.t

val artifacts_digest : t -> string

val v : voodoo:Voodoo.t Current.t -> digests:Folder_digest.t Current.t -> Jobs.t Current.t -> t list Current.t

Done via ocluster. Performs the prep step for one job, generating the prep folders of multiple packages. The Folder_digest.t value allows tracking of existing prep folders.

Prep data is stored in /prep/universes/<universe>/<name>/<version>/.

Should be updated when:

  • voodoo-prep changes (should not happen a lot)
  • the upstream /prep folder digest is invalidated
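One way to picture these two conditions is to key each prep folder on both inputs, so that a change to either yields a new key and forces the step to re-run. This is a hypothetical sketch; prep_key, needs_prep and the idea of recording the key are assumptions, not ocaml-docs-ci's actual caching mechanism.

```ocaml
(* Hypothetical cache key for a prep folder: it changes whenever the
   voodoo-prep version or the upstream /prep folder digest changes. *)
let prep_key ~(voodoo_prep : string) ~(folder_digest : string) : string =
  Digest.to_hex (Digest.string (voodoo_prep ^ "\n" ^ folder_digest))

(* Re-run the prep step unless a key was recorded and still matches. *)
let needs_prep ~recorded ~voodoo_prep ~folder_digest =
  match recorded with
  | None -> true
  | Some key -> key <> prep_key ~voodoo_prep ~folder_digest
```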

5. Compile

type t

val digest : t -> string

val artifacts_digest : t -> string

val is_blessed : t -> bool

val package : t -> Package.t

val folder : t -> Fpath.t

val odoc : t -> Mld.Gen.odoc_dyn

val v :
  voodoo:Voodoo.t Current.t ->
  digests:Folder_digest.t Current.t ->
  blessed:Package.Blessed.t Current.t ->
  deps:t list Current.t ->
  Prep.t Current.t ->
  t Current.t

Done via ocluster. Compile .odoc, .odocl and .html files for one package, given its prep result and the compile result of its dependencies.
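Because each compile job needs the compile results of its dependencies, packages are effectively processed in dependency order. In the pipeline this ordering comes from wiring deps into v; the sketch below only makes the constraint explicit with a depth-first topological sort over a hypothetical deps_of lookup.

```ocaml
(* Order packages so that every package comes after its dependencies.
   [deps_of] returns the direct dependencies of a package (hypothetical
   lookup). Raises [Failure] on a dependency cycle. *)
let compile_order ~(deps_of : string -> string list) (roots : string list) : string list =
  let visited = Hashtbl.create 64 in      (* fully processed *)
  let in_progress = Hashtbl.create 64 in  (* on the current DFS path *)
  let order = ref [] in
  let rec visit pkg =
    if Hashtbl.mem in_progress pkg then failwith ("dependency cycle at " ^ pkg);
    if not (Hashtbl.mem visited pkg) then begin
      Hashtbl.add in_progress pkg ();
      List.iter visit (deps_of pkg);
      Hashtbl.remove in_progress pkg;
      Hashtbl.add visited pkg ();
      (* All dependencies are already in [order], so [pkg] goes after them. *)
      order := pkg :: !order
    end
  in
  List.iter visit roots;
  List.rev !order
```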

Output (when blessed, otherwise replace packages by universes/<universe_id>):

  • generated index: /compile/packages/<name>/page-<version>.odoc(l)
  • odoc, odocl: /compile/packages/<name>/<version>/
  • html: /html/packages/<name>/<version>/
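The blessed/universe rule above can be captured in a small path helper. This is a hypothetical sketch (output_dir is not a function from ocaml-docs-ci), and it returns paths relative to the artifact root.

```ocaml
(* Hypothetical helper: where a package's compile or html output goes.
   Blessed packages live under packages/, others under
   universes/<universe_id>/. Paths are relative to the artifact root. *)
let output_dir ~(kind : [ `Compile | `Html ]) ~blessed ~universe ~name ~version =
  let root = match kind with `Compile -> "compile" | `Html -> "html" in
  let scope = if blessed then [ "packages" ] else [ "universes"; universe ] in
  String.concat "/" ((root :: scope) @ [ name; version ])
```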

6. Indexes

val v : Compile.t list Current.t -> unit Current.t

Given the list of the successfully compiled packages, generate the index pages and compile them to HTML. Done on the host machine.

Output:

  • /html/packages/index.html
  • /html/packages/<name>/index.html
  • /html/universes/index.html
  • /html/universes/<universe_id>/index.html
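As a sketch of the first output, the packages index can be generated as .mld text from the compiled (name, version) pairs and then compiled with odoc. The exact markup is an assumption: it uses odoc's light list syntax and {{:url}text} links, with links relative to /html/packages/. Versions are sorted lexically here, which is not a proper opam version ordering.

```ocaml
(* Sketch: build the text of a packages index page in odoc's .mld markup
   from (name, version) pairs, grouping versions under each package. *)
let packages_index (pkgs : (string * string) list) : string =
  let names = List.sort_uniq String.compare (List.map fst pkgs) in
  let line name =
    let versions =
      List.filter_map (fun (n, v) -> if n = name then Some v else None) pkgs
      |> List.sort_uniq String.compare
    in
    Printf.sprintf "- {{:%s/index.html}%s}: %s" name name
      (String.concat ", " versions)
  in
  String.concat "\n" ("{0 Packages}" :: "" :: List.map line names) ^ "\n"
```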

Voodoo-prep

Current repo: https://github.com/ocaml-doc/voodoo

The job as submitted by the pipeline will install a specific set of packages. Once the install has completed, voodoo-prep, the binary, will be executed.

Voodoo-prep (the tool)

Voodoo-prep is run after the build of a particular set of packages has been completed. It is run in the environment in which the build succeeded.

We iterate through all of the packages installed in the opam environment, and go through the files installed as part of each package, as recorded by ~/.opam/<switch>/.opam-switch/install/<package>.changes. The tool collects the following types of files:

File type | Reason
--- | ---
.cmti | This is what odoc would prefer to operate on.
.cmt | Odoc will use cmt files for analysing usage of identifiers.
.cmi | Only if the above two files don't exist will odoc resort to using cmi files.
.mld, .md, examples, contents of doc opam dir | These are documentation files.
.cm(x)a info, dune-package, META | voodoo-do may use the info from these files to organise the documentation into libraries/sub-packages.

All of the above files are copied into the following path:
prep/universes/<hash>/<package>/<version>/...

where the ... represents the identical path the files appear under ~/.opam/<switch>/. hash, package and version are the triple that uniquely identifies a package as described above.

The info contained in cmxa/cma libraries installed as part of the package is collected as follows:

ocamlobjinfo <lib>.cm{a,xa}
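The archive info is what lets voodoo-do tell which modules belong to which library. A minimal sketch of extracting the module names from captured ocamlobjinfo output, assuming each compilation unit of a .cma appears on a line of the form "Unit name: Foo"; the output format may vary across OCaml versions and differs for .cmxa files.

```ocaml
(* Sketch: extract compilation-unit names from the text printed by
   [ocamlobjinfo foo.cma]. Assumes lines of the form "Unit name: Foo". *)
let unit_names (objinfo_output : string) : string list =
  let marker = "Unit name: " in
  let mlen = String.length marker in
  String.split_on_char '\n' objinfo_output
  |> List.filter_map (fun line ->
         let line = String.trim line in
         if String.length line > mlen && String.sub line 0 mlen = marker then
           Some (String.trim (String.sub line mlen (String.length line - mlen)))
         else None)
```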

The opam file is collected from ~/.opam/<switch>/.opam-switch/packages/<package_name>.<package_version>/opam.

Version.mld

The contents of this file will be rendered when someone visits the URL http://docs.ocaml.org/packages/$package/$version/, so it is the landing page for the package as a whole and needs to contain all the important information. It should contain:

  • Name
  • List of modules

Missing information:

  • Findlib package names for every sub-package. These may not correspond exactly to the opam name. This is useful for copy/pasting into Dune files.
  • Links to the rendered README, CHANGELOG and LICENSE
  • Package dependencies (references to other packages)
  • List of toplevel modules sorted by sub-package. This could be improved by showing each module's doc; see https://github.com/ocaml/odoc/issues/297 and https://github.com/ocaml/odoc/issues/478. Currently sub-packages are not recognized. Some code looks at the directory tree, but that approach is wrong; it was only meant to demonstrate how the output should look.

The package may contain an index.mld file. It must be appended to the end of version.mld with its level-0 headings removed. This way, every package page shares a common header.
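A minimal sketch of this concatenation, under the simplifying assumption that each level-0 heading ({0 ...} or {0:label ...}) starts on its own line; headings spread over several lines or embedded mid-line are not handled. append_index is a hypothetical name.

```ocaml
(* True for lines starting a level-0 heading: "{0 Title}" or "{0:label Title}". *)
let is_level0_heading line =
  let line = String.trim line in
  String.length line >= 3
  && String.sub line 0 2 = "{0"
  && (line.[2] = ' ' || line.[2] = ':')

(* Append a package's own index.mld to the generated version.mld,
   dropping its level-0 headings so the common header comes first. *)
let append_index ~(version_mld : string) ~(index_mld : string) : string =
  let body =
    String.split_on_char '\n' index_mld
    |> List.filter (fun l -> not (is_level0_heading l))
    |> String.concat "\n"
  in
  version_mld ^ "\n" ^ body
```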

The following section is experimental and is still at the prototype stage

Examples follow:

{0 Package 'yaml' version 2.5.0}

{1 [yaml]}

{!modules: Yaml}

{1 [yaml.bindings]}

{!modules: Yaml_bindings}

...

This is the current output; it might need some improvements:

  • The quotes in the title are not nice
  • A lot of information is missing (see the list of missing information above)

There is one section per sub-package, titled with its findlib name and containing its list of modules. See Detecting sub-packages above.
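The example version.mld above can be generated from a list of (findlib name, modules) pairs. This is an illustrative sketch (version_mld is a hypothetical name); it drops the quotes around the package name, which were noted above as not nice.

```ocaml
(* Sketch: render one section per sub-package, titled with its findlib
   name and listing its modules with odoc's {!modules: ...} directive. *)
let version_mld ~(package : string) ~(version : string)
    (subpackages : (string * string list) list) : string =
  let section (findlib_name, modules) =
    Printf.sprintf "{1 [%s]}\n\n{!modules: %s}" findlib_name
      (String.concat " " modules)
  in
  String.concat "\n\n"
    (Printf.sprintf "{0 Package %s version %s}" package version
     :: List.map section subpackages)
  ^ "\n"
```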

It would be useful to be able to tell which sub-package contains which module from the URL and breadcrumbs.

Suggested layout:

/packages/$package/$version/TopLevelModules/index.html
/packages/$package/$version/$subpackage/SubPackageModule/index.html

For example for yaml:

/packages/yaml/2.1.0/Yaml/index.html
/packages/yaml/2.1.0/Yaml/Stream/index.html
/packages/yaml/2.1.0/yaml.bindings/Yaml_bindings/index.html
/packages/yaml/2.1.0/yaml.bindings.types/Yaml_bindings_types/index.html

The example above makes an exception for the "main" library. This is done only if that library has the same name as the opam package (which is common under Dune).
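The suggested layout, including the exception for the main library, fits in a few lines. module_url is a hypothetical helper; it reproduces the yaml examples above.

```ocaml
(* Sketch of the suggested URL layout. Modules of the "main" library, whose
   findlib name equals the opam package name, go directly under the
   version; modules of other sub-packages go under a directory named after
   the sub-package. *)
let module_url ~package ~version ~subpackage (module_path : string list) =
  let prefix =
    if subpackage = package then [ "packages"; package; version ]
    else [ "packages"; package; version; subpackage ]
  in
  "/" ^ String.concat "/" (prefix @ module_path @ [ "index.html" ])
```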

Dependencies

Currently, prep doesn't collect dependencies; assemble computes them instead.

Open questions

Handling of 'special' packages:

  • ocaml-secondary-compiler
  • ocamlfind-secondary

These two are currently simply blacklisted and removed from universe calculations.

  • conf-*

These currently don't produce anything but add many extra steps. Should they be blacklisted?
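A sketch of the filtering applied to package names before computing a universe. The two blacklisted names come from the text above; whether conf-* packages should also be dropped is the open question, so it is an optional parameter here. keep_in_universe is a hypothetical name.

```ocaml
(* Packages removed from universe calculations (from the list above). *)
let blacklist = [ "ocaml-secondary-compiler"; "ocamlfind-secondary" ]

let has_prefix ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Keep a package name unless it is blacklisted or, when [drop_conf] is
   set, a conf-* package. *)
let keep_in_universe ?(drop_conf = false) (name : string) : bool =
  not (List.mem name blacklist || (drop_conf && has_prefix ~prefix:"conf-" name))
```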

  • ocaml-base-compiler / ocaml

Currently the Stdlib docs are under ocaml-base-compiler. We probably want them under 'ocaml' instead, or maybe elsewhere?

'Blessed' status

Further thoughts

This section is experimental and is still at the prototype stage

  • Extra click in wrapped libraries: Some libraries have one top-level module, which contains the whole library as submodules. This module is generated by Dune and is often not very useful in the doc, as it's an unordered list of modules. We could inline it into the package page and avoid an unnecessary click. However, some packages document this module carefully, and it sometimes contains types and values (for example base).

  • weird packages:

    • ocaml has topdirs.cmti in 2 places: lib/ocaml and lib/ocaml/compiler-libs/
    • ocaml-compiler-libs has many unresolved module aliases (expected because it's a namespacing package) - thus odoc compile-deps will miss many of its dependencies, which will need to come from the opam metadata (likely why odig fails to link them)