
Outreachy

Sign up as mentor

Schedules/deadlines

Past OCaml projects

This is a list of most of the OCaml Outreachy projects that have been or will be submitted.

Outreachy Winter 2023/24

Outreachy Summer 2023

Persistent storage in MirageOS

Project short title

Persistent storage in MirageOS

Long description

MirageOS is a library operating system implemented in the functional programming language OCaml. Applications written for Mirage are compiled into an operating system with just the required device drivers and a single address space that only runs your application. Often these applications are run as virtual machines, providing strong isolation from other applications running on the same machine. A number of different hypervisors and tenders are supported, such as Xen, KVM with virtio, KVM with a Solo5 tender, and BSD bhyve, as well as running as a regular Unix binary or a seccomp-sandboxed Linux binary.

While MirageOS has support for block devices and a few filesystems such as tar and FAT16, the focus has been more on using remote storage such as git. Remote storage has a number of benefits, but it introduces latency and more complexity in setup, operations, and error handling in the applications.

This project’s focus is on improving (local) persistent storage in MirageOS. We will focus on partition tables (MBR, GPT), improving existing filesystem implementations, and a swap-like abstraction on block devices. Example unikernels will be developed as well, serving both as documentation and as functional tests.
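To make the partition-table part more concrete, here is a minimal sketch of reading the four primary entries of a classic MBR. It assumes the 512-byte sector has already been read from a block device and uses only the OCaml standard library; the `partition` record and `parse_mbr` function are hypothetical names, not the API of mirage-block-partition or any existing MBR library.

```ocaml
(* Hypothetical representation of one primary MBR partition entry. *)
type partition = {
  bootable : bool;
  part_type : int;   (* e.g. 0x83 = Linux, 0x0c = FAT32 (LBA) *)
  first_lba : int64; (* first sector of the partition *)
  sectors : int64;   (* number of sectors *)
}

(* Parse the four primary entries of a 512-byte MBR sector. Entries start
   at offset 446 and are 16 bytes each; the sector must end with the boot
   signature 0x55 0xAA. Unused entries (type 0) are dropped. *)
let parse_mbr (sector : Bytes.t) : (partition list, string) result =
  if Bytes.length sector <> 512 then Error "MBR sector must be 512 bytes"
  else if Bytes.get_uint8 sector 510 <> 0x55 || Bytes.get_uint8 sector 511 <> 0xAA
  then Error "missing 0x55AA boot signature"
  else
    (* Little-endian 32-bit fields are unsigned; widen to int64. *)
    let u32 off =
      Int64.logand (Int64.of_int32 (Bytes.get_int32_le sector off)) 0xFFFF_FFFFL
    in
    let entry i =
      let off = 446 + (16 * i) in
      {
        bootable = Bytes.get_uint8 sector off = 0x80;
        part_type = Bytes.get_uint8 sector (off + 4);
        first_lba = u32 (off + 8);
        sectors = u32 (off + 12);
      }
    in
    Ok (List.filter (fun p -> p.part_type <> 0) (List.init 4 entry))
```

A GPT parser would follow the same pattern, but reads its header from the second sector and its entries from a separate table.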

Internship tasks

  • Write example unikernel(s) using partition tables and file systems. As a base a repository for block device partitioning exists: https://github.com/reynir/mirage-block-partition
  • Improve MBR support and integrate MBR “devices” into the Mirage tool. Rewrite the example unikernels to use the now built-in devices.
  • Implement GPT (GUID Partition Table). There is existing work on which an implementation can be built. This task is optional and can be skipped.
  • Update the FAT implementation and fix bugs. The implementation assumes properties that only hold for the Xen backend, which can lead to bugs, and it has suffered from bitrot.
  • Implement a swap-like abstraction on top of a block device to keep track of temporary allocations.
  • Use the above swap-like abstraction with the read/write tar implementation to allow downloading archives and writing them to the filesystem only if they match a checksum. All in a streaming fashion.
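The last two tasks can be sketched together. This is a hypothetical, in-memory illustration only: `create`, `alloc`, `free` and `stage` are made-up names, the block device is reduced to a `write` callback, each chunk is assumed to fit in one block, and Adler-32 (RFC 1950) stands in for whatever checksum a real implementation would verify (for instance SHA-256).

```ocaml
(* Hypothetical swap-like allocator: tracks which blocks of a scratch
   region are currently used for temporary data. *)
type t = { used : bool array; mutable free_count : int }

let create ~blocks = { used = Array.make blocks false; free_count = blocks }

(* First-fit allocation; returns None when the scratch region is full. *)
let alloc t =
  if t.free_count = 0 then None
  else begin
    let rec find i = if t.used.(i) then find (i + 1) else i in
    let i = find 0 in
    t.used.(i) <- true;
    t.free_count <- t.free_count - 1;
    Some i
  end

let free t i =
  if not t.used.(i) then invalid_arg "free: block not allocated";
  t.used.(i) <- false;
  t.free_count <- t.free_count + 1

(* Adler-32, updated incrementally one chunk at a time. *)
let adler32_init = (1, 0)

let adler32_update (a, b) chunk =
  let a = ref a and b = ref b in
  String.iter
    (fun c ->
      a := (!a + Char.code c) mod 65521;
      b := (!b + !a) mod 65521)
    chunk;
  (!a, !b)

let adler32_final (a, b) = (b lsl 16) lor a

(* Stream chunks into scratch blocks while checksumming them. On a
   mismatch (or when out of space) every staged block is released. *)
let stage t ~write ~expected (chunks : string Seq.t) =
  let rec go acc blocks chunks =
    match chunks () with
    | Seq.Nil ->
      if adler32_final acc = expected then Ok (List.rev blocks)
      else begin
        List.iter (free t) blocks;
        Error `Checksum_mismatch
      end
    | Seq.Cons (chunk, rest) -> (
      match alloc t with
      | None ->
        List.iter (free t) blocks;
        Error `Out_of_space
      | Some i ->
        write i chunk;
        go (adler32_update acc chunk) (i :: blocks) rest)
  in
  go adler32_init [] chunks
```

A real implementation would write to a Mirage block device and would only copy the staged blocks into the tar filesystem once `stage` returns `Ok`.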

Minimum system requirements

A Linux-based system is preferred. Mac or Windows might be possible too, but setup is considerably harder.

Project Skills


How can applicants make a contribution to your project?

For this project, you can refer to the contribution guide here: ???

And find "good first issues" to fix here: ???

Applicants can contribute to this project through the project repository or contribution page. The project uses an issue tracker to keep information about bugs to fix, project features to implement, documentation to write, and more. Applicants can look for newcomer-friendly issues to use for their first contributions by looking for the following issue tags in the project issue tracker: good-first-issue

The best way to contribute to the project is to have a look at the GitHub repository. The best page to read first is OCaml's "getting started" guide for setting up a development environment; it will walk you through installing OCaml using the OCaml Package Manager (opam). Many different operating systems are supported, so be sure to follow the guide for your OS.

Intern benefits:

  • An opportunity to learn the functional programming language OCaml
  • A great way to learn in-depth aspects of operating systems
  • Explore and learn details of file systems

Project Contribution Information

As part of the application process, all applicants must make at least one contribution to be accepted as an intern for this project. Only applicants who make a contribution will be eligible to be accepted as interns.

Applicants can contribute to this project through the project repository or contribution page.

Have a look at our dedicated issue for Outreachy applicants.

MIDI over Ethernet with MirageOS

Long description

OCaml is a functional, strongly type-safe and semantically rich programming language.

MirageOS is a unikernel operating system that is designed to create self-contained applications that are highly secure and efficient by design. Unikernels are different from traditional operating systems in that they are built to run a single application and have no extraneous functionality. MirageOS is written in OCaml and can build binaries for various operating systems, including Linux, macOS, and *BSD. MirageOS also directly supports targeting hypervisors like Xen and KVM, as well as supported hardware (e.g. the Raspberry Pi).

MIDI is a protocol made to carry music data; among many other things, it’s used for synthesizers. Nowadays, it is most commonly used over USB. However, there are other options as well, such as using it over Ethernet. That’s great for us! MirageOS supports Ethernet, while it doesn’t support USB.

This project will consist of exploring the MIDI over Ethernet idea for the OCaml ecosystem. The first step will be to write a POC (Proof of Concept) for it: we’ll want to send (or receive) one single MIDI signal over Ethernet. Note that you will also explore and implement the Real-Time Transport Protocol (RTP) as an in-between layer, since RTP is traditionally used for low-latency audio and video streaming over IP.
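As a first taste of the RTP layer, here is a minimal sketch that builds the fixed 12-byte RTP header defined in RFC 3550 using only the standard library. The function name is hypothetical, payload type 97 is an assumed dynamic value, and the sketch does not produce a complete RTP-MIDI packet: RFC 6295 places its own MIDI command-section header (and optionally a recovery journal) between the RTP header and the MIDI bytes.

```ocaml
(* Fixed 12-byte RTP header (RFC 3550, section 5.1): version 2,
   no padding, no header extension, no CSRC list. *)
let rtp_header ~marker ~payload_type ~seq ~timestamp ~ssrc =
  let b = Bytes.create 12 in
  Bytes.set_uint8 b 0 0x80; (* V=2 in the top two bits, P=0, X=0, CC=0 *)
  Bytes.set_uint8 b 1 ((if marker then 0x80 else 0) lor (payload_type land 0x7f));
  Bytes.set_uint16_be b 2 (seq land 0xffff); (* sequence number wraps at 65536 *)
  Bytes.set_int32_be b 4 timestamp;
  Bytes.set_int32_be b 8 ssrc;
  b

let () =
  (* Assumed values: dynamic payload type 97 and an arbitrary SSRC.
     The MIDI bytes are a Note On for middle C (60) at velocity 100 on
     the first channel; the RFC 6295 command-section header is omitted. *)
  let header =
    rtp_header ~marker:false ~payload_type:97 ~seq:1 ~timestamp:0l ~ssrc:0x12345678l
  in
  let note_on = Bytes.of_string "\x90\x3c\x64" in
  let packet = Bytes.cat header note_on in
  Printf.printf "packet is %d bytes\n" (Bytes.length packet)
```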

Additionally (or alternatively) there are many other steps that can be explored:

  • making your POC work inside a MirageOS unikernel
  • or extending your POC to a more complete MIDI client (or server)
  • or forgetting about Ethernet for a while, and writing a nice MIDI library in OCaml

So this project is for you, if you are:

  • a music enthusiast and excited about exploring MIDI in our functional programming language OCaml
  • or familiar with and strongly interested in functional programming and like the idea of getting proficient at it through the beautiful and pragmatic lens of the OCaml programming language
  • or you love exploring network protocols, such as Ethernet and RTP and would love to use them to send specific data (MIDI in this case)
  • or you are interested in MirageOS or unikernels in general

Internship tasks

The intern will work on a few of the following tasks (certainly not all of them; realistically one to four). They will work with experienced mentors to gain an understanding of functional programming, network programming, and systems programming. The intern will explore the context of MIDI and RTP together with the mentors.

Goals:

  • Explore the RTP protocol and MIDI over Ethernet.
  • Write a POC (Proof of Concept) for MIDI over Ethernet in OCaml: we would want to send or receive one single MIDI signal over Ethernet, for example setting the BPM (Beats Per Minute).
  • Run your POC inside a MirageOS unikernel.
  • Extend your POC to a more complete MIDI client or server. By “MIDI client” we mean the ability to send MIDI data; by “MIDI server”, the ability to receive MIDI data.
  • Design and implement an abstraction library for MIDI. Ideally this library would be flexible enough to allow different backends (i.e. protocols over which MIDI would be served). For a start, as a backend, we can use an existing very low-level library (portmidi) serving MIDI over USB. So the first step would be to write a nice API / higher-level abstraction for this library.
  • Write unit tests and possibly integration tests to ensure the correctness and stability of the implementation.
  • Investigate whether and which MIDI devices (synths, drum machines, etc.) support Ethernet.
  • Develop a USB-Ethernet proxy, if necessary.
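To illustrate the abstraction-library goal, here is a minimal sketch of a typed MIDI message, its MIDI 1.0 wire encoding, and a backend signature that a PortMidi or RTP transport could implement. Everything here is hypothetical (the type, `BACKEND`, `Make`, `Recorder`) and covers only a few message kinds. It also touches the BPM example: MIDI 1.0 has no channel or system message carrying a BPM value, so tempo is usually conveyed through the rate of Timing Clock messages (24 per quarter note).

```ocaml
(* A few MIDI 1.0 messages. Channels are 0-15, data bytes 0-127. *)
type message =
  | Note_off of { channel : int; note : int; velocity : int }
  | Note_on of { channel : int; note : int; velocity : int }
  | Control_change of { channel : int; controller : int; value : int }
  | Timing_clock (* sent 24 times per quarter note *)

let status kind channel = kind lor (channel land 0x0f)
let data x = x land 0x7f

(* Encode a message into its wire bytes. *)
let to_bytes = function
  | Note_off { channel; note; velocity } -> [ status 0x80 channel; data note; data velocity ]
  | Note_on { channel; note; velocity } -> [ status 0x90 channel; data note; data velocity ]
  | Control_change { channel; controller; value } ->
    [ status 0xB0 channel; data controller; data value ]
  | Timing_clock -> [ 0xF8 ]

(* Seconds between two Timing Clock messages at a given tempo. *)
let clock_interval_s ~bpm = 60.0 /. (bpm *. 24.0)

(* What a transport (USB via PortMidi, RTP over the network, ...) provides. *)
module type BACKEND = sig
  type t
  val send_bytes : t -> int list -> unit
end

(* The user-facing API is written once, over any backend. *)
module Make (B : BACKEND) = struct
  let send t msg = B.send_bytes t (to_bytes msg)
end

(* A test backend that records the bytes it is given. *)
module Recorder = struct
  type t = int list ref
  let send_bytes t bytes = t := !t @ bytes
end
```

Keeping the encoding pure and the transport behind a signature means the same tests can run against `Recorder` without any MIDI hardware.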

Intern Benefits

The intern will gain valuable and generalizable experience in functional, network, and systems programming by focusing on OCaml and the MirageOS unikernel operating system. They will develop an API and work with different backends, gaining a deep understanding of MIDI and the connected protocols.

This project offers the opportunity to create something that could be beneficial to musicians and music enthusiasts.

Additionally, the intern will apply best practices, using version control and automated CI/CD infrastructure on GitHub to deploy and test code. The intern will also become familiar with the principles of unikernel design and development.

Project Skills


Project contribution information

As part of the application process, all applicants must make at least one contribution to be accepted as an intern for this project. Only applicants who make a contribution will be eligible to be accepted as interns.

Some projects accept contributions through a project repository. This project has not provided a link to a project repository.

Have a look at our dedicated issue for Outreachy applicants.

Extend the network testing tool conntest with visualizations

Description

MirageOS is a unikernel operating system that is designed to create self-contained applications that are highly secure and efficient. Unikernels are different from traditional operating systems in that they are built to run a single application and have no extraneous functionality, thus making them more secure and efficient. MirageOS is written in OCaml and can build binaries for various operating systems, including Linux, macOS, and *BSD. MirageOS can also run on Xen and hardware targets.

Conntest is a low-configuration unikernel, designed both to get your first setup up and running and to monitor the traffic between many unikernels connected in a custom network, e.g. for stress-testing the network and the different unikernel backends. The current UI is a live-updated command-line interface.

This project is about extending conntest with a way to visualize network statistics over time, and visualize the network itself. The idea is to implement this in several stages:

  • the first stage is to make a prototype visualization that
    • implements structured CLI output in conntest
    • reads the CLI output of conntest and visualizes it in simple ways (e.g. using graphviz and gnuplot)
  • the second stage is to work towards a live-updated UI taking live input from a network of conntest instances
    • this can take the form of a separate conntest-server unikernel, which serves a live updated web-UI

Technically conntest is implemented in some interesting ways:

  • it is a MirageOS unikernel, which leads it to be abstracted over the underlying backend - e.g. the network stack
  • it is implemented in OCaml, which is a statically typed functional language with a powerful type- and module-system
  • it uses functional reactive programming to handle events from the custom network protocol, and update the CLI
  • it uses the declarative CLI library notty
  • it has its own communications protocol, abstracted on top of a lower-layer protocol, which currently has both a TCP and a UDP version
  • the network protocols emit events via an abstracted Output interface (e.g. to the CLI), which makes it easy to implement alternative event sinks
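The abstracted Output interface and the stage-1 task of logging structured JSON fit together naturally. Below is a minimal, self-contained sketch using only the standard library; the `event` type, `OUTPUT`, `Json_output`, `Text_output` and `Make_protocol` are hypothetical names and do not reflect conntest's actual modules.

```ocaml
(* Hypothetical events; conntest's real event type differs. *)
type event =
  | Connected of { peer : string; proto : [ `Tcp | `Udp ] }
  | Stats of { peer : string; bytes_per_sec : float; latency_ms : float }

(* The interface the protocol code emits events through. *)
module type OUTPUT = sig
  val emit : event -> unit
end

(* Minimal JSON string escaping: quotes, backslashes, control chars. *)
let json_string s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | c when Char.code c < 0x20 -> Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* One JSON object per event; assumes finite float values. *)
let to_json = function
  | Connected { peer; proto } ->
    Printf.sprintf "{\"event\":\"connected\",\"peer\":%s,\"proto\":\"%s\"}"
      (json_string peer)
      (match proto with `Tcp -> "tcp" | `Udp -> "udp")
  | Stats { peer; bytes_per_sec; latency_ms } ->
    Printf.sprintf "{\"event\":\"stats\",\"peer\":%s,\"bytes_per_sec\":%.1f,\"latency_ms\":%.3f}"
      (json_string peer) bytes_per_sec latency_ms

(* Two interchangeable sinks behind the same interface. *)
module Json_output : OUTPUT = struct
  let emit e = print_endline (to_json e)
end

module Text_output : OUTPUT = struct
  let emit = function
    | Connected { peer; _ } -> Printf.printf "connected to %s\n%!" peer
    | Stats { peer; bytes_per_sec; _ } -> Printf.printf "%s: %.0f B/s\n%!" peer bytes_per_sec
end

(* Protocol code is written once against OUTPUT, instantiated per sink. *)
module Make_protocol (O : OUTPUT) = struct
  let on_connect peer = O.emit (Connected { peer; proto = `Tcp })
end

module Json_protocol = Make_protocol (Json_output)
```

One JSON object per line keeps the output easy to consume from shell scripts and plotting tools.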

It is not required to understand these concepts, but hopefully several of them pique your interest.

Internship tasks

  • compile and run conntest!
  • stage 1:
    • implement a new Output module, where you log structured output (e.g. JSON) to the console
      • this includes becoming familiar with functional reactive programming for maintaining the state of the statistics, and for downsampling the input-events
    • implement bash scripts that visualize:
      • the conntest graph using e.g. graphviz
      • single conntest-connections statistics using e.g. gnuplot
  • stage 2:
    • brainstorm & mockup:
      • which statistics would be interesting to observe live
      • how these statistics could be best visualized in a live-updated visualization
    • implement a separate conntest-server unikernel that conntest can connect to
      • send connection-statistics from conntest to conntest-server
      • conntest-server:
        • serve statistics from conntest-server as text
        • serve a custom SVG visualization using the tyxml and gg libraries
        • (extra) work towards pushing live data to the frontend, implemented using the previously mentioned libraries and js_of_ocaml
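For the graphviz visualization in stage 1, the core transformation is turning a list of observed connections into a DOT graph. The task asks for bash scripts; the sketch below shows the same transformation in OCaml for illustration. The input format (source, destination, protocol triples) and the `to_dot` name are assumptions, not conntest's actual output.

```ocaml
(* Hypothetical input: one (source, destination, protocol) triple per
   conntest connection, e.g. parsed from the structured output. *)
let to_dot (edges : (string * string * string) list) =
  let b = Buffer.create 256 in
  Buffer.add_string b "digraph conntest {\n";
  List.iter
    (fun (src, dst, proto) ->
      (* %S quotes the names; adequate for plain ASCII node names. *)
      Buffer.add_string b (Printf.sprintf "  %S -> %S [label=%S];\n" src dst proto))
    edges;
  Buffer.add_string b "}\n";
  Buffer.contents b

let () =
  print_string
    (to_dot [ ("unikernel-a", "unikernel-b", "tcp"); ("unikernel-b", "unikernel-c", "udp") ])
```

Piping the printed text into Graphviz, for example `dot -Tsvg -o conntest.svg`, renders the network graph.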

As this project potentially involves a lot of concepts that are new to you, you are not expected to get through stage 2, but it is always useful to keep an eye on the final goal.

Intern benefits

  • An opportunity to learn the functional programming language OCaml, with skills that generalize to other languages
  • Get experience with MirageOS unikernels
  • Get experience with programming in functional ways with events, state and graphics
  • Learning to design and work on full-stack systems
  • Data analysis
  • Data visualization
  • Data design
  • Network protocols

Project Skills

Project contribution information

As part of the application process, all applicants must make at least one contribution to be accepted as an intern for this project. Only applicants who make a contribution will be eligible to be accepted as interns.

Applicants can contribute to this project through the project repository or contribution page.

Have a look at our dedicated issue for Outreachy applicants.

Outreachy Winter 2022/23

Outreachy Summer 2022

PRJ 1 - Expand OCaml 5.0 Parallel Benchmark suite

Project short title

Expand OCaml 5.0 Parallel Benchmark suite

Long Description

OCaml 5.0 will be live soon! It ships with support for shared-memory parallelism and concurrency, which OCaml has lacked all these years. This will be accompanied by a robust set of Multicore libraries useful for parallel programming. The Multicore compiler and libraries are under active development and will continue to evolve as the OCaml ecosystem moves towards Multicore.

For assessing the impact of new features in the OCaml compiler and Multicore libraries, we have a set of sequential and parallel benchmarks present in our benchmark suite. While the sequential benchmarks contain many real-world applications, a wider set of parallel benchmarks would be useful.

This project entails gathering the parallel benchmarks available at various places like https://github.com/ckoparkar/ocaml-benchmarks and making them available in the benchmark suite.
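To give a feel for what a parallel benchmark looks like, here is a minimal sketch of a kernel written against OCaml 5's `Domain` module (so it requires OCaml 5). It is not one of the existing benchmarks and omits any Sandmark-specific build or run configuration; it simply splits a loop across domains and checks the result against a sequential run, with the domain count taken from the command line so the same binary can be timed at different core counts.

```ocaml
(* Sequential kernel: sum of f i for i in [lo, hi). *)
let sum_range f lo hi =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + f i
  done;
  !acc

(* Parallel version: split [0, n) into num_domains contiguous chunks,
   spawn a domain for all but the last chunk, compute the last chunk on
   the current domain, then join and add up the partial sums. *)
let parallel_sum ~num_domains f n =
  let chunk = (n + num_domains - 1) / num_domains in
  let bounds d = (d * chunk, min n ((d + 1) * chunk)) in
  let workers =
    List.init (num_domains - 1) (fun d ->
        let lo, hi = bounds d in
        Domain.spawn (fun () -> sum_range f lo hi))
  in
  let lo, hi = bounds (num_domains - 1) in
  let local = sum_range f lo hi in
  List.fold_left (fun acc w -> acc + Domain.join w) local workers

let () =
  (* Number of domains from the command line, defaulting to 4. *)
  let num_domains = max 1 (try int_of_string Sys.argv.(1) with _ -> 4) in
  let n = 1_000_000 in
  let result = parallel_sum ~num_domains (fun i -> i * i) n in
  assert (result = sum_range (fun i -> i * i) 0 n);
  Printf.printf "num_domains=%d sum=%d\n" num_domains result
```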

Minimum system requirements

A Linux-based system is preferred; Mac or Windows could work too. Access to a server machine can be provided for running benchmarks.

Intern tasks

  • Make a set of pre-existing benchmarks available in Sandmark.
  • Improve some of the benchmarks with guidance from mentors.

Intern benefits

  • Can learn OCaml.
  • Pick up various tools such as Make, configuration files, etc.
  • Gain some exposure to writing parallel programs.
  • Modern open-source development: the projects use git version control, GitHub and automated CI infrastructure to deploy and test the code.

Community Benefits

As the community is moving towards Multicore, a diverse set of benchmarks would aid in evolving the Multicore libraries. It can also serve as a starting point for people taking the plunge and porting their code to Multicore.

Project skills

| Project Skills | Experience Level |
|---|---|
| Git | 2 |
| Unix Command line | 3 |
| Functional Programming | 2 |
| OCaml | 1 |
| Parallel programming | 1 |

Mentors

  • Sudha Parimala
  • Shakthi Kannan
  • Gargi Sharma

Communication channels

  • GitHub issues
  • Discord

To-do (Sudha): Add/label good-first-tasks in sandmark and current-bench repo.

Improve OPAM Repository CI usability

Project short title

Improve OPAM Repository CI usability

Long Description

The OPAM package manager is the heart of the OCaml community, providing functionality to manage library dependencies for OCaml projects and to install the OCaml compiler. OCaml developers typically interact with OPAM multiple times a day. Powering the index of available packages is opam-repo-ci, which provides a CI service for packages: building, testing, and checking the quality of packages before they are accepted into OPAM. As such an important system, opam-repo-ci can always benefit from improvements to its overall experience.

This project entails working with the opam-repo-ci developers to implement various quality-of-life improvements to the system, many of which have been tagged with TODO on the project GitHub page https://github.com/ocurrent/opam-repo-ci/issues.

Minimum system requirements

A Linux- or Mac-based system would be preferable; Windows is possible to get working but is currently more trouble than the other operating systems.

Intern Tasks

Work through the issues identified with TODO on the project GitHub page https://github.com/ocurrent/opam-repo-ci/issues, choosing the most important or interesting ones in collaboration with the opam-repo-ci developers.

Intern Benefits

  • Learn to program in OCaml from experienced developers.
  • Contribute to key infrastructure in an open source community.
  • Experience modern open-source development, using version control, GitHub and automated CI/CD infrastructure to deploy and test code.

Community Benefits

The opam repository on GitHub has received over 19,000 contributions of new and updated packages since it was created. People rely daily on the opam repository being updated and maintained; the suggested changes would improve and streamline the workflow for the people doing the maintenance. Package contributors would benefit from a better UI and improved features, allowing more checks to be carried out before packages are released to the OCaml community. The OCaml community as a whole benefits from improved stability and quality of packages.

Project skills

| Project Skills | Experience Level |
|---|---|
| Git | 2 |
| Unix Command line | 3 |
| Functional Programming | 2 |
| OCaml | 1 |

Mentors

  • Tim McGilchrist
  • Patrick ??

Communication channels

  • GitHub issues
  • Slack / Discord

Improve OCaml CI

Project short title

Improve OCaml CI usability

Long Description

OCaml-CI is an ambitious, convention-driven CI system for building OCaml projects.
Hosted at ci.ocamllabs.io, ocaml-ci builds OCaml projects using the dune build tool and opam package manager against a number of OS and CPU architectures, along with performing linting checks, to provide confidence your code works the way you expect.

Sure, you might get x86 Linux or Windows from some services, but can you find PPC64, ARM64, and even IBM s390 builders?

OCaml-ci is built using OCurrent, a library for describing workflows in terms of self-adjusting (incremental) computations, and provides CI services for many open source OCaml projects like Mirage and Multicore OCaml libraries.
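The idea of a self-adjusting (incremental) computation can be shown with a toy: a step caches its result and reruns only when its input has changed. This sketch is not OCurrent's API and does not reproduce how OCurrent works internally; `input`, `set` and `map` are hypothetical names used purely to illustrate why a CI pipeline built this way avoids redoing work for unchanged commits.

```ocaml
(* A mutable input, e.g. the head commit of a watched branch. *)
type 'a input = { mutable value : 'a; mutable version : int }

let input v = { value = v; version = 0 }

let set x v =
  x.value <- v;
  x.version <- x.version + 1

(* [map f x] returns a reader that re-runs [f] only when [x] has been
   [set] since the previous read, and otherwise reuses the cached result. *)
let map f x =
  let cache = ref None in
  fun () ->
    match !cache with
    | Some (seen, result) when seen = x.version -> result
    | _ ->
      let result = f x.value in
      cache := Some (x.version, result);
      result

let () =
  let commit = input "abc123" in
  let build = map (fun c -> print_endline ("building " ^ c); "ok " ^ c) commit in
  ignore (build ()); (* runs the build *)
  ignore (build ()); (* cached: nothing is rebuilt *)
  set commit "def456";
  ignore (build ()) (* input changed: rebuilds *)
```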

This project entails working with the ocaml-ci developers to implement various quality of life improvements to the CI system. Many of these issues have been tagged with good first issue or enhancement on https://github.com/ocurrent/ocaml-ci/issues and https://github.com/ocurrent/ocurrent/issues.

Minimum system requirements

A Linux- or Mac-based system would be preferable; Windows is possible to get working but is currently more trouble than the other operating systems.

Intern Tasks

Work through the issues identified with good first issue or enhancement on https://github.com/ocurrent/ocaml-ci/issues and https://github.com/ocurrent/ocurrent/issues, choosing the most important or interesting ones in collaboration with the ocaml-ci developers.

Intern Benefits

  • Learn to program in OCaml from experienced developers.
  • Contribute to key infrastructure in an open source community.
  • Experience modern open-source development, using version control, GitHub and automated CI/CD infrastructure to deploy and test code.

Community Benefits

OCaml-CI aims to provide an opinionated dune- and opam-based CI system for the general OCaml community, allowing open-source OCaml projects to be built and tested on a wide range of operating systems and CPU architectures. We believe the community will benefit from having access to a wide range of hardware to compile and test on, allowing OCaml code to run on everything from an embedded ARM CPU with MirageOS up to large multicore systems running critical infrastructure.

Already ocaml-ci is building many key OCaml libraries and we want to expand that number, improving the overall library ecosystem for OCaml.

Project skills

| Project Skills | Experience Level |
|---|---|
| Git | 2 |
| Unix Command line | 3 |
| Functional Programming | 2 |
| OCaml | 1 |

Mentors

  • Tim McGilchrist
  • Patrick ??

Communication channels

  • GitHub issues
  • Slack / Discord

PRJ 4 - Expand OCaml's library of standard derivers

Project short title

Expand OCaml's library of standard derivers

Long description

It's common for programming languages to provide some way to meta-program in order to preprocess code before reaching the last compilation step, for example in the form of macros or templates. OCaml provides a meta-programming system for preprocessing which is very powerful. The terminology used for it is PPX (Pre-Processor eXtension); the term PPX is used both for the preprocessing system and for any preprocessor library itself.

PPX is based on transformations of the OCaml AST, so with PPXs the compilation of an OCaml program looks as follows: the OCaml compiler parses the code as normal into an AST and passes that AST to the PPX driver, which transforms it into a new AST and in turn passes it back to the compiler to complete the compilation. For example, if you write code like

type forest = { name : string; number_trees : int option } [@@deriving make]

Then the compiler will parse that code into an AST, which the PPX driver will transform into an AST representing the following code:

type forest = { name : string; number_trees : int option }
val make_forest : name:string -> ?number_trees:int -> unit -> forest

The function make_forest was generated by the PPX make. It allows you to create values of type forest, possibly leaving out optional fields.
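To see what such generated code amounts to, here is a hand-written equivalent of a make-style constructor for `forest`, in plain OCaml without any PPX. It is an illustration, not the deriver's literal output: required fields become labelled arguments, `option` fields become optional arguments, and a trailing `unit` argument lets OCaml know when the optional arguments have been left out.

```ocaml
type forest = { name : string; number_trees : int option }

(* Hand-written equivalent of a derived make function:
   make_forest : name:string -> ?number_trees:int -> unit -> forest *)
let make_forest ~name ?number_trees () = { name; number_trees }

let () =
  let f1 = make_forest ~name:"Black Forest" () in
  let f2 = make_forest ~name:"Sherwood" ~number_trees:1000 () in
  Printf.printf "%s has %s trees\n" f2.name
    (match f2.number_trees with Some n -> string_of_int n | None -> "an unknown number of");
  ignore f1
```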

The PPX make you've just seen is a very basic PPX. There's a list of the most basic and central deriving PPXs on one of the old PPX infrastructure repos called ppx_deriving. The list contains the following deriving PPXs:
show, eq, ord, enum, iter, map, fold, make

Nowadays, most PPXs are written using the newer PPX infrastructure library ppxlib, which has introduced clear composition semantics and improved performance. In fact, make already has a ppxlib-based version. It was implemented by Aya, who was our Outreachy intern last round and will be your Outreachy co-mentor this round!

A couple of PPXs on the list above still don't have a ppxlib implementation. Your task will be to implement one or more of them! Concretely, that means understanding what those PPXs are supposed to do and manipulating the AST using ppxlib's API to achieve that, all in the functional programming language OCaml.
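As an example of "understanding what a PPX is supposed to do", here is, written by hand, the kind of code the enum deriver produces for a type of constant constructors. The names follow our understanding of ppx_deriving's convention (`<type>_to_enum`, `<type>_of_enum`, `min_<type>`, `max_<type>`); check the ppx_deriving documentation before relying on them. A ppxlib implementation would generate these definitions from the AST of the type declaration.

```ocaml
type insect = Aphid | Beetle | Cockroach

(* What the deriver must produce from the declaration above: each
   constant constructor gets an integer, numbered from 0 in order. *)
let insect_to_enum = function Aphid -> 0 | Beetle -> 1 | Cockroach -> 2

let insect_of_enum = function
  | 0 -> Some Aphid
  | 1 -> Some Beetle
  | 2 -> Some Cockroach
  | _ -> None

let min_insect = 0
let max_insect = 2

let () =
  (* Round-trip every value in the enumerated range. *)
  for i = min_insect to max_insect do
    match insect_of_enum i with
    | Some x -> assert (insect_to_enum x = i)
    | None -> assert false
  done
```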

To get a better understanding of what PPXs are, you can have a look at

Minimum system requirements

A Unix-based system, for example Linux or macOS. On Windows we strongly recommend using WSL2.

Intern tasks

You will write some of the standard OCaml derivers using the ppxlib API, such as enum, iter, map, or fold, or possibly polish show. You can choose yourself, or we can guide you on which ones would be easiest to start with. That isn't an easy task, but if manipulating the OCaml AST clicks for you early on and you'd like to move on to even more interesting tasks, you can write a more complex PPX that the community would benefit from, such as a (de-)serializer for a protocol we would choose.

Intern benefits

You would learn or get better at the amazing programming language OCaml. I mean, which other programming language is called O<some animal>? But jokes aside, learning / getting better at a functional language like OCaml has a lot of benefits: you will learn to structure your programs better, be more concise, and write less error-prone code.

Furthermore, you would not only work in OCaml, but improve the PPX ecosystem, which is one central feature of the language. You would make life easier for other people who use this language by working on the language's base. And your work would be quite low-level, which means that afterwards, you'll understand better how programming languages in general work under the hood.

Community benefits

The OCaml community would get an improved PPX meta-programming ecosystem. The general PPX system has already shifted strongly towards ppxlib; only the derivers (there are different kinds of PPXs, and derivers, which this internship is about, are one of them) are still in limbo between ppxlib and their old infrastructure, ppx_deriving. So having the most basic derivers written in ppxlib as well would be very useful for two reasons. The obvious one is having them available to use. The other is having them bundled together with ppxlib on GitHub as simple examples of how to write derivers.

Project skills

| Skill description | Impact on intern selection | Experience Level |
|---|---|---|
| Functional programming | Required | 2 (Concepts) |
| Meta-programming | Preferred | 1 (No knowledge required) |
| Knowledge about ASTs | Nice to have | 1 (No knowledge required) |
| OCaml | Nice to have | 3 (Experimented) |
| Git | Required | 2 (Concepts) |

Mentors

  • Sonja Heinze
  • Aya Charaf

Communication channels

Channel

Discord

Community norms

You can ask questions on any channel on Discord. The two channels that are particularly related to the internship are #outreachy and #ppx. The link provided below is the link to the #outreachy channel.

We have one thing we would kindly like to ask you: whenever you write a question, please write it only in one channel. If you write the same question on several channels, several people might dig into the problem to answer you and so their work might get duplicated.

If you want to email us, it would be great if you could send the email to all co-mentors. Only one co-mentor will respond, but by including everyone, we'll all be on the same page.

For the internship, the communication will be moved to our community Slack.

And in general: never hesitate to ask! :)

Outreachy Winter 2021

PRJ 1 - Integrate check.ocamllabs.io in v3.ocaml.org

Project short title

Integrate a package health check in ocaml.org

Long description

The opam package manager is powered by a CI infrastructure that tests different combinations of packages in order to detect whether a package upgrade will break another package of the ecosystem.

The result of all of these tests is captured by opam-health-check and a UI for it is currently hosted at http://check.ocamllabs.io/.

The goal of this project is to integrate this health check page in the upcoming version of the ocaml.org website, currently available at https://v3.ocaml.org/.

Technically, the objective is to fetch the data necessary to render the pages from the correct source in an efficient manner and implement a new design of the health page that fits well with the rest of the website.

The data fetching and processing part will be done in OCaml, and the UI part will be done in HTML and CSS3.

Minimum system requirements

Most systems including older ones should be able to work with the project.

For the OS, we recommend a Linux, FreeBSD, or macOS host. Windows is possible to get working but is more trouble than the other operating systems at present.

How can applicants make a contribution to your project?

You can refer to the contribution guide here.

And find "good first issues" to fix here.

The best way to contribute to the project is to have a look at the GitHub repository. The best page to read first is OCaml's "getting started" guide for setting up a development environment; it will walk you through installing OCaml using the OCaml Package Manager (opam). Many different operating systems are supported, so be sure to follow the guide for your OS.

Optional Internship Project Details

Repository: https://github.com/ocaml/v3.ocaml.org-server/
Issue tracker: https://github.com/ocaml/v3.ocaml.org-server/issues
Newcomer issue tag: good first issue
Intern tasks:

Small tasks:

  • Fix UI issues on the site
  • Optimize the performance of the site
  • Improve some of the data used to generate the pages

Main tasks:

  • Create a module to access the data from opam-health-check servers
  • Create a page to display the data currently available at http://check.ocamllabs.io/

Intern benefits:

Interns will be exposed to many seasoned OCaml developers who can provide valuable insight. It is also an opportunity to learn communication skills and how to present problems and their solutions in an understandable way.

In addition, the project offers the following technical benefits:

  • Functional Programming and OCaml: during the internship, you will work with the OCaml programming language and get familiar with its ecosystem.
  • Modern open-source development: the projects use git version control, GitHub and automated CI infrastructure to deploy and test the code.

PRJ 2 - Support .eml files in VSCode

Project short title

Support .eml files in OCaml's VSCode extension

Long description

Dream, the OCaml web framework, uses .eml files to embed HTML in OCaml files.

At the moment, opening these files in VSCode with the official OCaml VSCode extension will not provide any syntax highlighting or diagnostics for the .eml files, because they are not supported.

The goal of the project is to add support for the syntax in the extension itself as a first step, and eventually, add support for the language in the OCaml Language Server (LSP) as a second step.

Minimum system requirements

Most systems including older ones should be able to work with the project.

For the OS, we recommend a Linux, FreeBSD, or macOS host. Windows is possible to get working but is more trouble than the other operating systems at present.

How can applicants make a contribution to your project?

You can refer to the contribution guide here.

And find "good first issues" to fix here.

The best way to contribute to the project is to have a look at the GitHub repository. The best page to read first is OCaml's "getting started" guide for setting up a development environment; it will walk you through installing OCaml using the OCaml Package Manager (opam). Many different operating systems are supported, so be sure to follow the guide for your OS.

Optional Internship Project Details

Repository: https://github.com/ocamllabs/vscode-ocaml-platform/
Issue tracker: https://github.com/ocamllabs/vscode-ocaml-platform/issues
Newcomer issue tag: good first issue
Intern tasks:

Small tasks:

  • Improve the welcome page
  • Add configuration options
  • Improve UI of some features

Main tasks:

  • Support the syntax to get syntax highlighting
  • Support the extension in LSP to get error reporting

Intern benefits:

Interns will be exposed to many seasoned OCaml developers who can provide valuable insight. It is also an opportunity to learn communication skills and how to present problems and their solutions in an understandable way.

In addition, the project offers the following technical benefits:

  • Functional Programming and OCaml: during the internship, you will work with the OCaml programming language and get familiar with its ecosystem.
  • Modern open-source development: the projects use git version control, GitHub and automated CI infrastructure to deploy and test the code.

PRJ 3 - Improve the OCaml meta-programming ecosystem

Project short title

Improve the OCaml meta-programming ecosystem

Long description

It's common for programming languages to provide some way to meta-program in order to preprocess code before reaching the last compilation step, for example in the form of macros or templates. OCaml provides a meta-programming system for preprocessing which is type safe and very powerful. The terminology used for it is PPX (Pre-Processor eXtension); the term PPX is used both for the preprocessing system and for any preprocessor library itself.

PPX is based on transformations of the OCaml AST, so with PPXs the compilation of an OCaml program looks as follows: the OCaml compiler parses the code as normal into an AST and then passes that AST to the PPX driver, which transforms it into a new AST and passes it back to the compiler to complete the compilation. For example, if you write code like

type forest = { name : string; number_trees : int } [@@deriving show]

Then the compiler will parse that code into an AST, which the PPX driver will transform into an AST for code with the following signature:

type forest = { name : string; number_trees : int }
val pp_forest : Format.formatter -> forest -> unit
val show_forest : forest -> string

The two functions pp_forest and show_forest were generated by the PPX show. They allow you to print values of type forest and are therefore very useful for debugging.
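For intuition, here is a hand-written approximation of the two generated functions. It is not the exact code ppx_deriving emits (the real output differs in layout and formatting details). It only sketches what `pp_forest` and `show_forest` do with a value of type `forest`.

```ocaml
type forest = { name : string; number_trees : int }

(* Pretty-printer: writes a readable representation of a [forest] to the
   given formatter. Hand-written approximation, not the deriver's output. *)
let pp_forest (fmt : Format.formatter) (f : forest) : unit =
  Format.fprintf fmt "{ name = %S; number_trees = %d }" f.name f.number_trees

(* [show_forest] renders to a string by reusing the pretty-printer. *)
let show_forest (f : forest) : string = Format.asprintf "%a" pp_forest f

let () =
  print_endline (show_forest { name = "Black Forest"; number_trees = 42 })
```

Defining `show_forest` in terms of `pp_forest` means there is only one place that decides how a `forest` looks. Because `pp_forest` works on any `Format.formatter`, it can also be used directly in `Format.printf "%a"` calls.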

The PPX show you've just seen is a very basic and central one. There's a list of the most basic and central deriving PPXs on one of the old PPX infrastructure repos called ppx_deriving. The list contains the following deriving PPXs:
show, eq, ord, enum, iter, map, fold, make

Nowadays, most PPXs are written using the newer PPX infrastructure library ppxlib, which introduced clear composition semantics and improved performance. In fact, show already has a ppxlib-based version. However, several PPXs on the list above don't.

The internship consists of writing some of those basic deriving PPXs using the ppxlib infrastructure. Concretely, that means understanding what those PPXs are supposed to do and manipulating the AST using ppxlib's API to achieve that, all in the functional programming language OCaml.
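The core idea of a PPX is a function from AST to AST, and this can be illustrated without ppxlib. The sketch below uses a made-up miniature expression AST, not OCaml's real Parsetree, together with a hypothetical extension node `[%double e]` that the rewriter expands into `e + e`. A real PPX performs the same kind of traversal over the full OCaml AST through ppxlib's API.

```ocaml
(* A tiny, hypothetical expression AST standing in for OCaml's Parsetree. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Extension of string * expr (* models an extension node like [%name e] *)

(* The toy "PPX": rewrite every [%double e] node into (e + e), recursing so
   that nested extension nodes are expanded too. Unknown extensions are kept. *)
let rec rewrite (e : expr) : expr =
  match e with
  | Int _ | Var _ -> e
  | Add (a, b) -> Add (rewrite a, rewrite b)
  | Extension ("double", inner) ->
      let inner = rewrite inner in
      Add (inner, inner)
  | Extension (name, inner) -> Extension (name, rewrite inner)

(* Print the AST back as source-like text. *)
let rec to_string (e : expr) : string =
  match e with
  | Int n -> string_of_int n
  | Var x -> x
  | Add (a, b) -> "(" ^ to_string a ^ " + " ^ to_string b ^ ")"
  | Extension (name, inner) -> "[%" ^ name ^ " " ^ to_string inner ^ "]"

let () =
  let input = Add (Var "x", Extension ("double", Int 21)) in
  print_endline (to_string input);
  print_endline (to_string (rewrite input))
```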

Minimum system requirements

A Unix-based system, for example Linux or macOS. On Windows, we strongly recommend using WSL2.

Intern tasks

You will write some of the standard OCaml derivers using the ppxlib API, such as enum, iter, map, fold, make. You can either choose yourself or we can guide you on which ones would be easiest to start with. That isn't an easy task, but in case it clicks early for you how to manipulate the OCaml AST and you'd like to move on to even more interesting tasks, you can write a more complex PPX that the community would benefit from, such as a (de-)serializer for a protocol we would choose or a PPX for the OCaml unit test library.
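To show the kind of code a deriver has to produce, here is a hand-written version of what `enum` provides for a simple variant type. It follows the naming used by ppx_deriving's `enum` plugin (`min_<type>`, `max_<type>`, `<type>_to_enum`, `<type>_of_enum`). A ppxlib-based deriver would generate equivalent code from the type declaration's AST instead of having it written by hand.

```ocaml
type color = Red | Green | Blue

(* Constructors are numbered in declaration order, starting at 0. *)
let color_to_enum (c : color) : int =
  match c with
  | Red -> 0
  | Green -> 1
  | Blue -> 2

(* Inverse mapping; integers with no matching constructor give [None]. *)
let color_of_enum (n : int) : color option =
  match n with
  | 0 -> Some Red
  | 1 -> Some Green
  | 2 -> Some Blue
  | _ -> None

let min_color = 0
let max_color = 2

let () =
  for n = min_color to max_color do
    match color_of_enum n with
    | Some c -> Printf.printf "%d -> %d\n" n (color_to_enum c)
    | None -> ()
  done
```

Writing this by hand for every type is tedious and error-prone once a type has dozens of constructors. That is exactly the boilerplate a deriver removes.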

Intern benefits

You would learn or get better at the amazing programming language OCaml. I mean, which other programming language is called O<some animal>? But jokes aside, learning / getting better at a functional language like OCaml has a lot of benefits: you will learn to structure your programs better, be more concise, and write less error-prone code.

Furthermore, you would not only work in OCaml, but improve the PPX ecosystem, which is one central feature of the language. You would make life easier for other people who use this language by working on the language's base. And your work would be quite low-level, which means that afterwards, you'll understand better how programming languages in general work under the hood.

Community benefits

The OCaml community would gain an improved PPX meta-programming ecosystem. The general PPX system has already shifted strongly towards ppxlib. Only the derivers are still in limbo between ppxlib and their old infrastructure, ppx_deriving (derivers are one of several kinds of PPXs, and they are what this internship is about). Having the most basic derivers written in ppxlib would therefore be very useful for two reasons. The first is the obvious one: they would be available to use. The second is that having them bundled together in the same GitHub repository as ppxlib would provide simple examples of how to write derivers.

Project skills

| Skill | Impact on intern selection | Experience level |
|---|---|---|
| Functional programming | Required | 2 (Concepts) |
| Meta-programming | Preferred | 1 (No knowledge required) |
| Knowledge about ASTs | Nice to have | 1 (No knowledge required) |
| OCaml | Nice to have | 3 (Experimented) |
| Git | Required | 2 (Concepts) |

Mentors

  • Sonja Heinze
  • Shon Feder

Communication channels

Channel

Discord

Community norms

You can ask questions on any channel on Discord. The two channels that are particularly related to the internship are #outreachy and #ppx. The link provided below is the link to the #outreachy channel.

We have one thing we would kindly like to ask you: whenever you write a question, please write it only in one channel. If you write the same question on several channels, several people might dig into the problem to answer you and so their work might get duplicated.

If you want to send us an email, it would be great if you could address it to all co-mentors. Only one co-mentor will respond, but if you write to all of us, we'll all be on the same page.

For the internship, the communication will be moved to our community Slack.

And in general: never hesitate to ask! :)

PRJ 4 - Odoc diff tool

Project short title

Create a tool to show differences in the output of odoc.

Long description

Odoc is a documentation generator for OCaml code. It works by analysing the compilation artefacts of a library, extracting specially formatted documentation comments and producing output in a variety of formats, most commonly HTML.

The recent refresh of the OCaml website has included a goal of producing documentation for all versions of all packages, which has involved creating a CI pipeline that runs odoc on all packages it can compile.

This project is to produce a tool that works on the output of this pipeline, primarily to find differences between versions of the same package. This will be used to highlight differences between versions on the OCaml website.

Minimum system requirements

Most systems including older ones should be able to work with the project.

For the OS, we recommend a Linux, FreeBSD, or macOS host. Windows is possible to get working but is more trouble than the other operating systems at present.

How can applicants make a contribution to your project?

For this project, you can refer to the contribution guide here: https://github.com/ocaml/odoc/blob/master/doc/contributing.mld

And find "good first issues" to fix here: https://github.com/ocaml/odoc/issues?q=is%3Aopen+is%3Aissue+label%3Agood-first-issue

Applicants can contribute to this project through the project repository or contribution page. The project uses an issue tracker to keep information about bugs to fix, project features to implement, documentation to write, and more. Applicants can find newcomer-friendly issues for their first contributions by looking for the following issue tag in the project issue tracker: good-first-issue

The best way to contribute to the project is to have a look at the GitHub repository. The best page to look at first is OCaml's "getting started" guide for setting up a development environment; it will guide you through installing OCaml using the OCaml Package Manager (opam). Many different operating systems are supported, so be sure to follow the guide for your OS.

Optional Internship Project Details

Repository: https://github.com/ocaml/odoc
Issue tracker: https://github.com/ocaml/odoc/issues
Newcomer issue tag: good-first-issue
Intern tasks:

  • Create an OCaml library that is able to calculate the differences between two odocl files. It should be able to find elements that have been added, removed or changed.
  • Create a command-line interface to the library
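Below is a minimal sketch of the core diffing step. It assumes that the two odocl files have already been loaded and flattened into maps from an item's path (e.g. `M.f`) to a printed signature. That flattening, and the use of plain strings for items, are assumptions made for illustration. The real tool would work on odoc's own data structures and would need to handle nested modules, types and documentation comments.

```ocaml
module SMap = Map.Make (String)

(* Kind of change detected for a single documented item. *)
type change =
  | Added of string            (* new signature *)
  | Removed of string          (* old signature *)
  | Changed of string * string (* old signature, new signature *)

(* Compare two versions, each a map from item path to its signature.
   Items present in both versions with equal signatures are dropped. *)
let diff (old_items : string SMap.t) (new_items : string SMap.t) : change SMap.t =
  SMap.merge
    (fun _path before after ->
      match before, after with
      | None, None -> None
      | None, Some s -> Some (Added s)
      | Some s, None -> Some (Removed s)
      | Some a, Some b -> if String.equal a b then None else Some (Changed (a, b)))
    old_items new_items

let of_list l = SMap.of_seq (List.to_seq l)

(* A tiny command-line-style report of the differences. *)
let () =
  let v1 = of_list [ ("M.f", "int -> int"); ("M.g", "unit -> unit") ] in
  let v2 = of_list [ ("M.f", "int -> string"); ("M.h", "string") ] in
  SMap.iter
    (fun path c ->
      match c with
      | Added s -> Printf.printf "+ %s : %s\n" path s
      | Removed s -> Printf.printf "- %s : %s\n" path s
      | Changed (a, b) -> Printf.printf "~ %s : %s => %s\n" path a b)
    (diff v1 v2)
```

Using `Map.merge` visits each path present in either version exactly once. Because `Map` is ordered, the report also comes out sorted by path.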

Optionally, should there be time, it would be useful to investigate how this might be integrated into the v3.ocaml.org website.

Intern benefits:

Interns will be exposed to many seasoned OCaml developers who can provide valuable insight. It is also an opportunity to learn communication skills and how to present problems and their solutions in an understandable way.

In addition, the project offers the following technical benefits:

  • Experience with OCaml's module system: the project involves working with data representing the full spectrum of OCaml's module system, offering a pathway to become deeply acquainted with its complex semantics.
  • Functional Programming and OCaml: during the internship, you will work more generally with the OCaml programming language and get familiar with its ecosystem.
  • Modern open-source development: the projects use git version control, GitHub and automated CI infrastructure to deploy and test the code.

Community benefits:

It will be very useful to the community to be able to see easily the changes between versions of packages. It may also help library authors when they are writing CHANGES files. Finally, it will serve as an example of how to use the odoc libraries to write useful independent tools.

Outreachy Summer 2021

watch.ocaml.org population

OCaml has been around for a long time, and there are a number of media recordings available of talks about various aspects of the language. We would like to begin curating these and archiving them on self-hosted infrastructure on OCaml.org instead of relying on third-party hosting.

To this end, we have established watch.ocaml.org, which is an instance of the open-source PeerTube software. It can import videos to self-host them on OCaml.org infrastructure, and also serve them using peer-to-peer techniques to reduce the need for a big central streaming setup.

Project Milestones
  1. Locate and import as many OCaml videos from YouTube/Vimeo/etc as can be found (2-4 weeks)
  2. Metadata manipulation scripts to curate content (2-4 weeks)
  3. Integration with Discourse commenting (4-8 weeks)

Importing OCaml videos

The current site only has last year's OCaml Workshop videos uploaded. There are many more videos online on various other sites, which need to be imported into watch.ocaml.org.

Although the PeerTube software can take care of transcoding the actual videos, some manual effort is needed to ensure that the descriptions, tags and other metadata are consistent throughout the site. This first milestone will see around 50-100 videos (or more!) imported with reasonable metadata.

Metadata manipulation to curate content

Once there are a number of videos in place, we need to write some scripts to help us manage that metadata and also link it to the next.ocaml.org site. This will involve using the PeerTube REST API to write some OCaml code that outputs the videos on watch.ocaml.org in YAML format that the OCaml website generator can interpret.
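Here is a sketch of the output side. It assumes the video records have already been fetched from PeerTube's REST API (for example from its `/api/v1/videos` listing) and decoded into an OCaml record. The record fields, the YAML layout and the sample values below are all hypothetical. The real shape should follow whatever the website generator expects.

```ocaml
(* Hypothetical subset of the metadata returned for a PeerTube video. *)
type video = {
  name : string;
  uuid : string;
  description : string;
  tags : string list;
}

(* Quote a string as a YAML double-quoted scalar, escaping the characters
   that would otherwise break the quoting. *)
let yaml_quote (s : string) : string =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | '\t' -> Buffer.add_string b "\\t"
      | '\r' -> Buffer.add_string b "\\r"
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Render one video as a YAML list item. *)
let video_to_yaml (v : video) : string =
  let tags = String.concat ", " (List.map yaml_quote v.tags) in
  Printf.sprintf "- name: %s\n  uuid: %s\n  description: %s\n  tags: [%s]\n"
    (yaml_quote v.name) (yaml_quote v.uuid) (yaml_quote v.description) tags

let () =
  let v =
    { name = "Example talk"; uuid = "hypothetical-uuid";
      description = "A \"sample\" description"; tags = [ "ocaml"; "workshop" ] }
  in
  print_string (video_to_yaml v)
```

Quoting every scalar avoids surprises with titles that contain colons or other YAML-significant characters. A dedicated YAML library would be a sturdier choice once the format settles.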

This will allow us to easily link the content in watch.ocaml.org as embeds directly within the OCaml website itself, in the 'talks' section.

Discourse integration

This is an advanced milestone that you may not fully complete, but is a good stretch goal in case you have time left. The discuss.ocaml.org site runs using the Discourse forum software. Ideally, comments on watch.ocaml.org should redirect the user to a thread on the discussion site.

This milestone involves writing a PeerTube plugin that will replace the commenting area with a link to a Discourse forum thread. The plugin could also create that forum thread using the Discourse API if one doesn't already exist.


Opam Package Search & GraphQL Endpoint

The package search was initially two separate projects: a new client and a new GraphQL endpoint. It makes sense for these ideas to come under the same project, which may also better suit the needs of the applicants.

The new web client for rendering output from the opam package database could use a JSON endpoint on opam.ocaml.org that provides metadata about packages (see this commit). It could also use the new GraphQL endpoint designed in this project. Both parts could be done by one very competent and ambitious applicant.

Skills you will learn: JavaScript, OCaml, GraphQL
Difficulty: entry level
Applicants: 1

Project Milestones

The opam package search can be split up into three phases:

  1. Generation of the data (2-3 weeks)
  2. Implementation of the GraphQL server (4-8 weeks)
  3. A GraphQL Client app (4-8 weeks)

The weeks are rough estimates. If an intern only wants to work on, say, the GraphQL client app, they could do that; they might also want to have a go at the server too, or work only on the server.

Generating the Data

The most likely method for doing this is using the opam2web tool, which generates the current opam.ocaml.org/packages site. We won't need the HTML output, but will most likely need the underlying data structures it builds from a checkout of the opam-repository.

It could dump these as JSON and from that we could implement the GraphQL server. Or it could all be wrapped into the same project with the data just being in-memory.

The simplest idea for what to generate could just be a list of packages with information like description, reverse-deps etc.
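Computing reverse dependencies amounts to inverting the dependency lists. The sketch below assumes a hypothetical, simplified `package` record holding a name, a description and the names of direct dependencies, with version constraints ignored. opam2web's real data structures are richer, and the package names used here are placeholders.

```ocaml
(* Hypothetical, simplified view of a package from opam-repository. *)
type package = {
  name : string;
  description : string;
  depends : string list; (* names of direct dependencies *)
}

(* Build a table mapping each package name to the packages that directly
   depend on it. *)
let reverse_deps (pkgs : package list) : (string, string list) Hashtbl.t =
  let tbl = Hashtbl.create 64 in
  List.iter
    (fun p ->
      List.iter
        (fun dep ->
          let existing = Option.value (Hashtbl.find_opt tbl dep) ~default:[] in
          Hashtbl.replace tbl dep (p.name :: existing))
        p.depends)
    pkgs;
  tbl

(* Reverse dependencies of [name], sorted and deduplicated for stable output. *)
let rdeps_of tbl name =
  List.sort_uniq String.compare (Option.value (Hashtbl.find_opt tbl name) ~default:[])

let () =
  let pkgs =
    [ { name = "foo"; description = "example"; depends = [] };
      { name = "bar"; description = "example"; depends = [ "foo" ] };
      { name = "baz"; description = "example"; depends = [ "foo"; "bar" ] } ]
  in
  let tbl = reverse_deps pkgs in
  print_endline (String.concat ", " (rdeps_of tbl "foo"))
```

The table is built in one pass over all packages, so it could be computed once when the data is generated and then served directly by the GraphQL server.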

GraphQL Server

Once the data-structures are finalised from generating the data, the next phase is to wrap this in a GraphQL server probably using ocaml-graphql-server. This involves generating a Schema that follows the types we defined in the data.

Another option could be to "Irmin-ize" the data and use Irmin's ability to generate GraphQL servers from a store. This still needs some work though and I'm not sure how easy it is to extend the Schema.

A GraphQL Client

There are quite a few possibilities here. It would be nice to have a decent prototype search client (eventually this could end up on OCaml.org), but I think there's enough here to keep it separate for the purposes of the project, with the goal of upstreaming it in the future.

A cool project would be to write a jsoo (js_of_ocaml) client. I (@patricoferris) have some work-in-progress Brr-based bindings to apollo-graphql-client, and we can integrate graphql_ppx to do typed queries using a schema.



Improve OCaml.org

There are three main subprojects:

  1. Accessibility
  2. Translations
  3. Design

Each takes on a slightly different meaning depending on whether it is applied to ocaml.org or next.ocaml.org.

Accessibility testing for the ocaml.org website

The OCaml website is browsed by a variety of people with different accessibility needs. This project involves researching the various web accessibility standards, writing them up into a summary and checking our overall compliance. Once this research is complete, the remainder of the internship can be spent actually fixing some of the issues found.

Skills you will learn: HTML, CSS, OCaml
Difficulty: entry level
Applicants: 1+

Subproject Milestones
  1. Research web accessibility (1-2 weeks)
  2. Audit ocaml.org/next.ocaml.org (1 week)
  3. Apply improvements from the audit (1-6 weeks)
  4. Research common CI for enforcing accessibility standards (1 week)
  5. Try deploying the research from (4) in ocaml.org/next.ocaml.org (1 week)

Step (3) is a little open-ended. A truly accessible site involves much more than good contrasting colours and alt attributes. Accessible forms, search bars, maps, etc. are much more complicated (and probably more interesting), and libraries like reakit can help with them.
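Colour contrast is one of the checks an audit or a CI job can automate. Here is a sketch of the WCAG 2.x contrast-ratio formula. Each sRGB channel is linearised, the channels are combined into a relative luminance, and two luminances are compared as (L1 + 0.05) / (L2 + 0.05), with L1 the lighter one. WCAG level AA asks for at least 4.5:1 for normal-sized text. The function names are illustrative only.

```ocaml
(* Linearise one 8-bit sRGB channel, as in the WCAG 2.x definition of
   relative luminance. *)
let linearise (c8 : int) : float =
  let c = float_of_int c8 /. 255.0 in
  if c <= 0.03928 then c /. 12.92 else ((c +. 0.055) /. 1.055) ** 2.4

(* Relative luminance of an (r, g, b) colour with 0-255 channels. *)
let luminance (r, g, b) : float =
  (0.2126 *. linearise r) +. (0.7152 *. linearise g) +. (0.0722 *. linearise b)

(* Contrast ratio between two colours; always >= 1 regardless of order. *)
let contrast_ratio fg bg : float =
  let l1 = luminance fg and l2 = luminance bg in
  let lighter = Float.max l1 l2 and darker = Float.min l1 l2 in
  (lighter +. 0.05) /. (darker +. 0.05)

(* WCAG AA threshold for normal-sized text. *)
let passes_aa fg bg = contrast_ratio fg bg >= 4.5

let () =
  Printf.printf "black on white: %.1f:1\n" (contrast_ratio (0, 0, 0) (255, 255, 255));
  Printf.printf "#777777 on white passes AA: %b\n"
    (passes_aa (0x77, 0x77, 0x77) (255, 255, 255))
```

A common mid-grey like `#777777` on white narrowly fails AA, which is the kind of finding an audit of the site's stylesheet would surface.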

Translations of the website

Currently next.ocaml.org exists primarily in French and English, with some other translations such as Japanese. Adding more translations would be great, and in the new next.ocaml.org this should be much easier because there is a clear delineation between content (stored as YAML and Markdown) and code (ReScript, which consumes the Markdown and YAML).

By nature of what needs to be translated, interns will also pick up lots of OCaml knowledge.

Skills you will learn: YAML, Markdown, OCaml
Difficulty: entry level
Applicants: 1+

Subproject Milestones

This one would likely have to be combined with others to fill out the internship. But it would be nice to have more translations into whatever language the applicant speaks.

They would also learn a lot of OCaml by converting the tutorials.


Add templating to odoc library output

The current version of odoc emits HTML that is quite specific, and not embeddable into other pages. There is no way to add headers, footers, or customize the output in any way. This project is to add a mechanism to support customisation of the output to enable these features.
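Purely as an illustration, here is one possible shape for such a mechanism. The user supplies a template containing placeholders, and odoc substitutes its generated fragments into them. The `{{name}}` placeholder syntax and the `render` function are hypothetical; the project would decide the actual design.

```ocaml
(* Replace every "{{key}}" placeholder in [template] with its value from
   [bindings]. Placeholders with no binding are left untouched. *)
let render (template : string) (bindings : (string * string) list) : string =
  let len = String.length template in
  let buf = Buffer.create len in
  let rec go i =
    if i < len then begin
      let is_open = i + 1 < len && template.[i] = '{' && template.[i + 1] = '{' in
      let close = if is_open then String.index_from_opt template (i + 2) '}' else None in
      match close with
      | Some j when j + 1 < len && template.[j + 1] = '}' ->
          let key = String.sub template (i + 2) (j - i - 2) in
          (match List.assoc_opt key bindings with
           | Some value -> Buffer.add_string buf value
           | None -> Buffer.add_string buf (String.sub template i (j + 2 - i)));
          go (j + 2)
      | _ ->
          Buffer.add_char buf template.[i];
          go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

let () =
  let template =
    "<html><body><header>My site</header>{{content}}<footer>{{footer}}</footer></body></html>"
  in
  print_endline
    (render template [ ("content", "<h1>Foo</h1>"); ("footer", "Generated by odoc") ])
```

Keeping the substitution this simple would let users drop odoc's output into an existing site layout without odoc needing to know anything about that layout.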

Applicants should have contributed to the ocaml.org website project before applying (there aren't many starter issues in odoc to work on today).

Skills you will learn: OCaml, HTML
Difficulty: moderate
Applicants: 1

Markdown output for odoc

Odoc has recently gained generic support for producing output
in different formats. Currently, the supported formats are
HTML, LaTeX, and man pages. This project is to add a new
text-based output format: markdown. The existing output generators will serve as templates for the new markdown output so