I am writing an ActivityPub server in OCaml. It is called Waq, and I've made it [publicly available on GitHub](https://github.com/ushitora-anqou/waq). You can run the demo with Docker Compose by following the README.

## Development Background

It all started with me running my own Mastodon server. I started this server about a year after the first Mastodon boom in Japan (around April 2017), and it has been running stably ever since. I have done some maintenance, such as replacing the server, but there have been no major problems such as data loss, and we recently celebrated its 5th anniversary. At first I used Mastodon alongside Twitter, but recently I have been using Mastodon exclusively.

While operating this Mastodon server, I became technically interested in ActivityPub (AP), the protocol used by the Fediverse, including Mastodon, and I wanted to implement it myself if possible. However, it was a rather low priority for me, and I left the idea alone for a while.

Things started to change in November of last year (2022), when a certain well-known person purchased Twitter and things started to get weird. I was surprised at how easily my (former) favourite place could be destroyed. Fortunately, Mastodon, where I currently live, is a decentralized social networking service, so I will not be caught up in this kind of story. However, if there were another AP implementation besides Mastodon, one whose every detail I could keep track of, I would be able to continue my activities even if Mastodon became unusable for some reason. My technical interest in AP also grew on this occasion, so I started developing an AP implementation.

From here on, I will not explain the basic mechanisms of Mastodon or the Fediverse. If you are interested in these, please refer to [the video](https://www.youtube.com/watch?v=IPSbNdBmWKE) provided by the official Mastodon website.

## Goal

The ultimate goal of Waq is to be a drop-in replacement for Mastodon. By "drop-in" I mean that, at any given moment, you can replace a Mastodon server with Waq and have everything work as is, including the RDB and APIs. Waq is currently far from this goal, but to get closer to it I am making the API between the client and server a subset of Mastodon's REST API, and the RDB schema a subset of Mastodon's schema. I am not sure how long it will take, or whether I will continue development until then, but I hope to eventually be able to replace my Mastodon server with Waq.

Note that "drop-in" here does not include a frontend; Waq only provides the backend, such as the REST API and the streaming API over WebSockets. For the frontend, I use client implementations built for Mastodon (e.g., [Elk](https://elk.zone/), [Subway Tooter](https://play.google.com/store/apps/details?id=jp.juggler.subwaytooter&hl=en_US), etc.). In the future, I would like to take the frontend part of Mastodon and run it for Waq (as Pleroma does). Currently, I am developing Waq using Elk and Subway Tooter.

## Current Status

Currently, Waq has the following features:

- Basic functionality such as posting, favouriting, boosting (retweeting), following, etc.
- Mentions and replies
- Notifications and push notifications
- Image posting with BlurHash
- Preview cards with oEmbed/OGP

On the other hand, the following are not yet implemented (although they are planned):

- Pinned posts
- Custom emoji
- Hashtags
- Lists
- Post visibility (public/private)

## Technology Stacks

### OCaml

Waq is implemented in OCaml, simply because I like OCaml. There is no particular reason beyond that.

The OCaml ecosystem around the Web is not very rich (if not poor), as I will explain later. As a result, I have had to build many of the necessary libraries myself. However, I am having fun reinventing the wheel.
It's just a hobby project, and I'm free to do what I want.

For readers who are not familiar with OCaml, here is a brief explanation. OCaml is a functional programming language. Compared to Haskell, which is (probably) the most famous[^famous-haskell] functional programming language in the world, OCaml has mutable variables and does not force you to use monads for side effects, making it an easier language to learn[^ocaml-haskell]. And, like Haskell, it has strong static types, which means that if a program compiles (i.e., if it passes type checking), it is guaranteed to be free of runtime type errors. I think OCaml is useful because it allows me to write impure operations with type inference and type safety. The O in OCaml stands for Objective, so you can write object-oriented code in OCaml as well; this does not seem to be used much in the OCaml community[^ocaml-objects], but it was useful when creating the O/R mapper described below.

I feel OCaml is a relatively minor language, but it has a long history, and there are many libraries and tools for web apps. For example, there is [a tool for converting OCaml to JavaScript and running it in a browser](https://github.com/ocsigen/js_of_ocaml), and there are various web frameworks. You can find information on these at [OCamlverse](http://ocamlverse.net/content/web_networking.html).

[^famous-haskell]: Although recently some other functional languages like Elixir may be more famous.

[^ocaml-haskell]: Of course there are many other differences between OCaml and Haskell, such as strict versus lazy evaluation, but I tend to explain it this way because I think more people know Haskell than OCaml.

[^ocaml-objects]: cf. https://stackoverflow.com/a/10780681

### Web Frameworks

For Waq, I built a small web framework on my own. More precisely, I used [ocaml-cohttp](https://github.com/mirage/ocaml-cohttp), a well-known HTTP server (like Puma in Rails), on top of which I built a small home-grown stack for parsing and routing HTTP requests.

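
To give a rough idea of what the routing half of such a stack involves, here is a minimal sketch that uses only the OCaml standard library. It is not Waq's code: the names `route`, `dispatch` and `match_segments`, and the choice of `:name` for path parameters, are assumptions made for illustration, and a real stack would also deal with headers, bodies and asynchronous I/O.

```ocaml
(* A hypothetical, minimal router; none of these names come from Waq. *)
type meth = GET | POST

(* A handler receives the captured path parameters and returns a body. *)
type handler = (string * string) list -> string

type route = { meth : meth; pattern : string list; handler : handler }

(* Split "/api/v1/statuses/42" into ["api"; "v1"; "statuses"; "42"]. *)
let split_path path =
  String.split_on_char '/' path |> List.filter (fun s -> s <> "")

let route meth pattern handler = { meth; pattern = split_path pattern; handler }

(* Match pattern segments against path segments; ":name" captures one segment. *)
let rec match_segments pattern path params =
  match (pattern, path) with
  | [], [] -> Some (List.rev params)
  | p :: ps, x :: xs when String.length p > 0 && p.[0] = ':' ->
      let name = String.sub p 1 (String.length p - 1) in
      match_segments ps xs ((name, x) :: params)
  | p :: ps, x :: xs when p = x -> match_segments ps xs params
  | _ -> None

(* Try each route in order and run the first handler whose route matches. *)
let dispatch routes meth path =
  let segments = split_path path in
  List.find_map
    (fun r ->
      if r.meth <> meth then None
      else
        match_segments r.pattern segments []
        |> Option.map (fun params -> r.handler params))
    routes

let routes =
  [
    route GET "/api/v1/statuses/:id" (fun params ->
        "status " ^ List.assoc "id" params);
    route POST "/api/v1/statuses" (fun _ -> "created");
  ]

let () =
  assert (dispatch routes GET "/api/v1/statuses/42" = Some "status 42");
  assert (dispatch routes GET "/unknown" = None)
```

Representing routes as plain data keeps the dispatcher independent of the HTTP server underneath, which is one way a stack like this could sit on top of ocaml-cohttp.
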
In the early days of Waq's development, I was planning to use [Dream](https://aantron.github.io/dream/), which has an easy-to-understand interface and is well documented. This library is a good choice for those who want to develop web apps in OCaml. However, I did not adopt it for Waq because (i) its build tool does not seem to be Dune, the de facto standard; (ii) its HTTP server is not ocaml-cohttp (which is under active development); and (iii) Dream was not being maintained at the time I started developing Waq. I quite liked Dream's interface itself, so I made my own stack similar to it. In fact, Dream has been actively developed again since March of this year, so I may abandon my homebrew stack and switch to Dream sooner or later.

By the way, I think [Ocsigen](https://ocsigen.org/home/intro.html) is the most famous OCaml web framework, but I avoided it because the tutorial in its documentation seemed old and I didn't understand its interface well. However, I decided to use Dream (and then my own stack with a similar interface) at a very early stage, so I confess that I have not investigated Ocsigen very seriously.

### O/R Mapper

Waq uses PostgreSQL, simply because Mastodon uses it and Waq aims to replace Mastodon.

For Waq, I built my own O/R mapper for PostgreSQL. With this O/R mapper, I write [the RDB schema using a DSL expressed as OCaml code](https://github.com/ushitora-anqou/waq/blob/7938616a075b901a4b08ffdff0a38117416eb354/lib/schema.ml), and it generates the necessary objects (classes) and functions for me. This allows you to issue queries and get the results as objects without writing SQL statements yourself. The DSL supports one-to-one and one-to-N relationships, and uses a preprocessor called PPX to generate the object definitions and functions. See the [README](https://github.com/ushitora-anqou/waq/tree/7938616a075b901a4b08ffdff0a38117416eb354#original-or-mapper-lib_sqlx) for details.

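
As an illustration of what "getting results as objects" can look like in OCaml, here is a small hand-written sketch. It is not the code that Waq's O/R mapper generates and it does not use its DSL: the `account` class, the `row` type and `account_of_row` are hypothetical names, and the only assumption about the schema is that an account has `id`, `username` and a nullable `domain` column, as in Mastodon's `accounts` table.

```ocaml
(* Hypothetical row class of the kind a schema DSL might generate. *)
class account (id : int) (username : string) (domain : string option) =
  object
    method id = id
    method username = username
    method domain = domain

    (* Local accounts have no domain. *)
    method is_local = domain = None
  end

(* A raw row as a driver might return it: column name -> nullable text value. *)
type row = (string * string option) list

(* Build an object from a raw row; None if a required column is missing. *)
let account_of_row (row : row) : account option =
  match (List.assoc_opt "id" row, List.assoc_opt "username" row) with
  | Some (Some id), Some (Some username) ->
      int_of_string_opt id
      |> Option.map (fun id ->
             let domain = Option.join (List.assoc_opt "domain" row) in
             new account id username domain)
  | _ -> None

let () =
  let row = [ ("id", Some "1"); ("username", Some "alice"); ("domain", None) ] in
  match account_of_row row with
  | Some a -> Printf.printf "%d %s local=%b\n" a#id a#username a#is_local
  | None -> print_endline "invalid row"
```

The point of generating this kind of code from a schema is that the column-to-field mapping is written once, instead of being repeated by hand in every query.
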
The best way to use an RDB from OCaml these days, in my opinion, is the [Caqti](https://github.com/paurkedal/ocaml-caqti) library. Caqti provides a common interface that can handle multiple RDBs. In combination with [ppx_rapper](https://github.com/roddyyaga/ppx_rapper), query results obtained by Caqti can be mapped to records. Since OCaml records are like C or Go structs, Caqti + ppx_rapper is roughly equivalent to Go's [sqlx](https://github.com/jmoiron/sqlx) library. However, Caqti cannot be called an O/R mapper because it requires the user to write SQL explicitly.

There is also a big difference between Caqti + ppx_rapper and sqlx. Once a struct has been annotated, sqlx automatically maps values to its fields, with no need to map column names to field names explicitly in each query. ppx_rapper does not support this, so the correspondence between column and field names must be written every time SQL is issued. This was quite annoying when writing complex SQL statements over and over again. Since I couldn't find anything that satisfied me, I decided to make my own.

Incidentally, PG'OCaml is another way to handle an RDB from OCaml; it ensures type safety by properly typing SQL statements and also supports automatic mapping (to objects). However, PG'OCaml does not seem to have been updated recently. It also queries PostgreSQL **at compile time** to ensure type safety. I did not adopt PG'OCaml for Waq because I could not accept a design where the compilation of source code depends on non-source-code state (i.e., the state of the RDB).

### ActivityPub

[The ActivityPub specification](https://www.w3.org/TR/activitypub/) is published by the W3C. However, as ActivityPub implementors are [well aware](https://tinysubversions.com/notes/reading-activitypub/), it is not possible to write an implementation by reading this specification alone.
This is because the actual messages (activities) exchanged between servers are defined separately in [Activity Streams 2.0](https://www.w3.org/TR/activitystreams-core/) and [Activity Vocabulary](https://www.w3.org/TR/activitystreams-vocabulary/), and the messages are encoded using a format called [JSON-LD](https://json-ld.org/). Furthermore, these specifications are not sufficient for practical use, and Mastodon and other implementations seem to extend them. In the case of Mastodon, for example, (some of) the extensions are described [in its documentation](https://docs.joinmastodon.org/spec/activitypub/).

However, if you are just writing an ActivityPub implementation for fun, it is not necessary to read through these specifications at all; it is sufficient to write an implementation that mimics the activities Mastodon sends. In fact, I have not yet read much of the specifications; I implemented Waq based on the traffic I observed from a running Mastodon, and referred to the specifications as necessary.

When writing an ActivityPub implementation, I often wonder "what activities should be sent or received when a certain operation is performed". For this I found the "Activity Type Motivating Use Cases" in chapter 5.8 of the Activity Vocabulary. However, it is quite difficult to determine from the specification alone which activity should be sent, and it is much easier to check by running an existing implementation.

Also, since activities are not plain JSON but JSON-LD, and JSON-LD allows a single piece of data to be represented in multiple ways, an implementation should be able to handle all of them correctly. However, OCaml does not have a JSON-LD library, and implementing one is time-consuming. Therefore, Waq relies heavily on the particular JSON-LD representation used by Mastodon and handles it as if it were simply JSON.
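
To make the "multiple representations" problem concrete: in Activity Streams 2.0 a property such as `to` may hold either a single value or an array, and a reference such as `actor` may be either a bare IRI or an embedded object with an `id`. The sketch below shows what tolerating just these two variations could look like without a JSON-LD library. It is not Waq's code; the `json` type is a cut-down stand-in shaped like Yojson's polymorphic variants so that the example needs no external dependency, and `as_list`, `as_id`, `member` and `recipients` are hypothetical helpers.

```ocaml
(* A tiny JSON type, shaped like the polymorphic variants used by Yojson,
   so that this sketch needs no external library. *)
type json =
  [ `Null
  | `String of string
  | `List of json list
  | `Assoc of (string * json) list ]

(* A property such as "to" may hold one value or an array of values. *)
let as_list : json -> json list = function
  | `Null -> []
  | `List xs -> xs
  | x -> [ x ]

(* A reference may be a bare IRI or an embedded object carrying an "id". *)
let as_id : json -> string option = function
  | `String iri -> Some iri
  | `Assoc fields -> (
      match List.assoc_opt "id" fields with
      | Some (`String iri) -> Some iri
      | _ -> None)
  | _ -> None

(* Look up a key in a JSON object; anything else yields `Null. *)
let member key : json -> json = function
  | `Assoc fields -> Option.value (List.assoc_opt key fields) ~default:`Null
  | _ -> `Null

(* Collect the recipient IRIs of an activity regardless of representation. *)
let recipients activity =
  member "to" activity |> as_list |> List.filter_map as_id

let () =
  let single = `Assoc [ ("to", `String "https://example.com/users/alice") ] in
  let many =
    `Assoc
      [
        ( "to",
          `List
            [
              `String "https://example.com/users/alice";
              `Assoc [ ("id", `String "https://example.com/users/bob") ];
            ] );
      ]
  in
  assert (recipients single = [ "https://example.com/users/alice" ]);
  assert (List.length (recipients many) = 2)
```

Normalizing at the point of access like this covers only the variations it is written for; full JSON-LD processing (contexts, expansion, compaction) is considerably more work, which is the trade-off described above.
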
Waq should therefore not be able to handle representations other than the one Mastodon uses, but so far it seems to be able to communicate with implementations other than Mastodon (such as Pleroma and Misskey)[17].

### Web Push API

Waq supports Web Push ([RFC 8291](https://www.rfc-editor.org/rfc/rfc8291) and [RFC 8292](https://www.rfc-editor.org/rfc/rfc8292)). This allows users to receive notifications via Subway Tooter, etc., when they are mentioned, favourited, or reblogged (boosted).

OCaml does not have a library to handle Web Push, so I wrote my own. The RFCs describe Web Push with pseudocode and protocol descriptions, so I read them and implemented it. I also read a Go implementation of Web Push ([webpush-go](https://github.com/SherClockHolmes/webpush-go)) for the details. One thing I had trouble with was that the Web Push specification actually used in the wild seemed to be [a draft one](https://datatracker.ietf.org/doc/html/draft-ietf-webpush-vapid-01), and it took me a while to realize that. In the draft, the way the headers included in HTTP requests are constructed is slightly different.

Web Push uses ECDH and HMAC to create a symmetric key, and encrypts messages with AES-GCM. The MirageOS community has [a library](https://github.com/mirage/mirage-crypto) of cryptographic primitives for this purpose, which was very helpful.

### BlurHash

Mastodon uses [BlurHash](https://github.com/woltapp/blurhash) for image placeholders and blurring, and Waq can now generate it as well. I implemented this myself too, since there was no OCaml implementation. While implementing it, I felt that the DFT (Discrete Fourier Transform) calculated by the BlurHash algorithm does not exactly match the definition of the DFT, but I am not so sure, since I'm not very familiar with Fourier transforms.
I think that [ThumbHash](https://evanw.github.io/thumbhash/), which claims to perform better than BlurHash, seems to perform the DFT calculation correctly, so it would be interesting to find out how much of the improvement comes from this part.

### E2E Testing

Waq uses E2E testing to verify that it behaves as expected. More specifically, I launch Waq locally and call its API to check its behaviour. The benefit of this method is that, by launching Waq and Mastodon at the same time, I can also test the communication between Waq and Mastodon. I add an E2E test for every new feature to ensure that it works correctly across servers.

Mastodon assumes that server-to-server communication is performed over TLS. To satisfy this constraint, Waq's E2E testing uses Cloudflare Tunnel. I create the tunnel in advance to act as a reverse proxy, and then during E2E testing I specify the outer address of the tunnel so that the servers can communicate over TLS.

The hard part of E2E testing is that the tests are inevitably unstable. Also, since Mastodon needs to be started, its dependencies must be installed on the host in advance, so the tests cannot be run casually. So, I'm currently working on moving these E2E tests to Docker Compose in order to stabilize them.

## Summary

I'm writing an ActivityPub implementation, just for fun. Once again, [here is the link to the implementation](https://github.com/ushitora-anqou/waq). Thank you for reading this far.

## Note

This article was originally published in Japanese [here](https://hackmd.io/@anqou/rka_GANYh), and all the text above, except for this paragraph, was written mostly by [DeepL](https://www.deepl.com/translator).