
OpenTTD Infrastructure

GitHub workflow

Actions

Pull Request Validation

Whenever a Pull Request (PR) is created, a pipeline is triggered that validates the PR. Depending on the repository, this can be a regression check, flake8, "does it compile", etc.

Everything in this action is considered "untrusted code", and as such, only an exit state is published. It is not possible to retrieve binaries from this action. Execution of this action is fenced (similar to how Travis and GitHub solve this): steps that run too long are killed, Internet connectivity is (very) limited (no egress, for example), the only files available are those in the Pull Request (and the Docker image), etc.

The pipeline used for this step is retrieved from the upstream branch; the pipeline inside the PR is considered untrusted too.

Pull Request Preview

When someone on the whitelist requests a preview of a Pull Request (PR), a pipeline is triggered that generates the preview. Depending on the repository, this can either be binaries that are (temporarily) available for download, or a link to a URL where the repository can be visited (for example for the main website).

The whitelist is in place as this action moves the PR from "untrusted code" to "trusted code". It is up to the person requesting the preview to be sure there is no malicious code in the PR. This is also the reason the whitelist is most likely the same group of people who can submit (as that also moves the code into "trusted code").

Branch commits

When a commit is pushed to a whitelisted branch, a pipeline is triggered to follow up on that. For example, after a time limit (say, 20:00 CEST), another pipeline can be triggered which builds and produces binaries. It can also be an automated deployment to a staging area for that part of the infrastructure.

A whitelist is used because some automation tools (for example, PyUp) need to write to a branch of the repository; such commits should not trigger a pipeline.

Tags

Whenever a tag is set in a repository, a pipeline is triggered to follow up on that. This is similar to the Branch commits, but the result is deployed to production instead of staging. In the case of binary releases, these take the form of releases instead of nightlies.

Implementation

Every repository has a .dorpsgek.yml, which defines what pipeline should be triggered for each action. In the future this can be moved to GitHub Actions, when that becomes available.
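The exact format of this file is not fixed yet. Purely as an illustration (the keys, branch names, and some image names below are made up, not the final format), it could map each action to its pipeline, roughly like:

pull-request:
  pipeline:
    - ["openttd/compile-farm-ci:commit-checker"]
    - ["openttd/compile-farm-ci:linux-amd64-gcc-6",
       "openttd/compile-farm-ci:linux-amd64-clang-3.8"]
preview:
  pipeline:
    - ["openttd/compile-farm:release-linux-generic-gcc"]
branch:
  whitelist: ["master"]
  pipeline:
    - ["openttd/compile-farm:release-linux-generic-gcc"]
    - ["openttd/github-actions:publish-nightly"]
tag:
  pipeline:
    - ["openttd/compile-farm:release-linux-generic-gcc"]
    - ["openttd/github-actions:publish-release"]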

Binary based repository

For example, the pipeline for OpenTTD/OpenTTD would look something like:

[
  [
    "openttd/compile-farm-ci:commit-checker"
  ],
  [
    "openttd/compile-farm-ci:linux-amd64-gcc-6",
    "openttd/compile-farm-ci:linux-amd64-clang-3.8"
  ],
  [
    "openttd/compile-farm-ci:linux-i386-gcc-6",
    "openttd/compile-farm-ci:osx-10.9"
  ]
]

This defines both which Docker images run in parallel (the images within a nested list) and which stages run one after the other (each nested list is a stage).

For the other three actions, it would look more like:

[
  [
    "openttd/compile-farm:release-linux-deb-gcc",
    "openttd/compile-farm:release-linux-generic-gcc",
    "openttd/compile-farm:release-osx"
  ],
  [
    "openttd/github-actions:publish-nightly"
  ]
]

Where the last step is a Docker image that knows how to publish the resulting binaries. Of course more metadata than just the image name is needed in this case, to know what the produced artifacts are, etc. This is comparable with a Jenkinsfile (like here) and with what GitHub Actions seems to be going to do.
Of course the nightly part depends on the action.
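What that extra metadata could look like is not decided; purely as a sketch (the keys and file patterns are made up), a step could carry the image name plus a description of its artifacts:

- image: "openttd/compile-farm:release-linux-generic-gcc"
  artifacts:
    - "openttd-*-linux-generic-amd64.tar.xz"
- image: "openttd/github-actions:publish-nightly"
  consumes-artifacts: true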

Web-services based repository

For example, the pipeline for OpenTTD/DorpsGek-irc would look something like:

[
  "openttd/compile-farm-ci:tox",
  "openttd/compile-farm:docker-build"
]

Where the first runs things like unit tests, flake8, etc., and the second tries to build the Docker image, to see if that still works.

The other three actions would look very similar, with two additional steps:

[
  "openttd/compile-farm-ci:tox",
  "openttd/compile-farm:docker-build",
  "openttd/compile-farm:docker-publish",
  "openttd/compile-farm:deploy-staging"
]

Here too you need some more information to know what to publish, etc.

The last step will be different depending on the action, of course.

Infrastructure as Code (IaC)

OpenTTD will define its full infrastructure as code. This will be done via Helm and its Charts. OpenTTD's infrastructure is complex; Charts help split it up into small, simple pieces.

For example, DorpsGek needs three services to run: the GitHub listener, the IRC announcer, and one or more runners (to do the work). These three services are defined in three different repositories.
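As an illustration of how such a service could be captured in a Chart, a purely hypothetical values.yaml for DorpsGek might list the three services with their images and tags (names and keys are assumptions):

github-listener:
  image: openttd/dorpsgek-github
  tag: 0.1.0-12-g1234567
irc-announcer:
  image: openttd/dorpsgek-irc
  tag: 0.1.0-12-g1234567
runner:
  image: openttd/dorpsgek-runner
  tag: 0.1.0-12-g1234567
  replicas: 2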

Other services are, for example:

  • Website
  • BaNaNaS
  • Eints
  • MasterServer
  • (many more)

This is still a Work In Progress.

Glue GitHub workflow with IaC

There needs to be some glue between the GitHub workflow and IaC. This glue takes care of the following:

The IaC repository contains a mapping that lists which repository links to which images/Charts. Whenever a deploy is executed, the Charts are updated with the new Docker image tag based on this file. After that, those Charts are deployed to the infrastructure, bringing the new image online.
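The format of that mapping is not set in stone; a minimal sketch of what it could look like (keys and repository names are hypothetical):

OpenTTD/DorpsGek-irc:
  images:
    - openttd/dorpsgek-irc
  charts:
    - dorpsgek
OpenTTD/website:
  images:
    - openttd/website
  charts:
    - website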

In other words:

  • A Pull Request is merged in OpenTTD/DorpsGek-irc.
  • This triggers a pipeline which builds a new Docker image.
  • This image is published based on git describe, for example: 0.1.0-12-g1234567 (where 12 is the number of commits since the 0.1.0 tag, and after 'g' follows the abbreviated hash of the commit on top of the tree).
  • The glue picks up on the new tag and updates the Charts based on the mapping. If the action was for staging, only the staging Charts are updated, etc.
    • A single repository can publish multiple images
    • A single image can be used in multiple Charts
  • These changes are committed into the repository.
  • Because of the repository change, Helm runs the Charts and the new image is deployed.
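To make the "Charts are updated with the new Docker image tag" step concrete: in practice it would be a small change in the Chart values, something like (hypothetical snippet):

irc-announcer:
  image: openttd/dorpsgek-irc
  tag: 0.1.0-12-g1234567   # previously 0.1.0-11-gabcdef0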

Binary releases

Part of OpenTTD is releasing new binaries at regular intervals: nightlies every night at 20:00 CE(S)T, and releases whenever the developers feel like it.

Being able to release previews based on Pull Requests is also valuable.

These binary repositories follow the same flow as IaC web-services, but instead of being deployed, they are published.

The current idea is to approach this problem like this:

  • Service to publish files to a CDN / Mirrors (current publish.sh).
  • Service to redirect people to the correct mirror (current ottd-content).
  • Service to download these files as fallback (current https://master.binaries.openttd.org).
  • Ping to https://www.openttd.org service to update its cache when a new file becomes available.
  • Internal service to supply a new pack of files (a single release) to be published (currently doesn't exist).
    • Depending on the action, this can be a nightly (staging for IaC), release (production for IaC) or unlisted (preview for IaC)

One additional piece of functionality is needed for the staging action: the ability to hold back a pipeline run until certain conditions have been met. For example: wait until 20:00 CE(S)T before running the pipeline.
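How such a hold is expressed is still open; one option would be a condition on the pipeline entry in .dorpsgek.yml, for example (the hold-until key is hypothetical):

branch:
  hold-until: "20:00 Europe/Amsterdam"
  pipeline:
    - ["openttd/compile-farm:release-linux-generic-gcc"]
    - ["openttd/github-actions:publish-nightly"]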

Migrating the infrastructure

OpenTTD runs many services which overlap, interconnect, etc. Most of it was written 15 years ago, and back then it was a lot easier to put everything in a single service, as having one service call another was considered silly.

These days it is very common to have a backend API which is called by, for example, an HTML frontend.

This makes migration from our old infrastructure to anything new a complex task. The next few chapters will explain the services we have, how they interact with each other, and what the possible solutions are. This list is not complete, as we keep finding small pieces of glue/services that were totally forgotten over the years.

Main website

The main website is built on top of Django, serves static HTML pages (no JavaScript), and serves several goals:

All of these have their own problem in terms of moving them to another place. Some technical things that are worth mentioning:

  • The Django used is a heavily modified Django version from 2004.
  • It is prepared for multiple languages, which was never rolled out properly (this is the /en/ in the URL).
  • It uses MySQL to read information like the Blog, Server Listing, Server Details, etc.
  • It uses files to know which Download pages are available.
  • It uses HTTP callbacks to read descriptions for Download pages (based on extensions).
    • This is a left-over from attempts to decouple the main website from the rest of the infrastructure
  • It also uses HTTP callbacks to know the current released version from finger.

In short, it is a tangle of all kinds of issues, which we have to untangle. The current suggested approach is:

Make a single service which serves the main website. In order to do this, we need some hacks and patches to get there:

  • Remove the MySQL connection.
    • Redirect the Servers to the old service. In the future this should become a new Service.
    • The Blog should be in the repository. Adding a News item means a commit in the repository, and a full deployment before it becomes live.
    • The Developers should be in the repository. Changes in developers should be a commit to reflect that change.
  • Generate static files from the repository
  • Put blog posts, developer info, etc. in Markdown/YAML files (see the sketch after this list).
    • A template should make that into HTML.
  • Redirect the Download pages to the old service. In the future this should become a new Service.
  • The Download banner in the top left should be fetched from finger.
    • Possibly every N minutes some pages have to be regenerated to pick up on the new information.
  • Only support English.
    • Bonus points to consider multiple languages. They should of course be in the repository.
    • In the future possibly something like eints can handle the translation for the website too.
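As an example of what a news item as a file in the repository could look like (the exact format is an assumption, not decided; all values are placeholders):

title: "Example news item"
date: 2019-01-01
author: "OpenTTD team"
body: |
  The Markdown content of the news item goes here.
  A template turns this into the HTML page.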

Basically, it is more important to move the main website to its own service (with all its current functionality still working, by whatever means necessary) than it is to do it perfectly. Once the main website is in its own repository, other people can take over to improve it, etc.

Server listing should be part of the Master Server and Master Server Updater. This will most likely be several services on its own, but that is for another chapter.

Downloads should be part of the ottd_content, and integrate with the CDN/Mirror. This too is for another chapter.

Glossary

Pipeline

With "pipeline" is meant a notation where a sequence of actions is defined. Most actions will be a reference to a Docker image, possibly with some meta data required for that Docker image to do its job correctly. GitHub Actions represent best what the idea behind "pipeline" is.

How it is used for OpenTTD is not yet defined.
