Use OCluster to schedule CB pipelines on multiple machines
For Current-Bench (CB), we are currently only able to run benchmarks on one machine (autumn). To scale up and to allow Sandmark benchmarks to run on customized multicore machines, we want to use OCluster to dispatch the jobs to different workers. As a future goal, ocaml-ci could perhaps use these machines to run CI jobs when they are idle.
OCluster workers are currently restricted to the task of building an image with Dockerfile/OBuilder. We want to extend their capability so that workers can run an OCurrent pipeline. That way, we will be able to cut the "build and run" sub-pipeline out of the CB pipeline and create workers specialized for that task.
Ideally, the design is fairly independent of CB's needs: it allows any subgraph of a pipeline to be extracted and run on specialized workers. Clients and workers are expected to use the OCluster pool system as a contract guaranteeing that the workers will evaluate their jobs with the right pipeline.
When a client submits a new job to the cluster, the worker interprets it as an input node for its pipeline. The worker can then stream the state of the output nodes and the logs of its internal jobs back to the client. These output nodes appear as standard OCurrent values on the client.
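To make the streaming step more concrete, here is a minimal OCaml sketch of how a client might keep track of the latest state of each output node as updates arrive from a worker. The `state`, `update`, `apply`, `lookup` and `replay` names are hypothetical and are not part of the OCurrent or OCluster APIs; node values are simplified to strings, and the node names are borrowed from the CB example.

```ocaml
(* Hypothetical model for illustration; not the OCurrent or OCluster API. *)

(* State of one output node as seen by the client. *)
type 'a state =
  | Pending            (* the worker has not produced the value yet *)
  | Ready of 'a        (* the worker produced a value *)
  | Failed of string   (* the worker reported an error *)

(* One message streamed from the worker to the client. *)
type update = { node : string; state : string state }

module Outputs = Map.Make (String)

(* Record the latest state of a node; later updates overwrite earlier ones. *)
let apply outputs { node; state } = Outputs.add node state outputs

(* Nodes the worker has not mentioned yet are treated as pending. *)
let lookup outputs node =
  match Outputs.find_opt node outputs with
  | Some state -> state
  | None -> Pending

(* Replay a stream of updates in arrival order. *)
let replay updates = List.fold_left apply Outputs.empty updates

let example =
  replay
    [ { node = "build_job_id"; state = Ready "job-1" };
      { node = "run_job_id"; state = Pending };
      { node = "run_job_id"; state = Ready "job-2" } ]
```

In this sketch, a node the worker has not reported on yet (such as `"output"` in `example`) simply reads as `Pending` on the client side.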
As an example, this is what the automaton currently looks like on Current-Bench, with only one repository:
With the OCluster integration, the same graph will look like this:
The orange `ocluster-pipeline` symbolises the connection to an active worker executing the sub-pipeline. The original automaton had some implicit nodes that are now explicit: our CB database is filled progressively with the `build_job_id`, then the `run_job_id`, and finally the `output`, which is the JSON metrics resulting from the benchmark run. Here the worker has not yet completed the run, hence the `output` is pending (orange).
The worker has the missing pipeline that was extracted, with the same three output nodes:
A known limitation is that the worker pipeline can only receive one input from the client, as it's not currently possible for the client to stream the status of multiple nodes to the workers. As you would expect, the client will cancel the job if its pipeline is no longer required (freeing the worker to do something else), and the worker will inform the client of any failure. Also, although it is not shown in the drawing, a worker can have multiple jobs in progress.
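The job lifecycle described above can be sketched as a small bookkeeping table on the worker side. This is only an illustration under simplifying assumptions: every type and function name here is hypothetical, a job carries exactly one input (represented as a string), and no real scheduling or networking takes place.

```ocaml
(* Hypothetical bookkeeping for illustration; not OCluster code. *)

type status =
  | In_progress
  | Cancelled_by_client        (* the client's pipeline no longer needs the job *)
  | Worker_failed of string    (* the failure is reported back to the client *)

(* A job has exactly one input, matching the known limitation. *)
type job = { input : string; mutable status : status }

type worker = { jobs : (int, job) Hashtbl.t; mutable next_id : int }

let create () = { jobs = Hashtbl.create 8; next_id = 0 }

(* Accept a new job and return its identifier. *)
let submit w input =
  let id = w.next_id in
  w.next_id <- id + 1;
  Hashtbl.replace w.jobs id { input; status = In_progress };
  id

let set_status w id status =
  match Hashtbl.find_opt w.jobs id with
  | Some job -> job.status <- status
  | None -> ()

let cancel w id = set_status w id Cancelled_by_client
let fail w id msg = set_status w id (Worker_failed msg)

(* Number of jobs the worker is still busy with. *)
let in_progress w =
  Hashtbl.fold
    (fun _ job n -> if job.status = In_progress then n + 1 else n)
    w.jobs 0
```

A worker created with `create ()` can hold several jobs at once; cancelling or failing one of them lowers `in_progress` without affecting the others.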
We would like to break the work into a few PRs to ease the review of the proposed changes:
Step 1: Extend the OCluster protocol so that job descriptions can carry a different set of information than "how to build this repository". We also need to extend the OCluster logs so that workers can stream structured information to the client about the `Current.state` of their output nodes and their jobs' logs / artifacts.
Nearly done; we just need to check that it works correctly with step 3 before submitting: https://github.com/art-w/ocluster/tree/capnp-anypointer
The changes are in principle backward compatible: running schedulers, workers and clients don't need to be upgraded to interact with schedulers/workers/clients using the new protocol (as long as they don't use the extension).
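The backward-compatibility argument can be illustrated with a plain OCaml variant. This is a sketch under assumptions, not the actual OCluster schema: the `action` type, the `Custom` constructor, its `kind` and `payload` fields, and both `run_*` functions are hypothetical names chosen for the example.

```ocaml
(* Hypothetical types for illustration; not the actual OCluster schema. *)

type action =
  | Docker_build of string   (* contents of a Dockerfile *)
  | Obuilder of string       (* an OBuilder spec *)
  | Custom of { kind : string; payload : string }  (* the proposed extension *)

(* A worker that predates the extension only knows the two build actions. *)
let run_legacy = function
  | Docker_build _ -> Ok "docker image built"
  | Obuilder _ -> Ok "obuilder image built"
  | Custom { kind; _ } -> Error ("unsupported job kind: " ^ kind)

(* An upgraded worker handles the new case and defers to the old logic. *)
let run_extended = function
  | Custom { kind = "pipeline"; payload } ->
    Ok ("pipeline started on " ^ payload)
  | job -> run_legacy job
```

As long as a client only submits `Docker_build` or `Obuilder` jobs, both functions return the same result, which is the sense in which the extension leaves existing deployments untouched.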
Step 2: Simplify the definition of new workers. This will be a small refactoring to introduce a functor that separates the OCluster protocol logic from the worker-specific implementation of `build`, `purge`, `update`, etc.
Nearly done; it needs some tiny adjustments to ease the completion of step 3.
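A rough sketch of what such a functor could look like is given below. The module type `WORKER_IMPL`, the functor `Make`, and the simplified signatures of `build`, `purge` and `update` are assumptions made for illustration; the actual OCluster worker interface is likely to differ (for example, it is asynchronous).

```ocaml
(* Hypothetical interface for illustration; not the OCluster worker API. *)

(* What a specific kind of worker has to provide. *)
module type WORKER_IMPL = sig
  type job
  val build : job -> (string, string) result
  val purge : unit -> unit
  val update : unit -> unit
end

(* Shared protocol logic, written once for every implementation. *)
module Make (Impl : WORKER_IMPL) = struct
  (* Run a job and turn its result into the message sent to the client. *)
  let handle job =
    match Impl.build job with
    | Ok output -> "success: " ^ output
    | Error msg -> "failure: " ^ msg

  (* Housekeeping requested by the scheduler. *)
  let maintenance () =
    Impl.purge ();
    Impl.update ()
end

(* A toy implementation whose "build" just upper-cases its input. *)
module Echo_impl = struct
  type job = string
  let build job =
    if job = "" then Error "empty job" else Ok (String.uppercase_ascii job)
  let purge () = ()
  let update () = ()
end

module Echo_worker = Make (Echo_impl)
```

With this split, a pipeline-executing worker would only need to supply another module matching `WORKER_IMPL`, while the protocol handling in `Make` stays unchanged.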
Step 3: Add a `Current_worker` public library that can be used to create a worker that executes a pipeline.
Remaining TODOs:
Step 4: Extend the `Current_ocluster` library so that clients can submit custom jobs to their `Current_worker`, and receive the outputs/logs/etc. The remaining TODOs are the dual of step 3.
Some ideas we could explore later: