One Design Document
===
###### tags: `Taiker`
{%hackmd @idoleat/spoiler-plus %}
[Implementation Document](/YC_DDIbAQgqS9syz2XZngA)
$ONE$ is a distributed, networked system that unifies resources as one, aiming to be user friendly for building and scaling distributed services/applications. It is more like an operating system running on top of the internet, providing you a full set of tools to build services and manage resources, as easy as using a game engine. People can connect to services running on top of $ONE$ as they would to any normal service, because $ONE$ has unified all the resources for you.
We strive to make the developing experience with $ONE$ as easy as possible, just like using a game engine. User experience comes first, alongside the ability to make things distributed to gain more performance, throughput, and storage.
You can:
* Install a service
* Start a service
* Build a service based on an existing service
* Coordinate services (microservices)
A service consists of one or more workers.
You can:
* Use a service by calling service functions
* Task split version 1
    * A worker will handle the function call
    * That worker may split the task into subtasks and delegate them to other workers
    * Get the result from the handler or from the PRNG/hash-selected final worker
* Task split version 2
    * Every function is a wasm binary to be executed concurrently
    * Version 2 is actually a specialized version 1
* A worker process may have one or more services running on top of it
    * I'm not sure whether to allow this or not
    * (Current decision) Or multiple workers on a node? Let the OS schedule the workers
    * If a worker runs more than one service, we need to schedule which service to run
* Worker modules
    * Worker group: broadcast
    * Worker swarm (shoaling?): a group of workers acting as one worker
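As an illustration of task-split version 1, here is a minimal single-machine sketch in Rust, with threads standing in for workers. `handle_call` and the summing task are hypothetical, not part of the design:

```rust
use std::sync::mpsc;
use std::thread;

// The worker that receives the call splits the task into subtasks,
// delegates each to another worker (a thread here), then collects
// and combines the partial results.
fn handle_call(data: Vec<i64>, n_workers: usize) -> i64 {
    let chunk = (data.len() + n_workers - 1) / n_workers;
    let (tx, rx) = mpsc::channel();
    for part in data.chunks(chunk) {
        let part = part.to_vec();
        let tx = tx.clone();
        // Each delegated worker computes its piece and reports back.
        thread::spawn(move || {
            tx.send(part.iter().sum::<i64>()).unwrap();
        });
    }
    drop(tx); // close the channel so the collector loop can finish
    rx.iter().sum()
}

fn main() {
    let total = handle_call((1..=100).collect(), 4);
    println!("{total}"); // 5050
}
```

The same shape would generalize to networked workers by replacing the channel with the worker communication layer.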
Every worker is the same. A worker is a running worker process that provides the basic worker functionalities (communication and computing). Service code utilizes those functionalities to fit its needs.
We are considering making the worker a kernel module running in kernel mode for efficiency (no mode switching caused by system calls, higher scheduling priority, privileged access, etc.)
Or eventually build a networked operating system. (Please do remember Plan 9)
## Key components and features
### Resource management
Manage all the computing resources and storage. Developers can specify when to use which machines to do what. All the other features, such as protocols and architecture schemas, are built on top of resource management.
### Protocols
Define how machines act and are coordinated. For example:
* Fully connected to strongly synchronize something by broadcasting
* Direct memory access
* Elect one leader to sync
* Without leader
Not sure if architecture schema is a thing or not. Anyway, you can use protocols to coordinate workers to perform a task. Note that you should analyze your tasks: if there are no shared resources (data structures, variables, etc.), then just assign the tasks directly (typically these are called data-intensive tasks, such as image/audio processing). But analyzing and dealing with shared resources is notoriously difficult; these are known as concurrency problems. So here comes ***Multiverse***, a new way to perform concurrency.
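To make the no-shared-resource case concrete, here is a Rust sketch that assigns a data-intensive task directly to workers (threads as stand-ins), each owning its own slice with nothing shared. `brighten` and the image data are illustrative:

```rust
use std::thread;

// Each worker brightens its own slice of a grayscale image.
// No shared data structure exists, so tasks are assigned directly;
// no coordination protocol is needed beyond collecting results.
fn brighten(image: Vec<u8>, workers: usize) -> Vec<u8> {
    let chunk = (image.len() + workers - 1) / workers;
    let mut handles = Vec::new();
    for part in image.chunks(chunk) {
        let part = part.to_vec(); // each worker owns its copy outright
        handles.push(thread::spawn(move || {
            part.into_iter()
                .map(|p| p.saturating_add(40))
                .collect::<Vec<u8>>()
        }));
    }
    let mut out = Vec::new();
    for h in handles {
        out.extend(h.join().unwrap());
    }
    out
}

fn main() {
    println!("{:?}", brighten(vec![0, 100, 250], 2)); // [40, 140, 255]
}
```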
### Builtin Services
It's a little bit like a microkernel. $ONE$ provides the basic functionality of resource management and system setup. All the other common services are built as services on top of $ONE$.
* oneFile
* oneDataBase
* oneShell
* oneAuthentication
* onePermission
* oneMonitor
* oneLogger
* oneDesktop
    * Like the webtop [Aroz](https://docs.google.com/presentation/d/1dNsMSbc0gTEWTgt3Lfp3ohA79bmijKokQasXaeLUnY0/edit#slide=id.g13395a7afd0_0_40) suite of streamed applications and gadgets
## Multiverse<small> [(ref)](https://www.youtube.com/watch?v=b7rZO2ACP3A)</small>
:::spoiler why
Something I came up with during the 19 hours I locked myself out of both the lab and my home over the Mid-Autumn Festival holiday
:::
Cut a sequential program (algorithm) into blocks whose unit is the atomic operation. Every worker holds a copy of the whole program, and each worker is assigned one or more blocks to execute. A block that depends on another worker's result treats that result as a promise at first, and resolves the promise once the actual value arrives.
A worker is an abstraction over a POSIX thread, a process, a physical core, or a networked machine (node).
Atomic operations here mean a sequence of instructions that must run to completion without being divided, for example loading a value from memory into a register, modifying it, and storing it back. Next are pieces that are better kept on the same worker and not split, such as loops, because variables inside a loop usually need temporal and spatial locality. The algorithm decides where to cut based on these situations.
So far I have seen others restrict the work being split to specific task types, such as video processing, so they do not have to consider too many complicated corner cases.
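A minimal single-machine sketch of the idea, assuming two workers (threads) and a channel playing the role of the promise; the two-block program is made up for illustration:

```rust
use std::sync::mpsc;
use std::thread;

// The sequential program `a = 2 + 3; b = a * 10;` cut into two atomic
// blocks on two workers. Block B depends on block A's result, so it
// holds a promise (a channel receiver) and resolves it on arrival.
fn run() -> i32 {
    let (promise_a_tx, promise_a_rx) = mpsc::channel();

    // Worker 1 executes block A and fulfils the promise.
    let w1 = thread::spawn(move || {
        let a = 2 + 3;
        promise_a_tx.send(a).unwrap();
    });

    // Worker 2 executes block B; recv() is the promise being resolved.
    let w2 = thread::spawn(move || {
        let a: i32 = promise_a_rx.recv().unwrap();
        a * 10
    });

    w1.join().unwrap();
    w2.join().unwrap()
}

fn main() {
    println!("{}", run()); // 50
}
```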
### Goal
Find a new memory model that avoids data races/dependencies by not sharing resources, so that a program can be split into several blocks for concurrent execution. The current idea is to do concurrency at the wasm instruction level: find suitable split points among the wasm instructions, and insert any instructions needed to make the split work. The minimal unit of splitting is
Other candidates for splitting (the more portable the better, so the same splitting method can be used on every platform):
* [Rust eBPF bytecode](https://github.com/qmonnet/rbpf)
* LLVM IR (can we deeply utilize the LLVM toolchain?)
* Java bytecode
* dynamic languages?
* assembly language (not ideal; the splitting method would have to be redesigned for every ISA)
A new memory model is not a prerequisite for splitting, but data dependencies and data races will be major obstacles (need proof), as will the cutting method itself.
References on the Rust memory model:
* https://news.ycombinator.com/item?id=29109156
* https://www.youtube.com/watch?v=zO1a2986NHg
* https://www.youtube.com/watch?v=rDoqT-a6UFg
* https://github.com/rust-lang/rust-memory-model
#### Why [WASM](https://webassembly.github.io/spec/core/intro/index.html)?
1. Independent of hardware architectures: WASM is a virtual ISA, and the toolchains for compiling mainstream languages to WASM are fairly mature, especially for Rust
2. [Sequentially executed instructions](https://webassembly.github.io/spec/core/intro/overview.html#concepts) (no reordering) (to be confirmed)
> **Instructions:**
> The computational model of WebAssembly is based on a stack machine. **Code consists of sequences of instructions that are executed in order.** Instructions manipulate values on an implicit operand stack and fall into two main categories. Simple instructions perform basic operations on data. They pop arguments from the operand stack and push results back to it. Control instructions alter control flow. Control flow is structured, meaning it is expressed with well-nested constructs such as blocks, loops, and conditionals. Branches can only target such constructs.
3. Instructions can be organized into functions calling each other
    * Enabling a functional style at the instruction level?
4. By design, a program can be split into multiple modules, which makes transferring, referencing, and similar operations easier
5. Compared with splitting IR, it is easier to decide how to split at run time based on the current environment (to be confirmed)
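A toy Rust model of why point 2 helps with splitting: since instructions execute in order against one operand stack, any index where the stack depth returns to zero is a safe cut point, because neither half needs the other's operand stack. The instruction set here is invented for illustration, not real wasm:

```rust
// A made-up stack-machine instruction set with wasm-like behavior.
#[derive(Clone, Copy)]
enum Op {
    Const(i32), // push a constant (pushes 1 value)
    Add,        // pop 2, push 1 (net -1)
    Drop,       // pop 1 (net -1)
}

// Indices i where cutting ops into ops[..i] / ops[i..] is safe,
// i.e. where the operand stack is empty between the two halves.
fn safe_split_points(ops: &[Op]) -> Vec<usize> {
    let mut points = Vec::new();
    let mut depth: i32 = 0;
    for (i, op) in ops.iter().enumerate() {
        depth += match op {
            Op::Const(_) => 1,
            Op::Add => -1,
            Op::Drop => -1,
        };
        if depth == 0 {
            points.push(i + 1);
        }
    }
    points
}

fn main() {
    // (1 + 2) drop; (3 + 4) drop → safe to cut between the two halves
    let prog = [Op::Const(1), Op::Const(2), Op::Add, Op::Drop,
                Op::Const(3), Op::Const(4), Op::Add, Op::Drop];
    println!("{:?}", safe_split_points(&prog)); // [4, 8]
}
```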
### Proof of concept
Needs actual cases. I hope the application of the concept can be universal.
## Setup
**Setting over Config:** Everything should work out of the box, like the everyday apps we use. Users go to settings to change behaviors, details, etc., instead of editing a bunch of config files before starting. Of course you can still call it config; we are just borrowing the concept of settings from daily life.
For this reason, there is a `setting.toml` in the root directory, which contains the settings TaiOne needs to work out of the box. TaiOne is written in Rust, so we use TOML.
Here is an example setting:
```toml
[Me]
IP = "192.168.1.1"
port = "56351"

[[Peers]]
IP = "192.168.1.2"
port = "33543"

[[Peers]]
IP = "192.168.1.3"
port = "25453"

[[Peers]]
IP = "192.168.1.4"
port = "15345"
```
The default setting would be:
```toml
[Me]
IP = "default"
port = "default"

# No peers yet; the setup script fills them in.
```
The `"default"` string triggers a setup script that finds a proper value, and the found value is written back. For example, the script will run `ip addr` to find your IP and try the default port `56929`. That should work in a LAN environment. In a WAN environment, a prompt should pop up to ask for firewall permission. (I think I should put a message somewhere to remind server owners to enable the firewall....) Another reason to prefer setting over config is that you cannot expect users to have the right config ([ref](https://www.ithome.com.tw/news/152685)), just as you cannot expect users to manage memory well.
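A sketch, in Rust with hypothetical names, of how the setup script might resolve `port = "default"`: try the default port `56929`, and fall back to an OS-assigned free port if it is taken:

```rust
use std::net::TcpListener;

// Resolve the port value from setting.toml. "default" means: bind the
// assumed default port 56929 if free, otherwise let the OS pick a free
// port (bind to 0). The resolved value would be written back.
fn resolve_port(setting: &str) -> u16 {
    match setting {
        "default" => match TcpListener::bind(("127.0.0.1", 56929)) {
            Ok(l) => l.local_addr().unwrap().port(),
            // Port taken: bind port 0 to get an OS-assigned free port.
            Err(_) => TcpListener::bind(("127.0.0.1", 0))
                .unwrap()
                .local_addr()
                .unwrap()
                .port(),
        },
        s => s.parse().expect("port must be a number or \"default\""),
    }
}

fn main() {
    println!("resolved port = {}", resolve_port("default"));
}
```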
Any node in the cluster can act as an endpoint for users to add more nodes. By default, the new peer's IP and port are broadcast to everyone, which means all nodes are fully connected. If not all nodes are fully connected, some nodes will be missing some peers from their peer list (and unable to establish any sort of protocol with them), so additional routing information would be needed in `setting.toml`. No!
## Future?
High bandwidth, high speed, 5G, For $ONE$
---
PRNG lets workers share a consensus for communication from the beginning (the seed): they know what to do when certain events happen (e.g. where to find task results, which backup handler to use). As coded distributed computing describes, communication cost grows as tasks are split into more pieces. A PRNG can reduce that communication thanks to the pre-acknowledged seed.
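A minimal Rust sketch of the shared-seed idea: two workers that agree on a seed derive the same pseudo-random choice of handler for each event without exchanging messages. The xorshift64 generator and `handler_for_event` are illustrative:

```rust
// A tiny deterministic PRNG (xorshift64); any deterministic PRNG works.
// The seed must be nonzero for xorshift.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Every worker computes "which worker handles event k" locally from
// the shared seed, so no messages are needed to agree on the answer.
fn handler_for_event(seed: u64, event_index: u64, n_workers: u64) -> u64 {
    let mut rng = XorShift64(seed);
    let mut last = 0;
    for _ in 0..=event_index {
        last = rng.next();
    }
    last % n_workers
}

fn main() {
    // Worker A and worker B agree without communicating.
    let a = handler_for_event(42, 3, 5);
    let b = handler_for_event(42, 3, 5);
    assert_eq!(a, b);
    println!("both pick worker {a}");
}
```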
I'm not sure whether I'm going to build a full-blown networked operating system, or just a service framework, or just a program.
---
Note that the scalability TaiOne cares about is not only speed: when we put in more hardware, all of it should be doing work instead of waiting idle. Even when one job cannot be done by many workers together, we should at least be able to migrate or offload work to others.
---
All the advanced features are built on top of basic resource management. You can use the management functionality to implement consensus, a non-blocking stack on DMM, or a worker swarm, with both centralized and decentralized options.
---
Every worker is a process in user space or in the kernel (via a kernel module).
---
:::spoiler old faq
Could the NoXerve agent add a publisher/subscriber communication mode?
One of the crazy features of the earlier Node.js implementation was that after you `require` a Node.js module, you could serialize(?) the resulting JS object and send it to someone else to execute. (I guess not every object works; some are platform specific, or are just a JS wrapper around a binary, but if the environments are identical it should be fine.)
Does becoming a Service worker let you use the Service? Can a Service worker use a Service?
Can you use a Service only if you become a Service worker?
:::