
AOP-5: WeaveDrive

Status: Draft-1
Authors: Sam Williams, Tom Wilson.

Arweave is a permanent hard drive. AO, a decentralized supercomputer, lives on top of it. This AOP adds virtual file system support so that (large) data read from Arweave can be passed directly into AO processes efficiently.

Data Protocol

In order to use WeaveDrive, the AO Module or Process (following the main AO data protocol specification) should be launched with the following additional tags:

NOTE: To fully understand the AO specification, it is recommended to read the published spec at https://ao.arweave.net/#/read

Tag Name            Tag Value(s)                        Quantity
Extension           WeaveDrive                          1
Variant             weavedrive.1                        1
Availability-Type   {Assignments|Individual|Library}    {0-2}
Attestor            {AttestorWalletID}                  {0-n}

By default, a process using WeaveDrive may only access data whose availability has been attested to by its Scheduling Unit (SU). Adding Attestor tags lets the process also access data whose availability has been attested to by the named attestors.

The Availability-Type tags confer the following meanings:

  • Assignments or no Availability-Type tag given: The process can load only data that has previously been assigned to it, including the present message it is processing.
  • Individual: The process must be able to load data items that are listed in individual tags on Available data items, as well as any data item for which its SU or a specified attestor has previously issued an Assignment data item (for the calling process or otherwise).
  • Library: The process must be able to load data items that are given in the data body of an Available data item, as well as the items made loadable by Individual.
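
The tag set above can be sketched as a small helper that assembles the extra tags for a Module or Process. This is only an illustration: the function name `buildWeaveDriveTags`, the `{ name, value }` tag shape, and the placeholder wallet ID are assumptions, not part of this specification.

```javascript
// Hypothetical helper (not part of the spec): assembles the additional
// WeaveDrive tags for a Module or Process as { name, value } pairs.
const AVAILABILITY_TYPES = ["Assignments", "Individual", "Library"];

function buildWeaveDriveTags({ availabilityTypes = [], attestors = [] } = {}) {
  // The tag table allows between 0 and 2 Availability-Type tags.
  if (availabilityTypes.length > 2) {
    throw new Error("Availability-Type may appear at most twice");
  }
  for (const type of availabilityTypes) {
    if (!AVAILABILITY_TYPES.includes(type)) {
      throw new Error(`Unknown Availability-Type: ${type}`);
    }
  }
  return [
    { name: "Extension", value: "WeaveDrive" },
    { name: "Variant", value: "weavedrive.1" },
    ...availabilityTypes.map((value) => ({ name: "Availability-Type", value })),
    ...attestors.map((value) => ({ name: "Attestor", value })),
  ];
}

// Example: a process that opts into Individual availability and trusts
// one extra attestor (placeholder wallet ID).
const tags = buildWeaveDriveTags({
  availabilityTypes: ["Individual"],
  attestors: ["EXAMPLE_ATTESTOR_WALLET_ID"],
});
```

Omitting `availabilityTypes` entirely yields only the Extension and Variant tags, which corresponds to the default Assignments behaviour.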

Attestations

Every WeaveDrive Process has at least the Assignments availability type, which is the default. This means that it can load data items that have been assigned to it. In order to do this, an attestation must be created to signal to the Process that it is safe to load the data. The attestation must be signed by an Attestor set on the Process, or by the Scheduling Unit (SU) wallet for the Process. These attestations are represented as follows:

Tag Name        Tag Value(s)    Quantity
Data-Protocol   ao              1
Type            Attestation     1
Message         {TXID}          1
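
As a rough sketch of how a host might decide whether an attestation covers a given TXID, the check below compares the signer against the process's SU wallet and Attestor list and then matches the three tags. The field names `owner`, `schedulerWallet`, and `attestors` are assumptions made for this example, and signature verification itself is deliberately left out.

```javascript
// Returns the first tag value with the given name, or undefined.
function tagValue(tags, name) {
  const tag = tags.find((t) => t.name === name);
  return tag ? tag.value : undefined;
}

// Hypothetical check (not part of the spec): does `attestation` make `txid`
// loadable for `process`? Assumes the signature was already verified and that
// `attestation.owner` holds the signer's wallet ID.
function attestsTo(attestation, txid, process) {
  const trusted = new Set([process.schedulerWallet, ...process.attestors]);
  return (
    trusted.has(attestation.owner) &&
    tagValue(attestation.tags, "Data-Protocol") === "ao" &&
    tagValue(attestation.tags, "Type") === "Attestation" &&
    tagValue(attestation.tags, "Message") === txid
  );
}

// Example with placeholder identifiers.
const exampleProcess = {
  schedulerWallet: "EXAMPLE_SU_WALLET",
  attestors: ["EXAMPLE_ATTESTOR_WALLET"],
};
const exampleAttestation = {
  owner: "EXAMPLE_ATTESTOR_WALLET",
  tags: [
    { name: "Data-Protocol", value: "ao" },
    { name: "Type", value: "Attestation" },
    { name: "Message", value: "EXAMPLE_TXID" },
  ],
};
const accepted = attestsTo(exampleAttestation, "EXAMPLE_TXID", exampleProcess);
```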

Available data items

In order for the execution of processes in AO to be deterministic, their inputs must be consistently ordered and always available. AO uses Arweave to achieve the latter requirement, permanently storing the data that is used by processes on a shared, immutable ledger. Arweave does not, however, attest to the seeding of data on its own. If users want to pay for data storage on Arweave and keep the data private, the protocol will not stop them. Consequently, the availability (public distribution and upload to Arweave) of AO messages is attested to by the SU for each process. The behaviour of this SU is then regulated by the AO staking mechanics.

WeaveDrive adds an additional way for SUs to signal and attest to the availability of data, which can then be used by their client processes. This mechanism comes in the form of Arweave data items that express that an ID is Available. These items are represented as follows:

Tag Name        Tag Value(s)       Quantity
Data-Protocol   WeaveDrive         1
Variant         WeaveDrive.tn.1    1
Type            Available          1
Available       {TXID}             {0-n}
Data            {TXID[0-n]}        {0-1}
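
To illustrate how the Available tags of such an item might be consumed, the sketch below collects the TXIDs listed in its `Available` tags. The function name and the `{ tags: [{ name, value }] }` item shape are assumptions for this example; the `Data` entry is not handled here, since its exact encoding is not detailed above.

```javascript
// Hypothetical helper (not part of the spec): returns the TXIDs listed in the
// Available tags of a WeaveDrive "Available" data item, or [] for other items.
function availableIds(item) {
  const value = (name) => {
    const tag = item.tags.find((t) => t.name === name);
    return tag ? tag.value : undefined;
  };
  if (value("Data-Protocol") !== "WeaveDrive" || value("Type") !== "Available") {
    return [];
  }
  return item.tags.filter((t) => t.name === "Available").map((t) => t.value);
}

// Example with placeholder TXIDs.
const availableItem = {
  tags: [
    { name: "Data-Protocol", value: "WeaveDrive" },
    { name: "Variant", value: "WeaveDrive.tn.1" },
    { name: "Type", value: "Available" },
    { name: "Available", value: "EXAMPLE_TXID_1" },
    { name: "Available", value: "EXAMPLE_TXID_2" },
  ],
};
const ids = availableIds(availableItem);
```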

Data Delivery ABI

The WeaveDrive extension driver is injected by the CU into the runtime environment of a process during the evaluation of a message. Version 1 of WeaveDrive's ABI only supports the WASM64-unknown-unknown virtual machine type.

The core components of the WeaveDrive extension are as follows:

Front-end Interface

The 'front-end' of the WeaveDrive extension is exposed via two functions with the following C signatures:

weavedrive_open(const char* c_filename, const char* mode) => int file_descriptor

weavedrive_read(int fd, int *dst_ptr, size_t length) => int bytes_read

In their explicit WASM64 form, these functions are represented as follows:

weavedrive_open(i64, i64) -> i32

weavedrive_read(i32, i64, i64) -> i32

In the reference implementation, these functions make calls to the WeaveDrive driver module on the host via Emscripten's async JS mechanisms.

Back-end Interface

The back-end interface of the reference WeaveDrive implementation is represented by a JS module with the following signatures:

WeaveDrive.open(filename);
WeaveDrive.read(file_desc, memory_loc, length);

Relying only on these core methods, the WeaveDrive reference implementation in AOS offers a modified standard library that exposes all data available to the process to any client application via the standard means (fopen, fread, etc.).

In order to minimize necessary retrieval from the Arweave network, the reference WeaveDrive implementation uses a lazy 'download-on-read' approach to data access. No bytes of the target data are downloaded from the network upon weavedrive_open of the file, and only the necessary bytes are fetched when they are requested.

The reference implementation additionally supports an optional (configurable by the host CU) read-ahead cache, which can minimize the total number of necessary network transfers by batching reads together.
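
The download-on-read and read-ahead behaviour described above can be sketched roughly as follows. This is not the reference implementation: the `LazyFile` class, the `fetchRange(id, start, end)` callback (standing in for a ranged network request), and the synchronous style are all assumptions chosen to keep the example small.

```javascript
// Hypothetical lazy reader (not the reference implementation). `fetchRange`
// stands in for a ranged network request returning bytes [start, end).
class LazyFile {
  constructor(id, size, fetchRange, readAhead = 0) {
    this.id = id;
    this.size = size;
    this.fetchRange = fetchRange;
    this.readAhead = readAhead; // extra bytes to fetch beyond each request
    this.position = 0;
    this.cacheStart = 0;
    this.cache = new Uint8Array(0); // nothing is downloaded on open
  }

  read(length) {
    const start = this.position;
    const end = Math.min(start + length, this.size);
    if (end <= start) return new Uint8Array(0);
    const cacheEnd = this.cacheStart + this.cache.length;
    // Fetch only if the requested range is not fully covered by the cache.
    if (start < this.cacheStart || end > cacheEnd) {
      const fetchEnd = Math.min(end + this.readAhead, this.size);
      this.cache = this.fetchRange(this.id, start, fetchEnd);
      this.cacheStart = start;
    }
    const offset = start - this.cacheStart;
    this.position = end;
    return this.cache.slice(offset, offset + (end - start));
  }
}

// Example: 100 bytes of stand-in "remote" data and a request counter.
const remote = Uint8Array.from({ length: 100 }, (_, i) => i);
let requests = 0;
const fetchRange = (_id, start, end) => {
  requests += 1;
  return remote.slice(start, end);
};

const file = new LazyFile("EXAMPLE_TXID", remote.length, fetchRange, 32);
const first = file.read(8); // triggers one ranged request for bytes 0..39
const second = file.read(8); // served from the read-ahead cache
```

With a read-ahead of 32 bytes, the two 8-byte reads above result in a single simulated network request; with a read-ahead of 0, each read would trigger its own request.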

File-System Representation

All files accessible via WeaveDrive are represented in the following directories: /headers/ID, /data/ID, /block/ID, /tx/ID.

  • Files in the /data directory represent the unencoded raw bytes of their respective Arweave data items.
  • Files in the /headers directory contain a JSON encoding of the metadata relating to each data item.
  • Files in the /block directory contain a JSON encoding of block information for the transaction.
  • The /tx directory exposes a JSON encoding of the headers of a layer 1 transaction.

Notably, in order to avoid bloating the WeaveDrive virtual file system and causing unnecessary encumbrances upon drive initialization, WeaveDrive does not offer a native way to list the full contents of any of these directories.
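
Because directories cannot be listed, a client always addresses a file by its full path. The sketch below shows one way such a path could be split into its directory and ID; `parseWeaveDrivePath` is a hypothetical name, and the rule that an ID contains no further slashes is an assumption of this example.

```javascript
// The four directories named in this section.
const DIRECTORIES = ["headers", "data", "block", "tx"];

// Hypothetical parser (not part of the spec): splits a WeaveDrive path such as
// "/data/ID" into its directory and ID, or returns null if it does not match.
function parseWeaveDrivePath(path) {
  const match = /^\/([a-z]+)\/([^/]+)$/.exec(path);
  if (!match || !DIRECTORIES.includes(match[1])) return null;
  return { directory: match[1], id: match[2] };
}

const parsed = parseWeaveDrivePath("/data/EXAMPLE_TXID");
```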

Process Boot Loader via WeaveDrive

When a Process is spawned, it can specify an On-Boot tag with the value "Data" or a TXID. If the value is a TXID, WeaveDrive is used to download the transaction, and the value of the transaction's Data field is evaluated as a startup script for the Process. When the Process starts up, WeaveDrive allows it to attempt to read the transaction that matches the On-Boot ID. The boot loader uses the /data/ID path to read the transaction.
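
The On-Boot decision can be sketched as a small resolver. The function name and the returned `kind` labels are hypothetical; the handling of the "Data" value is only marked as a distinct case here, since this section does not describe it further.

```javascript
// Hypothetical resolver (not part of the spec): decides where the boot script
// comes from, based on the On-Boot tag of the spawned Process.
function resolveBootSource(tags) {
  const tag = tags.find((t) => t.name === "On-Boot");
  if (!tag) return { kind: "none" };
  // The "Data" value is kept as a separate case; its handling is not shown.
  if (tag.value === "Data") return { kind: "data" };
  // Any other value is treated as a TXID and read through the /data directory.
  return { kind: "weavedrive", path: `/data/${tag.value}` };
}

const bootSource = resolveBootSource([
  { name: "On-Boot", value: "EXAMPLE_BOOT_TXID" },
]);
```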

HALT Signals and Non-Deterministic Behavior

Loading data over a network can cause non-deterministic behavior. WeaveDrive provides a deterministic API to handle this non-determinism. When loading txs, each tx has two possible states: existent and non-existent. In principle, a WeaveDrive call for a non-existent tx could return nothing. If that tx later came into existence, a replay of the same message would produce a different result (non-determinism). To prevent this, when a process attempts to access a non-existent tx via WeaveDrive, a HALT signal is raised that tells the CU to stop evaluating further messages sent to the process until the tx can be resolved. Since a tx can never go from existent to non-existent, this is sufficient to ensure determinism in WeaveDrive.
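
The effect of the HALT signal can be illustrated with a toy evaluator. Everything here is an assumption made for the example: the `HaltSignal` class, the `Map` standing in for the network, and the `evaluate` loop are not how a real CU is structured, they only show why halting keeps replays consistent.

```javascript
// Hypothetical signal type (not part of the spec) raised for a missing tx.
class HaltSignal extends Error {
  constructor(txid) {
    super(`HALT: tx ${txid} could not be resolved`);
    this.name = "HaltSignal";
    this.txid = txid;
  }
}

// `store` is a Map standing in for the network: txid -> data.
function openTx(store, txid) {
  if (!store.has(txid)) throw new HaltSignal(txid);
  return store.get(txid);
}

// Evaluates messages in order; stops at the first HALT instead of skipping
// the message, and reports the index at which evaluation must resume.
function evaluate(messages, store) {
  const results = [];
  for (let i = 0; i < messages.length; i++) {
    try {
      results.push(openTx(store, messages[i].txid));
    } catch (err) {
      if (err instanceof HaltSignal) return { results, haltedAt: i };
      throw err;
    }
  }
  return { results, haltedAt: null };
}

// Example: tx "B" does not exist yet, so evaluation halts at message 1.
const store = new Map([["A", "a"]]);
const messages = [{ txid: "A" }, { txid: "B" }, { txid: "A" }];
const halted = evaluate(messages, store);

// Once "B" exists, a replay completes and agrees with the earlier prefix.
store.set("B", "b");
const replayed = evaluate(messages, store);
```

Because the first run produced no result at all for the message that touched "B", there is nothing for the replay to contradict once the tx exists.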