# AOP-5: WeaveDrive

**Status:** Draft-1
**Authors:** Sam Williams, Tom Wilson

Arweave is a permanent hard drive. AO, a decentralized supercomputer, lives on top of it. This AOP adds virtual file system support to pass (large) data read from Arweave directly into AO processes efficiently.

## Data Protocol

In order to use WeaveDrive, the AO Module or Process (following the main AO data protocol specification) should be launched with the following additional tags:

> NOTE: To fully understand the AO specification, it is recommended to read the published spec at https://ao.arweave.net/#/read

| Tag Name | Tag Value(s) | Quantity |
| --- | --- | --- |
| Extension | WeaveDrive | 1 |
| Variant | weavedrive.1 | 1 |
| Availability-Type | {Assigned\|Individual\|Library} | {0-2} |
| Attestor | {AttestorWalletID} | {0-n} |

By default, a process using WeaveDrive may only access data that has had its availability attested to by its Scheduling Unit (SU). Adding `Attestor` tags allows the process to make other data whose availability has been attested to also available.

The `Availability-Type` tags confer the following meanings:

- **`Assigned` or no `Availability-Type` tag given**: The process can load _only_ data that has previously been assigned to it, including the present message it is processing.
- **`Individual`**: The process must be able to load data items that are provided as individual _tags_ on `Available` items, as well as _any_ data item that its SU or a specified attestor has previously issued `Assignment` data items for (on the calling process or otherwise).
- **`Library`**: The process must be able to load data items that are given in the data body of an `Available` data item, as well as the items given by `Individual`.

### `Available` data items

In order for the execution of processes in AO to be deterministic, we need their inputs to be consistently ordered and always available. AO uses Arweave to achieve the latter requirement, permanently storing the data that is used by processes on a shared, immutable ledger.

Arweave does not, however, attest to the _seeding_ of data on its own. If users want to pay for data storage on Arweave and keep the data private, the protocol will not stop them. Subsequently, the _availability_ (public distribution and upload to Arweave) of AO messages is attested to by the SU for each process. The behaviour of this SU is then regulated by the AO staking mechanics.

WeaveDrive adds an additional way for SUs to signal and attest to the availability of data, which can then be used by their client processes. This mechanism comes in the form of Arweave data items that express that an ID is `Available`. These items are represented as follows:

| Tag Name | Tag Value(s) | Quantity |
| --- | --- | --- |
| Data-Protocol | WeaveDrive | 1 |
| Variant | WeaveDrive.tn.1 | 1 |
| Type | Available | 1 |
| Available | {TXID} | {0-n} |
| Data | {TXID[0-n]} | {0-1} |
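As a non-normative illustration, an `Available` item that attests to two data items via individual `Available` tags might carry the following tags, where `<TXID-1>` and `<TXID-2>` are placeholders for real Arweave transaction IDs:

```
Data-Protocol: WeaveDrive
Variant: WeaveDrive.tn.1
Type: Available
Available: <TXID-1>
Available: <TXID-2>
```

A `Library`-style attestation would instead place the list of attested TXIDs in the data body of the item, per the `Data` row in the table above.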
## Data Delivery ABI

The WeaveDrive extension driver is injected by the CU into the runtime environment of a process during a message. Version 1 of WeaveDrive's ABI only supports the `WASM64-unknown-unknown` virtual machine type.

The core components of the WeaveDrive extension are as follows:

### Front-end Interface

The 'front-end' of the WeaveDrive extension is exposed via two functions with the following C function signatures:

```
weavedrive_open(const char* c_filename, const char* mode) => int file_descriptor
weavedrive_read(int fd, int *dst_ptr, size_t length) => int bytes_read
```

In their explicit WASM64 form, these functions are represented as follows:

```
weavedrive_open(i64, i64) -> i32
weavedrive_read(i32, i64, i64) -> i32
```

In the reference implementation, these functions make calls to the WeaveDrive driver module on the host via Emscripten's `async JS` mechanisms.

### Back-end Interface

The back-end interface of the reference WeaveDrive implementation is represented by a JS module with the following signatures:

```
WeaveDrive.open(filename);
WeaveDrive.read(file_desc, memory_loc, length);
```

Relying on only these core methods, the WeaveDrive reference implementation in AOS offers a modified standard library that makes all data available to the process accessible by any client application via the standard means (`fopen`, `fread`, etc).

In order to minimize necessary retrieval from the Arweave network, the reference WeaveDrive implementation uses a lazy 'download-on-read' approach to data access. No bytes of the target data are downloaded from the network upon `weavedrive_open` of the file; only the necessary bytes are fetched when they are requested. The reference implementation additionally supports an optional read-ahead cache (configurable by the host CU), which can minimize the total number of necessary network transfers by batching reads together.

### File-System Representation

All files accessible via WeaveDrive are represented in two directories: `/headers/ID` and `/data/ID`. Files in the `/data` directory contain the unencoded raw bytes of their respective Arweave data items. Files in the `/headers` directory contain a JSON encoding of the metadata relating to each data item.

Notably, in order to avoid bloating the WeaveDrive virtual file system and causing unnecessary encumbrances upon drive initialization, WeaveDrive does not offer a native way to list the full contents of either directory.
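To illustrate how these pieces fit together, below is a minimal, non-normative sketch of a client program, assuming the modified standard library described above routes `fopen`/`fread` through `weavedrive_open`/`weavedrive_read`. The `<ID>` path components are placeholders for an attested Arweave data item ID:

```
#include <stdio.h>

int main(void) {
    /* Raw bytes of a data item live at /data/<ID>; substitute an attested ID. */
    FILE *data = fopen("/data/<ID>", "rb");
    if (data == NULL) {
        fprintf(stderr, "data item not available to this process\n");
        return 1;
    }

    /* Opening fetches nothing; this read lazily downloads only the bytes it needs. */
    char buf[1024];
    size_t n = fread(buf, 1, sizeof buf, data);
    printf("read %zu bytes of item data\n", n);
    fclose(data);

    /* JSON-encoded metadata for the same item is exposed at /headers/<ID>. */
    FILE *headers = fopen("/headers/<ID>", "rb");
    if (headers != NULL) {
        n = fread(buf, 1, sizeof buf, headers);
        printf("read %zu bytes of item metadata\n", n);
        fclose(headers);
    }

    return 0;
}
```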