AGORA PoC v0.2 | Asset

[Agora-System](/sqedw7tHRye1RoAt5O108A) --- [API](/TreDITwVTiK_KRemt9QOLA) --- [Message Sequences](/AUbhB5KxSb2jQJBloKtC1g) --- [Asset (Proposal)](/cICGuug-Tde1sGuvpsBAJQ) --- [PoC Definition](/aSqmHZ4uRW-iC9N_Pesd5g) --- [Old Stuff](/B4Nn8Zn9RVyGJDk4bHYHbw) --- [Demo](/rAv27FvBQXmv_rLpKQB25w)

# AGORA PoC v0.2 | Asset

[toc]

## Asset Metadata

An AGORA Asset is described by its metadata, which contains generic attributes and type-specific attributes. Inspired by asset-oriented projects ([Snowflake](https://docs.snowflake.com/en/sql-reference/info-schema.html), [Ocean Protocol](https://github.com/oceanprotocol-archive/OEPs/blob/master/8/v0.3/README.md#file-attributes)) and public domain repositories (Schema.ORG), the AGORA Asset Metadata consists of attributes that make an Asset discoverable and visible within the AGORA Network.

Note that the data of an asset does not need to live within the AGORA network. For example, a dataset offered by a governmental institution can be linked from an external source. In that case the data itself stays on the servers of the institution, while the metadata file is part of the AGORA network.

Each Metadata type should be described by a language-agnostic schema. JSON and YAML are two popular options; JSON is preferred because the Agora Core Framework has further support for that format. The schema can be found at [Add link]()

**Asset Types**

* GENERIC
* DATASET
* OPERATOR
* PIPELINE
* STORAGE
* EXECUTION
* COMPUTE

### Generic Attributes

The generic attributes are described for all Assets regardless of their nature. In a first version, this will allow the components to be tested and further functionality to be implemented.

| Attribute | Type | Description | Required |
|--|--|--|--|
|`id`\*|String|The unique ID for the Asset within an Agora Network|Yes|
|`type`|String|One of the valid Asset types in Agora, for example *DataSet*|Yes|
|`name`|String|A given and descriptive name|Yes|
|`url`|String|Qualified URL|Yes|
|`description`|String|Human readable description|*(Optional)*|
|`tags`|Array of String|Array of keywords used to describe the Asset|*(Optional)*|
|`datePublished`\*\*|long|Date of first broadcast/publication, expressed as Unix epoch in milliseconds|Yes|
|`license`|String|Short name referencing the license of the asset (e.g. Public Domain, CC-0, CC-BY, No License Specified). If it is not specified, the value "No License Specified" will be added.|Yes|
|`restrictions`\*\*\*|Collection of AgoraID and the associated rule|Either a whitelisted or blacklisted set of Agora components allowed or denied to use the Asset|*(Optional)*|
|`txLog`\*\*\*|Array of String|An array of strings with the transaction log. Each entry includes a timestamp and transaction details, such as the operation and the components involved.|Yes|

> \* The type of ID to use for `id` is subject to discussion, as it might be optimized depending on the assumptions and conditions.
>
> \*\* `datePublished` is subject to discussion, as another format could be more convenient.
>
> \*\*\* Visionary feature, not mandatory in initial implementations.

:::warning
Customized K/V for the Generic Metadata?
:::

### Asset of Type Dataset

In the broad sense, a dataset is a collection of information with a defined format, encoding, and structure. In the context of Agora, a dataset is used or consumed by granting access to the resource that contains it and allowing the transfer of its contents. Initial examples of this Asset type are CSV files, results from SELECT queries, and JSON files. Each subtype of Dataset will require its own mechanism for being used, which should be defined independently of the metadata.

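To make the generic attributes table concrete for a Dataset, the following Python sketch builds a generic metadata record and checks that the required attributes are present. It is only an illustration under stated assumptions: the helper names (`make_generic_metadata`, `missing_generic_attributes`) are hypothetical, the `id` is generated as a UUID although the ID format is still open, the `type` values use the upper-case spelling from the Asset Types list, and the example URL is a placeholder.

```python
import json
import time
import uuid

# Required generic attributes, as marked "Yes" in the generic attributes table.
REQUIRED_GENERIC = ("id", "type", "name", "url", "datePublished", "license", "txLog")
# Asset types as listed in this document (upper-case spelling is an assumption).
ASSET_TYPES = {"GENERIC", "DATASET", "OPERATOR", "PIPELINE", "STORAGE", "EXECUTION", "COMPUTE"}
DEFAULT_LICENSE = "No License Specified"


def make_generic_metadata(asset_type, name, url, license_name=None,
                          description=None, tags=None):
    """Build a generic metadata record (hypothetical helper, not an Agora API)."""
    if asset_type not in ASSET_TYPES:
        raise ValueError(f"unknown asset type: {asset_type}")
    metadata = {
        "id": str(uuid.uuid4()),                   # assumption: UUID; ID type is undecided
        "type": asset_type,
        "name": name,
        "url": url,
        "datePublished": int(time.time() * 1000),  # Unix epoch in milliseconds
        "license": license_name or DEFAULT_LICENSE,
        "txLog": [],                               # no transactions recorded yet
    }
    # Optional attributes are only emitted when provided.
    if description is not None:
        metadata["description"] = description
    if tags:
        metadata["tags"] = list(tags)
    return metadata


def missing_generic_attributes(metadata):
    """Return the required generic attributes that are absent from a record."""
    return [key for key in REQUIRED_GENERIC if key not in metadata]


record = make_generic_metadata(
    "DATASET", "City air quality", "https://example.org/air-quality.csv",
    tags=["air", "environment"],
)
print(json.dumps(record, indent=2))
```

Because no license is passed, the record carries the default value "No License Specified", matching the rule in the `license` row.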
The Dataset type-specific attributes are initially suggested as follows.

| Attribute | Type | Description | Required |
|--|--|--|--|
|`format`|String|Format of the asset, e.g. CSV, JSON|Yes|
|`encoding`|String|Encoding of the *DataSet* (e.g. UTF-8)|Yes|
|`fields`|Array of field description objects||Yes|

**Field Description Object**

| Attribute | Type | Description | Required |
|--|--|--|--|
|`name`|String|Name of the field|Yes|
|`datatype`|String|Data type of the field|Yes|
|`description`|String|Human readable or ontology reference description of the field|*(Optional)*|

:::warning
Subject to discussion
- Sample data, as in [CKAN Metadata](https://ckan.org/features/metadata)
- Owner field
- Case of 2 Datasets with different Licences
- Size of the Dataset
:::

:::info
If two datasets are joined, is the result considered a new Dataset? How could this work?
:::

### Asset of Type Operator

In the Agora context, an Operator is a gray box with expected input(s), parameter(s), and output(s).

### Asset of Type Pipeline

In the Agora context, a Pipeline is a sequence of steps, where each step has its own input(s), operator, and output(s). The pipeline is user-defined and, in contrast with the algorithm, it can be composed of other Assets and its definition is Agora-specific.

### Asset of Type Storage

In the Agora context, Storage could be represented as an Object Storage that allows write and read operations over massive amounts of data, including images, videos, or files, to mention a few formats. It would be desirable to provide an interface for already existing Storage Systems, such as the one used by OpenML (MinIO).

### Asset of Type Execution

In the Agora context, an Execution Engine, Compute Platform, or Execution Asset is any software stack that processes data. Examples include a DBMS, a Flink Cluster, a Spark Cluster, a Python Environment, or a Java Environment, to mention a few.

From the Asset Manager perspective, the main responsibility is to provide Metadata and Access to the underlying platform.

### Asset of Type Compute

In the Agora context, a Compute Asset is a set of compute resources with a minimal software stack, giving the end user the flexibility to freely use whatever software components are required.

:::danger
**NOTE:** The Asset Manager keeps only the state of the registered Assets; however, a mechanism to access the resources is still needed.
:::
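
As a closing illustration of the Dataset type-specific attributes and the Field Description Object defined earlier, the Python sketch below checks a JSON fragment against the required attributes from those two tables. It is a sketch under assumptions rather than the Agora schema: the function name `validate_dataset_attributes`, the sample field names, and the `datatype` values (`string`, `double`) are hypothetical, since the document does not yet fix a vocabulary for data types.

```python
import json

# Required attributes, as marked "Yes" in the Dataset and Field Description tables.
REQUIRED_DATASET = ("format", "encoding", "fields")
REQUIRED_FIELD = ("name", "datatype")


def validate_dataset_attributes(attrs):
    """Return a list of problems found; an empty list means the check passed."""
    problems = [f"missing attribute: {key}" for key in REQUIRED_DATASET if key not in attrs]
    fields = attrs.get("fields", [])
    if not isinstance(fields, list):
        return problems + ["fields must be an array"]
    for index, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"fields[{index}]: must be an object")
            continue
        for key in REQUIRED_FIELD:
            if key not in field:
                problems.append(f"fields[{index}]: missing {key}")
    return problems


# Hypothetical type-specific attributes for a CSV dataset.
example = json.loads("""
{
  "format": "CSV",
  "encoding": "UTF-8",
  "fields": [
    {"name": "station", "datatype": "string", "description": "Measuring station"},
    {"name": "pm10", "datatype": "double"}
  ]
}
""")

print(validate_dataset_attributes(example))
```

A check of this kind only covers the metadata; as the note above states, access to the underlying resource would still need its own mechanism.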