AGORA PoC v0.2 | Agora-System

[Agora-System](/sqedw7tHRye1RoAt5O108A) --- [API](/TreDITwVTiK_KRemt9QOLA) --- [Message Sequences](/AUbhB5KxSb2jQJBloKtC1g) --- [Asset (Proposal)](/cICGuug-Tde1sGuvpsBAJQ) --- [PoC Definition](/aSqmHZ4uRW-iC9N_Pesd5g) --- [Old Stuff](/B4Nn8Zn9RVyGJDk4bHYHbw) --- [Demo](/rAv27FvBQXmv_rLpKQB25w)

# AGORA PoC v0.2 | Agora-System

## Table of Contents

[toc]

## Preface

## What is a Tech-Related Ecosystem?

+ A tech-related ecosystem (ecosystem for short) is a set of digital systems, such as DBMSes, interconnected to achieve common goals defined by organizations or individuals.

A *resource* (data and/or computation) Ecosystem is a set of socio-technical systems that are self-organizing and loosely coupled to achieve common goals defined by organizations or individuals. It consists of

+ Platforms (e.g. Agora)
+ Assets (Flink, DBMS, File Server)
    + Data Assets
    + Computation Assets
    + Pipeline Assets
+ Interconnection of Assets

Sources:
+ https://opencoesione.gov.it/media/files/esperienza-di-riuso-23/data_ecosystems_linking_transparency.pdf
+ https://christophgroeger.de/download/Groeger_There_Is_No_AI_Without_Data.pdf

## What is an Asset?

An asset is everything that can be interacted with. (Everything that we can write an API for) E.g.:

+ Datasets
+ Download Data
+ Run Algorithm
+ Download algorithm
+ Run Algorithm
+ Clusters
+ send data
+ receive parsed data
+ Website
+ Publish results

**Note:** Components are not considered an Asset due to the nature of the operations required to perform ***Data Science Tasks***, while components provide the infrastructure needed to get all the benefits from Agora as a platform.

## What is Agora?

Agora is the platform (i.e. set of components) enabling organizations or individuals to create a new ecosystem or manage their socio-technical systems within an existing ecosystem. It therefore is a distributed, decentralized application based on the exchange of messages.

## Agora Design Principles

- Eventual consistency in the system.
For example, the registrations in upper level components, or the index at the Search Engine.
- Decoupling of actors
- Basic implementations provided
- Programming Language Agnostic API (REST)
    + side note: internally, Spring Boot is used in some places
- Blueprints to extend base implementation
    - we should create some

## Creating an Agora Ecosystem

### Seed Ecosystem

Just the Keeper

### Asset Ecosystem

Keeper, Marketplace, AssetManager

### Full Ecosystem

Keeper, Marketplace, SearchEngine, AssetManager, ExecutionManager, NodeExecutor

## What are the Agora Components?

An Agora Component is a role an organization or individual can play within an ecosystem at different levels.

+ In Agora, levels are defined as a hierarchical construct from top to bottom

### Level 0

#### Keeper

+ responsible for creating, maintaining, and deleting an Agora Ecosystem.
+ its role is almost solely administrative
+ note: only a logical Agora ecosystem can be removed. Once the ecosystem has been established, L1 components and below are able to maintain their function without the keeper, but they may be limited in functionality such as discovery of further components
+ logical keepers should be eventually consistent, at least with respect to their list of other logical keepers
    + this means: a physical keeper does *not* need to know every other L1-Component *eventually*, but every logical keeper needs to know every L1-Component *eventually*
+ logical keepers have not been implemented yet

### Level 1

#### Marketplace

+ responsible for maintaining Asset Managers and Execution Managers
+ only fetches final Metadata
+ responsible for User Management
+ responsible for Monetary Transaction management

#### Search Engine

+ responsible for maintaining an index of the content within a set of marketplaces, meaning the Asset Managers, Assets, Execution Managers, and Node Executors.

:::danger
The search engine follows the "eventual consistency" principle for the internal index of the base-implementation.
:::

:::warning
The eventual consistency has been implemented by adding a RW lock in the storage. Therefore, a couple of wrong entries may appear while the indexes are being updated, but empty results are avoided. When extending the Search Engine, the periodicity of the updates must be taken into account. If performance is really crucial, the implementation should either use a better indexing tool that handles the locking better, or implement a more fine-grained (or unsafe) storage.
:::

:::info
Note from the Akka PoC, to take the scalability issue into account: in the case of many assets, the Search Engine used an Akka capability to spawn multiple actors for each keyword.
+ [name=gereon] is this still relevant? [name=Ricardo] Not for the current design. The responsibility is delegated to each implementation, i.e. if someone implements a SE in Akka, it should handle that organization internally.
:::

### Level 2

#### Asset Manager

+ responsible for registering, maintaining, and unregistering a set of assets
+ interacts with the Marketplace w/rt Billing
+ provides a list of registered assets (metadata)
+ provides a list of interaction means with an asset

#### Execution Manager

+ responsible for maintaining node executors and translating an Agora Query into an Agora Execution Query. This includes:
    + checking the availability of every requested asset
    + checking the availability of every requested node executor / execution platform-combination

![](https://i.imgur.com/i7po5Ud.png)

### Level 3

#### Node Executor

+ responsible for registering, maintaining, and unregistering execution assets exclusively to perform Agora Execution Queries, including their translation to the underlying execution asset's API.

![](https://i.imgur.com/FBN1ZZI.png)

### Primitive

#### Asset

Interactions with an Asset are:

- Register
- Use (without using the Agora Execution part)
    - e.g.
Sending a user-defined SQL query to a DB
- Use (by using ExecutionManager and NodeExecutor)
    - fetch data
    - run query on it
- Cannot send billing requests by itself. It always needs the asset manager or node executor for that

##### Data

- Fetch data
- Including
    - Code
    - Datasets (e.g. JSON, XML)
    - Maybe Databases (Read)

##### Storage

+ Fetch Data from
+ Send Data to
+ Including
    + Object Stores (e.g. Minio, S3)
    + Maybe Databases (Read + Write)

##### Pipeline

+ An Agora Query that will be executed when used by another asset
+ it's a shortcut

> Essentially, this is just a specialized storage asset

##### PaaS (ExecutionPlatform)

+ Provide credentials for running a user-defined query
+ e.g. a Flink cluster

> ExecutionPlatforms are a subset of NodeExecutors. This means that if someone wants to add an ExecutionPlatform, there needs to exist a NodeExecutor that is linked to it.

> Provide access to the user into the platform to use it without the benefits of Agora Query. Traditional PaaS. (Easy entry point for users)

> An ExecutionPlatform may not offer credentials for PaaS-access, but it has to offer the interfaces required by the NodeExecutor-specification

> Further information can be found in [AGORA Asset Documentation (Proposal)](/cICGuug-Tde1sGuvpsBAJQ)

### Execution Primitives (refine the name)

+ Agora Query = SQL Query
+ Agora Execution Query = Physical Plan

:::warning
**NOTE:** define what an Asset, Agora Query, and Agora Execution Query are.
:::

## What is a component / an Actor?

Assets are not Components.

### Logical and Physical Components

Agora is defined on logical components, but implementations can choose to spread one logical component across multiple physical components. Physical components should be eager to achieve eventual consistency.

### Component Persistence

Components may die, for different reasons. It is therefore necessary to account for this and add persistence.
This can look like the following:

+ if a child component registers itself, its config is signed by the parent component, issuing a token. If a child component then needs to restart and re-register (this does not always need to be the case), it is able to re-register itself to the parent component by providing said token on registration. The parent component then changes the held information according to the new config provided.

:::warning
**TODO**
:::

### Component Trust

Trust from child components towards parent components initially is mutual. It may occur, though, that a parent component needs to restart and change its config. There should be some signing mechanism to allow the child component to re-establish trust in the parent component.

### Component Metadata

A component has two types of metadata: mutable and final (immutable). Components always need to have final metadata, at least:

+ UUID (provided by self)
+ List of Canonical IDs (set by parents, can be empty on registration)
+ URL (a Domain, not an IP address)
+ Name (provided by self)
+ Type (provided by self)
+ Organization (provided by self)
+ Jurisdiction (provided by self)
+ Description (provided by self)

Components may also have mutable metadata. This metadata is component-specific. Examples are:

+ accessible Publicly / not Publicly

Components may also have a status, like component health. This is not considered metadata in this definition.

If components take registrations from other components, they may only use final metadata. If they actively search for other components, they may use final as well as mutable metadata.

## Notifications between Components

Components usually communicate using a pull-based approach (e.g. by polling another component on a regular basis). It is possible, though, that a component *also* offers an endpoint which lets other components register themselves to receive updates as they occur. At the moment, this is a theoretical concept.
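To make the difference between the two communication styles concrete, the following is a minimal in-memory sketch, not part of Agora. All names in it (`NotifyingComponent`, `poll`, `subscribe`, `publish`, the event dictionaries) are hypothetical and chosen for illustration only; a real component would expose these operations through its REST API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: Agora does not define these names.
Update = dict
Callback = Callable[[Update], None]


@dataclass
class NotifyingComponent:
    """A component that supports polling and, optionally, push subscriptions."""
    name: str
    _updates: List[Update] = field(default_factory=list)
    _subscribers: Dict[str, Callback] = field(default_factory=dict)

    def poll(self, since: int = 0) -> List[Update]:
        """Pull-based access: return all updates from index `since` onward."""
        return self._updates[since:]

    def subscribe(self, subscriber_id: str, callback: Callback) -> None:
        """Push-based access: register a callback invoked on each new update."""
        self._subscribers[subscriber_id] = callback

    def unsubscribe(self, subscriber_id: str) -> None:
        """Remove a subscriber; unknown ids are ignored."""
        self._subscribers.pop(subscriber_id, None)

    def publish(self, update: Update) -> None:
        """Record an update and forward it to every registered subscriber."""
        self._updates.append(update)
        for callback in list(self._subscribers.values()):
            callback(update)


# A marketplace publishes two updates; the search engine is only
# subscribed while the first one occurs and polls for the second.
marketplace = NotifyingComponent("marketplace")
received: List[Update] = []
marketplace.subscribe("search-engine", received.append)
marketplace.publish({"event": "asset-manager-registered", "id": "am-1"})
marketplace.unsubscribe("search-engine")
marketplace.publish({"event": "asset-manager-removed", "id": "am-1"})
polled = marketplace.poll(since=1)
```

In this sketch the subscriber receives the first update immediately, while the second one is only visible to it through polling. Keeping the pull path available even when subscriptions exist matches the description above, where the registration endpoint is an addition to polling and not a replacement for it.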
This notification mechanism has not been implemented yet.

## *Extensibility for Plug-ins (Recommender System)*

:::danger
**TODO - Sergey's use case**
:::

## What is the Agora Network

In Agora, components are defined as nodes in a graph, where the edges are the registrations in a higher level component. For example, a Marketplace is registered in one or more Keepers.

### Unique ID

This ID identifies a single component at the component it registered itself at. The UID is communicated to the component by the component it registered at. In its most basic form, it may only be the name of the component. If a component registers at multiple higher level components, it may receive multiple UIDs. It is legitimate, though, that a higher level component may reuse the UID given by another higher level component. For all intents and purposes, those identical UIDs should still be considered different UIDs.

:::danger
**DONE** Discuss which type of ID could be more suitable for the Agora Ecosystem, whether UUID or some other spec. One idea is to follow a short UUID similar to Github/Commit. Persistence of the Unique ID could come from the higher level component and be stored in the config file, so that when the component restarts the structure of the network can be restored (stateful). The rationale behind this lies in the fact that some components can be part of "pipelines" or "composed assets".
:::

### Canonical ID

The Canonical ID is a representation of the hierarchy, where the keeper is the root followed by the chain of components. Therefore one component may have more than one construct of the Canonical ID. Each segment of a Canonical ID is a Unique ID.

:::danger
**TODO** Probably names separated by slashes
:::

### Actors in the Network

Within the network, there are different actors that interact with each other.

#### External

There exist **users** that want to interact with the network by e.g. offering assets, or using assets in a pipeline. To facilitate governance of assets, every asset is managed by exactly one **organization**.
Every user can be part of multiple organizations, including one that is linked only to that user, which may be created automatically.

:::warning
+ Should there be hierarchies within organizations?
:::

+ CKAN: http://docs.ckan.org/en/2.9/user-guide.html#users-organizations-and-authorization

#### Internal

Internal Actors are the components the network consists of.

## Shaping the Network

### Banning Components

Components can be banned by higher-level components, to prevent misuse or similar. This is done by giving the higher-level component a set of rules that define which lower-level components it accepts within its perimeter.

Banning another component tends to not have an impact beyond removing the component from listings. This is because the repercussions against other lower level components are rather weak. So another component *can* also apply the ban rules set out by a higher level component, but if it doesn't, the hardest repercussion would be that the higher level component itself bans the disobeying component.

Rulesets are inspired by:
+ https://doc.traefik.io/traefik/routing/routers/#rule

## What is an Agora Ecosystem?

An Agora Ecosystem is a non-empty set of Agora Keepers, having the same functionalities, maintaining an ecosystem along with other Agora Components.

## Component Listing

+ We cannot prevent a higher level component from independently discovering and listing a lower level component
+ we may need to deal with implications regarding Billing structure etc. later on

:::warning
is this important?
:::

## Component / Governance Diagram

```plantuml
[Client]
cloud "Ecosystem" {
    [Keeper]
    [Marketplace]
    package "AssetManager" {
        [Data]
        [ExecutionPlatform]
        [Pipeline]
        [Storage]
    }
    [ExecutionManager]
    collections NodeExecutor
    [SearchEngine]
}
Keeper --> Marketplace
Keeper --> SearchEngine
Marketplace --> AssetManager
Marketplace --> ExecutionManager
ExecutionManager --> NodeExecutor
NodeExecutor -[dashed]- ExecutionPlatform
AssetManager -[hidden]- ExecutionManager
ExecutionPlatform -[hidden]up- Data
Client --> Ecosystem
```

:::danger
**Some parts of this document have been moved:**
+ [AGORA PoC v0.2 | Agora-System](/sqedw7tHRye1RoAt5O108A)
+ [AGORA PoC v0.2 | API](/TreDITwVTiK_KRemt9QOLA)
+ [AGORA PoC v0.2 | Message Sequences](/AUbhB5KxSb2jQJBloKtC1g)
+ [AGORA PoC v0.2 | Asset (Proposal)](/cICGuug-Tde1sGuvpsBAJQ)
+ [AGORA PoC v0.2 | Proof of Concept Definition](/aSqmHZ4uRW-iC9N_Pesd5g)
+ [AGORA PoC v0.2 | Old Stuff](/B4Nn8Zn9RVyGJDk4bHYHbw)
:::

TO DISCUSS
- User Administration -> OpenID?
- Initially Marketplace could