AGORA PoC v0.2 | Old Stuff

# AGORA PoC v0.2 | Old Stuff

[toc]

## What is an Ecosystem?

An Ecosystem consists of the whole Actor Stack.

+ [name=jorge] Is a set of interconnected actors working towards achieving the same objective, sharing assets.

An Ecosystem is a non-empty set of Agora Keepers, having the same responsibilities and functionalities, maintaining a set of Agora Components.

### Public Ecosystem

* [name=ricardo] Remove the

Any entity can join; there exists no controlling organization. This requires consensus. Therefore, this is currently not pursued.

### Private Ecosystem

Any entity or only a limited number of entities may join. There exists a single controlling authority, acting as the deciding authority.

### Component

A piece of software that can interact following the Agora API (or protocol) within an Ecosystem.

### Asset

For Agora, an Asset is any resource that could be used to perform a Data Science task, such as Datasets, Models, etc.

## Component Diagram

```plantuml
[Client]

cloud "Ecosystem" {
    [Keeper]
    [Marketplace]
    package "AssetManager" {
        [Storage]
        [Execution]
        [Pipeline]
    }
    [ExecutionManager]
    collections NodeExecutor
    [SearchEngine]
}

Keeper --> Marketplace
Keeper --> SearchEngine
Marketplace --> AssetManager
Marketplace --> ExecutionManager
ExecutionManager --> NodeExecutor
Client --> Ecosystem
```

## Actor Types

### Keeper

Acts as a Registry. As soon as a Keeper exists, an Ecosystem is created. A shutdown of the ecosystem is done by killing all Keepers. This creates a Shadow ecosystem, where all other components can still interact with each other if they know each other's URI, but the ecosystem won't be able to grow anymore. All Keepers are equal.
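The Keeper lifecycle described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, the `alive` flag, and the methods `register_marketplace`, `can_grow`, `is_shadow`, and `shutdown` are hypothetical and are not taken from the Agora prototype.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Keeper:
    """Hypothetical in-memory Keeper; field and method names are illustrative."""

    uri: str
    marketplaces: Set[str] = field(default_factory=set)
    alive: bool = True

    def register_marketplace(self, marketplace_uri: str) -> None:
        # A killed Keeper can no longer accept registrations.
        if not self.alive:
            raise RuntimeError(f"Keeper {self.uri} is down")
        self.marketplaces.add(marketplace_uri)


@dataclass
class Ecosystem:
    """An Ecosystem exists as soon as its first Keeper exists."""

    keepers: List[Keeper]

    def __post_init__(self) -> None:
        if not self.keepers:
            raise ValueError("an Ecosystem needs at least one Keeper")

    def can_grow(self) -> bool:
        # New components can only join through a living Keeper.
        return any(keeper.alive for keeper in self.keepers)

    def is_shadow(self) -> bool:
        # All Keepers killed: components that already know each other's URI
        # may still interact, but nothing new can register.
        return not self.can_grow()

    def shutdown(self) -> None:
        # Shutting down the ecosystem means killing every Keeper.
        for keeper in self.keepers:
            keeper.alive = False
```

Because all Keepers are equal, killing a single Keeper leaves `can_grow()` true as long as any other Keeper is alive; only `shutdown()` turns the ecosystem into a Shadow ecosystem.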
#### Use Cases

+ Acts as a Registration Point for new Marketplaces
+ Can provide a List of Marketplaces
+ Acts as a Registration Point for new SearchEngines
+ Can provide a List of SearchEngines
+ Needs to ask other Keepers for the Keepers they know within the ecosystem
+ Tells other Keepers which Keepers it knows within one ecosystem
+ It can have a Black-List of Marketplaces
+ Has an owner from an Organizational point of view
+ Should eventually know all other Keepers
+ Functionality to define who can play as Keeper.
    + [name=ricardo] *Need of an acceptance interface, implemented as a simple mechanism; the extensibility can be that anyone can implement their own acceptance mechanism, e.g. certificates, SSO*

#### Non-Use-Cases

+ It does not keep track of anything but Marketplaces, but new components (besides Marketplaces) in the system can ask the Keeper for available Marketplaces
+ It does not search for Marketplaces

#### Extended Use-Cases

+ Keeps track of other Ecosystems and may point one of its components towards the Keeper of the other ecosystem
+ Allows for Merging of ecosystems

### ~~Browser~~ :arrow_right: Client

Contains a Frontend and possibly a Backend Component, if Caching etc. is necessary. It is not an Actor, as it may only receive messages as responses to requests it previously sent (request-response pattern). If it contains a Backend, it needs to obey some rules, e.g. regarding Data Protection and Privacy. It is not part of the Ecosystem.

#### Use Cases

+ Provide interaction opportunities to the user
+ Translate user interactions into Messages
+ Send Messages to other components
+ Create unique Request IDs for every Message
+ Add its own URI to every message for response purposes

#### Non-Use-Cases

- Cannot be used within a Pipeline

#### Discussion

+ [name=gereon] From my point of view, the Browser is not really part of the ecosystem, as it only interacts with Actors, but Actors cannot depend on it by definition.
+ [name=gereon] Though it really does benefit from living below a Marketplace
    + but it is always accessed without using canonical IDs?
    + but having a canonical ID would allow easier addressing for sending back messages
+ How can it be addressed?
    + it does not need to be addressed
+ [name=joscha] In the current Akka prototype this Component acts as an API-Gateway. Since that is not necessarily the job it should have in the new design, we should discuss the changes and their impact.

### Marketplace

A Marketplace defines a List of ~~rules~~ properties all assets in its area ~~have to~~ should obey. This includes:

+ Ownership
+ Geographic Location (including esp. applicable Law)
+ Type of assets that can be registered

It keeps track of buyers, sellers, and transactions.

#### Use Cases

+ Register assets (sellers)
+ Can exclude assets from its list
+ Register a usage of an asset as a transaction
+ Register new asset users (buyers)
+ Give information about usages (collect metrics)

#### Discussion

How can the rules be enforced?

+ [name=gereon] According to the concept, it can only give weak guarantees and exempt malicious actors; if they don't obey, excluding them would be the only possible choice

### ~~Asset Provider~~ :arrow_right: Asset Manager

Represents a dataset or access to computation resources like Flink. Can contain multiple Assets. **Assets** are not their own component, but addressable Resources within the Asset Manager.
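To illustrate the idea that Assets are addressable resources inside an Asset Manager rather than components of their own, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Asset` fields, the method names, and in particular the `/assets/<id>` address layout are hypothetical and not defined by the Agora API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Asset:
    """A resource such as a dataset or a model (fields are illustrative)."""

    asset_id: str
    kind: str  # e.g. "dataset" or "computation"
    metadata: Dict[str, str] = field(default_factory=dict)


class AssetManager:
    """Hypothetical Asset Manager: the component that owns and addresses Assets."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self._assets: Dict[str, Asset] = {}

    def add(self, asset: Asset) -> None:
        # Asset IDs only need to be unique within this Asset Manager.
        if asset.asset_id in self._assets:
            raise ValueError(f"duplicate asset id: {asset.asset_id}")
        self._assets[asset.asset_id] = asset

    def get_metadata(self, asset_id: str) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the stored metadata.
        return dict(self._assets[asset_id].metadata)

    def address_of(self, asset_id: str) -> str:
        # Assumed address layout: the Asset is reached through its manager.
        if asset_id not in self._assets:
            raise KeyError(asset_id)
        return f"{self.uri}/assets/{asset_id}"
```

In this sketch only the Asset Manager has a URI of its own; an Asset is reachable solely through the manager that contains it.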
#### Use Cases

+ Send Dataset Metadata
+ Send Dataset
+ Provide Storage
+ allow the

#### Sub-Versions

+ Storage Asset Manager (Provides Data)
+ Pointer Asset Manager (Provides Data)
+ Pipeline Asset Manager
+ Computation Asset Manager

#### Discussion

+ Should there be a generic interface for doing asset-specific actions (write action and data in the payload), or should the asset expose a list of possible actions on the same level as the common asset API (own API Endpoint for each of them)?

### Execution Manager

An Execution Manager allows rather general-purpose jobs to be executed.

![](https://i.imgur.com/QGJiK3P.png)

#### Use Cases

+ Take a data transformation job
+ Distribute it to an Execution Node
+ Asks the Execution Node for status
+ Provides an interface for requests on the status of a job

### Node Executor

Takes care of executing Pipelines.

![](https://i.imgur.com/PwXxwf3.png)

#### Use Cases

+ Translates mini-Pipelines into actual computations
    + e.g. writing a Spark query
+ Moves Data between Assets

#### Non-Use-Cases

+ Doing executions like performing a Flink Job

### Search Engine

Allows looking up other components by means of metadata.

#### Use Cases

+ Fetch List of Marketplaces
+ Fetch List of Assets from Marketplaces
+ Provide Endpoint for sending a Query to

#### Discussion

## Data Types

### Canonical ID

The *Canonical ID* identifies any given Asset Provider (Keeper?, Marketplace?) by using the following structure. It *cannot* identify anything else.

#### Structure

+ Keeper ID
+ Marketplace ID
+ Asset ID

#### Discussion

+ Do specific Node Executors need to be targeted?

## Message Sequences

## System-wide Use Cases

### Starting an Ecosystem

```plantuml
start
:Create 1 Keeper;
split
    :Create more Keepers;
split again
    :Create at least 1 Marketplace;
    :Join Marketplaces to Keeper;
split again
    :Create at least 1 SearchEngine;
    :Join SearchEngines to Keeper;
end split
end
```

1. Create 1 Logical Keeper. This defines an empty Agora Ecosystem.
2. Create at least 1 Marketplace. If a Keeper exists, the Marketplace will register in the Ecosystem; if not, a Marketplace can exist as an Agora Marketplace with limited functionality.
4. Create at least 1 SearchEngine. Must join a Keeper.

## Discussion Corner

### Pipelines / What are Execution Managers?

Where does this component live within the structure? It can do so either on the same level as a Marketplace, below the Marketplace on a level similar to an Asset, or it could very well be an Asset itself.

#### same level as marketplace

Execution Managers are a core component of Agora, and therefore should hold a special position within the component structure.

#### below marketplace, but is not asset

Like a Marketplace, an Execution Manager needs to give guarantees regarding e.g. Data processing. Therefore, these could be inherited from a Marketplace.

#### Execution Managers are assets

As Execution Managers take jobs and may therefore be used like any other asset, such as Flink, they may very well be just another asset. This may lead to difficulties, as it should be possible for other entities to join their own Node Executor to an existing Execution Manager. This does not really fit into the Canonical ID Structure.

#### hybrid

Maybe this is worth a look.

#### Figure: Pipeline

![](https://i.imgur.com/p0DGJnJ.png)

#### Differentiate Components and Assets (Joscha)

+ Proposal to differentiate between Agora Components (like Marketplace, Search Engine, Node Executor -> the actual actors in the system) and Assets like Dataset etc. to avoid confusion.

#### How to implement Component Discovery with HTTP?

Right now we use Akka's Receptionist to make initial contact. How should we implement this with HTTP?
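One possible direction for the question above, offered only as a discussion starter: a Keeper could expose its registry over plain HTTP, so that a component makes initial contact by `POST`ing its own URI and discovers others with a `GET`. The sketch below uses only the Python standard library. The paths `/marketplaces` and `/searchengines`, the JSON shapes, and the helper names `start_keeper`, `register`, and `discover` are assumptions, not part of the Agora API, and the sketch does not reproduce Akka Receptionist features such as subscriptions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Ignore any system-wide proxy so that local calls go straight to the Keeper.
_opener = build_opener(ProxyHandler({}))


class KeeperHandler(BaseHTTPRequestHandler):
    # Shared in-memory registry: component kind -> list of URIs (assumed layout).
    registry = {"marketplaces": [], "searchengines": []}

    def _send_json(self, status, body):
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def _kind(self):
        # "/marketplaces" -> "marketplaces"; unknown paths yield None.
        kind = self.path.strip("/")
        return kind if kind in self.registry else None

    def do_GET(self):
        kind = self._kind()
        if kind is None:
            self._send_json(404, {"error": "unknown component kind"})
        else:
            self._send_json(200, {kind: self.registry[kind]})

    def do_POST(self):
        kind = self._kind()
        if kind is None:
            self._send_json(404, {"error": "unknown component kind"})
            return
        length = int(self.headers.get("Content-Length", 0))
        uri = json.loads(self.rfile.read(length))["uri"]
        if uri not in self.registry[kind]:
            self.registry[kind].append(uri)
        self._send_json(201, {"registered": uri})

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_keeper(port=0):
    """Start a Keeper on localhost in a background thread (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), KeeperHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def register(keeper_url, kind, uri):
    """Initial contact: a component announces its own URI to a Keeper."""
    request = Request(
        f"{keeper_url}/{kind}",
        data=json.dumps({"uri": uri}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request) as response:
        return json.load(response)


def discover(keeper_url, kind):
    """Ask a Keeper for all registered components of one kind."""
    with _opener.open(f"{keeper_url}/{kind}") as response:
        return json.load(response)[kind]
```

With this shape, the only thing a new component needs to know up front is the URI of one Keeper, which is roughly the role the Receptionist plays in the Akka prototype. Whether Keepers should also push updates to each other is left open here.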