# 2025-05-15 Discussion on data sharing

![{DDB63006-1C0F-42F6-86D5-2049C99811D4}](https://hackmd.io/_uploads/ryGDRMXZel.png)

[Meeting recording and transcript](https://teams.microsoft.com/l/meetingrecap?driveId=b%21-swC0KavbEmFbDS_3HYVXWGrE0YzeKZNoQTVjScaln76pCNJRhdiRK71eyMiuZtN&driveItemId=01YOXG5MGUL5NJWLCGSNGKCFD6VAPHBHGD&sitePath=https%3A%2F%2Fdpsich-my.sharepoint.com%2F%3Av%3A%2Fg%2Fpersonal%2Fgiovanni_pizzi_psi_ch%2FEdRfWpssRpNMoRR-qB5wnMMB9oATE98Cqu1NlnL9VsWlMA&fileUrl=https%3A%2F%2Fdpsich-my.sharepoint.com%2F%3Av%3A%2Fg%2Fpersonal%2Fgiovanni_pizzi_psi_ch%2FEdRfWpssRpNMoRR-qB5wnMMB9oATE98Cqu1NlnL9VsWlMA&iCalUid=040000008200E00074C5B7101A82E00800000000CB7BDAE14FBFDB01000000000000000010000000CEB7BD1D0199114981B976E3043D7484&threadId=19%3Ameeting_MjUzZGQ4YzQtNDM3Mi00YmUwLWE5ZDEtYmJhOWQ4OTcyNzFm%40thread.v2&organizerId=fcd49011-dac4-4c01-9bc6-eefdd0021243&tenantId=50f89ee2-f910-47c5-9913-a6ea08928f11&callId=ac56d3fb-627a-4d25-94bf-c68a889c9f4b&threadType=Meeting&meetingType=Scheduled&subType=RecapSharingLink_RecapChiclet)

## Notes regarding Edan's setup

- Via Docker compose:

  | Entity                   | Same?              |
  | ------------------------ | ------------------ |
  | Unix user                | :x:                |
  | AiiDA profile (DB & DOS) | :white_check_mark: |
  | AiiDA user               | :x:                |

  (@all: anything else to consider? The DB user is included in the AiiDA profile.)

- Computer created and configured by Alice. For Bob:
  - Not visible via `verdi computer list`, because the computer is unconfigured for Bob (the Computer configuration is stored via `Authinfo`, in the `db_dbauthinfo` table, which has a user attached).
  - Visible to Bob via `verdi computer list -a`
  - If Bob re-submits a calculation from Alice (e.g., because it failed), this will fail because the computer won't be available
    -> In Docs: Recommend switching the user for this scenario (the alternative is to configure the Computer for Bob in the same way as for Alice, or to provide a simple way to achieve this, e.g., `aiida-project synchronize`)

## Meeting Notes

- Past discussions on the topic: https://drive.google.com/drive/u/0/folders/1AUa_zUM7YnmyNq9jp4D2FtMjQQRU2nvK
- Profile, database, repository -> These are the three main components we need to consider:
  - Profile -> At least DB and repository (but can also be other configuration options)
  - No sharing of UNIX account/permissions
  - PSQL DB: easy using ports
  - Shared file system or S3 for disk-objectstore
  - Possibly manage via AiiDAlab: give users an option to create an account/instance on this shared, hosted "portal"
- Use cases:
  - Share with people outside of the group: lower level of trust
  - Fully shared data access for an instance (mounted DB/DOS): requires full read/write access (to DB and file system)
  - Push/pull mechanism: still requires some shared instance, with minimal setup -> Can use AiiDAlab infrastructure (currently, one profile for each archive uploaded, data stored on a virtual machine)
  - Having an easy way for a group to set up a remote, shared instance
  - Manage permissions via keys, similar to git
  - One would have 3 users: a central one without the ability to submit anything (level 0), only used as a central hub; plus 2 local users that interact with the central instance
  - Sharing is on a per-project basis; a project can contain multiple profiles (but this is complicated)
  - One could set the access level on a per-user, per-profile basis beforehand, e.g., A has access to only one profile/group/nodes of B's project.
    This would require creating a new DOS repository for different parts of the data, rather than having a single repo for all the data
  - Make the pushing so automated that one could run it periodically, e.g., every hour (e.g., making sure only sealed nodes are shared)
  - Push/pull has to be incremental
  - Dangling nodes during export break provenance and make importing tricky, as, e.g., inputs are missing
    - `input-calc-backward` should not be toggleable. Direct inputs should always be attached.
  - What about extras, groups, and other mutable entities?
  - Problem for us (not for git): data duplication. Files could be very large (`.aiida` archives on the order of terabytes); one doesn't want to have to buy a new machine with extended storage just to be able to download and look at a file (while only requiring a small fraction of the data) -> This is avoided with a fully shared, mounted DB and DOS

## Options for lab-level AiiDA setup

1. Single AiiDA instance on a server
   - User profiles for isolated work (e.g., alice, bob)
   - Shared profile per collaboration (e.g., alice_and_bob)
     - Multiple users
       - Currently not supported
       - Would share profile UUID (RMQ can't separate tasks by user)
       - Fix: user-level configuration
2. Multiple AiiDA instances
   - User profiles for isolated work
     - Also isolated by machine, but same principle
   - Shared profile per collaboration
     - All collaboration members create a shared-name profile (e.g., project1)

### Comments

- In both cases, we could introduce a utility to streamline data migration from personal profiles into collaborative ones
- For shared profiles, the POSTGRES_DB is shared

### Concerns

- ???
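
### Sketch: selecting nodes for an incremental push

The push/pull points in the meeting notes (incremental, only sealed nodes, no dangling inputs) could be sketched roughly as below. This is a plain-Python illustration only, not AiiDA code: the `Node` class, the `select_for_push` function, and the idea of tracking the remote by a set of UUIDs are all hypothetical names and assumptions introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a provenance node (not the AiiDA ORM class)."""

    uuid: str
    sealed: bool = True  # assumption: data nodes count as sealed once stored
    inputs: tuple[str, ...] = ()  # UUIDs of the direct inputs


def select_for_push(local: dict[str, Node], remote_uuids: set[str]) -> list[str]:
    """Return the UUIDs to push in this round.

    A node is selected only if it is sealed, not yet on the remote, and all
    of its direct inputs are either on the remote already or selected too,
    so that no dangling inputs are created on the remote side.
    """
    selected: set[str] = set()
    changed = True
    while changed:  # repeat until no further node becomes eligible
        changed = False
        for uuid, node in local.items():
            if uuid in selected or uuid in remote_uuids or not node.sealed:
                continue
            if all(i in remote_uuids or i in selected for i in node.inputs):
                selected.add(uuid)
                changed = True
    return sorted(selected)


local = {
    "a": Node("a"),
    "b": Node("b", inputs=("a",)),
    "c": Node("c", sealed=False, inputs=("b",)),  # still running: not shared
    "d": Node("d", inputs=("c",)),  # held back, its input "c" is not shared
    "e": Node("e", inputs=("x",)),  # held back, input "x" is unknown locally
}

first_push = select_for_push(local, remote_uuids=set())
second_push = select_for_push(local, remote_uuids=set(first_push))
print(first_push)   # ['a', 'b']
print(second_push)  # []
```

Running the selection a second time with the already pushed UUIDs returns nothing new, which is the property that would make a periodic (e.g., hourly) push cheap. How extras, groups, and other mutable entities would fit into such a scheme is left open here, as in the notes.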