# 2025-05-15 Discussion on data sharing

[Meeting recording and transcript](https://teams.microsoft.com/l/meetingrecap?driveId=b%21-swC0KavbEmFbDS_3HYVXWGrE0YzeKZNoQTVjScaln76pCNJRhdiRK71eyMiuZtN&driveItemId=01YOXG5MGUL5NJWLCGSNGKCFD6VAPHBHGD&sitePath=https%3A%2F%2Fdpsich-my.sharepoint.com%2F%3Av%3A%2Fg%2Fpersonal%2Fgiovanni_pizzi_psi_ch%2FEdRfWpssRpNMoRR-qB5wnMMB9oATE98Cqu1NlnL9VsWlMA&fileUrl=https%3A%2F%2Fdpsich-my.sharepoint.com%2F%3Av%3A%2Fg%2Fpersonal%2Fgiovanni_pizzi_psi_ch%2FEdRfWpssRpNMoRR-qB5wnMMB9oATE98Cqu1NlnL9VsWlMA&iCalUid=040000008200E00074C5B7101A82E00800000000CB7BDAE14FBFDB01000000000000000010000000CEB7BD1D0199114981B976E3043D7484&threadId=19%3Ameeting_MjUzZGQ4YzQtNDM3Mi00YmUwLWE5ZDEtYmJhOWQ4OTcyNzFm%40thread.v2&organizerId=fcd49011-dac4-4c01-9bc6-eefdd0021243&tenantId=50f89ee2-f910-47c5-9913-a6ea08928f11&callId=ac56d3fb-627a-4d25-94bf-c68a889c9f4b&threadType=Meeting&meetingType=Scheduled&subType=RecapSharingLink_RecapChiclet)
## Notes regarding Edan's setup
- Via Docker Compose:

| Entity                   | Same?              |
| ------------------------ | ------------------ |
| Unix user                | :x:                |
| AiiDA profile (DB & DOS) | :white_check_mark: |
| AiiDA user               | :x:                |

(@all: anything else to consider? The DB user is included in the AiiDA profile.)

- Computer created and configured by Alice. For Bob:
  - Not visible via `verdi computer list`, because the computer is unconfigured for Bob (the computer configuration is stored via `AuthInfo`, in the `db_dbauthinfo` table, which has a user attached).
  - Visible to Bob via `verdi computer list -a`.
  - If Bob re-submits a calculation from Alice (e.g., because it failed), this will fail because the computer is not available to him -> In docs: recommend switching the user for this scenario (alternatives: configure the computer for Bob in the same way as for Alice, or provide a simple way to achieve this, e.g., `aiida-project synchronize`).
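
The visibility behaviour noted above can be illustrated with a small toy model. This is only a sketch: `ToyStore` and `copy_configuration` are hypothetical names, not part of the AiiDA API, and the only assumption taken from the notes is that a computer's configuration is stored once per (computer, user) pair.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy stand-in for the computer and auth-info tables (not the AiiDA ORM)."""

    computers: set = field(default_factory=set)
    # One entry per (computer, user) pair, like a row in `db_dbauthinfo`.
    authinfos: dict = field(default_factory=dict)

    def configure(self, computer: str, user: str, **params) -> None:
        """Store connection parameters for one user on one computer."""
        self.computers.add(computer)
        self.authinfos[(computer, user)] = dict(params)

    def list_computers(self, user: str, show_all: bool = False) -> list:
        """Mimic `verdi computer list` (configured only) and `-a` (all)."""
        if show_all:
            return sorted(self.computers)
        return sorted(c for c in self.computers if (c, user) in self.authinfos)

    def copy_configuration(self, computer: str, source: str, target: str) -> None:
        """Hypothetical helper: reuse one user's settings for another user."""
        self.authinfos[(computer, target)] = dict(self.authinfos[(computer, source)])


store = ToyStore()
store.configure("cluster", "alice", username="alice_hpc")

print(store.list_computers("bob"))                 # []
print(store.list_computers("bob", show_all=True))  # ['cluster']

# After copying Alice's configuration, the computer is usable by Bob too.
store.copy_configuration("cluster", source="alice", target="bob")
print(store.list_computers("bob"))                 # ['cluster']
```

A synchronisation helper such as the suggested `aiida-project synchronize` would, in this picture, amount to the `copy_configuration` step.
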
## Meeting Notes
- Past discussions on the topic: https://drive.google.com/drive/u/0/folders/1AUa_zUM7YnmyNq9jp4D2FtMjQQRU2nvK
- Profile, database, repository -> These are the three main components we need to consider:
- Profile -> At least DB and repository (but can also be other configuration options)
- No sharing of UNIX account/permissions
- PostgreSQL DB: easy to share by exposing its port
- Shared file system or S3 for disk-objectstore
- Possibly manage via AiiDAlab, give users an option to create an account/instance on this shared hosted "portal"
- Use cases:
- Share with people outside of the group: Lower level of trust
- Fully shared data access for instance (mounted DB/DOS): Requires full read/write access (to DB and file system)
- Push/pull mechanism: Still requires some shared instance, with minimal setup -> Can use AiiDAlab infrastructure (currently, one profile for each archive uploaded, data stored on a virtual machine)
- Having an easy way for a group to set up a remote, shared instance
- Manage permissions via keys, similar to git
- One would have 3 users: central one without the ability to submit something (level 0), only used as central hub; plus 2 local users that interact with the central instance
- Sharing is on a per-project basis; a project can contain multiple profiles (but this is complicated)
- One could set access levels on a per-user, per-profile basis beforehand, e.g., A has access to only one profile/group/set of nodes of B's project. This would require creating a new DOS repository for different parts of the data, rather than having a single repo for all the data
- Make the pushing so automated that one could run it on a schedule, e.g., every hour (e.g., make sure only sealed nodes are shared)
- Push/pull has to be incremental
- Dangling nodes during export break provenance and make importing tricky, as, e.g., inputs are missing
- `input-calc-backward` should not be toggleable. Direct inputs should always be attached.
- What about extras, groups, and other mutable entities?
- Problem for us (not for git): data duplication. Archives could be very large (`.aiida` files on the order of terabytes); one doesn't want to have to buy a new machine with extended storage just to be able to download and look at a file (while only requiring a small fraction of the data) -> This is avoided with a fully shared, mounted DB and DOS
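
The push requirements discussed above (incremental, only sealed nodes, direct inputs always attached) could be sketched as follows. This is a toy model, not AiiDA code: `ToyNode` and `select_for_push` are hypothetical names, and it assumes that only calculations can be unsealed while data nodes are immutable once stored.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyNode:
    """Toy stand-in for a provenance node (not the AiiDA ORM)."""

    uuid: str
    is_calculation: bool = False
    sealed: bool = True   # assumption: only calculations can be unsealed
    inputs: tuple = ()    # UUIDs of direct inputs


def select_for_push(local: dict, remote_uuids: set) -> set:
    """Return the UUIDs to send so the remote gains no dangling calculation."""
    selected = set()
    for node in local.values():
        # Incremental: skip what the remote already has.
        if node.uuid in remote_uuids:
            continue
        # Unsealed calculations can still change, so they wait for a later push.
        if node.is_calculation and not node.sealed:
            continue
        selected.add(node.uuid)
        # Always attach the direct inputs of a calculation (never optional).
        if node.is_calculation:
            selected.update(u for u in node.inputs if u not in remote_uuids)
    return selected


local = {
    "d1": ToyNode("d1"),
    "c1": ToyNode("c1", is_calculation=True, inputs=("d1",)),
    "d2": ToyNode("d2"),
    "c2": ToyNode("c2", is_calculation=True, sealed=False, inputs=("d2",)),
}

first = select_for_push(local, remote_uuids=set())
print(sorted(first))   # ['c1', 'd1', 'd2']

# A second push with nothing new sends nothing.
second = select_for_push(local, remote_uuids=first)
print(sorted(second))  # []

# Once c2 is sealed, only c2 is sent; its input d2 is already on the remote.
local["c2"] = replace(local["c2"], sealed=True)
third = select_for_push(local, remote_uuids=first)
print(sorted(third))   # ['c2']
```

Mutable entities such as extras and groups are deliberately left out of this sketch, since the notes leave their handling open.
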
## Options for lab-level AiiDA setup
1. Single AiiDA instance on a server
   - User profiles for isolated work (e.g., alice, bob)
   - Shared profile per collaboration (e.g., alice_and_bob)
     - Multiple users
       - Currently not supported
       - Would share profile UUID (RMQ can't separate tasks by user)
       - Fix: user-level configuration
2. Multiple AiiDA instances
   - User profiles for isolated work
     - Also isolated by machine, but same principle
   - Shared profile per collaboration
     - All collaboration members create a shared-name profile (e.g., project1)
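
The RMQ limitation in option 1 can be illustrated with a toy naming function. This is a sketch under assumptions: the queue naming scheme, the `queue_name` helper, and the extra user component are all hypothetical and only stand in for "task queues are keyed by the profile UUID". It shows one possible reading of the "user-level configuration" fix, not a settled design.

```python
from typing import Optional


def queue_name(profile_uuid: str, user: Optional[str] = None) -> str:
    """Build a toy task-queue name; the naming scheme is an assumption."""
    prefix = f"aiida-{profile_uuid}"
    if user is not None:
        # Hypothetical user-level component that would separate the queues.
        prefix = f"{prefix}.{user}"
    return f"{prefix}.process.queue"


shared_uuid = "0000-shared-profile"  # placeholder, not a real profile UUID

# Keyed by profile only: Alice's and Bob's daemons see the same queue.
print(queue_name(shared_uuid) == queue_name(shared_uuid))                  # True

# With a user-level component, the queues differ.
print(queue_name(shared_uuid, "alice") == queue_name(shared_uuid, "bob"))  # False
```
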
### Comments
- In both cases, we could introduce a utility to streamline data migration from personal profiles into collaborative ones
- For shared profiles, the POSTGRES_DB is shared
### Concerns
- ???