# FAIRPoints-Harmonising FAIR data sharing with Legal Compliance

:::danger
**Date**: March 23rd 2022
**Event Summary:** [FAIRPoints_Point_3](https://www.fairpoints.org/fairpoints_resources/)
:::

**Speaker bio:** https://harshp.com/research
**Intro slides:** https://shiny.link/dvu6zT
**Keynote slides:**
Participants #: 25

### Agenda:

| Time | Agenda | Speaker |
| ------------ | ------------------------------------- | ---------------- |
| 15:00-15:10 | Welcome, Housekeeping & introductions | Sara |
| 15:10-15:50 | Keynote + Q&A | Harshvardhan J. Pandit |
| 15:50-15:55 | Silent Documenting- Takehome messages | All |
| 15:55-16:00 | Wrap-up | Sara |

### Links:

- **Monthly Keynote events** April 21st 2022: FAIRPoints-Things you need to know about Data Access Statements **Register** 👉 https://www.lyyti.in/fairpoints_das
- **Slack:** [shiny.link/F71wE](https://shiny.link/F71wE)
- **Monthly Community Discussions** April 8th 2022 👉 https://shiny.link/Jl6nuV
    - FAIR for beginners
    - Schemas for training events-FAIRPoints
- Sign-up to **event series** 👉 https://bit.ly/3BEQ06X
- To get in touch with Harsh: me@harshp.com OR pandith@tcd.ie Twitter: @coolharsh55

----

## Code of Conduct reminder

* Be respectful, honest, inclusive, accommodating, appreciative, and open to learning from everyone else.
* Do not attack, demean, disrupt, harass, or threaten others or encourage such behavior.
* Be patient, allow others to speak, and use the Zoom reactions & chat if you would like to voice something.
* See also our [participation guidelines](https://www.fairpoints.org/participation_guides/).

# Q&A:

:::success
❓ *Please add any questions you might have during the course of the session here:*
:::

* Could we have a copy of the presentation to be shared with my team members?
    * Answer: Yes, Harsh has kindly agreed to make the slides available :) We will share them along with the recording.
        * Awesome! Thank you!
* Katie: curious as to what is being done, if anything, in the US?
    * Answer: NIST has made strides
    * Harsh: Good place to go down the rabbit hole: (https://www.nist.gov/privacy-framework) Note that this doesn't give anything readily usable for FAIR+Legal, but it has a great set of requirements and reports
* Katie: how does this affect ongoing research in healthcare that spans different countries, e.g. VODAN? (+1)
    * Answer: this is still an open question and why we need something like CC-BY; something that is globally recognized and uses a kind of commons vocabulary that harmonizes terms and conditions for FAIR data
* Katie: is there info for curators to handle this practically in current data catalogs/repos?
    * Answer: look at specific use cases and see how to operationalize in a practical way (don't assume you need to set up a complicated triplestore, etc.). Perhaps simply adding metadata about license specifications (beyond just whether there is a license); don't focus on legalese, etc., look to "low hanging fruit", whatever that means for your institution. (how to secure data, what is use...)
    * sara: do we have examples, e.g. from clinical FAIR datasets?
* Joakim: But how can you *monitor* the use and compliance with such a license?
    * Answer: How can you determine whether data is used for that purpose and under those conditions? One condition is that you can withdraw your consent... how can you make sure of compliance? The simple answer is that you cannot. There are compatibility issues between legal aspects and FAIR. Harsh explains the value of the data chain. Reuse will need to go back to the original consent to capture this efficiently; we need to create the structure for these permissions.
    * Harsh: To add to above, these 'inconveniences' are sometimes by design. I wouldn't want malicious use of my data by sharing it too freely. At the same time, I don't want to be bothered every time with a new request. We lack the mechanisms through which one can express prohibitions, e.g. never use this for profiling; and permissions, e.g. use this for any beneficial purpose in medical research.
      There are ways to express this in machine-readable form, but nothing that is legally enforceable for now. So we need more research into vocabularies & laws.
    * Comment by Philipp: As for compliance monitoring, you might also want to have a look at the Lophi approach being developed at our university (UiT The Arctic University of Norway); cf. https://hdl.handle.net/10037/22223; https://hdl.handle.net/10037/23701
* David: Great question. Also what legal process occurs when someone does violate the license? If penalties are minor or non-existent then ignoring the license is de-facto legal +1 +1
    * Answer: Depending on the license: email the company using the data, possibly file a court case... report to a legal authority, they can look into it, becomes more of a collective action... maybe more people are impacted. Data protection authority would have to apply... US doesn't have federal... California does. FTC but not similar. Right now dedicated agencies are putting together. Right now the data producer would have to bring the case forward. Example: data leaks related to taxes.
* Biru: I may be wrong about this, but I am a bit concerned regarding what type of infrastructure will need to be put into place to support this from a service point of view. Who are the data trustees in this context and who maintains the system? On the other hand, I think this is AMAZING :)
    * Answer: What kind of infrastructure needs to be in place, well organized behind this, people to help researchers - Canada, privacy laws are different between provinces - go after low hanging fruit, go after the easiest legal thing - try to avoid using legal talk - generic conversation, some of the questions that were brought up in the talk/in the notes. Humanize the conversation. Information awareness training - also connection with metadata. Consent - not so comfortable to just give my consent, what about my participants... comfort on speaking on behalf of participants?
      you cannot use consent everywhere - example click agree everywhere is a nightmare - trying to be more specific about use/general language per again, some of the questions that Harsh brought up in the talk/notes. But laws missing/language/metadata missing at the moment - try to deal with documenting in some way per these general constraints...
    * Harsh: Data trustees would be an entirely different type of organisation/workflow where some other entity has agency over your data. Even in this case, I think the point is that the 'license' or 'policy' to that data being FAIR has benefits for legal compliance, so while consent may have its problems, this can help point an entity, e.g. every time data is shared, notify X, or only use with permission of X. This opens up collectives and group actions not possible without FAIR IMHO.
* Arnold: If I understood you correctly, creating a barrier and giving access actively gives you more protection ("I give up my responsibility here") than simply putting a dataset openly with a restrictive license attached to it. Is that so? Why is that the case? Does the license somehow become "stronger"?
    * Harsh: This was for the case where putting the dataset in the open might have negative consequences, e.g. data is sensitive and has impact - such as medical data on patients. That's why I called it the 'hospital model', where data sharing should not take place freely unless some responsibility and accountability has been established. At a minimum, knowing who is taking that data and for what purpose.
* Batool: Is there any known model for repositories which track the use of non-sensitive data and its compliance with copyright if it's under a restricted (less open) license (e.g. CC BY-NC-ND)?
    * Harsh: not that I'm aware of, other than the data portals that host data (and you can search metrics for specific licenses)
* David: Are there any implications for existing FAIR and legally compliant datasets as legal requirements change?
  Can an existing dataset be found out of compliance, and who has the responsibility of fixing it?
    * Harsh: Yes, this happens all the time. Good example: cultural heritage - where data is suddenly problematic to freely share because it may contain personal data. In such cases, it is the responsibility of the data user/adopter first to ensure they use data as per updated legal requirements. Existing institutions that host this data can also think about how they can change or upgrade their datasets to make sharing less problematic. Sometimes there are no easy quick-fixes, in which case you try to find a compromise, e.g. release under a different license, or release under agreement.
* Lisanna: (moving it down because not strictly related) Thank you for the useful insights. My question is more related to project management than FAIR datasets, however: how do you suggest to treat documents like this one, where participants of a call add their name and other data? How should we treat the URLs of such documents?
    * Harsh: If I understand this correctly, it refers to documents such as THIS? In this case, the URL itself can be shared with no barriers because it's an event (i.e. organisers are okay with sharing). But publishing who attended the event may get dicey in legal terms because people may not have agreed to make their presence at a 'regulated' i.e. non-public event known to the wider world. Same applied to FAIR data was kind of the goal of pointing out personal data in datasets :)
        * Thank you!
* Korbinian: https://github.com/EBISPOT/DUO approaches this issue from a narrower angle, from the perspective of direct, practical implications for health-related data instead of modelling the legal code. Do you think this could be harmonized with your vocabulary?
    * Harsh: yes, I think there is a way to create what DUO aims to do, with the vocabulary of DPV.
      It's been sitting in my to-do list for a while :/ Basically, ODRL gives a good (complex but exhaustive) vocabulary to model 'legal contract'. To reduce it to a simple representation like DUO is possible if we treat DPV like a controlled vocabulary. My main concern in this would be interoperability and expandability of DPV (which IMHO are its strengths). So we need work exploring this challenge.
* Chukwuemeka: Jamaica/data protection mechanisms in place, but how can we leverage tech solutions/standard vocab - how do we format so that they can be analyzed for compliance?
    * Answer: Data Protection Law, writing journal article apply templates/in machine readable way -> spreadsheets are helpful here -> will be public -> "DPCAT - specification for machine readable/interoperable catalog for GDPR" -> spreadsheet, companies love spreadsheets! Twitter: @coolharsh55 -> would love to test this
    * Harsh: indeed, good to see uptake of dp-law in Africa (met a colleague from Zimbabwe last week, so got refreshed on recent developments). Happy to help with moving things along. Please drop me an email so we can continue this conversation later, and I can share recent developments on ROPA and DPV.

:::info
# FAIRPoints- What is your take home message from today's session?
*✏️ Silent documenting of learning outcomes+ share outs, add +1.*
:::

* Key take-aways:
    1) Quantifying legal requirements for data sharing for domain / use case
    2) Aligning legal requirements with FAIR workflows - which laws? Which jurisdictions?
    3) Creating machine-readable vocabularies for policies - generic -> specific
    4) Developing new FAIR workflows based on legal actors/policies

## Notes:

- FAIR enabling data sharing mechanisms
- Making data more open, usable, in workflows
- Many legal aspects to be mindful of when sharing
- Copyright Law, Privacy Regulation, Data Protection (GDPR), EU - AI Act, Digital Markets, Data Act, Digital Services Act - dedicated spaces per health
- Value add should be mindful of legal compliance so that benefits can be seen throughout lifecycle
- Any personal data, processing of which (strong regulation in EU)
- Personal data - email, how many steps you walked, etc
- Types of process per the data
- Who is involved in the process
- How to check processing, how are you following per GDPR
- General template
- Definitions of... Personal Data, Processing, Purpose, Actors
- Consider FAIR data as catalogs, actors, licenses - problem, GDPR uses different terminology
- GDPR -> Controllers, Data Subjects, Legal Basis, Sensitive, Special Category Data
- GDPR per domains, EU-wide
- Anonymised - no chance of reidentifying people in data, simply saying this is not a solution, there are exploits of re-identifying
- If I have 5 data points on you, I can re-identify, gender, age, group...
  browser fingerprints, attributes that allow you to re-identify
- If you process this type of data, they are considered special categories, you will need explicit consent, for public benefit, etc - check whether belongs to sensitive categories -> how to compare
- Step 1 making it findable, interoperable
- Step 2 distribution of data to other sources (companies)
- Notion of purpose is weakly defined -> FAIR notions don't necessarily match up with legal compliance
- Widespread problematic occurrence using the wrong legal basis to use content/data -> consent is not exactly enough, window for those who use it to misinterpret and reuse in unintended ways - specificity/clarity can help with being less problematic
- Principles in Article 5 should be minimal - FAIR is more about making as much available as possible
- Article 7 - 3rd party consent, creating transparency, notices, ability to object -> way to report to authorities
- Make sure to capture the data provenance, the source, and every entity that modifies the data -> consent mechanisms need to be in place
- Add legal metadata to FAIR datasets -> has some metadata associated, more per legality -> achieve responsible data sharing -> take lessons from CC-BY (commonly used licenses) everyone knows how to use -> commons terms so it is more efficient for people to understand if they can use
- Metadata needs to include domain, field, location/jurisdiction -> terms and conditions for FAIR data, labeling sticky policy -> instead of ad hoc, strictly define
- Example: My data has sensitive data, only for scientific research, at your institution - jurisdiction doesn't matter
- Example: Data can only be used by domain experts, can only be shared with others in the EU - restrictions/conditions -> start with something simple -> create data chain, attach license throughout
- No legal notion of when something is less fair, compliant, legally compliant
- Open Digital Rights Language https://www.w3.org/TR/odrl-model/ -> decision chart,
  covers a lot of existing legal contracts -> doesn't exactly relate to GDPR, how datasets are shared
- Data Use Ontology (DUO) https://github.com/EBISPOT/DUO -> want to obtain simplicity like this but want it to be generic
- Data Privacy Vocabulary (DPV) w3id.org/dpv - dictionary of concepts, all solve different problems (including above)
- What if we combine? more complete picture, decisions/definitions (see slides w/ visual -> Labelling FAIR data for 'sharing' in legally compliant manner)
- Problems - FAIR metadata is essentially a promise -> how do we ensure that the promise is fulfilled (example of use challenges) -> solution not to share the full data, only share metadata, part of it -> create legal contract to then give full once a contract is made -> can be done by machines and not just humans -> Harsh calls this the 'hospital model' -> insurer/patient scenario
- ODRL Vocab -> want to allow something to express elements and want to make specific per regulation -> don't have to attach specific consent, can say local regulations (at generic level) -> layers of regulation, whether they can assess at those levels -> data value chains are again needed here -> how can they match their compatibility -> saying do your legal compliance in a FAIR way, share metadata in a FAIR legal way
- Solutions above are options but there is no standard, all above are helpful to answering this challenge

# Thank you for joining! 🎉

## 5 ways to stay involved

* Sign-up to event series: [https://bit.ly/3BEQ06X](https://bit.ly/3BEQ06X)
* Website: https://www.fairpoints.org/
* Twitter: [@FAIR_Points](https://twitter.com/FAIR_Points)
* Slack: [shiny.link/F71wE](https://shiny.link/F71wE)
* Email: [fairpoints@protonmail.com](mailto:fairpoints@protonmail.com)
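
The two "Example:" bullets in the Notes (sensitive data usable only for scientific research, and data usable only by domain experts and shareable only within the EU), plus the "never use this for profiling" prohibition from the Q&A, can be sketched as machine-readable policies attached to a dataset. The snippet below is a minimal illustration in plain Python and is not ODRL, DUO, or DPV: the `Policy` and `Request` classes, their field names, the purpose/role/region labels, and the `is_permitted` helper are all hypothetical and only loosely mirror the permission/prohibition idea from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical 'sticky policy' carried in a dataset's metadata."""
    allowed_purposes: set = field(default_factory=set)     # empty = any purpose
    prohibited_purposes: set = field(default_factory=set)  # always refused
    allowed_roles: set = field(default_factory=set)        # empty = any role
    allowed_regions: set = field(default_factory=set)      # empty = any jurisdiction


@dataclass
class Request:
    """Who asks for the data, for what purpose, and from where (illustrative)."""
    purpose: str
    role: str
    region: str


def is_permitted(policy: Policy, request: Request) -> bool:
    """Return True if the request satisfies every condition in the policy."""
    # Prohibitions take precedence over any permission.
    if request.purpose in policy.prohibited_purposes:
        return False
    if policy.allowed_purposes and request.purpose not in policy.allowed_purposes:
        return False
    if policy.allowed_roles and request.role not in policy.allowed_roles:
        return False
    if policy.allowed_regions and request.region not in policy.allowed_regions:
        return False
    return True


# Notes example 1: scientific research only; jurisdiction doesn't matter.
research_only = Policy(allowed_purposes={"scientific-research"})

# Notes example 2: domain experts only, shared only within the EU.
experts_in_eu = Policy(allowed_roles={"domain-expert"}, allowed_regions={"EU"})

# Q&A example: any use except profiling.
no_profiling = Policy(prohibited_purposes={"profiling"})
```

A data portal could run a check like `is_permitted(experts_in_eu, Request("teaching", "domain-expert", "EU"))` before releasing the full data, in the spirit of the 'hospital model'. As noted in the Q&A, such a check only records and tests the stated conditions; it does not make them legally enforceable.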