---
###### tags: `infothèque`
---

# FORS Online Workshop: Addressing the top 10 data management questions

:calendar: 20-21 June 2022
:round_pushpin: Zoom
:memo: https://forscenter.ch/the-top-10-data-management-questions/
:uk: slides + participants' answers (Wooclap): https://drive.switch.ch/index.php/s/KWRzk7p9Y3cexfC

Presenters: Alexandra STAM, Marieke HEERS, Pablo DIAZ (FORS)
Approx. 30 participants

---
[TOC]
---

## Intro

New platform dedicated to replication data (in the social sciences), SUBreplica: resources.swissubase.ch/replication

## 1. Why should I care about data management?
:uk: *Alexandra Stam*

Planning (e.g. writing a DMP) VS day-to-day data management (e.g. setting rules, drafting a consent form)

**Requirements from funders**
* DMP
* data sharing in a FAIR repository

**Requirements from institutions**
* DMP
* data sharing

**Requirements from journals**
* deposit of the data used in publications
* sufficient documentation

**Other motivations:**
* saving time
* improving data quality
* enabling data re-use and reproduction of research results
* contributing to building a better research culture

## 2. What normative framework applies to my data?
:uk: *Pablo Diaz*

**Open data movement**
* data as a public good
* transparency
* cumulativity

**Copyright laws**

**Personality rights**
You must be 16 years old to give consent regarding such rights. If you are younger, your parents' agreement is needed.

**Data protection laws**
* International level (e.g. GDPR) [unctad.org](https://unctad.org/topic/ecommerce-and-digital-economy/ecommerce-law-reform/summary-adoption-e-commerce-legislation-worldwide)
* Federal level
* Cantonal level

Which law applies? The applicable laws can be cumulative.

:::success
**Exercise 1**
The objective of this research project, led by a researcher from the University of Lausanne, is to study the presence in French and Swiss society of right-wing values and the reasons for their development, beyond groups which are labelled as extreme right wing. We will attempt to give an account of the reasons which lead to people becoming politically involved in the «populist right», where these values are to be found, and in ways which are more or less affirmed, more or less euphemized. In order to carry out this research, we will administer a questionnaire and conduct life story interviews in Switzerland and France.

**Solution**
• Processing personal data: yes (life story interviews)
• Establishment: Switzerland (UNIL)
• Legal status: cantonal organ (UNIL)
• Place of data collection (+ targeting): Switzerland and France
• Sector-specific law: nothing obvious

Laws that apply
• LPrD
• GDPR
:::

:::success
**Exercise 2**
This project, led by a researcher from EPFL, aims to understand how people with severe memory disorders react to significant historical events. To do this, we will recruit - in collaboration with the CHUV - people diagnosed with severe disorders. They will then be invited to take part in viewing sessions of news that marked their era. Brain imaging instruments will be used in order to study the reactions of the brain.

**Solution**
• Processing personal data: yes (brain imaging)
• Establishment: Switzerland (EPFL + CHUV)
• Legal status: federal/cantonal organs (EPFL + CHUV)
• Place of data collection (+ targeting): Switzerland
• Sector-specific law: yes

Laws that apply
• FADP
• ETH Act
• LPrD (CHUV)
• HRA
:::

## 3. Am I processing personal data?
:uk: *Pablo Diaz*

Any data about an identified or identifiable person are personal data.
If you do anything with personal data (collecting, storing, archiving, etc.), you have to comply with data protection laws.

**Sensitive data**
The list provided by the FADP (art. 5 lit. c FADP) is **exhaustive**. Salaries, for example, are not listed, so they are not sensitive from a legal perspective.

> c. sensitive personal data (sensitive data):
> 1. data relating to religious, ideological, political or trade union-related views or activities,
> 2. data relating to health, the private sphere or affiliation to a race or ethnicity,
> 3. genetic data,
> 4. biometric data that uniquely identifies a natural person,
> 5. data relating to administrative or criminal proceedings and sanctions,
> 6. data relating to social assistance measures;

Is public data personal data? Yes, and it should be processed as such. There may be additional restrictions for data published on social media.

In the social sciences, you can assume that you are dealing with personal data. For data to be considered anonymous, the information that allows a person to be identified has to be permanently destroyed.

## 4. Is consent mandatory?
:uk: *Pablo Diaz*

Origin: "horror stories"
* Nazi doctors
* Tuskegee study (1932-1972)
* Milgram experiment (1961)
* Stanford prison study (1971)
* Laud Humphreys (1960s)

Today's 3 pillars of research ethics:
* **autonomy**
* beneficence & non-maleficence
* justice

**Autonomy** requires participants to be free (to participate) and therefore properly informed and not pressured to participate (no coercion).

A public body has to know its legal basis, because it may only do what it is allowed to do. Participants must give their consent for their data to be processed by a researcher. Consent is also often the only legal basis for working with personal data.

How to ensure that consent is valid? Proper information and no coercion. Consent can be oral, but it is very useful to have a written and signed consent form (to prevent disputes). Where sensitive data is involved, consent must be explicit: you cannot assume that answering a questionnaire means the person agrees to the processing of his/her sensitive data.

**Procedural VS processual consent**
* procedural: a document signed at the beginning of the research
* processual: consent can only be obtained over time (because the process evolves)

What information should be provided?
* identity of the person responsible for the research + all team members who process the data (**mandatory**)
* understandable statements describing the purpose, e.g. adapted for children (useful)
* clear description of the foreseeable risks and benefits of participation (useful)
* nature of the data collected and its usefulness (**mandatory**)
* honest and complete description of the protection/security measures (useful)
* guarantee of being free to decide whether or not to participate (**mandatory**)
* right to access and rectify data (useful)
* existence of any conflict of interest (useful)
* preservation and reuse of data (useful)
* contracts with third parties (useful)
* possibility of being informed of the results of the project (useful)

Covert observation ≠ deception or incomplete disclosure

**Covert observation** is a practice that has often been used in sociology/anthropology. It is a method of gaining access to things that one would not otherwise have access to (e.g. how desks are placed in an office). It is illegal if personal data are involved.

"Deception is when a researcher gives false information to subjects or intentionally misleads them about some key aspect of the research. [...]
Incomplete Disclosure is a type of deception that involves withholding some information about the real purpose of the study, or the nature of the research procedures" (https://research.oregonstate.edu/irb/research-involving-deception#_ftn3)

**Exceptions to the duty to inform**
* if an overriding public interest requires it (e.g. research on criminal activities)
* a private body can benefit from an exception based on private interests

As soon as the reason for withholding the information disappears, the controller must provide it.

## 5. Where should I store my data during the project?
:uk: *Marieke Heers*

**Possible consequences of a data security breach**
* harm to research participants
* harm to researchers, their reputations, and their work
* harm to institutions

**Securing the data environment**: different levels of security based on the possible risks/harm
* How sensitive are my data? What are the risks of harm?
* During the project, who can access the data and under what conditions?
* How are my data backed up?
* Where are the security weaknesses and how can these be addressed?

**Criteria for taking "appropriate" measures**
* type of data
* sensitivity of data
* amount of personal information collected
* purpose of data collection
* risks in case of security leaks
* technical state of the art

**Technical measures**
* Storage
    * *factors*: data volume & sensitivity, team composition and roles, institutional setting(s), available infrastructure, project duration & resources, fieldwork setting(s), non-digital objects (e.g. paper questionnaires), need to transmit files
    * *types*: **local drives** (full control & easier protection VS easily lost + difficult for others to access + not suitable for long-term storage), **cloud solutions** (shared files + automatic backups + often automatic version control VS not always secure + insufficient control over the data), **network drives** (data centrally stored and backed up + shared access VS risk of unauthorized access + complicated access for external partners)
* Password protection
    * strong passwords protect from unauthorized access, hacking, and malicious software (a password manager such as KeePass is helpful)
* Encryption
    * sensitive data should be encrypted whenever they leave the institution (online or offline)
* Backup
    * prevents data loss
* Disposal of data
    * personal data should be deleted when no longer needed
    * data disposal (e.g. erasure, sanitizing, wiping) is an important measure for reducing risk

**Organisational measures**
* Access control
    * prevent unauthorized access
    * there should be access rules
    * anonymize data (where possible)
* Data transmission and sharing
    * unsafe methods: emailing without encryption, uploading unencrypted data to a cloud service, hand-carrying unencrypted data
    * safe methods: emailing an encrypted file and sharing the password separately and securely, uploading an encrypted file to the cloud, mailing encrypted files loaded onto encrypted devices, survey software with encryption features, SSH File Transfer Protocol (SFTP)

## 6. How much anonymisation is enough?
:uk: *Marieke Heers*

It depends on the specific project. The researcher, who is the expert, has to decide.

**What to consider when deciding?**

Anonymisation: the process of permanently deleting the information that allows a person to be identified from a dataset, document, etc. (**cannot be reversed**)

Example: by crossing 3 simple variables (date of birth, postal code and gender), 63% of the US population can be identified (Golle 2006).
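To make this concrete, here is a minimal sketch (not from the workshop) of how one might estimate this kind of re-identification risk in one's own quantitative dataset, assuming a pandas DataFrame with hypothetical columns `birth_date`, `postal_code` and `gender`: records whose combination of quasi-identifiers occurs only once are potentially re-identifiable.

```python
# Sketch: share of records uniquely identified by a combination of
# quasi-identifiers (column names and data are invented for illustration).
import pandas as pd

def uniqueness_rate(df: pd.DataFrame, quasi_identifiers: list[str]) -> float:
    """Share of records whose quasi-identifier combination occurs only once."""
    sizes = df.groupby(quasi_identifiers, dropna=False).size()  # records per combination
    return float((sizes == 1).sum()) / len(df)

if __name__ == "__main__":
    survey = pd.DataFrame({
        "birth_date": ["1980-03-01", "1980-03-01", "1975-07-12", "1990-11-30"],
        "postal_code": ["1003", "1003", "1004", "1201"],
        "gender": ["F", "M", "F", "F"],
        "answer": [3, 5, 2, 4],
    })
    # Here every record is unique on these three variables, so the rate is 1.0.
    print(uniqueness_rate(survey, ["birth_date", "postal_code", "gender"]))
```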
Pseudonymisation: removal or replacement of identifiers with pseudonyms or codes, which are kept separately and protected by technical and organisational measures (**can be reversed**). **Pseudonymised data remains personal data** and must be processed as such.

Balancing utility and data protection: increased protection implies decreased utility (what can be sacrificed?)

Risk management:
a. motivation of an attack
b. consequences of a disclosure
c. disclosure without malicious intent
d. how other data/knowledge might be linked to the data in question

**Anonymisation strategy** (serves as documentation)
1. evaluate the risks early in the project
2. describe the anonymisation measures

Goal: reduce the risk to an acceptable level (zero risk does NOT exist).

**Quantitative data anonymisation**
* **direct identifiers** alone are enough to identify a person (e.g. name, AVS number)
* **strong indirect identifiers** allow fairly easy identification (e.g. home address, phone number)
* **weak indirect identifiers** allow identification through combinations of variables

Basic approach:
* remove direct and strong indirect identifiers
* assess weak indirect identifiers and apply appropriate techniques

Specific anonymisation techniques:
* remove variables (*but* why was that information collected in the first place?)
* remove records (*but* why was that information collected in the first place?)
* mask characters
* pseudonymise
* generalise (turn detailed information into general information)

**Qualitative data anonymisation**
* rendering research participants anonymous by removing identifying information

Specific anonymisation techniques:
* replacing personal names with aliases
* categorising proper nouns
* removing sensitive information
* categorising background information
* changing values of identifiers

Mark the text that has been anonymised (e.g. @@ at the beginning and ## at the end). A maximum of information should be maintained (for research purposes) and unedited versions of the data should be kept for preservation.
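As a rough illustration of the alias replacement and @@...## marking described above, here is a minimal sketch (not from the workshop; the names, aliases and transcript are invented):

```python
# Sketch: replacing identifying strings in a transcript with aliases and
# marking the edited passages with @@ ... ## (all example content is made up).
import re

aliases = {
    "Marie Dupont": "@@Respondent A##",
    "Lausanne": "@@[Swiss city]##",
}

def pseudonymise(text: str, aliases: dict[str, str]) -> str:
    """Replace each identifying string with its marked alias."""
    for original, alias in aliases.items():
        text = re.sub(re.escape(original), alias, text)
    return text

transcript = "Marie Dupont explained that she moved to Lausanne in 2003."
print(pseudonymise(transcript, aliases))
# "@@Respondent A## explained that she moved to @@[Swiss city]## in 2003."
```

In line with the advice above, the alias list and the unedited transcript would be kept separately and securely for preservation.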
## 7. How much documentation is needed?
:uk: *Marieke Heers*

Documentation: any information that serves as a record of a research project and that renders data usable and meaningful.

**Why document? To make data publishable, discoverable, citable and reusable.** Clear and detailed documentation improves the overall data quality.

**What and how much to document?**
* information needed to work with the data
* purpose of the information
* will the data be shared with others?

Documentation should be part of the research project and updated on a regular basis.

**2 levels of documentation**
* project-level documentation (incl. how, by whom and when the data were collected)
* data-level documentation (incl. codebooks and everything that helps make sense of the data)

**Metadata**
In the social, behavioural and economic sciences, the standard used is the Data Documentation Initiative (DDI): https://ddialliance.org.

## 8. How to select which data to archive and share?
:uk: *Alexandra Stam*

**Strategic considerations**
* data selection
* long-term preservation (favouring open formats)

Incentives to share data: funders' policies, journals' policies, commitment to Open Science.

Risks associated with data sharing: social (e.g. discrimination), psychological (e.g. trauma), physical (e.g. injury) & economic (e.g. financial loss).

![](https://i.imgur.com/5z4Bdqy.png)

**Group discussion: Are fieldnotes data? Can they be shared?**
Yes, fieldnotes are data. Sharing them without any processing is not very useful, but a transcription could be. In any case, they can be preserved for their historical value. Notes can also be written down just as memory triggers; in that case they are of no use to anyone else and therefore not suitable for sharing.

Interesting reading: Jackson (2019).

## 9. What are the advantages of sharing data through a data infrastructure?
:uk: *Alexandra Stam*

Where to share data? In a FAIR and non-commercial repository:
* institutional repositories (trusted solution + no charge VS not sustainable for long-term access)
* general-purpose repositories like Zenodo or Figshare (wide audience + suitable for cross-disciplinary data VS may not meet requirements)
* journal supplementary material services (comply with the journal's requirements + data available alongside the findings VS may be costly + possible copyright transfer + no long-term preservation guarantee)
* trusted domain-specific data repositories (domain data management expertise + preservation and curation to community standards VS may be more selective + require advance planning to meet standards)

**How to choose a repository?** Make sure it offers:
* long-term preservation
* persistent identifiers
* visibility of data
* quality control (data & documentation)
* a catalogue for discovery
* dissemination capacity
* access control (open access, access upon registration, restricted access, access after an embargo period)

List of repositories: [re3data.org](https://www.re3data.org)

## 10. How can we improve data management support for researchers?
:uk: *Alexandra Stam*

* Provide regular trainings on top of on-demand support
* Provide researchers with DMP templates
* Offer institutional support and solutions (infrastructure)

:::info
:blue_book: FORS data management guides: forscenter.ch/publications/fors-guide
:clapper: FORS webinar series: forscenter.ch/data-management-webinar-series
:mailbox_with_mail: FORS newsletters: https://forscenter.ch/newsletters/
:open_file_folder: International resources: https://www.cessda.eu/Training/Training-Resources-/Library/Data-Management-Expert-Guide
:blush: FORS is available to provide support on complicated questions from researchers.
:::

## References

GOLLE, Philippe, 2006. Revisiting the uniqueness of simple demographics in the US population. In: *Proceedings of the 5th ACM workshop on Privacy in electronic society*. New York, NY: Association for Computing Machinery. 30 October 2006. pp. 77–80. WPES ’06. ISBN 978-1-59593-556-4. DOI [10.1145/1179601.1179615](https://doi.org/10.1145/1179601.1179615).

JACKSON, Jean E., 2019. “I am a fieldnote”: fieldnotes as a symbol of professional identity. In: SANJEK, Roger (ed.), *Fieldnotes: the makings of anthropology*. Cornell University Press. pp. 1–33. ISBN 978-1-5017-1195-4. DOI [10.7591/9781501711954-002](https://doi.org/10.7591/9781501711954-002).