# Resource review
These resources are listed roughly in order of either their priority in the strategy planning process (i.e. Initial Design Considerations) or the path data would take from collection, through integration / processing, to end-user access.
[Africa Health Expo - Presentation - 3IR b4 4IR: Data Accessibility to Enable African Data Science, ML and AI for Health (slides)](https://docs.google.com/presentation/d/1cZgUNBmyXjimofGww_Xwj3QtcO3CQuTGNfp67Jtsn5k/edit?usp=sharing)
1. Initial Design Considerations
1. Threat modelling
1. When dealing with any security or data privacy issue, co-developing a "threat model" can be very useful (and is often a prerequisite) for deciding how much effort is required to make a system or its data outputs safe. For this reason, I believe it is important to consider threat modelling early so that it can inform strategy generally. For our context, it seems that we would need to consider some of the highest-level threats, e.g. a cyberattack from another nation state (e.g. a North Korean ransomware attack). I think that some form of threat model should be captured somewhere so that we can check whether the measures we suggest are sufficient.
1. Resources
1. [OWASP Threat Modelling Project](https://cheatsheetseries.owasp.org/cheatsheets/Threat_Modeling_Cheat_Sheet.html)
1. [Threat Modelling Manifesto](https://www.threatmodelingmanifesto.org/)
1. [NIST Guide to Data-Centric Threat Modelling](https://csrc.nist.gov/files/pubs/sp/800/154/ipd/docs/sp800_154_draft.pdf)
1. [Threat modelling awesome list](https://github.com/hysnsec/awesome-threat-modelling)
1. Secure Data Environments (Trusted Research Environments)
1. A formalism from the research space, though industry has many of the same features built into existing products (e.g. any of these [cloud data platforms](https://www.thoughtspot.com/data-trends/cloud/cloud-data-platform)). Secure Data Environments use the 5 (or 7) Safes principles to guide system design ([paper](https://zenodo.org/records/5766513)).
1. Data sharing approaches
1. Resources
1. https://gh.bmj.com/content/8/10/e013092.long
1. Existing implementations
1. 
1. https://www.opensafely.org/
1. https://cidacs.bahia.fiocruz.br/
1. https://www.hdruk.ac.uk/
1. [Paper collection - Special Issue on Data Centres](https://ijpds.org/issue/view/13)
1. The Data Journey
1. Data Collection
1. 
1. Data Sharing Agreements
1. Resources
1. [MOU templater](https://adbex-template-mou-builder.streamlit.app/)
1. Data sharing APIs
1. 
1. 
1. Data Ingestion, processing and harmonisation
1. Data linkage
1. Privacy-preserving data linkage is a fairly mature space with a number of techniques to choose from. Below I have linked a fairly recent survey / overview paper, which should serve as a good introduction to the space. I have also included the open source data linkage tool from CIDACS. In their recent paper, the authors compare their tool to other state-of-the-art tools. I find these kinds of comparisons handy, as the authors have already done the work of becoming familiar enough with the current space to know what is state-of-the-art.
1. Resources
1. [A taxonomy of privacy-preserving record linkage techniques](https://mega.nz/file/gK5TTbIC#dm7QYkhnF7HWPUNn7-w3_FBvloDhZ-nGqwvMXn8W8Pk)
1. CIDACS open source data linkage tool - [paper](https://cidacs.bahia.fiocruz.br/artigo/cidacs-rl-a-novel-indexing-search-and-scoring-based-record-linkage-system-for-huge-datasets-with-high-accuracy-and-scalability/), [code](https://github.com/gcgbarbosa/cidacs-rl-v1)
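To give a feel for what "privacy-preserving" linkage can look like in practice, here is a minimal sketch of one widely described technique: encoding names as keyed Bloom filters and comparing the encodings with a Dice coefficient. This is not the CIDACS-RL method, and the function names, filter size, number of hashes and shared secret are all assumptions made for illustration. Basic Bloom filter encodings are also known to be open to re-identification attacks, so treat this as a teaching aid rather than a vetted design.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a normalised string into padded character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 1024, num_hashes: int = 4) -> set[int]:
    """Return the set of bit positions that `value` sets in a Bloom filter.

    Each bigram is hashed `num_hashes` times with a keyed hash (HMAC-SHA256),
    so only parties holding `secret` can produce comparable encodings.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            message = f"{k}:{gram}".encode("utf-8")
            digest = hmac.new(secret, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two sets of bit positions (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical shared key, agreed between data custodians out of band.
SECRET = b"example-shared-secret"

same = dice_similarity(bloom_encode("Maria Silva", SECRET), bloom_encode("maria silva ", SECRET))
close = dice_similarity(bloom_encode("Maria Silva", SECRET), bloom_encode("Maria Sylva", SECRET))
far = dice_similarity(bloom_encode("Maria Silva", SECRET), bloom_encode("John Smith", SECRET))
print(f"same={same:.2f} close={close:.2f} far={far:.2f}")
```

Each custodian would encode their own records locally and share only the encodings, so that a linkage unit can score candidate pairs without seeing the raw names. A similarity threshold for accepting a match would need to be tuned on real data.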
1. Data accessibility
1. 
1. Privacy Preserving methods
1. As mentioned in the meeting, data de-identification is one of the more fundamental privacy-preserving tools for sensitive data. However, how it and other tools get used is often very case-specific, depending on the nature of the data and the context. Below I have provided some resources on a standard privacy-preserving workflow and on how the impact of different data variables can be considered.
1. Resources
1. Introduction to privacy preserving methods
1. [[source]](https://www.pdpc.gov.sg/-/media/files/pdpc/pdf-files/advisory-guidelines/guide-to-basic-anonymisation-31-march-2022.pdf)
1. [[source]](https://fpf.org/wp-content/uploads/2017/06/FPF_Visual-Guide-to-Practical-Data-DeID.pdf)
1. Synthetic data
1. https://sdv.dev/ - good for synthetic data that is not 100% synthetic but synthetic enough to use in addition to physical, logical and governance process safeguards (e.g. rapid internal development, low manual overhead)
1. https://github.com/synthetichealth/synthea - good for data that needs to be 100% synthetic and publicly open access (notable manual overhead for creating probabilistic models)
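As a small illustration of how the impact of different data variables can be considered during de-identification, the sketch below measures the smallest group of records that share the same quasi-identifier values (a k-anonymity style check), before and after generalising age into bands. The records, field names and 10-year band width are invented for the example and are not drawn from any of the guides above.

```python
from collections import Counter


def generalise_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band, e.g. 34 -> "30-39"."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def smallest_group(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values.

    A result of k means every record is indistinguishable from at least
    k - 1 others on those fields; a result of 1 means someone is unique.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values()) if counts else 0


# Made-up toy records, for illustration only.
records = [
    {"age": 34, "district": "A", "diagnosis": "malaria"},
    {"age": 37, "district": "A", "diagnosis": "tb"},
    {"age": 52, "district": "B", "diagnosis": "malaria"},
    {"age": 58, "district": "B", "diagnosis": "hiv"},
]

# Generalise the age field while leaving the other fields untouched.
generalised = [{**r, "age": generalise_age(r["age"])} for r in records]

k_raw = smallest_group(records, ["age", "district"])
k_generalised = smallest_group(generalised, ["age", "district"])
print(f"k before generalisation: {k_raw}, after: {k_generalised}")
```

Which fields count as quasi-identifiers, and what value of k is acceptable, depends on the data and the threat model, which is why the workflow tends to be case-specific.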
1. National Strategies
1. [US National Strategy to Advance Privacy Preserving Data Sharing and Analytics](https://www.whitehouse.gov/wp-content/uploads/2023/03/National-Strategy-to-Advance-Privacy-Preserving-Data-Sharing-and-Analytics.pdf)
1. [UK National Data Strategy](https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy)
1. Frameworks
1. [US NIST Privacy Framework](https://nvlpubs.nist.gov/nistpubs/CSWP/NIST.CSWP.01162020.pdf)