# Possible scenarios for data management

*-context-*

Researchers often need access to the raw data to develop novel algorithms or to study other types of variation (CNVs, InDels, SVs, MEs, etc.). It is therefore important to establish what participants are comfortable with: which data will be available, to whom, and for what purposes. A solution will need to balance participants' need to control their data with researchers' need for sufficiently large datasets.

In all scenarios considered here, a user can always withdraw from the dataset, and their data will not remain available elsewhere. In every scenario, the data is protected from misuse by usage agreements and contracts.

## Data formats considered in these scenarios

- Raw data (.fastq.gz)
- Mapped data (.cram)
- Variants (.gvcf)
- Disease/treatment-relevant variant sets (annotated .gvcf)

## Data management models considered in these scenarios

- User
- Group (kahui/board/committee/hapu/iwi/..)
- ... ?

## Data locations considered in these scenarios

- Centralised
- Federated

----

## The 'walled garden' model

- A user authorises a group to decide which research questions can be asked of their data.
- The data is stored in a centralised repository, to which researchers are given login access.
- Two data levels are recognised, raw (fastq) and processed (gvcf), which can be accessed depending on research needs.
- For clinical purposes, a clinician enters only the variants relevant to the disease or treatment into the hospital system (no direct linkage).
- Researchers bring their own metadata.

| Pros | Cons |
| ---- | ---- |
| Data QC | No individual consent |
| Data availability | No interaction with participants |
| Homogeneous environment | Needs a party trusted by many groups to hold the data |
| One-time IT cost | |

----

## The 'health train' model

- A user grants access to their data on an individual basis.
- Data is stored in a federated manner (across many different devices).
- Several data formats would be recognised.
- A hospital would need to request the data, but could integrate it with its own systems (unlikely any time soon).

| Pros | Cons |
| ---- | ---- |
| Individual consent | Loss of 'wider perspective' |
| Ability to contact participants | Difficult to get a large enough dataset |
| Requires explanation of research to participants | Continuity of access |
| | Data availability / difficult to publish |
| | Many IT systems to maintain |
| | No one has successfully done this yet |

----

## The 'dynamic consent' model

- A user can provide dynamic consent, which a group compares against the proposed research questions.
- Data resides in a central repository, but access is federated.
- All data formats are recognised.
- If permitted, a federated linkage to limited health data could be implemented.

| Pros | Cons |
| ---- | ---- |
| Individual consent | Only a few successful examples |
| Data availability | Participants will need to be engaged / proactive |
| Data integration | Dynamic consent is a developing field |

---
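In the dynamic consent model, each participant's current consent is compared with a research question before their data is used. The following is a minimal Python sketch of how a group might pre-screen a request against machine-readable consent records. It assumes consent can be recorded as a set of permitted data formats and research purposes; the `ConsentRecord` and `ResearchRequest` classes, the `eligible_participants` function, the format labels and the example purposes are all hypothetical and not part of any existing system. The final decision still rests with the group.

```python
from dataclasses import dataclass

# Hypothetical labels for the data formats listed in this document.
FORMATS = {"fastq", "cram", "gvcf", "annotated_gvcf"}


@dataclass
class ConsentRecord:
    """A participant's current (dynamic) consent; hypothetical structure."""
    participant_id: str
    allowed_formats: set[str]
    allowed_purposes: set[str]
    withdrawn: bool = False


@dataclass
class ResearchRequest:
    """A proposed research question as submitted to the group; hypothetical."""
    purpose: str
    formats: set[str]


def eligible_participants(request, consents):
    """Return IDs of participants whose current consent covers the request.

    This is only a pre-screen: the group still compares consent with the
    research question and makes the decision.
    """
    unknown = request.formats - FORMATS
    if unknown:
        raise ValueError(f"unknown data formats: {sorted(unknown)}")
    return [
        c.participant_id
        for c in consents
        # Withdrawn participants are always excluded.
        if not c.withdrawn
        # The purpose and every requested format must be consented to.
        and request.purpose in c.allowed_purposes
        and request.formats <= c.allowed_formats
    ]


# Example with hypothetical participants and purposes.
consents = [
    ConsentRecord("p1", {"gvcf"}, {"cardiovascular"}),
    ConsentRecord("p2", {"fastq", "gvcf"}, {"cardiovascular", "pharmacogenomics"}),
    ConsentRecord("p3", {"fastq", "gvcf"}, {"cardiovascular"}, withdrawn=True),
]
request = ResearchRequest("cardiovascular", {"gvcf"})
print(eligible_participants(request, consents))  # ['p1', 'p2']
```

Because the check reads each participant's current record at request time, a consent change or withdrawal takes effect for the next request without altering the stored data.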