Steve Harris
---
section_title: approach
word_limit: 6000
images_accepted: true
references_accepted: true
guidance: >
  Explain how you have designed your approach so that it:
  - is effective and appropriate to achieve your objectives
  - is feasible, and comprehensively identifies any risks to delivery and how they will be managed
  - uses a clearly written and transparent methodology (if applicable)
  - summarises the previous work and describes how this will be built upon and progressed (if applicable)
  - will maximise translation of outputs into outcomes and impacts
  - describes how your, and if applicable your team’s, research environment (in terms of the place and relevance to the project) will contribute to the success of the work

  Within the Approach section we also expect you to:
  - demonstrate access to the appropriate services, facilities, infrastructure, or equipment to deliver the project
  - provide a project plan including milestones and timelines, in the form of an embedded Gantt chart or similar

  In addition to the above, you must clearly articulate in your application how your work will:
  - seek convergence around solutions, not divergence
  - identify and address clear community needs, and engage effectively with the user base, to support and enable cutting-edge biomedical, health and care research as appropriate
  - support reproducible research, with appropriate steps taken to ensure the reliability and robustness of design solutions
  - contribute to building data and digital capacity across regions and institutions, beyond the host organisations of the award
  - where relevant, address needs for patient, public and practitioner involvement and engagement and meet the six UK standards for public involvement
  - offer innovative training opportunities not available elsewhere
  - integrate plans for dissemination and uptake of the DRI, services or tools by the relevant community, including details of proposed access and usage
  - credibly plan for sustainability and legacy beyond the end of UKRI funding. This could include cost recovery models, securing additional funding, development or expansion after the initial period of funding, and address retention of suitable technical staff and user support
  - effectively address and seek to improve environmental sustainability (if applicable)

  Infrastructure support

  If applying for infrastructure support, we would expect you to provide:
  - details of robust governance (proportionate to the scale and complexity of the activity) such as identifying whether an external advisory group is needed
  - details of training and development of infrastructure staff and users
  - a credible description of the working environment in sufficient detail to understand its scope, services and operation
notes: >
  - boasts
  - count hospitals not trusts, and population and EHR systems
todos:
  - collaborate via HackMD
  - ==demonstrate access to the appropriate services, facilities, infrastructure, or equipment to deliver the project==
  - provide a project plan including milestones and timelines, in the form of an embedded Gantt chart or similar
  - ==L2R diagram to frame the concepts and terminology==
  - Laura’s comment about left to right and homework, maybe do this in note to reader at the beginning, or add to the opening figure
  - address concerns from Chris Russell
  - Operational efficiency for trials
  - Community of practice, how to do that
  - Not creating new infrastructure
---

## Notes to reader

1. The host sites bid together to host the national AI Research Resource (AIRR); please see the related applications section for details. Our bid, **Practitioner**, received a very positive response from the reviewers but the programme is paused pending the general election.
2. Please see the table below for abbreviations.

| Acronym | Explanation |
| --- | --- |
| AI | Artificial Intelligence |
| AIRRLOCK vs AIRLOCK | AIRRLOCK (this bid) vs the MHRA Airlock |
| DDaT | Digital, Data and Technology |
| DevOps | A mindset, culture and set of technical practices that supports the integration, automation, and collaboration needed to effectively develop and operate a solution |
| EHR | Electronic Health Records |
| EHR2EDC | Electronic Health Records to Electronic Data Capture systems |
| GenOMICC | Genetics of Mortality in Critical Care |
| GMLP | Good Machine Learning Practice |
| IaC | Infrastructure as Code |
| ICO | Information Commissioner's Office |
| L2R and R2L | Left-to-right and right-to-left |
| ML(4H) | Machine Learning (for health) |
| MLOps | Machine-Learning-Operations: a set of practices that combines the principles of DevOps with machine learning to ensure models are developed, tested, and deployed efficiently and reliably |
| MRC | Medical Research Council |
| NIHR | National Institute for Health and Care Research |
| OMOP CDM | Observational Medical Outcomes Partnership's Common Data Model[^[An open community data standard, designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence.](https://ohdsi.github.io/CommonDataModel/)] |
| SDE | Secure Data Environment, a TRE for purposes other than just research, used more commonly in the NHS |
| SNSDE | Subnational Secure Data Environment |
| TRE | Trusted Research Environment: consisting of a security framework with controls based on data risk, a secure computing environment, and an information governance framework managing access |
| Triple-M | Multi-modal, Multi-morbid, Multi-scale data as per the MRC’s review of biomedical data science[^[The Opportunity of Biomedical Data Science](https://www.ukri.org/publications/the-opportunity-of-biomedical-data-science/)] |
| WS and WP | Work stream and Work package |

---

# Our approach

We believe in the potential of biomedical and health data for better science and better care. Yet the richest multi-modal, multi-morbid and multi-scale data, generated during care interactions with hospitals, is also the most inaccessible. The promise of interoperability and federated learning has thus far failed to deliver. We are unable to simply lift and shift our data.

We have underestimated the complexity of the **first mile** of the data journey: **the interface between care delivery and secondary use**. The first is delivered by NHS Digital, Data and Technology (DDaT) teams, and the second is driven by researchers, population health specialists, trialists, and innovators. The first is responsible for safety and reliability. The second needs scale and standardisation. The complexities of this interface are replicated and magnified across different data modalities, vendors, and providers.
Appropriately, the concept that **complexity is at the seams** arises from John Gall, a physician *and* a systems theorist.[^ Gall, J. The Systems Bible. (General Systemantics Press, 2002)]

The solution is **socio-technical**. We need to connect these communities, and to provide an architectural pattern that recognises their different priorities. In brief, rather than seeing the movement of data from hospital to SNSDE as a single action, we interpose a staging zone inside the NHS.

- *Technically*, this staging zone comprises AIRRLOCK and a TRE. AIRRLOCK itself is a transit hub for data. It conforms to a ports-and-adapters architectural pattern. It uses the best parts of existing data pipelines but separates out the local NHS interface, removing restrictions on how NHS teams navigate local data extraction idiosyncrasies.
- *Socially*, the TRE connects the communities by enabling the *researcher-to-data* paradigm. Research teams can work alongside NHS DDaT specialists to support both the first mile of data extraction, and the last mile of algorithm deployment. It is a safe place to standardise and map data to concepts, to build features and to iteratively improve all pipelines. But it is also the best place to test and re-calibrate AI/ML models against local systems and local populations.
- *Socio-technically*, we contribute to a ‘digital third space’ for community and collaboration within and beyond the NHS. It serves as a knowledge repository for best practices, a code repository for the software-defined infrastructure, and a communication forum for the community. We connect to the HDR-UK gateway, and expand the NHS Artificial Data pilot[^ https://digital.nhs.uk/services/artificial-data] to operational and multi-modal data.

Yet moving data out of the hospital cannot be the ultimate goal. We must design with the round trip, from hospital to SDE and back again, in mind. This will unlock the original intersection of healthcare and science: clinical trials.
Operational inefficiency in trial delivery is called out in Lord O’Shaughnessy’s independent review.[^[ Commercial clinical trials in the UK: the Lord O’Shaughnessy review](https://www.gov.uk/government/publications/commercial-clinical-trials-in-the-uk-the-lord-oshaughnessy-review)] Properly implemented, these local data pipelines can improve efficiency by supporting EHR2EDC solutions for re-use of data in drug trials. Furthermore, as we finally enter the ‘Age of the Algorithm’[^Age of the Algorithm reference], they directly address a new opportunity: the need for ‘human-AI’ team monitoring and testing, and evaluation for efficacy and bias, outlined in the MHRA/FDA guidance for ‘Good Machine Learning Practice’ (GMLP). That ‘human-AI’ team is the NHS and research community that we address.

# Our ‘North Star’

As per the bid guidance, we “*build on ... research where it provides a useful exemplar to test and demonstrate the effectiveness of solutions*”.[^reference to application] Specifically, we address the challenges of supporting NHS operations and biomedical science that were manifest during the COVID-19 pandemic, using the perennial problem of infection in cancer as our ‘peacetime’ use case.

## An urgent need, an important alignment, and a scientific opportunity

### An operational and research need

At no time have the challenges of working with hospital data been more apparent than during the COVID-19 pandemic. The UK led the world in trials and genomics, but did so with almost no access to clinical data. The RECOVERY platform trial[^[Dexamethasone in Hospitalized Patients with Covid-19. NEJM, 2021](https://doi.org/10.1056/NEJMoa2021436)] rapidly evaluated treatment strategies. The GenOMICC study explained molecular mechanisms of disease using host genomics.[^PMID: 33307546; PMID: 35255492; PMID: 37198478] GenOMICC revealed a new therapy: baricitinib.
This was in turn adopted into RECOVERY and shown to significantly reduce mortality.[^PMID: 35908569]

Whilst both studies depended on the NHS and NIHR *research* infrastructure, neither could access the rich and deep Triple-M *clinical* data in the NHS. Opportunities to understand treatment variation, and to prioritise and stratify based on clinical phenotypes, were lost. Instead, both resorted to short web forms for data collection: fewer than thirty clinical data points were available in either.

Inefficiency was rife. The UK’s ISARIC4C study was essential to the UK clinical response to COVID-19. It tracked the impact[^PMID: 32444460] and predicted severe disease.[^https://doi.org/10.1136/bmj.m3339] But it publicly thanks the “***2,648** frontline NHS clinical and research staff and volunteer medical students*” who collected data *by hand*. Similarly, whilst the ISARIC4C prediction model outperformed nearly all of the 700 models published in the first year of the COVID-19 pandemic, neither it nor any NHS model included multi-modal data.[^BMJ, PMID 32265220]

In 2022, this same pattern was repeated during the surge of life-threatening hepatitis in young children. Whilst GenOMICC rapidly identified a completely new disease-causing organism, data was manually collected and disease phenotyping remained rudimentary.[^https://doi.org/10.1038/s41586-023-05948-2, https://doi.org/10.1038/s41586-023-06003-w]

### An alignment: pandemics in peacetime

Dormant infrastructure decays. We hope pandemics are infrequent, but productive activity is essential to maintain effective systems. So here we align to the challenge of **infection in cancer**. This is a key cause of morbidity and mortality in its own right.
The imperative to promptly treat these vulnerable patients already drives **anti-microbial resistance** (prior to empiric antibiotic strategies, infection was responsible for up to 70% of mortality).[^ neutropaenic acute leukaemia - reference needed] But it also acts as a **canary-in-the-mine**. Stretching the metaphor, the hospitals (the ‘mine’) are often the first, and typically the only, care providers for these emerging and high-risk diseases (Seoul hantavirus [Glasgow], Viral Haemorrhagic Fevers [Royal Free], and Severe Acute Respiratory Syndromes including H1N1). And the multi-factorial immune compromise arising with cancer sadly renders these patients (the ‘canaries’) vulnerable to chronic infection, in which viral variants evolve and emerge.[^ Kemp, S. A. et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature 592, 277–282 (2021)]

### An opportunity: genomes and AI Foundational Models

We have partnered with two internationally recognised academic groups to prove the relevance of, and provide focus for, our work. The first builds Foundational AI models for health, and the second is the team behind much of the pandemic science described above.

1. Foundational models are AI models trained on a broad range of data at scale that are adaptable to a wide range of downstream tasks. Alexander and Dobson (co-applicants) led the development of **RETFound**[^ Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).] (trained on 1.6 million retinal images, and able to detect multiple diseases across diverse populations) and **Foresight**[^ Kraljevic, Z. et al. Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. The Lancet Digital Health 6, e281–e290 (2024).] (trained on free text from >800k medical records, and able to predict disease trajectories).
2. Host and pathogen genomics are the key to identifying mechanistic targets for therapeutic drug development. Baillie (GenOMICC Chief Investigator) and Thomson (partners) respectively lead the Pandemic Science Hub in Edinburgh and the Centre for Viral Research in Glasgow. GenOMICC holds 26,710 human genomes and is on track to reach 100,000. It led the global effort to explain mechanisms of COVID-19 through host genetics. It was the first to discover more than 75% of the genetic variants associated with COVID-19.[^[Pairo-Castineira, E. et al. GWAS and meta-analysis identifies 49 genetic variants underlying critical COVID-19. Nature 1–15 (2023)](https://pubmed-ncbi-nlm-nih-gov.libproxy.ucl.ac.uk/33307546/)]

## A SMART objective

We bring together this need, alignment and opportunity to drive the construction of AIRRLOCK. We will build an integrated data pipeline that will (1) prepare Triple-M data for a foundational model of the lung, and (2) provide the same to GenOMICC to connect, for the first time, deep clinical phenotypes with host genomics at scale.

We undertake to bring three or more data **modalities**, from three or more of our partner **sites**, for three **hundred** or more *consecutive* ICU admissions, to three or more **SDEs**.

This task is our ‘North Star’, orientating the development of our community of practice and our open technology platform. We choose the word consecutive deliberately: it means this is not a one-off research task but the start of an ongoing, sustainable connection between the NHS and “cutting edge data science intensive research”[^reference to application] across the UK.

# Our solution: AIRRLOCK

We describe below the three components: AIRRLOCK itself, the hospital TRE, and the ‘digital third space’.
## Technical design

### Definition

AIRRLOCK itself is a standardised **architecture** and a cohesive set of reference implementations of the **policies and processes**, and enabling technologies, for extracting multi-modal data from hospital systems and transferring these data to validated destinations.

- **We are not building new data pipelines**, but *refactoring* ‘best of breed’ instances for deployment by matching the implementation to the workforce skills and technical resources of NHS providers.
- **We are not building new data assets**, but *connecting* data in operational systems to the national SDEs.
- **We are not building for research alone**, but recognising the *synergy* of research data pipelines across NHS operational demands, clinical trials, and data for deploying AI/ML models (inference not training).

An AIRRLOCK instance is a:

- **transit hub** for data. It acts as a staging post between existing data extraction pipelines (e.g. EHR, imaging, free text) and Trusted Research Environments. It acts as a standardised bridge between heterogeneous inputs from hospital data systems, and either hospital TREs or the NHS Sub-national SDEs.
- **single extraction** route for data into NHS *sub-national SDEs*. It reduces the complexity of the interactions between data providers and consumers, and accelerates participation in the SNSDE networks.
- **turnkey appliance** embedded within the perimeter of the existing technology estate of an NHS trust.

To distinguish an AIRRLOCK from a full-suite TRE and a general purpose SDE, we will describe an AIRRLOCK as a _secure data enclave_.

### Architecture

![](https://hackmd.io/_uploads/SyLil0UrC.png)

==repeat diagram without AIRRLOCK and as a social / governance overlay==

AIRRLOCK will conform to a **ports-and-adapters** architectural pattern. The system design allows for loose coupling and interchangeability of components. It enables a strategy where we leverage existing data extraction pipelines (e.g. EMAP, OMOP-ES, PIXL, CogStack), yet place no restrictions on how NHS digital teams navigate local data extraction idiosyncrasies, as this may require custom extraction pipelines.

- *data ingress*: AIRRLOCK will expose a port for each data modality. The port will provide an interface and require the implementation of an adapter by each data extraction pipeline. The AIRRLOCK team will work with existing data extraction pipeline teams to develop adapters that conform to the ports exposed by AIRRLOCK. The team will foster a community and establish standards to support the development of further data extraction pipelines.
- *data monitoring*: AIRRLOCK supports data quality and safety. Examples include monitoring for data drift, merging local patient identifiers, and vendor/provider specific anonymisation tasks.
- *data egress*: AIRRLOCK will negotiate the terms for a standardised interface to AIRR and implement the adapter required for connecting to the interface exposed by AIRR. AIRRLOCK will also develop an abstract interface for connecting to SNSDEs and provide implementations for export to partner site SNSDEs.

## The hospital TRE

Mapping local data to interoperable standards such as SNOMED, or the OMOP Common Data Model, is a Sisyphean task. Consider glucose measurements as an example. Do we measure glucose in blood or urine, by laboratory test or at the point of care, and is the sample fasting, fed, or random? Now layer on the many vendors of EHRs, laboratory and imaging systems, and the many hospital implementations. We argue that data centralisation alone will fail, as data must be imbued with local context to be valuable. That is done best by those closest to the source. This task is human. It can be accelerated but not replaced by technology. But it can’t be done by the local NHS alone.

A hospital TRE connects researchers with NHS staff. It enables collaboration on data extraction and standardisation. It provides a safe environment for recalibration of AI models to local populations.
It is where the ‘human-AI’ team, composed of NHS staff working alongside academic and industry experts, can safely maintain and iteratively improve models. The hospital TRE is an essential complement to the SNSDE programme. It is not for data access, but to accelerate data provision.

### Practitioner versus public cloud

The name AIRRLOCK is both a metaphor and a portmanteau: **AIRR** because the host sites bid together to host the national [AI Research Resource](https://www.ukri.org/opportunity/host-sites-for-the-next-wave-of-uk-government-ai-infrastructure/), and **LOCK** because we need to tidally lock both the NHS and the research communities, and the tasks of preparing data for model development and model deployment.

Our bid to UKRI/DSIT was named Practitioner. This bid, AIRRLOCK, marries perfectly with the vision of Practitioner to host the UK’s National Compute and Data Facility for Artificial Intelligence Research and Innovation. Practitioner focuses on supercompute for Sensitive and Personal Data. It follows the Standardised Architecture for TREs ([SATRE](https://satre-specification.readthedocs.io/en/stable/)) specification co-developed with [DARE](https://dareuk.org.uk) and [HDR-UK](https://www.hdruk.ac.uk). Our ambition is to see the hospital TREs provided by Practitioner.

TREs hosted on public cloud (e.g. [Azure TRE](https://microsoft.github.io/AzureTRE/latest/), [TREEHOOSE](https://discovery.dundee.ac.uk/en/projects/tre-in-a-box-for-the-aws-environment-treehoose)) are a technical alternative, but they do not come with ISO27001 certification, and require local provision of a compliant Information Security Management System (ISMS). This is an additional burden on NHS providers. Practitioner would reduce cost to the public by providing a common ISMS, and reduce the cost of renting compute from commercial providers.
These benefits are aligned to Recommendations 6 (scale) and 7 (sustainability) of Professor Ghahramani’s [Independent Review of The Future of Compute](https://www.gov.uk/government/publications/future-of-compute-review/the-future-of-compute-report-of-the-review-of-independent-panel-of-experts), commissioned by the Department for Science, Innovation and Technology.

Nonetheless, whilst our bid received a very positive response from the reviewers, the programme is paused pending the general election, and public cloud remains a realistic alternative aligned to the NHS ‘cloud first’ [policy](https://digital.nhs.uk/data-and-information/looking-after-information/data-security-and-information-governance/nhs-and-social-care-data-off-shoring-and-the-use-of-public-cloud-services/guidance#use-of-cloud-computing-services) for public sector IT.

## Social design

AIRRLOCK is a ‘digital third space’ for community and collaboration across and beyond the NHS. It will serve as a knowledge repository for best practices, a code repository for the software-defined infrastructure, and a communication forum for the community.

==WIP==

# Principles, Aims, and Objectives

# ==Work plan==

Our programme delivery builds on the experience of the team in managing large technical infrastructure projects at the boundary between the research community and the NHS. UCL’s Advanced Research Computing Centre will act as the coordinating centre, bringing architectural experience from the SATRE programme, and cohesion with the development and deployment of Practitioner.

We have a realistic delivery plan over 3 years with 6 sites (3 lead, and 3 fast-follower). We assume a 3-month set-up period, 2 years of co-working, and 9 months to transition to a sustainable, self-supporting, vibrant and growing solution.
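To make the ports-and-adapters vocabulary used in the workstreams concrete, the following is a minimal sketch in Python. It is illustrative only, not the AIRRLOCK design: every name here (`Record`, `IngressPort`, `EgressPort`, `Enclave`, `LabPipelineAdapter`, `InMemoryDestination`) and the shape of the data are assumptions made for the example. The point it shows is that the enclave core depends only on the two port interfaces, so extraction pipelines and destinations can be swapped without changing it.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Record:
    """One unit of data for one participant (hypothetical shape)."""
    participant_id: str
    modality: str  # e.g. "structured", "imaging", "free_text"
    payload: dict


class IngressPort(Protocol):
    """Port: the interface exposed to data extraction pipelines."""
    def receive(self, record: Record) -> None: ...


class EgressPort(Protocol):
    """Port: the common operation for exporting to a destination."""
    def export(self, records: Iterable[Record]) -> int: ...


class Enclave:
    """Core: validates and stages records, unaware of any specific
    pipeline or destination."""

    def __init__(self, modalities: Iterable[str]) -> None:
        self._modalities = set(modalities)
        self.staged: list[Record] = []

    def receive(self, record: Record) -> None:
        # Validation only; no further processing happens at ingress.
        if record.modality not in self._modalities:
            raise ValueError(f"no port for modality {record.modality!r}")
        if not record.participant_id:
            raise ValueError("record has no participant identifier")
        self.staged.append(record)

    def release(self, destination: EgressPort) -> int:
        """Push all staged records to a destination; returns the count."""
        return destination.export(list(self.staged))


class LabPipelineAdapter:
    """Adapter: maps a (hypothetical) pipeline's native rows onto the port."""

    def __init__(self, port: IngressPort) -> None:
        self._port = port

    def push(self, rows: Iterable[dict]) -> None:
        for row in rows:
            self._port.receive(Record(
                participant_id=row["patient"],
                modality="structured",
                payload={"test": row["test"], "value": row["value"]},
            ))


class InMemoryDestination:
    """Adapter: stand-in for a TRE or SNSDE endpoint; collects records."""

    def __init__(self) -> None:
        self.received: list[Record] = []

    def export(self, records: Iterable[Record]) -> int:
        batch = list(records)
        self.received.extend(batch)
        return len(batch)


if __name__ == "__main__":
    enclave = Enclave(modalities=["structured", "imaging", "free_text"])
    LabPipelineAdapter(enclave).push([
        {"patient": "P001", "test": "glucose", "value": 5.4},
        {"patient": "P002", "test": "glucose", "value": 7.1},
    ])
    destination = InMemoryDestination()
    print(enclave.release(destination), "records exported")
```

Under these assumptions, supporting a further extraction pipeline or a further SNSDE means writing one more adapter class against the same port, which is the property the ingress and egress workstreams rely on.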
## Workstream 1: Ingress

| Summary | Work Packages | Description | Outcome |
| --- | --- | --- | --- |
| Workstream 1 is focused on the design of data ingestion interfaces for imaging, free-text and structured data. This workstream involves close collaboration with existing teams in the NHS and academia who develop data extraction tools. | 1.1 Ports | Design and specify the ports which define how Triple-M data will be received into AIRRLOCK. Implement the ports inside AIRRLOCK as performant, low-latency services. | An agreed design of ports and adapters for transferring Triple-M data, with functional implementations for the existing pipelines as well as clear guidelines for how these can be implemented by the community for other extraction pipelines. Triple-M data can be pushed from the existing data extraction pipelines across the boundary into AIRRLOCK. This workstream does not involve further processing of data beyond validation. |
| | 1.2 Adapters | Co-develop adapters with the existing Triple-M data extraction pipelines which fit into the AIRRLOCK ingestion ports. | |

## Workstream 2: Enclave

| Summary | Work Packages | Description | Outcome |
| --- | --- | --- | --- |
| Workstream 2 is focused on developing the data processing and management capabilities at the core of AIRRLOCK. | 2.1 Tunnel | Design and specify the ports which define how Triple-M data will be received into AIRRLOCK. Implement the ports inside AIRRLOCK as performant, low-latency services. | A data processing pipeline which can link Triple-M participant data received via the individual ports. Accompanying the pipeline is an administrator-accessible front-end which visualises processing metrics in real-time. A web application with strong security is used to define a study and specify identifiers for participants in the study. The application monitors the processing pipeline metadata and indicates for each participant whether their data has been ingested. This workstream does not involve building the capability to pull data from AIRRLOCK, and at this stage relies on out-of-band communication of participant identifiers to each data extraction pipeline as well as the execution of those pipelines. |
| | 2.2 Monitoring | Develop deep monitoring, tracing and logging, with extensive visualisation of every step, in parallel. | |
| | 2.3 Portal | Co-develop adapters with the existing Triple-M data extraction pipelines which fit into the AIRRLOCK ingestion ports. | |

## Workstream 3: Egress

| Summary | Work Packages | Description | Outcome |
| --- | --- | --- | --- |
| Workstream 3 is focused on developing the functionality for safe and secure export of data from AIRRLOCK. This workstream involves close collaboration with the teams operating the Practitioner TREs in AIRR and the NHS Sub-national SDEs. | 3.1 Hospital TRE | Co-design a port and adapter with the AIRR team for automated secure transfer of data from AIRRLOCK into Practitioner TREs. Implement the adapter in AIRRLOCK. | A high-level secure data transfer protocol is agreed upon with the Practitioner TRE team for ingesting data safely and in a standardised, automated manner from AIRRLOCK. This protocol is implemented inside AIRRLOCK, and Triple-M data can be pushed over a secure connection to a specific TRE instantiated inside Practitioner. An interface exists which encapsulates common operations for exporting data to an SNSDE, and implementations are available for three SNSDEs. Triple-M data can be pushed to SNSDEs by performing operations against this interface. |
| | 3.2 SNSDE | Design an interface which abstracts the common operations for sending data to an SNSDE. Implement this interface for the three partner site SNSDEs. Provide guidance to the community on how to implement this interface for other SNSDE designs. | |

## Governance

## ==Tenets==

1. Partnership and collaboration / digital third space

## Delivery

# Resources

AIRRLOCK is a partnership with support from the people, process, and technology necessary for our success. We evidence national support from NHS hospital leadership, partner SDEs, technology teams, researcher groups, and PPIE representatives.

## NHS Trusts and partner Secure Data Environments

We are grateful to have the explicit support of Chief Executives from six major NHS Trusts who together represent 15 hospitals, including academic and community, women and children, and serve a population of more than 10 million people.
We have the support[^We have statements of support available from the CEOs at Manchester and Great Ormond Street NHS Trusts but as co-applicants, we are not permitted to include the letters with the application] of

- David Probert, CEO, University College Hospital London
- Matt Shaw, CEO, Great Ormond Street
- Kirsten Major, CEO, Sheffield Teaching Hospitals
- Sam Higginson, CEO, Royal Devon & Exeter
- ==XX==, CEO, Newcastle upon Tyne Hospitals
- Mark Cubbon, CEO, Manchester University NHS Foundation Trust

Alongside these trusts, we have engaged with the 5 partner Secure Data Environments across England. We enclose letters of support from

- OneLondon
- Yorkshire & Humber
- Great Western
- North West and Greater Manchester
- North East & North Cumbria

We have also obtained support from Glasgow, and the West of Scotland SDE as a fast-follower.

## People

We will build three stakeholder juries who will determine whether the community and infrastructure we build meets our stated aims. These are more than advisory boards: they are resources that determine our permissions to work, and hold us to account in delivering our objectives. It is for this reason that we have named them ‘juries’. They are drawn from NHS leadership, from academia, and from patient and public representation. Reconciling their different demands is key to our success.

### NHS Leadership Jury

The CEOs from our NHS partner sites will act as rotating Chairs of our NHS leadership jury. Their commitment is evidence of the importance of laying the foundation for the all-important return trip where insights from data may improve the care of their patients. The CEOs are supported by technical digital leadership of the partner trusts to hold us to account for building a system which is safe, affordable, and maintainable by the NHS DDaT workforce, and welcome (UCLH: Mark White, CTO; Sheffield: Steven Wood, Head Scientific Computing; Manchester: David Walliker, CIO etc.)
alongside the engineering and architectural leadership of UCL ARC (Co-I: Hetherington).

### Scientific Jury

We are building in service of Biomedical Data Science. To that end, we have partnered with senior academic leadership both from amongst our co-applicant team, and from our project partners. We enclose letters of support from

- Professor Kenneth Baillie, University of Edinburgh’s Pandemic Science Hub[^[Baillie Gifford Pandemic Science Hub](https://psh.ac.uk)]
- Professor Emma Thomson, MRC University of Glasgow, Centre for Virus Research[^[MRC-University of Glasgow Centre for Virus Research](https://www.gla.ac.uk/research/az/cvr/)]

Kenny and Emma will join the co-applicant team including Professors Danny Alexander, Becky Shipley, Laura Shallcross, Richard Dobson and James Hetherington. We specifically ask this jury to hold us to account for building a system which delivers usable Triple-M data at scale and in a timely manner.

### Public, Patient and Practitioner Jury

We are also grateful for the support of key patient and public representatives from our partner sites. Our approach is designed to ensure that future engagement is representative of all relevant viewpoints, and informative enough to guide decision-making. There is a significant body of existing literature on public attitudes to privacy[^[Kalkman, J Med Ethics (2022)](https://doi.org/10.1136/medethics-2019-105651)], uses of data for research,[^[Woolley, JAMA (2005)](https://doi.org/10.1001/jama.294.11.1380)] and clinical science in outbreaks and pandemics. Without careful study of history, there is a tendency to re-discover old knowledge.[^[Arnold-Foster, BMJ (2021)](https://doi.org/10.1136/bmj.n1888)] For this reason, we will ask our third jury to hold us to account in areas not typically addressed. Specifically,

- how to operationalise the ‘researchers to data’ paradigm.
- how to maturely communicate the risks and benefits of sharing synthetic data and code

This jury will be chaired by Mr Amir Hashmi (Chair of the Data Trust Committee at UCLH) working alongside Hannah Davies (Deputy CEO, Northern Health Science Alliance Ltd), Steve Sweeney (Manchester Foundation Trust, Data Access Committee), Diedre Leyton (Great Ormond Street PPIE Lead) and representation from Newcastle and Exeter.

## Technology

We are not moving from a standing start. We bring together leaders in specific data modalities but for the first time propose to embed their existing work in a sustainable and scalable architecture. We have significant experience in developing and deploying in partnership with the NHS. Initial work will prioritise three modalities but engage and support three more fast followers.

### Structured (tabular) EHR data

We have already collaborated with EHDEN[^ https://www.ehden.eu] to standardise 250,000 patients' data from UCLH and Great Ormond Street. Moreover, we have extended this model to support five of the NIHR Health Informatics Collaborative themes (critical care, viral hepatitis, hearing health, myeloma, and transfusion dependent anaemia) in collaboration with Cambridge, and co-applicants Exeter and Manchester. This work is aligned with HDR-UK and the NHS SDE programme.

### CogStack, free text and unstructured data

UCL is a founding member of CogStack which provides natural language processing (NLP) for clinical coding and anonymisation. We have processes in place in compliance with ICO standards to prepare free text extracts for research. We will install the full CogStack platform in all partner sites.
### Medical imaging

Both UCL[^[UCL Centre for Medical Image Computing](https://www.ucl.ac.uk/medical-image-computing/software-and-resources)] and Sheffield[^[POLARIS (Pulmonary, Lung and Respiratory Imaging Sheffield)](https://www.sheffield.ac.uk/polaris)] have deep experience in medical imaging (especially for respiratory disease), with a wealth of open-source software and resources including ISO-13485 Quality Management Systems[^ UCL Medical Software Quality Management System (QMS)] and MHRA regulated hyperpolarised xenon for clinical MRI scans. In the last 12 months at UCL, our [PIXL pipeline](https://github.com/UCLH-Foundry/PIXL) has prepared 100k+ NHS chest x-rays, 5k+ MRI prostate scans, and a similar number of MRI scans for multiple sclerosis integrated with free text and linked OMOP EHR data.

### Fast follower technologies

- **Digital AMR**: ==Shallcross==
- **Pathogen genomics**: in collaboration with UCLH’s Advanced Pathogen Diagnostics Unit[^[NGS of whole viral genomes](https://www.uclh.nhs.uk/our-services/find-service/pathology-1/pathogen-genomics-unit)] we will scope data connections to incorporate viral genomes from clinical samples.
- **Histopathology**: in collaboration with the Research Department of Pathology (Levine, UCL Cancer) we will scope pipelines for the transfer, curation and analysis of whole-slide images (WSI). At UCLH alone, we expect to see c. 400,000 WSI/year. This builds on existing work (under Section 251 approval) to develop a pathology LLM.
- **Environmental data**: in collaboration with the EPSRC Digital Health Hub for Antimicrobial Resistance, we will scope data relating spatio-temporal markers to viral transmission among healthcare workers[^ [Wilson-Aggarwal, J. K. et al. Assessing spatiotemporal variability in SARS-CoV-2 infection risk for hospital workers using routinely-collected data.
PLoS One 18, e0284512 (2023)](https://pubmed-ncbi-nlm-nih-gov.libproxy.ucl.ac.uk/37083855/)]
- **Waveforms and wearables**: in collaboration with the EPSRC CHIMERA[^[Collaborative Healthcare Innovation through Mathematics, EngineeRing and AI](https://www.ucl.ac.uk/chimera)] programme at UCL we will scope data connections to high resolution physiological waveforms from ventilators, patient monitors and wearables

## Research governance approvals

We already work under the aegis of the NIHR Health Informatics Collaborative, and prepare data for the myeloma, transfusion dependent anaemia, and hearing health themes for AIRRLOCK partners (Manchester, Exeter, UCLH). The critical care theme has Section 251 approval from the Confidentiality Advisory Group. We will extend these governance arrangements to all sites to support testing, deployment, and reaching our North Star objectives. Moreover, GenOMICC is a *consenting* study with existing approval to link the genomes in this unique asset to health data across the patient’s life course.

==------------------------==
👆New stuff
==REVISION PROGRESS MARKER==

- 2024-06-14 work in progress
- I'm inserting this new text, rather than replacing (for now).
- This will preserve older comments but allow feedback on the revised structure.

👇Old stuff
==------------------------==

## The problem: data and pandemics

![image](https://hackmd.io/_uploads/SyrrgbrBR.png)
<!-- left2right.png -->
<!-- title: Moving data left to right, from the NHS to research teams -->
<!-- Alt: A diagram showing healthcare in the NHS on the left, and data for research moving to a subnational SDE on the right -->

### The data

Lord O’Shaughnessy calls out in his independent review of clinical trials in the UK that

> ... to be a science superpower, we have to use every asset at our disposal ... we have the workforce, the scale, the data, the science base ... but arguably **none is more significant than the NHS**.
[^[ Commercial clinical trials in the UK: the Lord O’Shaughnessy review](https://www.gov.uk/government/publications/commercial-clinical-trials-in-the-uk-the-lord-oshaughnessy-review)]

Simultaneously, the MRC/NIHR in this funding call seeks “access to reliable, annotated, and usable data ...[for] ... advanced statistical techniques, large language models and deep learning.” With such access, the opportunity to support basic science and clinical trials and to *directly* improve care is enormous. The NHS has committed to providing access to these data in a network of Secure Data Environments.

And finally, the MHRA alongside the FDA is building guidance for ‘Good Machine Learning Practice’ (GMLP) for AI/ML models. This requires attention to the ‘human-AI’ team, monitoring and testing, and evaluation for efficacy and bias.

**NHS hospitals are at the centre of all three priorities**: trials, data, and AI. They are the dominant interface to NIHR clinical trial delivery. They are the key to multi-modal data access because they uniquely host imaging, laboratory, and interventional services that generate these data. And the complexity of care provision involving medical imaging, digital pathology, patient monitoring, and integrated electronic health records also means that they will be the natural springboard for AI/ML for health.

This intersection is not just about the data assets but the NHS Data, Digital and Technology workforce. Extracting value from these data is an enormous task. But it is a human activity most effectively done by those closest to the data’s source. We need to partner with these professions rather than 'throw the data over the wall'. They are best positioned to both add value to the data, and support the translation, testing and trialling of AI safely into practice. But typically, they work in a separate domain in isolation from each other and separately from researchers, often using different tools and with different infrastructure.
We can meet these challenges in series, or in parallel. That is, we can move the data ‘left-to-right’, from the NHS to researchers and innovators, and then separately and subsequently manage the translation of the AI/ML models 'right-to-left' from researchers and innovators into production back in our hospitals. Or we can recognise the ‘left-to-right’ and ‘right-to-left’ pathways are two sides of the same coin: one side prepares data for model development, and the other for model deployment. Tidally locking these two processes at the outset will not only improve the efficiency of the system but it will unlock the opportunity to test and trial these interventions. NHS hospitals should not be just data sources, but development partners. We provide an unparalleled opportunity to generate safety and efficacy data to support the translation of models through regulatory approvals. Because care is free at the point of access, we can robustly evaluate for bias. And because we have an established clinical trials infrastructure, we can test and evaluate, generating value, and improving care. These are the foundations for the path across the AI chasm, from theory into practice. ### The pandemic At no time have these challenges been more apparent than during the COVID-19 pandemic. The UK’s RECOVERY trial and the GenOMICC study led the world’s scientific and clinical response: RECOVERY[^[Dexamethasone in Hospitalized Patients with Covid-19.NEJM,2021](https://doi.org/10.1056/NEJMoa2021436)] rapidly evaluated treatment strategies. GenOMICC explained molecular mechanisms of disease using host genomics,[PMID: 33307546; PMID: 35255492; PMID: 37198478] rapidly pointing to a new treatment: baricitinib. 
As a direct result, this drug was almost immediately included in the RECOVERY trial, and shown to significantly reduce mortality.[PMID: 35908569] Both studies depended on the NHS and NIHR *research* infrastructure, but neither touched the rich and deep Triple-M *clinical* data in the NHS. In fact, NHS data never moved left-to-right: both studies used short web forms for data collection - fewer than 30 clinical data points in GenOMICC, and a similar number in RECOVERY.

More than 700 prediction models were published in the first year of the COVID pandemic but not a single NHS model included multi-modal data.[^BMJ, PMID 32265220] Without access to imaging and structured and unstructured data, the phenotypes of disease are crude, and the opportunity to translate insights into practice is blocked. We cannot stratify treatments, personalise medicine, or use AI/ML to support operational or clinical decisions.

The UK’s ISARIC4C study was essential to the UK clinical response to Covid - tracking the impact[PMID: 32444460] and predicting severe disease[https://doi.org/10.1136/bmj.m3339] - and during the 2022 surge in cases of life-threatening hepatitis in young children, enabled rapid identification of a completely new disease-causing organism.[https://doi.org/10.1038/s41586-023-05948-2,https://doi.org/10.1038/s41586-023-06003-w] Astonishingly, ISARIC4C data was collected entirely by humans, both for the hepatitis outbreak, and from 303,521 COVID patients.
In each case, a research nurse, medical student or a member of clinical staff manually transcribed clinical data from one electronic system (the patient's medical record) into another (the case report form).[^[“2,648 frontline NHS clinical and research staff and volunteer medical students”.](https://isaric4c.net/about/authors/)] Similarly, GenOMICC has so far obtained consent and DNA from 26,728 patients but had no access to either imaging or the deep clinical phenotypes from NHS Electronic Health Records.[^https://genomicc.org/countries/uk/recruitment/] ## The proposal ![image](https://hackmd.io/_uploads/BkvtxWHr0.png) <!-- right2left.png --> <!-- title: AIRRLOCK: moving data left to right, supporting translation, testing and trials of innovations moving right to left --> <!-- Alt: A diagram showing healthcare in the NHS on the left with data flowing out via a staging TRE to SDEs on the right, but accepting AI/ML models from researchers and innovators for testing and trialling back in the NHS --> AIRRLOCK is our answer to this missed opportunity, and the platform on which we unlock the potential of patient data in partnership with care providers. Our team will focus on the hardest ‘first mile’ of the data journey out from the hospital whilst laying the foundation for the complete round-trip. AIRRLOCK is the interface between the forthcoming £500 million plus investment into Artificial Intelligence Research Resource and the rich, complex data sets held by NHS hospitals. Previously, the collaborating sites bid together to host the UK’s National Compute and Data Facility for Artificial Intelligence Research and Innovation with Sensitive Personal Data.[^Refer to related applications section] We now come together again in a wider partnership across the NHS to bring academic research teams to work alongside direct care teams. 
AIRRLOCK deploys *staging* TREs into NHS hospitals, and builds a national data and algorithm stewardship community through technology and team science. AIRRLOCK is the connection between the live hospital system, and the staging TRE. Alternatives are either monolithic, often commercial and result in vendor lock-in, or fragmented, unsustainable, and without architectural cohesion. We will refactor existing 'best of breed' data pipelines for imaging, for EHRs, for free text, genomics, digital pathology and more so they are usable across different hospitals. And because technology alone will not sustain, we build a community that comes together to sustain and improve these solutions. By working with academic and NHS teams, we move beyond Reproducible *Analytical* Pipelines[^[Goldacre Review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis)] to infrastructure that is focused on the safety and monitoring necessary for translation.

## Principles

Our bid is built on the following three principles:

1. **Data preparation is not a data problem**. It is a complex, multidisciplinary endeavour that requires a convergence of people, process, and technology. Without care, moving data out of the NHS will strip it of context and value. To combat this, we build two communities
    - *Within the NHS*, we use staging TREs in partnership with AIRR/Practitioner to enable academic and clinical teams to work alongside each other by bringing **research and compute to data**.
    - *Across the NHS*, we build a ‘3rd-space’ for a community that collaborates via code. Privacy controls and vendor lock-in are barriers, and most healthcare researchers unwittingly work in silos. We will use methodologies such as Infrastructure-as-Code, privacy preserving machine learning, synthetic data, and public collaborative coding tools.
2. **Productive activity between pandemics is essential to maintain effective systems**.
Dormant infrastructure decays. We aim to embed sustainability and maximise immediate benefit. Sustained engagement from clinicians, researchers and the public will be achieved by addressing clinical problems of serious concern in “peacetime”. Here we focus on the challenge of **infection in cancer**. This is a key cause of morbidity and mortality in its own right. But it also acts as a **canary-in-the-mine** because of the multi-factorial immune compromise arising with cancer.
    - We know that viral variants evolve and emerge during chronic infection in the immune compromised.[^ Kemp, S. A. et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature 592, 277–282 (2021)]
    - We know that hospitals in general, and critical care in particular, intensively monitor, sample and care for these patients. For example, Seoul hantavirus (Glasgow), viral haemorrhagic fevers (Royal Free), and the Severe Acute Respiratory Syndromes (including H1N1).

    The “peacetime” activity has dual purpose. Firstly, the imperative to promptly treat these vulnerable patients drives another key health crisis: anti-microbial resistance - prior to empiric antibiotics strategies, infection was responsible for 70% of mortality in neutropaenic acute leukaemia.[^ref to do] Secondly, the multi-modal data required for this challenge will support a wide range of other research priorities.
3. **The self-improving lifecycle of data-driven healthcare**: Left-to-right and Right-to-left are two sides of the same coin. Preparing data for model development mirrors work for model deployment. We tidally lock these two processes from the outset. NHS hospitals will serve not just as data sources, but as development partners - ready to generate safety and efficacy data through in situ monitoring and digital trials. Success depends on
    - *Observability*: This means maximal transparency of the system, processes, policies, data, models, model performance and user activity.
This is not just good engineering practice but a core requirement for regulatory compliance.
    - *Usability*: The success of AIRRLOCK will depend on our ability to onboard many users and very quickly get them to the point where they can deliver value with as little effort as possible while providing the guardrails that keep everyone safe. The platform must have a wide and clearly sign-posted happy path.
    - *Maintainability*: AIRRLOCK will not be a static snapshot but a living system which will evolve and should improve over time. Ensuring proper evolution will require effort and we view this project as more similar to tending a garden than constructing a bridge. We will leverage methodologies like Infrastructure-as-Code, continuous delivery and infrastructure engineering best practices and select out-of-the-box, PaaS and/or serverless solutions where possible to maximise transportability across hospitals with minimal expert labour.
    - *Efficiency*: Data anonymisation and curation is a highly multidisciplinary, labour intensive endeavour. We intend to minimise manual annotation of data (text and medical images) and carefully validate and maximise the use of machine learning tools for scalable, reproducible and highly efficient data curation from medical free text, tabulated and imaging data.

# Aim

## A ‘North Star’:

To drive the construction of AIRRLOCK, we have chosen a brightly shining SMART objective: we will build a pneumonia prediction model using 3 or more data modalities in 3 or more of our partner sites for 3 hundred or more consecutive ICU admissions connecting to human genomes via GenOMICC in 3 or more SDEs. This task is our ‘North Star’ orientating the development of our community of practice, and our open technology platform. We envision aligned imaging, EHR (tabulated and free text), and genomic data ready to support “cutting edge data science intensive research”[^ref to application] across the UK. For clarity, this is an *exemplar*.
We are not aiming to perform research and develop the ‘best’ pneumonia prediction model, nor to complete regulatory certification for the model to be used in practice, nor to provide the algorithm stewardship workforce needed to safely manage AI/ML tools. Success in this single endeavour would demonstrate that we are ready to use NHS data to support both science and clinical care, to enable the translation, testing and trials of digital health products including AI, and to provide the integrated multi-modal, multi-morbid, and multi-scale data for studies like RECOVERY and GenOMICC. Our infrastructure will provide the foundation for both ‘peacetime’ biomedical priorities, and pandemic response.

### North Star tasks

We aim to assemble data around the index episode of pneumonia for each patient for two aligned exemplars. Firstly, connecting Triple-M data to a unique human genome study, and secondly, meeting the challenges of building for Foundational models.

1. **Connecting the 25,000+ human genomes in the GenOMICC study to the deep clinical phenotypes locked in NHS electronic health records.** We will use free text, and tabular data to define
    - **anti-microbial exposure** and therefore the risks of anti-microbial resistance[^ [Yelin, I. et al. Personal clinical history predicts antibiotic resistance of urinary tract infections. Nat. Med. 25, 1143–1152 (2019)](https://www.ncbi.nlm.nih.gov/pubmed/31273328)]
    - **immune status** with respect to systemic anti-cancer therapy, mucosal integrity (mucositis), and indwelling venous access devices. These all define the profile of infection risk (bacterial versus viral vs fungal etc.).
    - **pneumonia diagnosis** because despite its importance, community, hospital and ventilator[^ [Klompas, M. Complications of mechanical ventilation--the CDC’s new surveillance paradigm. N. Engl. J. Med.
368, 1472–1475 (2013).](https://pubmed.ncbi.nlm.nih.gov/23594002/)] acquired pneumonias remain syndromic diagnoses with frequent mislabelling[^ [Gupta, A. B. et al. Inappropriate diagnosis of pneumonia among hospitalized adults. JAMA Intern. Med. 184, 548–556 (2024)](https://www.ncbi.nlm.nih.gov/pubmed/38526476)]
    - **pneumonia severity** to define the evolution toward multi-organ failure and the burden on the healthcare system
2. **Meet the challenge of preparing data for the latest generation of foundational AI models.** We will prepare Triple-M data for a foundational model of the lung. Foundation models using transformer neural networks are revolutionising the field of medical image analysis[^https://www.nature.com/articles/s41586-023-05881-4]. Learning data distributions from a huge imaging dataset has been shown to improve the performance of downstream diagnostic tasks by our group amongst others[^https://www.nature.com/articles/s41586-023-06555-x; https://www.nature.com/articles/s41586-024-07441-w?fromPaywallRec=false]. The numerous spurious studies utilising AI tools for COVID-19 detection during the pandemic[^https://www.nature.com/articles/s42256-021-00307-0] highlighted the role Foundation models could occupy to detect subtle imaging abnormalities and facilitate robust diagnosis. Yet Foundation models to date have primarily been unimodal models, failing to leverage the full spectrum of data available in the NHS. Training a multimodal Foundation model (https://www.nature.com/articles/s41698-024-00573-2) across the multi-modal NHS data represents the future of disease surveillance and improved diagnostics for healthcare settings. This exemplar of pneumonia detection - combining imaging, demographic, clinical, phenotypic and host and pathogen genomic information - will provide a second test of our Digital Research Infrastructure.
And, if implemented, it will allow direct translation towards improved pandemic preparedness as infections such as H5N1 edge further into human host populations.

## One-for-all

Data modalities for this one project will serve others. The community and the infrastructure will be ready to support endeavours across the major conditions (including cancer, dementia, respiratory disease etc.) identified in the Major Conditions Strategy that “drives >60% of mortality and morbidity in England”.[^[Major conditions strategy: case for change and our strategic framework](https://www.gov.uk/government/publications/major-conditions-strategy-case-for-change-and-our-strategic-framework/major-conditions-strategy-case-for-change-and-our-strategic-framework--2)] Moreover, this is not just for generative AI, but also to support data driven digital healthcare products for innovation. That is, with the right infrastructure and process, the data will be ready for ‘small’ and ‘large’ models alike.

- [ ] not clear yet that we're doing more than pandemics
- [ ] UCL number 2 neuroscience centre in the world
- [ ] manchester and sheffield
- [ ] ADMISSION and MM work at Newcastle and UCLH

## Design

AIRRLOCK is a standardised architecture and a cohesive set of reference implementations of policies, processes and technologies for extracting multi-modal data from hospital systems and transferring these data to validated destinations. An AIRRLOCK instance

- is a **turnkey appliance** embedded within the perimeter of the existing technology estate of an NHS trust. AIRRLOCK enables NHS hospital trusts to safely make their data available to authorised researchers by securely transferring these data into Trusted Research Environments hosted on national high-performance compute facilities such as those provided by Practitioner[^==need to decide if eggs too much into the basket==].
- is a **transit hub** for data. It acts as a staging post between existing data extraction pipelines, e.g.
EMAP, PIXL, etc., and Trusted Research Environments where researchers have access to high-performance compute resources. It acts as a standardised bridge between heterogeneous hospital data systems and the secure high-performance compute facility required to build useful AI/ML models.
- provides a standardised extraction route for data into NHS *sub-national SDEs* and accelerates adoption of and participation in the SNSDE networks.

To distinguish an AIRRLOCK from a full-suite TRE and a general purpose SDE, we will describe an AIRRLOCK as a _secure data enclave_.

### Architecture

<!-- airrlock-architecture.drawio.png -->
![airrlock-architecture.drawio](https://hackmd.io/_uploads/SyLil0UrC.png)

AIRRLOCK will conform to a **ports-and-adapters** architectural pattern. The system design allows for loose coupling and interchangeability of components. It enables a strategy where we leverage existing data extraction pipelines, e.g. EMAP, OMOP-ES, PIXL, CogStack, yet place no restrictions on how NHS digital teams navigate local data extraction idiosyncrasies as this may require custom extraction pipelines.

- *data ingress*: AIRRLOCK will expose a port for each data modality. The port will provide an interface and require the implementation of an adapter by each data extraction pipeline. The AIRRLOCK team will work with existing data extraction pipeline teams to develop adapters that conform to the ports exposed by AIRRLOCK. The team will foster a community and establish standards to support the development of further data extraction pipelines.
- *data egress*: AIRRLOCK will negotiate the terms for a standardised interface to AIRR and implement the adapter required for connecting to the interface exposed by AIRR. AIRRLOCK will also develop an abstract interface for connecting to SNSDEs and provide implementations for export to partner site SNSDEs.
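
To make the ports-and-adapters idea concrete, the ingress and egress description above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AIRRLOCK design: every name here (`Record`, `IngressPort`, `EgressPort`, `ImagingAdapter`, `InMemoryDestination`, `Enclave`) is hypothetical, and the in-memory stores stand in for real extraction pipelines and for a TRE or SNSDE endpoint.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Record:
    """One unit of participant data handed to the enclave (hypothetical shape)."""
    participant_id: str
    modality: str
    payload: bytes


class IngressPort(Protocol):
    """Port: what the enclave expects from any data extraction pipeline."""
    modality: str

    def fetch(self, participant_id: str) -> list[Record]: ...


class EgressPort(Protocol):
    """Port: what the enclave expects from any export destination."""

    def push(self, records: list[Record]) -> int: ...


class ImagingAdapter:
    """Adapter wrapping a stand-in imaging pipeline (hypothetical)."""
    modality = "imaging"

    def __init__(self, store: dict[str, list[bytes]]) -> None:
        self._store = store

    def fetch(self, participant_id: str) -> list[Record]:
        blobs = self._store.get(participant_id, [])
        return [Record(participant_id, self.modality, blob) for blob in blobs]


@dataclass
class InMemoryDestination:
    """Adapter standing in for a TRE or SNSDE endpoint (hypothetical)."""
    received: list[Record] = field(default_factory=list)

    def push(self, records: list[Record]) -> int:
        self.received.extend(records)
        return len(records)


class Enclave:
    """Core logic: depends only on the ports, never on a concrete pipeline."""

    def __init__(self, ingress: list[IngressPort], egress: EgressPort) -> None:
        self._ingress = {port.modality: port for port in ingress}
        self._egress = egress

    def transfer(self, participant_id: str) -> int:
        # Collect records for one participant from every modality port,
        # then hand them to the destination in a single push.
        records: list[Record] = []
        for port in self._ingress.values():
            records.extend(port.fetch(participant_id))
        return self._egress.push(records)


destination = InMemoryDestination()
enclave = Enclave([ImagingAdapter({"p1": [b"scan-a", b"scan-b"]})], destination)
sent = enclave.transfer("p1")
```

In this sketch a site with a custom extraction pipeline would only need to write a new adapter that satisfies `IngressPort`; the `Enclave` core and the egress adapters stay unchanged, which is the loose coupling the pattern is chosen for.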
==Insert comments==

- ==as to why this is not federation==
- ==as to why standards and interoperability is not a sufficient solution==
- it’s still lots of work e.g. even mapping units, no track record of the success, slow adoption of FHIR and OMOP only recently, our own experience argue that we see this as building a series of interfaces that define approach to interoperability at a different level
- rather than collaboration by standards, we collaborate by interfaces, we are very clear about the boundaries of work and the skills to manage on each site
- and if you are going to do this then you definitely need to collaborate with academia and find ways to bring those people into the NHS
- and ‘near realtime’ means very different things, and different system designs: there is a body of clinical decisions that will operate on a cadence of minutes/hours and those require a different architectural pattern e.g. maintaining consistency with MRNs is a priority inside OMOP but the matching / linking work becomes unscalable rapidly

## Objectives

notes/objectives.csv

# Resources

Success of this proposal, and of the endeavour of using data to improve health through science and care, depends on assembling resources from across the NHS, and the national health data science community. We believe we have that here.

## NHS trusts and partner Secure Data Environments

- [ ] ==1 more name needed==

We are grateful to have the explicit support of Chief Executives from six major NHS Trusts who together represent 15 hospitals including academic and community, women and children, and serve a population of more than 10 million people.
We have the support[^NHS co-applicants have confirmed support but are not permitted to upload confirmation, partners letters of support attached] of

- David Probert, CEO, University College Hospital London
- Matt Shaw, CEO, Great Ormond Street
- Kirsten Major, CEO, Sheffield Teaching Hospitals
- Sam Higginson, CEO, Royal Devon & Exeter
- ==XX==, CEO, Newcastle upon Tyne Hospitals
- Mark Cubbon, CEO, Manchester University NHS Foundation Trust

Alongside these trusts, we have engaged with the 5 partner Secure Data Environments across England. We enclose letters of support from

- OneLondon
- Yorkshire & Humber
- South West/Great Western
- North West
- North East & North Cumbria

We have also obtained support from Glasgow and the West of Scotland SDE as a fast-follower.

## People

We will build three stakeholder juries who will determine whether the community and infrastructure we build meets our stated aims. These are more than advisory boards: they are resources that determine our permissions to work, and hold us to account in delivering our objectives. It is for this reason that we have named them ‘juries’. They are drawn from NHS leadership, from academia, and from patient and public representation. Reconciling their different demands is key to our success.

### NHS Leadership Jury

The CEOs from our NHS partner sites will act as rotating Chairs of our NHS leadership jury. Their commitment is evidence of the importance of laying the foundation for the all-important return trip (‘right to left’) where insights from data may improve the care of their patients. We specifically ask the technical digital leadership of the partner trusts to hold us to account for building a system which is safe, affordable, and maintainable by the NHS DDaT workforce, and welcome (UCLH: Mark White, CTO; Sheffield: Steven Wood, Head Scientific Computing; Manchester: David Walliker, CIO etc.) alongside the engineering and architectural leadership of UCL ARC (Co-I: Hetherington).
### Academic Jury

- [ ] ==wider constituency? Who else from other sites==

We are building in service of Biomedical Data Science. To that end, we have partnered with senior academic leadership both from amongst our co-applicant team, and from our project partners. We enclose letters of support from

- Professor Kenneth Baillie, University of Edinburgh’s Pandemic Science Hub[^[Baillie Gifford Pandemic Science Hub](https://psh.ac.uk)]
- Professor Emma Thomson, MRC University of Glasgow, Centre for Virus Research[^[MRC-University of Glasgow Centre for Virus Research](https://www.gla.ac.uk/research/az/cvr/)]

Kenny and Emma will join the co-applicant team including Professors Danny Alexander, Becky Shipley, Laura Shallcross, Richard Dobson and James Hetherington. We specifically ask this jury to hold us to account for building a system which delivers usable Triple-M data at scale and in a timely manner.

### Public, Patient and Practitioner Jury

We are also grateful for the support of key patient and public representatives from our partner sites. Our approach is designed to ensure that future engagement is representative of all relevant viewpoints, and informative enough to guide decision-making. There is a significant body of existing literature in public attitudes to privacy[^[Kalkman, J Med Ethics (2022)](https://doi.org/10.1136/medethics-2019-105651)], uses of data for research,[^[Woolley, JAMA (2005)](https://doi.org/10.1001/jama.294.11.1380)] and clinical science in outbreaks and pandemics. Without careful study of history, there is a tendency to re-discover old knowledge.[^[Arnold-Foster, BMJ (2021)](https://doi.org/10.1136/bmj.n1888)]

For this reason, we will ask our third jury to hold us to account in areas not typically addressed. Specifically,

- how to operationalise the ‘researchers to data’ paradigm.
- how to maturely communicate the risks and benefits of using synthetic data

This jury will be chaired by Mr Amir Hashmi, Chair of the Data Trust Committee at UCLH working alongside

- ==names/LoS for other sites==
- ==XXX etc.==

## Technologies

==add in evidence of global architectural skills beyond pipelines==
==e.g. practitioner teams incl Ben, Eoin etc.==

We are not moving from a standing start. We bring together leaders in specific data modalities but for the first time propose to embed their existing work in a sustainable and scalable architecture.

We have significant experience in developing and deploying in partnership with the NHS. In collaboration with Microsoft’s Industry Solutions Engineering team, we have recently built and deployed an open-source platform ([FlowEHR](https://www.flowehr.io/)) for iterative, safe & reproducible development & deployment of data science solutions *inside the NHS*. This MLOps platform has also been implemented at Liverpool’s Mental Health Research for Innovation Centre, and has enabled the deployment of AI models for Anti-microbial Stewardship and operational bed demand at UCLH.[^ [King, Z. et al. Machine Learning for Real-Time Aggregated Prediction of Hospital Admission for Emergency Patients. npj Digital Medicine 5, 104 (2022)](https://doi.org/10.1038/s41746-022-00649-y)]

Here we choose three priority data modalities, with three more fast followers:

### Structured (tabular) EHR data

We have already collaborated with EHDEN[^ https://www.ehden.eu] to standardise data from 250,000 patients at UCLH and Great Ormond Street. Moreover, we have extended this model to support five of the NIHR Health Informatics Collaborative themes (critical care, viral hepatitis, hearing health, myeloma, and transfusion dependent anaemia) in collaboration with Cambridge, and co-applicants Exeter and Manchester. This work is aligned with HDR-UK and the NHS SDE programme.
- [ ] multi-morbid and chronic disease (Danny's publications)

### CogStack, text and unstructured data

UCL is a founding member of CogStack, which provides natural language processing (NLP) for clinical coding and anonymisation. These tools have been implemented to share free text with research teams in compliance with ICO standards. More recently, CogStack has delivered Foresight, an LLM trained on NHS records.[^ [Kraljevic, Z. et al. Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. The Lancet Digital Health 6, e281–e290 (2024)](https://github.com/CogStack/opengpt)] We will install the full CogStack platform in all partner sites.

### Medical imaging

Both UCL[^[UCL Centre for Medical Image Computing](https://www.ucl.ac.uk/medical-image-computing/software-and-resources)] and Sheffield[^[POLARIS (Pulmonary, Lung and Respiratory Imaging Sheffield)](https://www.sheffield.ac.uk/polaris)] have deep experience in medical imaging (especially for respiratory disease), with a wealth of open-source software and resources including ISO-13485 Quality Management Systems[^ UCL Medical Software Quality Management System (QMS)] and MHRA regulated hyperpolarised xenon for clinical MRI scans.

In the last 12 months at UCL, our [PIXL pipeline](https://github.com/UCLH-Foundry/PIXL) has prepared 100k+ NHS chest x-rays, 5k+ MRI prostate scans, and a similar number of MRI scans for multiple sclerosis integrated with free text and linked OMOP EHR data.
### Fast follower technologies

- [ ] ==need input from Adam L, and review==

During the project, we will scope the following novel data connections, and work to adopt these into the platform

- **Digital AMR**: ==Shallcross==
- **Pathogen genomics**: in collaboration with UCLH’s Advanced Pathogen Diagnostics Unit[^[NGS of whole viral genomes](https://www.uclh.nhs.uk/our-services/find-service/pathology-1/pathogen-genomics-unit)] we will scope data connections to incorporate viral genomes from clinical samples.
- **Histopathology**: in collaboration with Research Department of Pathology (Levine, UCL Cancer) we will scope pipelines for the transfer, curation and analysis of whole-slide images (WSI). At UCLH alone we expect to see c. 400,000 WSI/year. This builds on existing work (under Section 251 approval) to develop a pathology LLM.
- **Environmental data**: in collaboration with the EPSRC Digital Health Hub for Antimicrobial Resistance, we will scope data relating spatio-temporal markers and viral transmission among healthcare workers[^ [Wilson-Aggarwal, J. K. et al. Assessing spatiotemporal variability in SARS-CoV-2 infection risk for hospital workers using routinely-collected data. PLoS One 18, e0284512 (2023)](https://pubmed-ncbi-nlm-nih-gov.libproxy.ucl.ac.uk/37083855/)]
- **Waveforms and wearables**: in collaboration with the EPSRC CHIMERA[^[Collaborative Healthcare Innovation through Mathematics, EngineeRing and AI](https://www.ucl.ac.uk/chimera)] programme at UCL we will scope data connections to high resolution physiological waveforms from ventilators, patient monitors and wearables

## Research Partnerships

We build on two key research partnerships which will be key to impact and will accelerate our work.

### GenOMICC

We provide a letter of support from the GenOMICC study that led the global effort to explain mechanisms of COVID-19 through host genetics. GenOMICC was the first to discover more than 75% of the genetic variants associated with Covid-19[^[Pairo-Castineira, E.
et al. GWAS and meta-analysis identifies 49 genetic variants underlying critical COVID-19. Nature 1–15 (2023)](https://pubmed-ncbi-nlm-nih-gov.libproxy.ucl.ac.uk/33307546/)] including a key finding that led directly to a new effective treatment for life-threatening disease.[^ [RECOVERY Collaborative Group. Baricitinib in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial and updated meta-analysis. Lancet 400, 359–368 (2022)](https://pubmed-ncbi-nlm-nih-gov.libproxy.ucl.ac.uk/35908569/)] GenOMICC already holds 26,710 human genomes and is working toward recruiting 100,000 amongst patients with critical illness. This *consenting* study allows us, for the first time, to link this unique asset to imaging and deep clinical phenotypes.

### NIHR Health Informatics Collaborative (Critical care and others)

We provide a letter of support from the NIHR HIC Critical Care leadership. This clinical theme is unique in that it aggregates rich EHR data from more than 10 NHS sites including 4 AIRRLOCK partners under Section 251 approval from the Confidentiality Advisory Group. The data remains identifiable for the purposes of linkage to follow patients over their life course. This governance arrangement is a unique asset that has not previously been able to benefit from multi-modal data.

# Work plan

==Insert Gantt chart here==

Our programme delivery builds on the experience of the team in managing large technical infrastructure projects at the boundary between the research community and the NHS. UCL’s Advanced Research Computing Centre will act as the coordinating centre bringing architectural experience from the SATRE programme, and cohesion with the development and deployment of Practitioner.

We have a realistic delivery plan over 3 years with 6 sites (3 lead, and 3 fast-follower). We assume a 3 month set-up period, 2 years of co-working, and 9 months to transition to a sustainable, self-supporting, vibrant and growing solution.
==Add note about how each WP is self-contained; important to note what we have left out b/c this is key to successful delivery==

## Workstream 1: *Ingress*

Workstream 1 is focused on the design of data ingestion interfaces for imaging, free-text and structured data. This workstream involves close collaboration with existing teams in the NHS and academia who develop data extraction tools.

### WP1.1: *Ports*

Design and specify the ports which define how 3M data will be received into AIRRLOCK. Implement the ports inside AIRRLOCK as performant, low-latency services.

### WP1.2: *Adapters*

Co-develop adapters with the existing 3M data extraction pipelines which fit into the AIRRLOCK ingestion ports.

### WP1.3: *Third Space*

==Break this out into its own WS?==

A ‘digital third space’ for community and collaboration within and beyond the NHS, to serve as a knowledge repository for best practices, a code repository for the software-defined infrastructure and a communication forum for the community. This will be enabled by synthetic data generation supporting CI/CD, building on our partnership with the Alan Turing Institute (SQLSynthGen), and best practice from HDR-UK (SACRO/GRAIMATTER).

We will

- build an open, collaborative code platform as per OpenSAFELY
- deliver a programme of open, virtual and f2f meet-ups
- host an online forum for communication
- develop a shared policy for synthetic data standards and anonymisation practice that meets ICO standards

### Outcome

A community connecting university research software engineering teams with NHS DDaT teams, a culture of openness, collaboration and sharing, and an understanding of the tensions of clinical and operational context

### Outcomes

A publicly available code, documentation and communication collaboration environment instantiated on GitHub (or similar). All development is done in the open and engineering decisions are published for public scrutiny.
An agreed design of ports and adapters for transferring Triple-M data with functional implementations for the existing pipelines as well as clear guidelines for how these can be implemented by the community for other extraction pipelines.

Triple-M data can be pushed from the existing data extraction pipelines across the boundary into AIRRLOCK.

This workstream does not involve further processing of data beyond validation.

## Workstream 2: *Enclave*

Workstream 2 is focused on developing the data processing and management capabilities at the core of AIRRLOCK.

### WP2.1: *Tunnel*

Develop a data processing pipeline using an open-source stream-processing engine to accept Triple-M data ingested via the ports. Processing steps in the pipeline will be composable to enable extensibility and the evolution of further processing capabilities. This work package also includes developing the pipeline components for data linkage. Deep monitoring, tracing and logging, with extensive visualisation of every step, will be developed in parallel.

==highlight observability and audit and GMLP==

### WP2.2: *Portal*

Develop a highly protected application where authorised users can manage studies and specify participant identifiers. A background process in the application will monitor metadata flowing through the processing pipeline and track identifiers to provide the availability status of data.

### Outcomes

A data processing pipeline which can link 3M participant data received via the individual ports. Accompanying the pipeline is an administrator-accessible front-end which visualises processing metrics in real-time.

A web application with strong security used to define a study and specify identifiers for participants in the study. The application monitors the processing pipeline metadata and indicates for each participant whether their data has been ingested.
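The composable processing steps and the per-participant availability status described in this workstream can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: it runs in-process rather than on a stream-processing engine, the names `Message`, `compose`, `validate`, `make_link` and `AvailabilityTracker` are hypothetical, and the lookup table from local identifiers to study identifiers is invented for the example.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, Optional


@dataclass(frozen=True)
class Message:
    local_id: str                    # identifier supplied by the extraction pipeline
    modality: str                    # "imaging", "text" or "structured"
    study_id: Optional[str] = None   # filled in by the linkage step


Step = Callable[[Message], Message]


def compose(*steps: Step) -> Step:
    """Chain steps left to right so a new step can be added without touching others."""
    return lambda msg: reduce(lambda m, step: step(m), steps, msg)


def validate(msg: Message) -> Message:
    """Reject messages for modalities the pipeline does not know about."""
    if msg.modality not in {"imaging", "text", "structured"}:
        raise ValueError(f"unknown modality: {msg.modality}")
    return msg


def make_link(lookup: dict[str, str]) -> Step:
    """Linkage step: map a local identifier onto a study participant identifier."""
    def link(msg: Message) -> Message:
        return replace(msg, study_id=lookup[msg.local_id])
    return link


class AvailabilityTracker:
    """Records which modalities have been seen for each study participant."""
    def __init__(self, participants: list[str]) -> None:
        self._seen: dict[str, set[str]] = {p: set() for p in participants}

    def observe(self, msg: Message) -> Message:
        if msg.study_id in self._seen:
            self._seen[msg.study_id].add(msg.modality)
        return msg

    def status(self) -> dict[str, list[str]]:
        return {p: sorted(m) for p, m in self._seen.items()}


tracker = AvailabilityTracker(["S1", "S2"])
pipeline = compose(validate, make_link({"mrn-1": "S1", "mrn-2": "S2"}), tracker.observe)
for incoming in [Message("mrn-1", "imaging"), Message("mrn-1", "text")]:
    pipeline(incoming)
status = tracker.status()
```

Here the tracker is just another step in the chain, which mirrors the idea of a background process observing pipeline metadata without the other steps knowing about it.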
This workstream does not involve building the capability to pull data from AIRRLOCK and at this stage relies on out-of-band communication of participant identifiers to each data extraction pipeline as well as the execution of those pipelines.

## Workstream 3: *Egress*

Workstream 3 is focused on developing the functionality for safe and secure export of data from AIRRLOCK. This workstream involves close collaboration with the teams operating the Practitioner TREs in AIRR and the NHS Sub-national SDEs.

### WP3.1: *Practitioner*

Co-design a port and adapter with the AIRR team for automated secure transfer of data from AIRRLOCK into Practitioner TREs. Implement the adapter in AIRRLOCK.

### WP3.2: *SNSDEs*

Design an interface which abstracts the common operations for sending data to an SNSDE.[^==Note== *It is assumed that there is no standardised SNSDE architecture* ] Implement this interface for the three partner site SNSDEs. Provide guidance to the community on how to implement this interface for other SNSDE designs.

### Outcomes

A high-level secure data transfer protocol is agreed upon with the Practitioner TRE team for ingesting data safely and in a standardised, automated manner from AIRRLOCK. This protocol is implemented inside AIRRLOCK and 3M data can be pushed over a secure connection to a specific TRE instantiated inside Practitioner.

An interface exists which encapsulates common operations for exporting data to an SNSDE and implementations are available for three SNSDEs. 3M data can be pushed to SNSDEs by performing operations against this interface.

## Workstream 4: the ‘North Star’

The 'North Star' workstream will act as a common purpose for the abstract objectives of the AIRRLOCK project. It will ensure that the technology, architecture, community and governance streams remain grounded in delivering real value.
It will focus on the critical issue of infection in cancer patients by seeking to extract lung CT images to be linked with other extracted phenotypes (immune status, antimicrobial exposure, pneumonia diagnosis and severity) to develop a model to identify other chest pathologies.

Objectives/Work packages

- development of interfaces to clinical systems and existing lung CT databases
- adaptation of existing multi-modal data pipelines and feature extraction tools
- integration of existing image de-identification tools and packages
- application of governance blueprints and templates to achieve appropriate approvals
- building of an entire pipeline to move multi-modal data onto the Practitioner AI platform.

### Outcomes

We present the outcomes as two narrative ‘press releases’ that take the liberty of positively imagining the outcomes of this work

1. Press release, April 2027 (left to right)

    MRC researchers using the new AIRRLOCK platform have developed an artificial intelligence (AI) system, named CHESTFound, with the capability to identify a wide range of chest pathologies. This extends lung cancer screening to include COPD, lung fibrosis, and non-lung diseases such as osteoporosis. Developed using tens of thousands of CT chest scans from partner hospitals across the UK, this work is part of a broader initiative to use NHS data for patient benefit. This follows a similar announcement for prostate cancer detection, and is the result of a new model for data access for leading research groups. This pioneering work has been published in Nature today.

    AIRRLOCK connects the NHS to state-of-the-art secure super-compute capabilities. Its ‘researchers-to-data’ security model accelerates innovation by bringing academic teams alongside bedside clinical teams. The data moves securely through AIRRLOCK from the hospitals to national Secure Data Environments.

2. Press release, April 2028 (right to left) ...
message is that b/c L→R developed on AIRRLOCK then R→L was rapidly delivered within 12m a pilot trial has been developed and now ready for national evaluation across partner sites

==-----------------------------------------==
==---PROGRESS MARKER-----------------------==
==-----------------------------------------==

# People & Governance

Harris will have overall leadership responsibility with co-leads XXX from Beacon sites and Deputy leads YYY from Halo sites in support. Together they will be responsible for strategic vision, partner relationships, WP delivery and interconnectivity, chairing a Management Board (MB) and reporting to the Advisory Board (AB). The management strategy will be underpinned by WP plans, reported to the MB which will meet monthly, review WP progress, oversee and approve all funding allocations, and make stop/start decisions on work packages. Day-to-day operational support to WP1-n will be provided by the Coordinating Office meeting fortnightly. The full project team will meet 6 monthly in person (with more regular meetings for sub-groups as needed).

The NHS AB comprises sector leaders spanning digital (EHRS), technology and data alongside patient and staff representatives. It will provide critical oversight of strategy and delivery and meet 6-monthly with a rotating chair.

## Co-leadership

NHS CEOs plus academia

## Team science

Mixed NHS and Academia

> MRC: This team science approach enables co-design of projects and swift validation of model outputs in biological or clinical models. As a result, findings and solutions are more readily applied to real-world scenarios, with improved generalisation and reproducibility.
However, team science approaches are currently not incentivised, or easily financed, through traditional response-mode project grants and fellowship schemes, exacerbating the risks of skills gaps within research team

## PPPIE

> Public participants co-created a script and storyboard for the main resource, which we will now work with a creative agency to produce as an animation.

recruit a panel of public members to represent the voice of the public (including young people). Early public involvement will shape the research priorities, with PPIE integrated into the full lifecycle of the research. The programme will optimise its PPIE approach through continual improvement, responding to input from public representatives and national standards (for example, the UK Standards for Public Involvement). Infrastructure support will be sought to ensure robust training is available for public contributors and researchers and to create mechanisms for evaluating, measuring and capturing impact from PPIE within each of the two workstreams, underpinned by the UK Standards for Public Involvement and ....

Our primary guiding principle for this research programme is building and maintaining a positive research culture, based on teamwork and empowering individuals. We operate a simple rule: find great people who share our vision, and give them freedom and resources.

- GOSH PPIE lead is Dee (Deidre) Leyton
- Manchester

    Sorry I should have added that our data access committee includes both men and women, multiple faith groups and includes patients, cancer survivors and carers. The patient/public reps are joined by members of staff from MFT. We are working to make it more inclusive – it could use more women and some younger reps 😊. Some of the committee have data science expertise – retired professor of computer science, head of a data analytics for an insurance broker.

- Mr Steve Sweeney who is a founder member of our Data Trust Committee at MFT.
He is very happy to lend his (and our Data Trust Committee's) support to the AIRRLOCK project:

> It is a pleasure to be part of the Data Trust Committee at Manchester University NHS Foundation Trust (MFT). As a group, we review how healthcare data is used by researchers and innovators at MFT to ensure that there is benefit for patients and the wider community. The AIRRLOCK project is an ambitious programme to organise and prepare hospital data so that it can better support healthcare research. The Manchester Data Trust Committee is keen to support this work and offer a public perspective on how the project can enhance data driven healthcare in the NHS.

- I’ve talked to several PPIE teams at our site and we could include 3 names representing different NHS organisations at Sheffield:
    - Dr Lucy Wasinski (lucy.wasinski@nhs.net): Clinical Research and Innovation Office Research Coordinator for PPIE, Sheffield Teaching Hospitals NHS FT
    - Grace Edwards (grace.edwards2@nhs.net): PPIE Officer, NIHR Sheffield BRC & CRF
    - Lise Sproson (lise.sproson@nihr.ac.uk): PPIE Lead, NIHR HealthTech Research Centre (HRC) Long term condition (Devices for Dignity)

## Talent and training

Mixed senior and early stage NHS/Academic community, call out SWS’s data science community grant and the RSE / Software Sustainability Programme

Training for/with NHS DDaT teams to join and work with the programme

We will provide a training course designed to walk somebody through the installation and use of the open source software that will become the data pipeline developed as part of this project. 10k

We will produce some outreach videos designed to engage and educate the public about the use of data for research within the NHS. 20k

We will host community days for research technology professionals working at the different sites to enable exchange of information and perspectives.
3 x 5k

# National importance / alignment / strategic fit

- Trusts remain in control of their data
- Multi-modal data delivery pipeline for local trust use and for the SDE network
- Trusts imbue context into their data
- Curated not raw data into the SDE, local knowledge available to SDE teams
- Enables clinical trials of digital health products (AI models etc.)
- Data ready for serving and production (quality control, monitoring, stewardship) etc.
- Founding a national community of health data engineering (see course idea)
- Previously unavailable capability to bring supercomputing to bear on local raw NHS data

> There is an urgent unmet opportunity to add value to both biomedical scientists working with data and data scientists working in biomedical research, and to set a clear course and shared expectations to support communities in developing better ways of working across multiple organisations.[^MRC report]

This support is pledged because stakeholders are interested in the translation, testing, and trialling of insights from the data. Preparing for translation is the key value return step. The NHS is well-positioned for this task for several reasons:

- The NHS is free at the point of care and the comprehensive provider for the population, ensuring that data is representative and less prone to bias.
- The NHS has a strong history of medical device regulation and professional communities of clinical scientists with the background to support this.
- The NHS already hosts a comprehensive clinical trials network enabled by the National Institute for Health Research (NIHR).

# Ethics/FAIR/E&D

We are committed to Equality, Diversity and Inclusion. Our objectives are:

To achieve an inclusive environment. We aim to establish a truly interdisciplinary team where members feel they belong, can be themselves, can excel to their potential and can benefit the work with their individual perspective and background.
We will recognise people’s strengths and provide opportunities for all to thrive without biases. We will build an environment underpinned by inclusive processes, trust, transparency of decision making and openness to change. We will do this by developing and promoting guidance for conducting meetings and conversations where people can feel safe, heard, valued and respected. We will ensure all members undertake and have access to appropriate, targeted and useful training to improve their understanding of EDI issues.

To improve diversity we will ensure that diversity (both in terms of demographics and in terms of experience and points of view) is actively sought, understood, appreciated and promoted. Although we will use the term ‘diversity’ to mean all forms of diversity, a special focus will be on ethnicity, gender, socioeconomic background and disability. We will do this by monitoring engagement and identifying any issues that might prevent diverse team representation.

To ensure equality for all team members regardless of their background, we will remove barriers to access, progression and attainment for those who are systematically disadvantaged. We will work within the university’s no-tolerance policy for bullying, harassment and discrimination. We will ensure that all measures are tracked and EDI targets are prominent in work during phase 2 proposal development.

## Ethics

- NIHR HIC
- GenOMICC

Plus

The use of patients’ clinical information for research without consent, and the use of artificial intelligence to access clinical data, are key ethical considerations. A recent survey by the NHS showed that 48% of people surveyed had a “relatively low understanding of AI and its applications in health and social care”.17 This lack of knowledge can lead to a lack of trust, comfort, and confidence. We have prior experience in addressing these challenges, both with the public and research regulators (HRA Confidentiality Advisory Group (CAG)).
In our existing REC and HRA-approved research protocols, we have established that confidential clinical information without personal identifiers can be collated from patients’ records without consent, justified by public benefit. During a public health event (of communicable disease or other threats), we have approval to gather confidential clinical information with personal identifiers without consent under the provisions of Regulation 3 of the Health Service (Control of Patient Information) Regulations 2002 (England and Wales), because we process that information to assist in the management of the event. After the event, we are approved to process that data for follow-up studies under provisions of Regulation 5 of the same legislation. Similar processes are in place in Scotland.

5 safes

## E&D

Protected characteristics monitoring built into platform

NHS as the only place with potential to be truly un-biased

We aim to establish a truly interdisciplinary team where members feel they belong, can be themselves, can excel to their potential and can benefit the work with their individual perspective and background. We will recognise people’s strengths and provide opportunities for all to thrive without biases. We will build an environment underpinned by inclusive processes, trust, transparency of decision making and openness to change. We will do this by developing and promoting guidance for conducting meetings and conversations where people can feel safe, heard, valued and respected. We will ensure all members undertake and have access to appropriate, targeted and useful training to improve their understanding of EDI issues.
# Sustainability & Continuity

Documentation

Community

- ARC hub and spoke approach more broadly
- Partner with existing but often isolated DDaT NHS teams, and professional groups

NHS engagement

- Prepare bundles of blue prints
- Technology via IaC
- Governance (ideally computable too) but shared repository of blue prints and standards on how academia can work alongside NHS
- People: job descriptions and career alignment

Integration with

- practitioner
- RSE community
- HDR-UK
- AMR hub
- NIHR HIC

Dissemination

- UCLP and other AHSNs

# Risks

Identity management via AIRR/Practitioner

Adopt 5 safes and standards

De-identification without external access is close to anonymisation in other circumstances

# FAQ

## Isn’t this just OpenSAFELY for hospitals?

That would be a compliment. We work to implement that same approach to transparency, reproducibility and efficiency. But OpenSAFELY works with less complex data modalities (predominantly tabular), from fewer providers (TPP and EMIS), and does not attempt to provide a mechanism for deployment. Our solution is necessarily different, and involves partnership with NHS DDaT teams using technology (TREs deployed into the NHS, and synthetic data), and process (consent, Section 251, and direct care relationships).

    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully