# Molecular biology
> Questions marked * MUST be completed.
## Section 1: Background
### 1. Project appraisal meeting*
https://github.com/alan-turing-institute/Hut23/milestone/15
### 2. Project proposal name/working title*
Building a “Google Maps” for molecular and cellular biology
### 3. Brief description*
The goal is to resolve the three-dimensional spatial distribution of the entire proteome of molecules inside whole cells, to provide an unprecedented view of the organisation of living matter.
Recent advances in protein structure prediction, such as DeepMind’s AlphaFold, have demonstrated the potential of AI in the biosciences domain. One outstanding challenge is that to understand molecular function, we must also resolve the three-dimensional spatial distribution and interactions of the entire proteome of molecules in their native context – whole cells. The ability to observe proteins in their natural environment could lead to new ways to understand and treat diseases, transform development of new therapeutics and enhance our fundamental understanding of biology. New experimental technologies, such as the Amplus Cryo-Electron Tomography (Cryo-ET) microscope currently being installed at the Rosalind Franklin Institute (RFI), are enabling image data acquisition with unprecedented detail. These data span at least seven orders of magnitude in spatial scale, from atoms to whole cells ($10^{-10}$ to $10^{-3}$ m). However, data analysis methods to identify, extract and refine molecules from these volumetric data lag far behind our capacity to collect the data. Initiatives such as the Open Cell project, and the recently announced multi-million dollar effort by the Chan Zuckerberg Institute for Advanced Biomedical Imaging “to identify and map the position of every protein in a cell”, demonstrate the scientific need and scale of initiative required to make progress in this domain.
We will build upon our existing work in this domain (in collaboration with MRC-LMB, Cambridge, STFC) and our new partnership with the RFI, to develop an integrated effort to build new AI/ML methods that bridge molecular and cellular scale features, enabling an unprecedented view of the function and organisation of living matter.
REG will contribute to several strands of the project: model development, deployment of these models in image visualisation tools and Scivision, and building a prototype web portal for a digital twin of the microscopy instrumentation developed at RFI.
### 4. Project Tracker issue link*
https://github.com/alan-turing-institute/Hut23/issues/1214
### 5. Principal Investigator(s)*
Alan R. Lowe
### 6. Earliest possible start date*
September 2022
### 7. Latest possible start date*
March 2023
### 8. Earliest possible end date*
March 2025
### 9. Latest possible end date*
March 2026
### 10. Estimated project duration (months)*
42 months
### 11. Reviewer 1*
> Camila Rangel Smith
### 12. PMU contact*
> Mariya Iqbal
### 13. Partnerships team contact*
N/A
### 14. Reviewer 2*
> Oliver Strickson
### 15. Conflict of interest*
No conflict.
## Section 2: Relevance*
> Describe how the proposal aligns with the REG project priority criteria (max 300 words for each). Refer to REG [project appraisal criteria](https://github.com/alan-turing-institute/research-engineering-group/wiki/Criteria-for-project-appraisal) for further information.
### 16. Impact*
The project will build upon existing work in this domain (in collaboration with MRC-LMB, Cambridge, STFC) and a new partnership with the RFI, to develop an integrated effort to build new AI/ML methods that bridge molecular and cellular scale features. It will make use of data from the new state-of-the-art machines at RFI, which could open up major new research in biomedicine and fundamental life sciences.
Some of the outputs such as the digital twin web portal prototype will be RFI-Turing branded. Reputationally, this is an opportunity for Turing to establish itself as a key player in the use of AI methods for biomedicine.
### 17. Diversity
Research in this area typically attracts a more diverse group of researchers than is typical in AI and computer science. Hence this is an opportunity for the Turing to move into a better balance of diversity. In particular, the project will ensure all recruitment happens openly and transparently, engaging our networks and interest groups to gather candidates from the broadest possible range.
### 18. Pioneering*
This project aims to tackle a major challenge in molecular and cellular biology using new experimental technologies, such as the Amplus Cryo-Electron Tomography microscope currently being installed at the Rosalind Franklin Institute, which enable image data acquisition with unprecedented detail. The project will make use of this new data source by building tools and methods to analyse the data, enabling an unprecedented view of the function and organisation of living matter.
### 19. Openness*
All software and models developed will be open, with the aim of integrating them into other open source projects such as Scivision. Much of the work carried out by REG in this project will help more people access the existing tools (e.g. developing a web portal to enable end users to access digital-twin output).
Instrumentation data will be managed under terms of agreement with RFI.
### 20. Alignment with Turing 2.0 priorities*
This activity will consolidate our relationship with a number of key players in biomedicine, as well as tackle a major challenge in molecular and cellular biology. As such, it is at the intersection of the Health programme and Data Science for Science programme and will form the nucleus of the proposed strategy for Turing 2.0 in the biomedical space.
## Section 3: Staff*
### 21. What specific skills should the requested staff possess?*
Front-end development, digital twins, machine learning, experience of HPC.
### 22. Preferred level of experience of the requested staff*
Any.
### 23. What does the work expected from the requested staff consist of?*
Main activities that REG will contribute to:
- **Build a prototype web portal for [Parakeet](https://github.com/rosalindfranklininstitute/parakeet)**. A digital twin of the microscopy instrumentation, developed at RFI. Parakeet produces a simulated image volume using a known molecular structure from an online database. The goal is to enable end users to access the digital-twin simulation output. The prototype will be RFI-Turing branded and handed over to RFI at the end of the project.
- **Development of [VNE](https://github.com/quantumjot/vne)**. A PyTorch library for encoding molecular shapes developed at the Turing. The VNE library attempts to identify the molecular structure, given an image volume. The aim is to build this out to scale (running on HPC/Azure, training on very large datasets, adding on-the-fly augmentation, pre-processing, integration with Parakeet).
- **Deploy the models from VNE to [Napari](https://github.com/napari/napari)**. A fast, interactive, multi-dimensional image viewer for Python. This will be done by building a plugin for end use by the community.
- **Integration with Scivision project**. Collaborate with the Scivision team to integrate existing data sources and models into the Scivision project (e.g. Parakeet can be a data source and VNE a model).
In all work packages REG will be working in collaboration with PDRAs from the Turing and project partners.
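
As a rough illustration of the web portal work package, the sketch below shows one way a prototype back end might track simulation requests before any FastAPI/Flask routes are added. It is a minimal sketch under stated assumptions: `JobStore`, `SimulationJob`, `fake_simulate` and the output path are hypothetical names introduced here, and nothing in it reflects Parakeet's actual interface.

```python
"""Hypothetical in-memory job store for a simulation portal prototype."""
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional
import uuid


class JobStatus(str, Enum):
    QUEUED = "queued"
    DONE = "done"
    FAILED = "failed"


@dataclass
class SimulationJob:
    structure_id: str  # identifier of a known structure in an online database
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: JobStatus = JobStatus.QUEUED
    output_path: Optional[str] = None
    error: Optional[str] = None


class JobStore:
    def __init__(self, simulate: Callable[[str], str]) -> None:
        # `simulate` is a placeholder for whatever runs the digital twin;
        # it takes a structure id and returns the path of the output volume.
        self._simulate = simulate
        self._jobs: Dict[str, SimulationJob] = {}

    def submit(self, structure_id: str) -> SimulationJob:
        """Register a new job; a web route would return job.job_id to the user."""
        job = SimulationJob(structure_id=structure_id)
        self._jobs[job.job_id] = job
        return job

    def run(self, job_id: str) -> SimulationJob:
        """Run a queued job and record success or failure on the job itself."""
        job = self._jobs[job_id]
        try:
            job.output_path = self._simulate(job.structure_id)
            job.status = JobStatus.DONE
        except Exception as exc:
            job.status = JobStatus.FAILED
            job.error = str(exc)
        return job

    def get(self, job_id: str) -> Optional[SimulationJob]:
        """Look up a job for status polling; None if the id is unknown."""
        return self._jobs.get(job_id)


def fake_simulate(structure_id: str) -> str:
    """Stand-in simulator (assumption): returns a made-up output location."""
    if not structure_id:
        raise ValueError("structure_id must not be empty")
    return f"outputs/{structure_id}.mrc"
```

In a real prototype the `run` step would likely be handed to a background worker or an HPC queue, and the store would be backed by a database rather than a dictionary; both choices are left open here.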
### 24. What is the expertise of the other members of the project team?*
* Alan R. Lowe (Turing PI) Machine learning, computer vision, imaging, applications in the biosciences
* Beatriz Costa Gomes (Turing PDRA) computer vision, microscopy, biosciences
* Marjan Famili (Turing PDRA) machine learning, mathematical modelling, molecular structure, physical sciences
### 25. Any specifics concerning the working place or working patterns*
None.
### 26. Estimated effort requested (in multiples of 0.5 FTE)*
2 people at 0.5 FTE.
### 27. How flexible is the requested FTE?*
Very flexible in terms of FTE and burn rate.
Two people at 1 FTE or three at 1.5 FTE (for less time) were options we considered.
### 28. Could the work be done jointly by REG and a delivery partner?*
The web portal could be done by a delivery partner with more front-end experience.
Note that the funding does not pay overheads.
## Section 4: Funding
### 29. Is funding for this project available?*
The funding has not been secured yet.
### Section 4a - Funding available
> If funding for the project is already secured, the following questions are **mandatory**.
<details>
### 30. The funding body*
> If a research grant please provide the Research Council and award number.
### 31. If Turing funded, specify the (closest related) Turing programme*
### 32. Amount of the approved budget*
### 33. Rate of overheads paid*
### 34. Mention any funding flow-downs likely to require legal input*
> E.g. IP arrangements, restricted publication of outputs, staff employment conditions.
### 35. Any other T&Cs that may affect the project but are unlikely to require legal input*
> E.g. date by which funding must be spent.
</details>
### Section 4b - Funding sought
> If funding for the project is not (fully) secured yet, the following questions are **mandatory**.
### 36. What plans are in place for securing funding and what is the current situation?*
This is an application to get funding from the Turing Out of Budget scheme to cover all of its planned costs.
- Any other plans of seeking funding?
No, not for this part of the work.
### 37. If a funding application is being submitted, when is the application deadline?*
16th of August 2022
### 38. When is the application outcome expected?*
End of August 2022.
## Section 5: Supplementary Information
### 39. Project documentation, if available
> E.g. case for support if the work emerges from RCUK funding.
### 40. Additional background reading, if available
> Github repos related to the project:
Parakeet. A Digital-twin of the microscopy instrumentation, developed at RFI.
https://github.com/rosalindfranklininstitute/parakeet
VNE. A PyTorch library for encoding molecular shapes that we have been developing at the Turing.
https://github.com/quantumjot/vne
Related:
Napari – a fast, interactive, multi-dimensional image viewer for Python.
https://github.com/napari/napari
### 41. Any software languages, frameworks, libraries tools and platforms you expect the project to use*
* Python
* FastAPI/Flask
* Docker/Kubernetes
* PyTorch